Your 70B LLM Fine-Tune Doesn't Need an H100
If you're doing research — fine-tuning, running ablation studies, iterating on model architectures — your bottleneck is rarely raw compute throughput.
Your bottleneck is iteration speed and how many experiment variants you can afford to run.
And that is where the common intuition about GPU rental starts to work against you.
The H100 trap
Let's say you want to fine-tune a 70B parameter model, run a few LoRA variants, compare prompt strategies, or do some quick training runs to validate a hypothesis.
The natural instinct is: get the biggest GPU available.
Grab an H100. 80GB of VRAM. Top-tier performance. Maximum headroom.
Cost: ~$3–5 per hour, depending on the cloud provider.
Now run that experiment.
Maybe it takes 3 hours. At the top of that price range, that's $15 for a single run.
Now you need to try a different learning rate. That's $15 again.
Different LoRA rank? $15 again.
You want to try a different dataset mix. $15.
Try a smaller adapter first. $15.
Check if the model fits with a different quantization. $15.
Ten runs in, you've spent $150 on what should have been a cheap iteration cycle.
And the H100 didn't even matter.
You weren't running inference at scale. You weren't training a base model from scratch. You were iterating.
What researchers actually need
The core requirement for most research workloads is:
- Enough memory to not OOM — fit the model, the optimizer state, the gradients, the activations
- Fast enough iteration — run small experiments quickly so you can decide what to do next
- Cost low enough to run many variants — because you don't know the right answer yet, and you'll need to try a lot of things
For a 70B model with LoRA fine-tuning, you need roughly 48–80GB of VRAM, depending on the precision and rank.
The H100 gives you 80GB — great.
But a 70B LoRA fine-tune doesn't need the H100's raw throughput. It needs the 80GB of memory, and once the model fits, it runs fine on hardware with less compute headroom.
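DGX Spark has 128GB of unified memory, so the model fits easily. Its memory bandwidth is lower than an H100's, but for small-batch LoRA experiments where you want a quick signal rather than peak tokens per second, it is enough for a fine-tuning workload.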
And it costs $0.65 per hour.
That's not a marginal difference. Against an H100 at $3–5 per hour, that's roughly a 5–8x lower hourly rate for the same workload.
The numbers
Here's a realistic comparison for a common research scenario:
Scenario: Fine-tuning a 70B model with LoRA on a research project
You want to run 10 experiment variants:
| Hardware | Cost/hour | 10 runs | Total |
|---|---|---|---|
| H100 (80GB) | $3.85 | ~3 hrs each | $115 |
| H200 (141GB) | $4.50 | ~3 hrs each | $135 |
| DGX Spark (128GB) | $0.65 | ~3 hrs each | $19.50 |
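The totals in the comparison are just rate × hours × runs, which is easy to re-run with your own numbers. The sketch below is a minimal illustration, not a pricing tool: the hourly rates are the ones quoted above, the function name `sweep_cost` is made up for this example, and it assumes (as the comparison does) that a run takes about 3 hours on every machine. If your runs take longer on one machine, change `hours_per_run` for that machine.

```python
# Hourly rates quoted in the comparison above; real prices vary by provider.
HOURLY_RATE_USD = {
    "H100 (80GB)": 3.85,
    "H200 (141GB)": 4.50,
    "DGX Spark (128GB)": 0.65,
}


def sweep_cost(rate_per_hour: float, runs: int = 10, hours_per_run: float = 3.0) -> float:
    """Total cost in USD of `runs` experiments lasting `hours_per_run` hours each."""
    return round(rate_per_hour * hours_per_run * runs, 2)


if __name__ == "__main__":
    for name, rate in HOURLY_RATE_USD.items():
        print(f"{name:<18} ${sweep_cost(rate):>7.2f}")
```

Keeping the run length as a parameter matters: the hourly rate only translates directly into savings if the wall-clock time per run is similar across machines.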
That's the same experiments. Same model. Same quality of results.
The only difference is how much you pay.
Now add the hidden cost of the expensive option: fear of running too many variants.
When every run costs $15, you start being conservative. You run fewer learning rates. Fewer dataset combinations. Fewer rank configurations. You skip the thing you wish you could try, because it's just too expensive to burn on a hunch.
On Spark at $1.95 per run? You try everything. And that's where the genuinely good results come from.
"But doesn't an H100 have more memory?"
This is the part that catches people off guard.
128GB of unified memory fits larger models than 80GB of VRAM. It is a single pool shared by the CPU and GPU on the Grace Blackwell architecture, so there are no separate host and device memory pools to split a model across.
A 70B model at 4-bit quantization needs about 35GB for the weights (70 billion parameters at half a byte each). Add LoRA adapters (2–4GB), optimizer state (20GB for AdamW), and activations. You're looking at **60–75GB total**.
That fits comfortably in 128GB.
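To make the memory budget concrete, here is a rough back-of-the-envelope sketch. It only adds up the components listed above and is not a profiler: the function names are hypothetical, the adapter and optimizer figures are the ones from the text, and the activation values (3GB and 16GB) are assumptions back-solved from the 60–75GB total, since activation memory depends heavily on batch size and sequence length.

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the frozen base weights, in decimal GB."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes (GB)
    return params_billion * bits_per_param / 8


def total_gb(params_billion: float, bits_per_param: int,
             adapter_gb: float, optimizer_gb: float, activation_gb: float) -> float:
    """Sum the four components: weights, adapters, optimizer state, activations."""
    return (weights_gb(params_billion, bits_per_param)
            + adapter_gb + optimizer_gb + activation_gb)


def headroom_gb(needed_gb: float, capacity_gb: float) -> float:
    """Memory left over; a negative value means the job does not fit."""
    return capacity_gb - needed_gb


if __name__ == "__main__":
    # Activation figures are assumptions chosen to match the 60-75 GB range in the text.
    low = total_gb(70, 4, adapter_gb=2, optimizer_gb=20, activation_gb=3)
    high = total_gb(70, 4, adapter_gb=4, optimizer_gb=20, activation_gb=16)
    for name, capacity in [("H100", 80), ("DGX Spark", 128)]:
        print(f"{name}: {headroom_gb(high, capacity):.0f} to "
              f"{headroom_gb(low, capacity):.0f} GB of headroom")
```

Treat the capacity as an upper bound. Some memory is always taken by the runtime, and on a unified-memory machine the operating system shares the same pool.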
The H100's 80GB is tight. You're often squeezing the quantization, dropping batch sizes, or wrestling with offloading just to make it work.
On Spark, the model fits naturally. No gymnastics. And the 128GB gives you real headroom for larger models, bigger batches, or more complex fine-tuning setups.
A real budget example
Here's what one researcher's month looked like, running 70B fine-tuning experiments:
On AWS (p5 with H100s)
- 3 weeks of experiments
- ~25 runs total
- Average 4 hours per run
- H100 cost: ~$3.85/hour
- Total: ~$385
On DGX Spark
- Same 3 weeks
- Same 25 runs (+ 12 more because "hey, why not")
- Average 4 hours per run
- Spark cost: $0.65/hour
- Total: ~$96 (37 runs × 4 hours × $0.65)
The difference isn't just money. It's how much research you can actually do in a semester budget.
When more compute throughput does matter
None of this means the H100 is bad. It's an incredible GPU for the right workloads.
But match the tool to the job:
Use H100 when you need maximum compute throughput:
- Training from scratch (base model pretraining)
- Large batch inference at scale
- Production serving with tight latency requirements
- Workloads that are truly compute-bound
Use DGX Spark when throughput isn't the bottleneck:
- Fine-tuning and LoRA experiments
- Ablation studies and hyperparameter sweeps
- Running 70B–200B class models (with quantization)
- Iterating on architectures or prompts
- Budget-conscious research where you're measuring learning per dollar, not tokens per second
The iteration multiplier
Here's the thing that doesn't show up in benchmark charts:
When experiments are cheap, you do more of them. And more experiments = better results.
A researcher on H100 at $15/run might try 3–5 variants before the budget runs out.
A researcher on Spark at $1.95/run tries 15–20 variants.
The better result is almost always in the experiments you didn't run because they were too expensive.
That's the iteration multiplier. It's not a raw performance metric. It's the compound effect of being able to try more things without feeling guilty about the cost.
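One way to put a number on the iteration multiplier is to fix a budget and count how many runs it buys. The sketch below is illustrative only: `runs_within_budget` is a hypothetical helper, the $45 and $75 budgets are assumptions derived from "3–5 variants at $15/run", and the 3-hour run length is the figure used earlier.

```python
def runs_within_budget(budget_usd: float, rate_per_hour: float,
                       hours_per_run: float = 3.0) -> int:
    """Number of complete runs a fixed budget pays for."""
    cost_per_run = rate_per_hour * hours_per_run
    return int(budget_usd // cost_per_run)


if __name__ == "__main__":
    for budget in (45, 75):  # assumed budgets: 3 and 5 runs at $15 each
        h100_runs = runs_within_budget(budget, rate_per_hour=5.00)
        spark_runs = runs_within_budget(budget, rate_per_hour=0.65)
        print(f"${budget}: {h100_runs} runs at $5.00/hr, {spark_runs} runs at $0.65/hr")
```

Under these assumptions the same budget covers 23 to 38 runs at $0.65/hour, so the 15–20 variants quoted above is, if anything, a conservative figure.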
For research, that matters more than peak throughput.
The bottom line
Research isn't about running one expensive experiment perfectly.
It's about running many experiments cheaply, quickly, and with enough headroom to explore.
For that kind of work, H100's compute throughput is wasted money. You don't need more GPU cores — you need enough memory to fit the model, and you need to iterate fast.
DGX Spark gives you 128GB of unified memory (more than the H100's 80GB) at a fraction of the cost. For iterative research, that's the right trade-off.
Less compute throughput. More memory capacity. Cheaper experiments. Better research.
Spark gives researchers access to dedicated 128GB DGX Spark environments at $0.65/hour — built for fine-tuning, experimentation, and the kind of iterative work that defines good research. To get started, visit spark.enverge.ai.