At low volume, serverless is the obvious choice. At scale, dedicated beats per-token pricing. Find the exact crossover for your model and monthly token volume.
Serverless: $2,750 per month
Dedicated: $2,044 per month
Dedicated wins: $706 per month (26% cheaper than serverless · $8,472 / year)
Serverless rates are approximate public list prices (April 2026). Dedicated costs assume 1× AMD Instinct MI300X for 730 hrs/month. Actual savings vary by workload and configuration.
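The example figures above are internally consistent, and the arithmetic can be checked in a few lines. This is a minimal Python sketch, not the calculator's actual code: the function name is illustrative, and the $2.80 hourly rate is not quoted anywhere on the page; it is only what $2,044 over 730 hours implies for a single GPU.

```python
HOURS_PER_MONTH = 730  # provisioned hours per month, as stated in the note above


def savings_summary(serverless_monthly, dedicated_monthly):
    """Return (monthly_saving, percent_cheaper, annual_saving) in dollars."""
    monthly_saving = serverless_monthly - dedicated_monthly
    # Percentage is relative to the serverless bill, rounded to a whole percent.
    percent_cheaper = round(100 * monthly_saving / serverless_monthly)
    annual_saving = monthly_saving * 12
    return monthly_saving, percent_cheaper, annual_saving


# Hourly rate implied by the example (derived, not a quoted price).
implied_hourly_rate = 2044 / HOURS_PER_MONTH

print(savings_summary(2750, 2044))  # (706, 26, 8472)
```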
Even below the crossover, dedicated endpoints unlock things serverless APIs can't offer.
Scale to zero between requests. Pay only for active compute, not provisioned time.
Your endpoint, your throughput. No queueing behind other tenants at peak.
Serve LoRAs and full fine-tunes without serverless catalog restrictions.
Deploy on FlexAI Cloud or bring your own. Inference never leaves your boundary.
You pay per token. Cost scales linearly with volume — predictable, but the per-token price bundles infrastructure, operations, and provider margin.
monthly = tokens_M × rate_per_M
You lease a GPU configuration for the month. Cost is flat regardless of how many tokens you generate; above the crossover, the per-token effective rate falls below any serverless price.
monthly = gpu_count × rate × 730 hrs
At low token volumes, serverless GPU inference is the obvious choice: no upfront commitment, no infrastructure to manage, and you pay only for what you use. But serverless pricing bundles infrastructure, operations, and provider margin into every token, and at scale that overhead adds up.
Dedicated inference flips the model. You lease a fixed GPU configuration for the month and your endpoint processes as many tokens as the hardware allows. The effective per-token cost falls as volume grows, and above the crossover point, dedicated consistently undercuts even competitive serverless pricing.
The break-even threshold varies widely by model. A smaller model with a low market rate has a higher crossover — you need more volume before dedicated makes sense. A large model with aggressive serverless pricing may cross over at a surprisingly modest volume. This calculator shows you the exact threshold for your workload in two fields.
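Setting the two monthly cost formulas equal gives the crossover volume: tokens_M = gpu_count × rate × 730 / rate_per_M. The Python sketch below works under stated assumptions. The function names are illustrative, the per-million serverless rates in the usage lines are hypothetical placeholders rather than real list prices, and the $2.80 hourly GPU rate is only the value implied by the example above. It also ignores the hardware's throughput ceiling and the possibility that a larger model needs more than one GPU.

```python
HOURS_PER_MONTH = 730  # provisioned hours per month, as in the dedicated formula


def serverless_monthly(tokens_m, rate_per_m):
    """Serverless bill: millions of tokens times price per million tokens."""
    return tokens_m * rate_per_m


def dedicated_monthly(gpu_count, hourly_rate, hours=HOURS_PER_MONTH):
    """Dedicated bill: flat lease cost, independent of token volume."""
    return gpu_count * hourly_rate * hours


def crossover_tokens_m(rate_per_m, gpu_count, hourly_rate):
    """Monthly volume (millions of tokens) at which both bills are equal."""
    return dedicated_monthly(gpu_count, hourly_rate) / rate_per_m


def effective_rate_per_m(tokens_m, gpu_count, hourly_rate):
    """Effective dedicated price per million tokens at a given volume."""
    return dedicated_monthly(gpu_count, hourly_rate) / tokens_m


# Hypothetical serverless rates, for illustration only.
low_rate_model = crossover_tokens_m(rate_per_m=0.20, gpu_count=1, hourly_rate=2.80)
high_rate_model = crossover_tokens_m(rate_per_m=2.00, gpu_count=1, hourly_rate=2.80)
print(f"crossover at $0.20/M: {low_rate_model:,.0f}M tokens per month")
print(f"crossover at $2.00/M: {high_rate_model:,.0f}M tokens per month")
```

With these placeholder numbers, the lower serverless rate pushes the crossover ten times further out, which is the pattern described above: the cheaper the serverless price per token, the more volume you need before a flat lease pays off.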