Predictable pricing. Auditable per model.
Every serverless model priced from public market sources and repriced automatically, with each row linking its source.
Pay for what you use, one OpenAI-compatible key, every model in the catalog.
Pricing an agent workload means more than the sticker rate: average tokens per run, tool calls, fallback rate, and whether steady traffic should move to dedicated endpoints.
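As a rough illustration of how those factors combine, the sketch below estimates the expected cost of one agent run. The function names, the workload numbers, and the assumption that a fallback reroutes a call to a second model (instead of adding an extra call) are all hypothetical; only the per-token rates are taken from the table on this page.

```python
from dataclasses import dataclass


@dataclass
class ModelRate:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


def cost_per_call(rate: ModelRate, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single model call at the given per-million rates."""
    return (input_tokens * rate.input_per_m
            + output_tokens * rate.output_per_m) / 1_000_000


def cost_per_run(primary: ModelRate, fallback: ModelRate, calls_per_run: int,
                 input_tokens: int, output_tokens: int,
                 fallback_rate: float) -> float:
    """Expected cost in USD of one agent run.

    calls_per_run: model calls in a run (assumed: one per tool-call round trip).
    fallback_rate: fraction of calls rerouted to the fallback model (assumption).
    """
    primary_cost = cost_per_call(primary, input_tokens, output_tokens)
    fallback_cost = cost_per_call(fallback, input_tokens, output_tokens)
    blended = (1 - fallback_rate) * primary_cost + fallback_rate * fallback_cost
    return calls_per_run * blended


# Rates from the table: GPT-OSS 120B as primary, Llama 3.3 70B Instruct as fallback.
gpt_oss_120b = ModelRate(input_per_m=0.035, output_per_m=0.090)
llama_33_70b = ModelRate(input_per_m=0.090, output_per_m=0.288)

# Hypothetical workload: 6 calls per run, 4,000 input and 500 output tokens
# per call, 5% of calls falling back.
run_cost = cost_per_run(gpt_oss_120b, llama_33_70b, calls_per_run=6,
                        input_tokens=4000, output_tokens=500, fallback_rate=0.05)
```

Under these made-up workload numbers a run costs roughly $0.0012, or about $121 per 100,000 runs. Swapping in measured token counts and fallback rates from your own traces matters more than the exact formula.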
| Model | Input $/M | Output $/M | Context | Source |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.225 | $0.225 | 160K | source |
| GPT-OSS 120B | $0.035 | $0.090 | 128K | source |
| GPT-OSS 20B | $0.027 | $0.117 | 128K | source |
| Llama 3.1 8B Instruct | $0.018 | $0.027 | 16K | source |
| Llama 3.3 70B Instruct | $0.090 | $0.288 | 128K | source |
| Mistral Nemo | $0.018 | $0.027 | 128K | source |
| Qwen3 Coder 30B A3B | $0.063 | $0.234 | 256K | source |
| Qwen3.6-35B-A3B | $0.090 | $0.135 | 256K | source |
| Gemma 4 31B IT | $0.108 | $0.315 | 256K | source |
| Gemma 4 26B A4B | $0.054 | $0.270 | 256K | source |
| PaddleOCR-VL 1.5 | $0.126 | $0.720 | 16K | source |
| Nemotron 3 Super 120B A12B | $0.081 | $0.405 | 256K | source |
| MiniMax M2.7 | $0.225 | $0.900 | 192K | source |
| Qwen3.5 9B | $0.090 | $0.135 | 250K | source |
| GLM 4.5 Air | $0.113 | $0.765 | 128K | source |
| DeepSeek V4 Flash | $0.082 | $0.164 | 1.0M | source |

| Model | Input $/M | Context | Source |
|---|---|---|---|
| BGE-M3 | $0.009 | 8K | source |

| Model | Per image | Context | Source |
|---|---|---|---|
| FLUX.1 [schnell] | $0.0004 | — | source |
Find your serverless-to-dedicated break-even point
$10/month in free credits for your first 3 months.
Your own endpoints, fine-tuned models, or whole clusters, priced per GPU-hour and metered per second, on FlexAI or your own infrastructure. On-demand and reserved options; managed fine-tuning available.
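A back-of-the-envelope way to locate the serverless-to-dedicated break-even is to ask how many tokens per hour a dedicated GPU must serve before its hourly price equals the serverless bill for the same traffic. The sketch below assumes a hypothetical $2.00 GPU-hour rate (this page does not publish one), a hypothetical 20% output-token share, and that a single endpoint can actually sustain the resulting throughput. The per-token rates are the Llama 3.3 70B Instruct rates from the table.

```python
def dedicated_cost(seconds_used: float, gpu_hour_usd: float) -> float:
    """Cost in USD of a dedicated GPU billed per GPU-hour, metered per second."""
    return seconds_used / 3600 * gpu_hour_usd


def breakeven_tokens_per_hour(gpu_hour_usd: float, input_per_m: float,
                              output_per_m: float, output_share: float) -> float:
    """Tokens per hour at which serverless spend equals one GPU-hour.

    output_share: fraction of all tokens that are output tokens (assumption).
    """
    blended_per_m = (1 - output_share) * input_per_m + output_share * output_per_m
    return gpu_hour_usd / blended_per_m * 1_000_000


# Hypothetical GPU-hour price; serverless rates from the table above.
tokens_per_hour = breakeven_tokens_per_hour(
    gpu_hour_usd=2.00, input_per_m=0.090, output_per_m=0.288, output_share=0.20)
```

With these inputs the break-even sits at roughly 15.4 million tokens per hour. Steady traffic above that level favors a dedicated endpoint, while bursty traffic below it stays cheaper on serverless.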
Compute hosted on FlexAI's NVIDIA and AMD fleets, with up to 99.9% SLA by tier.
*Contact sales for Essential and Custom package rates.
Private cloud on your hardware, with the same managed AI stack.
LegML fine-tuned a 32B legal LLM on FlexAI H100.
Estimate your savings against AWS Bedrock, Azure OpenAI, GCP Vertex, Together, Fireworks, OpenAI, or Anthropic, at FlexAI's published rates.
Leave your email and we'll send a model-by-model savings breakdown.
| Tier | Price | Includes | SLA |
|---|---|---|---|
| Starter | $10/month in free credits for your first 3 months, card required at signup, then pay-as-you-go at our published per-token rates | 2 workspace seats. OpenAI-compatible API. Dedicated endpoints on demand. Playground access. | 99% |
| Essential | $100 matched ($100 deposit → $200 credit). Contact us today; self-serve soon. | 8 seats. Concurrency + multi-fractional. Architecture call at $50 spend. SE access. HIPAA, DORA. | 99.5% |
| Custom | Custom pricing on your workload. Talk to us. | VPC, on-prem, air-gapped. Dedicated CSM. Geo redundancy. | 99.9% |
Building for production? See the FlexAI Startup Program. Apply-only, three stages, scaling on one account as your workload grows.
Models with royalty obligations or pricing floors (currently MiniMax M2.7 and FLUX.2) are excluded from Startup Program introductory discounts.
For each model we track the cheapest credible market rate (pricepertoken.com, Artificial Analysis, and provider pages) and price at that rate × 0.90. When a source moves, a scheduled refresh reprices automatically.
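The pricing rule can be sketched in a few lines. This is an illustration only, not FlexAI's implementation: the function name and the quote values are hypothetical, and the step that decides which quotes count as "credible" is not modeled.

```python
def reprice(market_quotes: dict[str, float], factor: float = 0.90) -> tuple[str, float]:
    """Return (source, price): the cheapest quote and that rate times `factor`.

    market_quotes maps a source name to its quoted USD per million tokens.
    Quotes are assumed to have already passed a credibility check.
    """
    source = min(market_quotes, key=market_quotes.get)
    return source, market_quotes[source] * factor


# Hypothetical quotes for a single model's input rate, in USD per million tokens.
quotes = {
    "pricepertoken.com": 0.12,
    "Artificial Analysis": 0.10,
    "provider page": 0.11,
}
source, price = reprice(quotes)
```

Here the cheapest quote is $0.10 per million tokens, so the derived rate is $0.09, and the returned source name is what the row would link to. Running the same function on each scheduled refresh gives the automatic repricing described above.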
Every row links the public source its rate derives from. Click through to check the current market rate for yourself.
Most token APIs reprice reactively when they lose share. A rate pinned to a public source, at a fixed margin below it, is predictable: you can budget against it.
Coming from OpenAI or Claude? Read the migration guide.
Comparing providers? See how FlexAI compares with OpenAI, Fireworks, AWS Bedrock, and CoreWeave.