Cut your inference bill by switching to open models
Compare closed-model APIs against open-weight models on FlexAI Token Factory, for text and image generation.
GPT-5 (OpenAI): $1.25/M input · $10.00/M output
Qwen 3 235B A22B Instruct (FlexAI Token Factory): an efficient MoE model with frontier quality at mid-tier pricing. $0.06/M input · $0.09/M output · indicative
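The monthly figures follow from the per-million-token rates applied to a monthly token volume split between input and output. The sketch below is a minimal version of that arithmetic. It assumes the 100M tokens/month and 60/40 input/output mix stated for the comparison table. The function names are hypothetical and are not part of any Token Factory API.

```python
def split_tokens(total_tokens: int, input_pct: int) -> tuple[int, int]:
    """Split a monthly token volume into (input, output) tokens by an integer percentage."""
    input_tokens = total_tokens * input_pct // 100
    return input_tokens, total_tokens - input_tokens


def monthly_text_cost(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Monthly cost in USD for rates quoted per million (M) tokens."""
    return (input_tokens / 1_000_000) * usd_per_m_input + (output_tokens / 1_000_000) * usd_per_m_output


# Assumed workload: 100M tokens/month, 60% input and 40% output.
inp, out = split_tokens(100_000_000, 60)
gpt5 = monthly_text_cost(inp, out, 1.25, 10.00)  # published GPT-5 rates
qwen = monthly_text_cost(inp, out, 0.06, 0.09)   # indicative Qwen 3 235B A22B rates
print(f"GPT-5: ${gpt5:,.2f} / month")
print(f"Qwen 3 235B A22B Instruct: ${qwen:,.2f} / month (indicative)")
```

At the indicative rates the Qwen figure comes out to $7.20. The $7.43 on this page presumably reflects the live Token Factory rate, which can differ from the indicative one.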
GPT-5 monthly cost: $475
Qwen 3 235B A22B Instruct monthly cost: $7.43
You save 98% on every token vs. GPT-5: $468 / month, $5,611 / year
Closed-source rates reflect published API pricing as of April 2026. Token Factory rates are live. See the pricing page for current per-model rates.
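Each savings figure is derived from the two monthly costs:

* the monthly saving is the difference between them;
* the annual saving is that difference times 12;
* the percentage is the difference divided by the closed-model cost.

The sketch below reproduces the numbers above from the displayed monthly costs ($475 and $7.43). `savings` is a hypothetical helper written for illustration.

```python
from typing import NamedTuple


class Savings(NamedTuple):
    monthly: float
    yearly: float
    percent: float


def savings(closed_monthly: float, open_monthly: float) -> Savings:
    """Savings from moving a workload from a closed-model API to an open-weights model."""
    monthly = closed_monthly - open_monthly
    return Savings(monthly, monthly * 12, 100 * monthly / closed_monthly)


s = savings(475.00, 7.43)
print(f"${s.monthly:,.0f} / month, ${s.yearly:,.0f} / year, {s.percent:.0f}%")
```

The unrounded values are $467.57 / month, $5,610.84 / year and about 98.4%. These round to the $468, $5,611 and 98% shown.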
DALL-E 3 (standard, OpenAI): $0.040 per 1024×1024 image
FLUX.1 [schnell] (Black Forest Labs): Apache 2.0 licensed, 4-step generation, and among the most popular open image models. $0.0004 per 1024×1024 image · indicative
More image models (FLUX.1 [dev], Kontext Max) are planned for Token Factory wave 2.
DALL-E 3 (standard) monthly cost: $400
FLUX.1 [schnell] monthly cost: $4.50
You save 99% per image vs. DALL-E 3 (standard): $396 / month, $4,746 / year
Closed-source rates reflect published per-image API pricing as of April 2026. Token Factory rates are live. See the pricing page for current per-model rates.
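The image calculator uses the same arithmetic with a flat per-image rate. The page does not state the monthly volume. Dividing the $400 DALL-E 3 figure by $0.040 per image implies 10,000 images per month, and the sketch below assumes that derived volume. `monthly_image_cost` is a hypothetical helper.

```python
def monthly_image_cost(images_per_month: int, usd_per_image: float) -> float:
    """Monthly cost in USD at a flat per-image rate (1024x1024)."""
    return images_per_month * usd_per_image


# Volume implied by the $400 DALL-E 3 figure at $0.040 per image (derived, not a stated input).
images = round(400 / 0.040)
dalle3 = monthly_image_cost(images, 0.040)
flux_indicative = monthly_image_cost(images, 0.0004)  # indicative FLUX.1 [schnell] rate

# Savings computed from the displayed monthly costs ($400 vs. $4.50).
saved = 400.00 - 4.50
print(f"{images:,} images: DALL-E 3 ${dalle3:,.2f}, FLUX.1 [schnell] ${flux_indicative:,.2f} (indicative)")
print(f"${saved:,.0f} / month, ${saved * 12:,.0f} / year, {100 * saved / 400:.0f}%")
```

At the indicative rate, FLUX.1 [schnell] would cost $4.00 for this volume. The $4.50 displayed presumably reflects the live rate. Starting from the displayed costs gives $395.50 / month, $4,746 / year and 98.875%, which the page rounds to $396 and 99%.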
| Model | Provider | Monthly cost (100M tokens, 60/40 input/output) |
|---|---|---|
| GPT-5 | OpenAI | $475 |
| Claude Sonnet 4.6 | Anthropic | $780 |
| GPT-OSS 120B | FlexAI Token Factory | $6 |
| GLM 4.7 | FlexAI Token Factory | $77 |
| Llama 3.1 8B Instruct | FlexAI Token Factory | $2 |
At 100M tokens/month (60/40 input/output mix), switching from GPT-5 to GPT-OSS 120B on Token Factory saves ~$469/month (99%).
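The comparison reduces to subtracting each Token Factory model's monthly cost from the GPT-5 baseline. The sketch below runs over the table's own figures, taken as given rather than recomputed from per-token rates. `savings_vs` is an illustrative helper, not part of any published tool. For GPT-OSS 120B it reproduces the ~$469/month (99%) figure.

```python
# Monthly costs copied from the table (100M tokens/month, 60/40 input/output mix).
monthly_cost: dict[str, float] = {
    "GPT-5": 475,
    "Claude Sonnet 4.6": 780,
    "GPT-OSS 120B": 6,
    "GLM 4.7": 77,
    "Llama 3.1 8B Instruct": 2,
}


def savings_vs(baseline: str, model: str, costs: dict[str, float]) -> tuple[float, float]:
    """Return (monthly USD saved, percent saved) when replacing `baseline` with `model`."""
    saved = costs[baseline] - costs[model]
    return saved, 100 * saved / costs[baseline]


token_factory = ["GPT-OSS 120B", "GLM 4.7", "Llama 3.1 8B Instruct"]
for model in sorted(token_factory, key=lambda m: monthly_cost[m]):
    saved, pct = savings_vs("GPT-5", model, monthly_cost)
    print(f"{model}: saves ${saved:,.0f} / month ({pct:.1f}%) vs. GPT-5")
```

One decimal place is used for the percentage so that Llama 3.1 8B Instruct shows 99.6% rather than rounding up to a misleading 100%.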
Closed-source APIs price in R&D recovery, brand, and margin on top of compute. Open-weights models on Token Factory are priced to the underlying inference cost. You pay for tokens (or images), not for the label on the box.
Open-weights models (Llama 4, Qwen 3, DeepSeek V3.2, FLUX.1) are competitive with frontier closed-source offerings on a growing number of public benchmarks, and for many production workloads they're a drop-in swap.
How this calculator works, and what the numbers mean.