Question 1

Where do the closed-source prices come from?

Accepted Answer

Published API rates from OpenAI, Anthropic, and Google as of April 2026, the same per-unit figures you'd see on their pricing pages. For text, we blend input and output token rates using the input/output mix you set under Advanced.

Question 2

What prices are you using for the open-source models?

Accepted Answer

Per-model rates from the FlexAI Token Factory model database, sourced from OpenRouter, Together AI, and Cloudflare Workers AI. Token Factory rates are now live. See the FlexAI pricing page for current rates.

Question 3

What about image models?

Accepted Answer

Covered in the Image tab above. We compare DALL-E 3, GPT-Image-1, Imagen 4, and Gemini 2.5 Flash Image against FLUX.1 on FlexAI Token Factory.

Question 4

Why does the recommended open-source model change when I switch use case?

Accepted Answer

Different open-weights models lead in different workloads: DeepSeek R1 and Kimi K2.5 for reasoning, Qwen 3 Coder for code, Llama 4 Maverick for long-context RAG, GPT-OSS 120B for general chat at mid-tier cost. The picker ranks by use-case fit first, then by price.

Question 5

How will this stay current as new models ship?

Accepted Answer

The model catalogues and prices live in data files kept in lockstep with the FlexAI Token Factory model database. We refresh whenever a new frontier-class model lands or a provider re-prices.

Question 6

How do I know an open-source model is good enough?

Accepted Answer

Run an eval on your own workload. That's the only answer that holds up. Read our guide on evaluating open-source models to understand what benchmarks matter, then use our lm-evaluation-harness blueprint to run 300+ standardized tests on FlexAI with no infra setup.

Question 7

How much cheaper is Token Factory compared to OpenAI GPT-5?

Accepted Answer

At 100M tokens/month, GPT-5 costs approximately $475/month. GPT-OSS 120B on FlexAI Token Factory costs approximately $6/month, a 99% reduction. GLM 4.7 runs at roughly $77/month for the same volume.

Question 8

Which open-source LLM is most cost-effective on Token Factory for high-volume use?

Accepted Answer

Llama 3.1 8B Instruct is currently the most cost-effective chat option at ~$0.018/MTok input, making it ideal for high-volume classification, summarization, and RAG workloads. GPT-OSS 120B offers the best cost-performance balance for reasoning-heavy tasks.

Question 9

Does Token Factory support OpenAI-compatible APIs?

Accepted Answer

Yes. Token Factory uses an OpenAI-compatible REST API. Change your base URL and API key. No SDK changes required. Works with LangChain, LlamaIndex, PortKey, LiteLLM, and any library that supports custom endpoints.

Question 10

How does FlexAI Token Factory pricing compare to Fireworks AI or Together AI?

Accepted Answer

Token Factory is priced against the credible live market rate per model, source-linked and recalculated automatically whenever any provider moves, with no manual repricing and no model-by-model lag. Fireworks AI and Together AI reprice reactively, model by model. There are no seat fees, no minimums, and no GPU reservations.

Question 11

What is FlexAI Token Factory?

Accepted Answer

Token Factory is FlexAI's serverless, per-token inference. Pay per token, no GPU reservations required, no upfront commitments. Serves GPT-OSS, GLM 4.7, Llama, Qwen, Gemma, DeepSeek and more. See the live catalog on the models page.

Question 12

Can I switch from OpenAI to open-source models without rewriting my code?

Accepted Answer

Yes. Token Factory is OpenAI API-compatible. Point your existing OpenAI client to https://tokens.flex.ai/v1, replace your key, and change the model name. No code refactoring required.

Cut your inference bill by switching to open models

What you're paying today

Recommended open-source swap

What you're paying today

Recommended open-source swap

Example: 100M tokens/month

Why open-source costs less

No proprietary markup

Same quality tier, different economics

FAQ

Model	Provider	Monthly cost
GPT-5	OpenAI	$475
Claude Sonnet 4.6	Anthropic	$780
GPT-OSS 120B	FlexAI Token Factory	$6
GLM 4.7	FlexAI Token Factory	$77
Llama 3.1 8B Instruct	FlexAI Token Factory	$2