Name: FlexAI Token Factory
Brand: FlexAI
Price: 0.225 USD
Availability: InStock

Question 1

How is FlexAI priced?

Accepted Answer

Token Factory is usage-priced per unit: per token for text models, and for media models per image, per million characters of speech, per minute of audio, or per generated video clip. Each rate is set against the credible market rate, repriced automatically as the market moves, with the public source linked per row. Dedicated endpoints are priced per GPU-hour, and AI Factory is priced per deployment. See the Dedicated and Custom tabs.

Question 2

How does FlexAI set each model's price?

Accepted Answer

Every model is priced against the credible market rate (pricepertoken.com, Artificial Analysis, provider pages) and set below it, repriced automatically when the source moves. See "How our pricing works" below, and click any row for its source.

Question 3

How often does pricing update?

Accepted Answer

Automatically. A scheduled refresh re-reads the sources, so when the tracked market rate moves, our rate moves with it.

Question 4

What if a source is unreliable?

Accepted Answer

Every row is attributed to a named, verified source. Unreliable sources and outliers are excluded: we track credible published prices, not whatever number happens to be on the internet.

Question 5

Are any models priced differently?

Accepted Answer

The open-weight catalog follows the rule above. Frontier/closed passthroughs and dedicated deployments are priced separately. See the Dedicated and Custom tabs.

Question 6

How does Dedicated pricing work vs Serverless?

Accepted Answer

Serverless bills per token, best for spiky or early-stage volume. Dedicated bills per GPU-hour for your own endpoints, best once steady-state volume crosses the break-even point.

Question 7

When should I switch from Serverless to Dedicated?

Accepted Answer

When sustained volume makes a reserved GPU cheaper than per-token billing. The crossover calculator shows your exact break-even.

Question 8

I'm coming from OpenAI or Claude. How hard is the switch?

Accepted Answer

From the OpenAI SDK it's a two-value change: point base_url at tokens.flex.ai/v1 and swap the model. From the Claude API it's a small refactor. The migration guide has runnable snippets and a closed-to-open model mapping.

Model	Input $/M	Output $/M	Context	Source
DeepSeek V3.2	$0.225	$0.225	160K	source
GPT-OSS 120B	$0.035	$0.090	128K	source
GPT-OSS 20B	$0.027	$0.117	128K	source
Llama 3.1 8B Instruct	$0.018	$0.027	16K	source
Llama 3.3 70B Instruct	$0.090	$0.288	128K	source
Mistral Nemo	$0.018	$0.027	128K	source
Qwen3 Coder 30B A3B	$0.063	$0.234	256K	source
Qwen3.6-35B-A3B	$0.090	$0.135	256K	source
Gemma 4 31B IT	$0.108	$0.315	256K	source
Gemma 4 26B A4B	$0.054	$0.270	256K	source
PaddleOCR-VL 1.5	$0.126	$0.720	16K	source
Nemotron 3 Super 120B A12B	$0.081	$0.405	256K	source
MiniMax M2.7	$0.225	$0.900	192K	source
Qwen3.5 9B	$0.090	$0.135	250K	source
GLM 4.5 Air	$0.113	$0.765	128K	source
DeepSeek V4 Flash	$0.082	$0.164	1.0M	source

Model	Rate	Context	Source
Whisper Large V3 Turbo	$0.0006 / min	—	source
NVIDIA Parakeet TDT 0.6B v3	$0.0014 / min	—	source
Kokoro-82M	$0.558 / M chars	—	source

Tier	Price	Includes	SLA
Starter	$10/month in free credits for your first 3 months, card required at signup, then pay-as-you-go at our published per-token rates	2 workspace seats. OpenAI-compatible API. Dedicated endpoints on demand. Playground access.	99%
Essential	$100 matched ($100 deposit → $200 credit). Contact us today; self-serve soon.	8 seats. Concurrency + multi-fractional. Architecture call at $50 spend. SE access. HIPAA, DORA.	99.5%
Custom	Custom pricing on your workload. Talk to us.	VPC, on-prem, air-gapped. Dedicated CSM. Geo redundancy.	99.9%

Predictable pricing.Auditable per model

Serverless: Token Factory

Dedicated GPU pricing

On FlexAI

Custom: AI Factory & Enterprise

Cloud Savings Calculator

Want the breakdown as a PDF?

How our pricing works

How the rate is set

Auditable per row

Why this matters

Frequently Asked Questions