Skip to content

    Predictable pricing.Auditable per model

    Every serverless model priced from public market sources and repriced automatically, with each row linking its source.

    Serverless: Token Factory

    Pay for what you use, one OpenAI-compatible key, every model in the catalog.

    Pricing an agent workload means more than the sticker rate: average tokens per run, tool calls, fallback rate, and whether steady traffic should move to dedicated endpoints.

    ModelInput $/MOutput $/MContextSource
    DeepSeek V3.2$0.225$0.225160Ksource
    GPT-OSS 120B$0.035$0.090128Ksource
    GPT-OSS 20B$0.027$0.117128Ksource
    Llama 3.1 8B Instruct$0.018$0.02716Ksource
    Llama 3.3 70B Instruct$0.090$0.288128Ksource
    Mistral Nemo$0.018$0.027128Ksource
    Qwen3 Coder 30B A3B$0.063$0.234256Ksource
    Qwen3.6-35B-A3B$0.090$0.135256Ksource
    Gemma 4 31B IT$0.108$0.315256Ksource
    Gemma 4 26B A4B$0.054$0.270256Ksource
    PaddleOCR-VL 1.5$0.126$0.72016Ksource
    Nemotron 3 Super 120B A12B$0.081$0.405256Ksource
    MiniMax M2.7$0.225$0.900192Ksource
    Qwen3.5 9B$0.090$0.135250Ksource
    GLM 4.5 Air$0.113$0.765128Ksource
    DeepSeek V4 Flash$0.082$0.1641.0Msource
    ModelInput $/MContextSource
    BGE-M3$0.0098Ksource
    ModelPer imageContextSource
    FLUX.1 [schnell]$0.0004source
    ModelRateContextSource
    Whisper Large V3 Turbo$0.0006 / minsource
    NVIDIA Parakeet TDT 0.6B v3$0.0014 / minsource
    Kokoro-82M$0.558 / M charssource

    Find your serverless to dedicated break-even point

    Get an API key

    $10/month in free credits for your first 3 months.

    Dedicated GPU pricing

    Your own endpoints, fine-tuned models, or whole clusters, priced per GPU-hour and metered per second, on FlexAI or your own infrastructure. On-demand and reserved options; managed fine-tuning available.

    On FlexAI

    Compute hosted on FlexAI's NVIDIA and AMD fleets, with up to 99.9% SLA by tier.

    • State-of-the-art NVIDIA GPUs with NVLink and InfiniBand
    • On-demand and reserved pricing
    • Per-second metering
    • Region selection
    • Managed fine-tuning: per-M training tokens + storage $/GB-mo
    • One-click setup: get started in minutes
    Per GPU-hour, on-demand*
    B200$6.25/hr
    H200$3.15/hr
    H100$2.10/hr
    A100$1.80/hr
    L40S$1.50/hr

    *Contact sales for Essential and Custom package rates.

    Contact Us

    Custom: AI Factory & Enterprise

    Private cloud on your hardware, with the same managed AI stack.

    • Custom pricing on your workload. Talk to us.
    • VPC, on-prem, and air-gapped options
    • SLA 99.9% with geo redundancy
    • Dedicated CSM
    Proven on FlexAI
    • 75% lower compute cost
    • €22.5K total training cost

    LegML fine-tuned a 32B legal LLM on FlexAI H100.

    Cloud Savings Calculator

    Estimate your savings against AWS Bedrock, Azure OpenAI, GCP Vertex, Together, Fireworks, OpenAI, or Anthropic, at FlexAI's published rates.

    TierPriceIncludesSLA
    Starter$10/month in free credits for your first 3 months, card required at signup, then pay-as-you-go at our published per-token rates2 workspace seats. OpenAI-compatible API. Dedicated endpoints on demand. Playground access.99%
    Essential$100 matched ($100 deposit → $200 credit). Contact us today; self-serve soon.8 seats. Concurrency + multi-fractional. Architecture call at $50 spend. SE access. HIPAA, DORA.99.5%
    CustomCustom pricing on your workload. Talk to us.VPC, on-prem, air-gapped. Dedicated CSM. Geo redundancy.99.9%

    Building for production? See the FlexAI Startup Program. Apply-only, three stages, scaling on one account as your workload grows.

    Models with royalty obligations or pricing floors (currently MiniMax M2.7 and FLUX.2) are excluded from Startup Program introductory discounts.

    How our pricing works

    How the rate is set

    For each model we track the cheapest credible market rate (pricepertoken.com, Artificial Analysis, and provider pages) and price at that rate × 0.90. When a source moves, a scheduled refresh reprices automatically.

    Auditable per row

    Every row links the public source its rate derives from. Click through to check the current market rate for yourself.

    Why this matters

    Most token APIs reprice reactively when they lose share. A rate pinned to a public source, at a fixed margin below it, is predictable: you can budget against it.

    Frequently Asked Questions

    Coming from OpenAI or Claude? Read the migration guide

    Comparing providers? See how FlexAI compares with OpenAI, Fireworks, AWS Bedrock, and CoreWeave.