Open models for production agents

Recommended · Research agents

DeepSeek V4 Flash

Serverless · DeepSeek · Chat

Context

1.0M

Input

$0.0819/MTok

Output

$0.1638/MTok

Serverless · Black Forest Labs · Image

FLUX.1 [schnell]

Price

$0.00045 / image

Recommended · Multimodal generation

Gemma 4 26B A4B

Serverless · Google · Chat

Context

256K

Input

$0.054/MTok

Output

$0.27/MTok

Recommended · Multimodal generation

Gemma 4 31B IT

Serverless · Google · Chat

Context

256K

Input

$0.108/MTok

Output

$0.315/MTok

GLM 4.5 Air

Serverless · Zhipu AI · Chat

Context

128K

Input

$0.1125/MTok

Output

$0.765/MTok

Recommended · Workflow automation

GPT-OSS 120B

Serverless · OpenAI · Chat

Context

128K

Input

$0.0351/MTok

Output

$0.09/MTok

Recommended · Coding agents (+1 more)

GPT-OSS 20B

Serverless · OpenAI · Chat

Context

128K

Input

$0.027/MTok

Output

$0.117/MTok

Recommended · Support agents

Kokoro-82M

Serverless · hexgrad · Audio

Price

$0.558 / M chars

Llama 3.1 8B Instruct

Serverless · Meta · Chat

Context

16K

Input

$0.018/MTok

Output

$0.027/MTok

Recommended · Private copilots

Llama 3.3 70B Instruct

Serverless · Meta · Chat

Context

128K

Input

$0.09/MTok

Output

$0.288/MTok

MiniMax M2.7

Serverless · MiniMax · Chat

Context

192K

Input

$0.225/MTok

Output

$0.9/MTok

Recommended · Workflow automation

Mistral Nemo

Serverless · Mistral · Chat

Context

128K

Input

$0.018/MTok

Output

$0.027/MTok

Nemotron 3 Super 120B A12B

Serverless · NVIDIA · Chat

Context

256K

Input

$0.081/MTok

Output

$0.405/MTok

Serverless · NVIDIA · Transcription

NVIDIA Parakeet TDT 0.6B v3

Price

$0.0014 / min audio

Recommended · Support agents (+1 more)

PaddleOCR-VL 1.5

Serverless · Baidu · Vision

Context

16K

Input

$0.126/MTok

Output

$0.72/MTok

Recommended · Coding agents

Qwen3 Coder 30B A3B

Serverless · Alibaba · Chat

Context

256K

Input

$0.063/MTok

Output

$0.234/MTok

Qwen3.5 9B

Serverless · Alibaba · Chat

Context

250K

Input

$0.09/MTok

Output

$0.135/MTok

Qwen3.6-35B-A3B

Serverless · Alibaba · Chat

Context

256K

Input

$0.09/MTok

Output

$0.135/MTok

Serverless · OpenAI · Transcription

Whisper Large V3 Turbo

Price

$0.0006 / min audio

Recommended · Support agents

Dedicated · Google · Embeddings

BERT Base Uncased

Context

512

Params

110M

Min H100s

1

DeepSeek R1

Context

160K

Params

685B MoE (37B active)

Min H100s

16

DeepSeek R1 0528

Context

160K

Params

685B MoE (37B active)

Min H100s

16

DeepSeek R1 Distill Qwen 1.5B

Context

32K

Params

1.5B

Min H100s

1

DeepSeek R1 Distill Qwen 32B

Context

32K

Params

32B

Min H100s

1

DeepSeek V3

Context

32K

Params

685B MoE (37B active)

Min H100s

16

DeepSeek V3 0324

Context

160K

Params

685B MoE (37B active)

Min H100s

16

DeepSeek V4 Pro

Context

1.0M

Params

1.6T MoE (49B active)

Min H100s

16

Dedicated · HuggingFace · Transcription

Distil-Whisper Large V3

Params

756M

Min H100s

1

Gemma 3 27B IT

Dedicated · Google · Chat

Context

128K

Params

27B

Min H100s

1

Gemma 3n E4B Instruct

Dedicated · Google · Chat

Context

32K

Params

~5B (selective)

Min H100s

1

GLM 4.7

Context

198K

Params

~355B MoE (32B active)

Min H100s

8

GLM 4.7 Flash

Context

198K

Params

~106B MoE (lite)

Min H100s

2

GLM 5

Context

195K

Params

744B MoE (40B active)

Min H100s

12

GLM 5.1

Context

198K

Params

754B MoE

Min H100s

8

GLM OCR

Dedicated · Zhipu AI · Vision

Params

~5B (est)

Min H100s

1

I2VGen-XL

Dedicated · Alibaba · Video

Params

~7B

Min H100s

1

IBM Granite 4.1 30B

Dedicated · IBM · Chat

Context

512K

Params

30B

Min H100s

1

IBM Granite 4.1 8B

Dedicated · IBM · Chat

Context

512K

Params

8B

Min H100s

1

LFM2 1.2B

Dedicated · Liquid AI · Chat

Context

32K

Params

1.2B

Min H100s

1

Llama 3.1 8B Base

Context

128K

Params

8B

Min H100s

1

Meta Llama 3 70B Instruct

Context

8K

Params

70B

Min H100s

2

Meta Llama 3 8B

Context

8K

Params

8B

Min H100s

1

Meta Llama 3 8B Instruct

Context

8K

Params

8B

Min H100s

1

MiMo V2.5

Dedicated · Xiaomi · Chat

Context

1.0M

Params

311B MoE (8/256 experts active)

Min H100s

8

MiniMax M3

Dedicated · MiniMax · Chat

Context

192K

Params

427B MoE

Min H100s

8

Mistral 7B Instruct v0.2

Dedicated · Mistral · Chat

Context

128K

Params

7B

Min H100s

1

Mistral Medium 3.5

Dedicated · Mistral · Chat

Context

256K

Params

128B

Min H100s

4

Mistral Small 3.1 24B

Dedicated · Mistral · Chat

Context

128K

Params

24B

Min H100s

1

Nemotron 3 Ultra 550B A55B

Dedicated · NVIDIA · Chat

Context

256K

Params

550B MoE (55B active)

Min H100s

16

Nemotron Nano 9B v2

Dedicated · NVIDIA · Chat

Context

128K

Params

9B

Min H100s

1

Dedicated · NVIDIA · Transcription

Nemotron Speech Streaming 0.6B

Params

600M

Min H100s

1

Nex-N2-Pro

Dedicated · Nex-AGI · Chat

Context

256K

Params

397B MoE (17B active)

Min H100s

8

Dedicated · Nomic AI · Embeddings

Nomic Embed Text v1.5

Context

8K

Params

137M

Min H100s

1

NVIDIA Nemotron 3 Nano 30B A3B FP8

Dedicated · NVIDIA · Chat

Context

256K

Params

30B MoE (3B active)

Min H100s

1

OLMo 3 32B Think

Dedicated · Allen AI · Chat

Context

64K

Params

32B

Min H100s

1

Dedicated · Canopy Labs · Audio

Orpheus TTS

Params

3B

Min H100s

1

Phi 4 Multi Modal Instruct

Dedicated · Microsoft · Chat

Context

128K

Params

14B

Min H100s

1

Qwen2.5 32B Instruct

Context

128K

Params

32B

Min H100s

1

Qwen2.5 Coder 32B Instruct

Context

128K

Params

32B

Min H100s

1

Qwen3 235B A22B 2507

Context

256K

Params

235B MoE (22B active)

Min H100s

4

Qwen3 30B A3B Thinking 2507

Context

33K

Params

30B MoE (3B active)

Min H100s

1

Qwen3 8B

Context

41K

Params

8B

Min H100s

1

Qwen3 Coder 480B A35B

Context

256K

Params

480B MoE (35B active)

Min H100s

8

Qwen3 Coder Next

Context

256K

Params

80B MoE (3B active)

Min H100s

1

Qwen3-Next 80B A3B Instruct

Context

256K

Params

80B MoE (3B active)

Min H100s

2

Qwen3-VL 235B A22B Instruct

Context

256K

Params

235B MoE (22B active)

Min H100s

4

Qwen3.5 397B A17B

Context

256K

Params

397B MoE (17B active)

Min H100s

8

Qwen3.5-4B

Context

256K

Params

4B

Min H100s

1

QwQ 32B

Context

128K

Params

32B

Min H100s

1

SmolLM3-3B

Dedicated · HuggingFace · Chat

Context

128K

Params

3B

Min H100s

1

Dedicated · Stability AI · Audio

Stable Audio Open 1.0

Params

1.2B

Min H100s

1

Dedicated · Stability AI · Image

Stable Diffusion 3.5 Large

Params

8B

Min H100s

1

Step 3.7 Flash

Dedicated · StepFun · Chat

Context

256K

Params

201B MoE

Min H100s

4

Dedicated · Mistral · Transcription

Voxtral Mini 4B Realtime

Context

128K

Params

4B (3.4B LM + 0.97B audio enc)

Min H100s

1

Wan2.2 T2V A14B

Dedicated · Wan-AI · Video

Params

14B

Min H100s

1

Wan2.2 TI2V 5B

Dedicated · Wan-AI · Video

Params

5B

Min H100s

1

Dedicated · OpenAI · Transcription

Whisper Large V3

Params

1.5B

Min H100s

1

Zephyr 7B Beta

Dedicated · HuggingFace · Chat

Context

32K

Params

7B

Min H100s

1