From Training to Deployment in 5 Steps Without Setup Delays

Most AI teams waste 50-70% of their time wrestling with infrastructure instead of building better models. What if you could go from training to production in hours, not weeks? This blueprint shows you exactly how.

The Infrastructure Problem

Traditional AI infrastructure creates friction at every step. FlexAI eliminates these bottlenecks:

  • Minimal setup & infrastructure overhead: No manual node provisioning, GPU management, or CUDA installations. It just works out of the box.
  • Fast debugging via interactive sessions: Launch dev sessions that mirror your training environment for quick testing and bug spotting.
  • Seamless scaling: Parallelize training across many GPUs without rewriting code or managing orchestration.
  • Minimal code changes: Your existing training code runs with little to no modification.
  • Smooth transition to inference: Deploy trained models for inference easily with vLLM integration.
  • Flexibility without abstraction overhead: Not locked into rigid workflows, but you don't build everything from scratch either.

We'll walk through the 5-step process with a real example: fine-tuning a French language model using LlamaFactory on FlexAI.

Step 1: Configure Dataset and Training Parameters

First, define what you want to train. No infrastructure configuration needed—just ML work.

Register your dataset in LlamaFactory's dataset registry (dataset_info.json):

{
  "openhermes-fr": {
    "hf_hub_url": "legmlai/openhermes-fr",
    "columns": {
      "prompt": "prompt",
      "response": "accepted_completion"
    }
  }
}
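The columns mapping tells LlamaFactory which dataset fields hold the prompt and the response. A minimal sketch of the equivalent mapping in plain Python (the field names come from the registry entry above; the example row values are illustrative, not taken from the dataset):

```python
def to_prompt_response(record, prompt_col="prompt", response_col="accepted_completion"):
    """Map a raw dataset row to the prompt/response pair LlamaFactory trains on,
    mirroring the "columns" entry in dataset_info.json."""
    return {"prompt": record[prompt_col], "response": record[response_col]}

# Example row shaped like legmlai/openhermes-fr (illustrative values):
row = {
    "prompt": "Qu'est-ce que l'IA ?",
    "accepted_completion": "L'IA est l'intelligence artificielle.",
}
pair = to_prompt_response(row)
```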

Then configure training parameters in a YAML file:

model_name_or_path: Qwen/Qwen2.5-7B
stage: sft
finetuning_type: full
dataset: openhermes-fr
learning_rate: 1.0e-5
num_train_epochs: 3.0
deepspeed: code/llama-factory/ds_z3_config.json

That's it. Pure ML configuration—no Kubernetes, no Docker, no cloud-specific setup.
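The deepspeed line points at a ZeRO stage 3 configuration, which shards optimizer state, gradients, and parameters across the GPUs so a full 7B fine-tune fits comfortably on one node. An illustrative sketch of such a file (the repository ships its own ds_z3_config.json; these keys and "auto" values follow DeepSpeed's Hugging Face integration, and the exact contents here are an assumption):

```
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": "auto" },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```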

Step 2: Store Secrets

Securely store access credentials:

flexai secret create HF_TOKEN

Optional: Pre-Fetch Large Models

For faster training, pre-fetch models to storage:

flexai storage create HF-STORAGE --provider huggingface --hf-token-name HF_TOKEN
flexai checkpoint push qwen25-7b --storage-provider HF-STORAGE --source-path Qwen/Qwen2.5-7B

This eliminates model download time on every training run.

Step 3: Launch Training with One Command

All infrastructure complexity disappears into a single command:

flexai training run french-qwen-sft \
  --accels 8 --nodes 1 \
  --repository-url https://github.com/flexaihq/blueprints \
  --secret HF_TOKEN=HF_TOKEN \
  --requirements-path code/llama-factory/requirements.txt \
  -- llamafactory-cli train code/llama-factory/qwen25-7B_sft.yaml

Note: Remember to specify an output directory (e.g., --output_dir /output-checkpoint) in your YAML or command so your training results and checkpoints are stored in a retrievable location.
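In the LlamaFactory YAML, that looks like the fragment below (the /output-checkpoint path follows the example in the note above; the save and logging intervals are illustrative assumptions):

```
output_dir: /output-checkpoint   # where retrievable checkpoints are collected
logging_steps: 10
save_steps: 500
```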

For more details, see the FlexAI training documentation.

What happens automatically:

  • 8 H100 GPUs provisioned on a single node
  • Code repository pulled, dependencies installed
  • Multi-GPU coordination configured
  • Training starts immediately

The command takes seconds. Training takes hours, but you're training—not configuring infrastructure.

Step 3.5 (Optional): Debug with Interactive Sessions

Need to test your training setup before committing to a full run? Launch an interactive session that mirrors your training environment:

flexai training debug-ssh \
  --repository-url https://github.com/flexaihq/nanoGPT \
  --vscode

This opens an SSH session (or VSCode window) where you can edit scripts, run commands, and debug issues in the exact training environment. Perfect for testing data pipelines, verifying configurations, or debugging distributed training before launching your full job.

Step 4: Monitor Progress and Get Checkpoints

Check training status:

flexai training inspect french-qwen-sft

View and fetch training logs:

# Stream logs in real-time
flexai training logs french-qwen-sft

For advanced monitoring and visualization, access Grafana dashboards to see metrics, resource utilization, and training progress. You can find the Grafana link in your FlexAI console or training job details.

Once training completes, grab your checkpoints:

flexai training checkpoints french-qwen-sft

Look for checkpoints marked INFERENCE READY = true.

Step 5: Deploy to Production

Deploy your checkpoint as a production endpoint with one command:

flexai inference serve french-endpoint --checkpoint <CHECKPOINT_ID>

This automatically handles:

  • Model loading and optimization
  • API exposure and autoscaling

Your model is now live. Same environment for training and inference means zero compatibility issues.

Test Your Model

curl -X POST "https://your-endpoint-url/v1/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "prompt": "Expliquez l'\''intelligence artificielle:",
    "max_tokens": 200
  }'
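The same call from Python, using only the standard library. The endpoint URL and API key are placeholders, and the /v1/completions route and response shape assume the OpenAI-compatible server that vLLM exposes:

```python
import json
import urllib.request

def build_completion_request(base_url, api_key, prompt, max_tokens=200):
    """Assemble a POST request for the OpenAI-compatible /v1/completions route."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,  # presence of data makes this a POST
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def complete(base_url, api_key, prompt, max_tokens=200):
    """Send the request and return the generated text from the first choice."""
    req = build_completion_request(base_url, api_key, prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

req = build_completion_request(
    "https://your-endpoint-url", "YOUR_API_KEY",  # placeholders
    "Expliquez l'intelligence artificielle:", max_tokens=200,
)
```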

Real Results and Impact

Let's look at concrete results from our French language fine-tuning example, then examine the broader impact this approach delivers.

Before and After: French Model Quality

We asked both models: "Qui a gagné la Coupe du monde 2018?" (Who won the 2018 World Cup?)

Base Model

La Coupe du monde de football 2018 a été remportée par la Russie.

(Translation: "The 2018 FIFA World Cup was won by Russia.")

Issues: Incorrect answer (Russia instead of France)

Fine-tuned Model

La France a remporté la Coupe du monde de football 2018, en battant le Croatie lors de la finale disputée à Moscou le 15 juillet 2018.

(Translation: "France won the 2018 FIFA World Cup, beating Croatia in the final played in Moscow on July 15, 2018.")

Improvements: Correct answer, fluent French, accurate details (opponent, venue, date), proper structure

Time to Production

  • Traditional: 2-4 weeks from training to deployment
  • This blueprint: 4-6 hours for the complete journey

10-50x faster deployment velocity

Resource Efficiency

  • GPU utilization: 90%+ vs. 30-50% typical
  • Cost savings: Teams save $87K/year on average
  • Flexibility: No vendor lock-in, switch GPUs without code changes

Technical Details

Resource Requirements (7B Model)

  • Setup: 1 node, 8 H100 GPUs
  • Memory: ~200GB GPU memory
  • Training Time: 2-4 hours for 3 epochs
  • Storage: ~30GB for checkpoints
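The ~200GB figure is consistent with a quick back-of-envelope for full bf16 fine-tuning with Adam: weights and gradients in bf16, plus fp32 master weights and two fp32 optimizer moments. A rough sketch, not an exact accounting (activations, KV buffers, and framework overhead make up the rest, and ZeRO-3 shards these states across the 8 GPUs):

```python
params = 7e9  # approximate parameter count of a 7B model

GB = 1024**3
bytes_per_param = (
    2    # bf16 weights
    + 2  # bf16 gradients
    + 4  # fp32 master weights (mixed-precision copy)
    + 8  # Adam moments m and v, fp32
)
state_gb = params * bytes_per_param / GB
print(f"Model/optimizer state: ~{state_gb:.0f} GB before activations")
print(f"Sharded across 8 GPUs (ZeRO-3): ~{state_gb / 8:.0f} GB per GPU")
```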

Scaling Options

  • Faster training: Scale to 2 nodes (16 H100s)
  • Different hardware: Switch between NVIDIA, AMD

Why This Works

Traditional infrastructure forces you to become an expert in Kubernetes, Docker, GPU drivers, and cloud-specific quirks. FlexAI eliminates that entire layer.

You focus on three things:

  • What to train (model, dataset, hyperparameters)
  • How much compute (nodes, accelerators)
  • Where to deploy (production endpoint)

Everything else happens automatically.

This isn't just faster deployment. It's infrastructure that adapts to your needs rather than forcing you to adapt to its constraints.

Bonus: Validate Your Model with Comprehensive Evaluation

After training your model, you need to know how well it actually performs. FlexAI makes it just as easy to run comprehensive evaluations using the LM Evaluation Harness, a framework that tests your model across 300+ standardized benchmarks.

Evaluate your fine-tuned French model on key benchmarks with a single command:

flexai training run evaluate-french-model \
  --accels 4 --nodes 1 \
  --repository-url https://github.com/flexaihq/blueprints \
  --checkpoint <YOUR_CHECKPOINT_ID> \
  --requirements-path code/lm-evaluation-harness/requirements.txt \
  -- lm_eval \
      --model hf \
      --model_args pretrained=/input-checkpoint \
      --tasks hellaswag,arc_challenge,mmlu \
      --device cuda \
      --batch_size 8 \
      --output_path /output-checkpoint/eval_results.json

  • Test across popular benchmarks: HellaSwag (commonsense reasoning), ARC-Challenge (science questions), and MMLU (multi-task understanding) in the command above, plus hundreds more such as GSM8K (math) and HumanEval (code generation)
  • Get standardized metrics comparable to published research
  • Same simple workflow: one command, automatic infrastructure, downloadable results
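Once the job finishes, eval_results.json can be summarized in a few lines. The harness nests per-task metrics under a results key, but exact metric names (e.g. acc,none) vary by task and harness version, so treat this as a sketch rather than a guaranteed schema:

```python
def summarize(results_payload):
    """Pick one accuracy-like metric per task from an lm-eval results payload.

    Pass in json.load(open("eval_results.json")) from the downloaded output.
    """
    summary = {}
    for task, metrics in results_payload["results"].items():
        for name, value in metrics.items():
            # Keep the first numeric metric whose name starts with "acc".
            if isinstance(value, float) and name.startswith("acc"):
                summary[task] = (name, value)
                break
    return summary

# Illustrative payload shaped like an lm-eval output file (values made up):
sample = {"results": {"hellaswag": {"acc,none": 0.61, "acc_norm,none": 0.79}}}
```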

For a complete guide to model evaluation, including advanced configurations, custom tasks, and interpreting results, check out the LM Evaluation Harness blueprint (lm-evaluation-harness).


Get Started Today

To celebrate this launch, we're offering €100 in starter credits for first-time users!

Get Started Now