Most AI teams waste 50-70% of their time wrestling with infrastructure instead of building better models. What if you could go from training to production in hours, not weeks? This blueprint shows you exactly how.
Traditional AI infrastructure creates friction at every step: provisioning GPUs, building containers, managing drivers and dependencies. This blueprint shows how FlexAI eliminates those bottlenecks through a 5-step process. We'll walk through a real example: fine-tuning a French language model using LlamaFactory on FlexAI.
First, define what you want to train. No infrastructure configuration needed—just ML work.
Register your dataset in LlamaFactory's dataset registry (dataset_info.json):
{ "openhermes-fr": { "hf_hub_url": "legmlai/openhermes-fr", "columns": { "prompt": "prompt", "response": "accepted_completion" } }}Then configure training parameters in a YAML file:
```yaml
model_name_or_path: Qwen/Qwen2.5-7B
stage: sft
finetuning_type: full
dataset: openhermes-fr
learning_rate: 1.0e-5
num_train_epochs: 3.0
deepspeed: code/llama-factory/ds_z3_config.json
```

That's it. Pure ML configuration—no Kubernetes, no Docker, no cloud-specific setup.
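Before launching, it can be worth sanity-checking that the column names in dataset_info.json match what the dataset actually exposes on the Hub. A minimal local sketch using the Hugging Face datasets library (this assumes the dataset has a train split; it's a convenience check, not part of the FlexAI workflow):

```python
from datasets import load_dataset

# Stream a single example from the Hub to avoid downloading the full dataset
ds = load_dataset("legmlai/openhermes-fr", split="train", streaming=True)
example = next(iter(ds))

# The keys here should match the "columns" mapping in dataset_info.json
print(example.keys())           # expect 'prompt' and 'accepted_completion' among them
print(example["prompt"][:200])  # preview the French instruction text
```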
Securely store access credentials:
```bash
flexai secret create HF_TOKEN
```

For faster training, pre-fetch models to storage:
```bash
flexai storage create HF-STORAGE --provider huggingface --hf-token-name HF_TOKEN
flexai checkpoint push qwen25-7b --storage-provider HF-STORAGE --source-path Qwen/Qwen2.5-7B
```

This eliminates model download time on every training run.
All infrastructure complexity disappears into a single command:
```bash
flexai training run french-qwen-sft \
  --accels 8 --nodes 1 \
  --repository-url https://github.com/flexaihq/blueprints \
  --secret HF_TOKEN=HF_TOKEN \
  --requirements-path code/llama-factory/requirements.txt \
  -- llamafactory-cli train code/llama-factory/qwen25-7B_sft.yaml
```

Note: Remember to specify an output directory (e.g., --output_dir /output-checkpoint) inside your YAML or command.
This ensures your training results and checkpoints are stored in a retrievable location.
For more details, see the FlexAI training documentation.
What happens automatically:

- Accelerators are provisioned (--accels 8 across --nodes 1)
- Your repository is cloned and the dependencies in requirements.txt are installed
- Secrets such as HF_TOKEN are made available to the job
- The job is scheduled and launched

The command itself takes seconds. Training takes hours, but you're training—not configuring infrastructure.
Need to test your training setup before committing to a full run? Launch an interactive session that mirrors your training environment:
```bash
flexai training debug-ssh \
  --repository-url https://github.com/flexaihq/nanoGPT \
  --vscode
```

This opens an SSH session (or VS Code window) where you can edit scripts, run commands, and debug issues in the exact training environment. It's ideal for testing data pipelines, verifying configurations, or debugging distributed training before launching your full job.
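For example, once inside the session, a quick PyTorch check (generic, not FlexAI-specific) confirms the requested accelerators are actually visible before you commit to a multi-hour run:

```python
import torch

# Confirm the session actually sees the accelerators you requested
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Visible GPUs:   {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    print(f"  [{i}] {torch.cuda.get_device_name(i)}")
```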
Check training status:
```bash
flexai training inspect french-qwen-sft
```

View and fetch training logs:
```bash
# Stream logs in real-time
flexai training logs french-qwen-sft
```

For advanced monitoring and visualization, access Grafana dashboards to see metrics, resource utilization, and training progress. You can find the Grafana link in your FlexAI console or in the training job details.
Once training completes, grab your checkpoints:
```bash
flexai training checkpoints french-qwen-sft
```

Look for checkpoints marked INFERENCE READY = true.
Deploy your checkpoint as a production endpoint with one command:
```bash
flexai inference serve french-endpoint --checkpoint <CHECKPOINT_ID>
```

This automatically handles provisioning the serving infrastructure, loading your checkpoint, and exposing an authenticated HTTP endpoint. Your model is now live, and because training and inference share the same environment, there are zero compatibility issues. Test it with a quick request:
```bash
curl -X POST "https://your-endpoint-url/v1/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "prompt": "Expliquez l'\''intelligence artificielle:",
    "max_tokens": 200
  }'
```
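The same request from Python, sketched with the requests library; the endpoint URL and API key are placeholders carried over from the curl example, and the /v1/completions path is taken from it as well:

```python
import requests

ENDPOINT_URL = "https://your-endpoint-url/v1/completions"  # placeholder from the curl example
API_KEY = "YOUR_API_KEY"                                   # placeholder

response = requests.post(
    ENDPOINT_URL,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "prompt": "Expliquez l'intelligence artificielle:",
        "max_tokens": 200,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```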
Let's look at concrete results from our French language fine-tuning example, then examine the broader impact this approach delivers.

We asked both models: "Qui a gagné la Coupe du monde 2018?" (Who won the 2018 World Cup?)
Base model: "La Coupe du monde de football 2018 a été remportée par la Russie." ("The 2018 football World Cup was won by Russia.")

Issues: incorrect answer (Russia instead of France).
Fine-tuned model: "La France a remporté la Coupe du monde de football 2018, en battant la Croatie lors de la finale disputée à Moscou le 15 juillet 2018." ("France won the 2018 football World Cup, beating Croatia in the final played in Moscow on 15 July 2018.")

Improvements: correct answer, excellent grammar, accurate details, proper structure.
The result: 10-50x faster deployment velocity, going from training to production in hours instead of weeks.
Traditional infrastructure forces you to become an expert in Kubernetes, Docker, GPU drivers, and cloud-specific quirks. FlexAI eliminates that entire layer.
You focus on three things: your model, your data, and your training configuration. Everything else happens automatically.
This isn't just faster deployment. It's infrastructure that adapts to your needs rather than forcing you to adapt to its constraints.
After training your model, you need to know how well it actually performs. FlexAI makes it just as easy to run comprehensive evaluations using the LM Evaluation Harness, a framework that tests your model across 300+ standardized benchmarks.
Evaluate your fine-tuned French model on key benchmarks with a single command:
```bash
flexai training run evaluate-french-model \
  --accels 4 --nodes 1 \
  --repository-url https://github.com/flexaihq/blueprints \
  --checkpoint <YOUR_CHECKPOINT_ID> \
  --requirements-path code/lm-evaluation-harness/requirements.txt \
  -- lm_eval \
    --model hf \
    --model_args pretrained=/input-checkpoint \
    --tasks hellaswag,arc_challenge,mmlu \
    --device cuda \
    --batch_size 8 \
    --output_path /output-checkpoint/eval_results.json
```

For a complete guide to model evaluation, including advanced configurations, custom tasks, and interpreting results, check out the LM Evaluation Harness blueprint.
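Once the evaluation job finishes, the results file can be summarized with a few lines of Python. A minimal sketch, assuming the harness's standard JSON layout with a top-level results key (exact metric names vary by task and harness version):

```python
import json

# Load the results file written by lm_eval via --output_path
with open("eval_results.json") as f:
    results = json.load(f)

# Each task maps to a dict of metric values, e.g. "acc,none" or "acc_norm,none"
for task, metrics in results["results"].items():
    for name, value in metrics.items():
        if isinstance(value, float):
            print(f"{task:15s} {name:20s} {value:.4f}")
```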

To celebrate this launch, we're offering €100 in starter credits for first-time users!
Get Started Now