This blueprint demonstrates how to fine-tune a language model using LlamaFactory on FlexAI. As an example, we'll use the Llama-3-1B model with LlamaFactory's identity and alpaca_en_demo datasets, but you can adapt this guide for other models and datasets.
As you'll see below, you only need to pass your LlamaFactory configuration YAML.
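For reference, a minimal LlamaFactory SFT configuration of this kind might look like the sketch below. It follows the layout of LlamaFactory's example LoRA SFT configs; the model ID, output path, and hyperparameters are illustrative assumptions, so substitute the values used in the actual YAML files referenced later in this guide.

# Sketch of a LlamaFactory SFT config (illustrative values, not the exact repo file)
# model
model_name_or_path: meta-llama/Llama-3.2-1B-Instruct  # assumption: substitute the model ID from llama3_sft.yaml
# method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
# dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
preprocessing_num_workers: 16
# output
output_dir: saves/llama3-1b/lora/sft
logging_steps: 10
save_steps: 500
# train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true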
Note: If you haven't already connected FlexAI to GitHub, run flexai code-registry connect to set up a code registry connection. This allows FlexAI to pull repositories directly using the --repository-url (-u) flag in training commands.
To authenticate with your HuggingFace account from within your code, you will use your HuggingFace Token.
Use the flexai secret create command to store your HuggingFace Token as a secret. Replace <HF_AUTH_TOKEN_SECRET_NAME> with your desired name for the secret:
flexai secret create <HF_AUTH_TOKEN_SECRET_NAME>

Then paste your HuggingFace Token value when prompted.
To speed up training and avoid downloading large models at runtime, you can pre-fetch your HuggingFace model to FlexAI storage. For example, to pre-fetch the Qwen/Qwen2.5-72B model:
flexai storage create HF-STORAGE --provider huggingface --hf-token-name <HF_AUTH_TOKEN_SECRET_NAME>
flexai checkpoint push qwen25-72b --storage-provider HF-STORAGE --source-path Qwen/Qwen2.5-72B

During your training run, you can use the pre-fetched model by adding the following argument to your training command:
--checkpoint qwen25-72b

The qwen25-72B_sft.yaml file has been adapted from this example.
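Broadly, such a config resembles the Llama-3 sketch earlier in this guide, with the model, template, and parallelism settings changed for a 72B model. The fields below are illustrative assumptions rather than the exact contents of qwen25-72B_sft.yaml:

# Illustrative differences for a multi-node Qwen2.5-72B SFT run (not the exact repo config)
model_name_or_path: Qwen/Qwen2.5-72B
template: qwen
finetuning_type: full            # assumption: large runs commonly use full fine-tuning with ZeRO-3
deepspeed: ds_z3_config.json     # assumption: path to a DeepSpeed ZeRO-3 config in the repository
per_device_train_batch_size: 1
gradient_accumulation_steps: 2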
To launch the training job:
flexai training run llamafactory-sft-qwen-72B \
--accels 8 --nodes 4 \
--repository-url https://github.com/flexaihq/experiments \
--env FORCE_TORCHRUN=1 \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/llama-factory/requirements.txt \
--runtime nvidia-25.06 \
-- /layers/flexai_pip-install/packages/bin/llamafactory-cli train code/llama-factory/qwen25-72B_sft.yaml

To take advantage of the model pre-fetching performed in the Optional: Pre-fetch the Model section, use:
flexai training run llamafactory-sft-qwen-72B-prefetched \
--accels 8 --nodes 4 \
--repository-url https://github.com/flexaihq/experiments \
--checkpoint qwen25-72b \
--env FORCE_TORCHRUN=1 \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/llama-factory/requirements.txt \
--runtime nvidia-25.06 \
-- /layers/flexai_pip-install/packages/bin/llamafactory-cli train code/llama-factory/qwen25-prefetched_sft.yaml
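The qwen25-prefetched_sft.yaml variant presumably differs mainly in where the model weights are loaded from: instead of the Hugging Face Hub ID, model_name_or_path would point at the directory where FlexAI mounts the pre-fetched checkpoint. The path below is a hypothetical placeholder, not a documented mount point:

# Hypothetical sketch: load weights from the checkpoint mounted via --checkpoint qwen25-72b
model_name_or_path: /path/to/mounted/qwen25-72b  # placeholder -- use the actual mount path from the repo's config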
The llama3_sft.yaml file has been adapted from this example.

To launch the training job:
flexai training run llamafactory-sft-llama3 \
--accels 8 --nodes 4 \
--repository-url https://github.com/flexaihq/experiments \
--env FORCE_TORCHRUN=1 \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/llama-factory/requirements.txt \
--runtime nvidia-25.06 \
-- /layers/flexai_pip-install/packages/bin/llamafactory-cli train code/llama-factory/llama3_sft.yaml

You can speed up training and improve reproducibility by prefetching your dataset to FlexAI storage. This makes the dataset available as a FlexAI dataset object, allowing you to reference it directly in your training jobs.
For example, to prefetch the Hugging Face legmlai/openhermes-fr dataset using the storage provider created earlier:
flexai dataset push openhermes-fr --storage-provider HF-STORAGE --source-path legmlai/openhermes-fr

Once the dataset is uploaded, you can launch a training job that uses both the prefetched model and dataset:
flexai training run llamafactory-sft-qwen-72B-prefetched-all \
--accels 8 --nodes 4 \
--repository-url https://github.com/flexaihq/experiments \
--checkpoint qwen25-72b \
--dataset openhermes-fr \
--env FORCE_TORCHRUN=1 \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/llama-factory/requirements.txt \
--runtime nvidia-25.06 \
-- /layers/flexai_pip-install/packages/bin/llamafactory-cli train code/llama-factory/qwen25-prefetched_all_sft.yaml
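Inside qwen25-prefetched_all_sft.yaml, the dataset section would then point LlamaFactory at the mounted copy of the data instead of the Hugging Face Hub. The sketch below is an assumption about how that might look: LlamaFactory resolves custom datasets through a dataset_info.json file in the directory given by dataset_dir, and the mount path shown is a placeholder rather than a documented location.

# Hypothetical dataset section for the prefetched run
dataset_dir: /path/to/mounted/openhermes-fr   # placeholder -- directory containing a dataset_info.json describing the files
dataset: openhermes_fr                        # key defined in that dataset_info.json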