This blueprint continues training from a Checkpoint emitted by the Training Job in the "A Simple Training Job on FlexAI" blueprint, so make sure to complete that blueprint and download its output artifacts before proceeding.
Extract the contents of the output_0.zip file into a directory named fetched_checkpoints:
unzip output_0.zip -d fetched_checkpoints
This fetched_checkpoints directory contains the checkpoints that were saved to /output-checkpoint in the Training Job's runtime environment during execution.
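To see which checkpoints were extracted, a small helper along these lines may be useful. It is only a sketch: the function name list_checkpoints is hypothetical, and it assumes every checkpoint sits in a folder named checkpoint-<step> directly under fetched_checkpoints/output, as the checkpoint-500 folder used in this blueprint does.

```shell
# Hypothetical helper (not part of the FlexAI CLI): print the checkpoint-<step>
# folders found directly under the given directory, ordered by step number.
# Assumes the folder naming checkpoint-<step> seen in this blueprint.
list_checkpoints() (
  for path in "$1"/checkpoint-*/; do
    [ -d "$path" ] || continue              # glob matched nothing, or not a directory
    path="${path%/}"                        # drop the trailing slash
    step="${path##*/checkpoint-}"           # keep only the part after "checkpoint-"
    case "$step" in
      ''|*[!0-9]*) continue ;;              # skip folders whose suffix is not a number
    esac
    printf '%s %s\n' "$step" "$path"
  done | sort -n | cut -d' ' -f2-           # numeric sort by step, then print paths only
)

# Only list when the extracted directory is present.
if [ -d fetched_checkpoints/output ]; then
  list_checkpoints fetched_checkpoints/output
fi
```

The numeric sort matters because a plain alphabetical listing would place checkpoint-1000 before checkpoint-500.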
Let's use the checkpoint (saved at step 500) located in fetched_checkpoints/output/checkpoint-500/.
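Before pushing the checkpoint, it can be worth confirming that the folder really holds the state saved at step 500. The sketch below assumes the training script uses the Hugging Face Trainer, which writes a trainer_state.json file with a global_step field into each checkpoint folder. The function name checkpoint_step is hypothetical, and the sed-based parsing is a shortcut rather than a full JSON parser.

```shell
# Hypothetical helper: print the global_step recorded in a checkpoint folder.
# Assumes a Hugging Face Trainer checkpoint containing trainer_state.json.
checkpoint_step() (
  state="$1/trainer_state.json"
  if [ ! -f "$state" ]; then
    echo "missing $state" >&2
    exit 1                                  # exits the function's subshell only
  fi
  # Extract the first number that follows the "global_step" key.
  step="$(sed -n 's/.*"global_step"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$state" | head -n 1)"
  if [ -z "$step" ]; then
    echo "no global_step found in $state" >&2
    exit 1
  fi
  printf '%s\n' "$step"
)

# Only check when the extracted checkpoint is present.
if [ -d fetched_checkpoints/output/checkpoint-500 ]; then
  checkpoint_step fetched_checkpoints/output/checkpoint-500
fi
```

If the printed value is not 500, the folder name and the saved training state disagree, and it is safer to inspect the checkpoint before resuming from it.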
Create the FlexAI checkpoint to be passed to the next run that will resume the training:
flexai checkpoint push gpt2-ckpt500 --file fetched_checkpoints/output/checkpoint-500
Resume training from your checkpoint with the following command:
flexai training run gpt2training-resume \
--repository-url https://github.com/flexaihq/experiments \
--dataset gpt2-tokenized-wikitext \
--checkpoint gpt2-ckpt500 \
--requirements-path code/causal-language-modeling/requirements.txt \
-- code/causal-language-modeling/train.py \
--do_eval \
--do_train \
--dataset_name wikitext \
--tokenized_dataset_load_dir /input/gpt2-tokenized-wikitext \
--model_name_or_path /input-checkpoint \
--resume_from_checkpoint /input-checkpoint \
--output_dir /output-checkpoint \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--logging_steps 50 \
--save_steps 500 \
--eval_steps 500 \
--eval_strategy steps \
--num_train_epochs 6
Compared to the blueprint that starts training from the base model, note that:
- --checkpoint gpt2-ckpt500 has been added. It refers to the checkpoint created above; the content of the checkpoint-500 folder will be mounted on /input-checkpoint.
- --model_name_or_path has been updated to point to the new checkpoint location.
- Additional Hugging Face arguments are used to resume the training from the checkpoint:
  --resume_from_checkpoint /input-checkpoint
  --num_train_epochs 6