Resuming From Checkpoint
To resume from a checkpoint, simply add the `resume` and `resume_checkpoint` options to any of your training commands.
conda activate mistral
cd mistral
python train.py --config conf/mistral-micro.yaml \
    --training_arguments.fp16 true \
    --training_arguments.per_device_train_batch_size 2 \
    --run_id resume-demo \
    --resume true \
    --resume_checkpoint /path/to/checkpoint
When resuming from a checkpoint, training should pick up exactly where it left off: the same learning rate, the same point in the learning rate schedule, the same position in the data, and so on.
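Under the hood, mistral builds on the Hugging Face `Trainer`, whose `resume_from_checkpoint` argument reloads the optimizer, learning-rate scheduler, and RNG states saved alongside the weights in each checkpoint directory. The sketch below is a minimal illustration of that mechanism, not mistral's actual entry point; the tiny GPT-2 config, toy dataset, and `runs/resume-demo` paths are stand-ins chosen purely for demonstration.

```python
# Minimal sketch of checkpoint resumption via the Hugging Face Trainer.
# The model, dataset, and paths below are illustrative stand-ins,
# not mistral's actual training setup.
import torch
from transformers import GPT2Config, GPT2LMHeadModel, Trainer, TrainingArguments


class ToyDataset(torch.utils.data.Dataset):
    """Random token ids standing in for a real tokenized corpus."""

    def __len__(self):
        return 64

    def __getitem__(self, idx):
        ids = torch.randint(0, 50257, (32,))
        return {"input_ids": ids, "labels": ids}


model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64))

args = TrainingArguments(
    output_dir="runs/resume-demo",  # hypothetical run directory
    per_device_train_batch_size=2,
    max_steps=16,
    save_steps=8,   # checkpoints land in output_dir/checkpoint-<step>
    report_to=[],   # silence experiment trackers for this toy run
)

trainer = Trainer(model=model, args=args, train_dataset=ToyDataset())

# First run: train and save checkpoints (optimizer, scheduler, and RNG
# states are written alongside the weights in each checkpoint directory).
trainer.train()

# Later run: pointing at a saved checkpoint restores that state, so
# training picks up the LR schedule and data position where it left off.
trainer.train(resume_from_checkpoint="runs/resume-demo/checkpoint-8")
```

The `--resume true --resume_checkpoint ...` flags shown above wire this same mechanism into mistral's training script, so in practice you only pass the checkpoint path on the command line.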