General Tips
While debugging it’s helpful to simplify your test scenario as much as possible. Here are some tips for doing so:
Make sure you are using the latest version of axolotl
This project changes often and bugs get fixed fast. Check your git branch and make sure you have pulled the latest changes from main.
Eliminate concurrency
Restrict the number of processes to 1 for both training and data pre-processing:
Set CUDA_VISIBLE_DEVICES to a single GPU, e.g. export CUDA_VISIBLE_DEVICES=0.
Set dataset_processes: 1 in your axolotl config, or run the training command with --dataset_processes=1.
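If you launch training from a Python wrapper rather than a shell, the same GPU restriction can be applied to the child process environment. The following is a minimal sketch; the helper name single_gpu_env is hypothetical and not part of axolotl.

```python
import os


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of an environment mapping with only one GPU visible.

    Hypothetical helper for illustration: it copies the given mapping
    (or os.environ) and sets CUDA_VISIBLE_DEVICES to a single index.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env
```

The returned mapping can be passed as the env argument of subprocess.run together with whatever training command you normally use.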
Use a small dataset
Construct or use a small dataset from HF Hub.
When using a small dataset, you will often have to set sample_packing: False and eval_sample_packing: False to avoid errors.
If you are in a pinch and don't have time to construct a small dataset but want to use one from the Hugging Face Hub, you can shard the data. This will still tokenize the entire dataset, but will only use a fraction of the data for training.
For example, to shard the dataset into 20 pieces, add the following to your axolotl config:
dataset:
    ...
    shards: 20
Use a small model
A good example of a small model is TinyLlama/TinyLlama-1.1B-Chat-v1.0.
Minimize iteration time
Make sure the training loop finishes as fast as possible with these settings:
micro_batch_size: 1
max_steps: 1
val_set_size: 0
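The settings suggested so far can be collected into one block of overrides to paste into a debug copy of your config. This is only a sketch: the keys and values come from the tips above, while DEBUG_OVERRIDES and render_yaml are hypothetical names, and the renderer handles only flat integer and boolean values.

```python
# Debug-oriented overrides taken from the tips in this section.
DEBUG_OVERRIDES = {
    "dataset_processes": 1,
    "sample_packing": False,
    "eval_sample_packing": False,
    "micro_batch_size": 1,
    "max_steps": 1,
    "val_set_size": 0,
}


def render_yaml(overrides):
    """Render a flat dict of ints and bools as YAML 'key: value' lines."""
    lines = []
    for key, value in overrides.items():
        # Check bool first: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        lines.append(f"{key}: {text}")
    return "\n".join(lines) + "\n"
```

Calling print(render_yaml(DEBUG_OVERRIDES)) emits the lines to append to your config.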
Clear Caches
Axolotl caches certain steps and so does the underlying HuggingFace trainer.
You may want to clear some of these caches when debugging.
Data pre-processing: When debugging data pre-processing, which includes prompt template formation, you may want to delete the directory set in dataset_prepared_path: in your axolotl config. If you didn't set this value, the default is last_run_prepared.
HuggingFace Hub: If you are debugging data pre-processing, you should clear the relevant HuggingFace cache by deleting the appropriate ~/.cache/huggingface/datasets/... folder(s).
The recommended approach is to redirect all outputs and caches to a temporary folder and delete selected subfolders before each run. This is demonstrated in the example configuration below.
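Deleting the selected folders before each run can be scripted. The sketch below assumes you pass the directories explicitly; clear_dirs is a hypothetical helper, not an axolotl function, and it silently skips paths that do not exist.

```python
import shutil
from pathlib import Path


def clear_dirs(paths):
    """Delete each existing directory in `paths` and return those removed."""
    removed = []
    for raw in paths:
        # Expand a leading "~" so home-relative cache paths work.
        path = Path(raw).expanduser()
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

For example, clear_dirs(["last_run_prepared"]) removes the default prepared-dataset directory relative to the current working directory. For the HuggingFace cache, pass the specific dataset subfolder you want to drop rather than the whole cache root.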