General Tips

While debugging, it’s helpful to simplify your test scenario as much as possible. Here are some tips for doing so:

Make sure you are using the latest version of axolotl

This project changes often and bugs get fixed fast. Check your git branch and make sure you have pulled the latest changes from main.

Eliminate concurrency

Restrict the number of processes to 1 for both training and data pre-processing (see the combined sketch after this list):

  • Set CUDA_VISIBLE_DEVICES to a single GPU, e.g. export CUDA_VISIBLE_DEVICES=0.

  • Set dataset_processes: 1 in your axolotl config or run the training command with --dataset_processes=1.
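A minimal sketch combining both settings, assuming the usual accelerate launch -m axolotl.cli.train entry point (the shell lines are shown as comments since they are not part of the YAML itself):

```yaml
# In the shell, before launching, pin training to a single GPU:
#   export CUDA_VISIBLE_DEVICES=0
#   accelerate launch -m axolotl.cli.train config.yml --dataset_processes=1

# Or set the pre-processing worker count directly in the axolotl config:
dataset_processes: 1
```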

Use a small dataset

Construct or use a small dataset from HF Hub.

When using a small dataset, you will often have to set sample_packing: false and eval_sample_packing: false to avoid errors (see the sketch below).
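For example, a config fragment along these lines, where mhenrichsen/alpaca_2k_test is just one illustrative small Hub dataset and not a requirement:

```yaml
datasets:
  - path: mhenrichsen/alpaca_2k_test   # small (~2k row) alpaca-style dataset from the HF Hub
    type: alpaca
sample_packing: false
eval_sample_packing: false
```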

If you are in a pinch and don’t have time to construct a small dataset but want to use one from the Huggingface Hub, you can shard the data (this will still tokenize the entire dataset, but will only use a fraction of the data for training).

For example, to shard the dataset into 20 pieces, add the following to your axolotl config:

```yaml
dataset:
  ...
  shards: 20
```

Use a small model

A good example of a small model is TinyLlama/TinyLlama-1.1B-Chat-v1.0.
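As a sketch, pointing the config at that model; base_model is the standard axolotl key, while the model_type and tokenizer_type values shown are the usual choices for Llama-architecture checkpoints and should be adjusted for other architectures:

```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
model_type: LlamaForCausalLM    # TinyLlama uses the Llama architecture
tokenizer_type: AutoTokenizer
```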

Minimize iteration time

Make sure the training loop finishes as fast as possible with the following settings (shown together in the sketch after the list):

  • micro_batch_size: 1

  • max_steps: 1

  • val_set_size: 0
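Put together, the relevant fragment of the config would look roughly like this:

```yaml
micro_batch_size: 1   # one sample per step
max_steps: 1          # stop after a single optimizer step
val_set_size: 0       # skip evaluation entirely
```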

Clear Caches

Axolotl caches certain steps and so does the underlying HuggingFace trainer.

You may want to clear some of these caches when debugging.

  • Data pre-processing: When debugging data pre-processing, which includes prompt template formation, you may want to delete the directory set in dataset_prepared_path: in your axolotl config. If you didn’t set this value, the default is last_run_prepared.

  • Hugging Face Hub: If you are debugging data pre-processing, you should also clear the relevant HuggingFace cache by deleting the appropriate ~/.cache/huggingface/datasets/... folder(s).

  • The recommended approach is to redirect all outputs and caches to a temporary folder and delete selected subfolders before each run. This is demonstrated in the example configuration below.
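A minimal sketch of such a configuration, assuming a scratch folder called temp_debug/ (the folder name and the HF cache environment variables shown in comments are illustrative choices, not requirements):

```yaml
# In the shell, redirect the HuggingFace caches to the same scratch folder:
#   export HF_HOME=temp_debug/.hf-cache
#   export HF_DATASETS_CACHE=temp_debug/.hf-cache/datasets

# axolotl config: keep pre-processed data and model outputs under temp_debug/
dataset_prepared_path: temp_debug/axolotl_outputs/data
output_dir: temp_debug/axolotl_outputs/model
```

Deleting temp_debug/ (or selected subfolders such as axolotl_outputs/data) between runs then gives you a clean slate without touching your global ~/.cache/huggingface directory.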
