Llama3 - Data and Precision

These configurations allow you to control various aspects of the training process, such as data handling, precision settings, and hardware utilisation.

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

Let's go through each configuration and explain its purpose and implications:

train_on_inputs: false

  • This configuration determines whether to include or mask out the human's prompt from the training labels.

  • When set to false, the model will not train on the human's prompt, meaning that the prompt will be excluded from the training labels.

  • In other words, the model will only learn from the desired output or response and not from the input prompt.

  • This is useful when you want the model to generate responses based on the given prompts without explicitly learning to reproduce the prompts themselves.

  • By masking out the human's prompt, the model can focus on learning the mapping between the prompt and the desired output.
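
Below is a minimal sketch of what this masking looks like at the token level. The token ids are made up purely for illustration; the key point is that positions labelled -100 are ignored by the cross-entropy loss, which is the standard way Hugging Face-based trainers exclude prompt tokens from training.

# Minimal sketch of prompt masking with train_on_inputs: false.
# The token ids below are hypothetical, chosen only for illustration.
prompt_ids   = [128000, 9906, 11, 1268, 527, 499, 30]   # hypothetical prompt tokens
response_ids = [40, 2846, 1664, 11, 9901, 499, 13]      # hypothetical response tokens

input_ids = prompt_ids + response_ids

# Positions labelled -100 are ignored by the loss,
# so the model only learns to predict the response tokens.
labels = [-100] * len(prompt_ids) + response_ids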

group_by_length: false

  • This configuration controls whether to group similarly sized data together to minimise padding during training.

  • When set to false, the data will not be grouped by length and will be processed in the order it appears in the dataset.

  • Grouping data by length can be beneficial when working with variable-length sequences, as it helps to reduce the amount of padding needed.

  • Padding is the process of adding dummy tokens to shorter sequences to match the length of the longest sequence in a batch.

  • By grouping similarly sized data together, you can minimise the amount of unnecessary padding, which can lead to more efficient memory usage and faster training.

  • However, enabling group_by_length may slow down data loading and preprocessing, because the length of every sample must be computed and the dataset sorted before training begins.

  • It's also worth noting that when group_by_length is enabled, the training loss may exhibit an oscillating pattern due to the reordering of the data.
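
A simplified sketch of the idea is shown below: sorting samples by token count before batching keeps similarly sized sequences together and reduces padding. Real samplers shuffle within length buckets rather than fully sorting, but the effect on padding is the same.

# Sketch: compare padding when batching in dataset order vs. grouped by length.
samples = [[0] * n for n in (12, 87, 15, 90, 14, 85)]   # hypothetical tokenised samples
batch_size = 2

def padding_needed(order):
    total = 0
    for i in range(0, len(order), batch_size):
        batch = order[i:i + batch_size]
        longest = max(len(s) for s in batch)
        total += sum(longest - len(s) for s in batch)
    return total

print("padding, dataset order :", padding_needed(samples))
print("padding, grouped by len:", padding_needed(sorted(samples, key=len)))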

bf16: auto

  • This configuration relates to the use of BFloat16 (BF16) precision during training.

  • BFloat16 is a 16-bit floating-point format that keeps FP32's 8-bit exponent, giving it a wider dynamic range than FP16 (Half-precision) at the cost of fewer mantissa bits.

  • When set to auto, the framework will automatically determine whether to use BF16 based on the available hardware and software support.

  • If the hardware (e.g., GPU) and software (e.g., PyTorch version) support BF16, it will be used for training.

  • BF16 can provide a good balance between computational efficiency and numeric precision, potentially leading to faster training times while maintaining model accuracy.

  • However, the actual performance gains may vary depending on the specific hardware and model architecture.
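
As a rough sketch of what an "auto" setting resolves to, a trainer can probe the hardware and fall back to FP16 when BF16 is unavailable. This illustrates the decision, not Axolotl's exact internal logic.

# Sketch: use BF16 only when the GPU and PyTorch build support it.
import torch

use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16
print(f"Selected 16-bit training dtype: {dtype}")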

fp16:

  • This setting relates to the use of FP16 (Half-precision) during training; in the configuration above, it is left empty.

  • FP16 is a 16-bit floating-point format that offers reduced precision compared to the standard FP32 (Single-precision) format.

  • Using FP16 can help to reduce memory usage and accelerate training on certain hardware (e.g., NVIDIA GPUs with Tensor Cores).

  • However, the empty value suggests that FP16 is not being explicitly enabled or configured in this case.
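
For reference, the sketch below shows what FP16 mixed precision involves at the PyTorch level if it were enabled: unlike BF16, FP16's narrower dynamic range usually requires gradient scaling to avoid underflow. This example assumes a CUDA GPU is available and is not Axolotl-specific.

# Sketch: FP16 mixed precision with gradient scaling (requires a CUDA GPU).
import torch

model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).sum()

scaler.scale(loss).backward()   # scale the loss so FP16 gradients do not underflow
scaler.step(optimizer)
scaler.update()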

tf32: false

  • This configuration is specific to NVIDIA GPUs and relates to the use of TensorFloat-32 (TF32) precision.

  • TF32 is a 19-bit floating-point format (1 sign bit, 8 exponent bits, 10 mantissa bits) available on NVIDIA Ampere-architecture and newer GPUs (e.g., the NVIDIA A100) for certain operations, such as matrix multiplications and convolutions.

  • When set to true, TF32 is used for these operations on compatible hardware; in the configuration above it is set to false, so they run at standard FP32 precision instead.

  • TF32 offers a balance between performance and precision, providing faster computation than FP32 while maintaining similar accuracy.

  • Enabling TF32 can therefore speed up training on Ampere-class GPUs without a significant impact on model quality, so it is worth considering if your hardware supports it.
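
In PyTorch terms, a tf32 flag typically maps onto the backend switches shown below; this is a general illustration rather than Axolotl's internal code.

# Sketch: how a tf32 setting maps onto PyTorch's backend switches.
# On Ampere or newer GPUs these control whether matmuls and convolutions
# use TensorFloat-32 internally.
import torch

tf32 = False   # mirrors tf32: false in the configuration above

torch.backends.cuda.matmul.allow_tf32 = tf32
torch.backends.cudnn.allow_tf32 = tf32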
