Sequence Configuration

The fields in this section control how Axolotl handles input sequence length, padding, and sample packing. Each field is described below, followed by an example configuration snippet.

sequence_len

The sequence_len field specifies the maximum length, in tokens, of an input sequence used during training. It should not exceed the model's context window (for example, 2048 tokens for Phi 2.0 or 4096 for Llama 2); setting it appropriately ensures that input sequences stay within the model's capacity.

pad_to_sequence_len

The pad_to_sequence_len field pads all inputs to a constant size, which reduces memory fragmentation and can help prevent out-of-memory (OOM) errors by letting memory be re-used more efficiently. The field is left empty in the provided configuration; it should be given a value (true or false) to specify whether inputs are padded to sequence_len.

max_packed_sequence_len

max_packed_sequence_len sets the maximum sequence length when concatenating training samples together, a technique inspired by StackLLaMA. Note, however, that Axolotl raises a FutureWarning for this field: it is slated for deprecation, with sample_packing as the recommended replacement.

sample_packing

The sample_packing field, when set to true, enables efficient multi-packing with block-diagonal attention and per-sequence position_ids. This setting is recommended for improving training throughput and memory efficiency, particularly when the dataset contains many samples shorter than sequence_len.

eval_sample_packing

eval_sample_packing enables or disables sample packing during evaluation. If you encounter errors during evaluation while sample packing is on, set this field to false to disable packing for the evaluation pass.

sample_packing_eff_est

sample_packing_eff_est is an optional estimate of how efficiently samples can be packed when sample_packing is enabled. It is left empty in the provided configuration and can safely be omitted.

total_num_tokens

total_num_tokens specifies the total number of tokens in the training dataset. This field is also left empty in the provided configuration and can be omitted.
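
As a concrete reference, the snippet below sketches how these fields might sit together in an Axolotl YAML configuration. The field names match those described above; the specific values (a 2048-token sequence length with packing enabled) are illustrative assumptions rather than values taken from any particular config in this guide.

    # Sequence configuration (illustrative values, not from a specific config)
    sequence_len: 2048          # maximum tokenised length of a training sample
    pad_to_sequence_len: true   # pad inputs to sequence_len to reduce memory fragmentation
    sample_packing: true        # pack multiple short samples into one sequence_len block
    eval_sample_packing: false  # turn packing off during evaluation if it causes errors

    # Optional packing hints, usually left unset:
    # sample_packing_eff_est:
    # total_num_tokens:
    # max_packed_sequence_len:  # deprecated in favour of sample_packing

Whether packing pays off depends on the dataset: it mainly benefits datasets with many samples shorter than sequence_len, since several samples can then share a single packed block.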
