
Llama3 - Sequence Configuration

Before training or fine-tuning begins, the input data must be correctly formatted and prepared.

We will configure the following parameters:

  1. Sequence Length

  2. Sample Packing

  3. Padding to Sequence Length

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

sequence_len

This parameter sets the maximum allowable length, in tokens, for input sequences. The limit is essential because the transformer processes input data in fixed-size blocks.

Sequences longer than this length are either truncated or split during training. Truncation cuts off the part of the sequence that exceeds the limit, while splitting divides a long sequence into smaller segments, each within the maximum length.

The choice of sequence length affects memory usage and computational requirements. Longer sequences can capture more context but require more computational resources.
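As an illustration, the following minimal Python sketch (not Axolotl's internal code) shows the difference between truncating and splitting a tokenised sequence against the 4,096-token limit; the helper functions and token ids are purely hypothetical.

SEQUENCE_LEN = 4096  # matches sequence_len in the configuration above

def truncate(token_ids, max_len=SEQUENCE_LEN):
    # Keep only the first max_len tokens; everything beyond the limit is discarded.
    return token_ids[:max_len]

def split(token_ids, max_len=SEQUENCE_LEN):
    # Divide a long sequence into consecutive chunks, each within max_len.
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

example = list(range(10_000))            # stand-in for 10,000 token ids
print(len(truncate(example)))            # 4096 -> tokens beyond the limit are lost
print([len(c) for c in split(example)])  # [4096, 4096, 1808] -> nothing is lost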

sample_packing

A flag that determines whether sample packing should be used.

Sample packing optimises the training process by packing multiple shorter sequences into a single training example. It increases training efficiency by reducing the amount of padding needed and making better use of GPU memory, which is particularly useful when dealing with variable-length sequences.

Implementation: if set to true, sequences shorter than sequence_len are concatenated with others to form a single packed training example. This process continues until the maximum sequence length is reached or no more sequences are available for packing.
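The sketch below is a simplified, first-fit illustration of this idea (Axolotl's actual packing implementation is more involved and also tracks per-token position ids and attention boundaries, which are omitted here). It assumes every sequence has already been truncated or split to fit within sequence_len.

SEQUENCE_LEN = 4096

def pack(sequences, max_len=SEQUENCE_LEN):
    # Greedily fill each packed example until adding the next sequence would overflow it.
    packed, current = [], []
    for seq in sequences:
        if current and len(current) + len(seq) > max_len:
            packed.append(current)
            current = []
        current = current + seq
    if current:
        packed.append(current)
    return packed

lengths = [1200, 900, 3000, 500, 2500]
examples = [[1] * n for n in lengths]
print([len(p) for p in pack(examples)])  # [2100, 3500, 2500] -> three packed examples instead of five padded ones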

pad_to_sequence_len

This is a flag that controls whether sequences should be padded to match the specified sequence length.

This ensures that all sequences in a batch are of the same length, which is necessary for parallel processing by the model.

If set to true, shorter sequences are extended (padded) with special tokens (usually [PAD]) until they reach the defined maximum sequence length.

Padding is standard practice when training neural networks on sequences of varying lengths, but it can introduce additional computational overhead, especially at longer sequence lengths.
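A minimal sketch of this padding step is shown below; pad_id stands in for the tokenizer's actual pad token id, and the attention mask marks real tokens (1) versus padding (0) so the model can ignore the [PAD] positions.

SEQUENCE_LEN = 4096

def pad_batch(batch, pad_id, max_len=SEQUENCE_LEN):
    # Right-pad every sequence to max_len and build the matching attention mask.
    input_ids, attention_mask = [], []
    for seq in batch:
        pad_count = max_len - len(seq)
        input_ids.append(seq + [pad_id] * pad_count)
        attention_mask.append([1] * len(seq) + [0] * pad_count)
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 6, 7], [8, 9]], pad_id=0)
print(len(ids[0]), len(ids[1]))  # 4096 4096 -> every row in the batch now has the same length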
