Augmentation Techniques

Each entry below names an Axolotl configuration field and explains what it controls.

noisy_embedding_alpha

noisy_embedding_alpha applies noise to the input embeddings during training as a form of data augmentation. It is based on the NEFT (Noisy Embedding Fine-Tuning) technique and is set to a number (e.g., 5) that controls how much noise is added. Injecting this noise introduces variability into training, potentially improving robustness and generalization.
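
As a rough illustration, this is how the option might appear in an Axolotl YAML config; the value 5 simply follows the example above and is not a recommendation:

```yaml
# Add NEFT-style noise to input embeddings during fine-tuning.
# Larger values inject more noise; 5 is the illustrative value mentioned above.
noisy_embedding_alpha: 5
```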

flash_optimum

flash_optimum determines whether to use BetterTransformer from the Hugging Face Optimum library. BetterTransformer swaps standard transformer modules for optimized fastpath implementations (built on PyTorch's scaled dot-product attention) to improve performance.

xformers_attention

xformers_attention specifies whether to use the attention patch from the XFormers library. XFormers is a library that provides optimized implementations of transformer components, including attention mechanisms.

flash_attention

flash_attention controls whether to use the Flash Attention patch from the Flash Attention library. Flash Attention is a memory-efficient, IO-aware implementation of exact attention that speeds up training and reduces memory usage.
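
The three options above select alternative attention implementations. As a minimal sketch (assuming an otherwise complete Axolotl YAML config), typically only one of them is enabled at a time:

```yaml
# Pick one attention backend; the others are shown commented out.
flash_attention: true        # Flash Attention kernels
# xformers_attention: true   # or the XFormers attention patch
# flash_optimum: true        # or BetterTransformer via Hugging Face Optimum
```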

flash_attn_cross_entropy

flash_attn_cross_entropy determines whether to use the Flash-Attention Cross Entropy implementation, a fused cross-entropy loss kernel. This is an advanced option and should be used with caution, as it only suits specific use cases.

flash_attn_rms_norm

flash_attn_rms_norm specifies whether to use the Flash-Attention Root Mean Square (RMS) Norm implementation. RMS Norm is a technique for normalizing model activations.

flash_attn_fuse_qkv

flash_attn_fuse_qkv controls whether to fuse the Query, Key, and Value (QKV) components of the attention mechanism into a single operation. This can potentially improve efficiency during training.

flash_attn_fuse_mlp

flash_attn_fuse_mlp determines whether to fuse part of the Multi-Layer Perceptron (MLP) block of each transformer layer into a single operation. Like the previous option, this aims to enhance efficiency.
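
The four flash_attn_* flags above are refinements that apply when Flash Attention is enabled. A minimal sketch of how they might look in the YAML config (values are illustrative, not recommendations):

```yaml
flash_attention: true           # the extras below assume Flash Attention is enabled
flash_attn_cross_entropy: true  # fused cross-entropy loss (advanced, use with caution)
flash_attn_rms_norm: true       # Flash-Attention RMS Norm implementation
flash_attn_fuse_qkv: true       # fuse the Q, K and V projections into one operation
flash_attn_fuse_mlp: true       # fuse part of the MLP block into one operation
```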

sdp_attention

sdp_attention specifies whether to use PyTorch's Scaled Dot-Product Attention (torch.nn.functional.scaled_dot_product_attention), a fused implementation of the core attention operation in transformer models.
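
If none of the patched backends above are used, PyTorch's own fused attention can be selected instead; a minimal illustrative snippet:

```yaml
# Use torch.nn.functional.scaled_dot_product_attention rather than a patched backend.
sdp_attention: true
```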

landmark_attention

landmark_attention is used only with LLaMA and controls whether to use landmark attention. Landmark attention inserts special landmark tokens that let the model retrieve and attend to the relevant blocks of a long input, extending the usable context length.

xpos_rope

xpos_rope is specific to LLaMA and relates to RoPE (Rotary Positional Embedding). It applies the xPos variant, which modifies the rotary embeddings to improve extrapolation to longer sequences.
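
Both of these options apply only to LLaMA-based models. A minimal sketch of how they might be set, assuming a LLaMA base model is configured elsewhere in the file (values are illustrative):

```yaml
# LLaMA-only options
landmark_attention: true  # landmark attention for long-context training
xpos_rope: true           # xPos variant of rotary positional embeddings
```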
