Hugging Face documentation on loading PEFT

Summary

  • PEFT (Parameter-Efficient Fine Tuning) methods freeze pretrained model parameters and add a small number of trainable adapter parameters, allowing for memory-efficient fine-tuning.

  • Adapters are much smaller than full models, making them easier to share and store. For example, an OPT adapter is roughly 6 MB, versus about 700 MB for the full model weights.

  • Transformers natively supports the LoRA (Low-Rank Adaptation), IA3, and AdaLoRA PEFT methods. Other methods require using the PEFT library directly.

  • To load a PEFT adapter, the Hub repo or local directory needs an adapter_config.json file and the adapter weights. Use an AutoModelFor* class or model.load_adapter() (a short loading sketch follows this list).

  • bitsandbytes integration allows loading in 8-bit or 4-bit precision to save memory.

  • Multiple adapters of the same type can be added to a model. Use model.set_adapter() to switch between them.

  • Adapters can be enabled/disabled with model.enable_adapters() and model.disable_adapters() after being added.

  • The Trainer class supports training PEFT adapters with minor code additions. Define adapter config, add to model, pass model to Trainer.

  • Additional layers like the language model head can be fine-tuned on top of a PEFT adapter by specifying modules_to_save in the config.
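
A minimal sketch tying these points together, loading the ybelkada/opt-350m-lora demo adapter used in the Hugging Face docs (any Hub repo containing an adapter_config.json and adapter weights works the same way):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# passing the adapter repo id to from_pretrained loads the base model
# and attaches the adapter; the 8-bit config is optional and saves memory
model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/opt-350m-lora",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

model.disable_adapters()  # run the base model without the adapter
model.enable_adapters()   # re-enable it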

Tutorials

Choosing the Right PEFT Method

Example: If you have a large model and limited GPU memory, consider using LoRA or AdaLoRA for parameter-efficient fine-tuning.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
lora_config = LoraConfig(
    r=8,                                  # rank of the LoRA update matrices
    lora_alpha=16,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # wraps the base model with trainable LoRA layers
  • Tip: Experiment with different PEFT methods to find the one that works best for your specific task and dataset.

  • Best Practice: Consider the trade-offs between memory efficiency and performance when selecting a PEFT method.

  • Potential Error: Using a PEFT method that is not compatible with your model architecture or task type.

Optimising Adapter Hyperparameters

Example: When configuring a LoRA adapter, experiment with different values for lora_alpha, lora_dropout, and r to find the optimal balance between performance and efficiency.

lora_config = LoraConfig(
    r=16,  # Experiment with different values of r
    lora_alpha=32,  # Experiment with different values of lora_alpha
    lora_dropout=0.1,  # Experiment with different values of lora_dropout
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
  • Tip: Start with the default hyperparameters and gradually tune them based on your task and dataset.

  • Best Practice: Use a validation set to evaluate the performance of different hyperparameter configurations.

  • Potential Error: Setting the hyperparameters to extreme values that lead to poor performance or unstable training.
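
A quick way to see how r and the other settings change the trainable-parameter budget is PEFT's built-in counter; a minimal check, assuming the model and lora_config defined above:

from peft import get_peft_model

peft_model = get_peft_model(model, lora_config)
# prints the number of trainable parameters vs. the total parameter count
peft_model.print_trainable_parameters()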

Efficient Storage and Sharing of Adapters

Example: When saving a trained adapter, use a descriptive name that includes the model architecture, PEFT method, and task information.

model.save_pretrained("output/opt-1.3b-lora-custom-task")
  • Tip: Store adapters separately from the base model to facilitate reuse across different projects.

  • Best Practice: Use a version control system like Git to track changes to your adapter configurations and training scripts.

  • Potential Error: Overwriting an existing adapter by mistake when saving a new one.
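
To share an adapter rather than just store it locally, push it to the Hub under the same descriptive name (the repo name below is hypothetical; this assumes you are authenticated, e.g. via huggingface-cli login):

# for a PEFT-wrapped model this uploads only the adapter weights and
# adapter_config.json, not the full base model
model.push_to_hub("your-username/opt-1.3b-lora-custom-task")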

Combining Multiple Adapters

Example: If you have adapters trained on different tasks or datasets, you can load them side by side under distinct names and switch between them with model.set_adapter().

# load each adapter under its own name (paths are illustrative)
model.load_adapter("path/to/adapter1", adapter_name="adapter1")
model.load_adapter("path/to/adapter2", adapter_name="adapter2")
# activate one adapter by name; recent transformers versions also accept
# a list of names to activate several compatible adapters at once
model.set_adapter("adapter1")
  • Tip: Experiment with different adapter combinations to find the ones that yield the best performance for your specific use case.

  • Best Practice: Ensure that the adapters you combine are compatible in terms of their PEFT method and model architecture.

  • Potential Error: Combining adapters that have conflicting or incompatible weights, leading to poor performance or unexpected behavior.
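
A related but distinct operation is permanently folding a single LoRA adapter into the base weights for deployment; a minimal sketch using PEFT's merge_and_unload(), assuming model is a PeftModel (the output path is illustrative):

# returns a plain Transformers model with the LoRA deltas merged into
# the base weights and the adapter layers removed
merged_model = model.merge_and_unload()
merged_model.save_pretrained("output/opt-1.3b-merged")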

Fine-Tuning Additional Layers with PEFT Adapters

Example: If your task requires fine-tuning the language model head in addition to the adapter, specify modules_to_save=["lm_head"] in your PEFT configuration.

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    modules_to_save=["lm_head"],  # Fine-tune the language model head
)
  • Tip: Be cautious when fine-tuning additional layers, as it may increase the risk of overfitting, especially if you have a small dataset.

  • Best Practice: Start by fine-tuning only the adapter and gradually add additional layers if needed based on performance evaluation.

  • Potential Error: Fine-tuning too many additional layers, leading to overfitting and poor generalization.

Monitoring Adapter Training

Example: Use a logging library like Weights and Biases (wandb) to track the training progress, including loss curves, evaluation metrics, and hardware utilization.

import wandb
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

wandb.init(project="peft-adapter-training")

# report_to="wandb" routes the Trainer's logs to the active wandb run;
# early stopping needs periodic evaluation and load_best_model_at_end=True
training_args = TrainingArguments(
    output_dir="output/peft-adapter",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="wandb",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # stop after 3 evaluations without improvement
)
trainer.train()
  • Tip: Regularly check the training logs to identify any anomalies or convergence issues.

  • Best Practice: Set up early stopping criteria to prevent overfitting and save computational resources.

  • Potential Error: Neglecting to monitor the training progress, leading to suboptimal results or wasted resources.

Remember to refer to the official Hugging Face PEFT documentation for the most up-to-date information and API references.

These code examples are meant to provide a starting point and may need to be adapted to your specific use case.
