
Hugging Face documentation on loading PEFT adapters

Summary

  • PEFT (Parameter-Efficient Fine-Tuning) methods freeze the pretrained model parameters and add a small number of trainable adapter parameters, which makes fine-tuning memory-efficient.

  • Adapters are much smaller than full models, making them easier to share and store. Example: an OPT adapter is about 6MB, versus roughly 700MB for the full model weights.

  • Transformers natively supports the LoRA (Low-Rank Adaptation), IA3, and AdaLoRA PEFT methods. Other methods require using the PEFT library directly.

  • To load a PEFT adapter, the Hub repo or local directory must contain an adapter_config.json file and the adapter weights. Load it with an AutoModelFor* class or with model.load_adapter().

  • The bitsandbytes integration allows loading the model in 8-bit or 4-bit precision to save memory.

  • Multiple adapters of the same type can be added to a model. Use model.set_adapter() to switch between them.

  • Adapters can be enabled/disabled with model.enable_adapters() and model.disable_adapters() after being added.

  • The Trainer class supports training PEFT adapters with minor code additions: define the adapter config, add the adapter to the model, and pass the model to Trainer.

  • Additional layers like the language model head can be fine-tuned on top of a PEFT adapter by specifying modules_to_save in the config.

Tutorials

Choosing the Right PEFT Method

Example: If you have a large model and limited GPU memory, consider using LoRA or AdaLoRA for parameter-efficient fine-tuning.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Report how many parameters are trainable

  • Tip: Experiment with different PEFT methods to find the one that works best for your specific task and dataset.

  • Best Practice: Consider the trade-offs between memory efficiency and performance when selecting a PEFT method.

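  • Potential Error: Choosing a PEFT method or target_modules that do not match your model architecture or task type. For example, q_proj and v_proj exist in OPT, but other architectures name their attention projections differently.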
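
To make the memory trade-off concrete, here is a rough arithmetic sketch of how many trainable parameters the LoRA configuration above adds. The dimensions used for facebook/opt-1.3b (hidden size 2048, 24 decoder layers) are assumptions to verify against the model's config.json, and lora_trainable_params is a hypothetical helper, not part of PEFT.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int, n_modules: int) -> int:
    """Parameters added by LoRA: each adapted (d_out x d_in) weight gets
    two low-rank factors, A with shape (r x d_in) and B with shape (d_out x r)."""
    return n_modules * r * (d_in + d_out)


# Assumed dimensions for facebook/opt-1.3b; check config.json before relying on them.
HIDDEN_SIZE = 2048
NUM_LAYERS = 24
TARGETS_PER_LAYER = 2  # q_proj and v_proj

n_modules = NUM_LAYERS * TARGETS_PER_LAYER
adapter_params = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, r=8, n_modules=n_modules)

# Size of the frozen weight matrices that the adapter sits next to.
full_weight_params = n_modules * HIDDEN_SIZE * HIDDEN_SIZE

print(f"LoRA parameters:        {adapter_params:,}")
print(f"Adapted weight entries: {full_weight_params:,}")
print(f"Ratio:                  {adapter_params / full_weight_params:.4f}")
```

Under these assumptions the adapter trains well under 1% as many values as the matrices it adapts, which is why the saved adapter is so small.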

Optimising Adapter Hyperparameters

Example: When configuring a LoRA adapter, experiment with different values for lora_alpha, lora_dropout, and r to find the optimal balance between performance and efficiency.

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,  # Rank of the update matrices; higher means more trainable parameters
    lora_alpha=32,  # Scaling numerator; the LoRA update is scaled by lora_alpha / r
    lora_dropout=0.1,  # Dropout applied inside the LoRA layers
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

  • Tip: Start with the default hyperparameters and gradually tune them based on your task and dataset.

  • Best Practice: Use a validation set to evaluate the performance of different hyperparameter configurations.

  • Potential Error: Setting the hyperparameters to extreme values that lead to poor performance or unstable training.
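
When sweeping r and lora_alpha together, it helps to keep an eye on their ratio, because standard LoRA scales the adapter update by lora_alpha / r. The sketch below builds a small candidate grid and records that ratio for each entry. The candidate values are arbitrary illustrations, and lora_scaling is a hypothetical helper that assumes the default (non rank-stabilized) scaling.

```python
from itertools import product


def lora_scaling(r: int, lora_alpha: float) -> float:
    """Scaling applied to the low-rank update, assuming default LoRA scaling."""
    return lora_alpha / r


# Illustrative candidate values only; pick ranges that suit your task.
ranks = (8, 16, 32)
alphas = (16, 32)
dropouts = (0.05, 0.1)

grid = [
    {"r": r, "lora_alpha": a, "lora_dropout": p, "scaling": lora_scaling(r, a)}
    for r, a, p in product(ranks, alphas, dropouts)
]

for candidate in grid:
    print(candidate)
```

Note that r=8 with lora_alpha=16 and r=16 with lora_alpha=32 share the same scaling of 2.0, so moving between them changes adapter capacity but not the magnitude applied to the update. Each candidate can be passed to LoraConfig (minus the scaling key) and compared on a validation set.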

Efficient Storage and Sharing of Adapters

Example: When saving a trained adapter, use a descriptive name that includes the model architecture, PEFT method, and task information.

# On a PEFT model this saves the adapter weights and adapter_config.json, not the base model
model.save_pretrained("output/opt-1.3b-lora-custom-task")

  • Tip: Store adapters separately from the base model to facilitate reuse across different projects.

  • Best Practice: Use a version control system like Git to track changes to your adapter configurations and training scripts.

  • Potential Error: Overwriting an existing adapter by mistake when saving a new one.
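
One way to guard against accidental overwrites is to build the output directory name from its parts and refuse to reuse a directory that already exists. The following is a minimal sketch; adapter_output_dir and its naming scheme are hypothetical and not part of Transformers or PEFT.

```python
from pathlib import Path


def adapter_output_dir(root, base_model, method, task, overwrite=False) -> Path:
    """Build a descriptive adapter directory path and refuse to reuse an
    existing one unless overwrite=True is passed explicitly."""
    # Drop the Hub namespace, e.g. "facebook/opt-1.3b" -> "opt-1.3b".
    model_name = base_model.split("/")[-1]
    path = Path(root) / f"{model_name}-{method}-{task}".lower()
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    return path


target = adapter_output_dir("output", "facebook/opt-1.3b", "lora", "custom-task")
print(target)
# model.save_pretrained(target)  # 'model' is the trained PEFT model from earlier
```

Adding a version suffix or date to the task argument is a simple extension if several runs of the same task need to coexist.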

Combining Multiple Adapters

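Example: If you have multiple adapters trained on different tasks or datasets, load each one under its own name with model.load_adapter() and choose which ones are active with model.set_adapter().

# "adapter1" and "adapter2" are placeholder Hub repos or local adapter directories
model.load_adapter("adapter1", adapter_name="adapter1")
model.load_adapter("adapter2", adapter_name="adapter2")
model.set_adapter("adapter1")  # Switch to a single adapter
model.set_adapter(["adapter1", "adapter2"])  # On a Transformers model, a list activates several adapters

  • Tip: Experiment with different adapter combinations to find the ones that yield the best performance for your specific use case.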

  • Best Practice: Ensure that the adapters you combine are compatible in terms of their PEFT method and model architecture.

  • Potential Error: Combining adapters that have conflicting or incompatible weights, leading to poor performance or unexpected behavior.

Fine-Tuning Additional Layers with PEFT Adapters

Example: If your task requires fine-tuning the language model head in addition to the adapter, specify modules_to_save=["lm_head"] in your PEFT configuration.

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    modules_to_save=["lm_head"],  # Fine-tune the language model head
)

  • Tip: Be cautious when fine-tuning additional layers, as it may increase the risk of overfitting, especially if you have a small dataset.

  • Best Practice: Start by fine-tuning only the adapter and gradually add additional layers if needed based on performance evaluation.

  • Potential Error: Fine-tuning too many additional layers, leading to overfitting and poor generalization.
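
A quick back-of-the-envelope calculation shows why adding lm_head to modules_to_save deserves caution: the head can dwarf the adapter itself. The figures below assume facebook/opt-1.3b has a hidden size of 2048, 24 decoder layers, a vocabulary of 50272 tokens, and a bias-free lm_head; verify these against the model's config before relying on them.

```python
# Assumed dimensions for facebook/opt-1.3b; check config.json.
HIDDEN_SIZE = 2048
NUM_LAYERS = 24
VOCAB_SIZE = 50272
LORA_RANK = 8

# LoRA on q_proj and v_proj: two rank-r factors per adapted matrix.
lora_params = NUM_LAYERS * 2 * LORA_RANK * (HIDDEN_SIZE + HIDDEN_SIZE)

# A trainable copy of the language model head (hidden_size x vocab_size, no bias).
lm_head_params = HIDDEN_SIZE * VOCAB_SIZE

print(f"LoRA adapter parameters: {lora_params:,}")
print(f"lm_head parameters:      {lm_head_params:,}")
print(f"lm_head / adapter ratio: {lm_head_params / lora_params:.1f}x")
```

Under these assumptions the head adds dozens of times more trainable parameters than the adapter, which increases both the overfitting risk and the size of the saved checkpoint.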

Monitoring Adapter Training

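Example: Use a logging library like Weights & Biases (wandb) to track the training progress, including loss curves, evaluation metrics, and hardware utilization.

import wandb
from transformers import EarlyStoppingCallback, Trainer

wandb.init(project="peft-adapter-training")

# training_args, train_dataset and eval_dataset are assumed to be defined earlier.
# Early stopping requires periodic evaluation and load_best_model_at_end=True in
# training_args; set report_to="wandb" there to send Trainer logs to wandb.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # Stop after 3 evaluations without improvement
)
trainer.train()

  • Tip: Regularly check the training logs to identify any anomalies or convergence issues.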

  • Best Practice: Set up early stopping criteria to prevent overfitting and save computational resources.

  • Potential Error: Neglecting to monitor the training progress, leading to suboptimal results or wasted resources.
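
To illustrate what a patience setting means in practice, the sketch below walks through a list of evaluation losses and reports where training would stop. It is a simplified illustration of the general idea, not the Transformers implementation; stopping_step and the loss values are hypothetical.

```python
def stopping_step(eval_losses, patience, min_delta=0.0):
    """Return the index of the evaluation at which training would stop,
    or None if the patience budget is never exhausted."""
    best = float("inf")
    evals_without_improvement = 0
    for step, loss in enumerate(eval_losses):
        if best - loss > min_delta:
            # New best loss: remember it and reset the counter.
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return step
    return None


# Made-up evaluation losses: improvement stalls after the third evaluation.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]
print(stopping_step(losses, patience=3))  # 5
```

The last value in the list is never reached: with a patience of 3, training stops at index 5, which shows how a small patience can cut off a late improvement.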

Remember to refer to the official Hugging Face PEFT documentation for the most up-to-date information and API references.

These code examples are meant to provide a starting point and may need to be adapted to your specific use case.
