Hugging Face documentation on loading PEFT adapters

Summary

PEFT (Parameter-Efficient Fine-Tuning) methods freeze the pretrained model parameters and add a small number of trainable adapter parameters, which makes fine-tuning far less memory-intensive.
Adapters are much smaller than full models, making them easier to share and store. For example, the adapter weights for an OPT model are about 6MB, compared with about 700MB for the full model.
Transformers natively supports the Low Rank Adapters (LoRA), IA3, and AdaLoRA PEFT methods. Other methods require the PEFT library.
To load a PEFT adapter, the Hub repo or local directory must contain an adapter_config.json file and the adapter weights. Load it with an AutoModelFor* class or with model.load_adapter().
The bitsandbytes integration allows loading the model in 8-bit or 4-bit precision to save memory.
Multiple adapters of the same type can be added to a model. Use model.set_adapter() to switch between them.
Once added, adapters can be enabled or disabled with model.enable_adapters() and model.disable_adapters().
The Trainer class supports training PEFT adapters with minor code additions: define the adapter config, add the adapter to the model, and pass the model to Trainer.
Additional layers, such as the language model head, can be fine-tuned on top of a PEFT adapter by specifying modules_to_save in the config.
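
The loading requirement above can be sketched as follows. This is a minimal sketch, not the library's own validation logic: looks_like_peft_adapter and load_with_adapter are hypothetical helper names, and the weight filenames adapter_model.safetensors and adapter_model.bin are assumed to be the names PEFT conventionally writes. The loader imports Transformers lazily so that the directory check can be used on its own.

```python
from pathlib import Path

# Assumed conventional PEFT weight filenames (safetensors is the newer format).
ADAPTER_WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def looks_like_peft_adapter(directory):
    """Return True if `directory` holds adapter_config.json plus adapter weights."""
    root = Path(directory)
    has_config = (root / "adapter_config.json").is_file()
    has_weights = any((root / name).is_file() for name in ADAPTER_WEIGHT_FILES)
    return has_config and has_weights


def load_with_adapter(base_model_id, adapter_path):
    """Load a base causal LM, then attach the adapter stored at `adapter_path`."""
    # Imported lazily so the check above works without Transformers installed.
    from transformers import AutoModelForCausalLM

    if not looks_like_peft_adapter(adapter_path):
        raise FileNotFoundError(
            f"{adapter_path} is missing adapter_config.json or adapter weights"
        )
    model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model.load_adapter(adapter_path)
    return model
```

Checking the directory first gives a clearer error message than a failed load when an adapter was only partially copied or saved.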

Tutorials

Choosing the Right PEFT Method

Example: If you have a large model and limited GPU memory, consider LoRA or AdaLoRA, which train only a small set of added parameters while the base model stays frozen.

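from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model whose weights stay frozen
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
lora_config = LoraConfig(
    r=8,  # Rank of the low-rank update matrices
    lora_alpha=16,  # Scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # Attention projections that receive adapters
    lora_dropout=0.05,
    bias="none",  # Leave bias terms frozen
    task_type="CAUSAL_LM",
)
# Wrap the base model so that only the LoRA parameters are trainable
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()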

Tip: Experiment with different PEFT methods to find the one that works best for your specific task and dataset.
Best Practice: Consider the trade-offs between memory efficiency and performance when selecting a PEFT method.
Potential Error: Using a PEFT method that is not compatible with your model architecture or task type.
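
To make the memory trade-off concrete, the number of parameters LoRA adds for the configuration above (r=8 on q_proj and v_proj) can be estimated by hand. This is a back-of-the-envelope sketch: the OPT-1.3b shapes used here (hidden size 2048, 24 decoder layers) are stated as assumptions, and lora_param_count is a hypothetical helper, not a PEFT function.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed OPT-1.3b shapes: hidden size 2048 and 24 decoder layers.
hidden_size = 2048
num_layers = 24
targets_per_layer = 2  # q_proj and v_proj

per_module = lora_param_count(hidden_size, hidden_size, r=8)
total_lora = per_module * targets_per_layer * num_layers

print(f"LoRA parameters per module: {per_module:,}")
print(f"LoRA parameters in total:   {total_lora:,}")
```

Under these assumptions the adapter adds 1,572,864 trainable parameters, roughly 0.1% of a model with about 1.3 billion parameters.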

Optimising Adapter Hyperparameters

Example: When configuring a LoRA adapter, experiment with different values for lora_alpha, lora_dropout, and r to find the optimal balance between performance and efficiency.

lora_config = LoraConfig(
    r=16,  # Rank of the update matrices; higher values add capacity and parameters
    lora_alpha=32,  # Scaling factor; the update is scaled by lora_alpha / r
    lora_dropout=0.1,  # Dropout applied to the LoRA layers during training
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

Tip: Start with the default hyperparameters and gradually tune them based on your task and dataset.
Best Practice: Use a validation set to evaluate the performance of different hyperparameter configurations.
Potential Error: Setting the hyperparameters to extreme values that lead to poor performance or unstable training.
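
One way to organise the experiments suggested above is to enumerate a small grid and record the effective scaling for each combination. This is a sketch only: the search-space values are illustrative rather than recommendations, lora_scaling is a hypothetical helper, and it assumes the standard LoRA scaling of lora_alpha / r.

```python
from itertools import product


def lora_scaling(lora_alpha, r):
    """Standard LoRA scaling applied to the low-rank update: lora_alpha / r."""
    return lora_alpha / r


# Hypothetical search space; the values are illustrative, not recommendations.
ranks = [8, 16]
alphas = [16, 32]
dropouts = [0.05, 0.1]

candidates = [
    {"r": r, "lora_alpha": a, "lora_dropout": p, "scaling": lora_scaling(a, r)}
    for r, a, p in product(ranks, alphas, dropouts)
]

for cfg in candidates:
    print(cfg)
```

The r, lora_alpha, and lora_dropout values of each candidate can be passed to LoraConfig and compared on a validation set. Because doubling r at a fixed lora_alpha halves the scaling, the two are usually tuned together rather than independently.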

Saving and Managing Adapters

Example: When saving a trained adapter, use a descriptive name that includes the model architecture, PEFT method, and task information.

model.save_pretrained("output/opt-1.3b-lora-custom-task")

Tip: Store adapters separately from the base model to facilitate reuse across different projects.
Best Practice: Use a version control system like Git to track changes to your adapter configurations and training scripts.
Potential Error: Overwriting an existing adapter by mistake when saving a new one.
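
A small helper can guard against accidental overwrites by combining the descriptive naming scheme with a check for existing directories. This is a sketch under assumptions: adapter_output_dir is a hypothetical helper, and the -v2, -v3 suffix scheme is just one possible convention.

```python
from pathlib import Path


def adapter_output_dir(root, base_model, peft_method, task):
    """Build a descriptive directory name and avoid reusing an existing one."""
    stem = f"{base_model}-{peft_method}-{task}"
    candidate = Path(root) / stem
    version = 2
    # Append -v2, -v3, ... until the path is free.
    while candidate.exists():
        candidate = Path(root) / f"{stem}-v{version}"
        version += 1
    return candidate
```

For example, model.save_pretrained(str(adapter_output_dir("output", "opt-1.3b", "lora", "custom-task"))) writes to output/opt-1.3b-lora-custom-task on the first run and to a -v2 directory if that path already exists.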

Combining Multiple Adapters

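Example: If you have multiple adapters trained on different tasks or datasets, you can load them into the same base model under distinct names and use model.set_adapter() to choose which one is active.

# Give each adapter its own name; otherwise the second load collides with the first adapter's default name
model.load_adapter("adapter1", adapter_name="adapter1")
model.load_adapter("adapter2", adapter_name="adapter2")
model.set_adapter("adapter1")  # Make adapter1 the active adapter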

Tip: Experiment with different adapter combinations to find the ones that yield the best performance for your specific use case.
Best Practice: Ensure that the adapters you combine are compatible in terms of their PEFT method and model architecture.
Potential Error: Combining adapters that have conflicting or incompatible weights, leading to poor performance or unexpected behavior.
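
The compatibility check recommended above can be approximated by comparing the adapters' adapter_config.json files before loading them. This is a minimal sketch: adapters_compatible is a hypothetical helper, the two example configs are made up, and it only compares the peft_type and base_model_name_or_path fields, which is a necessary condition rather than a guarantee.

```python
# Fields compared; both are standard entries in adapter_config.json.
COMPARED_KEYS = ("peft_type", "base_model_name_or_path")


def adapters_compatible(config_a, config_b):
    """Return True if both configs name the same PEFT method and base model."""
    return all(
        config_a.get(key) is not None and config_a.get(key) == config_b.get(key)
        for key in COMPARED_KEYS
    )


# Made-up configs, as they might look after json.load() on adapter_config.json.
summarization = {"peft_type": "LORA", "base_model_name_or_path": "facebook/opt-1.3b", "r": 8}
translation = {"peft_type": "LORA", "base_model_name_or_path": "facebook/opt-1.3b", "r": 16}
other_method = {"peft_type": "IA3", "base_model_name_or_path": "facebook/opt-1.3b"}

print(adapters_compatible(summarization, translation))
print(adapters_compatible(summarization, other_method))
```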

Fine-Tuning Additional Layers with PEFT Adapters

Example: If your task requires fine-tuning the language model head in addition to the adapter, specify modules_to_save=["lm_head"] in your PEFT configuration.

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    modules_to_save=["lm_head"],  # Fine-tune the language model head
)

Tip: Be cautious when fine-tuning additional layers, as it may increase the risk of overfitting, especially if you have a small dataset.
Best Practice: Start by fine-tuning only the adapter and gradually add additional layers if needed based on performance evaluation.
Potential Error: Fine-tuning too many additional layers, leading to overfitting and poor generalization.
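
The caution above becomes clearer when the size of the language model head is compared with the adapter itself. This is a rough estimate under stated assumptions: OPT-1.3b shapes of hidden size 2048, vocabulary size 50272, and 24 decoder layers, and a head without a bias term.

```python
# Assumed OPT-1.3b shapes: hidden size 2048, vocabulary size 50272, 24 layers.
hidden_size = 2048
vocab_size = 50272
num_layers = 24
r = 8

# LoRA on q_proj and v_proj: each module adds r * (d_in + d_out) parameters.
lora_params = r * (hidden_size + hidden_size) * 2 * num_layers

# modules_to_save=["lm_head"] trains a full copy of the head (no bias assumed).
lm_head_params = hidden_size * vocab_size

ratio = lm_head_params / lora_params
print(f"LoRA parameters:    {lora_params:,}")
print(f"lm_head parameters: {lm_head_params:,}")
print(f"The head is about {ratio:.0f}x larger than the adapter")
```

Under these assumptions the head holds about 103 million parameters, roughly 65 times the adapter, so adding it changes both the saved adapter size and the number of trainable parameters substantially.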

Monitoring Adapter Training

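Example: Use a logging library such as Weights & Biases (wandb) to track training progress, including loss curves, evaluation metrics, and hardware utilization.

import wandb
from transformers import EarlyStoppingCallback, Trainer

wandb.init(project="peft-adapter-training")

# model, training_args, train_dataset, and eval_dataset are assumed to be defined earlier.
# For the Trainer to log to wandb, set report_to="wandb" in TrainingArguments.
# EarlyStoppingCallback also needs periodic evaluation and load_best_model_at_end=True.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # Stop after 3 evaluations without improvement
)
trainer.train()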

Tip: Regularly check the training logs to identify any anomalies or convergence issues.
Best Practice: Set up early stopping criteria to prevent overfitting and save computational resources.
Potential Error: Neglecting to monitor the training progress, leading to suboptimal results or wasted resources.
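
How a patience setting of 3 behaves can be sketched in plain Python. This illustrates the general idea of patience-based early stopping and is not the implementation of EarlyStoppingCallback; the function name first_stop_step and the loss values are hypothetical.

```python
def first_stop_step(eval_losses, patience=3, min_delta=0.0):
    """Return the index of the evaluation at which training would stop, or None."""
    best = float("inf")
    evals_without_improvement = 0
    for step, loss in enumerate(eval_losses):
        if loss < best - min_delta:
            # New best validation loss: reset the counter.
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return step
    return None


# Hypothetical validation losses, one per evaluation.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6]
print(first_stop_step(losses, patience=3))
```

In this made-up run, training stops at the sixth evaluation and never reaches the later, lower loss of 0.6, which shows why a patience value that is too small can end training prematurely.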

Remember to refer to the official Hugging Face PEFT documentation for the most up-to-date information and API references.

These code examples are meant to provide a starting point and may need to be adapted to your specific use case.
