Hugging Face documentation on loading PEFT adapters
Summary
PEFT (Parameter-Efficient Fine-Tuning) methods freeze the pretrained model's parameters and add a small number of trainable adapter parameters on top, allowing for memory-efficient fine-tuning.
Adapters are much smaller than full models, making them easier to share and store. For example, a LoRA adapter for OPT is about 6MB, compared with about 700MB for the full model weights.
Transformers natively supports the Low-Rank Adapters (LoRA), IA3, and AdaLoRA PEFT methods. Other methods require using the PEFT library.
To load a PEFT adapter, the Hub repository or local directory needs an adapter_config.json file and the adapter weights. Use an AutoModelFor* class or model.load_adapter().
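A minimal sketch of both loading paths, using an example OPT LoRA adapter from the Hub (the repository IDs are illustrative; substitute your own adapter):

```python
from transformers import AutoModelForCausalLM

# Option 1: point AutoModelFor* directly at the adapter repo; the base model
# named in adapter_config.json is downloaded and the adapter is attached.
model = AutoModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")

# Option 2: load the base model first, then attach the adapter.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
base.load_adapter("ybelkada/opt-350m-lora")
```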
The bitsandbytes integration allows loading models in 8-bit or 4-bit precision to save memory.
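For example, the adapter above could be loaded with its base model quantized to 8-bit; this sketch assumes the bitsandbytes package and a supported GPU are available:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/opt-350m-lora",                                    # example adapter repo
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),   # or load_in_4bit=True
    device_map="auto",                                           # place layers on available devices
)
```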
Multiple adapters of the same type can be added to a model. Use model.set_adapter() to switch between them.
Once added, adapters can be enabled or disabled with model.enable_adapters() and model.disable_adapters().
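A sketch covering both points above: two LoRA adapters of the same type are added, one is selected, and the adapters are toggled off and on (init_lora_weights=False gives the adapters non-identity weights, so switching has a visible effect even without training):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

lora_config = LoraConfig(target_modules=["q_proj", "k_proj"], init_lora_weights=False)
model.add_adapter(lora_config, adapter_name="adapter_1")
model.add_adapter(lora_config, adapter_name="adapter_2")

model.set_adapter("adapter_1")   # make adapter_1 the active adapter
model.disable_adapters()         # run the bare base model
model.enable_adapters()          # re-enable the active adapter
```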
The Trainer class supports training PEFT adapters with minor code additions: define an adapter config, add it to the model, and pass the model to Trainer.
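A minimal training sketch; the tokenized dataset is assumed to be defined elsewhere, and the hyperparameters are placeholders rather than recommendations:

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

peft_config = LoraConfig(lora_alpha=16, lora_dropout=0.1, r=64, bias="none", task_type="CAUSAL_LM")
model.add_adapter(peft_config)    # only the adapter parameters remain trainable

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="opt-350m-lora", num_train_epochs=1),
    train_dataset=train_dataset,  # assumed: a tokenized dataset defined elsewhere
)
trainer.train()
```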
Additional layers like the language model head can be fine-tuned on top of a PEFT adapter by specifying modules_to_save in the config.
Tutorials
Choosing the Right PEFT Method
Example: If you have a large model and limited GPU memory, consider using LoRA or AdaLoRA for parameter-efficient fine-tuning (see the sketch below).
Tip: Experiment with different PEFT methods to find the one that works best for your specific task and dataset.
Best Practice: Consider the trade-offs between memory efficiency and performance when selecting a PEFT method.
Potential Error: Using a PEFT method that is not compatible with your model architecture or task type.
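One way to gauge the memory side of that trade-off is to attach an adapter and count the trainable parameters. The sketch below uses LoRA on an example OPT model; the printed percentage should be well under 1% of the total parameters:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model.add_adapter(LoraConfig(task_type="CAUSAL_LM"))  # base weights should be frozen, LoRA layers added

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} of {total:,} ({100 * trainable / total:.3f}%)")
```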
Optimising Adapter Hyperparameters
Example: When configuring a LoRA adapter, experiment with different values for lora_alpha, lora_dropout, and r to find the optimal balance between performance and efficiency (see the sketch below).
Tip: Start with the default hyperparameters and gradually tune them based on your task and dataset.
Best Practice: Use a validation set to evaluate the performance of different hyperparameter configurations.
Potential Error: Setting the hyperparameters to extreme values that lead to poor performance or unstable training.
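As a sketch, the three knobs above map directly onto LoraConfig fields; the candidate values here are common starting points, not prescriptions:

```python
from peft import LoraConfig

candidate_configs = [
    LoraConfig(r=r, lora_alpha=2 * r, lora_dropout=0.05, task_type="CAUSAL_LM")
    for r in (8, 16, 32)   # r is the adapter rank; alpha is often set to a small multiple of r
]
# For each config: add it to a fresh copy of the base model, train,
# and compare the scores on the same validation set.
```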
Efficient Storage and Sharing of Adapters
Example: When saving a trained adapter, use a descriptive name that includes the model architecture, PEFT method, and task information (see the snippet below).
Tip: Store adapters separately from the base model to facilitate reuse across different projects.
Best Practice: Use a version control system like Git to track changes to your adapter configurations and training scripts.
Potential Error: Overwriting an existing adapter by mistake when saving a new one.
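A sketch of saving under a descriptive name; with the Transformers PEFT integration, calling save_pretrained on a model that has an adapter loaded should write only the adapter files (adapter_config.json plus the adapter weights) rather than the full model, but it is worth checking the output directory. The directory and repository names are hypothetical:

```python
# Descriptive name: base model, PEFT method, rank, task
adapter_dir = "opt-350m-lora-r16-sst2"
model.save_pretrained(adapter_dir)

# Optionally share it on the Hub (requires `huggingface-cli login`):
# model.push_to_hub("your-username/opt-350m-lora-r16-sst2")
```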
Combining Multiple Adapters
Example: If you have multiple adapters trained on different tasks or datasets, you can add them to the same model and use model.set_adapter() to choose which one is active for a given input (see the snippet below); actually merging adapter weights into a single adapter is a feature of the PEFT library rather than the Transformers integration.
Tip: Experiment with different adapter combinations to find the ones that yield the best performance for your specific use case.
Best Practice: Ensure that the adapters you combine are compatible in terms of their PEFT method and model architecture.
Potential Error: Combining adapters that have conflicting or incompatible weights, leading to poor performance or unexpected behavior.
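A sketch of switching between two task-specific adapters loaded into one base model; both adapter repository names are hypothetical:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Hypothetical adapter repositories trained on different tasks
model.load_adapter("your-username/opt-350m-lora-summarization", adapter_name="summarization")
model.load_adapter("your-username/opt-350m-lora-qa", adapter_name="qa")

model.set_adapter("summarization")   # activate the summarization adapter
# ... run summarization inference ...
model.set_adapter("qa")              # switch to the question-answering adapter
```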
Fine-Tuning Additional Layers with PEFT Adapters
Example: If your task requires fine-tuning the language model head in addition to the adapter, specify modules_to_save=["lm_head"] in your PEFT configuration (see the snippet below).
Tip: Be cautious when fine-tuning additional layers, as it may increase the risk of overfitting, especially if you have a small dataset.
Best Practice: Start by fine-tuning only the adapter and gradually add additional layers if needed based on performance evaluation.
Potential Error: Fine-tuning too many additional layers, leading to overfitting and poor generalization.
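A sketch of that configuration, following the Transformers PEFT integration docs; only the LoRA layers and the lm_head are left trainable:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj"],
    modules_to_save=["lm_head"],   # also train (and save) the language model head
)
model.add_adapter(lora_config)
```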
Monitoring Adapter Training
Example: Use a logging library like Weights & Biases (wandb) to track the training progress, including loss curves, evaluation metrics, and hardware utilization (see the sketch below).
Tip: Regularly check the training logs to identify any anomalies or convergence issues.
Best Practice: Set up early stopping criteria to prevent overfitting and save computational resources.
Potential Error: Neglecting to monitor the training progress, leading to suboptimal results or wasted resources.
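A sketch of a Trainer setup with wandb logging and early stopping; it assumes wandb is installed and logged in, that the model already has an adapter added, and that the datasets are defined elsewhere. Recent Transformers versions rename evaluation_strategy to eval_strategy:

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="opt-350m-lora-run",   # hypothetical output directory
    report_to="wandb",                # send logs to Weights & Biases
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,      # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,                      # model with the PEFT adapter already added
    args=training_args,
    train_dataset=train_dataset,      # assumed to be defined elsewhere
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```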
Remember to refer to the official Hugging Face PEFT documentation for the most up-to-date information and API references.
These code examples are meant to provide a starting point and may need to be adapted to your specific use case.