# Hugging Face documentation on loading PEFT

### <mark style="color:blue;">Summary</mark>

* PEFT (Parameter-Efficient Fine-Tuning) methods freeze the pretrained model parameters and add a small number of trainable adapter parameters, which makes fine-tuning far less memory-intensive.
* Adapters are much smaller than full models, making them easier to share and store. For example, an OPT adapter stored on the Hub is about 6MB, compared with roughly 700MB for the full model weights.
* Transformers natively supports Low Rank Adapters, IA3, and AdaLoRA PEFT methods. Other methods require using the PEFT library.
* To load a PEFT adapter, the Hub repo or local directory needs an <mark style="color:yellow;">**adapter\_config.json**</mark> and the <mark style="color:yellow;">**adapter weights**</mark>. Load it with an AutoModelFor\* class or with model.load\_adapter().
* bitsandbytes integration allows loading in 8-bit or 4-bit precision to save memory.
* Multiple adapters of the same type can be added to a model. Use model.set\_adapter() to switch between them.
* Adapters can be enabled/disabled with model.enable\_adapters() and model.disable\_adapters() after being added.
* The Trainer class supports training PEFT adapters with minor code additions: define the adapter config, add the adapter to the model, and pass the model to Trainer.
* Additional layers like the language model head can be fine-tuned on top of a PEFT adapter by specifying modules\_to\_save in the config.
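
As a quick pre-flight check before loading an adapter from a local directory, the sketch below tests for the two things listed above: an `adapter_config.json` and a weights file. The helper name `looks_like_peft_adapter` is hypothetical, and the weight file names are the ones PEFT conventionally writes; treat them as an assumption if your adapter was exported by another tool.

```python
from pathlib import Path

# File names PEFT conventionally writes; assumed here, adjust if your export differs.
CONFIG_NAME = "adapter_config.json"
WEIGHT_NAMES = ("adapter_model.safetensors", "adapter_model.bin")


def looks_like_peft_adapter(directory: str) -> bool:
    """Return True if the directory has an adapter config and a weights file."""
    path = Path(directory)
    has_config = (path / CONFIG_NAME).is_file()
    has_weights = any((path / name).is_file() for name in WEIGHT_NAMES)
    return has_config and has_weights
```

This only checks that the files exist. It does not validate their contents or confirm that the adapter matches the base model.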

### <mark style="color:blue;">Tutorials</mark>

#### <mark style="color:green;">Choosing the Right PEFT Method</mark>

Example: If you have a large model and limited GPU memory, consider using LoRA or AdaLoRA for parameter-efficient fine-tuning.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

* Tip: Experiment with different PEFT methods to find the one that works best for your specific task and dataset.
* Best Practice: Consider the trade-offs between memory efficiency and performance when selecting a PEFT method.
* Potential Error: Using a PEFT method that is not compatible with your model architecture or task type.
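
To make the memory trade-off concrete, here is a back-of-the-envelope sketch of how many trainable parameters LoRA adds to a single linear layer. LoRA learns two low-rank matrices, `A` (`r x d_in`) and `B` (`d_out x r`), while the original weight stays frozen. The layer size is a hypothetical value chosen for illustration, not taken from any particular model.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns A (r x d_in) and B (d_out x r); the original weight is frozen.
    """
    return r * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
d_in = d_out = 2048
full = d_in * d_out                          # 4,194,304 frozen weights
added = lora_param_count(d_in, d_out, r=8)   # 32,768 trainable weights
print(f"LoRA trains {added / full:.2%} of this layer's parameters")
# prints: LoRA trains 0.78% of this layer's parameters
```

Doubling `r` doubles the adapter size, which is the main lever when trading memory against capacity.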

#### <mark style="color:green;">Optimising Adapter Hyperparameters</mark>

Example: When configuring a LoRA adapter, experiment with different values for `lora_alpha`, `lora_dropout`, and `r` to find the optimal balance between performance and efficiency.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,  # Rank of the low-rank update matrices; higher means more trainable parameters
    lora_alpha=32,  # Scaling numerator; the update is scaled by lora_alpha / r
    lora_dropout=0.1,  # Dropout applied to the LoRA layers during training
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

* Tip: Start with the default hyperparameters and gradually tune them based on your task and dataset.
* Best Practice: Use a validation set to evaluate the performance of different hyperparameter configurations.
* Potential Error: Setting the hyperparameters to extreme values that lead to poor performance or unstable training.
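
One reason `r` and `lora_alpha` are usually tuned together: in the standard LoRA formulation the adapter update is scaled by `lora_alpha / r`, so changing one without the other changes the effective update size. The sketch below enumerates a small search grid and records that ratio for each candidate. The candidate values are arbitrary assumptions, and variants that use a different scaling rule are not covered.

```python
from itertools import product

# Candidate values are illustrative assumptions, not recommendations.
ranks = [8, 16]
alphas = [16, 32]
dropouts = [0.05, 0.1]

# One dict per configuration; "scaling" is informational only.
grid = [
    {"r": r, "lora_alpha": a, "lora_dropout": p, "scaling": a / r}
    for r, a, p in product(ranks, alphas, dropouts)
]

for cfg in grid:
    print(cfg)
```

Each entry, minus the `scaling` key, could be passed as keyword arguments to `LoraConfig` and evaluated on a validation set.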

#### <mark style="color:green;">Efficient Storage and Sharing of Adapters</mark>

Example: When saving a trained adapter, use a descriptive name that includes the model architecture, PEFT method, and task information.

```python
model.save_pretrained("output/opt-1.3b-lora-custom-task")
```

* Tip: Store adapters separately from the base model to facilitate reuse across different projects.
* Best Practice: Use a version control system like Git to track changes to your adapter configurations and training scripts.
* Potential Error: Overwriting an existing adapter by mistake when saving a new one.
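
A minimal sketch of the naming convention and the overwrite guard described above. The helper `adapter_output_dir` and its arguments are hypothetical; it builds a path from the model, PEFT method, and task, and raises an error instead of silently reusing an existing directory.

```python
from pathlib import Path


def adapter_output_dir(root: str, base_model: str, method: str, task: str) -> Path:
    """Build a descriptive adapter path and fail if it already exists."""
    # Keep only the name after the Hub namespace: "facebook/opt-1.3b" -> "opt-1.3b".
    model_name = base_model.split("/")[-1]
    path = Path(root) / f"{model_name}-{method}-{task}"
    if path.exists():
        raise FileExistsError(f"Refusing to overwrite existing adapter at {path}")
    return path
```

The returned path can then be passed to `model.save_pretrained()`, for example with `adapter_output_dir("output", "facebook/opt-1.3b", "lora", "custom-task")`.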

#### <mark style="color:green;">Combining Multiple Adapters</mark>

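Example: If you have multiple adapters trained on different tasks or datasets, you can load them into the same model under distinct names and use `model.set_adapter()` to choose which one is active.

```python
# Give each adapter its own name; loading two adapters under the default name raises an error.
model.load_adapter("adapter1", adapter_name="adapter_1")
model.load_adapter("adapter2", adapter_name="adapter_2")

# Activate one adapter; call set_adapter() again to switch.
model.set_adapter("adapter_1")
```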

* Tip: Experiment with different adapter combinations to find the ones that yield the best performance for your specific use case.
* Best Practice: Ensure that the adapters you combine are compatible in terms of their PEFT method and model architecture.
* Potential Error: Combining adapters that have conflicting or incompatible weights, leading to poor performance or unexpected behavior.
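
Before loading two adapters into one model, a rough compatibility check can compare what each `adapter_config.json` records. The sketch below assumes both directories contain that file and compares the `peft_type` and `base_model_name_or_path` fields that PEFT writes there. The helper name is hypothetical, and matching fields are a necessary condition, not a guarantee that the adapters work well together.

```python
import json
from pathlib import Path


def adapters_compatible(dir_a: str, dir_b: str) -> bool:
    """Compare the PEFT method and base model recorded in two adapter configs."""
    configs = [
        json.loads((Path(d) / "adapter_config.json").read_text())
        for d in (dir_a, dir_b)
    ]
    keys = ("peft_type", "base_model_name_or_path")
    return all(configs[0].get(k) == configs[1].get(k) for k in keys)
```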

#### <mark style="color:green;">Fine-Tuning Additional Layers with PEFT Adapters</mark>

Example: If your task requires fine-tuning the language model head in addition to the adapter, specify `modules_to_save=["lm_head"]` in your PEFT configuration.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    modules_to_save=["lm_head"],  # Fine-tune the language model head
)
```

* Tip: Be cautious when fine-tuning additional layers, as it may increase the risk of overfitting, especially if you have a small dataset.
* Best Practice: Start by fine-tuning only the adapter and gradually add additional layers if needed based on performance evaluation.
* Potential Error: Fine-tuning too many additional layers, leading to overfitting and poor generalization.
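
To see why extra layers deserve caution, compare the number of trainable parameters in the adapter with the number in a fully trained head. Listing a module in `modules_to_save` trains a full copy of it, so its size is not reduced by the low-rank trick. All dimensions below are hypothetical and chosen only to show the order of magnitude.

```python
# Hypothetical model dimensions, for illustration only.
hidden_size = 2048
vocab_size = 50_000
num_layers = 24
r = 8

# LoRA on q_proj and v_proj: two hidden_size x hidden_size layers per block,
# each adding r * (d_in + d_out) trainable parameters.
lora_params = num_layers * 2 * r * (hidden_size + hidden_size)

# modules_to_save=["lm_head"] trains a full copy of the head (bias ignored here).
lm_head_params = vocab_size * hidden_size

print(f"adapter: {lora_params:,} trainable parameters")
print(f"lm_head: {lm_head_params:,} trainable parameters")
```

Under these assumptions the head has dozens of times more trainable parameters than the adapter, which also increases the size of the saved checkpoint.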

#### <mark style="color:green;">Monitoring Adapter Training</mark>

Example: Use a logging library like Weights and Biases (wandb) to track the training progress, including loss curves, evaluation metrics, and hardware utilization.

```python
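import wandb
from transformers import EarlyStoppingCallback, Trainer

# Start a Weights and Biases run; Trainer logs to it when report_to includes "wandb".
wandb.init(project="peft-adapter-training")

# Assumes model, training_args, train_dataset, and eval_dataset are already defined.
# Early stopping requires training_args to set load_best_model_at_end=True,
# metric_for_best_model, and a periodic evaluation strategy.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # Stop after 3 evaluations without improvement
)
trainer.train()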
```

* Tip: Regularly check the training logs to identify any anomalies or convergence issues.
* Best Practice: Set up early stopping criteria to prevent overfitting and save computational resources.
* Potential Error: Neglecting to monitor the training progress, leading to suboptimal results or wasted resources.
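
The patience-based stopping rule can be illustrated without any training framework. The sketch below is a simplified stand-in and not the implementation of `EarlyStoppingCallback`: it reports whether a sequence of evaluation losses has gone `patience` consecutive evaluations without reaching a new best. The function name and the loss values in the usage note are hypothetical.

```python
def should_stop(eval_losses: list[float], patience: int = 3) -> bool:
    """Return True once the loss fails to improve for `patience` evaluations in a row."""
    best = float("inf")
    stale = 0  # consecutive evaluations without a new best loss
    for loss in eval_losses:
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:
            return True
    return False
```

For instance, `should_stop([1.0, 0.9, 0.95, 0.92, 0.91])` returns `True` because the last three evaluations never beat the best loss of 0.9.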

Remember to refer to the official Hugging Face PEFT documentation for the most up-to-date information and API references.

These code examples are meant to provide a starting point and may need to be adapted to your specific use case.

