# Llama2 - LoRA Configuration

This is the default LoRA configuration.

```yaml
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
```

<mark style="color:yellow;">**`lora_r`**</mark>

This parameter determines the rank of the low-rank matrices used in LoRA.

It controls the capacity and expressiveness of the LoRA adaptation. A higher value of <mark style="color:yellow;">**`lora_r`**</mark> allows for more fine-grained adaptations but also increases the number of trainable parameters.

In this configuration, **`lora_r`** is set to 32.
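As an illustration (this is a sketch, not Axolotl code), the number of trainable parameters LoRA adds to a single linear layer grows linearly with `lora_r`:

```python
# Sketch: trainable parameters LoRA adds to one frozen linear layer.
# LoRA factorizes the weight update as B @ A, where A has shape
# (r, d_in) and B has shape (d_out, r), so the cost is r * (d_in + d_out).
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    return r * d_in + d_out * r

# A 4096x4096 projection (the hidden size of Llama2-7B) with lora_r = 32:
print(lora_trainable_params(4096, 4096, 32))  # 262144
```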

<mark style="color:yellow;">**`lora_alpha`**</mark>

This parameter controls the scaling factor applied to the LoRA adaptation.

It determines the contribution of the LoRA matrices to the original model's weights.

A higher value of **`lora_alpha`** gives more importance to the LoRA adaptation. In this configuration, **`lora_alpha`** is set to 16.
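Concretely, PEFT-style LoRA implementations scale the update by `lora_alpha / lora_r` before adding it to the frozen weight; a minimal sketch of that relationship:

```python
# Sketch: the LoRA update is scaled by alpha / r before being merged,
# i.e. W' = W + (lora_alpha / lora_r) * (B @ A).
def lora_scaling(lora_alpha: int, lora_r: int) -> float:
    return lora_alpha / lora_r

# With this configuration (alpha = 16, r = 32), the update is halved:
print(lora_scaling(16, 32))  # 0.5
```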

<mark style="color:yellow;">**`lora_dropout`**</mark>

This parameter specifies the dropout rate applied to the LoRA matrices during training.

Dropout is a regularization technique that helps prevent overfitting.

A value of 0.05 means that 5% of the elements in the LoRA matrices will be randomly set to zero during training.

<mark style="color:yellow;">**`lora_target_modules`**</mark>

This parameter specifies the names of the modules in the model architecture where LoRA will be applied.

This parameter is not set in the configuration above; instead, <mark style="color:yellow;">**`lora_target_linear: true`**</mark> applies LoRA to every linear module. When set explicitly, the most common targets are the <mark style="color:yellow;">**`q_proj`**</mark> and <mark style="color:yellow;">**`v_proj`**</mark> modules, the query and value projection matrices in the attention mechanism.

Other potential target modules are <mark style="color:yellow;">**`k_proj`**</mark>, <mark style="color:yellow;">**`o_proj`**</mark>, <mark style="color:yellow;">**`gate_proj`**</mark>, <mark style="color:yellow;">**`down_proj`**</mark>, and <mark style="color:yellow;">**`up_proj`**</mark>.
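If you want LoRA on specific projections rather than every linear layer, the list can be written out explicitly. A sketch of the Axolotl `lora_target_modules` key, with the optional extra targets left commented:

```yaml
lora_target_modules:
  - q_proj
  - v_proj
  # - k_proj
  # - o_proj
  # - gate_proj
  # - down_proj
  # - up_proj
```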

<mark style="color:yellow;">**`lora_target_linear`**</mark>

In this configuration, it is set to `true`.

When set to `true`, LoRA is applied to all linear modules in the model, so there is no need to list <mark style="color:yellow;">**`lora_target_modules`**</mark> by hand.

<mark style="color:yellow;">**`peft_layers_to_transform`**</mark>

This parameter allows you to specify the indices of the layers to which LoRA should be applied.

If not specified, LoRA will be applied to all layers by default.
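For example, to restrict LoRA to the first four transformer layers (the indices here are illustrative, not a recommendation):

```yaml
peft_layers_to_transform: [0, 1, 2, 3]
```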

<mark style="color:yellow;">**`lora_modules_to_save`**</mark>

This parameter is relevant when you have added new tokens to the tokenizer. In such cases, you may need to save certain LoRA modules that are aware of the new tokens.

For LLaMA and Mistral models, you typically need to save the <mark style="color:yellow;">**`embed_tokens`**</mark> and <mark style="color:yellow;">**`lm_head`**</mark> modules. <mark style="color:yellow;">**`embed_tokens`**</mark> converts tokens to embeddings, and <mark style="color:yellow;">**`lm_head`**</mark> converts embeddings to token probabilities.

In this configuration, this parameter is not set.
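When new tokens have been added to the tokenizer, the setting for LLaMA-family models would look like this:

```yaml
lora_modules_to_save:
  - embed_tokens
  - lm_head
```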

<mark style="color:yellow;">**`lora_fan_in_fan_out`**</mark>

This parameter tells LoRA how the target layer stores its weights. Set it to <mark style="color:yellow;">**`true`**</mark> when the layer stores its weight matrix as (fan_in, fan_out), as GPT-2-style `Conv1D` layers do. Llama's projections are standard linear layers, so in this configuration it is left unset and defaults to <mark style="color:yellow;">**`false`**</mark>.

These hyperparameters allow you to control various aspects of the LoRA adaptation during fine-tuning.

The optimal values for these hyperparameters may vary depending on your specific task, dataset, and model architecture.

It's recommended to experiment with different configurations and monitor the performance to find the best settings for your use case.
