The adapter field specifies whether to use 'lora' for fine-tuning; leave it blank to train all parameters of the original model (a full fine-tune). With 'lora', only a small set of added low-rank adapter parameters is trained, which reduces memory use and gives you control over which parts of the model are adapted.
If you already have a trained LoRA adapter that you want to load, specify its directory path here. This is useful for testing the model after training. Ensure this value matches the path where the LoRA adapter was saved.
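Taken together, these two fields might look like this in the YAML config (the directory path is hypothetical):

```yaml
# Use LoRA instead of full fine-tuning
adapter: lora
# Load a previously trained adapter (hypothetical path); leave unset to train a new one
lora_model_dir: ./outputs/my-lora-run
```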
lora_r is the rank of the low-rank decomposition used by LoRA. Each adapted weight matrix is approximated by the product of two much smaller matrices of rank r, so this value controls the capacity (and parameter count) of the adapter. A higher rank can capture more task-specific information but requires more memory and compute.

lora_alpha is a scaling hyperparameter that controls the strength of the LoRA mechanism: the adapter's output is scaled by lora_alpha / lora_r before being added to the original model's output. Higher values give the adapter more influence relative to the base weights.

lora_dropout is the dropout rate applied to the LoRA layers. Dropout is a regularization technique that randomly zeroes a fraction of activations during training, which helps prevent overfitting.

lora_target_modules is a list of module names within the transformer architecture that the LoRA adapters should be attached to. In this example, it targets the 'q_proj' and 'v_proj' (query and value projection) modules. You can uncomment and add more modules as needed.

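A minimal sketch of these four hyperparameters in YAML (the values are illustrative, not recommendations):

```yaml
lora_r: 8            # rank of the low-rank decomposition
lora_alpha: 16       # adapter output scaled by lora_alpha / lora_r
lora_dropout: 0.05   # dropout rate applied to the LoRA layers
lora_target_modules: # attach adapters to the attention query/value projections
  - q_proj
  - v_proj
```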
If set to 'true', LoRA adapters are applied to all linear layers in the model, rather than only the modules listed in lora_target_modules. This gives a more comprehensive adaptation at the cost of more trainable parameters.
lora_modules_to_save specifies additional modules whose full weights should be trained and saved alongside the adapter. It's essential when adding new tokens to the tokenizer, because the embedding and output layers must learn those new tokens. In this example, the modules are commented out, but you should specify the modules to save as needed.

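For instance, if you want to adapt every linear layer and are also extending the tokenizer, the configuration could look like this (the module names assume a LLaMA-style architecture):

```yaml
lora_target_linear: true  # adapt all linear layers instead of an explicit list
lora_modules_to_save:     # fully train these so newly added tokens are learned
  - embed_tokens
  - lm_head
```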
After training completes, the model is saved to the directory specified by lora_out_dir. If you merge the adapter into the base model, a subdirectory named 'merged' is created under this directory. Ensure that lora_model_dir points to this directory if you intend to use the trained model.
lora_fan_in_fan_out tells LoRA how the weights of the targeted modules are laid out. Set it to 'true' if those layers store weights in (fan_in, fan_out) order, as GPT-2-style Conv1D layers do; for ordinary linear layers, leave it at 'false' so the adapter is applied with the correct orientation.
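These two options might be set as follows (the output path is hypothetical):

```yaml
lora_out_dir: ./outputs/lora-out  # adapter saved here; a 'merged' subdirectory appears after merging
lora_fan_in_fan_out: false        # standard linear layout; set true for GPT-2-style Conv1D weights
```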
relora_steps specifies the number of training steps between ReLoRA (Restartable LoRA) restarts. ReLoRA periodically merges the adapter into the base weights and reinitializes it, which can improve convergence; this field sets the restart frequency.

relora_warmup_steps sets the number of warm-up steps after each ReLoRA restart. Warming up gradually after a restart can improve training stability.

When set to 'true', LoRA weight merges during ReLoRA restarts are performed on the CPU, potentially saving GPU memory. This is useful when GPU memory is a limiting factor during training.
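The three ReLoRA settings described above could be combined like this (the step counts are illustrative):

```yaml
relora_steps: 150        # restart the LoRA adapter every 150 training steps
relora_warmup_steps: 10  # warm-up steps after each restart
relora_cpu_offload: true # perform weight merges on the CPU to save GPU memory
```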