The adapter field specifies whether to use 'lora' for fine-tuning; leave it blank to train all parameters of the original model (a full fine-tune). With 'lora', only a small set of added low-rank adapter parameters is trained, which reduces memory use and gives you control over which parts of the model are adapted.
If you already have a trained LoRA adapter that you want to load, specify its directory path here. This is useful for testing the model after training. Ensure this value matches the path where the LoRA adapter was saved.
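Taken together, these two fields might look like this in the YAML config (the directory path is hypothetical):

```yaml
# Use LoRA instead of full fine-tuning
adapter: lora
# Load a previously trained adapter (hypothetical path); leave unset to train a new one
lora_model_dir: ./outputs/my-lora-run
```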
lora_r is the rank of the low-rank decomposition used by LoRA. Each adapted weight matrix is approximated by the product of two much smaller matrices of rank r, so this value controls the capacity (and parameter count) of the adapter. A higher rank can capture more task-specific information but requires more memory and compute.

lora_alpha is a scaling hyperparameter that controls the strength of the LoRA mechanism: the adapter's output is scaled by lora_alpha / lora_r before being added to the original model's output. Higher values give the adapter more influence relative to the base weights.

lora_dropout is the dropout rate applied to the LoRA layers. Dropout is a regularization technique that randomly zeroes a fraction of activations during training, which helps prevent overfitting.

lora_target_modules is a list of module names within the transformer architecture that the LoRA adapters should be attached to. In this example, it targets the 'q_proj' and 'v_proj' (query and value projection) modules. You can uncomment and add more modules as needed.

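A minimal sketch of these four hyperparameters in YAML (the values are illustrative, not recommendations):

```yaml
lora_r: 8            # rank of the low-rank decomposition
lora_alpha: 16       # adapter output scaled by lora_alpha / lora_r
lora_dropout: 0.05   # dropout rate applied to the LoRA layers
lora_target_modules: # attach adapters to the attention query/value projections
  - q_proj
  - v_proj
```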
If set to 'true', LoRA adapters are applied to all linear layers in the model, rather than only the modules listed in lora_target_modules. This gives a more comprehensive adaptation at the cost of more trainable parameters.
lora_modules_to_save specifies additional modules whose full weights should be trained and saved alongside the adapter. It's essential when adding new tokens to the tokenizer, because the embedding and output layers must learn those new tokens. In this example, the modules are commented out, but you should specify the modules to save as needed.

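For instance, if you want to adapt every linear layer and are also extending the tokenizer, the configuration could look like this (the module names assume a LLaMA-style architecture):

```yaml
lora_target_linear: true  # adapt all linear layers instead of an explicit list
lora_modules_to_save:     # fully train these so newly added tokens are learned
  - embed_tokens
  - lm_head
```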
After training completes, the model is saved to the directory specified by lora_out_dir. If you merge the adapter into the base model, a subdirectory named 'merged' is created under this directory. Ensure that lora_model_dir points to this directory if you intend to use the trained model.
lora_fan_in_fan_out tells LoRA how the weights of the targeted modules are laid out. Set it to 'true' if those layers store weights in (fan_in, fan_out) order, as GPT-2-style Conv1D layers do; for ordinary linear layers, leave it at 'false' so the adapter is applied with the correct orientation.
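These two options might be set as follows (the output path is hypothetical):

```yaml
lora_out_dir: ./outputs/lora-out  # adapter saved here; a 'merged' subdirectory appears after merging
lora_fan_in_fan_out: false        # standard linear layout; set true for GPT-2-style Conv1D weights
```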
relora_steps specifies the number of training steps between ReLoRA (Restartable LoRA) restarts. ReLoRA periodically merges the adapter into the base weights and reinitializes it, which can improve convergence; this field sets the restart frequency.

relora_warmup_steps sets the number of warm-up steps after each ReLoRA restart. Warming up gradually after a restart can improve training stability.

When set to 'true', LoRA weight merges during ReLoRA restarts are performed on the CPU, potentially saving GPU memory. This is useful when GPU memory is a limiting factor during training.
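The three ReLoRA settings described above could be combined like this (the step counts are illustrative):

```yaml
relora_steps: 150        # restart the LoRA adapter every 150 training steps
relora_warmup_steps: 10  # warm-up steps after each restart
relora_cpu_offload: true # perform weight merges on the CPU to save GPU memory
```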