# Merge LoRA Instructions

### <mark style="color:blue;">Training LoRA (Low-Rank Adaptation)</mark>

* Axolotl allows you to fine-tune a base model using LoRA, which is a parameter-efficient fine-tuning method.
* You can train a LoRA adapter on top of the base model using a configuration file that specifies the training details, such as the dataset, hyperparameters, and LoRA-specific settings.

#### <mark style="color:green;">Example configuration snippet</mark>

```yaml
load_in_8bit: true
load_in_4bit: false
strict: false
sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
```
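The snippet above covers model loading and packing; a LoRA run also needs the adapter-specific settings. An illustrative fragment (the values shown are examples — tune rank, alpha, and dropout for your model):

```yaml
adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
```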

#### <mark style="color:green;">Merging LoRA with the Base Model</mark>

* After training the LoRA adapter, you need to merge it with the base model to create a single, fine-tuned model.
* Axolotl provides a command to merge the LoRA adapter using the <mark style="color:yellow;">**`axolotl.cli.merge_lora`**</mark> command.
* Typical command to merge a local LoRA:

{% code overflow="wrap" %}

```bash
python3 -m axolotl.cli.merge_lora examples/llama-3/lora-8b.yml --lora_model_dir="llama4-out"
```

{% endcode %}
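Conceptually, the merge folds the low-rank update into the base weights: for each adapted layer, `W_merged = W + (alpha / r) * B @ A`, after which the adapter matrices can be discarded. A toy numpy sketch of the arithmetic (dimensions are illustrative):

```python
import numpy as np

# Toy dimensions: a 6x6 "base" weight with a rank-2 LoRA update.
rng = np.random.default_rng(0)
d, r, alpha = 6, 2, 16

W = rng.normal(size=(d, d))   # frozen base weight
A = rng.normal(size=(r, d))   # LoRA down-projection (trained)
B = np.zeros((d, r))          # LoRA up-projection (zero-initialized)
B[:, 0] = rng.normal(size=d)  # pretend training updated it

# Merging folds the low-rank update into the base weights once,
# so inference no longer needs the adapter matrices:
W_merged = W + (alpha / r) * (B @ A)

# The merged matrix gives the same outputs as base + adapter applied separately.
x = rng.normal(size=d)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
print(np.allclose(y_adapter, y_merged))  # True
```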

* If the LoRA model is not stored locally, you may need to download it first and specify the local directory using the <mark style="color:yellow;">**`--lora_model_dir`**</mark> argument.
* If you encounter CUDA out-of-memory errors during merging, you can run the merge on the CPU in system RAM by setting <mark style="color:yellow;">**`CUDA_VISIBLE_DEVICES=""`**</mark> before the merge command.
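For example, to force the merge onto the CPU (using the same example paths as above):

{% code overflow="wrap" %}

```bash
CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora examples/llama-3/lora-8b.yml --lora_model_dir="llama4-out"
```

{% endcode %}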

<details>

<summary><mark style="color:green;"><strong>merge_lora script analysis</strong></mark></summary>

The script is a command-line interface (CLI) tool to merge a trained LoRA (Low-Rank Adaptation) model into a base model.

### <mark style="color:blue;">Imported external classes and modules</mark>

* <mark style="color:yellow;">**`Path`**</mark> from the <mark style="color:yellow;">**`pathlib`**</mark> module: This class provides an object-oriented interface for working with file and directory paths.
* <mark style="color:yellow;">**`fire`**</mark> module: This is a library for automatically generating command-line interfaces from Python functions and classes.
* <mark style="color:yellow;">**`transformers`**</mark> module: This is the Hugging Face Transformers library, which provides state-of-the-art pre-trained models for natural language processing tasks.
* <mark style="color:yellow;">**`do_merge_lora`**</mark>, <mark style="color:yellow;">**`load_cfg`**</mark>, and <mark style="color:yellow;">**`print_axolotl_text_art`**</mark> from the <mark style="color:yellow;">**`axolotl.cli`**</mark> module: These are custom functions specific to the Axolotl project, likely related to merging LoRA models and loading configuration files.
* <mark style="color:yellow;">**`TrainerCliArgs`**</mark> from the <mark style="color:yellow;">**`axolotl.common.cli`**</mark> module: This is likely a custom class defining the command-line arguments for the trainer.

The script defines a <mark style="color:yellow;">**`do_cli`**</mark> function that takes a <mark style="color:yellow;">**`config`**</mark> parameter (default `Path("examples/")`) and any additional keyword arguments (<mark style="color:yellow;">**`**kwargs`**</mark>).

#### <mark style="color:green;">This function is the main entry point for the CLI.</mark>

Inside the <mark style="color:yellow;">**`do_cli`**</mark> function:

* It prints the Axolotl text art using the <mark style="color:yellow;">**`print_axolotl_text_art`**</mark> function.
* It creates a <mark style="color:yellow;">**`transformers.HfArgumentParser`**</mark> instance with <mark style="color:yellow;">**`TrainerCliArgs`**</mark> to parse the command-line arguments.
* It parses the command-line arguments into <mark style="color:yellow;">**`parsed_cli_args`**</mark> using the <mark style="color:yellow;">**`parse_args_into_dataclasses`**</mark> method.
* It sets <mark style="color:yellow;">**`parsed_cli_args.merge_lora`**</mark> to <mark style="color:yellow;">**`True`**</mark>.
* It loads the configuration using the <mark style="color:yellow;">**`load_cfg`**</mark> function with the provided <mark style="color:yellow;">**`config`**</mark> path and additional keyword arguments.
* It performs some validation and sets default values for the <mark style="color:yellow;">**`lora_model_dir`**</mark> and <mark style="color:yellow;">**`output_dir`**</mark> based on the loaded configuration.
* It sets <mark style="color:yellow;">**`load_in_4bit`**</mark>, <mark style="color:yellow;">**`load_in_8bit`**</mark>, <mark style="color:yellow;">**`flash_attention`**</mark>, <mark style="color:yellow;">**`deepspeed`**</mark>, and <mark style="color:yellow;">**`fsdp`**</mark> to <mark style="color:yellow;">**`False`**</mark> or <mark style="color:yellow;">**`None`**</mark>.
* It calls the <mark style="color:yellow;">**`do_merge_lora`**</mark> function with the loaded configuration (<mark style="color:yellow;">**`parsed_cfg`**</mark>) and parsed command-line arguments (<mark style="color:yellow;">**`parsed_cli_args`**</mark>).

Finally, if the script is run as the main module (<mark style="color:yellow;">**`__name__ == "__main__"`**</mark>), it uses the <mark style="color:yellow;">**`fire.Fire`**</mark> function to automatically generate a command-line interface for the <mark style="color:yellow;">**`do_cli`**</mark> function.
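Condensed, the structure described above looks roughly like this — a sketch reconstructed from the walkthrough, not the verbatim source:

```python
from pathlib import Path

import fire
import transformers

from axolotl.cli import do_merge_lora, load_cfg, print_axolotl_text_art
from axolotl.common.cli import TrainerCliArgs


def do_cli(config: Path = Path("examples/"), **kwargs):
    # Banner, then parse CLI args into the TrainerCliArgs dataclass
    print_axolotl_text_art()
    parser = transformers.HfArgumentParser((TrainerCliArgs,))
    parsed_cli_args, _ = parser.parse_args_into_dataclasses(
        return_remaining_strings=True
    )
    parsed_cli_args.merge_lora = True

    # Load the YAML config; disable quantized loading and distributed
    # settings, since the merge runs in full/half precision
    parsed_cfg = load_cfg(config, merge_lora=True, **kwargs)
    parsed_cfg.load_in_4bit = False
    parsed_cfg.load_in_8bit = False
    parsed_cfg.flash_attention = False
    parsed_cfg.deepspeed = None
    parsed_cfg.fsdp = None

    do_merge_lora(cfg=parsed_cfg, cli_args=parsed_cli_args)


if __name__ == "__main__":
    fire.Fire(do_cli)
```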

To learn more about the external classes and modules used in this script:

* For the <mark style="color:yellow;">**`fire`**</mark> module, refer to the Fire documentation: <https://github.com/google/python-fire>
* For the <mark style="color:yellow;">**`transformers`**</mark> module, refer to the Hugging Face Transformers documentation: <https://huggingface.co/docs/transformers/>
* For `axolotl.cli` and `axolotl.common.cli`, these are likely custom modules specific to the Axolotl project. You should refer to the project's documentation or codebase for more information.

</details>


* If you trained a QLoRA (Quantized LoRA) model that can only fit into GPU memory at 4-bit quantization, merging can be challenging due to memory constraints.
* To merge a QLoRA model, you need to ensure that the model remains quantized throughout the merging process.
* Modify the merge script to load the model with the appropriate quantization configuration, such as using the <mark style="color:yellow;">**`bitsandbytes`**</mark> library for 4-bit quantization.
* Use libraries like <mark style="color:yellow;">**`accelerate`**</mark> for managing device memory and offloading parts of the model to CPU or disk if necessary.
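A sketch of loading the base model quantized before applying the adapter, using the `BitsAndBytesConfig` API from transformers together with `peft` — the model id and adapter path are placeholders; adapt them to your run:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# device_map="auto" lets accelerate offload layers to CPU/disk
# when GPU memory runs out.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder: your base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "./qlora-out")  # placeholder adapter dir
```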

#### <mark style="color:green;">Warnings and Considerations</mark>

* Axolotl may raise warnings related to sample packing without flash attention or SDP attention, indicating that it does not handle cross-attention in those cases.
* It is recommended to set <mark style="color:yellow;">**`load_in_8bit: true`**</mark> for LoRA fine-tuning, even if the warning suggests otherwise.
* Merging quantized models, especially with parameter-efficient fine-tuning methods like QLoRA, can be complex and may require adjustments to the standard merging scripts.

#### <mark style="color:green;">Merging Duration</mark>

* The time taken to merge a LoRA adapter back to the base model depends on the model size and hardware.
* For a 70B parameter model fine-tuned on 4 A100 GPUs, the merging process can take a significant amount of time (over an hour or more).

These are the key ideas and considerations when merging models with Axolotl: configure the training and merging processes carefully, handle quantization appropriately, and be aware of potential memory constraints and warnings.

Consulting the official Axolotl documentation and seeking guidance from the Axolotl community can provide further assistance in navigating the model merging process.

### <mark style="color:blue;">Tips and Tricks</mark>

Ensure that you have generated the LoRA model before attempting to merge it. The typical workflow is to first train the LoRA model and then merge it in a separate command.
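A typical two-step sequence (paths and flags follow this page's examples):

{% code overflow="wrap" %}

```bash
# 1) Train the adapter; it is written to the config's output_dir (here ./qlora-out)
accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml

# 2) Merge it into the base model in a separate command
python3 scripts/finetune.py examples/openllama-3b/qlora.yml --merge_lora --lora_model_dir="./qlora-out" --load_in_8bit=False --load_in_4bit=False
```

{% endcode %}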

Use the <mark style="color:yellow;">**`--merge_lora`**</mark> flag along with the <mark style="color:yellow;">**`--lora_model_dir`**</mark> flag to specify the directory containing the trained LoRA model.

For example:

{% code overflow="wrap" %}

```bash
accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml --merge_lora --lora_model_dir="./qlora-out"
```

{% endcode %}

Set <mark style="color:yellow;">**`--load_in_8bit=False`**</mark> and <mark style="color:yellow;">**`--load_in_4bit=False`**</mark> when merging the LoRA model to avoid compatibility issues. The 4-bit and 8-bit loading options are not supported for merging.

If you encounter the error "ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is", try using the old command for merging:

{% code overflow="wrap" %}

```bash
python3 scripts/finetune.py examples/code-llama/7b/qlora.yml --merge_lora --lora_model_dir="./qlora-out" --load_in_8bit=False --load_in_4bit=False
```

{% endcode %}

Make sure you have the latest version of Axolotl installed, as some issues might have been resolved in newer versions.

If you encounter CUDA-related errors, try setting the <mark style="color:yellow;">**`CUDA_VISIBLE_DEVICES`**</mark> environment variable to specify the desired GPU device. For example:

{% code overflow="wrap" fullWidth="false" %}

```bash
CUDA_VISIBLE_DEVICES="0" accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml --merge_lora --lora_model_dir="./qlora-out" --load_in_8bit=False --load_in_4bit=False
```

{% endcode %}

After merging the LoRA model, the output directory may contain only the merged weights (e.g. <mark style="color:yellow;">**`pytorch_model.bin`**</mark>); attempting to quantize it directly may fail.

To quantize the merged model, you may need to copy additional files (e.g., <mark style="color:yellow;">**`tokenizer.model`**</mark>) from the original model and use external tools like <mark style="color:yellow;">**`llama.cpp`**</mark> for quantization.
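For example, a conversion and quantization pass with llama.cpp might look like the following — script and binary names differ between llama.cpp versions, so check your checkout:

{% code overflow="wrap" %}

```bash
# Copy tokenizer files the merge may not have written into the output dir
cp /path/to/base-model/tokenizer.model ./merged-out/

# Convert the merged HF checkpoint to GGUF, then quantize it
python3 convert_hf_to_gguf.py ./merged-out --outfile merged-f16.gguf
./llama-quantize merged-f16.gguf merged-q4_k_m.gguf q4_k_m
```

{% endcode %}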

If you encounter issues with missing files or directories, double-check the paths specified in the command and ensure that the necessary files and directories exist.

Remember to refer to the Axolotl documentation and the GitHub issues for the most up-to-date information and troubleshooting steps.
