# Merge LoRA Instructions
## Training LoRA (Low-Rank Adaptation)
Axolotl allows you to fine-tune a base model using LoRA, which is a parameter-efficient fine-tuning method.
You can train a LoRA adapter on top of the base model using a configuration file that specifies the training details, such as the dataset, hyperparameters, and LoRA-specific settings.
Example configuration snippet (a minimal, illustrative sketch; the model, dataset, and hyperparameter values below are placeholders patterned after Axolotl's LoRA examples):
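```yaml
base_model: meta-llama/Meta-Llama-3-8B   # placeholder base model
load_in_8bit: true

adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: teknium/GPT4-LLM-Cleaned       # placeholder dataset
    type: alpaca

output_dir: ./lora-out
micro_batch_size: 2
num_epochs: 3
learning_rate: 0.0002
```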
## Merging LoRA with the Base Model
After training the LoRA adapter, you need to merge it with the base model to create a single, fine-tuned model.
Axolotl provides the `axolotl.cli.merge_lora` command to merge the LoRA adapter. A typical command to merge a local LoRA:

```bash
python3 -m axolotl.cli.merge_lora examples/llama-3/lora-8b.yml --lora_model_dir="llama4-out"
```

If the LoRA model is not stored locally, you may need to download it first and specify the local directory using the `--lora_model_dir` argument. If you encounter CUDA memory issues during merging, you can try merging in system RAM by setting `CUDA_VISIBLE_DEVICES=""` before the merge command.
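For instance, reusing the command above (same placeholder config path and adapter directory):

```bash
CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora examples/llama-3/lora-8b.yml --lora_model_dir="llama4-out"
```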
If you trained a QLoRA (Quantized LoRA) model that can only fit into GPU memory at 4-bit quantization, merging can be challenging due to memory constraints.
To merge a QLoRA model, you need to ensure that the model remains quantized throughout the merging process.
Modify the merge script to load the model with the appropriate quantization configuration, such as using the `bitsandbytes` library for 4-bit quantization. Use libraries like `accelerate` to manage device memory and offload parts of the model to CPU or disk if necessary.
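One possible shape for such a modified merge script, as a hedged Python sketch (the model ID, adapter directory, and output paths are placeholders, and it assumes a `peft` version whose `merge_and_unload` can handle a 4-bit base by dequantizing):

```python
# Sketch: merge a QLoRA adapter while the base model is loaded in 4-bit.
# Placeholder paths throughout; assumes recent transformers/peft/bitsandbytes.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "meta-llama/Meta-Llama-3-8B"  # placeholder base model
adapter_dir = "./lora-out"              # placeholder adapter directory

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# device_map="auto" lets accelerate shard and offload layers across
# GPU, CPU, and disk as needed.
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)

# Fold the adapter weights into the base model and save the result.
merged = model.merge_and_unload()
merged.save_pretrained("./merged-out")
AutoTokenizer.from_pretrained(base_id).save_pretrained("./merged-out")
```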
## Warnings and Considerations
Axolotl may raise warnings related to sample packing without flash attention or SDP attention, indicating that it does not handle cross-attention in those cases.
It is recommended to set `load_in_8bit: true` for LoRA fine-tuning, even if the warning suggests otherwise. Merging quantized models, especially with parameter-efficient fine-tuning methods like QLoRA, can be complex and may require adjustments to the standard merging scripts.
## Merging Duration
The time taken to merge a LoRA adapter back into the base model depends on the model size and hardware. For a 70B-parameter model fine-tuned on 4 A100 GPUs, the merge can take over an hour.
These are the key ideas and considerations when merging models with Axolotl: configure the training and merging steps carefully, handle quantization appropriately, and be aware of potential memory constraints and warnings. For further help, consult the official Axolotl documentation or ask the Axolotl community.
## Tips and Tricks
Ensure that you have generated the LoRA model before attempting to merge it.
The typical workflow is to first train the LoRA model and then merge it in a separate command.
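A minimal sketch of that two-step workflow (the config name and adapter output directory are placeholders):

```bash
# 1. Train the LoRA adapter; Axolotl writes it to the config's output_dir.
accelerate launch -m axolotl.cli.train your_config.yml

# 2. Merge the trained adapter into the base model.
python3 -m axolotl.cli.merge_lora your_config.yml --lora_model_dir="./lora-out"
```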
Use the `--merge_lora` flag along with the `--lora_model_dir` flag to specify the directory containing the trained LoRA model (for an example, see the command after the next tip).

Set `--load_in_8bit=False` and `--load_in_4bit=False` when merging the LoRA model to avoid compatibility issues; the 4-bit and 8-bit loading options are not supported for merging.
If you encounter the error `ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is`, try using the old command for merging:
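A sketch of that older script-based invocation, combining the flags described above (the config and adapter paths are placeholders):

```bash
python3 scripts/finetune.py your_config.yml \
    --merge_lora --lora_model_dir="./lora-out" \
    --load_in_8bit=False --load_in_4bit=False
```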
Make sure you have the latest version of Axolotl installed, as some issues might have been resolved in newer versions.
If you encounter CUDA-related errors, try setting the `CUDA_VISIBLE_DEVICES` environment variable to specify the desired GPU device. For example:
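(Paths below are placeholders; `0` selects the first GPU.)

```bash
CUDA_VISIBLE_DEVICES=0 python3 -m axolotl.cli.merge_lora your_config.yml --lora_model_dir="./lora-out"
```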
After merging the LoRA model, only the `pytorch_model.bin` file will work; attempting to quantize it directly may fail. To quantize the merged model, you may need to copy additional files (e.g., `tokenizer.model`) from the original model and use external tools like `llama.cpp` for quantization.
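A rough sketch of that quantization flow (paths are placeholders, and the `llama.cpp` script and binary names vary between versions):

```bash
# Copy the tokenizer from the original base model into the merged output.
cp /path/to/base-model/tokenizer.model ./merged-out/

# Convert the merged HF checkpoint to GGUF, then quantize it.
python3 llama.cpp/convert_hf_to_gguf.py ./merged-out --outfile merged-f16.gguf
./llama.cpp/llama-quantize merged-f16.gguf merged-q4_k_m.gguf q4_k_m
```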
If you encounter issues with missing files or directories, double-check the paths specified in the command and ensure that the necessary files and directories exist.
Remember to refer to the Axolotl documentation and the GitHub issues for the most up-to-date information and troubleshooting steps.