# Merging Model Weights

To merge your fine-tuned LoRA adapter into the base model and produce a single standalone model for inference, use the `merge_and_unload()` method from the `peft` library.

Here's how you can do it:

<mark style="color:green;">Load the base model and the LoRA adapter</mark>

```python
from transformers import LlamaForCausalLM, AutoTokenizer
from peft import PeftModel

# Directory holding the original base model weights and tokenizer
base_model_path = "/home/paperspace/axolotl/models/Meta-Llama-3-8B"
# Directory holding the trained LoRA adapter (the fine-tuning output)
lora_model_path = "path/to/llama3-out"

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base_model = LlamaForCausalLM.from_pretrained(base_model_path)
# Wrap the base model with the adapter weights
lora_model = PeftModel.from_pretrained(base_model, lora_model_path)
```

<mark style="color:green;">Merge the LoRA adapter with the base model</mark>

```python
merged_model = lora_model.merge_and_unload()
```
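To make the merge step concrete: in standard LoRA, merging folds the low-rank update into each adapted weight matrix, `W' = W + (alpha / r) * B @ A`, so the adapter layers are no longer needed at inference time. The block below is a toy sketch of that arithmetic on tiny matrices in plain Python. The helper names (`matmul`, `merge_lora`) and the numbers are made up for illustration, and this is not how `peft` implements the merge internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (illustrative values only).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (2, 1)
lora_a = [[0.5, 0.5]]     # shape (1, 2)

merged = merge_lora(weight, lora_b, lora_a, alpha=2, rank=1)
print(merged)
```

After the merge, a single matrix carries both the base weights and the fine-tuned update, which is why the result can be saved and served like an ordinary model.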

<mark style="color:green;">Save the merged model in the desired format (e.g., SafeTensors)</mark>

```python
merged_model.save_pretrained("path/to/merged_model", safe_serialization=True)
```

This will save the merged model in the SafeTensors format, which is compatible with TensorRT-LLM.
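The tokenizer loaded in the first step can also be written to the same directory with its own `save_pretrained` method, which produces the tokenizer files alongside the weights. A minimal sketch, assuming `merged_model` and `tokenizer` are the objects created above; the helper name `save_merged_checkpoint` is hypothetical:

```python
def save_merged_checkpoint(model, tokenizer, output_dir):
    """Save the merged weights as SafeTensors and the tokenizer files into one directory."""
    # Weights (and the model config) go to output_dir in SafeTensors format.
    model.save_pretrained(output_dir, safe_serialization=True)
    # Tokenizer files go to the same directory.
    tokenizer.save_pretrained(output_dir)
    return output_dir


# Example call, assuming the objects from the earlier steps exist:
# save_merged_checkpoint(merged_model, tokenizer, "path/to/merged_model")
```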

<mark style="color:green;">Copy the necessary files from the base model directory to the merged model directory</mark>

* `config.json`
* `generation_config.json`
* `special_tokens_map.json`
* `tokenizer_config.json`
* `tokenizer.json`

You can use the following commands to copy these files:

```bash
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/config.json path/to/merged_model/
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/generation_config.json path/to/merged_model/
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/special_tokens_map.json path/to/merged_model/
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/tokenizer_config.json path/to/merged_model/
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/tokenizer.json path/to/merged_model/
```

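Replace `path/to/merged_model` with the actual path where you saved the merged model. Note that `save_pretrained` normally writes `config.json` and `generation_config.json` itself, so copying those two files is typically only needed if they are missing from the merged model directory.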
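If you prefer to stay in Python, the same copy can be sketched with the standard library. The helper name `copy_support_files` is hypothetical; it copies whichever of the listed files exist in the base model directory and reports the ones it could not find rather than failing.

```python
import shutil
from pathlib import Path

# The files listed above.
SUPPORT_FILES = [
    "config.json",
    "generation_config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]


def copy_support_files(src_dir, dst_dir, names=SUPPORT_FILES):
    """Copy each named file from src_dir to dst_dir; return the names missing from src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        if (src / name).is_file():
            shutil.copy2(src / name, dst / name)
        else:
            missing.append(name)
    return missing


# Example call with the paths used in this guide:
# copy_support_files("/home/paperspace/axolotl/models/Meta-Llama-3-8B", "path/to/merged_model")
```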

After following these steps, you should have a merged model directory that contains the necessary files for TensorRT-LLM optimization:

* `config.json`
* `generation_config.json`
* `model.safetensors` (a single SafeTensors file), or several shards named like `model-00001-of-0000N.safetensors` together with a `model.safetensors.index.json` index
* `special_tokens_map.json`
* `tokenizer_config.json`
* `tokenizer.json`

You can now use this merged model directory as input to TensorRT-LLM for optimization and deployment.

Note: The number of SafeTensors files generated during saving varies with the model size and configuration. Make sure the merged model directory includes every generated SafeTensors file, along with the `model.safetensors.index.json` index if one was written.
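Before handing the directory to TensorRT-LLM, a quick check of its contents can catch a missing file early. The sketch below uses only the standard library; the helper name `check_merged_dir` is hypothetical, and the required-file list simply mirrors the one above.

```python
from pathlib import Path

# Non-weight files expected in the merged model directory (from the list above).
REQUIRED_FILES = [
    "config.json",
    "generation_config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]


def check_merged_dir(model_dir):
    """Return (missing required files, sorted SafeTensors file names) for model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    shards = sorted(p.name for p in root.glob("*.safetensors"))
    return missing, shards


# Example call:
# missing, shards = check_merged_dir("path/to/merged_model")
# An empty `missing` list and a non-empty `shards` list suggest the directory is complete.
```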

