Merging Model Weights
To merge your fine-tuned LoRA adapter with the base model and create a single model that can be used for inference, you can use the peft library's merge_and_unload() function.
Here's how you can do it:
1. Load the base model and the LoRA adapter.
2. Merge the LoRA adapter into the base model with merge_and_unload().
3. Save the merged model in the desired format (e.g., SafeTensors), as shown in the sketch below.
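A minimal sketch of these steps follows. The model and adapter paths are placeholders you would replace with your own, and the example assumes a causal language model loaded via transformers:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model (placeholder path; replace with your own)
base_model = AutoModelForCausalLM.from_pretrained(
    "path/to/base_model",
    torch_dtype=torch.float16,
)

# Load the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "path/to/lora_adapter")

# Merge the adapter weights into the base weights and drop the adapter layers
merged_model = model.merge_and_unload()

# Save the merged model in the SafeTensors format
merged_model.save_pretrained("path/to/merged_model", safe_serialization=True)
```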
This will save the merged model in the SafeTensors format, which is compatible with TensorRT-LLM.
Next, copy the necessary tokenizer and configuration files from the base model directory to the merged model directory:
config.json
generation_config.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
You can copy these files with a few lines of Python (or the equivalent shell commands).
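A minimal sketch, assuming the same placeholder paths as above:

```python
import shutil

# Files to carry over from the base model directory
files_to_copy = [
    "config.json",
    "generation_config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]

for filename in files_to_copy:
    shutil.copy(
        f"path/to/base_model/{filename}",
        f"path/to/merged_model/{filename}",
    )
```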
Make sure to replace "path/to/merged_model"
with the actual path where you saved the merged model.
After following these steps, you should have a merged model directory that contains the necessary files for TensorRT-LLM optimization:
config.json
generation_config.json
model-00001-of-00001.safetensors (assuming a single SafeTensors file)
special_tokens_map.json
tokenizer_config.json
tokenizer.json
You can now use this merged model directory as input to TensorRT-LLM for optimization and deployment.
Note: The exact number of SafeTensors files generated during the saving process may vary depending on the model size and configuration. Make sure to include all the generated SafeTensors files in the merged model directory.
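If you want to sanity-check the merged directory before handing it to TensorRT-LLM, a quick sketch like the following (again with a placeholder path) can confirm that the expected files and all SafeTensors shards are present:

```python
import glob
import os

merged_dir = "path/to/merged_model"  # placeholder path

# The non-weight files the merged directory should contain
expected = [
    "config.json",
    "generation_config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]
missing = [f for f in expected if not os.path.exists(os.path.join(merged_dir, f))]

# Collect every SafeTensors shard, however many were generated
shards = sorted(glob.glob(os.path.join(merged_dir, "*.safetensors")))

print("Missing files:", missing or "none")
print("SafeTensors shards:", shards)
```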