This documentation is for the Axolotl community
Merging Model Weights

To merge your fine-tuned LoRA adapter into the base model, producing a single standalone model for inference, you can use the peft library's merge_and_unload() method.

Here's how you can do it:

Load the base model and the LoRA adapter

import torch
from transformers import LlamaForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "/home/paperspace/axolotl/models/Meta-Llama-3-8B"
lora_model_path = "path/to/llama3-out"

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
# Load in the model's native precision (bfloat16 for Llama 3)
# rather than defaulting to fp32, which doubles memory use.
base_model = LlamaForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)
lora_model = PeftModel.from_pretrained(base_model, lora_model_path)

Merge the LoRA adapter with the base model

merged_model = lora_model.merge_and_unload()
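Under the hood, merge_and_unload() folds each low-rank adapter pair back into the frozen base weight it was attached to: W_merged = W0 + (alpha / r) * B @ A. A minimal numerical sketch of that update, using small random matrices rather than the actual Llama weights:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 16               # hidden size, LoRA rank, LoRA alpha
W0 = rng.standard_normal((d, d))     # frozen base weight
A = rng.standard_normal((r, d))      # LoRA down-projection
B = rng.standard_normal((d, r))      # LoRA up-projection

# Merging folds the scaled low-rank update into the base weight,
# so inference no longer needs the separate adapter matrices.
W_merged = W0 + (alpha / r) * (B @ A)

# The merged layer reproduces base-plus-adapter output exactly.
x = rng.standard_normal(d)
assert np.allclose(W_merged @ x, W0 @ x + (alpha / r) * (B @ (A @ x)))
```

Because the update is folded in, the merged model has the same architecture and parameter count as the original base model, with no inference-time overhead from the adapter.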

Save the merged model in the desired format (e.g., SafeTensors)

merged_model.save_pretrained("path/to/merged_model", safe_serialization=True)

This will save the merged model in the SafeTensors format, which is compatible with TensorRT-LLM.

Copy the necessary files from the base model directory to the merged model directory

  • config.json

  • generation_config.json

  • special_tokens_map.json

  • tokenizer_config.json

  • tokenizer.json

You can use the following commands to copy these files:

cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/config.json path/to/merged_model/
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/generation_config.json path/to/merged_model/
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/special_tokens_map.json path/to/merged_model/
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/tokenizer_config.json path/to/merged_model/
cp /home/paperspace/axolotl/models/Meta-Llama-3-8B/tokenizer.json path/to/merged_model/
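If you prefer to do this from Python (for example, inside the same script that performs the merge), the copies can be sketched with the standard library; the directory paths here are placeholders, as in the commands above:

```python
import shutil
from pathlib import Path

def copy_tokenizer_files(base_model_dir, merged_model_dir):
    """Copy the config and tokenizer files the merged model still needs."""
    required = [
        "config.json",
        "generation_config.json",
        "special_tokens_map.json",
        "tokenizer_config.json",
        "tokenizer.json",
    ]
    base, merged = Path(base_model_dir), Path(merged_model_dir)
    merged.mkdir(parents=True, exist_ok=True)
    for name in required:
        src = base / name
        if src.exists():  # generation_config.json is absent for some models
            shutil.copy2(src, merged / name)
```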

Make sure to replace "path/to/merged_model" with the actual path where you saved the merged model.

After following these steps, you should have a merged model directory that contains the necessary files for TensorRT-LLM optimization:

  • config.json

  • generation_config.json

  • model-00001-of-00001.safetensors (assuming a single SafeTensors file)

  • special_tokens_map.json

  • tokenizer_config.json

  • tokenizer.json

You can now use this merged model directory as input to TensorRT-LLM for optimization and deployment.

Note: The exact number of SafeTensors files generated during the saving process may vary depending on the model size and configuration. Make sure to include all the generated SafeTensors files in the merged model directory.
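Since the shard count varies, it can be worth sanity-checking the merged directory before handing it to TensorRT-LLM. A small check along these lines (file names as listed above):

```python
from pathlib import Path

def check_merged_dir(merged_model_dir):
    """Return a list of files missing from the merged model directory."""
    merged = Path(merged_model_dir)
    required = [
        "config.json",
        "special_tokens_map.json",
        "tokenizer_config.json",
        "tokenizer.json",
    ]
    missing = [name for name in required if not (merged / name).exists()]
    # The number of SafeTensors shards depends on model size,
    # so only require that at least one is present.
    if not list(merged.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

An empty return value means the directory has everything listed above; otherwise the result names what still needs to be copied or saved.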
