After Fine-Tuning Llama 3

The files you obtained after fine-tuning the Llama 3 model with LoRA are the components you need to run inference with your fine-tuned model.

Here's what each file represents and how you can use them to run your fine-tuned LLM:

adapter_config.json: This file contains the configuration of the LoRA adapter used during fine-tuning, including the base model it was trained on (`base_model_name_or_path`), the LoRA hyperparameters (rank `r`, `lora_alpha`, `lora_dropout`), and the target modules the adapter was applied to.
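To check which LoRA settings a particular run recorded, you can read the file directly. The sketch below is a hypothetical helper (`lora_settings` is not part of any library); it assumes the key names PEFT uses for LoRA configs and returns `None` for any key that is missing.

```python
import json
from pathlib import Path


def lora_settings(output_dir):
    """Return the LoRA hyperparameters recorded in adapter_config.json."""
    with open(Path(output_dir) / "adapter_config.json", encoding="utf-8") as f:
        cfg = json.load(f)
    # Key names follow PEFT's LoraConfig; .get() returns None for any missing key
    settings = {key: cfg.get(key) for key in ("r", "lora_alpha", "lora_dropout", "target_modules")}
    if settings["r"] and settings["lora_alpha"] is not None:
        # Default LoRA scaling applied to the B @ A update (rsLoRA uses alpha / sqrt(r) instead)
        settings["scaling"] = settings["lora_alpha"] / settings["r"]
    return settings
```

For example, `lora_settings("path/to/output_dir")` shows the rank and target modules the adapter expects.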

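adapter_model.bin: This file contains the trained LoRA weights, i.e. the small low-rank matrices learned during fine-tuning that are applied on top of the frozen base model weights. Recent PEFT versions save this file as adapter_model.safetensors instead.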
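Conceptually, for each targeted weight matrix W of shape d_out × d_in, LoRA learns two small matrices, A (r × d_in) and B (d_out × r), and the layer behaves as if its weight were W + (lora_alpha / r) · B A. The NumPy sketch below uses toy random matrices rather than real checkpoint data; it assumes PEFT's default scaling (`lora_alpha / r`, without rsLoRA) and shows why the adapter file is much smaller than the model.

```python
import numpy as np


def lora_effective_weight(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * B @ A, the weight a LoRA-adapted layer behaves like."""
    scaling = lora_alpha / r  # default LoRA scaling
    return W + scaling * (B @ A)


def lora_param_count(d_in, d_out, r):
    """Parameters stored for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16  # toy sizes; Llama 3 8B's q_proj is 4096 x 4096
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))  # LoRA initialises B to zero, so training starts from W itself
W_eff = lora_effective_weight(W, A, B, alpha, r)
print(np.allclose(W_eff, W))  # True: a zero B leaves the base weight unchanged
print(lora_param_count(4096, 4096, 16), 4096 * 4096)  # 131072 16777216
```

At rank 16, one 4096 × 4096 projection needs about 0.8% of the parameters of the full matrix, which is why the adapter file is small.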

checkpoint-*: These directories (e.g., checkpoint-112, checkpoint-28) are snapshots saved at different points during training; the number is the training step. Each typically contains the adapter weights at that step plus the optimizer, scheduler, and trainer state needed to resume training.
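Because the step number is part of the directory name, sorting names alphabetically puts checkpoint-112 before checkpoint-28. This hypothetical helper sorts numerically instead; Transformers also provides `transformers.trainer_utils.get_last_checkpoint` for the common case of finding the latest one.

```python
import re
from pathlib import Path


def list_checkpoints(output_dir):
    """Return checkpoint-<step> directories sorted by step number (not alphabetically)."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for path in Path(output_dir).iterdir():
        match = pattern.match(path.name)
        if match and path.is_dir():
            found.append((int(match.group(1)), path))
    # Numeric sort: alphabetically "checkpoint-112" would come before "checkpoint-28"
    return [path for _, path in sorted(found)]


def latest_checkpoint(output_dir):
    """Return the checkpoint with the highest step, or None if there are none."""
    checkpoints = list_checkpoints(output_dir)
    return checkpoints[-1] if checkpoints else None
```

If a checkpoint directory contains adapter files, you can load it in place of the final output directory to compare an earlier stage of training.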

config.json: This file contains the configuration settings for the base model, such as the model architecture, hidden size, number of layers, and other hyperparameters.
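To confirm which architecture the output directory describes, you can read a few fields from config.json. This is a sketch with a hypothetical helper name; for Meta-Llama-3-8B you would expect, for example, `hidden_size` 4096, `num_hidden_layers` 32, and `vocab_size` 128256.

```python
import json
from pathlib import Path

# A few architecture fields used by Llama-family config.json files
KEYS = ("model_type", "hidden_size", "num_hidden_layers", "num_attention_heads",
        "num_key_value_heads", "vocab_size", "torch_dtype")


def describe_model_config(output_dir):
    """Read config.json and return selected architecture fields (None if absent)."""
    with open(Path(output_dir) / "config.json", encoding="utf-8") as f:
        cfg = json.load(f)
    info = {key: cfg.get(key) for key in KEYS}
    if info["hidden_size"] and info["num_attention_heads"]:
        # Size of each attention head, derived from the two fields above
        info["head_dim"] = info["hidden_size"] // info["num_attention_heads"]
    return info
```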

README.md: This is the model card generated during training. Depending on your setup, it may list training details, hyperparameters, evaluation results, and additional notes.

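special_tokens_map.json: This file maps special-token roles (such as `bos_token`, `eos_token`, and `pad_token`) to the token strings that fill them. The corresponding token IDs come from the tokenizer's vocabulary.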

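tokenizer_config.json and tokenizer.json: These files define the tokenizer. tokenizer_config.json holds its settings (tokenizer class, special tokens, maximum length, and a chat template if one is set), while tokenizer.json holds the vocabulary, merge rules, and pre- and post-processing steps. Unlike the model, the tokenizer has no neural-network weights.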
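To see how these files fit together, the sketch below resolves each role in special_tokens_map.json to its ID in tokenizer.json. It is a hypothetical helper that assumes a BPE-style `model.vocab` dictionary (as in Llama 3's tokenizer.json) and an `added_tokens` list; with a loaded tokenizer, attributes such as `tokenizer.eos_token_id` give the same answer directly.

```python
import json
from pathlib import Path


def _token_text(value):
    """special_tokens_map.json stores either a plain string or a dict with a "content" key."""
    return value["content"] if isinstance(value, dict) else value


def special_token_ids(tokenizer_dir):
    """Map special-token roles (bos_token, eos_token, ...) to (text, id) pairs."""
    tokenizer_dir = Path(tokenizer_dir)
    with open(tokenizer_dir / "special_tokens_map.json", encoding="utf-8") as f:
        special_map = json.load(f)
    with open(tokenizer_dir / "tokenizer.json", encoding="utf-8") as f:
        tok = json.load(f)

    # IDs live in tokenizer.json: the base vocabulary, overridden by added/special tokens
    vocab = tok.get("model", {}).get("vocab", {})
    lookup = dict(vocab) if isinstance(vocab, dict) else {}
    lookup.update({t["content"]: t["id"] for t in tok.get("added_tokens", [])})

    result = {}
    for role, value in special_map.items():
        if isinstance(value, list):  # e.g. additional_special_tokens; skipped in this sketch
            continue
        text = _token_text(value)
        result[role] = (text, lookup.get(text))
    return result
```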

To run inference with your fine-tuned LLM using these files, you can follow these steps:

Load the base model (Llama 3), using the `config.json` file for its architecture settings:

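```python
import torch
from transformers import AutoConfig, LlamaForCausalLM

# Directory containing config.json; the model ID must match base_model_name_or_path in adapter_config.json
config = AutoConfig.from_pretrained("path/to/output_dir")
model = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", config=config, torch_dtype=torch.bfloat16, device_map="auto"
)
```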
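Loading the adapter on a different base model than the one it was trained on typically raises shape errors or produces poor output, so it can help to compare the model ID you load with the one recorded in adapter_config.json. This hypothetical check only compares strings: if training used a local copy of the model, the recorded value will be a path and will not match a Hub ID even when the weights are identical.

```python
import json
from pathlib import Path


def base_model_matches(output_dir, base_model_id):
    """Compare base_model_id with the base_model_name_or_path recorded by PEFT.

    Returns (matches, recorded). This is a plain string comparison, so a local
    path recorded at training time will not match a Hub ID.
    """
    with open(Path(output_dir) / "adapter_config.json", encoding="utf-8") as f:
        recorded = json.load(f).get("base_model_name_or_path")
    return recorded == base_model_id, recorded
```

For example: `matches, recorded = base_model_matches("path/to/output_dir", "meta-llama/Meta-Llama-3-8B")`.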

Load the LoRA adapter using the `adapter_config.json` and `adapter_model.bin` files:

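```python
from peft import PeftModel

# Pass the directory that holds adapter_config.json and adapter_model.bin, not the .bin file itself
model = PeftModel.from_pretrained(model, "path/to/output_dir")
model.eval()  # make sure dropout is disabled for inference
```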
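`PeftModel.from_pretrained` expects the directory (or Hub repo) that contains the adapter files, not the weights file itself. If you only have the path to the weights file, a hypothetical helper like the one below can recover the directory and check that both pieces are present. It assumes the weights are named adapter_model.safetensors (recent PEFT versions) or adapter_model.bin (older ones).

```python
from pathlib import Path

# Weight file names PEFT uses: safetensors in recent versions, .bin in older ones
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def find_adapter_dir(path):
    """Return the adapter directory, given either that directory or a file inside it."""
    path = Path(path)
    directory = path.parent if path.is_file() else path
    if not (directory / "adapter_config.json").is_file():
        raise FileNotFoundError(f"adapter_config.json not found in {directory}")
    if not any((directory / name).is_file() for name in WEIGHT_FILES):
        raise FileNotFoundError(f"no adapter weights ({', '.join(WEIGHT_FILES)}) in {directory}")
    return directory
```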

Load the tokenizer using the `tokenizer_config.json` and `tokenizer.json` files:

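```python
from transformers import AutoTokenizer

# Pass the directory that holds tokenizer_config.json, tokenizer.json and special_tokens_map.json
tokenizer = AutoTokenizer.from_pretrained("path/to/output_dir")
```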

Use the loaded model and tokenizer to run inference on your input text:

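```python
import torch

input_text = "Your input text here"
# Returns input_ids and attention_mask, moved to the device holding the model
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # max_new_tokens counts only generated tokens; max_length would also count the prompt
    outputs = model.generate(**inputs, max_new_tokens=100, pad_token_id=tokenizer.eos_token_id)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```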
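For decoder-only models such as Llama, `generate` returns the prompt tokens followed by the new ones, so the decoded text above starts with your input. A minimal sketch of separating the two, using toy token IDs in place of a real `generate` result (the helper name is hypothetical):

```python
def split_prompt_and_completion(output_ids, prompt_length):
    """Split one row of generate() output into prompt tokens and newly generated tokens.

    Decoder-only models return the prompt followed by the new tokens,
    so the completion starts at index prompt_length.
    """
    return list(output_ids[:prompt_length]), list(output_ids[prompt_length:])


# Toy token IDs standing in for a real generate() result
prompt_ids = [128000, 791, 6864]
output_ids = prompt_ids + [374, 12366, 128001]
prompt, completion = split_prompt_and_completion(output_ids, len(prompt_ids))
print(completion)  # [374, 12366, 128001]
```

With real tensors the same slice is `outputs[0][prompt_length:]`, where `prompt_length` is the second dimension (`.shape[1]`) of the input IDs tensor; decoding that slice gives only the model's answer.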

Make sure to replace "path/to/..." with the actual paths to your files.

By following these steps, you can load your fine-tuned LLM and run inference on new input text using the trained LoRA adapter and tokenizer.

Make sure the necessary dependencies are installed (`torch`, `transformers`, and `peft`, plus `accelerate` if you load the model with `device_map="auto"`), and that you have a GPU with enough memory: the weights of an 8-billion-parameter model in bfloat16 take about 16 GB on their own.
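A quick way to size the hardware is to multiply the parameter count by the bytes per parameter. The sketch assumes about 8.0 billion parameters for Llama 3 8B and counts weights only; the KV cache, activations, and framework overhead add to it, while the LoRA adapter itself is comparatively tiny.

```python
# Bytes needed to store one parameter (the 4-bit entry ignores quantization overhead)
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "float16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(num_params, dtype="bfloat16"):
    """Rough memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16", "4bit"):
    print(dtype, round(weight_memory_gb(8.0e9, dtype), 1), "GB")
# float32 32.0 GB
# bfloat16 16.0 GB
# 4bit 4.0 GB
```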

You can refer to the README.md file for any additional instructions or notes specific to your fine-tuned model.
