# After fine-tuning Llama 3

The files produced by fine-tuning the Llama 3 model are the components you need to run inference with your fine-tuned model.

Here's what each file represents and how you can use them to run your fine-tuned LLM:

<mark style="color:yellow;">**`adapter_config.json`**</mark><mark style="color:yellow;">**:**</mark> This file contains the configuration settings for the adapter (LoRA) used during fine-tuning. It includes information such as the base model path, LoRA hyperparameters, target modules, and more.

<mark style="color:yellow;">**`adapter_model.bin`**</mark><mark style="color:yellow;">**:**</mark> This file contains the trained LoRA weights. It represents the learned adaptations to the base model during fine-tuning.

<mark style="color:yellow;">**`checkpoint-*`**</mark><mark style="color:yellow;">**:**</mark> These directories (e.g., `checkpoint-112`, `checkpoint-28`, etc.) hold the model checkpoints saved at different stages of training. The number in each name is the training step at which the checkpoint was written, and each directory contains the model's state at that step.

<mark style="color:yellow;">**`config.json`**</mark><mark style="color:yellow;">**:**</mark> This file contains the configuration settings for the base model, such as the model architecture, hidden size, number of layers, and other hyperparameters.

<mark style="color:yellow;">**`README.md`**</mark><mark style="color:yellow;">**:**</mark> This file provides information about the fine-tuned model, including the training details, evaluation results, and any additional notes.

<mark style="color:yellow;">**`special_tokens_map.json`**</mark><mark style="color:yellow;">**:**</mark> This file maps special token roles (such as `bos_token`, `eos_token`, and `pad_token`) to the token strings the tokenizer uses for them.

<mark style="color:yellow;">**`tokenizer_config.json`**</mark> and <mark style="color:yellow;">**`tokenizer.json`**</mark>: These files contain the tokenizer's configuration and its serialized definition (vocabulary and tokenization rules) as used by the model.
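Before loading anything, it can help to confirm that the output directory actually contains the files described above. The following is a minimal sketch using only the standard library. The helper names `missing_files` and `latest_checkpoint` are hypothetical, and the file list simply mirrors the names mentioned on this page, so adjust it if your run produced different files (for example, newer PEFT versions may save the adapter weights under a different file name).

```python
import re
from pathlib import Path

# File names taken from the descriptions above (an assumption about your run).
EXPECTED_FILES = (
    "adapter_config.json",
    "adapter_model.bin",
    "config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
)


def missing_files(output_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_file()]


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for entry in Path(output_dir).iterdir():
        match = pattern.match(entry.name)
        if entry.is_dir() and match:
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    # Compare steps numerically: sorted as strings, "checkpoint-28"
    # would come after "checkpoint-112".
    return max(candidates, key=lambda item: item[0])[1]
```

Calling `missing_files("path/to/output_dir")` returns an empty list when everything is in place, and `latest_checkpoint("path/to/output_dir")` returns the most recent checkpoint directory, if any exist.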

To run inference with your fine-tuned LLM using these files, you can follow these steps:

#### <mark style="color:green;">Load the base model (Llama 3) using the</mark> <mark style="color:green;">`config.json`</mark> <mark style="color:green;">file:</mark>

```python
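from transformers import AutoConfig, LlamaForCausalLM

# Read the architecture settings saved alongside the fine-tuned files.
config = AutoConfig.from_pretrained("path/to/config.json")
# Load the base weights (downloaded or read from the local cache).
model = LlamaForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", config=config)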
```

#### <mark style="color:green;">Load the LoRA adapter using the</mark> <mark style="color:green;">`adapter_config.json`</mark> <mark style="color:green;">and</mark> <mark style="color:green;">`adapter_model.bin`</mark> <mark style="color:green;">files:</mark>

```python
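from peft import PeftModel

# Pass the directory that holds adapter_config.json and adapter_model.bin,
# not the path to the weights file itself.
model = PeftModel.from_pretrained(model, "path/to/adapter_directory")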
```
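The adapter only makes sense on top of the base model it was trained against, so it can be worth checking what `adapter_config.json` records before loading it. The sketch below reads the file with the standard library and pulls out a few common LoRA fields. The function name `summarize_adapter_config` is hypothetical, and the exact set of keys depends on the PEFT version that wrote the file, which is why missing keys come back as `None` rather than raising an error.

```python
import json
from pathlib import Path


def summarize_adapter_config(adapter_dir):
    """Return the main LoRA settings stored in adapter_config.json."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        cfg = json.load(f)
    # .get() is used because the available keys can vary between PEFT versions.
    return {
        "base_model": cfg.get("base_model_name_or_path"),
        "rank": cfg.get("r"),
        "alpha": cfg.get("lora_alpha"),
        "dropout": cfg.get("lora_dropout"),
        "target_modules": cfg.get("target_modules"),
    }
```

If the reported `base_model` does not match the model you loaded as the base, the adapter weights are unlikely to apply correctly.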

#### <mark style="color:green;">Load the tokenizer using the</mark> <mark style="color:green;">`tokenizer_config.json`</mark> <mark style="color:green;">and</mark> <mark style="color:green;">`tokenizer.json`</mark> <mark style="color:green;">files:</mark>

```python
from transformers import AutoTokenizer

# Pass the directory that holds tokenizer_config.json and tokenizer.json.
tokenizer = AutoTokenizer.from_pretrained("path/to/tokenizer_directory")
```

#### <mark style="color:green;">Use the loaded model and tokenizer to run inference on your input text:</mark>

```python
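import torch

input_text = "Your input text here"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Disable gradient tracking: inference does not need it, and this saves memory.
with torch.no_grad():
    # max_length counts the prompt tokens plus the generated tokens.
    outputs = model.generate(input_ids, max_length=100)

# skip_special_tokens drops markers such as the begin-of-text token.
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)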
```

Make sure to replace `"path/to/..."` with the actual paths to your files.

By following these steps, you can load your fine-tuned LLM and run inference on new input text using the trained LoRA adapter and tokenizer.

Remember to have the necessary dependencies installed, such as the `transformers` and `peft` libraries, and ensure that you have the required hardware (GPU) and sufficient memory to run the model.
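As a quick sanity check on the dependencies, the following sketch reports which of the required packages cannot be found in the current environment. The helper name `missing_packages` is hypothetical. It only checks that each package is importable, not that the installed version is compatible or that a GPU is available.

```python
import importlib.util


def missing_packages(names=("torch", "transformers", "peft")):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_packages()` means all three packages were found. Any names returned need to be installed before running the steps above.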

You can refer to the `README.md` file for any additional instructions or notes specific to your fine-tuned model.

