> For the complete documentation index, see [llms.txt](https://axolotl.continuumlabs.pro/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://axolotl.continuumlabs.pro/llama3/analysis-of-model-files/tokenizer-configuration-files.md).

# Tokenizer Configuration Files

The <mark style="color:blue;">**tokenizer\_config.json**</mark> and <mark style="color:blue;">**tokenizer.json**</mark> files serve different purposes in the tokenization process of the Llama3 language model.&#x20;

Let's clarify the difference between the two and how they interact:

### <mark style="color:blue;">tokenizer\_config.json</mark>

* This file contains the <mark style="color:yellow;">configuration settings for the tokenizer</mark>
* It defines the <mark style="color:yellow;">behaviour and properties of the tokenizer</mark>, such as the special tokens, maximum sequence length, and input tensor names.
* The tokenizer\_config.json file <mark style="color:yellow;">specifies how the tokenizer should handle and interpret the input text during the tokenization process.</mark>
* It includes settings like the beginning-of-sequence (BOS) token, end-of-sequence (EOS) token, and whether to clean up extra spaces during tokenization.
* The tokenizer\_config.json file also defines the mapping between special token IDs and their corresponding token content in the "added\_tokens\_decoder" section.

### <mark style="color:blue;">tokenizer.json</mark>

* This file <mark style="color:yellow;">contains the actual vocabulary and mappings used by the tokenizer to convert input text into token IDs.</mark>
* It defines the mapping between each word, subword, or character in the vocabulary and its corresponding unique token ID.
* The tokenizer.json file is used during the tokenization process to look up the token IDs for each word or subword in the input text.
* It is a crucial component of the tokenizer and is loaded by the tokenizer implementation to perform the actual tokenization.

### <mark style="color:blue;">Interaction between the two files</mark>

* The <mark style="color:blue;">**tokenizer\_config.json**</mark>**&#x20;file** provides the configuration settings for the tokenizer, specifying how it should behave and handle special tokens.
* The <mark style="color:blue;">**tokenizer.json**</mark> file contains the actual vocabulary and mappings used by the tokenizer to convert input text into token IDs.
* During the tokenization process, the tokenizer implementation loads both files:
  * It uses the tokenizer\_config.json file to configure its behavior and special token handling.
  * It uses the tokenizer.json file to look up the token IDs for each word or subword in the input text.
* The tokenizer applies the configuration settings from tokenizer\_config.json while utilizing the vocabulary and mappings from tokenizer.json to perform the tokenization.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://axolotl.continuumlabs.pro/llama3/analysis-of-model-files/tokenizer-configuration-files.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
