# Phi 2.0 - Model Configuration

### <mark style="color:blue;">Model Configuration</mark>

The first configuration block of the Axolotl configuration file is the <mark style="color:blue;">model type</mark> block. It comprises three main settings:

1. base\_model
2. model\_type
3. tokenizer\_type

```yaml
base_model: microsoft/phi-2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
```

Below are explanations of the Hugging Face Transformers classes that are used within the Axolotl training script, specific to the model and tokenizer type.
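
Conceptually, these three YAML fields tell the trainer which class to load and which checkpoint to load it from. The sketch below simulates that resolution in pure Python (it is not Axolotl's actual code; in the real script the class named by `model_type` is looked up in the `transformers` namespace and its `from_pretrained()` method is called):

```python
# Sketch (not Axolotl's actual code): resolving the YAML fields into a
# Transformers loading call. In real code the lookup is roughly:
#   cls = getattr(transformers, cfg["model_type"])
#   model = cls.from_pretrained(cfg["base_model"])
cfg = {
    "base_model": "microsoft/phi-2",
    "model_type": "AutoModelForCausalLM",
    "tokenizer_type": "AutoTokenizer",
}

def loading_call(cfg: dict) -> str:
    """Format the call the trainer would make for this config."""
    return f'{cfg["model_type"]}.from_pretrained("{cfg["base_model"]}")'

print(loading_call(cfg))  # → AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
```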

<details>

<summary>Reference: <mark style="color:yellow;">AutoModelForCausalLM</mark> <mark style="color:green;">- a class within the Hugging Face Transformers library</mark></summary>

The <mark style="color:yellow;">**`AutoModelForCausalLM`**</mark> class is part of the Hugging Face Transformers library and is designed to provide a convenient way to instantiate and work with pre-trained models for causal language modelling tasks.

Causal language modelling, also known as autoregressive language modelling, is a type of language modelling task where the model predicts the next token in a sequence based on the previous tokens.

In other words, given a sequence of tokens, the model learns to predict the probability distribution of the next token. This is commonly used for tasks like text generation, where the model generates text by predicting one token at a time based on the previously generated tokens.
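
To make the idea concrete, here is a toy, pure-Python illustration of next-token prediction. It is a bigram counter, not a neural network, and all names here are illustrative: a real causal language model conditions on the whole prefix, whereas this only looks one token back.

```python
from collections import Counter, defaultdict

# Toy "causal language model": estimate the most likely next token from
# bigram counts, then predict greedily one token at a time.
def train_bigram(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    # Most frequent token observed after `prev` in the training data.
    return counts[prev].most_common(1)[0][0]

corpus = "the cat sat on the mat".split()
model = train_bigram(corpus)
print(predict_next(model, "on"))  # → "the"
```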

The <mark style="color:yellow;">**`AutoModelForCausalLM`**</mark> class is a subclass of <mark style="color:yellow;">**`_BaseAutoModelClass`**</mark>, which is a <mark style="color:blue;">base class for all the auto model classes in the Transformers library</mark>.

The purpose of the auto model classes is to provide a unified interface for loading and using pre-trained models for various tasks.

Here's how the <mark style="color:yellow;">**`AutoModelForCausalLM`**</mark> class works:

1. It has a class attribute <mark style="color:yellow;">**`_model_mapping`**</mark> that maps the model names to their corresponding classes for causal language modelling. This mapping allows the class to automatically determine the appropriate model class based on the provided model name or path.
2. When you call the <mark style="color:yellow;">**`from_pretrained()`**</mark> method of <mark style="color:yellow;">**`AutoModelForCausalLM`**</mark> and provide a pre-trained model name or path, it automatically retrieves the corresponding model class from the <mark style="color:yellow;">**`_model_mapping`**</mark> based on the model name.
3. It then initializes and returns an instance of the retrieved model class, which can be used for causal language modelling tasks.

The advantage of using <mark style="color:yellow;">**`AutoModelForCausalLM`**</mark> is that you don't need to know the specific model class for a given pre-trained model. You can simply provide the model name or path, and the class will handle the instantiation of the appropriate model class for you.

For example, if you have a pre-trained GPT-2 model and want to use it for causal language modelling, you can do the following:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
```

This code will <mark style="color:blue;">automatically load the GPT-2 model</mark> and return an instance of the appropriate model class for causal language modelling.

Overall, the <mark style="color:yellow;">**`AutoModelForCausalLM`**</mark> class provides a convenient and flexible way to work with pre-trained models for causal language modelling tasks, abstracting away the need to know the specific model classes and allowing you to focus on using the models for your desired task.

</details>

<details>

<summary>Reference: <mark style="color:yellow;"><strong>AutoTokenizer</strong></mark> <mark style="color:green;">- a class within the Hugging Face Transformers library</mark></summary>

The <mark style="color:yellow;">**`AutoTokenizer`**</mark> class is a powerful and versatile tool in the Hugging Face Transformers library that simplifies the process of instantiating the appropriate tokenizer for a given pretrained model.

It serves as a high-level interface that automatically selects and initializes the correct tokenizer class based on the provided pretrained model name or path.

Let's dive into the key aspects of the <mark style="color:yellow;">**`AutoTokenizer`**</mark> class and understand how it enhances the usability and flexibility of tokenization in natural language processing tasks:

<mark style="color:green;">**Automatic Tokenizer Selection**</mark>

* The primary purpose of the <mark style="color:yellow;">**`AutoTokenizer`**</mark> class is to *<mark style="color:yellow;">**automatically determine the appropriate tokenizer class to use based on the pretrained model.**</mark>*
* It eliminates the need for users to manually specify the tokenizer class, saving time and reducing the chances of errors.
* The class leverages various methods to infer the tokenizer class, such as examining the model's configuration, using pattern matching on the model name or path, or utilizing a tokenizer configuration file.
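
The name-based fallback can be sketched as follows. This is a toy approximation with an assumed pattern table; the real library consults the tokenizer configuration file and the model config before resorting to anything like this:

```python
# Hypothetical pattern table; the real mapping lives inside Transformers
# and is far more complete.
TOKENIZER_PATTERNS = {
    "gpt2": "GPT2TokenizerFast",
    "bert": "BertTokenizerFast",
    "llama": "LlamaTokenizerFast",
}

def infer_tokenizer_class(model_name_or_path: str) -> str:
    """Guess a tokenizer class from substrings of the model name."""
    name = model_name_or_path.lower()
    for pattern, cls in TOKENIZER_PATTERNS.items():
        if pattern in name:
            return cls
    raise ValueError(f"cannot infer a tokenizer for {model_name_or_path!r}")
```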

<mark style="color:green;">**Pretrained Model Support**</mark>

* The `AutoTokenizer` class seamlessly integrates with pretrained models available on the Hugging Face Model Hub or locally saved models.
* It accepts a <mark style="color:yellow;">**`pretrained_model_name_or_path`**</mark> parameter, which can be a model identifier, a path to a directory containing the necessary files, or a URL to a specific file.
* This flexibility allows users to easily load tokenizers associated with a wide range of pretrained models, enabling quick experimentation and transfer learning.

<mark style="color:green;">**Tokenizer Instantiation**</mark>

* The <mark style="color:yellow;">**`from_pretrained()`**</mark> class method is the primary entry point for instantiating tokenizers using the <mark style="color:yellow;">**`AutoTokenizer`**</mark><mark style="color:yellow;">**.**</mark>
* It takes care of downloading and caching the required files, such as vocabulary files, if they are not already present locally.
* The method accepts various parameters to customize the tokenizer's behavior, such as specifying the tokenizer type, using a fast tokenizer variant, or providing additional keyword arguments.

<mark style="color:green;">**Fast Tokenizers**</mark>

* The <mark style="color:yellow;">**`AutoTokenizer`**</mark> class supports the use of fast tokenizers, which are implemented in Rust and offer improved performance compared to their Python counterparts.
* By setting the <mark style="color:yellow;">**`use_fast`**</mark> parameter to <mark style="color:yellow;">**`True`**</mark> (default), the class automatically selects the fast tokenizer variant if available for the given model.
* If a fast tokenizer is not available, it gracefully falls back to the standard Python-based tokenizer.
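
That fallback behaviour can be summarised in a few lines. This is a simplified model of the selection rule, not the library's code:

```python
def select_tokenizer(variants: dict, use_fast: bool = True):
    """variants maps 'fast'/'slow' to tokenizer classes (or their names).

    Assumed rule: prefer the Rust-backed fast tokenizer when requested
    and available, otherwise fall back to the Python-based one.
    """
    if use_fast and variants.get("fast") is not None:
        return variants["fast"]
    return variants["slow"]

# Usage with illustrative class names:
print(select_tokenizer({"fast": "GPT2TokenizerFast", "slow": "GPT2Tokenizer"}))
print(select_tokenizer({"fast": None, "slow": "CustomTokenizer"}))
```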

<mark style="color:green;">**Trust Remote Code**</mark>

* The <mark style="color:yellow;">**`trust_remote_code`**</mark> parameter allows users to control whether the <mark style="color:yellow;">**`AutoTokenizer`**</mark> should trust and execute custom tokenization code defined in the model's repository.
* This feature is useful for models that require specific tokenization logic but should be used with caution and only for trusted repositories.

<mark style="color:green;">**Tokenizer Configuration**</mark>

* The <mark style="color:yellow;">**`AutoTokenizer`**</mark> class utilizes configuration objects <mark style="color:yellow;">**(**</mark><mark style="color:yellow;">**`PretrainedConfig`**</mark><mark style="color:yellow;">**)**</mark> to determine the appropriate tokenizer class to instantiate.
* It first attempts to load the tokenizer configuration from a dedicated file <mark style="color:yellow;">**(**</mark><mark style="color:yellow;">**`tokenizer_config.json`**</mark><mark style="color:yellow;">**)**</mark> associated with the pretrained model.
* If the tokenizer configuration is not available, it falls back to using the model's configuration <mark style="color:yellow;">**(**</mark><mark style="color:yellow;">**`AutoConfig`**</mark><mark style="color:yellow;">**)**</mark> to infer the tokenizer class.

<mark style="color:green;">**Tokenizer Registration**</mark>

* The <mark style="color:yellow;">**`AutoTokenizer`**</mark> class provides a <mark style="color:yellow;">**`register()`**</mark> method that allows users to register new tokenizer classes.
* This feature is particularly useful for extending the <mark style="color:yellow;">**`AutoTokenizer`**</mark> to support custom or newly developed tokenizers.
* By registering a configuration class along with the corresponding slow and fast tokenizer classes, users can seamlessly integrate their own tokenizers into the <mark style="color:yellow;">**`AutoTokenizer`**</mark> ecosystem.

In summary, the <mark style="color:yellow;">**`AutoTokenizer`**</mark> class is a powerful tool that simplifies the process of initializing tokenizers for pretrained models.

It abstracts away the complexities of manually selecting and instantiating tokenizer classes, allowing users to focus on their natural language processing tasks.

With its automatic tokenizer selection, support for pretrained models, fast tokenizer variants, and extensibility through registration, the <mark style="color:yellow;">**`AutoTokenizer`**</mark> class greatly enhances the usability and flexibility of tokenization in the Hugging Face Transformers library.

</details>

The next step after determining the model type configurations is to configure the <mark style="color:blue;">data loading and processing parameters</mark>.
