# Llama2 - Model Configuration

### <mark style="color:blue;">Model Configuration</mark>

The first configuration block of the Axolotl configuration file is the model configuration block. It comprises three main settings:

1. base\_model
2. model\_type
3. tokenizer\_type

```yaml
base_model: NousResearch/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
```

The additional `is_llama_derived_model` flag signals to Axolotl that the base model is derived from Llama, so that Llama-specific handling can be applied during training.

Below is an analysis of the Huggingface Transformers classes used in the Axolotl training script:

<details>

<summary>Reference: <mark style="color:yellow;"><strong>LlamaForCausalLM</strong></mark> <mark style="color:green;">- a class within the Huggingface Transformers library</mark></summary>

The <mark style="color:yellow;">**`LlamaForCausalLM`**</mark> class is a high-level interface that allows users to easily use the Llama language model for causal language modelling tasks.&#x20;

Causal language modelling involves predicting the next word or token in a sequence based on the previous words or tokens. This class encapsulates the core functionalities of the Llama model, making it more accessible and user-friendly for developers and researchers working with language models.

It encapsulates the complexities of the underlying model architecture and <mark style="color:yellow;">offers methods for initialization, input/output handling, forward pass, and text generation</mark>.&#x20;

This class enables developers and researchers to easily fine-tune and use the Llama model for various natural language processing applications, such as text completion, content generation, and language understanding. It contributes to the field of LLMs by making the Llama model more accessible and facilitating its integration into real-world applications.

<mark style="color:green;">**Model Initialization**</mark>

* The class takes a configuration object <mark style="color:yellow;">**(**</mark><mark style="color:yellow;">**`config`**</mark><mark style="color:yellow;">**)**</mark> that specifies the architecture and hyperparameters of the Llama model.
* It initializes the Llama model <mark style="color:yellow;">**(**</mark><mark style="color:yellow;">**`LlamaModel`**</mark><mark style="color:yellow;">**)**</mark> using the provided configuration.
* It sets up the language modelling head <mark style="color:yellow;">**(**</mark><mark style="color:yellow;">**`lm_head`**</mark><mark style="color:yellow;">**),**</mark> which is a linear layer that maps the hidden states of the model to the vocabulary size for predicting the next token.

<mark style="color:green;">**Class Inheritance**</mark>

The <mark style="color:yellow;">**`LlamaForCausalLM`**</mark> class inherits from the <mark style="color:yellow;">**`LlamaPreTrainedModel`**</mark> class, which is a base class for all Llama-based pretrained models.

<mark style="color:green;">**Class Attributes**</mark>

The class has a class attribute <mark style="color:yellow;">**`_tied_weights_keys`**</mark> which is a list containing the string <mark style="color:yellow;">**`"lm_head.weight"`**</mark>. This attribute is used for weight tying between the input and output embeddings.

<mark style="color:green;">Initialization</mark> <mark style="color:yellow;">**(**</mark><mark style="color:yellow;">**`__init__`**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**method)**</mark>

* The <mark style="color:yellow;">**`__init__`**</mark> method takes a <mark style="color:yellow;">`config`</mark> parameter, which is an instance of a configuration class specific to the Llama model.
* It calls the <mark style="color:yellow;">**`super().__init__(config)`**</mark> to initialize the parent class with the provided configuration.
* It creates an instance of the <mark style="color:yellow;">**`LlamaModel`**</mark> class using the provided configuration and assigns it to the `model` attribute.
* It sets the <mark style="color:yellow;">**`vocab_size`**</mark> attribute based on the <mark style="color:yellow;">**`vocab_size`**</mark> from the configuration.
* It creates a linear layer <mark style="color:yellow;">**`lm_head`**</mark> with input size <mark style="color:yellow;">**`config.hidden_size`**</mark> and output size <mark style="color:yellow;">**`config.vocab_size`**</mark>, without bias.
* Finally, it calls the <mark style="color:yellow;">**`post_init()`**</mark> method to perform any necessary post-initialization steps.
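The wiring described above can be sketched with simplified stand-in classes. This is an illustrative sketch only, not the real Transformers source (which builds `torch.nn` modules):

```python
# Illustrative sketch of the wiring in LlamaForCausalLM.__init__ using plain
# Python stand-ins (NOT the real Transformers implementation).

class Linear:
    """Stand-in for nn.Linear(hidden_size, vocab_size, bias=False)."""
    def __init__(self, in_features, out_features, bias=True):
        self.in_features = in_features
        self.out_features = out_features
        self.bias = bias

class LlamaModel:
    """Stand-in for the decoder stack; owns the token embedding table."""
    def __init__(self, config):
        self.config = config
        self.embed_tokens = object()  # input embeddings

class LlamaForCausalLM:
    _tied_weights_keys = ["lm_head.weight"]  # enables input/output weight tying

    def __init__(self, config):
        self.model = LlamaModel(config)          # the underlying decoder
        self.vocab_size = config["vocab_size"]
        # LM head: maps hidden states to vocabulary logits, no bias term
        self.lm_head = Linear(config["hidden_size"], config["vocab_size"], bias=False)

    def get_input_embeddings(self):
        return self.model.embed_tokens

    def get_decoder(self):
        return self.model
```

The real class also calls `super().__init__(config)` and `post_init()`, which are omitted here for brevity.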

<mark style="color:green;">**Embedding Methods**</mark>

The class provides methods to get and set the input and output embeddings:

* <mark style="color:yellow;">**`get_input_embeddings()`**</mark> returns the <mark style="color:yellow;">**`embed_tokens`**</mark> attribute of the `model`.
* <mark style="color:yellow;">**`set_input_embeddings(value)`**</mark> sets the <mark style="color:yellow;">**`embed_tokens`**</mark> attribute of the `model` to the provided `value`.
* <mark style="color:yellow;">**`get_output_embeddings()`**</mark> returns the <mark style="color:yellow;">**`lm_head`**</mark> attribute.
* <mark style="color:yellow;">**`set_output_embeddings(new_embeddings)`**</mark> sets the <mark style="color:yellow;">**`lm_head`**</mark> attribute to the provided `new_embeddings`.

<mark style="color:green;">**Decoder Methods**</mark>

The class provides <mark style="color:blue;">methods to get and set the decoder:</mark>

* <mark style="color:yellow;">**`set_decoder(decoder)`**</mark> sets the <mark style="color:yellow;">**`model`**</mark> attribute to the provided <mark style="color:yellow;">**`decoder`**</mark>.
* <mark style="color:yellow;">**`get_decoder()`**</mark> returns the <mark style="color:yellow;">**`model`**</mark> attribute.

<mark style="color:green;">Forward Pass</mark> <mark style="color:yellow;">**(**</mark><mark style="color:yellow;">**`forward`**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**method)**</mark>

* The <mark style="color:yellow;">`forward`</mark> method is decorated with `@add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)` and `@replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)`, which add docstrings and modify the return type documentation.
* It takes <mark style="color:blue;">various input parameters</mark> such as <mark style="color:yellow;">**`input_ids`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`attention_mask`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`position_ids`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`past_key_values`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`inputs_embeds`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`labels`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`use_cache`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`output_attentions`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`output_hidden_states`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`return_dict`**</mark><mark style="color:yellow;">**, and**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`cache_position`**</mark><mark style="color:yellow;">**.**</mark>
* It performs the forward pass of the Llama model by calling the <mark style="color:yellow;">**`model`**</mark> with the provided inputs and configuration options.
* It retrieves the hidden states from the model's output.
* If <mark style="color:yellow;">**`config.pretraining_tp > 1`**</mark>, it splits the <mark style="color:yellow;">`lm_head.weight`</mark> into slices and applies a linear transformation to the hidden states using each slice, concatenating the results.
* Otherwise, it applies the <mark style="color:yellow;">**`lm_head`**</mark> linear layer to the hidden states.
* It calculates the language modelling loss if <mark style="color:yellow;">**`labels`**</mark> are provided, using the <mark style="color:yellow;">**`CrossEntropyLoss`**</mark> function.
* Finally, it returns the computed logits and other outputs based on the `return_dict` flag.
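The loss computation described above (shift by one position, then cross-entropy) can be sketched in plain Python. The `causal_lm_loss` helper below is a hypothetical illustration of the logic; the real implementation operates on torch tensors with `torch.nn.CrossEntropyLoss`:

```python
import math

def causal_lm_loss(logits, labels):
    """Per-token cross-entropy for next-token prediction (illustrative sketch).

    logits: one list of vocabulary scores per position.
    labels: one target token id per position.
    Position t's scores predict the token at position t + 1, so both
    sequences are shifted by one before computing the loss.
    """
    shift_logits = logits[:-1]  # last position has nothing left to predict
    shift_labels = labels[1:]   # first token has no preceding context
    total = 0.0
    for scores, target in zip(shift_logits, shift_labels):
        # cross-entropy = log(sum(exp(scores))) - scores[target]
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
    return total / len(shift_labels)
```

With uniform scores over a vocabulary of size V, every position contributes exactly `log(V)`, which is a handy sanity check.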

<mark style="color:green;">**Input Preparation for Generation (**</mark><mark style="color:green;">**`prepare_inputs_for_generation`**</mark><mark style="color:green;">**&#x20;**</mark><mark style="color:green;">**method)**</mark>

* This method prepares the inputs for the generation process.
* It handles the caching mechanism for past key values and adjusts the input tensors accordingly.
* It also handles the case where <mark style="color:yellow;">**`inputs_embeds`**</mark> are provided instead of <mark style="color:yellow;">**`input_ids`**</mark>.
* It returns a dictionary containing the prepared input tensors and configurations.

<mark style="color:green;">**Cache Reordering (**</mark><mark style="color:green;">**`_reorder_cache`**</mark><mark style="color:green;">**&#x20;**</mark><mark style="color:green;">**static method)**</mark>

* This static method is used to reorder the cache (past key values) based on the provided <mark style="color:yellow;">**`beam_idx`**</mark> during beam search decoding.
* It reorders the past states for each layer using the <mark style="color:yellow;">**`index_select`**</mark> operation.
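In plain Python, the reordering amounts to an index-select along the batch dimension of every cached key/value state. The sketch below is illustrative; the real static method applies torch's `index_select` to each layer's tensors:

```python
def reorder_cache(past_key_values, beam_idx):
    """Illustrative sketch of _reorder_cache: entry i of each state's batch
    dimension becomes the state previously at position beam_idx[i]."""
    return tuple(
        tuple([state[i] for i in beam_idx] for state in layer_past)
        for layer_past in past_key_values
    )
```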

Overall, the <mark style="color:yellow;">**`LlamaForCausalLM`**</mark> class is a high-level interface for using the Llama model for causal language modelling tasks.&#x20;

</details>

<details>

<summary>Reference: <mark style="color:yellow;"><strong>AutoTokenizer</strong></mark> <mark style="color:green;">- the Huggingface Transformers class used to load tokenizers such as LlamaTokenizer</mark></summary>

<mark style="color:green;">Purpose</mark>

* The <mark style="color:yellow;">**`AutoTokenizer`**</mark> class is designed to automatically instantiate the appropriate tokenizer class based on the provided pretrained model name or path.
* It serves as a convenient way to load tokenizers without explicitly specifying the tokenizer class.

<mark style="color:green;">Instantiation</mark>

* The class cannot be instantiated directly using the <mark style="color:yellow;">**`__init__()`**</mark> method. Instead, it raises an <mark style="color:yellow;">**`EnvironmentError`**</mark> to indicate that the class should be instantiated using the <mark style="color:yellow;">**`AutoTokenizer.from_pretrained()`**</mark> class method.

<mark style="color:yellow;">**`from_pretrained()`**</mark> <mark style="color:green;">class method</mark>

* This is the main method used to instantiate the appropriate tokenizer class.
* It takes the <mark style="color:yellow;">**`pretrained_model_name_or_path`**</mark> parameter, which can be a model ID, a path to a directory containing vocabulary files, or a path/URL to a single vocabulary file.
* Additional parameters can be passed to customize the tokenizer's behavior, such as <mark style="color:yellow;">**`use_fast`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`tokenizer_type`**</mark><mark style="color:yellow;">**,**</mark><mark style="color:yellow;">**&#x20;**</mark><mark style="color:yellow;">**`trust_remote_code`**</mark>, and other tokenizer-specific arguments.
* The method first checks if the <mark style="color:yellow;">**`tokenizer_type`**</mark> is provided and tries to load the corresponding tokenizer class.
* If <mark style="color:yellow;">**`tokenizer_type`**</mark> is not provided, it attempts to load the tokenizer class based on the <mark style="color:yellow;">**`tokenizer_config`**</mark> or <mark style="color:yellow;">**`config`**</mark> associated with the pretrained model.
* If the tokenizer class is not found, it falls back to using the <mark style="color:yellow;">**`model_type`**</mark> derived from the configuration class.
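The resolution order above can be sketched as a small helper. Both `resolve_tokenizer_class` and its mapping are hypothetical, simplified names for illustration, not the actual Transformers internals:

```python
# Simplified stand-in for the internal tokenizer mapping (illustrative only).
TOKENIZER_MAPPING = {"llama": "LlamaTokenizer", "gpt2": "GPT2Tokenizer"}

def resolve_tokenizer_class(tokenizer_type=None, tokenizer_config=None, model_config=None):
    """Hypothetical sketch of AutoTokenizer's resolution order."""
    if tokenizer_type is not None:
        return TOKENIZER_MAPPING[tokenizer_type]              # 1. explicit tokenizer_type
    if tokenizer_config and "tokenizer_class" in tokenizer_config:
        return tokenizer_config["tokenizer_class"]            # 2. tokenizer_config file
    if model_config and "model_type" in model_config:
        return TOKENIZER_MAPPING[model_config["model_type"]]  # 3. model_type fallback
    raise ValueError("Could not determine the tokenizer class.")
```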

<mark style="color:green;">Configuration handling</mark>

* The class uses the <mark style="color:yellow;">**`PretrainedConfig`**</mark> class to determine the appropriate tokenizer class to instantiate.
* It first tries to load the tokenizer configuration from the <mark style="color:yellow;">**`tokenizer_config`**</mark> file associated with the pretrained model.
* If the <mark style="color:yellow;">**`tokenizer_config`**</mark> is not available, it falls back to using the <mark style="color:yellow;">**`AutoConfig`**</mark> class to load the model configuration.

<mark style="color:green;">Fast tokenizers</mark>

* The class supports loading fast tokenizers, which are implemented in Rust and provide faster tokenization.
* If <mark style="color:yellow;">**`use_fast`**</mark> is set to <mark style="color:yellow;">**`True`**</mark> (default), it tries to load the fast tokenizer version if available.
* If the fast tokenizer is not available, it falls back to the slow (Python-based) tokenizer.

<mark style="color:green;">Trust remote code</mark>

* The class includes a <mark style="color:yellow;">**`trust_remote_code`**</mark> parameter to control whether to allow loading custom models defined on the Hugging Face Hub.
* If set to <mark style="color:yellow;">**`True`**</mark>, it executes code present on the Hub on the local machine, which should only be done for trusted repositories.

<mark style="color:green;">Error handling</mark>

* The class raises appropriate exceptions and provides informative error messages when the requested tokenizer class is not found or when there are inconsistencies in the provided parameters.

<mark style="color:green;">Tokenizer registration</mark>

* The class provides a <mark style="color:yellow;">**`register()`**</mark> method to register new tokenizers in the tokenizer mapping.
* It allows registering a configuration class along with the corresponding slow and fast tokenizer classes.

Overall, the <mark style="color:yellow;">**`AutoTokenizer`**</mark> class provides a convenient and flexible way to load tokenizers based on the pretrained model name or path.

It handles the complexity of determining the appropriate tokenizer class and provides options for customization. The class is well-structured and follows good coding practices, such as raising exceptions for invalid usage and providing clear error messages.

</details>

The next step after determining the model type configurations is to configure the <mark style="color:blue;">**data loading and processing parameters**</mark>.
