Llama2 - Model Configuration
The first configuration block of the Axolotl configuration file is the model type block. It comprises three main settings:

- `base_model`
- `model_type`
- `tokenizer_type`

```yaml
base_model: NousResearch/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
```

The additional `is_llama_derived_model: true` flag signals to Axolotl that the model is Llama-derived, so that Llama-specific handling can be applied.
Below is an analysis of the Hugging Face Transformers classes used in the Axolotl training script:
Reference: LlamaForCausalLM - a class within the Hugging Face Transformers library
The `LlamaForCausalLM` class is a high-level interface for using the Llama language model on causal language modelling tasks, in which the model predicts the next word or token in a sequence from the preceding tokens. It encapsulates the complexities of the underlying architecture and offers methods for initialization, input/output handling, the forward pass, and text generation. This makes it straightforward for developers and researchers to fine-tune and use the Llama model for natural language processing applications such as text completion, content generation, and language understanding, and it contributes to the field of LLMs by making the model easier to integrate into real-world applications.
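As a quick orientation, here is a minimal sketch of loading the base model named in the config above and generating a few tokens. It assumes the `transformers` and `sentencepiece` packages are installed and that the multi-gigabyte model weights can be downloaded:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the base model and tokenizer named in the Axolotl config above.
tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

inputs = tokenizer("Causal language modelling predicts", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```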
Model Initialization
The class takes a configuration object (`config`) that specifies the architecture and hyperparameters of the Llama model. It initializes the Llama model (`LlamaModel`) using the provided configuration, and it sets up the language modelling head (`lm_head`), a linear layer that maps the model's hidden states to the vocabulary size for predicting the next token.
Class Inheritance
The `LlamaForCausalLM` class inherits from the `LlamaPreTrainedModel` class, the base class for all Llama-based pretrained models.
Class Attributes
The class has a class attribute `_tied_weights_keys`, a list containing the string `"lm_head.weight"`. This attribute is used for weight tying between the input and output embeddings.
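A small check of what tying means in practice. Note that for `NousResearch/Llama-2-7b-hf` the `tie_word_embeddings` config flag is typically `False`, so the comparison below is expected to print `False` for this checkpoint:

```python
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

# _tied_weights_keys marks lm_head.weight as a candidate for tying to the
# input embeddings; whether the tensors are actually shared is governed by
# config.tie_word_embeddings (typically disabled for Llama-2-7B).
print(model.config.tie_word_embeddings)
print(model.lm_head.weight is model.model.embed_tokens.weight)
```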
Initialization (`__init__` method)
The `__init__` method takes a `config` parameter, an instance of a configuration class specific to the Llama model. It calls `super().__init__(config)` to initialize the parent class with the provided configuration, creates an instance of the `LlamaModel` class and assigns it to the `model` attribute, and sets the `vocab_size` attribute based on the `vocab_size` from the configuration. It then creates a linear layer `lm_head` with input size `config.hidden_size` and output size `config.vocab_size`, without bias. Finally, it calls the `post_init()` method to perform any necessary post-initialization steps.
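A condensed paraphrase of this initialization logic (not the verbatim Transformers source; the class is renamed here to avoid shadowing the real one):

```python
import torch.nn as nn
from transformers.models.llama.modeling_llama import LlamaModel, LlamaPreTrainedModel

class CondensedLlamaForCausalLM(LlamaPreTrainedModel):
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config):
        super().__init__(config)          # initialize the shared pretrained-model machinery
        self.model = LlamaModel(config)   # the decoder stack
        self.vocab_size = config.vocab_size
        # Projects hidden states to vocabulary logits; no bias, per the section above.
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()                  # weight initialization and final setup
```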
Embedding Methods
The class provides methods to get and set the input and output embeddings (a usage sketch follows the list):

- `get_input_embeddings()` returns the `embed_tokens` attribute of the `model`.
- `set_input_embeddings(value)` sets the `embed_tokens` attribute of the `model` to the provided `value`.
- `get_output_embeddings()` returns the `lm_head` attribute.
- `set_output_embeddings(new_embeddings)` sets the `lm_head` attribute to the provided `new_embeddings`.
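A short usage sketch: after adding tokens to the tokenizer, the embedding accessors expose the resized matrices. This reuses the checkpoint from the config above:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

# Adding tokens (e.g., a dedicated pad token) grows the vocabulary, so the
# input embeddings and lm_head must be resized to match the tokenizer.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))

print(model.get_input_embeddings().weight.shape)   # (new_vocab_size, hidden_size)
print(model.get_output_embeddings().weight.shape)  # lm_head resized to match
```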
Decoder Methods
The class provides methods to get and set the decoder:
- `set_decoder(decoder)` sets the `model` attribute to the provided `decoder`.
- `get_decoder()` returns the `model` attribute.
Forward Pass (`forward` method)
The `forward` method is decorated with `@add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)` and `@replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)`, which attach docstrings and adjust the return-type documentation. It accepts input parameters such as `input_ids`, `attention_mask`, `position_ids`, `past_key_values`, `inputs_embeds`, `labels`, `use_cache`, `output_attentions`, `output_hidden_states`, `return_dict`, and `cache_position`. The method runs the forward pass of the Llama model by calling `model` with the provided inputs and configuration options, then retrieves the hidden states from the model's output. If `config.pretraining_tp > 1`, it splits `lm_head.weight` into slices, applies a linear transformation to the hidden states with each slice, and concatenates the results; otherwise it applies the `lm_head` linear layer to the hidden states directly. If `labels` are provided, it computes the language modelling loss using `CrossEntropyLoss`. Finally, it returns the computed logits and other outputs according to the `return_dict` flag.
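A minimal sketch of the loss path described above: passing `labels` makes `forward()` return the language modelling loss alongside the logits (same checkpoint as earlier):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Passing labels makes forward() compute the causal-LM loss internally:
# logits are shifted one position against the labels before CrossEntropyLoss.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)          # scalar language-modelling loss
print(outputs.logits.shape)  # (batch, seq_len, vocab_size)
```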
Input Preparation for Generation (`prepare_inputs_for_generation` method)
This method prepares the inputs for the generation process.
It handles the caching mechanism for past key values and adjusts the input tensors accordingly.
It also handles the case where `inputs_embeds` are provided instead of `input_ids`, and it returns a dictionary containing the prepared input tensors and configuration options.
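For illustration, `generate()` exercises this method on every decoding step. The sketch below assumes the same checkpoint as earlier:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time", return_tensors="pt")

# generate() calls prepare_inputs_for_generation() at every decoding step:
# after the first step only the newest token is fed in, while the cached
# past_key_values supply the context for all earlier positions.
output_ids = model.generate(**inputs, max_new_tokens=30, use_cache=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```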
Cache Reordering (`_reorder_cache` static method)
This static method reorders the cache (past key values) according to the provided `beam_idx` during beam-search decoding, reordering the past states for each layer using the `index_select` operation.
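A hedged illustration: requesting beam search via `num_beams` is what triggers cache reordering internally, though the exact internals vary across Transformers versions:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

inputs = tokenizer("The three laws of robotics are", return_tensors="pt")

# With num_beams > 1, hypotheses are re-ranked after each step, and the cached
# key/value states must be reordered to follow the surviving beams, which is
# the job cache reordering performs internally during beam search.
output_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```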
Overall, the `LlamaForCausalLM` class is a high-level interface for using the Llama model on causal language modelling tasks.
Reference: AutoTokenizer - a class within the Hugging Face Transformers library
Although the config names `LlamaTokenizer` as the `tokenizer_type`, the analysis below covers `AutoTokenizer`, the generic entry point that resolves and loads concrete tokenizer classes such as `LlamaTokenizer`.
Purpose:
The `AutoTokenizer` class is designed to automatically instantiate the appropriate tokenizer class based on the provided pretrained model name or path. It serves as a convenient way to load tokenizers without explicitly specifying the tokenizer class.
Instantiation
The class cannot be instantiated directly through its `__init__()` method; doing so raises an `EnvironmentError` indicating that the `AutoTokenizer.from_pretrained()` class method should be used instead.
`from_pretrained()` class method
This is the main method used to instantiate the appropriate tokenizer class. It takes the `pretrained_model_name_or_path` parameter, which can be a model ID, a path to a directory containing vocabulary files, or a path/URL to a single vocabulary file. Additional parameters can be passed to customize the tokenizer's behavior, such as `use_fast`, `tokenizer_type`, `trust_remote_code`, and other tokenizer-specific arguments. The method first checks whether `tokenizer_type` is provided and, if so, tries to load the corresponding tokenizer class. If `tokenizer_type` is not provided, it attempts to resolve the tokenizer class from the `tokenizer_config` or `config` associated with the pretrained model. If the tokenizer class is still not found, it falls back to the `model_type` derived from the configuration class.
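A minimal usage sketch of the resolution behavior described above:

```python
from transformers import AutoTokenizer

# AutoTokenizer reads tokenizer_config.json (or the model config) from the Hub
# and instantiates the matching tokenizer class for this checkpoint.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
print(type(tokenizer).__name__)  # e.g. "LlamaTokenizerFast"

ids = tokenizer("Hello, Llama!")["input_ids"]
print(ids)
print(tokenizer.decode(ids))
```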
Configuration handling
The class uses the `PretrainedConfig` class to determine the appropriate tokenizer class to instantiate. It first tries to load the tokenizer configuration from the `tokenizer_config` file associated with the pretrained model; if that is not available, it falls back to loading the model configuration with the `AutoConfig` class.
Fast tokenizers
The class supports loading fast tokenizers, which are implemented in Rust and provide faster tokenization.
If `use_fast` is set to `True` (the default), it tries to load the fast tokenizer version when one is available; if not, it falls back to the slow (Python-based) tokenizer.
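A small comparison sketch, assuming both tokenizer variants are available for this checkpoint:

```python
from transformers import AutoTokenizer

fast = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")  # use_fast=True is the default
slow = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf", use_fast=False)

print(type(fast).__name__, fast.is_fast)  # LlamaTokenizerFast True (Rust-backed)
print(type(slow).__name__, slow.is_fast)  # LlamaTokenizer False (pure Python)
```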
Trust remote code
The class includes a `trust_remote_code` parameter that controls whether custom tokenizer code defined on the Hugging Face Hub may be loaded. When set to `True`, code hosted on the Hub is executed on the local machine, so it should only be enabled for trusted repositories.
Error handling
The class raises appropriate exceptions and provides informative error messages when the requested tokenizer class is not found or when there are inconsistencies in the provided parameters.
Tokenizer registration
The class provides a `register()` method to add new tokenizers to the tokenizer mapping, registering a configuration class along with the corresponding slow and fast tokenizer classes.
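A hedged sketch of the registration API; `MyConfig` and `MySlowTokenizer` are hypothetical placeholders, not classes from Transformers:

```python
from transformers import AutoConfig, AutoTokenizer, PretrainedConfig, PreTrainedTokenizer

# Hypothetical placeholder classes, for illustration only.
class MyConfig(PretrainedConfig):
    model_type = "my-model"

class MySlowTokenizer(PreTrainedTokenizer):
    pass  # a real tokenizer would implement the vocabulary methods

# Register the config type and map it to the custom tokenizer so that
# AutoTokenizer can resolve it for checkpoints using this model_type.
AutoConfig.register("my-model", MyConfig)
AutoTokenizer.register(MyConfig, slow_tokenizer_class=MySlowTokenizer)
```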
Overall, the `AutoTokenizer` class provides a convenient and flexible way to load tokenizers based on the pretrained model name or path. It hides the complexity of determining the appropriate tokenizer class, offers options for customization, and raises clear exceptions with informative messages on invalid usage.
The next step after setting the model type configurations is to configure the data loading and processing parameters.