Llama2 - Model Configuration
The first configuration block of the Axolotl configuration file is the model type block. It comprises three main settings:
- `base_model`
- `model_type`
- `tokenizer_type`
```yaml
base_model: NousResearch/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
```

Below is an analysis of the Huggingface Transformers classes that are used in the Axolotl training script:
Reference: LlamaForCausalLM - a class within the Huggingface Transformers library
The LlamaForCausalLM class is a high-level interface for using the Llama language model on causal language modelling tasks.
Causal language modelling involves predicting the next word or token in a sequence based on the previous tokens. The class encapsulates the complexities of the underlying model architecture and offers methods for initialization, input/output handling, the forward pass, and text generation.
This enables developers and researchers to easily fine-tune and apply the Llama model to natural language processing tasks such as text completion, content generation, and language understanding, making the model more accessible and easier to integrate into real-world applications.
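To make the "predict the next token from the previous ones" idea concrete, here is a minimal, stdlib-only sketch in which a simple bigram frequency model stands in for the Llama network. The corpus and tokens are invented for illustration:

```python
# Toy sketch of causal language modelling: predict the next token from
# the previous tokens. A bigram count model stands in for the Llama
# network; a real LLM conditions on the whole preceding context.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# Count which token follows each token (a one-token context window).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent continuation of `token`."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

LlamaForCausalLM does the same thing in principle, but produces a probability over the full vocabulary at every position using learned transformer weights.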
Model Initialization
- The class takes a configuration object (`config`) that specifies the architecture and hyperparameters of the Llama model.
- It initializes the Llama model (`LlamaModel`) using the provided configuration.
- It sets up the language modelling head (`lm_head`), a linear layer that maps the hidden states of the model to the vocabulary size for predicting the next token.
Class Inheritance
The LlamaForCausalLM class inherits from the LlamaPreTrainedModel class, which is a base class for all Llama-based pretrained models.
Class Attributes
The class has a class attribute `_tied_weights_keys`, a list containing the string "lm_head.weight". This attribute is used for weight tying between the input and output embeddings.
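Weight tying means the output projection reuses the input embedding table rather than storing a second copy. A minimal sketch of the idea (the `Embedding` class here is a toy stand-in, not the PyTorch module):

```python
# Sketch of weight tying: lm_head and the input embedding table share
# the SAME underlying weights, so "lm_head.weight" is listed in
# _tied_weights_keys and is not stored twice in a checkpoint.
class Embedding:
    def __init__(self, table):
        self.weight = table

embed_tokens = Embedding([[0.1, 0.2], [0.3, 0.4]])
lm_head = Embedding(embed_tokens.weight)  # tie: share the same object

embed_tokens.weight[0][0] = 9.9           # update the input embeddings
print(lm_head.weight[0][0])               # the tied head sees 9.9 too
```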
Initialization (__init__ method)
- The `__init__` method takes a `config` parameter, an instance of a configuration class specific to the Llama model.
- It calls `super().__init__(config)` to initialize the parent class with the provided configuration.
- It creates an instance of the `LlamaModel` class using the provided configuration and assigns it to the `model` attribute.
- It sets the `vocab_size` attribute based on the `vocab_size` from the configuration.
- It creates a linear layer `lm_head` with input size `config.hidden_size` and output size `config.vocab_size`, without bias.
- Finally, it calls the `post_init()` method to perform any necessary post-initialization steps.
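The steps above can be sketched structurally without PyTorch. `ToyConfig`, `ToyModel`, and `ToyLinear` below are invented stand-ins for `LlamaConfig`, `LlamaModel`, and `nn.Linear`; the real class also inherits from `LlamaPreTrainedModel` and calls `post_init()`:

```python
# Structural sketch of LlamaForCausalLM.__init__ with toy stand-ins.
from dataclasses import dataclass

@dataclass
class ToyConfig:
    hidden_size: int
    vocab_size: int

class ToyLinear:
    def __init__(self, in_features, out_features, bias=True):
        self.in_features, self.out_features, self.bias = in_features, out_features, bias

class ToyModel:  # stands in for LlamaModel, the decoder backbone
    def __init__(self, config):
        self.config = config

class ToyForCausalLM:
    def __init__(self, config):
        self.model = ToyModel(config)        # the decoder stack
        self.vocab_size = config.vocab_size  # copied from the config
        # lm_head maps hidden states -> vocabulary logits, without bias
        self.lm_head = ToyLinear(config.hidden_size, config.vocab_size, bias=False)

m = ToyForCausalLM(ToyConfig(hidden_size=8, vocab_size=32))
print(m.lm_head.out_features)  # 32
```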
Embedding Methods
The class provides methods to get and set the input and output embeddings:
- `get_input_embeddings()` returns the `embed_tokens` attribute of the `model`.
- `set_input_embeddings(value)` sets the `embed_tokens` attribute of the `model` to the provided `value`.
- `get_output_embeddings()` returns the `lm_head` attribute.
- `set_output_embeddings(new_embeddings)` sets the `lm_head` attribute to the provided `new_embeddings`.
Decoder Methods
The class provides methods to get and set the decoder:
- `set_decoder(decoder)` sets the `model` attribute to the provided `decoder`.
- `get_decoder()` returns the `model` attribute.
Forward Pass (forward method)
- The `forward` method is decorated with `@add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)` and `@replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)`, which add docstrings and modify the return type documentation.
- It takes various input parameters such as `input_ids`, `attention_mask`, `position_ids`, `past_key_values`, `inputs_embeds`, `labels`, `use_cache`, `output_attentions`, `output_hidden_states`, `return_dict`, and `cache_position`.
- It performs the forward pass of the Llama model by calling the `model` with the provided inputs and configuration options.
- It retrieves the hidden states from the model's output.
- If `config.pretraining_tp > 1`, it splits the `lm_head.weight` into slices and applies a linear transformation to the hidden states using each slice, concatenating the results. Otherwise, it applies the `lm_head` linear layer to the hidden states.
- It calculates the language modelling loss if `labels` are provided, using the `CrossEntropyLoss` function.
- Finally, it returns the computed logits and other outputs based on the `return_dict` flag.
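The logits-and-loss step can be illustrated in pure Python for a single position. The numbers are arbitrary; the real code uses `torch.nn.CrossEntropyLoss` on shifted labels across the whole sequence:

```python
# Minimal sketch of the loss step: apply lm_head to a hidden state to
# get vocabulary logits, then score the correct next token with
# cross-entropy (negative log-softmax probability of the label).
import math

def lm_head(hidden, weight):
    """Linear layer without bias: logits[v] = sum_d hidden[d] * weight[v][d]."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]

def cross_entropy(logits, label):
    """-log softmax(logits)[label], computed stably."""
    z = max(logits)
    log_norm = z + math.log(sum(math.exp(l - z) for l in logits))
    return log_norm - logits[label]

hidden = [1.0, -0.5]                             # one position's hidden state
weight = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]    # vocab_size=3, hidden_size=2
logits = lm_head(hidden, weight)                 # [2.0, -1.0, 0.5]
loss = cross_entropy(logits, label=0)            # label = true next-token id
print(loss)
```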
Input Preparation for Generation (prepare_inputs_for_generation method)
- This method prepares the inputs for the generation process.
- It handles the caching mechanism for past key values and adjusts the input tensors accordingly.
- It also handles the case where `inputs_embeds` are provided instead of `input_ids`.
- It returns a dictionary containing the prepared input tensors and configurations.
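The core caching adjustment can be sketched as follows. This is a deliberately simplified model: `past_length` stands in for the cache bookkeeping, and the real method also prepares `attention_mask`, `position_ids`, and the cache object itself:

```python
# Sketch of the caching adjustment: once past key/values exist, only
# tokens the cache has not yet seen need to be fed through the model.
def prepare_inputs_for_generation(input_ids, past_length=0, inputs_embeds=None):
    if past_length > 0:
        # Keep only the new tokens; the cache already covers the rest.
        input_ids = input_ids[past_length:]
    if inputs_embeds is not None and past_length == 0:
        # inputs_embeds are only usable on the first generation step.
        return {"inputs_embeds": inputs_embeds}
    return {"input_ids": input_ids}

print(prepare_inputs_for_generation([5, 8, 13, 21], past_length=3))
# only the unseen token id 21 is passed forward
```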
Cache Reordering (_reorder_cache static method)
- This static method reorders the cache (past key values) based on the provided `beam_idx` during beam search decoding.
- It reorders the past states for each layer using the `index_select` operation.
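A list-based sketch of the reordering, where plain indexing plays the role of `index_select` on the beam dimension. The cache layout (one `(key, value)` pair per layer, indexed by beam) mirrors the description above; the string entries are placeholders for tensors:

```python
# Sketch of cache reordering during beam search: each layer's cached
# key and value states are re-indexed so they follow the surviving beams.
def reorder_cache(past_key_values, beam_idx):
    return tuple(
        tuple([state[i] for i in beam_idx] for state in layer_past)
        for layer_past in past_key_values
    )

# One layer: (key, value) caches for 3 beams.
layer = (["k0", "k1", "k2"], ["v0", "v1", "v2"])
print(reorder_cache((layer,), beam_idx=[2, 0, 0]))
# beam 2 survives in slot 0; beam 0 is duplicated into slots 1 and 2
```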
Overall, the LlamaForCausalLM class is a high-level interface for using the Llama model for causal language modelling tasks.
Reference: AutoTokenizer - the Huggingface Transformers class used to load tokenizers such as LlamaTokenizer
Purpose:
- The `AutoTokenizer` class is designed to automatically instantiate the appropriate tokenizer class based on the provided pretrained model name or path.
- It serves as a convenient way to load tokenizers without explicitly specifying the tokenizer class.
Instantiation
The class cannot be instantiated directly using the `__init__()` method. Instead, it raises an `EnvironmentError` to indicate that the class should be instantiated using the `AutoTokenizer.from_pretrained()` class method.
from_pretrained() class method
- This is the main method used to instantiate the appropriate tokenizer class.
- It takes the `pretrained_model_name_or_path` parameter, which can be a model ID, a path to a directory containing vocabulary files, or a path/URL to a single vocabulary file.
- Additional parameters can be passed to customize the tokenizer's behavior, such as `use_fast`, `tokenizer_type`, `trust_remote_code`, and other tokenizer-specific arguments.
- The method first checks if the `tokenizer_type` is provided and tries to load the corresponding tokenizer class.
- If `tokenizer_type` is not provided, it attempts to load the tokenizer class based on the `tokenizer_config` or `config` associated with the pretrained model.
- If the tokenizer class is not found, it falls back to using the `model_type` derived from the configuration class.
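The resolution order above can be sketched as a small dispatcher. The registries here are toy dicts standing in for the real transformers tokenizer mappings:

```python
# Toy sketch of AutoTokenizer's resolution order: an explicit
# tokenizer_type wins, then the tokenizer_config entry, then a
# fallback keyed on the model_type from the model configuration.
BY_TOKENIZER_TYPE = {"llama": "LlamaTokenizer"}
BY_MODEL_TYPE = {"llama": "LlamaTokenizer"}

def resolve_tokenizer(tokenizer_type=None, tokenizer_config=None, model_type=None):
    if tokenizer_type is not None:
        return BY_TOKENIZER_TYPE[tokenizer_type]          # explicit request
    if tokenizer_config and "tokenizer_class" in tokenizer_config:
        return tokenizer_config["tokenizer_class"]        # from tokenizer_config
    return BY_MODEL_TYPE[model_type]                      # model_type fallback

print(resolve_tokenizer(model_type="llama"))  # fallback path
```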
Configuration handling
- The class uses the `PretrainedConfig` class to determine the appropriate tokenizer class to instantiate.
- It first tries to load the tokenizer configuration from the `tokenizer_config` file associated with the pretrained model.
- If the `tokenizer_config` is not available, it falls back to using the `AutoConfig` class to load the model configuration.
Fast tokenizers
- The class supports loading fast tokenizers, which are implemented in Rust and provide faster tokenization.
- If `use_fast` is set to `True` (the default), it tries to load the fast tokenizer version if available.
- If the fast tokenizer is not available, it falls back to the slow (Python-based) tokenizer.
Trust remote code
- The class includes a `trust_remote_code` parameter to control whether to allow loading custom tokenizers defined on the Hugging Face Hub.
- If set to `True`, it executes code present on the Hub on the local machine, which should only be done for trusted repositories.
Error handling
The class raises appropriate exceptions and provides informative error messages when the requested tokenizer class is not found or when there are inconsistencies in the provided parameters.
Tokenizer registration
- The class provides a `register()` method to register new tokenizers in the tokenizer mapping.
- It allows registering a configuration class along with the corresponding slow and fast tokenizer classes.
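A toy sketch of the registration idea, using a plain dict where the real library keeps its tokenizer mapping (the actual call is `AutoTokenizer.register(config_class, slow_tokenizer_class=..., fast_tokenizer_class=...)`):

```python
# Toy sketch of tokenizer registration: map a configuration class to
# its slow and fast tokenizer classes so later lookups can find them.
# The names below are invented examples, not real transformers classes.
TOKENIZER_MAPPING = {}

def register(config_class, slow_tokenizer_class=None, fast_tokenizer_class=None):
    TOKENIZER_MAPPING[config_class] = (slow_tokenizer_class, fast_tokenizer_class)

register("MyConfig", slow_tokenizer_class="MyTokenizer",
         fast_tokenizer_class="MyTokenizerFast")
print(TOKENIZER_MAPPING["MyConfig"])
```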
Overall, the AutoTokenizer class provides a convenient and flexible way to load tokenizers based on the pretrained model name or path.
It handles the complexity of determining the appropriate tokenizer class and provides options for customization. The class is well-structured and follows good coding practices, such as raising exceptions for invalid usage and providing clear error messages.
The next step after determining the model type configurations is to configure the data loading and processing parameters.