Llama2 - Model Configuration
The first configuration block of the Axolotl configuration file is the model type block. It comprises three main settings:

- `base_model`
- `model_type`
- `tokenizer_type`

```yaml
base_model: NousResearch/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
```

The additional `is_llama_derived_model: true` flag signals to Axolotl that the model is Llama-derived, so that Llama-specific handling can be applied.
Below is an analysis of the Hugging Face Transformers classes used in the Axolotl training script:
Reference: LlamaForCausalLM - a class within the Hugging Face Transformers library
The `LlamaForCausalLM` class is a high-level interface for using the Llama language model on causal language modelling tasks, in which the model predicts the next word or token in a sequence from the preceding tokens. It encapsulates the complexities of the underlying architecture and offers methods for initialization, input/output handling, the forward pass, and text generation. This makes it straightforward for developers and researchers to fine-tune and use the Llama model for natural language processing applications such as text completion, content generation, and language understanding, and it contributes to the field of LLMs by making the model easier to integrate into real-world applications.
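As a quick orientation, here is a minimal sketch of loading the base model named in the config above and generating a few tokens. It assumes the `transformers` and `sentencepiece` packages are installed and that the multi-gigabyte model weights can be downloaded:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the base model and tokenizer named in the Axolotl config above.
tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

inputs = tokenizer("Causal language modelling predicts", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```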
Model Initialization
The class takes a configuration object (`config`) that specifies the architecture and hyperparameters of the Llama model. It initializes the Llama model (`LlamaModel`) using the provided configuration, and it sets up the language modelling head (`lm_head`), a linear layer that maps the model's hidden states to the vocabulary size for predicting the next token.
Class Inheritance
The `LlamaForCausalLM` class inherits from the `LlamaPreTrainedModel` class, the base class for all Llama-based pretrained models.
Class Attributes
The class has a class attribute `_tied_weights_keys`, a list containing the string `"lm_head.weight"`. This attribute is used for weight tying between the input and output embeddings.
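A small check of what tying means in practice. Note that for `NousResearch/Llama-2-7b-hf` the `tie_word_embeddings` config flag is typically `False`, so the comparison below is expected to print `False` for this checkpoint:

```python
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

# _tied_weights_keys marks lm_head.weight as a candidate for tying to the
# input embeddings; whether the tensors are actually shared is governed by
# config.tie_word_embeddings (typically disabled for Llama-2-7B).
print(model.config.tie_word_embeddings)
print(model.lm_head.weight is model.model.embed_tokens.weight)
```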
Initialization (`__init__` method)
The `__init__` method takes a `config` parameter, an instance of a configuration class specific to the Llama model. It calls `super().__init__(config)` to initialize the parent class with the provided configuration, creates an instance of the `LlamaModel` class and assigns it to the `model` attribute, and sets the `vocab_size` attribute based on the `vocab_size` from the configuration. It then creates a linear layer `lm_head` with input size `config.hidden_size` and output size `config.vocab_size`, without bias. Finally, it calls the `post_init()` method to perform any necessary post-initialization steps.
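A condensed paraphrase of this initialization logic (not the verbatim Transformers source; the class is renamed here to avoid shadowing the real one):

```python
import torch.nn as nn
from transformers.models.llama.modeling_llama import LlamaModel, LlamaPreTrainedModel

class CondensedLlamaForCausalLM(LlamaPreTrainedModel):
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config):
        super().__init__(config)          # initialize the shared pretrained-model machinery
        self.model = LlamaModel(config)   # the decoder stack
        self.vocab_size = config.vocab_size
        # Projects hidden states to vocabulary logits; no bias, per the section above.
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()                  # weight initialization and final setup
```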
Embedding Methods
The class provides methods to get and set the input and output embeddings (a usage sketch follows the list):

- `get_input_embeddings()` returns the `embed_tokens` attribute of the `model`.
- `set_input_embeddings(value)` sets the `embed_tokens` attribute of the `model` to the provided `value`.
- `get_output_embeddings()` returns the `lm_head` attribute.
- `set_output_embeddings(new_embeddings)` sets the `lm_head` attribute to the provided `new_embeddings`.
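A short usage sketch: after adding tokens to the tokenizer, the embedding accessors expose the resized matrices. This reuses the checkpoint from the config above:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

# Adding tokens (e.g., a dedicated pad token) grows the vocabulary, so the
# input embeddings and lm_head must be resized to match the tokenizer.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))

print(model.get_input_embeddings().weight.shape)   # (new_vocab_size, hidden_size)
print(model.get_output_embeddings().weight.shape)  # lm_head resized to match
```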
Decoder Methods
The class provides methods to get and set the decoder:
- `set_decoder(decoder)` sets the `model` attribute to the provided `decoder`.
- `get_decoder()` returns the `model` attribute.
Forward Pass (`forward` method)
The `forward` method is decorated with `@add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)` and `@replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)`, which attach docstrings and adjust the return-type documentation. It accepts input parameters such as `input_ids`, `attention_mask`, `position_ids`, `past_key_values`, `inputs_embeds`, `labels`, `use_cache`, `output_attentions`, `output_hidden_states`, `return_dict`, and `cache_position`. The method runs the forward pass of the Llama model by calling `model` with the provided inputs and configuration options, then retrieves the hidden states from the model's output. If `config.pretraining_tp > 1`, it splits `lm_head.weight` into slices, applies a linear transformation to the hidden states with each slice, and concatenates the results; otherwise it applies the `lm_head` linear layer to the hidden states directly. If `labels` are provided, it computes the language modelling loss using `CrossEntropyLoss`. Finally, it returns the computed logits and other outputs according to the `return_dict` flag.
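A minimal sketch of the loss path described above: passing `labels` makes `forward()` return the language modelling loss alongside the logits (same checkpoint as earlier):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Passing labels makes forward() compute the causal-LM loss internally:
# logits are shifted one position against the labels before CrossEntropyLoss.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)          # scalar language-modelling loss
print(outputs.logits.shape)  # (batch, seq_len, vocab_size)
```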
Input Preparation for Generation (`prepare_inputs_for_generation` method)
This method prepares the inputs for the generation process.
It handles the caching mechanism for past key values and adjusts the input tensors accordingly.
It also handles the case where `inputs_embeds` are provided instead of `input_ids`, and it returns a dictionary containing the prepared input tensors and configuration options.
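For illustration, `generate()` exercises this method on every decoding step. The sketch below assumes the same checkpoint as earlier:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time", return_tensors="pt")

# generate() calls prepare_inputs_for_generation() at every decoding step:
# after the first step only the newest token is fed in, while the cached
# past_key_values supply the context for all earlier positions.
output_ids = model.generate(**inputs, max_new_tokens=30, use_cache=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```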
Cache Reordering (`_reorder_cache` static method)
This static method reorders the cache (past key values) according to the provided `beam_idx` during beam-search decoding, reordering the past states for each layer using the `index_select` operation.
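A hedged illustration: requesting beam search via `num_beams` is what triggers cache reordering internally, though the exact internals vary across Transformers versions:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")

inputs = tokenizer("The three laws of robotics are", return_tensors="pt")

# With num_beams > 1, hypotheses are re-ranked after each step, and the cached
# key/value states must be reordered to follow the surviving beams, which is
# the job cache reordering performs internally during beam search.
output_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```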
Overall, the `LlamaForCausalLM` class is a high-level interface for using the Llama model on causal language modelling tasks.
Reference: AutoTokenizer - a class within the Hugging Face Transformers library
Although the config names `LlamaTokenizer` as the `tokenizer_type`, the analysis below covers `AutoTokenizer`, the generic entry point that resolves and loads concrete tokenizer classes such as `LlamaTokenizer`.
Purpose:
The `AutoTokenizer` class is designed to automatically instantiate the appropriate tokenizer class based on the provided pretrained model name or path. It serves as a convenient way to load tokenizers without explicitly specifying the tokenizer class.
Instantiation
The class cannot be instantiated directly through its `__init__()` method; doing so raises an `EnvironmentError` indicating that the `AutoTokenizer.from_pretrained()` class method should be used instead.
`from_pretrained()` class method
This is the main method used to instantiate the appropriate tokenizer class. It takes the `pretrained_model_name_or_path` parameter, which can be a model ID, a path to a directory containing vocabulary files, or a path/URL to a single vocabulary file. Additional parameters can be passed to customize the tokenizer's behavior, such as `use_fast`, `tokenizer_type`, `trust_remote_code`, and other tokenizer-specific arguments. The method first checks whether `tokenizer_type` is provided and, if so, tries to load the corresponding tokenizer class. If `tokenizer_type` is not provided, it attempts to resolve the tokenizer class from the `tokenizer_config` or `config` associated with the pretrained model. If the tokenizer class is still not found, it falls back to the `model_type` derived from the configuration class.
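A minimal usage sketch of the resolution behavior described above:

```python
from transformers import AutoTokenizer

# AutoTokenizer reads tokenizer_config.json (or the model config) from the Hub
# and instantiates the matching tokenizer class for this checkpoint.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
print(type(tokenizer).__name__)  # e.g. "LlamaTokenizerFast"

ids = tokenizer("Hello, Llama!")["input_ids"]
print(ids)
print(tokenizer.decode(ids))
```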
Configuration handling
The class uses the `PretrainedConfig` class to determine the appropriate tokenizer class to instantiate. It first tries to load the tokenizer configuration from the `tokenizer_config` file associated with the pretrained model; if that is not available, it falls back to loading the model configuration with the `AutoConfig` class.
Fast tokenizers
The class supports loading fast tokenizers, which are implemented in Rust and provide faster tokenization.
If `use_fast` is set to `True` (the default), it tries to load the fast tokenizer version when one is available; if not, it falls back to the slow (Python-based) tokenizer.
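A small comparison sketch, assuming both tokenizer variants are available for this checkpoint:

```python
from transformers import AutoTokenizer

fast = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")  # use_fast=True is the default
slow = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf", use_fast=False)

print(type(fast).__name__, fast.is_fast)  # LlamaTokenizerFast True (Rust-backed)
print(type(slow).__name__, slow.is_fast)  # LlamaTokenizer False (pure Python)
```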
Trust remote code
The class includes a `trust_remote_code` parameter that controls whether custom tokenizer code defined on the Hugging Face Hub may be loaded. When set to `True`, code hosted on the Hub is executed on the local machine, so it should only be enabled for trusted repositories.
Error handling
The class raises appropriate exceptions and provides informative error messages when the requested tokenizer class is not found or when there are inconsistencies in the provided parameters.
Tokenizer registration
The class provides a `register()` method to add new tokenizers to the tokenizer mapping, registering a configuration class along with the corresponding slow and fast tokenizer classes.
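A hedged sketch of the registration API; `MyConfig` and `MySlowTokenizer` are hypothetical placeholders, not classes from Transformers:

```python
from transformers import AutoConfig, AutoTokenizer, PretrainedConfig, PreTrainedTokenizer

# Hypothetical placeholder classes, for illustration only.
class MyConfig(PretrainedConfig):
    model_type = "my-model"

class MySlowTokenizer(PreTrainedTokenizer):
    pass  # a real tokenizer would implement the vocabulary methods

# Register the config type and map it to the custom tokenizer so that
# AutoTokenizer can resolve it for checkpoints using this model_type.
AutoConfig.register("my-model", MyConfig)
AutoTokenizer.register(MyConfig, slow_tokenizer_class=MySlowTokenizer)
```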
Overall, the `AutoTokenizer` class provides a convenient and flexible way to load tokenizers based on the pretrained model name or path. It hides the complexity of determining the appropriate tokenizer class, offers options for customization, and raises clear exceptions with informative messages on invalid usage.
The next step after setting the model type configurations is to configure the data loading and processing parameters.