Llama2 - Model Quantization
With the model configured, we next have to determine whether we will use quantization in the training process.
We will be using LoRA (Low-Rank Adaptation), a Parameter-Efficient Fine-Tuning (PEFT) technique.
Model Configuration
The first configuration block of the Axolotl configuration file defines the model. It comprises three main settings:
base_model
model_type
tokenizer_type
load_in_8bit: true
load_in_4bit: false
strict: false
load_in_8bit: true or false
This is a configuration flag that determines whether the model should be loaded in 8-bit precision. If it is set to "true," the model will be loaded in 8-bit precision.
Memory Efficiency: 8-bit precision reduces the memory footprint of the model compared to higher precision formats (like 16-bit or 32-bit). This is because it requires less memory to store each weight in the model.
Loading a model in 8-bit precision can accelerate model loading and inference times. This is due to the reduced computational load compared to higher precision formats.
While 8-bit precision is more efficient, it can slightly reduce the accuracy of the model compared to full precision (32-bit). This happens because of the reduced resolution in representing the weights and activations.
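The memory savings are easy to estimate from the parameter count alone. The sketch below (plain arithmetic, not an Axolotl or bitsandbytes API) compares the weight storage needed for a 7B-parameter model such as Llama2-7B at different precisions; activations, optimizer state, and framework overhead are extra.

```python
# Approximate memory needed to hold only the weights of a
# 7B-parameter model at various precisions.
PARAMS = 7_000_000_000  # e.g. Llama2-7B

def weight_memory_gb(params: int, bits_per_weight: int) -> float:
    """Bytes required for the weights alone, converted to gibibytes."""
    return params * bits_per_weight / 8 / 1024**3

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(PARAMS, bits):6.1f} GB")
```

Halving the bit-width halves the footprint: 8-bit loading needs roughly a quarter of the memory of full 32-bit precision, which is often the difference between fitting on a single consumer GPU or not.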
load_in_4bit: true or false
This is a configuration flag that determines whether the model should be loaded in 4-bit precision. If it is set to "true," the model will be loaded in 4-bit precision.
4-bit precision takes the concept of memory efficiency further, halving the memory requirements compared to 8-bit. This can be crucial for deploying large models on limited hardware.
Similar to 8-bit, 4-bit precision can lead to even faster loading and inference times due to the further reduced computational requirements.
The trade-off in accuracy might be more pronounced in 4-bit precision. The reduced bit-depth means that the model's ability to represent nuanced information in weights and activations is more limited. This might affect tasks that require high precision or are sensitive to small changes in weights.
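The loss of resolution can be illustrated with a toy uniform quantizer (a simplification; real 4-bit schemes such as NF4 use smarter, non-uniform level placement). Rounding a few example weight values to 8-bit and 4-bit grids shows how the worst-case rounding error grows as the bit-depth shrinks:

```python
def quantize(x: float, bits: int, scale: float = 1.0) -> float:
    """Round x to the nearest level of a symmetric uniform
    quantizer with 2**bits levels spanning [-scale, scale]."""
    step = 2 * scale / (2 ** bits - 1)
    return round(x / step) * step

# A handful of illustrative weight values in [-1, 1].
weights = [0.013, -0.271, 0.508, -0.744, 0.996]

for bits in (8, 4):
    errors = [abs(w - quantize(w, bits)) for w in weights]
    print(f"{bits}-bit max rounding error: {max(errors):.4f}")
```

With only 16 levels available, the 4-bit grid leaves rounding errors more than an order of magnitude larger than the 8-bit grid, which is why the accuracy trade-off is more pronounced.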
strict: true or false
This flag controls how strictly the checkpoint weights must match the model definition when loading. If set to false, default weights are used wherever keys are missing (for example, when only adapter weights are supplied), rather than raising an error on a mismatch.
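Putting the settings above together, a minimal model-and-quantization fragment of an Axolotl config could look like the following sketch. The base_model, model_type, and tokenizer_type values shown are illustrative choices for Llama2-7B, not requirements:

```yaml
base_model: meta-llama/Llama-2-7b-hf   # example model id
model_type: LlamaForCausalLM           # example class for Llama2
tokenizer_type: LlamaTokenizer         # example tokenizer class

load_in_8bit: true    # 8-bit loading for LoRA fine-tuning
load_in_4bit: false   # only one of the two flags should be true
strict: false         # tolerate missing keys when loading weights
```

Note that load_in_8bit and load_in_4bit are mutually exclusive: enable at most one of them.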