Llama3 - Data and Precision
These configurations allow you to control various aspects of the training process, such as data handling, precision settings, and hardware utilisation.
Let's go through each configuration and explain its purpose and implications:
train_on_inputs: false
This configuration determines whether to include or mask out the human's prompt from the training labels.
When set to false, the model will not train on the human's prompt, meaning that the prompt tokens are excluded from the training labels. In other words, the model learns only from the desired output or response, not from the input prompt.
This is useful when you want the model to generate responses based on the given prompts without explicitly learning to reproduce the prompts themselves.
By masking out the human's prompt, the model can focus on learning the mapping between the prompt and the desired output.
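As a concrete illustration of prompt masking, here is a minimal sketch in plain Python (not Axolotl's actual implementation; the helper name and token ids are made up): prompt positions receive the label -100, which PyTorch's cross-entropy loss ignores, so only the response tokens contribute to the loss.

```python
# Minimal sketch of prompt masking (illustrative, not Axolotl's code).
# Labels of -100 are ignored by PyTorch's cross-entropy loss, so the
# model is only penalised on the response tokens.

IGNORE_INDEX = -100

def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response ids and mask the prompt in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Made-up token ids purely for demonstration.
prompt_ids = [101, 2054, 2003, 1996, 3007]   # e.g. "What is the capital ..."
response_ids = [3000, 1012, 102]             # e.g. "Paris."
input_ids, labels = build_labels(prompt_ids, response_ids)
print(labels)  # [-100, -100, -100, -100, -100, 3000, 1012, 102]
```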
group_by_length: false
This configuration controls whether to group similarly sized data together to minimise padding during training.
When set to false, the data is not grouped by length and is processed in the order it appears in the dataset. Grouping data by length can be beneficial when working with variable-length sequences, as it helps reduce the amount of padding needed.
Padding is the process of adding dummy tokens to shorter sequences to match the length of the longest sequence in a batch.
By grouping similarly sized data together, you can minimise the amount of unnecessary padding, which can lead to more efficient memory usage and faster training.
However, enabling group_by_length may result in slower data loading and preprocessing, as it requires loading and sorting the entire dataset by length before training. It's also worth noting that when group_by_length is enabled, the training loss may exhibit an oscillating pattern due to the reordering of the data.
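To make the padding trade-off concrete, the rough sketch below (illustrative only; it is not the sampler Axolotl or Hugging Face actually uses) sorts examples by length before batching and reports how much padding each batch would need.

```python
# Rough sketch of length grouping (illustrative, not the real sampler).
# Sorting by length keeps similarly sized sequences in the same batch,
# which reduces the padding needed to equalise lengths within a batch.

def batches_grouped_by_length(sequences, batch_size):
    """Yield batches of sequences with similar lengths."""
    ordered = sorted(sequences, key=len)  # needs every length known up front
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

sequences = [[1] * n for n in (5, 48, 7, 50, 6, 49)]
for batch in batches_grouped_by_length(sequences, batch_size=2):
    longest = max(len(seq) for seq in batch)
    padding = sum(longest - len(seq) for seq in batch)
    print(f"lengths {[len(seq) for seq in batch]} -> padding tokens: {padding}")
```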
bf16: auto
This configuration relates to the use of BFloat16 (BF16) precision during training.
BFloat16 is a 16-bit floating-point format that offers a wider dynamic range compared to the more common FP16 (Half-precision) format.
When set to auto, the framework automatically determines whether to use BF16 based on the available hardware and software support. If the hardware (e.g., the GPU) and software (e.g., the PyTorch version) support BF16, it will be used for training.
BF16 can provide a good balance between computational efficiency and numeric precision, potentially leading to faster training times while maintaining model accuracy.
However, the actual performance gains may vary depending on the specific hardware and model architecture.
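A minimal sketch of the kind of check an auto setting implies, assuming a PyTorch backend (this is not Axolotl's exact logic): use BF16 when both the GPU and the PyTorch build support it, otherwise fall back to FP16 or FP32.

```python
# Sketch of an "auto" precision check (assumed logic, not Axolotl's source).
import torch

def resolve_dtype():
    """Prefer BF16 where supported, then FP16 on CUDA, else FP32 on CPU."""
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    if torch.cuda.is_available():
        return torch.float16
    return torch.float32

print(f"Selected training dtype: {resolve_dtype()}")
```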
fp16:
This configuration is related to the use of FP16 (Half-precision) during training, but in the provided configuration, it is left empty.
FP16 is a 16-bit floating-point format that offers reduced precision compared to the standard FP32 (Single-precision) format.
Using FP16 can help to reduce memory usage and accelerate training on certain hardware (e.g., NVIDIA GPUs with Tensor Cores).
However, the empty value suggests that FP16 is not being explicitly enabled or configured in this case.
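For context, when fp16 is enabled in a PyTorch-based trainer it typically corresponds to automatic mixed precision with loss scaling. The sketch below is a generic illustration of that pattern (placeholder model, optimiser, and data; it assumes a CUDA GPU is available), not the framework's own training loop.

```python
# Generic FP16 mixed-precision step with PyTorch AMP (illustrative only).
import torch

model = torch.nn.Linear(16, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 underflow

inputs = torch.randn(8, 16, device="cuda")
targets = torch.randint(0, 2, (8,), device="cuda")

with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)

scaler.scale(loss).backward()  # backward pass on the scaled loss
scaler.step(optimizer)         # unscales gradients, then steps the optimiser
scaler.update()                # adjusts the scale factor for the next step
```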
tf32: true
This configuration is specific to NVIDIA GPUs and relates to the use of TensorFloat-32 (TF32) precision.
TF32 is a 19-bit floating-point format (1 sign bit, 8 exponent bits, and a 10-bit mantissa) that is used by default on NVIDIA Ampere architecture GPUs (e.g., the NVIDIA A100) for certain operations, such as matrix multiplications and convolutions.
When set to true, TF32 will be used for supported operations on compatible hardware. TF32 offers a balance between performance and precision, providing faster computation than FP32 while maintaining similar accuracy.
Enabling TF32 can lead to improved training speeds on NVIDIA Ampere GPUs without significant impact on model quality.
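In PyTorch, a tf32: true setting typically maps onto the backend flags shown below (an assumed correspondence given for illustration, not a quote from Axolotl's source).

```python
# Enable TF32 math for matmuls and cuDNN kernels (assumed mapping of tf32: true).
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # TF32 for matrix multiplications
torch.backends.cudnn.allow_tf32 = True        # TF32 for cuDNN convolutions

print("TF32 matmul enabled:", torch.backends.cuda.matmul.allow_tf32)
```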