# Llama3 - Data and Precision

These configurations allow you to control various aspects of the training process, such as data handling, precision settings, and hardware utilisation.

```yaml
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
```

Let's go through each configuration and explain its purpose and implications:

#### <mark style="color:blue;">`train_on_inputs: false`</mark>

* This configuration determines whether the human's prompt is included in or masked out of the training labels.
* When set to `false`, the prompt tokens are excluded from the training labels, so no loss is computed on them.
* The prompt still forms part of the model's input; the model simply learns only from the desired output or response.
* This is useful when you want the model to generate responses based on the given prompts without explicitly learning to reproduce the prompts themselves.
* By masking out the human's prompt, the model can focus on learning the mapping between the prompt and the desired output.
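
The masking described above can be sketched in a few lines of plain Python. This is an illustration only, not the framework's own implementation: the function name `build_labels` and the token IDs are hypothetical, and the sketch assumes the common convention of marking ignored positions with `-100`, which PyTorch's cross-entropy loss skips by default.

```python
IGNORE_INDEX = -100  # assumed convention: label value the loss function skips


def build_labels(prompt_ids, response_ids, train_on_inputs=False):
    """Return (input_ids, labels) for one prompt/response pair (hypothetical helper)."""
    input_ids = list(prompt_ids) + list(response_ids)
    if train_on_inputs:
        # Every token contributes to the loss, prompt included.
        labels = list(input_ids)
    else:
        # Prompt positions are masked; only response tokens are learned.
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(input_ids)  # [11, 12, 13, 21, 22]
print(labels)     # [-100, -100, -100, 21, 22]
```

Note that the prompt tokens stay in `input_ids` either way; only the labels change.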

#### <mark style="color:blue;">`group_by_length: false`</mark>

* This configuration controls whether to group similarly sized data together to minimise padding during training.
* When set to `false`, the data will not be grouped by length and will be processed in the order it appears in the dataset.
* Grouping data by length can be beneficial when working with variable-length sequences, as it helps to reduce the amount of padding needed.
* Padding is the process of adding dummy tokens to shorter sequences to match the length of the longest sequence in a batch.
* By grouping similarly sized data together, you can minimise the amount of unnecessary padding, which can lead to more efficient memory usage and faster training.
* However, enabling `group_by_length` may result in slower data loading and preprocessing, as it requires downloading and sorting the entire dataset before training.
* It's also worth noting that when `group_by_length` is enabled, the training loss may exhibit an oscillating pattern due to the reordering of the data.
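
To make the padding saving concrete, here is a small sketch that counts pad tokens for the same set of sequence lengths, first batched in dataset order and then batched after sorting by length. The lengths and the helper `padding_tokens` are made up for illustration, and a full sort is a simplification: real length-grouped samplers typically keep some randomness in how batches are formed.

```python
def padding_tokens(lengths, batch_size):
    """Count pad tokens when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += sum(max(batch) - n for n in batch)
    return total


# Hypothetical token counts for eight training examples.
lengths = [12, 480, 15, 510, 9, 495, 20, 500]

in_order = padding_tokens(lengths, batch_size=2)
grouped = padding_tokens(sorted(lengths), batch_size=2)
print(in_order, grouped)  # 1929 33
```

With short and long examples interleaved, almost every batch pads a short sequence up to a long one; grouping by length removes most of that waste.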

#### <mark style="color:blue;">`bf16: auto`</mark>

* This configuration relates to the use of BFloat16 (BF16) precision during training.
* BFloat16 is a 16-bit floating-point format that offers a wider dynamic range than the more common FP16 (half-precision) format, at the cost of fewer mantissa bits and therefore lower precision.
* When set to `auto`, the framework will automatically determine whether to use BF16 based on the available hardware and software support.
* If the hardware (e.g., GPU) and software (e.g., PyTorch version) support BF16, it will be used for training.
* BF16 can provide a good balance between computational efficiency and numeric precision, potentially leading to faster training times while maintaining model accuracy.
* However, the actual performance gains may vary depending on the specific hardware and model architecture.
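
The trade-off between range and precision can be seen with a rough emulation of BF16 in plain Python. The helper `to_bf16` below is hypothetical and keeps only the top 16 bits of a float32 value; real hardware normally rounds to nearest rather than truncating, so treat this as a sketch of the bit layout, not of exact hardware behaviour.

```python
import struct


def to_bf16(x):
    """Round-trip x through a bfloat16-like value by truncating float32 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= 0xFFFF0000  # keep the sign, 8 exponent bits and top 7 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# A very large magnitude is still representable, with a small relative error.
print(to_bf16(1.0e30))

# Near 1.0 the spacing between values is 2**-7, so 1.001 collapses to 1.0.
print(to_bf16(1.001))  # 1.0
```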

#### <mark style="color:blue;">`fp16:`</mark>

* This configuration is related to the use of FP16 (Half-precision) during training, but in the provided configuration, it is left empty.
* FP16 is a 16-bit floating-point format that offers reduced precision compared to the standard FP32 (Single-precision) format.
* Using FP16 can help to reduce memory usage and accelerate training on certain hardware (e.g., NVIDIA GPUs with Tensor Cores).
* However, an empty value is read as null in YAML, so FP16 is not being explicitly enabled or configured in this case.
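
The reduced range and precision of FP16 can be checked with Python's standard `struct` module, whose `e` format code packs IEEE 754 half-precision values. The helper name `to_fp16` is hypothetical. One difference from training frameworks is worth flagging: `struct` raises an error on overflow, whereas tensor libraries generally produce `inf` instead.

```python
import struct


def to_fp16(x):
    """Round-trip x through IEEE 754 half precision (raises OverflowError if too large)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_fp16(65504.0))  # 65504.0, the largest finite FP16 value
print(to_fp16(0.1))      # 0.0999755859375, the nearest FP16 value to 0.1

try:
    to_fp16(70000.0)
except OverflowError as exc:
    print("overflow:", exc)
```

The narrow range is one reason FP16 training usually relies on loss scaling, which BF16 generally does not need.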

#### <mark style="color:blue;">`tf32: false`</mark>

* This configuration is specific to NVIDIA GPUs and relates to the use of TensorFloat-32 (TF32) precision.
* TF32 is a 19-bit floating-point format (1 sign bit, 8 exponent bits, 10 mantissa bits) that NVIDIA Ampere architecture GPUs (e.g., NVIDIA A100) can use for certain operations, such as matrix multiplications and convolutions.
* When set to `true`, TF32 will be used for supported operations on compatible hardware. In the configuration above it is set to `false`, so TF32 is not enabled.
* TF32 offers a balance between performance and precision: it keeps the dynamic range of FP32 while reducing mantissa precision to that of FP16, which allows faster computation than FP32 with similar accuracy for most workloads.
* Enabling TF32 can lead to improved training speeds on NVIDIA Ampere GPUs without significant impact on model quality.
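
As a quick side-by-side of the formats discussed on this page, the sketch below derives the total bit count, the largest finite value, and the spacing between values near 1.0 from each format's exponent and mantissa widths. The `describe` helper is hypothetical, and the formulas assume standard IEEE-style encoding with one sign bit.

```python
FORMATS = {
    # name: (exponent bits, mantissa bits)
    "fp32": (8, 23),
    "tf32": (8, 10),
    "bf16": (8, 7),
    "fp16": (5, 10),
}


def describe(exp_bits, man_bits):
    """Return (total bits, largest finite value, spacing just above 1.0)."""
    bias = 2 ** (exp_bits - 1) - 1
    largest = (2 - 2.0 ** -man_bits) * 2.0 ** bias
    return 1 + exp_bits + man_bits, largest, 2.0 ** -man_bits


for name, (e, m) in FORMATS.items():
    bits, largest, step = describe(e, m)
    print(f"{name}: {bits} bits, max ~{largest:.3e}, step near 1.0 = {step:.1e}")
```

TF32 shares its exponent width with FP32 and BF16 and its mantissa width with FP16, which is why it comes out at 19 bits.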

