# Augmentation Techniques

<table data-full-width="false"><thead><tr><th width="248">Field Name</th><th>Explanation</th></tr></thead><tbody><tr><td>noisy_embedding_alpha</td><td><code>noisy_embedding_alpha</code> enables NEFTune (Noisy Embedding Fine-Tuning), which adds random noise to the token embeddings during training. Set it to a number (e.g., 5) to control the noise scale; larger values add more noise. Like data augmentation, this introduces variability into training and can improve robustness and generalization.</td></tr><tr><td>flash_optimum</td><td><code>flash_optimum</code> determines whether to use BetterTransformer, provided by the Hugging Face Optimum library. BetterTransformer replaces supported model layers with faster, fused PyTorch-native implementations.</td></tr><tr><td>xformers_attention</td><td><code>xformers_attention</code> specifies whether to use the attention patch from the xFormers library. xFormers provides optimized implementations of transformer components, including memory-efficient attention.</td></tr><tr><td>flash_attention</td><td><code>flash_attention</code> controls whether to use the Flash Attention patch from the flash-attention library, which provides fast, memory-efficient exact attention kernels for GPUs.</td></tr><tr><td>flash_attn_cross_entropy</td><td><code>flash_attn_cross_entropy</code> determines whether to use flash-attention's fused cross-entropy loss implementation. This is an advanced option; enable it only if your setup specifically needs it.</td></tr><tr><td>flash_attn_rms_norm</td><td><code>flash_attn_rms_norm</code> specifies whether to use flash-attention's Root Mean Square (RMS) Norm implementation. RMS Norm normalizes activations by dividing them by their root mean square and applying a learned per-feature scale; unlike LayerNorm, it does not subtract the mean.</td></tr><tr><td>flash_attn_fuse_qkv</td><td><code>flash_attn_fuse_qkv</code> controls whether to fuse the Query, Key, and Value (QKV) projections of the attention mechanism into a single operation. Replacing three matrix multiplications with one can improve training efficiency.</td></tr><tr><td>flash_attn_fuse_mlp</td><td><code>flash_attn_fuse_mlp</code> determines whether to fuse part of the Multi-Layer Perceptron (MLP), the feed-forward block that follows attention in each transformer layer, into a single operation. Like the previous option, this aims to improve efficiency.</td></tr><tr><td>sdp_attention</td><td><code>sdp_attention</code> specifies whether to use PyTorch's scaled dot-product attention (<code>torch.nn.functional.scaled_dot_product_attention</code>). This function computes the core attention operation of transformer models and can dispatch to fused, memory-efficient kernels.</td></tr><tr><td>landmark_attention</td><td><code>landmark_attention</code> is used only with LLaMA and controls whether to use landmark attention. Landmark attention inserts landmark tokens that represent blocks of the input, letting the model retrieve relevant blocks and work with longer contexts.</td></tr><tr><td>xpos_rope</td><td><code>xpos_rope</code> is specific to LLaMA and enables xPos, a modification of RoPE (Rotary Position Embedding) that adds a distance-dependent decay to the rotary encoding to help the model extrapolate to longer sequences.</td></tr></tbody></table>
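NEFTune, controlled by `noisy_embedding_alpha`, perturbs embeddings only in training forward passes; at inference the embeddings are left unchanged. The sketch below shows the noise rule from the NEFTune paper in plain Python: each embedding entry receives uniform noise in [-1, 1] scaled by alpha / sqrt(L * d), where L is the sequence length and d is the embedding width. The helper name `neftune_noise` is hypothetical and this is not Axolotl's code; a real implementation applies the noise to embedding tensors inside the model.

```python
import math
import random


def neftune_noise(embeddings, alpha, rng=None):
    """Return a noisy copy of one sequence of embeddings (hypothetical helper).

    embeddings: list of L vectors, each a list of d floats.
    alpha: noise scale, the role played by noisy_embedding_alpha (e.g. 5).
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    # Noise magnitude shrinks as the sequence gets longer or the embeddings wider.
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + rng.uniform(-1.0, 1.0) * scale for x in vec] for vec in embeddings]


emb = [[0.1, 0.2, 0.3, 0.4] for _ in range(8)]  # L = 8, d = 4
noisy = neftune_noise(emb, alpha=5, rng=random.Random(0))
```

With alpha = 5, L = 8 and d = 4, no entry moves by more than 5 / sqrt(32), roughly 0.88.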
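`flash_attn_cross_entropy` replaces the standard cross-entropy loss with a fused flash-attention kernel, but the quantity computed is still ordinary cross entropy over the vocabulary. A minimal plain-Python sketch of the numerically stable computation (log-sum-exp with the row maximum subtracted) follows; it illustrates the math only, says nothing about how the flash-attention kernel is written, and the function name is illustrative.

```python
import math


def cross_entropy(logits, targets):
    """Mean cross entropy over rows of logits, one target class index per row."""
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtracting the max keeps exp() from overflowing
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[target]  # equals -log(softmax(row)[target])
    return total / len(targets)


print(cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2]))  # uniform over 4 classes: log(4)
print(cross_entropy([[1000.0, 0.0]], [0]))          # large logits, no overflow
```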
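The RMS Norm computation behind `flash_attn_rms_norm` is short. This sketch, in plain Python with hypothetical names, divides a vector by its root mean square (plus a small epsilon for stability) and multiplies by a learned per-feature weight; flash-attention's version performs the same math in a fused GPU kernel.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Normalize x by its root mean square, then apply a per-feature weight."""
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


out = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
# With unit weights, the output has a root mean square of (almost exactly) 1.
```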
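Fusing the QKV projections (`flash_attn_fuse_qkv`) works because all three projections read the same input. Stacking the three weight matrices lets one matrix multiplication produce queries, keys, and values at once, and the result is identical to three separate multiplications. The plain-Python sketch below (all names hypothetical) checks that equivalence; on a GPU the benefit comes from launching one larger kernel instead of three smaller ones.

```python
def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def separate_qkv(x, wq, wk, wv):
    # Three independent projections: three matrix multiplications.
    return matvec(wq, x), matvec(wk, x), matvec(wv, x)


def fused_qkv(x, wq, wk, wv):
    # Stack the rows of all three weights (in practice done once, ahead of time),
    # run a single multiplication, then split the output back into Q, K and V.
    w_qkv = wq + wk + wv
    out = matvec(w_qkv, x)
    n_q, n_k = len(wq), len(wk)
    return out[:n_q], out[n_q:n_q + n_k], out[n_q + n_k:]


x = [1, 2, 3]
wq = [[1, 0, 0], [0, 1, 0]]
wk = [[0, 0, 1], [1, 1, 1]]
wv = [[2, 0, 1], [0, 3, 0]]
assert fused_qkv(x, wq, wk, wv) == separate_qkv(x, wq, wk, wv)
```

The same stacking trick works for any projections that share an input, for example the gate and up projections of a LLaMA-style MLP.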
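`torch.nn.functional.scaled_dot_product_attention`, which `sdp_attention` refers to, computes softmax(QK^T / sqrt(d)) V, where d is the query/key dimension, optionally with a mask and dropout. The plain-Python sketch below implements that formula for a single head without masking or dropout, to show what the fused kernel computes; it is not PyTorch's implementation, and the function names are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V for one head; no mask, no dropout."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Identical keys get equal weight, so the output is the mean of the values.
out = scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]],
                                   [[2.0, 0.0], [0.0, 4.0]])
```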
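RoPE rotates each pair of query and key dimensions by an angle proportional to the token's position, so the query-key dot product depends only on the distance between the two tokens. xPos, enabled by `xpos_rope`, additionally scales the query at position n by ζ^(n/B) and the key at position m by ζ^(-m/B), with a per-pair ζ below 1, so their product carries a factor ζ^((n-m)/B) that decays as the distance grows. The plain-Python sketch below shows both. The function names, the rotary base 10000, and the xPos constants (`gamma=0.4`, `scale_base=512`, following the xPos paper as we recall it) are assumptions, not values read from Axolotl's LLaMA patch.

```python
import math


def rope(vec, pos, base=10000.0, xpos_sign=0, gamma=0.4, scale_base=512.0):
    """Rotary position embedding for one vector at integer position `pos`.

    xpos_sign = 0 gives plain RoPE; +1 (queries) or -1 (keys) adds the
    xPos-style scaling. gamma and scale_base are assumed defaults.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)        # rotation angle for this pair
        zeta = (i / d + gamma) / (1 + gamma)  # per-pair decay base, below 1
        scale = zeta ** (xpos_sign * pos / scale_base)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out += [scale * (x0 * c - x1 * s), scale * (x0 * s + x1 * c)]
    return out


def score(q, k, q_pos, k_pos, xpos=False):
    """Dot product of a rotated query and a rotated key."""
    sign = 1 if xpos else 0
    rq = rope(q, q_pos, xpos_sign=sign)
    rk = rope(k, k_pos, xpos_sign=-sign)
    return sum(a * b for a, b in zip(rq, rk))


q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.9]
# Shifting both positions by the same amount leaves the score unchanged.
assert math.isclose(score(q, k, 5, 2), score(q, k, 15, 12), abs_tol=1e-9)
assert math.isclose(score(q, k, 5, 2, xpos=True), score(q, k, 15, 12, xpos=True), abs_tol=1e-9)
```

Both variants keep the relative-position property; xPos only changes how strongly the score is scaled as a function of distance.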

