# Llama3 - All Configurations

This is a summary of the entire configuration file for the fine-tuning run of Llama3 using 8-bit LoRA PEFT.

```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: datasets/alpagasus/data/train-00000-of-00001-0c59455170918204.parquet
    type: alpaca
    ds_type: parquet
    data_files:
      - train-00000-of-00001-0c59455170918204.parquet
dataset_prepared_path:
val_set_size: 0.10
output_dir: ./llama3-out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: llama3-alpagasus
wandb_entity: continuum-labs
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
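As a quick sanity check on the batch settings above (`micro_batch_size: 2`, `gradient_accumulation_steps: 4`), the effective batch size per optimizer step can be sketched as follows. `num_gpus` is an assumption for illustration (single GPU); the other values come from the config.

```python
# Hedged sketch: how the batch settings combine into the effective batch
# size per optimizer step. num_gpus=1 is an assumption, not a config value.
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    return micro_batch_size * gradient_accumulation_steps * num_gpus

print(effective_batch_size(2, 4))  # 8 sequences per optimizer step
```

With sample packing enabled, each of those sequences may itself contain several packed examples, so the number of training samples seen per step can be higher.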

The training configuration file looks mostly fine, but a few settings could be adjusted or clarified:

1. The `dataset_prepared_path` is empty. If you want to use a prepared dataset, you should provide a path here.
2. The `lora_target_modules` parameter is missing. You might want to specify which modules to apply LoRA to, such as `q_proj`, `v_proj`, `k_proj`, `o_proj`, etc.
3. The `wandb_watch`, `wandb_name`, and `wandb_log_model` parameters are empty. If you want to use Weights & Biases for logging and tracking, you should provide appropriate values for these parameters.
4. The `bf16` parameter is set to `auto`. Make sure your hardware supports bfloat16 if you want to use it. Otherwise, you can set it to `false`.
5. The `early_stopping_patience`, `resume_from_checkpoint`, and `local_rank` parameters are empty. If you want to use early stopping, resume from a checkpoint, or perform distributed training, you should provide appropriate values for these parameters.
6. The `eval_table_size` and `eval_max_new_tokens` parameters govern the sample generations logged during in-training evaluations (e.g. to Weights & Biases) rather than the training loop itself. Leaving `eval_table_size` empty disables the sample table.
7. The `fsdp` and `fsdp_config` parameters are empty. If you want to use FullyShardedDataParallel for distributed training, you should provide appropriate configurations.
8. The `special_tokens` parameter only specifies the `pad_token`. If your model requires additional special tokens, such as `bos_token`, `eos_token`, or custom tokens, you should add them here.
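For points 2 and 8, a hedged sketch of what the additional settings might look like. The module names follow common Llama-style projection layers, and the token values are illustrative examples, not the exact settings used in this run:

```yaml
# Example values only -- not part of the run documented above.
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj

special_tokens:
  pad_token: <|end_of_text|>
  eos_token: <|end_of_text|>
```

Note that `lora_target_linear: true` in the config above already targets all linear layers, so an explicit `lora_target_modules` list is only needed if you want finer control.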

Other than these points, the configuration looks valid. Double-check the paths to your dataset and output directory, and make sure the hyperparameters suit your specific training task and hardware setup.
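Since `type: alpaca` expects instruction-tuning rows, a minimal sketch of the kind of check you might run against the dataset before training. The field names follow the standard Alpaca schema; the sample rows are illustrative:

```python
# Required fields in the standard Alpaca instruction format.
REQUIRED_FIELDS = {"instruction", "input", "output"}

def is_alpaca_row(row: dict) -> bool:
    """Return True if the row carries every field the alpaca loader expects."""
    return REQUIRED_FIELDS.issubset(row)

print(is_alpaca_row({"instruction": "Summarize.", "input": "", "output": "A summary."}))  # True
print(is_alpaca_row({"prompt": "...", "completion": "..."}))  # False
```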



---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://axolotl.continuumlabs.pro/llama3/llama3-all-configurations.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
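As a sketch, the request URL can be constructed from Python. The question string below is illustrative, and actually issuing the GET (e.g. with `urllib.request` or `requests`) is omitted since it needs network access:

```python
from urllib.parse import urlencode

# Build a query URL for the ?ask mechanism described above.
base = "https://axolotl.continuumlabs.pro/llama3/llama3-all-configurations.md"
question = "Which optimizer does this configuration use?"  # illustrative question
url = f"{base}?{urlencode({'ask': question})}"
print(url)
```

URL-encoding the question ensures spaces and punctuation survive the query string intact.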
