# Llama 3 - All Configurations

This is a summary of the full Axolotl configuration file for a fine-tuning run of Llama 3 using 8-bit LoRA PEFT.

```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: datasets/alpagasus/data/train-00000-of-00001-0c59455170918204.parquet
    type: alpaca
    ds_type: parquet
    data_files:
      - train-00000-of-00001-0c59455170918204.parquet
dataset_prepared_path:
val_set_size: 0.10
output_dir: ./llama3-out

sequence_len: 4096
sample_packing: true  # pack multiple short examples into each 4096-token sequence
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true  # apply LoRA to all linear layers
lora_fan_in_fan_out:

wandb_project: llama3-alpagasus
wandb_entity: continuum-labs
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
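
With `micro_batch_size: 2` and `gradient_accumulation_steps: 4`, each GPU processes an effective batch of 2 × 4 = 8 packed sequences per optimizer step (multiplied by the GPU count when running data-parallel). With Axolotl, a config like this is typically launched with something like `accelerate launch -m axolotl.cli.train llama3.yaml`, where `llama3.yaml` is whatever filename the configuration is saved under.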

The training configuration is mostly sound, but a few points are worth adjusting or clarifying:

1. `dataset_prepared_path` is empty. Axolotl re-tokenizes the dataset on each run; set a path here to cache the prepared dataset and reuse it across runs.
2. `lora_target_modules` is not specified. Since `lora_target_linear: true` applies LoRA to all linear layers, an explicit list is not required; set `lora_target_modules` only if you want to restrict the adapter to specific projections such as `q_proj`, `v_proj`, `k_proj`, or `o_proj` (sketched below).
3. `wandb_watch`, `wandb_name`, and `wandb_log_model` are empty. With `wandb_project` and `wandb_entity` set, runs are still logged to Weights & Biases; fill in the remaining fields only if you want gradient watching, a fixed run name, or model uploads.
4. `bf16` is set to `auto`, which enables bfloat16 only if the GPU supports it (Ampere or newer). On older hardware, set `bf16: false` and enable `fp16` instead.
5. `early_stopping_patience`, `resume_from_checkpoint`, and `local_rank` are empty. Set the first two to enable early stopping or to resume a run (sketched below); `local_rank` is normally injected by the distributed launcher and can stay unset.
6. `eval_table_size` and `eval_max_new_tokens` configure the in-training evaluation loop: `eval_table_size` controls how many sample predictions are logged to Weights & Biases (empty here, so the table is disabled), and `eval_max_new_tokens` caps the tokens generated per logged sample.
7. `fsdp` and `fsdp_config` are empty, so training runs without FullyShardedDataParallel. Provide both to shard the model across GPUs (sketched below).
8. `special_tokens` only sets `pad_token`. If the model or task requires additional special tokens, such as `bos_token`, `eos_token`, or custom tokens, add them here (sketched below).

Other than these points, the configuration is valid. Double-check the dataset and output paths, and make sure the hyperparameters suit your task and hardware. The sketches below illustrate some of the optional settings mentioned above.
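
For point 2, a restricted module list could look like the sketch below. With `lora_target_linear: true` already in place this is redundant; the module names are the standard Llama attention and MLP projections.

```yaml
lora_target_linear: false   # disable the blanket "all linear layers" setting
lora_target_modules:        # restrict LoRA to these projections
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```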

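For point 5, early stopping and resumption are enabled by filling in the corresponding fields; the checkpoint path below is hypothetical and should point at a real checkpoint inside `output_dir`.

```yaml
early_stopping_patience: 3                           # stop after 3 evals with no improvement
resume_from_checkpoint: ./llama3-out/checkpoint-500  # hypothetical checkpoint directory
```
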
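For point 7, an FSDP setup in this config format looks roughly like the sketch below; the keys follow Axolotl's FSDP examples and may differ between versions, and FSDP may not combine cleanly with `load_in_8bit`, so treat this as a starting point rather than a drop-in addition.

```yaml
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: false               # keep params on GPU; true trades speed for memory
  fsdp_state_dict_type: FULL_STATE_DICT    # save a full, unsharded state dict
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
```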

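For point 8, a fuller `special_tokens` block for the Llama 3 base tokenizer could look like this; `<|begin_of_text|>` and `<|end_of_text|>` are Llama 3's built-in BOS and EOS tokens, and reusing EOS as the pad token matches the original config.

```yaml
special_tokens:
  bos_token: <|begin_of_text|>
  eos_token: <|end_of_text|>
  pad_token: <|end_of_text|>
```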