
Llama3 - All Configurations

This is a summary of the entire configuration file for the fine-tuning run of Llama 3 using 8-bit LoRA PEFT.

base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: datasets/alpagasus/data/train-00000-of-00001-0c59455170918204.parquet
    type: alpaca
    ds_type: parquet
    data_files:
      - train-00000-of-00001-0c59455170918204.parquet
dataset_prepared_path:
val_set_size: 0.10
output_dir: ./llama3-out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: llama3-alpagasus
wandb_entity: continuum-labs
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>

The training configuration file looks mostly fine, but there are a few things that could be adjusted or clarified:

  1. The dataset_prepared_path is empty. If you want Axolotl to cache the tokenized dataset, or to reuse one you have already prepared, provide a path here (see the first sketch after this list).

  2. The lora_target_modules parameter is missing. Because lora_target_linear is set to true, LoRA is already applied to every linear layer; if you want finer control, list the modules explicitly, such as q_proj, k_proj, v_proj, and o_proj (see the first sketch after this list).

  3. The wandb_watch, wandb_name, and wandb_log_model parameters are empty. The project and entity are already set, so if you also want Weights & Biases to watch gradients, name the run, or upload the trained model, provide values for these parameters as well (see the second sketch after this list).

  4. The bf16 parameter is set to auto, which enables bfloat16 only when the hardware supports it (for example, NVIDIA Ampere or newer GPUs). If you want to force a particular precision, set bf16 to true or false explicitly.

  5. The early_stopping_patience, resume_from_checkpoint, and local_rank parameters are empty. If you want early stopping or to resume training from a checkpoint, set the first two (see the third sketch after this list); local_rank is normally filled in by the distributed launcher and can stay empty.

  6. The eval_table_size and eval_max_new_tokens parameters only affect the periodic evaluations run during training (for example, how many sample generations are logged and how long they can be); they do not change the training steps themselves.

  7. The fsdp and fsdp_config parameters are empty. If you want to use Fully Sharded Data Parallel (FSDP) for distributed training, you should provide appropriate configurations (see the fourth sketch after this list).

  8. The special_tokens parameter only specifies the pad_token. If your model requires additional special tokens, such as bos_token, eos_token, or custom tokens, add them here (see the last sketch after this list).
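
For points 1 and 2, a minimal sketch of the two additions is shown below. The cache directory name is only a placeholder, and the module list simply names the attention and MLP projection layers of LlamaForCausalLM, which lora_target_linear: true already covers:

dataset_prepared_path: ./prepared-data  # placeholder cache directory for the tokenized dataset
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj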
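
For point 3, the remaining Weights & Biases fields could be filled in roughly as follows. The run name is a placeholder, and the values for wandb_watch and wandb_log_model are assumptions based on the usual W&B options (gradients/all for watching, checkpoint/end for model logging), so check them against your Axolotl version:

wandb_watch: gradients            # watch gradient histograms
wandb_name: llama3-8b-lora-r32    # placeholder run name
wandb_log_model: end              # upload the model artifact at the end of training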
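
For point 5, early stopping and checkpoint resumption could be enabled like this. The patience value is illustrative, and the checkpoint path is hypothetical; it must point at a checkpoint that actually exists under output_dir:

early_stopping_patience: 3                            # stop after 3 evaluations without improvement
resume_from_checkpoint: ./llama3-out/checkpoint-200   # hypothetical checkpoint directory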
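
For point 7, an FSDP block might look like the sketch below, which follows the pattern of the FSDP examples in the Axolotl documentation. Treat the exact keys and values as assumptions to verify against your Axolotl version, and note that FSDP may not be compatible with the load_in_8bit / adamw_bnb_8bit setup used above:

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: false                 # keep parameters on GPU
  fsdp_state_dict_type: FULL_STATE_DICT      # save a full, unsharded state dict
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer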
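
For point 8, the Llama 3 tokenizer defines <|begin_of_text|> and <|end_of_text|> as its BOS and EOS tokens, so a more explicit block could look like this (reusing <|end_of_text|> as the pad token, as in the original configuration):

special_tokens:
  bos_token: <|begin_of_text|>
  eos_token: <|end_of_text|>
  pad_token: <|end_of_text|>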

Other than these points, the configuration looks valid. Make sure to double-check the paths to your dataset and output directory, and ensure that the hyperparameters are suitable for your specific training task and hardware setup.
