Llama3 - All Configurations
This is a summary of the entire configuration file for the fine-tuning run of Llama3 using 8-bit LoRA PEFT.
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
datasets:
  - path: datasets/alpagasus/data/train-00000-of-00001-0c59455170918204.parquet
    type: alpaca
    ds_type: parquet
    data_files:
      - train-00000-of-00001-0c59455170918204.parquet
dataset_prepared_path:
val_set_size: 0.10
output_dir: ./llama3-out
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: llama3-alpagasus
wandb_entity: continuum-labs
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```

The training configuration file looks mostly fine, but there are a few things that could be adjusted or clarified:

- The `dataset_prepared_path` is empty. If you want to use a prepared dataset, you should provide a path here.
- The `lora_target_modules` parameter is missing. You might want to specify which modules to apply LoRA to, such as `q_proj`, `v_proj`, `k_proj`, `o_proj`, etc. (see the sketch after this list).
- The `wandb_watch`, `wandb_name`, and `wandb_log_model` parameters are empty. If you want to use Weights & Biases for logging and tracking, you should provide appropriate values for these parameters (a sketch follows below).
- The `bf16` parameter is set to `auto`. Make sure your hardware supports bfloat16 if you want to use it. Otherwise, you can set it to `false`.
- The `early_stopping_patience`, `resume_from_checkpoint`, and `local_rank` parameters are empty. If you want to use early stopping, resume from a checkpoint, or perform distributed training, you should provide appropriate values for these parameters.
- The `eval_table_size` and `eval_max_new_tokens` parameters are present but not typically used in the training configuration. These parameters are usually used during evaluation or inference.
- The `fsdp` and `fsdp_config` parameters are empty. If you want to use FullyShardedDataParallel for distributed training, you should provide appropriate configurations (a sketch follows below).
- The `special_tokens` parameter only specifies the `pad_token`. If your model requires additional special tokens, such as `bos_token`, `eos_token`, or custom tokens, you should add them here (a sketch follows below).
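For the LoRA point above, here is a minimal sketch of what an explicit target-module list could look like for a Llama-style model. The module names below are the usual attention and MLP projections in `LlamaForCausalLM`, but treat the exact list as an assumption to verify against your checkpoint; note also that with `lora_target_linear: true` Axolotl already adapts all linear layers, so an explicit list is an alternative rather than a requirement:

```yaml
# Illustrative only: explicit LoRA targets for a Llama-style model.
# Redundant if lora_target_linear: true is kept.
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```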
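The Weights & Biases fields could be filled in along these lines. The run name is a made-up placeholder, and the `wandb_watch` / `wandb_log_model` values shown are just one reasonable choice to adapt to your tracking needs:

```yaml
# Illustrative placeholders for W&B tracking (project/entity as in the config above).
wandb_watch: gradients                # log gradient histograms during training
wandb_name: llama3-8b-lora-8bit-run1  # placeholder run name
wandb_log_model: checkpoint           # upload checkpoints as W&B artifacts
```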
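If multi-GPU training with FSDP is wanted, an Axolotl-style configuration might look roughly like the following. The key names and values are a sketch based on common Axolotl examples, so check them against the documentation for the version you are running, and verify that FSDP is compatible with your chosen quantized loading mode before relying on it:

```yaml
# Illustrative FSDP settings; verify key names against your Axolotl version.
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: false
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
```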
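For the special tokens, the Llama 3 base tokenizer already defines `<|begin_of_text|>` and `<|end_of_text|>`, so an expanded block is only needed if you want to set them (or additional custom tokens) explicitly. A sketch, assuming the standard Meta-Llama-3-8B tokenizer:

```yaml
# Illustrative: explicit special tokens for Meta-Llama-3-8B.
special_tokens:
  bos_token: <|begin_of_text|>
  eos_token: <|end_of_text|>
  pad_token: <|end_of_text|>
```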
Other than these points, the configuration looks valid. Make sure to double-check the paths to your dataset and output directory, and ensure that the hyperparameters are suitable for your specific training task and hardware setup.