Llama 3 - All Configurations
This is a summary of the entire configuration file for a fine-tuning run of Llama 3 using 8-bit LoRA PEFT.
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
datasets:
  - path: datasets/alpagasus/data/train-00000-of-00001-0c59455170918204.parquet
    type: alpaca
    ds_type: parquet
    data_files:
      - train-00000-of-00001-0c59455170918204.parquet
dataset_prepared_path:
val_set_size: 0.10
output_dir: ./llama3-out
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: llama3-alpagasus
wandb_entity: continuum-labs
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
The training configuration file looks mostly fine, but there are a few things that could be adjusted or clarified:
The dataset_prepared_path field is empty. If you want Axolotl to save the prepared (tokenised) dataset and reuse it on later runs, you should provide a path here.
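For example, a one-line sketch (the directory name is an illustrative placeholder; Axolotl's example configs commonly use last_run_prepared):

dataset_prepared_path: ./last_run_prepared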
The lora_target_modules parameter is missing. Since lora_target_linear is set to true, LoRA should already be applied to all linear layers; set lora_target_modules only if you want explicit control over which projections are adapted, such as q_proj, v_proj, k_proj, o_proj, etc.
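If you do want to pin LoRA to specific modules, a typical list for the Hugging Face Llama architecture looks like the sketch below; the module names are the standard LlamaForCausalLM projection layers, and which ones you include is a tuning choice rather than a requirement:

lora_target_modules:
  - q_proj      # attention projections
  - k_proj
  - v_proj
  - o_proj
  - gate_proj   # MLP projections
  - up_proj
  - down_proj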
The wandb_watch, wandb_name, and wandb_log_model parameters are empty. If you want to use Weights & Biases for logging and tracking, you should provide appropriate values for these parameters.
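A possible sketch, assuming you want gradient histograms and uploaded checkpoints in W&B; the run name is a hypothetical placeholder, and the accepted values mirror the underlying Hugging Face W&B integration:

wandb_watch: gradients
wandb_name: llama3-8b-lora-alpagasus   # hypothetical run name
wandb_log_model: checkpoint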
The bf16 parameter is set to auto. Make sure your hardware supports bfloat16 (e.g. NVIDIA Ampere-class GPUs or newer) if you want to use it; otherwise, you can set it to false.
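On a GPU without bfloat16 support, one common fallback is mixed-precision float16, as in this sketch:

bf16: false
fp16: true
tf32: false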
The early_stopping_patience, resume_from_checkpoint, and local_rank parameters are empty. If you want to use early stopping, resume from a checkpoint, or perform distributed training, you should provide appropriate values for these parameters.
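For example, to stop after three evaluations without improvement and to resume an interrupted run (the checkpoint path is hypothetical; local_rank is normally injected by the launcher, e.g. torchrun or accelerate, so you rarely set it by hand):

early_stopping_patience: 3
resume_from_checkpoint: ./llama3-out/checkpoint-500   # hypothetical checkpoint directory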
The eval_table_size and eval_max_new_tokens parameters relate to evaluation rather than to training itself: they control how many sample predictions are logged and how many new tokens are generated for those samples during each evaluation pass.
The fsdp and fsdp_config parameters are empty. If you want to use Fully Sharded Data Parallel (FSDP) for distributed training, you should provide appropriate configurations.
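A minimal FSDP sketch for a Llama-style model is shown below. The key names follow Axolotl's published FSDP examples, so verify them against your Axolotl version, and note that FSDP may not combine cleanly with 8-bit quantised loading (load_in_8bit: true).

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT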
The special_tokens parameter only specifies the pad_token. If your model requires additional special tokens, such as bos_token, eos_token, or custom tokens, you should add them here.
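For the Meta-Llama-3-8B base model, a fuller block might look like the sketch below; the token strings assume the stock Llama 3 tokenizer, so confirm them against your tokenizer before overriding anything:

special_tokens:
  bos_token: <|begin_of_text|>
  eos_token: <|end_of_text|>
  pad_token: <|end_of_text|>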
Other than these points, the configuration looks valid. Make sure to double-check the paths to your dataset and output directory, and ensure that the hyperparameters are suitable for your specific training task and hardware setup.