Sequence Configuration
sequence_len
The sequence_len field specifies the maximum length of an input sequence to use during training. This value should not exceed the model's context window, which for many base models is 2048 tokens. Setting it appropriately ensures that input sequences stay within the model's capacity.
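A minimal configuration sketch (the value is illustrative; choose it to match your model's context window):

```yaml
# Cap training inputs at the model's context window
sequence_len: 2048
```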
pad_to_sequence_len
The pad_to_sequence_len field pads inputs to a constant size, which can reduce memory fragmentation and help prevent out-of-memory (OOM) errors: when all input sequences share a uniform length, allocated memory can be re-used more efficiently. This field is a boolean, so set it to true or false to specify whether inputs are padded to sequence_len; it is left empty in the provided configuration.
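For example, enabling padding alongside the sequence length (values are illustrative):

```yaml
sequence_len: 2048
# Pad every input to a constant size to reduce memory fragmentation
pad_to_sequence_len: true
```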
max_packed_sequence_len
max_packed_sequence_len sets the maximum sequence length when concatenating training samples together. This concept is inspired by StackLLaMA and can make training more efficient. Note, however, that a FutureWarning indicates this field will soon be deprecated.
sample_packing
When set to true, the sample_packing field enables efficient multi-packing with block diagonal attention and per-sequence position_ids. This setting is recommended for optimizing training and memory efficiency, especially when dealing with long sequences.
eval_sample_packing
eval_sample_packing enables or disables sample packing during evaluation. If you encounter errors during evaluation while sample packing is on, set this field to false to disable it.
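Taken together, a typical packing setup might look like this (a sketch; values are illustrative):

```yaml
# Efficient multi-packing with block diagonal attention
sample_packing: true
# Disable packing at evaluation time if it causes errors
eval_sample_packing: false
```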
sample_packing_eff_est
sample_packing_eff_est provides an estimate of the efficiency of the sample packing optimization. It is left empty in the provided configuration.
total_num_tokens
total_num_tokens is intended to specify the total number of tokens in the dataset. This field is also left empty in the configuration.