Sequence Configuration

sequence_len

The sequence_len field specifies the maximum length, in tokens, of an input sequence used during training. This value should not exceed the model's context window; many base models have a limit of 2048 tokens. Setting it appropriately ensures that input sequences stay within the model's capacity.
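
For illustration, a minimal sketch of how this field might appear in the YAML config (the value is an example, not a universal default):

```yaml
# Cap training inputs at 2048 tokens; must fit the model's context window
sequence_len: 2048
```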

pad_to_sequence_len

The pad_to_sequence_len field pads all inputs to a constant size, which reduces memory fragmentation and can help prevent Out of Memory (OOM) errors, since uniform input lengths allow memory to be reused more efficiently. This field is left empty in the provided configuration; it should be set to true or false to specify whether inputs are padded to sequence_len.
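
A hedged sketch of how this might be set alongside sequence_len (the value shown is an assumption, not a default):

```yaml
# Pad every input to sequence_len so tensor shapes stay constant,
# trading some wasted compute for more predictable memory use
pad_to_sequence_len: true
```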

max_packed_sequence_len

max_packed_sequence_len sets the maximum sequence length when concatenating training samples together, an approach inspired by StackLLaMA that can make training more efficient. Note, however, that a FutureWarning indicates this field will be deprecated in an upcoming release.

sample_packing

The sample_packing field, when set to true, enables efficient multi-packing with block-diagonal attention and per-sequence position_ids. This setting is recommended for improving training throughput and memory efficiency, especially when dealing with many short sequences.

eval_sample_packing

eval_sample_packing enables or disables sample packing during evaluation. If you encounter errors during evaluation while sample packing is on, set this field to false to disable it.
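
A sketch of how the two packing fields described above might be combined (the specific values are illustrative assumptions):

```yaml
# Pack multiple short samples into one sequence, using block-diagonal
# attention and per-sequence position_ids to keep samples independent
sample_packing: true
# Disable packing at evaluation time if it causes errors there
eval_sample_packing: false
```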

sample_packing_eff_est

sample_packing_eff_est relates to estimating the efficiency of the sample-packing optimization. It is not fully specified here and is left empty in the provided configuration.

total_num_tokens

total_num_tokens is intended to specify the total number of tokens in the dataset. This field is also left empty in the provided configuration.
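
Putting the fields above together, a hypothetical sequence-configuration block might look like the following; the filled-in values are assumptions, and the last two fields are left unset, as in the configuration described above:

```yaml
sequence_len: 2048
pad_to_sequence_len: true
sample_packing: true
eval_sample_packing: false
# Left empty here; see the field descriptions above
sample_packing_eff_est:
total_num_tokens:
```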
