Data Loading and Processing
Field Name | Explanation |
---|---|
datasets | Datasets provide the training data for the model. |
path | The Hugging Face Hub dataset repo or local path to load the data from. |
type | The prompt format type of the dataset, such as "alpaca", "sharegpt", or "completion". |
ds_type | The optional underlying file format of the dataset, such as "json", "arrow", "parquet", "text", or "csv". |
data_files | If necessary, the specific source file(s) within the dataset to load. |
shards | The optional number of shards to split the dataset into. |
name | You can provide an optional name of the dataset configuration to load. |
train_on_split | The optional dataset split to train on; defaults to "train". |
conversation | For specific types of prompts like "sharegpt," this optional field defines the fastchat conversation type. It's typically used in conjunction with the "sharegpt" type and allows customization of conversation style. |
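A minimal sketch of how these fields fit together in an axolotl-style YAML config; the dataset repo names and values below are hypothetical:

```yaml
datasets:
  # Instruction dataset loaded from the Hugging Face Hub (hypothetical repo)
  - path: your-org/alpaca-style-data
    type: alpaca
    ds_type: json            # optional: override the detected file format
    data_files:
      - train.jsonl          # optional: load only specific files
    shards: 10               # optional: split the dataset into 10 shards
    train_on_split: train    # optional: defaults to "train"
  # Conversational dataset using a fastchat conversation style
  - path: your-org/chat-data
    type: sharegpt
    conversation: chatml
```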
Custom User Prompt | Explanation |
---|---|
system_prompt | The optional system prompt to use for the dataset. |
system_format | The optional format string for rendering the system prompt, e.g. "{system}". |
field_system | The optional dataset column that holds the system prompt. |
field_instruction | The optional dataset column that holds the instruction. |
field_input | The optional dataset column that holds the input. |
field_output | The optional dataset column that holds the output. |
format | The format string used to render the full prompt from the fields above, e.g. "[INST] {instruction} {input} [/INST]". |
no_input_format | The format string used when the input field is empty. |
field | For "completion" datasets, this field can be used to specify a custom field in the dataset to be used instead of the default "text" column. This customization can be beneficial for specific use cases. |
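When a dataset's columns don't match the defaults, these fields can be combined into a custom prompt definition. A hedged sketch, assuming a hypothetical dataset with `question`, `context`, and `answer` columns:

```yaml
datasets:
  - path: your-org/custom-qa-data   # hypothetical repo
    type:
      system_prompt: "You are a helpful assistant."
      field_instruction: question   # map "question" to the instruction slot
      field_input: context          # map "context" to the input slot
      field_output: answer          # map "answer" to the output slot
      format: "[INST] {instruction} {input} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
```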
Dataset Prepared Path | Explanation |
---|---|
dataset_prepared_path | The relative path where the prepared (tokenized) dataset is cached, so preprocessing can be skipped on subsequent runs. |
Push Dataset to Hub | Explanation |
---|---|
push_dataset_to_hub | The Hugging Face Hub repo to push the prepared dataset to, in the form "hf_org_or_user/repo". |
Dataset Processing | Explanation |
---|---|
dataset_processes | The number of processes to use when preprocessing the dataset; defaults to the machine's CPU count. |
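The three preceding options often appear together in a config. An illustrative fragment; the Hub repo id is hypothetical:

```yaml
dataset_prepared_path: last_run_prepared          # cache tokenized data locally
push_dataset_to_hub: your-username/prepared-data  # hypothetical Hub repo
dataset_processes: 8                              # parallel preprocessing workers
```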
Push Checkpoints to Hub | Explanation |
---|---|
hub_model_id | The Hugging Face Hub repo to push trained model checkpoints to. |
hub_strategy | The strategy that controls when checkpoints are pushed to the Hub, e.g. "end", "every_save", "checkpoint", or "all_checkpoints". |
Authentication Token | Explanation |
---|---|
hf_use_auth_token | The boolean that enables use of a Hugging Face authentication token; required when pushing datasets or checkpoints to the Hub. |
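Checkpoint pushing and authentication, sketched together; the repo id is hypothetical:

```yaml
hub_model_id: your-username/your-finetuned-model  # hypothetical Hub repo
hub_strategy: checkpoint
hf_use_auth_token: true   # needed to authenticate pushes to the Hub
```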
Validation Set Size | Explanation |
---|---|
val_set_size | The portion of the dataset to hold out for evaluation, given either as a fraction (e.g. 0.05) or an absolute number of samples. |
Dataset Sharding | Explanation |
---|---|
dataset_shard_num | The number of shards to split the dataset into; used together with a shard index to train on a subset of the data. |
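Sharding pairs naturally with a shard index. A hedged example, assuming axolotl's companion `dataset_shard_idx` option selects which shard to train on:

```yaml
val_set_size: 0.05      # hold out 5% of the data for evaluation
dataset_shard_num: 4    # split the dataset into 4 shards
dataset_shard_idx: 0    # train on the first shard
```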