Download cleaned Alpaca dataset
Test the engines
The last instruction entered was to git clone the alpaca-cleaned dataset to the local directory:
git clone https://huggingface.co/datasets/yahma/alpaca-cleaned
This command downloaded the 42 MB JSON dataset from Hugging Face into the datasets directory you created.
Within datasets, the cloned repository is located at alpaca-cleaned. The full path is:
your primary directory/axolotl/datasets/alpaca-cleaned
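To confirm that the download worked, the file can be opened with a few lines of Python. The following is a minimal sketch, not part of the Axolotl tooling: it assumes the clone contains a single JSON file holding a list of records, and the file name alpaca_data_cleaned.json and the helper load_records are assumptions to check against your own clone.

```python
import json
from pathlib import Path

# Assumed location and file name; adjust both to match your own clone.
DATASET_FILE = Path("axolotl/datasets/alpaca-cleaned/alpaca_data_cleaned.json")


def load_records(path):
    """Read a JSON file that holds a list of Alpaca-style records."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of records")
    return records


if __name__ == "__main__":
    if DATASET_FILE.exists():
        records = load_records(DATASET_FILE)
        print(f"{len(records)} records loaded")
        # Show the first record so the three fields are visible.
        print(json.dumps(records[0], indent=2))
    else:
        print(f"{DATASET_FILE} not found; check the path to your clone")
```

If the path does not match your setup, the script only prints a message and does not raise an error.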
The screenshot below shows the contents of the alpaca-cleaned dataset. Note that it is in JSON format and that the training set is in Alpaca format:

What is Alpaca format?
When using instruction fine-tuning, there are various formats for the training set. The Alpaca format has become one of the 'standards' for the structure of a dataset.
Data Structure in alpaca_data.json
This dataset is formatted as a JSON file, where each entry is represented as a dictionary with the following key-value pairs:
Instruction (instruction):
Type: String (str)
Description: Specifies the task to be performed by the model.
Input (input):
Type: String (str), optional
Description: Provides additional context or information needed to perform the task described in the instruction.
Example: If the instruction is "Summarize the following article", the input would be the text of the article.
Prevalence: In the original 52k Alpaca dataset, approximately 40% of the entries include an input field.
Output (output):
Type: String (str)
Description: The response generated by the text-davinci-003 model, which represents the answer or completion of the task defined in the instruction.
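As a rough illustration of the structure described above, the sketch below checks that a record has the expected fields and measures how many records carry a non-empty input. The helper names is_valid_record and input_fraction are hypothetical, the sample records are made up for illustration, and treating a missing input key as equivalent to an empty string is an assumption.

```python
def is_valid_record(record):
    """Return True if the record looks like an Alpaca-style entry.

    Assumption: instruction and output must be strings; input may be
    absent, but if present it must also be a string.
    """
    if not isinstance(record, dict):
        return False
    if not all(isinstance(record.get(key), str) for key in ("instruction", "output")):
        return False
    return isinstance(record.get("input", ""), str)


def input_fraction(records):
    """Fraction of records whose input field is non-empty."""
    if not records:
        return 0.0
    with_input = sum(1 for r in records if (r.get("input") or "").strip())
    return with_input / len(records)


# Made-up sample records, not taken from the dataset.
sample = [
    {
        "instruction": "Summarize the following article",
        "input": "Article text goes here.",
        "output": "A short summary.",
    },
    {
        "instruction": "Name three primary colors.",
        "input": "",
        "output": "Red, blue, and yellow.",
    },
]

print(all(is_valid_record(r) for r in sample))
print(input_fraction(sample))
```

Running input_fraction over the full list of records is one way to compare your copy of the data against the roughly 40% figure quoted for the original dataset.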
Fine-Tuning Prompts for Alpaca Model
Two distinct prompt structures were used in the fine-tuning process, depending on whether the input field is present or not.
For Entries with Non-Empty Input Field:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
For Entries with Empty Input Field:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
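The choice between the two templates can be sketched in Python as follows. This is an illustrative helper, not Axolotl's own implementation: the function name build_prompt is hypothetical, and the blank lines between sections are an assumption about spacing.

```python
# Template text copied from the two prompt structures above; the blank
# lines between sections are an assumption about spacing.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(record):
    """Pick the template based on whether the record has a non-empty input."""
    if (record.get("input") or "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# Made-up record for illustration.
example = {"instruction": "Name three primary colors.", "input": "", "output": ""}
print(build_prompt(example))
```

During fine-tuning, the record's output field is the text the model is trained to produce after the "### Response:" marker.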
For a full review of the different dataset formats and structures used in Axolotl, please visit the datasets page.