Download cleaned Alpaca dataset
Test the engines
The last instruction entered was to git clone the alpaca-cleaned dataset to the local directory:
git clone https://huggingface.co/datasets/yahma/alpaca-cleaned
This command downloaded the 42MB JSON dataset from Hugging Face into the directory you created called datasets.
Within datasets, the cloned repository sits in a directory named alpaca-cleaned. The full path is:
your primary directory/axolotl/datasets/alpaca-cleaned
The screenshot below shows the contents of the alpaca-cleaned dataset. Note that it is in JSON format and that the training set is in Alpaca format:

What is Alpaca format?
When using instruction fine-tuning, there are various formats for the training set. The Alpaca format has become one of the 'standards' for the structure of a dataset.
Data Structure in alpaca_data.json
This dataset is formatted as a JSON file, where each entry is represented as a dictionary with the following key-value pairs:
Instruction (instruction):
Type: String (str)
Description: Specifies the task to be performed by the model.
Input (input):
Type: String (str), optional.
Description: Provides additional context or information needed to perform the task described in the instruction.
Example: If the instruction is "Summarize the following article", the input would be the text of the article.
Prevalence: In the original 52k Alpaca dataset, approximately 40% of the entries in the dataset include an input field.
Output (output):
Type: String (str)
Description: The response generated by the text-davinci-003 model, which represents the answer or completion of the task defined in the instruction.
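The three fields described above can be checked with a few lines of Python. This is a minimal sketch, not part of Axolotl: the sample entries, `is_valid_record`, and `has_input` are hypothetical names made up for illustration, and the sketch assumes the file's top level is a JSON list of such dictionaries.

```python
import json

# Hypothetical sample entries in Alpaca format (not taken from the real dataset).
SAMPLE_JSON = """
[
  {"instruction": "Summarize the following article",
   "input": "Alpacas are domesticated camelids from South America.",
   "output": "Alpacas are South American domesticated camelids."},
  {"instruction": "Name three primary colors.",
   "input": "",
   "output": "Red, blue, and yellow."}
]
"""

REQUIRED_KEYS = ("instruction", "input", "output")


def is_valid_record(record):
    """Return True if the record is a dict whose three fields are all strings."""
    return isinstance(record, dict) and all(
        isinstance(record.get(key), str) for key in REQUIRED_KEYS
    )


def has_input(record):
    """Return True if the optional input field carries non-whitespace text."""
    return bool(record.get("input", "").strip())


records = json.loads(SAMPLE_JSON)
valid = [r for r in records if is_valid_record(r)]
with_input = sum(has_input(r) for r in valid)
print(f"{len(valid)} valid records, {with_input} with a non-empty input")
```

To run the same check against the downloaded file, replace `json.loads(SAMPLE_JSON)` with `json.load` on an open file handle pointing at the JSON file inside your alpaca-cleaned directory.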
Fine-Tuning Prompts for Alpaca Model
Two distinct prompt structures were used in the fine-tuning process, depending on whether the input field is present or not.
For Entries with Non-Empty Input Field:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
For Entries with Empty Input Field:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
For a full review of the different types of dataset techniques and structures used in Axolotl, please visit datasets.
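Choosing between the two prompt templates above can be sketched in Python as follows. `build_prompt` and the two constants are hypothetical names used only for illustration, not Axolotl's API, and the blank lines between sections are an assumption about spacing. Axolotl builds its prompts internally, so treat this as a way to see what the model receives rather than as the actual implementation.

```python
# Template for entries whose input field is non-empty.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:"
)

# Template for entries whose input field is empty.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def build_prompt(record):
    """Pick the template based on whether the record has a non-empty input."""
    if record.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# Hypothetical record used only to show the result.
example = {
    "instruction": "Summarize the following article",
    "input": "Alpacas are domesticated camelids from South America.",
    "output": "Alpacas are South American domesticated camelids.",
}
print(build_prompt(example))
```

During fine-tuning, the output field is the text the model learns to produce after the final "### Response:" line.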

