Download the dataset
Once the model has been downloaded, the next step is to download the dataset.
Before doing this, we will explain how datasets work on Hugging Face and then give specific instructions for downloading a dataset into the Axolotl platform.
The Hugging Face Hub has numerous datasets
Uploading datasets for future use
Download a Hugging Face dataset
Downloading datasets from the Hugging Face Hub can be accomplished through several methods.
We will be using the 'git clone' method:
Using Git
Every dataset on the Hub is stored as a Git repository, so you can clone it directly to your local machine.
This method is particularly useful for large datasets or when you need the entire dataset repository.
Before cloning, make sure Git Large File Storage (LFS) is installed, then set it up with:
git lfs install
Then clone the dataset using the command:
git clone git@hf.co:datasets/<dataset ID>
Replace <dataset ID> with the ID of the dataset you wish to clone (e.g., git clone git@hf.co:datasets/allenai/c4).
If you have write access to the dataset repository, you can also commit and push revisions.
To push changes or access private repositories, add your SSH public key to your user settings on Hugging Face.
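The clone-and-push workflow described in this section can be wrapped in a small shell script. This is a minimal sketch, assuming git and git-lfs are already installed and your SSH public key is registered with Hugging Face. The function names hf_dataset_ssh_url, clone_hf_dataset and push_dataset_revision are hypothetical helpers introduced here for illustration; they are not part of Git, Git LFS or any Hugging Face tool.

```shell
#!/bin/sh
# Minimal sketch, not an official tool: hf_dataset_ssh_url, clone_hf_dataset
# and push_dataset_revision are hypothetical helper names. Assumes git and
# git-lfs are installed and an SSH key is registered with Hugging Face.

# Print the SSH clone URL for a dataset ID of the form "<owner>/<name>".
# Rejects empty IDs, leading or trailing slashes, extra path segments,
# spaces, and IDs with no owner part.
hf_dataset_ssh_url() {
  local id="$1"
  case "$id" in
    "" | /* | */ | */*/* | *" "*)
      echo "invalid dataset ID: '$id'" >&2
      return 1
      ;;
    */*)
      printf 'git@hf.co:datasets/%s\n' "$id"
      ;;
    *)
      echo "invalid dataset ID: '$id'" >&2
      return 1
      ;;
  esac
}

# Set up the Git LFS hooks, then clone the dataset repository.
# An optional second argument names the target directory.
clone_hf_dataset() {
  local id="$1" dest="${2:-}" url
  url="$(hf_dataset_ssh_url "$id")" || return 1
  git lfs install || return 1
  if [ -n "$dest" ]; then
    git clone "$url" "$dest"
  else
    git clone "$url"
  fi
}

# Stage every change in a cloned repository, commit it, and push the
# current branch to the "origin" remote (requires write access).
push_dataset_revision() {
  local repo_dir="$1" message="$2"
  git -C "$repo_dir" add -A || return 1
  git -C "$repo_dir" commit -q -m "$message" || return 1
  git -C "$repo_dir" push -q origin HEAD
}
```

For example, with a hypothetical dataset ID, clone_hf_dataset your-username/your-dataset would run git lfs install and clone the repository into ./your-dataset; after editing files there, push_dataset_revision your-dataset "Update README" would commit and push the change. Pushing only succeeds if you have write access, and you can check that your SSH key is recognised with ssh -T git@hf.co.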