Use Git to download a dataset
You should be connected to the Hugging Face Hub and have Git installed and configured on the virtual machine. If you have not set this up yet, use the following steps for reference.
Install Git LFS
First, make sure that Git LFS is installed on your machine; Hugging Face repositories use it to store large files such as dataset files. If it is not installed, you can download and install it from the Git LFS website.
On most systems, you can install Git LFS using a package manager. For instance, on Ubuntu, you can use:
sudo apt-get install git-lfs
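If you are not sure whether Git LFS is already installed, a quick check saves an unnecessary package install. The sketch below assumes a POSIX shell; `have_cmd` is a hypothetical helper name, while `command -v` is a standard shell builtin and `git lfs version` is a standard Git LFS command.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd git-lfs; then
  # Already installed: print the installed version and skip apt-get.
  git lfs version
else
  echo "Git LFS not found. On Ubuntu, install it with:"
  echo "  sudo apt-get install git-lfs"
fi
```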
Initialise Git LFS
After installation, you need to set up Git LFS. In your terminal, run:
git lfs install
The output should be as follows:
Updated git hooks.
Git LFS initialized.
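`git lfs install` registers the LFS filter in your global Git configuration (entries named `filter.lfs.*`), so one way to confirm the step worked is to look for one of those entries. This is a sketch: `lfs_filters_configured` is a hypothetical helper, and checking `filter.lfs.process` specifically is simply a convenient choice among those entries.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the Git LFS filter is registered.
# With an argument, it checks that config file; without one, it checks
# your global Git configuration.
lfs_filters_configured() {
  if [ -n "$1" ]; then
    git config --file "$1" --get filter.lfs.process >/dev/null 2>&1
  else
    git config --global --get filter.lfs.process >/dev/null 2>&1
  fi
}

if lfs_filters_configured; then
  echo "Git LFS filters are configured."
else
  echo "Git LFS filters not found; run: git lfs install"
fi
```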
Navigate to the Hugging Face datasets page (https://huggingface.co/datasets) and search for the dataset you want to download. In this case we will download the 'alpaca-cleaned' dataset.
On the datasets page, type the name of the dataset into the 'Filter by name' input box. In this case, filter by 'alpaca-cleaned'.

After filtering by dataset name, you will see all the datasets that match that name. We will be downloading yahma/alpaca-cleaned.
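The clone URL follows directly from the dataset id shown on the Hub (namespace/name), and `git clone` names the new folder after the last part of that URL. The helpers below are hypothetical and only illustrate that mapping for `yahma/alpaca-cleaned`; they do not contact the Hub.

```shell
#!/bin/sh
# Hypothetical helper: build the clone URL for a Hub dataset id (namespace/name).
hf_dataset_url() {
  printf 'https://huggingface.co/datasets/%s\n' "$1"
}

# Hypothetical helper: git clone names the folder after the last path
# component of the URL, which is the dataset name part of the id.
clone_dir_name() {
  basename "$1"
}

dataset_id="yahma/alpaca-cleaned"
echo "Clone URL: $(hf_dataset_url "$dataset_id")"
echo "Folder:    $(clone_dir_name "$dataset_id")"
```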

Once you are in the dataset repository, click the button with three horizontal dots. This gives you the option of using git clone to download the dataset to your directory.

When you click the three horizontal dots, a dialog box appears showing the git clone command for the dataset. Follow the instructions below to clone the dataset into the axolotl environment.

Change into the root axolotl directory and run the following command:
git clone https://huggingface.co/datasets/yahma/alpaca-cleaned
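This command creates a folder called alpaca-cleaned in the current directory and downloads the specified Hugging Face dataset into it. With Git LFS installed and initialised, the large files are downloaded in full rather than as small pointer files.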
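If Git LFS was not set up when you cloned, the large files in the folder may be small LFS pointer files instead of the real data. Pointer files begin with the line `version https://git-lfs.github.com/spec/v1`, so a rough check is to look for that line. `is_lfs_pointer` and `check_clone` are hypothetical helpers, and the 1024-byte size cutoff is an assumption used only to avoid reading large data files into the shell.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file looks like a Git LFS pointer.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  # Pointers are tiny text files; skip anything larger (assumed cutoff)
  # so a large data file is never read into a shell variable.
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -lt 1024 ] || return 1
  first_line=$(head -n 1 "$1")
  [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

# Hypothetical helper: list pointer files in a cloned folder, ignoring .git.
check_clone() {
  dir="${1:-alpaca-cleaned}"
  find "$dir" -type f ! -path '*/.git/*' | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      echo "LFS pointer (not downloaded): $f"
    fi
  done
}
```

For example, run `check_clone alpaca-cleaned` from the axolotl directory. If it reports pointer files, running `git lfs pull` inside the `alpaca-cleaned` folder should fetch the actual file contents.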