Hugging Face Hub

Connect to the Hugging Face Hub for access to models and datasets

We will connect to the Hugging Face Hub so that models and datasets can be downloaded to the Axolotl directory.

What is the Hugging Face Hub?

The Hugging Face Hub is a platform that hosts a collection of models and datasets. Hugging Face aims to be the GitHub of artificial intelligence and machine learning.

In the development phase, we will use the Hugging Face Hub to access models and datasets. When working with enterprise clients, proprietary datasets and models can instead be stored on encrypted servers that are not accessible to the public.

Signing Up for a Hugging Face Account

Visit the Hugging Face Website: Head to the Hugging Face website (https://huggingface.co/) to begin the account creation process.

Click on “Sign Up”: Locate the “Sign Up” button in the top-right corner of the homepage and click on it.

Choose a Sign-Up Method: Hugging Face offers multiple sign-up methods, including Google, GitHub, and email. Select your preferred method and follow the prompts to complete the registration.

Verify Your Email (if applicable): If you choose to sign up via email, verify your email address by clicking on the confirmation link sent to your inbox.

Complete Your Profile: Enhance your Hugging Face experience by completing your profile. Add a profile picture, a short bio, and any other details you’d like to share with the community.

Creating an Access Token

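To get a Hugging Face access token, sign in and open the token settings page at https://huggingface.co/settings/tokens, then create a new token. A token with read permission is enough for downloading models and datasets.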

Command Line Interface (CLI)

With a Hugging Face account and an access token in place, you can now access the Hugging Face Hub from the command line.

The huggingface_hub Python package comes with a built-in CLI called huggingface-cli. This tool allows you to interact with the Hugging Face Hub directly from a terminal.

Reference: Details on the HuggingFace Command Line

For example, you can log in to your account, create a repository, and upload or download files.

It also comes with handy features to configure your machine or manage your cache. In this guide, we will have a look at the main features of the CLI and how to use them.

Core Functionalities

User Authentication: Allows login/logout and displays the current user information (login, logout, whoami).
Repository Management: Enables creation and interaction with repositories on Hugging Face (repo).
File Operations: Supports uploading, downloading files, and managing large files on the Hub (upload, download, lfs-enable-largefiles, lfs-multipart-upload).
Cache Management: Provides commands to scan and delete cached files (scan-cache, delete-cache).

Usage

The CLI supports various commands and options, which can be explored using the --help flag.
Some interactions with the Hub, such as downloading private repos or uploading files, require you to authenticate with a User Access Token.
The CLI also supports downloading specific files or entire repositories, filtering files with patterns, and specifying revisions or local directories for downloads.
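
As a rough illustration of what "filtering files with patterns" means, the sketch below applies glob-style include and exclude patterns to a list of file names. The file listing and the filter_files helper are hypothetical, and the Hub's own matching logic may differ in detail; on the command line this kind of filtering is exposed through the download command's --include and --exclude options.

```python
from fnmatch import fnmatch


def filter_files(paths, include=None, exclude=None):
    """Keep paths that match any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        # When include patterns are given, the path must match at least one.
        if include and not any(fnmatch(path, pat) for pat in include):
            continue
        # Any match against an exclude pattern drops the path.
        if exclude and any(fnmatch(path, pat) for pat in exclude):
            continue
        selected.append(path)
    return selected


# Hypothetical file listing for an illustrative repository.
repo_files = ["config.json", "model.safetensors", "pytorch_model.bin", "README.md"]

selected = filter_files(
    repo_files,
    include=["*.json", "*.safetensors"],
    exclude=["*.bin"],
)
print(selected)
```

Patterns of this kind are handy when a repository holds the same weights in several formats and you only want one of them.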

First, install the huggingface_hub package together with its [cli] extras, which add dependencies for an improved user experience:

pip3 install -U "huggingface_hub[cli]"

Reference: Python libraries installed with the Huggingface Hub

The command above, pip3 install -U "huggingface_hub[cli]", installs huggingface_hub together with several supporting Python libraries. Each is explained below:

huggingface_hub

This is the main library that provides access to the Hugging Face Hub, allowing you to interact with models, datasets, and repositories.
It provides functionalities for uploading, downloading, and managing resources on the Hub.

fsspec

fsspec is a Python library for managing filesystem-like abstractions. It is often used for handling remote and cloud-based file systems.
In the context of the Hugging Face Hub, it likely helps in managing the storage and retrieval of model and dataset files.

tqdm

tqdm is a popular library for adding progress bars to loops and other iterables.
It's used in the CLI to display progress when uploading or downloading large files from the Hugging Face Hub.

prompt-toolkit

prompt-toolkit is a library for building command-line interfaces (CLIs) with interactive features.
It's used by the Hugging Face CLI for handling interactive prompts and user interactions.

pfzy

pfzy is a Python library for fuzzy string matching and searching.
It may be used in the CLI for fuzzy matching of commands or resources.

InquirerPy

InquirerPy is a Python library for creating interactive command-line prompts with customisable menus and questions.
It enhances the user experience when using the Hugging Face CLI by providing interactive prompts.

wcwidth

wcwidth is a library for determining the printable width of characters when rendering text in a terminal.
It helps ensure proper formatting and alignment of text in the CLI.

Verify that the CLI is correctly set up by running the following command:

huggingface-cli --help

You should see a list of available options and commands. If you encounter an error like "command not found: huggingface-cli," please refer to the installation guide for troubleshooting.
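
If you want to check for the "command not found" case from a script, a minimal sketch using only the Python standard library is shown below. The find_cli helper is a hypothetical name; the check only tells you whether an executable called huggingface-cli is on your PATH, not whether it works.

```python
import shutil


def find_cli(name="huggingface-cli"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


cli_path = find_cli()
if cli_path is None:
    # Typical causes: the package is not installed, or the environment's
    # scripts directory is not on PATH.
    print("huggingface-cli was not found on PATH")
else:
    print(f"huggingface-cli found at {cli_path}")
```

If the command is missing, a common cause is that the package was installed into a different Python environment from the one that is currently active.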

Now try this command:

huggingface-cli whoami

When logged into the HuggingFace Hub this command prints your username and the organisations you are a part of on the Hub.

At this stage, we have not yet logged into the HuggingFace Hub, so the response will be:

Not logged in

Reference: Huggingface CLI Commands

Use the --help option to get detailed information about a specific command. For example, to learn more about how to upload files using the CLI, run:

huggingface-cli upload --help

Examples

Login to Your Hugging Face Account

Use the following command to log in with your token obtained from huggingface.co/settings/tokens:

huggingface-cli login

Upload Files to a Repository

To upload a file or folder to a repository on the Hub, use the upload command. Replace <repository_name> with the name of your repository and <path_to_file_or_folder> with the path to the file or folder you want to upload:

huggingface-cli upload <repository_name> <path_to_file_or_folder>

Download Files from the Hub

Download files from the Hub using the download command. Specify the repository ID and, optionally, the file you want; use the --local-dir option to set the destination path:

huggingface-cli download <repo_id> <filename> --local-dir <destination_path>

Managing Repositories

You can interact with your repositories using the repo command. For example, to create a new repository, use:

huggingface-cli repo create <repository_name>

Environment Information

To view information about your environment, use the env command:

huggingface-cli env

These examples will help you get started with the Hugging Face CLI for managing models, datasets, and repositories on the Hugging Face Hub.

We will now connect to the Hugging Face Hub so that models and datasets can be downloaded to the Axolotl directory.

Before connecting, configure Git on your machine so that it can store the Hugging Face token:

git config --global credential.helper store

This command tells Git to use the "store" credential helper, which saves your credentials in plain text on disk. That is a security risk, but it is acceptable during development.

Explanation of Git Credentials

A "git credential" refers to the authentication details used by Git to access repositories, especially those requiring user verification, such as private repositories or when pushing changes to remote repositories.

These credentials can be usernames and passwords, personal access tokens, SSH keys, or other forms of identity verification.

Types of Git Credentials

Username and Password: The simplest form, but not recommended for security reasons, especially for repositories over HTTPS.

Personal Access Tokens (PATs): More secure than passwords, these tokens are used especially when two-factor authentication (2FA) is enabled. GitHub, for instance, requires PATs for authenticating over HTTPS.

SSH Keys: Secure and commonly used, SSH keys pair a private key (kept on your machine) with a public key (added to your Git server). They are a popular choice for authentication.

Storage of Git Credentials

Credential Cache: Temporarily stores credentials in memory for a short period. This is more secure than storing them on disk but requires re-entry after the cache timeout.

Credential Store: Saves credentials in plain text in a file on your computer. It's convenient but less secure since the file can be read by anyone with access to your system.

SSH Agent: For SSH keys, an SSH agent can store your private keys and handle authentication.

System Keychain: Some Git clients can store credentials in the system's keychain or credential manager, offering a balance of convenience and security.

Environment Variables: Sometimes used for automation, credentials can be set as environment variables, but this method has significant security downsides.
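
To make the plain-text nature of the Credential Store option concrete, the sketch below parses one line in the format that Git's store helper writes (by default to a file named .git-credentials in your home directory). The sample line, user name, and token are made up for illustration, and parse_credential_line is a hypothetical helper; real entries may contain percent-encoded characters that this sketch does not decode.

```python
from urllib.parse import urlsplit


def parse_credential_line(line):
    """Split one stored credential URL into its readable parts."""
    parts = urlsplit(line.strip())
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "username": parts.username,
        "password": parts.password,
    }


# Made-up example entry; a real file holds one such URL per line.
sample = "https://user:hf_exampleTokenNotReal@huggingface.co"

entry = parse_credential_line(sample)
print(entry["host"], entry["username"])
```

Anyone who can read the file can recover the token in the same way, which is why this helper is best kept to development machines.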

Best Practices

Prefer Token/Key-based Authentication: Use personal access tokens or SSH keys over passwords for better security.

Keep Software Updated: Ensure your Git client and any related credential management tools are up-to-date to benefit from security patches.

Use SSH Keys Wisely: Protect your SSH private keys with strong passphrases and use ssh-agent for managing them.

Limit Token Scopes and Lifetimes: When creating personal access tokens, grant only the necessary permissions and set a reasonable expiration.

Securely Store Sensitive Information: Avoid storing credentials in plaintext files. Use system keychains or encrypted storage whenever possible.

Be Cautious with Environment Variables: Be mindful of security risks when using environment variables for credentials, especially in shared or public environments.

Regularly Review and Rotate Credentials: Regularly review your access tokens and SSH keys, revoking and replacing them as necessary.

Use Two-Factor Authentication (2FA): Wherever possible, enable 2FA for your Git hosting service accounts for an additional layer of security.
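
As one way to act on the advice about plaintext credential files, the sketch below checks whether a file is readable only by its owner. It assumes a POSIX system (Linux or macOS) and uses a temporary file in place of a real credentials file; is_private is a hypothetical helper name.

```python
import os
import stat
import tempfile


def is_private(path):
    """True if the file grants no permissions to group or others (POSIX)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


# Use a throwaway file rather than touching real credentials.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o644)  # readable by group and others
loose = is_private(demo_path)

os.chmod(demo_path, 0o600)  # owner read/write only
tight = is_private(demo_path)

os.remove(demo_path)
print(loose, tight)
```

Restricting permissions does not encrypt the file, so a system keychain or encrypted storage remains the safer option.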

Enter the command to login to the Hugging Face Hub

huggingface-cli login

You will be asked to enter a Hugging Face token, as shown in the output below:

 > huggingface-cli login

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token:

You will then be asked whether you want to add the Hugging Face token as a git credential. Answer yes (Y) to this question. The output should look like this:

Add token as git credential? (Y/n) Y
Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /home/jack/.cache/huggingface/token
Login successful
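
The token file location in the output above follows the library's default cache layout. The sketch below shows how that path is typically resolved, assuming the HF_TOKEN_PATH and HF_HOME environment variables behave as I understand them from the huggingface_hub documentation; resolve_token_path is a hypothetical helper and other settings may also influence the real location.

```python
import os
from pathlib import Path


def resolve_token_path(env=None):
    """Best-effort guess at where the login token is stored."""
    env = os.environ if env is None else env
    # An explicit token path takes priority (assumed behaviour).
    if env.get("HF_TOKEN_PATH"):
        return Path(env["HF_TOKEN_PATH"])
    # Otherwise the token lives under the Hugging Face home directory.
    if env.get("HF_HOME"):
        hf_home = Path(env["HF_HOME"])
    else:
        hf_home = Path.home() / ".cache" / "huggingface"
    return hf_home / "token"


print(resolve_token_path())
```

On a default Linux setup this resolves to a path of the same form as the one printed by the login command.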

Now check to ensure you are logged in:

huggingface-cli whoami

Now that we have logged in to the Hugging Face Hub, the output should be:

<your name>
orgs:  <your organisation>

The output shows your username and the organisations you belong to.
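
If a script needs to check login status, the whoami output can be parsed. The sketch below is a minimal example that assumes the plain-text format shown on this page (a user name, then an optional "orgs:" line with comma-separated names, or "Not logged in"); the exact output may vary between CLI versions and may include colour codes in a terminal. The names parse_whoami, jack, and example-org are illustrative only.

```python
def parse_whoami(output):
    """Return None when logged out, else a dict with user and orgs."""
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    if not lines or lines[0] == "Not logged in":
        return None
    info = {"user": lines[0], "orgs": []}
    for line in lines[1:]:
        if line.startswith("orgs:"):
            names = line[len("orgs:"):].split(",")
            info["orgs"] = [name.strip() for name in names if name.strip()]
    return info


logged_out = parse_whoami("Not logged in")
logged_in = parse_whoami("jack\norgs:  example-org")
print(logged_out)
print(logged_in)
```

In practice the input string would come from running huggingface-cli whoami and capturing its standard output.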

Congratulations!

You have set up the Axolotl platform and connected it to the Hugging Face Hub.
