# Huggingface Hub

{% hint style="warning" %}

#### <mark style="color:orange;">We will be connecting to the Huggingface Hub to allow models and datasets to be downloaded to the Axolotl directory.</mark>

{% endhint %}

### <mark style="color:blue;">What is the Huggingface Hub?</mark>

The Hugging Face Hub is a platform that hosts a collection of models and datasets. Hugging Face aims to be the GitHub of artificial intelligence and machine learning.

{% hint style="info" %}
In the development phase, we will use the Huggingface Hub to access models and datasets. When working with enterprise clients, proprietary datasets and models can instead be stored on encrypted servers that are not accessible to the public.
{% endhint %}

### <mark style="color:blue;">Signing Up for a Hugging Face Account</mark> <a href="#fb78" id="fb78"></a>

<mark style="color:green;">**Visit the Hugging Face Website:**</mark> Head to the Hugging Face website (<https://huggingface.co/>) to begin the account creation process.

<mark style="color:green;">**Click on “Sign Up”:**</mark> Locate the “Sign Up” button in the top-right corner of the homepage and click on it.

<mark style="color:green;">**Choose a Sign-Up Method:**</mark> Hugging Face offers multiple sign-up methods, including Google, GitHub, and email. Select your preferred method and follow the prompts to complete the registration.

<mark style="color:green;">**Verify Your Email (if applicable):**</mark> If you choose to sign up via email, verify your email address by clicking on the confirmation link sent to your inbox.

<mark style="color:green;">**Complete Your Profile:**</mark> Enhance your Hugging Face experience by completing your profile. Add a profile picture, a short bio, and any other details you’d like to share with the community.

### <mark style="color:blue;">Creating an Access Token</mark>

To get a Huggingface access token, follow these instructions:

{% embed url="<https://huggingface.co/docs/hub/en/security-tokens>" %}
Creating a Huggingface Access Token
{% endembed %}

### <mark style="color:blue;">Command Line Interface (CLI)</mark>

With a Huggingface account and an access token in place, you can now <mark style="color:yellow;">access the HuggingFace Hub through a command line interface</mark>.

The <mark style="color:yellow;">**`huggingface_hub`**</mark> Python package comes with a built-in CLI called <mark style="color:yellow;">**`huggingface-cli`**</mark>. This tool allows you to interact with the Hugging Face Hub directly from a terminal.&#x20;

<details>

<summary>Reference: <mark style="color:green;">Details on the HuggingFace Command Line</mark></summary>

The huggingface\_hub Python package comes with a <mark style="color:orange;">**built-in CLI**</mark> called huggingface-cli. This tool allows you to *<mark style="color:blue;">**interact with the Hugging Face Hub directly from a terminal**</mark>*.&#x20;

For example, you can log in to your account, create a repository, upload and download files, and more.

It also comes with handy features to configure your machine or manage your cache. In this guide, we will have a look at the main features of the CLI and how to use them.

<mark style="color:green;">**Core Functionalities**</mark>

* <mark style="color:blue;">**User Authentication**</mark><mark style="color:blue;">:</mark> Allows login/logout and displays the current user information (<mark style="color:yellow;">**`login`**</mark>, <mark style="color:yellow;">**`logout`**</mark>, <mark style="color:yellow;">**`whoami`**</mark>).
* <mark style="color:blue;">**Repository Management**</mark><mark style="color:blue;">:</mark> Enables creation and interaction with repositories on Hugging Face (<mark style="color:yellow;">**`repo`**</mark>).
* <mark style="color:blue;">**File Operations**</mark>: Supports uploading and downloading files, and managing large files on the Hub (<mark style="color:yellow;">**`upload`**</mark>, <mark style="color:yellow;">**`download`**</mark>, <mark style="color:yellow;">**`lfs-enable-largefiles`**</mark>, <mark style="color:yellow;">**`lfs-multipart-upload`**</mark>).
* <mark style="color:blue;">**Cache Management**</mark><mark style="color:blue;">:</mark> Provides commands to scan and delete cached files (<mark style="color:yellow;">**`scan-cache`**</mark>, <mark style="color:yellow;">**`delete-cache`**</mark>).

<mark style="color:green;">**Usage**</mark>

* The CLI supports various commands and options, which can be explored using the `--help` flag.
* To interact with the Hub, such as downloading private repos or uploading files, users need to authenticate using a User Access Token.
* The CLI also supports downloading specific files or entire repositories, filtering files with patterns, and specifying revisions or local directories for downloads.
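
As a rough sketch of the download options mentioned above, the wrapper below fetches only the files matching a pattern, from a given revision, into a local directory. The function name `download_subset` and its arguments are hypothetical and exist only for illustration. `--include`, `--revision` and `--local-dir` are options of `huggingface-cli download`, but it is worth confirming them with `huggingface-cli download --help` for the version you have installed.

```bash
# Hypothetical helper: download only files matching a glob pattern
# from the "main" revision of a repository into a local directory.
download_subset() {
  repo_id="$1"      # repository ID, e.g. "<user_or_org>/<repo_name>"
  pattern="$2"      # glob pattern, e.g. "*.json"
  target_dir="$3"   # directory where the files should be placed

  huggingface-cli download "$repo_id" \
    --include "$pattern" \
    --revision main \
    --local-dir "$target_dir"
}
```

It could be called as `download_subset "<user_or_org>/<repo_name>" "*.json" ./downloads`. Quoting the pattern matters, because otherwise the shell may expand it against local files before the CLI sees it.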

</details>

First, <mark style="color:yellow;">install the CLI</mark> together with the \[cli] extras, which add optional dependencies for an improved user experience:

```bash
pip3 install -U "huggingface_hub[cli]"
```

<details>

<summary>Reference: <mark style="color:green;">Python libraries installed with the Huggingface Hub</mark></summary>

The command you executed, <mark style="color:yellow;">`pip3 install -U "huggingface_hub[cli]"`</mark>, installs several Python libraries and their dependencies related to the Hugging Face Hub. Here's an explanation of the libraries downloaded:

<mark style="color:green;">**huggingface\_hub**</mark>

* This is the main library that provides access to the Hugging Face Hub, allowing you to interact with models, datasets, and repositories.
* It provides functionalities for uploading, downloading, and managing resources on the Hub.

<mark style="color:green;">**fsspec**</mark>

* fsspec is a Python library for managing filesystem-like abstractions. It is often used for handling remote and cloud-based file systems.
* In the context of the Hugging Face Hub, it likely helps in managing the storage and retrieval of model and dataset files.

<mark style="color:green;">**tqdm**</mark>

* tqdm is a popular library for adding progress bars to loops and other iterables.
* It's used in the CLI to display progress when uploading or downloading large files from the Hugging Face Hub.

<mark style="color:green;">**prompt-toolkit**</mark>

* prompt-toolkit is a library for building command-line interfaces (CLIs) with interactive features.
* It's used by the Hugging Face CLI for handling interactive prompts and user interactions.

<mark style="color:green;">**pfzy**</mark>

* pfzy is a Python library for fuzzy string matching and searching.
* It may be used in the CLI for fuzzy matching of commands or resources.

<mark style="color:green;">**InquirerPy**</mark>

* InquirerPy is a Python library for creating interactive command-line prompts with customisable menus and questions.
* It enhances the user experience when using the Hugging Face CLI by providing interactive prompts.

<mark style="color:green;">**wcwidth**</mark>

* wcwidth is a library for determining the printable width of characters when rendering text in a terminal.
* It helps ensure proper formatting and alignment of text in the CLI.

</details>

Verify that the <mark style="color:yellow;">CLI is correctly set up</mark> by running the following command:

```bash
huggingface-cli --help
```

You should see a list of available options and commands. If you encounter an error like "command not found: huggingface-cli," please refer to the installation guide for troubleshooting.
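
One common cause of a "command not found" error is that the directory where pip places executable scripts is not on your `PATH`. The check below is a minimal sketch under that assumption. The function name `check_hf_cli` is hypothetical, and `~/.local/bin` is only the usual location for user-level installs on Linux, so your system may differ.

```bash
# Hypothetical helper: report whether huggingface-cli can be found on PATH.
check_hf_cli() {
  if command -v huggingface-cli >/dev/null 2>&1; then
    # Print the full path of the executable that would be run.
    echo "found: $(command -v huggingface-cli)"
  else
    echo "not found: make sure pip's script directory (often ~/.local/bin) is on PATH"
    return 1
  fi
}
```

Run `check_hf_cli` after defining it. If it reports "not found", adding the script directory to `PATH` (for example with `export PATH="$HOME/.local/bin:$PATH"`) and opening a new terminal is a reasonable first thing to try.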

Now try this command:

```bash
huggingface-cli whoami 
```

When you are logged in to the HuggingFace Hub, this command prints your username and the organisations you are a part of on the Hub.

At this stage, <mark style="color:yellow;">we have not yet logged into the HuggingFace Hub</mark>, so the response will be:

```bash
Not logged in
```

<details>

<summary>Reference: <mark style="color:green;">Huggingface CLI Commands</mark></summary>

Use the `--help` option to get <mark style="color:yellow;">detailed information about a specific command.</mark> For example, to learn more about how to upload files using the CLI, run:

```bash
huggingface-cli upload --help
```

<mark style="color:green;">Examples</mark>

**Login to Your Hugging Face Account**

Use the following command to log in with your token obtained from huggingface.co/settings/tokens:

```bash
huggingface-cli login
```

<mark style="color:green;">**Upload Files to a Repository**</mark>

* To upload a file or folder to a repository on the Hub, use the `upload` command. Replace <mark style="color:yellow;">`<repository_name>`</mark> with the name of your repository and <mark style="color:yellow;">`<path_to_file_or_folder>`</mark> with the path to the file or folder you want to upload:

```bash
huggingface-cli upload <repository_name> <path_to_file_or_folder>
```

<mark style="color:green;">**Download Files from the Hub**</mark>

* Download files from the Hub using the `download` command. Specify the repository ID, optionally followed by the file(s) you want from it, and use `--local-dir` to set the destination path:

```bash
huggingface-cli download <repo_id> <filename> --local-dir <destination_path>
```

<mark style="color:green;">**Managing Repositories**</mark>

* You can interact with your repositories using the `repo` command. For example, to create a new repository, use:

```bash
huggingface-cli repo create <repository_name>
```

<mark style="color:green;">**Environment Information**</mark>

* To view information about your environment, use the `env` command:

```bash
huggingface-cli env
```

These examples should help you get started with the Hugging Face CLI for managing models, datasets, and repositories on the Hugging Face Hub.

</details>

{% hint style="info" %}
We will now be connecting to the Huggingface Hub to allow models and datasets to be downloaded to the Axolotl directory.
{% endhint %}

To connect to the Huggingface Hub, first <mark style="color:yellow;">prepare your machine to store the Huggingface token</mark>:

```bash
git config --global credential.helper store
```

This command tells Git to use its `store` credential helper, which saves your credentials unencrypted in a plain-text file on disk. That is a security risk, but it is acceptable during development.
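
To confirm which helper Git is actually configured to use, you can read the setting back. The snippet below is a small sketch and the function name `show_credential_helper` is hypothetical. With the `store` helper, Git writes credentials to `~/.git-credentials` by default.

```bash
# Hypothetical helper: print the credential helper set in the global Git config.
show_credential_helper() {
  # `git config --get` exits non-zero when the key is missing, so fall back to "".
  helper=$(git config --global --get credential.helper) || helper=""
  if [ -n "$helper" ]; then
    echo "credential.helper=$helper"
  else
    echo "credential.helper is not set"
  fi
}
```

After running the `git config` command above, calling `show_credential_helper` should report `store`.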

<details>

<summary><mark style="color:green;">Explanation of Git Credentials</mark></summary>

A "git credential" refers to the <mark style="color:blue;">authentication details used by Git to access repositories,</mark> especially those requiring user verification, such as private repositories or when pushing changes to remote repositories.&#x20;

These credentials can be usernames and passwords, personal access tokens, SSH keys, or other forms of identity verification.

#### <mark style="color:green;">Types of Git Credentials</mark>

<mark style="color:blue;">**Username and Password**</mark><mark style="color:blue;">:</mark> The simplest form, but not recommended for security reasons, especially for repositories over HTTPS.

<mark style="color:blue;">**Personal Access Tokens (PATs)**</mark><mark style="color:blue;">:</mark> More secure than passwords, these tokens are used especially when two-factor authentication (2FA) is enabled. GitHub, for instance, requires PATs for authenticating over HTTPS.

<mark style="color:blue;">**SSH Keys**</mark><mark style="color:blue;">:</mark> Secure and commonly used, SSH keys pair a private key (kept on your machine) with a public key (added to your Git server). They are a popular choice for authentication.

#### <mark style="color:green;">Storage of Git Credentials</mark>

<mark style="color:blue;">**Credential Cache**</mark><mark style="color:blue;">:</mark> Temporarily stores credentials in memory for a short period. This is more secure than storing them on disk but requires re-entry after the cache timeout.

<mark style="color:blue;">**Credential Store**</mark><mark style="color:blue;">:</mark> Saves credentials in plain text in a file on your computer. It's convenient but less secure since the file can be read by anyone with access to your system.

<mark style="color:blue;">**SSH Agent**</mark><mark style="color:blue;">:</mark> For SSH keys, an SSH agent can store your private keys and handle authentication.

<mark style="color:blue;">**System Keychain**</mark><mark style="color:blue;">:</mark> Some Git clients can store credentials in the system's keychain or credential manager, offering a balance of convenience and security.

<mark style="color:blue;">**Environment Variables**</mark><mark style="color:blue;">:</mark> Sometimes used for automation, credentials can be set as environment variables, but this method has significant security downsides.

#### <mark style="color:green;">Best Practices</mark>

<mark style="color:blue;">**Prefer Token/Key-based Authentication**</mark><mark style="color:blue;">:</mark> Use personal access tokens or SSH keys over passwords for better security.

<mark style="color:blue;">**Keep Software Updated**</mark><mark style="color:blue;">:</mark> Ensure your Git client and any related credential management tools are up-to-date to benefit from security patches.

<mark style="color:blue;">**Use SSH Keys Wisely**</mark><mark style="color:blue;">:</mark> Protect your SSH private keys with strong passphrases and use ssh-agent for managing them.

<mark style="color:blue;">**Limit Token Scopes and Lifetimes**</mark><mark style="color:blue;">:</mark> When creating personal access tokens, grant only the necessary permissions and set a reasonable expiration.

<mark style="color:blue;">**Securely Store Sensitive Information**</mark><mark style="color:blue;">:</mark> Avoid storing credentials in plaintext files. Use system keychains or encrypted storage whenever possible.

<mark style="color:blue;">**Be Cautious with Environment Variables**</mark><mark style="color:blue;">:</mark> Be mindful of security risks when using environment variables for credentials, especially in shared or public environments.

<mark style="color:blue;">**Regularly Review and Rotate Credentials**</mark><mark style="color:blue;">:</mark> Regularly review your access tokens and SSH keys, revoking and replacing them as necessary.

<mark style="color:blue;">**Use Two-Factor Authentication (2FA)**</mark><mark style="color:blue;">:</mark> Wherever possible, enable 2FA for your Git hosting service accounts for an additional layer of security.

</details>

### <mark style="color:blue;">Enter the command to login to the Huggingface Hub</mark>

```bash
huggingface-cli login
```

You will be asked to enter a Huggingface token as per the code block below:

<pre class="language-bash" data-full-width="false"><code class="lang-bash"> > huggingface-cli login

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: <a data-footnote-ref href="#user-content-fn-1">xxxxxxxxxxxxxxx</a>
</code></pre>

You will then be asked whether you want to add the Huggingface token as a git credential. <mark style="color:yellow;">Answer yes (y) to this question</mark>. The output should be as below:

```
Add token as git credential? (Y/n) Y
Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /home/jack/.cache/huggingface/token
Login successful
```
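
For scripted setups where an interactive prompt is inconvenient, the same login can be done in one step by passing the token on the command line. The sketch below assumes the token is held in an environment variable named `HF_LOGIN_TOKEN`. That variable name and the function name `hf_login_from_env` are hypothetical choices for this example. `--token` and `--add-to-git-credential` are options of `huggingface-cli login`; check `huggingface-cli login --help` if your version behaves differently.

```bash
# Hypothetical helper: log in without the interactive prompt, using a token
# read from the (assumed) HF_LOGIN_TOKEN environment variable.
hf_login_from_env() {
  if [ -z "${HF_LOGIN_TOKEN:-}" ]; then
    echo "HF_LOGIN_TOKEN is not set" >&2
    return 1
  fi
  # --add-to-git-credential answers the "Add token as git credential?" question with yes.
  huggingface-cli login --token "$HF_LOGIN_TOKEN" --add-to-git-credential
}
```

Avoid typing the token directly into the command, since it would then be kept in your shell history.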


Now check to ensure you are logged in:

```bash
huggingface-cli whoami 
```

Now that we have logged in to the HuggingFace Hub, the output should be:

```bash
<your name>
orgs:  <your organisation>
```

The output shows your username, followed by the organisations you belong to.
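
If you want a setup script to stop early when nobody is logged in, the `whoami` output can be checked programmatically. This is a minimal sketch that relies on the "Not logged in" message shown earlier on this page. The function name `hf_is_logged_in` is hypothetical, and the exact output format may vary between CLI versions.

```bash
# Hypothetical helper: succeed only if `huggingface-cli whoami` reports a user.
hf_is_logged_in() {
  who=$(huggingface-cli whoami 2>/dev/null) || return 1
  case "$who" in
    ""|"Not logged in"*)
      return 1
      ;;
    *)
      # The first line of the output is the username.
      echo "logged in as: $(printf '%s\n' "$who" | head -n 1)"
      ;;
  esac
}
```

A script could then use `hf_is_logged_in || exit 1` before attempting any downloads that need authentication.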

### <mark style="color:purple;">Congratulations!</mark>

#### <mark style="background-color:green;">You have established the Axolotl platform and connected to Huggingface.</mark>

[^1]: Enter your Huggingface token here

