Please click on the link below for a comprehensive overview of uploading models to the Hugging Face Hub:
Repositories
Models, Spaces, and Datasets are hosted on the Hugging Face Hub as Git repositories, which means that version control and collaboration are core elements of the Hub.
In a nutshell, a repository (also known as a repo) is a place where code and assets can be stored to back up your work, share it with the community, and work in a team.
Creating a repository
Using the Hub’s web interface, you can easily create repositories, add files, and explore models.
There are three kinds of repositories on the Hub, and in this guide you’ll be creating a model repository for demonstration purposes.
Specify the owner of the repository: this can be either you or any of the organisations you’re affiliated with.
Then enter your model’s name, which will also be the name of the repository, and specify whether you want your model to be public or private.
Specify the license. You can leave the License field blank for now. To learn about licenses, visit the Licenses documentation.
After creating your model repository, you are taken to the new repository’s page.
Note that the Hub prompts you to create a Model Card, which you can learn about in the Model Cards documentation.
Including a Model Card in your model repo is best practice, but since we’re only making a test repo at the moment we can skip this.
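The same repository can also be created from Python rather than through the web interface. The following is a minimal sketch, assuming the huggingface_hub library is installed and you have already authenticated (for example with its login() function). The helper names build_repo_id and create_model_repo, and the owner and model names in the usage comment, are hypothetical and chosen only for illustration.

```python
def build_repo_id(owner: str, model_name: str) -> str:
    """Join owner and model name into the "owner/name" form used by the Hub.

    This is only a simple sanity check, not the Hub's full naming rules.
    """
    owner, model_name = owner.strip(), model_name.strip()
    if not owner or not model_name:
        raise ValueError("owner and model_name must be non-empty")
    if "/" in owner or "/" in model_name or " " in model_name:
        raise ValueError("names must not contain '/' or spaces")
    return f"{owner}/{model_name}"


def create_model_repo(owner: str, model_name: str, private: bool = True):
    """Create a model repository on the Hub (needs network access and a token)."""
    # Imported here so build_repo_id works without the library installed.
    from huggingface_hub import create_repo

    return create_repo(
        repo_id=build_repo_id(owner, model_name),
        repo_type="model",  # this guide creates a model repository
        private=private,    # the public/private choice from the steps above
        exist_ok=True,      # do not fail if the repository already exists
    )


# Hypothetical usage (not executed here because it contacts the Hub):
# create_model_repo("my-username", "my-test-model", private=True)
```

Keeping the identifier check separate from the network call lets you validate the owner and model name before anything is sent to the Hub.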
Importance of completing a comprehensive model card
A model card is an essential document that accompanies a model, providing crucial information about the model's characteristics, intended uses, limitations, and performance.
Properly filling out a model card is vital for ensuring transparency, reproducibility, and responsible usage of the model.
Here's a step-by-step guide on how to fill out a model card, along with explanations of why each piece is important:
Model Description
Provide a clear and concise description of your model, including its architecture, purpose, and key features.
Explain the problem domain and the specific task(s) the model is designed to address.
Specify the model's version and any relevant identifying information.
Importance: A clear model description helps users understand what the model does and whether it is suitable for their specific use case. It also aids in model discoverability and comparison.
Intended Uses & Limitations
Clearly state the intended uses of the model and the scenarios for which it was designed.
Discuss any known limitations, biases, or ethical considerations associated with the model.
Provide guidance on the appropriate and inappropriate uses of the model.
Importance: Specifying intended uses and limitations ensures that the model is used responsibly and within its designed scope. It helps prevent misuse and potential harm arising from applying the model to unsuitable tasks or contexts.
Training Data
Describe the dataset(s) used to train the model, including their sources, size, and any preprocessing steps applied.
Provide information about the data distribution, such as class balance, demographic information, or any notable biases.
Discuss any data quality issues, noise, or limitations that may impact the model's performance.
Importance: Transparency about the training data allows users to assess the model's generalizability and potential biases. It enables reproducibility and helps users understand the model's strengths and weaknesses.
Training Procedure
Outline the training methodology, including the optimisation algorithm, hyperparameters, and any specific techniques used (e.g., transfer learning, data augmentation).
Specify the hardware and software environment used for training.
Provide information about the training duration, number of epochs, and any early stopping criteria.
Importance: Documenting the training procedure facilitates reproducibility and allows users to understand the factors that influenced the model's performance. It also enables comparisons with other models and aids in debugging and improvement efforts.
Evaluation Metrics & Results
Clearly define the evaluation metrics used to assess the model's performance, such as accuracy, precision, recall, or domain-specific metrics.
Present the model's performance results on held-out test data or benchmark datasets.
Provide confidence intervals, statistical significance, or other measures of uncertainty, if applicable.
Discuss any limitations or caveats in the evaluation process.
Importance: Evaluation metrics and results give users an objective assessment of the model's performance and help them determine if the model meets their requirements. They also allow comparative analysis with other models and inform decision-making.
Contact Information & Resources
Include contact information for the model's creators or maintainers for questions, feedback, or support.
Provide links to additional resources, such as research papers, documentation, or code repositories.
Specify the license under which the model is released and any associated terms of use.
Importance: Contact information and resources enable users to seek further information, report issues, or collaborate with the model's creators. Clear licensing information ensures proper attribution and sets expectations for model usage.
By thoroughly filling out each section of the model card, you provide users with a comprehensive understanding of your model's capabilities, limitations, and considerations. This transparency builds trust, promotes responsible usage, and enables users to make informed decisions when employing your model in their applications.
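One way to make sure no section is forgotten is to generate the card's skeleton from the list of sections above. The sketch below is an illustration only: it assumes the card is written as Markdown (model cards on the Hub live in the repository's README.md file), and the names MODEL_CARD_SECTIONS, render_model_card, and missing_sections are hypothetical helpers, not part of any library.

```python
# Section titles are taken from the guide above; the prompts are short reminders.
MODEL_CARD_SECTIONS = [
    ("Model Description", "Architecture, purpose, key features, task(s), version."),
    ("Intended Uses & Limitations", "Intended uses, known limitations, biases, misuse."),
    ("Training Data", "Datasets, sources, size, preprocessing, known biases."),
    ("Training Procedure", "Optimiser, hyperparameters, hardware, duration."),
    ("Evaluation Metrics & Results", "Metrics, test results, uncertainty, caveats."),
    ("Contact Information & Resources", "Maintainers, links, license."),
]


def missing_sections(answers: dict) -> list:
    """Return the titles of sections that have no text yet."""
    return [
        title
        for title, _prompt in MODEL_CARD_SECTIONS
        if not answers.get(title, "").strip()
    ]


def render_model_card(model_name: str, answers: dict) -> str:
    """Render a Markdown model card; unanswered sections get a TODO reminder."""
    lines = [f"# {model_name}", ""]
    for title, prompt in MODEL_CARD_SECTIONS:
        text = answers.get(title, "").strip() or f"TODO: {prompt}"
        lines.extend([f"## {title}", "", text, ""])
    return "\n".join(lines)


if __name__ == "__main__":
    draft = {"Model Description": "A toy image classifier used for testing."}
    print(render_model_card("my-test-model", draft))
    print("Still to write:", missing_sections(draft))
```

The output can be saved as README.md in the model repository and filled in section by section.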
Choose Repository Type: Decide whether you want to link the repository with an individual user or an organization. Organizations can collect models related to a company, community, or library.
Upload Methods
There are several ways to upload a model.
Git: Because model repos are Git repositories, you can use Git to push your model files to the Hub.
Follow the guide on Getting Started with Repositories for detailed instructions on using the git CLI to commit and push your models.
Custom PyTorch Model: If your model is a custom PyTorch model, use the huggingface_hub Python library, which enables capabilities like from_pretrained, push_to_hub, and automated download metrics.
Web Interface: Alternatively, you can upload models using the web interface. Visit huggingface.co/new and follow the steps to create a new model repository. You can upload files from your computer and leave a commit message to document your changes.
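As a programmatic counterpart to the upload methods above, a local folder can also be uploaded with the huggingface_hub library. This is a hedged sketch, assuming the library is installed, you are authenticated, and the target repository already exists. The helper names list_local_files and upload_model_folder, and the folder and repository names in the usage comment, are hypothetical.

```python
from pathlib import Path


def list_local_files(folder: str) -> list:
    """Return the relative paths of all files under folder, sorted.

    Useful as a preview of what an upload of the folder would contain.
    """
    root = Path(folder)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


def upload_model_folder(folder: str, repo_id: str, commit_message: str):
    """Upload every file in folder to a model repo as a single commit."""
    # Imported here so the preview helper works without the library installed.
    from huggingface_hub import HfApi

    api = HfApi()
    return api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="model",
        commit_message=commit_message,  # documents the change, as in the web UI
    )


# Hypothetical usage (not executed here because it contacts the Hub):
# print(list_local_files("./my-model"))
# upload_model_folder("./my-model", "my-username/my-test-model", "Add model files")
```

Listing the files first is a cheap way to catch stray checkpoints or caches before they end up in the repository history.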
Inspect Files and History: After uploading, you can inspect your repository, view recently added files, explore commits, and see the difference introduced by each commit.
Add Metadata: Enhance your model card by adding metadata such as the task type, the library used, language, datasets, metrics, license, and more. This makes your model easier to find and helps users understand it better.
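On the Hub, this metadata is stored as a YAML block at the top of the model card's README.md file, delimited by lines containing only three dashes. The sketch below renders such a block with plain Python so it has no dependencies. The function names render_front_matter and add_metadata are hypothetical, the values in EXAMPLE_METADATA are placeholders, and the renderer handles only plain strings and lists of strings, with no quoting or escaping.

```python
def render_front_matter(metadata: dict) -> str:
    """Render a flat metadata dict as a YAML block delimited by '---' lines.

    Values may be strings or lists of strings; nested data is out of scope.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, (list, tuple)):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def add_metadata(card_text: str, metadata: dict) -> str:
    """Prepend the metadata block to the body of a model card."""
    return render_front_matter(metadata) + "\n" + card_text


# Placeholder values for illustration; replace them with your model's details.
EXAMPLE_METADATA = {
    "license": "mit",
    "language": ["en"],
    "library_name": "pytorch",
    "pipeline_tag": "image-classification",
    "datasets": ["mnist"],
    "metrics": ["accuracy"],
}

if __name__ == "__main__":
    print(add_metadata("# my-test-model\n", EXAMPLE_METADATA))
```

For anything beyond simple values, a real YAML library or the model card utilities in huggingface_hub would be a safer choice than this hand-rolled renderer.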
Model Usage Examples:
For libraries with built-in support, like Transformers, use methods provided by the library to push/load from the Hub.
For custom PyTorch models, use the PyTorchModelHubMixin class from the huggingface_hub library: define the hyperparameters in a config dictionary, create a model class that takes the config, and use the save_pretrained, push_to_hub, and from_pretrained methods.
Ensure automated download metrics are available for your model, and consider adding a model card to provide additional information about your model.