# Phi 2.0 details

To introduce you to the process of fine-tuning a language model, we will begin with Phi 2.0.

Phi 2.0 is a <mark style="color:yellow;">small but powerful model</mark>, and a great way to begin learning how to fine-tune a language model.

### <mark style="color:blue;">Phi 2.0 Review</mark>

Phi 2 is a <mark style="color:yellow;">relatively small model with 2.7 billion parameters,</mark> yet according to published benchmark results it outperforms models of comparable size, such as Mamba and Google's Gemini Nano, as well as models 20 to 25 times its size.

Phi 2 was <mark style="color:yellow;">trained on high-quality synthetic data,</mark> including textbook-quality code, common-sense reasoning, logic, science, and theory-of-mind exercises generated by GPT-3.5, with data filtered for quality by GPT-4. Because this data was so carefully curated, the model could be trained for multiple passes (epochs) over it.

Training on <mark style="color:yellow;">synthetic data tends to result in less toxic models,</mark> as suggested by Phi 2's lower toxicity scores, which it achieved even without reinforcement learning from human feedback (RLHF).

The Phi 2 researchers believe that enormous amounts of compute have been wasted on ineffective training data, and that carefully curated synthetic data can lead to more efficient and higher-quality models.

Phi 2's performance suggests that ChatGPT-level capabilities might be achievable with a model of around 1 billion parameters. Extrapolating much further, and far more speculatively, a 1.5 trillion parameter model trained this way could potentially perform like a 1.5 quadrillion parameter model.

However, Phi models are sensitive to prompt variations, and longer prompts may cause the model to forget, ignore, or misinterpret parts of the prompt.

The Phi 2 model itself is open-sourced, although the full training dataset has not been released yet.

The key takeaway is that Phi 2 demonstrates the potential of high-quality synthetic data for training smaller, more efficient models that can rival much larger ones, an approach that could lead to significant advances in AI capabilities in the near future. The model's sensitivity to prompts, however, is a limitation to keep in mind.

#### <mark style="color:blue;">Click the link below to review Phi 2.0 in the Hugging Face model repository</mark>

{% embed url="<https://huggingface.co/microsoft/phi-2>" %}
Hugging Face model repository: Phi 2.0
{% endembed %}

The expandable sections below describe the files that come with the Phi 2.0 model and what they contain.

<details>

<summary><mark style="color:green;">Files and Versions - Explanation of Contents</mark></summary>

Here's an explanation of each file listed under the "Files and versions" tab of the Hugging Face model repository:

<mark style="color:yellow;">`.gitattributes`</mark><mark style="color:yellow;">:</mark> This file defines attributes for different file types in the Git repository, such as whether they should be normalized or how line endings should be handled. On Hugging Face, it mainly marks which large files (such as the `.safetensors` weights) are stored with Git LFS (Large File Storage).

<mark style="color:yellow;">`LICENSE`</mark><mark style="color:yellow;">:</mark> This file contains the license under which the model is distributed. It specifies the terms and conditions for using, modifying, and distributing the model.

<mark style="color:yellow;">`README.md`</mark><mark style="color:yellow;">:</mark> This file provides an overview of the model, including its purpose, usage instructions, and any other relevant information. It serves as the main documentation for the model.

<mark style="color:yellow;">`added_tokens.json`</mark>: This file contains information about any additional tokens that have been added to the model's vocabulary beyond the standard tokens.

<mark style="color:yellow;">`config.json`</mark><mark style="color:yellow;">:</mark> This file holds the configuration settings for the model, such as the model architecture, hyperparameters, and other model-specific details.

<mark style="color:yellow;">`configuration_phi.py`</mark><mark style="color:yellow;">:</mark> This Python file defines the model's configuration class (`PhiConfig`), which is used to load and manage the model's configuration.

<mark style="color:yellow;">`generation_config.json`</mark><mark style="color:yellow;">:</mark> This file specifies the configuration settings for text generation using the model, such as the maximum sequence length, temperature, and other generation-related parameters.

<mark style="color:yellow;">`merges.txt`</mark><mark style="color:yellow;">:</mark> This file is part of the tokenizer and contains the byte pair encoding (BPE) merges used for tokenization.

<mark style="color:yellow;">`model-00001-of-00002.safetensors`</mark> and <mark style="color:yellow;">`model-00002-of-00002.safetensors`</mark><mark style="color:yellow;">:</mark> These files contain the serialized model weights, split into two shards.

The `.safetensors` format is a <mark style="color:blue;">memory-mappable format for storing tensors safely,</mark> without the arbitrary-code-execution risk of pickle-based checkpoints.

<mark style="color:yellow;">`model.safetensors.index.json`</mark><mark style="color:yellow;">:</mark> This file is an index for the sharded weights: it records their total size and maps each parameter name to the <mark style="color:yellow;">`.safetensors`</mark> file that contains it.

<mark style="color:yellow;">`modeling_phi.py`</mark><mark style="color:yellow;">:</mark> This Python file contains the implementation of the model architecture and its forward pass.

<mark style="color:yellow;">`special_tokens_map.json`</mark><mark style="color:yellow;">:</mark> This file maps special-token roles (such as `bos_token`, `eos_token`, and `unk_token`) to the token strings that fill those roles.

<mark style="color:yellow;">`tokenizer.json`</mark><mark style="color:yellow;">:</mark> This file contains the serialized tokenizer object, which is used to tokenize input text into a format suitable for the model.

<mark style="color:yellow;">`tokenizer_config.json`</mark><mark style="color:yellow;">:</mark> This file holds the configuration settings for the tokenizer, such as the tokenizer class, the maximum input length, and any special or added tokens.

<mark style="color:yellow;">`vocab.json`</mark><mark style="color:yellow;">:</mark> This file contains the vocabulary of the model, mapping tokens to their corresponding IDs.

These files collectively define the model architecture, weights, configuration, tokenizer, and other necessary components for using the model in downstream tasks or applications.
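
To make the `merges.txt` entry above more concrete, here is a toy sketch of how a byte pair encoding (BPE) tokenizer applies an ordered merge list to a single word. It assumes the GPT-2-style layout of `merges.txt` (an optional `#version` header, then one space-separated pair per line, highest priority first) and deliberately omits the byte-to-character mapping and pre-tokenization that real GPT-2-style tokenizers perform first. The function names and the three-line merge list are hypothetical, for illustration only.

```python
def parse_merges(text: str) -> list:
    """Parse merges.txt content: one space-separated pair per line, highest priority first."""
    merges = []
    for line in text.splitlines():
        if not line or line.startswith("#version"):
            continue  # skip blank lines and the optional version header
        left, right = line.split(" ")
        merges.append((left, right))
    return merges


def bpe(word: str, merges: list) -> list:
    """Apply BPE merges to one word, always merging the highest-priority (lowest-rank) pair next."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no adjacent pair has a learned merge
        merged, i = [], 0
        while i < len(symbols):
            # Merge every occurrence of the chosen pair in a single left-to-right pass.
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge list in merges.txt format.
TOY_MERGES = parse_merges("#version: 0.2\nl o\nlo w\ne r\n")
print(bpe("lower", TOY_MERGES))  # ['low', 'er']
```

Because each step merges the highest-priority adjacent pair, the line order in `merges.txt` directly determines how text is split into the tokens listed in `vocab.json`.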

</details>

<details>

<summary><mark style="color:green;">config.json contents and explanation</mark></summary>

The <mark style="color:yellow;">`config.json`</mark> file contains the configuration settings for the Transformer-based language model called "Phi".

1. <mark style="color:yellow;">`"_name_or_path": "microsoft/phi-2"`</mark><mark style="color:yellow;">:</mark> Specifies the name or path of the pre-trained model.
2. <mark style="color:yellow;">`"architectures": ["PhiForCausalLM"]`</mark><mark style="color:yellow;">:</mark> Indicates the architecture class used for the model, which is `PhiForCausalLM` (Phi model for causal language modeling).
3. <mark style="color:yellow;">`"auto_map": { ... }`</mark><mark style="color:yellow;">:</mark> Defines the mapping between the auto classes <mark style="color:yellow;">(`AutoConfig` and `AutoModelForCausalLM`)</mark> and their corresponding implementation classes (`PhiConfig` and `PhiForCausalLM`).
4. <mark style="color:yellow;">`"attention_dropout": 0.0`</mark><mark style="color:yellow;">:</mark> Sets the dropout probability for the attention layers. A value of 0.0 means no dropout is applied.
5. <mark style="color:yellow;">`"bos_token_id": 50256`</mark> and <mark style="color:yellow;">`"eos_token_id": 50256`</mark><mark style="color:yellow;">:</mark> Specifies the token IDs for the beginning-of-sequence (BOS) and end-of-sequence (EOS) tokens.
6. <mark style="color:yellow;">`"embd_pdrop": 0.0`</mark><mark style="color:yellow;">:</mark> Sets the dropout probability for the embedding layers.
7. <mark style="color:yellow;">`"hidden_act": "gelu_new"`</mark><mark style="color:yellow;">:</mark> Specifies the activation function used in the feed-forward (MLP) layers: `gelu_new`, the tanh approximation of the Gaussian Error Linear Unit (GELU) used in GPT-2.
8. <mark style="color:yellow;">`"hidden_size": 2560`</mark><mark style="color:yellow;">:</mark> Defines the dimensionality of the model's hidden states.
9. <mark style="color:yellow;">`"initializer_range": 0.02`</mark><mark style="color:yellow;">:</mark> Sets the standard deviation of the normal distribution used to initialize the model's weights.
10. <mark style="color:yellow;">`"intermediate_size": 10240`</mark><mark style="color:yellow;">:</mark> Specifies the dimensionality of the intermediate (feed-forward) layers.
11. <mark style="color:yellow;">`"layer_norm_eps": 1e-05`</mark><mark style="color:yellow;">:</mark> Sets the epsilon value for layer normalization to provide numerical stability.
12. <mark style="color:yellow;">`"max_position_embeddings": 2048`</mark><mark style="color:yellow;">:</mark> Defines the maximum sequence length that the model can handle.
13. <mark style="color:yellow;">`"model_type": "phi"`</mark><mark style="color:yellow;">:</mark> Indicates the type of the model, which is "phi".
14. <mark style="color:yellow;">`"num_attention_heads": 32`</mark><mark style="color:yellow;">:</mark> Specifies the number of attention heads in each attention layer.
15. <mark style="color:yellow;">`"num_hidden_layers": 32`</mark><mark style="color:yellow;">:</mark> Defines the number of hidden layers (Transformer blocks) in the model.
16. <mark style="color:yellow;">`"num_key_value_heads": 32`</mark><mark style="color:yellow;">:</mark> Specifies the number of key/value heads in each attention layer. Because it equals `num_attention_heads`, every query head has its own key and value head (standard multi-head attention rather than grouped-query attention).
17. <mark style="color:yellow;">`"partial_rotary_factor": 0.4`</mark><mark style="color:yellow;">:</mark> Defines the fraction of each attention head's dimensions to which rotary position embedding is applied (0.4 × 80 = 32 of the 80 dimensions per head).
18. <mark style="color:yellow;">`"qk_layernorm": false`</mark><mark style="color:yellow;">:</mark> Indicates whether layer normalization is applied to the query and key vectors in the attention mechanism; here it is not.
19. <mark style="color:yellow;">`"resid_pdrop": 0.1`</mark><mark style="color:yellow;">:</mark> Sets the dropout probability for the residual connections.
20. <mark style="color:yellow;">`"rope_scaling": null`</mark> and <mark style="color:yellow;">`"rope_theta": 10000.0`</mark><mark style="color:yellow;">:</mark> Configure RoPE (Rotary Position Embedding): `null` means no scaling (for example, for context-length extension) is applied, and `rope_theta` is the base frequency of the rotary embeddings.
21. <mark style="color:yellow;">`"tie_word_embeddings": false`</mark><mark style="color:yellow;">:</mark> Indicates whether the input and output word embeddings are tied (shared).
22. <mark style="color:yellow;">`"torch_dtype": "float16"`</mark><mark style="color:yellow;">:</mark> Specifies the data type used for the model's parameters (float16 for half-precision).
23. <mark style="color:yellow;">`"transformers_version": "4.37.0"`</mark><mark style="color:yellow;">:</mark> Indicates the version of the Transformers library used to save this configuration.
24. <mark style="color:yellow;">`"use_cache": true`</mark><mark style="color:yellow;">:</mark> Enables caching of the attention key and value tensors during inference, which speeds up autoregressive generation.
25. <mark style="color:yellow;">`"vocab_size": 51200`</mark><mark style="color:yellow;">:</mark> Defines the size of the model's vocabulary.
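
Several of these fields only make sense together. The sketch below derives the per-head dimensions they imply, assuming (as in the Hugging Face implementation of Phi) that `hidden_size` is split evenly across attention heads and that rotary embedding is applied to the leading dimensions of each head. The helper names are hypothetical, and the KV-cache estimate assumes a float16 cache (2 bytes per value) and a batch size of one.

```python
# Attention-related values copied from the Phi 2 config.json above.
PHI2_CONFIG = {
    "hidden_size": 2560,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,
    "num_hidden_layers": 32,
    "partial_rotary_factor": 0.4,
    "max_position_embeddings": 2048,
}


def attention_geometry(cfg: dict) -> dict:
    """Derive per-head sizes implied by the attention fields (hypothetical helper)."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    # Assumption: only the first rotary_dim dimensions of each query/key head are rotated by RoPE.
    rotary_dim = int(head_dim * cfg["partial_rotary_factor"])
    # Query heads sharing one key/value head; 1 means standard multi-head attention.
    queries_per_kv_head = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
    return {
        "head_dim": head_dim,
        "rotary_dim": rotary_dim,
        "queries_per_kv_head": queries_per_kv_head,
    }


def kv_cache_bytes(cfg: dict, seq_len: int, bytes_per_value: int = 2, batch_size: int = 1) -> int:
    """Estimate cache size: one K and one V vector per layer, key/value head, and position."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    per_token = 2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"] * head_dim
    return per_token * seq_len * batch_size * bytes_per_value


print(attention_geometry(PHI2_CONFIG))
# {'head_dim': 80, 'rotary_dim': 32, 'queries_per_kv_head': 1}
print(kv_cache_bytes(PHI2_CONFIG, PHI2_CONFIG["max_position_embeddings"]) / 2**20)  # 640.0 (MiB)
```

Under these assumptions, with `use_cache` enabled a full 2,048-token context costs about 640 MiB of key/value cache per sequence, on top of the weights themselves.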

These configuration settings determine the architecture, hyperparameters, and behavior of the Phi model. They are used to initialize and configure the model during training and inference.
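
The architecture fields above are enough to estimate the parameter count. This sketch assumes the layer layout of the Hugging Face `PhiForCausalLM` implementation: in each block, one input LayerNorm, biased query/key/value projections plus an output `dense` projection, and a two-layer MLP (`fc1`, `fc2`); after the blocks, a final LayerNorm and, since `tie_word_embeddings` is false, a separate biased output layer (`lm_head`). The helper names are hypothetical.

```python
# Architecture fields copied from the Phi 2 config.json above.
CFG = {
    "vocab_size": 51200,
    "hidden_size": 2560,
    "intermediate_size": 10240,
    "num_hidden_layers": 32,
    "tie_word_embeddings": False,
}


def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def count_phi2_params(cfg: dict) -> int:
    """Estimate total parameters under the assumed Phi block layout."""
    d, ff, v = cfg["hidden_size"], cfg["intermediate_size"], cfg["vocab_size"]
    layer_norm = 2 * d  # LayerNorm weight + bias
    attention = 4 * linear_params(d, d)  # q, k, v projections plus the output "dense" layer
    mlp = linear_params(d, ff) + linear_params(ff, d)  # fc1 and fc2
    per_layer = layer_norm + attention + mlp  # one input_layernorm per block
    embeddings = v * d  # token embedding table, no bias
    final_norm = 2 * d
    lm_head = 0 if cfg["tie_word_embeddings"] else linear_params(d, v)
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


n_params = count_phi2_params(CFG)
print(f"{n_params:,}")  # 2,779,683,840
print(n_params * 2)  # 5559367680 bytes at 2 bytes per float16 parameter
```

Multiplying by 2 bytes per `float16` value gives 5,559,367,680 bytes, exactly the total size recorded in `model.safetensors.index.json`, which suggests the assumed layout matches the shipped weights.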

</details>

<details>

<summary><mark style="color:green;">model.safetensors.index.json</mark></summary>

The <mark style="color:yellow;">`model.safetensors.index.json`</mark> file is an index file that *<mark style="color:yellow;">**maps the name of each model parameter to the `.safetensors` file that stores it.**</mark>*

In this case, the model's parameters are stored in two separate `.safetensors` files: `model-00001-of-00002.safetensors` and `model-00002-of-00002.safetensors`.

The index file helps the model loading process identify where each parameter is located.

Let's break it down:

1. The <mark style="color:yellow;">`metadata`</mark> field contains information about the total size of the model parameters in bytes (5,559,367,680 bytes, which is <mark style="color:yellow;">approximately 5.6 GB</mark>).
2. The <mark style="color:yellow;">`weight_map`</mark> field is a dictionary where each key represents the name of a model parameter, and the corresponding value indicates the file in which that parameter is stored.

For example, the entry <mark style="color:yellow;">`"model.embed_tokens.weight": "model-00001-of-00002.safetensors"`</mark> means that the <mark style="color:yellow;">`model.embed_tokens.weight`</mark> parameter is stored in the <mark style="color:yellow;">`model-00001-of-00002.safetensors`</mark> file.
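
Because the index is plain JSON, finding the shard that holds a parameter is a dictionary lookup. The sketch below uses a hypothetical excerpt containing only the entry and total size quoted above; with the real file you would load the full index from disk with `load_index`, a hypothetical helper wrapping `json.load`.

```python
import json
from collections import defaultdict

# Hypothetical excerpt of model.safetensors.index.json: only the values quoted in the text.
INDEX_EXCERPT = {
    "metadata": {"total_size": 5559367680},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    },
}


def load_index(path: str) -> dict:
    """Read a model.safetensors.index.json file from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def shard_for(index: dict, param_name: str) -> str:
    """Return the shard file that stores the given parameter."""
    return index["weight_map"][param_name]


def params_by_shard(index: dict) -> dict:
    """Invert weight_map into: shard file name -> list of parameter names."""
    groups = defaultdict(list)
    for name, shard in index["weight_map"].items():
        groups[shard].append(name)
    return dict(groups)


print(shard_for(INDEX_EXCERPT, "model.embed_tokens.weight"))
# model-00001-of-00002.safetensors
print(params_by_shard(INDEX_EXCERPT))
# {'model-00001-of-00002.safetensors': ['model.embed_tokens.weight']}
```

Inverting the map with `params_by_shard` shows which parameters have to be read from each shard file.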

The parameter names provide information about the model architecture:

* <mark style="color:yellow;">`model.embed_tokens.weight`</mark> represents the embedding layer weights.
* <mark style="color:yellow;">`model.layers.0.input_layernorm.bias`</mark> and <mark style="color:yellow;">`model.layers.0.input_layernorm.weight`</mark> represent the layer normalization parameters for the input of the first layer.
* <mark style="color:yellow;">`model.layers.0.self_attn.dense.bias`</mark> and <mark style="color:yellow;">`model.layers.0.self_attn.dense.weight`</mark> represent the parameters of the dense layer in the self-attention mechanism of the first layer.
* ... and so on for each layer of the model.
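
The naming scheme is regular enough to parse. This sketch uses only the names quoted above and a hypothetical regular expression to split per-layer names into layer index, submodule path, and parameter kind, and to count the distinct layers they mention.

```python
import re

# Parameter names quoted in the list above.
NAMES = [
    "model.embed_tokens.weight",
    "model.layers.0.input_layernorm.bias",
    "model.layers.0.input_layernorm.weight",
    "model.layers.0.self_attn.dense.bias",
    "model.layers.0.self_attn.dense.weight",
]

# Hypothetical pattern: model.layers.<index>.<submodule path>.<weight|bias>
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)\.(weight|bias)$")


def parse_param_name(name: str):
    """Return (layer_index, submodule, kind) for per-layer names, or None otherwise."""
    m = LAYER_RE.match(name)
    if m is None:
        return None  # e.g. embeddings or the final norm, which live outside the layers
    layer, module, kind = m.groups()
    return int(layer), module, kind


def count_layers(names) -> int:
    """Count distinct Transformer blocks referenced in a list of parameter names."""
    parsed = (parse_param_name(n) for n in names)
    return len({p[0] for p in parsed if p is not None})


print(parse_param_name("model.layers.0.self_attn.dense.bias"))  # (0, 'self_attn.dense', 'bias')
print(count_layers(NAMES))  # 1
```

Applied to every key of the full `weight_map`, `count_layers` should report 32, matching `num_hidden_layers` in `config.json`.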

The <mark style="color:yellow;">`.safetensors`</mark> format is a way to store the model parameters efficiently and safely. Unlike pickle-based PyTorch checkpoints, loading it cannot execute arbitrary code, and its layout supports fast, memory-mapped loading.
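
The safetensors layout is simple: an 8-byte little-endian unsigned integer giving the header length, a UTF-8 JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets` (plus an optional `__metadata__` entry), and then the raw tensor bytes. The standard-library-only sketch below builds a tiny in-memory file and reads it back; the function names and the toy tensor are hypothetical, and in practice the `safetensors` library's `safe_open` does this work for you.

```python
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header at the start of a .safetensors byte string."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # little-endian unsigned 64-bit length
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional string-to-string metadata, not a tensor
    return header


def tensor_bytes(blob: bytes, header: dict, name: str) -> bytes:
    """Slice one tensor's raw bytes using its data_offsets."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    start, end = header[name]["data_offsets"]
    base = 8 + header_len  # offsets are relative to the data section after the header
    return blob[base + start:base + end]


def build_toy_file() -> bytes:
    """Build a tiny in-memory .safetensors file holding one 2x2 float16 tensor."""
    data = struct.pack("<4e", 1.0, 2.0, 3.0, 4.0)  # four float16 values = 8 bytes
    header = {
        "toy.weight": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, len(data)]},
        "__metadata__": {"format": "pt"},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


blob = build_toy_file()
header = read_safetensors_header(blob)
print(header["toy.weight"]["shape"])  # [2, 2]
print(struct.unpack("<4e", tensor_bytes(blob, header, "toy.weight")))  # (1.0, 2.0, 3.0, 4.0)
```

Because each tensor's bytes sit at known offsets, a loader can memory-map the file and read individual tensors without deserializing anything else, which is where the speed and safety benefits come from.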

In summary, the <mark style="color:yellow;">`model.safetensors.index.json`</mark> file acts as a map that tells the model loading process where to find each parameter within the <mark style="color:yellow;">`.safetensors`</mark> files. This enables the model to be loaded correctly and efficiently.

</details>

With an understanding of the model characteristics, we will now download it to our local directory.


