Hugging Face Models and Evaluation Metrics
1. Classification
• Text Classification: DistilBERT, RoBERTa, BERT (e.g., sentiment analysis, spam detection).
• Image Classification: ViT (Vision Transformer), ResNet.
• Zero-Shot Classification: facebook/bart-large-mnli (classify without fine-tuning).
Example (Text Classification):
python
from transformers import pipeline
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("I love Hugging Face!")
print(result) # Output: [{'label': 'POSITIVE', 'score': 0.9998}]
2. Text Generation
python
generator = pipeline("text-generation", model="gpt2")
output = generator("Once upon a time,", max_length=50)
print(output[0]['generated_text'])
3. Summarization
python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Long article text...", max_length=130)
print(summary[0]['summary_text'])
4. Translation
python
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Hello, how are you?")
print(result[0]['translation_text']) # Output: "Bonjour, comment allez-vous ?"
5. Question Answering
• bert-large-uncased-whole-word-masking-finetuned-squad, distilbert-base-cased-distilled-squad.
Example:
python
qa_pipeline = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")
result = qa_pipeline(question="What is Hugging Face?", context="Hugging Face is a company...")
print(result['answer'])  # Output: "a company"
6. Image Processing
Tasks:
python
from transformers import ViTImageProcessor, ViTForImageClassification
import torch
from PIL import Image
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
image = Image.open("path/to/image.jpg")  # replace with your image path
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])  # e.g., "tabby cat"
7. Text-to-Image Generation
• CompVis/stable-diffusion-v1-4, runwayml/stable-diffusion-v1-5.
Example:
python
from diffusers import StableDiffusionPipeline
import torch
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
image = pipe("a photograph of an astronaut riding a horse").images[0]  # example prompt
——————————————————————————————————————
1. BLEU
python
import evaluate
bleu = evaluate.load("bleu")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(bleu.compute(predictions=predictions, references=references))
# Output: {'bleu': 0.75, 'precisions': [1.0, 0.8, 0.666, 0.5], ...}
2. ROUGE
python
rouge = evaluate.load("rouge")
predictions = ["the cat is on the mat"]
references = ["the cat sits on the mat"]
print(rouge.compute(predictions=predictions, references=references))
# Output: {'rouge1': 0.83, 'rouge2': 0.66, 'rougeL': 0.83}
3. METEOR
python
meteor = evaluate.load("meteor")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(meteor.compute(predictions=predictions, references=references))
# Output: {'meteor': 0.87}
4. BERTScore
python
bertscore = evaluate.load("bertscore")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
# Output: {'precision': 0.98, 'recall': 0.97, 'f1': 0.97}
5. FID (Fréchet Inception Distance)
python
# Requires `clean-fid` and PyTorch
from cleanfid import fid
score = fid.compute_fid("path/to/generated_images", "path/to/real_images")
print(f"FID Score: {score}")  # Lower is better (e.g., < 30 is good)
7. Perplexity
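Perplexity is the exponential of the average negative log-probability the model assigns to each token (lower is better). A minimal sketch of the arithmetic, independent of any specific model:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that assigns probability 0.5 to every token has perplexity 2.0
print(perplexity([math.log(0.5), math.log(0.5)]))  # ≈ 2.0
```

In practice the log-probabilities come from a language model's loss over a held-out text.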
Summary Table
——————————————————————————————————————
1. RAG (Retrieval-Augmented Generation)
What it is: An architecture that pairs an LLM with a retriever that fetches relevant external documents at inference time, grounding generation in retrieved evidence.
Benefits:
• Dynamic Knowledge: Unlike static LLMs, RAG can access up-to-date or domain-specific data.
• Transparency: Sources can be cited (useful for research/chats).
• Efficiency: Avoids retraining the LLM for new information.
Hugging Face Implementation:
python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact")
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)
2. Multimodal AI
What it is:
• Models that process and generate multiple data types (text, images, audio, video) simultaneously.
• Unlike unimodal AI (e.g., GPT for text, Stable Diffusion for images), multimodal systems understand cross-modal relationships (e.g., captioning images, generating videos from text).
Key Architectures:
python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
image = Image.open("path/to/image.jpg")  # replace with your image path
text = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=text, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text match probabilities
Core Goal: RAG enhances LLMs with external knowledge; multimodal AI fuses/translates between modalities.
Future systems might use multimodal RAG, e.g., retrieving both images and text to answer complex queries.
——————————————————————————————————————
Customizing a pre-trained model with your own data in Hugging Face involves fine-tuning the model on your dataset. Here's a step-by-step guide covering text and image tasks, using the transformers and datasets libraries.
bash
pip install transformers datasets accelerate evaluate torch
python
from datasets import load_dataset
dataset = load_dataset("csv", data_files="your_data.csv")  # e.g., a custom CSV with text/label columns
python
from datasets import Dataset
dataset = Dataset.from_dict({"text": ["great movie", "terrible plot"], "label": [1, 0]})  # in-memory example
Choose a model for your task (e.g., BERT for text classification, ViT for images).
python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)  # 2 classes
python
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)
# Apply to your train/validation splits, e.g.:
# tokenized_train = train_dataset.map(tokenize_function, batched=True)
python
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
    output_dir="./results",          # Output directory
    per_device_train_batch_size=8,   # Batch size
    num_train_epochs=3,              # Number of epochs
    evaluation_strategy="epoch",     # Evaluate after each epoch
    save_strategy="epoch",
    logging_dir="./logs",
)
Step B: Define Metrics (Optional)
python
import numpy as np
import evaluate
metric = evaluate.load("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
Step C: Train the Model
python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    compute_metrics=compute_metrics,
)
trainer.train()
Save Locally
python
model.save_pretrained("./custom_model")
tokenizer.save_pretrained("./custom_model")
Push to Hugging Face Hub (Optional)
python
from huggingface_hub import notebook_login
notebook_login()  # authenticate, then call model.push_to_hub("your-repo-name")
• Model: distilbert-base-uncased
• Dataset: Custom CSV with text and label columns.
2. Image Classification (e.g., Cats vs. Dogs)
python
from transformers import ViTFeatureExtractor, ViTForImageClassification
import torch
# Load model
model_name = "google/vit-base-patch16-224"
feature_extractor = ViTFeatureExtractor.from_pretrained(model_name)
model = ViTForImageClassification.from_pretrained(
    model_name, num_labels=10, ignore_mismatched_sizes=True  # replace the 1000-class head
)
# Preprocess images
def transform(example_batch):
    inputs = feature_extractor(example_batch["img"], return_tensors="pt")
    inputs["label"] = example_batch["label"]
    return inputs
dataset = dataset.with_transform(transform)
python
from transformers import AutoModelForQuestionAnswering
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
Key Considerations
1. Dataset Size: Small datasets may need fewer epochs to avoid overfitting.
2. GPU/TPU: Use accelerate for distributed training.
3. Hyperparameters: Tweak learning_rate, batch_size, etc.
4. Domain Adaptation: For domain-specific tasks (e.g., medical text), try domain-adapted models like BioBERT.
Conclusion
——————————————————————————————————————
1. Zero-Shot Prompting
Definition: The model generates an answer without any examples, relying solely on its pre-trained knowledge.
Use Case: Quick inference tasks (classification, QA).
Hugging Face Implementation:
python
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier("This movie is fantastic!", candidate_labels=["positive", "negative"])
2. Few-Shot Prompting
python
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
3. Chain-of-Thought (CoT) Prompting
python
prompt = """
Q: If a store has 10 apples and sells 4, how many are left?
A: Let’s think step-by-step.
fi
fi
1. Start with 10 apples.
2. Sell 4 apples.
3. Remaining apples = 10 - 4 = 6.
Q: A bakery has 20 croissants and bakes 12 more. How many are there now?
A:"""
output = generator(prompt, temperature=0.7) # Temperature for creativity control
Note: Works best with models like GPT-3.5-turbo or Flan-T5.
4. Instruction Prompting
Definition: Explicitly state the task in the prompt (e.g., "Summarize this text:").
Use Case: Controlled generation (summarization, translation).
Hugging Face Example:
python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = "Hugging Face is a company..."
summary = summarizer(f"Summarize this text briefly: {text}", max_length=30)
print(summary[0]["summary_text"])
5. Template-Based Prompting
python
template = """
User: {input}
Assistant:"""
input_text = "Explain quantum computing."
output = generator([Link](input=input_text), max_length=100)
6. RAG Prompting
python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq")
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])  # Output: "Paris"
7. Prompt Tuning (Soft Prompts)
python
from peft import PromptTuningConfig, get_peft_model
config = PromptTuningConfig(task_type="SEQ_CLS", num_virtual_tokens=10)
model = get_peft_model(model, config)
8. Multimodal Prompting
python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
image = Image.open("path/to/image.jpg")  # replace with your image path
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
Key Takeaways
——————————————————————————————————————
1. FAISS (Facebook AI Similarity Search)
A library optimized for fast similarity search with minimal memory usage.
Key Features
• CPU/GPU support
• No native persistence (save/load indexes manually)
• Best for static datasets
Steps to Use FAISS with Hugging Face
A. Generate Embeddings
python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")  # Lightweight model
embeddings = model.encode(["Hugging Face is awesome", "I love NLP"])
B. Build a FAISS Index
python
import faiss
import numpy as np
# Build an L2 index (embedding dimension from the model above)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.array(embeddings).astype("float32"))
# Save index
faiss.write_index(index, "my_faiss_index.index")
C. Query the Index
python
# Load index
index = faiss.read_index("my_faiss_index.index")
query = model.encode(["What is Hugging Face?"])
distances, indices = index.search(np.array(query).astype("float32"), k=2)
2. Qdrant
A. Run Qdrant
bash
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
B. Insert Embeddings
python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient("localhost")  # Connect to server
# Add embeddings (assumes the collection was created beforehand)
client.upsert(
    collection_name="my_collection",
    points=[
        {"id": 1, "vector": model.encode("Hugging Face is awesome").tolist()},
        {"id": 2, "vector": model.encode("I love NLP").tolist()},
    ]
)
C. Query with Filters
python
query = model.encode("What is Hugging Face?")
hits = client.search(
    collection_name="my_collection",
    query_vector=query.tolist(),
    limit=2  # Top-2 results
)
print(hits[0].payload)  # Returns closest match
5. Real-World Applications
python
from transformers import RagTokenizer, RagRetriever
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq",
    index_name="custom",
    passages_path="path/to/passages",   # dataset of text passages
    index_path="my_faiss_index.index"   # load your FAISS index
)
Key Takeaways
——————————————————————————————————————
1. Chunking Strategies
Chunking splits documents into smaller pieces for efficient processing. Different strategies
include:
A. Fixed-Size Chunking
• How it works: Splits text into equal-sized chunks (e.g., 256 tokens).
• Pros: Simple, works well for uniform content.
• Cons: May split semantically related text.
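A fixed-size chunker can be sketched in a few lines, assuming simple whitespace tokenization (a hypothetical helper, not a library API):

```python
def chunk_fixed(text, chunk_size=256):
    """Split text into chunks of at most chunk_size whitespace tokens."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

print(chunk_fixed("one two three four five", chunk_size=2))
# ['one two', 'three four', 'five']
```

Real pipelines usually count model tokens (via the tokenizer) rather than words.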
B. Overlapping Chunking
• How it works: Adds overlap between chunks (e.g., 10% of chunk size) to preserve context.
• Pros: Reduces information loss at boundaries.
• Cons: Increases storage/compute costs.
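The same sketch extends to overlapping chunks: chunk starts advance by chunk_size minus overlap tokens, so consecutive chunks share context (again a hypothetical whitespace-token helper):

```python
def chunk_overlap(text, chunk_size=256, overlap=25):
    """Fixed-size chunks whose starts advance by (chunk_size - overlap),
    so consecutive chunks share `overlap` tokens of context."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), step)]

print(chunk_overlap("a b c d e", chunk_size=3, overlap=1))
# ['a b c', 'c d e', 'e']
```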
C. Content-Aware Chunking
• How it works: Uses NLP (e.g., sentence boundaries, headings) to split logically.
• Methods:
◦ Sentence Splitting: NLTK/spaCy for sentence detection.
◦ HTML/PDF Parsing: Extract sections (e.g., BeautifulSoup for HTML).
• Example:
python
from langchain.text_splitter import MarkdownHeaderTextSplitter
headers = [("#", "Header1"), ("##", "Header2")]
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers)
chunks = splitter.split_text(markdown_doc)
2. Embedding Strategies
Example (Sentence-BERT):
python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hugging Face is awesome"])
3. Retrieval Strategies
A. Similarity Search
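Similarity search reduces to ranking document embeddings by cosine similarity against the query embedding. A minimal NumPy sketch (hypothetical helper; vector databases do this at scale with approximate indexes):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query
    by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:k].tolist()

docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
print(cosine_top_k(np.array([1.0, 0.0]), docs))  # [0, 2]
```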
B. Hybrid Search
python
from qdrant_client.models import Filter, FieldCondition, Range
hits = qdrant_client.search(
    collection_name="my_collection",
    query_vector=embedding,
    query_filter=Filter(must=[FieldCondition(key="year", range=Range(gte=2020))]),
    limit=5
)
RAG Fusion
• How it works: Combines results from multiple retrievers (e.g., vector + keyword)
and reranks them.
• Methods:
◦ Reciprocal Rank Fusion (RRF): Merges rankings from different searches.
◦ Cross-Encoder Reranking: Uses BERT-like models to rescore.
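Reciprocal Rank Fusion can be sketched directly: each document's score is the sum of 1/(k + rank) over the input rankings, with k = 60 the conventional constant:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each doc scores sum of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks high in both lists, so it comes out on top
print(reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]]))
# ['b', 'a', 'c']
```

In a hybrid setup, the two input rankings would come from a vector search and a keyword (e.g., BM25) search.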
Retrieval Ranking
python
from sentence_transformers import CrossEncoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, chunk) for chunk in top_chunks]
scores = reranker.predict(pairs)
reranked_chunks = [chunk for _, chunk in sorted(zip(scores, top_chunks), reverse=True)]
A. GuardRails
• What it is: Rules to block harmful/off-topic outputs.
• Tools:
◦ NeMo Guardrails (NVIDIA)
◦ Microsoft Guidance
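As a minimal illustration of the idea (not the NeMo Guardrails or Guidance API), a keyword-based output filter might look like:

```python
BLOCKED_TOPICS = {"weapons", "violence"}  # hypothetical blocklist

def guardrail(user_input, llm_response):
    """Return a refusal if the exchange touches a blocked topic."""
    text = f"{user_input} {llm_response}".lower()
    if any(topic in text for topic in BLOCKED_TOPICS):
        return "Sorry, I can't help with that topic."
    return llm_response

print(guardrail("hello", "Hi, how can I help?"))  # passes through unchanged
```

Production guardrail frameworks use classifiers and dialogue policies rather than raw keyword matching.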
B. Response Evaluation
• Metrics:
◦ Faithfulness: Is the output grounded in retrieved docs?
◦ Answer Relevance: Does it address the query?
• Tools:
◦ DeepEval: evaluate_faithfulness(generation, contexts)
◦ RAGAS (Retrieval-Augmented Generation Assessment)
Example (DeepEval):
python
from deepeval.metrics import FaithfulnessMetric
metric = FaithfulnessMetric()
metric.measure(model_output, contexts=[retrieved_docs])
print(metric.score)  # 0.8 (faithfulness score)
C. Explanation
Key Takeaways
Component Goal Tools/Models
For production RAG systems, combine these techniques and iterate based on evaluation metrics.
1. Zero-Shot Prompting
Definition: The model generates an answer without any prior examples, relying solely on its pre-trained knowledge.
Use Case: Quick classification, general QA.
Example:
python
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier("This movie is fantastic!", candidate_labels=["positive", "negative"])
# Output: {'labels': ['positive', 'negative'], 'scores': [0.99, 0.01]}
2. Few-Shot Prompting
Definition: Provide a few input-output examples in the prompt to guide the model.
Use Case: Custom tasks (e.g., translation, formatting).
Example:
text
Translate English to French:
- "Hello" → "Bonjour"
- "Coffee" → "Café"
- "Car" →
Model Output: "Voiture"
3. Chain-of-Thought (CoT) Prompting
text
Q: If a store has 10 apples and sells 4, how many are left?
A: Let’s think step-by-step.
1. Start with 10 apples.
2. Sell 4 apples.
3. Remaining apples = 10 - 4 = 6.
4. Instruction Prompting
Definition: Explicitly state the task in the prompt (e.g., "Summarize this article:").
Use Case: Controlled generation (summarization, translation).
Example:
python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Summarize this: Hugging Face is a company...", max_length=30)
5. Template-Based Prompting
text
User: {input}
Assistant: {output}
6. Self-Consistency Prompting
Definition: Generate multiple reasoning paths and pick the most consistent answer.
Use Case: Improving reliability of CoT.
Example:
• Generate 3 different reasoning paths for a math problem and vote for the most
frequent answer.
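The voting step can be sketched as a simple majority vote over the sampled final answers:

```python
from collections import Counter

def self_consistency(answers):
    """Vote: return the most frequent final answer across sampled paths."""
    return Counter(answers).most_common(1)[0][0]

# Three sampled reasoning paths produced these final answers:
print(self_consistency(["6", "6", "5"]))  # 6
```

With a real model, each answer would come from one temperature-sampled CoT generation.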
7. Least-to-Most Prompting
text
Q: What is the capital of France?
Sub-Q1: Which country is Paris in?
Sub-Q2: Is Paris the capital of France?
A: Paris.
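The decomposition above can be sketched as a driver loop; `answer_fn` below is a hypothetical stand-in for an LLM call:

```python
def least_to_most(sub_questions, final_question, answer_fn):
    """Answer sub-questions in order, carrying answers forward as context.
    answer_fn(question, facts) stands in for an LLM call."""
    facts = []
    for sq in sub_questions:
        facts.append(f"{sq} {answer_fn(sq, facts)}")
    return answer_fn(final_question, facts)
```

With a real model, `answer_fn` would format the accumulated facts into a prompt for a text-generation pipeline.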
8. RAG Prompting
python
from transformers import RagRetriever, RagTokenForGeneration
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq")
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
9. Multimodal Prompting
python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
image = Image.open("path/to/image.jpg")  # replace with your image path
inputs = processor(text=["a cat", "a dog"], images=image, return_tensors="pt", padding=True)
Definition: Dynamically select the best examples for few-shot prompts based on input.
Use Case: Adaptive few-shot learning.
text
Q: Describe the Eiffel Tower in 10 words.
A: "Iconic Parisian iron tower, 324m tall, built in 1889."
text
Q: Is Python compiled or interpreted?
Good Response: "Python is interpreted."
Bad Response: "Python is compiled."
text
Step 1: Generate facts about the Eiffel Tower.
Step 2: Use facts to answer "When was it built?"
python
from peft import PromptTuningConfig, get_peft_model
config = PromptTuningConfig(task_type="SEQ_CLS", num_virtual_tokens=10)
model = get_peft_model(model, config)
17. Tree-of-Thought (ToT) Prompting
text
Q: How is a CPU like a brain?
A: Both process information and control functions.
20. Meta-Prompting
text
Generate a prompt to classify sentiment:
"Label this text as positive, negative, or neutral: {text}"
Summary Table
Key Takeaways