Hugging Face Models and Evaluation Metrics

The document outlines various tasks and models available in Hugging Face for classification, text generation, summarization, translation, question answering, image processing, and diffusion models. It also discusses evaluation metrics for machine translation, summarization, and image generation, along with their use cases and examples. Additionally, it introduces concepts like Retrieval-Augmented Generation (RAG) and Multimodal AI, highlighting their functionalities and applications.


——————————————————————————————————————

1. Classification

Task: Assign a label to input data (text, image, etc.).


Hugging Face Models:

• Text Classification: DistilBERT, RoBERTa, BERT (e.g., sentiment analysis, spam detection).
• Image Classification: ViT (Vision Transformer), ResNet.
• Zero-Shot Classification: facebook/bart-large-mnli (classify without fine-tuning).
Example (Text Classification):

python
from transformers import pipeline
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("I love Hugging Face!")
print(result)  # Output: [{'label': 'POSITIVE', 'score': 0.9998}]

2. Text Generation

Task: Generate coherent text (e.g., stories, code, dialogue).


Hugging Face Models:

• Autoregressive Models: GPT-2, GPT-J, GPT-Neo.


• Controlled Generation: CTRL (conditional text).
Example (Text Generation):

python
generator = pipeline("text-generation", model="gpt2")
output = generator("Once upon a time,", max_length=50)
print(output[0]['generated_text'])

3. Summarization

Task: Condense long text into a shorter summary.


Hugging Face Models:
• facebook/bart-large-cnn, google/pegasus-xsum, t5-small.
Example:

python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Long article text...", max_length=130)
print(summary[0]['summary_text'])

4. Translation

Task: Translate text between languages.


Hugging Face Models:

• Helsinki-NLP/opus-mt-en-fr (English → French), t5-base (multilingual).


Example:

python
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Hello, how are you?")
print(result[0]['translation_text']) # Output: "Bonjour, comment allez-vous ?"

5. Question Answering (QA)

Task: Extract answers from a context (e.g., SQuAD dataset).


Hugging Face Models:

• bert-large-uncased-whole-word-masking-finetuned-squad, distilbert-base-cased-distilled-squad.
Example:

python
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")
result = qa_pipeline(question="What is Hugging Face?", context="Hugging Face is a company...")

6. Image Processing

Tasks:

• Image Classification: google/vit-base-patch16-224.


• Object Detection: facebook/detr-resnet-50.
• Image Segmentation: facebook/maskformer-swin-base-ade.
Example (Image Classification):

python
from transformers import ViTImageProcessor, ViTForImageClassification
import torch
from PIL import Image

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

image = Image.open("path/to/image.jpg")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])  # e.g., "tabby cat"

7. Diffusion Models (Image Generation)

Task: Generate images from text prompts.


Hugging Face Models:

• CompVis/stable-diffusion-v1-4, runwayml/stable-diffusion-v1-5.
Example:

python
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a futuristic cityscape at sunset"


image = pipe(prompt).images[0]
image.save("generated_image.png")

Key Hugging Face Libraries

1. Transformers: For NLP tasks (BERT, GPT, etc.).


2. Diffusers: For diffusion models (Stable Diffusion).
3. Datasets: Load datasets (e.g., SQuAD, GLUE).
4. Evaluate: Benchmark models with standard metrics.

——————————————————————————————————————

1. BLEU (Bilingual Evaluation Understudy)

• Purpose: Evaluates the quality of machine-translated text by comparing it to human references.
• How it works:
◦ Measures n-gram overlap (1-gram to 4-gram) between generated and
reference text.
◦ Penalizes overly short outputs with a brevity penalty.
• Range: 0 (no overlap) to 1 (perfect match).
• Limitations:
◦ Ignores semantics (e.g., synonyms).
◦ Poor for creative text generation (e.g., stories).
• Example:
python

import evaluate

bleu = evaluate.load("bleu")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(bleu.compute(predictions=predictions, references=references))
# Output: {'bleu': 0.75, 'precisions': [1.0, 0.8, 0.666, 0.5], ...}

2. ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

• Purpose: Evaluates summarization and text generation by measuring overlap with references.
• Variants:
◦ ROUGE-N: N-gram overlap (e.g., ROUGE-1, ROUGE-2).
◦ ROUGE-L: Longest common subsequence (captures sentence structure).
• Range: 0 to 1 (higher = better).
• Use Case: Summarization, headline generation.
• Example:
python

import evaluate

rouge = evaluate.load("rouge")
predictions = ["the cat is on the mat"]
references = ["the cat sits on the mat"]
print(rouge.compute(predictions=predictions, references=references))
# Output: {'rouge1': 0.83, 'rouge2': 0.66, 'rougeL': 0.83}

3. METEOR (Metric for Evaluation of Translation with Explicit ORdering)

• Purpose: Improves BLEU by incorporating synonyms, stemming, and word order.
• Key Features:
◦ Uses WordNet for synonym matching.
◦ Penalizes fragmentation (disjoint matches).
• Range: 0 to 1 (higher = better).
• Use Case: Machine translation, dialogue systems.
• Example:
python

import evaluate

meteor = evaluate.load("meteor")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(meteor.compute(predictions=predictions, references=references))
# Output: {'meteor': 0.87}

4. BERTScore

• Purpose: Evaluates text using BERT embeddings for semantic similarity.


• How it works:
◦ Compares cosine similarity of BERT embeddings for generated vs
reference text.
◦ Captures contextual meaning (unlike n-gram metrics).
• Range: -1 to 1 (higher = better).
• Use Case: Any generative task where semantics matter.
• Example:
python

import evaluate

bertscore = evaluate.load("bertscore")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
# Output: {'precision': 0.98, 'recall': 0.97, 'f1': 0.97}

5. FID (Frechet Inception Distance)

• Purpose: Evaluates image generation quality (e.g., GANs, diffusion models).


• How it works:
◦ Compares statistics of Inception-v3 features for real vs generated
images.
◦ Lower FID = better (0 = identical distributions).
• Use Case: Stable Diffusion, GANs.
• Example:
python
# Requires `clean-fid` and PyTorch
from cleanfid import fid

score = fid.compute_fid("path/to/generated_images", "path/to/real_images")
print(f"FID Score: {score}")  # Lower is better (e.g., < 30 is good)

6. Inception Score (IS)

• Purpose: Measures quality and diversity of generated images.


• How it works:
◦ Uses Inception-v3 to classify images.
◦ High score = images are both recognizable (low entropy per image) and
diverse (high entropy across images).
• Range: Higher = better (e.g., > 30 for good models).
• Limitations: Can be fooled by overfitting.
• Example:
python

from torchmetrics.image.inception import InceptionScore

inception = InceptionScore()
# Assume `generated_images` is a uint8 tensor of shape [N, 3, 299, 299]
inception.update(generated_images)
print(inception.compute())  # (mean, std), e.g., 35.2
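The formula behind the score is short; a minimal NumPy sketch of IS = exp(E[KL(p(y|x) || p(y))]), using made-up class probabilities in place of Inception-v3 outputs:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: [N, num_classes] class probabilities p(y|x) for N images."""
    p_y = probs.mean(axis=0)  # marginal p(y) across all images
    # KL(p(y|x) || p(y)) per image, then average and exponentiate
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Two images, each confidently predicting a different class -> higher score
confident = np.array([[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]])
# Uniform (unrecognizable) predictions -> score near 1
uniform = np.ones((2, 3)) / 3
print(inception_score(confident) > inception_score(uniform))  # True
```

This makes the "recognizable and diverse" intuition concrete: confident per-image predictions give low per-image entropy, while different predictions across images spread the marginal p(y), and both push the score up.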

7. Perplexity

• Purpose: Evaluates language models (e.g., GPT) by measuring prediction uncertainty.
• How it works:
◦ Lower perplexity = model is more confident (better).
◦ Formula: exp(-1/N * Σ log P(word | context)).
• Use Case: GPT-like models, autoregressive generation.
• Example:
python

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
input_text = "The future of AI is"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
perplexity = torch.exp(outputs.loss).item()
print(f"Perplexity: {perplexity}")  # e.g., 25.3
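The formula can be checked by hand; a tiny sketch with hypothetical per-token probabilities:

```python
import math

def perplexity(token_probs):
    """Perplexity from per-token probabilities P(word | context)."""
    n = len(token_probs)
    avg_log_prob = sum(math.log(p) for p in token_probs) / n
    return math.exp(-avg_log_prob)

# A model assigning probability 0.25 to every token has perplexity 4:
# on average it is as uncertain as a uniform choice among 4 words.
print(round(perplexity([0.25, 0.25, 0.25]), 6))  # 4.0
```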

Summary Table

Metric          | Best for             | Range           | Key Insight
BLEU            | Translation          | 0–1             | N-gram overlap, ignores semantics.
ROUGE           | Summarization        | 0–1             | Recall-focused, flexible n-grams.
METEOR          | Translation/Dialogue | 0–1             | Adds synonyms/stemming.
BERTScore       | Any text generation  | -1–1            | Uses BERT embeddings.
FID             | Image generation     | Lower = better  | Compares feature distributions.
Inception Score | Image generation     | Higher = better | Quality + diversity.
Perplexity      | Language models      | Lower = better  | Measures model confidence.

When to Use Which Metric?

• Text Generation: BERTScore (semantics) + ROUGE (structure).


• Translation: BLEU + METEOR.
• Image Generation: FID + Inception Score.
• Language Models: Perplexity + human eval.

——————————————————————————————————————
1. RAG (Retrieval-Augmented Generation)

What it is:

• A hybrid AI model that combines retrieval-based search with generative AI to produce more accurate and context-aware answers.
• Retrieval Step: Searches a knowledge base (e.g., documents, databases) for
relevant information.
• Generation Step: Uses a language model (e.g., GPT, T5) to synthesize the
retrieved data into a coherent response.
How it Works:

1. Query: User asks a question (e.g., "What causes climate change?").


2. Retrieval: The system fetches relevant documents/chunks from a database
(e.g., Wikipedia, company docs).
3. Generation: The LLM generates an answer grounded in the retrieved content,
reducing hallucinations.
Key Features:

• Dynamic Knowledge: Unlike static LLMs, RAG can access up-to-date or domain-specific data.
• Transparency: Sources can be cited (useful for research/chats).
• Efficiency: Avoids retraining the LLM for new information.
Applications:

• QA systems (e.g., customer support chatbots).


• Medical/legal research (pulling from verified sources).
• Enterprise knowledge management.
Example (Hugging Face):

python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact")
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

input_ids = tokenizer("What is the capital of France?", return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])  # Output: "Paris"
2. Multimodal AI

What it is:

• Models that process and generate multiple data types (text, images, audio,
video) simultaneously.
• Unlike unimodal AI (e.g., GPT for text, Stable Diffusion for images), multimodal
systems understand cross-modal relationships (e.g., captioning images, generating
videos from text).
Key Architectures:

1. CLIP (Contrastive Language-Image Pretraining):


◦ Learns joint embeddings for images and text.
◦ Powers tools like DALL-E’s text-to-image generation.
2. Flamingo (DeepMind):
◦ Processes interleaved text/images (e.g., answering questions about a
photo).
3. GPT-4V (Vision):
◦ Accepts image + text inputs for tasks like document analysis.
Applications:

• Text-to-Image: DALL-E, MidJourney.


• Visual QA: Answering questions about images ("What’s in this photo?").
• Autonomous Vehicles: Combining LiDAR, cameras, and maps.
• Healthcare: Analyzing X-rays + patient records.
Example (Multimodal QA with CLIP):

python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("path/to/image.jpg")
text = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=text, images=image, return_tensors="pt", padding=True)


outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # Match text to image
probs = logits_per_image.softmax(dim=1)
print(probs) # e.g., [[0.9, 0.1]] → 90% "cat"

Key Differences: RAG vs. Multimodal AI

Feature        | RAG                                   | Multimodal AI
Data Types     | Text-only (retrieval + generation)    | Text, images, audio, video.
Core Goal      | Enhance LLMs with external knowledge. | Fuse/translate between modalities.
Example Models | Facebook’s RAG, Google’s REALM.       | CLIP, Flamingo, GPT-4V.
Use Cases      | Research, fact-checking.              | Image generation, video captioning.

Why They Matter

• RAG: Solves LLMs’ knowledge cutoff and hallucination problems.


• Multimodal AI: Enables richer human-AI interaction (e.g., ChatGPT analyzing
memes).
Combined Potential:

Future systems might use multimodal RAG—e.g., retrieving images and text to answer
complex queries like:

"Show me a 3D model of the Eiffel Tower and explain its history."



——————————————————————————————————————

How to customize a pre-trained model with your own data in Hugging Face

Customizing a pre-trained model with your own data in Hugging Face involves fine-tuning the model on your dataset. Here's a step-by-step guide covering text and image tasks, using the transformers and datasets libraries.

Steps to Fine-Tune a Pre-Trained Model

1. Install Required Libraries

bash
pip install transformers datasets accelerate evaluate torch

2. Load Your Dataset

Use Hugging Face’s datasets library or a custom dataset.

Option A: Load from Hugging Face Hub

python
from datasets import load_dataset

dataset = load_dataset("imdb") # Example: IMDb reviews


# Split into train/validation
train_data = dataset["train"]
val_data = dataset["test"]
Option B: Custom Dataset (CSV/JSON)

python
from datasets import Dataset

# For CSV/JSON files


dataset = Dataset.from_csv("your_data.csv")
# Or from a Python dict
dataset = Dataset.from_dict({"text": ["I love this!", "This is bad..."], "label": [1, 0]})

3. Load a Pre-Trained Model & Tokenizer

Choose a model for your task (e.g., BERT for text classification, ViT for images).

python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)  # 2 classes

4. Tokenize the Data

Convert text to input IDs and attention masks.

python
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_train = train_data.map(tokenize_function, batched=True)


tokenized_val = val_data.map(tokenize_function, batched=True)

5. Fine-Tune the Model

Use the Trainer API for training.

Step A: Define Training Arguments

python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",          # Output directory
    per_device_train_batch_size=8,   # Batch size
    num_train_epochs=3,              # Number of epochs
    evaluation_strategy="epoch",     # Evaluate after each epoch
    save_strategy="epoch",
    logging_dir="./logs",
)
Step B: Define Metrics (Optional)

python
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
Step C: Train the Model

python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    compute_metrics=compute_metrics,
)

trainer.train()

6. Save & Share the Model

Save Locally

python
model.save_pretrained("./custom_model")
tokenizer.save_pretrained("./custom_model")
Push to Hugging Face Hub (Optional)

python
from huggingface_hub import notebook_login

notebook_login() # Log in to Hugging Face


model.push_to_hub("your-username/bert-custom-imdb")
tokenizer.push_to_hub("your-username/bert-custom-imdb")

Example: Fine-Tuning for Different Tasks

1. Text Classification (e.g., Sentiment Analysis)

• Model: distilbert-base-uncased
• Dataset: Custom CSV with text and label columns.
2. Image Classification (e.g., Cats vs. Dogs)

python
from transformers import ViTFeatureExtractor, ViTForImageClassification
import torch

# Load dataset (e.g., CIFAR-10)


dataset = load_dataset("cifar10")

# Load model
model_name = "google/vit-base-patch16-224"
feature_extractor = ViTFeatureExtractor.from_pretrained(model_name)
model = ViTForImageClassification.from_pretrained(model_name, num_labels=10)

# Preprocess images
def transform(example_batch):
    inputs = feature_extractor(example_batch["img"], return_tensors="pt")
    inputs["label"] = example_batch["label"]
    return inputs

dataset = dataset.with_transform(transform)

# Fine-tune using Trainer (similar to text example)


3. Custom Question Answering (SQuAD Format)

python
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

Key Considerations

1. Dataset Size: Small datasets may need fewer epochs to avoid overfitting.
2. GPU/TPU: Use accelerate for distributed training.
3. Hyperparameters: Tweak learning_rate, batch_size, etc.
4. Domain Adaptation: For domain-specific tasks (e.g., medical text), try domain-
adapted models like BioBERT.

Conclusion

By fine-tuning a pre-trained Hugging Face model:

1. Leverage SOTA architectures (BERT, GPT, ViT).


2. Customize for your task without training from scratch.
3. Deploy easily via pipelines or APIs.
For advanced use cases (e.g., LoRA for efficient fine-tuning), check out Hugging Face’s PEFT library.

——————————————————————————————————————

Prompting techniques are essential for guiding pre-trained models, especially large language models (LLMs), to generate desired outputs without full fine-tuning. Hugging Face's transformers library supports these techniques natively. Below is a breakdown of key prompting methods and how to implement them using Hugging Face.

1. Zero-Shot Prompting

Definition: The model generates an answer without any examples, relying solely on its pre-trained knowledge.
Use Case: Quick inference tasks (classification, QA).
Hugging Face Implementation:

python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


result = classifier(
    "Hugging Face is a company based in New York.",
    candidate_labels=["technology", "politics", "business"],
)
print(result["labels"][0])  # Output: "technology"
Key Insight: Uses natural language prompts (e.g., "Is this text about technology?").

2. Few-Shot Prompting

Definition: Provide a few examples in the prompt to guide the model.


Use Case: When task specificity is needed (e.g., custom formatting).
Hugging Face Example:

python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")


prompt = """
Translate English to French:
- Hello → Bonjour
- Coffee → Café
- Hugging Face →
"""
output = generator(prompt, max_length=50)
print(output[0]["generated_text"]) # Output: "Hugging Face → Hugging Face"
Pro Tip: Use max_new_tokens to control output length.

3. Chain-of-Thought (CoT) Prompting

Definition: Encourage the model to explain its reasoning step-by-step.


Use Case: Complex reasoning tasks (math, logic).
Implementation:

python
prompt = """
Q: If a store has 10 apples and sells 4, how many are left?
A: Let’s think step-by-step.
1. Start with 10 apples.
2. Sell 4 apples.
3. Remaining apples = 10 - 4 = 6.
Q: A bakery has 20 croissants and bakes 12 more. How many are there now?
A:"""
output = generator(prompt, temperature=0.7) # Temperature for creativity control
Note: Works best with models like GPT-3.5-turbo or Flan-T5.

4. Instruction Prompting

Definition: Explicitly state the task in the prompt (e.g., "Summarize this text:").
Use Case: Controlled generation (summarization, translation).
Hugging Face Example:

python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = "Hugging Face is a company..."
summary = summarizer(f"Summarize this text briefly: {text}", max_length=30)
print(summary[0]["summary_text"])

5. Template-Based Prompting

Definition: Use structured templates for consistency (e.g., for chatbots).


Use Case: Dialogue systems, data extraction.
Example:

python
template = """
User: {input}
Assistant:"""
input_text = "Explain quantum computing."
output = generator(template.format(input=input_text), max_length=100)

6. Retrieval-Augmented Generation (RAG)


Definition: Combine retrieval (search) with generation for factual accuracy.
Use Case: QA with external knowledge.
Hugging Face Tools:

python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact")
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])  # Output: "Paris"

7. Prefix/Prompt Tuning

Definition: Soft-prompts (trainable continuous embeddings) prepended to inputs.


Use Case: Efficient adaptation without full fine-tuning.
Hugging Face PEFT Library:

python
from peft import PromptTuningConfig, get_peft_model

config = PromptTuningConfig(task_type="SEQ_CLS", num_virtual_tokens=10)


model = get_peft_model(model, config) # Wrap the base model

8. Multimodal Prompting

Definition: Use prompts combining text + images (e.g., CLIP, GPT-4V).


Use Case: Image captioning, visual QA.
Example with CLIP:

python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
image = Image.open("path/to/image.jpg")
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

Prompting Best Practices

1. Be Explicit: Clearly define the task (e.g., "Translate to German:").


2. Use Examples: Few-shot boosts performance for niche tasks.
3. Control Length: Set max_new_tokens to avoid rambling.
4. Temperature: Lower (0.2) for deterministic outputs, higher (0.7) for creativity.
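The temperature setting in point 4 simply divides the logits before the softmax, so lower values sharpen the output distribution and higher values flatten it. A small NumPy sketch with hypothetical next-token logits shows the effect:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert logits to sampling probabilities at a given temperature."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
low = softmax_with_temperature(logits, 0.2)   # near-deterministic
high = softmax_with_temperature(logits, 0.7)  # more creative / spread out
print(low.max() > high.max())  # True: low temperature concentrates probability
```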

Comparison of Prompting Techniques

Technique     | Use Case                   | Hugging Face Tools
Zero-Shot     | Quick classification       | pipeline("zero-shot-classification")
Few-Shot      | Custom tasks               | Manual prompt design + text-generation
CoT           | Reasoning tasks            | GPT-like models + step-by-step prompts
RAG           | Factual QA                 | RagToken, RagSequence
Prompt Tuning | Parameter-efficient tuning | peft.PromptTuningConfig

Key Takeaways

• Zero/Few-Shot: Fast, no training needed.


• CoT/RAG: For complex or knowledge-intensive tasks.
• Prompt Tuning: Balance between prompting and fine-tuning.

——————————————————————————————————————

Storing Embeddings in a Vector Database (FAISS/Qdrant)

Vector databases are optimized for storing and retrieving high-dimensional embeddings (e.g., from LLMs like BERT or image models like CLIP). They enable fast similarity searches, making them essential for:

• Retrieval-Augmented Generation (RAG)


• Semantic search
• Recommendation systems
• Deduplication
Two popular options are FAISS (Facebook AI) and Qdrant (open-source). Here’s how they
work and how to use them with Hugging Face.

1. What Are Vector Databases?

• Store embeddings (numerical vectors) instead of raw text/images.


• Enable approximate nearest neighbor (ANN) search for scalability.
• Examples: FAISS, Qdrant, Pinecone, Weaviate, Milvus.

2. FAISS (Facebook AI Similarity Search)

A library optimized for fast similarity search with minimal memory usage.

Key Features

• CPU/GPU support
• No native persistence (save/load indexes manually)
• Best for static datasets
Steps to Use FAISS with Hugging Face

A. Generate Embeddings

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # Lightweight model
embeddings = model.encode(["Hugging Face is awesome", "I love NLP"])
B. Build a FAISS Index

python
import faiss
import numpy as np

# Convert to float32 (FAISS requirement)
embeddings = np.array(embeddings).astype("float32")

# Create index (L2 distance)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)  # Add embeddings to index

# Save index
faiss.write_index(index, "my_faiss_index.index")
C. Query the Index

python
# Load index
index = faiss.read_index("my_faiss_index.index")

# Query a new embedding
query = model.encode(["What is Hugging Face?"]).astype("float32")
k = 2  # Top-2 results
distances, indices = index.search(query, k)
print(indices)  # e.g., [[0, 1]] — index 0 ("Hugging Face is awesome") is closest

3. Qdrant (Vector Database with Persistence)

A production-ready vector database with:

• Persistence (stores data on disk)


• Filtering (metadata support)
• Scalability (distributed mode)
Steps to Use Qdrant with Hugging Face

A. Run Qdrant

bash
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
B. Insert Embeddings

python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient("localhost")  # Connect to server

# Create a collection (like a database table)
client.create_collection(
    collection_name="my_collection",
    vectors_config={"size": 384, "distance": "Cosine"}  # Match model dim
)

# Add embeddings
client.upsert(
    collection_name="my_collection",
    points=[
        {"id": 1, "vector": model.encode("Hugging Face is awesome").tolist()},
        {"id": 2, "vector": model.encode("I love NLP").tolist()},
    ]
)
C. Query with Filters

python
query = model.encode("What is Hugging Face?")
hits = client.search(
    collection_name="my_collection",
    query_vector=query.tolist(),
    limit=2  # Top-2 results
)
print(hits[0].payload)  # Returns closest match

4. FAISS vs. Qdrant

Feature            | FAISS                        | Qdrant
Persistence        | Manual (save/load files)     | Built-in
Scalability        | Single-node                  | Distributed (enterprise)
Metadata Filtering | No                           | Yes
Best For           | Static datasets, prototyping | Production, dynamic data

5. Real-World Applications

1. RAG Systems: Store document embeddings for retrieval.


2. Semantic Search: Find similar text/images (e.g., "Find memes about AI").
3. Recommendations: "Users who liked this also liked…"
Example: RAG with FAISS

python
from transformers import RagTokenizer, RagRetriever

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq",
    index_name="custom",
    passages_path="path/to/passages",   # dataset of text chunks
    index_path="my_faiss_index.index",  # load your FAISS index
)

Key Takeaways

• FAISS: Lightweight, good for prototyping.


• Qdrant: Production-ready with metadata support.
• Hugging Face Integration: Generate embeddings with sentence-transformers, then store/query them efficiently.

——————————————————————————————————————

Explain following - Chunking Strategies: Overlapping, Augmented, fixed-size, content-aware, KG

- Embedding Strategies: Comparing various embedding models

- Retrieval Strategies: Similarity search, hybrid search, filtering

- RAG Fusion: Combining search results for improved accuracy

- Retrieval Ranking: Tuning for best results

- GuardRails, Response Evaluation & Explanation (DeepEval-like tools)


Here’s a detailed breakdown of the key components in advanced Retrieval-Augmented
Generation (RAG) systems, including chunking, embeddings, retrieval strategies, and
evaluation:

1. Chunking Strategies

Chunking splits documents into smaller pieces for efficient processing. Different strategies
include:

A. Fixed-Size Chunking

• How it works: Splits text into equal-sized chunks (e.g., 256 tokens).
• Pros: Simple, works well for uniform content.
• Cons: May split semantically related text.
• Example:
python

from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=256, chunk_overlap=20)
chunks = splitter.split_text(document)

B. Overlapping Chunking

• How it works: Adds overlap between chunks (e.g., 10% of chunk size) to preserve
context.
• Pros: Reduces information loss at boundaries.
• Cons: Increases storage/compute costs.
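A minimal plain-Python sketch of the idea (word-based chunks with hypothetical sizes), showing how each chunk repeats the tail of the previous one:

```python
def overlapping_chunks(text, chunk_size=10, overlap=2):
    """Split text into word chunks; each chunk repeats the last `overlap` words of the previous one."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

doc = " ".join(f"w{i}" for i in range(20))  # w0 ... w19
for chunk in overlapping_chunks(doc, chunk_size=10, overlap=2):
    print(chunk)
# Adjacent chunks share two words (w8 w9, then w16 w17), so a sentence
# straddling a boundary is still fully contained in at least one chunk.
```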
C. Content-Aware Chunking

• How it works: Uses NLP (e.g., sentence boundaries, headings) to split logically.
• Methods:
◦ Sentence Splitting: NLTK/spaCy for sentence detection.
◦ HTML/PDF Parsing: Extract sections (e.g., BeautifulSoup for HTML).
• Example:
python
from langchain.text_splitter import MarkdownHeaderTextSplitter

headers = [("#", "Header1"), ("##", "Header2")]
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers)
chunks = splitter.split_text(markdown_doc)

D. Knowledge Graph (KG)-Aware Chunking

• How it works: Chunks based on entity/relationship boundaries (e.g., nodes in a KG).
• Use Case: Structured data like medical records or product catalogs.
E. Augmented Chunking

• How it works: Enhances chunks with metadata (e.g., summaries, embeddings).


• Example:
python

chunk = {"text": "Hugging Face is a company...", "summary": "About HF", "embedding": [0.1, 0.2, ...]}

2. Embedding Strategies

Embeddings convert text to vectors. Key models and tradeoffs:

Model             | Dimensions | Speed  | Semantic Quality | Best For
BERT              | 768        | Slow   | High             | Short-text search
Sentence-BERT     | 384–1024   | Medium | High             | General-purpose
all-MiniLM-L6     | 384        | Fast   | Good             | Low-latency systems
OpenAI Embeddings | 1536       | Slow   | Very High        | Paid APIs
CLIP              | 512        | Medium | Multimodal       | Text-to-image search

Example (Sentence-BERT):
python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hugging Face is awesome"])

3. Retrieval Strategies

A. Similarity Search

• How it works: Finds nearest vectors using cosine/L2 distance.


• Tools: FAISS, Qdrant, Pinecone.
• Example:
python

distances, indices = faiss_index.search(query_embedding, k=5)

B. Hybrid Search

• How it works: Combines keyword search (BM25) + vector search.


• Use Case: When exact keyword matches matter (e.g., names).
• Tools: Elasticsearch + FAISS, Weaviate.
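One simple way to combine the two signals is a weighted sum of per-document scores; a toy sketch assuming both score sets are already normalized to [0, 1] (the document IDs and scores are made up):

```python
def hybrid_score(keyword_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores per doc id; alpha weights the vector side."""
    docs = set(keyword_scores) | set(vector_scores)
    return {
        d: (1 - alpha) * keyword_scores.get(d, 0.0) + alpha * vector_scores.get(d, 0.0)
        for d in docs
    }

keyword_scores = {"doc1": 0.9, "doc2": 0.1}  # exact name match favors doc1
vector_scores = {"doc1": 0.4, "doc2": 0.8}   # semantics favor doc2
ranked = sorted(hybrid_score(keyword_scores, vector_scores).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # doc1: 0.5*0.9 + 0.5*0.4 = 0.65 beats doc2's 0.45
```

In production the keyword side would come from BM25 (e.g., Elasticsearch) and the vector side from FAISS/Qdrant, but the blending logic is the same.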
C. Filtering

• How it works: Applies metadata filters (e.g., date > 2023).


• Example (Qdrant):
python
hits = qdrant_client.search(
    collection_name="docs",  # assumed collection name
    query_vector=embedding,
    query_filter=Filter(must=[FieldCondition(key="year", range=Range(gte=2020))]),
    limit=5,
)

RAG Fusion

• How it works: Combines results from multiple retrievers (e.g., vector + keyword)
and reranks them.
• Methods:
◦ Reciprocal Rank Fusion (RRF): Merges rankings from different searches.
◦ Cross-Encoder Reranking: Uses BERT-like models to rescore.
• Example:
python
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = cross_encoder.predict([(query, chunk) for chunk in chunks])
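RRF itself is only a few lines: each ranked list contributes 1/(k + rank) to a document's fused score, with k = 60 as the customary smoothing constant. A self-contained sketch (the document IDs are made up):

```python
def rrf(rankings, k=60):
    # Reciprocal Rank Fusion: sum 1/(k + rank) across all ranked lists.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_a", "doc_d"]
fused = rrf([vector_hits, keyword_hits])
# doc_a and doc_b lead: each was ranked highly by both retrievers.
```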

Retrieval Ranking

Optimize retrieval quality by:

1. Dense vs. Sparse Retrieval: Combine BM25 (sparse) + DPR (dense).


2. Reranking: Use cross-encoders (e.g., ms-marco-MiniLM-L-6-v2).
3. Feedback Loops: Log user clicks to improve rankings.
Example (Reranking):

python
from sentence_transformers import CrossEncoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, chunk) for chunk in top_chunks]
scores = reranker.predict(pairs)
reranked_chunks = [chunk for _, chunk in sorted(zip(scores, top_chunks), reverse=True)]

Guardrails & Evaluation

A. Guardrails
• What it is: Rules to block harmful/off-topic outputs.
• Tools:
◦ NeMo Guardrails (NVIDIA)
◦ Microsoft Guidance
• Example:
python
# Illustrative sketch; check the Guardrails docs for the exact validator API.
from guardrails import Guard

guard = Guard.from_string(blocklist=["SSN", "credit card"])
guarded_output = guard.validate(model_output)

B. Response Evaluation

• Metrics:
◦ Faithfulness: Is the output grounded in retrieved docs?
◦ Answer Relevance: Does it address the query?
• Tools:
◦ DeepEval: evaluate_faithfulness(generation, contexts)
◦ RAGAS (Retrieval-Augmented Generation Assessment)
Example (DeepEval):

python
# Illustrative sketch; DeepEval's real API wraps inputs in an LLMTestCase.
from deepeval.metrics import FaithfulnessMetric
metric = FaithfulnessMetric()
metric.measure(model_output, contexts=[retrieved_docs])
print(metric.score)  # e.g., 0.8 (faithfulness score)
C. Explanation

• Why it matters: Helps debug why certain chunks were retrieved.


• Methods:
◦ SHAP/LIME: Explain embedding contributions.
◦ Highlighting: Show matching text spans.

Key Takeaways
Component Goal Tools/Models

Chunking Balance context vs. efficiency LangChain, spaCy, NLTK

Embeddings High-quality vector representations Sentence-BERT, OpenAI, CLIP

Retrieval Find relevant docs quickly FAISS, Qdrant, Hybrid Search

RAG Fusion Combine multiple retrieval methods RRF, Cross-Encoders

Ranking Prioritize the best results BM25 + DPR, Rerankers

Guardrails Ensure safe/accurate outputs NeMo, DeepEval

For production RAG systems, combine these techniques and iterate based on evaluation
metrics. Need a Colab notebook for any step?

1. Zero-Shot Prompting

Definition: The model generates an answer without any prior examples, relying solely
on its pre-trained knowledge.
Use Case: Quick classification, general QA.
Example:

python
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier("This movie is fantastic!", candidate_labels=["positive", "negative"])
# Output: {'labels': ['positive', 'negative'], 'scores': [0.99, 0.01]}

2. Few-Shot Prompting

Definition: Provide a few input-output examples in the prompt to guide the model.
Use Case: Custom tasks (e.g., translation, formatting).
Example:

text

Translate English to French:
- "Hello" → "Bonjour"
- "Coffee" → "Café"
- "Car" →
Model Output: "Voiture"
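Assembling such a prompt programmatically is straightforward; a small helper (the names are illustrative) that turns example pairs into the format above:

```python
def few_shot_prompt(task, examples, query):
    # Build a task line, one example per line, and a final line to complete.
    lines = [task]
    lines += [f'- "{src}" → "{tgt}"' for src, tgt in examples]
    lines.append(f'- "{query}" →')  # the model completes this line
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French:",
    [("Hello", "Bonjour"), ("Coffee", "Café")],
    "Car",
)
```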

3. Chain-of-Thought (CoT) Prompting

Definition: Encourage the model to explain its reasoning step-by-step before answering.
Use Case: Math problems, complex reasoning.
Example:

text

Q: If a store has 10 apples and sells 4, how many are left?
A: Let’s think step-by-step.
1. Start with 10 apples.
2. Sell 4 apples.
3. Remaining apples = 10 - 4 = 6.

4. Instruction Prompting

Definition: Explicitly state the task in the prompt (e.g., "Summarize this article:").
Use Case: Controlled generation (summarization, translation).
Example:

python

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Summarize this: Hugging Face is a company...", max_length=30)

5. Template-Based Prompting

Definition: Use structured templates (e.g., for chatbots or data extraction).


Use Case: Consistent input/output formats.
Example:
text

User: {input}
Assistant: {output}
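In code, a template is just a format string that keeps every request in the same shape (a minimal sketch):

```python
TEMPLATE = "User: {input}\nAssistant:"

def build_prompt(user_text):
    # Fill the template; the model's reply continues after "Assistant:".
    return TEMPLATE.format(input=user_text)

prompt = build_prompt("What is Hugging Face?")
```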

6. Self-Consistency Prompting

Definition: Generate multiple reasoning paths and pick the most consistent answer.
Use Case: Improving reliability of CoT.
Example:

• Generate 3 different reasoning paths for a math problem and vote for the most
frequent answer.
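The voting step reduces to a frequency count over the sampled answers; a minimal sketch with mocked CoT samples:

```python
from collections import Counter

def self_consistent_answer(sampled_answers):
    # Majority vote over final answers from independent reasoning paths.
    return Counter(sampled_answers).most_common(1)[0][0]

samples = ["6", "6", "5"]  # e.g., three CoT runs on the apples problem
answer = self_consistent_answer(samples)
```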

7. Least-to-Most Prompting

Definition: Break a complex problem into sub-questions, solve sequentially.


Use Case: Multi-step reasoning.
Example:

text

Q: What is the capital of France?
Sub-Q1: What are the major cities of France?
Sub-Q2: Which of those cities is the seat of government?
A: Paris.

8. Retrieval-Augmented Generation (RAG) Prompting

Definition: Combine retrieval (search) + generation for factual accuracy.


Use Case: QA with external knowledge.
Example:

python

from transformers import RagRetriever, RagTokenForGeneration
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq")
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

9. Multimodal Prompting

Definition: Use prompts combining text + images/audio.


Use Case: Image captioning, visual QA.
Example (CLIP):

python

from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = processor(text=["a cat", "a dog"], images=image, return_tensors="pt", padding=True)

10. Active Prompting

Definition: Dynamically select the best examples for few-shot prompts based on input.
Use Case: Adaptive few-shot learning.
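A toy sketch of the selection step, using naive token overlap as a stand-in for real embedding similarity (the example pool is made up):

```python
def pick_examples(query, pool, k=2):
    # Score each candidate example by shared tokens with the query.
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return sorted(pool, key=lambda ex: overlap(query, ex["input"]), reverse=True)[:k]

pool = [
    {"input": "rate this movie review", "output": "positive"},
    {"input": "translate this sentence", "output": "Bonjour"},
    {"input": "rate this product review", "output": "negative"},
]
chosen = pick_examples("rate this restaurant review", pool)
# Both "rate ... review" examples are chosen over the translation one.
```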

11. Directional Stimulus Prompting

Definition: Add hints or constraints to guide generation (e.g., "Answer in 10 words").


Example:

text

Q: Describe the Eiffel Tower in 10 words.
A: "Iconic Parisian iron tower, 324m tall, built in 1889."

12. Emotion Prompting

Definition: Include emotional cues (e.g., "Explain like I’m 5").


Use Case: Tailoring tone (educational, empathetic).

13. Contrastive Prompting


Definition: Provide contrasting examples to highlight differences.
Example:

text

Good Response: "Python is interpreted."
Bad Response: "Python is compiled."
Q: Is Python compiled or interpreted?

14. Generated Knowledge Prompting

Definition: First generate facts, then use them to answer.


Use Case: Fact-heavy QA.
Example:

text

Step 1: Generate facts about the Eiffel Tower.
Step 2: Use facts to answer "When was it built?"

15. Automatic Prompt Engineering (APE)

Definition: Use LLMs to optimize prompts automatically.


Tools: LLM-driven prompt search, e.g., the APE approach of generating and scoring candidate prompts.

16. Soft Prompting (Prompt Tuning)

Definition: Train continuous embeddings as prompts (no human-readable text).


Use Case: Parameter-efficient fine-tuning.
Example (PEFT Library):

python

from peft import PromptTuningConfig, get_peft_model
config = PromptTuningConfig(task_type="SEQ_CLS", num_virtual_tokens=10)
model = get_peft_model(model, config)
17. Tree-of-Thought (ToT) Prompting

Definition: Explore multiple reasoning paths in a tree structure.


Use Case: Complex decision-making.
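The search skeleton behind ToT, stripped of the LLM, is beam search over partial "thoughts": expand each kept node, score the candidates, keep the best few. A toy sketch where expansion and scoring are mocked:

```python
def tree_of_thought(root, expand, score, depth=2, beam=2):
    frontier = [root]
    for _ in range(depth):
        # Expand every kept thought, then prune to the top-`beam` candidates.
        candidates = [c for node in frontier for c in expand(node)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

# Mock task: build the largest 2-digit number by appending digits.
best = tree_of_thought(
    "",
    expand=lambda s: [s + d for d in "123"],
    score=lambda s: int(s),
)
```

In a real ToT system, `expand` would sample candidate continuations from an LLM and `score` would be an LLM-based evaluator of each partial solution.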

18. Maieutic Prompting

Definition: Ask the model to justify its own answers recursively.


Use Case: Debugging model reasoning.

19. Analogical Prompting

Definition: Solve problems by drawing analogies to known examples.


Example:

text

Q: How is a CPU like a brain?
A: Both process information and control functions.

20. Meta-Prompting

Definition: Prompt the model to generate its own prompts.


Example:

text

Generate a prompt to classify sentiment:
"Label this text as positive, negative, or neutral: {text}"

Summary Table

Technique Key Idea Best For

Zero-Shot No examples Quick tasks


Few-Shot Provide examples Custom tasks

CoT Step-by-step reasoning Math/logic

RAG Retrieve + generate Factual QA

Multimodal Text + images/audio CLIP, GPT-4V

Prompt Tuning Trainable soft prompts Efficient fine-tuning

Tree-of-Thought Explore multiple paths Complex decisions

Self-Consistency Vote among multiple answers Reliable reasoning

Key Takeaways

• For simple tasks: Zero-shot/few-shot.


• For reasoning: CoT, ToT, self-consistency.
• For factual accuracy: RAG.
• For efficiency: Soft prompting (PEFT).
Need a deep dive on any technique? Let me know! 🚀
