Hugging Face Models and Evaluation Metrics

The document outlines various tasks and models available in Hugging Face for classification, text generation, summarization, translation, question answering, image processing, and diffusion models. It also discusses evaluation metrics for machine translation, summarization, and image generation, along with their use cases and examples. Additionally, it introduces concepts like Retrieval-Augmented Generation (RAG) and Multimodal AI, highlighting their functionalities and applications.


——————————————————————————————————————

1. Classification

Task: Assign a label to input data (text, image, etc.).


Hugging Face Models:

• Text Classification: DistilBERT, RoBERTa, BERT (e.g., sentiment analysis, spam detection).
• Image Classification: ViT (Vision Transformer), ResNet.
• Zero-Shot Classification: facebook/bart-large-mnli (classify without fine-tuning).
Example (Text Classification):

python
from transformers import pipeline
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("I love Hugging Face!")
print(result)  # Output: [{'label': 'POSITIVE', 'score': 0.9998}]

2. Text Generation

Task: Generate coherent text (e.g., stories, code, dialogue).


Hugging Face Models:

• Autoregressive Models: GPT-2, GPT-J, GPT-Neo.


• Controlled Generation: CTRL (conditional text).
Example (Text Generation):

python
generator = pipeline("text-generation", model="gpt2")
output = generator("Once upon a time,", max_length=50)
print(output[0]['generated_text'])

3. Summarization

Task: Condense long text into a shorter summary.


Hugging Face Models:
• facebook/bart-large-cnn, google/pegasus-xsum, t5-small.
Example:

python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Long article text...", max_length=130)
print(summary[0]['summary_text'])

4. Translation

Task: Translate text between languages.


Hugging Face Models:

• Helsinki-NLP/opus-mt-en-fr (English → French), t5-base (multilingual).


Example:

python
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Hello, how are you?")
print(result[0]['translation_text']) # Output: "Bonjour, comment allez-vous ?"

5. Question Answering (QA)

Task: Extract answers from a context (e.g., SQuAD dataset).


Hugging Face Models:

• bert-large-uncased-whole-word-masking-finetuned-squad, distilbert-base-cased-distilled-squad.
Example:

python
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")
result = qa_pipeline(question="What is Hugging Face?", context="Hugging Face is a company...")

6. Image Processing

Tasks:

• Image Classification: google/vit-base-patch16-224.


• Object Detection: facebook/detr-resnet-50.
• Image Segmentation: facebook/maskformer-swin-base-ade.
Example (Image Classification):

python
from transformers import ViTImageProcessor, ViTForImageClassification
import torch
from PIL import Image

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

image = Image.open("path/to/image.jpg")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])  # e.g., "tabby cat"

7. Diffusion Models (Image Generation)

Task: Generate images from text prompts.


Hugging Face Models:

• CompVis/stable-diffusion-v1-4, runwayml/stable-diffusion-v1-5.
Example:

python
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a futuristic cityscape at sunset"


image = pipe(prompt).images[0]
image.save("generated_image.png")

Key Hugging Face Libraries

1. Transformers: For NLP tasks (BERT, GPT, etc.).


2. Diffusers: For diffusion models (Stable Diffusion).
3. Datasets: Load datasets (e.g., SQuAD, GLUE).
4. Evaluate: Benchmark models with standard metrics.

——————————————————————————————————————

1. BLEU (Bilingual Evaluation Understudy)

• Purpose: Evaluates the quality of machine-translated text by comparing it to human references.
• How it works:
◦ Measures n-gram overlap (1-gram to 4-gram) between generated and
reference text.
◦ Penalizes overly short outputs with a brevity penalty.
• Range: 0 (no overlap) to 1 (perfect match).
• Limitations:
◦ Ignores semantics (e.g., synonyms).
◦ Poor for creative text generation (e.g., stories).
• Example:
python

import evaluate

bleu = evaluate.load("bleu")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(bleu.compute(predictions=predictions, references=references))
# Output: {'bleu': 0.75, 'precisions': [1.0, 0.8, 0.666, 0.5], ...}

2. ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

• Purpose: Evaluates summarization and text generation by measuring overlap with references.
• Variants:
◦ ROUGE-N: N-gram overlap (e.g., ROUGE-1, ROUGE-2).
◦ ROUGE-L: Longest common subsequence (captures sentence structure).
• Range: 0 to 1 (higher = better).
• Use Case: Summarization, headline generation.
• Example:
python

import evaluate

rouge = evaluate.load("rouge")
predictions = ["the cat is on the mat"]
references = ["the cat sits on the mat"]
print(rouge.compute(predictions=predictions, references=references))
# Output: {'rouge1': 0.83, 'rouge2': 0.66, 'rougeL': 0.83}

3. METEOR (Metric for Evaluation of Translation with Explicit ORdering)

• Purpose: Improves BLEU by incorporating synonyms, stemming, and word order.
• Key Features:
◦ Uses WordNet for synonym matching.
◦ Penalizes fragmentation (disjoint matches).
• Range: 0 to 1 (higher = better).
• Use Case: Machine translation, dialogue systems.
• Example:
python

import evaluate

meteor = evaluate.load("meteor")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(meteor.compute(predictions=predictions, references=references))
# Output: {'meteor': 0.87}

4. BERTScore

• Purpose: Evaluates text using BERT embeddings for semantic similarity.


• How it works:
◦ Compares cosine similarity of BERT embeddings for generated vs
reference text.
◦ Captures contextual meaning (unlike n-gram metrics).
• Range: -1 to 1 (higher = better).
• Use Case: Any generative task where semantics matter.
• Example:
python

import evaluate

bertscore = evaluate.load("bertscore")
predictions = ["the cat is on the mat"]
references = [["the cat sits on the mat"]]
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
# Output: {'precision': 0.98, 'recall': 0.97, 'f1': 0.97}

5. FID (Frechet Inception Distance)

• Purpose: Evaluates image generation quality (e.g., GANs, diffusion models).


• How it works:
◦ Compares statistics of Inception-v3 features for real vs generated
images.
◦ Lower FID = better (0 = identical distributions).
• Use Case: Stable Diffusion, GANs.
• Example:
python
# Requires `clean-fid` and PyTorch
from cleanfid import fid

score = fid.compute_fid("path/to/generated_images", "path/to/real_images")
print(f"FID Score: {score}")  # Lower is better (e.g., < 30 is good)

6. Inception Score (IS)

• Purpose: Measures quality and diversity of generated images.


• How it works:
◦ Uses Inception-v3 to classify images.
◦ High score = images are both recognizable (low entropy per image) and
diverse (high entropy across images).
• Range: Higher = better (e.g., > 30 for good models).
• Limitations: Can be fooled by overfitting.
• Example:
python

from torchmetrics.image.inception import InceptionScore

inception = InceptionScore()
# Assume `generated_images` is a uint8 tensor of shape [N, 3, 299, 299]
inception.update(generated_images)
print(inception.compute())  # (mean, std), e.g., 35.2
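The formula behind the score is short; a minimal NumPy sketch of IS = exp(E[KL(p(y|x) || p(y))]), using made-up class probabilities in place of Inception-v3 outputs:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: [N, num_classes] class probabilities p(y|x) for N images."""
    p_y = probs.mean(axis=0)  # marginal p(y) across all images
    # KL(p(y|x) || p(y)) per image, then average and exponentiate
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Two images, each confidently predicting a different class -> higher score
confident = np.array([[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]])
# Uniform (unrecognizable) predictions -> score near 1
uniform = np.ones((2, 3)) / 3
print(inception_score(confident) > inception_score(uniform))  # True
```

This makes the "recognizable and diverse" intuition concrete: confident per-image predictions give low per-image entropy, while different predictions across images spread the marginal p(y), and both push the score up.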

7. Perplexity

• Purpose: Evaluates language models (e.g., GPT) by measuring prediction uncertainty.
• How it works:
◦ Lower perplexity = model is more confident (better).
◦ Formula: exp(-1/N * Σ log P(word | context)).
• Use Case: GPT-like models, autoregressive generation.
• Example:
python

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
input_text = "The future of AI is"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
perplexity = torch.exp(outputs.loss).item()
print(f"Perplexity: {perplexity}")  # e.g., 25.3
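The formula can be checked by hand; a tiny sketch with hypothetical per-token probabilities:

```python
import math

def perplexity(token_probs):
    """Perplexity from per-token probabilities P(word | context)."""
    n = len(token_probs)
    avg_log_prob = sum(math.log(p) for p in token_probs) / n
    return math.exp(-avg_log_prob)

# A model assigning probability 0.25 to every token has perplexity 4:
# on average it is as uncertain as a uniform choice among 4 words.
print(round(perplexity([0.25, 0.25, 0.25]), 6))  # 4.0
```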

Summary Table

Metric          | Best for             | Range           | Key Insight
BLEU            | Translation          | 0–1             | N-gram overlap, ignores semantics.
ROUGE           | Summarization        | 0–1             | Recall-focused, flexible n-grams.
METEOR          | Translation/Dialogue | 0–1             | Adds synonyms/stemming.
BERTScore       | Any text generation  | -1–1            | Uses BERT embeddings.
FID             | Image generation     | Lower = better  | Compares feature distributions.
Inception Score | Image generation     | Higher = better | Quality + diversity.
Perplexity      | Language models      | Lower = better  | Measures model confidence.

When to Use Which Metric?

• Text Generation: BERTScore (semantics) + ROUGE (structure).


• Translation: BLEU + METEOR.
• Image Generation: FID + Inception Score.
• Language Models: Perplexity + human eval.

——————————————————————————————————————
1. RAG (Retrieval-Augmented Generation)

What it is:

• A hybrid AI model that combines retrieval-based search with generative AI to produce more accurate and context-aware answers.
• Retrieval Step: Searches a knowledge base (e.g., documents, databases) for
relevant information.
• Generation Step: Uses a language model (e.g., GPT, T5) to synthesize the
retrieved data into a coherent response.
How it Works:

1. Query: User asks a question (e.g., "What causes climate change?").


2. Retrieval: The system fetches relevant documents/chunks from a database
(e.g., Wikipedia, company docs).
3. Generation: The LLM generates an answer grounded in the retrieved content,
reducing hallucinations.
Key Features:

• Dynamic Knowledge: Unlike static LLMs, RAG can access up-to-date or domain-specific data.
• Transparency: Sources can be cited (useful for research/chats).
• Efficiency: Avoids retraining the LLM for new information.
Applications:

• QA systems (e.g., customer support chatbots).


• Medical/legal research (pulling from verified sources).
• Enterprise knowledge management.
Example (Hugging Face):

python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact")
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

input_ids = tokenizer("What is the capital of France?", return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])  # Output: "Paris"
2. Multimodal AI

What it is:

• Models that process and generate multiple data types (text, images, audio,
video) simultaneously.
• Unlike unimodal AI (e.g., GPT for text, Stable Diffusion for images), multimodal
systems understand cross-modal relationships (e.g., captioning images, generating
videos from text).
Key Architectures:

1. CLIP (Contrastive Language-Image Pretraining):


◦ Learns joint embeddings for images and text.
◦ Powers tools like DALL-E’s text-to-image generation.
2. Flamingo (DeepMind):
◦ Processes interleaved text/images (e.g., answering questions about a
photo).
3. GPT-4V (Vision):
◦ Accepts image + text inputs for tasks like document analysis.
Applications:

• Text-to-Image: DALL-E, MidJourney.


• Visual QA: Answering questions about images ("What’s in this photo?").
• Autonomous Vehicles: Combining LiDAR, cameras, and maps.
• Healthcare: Analyzing X-rays + patient records.
Example (Multimodal QA with CLIP):

python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("path/to/image.jpg")
text = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=text, images=image, return_tensors="pt", padding=True)


outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # Match text to image
probs = logits_per_image.softmax(dim=1)
print(probs) # e.g., [[0.9, 0.1]] → 90% "cat"

Key Differences: RAG vs. Multimodal AI

Feature        | RAG                                   | Multimodal AI
Data Types     | Text-only (retrieval + generation)    | Text, images, audio, video.
Core Goal      | Enhance LLMs with external knowledge. | Fuse/translate between modalities.
Example Models | Facebook’s RAG, Google’s REALM.       | CLIP, Flamingo, GPT-4V.
Use Cases      | Research, fact-checking.              | Image generation, video captioning.

Why They Matter

• RAG: Solves LLMs’ knowledge cutoff and hallucination problems.


• Multimodal AI: Enables richer human-AI interaction (e.g., ChatGPT analyzing
memes).
Combined Potential:

Future systems might use multimodal RAG—e.g., retrieving images and text to answer
complex queries like:

"Show me a 3D model of the Eiffel Tower and explain its history."



——————————————————————————————————————

How to customize a pre-trained model with your own data in Hugging Face

Customizing a pre-trained model with your own data in Hugging Face involves fine-tuning the model on your dataset. Here's a step-by-step guide covering text and image tasks, using the transformers and datasets libraries.

Steps to Fine-Tune a Pre-Trained Model

1. Install Required Libraries

bash
pip install transformers datasets accelerate evaluate torch

2. Load Your Dataset

Use Hugging Face’s datasets library or a custom dataset.

Option A: Load from Hugging Face Hub

python
from datasets import load_dataset

dataset = load_dataset("imdb") # Example: IMDb reviews


# Split into train/validation
train_data = dataset["train"]
val_data = dataset["test"]
Option B: Custom Dataset (CSV/JSON)

python
from datasets import Dataset

# For CSV/JSON files


dataset = Dataset.from_csv("your_data.csv")
# Or from a Python dict
dataset = Dataset.from_dict({"text": ["I love this!", "This is bad..."], "label": [1, 0]})

3. Load a Pre-Trained Model & Tokenizer

Choose a model for your task (e.g., BERT for text classification, ViT for images).

python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)  # 2 classes

4. Tokenize the Data

Convert text to input IDs and attention masks.

python
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_train = train_data.map(tokenize_function, batched=True)


tokenized_val = val_data.map(tokenize_function, batched=True)

5. Fine-Tune the Model

Use the Trainer API for training.

Step A: Define Training Arguments

python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",          # Output directory
    per_device_train_batch_size=8,   # Batch size
    num_train_epochs=3,              # Number of epochs
    evaluation_strategy="epoch",     # Evaluate after each epoch
    save_strategy="epoch",
    logging_dir="./logs",
)
Step B: Define Metrics (Optional)

python
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
Step C: Train the Model

python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    compute_metrics=compute_metrics,
)

trainer.train()

6. Save & Share the Model

Save Locally

python
model.save_pretrained("./custom_model")
tokenizer.save_pretrained("./custom_model")
Push to Hugging Face Hub (Optional)

python
from huggingface_hub import notebook_login

notebook_login() # Log in to Hugging Face


model.push_to_hub("your-username/bert-custom-imdb")
tokenizer.push_to_hub("your-username/bert-custom-imdb")

Example: Fine-Tuning for Different Tasks

1. Text Classification (e.g., Sentiment Analysis)

• Model: distilbert-base-uncased
• Dataset: Custom CSV with text and label columns.
2. Image Classification (e.g., Cats vs. Dogs)

python
from transformers import ViTFeatureExtractor, ViTForImageClassification
import torch

# Load dataset (e.g., CIFAR-10)


dataset = load_dataset("cifar10")

# Load model
model_name = "google/vit-base-patch16-224"
feature_extractor = ViTFeatureExtractor.from_pretrained(model_name)
model = ViTForImageClassification.from_pretrained(model_name, num_labels=10)

# Preprocess images
def transform(example_batch):
    inputs = feature_extractor(example_batch["img"], return_tensors="pt")
    inputs["label"] = example_batch["label"]
    return inputs

dataset = dataset.with_transform(transform)

# Fine-tune using Trainer (similar to text example)


3. Custom Question Answering (SQuAD Format)

python
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

Key Considerations

1. Dataset Size: Small datasets may need fewer epochs to avoid overfitting.
2. GPU/TPU: Use accelerate for distributed training.
3. Hyperparameters: Tweak learning_rate, batch_size, etc.
4. Domain Adaptation: For domain-specific tasks (e.g., medical text), try domain-
adapted models like BioBERT.

Conclusion

By fine-tuning a pre-trained Hugging Face model:

1. Leverage SOTA architectures (BERT, GPT, ViT).


2. Customize for your task without training from scratch.
3. Deploy easily via pipelines or APIs.
For advanced use cases (e.g., LoRA for efficient fine-tuning), check out Hugging Face’s PEFT library.

——————————————————————————————————————

Prompting techniques are essential for guiding pre-trained models, especially large language models (LLMs), to generate desired outputs without full fine-tuning. Hugging Face's transformers library supports these techniques natively. Below is a breakdown of key prompting methods and how to implement them using Hugging Face.

1. Zero-Shot Prompting

Definition: The model generates an answer without any examples, relying solely on its pre-trained knowledge.
Use Case: Quick inference tasks (classification, QA).
Hugging Face Implementation:

python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


result = classifier(
    "Hugging Face is a company based in New York.",
    candidate_labels=["technology", "politics", "business"],
)
print(result["labels"][0])  # Output: "technology"
Key Insight: Uses natural language prompts (e.g., "Is this text about technology?").

2. Few-Shot Prompting

Definition: Provide a few examples in the prompt to guide the model.


Use Case: When task specificity is needed (e.g., custom formatting).
Hugging Face Example:

python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")


prompt = """
Translate English to French:
- Hello → Bonjour
- Coffee → Café
- Hugging Face →
"""
output = generator(prompt, max_length=50)
print(output[0]["generated_text"]) # Output: "Hugging Face → Hugging Face"
Pro Tip: Use max_new_tokens to control output length.

3. Chain-of-Thought (CoT) Prompting

Definition: Encourage the model to explain its reasoning step-by-step.


Use Case: Complex reasoning tasks (math, logic).
Implementation:

python
prompt = """
Q: If a store has 10 apples and sells 4, how many are left?
A: Let’s think step-by-step.
1. Start with 10 apples.
2. Sell 4 apples.
3. Remaining apples = 10 - 4 = 6.
Q: A bakery has 20 croissants and bakes 12 more. How many are there now?
A:"""
output = generator(prompt, temperature=0.7) # Temperature for creativity control
Note: Works best with models like GPT-3.5-turbo or Flan-T5.

4. Instruction Prompting

Definition: Explicitly state the task in the prompt (e.g., "Summarize this text:").
Use Case: Controlled generation (summarization, translation).
Hugging Face Example:

python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = "Hugging Face is a company..."
summary = summarizer(f"Summarize this text briefly: {text}", max_length=30)
print(summary[0]["summary_text"])

5. Template-Based Prompting

Definition: Use structured templates for consistency (e.g., for chatbots).


Use Case: Dialogue systems, data extraction.
Example:

python
template = """
User: {input}
Assistant:"""
input_text = "Explain quantum computing."
output = generator(template.format(input=input_text), max_length=100)

6. Retrieval-Augmented Generation (RAG)


Definition: Combine retrieval (search) with generation for factual accuracy.
Use Case: QA with external knowledge.
Hugging Face Tools:

python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact")
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])  # Output: "Paris"

7. Prefix/Prompt Tuning

Definition: Soft-prompts (trainable continuous embeddings) prepended to inputs.


Use Case: Efficient adaptation without full fine-tuning.
Hugging Face PEFT Library:

python
from peft import PromptTuningConfig, get_peft_model

config = PromptTuningConfig(task_type="SEQ_CLS", num_virtual_tokens=10)


model = get_peft_model(model, config) # Wrap the base model

8. Multimodal Prompting

Definition: Use prompts combining text + images (e.g., CLIP, GPT-4V).


Use Case: Image captioning, visual QA.
Example with CLIP:

python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
image = Image.open("path/to/image.jpg")
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

Prompting Best Practices

1. Be Explicit: Clearly define the task (e.g., "Translate to German:").


2. Use Examples: Few-shot boosts performance for niche tasks.
3. Control Length: Set max_new_tokens to avoid rambling.
4. Temperature: Lower (0.2) for deterministic outputs, higher (0.7) for creativity.
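The temperature setting in point 4 simply divides the logits before the softmax, so lower values sharpen the output distribution and higher values flatten it. A small NumPy sketch with hypothetical next-token logits shows the effect:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert logits to sampling probabilities at a given temperature."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
low = softmax_with_temperature(logits, 0.2)   # near-deterministic
high = softmax_with_temperature(logits, 0.7)  # more creative / spread out
print(low.max() > high.max())  # True: low temperature concentrates probability
```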

Comparison of Prompting Techniques

Technique     | Use Case                   | Hugging Face Tools
Zero-Shot     | Quick classification       | pipeline("zero-shot-classification")
Few-Shot      | Custom tasks               | Manual prompt design + text-generation
CoT           | Reasoning tasks            | GPT-like models + step-by-step prompts
RAG           | Factual QA                 | RagToken, RagSequence
Prompt Tuning | Parameter-efficient tuning | peft.PromptTuningConfig

Key Takeaways

• Zero/Few-Shot: Fast, no training needed.


• CoT/RAG: For complex or knowledge-intensive tasks.
• Prompt Tuning: Balance between prompting and fine-tuning.

——————————————————————————————————————

Storing Embeddings in a Vector Database (FAISS/Qdrant)

Vector databases are optimized for storing and retrieving high-dimensional embeddings (e.g., from LLMs like BERT or image models like CLIP). They enable fast similarity searches, making them essential for:

• Retrieval-Augmented Generation (RAG)


• Semantic search
• Recommendation systems
• Deduplication
Two popular options are FAISS (Facebook AI) and Qdrant (open-source). Here’s how they
work and how to use them with Hugging Face.

1. What Are Vector Databases?

• Store embeddings (numerical vectors) instead of raw text/images.


• Enable approximate nearest neighbor (ANN) search for scalability.
• Examples: FAISS, Qdrant, Pinecone, Weaviate, Milvus.

2. FAISS (Facebook AI Similarity Search)

A library optimized for fast similarity search with minimal memory usage.

Key Features

• CPU/GPU support
• No native persistence (save/load indexes manually)
• Best for static datasets
Steps to Use FAISS with Hugging Face

A. Generate Embeddings

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # Lightweight model
embeddings = model.encode(["Hugging Face is awesome", "I love NLP"])
B. Build a FAISS Index

python
import faiss
import numpy as np

# Convert to float32 (FAISS requirement)
embeddings = np.array(embeddings).astype("float32")

# Create index (L2 distance)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)  # Add embeddings to index

# Save index
faiss.write_index(index, "my_faiss_index.index")
C. Query the Index

python
# Load index
index = faiss.read_index("my_faiss_index.index")

# Query a new embedding
query = model.encode(["What is Hugging Face?"]).astype("float32")
k = 2  # Top-2 results
distances, indices = index.search(query, k)
print(indices)  # e.g., [[0, 1]] — index 0 ("Hugging Face is awesome") is closest

3. Qdrant (Vector Database with Persistence)

A production-ready vector database with:

• Persistence (stores data on disk)


• Filtering (metadata support)
• Scalability (distributed mode)
Steps to Use Qdrant with Hugging Face

A. Run Qdrant

bash
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
B. Insert Embeddings

python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient("localhost")  # Connect to server

# Create a collection (like a database table)
client.create_collection(
    collection_name="my_collection",
    vectors_config={"size": 384, "distance": "Cosine"}  # Match model dim
)

# Add embeddings
client.upsert(
    collection_name="my_collection",
    points=[
        {"id": 1, "vector": model.encode("Hugging Face is awesome").tolist()},
        {"id": 2, "vector": model.encode("I love NLP").tolist()},
    ]
)
C. Query with Filters

python
query = model.encode("What is Hugging Face?")
hits = client.search(
    collection_name="my_collection",
    query_vector=query.tolist(),
    limit=2  # Top-2 results
)
print(hits[0].payload)  # Returns closest match

4. FAISS vs. Qdrant

Feature            | FAISS                        | Qdrant
Persistence        | Manual (save/load files)     | Built-in
Scalability        | Single-node                  | Distributed (enterprise)
Metadata Filtering | No                           | Yes
Best For           | Static datasets, prototyping | Production, dynamic data

5. Real-World Applications

1. RAG Systems: Store document embeddings for retrieval.


2. Semantic Search: Find similar text/images (e.g., "Find memes about AI").
3. Recommendations: "Users who liked this also liked…"
Example: RAG with FAISS

python
from transformers import RagTokenizer, RagRetriever

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq",
    index_name="custom",
    passages_path="path/to/passages",   # dataset of text chunks
    index_path="my_faiss_index.index",  # load your FAISS index
)

Key Takeaways

• FAISS: Lightweight, good for prototyping.


• Qdrant: Production-ready with metadata support.
• Hugging Face Integration: Generate embeddings with sentence-transformers, then store/query them efficiently.

——————————————————————————————————————

Explain following - Chunking Strategies: Overlapping, Augmented, fixed-size, content-aware, KG

- Embedding Strategies: Comparing various embedding models

- Retrieval Strategies: Similarity search, hybrid search, filtering

- RAG Fusion: Combining search results for improved accuracy

- Retrieval Ranking: Tuning for best results

- GuardRails, Response Evaluation & Explanation (DeepEval-like tools)


Here’s a detailed breakdown of the key components in advanced Retrieval-Augmented
Generation (RAG) systems, including chunking, embeddings, retrieval strategies, and
evaluation:

1. Chunking Strategies

Chunking splits documents into smaller pieces for efficient processing. Different strategies
include:

A. Fixed-Size Chunking

• How it works: Splits text into equal-sized chunks (e.g., 256 tokens).
• Pros: Simple, works well for uniform content.
• Cons: May split semantically related text.
• Example:
python

from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=256, chunk_overlap=20)
chunks = splitter.split_text(document)

B. Overlapping Chunking

• How it works: Adds overlap between chunks (e.g., 10% of chunk size) to preserve
context.
• Pros: Reduces information loss at boundaries.
• Cons: Increases storage/compute costs.
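A minimal plain-Python sketch of the idea (word-based chunks with hypothetical sizes), showing how each chunk repeats the tail of the previous one:

```python
def overlapping_chunks(text, chunk_size=10, overlap=2):
    """Split text into word chunks; each chunk repeats the last `overlap` words of the previous one."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

doc = " ".join(f"w{i}" for i in range(20))  # w0 ... w19
for chunk in overlapping_chunks(doc, chunk_size=10, overlap=2):
    print(chunk)
# Adjacent chunks share two words (w8 w9, then w16 w17), so a sentence
# straddling a boundary is still fully contained in at least one chunk.
```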
C. Content-Aware Chunking

• How it works: Uses NLP (e.g., sentence boundaries, headings) to split logically.
• Methods:
◦ Sentence Splitting: NLTK/spaCy for sentence detection.
◦ HTML/PDF Parsing: Extract sections (e.g., BeautifulSoup for HTML).
• Example:
python
from langchain.text_splitter import MarkdownHeaderTextSplitter

headers = [("#", "Header1"), ("##", "Header2")]
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers)
chunks = splitter.split_text(markdown_doc)

D. Knowledge Graph (KG)-Aware Chunking

• How it works: Chunks based on entity/relationship boundaries (e.g., nodes in a KG).
• Use Case: Structured data like medical records or product catalogs.
E. Augmented Chunking

• How it works: Enhances chunks with metadata (e.g., summaries, embeddings).


• Example:
python

chunk = {"text": "Hugging Face is a company...", "summary": "About HF", "embedding": [0.1, 0.2, ...]}

2. Embedding Strategies

Embeddings convert text to vectors. Key models and tradeoffs:

Model             | Dimensions | Speed  | Semantic Quality | Best For
BERT              | 768        | Slow   | High             | Short-text search
Sentence-BERT     | 384–1024   | Medium | High             | General-purpose
all-MiniLM-L6     | 384        | Fast   | Good             | Low-latency systems
OpenAI Embeddings | 1536       | Slow   | Very High        | Paid APIs
CLIP              | 512        | Medium | Multimodal       | Text-to-image search

Example (Sentence-BERT):
python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hugging Face is awesome"])

3. Retrieval Strategies

A. Similarity Search

• How it works: Finds nearest vectors using cosine/L2 distance.


• Tools: FAISS, Qdrant, Pinecone.
• Example:
python

distances, indices = faiss_index.search(query_embedding, k=5)

B. Hybrid Search

• How it works: Combines keyword search (BM25) + vector search.


• Use Case: When exact keyword matches matter (e.g., names).
• Tools: Elasticsearch + FAISS, Weaviate.
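One simple way to combine the two signals is a weighted sum of per-document scores; a toy sketch assuming both score sets are already normalized to [0, 1] (the document IDs and scores are made up):

```python
def hybrid_score(keyword_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores per doc id; alpha weights the vector side."""
    docs = set(keyword_scores) | set(vector_scores)
    return {
        d: (1 - alpha) * keyword_scores.get(d, 0.0) + alpha * vector_scores.get(d, 0.0)
        for d in docs
    }

keyword_scores = {"doc1": 0.9, "doc2": 0.1}  # exact name match favors doc1
vector_scores = {"doc1": 0.4, "doc2": 0.8}   # semantics favor doc2
ranked = sorted(hybrid_score(keyword_scores, vector_scores).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # doc1: 0.5*0.9 + 0.5*0.4 = 0.65 beats doc2's 0.45
```

In production the keyword side would come from BM25 (e.g., Elasticsearch) and the vector side from FAISS/Qdrant, but the blending logic is the same.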
C. Filtering

• How it works: Applies metadata filters (e.g., date > 2023).


• Example (Qdrant):
python
hits = qdrant_client.search(
    collection_name="docs",  # assumed collection name
    query_vector=embedding,
    query_filter=Filter(must=[FieldCondition(key="year", range=Range(gte=2020))]),
    limit=5,
)

RAG Fusion

• How it works: Combines results from multiple retrievers (e.g., vector + keyword)
and reranks them.
• Methods:
◦ Reciprocal Rank Fusion (RRF): Merges rankings from different searches.
◦ Cross-Encoder Reranking: Uses BERT-like models to rescore.
• Example:
python
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = cross_encoder.predict([(query, chunk) for chunk in chunks])
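RRF itself is only a few lines: each ranked list contributes 1/(k + rank) to a document's fused score, with k = 60 as the customary smoothing constant. A self-contained sketch (the document IDs are made up):

```python
def rrf(rankings, k=60):
    # Reciprocal Rank Fusion: sum 1/(k + rank) across all ranked lists.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_a", "doc_d"]
fused = rrf([vector_hits, keyword_hits])
# doc_a and doc_b lead: each was ranked highly by both retrievers.
```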

Retrieval Ranking

Optimize retrieval quality by:

1. Dense vs. Sparse Retrieval: Combine BM25 (sparse) + DPR (dense).


2. Reranking: Use cross-encoders (e.g., ms-marco-MiniLM-L-6-v2).
3. Feedback Loops: Log user clicks to improve rankings.
Example (Reranking):

python
from sentence_transformers import CrossEncoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, chunk) for chunk in top_chunks]
scores = reranker.predict(pairs)
reranked_chunks = [chunk for _, chunk in sorted(zip(scores, top_chunks), reverse=True)]

Guardrails & Evaluation

A. Guardrails
• What it is: Rules to block harmful/off-topic outputs.
• Tools:
◦ NeMo Guardrails (NVIDIA)
◦ Microsoft Guidance
• Example:
python
# Illustrative sketch; check the Guardrails docs for the exact validator API.
from guardrails import Guard

guard = Guard.from_string(blocklist=["SSN", "credit card"])
guarded_output = guard.validate(model_output)

B. Response Evaluation

• Metrics:
◦ Faithfulness: Is the output grounded in retrieved docs?
◦ Answer Relevance: Does it address the query?
• Tools:
◦ DeepEval: evaluate_faithfulness(generation, contexts)
◦ RAGAS (Retrieval-Augmented Generation Assessment)
Example (DeepEval):

python
# Illustrative sketch; DeepEval's real API wraps inputs in an LLMTestCase.
from deepeval.metrics import FaithfulnessMetric
metric = FaithfulnessMetric()
metric.measure(model_output, contexts=[retrieved_docs])
print(metric.score)  # e.g., 0.8 (faithfulness score)
C. Explanation

• Why it matters: Helps debug why certain chunks were retrieved.


• Methods:
◦ SHAP/LIME: Explain embedding contributions.
◦ Highlighting: Show matching text spans.

Key Takeaways
Component Goal Tools/Models

Chunking Balance context vs. efficiency LangChain, spaCy, NLTK

Embeddings High-quality vector representations Sentence-BERT, OpenAI, CLIP

Retrieval Find relevant docs quickly FAISS, Qdrant, Hybrid Search

RAG Fusion Combine multiple retrieval methods RRF, Cross-Encoders

Ranking Prioritize the best results BM25 + DPR, Rerankers

Guardrails Ensure safe/accurate outputs NeMo, DeepEval

For production RAG systems, combine these techniques and iterate based on evaluation
metrics. Need a Colab notebook for any step?

1. Zero-Shot Prompting

Definition: The model generates an answer without any prior examples, relying solely
on its pre-trained knowledge.
Use Case: Quick classification, general QA.
Example:

python
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier("This movie is fantastic!", candidate_labels=["positive", "negative"])
# Output: {'labels': ['positive', 'negative'], 'scores': [0.99, 0.01]}

2. Few-Shot Prompting

Definition: Provide a few input-output examples in the prompt to guide the model.
Use Case: Custom tasks (e.g., translation, formatting).
Example:

text

Translate English to French:
- "Hello" → "Bonjour"
- "Coffee" → "Café"
- "Car" →
Model Output: "Voiture"
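Assembling such a prompt programmatically is straightforward; a small helper (the names are illustrative) that turns example pairs into the format above:

```python
def few_shot_prompt(task, examples, query):
    # Build a task line, one example per line, and a final line to complete.
    lines = [task]
    lines += [f'- "{src}" → "{tgt}"' for src, tgt in examples]
    lines.append(f'- "{query}" →')  # the model completes this line
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French:",
    [("Hello", "Bonjour"), ("Coffee", "Café")],
    "Car",
)
```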

3. Chain-of-Thought (CoT) Prompting

Definition: Encourage the model to explain its reasoning step-by-step before answering.
Use Case: Math problems, complex reasoning.
Example:

text

Q: If a store has 10 apples and sells 4, how many are left?
A: Let’s think step-by-step.
1. Start with 10 apples.
2. Sell 4 apples.
3. Remaining apples = 10 - 4 = 6.

4. Instruction Prompting

Definition: Explicitly state the task in the prompt (e.g., "Summarize this article:").
Use Case: Controlled generation (summarization, translation).
Example:

python

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Summarize this: Hugging Face is a company...", max_length=30)

5. Template-Based Prompting

Definition: Use structured templates (e.g., for chatbots or data extraction).


Use Case: Consistent input/output formats.
Example:
text

User: {input}
Assistant: {output}
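In code, a template is just a format string that keeps every request in the same shape (a minimal sketch):

```python
TEMPLATE = "User: {input}\nAssistant:"

def build_prompt(user_text):
    # Fill the template; the model's reply continues after "Assistant:".
    return TEMPLATE.format(input=user_text)

prompt = build_prompt("What is Hugging Face?")
```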

6. Self-Consistency Prompting

Definition: Generate multiple reasoning paths and pick the most consistent answer.
Use Case: Improving reliability of CoT.
Example:

• Generate 3 different reasoning paths for a math problem and vote for the most
frequent answer.
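The voting step reduces to a frequency count over the sampled answers; a minimal sketch with mocked CoT samples:

```python
from collections import Counter

def self_consistent_answer(sampled_answers):
    # Majority vote over final answers from independent reasoning paths.
    return Counter(sampled_answers).most_common(1)[0][0]

samples = ["6", "6", "5"]  # e.g., three CoT runs on the apples problem
answer = self_consistent_answer(samples)
```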

7. Least-to-Most Prompting

Definition: Break a complex problem into sub-questions, solve sequentially.


Use Case: Multi-step reasoning.
Example:

text

Q: What is the capital of France?
Sub-Q1: What are the major cities of France?
Sub-Q2: Which of those cities is the seat of government?
A: Paris.

8. Retrieval-Augmented Generation (RAG) Prompting

Definition: Combine retrieval (search) + generation for factual accuracy.


Use Case: QA with external knowledge.
Example:

python

from transformers import RagRetriever, RagTokenForGeneration
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq")
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

9. Multimodal Prompting

Definition: Use prompts combining text + images/audio.


Use Case: Image captioning, visual QA.
Example (CLIP):

python

from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = processor(text=["a cat", "a dog"], images=image, return_tensors="pt", padding=True)

10. Active Prompting

Definition: Dynamically select the best examples for few-shot prompts based on input.
Use Case: Adaptive few-shot learning.
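A toy sketch of the selection step, using naive token overlap as a stand-in for real embedding similarity (the example pool is made up):

```python
def pick_examples(query, pool, k=2):
    # Score each candidate example by shared tokens with the query.
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return sorted(pool, key=lambda ex: overlap(query, ex["input"]), reverse=True)[:k]

pool = [
    {"input": "rate this movie review", "output": "positive"},
    {"input": "translate this sentence", "output": "Bonjour"},
    {"input": "rate this product review", "output": "negative"},
]
chosen = pick_examples("rate this restaurant review", pool)
# Both "rate ... review" examples are chosen over the translation one.
```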

11. Directional Stimulus Prompting

Definition: Add hints or constraints to guide generation (e.g., "Answer in 10 words").


Example:

text

Q: Describe the Eiffel Tower in 10 words.
A: "Iconic Parisian iron tower, 324m tall, built in 1889."

12. Emotion Prompting

Definition: Include emotional cues (e.g., "Explain like I’m 5").


Use Case: Tailoring tone (educational, empathetic).

13. Contrastive Prompting


Definition: Provide contrasting examples to highlight differences.
Example:

text

Good Response: "Python is interpreted."
Bad Response: "Python is compiled."
Q: Is Python compiled or interpreted?

14. Generated Knowledge Prompting

Definition: First generate facts, then use them to answer.


Use Case: Fact-heavy QA.
Example:

text

Step 1: Generate facts about the Eiffel Tower.
Step 2: Use facts to answer "When was it built?"

15. Automatic Prompt Engineering (APE)

Definition: Use LLMs to optimize prompts automatically.


Tools: LLM-driven prompt search, e.g., the APE approach of generating and scoring candidate prompts.

16. Soft Prompting (Prompt Tuning)

Definition: Train continuous embeddings as prompts (no human-readable text).


Use Case: Parameter-efficient fine-tuning.
Example (PEFT Library):

python

from peft import PromptTuningConfig, get_peft_model
config = PromptTuningConfig(task_type="SEQ_CLS", num_virtual_tokens=10)
model = get_peft_model(model, config)
17. Tree-of-Thought (ToT) Prompting

Definition: Explore multiple reasoning paths in a tree structure.


Use Case: Complex decision-making.
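The search skeleton behind ToT, stripped of the LLM, is beam search over partial "thoughts": expand each kept node, score the candidates, keep the best few. A toy sketch where expansion and scoring are mocked:

```python
def tree_of_thought(root, expand, score, depth=2, beam=2):
    frontier = [root]
    for _ in range(depth):
        # Expand every kept thought, then prune to the top-`beam` candidates.
        candidates = [c for node in frontier for c in expand(node)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

# Mock task: build the largest 2-digit number by appending digits.
best = tree_of_thought(
    "",
    expand=lambda s: [s + d for d in "123"],
    score=lambda s: int(s),
)
```

In a real ToT system, `expand` would sample candidate continuations from an LLM and `score` would be an LLM-based evaluator of each partial solution.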

18. Maieutic Prompting

Definition: Ask the model to justify its own answers recursively.


Use Case: Debugging model reasoning.

19. Analogical Prompting

Definition: Solve problems by drawing analogies to known examples.


Example:

text

Q: How is a CPU like a brain?
A: Both process information and control functions.

20. Meta-Prompting

Definition: Prompt the model to generate its own prompts.


Example:

text

Generate a prompt to classify sentiment:
"Label this text as positive, negative, or neutral: {text}"

Summary Table

Technique Key Idea Best For

Zero-Shot No examples Quick tasks


Few-Shot Provide examples Custom tasks

CoT Step-by-step reasoning Math/logic

RAG Retrieve + generate Factual QA

Multimodal Text + images/audio CLIP, GPT-4V

Prompt Tuning Trainable soft prompts Efficient fine-tuning

Tree-of-Thought Explore multiple paths Complex decisions

Self-Consistency Vote among multiple answers Reliable reasoning

Key Takeaways

• For simple tasks: Zero-shot/few-shot.


• For reasoning: CoT, ToT, self-consistency.
• For factual accuracy: RAG.
• For efficiency: Soft prompting (PEFT).
Need a deep dive on any technique? Let me know! 🚀
