1. Bias Mitigation in Pre-Trained Language Models Using Real-World Feedback Loops
Problem Definition:
Pre-trained models like GPT or BERT inherit societal biases from their training
data. Most debiasing methods are static: they are applied once, before
deployment, and cannot adapt as language use and user populations change in
real-world applications.
Research Focus:
Create a framework that continuously adapts to user feedback for bias
mitigation while preserving performance.
Dataset Suggestions:
HateXplain
Jigsaw Toxic Comment
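One minimal way to picture the feedback loop in the Research Focus above is post-processing: track each demographic group's false positive rate from user-confirmed labels over a sliding window, and nudge a group's decision threshold upward when its rate exceeds the best group's by more than a tolerance. This is a sketch under assumptions; the class name, window size, tolerance, and step are hypothetical, and threshold adjustment is only one possible strategy (a full framework might instead fine-tune the model on the feedback).

```python
from collections import deque


class FeedbackThresholdAdjuster:
    """Hypothetical post-processing debiaser driven by user feedback.

    Keeps a sliding window of (score, true_label) feedback per group and
    raises the threshold of any group whose false positive rate (FPR)
    exceeds the lowest group FPR by more than `tolerance`.
    """

    def __init__(self, groups, threshold=0.5, window=500, tolerance=0.05, step=0.01):
        self.thresholds = {g: threshold for g in groups}
        self.feedback = {g: deque(maxlen=window) for g in groups}
        self.tolerance = tolerance
        self.step = step

    def predict(self, group, score):
        # Flag content (e.g., as toxic) when the score clears the group's threshold.
        return score >= self.thresholds[group]

    def record(self, group, score, true_label):
        # Store user-confirmed ground truth, then re-check the fairness gap.
        self.feedback[group].append((score, true_label))
        self._adjust()

    def fpr(self, group):
        # FPR = flagged negatives / all negatives, under the current threshold.
        negatives = [s for s, y in self.feedback[group] if y == 0]
        if not negatives:
            return None
        flagged = sum(1 for s in negatives if s >= self.thresholds[group])
        return flagged / len(negatives)

    def _adjust(self):
        rates = {g: self.fpr(g) for g in self.thresholds}
        known = {g: r for g, r in rates.items() if r is not None}
        if len(known) < 2:
            return  # need at least two groups with feedback to compare
        best = min(known.values())
        for g, r in known.items():
            if r - best > self.tolerance:
                # Raise this group's threshold a little, capped at 1.0.
                self.thresholds[g] = min(1.0, self.thresholds[g] + self.step)
```

Raising a threshold lowers that group's false positive rate at the cost of recall, so "preserving performance" would in practice mean tuning the step and tolerance against a held-out accuracy target.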
2. Self-Supervised Learning for Health Diagnostics Using Wearable Sensor Data
Problem Definition:
Labeled health data is scarce and expensive. Most models rely on
supervised learning, which doesn't scale to large, unlabeled sensor
datasets from wearables.
Research Focus:
Design a self-supervised learning pipeline for time-series health data to
enable early anomaly detection in cardiac or sleep-related disorders.
Dataset Suggestions:
MIMIC-III Waveform Database
UCI HAR Dataset
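Contrastive objectives are one common choice for the self-supervised pipeline described above: two augmented views of the same sensor window should embed close together, while views of other windows act as negatives. The sketch below shows a jitter augmentation and a one-directional InfoNCE loss computed from precomputed embeddings; the encoder is omitted, and the function names, noise level, and temperature are illustrative assumptions.

```python
import math
import random


def jitter(signal, sigma=0.03, rng=random):
    # Augmentation: add small Gaussian noise to each sample of a sensor window.
    return [x + rng.gauss(0.0, sigma) for x in signal]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i] rather than
    any positives[j] with j != i (the other windows act as negatives)."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)
```

In a full pipeline, `anchors` and `positives` would be encoder outputs for two augmentations of the same batch of windows, and the pretrained encoder would then feed a small anomaly detector trained on the few labels available.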
3. Cross-Cultural Emotion Recognition Using Multi-Modal Deep Learning
Problem Definition:
Emotion recognition models often generalize poorly across cultures due to
differences in facial expressions and vocal tone.
Research Focus:
Develop a cross-cultural, multi-modal deep learning framework (video +
audio + text) for robust and inclusive emotion recognition.
Dataset Suggestions:
RAVDESS (Audio-Visual)
CREMA-D
AFEW (Acted Facial Expressions in the Wild)
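A simple baseline for combining video, audio, and text in the framework above is late fusion: each modality's classifier outputs class probabilities, and a weighted average gives the final prediction. The sketch assumes per-modality probabilities are already available; the emotion labels, the weights, and the choice to renormalize over whichever modalities are present are illustrative assumptions.

```python
EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set


def late_fusion(modality_probs, weights):
    """Weighted average of per-modality class probabilities.

    modality_probs: dict like {"video": [...], "audio": [...], "text": [...]};
    a modality may be missing (e.g., no transcript), in which case the
    remaining weights are renormalized.
    """
    present = [m for m in modality_probs if m in weights]
    total_weight = sum(weights[m] for m in present)
    if total_weight == 0:
        raise ValueError("no weighted modality available")
    n_classes = len(EMOTIONS)
    fused = [0.0] * n_classes
    for m in present:
        w = weights[m] / total_weight
        for k, p in enumerate(modality_probs[m]):
            fused[k] += w * p
    return fused


def fused_prediction(modality_probs, weights):
    # Pick the emotion with the highest fused probability.
    fused = late_fusion(modality_probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused
```

Late fusion is mainly useful as a baseline and for tolerating missing modalities; learned fusion over modality embeddings would be the natural next step.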
4. Explainable Deep Learning for Credit Risk Scoring in Microfinance
Problem Definition:
Traditional credit scoring models ignore non-traditional data, while many
machine-learning alternatives that could use such data are black-box models,
which reduces trust and accessibility in the microfinance sector.
Research Focus:
Build an interpretable deep learning model that integrates tabular, textual,
and mobile-usage data with image-based inputs for inclusive credit scoring.
Dataset Suggestions:
Give Me Some Credit (Kaggle)
LendingClub Loan Data
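Interpretability in the model above can be made concrete by returning, alongside each score, how much each input moved it. For an additive (logistic) scoring layer these per-feature contributions are exact: each is the weight times the standardized feature value. The feature names and hand-set weights below are hypothetical, for illustration only; a deep multi-modal model would need a post-hoc attribution method (e.g., SHAP or integrated gradients) to produce comparable explanations.

```python
import math

# Hypothetical weights on standardized features; a real model would learn these.
WEIGHTS = {
    "monthly_mobile_topups": 0.8,
    "months_at_address": 0.3,
    "missed_repayments": -1.5,
}
BIAS = -0.5


def score_with_explanation(applicant):
    """Return P(repay) and each feature's additive contribution to the log-odds."""
    contributions = {name: w * applicant[name] for name, w in WEIGHTS.items()}
    log_odds = BIAS + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-log_odds))
    # Sort so the most influential features (by magnitude) come first.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prob, ranked
```

The ranked list is what a loan officer or applicant would see, e.g., "missed repayments lowered the score most".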
5. Adversarial Robustness in Medical Imaging Models with Real-World Noise Injection
Problem Definition:
Deep learning models for medical imaging are sensitive to small
perturbations and image noise, which limits their reliability in real-world
hospital settings.
Research Focus:
Develop robust models using adversarial training and real-world noise
augmentation to withstand both adversarial perturbations and image
degradation (e.g., sensor noise, blur, compression artifacts) in medical
imaging tasks.
Dataset Suggestions:
ChestX-ray14
HAM10000 (Skin Cancer Images)
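Adversarial training as described above usually generates perturbed inputs on the fly. The Fast Gradient Sign Method (FGSM) is the simplest such attack: move each pixel by epsilon in the direction that increases the loss. The sketch applies FGSM to a toy logistic classifier, where the input gradient has the closed form (sigmoid(w.x + b) - y) * w, and pairs it with Gaussian noise as a stand-in for real-world degradation. The toy model, epsilon, and noise level are assumptions; a real pipeline would compute gradients with an autodiff framework on a CNN.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w, b, x, y):
    # Binary cross-entropy of a logistic model on one example.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def _sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, epsilon=0.05):
    """FGSM: x_adv = clip(x + epsilon * sign(dL/dx)) for a logistic model.

    For binary cross-entropy, dL/dx = (sigmoid(w.x + b) - y) * w.
    Pixel values are clipped to the valid [0, 1] range.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + epsilon * _sign(gi))) for xi, gi in zip(x, grad)]


def gaussian_noise(x, sigma=0.02, rng=random):
    # Real-world-style degradation: additive sensor noise, clipped to [0, 1].
    return [min(1.0, max(0.0, xi + rng.gauss(0.0, sigma))) for xi in x]


def adversarial_batch(w, b, batch, epsilon=0.05):
    # Mix clean, FGSM-perturbed, and noisy copies of each (x, y) pair for training.
    out = []
    for x, y in batch:
        out += [(x, y), (fgsm(w, b, x, y, epsilon), y), (gaussian_noise(x), y)]
    return out
```

Training on the mixed batch exposes the model to both worst-case (adversarial) and typical (noisy) corruptions of the same image.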
6. Personalized Deepfake Detection Using Meta-Learning Techniques
Problem Definition:
Deepfake detection models often fail to generalize to unseen identities and
manipulation techniques, especially in low-resource cases with few training
examples.
Research Focus:
Introduce a meta-learning framework to adapt detection models quickly to
new identities with minimal data.
Dataset Suggestions:
Deepfake Detection Challenge Dataset
FaceForensics++
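Metric-based meta-learning, such as prototypical networks, is one way to realize the fast adaptation described above: average the embeddings of the few labeled clips per class into a prototype, then classify a new clip by its distance to each prototype. The sketch assumes embeddings were already produced by a meta-trained encoder (omitted); the 2-D vectors and labels are made up for illustration. Optimization-based methods such as MAML are an alternative.

```python
import math


def prototypes(support):
    """support: dict label -> list of embedding vectors (the few-shot examples).
    Returns label -> mean embedding (the class prototype)."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def classify(query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    dists = {label: sum((q - p) ** 2 for q, p in zip(query, proto))
             for label, proto in protos.items()}
    m = max(-d for d in dists.values())  # shift for numerical stability
    exp = {label: math.exp(-d - m) for label, d in dists.items()}
    total = sum(exp.values())
    probs = {label: e / total for label, e in exp.items()}
    return max(probs, key=probs.get), probs
```

Adapting to a new identity then requires no gradient steps: a few genuine and manipulated clips of that person form the support set.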
7. Scalable AI for Real-Time Disaster Response Using Social Media Streams
Problem Definition:
During disasters, social media contains critical information, but current AI
models often struggle to filter and classify it accurately in real time.
Research Focus:
Build a transformer-based system that uses NLP + time-series
classification for real-time disaster detection and resource mapping.
Dataset Suggestions:
CrisisNLP Dataset
Disaster Tweets Dataset (Kaggle)
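The time-series part of the system above could start from simple burst detection: count disaster-related posts per time window (after the text classifier has filtered the stream) and raise an alert when the latest count is far above the recent baseline. The sketch uses a rolling mean and standard deviation; the history length, z-score threshold, and minimum standard deviation are illustrative assumptions.

```python
import statistics
from collections import deque


class BurstDetector:
    """Flags a time window whose count of relevant posts exceeds the rolling
    baseline by more than `z` standard deviations."""

    def __init__(self, history=12, z=3.0, min_std=1.0):
        self.counts = deque(maxlen=history)
        self.z = z
        self.min_std = min_std  # floor so a flat baseline does not alert on tiny bumps

    def update(self, count):
        alert = False
        if len(self.counts) >= 3:  # wait for a minimal baseline
            window = list(self.counts)
            mean = statistics.mean(window)
            std = max(statistics.pstdev(window), self.min_std)
            alert = count > mean + self.z * std
        self.counts.append(count)
        return alert
```

Running one detector per region or keyword would give a crude map of where activity is spiking.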
8. Continual Learning Framework for Personalized Learning Recommendations
Problem Definition:
Educational recommendation systems struggle with evolving user
preferences and knowledge states, often leading to stale suggestions.
Research Focus:
Propose a continual learning framework with episodic memory and
concept drift adaptation to personalize learning paths.
Dataset Suggestions:
EdNet Dataset (collected from the Santa tutoring service)
ASSISTments Dataset
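Episodic memory in the continual learning framework above is often a small replay buffer of past interactions that is mixed into each update so the model does not forget older knowledge states. Reservoir sampling keeps a uniform sample of the whole interaction stream in fixed memory. The sketch is minimal; the capacity, replay ratio, and (student, item, correct) record format are hypothetical.

```python
import random


class ReservoirMemory:
    """Fixed-size episodic memory holding a uniform random sample of the stream."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = rng or random.Random()

    def add(self, item):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(item)
        else:
            # Keep the new item with probability capacity / seen (Algorithm R).
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = item

    def replay_batch(self, new_items, replay_ratio=1.0):
        """Combine fresh interactions with sampled memories for one update step."""
        k = min(len(self.buffer), int(len(new_items) * replay_ratio))
        return list(new_items) + self.rng.sample(self.buffer, k)
```

Concept drift adaptation would sit alongside this buffer, for example by raising the weight of fresh interactions when recent prediction error jumps.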
9. Green AI: Energy-Efficient Deep Neural Networks for Edge Deployment
Problem Definition:
Most state-of-the-art deep learning (DL) models are computationally intensive
and unsuitable for low-power edge devices.
Research Focus:
Design new training strategies and model architectures that reduce
energy consumption while maintaining accuracy, using pruning,
quantization, and neural architecture search.
Dataset Suggestions:
CIFAR-10 / CIFAR-100
Tiny ImageNet
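Two of the techniques named above are easy to show on a flat list of weights: magnitude pruning zeroes the fraction of weights with the smallest absolute value, and symmetric 8-bit quantization maps the remaining floats to integers in [-127, 127] using a single per-tensor scale. This toy sketch works on plain Python lists; frameworks such as PyTorch provide production implementations, and neural architecture search is not covered here.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization: q = round(w / scale), scale = max|w| / 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    # Approximate reconstruction used to measure quantization error.
    return [v * scale for v in q]
```

Storing 8-bit integers instead of 32-bit floats cuts weight memory by about 4x; the pruned zeros only save energy when the runtime or hardware exploits sparsity.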
10. Graph Neural Networks for Fake News Detection Using Content + User Behavior
Problem Definition:
Fake news detection models often focus only on textual content, ignoring
relational and behavioral information across social networks.
Research Focus:
Develop a GNN-based approach that models user interactions, reposting
patterns, and community influence for accurate fake news classification.
Dataset Suggestions:
FakeNewsNet
LIAR Dataset
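The core operation of the GNN approach above is message passing: each node (a news article or a user) updates its representation by aggregating its neighbors' features. The sketch implements one GraphSAGE-style layer with mean aggregation on a tiny repost graph in plain Python; the graph, the 2-D features, and the fixed weight matrices are hypothetical, and a real model would learn the weights and stack several layers before a classification head.

```python
def matvec(W, x):
    # Dense matrix-vector product on nested lists.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def sage_layer(features, edges, W_self, W_neigh):
    """One GraphSAGE-style layer with mean aggregation:
    h_v' = ReLU(W_self h_v + W_neigh * mean(h_u for u in N(v)))."""
    neighbors = {v: [] for v in features}
    for u, v in edges:  # treat interactions (reposts, replies) as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(next(iter(features.values())))
    out = {}
    for v, h in features.items():
        nbrs = neighbors[v]
        if nbrs:
            mean = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            mean = [0.0] * dim
        combined = [a + b for a, b in zip(matvec(W_self, h), matvec(W_neigh, mean))]
        out[v] = [max(0.0, c) for c in combined]  # ReLU
    return out
```

After a layer, each article's vector mixes its own content features with a summary of the users who spread it, which is exactly the relational signal that text-only detectors miss.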