Azure Data Engineering
Presented by Tanishka
Meet Tanishka
Hi, I'm Tanishka, a DevOps & Cloud Enthusiast with a strong
interest in Data Engineering & AI-driven Automation.
Current Roles & Learning:
ML Intern at Unified Mentor, working on AI/ML & data-driven solutions.
Pursuing MSc in AI & ML from IIITB & LJMU (UK)
Microsoft Azure AI-900 Certified.
Skills & Expertise
Cloud & DevOps Tools
Microsoft Azure: Data Factory, Synapse Analytics, Blob Storage
Terraform & Kubernetes for Infrastructure as Code (IaC)
CI/CD Pipelines: Jenkins, GitHub Actions, ArgoCD
Data Engineering & AI Tools
Azure Databricks (PySpark) for big data processing
SQL & Power BI for analytics and visualization
Machine Learning & Python for predictive insights
Data Engineering Project:
Objective: Build an end-to-end Azure Data Engineering pipeline to process
& analyze Tokyo Olympics data.
Key Features:
Ingests real-time & historical Olympics data from multiple sources.
Cleans, transforms, and stores structured data for analysis.
Visualizes athlete performance, medal standings & country trends.
Architecture Overview:
Ingestion: Azure Data Factory (ADF) + APIs & Blob Storage
Processing: Azure Databricks (PySpark)
Storage & Analytics: Azure Synapse + Power BI
Orchestration: Azure Logic Apps & Automation
Architecture Breakdown
Data Ingestion
Gathering data from various sources
Processing
Transforming and cleaning the data
Storage
Storing data in Azure Synapse Analytics
Visualization
Presenting insights through dashboards
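The four stages above can be sketched as a minimal Python pipeline. This is an illustration only: the function names and the sample records are invented here, and in the real project each stage maps to ADF, Databricks, Synapse, and Power BI rather than plain Python.

```python
# Minimal sketch of the four pipeline stages; all names and data are illustrative.

def ingest():
    """Ingestion: pull raw records (a hard-coded sample standing in for
    ADF copy activities reading from APIs and Blob Storage)."""
    return [
        {"athlete": "A. Suzuki", "country": "JPN", "medal": "Gold"},
        {"athlete": "M. Rossi", "country": "ITA", "medal": None},
    ]

def transform(rows):
    """Processing: clean the data (Databricks/PySpark in the real pipeline);
    here we simply drop rows without a medal."""
    return [r for r in rows if r["medal"] is not None]

def store(rows):
    """Storage: in Synapse this would be a table write; here, a dict keyed
    by country stands in for the table."""
    table = {}
    for r in rows:
        table.setdefault(r["country"], []).append(r)
    return table

def visualize(table):
    """Visualization: Power BI would read the Synapse table; here we just
    compute medal counts per country."""
    return {country: len(rows) for country, rows in table.items()}

medals = visualize(store(transform(ingest())))
print(medals)  # {'JPN': 1}
```

Chaining the stages as plain functions mirrors how the Azure services hand data off to each other: each stage consumes the previous stage's output and nothing else.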
Incremental Data Loading for Tokyo Olympics Project
Initial Load
1. Load all historical Tokyo Olympics data.
Change Data Capture
2. Identify new or updated Tokyo Olympics data.
Incremental Load
3. Load only the new or updated Tokyo Olympics data.
Loading only new or updated data optimizes performance and reduces load times in the Azure Data Engineering pipeline.
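The loading steps above can be sketched with a simple high-watermark check, a common way to implement incremental loads in ADF. This is plain Python for illustration; in the real pipeline the watermark would live in a control table or ADF variable, and those specifics are assumptions, not details from the project.

```python
from datetime import datetime

def load_increment(source_rows, watermark):
    """Return only rows modified after the last watermark,
    plus the new watermark to persist for the next run."""
    new_rows = [r for r in source_rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in new_rows), default=watermark)
    return new_rows, new_watermark

# Illustrative source records with a last-modified timestamp column.
source = [
    {"id": 1, "event": "100m Sprint", "modified": datetime(2021, 7, 23)},
    {"id": 2, "event": "Judo", "modified": datetime(2021, 7, 26)},
]

# Step 1, initial load: a minimal watermark pulls all historical rows.
rows, wm = load_increment(source, datetime.min)
assert len(rows) == 2

# Step 2, change data capture: a new row arrives after the watermark.
source.append({"id": 3, "event": "Relay", "modified": datetime(2021, 7, 30)})

# Step 3, incremental load: only the changed row is pulled.
rows, wm = load_increment(source, wm)
print(rows)  # only the new id=3 row
```

Because each run advances the watermark to the latest timestamp it has seen, unchanged rows are never re-read, which is exactly where the performance saving comes from.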
Challenges & Solutions
Challenge 1
Handling large data volumes.
Solution: Optimized data partitioning.
Challenge 2
Optimizing ETL performance.
Solution: Followed Azure ETL performance best practices, such as parallel copies and staged loading.
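The partitioning solution for Challenge 1 can be illustrated with a tiny hash-partitioning sketch. The real pipeline would partition in Databricks (for example with PySpark's partition-by-column writes); the plain-Python function and sample data below are assumptions made for illustration.

```python
# Sketch of hash partitioning: spreading rows across buckets so large
# volumes can be processed in parallel, with equal keys kept together.

def partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its key column."""
    buckets = [[] for _ in range(num_partitions)]
    for r in rows:
        buckets[hash(r[key]) % num_partitions].append(r)
    return buckets

rows = [
    {"country": "JPN", "medals": 27},
    {"country": "USA", "medals": 39},
    {"country": "ITA", "medals": 10},
    {"country": "JPN", "medals": 14},
]
buckets = partition(rows, "country", 4)

# Every row lands in exactly one bucket, and rows with the same key
# always co-locate, so per-country aggregations need no shuffling.
assert sum(len(b) for b in buckets) == len(rows)
```

Partitioning by a well-chosen key keeps related rows on the same worker, which reduces data movement and is what makes large-volume processing tractable.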
Scope of Azure Data Engineering
Data Integration
Integrating diverse data sources into a unified platform.
Big Data Processing
Handling large volumes of data using scalable solutions.
Data Warehousing
Building and managing data warehouses for analytical reporting.
Machine Learning
Enabling machine learning models with reliable data pipelines.
Next Steps & Resources
Microsoft Learn
Expand your knowledge
GitHub
Explore community projects
Azure Documentation and Stack Overflow provide additional support.