Azure Data Engineering Course Overview

This course contains 12 modules covering topics in data engineering on the Microsoft Azure platform. The modules include exploring compute and storage options, running interactive queries with Synapse SQL pool, data exploration and transformation in Databricks, loading data into the data warehouse with Synapse and Databricks, data movement with Data Factory and Synapse pipelines, implementing end-to-end security, hybrid transactional/analytical processing, and real-time stream processing with Stream Analytics and Databricks. The modules contain theory, demos, and hands-on labs.

Uploaded by raghu.learn.007

COURSE CONTENT

There are 12 modules in this course:

1. Module 0: Introduction to the course
2. Module 1: Explore Compute and Storage Options for Data Engineering Workloads
   (Introduction to Data Engineering and the Microsoft Azure Platform)
3. Module 2: Run Interactive Queries with Azure Synapse Analytics Serverless SQL Pool
4. Module 3: Data Exploration and Transformation in Azure Databricks
   (Use Azure Databricks to perform batch processing of data)
5. Module 4: Explore, Transform and Load Data into the Data Warehouse using Azure Synapse Analytics Apache Spark
   (Azure Synapse Analytics: Apache Spark Pools)
6. Module 5: Ingest and Load Data into the Data Warehouse
   (Azure Synapse Analytics: Dedicated SQL Pool)
7. Module 6: Transform Data with Azure Data Factory or Azure Synapse Pipelines
   (Creating code-free pipelines to transform the data)
8. Module 7: Orchestrate Data Movement using Azure Data Factory or Azure Synapse Pipelines
   (Data movement using code-free pipelines)
9. Module 8: Implementing End-to-End Security of Data
   (How to secure the data while performing Data Engineering)
10. Module 9: Hybrid Transactional Analytical Processing (HTAP)
11. Module 10: Real-Time Stream Processing with Azure Stream Analytics
    (Processing streaming data using Azure Stream Analytics)
12. Module 11: Create a Stream Processing Solution with Event Hubs and Azure Databricks
    (Processing streaming data using Azure Databricks and Event Hubs)

The modules are covered in groups (Module 0, 1, 3; Modules 11, 10, 4; Modules 5, 2; Modules 6 & 7, 8, 9), and each module combines Theory + Demos + LAB.


Module 1: Explore Compute and Storage Options for Data Engineering
Workloads
1. What is Data Engineering?
2. Data Engineering is a process in which data is first extracted from different data sources, then processed in the required way, and finally provided to other users so that they can perform their specific tasks on it.

Flow: Data Sources (text, CSV, databases, etc.) → Ingestion / Extraction → Processing and Transforming the Data → Loading into a Data Store (a storage location) → Other users who want to use this data (Data Analysts, Data Scientists, DBAs, etc.)

3. Roles and Responsibilities of a Data Engineer:


a. Connect and Collect the data from different data sources
b. Process and Transform the data and make it ready for further usage
c. Load the data at proper storage locations
d. Security
e. Monitoring and Maintenance of the DE solution

4. What is Azure?
5. Azure is a cloud service platform provided by Microsoft.
6. What is Cloud?
7. The cloud is an environment that provides, on rent, all of the resources we need to set up a company's IT infrastructure: machines, processors/CPUs, memory, storage, networking, VPNs, security, servers, etc.
8. The cloud service providers set up very big data centres at multiple locations around the globe. The resources we need are created inside those data centres, and we are given access to them through the internet.

9. Since Data Engineering is essentially the ETL process performed at a very large scale, with different types of data coming in at high speed, on Azure we need tools which can handle this type of data to perform ETL on it.
10. Tools available on Azure to perform ETL are:
a. Extract Process: Connect and Collect the data from different data sources
Methods to perform extraction: Code, Query, Code-Free Pipeline
i. Azure Databricks: Code, Query
ii. Azure Data Factory: Code-Free Pipelines
iii. Azure Synapse Analytics: Code, Query, Code-Free Pipelines
b. Transformation Process: Process and Transform the data and make it ready for
further usage
Methods to perform Transformation: Code, Query, Code-Free Pipeline
i. Azure Databricks: Code, Query
ii. Azure Data Factory: Code-Free Pipelines
iii. Azure Synapse Analytics: Code, Query, Code-Free Pipelines
c. Loading Process: Store the data at proper storage locations
The type of resource to store the data depends upon the type of data itself
i. Structured Data: SQL Databases, SQL Data Warehouse, etc.
ii. Semi-Structured Data: Cosmos DB, Blob Storage Accounts, Azure Data Lake
Storage Gen2 (ADLS Gen2)
iii. Unstructured Data: Blob Storage Accounts, Azure Data Lake Storage Gen2
(ADLS Gen2)

11. Streaming Data: It is the real-time data which is processed as soon as it is generated by the
data source
12. Tools for processing streaming Data:
a. Azure Databricks
b. Azure Stream Analytics
c. Azure Synapse Analytics
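The extract → transform → load flow described above can be sketched as a tiny Python pipeline. This is an illustrative toy, not Azure code: the CSV source, the cleaning rule, and the destination list are all made up for the example.

```python
# Toy ETL pipeline: extract rows from a CSV-like source, transform them,
# and load them into a destination "data store" (here, just a list).

import csv
import io

RAW_CSV = """id,name,amount
1,alice,100
2,bob,
3,carol,250
"""

def extract(text):
    """Extract: read rows from the raw source (a CSV string here)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows and cast fields to proper types."""
    return [
        {"id": int(r["id"]), "name": r["name"].title(), "amount": int(r["amount"])}
        for r in rows
        if r["amount"]  # skip rows with a missing amount
    ]

def load(rows, store):
    """Load: append the cleaned rows into the destination store."""
    store.extend(rows)
    return store

warehouse = load(transform(extract(RAW_CSV)), [])
print(warehouse)
```

In a real Azure solution the same three steps would be carried out by Databricks, Data Factory, or Synapse, as listed above.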
Module 3: Data Exploration and Transformation using Azure Databricks
1. What is Databricks?
2. Azure Databricks is a compute resource
3. What is a compute resource?
4. It is a resource which provides us with the processing power on our data. In our PCs, the CPU
and the Memory are the compute resources.
5. Here also, in Databricks we are provided with CPUs and Memory to process the data.
6. Databricks is not a Microsoft product; rather, it is in direct competition with Microsoft.
7. A question arises here: if Databricks is in competition with Microsoft, then why has Microsoft made it available on Azure?
8. The reason for providing Databricks on the Azure environment is that it is a very popular data processing tool because of its high data processing speed.
9. There are 2 reasons behind the high data processing speeds of Databricks:
a. Distributed Processing
b. In-Memory Processing

Azure Databricks Workspace: it contains a Cluster (one Driver Node plus multiple Worker Nodes) and a Notebook. The cluster reads from an external storage resource (the source of data) and writes to another external storage resource (the sink of data). The Notebook is used to write code/queries in: 1. Scala, 2. Python, 3. SQL, 4. R

10. Cluster: A group of Nodes


11. Node: It is the basic processing unit in Databricks. CPU + Memory
12. The nodes are of two types:
a. Worker Node: These are the nodes which actually perform the processing of data in
the cluster. All the nodes other than the Driver Node are Worker nodes in the cluster
b. Driver Node: It is the node which controls the working of all the other nodes inside
the cluster. There is only one single driver node in the cluster
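The split-across-workers idea behind the cluster (a driver handing work to workers and combining their results) can be mimicked in plain Python with a thread pool. This is only a conceptual toy, not how Spark actually schedules tasks on a Databricks cluster.

```python
# Toy "driver and workers": the driver splits the data into partitions
# and hands each partition to a worker; results are combined at the end.

from concurrent.futures import ThreadPoolExecutor

def worker_sum(partition):
    """Each worker processes its own partition independently."""
    return sum(partition)

def driver(data, num_workers=4):
    """The driver splits the work, distributes it, and combines results."""
    partitions = [data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(worker_sum, partitions))
    return sum(partial_sums)

total = driver(list(range(1, 101)))  # 1 + 2 + ... + 100
print(total)
```

The second speed factor, in-memory processing, is not shown here: it refers to keeping intermediate results in RAM between steps instead of writing them back to disk.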
13. Compute: each node contributes CPU + Memory to the cluster
14. Storage: the data itself is kept in an external storage resource
15. Now, we will talk about an external storage resource. This resource is called the Storage
Account
16. A storage account is a resource on the Azure portal, which provides us with different types
of tools to store the non-relational data in it
17. The Storage Accounts are of two types:
a. Blob Storage Account
b. Azure Data Lake Storage Gen2 (ADLS Gen2)

Blob Storage Account:
- CONTAINER (Blob Container): used for storing non-relational data files
- Hierarchical file system is not supported; all data is stored in the root directory
- MESSAGE QUEUES: storing messages
- FILE SHARES: used for mapping hard drives of our PC to the cloud
- AZURE TABLES: storing non-relational data in the key-value format

Azure Data Lake Storage Gen2 (ADLS Gen2):
- CONTAINER (Data Lake): used for storing non-relational data files
- Hierarchical file system is supported; data is stored in directories and sub-directories
- MESSAGE QUEUES: storing messages
- FILE SHARES: used for mapping hard drives of our PC to the cloud
- AZURE TABLES: storing non-relational data in the key-value format
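The practical difference between the flat Blob namespace and the hierarchical ADLS Gen2 namespace can be illustrated in plain Python: in a flat namespace, "folders" are just name prefixes, so renaming a "folder" means rewriting every blob name, whereas a hierarchical namespace can rename the directory as a single operation. This is an illustration only; the real services expose this behaviour through their own APIs.

```python
# Flat namespace: "directories" are only prefixes embedded in blob names.
flat_blobs = {
    "sales/2024/jan.csv": b"...",
    "sales/2024/feb.csv": b"...",
    "hr/staff.csv": b"...",
}

def rename_folder_flat(blobs, old, new):
    """In a flat namespace, renaming a folder touches every matching blob."""
    return {
        (new + name[len(old):] if name.startswith(old) else name): data
        for name, data in blobs.items()
    }

renamed = rename_folder_flat(flat_blobs, "sales/", "revenue/")
print(sorted(renamed))
```

With ADLS Gen2's real directories, this kind of rename (and per-directory access control) does not require touching each file individually.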
Module 11: Create a Stream Processing Solution with Event Hubs and
Azure Databricks
1. What is Streaming Data?
2. It is real-time data, which is processed as soon as it is created by the data source.

Without an Event Hub: Streaming Data Source → (send) → Databricks (receive / listen)

With an Event Hub: Streaming Data Source → (send) → Event Hub → (receive / listen) → Databricks

3. The Event Hubs in the Azure environment are used to control the flow of streaming data
4. On Azure the Event Hubs are not created directly, rather they are created inside an Event
Hub Namespace
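The buffering role that the Event Hub plays between a fast producer and a consumer can be mimicked with Python's standard queue. This is a stand-in only: real code would use the azure-eventhub SDK and an Event Hub created inside an Event Hub Namespace, neither of which is shown here.

```python
# Toy event buffer: the producer "sends" events into a queue (the Event Hub
# stand-in); the consumer "listens" and processes them at its own pace.

import queue

hub = queue.Queue()

def produce(events):
    """Source side: send events into the hub as they are generated."""
    for e in events:
        hub.put(e)

def consume():
    """Databricks-style consumer: receive/listen until the hub is empty."""
    received = []
    while not hub.empty():
        received.append(hub.get())
    return received

produce(["click:1", "click:2", "click:3"])
print(consume())
```

The point of the intermediary is that the source and the processor are decoupled: the source can keep sending even while the consumer is busy or temporarily offline.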
Module 10: Real-time stream processing with Azure Stream Analytics
1. In this module also we will be talking about processing Streaming Data, but this time we will be using another tool available on the Azure portal, called Azure Stream Analytics, instead of Databricks.
2. What is Azure Stream Analytics?

Azure Databricks:
- It is a tool used for processing both batch and stream data
- It is a coding-based environment where we have to write the code for every operation that is performed
- Being a coding-based environment, it is an open environment where we can do any activity by just writing the code for it

Azure Stream Analytics:
- This tool is used for processing streaming data specifically
- It is a GUI-based environment where we do not need to write the code for all the operations
- Being a GUI-based environment, the configuration options are limited

3. The process of using Azure Stream Analytics is the same as using Databricks for processing
Streaming Data

Streaming Data Source → (send) → Event Hub → (receive / listen) → Azure Stream Analytics → Storage Location
Module 4: Explore, Transform and Load Data into the Data Warehouse
using Azure Synapse Analytics Apache Spark
1. Azure Synapse Analytics is an environment on the Azure portal which provides all of the tools required to perform Data Engineering in one single environment
2. The Azure Synapse Analytics environment makes the integration of these resources with each other much easier for us

Azure Synapse Analytics (accessed through Synapse Studio) contains:
- Azure Data Lake Storage Gen2 (Container): storing non-relational data files
- Apache Spark Pool: it is the same as a Databricks cluster
- Serverless SQL Pool (SQL Server): it provides us with compute for the data
- Dedicated SQL Pool (SQL Server): it provides us with storage and compute both for structured data; same as a Data Warehouse
- Apache Spark Notebooks: write commands to process data (1. SQL, 2. Scala, 3. Python, 4. C#, 5. R)
- SQL Scripts: used for writing SQL queries
- Azure Synapse Pipelines: used to create code-free pipelines

3. Under Azure Synapse, we get:


a. A tool to store the non-relational data: ADLS Gen2
b. A tool to store the relational data: Dedicated SQL Pool
c. A tool to process the non-relational data: Apache Spark Pool
d. Tools to process the relational data: Dedicated SQL Pool & Apache Spark Pool
e. A tool to write code: Apache Spark Notebooks
f. A tool to write SQL Queries: SQL Scripts
g. A tool to create code-free pipelines: Azure Synapse Pipelines
Module 5: Ingest and Load Data into the Data Warehouse
1. The Dedicated SQL Pools are the same as Data Warehouses and they are used to store and
process the structured data.
2. First of all we ingest the data into the Dedicated SQL Pools and afterwards we can write
queries on it to process that data according to our requirements.
3. In this module we will be mainly talking about how to store the data into the Dedicated SQL
Pools
4. There are many different ways to store the data in the Dedicated SQL Pools:
a. We can set up a Databricks process to transform the data and then store that data
into the Dedicated SQL Pool
b. We can set up a similar process with the Apache Spark Pools as well
c. We can write queries directly on the Dedicated SQL Pools and extract the data from
the data sources
d. We can use the Polybase technique
e. We can set up the code-free pipelines to store the data into the Dedicated SQL Pools
5. We know that the Dedicated SQL Pool stores only structured data by default, but by using the Polybase technique we give the Dedicated SQL Pool the ability to directly connect to semi-structured data files, extract the data from them, and store it.
6. The Polybase technique stores the data in External Tables only

Module 2: Run Interactive Queries with Azure Synapse Analytics Serverless SQL Pool
1. The Serverless SQL Pool is also a data processing resource, but it is a shared resource.
2. Whereas the Dedicated SQL Pools and also the Apache Spark Pools are both dedicated
resources.

Apache Spark Pool:
- It is a dedicated resource
- It provides us with compute only for the data
- It can process all types of data
- It is a costly resource: 119 INR per hour
- It is a fast resource
- It supports code in 5 different languages, written in Spark Notebooks

Dedicated SQL Pool:
- It is a dedicated resource
- It provides us with storage and compute both for the data
- It works on structured data only by default
- It is also a costly resource: 109 INR per hour
- It is also a fast resource
- It supports SQL queries only, written in SQL Scripts

Serverless SQL Pool:
- It is a shared resource
- It provides us with compute only for the data
- It works on structured data only by default
- It is a cheaper resource: 360 INR per TB processed
- It is a slower resource
- It also supports SQL queries only, written in SQL Scripts

3. When do we use the Serverless SQL Pool?


4. The Serverless SQL Pool is used in 2 cases:
a. When we are creating a transformation process that will keep on running in the
background
b. When the data processing requirements are highly variable:
i. Day 1: 7 TB
ii. Day 2: 2 TB
iii. Day 3: No Data
iv. Day 4: 1.8 TB
v. Day 5: 4 TB
vi. Day 6: 650 GB
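The pricing figures above make the trade-off concrete. A quick back-of-the-envelope comparison for the variable workload listed, using the per-hour and per-TB rates quoted in the notes (which may not reflect current Azure pricing) and assuming the dedicated pool would have to stay on around the clock:

```python
# Rough cost comparison for the six-day variable workload above, using the
# rates quoted in the notes. Assumption: the dedicated pool runs 24 h/day,
# while the serverless pool bills only per TB actually processed.

DEDICATED_INR_PER_HOUR = 109
SERVERLESS_INR_PER_TB = 360

daily_tb = [7, 2, 0, 1.8, 4, 0.65]  # TB per day (650 GB taken as 0.65 TB)

dedicated_cost = DEDICATED_INR_PER_HOUR * 24 * len(daily_tb)
serverless_cost = SERVERLESS_INR_PER_TB * sum(daily_tb)

print(dedicated_cost, serverless_cost)
```

Under these assumptions the serverless pool comes out far cheaper for spiky, unpredictable loads, which is exactly the second use case above.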

Module 6: Transform Data with Azure Data Factory or Azure Synapse Pipelines
Module 7: Orchestrate Data Movement using Azure Data Factory or Azure Synapse Pipelines
1. Both of these tools, i.e., Azure Data Factory and Azure Synapse Pipelines are used to create
code-free pipelines on Azure
2. A code-free pipeline is basically a group of synchronised activities that are performed in an ordered way
3. Both of these tools are also exactly the same, there is no difference between them.
4. If both of the tools are the same, then why do we have two of them?
5. ADF is the cloud implementation of SSIS
6. Azure Synapse Pipelines are the implementation of ADF on the Synapse environment

Example pipeline: Data Source (HTTP Server) → Copy of Data into Temporary Storage (ADLS Gen2, used as a staging service) → Transform → Storage of Transformed Data (Dedicated SQL Pool)
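The copy → transform → load pattern in the example can be modelled as an ordered list of activities. This toy runner only illustrates the "group of synchronised activities performed in an ordered way" idea; the functions are made-up stand-ins, not the actual Data Factory runtime.

```python
# Toy pipeline runner: activities execute strictly in order, each one
# receiving the output of the previous one (copy -> transform -> load).

def copy_activity(_):
    """Stand-in for copying raw data from an HTTP source into staging."""
    return ["10", "20", "x", "30"]

def transform_activity(rows):
    """Stand-in for a data-flow transformation: keep valid numeric rows."""
    return [int(r) for r in rows if r.isdigit()]

def load_activity(rows):
    """Stand-in for loading the result into the Dedicated SQL Pool."""
    return {"loaded_rows": rows, "row_count": len(rows)}

pipeline = [copy_activity, transform_activity, load_activity]

result = None
for activity in pipeline:  # activities run one after another, in order
    result = activity(result)

print(result)
```

In ADF or Synapse Pipelines the same ordering is expressed visually by chaining activities on the canvas rather than in code.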
Module 8: Implementing End-to-end security of Data
1. In this module we will talk about how to secure the data while it is in our data engineering
solution.
2. Security of the data while it is in our organisational network is mainly the task of the
network engineer or the security expert. But being a data engineer we are also responsible
for the data that we are going to handle
3. That is why we only have to protect the data while it is residing in our data engineering
solution. This is the reason why as data engineers there are not a lot of things that we need
to do to secure the data.
4. The following tasks can be performed to secure the data by a DE:
a. Never over-expose the data to the internet
b. Never provide unrestricted access to storage resources to anyone
c. Instead of sharing keys and passwords, we should use Azure Key Vault
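The Key Vault point boils down to: code should look secrets up from a managed store at run time rather than embedding them. A minimal Python sketch of the pattern follows; the vault dict and secret name are hypothetical stand-ins, and real code would use the azure-keyvault-secrets client instead.

```python
# Anti-pattern (never do this): a password hard-coded in the script.
# BAD_CONN = "Server=myserver;Password=SuperSecret123"

# Pattern: the code asks a secret store for the value at run time.
# Here a plain dict stands in for Azure Key Vault.

FAKE_VAULT = {"sql-password": "SuperSecret123"}  # hypothetical secret name

def get_secret(vault, name):
    """Stand-in for fetching a named secret from a Key Vault."""
    if name not in vault:
        raise KeyError(f"secret {name!r} not found in vault")
    return vault[name]

conn_string = f"Server=myserver;Password={get_secret(FAKE_VAULT, 'sql-password')}"
print("connection string built without hard-coding the password")
```

The benefit is that the secret can be rotated or revoked in one place, and the code itself never needs to be changed or redeployed.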

Module 9: Hybrid Transactional Analytical Processing (HTAP)


1. Online Transactional Processing (OLTP): It is the process of storing the real-time
transactional data into the databases
2. Online Analytical Processing (OLAP): It is the process of generating reports and analytics
based on the aggregated data stored in the data warehouses

Flow: Transactional real-time data → Database (OLTP) → data is transformed and aggregated → Data Warehouse (clean, transformed, aggregated and historical data) → Reports and Analytics (OLAP)

3. HTAP or Hybrid Transactional Analytical Processing is a process using which we can perform
both OLTP and OLAP using one single resource only. The benefit of this is that we will get
real-time reports with the help of it.
4. The HTAP processing is quite new and not a lot of resources can perform it yet. On the Azure
Portal, two resources work together to provide us with HTAP capabilities. They are:
a. Cosmos DB
b. Azure Synapse Analytics
5. Here, we will be using Cosmos DB to store the real-time transactional data, i.e., for performing the OLTP process. Then we will connect it to Azure Synapse Analytics using a technology called Azure Synapse Link, in such a way that, in real time, the data appears to be stored in Synapse itself. The Data Analysts can then use the analytical capabilities of Synapse to generate reports on it
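The HTAP idea, one store serving both transactional writes and analytical reads with no export step in between, can be caricatured in a few lines of Python. A real HTAP setup involves Cosmos DB's analytical store synced via Azure Synapse Link; this toy deliberately models only the user-visible effect.

```python
# Toy HTAP store: the same data accepts OLTP-style single-row writes and
# answers OLAP-style aggregate queries immediately, with no separate ETL.

orders = []  # the single shared store

def record_order(customer, amount):
    """OLTP side: store each transaction as soon as it happens."""
    orders.append({"customer": customer, "amount": amount})

def revenue_report():
    """OLAP side: aggregate over the same, always-current data."""
    return sum(o["amount"] for o in orders)

record_order("alice", 100)
record_order("bob", 250)
print(revenue_report())  # the report reflects the transactions immediately
```

In the classic OLTP → ETL → OLAP pipeline above, the report would only reflect transactions after the next warehouse load; HTAP removes that lag.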
