Project Overview

This document outlines an end-to-end data engineering project focused on migrating an on-premises SQL Server database to Azure Cloud, utilizing various Azure tools for data ingestion, transformation, and reporting. It details the project objectives, architecture, environment setup, and step-by-step processes for data handling, including Azure Data Factory, Azure Databricks, and Power BI. The project aims to enhance scalability and analytics capabilities by leveraging cloud technologies.


End-to-End Data Engineering Project:

Migrating On-Premises SQL Server


Database to Azure Cloud

Table of Contents
1. Introduction
   ○ Project Objective
   ○ Tools and Technologies
   ○ Architecture Overview
2. Environment Setup
   ○ Step 1: Create Azure Resources
   ○ Step 2: Configure Azure Active Directory
   ○ Step 3: Set Up Azure Key Vault
3. Data Ingestion
   ○ Step 1: Connect to On-Premises SQL Server
   ○ Step 2: Configure Azure Data Factory
   ○ Step 3: Copy Data to Azure Data Lake (Bronze Layer)
4. Data Transformation
   ○ Step 1: Set Up Azure Databricks
   ○ Step 2: Transform Data from Bronze to Silver Layer
   ○ Step 3: Transform Data from Silver to Gold Layer
5. Data Loading
   ○ Step 1: Set Up Azure Synapse Analytics
   ○ Step 2: Load Data from Gold Layer to Synapse
6. Data Reporting
   ○ Step 1: Connect Power BI to Azure Synapse
   ○ Step 2: Create Reports and Visualizations
7. Pipeline Automation
   ○ Step 1: Create End-to-End Pipeline in Azure Data Factory
   ○ Step 2: Test the Pipeline
8. Conclusion
   ○ Summary of Deliverables
   ○ Next Steps
1. Introduction

Project Objective

The goal of this project is to migrate an on-premises SQL Server database to the Azure cloud, transforming and analyzing the data using Azure tools. The project demonstrates a common use case for data engineers: moving traditional on-premises databases to the cloud for scalability, cost-efficiency, and advanced analytics.

Tools and Technologies

● Azure Data Factory: For data ingestion and pipeline orchestration.
● Azure Data Lake Storage Gen2: For storing raw and transformed data.
● Azure Databricks: For data transformation and implementing the Lakehouse architecture.
● Azure Synapse Analytics: For creating a cloud-based data warehouse.
● Power BI: For data visualization and reporting.
● Azure Active Directory: For identity and access management.
● Azure Key Vault: For securely storing secrets (e.g., credentials).

Architecture Overview

The project follows a Lakehouse architecture, which includes:

1. Bronze Layer: Raw data (an exact copy of the source).
2. Silver Layer: Data with basic transformations (e.g., column renaming, data type changes).
3. Gold Layer: Fully cleaned and curated data ready for analysis.

The data flows through the following stages:

1. Data is ingested from the on-premises SQL Server database into the Bronze Layer using Azure Data Factory.
2. Data is transformed from Bronze to Silver and then to Gold using Azure Databricks.
3. Transformed data is loaded into Azure Synapse Analytics for querying and analysis.
4. Power BI is used to create reports and visualizations based on the data in Synapse.
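The layer layout above can be sketched as a small path helper. This is a minimal illustration only: the container name, storage account name, and example schema/table are invented placeholders, though the abfss:// URI scheme is what ADLS Gen2 actually uses.

```python
# Sketch: build ADLS Gen2 folder paths for each medallion (Lakehouse) layer.
# "datalake", "mystorageaccount", and the SalesLT/Customer example are placeholders.

LAYERS = ("bronze", "silver", "gold")

def layer_path(layer: str, schema: str, table: str,
               container: str = "datalake",
               account: str = "mystorageaccount") -> str:
    """Return an abfss:// URI for a table in a given Lakehouse layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{layer}/{schema}/{table}"

print(layer_path("bronze", "SalesLT", "Customer"))
# abfss://datalake@mystorageaccount.dfs.core.windows.net/bronze/SalesLT/Customer
```

Keeping one path convention across all three layers makes the Databricks notebooks later in the project purely a matter of reading from one layer's prefix and writing to the next.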
2. Environment Setup

Step 1: Create Azure Resources

1. Log in to the Azure portal.
2. Create the following resources:
   ○ Azure Data Lake Storage Gen2: Set up a storage account with the hierarchical namespace enabled.
   ○ Azure Data Factory: Create a data factory for ETL processes.
   ○ Azure Databricks: Set up a Databricks workspace.
   ○ Azure Synapse Analytics: Create a Synapse workspace and SQL pool.
   ○ Azure Key Vault: Create a key vault for storing secrets.

Step 2: Configure Azure Active Directory

1. Set up Azure Active Directory (AD) for identity and access management.
2. Create service principals for secure access to Azure resources.
3. Assign appropriate roles (e.g., Contributor, Reader) to the service principals.

Step 3: Set Up Azure Key Vault

1. Store the following secrets in Azure Key Vault:
   ○ On-premises SQL Server credentials (username and password).
   ○ Azure Data Lake connection strings.
   ○ Azure Synapse credentials.
2. Configure access policies to allow Azure Data Factory and Databricks to retrieve secrets.
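As a rough sketch, a Key Vault access policy granting Data Factory's identity read access to secrets looks like the following ARM template fragment; the tenantId and objectId values are placeholders you would replace with your own, and a second entry with the Databricks identity's objectId would grant it the same permissions.

```json
{
  "accessPolicies": [
    {
      "tenantId": "<your-tenant-id>",
      "objectId": "<data-factory-managed-identity-object-id>",
      "permissions": {
        "secrets": [ "get", "list" ]
      }
    }
  ]
}
```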

3. Data Ingestion

Step 1: Connect to On-Premises SQL Server

1. Set up a self-hosted integration runtime in Azure Data Factory to connect to the on-premises SQL Server.
2. Test the connection to ensure Data Factory can access the database.

Step 2: Configure Azure Data Factory

1. Create a pipeline in Azure Data Factory to copy data from the on-premises SQL Server to Azure Data Lake.
2. Use the Copy Data activity to map tables from the SQL Server to the Bronze Layer in the Data Lake.
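For illustration only, a Copy activity in the pipeline's exported JSON definition looks roughly like this. The dataset names are invented placeholders; in a multi-table migration the activity typically sits inside a ForEach activity that iterates over a table list.

```json
{
  "name": "CopySqlTableToBronze",
  "type": "Copy",
  "inputs": [ { "referenceName": "OnPremSqlServerTable", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "BronzeLayerParquet", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "SqlServerSource" },
    "sink": { "type": "ParquetSink" }
  }
}
```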

Step 3: Copy Data to Azure Data Lake (Bronze Layer)

1. Run the pipeline to copy all tables from the SQL Server to the Bronze Layer.
2. Verify that the data has been copied successfully.

4. Data Transformation

Step 1: Set Up Azure Databricks

1. Create a Databricks cluster.
2. Install any additional libraries your transformations require (PySpark and Spark SQL are already included in the Databricks runtime).

Step 2: Transform Data from Bronze to Silver Layer

1. Read data from the Bronze Layer into Databricks.
2. Perform basic transformations:
   ○ Rename columns.
   ○ Change data types.
   ○ Handle null values.
3. Write the transformed data to the Silver Layer in the Data Lake.
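In Databricks these steps would typically use PySpark DataFrame operations such as withColumnRenamed, cast, and fillna. The plain-Python sketch below, with invented column names and sample rows, illustrates the same rename, retype, and null-handling logic without requiring a Spark cluster.

```python
from datetime import date

# Hypothetical bronze rows: column names and string-typed values mimic a raw
# SQL Server extract; these are invented for the example.
bronze_rows = [
    {"CustomerID": "101", "ModifiedDate": "2024-01-15", "City": None},
    {"CustomerID": "102", "ModifiedDate": "2024-02-03", "City": "Seattle"},
]

RENAMES = {"CustomerID": "customer_id", "ModifiedDate": "modified_date", "City": "city"}

def to_silver(row: dict) -> dict:
    out = {RENAMES[k]: v for k, v in row.items()}    # rename columns
    out["customer_id"] = int(out["customer_id"])     # change data types
    out["modified_date"] = date.fromisoformat(out["modified_date"])
    out["city"] = out["city"] or "unknown"           # handle null values
    return out

silver_rows = [to_silver(r) for r in bronze_rows]
print(silver_rows[0])
```

The point of the Silver layer is exactly this kind of mechanical cleanup: consistent names, proper types, no raw nulls, with no business logic applied yet.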

Step 3: Transform Data from Silver to Gold Layer

1. Read data from the Silver Layer.
2. Perform advanced transformations:
   ○ Aggregate data.
   ○ Join tables.
   ○ Apply business logic.
3. Write the final curated data to the Gold Layer.
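In PySpark this stage would be a join followed by groupBy/agg. The sketch below shows the same join-then-aggregate logic in plain Python over two invented silver tables (customers and orders), producing a small gold-style summary.

```python
# Hypothetical silver tables; all names and values are invented for illustration.
customers = [{"customer_id": 101, "city": "Seattle"},
             {"customer_id": 102, "city": "Portland"}]
orders = [{"customer_id": 101, "amount": 250.0},
          {"customer_id": 101, "amount": 100.0},
          {"customer_id": 102, "amount": 75.0}]

# Index customers by key, i.e. the join side of the transformation.
by_id = {c["customer_id"]: c for c in customers}

# Join each order to its customer, then aggregate revenue per city
# (the "apply business logic" step).
revenue_by_city: dict[str, float] = {}
for o in orders:
    city = by_id[o["customer_id"]]["city"]
    revenue_by_city[city] = revenue_by_city.get(city, 0.0) + o["amount"]

print(revenue_by_city)  # {'Seattle': 350.0, 'Portland': 75.0}
```

A table shaped like this summary, written to the Gold layer, is what Synapse and Power BI consume downstream.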

5. Data Loading

Step 1: Set Up Azure Synapse Analytics

1. Create a database and tables in Azure Synapse.
2. Define the schema based on the Gold Layer data.

Step 2: Load Data from Gold Layer to Synapse

1. Use Azure Data Factory or Databricks to load data from the Gold Layer into Synapse.
2. Verify that the data has been loaded successfully.

6. Data Reporting

Step 1: Connect Power BI to Azure Synapse

1. Set up a connection between Power BI and Azure Synapse.
2. Import the data into Power BI.

Step 2: Create Reports and Visualizations

1. Design dashboards and reports in Power BI.
2. Use charts, graphs, and tables to visualize the data.
