
Engineering Data Mesh in Azure Cloud

Gireesh has over 3 years of experience developing software applications and implementing manual testing. He has strong skills in big data technologies like PySpark, Spark SQL, Azure Data Factory, and Azure Databricks. He has worked on projects involving data migration from Netezza to Azure SQL and building dashboards using Azure services. Gireesh is proficient in Python, SQL, and big data platforms and looks to take on challenging Azure and big data projects.


Gireesh K
gireeshchoudary2@gmail.com
+91-8123405641

Career Aspirations

Seeking challenging assignments in Azure and big data projects and services, and to work with the latest technologies to achieve organizational and individual goals.

Experience Summary

 Having 3+ years of experience developing and implementing software applications, along with manual testing.
 Good understanding of the big data ecosystem and the Azure cloud platform.
 Having 2+ years of work experience with PySpark, Spark SQL, Azure Data Factory, and Azure Databricks.
 Working with RDDs and DataFrames in the PySpark framework.
 Good hands-on experience with Azure Data Factory and Azure Databricks.
 Providing access to users based on approval status.
 Working on daily, weekly, monthly, and yearly store load activities.
 Providing data extract reports based on client requirements.
 Analyzing and making code changes as per business requirements.
 Good understanding of Spark architecture, transformations, and actions.
 Adding and modifying stores and providing data extracts based on client needs.
 Working with incidents and tasks.
 Providing audit reports on a daily basis.
 Excellent analytical, communication, and mentoring skills that are an asset to any organization.
 Quick learner with the ability to adapt to the working environment.

Professional Experience:

 Worked as a Software Engineer at INAVANTAGE SOLUTIONS Pvt Ltd from May 2019 to Feb 2023.

Education and Certifications:

[Link] (Mechanical), Visvesvaraya Technological University, 2017.

Technical Skills

Hardware / Platforms : Windows 7/10
Technology : Azure Databricks, Azure Data Factory
Databases : SQL Server, Azure SQL
Languages : Python (PySpark), SQL

Relevant Project Experience

Project #1 Details:

Project Title : QVC Netezza Retirement Plan
Client Name : QVC RETAIL GROUP, USA
Databases : Azure SQL, SQL Server
Technologies : PySpark, ADF, Azure SQL, Azure Databricks

QVC (short for "Quality Value Convenience") is an American free-to-air television network and flagship shopping channel specializing in televised home shopping, owned by Qurate Retail Group. An Azure data warehouse was set up for database migration and systematic analysis, and has been used for extensive reporting to trace activity within transactional services and other areas, and to provide support as per business requirements. The Netezza retirement plan is a migration project that moves data from the on-premises Netezza database to the Azure cloud, with Synapse as the target.

Roles and responsibilities:

1) Verifying and validating raw data after receiving the Netezza database.
2) Working on daily, weekly, monthly, and yearly data load activities.
3) Reprocessing the data if any duplicate issues are found (see the PySpark sketch after this list).
4) Providing data extract reports based on client requirements.
5) Analyzing and making code changes as per business requirements.
6) Adding and modifying stores and providing data extracts based on client needs.

7) Working with incidents and tasks.
8) Providing audit reports on a daily basis.
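
Below is a minimal PySpark sketch of the duplicate check and reprocessing step referenced in item 3 above. The table, the key and timestamp columns (order_id, load_date), and the storage paths are hypothetical placeholders, not details from the actual project.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("netezza-load-validation").getOrCreate()

# Read the raw extract landed from the on-premises Netezza export (hypothetical path).
raw = spark.read.parquet("/mnt/raw/netezza/orders/")

# Validation: count rows and detect duplicates on the business key.
duplicate_keys = raw.groupBy("order_id").count().filter(F.col("count") > 1)
print(f"total rows = {raw.count()}, duplicate keys = {duplicate_keys.count()}")

# Reprocessing: when duplicates are found, keep the most recent record per key.
w = Window.partitionBy("order_id").orderBy(F.col("load_date").desc())
deduped = (raw.withColumn("rn", F.row_number().over(w))
              .filter(F.col("rn") == 1)
              .drop("rn"))

# Write the validated, de-duplicated data to the curated zone.
deduped.write.mode("overwrite").parquet("/mnt/curated/netezza/orders/")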

Project #2 Details:

Project Title : Data Product P101 Dashboards
Client Name : Anglo American
Databases : Azure SQL
Technologies : PySpark, ADF, Azure SQL

The solution is designed to connect to various on-premises data sources and ingest the data into Azure Data Lake for various sites using ADF. The data is then transformed for consumption by various endpoints such as Enterprise Data Management, the data warehouse, Azure Analysis Services, and Power BI.

Roles and responsibilities:

1) Communicating with the business owner and onsite team in daily stand-up and scrum calls.
2) Designed and developed code and scripts in PySpark for acquisition and transformation of data from different sources (see the sketch after this list).
3) Moved all metadata files generated from various source systems to ADLS for further processing.
4) Imported data from different sources such as ADLS and Blob storage for computation using Spark.
5) Implemented Spark using Python in Databricks, utilizing DataFrames and the Spark-SQL API for faster processing of data.
6) Used Avro, JSON, and Parquet data formats to store data in ADLS.
7) Created data-driven workflows for data movement and transformation using Data Factory.
8) Extracted large files from the Data Lake using Azure Storage Explorer.
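
The sketch below illustrates the acquisition and transformation pattern from items 2, 4, and 5: reading from ADLS Gen2 and Blob storage into DataFrames and aggregating for downstream consumption. The storage account, container, and column names are hypothetical, and the two sources are assumed to share one schema.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("p101-acquisition").getOrCreate()

# Hypothetical ADLS Gen2 (abfss) and Blob storage (wasbs) source paths.
adls_path = "abfss://raw@examplelake.dfs.core.windows.net/site_a/metrics/"
blob_path = "wasbs://landing@exampleblob.blob.core.windows.net/site_b/metrics.csv"

# Acquisition: read each source into a DataFrame.
site_a = spark.read.parquet(adls_path)
site_b = spark.read.option("header", "true").option("inferSchema", "true").csv(blob_path)

# Combine the sources by column name (assumes matching schemas), then
# transform for the dashboards: a daily aggregate per site.
combined = site_a.unionByName(site_b)
daily = (combined
         .groupBy("site", F.to_date("event_ts").alias("event_date"))
         .agg(F.sum("value").alias("daily_value")))

# Persist the transformed output back to the lake for downstream endpoints.
daily.write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/p101/daily/")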

Common Questions


The implementation of Azure Data Factory and Databricks in large-scale data migration projects, like the Netezza retirement plan at QVC, presents the challenge of managing vast volumes of data being transferred from legacy systems to modern cloud environments, requiring effective validation and reprocessing to handle duplicates and maintain data integrity. Opportunities include leveraging Azure's advanced analytics capabilities for enhanced reporting and systematic analysis, and the ability to handle complex workloads through PySpark scripts for data acquisition and transformation, leading to improved efficiency in data processing.

PySpark complements Azure SQL by providing a powerful framework for data acquisition and transformation, enabling the processing of large-scale data with the Spark-SQL API for efficient computation and analysis. Azure SQL, on the other hand, serves as a robust database for storing and managing structured data, facilitating seamless integration and efficient data querying in enterprise environments. Their combined use maximizes data processing efficiency, allowing for complex transformations and queries that drive business insights and decision-making.
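
As an illustration of this pairing, the sketch below transforms data in PySpark and lands the result in Azure SQL over JDBC. The server, database, table, and credentials are hypothetical; in practice the password would come from a key vault or Databricks secret scope.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("spark-to-azure-sql").getOrCreate()

# Hypothetical Azure SQL connection details.
jdbc_url = ("jdbc:sqlserver://example-server.database.windows.net:1433;"
            "database=exampledb;encrypt=true;")
props = {
    "user": "etl_user",
    "password": "<fetch-from-key-vault>",  # never hard-code real credentials
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Heavy, distributed transformation happens in Spark...
orders = spark.read.parquet("/mnt/curated/orders/")
summary = orders.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# ...and the compact result lands in Azure SQL for serving and querying.
summary.write.jdbc(url=jdbc_url, table="dbo.region_summary",
                   mode="overwrite", properties=props)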

Verification and validation of raw data are crucial in data migration projects such as the Netezza retirement plan to ensure accuracy, consistency, and completeness as data moves from legacy systems to new environments. This process helps identify and rectify errors or anomalies early, such as duplicates, verifying that data meets business requirements and preserving its integrity during the transition. It also minimizes the risk of data corruption or loss, which could otherwise impact operational efficiency and strategic decision-making in business processes.

Strategies to effectively handle incidents and tasks in Azure-based projects include implementing automated monitoring and alerting systems to quickly identify and respond to issues, and employing robust logging and diagnostic tools for thorough incident analysis and resolution. Regular training for team members on Azure best practices and cross-functional coordination can ensure swift, proactive measures. Utilizing incident management frameworks, such as ITIL, tailored to Azure environments can streamline response processes. Additionally, maintaining detailed documentation and conducting post-incident reviews can help in understanding root causes and preventing recurrence.

Using data formats such as Avro, JSON, and Parquet in Azure Data Lake is significant because they offer different advantages for storing and processing data. Avro is ideal for serializing multiple records and supports schema evolution, which is useful in dynamic environments. JSON is widely used due to its simplicity and flexibility in representing hierarchical data structures. Parquet is a columnar format that optimizes storage and query performance, especially for complex analytical workloads, making it well-suited for big data processing in Azure environments. These formats collectively enable efficient data storage and access, facilitating advanced data analytics and transformation processes.
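
A small sketch of persisting the same DataFrame in each of the three formats discussed above; the paths are illustrative, and writing Avro assumes the spark-avro package is available on the cluster (it is bundled with Databricks runtimes).

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-comparison").getOrCreate()
df = spark.read.parquet("/mnt/curated/events/")  # hypothetical source data

# Avro: compact, row-oriented, supports schema evolution between writers and readers.
df.write.mode("overwrite").format("avro").save("/mnt/out/events_avro/")

# JSON: human-readable and flexible for nested, hierarchical structures, but verbose.
df.write.mode("overwrite").json("/mnt/out/events_json/")

# Parquet: columnar layout, best scan performance for analytical queries over few columns.
df.write.mode("overwrite").parquet("/mnt/out/events_parquet/")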

Gireesh K's professional experience and skills align with the demands of Azure and big data projects through his comprehensive understanding of the big data ecosystem and Azure cloud systems, which are crucial for managing and implementing modern data solutions. His practical experience with PySpark, Spark SQL, Azure Data Factory, and Databricks, as well as his familiarity with transforming and handling large datasets, positions him well for executing complex data projects. His strong analytical skills and ability to configure workflows and manage incidents further complement his suitability for working in dynamic Azure environments.

The Spark-SQL API plays a critical role in processing large datasets in Databricks by providing a scalable and efficient interface for executing SQL queries on large distributed datasets. It enables developers to leverage the power of Spark's distributed computation engine to perform complex transformations and aggregations on data stored in various formats. The API's integration with DataFrames allows seamless interoperability with structured data in Azure Data Lake, facilitating faster data processing and analysis, which improves performance and efficiency in data-intensive applications.
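
The sketch below shows the interoperability described here: a DataFrame read from the lake is registered as a temporary view and queried through the Spark-SQL API, which compiles to the same distributed execution plan as the equivalent DataFrame calls. The view and column names are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-interop").getOrCreate()

# Read structured data (e.g. from ADLS) into a DataFrame and expose it to SQL.
events = spark.read.parquet("/mnt/curated/events/")
events.createOrReplaceTempView("events")

# Aggregations expressed in SQL run on Spark's distributed engine.
top_sites = spark.sql("""
    SELECT site,
           COUNT(*) AS n_events,
           AVG(value) AS avg_value
    FROM events
    GROUP BY site
    ORDER BY n_events DESC
    LIMIT 10
""")
top_sites.show()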

Managing a data product dashboard project using Azure technologies involves several key responsibilities: engaging in daily communication with business owners and onsite teams to ensure alignment and progress, designing and developing code and scripts in PySpark for data acquisition and transformation, and managing the movement of metadata and file data into Azure Data Lake for processing. Additionally, it includes creating data-driven workflows using Azure Data Factory for efficient data movement and transformation, and utilizing various data formats like Avro, JSON, and Parquet for data storage in Azure Data Lake.

Leveraging Azure's integration capabilities with Power BI can significantly enhance decision-making in business environments by providing robust tools for data visualization and analytics. Azure can ingest and process data from various sources into centralized data stores like Azure SQL and Data Lake, enabling consolidated insights across the enterprise. Power BI utilizes this processed data to create interactive dashboards and reports, offering real-time analytics that are essential for strategic planning and operational efficiency. This integration ensures that stakeholders have access to actionable insights and can make informed decisions based on the latest data metrics.

The benefits of using Azure Data Lake Storage (ADLS) for storing and extracting large files include its ability to handle massive volumes of varied data in a scalable, secure, and cost-effective manner, providing a single repository for structured, semi-structured, and unstructured data. ADLS facilitates easy access to data through various Azure tools and supports high-performance analytics workloads. However, challenges include ensuring data governance and compliance, managing access control for secure data sharing, and possibly facing complexities in integrating with existing data management systems. Proper planning and management are essential to address these challenges effectively.
