Snowflake Data Warehousing Overview

Uploaded by

22metadata

Introduction to Snowflake
Snowflake UI and Offerings
Snowflake Architecture
Warehousing in Snowflake
Configuring a Snowflake Warehouse
Advanced Warehouse Configuration
Managing Warehouses

Data engineering & Warehousing on

External tables

What is Snowflake?
● Snowflake is a cloud-based SaaS data warehousing platform
available on Azure, AWS and GCP.
● Snowflake cannot be run on self-hosted or on-premises systems.
● It offers scalable, high-performance data storage and analysis. It
separates compute and storage, allowing independent scaling
and cost management.
● It is easy to set up and manage, handles multiple users and
workloads simultaneously, and integrates well with other tools.

Snowflake properties
● Snowflake is a relational, SQL-based platform, but on standard
tables it does not enforce primary and foreign key constraints; they
can be declared, but only as information.
● It provides its own SQL dialect (Snowflake SQL) with DDL and DML
commands, stored procedures and user-defined functions.
● It allows the creation of views and materialised views.
● It supports ACID transactions.
● It supports aggregations, window functions and hierarchical
querying with CTEs and recursive CTEs.
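
Because key constraints are not enforced, uniqueness usually has to be checked by the loading process or by a query. The sketch below shows one way to do such a check in plain Python before loading rows; the function name, the row format (a list of dicts) and the column names are all hypothetical and are not part of Snowflake.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the key values that occur more than once in `rows`.

    `rows` is a list of dicts and `key_columns` names the columns that
    are meant to act as the primary key (both are illustrative).
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    # Counter keeps first-seen order, so duplicates come back in load order.
    return [key for key, seen in counts.items() if seen > 1]


# Hypothetical batch: customer_id is intended to be unique.
batch = [
    {"customer_id": 1, "name": "A"},
    {"customer_id": 2, "name": "B"},
    {"customer_id": 1, "name": "C"},
]
duplicates = find_duplicate_keys(batch, ["customer_id"])
```

Here `duplicates` holds the key `(1,)`, so the batch would be rejected or cleaned before loading.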

Snowflake properties
● Snowflake integrates easily with ETL tools such as Talend,
Informatica, Pentaho, etc.
● We can use data from Snowflake in BI tools such as Power BI,
Tableau, etc.
● It can also integrate with big data tools such as Databricks.
● It can be reached through JDBC and ODBC drivers.
We can work with Snowflake through its web interface, the Snowflake
CLI, or a client such as DBeaver.

Snowflake UI

Snowflake offerings
● Storage and compute are easily scalable.
● There is no need to create indexes, tune performance by hand,
define partitions, or design physical storage.
● Pay-per-use pricing means usage can be tuned to reduce cost.

Snowflake architecture:

The architecture of Snowflake brings together the best features of
shared-disk and shared-nothing database architectures: the simplicity
of shared-disk with the performance benefits of shared-nothing.

Layers of Architecture:

Database Storage:
● Data is automatically reorganised into Snowflake's optimised,
compressed, columnar format when loaded.
● Data is stored in cloud storage that is managed wholly by
Snowflake: organisation, compression, structure, etc.
● Data objects are not directly accessible by customers; they are
only accessible through SQL queries.
● In our case, the data is stored in ADLS Gen2 under the hood.

Query Processing:
● Performed using "virtual warehouses": independent MPP compute
clusters.
● Each virtual warehouse is isolated, and no compute resources are
shared among warehouses, so one warehouse has no performance impact
on another.

Cloud Services:
● Collection of services that command and coordinate activities
across Snowflake.
● Handles authentication, infrastructure management, metadata
management, query optimization, and access control.
● Runs on compute instances provisioned by Snowflake from the
cloud provider.
● You can choose between cloud providers: Azure, AWS and GCP.

- In this tutorial we’ve used Snowflake on Azure cloud.

What is a Warehouse:
In Snowflake, the term "warehouse" refers to a virtual compute resource
that processes queries and performs data loading, unloading, and other
operations within the Snowflake environment. A Snowflake warehouse
is not a physical storage facility; rather, it provides the computing power
required to execute queries on the data stored in Snowflake's
cloud-based storage (ADLS Gen2 on Azure).

● Warehouses are responsible for executing SQL queries and other
operations.
● Multiple warehouses can operate concurrently, allowing for
parallel processing of tasks.

After going to the Admin option, if we have SYSADMIN or ACCOUNTADMIN
rights, we can create a warehouse from the above option.

To create and configure a warehouse in Snowflake:

1. Click
2. Then enter the Name and configure the clusters

3. Choose any one type among these two

5. Choose the Size of the warehouse (cluster)

The smaller the cluster, the lower the cost: the XS cluster costs
1 credit/hour, but it is suitable only for less intensive (light) tasks.
Similarly, the larger the size, the higher the cost.
The cost of a cluster also depends on the cloud provider and the region
you choose; refer to the Snowflake pricing page for details.
These are the cluster options for the Azure Central India location that
we are using for this tutorial.

6. Then configure the advanced options if needed. The advanced options
let us configure:
● Auto Resume: the warehouse starts automatically whenever a query
is submitted to it.
● Auto Suspend: the warehouse is suspended after a set period of
inactivity.
● Multi-cluster Warehouse: the number of clusters increases or
decreases automatically as the load changes.

This lets us define the minimum and maximum number of clusters and
choose a scaling policy, Standard or Economy.

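Scaling policy | Cluster starts | Cluster shuts down
Standard | Immediately, whenever a query is queued or the system detects that there is one more query than the currently running clusters can execute. | After 2 to 3 consecutive checks show that the load could be handled by the remaining clusters.
Economy | Only when the system estimates there is enough query load to keep the new cluster busy for at least 6 minutes. | After 5 to 6 consecutive checks show that the load could be handled by the remaining clusters.

7. The Query Acceleration Service (QAS) helps maintain high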
performance in cloud data warehouses by offloading resource-heavy
portions of queries to additional compute resources. This ensures that
large, unpredictable, or complex queries don’t degrade the overall
performance of the warehouse. It’s particularly useful in environments
where workloads vary greatly in size and complexity, such as ad hoc
analytics, large data scans, or queries with highly selective filters.
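
To put the size and Auto Suspend choices from the steps above into numbers, here is a minimal sketch of a credit estimate. Only the XS rate of 1 credit/hour comes from the text. The doubling of the rate with each size step and the 60-second minimum charge per start are assumptions based on Snowflake's published billing model, and the function name is hypothetical.

```python
# Credits per hour for a standard warehouse. The XS rate comes from the text;
# the doubling per size step is an assumption based on Snowflake's billing model.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

# Assumed: every start of a warehouse is billed for at least one minute.
MINIMUM_BILLED_SECONDS = 60


def estimate_credits(size, busy_seconds, auto_suspend_seconds=0, clusters=1):
    """Estimate credits for one start-to-suspend cycle of a warehouse."""
    # The warehouse keeps running, and billing, while idle until Auto Suspend fires.
    running_seconds = busy_seconds + auto_suspend_seconds
    billed_seconds = max(running_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


# 25 minutes of queries on XS, then 5 minutes idle before Auto Suspend.
credits = estimate_credits("XS", busy_seconds=1500, auto_suspend_seconds=300)
```

Multiplying the result by the price per credit for the chosen cloud and region gives a rough cost. A shorter Auto Suspend interval reduces the idle part of the bill.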

After configuring the options and settings, simply create the warehouse.

OR
We can use the SQL Notebook to create the warehouse.
Here we can see our newly created warehouse.
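
The statement used in the original is not reproduced here, so the following is only a sketch of what such a statement might look like. A small Python helper (hypothetical name, default values chosen for illustration) assembles the text of a `CREATE WAREHOUSE` statement using Snowflake's documented warehouse properties; it does not connect to Snowflake or execute anything.

```python
def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=300,
                         auto_resume=True, min_clusters=1, max_clusters=1,
                         scaling_policy="STANDARD", query_acceleration=False):
    """Return a CREATE WAREHOUSE statement as text (nothing is executed)."""
    if min_clusters > max_clusters:
        raise ValueError("min_clusters must not exceed max_clusters")

    def flag(value):
        return "TRUE" if value else "FALSE"

    options = [
        f"WAREHOUSE_SIZE = '{size}'",
        f"AUTO_SUSPEND = {auto_suspend_seconds}",  # seconds of inactivity
        f"AUTO_RESUME = {flag(auto_resume)}",
        # Multi-cluster settings need an edition that supports them.
        f"MIN_CLUSTER_COUNT = {min_clusters}",
        f"MAX_CLUSTER_COUNT = {max_clusters}",
        f"SCALING_POLICY = '{scaling_policy}'",
        f"ENABLE_QUERY_ACCELERATION = {flag(query_acceleration)}",
        "INITIALLY_SUSPENDED = TRUE",  # avoid billing until first use
    ]
    return (f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH\n  "
            + "\n  ".join(options) + ";")


# DEMO_WH is a made-up warehouse name.
statement = create_warehouse_sql("DEMO_WH", max_clusters=3)
print(statement)
```

The printed statement can be pasted into a worksheet or notebook cell and run by a role that is allowed to create warehouses.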

What if we want to alter the properties of an existing warehouse?

Similarly, we can modify other properties, or drop a warehouse
altogether, as the requirement and use case demand.
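
As a sketch of what altering and dropping might look like, the Python helpers below build `ALTER WAREHOUSE ... SET` and `DROP WAREHOUSE` statements as text. The helper names and the warehouse name are hypothetical; the property names are Snowflake's documented warehouse properties. Nothing is sent to Snowflake.

```python
def _format_value(value):
    """Render a Python value as a SQL literal for a warehouse property."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return f"'{value}'"


def alter_warehouse_sql(name, **properties):
    """Return an ALTER WAREHOUSE ... SET statement for the given properties."""
    if not properties:
        raise ValueError("at least one property is required")
    assignments = " ".join(
        f"{key.upper()} = {_format_value(value)}"
        for key, value in properties.items()
    )
    return f"ALTER WAREHOUSE IF EXISTS {name} SET {assignments};"


def drop_warehouse_sql(name):
    """Return a DROP WAREHOUSE statement."""
    return f"DROP WAREHOUSE IF EXISTS {name};"


# Resize a made-up warehouse and shorten its Auto Suspend interval.
resize = alter_warehouse_sql("DEMO_WH", warehouse_size="MEDIUM", auto_suspend=120)
drop = drop_warehouse_sql("DEMO_WH")
```

Keeping statement building separate from execution makes it easy to review the generated SQL before running it.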

Common questions

Snowflake offers features like independent compute and storage scaling, pay-per-use charging, and automatic resource scaling via features like auto-resume and auto-suspend. These tools help users control and optimize their spending based on workload demands, ensuring that resources are used efficiently and costs are reduced during periods of low activity. Such features are effective because they provide flexibility to match capacity with actual needs and limit unused resource time, directly impacting data warehousing expenses.

Snowflake separates compute and storage, allowing independent scaling and cost management. This separation means that users can scale up or down compute resources independently of data storage, optimizing costs and performance based on current needs. The storage is managed in a compressed, columnar format, enhancing efficiency. In contrast to traditional systems, users do not have to manage physical storage design or indexing, further streamlining operation and scaling.

The lack of enforced primary and foreign key constraints in Snowflake shifts the responsibility of ensuring data integrity to the application and query level rather than the database schema. This means that while Snowflake allows flexibility in schema design, users must implement integrity checks elsewhere, potentially complicating application logic and making data validation processes crucial. However, this approach can lead to more efficient data loading and updates, as there is no overhead from constraint validation during these operations.

Snowflake's query processing is conducted by independent MPP compute clusters, or virtual warehouses, which isolate processing power and ensure no performance degradation between different workloads. The inclusion of the Query Acceleration Service (QAS) further offloads resource-heavy query segments to additional compute resources, maintaining high performance for complex queries. This approach is crucial for data-intensive applications requiring quick insights from large datasets or handling complex analytics without performance issues.

ACID compliance in Snowflake ensures that transactions are completed reliably and data integrity is maintained despite operations. This is essential in environments dealing with high-volume and critical data, guaranteeing that operations are atomic, consistent, isolated, and durable. As a result, users can trust that concurrent operations will not interfere with each other, and data recovery mechanisms ensure consistency even in case of failures, supporting robust application operations.

Snowflake's use of virtual warehouses allows for isolated compute clusters that do not share resources, meaning performance issues in one do not impact others. This allows for concurrent processing of multiple workloads, enhancing efficiency and flexibility. Additionally, virtual warehouses can scale automatically, optimizing resource use for varying workloads and helping to control costs, as users are billed for compute separately from storage.

Snowflake integrates with ETL tools such as Talend, Informatica, and Pentaho, and BI tools like Power BI and Tableau. It also supports big data tools such as Databricks, and connections via JDBC and ODBC drivers. These integrations allow seamless data extraction, transformation, and loading processes, further enabling powerful data analysis and visualization capabilities without data movement, thus reducing latency and potential data governance issues.

As a SaaS platform, Snowflake eliminates the need for infrastructure management, allowing users to focus on data and queries instead of hardware or software maintenance. This enhances accessibility as users can easily access Snowflake from anywhere via the internet, facilitating remote and distributed work. SaaS delivery also supports automatic updates and scaling, providing a hassle-free experience for management and reduced total cost of ownership compared to self-hosted solutions.

Multi-cluster warehouses in Snowflake automatically adjust the number of clusters based on workload demand. This auto-scaling ensures that resources are dynamically allocated during peak periods while not incurring unnecessary costs during low demand. Options like auto-resume/suspend provide further cost efficiency by minimizing idle resource costs. Thus, it enables efficient resource utilization without manual intervention, making workload management more adaptable and cost-effective.

Snowflake's architecture incorporates the simplicity and centralized data management of shared-disk systems with the performance scalability of shared-nothing systems. While data is centrally managed in cloud storage, compute resources are provisioned in isolated clusters, ensuring no resource contention. This hybrid approach allows users to benefit from centralized data access while achieving high performance and scalability typical of shared-nothing systems.


Common questions

What features does Snowflake provide to optimize cost management, and how effective are these in reducing data warehousing expenses?

How does Snowflake's approach to storage and compute impact its data warehousing efficiency and scalability?

What are the implications of Snowflake's lack of primary and foreign key constraints on database design and data integrity?

In what ways does Snowflake enable high-performance query processing, and why is this important for data-intensive applications?

Analyze how ACID compliance in Snowflake contributes to reliable data transactions and operations.

Evaluate the benefits of Snowflake's approach to resource allocation using virtual warehouses.

In what ways does Snowflake integrate with other data processing and analysis tools, and what are the benefits of such integration?

How does Snowflake's use of a SaaS model enhance its accessibility and management for users compared to traditional on-premise databases?

Discuss how Snowflake's use of multi-cluster warehouses contributes to workload management and cost optimization.

How does Snowflake's architecture blend the benefits of shared-disk and shared-nothing database architectures?
