Snowflake Data Warehousing Overview

Uploaded by

22metadata
  • Introduction to Snowflake
  • Snowflake UI and Offerings
  • Snowflake Architecture
  • Warehousing in Snowflake
  • Configuring a Snowflake Warehouse
  • Advanced Warehouse Configuration
  • Managing Warehouses

Data engineering & Warehousing on

External tables

What is Snowflake?
● Snowflake is a cloud-based SaaS data warehousing platform
available on Azure, AWS, and GCP.
● Snowflake cannot be self-hosted or run on on-premises systems.
● It offers scalable, high-performance data storage and analysis. It
separates compute and storage, allowing independent scaling
and cost management.
● It’s easy to set up and manage, handles multiple users and
workloads simultaneously, and integrates well with other tools.

Snowflake properties
● Snowflake is a relational, SQL-based platform, but on standard
tables it does not enforce primary key and foreign key constraints.
They can be declared, but they are informational only.
● Its SQL dialect supports DDL and DML commands, stored
procedures, and user-defined functions.
● It supports views and materialised views.
● It supports ACID transactions.
● It supports aggregations, window functions, and hierarchical
querying through CTEs and recursive CTEs.
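
As a minimal sketch of hierarchical querying with a recursive CTE, the example below runs the query through Python's built-in sqlite3 module, used purely as a local stand-in engine. The `employees` table, its rows, and the `org` CTE name are hypothetical. The `WITH RECURSIVE` form is standard SQL, so the query text should carry over to Snowflake with little or no change, but that is an assumption to verify in your own account.

```python
import sqlite3

# Stand-in engine: SQLite in memory, used only so the query can run locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ben", 1), (3, "Chen", 2), (4, "Dia", 1)],
)

# Anchor member: the root (no manager). Recursive member: direct reports
# of every row already in the result, one level deeper each iteration.
HIERARCHY_SQL = """
WITH RECURSIVE org (id, name, depth) AS (
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, org.depth + 1
    FROM employees e JOIN org ON e.manager_id = org.id
)
SELECT name, depth FROM org ORDER BY depth, name
"""

rows = conn.execute(HIERARCHY_SQL).fetchall()
for name, depth in rows:
    print("  " * depth + name)  # indent each name by its level in the tree
```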

Snowflake properties
● Snowflake integrates easily with ETL tools such as Talend,
Informatica, and Pentaho.
● Data in Snowflake can be consumed by BI tools such as Power BI
and Tableau.
● It can also integrate with big data tools such as Databricks.
● Clients can connect through JDBC and ODBC drivers.
We can work with Snowflake through its web interface, the Snowflake
CLI, and DBeaver.

Snowflake UI

Snowflake offerings
● Storage and compute are both easily scalable.
● No manual tuning is required: there are no indexes to build, no
partitions to create, and no physical storage design to maintain.
● Pay-per-use pricing can be tuned to reduce cost.

Snowflake architecture:

The architecture of Snowflake brings together the best features of
shared-disk and shared-nothing database architectures: the simplicity
of shared-disk with the performance benefits of shared-nothing.

Layers of Architecture:

Database Storage:
● Data is automatically reorganised into Snowflake's optimised,
compressed, columnar format when loaded.
● Data is stored in cloud storage that is managed wholly by
Snowflake: organization, compression, structure, etc.
● Data objects are not directly accessible by customers; they are
only accessible through SQL queries.
● In our case, the data is stored in ADLS Gen2 under the hood.

Query Processing:
● Performed using "virtual warehouses": independent MPP compute
clusters.
● Each virtual warehouse is isolated, and no compute resources are
shared among warehouses, so there's no performance impact
between warehouses.

Cloud Services:
● Collection of services that command and coordinate activities
across Snowflake.
● Handles authentication, infrastructure management, metadata
management, query optimization, and access control.
● Runs on compute instances provisioned by Snowflake from the
cloud provider.
● You can choose among cloud providers: Azure, AWS, or GCP.

- In this tutorial we’ve used Snowflake on Azure.


What is a Warehouse:
In Snowflake, the term "warehouse" refers to a virtual compute resource
that processes queries and performs data loading, unloading, and other
operations within the Snowflake environment. A Snowflake warehouse
is not a physical storage facility; rather, it provides the computing power
required to execute queries on the data stored in Snowflake's
cloud-based storage (Azure ADLS Gen2).

● Warehouses are responsible for executing SQL queries and other
operations.
● Multiple warehouses can operate concurrently, allowing for
parallel processing of tasks.

In the Admin section, a user with the SYSADMIN or ACCOUNTADMIN role
can create a warehouse.

To create and configure a warehouse in Snowflake:

1. Click
2. Then enter the Name and configure the clusters

3. Choose any one type among these two

5. Choose the Size of the warehouse (cluster)

The smaller the warehouse, the lower the cost: an XS (X-Small) cluster
consumes 1 credit/hour, but it is suitable only for light, less intensive
tasks. Similarly, the larger the size, the higher the cost.
The cost of a cluster usually also depends on the cloud provider and
region you choose; refer to the pricing link for details.
The cluster options shown here are for the Azure Central India region,
which we are using for this tutorial.
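
The size-to-cost relationship can be sketched as simple arithmetic. In this rough estimator, the 1 credit/hour figure for X-Small comes from the text above, and the doubling at each size step reflects how Snowflake sizes standard warehouses. The function name `estimate_cost` and the price per credit passed to it are hypothetical placeholders; the real price depends on cloud, region, and edition.

```python
# Credits consumed per hour, per cluster, by warehouse size.
# X-Small = 1 is from the tutorial; each size up doubles the rate.
CREDITS_PER_HOUR = {
    "XSMALL": 1,
    "SMALL": 2,
    "MEDIUM": 4,
    "LARGE": 8,
    "XLARGE": 16,
}


def estimate_cost(size, hours, price_per_credit, clusters=1):
    """Rough compute cost: credits/hour x running hours x clusters x price."""
    credits = CREDITS_PER_HOUR[size] * hours * clusters
    return credits * price_per_credit


# Hypothetical price of 3.0 currency units per credit, for illustration only.
print(estimate_cost("XSMALL", hours=10, price_per_credit=3.0))
print(estimate_cost("MEDIUM", hours=2, price_per_credit=3.0, clusters=2))
```

The estimate ignores idle time saved by auto suspend, so it is an upper bound for a warehouse that is not busy for the whole period.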

6. Then configure the advanced options if you want. The advanced
options are:
● Auto Resume starts a suspended warehouse automatically
whenever a query is submitted to it.
● Auto Suspend suspends the warehouse after a set period of
inactivity.
● Multi-cluster Warehouse automatically increases or decreases the
number of clusters according to the detected load.

This lets us define the minimum and maximum number of clusters and
choose a scaling policy, Standard or Economy.

Scaling policies:
● Standard: an additional cluster starts immediately whenever the
system detects that there are more queries than the currently
available clusters can execute.
● Economy

7. The Query Acceleration Service (QAS) helps maintain high
performance in cloud data warehouses by offloading resource-heavy
portions of queries to additional compute resources. This ensures that
large, unpredictable, or complex queries don’t degrade the overall
performance of the warehouse. It’s particularly useful in environments
where workloads vary greatly in size and complexity, such as ad hoc
analytics, large data scans, or queries with highly selective filters.

After configuring the options and settings, simply create the warehouse.

OR

We can use the SQL Notebook to create the warehouse:
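
As a hedged sketch of what such a statement might look like, the helper below renders a `CREATE WAREHOUSE` statement covering the options from the steps above (size, auto suspend, auto resume, cluster counts, scaling policy). The helper name `create_warehouse_sql`, the warehouse name `DEMO_WH`, and the default values are hypothetical. The property names are Snowflake's documented warehouse parameters, and multi-cluster settings only take effect on editions that support multi-cluster warehouses.

```python
def create_warehouse_sql(name, size="XSMALL", auto_suspend_s=300,
                         auto_resume=True, min_clusters=1, max_clusters=1,
                         scaling_policy="STANDARD"):
    """Render a CREATE WAREHOUSE statement as text (it is not executed here).

    `name` is interpolated as-is, so pass only trusted identifiers.
    """
    if scaling_policy not in ("STANDARD", "ECONOMY"):
        raise ValueError("scaling_policy must be STANDARD or ECONOMY")
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name}\n"
        f"  WAREHOUSE_SIZE = '{size}'\n"
        f"  AUTO_SUSPEND = {auto_suspend_s}\n"  # seconds of inactivity
        f"  AUTO_RESUME = {str(auto_resume).upper()}\n"
        f"  MIN_CLUSTER_COUNT = {min_clusters}\n"
        f"  MAX_CLUSTER_COUNT = {max_clusters}\n"
        f"  SCALING_POLICY = {scaling_policy}\n"
        f"  INITIALLY_SUSPENDED = TRUE;"
    )


# Hypothetical warehouse that scales between 1 and 3 clusters.
sql = create_warehouse_sql("DEMO_WH", max_clusters=3)
print(sql)
```

The printed statement can be pasted into a worksheet or notebook cell, or sent through any client connected over the JDBC or ODBC drivers mentioned earlier.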
Here we can see our newly created warehouse

What if we want to alter the properties of our existing warehouses?

Similarly, we can modify other properties and even drop
warehouses as the requirement and use case demand.
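
A minimal sketch of the kinds of statements involved, again rendered as text from Python: the helper `alter_warehouse_sql` and the warehouse name `DEMO_WH` are hypothetical, while `ALTER WAREHOUSE ... SET`, `SUSPEND`, and `DROP WAREHOUSE` are standard Snowflake commands. The second statement assumes the Query Acceleration Service from step 7 is available on your account.

```python
def alter_warehouse_sql(name, **properties):
    """Render ALTER WAREHOUSE ... SET with one or more property assignments."""
    if not properties:
        raise ValueError("at least one property is required")
    assignments = " ".join(f"{key.upper()} = {value}"
                           for key, value in properties.items())
    return f"ALTER WAREHOUSE {name} SET {assignments};"


statements = [
    # Resize the warehouse and shorten the auto-suspend timeout (seconds).
    alter_warehouse_sql("DEMO_WH", warehouse_size="'MEDIUM'", auto_suspend=120),
    # Turn on the Query Acceleration Service for this warehouse.
    alter_warehouse_sql("DEMO_WH", enable_query_acceleration="TRUE"),
    # Suspend it manually, then remove it once it is no longer needed.
    "ALTER WAREHOUSE DEMO_WH SUSPEND;",
    "DROP WAREHOUSE IF EXISTS DEMO_WH;",
]

for statement in statements:
    print(statement)
```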
