SNOWFLAKE
Introduction:
Snowflake is a cloud data platform provided as software-as-a-service (SaaS) that
enables data storage, processing, and analytic solutions that are faster, easier to use,
and far more flexible than traditional offerings.
Snowflake is a true SaaS offering. More specifically:
There is no hardware (virtual or physical) to select, install, configure, or manage.
There is virtually no software to install, configure, or manage.
Ongoing maintenance, management, upgrades, and tuning are handled by
Snowflake.
Snowflake is a massively parallel database processing engine; this means the
system uses multiple nodes to scale the execution of queries.
Snowflake runs completely on cloud infrastructure. All components of Snowflake’s
service (other than optional command line clients, drivers, and connectors), run in
public cloud infrastructures.
Snowflake uses virtual compute instances for its compute needs and a storage service
for persistent storage of data. Snowflake cannot be run on private cloud infrastructures
(on-premises or hosted).
Snowflake is not a packaged software offering that can be installed by a user. Snowflake
manages all aspects of software installation and updates.
Snowflake data platform is built from scratch. It is not built on any existing database
technology or Bigdata software platforms such as Hadoop.
Snowflake combines a completely new SQL query engine with an innovative architecture
natively designed for the cloud.
CLUSTER: -
Snowflake automatically organizes data into micro-partitions to allow faster retrieval of
frequently requested data.
Micro partitions: -
Snowflake has implemented a powerful and unique form of partitioning called Micro-
partitioning.
Tables are transparently partitioned using the ordering of the data as it is
inserted/loaded.
Snowflake storage is columnar and horizontally partitioned, meaning all the column
values for a given row are stored in the same micro-partition.
Micro-partitions are small in size (50 to 500 MB).
Data is compressed within micro-partitions; Snowflake automatically determines the most
efficient compression algorithm for the columns in each micro-partition.
When data is inserted, whether by batch insert or row by row, it is ordered into
micro-partitions based on the order in which the rows are inserted.
INSERT/COPY
INSERT and COPY into table operations only create new micro-partitions.
UPDATE
UPDATE operations keep the old MPs (before the change) and create new MPs with the change.
Each MP will have its own unique version IDs.
DELETE
DELETE operations keep the old MPs (before the delete) and create new MPs with the change
by removing the record(s). Each MP will have its own unique version IDs.
The difference between UPDATE and INSERT is that UPDATE also needs to scan existing
partitions. An INSERT is a single operation that does not touch any existing files, while an
UPDATE must scan existing partitions. Like UPDATE operations, DELETE operations must
scan existing partitions to determine which record(s) are to be removed in the new MPs.
The immutable nature of micro-partitions is what makes Time Travel possible and easy to
implement. This Time Travel feature is unique to Snowflake in this highly competitive
technology space: with a proper retention period it makes data recovery far easier, whereas
recovery is time-consuming and error-prone in traditional databases and data warehouses.
A very important side note on locking and contention: since INSERTs only add new files,
they need no locks on existing MPs. As a result, INSERTs can run with higher concurrency
than UPDATE/DELETE/MERGE operations.
VIRTUAL WAREHOUSE: -
A virtual warehouse, often referred to simply as a "warehouse", is a cluster of compute
resources that executes database queries and commands. DML, SELECT, and COPY commands
use a virtual warehouse. This process is automatic. A virtual warehouse can consist of one or
more clusters, and each cluster can have 1-128 nodes.
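As a minimal sketch (the warehouse name and settings here are hypothetical), creating and sizing a warehouse looks like this:

```sql
-- Hypothetical example: create a small warehouse that suspends itself
-- after 60 seconds of inactivity and resumes automatically on the next query.
create warehouse if not exists demo_wh
  warehouse_size = 'XSMALL'
  auto_suspend   = 60
  auto_resume    = true;

use warehouse demo_wh;
```

Auto-suspend and auto-resume keep credit consumption tied to actual query activity.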
Credit: -
Snowflake credits are used to pay for the consumption of resources on Snowflake. A Snowflake
credit is a unit of measure, and it is consumed only when a customer is using resources, such as
when a virtual warehouse is running, the cloud services layer is performing work, or serverless
features are used.
DATABASE: -
It is a logical grouping of schemas. Each database belongs to a single Snowflake account.
SCHEMA: -
It is a logical grouping of database objects (tables, views, etc.). Each schema belongs to a single
database.
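The database/schema hierarchy above can be sketched as follows (object names are hypothetical):

```sql
-- Account -> database -> schema -> table.
create database if not exists sales_db;
create schema if not exists sales_db.raw;
create table if not exists sales_db.raw.orders (
  order_id number,
  amount   number(10,2)
);
```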
Architecture: -
Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database
architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for
persisted data that is accessible from all compute nodes in the platform. But similar to shared-
nothing architectures, Snowflake processes queries using MPP (massively parallel processing)
compute clusters where each node in the cluster stores a portion of the entire data set locally.
This approach offers the data management simplicity of a shared-disk architecture, but with the
performance and scale-out benefits of a shared-nothing architecture.
Snowflake’s unique architecture consists of three key layers,
Storage
Query Processing
Cloud Services
Storage: -
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal
optimized, compressed, columnar format. Snowflake stores this optimized data in cloud
storage.
Snowflake manages all aspects of how this data is stored — the organization, file size, structure,
compression, metadata, statistics, and other aspects of data storage are handled by Snowflake.
The data objects stored by Snowflake are not directly visible nor accessible by customers; they
are only accessible through SQL query operations run using Snowflake.
Query Processing: -
Query execution is performed in the processing layer. Snowflake processes queries using
“virtual warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple
compute nodes allocated by Snowflake from a cloud provider.
Each virtual warehouse is an independent compute cluster that does not share compute
resources with other virtual warehouses. As a result, each virtual warehouse has no impact on
the performance of other virtual warehouses.
Cloud Services: -
The cloud services layer is a collection of services that coordinate activities across Snowflake.
These services tie together all the different components of Snowflake in order to process user
requests, from login to query dispatch. The cloud services layer also runs on compute instances
provisioned by Snowflake from the cloud provider.
Services managed in this layer include:
Authentication
Query parsing and optimization
Access control
Infrastructure management
Metadata management
Data types: -
Numeric Data Types: -
NUMBER - default precision and scale are (38,0).
DECIMAL, NUMERIC - synonymous with NUMBER.
INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT - synonymous with NUMBER except
precision and scale cannot be specified.
FLOAT, FLOAT4, FLOAT8
DOUBLE, DOUBLE PRECISION, REAL - synonymous with FLOAT.
String & Binary Data Types: -
VARCHAR - default (and maximum) length is 16,777,216 bytes.
CHAR, CHARACTER - synonymous with VARCHAR except default length is VARCHAR(1).
STRING - synonymous with VARCHAR.
TEXT - synonymous with VARCHAR.
BINARY
VARBINARY - synonymous with BINARY.
Logical Data Types: -
BOOLEAN - only supported for accounts provisioned after January 25, 2016.
Date & Time Data Types: -
DATE
DATETIME - alias for TIMESTAMP_NTZ.
TIME
TIMESTAMP - alias for one of the TIMESTAMP variations (TIMESTAMP_NTZ by default).
TIMESTAMP_LTZ - TIMESTAMP with local time zone; time zone, if provided, is not stored.
TIMESTAMP_NTZ - TIMESTAMP with no time zone; time zone, if provided, is not stored.
TIMESTAMP_TZ - TIMESTAMP with time zone.
Semi-structured Data Types: -
VARIANT
OBJECT
ARRAY
Geospatial Data Types: -
GEOGRAPHY
GEOMETRY
Analytic functions: -
Ranking Functions (RANK and DENSE_RANK, CUME_DIST, PERCENT_RANK,
ROW_NUMBER)
Windowing Aggregate Functions
SUM, AVG, MAX, MIN, COUNT, STDDEV, VARIANCE, FIRST_VALUE, LAST_VALUE
Reporting Aggregate Functions
LAG/LEAD Functions, FIRST/LAST Functions
Inverse Percentile Functions
PERCENTILE_CONT, PERCENTILE_DISC
Hypothetical Rank and Distribution Functions
RANK | DENSE_RANK | PERCENT_RANK | CUME_DIST
Linear Regression Functions
REGR_COUNT, REGR_AVGY and REGR_AVGX, REGR_SLOPE and
REGR_INTERCEPT, REGR_R2, REGR_SXX, REGR_SYY, and REGR_SXY
Other Statistical Functions
WIDTH_BUCKET Function
Tables: -
1. Permanent: -
The data stored in permanent tables consumes space and contributes to the
storage charges that Snowflake bills to your account.
Permanent tables have a Fail-safe period and provide additional security of data
recovery and protection.
2. Temporary: -
Temporary tables only exist within the session in which they were created and
persist only for the remainder of the session.
They are not visible to other users or sessions.
Once the session ends, data stored in the table is purged completely from the
system and, therefore, is not recoverable, either by the user who created the
table or Snowflake.
3. Transient: -
Transient tables are like permanent tables; the only key difference is that they do
not have a Fail-safe period.
Transient Tables are meant for temporary data that must be kept after each
session but do not require the same level of data protection and recovery as
Permanent Tables.
4. External: -
External tables allow you to query files stored in an external stage as if they were
a regular table, i.e., without moving the data from the files into Snowflake tables.
They access files stored in an external stage such as Amazon S3, Google Cloud
Storage, or Azure Blob storage.
Basically, an external table is a metadata-only table, where the actual files and
records live in cloud storage.
External tables are read-only, therefore no DML operations can be performed on
them, but we can use them in queries and joins.
Querying data from external tables is likely to be slower than querying native
database tables.
We can analyze the data without storing it in Snowflake.
We can also create views against external tables.
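A minimal external-table sketch (the stage name and file layout are assumptions; for CSV files the staged row is exposed through the VALUE variant as c1, c2, ...):

```sql
-- Hypothetical: an external table over CSV files in an existing external stage.
create or replace external table ext_orders (
  order_id number as (value:c1::number),
  amount   number as (value:c2::number)
)
location = @my_s3_stage/orders/
file_format = (type = csv);

-- Queried like a regular (read-only) table:
select order_id, amount from ext_orders;
```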
Stages: -
Snowflake does not allow loading data into a table directly; loading must happen
via a stage location.
A stage is a database object and specifies where data files are stored (staged) so that
the data in the files can be loaded into a table.
Snowflake file formats are used while loading/unloading data from Snowflake stages
into tables using COPY INTO command and while creating EXTERNAL TABLES on files
present in stages.
Types of stages: -
User stage: -
By default, each user has a snowflake stage allocated to them for storing the
files.
User stages are referenced using ‘@~’.
This stage is a convenient option if your files will only be accessed by a single
user but need to be copied into multiple tables.
This option is not appropriate if:
Multiple users require access to the files.
The current user does not have INSERT privileges on the tables the data
will be loaded into.
Unlike named stages, user stages cannot be altered or dropped.
Table stage: -
By default, each table has a snowflake stage allocated to it for storing the files.
Table stage can be referenced using ‘@%’.
This stage is a convenient option if your files need to be accessible to multiple
users and only need to be copied into a single table.
This option is not appropriate if you need to copy the data in the files into
multiple tables.
Unlike named stages, table stages cannot be altered or dropped.
Named stage: -
It is a Snowflake object that we create ourselves. We can list this stage using '@'.
We can use the PUT command to upload files into the stage.
We can use the COPY INTO command to load data from the stage into a table.
There are 2 types of named stages. They are:
Named internal stage.
Named external stage.
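The named-stage workflow above can be sketched like this (stage, file, and table names are hypothetical; PUT runs from a client such as SnowSQL, not the web UI):

```sql
-- Create a named internal stage.
create or replace stage my_stage;

-- Upload a local file into the stage (run from SnowSQL):
-- put file:///tmp/orders.csv @my_stage;

-- List staged files using '@':
list @my_stage;

-- Load data from the stage into a table:
copy into orders
from @my_stage
file_format = (type = csv skip_header = 1);
```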
File format: -
A Snowflake file format is a named database object that describes external data
files so their contents can be translated into tabular form.
Snowflake supports 6 types of file formats (CSV, JSON, AVRO, ORC, PARQUET, XML).
We can assign file formats to a stage, in a copy command, in an external table while loading data
into Snowflake.
CSV and TSV are structured data file formats.
JSON, AVRO, ORC, PARQUET, XML are semi structured data file formats.
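A named file format can be defined once and reused, for example (names and options here are illustrative):

```sql
-- Hypothetical reusable CSV file format.
create or replace file format my_csv_format
  type = csv
  field_delimiter = ','
  skip_header = 1
  null_if = ('NULL', 'null');

-- Referenced by name in a COPY command:
copy into orders
from @my_stage
file_format = (format_name = 'my_csv_format');
```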
Views: -
A view is a database object that contains SQL query built over one or multiple tables.
It is considered as a virtual table that can be used almost anywhere that a table can be
used (filters, joins, subqueries, etc.,).
Whenever you query a view, the underlying SQL query associated with the view gets
executed dynamically and will fetch data from underlying tables.
Views serve a variety of purposes like combining, segregating, protecting data.
Changes to a table are not automatically propagated to views created on that table.
For example, if you drop a column in a table, the views on that table might become
invalid.
Advantages of views: -
Encapsulate complex query logic.
Store common queries in the schema, so they can be reused.
Views Allow Granting Access to a Subset of a Table.
Materialized Views Can Improve Performance.
No need of additional maintenance, auto refresh of results.
Types of views: -
1. Regular Views/Non-materialized views: -
A non-materialized view’s results are created by executing the query at the time
that the view is referenced in a query.
The results are not stored for future use.
Performance is slower as compared to materialized views.
Non-materialized views are the most common type of view.
2. Materialized views: -
Materialized views are designed to improve query performance for workloads
composed of common, repeated query patterns.
A materialized view stores pre-computed result set.
Materialized views require Enterprise edition or higher.
No need to refresh the materialized view manually. It can be refreshed
automatically.
Querying a materialized view gives better performance than querying the base
tables.
It can be created on single table; we can’t build it on multiple tables by joining.
Use materialized view on a table which is queried frequently.
The results of the view are kept up to date automatically, stored and directly
pulled every time the view is referenced.
Storage cost: Materialized view stores query results, which adds to the monthly
storage usage for account.
Compute cost: To prevent materialized view from becoming out-of-date,
snowflake performs automatic background maintenance of materialized views.
When a base table changes, all materialized views defined on the table are
updated by a background service that uses compute resources provided by
snowflake. So, there will be a compute cost associated with it.
3. Secured views: -
A secure view does not allow users to see the definition of the view.
The definition of the view is exposed only to authorized users.
If we don’t want the users to see underlying tables present in a database create
secure view.
The view can be referenced but its underlying definition is not exposed.
Use secure views whenever the view logic must be hidden from the view users.
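The three view types above can be sketched as follows (table and view names are hypothetical):

```sql
-- Regular (non-materialized) view: query runs each time the view is referenced.
create or replace view v_orders as
  select order_id, amount from orders where amount > 0;

-- Materialized view: pre-computed results on a single table (Enterprise edition or higher).
create or replace materialized view mv_daily_totals as
  select order_date, sum(amount) as total
  from orders
  group by order_date;

-- Secure view: definition hidden from unauthorized users.
create or replace secure view sv_orders as
  select order_id, amount from orders;
```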
Important Points:
An ORDER BY clause can be part of a view definition, but Snowflake recommends
excluding it.
Views are not dynamic and do not change automatically unless the underlying sources
are modified.
We cannot use limit clause in a materialized view.
Self-join is also not possible within the materialized view.
Whenever a view is created and granted privileges on that view to a role, the role can
use the view even if the role doesn’t have privileges on the underlying table.
Use materialized views when:
The query results from the view don't change often.
The results of the view are used often.
The query consumes a lot of resources (i.e., the query takes a long time to
process and fetch the data).
Create a regular view when:
The results of the view change often.
The results are not used often.
The query is simple.
The query contains multiple tables.
For secure view snowflake doesn’t show how much data is scanned.
Snowflake accepts the force keyword but doesn’t support it.
Do not query stream objects in the select statement. Streams are not designed to serve
as a source for views or materialized views.
Creating a materialized view requires create materialized view privilege on the schema
and select privilege on the base table.
When you choose a name for a materialized view, note that a schema cannot contain a
table and view with the same name.
We can’t specify a Having and Order By clause in a materialized view.
A materialized view can’t query:
A materialized view
A non-materialized view
A UDTF (User Defined Table Function)
A materialized view can’t include:
UDFs, Limit, Window functions, etc.,
Snowpipe: -
Snowpipe enables loading data from files as soon as they are available in a stage.
This means you can load data from files in micro-batches, making it available to users
within minutes, rather than manually executing COPY statements on a schedule to load
larger batches.
Continuous loading means loading small volumes of data in a continuous manner, e.g.,
every 10 minutes or every hour.
It can be live or real time data.
For loading continuous data into tables, snowflake uses Snowpipe.
The data is loaded according to the COPY statement defined in a referenced pipe.
It uses resources provided by Snowflake; it is a serverless feature.
It is a one-time setup.
The suggested file size for micro-batches is 100-250 MB.
Snowpipe uses file loading metadata associated with each pipe object to prevent
reloading the same files (and duplicating data) in a table.
This metadata stores the path (i.e., prefix) and name of each loaded file, and prevents
loading files with the same name even if they were later modified.
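A minimal pipe definition might look like this (the stage, table, and pipe names are hypothetical; AUTO_INGEST relies on cloud event notifications, e.g. S3 to SQS, configured separately):

```sql
-- Hypothetical: run the embedded COPY whenever new files land in the stage.
create or replace pipe orders_pipe
  auto_ingest = true
as
copy into orders
from @my_s3_stage/orders/
file_format = (type = csv skip_header = 1);
```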
Zero copy cloning: -
Snowflake allows you to create clones, also known as zero copy clones.
We can perform clone operation on databases, schemas, tables, streams, file formats,
stages, tasks.
We can maintain multiple copies of data with no additional storage cost, hence "zero copy".
A snapshot of data present in the source object is taken when the clone is created and is
made available to cloned object.
The cloned object and its source are independent of each other.
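Cloning is a single statement at each object level (names are hypothetical); only metadata is copied at clone time:

```sql
create table orders_dev clone orders;
create schema raw_backup clone raw;
create database sales_db_clone clone sales_db;
```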
Streams: -
A Stream is an object that records DML changes made to table including insert, update
and delete.
A stream records an update operation as a pair: a delete (of the old record) and an
insert (of the new record).
It tracks all row level changes to a source table using offset but doesn’t store the
changed data.
We call this process as change data capture (CDC).
Streams store metadata about each change, so that actions can be taken using this
metadata.
Streams can be combined with tasks to set continuous data pipeline.
Snowpipe + stream + task = continuous data pipeline.
Along with changes made to table streams maintain 3 metadata fields i.e.,
METADATA$ACTION, METADATA$ISUPDATE and METADATA$ROW_ID.
METADATA$ACTION   METADATA$ISUPDATE   Action
INSERT            FALSE               Identifies insert records
INSERT            TRUE                Identifies update records
DELETE            FALSE               Identifies delete records
Types: -
1) Standard stream/Delta stream: - A standard stream records all DML changes made to
table including insert, update and delete.
Create or replace stream stream_name on table table_name;
2) Append-only streams - It tracks row inserts only. Update and delete (including table
truncate) operations are not recorded.
Create or replace stream stream_name on table table_name Append_only = true;
3) Insert only stream: - It tracks only row inserts for external tables only. They do not
record delete operations.
Create or replace stream stream_name on external table table_name insert_only = true;
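Consuming a stream can be sketched as below (table names are hypothetical). Note that reading a stream inside a DML statement advances its offset, so the same changes are not processed twice:

```sql
create or replace stream orders_stream on table orders;

-- Move newly inserted rows into a history table; the stream offset
-- advances once this DML statement commits.
insert into orders_history
select order_id, amount, metadata$action, metadata$isupdate
from orders_stream
where metadata$action = 'INSERT';
```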
Tasks: -
We use tasks for scheduling in snowflake.
We can schedule
SQL queries
Stored procedures
Tasks can be combined with table streams for implementing the continuous change data
captures.
We can maintain DAG of tasks to keep the dependencies between tasks.
Tasks require compute resources to execute SQL code, we can choose either of
Snowflake managed compute resources (serverless) --> introduced recently
(even though we don’t mention warehouse it will consume snowflake compute
resources)
User managed (Virtual warehouses)
DAG of tasks: -
DAG – Directed Acyclic Graph.
To maintain dependencies between tasks.
A root task followed by child tasks.
Just schedule root task, child tasks will be executed in order.
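A two-node DAG might be sketched like this (task, warehouse, and table names are hypothetical; only the root task carries a schedule, children use AFTER):

```sql
-- Root task: runs every 10 minutes.
create or replace task load_task
  warehouse = demo_wh
  schedule  = '10 MINUTE'
as
  insert into orders_history select * from orders_stream;

-- Child task: runs after the root completes.
create or replace task cleanup_task
  warehouse = demo_wh
  after load_task
as
  delete from staging_orders where loaded = true;

-- Tasks are created suspended; resume children first, then the root.
alter task cleanup_task resume;
alter task load_task resume;
```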
Time travel: -
It enables access to historical data, i.e., data that has been changed or deleted, at any
point within a defined period.
It serves as a powerful tool for performing the following tasks:
Restoring data-related objects (tables, schemas, and databases) that might have
been accidentally or intentionally deleted.
Duplicating and backing up data from key points in the past.
Analyzing data usage/manipulation over specified periods of time.
Retention period:
It specifies the number of days for which historical data is preserved. The higher the
retention period, the higher the storage cost.
Increasing the retention period causes data currently in Time Travel to be retained for
the longer period.
For example, if you have a table with a 10-day retention period and increase the period
to 20 days, data that would have been removed after 10 days is now retained for an
additional 10 days before moving into Fail-safe.
Note that this doesn’t apply to any data that is older than 10 days and has already
moved into Fail-safe.
Decreasing Retention reduces the amount of time data is retained in Time Travel:
For active data modified after the retention period is reduced, the new shorter period
applies.
For data that is currently in Time Travel:
If the data is still within the new shorter period, it remains in Time Travel.
If the data is outside the new period, it moves into Fail-safe.
For example, if you have a table with a 10-day retention period and you decrease the
period to 1-day, data from days 2 to 10 will be moved into Fail-safe, leaving only the
data from day 1 accessible through Time Travel.
Changing the retention period for your account or individual objects changes the value
for all lower-level objects that do not have a retention period explicitly set. For example:
If you change the retention period at the account level, all databases, schemas,
and tables that do not have an explicit retention period automatically inherit the
new retention period.
If you change the retention period at the schema level, all tables in the schema
that do not have an explicit retention period inherit the new retention period.
Keep this in mind when changing the retention period for your account or any objects in
your account because the change might have Time Travel consequences that you did
not anticipate or intend. In particular, we do not recommend changing the retention
period to 0 at the account level.
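Setting the retention period at different levels is a single ALTER statement (the table name is hypothetical; account-level changes require the ACCOUNTADMIN role):

```sql
-- Account level: inherited by all objects without an explicit setting.
alter account set data_retention_time_in_days = 30;

-- Object level: explicit setting overrides the inherited value.
alter table orders set data_retention_time_in_days = 10;
```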
Fail safe:
Fail-safe provides a (non-configurable) 7-day period during which historical data may be
recoverable by Snowflake.
This period starts immediately after the Time Travel retention period ends.
Fail-safe is a data recovery service that is provided on a best effort basis and is intended
only for use when all other recovery options have been attempted.
Fail-safe is not provided as a means for accessing historical data after the Time Travel
retention period has ended. It is for use only by Snowflake to recover data that may
have been lost or damaged due to extreme operational failures.
Data recovery through Fail-safe may take from several hours to several days to
complete.
Querying historical data: -
1. The following query selects historical data from a table as of the date and time
represented by the specified timestamp:
select * from table_name at (timestamp => 'wed, 28 sep 2022 [Link]'::timestamp);
2. The following query selects historical data from a table as of 5 minutes ago:
select * from table_name at (offset => -60*5);
3. The following query selects historical data from a table up to, but not including, any
changes made by the specified statement:
select * from table_name before (statement => '****query_id****');
Column level security: -
Column level security in snowflake allows the application of a masking policy to a
column within a table or view.
It protects sensitive data such as customers' PHI, bank balances, etc.
It includes two features
Dynamic data masking
External Tokenization
Dynamic data masking is the process of hiding data by masking with other characters.
We can create masking policies to hide the data present in columns.
External Tokenization is the process of hiding sensitive data by replacing it with cipher
text. External tokenization makes use of masking policies with external functions
created on the external cloud provider side.
Masking policies: -
Snowflake supports masking policies to protect sensitive data from unauthorized access
while allowing authorized users to access it at query runtime.
Masking policies are schema level objects.
Masking policies can include conditions and functions to transform the data when these
conditions are met.
Same masking policy can be applied on multiple columns.
Dynamic data masking: -
Sensitive data in Snowflake is not modified in the existing table. When users execute
a query, the masking is applied dynamically and the masked data is displayed; hence
the name Dynamic Data Masking.
The data can be masked, partially masked, obfuscated (made unclear), or tokenized.
Unauthorized users can operate on the data as usual, but they cannot view it.
Masking policies are mostly applied based on roles.
Limitations: -
Before dropping masking policies, we must unset them.
The data types of the input and output values must be the same.
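A masking policy and its application can be sketched as follows (the policy, role, table, and column names are hypothetical):

```sql
-- Reveal email only to an authorized role; mask it for everyone else.
create or replace masking policy email_mask as (val string) returns string ->
  case
    when current_role() in ('PII_ADMIN') then val
    else '*** MASKED ***'
  end;

alter table customers modify column email set masking policy email_mask;

-- Before dropping the policy, it must be unset from every column:
-- alter table customers modify column email unset masking policy;
```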
Caching: -
Caching is temporary storage that keeps copies of files or data so they can be
accessed faster in the near future.
Cache plays a vital role in saving costs and speeding up results.
It improves query performance.
Types of caches in snowflake
Metadata Caching
Query results cache (or) Result cache
Local disk cache (or) Warehouse cache
Metadata cache: -
Metadata about tables and micro-partitions is collected and managed by Snowflake
automatically.
Snowflake does not need compute to provide range values such as MIN, MAX, number of
distinct values, NULL count, row count, and clustering information.
Fetching metadata is fast.
Results cache: -
The results cache is located in the cloud services layer. Cached results are available for
the next 24 hours.
The results cache is available and can be accessed across different virtual warehouses.
A query result returned to one user is available to any other user on the system who
executes the same query.
It works as long as the underlying data has not changed.
The mandatory condition here is that the query must be identical.
It won't work for a subset of the data, and won't work if we re-order columns.
Local disk cache: -
The local disk cache is located in the virtual warehouse (cached data is stored on EC2
instances if the Snowflake account is hosted on AWS, and on virtual machines if hosted
on Azure).
It caches (stores) the data, not the results, fetched by SQL queries (we can re-order
columns, i.e., an identical query is not required).
Whenever data is needed for a given query, it is retrieved from remote disk storage
and cached in SSD and memory (the first time).
Cached data is only available while the VW is up and running.
Once the VW is suspended, the cache is deleted.
It even works when we query a subset of the data that is available in the local disk cache.
E.g., suppose we query 10k records for the first time; the local disk cache will hold these
10k records, and next time if we query only 2k or 3k records, a subset of the above 10k,
they will be fetched from the local disk cache.
This cache depends on the virtual warehouse size we are using.
E.g., a small VW can't hold millions of records, but it can fetch part of the data from the
local disk cache and the rest from remote disk.
Imp. Points: -
Caching helps not only to improve performance but also saves a lot of compute credits.
Snowflake's architecture has 3 major components; the cloud services layer caches the query
result, sometimes referred to as the Result Set Cache or Query Result Cache.
The result set cache holds the results of every query executed in the past 24 hours.
The result cache is available across all virtual warehouses.
The result set cache is invalidated by Snowflake when the underlying data changes.
The result set cache is not used when the newly submitted query does not match a
previously executed query: a result stays usable as long as the underlying data doesn't
change and you submit a word-for-word identical query within 24 hours of the original
query.
Query result is reused if the following criteria is met,
New query syntactically matches the previously executed query.
Query doesn’t include functions that are evaluated at execution time (excluding
current date).
Query doesn’t include UDFs or external functions.
The underlying data has not changed.
Each time the persisted result for a query is reused, Snowflake resets the 24-hour retention
period for the result, up to a maximum of 31 days from the date and time the query was
first executed.
Snowflake provides a table function, RESULT_SCAN, that returns the result of a query
executed within the last 24 hours (from when you executed the query) as if the result
were a table.
The role accessing the cached results should have required privileges to the underlying
tables.
The size of the warehouse cache is determined by the compute resources in the warehouse
(i.e., the larger the warehouse, the more compute resources, and the larger the cache).
Decreasing the size of a running warehouse removes compute resources from the
warehouse. When the compute resources are removed, the cache associated with those
resources is dropped.
Any kind of caching doesn’t incur any storage cost.
Queries that evaluate functions at execution time (current_timestamp, etc.) can't use the
result cache; the current_date() function is an exception.
Table record counts are stored in Snowflake's cloud services layer, and this information is
fetched from the metadata service or metadata cache.
SHOW TABLES; is a metadata operation and doesn't need a virtual warehouse or any
cached data usage; Snowflake uses the metadata cache to fetch the result.
A security token used to access large, persisted query results (i.e. greater than 100KB in
size) expires after 6 hours. A new token can be retrieved to access results while they are still
in cache. Smaller persisted query results do not use an access token.
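RESULT_SCAN from the points above can be used like this (the table name is hypothetical):

```sql
-- Run a query, then treat its cached result as a table.
select order_id, amount from orders;

select count(*)
from table(result_scan(last_query_id()));
```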
Access Control: -
Access control privileges determine who can access database objects and perform
operations on specific objects in Snowflake.
Snowflake’s approach to access control combines aspects from both of the following
models:
Discretionary Access Control (DAC): Each object has an owner, who can in turn
grant access to that object.
Role-based Access Control (RBAC): Access privileges are assigned to roles, which
are in turn assigned to users.
The key concepts to understanding access control in Snowflake are:
Securable object: An entity to which access can be granted. Unless allowed by a
grant, access is denied. Tables, Schemas, Views etc.
Role: An entity to which privileges can be granted. Roles are in turn assigned to
users. Note that roles can also be assigned to other roles, creating a role
hierarchy.
Privilege: A defined level of access that can be granted to an object. Multiple
distinct privileges may be used to control the granularity of access granted.
User: Specifies the person or system to whom access was granted.
In the Snowflake model, access to securable objects is allowed via privileges assigned to
roles, which are in turn assigned to other roles or users. In addition, each securable
object has an owner that can grant access to other roles.
This model is different from a user-based access control model, in which rights and
privileges are assigned to each user or group of users. The Snowflake model is designed
to provide a significant amount of both control and flexibility.
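The RBAC flow above (privileges to a role, role to a user) can be sketched as follows (role, object, and user names are hypothetical):

```sql
create role analyst;

-- Privileges are granted to the role...
grant usage on database sales_db to role analyst;
grant usage on schema sales_db.raw to role analyst;
grant select on all tables in schema sales_db.raw to role analyst;

-- ...and the role is granted to the user.
grant role analyst to user alice;
```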
Data Sharing: -
Secure Data Sharing enables sharing selected objects in a database in your account with
other Snowflake accounts.
The following Snowflake database objects can be shared:
Tables
External tables
Secure views
Secure materialized views
Secure UDFs
Provider is the one who is sharing the data.
Consumer is the one who is consuming the data from data provider.
We can share data in the following ways.
1. Account to account share (Direct share)
In this provider can share data to consumer, who is in same cloud and
same region.
Let’s say the provider account exists on AWS cloud with the US-east-1
region, to provide data to consumer, the consumer account must also
exist on AWS cloud with US-east-1.
In this type of sharing consumer will have to pay only for compute and
provider will pay for the storage.
2. Reader account (Direct Share)
In this case the provider needs to share the data to the consumer who
don't have snowflake account, then the provider can create reader
account and allows consumer to access the data.
In this type of sharing consumer will have to pay for both compute and
storage.
Reader account users cannot perform any DML operations; they only have
SELECT access.
3. Cross cloud and Cross region (DATA REPLICATION)
If the provider wants to share data with a consumer who is in the same
cloud but a different region, or in a different cloud and different region,
we need to replicate the data.
Let’s say the provider account exists on AWS cloud with the US-east-1
region and the consumer account in AWS cloud with US-west-1 or in
GCP/AZURE then we need to replicate the data.
In this way of sharing snowflake makes a copy of data to the consumer
account.
This way of sharing is costlier.
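A direct share on the provider side can be sketched as follows (share, object, and account names are hypothetical; the consumer then creates a database from the share):

```sql
create share sales_share;

-- Grant the objects to be shared.
grant usage on database sales_db to share sales_share;
grant usage on schema sales_db.raw to share sales_share;
grant select on table sales_db.raw.orders to share sales_share;

-- Make the share visible to a consumer account (account locator is hypothetical).
alter share sales_share add accounts = consumer_account;
```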
User Defined Functions: -
A UDF allows you to perform operations that are not available through the built-in,
system-defined functions.
Create a UDF whenever there is a need to reuse the same functionality.
Snowflake supports 4 languages for writing UDFs.
SQL
Java Script
Java
Python
Snowflake UDFs can return scalar (a single value or string) or tabular results.
Snowflake UDF overloading means functions with the same name but different
parameters are supported.
Proc_calculate_area() is different from Proc_calculate_area(radius float)
Proc_calculate_area(radius float) is different from Proc_calculate_area(length
number, width number)
Sample UDFs:
SCALAR: Returns an output for each input we pass.
create function area_of_circle(radius float)
returns float
as
$$
pi() * radius * radius
$$;

select area_of_circle(4.5);

TABULAR: Can return zero, one, or multiple rows.
create function t()
returns table(name varchar, age number)
as
$$
select 'RAVI', 34
union
select 'LATHA', 27
union
select 'MADHU', 25
$$;
Note: - In 99% of cases we do not use tabular UDFs; we can use stored procedures
instead.
How can you handle it if data coming from a file exceeds the length of a column in the table?
We can handle this by specifying TRUNCATECOLUMNS = TRUE in the COPY command. If we
don't specify this, the COPY command will fail. By default it is set to FALSE.
Does Snowflake support indexes?
No, we can't define indexes on Snowflake tables; instead we can use clustering keys on
larger tables for better performance.
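A clustering key is declared with a single ALTER statement (the table and columns here are hypothetical):

```sql
-- Cluster a large table on commonly filtered columns instead of an index.
alter table orders cluster by (order_date, region);

-- Check how well the table is clustered on those columns:
select system$clustering_information('orders', '(order_date, region)');
```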
What do you mean by Horizontal and Vertical Scaling?
Horizontal Scaling: Horizontal scaling increases concurrency by scaling horizontally. As
your customer base grows, you can use auto-scaling to increase the number of virtual
warehouses, enabling you to respond instantly to additional queries.
Vertical Scaling: Vertical scaling involves increasing the processing power (e.g., CPU,
RAM) of an existing machine, which can reduce processing time. Consider choosing a
larger virtual warehouse size if you want to optimize your workload and make it run
faster.