
What is Snowflake Dynamic Data Masking?

The Snowflake Data Cloud has a number of powerful features that empower organizations to make
more data-driven decisions.
In this blog, we’re going to explore Snowflake’s Dynamic Data Masking feature in detail, including
what it is, how it helps, and why it’s so important for security purposes.

What is Snowflake Dynamic Data Masking?

Snowflake Dynamic Data Masking (DDM) is a data security feature that lets you obscure selected portions
of data (in a table or a view) at query time, using a predefined masking strategy to preserve anonymity.

Data owners can decide how much sensitive data to reveal to different data consumers or data
requestors using Snowflake's Dynamic Data Masking feature, which helps prevent both accidental and
intentional exposure. It's a policy-based security feature that keeps the data in the database unchanged
while hiding sensitive data (e.g., PII, PHI, payment card data) in the query result set for specific database
fields.

For example, a call center agent may be able to identify a customer by checking the final four
characters of their Social Security Number (SSN) or PII field, but the entire SSN or PII field of the
customer should not be shown to the call center agent (data requester).
A Dynamic Data Masking (also known as on-the-fly data masking) policy can be defined to hide part
of the SSN or PII field so that the call center agent (data requester) does not get access to the sensitive
data. Likewise, an appropriate data masking policy can be defined to protect SSNs or other PII
fields, allowing production support members to query production environments for troubleshooting
without seeing any SSN or other PII values, thereby satisfying compliance requirements.
Figure 1: Data Masking Using Masking Policy in Snowflake
The intention of Dynamic Data Masking is to protect the actual data by substituting or hiding it from
non-privileged users wherever the real values are not required, without changing or altering the data at rest.

Static vs Dynamic Data Masking


There are two types of data masking: static and dynamic. Static Data Masking (SDM) permanently
replaces sensitive data by modifying data at rest. Dynamic Data Masking (DDM) replaces sensitive data
in transit while keeping the original data at rest intact and unchanged; the unmasked data remains
visible in the actual database. DDM is primarily used to apply role-based (object-level) security for
databases.

Reasons for Data Masking

Data is masked for different reasons. The main reason is risk reduction: limiting the possibility of a
sensitive data leak, in line with guidelines set by security teams. Data is also masked for commercial
reasons, such as masking financial data that should not be common knowledge even within the
organization. There are also compliance reasons, driven by requirements or recommendations in
standards and regulations such as GDPR, SOX, HIPAA, and PCI DSS.
Such projects are usually initiated by data governance or compliance teams, often based on
requirements from the privacy office or legal team that personally identifiable information must be protected.

How Dynamic Data Masking Works in Snowflake

In Snowflake, Dynamic Data Masking is applied through masking policies. Masking policies are
schema-level objects that can be applied to one or more columns in a table or a view (standard &
materialized) to selectively hide or obfuscate data according to the level of anonymity needed.
Once created and associated with a column, the masking policy is applied to the column at query
runtime at every position where the column appears.
Figure 2: How Dynamic Data Masking Works in Snowflake

Masking Policy SQL Construct

To apply Dynamic Data Masking, masking policy objects need to be created. Like many
other objects in Snowflake, a masking policy is a securable, schema-level object.
The following is an example of a simple masking policy that masks the SSN based on a
user's role.

Figure 3: Dynamic Data Masking SQL construct


-- creating a normal dynamic masking policy
create or replace masking policy mask_ssn as (ssn_txt string)
returns string ->
  case
    when current_role() in ('SYSADMIN') then ssn_txt
    when current_role() in ('CALL_CENTER_AGENT') then
      regexp_replace(ssn_txt, substring(ssn_txt, 1, 7), 'xxx-xx-')
    when current_role() in ('PROD_SUPP_MEMBER') then 'xxx-xx-xxxx'
    else '***Masked***'
  end;

The masking policy name, "mask_ssn", is the unique identifier within the schema. The signature of
the masking policy specifies the input column (ssn_txt in this example) along with its data type
(string) to evaluate at query runtime. The return data type must match the input data type, and it is
followed by the SQL expression that transforms or masks the data (ssn_txt here). The SQL expression
can include built-in functions, UDFs, or conditional expression functions (like CASE in this example).

In the above example, the SSN is partially masked if the current role of the user
is CALL_CENTER_AGENT. If the user's role is PROD_SUPP_MEMBER, the policy replaces all the
numeric characters with the character x. For any other role, it returns the fixed string ***Masked***.
Once the masking policy is created, it needs to be applied to a table or view column. This can be done
during the table or view creation or using an alter statement.

Figure 4: How to apply dynamic data masking to a column


-- Customer table DDL & apply masking policy
create or replace table customer(
  id number,
  first_name string,
  last_name string,
  DoB string,
  ssn string masking policy mask_ssn,
  country string,
  city string,
  zipcode string);

-- For an existing table or view, execute the following statements:


alter table if exists customer modify column ssn set masking policy mask_ssn;
Once the masking policy is applied, and a user (with a specific role) queries the table, the user will see
the query result as shown below.

Figure 5: Data Masking applied at query run time

Multiple Masking Policies Example

We can create multiple masking policies and apply them to different columns at the same time.
In the previous example, we masked the customer table's SSN column. We can create additional
masking policies for the first_name, last_name, and date of birth columns, then alter the customer
table to apply them.
-- masking policy to mask first name
create or replace masking policy mask_fname as (fname_txt string) returns string ->
  case
    when current_role() in ('CALL_CENTER_AGENT') then 'xxxxxx'
    when current_role() in ('PROD_SUPP_MEMBER') then 'xxxxxx'
    else NULL
  end;

-- apply mask_fname masking policy to customer.first_name column
alter table if exists customer modify column first_name set masking policy mydb.myschema.mask_fname;

-- masking policy to mask last name
create or replace masking policy mydb.myschema.mask_lname as (lname_txt string) returns string ->
  case
    when current_role() in ('CALL_CENTER_AGENT') then lname_txt
    when current_role() in ('PROD_SUPP_MEMBER') then 'xxxxxx'
    else NULL
  end;

-- apply mask_lname masking policy to customer.last_name column
alter table if exists mydb.myschema.customer modify column last_name set masking policy mydb.myschema.mask_lname;

-- masking policy to mask date of birth
create or replace masking policy mydb.myschema.mask_dob as (dob_txt string) returns string ->
  case
    when current_role() in ('CALL_CENTER_AGENT') then
      regexp_replace(dob_txt, substring(dob_txt, 1, 8), 'xxxx-xx-')
    when current_role() in ('PROD_SUPP_MEMBER') then 'xxxx-xx-xx'
    else NULL
  end;

-- apply mask_dob masking policy to customer.dob column
alter table if exists mydb.myschema.customer modify column dob set masking policy mydb.myschema.mask_dob;
Once these masking policies are created & applied, and a user (with a specific role) queries the table,
the user will see the query result as shown below.

Figure 6: Multiple Data Masking Policies Applied Example

Dynamic Masking & Run Time Query Execution

The best aspect of Snowflake’s data masking strategy is that end users can query the data without
knowing whether or not the column has a masking policy. Whenever Snowflake discovers a column
with a masking policy associated, the Snowflake query engine transparently rewrites the query at
runtime.

For authorized users, query results return sensitive data in plain text, whereas sensitive data is
masked, partially masked, or fully masked for unauthorized users.
If we take our customer data set, where masking policies are applied on different columns, the queries
submitted by a user and the queries Snowflake automatically rewrites and executes look as follows.

Query type: Simple query
Query submitted by user: select dob, ssn from customer;
Query rewritten by Snowflake: select mask_dob(dob), mask_ssn(ssn) from customer;

Query type: Query with a WHERE clause predicate
Query submitted by user: select dob, ssn from customer where ssn = '576-77-4356';
Query rewritten by Snowflake: select mask_dob(dob), mask_ssn(ssn) from customer where mask_ssn(ssn) = '576-77-4356';

Query type: Query with a join condition and GROUP BY
Query submitted by user: select first_name, count(1) from customer c join orders o on c.first_name = o.first_name group by c.first_name;
Query rewritten by Snowflake: select mask_fname(first_name), count(1) from customer c join orders o on mask_fname(c.first_name) = o.first_name group by mask_fname(c.first_name);

The rewrite is performed in all places where the protected column is present in the query, such as in
“projections”, “where” clauses, “join” predicates, “group by” statements, or “order by” statements.
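As a further illustration (a hypothetical sketch based on the customer table above, not an exact trace of the engine), a query that sorts on a protected column is rewritten in the same way:

-- query submitted by the user
select first_name, ssn from customer order by ssn;

-- conceptually, the query Snowflake executes at runtime
select mask_fname(first_name), mask_ssn(ssn) from customer order by mask_ssn(ssn);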

Conditional Masking Policy in Snowflake

There are cases where data masking on a particular field depends on other column values besides user
roles. To handle such scenarios, Snowflake supports conditional masking policies; to enable this
feature, additional input columns can be passed as arguments along with their data types.
Let's say a user has opted to show their educational details publicly, while this flag is false for many
other users. In such a case, the user's education details are masked only if the public visibility flag
is false; otherwise the field is not masked.
Figure 7: Conditional Data Masking Policies SQL Construct
-- DDL for user table
create or replace table user (
  id number,
  first_name string,
  last_name string,
  DoB string,
  highest_degree string,
  visibility boolean,
  city string,
  zipcode string
);

-- User table sample dataset


insert into user values
(100,'Francis','Rodriquez','1988-01-27','Graduation',true,'Atlanta',30301),
( 101,'Abigail','Nash','1978-09-18', 'Post Graduation',false,'Denver',80201),
( 102,'Kasper','Short','1996-07-29', 'None',false,'Phoenix',85001);

-- create conditional masking policy using visibility field
create or replace masking policy mask_degree as (degree_txt string, visibility boolean) returns string ->
  case
    when visibility = true then degree_txt
    else '***Masked***'
  end;
-- apply masking policy
alter table if exists mydb.myschema.user modify column highest_degree set masking policy mydb.myschema.mask_degree using (highest_degree, visibility);
Once this conditional masking policy is created & applied, and a user (with a specific role) queries the
table, the user will see the query result as shown below.
Figure 8: Conditional Data Masking Example

What Are The Benefits of Dynamic Data Masking?

- A new masking policy can be created quickly and easily with no overhead of historic loading of data.
- You can write a policy once and have it apply to thousands of columns across databases and schemas (see the sketch after this list).
- Masking policies are easy to manage and support centralized and decentralized administration models.
- Easily mask data before sharing.
- Easily change masking policy content without having to reapply the masking policy to thousands of columns.
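To make the write-once, apply-everywhere point above concrete, here is a minimal sketch; the table, schema, and view names are hypothetical, and it assumes the mask_ssn policy created earlier.

-- the same policy object can protect columns in different tables, schemas, and views
alter table mydb.sales.customers modify column ssn set masking policy mydb.myschema.mask_ssn;
alter table mydb.finance.invoices modify column customer_ssn set masking policy mydb.myschema.mask_ssn;
alter view mydb.reporting.customer_v modify column ssn set masking policy mydb.myschema.mask_ssn;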

Points to Remember When Working With Data Masking

- Masking policies carry over to cloned objects.
- Masking policies cannot be applied to virtual columns (external tables). If you need to apply a data masking policy to a virtual column, you can create a view on the virtual columns and apply policies to the view columns (see the sketch after this list).
- Since all columns of an external table are virtual except the VALUE variant column, you can apply a data masking policy only to the VALUE column.
- Materialized views can't be created on table columns with masking policies applied. However, you can apply masking policies to materialized view columns as long as there's no masking policy on the underlying columns.
- The result set cache isn't used for queries that contain columns with masking policies.
- A data sharing provider cannot create a masking policy in a reader account.
- A data sharing consumer cannot apply a masking policy to a shared database or table.
- Future grants of masking policy privileges are not supported.
- To delete a database or schema, the masking policy and its mappings must be self-contained within that database or schema.
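To illustrate the external table workaround mentioned above, here is a hedged sketch; the external table, view, and column layout are hypothetical, and it assumes the mask_ssn policy already exists.

-- external table columns (other than VALUE) are virtual, so mask them through a view instead
create or replace view mydb.myschema.ext_customer_v as
  select value:c1::string as ssn,
         value:c2::string as first_name
  from mydb.myschema.ext_customer;

alter view mydb.myschema.ext_customer_v modify column ssn set masking policy mydb.myschema.mask_ssn;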

Conclusion

Snowflake’s Dynamic Data Masking is a very powerful feature that allows you to bring all kinds of
sensitive data into your data platform and manage it at scale.

Snowflake’s policy-based approach, along with role-based access control (RBAC), allows you to
prevent sensitive data from being viewed by table/view owners and users with privileged
responsibilities.
If you're looking to take advantage of Snowflake's Dynamic Data Masking feature, the data experts at
phData would love to help make this a reality. Feel free to reach out today for more information.
1. What is Snowflake Dynamic Data Masking?
Snowflake Dynamic Data Masking is a security feature that allows organizations to mask
sensitive data in their database tables, views, and query results in real-time. This is useful for
protecting sensitive information from unauthorized access or exposure.
Snowflake Dynamic Data Masking allows the data to be masked as it is accessed, rather than being
permanently altered in the database.
With Dynamic Data Masking, users can choose which data to mask and how it should be masked,
such as by replacing sensitive information with dummy values or by partially revealing data. This can
be done at the column level, meaning that different columns can be masked differently depending on
the sensitivity of the data they contain.
2. Steps to apply Snowflake Dynamic Data Masking on a column
Follow the below steps to perform Dynamic Data Masking in Snowflake.
- Step-1: Create a Custom Role with Masking Privileges
- Step-2: Assign the Masking Role to an existing Role/User
- Step-3: Create a Masking Policy
- Step-4: Apply the Masking Policy to a Table or View Column
- Step-5: Verify the masking rules by querying data
Step-1: Create a Custom Role with Masking Privileges
The below SQL statement creates a custom role MASKINGADMIN in Snowflake.
create role MASKINGADMIN;
The below SQL statement grants privileges to create masking policies to the role MASKINGADMIN.
grant create masking policy on schema MYDB.MYSCHEMA to role MASKINGADMIN;
The below SQL statement grants privileges to apply masking policies to the role MASKINGADMIN.
grant apply masking policy on account to role MASKINGADMIN;
Step-2: Assign Masking Role to an existing Role/User
The MASKINGADMIN role, by default, will not have access to any database or warehouse. The role
needs to be assigned to another custom role or a user who has privileges to access a database and
warehouse.
The below SQL statement assigns MASKINGADMIN to another custom role named
DATAENGINEER.
grant role MASKINGADMIN to role DATAENGINEER;
This allows all users with the DATAENGINEER role to inherit masking privileges. Instead, if you want to
limit the masking privileges, assign the role to individual users.
The below SQL statement assigns MASKINGADMIN to a User named STEVE.
grant role MASKINGADMIN to user STEVE;
Step-3: Create a Masking Policy
The below SQL statement creates a masking policy STRING_MASK that can be applied to columns
of type string.
create or replace masking policy STRING_MASK as (val string) returns string ->
case
when current_role() in ('DATAENGINEER') then val
else '*********'
end;
This masking policy masks the data applied on a column when queried from a role other than
DATAENGINEER.
Step-4: Apply (Set) the Masking Policy to a Table or View Column
The below SQL statement applies the masking policy STRING_MASK on a column named
LAST_NAME in EMPLOYEE table.
alter table if exists EMPLOYEE modify column LAST_NAME set masking policy STRING_MASK;
Note that prior to dropping a policy, the policy needs to be unset from all the tables and views on
which it is applied.
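For example, a minimal sketch of that clean-up sequence, assuming STRING_MASK is set only on the LAST_NAME column of the EMPLOYEE table:

alter table if exists EMPLOYEE modify column LAST_NAME unset masking policy;
drop masking policy if exists STRING_MASK;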
Step-5: Verify the masking rules by querying data
Verify the data present in EMPLOYEE table by querying from two different roles.
The below image shows data present in EMPLOYEE when queried from DATAENGINEER role.

Unmasked data when queried from DATAENGINEER role


The below image shows data present in EMPLOYEE when queried from ANALYST role where the
data present in LAST_NAME column is masked.
Masked data when queried from ANALYST role
3. Remove (Unset) Masking Policy on a column
The below SQL statement removes (unsets) a masking policy applied on a column present in a table.
alter table if exists EMPLOYEE modify column LAST_NAME unset masking policy;
4. Partial Data Masking in Snowflake
Snowflake also supports partially masking the column data.
The below SQL statement creates a masking policy EMAIL_MASK which partially masks the email
data when queried from the ANALYST role, leaving the email domain unmasked.
create or replace masking policy EMAIL_MASK as (val string) returns string ->
case
    when current_role() in ('DATAENGINEER') then val
    when current_role() in ('ANALYST') then regexp_replace(val,'.+\@','*****@') -- leave email domain unmasked
    else '********'
end;
The below SQL statement applies the masking policy EMAIL_MASK on a column named EMAIL in
EMPLOYEE table.
alter table if exists EMPLOYEE modify column EMAIL set masking policy EMAIL_MASK;
The below image shows data present in EMPLOYEE when queried from ANALYST role where the
data present in EMAIL column is partially masked.
Partial Data Masking
5. Conditional Data Masking in Snowflake
Conditional Data Masking allows you to selectively apply the masking on a column by using a
different column to determine whether data in a given column should be masked.
The CREATE MASKING POLICY syntax consists of two arguments. The first
column always specifies the column to mask. The second column is a conditional column to evaluate
whether the first column should be masked.
The below SQL statement reveals the data when the value of the conditional column is less than 105 and masks it otherwise.
create or replace masking policy EMAIL_MASK as (mask_col string, cond_col number) returns string ->
case
    when cond_col < 105 then mask_col
    else '*********'
end;
The below SQL statement applies the masking policy EMAIL_MASK on a column named EMAIL
based on the value of the conditional column EMPLOYEE_ID present in the EMPLOYEE table.
alter table if exists EMPLOYEE modify column EMAIL set masking policy EMAIL_MASK
using(email, employee_id);
The below image shows the output of a query from EMPLOYEE table where the EMAIL data is
masked based on the value of EMPLOYEE_ID.
Conditional Data Masking
6. Altering Masking Policies in Snowflake
Snowflake supports modifying the existing masking policy rules with new rules and renaming of a
masking policy. The changes done to the masking policy will go into effect when the next SQL query
that uses the masking policy runs.
Below is the syntax to alter the existing masking policy in Snowflake.
ALTER MASKING POLICY [ IF EXISTS ] <name> SET BODY -> <expression_on_arg_name>

ALTER MASKING POLICY [ IF EXISTS ] <name> RENAME TO <new_name>

ALTER MASKING POLICY [ IF EXISTS ] <name> SET COMMENT = '<string_literal>'
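For example, a hedged sketch that broadens the STRING_MASK policy created earlier so that an additional role (an assumed ANALYST role) also sees unmasked data:

alter masking policy STRING_MASK set body ->
  case
    when current_role() in ('DATAENGINEER','ANALYST') then val
    else '*********'
  end;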


7. Extracting information of existing Masking Policies in Snowflake
SHOW MASKING POLICIES
Lists masking policy information, including the creation date, database and schema names, owner,
and any available comments.
The below SQL statement extracts the masking policies present in the database and schema of the
current session.
show masking policies;

Listing all masking policies


The below SQL statement extracts all the masking policies present across the account.
show masking policies in account;
DESCRIBE MASKING POLICY
Describes the details about a masking policy, including the creation date, name, data type, and SQL
expression.
The below SQL statement extracts information of the masking policy STRING_MASK.
describe masking policy STRING_MASK;

Extracting details of a Masking Policy


8. Dropping Masking Policies in Snowflake
A Masking Policy in Snowflake cannot be dropped successfully if it is currently assigned to a column.
Follow below steps to drop a Masking Policy in Snowflake
1. Find the columns on which the policy is applied.
The below SQL statement lists all the columns on which EMAIL_MASK masking policy is applied.
select * from table(information_schema.policy_references(policy_name=>'EMAIL_MASK'));

Finding all the columns on which Masking Policy is applied


2. Once the columns on which the masking policy is applied are found, UNSET the masking policy
from those columns.
3. Drop the masking policy.
The below SQL statement drops the masking policy named EMAIL_MASK.
drop masking policy EMAIL_MASK;
9. Limitations of Snowflake Dynamic Data Masking
Currently, Snowflake does not support different input and output data types in a masking policy; i.e.,
you cannot mask a date column with a string value (e.g., ***MASKED***).
The input and output data types of a masking policy must match.
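For instance, a minimal sketch of a policy for a DATE column that therefore returns a DATE; the placeholder date and the DATE_MASK name are arbitrary choices for illustration:

create or replace masking policy DATE_MASK as (val date) returns date ->
  case
    when current_role() in ('DATAENGINEER') then val
    else '1900-01-01'::date  -- fixed placeholder date instead of a string
  end;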
Masking policies cannot be applied to virtual columns. Apply the policy to the source table column or
view column.
In conditional masking policies, a virtual column of an external table can be listed as a conditional
column argument to determine whether the first column argument should be masked. However, a
virtual column cannot be specified as the first column to mask.
Prior to dropping a policy, the policy needs to be unset from the table or view.
Data security has become a top priority, and organizations throughout the world are looking for
effective solutions to protect their expanding volumes of sensitive data. As the volume of
sensitive data grows, so does the need for robust data protection solutions. This is where data
governance comes in, guaranteeing that data is correctly handled and used to preserve accuracy,
security, and quality.
In this article, we are going to discuss Snowflake Dynamic Data Masking,
a Snowflake Data Governance feature, in depth. We'll go through the concept, benefits, and
implementation of this feature, as well as provide step-by-step instructions on how to build and
apply masking policies. We will also explore advanced data masking techniques, how to manage
and retrieve masking policy information, and the limitations of Snowflake's data masking
capabilities.
Let’s dive right in!!

Overview of built-in Snowflake governance features


Snowflake offers robust data governance capabilities to ensure the security and compliance of
your data. There are several built-in Snowflake data governance features, including:

- Snowflake Column-level Security: This feature enables the application of a masking policy to a specific column in a table or view. It offers two distinct features:
  - Snowflake Dynamic Data Masking
  - External Tokenization
- Row-level access policies/security: This feature defines row access policies to filter visible rows based on user permissions.
- Object tagging: Tags objects to classify and track sensitive data for compliance and security.
- Object tag-based masking policies: This feature enables the protection of column data by assigning a masking policy to a tag, which can then be set on a database object or the Snowflake account (see the sketch after this list).
- Data classification: This feature allows users to automatically identify and classify columns in their tables containing personal or sensitive data.
- Object dependencies: This feature allows users to identify dependencies among Snowflake objects.
- Access History: This feature provides a record of all user activity related to data access and modification within a Snowflake account. Essentially, it tracks user queries that read column data and SQL statements that write data. The Access History feature is particularly useful for regulatory compliance auditing and also provides insights into frequently accessed tables and columns.
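As a brief, hedged illustration of object tagging combined with a tag-based masking policy (the tag, policy, role, and table names below are hypothetical):

-- classify columns with a tag, then attach one masking policy to the tag
create tag if not exists pii;

create or replace masking policy mask_pii_string as (val string) returns string ->
  case when current_role() in ('DATA_ADMIN') then val else '***MASKED***' end;

-- every string column tagged with pii is now protected by the policy
alter tag pii set masking policy mask_pii_string;
alter table customer modify column email set tag pii = 'email';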

Snowflake's Column Level Security Features


Now that we are familiar with the various built-in Snowflake data governance features, let's shift our
focus to the main topic of this article: Snowflake Column-Level Security. The Snowflake column-level
security feature is available only in the Enterprise edition or higher tiers. It provides enhanced
measures to safeguard sensitive data in tables or views. It offers two distinct features, which are:

- Snowflake Dynamic Data Masking: Snowflake Dynamic Data Masking is a feature that enables organizations to hide sensitive data by masking it with other characters. It allows users to create Snowflake masking policies to conceal data in specific columns of tables or views. Dynamic Data Masking is applied in real time, ensuring that unauthorized users or roles only see masked data.
- External Tokenization: Before we delve into External Tokenization, let's first understand what Tokenization is. Tokenization is a process that replaces sensitive data with ciphertext, rendering it unreadable. It involves encoding and decoding sensitive information, such as names, into ciphertext. External Tokenization, on the other hand, enables the masking of sensitive data before it is loaded into Snowflake, which is achieved by utilizing an external function to tokenize the data and subsequently loading the tokenized data into Snowflake.
While both Snowflake Dynamic Data Masking and External Tokenization are column-level
security features in Snowflake, Dynamic Data Masking is more commonly used as it allows
users to easily implement data masking without the need for external functions. External
Tokenization, on the other hand, involves a more complex setup and is typically not widely
implemented in organizations.

What exactly is Snowflake Dynamic Data Masking?


Snowflake Dynamic Data Masking (DDM) is a column-level security feature that uses masking
policies to selectively mask plain-text sensitive data in table and view columns at query time.
This means the underlying data is not altered in the database, but rather masked as it is retrieved.
DDM policies in Snowflake are defined at the schema level, and can be applied to any number
of tables or views within that schema. Each policy specifies which columns should be masked
as well as the masking method to use.
Masking methods can include:

- Redaction: Replaces data with a fixed set of characters, like XXX, ***, &&&.
- Random data: Replaces values with random fake data based on the column data type.
- Shuffling: Scrambles the data while preserving its format.
- Encryption: Encrypts the data, allowing decryption for authorized users.
When a user queries a table or view protected by a Snowflake dynamic data masking policy, the
masking rules are applied before the results are returned, ensuring users only see the masked
version of sensitive data, even if their permissions allow viewing the actual data.
Snowflake dynamic data masking is a powerful tool for protecting sensitive data. It is easy to
use, scalable, and can be applied to any number of tables or views. Snowflake Dynamic Data
Masking can help organizations to comply with data privacy regulations, such as the General
Data Protection Regulation (GDPR) , HIPAA, SOC, and PCI DSS.

What are the reasons for Snowflake Dynamic Data Masking ?


Here are the primary reasons for Snowflake dynamic data masking:

- Risk Mitigation: The main purpose of Snowflake Dynamic Data Masking is to reduce the risk of unauthorized access to sensitive data. By masking sensitive columns in query results, Snowflake Dynamic Data Masking prevents potential leaks of data to unauthorized users.
- Confidentiality: Snowflake may contain financial data, employee data, intellectual property or other information that should remain confidential. Snowflake Dynamic Data Masking ensures this sensitive data is not exposed in query results to unauthorized users.
- Regulatory Compliance: Regulations like GDPR, HIPAA, SOC, and PCI DSS require strong safeguards for sensitive and personally identifiable information. Snowflake Dynamic Data Masking helps meet compliance requirements by protecting confidential data from bad actors.
- Snowflake Governance Initiatives: Data governance and security teams typically drive initiatives to implement controls like Snowflake Dynamic Data Masking to better manage and protect sensitive Snowflake data access.
- Privacy and Legal Requirements: Privacy regulations and legal obligations may require organizations to mask sensitive Snowflake data from unauthorized parties. Dynamic Data Masking provides the technical controls to enforce privacy requirements for data access.

Implementing Snowflake Dynamic Data Masking—Step-by-Step Guide

Creating a Custom Role with Masking Privileges


Firstly, let's start by creating a custom role with the necessary masking privileges. This role will
be responsible for managing the Snowflake masking policies.
To create the custom role, execute the following SQL statement:
CREATE ROLE dynamic_masking_admin;

Creating a Snowflake role for managing Snowflake dynamic data masking
Let's grant the privilege to create Snowflake masking policies to the role dynamic_masking_admin:
GRANT CREATE MASKING POLICY ON SCHEMA my_db.my_schema TO ROLE dynamic_masking_admin;

Granting masking policy privileges to role - Snowflake dynamic data masking


Now, let’s grant privileges to apply Snowflake masking policies to the
role dynamic_masking_admin.
GRANT apply masking policy ON account TO ROLE dynamic_masking_admin;

Granting masking policy privileges to roles - Snowflake masking policies

Assign a Masking Role to an Existing Role/User


Next, assign the masking role to an existing role or user who will be responsible for managing
and applying Snowflake masking policies.
Granting the masking role can enable individuals to inherit the masking privileges and
effectively implement data masking on the desired columns.
To assign the masking role to an existing role, execute the following SQL statement:
GRANT ROLE dynamic_masking_admin TO ROLE school_principal;

Assigning a masking role to another Snowflake role - Snowflake masking policies
Note: dynamic_masking_admin role, by default, will not have access to any database or
warehouse. The role needs to be assigned to another Custom Role or a User who has privileges
to access a database and warehouse.
To assign the masking role to an individual user, execute the following SQL statement:
GRANT ROLE dynamic_masking_admin TO USER [USERNAME];

Granting a masking role to a Snowflake user - masking policies

Steps for creating Snowflake masking policies


With the custom role and privileges in place, it's time to create a masking policy. A masking
policy defines how data should be masked based on specific conditions or rules. Snowflake
offers flexibility in defining masking policies to suit your data protection needs.
Here is what a masking policy should look like:
CREATE OR REPLACE MASKING POLICY [POLICY_NAME] AS (val [COLUMN_TYPE])
RETURNS [COLUMN_TYPE] ->
  CASE
    WHEN current_role() IN ('[AUTHORIZED_ROLE]') THEN val
    ELSE '[MASKING_VALUE]'
  END;
Replace:
- [POLICY_NAME] with a suitable name for your masking policy.
- [COLUMN_TYPE] with the data type of the column you wish to mask.
- [AUTHORIZED_ROLE] with the role that should have unmasked access.
- [MASKING_VALUE] with the value used to mask the data.
Here is an example:
The below SQL statement creates a masking policy, data_masking, that can be applied to columns of type string.
CREATE OR REPLACE MASKING POLICY data_masking AS (val string)
RETURNS string ->
  CASE
    WHEN current_role() IN ('SCHOOL_PRINCIPAL') THEN val
    ELSE '*************'
  END;

Creating a Snowflake masking policy to mask strings - Snowflake masking policies
This masking policy masks the data applied on a column when queried from a role other
than school_principal.

Applying the Masking Policy to a Table or View Column


After creating the masking policy, it's time to apply it to the desired column within a table or
view. By applying the masking policy, you ensure that sensitive data in that column is
appropriately masked, while authorized roles can still access the original data.
To apply the masking policy, execute the following SQL statement:
ALTER TABLE [TABLE_NAME]
MODIFY COLUMN [COLUMN_NAME]
SET MASKING POLICY [POLICY_NAME];
Replace:
- [TABLE_NAME] with the name of the table or view where the column is located.
- [COLUMN_NAME] with the name of the column to be masked.
- [POLICY_NAME] with the name of the masking policy created in the previous step.
Here is an example:
ALTER TABLE IF EXISTS student_records
MODIFY COLUMN email
SET masking policy data_masking;
Applying a masking policy to a Snowflake table column - Snowflake masking policies - Snowflake Dynamic Data Masking

Verifying the Masking Rules by Querying Data


To make sure the masking rules are correctly applied, it is crucial to verify the results by
querying the data.
By testing the data retrieval from different roles, you can see the masking effects and confirm
that sensitive information remains hidden from unauthorized access.
Execute queries from different roles to verify the masking rules:
Querying Data from school_principal Role:
When queried from the school_principal role, the data in the student_records table appears
unmasked. Here is an image showing the unmasked data:
use role school_principal;
select first_name, last_name, gender, email from student_records;

Querying Data from school_principal Role - Snowflake masking policies


Querying Data from student Role:
When queried from the student role, the data in the student_records table has masking applied to
the email column.
Here is an image showing the masked data:
use role student;
select first_name, last_name, gender, email from student_records;

Querying Data from student Role - Snowflake masking policies


Unsetting Masking Policy on a Column
If we want to remove the masking policy applied to a specific column, we can use the following
SQL statement:
ALTER TABLE IF EXISTS student_records MODIFY email UNSET MASKING POLICY;
This statement removes the masking policy from the email column in the student_records table.

Managing and Extracting Information of Snowflake Masking Policies

Altering Masking Policies


Snowflake allows us to modify existing masking policies by adding new rules or renaming the
policy. Any changes made to the masking policy will take effect when the next SQL query that
uses the policy runs.
To alter an existing masking policy in Snowflake, we use the following syntax:
ALTER MASKING POLICY [IF EXISTS] <name> SET BODY -> <expression_on_arg_name>
ALTER MASKING POLICY [IF EXISTS] <name> RENAME TO <new_name>
ALTER MASKING POLICY [IF EXISTS] <name> SET COMMENT = 'strings'

Extracting Information of Existing Masking Policies


We can extract information about existing masking policies in Snowflake using the following
SQL statements:

Listing All Masking Policies:


The following SQL statement lists all the masking policies present in the current session's
database and schema:
SHOW MASKING POLICIES;
This command provides information such as the creation date, database, schema names, owner,
and any available comments for each masking policy.

Listing All Masking Policies - Snowflake masking policies

Describing a Masking Policy:


The following SQL statement describes the details of a specific masking policy, including its
creation date, name, data type, and SQL expression:
DESCRIBE MASKING POLICY <policy_name>;
This command extracts information about the specified masking policy.
Here is an example:
DESCRIBE MASKING POLICY DATA_MASKING;

Describing a Masking Policy - Snowflake masking policies - Snowflake data security

Step by step process of dropping a Snowflake masking policy


To drop a masking policy in Snowflake, we need to follow these steps:

Find the Columns with Applied Policy:


First, we need to identify the columns where the masking policy is currently applied. We can use
the following SQL statement to list all the columns on which the DATA_MASKING masking
policy is applied:
SELECT * FROM TABLE(INFORMATION_SCHEMA.POLICY_REFERENCES(POLICY_NAME => 'DATA_MASKING'));
This statement retrieves information about the columns where the specified masking policy is
applied.

Unset the Masking Policy:


Once we identify the columns where the masking policy is applied, we need to unset the
masking policy from those columns.
This can be done using the following SQL statement:
ALTER TABLE IF EXISTS <table_name> MODIFY <column_name> UNSET MASKING POLICY;

Drop the Masking Policy:


Finally, to drop the masking policy, we use the following SQL statement:
DROP MASKING POLICY <policy_name>;
Replace <policy_name> with the name of the masking policy that you want to drop.
Here is an example of dropping the DATA_MASKING masking policy:
DROP MASKING POLICY DATA_MASKING;

Dropping a Snowflake masking policy - Snowflake column level security

Advanced Snowflake Dynamic Data Masking Techniques

Partial Data Masking


Snowflake also supports partially masking column data. We can create a masking policy that
partially masks the email data for the student role, leaving the email domain unmasked.

Creating the Partial Data Masking Policy:


We can create a masking policy called dynamic_email_masking using the following SQL
statement:
create or replace masking policy dynamic_email_masking as (val string) returns string ->
case
    when current_role() in ('SCHOOL_PRINCIPAL') then val
    else regexp_replace(val,'.+\@','*****@') -- leave email domain unmasked
end;

Creating Partial Data Masking Policy in Snowflake - Snowflake column level security
This particular masking policy replaces everything up to and including the @ sign with asterisks (*),
leaving the email domain unmasked. Users with the SCHOOL_PRINCIPAL role will be able to see
the full email address, while users with other roles will only see asterisks followed by the email domain.

Applying the Masking Policy:


To apply the dynamic_email_masking policy to the email column in the student_records table,
we can use the following SQL statement:
ALTER TABLE IF EXISTS student_records MODIFY COLUMN email SET MASKING POLICY
dynamic_email_masking;

Applying partial masking policy to email column in Snowflake - Snowflake Dynamic Data Masking
This statement applies the masking policy to the email column. Once you have applied the
masking policy, users with the SCHOOL_PRINCIPAL role will be able to see the full email
address for all students in the student_records table. Note that users with other roles will only
see asterisks followed by the email domain.

Conditional Data Masking


Conditional Data Masking allows us to selectively apply masking to a column based on the
value of another column. We can create conditional data masking in Snowflake using
the student_records table for the email column, where users with
the SCHOOL_PRINCIPAL role can see the full email address and users with other roles will
see the first five characters and the last two characters of the email address:

Creating the Conditional Data Masking Policy:


We can create a masking policy called conditional_email_masking using the following SQL statement:
-- student_id is passed as the conditional column in the USING clause shown below
create or replace masking policy CONDITIONAL_EMAIL_MASKING as (val string, student_id number) returns string ->
case
    when current_role() in ('SCHOOL_PRINCIPAL') then val
    else substring(val, 1, 5) || '***' || substring(val, -2)
end;

Creating conditional data masking policy in Snowflake


This masking policy will only be applied to the email column in the student_records table.
Only users with the SCHOOL_PRINCIPAL role will be able to see the full email address,
while users with other roles will only see the first five characters and the last two characters of
the email address.
Applying the Masking Policy:
To apply the conditional_email_masking policy to the email column based on the value of
the student_id column in the student_records table, we use the following SQL statement:
ALTER TABLE IF EXISTS student_records MODIFY COLUMN email SET MASKING POLICY CONDITIONAL_EMAIL_MASKING USING (email, student_id);

Applying conditional masking policy to email column based on student_id in Snowflake - Snowflake
column level security - Snowflake Dynamic Data Masking
This statement applies the masking policy to the email column, considering the values in the
email and student_id columns.

Limitations of Snowflake Dynamic Data Masking


Here are some key limitations of Snowflake Dynamic Data Masking:

- Snowflake masking features require at least an Enterprise Edition subscription (or higher).
- Masking can impact query performance since Snowflake has to evaluate the masking rules for each row returned in the result set. More complex rules can slow down query response times.
- Masking does not hide data in columns that are not selected in the query. For example, if a query selects only the name and age columns, the masking rules will apply only to name and age; other columns will be returned unmasked.
- Masking conditions cannot be based on encrypted column values since Snowflake cannot evaluate conditions on encrypted data. Masking rules can only use unencrypted columns.
- It does not mask data in temporary tables or unmanaged external tables. It only works for managed tables in Snowflake.
- It only works on SELECT queries. It does not mask data for INSERT, UPDATE, or DELETE queries. So if a user has DML access to tables, they will still see the actual data. It only masks data for read-only access.
- It cannot be applied to virtual columns. Virtual columns are derived columns that are not stored in the database, which means that Dynamic Data Masking cannot be used to mask data in virtual columns.
- It cannot be applied to shared objects. Shared objects are objects that are stored in a Snowflake account and can be shared with other users or accounts.
- Dynamic Data Masking can be complex to set up and manage, especially if you have a large number of tables and columns. You need to create a masking policy for each column that you want to mask, and you need to make sure that the masking policy is applied to the correct tables and columns.

Points to Remember—Critical Do's and Don'ts—When Working With Snowflake Dynamic Data Masking

Here are some additional points to remember while working with Snowflake dynamic data masking:

- Snowflake dynamic data masking policies obfuscate data at query runtime; the original data is unchanged.
- Snowflake dynamic data masking prevents unauthorized users from seeing real data.
- Take a backup of data before applying masking.
- Masking applies only when reading data, not to DML.
- Snowflake dynamic data masking policy names must be unique within a database schema.
- Masking policies are inherited by cloned objects, ensuring consistent data protection across replicated data.
- Masking policies cannot be directly applied to virtual columns in Snowflake. To apply a dynamic data masking policy to a virtual column, you can create a view on the virtual columns and then apply the policy to the corresponding view columns.
- Snowflake records the original query executed by the user on the History page of the web interface. The query details can be found in the SQL Text column, providing visibility into the original query even with data masking applied.
- Masking policy names used in a specific query can be found in the Query Profile, which helps in tracking the applied policies for auditing and debugging purposes (see the sketch below).
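Relating to the auditing points above, a minimal sketch for listing which policies protect a particular table uses the POLICY_REFERENCES table function by entity; the fully qualified table name is hypothetical:

select policy_name, policy_kind, ref_column_name
from table(information_schema.policy_references(
       ref_entity_name => 'MY_DB.MY_SCHEMA.STUDENT_RECORDS',
       ref_entity_domain => 'TABLE'));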

Conclusion
Ultimately, data security is a critical concern for organizations, and Snowflake's Dynamic Data
Masking feature offers a powerful solution to protect sensitive Snowflake data. It is an extremely
powerful tool that empowers organizations to bring sensitive data into the Snowflake platform while
effectively managing it at scale. Snowflake
dynamic data masking combines policy-based approaches and role-based access control (RBAC)
and makes sure that only authorized individuals can access sensitive data, protecting it from
prying eyes and mitigating the risk of data breaches. Throughout this article, we explored the
concept, benefits, and implementation of Dynamic Data Masking, covering step-by-step
instructions for building and applying masking policies. We also delved into advanced
techniques like partial and conditional data masking, discussed policy management, and
highlighted the limitations as well as its benefits.
Just as a skilled locksmith carefully safeguards valuable treasures in a secure vault, Snowflake's
Dynamic Data Masking feature acts as a trustworthy guardian for organizations' sensitive data.

FAQs
What is Snowflake Dynamic Data Masking?
Snowflake Dynamic Data Masking is a security feature in Snowflake that allows the masking of
sensitive data in query results.
How does Dynamic Data Masking work in Snowflake?
It works by applying masking policies to specific columns in tables and views, which replace the
actual data with masked data in query results.
Can I apply Dynamic Data Masking to any column in Snowflake?
Yes, you can apply it to any table or view column that contains sensitive data. It cannot be
applied directly to virtual columns.
Is the original data altered when using Dynamic Data Masking?
No, the original data in the micro-partitions is unchanged. Only the query results are masked.
Who can define masking policies in Snowflake?
Only users with the necessary privileges, such
as ACCOUNTADMIN or SECURITYADMIN roles, can define masking policies.
Can I use Dynamic Data Masking with third-party tools?
Yes, as long as the tool can connect to Snowflake and execute SQL queries.
How can I test my Snowflake masking policies?
You can test them by running SELECT queries and checking if the returned data is masked as
expected.
Can I use Dynamic Data Masking to mask data in real-time?
Yes, the data is masked in real-time during query execution.
Can I use different Snowflake masking policies for different users?
Yes, you can define different masking policies and grant access to them based on roles in
Snowflake.
What types of data can I mask with Dynamic Data Masking?
You can mask any type of data, including numerical, string, and date/time data.
What happens if I drop a masking policy?
Only future queries will show unmasked data. Historical query results from before the policy
was dropped remain masked.
Can I use Dynamic Data Masking with Snowflake's Materialized Views feature?
Yes, masking will be applied at query time on the materialized view, not during its creation.
What is Row-Level Security?
Row-Level Security is a security mechanism that limits the records returned from a database
table based on the permissions provided to the currently logged-in user. Typically, this is done
such that certain users can access only their data and are not permitted to view the data of other
users.
In our previous article, we discussed how to implement Row-Level Security using Secure Views.
In this article, let us understand how to set up Row-Level Security on a database table using Row
Access Policies in Snowflake.
2. What are Row Access Policies in Snowflake?
A Row Access Policy is a schema-level object that determines whether a given row in a table or
view can be viewed by a user using the following types of statements.
1. SELECT statements
2. Rows selected by UPDATE, DELETE, and MERGE statements.
A row access policy is added to a table or a view by binding it to one or more columns of that object.
It can be added either when the object is created or after the object is created.
3. Steps to implement Row-Level Security using Row Access Policies in Snowflake
Follow below steps to implement Row-Level Security using Row Access Policies in Snowflake.
1. Create a table to apply Row-Level Security
2. Create a Role Mapping table
3. Create a Row Access Policy
4. Add the Row Access Policy to a table
5. Create Custom Roles and their Role Hierarchy
6. Grant SELECT privilege on table to custom roles
7. Grant USAGE privilege on virtual warehouse to custom roles
8. Assign Custom Roles to Users
9. Query and verify Row-Level Security on table using custom roles
10. Revoke privileges on role mapping table to custom roles
3.1. Create a table to apply Row-Level Security
Let us consider a sample employees table as an example for the demonstration of row-level security
using row access policies.
The below SQL statements create a table named employees with the required sample data in the hr schema
of the analytics database.
use role SYSADMIN;

create or replace database analytics;

create or replace schema analytics.hr;

create or replace table analytics.hr.employees(


employee_id number,
first_name varchar(50),
last_name varchar(50),
email varchar(50),
hire_date date,
country varchar(50)
);

INSERT INTO analytics.hr.employees(employee_id,first_name,last_name,email,hire_date,country)


VALUES
(100,'Steven','King','SKING@outlook.com','2013-06-17','US'),
(101,'Neena','Kochhar','NKOCHHAR@outlook.com','2015-09-21','US'),
(102,'Lex','De Haan','LDEHAAN@outlook.com','2011-01-13','US'),
(103,'Alexander','Hunold','AHUNOLD@outlook.com','2016-01-03','UK'),
(104,'Bruce','Ernst','BERNST@outlook.com','2017-05-21','UK'),
(105,'David','Austin','DAUSTIN@outlook.com','2015-06-25','UK'),
(106,'Valli','Pataballa','VPATABAL@outlook.com','2016-02-05','CA'),
(107,'Diana','Lorentz','DLORENTZ@outlook.com','2017-02-07','CA'),
(108,'Nancy','Greenberg','NGREENBE@outlook.com','2012-08-17','CA')
;

Employees table
3.2. Create a Role Mapping table
The below SQL statements create a mapping table named role_mapping which stores each country and
the corresponding role to be assigned to the users of that country, as shown below.
use role SYSADMIN;

create or replace table analytics.hr.role_mapping(


country varchar(50),
role_name varchar(50)
);

INSERT INTO analytics.hr.role_mapping(country, role_name) VALUES


('US','DATA_ANALYST_ROLE_US'),
('UK','DATA_ANALYST_ROLE_UK'),
('CA','DATA_ANALYST_ROLE_CA')
;
Role_Mapping table
3.3. Create a Row Access Policy
The below SQL statement creates a Row Access Policy with following two conditions.
1. User with SYSADMIN role can query all rows of the table.
2. User with DATA_ANALYST roles can query only rows belonging to their country
based on the role mapping table.
use role SYSADMIN;

create or replace row access policy analytics.hr.country_role_policy as (country_name varchar)


returns boolean ->
'SYSADMIN' = current_role()
or exists (
select 1 from role_mapping
where role_name = current_role()
and country = country_name
)
;
In the above statement:
country_role_policy specifies the name of the policy.
country_name is the signature of the row access policy; it specifies the column (and its data type) of the
protected table that is passed to the policy when it is evaluated.
returns boolean -> specifies that the row access policy returns true or false for each row, determining
whether the row is visible.
'SYSADMIN' = current_role() is the first condition of the row access policy, which allows users with the
SYSADMIN role to view all rows of the table.
or exists ... is the second condition of the row access policy expression, which uses a subquery. The
subquery requires CURRENT_ROLE to be a custom role that is mapped to a country through the
role mapping table. This is used by the row access policy to limit the rows returned for the query
executed by the user.
3.4. Add the Row Access Policy to a table
The below SQL statement adds the row access policy named country_role_policy to the
table employees on country field.
use role SYSADMIN;

alter table analytics.hr.employees


add row access policy analytics.hr.country_role_policy on (country);
3.5. Create Custom Roles and their Role Hierarchy
The below SQL statements create the custom roles mentioned in the role mapping table, which will be
assigned to users in a later step.
use role SECURITYADMIN;

create or replace role DATA_ANALYST_ROLE_US;


create or replace role DATA_ANALYST_ROLE_UK;
create or replace role DATA_ANALYST_ROLE_CA;
When the roles are created, they exist in isolation: other roles (even the roles that create them and grant
privileges to them) cannot access the objects they own. So, it is required to set up a role hierarchy for the
custom roles we created.
The below SQL statements assign the custom roles to the SYSADMIN role so that SYSADMIN
can inherit all the privileges granted to the custom roles.
use role SECURITYADMIN;

grant role DATA_ANALYST_ROLE_US to role SYSADMIN;


grant role DATA_ANALYST_ROLE_UK to role SYSADMIN;
grant role DATA_ANALYST_ROLE_CA to role SYSADMIN;
3.6. Grant SELECT privilege on table to custom roles
The below SQL statements grant usage privileges on the analytics database and its hr schema, along with
the SELECT privilege on all tables in that schema, to the custom roles created.
use role SYSADMIN;

grant usage on database analytics to role DATA_ANALYST_ROLE_US;


grant usage on schema analytics.hr to role DATA_ANALYST_ROLE_US;
grant select on all tables in schema analytics.hr to role DATA_ANALYST_ROLE_US;

grant usage on database analytics to role DATA_ANALYST_ROLE_UK;


grant usage on schema analytics.hr to role DATA_ANALYST_ROLE_UK;
grant select on all tables in schema analytics.hr to role DATA_ANALYST_ROLE_UK;

grant usage on database analytics to role DATA_ANALYST_ROLE_CA;


grant usage on schema analytics.hr to role DATA_ANALYST_ROLE_CA;
grant select on all tables in schema analytics.hr to role DATA_ANALYST_ROLE_CA;
3.7. Grant USAGE privilege on virtual warehouse to custom roles
The below SQL statements provide usage privileges on the warehouse compute_wh to the custom roles
so they can query tables.
use role ACCOUNTADMIN;

grant usage on warehouse compute_wh to role DATA_ANALYST_ROLE_US;


grant usage on warehouse compute_wh to role DATA_ANALYST_ROLE_UK;
grant usage on warehouse compute_wh to role DATA_ANALYST_ROLE_CA;
3.8. Assign Custom Roles to Users
Let us consider there are three users TONY, STEVE and BRUCE belonging to US, UK and CA
respectively.
The below SQL statements assign the custom roles to the users belonging to the respective countries.
use role SECURITYADMIN;

grant role DATA_ANALYST_ROLE_US to user TONY;


grant role DATA_ANALYST_ROLE_UK to user STEVE;
grant role DATA_ANALYST_ROLE_CA to user BRUCE;
3.9. Query and verify Row-Level Security on table using custom roles
Let us verify the data returned for each user when queried on the same table.
The below image shows that when a user with the DATA_ANALYST_ROLE_US role queries the
employees table, only data from country US is returned.

Query returning only US data when queried with DATA_ANALYST_ROLE_US role

The below image shows that when a user with the DATA_ANALYST_ROLE_UK role queries the
employees table, only data from country UK is returned.

Query returning only UK data when queried with DATA_ANALYST_ROLE_UK role

The below image shows that when a user with the DATA_ANALYST_ROLE_CA role queries the
employees table, only data from country CA is returned.

Query returning only CA data when queried with DATA_ANALYST_ROLE_CA role

The below image shows that when a user with the SYSADMIN role queries the employees table, all
rows are returned.

Query returning all rows when queried with SYSADMIN role
3.10. Revoke privileges on role mapping table to custom roles
Since we have granted the SELECT privilege on all tables, the users can also access the role mapping
table, which is used to limit their access.
To avoid this, you could either create the role mapping table in a different schema to which the users
do not have access, or simply revoke their access to this particular table.
The below SQL statements revoke all privileges on the role_mapping table from the custom roles.
use role SYSADMIN;

revoke all privileges on table analytics.hr.role_mapping from role DATA_ANALYST_ROLE_US;


revoke all privileges on table analytics.hr.role_mapping from role DATA_ANALYST_ROLE_UK;
revoke all privileges on table analytics.hr.role_mapping from role DATA_ANALYST_ROLE_CA;
4. How to Remove a Row Access Policy on a table in Snowflake?
The below SQL statement removes a row access policy on a table.
alter table <table_name> drop row access policy <policy_name>;
The below SQL statement removes all row access policy associations from a table.
alter table <table_name> drop all row access policies;
5. How to Extract information of existing Row Access Policies in Snowflake?
5.1. SHOW ROW ACCESS POLICIES
Lists the row access policies for which the user has access privileges. It returns information such as the
creation date, database and schema names, owner, and any available comments.
The below SQL statement extracts the row access policies present in the database and schema of the
current session.
show row access policies;

Show Row Access Policies – Listing all Row Access Policies


5.2. DESCRIBE ROW ACCESS POLICY
Describes the current definition of a row access policy, including the creation date, name, data type,
and SQL expression.
The below SQL statement extracts information of the row access policy country_role_policy.
describe row access policy country_role_policy;

Describe Row Access Policy – Extracting details of a Row Access Policy


6. How to Rename a Row Access Policy in Snowflake?
The below SQL statement renames row access policy from row_policy1 to row_policy2.
alter row access policy row_policy1 rename to row_policy2;
7. How to Update a Row Access Policy in Snowflake?
To update an existing row access policy,
 If you need to see the current definition of the policy, run the DESCRIBE ROW
ACCESS POLICY command.
 The row access policy expression can then be updated with the ALTER ROW
ACCESS POLICY command.
The below SQL statement updates the SQL expression that filters the data in the row access policy.
alter row access policy <policy name> set body -> <expression_on_val>;
The expression can include conditional expression functions to represent conditional logic, built-in
functions, or UDFs to transform the data.
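As an illustration, a policy that filters on a mapping table could be updated as shown below. This is a
sketch only; the policy argument name country and the role_mapping column names role_name and
country_name are assumptions, not necessarily the exact definitions used earlier.
alter row access policy country_role_policy set body ->
exists (
select 1
from analytics.hr.role_mapping m
where m.role_name = current_role() -- role of the querying user
and m.country_name = country       -- column passed as the policy argument
);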
8. How to Drop a Row Access Policy in Snowflake?
A Row Access Policy cannot be dropped successfully if it is currently attached to a resource. Before
executing a DROP statement, detach the row access policy from the table or view.
Follow below steps to drop a row access policy in Snowflake
1. Find the objects on which the row access policy is attached.
The below SQL statement lists all the objects on which row access policy
named country_role_policy is attached.
select * from table(information_schema.policy_references(policy_name=>'country_role_policy'));

Finding all objects on which the Row Access Policy is applied


2. Remove the row access policy from all the tables and views to which it is associated. (refer
section-4 of the article)
3. Drop the row access policy.
The below SQL statement drops row access policy named country_role_policy.
drop row access policy country_role_policy;
9. Closing Points
A few key points to keep in mind related to row access policies:
 If a table column has a row access policy attached to it, the column cannot be dropped
from the table.
 Snowflake does not support UNDROP with row access policy objects.
 Snowflake does not support using external tables as a mapping table in a row access
policy.
 A table or view column can only be protected by one row access policy at a time.
Adding a policy fails if the policy body refers to a table or view column that is
protected by a row access policy.
 If an object has both a row access policy and one or more Column-level
Security masking policies, the row access policy is evaluated first.
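For example, both kinds of policies can coexist on the same object; the row access policy filters the rows
first, and the masking policy is then applied to the rows that survive. A sketch with hypothetical names (a
customers table, the country_role_policy row access policy, and a mask_email masking policy):
-- Row-level filtering driven by the country column
alter table customers add row access policy country_role_policy on (country);
-- Column-level masking of a sensitive column
alter table customers modify column email set masking policy mask_email;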
1. Introduction
In the cloud computing era of pay-as-you-go resources, it is necessary to have billing alerts set up so
that you get notified of unexpected spend increases. This is an extremely useful way to keep a close
watch on your resource usage and stay on budget.
Snowflake provides one such feature to track your resource usage and control your budget using
Resource Monitors.
2. Snowflake Resource Monitors
Resource Monitors in Snowflake assist in cost management and prevent unforeseen credit usage
caused by operating warehouses. They issue alert notifications and help stop user-managed
warehouses when certain limits are reached or approaching.
Resource monitors can only be created by Account Administrators (i.e. users with the
ACCOUNTADMIN role).
However, Account Administrators can choose to enable users with other roles to view and modify
resource monitors using SQL.
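For example, viewing and modifying a resource monitor can be delegated by granting the MONITOR and
MODIFY privileges on it to another role (the monitor and role names below are hypothetical):
use role ACCOUNTADMIN;
grant monitor on resource monitor RM_DEMO to role DATA_ENGINEER_ROLE;
grant modify on resource monitor RM_DEMO to role DATA_ENGINEER_ROLE;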
3. Creating Resource Monitors from WebUI
To create Resource monitors in Snowflake, follow below steps.
1. Login to Snowflake and switch the role to Account Admin.
2. Navigate to Account > Resource Monitors > Create Resource Monitor.

Creating Resource Monitors


3. The following properties need to be configured to set up a Resource Monitor.
3.1. Credit Quota
Credit Quota is the number of credits that are allowed to be consumed in a given interval of time
during which the Resource Monitor takes action. Credit quota accounts for credits consumed by both
user-managed virtual warehouses and virtual warehouses used by cloud services.
3.2. Monitor Level
The Resource Monitors in Snowflake can monitor the credit usage at two different levels.
 ACCOUNT: At the Account level i.e. all the warehouses in the account.
 WAREHOUSE: At the individual Warehouse or a group of warehouses level.
If you have selected the Monitor level as Warehouse, you need to individually select the Warehouses
to monitor.
The following image illustrates a scenario in which one resource monitor is set at the Account level
and individual warehouses are assigned to two other resource monitors.
 The credit quota for the entire account is set to 4000 credits for the interval (month,
week, etc.), as defined by Resource Monitor 1. If this quota is reached within the
interval, the actions defined for the resource monitor (Suspend, Suspend Immediately,
etc.) are enforced for all the warehouses.
 Warehouse 2 can consume a maximum of 1000 credits within the interval.
 Warehouse 3 and 4 can consume a maximum combined total of 1500 credits within
the interval.

Resource Monitors Illustration
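The scenario in the illustration could be expressed in SQL roughly as follows. This is only a sketch with
hypothetical monitor and warehouse names; the full CREATE RESOURCE MONITOR syntax is covered in
section 4.
use role ACCOUNTADMIN;
-- Account-level monitor with a 4000-credit quota (Resource Monitor 1)
create resource monitor rm_account_level with credit_quota = 4000
triggers on 100 percent do suspend;
alter account set resource_monitor = rm_account_level;
-- Warehouse 2 limited to 1000 credits
create resource monitor rm_warehouse_2 with credit_quota = 1000
triggers on 100 percent do suspend;
alter warehouse warehouse_2 set resource_monitor = rm_warehouse_2;
-- Warehouses 3 and 4 share a combined 1500-credit quota
create resource monitor rm_warehouse_3_4 with credit_quota = 1500
triggers on 100 percent do suspend;
alter warehouse warehouse_3 set resource_monitor = rm_warehouse_3_4;
alter warehouse warehouse_4 set resource_monitor = rm_warehouse_3_4;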


3.3. Schedule
By default, Snowflake schedules a resource monitor to begin monitoring immediately and to reset
credit usage to 0 at the beginning of each calendar month.
However, you can customize the schedule of the resource monitor using the additional
properties shown below.
Time Zone:
You have two options to set the time zone of the schedule – Local and UTC.
Starts:
You can choose to start the resource monitor either immediately or Later. If you choose Later, you
should enter the date and time for the resource monitor to start.
Ends:
You can choose to run the resource monitor continuously using the Never option or stop the resource
monitor at a particular date and timestamp using the On option.
Resets:
You can choose the frequency interval at which the credit usage resets. The supported values are
 Daily
 Weekly
 Monthly
 Yearly
 Never (Used credits never reset. Assigned warehouses continue using credits until the
credit quota is reached)
3.4. Actions
You can define actions that trigger when credit usage reaches a certain percentage of the credit quota.
The following are the actions that resource monitors support.
 One Suspend and Notify action.
 One Suspend Immediately and Notify action.
 Up to five Notify actions.
Suspend: Suspends all assigned warehouses after all statements being executed by the warehouse(s)
have completed.
Suspend Immediately: Suspends all assigned warehouses immediately, which cancels any statements
being executed by the warehouses at the time.
Notify: Sends an alert notification to all users with notifications enabled.
A Resource Monitor must have at least one action defined. If no actions have been defined, nothing
happens when the used credits reach the threshold.
4. Click Create to create the Resource Monitor.
The below image shows the Resource Monitor RM_DEMO with Credit Quota set to 100 monitoring
at the Warehouse level with two warehouses configured with default schedule.
 Suspend and Notify action is set to trigger at 90% of credit usage.
 Suspend Immediately and Notify action is set to trigger at 95% of credit usage.
 Notify action is set to trigger at 70% and 80% of credit usage.
Creating Resource Monitor
4. Creating Resource Monitors using SQL
Resource Monitors can also be created using CREATE RESOURCE MONITOR command.
The below SQL statement shows an example of a resource monitor with the default schedule.
CREATE RESOURCE MONITOR "RM_DEMO" WITH CREDIT_QUOTA = 100
TRIGGERS
ON 90 PERCENT DO SUSPEND
ON 95 PERCENT DO SUSPEND_IMMEDIATE
ON 70 PERCENT DO NOTIFY
ON 80 PERCENT DO NOTIFY;
The below SQL statement shows an example of a resource monitor with a custom schedule.
CREATE RESOURCE MONITOR "RM_DEMO" WITH CREDIT_QUOTA = 100,
frequency = 'MONTHLY', start_timestamp = '2022-10-01 00:00 IST', end_timestamp = null
TRIGGERS
ON 90 PERCENT DO SUSPEND
ON 95 PERCENT DO SUSPEND_IMMEDIATE
ON 70 PERCENT DO NOTIFY
ON 80 PERCENT DO NOTIFY;
5. Assigning Warehouses to the Resource Monitor
Once the resource monitor is created, warehouses can be assigned to it as shown below.
ALTER WAREHOUSE "COMPUTE_WH" SET RESOURCE_MONITOR = "RM_DEMO";
ALTER WAREHOUSE "DEMO_WH" SET RESOURCE_MONITOR = "RM_DEMO";
A Resource monitor can be set at the Account level as shown below.
ALTER ACCOUNT SET RESOURCE_MONITOR = RM_DEMO;
6. Resource Monitor Notification Alerts for Administrators
Notifications can be received by account administrators through the web interface and/or email. By
default, notifications are not enabled. They must be enabled from the Classic web interface.
Follow below steps to enable notifications.
1. Login to Snowflake and switch the role to Account Admin.
2. In the drop-down menu at the top right corner, navigate to Preferences > Notifications.
3. Select the Notification Preference as All.

Enabling Notification Preferences


This allows all the users with Account Admin Role to receive the email alerts.
7. Resource Monitor Notification Alerts for Non-Administrators
Email notifications for non-admin users cannot be enabled directly from the web interface. They can
only be enabled through a SQL statement as shown below.
CREATE RESOURCE MONITOR "RM_USER_ALERT" WITH CREDIT_QUOTA = 100
NOTIFY_USERS = ('SFUSER04')
TRIGGERS
ON 90 PERCENT DO SUSPEND
ON 95 PERCENT DO SUSPEND_IMMEDIATE
ON 70 PERCENT DO NOTIFY
ON 80 PERCENT DO NOTIFY;
The users must have their email address verified in order to receive the email alerts.
To view the list of users who were given access to email alerts of resource monitors, use below SQL
command.
SHOW RESOURCE MONITORS;

Listing users with access to email alerts of Resource Monitor


The users with Account Admin access by default have access to email alerts and they are not
displayed under notify_users.
8. Assigning Warehouses to multiple Resource Monitors
When you assign a warehouse which has already been assigned to a resource monitor to a new
resource monitor, the warehouse gets assigned to the new resource monitor, gets unassigned from the
previous resource monitor, and its credit usage resets to zero.
A resource monitor can be set to monitor multiple warehouses, but a warehouse can be assigned only
to a single resource monitor.
The older resource monitor, if not assigned to any other warehouse, remains dormant and does not
monitor anything.
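For example, moving a warehouse that is already monitored by RM_DEMO to a different monitor only
requires another ALTER WAREHOUSE statement (RM_DEMO_NEW is a hypothetical monitor created the
same way as RM_DEMO):
-- COMPUTE_WH is unassigned from RM_DEMO and starts with zero used credits under the new monitor
ALTER WAREHOUSE "COMPUTE_WH" SET RESOURCE_MONITOR = "RM_DEMO_NEW";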
9. Summary
A Resource Monitor can be used to monitor credit usage by user-managed virtual warehouses and
virtual warehouses used by cloud services. However, it can only suspend user-managed warehouses
based on credit usage thresholds.
 Suspend and Suspend Immediate actions only apply to user-managed warehouses.
Virtual warehouses that provide cloud services cannot be suspended.
 By default, notifications are not enabled. They must be enabled from the Classic Web UI for
administrators.
 Reassigning a warehouse to a new resource monitor unassigns it from the older
resource monitor.
1. Introduction
Snowflake, like any good database, keeps track of all the objects defined within it and their associated
metadata. To make it simple for users to inspect information about the databases,
schemas, and tables in Snowflake, the metadata is exposed as a collection of views
against the metadata layer.
Information Schema is one such offering from Snowflake that provides extensive metadata
information about the objects created in your account.
2. Snowflake Information Schema
The Snowflake Information Schema is a Data Dictionary schema available as a read-only schema
named INFORMATION_SCHEMA under each database created automatically by Snowflake.
It consists of a set of system-defined views and table functions that provide extensive metadata
information about the objects created in your account.
3. Snowflake Information Schema Views
The views in INFORMATION_SCHEMA display metadata about objects defined in the database, as
well as metadata for non-database, account-level objects that are common across all databases.
 There are 17 views available under INFORMATION_SCHEMA that hold
information about database-level objects.
 There are 8 views that hold information about account-level objects.
The Snowflake Information Schema views are ANSI-standard.
INFORMATION_SCHEMA available under a database named COVID19
Below are the various views available under INFORMATION_SCHEMA
3.1. Databases, Schema, Tables and Views
DATABASES: The databases that are accessible to the current user’s role.
SCHEMATA: The schemas defined in this database that are accessible to the current user’s role.
TABLES: The tables defined in this database that are accessible to the current user’s role.
VIEWS: The views defined in this database that are accessible to the current user’s role.
TABLE_STORAGE_METRICS: All tables within an account, including expired tables.
INFORMATION_SCHEMA_CATALOG_NAME: Returns the name of the database in which
Information_schema resides.
3.2. Sequences and File Formats
SEQUENCES: The sequences defined in this database that are accessible to the current user’s role.
FILE_FORMATS: The file formats defined in this database that are accessible to the current user’s
role.
3.3. Stages, External Tables and Pipes
STAGES: Stages in this database that are accessible by the current user’s role.
EXTERNAL_TABLES: The external tables defined in this database that are accessible to the current
user’s role.
PIPES: The pipes defined in this database that are accessible to the current user’s role.
LOAD_HISTORY: The history of data loaded into tables using the COPY INTO command within the
last 14 days.
3.4. UDFs, Procedures and Packages
PROCEDURES: The stored procedures defined in this database that are accessible to the current
user’s role.
FUNCTIONS: The user-defined functions defined in this database that are accessible to the current
user’s role.
PACKAGES: Available packages in current account.
3.5. Roles, Object Privileges and Usage Privileges
APPLICABLE_ROLES: The roles that can be applied to the current user.
ENABLED_ROLES: The roles that are enabled to the current user.
TABLE_PRIVILEGES: The privileges on tables defined in this database that are accessible to the
current user’s role.
USAGE_PRIVILEGES: The usage privileges on sequences defined in this database that are
accessible to the current user’s role.
OBJECT_PRIVILEGES: The privileges on all objects defined in this database that are accessible to
the current user’s role.
3.6. Columns and Constraints
COLUMNS: The columns of tables defined in this database that are accessible to the current user’s
role.
REFERENTIAL_CONSTRAINTS: Referential constraints in this database that are accessible to
the current user's role.
TABLE_CONSTRAINTS: Constraints defined on the tables in this database that are accessible to
the current user's role.
3.7. Replication Groups and Databases
REPLICATION_DATABASES: The databases for replication that are accessible to the current
user’s role.
REPLICATION_GROUPS: The replication groups that are accessible to the current user’s role.
4. Snowflake Information Schema Views Usage
Below are a few examples of using the Snowflake Information Schema views.
4.1. Get the list of all tables in a Snowflake Schema
SELECT table_name, table_type
FROM my_db.information_schema.tables
WHERE table_schema = 'my_schema'
ORDER BY table_name;
4.2. Get the Top 5 Table names with highest record count in each Snowflake Schema
SELECT table_schema, table_name, row_count
FROM(
SELECT
table_schema, table_name, row_count ,
dense_rank() over(partition by table_schema order by row_count desc) as rnk
FROM my_db.information_schema.tables
WHERE row_count IS NOT NULL
)
WHERE rnk <=5
ORDER BY table_schema, row_count desc;
4.3. Get the size (in bytes) of all tables in all schemas in a Snowflake database
SELECT table_schema,sum(bytes)
FROM my_db.information_schema.tables
GROUP BY table_schema;
4.4. Get the time travel duration of all tables in a Snowflake Schema
SELECT table_name, retention_time
FROM my_db.information_schema.tables
WHERE table_schema = 'my_schema'
ORDER BY table_name;
4.5. Get the list of all Primary Keys in all tables present in a Snowflake Schema
SELECT table_name, constraint_type, constraint_name
FROM my_db.information_schema.table_constraints
WHERE constraint_type = 'PRIMARY KEY' and table_schema = 'my_schema'
ORDER BY table_name;
4.6. Generate SQL statements to change ownership on all tables owned by a role to a new role in
Snowflake
SELECT 'grant ownership on table ' || table_name || ' to role my_new_role copy grants;' AS
grant_statement
FROM my_db.information_schema.table_privileges
WHERE grantor = 'old_grant_role';
4.7. Generate SQL statements to drop all tables in a Snowflake schema
SELECT 'drop table ' || table_name || ' cascade;'
FROM my_db.information_schema.tables
WHERE table_schema = 'my_schema'
ORDER BY table_name;
5. Snowflake Information Schema Table Functions
The table functions in INFORMATION_SCHEMA can be used to return account-level usage and
historical information for storage, warehouses, user logins, and queries.
Unlike the Information Schema views, the table functions are not visible directly under the Information
Schema. For more details about table functions, refer to the Snowflake Documentation.
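For example, the QUERY_HISTORY table function under INFORMATION_SCHEMA can be queried as
follows (a minimal sketch; my_db is a placeholder and RESULT_LIMIT can be adjusted as needed):
-- Up to 10 recent queries run by the current user
SELECT query_id, query_text, start_time, total_elapsed_time
FROM TABLE(my_db.information_schema.query_history(result_limit => 10))
ORDER BY start_time DESC;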
6. SHOW Commands vs Information Schema
The same data presented by the SHOW <objects> commands is also available through a SQL
interface using the INFORMATION SCHEMA views.
The SHOW commands can be replaced by the views, but before switching, you should be aware of
the following significant differences:
 To query the Information Schema views, a warehouse must be running and in use,
whereas this is not required for SHOW commands.
 While Information Schema views display all objects in the specified database,
most SHOW commands by default limit results to the current schema (see the example below).
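For example, both of the following list table names, but the SHOW command needs no running warehouse
and is scoped to a schema, while the view query needs a warehouse and filters explicitly (my_db and
my_schema are placeholders):
-- Metadata-only command; no warehouse required
SHOW TABLES IN SCHEMA my_db.my_schema;
-- View query; requires a running warehouse
SELECT table_name
FROM my_db.information_schema.tables
WHERE table_schema = 'MY_SCHEMA'
ORDER BY table_name;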
7. Summary
INFORMATION_SCHEMA is a read-only schema available automatically under each database. It
stores metadata of all Snowflake objects built under the database.
Running queries on INFORMATION_SCHEMA requires a warehouse to be up and running, which
incurs Snowflake credits.
The output of a view or table function depends on the privileges granted to the user's current role.
When querying an INFORMATION_SCHEMA view or table function, only objects for which the
current role has been granted access privileges are returned.
1. Introduction
Snowflake provides multiple ways to process data from staged files into database tables. In our
previous articles, we have discussed how to process files automatically when files arrive in the
external stage using Snowpipe by configuring Event Notifications.
In this article, let us discuss the Snowpipe REST API, which lets users define a list of files to ingest
into Snowflake and fetch reports of the load history by making REST API calls.
2. Snowpipe Rest API Endpoints
The Snowpipe API provides two REST endpoints to work with staged files.
 insertFiles
 insertReport
insertFiles Endpoint
This endpoint helps in passing the list of files to be ingested into a table in Snowflake. A successful
response from this endpoint means that Snowflake has recorded the list of files to add to the table. A
maximum of 5000 files can be submitted for ingestion in a single API request.
insertReport Endpoint
This endpoint helps in retrieving the list of files submitted for loading using insertFiles endpoint and
their load status. The endpoint retains the 10,000 most recent events.
3. Steps to access Snowpipe Rest API Endpoints using Postman
Follow below steps to access Snowpipe REST Endpoints using Postman.
 Step-1: Configure Key-Pair Authentication
 Step-2: Generate JWT Token
 Step-3: Create a table to load the data from staged files
 Step-4: Create a Stage and place the files
 Step-5: Create a Snowpipe
 Step-6: Ingest staged files by invoking Snowpipe using a REST API call
 Step-7: Get the load history of the files submitted using a REST API call
Step-1: Configure Key-Pair Authentication
Every API request we make must also include the authentication information. The Snowpipe REST
endpoints require Key Pair authentication with JSON Web Token (JWT).
Configure Key-Pair Authentication by performing below actions.
1. Generate Public-Private Key pair using OpenSSL.
2. Assign the generated public key to your Snowflake user.
3. The generated private key should be stored in a file and available locally on machine
where JWT is generated.
Refer our previous article for more details on Configuring Key Pair authentication.
Step-2: Generate JWT Token
Once Key Pair Authentication for your Snowflake account is set up, a JWT token should be generated.
This JWT, or JSON Web Token, is a time-limited token signed with your key; Snowflake knows that
you authorized this token to be used to authenticate as you for the API.
Below is the command to generate JWT token using SnowSQL.
snowsql --generate-jwt -a <account_identifier> -u <username> --private-key-path
<path>/rsa_key.pem
In the above command,
 <account_identifier> : It is the unique name assigned to your account. It can be
extracted from the URL to login to Snowflake account as shown below.
<account_identifier>.snowflakecomputing.com
 <username> : It is the user name with which you connect to the specified account.
 <path>: It is the location where the generated private key file is placed.
The below image shows generating JWT using SnowSQL command line tool.

Generating JWT using SnowSQL


Step-3: Create a table to load the data from staged files
Before we create a Snowpipe, let us create a table in Snowflake into which the data from staged files
is to be loaded.
The below SQL statement creates a table named EMPLOYEES.
CREATE OR REPLACE TABLE EMPLOYEES(
EMPLOYEE_ID NUMBER,
FIRST_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
EMAIL VARCHAR(50),
PHONE_NUMBER NUMBER
);
Step-4: Create a Stage and place the files
Snowpipe supports loading from the following stage types.
 Named Internal Stages
 External Stages
 Table stages
The below SQL statement creates an internal stage in Snowflake.
CREATE OR REPLACE STAGE MY_INTERNAL_STAGE;
In order to access the files from external locations like Amazon S3, Google Cloud Storage, or
Microsoft Azure, we need to build external stages in Snowflake referencing the external location
where our files will be placed.
Refer to the below articles for creating external stages in Snowflake
 Create an External Stage on AWS S3 bucket
 Create an External Stage on Azure Blob Storage
Step-5: Create a Snowpipe
Now that the stage from which the data files will be read and the table where the data will be loaded
are set up, let us build the pipe that copies data from the stage into the table.
The below SQL statement creates a Snowpipe named MY_REST_SNOWPIPE which loads data
from MY_S3_STAGE into EMPLOYEES table.
CREATE OR REPLACE PIPE MY_REST_SNOWPIPE
AS
COPY INTO EMPLOYEES
FROM @MY_S3_STAGE
FILE_FORMAT = (TYPE = 'CSV' skip_header = 1);
AUTO_INGEST=TRUE is not used while creating the Snowpipe, as the Snowpipe is invoked by
calling REST endpoints.
Step-6: Ingest staged files by invoking Snowpipe using a REST API call
To submit the list of files staged to load into a table using Snowpipe REST API, POST a request
using insertFiles endpoint.
The Snowpipe REST API request for submitting a list of files to ingest into a table is as follows.
HTTP Method: POST

EndPoint URL:
https://<account>.snowflakecomputing.com/v1/data/pipes/<pipeName>/insertFiles?
requestId=<requestId>

Headers:
Content-Type: application/json
Accept: application/json
Authorization: Bearer <jwt_token>

Body:
{
"files":[
{
"path":"path/file1.csv",
"size":100
},
{
"path":"path/file2.csv",
"size":100
}
]
}
In the Endpoint URL of the above request:
 account: The Account Identifier of your Snowflake account which can be obtained
from your login URL.
 pipeName: Fully qualified Snowpipe Name. ex: my_db.my_schema.my_pipe.
 requestId: A random string used to track requests. The same can be passed
to the insertReport endpoint to find the load status of files processed in a particular
request.
Below is the EndPoint URL of the sample request made for the demonstration.
https://eubchbl-al20253.snowflakecomputing.com/v1/data/pipes/
DEMO_DB.PUBLIC.MY_REST_SNOWPIPE/insertFiles?requestId=S0MeRaNd0MvA1ue01
In the Headers section of the above request:
 <jwt_token> is the token generated in step-2.
The below image shows the HTTP Method, EndPoint URL, Headers configured in the API request
to submit a list of files for ingestion in Postman.

In the Body section of the above request.


 The list of files with their paths, along with file sizes (optional but recommended for better
performance), is submitted.
Below are the contents of the body submitted for demonstration.
{
"files":[
{
"path":"Inbox/s3_emp_1.csv"
},
{
"path":"Inbox/s3_emp_2.csv"
}
]
}
Submit the request once all the required details are filled in. The successful response of the request
will be as shown below.
Step-7: Get the load history of the files submitted using a REST API call
The load history of files submitted for ingestion can be extracted using the insertReport endpoint.
The same can also be verified directly in Snowflake using the COPY_HISTORY table function.
The load history of files submitted for ingestion can be extracted by a REST API request as shown
below.
HTTP Method: GET

EndPoint URL:
https://<account>.snowflakecomputing.com/v1/data/pipes/<pipeName>/insertReport?
requestId=<requestId>&beginMark=<beginMark>

Headers:
Content-Type: application/json
Accept: application/json
Authorization: Bearer <jwt_token>
In the Endpoint URL of the above request:
 account: The Account Identifier of your Snowflake account which can be obtained
from your login URL.
 pipeName: Fully qualified Snowpipe Name. ex: my_db.my_schema.my_pipe.
 requestId: A random string used to track requests submitted in the REST request to
ingest files.
 beginMark: Marker, returned by a previous call to insertReport, that can be used to
reduce the number of repeated events seen when repeatedly calling insertReport.

Below is the response of the request used for demonstration which shows the load status of the files
processed.
{
"pipe": "DEMO_DB.PUBLIC.MY_REST_SNOWPIPE",
"completeResult": true,
"nextBeginMark": "1_1",
"files": [
{
"path": "Inbox/s3_emp_2.csv",
"stageLocation": "s3://te-aws-s3-bucket001/",
"fileSize": 18777931,
"timeReceived": "2023-05-21T15:19:04.353Z",
"lastInsertTime": "2023-05-21T15:19:35.356Z",
"rowsInserted": 199923,
"rowsParsed": 199923,
"errorsSeen": 0,
"errorLimit": 1,
"complete": true,
"status": "LOADED"
},
{
"path": "Inbox/s3_emp_1.csv",
"stageLocation": "s3://te-aws-s3-bucket001/",
"fileSize": 18777914,
"timeReceived": "2023-05-21T15:19:04.353Z",
"lastInsertTime": "2023-05-21T15:19:35.356Z",
"rowsInserted": 200195,
"rowsParsed": 200195,
"errorsSeen": 0,
"errorLimit": 1,
"complete": true,
"status": "LOADED"
}
],
"statistics": {
"activeFilesCount": 0
}
}
The load status can also be verified from Snowflake using the COPY_HISTORY table function as
shown below.
SELECT * FROM
TABLE(INFORMATION_SCHEMA.COPY_HISTORY(TABLE_NAME=>'EMPLOYEES',
START_TIME=> DATEADD(MINUTES, -10, CURRENT_TIMESTAMP())));

COPY_HISTORY output
4. Snowpipe REST API Response Codes
Below are the expected response codes of the Snowpipe REST API requests.
Response Code: Description
200: Success. Files added to the queue of files to ingest.
400: Failure. Invalid request due to an invalid format, or limit exceeded.
404: Failure. pipeName not recognized. This error code can also be returned if the role used when calling
the endpoint does not have sufficient privileges.
429: Failure. Request rate limit exceeded.
500: Failure. Internal error occurred.

5. Closing Points
In this article, we have used the Postman REST client to access the Snowpipe REST API. However, in your
application, you can choose to use either the Java or Python SDKs provided by Snowflake.
The difference is that the Snowflake-provided SDKs automatically handle the creation and management of
the JWT tokens required for authentication.
For more details, refer Snowflake Documentation.
Execute multiple SQL statements in a single Snowflake API request
April 30, 2023
Introduction
In our previous article, we discussed an overview of the Snowflake SQL REST API and how to
submit a SQL API request to execute a SQL statement. That method allows submitting only one SQL
statement for execution.
In this article let us understand how to submit a request containing multiple statements to the
Snowflake SQL API.
Submitting multiple statements in a single Snowflake SQL REST API request
The process to submit multiple statements in a single request is similar to submitting a single
statement in a request except that in the body part of the request
 In the statement field, enter multiple statements separated using semicolon (;)
 In the parameters field, set the MULTI_STATEMENT_COUNT field to the
number of SQL statements in the request.
The Snowflake SQL REST API request for executing multiple SQL statements in a single request is
as follows.
HTTP Method: POST

EndPoint URL:
https://<account_identifier>.snowflakecomputing.com/api/v2/statements

Headers:
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

Body:
{
"statement": "select * from table1;selct * from table2;",
"timeout": 60,
"database": "<your_database>",
"schema": "<your_schema>",
"warehouse": "<your_warehouse>",
"role": "<your_role>"
"parameters": {
"MULTI_STATEMENT_COUNT": "<statements_count>"
}
}
To learn more about how to generate a JWT token, refer our previous article.
For example, below is the body part of the request submitting two SQL statements for execution.
{
"statement": "select * from employees where employee_id=101;select * from employees where
employee_id=102;",
"timeout": 60,
"database": "DEMO_DB",
"schema": "PUBLIC",
"warehouse": "COMPUTE_WH",
"role": "ACCOUNTADMIN",
"parameters": {
"MULTI_STATEMENT_COUNT": "2"
}
}
In the above example
 MULTI_STATEMENT_COUNT is set to 2 which corresponds to the number of
SQL statements being submitted.
 To submit a variable number of SQL statements in the statement field,
set MULTI_STATEMENT_COUNT to 0. This is useful in an application where the
number of SQL statements submitted is not known at runtime.
 If the value of MULTI_STATEMENT_COUNT does not match the number of SQL
statements specified in the statement field, the SQL API returns an error.
The below image shows the HTTP Method, EndPoint URL, Headers configured in the API request
to execute a SQL statement in Postman.

API request showing the HTTP Method, EndPoint URL, Headers in Postman
The below image shows the Body part of API request configured to execute a SQL statement in
Postman.

API request showing the Body section of the request in Postman


Extracting Results of each SQL Statement in the API request
The response of the request submitting multiple SQL statements for execution does not include the
output of the individual statements. Instead, the response contains a statementHandles field which
holds the list of statement handles, one for each statement.
The statementHandle can be used in a GET request to /api/v2/statements/ endpoint to get the
execution status and output of individual statements.
The response of the above-discussed example is as follows; it contains the statementHandle
information of the individual statements in the statementHandles field.
{
"resultSetMetaData": {
"numRows": 1,
"format": "jsonv2",
"partitionInfo": [
{
"rowCount": 1,
"uncompressedSize": 57
}
],
"rowType": [
{
"name": "multiple statement execution",
"database": "",
"schema": "",
"table": "",
"nullable": false,
"scale": null,
"collation": null,
"byteLength": 16777216,
"precision": null,
"type": "text",
"length": 16777216
}
]
},
"data": [
[
"Multiple statements executed successfully."
]
],
"code": "090001",
"statementHandles": [
"01abf778-3200-b817-0003-c9860001d03e",
"01abf778-3200-b817-0003-c9860001d042"
],
"statementStatusUrl": "/api/v2/statements/01abf778-3200-b817-0003-c9860001d03a?
requestId=7e5c7836-a8bd-4765-b9f9-cbaf1b96a2e5",
"requestId": "7e5c7836-a8bd-4765-b9f9-cbaf1b96a2e5",
"sqlState": "00000",
"statementHandle": "01abf778-3200-b817-0003-c9860001d03a",
"message": "Statement executed successfully.",
"createdOn": 1682833476276
}
The Snowflake SQL REST API request for checking the execution status of an individual SQL
statement is as follows.
HTTP Method: GET

EndPoint URL:
https://<account_identifier>.snowflakecomputing.com/api/v2/statements/<statementHandle>

Headers:
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT
Introduction
Snowflake provides multiple ways to manage data efficiently in its database. Snowflake SQL REST
API is one such feature which allows users to interact with Snowflake through HTTP requests,
making it easy to integrate with other systems.
Before we jump into understanding the capabilities of Snowflake SQL REST API and how to access
it, let us quickly understand what an API is.
What is REST API?
API stands for Application Programming Interface, a software intermediary provided by one
application to other applications that allows two applications to talk to each other.
A real-time example of APIs is the weather apps on your mobile. These apps do not use a weather
forecasting system of their own. Instead, they provide weather information by accessing the API
of a third-party weather provider. Apple, for instance, uses The Weather Channel's API.
REST stands for REpresentational State Transfer, which is an architectural style. REST defines a set
of principles and standards for building APIs and is the most widely accepted architectural style for
doing so.
REST API request is generally made up of four parts – HTTP Method, Endpoint URL, Headers
and Body
We will discuss more about making a Snowflake SQL REST API request in the subsequent
sections of the article.
Snowflake SQL REST API capabilities
The operations that can be performed using Snowflake SQL REST API are
 Submit SQL statements for execution.
 Check the status of the execution of a statement.
 Cancel the execution of a statement.
 Fetch query results concurrently.
This API can be used to execute standard queries and most DDL and DML statements.
Snowflake SQL REST API Endpoints
The Snowflake SQL REST API can be accessed using the following URL.
https://<account_identifier>.snowflakecomputing.com/api
The <account_identifier> can be obtained easily from your login URL.
The API consists of the /api/v2/statements/ resource and provides the following endpoints:
The following endpoint is used to submit a SQL statement for execution.
/api/v2/statements
The following endpoint is used to check the status of the execution of a statement.
/api/v2/statements/<statementHandle>
The following endpoint is used to cancel the execution of a statement.
/api/v2/statements/<statementHandle>/cancel
In the steps to come, we shall learn how to access all these endpoints using Postman.
Steps to access Snowflake SQL REST API using Postman
Postman is a powerful tool for testing APIs, and it allows us to easily make HTTP requests and view
responses. You can either download and install the Postman desktop application or use its web
version from any device by creating an account.
Follow the below steps to access the Snowflake SQL REST API using Postman.
Authentication
Every API request we make must also include the authentication information. There are two options
for providing authentication: OAuth and JWT key pair authentication. In this article we will use JWT
key pair authentication for demonstration purposes.
Follow below steps to use JWT Key Pair Authentication.
1. Configure Key-Pair Authentication by performing below actions.
 Generate Public-Private Key pair using OpenSSL.
 Assign the generated public key to your Snowflake user.
 The generated private key should be stored in a file and available locally on machine
where JWT is generated.
Refer our previous article for more details on configuring Key Pair authentication.
2. Once Key Pair Authentication for your Snowflake account is set up, a JWT token should be generated.
This JWT token is a time-limited token signed with your key; Snowflake knows that you authorized this
token to be used to authenticate as you for the SQL API.
Below is the command to generate JWT token using SnowSQL.
snowsql --generate-jwt -a <account_identifier> -u <username> --private-key-path
<path>/rsa_key.pem
The below image shows generating JWT using SnowSQL command line tool.

Generating JWT using SnowSQL


3. The generated JWT token is used as part of the Headers in the API request you are making in
Postman.
Authorization: Bearer <jwt_token>
<jwt_token> is the token that you generated.
Submitting a Request to Execute a SQL Statement
To execute a SQL statement using Snowflake SQL REST API, we have to send a POST request to
the /api/v2/statements/ endpoint.
The Snowflake SQL REST API request for executing a SQL statement is as follows.
HTTP Method: POST

EndPoint URL:
https://<account_identifier>.snowflakecomputing.com/api/v2/statements

Headers:
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

Body:
{
"statement": "select * from table",
"timeout": 60,
"database": "<your_database>",
"schema": "<your_schema>",
"warehouse": "<your_warehouse>",
"role": "<your_role>"
}
In the body part of the above request
 The statement field specifies the SQL statement to execute.
 The timeout field specifies that the server allows 60 seconds for the statement to be
executed.
 The other fields are self-explanatory.
The below image shows the HTTP Method, EndPoint URL, Headers configured in the API request
to execute a SQL statement in Postman.
The below image shows the Body part of API request configured to execute a SQL statement in
Postman.

Reading response of the API request.


If the submitted SQL statement through API request is successfully executed, Snowflake returns the
HTTP response code 200 and returns the rows in a JSON array object.
Below is the response of the API request we submitted earlier.
{
"resultSetMetaData": {
"numRows": 3,
"format": "jsonv2",
"partitionInfo": [
{
"rowCount": 3,
"uncompressedSize": 59
}
],
"rowType": [
{
"name": "EMPLOYEE_ID",
"database": "DEMO_DB",
"schema": "PUBLIC",
"table": "EMPLOYEES",
"nullable": true,
"scale": 0,
"collation": null,
"byteLength": null,
"precision": 38,
"type": "fixed",
"length": null
},
{
"name": "EMPLOYEE_NAME",
"database": "DEMO_DB",
"schema": "PUBLIC",
"table": "EMPLOYEES",
"nullable": true,
"scale": null,
"collation": null,
"byteLength": 200,
"precision": null,
"type": "text",
"length": 50
}
]
},
"data": [
[
"100",
"Tony"
],
[
"101",
"Steve"
],
[
"102",
"Bruce"
]
],
"code": "090001",
"statementStatusUrl": "/api/v2/statements/01abf1da-3200-b819-0003-c9860001c04a?
requestId=cf3656ec-15c7-42c2-9033-8405fb7e26bb",
"requestId": "cf3656ec-15c7-42c2-9033-8405fb7e26bb",
"sqlState": "00000",
"statementHandle": "01abf1da-3200-b819-0003-c9860001c04a",
"message": "Statement executed successfully.",
"createdOn": 1682747204706
}
The response of the API request consists of
 Number of rows returned as output.
 The rowType array object which gives additional metadata information about the
datatypes returned from the query.
 The actual row data information is available under the data array.
 A QueryStatus object which includes information about the status of the execution
of the statement.
Checking the Status of Execution of the Statement
If the execution of the statement takes longer than 45 seconds or if you submitted an asynchronous
query, Snowflake returns a 202 response code. In such cases you must send a request to check the
execution status of the statement.
To get the execution status of the statement, you can send GET request to /api/v2/statements/
endpoint and append the statementHandle to the end of the URL path as a path parameter.
The statementHandle is a unique identifier of a statement submitted for execution.
The statementHandle information is present in the output response of the request in the QueryStatus
object( highlighted in the response above ).
The Snowflake SQL REST API request for checking the execution status of a SQL statement is as
follows.
HTTP Method: GET

EndPoint URL:
https://<account_identifier>.snowflakecomputing.com/api/v2/statements/<statementHandle>

Headers:
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT
If the statement has finished executing successfully, Snowflake returns the HTTP response code 200
and the results in a ResultSet object. However, if an error occurred when executing the statement,
Snowflake returns the HTTP response code 422 with a QueryFailureStatus object.
Cancelling the Execution of a SQL statement
To cancel the execution of a statement, send a POST request to the /api/v2/statements/ endpoint and
append the statementHandle to the end of the URL path followed by cancel as a path parameter.
The Snowflake SQL REST API request to cancel the execution of a SQL statement is as follows.
HTTP Method: POST

EndPoint URL:
https://<account_identifier>.snowflakecomputing.com/api/v2/statements/<statementHandle>/cancel

Headers:
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT
HOW TO: Generate JWT Token for Snowflake Key Pair Authentication?
April 9, 2023
Introduction
In our previous article, we discussed how to set up Key Pair Authentication in
Snowflake. But if you want to connect to Snowflake via the Snowflake SQL REST API using key pair
authentication, it expects a valid JWT (JSON Web Token).
In this article let us understand what JWT token is, how to generate it and pre-requisites to generate it.
What is JWT Token?
JWT or JSON Web Token is an open industry standard for securely transmitting information between
parties as a JSON object most commonly used to identify an authenticated user.
Once the user is logged in, each subsequent request will also include the JWT. JWT tokens are
usually valid only for a certain period, typically about 60 minutes, and they need to be regenerated after
the token expires.
Pre-requisites for generating JWT Token for Snowflake Authentication
Below are the pre-requisites for generating JWT Token for Snowflake Key Pair Authentication.
1. Generate Public-Private Key pair using OpenSSL.
2. Assign the public key to your Snowflake user.
3. The generated private key should be stored in a file and available locally on machine
where JWT is generated.
Refer our previous article for more details on configuring Key Pair authentication.
Generating JWT Token using Snowflake SnowSQL
SnowSQL is a command line tool for connecting to Snowflake. Snowflake SnowSQL lets you execute
all SQL queries and perform DDL and DML operations, including loading data into and unloading
data out of database tables.
Refer our previous article to learn more about Snowflake SnowSQL.
SnowSQL has a parameter, --generate-jwt, which generates the JWT token when used in
conjunction with the following parameters.
 -a <account_identifier> : It is the unique name assigned to your account. It can be
extracted from the URL to login to Snowflake account as shown below.
<account_identifier>.snowflakecomputing.com
 -u <username> : It is the user name with which you connect to the specified account.
 --private-key-path <path>: It is the location where the generated private key file is
placed.
Below is the command to generate JWT token in SnowSQL.
snowsql --generate-jwt -a <account_identifier> -u <username> --private-key-path
<path>/rsa_key.pem
The below image shows generating JWT using SnowSQL command line tool.

Generate JWT using SnowSQL


If the generated private key file is encrypted with a password, you will be prompted to enter the
password; otherwise, press Enter.
Generating JWT Token using Python
JWT token can be generated using Python. Snowflake provides a readily available in-built script
which can be downloaded and used for generating JWT tokens in your machine.
The script uses pyjwt and cryptography python libraries for generating the token. Install these libraries
in your python run time environment before running the script.
pip install pyjwt
pip install cryptography
The python script sql-api-generate-jwt.py can be downloaded from Snowflake documentation page
using this link.
To generate JWT, pass values to the following input arguments while running the Python script.
 account : It is the unique name assigned to your account. It can be extracted from the
URL to login to Snowflake account as shown below.
<account_identifier>.snowflakecomputing.com
 user : It is the user name with which you connect to the specified account.
 private_key_file_path : It is the location where the generated private key file is
placed.
To generate JWT, run the python script with input arguments from command line as shown below.
python sql-api-generate-jwt.py --account=<account_identifier> --user=<username> --
private_key_file_path=<path>\rsa_key.pem
The below image shows generating JWT using Python Script.
Generate JWT using Python Script
The JWT tokens can also be generated using Java and Node.js. Snowflake provides sample scripts in
these languages to generate tokens. For more details refer Snowflake Documentation.
1. Introduction
Snowflake supports multiple ways of authentication apart from basic authentication method (i.e. using
username and password). For enhanced security while connecting to Snowflake from other
applications, Snowflake supports Key Pair authentication which uses a combination of public-private
key pair.
The public and private keys required for authentication must be a minimum 2048-bit RSA key pair
generated using OpenSSL. In the Key Pair authentication method the public key is assigned to
the user and the user must use the private key while connecting from the Snowflake client.
How does it work?
Every public key matches only one private key. Together, they are used to encrypt and decrypt
messages. Data encrypted with the private key can be decrypted only with the public key and vice
versa.
When keys are used for authentication, the user being authenticated uses the private key to generate a
digital signature. Snowflake verifies the signature using the user's public key and compares the hash with
its own computed hash. If the values match, user authentication is successful.
2. How to generate Public and Private Keys using OpenSSL in Windows?
There are several ways of generating the public-private key pairs.
OpenSSL is an open-source command line tool that is commonly used to generate public-private key
pairs. Snowflake supports keys generated through OpenSSL only.
The keys generated using PuTTYgen or OpenSSH in Windows are not supported by
Snowflake. Generate keys by installing OpenSSL on your Windows machine, or use keys
exported from a Linux machine that were generated using OpenSSL.
2.1. Download and Install OpenSSL on Windows
Follow below steps to install OpenSSL on Windows machine
1. Navigate to OpenSSL for Windows installer page.
2. Download the Windows64 Installation package.
3. Once the download is complete, double-click the downloaded file to start installation.
4. Accept license agreement and click Next in each step to proceed.
5. Click on Install.
6. Click on Finish to exit the wizard once installation is complete.
2.2. Set Environmental Variables for OpenSSL
Follow below steps to set Environmental Variables for OpenSSL.
1. From the Start menu, search for “environmental variables” and click on the Edit the
system environment variables result.
2. In the System Properties window, under the Advanced tab, click Environment
Variables.
3. Under User variables, click on New to create a new variable.
 Variable Name: OPENSSL_CONF
 Variable Value: C:\Program Files\OpenSSL-Win64\bin\openssl.cfg
4. Enter a new value for Path variable as mentioned below.
 C:\Program Files\OpenSSL-Win64\bin
5. Click OK to save the changes.
2.3. Test the OpenSSL Installation
Run the following command from a new command prompt window to test the installation of
OpenSSL.
openssl version
If the installation is successful and the variables set up is done correctly, the command outputs the
OpenSSL version as shown below.

Testing
OpenSSL installation
2.4. Generate Private Key
To generate a private key, open a command prompt window and navigate to the path where the keys need to
be stored. You can generate either an encrypted version of the private key or an unencrypted version
of the private key.
To generate an unencrypted version, use the following command:
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.pem -nocrypt
To generate an encrypted version, use the following command (which omits “-nocrypt”):
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.pem
You will have to enter a passphrase as the encryption password with this method, which will be
required during authentication.
These commands generate a private key in PEM format as shown below.
-----BEGIN PRIVATE KEY-----
MIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQC0ElLYu+UZjgft
6th1HDppkJg1pbEzCiUw6+czuiDgzfnbvEG8Ah/y1Ir2f27AmCUVvfFIXiEfGFIY
...
d+7T5RSG+bQyylGPpfpdig==
-----END PRIVATE KEY-----
2.5. Generate Public Key
The Public key is generated by referencing the Private Key.
The following command generates the public key using the private key contained in rsa_key.pem
openssl rsa -in rsa_key.pem -pubout -out rsa_key.pub
This command generates a public key in PEM format as shown below.
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtBJS2LvlGY4H7erYdRw6
aZCYNaWxMwolMOvnM7og4M3527xBvAIf8tSK9n9uwJglFb3xSF4hHxhSGH7sy1n2
...
qIQIDAQAB
-----END PUBLIC KEY-----
3. Configuring Key Pair Authentication in Snowflake
Follow below steps to configure Key Pair Authentication for all supported Snowflake clients.
3.1. Generate and store the Public and Private keys securely
Generate the public and private keys using OpenSSL as explained in the previous section. If the keys
are generated in a different location, move them to your local directory where the snowflake client
runs.
The key files should be protected from unauthorized access, and it is the user's responsibility to secure
the keys.
3.2. Assign the Public Key to a Snowflake User
The public key should be assigned to the user using ALTER USER statement as shown below.
ALTER USER SFUSER08 SET RSA_PUBLIC_KEY = 'MIIBIjANB…';
Only users with SECURITYADMIN role and above can alter the user.
3.3. Verify the assigned Public Key of a User
Verify that the public key is successfully assigned to the user using the DESCRIBE USER statement as
shown below.
DESCRIBE USER SFUSER08;

Verifying the Public Key and Public Key Finger print assigned to the user
The above output of DESCRIBE USER shows the assigned public key and the generated public key
fingerprint.
With this step the configuration of the Key Pair Authentication is completed.
4. Connect Snowflake using Key-Pair Authentication
Below are the supported Snowflake Clients with Key Pair Authentication.
 SnowSQL (CLI Client)
 Snowflake Connector for Python
 Snowflake Connector for Spark
 Snowflake Connector for Kafka
 Go driver
 JDBC Driver
 ODBC Driver
 Node.js Driver
 .NET Driver
 PHP PDO Driver for Snowflake
Let us use SnowSQL to verify whether the generated private key can be used to connect to
Snowflake.
In order to connect SnowSQL through key pair authentication, the private key must be available on a
local directory of the machine where SnowSQL is installed.
To know more about how to download and install SnowSQL, refer our previous article.
Run the below command to connect to SnowSQL using Private key.
snowsql -a <account_identifier> -u <user> --private-key-path <path>/rsa_key.pem
The <account_identifier> can be extracted from the URL to login to Snowflake account.
<account_identifier>.snowflakecomputing.com
The below image shows that we were able to successfully connect to Snowflake SnowSQL using the
private key generated.

Connecting to Snowflake from SnowSQL using Key Pair Authentication


5. Configuring Key Pair Rotation in Snowflake
Snowflake supports assigning multiple keys to users for rotation of key pairs for authentication.
Currently, RSA_PUBLIC_KEY and RSA_PUBLIC_KEY_2 parameters can be used to associate up
to 2 public keys with a single user using ALTER USER.
To assign a second public key to a user
1. Generate a new public-private key pair.
2. Assign the public key to the user using RSA_PUBLIC_KEY_2 parameter as shown below.
ALTER USER SFUSER08 SET RSA_PUBLIC_KEY_2 = 'MIIBIjANB…';
3. Use the new private key to connect to Snowflake. Snowflake verifies the correct active public key
for authentication based on the private key submitted with your connection information.
To remove a public key assigned to a user, use the following ALTER USER command.
ALTER USER SFUSER08 UNSET RSA_PUBLIC_KEY;
Snowflake User-Defined Functions (UDFs)
June 11, 2023
1. Introduction
There are instances where certain requirements cannot be fulfilled with the existing
built-in system-defined functions provided by Snowflake. In such cases, Snowflake allows users to
create their own functions, called User-Defined Functions, based on their requirements. Once created,
these functions can be reused multiple times.
In this article, let us deep-dive into understanding User-Defined Functions (UDFs) in Snowflake.
2. What is a User-Defined Function?
A Snowflake User-Defined Function is a reusable component defined by the user to perform a
specific task, which can be called from a SQL statement. Similar to built-in functions, user-
defined functions can be called repeatedly from multiple places in the code.
3. UDF Supported Languages in Snowflake
Though UDFs are created using SQL, Snowflake supports writing the body of the function, which
holds the logic, in multiple languages. Each language allows you to manipulate data within the
constraints of the language and its runtime environment.
Below are the languages supported by Snowflake for writing UDFs:
1. Java
2. JavaScript
3. Python
4. Scala
5. SQL
In this article, we will focus on building UDFs using SQL.
4. Types of User-Defined Functions
A SQL UDF evaluates an arbitrary SQL expression and returns the result(s) of the expression. Based
on the return value(s) provided by a function, the UDFs are of two different types.
1. Scalar Function (UDF) – returns a single value.
2. Table Function (a user-defined table function, or UDTF) – returns a set of rows as
tabular value.
Let us understand these UDF types with examples in the further sections of the article.
5. Creating User-Defined Function in Snowflake
The User-Defined Functions (UDFs) are created using a CREATE FUNCTION command. Below is
the syntax to create UDFs in Snowflake.
CREATE OR REPLACE FUNCTION <name> ( [ <arg_name> <arg_data_type> ] [ , ... ] )
RETURNS <result_data_type>
LANGUAGE <language>
AS
$$
<function_body>
$$
;
The syntax to create UDFs is similar to that of creating Stored Procedures in Snowflake. For more details
about the various parameters used in the syntax, refer to our previous article.
6. Calling User-Defined Function in Snowflake
A User-defined function (UDF) or a User-defined table function (UDTF) can be called in the same
way that you call other functions.
Calling a UDF:
A UDF can be called using a SELECT statement as shown below. If a UDF has arguments, you can
specify those arguments by name or by position.
SELECT udf_name(udf_arguments) ;
Calling a UDTF:
A UDTF can be called the same way any other table function would be called. A UDTF is called in the FROM
clause of a query using the TABLE keyword followed by the UDTF name and its arguments wrapped inside
parentheses, as shown below.
SELECT ...
FROM TABLE ( udtf_name (udtf_arguments) ) ;
7. Scalar Function with Examples
A Scalar Function (UDF) returns a single row as output for each input row. The returned row
consists of a single column/value.
Example-1: Convert Datetimestamp value into a Date value using a UDF.
The below UDF get_date takes Datetimestamp value as an input argument and returns a date value.
CREATE OR REPLACE FUNCTION get_date(business_date timestamp)
RETURNS DATE
LANGUAGE SQL
AS
$$
TO_DATE(SUBSTR(TO_CHAR(business_date),1,10))
$$;
The call to the UDF can be made as shown below.
SELECT get_date('2023-01-01 12:53:22.000');
Output:
GET_DATE(‘2023-01-01 12:53:22.000’)

2023-01-01

Example-2: Calling UDF in a SELECT query.


Consider below Sales table as an example for demonstration purpose.
CREATE OR REPLACE TABLE SALES(
sale_datetime TIMESTAMP,
sale_amount NUMBER(19,4)
);

INSERT INTO SALES VALUES


('2023-01-01 12:53:22.000','2876.93'),
('2023-01-02 01:14:55.000','3509.75'),
('2023-01-03 01:05:12.000','2971.66'),
('2023-01-04 12:47:49.000','3328.32');
The same UDF get_date created in Example-1 is used to convert the datetimestamp values into
date values while reading data from the Sales table, by calling the UDF in the SELECT statement as shown
below.
SELECT
get_date(sale_datetime) AS sale_date,
sale_amount
FROM SALES;
Output:
SALE_DATE SALE_AMOUNT

2023-01-01 2876.9300

2023-01-02 3509.7500

2023-01-03 2971.6600

2023-01-04 3328.3200

Example-3: Calling UDF in a WHERE clause of a SELECT query.


The SQL query below is an example where UDF is applied on a field used in the WHERE clause of
SELECT statement.
SELECT * FROM sales
WHERE get_date(sale_datetime) > '2023-01-02';
Output:
SALE_DATETIME SALE_AMOUNT

2023-01-03 01:05:12.000 2971.6600

2023-01-04 12:47:49.000 3328.3200

Example-4: UDF with a Query Expression with SELECT Statement.


Below is an example of a UDF which, when called, provides the sum of the total sales amount
from the Sales table.
CREATE OR REPLACE FUNCTION get_total_sales()
RETURNS NUMBER(19,4)
LANGUAGE SQL
AS
$$
SELECT SUM(sale_amount) FROM SALES
$$;
When using a query expression in a SQL UDF, do not include a semicolon (;) within the UDF
body to terminate the query expression.
Calling the UDF using SELECT.
SELECT get_total_sales();
Output:
GET_TOTAL_SALES()

12686.6600
Although the body of a UDF can contain a complete SELECT statement, it cannot contain DDL
statements or any DML statement other than SELECT.
8. Table Function with Examples
Table Functions or Tabular SQL UDFs (UDTFs) return a set of rows consisting of 0, 1, or
more rows, each of which has 1 or more columns.
While creating UDTFs using the CREATE FUNCTION command, the <result_data_type> should
be TABLE(…). Inside the parentheses, specify the output column names along with their expected data
types.
Consider the below tables sales_by_country and currency as examples for demonstration purposes.
CREATE OR REPLACE TABLE sales_by_country(
year NUMBER(4),
country VARCHAR(50),
sale_amount NUMBER
);

INSERT INTO SALES_BY_COUNTRY VALUES


('2022','US','90000'),
('2022','UK','75000'),
('2022','FR','55000'),
('2023','US','100000'),
('2023','UK','80000'),
('2023','FR','70000');

CREATE OR REPLACE TABLE currency(


country VARCHAR(50),
currency VARCHAR(3)
);

INSERT INTO CURRENCY VALUES


('US','USD'),
('UK','GBP'),
('FR','EUR');
Example-1: Basic example of a UDTF which returns data from a table.
The below UDTF is an example which returns data from a table based on an input argument value.
CREATE OR REPLACE FUNCTION get_sales(country_name VARCHAR)
RETURNS TABLE (year NUMBER, sale_amount NUMBER, country VARCHAR)
AS
$$
SELECT year, sale_amount, country
FROM sales_by_country
WHERE country = country_name
$$
;
Calling the UDTF to return the data of US country.
SELECT * FROM TABLE(get_sales('US'));
Output:
YEAR SALE_AMOUNT COUNTRY

2022 90000 US

2023 100000 US

Example-2: UDTF with Joins in a SQL Query.


The below UDTF is an example which joins two tables to return the required data from both the tables
based on an input argument value.
CREATE OR REPLACE FUNCTION get_sales_with_currency(country_name VARCHAR)
RETURNS TABLE (year NUMBER, sale_amount NUMBER, country VARCHAR, currency
VARCHAR)
AS
$$
SELECT a.year, a.sale_amount, a.country, b.currency
FROM sales_by_country a
JOIN currency b
ON a.country = b.country
WHERE a.country = country_name
$$
;
Calling the UDTF to return the data of US country.
SELECT * FROM TABLE(get_sales_with_currency ('US'));
Output:
YEAR SALE_AMOUNT COUNTRY CURRENCY

2022 90000 US USD

2023 100000 US USD

Example-3: Using UDTF in a Join of a SQL Query.


The below SQL statement is an example where data from a UDTF (get_sales) is joined with a table
(Currency).
SELECT a.year, a.sale_amount, a.country, b.currency
FROM TABLE(get_sales('US')) a
JOIN currency b
ON a.country = b.country
;
Output:
YEAR SALE_AMOUNT COUNTRY CURRENCY

2022 90000 US USD

2023 100000 US USD

9. Difference between UDF and Stored Procedure


1. UDFs Return a Value. Stored Procedures Need Not.
 The main purpose of a UDF is to calculate and return a value, whereas a Stored
Procedure is used to perform administrative operations by executing SQL statements
and is not required to explicitly return a value.
2. UDF Return Values Are directly usable in SQL. Stored Procedure Return Values may not be.
 The value returned by a stored procedure, unlike the value returned by a function,
cannot be used directly in SQL.
3. UDFs do not support DDL and DML statements; Stored Procedures do.
 UDFs only support SELECT statements in the function body, whereas Stored
Procedures also allow DDL and DML statements inside the procedure body.
4. UDFs can be called in the context of another statement. Stored Procedures are called
Independently.
 A stored procedure is called as an independent statement, whereas a UDF is always
called within the context of another SQL statement.
SELECT MyStoredProcedure(argument_1); -- Not Supported

CALL MyStoredProcedure(argument_1);

SELECT MyUDF(column_1) FROM table1;


5. Multiple UDFs may be called within one statement. A Single Stored Procedure is called as one
statement.
 A single executable statement can call only one stored procedure. In contrast, a single
SQL statement can call multiple functions.
CALL MyStoredProcedure(argument_1);

SELECT MyUDF_1(column_1), MyUDF_2(column_2) FROM table1;
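As a small sketch tying differences 2 and 4 together (get_total_sales and SALES are from the earlier examples in this article; MyStoredProcedure is the hypothetical procedure used above), a UDF result can be embedded directly inside another SQL expression, while a stored procedure's return value is only available from the result of its own CALL:
-- The UDF call is embedded in the predicate of another statement
SELECT sale_datetime, sale_amount
FROM sales
WHERE sale_amount > get_total_sales() / 10;

-- The stored procedure must be invoked as its own statement;
-- its return value appears only in the CALL result
CALL MyStoredProcedure(argument_1);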

Caller’s and Owner’s Rights in Snowflake Stored Procedures


March 31, 2023
1. Introduction
Stored procedures in Snowflake run either with caller's rights or owner's rights, which define
the privileges with which the statements in the stored procedure execute. By default,
when a stored procedure is created in Snowflake without specifying the rights with which it should be
executed, it runs with owner's rights.
In this article, let us discuss what caller's rights and owner's rights are, the differences between
the two, and how to implement them in Snowflake stored procedures.
2. Caller’s Rights in Snowflake Stored Procedures
A caller’s rights stored procedure runs with the privileges of the role that called the stored procedure.
The term “Caller” in this context refers to the user executing the stored procedure, who may or may
not be the creator of the procedure.
Any statement that the caller could not execute outside the stored procedure cannot be executed
inside the stored procedure with caller’s rights.
At the time of creation of stored procedure, the creator has to specify if the stored procedure runs with
caller’s rights. The default is owner’s rights.
The syntax to create a stored procedure with caller’s rights is as shown below.
CREATE OR REPLACE PROCEDURE <procedure_name>()
RETURNS <data_type>
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$

$$;
3. Owner’s Rights in Snowflake Stored Procedures
An Owner’s rights stored procedure runs with the privileges of the role that created the stored
procedure. The term “Owner” in this context refers to the user who created the stored procedure, who
may or may not be executing the procedure.
The primary advantage of owner's rights is that the owner can delegate privileges to
another role through a stored procedure without actually granting those privileges outside the
procedure.
For example, suppose a user who does not have access to clean up data in a table is granted access to a
stored procedure (with owner's rights) which does it. The user, who has no privileges on the table,
can clean up the data in the table by executing the stored procedure. However, the same statements,
when executed outside the procedure, cannot be run by that user.
The syntax to create a stored procedure with owner’s rights is as shown below.
CREATE OR REPLACE PROCEDURE <procedure_name>()
RETURNS <data_type>
LANGUAGE SQL
EXECUTE AS OWNER
AS
$$

$$;
Note “EXECUTE AS OWNER” is optional. Even if the statement is not specified, the procedure is
created with owner’s rights.
4. Difference between Caller’s and Owner’s Rights in Snowflake
The below are the differences between Caller’s and Owner’s Rights in Snowflake.
Caller's Rights:
 Runs with the privileges of the caller.
 Inherits the current warehouse of the caller.
 Uses the database and schema that the caller is currently using.
Owner's Rights:
 Runs with the privileges of the owner.
 Inherits the current warehouse of the caller.
 Uses the database and schema that the stored procedure is created in, not the database and
schema that the caller is currently using.
5. Demonstration of Caller’s and Owner’s Rights
Let us understand how Caller’s and Owner’s Rights work with an example
using ACCOUNTADMIN and SYSADMIN roles.
Using ACCOUNTADMIN role, let us create a table named Organization for demonstration.
USE ROLE ACCOUNTADMIN;
CREATE TABLE organization(id NUMBER, org_name VARCHAR(50));
When the table is queried using the SYSADMIN role, it throws an error as shown below since no grants
on this table have been provided to SYSADMIN.
USE ROLE SYSADMIN;
SELECT * FROM organization;

Let us create a stored procedure with Caller’s rights using ACCOUNTADMIN role to delete data
from Organization table.
USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE PROCEDURE sp_demo_callers_rights()


RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$
BEGIN
DELETE FROM ORGANIZATION WHERE ID = '101';
RETURN 'Data cleaned up from table.';
END;
$$
;
The output of the caller’s rights stored procedure with ACCOUNTADMIN role is as below.
USE ROLE ACCOUNTADMIN;
CALL sp_demo_callers_rights();

Assign the grants to execute the stored procedure to the SYSADMIN role.
USE ROLE ACCOUNTADMIN;
GRANT USAGE ON PROCEDURE DEMO_DB.PUBLIC.sp_demo_callers_rights() TO ROLE
SYSADMIN;
The output of the caller’s rights stored procedure with SYSADMIN role is as below.
USE ROLE SYSADMIN;
CALL sp_demo_callers_rights();

Since the SYSADMIN role does not have any privileges on the Organization table, the execution of the
procedure with caller's rights also fails.
The owner of the stored procedure can change the procedure from an owner’s rights stored procedure
to a caller’s rights stored procedure (or vice-versa) by executing an ALTER
PROCEDURE command as shown below.
ALTER PROCEDURE sp_demo_callers_rights() EXECUTE AS OWNER;
The output of the owner’s rights stored procedure with SYSADMIN role is as below.
USE ROLE SYSADMIN;
CALL sp_demo_callers_rights();

Though the SYSADMIN role does not have privileges on the Organization table, the execution of the
procedure, which deletes data from the Organization table, succeeds because the procedure
executes with owner's rights.
Check out other articles related to Snowflake Stored Procedures
 Snowflake Stored Procedures
 Chapter-1: Create Procedure
 Chapter-2: Variables
 Chapter-3: EXECUTE IMMEDIATE
 Chapter-4: IF-ELSE, CASE Branching Constructs
 Chapter-5: Looping in Stored Procedures
 Chapter-6: Cursors
 Chapter-7: RESULTSET
 Chapter-8: Exceptions
 Chapter-9: Caller’s and Owner’s Rights in Snowflake Stored Procedures
1. What are Stored Procedures?
Stored procedures allow you to write procedural code that executes business logic by combining
multiple SQL statements. In a stored procedure, you can use programmatic constructs to perform
branching and looping.
A stored procedure is created with a CREATE PROCEDURE command and is executed with
a CALL command.
Snowflake supports writing stored procedures in multiple languages. In this article we will discuss on
writing stored procedures using Snowflake SQL Scripting.
2. Stored Procedure Syntax in Snowflake
The following is the basic syntax for creating Stored Procedures in Snowflake.
CREATE OR REPLACE PROCEDURE <name> ( [ <arg_name> <arg_data_type> ] [ , ... ] )
RETURNS <result_data_type>
LANGUAGE SQL
AS
$$
<procedure_body>
$$
;
Note that you must use string literal delimiters (‘ or $$) around the procedure definition (body) if you are
creating a Snowflake Scripting procedure in the Classic Web Interface or SnowSQL. The string literal
delimiters (‘ or $$) are not mandatory when writing procedures in Snowsight.
Let us understand the various parameters in the stored procedure construct.
2.1. NAME <name>
Specifies the name of the stored procedure.
The name must start with an alphabetic character and cannot contain spaces or special characters
unless the entire identifier string is enclosed in double quotes (e.g. “My Procedure”). Identifiers
enclosed in double quotes are also case-sensitive.
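For instance (a minimal sketch with a hypothetical name), a procedure created with a double-quoted identifier must later be called with exactly the same quoting and casing:
CREATE OR REPLACE PROCEDURE "My Procedure"()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
RETURN 'Called the double-quoted procedure';
END;
$$
;

-- The quoted name is case-sensitive, so the call must match it exactly
CALL "My Procedure"();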
2.2. INPUT PARAMETERS ( [ <arg_name> <arg_data_type> ] [ , … ] )
A stored procedure can be built to take one or more arguments as input parameters, or even
without any input parameters.
 The <arg_name> specifies the name of the input argument.
 The <arg_data_type> specifies the SQL data type of the input argument.
-- Stored Procedure with multiple input arguments
CREATE OR REPLACE PROCEDURE my_proc( id NUMBER, name VARCHAR)

-- Stored Procedure with single input argument


CREATE OR REPLACE PROCEDURE my_proc( id NUMBER)

-- Stored Procedure with no input arguments


CREATE OR REPLACE PROCEDURE my_proc()
2.3. RETURNS <result_data_type>
Specifies the type of the result returned by the stored procedure.
CREATE OR REPLACE PROCEDURE my_proc()
RETURNS VARCHAR
2.4. LANGUAGE SQL
Since Snowflake supports stored procedures in multiple languages, the LANGUAGE parameter
specifies the language of the stored procedure definition. For Snowflake scripting, the value to the
LANGUAGE parameter is passed as SQL.
CREATE OR REPLACE PROCEDURE my_proc()
RETURNS VARCHAR
LANGUAGE SQL
2.5. PROCEDURE BODY
The body defines the code executed by the stored procedure. The procedure definition is mentioned
after the AS clause in the stored procedure construct. As mentioned earlier, the body is wrapped
between $$ string literal delimiters if the procedure scripting is not done in Snowsight.
3. Understanding various sections in Stored Procedure Body
The Stored Procedure Body is made up of multiple sections. The various sections in the stored
procedure body are as follows.
DECLARE
... (variable declarations, cursor declarations, etc.) ...
BEGIN
... (Snowflake Scripting and SQL statements) ...
EXCEPTION
... (statements for handling exceptions) ...
END;
3.1. DECLARE
The DECLARE section is used to define any variables, cursors, etc. used in the body. Alternatively,
they can also be declared in the BEGIN…END section of the body.
3.2. BEGIN…END
The SQL statements and scripting constructs are written between the BEGIN and END sections of the
body.
3.3. EXCEPTION
The EXCEPTION section of the body is used to hold any exception handling code you wanted to add.
Note that DECLARE and EXCEPTION sections are not mandatory in every procedure definition.
A simple stored procedure body just requires BEGIN and END sections.
BEGIN
CREATE TABLE employees(id NUMBER, firstname VARCHAR);
END;
4. Creating a Stored Procedure in Snowflake
Consider a use case where the requirement is to purge the inactive employees’ data from a database
table. Let us build a Stored Procedure which performs this activity.
The below Stored Procedure deletes all records with status field value as ‘INACTIVE’ from the
employees table.
CREATE OR REPLACE PROCEDURE purge_data()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
message VARCHAR;
BEGIN
DELETE FROM employees WHERE status = 'INACTIVE';
message := 'Inactive employees data deleted successfully';
RETURN message;
END;
$$
;
Let us break down each block of the stored procedure below to understand better.
 The name of the stored procedure is purge_data and it does not take any input
parameters.
 The data type of the return value from the stored procedure is defined as VARCHAR.
 The language is defined as SQL, the language in which the procedure body is defined.
 A variable named message of type varchar is defined under DECLARE section of
the body.
 Between BEGIN…END section of the procedure body,
 The statement to delete the records with INACTIVE status is defined.
 The variable message is assigned a string value. The assignment operator
:= is used to assign the value to the variable.
 The variable message is returned as the output from the stored procedure.
5. Creating a Stored Procedure with Input Parameters
Consider another scenario where you wanted to purge the data from a table based on an input you
passed. Let us understand with an example.
The below Stored Procedure deletes all records with status value that matches the value passed as an
input through an input parameter in_status from the employees table.
CREATE OR REPLACE PROCEDURE purge_data_by_status(in_status VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
message VARCHAR;
BEGIN
DELETE FROM employees WHERE status = :in_status;
message := in_status ||' employees data deleted successfully';
RETURN message;
END;
$$
;
The above stored procedure is similar to the one defined under Section-4 of the article, except:
 The name of the procedure is purge_data_by_status and accepts an input through
parameter named in_status of type varchar.
 The input parameter is used in the SQL statement which deletes the data from
employees table. Prefix the input parameter with colon (:in_status) to use in a SQL
statement.
 The same input parameter is also used in the string value assigned to message
variable indicating records with which status are deleted.
The above stored procedure can be simplified by eliminating the DECLARE section as shown below.
CREATE OR REPLACE PROCEDURE purge_data_by_status(in_status VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
DELETE FROM employees WHERE status = :in_status;
RETURN in_status ||' employees data deleted successfully';
END;
$$
;
6. Calling a Stored Procedure in Snowflake
Use CALL command to execute a stored procedure in Snowflake.
The following is the syntax to CALL command
CALL <procedure_name> ( [ <arg1> , ... ] )
The below image shows calling a stored procedure named purge_data and the output of the stored
procedure call.
CALL purge_data();

Call Stored Procedure without any Input Parameters
The below image shows calling a stored procedure named purge_data_by_status with a string input
parameter ‘INACTIVE’ and the output of the stored procedure call.
CALL purge_data_by_status('INACTIVE');

Call Stored Procedure with Input Parameters
7. Closing Points
As the name of the article suggests, this is just an introduction to building stored procedures in
Snowflake. We have barely scratched the surface in understanding the complete concepts of stored
procedures. Nevertheless, I hope it is a good starting point to begin learning Snowflake stored
procedures.
Watch out for this space for more articles covering in-depth concepts of stored procedures in
Snowflake.
1. What are Variables in Snowflake Stored Procedure?
A Variable is a named object which holds a value of a specific data type, whose value can change
during the stored procedure execution. Variables in Snowflake stored procedures are local to the stored
procedure and are used to hold intermediate results.
In this article, let us discuss in detail declaring and using variables in Snowflake Stored
Procedures. To learn more about creating a Stored Procedure, refer to our article – Introduction to
Snowflake Stored Procedures
2. Declaring Variables in Snowflake Stored Procedures
A Variable must be declared before using it in Stored Procedures. When a variable is declared, the
type of the variable must be specified by either:
 Explicitly specifying the data type. The data type of variable can be
 SQL data type
 CURSOR
 RESULTSET
 EXCEPTION
 Specifying an initial value for the variable using DEFAULT command. Snowflake
Scripting uses the DEFAULT value to determine the type of the variable.
A Variable in Snowflake can be declared either in DECLARE section or BEGIN…END section of
the stored procedure body or both.
The below example shows variable declaration in DECLARE section of the stored procedure body.
-- Variable declaration in DECLARE section of body
<variable_name> <type>;

<variable_name> DEFAULT <expression> ;

<variable_name> <type> DEFAULT <expression> ;

-- Examples
net_sales NUMBER(38,2);

net_sales DEFAULT 98.67;

net_sales NUMBER(38,2) DEFAULT 98.67;


The below example shows variable declaration in BEGIN…END section of the stored procedure
body.
-- Variable declaration in BEGIN...END section of body
LET <variable_name> { DEFAULT | := } <expression> ;

LET <variable_name> <type> { DEFAULT | := } <expression> ;

-- Examples
LET net_sales := 98.67;

LET net_sales DEFAULT 98.67;

LET net_sales NUMBER(38,2) := 98.67;

LET net_sales NUMBER(38,2) DEFAULT 98.67;


Note that the variable should be preceded by the LET command when declaring variables in the BEGIN…
END section of the body.
3. Assigning values to Declared Variables in Snowflake Stored Procedures
To assign a value to a variable that has already been declared, use the := operator:
<variable_name> := <expression> ;
You can use other declared variables in the expression to assign the resulting value to the variable.
The below example shows
 A variable named gross_sales declared under DECLARE section of the body with
initial default value as 0.0 using DEFAULT command
 Two variables declared in the BEGIN…END section of the body
using LET command.
 The variable net_sales is assigned a value as 98.67 using := operator.
 The variable tax is declared with an initial value
of 1.33 using DEFAULT command.
 The variable gross_sales is assigned a resulting value of the summation expression of
variables net_sales and tax.
 Finally the variable gross_sales is returned as an output using RETURN command.
DECLARE
gross_sales NUMBER(38, 2) DEFAULT 0.0;
BEGIN
LET net_sales NUMBER(38, 2) := 98.67;
LET tax NUMBER(38, 2) DEFAULT 1.33;

gross_sales := net_sales + tax;

RETURN gross_sales;
END;
4. Using a Variable in a SQL Statement (Binding)
The variables declared in the stored procedure can be used in SQL statements by using a colon as a
prefix to the variable name. For example:
DELETE FROM EMPLOYEES WHERE ID = :in_employeeid;
If you are using the variable as the name of an object, use the IDENTIFIER keyword to indicate that
the variable represents an object identifier. For example:
DELETE FROM IDENTIFIER(:in_tablename) WHERE ID = :in_employeeid;
If you are building a SQL statement as a string to execute, the variable does not need the colon prefix.
For example:
LET sql_stmt := 'DELETE FROM EMPLOYEES WHERE ID = ' || in_employeeid;
Note that if you are using the variable with RETURN, you do not need the colon prefix. For example:
RETURN my_variable;
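Putting these binding rules together, the following is a minimal sketch (the table name and id passed in are hypothetical inputs) of a procedure that binds a value with a colon prefix, wraps an object name in IDENTIFIER(), and returns a variable without any prefix:
CREATE OR REPLACE PROCEDURE purge_record(in_tablename VARCHAR, in_employeeid NUMBER)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
-- The colon prefix binds the value; IDENTIFIER() binds the object name
DELETE FROM IDENTIFIER(:in_tablename) WHERE ID = :in_employeeid;
-- No colon prefix is required with RETURN
RETURN 'Deleted ID ' || in_employeeid || ' from ' || in_tablename;
END;
$$
;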
5. Assigning result of a SQL statement to Variables using INTO clause in Snowflake Stored
Procedures
You can assign expression result of a SELECT statement to Variables in Snowflake Stored
Procedures using INTO clause.
The syntax to assign result of a SQL statement to variables is as below.
SELECT <expression1>, <expression2>, ... INTO :<variable1>, :<variable2>, ... FROM ...
WHERE ...;
In the syntax:
 The value of <expression1> is assigned to <variable1>.
 The value of <expression2> is assigned to <variable2>.
Note that the SELECT statement used to assign values to variables must return only single output
row.
Consider below data as an example to understand how it works.
CREATE OR REPLACE TABLE employees (id INTEGER, firstname VARCHAR);

INSERT INTO employees (id, firstname) VALUES


(101, 'TONY'),
(102, 'STEVE');
The below stored procedure assigns the id and firstname of the employee with id 101 into
variables id_variable and name_variable respectively.
CREATE OR REPLACE PROCEDURE get_employeedata()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
id_variable INTEGER;
name_variable VARCHAR;
BEGIN
SELECT id, firstname INTO :id_variable, :name_variable FROM employees WHERE id = 101;
RETURN id_variable || ' ' || name_variable;
END;
$$
;
When the stored procedure is executed, the output returns the concatenated values of id and name of
the employee with id 101.
CALL get_employeedata();
GET_EMPLOYEEDATA

101 TONY
6. Variable Scope in Snowflake Stored Procedures
If you have nested blocks in your stored procedures and multiple variables with the same name are
declared in them, the scope of each variable is local to the block in which it is declared.
For example, suppose you have an outer block and an inner block that each declare a
variable my_variable, assigned the value 5 in the outer block and 7 in the inner block. As long as the
variable is referenced inside the inner block, the value is 7; for all operations outside the inner block, the
value of the variable is 5.
When a variable name is referenced, Snowflake looks for the variable by starting first in the current
block, and then working outward one block at a time until a matching name is found.
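The following is a minimal sketch of this behaviour (assumed to be run as an anonymous block in Snowsight), with my_variable declared in both an outer and an inner block:
DECLARE
my_variable INTEGER DEFAULT 5;        -- outer block variable
result VARCHAR;
BEGIN
DECLARE
my_variable INTEGER DEFAULT 7;    -- inner block variable shadows the outer one
BEGIN
result := 'inner: ' || my_variable;               -- uses 7
END;
result := result || ', outer: ' || my_variable;       -- uses 5 again
RETURN result;                                        -- returns 'inner: 7, outer: 5'
END;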

EXECUTE IMMEDIATE in Snowflake Stored Procedures


February 22, 2023
1. EXECUTE IMMEDIATE in Snowflake
The EXECUTE IMMEDIATE command in Snowflake executes SQL statements present in the form of a string
literal (a character sequence enclosed in single quotes or double dollar signs). EXECUTE IMMEDIATE
returns the result of the executed SQL statement.
The string literal that EXECUTE IMMEDIATE executes can be any of the following:
 A single SQL statement
 A Stored Procedure call
 An Anonymous Stored Procedure block
2. Executing SQL statements using EXECUTE IMMEDIATE
The syntax to execute SQL statements using EXECUTE IMMEDIATE is as follows.
EXECUTE IMMEDIATE '<sql_query>';

EXECUTE IMMEDIATE $$ <sql_query> $$;


The below example executes the SQL statement defined in a string literal using EXECUTE
IMMEDIATE command.
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM employees';
3. Executing Stored Procedures using EXECUTE IMMEDIATE
The syntax to execute Stored Procedures using EXECUTE IMMEDIATE is as follows.
EXECUTE IMMEDIATE 'CALL <stored_procedure_name>';

EXECUTE IMMEDIATE $$ CALL <stored_procedure_name> $$;


The below example executes a stored procedure named my_procedure() using EXECUTE
IMMEDIATE command.
EXECUTE IMMEDIATE
$$
CALL my_procedure();
$$;
4. Executing procedure blocks using EXECUTE IMMEDIATE
If you are running an anonymous block in Snowsight, you can execute the block directly without the
need of EXECUTE IMMEDIATE command.
The following is an example of an anonymous block that you can run in Snowsight.
DECLARE
net_sales NUMBER(38, 2);
tax NUMBER(38, 2);
gross_sales NUMBER(38, 2) DEFAULT 0.0;
BEGIN
net_sales := 98.67;
tax := 1.33;
gross_sales := net_sales + tax;
RETURN gross_sales;
END;
The output of the above anonymous block would be as follows:
anonymous block

100
But if you are using SnowSQL or the classic web interface, you must specify the block as a string
literal (enclosed in single quotes or double dollar signs), and you must pass the block to the
EXECUTE IMMEDIATE command as shown below.
EXECUTE IMMEDIATE
$$
DECLARE
net_sales NUMBER(38, 2);
tax NUMBER(38, 2);
gross_sales NUMBER(38, 2) DEFAULT 0.0;
BEGIN
net_sales := 98.67;
tax := 1.33;
gross_sales := net_sales + tax;
RETURN gross_sales;
END;
$$
;
5. Executing SQL statements in Stored Procedures using EXECUTE IMMEDIATE
The below stored procedure is an example which executes the SQL statements using EXECUTE
IMMEDIATE.
 The first EXECUTE IMMEDIATE command executes the CREATE statement
declared in the variable create_stmt.
 The second EXECUTE IMMEDIATE command executes the DELETE statement
declared in variable delete_stmt in concatenation with a filter condition passed as a
string.
This also demonstrates that EXECUTE IMMEDIATE works not only with a string literal, but also
with an expression that evaluates to a string (VARCHAR).
CREATE OR REPLACE PROCEDURE sp_execute_immediate_demo()
RETURNS NUMBER
LANGUAGE SQL
AS
$$
DECLARE
create_stmt VARCHAR DEFAULT 'CREATE OR REPLACE TABLE temp_emp AS SELECT *
FROM employees';
delete_stmt VARCHAR DEFAULT 'DELETE FROM temp_emp';
result NUMBER DEFAULT 0;
BEGIN
EXECUTE IMMEDIATE create_stmt;
EXECUTE IMMEDIATE delete_stmt || ' WHERE status= ''INACTIVE'' ';
result := (SELECT COUNT(*) FROM temp_emp);
RETURN result;
END;
$$
;
The output of the procedure gives the record count in the table temp_emp after removing the
INACTIVE records.
6. EXECUTE IMMEDIATE with USING clause in Snowflake
The EXECUTE IMMEDIATE command is used in conjunction with USING clause to pass bind
variables to the SQL query passed as a string literal to it. The bind variables are passed as a list
separated by comma and enclosed in brackets.
The syntax to use EXECUTE IMMEDIATE with USING clause in Snowflake is as follows.
EXECUTE IMMEDIATE '<sql_query>' USING (bind_variable1, bind_variable2,…);
A bind variable holds a value to be used in SQL query executed by EXECUTE IMMEDIATE
command.
The below stored procedure is an example in which the values for the filter condition of a SQL query
executed by the EXECUTE IMMEDIATE command are passed through bind variables defined in the USING
clause.
CREATE OR REPLACE PROCEDURE purge_data_by_date()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
sql_stmt VARCHAR DEFAULT 'DELETE FROM employees WHERE hire_date BETWEEN :1
AND :2';
min_date DATE DEFAULT '2015-01-01';
max_date DATE DEFAULT '2017-12-31';
result VARCHAR;
BEGIN
EXECUTE IMMEDIATE sql_stmt USING (min_date, max_date);
result := 'Data deleted between '|| min_date || ' and '|| max_date;
RETURN result;
END;
$$
;
7. EXECUTE IMMEDIATE with INTO clause in Snowflake
In Oracle, the INTO clause can be used in conjunction with EXECUTE IMMEDIATE to specify a list of user-
defined variables which hold the values returned by a SELECT statement. However, EXECUTE
IMMEDIATE with an INTO clause is currently not supported in Snowflake.
Instead, we can still assign values to user-defined variables from the result of a SQL statement
by using the INTO clause directly in the SELECT statement, as shown below.
SELECT id, firstname INTO :id_variable, :name_variable FROM employees WHERE id = 101;
For more details refer our previous article.
IF-ELSE, CASE Statements in Snowflake Stored Procedures
February 26, 2023
Introduction
Snowflake Stored Procedures support the following branching constructs in the stored procedure
definition:
 IF ELSE
 CASE
IF Statement
IF statement in Snowflake provides a way to execute a set of statements if a condition is met.
The following is the syntax to the IF statement in Snowflake.
IF ( <condition> ) THEN
<statement>;
ELSEIF ( <condition> ) THEN
<statement>;
ELSE
<statement>;
END IF;
In an IF statement:
 The ELSEIF and ELSE clauses are optional.
 If an additional condition needs to be evaluated, add statements
under ELSEIF clause.
 Multiple conditions can be evaluated using multiple ELSEIF clauses.
 If none of the provided conditions are true, specify statements to execute
in ELSE clause.
The following is an example of Snowflake stored procedure calculating the maximum among the
three numbers using IF statement.
CREATE OR REPLACE PROCEDURE sp_demo_if(p NUMBER, q NUMBER, r NUMBER)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
var_string VARCHAR DEFAULT 'Maximum Number: ';
BEGIN
IF ( p>=q AND p>=r ) THEN
RETURN var_string || p ;
ELSEIF ( q>=p AND q>=r ) THEN
RETURN var_string || q ;
ELSE
RETURN var_string || r ;
END IF;
END;
$$
;
The output of the procedure is as follows
CALL sp_demo_if(11,425,35);
SP_DEMO_IF

Maximum Number: 425


CASE Statement
The CASE statement in Snowflake lets you define different conditions using the WHEN clause and
executes the branch of the first condition that is met. This is also referred to as a Searched CASE statement.
The following is the syntax to the CASE statement in Snowflake.
CASE
WHEN <condition 1> THEN
<statement>;
WHEN <condition 2> THEN
<statement>;
ELSE
<statement>;
END;
In a CASE statement:
 The conditions specified in the WHEN clause are executed in the order they are
defined.
 Whenever a condition is met, the statement configured in the THEN clause is
executed.
 If none of the conditions configured in WHEN clauses are met, the statement
specified under ELSE clause is executed.
The following is an example of Snowflake stored procedure calculating the maximum among the
three numbers using CASE statement.
CREATE OR REPLACE PROCEDURE sp_demo_case(p NUMBER, q NUMBER, r NUMBER)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
var_string VARCHAR DEFAULT 'Maximum Number: ';
BEGIN
CASE
WHEN ( p>=q AND p>=r ) THEN
RETURN var_string || p ;
WHEN ( q>=p AND q>=r ) THEN
RETURN var_string || q ;
ELSE
RETURN var_string || r ;
END;
END;
$$
;
The output of the procedure is as follows
CALL sp_demo_case(11,425,35);
SP_DEMO_CASE

Maximum Number: 425


Simple CASE Statement
A Simple CASE Statement allows you to define a single condition and all the possible output
values of defined condition under different branches using WHEN clause.
The following is the syntax to the Simple CASE statement in Snowflake.
CASE <condition>
WHEN <value 1> THEN
<statement>;
WHEN <value 2> THEN
<statement>;
ELSE
<statement>;
END;
In a Simple CASE statement:
 The condition expression is defined only once in CASE statement.
 All the possible values of the expression are defined in different WHEN clauses.
 Whenever a value defined in a WHEN clause is a match, then the statement
configured in the THEN clause is executed.
 If none of the values configured in WHEN clauses are a match, the statements
specified under ELSE clause are executed.
The following is an example of a Snowflake Stored Procedure fetching the currency of a country
using a Searched CASE statement.
Note that the same condition expression is repeated in every WHEN clause of the CASE statement in
the example below.
CREATE OR REPLACE PROCEDURE sp_demo_searched_case(v_country VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
---------------------Example of Searched Case Statement---------------------
DECLARE
v_output VARCHAR DEFAULT 'The currency of country '|| UPPER(v_country) || ' is ';
BEGIN
CASE
WHEN UPPER(v_country) = 'INDIA' THEN
RETURN v_output || 'RUPEE';
WHEN UPPER(v_country) = 'USA' THEN
RETURN v_output || 'DOLLAR';
WHEN UPPER(v_country) = 'UK' THEN
RETURN v_output || 'POUND';
ELSE
RETURN 'The country '|| v_country || ' is not defined in procedure';
END;
END;
$$
;
The output of the procedure with Searched CASE statement is as follows.
CALL sp_demo_searched_case('India');
SP_DEMO_SEARCHED_CASE

The currency of country INDIA is RUPEE


The following is the same procedure built using the Simple CASE statement.
CREATE OR REPLACE PROCEDURE sp_demo_simple_case(v_country VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
---------------------Example of Simple Case Statement---------------------
DECLARE
v_output VARCHAR DEFAULT 'The currency of country '|| UPPER(v_country) || ' is ';
BEGIN
CASE( UPPER(v_country) )
WHEN 'INDIA' THEN
RETURN v_output || 'RUPEE';
WHEN 'USA' THEN
RETURN v_output || 'DOLLAR';
WHEN 'UK' THEN
RETURN v_output || 'POUND';
ELSE
RETURN 'The country '|| v_country || ' is not defined in procedure';
END;
END;
$$
;
The output of the procedure with Simple CASE statement is as follows.
CALL sp_demo_simple_case('USA');
SP_DEMO_SIMPLE_CASE

The currency of country USA is DOLLAR


Looping in Snowflake Stored Procedures
February 28, 2023

The following are the different type of Loops supported in Snowflake Stored Procedures.
 FOR
 WHILE
 REPEAT
 LOOP
In this article, let us discuss the different loops in Snowflake Stored Procedures with examples.
FOR Loop in Snowflake Stored Procedures
A FOR loop enables a particular set of steps to be executed for a specified number of times until
a condition is satisfied.
The following is the syntax of FOR Loop in Snowflake Stored Procedures.
FOR <counter_variable> IN [ REVERSE ] <start> TO <end> { DO | LOOP }
<statement>;
END { FOR | LOOP } [ <label> ] ;
The keyword DO should be paired with END FOR and the keyword LOOP should be paired
with END LOOP. For example:
FOR...DO
...
END FOR;

FOR...LOOP
...
END LOOP;
In FOR Loop:
 The <counter_variable> loops from the value defined for <start> to the value
defined for <end> in the syntax.
 Note that if a variable with the same name as <counter_variable> is declared outside
the loop, the outer variable and the loop variable are independent.
 Use the REVERSE keyword to loop the values from <end> down to <start>.
 If there are multiple loops defined in the procedure, use the <label> to identify each loop
individually. Labels also allow you to exit or skip iterations of a specific loop using the BREAK and
CONTINUE statements (a labelled-loop sketch follows this list).
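The following is a minimal sketch of a labelled loop (assumed to be run as an anonymous block), where BREAK with the label of the outer loop exits both loops as soon as the running total exceeds 5:
BEGIN
LET total := 0;
FOR i IN 1 TO 3 DO
FOR j IN 1 TO 3 DO
total := total + j;
IF (total > 5) THEN
BREAK outer_loop;   -- exits the loop labelled outer_loop, not just the inner loop
END IF;
END FOR;
END FOR outer_loop;
RETURN total;               -- returns 6
END;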
The following is an example of Snowflake Stored Procedure which calculates the sum of first n
numbers using FOR Loop.
CREATE OR REPLACE PROCEDURE sp_demo_for_loop(n NUMBER)
RETURNS NUMBER
LANGUAGE SQL
AS
$$
DECLARE
total_sum INTEGER DEFAULT 0;
BEGIN
FOR i IN 1 TO n DO
total_sum := total_sum + i ;
END FOR;
RETURN total_sum;
END;
$$
;
The output of the procedure with FOR Loop is as follows.
CALL sp_demo_for_loop(5);
SP_DEMO_FOR_LOOP

15
The following is an example of a Snowflake Stored Procedure which prints the numbers backwards
using the REVERSE keyword in a FOR Loop.
CREATE OR REPLACE PROCEDURE sp_demo_reverse_for_loop(n NUMBER)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
reverse_series VARCHAR DEFAULT '';
BEGIN
FOR i IN REVERSE 1 TO n LOOP
reverse_series := reverse_series ||' '|| i::VARCHAR ;
END LOOP;
RETURN reverse_series;
END;
$$
;
The output of the procedure with REVERSE keyword in FOR Loop is as follows.
CALL sp_demo_reverse_for_loop(5);
SP_DEMO_REVERSE_FOR_LOOP

5 4 3 2 1
WHILE Loop in Snowflake Stored Procedures
A WHILE loop iterates while a specified condition is true. The condition for the loop is tested
immediately before executing the body of the loop. If the condition is false at the start, the
loop is not executed even once.
The following is the syntax of WHILE Loop in Snowflake Stored Procedures.
WHILE ( <condition> ) { DO | LOOP }
<statement>;
END { WHILE | LOOP } [ <label> ] ;
In a WHILE Loop:
 The <condition> is an expression that evaluates to a BOOLEAN.
 The keyword DO should be paired with END WHILE and the
keyword LOOP should be paired with END LOOP.
 If there are multiple loops defined in the procedure, use the <label> to identify loops
individually. This also helps to jump loops using BREAK and CONTINUE
statements.
Note that if the <condition> never evaluates to FALSE, and the loop does not contain a BREAK
command (or equivalent), then the loop will run and consume credits indefinitely.
The following is an example of Snowflake Stored Procedure which calculates the sum of first n
numbers using WHILE Loop.
CREATE OR REPLACE PROCEDURE sp_demo_while_loop(n NUMBER)
RETURNS NUMBER
LANGUAGE SQL
AS
$$
DECLARE
total_sum INTEGER DEFAULT 0;
BEGIN
LET counter := 1;
WHILE (counter <= n) DO
total_sum := total_sum + counter;
counter := counter + 1;
END WHILE;
RETURN total_sum;
END;
$$
;
The output of the procedure with WHILE Loop is as follows.
CALL sp_demo_while_loop(6);
SP_DEMO_WHILE_LOOP

21
REPEAT Loop in Snowflake Stored Procedures
A REPEAT loop iterates until a specified condition is true. This is similar to a DO WHILE loop in
other programming languages, which tests the condition at the end of the loop. This means that
the body of a REPEAT loop always executes at least once.
The following is the syntax of REPEAT Loop in Snowflake Stored Procedures.
REPEAT
<statement>;
UNTIL ( <condition> )
END REPEAT [ <label> ] ;
In a REPEAT Loop:
 The <condition> is an expression that evaluates to a BOOLEAN.
 The <condition> is evaluated at the end of the loop and is defined
using UNTIL keyword.
The following is an example of Snowflake Stored Procedure which calculates the sum of first n
numbers using REPEAT Loop.
CREATE OR REPLACE PROCEDURE sp_demo_repeat(n NUMBER)
RETURNS NUMBER
LANGUAGE SQL
AS
$$
DECLARE
total_sum INTEGER DEFAULT 0;
BEGIN
LET counter := 1;
REPEAT
total_sum := total_sum + counter;
counter := counter + 1;
UNTIL(counter > n)
END REPEAT;
RETURN total_sum;
END;
$$
;
The output of the procedure with REPEAT Loop is as follows.
CALL sp_demo_repeat(7);
SP_DEMO_REPEAT

28
LOOP Loop in Snowflake Stored Procedures
A LOOP loop executes until a BREAK command is executed. It does not specify a number of
iterations or a terminating condition.
The following is the syntax of LOOP Loop in Snowflake Stored Procedures.
LOOP
<statement>;
END LOOP [ <label> ] ;
In a LOOP Loop:
 The user must explicitly exit the loop by using BREAK command in the loop.
 The BREAK command is normally embedded inside branching logic
(e.g. IF Statements or CASE Statements).
 The BREAK command immediately stops the current iteration, and skips any
remaining iterations.
The following is an example of Snowflake Stored Procedure which calculates the sum of first n
numbers using LOOP Loop.
CREATE OR REPLACE PROCEDURE sp_demo_loop(n NUMBER)
RETURNS NUMBER
LANGUAGE SQL
AS
$$
DECLARE
total_sum INTEGER DEFAULT 0;
BEGIN
LET counter := 1;
LOOP
IF(counter > n) THEN
BREAK;
END IF;
total_sum := total_sum + counter;
counter := counter + 1;
END LOOP;
RETURN total_sum;
END;
$$
;
The output of the procedure with the LOOP loop is as follows.
CALL sp_demo_loop(8);
SP_DEMO_LOOP

36
1. Cursors in Snowflake Stored Procedures
A Cursor is a named object in a stored procedure which allows you to loop through a set of
rows of a query result set, one row at a time. It allows you to perform the same set of defined actions for
each row individually while looping through the result of a SQL query.
Working with Cursors in Snowflake Stored Procedures includes the following steps:
1. Declaring a cursor either in DECLARE or BEGIN…END section of the stored
procedure.
2. Opening a cursor using OPEN command.
3. Fetching rows from cursors using FETCH command.
4. Closing a cursor using CLOSE command.
2. Syntax of Cursors in Snowflake Stored Procedures
2.1. Declaring a Cursor
A Cursor must be declared before using it. Declaring a Cursor defines the cursor with a name
and the associated SELECT statement.
The syntax for declaring a CURSOR in DECLARE section of the procedure is as follows.
DECLARE
<cursor_name> CURSOR FOR <select_statement>;

-- Example:
DECLARE
my_cursor CURSOR FOR SELECT id, firstname FROM employees;
The syntax for declaring a CURSOR in BEGIN…END section of the procedure is as follows.
BEGIN

LET <cursor_name> CURSOR FOR <select_statement>;

END;

-- Example:
BEGIN

LET my_cursor CURSOR FOR SELECT id, firstname FROM employees;

END;
2.2. Opening a Cursor
The cursor must be explicitly opened before fetching rows from it, using the OPEN command. The
query associated with the cursor is not executed until the cursor is opened.
The syntax to OPEN a CURSOR in stored procedure is as follows.
OPEN <cursor_name>;

-- Example:
BEGIN
OPEN my_cursor;

END;
2.3. Fetching data from Cursor
The FETCH command retrieves rows one by one from the result set of the query associated with the
cursor. Each FETCH command that you execute fetches a single row and advances the
internal counter to the next row.
As a result, the FETCH command must be executed multiple times, using looping commands in stored
procedures, until the last row is fetched. If a FETCH command is executed after all rows are fetched,
it retrieves NULL values.
The syntax to FETCH data from CURSOR in stored procedures is as follows.
FETCH <cursor_name> INTO <variable_1>,<variable_2>,…;

-- Example:
BEGIN

FETCH my_cursor INTO my_variable_1, my_variable_2;

END;
2.4. Closing a Cursor
The cursor must be closed once all rows are fetched using the CLOSE command.
The syntax to CLOSE a CURSOR in stored procedures is as follows.
CLOSE <cursor_name>;

-- Example:
BEGIN

CLOSE my_cursor;
END;
3. Setting up a query for Cursor demonstration
The following SELECT query fetches all the tables present in PUBLIC schema
of DEMO_DB database. The query uses INFORMATION_SCHEMA which is a data dictionary
schema available under each database.
SELECT table_name, table_type
FROM demo_db.information_schema.tables
WHERE table_schema = 'PUBLIC' AND table_type= 'BASE TABLE'
ORDER BY table_name;

Output of query providing list of tables present in PUBLIC schema


Let us use this query as an example and fetch the details of all tables using CURSOR.
4. Cursors using OPEN, FETCH and CLOSE
The following stored procedure fetches details of all the tables present in the PUBLIC schema and
lists them as output using Cursors.
CREATE OR REPLACE PROCEDURE sp_demo_cursor()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
table_cursor CURSOR FOR
SELECT table_name FROM demo_db.information_schema.tables
WHERE table_schema = 'PUBLIC' AND table_type= 'BASE TABLE' ORDER BY table_name;
res VARCHAR DEFAULT '';
var_table_name VARCHAR;
BEGIN
OPEN table_cursor;
LOOP
FETCH table_cursor into var_table_name;
IF(var_table_name <> '') THEN
res := var_table_name ||' '||res;
ELSE
BREAK;
END IF;
END LOOP;
CLOSE table_cursor;
RETURN res;
END;
$$;
In the above example:
 In the line 7, the cursor is defined in the DECLARE section of the stored procedure.
The cursor is named as table_cursor and the query discussed in section-3 is associated
with cursor.
 In the line 13, the cursor is opened using OPEN command.
 From line 14-21, LOOP command is used to loop until all records are fetched from
the cursor.
 In the line 15, the FETCH command fetches the table_name from the result set of
cursor into variable var_table_name.
 From line 16-19, IF-ELSE clause is used to concatenate the table name values read
from the variable var_table_name as long as the variable value is not null. Once all
the values in the result set are fetched, the logic to explicitly exit the loop using the
BREAK command is embedded in the ELSE clause.
 In the line 22, the cursor table_cursor is closed.
 In the line 23, the final concatenated value of all table names is returned as output.
The output of the stored procedure sp_demo_cursor is as follows.
CALL sp_demo_cursor();
SP_DEMO_CURSOR

LOCATIONS EMPLOYEES DEPARTMENTS

5. Cursors using FOR Loop command


A FOR loop can also be used to iterate over a result set of a Cursor instead of FETCH
command. The number of iterations of the FOR loop is determined by the number of rows in
the result set of cursor.
Note that when using a FOR loop to iterate the cursor, the cursor need not be opened explicitly using
OPEN command.
The syntax of Cursor-based FOR loops is as follows.
FOR <row_variable> IN <cursor_name> DO
<statement>;
END FOR [ <label> ] ;
The <row_variable> holds data of all the columns of the row it is iterating. The individual columns
can be accessed as below
<row_variable>.<column1>, <row_variable>.<column2>, ...
The following stored procedure fetches details of all the tables present in the PUBLIC schema and
lists them as output using Cursor-based FOR loops.
CREATE OR REPLACE PROCEDURE sp_demo_cursor_using_for()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
table_cursor CURSOR FOR
SELECT table_name, table_type FROM demo_db.information_schema.tables
WHERE table_schema = 'PUBLIC' and table_type= 'BASE TABLE' ORDER BY table_name;
res VARCHAR DEFAULT '';
BEGIN
FOR var in table_cursor DO
res := var.table_name||' '||res;
END FOR;
RETURN res;
END;
$$
;
In the above example:
 From line 12-14, the FOR loop is used iterate over the result set of the
cursor table_cursor.
 In the line 12, the row variable var is used to hold the result set values of the query
associated with cursor.
 In the line 13, the table_name field from the result set is accessed using the row
variable as var.table_name.
 In the line 15, Once all rows in the result set are iterated, the FOR loop exits and the
final concatenated value of all table names is returned.
The output of the stored procedure sp_demo_cursor_using_for is as follows.
CALL sp_demo_cursor_using_for();
SP_DEMO_CURSOR_USING_FOR

LOCATIONS EMPLOYEES DEPARTMENTS

6. Cursors using RESULTSET


Instead of assigning the SELECT query directly to the CURSOR, you can assign a variable of
type RESULTSET which holds the query.
The syntax to assign SELECT query to CURSOR using RESULTSET is as follows.
BEGIN
LET <variable_name> RESULTSET := (<select_query>);
LET <cursor_name> CURSOR FOR <variable_name>;

END;
The following stored procedure fetches details of all the tables present in the PUBLIC schema and
lists them as output using RESULTSET to pass query to CURSOR.
CREATE OR REPLACE PROCEDURE sp_demo_cursor_using_resultset()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
res VARCHAR DEFAULT '';
BEGIN
LET res_set RESULTSET := ( SELECT table_name, table_type FROM
demo_db.information_schema.tables
WHERE table_schema = 'PUBLIC' and table_type= 'BASE TABLE' ORDER BY table_name);
LET table_cursor CURSOR FOR res_set;
FOR var in table_cursor DO
res := var.table_name||' '||res;
END FOR;
RETURN res;
END;
$$
;
In the above example:
 In the line 9, a variable of type RESULTSET named res_set is assigned the query
to fetch the list of all table names.
 In the line 11, the cursor is assigned with the res_set variable instead of query.
 The rest of the procedure is the same as the example discussed in the earlier section.
The output of the stored procedure sp_demo_cursor_using_resultset is as follows.
CALL sp_demo_cursor_using_resultset();
SP_DEMO_CURSOR_USING_RESULTSET

LOCATIONS EMPLOYEES DEPARTMENTS

7. Cursors using USING clause to pass Bind variables


In the SELECT statement associated with a cursor, we can include bind variables whose values are passed
through the USING clause of the OPEN command while opening the cursor.
The following stored procedure fetches details of all the tables present in a schema whose value is
passed as a bind variable to the cursor query.
CREATE OR REPLACE PROCEDURE sp_demo_cursor_using_bindvariables(var_schema
VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
table_cursor CURSOR FOR
SELECT table_name, table_type FROM demo_db.information_schema.tables
WHERE table_schema = ? AND table_type= 'BASE TABLE' ORDER BY table_name;
res VARCHAR DEFAULT '';
var_table_name VARCHAR;
BEGIN
OPEN table_cursor using (var_schema);
LOOP
FETCH table_cursor into var_table_name;
IF(var_table_name <> '') THEN
res := var_table_name ||' '||res;
ELSE
BREAK;
END IF;
END LOOP;
CLOSE table_cursor;
RETURN res;
END;
$$
;
In the above example:
 In the line 1, the stored procedure is expected to be called by passing a value to the
input variable var_schema, which holds the schema name.
 In the line 9, you could see that the schema is passed as bind variable (?) in the query
assigned to the cursor.
 In the line 13, the value to the bind variable is passed through USING clause
of OPEN command.
 The rest of the procedure is the same as what we discussed in Section-4.
The stored procedure sp_demo_cursor_using_bindvariables is executed as below.
CALL sp_demo_cursor_using_bindvariables('PUBLIC');
The output of the stored procedure sp_demo_cursor_using_bindvariables is as follows.
SP_DEMO_CURSOR_USING_BINDVARIABLES

LOCATIONS EMPLOYEES DEPARTMENTS

8. Real-time scenario using a Cursor in Stored Procedures


In all the above examples, we have looped through a SELECT query which lists the names of all
tables present in a schema. You might be wondering whether we even need cursors to achieve this output.
Well, definitely not!
These examples are designed to help you understand the concept of cursors. But in real-world use,
every time you extract a row from the result set of a cursor, an action would be associated with
it.
Those actions could be executing DML statements like UPDATE, DELETE, etc., or calling
another stored procedure based on the logic.
So let us end this article with a realistic, useful scenario. In all the above examples we have just
listed the table names. Let us extend the same example by extracting the DDL of each table and
providing it as output.
The following stored procedure takes database and schema names as input and fetches the DDL of all
the tables present inside them as output using Cursors.
CREATE OR REPLACE PROCEDURE sp_get_ddl( var_db_name VARCHAR, var_schema_name
VARCHAR)
RETURNS TABLE(DDL VARCHAR)
LANGUAGE SQL
AS
$$
DECLARE
cursor_sql VARCHAR DEFAULT 'SELECT table_name, table_type FROM '||
var_db_name||'.information_schema.tables
WHERE table_schema = '''||var_schema_name||''' and table_type= ''BASE TABLE'' ORDER BY
table_name';
cursor_resultset RESULTSET DEFAULT (EXECUTE IMMEDIATE :cursor_sql);
table_cursor CURSOR FOR cursor_resultset;
my_sql VARCHAR;
my_union_sql VARCHAR;
res RESULTSET;
counter NUMBER DEFAULT 1;
BEGIN
FOR var in table_cursor DO
my_sql := 'SELECT GET_DDL(''TABLE'','''||var.table_name||''')';
IF(counter=1) THEN
my_union_sql := :my_sql;
ELSE
my_union_sql := :my_union_sql || ' UNION ALL ' || :my_sql;
END IF;
counter := counter + 1;
END FOR;
res := (EXECUTE IMMEDIATE :my_union_sql);
RETURN table(res);
END;
$$
;
In the above example:
 In the line 1, the stored procedure sp_get_ddl accepts two inputs as
variables var_db_name and var_schema_name, the database and the schema name
respectively from which you wanted to extract the DDL of tables.
 In the line 2, the RETURN TYPE of the procedure is of type TABLE since we output
a bunch of rows each with the DDL of a table.
 In the line 7, we created variable cursor_sql of type VARCHAR which holds the SQL
to be assigned to cursor as an expression.
The reason for passing the SELECT statement as a string expression is that the database name
is also passed as a variable in the query; it cannot be passed as a bind variable since it is not used as
a filter value in the query, but is part of the query text itself.
 In the line 9, we are executing the SQL string expression using EXECUTE
IMMEDIATE and assigning it to a RESULTSET cursor_result_set.
 In the line 10, finally the cursor table_cursor is declared and assigned a result
set cursor_result_set which holds the query.
 In the line 16, we are using the FOR command to loop through the result set of the
cursor.
 In the line 17, we are creating another SQL string expression my_sql which provides
the DDL of a table.
Since we cannot provide an output in each loop we are performing in Snowflake Stored Procedures,
we have to concatenate queries using UNION ALL and form a single query at the end of the loop. The
final query which holds the statement to provide DDL of all tables is executed outside the loop and
provided as output.
 In the line 18-19, when the counter is 1, we assign the SQL string
expression my_sql as value to the variable my_union_sql. The counter is also
incremented at the end of iteration.
 In the line 20-21, after the initial iteration all the conditions go to ELSE clause and
the SQL expression is concatenated to previous values using UNION ALL.
 In the line 25, the output of the final query present in the my_union_sql is executed
and assigned to a variable of type RESULTSET res.
 In the line 26, the variable res is returned as an output in the form of TABLE.
The stored procedure sp_get_ddl is executed as below with database name as DEMO_DB and
schema name as PUBLIC.
CALL sp_get_ddl('DEMO_DB','PUBLIC');
The output of the stored procedure sp_get_ddl is as follows.

Output of procedure sp_get_ddl providing DDL of all tables


Contents
1. Introduction
2. Syntax of RESULTSET in Snowflake Stored Procedures
2.1. Declaring a RESULTSET
2.2. Returning the data of a RESULTSET as a Table
2.3. Accessing data from a RESULTSET using a Cursor
3. Return output of a SELECT statement in Stored Procedures using RESULTSET
3.1. Declaring RESULTSET with a DEFAULT Clause
3.2. Declaring RESULTSET without a DEFAULT Clause
3.3. Constructing the SQL statement dynamically for RESULTSET
4. Difference between a RESULTSET and CURSOR in Snowflake Stored Procedures
5. Conclusion
1. Introduction
Snowflake allows storing the rows present in the result set of a SELECT statement and returning
them as output in the form of a table using RESULTSET. RESULTSET is a SQL data type
supported only in Snowflake Stored Procedures that points to the result set of a query.
The results that a RESULTSET points to can be accessed in one of the following ways.
 Returning the results as a table using the TABLE() syntax.
 Assigning the RESULTSET to a CURSOR and looping over it.
2. Syntax of RESULTSET in Snowflake Stored Procedures
The syntax of RESULTSET includes two different parts:
1. Declaring a RESULTSET and assigning a SQL statement.
2. Accessing data from a RESULTSET
 Returning the data of a RESULTSET as a table.
 Accessing data from a RESULTSET using a Cursor
2.1. Declaring a RESULTSET
The RESULTSET can be declared either in the DECLARE or BEGIN…END section of the stored
procedures.
The below is the syntax to declare a RESULTSET in the DECLARE section of the stored procedure.
DECLARE
<resultset_name> RESULTSET DEFAULT ( <query> );
The below is the syntax to declare a RESULTSET in the BEGIN…END section of the stored
procedure.
BEGIN
LET <resultset_name> := ( <query> );
(or)
LET <resultset_name> RESULTSET := ( <query> );
END;
2.2. Returning the data of a RESULTSET as a Table
In order to return the results of a RESULTSET as an output, pass the RESULTSET to TABLE(). In
the CREATE PROCEDURE, the return type should be declared as a TABLE along with the columns
and their data types.
The below is the syntax to return a RESULTSET as a Table in a stored procedure.
CREATE PROCEDURE sp_demo()
RETURNS TABLE( column_1 <data_type>, column_2 <data_type>,…)

BEGIN

RETURN TABLE(<resultset_name>);
END;
2.3. Accessing data from a RESULTSET using a Cursor
Instead of assigning the SELECT query directly to the CURSOR, you can assign a variable of type
RESULTSET which holds the query.
The below is the syntax to access data from a RESULTSET using a CURSOR.
BEGIN
LET <resultset_name> RESULTSET := ( <query> );
LET <cursor_name> CURSOR FOR <resultset_name>;

END;
To learn more about using RESULTSET with CURSORS, refer to our previous article on cursors.
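As a quick illustration of this pattern, below is a minimal sketch of assigning a RESULTSET to a
cursor and looping over it. The procedure name sp_demo_resultset_with_cursor is only for
demonstration, and the sketch assumes the EMPLOYEES table created in the next section.
CREATE OR REPLACE PROCEDURE sp_demo_resultset_with_cursor()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
total_ids NUMBER DEFAULT 0;
BEGIN
-- assign the query to a RESULTSET and point a cursor at it
LET res RESULTSET := (SELECT ID FROM EMPLOYEES);
LET c1 CURSOR FOR res;
-- loop over the cursor and accumulate the ID values
FOR rec IN c1 DO
total_ids := total_ids + rec.ID;
END FOR;
RETURN 'Sum of IDs: ' || total_ids;
END;
$$
;
Calling CALL sp_demo_resultset_with_cursor(); should then return the sum of the ID values from
the EMPLOYEES table.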
3. Return output of a SELECT statement in Stored Procedures using RESULTSET
In order to return the output of a SELECT statement in stored procedures, we can assign the SQL
statement to a variable of type RESULTSET and return the data from RESULTSET as a table.
Let us understand how to implement RESULTSETs in stored procedures using examples. Consider
the EMPLOYEES data below for demonstration purposes.
CREATE OR REPLACE TABLE EMPLOYEES(
ID NUMBER,
EMP_NAME VARCHAR(50)
);

INSERT INTO EMPLOYEES VALUES (101, 'TONY'), (102, 'STEVE');


3.1. Declaring RESULTSET with a DEFAULT Clause
A RESULTSET can be declared in the DECLARE section of the stored procedure and assigned a
default SELECT query at the time of declaration, as shown below.
CREATE OR REPLACE PROCEDURE sp_demo_resultset()
RETURNS TABLE(ID NUMBER, ENAME VARCHAR)
LANGUAGE SQL
AS
$$
DECLARE
res RESULTSET DEFAULT (SELECT ID, EMP_NAME FROM EMPLOYEES);
BEGIN
RETURN TABLE(res);
END;
$$
;
The output of the stored procedure is as follows.
CALL sp_demo_resultset();
ID ENAME

101 TONY

102 STEVE

3.2. Declaring RESULTSET without a DEFAULT Clause


Instead of assigning the SELECT query at declaration using the DEFAULT clause, the query can be
assigned after declaration in the BEGIN…END section of the procedure as shown below.
CREATE OR REPLACE PROCEDURE sp_demo_resultset2()
RETURNS TABLE(ID NUMBER, ENAME VARCHAR)
LANGUAGE SQL
AS
$$
DECLARE
res RESULTSET;
BEGIN
res := (SELECT ID, EMP_NAME FROM EMPLOYEES);
RETURN TABLE(res);
END;
$$
;
The output of the stored procedure is as follows.
CALL sp_demo_resultset2();
ID ENAME

101 TONY

102 STEVE

3.3. Constructing the SQL statement dynamically for RESULTSET


A SQL statement can also be constructed dynamically as a string expression. But it cannot be directly
assigned to a variable of type RESULTSET as shown in previous examples.
The SQL expression should be executed using EXECUTE IMMEDIATE before assigning it to a
RESULTSET.
CREATE OR REPLACE PROCEDURE sp_demo_resultset_dynamic_query(var_id number)
RETURNS TABLE(ID NUMBER, ENAME VARCHAR)
LANGUAGE SQL
AS
$$
DECLARE
res RESULTSET;
sql_query VARCHAR;
BEGIN
sql_query := 'SELECT ID, EMP_NAME FROM EMPLOYEES WHERE ID ='|| var_id;
res := (EXECUTE IMMEDIATE :sql_query);
RETURN TABLE(res);
END;
$$
;
The output of the stored procedure is as follows.
CALL sp_demo_resultset_dynamic_query(101);
ID ENAME

101 TONY

4. Difference between a RESULTSET and CURSOR in Snowflake Stored Procedures


Though both RESULTSET and CURSOR provide access to the result set of a query, they differ in the
following ways.
1. The main difference between a RESULTSET and a CURSOR in Snowflake Stored Procedures is that
Cursors allow you to loop through each row of the query result set and apply certain actions on
each row, whereas this is not supported with a RESULTSET.
In order to loop through each row of the result set that a RESULTSET points to, it again needs to be
assigned to a Cursor.
2. The query assigned to a RESULTSET is executed at the moment the query is assigned to it, either in
the DECLARE or the BEGIN…END section of the procedure.
For Cursors, the query is not executed during assignment. The query is executed when you open the
cursor using the OPEN command.
3. Binding variables to the assigned query is supported for Cursors, whereas it is not supported for a
RESULTSET (see the sketch below).
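To illustrate points 2 and 3, the following is a minimal sketch (the procedure name is hypothetical and
the EMPLOYEES table from earlier in this article is assumed) where the cursor is declared with a
bind variable (?) and the query executes only when the cursor is opened with a value supplied through
the USING clause.
CREATE OR REPLACE PROCEDURE sp_demo_cursor_bind(var_id NUMBER)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
var_name VARCHAR;
-- the bind variable (?) is allowed here; a RESULTSET does not support this
c1 CURSOR FOR SELECT EMP_NAME FROM EMPLOYEES WHERE ID = ?;
BEGIN
-- the query executes only now, when the cursor is opened
OPEN c1 USING (var_id);
FETCH c1 INTO var_name;
CLOSE c1;
RETURN var_name;
END;
$$
;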
5. Conclusion
Snowflake supports RESULTSET only inside Stored Procedures with SQL Scripting.
Snowflake does not support:
 Declaring an input parameter of type RESULTSET.
 Declaring a stored procedure’s return type as RESULTSET.
 Declaring a column of type RESULTSET.
 Querying a RESULTSET as a table using SELECT.
Exceptions in Snowflake Stored Procedures
March 25, 2023
Contents
1. Introduction
2. Declaring an Exception in Snowflake Stored Procedures
3. Raising an Exception in Snowflake Stored Procedures
4. Catching an Exception in Snowflake Stored Procedures
5. Built-in Exception Variables in Snowflake Stored Procedures
5.1. SQLCODE
5.2. SQLERRM
5.3. SQLSTATE
6. Built-in Exceptions in Snowflake Stored Procedures
6.1. STATEMENT_ERROR
6.2. EXPRESSION_ERROR
6.3. OTHER
7. Closing Points
1. Introduction
An exception occurs during the execution of a procedure when an instruction is encountered which
cannot be executed at run-time due to an error. Apart from run-time errors, Snowflake also lets you
raise an exception manually whenever an undesired result is encountered to prevent the next lines of
code from executing.
Snowflake also allows catching exceptions for the errors that can occur in our code and handling
them by defining exception handlers for each exception. These exception handlers contain the
code that needs to be executed when that particular exception arises.
In this article let us discuss how to declare, raise and catch exceptions in Snowflake Stored
Procedures.
2. Declaring an Exception in Snowflake Stored Procedures
The user-defined Exception needs to be declared in the DECLARE section of the stored procedure.
The syntax for declaring an exception in the DECLARE section of the stored procedure is shown
below.
DECLARE
<exception_name> EXCEPTION ( <exception_number>, '<exception_message>') ;
Exception_Name:
The name of the user-defined exception provided by the user.
Exception_Number:
A number that uniquely identifies the exception, which should be between -20000 and -20999. The
same number should not be used for multiple exceptions within the same procedure.
If you do not specify a number for the exception, the default value used is -20000.
Exception_Message:
The text that describes the exception. The text must not contain any double quote characters.
The below is an example of declaring exceptions with and without exception number and message.
DECLARE
MY_EXCEPTION1 EXCEPTION;
MY_EXCEPTION2 EXCEPTION(-20001,'Raised user defined exception
MY_EXCEPTION2.');
3. Raising an Exception in Snowflake Stored Procedures
An exception can be raised manually by executing the RAISE command.
The syntax to raise an exception using the RAISE command in the BEGIN…END section of the
stored procedure is as shown below.
RAISE <exception_name>;
The below is a simple example showing an exception being raised using RAISE command.
CREATE OR REPLACE PROCEDURE sp_raise_exception()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
MY_SP_EXCEPTION EXCEPTION;
BEGIN
RAISE MY_SP_EXCEPTION;
END;
$$
;
The output of the stored procedure is as follows.
CALL sp_raise_exception();

The below is another simple example showing an exception being raised using RAISE command
where exception number and message are defined.
CREATE OR REPLACE PROCEDURE sp_raise_exception()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
MY_SP_EXCEPTION EXCEPTION(-20001, 'Raised user defined exception
MY_SP_EXCEPTION.');
BEGIN
RAISE MY_SP_EXCEPTION;
END;
$$
;
The output of the stored procedure is as follows.
CALL sp_raise_exception();

Note that the exception number and message are displayed as per the exception definition and are
different from the previous example.
4. Catching an Exception in Snowflake Stored Procedures
Whenever we raise an exception using the RAISE command, the job fails, providing information
about the error.
Instead of letting the job fail, we can also handle the exception by catching it using
the EXCEPTION block of the stored procedure.
The syntax to catch an exception using the EXCEPTION block in the stored procedure is as shown
below.
BEGIN

EXCEPTION
WHEN <exception_name> THEN
<statement>;
END;
The below is an example showing how to catch an exception using EXCEPTION block in stored
procedures.
CREATE OR REPLACE PROCEDURE sp_raise_exception()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
MY_SP_EXCEPTION EXCEPTION(-20001, 'Raised user defined exception
MY_SP_EXCEPTION.');
BEGIN
RAISE MY_SP_EXCEPTION;
EXCEPTION
WHEN MY_SP_EXCEPTION THEN
RETURN 'Raised user defined exception MY_SP_EXCEPTION';
END;
$$
;
The output of the stored procedure is as follows.
CALL sp_raise_exception();

5. Built-in Exception Variables in Snowflake Stored Procedures


Snowflake provides some built-in variables which provide information about the exceptions raised in
the stored procedure.
The three built-in exception variables are as follows:
1. SQLCODE
2. SQLERRM
3. SQLSTATE
5.1. SQLCODE
The SQLCODE variable captures the exception number defined for the user-defined exception at the
time of declaration.
5.2. SQLERRM
The SQLERRM variable captures the error message defined for the user-defined exception at the
time of declaration.
5.3. SQLSTATE
The SQLSTATE variable is a 5-character code that indicates the return status of a call, following the
ANSI SQL SQLSTATE standard. Snowflake uses additional values beyond those in the ANSI SQL
standard.
The below is an example which shows the usage of built-in exception variables in a stored procedure.
CREATE OR REPLACE PROCEDURE sp_raise_exception(var number)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
my_sp_exception1 EXCEPTION (-20001, 'Raised user defined exception
MY_SP_EXCEPTION1.');
my_sp_exception2 EXCEPTION (-20002, 'Raised user defined exception
MY_SP_EXCEPTION2.');
BEGIN
IF (var=0) THEN
RAISE my_sp_exception1;
ELSEIF (var=1) THEN
RAISE my_sp_exception2;
END IF;
RETURN var;
EXCEPTION
WHEN my_sp_exception1 THEN
RETURN SQLSTATE||':'||SQLCODE||':'||SQLERRM;
WHEN my_sp_exception2 THEN
RETURN SQLSTATE||':'||SQLCODE||':'||SQLERRM;
END;
$$
;
In the above stored procedure
 There are two user defined exceptions my_sp_exception1 and my_sp_exception2.
 If the value of the variable var=0, the my_sp_exception1 is raised.
 If the value of the variable var=1, the my_sp_exception2 is raised.
 If the value of the variable var is other than 0 and 1, the value of var is returned.
 For each exception, the details are captured using the built-in variables in the
EXCEPTION block of the stored procedure.
The output of the stored procedure with built-in variables for various inputs is as follows.
CALL sp_raise_exception(0);

CALL sp_raise_exception(1);
CALL sp_raise_exception(3);

6. Built-in Exceptions in Snowflake Stored Procedures


Snowflake provides built-in exceptions that identify the cause of an error and help in catching it.
These built-in exceptions can be used along with user-defined exceptions in the stored procedures.
The built-in exceptions in the Snowflake stored procedures are as follows
6.1. STATEMENT_ERROR
This exception indicates an error associated with executing a SQL statement. For example, if you
perform any operation on a table that does not exist, this exception is raised.
6.2. EXPRESSION_ERROR
This exception indicates an expression-related error. This error is raised, for instance, when dividing
by zero or when you construct an expression that evaluates to a VARCHAR and try to assign its value
to a FLOAT.
6.3. OTHER
Though this exception do not capture only one particular error, this helps in catching the exceptions
that are not specified in the stored procedure.
The below is an example of a stored procedure capturing the error from a delete statement using built-
in exception STATEMENT_ERROR.
CREATE OR REPLACE PROCEDURE sp_purge_data()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
DELETE FROM emp;
EXCEPTION
WHEN STATEMENT_ERROR THEN
RETURN 'STATEMENT_ERROR:'||SQLSTATE||':'||SQLCODE||':'||SQLERRM;
END;
$$;
The output of the stored procedure is as follows.
CALL sp_purge_data();

The below is an example of a stored procedure capturing the error using built-in exception
EXPRESSION_ERROR.
CREATE OR REPLACE PROCEDURE sp_expression_error()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
var1 FLOAT;
var2 VARCHAR DEFAULT 'Some text';
BEGIN
var1 := var2;
RETURN var1;
EXCEPTION
WHEN EXPRESSION_ERROR THEN
RETURN 'EXPRESSION_ERROR:'||SQLSTATE||':'||SQLCODE||':'||SQLERRM;
END;
$$;
The output of the stored procedure is as follows.
CALL sp_expression_error();

Note that we have not declared any exceptions in the above examples, and the error information is
captured using built-in exceptions.
In both of the above examples, you can replace the built-in exception with OTHER and it will still
capture the error information. Besides the built-in exceptions, the OTHER exception also catches any
user-defined exception that is declared but not specified in the EXCEPTION block.
The below is an example of a stored procedure capturing the error using built-in exception OTHER.
CREATE OR REPLACE PROCEDURE sp_demo_other()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
var1 NUMBER;
BEGIN
var1 := 1/0;
RETURN var1;
EXCEPTION
WHEN OTHER THEN
RETURN 'OTHER_ERROR:'||SQLSTATE||':'||SQLCODE||':'||SQLERRM;
END;
$$;
The output of the stored procedure is as follows.
CALL sp_demo_other();

7. Closing Points
More than one exception can be handled by a single exception handler using OR.
The exception block below shows the same value being returned for multiple exceptions using OR.
EXCEPTION
WHEN MY_EXCEPTION_1 OR MY_EXCEPTION_2 OR MY_EXCEPTION_3 THEN
RETURN 123;
WHEN MY_EXCEPTION_4 THEN
RETURN 4;
WHEN OTHER THEN
RETURN 99;
The exception handler should be at the end of the block. If the block contains statements after the
exception handler, those statements are not executed.
If more than one WHEN clause could match a specific exception, then the first WHEN clause that
matches is the one that is executed. The other clauses are not executed.
If you want to re-raise the exception that you caught in your exception handler, execute the RAISE
command without any arguments.
BEGIN
DELETE FROM emp;
EXCEPTION
WHEN STATEMENT_ERROR THEN
LET ERROR_MESSAGE := SQLCODE || ': ' || SQLERRM;
INSERT INTO error_details VALUES (:ERROR_MESSAGE); -- Capture error details into a table.
RAISE; -- Raise the same exception that you are handling.
END;
Caller’s and Owner’s Rights in Snowflake Stored Procedures
March 31, 2023
Contents
1. Introduction
2. Caller’s Rights in Snowflake Stored Procedures
3. Owner’s Rights in Snowflake Stored Procedures
4. Difference between Caller’s and Owner’s Rights in Snowflake
5. Demonstration of Caller’s and Owner’s Rights
1. Introduction
Stored procedures in Snowflake run either with caller’s rights or with owner’s rights, which define
the privileges with which the statements in the stored procedure execute. By default, when a stored
procedure is created in Snowflake without specifying the rights with which it should be executed, it
runs with owner’s rights.
In this article, let us discuss what caller’s rights and owner’s rights are, the differences between the
two, and how to implement them in Snowflake stored procedures.
2. Caller’s Rights in Snowflake Stored Procedures
A caller’s rights stored procedure runs with the privileges of the role that called the stored procedure.
The term “Caller” in this context refers to the user executing the stored procedure, who may or may
not be the creator of the procedure.
Any statement that the caller could not execute outside the stored procedure cannot be executed
inside the stored procedure with caller’s rights.
At the time of creating the stored procedure, the creator has to specify whether the stored procedure
runs with caller’s rights. The default is owner’s rights.
The syntax to create a stored procedure with caller’s rights is as shown below.
CREATE OR REPLACE PROCEDURE <procedure_name>()
RETURNS <data_type>
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$

$$;
3. Owner’s Rights in Snowflake Stored Procedures
An Owner’s rights stored procedure runs with the privileges of the role that created the stored
procedure. The term “Owner” in this context refers to the user who created the stored procedure, who
may or may not be executing the procedure.
The primary advantage of owner’s rights is that the owner can delegate privileges to another role
through the stored procedure without actually granting those privileges outside the procedure.
For example, consider a user who does not have access to clean up data in a table but is granted
access to a stored procedure (with owner’s rights) that does it. The user, who has no privileges on the
table, can clean up the data in the table by executing the stored procedure. But the same statements
in the procedure cannot be executed by the user outside the procedure.
The syntax to create a stored procedure with owner’s rights is as shown below.
CREATE OR REPLACE PROCEDURE <procedure_name>()
RETURNS <data_type>
LANGUAGE SQL
EXECUTE AS OWNER
AS
$$

$$;
Note that “EXECUTE AS OWNER” is optional. Even if the clause is not specified, the procedure is
created with owner’s rights.
4. Difference between Caller’s and Owner’s Rights in Snowflake
The below are the differences between Caller’s and Owner’s Rights in Snowflake.
Caller’s Rights:
 Runs with the privileges of the caller.
 Inherits the current warehouse of the caller.
 Uses the database and schema that the caller is currently using.
Owner’s Rights:
 Runs with the privileges of the owner.
 Inherits the current warehouse of the caller.
 Uses the database and schema that the stored procedure is created in, not the database and
schema that the caller is currently using.
5. Demonstration of Caller’s and Owner’s Rights
Let us understand how Caller’s and Owner’s Rights work with an example
using ACCOUNTADMIN and SYSADMIN roles.
Using ACCOUNTADMIN role, let us create a table named Organization for demonstration.
USE ROLE ACCOUNTADMIN;
CREATE TABLE organization(id NUMBER, org_name VARCHAR(50));
When the table is queried using the SYSADMIN role, it throws an error as shown below, since no
grants on this table have been provided to SYSADMIN.
USE ROLE SYSADMIN;
SELECT * FROM organization;

Let us create a stored procedure with caller’s rights using the ACCOUNTADMIN role to delete data
from the Organization table.
USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE PROCEDURE sp_demo_callers_rights()


RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$
BEGIN
DELETE FROM ORGANIZATION WHERE ID = '101';
RETURN 'Data cleaned up from table.';
END;
$$
;
The output of the caller’s rights stored procedure with ACCOUNTADMIN role is as below.
USE ROLE ACCOUNTADMIN;
CALL sp_demo_callers_rights();

Assign the grants to execute the stored procedure to the SYSADMIN role.
USE ROLE ACCOUNTADMIN;
GRANT USAGE ON PROCEDURE DEMO_DB.PUBLIC.sp_demo_callers_rights() TO ROLE
SYSADMIN;
The output of the caller’s rights stored procedure with SYSADMIN role is as below.
USE ROLE SYSADMIN;
CALL sp_demo_callers_rights();

Since the SYSADMIN role does not have any privileges on the Organization table, the execution of
the procedure with caller’s rights also fails.
The owner of the stored procedure can change the procedure from an owner’s rights stored procedure
to a caller’s rights stored procedure (or vice-versa) by executing an ALTER
PROCEDURE command as shown below.
ALTER PROCEDURE sp_demo_callers_rights() EXECUTE AS OWNER;
The output of the owner’s rights stored procedure with SYSADMIN role is as below.
USE ROLE SYSADMIN;
CALL sp_demo_callers_rights();

Though the SYSADMIN role does not have privileges on the Organization table, the execution of the
procedure which deletes data from the Organization table succeeds because the procedure
executes with owner’s rights.
Window Functions in Snowflake Snowpark
February 21, 2024
Contents
1. Introduction to Window Functions
2. Window Functions in Snowpark
3. Demonstration of Window Functions in Snowpark
3.1. Find the Employees with Highest salary in each Department
3.2. Calculate Total Sum of Salary for each Department and display it alongside each employee’s
record
3.3. Calculate the Cumulative Sum of Salary for each Department
3.4. Calculate the Minimum Salary Between the Current Employee and the one Following for each
Department
1. Introduction to Window Functions
A Window Function performs a calculation across a set of table rows that are somehow related
to the current row and returns a single aggregated value for each row. Unlike regular aggregate
functions like SUM() or AVG(), window functions do not group rows into a single output row.
Instead, they compute a value for each row based on a specific window of rows.
Refer to our previous article to learn more about Window Functions in SQL, including examples to
help you understand them better.
In this article, let us explore how to implement Window Functions on DataFrames in Snowflake
Snowpark.
2. Window Functions in Snowpark
The Window class in Snowpark enables defining a WindowSpec (or Window Specification) that
determines which rows are included in a window. A window is a group of rows that are associated
with the current row by some relation.
Syntax
The following is the syntax to form a WindowSpec in Snowpark.
Window.<partitionBy_specification>.<orderBy_specification>.<windowFrame_specification>
 PartitionBy Specification
The Partition specification defines which rows are included in a window (partition). If
no partition is defined, all the rows are included in a single partition.
 OrderBy Specification
The Ordering specification determines the ordering of the rows within the window.
The ordering could be ascending (ASC) or descending (DESC).
 WindowFrame Specification
The Window Frame specification defines the subset of rows within a partition over
which the window function operates. It determines the range of rows to consider for
each row’s computation within the window partition.
A Window Function is formed by passing the WindowSpec to the aggregate functions (like SUM(),
AVG(), etc.) using the OVER clause.
<aggregate_function>(<arguments>).over(<windowSpec>)
To see the full list of functions that support windows, refer to the Snowflake Documentation.
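Before the demonstration, here is a minimal sketch that combines all three parts of a WindowSpec in
a single expression; the column names DEPT_ID and SALARY are just placeholders matching the
sample data used below.
#// a WindowSpec combining partitioning, ordering and a window frame
from snowflake.snowpark import Window
from snowflake.snowpark.functions import col, sum

window_spec = (
    Window.partitionBy("DEPT_ID")
          .orderBy(col("SALARY").desc())
          .rows_between(Window.currentRow, 1)
)

#// the Window Function itself: an aggregate with the WindowSpec passed through over()
windowed_sum = sum("SALARY").over(window_spec)
The resulting column expression can then be attached to a DataFrame using withColumn() or
select(), as the demonstration below shows.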
3. Demonstration of Window Functions in Snowpark
Consider the EMPLOYEES data below for the demonstration of the Window Functions in
Snowpark.
#// creating dataframe with employee data

employee_data = [
[1,'TONY',24000,101],
[2,'STEVE',17000,101],
[3,'BRUCE',9000,101],
[4,'WANDA',20000,102],
[5,'VICTOR',12000,102],
[6,'STEPHEN',10000,103],
[7,'HANK',15000,103],
[8,'THOR',21000,103]
]

employee_schema = ["EMP_ID", "EMP_NAME", "SALARY", "DEPT_ID"]

df_emp =session.createDataFrame(employee_data, schema=employee_schema)


df_emp.show()

------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |24000 |101 |
|2 |STEVE |17000 |101 |
|3 |BRUCE |9000 |101 |
|4 |WANDA |20000 |102 |
|5 |VICTOR |12000 |102 |
|6 |STEPHEN |10000 |103 |
|7 |HANK |15000 |103 |
|8 |THOR |21000 |103 |
------------------------------------------------
3.1. Find the Employees with Highest salary in each Department
Follow the below steps to find the details of Employees with the highest salary in each Department
using Window Functions in Snowpark.
STEP-1: Import all the necessary Snowpark libraries and create a WindowSpec
The following code creates a WindowSpec where a partition is created based on the DEPT_ID field
and the rows within each partition are ordered by SALARY in descending order.
#// Importing Snowpark Libraries
from snowflake.snowpark import Window
from snowflake.snowpark.functions import row_number, desc, col, min, sum

#// creating a WindowSpec


windowSpec = Window.partitionBy("DEPT_ID").orderBy(col("SALARY").desc())
STEP-2: Create a Window Function that Ranks Employees by Salary within each Department
The following code creates a new field using a Window Function that assigns ranks to employees
based on their salaries within each department.
df_emp.withColumn("RANK", row_number().over(windowSpec)).show()

---------------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |"RANK" |
---------------------------------------------------------
|4 |WANDA |20000 |102 |1 |
|5 |VICTOR |12000 |102 |2 |
|1 |TONY |24000 |101 |1 |
|2 |STEVE |17000 |101 |2 |
|3 |BRUCE |9000 |101 |3 |
|8 |THOR |21000 |103 |1 |
|7 |HANK |15000 |103 |2 |
|6 |STEPHEN |10000 |103 |3 |
---------------------------------------------------------
In the above code, we have used the DataFrame.withColumn() method that returns a DataFrame
with an additional column with the specified column name computed using the specified expression.
In this scenario, the method returns all the columns of the DataFrame df_emp along with a new field
named RANK computed based on the Window Function passed as an expression.
Alternatively, the DataFrame.select() method can be used to achieve the same output as shown
below.
df_emp.select("*", row_number().over(windowSpec).alias("RANK")).show()
STEP-3: Filter the records with the Highest Salary in each Department
The following code filters the records with RANK value 1 and sorts the records based on
the DEPT_ID.
df_emp.withColumn("RANK",
row_number().over(windowSpec)).filter(col("RANK")==1).sort("DEPT_ID").show()

---------------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |"RANK" |
---------------------------------------------------------
|1 |TONY |24000 |101 |1 |
|4 |WANDA |20000 |102 |1 |
|8 |THOR |21000 |103 |1 |
---------------------------------------------------------
When executed, this code is translated and executed as SQL in Snowflake through the Snowpark API.
The resulting SQL statement will be equivalent to the following query.
SELECT * FROM(
SELECT
EMP_ID, EMP_NAME, SALARY, DEPT_ID,
ROW_NUMBER() OVER (PARTITION BY DEPT_ID ORDER BY SALARY DESC) AS RANK
FROM EMPLOYEES
)
WHERE RANK = 1
ORDER BY DEPT_ID ;
3.2. Calculate Total Sum of Salary for each Department and display it alongside each employee’s
record
The following code calculates the total Sum of the Salary for each Department and displays it
alongside each employee record using Window Functions in Snowpark.
windowSpec = Window.partitionBy("DEPT_ID")
df_emp.withColumn("TOTAL_SAL", sum("SALARY").over(windowSpec)).show()

--------------------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |"TOTAL_SAL" |
--------------------------------------------------------------
|1 |TONY |24000 |101 |50000 |
|2 |STEVE |17000 |101 |50000 |
|3 |BRUCE |9000 |101 |50000 |
|4 |WANDA |20000 |102 |32000 |
|5 |VICTOR |12000 |102 |32000 |
|6 |STEPHEN |10000 |103 |46000 |
|7 |HANK |15000 |103 |46000 |
|8 |THOR |21000 |103 |46000 |
--------------------------------------------------------------
The above Snowpark code is equivalent to the following SQL query.
SELECT
EMP_ID, EMP_NAME, SALARY, DEPT_ID,
SUM(SALARY) OVER (PARTITION BY DEPT_ID) AS TOTAL_SAL
FROM EMPLOYEES
;
3.3. Calculate the Cumulative Sum of Salary for each Department
The following code calculates the total Cumulative Sum of the Salary for each Department and
displays it alongside each employee record using Window Functions in Snowpark.
windowSpec = Window.partitionBy("DEPT_ID").orderBy(col("EMP_ID"))
df_emp.withColumn("CUM_SAL", sum("SALARY").over(windowSpec)).show()

------------------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |"CUM_SAL" |
------------------------------------------------------------
|1 |TONY |24000 |101 |24000 |
|2 |STEVE |17000 |101 |41000 |
|3 |BRUCE |9000 |101 |50000 |
|4 |WANDA |20000 |102 |20000 |
|5 |VICTOR |12000 |102 |32000 |
|6 |STEPHEN |10000 |103 |10000 |
|7 |HANK |15000 |103 |25000 |
|8 |THOR |21000 |103 |46000 |
------------------------------------------------------------
The above Snowpark code is equivalent to the following SQL query.
SELECT
EMP_ID, EMP_NAME, SALARY, DEPT_ID,
SUM(SALARY) OVER (PARTITION BY DEPT_ID ORDER BY EMP_ID) AS CUM_SAL
FROM EMPLOYEES
;
3.4. Calculate the Minimum Salary Between the Current Employee and the one Following for each
Department
The following code calculates the minimum salary between the current employee and the one
following for each department using Window Functions in Snowpark.
windowSpec =
Window.partitionBy("DEPT_ID").orderBy(col("EMP_ID")).rows_between(Window.currentRow,1)
df_emp.withColumn("MIN_SAL", min("SALARY").over(windowSpec)).sort("EMP_ID").show()

------------------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |"MIN_SAL" |
------------------------------------------------------------
|1 |TONY |24000 |101 |17000 |
|2 |STEVE |17000 |101 |9000 |
|3 |BRUCE |9000 |101 |9000 |
|4 |WANDA |20000 |102 |12000 |
|5 |VICTOR |12000 |102 |12000 |
|6 |STEPHEN |10000 |103 |10000 |
|7 |HANK |15000 |103 |15000 |
|8 |THOR |21000 |103 |21000 |
------------------------------------------------------------
The above Snowpark code is equivalent to the following SQL query where the salary of the current
employee record is compared with the next employee record.
SELECT
EMP_ID, EMP_NAME, SALARY, DEPT_ID,
MIN(SALARY) OVER (PARTITION BY DEPT_ID ORDER BY EMP_ID ROWS BETWEEN
CURRENT ROW AND 1 FOLLOWING) AS MIN_SAL
FROM EMPLOYEES
ORDER BY EMP_ID
;
Window Frames require the data within the window to be ordered. So, even though the ORDER BY
clause is optional in regular window function syntax, it is mandatory in window frame syntax.
1. Introduction
The Table.update() method in Snowpark helps in updating the rows of a table. It returns a
tuple UpdateResult, representing the number of rows modified and the number of multi-joined rows
modified. This method can also be used to update the rows of a DataFrame.
Syntax
Table.update(<assignments>, <condition>, [<source>])
Parameters
 <assignments>
A dictionary that contains key-value pairs representing columns of a DataFrame and
the corresponding values with which they should be updated. The values can either be
a literal value or a column object.
 <condition>
Represents the specific condition based on which a column should be updated. If no
condition is specified, all the rows of the DataFrame will be updated.
 <source>
Represents another DataFrame based on which the data of the current DataFrame will
be updated. The join condition between both the DataFrames should be specified in
the <condition>.
2. Steps to Update a DataFrame in Snowpark
Follow the below steps to update data of a DataFrame in Snowpark using Table.update() method.
1. Create a DataFrame with the desired data using Session.createDataFrame(). The
DataFrame could be built based on an existing table or data read from a CSV file or
content created within the code.
2. Create a temporary table with the contents of the DataFrame using
the DataFrameWriter class.
3. Create a DataFrame to read the contents of the temporary table
using Session.table() method.
4. Using the Table.update() method, update the contents of the DataFrame which is
created using a temporary table.
5. Display the contents of the DataFrame to verify that the appropriate records have
been updated using the DataFrame.show() method.
Temporary tables only exist within the session in which they were created and are not visible to other
users or sessions. Once the session ends, the table is completely purged from the system. Therefore,
temporary tables are well-suited in the scenario of updating DataFrames.
3. Demonstration of Updating all rows of a DataFrame
STEP-1: Create DataFrame
The following code creates a DataFrame df_emp which holds the EMPLOYEES data as shown
below.
#// create a DataFrame with employee data
employee_data = [
[1,'TONY',24000,10],
[2,'STEVE',17000,10],
[3,'BRUCE',9000,20],
[4,'WANDA',20000,20]
]

employee_schema = ["EMP_ID", "EMP_NAME", "SALARY", "DEPT_ID"]

df_emp =session.createDataFrame(employee_data, schema=employee_schema)


df_emp.show()

------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |24000 |10 |
|2 |STEVE |17000 |10 |
|3 |BRUCE |9000 |20 |
|4 |WANDA |20000 |20 |
------------------------------------------------
STEP-2: Create Temporary Table
The following code creates a temporary table named tmp_emp in the Snowflake database using the
contents of df_emp DataFrame.
#// create a temp table
df_emp.write.mode("overwrite").save_as_table("tmp_emp", table_type="temp")
STEP-3: Read Temporary Table
The following code creates a new DataFrame df_tmp_emp which reads the contents of temporary
table tmp_emp.
#// create a DataFrame to read contents of temp table
df_tmp_emp = session.table("tmp_emp")
df_tmp_emp.show()

------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |24000 |10 |
|2 |STEVE |17000 |10 |
|3 |BRUCE |9000 |20 |
|4 |WANDA |20000 |20 |
------------------------------------------------
STEP-4: Update DataFrame
The following code updates all the records of DataFrame df_tmp_emp by multiplying the DEPT_ID
values by 10 and doubling the SALARY amounts.
#// update DEPT_ID and SALARY fields of all records
from snowflake.snowpark.types import IntegerType
from snowflake.snowpark.functions import cast

df_tmp_emp.update({"DEPT_ID": cast("DEPT_ID", IntegerType())*10, "SALARY":


cast("SALARY", IntegerType())*2 })
// UpdateResult(rows_updated=4, multi_joined_rows_updated=0)
Note that we have used the cast function to convert the DEPT_ID and SALARY fields to Integer type
before updating them.
STEP-5: Display Updated DataFrame
The following code displays the contents of the updated DataFrame.
#// display updated DataFrame
df_tmp_emp.show()

------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |48000 |100 |
|2 |STEVE |34000 |100 |
|3 |BRUCE |18000 |200 |
|4 |WANDA |40000 |200 |
------------------------------------------------
4. Updating a DataFrame based on a Condition
The following code updates the salary of all employees belonging to department 100.
#// update the SALARY field of employees where DEPT_ID is 100

df_tmp_emp.update({"SALARY": cast("SALARY", IntegerType())+ 100},


df_tmp_emp["DEPT_ID"] == 100 )
// UpdateResult(rows_updated=2, multi_joined_rows_updated=0)

df_tmp_emp.show()
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |48100 |100 |
|2 |STEVE |34100 |100 |
|3 |BRUCE |18000 |200 |
|4 |WANDA |40000 |200 |
------------------------------------------------
5. Updating a DataFrame based on data in another DataFrame
A DataFrame can also be updated based on the data in another DataFrame
using Table.update() method.
The following code updates employees’ SALARY in df_tmp_emp DataFrame where EMP_ID is
equal to EMP_ID in another DataFrame df_salary.
#// update DataFrame based on data in another DataFrame

df_salary = session.createDataFrame([[1, 50000], [2, 35000]], ["EMP_ID", "SALARY"])


df_salary.show()
-----------------------
|"EMP_ID" |"SALARY" |
-----------------------
|1 |50000 |
|2 |35000 |
-----------------------

df_tmp_emp.update({"SALARY": df_salary["SALARY"]} , df_tmp_emp["EMP_ID"] ==


df_salary["EMP_ID"], df_salary)
// UpdateResult(rows_updated=2, multi_joined_rows_updated=0)

df_tmp_emp.show()
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |50000 |100 |
|2 |STEVE |35000 |100 |
|3 |BRUCE |18000 |200 |
|4 |WANDA |40000 |200 |
------------------------------------------------
6. Updating a DataFrame using Session.sql() Method
The Session.sql() method in Snowpark can be used to execute a SQL statement. It returns a new
DataFrame representing the results of a SQL query.
Follow the below steps to update the data of a DataFrame in Snowpark using
the Session.sql() method.
1. Create a DataFrame with the desired data using Session.createDataFrame(). The
DataFrame could be built based on an existing table or data read from a CSV file or
content created within the code.
2. Create a temporary table with the contents of the DataFrame using
the DataFrameWriter class.
3. Use the Session.sql() method to update the contents of the temporary table.
4. Create a DataFrame to read the contents of the updated temporary table using
the session.table() method.
5. Display the contents of the DataFrame to verify that the appropriate records have
been updated using the DataFrame.show() method.
#// create DataFrame
employee_data = [
[1,'TONY',24000,10],
[2,'STEVE',17000,10],
[3,'BRUCE',9000,20],
[4,'WANDA',20000,20]
]
employee_schema = ["EMP_ID", "EMP_NAME", "SALARY", "DEPT_ID"]
df_emp =session.createDataFrame(employee_data, schema=employee_schema)

#// create temporary table


df_emp.write.mode("overwrite").save_as_table("tmp_emp", table_type="temp")

#// update DataFrame using session.sql()


session.sql("UPDATE tmp_emp SET SALARY=70000 WHERE EMP_ID=3").collect()
// [Row(number of rows updated=1, number of multi-joined rows updated=0)]

#// create DataFrame to read contents of updated temp table


df_tmp_emp = session.table("tmp_emp")

#// display updated DataFrame


df_tmp_emp.show()
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |24000 |10 |
|2 |STEVE |17000 |10 |
|3 |BRUCE |70000 |20 |
|4 |WANDA |20000 |20 |
------------------------------------------------
Joins in Snowflake Snowpark
February 5, 2024
Contents
1. Introduction
2. Joins in Snowpark
3. Demonstration of JOINS in Snowpark
3.1. Join DataFrames in Snowpark
3.2. Join DataFrames referring to a Single Column Name in Snowpark
3.3. Rename Ambiguous Columns of Join operation Output in Snowpark
3.4. Rename Ambiguous Columns of Join operation Output using lsuffix and rsuffix
3.5. Join DataFrames based on Multiple Conditions in Snowpark.
3.6. Join Types in Snowpark
1. Introduction
Joins are used to combine rows from two or more tables, based on a related column between them.
Joins allow for the creation of a comprehensive result set that incorporates relevant information from
multiple tables.
In this article, let us explore how to join data between two DataFrames in Snowflake Snowpark.
2. Joins in Snowpark
The DataFrame.join method in Snowpark helps in performing a join of a specified type between the
data of the current DataFrame and another DataFrame, based on a join condition or a list of columns.
Syntax:
DataFrame.join(right DataFrame, <join_condition>, join_type=<join_type>)
Parameters:
right DataFrame – The other DataFrame to be joined with.
<join_condition> – The condition using which data in both the DataFrames are joined. The valid
values for a join condition are:
 A column name or a list of column names. When a column name or a list of column
names is specified, this method assumes the same named columns are present in both
the DataFrames.
 Columns from both the DataFrames specifying the join condition.
<join_type> – The type of join to be applied to join data between two DataFrames. The Snowpark
API for Python supports the following join types found in SQL.
SQL Join Type Supported Value

Inner join “inner” (the default value)

Left outer join “left”, “leftouter”

Right outer join “right”, “rightouter”

Full outer join “full”, “outer”, “fullouter”

Left semi join “semi”, “leftsemi”

Left anti join “anti”, “leftanti”

Cross join “cross”

3. Demonstration of JOINS in Snowpark


Consider the EMPLOYEES and DEPARTMENTS data below for the demonstration of the
implementation of the Joins in Snowpark.
#// create dataframe with employees data
employee_data = [
[1,'TONY',101],
[2,'STEVE',101],
[3,'BRUCE',102],
[4,'WANDA',102],
[5,'VICTOR',103],
[6,'HANK',105],
]

employee_schema = ["ID", "NAME", "DEPT_ID"]

df_emp =session.createDataFrame(employee_data, schema=employee_schema)


df_emp.show()

-----------------------------
|"ID" |"NAME" |"DEPT_ID" |
-----------------------------
|1 |TONY |101 |
|2 |STEVE |101 |
|3 |BRUCE |102 |
|4 |WANDA |102 |
|5 |VICTOR |103 |
|6 |HANK |105 |
-----------------------------
#// create dataframe with departments data
department_data = [
[101,'HR'],
[102,'SALES'],
[103,'IT'],
[104,'FINANCE'],
]

department_schema = ["DEPT_ID", "NAME"]

df_dept =session.createDataFrame(department_data, schema=department_schema)


df_dept.show()

-----------------------
|"DEPT_ID" |"NAME" |
-----------------------
|101 |HR |
|102 |SALES |
|103 |IT |
|104 |FINANCE |
-----------------------
3.1. Join DataFrames in Snowpark
The EMPLOYEES and DEPARTMENTS DataFrames can be joined using
the DataFrame.join method in Snowpark as shown below.
#// Joining two DataFrames

#// Method-1
df_emp.join(df_dept, df_emp.DEPT_ID == df_dept.DEPT_ID).show()

#// Method-2
df_emp.join(df_dept, df_emp["DEPT_ID"] == df_dept["DEPT_ID"]).show()

------------------------------------------------------------------------------
|"ID" |"l_vrun_NAME" |"l_vrun_DEPT_ID" |"r_jc4z_DEPT_ID" |"r_jc4z_NAME" |
------------------------------------------------------------------------------
|1 |TONY |101 |101 |HR |
|2 |STEVE |101 |101 |HR |
|3 |BRUCE |102 |102 |SALES |
|4 |WANDA |102 |102 |SALES |
|5 |VICTOR |103 |103 |IT |
------------------------------------------------------------------------------
3.2. Join DataFrames referring to a Single Column Name in Snowpark
The DataFrames can be joined by referring to a single column name if the name of the column is the
same in both the DataFrames.
The EMPLOYEES and DEPARTMENTS DataFrames can be joined by referring to a single
column DEPT_ID as shown below.
#// Joining two DataFrames referring to a single column
df_emp.join(df_dept, "DEPT_ID").show()

----------------------------------------------------
|"DEPT_ID" |"ID" |"l_9ml8_NAME" |"r_8dfz_NAME" |
----------------------------------------------------
|101 |1 |TONY |HR |
|101 |2 |STEVE |HR |
|102 |3 |BRUCE |SALES |
|102 |4 |WANDA |SALES |
|103 |5 |VICTOR |IT |
----------------------------------------------------
3.3. Rename Ambiguous Columns of Join operation Output in Snowpark
When two DataFrames are joined, the overlapping columns are given randomly prefixed names in
the resulting DataFrame, as seen in the above examples.
These randomly named columns can be renamed using Column.alias as shown below.
#// Renaming the ambiguous columns
df_emp.join(df_dept, df_emp.DEPT_ID == df_dept.DEPT_ID).\
select(df_emp.ID, df_emp.NAME.alias("EMP_NAME"), df_emp.DEPT_ID.alias("DEPT_ID"),
df_dept.NAME.alias("DEPT_NAME")).show()

-----------------------------------------------
|"ID" |"EMP_NAME" |"DEPT_ID" |"DEPT_NAME" |
-----------------------------------------------
|1 |TONY |101 |HR |
|2 |STEVE |101 |HR |
|3 |BRUCE |102 |SALES |
|4 |WANDA |102 |SALES |
|5 |VICTOR |103 |IT |
-----------------------------------------------
3.4. Rename Ambiguous Columns of Join operation Output using lsuffix and rsuffix
The randomly named overlapping columns can be renamed using lsuffix and rsuffix parameters
in DataFrame.join method.
 lsuffix – Suffix to add to the overlapping columns of the left DataFrame.
 rsuffix – Suffix to add to the overlapping columns of the right DataFrame.
#// Renaming the ambiguous columns using lsuffix and rsuffix
df_emp.join(df_dept, df_emp.DEPT_ID == df_dept.DEPT_ID, lsuffix="_EMP",
rsuffix="_DEPT").show()

--------------------------------------------------------------------
|"ID" |"NAME_EMP" |"DEPT_ID_EMP" |"DEPT_ID_DEPT" |"NAME_DEPT" |
--------------------------------------------------------------------
|1 |TONY |101 |101 |HR |
|2 |STEVE |101 |101 |HR |
|3 |BRUCE |102 |102 |SALES |
|4 |WANDA |102 |102 |SALES |
|5 |VICTOR |103 |103 |IT |
--------------------------------------------------------------------
It is recommended to use the lsuffix and rsuffix parameters within the DataFrame.join method when there are overlapping columns between the DataFrames.
3.5. Join DataFrames based on Multiple Conditions in Snowpark
DataFrames can be joined based on multiple conditions combined with the “&” operator as shown below.
DataFrame.join(right DataFrame, (<join_condition_1>) & (<join_condition_2>))
When the names of the columns are the same in both DataFrames, they can be joined by passing a list of column names as shown below.
DataFrame.join(right DataFrame, ["col_1", "col_2", ...])
The following is an example of joining EMPLOYEES and DEPARTMENTS based on multiple
conditions.
#// Joining two DataFrames based on Multiple conditions
df_emp.join(df_dept, (df_emp.DEPT_ID == df_dept.DEPT_ID) & (df_emp.ID <
df_dept.DEPT_ID), \
lsuffix="_EMP", rsuffix="_DEPT").show()
3.6. Join Types in Snowpark
By default, the DataFrame.join method applies an inner join to join the data between the two
DataFrames. The other supported join types can be specified to join the data between two DataFrames
as shown below.
#// Left Outer Join
df_emp.join(df_dept, df_emp.DEPT_ID == df_dept.DEPT_ID, lsuffix="_EMP", rsuffix="_DEPT",
join_type="left").show()
--------------------------------------------------------------------
|"ID" |"NAME_EMP" |"DEPT_ID_EMP" |"DEPT_ID_DEPT" |"NAME_DEPT" |
--------------------------------------------------------------------
|1 |TONY |101 |101 |HR |
|2 |STEVE |101 |101 |HR |
|3 |BRUCE |102 |102 |SALES |
|4 |WANDA |102 |102 |SALES |
|5 |VICTOR |103 |103 |IT |
|6 |HANK |105 |NULL |NULL |
--------------------------------------------------------------------
Instead of the “join_type” parameter, we can also use the “how” parameter to specify the join type.
#// Right Outer Join
df_emp.join(df_dept, df_emp.DEPT_ID == df_dept.DEPT_ID, lsuffix="_EMP", rsuffix="_DEPT",
how="right").show()

--------------------------------------------------------------------
|"ID" |"NAME_EMP" |"DEPT_ID_EMP" |"DEPT_ID_DEPT" |"NAME_DEPT" |
--------------------------------------------------------------------
|1 |TONY |101 |101 |HR |
|2 |STEVE |101 |101 |HR |
|3 |BRUCE |102 |102 |SALES |
|4 |WANDA |102 |102 |SALES |
|5 |VICTOR |103 |103 |IT |
|NULL |NULL |NULL |104 |FINANCE |
--------------------------------------------------------------------
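The other supported join types (full outer, semi, anti, cross) follow the same pattern. The snippet below is a brief sketch using the same DataFrames (output omitted):
#// Full Outer Join - keeps unmatched rows from both DataFrames
df_emp.join(df_dept, df_emp.DEPT_ID == df_dept.DEPT_ID, lsuffix="_EMP", rsuffix="_DEPT", how="full").show()

#// Left Anti Join - employees whose DEPT_ID has no matching department
df_emp.join(df_dept, df_emp.DEPT_ID == df_dept.DEPT_ID, how="anti").show()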
IN Operator in Snowflake Snowpark
February 13, 2024
1. Introduction
The IN operator in SQL allows you to specify multiple values in a WHERE clause to filter the data. It
serves as a shorthand for employing multiple OR conditions. Additionally, the IN operator can be
utilized with a subquery within the WHERE clause.
In this article, let us explore how the IN operator can be implemented with DataFrames in Snowflake
Snowpark.
2. IN Operator in Snowflake Snowpark
The Column.in_() method in Snowpark builds an expression that can be passed to the DataFrame.filter() method (equivalent to WHERE in SQL) to perform the equivalent of the IN operator in SQL.
The supported inputs to the Column.in_() method are a sequence of values or a DataFrame that represents a subquery.
3. Demonstration of IN operator in Snowpark
Consider the EMPLOYEES and DEPARTMENTS data below for the demonstration of the IN
operator in Snowpark.
#// create dataframe with employees data
employee_data = [
[1,'TONY',101],
[2,'STEVE',101],
[3,'BRUCE',102],
[4,'WANDA',102],
[5,'VICTOR',103],
[6,'HANK',105],
]

employee_schema = ["ID", "NAME", "DEPT_ID"]

df_emp =session.createDataFrame(employee_data, schema=employee_schema)


df_emp.show()

-----------------------------
|"ID" |"NAME" |"DEPT_ID" |
-----------------------------
|1 |TONY |101 |
|2 |STEVE |101 |
|3 |BRUCE |102 |
|4 |WANDA |102 |
|5 |VICTOR |103 |
|6 |HANK |105 |
-----------------------------
#// create dataframe with departments data
department_data = [
[101,'HR'],
[102,'SALES'],
[103,'IT'],
[104,'FINANCE'],
]

department_schema = ["DEPT_ID", "NAME"]

df_dept =session.createDataFrame(department_data, schema=department_schema)


df_dept.show()

-----------------------
|"DEPT_ID" |"NAME" |
-----------------------
|101 |HR |
|102 |SALES |
|103 |IT |
|104 |FINANCE |
-----------------------
3.1. Filtering Data from a Snowpark DataFrame using a Single Value
The following example extracts details of employees with ID=1 from the EMPLOYEES DataFrame.
df_emp.filter(col("ID")==1).show()

-----------------------------
|"ID" |"NAME" |"DEPT_ID" |
-----------------------------
|1 |TONY |101 |
-----------------------------
The above Snowpark code is equivalent to the following SQL query.
SELECT * FROM EMPLOYEES WHERE ID = 1;
3.2. Filtering Data from a Snowpark DataFrame using Multiple Values
The following example extracts details of employees with ID 1, 2, and 3 from
the EMPLOYEES DataFrame using Column.in_() method.
from snowflake.snowpark.functions import col

df_emp.filter(col("ID").in_(1,2,3)).show()

#// (or) //

df_emp.filter(df_emp.col("ID").in_(1,2,3)).show()

-----------------------------
|"ID" |"NAME" |"DEPT_ID" |
-----------------------------
|1 |TONY |101 |
|2 |STEVE |101 |
|3 |BRUCE |102 |
-----------------------------
The above Snowpark code is equivalent to the following SQL query.
SELECT * FROM EMPLOYEES WHERE ID IN (1,2,3);
4. Implementing SubQueries in Snowpark using Column.in_() method
A subquery, also known as an inner query or nested query, is a query nested within another SQL statement. The inner query is executed first, and its results are used by the outer query to further filter, join, or manipulate data.
Consider a scenario where we need to extract the details of all employees belonging to the SALES department. The same can be achieved with the SQL query below, which applies a filter condition on the SALES department in a subquery.
SELECT * FROM EMPLOYEES WHERE DEPT_ID IN (
SELECT DEPT_ID FROM DEPARTMENTS WHERE NAME = 'SALES');
Let us understand how the same can be implemented in Snowpark.
STEP-1: Extract the list of values to be passed to the IN operator into a DataFrame
In this scenario, we need to extract the DEPT_ID value of the SALES department from
the DEPARTMENTS DataFrame into a new DataFrame.
The following code applies a filter on the NAME field with the value ‘SALES’ and selects only the DEPT_ID field from the DEPARTMENTS DataFrame into a new DataFrame.
df_dept_SALES = df_dept.filter(col("NAME")=="SALES").select("DEPT_ID")
df_dept_SALES.show()
-------------
|"DEPT_ID" |
-------------
|102 |
-------------
STEP-2: Pass the DataFrame representing a Subquery as Input Parameter to the Column.in_() Method
The following code returns the details of all employees belonging to the SALES department by
passing the DataFrame that holds the SALES department ID value as input to
the Column.in_() method.
df_emp_SALES = df_emp.filter(col("DEPT_ID").in_(df_dept_SALES))
df_emp_SALES.show()

-----------------------------
|"ID" |"NAME" |"DEPT_ID" |
-----------------------------
|3 |BRUCE |102 |
|4 |WANDA |102 |
-----------------------------
5. Implementing the IN operator in the Snowpark SELECT clause
The expression built by the Column.in_() method can also be passed to a DataFrame.select() call. The expression returns a Boolean value and evaluates to true if the value in the column is one of the values in the specified sequence.
The following code returns the ID column from the EMPLOYEES DataFrame along with a new
column that returns true if the ID value is present in one of the values passed to the IN operator.
#// IN operator in SELECT clause
df_emp.select(col("ID"), col("ID").in_(1,2,3).alias("IS_EXISTS")).show()

----------------------
|"ID" |"IS_EXISTS" |
----------------------
|1 |True |
|2 |True |
|3 |True |
|4 |False |
|5 |False |
|6 |False |
----------------------
The above Snowpark code is equivalent to the following SQL query.
SELECT ID, ID IN (1,2,3) AS IS_EXISTS FROM EMPLOYEES;
6. NOT IN Operator in Snowflake Snowpark
In SQL, using the NOT keyword preceding the IN operator retrieves all records that do not match any
of the values in the list. For example, the following SQL statement returns all employee records
whose ID is not 1, 2, or 3.
SELECT * FROM EMPLOYEES WHERE ID NOT IN (1,2,3);
There is no in-built method available in Snowpark that performs the same action as the NOT IN operator in SQL.
To implement the NOT IN operator in Snowpark, we still utilize the Column.in_() method. However, it’s essential to ensure that the DataFrame passed as an input parameter to the method contains a list of values other than those in the specified list.
Goal: retrieve all employee records whose ID is not 1, 2, or 3.
STEP-1: Use the IN Operator in the SELECT Clause to Identify Values Not Present in the Specified
List
The following code returns ‘True’ for IDs that are present in the list passed to
the Column.in_() method, and ‘False’ if they are not present.
df1 = df_emp.select(col("ID"), col("ID").in_(1,2,3).alias("IS_EXISTS"))
df1.show()

----------------------
|"ID" |"IS_EXISTS" |
----------------------
|1 |True |
|2 |True |
|3 |True |
|4 |False |
|5 |False |
|6 |False |
----------------------
STEP-2: Filter Values Not Present in the Specified List
The following code retrieves all the IDs from the EMPLOYEES DataFrame, excluding those that are
present in the specified list, by filtering out the records that returned ‘False’ in the previous step.
df2 = df1.filter(col("IS_EXISTS")=='False').select("ID")
df2.show()

--------
|"ID" |
--------
|4 |
|5 |
|6 |
--------
STEP-3: Filter DataFrame by Passing a List of Values Not Present in the Specified List
The following code retrieves all employee records whose ID is not 1, 2 or 3
from EMPLOYEES DataFrame by passing a DataFrame that holds all the employee IDs except for
1, 2, and 3.
df3 = df_emp.filter(col("ID").in_(df2))
df3.show()

-----------------------------
|"ID" |"NAME" |"DEPT_ID" |
-----------------------------
|4 |WANDA |102 |
|5 |VICTOR |103 |
|6 |HANK |105 |
-----------------------------
All the above mentioned steps are equivalent to the following SQL query.
SELECT * FROM EMPLOYEES WHERE ID IN (
SELECT ID FROM(
SELECT ID, ID IN (1,2,3) IS_EXISTS FROM EMPLOYEES)
WHERE IS_EXISTS = 'False');
The process to identify the values that are not present in the required list may vary depending on the
specific scenario. However, the overall approach is to identify the values absent from the list specified
in the NOT IN condition and leverage them to filter the records.
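As a side note, and not part of the step-by-step approach above: Snowpark Column expressions support the unary negation operator (~), so the same NOT IN result can often be obtained more directly, assuming the operator behaves as documented. A minimal sketch:
#// Negating the IN expression with the ~ operator
df_emp.filter(~col("ID").in_(1,2,3)).show()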
GROUP BY in Snowflake Snowpark
February 2, 2024
1. Introduction
The GROUP BY clause in SQL is utilized in conjunction with the SELECT statement to aggregate
data from multiple records and organize the results based on one or more columns. The GROUP BY
clause returns a single row for each group.
In this article, let us explore how to implement the GROUP BY clause on rows of a DataFrame in
Snowflake Snowpark.
2. GROUP BY clause in Snowpark
The DataFrame.group_by method in Snowpark is similar to the GROUP BY clause in Snowflake and helps in grouping rows based on specified columns.
Syntax:
One or multiple columns can be passed as inputs to the group_by method as shown below.
DataFrame.group_by("col_1", "col_2",…)
A List of column names can be passed as inputs to the group_by method as shown below.
DataFrame.group_by(["col_1", "col_2",…])
Return Value:
The DataFrame.group_by method returns a RelationalGroupedDataFrame as an output.
A RelationalGroupedDataFrame is a representation of an underlying DataFrame where rows are
organized into groups based on common values. Aggregations can then be defined on top of this
grouped data.
>>> df = df_employee.group_by("DEPT_ID")
>>> type(df)

<class 'snowflake.snowpark.relational_grouped_dataframe.RelationalGroupedDataFrame'>
Unlike a regular DataFrame, only a limited set of methods are supported on a
RelationalGroupedDataFrame. To know the full list of methods supported on a
RelationalGroupedDataFrame, refer to the Snowflake Documentation.
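For reference, the following is a minimal sketch (using the df_employee DataFrame created in the next section) of a couple of the aggregation shortcuts commonly available on a RelationalGroupedDataFrame in addition to agg:
#// Row count per group
df_employee.group_by("DEPT_ID").count().show()

#// Average salary per group
df_employee.group_by("DEPT_ID").avg("SALARY").show()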
3. Demonstration of GROUP BY clause in Snowpark
Consider the EMPLOYEE data below for the demonstration of the implementation of the GROUP
BY in Snowpark.
#// create dataframe with employee data
employee_data = [
[1,'TONY',24000,101],
[2,'STEVE',17000,101],
[3,'BRUCE',9000,101],
[4,'WANDA',20000,102],
[5,'VICTOR',12000,102],
[6,'STEPHEN',10000,103],
[7,'HANK',15000,103],
[8,'THOR',21000,103]
]

employee_schema = ["EMP_ID", "EMP_NAME", "SALARY", "DEPT_ID"]

df_employee =session.createDataFrame(employee_data, schema=employee_schema)


df_employee.show()

------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |24000 |101 |
|2 |STEVE |17000 |101 |
|3 |BRUCE |9000 |101 |
|4 |WANDA |20000 |102 |
|5 |VICTOR |12000 |102 |
|6 |STEPHEN |10000 |103 |
|7 |HANK |15000 |103 |
|8 |THOR |21000 |103 |
------------------------------------------------
3.1. Find the Number of Employees in each Department
The following is the SQL Query that calculates the number of employees in each department.
SELECT DEPT_ID, COUNT(EMP_ID)
FROM EMPLOYEES
GROUP BY DEPT_ID;
The same can be achieved in Snowpark using the DataFrame.group_by method as shown below
>>> from snowflake.snowpark.functions import count
>>> df_employee.group_by("DEPT_ID").agg(count("EMP_ID")).show()

-------------------------------
|"DEPT_ID" |"COUNT(EMP_ID)" |
-------------------------------
|101 |3 |
|102 |2 |
|103 |3 |
-------------------------------
3.2. Find the MAX and MIN Salary of employees in each Department
The following is the SQL Query that calculates the MAX and MIN salary of employees in each
department.
SELECT DEPT_ID,
MAX(SALARY) MAX_SALARY, MIN(SALARY) MIN_SALARY
FROM EMPLOYEES
GROUP BY DEPT_ID
;
The same can be achieved in Snowpark using the DataFrame.group_by method as shown below.
>>> from snowflake.snowpark.functions import max, min
>>> df_employee.group_by("DEPT_ID").agg(max("SALARY"), min("SALARY")).show()

---------------------------------------------
|"DEPT_ID" |"MAX(SALARY)" |"MIN(SALARY)" |
---------------------------------------------
|101 |24000 |9000 |
|102 |20000 |12000 |
|103 |21000 |10000 |
---------------------------------------------
Note that we are employing Aggregate Functions using the DataFrame.agg method in conjunction
with the DataFrame.group_by method to achieve the solution.
Aliases can be added to the aggregate fields using the Column.alias method to return renamed column names, as shown below.
>>> df_employee.group_by("DEPT_ID").agg(max("SALARY").alias("MAX_SALARY"),
min("SALARY").alias("MIN_SALARY")).show()

-------------------------------------------
|"DEPT_ID" |"MAX_SALARY" |"MIN_SALARY" |
-------------------------------------------
|101 |24000 |9000 |
|102 |20000 |12000 |
|103 |21000 |10000 |
-------------------------------------------
4. HAVING Clause in Snowflake Snowpark
The HAVING clause in SQL is used in conjunction with the GROUP BY clause to filter the results
of a query based on aggregated values. Unlike the WHERE clause, which filters individual rows
before they are grouped, the HAVING clause filters the result set after the grouping and aggregation
process.
In Snowpark, there is no equivalent method that provides the functionality of the HAVING clause in
SQL. Instead, we can use DataFrame.filter method which filters rows of a DataFrame based on the
specified conditional expression (similar to WHERE in SQL).
Let us understand with an example.
4.1. Find the Departments with more than two employees
The following is the SQL Query that returns the departments with more than two employees.
SELECT DEPT_ID
FROM EMPLOYEES
GROUP BY DEPT_ID
HAVING COUNT(EMP_ID)>2;
Follow the below steps to return the departments with more than two employees in Snowpark.
STEP-1: Find the Number of Employees in each Department
using DataFrame.group_by and DataFrame.agg methods as shown below.
>>> df1 = df_employee.group_by("DEPT_ID").agg(count("EMP_ID").alias("EMP_COUNT"))
>>> df1.show()
---------------------------
|"DEPT_ID" |"EMP_COUNT" |
---------------------------
|101 |3 |
|102 |2 |
|103 |3 |
---------------------------
STEP-2: Filter the records with employee count >2 using the DataFrame.filter method as shown
below.
>>> df2 = df1.filter(col("EMP_COUNT") > 2)
>>> df2.show()

---------------------------
|"DEPT_ID" |"EMP_COUNT" |
---------------------------
|101 |3 |
|103 |3 |
---------------------------
STEP-3: Select only the Department ID field using the DataFrame.select method as shown below.
>>> df3 = df2.select("DEPT_ID")
>>> df3.show()

-------------
|"DEPT_ID" |
-------------
|101 |
|103 |
-------------
All these steps can be combined into a single command, as shown below.
>>> df_employee.group_by("DEPT_ID").agg(count("EMP_ID").alias("EMP_COUNT")).\
filter(col("EMP_COUNT")>2).select("DEPT_ID").show()
-------------
|"DEPT_ID" |
-------------
|101 |
|103 |
-------------
This is equivalent to the SQL query below, where an outer query is employed to filter the departments
using the WHERE clause as shown below.
SELECT DEPT_ID FROM(
SELECT DEPT_ID, COUNT(EMP_ID) EMP_COUNT
FROM EMPLOYEES
GROUP BY DEPT_ID)
WHERE EMP_COUNT>2;
Aggregate Functions in Snowflake Snowpark
January 27, 2024
1. Introduction
Aggregate functions perform a calculation on a set of values and return a single value. These
functions are often used in conjunction with the GROUP BY clause to perform calculations on
groups of rows.
To know the list of all the supported aggregate functions in Snowflake, refer to Snowflake
Documentation.
In this article, we will explore how to use aggregate functions in Snowflake Snowpark Python.
2. Aggregate Functions in Snowpark
The DataFrame.agg method in Snowpark is used to aggregate the data in a DataFrame. This method
accepts any valid Snowflake aggregate function names as input to perform calculations on multiple
rows and produce a single output.
There are several ways the DataFrame columns can be passed to DataFrame.agg method to perform
aggregate calculations.
1. A Column object
2. A tuple where the first element is a column object or a column name and the second
element is the name of the aggregate function
3. A list of the above
4. A dictionary that maps column name to an aggregate function name.
3. Demonstration of Aggregate Functions using DataFrame.agg Method in Snowpark
Follow the below steps to perform Aggregate Calculations using DataFrame.agg Method.
• STEP-1: Establish a connection with Snowflake from Snowpark using the Session class.
• STEP-2: Import all the required aggregate functions (min, max, sum, etc.) from the snowflake.snowpark.functions package.
• STEP-3: Create a DataFrame that holds the data on which aggregate functions are to be applied.
• STEP-4: Implement aggregate calculations on the DataFrame using the DataFrame.agg method.
Demonstration
Consider the EMPLOYEE data below for the demonstration of the implementation of the Aggregate
functions in Snowpark.
#// Creating a DataFrame with EMPLOYEE data
employee_data = [
[1,'TONY',24000],
[2,'STEVE',17000],
[3,'BRUCE',9000],
[4,'WANDA',20000],
[5,'VICTOR',12000],
[6,'STEPHEN',10000]
]

employee_schema = ["EMP_ID", "EMP_NAME", "SALARY"]

df_employee =session.createDataFrame(employee_data, schema=employee_schema)


df_employee.show()

------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |
------------------------------------
|1 |TONY |24000 |
|2 |STEVE |17000 |
|3 |BRUCE |9000 |
|4 |WANDA |20000 |
|5 |VICTOR |12000 |
|6 |STEPHEN |10000 |
------------------------------------
3.1. Passing a DataFrame Column Object
Import all the necessary aggregate function methods from
the snowflake.snowpark.functions package before performing aggregate calculations as shown
below.
#// Importing the Aggregate Function methods
from snowflake.snowpark.functions import col, min, max, avg

#// Passing a Column object to DataFrame.agg method


df_employee.agg(max("SALARY"), min("SALARY")).show()
---------------------------------
|"MAX(SALARY)" |"MIN(SALARY)" |
---------------------------------
|24000 |9000 |
---------------------------------

df_employee.agg(max(col("SALARY")), min(col("SALARY"))).show()
---------------------------------
|"MAX(SALARY)" |"MIN(SALARY)" |
---------------------------------
|24000 |9000 |
---------------------------------
3.2. Passing a Tuple with Column Name and Aggregate Function
#// Passing a tuple with column name and aggregate function to DataFrame.agg method
df_employee.agg(("SALARY", "max"), ("SALARY", "min")).show()
---------------------------------
|"MAX(SALARY)" |"MIN(SALARY)" |
---------------------------------
|24000 |9000 |
---------------------------------
3.3. Passing a List of Column Objects and Tuple
#// Passing a list of the values
df_employee.agg([("SALARY", "min"), ("SALARY", "max"), avg(col("SALARY"))]).show()
-------------------------------------------------
|"MIN(SALARY)" |"MAX(SALARY)" |"AVG(SALARY)" |
-------------------------------------------------
|9000 |24000 |15333.333333 |
-------------------------------------------------
3.4. Passing a dictionary Mapping Column Name to Aggregate Function
#// Passing a dictionary mapping column name to aggregate function
df_employee.agg({"SALARY": "min"}).show()
-----------------
|"MIN(SALARY)" |
-----------------
|9000 |
-----------------
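A dictionary with more than one entry can also be passed, with the limitation that a dictionary allows only one aggregate function per column. The following is a small sketch (output omitted):
#// One aggregate function per column via a dictionary
df_employee.agg({"SALARY": "max", "EMP_ID": "count"}).show()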
4. Aggregate Functions using DataFrame.select method in Snowpark
The DataFrame.select method can be used to return a new DataFrame with the specified Column
expressions as output. Aggregate functions can be utilized as column expressions to select and process
data from a DataFrame.
#// Aggregate functions using select method
df_employee.select(min("SALARY"), max("SALARY")).show()
-----------------------------------------
|"MIN(""SALARY"")" |"MAX(""SALARY"")" |
-----------------------------------------
|9000 |24000 |
-----------------------------------------
5. Renaming the Return Aggregate Fields
The output fields from the Aggregate Functions can be renamed to new column names using the Column.as_ or Column.alias methods as shown below.
#// Renaming column names
df_employee.agg(min("SALARY").as_("min_sal"), max("SALARY").alias("max_sal")).show()
-------------------------
|"MIN_SAL" |"MAX_SAL" |
-------------------------
|9000 |24000 |
-------------------------

df_employee.select(min("SALARY").as_("MIN_SAL"),
max("SALARY").alias("MAX_SAL")).show()
-------------------------
|"MIN_SAL" |"MAX_SAL" |
-------------------------
|9000 |24000 |
-------------------------
6. Passing Return Value of an Aggregate Function as an Input
Let us understand this with a simple example. Consider a requirement to get the employee details with the maximum salary. This can be accomplished using the below SQL query.
-- Get employee details with MAX Salary
SELECT * FROM EMPLOYEES WHERE SALARY IN(
SELECT MAX(SALARY) FROM EMPLOYEES) ;

In the above example,
• The max salary amount in the table is calculated using the aggregate function.
• The calculated salary amount is passed as a filter to the employees table to extract the entire employee details.
Let us understand how the same can be achieved in Snowpark.
The DataFrame.collect method in Snowpark executes all the defined calculations on a DataFrame and collects the return values. The output is stored in the form of a list of Row objects.
In the following code, the max salary is calculated using the DataFrame.agg method and the return
value is stored into a variable using DataFrame.collect method.
max_sal = df_employee.agg(max("SALARY").alias("MAX_SALARY")).collect()
The following code shows that the variable max_sal is of type list and displays the value stored in it.
type(max_sal)
-----------------
|<class 'list'> |
-----------------

print(max_sal)
----------------------------
|[Row(MAX_SALARY=24000)] |
----------------------------
Extract the max salary amount from the list object as shown below.
max_sal = max_sal[0]['MAX_SALARY']

type(max_sal)
-----------------
|<class 'int'> |
-----------------

print(max_sal)
----------
|24000 |
----------
The DataFrame.filter method filters rows from a DataFrame based on the specified conditional
expression (similar to WHERE in SQL).
The following code extracts the employee details with max salary.
#// Get employee details with max salary
df_employee.filter(col("SALARY") == max_sal).show()
------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |
------------------------------------
|1 |TONY |24000 |
------------------------------------
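The above steps can also be condensed into a couple of lines. The following is an equivalent sketch, assuming the same imports and the df_employee DataFrame defined earlier:
#// Get employee details with max salary in a single pass
max_sal_value = df_employee.agg(max("SALARY")).collect()[0][0]
df_employee.filter(col("SALARY") == max_sal_value).show()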
HOW TO: Create and Read Data from Snowflake Snowpark DataFrames?
January 10, 2024
1. Introduction
Snowpark is a developer framework from Snowflake that allows developers to interact with
Snowflake directly and build complex data pipelines. In our previous article, we discussed what
Snowflake Snowpark is and how to set up a Python development environment for Snowpark.
In Snowpark, the primary method for querying and processing data is through a DataFrame. In this
article, we will explore what DataFrames are and guide you through the process of creating them in
Snowpark.
2. What is a DataFrame?
A DataFrame in Snowpark acts like a virtual table that organizes data in a structured manner.
Think of it as a way to express a SQL query, but in a different language. It operates lazily,
meaning it doesn’t process the data until you instruct it to perform a specific task, such as
retrieving or analyzing information.
The Snowpark API finally converts the DataFrames into SQL to execute your code in Snowflake.
3. Pre-requisites to create a DataFrame in Snowpark
To construct a DataFrame, you have to make use of the Session class in Snowpark, which establishes a connection with a Snowflake database and provides methods for creating DataFrames and accessing objects.
When you create a Session object, you provide connection parameters to establish a connection with a Snowflake database as shown below.
import snowflake.snowpark as snowpark
from snowflake.snowpark import Session

connection_parameters = {
"account": "snowflake account",
"user": "snowflake username",
"password": "snowflake password",
"role": "snowflake role", # optional
"warehouse": "snowflake warehouse", # optional
"database": "snowflake database", # optional
"schema": "snowflake schema" # optional
}

session = Session.builder.configs(connection_parameters).create()
To create DataFrames in a Snowsight Python worksheet, construct them within the handler function
(main) and utilize the Session object (session) passed into the function.
def main(session: snowpark.Session):
# your code goes here
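For illustration, a minimal handler might look like the sketch below (the sample data is hypothetical; returning the DataFrame lets Snowsight display its results):
def main(session: snowpark.Session):
    # build a DataFrame using the session passed into the handler
    df = session.createDataFrame([[1, 'SNOW'], [2, 'PARK']], schema=["ID", "NAME"])
    df.show()
    # the returned DataFrame is rendered as the worksheet output
    return df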
4. How to create a DataFrame in Snowpark?
The createDataFrame method of Session class in Snowpark creates a new DataFrame containing the
specified values from the local data.
Syntax:
The following is the syntax to create a DataFrame using createDataFrame method.
session.createDataFrame(data[, schema])
The accepted values for data in the createDataFrame method are List, Tuple or a Pandas DataFrame.
• Lists are used to store multiple items in a single variable and are created using square brackets.
  ex: myList = ["one", "two", "three"]
• Tuples are used to store multiple items in a single variable and are created using round brackets. The contents of a tuple cannot change once they have been created in Python.
  ex: myTuple = ("one", "two", "three")
• Pandas is a Python library used for working with data sets. Pandas allows the creation of DataFrames natively in Python.
The schema in the createDataFrame method can be a StructType containing names and data types of
columns, or just a list of column names, or None.
5. How to Read data from a Snowpark DataFrame?
Data from a Snowpark DataFrame can be retrieved by utilizing the show method.
Syntax:
The following is the syntax to read data from a Snowpark DataFrame using show method.
DataFrame.show([n, max_width])
Parameters:
• n – The number of rows to print out. The default value is 10.
• max_width – The maximum number of characters to print out for each column (see the short sketch below).
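For example (a brief sketch assuming a DataFrame named df already exists):
#// Print the first 5 rows, allowing up to 100 characters per column
df.show(5, max_width=100)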
6. How to create a DataFrame in Snowpark with a List of Specified Values?
Example-1:
The following is an example of creating a DataFrame with a list of values and assigning the column
name as “a”.
df1 = session.createDataFrame([1,2,3,4], schema=["a"])
df1.show()

------
|"A" |
------
|1 |
|2 |
|3 |
|4 |
------
The DataFrame df1 when executed is translated and executed as SQL in Snowflake by Snowpark API
as shown below.
SELECT "A" FROM (
SELECT $1 AS "A"
FROM VALUES (1::INT), (2::INT), (3::INT), (4::INT)
) LIMIT 10
Example-2:
The following is an example of creating a DataFrame with multiple lists of values and assigning the
column names as “a”,”b”, “c” and “d”.
df2 = session.createDataFrame([[1,2,3,4],[5,6,7,8]], schema=["a","b","c","d"])
df2.show()

--------------------------
|"A" |"B" |"C" |"D" |
--------------------------
|1 |2 |3 |4 |
|5 |6 |7 |8 |
--------------------------
The DataFrame df2 when executed is translated and executed as SQL in Snowflake by Snowpark API
as shown below.
SELECT "A", "B", "C", "D" FROM (
SELECT $1 AS "A", $2 AS "B", $3 AS "C", $4 AS "D"
FROM VALUES
(1::INT, 2::INT, 3::INT, 4::INT),
(5::INT, 6::INT, 7::INT, 8::INT)
) LIMIT 10
7. How to create a DataFrame in Snowpark with a List of Specified Values and Schema?
When schema parameter in the createDataFrame method is passed as a list of column names or
None, the schema of the DataFrame will be inferred from the data across all rows.
Example-3:
The following is an example of creating a DataFrame with multiple lists of values with different data
types and assigning the column names as “a”,”b”, “c” and “d”.
df3 = session.createDataFrame([[1, 2, 'Snow', '2024-01-01'],[3, 4, 'Park', '2024-01-02']],
schema=["a","b","c","d"])
df3.show()

----------------------------------
|"A" |"B" |"C" |"D" |
----------------------------------
|1 |2 |Snow |2024-01-01 |
|3 |4 |Park |2024-01-02 |
----------------------------------
The DataFrame df3 when executed is translated and executed as SQL in Snowflake by Snowpark API
as shown below.
SELECT "A", "B", "C", "D" FROM (
SELECT $1 AS "A", $2 AS "B", $3 AS "C", $4 AS "D"
FROM VALUES
(1::INT, 2::INT, 'Snow'::STRING, '2024-01-01'::STRING),
(3::INT, 4::INT, 'Park'::STRING, '2024-01-02'::STRING)
) LIMIT 10
Note that in the above query, since we did not explicitly specify the data types of the columns
during definition, the values ‘2024-01-01’ and ‘2024-01-02’, despite being of “Date” data type,
are identified as “String” data type.
Example-4:
Create a custom schema parameter of StructType containing names and data types of columns and
pass it to the createDataFrame method as shown below.
#//create dataframe with schema
from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType, DateType

my_schema = StructType(
[StructField("a", IntegerType()),
StructField("b", IntegerType()),
StructField("c", StringType()),
StructField("d", DateType())]
)

df4 = session.createDataFrame([[1, 2, 'Snow', '2024-01-01'],[3, 4, 'Park', '2024-01-02']], schema=my_schema)
df4.show()

------------------------------------------
|"A" |"B" |"C" |"D" |
------------------------------------------
|1 |2 |Snow |2024-01-01 00:00:00 |
|3 |4 |Park |2024-01-02 00:00:00 |
------------------------------------------
The DataFrame df4 when executed is translated and executed as SQL in Snowflake by Snowpark API
referencing to the columns with the defined data types as shown below.
SELECT
"A", "B", "C",
to_date("D") AS "D"
FROM (
SELECT $1 AS "A", $2 AS "B", $3 AS "C", $4 AS "D"
FROM VALUES
(1::INT, 2::INT, 'Snow'::STRING, '2024-01-01'::STRING),
(3::INT, 4::INT, 'Park'::STRING, '2024-01-02'::STRING)
) LIMIT 10
Note that in the above query, the column “D” is read as Date data type in Snowflake.
8. How to create a DataFrame in Snowpark using Pandas?
A Pandas DataFrame can be passed as “data” to create a DataFrame in Snowpark.
Example-5:
The following is an example of creating a Snowpark DataFrame using pandas DataFrame.
import pandas as pd

df_pandas = session.createDataFrame(pd.DataFrame([1,2,3],columns=["a"]))
df_pandas.show()

------
|"a" |
------
|1 |
|2 |
|3 |
------
Unlike DataFrames created with Lists or Tuples using the ‘createDataFrame‘ method, when a
DataFrame is created using a pandas DataFrame, the Snowpark API creates a temporary table
and imports the data from the pandas DataFrame into it. When extracting data from the
Snowpark DataFrame created using the pandas DataFrame, the data is retrieved by querying
the temporary table.
The DataFrame df_pandas when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
SELECT * FROM
"SNOWPARK_DEMO_DB"."SNOWPARK_DEMO_SCHEMA"."SNOWPARK_TEMP_TABLE_9RSV8KITUO" LIMIT 10
9. How to create a DataFrame in Snowpark from a range of numbers?
A DataFrame from a range of numbers can be created using the range method of the Session class in Snowpark. The resulting DataFrame has a single column named “ID” containing elements in a range from start to end.
Syntax:
The following is the syntax to create a DataFrame using range method.
session.range(start[, end, step])
Parameters:
• start – The start value of the range. If end is not specified, start will be used as the value of end.
• end – The end value of the range.
• step – The step or interval between numbers.
Example-6:
The following is an example of creating a DataFrame with a range of numbers from 1 to 9.
df_range = session.range(1,10).to_df("a")
df_range.show()

-------
|"A" |
-------
|1 |
|2 |
|3 |
|4 |
|5 |
|6 |
|7 |
|8 |
|9 |
-------
The DataFrame df_range when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
SELECT * FROM (
SELECT ( ROW_NUMBER() OVER ( ORDER BY SEQ8() ) - 1 ) * (1) + (1) AS id
FROM ( TABLE (GENERATOR(ROWCOUNT => 9)))
) LIMIT 10
Example-7:
The following is an example of creating a DataFrame with a range of numbers from 1 to 9 with a step
value of 2 and returning the output column renamed as “A”.
df_range2 = session.range(1,10,2).to_df("a")
df_range2.show()

-------
|"A" |
-------
|1 |
|3 |
|5 |
|7 |
|9 |
-------
The DataFrame df_range2 when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
SELECT "ID" AS "A" FROM (
SELECT ( ROW_NUMBER() OVER ( ORDER BY SEQ8() ) - 1 ) * (2) + (1) AS id
FROM ( TABLE (GENERATOR(ROWCOUNT => 5)))
) LIMIT 10
10. How to create a DataFrame in Snowpark from a Database Table?
The sql and table methods of Session class in Snowpark can be used to create a DataFrame from a
Database Table.
Example-8:
The following is an example of creating a DataFrame from a database table by executing a SQL query
using sql method of Session class in Snowpark.
df_sql = session.sql("SELECT * FROM SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.MONTHLY_REVENUE")
df_sql.show(5)

----------------------------------
|"YEAR" |"MONTH" |"REVENUE" |
----------------------------------
|2012 |5 |3264300.11 |
|2012 |6 |3208482.33 |
|2012 |7 |3311966.98 |
|2012 |8 |3311752.81 |
|2012 |9 |3208563.06 |
----------------------------------
The DataFrame df_sql when executed is translated and executed as SQL in Snowflake by Snowpark
API as shown below.
SELECT * FROM (SELECT * FROM
SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.MONTHLY_REVENUE) LIMIT 5
Example-9:
The following is an example of creating a DataFrame from a database table using the table method of the Session class in Snowpark.
df_table = session.table("MONTHLY_REVENUE")
df_table.show(5)

----------------------------------
|"YEAR" |"MONTH" |"REVENUE" |
----------------------------------
|2012 |5 |3264300.11 |
|2012 |6 |3208482.33 |
|2012 |7 |3311966.98 |
|2012 |8 |3311752.81 |
|2012 |9 |3208563.06 |
----------------------------------
The DataFrame df_table when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
SELECT * FROM MONTHLY_REVENUE LIMIT 5
11. How to create a DataFrame in Snowpark by reading files from a stage?
DataFrameReader class in Snowpark provides methods for loading data from a Snowflake stage to a
DataFrame with format-specific options. To use it:
1. Create a DataFrameReader object through Session.read method.
2. For CSV file format, create a custom schema parameter of StructType containing
names and data types of columns.
3. Set the file format specific properties such as delimiter using options() method.
4. Specify the file path and stage details by calling the method corresponding to the
CSV format, csv().
Example-10:
The following is an example of creating a DataFrame in Snowpark by reading CSV files from S3
stage.
from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType

schema = StructType(
[StructField("EMPLOYEE_ID", IntegerType()),
StructField("FIRST_NAME", StringType()),
StructField("LAST_NAME", StringType()),
StructField("EMAIL", StringType())
])

df_s3_employee = session.read.schema(schema).options({"field_delimiter": ",", "skip_header": 1}).csv('@my_s3_stage/Inbox/')
df_s3_employee.show(5)

--------------------------------------------------------------
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL |
--------------------------------------------------------------
| 204384 | Steven | King | SKING@test.com |
| 204388 | Neena | Kochhar | NKOCHHAR@test.com |
| 204392 | Lex | De Haan | LDEHAAN@test.com |
| 204393 | Alexander | Hunold | AHUNOLD@test.com |
| 204394 | Bruce | Ernst | BERNST@test.com |
--------------------------------------------------------------
The DataFrame df_s3_employee when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
1. A temporary file format is created using the properties specified in
the options() method.
2. The stage files are queried using the file format created in the first step and the
columns are cast into the data types specified in the schema defined
3. The file format created in the first step is dropped.
--create a temporary file format
CREATE SCOPED TEMPORARY FILE FORMAT If NOT EXISTS
"SNOWPARK_DEMO_DB"."SNOWPARK_DEMO_SCHEMA".SNOWPARK_TEMP_FILE_FOR
MAT_Y00K7HK598
TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1

--select data from stage files using the temporary file format
SELECT * FROM (
SELECT
$1::INT AS "EMPLOYEE_ID",
$2::STRING AS "FIRST_NAME",
$3::STRING AS "LAST_NAME",
$4::STRING AS "EMAIL"
FROM @my_s3_stage/Inbox/( FILE_FORMAT =>
'"SNOWPARK_DEMO_DB"."SNOWPARK_DEMO_SCHEMA".SNOWPARK_TEMP_FILE_FOR
MAT_Y00K7HK598')
) LIMIT 5
--drop the temporary file format
DROP FILE FORMAT If EXISTS
"SNOWPARK_DEMO_DB"."SNOWPARK_DEMO_SCHEMA".SNOWPARK_TEMP_FILE_FOR
MAT_Y00K7HK598
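As a closing note, and as an assumption-laden sketch rather than part of the original example: the DataFrameReader also exposes methods for other file formats, such as json, which load semi-structured data into a single VARIANT column. The stage path below is hypothetical.
#// Reading JSON files from a stage into a DataFrame (data lands in a single VARIANT column)
df_s3_json = session.read.json('@my_s3_stage/Inbox/json/')
df_s3_json.show(5)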
HOW TO: Create and Read Data from Snowflake Snowpark DataFrames?
January 10, 2024
Spread the love
Contents hide
1. Introduction
2. What is a DataFrame?
3. Pre-requisites to create a DataFrame in Snowpark
4. How to create a DataFrame in Snowpark?
5. How to Read data from a Snowpark DataFrame?
6. How to create a DataFrame in Snowpark with a List of Specified Values?
7. How to create a DataFrame in Snowpark with a List of Specified Values and Schema?
8. How to create a DataFrame in Snowpark using Pandas?
9. How to create a DataFrame in Snowpark from a range of numbers?
10. How to create a DataFrame in Snowpark from a Database Table?
11. How to create a DataFrame in Snowpark by reading files from a stage?
1. Introduction
Snowpark is a developer framework from Snowflake that allows developers to interact with
Snowflake directly and build complex data pipelines. In our previous article, we discussed what
Snowflake Snowpark is and how to set up a Python development environment for Snowpark.
In Snowpark, the primary method for querying and processing data is through a DataFrame. In this
article, we will explore what DataFrames are and guide you through the process of creating them in
Snowpark.
2. What is a DataFrame?
A DataFrame in Snowpark acts like a virtual table that organizes data in a structured manner.
Think of it as a way to express a SQL query, but in a different language. It operates lazily,
meaning it doesn’t process the data until you instruct it to perform a specific task, such as
retrieving or analyzing information.
The Snowpark API ultimately converts DataFrame operations into SQL that is executed in Snowflake.
3. Pre-requisites to create a DataFrame in Snowpark
To construct a DataFrame, use the Session class in Snowpark, which establishes a connection with a Snowflake database and provides methods for creating DataFrames and accessing objects.
When you create a Session object, you provide connection parameters to establish a connection with a Snowflake database, as shown below.
import snowflake.snowpark as snowpark
from snowflake.snowpark import Session

connection_parameters = {
"account": "snowflake account",
"user": "snowflake username",
"password": "snowflake password",
"role": "snowflake role", # optional
"warehouse": "snowflake warehouse", # optional
"database": "snowflake database", # optional
"schema": "snowflake schema" # optional
}

session = Session.builder.configs(connection_parameters).create()
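Once the session is created, a quick way to confirm the connection context is to run a small query against it. The snippet below is a minimal sketch using standard Snowflake context functions.
# Confirm which role, warehouse, database and schema the session is using
print(session.sql("select current_role(), current_warehouse(), current_database(), current_schema()").collect())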
To create DataFrames in a Snowsight Python worksheet, construct them within the handler function
(main) and utilize the Session object (session) passed into the function.
def main(session: snowpark.Session):
    # your code goes here
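As a minimal sketch, a complete worksheet handler might look like the following; it assumes a table such as MONTHLY_REVENUE exists in the worksheet's database and schema. Returning the DataFrame displays its results in the worksheet output.
import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    # Build a DataFrame on an existing table using the session passed in by Snowsight
    df = session.table("MONTHLY_REVENUE")
    # Returning the DataFrame displays its rows in the worksheet results pane
    return df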
4. How to create a DataFrame in Snowpark?
The createDataFrame method of Session class in Snowpark creates a new DataFrame containing the
specified values from the local data.
Syntax:
The following is the syntax to create a DataFrame using createDataFrame method.
session.createDataFrame(data[, schema])
The accepted values for data in the createDataFrame method are List, Tuple or a Pandas
DataFrame.
- Lists are used to store multiple items in a single variable and are created using square brackets.
  ex: myList = ["one", "two", "three"]
- Tuples are used to store multiple items in a single variable and are created using round brackets. The contents of a tuple cannot change once they have been created in Python.
  ex: myTuple = ("one", "two", "three")
- Pandas is a Python library used for working with data sets. Pandas allows the creation of DataFrames natively in Python.
The schema in the createDataFrame method can be a StructType containing names and data types of
columns, or just a list of column names, or None.
5. How to Read data from a Snowpark DataFrame?
Data from a Snowpark DataFrame can be retrieved by utilizing the show method.
Syntax:
The following is the syntax to read data from a Snowpark DataFrame using show method.
DataFrame.show([n, max_width])
Parameters:
- n – The number of rows to print out. The default value is 10.
- max_width – The maximum number of characters to print out for each column.
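For instance, assuming df refers to any Snowpark DataFrame already created in the session, both parameters can be combined as shown below.
# Print only the first 3 rows and truncate each column to 25 characters
df.show(3, max_width=25)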
6. How to create a DataFrame in Snowpark with a List of Specified Values?
Example-1:
The following is an example of creating a DataFrame with a list of values and assigning the column
name as “a”.
df1 = session.createDataFrame([1,2,3,4], schema=["a"])
df1.show()

------
|"A" |
------
|1 |
|2 |
|3 |
|4 |
------
The DataFrame df1 when executed is translated and executed as SQL in Snowflake by Snowpark API
as shown below.
SELECT "A" FROM (
SELECT $1 AS "A"
FROM VALUES (1::INT), (2::INT), (3::INT), (4::INT)
) LIMIT 10
Example-2:
The following is an example of creating a DataFrame with multiple lists of values and assigning the
column names as “a”,”b”, “c” and “d”.
df2 = session.createDataFrame([[1,2,3,4],[5,6,7,8]], schema=["a","b","c","d"])
df2.show()

--------------------------
|"A" |"B" |"C" |"D" |
--------------------------
|1 |2 |3 |4 |
|5 |6 |7 |8 |
--------------------------
The DataFrame df2 when executed is translated and executed as SQL in Snowflake by Snowpark API
as shown below.
SELECT "A", "B", "C", "D" FROM (
SELECT $1 AS "A", $2 AS "B", $3 AS "C", $4 AS "D"
FROM VALUES
(1::INT, 2::INT, 3::INT, 4::INT),
(5::INT, 6::INT, 7::INT, 8::INT)
) LIMIT 10
7. How to create a DataFrame in Snowpark with a List of Specified Values and Schema?
When the schema parameter in the createDataFrame method is passed as a list of column names or
None, the schema of the DataFrame is inferred from the data across all rows.
Example-3:
The following is an example of creating a DataFrame with multiple lists of values with different data
types and assigning the column names as “a”,”b”, “c” and “d”.
df3 = session.createDataFrame([[1, 2, 'Snow', '2024-01-01'],[3, 4, 'Park', '2024-01-02']],
schema=["a","b","c","d"])
df3.show()

----------------------------------
|"A" |"B" |"C" |"D" |
----------------------------------
|1 |2 |Snow |2024-01-01 |
|3 |4 |Park |2024-01-02 |
----------------------------------
The DataFrame df3 when executed is translated and executed as SQL in Snowflake by Snowpark API
as shown below.
SELECT "A", "B", "C", "D" FROM (
SELECT $1 AS "A", $2 AS "B", $3 AS "C", $4 AS "D"
FROM VALUES
(1::INT, 2::INT, 'Snow'::STRING, '2024-01-01'::STRING),
(3::INT, 4::INT, 'Park'::STRING, '2024-01-02'::STRING)
) LIMIT 10
Note that in the above query, since we did not explicitly specify the data types of the columns
during definition, the values ‘2024-01-01’ and ‘2024-01-02’, despite being of “Date” data type,
are identified as “String” data type.
Example-4:
Create a custom schema parameter of StructType containing names and data types of columns and
pass it to the createDataFrame method as shown below.
#//create dataframe with schema
from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType, DateType

my_schema = StructType(
[StructField("a", IntegerType()),
StructField("b", IntegerType()),
StructField("c", StringType()),
StructField("d", DateType())]
)
df4 = session.createDataFrame([[1, 2, 'Snow', '2024-01-01'],[3, 4, 'Park', '2024-01-02']],
schema=my_schema)
df4.show()

------------------------------------------
|"A" |"B" |"C" |"D" |
------------------------------------------
|1 |2 |Snow |2024-01-01 00:00:00 |
|3 |4 |Park |2024-01-02 00:00:00 |
------------------------------------------
The DataFrame df4 when executed is translated and executed as SQL in Snowflake by Snowpark API
referencing to the columns with the defined data types as shown below.
SELECT
"A", "B", "C",
to_date("D") AS "D"
FROM (
SELECT $1 AS "A", $2 AS "B", $3 AS "C", $4 AS "D"
FROM VALUES
(1::INT, 2::INT, 'Snow'::STRING, '2024-01-01'::STRING),
(3::INT, 4::INT, 'Park'::STRING, '2024-01-02'::STRING)
) LIMIT 10
Note that in the above query, the column “D” is read as Date data type in Snowflake.
8. How to create a DataFrame in Snowpark using Pandas?
A Pandas DataFrame can be passed as “data” to create a DataFrame in Snowpark.
Example-5:
The following is an example of creating a Snowpark DataFrame using pandas DataFrame.
import pandas as pd

df_pandas = session.createDataFrame(pd.DataFrame([1,2,3],columns=["a"]))
df_pandas.show()

------
|"a" |
------
|1 |
|2 |
|3 |
------
Unlike DataFrames created with Lists or Tuples using the ‘createDataFrame‘ method, when a
DataFrame is created using a pandas DataFrame, the Snowpark API creates a temporary table
and imports the data from the pandas DataFrame into it. When extracting data from the
Snowpark DataFrame created using the pandas DataFrame, the data is retrieved by querying
the temporary table.
The DataFrame df_pandas when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
SELECT * FROM
"SNOWPARK_DEMO_DB"."SNOWPARK_DEMO_SCHEMA"."SNOWPARK_TEMP_TABLE_9
RSV8KITUO" LIMIT 10
9. How to create a DataFrame in Snowpark from a range of numbers?
A DataFrame from a range of numbers can be created using the range method of the Session class in
Snowpark. The resulting DataFrame has a single column named "ID" containing elements in a range
from start to end.
Syntax:
The following is the syntax to create a DataFrame using range method.
session.range(start[, end, step])
Parameters:
- start : The start value of the range. If end is not specified, start will be used as the value of end.
- end : The end value of the range.
- step : The step or interval between numbers.
Example-6:
The following is an example of creating a DataFrame with a range of numbers from 1 to 9.
df_range = session.range(1,10).to_df("a")
df_range.show()

-------
|"A" |
-------
|1 |
|2 |
|3 |
|4 |
|5 |
|6 |
|7 |
|8 |
|9 |
-------
The DataFrame df_range when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
SELECT * FROM (
SELECT ( ROW_NUMBER() OVER ( ORDER BY SEQ8() ) - 1 ) * (1) + (1) AS id
FROM ( TABLE (GENERATOR(ROWCOUNT => 9)))
) LIMIT 10
Example-7:
The following is an example of creating a DataFrame with a range of numbers from 1 to 9 with a step
value of 2 and returning the output column renamed as “A”.
df_range2 = session.range(1,10,2).to_df("a")
df_range2.show()

-------
|"A" |
-------
|1 |
|3 |
|5 |
|7 |
|9 |
-------
The DataFrame df_range2 when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
SELECT "ID" AS "A" FROM (
SELECT ( ROW_NUMBER() OVER ( ORDER BY SEQ8() ) - 1 ) * (2) + (1) AS id
FROM ( TABLE (GENERATOR(ROWCOUNT => 5)))
) LIMIT 10
10. How to create a DataFrame in Snowpark from a Database Table?
The sql and table methods of Session class in Snowpark can be used to create a DataFrame from a
Database Table.
Example-8:
The following is an example of creating a DataFrame from a database table by executing a SQL query
using sql method of Session class in Snowpark.
df_sql = session.sql("SELECT * FROM
SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.MONTHLY_REVENUE")
df_sql.show(5)

----------------------------------
|"YEAR" |"MONTH" |"REVENUE" |
----------------------------------
|2012 |5 |3264300.11 |
|2012 |6 |3208482.33 |
|2012 |7 |3311966.98 |
|2012 |8 |3311752.81 |
|2012 |9 |3208563.06 |
----------------------------------
The DataFrame df_sql when executed is translated and executed as SQL in Snowflake by Snowpark
API as shown below.
SELECT * FROM (SELECT * FROM
SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.MONTHLY_REVENUE) LIMIT 5
Example-9:
The following is an example of creating a DataFrame from a database table by executing a SQL query
using table method of Session class in Snowpark.
df_table = session.table("MONTHLY_REVENUE")
df_table.show(5)

----------------------------------
|"YEAR" |"MONTH" |"REVENUE" |
----------------------------------
|2012 |5 |3264300.11 |
|2012 |6 |3208482.33 |
|2012 |7 |3311966.98 |
|2012 |8 |3311752.81 |
|2012 |9 |3208563.06 |
----------------------------------
The DataFrame df_table when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
SELECT * FROM MONTHLY_REVENUE LIMIT 5
11. How to create a DataFrame in Snowpark by reading files from a stage?
DataFrameReader class in Snowpark provides methods for loading data from a Snowflake stage to a
DataFrame with format-specific options. To use it:
1. Create a DataFrameReader object through Session.read method.
2. For CSV file format, create a custom schema parameter of StructType containing
names and data types of columns.
3. Set the file format specific properties such as delimiter using options() method.
4. Specify the file path and stage details by calling the method corresponding to the
CSV format, csv().
Example-10:
The following is an example of creating a DataFrame in Snowpark by reading CSV files from S3
stage.
from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType

schema = StructType(
[StructField("EMPLOYEE_ID", IntegerType()),
StructField("FIRST_NAME", StringType()),
StructField("LAST_NAME", StringType()),
StructField("EMAIL", StringType())
])

df_s3_employee = session.read.schema(schema).options({"field_delimiter": ",", "skip_header": 1}).csv('@my_s3_stage/Inbox/')
df_s3_employee.show(5)

--------------------------------------------------------------
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL |
--------------------------------------------------------------
| 204384 | Steven | King | SKING@test.com |
| 204388 | Neena | Kochhar | NKOCHHAR@test.com |
| 204392 | Lex | De Haan | LDEHAAN@test.com |
| 204393 | Alexander | Hunold | AHUNOLD@test.com |
| 204394 | Bruce | Ernst | BERNST@test.com |
--------------------------------------------------------------
The DataFrame df_s3_employee when executed is translated and executed as SQL in Snowflake by
Snowpark API as shown below.
1. A temporary file format is created using the properties specified in
the options() method.
2. The stage files are queried using the file format created in the first step, and the columns are cast into the data types specified in the defined schema.
3. The file format created in the first step is dropped.
--create a temporary file format
CREATE SCOPED TEMPORARY FILE FORMAT IF NOT EXISTS
"SNOWPARK_DEMO_DB"."SNOWPARK_DEMO_SCHEMA".SNOWPARK_TEMP_FILE_FORMAT_Y00K7HK598
TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1

--select data from stage files using the temporary file format
SELECT * FROM (
SELECT
$1::INT AS "EMPLOYEE_ID",
$2::STRING AS "FIRST_NAME",
$3::STRING AS "LAST_NAME",
$4::STRING AS "EMAIL"
FROM @my_s3_stage/Inbox/( FILE_FORMAT =>
'"SNOWPARK_DEMO_DB"."SNOWPARK_DEMO_SCHEMA".SNOWPARK_TEMP_FILE_FORMAT_Y00K7HK598')
) LIMIT 5

--drop the temporary file format
DROP FILE FORMAT IF EXISTS
"SNOWPARK_DEMO_DB"."SNOWPARK_DEMO_SCHEMA".SNOWPARK_TEMP_FILE_FORMAT_Y00K7HK598
Introduction
Snowpark is a developer framework from Snowflake that allows developers to interact with
Snowflake directly and build complex data pipelines using Python. In our previous articles, we have
discussed what Snowpark DataFrames are and how to create and read data using Snowflake
Snowpark DataFrames.
In this article, we will explore how to write data into Snowflake tables using Snowpark DataFrames.
How to Write data into Snowflake table from a Snowpark DataFrame?
The DataFrameWriter class in Snowpark provides methods for writing data from a DataFrame to
desired destinations within the Snowflake ecosystem.
To write data from DataFrame into a table:
1. Create a DataFrame containing the data to be written into a Snowflake table.
2. Create a DataFrameWriter object by calling the DataFrame.write property on the
DataFrame.
3. Specify the write mode by calling the mode() method on the DataFrameWriter object.
This returns a new DataFrameWriter object that is configured with the specified
mode.
4. Call the save_as_table method on the DataFrameWriter object to save the contents of
the DataFrame to a specified table.
Syntax:
DataFrame.write.mode(save_mode).save_as_table(table_name)
Methods:
mode(save_mode)
Sets the save mode of the DataFrameWriter. The supported values of the save_mode are:
- "append": Appends data from the DataFrame to the existing table. If the table does not exist, it creates a new one.
- "overwrite": Overwrites the existing table with the data from the DataFrame.
- "errorifexists": Throws an exception if the table already exists.
- "ignore": Ignores the operation if the table already exists.
save_as_table(table_name)
Writes the data to the specified table in a Snowflake database.
Demonstration
For demonstration purposes, let’s create a DataFrame by reading data from an existing Snowflake
table. We will then transform the data retrieved from Snowflake and, ultimately, save the transformed
data as a new table.
Refer to our previous article to learn how to establish a connection with a Snowflake database using
Snowpark before starting to create the DataFrames.
In the below example, a DataFrame is created that reads data from a Snowflake table
(“SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER“), filters rows where
“C_NATIONKEY” is ‘15‘, and selects only the “C_CUSTKEY” and “C_NAME” columns.
from snowflake.snowpark.functions import col

df_customer = session.table("SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER")
df_customer_filter = df_customer.filter(col("C_NATIONKEY")=='15')
df_customer_select = df_customer_filter.select(col("C_CUSTKEY"),col("C_NAME"))
OverWrite Data
The following code writes the contents of the df_customer_select DataFrame to the specified
Snowflake table, overwriting the table’s existing data if it already exists.
customer_wrt = df_customer_select.write.mode("overwrite").save_as_table("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER")
When executed, this code is translated and executed as SQL in Snowflake through the Snowpark API.
The resulting SQL statement is as follows:
--creates a new table overwriting the existing one if already exists
CREATE OR REPLACE TABLE
SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER (
"C_CUSTKEY" BIGINT NOT NULL,
"C_NAME" STRING(25) NOT NULL
) AS SELECT * FROM (
SELECT "C_CUSTKEY", "C_NAME" FROM
SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER WHERE ("C_NATIONKEY" = '15')
);
The following code confirms that the table is created and displays the count of records loaded into the
CUSTOMER table.
session.table("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER").count()

+----+
|5921|
+----+
Append Data
The following code appends the contents of the df_customer_select DataFrame to the specified
Snowflake table, adding new records to the existing ones.
customer_wrt = df_customer_select.write.mode("append").save_as_table("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER")
When executed, this code is translated and executed as SQL in Snowflake through the Snowpark API.
The resulting SQL statement is as follows:
--verifies if specified table already exists
show tables like 'CUSTOMER' in schema
SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA

--Inserts data as table is existing already; otherwise creates the table and inserts data.
INSERT INTO SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER
SELECT "C_CUSTKEY", "C_NAME"
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
WHERE ("C_NATIONKEY" = '15')
The following code displays the count of records in the CUSTOMER table. Since we have already
created and inserted data in the preceding step, the record count indicates that data got appended.
session.table("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER").count()

+------+
|11842 |
+------+
Ignore Data
The following code ignores the write operation if the specified table already exists.
customer_wrt = df_customer_select.write.mode("ignore").save_as_table("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER")
When executed, this code is translated and executed as SQL in Snowflake through the Snowpark API.
The resulting SQL statement is as follows:
--creates table only if not already existing
CREATE TABLE IF NOT EXISTS
SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER (
"C_CUSTKEY" BIGINT NOT NULL,
"C_NAME" STRING(25) NOT NULL
) AS
SELECT * FROM (
SELECT "C_CUSTKEY", "C_NAME"
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
WHERE "C_NATIONKEY" = '15'
);
The following code displays the count of records in the CUSTOMER table. Since we have already
created and inserted data in the preceding steps, the record count indicates that no data got written into
the table.
session.table("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER").count()

+------+
|11842 |
+------+
Throw Error
The following code throws an exception if the specified table already exists.
customer_wrt = df_customer_select.write.mode("errorifexists").save_as_table("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER")

# SQL compilation error: Object 'SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER' already exists.
When executed, this code is translated and executed as SQL in Snowflake through the Snowpark API
and throws error. The resulting SQL statement and the result are as follows:
--"create table" is used instead of "create or replace"
CREATE TABLE SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER (
"C_CUSTKEY" BIGINT NOT NULL,
"C_NAME" STRING(25) NOT NULL
) AS SELECT * FROM (
SELECT "C_CUSTKEY", "C_NAME" FROM
SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER WHERE ("C_NATIONKEY" = '15')
);

--SQL compilation error: Object 'SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.CUSTOMER' already exists.
How to Specify Table Type while Writing data into Snowflake from a Snowpark DataFrame?
By default, the save_as_table method creates a permanent table. To create tables of the temporary or
transient type, include an additional parameter table_type along with the table_name.
The supported values of table_type are: temp, temporary, and transient.
Syntax:
DataFrame.write.mode(save_mode).save_as_table(table_name, table_type="<temp | temporary | transient>")
The following code writes the contents of the df_customer_select DataFrame to a temporary table
named TEMP_CUSTOMER.
customer_wrt = df_customer_select.write.mode("overwrite").save_as_table("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.TEMP_CUSTOMER", table_type="temp")
How to Create View from a Snowpark DataFrame?
To create a view from a DataFrame, call the create_or_replace_view method, which creates a new
view immediately.
Syntax:
DataFrame.create_or_replace_view("view_name")
The following code creates a view named VW_CUSTOMER using the computation expressed by
the df_customer_select DataFrame.
customer_view = df_customer_select.create_or_replace_view("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.VW_CUSTOMER")
When executed, this code is translated and executed as SQL in Snowflake through the Snowpark API.
The resulting SQL statement is as follows:
--creating view based on DataFrame expression
CREATE OR REPLACE VIEW
SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.VW_CUSTOMER AS
SELECT * FROM (
SELECT "C_CUSTKEY", "C_NAME"
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
WHERE "C_NATIONKEY" = '15'
);
Alternatively, the create_or_replace_temp_view method can be used to create a temporary view.
The temporary view is only available in the session in which it is created.
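For example, the same DataFrame could be exposed as a session-scoped view; the view name below is illustrative.
# Creates a temporary view that is dropped automatically when the session ends
df_customer_select.create_or_replace_temp_view("SNOWPARK_DEMO_DB.SNOWPARK_DEMO_SCHEMA.VW_CUSTOMER_TEMP")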
HOW TO: COPY Data from CSV Files INTO Snowflake Table using Snowpark?
January 24, 2024
Contents
1. Introduction
2. Steps to COPY Data from CSV Files INTO Snowflake table using Snowpark
STEP-1: Establish a Connection with Snowflake from Snowpark
STEP-2: Define a schema parameter of StructType containing names and data types of columns.
STEP-3: Read Data from Staged Files into a Snowpark DataFrame using Session.read Method
STEP-4: COPY Data from Snowpark DataFrame INTO Snowflake table using
DataFrame.copy_into_table Method
3. How to Reload CSV data into Snowflake Table from Snowpark?
4. How to Skip Error Records when copying data into Snowflake Table from Snowpark?
5. How to Specify File Names to Load Data into Snowflake Table from Snowpark?
6. How to Specify File Pattern to Load Data into Snowflake Table from Snowpark?
7. How to COPY INTO Snowflake Table with Different Structure from Snowpark?
8. How to Transform CSV Data before Loading it into Snowflake table from Snowpark?
1. Introduction
Snowpark is a developer framework from Snowflake that allows developers to interact with
Snowflake directly and build complex data pipelines using Python. In our previous articles, we
discussed what Snowpark is and how to set up a Python development environment for Snowpark and
write data into Snowflake using Snowpark.
In this article, we will explore how to copy data from CSV files into Snowflake tables using
Snowpark.
2. Steps to COPY Data from CSV Files INTO Snowflake table using Snowpark
Follow the below steps to copy data from CSV files into the Snowflake table using Snowpark.
STEP-1: Establish a Connection with Snowflake from Snowpark
To establish a connection with the Snowflake database from Snowpark, make use of the Session class
in Snowpark which establishes a connection with a Snowflake database and provides methods for
creating DataFrames and accessing objects.
When you create a Session object, you provide connection parameters to establish a connection with a
Snowflake database as shown below.
import snowflake.snowpark as snowpark
from snowflake.snowpark import Session

connection_parameters = {
"account": "snowflake account",
"user": "snowflake username",
"password": "snowflake password",
"role": "snowflake role", # optional
"warehouse": "snowflake warehouse", # optional
"database": "snowflake database", # optional
"schema": "snowflake schema" # optional
}

session = Session.builder.configs(connection_parameters).create()
STEP-2: Define a schema parameter of StructType containing names and data types of columns.
StructType is a data type representing a collection of fields that may have different data types. It is
commonly used to define the schema of a DataFrame or a column with a nested structure.
Define a schema parameter as a StructType that includes the names and corresponding data types of
the columns in the CSV files from which data should be copied as shown below.
from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType, DateType

schema = StructType(
[StructField("EMPLOYEE_ID", IntegerType()),
StructField("FIRST_NAME", StringType()),
StructField("LAST_NAME", StringType()),
StructField("EMAIL", StringType()),
StructField("HIRE_DATE", DateType()),
StructField("SALARY", IntegerType())
])
STEP-3: Read Data from Staged Files into a Snowpark DataFrame using Session.read Method
DataFrameReader class in Snowpark provides methods for reading data from a Snowflake stage to a
DataFrame with file format-specific options. A DataFrameReader object can be created
through Session.read method as shown below.
#// Use the DataFrameReader (session.read) to read from a CSV file

df_employee = session.read.schema(schema).options({"field_delimiter": ",", "skip_header": 1}).csv('@my_s3_stage/Inbox/employee.csv')
In the above code,
- The file format-specific properties are passed using the options() method.
- The schema of the file defined as a StructType parameter is passed using the schema() method.
- The file path and stage details are passed by calling the method corresponding to the CSV format, csv().
The following is the data read by the DataFrame from the stage file employee.csv
df_employee.show(5)

----------------------------------------------------------------------------------
|"EMPLOYEE_ID" |"FIRST_NAME" |"LAST_NAME" |"EMAIL" |"HIRE_DATE" |"SALARY"
|
----------------------------------------------------------------------------------
|100 |Steven |King |SKING |2003-06-17 |24000 |
|101 |Neena |Kochhar |NKOCHHAR |2005-09-21 |17000 |
|102 |Lex |De Haan |LDEHAAN |2001-01-13 |17000 |
|103 |Alexander |Hunold |AHUNOLD |2006-01-03 |9000 |
|104 |Bruce |Ernst |BERNST |2007-05-21 |6000 |
----------------------------------------------------------------------------------
STEP-4: COPY Data from Snowpark DataFrame INTO Snowflake table using
DataFrame.copy_into_table Method
DataFrame.copy_into_table method in Snowpark executes a COPY INTO <table> command to
load data from files in a stage location into a specified table.
This method is slightly different from the COPY INTO <table> command: if the target table does not
exist and the input files are CSV, it automatically creates the table.
Syntax:
The following is the syntax to copy data from a CSV file into a table using
the DataFrame.copy_into_table method in Snowpark.
DataFrame.copy_into_table(table_name, [optional_parameters])
Example:
The following code copies the data from DataFrame df_employee into the EMPLOYEE table in
Snowflake.
#// writing data using DataFrame.copy_into_table method
copied_into_result = df_employee.copy_into_table("employee")
When executed, this code is translated and executed as SQL in Snowflake through the Snowpark API.
The resulting SQL statement is as follows:
-- Verifying if table exists
show tables like 'employee'

-- Creating table
CREATE TABLE employee(
"EMPLOYEE_ID" INT,
"FIRST_NAME" STRING,
"LAST_NAME" STRING,
"EMAIL" STRING,
"HIRE_DATE" DATE,
"SALARY" INT)

-- Copying data from Stage file into Snowflake table
COPY INTO employee FROM @my_s3_stage/Inbox/employee.csv
FILE_FORMAT = ( TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1 )
The following code confirms that the table is created and displays the count of records loaded into
the EMPLOYEE table.
session.table("employee").count()
------
| 107|
------
3. How to Reload CSV data into Snowflake Table from Snowpark?
The COPY INTO <table> command in Snowflake does not reprocess files that have already been
loaded and have not changed.
To reload (duplicate) data from a set of staged data files that have not changed,
add FORCE=TRUE in the DataFrame.copy_into_table Method as shown below.
#// Reloading data using DataFrame.copy_into_table method
copied_into_result = df_employee.copy_into_table("employee", force=True)
The following code confirms that data is reloaded into the EMPLOYEE table showing that the record
count is doubled.
session.table("employee").count()
------
| 214|
------
4. How to Skip Error Records when copying data into Snowflake Table from Snowpark?
The ON_ERROR clause specifies what to do when the COPY command encounters errors in the
files. The default behavior aborts the load operation unless a different ON_ERROR option is
explicitly set in the COPY statement.
Syntax:
DataFrame.copy_into_table(table_name, on_error="<copy_options>")
The following are the supported values of the ON_ERROR copy option.
CONTINUE
- Continue to load the file if errors are found.
SKIP_FILE
- Skip a file when an error is found.
SKIP_FILE_num (ex: SKIP_FILE_10)
- Skip a file when the number of error rows found in the file is equal to or exceeds the specified number.
'SKIP_FILE_num%' (ex: 'SKIP_FILE_10%')
- Skip a file when the percentage of error rows found in the file exceeds the specified percentage.
ABORT_STATEMENT
- Abort the load operation if any error is found in a data file.
The following code skips error records, if any, when copying data into Snowflake from a Snowpark
DataFrame.
#// Specifying the COPY option to continue on error
copied_into_result = df_employee.copy_into_table("employee", force=True,
on_error="CONTINUE")
5. How to Specify File Names to Load Data into Snowflake Table from Snowpark?
Consider we are reading all the files in a stage location using a DataFrame in Snowpark. Using
the files parameter in the DataFrame.copy_into_table method, we can explicitly specify the list of
files from which data should be copied into a Snowflake table.
Syntax:
DataFrame.copy_into_table(table_name, files=['file1.csv', 'file2.csv'])
The following code copies data only from the employee.csv file in the stage location into the
EMPLOYEE table in Snowflake.
#// Reads all files from the Inbox folder in Stage location
df_employee = session.read.schema(schema).options({"field_delimiter": ",", "skip_header": 1}).csv('@my_s3_stage/Inbox/')

#// Copies data from only specified files from stage location
copied_into_result = df_employee.copy_into_table("employee", files=['employee.csv'], force=True)
6. How to Specify File Pattern to Load Data into Snowflake Table from Snowpark?
Using the pattern parameter in the DataFrame.copy_into_table method, we can explicitly define the
file pattern from which data should be copied into a Snowflake table.
Syntax:
DataFrame.copy_into_table(table_name, pattern='<regex_file_pattern>')
The following code copies data from stage files that match the file pattern specified in
the DataFrame.copy_into_table method into the Snowflake EMPLOYEE table.
#// Reads all files from the Inbox folder in Stage location
df_employee = session.read.schema(schema).options({"field_delimiter": ",", "skip_header": 1}).csv('@my_s3_stage/Inbox/')

#// Copies data from files which match the specified file pattern from stage location
copied_into_result = df_employee.copy_into_table("employee", pattern='emp[a-z]+.csv', force=True)
7. How to COPY INTO Snowflake Table with Different Structure from Snowpark?
When the structure of columns is different between the stage files and the Snowflake table, we can
specify the order of target columns to which data should be saved in the table using
the target_columns parameter in the DataFrame.copy_into_table method.
Syntax:
DataFrame.copy_into_table(table_name, target_columns=['<column_1>','<column_2>',..])
The following is the order of columns in the stage file vs the Snowflake table vs the target_columns parameter.

employee.csv     EMPLOYEE table     target_columns parameter
EMPLOYEE_ID      ID                 ID
FIRST_NAME       FNAME              FNAME
LAST_NAME        LNAME              LNAME
EMAIL            SALARY             EMAIL_ADDRESS
HIRE_DATE        EMAIL_ADDRESS      JOIN_DATE
SALARY           JOIN_DATE          SALARY
The following code copies data from the DataFrame into the EMPLOYEES table where the order of
columns is different between the stage file and the Snowflake table using
the target_columns parameter.
#// copying data into the table with a different structure than file
copied_into_result = df_employee.copy_into_table("employees", \
target_columns=['ID','FNAME','LNAME','EMAIL_ADDRESS','JOIN_DATE','SALARY'],\
force=True, on_error="CONTINUE")
The following is the data loaded into the EMPLOYEES table in Snowflake.

Data in the EMPLOYEES table loaded from CSV files


8. How to Transform CSV Data before Loading it into Snowflake table from Snowpark?
The data from CSV files can be transformed before loading into the Snowflake table using
the transformations parameter in the DataFrame.copy_into_table method.
Syntax:
DataFrame.copy_into_table(table_name, transformations=[<transformed_columns>])
The following code transforms the data in DataFrame before copying it into the EMPLOYEES table
using the transformations parameter.
- The CSV file columns are represented as $1, $2, $3, and so on.
- For the FNAME and LNAME table fields, we are copying the data of columns 2 and 3 from the files by trimming the spaces using the ltrim and rtrim functions.
- For the EMAIL_ADDRESS field of the table, we are concatenating the first letter of column 2 with the column 3 data using the concat and substr functions.
#// Transform data and load to snowflake
from snowflake.snowpark.functions import ltrim, rtrim, concat, substr

copied_into_result = df_employee.copy_into_table("employees", \
target_columns=['ID','FNAME','LNAME','EMAIL_ADDRESS','JOIN_DATE','SALARY'], \
transformations=['$1', ltrim(rtrim('$2')), ltrim(rtrim('$3')), concat(substr('$2',1,1),'$3'), '$5',
'$6'], \
force=True, on_error="CONTINUE")
To find the complete list of supported functions in the transformations parameter, refer to Snowflake
Documentation.
The following is the transformed data loaded into the EMPLOYEES table in Snowflake.
Transformed data in EMPLOYEES Table
Introduction to Snowflake Snowpark for Python
January 2, 2024
Contents
1. What is Snowflake Snowpark?
2. Setting up Snowpark Python Environment to connect Snowflake
1. Installing Python
2. Installing Snowpark
3. Installing Visual Studio Code
3. Connecting Snowflake using Snowpark Python
3.1. Import Snowpark Libraries
3.2. Create Connection Parameters
3.3. Create a Session
3.4. Write your code
4. Writing Snowpark Code in Python Worksheets
4.1. Creating Python Worksheets
4.2. Writing Snowpark Code in Python Worksheets
4.3. Snowpark Python Packages for Python Worksheets
5. Closing Thoughts
1. What is Snowflake Snowpark?
Snowpark is an intuitive library that offers an API for querying and processing data at scale in
Snowflake. It seamlessly integrates DataFrame-style programming into preferred languages like
Python, Java, and Scala, and all operations occur within Snowflake using the elastic, serverless
Snowflake engine. This eliminates the need to move data to the system where your application code
runs.
When developing Snowpark applications, there are some key concepts that are important to
understand.
- The Snowpark API enables you to write code from your IDE or notebook in your preferred language, which is ultimately converted into SQL and executed in Snowflake.
- A core abstraction in Snowpark is the DataFrame, representing a query in your chosen language.
- Snowpark does not require a separate cluster outside of Snowflake for computations. The queries built using DataFrames are converted to SQL, efficiently distributing computation in Snowflake's elastic engine.
- DataFrames in Snowpark are executed lazily, running only when actions like retrieval, storage, or viewing of data are performed.
- Snowpark DataFrames also run entirely within Snowflake, ensuring data remains in Snowflake unless explicitly requested by the application.
2. Setting up Snowpark Python Environment to connect Snowflake
The following are the prerequisites for setting up a local Python development environment to build
applications using Snowpark.
1. Install Python
2. Install Snowpark
3. Install Visual Studio Code
1. Installing Python
At the time of writing this article, the supported version of Python for Snowpark is 3.10. To know the
latest supported versions, please refer to the Snowflake Documentation.
Follow below steps to install Python on your Windows local computer.
1. Navigate to the official Python website. Click on Downloads and then select Windows.
2. Navigate to the supported version of Python on the downloads page and download the installer.

3. Once the executable file is downloaded completely, open the file to install Python.
4. In the installation wizard, verify the path where the installation files will be saved. Also, select the
checkbox at the bottom to Add Python to PATH and click Install Now.
5. Wait for the wizard to finish the installation process until the Set up was successful message
appears. Click Close to exit the wizard.
6. To verify if Python is installed on your machine, issue the following command from the command
prompt (start >> cmd) of your machine.
python --version
2. Installing Snowpark
To install the Snowpark Python package, execute the following command from your command
prompt window.
pip install snowflake-snowpark-python
The download begins and all the required packages are installed into your Python virtual environment.
Follow below steps to install PIP on your machine if it is not already installed before running the
above given command.
1. Run the following cURL command in the command prompt to download the get-pip.py file
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
2. Once the download is complete, run the following Python command to install PIP.
python get-pip.py
3. Open a new command prompt window and run the following command to verify if PIP has
successfully installed.
pip --version
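Once the package is installed, you can optionally confirm from Python that Snowpark is importable and check which version was installed. This is a minimal check that uses only the standard library.
# Verify the Snowpark installation by importing it and printing the installed version
import snowflake.snowpark
from importlib.metadata import version
print(version("snowflake-snowpark-python"))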
3. Installing Visual Studio Code
Visual Studio Code is the most popular code editor and IDE provided by Microsoft with support for
development operations like debugging, task running, and version control.
Follow below steps to install Visual Studio Code on Windows.
1. Navigate to the official website of Visual Studio code.
2. Click on the Download for Windows button on the website to start downloading the application.

3. Once the download finishes, Click on the installer icon to start the installation process of the Visual
Studio Code.
4. In the installation wizard, agree to the terms and conditions, and proceed by clicking the “Next”
and “Install” buttons on the subsequent pages.
5. After successful installation of Visual Studio Code, go to the extensions tab in Visual Studio Code,
search for the Python extension and install it.
6. The Python extension tries to find and select what it deems the best environment for the workspace.
To manually specify the environment, press Ctrl+Shift+P to open the VS Code Command
Palette and execute the command Python: Select Interpreter

7. The Python: Select Interpreter command displays a list of available global environments, select
the python 3.10 environment which we set up in the first step.

3. Connecting Snowflake using Snowpark Python


To connect to Snowflake using Snowpark Python in your application, the following steps need to be
followed.
1. Import Snowpark Libraries.
2. Create Connection Parameters
3. Create a Session
4. Write your code
3.1. Import Snowpark Libraries
The first step is to import the Snowpark libraries that establish a session with the Snowflake database.
import snowflake.snowpark as snowpark
from snowflake.snowpark import Session
3.2. Create Connection Parameters
Create a Python dictionary containing the names and values of the parameters for connecting to
Snowflake as shown below.
connection_parameters = {
"account": "snowflake account",
"user": "snowflake username",
"password": "snowflake password",
"role": "snowflake role", # optional
"warehouse": "snowflake warehouse", # optional
"database": "snowflake database", # optional
"schema": "snowflake schema" # optional
}
3.3. Create a Session
Pass the connection parameters dictionary to the Session.builder.configs method to return a builder
object that has these connection parameters.
Call the create method of the builder to establish the session as shown below.
new_session = Session.builder.configs(connection_parameters).create()
3.4. Write your code
Once the connection is established, you can write code that operates on database objects.
The following code prints the current database and schema details which are used in the session
created.
print(new_session.sql("select current_database(), current_schema()").collect())
The following code creates a dataframe on the database table named campaign_spend and prints the
first 10 rows.
df_campaign_spend = new_session.table('campaign_spend')
df_campaign_spend.show()
As discussed, the DataFrames in Snowpark are executed lazily. In this case, no action is
performed when the dataframe is created. Only when we call the show function, the dataframe is
converted into a SQL query and is executed against the database defined.
The below image shows how the DataFrame executed in the application is converted into a SQL query
in Snowflake.

DataFrame executed as SQL query in Snowflake


Here is the sample Snowpark code which connects to Snowflake and prints the contents from a table.
import snowflake.snowpark as snowpark
from snowflake.snowpark import Session

connection_parameters = {
"account": "qokbyrr-ag94793",
"user": "SFUSER13",
"password": "Abc123",
"role": "ACCOUNTADMIN",
"warehouse": "SNOWPARK_DEMO_WH",
"database": "SNOWPARK_DEMO_DB",
"schema": "SNOWPARK_DEMO_SCHEMA"
}

new_session = Session.builder.configs(connection_parameters).create()

print(new_session.sql("select current_database(), current_schema()").collect())

df_campaign_spend = new_session.table('campaign_spend')
df_campaign_spend.show()
Below is the output of the Snowpark application code.
Application Output
Here, the application code is run on your local machine, but the actual query execution is
performed within Snowflake.
4. Writing Snowpark Code in Python Worksheets
Snowflake supports writing Snowpark code in Python worksheets to process data using Snowpark
Python in Snowsight. You can conduct your development and testing in Snowflake without the need
to install dependent libraries by writing code in Python worksheets.
4.1. Creating Python Worksheets
To start coding in Python worksheets, in Snowsight, open Worksheets, simply click + to add new
worksheet, and select Python Worksheet.
The below image shows the default code with which the Python worksheet is created in Snowsight.

Default Python Worksheet in Snowflake


4.2. Writing Snowpark Code in Python Worksheets
In the Python Worksheets, the Snowpark Python code is written inside the handler function.
import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    # your code goes here
- The default handler function is main, but you can change it in the Settings for the worksheet. The active handler is highlighted in the worksheet.
- Use the session object to access data in Snowflake with the Snowpark API libraries.
- After you write your code in the Python worksheet, select Run to run your Python worksheet.
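Putting these points together, a minimal worksheet could look like the sketch below; it uses only the session object passed into the handler, so no table names are assumed.
import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    # Run an ad-hoc query through the session provided by Snowsight
    df = session.sql("select current_database(), current_schema()")
    # Returning the DataFrame displays its result below the worksheet
    return df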
4.3. Snowpark Python Packages for Python Worksheets
The snowflake-snowpark-python package is required and always installed for Python worksheets.

Default packages in a Snowflake Python Worksheet
To use Anaconda-provided packages in Python Worksheets, you must enable them by accepting the
terms. To accept the terms, in Snowsight, go to Admin > Billing & Terms. In the Anaconda section,
select Enable.

In the Anaconda Packages dialog, select Acknowledge & Continue.

5. Closing Thoughts
While there is much more to cover on Snowpark, I trust this article has offered you a fundamental
understanding, particularly beneficial for individuals without a programming background. It aims to
assist you in initiating your Snowpark learning journey and building a solid foundation for exploring
its capabilities.
Watch this space for more informative content on Snowflake Snowpark !!
