
The Simple Guide to Snowpipe
Create your first Snowpipe, step-by-step

masteringsnowflake.com
When we need to handle micro-batches or continuous data loads, Snowflake provides an efficient way to do this: Snowpipe.

Snowpipe...

- Uses Snowflake-supplied compute resources
- Can auto-ingest structured or semi-structured data into Snowflake
- Removes the need to run multiple COPY INTO commands manually

In this guide we'll look at how to create a Snowpipe to auto-ingest data, step-by-step.

Snowpipe is a fully managed service. It doesn't require a user-defined virtual warehouse because it uses Snowflake-managed compute resources.

Snowpipe is designed to move smaller amounts of structured or semi-structured data quickly and efficiently from a stage to a table in Snowflake.

You must create and stage the files first, which means you cannot stream data directly from the source system into Snowflake.

Snowpipe removes the need for tuning or any other additional management, and you don't need to worry about resources because it's completely serverless.

You only pay for the compute time used to
load data.

You still pay using Snowflake credits.

Even though there's no user-defined virtual warehouse required, it is still listed on the Snowflake bill under a separate virtual warehouse named Snowpipe.

Hold up, what are we building again?

Let me break it down for you:

Files arrive in an AWS S3 bucket.

AWS notices these files and adds them to a queue as messages.

Snowpipe checks the queue and copies these messages to a Snowflake queue.

A Snowflake-provided virtual warehouse loads data from the queued files into the target table based on parameters defined in the specified pipe.

Elegant and simple, right?!

Let's take a look at a picture of what we'll build

Now, be prepared to buckle up.

It's going to take a while until we get to create our first Snowpipe!

You ready?

Let's begin....

This feature is available on AWS, Microsoft and Google

Although we're going to look at an example using AWS S3, you can also configure Snowpipe to auto-ingest files from Google Cloud Storage or Microsoft Azure.

The process will be similar, but you'll be using the equivalent services from Google or Microsoft respectively.

To configure Snowpipe to auto-ingest files from an S3 bucket there are a few things we need to create and configure.

We'll also need to jump back and forth between the AWS Management Console and Snowflake - so keep them both open in different windows.

Your 6 Steps to Snowpipe Success
1. AWS: An IAM policy & role.
2. Snowflake: A storage integration to hold the credentials to access the S3 bucket.
3. Snowflake: An external stage which acts as a pointer to the S3 bucket for Snowflake.
4. Snowflake: A pipe to load the data.
5. AWS: An event notification on the S3 bucket to push events to an SQS queue.
6. Snowflake: Access for others to use the pipe.

We'll cover each step in more detail next.

Step 1 - Creating the IAM Policy

First we need to configure access permissions for Snowflake in your AWS Management Console to access your S3 bucket.

1. Log into the AWS Management Console.
2. From the home dashboard, choose Identity & Access Management (IAM).

3. Choose Account settings from the left-hand navigation pane.

4. Expand the Security Token Service Regions list, find the AWS region corresponding to the region where your account is located, and choose Activate if the status is Inactive.

5. Choose Policies from the left-hand navigation pane.

6. Click Create Policy.

7. Click the JSON tab.

8. Add a policy document that will allow Snowflake to access the S3 bucket and folder.

Check out the example JSON below.

The following policy (in JSON format) provides
Snowflake with the required permissions to
load or unload data using a single bucket and
folder path.
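
Here's a sketch along the lines of the policy in Snowflake's documentation - <bucket> and <prefix> are placeholders for your own bucket name and folder path:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket>",
            "Condition": {
                "StringLike": {
                    "s3:prefix": ["<prefix>/*"]
                }
            }
        }
    ]
}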

9. Click Review policy.

10. Enter the policy name (e.g. snowflake_access) and an optional description.

11. Click Create policy.

Attach our policy to an IAM role

Ok, so now we have our policy defined, we need to create a role and attach the policy to it.

1. Head back to IAM in the AWS Management Console.
2. Choose Roles from the left-hand navigation pane.
3. Click the Create role button.

4. Select Another AWS account as the trusted entity type.
5. In the Account ID field, enter your own AWS account ID.
6. Select the Require external ID option. Enter a dummy ID such as 0000.

Later, we'll come back and modify the account ID and external ID values.

Open Sesame!
7. Click next and we'll attach the policy we created previously to the role.
8. Enter a name and description for the role, and click the Create role button.
9. You have now created an IAM policy for a bucket, created an IAM role, and attached the policy to the role.

Important: Make sure to record the Role ARN value located on the role summary page.

In the next step, you will create a Storage Integration in Snowflake which references this AWS IAM role.

24 pages in and you've made it to step 2.

Remember this is a marathon, not a sprint.

Take on some water if you need to!

Step 2 - Creating a Storage Integration

A storage integration is a Snowflake object that stores the credentials for the IAM user (for your S3 cloud storage), along with an optional set of allowed or blocked storage locations (i.e. buckets).

This option allows you to avoid sharing your credentials when creating stages or loading data - other users can simply reference the storage integration when they create an External Stage.

This is how you create a storage integration object in Snowflake.

Replace the placeholder values with your own environment details.
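
A sketch using the standard CREATE STORAGE INTEGRATION syntax - the integration name, role ARN, and bucket path below are placeholders:

-- <aws_account_id> and <role_name> come from the IAM role you created in Step 1
CREATE STORAGE INTEGRATION s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<aws_account_id>:role/<role_name>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<bucket>/<prefix>/');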

Hold up and pay attention here!

This is the tricky bit, which at first appears not particularly intuitive.

Just bear with me, ok?

Ok, we need to execute a command in Snowflake.

And take note of some values.

Before heading back to AWS and plugging them in.

If you do this bit correctly you can do ANYTHING!!

And if you get it right first time, hit me up in the comments ;)

Behold... the Storage Integration ARN
Ok, so execute this command in Snowflake:
DESC INTEGRATION <integration_name>;

We need to grab the values associated with the following:

STORAGE_AWS_IAM_USER_ARN - the ARN of the AWS IAM user created for your Snowflake account by the storage integration object.

STORAGE_AWS_EXTERNAL_ID - the external ID of the storage integration object, needed to establish the trust relationship.

Grant the IAM User Permissions to Access Bucket Objects
Armed with this information we can go back to the AWS Management Console.

Head back to the IAM > Roles section.

Select the role we created earlier.

Select Trust relationships and click Edit trust relationship.

We'll modify this JSON next.

Edit trust permissions
Modify the policy document with the DESC STORAGE INTEGRATION output values we captured earlier.

snowflake_user_arn is the STORAGE_AWS_IAM_USER_ARN value you recorded.
snowflake_external_id is the STORAGE_AWS_EXTERNAL_ID value you recorded.

We'll take a look at an example next...
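
Here's a sketch of what the updated trust policy looks like, along the lines of the example in Snowflake's documentation - <snowflake_user_arn> and <snowflake_external_id> stand in for the two values you recorded:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<snowflake_user_arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<snowflake_external_id>"
        }
      }
    }
  ]
}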

Click the Update Trust Policy button. The changes are saved.

Phew!

I'm starting to think I should rename this to 'The Least Complex Guide to Snowpipe'.

But believe me, this is the simplest way to do this.

Let me know in the comments if you think otherwise.

Step 3 - Create external stage
Now we have set up our permissions to our AWS S3 location, we need to create a pointer to this location from Snowflake.

To do this we create an external stage and refer to the storage integration we created earlier.

Check this out:
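
A sketch using standard CREATE STAGE syntax - the stage name and bucket path are placeholders, and CSV is just an assumed file format:

-- the URL must fall within the integration's STORAGE_ALLOWED_LOCATIONS
CREATE STAGE s3_stage
  STORAGE_INTEGRATION = s3_integration
  URL = 's3://<bucket>/<prefix>/'
  FILE_FORMAT = (TYPE = 'CSV');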

Step 4 - Create the pipe
Finally, we get down to the business end of things.

We're creating the [snow]pipe.

To create a pipe we use the CREATE PIPE command, reference the stage, and specify the format of the data to expect.

The pipe defines the COPY INTO <table> statement used by Snowpipe to load data from the ingestion queue into the target table.

Note: you'll need to create a target table in Snowflake too!
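
A sketch of what this can look like, assuming the stage from Step 3 and a hypothetical target table called my_table:

-- the target table must exist before the pipe can load into it
CREATE TABLE my_table (col1 STRING, col2 STRING);

-- AUTO_INGEST = TRUE makes Snowflake create the SQS queue we'll use in Step 5
CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table
  FROM @s3_stage
  FILE_FORMAT = (TYPE = 'CSV');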

Wow, what an anticlimax!

The creation of the Snowpipe was quite possibly the quickest and easiest step of the whole process.

**Quick recap people**

We've got:

An IAM policy

Attached to an IAM role

Referenced by a Storage Integration to store the credentials to S3

An External Stage acting as a pointer to the S3 location and referencing the Storage Integration

And a pipe set to auto ingest = true, which has created an AWS SQS queue behind the scenes.

Step 5 - Event notifications
Nearly there!

Now we have to tell our S3 bucket to push a message to the queue when new files arrive.

This notifies our Snowpipe when new data is available to load.

The auto-ingest feature relies on SQS queues to deliver event notifications from S3 to Snowpipe.

For ease of use, Snowpipe SQS queues are created and managed by Snowflake.

Every object created in AWS is uniquely identified by an Amazon Resource Name (ARN).

As Snowpipe created our SQS queue in AWS for us, we need to grab the ARN so we can tell S3 where to send the event notifications.

The SHOW PIPES command output displays the ARN of your SQS queue.

SHOW PIPES;

Note the ARN of the SQS queue for the stage in the notification_channel column. Copy the ARN to a convenient location.

Back to AWS S3 we go!

1. In the Buckets list, choose the name of the bucket that you want to enable events for.

2. Choose Properties.

3. Navigate to the Event Notifications section and choose Create event notification.

Enter the following:

Name: Name of the event notification (e.g. Auto-ingest Snowflake).
Events: Select the ObjectCreate (All) option.
Send to: Select SQS Queue from the dropdown list.
SQS: Select Add SQS queue ARN from the dropdown list.
SQS queue ARN: Paste the SQS queue ARN from the SHOW PIPES output.

Step 6 - Set up access controls on the pipe

For each user who needs to execute the Snowpipe, we need to grant access control privileges on the objects used for the data load (i.e. the target database, schema, and table; the stage object; and the pipe).

Use the GRANT <privileges> command to grant privileges to the role as follows:
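
A sketch of the grants, assuming a hypothetical role called snowpipe_role and the placeholder object names used earlier:

-- access to the database and schema containing the target table
GRANT USAGE ON DATABASE my_db TO ROLE snowpipe_role;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE snowpipe_role;
-- load into and read from the target table
GRANT INSERT, SELECT ON TABLE my_db.my_schema.my_table TO ROLE snowpipe_role;
-- read from the external stage
GRANT USAGE ON STAGE my_db.my_schema.s3_stage TO ROLE snowpipe_role;
-- run and monitor the pipe
GRANT OPERATE, MONITOR ON PIPE my_db.my_schema.my_pipe TO ROLE snowpipe_role;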

Cast your mind back a few hundred pages...

And we've done it, we've built the entire process flow.

If you have followed all the steps correctly, then adding a file into your S3 location will trigger Snowpipe to auto-ingest your data into your target table in Snowflake.

You're still here?

You want more??

Ok, a little extra: how do you monitor your Snowpipe usage?

To view the credits billed to your Snowflake account within a specified date range you can use either Snowsight, the Classic Web UI, or SQL.

Note: You must be an account administrator or a user with the MONITOR USAGE global privilege to view this information.
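
For the SQL route, here's a sketch using the PIPE_USAGE_HISTORY table function - the 7-day window is just an example:

-- credits consumed by Snowpipe over the last 7 days
SELECT *
FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
  DATE_RANGE_START => DATEADD('day', -7, CURRENT_DATE()),
  DATE_RANGE_END => CURRENT_DATE()));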

I'm Adam Morton
I'll help you unlock your potential and supercharge your career.

Follow me and let's move your aspirations from a pipe dream to a reality.
Work With Me

Apply to join my exclusive Mastering Snowflake program.

Link in my profile.
