
Snowflake Mini Project

Project Activities –

1: Warehouse Creation:

Create a warehouse of name SFTRNG. Document the parameters with which the WH is created and the
reasons behind the WH options used.

Solution:

 Warehouse: Warehouses are required for queries as well as all DML operations, including loading data into tables. A warehouse is defined by its size, along with other properties that can be set to help control and automate warehouse activity.
 The warehouse SFTRNG is created through the Snowflake web UI and configured with parameters such as size, number of clusters, and scaling policy.
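
A minimal SQL sketch of the same creation follows; the specific size, cluster counts, and policy values are illustrative assumptions rather than values mandated by the project.

create or replace warehouse SFTRNG
  warehouse_size = 'XSMALL'    -- smallest size to keep training credit usage low (assumed)
  min_cluster_count = 1
  max_cluster_count = 2        -- allow limited scale-out when queries queue (assumed)
  scaling_policy = 'STANDARD'  -- start extra clusters as queries queue rather than conserving credits
  auto_suspend = 300           -- suspend after 5 minutes of inactivity to save credits
  auto_resume = true;          -- resume automatically on the next submitted query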

2: Resource Monitor:

Create a Resource Monitor of name TRNG_TRACK to track the WH SFTRNG. Document the parameters
with which the Resource Monitor is created and the reasons behind the options used.

Solution:

 Resource monitors can be used to monitor credit usage by user-managed virtual warehouses and
virtual warehouses used by cloud services. A resource monitor can suspend user-managed
warehouses based on credit usage thresholds.
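
A minimal sketch of creating TRNG_TRACK and attaching it to SFTRNG; the credit quota and trigger thresholds are illustrative assumptions.

create or replace resource monitor TRNG_TRACK
  with credit_quota = 10               -- monthly credit budget (assumed value)
  triggers on 80 percent do notify     -- warn as usage approaches the quota
           on 100 percent do suspend;  -- suspend the warehouse once the quota is spent

alter warehouse SFTRNG set resource_monitor = TRNG_TRACK;  -- attach the monitor to SFTRNG
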
3: Unloading data:

Unload the data from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"

1) Whole data into multiple csv files with delimiter | (Pipe)

2) Whole data into single file with delimiter |-|

Document the queries used to unload the data.

Solution:

1) Whole data into multiple csv files with delimiter | (Pipe)

""SNOWFLAKE_SAMPLE_DATA"SELECT * FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"

create or replace file format my_unload_format
  type = 'CSV'
  field_delimiter = '|'
  COMPRESSION = NONE;

create or replace stage my_unload_stage2
  file_format = my_unload_format;

remove @my_unload_stage2;  -- clear any earlier files from the stage

copy into @my_unload_stage2 from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"
  -- SINGLE = FALSE is the default, so the unload is split into multiple files
  MAX_FILE_SIZE = 16777216;  -- cap each file at 16 MB

list @my_unload_stage2;  -- list the unloaded files in the stage

2) Whole data into single file with delimiter |-|

""SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10""SNOWFLAKE_SAMPLE_DATA""SNOWFLAKE_
SAMPLE_DATA"SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"

create or replace file format my_unload_format
  type = 'CSV'
  field_delimiter = '|-|'
  COMPRESSION = NONE;

create or replace stage my_unload_stage
  file_format = my_unload_format;

remove @my_unload_stage;  -- clear any earlier files from the stage

copy into @my_unload_stage from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"
  SINGLE = TRUE                -- required for one output file; the default splits the unload
  MAX_FILE_SIZE = 5368709120;  -- raise the per-file cap to the 5 GB maximum so the table fits

list @my_unload_stage;  -- list the unloaded file in the stage

get @my_unload_stage/data file://C:\Snowflake\UNload\Assin1;

4: DB Objects creation:

 Create a database SFTRNG
 Create schema TRNG
 Create tables LINEITEM_Local, LINEITEM_AWS, and LINEITEM_GCP by doing data profiling and
determining the max length of each column. (Note: do not use the default column data types.)

Solution:

create or replace database SFTRNG;
create or replace schema TRNG;

create or replace table LINEITEM_Local (
  L_ORDERKEY NUMBER,
  L_PARTKEY NUMBER,
  L_SUPPKEY NUMBER,
  L_LINENUMBER NUMBER,
  L_QUANTITY NUMBER,
  L_EXTENDEDPRICE NUMBER,
  L_DISCOUNT NUMBER,
  L_TAX NUMBER,
  L_RETURNFLAG VARCHAR,
  L_LINESTATUS VARCHAR,
  L_SHIPDATE DATE,
  L_COMMITDATE DATE,
  L_RECEIPTDATE DATE,
  L_SHIPINSTRUCT VARCHAR,
  L_SHIPMODE VARCHAR,
  L_COMMENT VARCHAR);
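
The task's note asks for profiled column sizes rather than the bare NUMBER/VARCHAR defaults used above; a minimal profiling sketch (the column selection is illustrative) to find the maximum lengths before choosing explicit types:

select max(length(L_RETURNFLAG))   as max_returnflag,
       max(length(L_SHIPINSTRUCT)) as max_shipinstruct,
       max(length(L_SHIPMODE))     as max_shipmode,
       max(length(L_COMMENT))      as max_comment
from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM";
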
create or replace file format LINEITEM_Local
  type = 'CSV'
  field_delimiter = '|'
  skip_header = 1;

create or replace stage my_local_stage
  file_format = LINEITEM_Local;

put file://c:\snowflake\load\contacts1.csv @my_local_stage;

list @my_local_stage;

select * from LINEITEM_Local;  -- verify the (still empty) table before loading

copy into LINEITEM_Local
  from @my_local_stage/contacts1.csv
  file_format = (format_name = LINEITEM_Local)
  --on_error = 'SKIP_FILE'
  FORCE = TRUE   -- reload files even if they were loaded before
  PURGE = TRUE;  -- remove staged files after a successful load

copy into LINEITEM_Local
  from @my_local_stage
  --file_format = (type = 'CSV' field_delimiter = '|' skip_header = 1)
  --pattern = '.*contacts[1-5].csv.gz'
  --on_error = 'ABORT_STATEMENT'
  --on_error = 'SKIP_FILE'
  --on_error = 'CONTINUE'
  FORCE = TRUE
  VALIDATION_MODE = 'RETURN_3_ROWS';  -- validate the first rows without loading them

copy into LINEITEM_Local
  from @my_local_stage/contacts.csv;
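
The task also names LINEITEM_AWS and LINEITEM_GCP; a hypothetical sketch for the AWS variant, assuming a table LINEITEM_AWS created with the same DDL as LINEITEM_Local, and using placeholder bucket and credential values:

create or replace stage my_s3_stage
  url = 's3://<your-bucket>/lineitem/'                              -- placeholder bucket path
  credentials = (aws_key_id = '<key>' aws_secret_key = '<secret>')  -- placeholder credentials
  file_format = LINEITEM_Local;

copy into LINEITEM_AWS from @my_s3_stage;  -- the GCP variant is analogous with a gcs:// URL and a storage integration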

5: COPY/CLONE:

Do the copy or clone from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1000"."ORDERS" as ORDERS_BKUP

Solution:
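
Tables in the shared SNOWFLAKE_SAMPLE_DATA database cannot be cloned directly, so the data is first materialized locally; a minimal sketch of that step:

create or replace table orders as
  select * from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1000"."ORDERS";  -- local copy, since shared objects do not support cloning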

create or replace table orders_bkup clone orders;

6: Time Travel:
A) Drop the table ORDERS_BKUP and note the query id.

B) Restore the ORDERS_BKUP table and note the query id.


C) Delete the data from ORDERS_BKUP and note the query id

D) Restore the data into ORDERS_BKUP using time travel and note the query id and the query used.

Solution:

A) drop table ORDERS_BKUP; --01a25551-0000-2022-0000-0002cc2e60f1

B) UNDROP TABLE ORDERS_BKUP; --01a25552-0000-2023-0000-0002cc2e51f9
SELECT * FROM ORDERS_BKUP; --01a25552-0000-2022-0000-0002cc2e611d (verify the restore)

C) delete from ORDERS_BKUP; --01a25555-0000-2023-0000-0002cc2e526d

D) -- time travel to a timestamp just before the delete was run
select * from ORDERS_BKUP AT (timestamp => '2022-02-15 19:37:48.259 +0000'::timestamp); --01a2555f-0000-2022-0000-0002cc2e6271

-- time travel to 1 minute ago
select * from ORDERS_BKUP AT (offset => -60*1); --01a25560-0000-2022-0000-0002cc2e62a5

-- time travel to the state just before the delete statement from step C
select * from ORDERS_BKUP BEFORE (statement => '01a25555-0000-2023-0000-0002cc2e526d'); --01a25561-0000-2022-0000-0002cc2e62c5
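
The selects above only read the historical rows; a minimal sketch of actually writing them back, reusing the statement id recorded in step C:

insert into ORDERS_BKUP
  select * from ORDERS_BKUP before (statement => '01a25555-0000-2023-0000-0002cc2e526d');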

7: Performance:

Write a query to find the number of orders each customer has made and the total price of those orders from the ORDERS_BKUP table.
Note the total micro-partitions being used and the micro-partitions being scanned.

Increase the size of the WH and note the performance statistics.

Note the difference with and without usage of result cache.

Determine if clustering can improve the performance.

Solution:

CREATE or replace TABLE ORDERS AS
  SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."ORDERS";

CREATE or replace TABLE ORDERS_CLUSTER AS
  SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."ORDERS";

ALTER TABLE ORDERS_CLUSTER CLUSTER BY (O_ORDERDATE, O_ORDERSTATUS);

SELECT * FROM ORDERS where (O_ORDERDATE, O_ORDERSTATUS) = ('1992-01-20', 'F'); -- 1.66s
SELECT * FROM ORDERS_CLUSTER where (O_ORDERDATE, O_ORDERSTATUS) = ('1992-01-20', 'F'); -- 1.02s
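
The customer aggregation query itself is not shown above; a minimal sketch of it, plus the warehouse resize and result-cache toggle the task asks about (it assumes ORDERS_BKUP still holds the cloned data):

-- Orders per customer and total order value; micro-partition counts
-- appear in the query profile after running this.
select O_CUSTKEY,
       count(*)          as order_count,
       sum(O_TOTALPRICE) as total_price
from ORDERS_BKUP
group by O_CUSTKEY;

-- Rerun the query on a larger warehouse and compare elapsed times.
alter warehouse SFTRNG set warehouse_size = 'MEDIUM';

-- Disable the result cache for the session so the rerun does real work.
alter session set USE_CACHED_RESULT = FALSE;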
