
Snowflake Mini Project

Project Activities –

1: Warehouse Creation:

Create a warehouse of name SFTRNG. Document the parameters with which the WH is created and the
reasons behind the WH options used.

Solution:

 Warehouse: Warehouses are required for queries as well as all DML operations, including loading data into tables. A warehouse is defined by its size, along with other properties that can be set to help control and automate warehouse activity.
 The warehouse SFTRNG is created through the Snowflake web UI and configured with parameters such as size, number of clusters, and scaling policy.
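
A minimal SQL sketch of the same creation follows; the specific size, cluster counts, and policy values are illustrative assumptions rather than values mandated by the project.

create or replace warehouse SFTRNG
  warehouse_size = 'XSMALL'    -- smallest size to keep training credit usage low (assumed)
  min_cluster_count = 1
  max_cluster_count = 2        -- allow limited scale-out when queries queue (assumed)
  scaling_policy = 'STANDARD'  -- start extra clusters as queries queue rather than conserving credits
  auto_suspend = 300           -- suspend after 5 minutes of inactivity to save credits
  auto_resume = true;          -- resume automatically on the next submitted query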

2: Resource Monitor:

Create a Resource Monitor of name TRNG_TRACK to track the WH SFTRNG. Document the parameters
with which the Resource Monitor is created and the reasons behind the options used.

Solution:

 Resource monitors can be used to monitor credit usage by user-managed virtual warehouses and
virtual warehouses used by cloud services. A resource monitor can suspend user-managed
warehouses based on credit usage thresholds.
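
A minimal sketch of creating TRNG_TRACK and attaching it to SFTRNG; the credit quota and trigger thresholds are illustrative assumptions.

create or replace resource monitor TRNG_TRACK
  with credit_quota = 10               -- monthly credit budget (assumed value)
  triggers on 80 percent do notify     -- warn as usage approaches the quota
           on 100 percent do suspend;  -- suspend the warehouse once the quota is spent

alter warehouse SFTRNG set resource_monitor = TRNG_TRACK;  -- attach the monitor to SFTRNG
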
3: Unloading data:

Unload the data from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"

1) Whole data into multiple csv files with delimiter | (Pipe)

2) Whole data into single file with delimiter |-|

Document the queries used to unload the data.

Solution:

1) Whole data into multiple csv files with delimiter | (Pipe)

""SNOWFLAKE_SAMPLE_DATA"SELECT * FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"

create or replace file format my_unload_format
  type = 'CSV'
  field_delimiter = '|'
  COMPRESSION = NONE;

create or replace stage my_unload_stage2
  file_format = my_unload_format;

remove @my_unload_stage2;  -- clear any earlier files from the stage

copy into @my_unload_stage2 from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"
  -- SINGLE = FALSE is the default, so the unload is split into multiple files
  MAX_FILE_SIZE = 16777216;  -- cap each file at 16 MB

list @my_unload_stage2;  -- list the unloaded files in the stage

2) Whole data into single file with delimiter |-|

""SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10""SNOWFLAKE_SAMPLE_DATA""SNOWFLAKE_
SAMPLE_DATA"SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"

create or replace file format my_unload_format
  type = 'CSV'
  field_delimiter = '|-|'
  COMPRESSION = NONE;

create or replace stage my_unload_stage
  file_format = my_unload_format;

remove @my_unload_stage;  -- clear any earlier files from the stage

copy into @my_unload_stage from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM"
  SINGLE = TRUE                -- required for one output file; the default splits the unload
  MAX_FILE_SIZE = 5368709120;  -- raise the per-file cap to the 5 GB maximum so the table fits

list @my_unload_stage;  -- list the unloaded file in the stage

get @my_unload_stage/data file://C:\Snowflake\UNload\Assin1;

4: DB Objects creation:

 Create a database SFTRNG
 Create schema TRNG
 Create tables LINEITEM_Local, LINEITEM_AWS, and LINEITEM_GCP by doing data profiling and
determining the max length of each column. (Note: do not use the default column data types.)

Solution:

create or replace database SFTRNG;
create or replace schema TRNG;

create or replace table LINEITEM_Local (
  L_ORDERKEY NUMBER,
  L_PARTKEY NUMBER,
  L_SUPPKEY NUMBER,
  L_LINENUMBER NUMBER,
  L_QUANTITY NUMBER,
  L_EXTENDEDPRICE NUMBER,
  L_DISCOUNT NUMBER,
  L_TAX NUMBER,
  L_RETURNFLAG VARCHAR,
  L_LINESTATUS VARCHAR,
  L_SHIPDATE DATE,
  L_COMMITDATE DATE,
  L_RECEIPTDATE DATE,
  L_SHIPINSTRUCT VARCHAR,
  L_SHIPMODE VARCHAR,
  L_COMMENT VARCHAR);
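
The task's note asks for profiled column sizes rather than the bare NUMBER/VARCHAR defaults used above; a minimal profiling sketch (the column selection is illustrative) to find the maximum lengths before choosing explicit types:

select max(length(L_RETURNFLAG))   as max_returnflag,
       max(length(L_SHIPINSTRUCT)) as max_shipinstruct,
       max(length(L_SHIPMODE))     as max_shipmode,
       max(length(L_COMMENT))      as max_comment
from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."LINEITEM";
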
create or replace file format LINEITEM_Local
  type = 'CSV'
  field_delimiter = '|'
  skip_header = 1;

create or replace stage my_local_stage
  file_format = LINEITEM_Local;

put file://c:\snowflake\load\contacts1.csv @my_local_stage;

list @my_local_stage;

select * from LINEITEM_Local;  -- verify the (still empty) table before loading

copy into LINEITEM_Local
  from @my_local_stage/contacts1.csv
  file_format = (format_name = LINEITEM_Local)
  --on_error = 'SKIP_FILE'
  FORCE = TRUE   -- reload files even if they were loaded before
  PURGE = TRUE;  -- remove staged files after a successful load

copy into LINEITEM_Local
  from @my_local_stage
  --file_format = (type = 'CSV' field_delimiter = '|' skip_header = 1)
  --pattern = '.*contacts[1-5].csv.gz'
  --on_error = 'ABORT_STATEMENT'
  --on_error = 'SKIP_FILE'
  --on_error = 'CONTINUE'
  FORCE = TRUE
  VALIDATION_MODE = 'RETURN_3_ROWS';  -- validate the first rows without loading them

copy into LINEITEM_Local
  from @my_local_stage/contacts.csv;
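
The task also names LINEITEM_AWS and LINEITEM_GCP; a hypothetical sketch for the AWS variant, assuming a table LINEITEM_AWS created with the same DDL as LINEITEM_Local, and using placeholder bucket and credential values:

create or replace stage my_s3_stage
  url = 's3://<your-bucket>/lineitem/'                              -- placeholder bucket path
  credentials = (aws_key_id = '<key>' aws_secret_key = '<secret>')  -- placeholder credentials
  file_format = LINEITEM_Local;

copy into LINEITEM_AWS from @my_s3_stage;  -- the GCP variant is analogous with a gcs:// URL and a storage integration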

5: COPY/CLONE:

Do the copy or clone from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1000"."ORDERS" as ORDERS_BKUP

Solution:
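
Tables in the shared SNOWFLAKE_SAMPLE_DATA database cannot be cloned directly, so the data is first materialized locally; a minimal sketch of that step:

create or replace table orders as
  select * from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1000"."ORDERS";  -- local copy, since shared objects do not support cloning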

create or replace table orders_bkup clone orders;

6: Time Travel:
A) Drop the table ORDERS_BKUP and note the query id.

B) Restore the ORDERS_BKUP table and note the query id.


C) Delete the data from ORDERS_BKUP and note the query id

D) Restore the data into ORDERS_BKUP using time travel and note the query id and the query used.

Solution:

A) drop table ORDERS_BKUP; --01a25551-0000-2022-0000-0002cc2e60f1

B) UNDROP TABLE ORDERS_BKUP; --01a25552-0000-2023-0000-0002cc2e51f9
SELECT * FROM ORDERS_BKUP; --01a25552-0000-2022-0000-0002cc2e611d (verify the restore)

C) delete from ORDERS_BKUP; --01a25555-0000-2023-0000-0002cc2e526d

D) -- time travel to a timestamp just before the delete was run
select * from ORDERS_BKUP AT (timestamp => '2022-02-15 19:37:48.259 +0000'::timestamp); --01a2555f-0000-2022-0000-0002cc2e6271

-- time travel to 1 minute ago
select * from ORDERS_BKUP AT (offset => -60*1); --01a25560-0000-2022-0000-0002cc2e62a5

-- time travel to the state just before the delete statement from step C
select * from ORDERS_BKUP BEFORE (statement => '01a25555-0000-2023-0000-0002cc2e526d'); --01a25561-0000-2022-0000-0002cc2e62c5
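
The selects above only read the historical rows; a minimal sketch of actually writing them back, reusing the statement id recorded in step C:

insert into ORDERS_BKUP
  select * from ORDERS_BKUP before (statement => '01a25555-0000-2023-0000-0002cc2e526d');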

7: Performance:

Write a query to find the number of orders each customer has made and the total price of those orders from the ORDERS_BKUP table.
Note the total micro-partitions being used and the micro-partitions being scanned.

Increase the size of the WH and note the performance statistics.

Note the difference with and without usage of result cache.

Determine if clustering can improve the performance.

Solution:

CREATE or replace TABLE ORDERS AS
  SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."ORDERS";

CREATE or replace TABLE ORDERS_CLUSTER AS
  SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10"."ORDERS";

ALTER TABLE ORDERS_CLUSTER CLUSTER BY (O_ORDERDATE, O_ORDERSTATUS);

SELECT * FROM ORDERS where (O_ORDERDATE, O_ORDERSTATUS) = ('1992-01-20', 'F'); -- 1.66s
SELECT * FROM ORDERS_CLUSTER where (O_ORDERDATE, O_ORDERSTATUS) = ('1992-01-20', 'F'); -- 1.02s
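
The customer aggregation query itself is not shown above; a minimal sketch of it, plus the warehouse resize and result-cache toggle the task asks about (it assumes ORDERS_BKUP still holds the cloned data):

-- Orders per customer and total order value; micro-partition counts
-- appear in the query profile after running this.
select O_CUSTKEY,
       count(*)          as order_count,
       sum(O_TOTALPRICE) as total_price
from ORDERS_BKUP
group by O_CUSTKEY;

-- Rerun the query on a larger warehouse and compare elapsed times.
alter warehouse SFTRNG set warehouse_size = 'MEDIUM';

-- Disable the result cache for the session so the rerun does real work.
alter session set USE_CACHED_RESULT = FALSE;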
