2020 Bookmatter JumpstartSnowflake

Index

A
Administration
  clustered tables, 124, 125
  database objects, 122, 123
  databases, 118, 119
  data share, 123, 124
  parameters, 121, 122
  warehouses, 117, 118
Advanced Encryption Standard (AES), 136
Agile data warehousing, 237
ALTER WAREHOUSE command, 118
Alteryx, 214
Amazon Web Services (AWS), 20, 229
Analytical ecosystem, 214
Apache MLflow, 215
Apache Spark, 214, 222
  cloud providers, 216
  components, 216, 217
  connector
    data flow process, 218
    key features, 219
    stages, 218
  data scientists, 216
  machine learning, 215
  optimal strategy, 216
  vs. Snowflake, 219, 220
AVRO file
  generation, 169
  JSON sample file, 168
  loading data, 171
  metadata, 169
  schema, 168
  working, 167
AWS Snowball, 245
Azure Data Box, 245
Azure Databricks, 222
  connecting Snowflake, 227
  creation, 223, 224
  data, 227, 228
  delta caching, 226
  environment, 224, 225
  notebook, 226
  Spark cluster, 225, 226

B
Batch method, 197
Bulk data loading
  compression methods, 57
  COPY statement, 54
  encoding, 56
  encryption, 57
  file formats, 55, 56
  staging area, 54

© Dmitry Anoshin, Dmitry Shirokov, Donna Strok 2020
D. Anoshin et al., Jumpstart Snowflake, https://doi.org/10.1007/978-1-4842-5328-1

Bulk data loading (cont.)
  storage locations, 55
  user interface, loading, 66–68
Business intelligence (BI), 1, 230

C
Cloud computing
  deployment models, 10
  key terms, 8
  modern bandwidth, 13
  role of hypervisor, 9
  service models, 11
  Shared responsibility model, AWS, 12
  virtualization, 9
COPY command, 54, 63, 91
COPY INTO table statement, 60
CREATE SHARE command, 123

D
Database management commands, 118
Databricks, 214, 220
  elements, 221
  components, 222
Data clustering, 125
Data Definition Language (DDL), 30
Data Manipulation Language (DML), 30
DataRobot, 214
Data science tool, 198
Data sharing, 186–188, 190, 192
Datasource API, 217
Data system
  lifecycle, 251
  retention period, 252
Data warehouse, 90, 92
Data warehouse as a service (DWaaS), 12, 107
Data warehouse (DW), 1
Data warehouse migration
  architecture, 230
  business analytics, 230, 231
  cloud analytics, 233, 234
  goal, 230
  on-premise analytics, 231–233
  organizational part
    data, 238, 242
    deadlines/budget, 239
    development/deployment process, 237, 238
    documentation, 236
    migration approach, 237, 238
    outcomes, 239
    repoint tools, 243
    run, 243
    security, 240
    Snowflake, 241, 242
    test plan, 240, 243
  overview, 235, 236
  technical part, 244, 245
Delta Lake, 222
Discretionary access control (DAC), 108, 135

E
Extract-load-transform (ELT), 2, 198, 242, 245
Extract-transform-load (ETL) processing, 62, 242, 245

F
File preparation, bulk data
  CSV files, 59
  file sizing, 58
  semistructured data, 60
  splitting files, 58
File staging
  ETL processing, 62
  logical paths, 61
  named stage, 61
  staged data, 62

G, H, I
Google Cloud Platform (GCP), 229
GraphX, 217

J, K
JSON format
  extracting attributes, 165
  FLATTEN function, 166
  NASDAQ, 162
  Snowflake, 163
  SQL, 165
  table, 164
  tree structure, 163

L
Loading data files, 63, 64

M, N
Machine learning, 215
Managed Streaming for Kafka (MSK), 94
Massive parallel processing (MPP), 232
  data management solution, 6
  data mining techniques, 4
  principles, 2
  Redshift, 5
  vs. SMP, 2, 3
  Snowflake, 5
Materialized views (MVs)
  benefit, 127
  data manipulation language, 126
  similarities and differences, 127
Matillion ETL, 198, 200, 245
ML Libraries, 217, 222
Modern solution architecture, 196–198
Multicluster virtual warehouses, 38
Multifactor authentication (MFA), 134

O
Optimized Row Columnar (ORC), 149

P
Parquet file
  creating metadata, 173
  CSV sample file, 172
  PyArrow, 172
  transforming data, 173
  uploading data and copying to target table, 174, 175
  working, 171
Partition pruning, 125
Pattern matching, 64
Penetration testing, 144
Planning
  cloud provider, 20
    limitations, 22
    regions, 20
  pricing model, 22
    cloud storage, 23
    virtual warehouse size, 23
  Snowflake editions, 20
  tools, 24
    JDBC, 25
    ODBC, 25
    SnowSQL, 25
    web interface, 24, 25
Pushdown optimization, 219

Q
Qubole, 214
Querying staged files, 64, 65

R
Real-world project
  big data, 248
  challenges, 246
  DW architecture, 246, 247
  ETL tool, 247
  streaming, 248
  Tableau, 247
Regions, 20
Resource consumption, administration
  data storage, usage, 115
  data transfer, usage, 116
  usage permissions, 113
  VWs, usage, 114
Role-based access control (RBAC), 108, 135
Roles and users, administration
  access control models, 108
  account menu, 113
  create user, 112
  hierarchy, role, 109, 110
  marketing role, 108, 109
  role, commands, 111
  Snowflake account, 109
  user, commands, 111, 112
R Studio, 214

S
Schema-on-read approach, 147
Security reference architecture

Security reference architecture (cont.)
  account and user authentication, 134, 135
  AES, 136, 137
  audit and logging
    history audit functions, 142
    penetration tests, 144
    query history audit logs functions, 139, 143
    Query Profiler, 140
  layers, 130, 131
  network and site access, 133, 134
  object security, 135, 136
  physical security, 133
  validations, 138
  VPC, 132
Semistructured data
  data types, 149, 151
  file formats, 148, 149
  schema-on-read approach, 147
Shared responsibility model (SRM), 12
Snowflake, 197
  account creation, 26, 27
  architecture, 15
  aspects, 14
  cloud providers, 14
  connection, 28, 29
  data sharing
    benefits, 178
    process, 181
  DLL, 244
  ETL processing, 16
  internal/external stages, 220
  JDBC driver, 219
  key layers, 16
  planning (see Planning)
  scalability, 17
  table, 220
    consumer account, 185, 186
    metadata, 184
    steps, 181, 182
    stock data, 183
  VW (see Virtual warehouses (VWs))
  web interface (see Web interface, Snowflake)
  XML, 153
Snowflake partner ecosystem
  connect page, 198, 199
  drivers, 199
  Matillion ETL
    creation, 201, 204, 205
    definition, 200
    key elements, 203
    modern solution architecture, 202
    objects, 200
    tables, 205
  Tableau, 206
    best-of-breed technologies, 206
    connection window, 207, 208
    data, 209
    desktop connection, 206, 207
    sign in, 208
    SQL query, 210
    visualization, 206

Snowflake Spark Connector, 220
Snowpipe, 91
  benefits, 93
  options, 92
Snowpipe Auto-Ingest, 94, 95
  data pipeline, 95–97, 99, 100
    CloudWatch logging, 103
    Kinesis Firehose delivery stream, 102
    PUT statement, 102
    S3 bucket, 102
    SQS, 101
    stream events, 100
    testing, 104
Snowpipe REST API, 104, 105
SnowSQL
  commands, 81, 82
  installation
    curl commands, 70
    Downloads dialog, 70, 71
    introduction screen, 71, 72
    platform-specific versions, 70
    Summary tab, 72, 73
  load data, 86, 87
  multiple sessions, 83, 85
SnowSQL configuration
  connection settings, 74, 75
  variables, 76
    active session, 79, 80
    command line, 78, 79
    config file, 76–78
Spark dataframe, 219
SqlDBM model, 198, 205
SqlDBM tool, 244
Stream method, 197

T
Tableau, 198
Time travel feature
  data retention parameter, 256, 257
  sample table, 257
  table clone, 259
  table creation, 257
  table state, 258
Time travel SQL extension
  process, 254
  statements, 252
  work, 255

U
UNDROP DATABASE, 119
USE ROLE command, 110
USE WAREHOUSE command, 37, 47

V
Virtual private cloud (VPC), 132, 245
Virtual Private Snowflake (VPS), 22
Virtual warehouses (VWs), 15, 114
  building
    creating, 43
    load monitoring, 47–51
    query statuses, 49, 50

Virtual warehouses (VWs) (cont.)
  building (cont.)
    start/resume, suspend, and resize, 45
    TESTWAREHOUSE, 45
    USE WAREHOUSE command, 47
  caching impacts, 42
  multicluster
    choosing minimum and maximum number, 39
    credits and usage, 40
    query design, 41
    scaling, 42
  sizes and features
    choosing right size, 36
    concurrency, 36, 37
    USE WAREHOUSE command, 37

W
Warehouse commands, 117, 118
Web interface, Snowflake
  databases page, 30, 31
  help menu, 33
  history page, 33
  partner connect page, 33
  shares page, 32
  user preferences menu, 34
  warehouses page, 31, 32
  worksheets page, 32

X, Y
XML format, 151
  built-in functions, 153
  choosing warehouse, 154
  downloading source file, 155
  example, 151
  extracting values, 159–161
  file creation, 156
  LATERAL FLATTEN table function, 159
  loading data into table, 154
  selecting load options, 157
  structure, 152
  XMLGET function, 158

Z
Zero-Copy cloning, 119, 121