2020 Bookmatter JumpstartSnowflake
A
Administration
  clustered tables, 124, 125
  database objects, 122, 123
  databases, 118, 119
  data share, 123, 124
  parameters, 121, 122
  warehouses, 117, 118
Advanced Encryption Standard (AES), 136
Agile data warehousing, 237
ALTER WAREHOUSE command, 118
Alteryx, 214
Amazon Web Services (AWS), 20, 229
Analytical ecosystem, 214
Apache MLflow, 215
Apache Spark, 214, 222
  cloud providers, 216
  components, 216, 217
  connector
    data flow process, 218
    key features, 219
    stages, 218
  data scientists, 216
  machine learning, 215
  optimal strategy, 216
  vs. Snowflake, 219, 220
AVRO file
  generation, 169
  JSON sample file, 168
  loading data, 171
  metadata, 169
  schema, 168
  working, 167
AWS Snowball, 245
Azure Data Box, 245
Azure Databricks, 222
  connecting Snowflake, 227
  creation, 223, 224
  data, 227, 228
  delta caching, 226
  environment, 224, 225
  notebook, 226
  Spark cluster, 225, 226

B
Batch method, 197
Bulk data loading
  compression methods, 57
  COPY statement, 54
  encoding, 56
  encryption, 57
  file formats, 55, 56
  staging area, 54
INDEX
E
Extract-load-transform (ELT), 2, 198, 242, 245
Extract-transform-load (ETL) processing, 62, 242, 245

F
File preparation, bulk data
  CSV files, 59
  file sizing, 58
  semistructured data, 60
  splitting files, 58
File staging
  ETL processing, 62
  logical paths, 61
  named stage, 61
  staged data, 62

G, H, I
Google Cloud Platform (GCP), 229
GraphX, 217

J, K
JSON format
  extracting attributes, 165
  FLATTEN function, 166
  NASDAQ, 162
  Snowflake, 163
  SQL, 165
  table, 164
  tree structure, 163

L
Loading data files, 63, 64

M, N
Machine learning, 215
Managed Streaming for Kafka (MSK), 94
Massive parallel processing (MPP), 232
  data management solution, 6
  data mining techniques, 4
  principles, 2
  Redshift, 5
  vs. SMP, 2, 3
  Snowflake, 5
Materialized views (MVs)
  benefit, 127
  data manipulation language, 126
  similarities and differences, 127
Matillion ETL, 198, 200, 245
ML Libraries, 217, 222
Modern solution architecture, 196–198
Multicluster virtual warehouses, 38
Multifactor authentication (MFA), 134

O
Optimized Row Columnar (ORC), 149
P
Parquet file
  creating metadata, 173
  CSV sample file, 172
  PyArrow, 172
  transforming data, 173
  uploading data and copying to target table, 174, 175
  working, 171
Partition pruning, 125
Pattern matching, 64
Penetration testing, 144
Planning
  cloud provider, 20
    limitations, 22
    regions, 20
  pricing model, 22
    cloud storage, 23
    virtual warehouse size, 23
  Snowflake editions, 20
  tools, 24
    JDBC, 25
    ODBC, 25
    SnowSQL, 25
    web interface, 24, 25
Pushdown optimization, 219

Q
Qubole, 214
Querying staged files, 64, 65

R
Real-world project
  big data, 248
  challenges, 246
  DW architecture, 246, 247
  ETL tool, 247
  streaming, 248
  Tableau, 247
Regions, 20
Resource consumption, administration
  data storage, usage, 115
  data transfer, usage, 116
  usage permissions, 113
  VWs, usage, 114
Role-based access control (RBAC), 108, 135
Roles and users, administration
  access control models, 108
  account menu, 113
  create user, 112
  hierarchy, role, 109, 110
  marketing role, 108, 109
  role, commands, 111
  Snowflake account, 109
  user, commands, 111, 112
R Studio, 214

S
Schema-on-read approach, 147
Security reference architecture