Redshift Vs Snowflake - An In-Depth Comparison PDF
REDSHIFT VS SNOWFLAKE
AN IN-DEPTH COMPARISON
Introduction
Primary Use-Cases
Architectural Difference
Performance Differences
Pricing Models
Scalability
Unique Features
Bonus - How to Perform ETL to Redshift and Snowflake?
Introduction
Redshift
Snowflake
Architectural Difference
Redshift
● Integrations
Amazon Redshift integrates with ETL tools such as Hevo, BI reporting
tools such as Power BI, and other analytics tools. Redshift is based on
industry-standard PostgreSQL, so most existing SQL client applications
work with minimal changes.
● Connections
● Clusters
1. Leader Node
The leader node interacts with client programs and handles all
communication with the compute nodes. It works out the steps needed to
obtain a result in the most efficient way and distributes the data across
the compute nodes. It does not store any data itself; it acts as a leader,
instructing the compute nodes on the actions to perform.
2. Compute Nodes
The leader node compiles code for each request and assigns it to the
individual compute nodes. The compute nodes then execute the compiled
code and send their results back to the leader.
Each compute node has its own CPU, memory, and disk storage, which are
determined by the node type chosen in the AWS console or CLI.
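The leader/compute split described above can be pictured with a small Python sketch. It is only a toy model of the pattern, not how Redshift is implemented: the function names `leader_node` and `compute_node` are hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows):
    """Hypothetical compute node: runs the assigned step on its own data."""
    return sum(rows)


def leader_node(node_data):
    """Hypothetical leader: sends the same step to every node, then merges."""
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partial_results = list(pool.map(compute_node, node_data))
    # The leader holds no table data; it only combines what the nodes return.
    return sum(partial_results)


# Three compute nodes, each holding its own portion of a column.
node_data = [[10, 20], [30], [40, 50, 60]]
total = leader_node(node_data)
print(total)
```

Each node works only on the data it holds, and the client sees a single merged result from the leader.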
Node Slices
Internal network
● Databases
Snowflake
Database Storage
When data is loaded into Snowflake tables, it is stored in a compressed,
columnar format that Snowflake optimizes internally. Snowflake uses
Amazon Web Services S3 (Simple Storage Service) cloud storage for this
purpose.
Snowflake manages almost all administrative aspects of how this data is
stored in S3: file size, structure, columnar compression, and the metadata
that describes the stored data. The data objects stored in S3 are not
visible to customers; they can only be accessed through SQL query
operations.
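To see why a columnar layout compresses well, consider the minimal Python sketch below. It is an illustration only: Snowflake's actual file format and compression schemes are internal, and the sample rows, the helper names, and the choice of run-length encoding are all assumptions made for this example.

```python
from itertools import groupby

# Hypothetical sample rows, as an application would produce them.
rows = [
    {"region": "EU", "status": "paid", "amount": 10},
    {"region": "EU", "status": "paid", "amount": 12},
    {"region": "EU", "status": "open", "amount": 7},
    {"region": "US", "status": "open", "amount": 7},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded = {name: run_length_encode(values) for name, values in columns.items()}
print(encoded["region"])
```

Storing each column together places similar values next to each other, so repeated values such as the region collapse into a few pairs.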
Query Processing
All query execution takes place in the processing layer. Queries are
processed using virtual warehouses. A virtual warehouse acts as an
independent cluster that can be dedicated to a separate workload as
required. Snowflake uses AWS EC2 for this purpose. This is a key
feature of Snowflake compared to Redshift, which lacks such a
mechanism.
Cloud Services
Redshift
Primary key and foreign key constraints are informational only; declaring
them is not mandatory in Redshift. However, the query engine uses primary
keys and foreign keys to design an effective query plan, so it is good
practice to declare them. The query planner relies on these relationships
but assumes that all keys in Amazon Redshift tables are valid as loaded,
so integrity constraints need extra care. If the application allows
invalid keys, some queries could return incorrect results.
Amazon Redshift does enforce NOT NULL column constraints. Data
distribution, workload management of queries, data partitioning, node and
cluster configuration, table sorting, and S3 are some of the features it
uses to store and access data efficiently.
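The risk of declaring a key that the data does not honor can be shown with a toy Python model. This is not Redshift's planner; the function `count_distinct_ids` and its `trust_primary_key` flag are hypothetical and only mimic the idea of an optimizer skipping work because it trusts a declared constraint.

```python
def count_distinct_ids(ids, trust_primary_key):
    """Toy planner choice: skip de-duplication when the key is declared unique."""
    if trust_primary_key:
        return len(ids)       # assumes every id appears exactly once
    return len(set(ids))      # checks uniqueness by de-duplicating


# id 2 was loaded twice; an unenforced constraint did not reject it.
loaded_ids = [1, 2, 2, 3]

trusted = count_distinct_ids(loaded_ids, trust_primary_key=True)
checked = count_distinct_ids(loaded_ids, trust_primary_key=False)
print(trusted, checked)
```

The two answers differ only because the loaded data violates the declared key, which is why such keys should be validated before or during loading.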
Snowflake
Snowflake also supports defining constraints but does not enforce them,
except for NOT NULL, as in the case of Redshift. Snowflake supports
constraints on permanent, transient, and temporary tables. Constraints
can be defined on any number of columns of any data type. With
Snowflake Time Travel (data recovery), when previous versions of a table
are recovered, the current version of the constraints on the table is used
because Snowflake does not store the history of metadata.
Snowflake is a zero-management data warehousing service: data
distribution, workloads, node configuration, backups, and most other
tasks related to managing and storing data are either handled by
Snowflake or take a few clicks. Snowflake focuses on analyzing data
rather than managing it. We can create many virtual warehouses and
configure them as needed; they are cost-effective and easy to create.
Performance Differences
Redshift
Snowflake
Note: For a cluster that runs 24 hours a day, Redshift is the best option.
For reporting queries, and when ETL is done only when required, Snowflake
is the better option because you are charged only when you query the data
warehouse.
Pricing Models
Redshift
Redshift charges are based on the number of hours and the number of nodes.
Pricing starts at $0.25 per hour for 160 GB of data. Redshift lets you
choose the hardware specifications to match your requirements, so you can
see how much storage and throughput you get for the money invested.
Source: AWS Redshift Pricing
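As a rough worked example of the hours-times-nodes model, the Python sketch below estimates a monthly bill. The rate is the starting price quoted above; the two-node cluster, the 30-day month, and the function name `redshift_cost` are assumptions for illustration, and actual prices vary by node type and region.

```python
HOURLY_RATE_PER_NODE = 0.25  # USD, the starting price quoted in the text


def redshift_cost(nodes, hours, rate=HOURLY_RATE_PER_NODE):
    """Estimated compute cost: nodes x hours x hourly rate per node."""
    return nodes * hours * rate


# Assumed scenario: a two-node cluster running around the clock for 30 days.
monthly = redshift_cost(nodes=2, hours=24 * 30)
print(f"${monthly:.2f}")
```

Because the cluster is billed for every hour it is up, the cost depends on cluster size and uptime rather than on how many queries are run.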
Snowflake
Snowflake pricing depends largely on the usage pattern. It charges an
hourly rate for each virtual warehouse created. Data storage is
decoupled from compute, so it is charged separately at $0.20 per TB per
month. It offers 7 warehouse sizes. X-Small is the smallest and is
charged at $2 per hour. Snowflake offers a dynamic pricing model:
clusters shut down when not in use and start automatically when needed.
They can also be resized on the fly depending on the workload, which
saves more money.
Source: Snowflake Manual
Selecting the right option depends on your usage patterns. If the cluster
is up and running 24 hours a day (due to ETL or reporting), Redshift is the
better option. If ETL runs once a week and data is queried only on
demand, Snowflake is the better option.
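The rule of thumb above can be checked with a back-of-the-envelope comparison in Python. This is a sketch under strong assumptions: it uses only the two hourly rates quoted in this document, ignores storage charges, assumes a two-node Redshift cluster, and treats that cluster and an X-Small warehouse as interchangeable, which they are not in terms of compute capacity. All function names are hypothetical.

```python
REDSHIFT_RATE = 0.25    # USD per node-hour (figure quoted in the text)
SNOWFLAKE_RATE = 2.00   # USD per hour for an X-Small warehouse (figure quoted in the text)
HOURS_PER_WEEK = 24 * 7


def weekly_redshift_cost(nodes):
    """An always-on cluster is billed for every hour of the week."""
    return nodes * HOURS_PER_WEEK * REDSHIFT_RATE


def weekly_snowflake_cost(active_hours):
    """A warehouse is billed only for the hours it is actually running."""
    return active_hours * SNOWFLAKE_RATE


def cheaper_option(nodes, active_hours):
    """Return the name of the cheaper service for one week of usage."""
    redshift = weekly_redshift_cost(nodes)
    snowflake = weekly_snowflake_cost(active_hours)
    return "Redshift" if redshift <= snowflake else "Snowflake"


print(cheaper_option(nodes=2, active_hours=HOURS_PER_WEEK))  # always-on workload
print(cheaper_option(nodes=2, active_hours=10))              # weekly ETL plus occasional queries
```

Under these assumptions the break-even point is 42 active hours per week; with heavier usage the always-on cluster costs less, and with lighter usage the pay-per-use warehouse does.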
Scalability
Redshift
Consider loading 1 TB of data into the instances below. Data load speeds
increase with the number of nodes defined in the cluster, as shown by
these findings:
● A single-node XL instance takes close to 16 hours.
● A multi-node XL instance of two nodes takes close to 9 hours.
● A multi-node 8XL instance of two nodes takes close to 1.5 hours.
A query runs faster as the number of nodes increases, but performance
does not rise linearly. Redshift is optimized for multi-node clusters,
which make the most of MPP (massively parallel processing).
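The less-than-linear scaling can be quantified from the XL load times listed above. The short Python sketch below computes the speedup and the per-node efficiency; the helper names are hypothetical, and the figures are the approximate ones from the list.

```python
# Approximate hours to load 1 TB, keyed by number of XL nodes (figures from the list above).
load_hours = {1: 16.0, 2: 9.0}


def speedup(baseline_hours, hours):
    """How many times faster a run is than the single-node baseline."""
    return baseline_hours / hours


def efficiency(nodes, baseline_hours, hours):
    """Speedup per node; 1.0 would mean perfectly linear scaling."""
    return speedup(baseline_hours, hours) / nodes


s = speedup(load_hours[1], load_hours[2])
e = efficiency(2, load_hours[1], load_hours[2])
print(f"speedup {s:.2f}x, efficiency {e:.0%}")
```

Doubling the XL nodes gives a speedup of roughly 1.78x rather than 2x, so each node delivers a little under 90% of what perfectly linear scaling would.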
Resizing
Snowflake
Snowflake is easy to use and accessible at almost any scale for all the
users and applications deployed on the cloud. It manages storage,
compute, and metadata separately. Billions of rows of data can be
queried by concurrent users located anywhere. Storage and compute can
be scaled up or down independently, and the metadata service scales up
and down automatically as required.
Unique Features
Redshift
Snowflake
● Full SQL database: It supports DDL, DML, analytical functions,
transactions, and complex joins.
● Variety of data: Snowflake ingests almost all kinds of data, from
traditional or machine-generated sources, without tradeoffs. It
supports both structured data and semi-structured data such as JSON
and Avro.
● No management: Snowflake is a data warehouse as a service
running in the cloud, so there is no infrastructure to manage
or knobs to turn. Snowflake automatically handles infrastructure
requirements, optimization of queries and tables, data distribution,
availability, and data security.
● Performance: Snowflake processes reports and KPIs at very high
speed because of its columnar database engine.
● Broad ecosystem: Snowflake integrates with almost all kinds of
tools in its ecosystem, such as Hevo, Redshift, and BigQuery. Its
connectors include ODBC, JDBC, JavaScript, Python, Spark, R, and
Node.js.
If you want to load data into Redshift and Snowflake without any
hassle, you can try out Hevo. Hevo automates the flow of data from
various sources to Amazon Redshift and Snowflake in real time and with
zero data loss. In addition to migrating data, you can also build
aggregates and joins on Redshift and Snowflake to create materialized
views that enable faster query processing.
Looking for a simple and reliable way to bring Data from
Any Source to AWS Redshift and Snowflake?