Day 2: Redshift Architecture

CREATE TABLE
Creates a new table in the current database. The owner of this table is the issuer of the CREATE TABLE
command. The maximum length for the table name is 127 bytes; longer names are truncated to 127
bytes. You can use UTF-8 multibyte characters up to a maximum of four bytes. Amazon Redshift
enforces a quota on the number of tables per cluster by node type, including user-defined temporary
tables and temporary tables created by Amazon Redshift during query processing or system
maintenance.

Syntax:
CREATE [ [ LOCAL ] { TEMPORARY | TEMP } ] TABLE
[ IF NOT EXISTS ] table_name
( { column_name data_type [ column_attributes ] [ column_constraints ]
  | table_constraints
  | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] }
  [, ... ] )
[ BACKUP { YES | NO } ]
[ table_attributes ]

where column_attributes are:

[ DEFAULT default_expr ]
[ IDENTITY ( seed, step ) ]
[ GENERATED BY DEFAULT AS IDENTITY ( seed, step ) ]
[ ENCODE encoding ]
[ DISTKEY ]
[ SORTKEY ]
[ COLLATE CASE_SENSITIVE | COLLATE CASE_INSENSITIVE ]

and column_constraints are:

[ { NOT NULL | NULL } ]
[ { UNIQUE | PRIMARY KEY } ]
[ REFERENCES reftable [ ( refcolumn ) ] ]

and table_constraints are:

[ UNIQUE ( column_name [, ... ] ) ]
[ PRIMARY KEY ( column_name [, ... ] ) ]
[ FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ] ]

and table_attributes are:

[ DISTSTYLE { AUTO | EVEN | KEY | ALL } ]
[ DISTKEY ( column_name ) ]
[ [ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ... ] ) | [ SORTKEY AUTO ] ]
[ ENCODE AUTO ]
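As a concrete sketch of the syntax above, the following statement creates a hypothetical sales fact table (the table and column names are illustrative, not from the source):

```sql
-- Hypothetical example: a sales table combining an identity column,
-- a default, a table-level primary key, a distribution key, and a
-- compound sort key. All names are illustrative.
CREATE TABLE IF NOT EXISTS sales (
    sale_id     BIGINT IDENTITY(1, 1),
    customer_id INTEGER NOT NULL,
    sale_date   DATE NOT NULL,
    amount      DECIMAL(12, 2) DEFAULT 0,
    PRIMARY KEY (sale_id)
)
DISTSTYLE KEY
DISTKEY (customer_id)
COMPOUND SORTKEY (sale_date, customer_id);
```

Distributing on customer_id co-locates a customer's rows on one slice, while the compound sort key favors range-restricted scans on sale_date.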

Data types
Each value that Amazon Redshift stores or retrieves has a data type with a fixed set of associated
properties. Data types are declared when tables are created. A data typ constrains the set of values that
a column or argument can contain.

- Multibyte characters: The VARCHAR data type supports UTF-8 multibyte characters up to a maximum of four bytes. Five-byte or longer characters are not supported.
- Numeric types: Numeric data types include integers, decimals, and floating-point numbers.
- Character types: Character data types include CHAR (character) and VARCHAR (character varying).
- Datetime types: Datetime data types include DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ.
- Boolean type: Use the BOOLEAN data type to store true and false values in a single-byte column. A Boolean value has three possible states: true, false, and unknown (NULL).
- HLLSKETCH type: Use the HLLSKETCH data type for HyperLogLog sketches. Amazon Redshift supports HyperLogLog sketch representations that are either sparse or dense. Sketches begin as sparse and switch to dense when the dense format becomes more efficient, to minimize the memory footprint that is used.
- SUPER type: Use the SUPER data type to store semistructured data or documents as values.
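The types listed above can be seen together in a single table definition; the sketch below is illustrative (the table and column names are not from the source):

```sql
-- Hypothetical example showing several Redshift data types in one table.
CREATE TABLE event_log (
    event_id    BIGINT,                -- numeric type
    event_name  VARCHAR(256),          -- UTF-8, up to 4-byte multibyte characters
    occurred_at TIMESTAMPTZ,           -- datetime type with time zone
    is_billable BOOLEAN,               -- true/false stored in a single byte
    payload     SUPER                  -- semistructured document values
);
```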

Compression encodings
A compression encoding specifies the type of compression that is applied to a column of data values as
rows are added to a table.

ENCODE AUTO is the default for tables. When ENCODE AUTO is in effect, Amazon Redshift automatically manages compression encoding for all columns in the table; you can also specify the ENCODE AUTO table attribute explicitly to the same effect.

Amazon Redshift automatically assigns compression encoding to columns for which you don't specify
compression encoding as follows:
- Columns that are defined as sort keys are assigned RAW compression.
- Columns that are defined as BOOLEAN, REAL, or DOUBLE PRECISION data types are assigned RAW compression.
- Columns that are defined as SMALLINT, INTEGER, BIGINT, DECIMAL, DATE, TIMESTAMP, or TIMESTAMPTZ data types are assigned AZ64 compression.
- Columns that are defined as CHAR or VARCHAR data types are assigned LZO compression.
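For illustration, the defaults above can also be written out explicitly with the ENCODE column attribute (the table and column names below are hypothetical):

```sql
-- Hypothetical example: spelling out the encodings Redshift would
-- assign by default for these column types.
CREATE TABLE metrics (
    metric_id BIGINT           ENCODE az64,        -- integer types default to AZ64
    label     VARCHAR(64)      ENCODE lzo,         -- CHAR/VARCHAR default to LZO
    reading   DOUBLE PRECISION ENCODE raw,         -- floating-point defaults to RAW
    taken_on  DATE             ENCODE raw SORTKEY  -- sort key columns are assigned RAW
);
```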

Clusters
An Amazon Redshift cluster consists of nodes. Each cluster has a leader node and one or more compute
nodes. The leader node receives queries from client applications, parses the queries, and develops query
execution plans. The leader node then coordinates the parallel execution of these plans with the
compute nodes, aggregates the intermediate results from those nodes, and finally returns the
results to the client applications.

Amazon Redshift offers different node types to accommodate your workloads, and we recommend
choosing RA3 or DC2 depending on the required performance, data size, and expected data growth.

RA3 nodes with managed storage enable you to optimize your data warehouse by scaling and paying for
compute and managed storage independently. With RA3, you choose the number of nodes based on
your performance requirements and only pay for the managed storage that you use. Size your RA3
cluster based on the amount of data you process daily.

DC2 nodes enable you to have compute-intensive data warehouses with local SSD storage included. You
choose the number of nodes you need based on data size and performance requirements. DC2 nodes store
your data locally for high performance, and as the data size grows, you can add more compute nodes to
increase the storage capacity of the cluster.

DS2 nodes enable you to create large data warehouses using hard disk drives (HDDs), but we
recommend using RA3 nodes instead. If you are using eight or more ds2.xlarge nodes, or any number
of ds2.8xlarge nodes, you can upgrade to RA3 to get 2x more storage and improved performance for
the same on-demand cost.
