
TERADATA

Training

By

Umaa S Krishnan
What is Teradata?
Teradata is an RDBMS designed for enterprise data
warehousing.
Massively Parallel Processing (MPP) system
Parallelism throughout the platform
“Shared Nothing” architecture
Linear Scalability
Shared Nothing Software
• Delivers linear scalability
– Maximizes utilization of SMP resources
– To any size configuration
– Allows flexible configurations
– Incremental upgrades
• Linear with a slope of 1 at any size
[Figure: a row of 16 VPROCs (AMPs)]

Copyright 2005 - Teradata, a division of NCR Corporation (01/2005)
Teradata Scales Linearly

• Scaling achieved via ‘shared nothing’ architecture and unconditional parallelism
• Power is in linear scalability where slope = 1
– More nodes
– More work
– More users
– More data

[Figure: Node1 to Node4 connected by the BYNET; work, users, and data grow with the number of nodes]
Retail customer
I/O Utilization – 228 nodes
[Chart: physical I/O in TB per day (axis 0 to 1400), 8/1/2002 through 9/5/2002; series: Phys Disk Wr TB, Phys PreRd TB, Phys PosRd TB]

Peaks nearing 1200 TB per day


The Largest Data Warehouse in the World

• 20,000 users
• 140+ applications
– 70+ TB perm data
– Approx 50 TB raw data
– Any question on any data from any user anytime (within security and privacy constraints)
– Full normalized enterprise model (thousands of tables)
– 300K-450K queries/day
• 60% < 1 second
• 95% < 1 minute

• 314 nodes (as of September ‘04)
– 1048 CPUs
• 616 – Intel 700 MHz CPUs
• 128 – Intel 900 MHz CPUs
• 88 – Intel 2.8 GHz CPUs
• 216 – Intel 3.06 GHz CPUs
– 1256 GB RAM
– 242 TB raw disk – 10,736 drives
– 121 TB Max Perm addressable
– ~32 GB/sec interconnect bandwidth
– ~44 GB/sec I/O bandwidth
– >350 TB/day average I/O
– >650 TB/day max I/O

Shared Nothing - Dividing the Work
• Basis of Teradata scalability
– Each AMP owns an equal slice of the disk
– Only that AMP reads that slice
• No single point of control for any operation
– I/O, Buffers, Locking, Logging, Dictionary
– Nothing centralized
[Figure: each AMP has its own logs, locks, buffers, and I/O]
NODE ARCHITECTURE

PE PE PE

BYNET 0

BYNET 1

AMP AMP AMP AMP AMP AMP

PARSING ENGINE (PE)
The Parsing Engine does three things every time
you run an SQL statement.
• Checks the syntax of your SQL
• Checks the security to make sure you have
access to the table
• Comes up with a plan for the AMPs to follow

PARSING ENGINE – PE (CONTD)
• The PE creates a PLAN that tells the AMPs exactly
what to do in order to get the data.
• The PE knows how many AMPs are in the system,
how many rows are in the table, and the best
way to get to the data.
• Query Optimizer is in the Parsing Engine
• The Parsing Engine verifies SQL requests for
proper syntax, checks security, maintains up to
120 individual user sessions, and breaks down
the SQL requests into steps.
PARSING ENGINE

LOGON

PE

Access Module Processors (AMPs)
• The philosophy of parallel processing revolves around the AMPs.
Teradata takes each table and spreads its rows evenly among all
the AMPs. When data is requested from a particular table, each
AMP retrieves the rows of that table that it holds on its own
disk.
• If the data is spread evenly, each AMP retrieves its rows
simultaneously with the other AMPs.
• That is what we mean when we say Teradata was born to be
parallel.
• The AMPs also perform output conversion, while the PE
performs input conversion.
• The AMPs do the physical work associated with retrieving an
answer set.

AMP CONTD

PE

AMP AMP AMP AMP

Order Order Order Order

Order Item Order Item Order Item Order Item

AMPS

AMP1 AMP2 AMP3

order order order

emp emp emp

Data Management - Bottom Line
• No reorgs
– Don’t even have a reorg utility
• No index rebuilds
• No re-partitioning
• No detailed space management
• Easy database and table definition
• Minimum ongoing maintenance
– All performed automatically

FALLBACK
• A WAY TO ENSURE FAULT TOLERANCE AT THE AMP
LEVEL
• GROUPS OF AMPS MAKE UP A CLUSTER
• EACH ROW IS WRITTEN TO ITS OWN AMP AND TO
ANOTHER AMP IN THE SAME CLUSTER
• THE FALLBACK OPTION IS DEFINED AT TABLE LEVEL
DURING TABLE CREATION

FALLBACK CLUSTER

PE

AMP      AMP      AMP      AMP

Order    Order    Order    Order

O1 O2    O3 O4    O5 O6    O6 O7
O6 O7    O5 O6    O3 O4    O1 O2

FALLBACK CLUSTER
• IF 1 AMP IN A CLUSTER FAILS, NO PROBLEM
• IF MORE THAN 1 AMP IN A CLUSTER FAILS,
THE QUERY WILL BE PROCESSED ONLY IF THE
FAILED AMPS ARE NOT USED IN THE QUERY
AMP      AMP      AMP      AMP

Order    Order    Order    Order

O1 O2    O3 O4    O5 O6    O6 O7
O6 O7    O5 O6    O3 O4    O1 O2
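
The failure rules above can be checked against the diagram with a small simulation. This is only a sketch: it copies the row layout exactly as drawn on the slide, assumes the first row of order IDs under each AMP is primary data and the second row is the fallback copy, and treats the four AMPs as a single cluster. The helper names `available_rows` and `lost_rows` are made up for illustration.

```python
# Layout as drawn on the slide: AMP number -> primary rows and fallback copies.
amps = {
    1: {"primary": {"O1", "O2"}, "fallback": {"O6", "O7"}},
    2: {"primary": {"O3", "O4"}, "fallback": {"O5", "O6"}},
    3: {"primary": {"O5", "O6"}, "fallback": {"O3", "O4"}},
    4: {"primary": {"O6", "O7"}, "fallback": {"O1", "O2"}},
}

def available_rows(failed):
    """Rows that still have at least one copy on an AMP that is up."""
    rows = set()
    for amp_id, amp in amps.items():
        if amp_id not in failed:
            rows |= amp["primary"] | amp["fallback"]
    return rows

def lost_rows(failed):
    """Rows with no surviving copy for the given set of failed AMPs."""
    return available_rows(set()) - available_rows(failed)

print(sorted(lost_rows({2})))     # one AMP down
print(sorted(lost_rows({2, 3})))  # two AMPs down
```

With one AMP down nothing is lost, because every row has a copy elsewhere. With AMPs 2 and 3 down, only queries touching O3, O4, or O5 cannot be answered.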

FALLBACK CLUSTER REVIEW
• IF AMP 1 AND AMP 4 FAIL, GIVE 2 EXAMPLES
OF A QUERY THAT WILL FAIL
• GIVE 2 EXAMPLES OF A QUERY THAT WILL
RETURN VALID ROWS
AMP      AMP      AMP      AMP

Order    Order    Order    Order

O1 O2    O3 O4    O5 O6    O6 O7
O6 O7    O5 O6    O3 O4    O1 O2

CLIQUE
• A group of nodes connected by hardware is
a clique
• It enables fault tolerance in the event of a
node failure
• The AMP processes of the failed node
transfer to the remaining nodes

NODE IN A CLIQUE FAILS

NODE 1 NODE 2

NODE 3 NODE 4
All AMP processes of the failed node migrate to the surviving nodes

Exercise 1
1) Name two Operating Systems that the Teradata Database runs on
----- ---------
2) Which of the following represents a trillion bytes (1 TB) of data ?
a) 10⁶ b) 10⁹ c) 10¹² d) 10⁸

3) Which feature allows Teradata to process large amounts of data quickly?


a) High availability software and hardware components
b) Parallelism
c) Proven Scalability
d) High performance servers from Intel

4) What tasks do Teradata DBAs never have to do? (Select 1)


a) Reorganize data
b) Select Primary Index
c) Restart the System
d) Allocate table space

TERADATA – PRIMARY INDEX
• DATA IS DISTRIBUTED BASED ON THE PRIMARY INDEX
• Teradata's PE examines the Primary Index value of the row.
Teradata takes that Primary Index value and runs it
through a Hashing Algorithm. The output of the
Hashing Algorithm (i.e., a formula) is a 32-bit Row Hash.

• The 32-bit Row Hash will perform two functions:


1. The 32-bit Row Hash will point to a certain spot on the
Hash Map, which will indicate which AMP will hold the
row.
2. The 32-bit Row Hash will always remain with the Row as
part of a Row ID.
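
The path from Primary Index value to AMP can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: `zlib.crc32` stands in for Teradata's own hashing algorithm (which is different), the hash map is assumed to have 65,536 entries selected by the high-order 16 bits of the row hash, and buckets are assigned to AMPs round-robin.

```python
import zlib

NUM_BUCKETS = 65536  # assumed hash-map size, for illustration only
NUM_AMPS = 4         # assumed system size

def row_hash(pi_value) -> int:
    """Stand-in 32-bit hash; Teradata's real hashing algorithm is different."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF

def build_hash_map(num_amps: int) -> list:
    """Hypothetical hash map: every bucket is assigned to an AMP round-robin."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]

def amp_for(pi_value, hash_map: list) -> int:
    """Use the high-order 16 bits of the row hash to pick a hash-map entry."""
    bucket = row_hash(pi_value) >> 16
    return hash_map[bucket]

hash_map = build_hash_map(NUM_AMPS)
for order_no in (1234, 3412, 1241, 922):
    print(order_no, hex(row_hash(order_no)), "-> AMP", amp_for(order_no, hash_map))
```

The same Primary Index value always produces the same row hash, so the row can be found again on the same AMP without searching the others.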

PRIMARY INDEX
1) When a query is issued, the PE looks up the hash map and decides which AMPs participate in
the query
2) The hash value of a given primary index value always remains the same.
3) So if EMP has a PRIMARY INDEX ON NAME, and the hash value of NAME = JOHN points to
3 in the hash map, then the algorithm will always return 3

[Figure: the PE uses the HASH MAP to route the request to one of four AMPs]

**** WHEN YOU ADD AMPS THEN HASH MAP WILL CHANGE ****
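
A short sketch of why the hash value is stable while the hash map is not. As assumptions for illustration, `zlib.crc32` replaces Teradata's hashing algorithm, the hash map has 65,536 entries, and buckets are dealt out round-robin. A real reconfiguration reassigns buckets differently, so the specific AMP numbers printed here mean nothing beyond the example.

```python
import zlib

NUM_BUCKETS = 65536  # assumed number of hash-map entries

def row_hash(value: str) -> int:
    """Stand-in 32-bit hash, not Teradata's actual algorithm."""
    return zlib.crc32(value.encode("utf-8")) & 0xFFFFFFFF

def build_hash_map(num_amps: int) -> list:
    """Hypothetical bucket-to-AMP assignment (round-robin)."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]

def amp_for(value: str, hash_map: list) -> int:
    """Look up the owning AMP from the high-order 16 bits of the row hash."""
    return hash_map[row_hash(value) >> 16]

map_4_amps = build_hash_map(4)  # before adding an AMP
map_5_amps = build_hash_map(5)  # after adding an AMP

for name in ("JOHN", "DAVIS", "GATES"):
    print(name, hex(row_hash(name)),
          "| AMP with 4 AMPs:", amp_for(name, map_4_amps),
          "| AMP with 5 AMPs:", amp_for(name, map_5_amps))
```

The row hash for JOHN is identical in both configurations. Only the map that translates the hash into an AMP differs, which is why rows may have to move when AMPs are added.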

PRIMARY INDEX
• Does not have to be unique
• Is the fastest way of retrieving data
• Choice of primary index is extremely important
• Criteria for Primary Index
- Relatively Non Volatile
- Even Distribution amongst AMPS
- Is frequently used as a selection criterion
- Typically a Date, Integer, Char or Varchar column
CREATE TABLE SYNTAX
CREATE TABLE Order_Table
(Order_No Integer Not Null,
Cust_No Integer Not Null,
Order_Date Date,
Order_Total Decimal(10,2))
Primary Index (Order_No);

ROW DISTRIBUTION
AMP AMP AMP AMP

DAVIS * GATES *
SMITH JOBS
DAVIS * GATES *
KRISHNAN MARKS
WOODS BATES
SACHS

Non Unique Primary Index Using Employee Name

ROW DISTRIBUTION
AMP AMP AMP AMP

MALE*    FEMALE*
MALE*    FEMALE*
MALE*    FEMALE*

Non Unique Primary Index using Employee Gender Code

ROW DISTRIBUTION
AMP AMP AMP AMP

1234 3412 1241 922


4856 8314 4141 12
3243 1111 2121 3123

Non Unique Primary Index using Emp No

ROW DISTRIBUTION
AMP AMP AMP AMP

HR* ENG* MAINT* R&D*


HR* ENG* MAINT* R&D*
HR* ENG* ADMIN* R&D*

Non Unique Primary Index using Dept
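
The effect of the Primary Index choice on row distribution can be estimated with a quick count of rows per AMP. This is a sketch under stated assumptions: the employee rows are hypothetical (the values are borrowed from the diagrams above and combined arbitrarily), four AMPs are assumed as drawn, and `zlib.crc32` modulo the AMP count stands in for the row hash plus hash map lookup.

```python
import zlib
from collections import Counter

NUM_AMPS = 4  # as drawn in the diagrams

def amp_for(pi_value) -> int:
    """Stand-in for row hash + hash map lookup (not Teradata's real algorithm)."""
    return zlib.crc32(str(pi_value).encode("utf-8")) % NUM_AMPS

# Hypothetical employee rows: (emp_no, gender, dept)
employees = [
    (1234, "MALE", "HR"), (3412, "FEMALE", "ENG"), (1241, "MALE", "MAINT"),
    (922, "FEMALE", "R&D"), (4856, "MALE", "HR"), (8314, "FEMALE", "ENG"),
    (4141, "MALE", "MAINT"), (12, "FEMALE", "R&D"), (3243, "MALE", "HR"),
    (1111, "FEMALE", "ENG"), (2121, "MALE", "ADMIN"), (3123, "FEMALE", "R&D"),
]

def rows_per_amp(column: int) -> list:
    """Count how many rows land on each AMP if `column` is the Primary Index."""
    counts = Counter(amp_for(row[column]) for row in employees)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]

for label, column in (("Emp No", 0), ("Gender", 1), ("Dept", 2)):
    print(f"{label:7s} rows per AMP: {rows_per_amp(column)}")
```

A column with only two distinct values, such as gender, can never use more than two AMPs however many the system has. The more distinct values a Primary Index has, the more evenly its rows can spread.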

PRIMARY INDEX
• Every Row ID is made of the row hash
+ a uniqueness value
• UPI (Unique Primary Index) accesses 1 AMP
• UPI returns 0 or 1 row
• NUPI (Non Unique Primary Index) accesses 1
AMP
• NUPI returns 0 to many rows

CREATE TABLE SYNTAX
CREATE TABLE Order_Table
(Order_No Integer Not Null,
Cust_No Integer Not Null,
Order_Date Date,
Order_Total Decimal(10,2))
Primary Index (Order_No)
PARTITION BY Order_Date;

PARTITION ELIMINATION CAN AVOID FULL TABLE SCANS
AMP1                    AMP2
EMP  DEPT  NAME         EMP  DEPT  NAME

99   10    TOM          44   10    JERRY
75   10    MIKE         32   10    MIKE
56   10    UMAA         12   10    ANITA
67   20    JAYRA        45   20    TOM
54   20    ANITA        16   20    SALLY
30   20    SASHA        22   20    SASHA

Partitioned Primary Index: rows are partitioned by Dept
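
Partition elimination can be illustrated with the rows from the table above. The sketch below is a simplified model, not Teradata internals: each AMP keeps its rows grouped by Dept, and a query on one department reads only that group on each AMP. The function names are made up for the example.

```python
from collections import defaultdict

# Rows from the slide, as stored on each AMP: (emp, dept, name)
amp_rows = {
    "AMP1": [(99, 10, "TOM"), (75, 10, "MIKE"), (56, 10, "UMAA"),
             (67, 20, "JAYRA"), (54, 20, "ANITA"), (30, 20, "SASHA")],
    "AMP2": [(44, 10, "JERRY"), (32, 10, "MIKE"), (12, 10, "ANITA"),
             (45, 20, "TOM"), (16, 20, "SALLY"), (22, 20, "SASHA")],
}

def partition_by_dept(rows):
    """Group one AMP's rows by the partitioning column (dept)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[1]].append(row)
    return partitions

partitioned = {amp: partition_by_dept(rows) for amp, rows in amp_rows.items()}

def select_dept(dept):
    """Read only the matching partition on each AMP; count the rows touched."""
    result, rows_read = [], 0
    for partitions in partitioned.values():
        for row in partitions.get(dept, []):
            rows_read += 1
            result.append(row)
    return result, rows_read

rows, rows_read = select_dept(20)
total = sum(len(r) for r in amp_rows.values())
print(rows_read, "rows read instead of", total)  # prints: 6 rows read instead of 12
```

Both AMPs still work in parallel, but each one skips the Dept 10 partition entirely, so half of the table is never read.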

SET AND MULTISET TABLE
• A SET table, declared at table creation, does
not allow duplicate rows
• A MULTISET table allows duplicate rows
• Syntax
Create set table emp (emp_id int,
name varchar(10));
