TD Concepts

TERADATA Training
By Umaa S Krishnan
What is Teradata?
Teradata is an RDBMS designed for enterprise data warehousing.
• Massively Parallel Processing (MPP) system
• Parallelism throughout the platform
• “Shared Nothing” architecture
• Linear scalability
Shared Nothing Software
• Delivers linear scalability
– Maximizes utilization of SMP resources
– To any size configuration
– Allows flexible configurations
– Incremental upgrades
• Linear with a slope of 1 at any size
[Figure: 16 VPROCs (AMPs) side by side]
Copyright 2005 - Teradata, a division of NCR Corporation
Teradata Scales Linearly
• Scaling achieved via ‘shared nothing’ architecture and unconditional parallelism
• Power is in linear scalability, where slope = 1
[Figure: Node1 to Node4 connected by the BYNET; more nodes handle more work, more users, and more data]
Retail customer
I/O Utilization – 228 nodes
[Figure: daily physical I/O in TB (Phys Disk Wr, Phys PreRd, Phys PosRd), 8/1/2002 to 9/5/2002, vertical axis 0 to 1400]
Peaks nearing 1200 TB per day
The Largest Data Warehouse in the World
• 20,000 users
• 140+ applications
– 70+ TB perm data
– Approx 50 TB raw data
– Any question on any data from any user anytime (within security and privacy constraints)
– Fully normalized enterprise model (thousands of tables)
– 300K-450K queries/day
  • 60% < 1 second
  • 95% < 1 minute
• 314 nodes (as of September ’04)
– 1048 CPUs
  • 616 Intel 700 MHz CPUs
  • 128 Intel 900 MHz CPUs
  • 88 Intel 2.8 GHz CPUs
  • 216 Intel 3.06 GHz CPUs
– 1256 GB RAM
– 242 TB raw disk on 10,736 drives
– 121 TB max perm addressable
– ~32 GB/sec interconnect bandwidth
– ~44 GB/sec I/O bandwidth
– >350 TB/day average I/O
– >650 TB/day max I/O
Shared Nothing - Dividing the Work
• Basis of Teradata scalability
– Each AMP owns an equal slice of the disk
– Only that AMP reads that slice
• No single point of control for any operation
– I/O, Buffers, Locking, Logging, Dictionary
– Nothing centralized
[Figure: each AMP has its own logs, locks, buffers, and I/O]
NODE ARCHITECTURE
[Figure: three PEs and six AMPs connected through BYNET 0 and BYNET 1]
PARSING ENGINE (PE)
The Parsing Engine does three things every time
you run an SQL statement.
• Checks the syntax of your SQL
• Checks the security to make sure you have
access to the table
• Comes up with a plan for the AMPs to follow
PARSING ENGINE – PE (CONTD)
• The PE creates a PLAN that tells the AMPs exactly
what to do in order to get the data.
• The PE knows how many AMPs are in the system,
how many rows are in the table, and the best
way to get to the data.
• Query Optimizer is in the Parsing Engine
• The Parsing Engine verifies SQL requests for
proper syntax, checks security, maintains up to
120 individual user sessions, and breaks down
the SQL requests into steps.
PARSING ENGINE
[Figure: a LOGON request arriving at the PE]
Access Module Processors (AMPs)
• The philosophy of parallel processing revolves around the AMPs.
Teradata takes each table and spreads the rows evenly among all
the AMPs. When data is requested from a particular table, each
AMP retrieves the rows of that table that it holds on its own
disk.
• If the data is spread evenly, each AMP retrieves its rows
simultaneously with the other AMPs.
• That is what we mean when we say Teradata was born to be
parallel.
• The AMPs also perform output conversion, while the PE
performs input conversion.
• The AMPs do the physical work associated with retrieving an
answer set.
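The parallel retrieval described above can be sketched as a toy model in Python. This is not Teradata code: the `amps` dictionary, the `scan_amp` and `run_query` helpers, and the sample rows are hypothetical, and threads merely stand in for AMP worker processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each AMP owns its own slice of the Order table.
amps = {
    "AMP1": [{"order_no": 1, "total": 10.0}, {"order_no": 5, "total": 99.0}],
    "AMP2": [{"order_no": 2, "total": 45.5}],
    "AMP3": [{"order_no": 3, "total": 12.0}, {"order_no": 7, "total": 80.0}],
    "AMP4": [{"order_no": 4, "total": 60.0}],
}

def scan_amp(rows, min_total):
    """Each AMP scans only the rows it owns."""
    return [r for r in rows if r["total"] >= min_total]

def run_query(min_total):
    """All AMPs work at the same time; the partial results are then merged."""
    with ThreadPoolExecutor(max_workers=len(amps)) as pool:
        partials = list(pool.map(lambda rows: scan_amp(rows, min_total),
                                 amps.values()))
    merged = [r for part in partials for r in part]
    return sorted(merged, key=lambda r: r["order_no"])

result = run_query(50.0)
print([r["order_no"] for r in result])
```

No AMP ever reads another AMP's rows in this sketch, which is the point of the shared-nothing design: the merge step is the only place where results come together.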
AMP CONTD
[Figure: a PE above four AMPs; each AMP holds rows of the Order and Order Item tables]
AMPS
[Figure: AMP1, AMP2, and AMP3 each hold rows of the order and emp tables]
Data Management - Bottom Line
• No reorgs
– Don’t even have a reorg utility
• No index rebuilds
• No re-partitioning
• No detailed space management
• Easy database and table definition
• Minimum ongoing maintenance
– All performed automatically

FALLBACK
• A WAY TO ENSURE FAULT TOLERANCE AT AMP
LEVEL
• GROUPS OF AMPS MAKE A CLUSTER
• DATA IS WRITTEN ON TO THE AMP AND
ANOTHER AMP IN THE CLUSTER
• FALLBACK OPTION DEFINED AT TABLE LEVEL
DURING TABLE CREATION
FALLBACK CLUSTER
[Figure: a PE above a cluster of four AMPs]
            AMP      AMP      AMP      AMP
Primary     O1 O2    O3 O4    O5 O6    O6 O7
Fallback    O6 O7    O5 O6    O3 O4    O1 O2
FALLBACK CLUSTER
• IF 1 AMP IN A CLUSTER FAILS, NO PROBLEM
• IF MORE THAN 1 AMP IN A CLUSTER FAILS,
THEN A QUERY WILL STILL BE PROCESSED IF THE
FAILED AMPS ARE NOT USED IN THE QUERY
            AMP      AMP      AMP      AMP
Primary     O1 O2    O3 O4    O5 O6    O6 O7
Fallback    O6 O7    O5 O6    O3 O4    O1 O2
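The fallback rule can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the modulo placement in `primary_amp`, the "next AMP in the cluster" choice in `fallback_amp`, and the order numbers do not reproduce the layout in the slide or Teradata's actual placement logic.

```python
# Hypothetical 4-AMP cluster: each row has a primary copy on one AMP
# and a fallback copy on a different AMP in the same cluster.
NUM_AMPS = 4

def primary_amp(order_no):
    # Stand-in for the real hashing step (an assumption for this sketch).
    return order_no % NUM_AMPS

def fallback_amp(order_no):
    # Assumed rule: the fallback copy goes to the next AMP in the cluster.
    return (primary_amp(order_no) + 1) % NUM_AMPS

def is_readable(order_no, failed_amps):
    """A row is readable while at least one of its two copies is on a live AMP."""
    return (primary_amp(order_no) not in failed_amps
            or fallback_amp(order_no) not in failed_amps)

orders = range(1, 9)

# One failed AMP: every row still has a live copy.
print(all(is_readable(o, {0}) for o in orders))

# Two failed AMPs: only rows whose two copies both sit on failed AMPs are lost.
print([o for o in orders if not is_readable(o, {0, 1})])
```

A query that touches only readable rows can still be answered in this model; a query that needs one of the lost rows cannot.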
FALLBACK CLUSTER REVIEW
• IF AMP 1 AND AMP 4 FAIL, GIVE 2 EXAMPLES
OF A QUERY THAT WILL FAIL
• GIVE 2 EXAMPLES OF A QUERY THAT WILL
RETURN VALID ROWS
            AMP      AMP      AMP      AMP
Primary     O1 O2    O3 O4    O5 O6    O6 O7
Fallback    O6 O7    O5 O6    O3 O4    O1 O2
CLIQUE
• A group of nodes connected by hardware is
a clique
• It enables fault tolerance in the event of a
node failure
• The AMP processes of the failed node
transfer to the remaining nodes
NODE IN A CLIQUE FAILS
[Figure: a clique of four nodes, NODE 1 to NODE 4]
All AMP processes of the failed node migrate to the surviving nodes
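The migration of AMP processes within a clique might be modeled as below. This is a sketch under stated assumptions: the `migrate_amps` helper, the AMP names, and the round-robin spreading are hypothetical and are not taken from the slides, which only say that the processes move to the remaining nodes.

```python
def migrate_amps(node_amps, failed_node):
    """Move the failed node's AMP processes to the surviving nodes, round-robin."""
    survivors = sorted(n for n in node_amps if n != failed_node)
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {n: list(node_amps[n]) for n in survivors}
    for i, amp in enumerate(node_amps[failed_node]):
        result[survivors[i % len(survivors)]].append(amp)
    return result

# Hypothetical clique: four nodes, three AMP processes each.
clique = {
    "NODE 1": ["AMP0", "AMP1", "AMP2"],
    "NODE 2": ["AMP3", "AMP4", "AMP5"],
    "NODE 3": ["AMP6", "AMP7", "AMP8"],
    "NODE 4": ["AMP9", "AMP10", "AMP11"],
}

after = migrate_amps(clique, "NODE 2")
for node, amp_list in after.items():
    print(node, amp_list)
```

Every AMP is still running after the failure, so all data stays reachable; the surviving nodes simply carry more AMPs each.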
Exercise 1
1) Name two Operating Systems that the Teradata Database runs on
----- ---------
2) Which of the following represents a trillion bytes (1 TB) of data ?
a) 10⁶ b) 10⁹ c) 10¹² d) 10⁸
3) Which feature allows Teradata to process large amounts of data quickly?

a) High availability software and hardware components
b) Parallelism
c) Proven Scalability
d) High performance servers from Intel
4) What tasks do Teradata DBAs never have to do? (Select 1)

a) Reorganize data
b) Select Primary Index
c) Restart the System
d) Allocate table space
TERADATA – PRIMARY INDEX
• DATA IS DISTRIBUTED BASED ON THE PRIMARY INDEX
• Teradata's PE examines the Primary Index value of the row.
Teradata takes that Primary Index value and runs it
through a Hashing Algorithm. The output of the
Hashing Algorithm (i.e., a formula) is a 32-bit Row Hash.
• The 32-bit Row Hash will perform two functions:

1. The 32-bit Row Hash will point to a certain spot on the
Hash Map, which will indicate which AMP will hold the
row.
2. The 32-bit Row Hash will always remain with the Row as
part of a Row ID.
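The path from Primary Index value to AMP can be sketched in Python. Several details are assumptions made only for this illustration: CRC-32 stands in for Teradata's hashing algorithm, the hash map has 65,536 entries selected by the high-order 16 bits of the row hash, and entries are assigned to AMPs in rotation.

```python
import zlib

NUM_AMPS = 4
HASH_MAP_SIZE = 65536  # number of hash-map entries; an assumption for this sketch

# Hypothetical hash map: each entry names the AMP that owns it.
hash_map = [entry % NUM_AMPS for entry in range(HASH_MAP_SIZE)]

def row_hash(pi_value):
    """32-bit hash of the Primary Index value (CRC-32 is a stand-in)."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF

def amp_for(pi_value):
    """The high-order 16 bits of the row hash select a hash-map entry (assumed)."""
    entry = row_hash(pi_value) >> 16
    return hash_map[entry]

for name in ["JOHN", "SMITH", "DAVIS"]:
    print(name, hex(row_hash(name)), "-> AMP", amp_for(name))
```

Because the hash is a pure function of the Primary Index value, the same value always lands on the same AMP for a given hash map.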
PRIMARY INDEX
1) When a query is issued, the PE looks up the hash map and decides which AMPs participate in
the query.
2) The hash value of a primary index value always remains the same.
3) So if EMP has a primary index on NAME, and the hash value of NAME = JOHN points to AMP 3
in the hash map, then the algorithm will always return AMP 3.
[Figure: the PE consults the HASH MAP to choose among the AMPs]
**** WHEN YOU ADD AMPS THEN HASH MAP WILL CHANGE ****
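The note about adding AMPs can be made concrete with a short Python sketch: the row hash of a value stays fixed, while the hash map that translates it to an AMP is rebuilt. The `build_hash_map` helper, its rotation rule, and the use of CRC-32 are assumptions for illustration only.

```python
import zlib

def row_hash(pi_value):
    # CRC-32 is a stand-in for the real hashing algorithm (assumption).
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF

def build_hash_map(num_amps, entries=65536):
    # Hypothetical map: entries are dealt out to the AMPs in turn.
    return [e % num_amps for e in range(entries)]

def amp_for(pi_value, hash_map):
    # High-order 16 bits of the row hash select the map entry (assumed).
    return hash_map[row_hash(pi_value) >> 16]

names = ["JOHN", "SMITH", "DAVIS", "GATES", "JOBS", "WOODS"]
map_4 = build_hash_map(4)
map_5 = build_hash_map(5)  # the same system after one AMP is added

for name in names:
    # The row hash never changes; only the result of the map lookup may change.
    print(name, "4 AMPs ->", amp_for(name, map_4), "| 5 AMPs ->", amp_for(name, map_5))
```

Rows whose map entry now points to a different AMP have to move, which is why adding AMPs implies redistributing part of the data.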
PRIMARY INDEX
• Does not have to be unique
• Is the fastest way of retrieving data
• Choice of primary index is extremely important
• Criteria for Primary Index
- Relatively Non Volatile
- Even Distribution amongst AMPS
- Is a frequent selection criterion
- Must be a Date, Integer, char or varchar
CREATE TABLE SYNTAX
CREATE TABLE Order_Table
(Order_No Integer Not Null,
Cust_No Integer Not Null,
Order_Date Date,
Order_Total Decimal(10,2))
Primary Index (Order_No);
ROW DISTRIBUTION
AMP AMP AMP AMP
DAVIS * GATES *
SMITH JOBS
DAVIS * GATES *
KRISHNAN MARKS
WOODS BATES
SACHS
Non Unique Primary Index Using Employee Name
ROW DISTRIBUTION
AMP AMP AMP AMP
MALE * FEMALE*
MALE * FEMALE*
MALE* FEMALE*
Non Unique Primary Index using Employee Gender Code
ROW DISTRIBUTION
AMP     AMP     AMP     AMP
1234    3412    1241    922
4856    8314    4141    12
3243    1111    2121    3123
Non Unique Primary Index using Emp No
ROW DISTRIBUTION
AMP     AMP     AMP      AMP
HR*     ENG*    MAINT*   R&D*
HR*     ENG*    MAINT*   R&D*
HR*     ENG*    ADMIN*   R&D*
Non Unique Primary Index using Dept
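The effect of the Primary Index choice on row distribution can be simulated in Python. The employee rows are synthetic, and `amp_for` uses CRC-32 modulo the AMP count as a stand-in for the row hash and hash map; both are assumptions for this sketch.

```python
import zlib
from collections import Counter

NUM_AMPS = 4

def amp_for(pi_value):
    # CRC-32 modulo the AMP count stands in for row hash + hash map (assumption).
    return zlib.crc32(str(pi_value).encode("utf-8")) % NUM_AMPS

# Synthetic employee rows: (emp_no, name, gender)
employees = [(1000 + i, f"NAME{i % 50}", "M" if i % 2 else "F")
             for i in range(1000)]

def rows_per_amp(column):
    """Count how many rows each AMP receives when `column` is the Primary Index."""
    counts = Counter(amp_for(row[column]) for row in employees)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]

print("by emp_no:", rows_per_amp(0))
print("by name:  ", rows_per_amp(1))
print("by gender:", rows_per_amp(2))
```

With only two distinct gender values, at most two AMPs can receive rows, so the other AMPs sit idle. A column with many distinct values gives the hash far more chances to spread the rows.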
PRIMARY INDEX
• Every Row ID is made of the row hash
+ a uniqueness value
• UPI (Unique Primary Index) accesses 1 AMP
• UPI returns 0 or 1 row
• NUPI (Non Unique Primary Index) accesses 1
AMP
• NUPI returns 0 to many rows
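A toy Python model can show how a uniqueness value lets several rows share one row hash, and why a UPI lookup returns at most one row while a NUPI lookup may return many. The `ToyTable` class and the CRC-32 hash are hypothetical stand-ins, not Teradata internals.

```python
import zlib
from collections import defaultdict

def row_hash(pi_value):
    # CRC-32 stands in for Teradata's hashing algorithm (assumption).
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF

class ToyTable:
    """Rows keyed by Row ID = (row hash, uniqueness value)."""

    def __init__(self, unique):
        self.unique = unique               # True models a UPI, False a NUPI
        self.rows = {}                     # Row ID -> (pi_value, row)
        self.last_uniq = defaultdict(int)  # row hash -> last uniqueness value used

    def insert(self, pi_value, row):
        if self.unique and self.lookup(pi_value):
            raise ValueError(f"duplicate UPI value: {pi_value!r}")
        h = row_hash(pi_value)
        self.last_uniq[h] += 1
        row_id = (h, self.last_uniq[h])
        self.rows[row_id] = (pi_value, row)
        return row_id

    def lookup(self, pi_value):
        h = row_hash(pi_value)
        return [row for (rh, _), (pv, row) in self.rows.items()
                if rh == h and pv == pi_value]

orders = ToyTable(unique=True)    # UPI on order number
orders.insert(1001, {"total": 25.0})

emps = ToyTable(unique=False)     # NUPI on department
emps.insert("HR", {"name": "TOM"})
emps.insert("HR", {"name": "ANITA"})

print(len(orders.lookup(1001)), len(orders.lookup(9999)))
print(len(emps.lookup("HR")))
```

The two HR rows get the same row hash but different uniqueness values, so each still has its own Row ID.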
CREATE TABLE SYNTAX
CREATE TABLE Order_Table
(Order_No Integer Not Null,
Cust_No Integer Not Null,
Order_Date Date,
Order_Total Decimal(10,2))
Primary Index (Order_No)
PARTITION BY Order_Date;
PARTITION ELIMINATION CAN AVOID FULL TABLE SCANS
AMP1                    AMP2
EMP   DEPT   NAME       EMP   DEPT   NAME
99    10     TOM        44    10     JERRY
75    10     MIKE       32    10     MIKE
56    10     UMAA       12    10     ANITA
67    20     JAYRA      45    20     TOM
54    20     ANITA      16    20     SALLY
30    20     SASHA      22    20     SASHA
Partitioned Primary Index: rows on each AMP are partitioned by Dept
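Partition elimination can be sketched in Python using the AMP1 rows from the table above. The `full_scan` and `partition_scan` helpers and the dictionary of partitions are hypothetical; they only count how many rows each approach has to read.

```python
from collections import defaultdict

# Rows held by AMP1 in the slide: (emp, dept, name)
amp1_rows = [(99, 10, "TOM"), (75, 10, "MIKE"), (56, 10, "UMAA"),
             (67, 20, "JAYRA"), (54, 20, "ANITA"), (30, 20, "SASHA")]

# Within the AMP, rows are grouped by partition (here: dept).
partitions = defaultdict(list)
for row in amp1_rows:
    partitions[row[1]].append(row)

def full_scan(dept):
    """Without partitioning: every row on the AMP is read."""
    rows_read = len(amp1_rows)
    return [r for r in amp1_rows if r[1] == dept], rows_read

def partition_scan(dept):
    """With partitioning: only the matching partition is read."""
    rows = list(partitions.get(dept, []))
    return rows, len(rows)

rows_a, read_a = full_scan(20)
rows_b, read_b = partition_scan(20)
print("full scan read", read_a, "rows; partition scan read", read_b, "rows")
```

Both approaches return the same rows for dept 20, but the partitioned lookup skips the dept 10 rows entirely.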
SET AND MULTISET TABLE
• A SET table, declared at table creation, ensures that
no duplicate rows are stored
• A MULTISET table allows duplicate rows
• Syntax
Create set table emp (emp_id int,
name varchar(10));
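The difference between the two table kinds can be illustrated with a minimal Python sketch. The `SetTable` and `MultisetTable` classes are hypothetical, and raising an error on a duplicate insert is an assumption of this model rather than a statement about how Teradata reports it.

```python
class SetTable:
    """Rejects a row that matches an existing row in every column."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        if row in self.rows:
            raise ValueError(f"duplicate row rejected: {row}")
        self.rows.append(row)

class MultisetTable:
    """Stores every inserted row, duplicates included."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

emp_set = SetTable()
emp_multi = MultisetTable()

for table in (emp_set, emp_multi):
    table.insert((1, "TOM"))
    try:
        table.insert((1, "TOM"))  # exact duplicate of the first row
    except ValueError as err:
        print(err)

print(len(emp_set.rows), len(emp_multi.rows))
```

The duplicate check compares whole rows, so a SET table pays for a comparison on every insert while a MULTISET table does not.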
