TD SQL1
Teradata has 3 main components: the Parsing Engine (PE), the BYNET, and the AMPs. A query flows through them as follows:
The PE checks the syntax of the query and verifies the user's security rights.
The PE then comes up with the best optimized plan for executing the query.
The PE passes this plan through the BYNET to the AMPs.
The AMPs follow the plan and retrieve the data from their disks.
The AMPs then pass the data back to the PE through the BYNET.
The PE then passes the data to the user.
5. Check how the rows of a table are distributed across AMPs:
SELECT HASHAMP(HASHBUCKET(HASHROW(EmployeeNo))) AS "AMP#",
       EmployeeNo,
       HASHROW(EmployeeNo),
       HASHBUCKET(HASHROW(EmployeeNo))
       --,COUNT(*)
FROM Employee
--GROUP BY 1
ORDER BY 2,1 DESC;
SELECT HASHAMP(HASHBUCKET(HASHROW(EmployeeNo))) AS "AMP#",
       COUNT(*)
FROM Employee
GROUP BY 1
ORDER BY 2,1 DESC;
SELECT *
FROM dbc.allrights
WHERE username='vkonara'
AND databasename='APPL';
10. Find the Table Space Size of your table across all AMPs in Teradata
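One way to do this is to query the DBC.TableSizeV view, which records CurrentPerm per AMP (Vproc). A sketch, where the database name APPL and table name EMPLOYEE are placeholders:

```sql
-- Space used by one table on each AMP (names are placeholders)
SELECT Vproc AS "AMP#",
       CurrentPerm
FROM DBC.TableSizeV
WHERE DatabaseName = 'APPL'
  AND TableName    = 'EMPLOYEE'
ORDER BY 1;

-- Total space for the table across all AMPs
SELECT SUM(CurrentPerm) AS TotalPermBytes
FROM DBC.TableSizeV
WHERE DatabaseName = 'APPL'
  AND TableName    = 'EMPLOYEE';
```

A large spread between the per-AMP CurrentPerm values indicates skewed distribution.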
When asking for an EXPLAIN plan on the Employee table with a WHERE condition on a NUSI:
After collecting stats on the table, run the EXPLAIN plan again.
Now run the query using a column that has no index defined on it.
Because the column used in the query is not part of any index, this results in an all-AMP fetch from the table, and there is NO confidence level.
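As a sketch of the workflow described above (assuming Employee has a NUSI on DepartmentNo; the names are illustrative):

```sql
-- Refresh demographics so the optimizer can judge the NUSI's selectivity
COLLECT STATISTICS ON Employee COLUMN (DepartmentNo);

-- Re-run the EXPLAIN and compare the confidence level in its output
EXPLAIN
SELECT *
FROM Employee
WHERE DepartmentNo = 2;
```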
When you want to retrieve rows from a table, the very first step in the explain plan is the pseudo table lock on that table.
e.g. –
Type SELECT * FROM <databasename>.<tablename> in SQL Assistant and run the EXPLAIN plan for this query. The first steps you see are –
1) First, we lock a distinct <databasename>."pseudo table" for read on a RowHash to prevent global deadlock for <databasename>.<tablename>.
2) Next, we lock <databasename>.<tablename> for read.
We know that retrieving rows from the table requires the read lock, which is taken in step 2. The question is: what is this pseudo table lock in step 1?
We know that each AMP holds a portion of a table. We also know that when a Full Table
Scan is performed that each AMP will read their portion of the table.
Now suppose two different users want to place multiple locks on the same table: one user gets one lock and the other user gets another lock. Each user then requires the lock held by the other and would wait indefinitely to acquire it, because both users are waiting for each other to release a lock. This is called a DEADLOCK.
Note – Teradata selects this "Gatekeeper" AMP by hashing the table name used in the SELECT query and then matching the hash value in the hash map. The AMP number it gets from the hash map is assigned as the "Gatekeeper" AMP.
A Primary Index provides:
Data distribution
A known access path
Improved join performance
Let us now understand what exactly happens when you define a PI on the table.
Unique Primary Index (UPI):-
As the name suggests, a UPI allows only unique values, i.e. no duplicates. Access via a UPI is a one-AMP operation, and data distribution is even. It can contain one NULL value. Syntax for a Unique Primary Index:
CREATE TABLE sample_1
(col_a INT,
col_b VARCHAR(20),
col_c CHAR(4))
UNIQUE PRIMARY INDEX (col_a);
For e.g.: We have an Employee table where EMP_NO is the primary index (we have chosen this because EMP_NO is unique for all employees).
Data distribution using UPI:-
Sample query:-
INSERT INTO DBNAME.EMPLOYEE VALUES (011,'Wilson',20,'2010-10-26',5000);
When a user submits an INSERT for a table with a Primary Index, the following steps occur:
1. The index value goes through a hashing algorithm, which produces a 32-bit row-hash value, something like 0011 0011 0101 0101 0000 0001 0110 0001 for EMP_NO 011.
2. The first 20 bits of this 32-bit row-hash value determine the AMP on which the row will reside. This is decided from the hash map, which contains about one million hash buckets. The hash map looks something like this for a 4-AMP system.
3. The hash bucket is looked up in the hash map, which points to a particular AMP. In our example, suppose the bucket points to AMP 3; that is the AMP where the row will reside.
4. The PE sends the row to that AMP with the hash value attached to it.
5. A uniqueness value is assigned to each row. Since EMP_NO is unique for all rows, in the case of a UPI the uniqueness value is 1 for every row. This will be better understood when we study NUPI.
So, this is how the row can be distributed to the AMP. Same is the process for retrieval.
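The hashing steps above can be observed directly with Teradata's hash functions; a small sketch, using the literal 11 to stand in for EMP_NO 011:

```sql
-- Row hash, hash bucket, and owning AMP for one index value
SELECT HASHROW(11)                      AS RowHash,
       HASHBUCKET(HASHROW(11))          AS HashBucket,
       HASHAMP(HASHBUCKET(HASHROW(11))) AS OwningAmp;
```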
Non-Unique Primary Index (NUPI):-
A NUPI allows duplicates. It can also hold any number of NULL values.
Syntax for NUPI:-
CREATE TABLE sample_1
(col_a INT,
col_b VARCHAR(20),
col_c CHAR(4))
PRIMARY INDEX (col_a);
We will take the same example for understanding NUPI. We have taken EMP_NAME as the NUPI column, as it can contain duplicate values. In our table the employee name Gary appears twice, i.e. it is a duplicate value. Now let us see what happens when the PE receives a duplicate value.
1. The same process of generating the row-hash value is followed.
2. To differentiate between the duplicates, a uniqueness value is added to the hash value.
3. The uniqueness value makes each row distinct from its duplicates. If we had one more employee named Gary, the uniqueness value for him would have been 3.
4. Because the AMP is selected using the hash value, all duplicate values go to the same AMP.
5. Since the duplicates reside on the same AMP, this leads to uneven data distribution and may degrade performance.
Together, the row hash and the uniqueness value form the 64-bit ROW ID, which Teradata uses to uniquely identify each row on a given AMP.
After learning about the Primary Index, the obvious question is: when we already have UPI and NUPI, what is the use of the Secondary Index?
The best answer is that Secondary Indexes provide an alternate path to the data and should be used for queries that run many times.
Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and add overhead, they should only be used for "KNOWN QUERIES" – queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.
Syntax of creating Secondary Index
Syntax of UNIQUE SI:
CREATE UNIQUE INDEX (Column/Columns) ON <dbname>.<tablename >;
Syntax of NON-UNIQUE SI:
CREATE INDEX (Column/Columns) ON <dbname>.<tablename >;
Note – An SI can be created even after the table is populated with data, unlike a PI, which can be defined only at table creation. You can create and drop an SI at any time.
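For example (the database HR and the columns Ssn and Fname are assumed names):

```sql
-- USI on a column assumed to hold unique values
CREATE UNIQUE INDEX (Ssn) ON HR.Employee;

-- NUSI on a frequently searched, non-unique column
CREATE INDEX (Fname) ON HR.Employee;

-- An SI can also be dropped at any time
DROP INDEX (Fname) ON HR.Employee;
```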
Whenever you create an SI on a table, Teradata creates a subtable on every AMP. This subtable contains the three columns given below –
1. Secondary Index value
2. Secondary Index Row ID (the hashed value of the SI value)
3. Base table Row ID (the actual base row id)
We will see the use of these values later in this post.
USI Subtable Example
When we define a UNIQUE SI on a table, Teradata immediately creates a USI subtable on each AMP for that table.
Suppose we have an Employee table (the base table) on which we define a NUSI on the column Fname.
1) Teradata first creates the subtable on all AMPs.
2) Each AMP holds the secondary index values for the rows of the base table on that AMP only. In our example, each AMP holds the Fname values for the employee rows stored on that AMP (AMP-local).
3) Each AMP-local Fname entry carries the base table Row-ID (a pointer) so the AMP can retrieve the row quickly when needed. If an AMP contains duplicate first names, only one subtable row is built for that name, with multiple base Row-IDs. For example, for Fname = 'John' the subtable holds multiple base row ids for this value.
Teradata retrieval of a NUSI query.
Suppose on the above example we make a query –
Select * from Employee_table where Fname = ‘John’;
When a NUSI (Fname) is used in the WHERE clause of an SQL statement, the PE Optimizer recognizes the Non-Unique Secondary Index and performs an all-AMP operation to look into the subtable for the requested value. The steps it performs for retrieval are as follows –
1) It hashes the NUSI value ('John') using the hashing algorithm to find its hash value.
2) It then instructs all AMPs to look for this hash value in their Employee subtables. Note that, unlike a USI, there is no hash map lookup, because each AMP's subtable contains entries for its own base rows only. So this hash-value lookup is performed on every AMP's subtable.
3) Any AMP that does not have this hash value no longer participates in the operation.
4) When the hash value is found, the corresponding base row ids are fetched from the subtable and used for the actual retrieval of the rows.
The point to note here is that a NUSI operation is not the same as an FTS (full table scan).
Suppose Fname were not a NUSI and we still queried on Fname in the WHERE clause. In that case, the Fname values from the Employee table would first be redistributed into SPOOL space, and then the value given in the WHERE clause would be matched against the rows in SPOOL.
With Fname defined as a NUSI, the TD optimizer already knows that this column is a NUSI and that it is already organized by value in the subtable on each AMP. So it skips the redistribution step and instead matches the value directly in each subtable.
The PE decides whether a NUSI is selective enough to be worth using over a Full Table Scan, so it is advisable to always COLLECT STATS on NUSI columns. You can check the EXPLAIN output to see whether the NUSI is being utilized or a full table scan is taking place.
Secondary Index Summary
1) You can have up to 32 secondary indexes for a table.
2) Secondary Indexes provide an alternate path to the data.
3) The two types of secondary indexes are USI and NUSI.
4) Every secondary index defined causes each AMP to create a subtable.
5) USI subtables are hash distributed.
6) NUSI subtables are AMP local.
7) USI queries are Two-AMP operations.
8) NUSI queries are All-AMP operations, but not Full Table Scans.
9) Always Collect Statistics on all NUSI indexes.
A Partitioned Primary Index (PPI) physically splits a table into a series of partitions. With proper use of a Partitioned Primary Index we can save queries from a time-consuming full table scan: instead of scanning the full table, only the relevant partition is accessed.
A query that filters on the partitioning column – for example, on January orders – will not result in a full table scan, because all the January orders are kept together in their partition.
Partition by RANGE
If we add NO RANGE (or NO CASE for CASE_N partitioning), all values that do not fall into any defined range are placed in a single extra partition.
If we specify UNKNOWN, all NULL values are placed in their own partition.
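A minimal sketch of a range-partitioned table (the table, columns and date range are assumptions):

```sql
CREATE TABLE Orders
( OrderNo   INTEGER NOT NULL,
  OrderDate DATE NOT NULL,
  Amount    DECIMAL(10,2)
)
PRIMARY INDEX (OrderNo)
PARTITION BY RANGE_N (
  OrderDate BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
            EACH INTERVAL '1' MONTH,
  NO RANGE, UNKNOWN );
```

A query such as WHERE OrderDate BETWEEN DATE '2024-01-01' AND DATE '2024-01-31' then touches only the January partition.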
There are many ways to generate a query plan for a given SQL, and collecting statistics
ensures that the optimizer will have the most accurate information to create the best access
and join plans.
The optimizing phase of Teradata makes decisions on how to access table data. These decisions can be very important when table joins (especially those involving multiple joins) are required by a query. By default, the Optimizer uses approximations of the number of rows in each table (known as the cardinality of the table) and of the number of unique values in indexes when making its decisions. To build these estimates, the Optimizer picks a random AMP and extrapolates from its data, so the estimates can be significantly off. This can lead to poor choices of join plans, and associated increases in the response times of the queries involved.
One way to help the Optimizer make better decisions is to give it more accurate information
as to the content of the table. This can be done using the COLLECT STATISTICS statement.
When the Optimizer finds that there are statistics available for a referenced table, it will use
those statistics instead of using estimated table cardinality or estimated unique index value
counts.
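For example (HR.Employee and the column names are placeholders):

```sql
-- Collect demographics on a column and on an index
COLLECT STATISTICS ON HR.Employee COLUMN (DepartmentNo);
COLLECT STATISTICS ON HR.Employee INDEX  (EmployeeNo);

-- Review the statistics that have been collected
HELP STATISTICS HR.Employee;
```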
Stats should be collected mainly under the below circumstances:
1. A rule of thumb is to collect statistics when the data has changed by 10%. (That would be 10% more rows inserted, or 10% of the rows deleted, or 10% of the rows changed, or some combination.)
2. The range of values for an index or column of a table for which statistics have been
collected has changed significantly. Sometimes one can infer this from the date and time the
statistics were last collected, or by the very nature of the column (for instance, if the column
in question holds a transaction date, and statistics on that column were last gathered a year
ago, it is almost certain that the statistics for that column are stale).
How are stats built on a table?
TD builds the uniqueness count for each identified column (or set of columns) across the complete table/partition data and stores the information in DBC tables.
Whenever stats are collected again later, the previously collected information is discarded and fresh stats are written to the DBC tables.
The time taken to collect stats therefore does not depend on how frequently or how recently the stats have been collected.
Stats should be collected on all dimensions, history, transactional, reference and aggregate
tables based on the below approach:
1. If the table is loaded under DELETE INSERT mode, then STATS should be collected during
each load.
2. If the table is built under INSERT UPDATE mode, then STATS should be collected if the
data demographics change by more than 10%.
3. If the target is a transactional table loaded in APPEND mode, then STATS should be
collected if the data demographics change by more than 10%.
4. If the table is built under INSERT mode (aggregate tables where data is built for a particular duration and queried by that duration), with partitions built over each aggregation period, STATS should be collected on the new partition even if the data demographics for the entire table change by less than 10%, because user queries or extracts might be built over the data for the current aggregation period.
Starting with Teradata Release 13, tables can be defined without having a primary index.
As we all know, the primary index is the main idea behind an evenly data distribution on a
Teradata system. By design, the primary index ensures that a Teradata system is
unconditionally scalable.
Hence the question: what do tables without a primary index mean, and how are they implemented and fitted into the hashing design of Teradata?
First, some words on how data is distributed in the case of a no-primary-index table: basically, rows are distributed randomly across the AMPs.
As no hashing takes place, but rows have to be identified uniquely, the ROWID is generated
differently from the ROWID of a regular table having a primary index:
As we do not have a hash value, Teradata uses the HASHBUCKET of the responsible AMP
and adds a uniqueness value. As you can conclude, the bytes normally occupied by the hash
value can now be used to increase the range for generating uniqueness values.
This is how No Primary Index Tables are created:
CREATE TABLE <TABLE>
(
PK INTEGER NOT NULL
) NO PRIMARY INDEX
;
There are some further restrictions if you decide to use no primary index tables. Here are the
most important:
Only MULTISET tables can be created
No identity columns can be used
NoPi tables cannot be partitioned with a PPI
No statements with an update character allowed (UPDATE,MERGE INTO,UPSERT)
No Permanent Journal possible
Cannot be defined as Queue Tables
Join Indexes:
A join index joins two tables together and keeps the result set in Teradata's permanent space. The join index holds the result set of the two tables, and at join time the parsing engine decides whether it is faster to build the result set from the actual base tables or from the join index. Users never query the join index directly; in essence, a join index is the pre-joined result of two tables, kept so that the parsing engine can take the result set from the join index instead of performing the join on the base tables.
Types of JOIN index – single-table, multi-table, and aggregate join indexes.
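A minimal multi-table join index sketch (all table and column names are assumptions):

```sql
-- Pre-joined Employee/Department rows kept in permanent space
CREATE JOIN INDEX Emp_Dept_JI AS
SELECT e.EmployeeNo,
       e.FirstName,
       d.DepartmentName
FROM Employee e
INNER JOIN Department d
  ON e.DepartmentNo = d.DepartmentNo
PRIMARY INDEX (EmployeeNo);
```

Queries joining Employee and Department on DepartmentNo may then be satisfied from Emp_Dept_JI without touching the base tables; the optimizer makes that choice, not the user.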
Teradata is designed to reduce the DBA's administrative work when it comes to space management. Space is configured in the following ways in a Teradata system –
1) PERMANENT SPACE
2) SPOOL SPACE
3) TEMPORARY SPACE
1) PERMANENT SPACE – Permanent space is where objects (databases, users, tables) are created and stored. PERM space is distributed evenly across all AMPs. Equal distribution matters because the objects are then spread across all AMPs, and at data-retrieval time all AMPs can work in parallel to fetch the data.
Unlike other relational databases, Teradata does not physically allocate PERM space at object-creation time; instead, it defines an upper limit for PERM space, which objects then use dynamically.
E.g. if a database is defined with 500 GB of PERM space but the actual size of the database is only 300 GB, the remaining 200 GB is available as SPOOL space – there is no need to hold the 200 GB while the database does not require it. When the database needs more space, that 200 GB is released from SPOOL and given back to the database. This mechanism ensures enough space to execute all processes on the Teradata system.
2) SPOOL SPACE – Spool space is the space on the system that has not been allocated. Its primary purpose is to store intermediate results of queries being processed in Teradata. For example, when a conditional query executes, all the qualifying rows that satisfy the given condition are stored in SPOOL space for further processing by the query. Any currently unassigned PERM space is available as SPOOL space.
Defining a SPOOL space limit is not required when Users and Databases are created, but it is highly recommended to define an upper SPOOL limit for every object (user, database) you create. If no upper SPOOL limit is defined for an object, a query processing that object might consume all the space in the system and cause a "runaway transaction".
One of the differences between PERM space and SPOOL space is this –
In PERM space, if we create a CHILD database from a PARENT database, the amount of PERM space for the CHILD database is subtracted from the PARENT's PERM space.
For example a database SYSDBA is allotted 500 GB of PERM space. Now if we create another
CHILD database, say HR, from SYSDBA , and allot 200 GB of PERM space to HR database,
then this 200 GB will be subtracted from the PARENT database SYSDBA. Similarly if we
define another CHILD database SALARY from HR and allot 100 GB PERM space to it, then this
100 GB will be deducted from HR database.
The SPOOL space limit for a CHILD database, by contrast, is not subtracted from its immediate PARENT; instead, the CHILD database's SPOOL limit can be as large as its immediate PARENT's.
In this spool allocation, the CHILD databases HR and SALARY have the same SPOOL limit as their PARENT database SYSDBA.
To define PERM space and SPOOL space on a database we use a statement like the one below –
CREATE DATABASE teradatatech AS PERM = 10000000, SPOOL = 20000000;
3) TEMP SPACE – The amount of space used for Global Temporary Tables is known as TEMP
space. These results remain available to the user until the session is terminated. Tables
created in TEMP space will survive a restart. Permanent space not being used for tables is
available for TEMP space.
Recovery Journal:
The Teradata database uses Recovery Journal to automatically maintain data integrity in the
case of :
An interrupted transaction
An AMP failure
Recovery Journals are created, maintained and purged by the system automatically, so no DBA intervention is required. Recovery journals are tables stored on the storage medium, so they take up disk space on the system.
There are three types of Recovery Journal in Teradata-
1. Transient Journal
2. Down – AMP Recovery Journal
3. Permanent Journal
Now let us look at each of the recovery journals in detail –
Transient Journal
A transient journal maintains data integrity when in-flight transactions are interrupted. Data
is returned to its original state after transaction failure.
A transient journal is used during normal system operation to keep “before images” of
changed rows so the data can be restored to its previous state if the transaction is not
completed. This happens on each AMP as changes occur. When a transaction is started, the
system automatically stores a copy of all the rows affected by the transaction in the
transient journal until the transaction is completed. Once the transaction is completed the
“before images” are purged.
In the event of transaction failure, the “before images” are reapplied to the affected tables
and deleted from the journal, and the “rollback” operation is completed.
Down AMP Recovery Journal
The down AMP recovery journal allows continued system operation while an AMP is down.
A down AMP recovery journal is used with fallback protected tables to maintain a record of
write transactions (updates, creates, inserts, deletes, etc) on the failed AMP while it is
unavailable.
The Down AMP recovery journal starts automatically after the loss of an AMP in a cluster.
Any changes to the data in the failed AMP are logged into the Down AMP recovery journal
by the other AMPs in the cluster. When the failed AMP is brought back online, the restart
process includes applying the changes in the Down – AMP recovery journal to the recovered
AMP.
The journal is discarded once the process is complete, and the AMP is brought online, fully
recovered.
Permanent Journal
Permanent Journals are an optional feature used to provide an additional level of data
protection. You specify the use of permanent journal at the table level. It provides full-table
recovery to a specific point in time. It can also reduce the need for costly and time –
consuming full table backups.
Permanent journals are tables stored on disk array like user data is, so they can take up
additional disk space, on the system. The database administrator maintains the permanent
journal entries (deleting, archiving, and so on).A database can have one permanent journal.
When you create a table with permanent journaling, you must specify what the permanent journal will capture:
Before images – for rollback, to "undo" a set of changes back to a previous state.
After images – for rollforward, to "redo" changes up to a specific state.
Following is the syntax of giving permanent journal –
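As a sketch of those options (the database and table names, and the particular journal choices, are assumptions):

```sql
-- A database that provides a default journal table
CREATE DATABASE mydb AS
  PERM = 10000000,
  DEFAULT JOURNAL TABLE = mydb.journals;

-- A table keeping single before images and dual after images
CREATE TABLE mydb.Employee,
  FALLBACK,
  BEFORE JOURNAL,
  DUAL AFTER JOURNAL
( EmployeeNo INTEGER NOT NULL,
  FirstName  VARCHAR(30)
)
PRIMARY INDEX (EmployeeNo);
```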
Join Methods
Teradata uses different join methods to perform join operations. Some of the commonly
used Join methods are −
Merge Join
Nested Join
Product Join
Merge Join
Merge Join method takes place when the join is based on the equality condition. Merge
Join requires the joining rows to be on the same AMP. Rows are joined based on their row
hash. Merge Join uses different join strategies to bring the rows to the same AMP.
Strategy #1
If the join columns are the primary indexes of the corresponding tables, then the joining
rows are already on the same AMP. In this case, no distribution is required.
When these two tables are joined on EmployeeNo column, then no redistribution takes
place since EmployeeNo is the primary index of both the tables which are being joined.
Strategy #2
Consider the following Employee and Department tables.
If these two tables are joined on the DepartmentNo column, then the rows need to be redistributed, since DepartmentNo is the primary index in one table and a non-primary-index column in the other. In this scenario, the joining rows may not be on the same AMP. In such a case, Teradata may redistribute the Employee table on the DepartmentNo column.
Strategy #3
For the above Employee and Department tables, Teradata may duplicate the Department
table on all AMPs, if the size of Department table is small.
Nested Join
A Nested Join doesn't use all AMPs. For a Nested Join to take place, one of the conditions must be equality on the unique primary index of one table, with that column then joined to an index on the other table.
In this scenario, the system fetches the single row using the Unique Primary Index of one table and uses its row hash to fetch the matching records from the other table. The Nested Join is the most efficient of all join methods.
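A sketch of a query shaped for a nested join, assuming DepartmentNo is the UPI of Department and carries a NUSI on Employee (names are illustrative):

```sql
-- One Department row is located via its UPI; its row hash
-- then probes the Employee index for the matching rows.
SELECT d.DepartmentNo,
       e.FirstName
FROM Department d
INNER JOIN Employee e
  ON d.DepartmentNo = e.DepartmentNo
WHERE d.DepartmentNo = 5;
```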
Product Join
A Product Join compares each qualifying row of one table with each qualifying row of the other table. A product join may take place due to factors such as these −
The join condition is missing.
The join condition is based on an inequality.
The join conditions are combined with OR.
Transient Journal
Teradata uses the Transient Journal to protect data from transaction failures. Whenever a transaction runs, the transient journal keeps a copy of the before images of the affected rows until the transaction succeeds or is rolled back successfully; then the before images are discarded. The transient journal is kept on each AMP. It is an automatic process and cannot be disabled.
Fallback
Fallback protects the table data by storing the second copy of rows of a table on another
AMP called as Fallback AMP. If one AMP fails, then the fallback rows are accessed. With
this, even if one AMP fails, data is still available through fallback AMP. Fallback option can
be used at table creation or after table creation. Fallback ensures that the second copy of
the rows of the table is always stored in another AMP to protect the data from AMP failure.
However, fallback occupies twice the storage and I/O for Insert/Delete/Update.
Down AMP Recovery Journal
The Down AMP recovery journal is activated when the AMP fails and the table is fallback
protected. This journal keeps track of all the changes to the data of the failed AMP. The
journal is activated on the remaining AMPs in the cluster. It is an automatic process and
cannot be disabled. Once the failed AMP is live then the data from the Down AMP recovery
journal is synchronized with the AMP. Once this is done, the journal is discarded.
Cliques
Clique is a mechanism used by Teradata to protect data from Node failures. A clique is
nothing but a set of Teradata nodes that share a common set of Disk Arrays. When a node
fails, then the vprocs from the failed node will migrate to other nodes in the clique and
continue to access their disk arrays.
RAID
Redundant Array of Independent Disks (RAID) is a mechanism used to protect data from
Disk Failures. A disk array consists of a set of disks grouped as a logical unit; it may look like a single unit to the user but may be spread across several disks.
RAID 1 is commonly used in Teradata. In RAID 1, each disk is paired with a mirror disk, and any change to the data on the primary disk is reflected in the mirror copy as well. If the primary disk fails, the data can be accessed from the mirror disk.
Explain
The first step in performance tuning is the use of EXPLAIN on your query. EXPLAIN plan
gives the details of how optimizer will execute your query. In the Explain plan, check for the
keywords like confidence level, join strategy used, spool file size, redistribution, etc.
Collect Statistics
Optimizer uses Data demographics to come up with effective execution strategy. COLLECT
STATISTICS command is used to collect data demographics of the table. Make sure that the
statistics collected on the columns are up to date.
Collect statistics on the columns that are used in WHERE clause and on the columns
used in the joining condition.
Collect statistics on the Unique Primary Index columns.
Collect statistics on Non Unique Secondary Index columns. Optimizer will decide if it
can use NUSI or Full Table Scan.
Collect statistics on the Join Index even though statistics on the base table have been collected.
Collect statistics on the partitioning columns.
Data Types
Make sure that proper data types are used. This will avoid the use of excessive storage than
required.
Conversion
Make sure that the data types of the columns used in join condition are compatible to avoid
explicit data conversions.
Sort
Remove unnecessary ORDER BY clauses unless required.
Primary Index
Make sure that the Primary Index is correctly defined for the table. The primary index
column should evenly distribute the data and should be frequently used to access the data.
SET Table
If you define a SET table, then the optimizer will check if the record is duplicate for each and
every record inserted. To remove the duplicate check condition, you can define Unique
Secondary Index for the table.
UPDATE on Large Table
Updating the large table will be time consuming. Instead of updating the table, you can
delete the records and insert the records with modified rows.
MULTISET Table
If you are sure that the input records will not have duplicate records, then you can define
the target table as MULTISET table to avoid the duplicate row check used by SET table.
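A MULTISET staging-table sketch (the names mirror the load examples that follow, but the column types are assumptions):

```sql
CREATE MULTISET TABLE tduser.Employee_Stg
( EmployeeNo   INTEGER,
  FirstName    VARCHAR(30),
  LastName     VARCHAR(30),
  BirthDate    DATE FORMAT 'YYYY-MM-DD',
  JoinedDate   DATE FORMAT 'YYYY-MM-DD',
  DepartmentNo BYTEINT
)
PRIMARY INDEX (EmployeeNo);
```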
Teradata – BTEQ:
BTEQ utility is a powerful utility in Teradata that can be used in both batch and interactive
mode. It can be used to run any DDL statement, DML statement, create Macros and stored
procedures. BTEQ can be used to import data into Teradata tables from flat file and it can
also be used to extract data from tables into files or reports.
BTEQ Terms
Commonly used BTEQ commands include .LOGON, .LOGOFF, .LABEL, .GOTO, .IF ... THEN, .QUIT, .EXPORT and .IMPORT, along with the status variables ACTIVITYCOUNT (rows affected by the previous query) and ERRORCODE (status of the previous query).
Example
Following is a sample BTEQ script.
.LOGON 192.168.1.102/dbc,dbc;
DATABASE tduser;
SELECT * FROM Employee SAMPLE 1;
.IF ACTIVITYCOUNT <> 0 THEN .GOTO InsertEmployee;
.LABEL InsertEmployee
INSERT INTO employee_bkup
SELECT a.EmployeeNo,
a.FirstName,
a.LastName,
a.DepartmentNo,
b.NetPay
FROM
Employee a INNER JOIN Salary b
ON (a.EmployeeNo = b.EmployeeNo);
Teradata – FastLoad:
The FastLoad utility is used to load data into empty tables. Since it does not use transient journals, data can be loaded quickly. It does not load duplicate rows, even if the target table is a MULTISET table.
Limitation
The target table must not have secondary indexes, join indexes, or foreign key references.
Phase 1
The Parsing Engines read the records from the input file and send a block to each AMP.
Each AMP stores the blocks of records.
Then the AMPs hash each record and redistribute them to the correct AMP.
At the end of Phase 1, each AMP has its rows, but they are not yet in row-hash sequence.
Phase 2
Phase 2 starts when FastLoad receives the END LOADING statement.
Each AMP sorts its records on row hash and writes them to disk.
Locks on the target table are released and the error tables are dropped.
Example
Create a text file with the following records and name the file as employee.txt.
101,Mike,James,1980-01-05,2010-03-01,1
102,Robert,Williams,1983-03-05,2010-09-01,1
103,Peter,Paul,1983-04-01,2009-02-12,2
104,Alex,Stuart,1984-11-06,2014-01-01,2
105,Robert,James,1984-12-01,2015-03-09,3
Following is a sample FastLoad script to load the above file into Employee_Stg table.
LOGON 192.168.1.102/dbc,dbc;
DATABASE tduser;
BEGIN LOADING tduser.Employee_Stg
ERRORFILES Employee_ET, Employee_UV
CHECKPOINT 10;
SET RECORD VARTEXT ",";
DEFINE in_EmployeeNo (VARCHAR(10)),
in_FirstName (VARCHAR(30)),
in_LastName (VARCHAR(30)),
in_BirthDate (VARCHAR(10)),
in_JoinedDate (VARCHAR(10)),
in_DepartmentNo (VARCHAR(02)),
FILE = employee.txt;
INSERT INTO Employee_Stg (
EmployeeNo,
FirstName,
LastName,
BirthDate,
JoinedDate,
DepartmentNo
)
VALUES (
:in_EmployeeNo,
:in_FirstName,
:in_LastName,
:in_BirthDate (FORMAT 'YYYY-MM-DD'),
:in_JoinedDate (FORMAT 'YYYY-MM-DD'),
:in_DepartmentNo
);
END LOADING;
LOGOFF;
FastLoad Terms
Common terms in a FastLoad script include LOGON/LOGOFF, BEGIN LOADING and END LOADING (which bracket the two phases), ERRORFILES (the two error tables), CHECKPOINT (restart interval in rows), SET RECORD (input record format), DEFINE (input field layout and FILE), and INSERT (the DML applied to each record).
Teradata – MultiLoad:
MultiLoad can load multiple tables at a time and can also perform different types of tasks such as INSERT, DELETE, UPDATE and UPSERT. It can load up to 5 tables at a time and perform up to 20 DML operations in one script. The target table is not required to be empty for MultiLoad. It supports two modes:
IMPORT
DELETE
MultiLoad requires a work table, a log table and two error tables in addition to the target table.
Log Table − Maintains the checkpoints taken during the load and the results from each phase of MultiLoad, for restart purposes.
Error Tables − These tables are populated during the load when an error occurs. The first error table stores conversion errors, whereas the second error table stores duplicate records.
Work Table − The MultiLoad script creates one work table per target table. The work table is used to keep the DML tasks and the input data.
Limitation
MultiLoad has some limitations: the target table cannot have a Unique Secondary Index, referential integrity constraints, or triggers.
Create a text file with the following records and name the file as employee.txt.
101,Mike,James,1980-01-05,2010-03-01,1
102,Robert,Williams,1983-03-05,2010-09-01,1
103,Peter,Paul,1983-04-01,2009-02-12,2
104,Alex,Stuart,1984-11-06,2014-01-01,2
105,Robert,James,1984-12-01,2015-03-09,3
The following example is a MultiLoad script that reads records from the employee.txt file and loads them into the Employee_Stg table.
.LOGTABLE tduser.Employee_log;
.LOGON 192.168.1.102/dbc,dbc;
.BEGIN MLOAD TABLES Employee_Stg;
.LAYOUT Employee;
.FIELD in_EmployeeNo * VARCHAR(10);
.FIELD in_FirstName * VARCHAR(30);
.FIELD in_LastName * VARCHAR(30);
.FIELD in_BirthDate * VARCHAR(10);
.FIELD in_JoinedDate * VARCHAR(10);
.FIELD in_DepartmentNo * VARCHAR(02);
.DML LABEL EmployeeDML;
INSERT INTO Employee_Stg
VALUES (:in_EmployeeNo, :in_FirstName, :in_LastName,
        :in_BirthDate, :in_JoinedDate, :in_DepartmentNo);
.IMPORT INFILE employee.txt
FORMAT VARTEXT ','
LAYOUT Employee
APPLY EmployeeDML;
.END MLOAD;
.LOGOFF;
Teradata – FastExport:
FastExport utility is used to export data from Teradata tables into flat files. It can also
generate the data in report format. Data can be extracted from one or more tables using
Join. Since FastExport exports the data in 64K blocks, it is useful for extracting large
volume of data.
Example
Consider an Employee table with the columns EmployeeNo, FirstName, LastName and BirthDate.
Following is an example of a FastExport script. It exports data from employee table and
writes into a file employeedata.txt.
.LOGTABLE tduser.employee_log;
.LOGON 192.168.1.102/dbc,dbc;
DATABASE tduser;
.BEGIN EXPORT SESSIONS 2;
.EXPORT OUTFILE employeedata.txt
MODE RECORD FORMAT TEXT;
SELECT CAST(EmployeeNo AS CHAR(10)),
CAST(FirstName AS CHAR(15)),
CAST(LastName AS CHAR(15)),
CAST(BirthDate AS CHAR(10))
FROM
Employee;
.END EXPORT;
.LOGOFF;
Executing a FastExport Script
Once the script is written and saved as employee.fx, it can be executed with the command fexp < employee.fx.
FastExport Terms
Common terms in a FastExport script include .LOGTABLE, .LOGON/.LOGOFF, .BEGIN EXPORT and .END EXPORT (which bracket the export task), SESSIONS (number of sessions used), and .EXPORT OUTFILE with MODE and FORMAT (output file, record mode and format).