(Diagram: Information Server architecture — Metadata Server with Access Services and Analysis Services, serving the client tier)
(Screenshot: DataStage parallel job log message events)
(Screenshot: repository tree with a user-added folder, the standard Jobs folder, and the standard Table Definitions folder)
● Stages
– Implemented as OSH operators (pre-built components)
– Passive stages (E and L of ETL)
• Read data
• Write data
• E.g., Sequential File, DB2, Oracle, Peek stages
– Processor (active) stages (T of ETL)
• Transform data
• Filter data
• Aggregate data
• Generate data
• Split / Merge data
• E.g., Transformer, Aggregator, Join, Sort stages
● Links
– “Pipes” through which the data moves from stage to stage
(Diagram: the data is split into subsets; each subset flows through its own copy of the operation on a separate node)
(Screenshots: start the clients with the startup command and log into the Information Server Administration console using the console address and the Information Server administrator ID; then create a new user and assign the Suite Administrator, Suite User, DataStage User, and DataStage Administrator roles)
© Copyright IBM Corporation 2009 43
DataStage Credential Mapping
● DataStage credentials
– Required by DataStage in order to log onto a DataStage client
– Managed by the DataStage Server operating system or LDAP
● Users given DataStage Suite roles in the Suite
Administration console do not automatically receive
DataStage credentials
– Users need to be mapped to a user who has DataStage credentials
on the DataStage Server machine
• This DataStage user must have file access permission to the
DataStage engine/project files or Administrator rights on the
operating system
● Three ways of mapping users (Information Server ID → OS ID)
– Many to 1
– 1 to 1
– Combination of the above
(Screenshot: mapping a DataStage user credential — enter the DataStage administrator ID and password, plus the name or IP address of the DataStage server machine)
DataStage Administrator Projects Tab
(Screenshot: the Projects tab lists server projects; click a project to specify its properties — enable runtime column propagation (RCP) by default, set environment variables and user-defined variables, and choose the reporting options Display Score, Display record counts, and Display OSH)
(Screenshots: adding DataStage users — available users and groups must hold a DataStage product User role; select the added user and assign a DataStage role)
(Screenshot: Designer login — host name and port number of the application server, plus the DataStage server machine and project)
(Screenshot: the Designer window — parallel canvas, palette, and the project repository with its default Jobs and Table Definitions folders)
(Screenshot: export — click to select objects from the repository, choose the export type and the export location on the client system, then begin the export)
Import Procedure
(Screenshot: import — display a list to select from; an Information Server domain may contain multiple DataStage systems; choose the previously created export archive; the DataStage repository view shows all DataStage servers and projects)
Information Server Manager – Package View
● Multiple packages per domain
● Multiple builds within each package
● Build history
(Screenshot: the tree shows the domain, its packages, builds, and objects)
(Screenshots: importing a Table Definition for a sequential file — start the import, select the file and the repository folder, set the delimiter, then double-click to define extended properties; the parallel properties panel lists property categories and available properties such as the type, source, and stored Table Definition)
(Screenshots: building a parallel job — stage categories and stages in the palette; open a new window, place Row Generator and Peek stages, set job properties on the Properties tab, double-click a property to specify extended properties, load a Table Definition, view data, and compile)
Extended Properties
(Screenshot: extended properties — specified properties and their values, additional properties to add, and output to the job log)
(Screenshot: the Parameters tab — add a parameter or an environment variable)
● Job Properties
– Short and long descriptions
● Annotation stage
– Added from the Tools Palette
– Displays formatted text descriptions on diagram
(Screenshots: compile and run the documented job — assign values to parameters at run time; in Director, select the job to view its messages in the log view)
Director Log View
(Screenshot: click the notepad icon to view log messages, including Peek messages)
● Sequential
– Fixed or variable length
● Data Set
● Complex flat file
(Diagram: record layout — Field1, Field2, Field3, …, last field, terminated by a newline record delimiter)
(Screenshots: Sequential File stage — stream link and reject link (broken line); on the Output tab specify the file to access, whether column names are in the first row, the record and column formats, and view data; save the layout as a new Table Definition; use wildcards by selecting File Pattern; choose Append or Overwrite on write; map columns, including nullable columns and added properties)
(Screenshots: Data Set stage — stage properties, display the data, and display record counts for each data file in the data viewer)
● Parallel processing:
– Executing the job on multiple CPUs
● Scalable processing:
– Add more resources (CPUs and disks) to increase system
performance
(Diagram: the data is split into subsets; each subset flows through its own copy of the operation on a separate node)
● Keyless partitioning methods
● Round robin
– Rows (… 8 7 6 5 4 3 2 1 0) are dealt out evenly across partitions
– Good for the initial import of data if no other partitioning is needed
– Useful for redistributing data
– Fairly low overhead
● Misplaced records
– Using Aggregator stage to sum customer sales by
customer number
– If there are 25 customers, 25 records should be output
– But suppose records with the same customer numbers
are spread across partitions
• This will produce more than 25 groups (records)
– Solution: Use hash partitioning algorithm
● Partition imbalances
– If all the records are going down only one of the nodes,
then the job is in effect running sequentially
(Example: customer rows (ID, LName, FName, Address) for the Ford and Dodge families distributed across two partitions, Partition 0 and Partition 1)
(Diagram: link icons — a partitioning "fan-out" icon marks a sequential-to-parallel boundary and a collecting icon the reverse; the SAME partitioner preserves the existing partitioning, the AUTO partitioner lets the engine choose, and a re-partition icon is something to watch for; keyed algorithms require a key specification)
Runtime Architecture
(Diagram: at run time the compiled job produces an executable run by the engine)
(Screenshot: Lookup stage components — the source (stream) link, a reference link, lookup constraints, and mappings to output columns)
● Lookup failure actions
– Fail (default)
• Stage reports an error and the job fails immediately
– Drop
• Input row is dropped
– Continue
• Input row is transferred to the output. Reference link columns are
filled with null or default values
– Reject
• Input row sent to a reject link
• This requires that a reject link has been created for the stage
(Screenshots: Lookup stage — select the option to return multiple rows, mark the lookup key column, and note empty-string versus NULL handling; the output of a Lookup with the Drop option omits unmatched source rows)
(Screenshots: range lookup — double-click the reference link to specify the range, select the range columns and comparison operators, choose the range type, and retrieve the other column values, such as a description, for the matching row)
(Screenshots: Join stage — select the left input link, the join type, and the join key column; multiple columns may make up the join key)
● Right outer join: transfers all values from the right link, and values from the left link only where the key columns match; unmatched columns are filled with null or default values.
(Screenshots: Merge stage — set the match key, choose whether to keep or drop unmatched masters, capture update-link non-matches, and output rejects)
(Screenshots: sorting — on the Partitioning tab check Do sort, optionally Preserve non-key ordering and Remove dups, and choose the sort key; the Sort stage exposes the same sort key plus further sort options)
● Sort Utility
– DataStage – the default
– Unix: Don’t use. Slower than DataStage sort utility
● Stable
● Allow duplicates
● Memory usage
– Sorting takes advantage of the available memory for increased
performance
• Uses disk if necessary
– Increasing amount of memory can improve performance
(Example: rows arriving in random key order — 4 X, 1 K, 3 Y, 1 A, 2 P, 3 C, 3 D, 2 L — are sorted on the key column into 1 K, 1 A, 2 P, 2 L, 3 Y, 3 C, 3 D, 4 X)
Aggregator stage
● Count rows
– Count rows in each group
– Put result in a specified output column
● Calculation
– Select column
– Put result of calculation in a specified output column
– Calculations include:
• Sum
• Count
• Min, max
• Mean
• Missing value count
• Non-missing value count
• Percent coefficient of variation
(Screenshots: Aggregator stage — specify the grouping key columns, the calculation aggregation type, and the calculations with their output columns, chosen from the available calculations)
Grouping Methods
● Hash (default)
– Calculations are made for all groups and stored in memory
• Hash table structure (hence the name)
– Results are written out after all input has been processed
– Input does not need to be sorted
– Useful when the number of unique groups is small
• Running tally for each group’s aggregations needs to fit into
memory
● Sort
– Requires the input data to be sorted by the grouping keys
• It does not perform the sort itself; it expects already-sorted input
– Only a single aggregation group is kept in memory
• When a new group is seen, the current group is written out
– Can handle unlimited numbers of groups
(Diagram: with the hash method, running results for every group — 4X; 3Y 3C 3D; 1K 1A; 2P 2L — are held in memory at once regardless of input order; with the sort method the sorted input is processed one group at a time: 1K 1A, then 2P 2L, then 3Y 3C 3D, then 4X)
Remove Duplicates stage
Transformer
(Screenshots: Transformer stage — single input; Show/Hide toggles; constraints; derivations and mappings from input columns, functions, and job parameters to output columns; column definitions)
(Screenshots: reject handling — convert a link to a reject link; check to create an otherwise link, which must be last in the link order; an abort condition can also be specified)
(Screenshots: Parameter Sets — create a new Parameter Set, specify its parameters, name it on the General tab, supply values and a values file name, then view the set and select its parameters in a job)
● ODBC
– Conforms to ODBC 3.5 standard and is Level 3 compliant
– Certified with Oracle, DB2 UDB, SQL Server, and others
– DataDirect drivers
● DB2 UDB
● Teradata
● WebSphere MQ
– WSMQ 5.3 and 6.0 for Client / Server
– WSMB 5.0
(Screenshots: ODBC Connector — click a link to display its properties and columns; a warning indicator flags incomplete settings; key properties include the connection information, the SQL statement (with SQL Builder support), transaction and session management, and Before/After SQL)
(Screenshots: writing with the connector — connection information, the write mode property, and the Locator tab with the table name)
(Screenshots: SQL Builder for SELECT — drag a Table Definition, drag columns, and build column expressions on the Selection tab; the Expression Editor lets you select functions, fill in function parameters, and open a second Expression Editor; choose the first and second columns to sort by, ascending or descending; the generated SQL appears on a read-only SQL tab)
(Screenshots: with SQL Builder, specify the INSERT and UPDATE SQL for the chosen write mode)
SQL Builder INSERT
(Screenshots: SQL Builder for INSERT and UPDATE — drag a Table Definition, drag the columns to load, specify values, and for UPDATE add a WHERE condition to the clause)
(Screenshots: Data Connections — create a new Data Connection, specify parameter values (optionally creating a parameter set), then load or save the Data Connection in a stage)
(Screenshots: schemas — the parallel layout shows the schema corresponding to the Table Definition; save the schema to a file; when a schema file is used, no columns are defined in the stage, only the path to the schema file)
● Project level
– DataStage Administrator Parallel tab
● Job level
– Job properties General tab
● Stage level
– Link Output Column tab
● Settings at a lower level override settings at a higher level
– E.g., disable at the project level, but enable for a given job
– E.g., enable at the job level, but disable a given stage
(Screenshots: enable RCP in the Sequential File stage; check to enable RCP in the Transformer)
(Screenshot: creating a Shared Container — select the components, then click Shared Container)
● Run stages
– Job Activity: Run a job
– Execute Command: Run a system command
– Notification Activity: Send an email
● Flow control stages
– Sequencer: Go if All / Some
– Wait for File: Go when file exists / doesn’t
exist
– StartLoop / EndLoop
– Nested Condition: Go if condition satisfied
● Error handling
– Exception Handler
– Terminator
● Variables
– User Variables
(Screenshots: a job sequence that runs a job, sends an email, and handles exceptions — control passes to the Exception Handler stage when an activity fails or a job aborts; the sequence can be made restartable)
(Screenshots: Job Activity — the job to be executed, its execution mode, and the job parameters to be passed; Execute Command — the executable and the parameters to pass; Notification Activity — optionally include job status information in the email body; Wait for File — the variable, file, and options; trigger conditions connect the stages)
(Screenshots: checkpointing — enable checkpoints to be added for restartability, and optionally mark an individual activity "Don't checkpoint")
Information Management
Course Scenario
Review Questions
Copy data files to disk if not using the virtual machine supplied
with the course materials
Rule Sets
Classes
Default Classes
Threshold Weights
Defines the fields for the output file for the rule set
The format of a Dictionary File entry is:
field-identifier / field-type / field-length / missing-value / description [; comments]
– Field identifier: Any length, no spaces in the name
• Example: HouseNumber
– Field type: C (character), N (numeric)
• This is legacy information but must be entered into the dictionary file
– Field length: Length (in bytes)
– Missing value identifier: S (spaces), Z (zero), X (none)
• This is legacy information but a value must be entered into the dictionary
file
– Description: No spaces allowed in the description
The first line of a Dictionary File must be:
– \FORMAT\ SORT=N
Lookup tables
– Contain information that is specific to the rule set; for example the gender
lookup table for USNAME contains the following:
MRS F
MS F
MR M
– ASCII file that can be manually edited
User override tables
– User classification, input text, input pattern, unhandled text, unhandled
pattern
– Entries added through Designer Standardization Overrides GUI
(Workshop: copy the USNAME rule set to CopyOfUSNAME; add a classification (CLS) override — Apaloosa → Apalusa, class F; add a pattern (PAT) override for the pattern F | +, changing COPY to COPY_A; then use the Rules Analyzer to test)
The Answer
Parsing
Start by having only the space in the SEPLIST and only the space
in the STRIPLIST
Examine the pattern report to see how the current SEPLIST/
STRIPLIST combination parsed the data
Remember, it’s important to thoroughly parse data before
classifying words
– New words can result from additional parsing
For example, if the “/” is not in the SEPLIST, the following pattern
results
VALHALLA/WRAPPER @
The pattern now shows that there are two (currently) unclassified
words separated by a “/”
– There are now two more words we can classify
How would removing the highlighted special characters impact the meaning
of the below data items?
#2 pencil
5# flour
Ave.
$100.00
£100.00
1/2
True or false: It’s best to start with all special characters in the
STRIPLIST.
Exercise 5-1
Refine SEPLIST/STRIPLIST
Modify .PAT to reflect new SEPLIST/STRIPLIST
Re-run Word Investigate stage
Iterate as often as needed
Classification Schema
What to Classify
Result File
Review Questions
Generating Patterns
Once the classification table has been created and the Word
Investigate stage is rerun, patterns will emerge
Review of the Pattern Report will indicate whether additional
classes must be created or new members added to existing classes
Are the patterns explicit enough for you to glean the meaning of
the data based on its context?
– If you can answer yes to this question, you are ready to begin developing
pattern action sets
– If you answer no to this question, you must refine/modify your classification
schema/table until you can answer this question in the affirmative
Result File
Pattern Format
Pattern:
^|D|?|T
Action statements:
COPY [1] {HouseNumber} ; Copy operand 1 to HOUSENUMBER
COPY_A [2] {PreDirection} ; Copy operand 2 w/standardization to PREDIR
COPY_S [3] {StreetName} ; Copy operand 3 w/spaces to STREETNAME
COPY_A [4] {StreetType} ; Copy operand 4 w/standardization to SUFFIXTYPE
^|?|T
COPY [1] {HN} ; Copy operand 1 to HOUSENUMBER
COPY_S [2] {SN} ; Copy operand 2 w/spaces to STREETNAME
COPY_A [3] {ST} ; Copy operand 3 w/standardization to SUFFIXTYPE
^|D|?|T
COPY [1] {HN} ; Copy operand 1 to HOUSENUMBER
COPY_A [2] {PD} ; Copy operand 2 w/standardization to PREDIRECTION
COPY_S [3] {SN} ; Copy operand 3 w/spaces to STREETNAME
COPY_A [4] {ST} ; Copy operand 4 w/standardization to SUFFIXTYPE
User Variables
This pattern action set looks for a date such as 10/15/98 and
copies it as a single token to a date field
*^|/|^|/|^
COPY [1] temp
CONCAT [2] temp
CONCAT [3] temp
CONCAT [4] temp
CONCAT [5] temp ; temp is a temporary variable
COPY temp {DA}
Position-related Commands
Subfield Commands
Subfield Classes - 1 to 9, -1 to -9
– Parses individual words of a ? string
– Johnson Controls Corp whose pattern is ? | C
– 1 | C – Johnson is operand 1 and Corp is operand 2
– -1 | C – Controls is operand 1 and Corp is operand 2
Subfield Ranges - beg:end
– Specifies a range of unknown words
– 123 – A B Main Street whose pattern is ^ | - | ? | T
– ^ | - | (1:2) | T
• “A B Main” is the string specified by ?
• 1:2 is the portion A B
– Operand [3] is A B
• You can also look from the end of the field. For example, (-3:-1) specifies
a range of the third word from the last to the last word
Additional Classes
Universal Class - **
– Matches all tokens
– ** | T would match any pattern that ended with a T class
Negation Class Qualifier - !
– Is used to indicate NOT
– ** | !T would match any pattern that did not end with a T class
Conditionals
Referencing Operands
Calling Subroutines
Review Questions
Result File
Once you’ve run the Standardize stage, you should investigate the
results to see how accurately your rule set is working
Your rule set contains two fields that make this process much
faster and easier
– Unhandled Pattern – shows the pattern of any data the rule set was unable to process
– Unhandled Data – shows the actual unhandled/unprocessed data
Running a Character Concatenate Investigate stage on these two
fields will provide you with a consolidated view of patterns/data
that still need processing
You should also run a Character Discrete Investigate stage on all
your single domain fields to ensure they have domain integrity
and your rule set is working as desired
Review Questions
(Diagram: DB2 runs on Windows, Linux/UNIX, zSeries, and iSeries platforms)
DB2 Express-C
Free, entry-level edition of the DB2 data server
for the developer and partner community
(Diagram: DB2 server components — the SQL/XML/API layer, the database engine, and communication support)
• Product documentation:
http://www.ibm.com/support/docview.wss?rs=71&uid=swg27015148
db2 [option-flag] { db2-command | sql-statement | ? [phrase | message | sql-state | class-code] }
Non-interactive mode
Interactive mode
db2
db2=> connect to musicdb
db2=> select * from syscat.tables
© Copyright IBM Corporation 2010
CLP command options
4. Every session: put the setting from point 3 in the UNIX db2profile,
or in the System Program Group in Windows
Input file example — create.tab (run with: db2 -svtf create.tab):
  connect to sample;
  ...
  commit work;
  connect reset;
Command file example (UNIX: vi seltab; Windows: edit seltab.cmd):
  UNIX:    echo "Table Name Is" $1 > out.sel
           db2 "select * from $1" >> out.sel
  Windows: echo 'Table Name Is' %1 > out.sel
           db2 select * from %1 >> out.sel
Invoke as: seltab org
(Screenshot: the IBM DB2 Control Center — Object Tree pane, Contents pane, and Object Detail pane; the Configuration Assistant configures remote clients)
Configuration Assistant (2 of 2)
(Diagram: one computer system hosting Instance_1 and Instance_2, each with its own database manager configuration (DBM CFG); every database has its own DB CFG; applications connect to a database, which contains table spaces A and B holding Tables 1–4)
• DROP:
– Instance must be stopped and no applications are allowed to be
connected to the databases in this instance
– Does not remove databases (catalog entries)
– Removes the Instance
db2idrop <instance_name>
(Diagram: db2start and db2stop control the instance; the DB2 profile registry holds platform-specific environment variables plus global-level and instance-level profile settings — no restart of the system is needed after changing them)
The db2set command
• Command line tool for DB2 Profile Registry administration
• Displays, sets, resets, or removes profile variables
db2set variable=value
-g
-i instance [db-partition-number]
-n DAS node [[-u user id] [-p password]]
-u
-ul
-ur
-p
-r
-l
-lr
-v
-? (or -h)
-all
Node type = Enterprise Server Edition with local and remote clients
(Diagram: an instance containing database1 and database2, each with its own table spaces — database1 has Tablespace A with Tables 1 and 2 and Tablespace B with Tables 3 and 4, allocated in extents; database2 has Tablespace A with Tables 1 and 2)
(Diagram: on-disk layout of a database created under DB2INSTANCE=inst20 — inst20/NODEnnnn/SQLDBDIR/sqldbdir locates the database; the SQLxxxxx database directory holds SQLDBCONF, SQLBP.1/SQLBP.2, db2rhist.asc/db2rhist.bak, SQLOGCTL.LFH.1/SQLOGCTL.LFH.2, SQLOGMIR.LFH, SQLSGF.1/SQLSGF.2, SQLSPCS.1/SQLSPCS.2, and the SQLOGDIR log directory)
(Diagram: an automatic-storage database dbauto under inst20/NODE0000 — T0000000 holds SYSCATSPACE (C0000000.CAT), T0000001 holds TEMPSPACE1 (C0000000.TMP), and T0000002 holds USERSPACE1 (C0000000.LRG))
(Diagram: the SYSCAT and SYSSTAT views are defined over base catalog tables such as SYSIBM.SYSCOLUMNS and SYSIBM.SYSTABLES)
CREATE BUFFERPOOL syntax (fragment):
  [ALL DBPARTITIONNUMS | DATABASE PARTITION GROUP db-partition-group-name, …]
  [SIZE [number-of-pages] [AUTOMATIC]]                           (default: SIZE 1000)
  [NUMBLOCKPAGES number-of-pages [BLOCKSIZE number-of-pages]]    (default: NUMBLOCKPAGES 0)
  [PAGESIZE integer [K]]
Table space, container, extent, page
(Diagram: a table space is made of containers — files, directories, or devices; containers are divided into extents, striped round-robin across the containers (extents 0 and 2 in Container 0, extents 1 and 3 in Container 1); extents consist of data pages)
Page size → maximum table space size:
  Regular table space: 4 KB → 64 GB; 8 KB → 128 GB; 16 KB → 256 GB; 32 KB → 512 GB
  Large table space:   4 KB → 8 TB;  8 KB → 16 TB;  16 KB → 32 TB;  32 KB → 64 TB
database-containers (syntax fragment):
  USING ( 'container-string', … ) [on-db-partitions-clause]                     (SMS)
  USING ( {FILE | DEVICE} 'container-string' {num-pages | integer {K|M|G}}, … )
        [on-db-partitions-clause]                                               (DMS)
on-db-partitions-clause:
  ON {DBPARTITIONNUM | DBPARTITIONNUMS} ( db-partition-number1 [TO db-partition-number2], … )
• Example: list table spaces ordered by the number of physical reads from table space containers:
SELECT varchar(tbsp_name, 30) as tbsp_name,
member,
tbsp_type,
pool_data_p_reads
FROM TABLE(MON_GET_TABLESPACE('',-2)) AS t
ORDER BY pool_data_p_reads DESC
• Consider EXTENTSIZE
– Trade-off between space (very small tables) and performance
• DEACTIVATE DATABASE
– Databases initialized by ACTIVATE DATABASE can be shut down
using the DEACTIVATE DATABASE command, or using the
db2stop command.
db2 deactivate db <db_name>
(Diagram: two databases, each with its own SYSCATSPACE catalog table space, views, indexes, logs, and DB configuration file; in Database 2, table space TSDMSREG2 holds Index1 and Index2 and TSDMSLRG3 holds BLOBs)
(Diagram: on-disk layout under /path/[DB2INSTANCE]/[NODE0000] — [sqldbdir]; the [SQL00001] database directory with SQLDBCON, SQLDBCONF, SQLINSLK, SQLTMPLK, SQLOGCTL.LFH.1/2, SQLOGMIR.LFH, SQLSGF.1/2, SQLSPCS.1/2, SQLBP.1/2, db2rhist.asc/db2rhist.bak, the [db2event]/[db2detaildeadlock] event monitor files (00000000.EVT, DB2EVENT.CTL), and [SQLOGDIR] holding S0000000.LOG, S0000001.LOG, …; table space directories [T0000000] (C0000000.CAT, SQLCRT.FLG) through [T0000005]; an SMS table space directory with containers [CONT1]/[CONT2], each holding SQLTAG.NAM and data files such as SQL00041.DAT)
connect to musicdb;
create table artists
(artno smallint not null,
name varchar(50) with default 'abc',
classification char(1) not null,
bio clob(100K) logged,
picture blob( 10M) not logged compact)
in dms01
index in dms02
long in dms03;
(Diagram: LOB column types — BLOB for binary data such as pictures, CLOB for large character data, DBCLOB for double-byte character data)
Table partitioning
• Data organization scheme in which table data is divided across
multiple storage objects called data partitions or ranges:
– Each data partition is stored separately
– These storage objects can be in different table spaces, in the same
table space, or a combination of both
• Benefits:
– Easier roll-in and roll-out of table data
– Allows large data roll-in (ATTACH) or roll-out (DETACH) with a
minimal impact to table availability for applications
– Supports very large tables
– Indexes can be either partitioned (local) or non-partitioned (global)
– Table and Index scans can use partition elimination when access
includes predicates for the defined ranges
– Hierarchical Storage Management (HSM) solutions can be used to
manage older, less frequently accessed ranges
Example of a partitioned table
• The PARTITION BY RANGE clause defines a set of data ranges
CREATE TABLE PARTTAB.HISTORYPART ( ACCT_ID INTEGER NOT NULL ,
TELLER_ID SMALLINT NOT NULL ,
BRANCH_ID SMALLINT NOT NULL ,
BALANCE DECIMAL(15,2) NOT NULL ,
……….
TEMP CHAR(6) NOT NULL )
PARTITION BY RANGE (BRANCH_ID)
(STARTING FROM (1) ENDING (20) IN TSHISTP1 INDEX IN TSHISTI1 ,
STARTING FROM (21) ENDING (40) IN TSHISTP2 INDEX IN TSHISTI2 ,
STARTING FROM (41) ENDING (60) IN TSHISTP3 INDEX IN TSHISTI3 ,
STARTING FROM (61) ENDING (80) IN TSHISTP4 INDEX IN TSHISTI4 ) ;
• In this example the data objects and index objects for each data range
are stored in different table spaces
• The table spaces used must be defined with the same options, such as
type of management, extent size and page size
CONNECT TO TESTDB;
CREATE VIEW EMPSALARY
AS SELECT EMPNO, EMPNAME, SALARY
FROM PAYROLL, PERSONNEL
WHERE EMPNO=EMPNUMB AND SALARY > 30000.00;
CONNECT TO SAMPLE;
CREATE ALIAS ADMIN.MUSICIANS FOR ADMIN.ARTISTS;
COMMENT ON Alias ADMIN.MUSICIANS IS 'MUSICIANS
alias for ARTISTS';
CONNECT RESET;
(Diagram: referential integrity — the Department table (DEPT, DEPTNAME) is the parent with PRIMARY KEY DEPT; the Employee table (EMPNO, NAME, WKDEPT) is the dependent with FOREIGN KEY WKDEPT; the RESTRICT rule governs the relationship)
(Diagram: business rules can be enforced either in every application program or once, centrally, in the DB2 database)
Business rules: INSERT, UPDATE, and DELETE actions against a table may require another action or set of actions to occur to meet the needs of the business.
An XML document is built from start tags, end tags, attributes, elements, and data:
<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
  <price>29</price>
  <keywords>
    <keyword>SQL</keyword>
    <keyword>relational</keyword>
  </keywords>
</book>
(Diagram: the corresponding node tree, addressable by XPath — /book, /book/authors, /book/authors/author, /book/authors/author/@id — with values id=47 John Doe, id=58 Peter Pan, the title "Database systems", price 29, and keywords SQL and relational)
Integration of XML and relational capabilities
• Applications combine XML and relational data
• Native XML data type (server and client side)
• XML capabilities in all DB2 components
(Syntax fragment: EXPORT TO file-name OF filetype [LOBS FROM lob-path, …] [MODIFIED BY filetype-mod] [MESSAGES message-file] select-statement)
(Diagram: EXPORT reads the artists table in database MUSICDB into the file artexprt)
(Syntax fragment for IMPORT … CREATE INTO: [ALLOW NO ACCESS] and tblspace-specs — IN tablespace-name [INDEX IN tablespace-name] [LONG IN tablespace-name])
IMPORT command example
• Imports data from a file to a database table
db2 connect to musicdb
db2 import from artexprt of ixf
messages artmsg create into artists
(other import modes: insert, insert_update, replace, replace_create)
• Imports data from a file to three DMS table spaces
db2 connect to musicdb
db2 import from artexprt of ixf
messages artmsg create into artists [(column_list)]
in <tablespace>
index in <indextablespace>
long in <largetablespace>
(Diagram: IMPORT reads the file artexprt into the artists table in database musicdb)
Differences between IMPORT and LOAD
IMPORT LOAD
• Slow when moving large • Faster for large amounts of
amounts of data data
1. LOAD
Load data into tables
Collect index keys / sort
Consistency points at SAVECOUNT
Invalid data rows in dump file; messages in message file
2. BUILD
Indexes created or updated
3. DELETE
Unique Key Violations placed in Exception Table
Messages generated for unique key violations
Deletes Unique Key Violation Rows
4. INDEX COPY
Copy indexes from temp table space to index
table space
(Diagram: LOAD phases — Load, Build, Delete, Index Copy)
LOAD: Build phase
• During the BUILD phase, indexes are created based on the
index keys collected in the Load phase
• The index keys are sorted during the Load phase
• If a failure occurs during this phase, LOAD restarts from the
BUILD phase
LOAD: Delete phase
• During the DELETE phase, all rows that have violated a
unique constraint are deleted
• If a failure occurs, LOAD restarts from the DELETE phase
• Once the database indexes are rebuilt, information about the
rows containing the invalid keys is placed in an exception
table, which must be created before the LOAD
• Finally, any duplicate keys found are deleted
LOAD: Index Copy phase
• The index data is copied from a system temporary table
space to the original table space.
• This will only occur if a system temporary table space was
specified for index creation during a load operation with the
READ ACCESS option specified.
Load data wizard
LOAD syntax (fragment):
  LOAD [CLIENT] FROM {file | pipe | device | cursorname}, …
       OF {ASC | DEL | IXF | CURSOR}
       [MODIFIED BY filetype-mod]
       [SAVECOUNT n] [ROWCOUNT n] [WARNINGCOUNT n] [MESSAGES msg-file]
       [ALLOW NO ACCESS] …
(Diagram: LOAD reads input file calpar.del into table cal.par, writing exception rows to cal.parexp, messages to par.msgs, and rejected records to dump.fil.000)
(Diagram: with ALLOW NO ACCESS, a super-exclusive lock is drained and then held for the whole load, blocking readers and writers until the load commits; with ALLOW READ ACCESS, readers continue against the pre-load data until the final drain at commit. A failed load leaves the table in Load Pending state)
Backup image naming (UNIX and Windows):
Databasealias.Type.Instancename.Nodename.Catnodename.Timestamp.number
SET INTEGRITY syntax (fragment):
  SET INTEGRITY FOR table-name, …
    { IMMEDIATE CHECKED [exception-clause]
    | {ALL | FOREIGN KEY | CHECK} IMMEDIATE UNCHECKED }
exception-clause:
  FOR EXCEPTION { IN table-name USE table-name }, …
(Diagram: BEFORE — SYSCAT.TABLES shows cal.par and cal.for with check-pending STATUS, CONST_CHECKED, and ACCESS_MODE values, and the tables contain rows that violate the primary/foreign key constraints; AFTER — the violating rows have been moved to the exception tables cal.parexp and cal.forexp, each extended with a timestamp and message, and the remaining rows satisfy the constraints)
db2 SET INTEGRITY for
cal.par, cal.for IMMEDIATE CHECKED
FOR EXCEPTION
IN cal.par USE cal.parexp,
IN cal.for USE cal.forexp
(Diagram: online table move — 1. create the triggers, target, and staging tables; 2. COPY phase: rows (c1, c2, …, cn) are copied from the source table to the target table while triggers capture the keys of rows changed by the online workload into the staging table; 3. REPLAY phase: captured INSERTs, UPDATEs, and DELETEs are re-applied — rows with keys present in the staging table are re-copied from the source table; 4. SWAP phase: rename target to source)
ADMIN_MOVE_TABLE procedure methods
• There are two methods of calling ADMIN_MOVE_TABLE:
one specifies how to define the target table; the other names an existing target table.
>>-ADMIN_MOVE_TABLE--(--tabschema--,--tabname--,---------------->
>--data_tbsp--,--index_tbsp--,--lob_tbsp--,--mdc_cols--,-------->
.-,-------.
V |
>--partkey_cols--,--data_part--,--coldef--,----options-+--,----->
>--operation--)------------------------------------------------><
>>-ADMIN_MOVE_TABLE--(--tabschema--,--tabname--,---------------->
.-,-------.
V |
>--target_tabname--,----options-+--,--operation--)-------------><
Example: Move a table to new table space
call SYSPROC.ADMIN_MOVE_TABLE
( 'MDC', 'HIST1', 'MDCTSP2', 'MDCTSPI', 'MDCTSP2', NULL, NULL, NULL, NULL,
'KEEP, REORG',   -- the source table is kept; the target is reorganized
'MOVE' )
KEY VALUE
-------------------------------- --------------------------------------------------
AUTHID INST461
CLEANUP_END 2009-06-01-10.57.55.282196
CLEANUP_START 2009-06-01-10.57.55.050882
COPY_END 2009-06-01-10.57.36.243768
COPY_INDEXNAME HIST1IX1
COPY_INDEXSCHEMA MDC
COPY_OPTS ARRAY_INSERT,CLUSTER_OVER_INDEX
COPY_START 2009-06-01-10.57.25.132774
COPY_TOTAL_ROWS 100000
INDEXNAME MDC.HIST1IX3
INDEXSCHEMA INST461
INDEX_CREATION_TOTAL_TIME 7
INIT_END 2009-06-01-10.57.24.880488
INIT_START 2009-06-01-10.57.21.220930
ORIGINAL HIST1AATOhxo
REPLAY_END 2009-06-01-10.57.53.870351
REPLAY_START 2009-06-01-10.57.36.244234
REPLAY_TOTAL_ROWS 0
REPLAY_TOTAL_TIME 0
STATUS COMPLETE
SWAP_END 2009-06-01-10.57.54.995848
SWAP_RETRIES 0
SWAP_START 2009-06-01-10.57.54.255941
VERSION 09.07.0000
Unit summary
Having completed this unit, you should be able to:
• Discuss the INSERT statement and recognize its limitations
• Explain the differences between IMPORT and LOAD
• Explain the EXPORT, IMPORT, and LOAD syntax
• Create and use Exception Tables and Dump-Files
• Distinguish and resolve Table States:
– Load Pending and Set Integrity Pending
(Diagram: causes of failure — media, operational, hardware, software, and power)
Database objects
DB2 Recovery based on restoring at the Database or Table Space level
(Diagrams: a database is backed up at the database or table space level, together with its containers and logs. 1. Crash recovery replays the active logs after an abnormal termination; 2. Version recovery restores the database from a backup image — for example, the database as of 3 PM; 3. Roll forward recovery restores a backup and then reapplies logs to a point in time or to the end of the logs)
(Diagram: inserts and updates change rows in buffer pool pages and write records to the log buffer; requests for reads and writes go through the buffer pool, and the log buffer is externalized on commit or when it fills. Active logs contain information for non-committed or non-externalized transactions; filled logs are archived manually or automatically, and secondary logs are allocated as needed)
SQLOGCTL.LFH (1 and 2) contains:
— Log configuration
— Log files to be archived
— Active logs
— Start point for crash recovery
Location of log files
(Diagram: by default log files live under the instance directory in NODE0000/SQLOGDIR; the newlogpath parameter moves them to a path of your choice, and mirrorlogpath keeps a second copy, ideally on a separate or remote device)
BACKUP options (fragment): back up database MUSICDB to a directory or device, or to TSM, XBSA, or a vendor backup shared library; buffer size and number of buffers can be specified;
[INCREMENTAL [DELTA]] [UTIL_IMPACT_PRIORITY [priority]]
[{INCLUDE | EXCLUDE} LOGS] [WITHOUT PROMPTING]
BACKUP using the GUI interface
• Tape images are not named, but internally contain the same
information in the backup header for verification purposes
• Backup history provides key information in easy-to-use format
RESTORE options (fragment): restore database MUSICDB from a directory or device, or via LOAD shared-library [OPEN num-sessions SESSIONS]; optionally restore specific table spaces — TABLESPACE (tablespace-name, …) [ONLINE] — or only the HISTORY FILE; … [WITHOUT ROLLING FORWARD] [WITHOUT PROMPTING]
>--+-------------------------------------------------------------------------------------
+-->
| .-ON ALL DBPARTITIONNUMS-. .-USING UTC TIME---. |
+-TO--+-isotime--+------------------------+--+------------------+-+--+--------------+-+
| | '-USING LOCAL TIME-' | +-AND COMPLETE-+ |
| | .-ON ALL DBPARTITIONNUMS-. | '-AND STOP-----' |
| +-END OF BACKUP--+------------------------+-----------------+ |
| '-END OF LOGS--+----------------------------------+---------' |
| '-| On Database Partition clause |-' |
'-+-COMPLETE---------------------------+--+----------------------------------+--------'
+-STOP-------------------------------+ '-| On Database Partition clause |-'
+-CANCEL-----------------------------+
| .-USING UTC TIME---. |
'-QUERY STATUS--+------------------+-'
'-USING LOCAL TIME-'
>--+-------------------------------------------------------+---->
'-TABLESPACE--+-ONLINE--------------------------------+-'
| .-,---------------. |
| V | |
'-(----tablespace-name-+--)--+--------+-'
'-ONLINE-'
>--+------------------------------------------------------------------------+-->
'-OVERFLOW LOG PATH--(--log-directory--+----------------------------+--)-'
'-,--| Log Overflow clause |-'
>--+------------+----------------------------------------------->
'-NORETRIEVE-'
(Log files are named Sxxxxxxx.LOG, numbered from S0000000.LOG to S9999999.LOG)
(Table: with circular logging, only OFFLINE backups are possible, roll forward is not provided, and currency after a media failure is limited to the last backup; with archive logging, backups may be OFFLINE or ONLINE and recovery extends to the latest committed work)
• NO SCRIPTING REQUIRED!
– One set of embedded scripts that are used by all cluster managers.
(Diagram: HADR — clients connect over the network to the primary DB2 database server; a direct database-to-database TCP/IP link ships logs to the standby DB2 database server)