
Session I

Ab Initio – An Introduction
What Does Ab Initio Mean?
• Ab Initio is a Latin phrase that means:
• Of, relating to, or occurring at the
beginning; first
• From first principles, in scientific circles
• From the beginning, in legal circles
Ab Initio Platforms
• No problem is too big or too small for Ab Initio.
Ab Initio runs on a few processors or a few
hundred processors. Ab Initio runs on virtually
every kind of hardware:
• SMP (Symmetric Multiprocessor) systems
• MPP (Massively Parallel Processor) systems
• Clusters
• PCs
Ab Initio runs on many operating
systems
• Compaq Tru64 UNIX
• Digital UNIX
• Hewlett-Packard HP-UX
• IBM AIX
• NCR MP-RAS
• Red Hat Linux
• IBM/Sequent DYNIX/ptx
• Siemens Pyramid Reliant UNIX
• Silicon Graphics IRIX
• Sun Solaris
• Windows NT and Windows 2000
Ab Initio base software consists
of two main pieces:

• Ab Initio Co>Operating System and core components
• Graphical Development Environment (GDE)
Anatomy of a Running Job

What happens when you push the “Run” button?

• Your graph is translated into a script that can be executed in the
Shell Development Environment.
• This script and any metadata files stored on the GDE client machine
are shipped (via FTP) to the server.
• The script is invoked (via REXEC or TELNET) on the server.
• The script creates and runs a job that may run across many nodes.
• Monitoring information is sent back to the GDE client.
Anatomy of a Running Job

• Host Process Creation
– Pushing the “Run” button generates a script.
– The script is transmitted to the Host node.
– The script is invoked, creating the Host process.

[Diagram: Client, Host, and Processing nodes, showing the GDE and Host processes]


Anatomy of a Running Job
• Agent Process Creation
– The Host process spawns Agent processes.

[Diagram: Client, Host, and Processing nodes, showing the GDE, Host, and Agent processes]


Anatomy of a Running Job
• Component Process Creation
– Agent processes create Component processes on each processing node.

[Diagram: Client, Host, and Processing nodes, showing the GDE, Host, and Agent processes]


Anatomy of a Running Job
• Component Execution
– Component processes do their jobs.
– Component processes communicate directly with datasets and each
other to move data around.

[Diagram: Client, Host, and Processing nodes, showing the GDE, Host, and Agent processes]


Anatomy of a Running Job

• Successful Component Termination
– As each Component process finishes with its data, it exits with
success status.

[Diagram: Client, Host, and Processing nodes, showing the GDE, Host, and Agent processes]


Anatomy of a Running Job

• Agent Termination
– When all of an Agent’s Component processes exit, the Agent informs
the Host process that those components are finished.
– The Agent process then exits.

[Diagram: Client, Host, and Processing nodes, showing the GDE and Host processes]


Anatomy of a Running Job

• Host Termination
– When all Agents have exited, the Host process informs the
GDE that the job is complete.
– The Host process then exits.

[Diagram: Client, Host, and Processing nodes, showing the GDE and Host processes]
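
The job lifecycle described on the preceding slides (Host process, then Agents, then Components, followed by termination in the reverse order) can be imitated in a few lines of Python. This is only an illustrative sketch: the function `run_job`, the node names, and the log messages are hypothetical, and threads stand in for what are really operating-system processes spread across machines. It is not how the Co>Operating System is implemented.

```python
import threading

def run_job(nodes, components_per_node):
    """Simulate the Host -> Agent -> Component lifecycle with threads.

    Every name here is hypothetical; threads stand in for real processes.
    Returns the list of lifecycle events in the order they were recorded.
    """
    log = []
    lock = threading.Lock()

    def record(event):
        with lock:
            log.append(event)

    def component(node, idx):
        record(f"component {node}.{idx} started")
        # A real component would read, transform, and write data here.
        record(f"component {node}.{idx} exited with success")

    def agent(node):
        record(f"agent {node} started")
        workers = [threading.Thread(target=component, args=(node, i))
                   for i in range(components_per_node)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()  # wait until every component on this node has exited
        record(f"agent {node} reported completion to host")
        record(f"agent {node} exited")

    record("host started")
    agents = [threading.Thread(target=agent, args=(n,)) for n in nodes]
    for a in agents:
        a.start()
    for a in agents:
        a.join()  # the host waits for all agents before finishing
    record("host reported job complete to GDE")
    record("host exited")
    return log

events = run_job(["node1", "node2"], components_per_node=2)
```

The order of component events varies from run to run, but an agent never exits before its own components, and the host is always the last to exit.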


Session II
Ab Initio – Parallelism
Symbols
• Boxes for processing and data transforms
• Arrows for data flows between processes
• Cylinders for serial I/O files
• Divided cylinders for parallel I/O files
• Grid boxes for database tables


Parallelism

• Component parallelism

• Pipeline parallelism

• Data parallelism
Component Parallelism
[Diagram: two graph branches, “Sorting Customers” and “Sorting Transactions”]
Component Parallelism
• Comes “for free” with graph programming.

• Limitation:
– Scales to the number of “branches” in a graph.
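
As a rough illustration of component parallelism, the sketch below runs two independent sorts (customers and transactions) at the same time, one per branch. The sample records and the helper `sort_by_id` are hypothetical, and Python threads are used only to show the structure; they are not a claim about how Ab Initio schedules components.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data standing in for two independent inputs.
customers = [{"id": 3, "name": "Cy"}, {"id": 1, "name": "Al"}, {"id": 2, "name": "Bo"}]
transactions = [{"id": 20, "amount": 5.0}, {"id": 10, "amount": 7.5}]

def sort_by_id(records):
    """One 'component': sort a dataset by its id field."""
    return sorted(records, key=lambda r: r["id"])

# Two branches, so at most two components can run at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    customers_future = pool.submit(sort_by_id, customers)
    transactions_future = pool.submit(sort_by_id, transactions)
    sorted_customers = customers_future.result()
    sorted_transactions = transactions_future.result()
```

Adding more workers would not help here: with only two independent branches there is nothing for a third worker to do, which is the limitation noted above.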
Pipeline Parallelism
[Diagram: two components in one flow, “Processing Record: 100” and “Processing Record: 99”]
Pipeline Parallelism
• Comes “for free” with graph programming.

• Limitations:
– Scales to the length of “branches” in a graph.
– Some operations, like sorting, do not pipeline.
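
A minimal sketch of pipeline parallelism, assuming each stage runs in its own thread and hands records downstream through a queue. While a later stage handles one record, the earlier stage can already be working on the next. The names `stage`, `run_pipeline`, and `DONE` are hypothetical and chosen for this example only.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the record stream

def stage(func, inbox, outbox):
    """Apply func to each record as it arrives and pass the result downstream."""
    while True:
        record = inbox.get()
        if record is DONE:
            outbox.put(DONE)  # tell the next stage that the stream has ended
            return
        outbox.put(func(record))

def run_pipeline(records, funcs):
    """Chain one thread per function, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for r in records:
        queues[0].put(r)
    queues[0].put(DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

labelled = run_pipeline(range(1, 4), [lambda x: x * 2, lambda x: f"record {x}"])
```

A sort cannot be dropped into such a chain as a streaming stage, because it must see its last input record before it can emit its first output record.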
Data Parallelism

[Diagram label: Partitions]
Two Ways of Looking at
Data Parallelism
Expanded View:

Global View:
Data Parallelism
• Scales with data.

• Requires data partitioning.

• Different partitioning methods for different operations.
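
To make the point about partitioning methods concrete, here is a hedged sketch of two generic schemes: round-robin, which spreads records evenly, and partitioning by key, which keeps all records for one key together so a per-key total can be computed inside a single partition. The function names, the sample records, and the use of `crc32` as the hash are assumptions made for this example, not Ab Initio component behavior.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

def partition_round_robin(records, n):
    """Deal records out evenly: record i goes to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, record in enumerate(records):
        parts[i % n].append(record)
    return parts

def partition_by_key(records, n, key):
    """Send every record with the same key to the same partition."""
    parts = [[] for _ in range(n)]
    for record in records:
        # crc32 is stable across runs, unlike the built-in hash() on strings.
        parts[crc32(str(key(record)).encode("utf-8")) % n].append(record)
    return parts

def total_by_customer(partition):
    """Per-partition work: sum the amounts for each customer in this partition."""
    totals = Counter()
    for customer, amount in partition:
        totals[customer] += amount
    return totals

# Hypothetical (customer, amount) records.
records = [("alice", 10), ("bob", 5), ("alice", 7),
           ("carol", 1), ("bob", 2), ("alice", 3)]

even_parts = partition_round_robin(records, 3)
keyed_parts = partition_by_key(records, 3, key=lambda r: r[0])

# Each partition is processed independently; the results are then gathered.
with ThreadPoolExecutor(max_workers=3) as pool:
    per_partition = list(pool.map(total_by_customer, keyed_parts))

merged = Counter()
for totals in per_partition:
    merged.update(totals)
```

Round-robin suits operations that treat each record on its own, while key-based partitioning suits operations such as per-customer totals, where related records must meet in the same partition.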
Data Partitioning
Expanded View:

Global View:
Data Partitioning:
The Global View
Degree of Parallelism

Fan-out Flow
