
Runtime behavior of REFORMAT

The n in outn gives each out port a unique number. Each outn port has a corresponding
rejectn and errorn port. Reformat does the following:

1. The component reads records from the in port.

2. If you specify an expression for the select parameter, the expression filters the records on the in
port:

• If the expression evaluates to 0 for a particular record, Reformat does not process the
record, which means that the record does not appear on any output port.
• If the expression produces NULL for any record, Reformat writes a descriptive error
message and stops execution of the graph.
• If the expression evaluates to anything other than 0 or NULL for a particular record,
Reformat processes the record.

3. If you do not specify an expression for the select parameter, Reformat processes all the records on
the in port.
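
The select behavior in steps 2 and 3 can be sketched in Python. This is an illustration only, not Ab Initio DML: the names apply_select and SelectError are hypothetical, records are modelled as dicts, and NULL is modelled as None.

```python
from typing import Callable, Iterable, Iterator, Optional


class SelectError(RuntimeError):
    """Stands in for the graph stopping when select yields NULL (None here)."""


def apply_select(
    records: Iterable[dict],
    select: Optional[Callable[[dict], Optional[int]]] = None,
) -> Iterator[dict]:
    """Yield the records that Reformat would go on to process."""
    for record in records:
        if select is None:
            # Step 3: no select expression, so every record is processed.
            yield record
            continue
        result = select(record)
        if result is None:
            # NULL result: descriptive error, execution stops.
            raise SelectError(f"select produced NULL for record {record!r}")
        if result != 0:
            # Anything other than 0 or NULL: the record is processed.
            yield record
        # A result of 0 drops the record; it reaches no output port.


rows = [{"amount": 5}, {"amount": 0}, {"amount": 12}]
kept = list(apply_select(rows, lambda r: r["amount"] > 3))
```

Here kept holds the records with amounts 5 and 12; the record with amount 0 is silently dropped.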

4. Reformat passes each record to the transform functions, calling them in port order, beginning
with out port 0 and progressing through out port count - 1. If you specify a value for either
output-index or output-indexes, that value determines which ports' transform functions are called
for the record.

The evaluation of the transform functions takes place within each partition of a Reformat
running in parallel, which means that evaluations of later transform functions can depend on the
results of the evaluations of earlier transform functions, such as modification of global variables or
use of functions such as next_in_sequence.

If you do not specify a transform function for a particular out port, Reformat uses default
record assignment. (For more information, see “Default record assignment”.) You can use default
record assignment to eliminate fields from a record format.

If a transform function returns NULL, Reformat writes:

• An error message to the corresponding error port
• The current input record to the corresponding reject port

The component stops execution of the graph when the number of reject events exceeds the
result of the following formula:

limit + (ramp * number_of_records_processed_so_far)

For more information, see “Setting limit and ramp for reject events”.
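
As a worked illustration of the formula, the check can be written as a small Python function. The function name and the sample values of limit and ramp are assumptions chosen for the example, not values from the documentation.

```python
def exceeds_reject_threshold(
    reject_events: int, records_processed: int, limit: int, ramp: float
) -> bool:
    """Return True when the number of reject events exceeds
    limit + (ramp * number_of_records_processed_so_far)."""
    return reject_events > limit + ramp * records_processed


# With limit = 50 and ramp = 0.01, after 10,000 records the threshold is
# 50 + (0.01 * 10000) = 150 reject events.
still_running = not exceeds_reject_threshold(150, 10_000, limit=50, ramp=0.01)
stopped = exceeds_reject_threshold(151, 10_000, limit=50, ramp=0.01)
```

With both limit and ramp set to 0 the threshold is 0, so the first reject event already exceeds it.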

If you do not connect flows to the reject or error ports, Reformat discards the information.

5. Reformat writes the valid records to the out ports.


Runtime behavior of FILTER BY EXPRESSION

Filter by Expression does the following:

1. Reads data records from the in port.

2. If the use_package parameter is false, applies the expression in the select_expr parameter to each
record. It routes records as follows, based on how the expression evaluates:

• For a non-0 value, Filter by Expression writes the record to the out port.
• For 0, Filter by Expression writes the record to the deselect port. If you do not connect a flow
to the deselect port, Filter by Expression discards the records.
• For NULL, Filter by Expression writes the record to the reject port and a descriptive error
message to the error port.

3. If the use_package parameter is true, executes the functions defined in the package:

• If the select function returns 1, the component writes the record to the out port.
• If the select function returns 0, the component writes the record to the deselect port.

4. If output_for_error or make_error is defined, executes them whenever an error event occurs. If
log_error is defined and logging of rejects is turned on, executes log_error.

Filter by Expression stops execution of the graph according to the reject-threshold parameter. If its
value is use limit/ramp, the graph stops when the number of reject events exceeds the result of the
following formula:

limit + (ramp * number_of_records_processed_so_far)
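
The routing in step 2 (use_package false) can be sketched in Python as follows. This is an illustration under assumptions: the function name filter_by_expression is hypothetical, ports are modelled as lists, NULL is modelled as None, and the reject threshold is left out.

```python
def filter_by_expression(records, select_expr):
    """Route each record to the out, deselect, or reject/error port lists."""
    ports = {"out": [], "deselect": [], "reject": [], "error": []}
    for record in records:
        value = select_expr(record)
        if value is None:
            # NULL: record to reject, descriptive message to error.
            ports["reject"].append(record)
            ports["error"].append(f"select_expr evaluated to NULL for {record!r}")
        elif value != 0:
            # Non-0 value: record goes to the out port.
            ports["out"].append(record)
        else:
            # 0: record goes to the deselect port.
            ports["deselect"].append(record)
    return ports


people = [{"age": 30}, {"age": 12}, {"age": None}]
routed = filter_by_expression(
    people, lambda r: None if r["age"] is None else int(r["age"] >= 18)
)
```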

Runtime behavior of DEDUP SORTED

Dedup Sorted does the following:

1. Reads a grouped flow of records from the in port.

If your records are not already grouped, use SORT to group them.

2. Does one of the following:

• If you have supplied an expression for the select parameter, Dedup Sorted applies the
expression to each record as follows:
  - Evaluates to 0 for a particular record: Does not process the record (that is, the record does
    not appear on any output port).
  - Produces NULL for a particular record: Writes the record to the reject port and writes a
    descriptive error message to the error port. Discards the information if you do not connect
    flows to the reject or error ports.
  - Evaluates to anything other than 0 or NULL for a particular record: Processes the record.
• If you do not supply an expression for the select parameter, Dedup Sorted processes all
records on the in port.

3. Processes groups of records as follows:

• Considers any consecutive records with the same key value to be in the same group.
• If a group consists of one record, writes that record to the out port.
• If a group consists of more than one record, uses the value of the keep parameter to
determine which record (if any) to write to the out port, and which record or records to
write to the dup port.
• If you have chosen unique-only for the keep parameter, does not write records to the out
port from any groups consisting of more than one record.

NOTE: Both the out and dup ports are optional; if you do not connect flows to them, Dedup Sorted
discards the records.
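
The grouping in step 3 can be sketched in Python with itertools.groupby, which also groups consecutive items only. The function name dedup_sorted is hypothetical. The keep values "first" and "last" are assumptions (the text above names only unique-only), and so is sending every record of a multi-record group to dup under unique-only.

```python
from itertools import groupby


def dedup_sorted(records, key, keep="first"):
    """Return (out, dup) lists for a grouped sequence of records."""
    out, dup = [], []
    for _, group_iter in groupby(records, key=key):
        group = list(group_iter)  # consecutive records with the same key
        if len(group) == 1:
            out.append(group[0])
        elif keep == "first":
            out.append(group[0])
            dup.extend(group[1:])
        elif keep == "last":
            out.append(group[-1])
            dup.extend(group[:-1])
        elif keep == "unique-only":
            # Nothing from a multi-record group reaches out (assumed: all to dup).
            dup.extend(group)
        else:
            raise ValueError(f"unknown keep value: {keep!r}")
    return out, dup


rows = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
out_first, dup_first = dedup_sorted(rows, key=lambda r: r[0], keep="first")
```

Because only consecutive records are grouped, the final ("a", 4) forms its own group here, which is why the input should be grouped or sorted first.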

Runtime behavior of ROLLUP

Rollup reduces each group of records to a single output record, using a series of transform functions,
as follows:

• With the first record in each group, Rollup’s initialize function creates a temporary record.
• With each following record of the same group, Rollup updates the temporary record.

When you set sorted-input to Input must be sorted or grouped, Rollup writes the output record to
the out port after processing the last record of a group. Rollup then repeats the process with the
next group.

When you set sorted-input to In memory: Input need not be sorted, Rollup reads all the input
records into memory and then writes all output records to the out port.

If you define a template rollup function, the Co>Operating System expands it into definitions for
temporary_type, initialize, rollup, and finalize before Rollup processes any records.

At runtime, Rollup first performs input selection on all input records:

• If you have defined input_select, it filters the input records:
  - If the expression evaluates to 0 for a particular input record, Rollup does not process the
    record (that is, the record does not appear on any output port).
  - If the expression produces NULL for a particular input record, Rollup writes the input record
    to the reject port and a descriptive error message to the error port. If you do not connect
    flows to the reject or error ports, Rollup discards the information.
  - If the expression evaluates to anything other than 0 or NULL for a particular record, Rollup
    processes the record.
• If you have not defined input_select, Rollup processes all records.

NOTE: If you are also using the key_change function, input_select is evaluated before key_change.

Then ROLLUP executes the following steps for each group of records:

1. Temporary initialization:

Rollup passes the first record in each group to the initialize transform function.

The initialize transform function creates a temporary record for the group, with record type
temporary_type. If you do not explicitly define the initialize function, Rollup executes the default
initialize function. The default initialize function populates the temporary record from the input
record according to the rules of default record assignment.

2. Computation.

Rollup calls the rollup transform function for each input record. The input to the rollup transform
function is the input record and the temporary record for the input group the input record belongs
to. The rollup transform function returns an updated temporary record for that input group.

If you defined a template rollup function that includes a rule that does not reference an aggregation
function, Rollup executes that rule once for each input group. Execution occurs after all input records
for that group have been processed. It is as if the nonaggregation rule were in the finalize function.

3. Finalization.

If you leave sorted-input set to its default, Input must be sorted or grouped:

• Rollup calls the finalize transform function after it processes all the input records in each
group.
• Rollup passes the temporary record for the group and the last input record in the group to
the finalize transform function.
• The finalize transform function produces an output record for the group.
• Rollup repeats this procedure with each group.

If you set sorted-input to In memory: Input need not be sorted:

• After Rollup processes all the input records, it calls the finalize transform function with the
temporary record for each group and an arbitrary input record from each group as
arguments.
• The finalize transform function produces an output record for each group.

4. Output selection.

If you have defined the output_select transform function, it filters the output records.

The output_select transform function takes a single argument — the record produced by finalization
— and returns a value of 0 (false) or non-zero (true).
Rollup ignores records for which output_select returns 0; it writes all others to the out port.

If you have not defined the output_select transform function, Rollup writes all records to the out
port.
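
Steps 1 through 4 for grouped input can be sketched in Python. This is a simplified illustration, not Ab Initio DML: the function rollup_sorted and the sample fields cust and amount are hypothetical, the transform functions are passed in as plain callables, and input selection and NULL handling are omitted.

```python
from itertools import groupby


def rollup_sorted(records, key, initialize, rollup, finalize, output_select=None):
    """Reduce each group of consecutive same-key records to one output record."""
    out = []
    for _, group in groupby(records, key=key):
        temp, last = None, None
        for i, record in enumerate(group):
            if i == 0:
                temp = initialize(record)  # step 1: temporary record per group
            temp = rollup(temp, record)    # step 2: called for every input record
            last = record
        result = finalize(temp, last)      # step 3: temporary + last input record
        if output_select is None or output_select(result) != 0:
            out.append(result)             # step 4: keep unless output_select is 0
    return out


sales = [
    {"cust": "a", "amount": 10},
    {"cust": "a", "amount": 5},
    {"cust": "b", "amount": 7},
]
totals = rollup_sorted(
    sales,
    key=lambda r: r["cust"],
    initialize=lambda rec: {"total": 0, "count": 0},
    rollup=lambda t, rec: {"total": t["total"] + rec["amount"], "count": t["count"] + 1},
    finalize=lambda t, rec: {"cust": rec["cust"], "total": t["total"], "count": t["count"]},
)
```

Passing output_select=lambda r: r["total"] > 10 would keep only the output record for customer a.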

If any of the transform functions returns NULL, Rollup writes:

• The current input record to the reject port
• A descriptive error message to the error port

If you do not connect flows to the reject or error ports, Rollup discards the information.

The component stops the execution of the graph when the number of reject events exceeds the
result of the following formula:

limit + (ramp * number_of_records_processed_so_far)

For details, see “Setting limit and ramp for reject events”.

When Rollup is run in parallel, it is frequently preceded by PARTITION BY KEY, which distributes
records with the same key value to the same partition.

Rollup stores temporary files in the working directories specified by its layout.

Runtime behavior of SCAN

SCAN produces a series of n output records for each group of n input records, computing each
output record as an aggregation of the input records processed so far, as follows:

• With the first record of each group, Scan creates a temporary aggregate record.
• With each following record of the same group, Scan updates the temporary aggregate
record, producing an aggregate record of the records processed so far for each group.
• After processing each record, Scan writes the aggregate record to the out port, using a series
of transform functions.

At runtime, Scan does the following:

1. Input selection:

If you have not defined the input_select function in your transform, Scan processes all records.

If you have defined the input_select function, Scan filters the input records accordingly. If the
function evaluates to 0 for a particular record, Scan does not process the record. In other words, the
record does not appear on any output port.

If the function produces NULL for a particular record, Scan:

• Writes the record to the reject port
• Writes a descriptive error message to the error port

If you do not connect flows to the reject or error ports, Scan discards the information.

If the function evaluates to anything other than 0 or NULL for a particular record, Scan processes
the record.

NOTE: If you are also using the key_change function, input_select is evaluated before key_change.

2. Temporary initialization:

Scan passes the first record in each group to the initialize transform function. The initialize transform
function creates a temporary record for the group, with record type temporary_type.

3. Computation:

Scan calls the scan transform function for each record in a group, including the first, using that record
and the temporary record for the group as arguments. The scan transform function returns a new
temporary record.

4. Finalization:

Scan calls the finalize transform function once for every input record. Scan passes the input record
and the temporary record that the scan function returned to the finalize transform function. The
finalize transform function produces an output record for each input record.

5. Output selection:

If you have not defined the output_select transform function, Scan writes all output records to the
out port. If you have defined the output_select transform function, Scan filters the output records.
The output_select transform function takes a single argument — the record produced by finalization
— and returns a value of 0 (false) or non-zero (true). Scan ignores records for which output_select
returns 0; it writes all others to the out port.
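
Steps 2 through 5 can be sketched in Python as a running aggregate. This is an illustration only: the function scan_sorted and the sample fields cust and amount are hypothetical, and input selection and NULL handling are omitted.

```python
from itertools import groupby


def scan_sorted(records, key, initialize, scan, finalize, output_select=None):
    """Produce one output record per input record, aggregating within each group."""
    out = []
    for _, group in groupby(records, key=key):
        temp = None
        for i, record in enumerate(group):
            if i == 0:
                temp = initialize(record)   # step 2: first record of the group
            temp = scan(temp, record)       # step 3: every record, including the first
            result = finalize(temp, record)  # step 4: once per input record
            if output_select is None or output_select(result) != 0:
                out.append(result)          # step 5: keep unless output_select is 0
    return out


sales = [
    {"cust": "a", "amount": 10},
    {"cust": "a", "amount": 5},
    {"cust": "b", "amount": 7},
]
running = scan_sorted(
    sales,
    key=lambda r: r["cust"],
    initialize=lambda rec: {"total": 0},
    scan=lambda t, rec: {"total": t["total"] + rec["amount"]},
    finalize=lambda t, rec: {"cust": rec["cust"], "running_total": t["total"]},
)
```

Unlike a rollup, which would emit one record per customer, this yields three records, with the running total restarting at each new key.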

SCAN transform functions that return NULL

If any of the transform functions returns NULL, Scan:

• Writes the current input record to the reject port.
• Writes a descriptive error message to the error port.

If you do not connect flows to the reject or error ports, Scan discards the information.

Scan stops execution of the graph when the number of reject events exceeds the result of the
following formula:

limit + (ramp * number_of_records_processed_so_far)

For more information, see “Setting limit and ramp for reject events”.

When you run Scan in parallel, a common practice is to execute PARTITION BY KEY first. This
distributes records with the same key value to the same partition.

Runtime behavior of JOIN

Join does the following:

1. Reads data records from multiple inn ports. Depending on the setting of the sorted-input
parameter, it does one of the following:

• If input is sorted, it reads records in the order in which they arrive.
• If input is unsorted, it loads all records from all inputs except the driving input into main
memory. Once the nondriving inputs are loaded, it reads records from the driving input in
the order they arrive. (If the nondriving inputs do not all fit into the memory limit specified
by max-core, data, including the driving input, is spilled to disk, hindering performance.)

2. Applies the expression in any defined selectn parameter to the records on the corresponding inn
port. Depending on what the expression does, Join does the following:

• Evaluates to 0 for a record: Does not process the record, and the record does not appear on
any output port.
• Produces NULL for a particular record: Writes a descriptive error message and stops execution
of the graph.
• Evaluates to anything other than 0 or NULL for a particular record: Processes the record.

If you do not supply an expression for a selectn parameter, Join processes all the records on the
corresponding inn port.

3. Removes any duplicate records that have made it through the select (if you set the dedupn
parameter to True).

Which duplicate records Join uses depends on the setting of the sorted-input parameter (see “How
JOIN handles duplicates”). Unused duplicates are sent to the unusedn port.

4. Operates on records that have matching key values using a multi-input transform function.

If the transform function returns NULL, Join:

• Writes each input record to the corresponding rejectn port.
• Writes an error message to the corresponding errorn port.

If you do not connect flows to the rejectn or errorn ports, Join discards the information.

Join stops execution of the graph when the number of reject events exceeds the result of the
following formula:

limit + (ramp * number_of_records_processed_so_far)

For more information, see “Component tolerance for rejections”.

5. Writes the non-NULL return record from the transform function to the out port.

Join stores temporary files in the working directories specified by its layout.

Unused ports

If you connect a flow to an unusedn port, Join writes to the unusedn port, from the corresponding
inn port, any of the selected records that it does not pass through the transform function. In other
words, Join writes the following records to unusedn ports:

• For an inner join: All unmatched records
• For an outer join: No records, since Join passes all records through the transform function
• For an explicit join: Records for which the transform is not called
• For an input port with the dedupn parameter set to True: Records with duplicate key
values
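
The inner-join case with unsorted input can be sketched in Python: the nondriving input is loaded into memory, the driving input is streamed, and unmatched records from each side are collected as the unused0 and unused1 ports would receive them. This is a two-input illustration under assumptions. The name inner_join and the sample fields are hypothetical, in0 is treated as the driving input, and select, dedup, and NULL handling are omitted.

```python
from collections import defaultdict


def inner_join(in0, in1, key, transform):
    """Return (out, unused0, unused1) for an in-memory inner join."""
    # Load the nondriving input (in1) into memory, indexed by key.
    index = defaultdict(list)
    for rec1 in in1:
        index[key(rec1)].append(rec1)

    out, unused0, matched = [], [], set()
    # Read the driving input (in0) in arrival order.
    for rec0 in in0:
        k = key(rec0)
        if k in index:
            matched.add(k)
            for rec1 in index[k]:
                out.append(transform(rec0, rec1))
        else:
            unused0.append(rec0)  # unmatched driving record

    # Nondriving records whose key never matched.
    unused1 = [rec for k, recs in index.items() if k not in matched for rec in recs]
    return out, unused0, unused1


names = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
amounts = [{"id": 1, "amt": 5}, {"id": 3, "amt": 9}]
out, unused0, unused1 = inner_join(
    names,
    amounts,
    key=lambda r: r["id"],
    transform=lambda a, b: {"id": a["id"], "name": a["name"], "amt": b["amt"]},
)
```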

Lookup File

Purpose

Lookup File can make graph processing faster and more efficient. Lookup File represents one or more
serial files or a multifile. For the purpose of indexing records and retrieving them, you use Lookup
File to associate key values with corresponding data values. The amount of data associated with a
Lookup File should be small enough to be held in main memory.

Lookup File is not connected to other graph components, but its associated data is accessible from
other components. You can define a transform function in another component to access the data
associated with a Lookup File. Retrieving the associated records from main memory is much quicker
than retrieving them from disk.
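
As a rough analogy in Python, a lookup file behaves like an in-memory dictionary that a transform consults by key. The names build_lookup and lookup_record and the sample fields are hypothetical. Keeping the first record for a repeated key is a choice made for this sketch.

```python
def build_lookup(records, key_field):
    """Index records in memory by the value of key_field (first record wins)."""
    table = {}
    for rec in records:
        table.setdefault(rec[key_field], rec)
    return table


def lookup_record(table, key_value):
    """Return the record for key_value, or None when there is no match."""
    return table.get(key_value)


countries = build_lookup(
    [{"code": "FR", "name": "France"}, {"code": "JP", "name": "Japan"}],
    key_field="code",
)


def enrich(order):
    # A transform in another component consulting the lookup data.
    match = lookup_record(countries, order["country_code"])
    return {**order, "country": match["name"] if match else None}


enriched = enrich({"order_id": 7, "country_code": "JP"})
```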

LOOKUP FILE is not a phased component; it never displays a phase number.

Runtime behavior of SORT

Sort does the following:

1. Reads the records from all flows connected to the in port until it reaches the number of bytes
specified in the max-core parameter.

2. Sorts the records and writes the results to a temporary file on disk.

Sort stores temporary files in the working directories specified by its layout.

3. Repeats Steps 1 and 2 until it has read all records.

4. Merges all temporary files, maintaining the sort order.

5. Writes the result to the out port.
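
The five steps describe an external merge sort, which can be sketched in Python. This is an illustration under assumptions: the function names are hypothetical, the memory bound is a record count rather than the byte count that max-core specifies, and runs are stored as pickled records in anonymous temporary files.

```python
import heapq
import pickle
import tempfile
from itertools import islice


def _write_run(sorted_records):
    """Write one sorted run to a temporary file and rewind it."""
    run = tempfile.TemporaryFile()
    for rec in sorted_records:
        pickle.dump(rec, run)
    run.seek(0)
    return run


def _read_run(run):
    """Yield the records of a run back in order."""
    while True:
        try:
            yield pickle.load(run)
        except EOFError:
            return


def external_sort(records, key, max_records_in_memory):
    """Yield records in sorted order using bounded-size sorted runs."""
    if max_records_in_memory < 1:
        raise ValueError("max_records_in_memory must be at least 1")
    runs = []
    it = iter(records)
    while True:
        chunk = list(islice(it, max_records_in_memory))  # step 1: read up to the bound
        if not chunk:
            break                                        # step 3: until all records are read
        chunk.sort(key=key)
        runs.append(_write_run(chunk))                   # step 2: sorted run to disk
    try:
        # Steps 4 and 5: merge the runs, keeping sort order, and emit the result.
        yield from heapq.merge(*(_read_run(r) for r in runs), key=key)
    finally:
        for r in runs:
            r.close()


sorted_values = list(external_sort([5, 3, 8, 1, 9, 2], key=lambda x: x, max_records_in_memory=2))
```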
