RowGen31 Manual
v3
Test Data and File Generation
March 2013
IRI tries to ensure that this document is correct and accurate, and IRI reserves the right to change
it without notice.
RowGen software licenses are serialized and usage is registered. Anyone wishing to expand
the use, or integrate and/or distribute all, or any part, of the software, must first execute an
appropriate license agreement with IRI.
No warranty, expressed or implied, is made by IRI as to the accuracy of the material and
functioning of the software. Any warranty of fitness for any particular purpose is expressly
excluded and in no event will IRI be liable for any direct or consequential damages.
Trademarks: RowGen, CoSort, and sortcl are trademarks of IRI. All other brand or product
names are trademarks or registered trademarks of their respective holders/companies.
© 2005-2013 IRI. All rights reserved. No part of this document or the RowGen programs
may be used or copied without the express written permission of IRI. Please contact:
TABLE OF CONTENTS
INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
ROWGEN EXAMPLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 ADVANCED SETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Example 5: Using a Relational SET File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Example 6: Using Literal SETs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Example 7: Selecting ALL Values from a Single-column SET . . . . . . . . . . 21
Example 8: Selecting ALL Values from a Multi-column SET . . . . . . . . . . . 23
Example 9: Selecting Values ONCE from a SET . . . . . . . . . . . . . . . . . . . . . 23
1 PURPOSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2 CONVENTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3 EXECUTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 USAGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1 Data Flow Structure in RowGen Scripts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Input Filenames and Attributes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4 Output Filenames and Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 /INCOLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6 /OUTCOLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5 FILES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1 Resource Control File (rowgenrc) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2 Job Specification Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3 Data Definition Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4 File Formats (/PROCESS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.5 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.6 Auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Example 19: Creating an /AUDIT log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6 FIELDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2 Field Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.3 ROWID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Example 20: Using ROWID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.4 SET Files and Literal SETs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.5 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.5.1 Distributions Using Routines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.5.2 Distributions Using Set Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.6 POSITION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Example 21: Generating Fixed-Position Fields . . . . . . . . . . . . . . . . . . . . . . . 80
Example 22: Generating Variable-Position (Delimited) Fields . . . . . . . . . . . 80
6.7 SIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Example 23: Size with NUMERIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Example 24: Using ASCII Substrings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.8 SEPARATOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Example 25: Generating Multi-Character Field Separators . . . . . . . . . . . . . 86
9 /INREC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Example 39: Using /INREC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
10 /DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Example 40: Using /DATA Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
14 KEYS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
14.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
14.2 Field Name Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
14.3 Unnamed Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
14.4 Collating Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
14.5 Direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
14.6 ASCII Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
14.7 No Duplicates, Duplicates Only . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
15 CONDITIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
15.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
15.2 Unary Logical Expressions (Change Test) . . . . . . . . . . . . . . . . . . . . . . . . . . 140
15.3 Binary Logical Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
15.4 Function Compares in Conditions (iscompares). . . . . . . . . . . . . . . . . . . . . . 141
Example 41: Using iscompares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
15.5 Compound Logical Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
15.6 Evaluation Order. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
15.7 Compound Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
20 SEQUENCER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Example 52: Using SEQUENCER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
cob2ddf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
1 PURPOSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
2 USAGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
3 EXAMPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
csv2ddf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
1 PURPOSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
2 USAGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
3 EXAMPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
elf2ddf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
1 PURPOSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2 USAGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2.1 ELF Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2.2 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
3 EXAMPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
ctl2ddf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
1 PURPOSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
2 USAGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
2.1 Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
3 EXAMPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
INTRODUCTION
RowGen is a high-performance data generator and format synthesizer that builds test
databases, flat files, records and reports in the same form and format as real data.
RowGen can be used to develop and stress test applications, prototype databases and
ETL/ELT operations, and safely outsource formatted data targets when production data
are confidential or unavailable.
RowGen jobs are controlled by text-based scripts that define the layout of the test tables
and files you want to build. RowGen Control Language (RCL) scripts use the same
explicit and intuitive syntax as the Sort Control Language (sortcl) program within IRI’s
CoSort data manipulation package, so you can transform and format your test data.
For test database generation, RowGen can parse the data model information for any
JDBC-connected database, converting the SQL information about table layouts and
primary-foreign key relationships into RowGen job scripts. These, in turn, produce pre-
sorted, structurally and referentially correct test data that can be bulk loaded to target
tables.
The IRI Workbench, a graphical user interface (GUI) built on Eclipse, facilitates the
specification, execution, tuning and maintenance of RowGen job scripts through job
wizards, a dynamic job outline, and a fully syntax-aware editor. To ensure that
referential integrity is preserved in the test data, a wizard for creating database test data
based on existing table structures and their relationships is also available. This manual
documents the use of the RowGen Control Language only, and is thus a reference guide
for the programs running outside the GUI. Documentation on GUI operations is
provided in the topic help and in the context sensitive help within RowGen-specific
GUI wizards and dialogs.
RowGen job scripts describe the precise layout of the data to be randomly generated or
selected, including the size, position, and data type of each field element. RowGen can
generate one or more sets of data in a single execution, and it can produce
multiple output tables, files, or reports that involve transformations, such as sorting or
aggregation, on those random sets, all within the same job script and I/O pass.
RowGen allows you to order the generated data (over any number of keys), create
structured reports, or build sequenced load files by using the same formatting
capabilities available to CoSort sortcl users. The difference between RowGen and the
sortcl program is that RowGen generates and processes random data, while sortcl
recognizes and processes real data.
RowGen-generated data fields may consist of:
• ASCII
• EBCDIC
• numeric
• COBOL
• datestamp and timestamp
• binary
• other special types such as IP_Address.
See Data Types on page 93.
NOTE: In cases where RowGen cannot generate field data of a desired data type in the
input phase of a job script, RowGen can convert field values to your preferred
data type on output.
• Values drawn at random from SET files (either user-built or provided with the
RowGen package) or from inline values within a job script (literal SETs). SETs
can include character-based strings, numbers and numeric ranges, dates and date
ranges, or timestamps and timestamp ranges. You can also select from relational
SET files, where the value returned depends on one or more other values
(see SET FILES on page 108).
RowGen output can be stored in files, sent to stdout or stderr, or sent to named pipes.
RowGen can assist in building and testing databases (see page 48) and applications that
will later manipulate or otherwise act on real production data, and in benchmarking
high-performance file management or manipulation software like CoSort. RowGen
output can range from simple, single-field flat files to multi-column database load
targets. RowGen output can be formatted with multi-level, custom HTML reports
populated with headers, footers, literals, details and summary values.
NOTE: New users are advised to read the ROWGEN EXAMPLES chapter, which provides an
opportunity to sample the functionality of the RowGen product. Examples start out
simple and become increasingly complex. The ROWGEN CONTROL LANGUAGE chapter
contains the formal recitation of the RowGen Control Language (RCL).
ROWGEN EXAMPLES
RowGen data generation is controlled by the text-based 4GL known as RowGen
Control Language (RCL). The examples provided in this chapter illustrate the various
capabilities of RCL. For a formal description of RCL syntax components, see the
ROWGEN CONTROL LANGUAGE chapter on page 47.
Although specific file name extensions are not required, the following conventions are
used for file name extensions:
All files referenced throughout this chapter are provided in the subdirectory
/examples/Examples_chapter of your RowGen install directory, so you can run these
examples and re-create the results. In addition, where other file types are used
(such as .set and .bat), they too are provided in /examples/Examples_chapter.
The first examples are simple, and later ones become increasingly complex. It is
therefore recommended that you perform them in the order presented:
NOTE: Many of the SET files delivered with the RowGen package are derived from
public domain sources found on the internet. As such, IRI (and thus you)
cannot vouch for the accuracy, format, safety, or completeness of their content.
They are provided for your convenience and to assist in the development of
realistic set data through random selection of their values. IRI expressly
disclaims any warranty of fitness for these SET files.
The simplest form of a RowGen job script requires an /INFILE statement, followed by
one or more /FIELD statements (see Data Flow Structure in RowGen Scripts on page 55).
The following script, default_gen.rcl, illustrates the simplest way to generate random
data with RowGen, where standard defaults are applied:
• the generated records will be sorted from left to right (see KEYS on page 135)
• output will be sent to stdout, that is, the default output when no
file name is specified (see Output Filenames and Attributes on page 57)
• output records will be terminated by a linefeed, or by a carriage return and
linefeed (CRLF) (see RECORD on page 63)
• output (to stdout) will be of the same format as described on input, that is, a
single alpha_digit field.
rowgen /spec=default_gen.rcl
This will generate 100 sorted records (the following shows the first five records):
0wl6n
2ihzj
3Nvin
5Zqdw
5rWHG
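The defaults listed above can be mimicked outside RowGen for illustration. The following Python sketch is not RowGen code: the field width of 5 is inferred from the sample output, and the alphabet used for alpha_digit (ASCII letters plus digits) is an assumption. It shows why "5Zqdw" precedes "5rWHG" in a left-to-right character sort, since uppercase letters come before lowercase letters in ASCII.

```python
import random
import string

# Assumed alphabet for the alpha_digit type: ASCII letters and digits.
ALPHA_DIGIT = string.ascii_letters + string.digits


def generate_default(count=100, width=5, seed=None):
    """Return `count` random alphanumeric strings, sorted left to right."""
    rng = random.Random(seed)
    records = [
        "".join(rng.choice(ALPHA_DIGIT) for _ in range(width))
        for _ in range(count)
    ]
    # A plain string sort compares characters by code point, so digits
    # precede uppercase letters, which precede lowercase letters.
    return sorted(records)


if __name__ == "__main__":
    for record in generate_default(seed=1)[:5]:
        print(record)
```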
RowGen can draw field values at random from an existing SET file containing any
number of possible field values. In this way, you can ensure that fields are populated
with realistic-looking data, rather than just randomized characters.
This example uses the SET file parts_list1.set, which contains the following:
Brackets
Screws
Nails
Tacks
The following script, simple_gen.rcl, will generate 10 records consisting of two fields
each, where one field contains values drawn from the SET file, and the other contains
randomly generated values.
rowgen /spec=simple_gen.rcl
Tacks|-3.67
Brackets|-7.17
Nails|64.12
Screws|68.64
Nails|-2.94
Brackets|-8.14
Brackets|81.57
Screws|-3.77
Brackets|6.85
Screws|57.45
Screws|52.24
Note that:
This example shows how you can produce field values randomly by drawing from
a data set of numeric values (literals) and ranges. You must declare the data type
NUMERIC when selecting from a numeric SET file.
values_low.set contains:
-15
[-9,-5]
(-2,0)
values_high.set contains:
34567
[40000,40010]
56789
Consider the following script, number_gen.rcl, which draws values at random from the
two set files, where random values can be either literal, or fall within a range:
-6 40003.73
-15 56789.00
-1 40009.70
-15 34567.00
-15 56789.00
-15 40009.93
-1 40008.64
-1 40005.78
-15 40006.68
-1 40008.82
Note that:
• The field "low" contains a random mix of the literal value -15 and values from
the inclusive range [-9,-5] and the exclusive range (-2,0)
(see Numeric SET Files with Ranges on page 113).
• The SIZE=5.0 attribute indicates that no decimal point will be used
(.0 precision) in the first column.
• The field "high" contains a random mix of the literal values 34567 and 56789,
and values from the inclusive range [40000,40010].
• The SIZE=8.2 attribute indicates that a decimal precision of 2 will be used.
To see how different SIZE precision specifications can affect output when
selecting values from a numeric SET file, see Example 35 on page 114.
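The range notation used in values_low.set and values_high.set can be modelled with a short Python sketch. This is an illustration only, not RowGen's implementation: the function names are hypothetical, and the selection rule (pick one SET entry uniformly, then pick a value inside it at the requested precision) is an assumption. Square brackets are treated as inclusive bounds and parentheses as exclusive bounds, as described above.

```python
import random
import re

RANGE_RE = re.compile(
    r"^([\[\(])\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*([\]\)])$"
)


def parse_entry(text):
    """Parse one SET line into (low, high, low_inclusive, high_inclusive)."""
    text = text.strip()
    match = RANGE_RE.match(text)
    if match is None:
        value = float(text)  # a literal behaves like a one-value range
        return (value, value, True, True)
    open_br, low, high, close_br = match.groups()
    return (float(low), float(high), open_br == "[", close_br == "]")


def draw(entries, precision, rng):
    """Pick an entry at random, then a value inside it at `precision` decimals.

    Raises ValueError if an exclusive range holds no value at that precision.
    """
    low, high, low_inc, high_inc = rng.choice(entries)
    scale = 10 ** precision
    lo = round(low * scale) + (0 if low_inc else 1)
    hi = round(high * scale) - (0 if high_inc else 1)
    return rng.randint(lo, hi) / scale


LOW = [parse_entry(s) for s in ["-15", "[-9,-5]", "(-2,0)"]]
HIGH = [parse_entry(s) for s in ["34567", "[40000,40010]", "56789"]]

if __name__ == "__main__":
    rng = random.Random(7)
    for _ in range(5):
        print(f"{draw(LOW, 0, rng):.0f} {draw(HIGH, 2, rng):.2f}")
```

With a precision of 0, the exclusive range (-2,0) can only yield -1, which matches the -1 values in the first output column.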
WARNING!
A 0-length entry in a numeric SET file can cause an infinite
loop because such a value can never be found during selection.
You can restrict the minimum and maximum field sizes of both ASCII character and
numeric field values (see MIN_SIZE and MAX_SIZE on page 82 for complete details).
/INFILE=minmaxsize.in
/PROCESS=RANDOM
/FIELD=(code,POSITION=1,MIN_SIZE=1,MAX_SIZE=4,SEPARATOR='|')
/FIELD=(value,POSITION=2,MIN_SIZE=4,MAX_SIZE=8,SEPARATOR='|',NUMERIC)
/REPORT
/OUTFILE=minmaxsize.out
rowgen /spec=minmaxsize.rcl
This will generate 100 sorted records (the following shows the first 15 records):
01AE|-.02
0G7w|47.77
0|-.73
0|113.39
1k4|211.25
1v7|598.30
1|3.17
2A|66.97
2Sf|-.19
2d4|-.89
2p|-80.17
2|1556.03
3Bt|-1.56
3M|1.41
3Nb7|85245.23
Note that:
• the "code" field, which defaults to the alpha_digit data type (see Data Types on
page 93), ranges from a width of 1 through 4
• the "value" field, specified as NUMERIC, has a total field width between
4 and 8, which includes two decimal places (by default), the decimal point, and a
minus sign if the generated number is negative (see SIZE on page 81).
For a description of other, default behaviors observed when executing this script, see
Using All Defaults: Simplest Form on page 13.
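The width rules above can be checked against the sample output with a small Python helper. The function name and the way limits are passed are hypothetical; the sketch only counts characters per delimited field, so the sign and decimal point count toward the width of the numeric field.

```python
def widths_ok(line, limits, separator="|"):
    """Return True if every field's length lies within its (min, max) limits."""
    fields = line.split(separator)
    return len(fields) == len(limits) and all(
        low <= len(field) <= high
        for field, (low, high) in zip(fields, limits)
    )


# (MIN_SIZE, MAX_SIZE) for the "code" and "value" fields in minmaxsize.rcl
LIMITS = [(1, 4), (4, 8)]

SAMPLE = ["01AE|-.02", "0|-.73", "1|3.17", "2|1556.03", "3Nb7|85245.23"]

if __name__ == "__main__":
    for line in SAMPLE:
        print(line, widths_ok(line, LIMITS))
```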
2 ADVANCED SETS
As illustrated in Example 2 on page 14 and Example 3 on page 15, RowGen can select
field values at random from a pre-existing SET file. This section demonstrates some of
the more advanced SET functionality available in RowGen. See SET FILES on
page 108 for full details on all supported SET options and how to use them.
This example shows how you can use a dependency table as a SET file to produce
values for one field which are dependent on the values selected for another
(see Relation SET Files on page 119). This example provides realistic first names that
conform to gender.
name_list.set contains:
female Carole
female Jane
female Jill
female Rachel
male Bill
male John
male Peter
male Roger
last_names.set
Stevens
Evans
Jones
Pierce
Murray
Smith
Osbourne
The following script, relate.rcl, performs a two-key sort that uses the above SET files to
select realistic names that correspond appropriately to gender. It also produces a second
output file that applies record filter logic:
/INFILE=names.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(gender,set=name_list.set,POSITION=1,SEPARATOR='|')
/FIELD=(first_name,set=ROW[2] name_list.set,POSITION=2,SEPARATOR='|')
# the dependency table is referenced to provide a first_name to
# correspond to the value of gender
/FIELD=(last_name,set=last_names.set,POSITION=3,SEPARATOR='|')
/SORT
/KEY=last_name
/KEY=first_name
/OUTFILE=names.out
/HEADREC="Name Gender\n----------------------\n"
/FIELD=(last_name,POSITION=1)
/DATA=","
/FIELD=(first_name)
/DATA="\t"
/FIELD=(gender)
/OUTFILE=females_only.out
/INCLUDE WHERE gender == "female"
/HEADREC="Name Gender\n----------------------\n"
/FIELD=(last_name,POSITION=1)
/DATA=","
/FIELD=(first_name)
/DATA="\t"
/FIELD=(gender)
Name Gender
----------------------
Evans,Bill male
Evans,Rachel female
Jones,Bill male
Jones,Rachel female
Murray,Jane female
Murray,Peter male
Osbourne,John male
Pierce,Rachel female
Smith,Bill male
Smith,Roger male
Note that the first names correspond to the gender, as determined by the dependency
table, name_list.set. The header record was produced with the /HEADREC statement
(see /HEADREC on page 170).
Name Gender
----------------------
Evans,Rachel female
Jones,Rachel female
Murray,Jane female
Pierce,Rachel female
The /INCLUDE statement ensured that only records containing the female gender were
included in the output file (see INCLUDE-OMIT (RECORD SELECTION) on
page 146).
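The dependency lookup in this example can be pictured with the following Python sketch. It is a model of the behaviour described, not of RowGen internals: it assumes the gender is drawn from the distinct left-column values of the table and that the first name is then drawn only from rows whose left column matches. The function names are hypothetical.

```python
import random

NAME_LIST = [
    ("female", "Carole"), ("female", "Jane"), ("female", "Jill"),
    ("female", "Rachel"), ("male", "Bill"), ("male", "John"),
    ("male", "Peter"), ("male", "Roger"),
]
LAST_NAMES = ["Stevens", "Evans", "Jones", "Pierce", "Murray", "Smith",
              "Osbourne"]


def generate(count, rng):
    """Build (last, first, gender) records and sort on last, then first name."""
    by_gender = {}
    for gender, first in NAME_LIST:
        by_gender.setdefault(gender, []).append(first)
    records = []
    for _ in range(count):
        gender = rng.choice(sorted(by_gender))
        first = rng.choice(by_gender[gender])  # dependent selection
        records.append((rng.choice(LAST_NAMES), first, gender))
    return sorted(records, key=lambda rec: (rec[0], rec[1]))


def females_only(records):
    """Mirror the /INCLUDE WHERE gender == "female" filter."""
    return [rec for rec in records if rec[2] == "female"]


if __name__ == "__main__":
    for last, first, gender in generate(10, random.Random(5)):
        print(f"{last},{first}\t{gender}")
```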
This example shows how you can include SET values explicitly within /FIELD
statements when you have a small number of random elements to draw from, and do not
require a separately held SET file (see Literal SETs on page 122).
last_names.set
Stevens
Evans
Jones
Pierce
Murray
Smith
Osbourne
first_names.set
John
Susan
Mary
Stephen
Barry
Cathy
The following script, lower_man.rcl, ensures that RowGen selects only the zip codes
representing the Lower Manhattan area of New York City. The zip codes are entered
explicitly in the /FIELD statement:
/INFILE=literal_set.in
/PROCESS=RANDOM
/FIELD=(last_name,SET=last_names.set,POSITION=1,SIZE=10)
/FIELD=(first_name,SET=first_names.set,POSITION=11,SIZE=10)
/FIELD=(zip,SET={10004,10005,10006,10007,10038,10280},POSITION=21,SIZE=5)
/SORT
/KEY=last_name
/NODUPLICATES
/OUTFILE=lower_man.out
/OUTCOLLECT=7
/HEADREC="Lower Manhattan\n"
/FIELD=(last_name,POSITION=1,SIZE=10)
/FIELD=(first_name,POSITION=11,SIZE=10)
/FIELD=(zip,POSITION=21,SIZE=5)
This produces:
Lower Manhattan
Evans Mary 10005
Jones Cathy 10006
Murray Barry 10038
Osbourne Susan 10005
Pierce Mary 10006
Smith Barry 10006
Stevens Mary 10038
Note that RowGen selected zip codes only from the list given within the curly braces
{} of the SET attribute in the zip field (see Literal SETs on page 122).
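The combined effect of sorting on a key, removing duplicates of that key, and limiting the number of output records can be sketched in Python as follows. This is an illustration under assumptions: which of several duplicate records survives is not stated in this section, so the sketch simply keeps the first one encountered after a stable sort. The function name is hypothetical.

```python
def sort_unique_collect(records, key, limit):
    """Sort on `key`, keep one record per key value, stop after `limit`."""
    seen = set()
    kept = []
    for record in sorted(records, key=key):
        value = key(record)
        if value in seen:
            continue  # duplicate key: dropped
        seen.add(value)
        kept.append(record)
        if len(kept) == limit:
            break  # output limit reached
    return kept


if __name__ == "__main__":
    generated = [
        ("Smith", "Barry", "10006"),
        ("Evans", "Mary", "10005"),
        ("Smith", "John", "10004"),
        ("Jones", "Cathy", "10006"),
    ]
    for rec in sort_unique_collect(generated, key=lambda r: r[0], limit=7):
        print("%-10s%-10s%s" % rec)
```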
This example demonstrates how you can ensure that all values contained in a
single-column SET file are used to populate a given field within the generated records.
glue sticks
lighters
pliers
ratchets
buzz saws
sanders
hammers
nails
tacks
screws
screwdrivers
wrenches
drills
lightbulbs
The following script, allparts.rcl, shows how you can ensure the inclusion of all values
from the SET file:
/INFILE=parts.in
/PROCESS=RANDOM
/INCOLLECT=15
/FIELD=(part,SET=ALL parts_list2.set,POSITION=1,SIZE=13)
/FIELD=(price,set={(1,15)},POSITION=15,SIZE=5.2,NUMERIC) # literal range
/REPORT
/OUTFILE=pricelist_all.out
/HEADREC="Price list: Set of 100\n"
/FIELD=(part,POSITION=1,SIZE=13)
/FIELD=(price,POSITION=15,SIZE=5.2,NUMERIC)
pricelist_all.out contains:
Note that:
• all 15 unique values from parts_list2.set are represented in the above list
• random numbers between 1 and 15 were drawn from a literal numeric range (see
Literal SETs on page 122).
This example demonstrates how you can select ALL values that satisfy a dependency
within a multi-column, relational SET file (see Example 5 on page 18).
NY Albany
NY Concord
NY Hemstead
NY New York City
PA Erie
PA Philadelphia
PA Pittsburg
PA Scranton
PA State College
The following script, allny.rcl, generates all values in the right column that correspond
to the given value in the left column. Note that this example uses a string to represent
the desired left-hand value, but you can also use a field value to define the value on the
left, as described in Relation SET Files on page 119.
/INFILE=allstates.in
/INCOLLECT=4
/FIELD=(ny_city,POSITION=1,SIZE=15,set = ALL PANY_cities.set["NY"])
/REPORT
/OUTFILE=allny.out
/FIELD=(ny_city,POSITION=1,SIZE=15)
Albany
Concord
Hemstead
New York City
All unique values from the right column that satisfy the relationship specified ("NY")
are produced. The /INCOLLECT=4 statement ensures that no repeated entries are
produced (see also ONCE in Example 9 on page 23 for an alternative way to return all
values once without repetition).
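One way to picture why /INCOLLECT=4 yields exactly the four New York entries is to model ALL as stepping through the matching values in file order and wrapping around when more records are requested. That cycling behaviour is an assumption made for this Python sketch, not a statement of how RowGen is implemented, and the function name is hypothetical.

```python
from itertools import cycle, islice

PANY_CITIES = [
    ("NY", "Albany"), ("NY", "Concord"), ("NY", "Hemstead"),
    ("NY", "New York City"), ("PA", "Erie"), ("PA", "Philadelphia"),
    ("PA", "Pittsburg"), ("PA", "Scranton"), ("PA", "State College"),
]


def all_values(table, key, count):
    """Return `count` right-hand values for `key`, in file order, wrapping."""
    matches = [right for left, right in table if left == key]
    return list(islice(cycle(matches), count))


if __name__ == "__main__":
    for city in all_values(PANY_CITIES, "NY", 4):
        print(city)
```

Under this model, asking for more records than there are matching values would start repeating entries from the top of the list.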
This example uses the SET file parts_list2.set from Example 7 on page 21.
The following script, onceparts.rcl, shows how you can include all the values from the
SET file only once:
/INFILE=parts.in
/PROCESS=RANDOM
/INCOLLECT=20
/FIELD=(part,SET=ONCE parts_list2.set,POSITION=1,SIZE=13)
/FIELD=(price,SET={(1,15)},POSITION=15,SIZE=5.2,NUMERIC)
/REPORT
/OUTFILE=pricelist_once.out
/HEADREC="Price list: Set of 100\n"
/FIELD=(part,POSITION=1,SIZE=13)
/FIELD=(price,POSITION=15,SIZE=5.2)
pricelist_once.out contains:
Note that:
• All 15 unique values from parts_list2.set are represented in the above list. After
the unique values are used, no further selections are made from that SET file,
resulting in empty values in the left column.
• Random numbers between 1 and 15 were drawn from a literal numeric range
(see Literal SETs on page 122).
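The ONCE behaviour described above can be sketched in Python. The sketch assumes that values are handed out in random order and that an empty string stands in for the empty field once the SET is exhausted; both are assumptions for illustration, and the function name and the three-item list are hypothetical.

```python
import random


def once_values(values, count, rng):
    """Use each value at most once; pad with empty strings when exhausted."""
    pool = list(values)
    rng.shuffle(pool)
    return [pool[i] if i < len(pool) else "" for i in range(count)]


if __name__ == "__main__":
    parts = ["hammers", "nails", "tacks"]  # shortened, hypothetical list
    for part in once_values(parts, 5, random.Random(2)):
        print(repr(part))
```

Requesting five records from three values produces three distinct parts followed by two empty values, which mirrors the empty left-column entries noted above.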
Several record and file formatting options are available to modify the output targets that
contain the random data generated by RowGen. In addition, you can create derived
fields in the output section that are the result of field-level conditions, cross-calculation,
and/or aggregation.
This example shows how you can add /DATA statements to the /OUTFILE section
of a RowGen script to format the output records. The following script, ssno_gen.rcl,
generates one million valid social security numbers (according to the rules of validity
described in http://en.wikipedia.org/wiki/Social_security_number#Valid_SSNs):
/INFILE=ssno.in
/PROCESS=RANDOM
/INCOLLECT=100000 # Generates 1 million records
# after include/omit logic is applied
/FIELD=(area_number,POSITION=1,SIZE=3,SEPARATOR=',',digit)
/FIELD=(group_number,POSITION=2,SIZE=2,SEPARATOR=',',digit)
/FIELD=(serial_number,POSITION=3,SIZE=4,SEPARATOR=',',digit)
/OMIT WHERE area_number > "772" OR area_number == "666"
/INCLUDE WHERE area_number > "000" AND group_number > "00" \
AND serial_number > "0000"
/REPORT
/OUTFILE=ssnos.out
/FIELD=(area_number)
/DATA="-" # customized record formatting
/FIELD=(group_number)
/DATA="-" # customized record formatting
/FIELD=(serial_number)
In the above job script, field descriptions were specified in the input section, and field
formatting was applied in the output section. See /DATA on page 129 for complete
details on using /DATA statements.
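The record selection and formatting in ssno_gen.rcl can be mirrored in a few lines of Python. This is a sketch of the logic only: it assumes a record is kept when it satisfies the /INCLUDE condition and does not satisfy the /OMIT condition, and the function names are hypothetical. Because the fields are fixed-width digit strings, comparing them as strings gives the same result as comparing them as numbers.

```python
import random


def keep(area, group, serial):
    """Apply the /OMIT and /INCLUDE conditions from ssno_gen.rcl."""
    omitted = area > "772" or area == "666"
    included = area > "000" and group > "00" and serial > "0000"
    return included and not omitted


def format_ssn(area, group, serial):
    """Join the three fields with "-", as the /DATA statements do."""
    return f"{area}-{group}-{serial}"


def random_parts(rng):
    """Generate zero-padded digit fields of widths 3, 2 and 4."""
    return (f"{rng.randrange(1000):03d}",
            f"{rng.randrange(100):02d}",
            f"{rng.randrange(10000):04d}")


if __name__ == "__main__":
    rng = random.Random(11)
    candidates = (random_parts(rng) for _ in range(10))
    for parts in candidates:
        if keep(*parts):
            print(format_ssn(*parts))
```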
This example shows how you can customize output record layouts, and produce a report,
using record formatting and field remapping statements.
Consider the SET file parts_list2.set from Example 7 on page 21. The following script,
remap.rcl, shows how to specify remapping and record formatting in the output section,
building on the generated data specified in the input section:
Price Part
-------------------------------
96.40 for a set of hammers
65.63 for a set of hammers
65.00 for a set of glue sticks
55.04 for a set of nails
29.55 for a set of nails
-------------------------------
As recorded on 2013-02-01
Note that:
• values for the "part" field were selected at random using the SET file, and the
"price" field consists of randomly generated numbers, satisfying the INCLUDE
• the "price" values were sorted in descending order (see KEYS on page 135)
• the /HEADREC statement created the header record, i.e., a literal string with \n
(linefeed) characters (see /HEADREC on page 170 and CONTROL (ESCAPE)
CHARACTERS on page 132)
• the two fields were remapped to fixed-byte positions, with price displayed first
• the /DATA statement created intra-record text, placed within the field values
(see /DATA on page 129)
• the /FOOTREC statement created the footer record with the current datestamp
(see /FOOTREC on page 171 and INTERNAL VARIABLES on page 131).
RowGen can produce multiple output files with different file formats (see File Formats
(/PROCESS) on page 62), in the same job. You can also convert from one data type to
another at the field level (see Data Types on page 93), which is useful when multiple
output files have different data-type requirements. Consider the following SET file,
USAdates.set:
Feb/13/2003
Mar/06/1998
Apr/13/1996
Jul/14/2000
Jun/20/2003
Jan/03/1999
The following script, conv.rcl, produces two output files. The first will show the same
data as generated on input. The second output will illustrate a file-format conversion,
and data-type conversion at the field level:
The file orig.out contains all seven generated records, sorted on the "day" field:
Apr/13/1996|07271|ÿ-+ç-
Apr/13/1996|34755|+ÖS+¬
Mar/06/1998|58163|¦Ö-µÿ
Mar/06/1998|34568|¦++ö+
Mar/06/1998|70363|êtå++
Jan/03/1999|11065|+-+Ñù
Feb/13/2003|30614|ó-üÑ+
The second output file, conv.csv, which was produced simultaneously (in the same I/O
pass) with the other output file, is as follows:
day,zip,code
"1996-04-13","07271","qBCgK"
"1996-04-13","34755","CrUNz"
"1998-03-06","58163","GrBWq"
"1998-03-06","34568","GPImI"
"1998-03-06","70363","hXfHL"
Note that:
• all seven generated records were returned to the first output file, orig.out, and
only five records were produced in conv.csv, as determined by the output file-
specific /OUTCOLLECT statement (see /OUTCOLLECT on page 59).
• as a result of specifying /PROCESS=CSV, a header record was created based on
the field names in the output layout, and the individual fields are framed in
double-quotes by default (see CSV on page 66)
• the "day" field was converted from the AMERICAN_DATE format (from the
values in the SET file) into ISO_DATE format (see Date/Timestamp Data Types
on page 105)
• the randomly generated "code" field was converted from EBCDIC_alpha format
into ASCII alpha format (see ASCII Character Data Types on page 94).
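The two output-side conversions noted above can be pictured with a short Python sketch. This is an illustration only, not RowGen code: the helper names `american_to_iso` and `write_csv` are hypothetical, and the `limit` argument simply stands in for the effect of /OUTCOLLECT.

```python
import csv
import io

# Month abbreviations as they appear in USAdates.set
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

def american_to_iso(value):
    """Convert 'Apr/13/1996' to '1996-04-13' (hypothetical helper)."""
    month, day, year = value.split("/")
    return f"{int(year):04d}-{MONTHS[month]:02d}-{int(day):02d}"

def write_csv(rows, header, limit):
    """Emit an unquoted header record, then at most `limit` quoted rows."""
    buffer = io.StringIO()
    buffer.write(",".join(header) + "\n")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for day, zip_code, code in rows[:limit]:
        writer.writerow([american_to_iso(day), zip_code, code])
    return buffer.getvalue()

rows = [("Apr/13/1996", "07271", "qBCgK"),
        ("Apr/13/1996", "34755", "CrUNz"),
        ("Mar/06/1998", "58163", "GrBWq")]
print(write_csv(rows, ["day", "zip", "code"], limit=2))
```

The sketch mirrors the layout of conv.csv shown above: field names form the header, and every field value is framed in double quotes.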
The job script on the following page, derived.rcl, illustrates several methods for
producing derived fields, and also demonstrates how to use /INREC to create a virtual
record layout to be processed (see /INREC on page 126).
Consider the following character SET file, parts_list3.set, where each part name is
preceded by a unique three-character code and a colon (:):
DBD:screwdrivers
CBC:hammers
ABC:glue sticks
EBD:lightbulbs
ABE:pliers
BBC:ratchets
BBD:buzz saws
EBE:switches
BBE:sanders
CBD:nails
CBE:tacks
DBC:screws
DBE:wrenches
EBC:drills
ABD:lighters
/INFILE=derived.in
/PROCESS=RANDOM
/FIELD=(full,SET=parts_list3.set,POSITION=1,SIZE=16) # use SET values
/FIELD=(value1,POSITION=17,SIZE=5,NUMERIC) # random numeric value 1
/FIELD=(value2,POSITION=22,SIZE=5,NUMERIC) # random numeric value 2
/INCLUDE WHERE value1 > 0 AND value2 > 0 # exclude negative values
/INREC # create virtual input record:
/FIELD=(part=sub_string(full,5,12),POSITION=1,SIZE=12) # define substring
/FIELD=(value1,POSITION=13,SIZE=5)
/FIELD=(value2,POSITION=18,SIZE=5)
/FIELD=(T=abs(value1 - value2),POSITION=23,SIZE=5,NUMERIC) # math field
/SORT
/KEY=part # sort over the derived substring
/NODUPLICATES # exclude duplicates of substring
/OUTFILE=derived.out # summary record, same outfile:
/DATA="\n Average difference: "
/FIELD=(avg_diff,NUMERIC) # names and positions new field
/AVERAGE avg_diff from T # defines the aggregation
/OUTFILE=derived.out # layout for detail records:
/HEADREC="Part Low High Diff\n-----------------------------------------\n"
/FIELD=(part,POSITION=1,SIZE=12) # produce substring on output
/FIELD=(low,POSITION=17,SIZE=5,NUMERIC, IF value1 LE value2 THEN value1 \
ELSE value2) # conditional field
/FIELD=(high,POSITION=27,SIZE=5,NUMERIC,IF value1 LE value2 THEN value2 \
ELSE value1) # conditional field
/FIELD=(T,POSITION=37,SIZE=5,NUMERIC)
Note that:
• the "full" field, as defined on input, specifies that values from the SET file will
be drawn at random
• the /INREC section creates a virtual layout of four fields to be processed
(see /INREC on page 126), where consecutive fields are placed right next to each
other (temporarily, to reduce processing time)
• the first /INREC field is a derived field, "part," defined as a substring of "full,"
where the offset of 5 ignores the first four prefix characters (see ASCII
Substrings on page 84)
• the "part" field, the desired substring, is then sorted in ascending order,
excluding duplicates
• the /OUTFILE section commented as the layout for detail records (the second
same-name iteration in the script) provides the output layout for the detail
records, including three newly derived fields:
• "low" is a conditional field that ensures that the lower of the two values
(value1 or value2, as generated on input) is placed at position 17
(see CONDITIONAL FIELD AND DATA STATEMENTS on page 152)
• "high" is a conditional field that ensures that the higher of the two values is
placed next, at position 27
• the field named as "T" will contain the absolute value of the result of
subtracting value2 from value1, where the absolute value ensures a positive
result regardless of which value was larger
• the /OUTFILE section commented as the summary record (the first same-name
iteration in the script) contains:
• a derived field "avg_diff," which is not given a POSITION attribute, and is
therefore placed directly after the preceding DATA statement
(see /DATA on page 129)
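The derivations in derived.rcl can be paraphrased in Python as follows. This is a sketch of the logic only, under stated assumptions: the function names are hypothetical, the first record of each duplicate group is the one kept, and the average is taken over the records that survive duplicate removal. RowGen's actual behavior on those two points is not confirmed here.

```python
def derive(full, value1, value2):
    """Mirror the derived fields from the /INREC and detail /OUTFILE sections."""
    return {
        "part": full[4:16],             # sub_string(full,5,12): skip 'XXX:' prefix
        "low": min(value1, value2),     # IF value1 LE value2 THEN value1 ELSE value2
        "high": max(value1, value2),    # IF value1 LE value2 THEN value2 ELSE value1
        "T": abs(value1 - value2),      # abs(value1 - value2)
    }

def report(generated):
    # /INCLUDE WHERE value1 > 0 AND value2 > 0
    rows = [derive(*rec) for rec in generated if rec[1] > 0 and rec[2] > 0]
    rows.sort(key=lambda row: row["part"])          # /SORT with /KEY=part
    unique = []
    for row in rows:                                # /NODUPLICATES (assumed: keep first)
        if not unique or unique[-1]["part"] != row["part"]:
            unique.append(row)
    average = sum(row["T"] for row in unique) / len(unique)   # /AVERAGE avg_diff from T
    return unique, average

generated = [("CBC:hammers", 10, 4), ("ABC:glue sticks", 3, 9),
             ("CBC:hammers", 7, 8), ("BBD:buzz saws", -1, 5)]
details, avg_diff = report(generated)
for row in details:
    print(f"{row['part']:<16}{row['low']:>5}{row['high']:>10}{row['T']:>10}")
print(f"Average difference: {avg_diff}")
```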
This example, inventory.rcl, creates a report with detail records, subtotals, and
grand totals:
B1PP $53.68
B6Oz $36.52
BCAW $36.09
BPDq $42.66
BPo4 $31.82
Bt45 $23.62
Total B $224.39 $53.68 $23.62 $37.40 6
C10A $60.76
C1bS $63.55
C1h6 $51.22
C6UL $12.18
C8ZR $78.26
Cj4c $62.41
Total C $328.38 $78.26 $12.18 $54.73 6
All Groups:-----------------------------------------------
$813.38 $94.38 $12.18 $54.23 15
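The subtotal lines in this report appear to hold, in order, the group total, maximum, minimum, average, and record count. The Python sketch below recomputes the two subtotals that are fully visible in the excerpt. It is an illustration under assumptions: the group is taken to be the first character of each code, and the column meanings are inferred from the printed values rather than from the inventory.rcl script.

```python
from decimal import Decimal, ROUND_HALF_UP
from itertools import groupby

# Detail records copied from the report excerpt above
DETAILS = [("B1PP", "53.68"), ("B6Oz", "36.52"), ("BCAW", "36.09"),
           ("BPDq", "42.66"), ("BPo4", "31.82"), ("Bt45", "23.62"),
           ("C10A", "60.76"), ("C1bS", "63.55"), ("C1h6", "51.22"),
           ("C6UL", "12.18"), ("C8ZR", "78.26"), ("Cj4c", "62.41")]

def subtotal(prices):
    """Return (total, maximum, minimum, average, count) for one group."""
    total = sum(prices, Decimal("0"))
    average = (total / len(prices)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return total, max(prices), min(prices), average, len(prices)

def group_totals(details):
    """Group on the first character of the code (assumed break field)."""
    result = {}
    for group, items in groupby(sorted(details), key=lambda d: d[0][0]):
        result[group] = subtotal([Decimal(price) for _, price in items])
    return result

for group, (total, high, low, average, count) in group_totals(DETAILS).items():
    print(f"Total {group} ${total} ${high} ${low} ${average} {count}")
```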
This example produces an HTML file showing, as a table, the summary of store sales by
state in the USA.
states.set contains:
Alabama
Alaska
Arizona
Arkansas
California etc.
/INFILE=html.in
/PROCESS=RANDOM
/FIELD=(State,set=states.set,POSITION=1,SEPARATOR=',')
/FIELD=(StoreCode,POSITION=2,SEPARATOR=',',SIZE=3,upper)
/FIELD=(Sales,POSITION=3,SEPARATOR=',',SIZE=6,NUMERIC)
/INCLUDE WHERE Sales > 0
/SORT
/KEY=State
/KEY=StoreCode
/OUTFILE=sales_report.htm # summary lines
/DATA="<TR>\n<TD><B><FONT SIZE=+2>"
/FIELD=(State)
/DATA="</FONT></B></TD>\n<TD align=right><B><U><FONT SIZE=+2>"
/FIELD=(Sales,SIZE=15,CURRENCY)
/DATA="</FONT></U></B></TD>\n</TR>\n"
/SUM Sales BREAK State
/FOOTREC="</TABLE><BR>\nCreated on </B>%s.\
<HR></BODY>\n</HTML>",AMERICAN_TIMESTAMP
/OUTFILE=sales_report.htm # details lines
/HEADREC="<HTML><HEAD>\n<TITLE>HTML produced by RowGen\
</TITLE>\n</HEAD>\n<BODY><H2>Summary of Sales by\
State</H2>\nSales under \$100 are shown in \
italics.\n<TABLE CELLPADDING=4 CELLSPACING=1 \
BORDER COLS=5>\n"
/DATA="<TR>\n<TD><B>"
/FIELD=(StoreCode)
/DATA="</B></TD>\n<TD align=right>"
/DATA=(IF Sales LT 100 THEN "<em>")
/FIELD=(Sales,SIZE=15,CURRENCY)
/DATA=(IF Sales LT 100 THEN "</em>")
/DATA="</TD>\n</TR>\n"
Note that, for HTML purposes, field names are enclosed within parentheses, for
example: /FIELD=(Region).
RowGen allows you to use any HTML syntax, such as commands to modify text and
background color, to enhance a web-ready report. You can also include commands
specific to other markup languages, such as XML and SGML.
NOTE
The markup language syntax you can use depends on the version of the
browser, or other utility, you will use to open and read the output file(s).
RowGen allows you to generate multiple data sets in the input section of a job
specification script. You can process, mix, and track these disparate sets on output
according to your specifications. This capability is useful for synthesizing indexed
tables or formatted reports where data origin is important.
This example produces an intermixed data file that contains three separately generated
record types. The first input record format contains employee names and addresses with
sales; the second, store numbers with sales; and the third, department numbers with
sales. A code is used to distinguish the different record types. The first field in each
record represents its code. For the employee records, the code is a; the store records, b;
and the department records, c. To ensure that the various record types are intermixed in
the output, the records are sorted together using three characters starting in column
position 24.
first_names.set:
John
Susan
Mary
Stephen
Barry
Cathy
last_names.set:
Stevens
Evans
Jones
Pierce
Murray
Smith
Osbourne
The following script, multi_in.rcl, demonstrates how you can generate multiple data
sets in RowGen:
Note that:
• three distinct /INFILE sections were created, each with their own attributes
(including a different /INCOLLECT for each)
• the "code" field in each of the three data sets references a distinct literal SET
value (see Literal SETs on page 122), so it is easy to identify the source after the
three data sources are processed together
• the /KEY statement references a position and size which is common to all three
input sources, so that three data sets are arbitrarily mixed together by the sort
(see Unnamed Reference on page 136)
• the output file contains a mix of the three generated data sets, as sorted in
ascending order at position 24 (Barry is first alphabetically, then Ddy, etc.).
NOTE
There can be no remapping in the output when multiple input files are
generated.
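The intermixing described in this example comes from sorting all three record types on the same byte range. A minimal Python sketch of that idea follows; the record contents and the helper name `sort_key` are made up for illustration, and only the position (24) and size (3) are taken from the text.

```python
def sort_key(record, position=24, size=3):
    """Return the bytes at a 1-based position, as an unnamed key reference would."""
    return record[position - 1:position - 1 + size]

# Code in column 1; the sortable value starts in column 24 for every record type
employees = [f"{'a':<23}Susan", f"{'a':<23}Barry"]
stores = [f"{'b':<23}Ddy"]
departments = [f"{'c':<23}Qxz"]

mixed = sorted(employees + stores + departments, key=sort_key)
for record in mixed:
    print(record)
```

Because the key ignores the code in column 1, records from the three sources interleave in the output while remaining identifiable by that code.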
The input section of the RowGen script can be used to generate all the required column
data (a master set), and the output section can be used to produce the requisite tables
using custom-selected subsets from the master set. A common key(s) is shared among
all output tables, thereby allowing you to perform joins and create views that have
relational integrity.
In this example, three multi-column tables are produced that contain: customer
information, stock transaction records, and broker information. Because the tables are
produced at the same time using the same master set of data values, they share a key that
is common to all three tables, the ID code (an ID prefix and number), which is unique to
each customer. The following relations are created with the output tables from this
example:
one to one:   The customer table will contain a unique name and a unique address
              for each ID code.
many to one:  The trades table will contain many (one or more) buy/sell
              transactions for each ID code.
one to many:  The brokers table will contain a unique brokerage company office for
              many different ID codes.
The script on the following page, common_keys.rcl, uses several SET files to provide
realistic values.
/SORT
/KEY=idcode_prefix
/KEY=idcode_number
/KEY=time # where ID codes are the same, order by transaction time
customers.out (excerpts)
Note that:
• the first column contains the sorted ID code, where the prefix is either A, B, C,
D, or E, as drawn from the three-column set file trading_cities.set.
• the output file /INCLUDE statement with file name references (that is,
without logical expressions) was used to ensure uniqueness of ID codes, thereby
preventing duplicate ID codes for any given name and address (see /INCLUDE
and /OMIT Syntax on page 146)
• the names and street addresses were drawn from SET files (see Character SET
Files on page 111)
• the cities and states were drawn from a relational SET file (see Relation SET
Files on page 119).
trades.out (excerpts)
Note that:
• The first column contains the sorted ID code. Note the repetition of ID codes that
simulate multiple transactions per customer. Repetition frequency is based on
the number of records generated and restrictions on the SET values.
• Random timestamps were drawn from a range in a timestamp SET file
(see Timestamp SET Files with Ranges on page 117), and this was used as a sort
key where ID codes were the same.
• A "buy" or "sell" transaction type was chosen from a literal SET
(see Literal SETs on page 122).
• The number of shares was controlled by a literal numeric range SET (see Literal
SETs on page 122).
• The stock symbol and stock prices were drawn from a relational SET file
(see Relation SET Files on page 119).
• The final column’s value was cross-calculated from the number of shares bought
or sold times the price of that stock.
brokers.out (excerpts)
A010||Los Angeles, CA
A011|Acme Brokerage Company|Atlanta, GA
A013|Acme Brokerage Company|Atlanta, GA
...
B004|BrandX Traders, Inc.|Biloxi, MS
B008|BrandX Traders, Inc.|Trenton, NJ
B010|BrandX Traders, Inc.|Salem, NC
...
Conclusion
The above tables, when loaded into a database, will provide a transactional environment.
Because of the common ID code key across the tables, you are able to join, and to create
views that have relational integrity.
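The master-set idea behind this example can be sketched in Python: generate every column once, then project different subsets into each table so that all tables share the same ID codes. The column names, value ranges, and helper names below are hypothetical and are not taken from common_keys.rcl.

```python
import random

FIRST_NAMES = ["John", "Susan", "Mary", "Stephen", "Barry", "Cathy"]

def generate_master(count, seed=0):
    """Generate one master set holding every column any table needs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(count):
        rows.append({
            "id": f"{rng.choice('ABCDE')}{rng.randint(1, 20):03d}",  # prefix + number
            "name": rng.choice(FIRST_NAMES),
            "action": rng.choice(["buy", "sell"]),
            "shares": rng.randint(1, 500),
        })
    return sorted(rows, key=lambda row: row["id"])

def first_per_id(rows, columns):
    """Keep one row per ID code, so the table is unique on the key."""
    table = {}
    for row in rows:
        table.setdefault(row["id"], {column: row[column] for column in columns})
    return list(table.values())

master = generate_master(200)
customers = first_per_id(master, ["id", "name"])            # one row per ID code
trades = [{column: row[column] for column in ("id", "action", "shares")}
          for row in master]                                 # many rows per ID code
print(len(customers), "customers,", len(trades), "trades")
```

Every ID code in the trades projection also exists in the customers projection, which is the property that makes joins across the generated tables meaningful.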
This example shows how you can stream rows of data directly from RowGen into
another process. By streaming data, rather than creating physical files, you do not have
to wait for the data generation to complete in order to start the next process. With this
method, I/O efficiency can scale linearly as data sets grow.
This example also shows how you can use the ctl2ddf utility to translate SQL*Loader
control file statements into RowGen-supported /FIELD layouts. The /FIELD layouts
are written to a data definition file which can be invoked from within a RowGen job
specification script (see Data Definition Files on page 62).
Consider that you have an empty Oracle table instream, which you expect to populate
on a nightly basis with new test data. The table description for instream is:
Name Type
--------- ----------------
CODE CHAR(5)
PRICE NUMBER(6,2)
PART VARCHAR2(12)
Your control file, stream_in.ctl, is used to load the new nightly test data into this table:
LOAD DATA
INFILE 'outstream.dat'
TRUNCATE
INTO TABLE instream
TRAILING NULLCOLS
(code position(0001:0005) char,
price position(0006:0012) DECIMAL EXTERNAL,
part position(0013:0024) char)
To generate the metadata used by RowGen, you can run the utility ctl2ddf against this
control file, as follows:
ctl2ddf stream_in.ctl
This produces the data definition file instream.ddf, which contains:
/FILE=outstream.dat
/FIELD=(code, POSITION=1, SIZE=5)
/FIELD=(price, POSITION=6, SIZE=7.2, NUMERIC)
/FIELD=(part, POSITION=13, SIZE=12)
Note that:
• the .ddf file was named instream.ddf, as derived from the LOAD table entry in
the control file
• the /FILE entry uses the INFILE name from the control file
• the "price" field was given a NUMERIC data type for RowGen purposes, derived
from DECIMAL EXTERNAL; the precision defaults to 2, and the size was increased
to 7 to account for the decimal point.
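The mapping from control-file columns to /FIELD statements can be illustrated with a small Python sketch. This is not the ctl2ddf utility and does not claim to reproduce its rules; the function `to_field` is hypothetical and handles only the two column types that appear in stream_in.ctl.

```python
import re

COLUMN = re.compile(r"(\w+)\s+position\((\d+):(\d+)\)\s+(.*)", re.IGNORECASE)

def to_field(line):
    """Translate one SQL*Loader column line into a /FIELD statement (sketch)."""
    text = line.strip().lstrip("(").rstrip(",)")
    match = COLUMN.match(text)
    if match is None:
        raise ValueError(f"unrecognized column definition: {line!r}")
    name, start, end, data_type = match.groups()
    position = int(start)
    size = int(end) - position + 1          # inclusive byte range
    if data_type.strip().upper() == "DECIMAL EXTERNAL":
        # Precision of 2 assumed, matching the generated instream.ddf above
        return f"/FIELD=({name}, POSITION={position}, SIZE={size}.2, NUMERIC)"
    return f"/FIELD=({name}, POSITION={position}, SIZE={size})"

control_lines = ["(code position(0001:0005) char,",
                 "price position(0006:0012) DECIMAL EXTERNAL,",
                 "part position(0013:0024) char)"]
for line in control_lines:
    print(to_field(line))
```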
For the purposes of generating meaningful data, this example requires that you manually
change one of the field statements in instream.ddf in order to reference SET file values.
That is, change the part field to the following:
/FIELD=(part,SET=parts_list2.set,POSITION=13,SIZE=12)
If you do not make this change, the example will run, but the "part" field will consist of
random ASCII characters.
Consider the following script, stream_out.rcl. This invokes the /FIELD statements in
instream.ddf:
This script will generate 1 million records to be loaded, and orders the rows on the
"code" field. The DIRECT=TRUE clause bypasses SQL*Loader’s indexing function,
which would otherwise slow the load with its internal sort. Because the data has been
pre-sorted on the desired key(s) by RowGen, you can specify this faster direct path load
option.
Finally, you can use the batch file, stream.bat, which will perform the streaming
operation:
mkfifo outstream.dat
rowgen /spec=stream_out.rcl |
sqlldr scott/tiger control=stream_in.ctl DIRECT=TRUE
NOTE
When streaming RowGen data into a process other than SQL*Loader, you may not
need to use a named pipe (as mkfifo specifies). You can declare stdout for
RowGen output (/OUTFILE=stdout), and stdin can be the input source to the next
process. SQL*Loader, however, requires a .dat extension for its data source
(INFILE), so the named pipe convention was used in this example.
When you run stream.bat, the RowGen and SQL*Loader processes begin
simultaneously. To verify the results, check the contents of the newly populated table in
Oracle, which now contains 1 million rows:
10 rows selected.
1 PURPOSE
This chapter describes the RowGen Control Language (RCL). RCL syntax uses
intuitive key words and logical expressions to generate and produce any desired volume
and format of test data.
RCL syntax is explicit; it is a 4GL designed to be easy for anyone to learn, use and
modify. RCL job scripts can be invoked from the command line or the IRI Workbench
Eclipse GUI, embedded in batch scripts, or called from programs.
With a single RCL job script and I/O pass through the RowGen program, you can
generate one or more random files (i.e., from randomly generated data, or from
randomly selected real data) in one or more formats. If you wish, you can also
simultaneously incorporate major transformation and presentation functions for future
application prototyping and report format sharing. The supported manipulations are:
• select
• sort
• aggregate
• cross-calculate
• report.
More specifically, RowGen provides the following functionality during data synthesis:
• A single input definition can be filtered and sorted, and multiple input definitions
with various formats can be mapped into multiple output files with different
formats.
• Summary results can be generated (to multiple levels of grouping at the same
time), and can include maximums and minimums, totals, counts, and averages.
• Detail records can be output to one file, with aggregations sent to another.
RowGen can accomplish its transformation mappings from input to output because it
references the layouts of named fields within your record definitions. This also allows
you to reference metadata that are centrally maintained (in data definition files),
providing the basis for a shared data environment and the creation of materialized views.
Because the source for RCL is the CoSort Sort Control Language (sortcl) program,
your RCL job scripts can be used immediately for equivalent data transformations and
reporting on real data sources if you also license CoSort.
The RowGen data flow diagram on page 54 illustrates the functionality of RowGen by
following the flow of data from input (generation), optional processing (sorting), and
output (production). Data can be mapped, sorted, and remapped into multiple output
targets and formats.
NOTE
The job script files (.rcl), SET files (.set) and result files (.out) referenced
throughout this chapter are provided in the /examples/RCL_chapter
subdirectory of your RowGen install directory, so you can run these examples
and re-create the results.
If you are specifically interested in using RowGen to create test data in structurally and
referentially correct target tables of a relational database, please use the IRI Workbench
GUI, built on Eclipse, and its DB test data creation job wizards, in particular.
The GUI also has a custom test data job wizard that supports RCL functions and
specifications for bespoke table, file, and report targets needing test data.
2 CONVENTIONS
This section describes documentation conventions that are used throughout this chapter.
W Windows users cannot use \ on the command line. Long lines should simply wrap
around to the next line.
WARNING!
You cannot use a \ line continuation character to separate a
[path]filename reference, even if it contains spaces (see
Spaces within File Names/Paths: Windows Users Only on
page 56). Also, you cannot use the line continuation character to
break up any word. You must place the \ before or after the
complete word, and complete the statement on the next line.
2.2 Comments
The # character marks the beginning of comments on a line. Comments may begin after
a statement on the same line, or may be on a line by themselves. The comment continues
until the end of the line, and is ignored during processing.
Square brackets [] are used to describe optional parameters or values. They are also
used in numeric SET file ranges (see Numeric SET Files with Ranges on page 113).
The character $ preceding a variable name directs RowGen to replace the environment
variable with its current value. You may use any of the following conventions:
• $variable
• ${variable}
• $[variable]
U UNIX users defining environment variables in a Bourne or Korn shell must export
the variables for RowGen to recognize them.
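The three reference styles above can be pictured with a short Python sketch of the substitution. It is an illustration only: the function `expand` is hypothetical, and leaving an undefined variable untouched is an assumption, not documented RowGen behavior.

```python
import re

# Matches ${name}, $[name], or $name
REFERENCE = re.compile(r"\$(?:\{(\w+)\}|\[(\w+)\]|(\w+))")

def expand(text, env):
    """Replace each variable reference with its value from `env`."""
    def replace(match):
        name = match.group(1) or match.group(2) or match.group(3)
        # Assumption: an undefined variable is left as written
        return env.get(name, match.group(0))
    return REFERENCE.sub(replace, text)

env = {"KEY1": "chiefs.ddf", "DIR": "/home/test"}
print(expand("/SPECIFICATION=$KEY1", env))
print(expand("/OUTFILE=${DIR}/out", env))
print(expand("/INFILE=$[DIR]/chiefs", env))
```

In a real job the values would come from the process environment (for example `os.environ` in Python), which is why a variable must be exported before RowGen is started.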
The following rules apply both to names and statements recognized by RowGen, and to
file names and field names that you define:
WARNING!
Hyphenated names can be interpreted as mathematical
expressions involving subtraction. The use of underscores
is therefore recommended for compound file and field names.
Statements and field names are not case-sensitive, that is, upper and lowercase letters are
interchangeable, for example:
• POSITION is the same as position.
• /FIELD=(PARTY) is the same as /field=(Party)
U UNIX paths and file names, however, are case-sensitive, so the file chicago is not
the same as the file CHICAGO, Chicago, etc.
W For Windows users with File Allocation Table (FAT) and NT File System (NTFS)
volumes, file names are not case-sensitive, so the file chicago is the same as the
file CHICAGO, Chicago, etc.
Refer to your operating system manual for the acceptable maximum length and naming
format of a file name.
stdin and stdout are used for standard input and standard output (pipes), respectively.
When an output file name is not specified, the default is stdout.
2.6 Abbreviations
/SPECIFICATIONS is the same as /SPEC
3 EXECUTION
RowGen is a standalone program that is executed from the command line or from
within a batch script.
To begin execution from the command line, enter the program name RowGen followed
by either actual specifications, a job specification file, or a combination of both.
Although you can enter RCL statements on the command line, it is strongly
recommended that you organize these statements into a job specification (script) file
which RowGen reads and processes (see Job Specification Files on page 61). This
prevents you from encountering difficulties with shell control characters and command
line buffer limits, and you can preserve your scripts for repeated use.
The syntax for executing a script from the command line is:
rowgen /SPECIFICATION=[path]script_file
rowgen /spec=/home/test/job10.rcl
W The syntax required for Windows users referencing the path to a script_file
depends on whether the drive letter is used. The following is an example of the syntax
required in each case:
rowgen /spec=C:\\home\\test\\job10.rcl
rowgen /spec=/home/test/job10.rcl
That is, Windows users must use a double backslash when including the drive letter as
part of a /SPECIFICATION= command-line statement. If the drive letter is not used,
you can use either a single forward slash (for consistency with UNIX) or a single
backslash (the standard Windows convention). See Spaces within File Names/Paths:
Windows Users Only on page 56 if the name of the job script, or its path, contains a
space.
During RowGen execution, any syntax errors are reported by the line number in your
script. The script also provides a way to re-execute, share, and modify your data
definitions and/or job specifications.
4 USAGE
This section describes the structure of a RowGen script, and explains the basic syntax
requirements for generating data, ordering data (if required), and performing additional
transformations on the generated data (if required). RowGen data flow is controlled by
statements specified in the input, action, and output sections of a RowGen job
specification script, as illustrated in the diagram below. For more details, see the
following sections:
[Diagram: RowGen data flow from the input section, through the action, to one or
more /OUTFILE targets]
Action (Process) This can be a /SORT (the default) over one or more key fields,
or a /REPORT (where the generated records are not ordered).
The input section of a RowGen script consists of an /INFILE name, followed by a set
of attributes such as the number of records to be generated (/INCOLLECT), the field
layout, and record filters (see Input Filenames and Attributes on page 56). In order for
RowGen to generate data, an /INFILE name must be followed by one or more /FIELD
statements. This requirement does not apply to the output section unless you have
additional output-specific requirements (see Output Filenames and Attributes on
page 57). The output section should, however, consist of one or more /OUTFILE
declarations, where you name each output file. Otherwise, the default output target is
stdout.
NOTE
For documentation purposes, the terms generated and produced are used to
differentiate between the data that are generated internally by RowGen based
on your input attributes, and the output targets that are ultimately created.
Input file names in RowGen are required only as placeholders for when real data
become available. RowGen is designed only to generate new data files/streams and
reports based on your specifications. However, if real data sources will become
available at some time in the future, it is recommended that you specify the name of the
input file(s), if known, that you will be working with. In this way, if you purchase a
license for CoSort, the sortcl program interface will recognize, as a legitimate input
source, the input file(s) you have specified, reducing (or in some cases, eliminating) the
need to modify the RowGen script.
The syntax for the input file section of a RowGen script is as follows:
/INFILE=[path]filename
attributes
Optionally, you can create multiple /INFILE sections, each with their own attributes, to
generate disparate data sets that can be processed together, as illustrated in Example 16
on page 36.
A valid input file name is also a named pipe or an unnamed pipe (stdin). Standard input
is relevant if you intend to upgrade to CoSort and use streamed-in data as an input
source. Similarly, if you intend to upgrade to CoSort’s sortcl to perform a job that
involves multiple input sources, you may use the following syntax in RowGen to
identify multiple files of similar formats:
/INFILES=([path]filename1,[path]filename2,...)
attributes
W In some cases, you may want RowGen, for placeholder purposes, to reference a
path name or file name that contains a space. RowGen allows you to use the Windows
convention of a tilde (~) followed by a 1 to substitute for the space. For example, within
a RowGen script, you can use:
/INFILE=C:\progra~1\iri\rowgen21\chiefs
This feature also applies to /OUTFILE (see OUTPUT OPTIONS on page 169) and
/FILE statements (see Data Definition Files on page 62) within scripts.
4.3 Action
This section describes the two fundamental actions (processes) that can take place when
moving generated records (as defined in the input section) to one or more outputs (as
defined in the output section). Only one action statement can be designated in a
RowGen job; they are mutually exclusive. You can specify either of the following
action statements:
/REPORT Generated data are passed to output without ordering, that is, as
unsorted records.
/SORT Records are ordered over one or more keys that you specify. This is
the default action when no action is specified. If no /KEY statements
are included beneath a /SORT statement, the default key is a left-to-
right sort over the entire record (see KEYS on page 135).
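The difference between the two actions can be sketched in Python. The helper names are hypothetical, and comparing whole records as strings is only an approximation of a left-to-right sort; RowGen's collating sequence is not modeled here.

```python
def key_at(position, size):
    """Build a sort key from a 1-based position and size (hypothetical helper)."""
    return lambda record: record[position - 1:position - 1 + size]

def run_action(records, action="SORT", key=None):
    """Apply one of the two mutually exclusive actions to generated records."""
    if action == "REPORT":
        return list(records)              # pass through in generated order
    if action == "SORT":
        return sorted(records, key=key)   # key=None compares entire records
    raise ValueError(f"unknown action: {action}")

records = ["pears", "apples", "peaches"]
print(run_action(records, "REPORT"))
print(run_action(records))                        # default action, default key
print(run_action(records, "SORT", key_at(2, 2)))  # key on bytes 2-3
```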
As records are generated by RowGen, they are processed and written to one or more
outputs, each with optional target-specific field layouts, record-filters, and other
attributes. An output target can be named stdout (the default if no /OUTFILE statement
is included), where RowGen displays results on-screen unless streaming into another
process (see Example 1 on page 13 and Example 18 on page 44). In simple, generation-
only RowGen scripts, there is no need to specify any output file attributes other than the
file name because the data generated and processed by RowGen are based on the layout
(and any record-filter logic) provided in the input file section of the script.
However, if you are performing a job with target-specific requirements, you must add
one or more /FIELD statements (and other statements if required) beneath your output
file name declaration. /FIELD and other statements included on output allow you to
exploit the full range of RowGen’s capabilities in addition to random data generation,
including:
• data-type conversion
• remapping
• conditional field value / substitution
• record reformatting
• complex report formatting
• drill-down and roll-up aggregation
• other derivations (such as mathematical formulae).
/OUTFILE=[path]filename
optional_attributes
This chapter describes all the various ways to perform the above transformations,
starting with FIELD EXPRESSIONS (CROSS-CALCULATION) on page 123.
If you intend to produce multiple output files from the same generated data, then you
must define file-specific attributes with each new /OUTFILE statement, for example:
/OUTFILE=apples
attributes for the apples file
/OUTFILE=peaches
attributes for the peaches file
/OUTFILE=pears
attributes for the pears file
In cases where output reports contain both summary and detail records, a single output
file such as pears will require multiple record formats, in which case you must use an
/OUTFILE statement to describe the attributes of each format, using the same filename:
/OUTFILE=pears
attributes_a
/OUTFILE=pears
attributes_b
...
4.5 /INCOLLECT
This statement determines the number of records to be generated by RowGen after any
include or omit logic is applied. It is specified last in the input section(s) of a script. The
syntax is:
/INCOLLECT=n
or
/INCOLLECT=PERMUTE
You can specify the number of records produced on output using the /OUTCOLLECT
statement (see MISCELLANEOUS OPTIONS on page 174). By default, the
/OUTCOLLECT value is the same as the /INCOLLECT value.
4.6 /OUTCOLLECT
This statement determines the number of records produced for each output target (file)
after data is generated and processed, and then after any output /INCLUDE, /OMIT, and
/OUTSKIP filters are applied. It is specified in the output section(s) of a script.
The syntax is:
/OUTCOLLECT=n
To determine the number of records generated on the input side (after input conditions
are satisfied), use /INCOLLECT (see /INCOLLECT on page 59). If the number of
/OUTCOLLECT records you specify is greater than the /INCOLLECT value, RowGen
will not return more records than were generated.
NOTE
In cases where you are applying record filter logic (see INCLUDE-OMIT
(RECORD SELECTION) on page 146), it is possible that the specified number
of /OUTCOLLECT records for a given output file will not be produced because
a smaller number of generated records meets the output filter condition(s) you
have specified.
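The interaction between /INCOLLECT and /OUTCOLLECT described above can be modeled in a few lines of Python. This is a sketch of the counting rules only: the function names and the random value range are hypothetical, and /OUTSKIP is not modeled.

```python
import random
from itertools import count, islice

def generate(incollect, include=lambda value: True, seed=0):
    """Keep generating until `incollect` records pass the input filter."""
    rng = random.Random(seed)
    candidates = (rng.randint(-99999, 99999) for _ in count())
    return list(islice(filter(include, candidates), incollect))

def produce(generated, outcollect=None, include=lambda value: True):
    """Apply the output filter, then cap the result at `outcollect` records."""
    limit = len(generated) if outcollect is None else outcollect
    return list(islice(filter(include, generated), limit))

generated = generate(7, include=lambda value: value > 0)
print(len(produce(generated)))                         # default: same as generated
print(len(produce(generated, outcollect=5)))           # capped at 5
print(len(produce(generated, outcollect=10)))          # cannot exceed 7
print(len(produce(generated, 5, lambda v: v > 10**6))) # filter leaves nothing
```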
5 FILES
This section describes the types of files that can be referenced by RowGen:
This is a text file you can use to set system resources locally and/or globally that
RowGen’s embedded CoSort engine uses for high-performance sorting of your random
data. You can adjust and control several resources to improve sorting efficiency, such as
the:
• number of threads
• amount of memory
• overflow areas
• verbosity of monitor messages.
W On Windows systems, the resource control file is called rowgen.rc, and its values
take precedence over default registry settings only when this file is in the same directory
as the RowGen (or a separately licensed RowGen) executable.
You may set the environment variable COSORT_TUNER to the path and name for a
resource control file. And, you may create multiple resource control files so that you
may have specific files for specific users and/or jobs. To ensure that a particular file is
used with a specific job, you can use the following statement within a RowGen job
script:
/MEMORY-WORK="[path]filename"
The /MEMORY-WORK values have a higher priority than those in COSORT_TUNER. After
values in COSORT_TUNER are checked, any values in .rowgenrc (UNIX) or rowgen.rc
(Windows) in the search path will be read. If there are still values that have not been set,
factory default values will be used (see Search Order for Resource Controls on
page 211).
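The search order described above behaves like a layered lookup, where the first source that defines a value wins. The Python sketch below shows that idea with `collections.ChainMap`. The setting names and values are invented for illustration and are not actual resource control entries.

```python
from collections import ChainMap

# Hypothetical settings; real resource control entries are not shown here
FACTORY_DEFAULTS = {"threads": 1, "memory": "64MB", "verbosity": 0}

def resolve(job_script_file, tuner_file, local_rc_file):
    """Layer the sources from highest to lowest priority."""
    return ChainMap(job_script_file,   # file named inside the job script
                    tuner_file,        # file named by COSORT_TUNER
                    local_rc_file,     # .rowgenrc or rowgen.rc in the search path
                    FACTORY_DEFAULTS)  # used for anything still unset

settings = resolve({"threads": 4},
                   {"threads": 2, "memory": "1GB"},
                   {"verbosity": 2})
print(settings["threads"], settings["memory"], settings["verbosity"])
```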
These files are used to organize RowGen statements. These named files are referenced
by a /SPECIFICATION or /SPEC statement as shown below:
rowgen /SPECIFICATION=[path]filename
where the filename that is launched by RowGen is the job specification file. When a
/SPECIFICATION=filename statement occurs within a job specification file, the
contents of the referenced file (the Data Definition File, or DDF) are read into the job at
that point. For example, you might have a job specification file as follows:
/FIELD=(pres,SET=names.set,POSITION=1,SIZE=22)
/FIELD=(votes,POSITION=24,SIZE=3,digit)
/FIELD=(party,SET=types.set,POSITION=28,SIZE=3)
/INCOLLECT=500
This is the equivalent of writing the following job specification (indents applied for
readability only):
/INFILE=chiefs
/FIELD=(pres,SET=names.set,POSITION=1,SIZE=22)
/FIELD=(votes,POSITION=24,SIZE=3,digit)
/FIELD=(party,SET=types.set,POSITION=28,SIZE=3)
/INCOLLECT=500
/REPORT
/OUTFILE=parties
Note that instead of using the name of a specification file, you may use an environment
variable that references the file. In that case, the job specification file might be as
follows:
/INFILE=chiefs
/SPECIFICATION=$KEY1
/REPORT
/OUTFILE=parties
When an /INFILE or /OUTFILE statement occurs in a job specification file, the record
definitions can be obtained from the same-named /FILE layout within the data
definition file. The data definition file reference must appear before any /INFILE
or /OUTFILE that will reference the layouts in that data definition file.
# chiefs.ddf
/FIELD=(president,POSITION=1,SIZE=22)
/FIELD=(votes,POSITION=24,SIZE=3)
/FIELD=(service,POSITION=28,SIZE=9)
/FIELD=(party,POSITION=40,SIZE=3)
/FIELD=(state,POSITION=45,SIZE=2)
/INFILE=chiefs
/SPECIFICATION=definitions.ddf
/REPORT
/OUTFILE=out # no change to the generated records
which will produce two files of 100 records each: out will contain randomly generated
records based on the chiefs layout.
By default, all generated RowGen records are consistent with the file format
(process type) called RECORD (see RECORD on page 63).
However, for output purposes, you can change the default file format to any other
supported file format type by using the /PROCESS statement in the output section of a
RowGen job script. You can also produce multiple output files simultaneously, each
with a different file format.
/OUTFILE=filename
/PROCESS=process_type
• RECORD on page 63
• MFVL_SMALL on page 64
• MFVL_LARGE on page 64
• VARIABLE_SEQUENTIAL (or VS) on page 65
• LINE_SEQUENTIAL (or LS) on page 65
• VISION on page 65
• UNIVBF on page 66
• CSV on page 66
• LDIF on page 66
• ODBC on page 67
• XML on page 67
• ELF (W3C Extended Log Format) on page 69.
RECORD
MFVL_SMALL
When you are using this process type on output, you have the option to set your own
minimum and maximum record lengths, the values of which will be written to the
header record to support inter-process conversions.
The syntax for using this process on output therefore supports additional options:
/PROCESS=MFVL_SMALL[,min_length][,max_length]
where min_length and max_length are the minimum and maximum record
lengths contained in the output file. If /PROCESS=MFVL_SMALL on input, this
information is taken from the input file if you do not specify these attributes.
MFVL_LARGE
When you are using this process type on output, you have the option to set your own
minimum and maximum record lengths, the values of which will be written to the
header record to support inter-process conversions.
The syntax for using this process on output therefore supports additional options:
/PROCESS=MFVL_LARGE[,min_length][,max_length]
where min_length and max_length are the minimum and maximum record
lengths contained in the output file. If /PROCESS=MFVL_LARGE on input, this
information is taken from the input file if you do not specify these attributes.
WARNING!
The format of the short integer is machine-dependent.
Therefore, VS data between dissimilar computers (for example,
between RISC and SPARC) may be incompatible due to the
difference in endianness.
VISION
When using /PROCESS=VISION on output, you must specify the index /KEY in the
sortcl job script in order to create an indexed file.
UNIVBF
CSV
To facilitate the creation of CSV file metadata, you may wish to use the utility csv2ddf,
which is provided with the RowGen package in the $ROWGEN_HOME/bin directory
(see the csv2ddf sub-chapter on page 195). In Windows, csv2ddf is located in
\Rowgen21\bin. The utility examines file headers and generates a RowGen data
definition file based on the input field names found in the file header. Its syntax is:
csv2ddf datafile [data definition filename]
LDIF
/INFILE=ldif_data
/FIELD=(cn,SET=cn_vals.set,POSITION=1,SEPARATOR='|')
/FIELD=(address1,SET=add1_vals.set,POSITION=2,SEPARATOR='|')
/FIELD=(zipcode,SET=zip_vals.set,POSITION=3,SEPARATOR='|')
/REPORT
/OUTFILE=test1.ldif
/PROCESS=LDIF
/FIELD=(cn,POSITION=1,SEPARATOR='|')
/FIELD=(address1,POSITION=2,SEPARATOR='|')
/FIELD=(zipcode,POSITION=3,SEPARATOR='|')
ODBC
Use /PROCESS=ODBC in a job script to populate (on output) table columns in databases
supported by ODBC (Open Database Connectivity). For CoSort version 9.5.1, the
following database types have been tested for compliance with /PROCESS=ODBC:
• Oracle
• SQL Server
• DB2
• MySQL.
WARNING!
On some systems, the ODBC specification has been updated
such that a certain value (SQLLEN) has changed from a 32-bit
integer to a 64-bit integer when the software is compiled in 64-
bit mode. Some older drivers will expect this value to be 32-bit,
and newer drivers will expect it to be 64-bit. To address this, a
separate file format must be used,
/PROCESS=ODBC_LEGACY, which supports the older standard.
Current PostgreSQL and MySQL ODBC drivers use the 64-bit
value (/PROCESS=ODBC). Oracle 11i ODBC drivers use the
32-bit value (/PROCESS=ODBC_LEGACY). This is not an issue
on Windows or MacOSX (iODBC always defines it as 64-bit).
XML
When /PROCESS=XML on output, data records are generated within XML tags, and an
XML file is produced. To define XML data elements, you can specify XML attributes
and tag names within each /FIELD statement using an XDEF attribute. Alternatively,
you can define a single XDEF for one field, and the same nodes you specify will be
applied to all remaining fields, where the tag names will assume the /FIELD names by
default.
NOTE
If you do not define any XDEF attributes when using /PROCESS=RANDOM, the
XML tags will default to a generic /File/Record naming convention in the
target XML file.
When defining the XDEF attribute, the syntax for /FIELD statements used for
/PROCESS=XML is:
/FIELD=(fieldname,POSITION=n,SEPARATOR='x',XDEF="/node1/node2[/node3]/tag_name")
Optionally, you can use the @ sign to specify that data is an attribute of the XML tag
preceding it, for example:
/FIELD=(fieldname,POSITION=n,SEPARATOR='x',XDEF="/node1/node2/@attribute_name")
Note that you can specify any level of nodes, depending on how nested the source XML
tag is (for input purposes), or how nested you would like the XML target tag to be.
NOTE
Generally, the first two nodes listed in an XDEF attribute will be the same for
all fields you are defining because the actual data elements do not typically
reside within the two uppermost node levels in the XML hierarchical structure.
The /FIELD statements that you define will determine the size and position of the field
elements within the records that are output by RowGen.
The following is an example of a script that will produce an XML output file. Note that
the fields defined on input will be mapped automatically to the output section because
no fields are specified beneath the /OUTFILE statement (see Data Flow Structure in
RowGen Scripts on page 55):
/INFILE=xml_data.in
/FIELD=(president,POSITION=1,SEPARATOR="^",XDEF="/chiefs/chief/president")
/FIELD=(state,POSITION=2,SEPARATOR="^",XDEF="/chiefs/chief/state")
/FIELD=(party,POSITION=3,SEPARATOR="^",XDEF="/chiefs/chief/party")
/SORT
/KEY=president
/OUTFILE=data.xml
/PROCESS=XML
In this case, because the same node structure is desired for all output fields, and because
the tag names that are desired in the target are identical to the RowGen /FIELD names,
the same XML output file could be generated by specifying only one XDEF as an
indicator of how the other fields will be written:
/INFILE=xml_data.in
/FIELD=(president,POSITION=1,SEPARATOR="^")
/FIELD=(state,POSITION=2,SEPARATOR="^")
/FIELD=(party,POSITION=3,SEPARATOR="^",XDEF="/chiefs/chief/party")
/SORT
/KEY=president
/OUTFILE=data.xml
/PROCESS=XML
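To illustrate how an XDEF path maps a field to a position in the XML hierarchy, the following Python sketch builds a small document from per-field paths. It is an approximation for explanatory purposes only: the function build_xml, the field names, and the sample record are hypothetical, and the XML that RowGen writes may differ in layout and header content.

```python
import xml.etree.ElementTree as ET


def build_xml(records, xdefs):
    """records: list of dicts (field name -> value).
    xdefs: dict (field name -> XDEF-style path such as /root/record/tag)."""
    first = next(iter(xdefs.values())).strip("/").split("/")
    root = ET.Element(first[0])                 # uppermost node, e.g. chiefs
    for rec in records:
        rec_node = ET.SubElement(root, first[1])  # one record node per row
        for field, path in xdefs.items():
            parts = path.strip("/").split("/")
            node = rec_node
            for name in parts[2:-1]:            # optional deeper nodes
                child = node.find(name)
                node = child if child is not None else ET.SubElement(node, name)
            leaf = parts[-1]
            if leaf.startswith("@"):
                node.set(leaf[1:], rec[field])  # @name stores the value as an attribute
            else:
                ET.SubElement(node, leaf).text = rec[field]
    return root


xdefs = {
    "president": "/chiefs/chief/president",
    "state": "/chiefs/chief/state",
    "party": "/chiefs/chief/party",
}
# Illustrative record only; not taken from any RowGen output.
records = [{"president": "Taft, William H.", "state": "OH", "party": "REP"}]
xml_text = ET.tostring(build_xml(records, xdefs), encoding="unicode")
```

In this sketch the first two path components name the file-level and record-level nodes, which is why they are normally identical across all fields.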
Web Logs
RowGen supports the generation of internet web transaction files in W3C Extended Log
Format (ELF). These files have a header containing lines of comments, followed by a
line naming the data fields. When you specify /PROCESS=ELF on output, RowGen
produces a header based on the field names and positions as they are specified on output.
If you wish to create a RowGen script that is designed to process ELF files when the
actual data become available, you can use the utility elf2ddf, which is provided with the
RowGen package in the $ROWGEN_HOME/bin directory. In Windows, elf2ddf is
located in \Rowgen21\bin (see the elf2ddf sub-chapter on page 199). This utility reads
ELF file headers and generates RowGen data definition files accordingly, giving you a
base for generating ELF data files. Its syntax is:
elf2ddf datafile [data-definition-filename]
5.5 Statistics
5.6 Auditing
RowGen can produce a self-appending log file, in XML format, that contains
comprehensive job information for the purposes of auditing. Auditing is enabled when
the AUDIT entry is present in the rowgenrc file on Unix or the Windows Registry. An
audit record is produced for every RowGen execution, and these records append to the
XML [path]filename specified by the AUDIT entry.
Consider that the following entry is set in the Windows Registry (or set in the
rowgenrc file on Unix):
AUDIT=C:\RowGen\Tests\mytests\audit\rgaudit.xml
/INFILE=audit.in
/SPECIFICATION=fields.ddf
/REPORT
/OUTFILE=$CSOUTPUT
/SPECIFICATION=fields.ddf
where fields.ddf contains the first two /FIELD statements and a nested
/SPECIFICATION entry that refers to the remaining two fields of the generated record:
/FIELD=(name,POSITION=1,SIZE=27,ASCII)
/FIELD=(year,POSITION=28,SIZE=12,ASCII)
/SPECIFICATION=fields2.ddf
/FIELD=(party,POSITION=40,SIZE=5,ASCII)
/FIELD=(state,POSITION=45,SIZE=2,ASCII)
When the above job script is executed, an entry is created in, or appended to, the audit
file rgaudit.xml. When rgaudit.xml is opened in a text editor, the .ddf file names and
their components are tabbed within the <Script> element to allow for easier
readability of nested script specifications, as follows:
...
<Script>
/AUDIT=C:\RowGen\Tests\mytests\audit\rgaudit.xml
/INFILE=audit.in
/SPECIFICATION=fields.ddf
/FIELD=(name,POSITION=1,SIZE=27,ASCII)
/FIELD=(year,POSITION=28,SIZE=12,ASCII)
/SPECIFICATION=fields2.ddf
/FIELD=(party,POSITION=40,SIZE=5,ASCII)
/FIELD=(state,POSITION=45,SIZE=2,ASCII)
/REPORT
/OUTFILE=C:\RowGen\Tests\mytests\audit\chiefs.out
/SPECIFICATION=fields.ddf
/FIELD=(name,POSITION=1,SIZE=27,ASCII)
/FIELD=(year,POSITION=28,SIZE=12,ASCII)
/SPECIFICATION=fields2.ddf
/FIELD=(party,POSITION=40,SIZE=5,ASCII)
/FIELD=(state,POSITION=45,SIZE=2,ASCII)
</Script>
...
NOTE
This audit file contains only one record. Subsequent jobs that are written to
rgaudit.xml are appended to the bottom, and always begin with a new
<AuditRecord> tag.
6 FIELDS
The data files/streams generated by RowGen consist of records, where records consist
of one or more fields that you define. Fields are the building blocks of RowGen files,
and the various syntax options for defining fields allow you to customize the contents to
best suit your requirements.
The location of a field within a RowGen record is either:
• fixed, where the starting byte for each field is always in the same column
• floating or delimited, where the first field is at position 1, and subsequent fields
are separated by a delimiter character(s).
In both cases, numbering begins with one (1). RowGen allows you to produce files that
contain both fixed and floating fields within the same record.
6.1 Syntax
The following is the syntax for defining a fixed-position field. When defining input
fields to be generated, the POSITION and SIZE attributes are required, in addition to
field name:
/FIELD=(fieldname[,ROWID[=values]][,SET=[access ][NOT_SORTED ]
[path]filename][,SET={literal_set}],POSITION=n,SIZE=n[.n]
[,FRAME='char'][,data type])
The following is the syntax for defining delimited fields. When defining input fields to
be generated, the POSITION and SEPARATOR are required, in addition to the field name:
/FIELD=(fieldname[,ROWID[=values]][,SET=[access ][NOT_SORTED ]
[path]filename][,SET={literal_set}],POSITION=n,SEPARATOR='char'[,SIZE=n[.n]]
[,FRAME='char'][,data type])
Note that field attributes are separated by commas. The field name must appear first, but
other attributes may appear in any order. The n shown for some of the parameter values
represents a whole number.
Note that alignment, MILL, and FILL options are available when defining /FIELD statements
in the output section of RowGen scripts (see Alignment on page 90, MILL on page 91,
and FILL on page 92).
The field name you provide identifies the field. A symbolic or meaningful name is
recommended. Once you have identified a field name in the input section of your script,
it is recognized in the /OUTFILE section(s) of the same script, or in the /INREC section
(see /INREC on page 126). In some cases, you can create a newly named field on output
or in /INREC, but only when you are deriving a new field (for example, for summary
purposes or for cross-calculation).
In the output file only, it is possible to have a field definition that is only an input field
name. Each defined field will be mapped sequentially (one after the other). For example,
if you had defined/generated the fields lastname and firstname in the input file, you
could have the following field definitions in an output file:
/FIELD=(lastname)
/FIELD=(firstname)
Unless new field attributes are specified, the output fields will retain the attributes
(such as size, position, and data type) specified in the input section (or the /INREC
section, which takes precedence, if applicable).
6.3 ROWID
The ROWID field attribute enables you to assign a unique, incrementing or decrementing
row number (or ID tag) to a field within each generated record. ROWID is valid in the
/INFILE section of a job script. See SEQUENCER on page 166 for details on creating
an incremental value field in the /OUTFILE section of a job script.
Syntax
ROWID[="[INIT][,STEP][,MAX or MIN]"]
INIT The initial value for the counter. Can be either a positive or negative
integer of up to 19 digits. (RowGen supports RowID values that
reach, but do not exceed, 22 digits.) The INIT value defaults to 1.
STEP The increment value of the ROWID counter. This can be set to a
positive step to indicate a forward count, or a negative step that
requires a minus sign (-) to indicate a backward count. Defaults to
counting forward by one.
MAX Applicable when you specify a positive STEP value (the default).
The highest value that the ROWID counter will reach. If this value is
reached, the ROWID counter recycles to the INIT value. The MAX
value can be either a positive or negative integer. By default, there is
no MAX value.
MIN Applicable when you specify a negative STEP value. The lowest
value that the ROWID counter will reach when counting backwards. If
this value is reached, the ROWID counter recycles to the INIT value.
The MIN value can be either a positive or negative integer. By default,
there is no MIN value.
NOTE
If you want to omit the STEP option, and rely on its default (STEP=1), you
must include an empty space between the commas to denote the absence of the
STEP, as shown in the last example above.
If the field with a ROWID attribute also includes a SIZE, the ROWID value
will be right-aligned in accordance with the field size.
This example, rowid.rcl, demonstrates how the ROWID field attribute behaves when
INIT, STEP, and MAX values are provided. It requires the SET file parts_list1.set,
which contains:
Brackets
Screws
Nails
Tacks
1|Nails|-3.88
11|Brackets|41.47
21|Screws|65.66
31|Brackets|-2.32
41|Brackets|-8.14
51|Nails|-1.13
61|Tacks|53.58
71|Nails|-6.35
81|Nails|-6.24
91|Brackets|-4.02
1|Screws|-.99
11|Tacks|85.45
21|Brackets|82.73
31|Screws|75.14
41|Tacks|71.60
51|Screws|-4.52
and so on....
Note that:
• the ROWID field named "pkey" begins at 1 (INIT=1) for the first record
• the "pkey" field increments by positive 10 (STEP=10) for subsequent records
• when the MAX value of 100 is reached (91 in this case due to the STEP size),
the ROWID counter then recycles to 1.
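The counting and recycling behavior described above can be modeled with a short Python generator. This is a sketch of the documented behavior, not RowGen code; the function name rowid and its limit parameter (standing in for MAX or MIN) are hypothetical, and it assumes the counter recycles to INIT whenever the next step would pass the limit, as the 91-to-1 transition in the output suggests.

```python
from itertools import islice


def rowid(init=1, step=1, limit=None):
    """Yield ROWID-style values. limit acts as MAX when step is positive
    and as MIN when step is negative; None means no limit."""
    value = init
    while True:
        yield value
        nxt = value + step
        if limit is not None and (
            (step > 0 and nxt > limit) or (step < 0 and nxt < limit)
        ):
            nxt = init  # recycle to the initial value
        value = nxt


# INIT=1, STEP=10, MAX=100, as in the example above.
values = list(islice(rowid(1, 10, 100), 12))
```

With these settings the sequence runs 1, 11, 21, and so on through 91, then restarts at 1, matching the first column of the sample output.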
See SET FILES on page 108 for full details on using user-defined SETs to provide
realistic values from which RowGen can draw.
6.5 Distributions
Establish parameters for data that control the distribution of values (numeric or
character-based) that best approximate the occurrence rate (or spread) of certain values.
The following distribution types use either routines or set files.
• Linear
• Normal (bell curve) for mean and standard deviation
• Normal (bell curve) for a range
• Weighted distribution of items
The routines used for these distributions are contained in the library file libcsnum,
provided in /lib/modules in the home directory.
Linear
(field1=linear(min_value,max_value,precision),TYPE=ALPHA_DIGIT,POSITION=1,SEPARATOR=",")
where linear is the routine used, field1 is the name of the new field, min_value
and max_value are the lowest value and highest value in the distribution, and
precision is the number of decimal places or significant digits of the generated
numbers.
The syntax for invoking a normal distribution for mean and standard deviation is:
(field1=normal_distribution1(mean,std_dev,precision),TYPE=ALPHA_DIGIT, POSITION=1,SEPARATOR=",")
where normal_distribution1 is the routine used, field1 is the name of the new
field, mean is the average value within all the generated values, std_dev sets the
standard deviation from the mean, and precision is the decimal precision of the
generated numbers.
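As a rough illustration of what the mean, std_dev, and precision parameters control, the following Python sketch draws values from a normal distribution using the standard library. It does not call the RowGen routine; the function normal_values and its seed parameter are hypothetical, and precision is interpreted here as a number of decimal places.

```python
import random


def normal_values(mean, std_dev, precision, count, seed=0):
    """Return count values drawn from a normal distribution,
    rounded to the given number of decimal places."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [round(rng.gauss(mean, std_dev), precision) for _ in range(count)]


sample = normal_values(50.0, 5.0, 2, 1000)
sample_mean = sum(sample) / len(sample)
```

With enough values, sample_mean approaches the requested mean, and most values fall within a few standard deviations of it.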
where normal_distribution2 is the routine used, field1 is the name of the new
field, min_value and max_value are the lowest value and highest value in the range,
and precision is the number of decimal places or significant digits of the generated
numbers.
You can control the occurrence rate of certain literal values in relation to the occurrence
rate of others, for example if you want 60 percent males and 40 percent females in your
test data, irrespective of the number of total records generated. The total must equal 100
percent.
(field1=weighted_distribution(ratio,"value",percentage,"value"),TYPE=ALPHA_DIGIT,POSITION=1,SEPARATOR=",")
where weighted_distribution is the routine used, field1 is the name of the new
field, percentage is a percentage for the occurrence of the value, value is a literal
value that will be generated in accordance with the above percentage in relation to the
other percentage/value entries.
You can add multiple entries, but the percentages must equal 100.
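The percentage/value pairing can be illustrated with a small Python sketch. It is not the RowGen routine: the function weighted_values is hypothetical, and because it selects values at random, the resulting proportions only approximate the requested percentages, whereas RowGen's own routine may apply them differently.

```python
import random


def weighted_values(entries, count, seed=0):
    """entries: list of (percentage, value) pairs.
    The percentages must total 100, as the text above requires."""
    total = sum(pct for pct, _ in entries)
    if total != 100:
        raise ValueError(f"percentages total {total}, expected 100")
    rng = random.Random(seed)
    values = [value for _, value in entries]
    weights = [pct for pct, _ in entries]
    return rng.choices(values, weights=weights, k=count)


# 60 percent "M" and 40 percent "F", as in the example above.
genders = weighted_values([(60, "M"), (40, "F")], 10000)
share_m = genders.count("M") / len(genders)
```

Validating the total up front mirrors the rule that the entries must add up to 100 percent.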
Random numeric values will be generated from the user-defined space; the entire space
can consist of one or more smaller spaces. Each entry that you define describes that
smaller space. This dialog has entries to define the smaller spaces using percentage-
based ranges of numeric values, with flexible minimum and maximum ranges for each.
Weighted Distribution requires the existence of a set file with one or more entries with
five options that must be tab separated. The options are percentage, beginning minimum
value, ending minimum value, beginning maximum value, and ending maximum value.
The syntax to invoke the set file with weight options is:
where field1 is the name of the new field. You must use WEIGHTS followed by a
reference to the set file, including the path.
20 45 25 45 45
20 5 5 5 25
30 5 5 45 45
30 0 25 50 25
If you invoke the above set file with the WEIGHTS option, RowGen will produce a
weighted distribution of values that would look similar to the following if mapped to a
scatter-plot diagram.
As shown in the diagram above, there are four sections, two of which represent the 20
percent range entries, and two that represent the 30 percent range entries. Note that a
scatter plot similar to the above can be generated with a Preview feature in the IRI
Workbench.
6.6 POSITION
This statement describes the starting location of each field in the record. The syntax is:
POSITION=n
For a fixed-position field, n is the starting column for the first byte of the field. In a
floating (delimited) field, n is the field number when counting from left to right. If
defining delimited fields on output, you must define each field sequentially even if
the field is null.
To generate three fixed-position fields, you might use the following script, fixed.rcl:
/INFILE=fixedpos.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(code1,POSITION=1,SIZE=5,digit)
/FIELD=(code2,POSITION=7,SIZE=5,digit)
/FIELD=(code3,POSITION=13,SIZE=5,digit)
/REPORT
/OUTFILE=fixedpos.out
To produce delimited rows, you might use the following script, variable.rcl:
/INFILE=variable.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(code1,POSITION=1,SIZE=5,SEPARATOR='|',WHOLE)
/FIELD=(code2,POSITION=2,SIZE=5,SEPARATOR='|',WHOLE)
/FIELD=(code3,POSITION=3,SIZE=5,SEPARATOR='|',WHOLE)
/REPORT
/OUTFILE=variable.out
95278|12121|97825
78550|20426|68242
45025|95232|51598
Note that the positions are now 1, 2, and 3, which indicate the field positions with
respect to the separator (|), rather than fixed byte positions. Note also how the fields are
given a size, which is supported for both fixed-length and delimited fields. The ASCII-
numeric data type WHOLE is used, which generates whole numbers anywhere from size 1
to size 5, unlike the ASCII-character data type digit, which would always generate size 5
(see ASCII Character Data Types on page 94 and ASCII-Numeric Data Types on
page 95).
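The two meanings of POSITION can be contrasted with a short Python sketch that reads fields back out of a record. The helper names fixed_field and delimited_field are hypothetical, and the fixed-position record shown is an invented sample laid out for positions 1, 7, and 13 with SIZE=5.

```python
def fixed_field(record, position, size):
    """POSITION is the 1-based starting column of the field."""
    start = position - 1
    return record[start:start + size]


def delimited_field(record, position, separator):
    """POSITION is the 1-based ordinal of the field between separators."""
    return record.split(separator)[position - 1]


fixed_record = "11111 22222 33333"      # hypothetical fixed-position record
delimited_record = "95278|12121|97825"  # first record of variable.out above

code2_fixed = fixed_field(fixed_record, 7, 5)
code2_delimited = delimited_field(delimited_record, 2, "|")
```

In the fixed case the second field is found by its column (7), while in the delimited case it is found by its ordinal (2), regardless of how wide the first field happens to be.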
6.7 SIZE
This statement sets the width, in bytes, of a given field. If you do not provide a SIZE
attribute on input, RowGen will generate fields that range from size 1 through size 10
by default, regardless of the data type of that field. However, if you want to limit this
variance, or set a fixed length for all generated values, you can use the SIZE attribute for
both fixed-position and delimited fields (see also MIN_SIZE and MAX_SIZE on
page 82). Depending on the data type, this attribute behaves differently:
All Other Data Types For all data types other than ASCII-Numeric, any SIZE attribute you
provide will dictate the actual size of every value generated. For
example, if you want to generate a whole number that is always
5 characters long (such as a zip code), you would use the data type digit
and specify a size of five, because using WHOLE (an ASCII-Numeric
data type) would generate values that can be of size 1, 2, 3, 4, or 5.
NOTE
If you want to select from a range of known values that may not all be the same
size, you can provide a list of acceptable values in a SET file (see SET Files
and Literal SETs on page 77), in which case the SIZE attribute you specify
will simply create a fixed, maximum field size into which the randomly drawn
values will be placed.
For example, to specify a SKU field as 13 bytes long, the statement could be:
/FIELD=(SKU,POSITION=16,SIZE=13,digit)
The .precision option applies only to fields declared as NUMERIC. By default, fields
specified as NUMERIC on input will be generated with a decimal point and two decimal
places to the right (the .precision option is not required). Numeric fields are also
right-aligned by default when fixed-position fields are generated. However, you can also
control the number of decimal places, where the .precision value you specify
indicates the number of digits to be displayed to the right of the decimal point, and size
then represents the total length of the field, including the decimal point, the decimal
places, and the negative sign, if applicable.
/INFILE=numsize.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(value,POSITION=1,SIZE=8.3,NUMERIC)
/REPORT
/OUTFILE=numsize.out
This will produce numsize.out, where the precision is 3 and the field size is up to eight
bytes wide:
8046.126
5858.720
-168.132
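The relationship between the total size and the precision can be checked with a small Python sketch. The helper numeric_field is hypothetical and simply uses Python string formatting to mimic a right-aligned NUMERIC field; it does not reproduce RowGen's overflow handling.

```python
def numeric_field(value, size, precision):
    """Format value with a fixed number of decimal places,
    right-aligned in a field of the given total width."""
    text = f"{value:.{precision}f}"
    return text.rjust(size)


a = numeric_field(-168.132, 8, 3)   # sign, digits, and decimal point fill all 8 bytes
b = numeric_field(5858.72, 8, 3)    # trailing zero added to reach 3 decimal places
c = numeric_field(12.5, 8, 3)       # shorter value is padded on the left
```

Each result is exactly eight characters wide because the decimal point, the three decimal places, and any minus sign all count toward SIZE.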
NOTE
If you want to generate a NUMERIC field without decimal places, you must
specify a SIZE with a precision of 0, such as SIZE=5.0.
If you declare an output field as NUMERIC, and assign it a size that is shorter
than that which was defined on input, then *** characters will be produced to
indicate an overflow. Similarly, if you create summary NUMERIC fields that are
not sized appropriately to accommodate a product of two numeric values, for
example, the overflow characters will appear (see SUMMARY FUNCTIONS
(AGGREGATION) on page 156).
MIN_SIZE and MAX_SIZE are used to manage the size of generated field values. This is
particularly useful when generating delimited (variable-length) fields when you want the
size of the field values to vary according to your parameters, rather than having the same
fixed length assigned to all generated fields when the SIZE attribute is used (see SIZE
on page 81).
You can specify a minimum and maximum value size for any field with an ASCII
character or numeric data type (specifically, those described in ASCII Character Data
Types on page 94 and ASCII-Numeric Data Types on page 95).
[MIN_SIZE=width1][,MAX_SIZE=width2][,SIZE=width3]
where width1 and width2 are the minimum and maximum character string sizes to be
generated. Optionally, you can also include a SIZE attribute, where width3 is the total
field width to be generated in all instances, irrespective of the string size dictated by
MIN_SIZE and MAX_SIZE. That is, fields will be padded with blank spaces when the
string generated is not as large as the SIZE you specify. Note that you can use any
combination of the above attributes.
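The interaction of the three attributes for character data can be sketched in Python as follows. The function alpha_field is hypothetical; it assumes the generated string is left-aligned and padded with blanks on the right when SIZE is given, which is one plausible reading of the description above.

```python
import random
import string


def alpha_field(min_size, max_size, size=None, seed=None):
    """Generate a random alphabetic string whose length lies between
    min_size and max_size, optionally padded with blanks to size."""
    rng = random.Random(seed)
    length = rng.randint(min_size, max_size)  # inclusive on both ends
    text = "".join(rng.choice(string.ascii_letters) for _ in range(length))
    return text if size is None else text.ljust(size)


# String of 3 to 6 characters inside a field that is always 10 wide.
fields = [alpha_field(3, 6, size=10, seed=s) for s in range(50)]
```

Every element of fields is ten characters wide, while the non-blank portion varies between three and six characters.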
[MIN_SIZE=width1[.n]][,MAX_SIZE=width2[.n]][,SIZE=width3[.n]]
where width1 and width2 are the minimum and maximum widths of the numeric
values to be generated (including any minus sign, or decimal point and decimal digits if
applicable, as shown in Example 24 on page 82). Optionally, you can specify a decimal
precision of n (the default precision for NUMERIC fields is 2), but any precision value
you specify for MIN_SIZE must be equal to the precision you specify for MAX_SIZE.
(That is, the width of the numeric value before the decimal point can vary, but the
number of digits after the decimal point cannot.) Optionally, you can also include a
SIZE attribute, where width3 is the total field width to be generated, irrespective of the
numeric value size dictated by MIN_SIZE and MAX_SIZE. However, any precision
given for SIZE must be equal to the precision given for MIN_SIZE and MAX_SIZE.
Note that you can use any combination of the above attributes.
See Example 4 on page 16 for an example of using MIN_SIZE and MAX_SIZE attributes
to generate ASCII character and numeric values.
Precision
In the output section of a RowGen script, you can use the PRECISION attribute to
assign a uniform decimal precision to numeric fields. Using PRECISION instead of the
SIZE=width[.precision] convention ensures that each numeric value is displayed
with the decimal precision you specify, without the limitation of having to specify a
SIZE that would result in the same field length for every record (see SIZE on page 81
and Example 24 on page 82). The syntax is:
PRECISION=n
ASCII Substrings
RowGen allows you to identify substrings of either an ASCII field that was generated
on input, or a string of characters that you provide. You can use substrings to re-position
and re-cast these values on output (or in /INREC).
where:
• field_name is a field that was specified on input, and the substring is
derived from its field contents
• "string" is an ASCII string from which the substring will be taken
• value1 is the offset. It can be a positive value (the default) to indicate the
number of characters from the left of the field (or string) where you want your
substring to begin. Or, you can use a negative value to indicate the number of
characters from the right of the field (or string).
• value2 is the substring length. It is the number of bytes/characters you want to
include in the string once your starting point has been determined using value1.
NOTE
value1 and/or value2 can be field names rather than integers if the fields
you specify contain integer values.
This example incorporates some of the substring options that are available. Consider the
SET file chiefs.set:
McKinley, William
Roosevelt, Theodore
Taft, William H.
Wilson, Woodrow
Harding, Warren G.
The following script, substring.rcl, incorporates several uses of the substring option:
/INFILE=substring.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(president,SET=chiefs.set,POSITION=1)
/REPORT
/OUTFILE=substring.out
/FIELD=(president,POSITION=1)
/FIELD=(sub_string(president,3,3),POSITION=25,SIZE=10)
/FIELD=(sub_string(president,-3,3),POSITION=35,SIZE=10)
/FIELD=(sub_string("tuvwxyz",-3,3),POSITION=44,SIZE=3)
Note that:
• The first substring field at position 25 begins at the third character from the left
of the president field and extends for 3 bytes.
• The second substring at position 35 uses a negative offset to show that the
substring was taken by counting in 3 from the right-most character of the
president field.
• The final substring uses the "string" option and also features a negative offset,
giving the last three characters xyz.
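The offset and length rules can be expressed as a short Python function. This is a sketch of the behavior described above rather than RowGen's own code; it assumes positive offsets are 1-based from the left and negative offsets count back from the right-most character.

```python
def sub_string(source, offset, length):
    """Return length characters of source, starting offset characters
    from the left (positive) or from the right (negative)."""
    if offset > 0:
        start = offset - 1
    else:
        start = max(len(source) + offset, 0)
    return source[start:start + length]


left = sub_string("McKinley, William", 3, 3)    # third character onward
right = sub_string("McKinley, William", -3, 3)  # last three characters
literal = sub_string("tuvwxyz", -3, 3)          # literal string source
```

Here left is "Kin", right is "iam", and literal is "xyz", which agrees with the final bullet above.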
6.8 SEPARATOR
The separator character is used as the delimiting character that separates floating fields.
The syntax is:
SEPARATOR='option'
For an example of generating records with a single ASCII separator, see Example 23 on
page 80.
Multi-Character Separators
The following script, multichar.rcl, specifies that three records will be generated, each
consisting of three alpha_digit fields (alphabetic and digit characters, the default
data type) separated by a multiple-character separator:
/INFILE=multichar.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(first,POSITION=1,SIZE=3,SEPARATOR=',*|')
/FIELD=(second,POSITION=2,SIZE=3,SEPARATOR=',*|')
/FIELD=(third,POSITION=3,SIZE=3,SEPARATOR=',*|')
/REPORT
/OUTFILE=multichar.out
vel,*|vtS,*|bzO
plZ,*|BP7,*|PhQ
1PG,*|Eud,*|5ds
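A consumer of such a file must treat the whole separator string as one delimiter. The following Python sketch, which is illustrative and not part of RowGen, splits the three records shown above on the multi-character separator and joins them back again.

```python
SEPARATOR = ",*|"

records = [
    "vel,*|vtS,*|bzO",
    "plZ,*|BP7,*|PhQ",
    "1PG,*|Eud,*|5ds",
]

# Split on the complete separator string, not on its individual characters.
parsed = [record.split(SEPARATOR) for record in records]

# Joining with the same separator reproduces the original records.
rejoined = [SEPARATOR.join(fields) for fields in parsed]
```

Each record yields exactly three fields of three characters, as specified by the SIZE=3 attributes in the script.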
SEPARATOR=',*\t'
RowGen allows the use of different separators within the same record definition. The
fields delimited by one separator are totally independent of the fields delimited by
another.
Consider that you want to produce a file that contains multiple separators, where two
inner sub-fields are delimited by a comma (,) and three total fields are delimited by a
pipe (|).
/INFILE=multiple_seps.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(left1,POSITION=1,SIZE=3,SEPARATOR='|')
/FIELD=(mid1,POSITION=2,SIZE=2,SEPARATOR='|')
/FIELD=(mid2,POSITION=2,SIZE=2,SEPARATOR=',')
/FIELD=(right2,POSITION=3,SIZE=3,SEPARATOR='|')
/REPORT
/OUTFILE=multiple_seps.out
gSq|ZF,7Y|YKl
BeE|jS,Jj|7S0
FY8|zl,v7|2HO
6.9 FRAME
FRAME='char'
You can create one or more framed fields in the input section of a script, or you can
remove, change, or add a new frame to one or more fields in the output (or INREC)
section.
NOTE
When framing a field to which you assign a SIZE, such as when defining
fixed-position fields, you must add 2 to the size of the field contents to account
for the enclosing characters. For example, a random digit field with SIZE=5
and FRAME='*' might appear as *726*.
Smith, Wanda
Jones, Paul
Jones, Abe
Smith, Bill
Smith, Walter
Jones, Mark
The following script, gen_frame.rcl, encloses the names field within quotes (") to
protect its contents from being processed incorrectly:
/INFILE=framed.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(last_first,SET=full_names.set,POSITION=1,SEPARATOR=',',FRAME='"')
/FIELD=(code,POSITION=2,SIZE=3,digit,SEPARATOR=',')
/SORT
/KEY=last_first
/NODUPLICATES
/OUTFILE=framed_protected.out
"Jones, Paul",985
"Smith, Bill",429
"Smith, Walter",724
"Smith, Wanda",728
Note that both segments of the name field (last and first) were protected for sorting
purposes. Without the frame, the two fields in this example could not have been
generated as comma-separated, and the name field would not be protected.
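The protective effect of the frame can be seen by reading one of the output records back in. The Python sketch below uses the standard csv module as a stand-in for any downstream consumer; it is not part of RowGen.

```python
import csv
import io

framed_line = '"Jones, Paul",985'
unframed_line = "Jones, Paul,985"

# With the quote frame, the embedded comma stays inside the name field.
framed = next(csv.reader(io.StringIO(framed_line), delimiter=",", quotechar='"'))

# Without the frame, a plain split sees three fields instead of two.
unframed = unframed_line.split(",")
```

The framed record parses into two fields (name and code), whereas the unframed record is split into three pieces at the comma inside the name.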
This example shows how you can create CSV files, where all fields are framed by
default. Consider the following script, cvs_out.rcl:
/INFILE=framed.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(field1,POSITION=1,SEPARATOR=',',alpha)
/FIELD=(field2,POSITION=2,SIZE=3,SEPARATOR=',',digit)
/FIELD=(field3,POSITION=3,SEPARATOR=',',alpha)
/SORT
/KEY=(field2,DESCENDING)
/OUTFILE=csv.out
/PROCESS=csv
field1,field2,field3
"EV","596","OPuEfz"
"pwX","589","mEO"
"UipnSF","047","rtSQz"
Note that "field2" was sorted in descending order (see KEYS on page 135), and enclosed
in quotes on output.
6.10 Alignment
This field attribute aligns a desired field string (not its leading or trailing fill characters)
to either the left or right of the target output or /INREC field (see /INREC on page 126).
Alignment also moves leading or trailing fill characters to the opposite side of the string.
The following alignment options are accepted:
LEFT_ALIGN The string beginning with the first desired (non-fill) character is
aligned to the left of the target field. The remaining length to the right
of the target field is populated with the specified fill character.
RIGHT_ALIGN Fill characters to the right of the source string are removed. The
remaining source string is moved to the right side of the target field.
The remaining length to the left of the target field is populated with
the specified fill character.
By default, the fill character is considered to be a space, but you can specify a different
FILL character on input or on output (see FILL on page 92).
Consider the SET file chiefs.set (see Example 25 on page 85). The following script,
align.rcl, will generate an ASCII field called name (with values drawn from the SET
file) and an age field (a randomly generated two-digit number):
/INFILE=align.in
/PROCESS=RANDOM
/INCOLLECT=4
/FIELD=(name,SET=chiefs.set,POSITION=1,SIZE=20)
/FIELD=(age,POSITION=22,SIZE=2,digit)
/SORT
/KEY=name
/OUTFILE=align.out
/FIELD=(name,POSITION=1,SIZE=20,RIGHT_ALIGN)
/FIELD=(age,POSITION=22,SIZE=2,digit)
  Harding, Warren G. 37
 Roosevelt, Theodore 34
    Taft, William H. 70
     Wilson, Woodrow 72
Note that the name field was sorted (see KEYS on page 135), and right-aligned on
output. The spaces (fill characters) existing within the name strings have not been
affected by the alignment.
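The alignment rules can be approximated in Python as follows. This is a sketch of the described behavior, not RowGen's implementation; the function name align and its parameters are hypothetical:

```python
def align(value, size, side="right", fill=" "):
    """Remove fill characters from both ends, then pad the opposite side to `size`."""
    core = value.strip(fill)  # interior fill characters are left untouched
    if side == "right":
        return core.rjust(size, fill)
    return core.ljust(size, fill)

right = align("Taft, William H.    ", 20, "right")
left = align("    Taft, William H.", 20, "left")

print(repr(right))
print(repr(left))
```

Only leading and trailing fill characters move; the spaces inside the name are preserved in both cases.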
6.11 MILL
This statement can be used in an output field with a numeric data type. It causes commas
to be inserted at the appropriate place(s) in a string of digits.
MILL is implied when CURRENCY (or MONEY) is used as a data type on output.
Consider the effect of the MILL option in this example, mill.rcl, where the generated
input field is a five-digit number:
/INFILE=mill.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(Value,POSITION=1,SIZE=5,digit)
/OUTFILE=mill.out
/FIELD=(Value,POSITION=1,SIZE=8,NUMERIC)
/FIELD=(Value,POSITION=10,SIZE=9,MILL,NUMERIC)
/FIELD=(Value,POSITION=20,SIZE=10,CURRENCY)
Note that the same value was used for all three output fields in each record, but the input
size of 5 had to be expanded to 8 on output to accommodate the NUMERIC data type
(which includes an additional decimal point and two decimal places). A size of 9 was
used for MILL to accommodate that numeric value and an additional comma. Finally,
the CURRENCY field, which uses MILL by default and produces a $ character, had to be
expanded to size 10.
If, on output, you do not expand sizes in this way, an overflow (***) may occur.
The MILL attribute is available only in the OUTFILE section of a RowGen script.
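The size arithmetic described above can be checked with a short Python sketch. It is not RowGen code: the function render is hypothetical, and showing an overflow as a field full of asterisks is an assumption based on the overflow symbol mentioned above.

```python
def render(value, size, mill=False, currency=False):
    """Format a number with two decimals, optional commas, and an optional $ sign."""
    text = f"{value:,.2f}" if (mill or currency) else f"{value:.2f}"
    if currency:
        text = "$" + text
    if len(text) > size:
        return "*" * size  # assumption: overflow is shown as asterisks
    return text.rjust(size)

value = 12345  # a five-digit input value
print(render(value, 8))                  # NUMERIC
print(render(value, 9, mill=True))       # NUMERIC with MILL
print(render(value, 10, currency=True))  # CURRENCY
print(render(value, 5))                  # too small for the NUMERIC form
```

A five-digit value needs 8 characters as NUMERIC, 9 with the added comma, and 10 with the added $ character, which matches the sizes used in mill.rcl.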
6.12 FILL
This statement is used with numeric data types (such as NUMERIC, WHOLE, or
CURRENCY) to pad the left side of field values with a specified character whenever the
value is shorter than the SIZE given. There are two forms of the FILL statement:
• FILL='char'
• FILL=n
where char is the fill character, and n is the decimal weight of a character. The default
fill character for a non-binary field is a space; for a binary field, it is a binary NULL
(see Binary Numeric Data Types on page 96).
/INFILE=fill.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(code1,POSITION=1,SIZE=5.0,NUMERIC)
/REPORT
/OUTFILE=fill.out
/FIELD=(code1,POSITION=1,FILL='*',SIZE=7.0,NUMERIC)
0095634
0008910
0052162
0009264
0004742
0029380
0009665
0092121
0050436
0082668
Note that where the field value did not extend to the full size of the field (7), the left side
was padded with the fill character.
The FILL attribute is available only in the OUTFILE section of a RowGen script.
• Generate randomized field data using several available data types, which can be
specified in the input section of a job script (see Input Filenames and Attributes
on page 56).
• Convert from one supported data type to another. This is done by supplying
/FIELD statements within one or more output file sections that reference the
originally generated field name, but with a different option for the data type
attribute (see Output Filenames and Attributes on page 57).
NOTE: When you are selecting values from a character SET file (see Character SET
Files on page 111), the values are generated in the exact form in which they
appear in the SET file, and the alpha_digit data type (the default) is assumed.
Therefore, if you want to change the data type of these values, you must
provide /FIELD statements on output, and declare a different data type for
each field that you need to convert.
alpha_digit, the default data type, is not required to be specified as a field attribute.
ASCII fields are randomized using subsets of the 127 possible ASCII characters
available (see ASCII COLLATING SEQUENCE on page 234). The following RowGen
data types are provided to help you customize your ASCII field data with the most
realistic values possible, which is especially useful when real data become available:
ASCII Contains all ASCII characters, printable and non-printable, with the
exception of a binary 0. From hexadecimal 01 to 7F.
NOTE: Certain generated values may differ based on your environment’s locale setting.
When generating random data with input /FIELD statements, ASCII fields are
populated to the full SIZE that you assign, and placed at the POSITION you assign.
However, as with any data type you generate, any output SIZE and POSITION you
specify (if performing field-level transformation) will supersede its input phase
specifications. See Data Flow Structure in RowGen Scripts on page 55.
When generating random data with input /FIELD statements, ASCII-Numeric fields are
not always populated to the full SIZE that you assign. For example, if you declare a field
of SIZE=3 and a data type of WHOLE, RowGen will randomly generate numbers from 0
through 999. If you want to ensure that your numeric values are always a certain number
of digits, you must use the data type digit.
The following example illustrates how the ASCII-numeric data types display.
Consider a SET file that contains these entries:
150
9.46
1023.45
These entries, if converted to ASCII-Numeric types on output, would appear as follows
if given adequate sizes:
WHOLE   NUMERIC   CURRENCY    CURRENCY w/ FILL
--------------------------------------------------
150     150.00    $150.00     $***150.00
9       9.46      $9.46       $*****9.46
1023    1023.45   $1,023.45   $*1,023.45
If you do not size your fields appropriately for the data type, the * overflow symbol may
appear on output, for example:
***
INT, UINT INT indicates integer, natural signed. UINT indicates integer, natural
unsigned. These data types represent 2’s complement integers stored
in the width allocated by an int declaration with no short or long
adjective. Two and four bytes are common widths. Byte ordering
uses the natural order of the machine.
LONG, ULONG LONG indicates integer, long signed. ULONG indicates integer, long
unsigned. Four bytes is the typical width for these 2’s complement
data types, with ranges
-2147483648 ≤ signed ≤ 2147483647 and
0 ≤ unsigned ≤ 4294967295.
FLOAT, DOUBLE FLOAT indicates float, single precision. DOUBLE indicates float, double
precision. Single precision and double precision floats are usually
placed in four and eight bytes.
NOTE: Chars are one-machine-byte long, with no assumptions made about byte width
for most comparison cases -- assuming a standard 8-bit width with ranges -128
≤ signed character ≤ 127 and 0 ≤ unsigned character ≤ 255. Natural chars are
either signed or unsigned, depending on the interpretation of char used by
your machine.
Natural means that RowGen will use the native C library memcmp() function
to evaluate the keys. Natural comparison is faster and should be used
whenever collation of meta-characters (those with the most significant bit on)
is not a problem. If meta-characters occur, memcmp() does not necessarily
give the same results as the default char type on a machine, on signed or
unsigned chars.
Because the results of char comparisons can differ for meta-characters across
machines and libraries, it is advisable to build test cases for every
combination of meta-characters so that you will know what to expect.
The following RowGen data types are provided to help you generate and/or customize
EBCDIC field data with the most realistic values possible, which is especially useful
when real data become available:
NOTE: Certain generated values may differ based on your environment’s locale setting.
When generating random data with input /FIELD statements, EBCDIC fields are
populated to the full SIZE that you assign, and placed at the POSITION you assign.
However, as with any data type you generate, any output SIZE and POSITION you
specify (if performing field-level transformation) will supersede its input specifications.
See Output Filenames and Attributes on page 57.
RowGen can generate the following Micro Focus data types, where the width of data
fields is generally based on a maximum of 18 characters in a PICTURE clause:
NOTE: The following data types cannot be directly generated. However, you can
generate using NUMERIC in input and convert to these data types in output.
MF_COMP, UMF_COMP
COMP, Signed and COMP, Unsigned. In computational
(COMPUTATIONAL-4, BINARY) field values, negative values are
stored as 2’s complement numbers with the most significant byte first.
The number of bytes of storage depends on the magnitude of the value
(9s in PICTURE) and on the storage mode of the COBOL program
which generates the data, as shown in Table 1:
Table 2: Hexadecimal/Signs
Decimal Hex Sign
11 b Positive
13 d Negative
15 f Unsigned (positive)
MF_CMP5, UMF_CMP5
MF COMP-5, Signed and MF COMP-5, Unsigned.
COMPUTATIONAL-5 is like COMPUTATIONAL-4 but the byte order
depends on the hardware.
NOTE: For RowGen purposes, if you require big-endian data, you should use
COMP-4. The COMP-5 algorithm explicitly does little-endian comparisons.
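The difference in byte order can be illustrated with a short Python sketch. It is not RowGen code; the function comp_bytes is hypothetical, and the 2-byte width is chosen only for illustration, since the actual width depends on the PICTURE clause and storage mode.

```python
def comp_bytes(value, width, byteorder="big"):
    """Return a signed 2's complement integer as `width` bytes in the given order."""
    return value.to_bytes(width, byteorder=byteorder, signed=True)

big = comp_bytes(-2, 2, "big")        # most significant byte first
little = comp_bytes(-2, 2, "little")  # least significant byte first

print(big.hex())
print(little.hex())
```

The same value occupies the same two bytes in both cases; only their order differs.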
NOTE: The following data types cannot be directly generated. However, you can
generate using NUMERIC in input and convert to these data types in output.
RM_COMP, URM_COMP
RM_COMP indicates COMP, signed. URM_COMP indicates COMP,
unsigned. Computational variables are stored one digit per byte with a
trailing byte for signed data. Each digit is stored as its binary value
(that is, 0x01 for 1). The sign byte is 0x0D for negative values and
0x0B for positive values.
RM_CMP3, URM_CMP3
RM_CMP3 indicates COMP-3, signed. URM_CMP3 indicates COMP-3,
unsigned. A COMPUTATIONAL-3 item, also referred to as packed
decimal, comprises a string of hex digits and a sign. Decimal digits (0
through 9) are held left to right.
Each decimal digit is represented as a hex digit with two hex digits
per byte. The last hex digit holds the sign, as shown in Table 3:
Table 3: Hexadecimal/Signs
Decimal Hex Sign
11 b Positive
13 d Negative
15 f Unsigned (positive)
The sign is the last hex digit so that an odd number of decimal digits
needs to be retained. If there are an even number of digits, a 0 hex
digit is prepended to the value to make full bytes.
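The packing rule can be sketched in Python. This is an illustration of the layout described above, using the sign values from Table 3; it is not RowGen code, and the function name pack_comp3 is hypothetical.

```python
# Sign values taken from Table 3 above.
SIGN_NIBBLE = {"positive": 0xB, "negative": 0xD, "unsigned": 0xF}

def pack_comp3(value, signed=True):
    """Pack an integer as packed decimal: two hex digits per byte, sign last."""
    if signed:
        sign = SIGN_NIBBLE["negative"] if value < 0 else SIGN_NIBBLE["positive"]
    else:
        sign = SIGN_NIBBLE["unsigned"]
    digits = [int(ch) for ch in str(abs(value))]
    if len(digits) % 2 == 0:
        digits.insert(0, 0)  # prepend 0 so the digits plus the sign fill whole bytes
    nibbles = digits + [sign]
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes((high << 4) | low for high, low in pairs)

print(pack_comp3(123).hex())               # odd number of digits, no padding
print(pack_comp3(-1234).hex())             # even number of digits, 0 prepended
print(pack_comp3(42, signed=False).hex())  # unsigned
```

Because the sign takes the final half byte, an odd number of decimal digits always fills the bytes exactly.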
NOTE: USAGE IS DISPLAY values are stored byte-by-byte as the ASCII values for
each digit for up to 18 digits. Each digit is in the range 0x30 through 0x39.
RowGen can generate the following EBCDIC-Native Micro Focus COBOL Data types:
NOTE: The following data types cannot be directly generated. However, you can
generate using NUMERIC in input and convert to these data types in output.
EMF_COMP, EMF_UCOMP
EMF_COMP indicates COMP, Signed. EMF_UCOMP indicates COMP,
unsigned.
EMF_CMP3, EMF_UCMP3
EMF_CMP3 indicates COMP-3, packed decimal. EMF_UCMP3 indicates
COMP-3, unsigned.
EMF_CMP4, EMF_UCMP4
EMF_CMP4 indicates COMP-4, signed. EMF_UCMP4 indicates COMP-4,
unsigned.
EMF_CMP5, EMF_UCMP5
EMF_CMP5 indicates COMP-5, signed. EMF_UCMP5 indicates COMP-5,
unsigned.
EMF_COMPX COMP-X.
See Micro Focus Data Types on page 99 for a description of each type.
NOTE: The following data types cannot be directly generated. However, you can
generate using NUMERIC in input and convert to these data types in output.
ERM_COMP, ERM_UCOMP
ERM_COMP indicates COMP, signed. ERM_UCOMP indicates
COMP, unsigned.
ERM_CMP1 COMP-1.
ERM_CMP3, ERM_UCMP3
ERM_CMP3 indicates COMP-3, signed. ERM_UCMP3 indicates
COMP-3, unsigned.
ERM_CMP6 COMP-6.
See RM COBOL Data Types on page 101 for a description of each type.
RowGen supports several common datestamp and timestamp formats. Values in these
formats can be generated natively on input, or you can select from a SET file with values
formatted in any of these formats (see Date SET Files with Ranges on page 115 and
Timestamp SET Files with Ranges on page 117). RowGen can sort the values
appropriately if used as keys (see KEYS on page 135). Also, note that you can convert
from one data type to another. For example, on input you can generate, or draw from a
SET of, AMERICAN_TIMESTAMP-conforming values, and declare them as another
timestamp format on output.
AMERICAN_DATE month/day/year
where month is a name, for example:
Jul/31/2004 (using as many characters for the name as the size of the
field allows). The month is also recognized as an integer when used in a SET
file, for example 12/31/2004.
AMERICAN_TIME hour[:minute][:second] xM
for example:
11:23:01 PM
AMERICAN_TIMESTAMP
month/day/year hour[:minute][:second] xM
for example:
12/31/2004 11:23:01 PM
EUROPEAN_DATE day.month.year
where month is a name or integer, for example:
31.12.2004 or 31.Dec.2004
EUROPEAN_TIME hour[.minute][.second]
for example:
23.23.01
EUROPEAN_TIMESTAMP
day.month.year hour[.minute][.second]
for example:
31.12.2004 23.23.01
JAPANESE_DATE year-month-day
where month is a name or integer, for example:
2004-12-31 or 2004-Dec-31
JAPANESE_TIME hour[:minute][:second]
for example:
23:23:01
JAPANESE_TIMESTAMP
year-month-day hour[:minute][:second]
for example:
2004-12-31 23:23:01
ISO_DATE year-month-day
where month is a name or integer, for example:
2004-12-31 or 2004-Dec-31
ISO_TIME hour[.minute][.second]
for example:
23.51.01
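The relationship between these layouts can be sketched in Python. This is not RowGen code: the format strings are Python strptime/strftime directives chosen to match the numeric-month examples above, the function convert is hypothetical, and an English or C locale is assumed for the AM/PM marker.

```python
from datetime import datetime

# Python format strings that mirror the layouts listed above (numeric months only).
FORMATS = {
    "AMERICAN_TIMESTAMP": "%m/%d/%Y %I:%M:%S %p",
    "EUROPEAN_TIMESTAMP": "%d.%m.%Y %H.%M.%S",
    "JAPANESE_TIMESTAMP": "%Y-%m-%d %H:%M:%S",
}

def convert(text, source, target):
    """Parse `text` in the source layout and re-emit it in the target layout."""
    moment = datetime.strptime(text, FORMATS[source])
    return moment.strftime(FORMATS[target])

american = "12/31/2004 11:23:01 PM"
european = convert(american, "AMERICAN_TIMESTAMP", "EUROPEAN_TIMESTAMP")
japanese = convert(american, "AMERICAN_TIMESTAMP", "JAPANESE_TIMESTAMP")

print(european)
print(japanese)
```

All three strings describe the same instant; only the field order, separators, and 12-hour or 24-hour clock differ.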
Zoned decimal is not a supported data type within RowGen. However, this section
describes how RowGen enables you to produce output fields in zoned decimal format
by using the supplied zoned.set file (found in the /examples/RCL_chapter directory of
your RowGen installation directory). See Example 33 below.
NOTE: The following data types cannot be directly generated. However, you can
generate using NUMERIC in input and convert to these data types in output.
Zoned Decimals are alphanumeric digits. If the decimal quantity is positive, the last
character ends with a digit from 0-9. If the decimal quantity is negative, the last
character is written as a lower-case character. The following table shows the format
for negative quantities
To produce random zoned decimal fields on output using RowGen, you will need the
SET file zoned.set, which gives all of the possible values for the final character of a
zoned decimal field:
0
1
2
3
4
5
6
7
8
9
p
q
r
s
t
u
v
w
x
y
The following script, zoned.rcl, generates 10 random zoned decimal formatted values
with a size of 4:
/INFILE=zoned.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(value_first,POSITION=1,SIZE=3,digit)
/FIELD=(number_last,SET=zoned.set,POSITION=4,SIZE=1)
/OUTFILE=zoned.out
328v
390x
397v
4176
5934
653x
7417
775q
801t
8234
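To read such values back as signed integers, the final character must be decoded. The Python sketch below is not RowGen code. It assumes that the lower-case characters p through y stand for the digits 0 through 9 of a negative quantity, in the same order in which they appear in zoned.set; the function name zoned_to_int is hypothetical.

```python
# Assumption: p..y represent a negative quantity ending in 0..9, in zoned.set order.
NEGATIVE_LAST = "pqrstuvwxy"

def zoned_to_int(text):
    """Decode a zoned decimal string whose last character carries the sign."""
    last = text[-1]
    if last in NEGATIVE_LAST:
        digit = NEGATIVE_LAST.index(last)
        return -int(text[:-1] + str(digit))
    return int(text)

for sample in ("328v", "390x", "4176"):
    print(sample, zoned_to_int(sample))
```

Under that assumption, 328v decodes to a negative quantity, while 4176 is positive and is read unchanged.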
7 SET FILES
RowGen can draw field values at random from a pre-existing SET file. In this way, you
can ensure that one or more generated fields is populated with realistic-looking data.
Syntax
SET files are text files that consist of one or more columns of any number of entries.
When multiple columns are present (see Relation SET Files on page 119), the entries
must be tab-delimited. Any SET file entries that begin with a # symbol and are located
at the top of the set are treated as comments, and are not selected by RowGen. When the #
symbol is no longer the first character, the SET file content begins. Thereafter, if a #
symbol appears, it is assumed to be part of the data.
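The comment and column rules above can be sketched in Python. This is an illustration, not RowGen's parser; the function name read_set is hypothetical, and skipping empty lines is an assumption, since the text above does not say how they are handled.

```python
def read_set(lines):
    """Return SET file rows as lists of columns, skipping leading # comments."""
    rows = []
    in_header = True
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # assumption: empty lines carry no entry
        if in_header and line.startswith("#"):
            continue  # comment at the top of the set
        in_header = False  # content has begun; a later # is data
        rows.append(line.split("\t"))
    return rows

sample = [
    "# states and cities\n",
    "# tab-delimited\n",
    "Alabama\tAbbeville\n",
    "#1 Main St.\tExample\n",
]
rows = read_set(sample)
print(rows)
```

The last sample line begins with # but follows real content, so it is kept as data.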
where:
• ORDER=
• SEARCH=
=    EQ   exact match to the argument
>    GT   value in table after the argument
>=   GE   exact match if found, otherwise later value
<    LT   value in table before the argument
<=   LE   exact match if found, otherwise earlier value
• SELECT=
ANY The default. Values from the specified SET file are selected at
random. Therefore, repetition of field values is possible using
this default option, as well as the omission of values. The
amount of repeated or omitted SET values depends on the
number of rows being generated and the number of entries that
exist in the SET file.
ALL If ALL is specified, RowGen selection begins at the top entry
of the SET file, and continues downward through the file, in
order, with each new row that is generated. Depending on the
value of /INCOLLECT (see /INCOLLECT on page 59), if all
SET file entries are utilized, the selection process begins
again at the top, and this process is repeated.
ONCE If ONCE is specified, RowGen selection begins at the top entry
of the SET file, and continues downward through the file, in
order, with each new row that is generated. Depending on the
value of /INCOLLECT (see /INCOLLECT on page 59), once
all SET file entries are utilized, the selection process ends,
and any remaining rows to be generated contain an empty
value for that field.
SUFFIX Selection begins at the top entry and continues downward
through the set file, in order, with each new row that is
generated. Once all set file entries are used, the selection
process begins again at the top. On the second pass through the
set file, the values are appended with the string "_2", continuing
with "_3", "_4", and so on, until the INCOLLECT limit is
reached.
ROW Only applies to set files with two or more tab delimited
columns. Works like ANY if the referenced set file has not
been used in a previous field. If the referenced set file has
already been used in a previous field, then the same row from
the set file is used to supply the value for the field with the
ROW selection type. The required index argument to the
ROW selection type specifies which column of the set file to
use to supply the field value.
NOTE: The above access types are also applicable when using literal SETs
(see Literal SETs on page 122), where ALL and ONCE selections begin at the
left-most entry in the literal set, and continue with each value to the right.
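The ANY, ALL, ONCE, and SUFFIX behaviors described above can be simulated in Python. This is a sketch of the documented selection order, not RowGen's implementation; the function select_values and its parameters are hypothetical, count plays the role of /INCOLLECT, and ROW is omitted.

```python
import random

def select_values(entries, count, mode="ANY", rng=None):
    """Simulate how `count` rows draw values from a single-column set."""
    rng = rng or random.Random()
    selected = []
    for i in range(count):
        if mode == "ANY":
            selected.append(rng.choice(entries))  # repeats and omissions possible
            continue
        passes, index = divmod(i, len(entries))  # completed passes, position in set
        if mode == "ALL":
            selected.append(entries[index])  # wrap around to the top
        elif mode == "ONCE":
            selected.append(entries[index] if passes == 0 else "")  # empty after one pass
        elif mode == "SUFFIX":
            suffix = "" if passes == 0 else f"_{passes + 1}"
            selected.append(entries[index] + suffix)
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return selected

names = ["a", "b", "c"]
print(select_values(names, 5, "ALL"))
print(select_values(names, 5, "ONCE"))
print(select_values(names, 7, "SUFFIX"))
```

With three entries and more than three rows, ALL repeats the set, ONCE leaves the extra rows empty, and SUFFIX keeps the values unique by numbering each later pass.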
RowGen can draw field values at random from pre-existing SET files that contain any
number of character strings. When you are selecting values from a character SET file,
the values are generated in the exact form in which they appear in the SET file.
An alpha_digit data type (the default) is assumed (see ASCII Character Data Types on
page 94).
If, for example, you want to produce EBCDIC-equivalent values from the strings in your
character SET file, you must provide output /FIELD statements and specify EBCDIC as
the data type.
NOTE: Comments preceded by a # symbol are allowed in all RowGen SET files at the
top of the set. When the # symbol is no longer the first character, the set file
content begins. Thereafter, if a # appears, it is assumed to be part of the data.
/INFILE=simple.in
/PROCESS=RANDOM
/INCOLLECT=3
/FIELD=(Names,SET=chiefs.set,POSITION=1,SIZE=19)
/FIELD=(zip_code,POSITION=21,SIZE=5,digit)
/REPORT
/OUTFILE=simple.out
To run this job, you would type the following on the command line:
rowgen /spec=simple.rcl
However, without the SET=filename attribute in the above script, RowGen would
randomly generate a field of size 19, with alpha_digit characters only, which is the
default data type (see ASCII Character Data Types on page 94):
p01d33sG6yD4864qt6d 31716
E1v6IaG86PP1rq66o8R 45286
P1cO3la4Jh3hk5Kv6e6 14812
NOTE: If you declare a SIZE that is too small to accommodate one or more values in
the numeric SET file referenced in its /FIELD statement, an error 21 is
returned (improper format declaration).
RowGen can also draw numeric values at random from pre-existing SET files that
contain any number of numeric values or numeric ranges. The SIZE attribute you
provide will determine the precision with which the value is represented (unless a
literal value is selected).
When you are selecting values from a numeric SET file, the NUMERIC data type should
be given as a field attribute on input. If you want to produce fields on output with
another data type, such as currency, you can provide output /FIELD statements and
specify CURRENCY as the data type.
The supported entries in a numeric SET file can be any combination of:
• Literal values with or without precision, such as 14 or 12.25. These values are
produced as they appear.
• [x,y]. Inclusive low to high range values, where x and y are considered for
random selection. For example, the entry [-2,2] can yield any of the following
(if decimal precision is set to 0):
• -2
• -1
• 0
• 1
• 2.
• (x,y). Exclusive low to high range values, where x and y are not considered for
random selection, but all values in between are considered. For example, the
entry (-2,2) can yield any of the following (if decimal precision is set to 0):
• -1
• 0
• 1.
• [x,y) or (x,y]. A combination pair of square and round brackets where the
above rules apply to each side of the expression. For example, the entry [-2,1)
can yield any of the following (if decimal precision is set to 0):
• -2
• -1
• 0.
• Numeric ranges using decimal precision. For example, the entry [1.52,1.55] can
yield any of the following (if decimal precision is set to 2):
• 1.52
• 1.53
• 1.54
• 1.55.
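The bracket rules above can be sketched in Python. This is an illustration, not RowGen's parser: the function candidates and the pattern RANGE are hypothetical, and the sketch assumes that both endpoints of a range are already expressed at the requested precision.

```python
import re
from decimal import Decimal

# Matches entries such as [-2,2], (-2,2), [-2,1) or [1.52,1.55].
RANGE = re.compile(
    r"^([\[(])\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*([\])])$"
)

def candidates(entry, precision=0):
    """List every value that a numeric SET entry can yield at a given precision."""
    match = RANGE.match(entry.strip())
    if match is None:
        return [Decimal(entry.strip())]  # literal value, produced as it appears
    opening, low, high, closing = match.groups()
    step = Decimal(1).scaleb(-precision)  # 1, 0.1, 0.01, ...
    value = Decimal(low) if opening == "[" else Decimal(low) + step
    last = Decimal(high) if closing == "]" else Decimal(high) - step
    values = []
    while value <= last:
        values.append(value)
        value += step
    return values

print([str(v) for v in candidates("[-2,2]")])
print([str(v) for v in candidates("(-2,2)")])
print([str(v) for v in candidates("[-2,1)")])
print([str(v) for v in candidates("[1.52,1.55]", precision=2)])
```

A square bracket keeps its endpoint in the candidate list and a round bracket drops it; a random selection would then pick one value from the list.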
NOTE: Comments preceded by a # symbol are allowed in all RowGen SET files at the
top of the set. When the # symbol is no longer the first character, the set file
content begins. Thereafter, if a # appears, it is assumed to be part of the data.
The following script, numbergen.rcl, shows several ways to define and produce
numeric values using the above SET file:
/INFILE=numbers.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(v0,SET=numbers.set,POSITION=01,SIZE=8.0,NUMERIC)
/FIELD=(v1,SET=numbers.set,POSITION=11,SIZE=8.1,NUMERIC)
/FIELD=(v2,SET=numbers.set,POSITION=21,SIZE=8.2,NUMERIC)
/FIELD=(v3,SET=numbers.set,POSITION=31,SIZE=8.3,NUMERIC)
/SORT
/KEY=v0
/OUTFILE=numbers.out
To run this job, you would type the following on the command line:
rowgen /spec=numbergen.rcl
Note that the SIZE attribute given for each /FIELD statement determines the total field
size and decimal precision on output for the selected SET file values. The ranges of
possible values are determined by the SET file entries themselves and the types of
brackets that enclose them.
See Example 3 on page 15 for another example of selecting values from a numeric SET
file.
RowGen can also draw dates at random from pre-existing SET files that contain
any number of date values or ranges.
When you are selecting values from a date SET file, a RowGen-supported DATE data
type must be specified as a field attribute on input (see Date/Timestamp Data Types on
page 105 for the available options), and the date(s) within the SET file must be of the
data type format specified.
If you want to produce fields on output with another supported DATE data type, such as
converting from ISO_DATE to AMERICAN_DATE, you can provide output
/FIELD statements and specify AMERICAN_DATE as the data type.
Note that DATE fields are sorted in date order when referenced as a sort /KEY (see
KEYS on page 135).
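The supported entries in a date SET file can be any combination of:
• Literal date values, such as 2002-02-27.
• [x,y]. Inclusive low to high range values, where x and y are considered for
random selection. For example, the entry [2002-02-27,2002-03-02] can yield any
of the following: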
• 2002-02-27
• 2002-02-28
• 2002-03-01
• 2002-03-02.
NOTE: Only valid dates are produced by RowGen when date ranges are selected from
a SET file.
Literal date values in a SET file are also ignored by RowGen when they are
invalid. For example, an entry of Feb/31/2005 is ignored.
• (x,y). Exclusive low to high range values, where x and y are not considered for
random selection, but all values in between may be generated. For example, the
entry (2002-02-27,2002-03-02) can yield either of the following:
• 2002-02-28
• 2002-03-01.
• [x,y) or (x,y]. A combination pair of square and round brackets where the
above rules apply to each side of the expression. For example, the entry
[2002-02-27,2002-03-02) can yield any of the following:
• 2002-02-27
• 2002-02-28
• 2002-03-01.
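The same bracket rules applied to dates can be sketched in Python. This is an illustration for ISO_DATE ranges only, not RowGen code; the function name date_candidates is hypothetical, and literal entries are not handled.

```python
from datetime import date, timedelta

def date_candidates(entry):
    """List the dates that a bracketed ISO_DATE range entry can yield."""
    include_low = entry[0] == "["
    include_high = entry[-1] == "]"
    low_text, high_text = entry[1:-1].split(",")
    low = date.fromisoformat(low_text.strip())
    high = date.fromisoformat(high_text.strip())
    start = low if include_low else low + timedelta(days=1)
    end = high if include_high else high - timedelta(days=1)
    days = (end - start).days
    # Calendar arithmetic ensures that only valid dates are produced.
    return [start + timedelta(days=n) for n in range(days + 1)]

for entry in ("(2002-02-27,2002-03-02)", "[2002-02-27,2002-03-02)"):
    print(entry, [d.isoformat() for d in date_candidates(entry)])
```

Because 2002 is not a leap year, the day after 2002-02-28 is 2002-03-01, so no invalid date such as February 29 appears in either list.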
[2009-02-27,2009-03-01]
[2010-12-31,2011-01-02)
(2011-04-02,2011-04-04)
(2011-06-01,2011-06-03]
2012-01-01
The following script, dategen.rcl, illustrates how dates are drawn from a date SET file,
and how date formats can be converted:
/INFILE=dates.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(date1,SET=dates.set,POSITION=1,SIZE=10,ISO_DATE)
/SORT
/KEY=date1 # sorts in ascending date order
/NODUPLICATES # produces unique dates
/OUTFILE=datesISO.out # display original selection
/OUTFILE=datesUSA.out # for type conversion
/FIELD=(date1,POSITION=1,SIZE=11,AMERICAN_DATE) # type-convert
To run this job, you would type the following on the command line:
rowgen /spec=dategen.rcl
2009-02-28
2011-01-02
2011-04-03
2011-04-04
2011-06-03
2012-01-01
This will also produce the output file datesUSA.out, which contains:
Feb/28/2009
Jan/02/2011
Apr/03/2011
Apr/04/2011
Jun/03/2011
Jan/01/2012
Note that the ranges of possible values are determined by the SET file entries and the
types of brackets that enclose them. Note that the literal entry (2012-01-01) is also
represented in the final output. The data type is converted from ISO_DATE in the SET
file to AMERICAN_DATE on output. The entry /NODUPLICATES ensures that each
date is unique (see No Duplicates, Duplicates Only on page 138). Note that values with
the AMERICAN_DATE data type can also be generated on input (rather than requiring a
conversion as shown in this example), provided the SET file contains AMERICAN_DATE
values, and that the input field is declared accordingly.
RowGen can also draw timestamps at random from SET files that contain any number of
timestamp values or ranges.
When you are selecting values from a timestamp SET file, a RowGen-supported
timestamp data type must be specified as a field attribute on input (see Date/Timestamp
Data Types on page 105 for the available options), and the timestamp(s) within the SET
file must be of the data type format specified.
If you want to produce fields on output with another supported timestamp data type,
such as converting from ISO_TIMESTAMP to AMERICAN_TIMESTAMP, you can
provide output /FIELD statements and specify AMERICAN_TIMESTAMP as the data
type. Note that timestamp fields are sorted in date order when referenced as a sort /KEY
(see KEYS on page 135).
The supported entries in a timestamp SET file can be any combination of:
• 2004-11-22 02.28.32
• 2004-11-25 09.57.26
• 2005-01-07 20.20.24
• 2005-02-07 12.39.35.
NOTE: Only valid dates are produced by RowGen when timestamp ranges are
selected from a SET file.
Literal date values in a SET file are also ignored by RowGen when they are
invalid, for example, an entry of 2004-02-31 27.15.01 is ignored.
• (x,y). Exclusive low to high range values, where x and y are not considered for
random selection, but all values in between can be generated.
• [x,y) or (x,y]. A combination pair of square and round brackets where the
above rules apply to each side of the expression.
The following script, timestamp.rcl, illustrates how timestamps are drawn from a SET file,
and how timestamp formats can be converted:
/INFILE=remap.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(random_time,SET=times.set,POSITION=1,SIZE=20,ISO_TIMESTAMP)
/REPORT
/OUTFILE=times.out
/FIELD=(random_time,POSITION=1,SIZE=20,ISO_TIMESTAMP)
rowgen /spec=timestamp.rcl
2011-01-31 09:56:530
2013-01-25 19:15:270
2009-12-01 13.04.040
2009-12-01 13.04.040
2009-12-01 13.04.040
2009-12-01 13.04.040
2010-12-29 11:14:090
2010-12-03 01:00:160
2009-12-01 13.04.040
2009-12-01 13.04.040
Note that the ranges of possible values are determined by the SET file ranges and its
single literal value.
RowGen supports the generation of independent and dependent (or related) data values
or ranges across fields, when they are randomly selected from a multi-column set file.
This functionality is similar to using a look-up table. When the value or range of
values of one field depends on the value selected for another field, the data look
more realistic.
/FIELD=(field1,SET=setfilename.set ,POSITION=1, . . .
/FIELD=(field2,SET=ROW[2] setfilename.set ,POSITION= . . .
where setfilename.set is the name of any tab-delimited set file that contains the
possible values for field1 (which can be given as a literal string, i.e., a known
quantity) to the left of the tab, and the possible values for field2 to the right of the tab.
Within each generated record, the first reference to the set file selects a row at
random and takes the value from the column indicated. Subsequent ROW references
take their values from that same row of the set file. If a later field in the record
references the set file without indicating a column, a new row is selected at random,
and any ROW references that follow take their values from that new row. The number
in brackets following ROW is the column of the set file from which the value is taken.
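The row-sharing behavior can be simulated in Python. This is a sketch, not RowGen code: the list ROWS is a hypothetical in-memory copy of a few rows from a two-column set file, and the function generate_pairs is hypothetical.

```python
import random

# Hypothetical in-memory copy of a few (state, city) rows from a two-column set file.
ROWS = [
    ("Alabama", "Abbeville"),
    ("Alaska", "Anchorage"),
    ("Arizona", "Ajo"),
]

def generate_pairs(rows, count, seed=None):
    """For each record, pick one random row and read both fields from it."""
    rng = random.Random(seed)
    records = []
    for _ in range(count):
        row = rng.choice(rows)  # first reference: a random row, column 1
        state = row[0]
        city = row[2 - 1]       # ROW[2]: the same row, column 2 (1-based)
        records.append((state, city))
    return records

pairs = generate_pairs(ROWS, 5, seed=7)
for state, city in pairs:
    print(f"{city}, {state}")
```

Because both fields come from one row, a city is never paired with a state other than its own.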
Consider a relational SET file, state_city.set, which contains a tab-delimited list of the
50 United States, each paired with the cities located in that state:
Alabama Abbeville
Alabama Alabaster
Alabama Albertville
...
Alaska Anchorage
Alaska Barrow
Alaska Bethel
...
Arizona Ajo
Arizona Apache Junction
Arizona Avondale
...
add_values.set
[1,100)
[1,100)
[1,100)
[100,999]
streetnames.set
1st St.
2nd St.
3rd St.
4th St.
5th St.
6th St.
7th St.
8th St.
9th St.
A St.
Aaron's Pl.
Abbey Pl.
Abigail St.
Abington Pl.
etc...
The following script, address_gen.rcl, generates fictional addresses, and matches states
with realistic, corresponding city names:
/INFILE=addresses.in
/PROCESS=RANDOM
/FIELD=(add_no,SET=add_values.set,POSITION=1,SIZE=3.0,SEPARATOR='\t',NUMERIC)
/FIELD=(address,SET=streetnames.set,POSITION=2,SEPARATOR='\t')
/FIELD=(State,SET=state_city.set,POSITION=3,SEPARATOR='\t')
/FIELD=(City,SET=ROW[2] state_city.set,POSITION=4,SEPARATOR='\t')
/REPORT
/OUTFILE=addresses.out
/FIELD=(add_no,POSITION=1)
/DATA=" "
/FIELD=(address)
/DATA=" "
/FIELD=(City,POSITION=25)
/DATA=", "
/FIELD=(State)
Note that the city and state fields match up according to the entries in the SET file
state_city.set, which provides a more realistic set of address values.
In cases where you have a small, customized static list of field values, you can include
the values directly within the /FIELD statement, rather than refer to a separately held
SET file.
You can include a comma-separated list of single values from which RowGen will
randomly draw, for example:
/FIELD=(first_name,SET={rob,sue,bill},POSITION=3,SEPARATOR='|')
NOTE: The conventional practice of enclosing string values in quotes within a literal
SET is also supported, for example, SET={"rob","sue","bill"}
Or, you can include a literal range, or comma-separated ranges, of values from which
RowGen will randomly draw, for example:
/FIELD=(salary,SET={(10000,50000),(60000,95000)},POSITION=3,SEPARATOR='|')
which will generate random numbers between 10,000 and 50,000, and between 60,000
and 95,000. Use the NUMERIC data type if you want values right-aligned, with a
precision of 2, by default (see ASCII-Numeric Data Types on page 95).
See Numeric SET Files with Ranges on page 113 for details on the rules for using
parentheses and square brackets to determine range selection criteria.
In both /INREC and in the output files, it is possible to use a mathematical expression in
place of a field name. These expressions may reference multiple fields and use
arithmetic operators and mathematical functions. Parentheses can be used to control
operator precedence, and temporary fields may be created to hold intermediate values.
These features are particularly useful for ad hoc financial calculations or spreadsheet-
style presentations.
This example, expr.rcl, demonstrates expression writing and its use of precedence:
/INFILE=expr.in
/PROCESS=RANDOM
/INCOLLECT=5
/FIELD=(a,POSITION=01,SIZE=1,digit)
/FIELD=(b,POSITION=04,SIZE=1,digit)
/FIELD=(c,POSITION=07,SIZE=1,digit)
/FIELD=(d,POSITION=10,SIZE=1,digit)
/OUTFILE=expr.out
/HEADREC="a b c d t=a+b*(c+d) (t-1)/4\n\n"
/FIELD=(a,POSITION=01)
/FIELD=(b,POSITION=04)
/FIELD=(c,POSITION=07)
/FIELD=(d,POSITION=10)
/FIELD=(t=a + b * (c + d),POSITION=15,SIZE=6,NUMERIC) # calculate t
/FIELD=((t - 1) / 4,SIZE=9,NUMERIC) # calculate and display
In the calculation of t, the operations are performed in this order:
1) Add c plus d.
2) Multiply by b.
3) Add a.
4) Store t.
2 5 3 9 62.00 15.25
2 8 7 1 66.00 16.25
3 6 5 9 87.00 21.50
6 6 4 7 72.00 17.75
9 4 6 3 45.00 11.00
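The arithmetic in the sample rows can be checked outside RowGen. The following Python sketch is illustrative only (the function name evaluate is hypothetical); it applies the same grouping to the five generated rows shown above.

```python
def evaluate(a, b, c, d):
    """Return (t, (t - 1) / 4) using the same grouping as expr.rcl."""
    t = a + b * (c + d)   # parentheses first, then multiply, then add
    return t, (t - 1) / 4

rows = [(2, 5, 3, 9), (2, 8, 7, 1), (3, 6, 5, 9), (6, 6, 4, 7), (9, 4, 6, 3)]
for a, b, c, d in rows:
    t, u = evaluate(a, b, c, d)
    print(f"{a} {b} {c} {d} {t:.2f} {u:.2f}")
```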
Function Description
abs (x) absolute value of x, |x|. x can be a whole number or a floating point value
acos (x) arc cosine of x; range: [0, π] radians; -1≤x≤1
asin (x) arc sine of x; range: [-π/2, π/2] radians; -1≤x≤1
atan (x) arc tangent of x; range: [-π/2, π/2] radians
atan2 (y, x) arc tangent of y/x; range: [-π, π] radians
j0 (x) Bessel function of x, first kind, order 0
j1 (x) Bessel function of x, first kind, order 1
jn (n, x) Bessel function of x, first kind, order n
y0 (x) Bessel function of x, second kind, order 0.
y1 (x) Bessel function of x, second kind, order 1.
yn (n, x) Bessel function of x, second kind, order n.
cos (x) cosine of x in radians
exp (x) e to the power of x
cosh (x) hyperbolic cosine of x
sinh (x) hyperbolic sine of x
tanh (x) hyperbolic tangent of x
floor (x) largest whole number (as a double-precision number) not greater than x
9 /INREC
The optional /INREC section in a RowGen job script provides the capability to
transform the input (generation-phase) record, prior to the processing phase of a job,
where you can define one or more derived fields as part of the layout. The derived fields
can be based on the randomly generated (or selected) field values that were defined in
the input section.
An /INREC section must contain all the fields which you intend to include in your
output. That is, if one of the original fields that you define in the input section is not
included in /INREC, it will not be processed, and therefore will not appear on output.
As with the /INFILE section of a script, an /INREC section defines the mapping that
will be sent to output if no output layout is specified.
The example on the following page incorporates many possible uses of /INREC.
Consider that you have three sets of grades for each pupil, and that the average of the
three grades determines the final mark they receive.
The following script, grades.rcl uses /INREC to define a derived field that is first used
for sorting, then within outfile-specific /INCLUDE statements, and finally as part of
conditional /DATA statements within each output file:
/INFILE=grades.in
/PROCESS=RANDOM
/INCOLLECT=20
/FIELD=(student_code,POSITION=1,SIZE=3,alpha)
/FIELD=(grade1,POSITION=6,SIZE=2,digit)
/FIELD=(grade2,POSITION=9,SIZE=2,digit)
/FIELD=(grade3,POSITION=12,SIZE=2,digit)
/INCLUDE WHERE grade1 > 60 AND grade2 > 60 and grade3 > 60
/OUTFILE=lowgrades.out
/INCLUDE WHERE avg_grade < 80 # references the INREC-defined field
/HEADREC="Code Grade1 Grade2 Grade3 Average\n------------------
----------------------\n"
/FIELD=(student_code,POSITION=1,SIZE=3)
/FIELD=(grade1,POSITION=6)
/FIELD=(grade2,POSITION=15)
/FIELD=(grade3,POSITION=24)
/FIELD=(avg_grade,POSITION=34)
/DATA=IF avg_grade < 70 THEN " very low pass" ELSE " low pass"
/OUTFILE=highgrades.out
/INCLUDE WHERE avg_grade > 80 # references the INREC-defined field
/HEADREC="Code Grade1 Grade2 Grade3 Average\n------------------
----------------------\n"
/FIELD=(student_code,POSITION=1)
/FIELD=(grade1,POSITION=6)
/FIELD=(grade2,POSITION=15)
/FIELD=(grade3,POSITION=24)
/FIELD=(avg_grade,POSITION=34)
/DATA=IF avg_grade < 85 THEN " high pass" ELSE " very high pass"
lowgrades.out contains:
highgrades.out contains:
Note that the "avg_grade" field, (grade1 + grade2 + grade3) / 3, was created in
the /INREC section because it was used for sorting purposes. It was also used in output
file-specific /INCLUDE statements (see INCLUDE-OMIT (RECORD SELECTION) on
page 146). Finally, it was used as part of output field-specific IF THEN ELSE logic to
produce the evaluations seen in the final column
(see CONDITIONAL FIELD AND DATA STATEMENTS on page 152).
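The filtering and labeling logic of grades.rcl can be sketched in Python as follows. This is an illustration rather than RowGen syntax: the function name route is hypothetical, and the sketch takes the thresholds directly from the /INCLUDE and /DATA statements in the script.

```python
def route(grade1, grade2, grade3):
    """Return (output_file, avg_grade, remark), or None if the record is filtered out."""
    # Input-level /INCLUDE: every grade must be above 60.
    if not (grade1 > 60 and grade2 > 60 and grade3 > 60):
        return None
    avg_grade = (grade1 + grade2 + grade3) / 3   # the derived /INREC field
    if avg_grade < 80:
        remark = "very low pass" if avg_grade < 70 else "low pass"
        return "lowgrades.out", avg_grade, remark
    if avg_grade > 80:
        remark = "high pass" if avg_grade < 85 else "very high pass"
        return "highgrades.out", avg_grade, remark
    return None  # an average of exactly 80 matches neither /INCLUDE

print(route(65, 70, 72))
print(route(90, 95, 88))
print(route(55, 90, 90))
```

As written, a record whose average is exactly 80 appears in neither output file, because one file tests for less than 80 and the other for greater than 80.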
10 /DATA
/DATA statements are used to pad and format output records. /DATA statements are not
named fields, so they cannot be directly mapped. They are positioned just after the
previous /DATA or /FIELD statement.
/DATA=field_name
displays the value of an input field without formatting
/DATA=internal_variable
where the internal variable can be any RowGen internal variable
listed in Table 7 on page 131.
You can also use a conditional data statement (see CONDITIONAL FIELD AND DATA
STATEMENTS on page 152).
Several examples of /DATA statements are shown in the following script, data.rcl.
It uses the SET file parts_list2.set (as shown in Example 11 on page 25).
/INFILE=data.in
/PROCESS=RANDOM
/INCOLLECT=15
/FIELD=(part,SET=parts_list2.set,POSITION=1,SIZE=15)
/FIELD=(price,POSITION=16,SIZE=10,NUMERIC)
/OMIT WHERE Price < 0
/SORT
/KEY=part
/NODUPLICATES
/OUTFILE=data.out
/DATA=part # field name
/DATA="Price: " # literal string
/FIELD=(price,SIZE=14,CURRENCY)
/DATA="  " # adds two spaces
/DATA=CURRENT_TIMESTAMP # internal variable
/DATA={3}"*-*" # repeated constant string
11 INTERNAL VARIABLES
RowGen maintains internal values which you can use in /DATA, /HEADREC, and
/FOOTREC statements (see /DATA on page 129, /HEADREC on page 170, and
/FOOTREC on page 171). The internal values are shown in Table 7.
Table 7: Internal Variables
Variable Output/Example
AMERICAN_DATE Sep/19/2004
EUROPEAN_DATE 19.09.2004
JAPANESE_DATE 2004-09-19
ISO_DATE 2004-09-19
CURRENT_DATE 2004-09-19
AMERICAN_TIME 09:47:15 PM
EUROPEAN_TIME 21.47.15
JAPANESE_TIME 09:47:15 PM
ISO_TIME 21.47.15
CURRENT_TIME 21.47.15
Produced on 2004-09-19
For details on using format control characters, such as %s, see Table 14 on page 170.
RowGen supports the use of a control character, such as a horizontal tab, as a field
separator (see SEPARATOR on page 86). In addition, the formatting statements /DATA,
/HEADREC, and /FOOTREC can also contain control characters (see /DATA on page 129,
/HEADREC on page 170, and /FOOTREC on page 171). Table 8 describes the
supported control characters and how to specify them.
13 CONVERSION SPECIFIERS
Conversion specifiers are used within /DATA statements, and for defining conditions,
when you want to convert a known value into its equivalent in another data type
(see /DATA on page 129).
The EBCDIC conversion specifier is available when you want to use the ASCII
equivalent of an EBCDIC value within a condition, or return the EBCDIC equivalent of
an ASCII value within the records of your output. Similarly, the PACKED conversion
specifier is available when you want to use the ASCII-numeric equivalent of a PACKED
value within a condition, or return the PACKED equivalent of an ASCII value within
the records of your output.
The hexadecimal conversion specifier is used when you want to return the hexadecimal
equivalent of an ASCII value. For example, if you include the statement
/DATA=%HEX"C134", the two-byte hexadecimal equivalent, consisting of C1 and 34,
would be returned. A hex dump representation of this value would therefore appear as
C1 34 at the specified location within every output record.
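To see why a four-character hexadecimal string yields two bytes, consider this Python illustration. It does not call RowGen; it only decodes the same string with the standard bytes.fromhex method.

```python
raw = bytes.fromhex("C134")      # the two bytes that the string "C134" describes
print(len(raw))                  # number of bytes
print(raw.hex(" ").upper())      # hex-dump style rendering
```

Each pair of hexadecimal digits maps to one byte, so "C1" and "34" become the two output bytes.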
The ASCII conversion specifier is not typically required because the character string
you specify for conversion is already in ASCII. It is provided only for situations where
the specifier itself is a variable (see Environment Variables on page 49).
Table 9 shows the syntax for using the conversion specifiers which are recognized by
RowGen.
NOTE: For hexadecimal values 01 through 09, you can use "\n" without the %HEX
specifier, where n is any whole number from 1 to 9. For example,
/DATA="\4" returns the hexadecimal value 04.
To return the EBCDIC equivalent of the ASCII value 65 in an output file, for example,
use the following:
/DATA=%EBCDIC"65"
The location of this value within your results depends on where, within the layout of
your output records, you specified the /DATA statement (see /DATA on page 129). For
/HEADREC and /FOOTREC statements, the value would appear at the top or bottom of
the output, respectively.
/CONDITION=(Senior,TEST=(Age GT %E"65"))
If you /INCLUDE this condition, only those records in which the value of the Age field
is greater than the EBCDIC equivalent of 65 (that is, 66 and higher) are included in the
output (see CONDITIONS on page 139).
14 KEYS
In RowGen jobs that contain a /SORT statement (the default process), the order of the
output records is determined by comparing one or more key fields within records until
they are not equal. The /KEY statement is used for specifying each key field.
By default, records sort from left to right if no /KEY is given. Any number of /KEY
statements can be given. Compares will be performed in the order given while each key
is equal.
This section describes the components of the /KEY statement and its related options:
• Syntax
• Field Name Reference
• Unnamed Reference on page 136
• Collating Sequence on page 136
• Direction on page 136
• ASCII Options on page 137
• No Duplicates, Duplicates Only on page 138.
14.1 Syntax
Sort parameters are separated by commas. If the field name is used, it must be first,
while the other parameters may appear in any order:
/KEY=(field[,data type][,collating_sequence]
[,direction][,ASCII options])
The simplest form of the /KEY statement uses a defined input field name for field and
no other parameters. When field is the only parameter and the direction is ascending,
the parentheses are not required. The position in the record, size of the field, and data
type are known from the input or /INREC section’s /FIELD description.
The following RowGen job script generates and sorts records with first and last names,
selected from SET files. The sort order is by last name, followed by first name:
/INFILE=chiefs_sep
/FIELD=(fname,SET=first_names.set,POSITION=1,SEPARATOR='|')
/FIELD=(lname,SET=last_names.set,POSITION=2,SEPARATOR='|')
/FIELD=(term,POSITION=3,SEPARATOR='|')
/SORT
/KEY=lname
/KEY=fname
/OUTFILE=chiefs.out
NOTE: Parentheses are not required for the key when using only the field name to
describe it.
By default, the collating sequence for the key field(s) is determined by the data type
specified in the input /FIELD statement(s). If the field was not specifically typed on
input, ASCII collation will occur.
You can, however, specify a different collating sequence for any given key. For
example, if the code field contains ASCII strings that might contain some binary
characters (for example, if using a SET file), you may want to use the EBCDIC collating
sequence, as follows:
/KEY=(code,EBCDIC)
14.5 Direction
The ASCII options can be used when the key field is ASCII. These options do not actually
reformat the field; they only change how the comparison is made (see Alignment on page 90
for actual field reformatting). The ASCII options are:
alignment Causes the key field to be shifted for comparison purposes. The
options are:
Left Sort as if the key field characters precede trailing space(s).
Right Sort as if the key field characters trail leading spaces.
None The default. Sort with the key field characters compared as
they currently appear in the field. Sort order may be affected
by leading or trailing spaces within the field.
/INFILE=align.dat
/FIELD=(f1,SET=align_test.set,POSITION=1,SIZE=9)
/SORT
/KEY=(f1,alignment)
/OUTFILE=stdout
__Chars__1
Duplicates occur when two or more records have key fields that compare equally.
Only the key fields, not the records, must compare equally for a record to be considered
a duplicate.
The statement /NODUPLICATES results in only one of the duplicates being output.
If you also specify /STABLE, the earliest input duplicate will be the one retained.
Without /STABLE, the duplicate retained is arbitrary.
NOTE: The number of records that are generated (using /INCOLLECT) will not match
the number that is produced if duplicates are found and removed.
Inversely, you can specify /DUPLICATESONLY, where only records containing key
fields that compare equally are returned.
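The effect of the two statements on sorted records can be sketched in Python. This is an illustration, not RowGen code: the function names are hypothetical, the sample parts and prices are invented, and the sketch assumes that /DUPLICATESONLY returns every record whose key value occurs more than once.

```python
from collections import Counter

def no_duplicates(sorted_records, key):
    """Keep the first record for each key value (the /STABLE behaviour)."""
    seen, kept = set(), []
    for rec in sorted_records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            kept.append(rec)
    return kept

def duplicates_only(sorted_records, key):
    """Keep records whose key value occurs more than once (assumed reading)."""
    counts = Counter(key(rec) for rec in sorted_records)
    return [rec for rec in sorted_records if counts[key(rec)] > 1]

records = [("bolt", 1.10), ("bolt", 1.25), ("nut", 0.40), ("screw", 0.15), ("screw", 0.20)]
by_part = lambda rec: rec[0]
print(no_duplicates(records, by_part))
print(duplicates_only(records, by_part))
```

Only the key (the part name here) is compared, so two records with different prices still count as duplicates.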
15 CONDITIONS
A condition is a logical expression that combines field names and/or constants with
relational and/or logical operators. When the expression is evaluated, it will be either
true or false.
A condition is associated with several RowGen statements, and can be used for both
input and output. The true/false result controls how the statement works. The following
statements use a condition within their definitions:
• /INCLUDE and /OMIT
• /DATA using IF-THEN-ELSE logic
• /FIELD using IF-THEN-ELSE logic
• WHERE and BREAK in summary functions (MAX, MIN, SUM, AVG, COUNT).
15.1 Syntax
Named (implicit) where the logical expression has a name and is defined with the
statement:
/CONDITION=(condition_name, TEST=(logical_expression))
Unnamed (explicit) where the logical expression is built into the statement using it:
The simplest logical expression is a field name. The condition is true when the value of
the field is different from its value in the previous record. If the value of the field has not
changed, the condition is false.
Typically in change testing, data is sorted using the named field as a key, but this is not a
requirement. The most common use of change tests is for defining BREAK points in
summary functions, as shown in Example 50 on page 159.
Another form of a logical expression involves two values and a relational operator.
The following is the general form of the expression:
RowGen recognizes both the operator and symbol forms of these relational operators,
as shown in Table 12.
Author EQ "Publisher"
Publisher >= "Addison-Wesley"
Author CT "Hemmingway"
Price > 25.00
NOTE: Note the difference above between Price > 25.00 (a numeric compare) and
Publisher >= "Addison-Wesley" (a character compare).
You can use C-style iscompare functions in RowGen to evaluate conditions at the field
level, and also for record-filtering using /INCLUDE and /OMIT statements. This is
useful when drawing from character SET files that require validation (see Character
SET Files on page 111).
isascii(field) True if each character is a 7-bit unsigned char value that fits into the
ASCII character set.
isempty(field) Returns true for null fields or those that satisfy isspace(field).
isnumeric(field) Same as isdigit(field), but also recognizes period (.), plus (+),
and minus (-), and only when it satisfies isspace(field). At least
one char must be a digit.
isholding(value1,value2)
True if value2 is contained within value1. value1 and/or value2
may be a literal value or a field name, for example
isholding(ACCOUNT," # ").
ispattern(field,"expression")
Checks the field using Perl-compatible regular expressions such as
a+bc.
ispacked(field) Checks the field to make sure each nibble, except for the last one,
contains a 0-9 value, and that the last nibble contains a hex b, c,
d, or f.
555-1111
55520098
555-4321
555-0098
Note that the two empty records above consist of 8 spaces each.
The following script, iscompare.rcl, uses iscompare functions within both record filter
logic (/OMIT) and field-level evaluation in the output:
/INFILE=iscompare.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(phone,set=phone.set,POSITION=1,SIZE=8)
/OMIT WHERE isdigit(phone)
/REPORT
/OUTFILE=iscompare.out
/FIELD=(phone,POSITION=1,SIZE=8)
/FIELD=(flag,POSITION=12,IF isspace(phone) THEN "empty" ELSE "")
empty
555-4321
empty
555-4321
empty
555-1111
empty
555-4321
555-4321
empty
Note that the following record was not selected because the phone number field contains
all digits (isdigit):
55520098
Notice also the records that contain empty phone number fields (isspace) were appended
with the word empty because of the IF THEN ELSE logic in the final output field
(see CONDITIONAL FIELD AND DATA STATEMENTS on page 152).
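The record filter and the field-level test in iscompare.rcl can be approximated in Python. This sketch is illustrative only: the function name report is hypothetical, and Python's str.isdigit and str.isspace stand in for the C-style isdigit and isspace tests described above.

```python
def report(phones):
    """Drop all-digit values, then flag blank values, as in iscompare.rcl."""
    lines = []
    for phone in phones:
        if phone.isdigit():          # /OMIT WHERE isdigit(phone)
            continue
        flag = "empty" if phone.isspace() else ""
        lines.append((phone, flag))
    return lines

sample = ["555-1111", "55520098", " " * 8, "555-4321"]
for phone, flag in report(sample):
    print(f"{phone:<11}{flag}")
```

The value 55520098 is dropped because every character is a digit, while 555-1111 is kept because the hyphen is not a digit.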
It is possible to build conditions so that the logical expression of one condition will
contain the name of a previously defined condition linked to the rest of the expression by
a logical operator. Since parentheses are not recognized for grouping logical
expressions, this is helpful in defining complex logical expressions.
The following is an example of a script that includes nested conditions, and creates two
output files with different filter logic based on the nested conditions:
/INFILE=presidents.in
/FIELD=(Name,set=names.set,POSITION=1,SIZE=27)
/FIELD=(Party,set=party.set,POSITION=40,SIZE=3)
/FIELD=(State,set=states.set,POSITION=45,SIZE=2)
/CONDITION=(C1,TEST=(Party EQ "DEM")) # C1 defined
/CONDITION=(C2,TEST=(Party EQ "REP")) # C2 defined
/CONDITION=(C3,TEST=(C1 OR C2)) # C3 includes C1 and C2
/INCOLLECT=20
/SORT
/KEY=Name
/OUTFILE=dem_rep.out
/INCLUDE WHERE C3 # output includes only DEMs and REPs
/OUTFILE=no_dem_rep.out
/OMIT WHERE C3 # output omits all DEMs and REPs
/INCLUDE and /OMIT statements use conditions to accept or reject entire records,
respectively. They differ from the IF-THEN-ELSE conditions, which can
determine field values (see CONDITIONAL FIELD AND DATA STATEMENTS on
page 152); include and omit conditions use field-value conditions to determine the
dispositions of entire records. This functionality can be applied to records for both input
filtering and/or output purposes.
When a field name is given, without being used in an expression, beneath an /OUTFILE
statement, such as /INCLUDE WHERE ID_NO, then only one record containing each
unique ID_NO is returned to that output file. This is similar to the use of
/NODUPLICATES, but in this case, the uniqueness feature is output file-specific,
whereas /NODUPLICATES affects all output files (see No Duplicates, Duplicates Only
on page 138). Note that /OMIT WHERE field_name is the inverse case, which
behaves similarly to /DUPLICATESONLY, but at the output file level. See Example 17 on
page 39 which contains two output files that employ this /INCLUDE usage for
uniqueness.
Care must be given to the order of /INCLUDE and /OMIT statements because records are
tested for each /INCLUDE and /OMIT condition in the order specified. If a particular
record meets a given /INCLUDE condition, it is included, regardless of any remaining
/OMIT statements that would otherwise cause that record to be omitted. Alternatively, if
a record meets a given /OMIT condition, it is omitted, regardless of any remaining
/INCLUDE statements that would otherwise cause that record to be included. That is,
once a record satisfies an include or omit condition, its inclusion/exclusion in the output
is determined (see Example 44 on page 148 and Example 45 on page 149).
For better performance when there are multiple conditions, the /INCLUDE and /OMIT
statements that are most likely to be satisfied should be given first. The sooner the
records' dispositions are determined, the fewer conditions need to be evaluated.
You should also consider which section of the script is best for placement of
/INCLUDE and /OMIT statements:
• /INFILE
• /INREC
• /OUTFILE.
Any records that are not required in the output should be filtered out in the generation
(input) or /INREC phase prior to sorting (see /INREC on page 126). This makes the sort
faster by keeping unnecessary records from being processed. Be sure such records will
not be required for deriving fields in the output.
This example, inc.rcl, shows how /INCLUDE and /OMIT logic is applied differently,
depending on whether used in the input or output section(s) of a job script:
/INFILE=inc.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(letter1,POSITION=1,SIZE=1,alpha)
/FIELD=(rest,POSITION=2,SIZE=3,alpha)
/INCLUDE WHERE letter1 GE "A" AND letter1 LE "C"
/REPORT
/OUTFILE=lettera.out
/INCLUDE WHERE letter1 == "A"
/OUTFILE=letterb.out
/INCLUDE WHERE letter1 == "B"
/OUTFILE=letterc.out
/OMIT WHERE letter1 == "A" OR letter1 == "B"
In this example, RowGen generates 10 random records, where the first field consists of a
single letter, A, B, or C. These ten records are sent to separate output files, depending on
the include or omit conditions at the outfile level. Note that the omit logic used in the
final output file is the equivalent of using /INCLUDE WHERE letter1 == "C", because only
A, B, and C records pass the input filter.
In this example, the condition criteria were specified explicitly. See Example 44 on
page 148 for an example of using named conditions.
lettera.out contains:
AFUz
AKCG
letterb.out contains:
BKpz
BtBE
BbHW
BnsH
BCxW
letterc.out contains:
CcwH
CuDm
CyeB
This example, named.rcl, uses /INCLUDE and /OMIT statements with named
conditions:
/INFILE=named.in
/PROCESS=RANDOM
/FIELD=(Code,POSITION=1,SIZE=3,alpha)
/FIELD=(Price,POSITION=5,SIZE=5,NUMERIC)
/CONDITION=(caps_only,TEST=(Code GE "A" AND Code LE "Z"))
/CONDITION=(under_ten,TEST=(Price < 10))
/OMIT WHERE under_ten
/INCLUDE WHERE caps_only
/INCOLLECT=10
/OUTFILE=named.out
ARM 53.75
BoD 95.67
DeZ 14.23
EGS 42.12
EbD 65.52
GbH 67.39
KKD 26.41
PBH 77.16
UOK 81.70
XbJ 31.21
In this case, records are not generated when they satisfy the under_ten /OMIT
condition, and records are generated only when they satisfy the caps_only /INCLUDE
condition. The /INCOLLECT statement determines the number of records generated that
satisfy both input file conditions.
This example, order1.rcl, shows how the output is affected when an /INCLUDE
statement appears before an /OMIT statement:
/INFILE=order1.in
/PROCESS=RANDOM
/INCOLLECT=15
/FIELD=(code,POSITION=1,SIZE=1,alpha)
/FIELD=(rest,POSITION=2,SIZE=3,alpha)
/FIELD=(Price,POSITION=6,SIZE=5,NUMERIC)
/INCLUDE WHERE Code GE "A" AND Code LE "C"
/OUTFILE=order1.out
/INCLUDE WHERE Code GE "A" AND Code LE "B"
/OMIT WHERE Price < 10
In this example, 15 records are generated where the first character can be only A, B, or
C. The order of the include and omit statements in the output section determines how the
filter logic is applied in the resultant output file. After generation and processing
(a default left to right sort in this case), RowGen first checks the include condition. If
the code field is A or B (as the condition states), these records are produced to output
regardless of the subsequent omit statement. All other records, that is, those with code C,
are subject to the evaluation criteria of the subsequent /OMIT statement.
AFGw 34.97
AQsE 68.50
AXxA 39.10
AZtM 81.95
Aenn -7.80
AmFc 75.58
BFjG 84.76
BbXB 3.45
Bfmn 43.41
ByXG 19.21
CFHR 27.62
CGyw 72.19
CPCV 84.15
CnIa 74.38
CwgD 26.63
Note that all records satisfying the first include statement (A and B records) are included
without exception. Only the C records were left to be evaluated by the omit condition,
and therefore, C records with a Price less than 10 were omitted from the
results.
This example, order2.rcl, shows how the output is affected when an /OMIT statement
appears before an /INCLUDE statement:
/INFILE=order2.in
/PROCESS=RANDOM
/INCOLLECT=15
/FIELD=(code,POSITION=1,SIZE=1,alpha)
/FIELD=(rest,POSITION=2,SIZE=3,alpha)
/FIELD=(Price,POSITION=6,SIZE=5,NUMERIC)
/INCLUDE WHERE Code GE "A" AND Code LE "C"
/OUTFILE=order2.out
/OMIT WHERE Price < 10
/INCLUDE WHERE Code GE "A" AND Code LE "B"
The output is different for this script because the order of the /INCLUDE and /OMIT
statements is reversed. Generated records are first tested for the condition Price < 10,
and if true, these records are omitted regardless of the subsequent /INCLUDE statement.
Of the surviving records that are not omitted (that is, those with prices of 10 or greater),
the /INCLUDE evaluation is then applied.
AJED 76.51
AaKh 87.32
AgCe 79.67
AoQQ 21.51
AsER 90.73
Attz 29.85
AvhO 30.26
BKLe 72.03
BRnB 67.32
Note that all records with prices less than ten were omitted from the results without
exception. Of the remaining records, the include statement is considered,
which is why only A and B records are produced.
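The ordering rule shown by order1.rcl and order2.rcl can be sketched in Python. This is an illustration, not RowGen code. The function name keep is hypothetical, and the treatment of a record that matches no statement is an assumption inferred from the two sample outputs: it receives the opposite of the last statement's action. The C record priced at 4.00 is an invented sample value.

```python
def keep(record, statements):
    """Apply ordered (action, test) pairs; the first matching test decides."""
    for action, test in statements:
        if test(record):
            return action == "include"
    # Assumption inferred from order1.rcl and order2.rcl: an unmatched record
    # gets the opposite of the last statement's action.
    return statements[-1][0] == "omit"

is_a_or_b = lambda r: "A" <= r["code"] <= "B"
under_ten = lambda r: r["price"] < 10

order1 = [("include", is_a_or_b), ("omit", under_ten)]
order2 = [("omit", under_ten), ("include", is_a_or_b)]

records = [
    {"code": "A", "price": -7.80},
    {"code": "B", "price": 3.45},
    {"code": "C", "price": 27.62},
    {"code": "C", "price": 4.00},   # hypothetical low-priced C record
    {"code": "A", "price": 76.51},
]
print([keep(r, order1) for r in records])
print([keep(r, order2) for r in records])
```

With the include first, low-priced A and B records survive; with the omit first, they are dropped before the include is ever tested.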
RowGen permits the value for /FIELD statements in /INREC and output file
descriptions, and the value for /DATA statements in output file descriptions to be derived
from IF-THEN-ELSE logic. When using this logic, the syntax for these statements is:
/FIELD=(field-name[,field attributes][,IF-THEN-ELSE])
/DATA=IF-THEN-ELSE
where IF-THEN-ELSE takes the form IF condition THEN value1 ELSE value2.
If the condition is true, the resultant value of the /FIELD or /DATA statement is
value1; if the condition is false, the resultant value is value2. A condition is a
named condition or a logical expression as discussed in CONDITIONS starting on
page 139. A value is a field name, literal (numeric value or character string), algebraic
expression, summary value, or another IF-THEN-ELSE derived value.
Below is an example with a conditional /DATA and a /FIELD statement, which are used
in the output section(s) of a script:
/CONDITION=(DT,TEST=(type == "deciduous"))
/FIELD=(tree, POSITION=2, SIZE=4, IF DT THEN oldtree ELSE "PINE")
/DATA=IF DT THEN instructions ELSE "none"
where type, oldtree, and instructions are previously defined input fields where
SET files were used, and "PINE" and "none" are string literals. Notice that the string
must be in double quotes. Also, remember that /DATA statements will appear
immediately after any preceding /FIELD or /DATA statements while a /FIELD
statement may have its POSITION and SIZE defined.
You can get the same results as above using explicit conditions:
The first implies an empty value for the ELSE clause, and the second implies an empty
value for the THEN clause. Below are examples:
A next level of IF-THEN-ELSE may appear in either or both of the ELSE or THEN
clauses. Any number of levels can be defined to meet the degree of complexity in your
requirements.
IF C1 \
    THEN IF C2 \
        THEN V1 \
        ELSE V2 \
    ELSE IF C3 \
        THEN V3 \
        ELSE V4
The rule is that each THEN or ELSE clause is associated with the most recent IF that does
not already have a THEN or ELSE clause associated with it. The line continuation at the
end of each line of the statement is necessary for RowGen to evaluate the statement
correctly.
NOTE: When long lists are to be tested for a true condition, processing will be faster if
the cases that are most likely to be true are specified first.
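The association rule for nested THEN and ELSE clauses can be illustrated with an equivalent Python function. This is a sketch only: the function name nested is hypothetical, and the strings "V1" through "V4" stand in for the values in the multi-level statement above.

```python
def nested(c1, c2, c3):
    """Mirror of: IF C1 THEN IF C2 THEN V1 ELSE V2 ELSE IF C3 THEN V3 ELSE V4."""
    if c1:
        if c2:
            return "V1"
        return "V2"      # this ELSE pairs with the inner IF C2
    if c3:
        return "V3"
    return "V4"          # this ELSE pairs with IF C3

for c1 in (True, False):
    for other in (True, False):
        print(c1, other, nested(c1, other, other))
```

C2 is consulted only when C1 is true, and C3 only when C1 is false, which is the pairing that the rule describes.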
Following are examples using conditional field and conditional data statements.
DBD:screwdrivers
CBC:hammers
ABC:glue sticks
EBD:lightbulbs
ABE:pliers
BBC:ratchets
BBD:buzz saws
EBE:switches
BBE:sanders
CBD:nails
CBE:tacks
DBC:screws
DBE:wrenches
EBC:drills
ABD:lighters
This example, cond_data.rcl, includes a /DATA statement that identifies those records
from parts_list3.set where the prefix code begins with the letter A. An /INREC
statement is used to isolate the first letter of the prefix code for evaluation purposes
(see /INREC on page 126):
/INFILE=cond_data.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(part,SET=parts_list3.set,POSITION=1,SIZE=30)
/INREC
/FIELD=(prefix=sub_string(part,1,1),POSITION=1,SIZE=1)
/FIELD=(part,POSITION=1,SIZE=30)
/REPORT
/OUTFILE=cond_data.out
/FIELD=(part,POSITION=1,SIZE=30)
/DATA=IF prefix EQ "A" THEN "A code" ELSE "not A"
EBE:switches not A
CBD:nails not A
ABE:pliers A code
DBC:screws not A
BBD:buzz saws not A
BBE:sanders not A
BBC:ratchets not A
ABC:glue sticks A code
CBE:tacks not A
DBE:wrenches not A
This job script, cond_field.rcl, uses a multi-level condition for an output field where the
test is based on the value of the /INREC-derived prefix field (see /INREC on
page 126):
/INFILE=cond_field.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(part,SET=parts_list3.set,POSITION=1,SIZE=30)
/INREC
/FIELD=(prefix=sub_string(part,1,1),POSITION=1,SIZE=1)
/FIELD=(part,POSITION=1,SIZE=30)
/REPORT
/OUTFILE=cond_field.out
/FIELD=(part,POSITION=1,SIZE=30)
/FIELD=(test1,POSITION=31,SIZE=10,IF prefix EQ "A" THEN "A code" \
ELSE IF prefix EQ "B" THEN "B code" \
ELSE IF prefix EQ "C" THEN "C code" \
ELSE "other")
EBE:switches other
CBD:nails C code
ABE:pliers A code
DBC:screws other
BBD:buzz saws B code
BBE:sanders B code
BBC:ratchets B code
ABC:glue sticks A code
CBE:tacks C code
DBE:wrenches other
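The multi-level test in cond_field.rcl amounts to a chain of comparisons on the first character of the part value. The Python sketch below is illustrative only; the function name code_label is hypothetical, and slicing the first character stands in for the derived prefix field.

```python
def code_label(part):
    """Label a part by the first letter of its prefix code."""
    prefix = part[:1]                 # stands in for the /INREC-derived prefix field
    if prefix == "A":
        return "A code"
    elif prefix == "B":
        return "B code"
    elif prefix == "C":
        return "C code"
    return "other"

for part in ["EBE:switches", "CBD:nails", "ABE:pliers", "BBD:buzz saws"]:
    print(f"{part:<30}{code_label(part)}")
```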
RowGen can produce output records containing summary fields derived from
accumulated detail records that have been generated. (Detail records are the records that
comprise the summaries.) Multiple levels of summary records can be created in the
same pass. Depending on the levels of BREAKs you specify, multiple levels of subtotals
can be provided, along with a grand total.
For an example of a multi-level report that uses all RowGen-supported aggregation
functions, see Report with Multiple Aggregations on page 31.
One or more summary fields may be derived using the following RowGen features:
• Summary and Average Functions on page 157
• Maximum and Minimum Functions on page 159
• Counting Function on page 161
• Ranking on page 161.
There are two steps required to define a summary field within a summary or detail
record:
1) Use an output /FIELD statement to describe its position and format in the output
record.
2) Use one of the above functions to determine how the value of the field is derived.
Summary records can be formatted differently at each level. Each of these levels can be
written to a separate file, or merged into one file to produce a structured report.
You can also create RUNNING summary fields in the detail records. These running, or
accumulating, summary fields are updated at each record.
Use the /ROUNDING statement to change the way in which numeric values with several
decimal places are rounded after an arithmetic RowGen operation (see /ROUNDING on
page 177).
A sum is accumulated until each break. With /AVERAGE, the accumulating sum is
divided by the number of records before each break. A grand total and file-wide
average is produced if there is no grouping via breaks.
The FROM portion indicates the source of values; it can be a generated field name or a
mathematical expression which references one or more previous field names. If an
expression is used, it should be enclosed with parentheses.
NOTE: When using an expression to define the FROM clause, you should enclose the
expression in parentheses. Only simple expressions can be used in the FROM of
a summary definition. You may not use additional parentheses for grouping within
the expression. For example, you may have:
/SUM=Exp_B FROM (A * B - 5)
but not
/SUM=Exp_B FROM (A * (B - 5))
BREAK Condition2 controls when the /SUMMARY or /AVERAGE record is output and
the values are reset.
Again, there is a natural BREAK at the end of the job; therefore, if the BREAK portion is
not used, or Condition2 never becomes true, results for the whole job display.
Consider the SET file, publishers.set, which consists of names of book publishers:
Prentice Hall
Harper-Row
Dell
Valley Kill
Academic Press
Cambridge University Press
Macmillan
The following job script, sum_avg.rcl, generates 100 records where the publisher name
is randomly selected from the set file, and generates several book prices (up to $20 each)
for each publisher. The summary information is grouped by individual publisher, so the
job must be sorted over the publisher key.
/INFILE=sum_avg.in
/PROCESS=RANDOM
/FIELD=(publisher,SET=publishers.set,POSITION=1,SIZE=30)
/FIELD=(price,POSITION=31,SIZE=5.2,NUMERIC)
/INCLUDE WHERE price > 0 AND price < 20
/SORT
/KEY=publisher # required as a sort key when BREAK is used
/OUTFILE=sum_avg.out
/FIELD=(publisher,POSITION=1,SIZE=30)
/FIELD=(tot_cost,POSITION=32,SIZE=8.2,NUMERIC)
/FIELD=(ct_books,POSITION=41,SIZE=4)
/FIELD=(avgprice,POSITION=48,SIZE=7.2,NUMERIC)
/SUM=tot_cost FROM price WHERE price < 10 BREAK publisher
/COUNT=ct_books WHERE price < 10 BREAK Publisher
/AVERAGE=avgprice FROM price WHERE price < 10 BREAK publisher
Note that:
• the first numeric column in the output contains the sum (/SUM) of all the
sub-$10 price values for each publisher.
• the second column contains a count of all sub-$10 records for that publisher
(see Counting Function on page 161).
• the final column is the average price, calculated as the total sum of prices divided
by the number of records, for that publisher
• WHERE clauses were applied in the summary fields to produce information for
only those records where the price was less than $10, and therefore the number
of records returned on output was fewer than those generated
(/INCOLLECT=100, by default).
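The grouping performed by /SUM, /COUNT, and /AVERAGE with WHERE and BREAK can be sketched in Python using itertools.groupby on sorted records. This is an illustration, not RowGen code: the function name summarize and the sample prices are hypothetical, and skipping a publisher with no qualifying records is an assumption.

```python
from itertools import groupby

def summarize(sorted_records, limit=10):
    """Yield (publisher, tot_cost, ct_books, avgprice) at each publisher break."""
    for publisher, group in groupby(sorted_records, key=lambda r: r[0]):
        prices = [price for _, price in group if price < limit]   # WHERE price < 10
        if not prices:
            continue   # assumption: skip a publisher with no qualifying records
        total = sum(prices)
        yield publisher, round(total, 2), len(prices), round(total / len(prices), 2)

records = sorted([
    ("Dell", 4.50), ("Macmillan", 12.00), ("Dell", 7.50),
    ("Macmillan", 9.00), ("Dell", 15.25),
])
for row in summarize(records):
    print(row)
```

As with BREAK, groupby only detects a change between adjacent records, which is why the records must be sorted on publisher first.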
/MAXIMUM and /MINIMUM are used to calculate the maximum and minimum values,
respectively, of a field.
/MAX and /MIN are used the same way as /SUM or /AVERAGE. The syntax is:
If two or more records share the same /MAX or /MIN field value, RowGen will output
the first sorted record that satisfies the condition.
Consider the SET file, publishers.set (see Summary and Average on page 158).
In this example, max_min.rcl, RowGen produces maximum and minimum values for
prices, by publisher:
/INFILE=max_min.in
/PROCESS=RANDOM
/FIELD=(publisher,SET=publishers.set,POSITION=1,SIZE=30)
/FIELD=(price,POSITION=31,SIZE=7,NUMERIC)
/INCLUDE WHERE price > 0
/SORT
/KEY=publisher # required as a sort key when BREAK is used
/OUTFILE=max_min.out
/FIELD=(publisher,POSITION=1,SIZE=30)
/FIELD=(min_price,POSITION=32,SIZE=8.2,MILL,NUMERIC)
/FIELD=(max_price,POSITION=43,SIZE=8.2,MILL,NUMERIC)
/MIN min_price FROM price BREAK publisher
/MAX max_price FROM price BREAK publisher
Note that the MILL option produced numeric values with a comma where expected (see
MILL on page 91).
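As a rough illustration of what the grouped minimum and maximum amount to, here is a small Python sketch. It is not RowGen code; the helper names min_max_by_group and with_thousands and the sample prices are hypothetical, and the comma formatting only imitates the visible effect attributed to MILL above.

```python
from itertools import groupby

def min_max_by_group(records):
    """Return (publisher, min_price, max_price) for each publisher."""
    ordered = sorted(records, key=lambda r: r[0])
    result = []
    for publisher, group in groupby(ordered, key=lambda r: r[0]):
        prices = [price for _, price in group]
        result.append((publisher, min(prices), max(prices)))
    return result

def with_thousands(value):
    """Format a number with comma separators and two decimal places."""
    return f"{value:,.2f}"

sample = [("Dell", 1250.0), ("Dell", 980.5), ("Macmillan", 4300.25)]
for publisher, low, high in min_max_by_group(sample):
    print(publisher, with_thousands(low), with_thousands(high))
```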
NOTE
In this example, the aggregate records are displayed without their component
detail records. You can add a same-name /OUTFILE section to include the
detail records in the above report, as illustrated in Example 14 on page 31.
To also display the minimum and maximum for all publishers (that is, with no
BREAK), add the following lines to the bottom of the script:
/OUTFILE=max_min.out
/HEADREC="--------------------------------------------------\n"
/FIELD=(min_price,POSITION=32,SIZE=8.2,MILL,NUMERIC)
/FIELD=(max_price,POSITION=43,SIZE=8.2,MILL,NUMERIC)
/MIN min_price FROM price
/MAX max_price FROM price
In this case, the grand-total record (without a BREAK) was specified in the job script after
the sub-total records (with a BREAK), but both /OUTFILE names were the same. When
creating a script with detail records and sub-totals (and/or grand totals), the detail
records must be specified in the last same-name /OUTFILE section in the script, as
illustrated in Example 14 on page 31.
FieldX will contain the count of records that satisfy Condition1. The record
containing FieldX is displayed and reset to 0 when Condition2 occurs. If no WHERE
condition is specified, the records are counted until the BREAK occurs. If no BREAK is
given, all records that satisfy Condition1 are counted and displayed at the end of the
job. See Summary and Average on page 158 for an example of using a /COUNT
statement.
19.4 Ranking
One type of statistical analysis involves assigning a sequential rank to a set of data
values. You can use RowGen to rank generated records by performing a descending-order
sort and using the /COUNT RUNNING option in the output. The RUNNING option is
described in Running (Accumulating) Aggregates on page 164.
The following RowGen syntax can be used as a template for a ranking job:
/INFILE=...
/FIELD=(data_value, ...)
...
/SORT
/KEY=(data_value,DESCENDING)
/OUTFILE=...
/FIELD=(rank, SIZE=n.0,NUMERIC)
/FIELD=(data_value, ...)
...
/COUNT=rank RUNNING WHERE data_value
The data_value is the input field on which ranking is to be performed, and this must
be the primary /KEY field for sorting (descending order for highest-value-first ranking).
The WHERE clause is required for instances when there are equal values in data_value.
Example 51 Ranking
Suppose that you want to create a report that ranks salespeople by the value of sales they
have made, with the highest sales ranking first. In the production application, the top
three ranking salespeople are to earn a bonus. The names are drawn from the following
SET file, people.set:
Jane
Sam
Frank
Melanie
Adam
Mary
Robert
John
Vanessa
Donald
Sarah
Lawrence
Laura
Roger
Keith
Paul
Jennifer
Henry
Richard
The following RowGen script, rank.rcl, generates 10 records, where names are selected
randomly from the set file, and a sales figure (in thousands of dollars) is generated for
each. The use of /COUNT with RUNNING and a descending sort key will rank the
salespeople by the value of sales they have made:
/INFILE=rank.in
/PROCESS=RANDOM
/INCOLLECT=10
/FIELD=(person,SET=people.set,POSITION=1,SIZE=11)
/FIELD=(sales,POSITION=13,SIZE=7.2,NUMERIC) # values up to 9999.99
/INCLUDE WHERE sales GE 1000 # ensures values of at least 1000
/SORT
/KEY=(sales,DESCENDING) # Highest value is ranked first
/OUTFILE=rank.out
/FIELD=(rank,POSITION=1,SIZE=2.0,NUMERIC)
/FIELD=(person,POSITION=4,SIZE=11)
/FIELD=(sales,POSITION=15,SIZE=9.2,CURRENCY)
/FIELD=(bonus,POSITION=26,IF rank < 4 THEN "Bonus" ELSE "No Bonus")
/COUNT rank RUNNING WHERE sales
Note how the generated numeric values are converted to currency, which requires an
increase in field size to accommodate the expected comma (,) and $ in the new data type
(see ASCII-Numeric Data Types on page 95).
Note that the ranking was performed in descending order of sales made for each
salesperson. The top three are to receive bonuses, as specified using IF THEN ELSE
logic in the final output field (see CONDITIONAL FIELD AND DATA STATEMENTS
on page 152).
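The ranking idea (sort descending, then number the records with a running count) can be sketched in Python as follows. This is an illustration only, not RowGen code: the function name rank_by_sales and the sample figures are invented, and equal sales values simply receive consecutive ranks here, which may differ from how RowGen treats ties.

```python
def rank_by_sales(records):
    """Sort (person, sales) pairs by sales, highest first, and number them."""
    ordered = sorted(records, key=lambda r: r[1], reverse=True)
    report = []
    for rank, (person, sales) in enumerate(ordered, start=1):
        # Same test as the IF rank < 4 THEN ... ELSE ... output field
        bonus = "Bonus" if rank < 4 else "No Bonus"
        report.append((rank, person, sales, bonus))
    return report

sample = [("Jane", 2500.00), ("Sam", 7300.50), ("Frank", 1200.75), ("Mary", 5100.00)]
for line in rank_by_sales(sample):
    print(line)
```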
To produce a report that has detail records, any number of subtotals, and a final total in
the same output file, use the same output file name to define each type of record. For an
example of a multi-level report that uses all RowGen-supported aggregation functions,
see Report with Multiple Aggregations on page 31.
For each aggregation function, you may display RUNNING aggregate values in the detail
records. To define a running summary, for example, you must add the parameter
RUNNING to the summary definition.
Using publishers.set (see Example 49 on page 158), the following script, running.rcl,
produces a running record count, plus a running sum and average price values for each
publisher:
/INFILE=sum_avg.in
/PROCESS=RANDOM
/FIELD=(publisher,SET=publishers.set,POSITION=1,SIZE=30)
/FIELD=(price,POSITION=31,SIZE=5.2,NUMERIC)
/INCLUDE WHERE price > 0 AND price < 10
/SORT
/KEY=publisher # required as a sort key when BREAK is used
/OUTFILE=runninga.out
/FIELD=(publisher,POSITION=1,SIZE=30)
/FIELD=(ct_books,POSITION=32,SIZE=4)
/FIELD=(tot_cost,POSITION=36,SIZE=8.2,NUMERIC)
/FIELD=(avgprice,POSITION=52,SIZE=7.2,NUMERIC)
/COUNT=ct_books RUNNING BREAK Publisher
/SUM=tot_cost FROM price RUNNING BREAK publisher
/AVERAGE=avgprice FROM price RUNNING BREAK publisher
Note that all the individual records for each publisher are displayed with a running count
in the second column, followed by a running sum and a running average. Note that
without the RUNNING option, only one record -- the record displaying the total -- would
be displayed for each publisher.
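A minimal Python sketch of running aggregates that reset at each break is shown below. It is not RowGen code; running_aggregates and the sample data are hypothetical names and values chosen for illustration, and the input is assumed to be sorted by publisher already.

```python
def running_aggregates(records):
    """Yield (publisher, count, running_sum, running_avg) per detail record.

    records must already be sorted by publisher; the counters reset
    whenever the publisher changes (the break).
    """
    current, count, total = None, 0, 0.0
    for publisher, price in records:
        if publisher != current:
            current, count, total = publisher, 0, 0.0
        count += 1
        total += price
        yield (publisher, count, round(total, 2), round(total / count, 2))

sample = [("Dell", 2.0), ("Dell", 4.0), ("Macmillan", 5.0)]
for line in running_aggregates(sample):
    print(line)
```

Every input record produces one output line, in contrast to a non-running aggregate, which would emit a single line per publisher.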
20 SEQUENCER
SEQUENCER is an internal field that keeps a running count of records. The SEQUENCER
field can be positioned and sized, as required, and is particularly useful in database
indexing and re-loading work. It is used in the /OUTFILE section of a job script.
See ROWID on page 74 for details on assigning a unique, incrementing row number or
ID tag to a field within the /INFILE section of a job script.
When generating output with both summary BREAK and detail records, SEQUENCER can
be used in both types of records. Every time there is a BREAK record, the counter for the
detail records gets reset. SEQUENCER also displays a running count for the BREAK
records. Example 53 on page 166 demonstrates this.
The field named SEQUENCER may appear in one or more output files with the
following syntax:
/FIELD=(SEQUENCER[=[+/-]n],[field attributes])
where n is a whole number and you choose either "+" or "-" to indicate whether the
initial value is positive or negative. If no initial value is given, then the initial value is 1.
If no sign is given, then the sign is assumed to be positive.
/FIELD=(SEQUENCER,POSITION=5,SIZE=4)
/FIELD=(SEQUENCER=0,POSITION=5,SIZE=4)
/FIELD=(SEQUENCER=-10,POSITION=5,SIZE=4)
/FIELD=(SEQUENCER=+100,POSITION=5,SIZE=4)
The following example, seq.rcl, includes a SEQUENCER field for the detail records
that comprise each group, and an additional group-level SEQUENCER field:
/INFILE=seq.in
/PROCESS=RANDOM
/FIELD=(code_letter1,POSITION=1,SIZE=1,alpha)
/FIELD=(code_letter2,POSITION=2,SIZE=1,alpha)
/FIELD=(code3_num,POSITION=3,SIZE=1,WHOLE)
/FIELD=(value,POSITION=6,SIZE=5.2,NUMERIC)
/OMIT WHERE code_letter1 > "C"
/OMIT WHERE code_letter2 > "C"
/INCLUDE WHERE value > 0
/INCOLLECT=15
/SORT
/KEY=(POSITION=1,SIZE=3)
/KEY=value
/OUTFILE=seq.out
/FIELD=(tot_value,POSITION=10,SIZE=6.2,NUMERIC)
/DATA=" Group "
/FIELD=(SEQUENCER=+101,SIZE=3)
/DATA="\n"
/SUM tot_value FROM value BREAK code_letter1
/OUTFILE=seq.out
/FIELD=(SEQUENCER,POSITION=1,SIZE=3)
/FIELD=(code_letter1,POSITION=5,SIZE=1)
/FIELD=(code_letter2,POSITION=6,SIZE=1)
/FIELD=(code3_num,POSITION=7,SIZE=1)
/FIELD=(value,POSITION=10,SIZE=6.2,NUMERIC)
1 AA7 8.08
2 AB3 85.13
3 AB3 93.60
4 AB7 43.64
5 AC4 9.83
6 AC5 32.56
272.84 Group 101
1 BA2 64.73
2 BB1 1.33
3 BB2 2.11
4 BB5 79.62
5 BC3 59.21
207.00 Group 102
1 CA0 26.78
2 CA5 48.63
3 CA8 73.75
4 CC1 67.52
216.68 Group 103
Note that the SEQUENCER count for the 15 detail records is re-started with each new
group, and a different sequencer is applied to the groups themselves (A, B, or C). The
attribute SEQUENCER=+101 ensured that aggregate sequences would begin at 101.
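The two counters in this example can be imitated with a short Python sketch: one counter restarts with each group, the other numbers the groups from a chosen start value. This is not RowGen code; sequence_report, its parameters, and the simplified line layout are assumptions made for illustration.

```python
def sequence_report(records, group_start=101):
    """Number detail records within each group and number the groups.

    records is a list of (code, value) pairs already sorted by code;
    the first letter of the code is the group key.
    """
    lines = []
    group_seq = group_start
    detail_seq, total = 0, 0.0
    current = None
    for code, value in records:
        if current is not None and code[0] != current:
            # Break: emit the group total, then reset the detail counter
            lines.append(f"{total:.2f} Group {group_seq}")
            group_seq += 1
            detail_seq, total = 0, 0.0
        current = code[0]
        detail_seq += 1
        total += value
        lines.append(f"{detail_seq} {code} {value:.2f}")
    if current is not None:
        lines.append(f"{total:.2f} Group {group_seq}")
    return lines

sample = [("AA7", 8.08), ("AB3", 85.13), ("BA2", 64.73)]
print("\n".join(sequence_report(sample)))
```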
21 OUTPUT OPTIONS
RowGen provides several horizontal record filtering statements for managing the size,
number, and flow of records.
This section describes the statements relating to the output phase of RowGen record
management:
• /CREATE
• /APPEND
• /HEADREC on page 170
• /FOOTREC on page 171
• /RECSPERPAGE on page 172
• /OUTSKIP on page 173
• MISCELLANEOUS OPTIONS on page 174.
/OUTCOLLECT, another output option, controls the number of records that are produced
on a per-outfile basis. This is described in /OUTCOLLECT on page 59.
/CREATE
This is the default specification for an output target. It indicates that a new output file
will be created. If the file name already exists, all previous data in the file will be lost,
even if nothing is written by this job.
/APPEND
You can associate an /APPEND with any output to cause output data to be placed
directly after the existing data in a file. If the file does not exist or is empty, /CREATE
will be invoked.
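The difference between the two behaviours resembles the familiar truncate and append file modes. The Python sketch below is an analogy only, not a description of RowGen internals; write_output and the file name demo.out are hypothetical.

```python
import os
import tempfile

def write_output(path, lines, append=False):
    """Write lines to path, truncating the file unless append is True."""
    mode = "a" if append else "w"
    with open(path, mode) as handle:
        handle.writelines(line + "\n" for line in lines)

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "demo.out")

write_output(target, ["first run"], append=True)   # file absent: it is created
write_output(target, ["second run"], append=True)  # placed after existing data
with open(target) as handle:
    after_append = handle.read().splitlines()

write_output(target, [])  # create mode with nothing written still empties the file
with open(target) as handle:
    after_create = handle.read()
```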
/HEADREC
This statement creates a customized header record in the output file (report). See
/RECSPERPAGE on page 172 for details on making the header record appear on every
page of output.
The character string can be a constant which may contain any combination of
internal variables and control (escape) characters recognized by RowGen (see Table 7
on page 131 and Table 8 on page 132).
NOTE
A character string can also contain syntax specific to a mark-up language, such
as HTML, provided the browser (or other utility) you use to read the output file
accepts those statements. For an example of producing an HTML report, see
Example 15 on page 33.
where \n indicates a new line, and %s is the format for the variable CURRENT_DATE.
See Table 7 on page 131 for a list of accepted variables.
NOTE
The \n is needed to cause a line-feed. Without it, the first record will display
immediately after the header on the same line. Also, the variables must be
listed in the order in which they will appear in the string.
Table 14 lists the display formats that RowGen supports for customizing output reports.
/FOOTREC
This statement uses the same syntax as /HEADREC but its string values appear at the
bottom of the output data as a footer (see /HEADREC on page 170). See
/RECSPERPAGE on page 172 for details on making the footer record appear on every
page of output.
The character string can be a constant which may contain any combination of
internal variables, control (escape) characters, and conversion-specifier characters
recognized by RowGen (see Table 7 on page 131, Table 8 on page 132, and
Table 9 on page 133).
NOTE
A character string can also contain syntax specific to a mark-up language, such
as HTML, provided the browser (or other utility) you use to read the output file
accepts that syntax. For an example of producing a custom HTML report, see
Example 15 on page 33.
/RECSPERPAGE
This statement sets the number of records displayed on each page of a RowGen output
report. It is used in conjunction with /FOOTREC and/or /HEADREC. If /RECSPERPAGE
is specified, the /HEADREC header and/or /FOOTREC footer will output on each page.
If the /RECSPERPAGE statement is not given, the header and/or footer will only be
displayed once (at the start and end of the file, respectively).
/RECSPERPAGE=10
After every 10 records have been displayed or written, the header and/or footer will be
repeated if /HEADREC and/or /FOOTREC are defined.
NOTE
Using /RECSPERPAGE does not actually cause a page break; it designates the
number of records to be printed before printing the footer and header records
again. In order to force a page break, a "\f" should be included in the
/FOOTREC statement.
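The repetition of header and footer every n records can be pictured with the following Python sketch. It is an illustration rather than RowGen's actual implementation; paginate and its arguments are hypothetical, and the assumption that a final partial page also receives a footer is made only for this example.

```python
def paginate(records, per_page, header=None, footer=None):
    """Return output lines with header and footer repeated every per_page records."""
    lines = []
    for start in range(0, len(records), per_page):
        if header is not None:
            lines.append(header)
        lines.extend(records[start:start + per_page])
        if footer is not None:
            lines.append(footer)  # a "\f" in the footer text would force a page break
    return lines

report = paginate([f"record {i}" for i in range(1, 8)], per_page=5,
                  header="-- header --", footer="-- footer --")
print("\n".join(report))
```

With seven records and five records per page, the sketch emits two header lines and two footer lines: one pair around the first five records and one pair around the remaining two.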
For example, the following report is produced from a job script where
/RECSPERPAGE=5 was specified on output, and where a total of seven records were
generated by RowGen (note that a character SET file was used):
Page 1
Page 2
/OUTSKIP
This statement skips the first n number of processed records on output that satisfy any
previous /INCLUDE or /OMIT selection criteria. The syntax is:
/OUTSKIP=n
where the first n number of sorted/processed records will be excluded from the output
target to which the /OUTSKIP statement applies.
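The order of operations (select first, then skip) is the point worth noticing. A Python sketch, with the hypothetical helper select_then_skip and invented sample values, makes it concrete; it is not RowGen code.

```python
def select_then_skip(records, keep, n):
    """Apply the keep filter first, then drop the first n surviving records."""
    selected = [record for record in records if keep(record)]
    return selected[n:]

values = [5, 12, 7, 30, 18, 2, 25]
result = select_then_skip(values, keep=lambda v: v > 6, n=2)
print(result)
```

Here the filter leaves 12, 7, 30, 18, and 25; skipping two of those leaves 30, 18, and 25. Records rejected by the filter do not count toward n.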
22 MISCELLANEOUS OPTIONS
This section describes the various RowGen statements relating to neither input nor
output:
• /RC
• /EXECUTE
• /MONITOR on page 174
• Runtime Warnings (/WARNINGSON and /WARNINGSOFF) on page 177
• /ROUNDING on page 177.
22.1 /RC
This statement allows the user to turn off the generation of output from RowGen so that
the user may
• test for errors and warnings during the debugging of a specification file
• check tuner settings being used from rowgenrc files and /MEMORY-WORK
(see Using Customized Resource Control Files on page 209).
22.2 /EXECUTE
This statement causes the operating system shell to execute the specified command.
It can be placed anywhere in the job script. The syntax is:
/EXECUTE="command_statement"
An example is:
When the above was executed, the following was echoed to the file joblog:
22.3 /MONITOR
This statement allows you to monitor the progress of each RowGen job by setting a
level of runtime server messages that will report through stderr. Important events such
as job start and stop, file opens and closes, and record throughput can each be reported
with a timestamp. The syntax of the statement is:
/MONITOR=n
where n is a whole number greater than or equal to 0 and less than 16. The statement can
appear anywhere in the job script. The MONITOR_LEVEL can also be set in the rowgenrc
file or Windows registry (see MONITOR_LEVEL level number on page 216).
Level   Description
0       No monitoring.
1       Show job initiation and job completion. This includes the job
        itself, the sort processes, and the merge processes.
2       Includes Level 1 plus the opening and closing of output files.
3-9     Includes Level 2 plus the opening and closing of temporary
        files. Each progressive level shows the number of records
        processed with an increasing degree of frequency:
        3   every 1,000,000 records
        4   every 100,000 records
        5   every 10,000 records
        6   every 1,000 records
        7   every 100 records
        8   every 10 records
        9   every 1 record
10-14   Undefined.
15      Use the monitor value in the rowgenrc file.
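The reporting frequencies for levels 3 through 9 follow a simple power-of-ten pattern. The Python helper below (reporting_interval is a hypothetical name used only here) restates that pattern; it is a reading aid, not part of RowGen.

```python
def reporting_interval(level):
    """Return the record-count interval for monitor levels 3 to 9, else None."""
    if 3 <= level <= 9:
        return 10 ** (9 - level)
    return None

for level in range(3, 10):
    print(level, reporting_interval(level))
```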
The following are examples of running the same RowGen job at different monitor level
settings. This job creates, and sorts, a 500MB file.
Level 1
C:\RowGen>rowgen /spec=gen.rcl /monitor=1
Level 2
Level 3
Because this sort requires the creation of temporary files, the lines pertaining to
workfiles are displayed, which shows that the temporary files are being deleted as they
are merged together.
Below is the final output that /MONITOR will display at this level:
Level 5
When generating a large file, you will see, at this level, when every 10,000 records are
processed as the job is running:
If there are criteria for accepting or rejecting records, you will periodically see the
accepted and rejected line as the job is running:
NOTE
Note that when running a job without sorting (using /REPORT), the Accepted
display reflects the number of rows being generated / processed.
22.4 Runtime Warnings (/WARNINGSON and /WARNINGSOFF)
The command /WARNINGSON captures warning messages which do not stop execution,
but do indicate where some steps may have been omitted. /WARNINGSOFF is the default
and turns off the output of the warning messages.
22.5 /ROUNDING
The /ROUNDING statement, which should be placed at or near the top of your RowGen
script, is used to control how rounding is to be performed throughout the execution of
the job. One of two options is available:
/ROUNDING=SYSTEM The default. This produces results in the manner determined by your
operating system.
WARNING!
The ROUNDING=NEAREST option slows numeric output. If your
calculations do not require this level of accuracy, or if your
system rounds in your preferred manner by default, then
/ROUNDING=NEAREST should not be specified.
23 USING SEEDS
NOTE
If you do not specify a seed, RowGen will use a random seed value each time
a job is run.
/SEED=seed_value
where seed_value is an integer between 0 and 65535, such as /SEED=9876. You can
also use an environment variable such as /SEED=$value where value is equal to the
desired seed value.
The following job script, seeded.rcl, uses the seed value 12345.
/INFILE=seeded.in
/PROCESS=RANDOM
/SEED=12345 # use a custom start seed
/FIELD=(string,POSITION=1,SIZE=5,alpha)
/INCOLLECT=7
/REPORT
/OUTFILE=seeded.out
VzBaQ
DrGSq
zJUxW
JDISk
QgZfc
bHlYz
MYbEl
Record generation was initiated with the seed value 12345. RowGen will produce
identical results every time this job is run.
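The effect of a fixed seed is the same in any pseudo-random generator. The Python sketch below demonstrates the principle only: it uses Python's own generator, so it will not reproduce the strings RowGen prints for seed 12345, and the function name generate is hypothetical.

```python
import random
import string

def generate(seed, count=7, size=5):
    """Return count random alphabetic strings of the given size for a seed."""
    rng = random.Random(seed)  # private generator, so other code cannot disturb it
    return ["".join(rng.choice(string.ascii_letters) for _ in range(size))
            for _ in range(count)]

first_run = generate(12345)
second_run = generate(12345)
print(first_run == second_run)
```

Two runs with the same seed yield the same sequence, which is what makes seeded test data repeatable.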
ROWGEN TOOLS
This chapter discusses the additional tools that RowGen provides to facilitate the
creation of field layouts. It contains the following sub-chapters:
cob2ddf
1 PURPOSE
RowGen offers a metadata translation program for users with input data from a COBOL
application who want to convert the record (copybook) layouts into RowGen data
definition files. The cob2ddf (COBOL-to-data definition file) program, located in the
directory $RowGen11/bin on UNIX (\RowGen11\bin on Windows), produces
descriptive file name and field-layout text that can be referenced by, or pasted directly
into, a RowGen job specification script (see Data Definition Files on page 62). The
RowGen language allows you to generate and produce multiple, differently formatted
output files and structured reports, while also replicating some of the data
transformation and reformatting capabilities of COBOL and legacy sort programs.
Currently, cob2ddf does not convert the entire range of COBOL data-definition
functionality, but it provides a convenient way to convert field descriptions. See the
Micro Focus COBOL Language Reference for documentation on the data description
portion of COBOL programs.
2 USAGE
where filename.cbl is the name of the copybook or file description file, and
RowGen_script is the resulting RowGen specification file. For example, the
command:
converts the file and field descriptions in cpybk.cbl to a RowGen data definition file
called cpybk.ddf. For details on how to reference a .ddf file from within a RowGen job
script, see Data Definition Files on page 62.
3 EXAMPLE
01 REG.
05 PLANT PIC X(08).
05 FREE PIC 9(10).
05 CLIENT PIC X(09).
05 CARRIER-18 PIC 9(12).
05 CARRIER-23 PIC 9(12).
05 INTEREST PIC S9(11) sign is leading
separate character.
05 CARGOES PIC S9(12) sign is leading
separate character.
To create the RowGen equivalent, enter the following on the command line:
This file gives the record layout (field definitions) for a file that can be used in the input
and/or output section of a RowGen job specification script. If using the data definition
file in this manner, then the following statement needs to be placed at the top of your
RowGen job specification file:
/SPECIFICATION=cob.ddf
You may also copy the field definitions into the job specification file.
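To show how byte positions and sizes follow from a copybook like the one above, here is a rough Python sketch. It is not the cob2ddf algorithm and does not produce .ddf syntax: the function name layout is hypothetical, only the simple X(n), 9(n), and S9(n) forms are handled, and the one assumption carried over from COBOL is that a separate sign character occupies its own byte.

```python
import re

# item name, optional sign, picture type, and repeat count
PIC = re.compile(r"(\S+)\s+PIC\s+(S?)([X9])\((\d+)\)", re.IGNORECASE)

def layout(copybook_lines):
    """Return (name, position, size) for each item that has a PIC clause."""
    text = " ".join(line.strip() for line in copybook_lines)
    fields = []
    position = 1
    for clause in text.split("."):
        match = PIC.search(clause)
        if not match:
            continue  # group items such as 01 REG carry no picture
        size = int(match.group(4))
        if match.group(2) and "SEPARATE" in clause.upper():
            size += 1  # a separate sign character takes one extra byte
        fields.append((match.group(1), position, size))
        position += size
    return fields

sample = [
    "01 REG.",
    "05 PLANT PIC X(08).",
    "05 FREE PIC 9(10).",
    "05 INTEREST PIC S9(11) sign is leading",
    "separate character.",
]
for field in layout(sample):
    print(field)
```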
csv2ddf
1 PURPOSE
WARNING!
For the purposes of csv2ddf, it is expected that the first record
of a Microsoft .csv file data is preceded by header descriptions.
Therefore, if your data file does not have a header, you cannot
use the csv2ddf program. See SEPARATOR on page 86 for
details on how to use RowGen to define input file fields for
comma-delimited records.
2 USAGE
Using the resultant .ddf that is created, you can include the field layout in the input
section of a RowGen script to generate fields of this type. It is then recommended that
you specify /PROCESS=CSV on output, and include the field layouts again so that a
header is automatically created in the output file based on the field names given
(see CSV on page 66), as shown in the following example.
3 EXAMPLE
This example shows how you can use both csv2ddf and RowGen to produce a random
data file with the same header and record layout as a pre-existing CSV file.
Element_Name,Windows_NT,Windows,Windows_CE,Win32s,Component,
Component_Version,Header_File,Import_Library,Unicode,Element_Type
ADsBuildEnumerator,4.0 or later,,,,ADSI,,adshlp.h,,,function
ADsBuildVarArrayInt,4.0 or later,,,,ADSI,,adshlp.h,,,function
ADsBuildVarArrayStr,4.0 or later,,,,ADSI,,adshlp.h,,,function
ADsEnumerateNext,4.0 or later,,,,ADSI,,adshlp.h,,,function
ADsFreeEnumerator,4.0 or later,,,,ADSI,,adshlp.h,,,function
ADsGetLastError,4.0 or later,,,,ADSI,,adshlp.h,,,function
ADsGetObject,4.0 or later,,,,ADSI,,adshlp.h,,,function
ADsOpenObject,4.0 or later,,,,ADSI,,adshlp.h,,,function
ADsSetLastError,4.0 or later,,,,ADSI,,adshlp.h,,,function
IADs::Get,4.0 or later,,,,ADSI,,iads.h,,,interface method
This produces the following file, csv.ddf, which can be invoked with a
/SPECIFICATION= statement from within a RowGen job specification script (see Data
Definition Files on page 62).
/FILE=comma_sep
/PROCESS=CSV
/LENGTH=0
/FIELD=(Element_Name,POSITION=1,SEPARATOR=',')
/FIELD=(Windows_NT,POSITION=2,SEPARATOR=',')
/FIELD=(Windows,POSITION=3,SEPARATOR=',')
/FIELD=(Windows_CE,POSITION=4,SEPARATOR=',')
/FIELD=(Win32s,POSITION=5,SEPARATOR=',')
/FIELD=(Component,POSITION=6,SEPARATOR=',')
/FIELD=(Component_Version,POSITION=7,SEPARATOR=',')
/FIELD=(Header_File,POSITION=8,SEPARATOR=',')
/FIELD=(Import_Library,POSITION=9,SEPARATOR=',')
/FIELD=(Unicode,POSITION=10,SEPARATOR=',')
/FIELD=(Element_Type,POSITION=11,SEPARATOR=',')
You can now generate random data, and produce a sample output file that conforms to
your original CSV format, including the insertion of a header record, as follows:
/SPECIFICATION=comma_sep.ddf
/INFILE=comma_sep # reference-in the .ddf created by csv2ddf
/INCOLLECT=7 # generate 7 rows of csv data
/REPORT # no ordering required
/OUTFILE=comma_sep.out
/PROCESS=CSV # generate header and frame fields in quotes
/SPECIFICATION=comma_sep.ddf # use same field layouts from the .ddf
Because the file type /PROCESS=CSV is present on output, a header record is created
from the field names, and field contents are automatically framed within double
quotes (").
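The resulting file shape (a plain header line built from the field names, followed by data rows with every value in double quotes) can be imitated with Python's standard csv module. This sketch is an analogy, not RowGen code; write_csv and the two-column sample are hypothetical.

```python
import csv
import io

def write_csv(field_names, rows):
    """Write an unquoted header line followed by fully quoted data rows."""
    buffer = io.StringIO()
    buffer.write(",".join(field_names) + "\n")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

text = write_csv(["Element_Name", "Windows_NT"], [["g06dCAtm", "aNnbY5"]])
print(text, end="")
```

The sample output produced by the RowGen job itself is listed next.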
Element_Name,Windows_NT,Windows,Windows_CE,Win32s,Component,Comp
onent_Version,Header_File,Import_Library,Unicode,Element_Type
"g06dCAtm","aNnbY5","HFhY6IS","Tn4JQG1a","ZWwYHuSymI","JG","kpnl
ieI9C","nT2o","7xLK9","JsCOGgm","9ZBXHv"
"A4PT7TN61","A58ZN","KNDLWWNf18","vvoQ","kMdtiS","tXcWvR","7yjlu
HTp","28teziD","qUp1x","rIB","CzNuwT"
"a8RU","GHBU","q","Hf8","JIvKH","D7","RPM5ay","UCyDk","nd","8tBr
Tq0y","FL"
"d09aOah","u","YxQ8P5","OUqqxDBvv","Kd","aBvYhL5","M2AoGn","XsNp
o","w1ZJFJ","DgNvDF2r","efuAyN8p07"
"HP0VwDAe","ZrRzUY","H","5","vALJufwAl","IJ8o4pTkhO","ObfP","1y6
N99vx","O2Tly","qSeDG","4Ml"
"SJ070U3qF","bWSqlGTiI","LMi9J","X5T","ohriL","cOUv023","JEvY8fd
","Nd6x3nr","10y","4kVQBx9n","Rj"
elf2ddf
1 PURPOSE
RowGen also handles web logs in CLF format. The sample RowGen data definition
files CLF_Referrer.ddf, CLF_Agent.ddf, and CLF_Access.ddf are provided in the
examples/RowGen directory.
2 USAGE
elf2ddf requires that your ELF file is in the W3C convention format described on
this page.
Entries consist of a sequence of fields relating to a single HTTP transaction. Fields are
separated by white space; the use of tab characters for this purpose is encouraged. If a
field is unused in a particular entry, a dash "-" marks the omitted field. Directives record
information about the logging process itself.
Lines beginning with the # character contain directives. The following directives are
defined:
• Version: integer.integer
The version of the extended log file format used.
• Fields: [specifier...]
Specifies the fields recorded in the log.
• Software: string
Identifies the software which generated the log.
• Start-Date: date_time
The date and time at which the log was started.
• End-Date: date_time
The date and time at which the log was finished.
• Date: date_time
The date and time at which the entry was added.
• Remark: text
Specifies comment information. Data recorded in this field should be ignored by
analysis tools.
The directives Version and Fields are required and should precede all entries in the
log. The Fields directive specifies the data recorded in the fields of each entry.
2.2 Syntax
where filename.elf is the name of the extended log format file, and
RowGen_script is the resulting RowGen data definition file. For example, the
command:
converts the field layout descriptions in the clickstream.elf file header to a RowGen
data definition file named clickstream.ddf. For details on how to reference a .ddf file
from within a RowGen job script, see Data Definition Files on page 62.
3 EXAMPLE
#Version: 1.0
#Date: 12-Jan-2007 00:00:00
#Fields: time cs-method cs-url
00:24:23 GET /tak/far.html
12:21:16 GET /tak/far.html
12:45:52 GET /tak/far.html
12:57:34 GET /tak/far.html
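The way the Fields directive determines the layout of each entry can be sketched in Python. This is not how elf2ddf is implemented and it emits no .ddf syntax; parse_elf is a hypothetical helper that simply follows the directive and white-space rules described in this chapter.

```python
def parse_elf(lines):
    """Split extended log format lines into directives, field names, and entries."""
    directives = {}
    fields = []
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines look like "#Name: value"
            name, _, value = line[1:].partition(":")
            directives[name.strip()] = value.strip()
            if name.strip() == "Fields":
                fields = value.split()
        else:
            # Entry fields are separated by white space, in Fields order
            entries.append(dict(zip(fields, line.split())))
    return directives, fields, entries

sample = [
    "#Version: 1.0",
    "#Date: 12-Jan-2007 00:00:00",
    "#Fields: time cs-method cs-url",
    "00:24:23 GET /tak/far.html",
]
directives, fields, entries = parse_elf(sample)
print(fields)
print(entries[0])
```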
ctl2ddf
1 PURPOSE
The program ctl2ddf converts the column layouts specified in a SQL*Loader control
file (.ctl) into a data definition file (.ddf) containing /FIELD layout descriptions.
This .ddf can then be used in RowGen job specification scripts (see Data Definition Files
on page 62) to create multiple, differently formatted load files and structured reports.
2 USAGE
2.1 Execution
ctl2ddf control_file
where control_file is the SQL*Loader control file (typically .ctl) containing the
column layout specifications you want to convert. The output is sent to filename.ddf,
where filename is the table name specified in the .ctl file into which the data will be
loaded.
3 EXAMPLE
ctl2ddf test1.ctl
emp_sorted.ddf is generated:
/FILE=out.dat
/FIELD=(EMPNO, POSITION=1, SIZE=6, DOUBLE)
/FIELD=(ENAME, POSITION=7, SIZE=10)
/FIELD=(JOB, POSITION=17, SIZE=10)
/FIELD=(MGR, POSITION=27, SIZE=6, DOUBLE)
/FIELD=(SAL, POSITION=33, SIZE=11, DOUBLE)
/FIELD=(COMM, POSITION=44, SIZE=10, DOUBLE)
/FIELD=(DEPTNO, POSITION=54, SIZE=3, DOUBLE)
Note that the INFILE source for the .ctl file, out.dat, is used as the /FILE specification
in the .ddf (see Data Definition Files on page 62).
This .ddf can now be invoked from within a RowGen job script, for example:
/SPECIFICATION=test1.ddf
/INFILE=out.dat
/SORT
/KEY=(JOB)
/KEY=(SAL, DESCENDING)
/OUTFILE=stdout.dat
In this example, stdout.dat could be a named pipe for directly feeding output data
into another SQL*Loader operation.
APPENDIX
A PERFORMANCE TUNING
RowGen is designed to allow the user to control, and scale, machine resources. There
are no mechanisms to assume special privilege, disable interrupts, bypass the kernel, or
restrict other programs performing I/O. This good neighbor approach assures that
multiple sorts and non-sort jobs can run well concurrently. Accordingly, RowGen will
perform with minimal impact on the system with the default resource control settings
provided.
NOTE
This section describes the tuning of parameters that can improve RowGen job
performance where sorting is involved (that is, if you are using the /SORT
option rather than /REPORT). These tuning descriptions are relevant only when
the job exceeds the upper memory limit used for internal (in-memory) sorts --
that is, when overflow to disk-based temporary files occurs.
Your system administrator can use the advice and instructions in this section to re-tune
RowGen for faster or slower performance. Manual performance tuning will affect the
execution of RowGen.
Memory Usage
The amount of sorting performed in memory has the greatest impact on the length of
time a job takes to run. For a large job, keeping few records in memory and writing
many to disk generally causes slower sort performance due to the physical reading and
writing of files. It can also cause job failure if there is insufficient disk space for the
records.
Conversely, using too much memory can cause thrashing, which also degrades
performance. On most computer systems, the best timings are achieved when all the
records of the sort can easily be retained in RAM.
For small sorts on lightly loaded machines, sorting in memory works well, especially
with the defaults provided by IRI. For larger jobs, or those running on very heavily
loaded machines, it is desirable to find new tuning values that will optimize the
resources spent. For huge jobs, or large jobs that you expect to run more than once, it
may be worthwhile to experiment with various memory-disk schemes to find a good
combination.
The setup program works at installation time to find the best amount of total memory
and shared memory for sorting. The values generated by setup assume that
approximately 35-40% of the machine's resources can be given to a sort job.
On a heavily used system, it is likely that you will not get any memory without
swapping. The main concern is that requesting too much memory will thrash the system
or lead to complete memory usage. Advanced users will note that sar and vmstat on
various UNIX flavors, and mem on Windows machines can provide more detailed
memory usage information. See MEMORY_PERTHREAD_MAX in Resource Control
Settings on page 213.
Disk Usage
Overflow occurs in sorting when the input data volume exceeds available or specified
memory limits. When the memory is filled, the records in memory are sorted and written
to a temporary work file. When all the overflow data are distributed to temporary files,
these files and the internal data are merged to produce output. Typically, the required
overflow space is less than the size of the input. If the system does not allow all the
work files to be open at once, it is possible to temporarily need more than the input size.
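The overflow and merge sequence described above can be illustrated with a generic external sort. The following Python sketch is not RowGen's implementation; the function name external_sort and its parameters are hypothetical, and the memory limit is expressed as a record count purely for simplicity.

```python
import heapq
import os
import tempfile


def external_sort(records, max_in_memory, work_dir=None):
    """Sort text records while holding at most max_in_memory of them in RAM.

    Illustrative only: full batches are sorted and spilled to temporary
    work files, then the files and the in-memory remainder are merged.
    Records must not contain newline characters.
    """
    run_paths = []
    buffer = []

    def spill():
        # Sort the records currently in memory and write one sorted work file.
        buffer.sort()
        fd, path = tempfile.mkstemp(prefix="run_", dir=work_dir, text=True)
        with os.fdopen(fd, "w") as run:
            run.writelines(record + "\n" for record in buffer)
        run_paths.append(path)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= max_in_memory:
            spill()  # memory is "full": overflow to disk

    buffer.sort()  # the last partial batch stays in memory
    if not run_paths:
        return buffer  # no overflow occurred

    # Merge every work file with the records still held in memory.
    files = [open(path) for path in run_paths]
    try:
        streams = [(line.rstrip("\n") for line in f) for f in files]
        return list(heapq.merge(buffer, *streams))
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)  # work files are removed once the merge is done
```

Passing a work_dir on a different device from the output is the same idea as assigning separate overflow directories. This sketch opens all work files at once, which is the case where overflow space stays below the input size.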
You are able to control where temporary files will be placed. Where multiple drives are
available, you can achieve increased speed, as the files are written to and read back in
parallel. If your system provides striping, the I/O system will automatically
accommodate this. Otherwise you should assign different devices as overflow
directories. Furthermore, using different physical devices for work space and output will
distribute the space requirement in a multi-threaded fashion, avoiding I/O conflicts in
the merge phase.
In addition to specifying the location of sort overflow, the capacity of each work area
can be set. By restricting the amount of overflow allowed, system administrators can
more precisely control the impact and performance of large sort jobs on a busy machine;
see WORK_AREAS path on page 214.
See Search Order for Resource Controls on page 211 for details on how RowGen
prioritizes the recognition of resource control settings.
To access and edit the Windows Registry, you must run regedit.exe and select:
HKEY_LOCAL_MACHINE\SOFTWARE\Innovative Routines Int’l
Inc.\RowGen 2.1\Global Configuration.
Generally, the value you will be most concerned with is WORK_AREAS. These are the
directories where RowGen will put temporary files. For the best performance, it is
recommended that you name as many directories as possible which are on physically
separate devices.
NOTE  You can include multiple entries for any given WORK_AREAS setting, for
example c:\sortwork1,d:\sortwork2.
The number of threads used for sorting can be controlled by the value of THREAD_MAX.
Other registry values should not be changed unless recommended by IRI’s technical
support staff.
It is recommended that you set ROWGEN_TUNER to the name of a resource control file.
This allows you to create and modify multiple resource control files, which can be
selected for different users, jobs, and system conditions. For details, see Using
Customized Resource Control Files on page 209.
See Search Order for Resource Controls on page 211 for details on how RowGen
prioritizes the different ways of setting resource control settings.
• edit the resource control template, and give it a name that is meaningful to a
specific job
• create your own resource control file and give it a name that is meaningful to a
specific job.
When creating your own resource control file, it is necessary to include only those
settings that you wish to be used other than the defaults. For example, the following is
valid resource control file contents in Windows:
THREAD_MAX 3
If you specify this resource control setting, the maximum number of threads
designated for the job will be 3 (for details on all the possible settings, see Resource
Control Settings on page 213). All other values for the job will be determined in the
order described in Search Order for Resource Controls on page 211.
U If you are in the C shell, the syntax for specifying a new resource control file to
be used is:
For example:
You can also use the /MEMORY-WORK statement to designate a resource control file. The
syntax is:
/MEMORY-WORK="[path]filename"
For example:
/MEMORY-WORK="/usr/mis/rowgenrc.job1"
The settings in the file rowgenrc.job1 would be read before the settings in any other
resource control file, and therefore take precedence.
W To specify the use of a new resource control file from a Windows command line,
type:
set ROWGEN_TUNER=c:\progra~1\IRI\RowGen21\bin\filename
For example:
set ROWGEN_TUNER=c:\progra~1\IRI\RowGen21\bin\rowgen_job1.rc
You can also use the /MEMORY-WORK statement to designate a resource control file. The
syntax is:
/MEMORY-WORK="[path]filename"
For example:
/MEMORY-WORK="\Progra~1\IRI\RowGen21\etc\rowgen.rc2"
The settings in the file rowgen.rc2 would be read before the settings in any other
resource control file, and therefore take precedence.
U Any controls not set by these methods will be searched in order in the files
named below. Note that for some directories, .rowgenrc is the file, and in others, it is
rowgenrc.
RowGen will search for each of the above files in order until all the values have been
set. Once a value is set, it will not be changed by settings in any subsequent files. It is
therefore possible that different variables can be set in different files. Factory defaults
will be used for any values not already set.
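The "first value found wins" rule described above can be sketched as follows. This Python fragment is only an illustration of the precedence logic; the function name resolve_settings and the dictionaries standing in for resource control files are hypothetical, and the values are made up for the example.

```python
def resolve_settings(sources, defaults):
    """Merge settings from sources listed in priority order.

    Once a name has a value, later sources cannot change it. Names are
    compared without regard to case. Defaults fill whatever is still unset.
    """
    resolved = {}
    for source in sources:  # highest priority first
        for name, value in source.items():
            resolved.setdefault(name.upper(), value)
    for name, value in defaults.items():
        resolved.setdefault(name.upper(), value)
    return resolved


# Hypothetical contents of two resource control files and the factory defaults.
job_file = {"THREAD_MAX": "3"}
home_file = {"thread_max": "8", "WORK_AREAS": "/tmp/work1"}
factory = {"MONITOR_LEVEL": "1", "THREAD_MAX": "1"}

settings = resolve_settings([job_file, home_file], factory)
```

Here THREAD_MAX comes from the first file, WORK_AREAS from the second, and MONITOR_LEVEL from the defaults, so different variables end up being set by different files.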
If you wish to have all the variables set in one place, you should set ROWGEN_TUNER to a
file name and make sure that all the variables are set in that file. Remember that you may
use any legal UNIX name for this file and that it will only be searched in the path
designated in the ROWGEN_TUNER declaration. If all the values are not set in this file, a
search will be done as shown in Table 1.
If you find performance unsatisfactory, you may want to run setup and choose Tune
RowGen from the menu. This is the same utility that runs during First Time
Setup. You may set different values for the tuner variables based on anticipated
conditions at the time of the sort. Any values that you do not set yourself will be given
factory default values.
You are also given the opportunity to set the path and name for the resource control file
being created. In this way, you can generate multiple resource control files, store them in
the same location, and then activate the desired one by setting ROWGEN_TUNER or
/MEMORY-WORK to the appropriate file name.
W Table 2 shows the order in which RowGen prioritizes resource control settings.
Factory (registry) defaults will be used for any values not manually set.
The names are not case-sensitive and they may be specified in any order. A sample
resource control file will be presented after the definitions.
All size and count values must be positive numbers without commas, though decimal
values are supported. Units may be designated as K or KB for kilobytes, M or MB for
megabytes, G or GB for gigabytes.
If no units are designated, the default units are bytes. Units are not case-sensitive.
All values and path/file names can contain references to environment variables.
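The value rules above (positive numbers, no commas, optional decimals, optional case-insensitive units, environment variable references) can be checked with a small parser. This Python sketch is an assumption-laden illustration: parse_size is a hypothetical helper, and it assumes binary multiples (1K = 1024 bytes), which the text above does not state.

```python
import os
import re

# Assumption: units are binary multiples; the manual does not say which.
_UNITS = {
    "": 1,
    "K": 1024, "KB": 1024,
    "M": 1024 ** 2, "MB": 1024 ** 2,
    "G": 1024 ** 3, "GB": 1024 ** 3,
}
_SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]*)")


def parse_size(text):
    """Convert a size such as '500MB', '1.5g' or '4096' to a byte count."""
    expanded = os.path.expandvars(text.strip())  # resolve $VAR references
    match = _SIZE.fullmatch(expanded)
    if match is None:
        raise ValueError(f"not a valid size: {text!r}")
    number, unit = match.groups()
    unit = unit.upper()  # units are not case-sensitive
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in {text!r}")
    value = float(number) * _UNITS[unit]
    if value <= 0:
        raise ValueError(f"size must be positive: {text!r}")
    return int(value)
```

A value with commas, such as 1,000, is rejected, while a value with no unit is taken as bytes.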
NOTE  All resource control settings prior to RowGen version 8 are still supported, but
their values will not be optimal. It is therefore recommended that you update
any older settings (and values) with the settings described here.
THREAD_MAX count
This is the maximum number of threads to be designated for a sort or merge. The default
is the number of CPUs licensed for the RowGen software on each machine in your
license agreement. You might want to set this number lower to minimize overhead in
smaller jobs, or to accommodate the needs of other programs running concurrently.
Typically, the maximum number of sort threads corresponds to the number of physical
CPUs on board.
MEMORY_MAX is the upper memory limit used for internal (in-memory) sorts before they will overflow
to disk-based temporary files. After overflow, the same value represents the size of
temporary files for sort operations.
At installation, you can set MEMORY_MAX to be a literal value, such as 500MB. You can
also express this value as a percentage of physically-detected RAM. For example, if
physical RAM is 1024MB, you can set MEMORY_MAX=50% which indicates a value of
512MB. You can modify this value in the rowgenrc file at any time.
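The percentage arithmetic above can be written out as a short calculation. This Python sketch uses a hypothetical helper, resolve_memory_max, and handles only the two forms shown in the example (a literal in MB or a percentage of physical RAM).

```python
def resolve_memory_max(setting, physical_ram_mb):
    """Return the memory limit in MB for a literal ('500MB') or a percentage ('50%')."""
    text = setting.strip().upper()
    if text.endswith("%"):
        percent = float(text[:-1])
        if not 0 < percent <= 100:
            raise ValueError(f"percentage out of range: {setting!r}")
        return physical_ram_mb * percent / 100
    if text.endswith("MB"):
        return float(text[:-2])
    raise ValueError(f"unsupported form in this sketch: {setting!r}")


# The example from the text: 50% of 1024MB of physical RAM.
limit_mb = resolve_memory_max("50%", 1024)
```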
U The default MEMORY_MAX value on Unix and Linux systems at installation is based
on a percentage of the memory detected when CSMEMTEST is run at installation:
On Unix systems where CSMEMTEST is not run, or where it fails, the default will be
the above percentages based on physically-detected RAM.
WORK_AREAS path
This allows you to specify directories for sort overflow (temporary) files. You may
specify as many WORK_AREAS entries as you wish, but the optimal quantity should
match your THREAD_MAX value, and their optimal locations would be on different
physical drives, each with sufficient capacity and space to hold at least 1x your largest
generated file. If you name multiple overflow directories on different physical devices,
your sort will be faster because the I/O will occur in parallel and not conflict. Try also to
ensure that your WORK_AREAS path(s) are not on the same physical drive that holds the
generated or the sorted output file(s).
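Before listing directories as WORK_AREAS, it can help to confirm that they have enough free space and do not share a device. The following Python sketch is one possible check, not a RowGen utility; check_work_areas is a hypothetical name. Note that st_dev identifies a file system or volume, so two volumes on one physical drive would still appear distinct.

```python
import os
import shutil


def check_work_areas(paths, required_bytes):
    """Report free space and device sharing for candidate overflow directories."""
    first_path_on_device = {}
    report = []
    for path in paths:
        device = os.stat(path).st_dev          # volume the directory lives on
        free = shutil.disk_usage(path).free    # bytes available on that volume
        report.append({
            "path": path,
            "enough_space": free >= required_bytes,
            # Earlier candidate on the same volume, or None if this is the first.
            "shares_device_with": first_path_on_device.get(device),
        })
        first_path_on_device.setdefault(device, path)
    return report
```

Setting required_bytes to the size of your largest generated file follows the 1x guideline above. Any entry whose shares_device_with is not None would compete for I/O with an earlier directory.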
Work files take the form CSprocess_number, and are removed automatically at the
completion of a successful job.
WARNING!
U If your generated, temporary, and/or output files will be
read from and/or written to remote, NFS-mounted drives, then
you should add the following entry to the rowgenrc file:
AIO OFF
ON_WORKAREAS_FULL option
This option determines the pause/resume behavior when one or more specified
WORK_AREAS (the paths where temporary work files are written to in large sort jobs) has
run out of disk space during a RowGen job. The options are:
ABORT The default behavior. When a work area(s) is full, the job is aborted,
temporary work files are purged, and the job must be re-run. Before
restarting the job after an abort, you must either free up space in your
specified work area(s), or assign different WORK_AREAS entries. See
WORK_AREAS path on page 214.
If you want to resume, you must first free up space in the specified
WORK_AREAS (see WORK_AREAS path on page 214), and then enter
r to resume the job.
RETRY_ADD_WORKAREA_PROMPT
The following prompt is displayed when a work area(s) is filled up
during a job:
To resume the job, you must first free up space in the specified
WORK_AREAS (see WORK_AREAS path on page 214), and then enter
Enter one or more paths, pressing <Enter> after each entry, to add to
the list of WORK_AREAS (see WORK_AREAS path on page 214).
After you have added one or more path names, press <Enter> again to
create an empty line. Each path entered is checked for read/write
access, and, if acceptable, added to the list of WORK_AREAS. All
previous work files are then considered to be non-full, and RowGen
attempts to resume the job. If all other work areas are still full,
RowGen will utilize only the new work areas entered here.
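The path handling described for this prompt (read entries until an empty line, check each for read/write access, add the acceptable ones) might look like the following. This Python sketch only mirrors the description; accept_work_areas is a hypothetical name, and RowGen's actual checks may differ.

```python
import os


def accept_work_areas(entered_lines, current_areas):
    """Return current_areas extended with each usable path entered at the prompt."""
    areas = list(current_areas)
    for line in entered_lines:
        path = line.strip()
        if not path:
            break  # an empty line ends the list of entries
        readable_writable = os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK)
        if readable_writable and path not in areas:
            areas.append(path)  # acceptable: add to the list of work areas
    return areas
```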
This defines the size of the I/O buffer. Optimal blocksize varies depending on the
amount of physical memory available in your environment. The default is set to 1200KB
on Unix and 2048KB on Windows.
Set the monitor level to determine the degree of on-screen reporting detail on a RowGen
job. Monitor events are reported through stderr. Setting the monitor at higher levels can
adversely impact the efficiency of the sort. The default MONITOR_LEVEL is 1.
Setting the level with a sortcl /MONITOR statement will override the global default for
that particular job. Below is a chart describing the levels:
Level Description
0 no monitoring
8 every 10 records
9 every 1 record
Runtime configuration and sort timing information can be sent to a self-appending log
file. By showing the actual system values used during execution, the log file can be used
in benchmarking and performance analysis. Specifying different filenames in other
resource control files will create additional log files.
When an error prevents a job from completing, the log file is not created. Instead, you
can refer to the file .rgerrlog which is created and overwritten with each job. The
location of .rgerrlog is also determined by the [path] specified here. If you do not
specify a path, any log and .rgerrlog files are written to the current working (user)
directory.
OUTPUT_TERMINATOR option
This setting determines the record terminator used when your output is specified as
variable-length. This option is ignored when your output is specified as fixed-length.
You can include this setting when you want to do any of the following on output:
character(s)
The character, or character string, that you specify -- possibly the null
string "" -- is appended to each output record.
NOTE  Using the null string "" as the output terminator character can be useful when
linefeed-terminated input data is declared as fixed-length (which is done to
save processing time when all records are of equal length). In these cases,
using the null string as an output terminator character will prevent
double-linefeeds from appearing in the output when converting from fixed- to
variable-length.
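The double-linefeed effect can be reproduced outside RowGen. The Python sketch below is a stand-alone illustration with a hypothetical function, fixed_to_variable; it treats each fixed-length record as already containing its linefeed, then appends the chosen terminator.

```python
def fixed_to_variable(data, record_length, terminator):
    """Split data into fixed-length records and append terminator to each one."""
    if len(data) % record_length != 0:
        raise ValueError("data size is not a multiple of the record length")
    records = [data[i:i + record_length] for i in range(0, len(data), record_length)]
    return b"".join(record + terminator for record in records)


# Two 5-byte records; the linefeed is counted as part of each fixed-length record.
source = b"AAAA\nBBBB\n"
doubled = fixed_to_variable(source, 5, b"\n")   # linefeed terminator added again
clean = fixed_to_variable(source, 5, b"")       # null-string terminator
```

With the linefeed terminator, every record ends in two linefeeds; with the null string, the output is byte-for-byte identical to the input.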
AUDIT [path]filename
sortcl can produce a self-appending log file, in XML format, that contains
comprehensive job information for the purposes of auditing. Auditing is enabled only
when the AUDIT entry is included in the rowgenrc file (or Windows Registry).
An audit record is appended to the log after each sortcl invocation, and includes
statistical information regarding the job and the complete sortcl job script. Additional
entries/lines that do not appear in the original job script (that is, entries referenced via
one or more /SPEC commands) are expanded and included in the audit log.
Additionally, an environment display is provided to show the literal equivalents of all
environment variables that were contained in the job script.
AUDIT [path]filename
where path is the directory that will contain the self-appending audit file, and
filename is the name of the audit file. It is recommended that the file name is given the
extension .xml to conform with its file type, for example:
AUDIT /home/compliance/audit/audit_trail.xml
To disable auditing, you must remove the AUDIT entry from the rowgenrc file
(or Windows Registry).
NOTE  You can assign an environment variable in place of the path, for example:
AUDIT $syspath/audit_log.xml
This section contains a table of RowGen error and runtime values and messages
(see Detailed Error and Runtime Messages on page 226 for elaboration on each
message).
3    Unknown Exception
     An internal error detected due to invalid data and/or specifications.
     Check whether the input data and specifications are valid. Contact IRI
     Support if you cannot resolve the problem.
10   Parameter Error
     Indicates that a routine was called with an illegal parameter. Check
     your RowGen specification.
13   Invalid Key Specification
     Key location was not specified as CS_FCHAR, CS_FANY, or CS_FBLANK.
18   Invalid Record Length
     A fixed key position plus its length goes past the end of a fixed-length
     record. A variable-length record shorter than the highest fixed key
     position was read. A variable-length record longer than 65,535 bytes
     was read.
22   Field Type vs. Length Wrong
     A CS_NUMERIC CS_INTERNAL key had an unreasonable k_len for its form.
23   Output Type Unknown
     Output was not stdout, file, both or returned to caller.
24   Problem with User’s Output File
     An error occurred when writing data to the final output file.
25   Invalid Value
     An invalid value for the given data type/task/feature has been specified.
34   Insufficient Disk Space for Output File
     Total input bytes cannot fit on disk where output file is specified.
35   Insufficient Disk Space for Work File(s)
     Total input bytes exceed (maxmemory + total temp area).
36   Insufficient Disk Space for Both Output File and Work File(s)
     Two times total input bytes is greater than total output area, where
     output area overlaps work area.
37   Maximum Number of Sort Threads Exceeded
     Trying to run with more sort threads than your license allows.
70   Too Many Errors -- Aborting
     RowGen script analyzer stops after 8 syntax errors.
91   No /HEADREC on input file
     Need to define a headrec on the referenced input file.
97   Cannot Set Locale on System
     The requested locale routines are not available.
98   Specific Line Too Long
     RowGen statement or command line limit exceeded.
99   Condition Define on a Different File
     Defined on one file and used on another.
102  Invalid Record Length for this File
     For example, file size not integer multiple of fixed record length.
103  Constant in Conditional Possible Error
     Invalid data type for numeric operand in a condition.
104  Error Return from RowGen
     Coroutine reports an error to the caller.
108  Cannot Set Environment Variable
     For example, setlocale() returns error.
111  Records Not in Order
     SET file requires the NOT_SORTED option.
113  Last Record Incomplete
     Wrong record length or missing record terminator.
127  Cannot Find a Matching File
     The specified pattern did not match any file in the file system.
128  Not Supported
     The feature/syntax is not supported by RowGen. Contact your RowGen agent.
130  Missing records (number read != number written)
     A mismatch between the number of output records and the number of input
     records (also accounts for any filter logic).
131  AIO resources exceeded
     AIO resources are insufficient and must be increased.
151  Operating System Error
     Indicates an operating system error that is not otherwise covered by
     one of the standard error conditions.
152  Parameter Error
     Indicates that a routine was called with an illegal parameter. Check
     your RowGen specification.
153  Too Many Files Open
     An attempt was made to open more files than the system allows open at once.
154  Unsupported Action for the Current File Mode
     An operation was requested that the current file open mode does not allow.
155  Record in Use by Another Process/Task
     The requested record is locked by another process/task.
157  Duplicate Key Detected Where it is Not Allowed
     A duplicate key was detected where duplicates are not allowed.
158  Requested Record Was Not Found
     The requested record was not found. This can indicate the end or
     beginning of the file.
159  File Handler has Undefined Status
     The current file operation cannot be completed because it detected an
     undefined status for a parameter.
161  File in Use by Another Process/Task
     The file is locked by another process/task.
162  Mismatch in Record Size
     Mismatch detected in the record size specifications.
163  File Type Mismatch
     Trying to treat a file with a different /PROCESS type.
164  Insufficient Memory
     Required memory space could not be dynamically allocated.
167  Requested Operation Not Supported by this Host System
     The requested operation is not supported in your machine.
168  System Ran Out of Lock-Table Entries
     Indicates an error when your machine ran out of lock-table entries.
     Try again.
169  Vision License Error
     Invalid license file to generate vision files. Make sure you have the
     Vision license file rowgen.vlc in the directory where you have the
     RowGen executable. The Vision license file can be obtained from a
     Micro Focus (formerly Acucorp) representative.
170  Unknown Exception
     An internal problem detected due to invalid input data or specification.
     Please make sure that the input data and specifications are valid.
     Contact IRI technical support if you cannot resolve the problem.
171  Error in the Vision Transaction System
     Indicates that an error occurred in the Vision transaction system.
172  Header information missing in input file
     The input /PROCESS type you have specified requires a header record to
     exist, and it cannot be found.
The following list contains details of the RowGen error and runtime messages,
presented alphabetically. For a list of messages by value number, see Table 3 on
page 219.
Abbreviation of ASCENDING
A RowGen syntax error.
rowgenrc Report Reports the total number of RowGen errors in /DEBUG mode.
Duplicate Name A RowGen syntax error. FILLER, when used as a field name, is
exempt from this error.
Empty Input File A warning message that indicates that the specified file is empty.
Error in WORK_AREAS
The WORK_AREAS specification in your resource control file is invalid or
cannot be used. Check whether the WORK_AREAS directory exists and
whether you have read, write, and file-create access.
File Type Mismatch Trying to treat a file with a different /PROCESS type.
Insufficient Disk Space for Both Output File and Work File(s)
RowGen has found that the output area and at least one of the work
areas overlap, and that the file system does not have enough free
space to accommodate both the sort output size and the temporary
disk space requirements of the sort. In the simplest case of a single
work area overlapping with the output area, the estimated
consumption of the common area is typically equal to two times the
total size of the input files.
path does not have enough space to hold the output of the sort. The
space required of that file system is typically equal to the total
size of the input files.
Invalid Field Position The acceptable field start range is 0 ≤ Position ≤ 65,535.
Invalid File The specified file is not a regular file. This could happen when you
try to enter a directory name as the input file.
Invalid Value An invalid value for the given data type/task/feature has been
specified. Check your RowGen specification.
No Such File The specified file does not exist in the file system.
Parameter Error Indicates that a routine was called with an illegal parameter. Check
your RowGen specification.
Permission Denied Insufficient privileges to access the specified file. Check the access
privileges of the specified file.
Too Many Files Open An attempt was made to open more files than the system allows open
at once.
Unknown Case Type Only 0 for no case conversion, or 1 for case conversion, are
acceptable values.
Unable to Open File Error in opening the specified file. This could be due to resource
limitations of your machine during runtime.
Unable to Read File Error in reading the specified file. This could be due to resource
limitations of your machine during runtime.
Unable to Write File Error in writing the specified file. This could be due to resource
limitations of your machine during runtime.
Vision License Error Invalid license file to generate vision files. Make sure you have the
Vision license file with name rowgen.vlc in the directory where you
have the RowGen executable. A Vision license file can be obtained
from a Micro Focus (formerly Acucorp) representative.
Please indicate the title and page number(s) in the manual you are addressing.
Be sure to provide your name, address (or e-mail address), and telephone number
if you would like a reply from IRI.
E-Mail: rowgen@iri.com
The comments you provide may be used by IRI to improve the quality of, or to make
additions to, this document and/or the RowGen software.