Micro Design - ABC Framework Ver 1.3
For: ABC Framework Project
Owner: IBM
Creation Date: 20/05/12
Last Updated: 18/09/12
Version: V1.3
Authors: IBM
Approval List
Serial# Name Position Sign-off Status Sign-off Date

Distribution List
Serial# Name Email ID Position
1 INTRODUCTION
2 PROGRAM SPECIFICATION
2.1 Tables Used for ETL Framework and ABC Verification
2.2 Informatica ETL Objects Created for ABC Checks
3 UNIX SHELL SCRIPTS
3.1 Autosys Dependency Jobs
4 UTILITIES
1 Introduction
The purpose of this document is to provide detailed information regarding the ETL process control framework and the audit-balance-check (ABC) functionality it supports.
2 Program Specification
As part of the process flow, the ETL framework will perform audit, balance, and control checks for all source systems, irrespective of whether the source is a flat file or a relational database.
Data loading will be audited depending on the type of interface between the source system and the landing area:
For systems where files are provided on the ETL server, audit-balance-checks will be done using shell scripts.
For systems where files are pulled via SFTP from the source system server, audit-balance-checks will be done using shell scripts.
For systems where data is pulled from the source system database using Informatica mappings, audit-balance-checks will be done using Informatica worklets.
The ABC framework will maintain the job stream and job execution history, together with the results of the ABC checks performed, in the set of metadata and transaction tables listed below.
SYSTEM_CONFIG_METADATA – This table is used to maintain the metadata related to the various folders and FTP location details for each source system. Data from this table is utilized during job execution to set Informatica parameters for source and target files, log files, bad files, etc.
GLOBAL_CONFIG – This table is used to maintain the metadata related to success/failure email IDs/group IDs and other parameters, such as the waiting period for which a dependent process will search for parent job completion.
JOB_STREAM_METADATA – This table is used to hold the metadata for job streams. Appropriate values need to be populated in this table for each job stream during code deployment.
OBJECT_DATA_METADATA – This table is used to contain the metadata related to each job and the table it loads. It will hold metadata such as the source object name, target object name, source object pattern, acquisition type, acquisition frequency, and the generic ABC checks applicable to each job name.
Generic ABC checks are COUNT, CHECKSUM, CNT_ZERO, CHKSUM_ZERO, CNT_CSM_SAME, RANGE and the business date check (see the BATCH_JOB_AUDIT result columns below).
ABC_CHECK_METADATA – This table is used to keep the metadata for each custom ABC check that would be performed against a job. These checks will be used primarily during the load from Staging to SOR and from SOR to Mart.
JOB_PARAM_DATA – This table is used to keep the data corresponding to all parameters and their values for all workflows/sessions. It contains data organized by section, e.g. "Global" and object-wise sections.
BATCH_JOB_STATUS – This table is used to keep the audit data for the execution of each job defined in the OBJECT_DATA_METADATA table.
BATCH_JOB_AUDIT – This table is used to keep the audit data for the results of the generic as well as custom ABC checks defined against each job in the OBJECT_DATA_METADATA table. The field ABC_CHECK_ID has been added to identify which ABC rule each record corresponds to.
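A minimal sketch of how an operator might inspect these tables, assuming they live in an Oracle schema reachable via sqlplus; the connect string and schema are placeholders, while the column names come from the data model descriptions below:

sqlplus -s edw_user/"$ORA_PASS"@EDWDB <<'EOF'
-- Run status of all jobs for one business processing date
SELECT job_stream_id, business_processing_date, job_run_status
FROM   batch_job_status
WHERE  business_processing_date = TO_DATE('15-MAY-12', 'DD-MON-YY');
EOF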
Control Framework
DDL.zip
Data Model
JOB_STREAM_METADATA:
Column Name Description
RUN_FREQUENCY Will hold values like 'D' for daily, 'W' for weekly, 'M' for monthly, etc.
BUSINESS_DATE Will hold the date to which the data belongs, in 'dd-MMM-yy' format.
JOB_STREAM:
Column Name Description
EDW_AREA Will hold values like 'STG' for staging, 'SOR' for enterprise data warehouse, 'MART' for the mart area.
BUSINESS_PROCESSING_DATE Will hold the date to which the data belongs, in 'dd-MMM-yy' format.
OBJECT_DATA_METADATA:
Column Name Description
SYSTEM_NAME Name of the source system, e.g. VP, PRIMA, LAS, LAG, etc.
SOURCE_OBJECT_NAME Name of the source entity (can be a table or flat file) for the job name.
TARGET_OBJECT_NAME Name of the target entity (can be a table or flat file) for the job name.
JOB_NAME Name of the job; for Informatica jobs it is the workflow name.
INPUT_FILE_PATTERN This field contains the pattern used by the SFTP mechanism to find which files need to be transferred from the remote server.
TARGET_OBJECT_TYPE Holds the object type of the target entity (FILE / TABLE).
CHKSUM_WHERECLAUSE This field will hold the WHERE clause applied to the checksum expression, e.g. WHERE RECORD_ID='2' (as done for HTSM in the VP system).
RANGE_WHERECLAUSE This field will hold the WHERE clause applied to the range expression.
HDR_CHECKSUM_DIV This field will hold the value of the checksum division if required for a file type (the default value is 1; for some file types like HTSM, ATH2 and ATH4 it is 100).
BATCH_JOB_STATUS:
Column Name Description
JOB_STREAM_ID Flows from the JOB_STREAM table based upon which job stream is being executed.
BUSINESS_PROCESSING_DATE Will hold the date for which the job stream/job is being run.
JOB_RUN_STATUS Will hold the status of the job; contains values like 'SUCCESS', 'FAIL', 'RUN', 'SKIP' depending upon job execution.
BATCH_JOB_AUDIT:
Column Name Description
JOB_STREAM_ID Flows from the JOB_STREAM table based upon which job stream is being executed.
OBJECT_ID The object ID associated with every job name being executed.
SOURCE_RECORD_EXTRACTED Actual number of source records extracted during the job run.
TARGET_RECORD_UPDATED Actual number of records updated in the target table during the job run.
RUN_DATE Contains the date for which the job is being run.
PREV_CNT Actual count value from the previous period's run for a job.
PREV_CHKSUM Actual checksum value from the previous period's run for a job.
CURR_CNT Actual count value of the current run for the same job.
CURR_CHKSUM Actual checksum value of the current run for the same job.
RANGE_MIN_VAL Holds the minimum value of the range for the job being run.
RANGE_MAX_VAL Holds the maximum value of the range for the job being run.
COUNT_RESULT Will hold 'SUCCESS' or 'FAIL' depending upon the status of the ABC 'COUNT' check for the job.
CHECKSUM_RESULT Will hold 'SUCCESS' or 'FAIL' depending upon the status of the ABC 'CHECKSUM' check for the job.
CNT_ZERO_RESULT Will hold 'SUCCESS' or 'FAIL' depending upon the status of the ABC 'CNT_ZERO' check for the job.
CHKSUM_ZERO_RESULT Will hold 'SUCCESS' or 'FAIL' depending upon the status of the ABC 'CHKSUM_ZERO' check for the job.
CNT_CSM_SAME_RESULT Will hold 'SUCCESS' or 'FAIL' depending upon the status of the ABC 'CNT_CSM_SAME' check for the job.
RANGE_RESULT Will hold 'SUCCESS' or 'FAIL' depending upon the status of the corresponding ABC 'RANGE' check for the job.
HEADER_CHKSUM Holds the checksum value shown in the header line of the flat file source.
HEADER_CNT Holds the record count shown in the header line of the flat file source.
HEADER_BUSINESS_DATE Holds the business date shown in the header line of the flat file source.
BUSINESS_DATE_CHK_RESULT Will hold 'SUCCESS' or 'FAIL' depending upon the status of the ABC business date check for the job.
ABC_CHECK_METADATA:
Column Name Description
MINIMUM_THRESHOLD Will hold the minimum threshold value for the target column used in the range expression.
MAXIMUM_THRESHOLD Will hold the maximum threshold value for the target column used in the range expression.
SOURCE_COUNT_EXPR Will hold the expression for the actual count value on the source side of the target table, e.g. COUNT(DISTINCT ACCT).
SOURCE_CHECKSUM_EXPR Will hold the expression for the actual checksum value on the source side of the target table, e.g. SUM(LOGO).
TARGET_COUNT_EXPR Will hold the expression for the actual count value on the target side of the target table, e.g. COUNT(DISTINCT ACCT).
TARGET_CHECKSUM_EXPR Will hold the expression for the actual checksum value on the target side of the target table, e.g. SUM(LOGO).
TARGET_RANGE_EXPR Will hold the expression for the target column whose range is to be determined, e.g. COUNT(DISTINCT ACCT).
OBJECT_ID Will hold the object ID for which the ABC check is being evaluated, e.g. 14 for ATH1.
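To illustrate the logic only (the actual custom checks run inside an Informatica mapping, described later), a hypothetical shell sketch of evaluating one ABC_CHECK_METADATA row; nzsql usage, table names and connection details are assumptions:

# Evaluate the source and target count expressions for one check row
SRC_CNT=$(nzsql -d EDW -A -t -c "SELECT COUNT(DISTINCT ACCT) FROM STG_VP_ATH1")   # SOURCE_COUNT_EXPR
TGT_CNT=$(nzsql -d EDW -A -t -c "SELECT COUNT(DISTINCT ACCT) FROM SOR_VP_ATH1")   # TARGET_COUNT_EXPR
# The COUNT_RESULT written to BATCH_JOB_AUDIT would then be:
if [ "$SRC_CNT" -eq "$TGT_CNT" ]; then COUNT_RESULT=SUCCESS; else COUNT_RESULT=FAIL; fi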
SYSTEM_CONFIG_METADATA:
Column Name Description
INBOUND_DIR Name of the directory where the source files are to be kept.
REJECT_DIR Directory for data files that are rejected after file validation.
ARCHIVE_DIR Directory for archiving data files that pass file validation, before the Informatica job runs for that file.
BAD_FILE_DIR Directory for files that are rejected during the Informatica load.
SESS_LOG_DIR Directory for the log files created during Informatica execution.
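As an illustration only, a per-system directory layout could be created as below; the base path follows the parameter file location used later in this document, and the directory names themselves are assumptions:

# Create the SYSTEM_CONFIG_METADATA directories for one source system (VP)
for d in inbound reject archive bad_files sess_logs; do
    mkdir -p "/edw/infa_shared/VP/$d"
done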
GLOBAL_CONFIG:
Column Name Description
FILE_WATCHER_WAIT_INTERVAL Will hold the waiting period (in seconds) for which an Autosys dependent job will search for the file indicating its parent job's completion.
JOB_PARAM_DATA:
Column Name Description
PARAM_SECTION Will hold the "SECTION" value for that job name, like "GLOBAL" or "[ORGNAME_STG_VP.WF:wf_stg_VP_ATH1.ST:s_m_stg_VP_ATH1]".
PARAM_VALUE Will hold the actual value of the parameter name to be used during workflow (job name) execution, like STG_VP_ATH1_JSID (to be dynamically replaced with an actual value like 1, 1001, etc.) for $$JOB_STRM_ID, or "s_m_stg_VP_ATH1.log" for $PMSessionLogFile.
(a) Mapping Details
Specification Type Specification
Description It compares values like the count and checksum against a column defined in the control file (generated from the header record of the source file) with the actual record count and checksum values for that specific column of the target table (once loaded from the source file/table).
Input Tables OBJECT_DATA_METADATA, actual STAGING table, BATCH_JOB_AUDIT
Sequence of this mapping Run once after each target table load from file.
Parameter/Variables $$TABLENAME → Name of the stage table
$$CHECKSUM_EXPR → Used to hold checksum expressions like SUM(ABS(col_name)) or SUM(col_name), etc.
$$ABCRESULT → Will be used later for generic ABC checks
$$JOB_NAME → Job name for which the ABC checks are being performed
$$JS_ID → Job stream ID for the job
$$JOB_RUN_ID → Current job run ID
$$SRCROW → Number of source records extracted; fetched from the Informatica metadata variable during the execution of the mapping
$$TGTINSROW → Number of target records inserted; fetched from the Informatica metadata variable during the execution of the mapping
$$BUS_DT_CTL_FILE → Value of the business date from the source control file
$$CNT_CTL_FILE → Value of the record count from the control file
$$CHKSUM_CTL_FILE → Value of the checksum column from the control file
$$WHERECLAUSE → For files like HTSM of the VP system, where $$WHERECLAUSE is needed to get the CHKSUM value
$$RANGE_EXPR → Used to hold the range expression
$$RANGE_WHERECLAUSE → Used to hold the WHERE clause for the range expression
$$CHECKSUM_DIV → Used to hold the CHECKSUM_DIV value by which the value provided in the header file is divided (as in the case of ATH2, ATH4, HTSM)
NB – A shortcut to this mapping would be created in each of the folders specific to each source system.
One worklet (mentioned below) needs to be created in each folder specific to a source system, and this worklet is to be added after each target entity load session of the workflow.
Session specific to source systems (wherever a flat file is the source, e.g. VP):
Specification Type Specification
Informatica server used to execute the mapping IS_ORGNAME_EDW_DEV
Session name s_m_READ_VP_CONTROL_FILE
Mapping name m_READ_VP_CONTROL_FILE
Session Type non-reusable
Session Log File <sessionname>.log
Parameter File /edw/infa_shared/ParamFiles/<system_name>_<Target_entity_Name>.prm
Source NA
Record Treatment Target Insert
Post_session_success_variable_assignments Assigns all the variables from the control file related to the job_name to the next session
Parameter File Entry for each workflow for the ABC Check Session:
Sample entry for the ABC check related to ATH2:
[ORGNAME_STG_VP.WF:wf_stg_VP_ATH2.WT:wklt_Generic_ABC_Check_STG]
$PMSessionLogFile=s_m_Generic_ABC_Check.log
$DBConnection_Src=CONN_EDW_NZ
$DBConnection_ABC=Oracle_repo
$$TABLENAME=STG_VP_ATH2
$$CHECKSUM_EXPR=SUM(ABS(MT_AMOUNT))
$$JS_ID=STG_VP_ATH2_JSID
$$JOB_RUN_ID=STG_VP_ATH2_JOBID
$$JOB_NAME=STG_VP_ATH2_JOBNM
$$CTRL_FILE=ATH2_20120515.ctl
$$COUNT_VALUE=0
$$CHECKSUM_VALUE=0
$$CHECKSUM_EXPR=VAR_CHKSUMEXPR
$$WHERECLAUSE=VAR_WHERECLAUSE
$$RANGE_EXPR=VAR_RANGEEXPR
$$RANGE_WHERECLAUSE=VAR_RANGEWHERECLAUSE
$$CHECKSUM_DIV=VAR_CHECKSUMDIV
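Once the parameter file is in place, the workflow can be started with pmcmd. A usage sketch, in which the domain name, user, and parameter file name are placeholders (the integration service, folder, and workflow names follow the conventions above):

pmcmd startworkflow -sv IS_ORGNAME_EDW_DEV -d Domain_EDW -u infa_user -p "$INFA_PASS" \
    -f ORGNAME_STG_VP -paramfile /edw/infa_shared/ParamFiles/VP_ATH2.prm \
    -wait wf_stg_VP_ATH2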
(b) Mapping Details
Specification Type Specification
Description It compares values like the count, checksum, and range defined by the source-side columns "SOURCE_COUNT_EXPR" and "SOURCE_CHECKSUM_EXPR" with the actual record counts and checksum values computed from the target-side columns "TARGET_COUNT_EXPR" and "TARGET_CHECKSUM_EXPR" for that specific column of the target table.
Input Tables ABC_CHECK_METADATA, actual SOR/MART table, BATCH_JOB_AUDIT
Sequence of this mapping Run once after each workflow (containing multiple target tables) load from file.
Session Details:
Specification Type Specification
Informatica server used to execute the mapping IS_ORGNAME_EDW_DEV or IS_ORGNAME_EDW_UAT
Session name s_m_Custom_ABC_Check
Mapping name m_Custom_ABC_Check
Session Type non-reusable
Session Log File <sessionname>.log
Parameter File /edw/infa_shared/ParamFiles/<system_name>_<Target_entity_Name>.prm
Source NA
Record Treatment Target Data driven
Pre_session_variable_assignment NA
NB – One worklet is to be defined for this "s_m_Custom_ABC_Check" session in each SOR/MART folder, based upon subject area. This worklet is to be attached after each SOR or MART workflow so that all the ABC checks (defined in ABC_CHECK_METADATA) are executed after each workflow (job) runs.
3 Unix Shell Scripts
Scripts.zip Backup_14Sep.zip
Predecessor job NA
Input Arguments SQL file, RPT file, variable number of arguments
SQL Files Used (if any) NA
Configuration File (if any) NA

Predecessor job NA
Input Arguments NA
SQL Files Used (if any) NA
Configuration File (if any) NA

Predecessor job NA
Input Arguments Input_Filename
SQL Files Used (if any) NA
Configuration File (if any) NA
3.1 Autosys Dependency Jobs:
Autosys cannot create a dependency between jobs in two separate flows. To overcome this constraint, we have the following strategy:
OBJECT_DEPENDENCY_METADATA:
Column Name Data Type
Parent_Job_Object_Id Integer
Child_Job_Object_id Integer
File_Watcher_Ind Char(1)
Process Description:
1. Create a touch ("file watcher") file [standard: <object_id>_BUSDATE.suc, e.g. 1_20120807.suc]. The touch file is created in the existing shell script <Job_afterrun.sh>.
2. Create a file watcher job to perform the following steps (see the sketch after this list):
1. Read the OBJECT_DEPENDENCY_METADATA table for the job to check whether a dependency exists for that object; if a dependency exists, check for the respective file watcher file.
2. If the file watcher file for the parent job is found in step 1, allow the child job to execute.
3. If the file watcher file for the parent job is not found in step 1, wait 30 seconds (configurable via the GLOBAL_CONFIG column FILE_WATCHER_WAIT_INTERVAL) and go back to step 1 to re-check.
3. Create a shell script to remove the file watcher file related to the parent job.
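A minimal sketch of the file watcher logic above, assuming a hypothetical helper get_parent_ids that reads OBJECT_DEPENDENCY_METADATA (e.g. via sqlplus) and prints one parent object ID per line; the .suc directory and script name are placeholders:

#!/bin/sh
# file_watcher.sh <child_object_id> <business_date>
OBJECT_ID=$1
BUSDATE=$2
WAIT_INTERVAL=${FILE_WATCHER_WAIT_INTERVAL:-30}   # sourced from GLOBAL_CONFIG in practice
SUC_DIR=/edw/infa_shared/filewatcher              # assumed location of the .suc touch files

for PARENT_ID in $(get_parent_ids "$OBJECT_ID"); do    # hypothetical metadata lookup
    SUC_FILE="$SUC_DIR/${PARENT_ID}_${BUSDATE}.suc"
    while [ ! -f "$SUC_FILE" ]; do                     # parent job not finished yet
        sleep "$WAIT_INTERVAL"                         # wait, then re-check
    done
done
exit 0   # all parent touch files found; the child job may run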
4 Utilities:
1. Bulkwriter – An awk script specifically written to change the writer for a target table from "Netezza Relational writer" to "Netezza Bulk writer". This utility is applied to the XML file (an exported Informatica workflow) and generates a changed XML file. The changed XML can then be imported in Informatica Workflow Manager and will contain the properties set to "Netezza Bulk writer". A usage sketch is given below.
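A hedged usage sketch; the awk script and workflow file names are placeholders:

# Rewrite the writer type in an exported workflow XML
awk -f bulkwriter.awk wf_stg_VP_ATH2.XML > wf_stg_VP_ATH2_bulk.XML
# Then import wf_stg_VP_ATH2_bulk.XML via Workflow Manager (or pmrep objectimport)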
2. Session Config – An awk script specifically written to change the following session properties. This utility is applied to the XML file (an exported Informatica workflow) and generates a changed XML file. The changed XML can be imported in Informatica Workflow Manager and will contain the updated session properties.
3. Parameter File Conversion – An awk script specifically written for uploading the "conventional parameter file" to the target table JOB_PARAM_DATA. This utility is applied to the text file (containing parameter name-value pairs) and generates a '~'-separated output file.
4. Parameter File Load – The converted file is loaded with the following SQL*Loader control file (see the invocation sketch below):
load data
infile './vp_template.prm.out'
append
into table tmp_job_param_data
fields terminated by '~' optionally enclosed by '"'
(filename, param_section, wf_name, param_name, param_value)
vp_template.prm.out is a '~'-separated flat file which is used in this second step.
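A hedged invocation sketch for the load above; the connect string and control file name are placeholders:

sqlldr userid=edw_user/"$ORA_PASS"@EDWDB \
       control=job_param_data.ctl \
       log=job_param_data.log bad=job_param_data.bad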
5. Repository Backup – This shell script is used to take a backup of the Informatica repository.
Usage:
sh repository_backup.sh
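The script itself is not included in this document; a minimal sketch of what repository_backup.sh might contain, using pmrep, with the repository name, domain, credentials, and backup directory as placeholders:

#!/bin/sh
# repository_backup.sh - back up the Informatica repository with pmrep
BACKUP_DIR=/edw/infa_shared/Backup
STAMP=$(date +%d%b)                                    # e.g. 14Sep, as in Backup_14Sep.zip
pmrep connect -r EDW_REPO -d Domain_EDW -n repo_admin -x "$REPO_PASS" || exit 1
pmrep backup -o "$BACKUP_DIR/Backup_${STAMP}.rep" -f   # -f overwrites an existing file
zip -j "$BACKUP_DIR/Backup_${STAMP}.zip" "$BACKUP_DIR/Backup_${STAMP}.rep"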