
ETL Team Development Standards

ETL Development Checklist


1. Develop and publish technical specifications based on functional requirements
2. Develop Informatica mappings in development folder, following folder management,
version control, naming, data transformation, and dimension table standards.
3. Design Load Status Table process if necessary.
4. Design incremental load process if necessary.
5. Develop Informatica workflows in development folder, following folder management,
version control, and naming standards.
6. Test Informatica mappings and sessions.
7. Migrate Informatica objects to production.

1. Technical Specification Development Guidelines

Template Location
To develop technical specifications for a project, use the PROJECT NAME Data
Requirements template under Data Warehouse Groups > Deliverable Templates:
https://calshare.berkeley.edu/sites/RAPO/EDW/reporting/Deliverable
%20Templates/Forms/AllItems.aspx 

Publishing
Publish completed specifications and all technical documentation to the CalShare SharePoint
site: ETL Home > Shared Documents.

Peer Review
Review completed designs with teammates.

2. Informatica Mapping Development Guidelines

Folder Management
Determine which folder should contain new mappings. Create a new folder if
necessary, coordinating with the Informatica PowerCenter Server administrator. Maps and
sessions may be developed in personal folders within the info_dev repository, but should be
migrated to the dev copy of the relevant production folder after initial development is
complete and before QA testing begins. Guidelines for folder management follow:
o Source-to-Stage copy mappings should be located in subject area folders (e.g.
HCM9.0 for HR source tables)
o Mappings which load into ENTERPRISE_DIM tables should be located in the
ENTERPRISE folder.

Version Control
For new Informatica mappings, create mapping in info_dev repository using Designer tool.
Version for new maps is 1.0. See Naming Standards below for details.

For revisions to existing Informatica mappings, copy mappings from info_prod repository to
info_dev repository within Designer tool. Increment info_dev instance of mapping
version/name per Naming Standards below.

Do not use shortcuts to develop folders or mappings.

Naming Standards
Within Informatica Designer, maps should be named using the following template:
Area_TargetName_Qualifier_Action_vX_Y

where:

Area is the subject area:
        stg – Staging
        dw – EDW fact/dimension tables

TargetName is the final target table name, all in upper case.

Qualifier is a description of the functionality of the mapping. It only needs to be
added if multiple mappings use the same target table.

Action is one of:
        del – Delete
        ins – Insert
        updt – Update
        extr – Insert and Update
        scd – Slowly Changing Dimensions
        copy – Copy (no transformation logic between sources and targets; mainly
        used for source-to-stage copies, creation of test data, and ad hoc data
        movement)

“v” stands for version.

“X” is the major version number. It is initially set to 1 when a map is first
created and is incremented by one for each subsequent major change to that mapping.
Major changes involve fundamental changes to a map design, e.g. new sources,
transformations and/or targets, or replacing or significantly augmenting existing
functionality. For minor mapping revisions, the major version number remains
constant.

“Y” is the minor version number. It is initially set to 0 when a map is first
created and is incremented by one for each minor change to a given mapping (e.g. a
change to a Filter transformation condition or a change to derived values within an
Expression transformation). When the major version number (“X” above) is
incremented, the minor version number is reset to 0.
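The naming template lends itself to a mechanical check. Below is a minimal Python sketch; the regex and the validate_mapping_name helper are illustrative, not part of the standard. Note that because TargetName may itself contain underscores and upper-case characters, an all-upper-case Qualifier cannot always be distinguished from the target name:

```python
import re

# Pattern for Area_TargetName_[Qualifier_]Action_vX_Y.
# Qualifier is optional: only required when several mappings share a target table.
MAPPING_NAME = re.compile(
    r"^(?P<area>stg|dw)_"                        # Area: stg (staging) or dw (EDW)
    r"(?P<target>[A-Z0-9_]+)_"                   # TargetName: target table, upper case
    r"(?:(?P<qualifier>[A-Za-z0-9]+)_)?"         # optional Qualifier
    r"(?P<action>del|ins|updt|extr|scd|copy)_"   # Action
    r"v(?P<major>\d+)_(?P<minor>\d+)$"           # version vX_Y
)

def validate_mapping_name(name: str) -> dict:
    """Return the parsed parts of a mapping name, or raise ValueError."""
    m = MAPPING_NAME.match(name)
    if m is None:
        raise ValueError(
            f"{name!r} does not follow Area_TargetName_[Qualifier_]Action_vX_Y"
        )
    return m.groupdict()
```

For example, validate_mapping_name("dw_EMPLOYEE_DIM_scd_v1_0") yields area "dw", target "EMPLOYEE_DIM", action "scd", and version 1.0.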

Data Transformation Standards

When values are moved into data warehouse fact tables, they are transformed only when
the consumer requests a transformation: nulls remain nulls.
1. Right-trim spaces from all varchar columns in the ETL mapping except for Desc or Txt
types. Name types should be right-trimmed.
2. If a code field is equal to null or spaces in the source table, change it to a single space in the
fact table. This will correspond to the default value in the associated lookup table (see
Dimension Table Standards below for additional information).
3. Where possible, use the target database sequence generator (e.g. Oracle: “select nextval
from dual”) to generate surrogate values. This simplifies data loads from multiple sources (and
multiple Informatica maps) into dimensional tables, and also simplifies ETL migrations
from development/QA to production environments.
4. High value date: The date ‘12/31/9999’ should be used in DW_LAST_EFFECTIVE_DT to
indicate the maximum date.
5. When the source is PeopleSoft, DW_FEFF_DT should be set equal to the date set in the
PeopleSoft effective date column.
6. When the source system is not PeopleSoft, DW_FEFF_DT should be set to the
date the data was entered into the source system.
7. DW_LEFF_DT of the old current row should be changed from 12/31/9999 to the
DW_FEFF_DT of the new current row minus one day.
8. To determine the value in DW_FIRST_EFFECTIVE_DT
a. First, take the value from the source system.
b. If there is no date in the source system, consult the business owner for the date to
use.
c. If the column is a Data Warehouse field of no interest to the business owner, use the
default date of ‘01/01/1902’ to indicate the earliest possible First Effective Date. The
reason this date is used is to distinguish a date set by the Data Warehouse from
PeopleSoft system data which uses ‘01/01/1900’ and UCB custom data which uses
‘01/01/1901’.
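Several of these rules are simple enough to sketch as plain functions. The helpers below are illustrative only (they are not part of any shared library) and cover rules 1 and 2, with the sentinel dates from rules 4 and 8c recorded as constants:

```python
# Sentinel dates used by the standards above.
HIGH_DATE = "12/31/9999"            # DW_LAST_EFFECTIVE_DT for the current row
DW_DEFAULT_FIRST_DT = "01/01/1902"  # DW-assigned default First Effective Date
# (PeopleSoft system data uses 01/01/1900; UCB custom data uses 01/01/1901.)

def clean_varchar(value: str, column_type: str) -> str:
    """Rule 1: right-trim varchar values, except Desc/Txt columns."""
    if column_type in ("Desc", "Txt"):
        return value
    return value.rstrip(" ")

def clean_code(value) -> str:
    """Rule 2: map null or all-space source codes to a single space,
    matching the default row in the associated lookup table."""
    if value is None or value.strip() == "":
        return " "
    return value
```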

Dimension Table Standards


9. Right-trim spaces from all varchar columns in the ETL mapping except for Desc or Txt
types. Name types should be right-trimmed.
10. When a consuming group uses a set of values that does not match the current set of values in
a dimension table, their values will, if necessary, be added and cross-referenced to the
values already in the table. When a consuming group uses different descriptions for the
same table value, their descriptions will, if necessary, also be kept in the table.
11. All dimensions will have a row whose key is the highest value possible for the key’s data
type (e.g. 99999) and a description of “Unknown”. When a value is moved into the
warehouse that does not match any of the actual dimension values contained in the
dimension, this key will be assigned.
12. For all dimension table columns, if a report requires dashes ‘—‘ to be displayed instead of a
space, this change will be done in the report, not in the underlying view or table.
13. Add a row in all lookup tables (dimensions containing lists of code values and associated 
descriptions) for no value. This will allow an inner join between the lookup table and the 
fact table.  
a. The code value should be a single space – all source system code values of
null or spaces should be converted to a single space before loading into the
lookup table. Special consideration should be given to cases where a source
system contains multiple code values equal to null and/or one or more spaces.
b. Effective Date will be set to ‘01/01/1902’ (note that the default effective date
in PeopleSoft applications is 01/01/1900. Special consideration will need to
be made for those rare cases where there is data effective before 1900).
c. Short Description will be ‘Unknown’. 
d. Long Description will be ‘Unknown’.
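The default rows described in rules 11 and 13 might be assembled as below. This is a sketch only: the column names (EFF_DT, SHORT_DESC, LONG_DESC) and helper names are illustrative, since actual lookup and dimension tables will differ:

```python
def no_value_lookup_row(key_column: str) -> dict:
    """Rule 13: the 'no value' row every lookup table should carry.
    Its single-space code lets fact rows with blank codes still inner-join."""
    return {
        key_column: " ",          # single space, matching converted fact codes
        "EFF_DT": "01/01/1902",   # DW default (PeopleSoft default is 01/01/1900)
        "SHORT_DESC": "Unknown",
        "LONG_DESC": "Unknown",
    }

def unknown_member_row(key_column: str, max_key: int = 99999) -> dict:
    """Rule 11: the 'Unknown' row assigned when an incoming value matches no
    dimension member. 99999 stands in for the key type's highest value."""
    return {
        key_column: max_key,
        "SHORT_DESC": "Unknown",
        "LONG_DESC": "Unknown",
    }
```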

3. Load Status Table


Load start and end times for each session should be captured in a load status table. A status
screen that shows the current load status will be developed to keep consumers informed.

4. Incremental Load Process Design

The table BISSTG.ETL_JOB_TRACKER has been developed to support EDW incremental
load processes. It contains the columns listed in the matrix below:

Column Name    Data Type       Comments
JOB_NAME       VARCHAR2(80)    The name of the Informatica session to be tracked.
                               Should be unique for each tracked session, and
                               generic enough that the table and mapping do not
                               need to be updated if the session name changes;
                               e.g. ‘BR2_jad_atld’ rather than ‘b_BR2_jad_atld_p3’.
START_DATE     DATE            Timestamp indicating the time a given session
                               started, for date-based incremental loads.
END_DATE       DATE            Timestamp indicating the date/time a given session
                               completed.

To support the functionality of this table, the SP_JOB_TRACKER Oracle stored
procedure has been created. This stored procedure takes JOB_NAME and either
“START” or “FINISH” as input (see format below):

SP_JOB_TRACKER (JOB_NAME, [START/FINISH])

If called with “START”, it creates a new ETL_JOB_TRACKER record, setting
END_DATE to NULL. If called with “FINISH”, it updates END_DATE to equal
sysdate in the most recent job tracker record.

The following steps allow incremental loading of data from source tables, based on
the last successful run of a given mapping.
 Incorporate SP_JOB_TRACKER as an unconnected Stored
Procedure transformation object of type “Source Pre-load”. Use the
Informatica Job Name and “START” as inputs.

 Incorporate SP_JOB_TRACKER as an unconnected Stored
Procedure transformation object of type “Target Post-load”. Use the Informatica
Job Name and “FINISH” as inputs.
 Add the following to the Source SQL:
WHERE … [Source table date field of interest] >
    (SELECT MAX(START_DATE)
     FROM ETL_JOB_TRACKER
     WHERE JOB_NAME = '[Job Name]'
       AND END_DATE IS NOT NULL)
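The tracker pattern can be exercised end to end with an in-memory database. In this Python sketch, sqlite3 stands in for Oracle, and sp_job_tracker mirrors the described behavior of the SP_JOB_TRACKER procedure; everything other than the table and column names is illustrative:

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ETL_JOB_TRACKER (
        JOB_NAME   VARCHAR(80),
        START_DATE TIMESTAMP,
        END_DATE   TIMESTAMP
    )""")

def sp_job_tracker(job_name: str, action: str) -> None:
    """Mimic SP_JOB_TRACKER: 'START' inserts a new record with a NULL
    END_DATE; 'FINISH' stamps END_DATE on the most recent record."""
    now = datetime.now().isoformat(sep=" ")
    if action == "START":
        conn.execute("INSERT INTO ETL_JOB_TRACKER VALUES (?, ?, NULL)",
                     (job_name, now))
    elif action == "FINISH":
        conn.execute(
            """UPDATE ETL_JOB_TRACKER SET END_DATE = ?
               WHERE JOB_NAME = ? AND START_DATE =
                     (SELECT MAX(START_DATE) FROM ETL_JOB_TRACKER
                      WHERE JOB_NAME = ?)""",
            (now, job_name, job_name))
    conn.commit()

def last_completed_start(job_name: str):
    """The value a mapping's source filter compares against: the start time
    of the most recent *completed* run (END_DATE set), or None."""
    row = conn.execute(
        """SELECT MAX(START_DATE) FROM ETL_JOB_TRACKER
           WHERE JOB_NAME = ? AND END_DATE IS NOT NULL""",
        (job_name,)).fetchone()
    return row[0]
```

Until a run is marked FINISH, last_completed_start ignores it, so a failed session never advances the incremental window.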

5. Informatica Workflow Development

Folder Management
Session folder should be the same as the folder in which mappings are developed using the
Designer tool.

Version Control
For new Informatica mappings, create sessions in info_dev repository using Server Manager
tool, per Informatica Naming Standards below. Sessions should have the same version
number as their associated mappings.

For revisions to existing Informatica mappings, copy sessions from info_prod repository to
info_dev repository within Server Manager tool. Verify that connection strings work and
that targets are appropriately defined. Maps and associated sessions should share the same 
version number.  See naming standards below for details on how version numbers should be 
maintained for sessions.

Naming Standards
Within Informatica Workflow Manager, sessions should be named using the following
template: s_MappingName_Qualifier

where:

s stands for “session”.

MappingName is the name of the Informatica mapping associated with a given session.

Qualifier is a description of the functionality of the session. It only needs to be
added if the mapping is associated with multiple sessions.

Within Informatica Workflow Manager, Workflows should be named using the following
template: wf_WorkflowName_Frequency

where:

wf stands for “Workflow”.

WorkflowName is a description of the functionality contained within the workflow,
e.g. “HR_ADM_WKFORCE”.

Frequency is how often the workflow runs, e.g. “Monthly”, “Daily”, “Weekly”.
“Daily” can be used for workflows which run Monday through Saturday or Monday
through Friday.
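These two templates can also be checked mechanically. In this Python sketch, the helper names and the set of accepted frequencies are assumptions drawn only from the examples in this section:

```python
import re

# Frequencies mentioned in this section; extend as the team adopts others.
VALID_FREQUENCIES = {"Monthly", "Weekly", "Daily"}

def is_valid_session_name(name: str, mapping_name: str) -> bool:
    """Sessions are s_MappingName, or s_MappingName_Qualifier when the
    mapping is associated with multiple sessions."""
    prefix = "s_" + mapping_name
    if not name.startswith(prefix):
        return False
    rest = name[len(prefix):]
    return rest == "" or re.fullmatch(r"_[A-Za-z0-9]+", rest) is not None

def is_valid_workflow_name(name: str) -> bool:
    """Workflows are wf_WorkflowName_Frequency."""
    m = re.fullmatch(r"wf_([A-Za-z0-9_]+)_([A-Za-z]+)", name)
    return m is not None and m.group(2) in VALID_FREQUENCIES
```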

6. Unit Testing

Using test data, verify that Informatica mappings perform as expected. Check results, and
troubleshoot performance, execution, and data quality issues.

Access privileges may prevent developers from being able to view data contained in
database Views. Two ways to deal with this:
1. Update the security tables (in dev or QA) to allow access by your userid. Only do this if
you really know what you are doing.
2. Apply for security access through SARA (HRMS Dept. Security, Administer Workforce).

7. Migration to Production

Informatica objects
o Copy mappings from info_dev repository to info_prod repository using the Designer
tool.
o Copy sessions from info_dev repository to info_prod repository using the Server
Manager tool. Modify database connections as necessary to point to production
sources and targets. Verify that connection strings work and that sources and targets
are appropriately defined. Maps and associated sessions should share the same 
version number.  
o Assemble sessions into workflows as necessary, per technical specifications and 
functional requirements.
o If possible, test mappings/sessions/workflows in production to ensure functionality is
as expected.

Note to Contractors:  To move mappings and sessions to production, send a request to 
etldoctor@berkeley.edu or to your designated ETL resource contact within the Data 
Warehouse Services team. Provide all pertinent details of the move with your request. Note
that all ETL development contractors are generally granted full access to the development
(info_dev) copies of production folders.

Procedures


Informatica: for all significant changes (those involving changes to database objects or
reports), create a Change Management Request system ticket:
https://remedy.berkeley.edu/CMR

Mainframe 
o Enter objects to be moved into TSO MIGMGR.
o Send email requests to asdhelp@berkeley.edu and ist-as-production@lists.berkeley.edu,
describing what needs to be moved and when (e.g. move members xxx from
EDW.PUB.STAGE.INCLIB to ASD.P.CTM.BIS.AEVARS).

Report Migration requests

o Contact bairpthelp@berkeley.edu. Non-standard: 5-6 PM, 8-9 AM.
o Report users/ESS staff are to review all report changes before general access is
allowed.
o Report developers should also update the report inventory
(https://bearshare.berkeley.edu/sites/RAPO/EDW/reporting/Reports/Shared
%20Documents/Forms/AllItems.aspx ) as part of any new development work. This
report consists of three spreadsheets listing details for BAIRS, BIS and HR reports.

Appendix A: Archiving Procedures


EDW data is backed up to tape on the following schedule:
 Nightly backups, retained for 30 days
 Weekly backups, retained for 90 days
 Monthly backups, retained for 1 year

Backups are not encrypted. Tapes are stored by Iron Mountain.

See ist-fst01.berkeley.edu:\RAPO\EDW\CCS-EDW\Support\Financials\ProductionSupport\AP PO Archive
procedure.doc for details on AP/PO archiving procedures.

CalShare is backed up nightly, with 2-hour snapshots taken during the day – more info is
available at: https://bearshare.berkeley.edu/C4/Implementing%20BearShare/default.aspx.

Appendix B: Environment Objectives

Development
 Shared environment for DBAs, ETL, and Report Developers.
 Not for data validation.
 Not intended for functional users.

QA
 Data validation.
 Performance testing.
 Non-production data can be loaded for test cases and then refreshed with
production data (must be coordinated).

Production
 Ready for general user access.


Update History

Version  By                            Date    Comments
1.0      Max Michel and Cheryl Kojina  6/5/08  First draft – incorporating comments and
                                               documents from Peter Cava, Boshin Lin and
                                               Gaelyn Chappel.
