ETL Development Standards
Template Location
To develop technical specifications for a project, use the PROJECT NAME Data
Requirements template under Data Warehouse Groups > Deliverable Templates:
https://calshare.berkeley.edu/sites/RAPO/EDW/reporting/Deliverable%20Templates/Forms/AllItems.aspx
Publishing
Publish completed specifications and all technical documentation to the CalShare SharePoint site:
ETL Home Shared Documents.
Peer Review
Review completed designs with teammates.
Folder Management
Determine the folder in which to locate new mappings. Create a new folder if
necessary, coordinating with the Informatica PowerCenter Server administrator. Maps and
sessions may be developed in personal folders within the info_dev repository, but should be
migrated to the dev copy of the relevant production folder after initial development is
complete and before QA testing begins. Guidelines for folder management follow:
o Source-to-Stage copy mappings should be located in subject area folders (e.g.
HCM9.0 for HR source tables)
o Mappings that load into ENTERPRISE_DIM tables should be located in the
ENTERPRISE folder.
Version Control
For new Informatica mappings, create the mapping in the info_dev repository using the Designer tool.
The version for new maps is 1.0. See Naming Standards below for details.
ETL Team Development Standards
For revisions to existing Informatica mappings, copy the mappings from the info_prod repository to
the info_dev repository within the Designer tool. Increment the version number in the name of the
info_dev copy of the mapping, per the Naming Standards below.
Naming Standards
Within Informatica Designer, maps should be named using the following template:
Area_TargetName_Qualifier_Action_vX_Y
where:
“X” is the major version number. It is set to 1 when a map is first created and is
incremented by one for each subsequent major change to that map. Major changes are
fundamental changes to a map's design, e.g. adding new sources, transformations, and/or
targets, or replacing or significantly augmenting existing functionality. For minor
revisions, the major version number remains unchanged.
“Y” is the minor version number. It is set to 0 when a map is first created and is
incremented by one for each minor change to that map (e.g. a change to a Filter
transformation condition or to derived values within an Expression transformation).
When the major version number (“X” above) is incremented, the minor version number is
reset to 0.
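The version rules above can be sketched as a small Python helper that builds a new map name and bumps its version suffix. This is a minimal sketch. It assumes the `_vX_Y` suffix always ends the name. The helper names and the sample Area, TargetName, Qualifier, and Action values are hypothetical, because this document does not define those components.

```python
import re

# Hypothetical helper: assumes every map name ends in _vX_Y, as in
# Area_TargetName_Qualifier_Action_vX_Y.
NAME_PATTERN = re.compile(r"^(?P<base>.+)_v(?P<major>\d+)_(?P<minor>\d+)$")


def new_map_name(area: str, target: str, qualifier: str, action: str) -> str:
    """Name for a newly created map; new maps start at version 1.0."""
    return f"{area}_{target}_{qualifier}_{action}_v1_0"


def bump_version(map_name: str, major_change: bool) -> str:
    """Return the map name with its version suffix incremented.

    A major change increments X and resets Y to 0; a minor change
    increments Y and leaves X unchanged.
    """
    match = NAME_PATTERN.match(map_name)
    if match is None:
        raise ValueError(f"map name lacks a _vX_Y suffix: {map_name!r}")
    major, minor = int(match["major"]), int(match["minor"])
    if major_change:
        major, minor = major + 1, 0
    else:
        minor += 1
    return f"{match['base']}_v{major}_{minor}"
```

For example, `bump_version("HR_JOB_DAILY_LOAD_v1_3", major_change=True)` returns `"HR_JOB_DAILY_LOAD_v2_0"`. The minor number is reset as the standard requires.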
When values are moved into data warehouse fact tables, they are transformed only when the
consumer requests a transformation (apart from the standard rules listed below); nulls remain nulls.
1. Right-trim spaces from all varchar columns in the ETL mapping except Desc and Txt
columns. Name columns should also be right-trimmed.
2. If a code field is null or contains only spaces in the source table, set it to a single space in the
fact table. This corresponds to the default value in the associated lookup table (see
Dimension Table Standards below for additional information).
3. Where possible, use the target database's sequence generator (e.g. in Oracle,
SELECT <sequence_name>.NEXTVAL FROM dual) to generate surrogate key values. This simplifies
data loads from multiple sources (and multiple Informatica maps) into dimension tables, and also
simplifies ETL migrations from development/QA to production environments.
4. High value date: The date ‘12/31/9999’ should be used in DW_LAST_EFFECTIVE_DT to
indicate the maximum date.
5. When the source system is PeopleSoft, set DW_FEFF_DT equal to the date in the
PeopleSoft effective date column.
6. When the source system is not PeopleSoft, set DW_FEFF_DT to the date the data was
entered into the source system.
7. When a new current row is added, change the DW_LEFF_DT of the old current row from
12/31/9999 to the DW_FEFF_DT of the new current row minus one day.
8. To determine the value in DW_FIRST_EFFECTIVE_DT:
a. First, take the value from the source system.
b. If there is no date in the source system, consult the business owner for the date to
use.
c. If the column is a Data Warehouse field of no interest to the business owner, use the
default date of ‘01/01/1902’ to indicate the earliest possible First Effective Date. This
date distinguishes a date set by the Data Warehouse from PeopleSoft system data, which
uses ‘01/01/1900’, and from UCB custom data, which uses ‘01/01/1901’.
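Rules 1 and 2 can be sketched as a per-value cleansing function. This is a minimal sketch, not the mapping logic itself. It assumes columns can be recognized by name suffix. The suffixes `_DESC`, `_TXT`, and `_CD` and the function name are hypothetical conventions that this document does not define.

```python
from typing import Optional

# Hypothetical naming conventions used to classify columns.
NO_TRIM_SUFFIXES = ("_DESC", "_TXT")
CODE_SUFFIX = "_CD"


def clean_varchar(column: str, value: Optional[str]) -> Optional[str]:
    """Apply rules 1 and 2 to one varchar value bound for a fact table."""
    name = column.upper()
    if name.endswith(CODE_SUFFIX):
        # Rule 2: null or all-space codes become a single space, matching
        # the default row in the associated lookup table.
        if value is None or value.strip(" ") == "":
            return " "
        # Codes are varchar columns too, so rule 1 trimming applies.
        return value.rstrip(" ")
    if value is None:
        # Nulls in non-code columns remain nulls.
        return None
    if name.endswith(NO_TRIM_SUFFIXES):
        # Rule 1 exception: Desc and Txt columns are left untrimmed.
        return value
    # Rule 1: right-trim all other varchar columns, including names.
    return value.rstrip(" ")
```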
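The effective-dating rules (4, 7, and 8) can be sketched in Python for a single business key. This is an illustration under stated assumptions. It assumes DW_FEFF_DT and DW_LEFF_DT are the short forms of the first and last effective date columns. It also assumes the new row's DW_FEFF_DT has already been set from the source per rules 5 and 6. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Rule 4: high-value date marking the open-ended (current) row.
HIGH_DATE = date(9999, 12, 31)
# Rule 8c: default first effective date assigned by the Data Warehouse.
DW_DEFAULT_FIRST_DT = date(1902, 1, 1)


def first_effective_date(source_dt: Optional[date],
                         owner_dt: Optional[date] = None) -> date:
    """Rule 8: use the source date, else the business owner's date,
    else the Data Warehouse default of 01/01/1902."""
    if source_dt is not None:
        return source_dt
    if owner_dt is not None:
        return owner_dt
    return DW_DEFAULT_FIRST_DT


def add_current_row(history: List[Dict], new_row: Dict) -> None:
    """Rules 4 and 7: close the old current row, then append the new one.

    Assumption: `history` holds the rows for one business key, and
    new_row["DW_FEFF_DT"] was already set per rules 5-6.
    """
    for row in history:
        if row["DW_LEFF_DT"] == HIGH_DATE:
            # Rule 7: the old current row ends the day before the new one starts.
            row["DW_LEFF_DT"] = new_row["DW_FEFF_DT"] - timedelta(days=1)
    new_row["DW_LEFF_DT"] = HIGH_DATE
    history.append(new_row)
```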
The following steps allow incremental loading of data from source tables, regardless of
when the last successful run of a given mapping occurred.
Incorporate SP_JOB_TRACKER_START as an unconnected Stored
Procedure transformation object of type “Source Pre-load”. Use the
Informatica Job Name and “START” as inputs.
Incorporate SP_JOB_TRACKER_FINISH as an unconnected Stored
Procedure transformation object of type “Target Post-load”. Use the Informatica Job Name
and “FINISH” as inputs.
Add the following to the Source SQL:
WHERE … [Source table date field of interest] >
    (SELECT MAX(START_DATE)
     FROM ETL_JOB_TRACKER
     WHERE JOB_NAME = '[Job Name]'
     AND FINISH_DATE IS NOT NULL)
Folder Management
Sessions should be located in the same folder as the mappings they run, i.e. the folder in
which those mappings were developed using the Designer tool.
Version Control
For new Informatica mappings, create sessions in the info_dev repository using the Server Manager
tool, per the Informatica Naming Standards below. Sessions should have the same version
number as their associated mappings.
For revisions to existing Informatica mappings, copy sessions from the info_prod repository to
the info_dev repository within the Server Manager tool. Verify that connection strings work and
that targets are appropriately defined. Maps and associated sessions should share the same
version number. See the Naming Standards below for details on how version numbers should be
maintained for sessions.
Naming Standards
Within Informatica Workflow Manager, sessions should be named using the following
template: s_MappingName_Qualifier
where:
Within Informatica Workflow Manager, workflows should be named using the following
template: wf_WorkflowName_Frequency
where:
WorkflowName is a description of the functionality contained within the workflow, e.g.
“HR_ADM_WKFORCE”
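The session and workflow templates can be sketched as Python helpers, together with a check that a session name embeds its mapping's name and therefore its version. This is a minimal sketch. The helper names are hypothetical. The check assumes the Qualifier contains no underscores. Sample values such as "FULL" and "DAILY" are illustrative guesses, because this document does not list allowed Qualifier or Frequency values.

```python
import re

# Hypothetical check: assumes the Qualifier contains no underscores and the
# mapping name ends in _vX_Y.
SESSION_RE = re.compile(r"^s_(?P<mapping>.+_v\d+_\d+)_(?P<qualifier>[^_]+)$")


def session_name(mapping_name: str, qualifier: str) -> str:
    """Build a name following the s_MappingName_Qualifier template."""
    return f"s_{mapping_name}_{qualifier}"


def workflow_name(workflow: str, frequency: str) -> str:
    """Build a name following the wf_WorkflowName_Frequency template."""
    return f"wf_{workflow}_{frequency}"


def session_matches_mapping(session: str, mapping: str) -> bool:
    """True if the session name embeds exactly this mapping name (and version)."""
    match = SESSION_RE.match(session)
    return match is not None and match["mapping"] == mapping
```

Because the session name embeds the full mapping name, including `_vX_Y`, the session automatically shares its mapping's version number.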
6. Unit Testing
Using test data, verify that Informatica mappings perform as expected. Check results, and
troubleshoot any performance, execution, and data quality issues.
Access privileges may prevent developers from viewing data contained in database views.
There are two ways to deal with this:
1. Update the security tables (in dev or QA) to allow access by your user ID. Only do this if
you are certain of what you are doing.
2. Apply for security access through SARA (HRMS Dept. Security, Administer Workforce).
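Part of checking results can be automated by testing loaded target rows against the transformation standards. This is a minimal sketch. It assumes the rows have been fetched into Python dictionaries, and the function name, column names, and sample data are hypothetical. It flags null code fields, trailing spaces in right-trimmed columns, and business keys with more than one current (12/31/9999) row.

```python
from collections import Counter
from datetime import date
from typing import Dict, List

HIGH_DATE = date(9999, 12, 31)


def check_rows(rows: List[Dict], key_col: str,
               code_cols: List[str], trimmed_cols: List[str]) -> List[str]:
    """Return a list of rule violations found in loaded target rows."""
    problems = []
    current = Counter()
    for i, row in enumerate(rows):
        for col in code_cols:
            # Code fields should never be null; the default is a single space.
            if row.get(col) is None:
                problems.append(f"row {i}: {col} is null")
        for col in trimmed_cols:
            val = row.get(col)
            # Right-trimmed columns should have no trailing spaces, except
            # the single-space code default.
            if isinstance(val, str) and val != " " and val != val.rstrip(" "):
                problems.append(f"row {i}: {col} has trailing spaces")
        if row.get("DW_LEFF_DT") == HIGH_DATE:
            current[row[key_col]] += 1
    for key, count in current.items():
        # Each business key should have at most one current row.
        if count > 1:
            problems.append(f"key {key}: {count} current rows")
    return problems
```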
7. Migration to Production
Informatica objects
o Copy mappings from the info_dev repository to the info_prod repository using the Designer
tool.
o Copy sessions from the info_dev repository to the info_prod repository using the Server
Manager tool. Modify database connections as necessary to point to production
sources and targets. Verify that connection strings work and that sources and targets
are appropriately defined. Maps and associated sessions should share the same
version number.
o Assemble sessions into workflows as necessary, per technical specifications and
functional requirements.
o If possible, test mappings/sessions/workflows in production to ensure functionality is
as expected.
Note to Contractors: To move mappings and sessions to production, send a request to
etldoctor@berkeley.edu or to your designated ETL resource contact within the Data
Warehouse Services team. Provide all pertinent details of the move with your request. Note
that ETL development contractors are generally granted full access to the development
(info_dev) copies of production folders.
Procedures
Informatica: for all significant changes (those involving changes to database objects or
reports), create a Change Management Request (CMR) system ticket:
https://remedy.berkeley.edu/CMR
Mainframe
o Enter objects to be moved into TSO MIGMGR.
o Send email requests to asdhelp@berkeley.edu and istasproduction@lists.berkeley.edu,
describing what needs to be moved and when (e.g. move members xxx from
EDW.PUB.STAGE.INCLIB to ASD.P.CTM.BIS.AEVARS).
See istfst01.berkeley.edu:\RAPO\EDW\CCSEDW\Support\Financials\ProductionSupport\AP PO Archive procedure.doc for details on AP/PO archiving procedures.
CalShare is backed up nightly, and 2-hour snapshots are taken during the day. More information is available at:
https://bearshare.berkeley.edu/C4/Implementing%20BearShare/default.aspx.
Update History
Version  By                             Date     Comments
1.0      Max Michel and Cheryl Kojina   6/5/08   First draft, incorporating comments and documents
                                                 from Peter Cava, Boshin Lin and Gaelyn Chappel.