
Informatica

The following pages summarize some of the questions that typically arise during development and suggest potential resolutions.
Q: How does source format affect performance? (i.e., is it more efficient to source from a flat file rather than a database?)
In general, a flat file that is located on the server machine loads faster than a database located on the server machine. Fixed-width files are faster than delimited files because delimited files require extra parsing. However, if there is an intent to perform intricate
transformations before loading to target, it may be advisable to first load the flat file into a relational database, which allows the Power
Center mappings to access the data in an optimized fashion by using filters and custom SQL SELECTs where appropriate.
Q: What are some considerations when designing the mapping? (i.e. what is the impact of having multiple targets populated
by a single map?)
With Power Center, it is possible to design a mapping with multiple targets. You can then load the targets in a specific order
using Target Load Ordering. The recommendation is to limit the amount of complex logic in a mapping. Not only is it easier to debug a
mapping with a limited number of objects, but smaller mappings can also be run concurrently and make use of more system resources. When using
multiple output files (targets) consider writing to multiple disks or file systems simultaneously. This minimizes disk seeks and applies to
a session writing to multiple targets, and to multiple sessions running simultaneously.
Q: What are some considerations for determining how many objects and transformations to include in a single mapping?
There are several items to consider when building a mapping. The business requirement is always the first consideration,
regardless of the number of objects it takes to fulfill the requirement. The most expensive use of the DTM is passing unnecessary data
through the mapping. It is best to use filters as early as possible in the mapping to remove rows of data that are not needed. This is the
SQL equivalent of the WHERE clause. Using the filter condition in the Source Qualifier to filter out the rows at the database level is a
good way to increase the performance of the mapping.
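For illustration only (the table and column names here are hypothetical), a filter applied in the Source Qualifier is pushed into the WHERE clause of the generated SELECT, so unwanted rows never leave the database:

    -- Default query generated by the Source Qualifier (no filter):
    SELECT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.STATUS
    FROM CUSTOMERS

    -- With a source filter of CUSTOMERS.STATUS = 'ACTIVE', the server
    -- appends the condition to the WHERE clause of the same query:
    SELECT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.STATUS
    FROM CUSTOMERS
    WHERE CUSTOMERS.STATUS = 'ACTIVE'

Rows removed this way never pass through the DTM, which is why a Source Qualifier filter is generally cheaper than an equivalent Filter transformation placed later in the mapping.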
Log File Organization
Q: Where is the best place to maintain Session Logs?
One often-recommended location is the default "SessLogs" folder in the Informatica directory, keeping all log files in the same
directory.
Q: What documentation is available for the error codes that appear within the error log files?
Log file errors and descriptions appear in Appendix C of the Power Center Troubleshooting Guide. Error information also
appears in the Power Center Help File within the Power Center client applications. For other database-specific errors, consult your
Database User Guide.
Scheduling Techniques
Q: What are the benefits of using workflows with multiple tasks rather than a workflow with a stand-alone session?
Using a workflow to group logical sessions minimizes the number of objects that must be managed to successfully load the
warehouse. For example, a hundred individual sessions can be logically grouped into twenty workflows. The Operations group can then
work with twenty workflows to load the warehouse, which simplifies the operations tasks associated with loading the targets.
Workflows can be created to run sequentially or concurrently or have tasks in different paths doing either.
A sequential workflow runs sessions and tasks one at a time, in a linear sequence. Sequential workflows help ensure that
dependencies are met as needed. For example, a sequential workflow ensures that session1 runs before session2 when session2 is
dependent on the load of session1, and so on. It's also possible to set up conditions to run the next session only if the previous session
was successful, or to stop on errors, etc.
A concurrent workflow groups logical sessions and tasks together, like a sequential workflow, but runs all the tasks at one
time. This can reduce the load times into the warehouse, taking advantage of hardware platforms' Symmetric Multi-Processing (SMP)
architecture.
Other workflow options, such as nesting worklets within workflows, can further reduce the complexity of loading the warehouse. Moreover, this capability allows for the creation of very complex and flexible workflow streams without the use of a third-party scheduler.
Q: Assuming a workflow failure, does Power Center allow restart from the point of failure?
No. When a workflow fails, you can choose to start a workflow from a particular task but not from the point of failure. It is
possible, however, to create tasks and flows based on error handling assumptions.
Q: What guidelines exist regarding the execution of multiple concurrent sessions / workflows within or across applications?
Workflow Execution needs to be planned around two main constraints:
Available system resources
Memory and processors
The number of sessions that can run at one time depends on the number of processors available on the server. The load
manager is always running as a process. As a general rule, a session will be compute-bound, meaning its throughput is limited by the
availability of CPU cycles. Most sessions are transformation intensive, so the DTM always runs. Also, some sessions require more I/O,
so they use less processor time. Generally, a session needs about 120 percent of a processor for the DTM, reader, and writer in total.
For concurrent sessions:
One session per processor is about right; you can run more, but that requires a "trial and error" approach to determine what
number of sessions starts to affect session performance and possibly adversely affect other executing tasks on the server.
The sessions should run at "off-peak" hours to have as many resources available as possible. Even after available processors
are determined, it is necessary to look at overall system resource usage. Determining memory usage is more difficult than the
processor calculation; it tends to vary according to system load and the number of Informatica sessions running.
The first step is to estimate memory usage, accounting for:
Operating system kernel and miscellaneous processes
Database engine
Informatica Load Manager
The DTM process creates threads to initialize the session, read, write and transform data, and handle pre- and post-session
operations.
More memory is allocated for lookups, aggregates, ranks, sorters and heterogeneous joins in addition to the shared memory
segment.
At this point, you should have a good idea of what is left for concurrent sessions. It is important to arrange the production run
to maximize use of this memory. Remember to account for sessions with large memory requirements; you may be able to run only one
large session, or several small sessions concurrently.
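As a purely illustrative budget (the figures are assumptions for the example, not Informatica guidance): on a server with 4 GB of RAM, reserving roughly 512 MB for the operating system kernel and miscellaneous processes, 1 GB for the database engine, and 64 MB for the Load Manager leaves about 2.4 GB. If a typical session is configured with a 12 MB shared memory segment plus a few hundred megabytes of lookup and aggregate caches, that budget supports only a handful of cache-heavy sessions at once, or a larger number of lightweight ones. Working through this arithmetic for your own environment determines how many concurrent sessions the production run can safely schedule.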
Load Order Dependencies are also an important consideration because they often create additional constraints. For example,
load the dimensions first, then facts. Also, some sources may only be available at specific times, some network links may become
saturated if overloaded, and some target tables may need to be available to end users earlier than others.
Q: Is it possible to perform two "levels" of event notification, one at the application level and one at the Informatica Server level, to notify the Server Administrator?
The application level of event notification can be accomplished through post-session e-mail. Post-session e-mail allows you to
create two different messages, one to be sent upon successful completion of the session, the other to be sent if the session fails.
Messages can be a simple notification of session completion or failure, or a more complex notification containing specifics about the
session.
You can use the following variables in the text of your post-session e-mail:
E-mail Variable Description
%s Session name
%l Total records loaded
%r Total records rejected
%e Session status
%t Table details, including read throughput in bytes/second and write throughput in rows/second
%b Session start time
%c Session completion time
%i Session elapsed time (session completion time-session start time)
%g Attaches the session log to the message
%m Name and version of the mapping used in the session
%d Name of the folder containing the session
%n Name of the repository containing the session
%a<filename> Attaches the named file. The file must be local to the Informatica Server.
On Windows NT, you can attach a file of any type.
On UNIX, you can only attach text files. If you attach a non-text file, the send might fail.
Note: The filename cannot include the Greater Than character (>) or a line break.
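As an illustration, a message template along the following lines (whether the default message is assembled exactly this way is an assumption) would produce output like the sample shown further below, with the variables replaced at run time:

    Session complete.
    Session name: %s
    Total Rows Loaded = %l
    Total Rows Rejected = %r
    %t
    Start Time: %b
    Completion Time: %c
    Elapsed time: %i (h:m:s)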
The PowerCenter Server on UNIX uses rmail to send post-session e-mail. The repository user who starts the PowerCenter server must
have the rmail tool installed in the path in order to send e-mail.
To verify the rmail tool is accessible:
1. Login to the UNIX system as the PowerCenter user who starts the PowerCenter Server.
2. Type rmail at the prompt and press Enter.
3. Type '.' to indicate the end of the message and press Enter.
4. You should receive a blank e-mail from the PowerCenter user's e-mail account. If not, locate the directory where rmail resides and
add that directory to the path.
5. When you have verified that rmail is installed correctly, you are ready to send post-session e-mail.
The output should look like the following:
Session complete.
Session name: sInstrTest
Total Rows Loaded = 1
Total Rows Rejected = 0
Rows Loaded   Rows Rejected   Read Throughput (bytes/sec)   Write Throughput (rows/sec)   Table Name   Status
1             0               30                            1                             t_Q3_sales   Completed
No errors encountered.
Start Time: Tue Sep 14 12:26:31 1999
Completion Time: Tue Sep 14 12:26:41 1999

Elapsed time: 0:00:10 (h:m:s)


This information, or a subset, can also be sent to any text pager that accepts e-mail.
Backup Strategy Recommendation
Q: Can individual objects within a repository be restored from the back-up or from a prior version?
At the present time, individual objects cannot be restored from a back-up using the Power Center Repository Manager (i.e.,
you can only restore the entire repository). But, it is possible to restore the back up repository into a different database and then
manually copy the individual objects back into the main repository.
Another option is to export individual objects to XML files. This allows for the granular re-importation of individual objects, mappings,
tasks, workflows, etc.
Server Administration
Q: What built-in functions does Power Center provide to notify someone in the event that the server goes down or some other significant event occurs?
The Repository Server can be used to send messages notifying users that the server will be shut down. Additionally, the
Repository Server can be used to send notification messages about repository objects that are created, modified or deleted by another
user. Notification messages are received through the Informatica Client tools.
Q: What system resources should be monitored? What should be considered normal or acceptable server performance
levels?
The pmprocs utility, which is available for UNIX systems only, shows the currently executing PowerCenter processes.
Pmprocs is a script that combines the ps and ipcs commands. It is available through Informatica Technical Support. The utility
provides the following information:
- CPID - Creator PID (process ID)
- LPID - Last PID that accessed the resource
- Semaphores - used to sync the reader and writer
- 0 or 1 - shows slot in LM shared memory
(See Chapter 16 in the PowerCenter Repository Guide for additional details.)
A variety of UNIX and Windows NT commands and utilities are also available. Consult your UNIX and/or Windows NT documentation.
Q: What cleanup (if any) should be performed after a UNIX server crash? Or after an Oracle instance crash?
If the UNIX server crashes, you should first check to see if the repository database is able to come back up successfully. If this
is the case, then you should try to start the Power Center server. Use the pmserver.err log to check if the server has started correctly.
You can also use ps -ef | grep pmserver to see if the server process (the Load Manager) is running.
Metadata
Q: What recommendations or considerations exist as to naming standards or repository administration for metadata that might be extracted from the Power Center repository and used in others?
With Power Center, you can enter description information for all repository objects (sources, targets, transformations, etc.), but the amount of metadata that you enter should be determined by the business requirements. You can also drill down to the column level and give descriptions of the columns in a table if necessary. All information about column size and scale, data types, and primary keys is stored in the repository.
The decision on how much metadata to create is often driven by project timelines. While it may be beneficial for a developer to
enter detailed descriptions of each column, expression, variable, etc., it is also very time consuming to do so. Therefore, this decision
should be made on the basis of how much metadata will be required by the systems that use the metadata.
There are some time saving tools that are available to better manage a metadata strategy and content, such as third party metadata
software and, for sources and targets, data modeling tools.
Q: What procedures exist for extracting metadata from the repository?
Informatica offers an extremely rich suite of metadata-driven tools for data warehousing applications. All of these tools store,
retrieve, and manage their metadata in Informatica's central repository. The motivation behind the original Metadata Exchange (MX)
architecture was to provide an effective and easy-to-use interface to the repository. Today, Informatica and several key Business
Intelligence (BI) vendors, including Brio, Business Objects, Cognos, and MicroStrategy, are effectively using the MX views to report
and query the Informatica metadata.
Informatica strongly discourages accessing the repository directly, even for SELECT access, because some releases of Power Center change the structure of the repository tables, leaving you with a maintenance task. Instead, views have been created to provide access to the metadata stored in the repository.
Additional products, such as Informatica's Metadata Reporter and PowerAnalyzer, allow for more robust reporting against the repository database and are able to present reports to the end user and/or management.
1. While importing a relational source definition from a database, what source metadata is imported?
Source name
Database location
Column names
Data types
Key constraints
2. In how many ways can you update a relational source definition, and what are they?
Two ways
1. Edit the definition

2. Re-import the definition


3. Where should you place the flat file in order to import the flat file definition into the Designer?
Place it in a local folder.
4. To provide support for mainframe source data, which files are used as source definitions?
COBOL files
5. Which transformation do you need when using COBOL sources as source definitions?
The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of de-normalized data.
6. How can you create or import a flat file definition into the Warehouse Designer?
You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer and then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica server runs the session, it creates and loads the flat file.
7. What is a Mapplet?
A Mapplet is a set of transformations that you build in the Mapplet Designer and can reuse in multiple mappings.
8. What is a transformation?
It is a repository object that generates, modifies or passes data.
9. Define: Mapping?
It is a set of source and target definitions linked by transformation objects that define the rules for transformation.
10. What are the Designer tools for creating transformations?
Mapping designer
Transformation developer
Mapplet designer
11. What are active and passive transformations?
An active transformation can change the number of rows that pass through it.
A passive transformation does not change the number of rows that pass through it.
12. What are connected and unconnected transformations?
An unconnected transformation is not connected to other transformations in the mapping.
Connected transformation is connected to other transformations in the mapping.
13. In how many ways can you create ports?
Two ways
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.
14. What are reusable transformations?
Reusable transformations can be used in multiple mappings. When you need to incorporate such a transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes. Since an instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect those changes. This feature can save you a great deal of work.
15. What are the methods for creating reusable transformations?
Two methods:
1. Design it in the Transformation Developer.
2. Promote a standard transformation from the Mapping Designer. After you add a transformation to a mapping, you can promote it to the status of a reusable transformation. Once you promote a standard transformation to reusable status, you cannot demote it back to a standard transformation. If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking the Revert button.
16. What are the unsupported repository objects for a Mapplet?
COBOL source definition
Joiner transformations
Normalizer transformations
Non-reusable Sequence Generator transformations
Pre- or post-session stored procedures
Target definitions
PowerMart 3.5-style LOOKUP functions
XML source definitions
IBM MQ source definitions
17. What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or Mapplet, and then define the value of the parameter in a parameter file for the session. Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
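As a sketch (the parameter and column names are invented for illustration), you might declare a mapping parameter $$StartDate in the mapping, reference it in a filter condition or Source Qualifier source filter such as:

    ORDER_DATE >= $$StartDate

and then supply its value in the parameter file for the session, so the same mapping can be run for different date ranges without being edited. The exact quoting depends on where the parameter is referenced and on the data type.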
18. Can you use the mapping parameters or variables created in one mapping in another mapping?
No. You can use mapping parameters or variables only in the transformations of the same mapping or Mapplet in which you created them.
19. Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes, because a reusable transformation is not contained within any Mapplet or mapping.
20. How can you improve session performance in an Aggregator transformation?
Use sorted input.
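One common way to supply sorted input (table and column names are illustrative) is to sort the data on the group-by ports before it reaches the Aggregator, for example with an ORDER BY on those columns in the Source Qualifier query:

    SELECT ORDERS.CUST_ID, ORDERS.AMOUNT
    FROM ORDERS
    ORDER BY ORDERS.CUST_ID

With sorted input, the Aggregator can close out each group as soon as the group key changes instead of caching every group until the end of the data.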
21. What is the aggregate cache in an Aggregator transformation?
The aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
22. What are the differences between the Joiner transformation and the Source Qualifier transformation?
You can join heterogeneous data sources in a Joiner transformation, which you cannot do in a Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need matching keys to join two sources in a Joiner transformation.
In a Source Qualifier, the two relational sources must come from the same data source; with a Joiner you can also join relational sources that come from different data sources.
23. In which conditions can we not use the Joiner transformation (limitations of the Joiner transformation)?
Both pipelines begin with the same original data source.
Both input pipelines originate from the same Source Qualifier transformation.
Both input pipelines originate from the same Normalizer transformation.
Both input pipelines originate from the same Joiner transformation.
Either input pipeline contains an Update Strategy transformation.
Either input pipeline contains a connected or unconnected Sequence Generator transformation.
24. What are the settings that you use to configure the Joiner transformation?
Master and detail source
Type of join
Condition of the join
25. What are the join types in the Joiner transformation?
Normal (Default)
Master outer
Detail outer
Full outer
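As a rough SQL analogy (the Joiner designates one input as the master and the other as the detail, and the table names below are illustrative): a normal join behaves like an inner join; a master outer join keeps all detail rows and only the matching master rows; a detail outer join keeps all master rows and only the matching detail rows; a full outer join keeps all rows from both sources.

    -- Approximate SQL equivalent of a "master outer" join
    SELECT D.ORDER_ID, M.CUST_NAME
    FROM DETAIL_SRC D
    LEFT OUTER JOIN MASTER_SRC M ON D.CUST_ID = M.CUST_ID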
26. What are the Joiner caches?
When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source and
builds index and data caches based on the master rows.
After building the caches, the Joiner transformation reads records from the detail source and performs joins.
27. What is the Lookup transformation?
Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. The Informatica server queries the lookup table based on the lookup ports in the transformation. It compares the Lookup transformation port values to lookup table column values based on the lookup condition.
28. Why use the Lookup transformation?
Use it to perform the following tasks:
Get a related value. For example, your source table includes an employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but
not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the
target.
29. What are the types of lookups?
Connected and unconnected
30. Differences between connected and unconnected lookup?
Connected lookup:
- Receives input values directly from the pipeline.
- You can use a dynamic or static cache.
- The cache includes all lookup columns used in the mapping.
- Supports user-defined default values.
Unconnected lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- You can use a static cache only.
- The cache includes only the lookup output ports in the lookup condition and the lookup/return port.
- Does not support user-defined default values.
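For illustration (the lookup and port names are hypothetical), an unconnected lookup is called from an expression in another transformation using the :LKP prefix; the lookup evaluates its condition with the value passed in and hands the single return port back to the calling expression:

    -- Expression port in an Expression transformation
    IIF(ISNULL(:LKP.lkp_employee_name(EMP_ID)), 'UNKNOWN', :LKP.lkp_employee_name(EMP_ID))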

31. What is meant by lookup caches?
The Informatica server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
32. What are the types of lookup caches?
Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica server processes a Lookup transformation configured to use the cache.
Recache from database: if the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
Static cache: you can configure a static, or read-only, cache for any lookup table. By default, the Informatica server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the Lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows into the cache and the target, you can configure the Lookup transformation to use a dynamic cache. The Informatica server dynamically inserts data into the target table.
Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
33. Difference between static cache and dynamic cache?
Static cache:
- You cannot insert or update the cache.
- The Informatica server returns a value from the lookup table or cache when the condition is true. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.
Dynamic cache:
- You can insert rows into the cache as you pass them to the target.
- The Informatica server inserts rows into the cache when the condition is false, which indicates that the row is not in the cache or the target table; you can pass these rows to the target table.
34. Which transformation should we use to normalize COBOL and relational sources?
The Normalizer transformation. When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.
35. How does the Informatica server sort string values in the Rank transformation?
When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.
36. What are the rank caches?
During the session, the Informatica server compares an input row with rows in the data cache. If the input row out-ranks a
stored row, the Informatica server replaces the stored row with the input row. The Informatica server stores group information in an
index cache and row data in a data cache.
37. What is the Rank index in Rank transformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the Rank
Index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top five salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
38. What is the Router transformation?
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test
data. However, a Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. A Router
transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the
conditions to a default output group. If you need to test the same input data based on multiple conditions, use a Router Transformation
in a mapping instead of creating multiple Filter transformations to perform the same task.
39. What are the types of groups in the Router transformation?
Input group and Output group.
The designer copies property information from the input ports of the input group to create a set of output ports for each output
group.
Two types of output groups
User defined groups
Default group
You cannot modify or delete the default group.

40. Why do we use the Stored Procedure transformation?
For populating and maintaining databases.
42. What are the types of data that pass between the Informatica server and a stored procedure?
Three types: input/output parameters, return values, and status codes.
43. What is the status code?
The status code provides error handling for the Informatica server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is used only by the Informatica server to determine whether to continue running the session or to stop.
44. What is the Source Qualifier transformation?
When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the records that the Informatica server reads when it runs a session.
45. What are the tasks that the Source Qualifier performs?
Join data originating from the same source database.
Filter records when the Informatica server reads source data.
Specify an outer join rather than the default inner join.
Specify sorted records.
Select only distinct values from the source.
Create a custom query to issue a special SELECT statement for the Informatica server to read source data (see the example below).
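As an example of the custom-query capability (the table and column names are invented for illustration), the default SELECT generated by a Source Qualifier joining two related sources can be overridden with a query that also filters, removes duplicates, and sorts at the database:

    SELECT DISTINCT ORD.ORDER_ID, ORD.ORDER_DATE, CUST.CUST_NAME
    FROM ORDERS ORD, CUSTOMERS CUST
    WHERE ORD.CUST_ID = CUST.CUST_ID
      AND ORD.STATUS = 'SHIPPED'
    ORDER BY ORD.ORDER_DATE

Pushing this work into the database keeps the unwanted rows out of the mapping entirely, in line with the earlier advice to filter as early as possible.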
46. What is the target load order?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
47. What is the default join that the Source Qualifier provides?
An inner equi-join.
48. What are the basic requirements to join two sources in a Source Qualifier?
The two sources should have a primary key-foreign key relationship.
The two sources should have matching data types.
49. What is update strategy transformation?
This transformation is used to maintain history data, or just the most recent changes, in the target table.
50. Describe the two levels at which the update strategy can be set.
Within a session. When you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for
example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database
operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.
51. What is the default source option for the Update Strategy transformation?
Data Driven
52. What is Data Driven?
The Informatica server follows the instructions coded into the Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the Data Driven option, the Informatica server ignores all Update Strategy transformations in the mapping.
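For illustration (the port name is hypothetical), a Data Driven session relies on an update strategy expression inside the mapping that flags each row with one of the DD_ constants:

    IIF(ISNULL(EXISTING_CUST_KEY), DD_INSERT, DD_UPDATE)

Rows flagged DD_INSERT are inserted and rows flagged DD_UPDATE are updated; DD_DELETE and DD_REJECT can be returned in the same way for the other operations.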
53. What are the options in the target session properties for the update strategy?
Insert
Delete
Update as update
Update as insert
Update else insert
Truncate table
54. What are the types of mapping wizards provided in Informatica?
The Designer provides two mapping wizards to help you create mappings quickly and easily. Both wizards are designed to
create mappings for loading and maintaining star schemas, a series of dimensions related to a central fact table.
Getting Started Wizard. Creates mappings to load static fact and dimension tables, as well as slowly growing dimension tables.
Slowly Changing Dimensions Wizard. Creates mappings to load slowly changing dimension tables based on the amount of historical
dimension data you want to keep and the method you choose to handle historical dimension data.
55. What are the types of mappings in the Getting Started Wizard?
Simple Pass through mapping:
Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from your table
before loading new data.
Slowly Growing target:
Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when existing data does not require updates.
56. What are the mappings that we use for slowly changing dimension tables?
Type1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1
Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions
of dimensions in the table.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the
target table by versioning the primary key and creating a version number for each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of
dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new
dimensions to the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing
dimension, the Informatica Server saves existing data in different columns of the same row and replaces the existing data with the
updates.
57. What are the different types of Type 2 dimension mappings?
Type 2 Dimension/Version Data mapping: in this mapping, an updated dimension from the source is inserted into the target along with a new version number, and a newly added dimension from the source is inserted into the target with a new primary key.
Type 2 Dimension/Flag Current mapping: this mapping is also used for slowly changing dimensions. In addition, it creates a flag value for changed or new dimensions. The flag indicates whether the dimension is new or newly updated; the most recent dimensions are saved with a current flag value of 1, and superseded versions are saved with a flag value of 0.
Type 2 Dimension/Effective Date Range mapping: this is another flavour of the Type 2 mapping used for slowly changing dimensions. It also inserts both new and changed dimensions into the target, and changes are tracked by an effective date range for each version of each dimension.
58. How can you recognize whether or not newly added rows in the source get inserted into the target?
In the Type 2 mapping we have three options to recognize the newly added rows:
Version number
Flag value
Effective date Range
59. What are the two types of processes that Informatica uses to run a session?
Load manager Process: Starts the session, creates the DTM process, and sends post-session email when the session completes.
The DTM process: Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session
operations.
60. What are the new features of the Server Manager in Informatica 5.0?
You can use command-line arguments for a session or batch. This allows you to change the values of session parameters, mapping parameters, and mapping variables.
Parallel data processing: this feature is available for PowerCenter only. If you use the Informatica server on an SMP system, you can use multiple CPUs to process a session concurrently.
Process session data using threads: the Informatica server runs the session in two processes, as explained in the previous question.
61. Can you generate reports in Informatica?
Yes. By using the Metadata Reporter we can generate reports in Informatica.
62. What is the Metadata Reporter?
It is a web-based application that enables you to run reports against repository metadata.
With the Metadata Reporter, you can access information about your repository without having knowledge of SQL, the transformation language, or the underlying tables in the repository.
63. Define mapping and session.
Mapping: a set of source and target definitions linked by transformation objects that define the rules for data transformation.
Session: a set of instructions that describe how and when to move data from sources to targets.
64. Which tool do you use to create and manage sessions and batches and to monitor and stop the Informatica server?
The Informatica Server Manager.
65. Why do we use session partitioning in Informatica?
Partitioning improves session performance by reducing the time needed to read the source and load the data into the target.
66. To achieve session partitioning, what are the necessary tasks you have to do?
Configure the session to partition source data.
Install the Informatica server on a machine with multiple CPUs.

67. How does the Informatica server increase session performance through partitioning the source?
For relational sources, the Informatica server creates multiple connections, one for each partition of a single source, and extracts a separate range of data through each connection. The Informatica server reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica server creates multiple connections to the target and loads partitions of data concurrently.
For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, the Informatica server creates a separate file for each partition of a source file. You can choose to merge the targets.
68. Why do you use repository connectivity?
When you edit or schedule a session, the Informatica server communicates directly with the repository each time to check whether or not the session and users are valid. All the metadata for sessions and mappings is stored in the repository.
69. What are the tasks that the Load Manager process performs?
Manages session and batch scheduling: when you start the Informatica server, the Load Manager launches and queries the repository for a list of sessions configured to run on that server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process.
Locking and reading the session: when the Informatica server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is already running.
Reading the parameter file: if the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: when the session starts, the Load Manager checks whether or not the user has the privileges to run the session.
Creating log files: the Load Manager creates a log file that contains the status of the session.
70. What is the DTM process?
After the Load Manager performs validations for the session, it creates the DTM process. The DTM process creates and manages the threads that carry out the session tasks. It creates the master thread, and the master thread creates and manages all the other threads.

71. What are the different threads in the DTM process?
Master thread: creates and manages all other threads.
Mapping thread: one mapping thread is created for each session; it fetches session and mapping information.
Pre- and post-session threads: created to perform pre- and post-session operations.
Reader thread: one thread is created for each partition of a source; it reads data from the source.
Writer thread: created to load data to the target.
Transformation thread: created to transform data.
72. What are the data movement modes in Informatica?
The data movement mode determines how the Informatica server handles character data. You choose the data movement mode in the Informatica server configuration settings. Two data movement modes are available in Informatica:
ASCII mode
Unicode mode
73. What are the output files that the Informatica server creates while running a session?
Informatica server log: the Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: the Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for the reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: this file contains load statistics for each target in the mapping. Session details include information such as the table name and the number of rows written or rejected. You can view this file by double-clicking on the session in the monitor window.
Performance detail file: this file contains session performance details that help you determine where performance can be improved. To generate this file, select the Performance Detail option in the session property sheet.
Reject file: this file contains the rows of data that the writer does not write to targets.
Control file: the Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.
Post-session e-mail: post-session e-mail allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.
Indicator file: if you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
Output file: if the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: when the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
74. In which circumstances does the Informatica server create reject files?
When it encounters DD_REJECT in an Update Strategy transformation.
When a row violates a database constraint.
When a field in a row is truncated or overflows.
75. What is polling?
Polling displays updated information about the session in the monitor window. The monitor window displays the status of each session when you poll the Informatica server.
76. Can you copy a session to a different folder or repository?
Yes. Using the Copy Session wizard, you can copy a session into a different folder or repository. However, the target folder or repository must contain the mapping used by that session. If the target folder or repository does not have the mapping, you must copy the mapping first and then copy the session.
77. What is a batch? Describe the types of batches.
A grouping of sessions is known as a batch. Batches are of two types:
Sequential: runs sessions one after the other.
Concurrent: runs sessions at the same time.
If you have sessions with source-target dependencies, use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use concurrent batches, which run all the sessions at the same time.
78. Can you copy batches?
No.
79. How many sessions can you create in a batch?
Any number of sessions.
80. When does the Informatica server mark a batch as failed?
If one of the sessions is configured to "run if previous completes" and that previous session fails.
81. What command is used to run a batch?
pmcmd is used to start a batch.
82. What are the different options used to configure sequential batches?
Two options:
Run the session only if the previous session completes successfully.
Always run the session.
83. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the "always runs the session" option.
84. Can you start a batch within a batch?
You cannot. If you want to start a batch that resides within a batch, create a new independent batch and copy the necessary sessions into the new batch.
85. Can you start a session inside a batch individually?
We can start an individual session only in the case of a sequential batch; in the case of a concurrent batch we cannot do this.
86. How can you stop a batch?
By using the Server Manager or pmcmd.
87. What are the session parameters?
Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as database connections or source files.
The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters:
Database connections
Source file name: use this parameter when you want to change the name or location of a session source file between session runs.
Target file name: use this parameter when you want to change the name or location of a session target file between session runs.
Reject file name: use this parameter when you want to change the name or location of session reject files between session runs.
88. What is a parameter file?
A parameter file defines the values for the parameters and variables used in a session. A parameter file is created with a text editor such as WordPad or Notepad.
You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
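A minimal parameter file might look like the following sketch (the exact header format can vary between PowerCenter versions, and the folder, session, and parameter names are invented for this example):

    [ProductionFolder.s_load_customers]
    $DBConnectionSource=ORA_SRC
    $InputFile1=/data/incoming/customers.dat
    $$StartDate=01/01/2004

Session parameters such as $DBConnectionSource or $InputFile1 and mapping parameters such as $$StartDate are listed under the heading of the session they apply to.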
89. How can you access a remote source in your session?
Relational source: to access a relational source located in a remote place, you need to configure a database connection to the data source.
File source: to access a remote source file, you must configure an FTP connection to the host machine before you create the session.
Heterogeneous: when your mapping contains more than one source type, the Server Manager creates a heterogeneous session that displays source options for all types.
90. What is the difference between partitioning of relational targets and partitioning of file targets?
If you partition a session with a relational target, the Informatica server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica server creates one target file for each partition; you can configure session properties to merge these target files.

91. What are the transformations that restrict the partitioning of sessions?
Advanced External Procedure transformation and External Procedure transformation: these transformations contain a check box on the Properties tab to allow partitioning.
Aggregator transformation: if you use sorted ports, you cannot partition the associated source.
Joiner transformation: you cannot partition the master source for a Joiner transformation.
Normalizer transformation.
XML targets.
92. Performance tuning in Informatica?
The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server. You can increase session performance as follows:
The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so minimize them.
Flat files: if your flat files are stored on a machine other than the Informatica server, move those files to the machine on which the Informatica server runs.
Relational data sources: minimize the connections to sources, targets, and the Informatica server to improve session performance. Moving the target database onto the server system may improve session performance.
Staging areas: if you use staging areas, you force the Informatica server to perform multiple data passes. Removing staging areas may improve session performance.
You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.
Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
You can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, configure the database connections in the Server Manager.
If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes (see the sketch after this list).
Running parallel sessions by using concurrent batches also reduces the time needed to load the data, so concurrent batches may increase session performance.
Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
If your session contains a Filter transformation, create the filter transformation as close to the sources as possible, or use a filter condition in the Source Qualifier.
Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.
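As a sketch of the constraint-and-index advice above (the index and column names are illustrative, and the exact syntax depends on your database), the statements can be issued before and after the session, for example by the DBA or through pre- and post-load steps:

    -- Before the load
    DROP INDEX IDX_T_Q3_SALES_CUST;

    -- After the load completes
    CREATE INDEX IDX_T_Q3_SALES_CUST ON T_Q3_SALES (CUST_ID);

Rebuilding the index once after the bulk load is usually far cheaper than maintaining it row by row during the load.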
93. What is the difference between a Mapplet and a reusable transformation?
A Mapplet consists of a set of transformations and is reusable as a unit; a reusable transformation is a single transformation that can be reused.
Variables or parameters created in a Mapplet cannot be used in another mapping or Mapplet, whereas variables created in a reusable transformation can be used in any other mapping or Mapplet.
We cannot include source definitions in reusable transformations, but we can add sources to a Mapplet.
The entire transformation logic is hidden in the case of a Mapplet, but it is transparent in the case of a reusable transformation.
We cannot use the COBOL Source Qualifier, Joiner, or Normalizer transformations in a Mapplet, whereas we can make them reusable transformations.
94. Define Informatica repository?
The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and
Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when
you want the Informatica Server to perform the transformations, and connect strings for sources and targets.
The repository also stores administrative information such as usernames and passwords, permissions and privileges, and
product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica server and client tools use.
95. What are the types of metadata stored in the repository?
The following types of metadata are stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Shortcuts
Source definitions
Target definitions
Transformations
96. What is the PowerCenter repository?
The PowerCenter repository allows you to share metadata across repositories to create a data mart domain. In a data mart
domain, you can create a single global repository to store metadata used across an enterprise, and a number of local repositories to
share the global metadata as needed.
97. How can you work with a remote database in Informatica? Do you work directly using remote connections?
To work with a remote data source, you need to connect to it with a remote connection. However, it is not preferable to work with a remote source directly by using remote connections; instead, bring that source onto the local machine where the Informatica server resides. If you work directly with a remote source, session performance decreases because only a limited amount of data can be passed across the network in a given time.
98. What are the new features in Informatica 5.0?
You can debug your mapping in the Mapping Designer.
You can view the workspace over the entire screen.
The Designer displays a new icon for invalid mappings in the Navigator window.
You can use a dynamic lookup cache in a Lookup transformation.
You can create mapping parameters or mapping variables in a mapping or Mapplet to make mappings more flexible.
You can export objects from the repository and import objects into the repository. When you export a repository object, the Designer or Server Manager creates an XML file to describe the repository metadata.
The Designer allows you to use the Router transformation to test data for multiple conditions. The Router transformation allows you to route groups of data to a transformation or target.
You can use XML data as a source or target.
Server Enhancements:
You can use the command-line program pmcmd to specify a parameter file when running sessions or batches. This allows you to change the values of session parameters, and mapping parameters and variables, at runtime.
If you run the Informatica Server on a symmetric multi-processing system, you can use multiple CPUs to process a session concurrently. You configure partitions in the session properties based on source qualifiers. The Informatica Server reads, transforms, and writes partitions of data in parallel for a single session. This is available for PowerCenter only.
The Informatica server creates two processes, the Load Manager process and the DTM process, to run sessions.
Metadata Reporter: a web-based application used to run reports against repository metadata.
You can copy sessions across folders and repositories using the Copy Session wizard in the Informatica Server Manager.
With new e-mail variables, you can configure post-session e-mail to include information such as the mapping used during the session.
99. What is incremental aggregation?
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the
source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This
allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the
same calculations each time you run the session.
100. What are the scheduling options to run a session?
You can schedule a session to run at a given time or interval, or you can run the session manually. The different scheduling options are:
Run only on demand: the Informatica server runs the session only when the user starts the session explicitly.
Run once: the Informatica server runs the session only once, at a specified date and time.
Run every: the Informatica server runs the session at regular intervals, as configured.
Customized repeat: the Informatica server runs the session on the dates and times specified in the Repeat dialog box.
101. What is the tracing level and what are the types of tracing levels?
The tracing level determines the amount of information that the Informatica server writes to the session log file. The types of tracing levels are:
Terse
Normal
Verbose initialization
Verbose data
102. What is the difference between the Stored Procedure transformation and the External Procedure transformation?
In the case of the Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into your mapping.
In the case of the External Procedure transformation, the procedure or function is executed outside the data source; you need to build it as a DLL to access it in your mapping, and no database connection is required.
103. Explain about Recovering sessions?
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of
failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of
the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
1. Run the session again if the Informatica Server has not issued a commit.
2. Truncate the target tables and run the session again if the session is not recoverable.
3. Consider performing recovery if the Informatica Server has issued at least one commit.
104. If a session fails after loading 10,000 records into the target, how can you load the records starting from the 10,001st record when you run the session the next time?
As explained above, the Informatica server has three methods for recovering sessions. Use Perform Recovery to load the records from the point where the session failed.
105. Explain Perform Recovery.
When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of
the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next
row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the Informatica
Server bypasses the rows up to 10,000 and starts loading with row 10,001.
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable it before you run a session so that the
Informatica Server can create and write entries in the OPB_SRVR_RECOVERY table.

106. How do you recover a standalone session?
A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a
menu command or pmcmd. These options are not available for batched sessions.
To recover sessions using the menu:
1. In the Server Manager, highlight the session you want to recover.
2. Select Server Requests-Stop from the menu.
3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from the menu.
To recover sessions using pmcmd:
1.From the command line, stop the session.
2. From the command line, start recovery.
107. How can you recover sessions in sequential batches?
If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The
Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property.

To recover sessions in sequential batches configured to stop on failure:


1. In the Server Manager, open the session property sheet.
2. On the Log Files tab, select Perform Recovery, and click OK.
3. Run the session.
4. After the batch completes, open the session property sheet.
5. Clear Perform Recovery, and click OK.
If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts to recover the previous
session.
If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the
failed session as a standalone session.
108. How to recover sessions in concurrent batches?
If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a
session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone
session.
To recover a session in a concurrent batch:
1.Copy the failed session using Operations-Copy Session.
2. Drag the copied session outside the batch to be a standalone session.
3. Follow the steps to recover a standalone session.
4. Delete the standalone copy.
109. How can you complete unrecoverable sessions?
Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session
from the beginning. Run the session from the beginning when the Informatica Server cannot run recovery or when running recovery
might result in inconsistent data.
110. Under what circumstances does the Informatica Server produce an unrecoverable session?
The Source Qualifier transformation does not use sorted ports.
You change the partition information after the initial session fails.
Perform Recovery is disabled in the Informatica Server configuration.
The sources or targets change after the initial session fails.
The mapping contains a Sequence Generator or Normalizer transformation.
A concurrent batch contains multiple failed sessions.
111. If I make any modifications to my table in the backend database, are they reflected in the Informatica Warehouse Designer,
Mapping Designer, or Source Analyzer?
No. Informatica is not aware of changes made directly in the backend database; it displays only the information stored in the repository.
If you want backend changes to be reflected in the Informatica tools, you have to re-import the definitions from the backend database
over a valid connection and replace the existing definitions with the imported ones.
112. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these three
ports directly to the target?
No. Unless you join those three sources in the Source Qualifier, you cannot map them directly to the target.
113. How can you work with a remote database in Informatica? Did you work directly using remote connections?
To work with a remote data source, you need to connect to it through a remote connection. However, it is not preferable to work with
that remote source directly over such a connection; instead, bring the source onto the local machine where the Informatica Server
resides. If you work directly with a remote source, session performance decreases because only a limited amount of data can be
passed across the network in a given time.
FACTLESS FACT TABLE
Some fact tables don't have any facts (measures such as dollar amounts) at all; they may consist of nothing but keys. These are called
factless fact tables. The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data
warehouses turn out to be factless. A good example is tracking student attendance at a college. Imagine that you have a modern
student tracking system that detects each student attendance event each day. With a little dimensional thinking, you can easily list the
dimensions surrounding the student attendance event.
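A minimal sketch of what such a factless fact table might look like (table and column names are illustrative only):
CREATE TABLE fact_student_attendance (
    date_key      INTEGER NOT NULL,  -- foreign key to the date dimension
    student_key   INTEGER NOT NULL,  -- foreign key to the student dimension
    course_key    INTEGER NOT NULL,  -- foreign key to the course dimension
    professor_key INTEGER NOT NULL   -- foreign key to the professor dimension
    -- no numeric measures: each row simply records one attendance event
);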
DEGENERATE DIMENSIONS
In transaction-oriented fact tables, treat operational control numbers (such as the purchase order or invoice number) as
degenerate dimensions. They reside as dimension keys on the fact table, but do not join to a corresponding dimension table.
Teams are sometimes tempted to create a dimension table with information from the operational header record, such as the
transaction number, transaction date, transaction type, or transaction terms. In this case, you would end up with a dimension table that
has nearly as many rows as the fact table. A dimension table growing at nearly the same pace as the fact table is a warning sign that a
degenerate dimension may be lurking within it.
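For example, a sales fact table might carry the invoice number directly as a degenerate dimension (all names here are illustrative):
CREATE TABLE fact_sales (
    date_key       INTEGER NOT NULL,      -- foreign key to dim_date
    product_key    INTEGER NOT NULL,      -- foreign key to dim_product
    customer_key   INTEGER NOT NULL,      -- foreign key to dim_customer
    invoice_number VARCHAR(20) NOT NULL,  -- degenerate dimension: no dim_invoice table exists
    sales_amount   DECIMAL(12,2),
    quantity       INTEGER
);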
Much of my work involves designing and reviewing dimensional data models, and an interesting issue that often comes up is
how to deal with data items such as invoice number, order number and so on, that are not strictly facts - you're not going to want to add
them up, or average them, or perform any other maths on them - but they don't seem to fit into existing dimensions.
Ralph Kimball coined the term 'degenerate dimensions' for these data items, as they perform much the same function as
dimensions: they sit in the fact table and allow you to limit down or 'slice and dice' your fact table measures, but they are not foreign key
links through to dimension tables, as all the information you want - the invoice number, or the order number - is contained in the
degenerate dimension column itself. Degenerate dimensions are useful as they tie the transactions, or events, in the fact table back to
real-life items - invoices, orders and so on - and they can be a quick way to group together similar transactions for further analysis.
The key here is not to go overboard and turn these degenerate dimensions into full dimension tables - for example, an
Invoice dimension - as in all likelihood that dimension table will grow at the same rate as your fact table. If there is other interesting
information to go with the invoice - for example, who the customer was, or what products were ordered - this is better placed in dedicated
dimensions for customers and products, where it can be stored as a kind of 'master copy', rather than being stored alongside each order in
a ballooning Invoice dimension.
The other advantage with degenerate dimensions is that they're a lot easier to build and maintain when using ETL tools such
as Oracle Warehouse Builder, as you don't have to create dimension lookup tables, create synthetic keys, sequences and so on.
Indeed, if you're loading your dimensional model into a multidimensional database such as Oracle OLAP, your database will be much
smaller in size and easier to handle if you can keep the number of formal dimensions to a minimum, as they tend to 'explode' in size the
more dimensions you add to the database.
Judicious use of degenerate dimensions keeps your dimensional model rational and your database size reasonable, whilst
allowing you to keep useful items in the fact table that help tie the data warehouse back to the original source systems.
Parameter files and mapping variables: when do we use them, why, and how do they affect performance?
A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the
same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or
mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change during the session. The Informatica
Server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
Parameter files are used at the session level. The syntax for a parameter file is:
[folder_name.session_name]
parameter_name=value
variable_name=value
Create this file in any text editor and give its path in the session properties
(click the Properties tab in the session and
enter the parameter file directory and name in the Parameter Filename field).
We use them generally whenever we want to change dates or pass specific date values.
We use them because hard-coding values makes maintenance difficult; keeping them in a text file makes them easy to maintain and change.
They do not affect performance much (although this depends on how many records are in your source and which fields you are
passing).
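As a concrete illustration of the format above (the folder, session, and parameter names are hypothetical; mapping parameters and variables use the $$ prefix):
[Finance.s_m_load_sales]
$$LoadStartDate=01/01/2004
$$LoadEndDate=01/31/2004
$$SourceSystem=ORDERS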
114. What factors do we need to consider or take care of when we migrate our mappings from development to testing, particularly
in relation to connectivity objects?
Let's say we have three repositories: DEV, QA, and PROD. For migration from the DEV to the QA repository, the DBAs first create all the
required objects (tables, views, and so on) in the respective schemas (1) in the QA database. We then move (2) (drag and drop) all the
required mappings in the Designer to the specified folder (3) in the QA repository and validate the mappings in the Designer. Then we move
(drag and drop again) all the tasks (workflows) in the Workflow Manager from DEV to QA (4) and validate the workflows again.
This is the procedure followed in most projects. There is also a procedure given in the guide for doing it directly; check
that.
1) All the tables in a project will be in different schemas according to their functionality.
2) You might need to follow the order of migration: first the shared
folder, then the folders that use shared folder objects through shortcuts.
3) In a project we will have all source-to-staging mappings in one folder, the actual maps in another folder, and so on, so we have to take care
when moving from one repository to the other: mappings in folder A in DEV must move to folder A in QA.
4) After moving the workflows, change the connections to point to the QA database.
115. How many mapplets do you create in a project, on average?
It depends on the project; typically around five. We had two in our project.
116. What do you need to take care of when upgrading from one version to another
(what changes do we need to make in
our mappings)?
None in our mappings, except in certain instances. These are the steps for an upgrade:
PROCESS FLOW (STEPS FOR AN INFORMATICA UPGRADE):
1) Prepare the repository.
2) Create a copy of the repository.
3) Install and configure the PowerCenter 7.1.2 components (client and Repository Server).
4) Upgrade the repository.
5) Install and configure the PowerCenter Server.
6) Register the PowerCenter Server with the repository.
7) Test the upgrade/installation.
117. Will there be any difference when we are extracting data from, or loading data to, SQL Server, DB2, or flat files?
Not much. Importing SQL Server and DB2 tables is just like importing Oracle tables; for flat files the import is a little different. Loading is the same.

118. Where do we use UNIX shell scripts, and what do they contain?
Mainly for scheduling and operations: a shell script typically calls pmcmd to start sessions or workflows and is itself triggered by cron or an external scheduler.
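A minimal sketch of such a wrapper script, assuming a recent pmcmd syntax (the install path, service, folder, workflow name, and mail address are all hypothetical):
#!/bin/sh
# Hypothetical nightly load wrapper, typically invoked by cron
PMCMD=/opt/informatica/server/bin/pmcmd

$PMCMD startworkflow -sv IntSvc_Dev -d Domain_Dev -u Administrator -p Password1 \
       -f SalesFolder -wait wf_nightly_load
if [ $? -ne 0 ]; then
    echo "Nightly load failed on `date`" | mail -s "Informatica load failure" etl_support@example.com
    exit 1
fi
exit 0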
119. How do we collect performance details of a session?
The performance details provide counters that help you understand session and mapping efficiency; they are enabled on a tab in the
session properties.
You create performance details by selecting Collect Performance Data in the session properties before running the session.
By evaluating the final performance details, you can determine where session performance slows down. Monitoring also provides
session-specific details that can help tune the following:
Buffer block size
Index and data cache size for Aggregator, Rank, Lookup, and Joiner transformations
Lookup transformations
Before using performance details to improve session performance, you must do the following:
Enable monitoring
Increase Load Manager shared memory
Understand performance counters
Note that this option is usually enabled only in testing; in my experience it slows the session down considerably.
Tuning Mappings part1
Mapping-level optimization takes time to implement but can significantly boost performance. Sometimes the mapping is the
biggest bottleneck in the load process because business rules determine the number and complexity of transformations in a mapping.
Before deciding on the best route to optimize the mapping architecture, you need to resolve some basic issues. Tuning
mappings is a tiered process. The first tier can be of assistance almost universally, bringing about a performance increase in all
scenarios. The second tier of tuning processes may yield only a small performance increase, or can be of significant value, depending on
the situation.
Some factors to consider when choosing tuning processes at the mapping level include the specific environment, software and
hardware limitations, and the number of records going through a mapping. This Best Practice offers some guidelines for tuning
mappings.
Analyze mappings for tuning only after you have tuned the system, source, and target for peak performance. To optimize
mappings, you generally reduce the number of transformations in the mapping and delete unnecessary links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected
input/output or output ports. Doing so can reduce the amount of data the transformations store in the data cache. Too many Lookups
and Aggregators encumber performance because each requires index cache and data cache. Since both are fighting for memory
space, decreasing the number of these transformations in a mapping can help improve speed. Splitting them up into different mappings
is another option.
Limit the number of Aggregators in a mapping. A high number of Aggregators can increase I/O activity on the cache directory.
Unless the seek/access time on the directory itself is fast, having too many Aggregators can cause a bottleneck. Similarly, too many
Lookups in a mapping cause contention for disk and memory, which can lead to thrashing and leave insufficient memory to run the
mapping efficiently.
Consider Single-Pass Reading
If several mappings use the same data source, consider a single-pass reading. Consolidate separate mappings into one
mapping with either a single Source Qualifier Transformation or one set of Source Qualifier Transformations as the data source for the
separate data flows.
Similarly, if a function is used in several mappings, a single-pass reading will reduce the number of times that function will be called in
the session.
Optimize SQL Overrides
When SQL overrides are required in a Source Qualifier, Lookup Transformation, or in the update override of a target object, be
sure the SQL statement is tuned. The extent to which and how SQL can be tuned depends on the underlying source or target database
system.
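For example, a Source Qualifier override might push filtering down to the database rather than pulling full tables into the mapping (Oracle-style syntax; the table and column names are hypothetical):
SELECT o.order_id,
       o.customer_id,
       o.order_date,
       o.order_amount
FROM   orders o
WHERE  o.order_date >= TO_DATE('01-JAN-2004', 'DD-MON-YYYY')   -- filter applied at the database, not in the mapping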
Scrutinize Data type Conversions
The PowerCenter Server automatically makes conversions between compatible data types. When these conversions are
performed unnecessarily, performance slows. For example, if a mapping moves data from an Integer port to a Decimal port, then back
to an Integer port, the conversion may be unnecessary.
In some instances however, data type conversions can help improve performance. This is especially true when integer values
are used in place of other data types for performing comparisons using Lookup and Filter transformations.
Eliminate Transformation Errors
Large numbers of evaluation errors significantly slow performance of the PowerCenter Server. During transformation errors,
the PowerCenter Server engine pauses to determine the cause of the error, removes the row causing the error from the data flow, and
logs the error in the session log.
Transformation errors can be caused by many things including: conversion errors, conflicting mapping logic, any condition that
is specifically set up as an error, and so on. The session log can help point out the cause of these errors. If errors recur consistently for
certain transformations, re-evaluate the constraints for these transformations. Any source of errors should be traced and eliminated.
Tuning mappings -2
Optimize Lookup Transformations
There are a number of ways to optimize Lookup transformations that are set up in a mapping.

When to Cache Lookups


When caching is enabled, the Power Center Server caches the lookup table and queries the lookup cache during the session.
When this option is not enabled, the Power Center Server queries the lookup table on a row-by-row basis.
Sharing Lookup Caches
There are a number of methods for sharing lookup caches.
Within a specific session run for a mapping, if the same lookup is used multiple times in a mapping, the
PowerCenter Server will re-use the cache for the multiple instances of the lookup. Using the same lookup multiple times in the mapping
will be more resource intensive with each successive instance. If multiple cached lookups are from the same table but are expected to
return different columns of data, it may be better to setup the multiple lookups to bring back the same columns even though not all
return ports are used in all lookups. Bringing back a common set of columns may reduce the number of disk reads.
Across sessions of the same mapping, the use of an unnamed persistent cache allows multiple runs to use an
existing cache file stored on the PowerCenter Server. If the option of creating a persistent cache is set in the lookup properties, the
memory cache created for the lookup during the initial run is saved to the PowerCenter Server. This can improve performance because
the Server builds the memory cache from cache files instead of the database. This feature should only be used when the lookup table is
not expected to change between session runs.
Across different mappings and sessions, the use of a named persistent cache allows sharing of an existing cache
file.
Reducing the Number of Cached Rows
There is an option to use a SQL override in the creation of a lookup cache. Options can be added to the WHERE clause to
reduce the set of records included in the resulting cache.
NOTE: If you use a SQL override in a lookup, the lookup must be cached.
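For example, a hypothetical lookup on a customer dimension might cache only the current rows (table and column names are illustrative):
SELECT customer_key, customer_id, customer_name
FROM   dim_customer
WHERE  current_flag = 'Y'   -- cache only the active version of each customer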
Optimizing the Lookup Condition
In the case where a lookup uses more than one lookup condition, set the conditions with an equal sign first in order to optimize
lookup performance.
Indexing the Lookup Table
The Power Center Server must query, sort and compare values in the lookup condition columns. As a result, indexes on the
database table should include every column used in a lookup condition. This can improve performance for both cached and un-cached
lookups.
In the case of a cached lookup, an ORDER BY condition is issued in the SQL statement used to create the cache. Columns
used in the ORDER BY condition should be indexed. The session log will contain the ORDER BY statement.
In the case of an un-cached lookup, since a SQL statement is created for each row passing into the Lookup transformation,
performance can be helped by indexing the columns used in the lookup condition.
Optimize Filter and Router Transformations
Filtering data as early as possible in the data flow improves the efficiency of a mapping. Instead of using a Filter
Transformation to remove a sizeable number of rows in the middle or end of a mapping, use a filter on the Source Qualifier or a Filter
Transformation immediately after the source qualifier to improve performance.
Avoid complex expressions when creating the filter condition. Filter transformations are most effective when a simple integer
or TRUE/FALSE expression is used in the filter condition.
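For instance (hypothetical port names), rather than filtering on a string expression such as:
UPPER(ORDER_STATUS) = 'CANCELLED'
it is usually cheaper to derive a numeric flag in an upstream Expression transformation and use a simple filter condition like:
CANCELLED_FLAG = 1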
Filters or routers should also be used to drop rejected rows from an Update Strategy transformation if rejected rows do not need to be
saved.
Replace multiple filter transformations with a router transformation. This reduces the number of transformations in the mapping
and makes the mapping easier to follow.
Optimize Aggregator Transformations
Aggregator Transformations often slow performance because they must group data before processing it.
Use simple columns in the group by condition to make the Aggregator Transformation more efficient. When possible, use
numbers instead of strings or dates in the GROUP BY columns. Also avoid complex expressions in the Aggregator expressions,
especially in GROUP BY ports.
Use the Sorted Input option in the Aggregator. This option requires that the data sent to the Aggregator be sorted in the order in
which the ports are used in the Aggregator's group by. The Sorted Input option decreases the use of aggregate caches. When it is used,
the PowerCenter Server assumes all data is sorted by group and, as each group passes through the Aggregator, calculations can be
performed and the results passed on to the next transformation. Without sorted input, the Server must wait for all rows of data before
processing aggregate calculations. Use of the Sorted Input option is usually accompanied by a Source Qualifier that uses the
Number of Sorted Ports option.
Use an Expression and Update Strategy instead of an Aggregator Transformation. This technique can only be used if the
source data can be sorted. Further, using this option assumes that a mapping is using an Aggregator with Sorted Input option. In the
Expression Transformation, the use of variable ports is required to hold data from the previous row of data processed. The premise is to
use the previous row of data to determine whether the current row is a part of the current group or is the beginning of a new group.
Thus, if the row is a part of the current group, then its data would be used to continue calculating the current group function. An Update
Strategy Transformation would follow the Expression Transformation and set the first row of a new group to insert and the following
rows to update.
Tuning mapping-3
Optimize Joiner Transformations
Joiner transformations can slow performance because they need additional space in memory at run time to hold intermediate

results.
Define the rows from the smaller set of data in the joiner as the Master rows. The Master rows are cached to memory and the
detail records are then compared to rows in the cache of the Master rows. In order to minimize memory requirements, the smaller set of
data should be cached and thus set as Master.
Use Normal joins whenever possible. Normal joins are faster than outer joins and the resulting set of data is also smaller.
Use the database to do the join when sourcing data from the same database schema. Database systems usually can perform
the join more quickly than the Informatica Server, so a SQL override or a join condition should be used when joining multiple tables from
the same database schema.
Optimize Sequence Generator Transformations
Sequence Generator transformations need to determine the next available sequence number, thus increasing the Number of
Cached Values property can increase performance. This property determines the number of values the Informatica Server caches at
one time. If it is set to cache no values then the Informatica Server must query the Informatica repository each time to determine what is
the next number that can be used. Configuring the Number of Cached Values to a value greater than 1,000 should be considered. Note
that any cached values not used in the course of a session are lost, because the repository value has already been advanced past them;
the next time the Sequence Generator is called, it hands out the next set of cached values.
Avoid External Procedure Transformations
For the most part, making calls to external procedures slows down a session. If possible, avoid the use of these
Transformations, which include Stored Procedures, External Procedures and Advanced External Procedures.
Field Level Transformation Optimization
As a final step in the tuning process, expressions used in transformations can be tuned. When examining expressions, focus
on complex expressions for possible simplification.
To help isolate slow expressions, do the following:
1. Time the session with the original expression.
2. Copy the mapping and replace half the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.
Processing field-level transformations takes time. If the transformation expressions are complex, processing will be
slower. It is often possible to get a 10-20% performance improvement by optimizing complex field-level transformations. Use the target
table mapping reports or the Metadata Reporter to examine the transformations. Likely candidates for optimization are the fields with
the most complex expressions. Keep in mind that more than one field may be causing performance problems.
Factoring out Common Logic
This can reduce the number of times a mapping performs the same logic. If a mapping performs the same logic multiple times
in a mapping, moving the task upstream in the mapping may allow the logic to be done just once. For example, a mapping has five
target tables. Each target requires a Social Security Number lookup. Instead of performing the lookup right before each target, move
the lookup to a position before the data flow splits.
Minimize Function Calls
Anytime a function is called it takes resources to process. There are several common examples where function calls can be
reduced or eliminated.
Aggregate function calls can sometime be reduced. In the case of each aggregate function call, the Informatica Server must
search and group the data.
Thus the following expression:
SUM(Column A) + SUM(Column B)
Can be optimized to:
SUM(Column A + Column B)
In general, operators are faster than functions, so operators should be used whenever possible.
For example, if you have an expression that involves a CONCAT function such as:
CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)
it can be optimized to:
FIRST_NAME || ' ' || LAST_NAME
Remember that IIF() is a function that returns a value, not just a logical test. This allows many logical statements to be written
in a more compact fashion.
For example:
IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='Y', VAL_A+VAL_B+VAL_C,
IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='N', VAL_A+VAL_B,
IIF(FLG_A='Y' and FLG_B='N' and FLG_C='Y', VAL_A+VAL_C,
IIF(FLG_A='Y' and FLG_B='N' and FLG_C='N', VAL_A,
IIF(FLG_A='N' and FLG_B='Y' and FLG_C='Y', VAL_B+VAL_C,
IIF(FLG_A='N' and FLG_B='Y' and FLG_C='N', VAL_B,
IIF(FLG_A='N' and FLG_B='N' and FLG_C='Y', VAL_C,
IIF(FLG_A='N' and FLG_B='N' and FLG_C='N', 0.0))))))))
can be optimized to:
IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0)
The original expression has 8 IIFs, 16 ANDs and 24 comparisons. The optimized expression has 3 IIFs, 3 comparisons and two
additions.
Be creative in making expressions more efficient. The following is an example of reworking an expression to reduce three
comparisons to one:
For example:
IIF(X=1 OR X=5 OR X=9, 'yes', 'no')
can be optimized to:
IIF(MOD(X, 4) = 1, 'yes', 'no')
(Note that this rewrite is only equivalent when X is known to stay within a limited range such as 1 to 12; for a value like 13, the two
expressions give different results.)
Calculate Once, Use Many Times
Avoid calculating or testing the same value multiple times. If the same sub-expression is used several times in a
transformation, consider making the sub-expression a local variable. The local variable can be used only within the transformation but
by calculating the variable only once can speed performance.
Choose Numeric versus String Operations
The Informatica Server processes numeric operations faster than string operations. For example, if a lookup is done on a large
amount of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves
performance.
Optimizing Char-Char and Char-Varchar Comparisons
When the Informatica Server performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing
blank spaces in the row. The Treat CHAR as CHAR On Read option can be set in the Informatica Server setup so that the Informatica
Server does not trim trailing spaces from the end of CHAR source fields.
Use DECODE instead of LOOKUP
When a LOOKUP function is used, the Informatica Server must lookup a table in the database. When a DECODE function is
used, the lookup values are incorporated into the expression itself so the Informatica Server does not need to lookup a separate table.
Thus, when looking up a small set of unchanging values, using DECODE may improve performance.
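For example, a small, static set of codes could be translated directly in the expression rather than with a LOOKUP (the port name and values here are illustrative):
DECODE(STATUS_CODE,
       'A', 'Active',
       'I', 'Inactive',
       'P', 'Pending',
       'Unknown')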
Reduce the Number of Transformations in a Mapping
Whenever possible, reduce the number of transformations, as there is always overhead involved in moving data
between transformations. Along the same lines, remove unnecessary links between transformations to minimize the amount
of data moved. This is especially important for data being pulled from the Source Qualifier transformation.
Use Pre- and Post-Session SQL Commands
You can specify pre- and post-session SQL commands in the Properties tab of the Source Qualifier transformation and in the
Properties tab of the target instance in a mapping. To increase the load speed, use these commands to drop indexes on the target
before the session runs, then recreate them when the session completes.
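For example (the index and table names are hypothetical; Oracle-style syntax):
-- pre-session SQL on the target instance
DROP INDEX idx_fact_sales_date;
-- post-session SQL on the target instance
CREATE INDEX idx_fact_sales_date ON fact_sales (date_key);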
Apply the following guidelines when using the SQL statements:
You can use any command that is valid for the database type. However, the Power Center Server does not allow nested comments,
even though the database might.
You can use mapping parameters and variables in SQL executed against the source, but not against the target.
Use a semi-colon (;) to separate multiple statements.
The Power Center Server ignores semi-colons within single quotes, double quotes, or within /* ...*/.
If you need to use a semi-colon outside of quotes or comments, you can escape it with a back slash (\).
The Workflow Manager does not validate the SQL.
Use Environmental SQL
For relational databases, you can execute SQL commands in the database environment when connecting to the database.
You can use this for source, target, lookup, and stored procedure connections. For instance, you can set isolation levels on the source
and target systems to avoid deadlocks. Follow the guidelines above when using these SQL statements.
1. Can two fact tables share the same dimension tables? How many dimension tables are associated with one fact table in your
project?
Yes, two fact tables can share the same dimension tables (conformed dimensions). The number of dimension tables per fact table depends on the project.
2.What is ROLAP, MOLAP, and DOLAP...?
ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), and DOLAP (Desktop OLAP). In these three OLAP
architectures, the interface to the analytic layer is typically the same; what is quite different is how the data is physically stored.
In MOLAP, the premise is that online analytical processing is best implemented by storing the data multidimensionally; that is,
data must be stored multidimensionally in order to be viewed in a multidimensional manner.
In ROLAP, the premise is that the data should be stored in the relational model; that is, OLAP capabilities are best provided
directly against the relational database.
DOLAP is a variation that exists to provide portability for the OLAP user. It creates multidimensional datasets that can be
transferred from server to desktop, requiring only the DOLAP software to exist on the target system. This provides significant
advantages to portable computer users, such as salespeople who are frequently on the road and do not have direct access to their
office server.
3. What is an MDDB and what is the difference between MDDBs and RDBMSs?
An MDDB is a multidimensional database. There are two primary technologies used for storing the data in OLAP applications:
multidimensional databases (MDDB) and relational databases (RDBMS). The major difference
between MDDBs and RDBMSs is in how they store data. Relational databases store their data in a series of tables and columns.
Multidimensional databases, on the other hand, store their data in large multidimensional arrays.
For example, in an MDDB world, you might refer to a sales figure as Sales with Date, Product, and Location coordinates of 12-1-2001, Car, and South, respectively.
Advantages of an MDDB:
Retrieval is very fast because:
- the data corresponding to any combination of dimension members can be retrieved with a single I/O;
- data is clustered compactly in a multidimensional array;
- values are calculated ahead of time;
- the index is small and can therefore usually reside completely in memory.
Storage is very efficient because:
- the blocks contain only data;
- a single index locates the block corresponding to a combination of sparse dimension numbers.
4. What is MDB modeling and RDB Modeling?

5. What is Mapplet and how do u create Mapplet?


A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can
contain as many transformations as you need.
Create a mapplet when you want to use a standardized set of transformation logic in several mappings. For example, if you
have several fact tables that require a series of dimension keys, you can create a mapplet containing a series of Lookup
transformations to find each dimension key. You can then use the mapplet in each fact table mapping, rather than recreate the same
lookup logic in each mapping.
To create a new mapplet:
1. In the Mapplet Designer, choose Mapplets-Create Mapplet.
2. Enter a descriptive mapplet name.
The recommended naming convention for mapplets is mpltMappletName.
3. Click OK.
The Mapping Designer creates a new mapplet in the Mapplet Designer.
4. Choose Repository-Save.
6. What are transformations used for?
Transformation is the manipulation of data from how it appears in the source system(s) into another form in the data
warehouse or mart, in a way that enhances or simplifies its meaning. In short, you transform data into information.
This includes data merging, cleansing, and aggregation:
Data merging: the process of standardizing data types and fields. Suppose one source system stores integer data as smallint whereas
another stores similar data as decimal; the data from the two source systems needs to be rationalized when moved into the Oracle data
type NUMBER.
Cleansing: this involves identifying and correcting inconsistencies or inaccuracies.
- Eliminating inconsistencies in the data from multiple sources.
- Converting data from different systems into a single consistent data set suitable for analysis.
- Meeting a standard for establishing data elements, codes, domains, formats and naming conventions.
- Correcting data errors and filling in missing data values.
Aggregation: the process whereby multiple detailed values are combined into a single summary value, typically summed numbers
representing dollars spent or units sold.
- Generates summarized data for use in aggregate fact and dimension tables.
Data transformation is an interesting concept in that some transformation can occur during the extract, some during the
transformation step, or even, in limited cases, during the load portion of the ETL process. The type of transformation function you need will
most often determine where it should be performed. Some transformation functions could even be performed in more than one place,
because many of the transformations you will want to perform already exist in some form or another in more than one of the three
environments (source database or application, ETL tool, or target database).

7. What is the difference between OLTP & OLAP?


OLTP stands for Online Transaction Processing. This is a standard, normalized database structure. OLTP is designed for
transactions, which means that inserts, updates, and deletes must be fast. Imagine a call center that takes orders. Call takers are
continually taking calls and entering orders that may contain numerous items. Each order and each item must be inserted into a
database. Since the performance of the database is critical, we want to maximize the speed of inserts (and updates and deletes). To
maximize performance, we typically try to hold as few records in the database as possible.
OLAP stands for Online Analytical Processing. OLAP is a term that means many things to many people. Here, we will use the
terms OLAP and star schema pretty much interchangeably and assume that a star schema database is an OLAP system. (This is not
the same thing that Microsoft calls OLAP; they extend OLAP to mean the cube structures built using their product, OLAP Services.)
Here, we will assume that any system of read-only, historical, aggregated data is an OLAP system.
A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization. That is why many data warehouses are considered to be DSS (Decision Support Systems).
Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data.
By read-only, we mean that the person looking at the data won't be changing it. If a user wants to look at yesterday's sales for a
certain product, they should not have the ability to change that number.
The historical part may be just a few minutes old, but usually it is at least a day old. A data warehouse usually holds data that
goes back a certain period in time, such as five years. In contrast, standard OLTP systems usually only hold data as long as it is
current or active. An order table, for example, may move orders to an archive table once they have been completed, shipped, and
received by the customer.
When we say that data warehouses and data marts hold aggregated data, we need to stress that there are many levels of
aggregation in a typical data warehouse.
8. If the data source is in the form of an Excel spreadsheet, how do you use it?
PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not a flat file. Like relational sources, the

Designer uses ODBC to import a Microsoft Excel source. You do not need database permissions to import Microsoft Excel sources.
To import an Excel source definition, you need to complete the following tasks:
Install the Microsoft Excel ODBC driver on your system.
Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric data.
Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the Designer. Ranges display as source definitions
when you import the source.
9. Which databases are RDBMSs and which are MDDBs? Can you name some?
MDDB examples: Oracle Express Server (OES), Essbase from Hyperion Software, PowerPlay from Cognos. RDBMS examples: Oracle, SQL
Server, etc.
10. What are the modules/tools in Business Objects? Explain their purpose briefly?
BO Designer, Business Query for Excel, BO Reporter, InfoView, Explorer, WebIntelligence (WEBI), BO Publisher, Broadcast Agent, and BO
ZABO.
InfoView: IT portal entry into Web Intelligence & Business Objects.
Base module required for all options to view and refresh reports.
Reporter: Upgrade to create/modify reports on LAN or Web.
Explorer: Upgrade to perform OLAP processing on LAN or Web.
Designer: Creates semantic layer between user and database.
Supervisor: Administer and control access for group of users.
Web Intelligence: Integrated query, reporting, and OLAP analysis over the Web.
Broadcast Agent: Used to schedule, run, publish, push, and broadcast pre-built reports and spreadsheets, including event notification
and response capabilities, event filtering, and calendar based notification, over the LAN, e-mail, pager, Fax, Personal Digital
Assistant( PDA), Short Messaging Service(SMS), etc.
Set Analyzer: Applies set-based analysis to perform functions such as exclusions, intersections, unions, and overlaps visually.
Developer Suite: Build packaged, analytical, or customized applications.
11.What are the Ad hoc queries, Canned Queries/Reports and How do u create them?
The data warehouse will contain two types of query. There will be fixed queries that are clearly defined and well understood, such as
regular reports, canned queries (standard reports) and common aggregations. There will also be ad hoc queries that are unpredictable,
both in quantity and frequency.
Ad hoc query: Ad hoc queries are the starting point for any analysis of a database. A business analyst wants to know what is
inside the database, and proceeds by calculating totals, averages, and maximum and minimum values for most attributes within the
database. These are the unpredictable element of a data warehouse. It is exactly the ability to run any query when desired and expect a
reasonable response that makes the data warehouse worthwhile, and makes the design such a significant challenge.
The end-user access tools are capable of automatically generating the database query that answers any Question posed by
the user. The user will typically pose questions in terms that they are familiar with (for example, sales by store last week); this is
converted into the database query by the access tool, which is aware of the structure of information within the data warehouse.
Canned queries: Canned queries are predefined queries. In most instances, canned queries contain prompts that allow you to
customize the query for your specific needs. For example, a prompt may ask you for a School, department, term, or section ID. In this
instance you would enter the name of the School, department or term, and the query will retrieve the specified data from the
Warehouse. You can measure resource requirements of these queries, and the results can be used for capacity planning and for
database design.
The main reason for using a canned query or report rather than creating your own is that your chances of misinterpreting data
or getting the wrong answer are reduced. You are assured of getting the right data and the right answer.
12. How many Fact tables and how many dimension tables u did? Which table precedes what?
http://www.ciobriefings.com/whitepapers/StarSchema.asp

13. What is the difference between STAR SCHEMA & SNOW FLAKE SCHEMA?
http://www.ciobriefings.com/whitepapers/StarSchema.asp
14. Why did u choose STAR SCHEMA only? What are the benefits of STAR SCHEMA?
Because of its denormalized structure, i.e., the dimension tables are denormalized. Why denormalize? The first (and often
only) answer is speed. An OLTP structure is designed for data inserts, updates, and deletes, but not for data retrieval; therefore, we can
often squeeze some speed out of it by denormalizing some of the tables and having queries go against fewer tables.
These queries are faster because they perform fewer joins to retrieve the same record set, and joins are also confusing to many end users.
By denormalizing, we can present the user with a view of the data that is far easier for them to understand. (See the example query after the list of benefits below.)
Benefits of STAR SCHEMA:
Far fewer Tables.
Designed for analysis across time.
Simplifies joins.
Less database space.
Supports drilling in reports.
Flexibility to meet business and technical needs.
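As an illustration of the "fewer joins" point above, a typical star query touches only the fact table and the dimensions it needs (table and column names are hypothetical):
SELECT d.calendar_month,
       p.product_category,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales  f
JOIN   dim_date    d ON d.date_key    = f.date_key
JOIN   dim_product p ON p.product_key = f.product_key
GROUP BY d.calendar_month, p.product_category;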
15. How do you load the data using Informatica?
By creating and running a session for the mapping.

16. (i) What is FTP? (ii) How do you connect to a remote machine? (iii) Is there another way to use FTP without a special utility?
(i) The FTP (File Transfer Protocol) utility program is commonly used for copying files to and from other computers. These computers
may be at the same site or at different sites thousands of miles apart. FTP is a general protocol that works on UNIX systems as well as
other non-UNIX systems.
(ii) Remote connect commands:
ftp machinename
ex: ftp 129.82.45.181 or ftp iesg
If the remote machine has been reached successfully, FTP responds by asking for a login name and password. When you enter your own
login name and password for the remote machine, it returns a prompt like the one below:
ftp>
and permits you access to your own home directory on the remote machine. You should be able to move around in your own directory
and to copy files to and from your local machine using the FTP interface commands.
Note: You can set the mode of file transfer to ASCII (the default, which transmits seven bits per character).
Use ASCII mode with any of the following:
- Raw Data (e.g. *.dat or *.txt, codebooks, or other plain text documents)
- SPSS Portable files.
- HTML files.
If you set the mode of file transfer to binary, all eight bits per byte are transmitted, which provides less chance of
a transmission error; binary mode must be used to transmit files other than ASCII files.
For example, use binary mode for the following types of files:
- SPSS System files
- SAS Dataset
- Graphic files (eg., *.gif, *.jpg, *.bmp, etc.)
- Microsoft Office documents (*.doc, *.xls, etc.)
(iii) Yes. If you are using Windows, you can access a text-based FTP utility from a DOS prompt.
To do this, perform the following steps:
1. From the Start menu, choose Programs-MS-DOS Prompt.
2. Enter ftp ftp.geocities.com. A prompt will appear.
(or)
Enter ftp to get the ftp prompt, then enter open hostname at the ftp> prompt, e.g. ftp> open ftp.geocities.com (this connects to the specified host).
3. Enter your Yahoo! GeoCities member name.
4. Enter your Yahoo! GeoCities password.
You can now use standard FTP commands to manage the files in your Yahoo! GeoCities directory.
17. What cmd is used to transfer multiple files at a time using FTP?
mget ==> copies multiple files from the remote machine to the local machine. You are prompted for a y/n answer before each file is
transferred; mget * copies all files in the current remote directory to your current local directory, using the same file names.
mput ==> copies multiple files from the local machine to the remote machine.
18. What is a Filter transformation, and what options do you have in a Filter transformation?
The Filter transformation provides the means for filtering records in a mapping. You pass all the rows from a source
transformation through the Filter transformation, then enter a filter condition for the transformation. All ports in a Filter transformation are
input/output, and only records that meet the condition pass through the Filter transformation.
Note: Discarded rows do not appear in the session log or reject files
To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible. Rather
than passing records you plan to discard through the mapping, you then filter out unwanted data early in the flow of data from sources
to targets.
You cannot concatenate ports from more than one transformation into the Filter transformation; the input ports for the filter
must come from a single transformation. Filter transformations exist within the flow of the mapping and cannot be unconnected. The
Filter transformation does not allow setting output default values.
19. What are the default sources supported by Informatica PowerMart?
Relational tables, views, and synonyms.
Fixed-width and delimited flat files that do not contain binary data.
COBOL files.
20. When do you create the source definition? Can I use this source definition with any transformation?
When working with a file that contains fixed-width binary data, you must create the source definition.
The Designer displays the source definition as a table, consisting of names, datatypes, and constraints. To use a source
definition in a mapping, connect a source definition to a Source Qualifier or Normalizer transformation. The Informatica Server uses
these transformations to read the source data.
21. What is Active & Passive Transformation?
Ans: Active and Passive Transformations
Transformations can be active or passive. An active transformation can change the number of records passed through it. A passive
transformation never changes the record count. For example, the Filter transformation removes rows that do not meet the filter
condition defined in the transformation.
Active transformations that might change the record count include the following:
Advanced External Procedure
Aggregator

Filter
Joiner
Normalizer
Rank
Source Qualifier
Note: If you use Power Connect to access ERP sources, the ERP Source Qualifier is also an active transformation.
You can connect only one of these active transformations to the same transformation or target, since the Informatica Server
cannot determine how to concatenate data from different sets of records with different numbers of rows.
Passive transformations that never change the record count include the following:
Lookup
Expression
External Procedure
Sequence Generator
Stored Procedure
Update Strategy
You can connect any number of these passive transformations, or connect one active transformation with any number of
passive transformations, to the same transformation or target.
22. What is staging Area and Work Area?
Staging Area
- Holding Tables on DW Server.
- Loaded from Extract Process
- Input for Integration/Transformation
- May function as Work Areas
- Output to a work area or Fact Table
Work Area
- Temporary Tables
- Memory
23. What is metadata? (See Data Warehousing in the Real World, page 125.)
Definition: Data About Data
Metadata contains descriptive data for end users. In a data warehouse the term metadata is used in a number of different situations.
Metadata is used for:
Data transformation and load
Data management
Query management
Data transformation and load:
Metadata may be used during data transformation and load to describe the source data and any changes that need to be made. The
advantage of storing metadata about the data being transformed is that as source data changes the changes can be captured in the
metadata, and transformation programs automatically regenerated.
For each source data field the following information is required:
Source Field:
Unique identifier (to avoid any confusion occurring between 2 fields of the same name from different sources).
Name (Local field name).
Type (storage type of data, such as character, integer, floating point and so on).
Location
- system (the system it comes from, e.g. the accounting system).
- object (the object that contains it, e.g. the Account table).
The destination field needs to be described in a similar way to the source:
Destination:
Unique identifier
Name
Type (database data type, such as Char, Varchar, Number and so on).
Table name (name of the table the field will be part of).
The other information that needs to be stored is the transformation or transformations that need to be applied to turn the source data
into the destination data:
Transformation:
Transformation (s)
- Name
- Language (name of the language the transformation is written in).
- module name
- syntax
The Name is the unique identifier that differentiates this from any other similar transformations. The Language attribute
contains the name of the language that the transformation is written in. The other attributes are module name and syntax. Generally
these will be mutually exclusive, with only one being defined. For simple transformations such as simple SQL functions the syntax will
be stored. For complex transformations the name of the module that contains the code is stored instead.
Data management:
Metadata is required to describe the data as it resides in the data warehouse. This is needed by the warehouse manager to allow it to
track and control all data movements. Every object in the database needs to be described.
Metadata is needed for all the following:
Tables
- Columns
- name
- type

Indexes
- Columns
- name
- type
Views
- Columns
- name
- type
Constraints
- name
- type
- table
- columns
Aggregation and partition information also needs to be stored in the metadata (for details, refer to page 30).
Query Generation:
Metadata is also required by the query manager to enable it to generate queries. The same metadata used by the warehouse
manager to describe the data in the data warehouse is also required by the query manager.
The query manager also generates metadata about the queries it has run. This metadata can be used to build a history of all queries
run and to generate a query profile for each user, each group of users, and the data warehouse as a whole.
The metadata that is required for each query is:
- query
- tables accessed
- columns accessed
  - name
  - reference identifier
- restrictions applied
  - column name
  - table name
  - reference identifier
  - restriction
- join criteria applied
- aggregate functions used
- group by criteria
- sort criteria
- syntax
- execution plan
- resources
24. What kind of Unix flavours are you experienced with?
Solaris 2.5 / SunOS 5.5 (operating system)
Solaris 2.6 / SunOS 5.6 (operating system)
Solaris 2.8 / SunOS 5.8 (operating system)
AIX 4.0.3
For reference, SunOS release / Solaris version / release date / supported platforms:
5.5.1 / 2.5.1 / May 96 / sun4c, sun4m, sun4d, sun4u, x86, ppc
5.6 / 2.6 / Aug. 97 / sun4c, sun4m, sun4d, sun4u, x86
5.7 / 7 / Oct. 98 / sun4c, sun4m, sun4d, sun4u, x86
5.8 / 8 / 2000 / sun4m, sun4d, sun4u, x86
25. What are the tasks that are done by Informatica Server?
The Informatica Server performs the following tasks:
Manages the scheduling and execution of sessions and batches
Executes sessions and batches
Verifies permissions and privileges
Interacts with the Server Manager and pmcmd.
The Informatica Server moves data from sources to targets based on metadata stored in a repository. For instructions on how
to move and transform data, the Informatica Server reads a mapping (a type of metadata that includes transformations and source and
target definitions).
Each mapping uses a session to define additional information and to optionally override mapping-level
options. You can group multiple sessions to run as a single unit, known as a batch.

26. What are the two programs that communicate with the Informatica Server?
Informatica provides Server Manager and pmcmd programs to communicate with the Informatica Server:
Server Manager. A client application used to create and manage sessions and batches, and to monitor and stop the Informatica
Server. You can use information provided through the Server Manager to troubleshoot sessions and improve session performance.
pmcmd. A command-line program that allows you to start and stop sessions and batches, stop the Informatica Server, and verify if the
Informatica Server is running.

27. When do u reinitialize Aggregate Cache?


Reinitializing the aggregate cache overwrites historical aggregate data with new aggregate data. When you reinitialize the
aggregate cache, instead of using the captured changes in the source tables, you typically need to use the entire source table.
For example, you might reinitialize the aggregate cache if the source for a session changes incrementally every day and completely
changes once a month. When you receive the new monthly source, you might configure the session to reinitialize the aggregate cache,
truncate the existing target, and use the new source table during the session.
To reinitialize the aggregate cache:
1.In the Server Manager, open the session property sheet.
2.Click the Transformations tab.
3.Check Reinitialize Aggregate Cache.
4.Click OK three times to save your changes.
5.Run the session.
The Informatica Server creates a new aggregate cache, overwriting the existing aggregate cache.
6.After running the session, open the property sheet again.
7.Click the Data tab.
8.Clear Reinitialize Aggregate Cache.
9.Click OK.
28. (i) What is Target Load Order in Designer?
Target Load Order: - In the Designer, you can set the order in which the Informatica Server sends records to various target definitions
in a mapping. This feature is crucial if you want to maintain referential integrity when inserting, deleting, or updating records in tables
that have the primary key and foreign key constraints applied to them. The Informatica Server writes data to
all the targets connected to the same Source Qualifier or Normalizer simultaneously, to maximize performance.
28. (ii) What is the minimum condition needed to use the Target Load Order option in the Designer?
You need multiple Source Qualifier transformations.
To specify the order in which the Informatica Server sends data to targets, create one Source Qualifier or Normalizer
transformation for each target within a mapping. To set the target load order, you then determine the order in which each Source
Qualifier sends data to connected targets in the mapping.
When a mapping includes a Joiner transformation, the Informatica Server sends all records to targets connected to that Joiner
at the same time, regardless of the target load order.
28(iii). How do you set the Target Load Order?
To set the target load order:
1. Create a mapping that contains multiple Source Qualifier transformations.
2. After you complete the mapping, choose Mappings-Target Load Plan.
A dialog box lists all Source Qualifier transformations in the mapping, as well as the targets that receive data from each Source
Qualifier.
3. Select a Source Qualifier from the list.
4. Click the Up and Down buttons to move the Source Qualifier within the load order.
5. Repeat steps 3 and 4 for any other Source Qualifiers you wish to reorder.
6. Click OK and Choose Repository-Save.
29. What can you do with the Repository Manager?
We can perform the following tasks using the Repository Manager. To create usernames, you must have one of the following sets of privileges:
- Administer Repository privilege
- Super User privilege
To create a user group, you must have one of the following privileges :
- Administer Repository privilege
- Super User privilege
To assign or revoke privileges, you must have one of the following privileges:
- Administer Repository privilege
- Super User privilege
Note: You cannot change the privileges of the default user groups or the default repository users
30. What can you do with the Designer?
The Designer client application provides five tools to help you create mappings:
Source Analyzer. Use to import or create source definitions for flat file, Cobol, ERP, and relational sources.
Warehouse Designer. Use to import or create target definitions.
Transformation Developer. Use to create reusable transformations.
Mapplet Designer. Use to create mapplets.
Mapping Designer. Use to create mappings.
Note: The Designer allows you to work with multiple tools at one time. You can also work in multiple folders and repositories
31. What are the different types of Tracing Levels you have in Transformations?
Tracing levels in transformations:
Terse: Indicates when the Informatica Server initializes the session and its components. Summarizes session results, but not at the
level of individual records.
Normal: Includes initialization information as well as error messages and notification of rejected data.
Verbose initialization: Includes all information provided with the Normal setting plus more extensive information about initializing
transformations in the session.
Verbose data: Includes all information provided with the Verbose initialization setting.
Note: By default, the tracing level for every transformation is Normal.
To add a slight performance boost, you can also set the tracing level to Terse, writing the minimum of detail to the session log when
running a session containing the transformation.
31(i). What the difference is between a database, a data warehouse and a data mart?
A database is an organized collection of information.
-- A data warehouse is a very large database with special sets of tools to extract and cleanse data from operational systems and to
analyze data.
-- A data mart is a focused subset of a data warehouse that deals with a single area of data and is organized for quick
analysis.
32. What is Data Mart, Data Warehouse and Decision Support System explain briefly?
Data Mart:
A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular
community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more
specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of
analysis, content, presentation, and ease-of-use. Users of a data mart can expect to have data presented in terms that are familiar.
In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However,
most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs and that a data
warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later
be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that
may derive from a data warehouse or not and that emphasizes ease of access and usability for a particular designed purpose. In
general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at
meeting an immediate need.
Data Warehouse:
A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems
collect. The term was coined by W. H. Inmon. IBM sometimes uses the term "information warehouse."
Typically, a data warehouse is housed on an enterprise mainframe server. Data from various online transaction processing
(OLTP) applications and other sources is selectively extracted and organized on the data warehouse database for use by analytical
applications and user queries. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access,
but does not generally start from the point-of-view of the end user or knowledge worker who may need access to specialized,
sometimes local databases. The latter idea is known as the data mart. Data mining, Web mining, and a decision support system (DSS)
are three kinds of applications that can make use of a data warehouse.
Decision Support System:
A decision support system (DSS) is a computer program application that analyzes business data and presents it so that users
can make business decisions more easily. It is an "informational application" (in distinction to an "operational application" that collects
the data in the course of normal business operation).
Typical information that a decision support application might gather and present would be:
Comparative sales figures between one week and the next Projected revenue figures based on new product sales
assumptions The consequences of different decision alternatives, given past experience in a context that is described.
A decision support system may present information graphically and may include an expert system or artificial intelligence (AI).
It may be aimed at business executives or some other group of knowledge workers.

33. What are the differences between Heterogeneous and Homogeneous?

Heterogeneous                                   Homogeneous
Stored in different schemas                     Common structure
Stored in different file or database types      Same database type
Spread across several countries                 Same data center
Different platform and H/W configuration        Same platform and H/W configuration
34. How do you use DDL commands in a PL/SQL block, e.g., accept a table name from the user and drop it if it exists, otherwise display a message?
To invoke DDL commands in PL/SQL blocks we have to use Dynamic SQL, the Package used is DBMS_SQL.
35. What are the steps to work with Dynamic SQL?
Open a dynamic cursor, parse the SQL statement, bind input variables (if any), execute the SQL statement of the dynamic cursor, and
close the cursor.
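A minimal PL/SQL sketch of those steps, using DBMS_SQL to drop a table supplied at run time (the substitution variable &table_name and the messages are illustrative only; DDL executed through DBMS_SQL needs no bind variables or fetch):

-- Sketch: drop a user-supplied table if it exists, else display a message.
SET SERVEROUTPUT ON
DECLARE
    p_table  VARCHAR2(30) := '&table_name';  -- table name accepted from the user
    v_count  NUMBER;
    v_cursor INTEGER;
    v_rows   INTEGER;
BEGIN
    SELECT COUNT(*) INTO v_count
    FROM   user_tables
    WHERE  table_name = UPPER(p_table);

    IF v_count = 0 THEN
        DBMS_OUTPUT.PUT_LINE('Table ' || p_table || ' does not exist.');
    ELSE
        v_cursor := DBMS_SQL.OPEN_CURSOR;                                    -- open a dynamic cursor
        DBMS_SQL.PARSE(v_cursor, 'DROP TABLE ' || p_table, DBMS_SQL.NATIVE); -- parse the DDL
        v_rows := DBMS_SQL.EXECUTE(v_cursor);                                -- execute it
        DBMS_SQL.CLOSE_CURSOR(v_cursor);                                     -- close the cursor
        DBMS_OUTPUT.PUT_LINE('Table ' || p_table || ' dropped.');
    END IF;
END;
/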
36. Which package and procedure are used to find/check the free space available for database objects like
tables/procedures/views/synonyms, etc.?
The Package is DBMS_SPACE
The Procedure is UNUSED_SPACE
The Table is DBA_OBJECTS

Note: See the script to find free space @ c:\informatica\tbl_free_space
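A hedged sketch of that procedure call: DBMS_SPACE.UNUSED_SPACE reports allocated versus unused blocks for a segment. The owner SCOTT and table EMP are illustrative, not from the script referenced above.

-- Sketch: report allocated and unused space for one table segment.
SET SERVEROUTPUT ON
DECLARE
    v_total_blocks  NUMBER;
    v_total_bytes   NUMBER;
    v_unused_blocks NUMBER;
    v_unused_bytes  NUMBER;
    v_lue_file_id   NUMBER;  -- file id of the last used extent
    v_lue_block_id  NUMBER;  -- block id of the last used extent
    v_last_used_blk NUMBER;  -- last used block within that extent
BEGIN
    DBMS_SPACE.UNUSED_SPACE(
        segment_owner             => 'SCOTT',
        segment_name              => 'EMP',
        segment_type              => 'TABLE',
        total_blocks              => v_total_blocks,
        total_bytes               => v_total_bytes,
        unused_blocks             => v_unused_blocks,
        unused_bytes              => v_unused_bytes,
        last_used_extent_file_id  => v_lue_file_id,
        last_used_extent_block_id => v_lue_block_id,
        last_used_block           => v_last_used_blk);

    DBMS_OUTPUT.PUT_LINE('Allocated blocks: ' || v_total_blocks);
    DBMS_OUTPUT.PUT_LINE('Unused blocks   : ' || v_unused_blocks);
END;
/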


37. If EmpId is the primary key in the target table and the source data has two rows with the same EmpId, does Informatica allow the
load? If you use a lookup in the same situation, does it load two rows or only one?
No, it does not; the duplicate generates a primary key constraint violation, so only one row is loaded.
=> Even with a lookup the answer is still no, as long as EmpId is the primary key.
38. If Ename is varchar2(40) from one source (Siebel) and Ename is char(100) from another source (Oracle), and the target has Name
varchar2(50), how does Informatica handle this situation? How does Informatica handle string and number datatypes from
sources?
40. How do you query the metadata tables for Informatica?
41(i). When do you use a connected lookup and when do you use an unconnected lookup?
Connected Lookups : A connected Lookup transformation is part of the mapping data flow. With connected lookups, you can have multiple return
values. That is, you can pass multiple values from the same row in the lookup table out of the Lookup transformation.
Common uses for connected lookups include:
=> Finding a name based on a number ex. Finding a Dname based on deptno
=> Finding a value based on a range of dates
=> Finding a value based on multiple conditions
Unconnected Lookups : An unconnected Lookup transformation exists separate from the data flow in the mapping. You write an expression using
the :LKP reference qualifier to call the lookup within another transformation.
Some common uses for unconnected lookups include:
=> Testing the results of a lookup in an expression
=> Filtering records based on the lookup results
=> Marking records for update based on the result of a lookup (for example, updating slowly changing dimension tables)
=> Calling the same lookup multiple times in one mapping
42. What do you need to concentrate on after getting the explain plan?
The three most significant columns in the plan table are named OPERATION, OPTIONS, and OBJECT_NAME. For each step,
these tell you which operation is going to be performed and which object is the target of that operation.
Ex:**************************
TO USE EXPLAIN PLAN FOR A QRY...
**************************
SQL> EXPLAIN PLAN
2 SET STATEMENT_ID = 'PKAR02'
3 FOR
4 SELECT JOB,MAX(SAL)
5 FROM EMP
6 GROUP BY JOB
7 HAVING MAX(SAL) >= 5000;
Explained.
**************************
TO QUERY THE PLAN TABLE :
**************************
SQL> SELECT RTRIM(ID)||' '||
2 LPAD(' ', 2*(LEVEL-1))||OPERATION
3 ||' '||OPTIONS
4 ||' '||OBJECT_NAME STEP_DESCRIPTION
5 FROM PLAN_TABLE
6 START WITH ID = 0 AND STATEMENT_ID = 'PKAR02'
7 CONNECT BY PRIOR ID = PARENT_ID
8 AND STATEMENT_ID = 'PKAR02'
9 ORDER BY ID;
STEP_DESCRIPTION
----------------------------------------------------
0 SELECT STATEMENT
1   FILTER
2     SORT GROUP BY
3       TABLE ACCESS FULL EMP
43. How are components interfaced in PeopleSoft?
44. How do you do the analysis of an ETL process?
45. What is Standard, Reusable Transformation and Mapplet?
Mappings contain two types of transformations, standard and reusable. Standard transformations exist within a single
mapping. You cannot reuse a standard transformation you created in another mapping, nor can you create a shortcut to that
transformation. However, often you want to create transformations that perform common tasks, such as calculating the average salary

in a department. Since a standard transformation cannot be used by more than one mapping, you have to set up the same
transformation each time you want to calculate the average salary in a department.
A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can
contain as many transformations as you need. A mapplet can contain transformations, reusable transformations, and shortcuts to
transformations.
46. How do u copy Mapping, Repository, Sessions?
To copy an object (such as a mapping or reusable transformation) from a shared folder, press the Ctrl key and drag and drop
the mapping into the destination folder.
To copy a mapping from a non-shared folder, drag and drop the mapping into the destination folder. In both cases, the
destination folder must be open with the related tool active.
For example, to copy a mapping, the Mapping Designer must be active. To copy a Source Definition, the Source Analyzer
must be active.
Copying Mapping:
To copy the mapping, open a workbook.
In the Navigator, click and drag the mapping slightly to the right, not dragging it to the workbook.
When asked if you want to make a copy, click Yes, then enter a new name and click OK.
Choose Repository-Save.
Repository Copying: You can copy a repository from one database to another. You use this feature before upgrading, to preserve the
original repository. Copying repositories provides a quick way to copy all metadata you want to use as a basis for a new repository.
If the database into which you plan to copy the repository contains an existing repository, the Repository Manager deletes the
existing repository. If you want to preserve the old repository, cancel the copy. Then back up the existing repository before copying the
new repository.
To copy a repository, you must have one of the following privileges:
Administer Repository privilege
Super User privilege
To copy a repository:
1. In the Repository Manager, choose Repository-Copy Repository.
2. Select a repository you wish to copy, then enter the following information:
--------------------------- ------------------- ------------------------------------------------
Copy Repository Field       Required/Optional   Description
--------------------------- ------------------- ------------------------------------------------
Repository                  Required            Name for the repository copy. Each repository name must be unique within the domain
                                                and should be easily distinguished from all other repositories.
Database Username           Required            Username required to connect to the database. This login must have the appropriate
                                                database permissions to create the repository.
Database Password           Required            Password associated with the database username. Must be in US-ASCII.
ODBC Data Source            Required            Data source used to connect to the database.
Native Connect String       Required            Connect string identifying the location of the database.
Code Page                   Required            Character set associated with the repository. Must be a superset of the code page of
                                                the repository you want to copy.
If you are not connected to the repository you want to copy, the Repository Manager asks you to log in.
3. Click OK.
4. If asked whether you want to delete the existing repository data in the second repository, click OK to delete it, or click Cancel to
preserve the existing repository.
Copying Sessions:
In the Server Manager, you can copy stand-alone sessions within a folder, or copy sessions in and out of batches.
To copy a session, you must have one of the following:
Create Sessions and Batches privilege with read and write permission
Super User privilege
To copy a session:
1. In the Server Manager, select the session you wish to copy.
2. Click the Copy Session button or choose Operations-Copy Session.
The Server Manager makes a copy of the session. The Informatica Server names the copy after the original session, appending a
number, such as session_name1.
47. What are shortcuts, and what are their advantages?
Shortcuts allow you to use metadata across folders without making copies, ensuring uniform metadata. A shortcut inherits all
properties of the object to which it points. Once you create a shortcut, you can configure the shortcut name and description.
When the object the shortcut references changes, the shortcut inherits those changes. By using a shortcut instead of a copy,
you ensure each use of the shortcut exactly matches the original object. For example, if you have a shortcut to a target definition, and
you add a column to the definition, the shortcut automatically inherits the additional column.
Shortcuts allow you to reuse an object without creating multiple objects in the repository. For example, you use a source
definition in ten mappings in ten different folders. Instead of creating 10 copies of the same source definition, one in each folder, you
can create 10 shortcuts to the original source definition.
You can create shortcuts to objects in shared folders. If you try to create a shortcut to a non-shared folder, the Designer
creates a copy of the object instead.
You can create shortcuts to the following repository objects:
Source definitions
Reusable transformations
Mapplets
Mappings

Target definitions
Business components
You can create two types of shortcuts:
Local shortcut. A shortcut created in the same repository as the original object.
Global shortcut. A shortcut created in a local repository that references an object in a global repository.
Advantages: One of the primary advantages of using a shortcut is maintenance. If you need to change all instances of an object, you
can edit the original repository object. All shortcuts accessing the object automatically inherit the changes.
Shortcuts have the following advantages over copied repository objects:
You can maintain a common repository object in a single location. If you need to edit the object, all shortcuts immediately inherit the
changes you make.
You can restrict repository users to a set of predefined metadata by asking users to incorporate the shortcuts into their work instead of
developing repository objects independently.
You can develop complex mappings, mapplets, or reusable transformations, then reuse them easily in other folders.
You can save space in your repository by keeping a single repository object and using shortcuts to that object, instead of creating
copies of the object in multiple folders or multiple repositories.
48. What are Pre-session and Post-session Options?
(Please refer to the Help topics on using shell commands, post-session commands, and email.)
The Informatica Server can perform one or more shell commands before or after the session runs. Shell commands are
operating system commands. You can use pre- or post- session shell commands, for example, to delete a reject file or session log, or
to archive target files before the session begins.
The status of the shell command, whether it completed successfully or failed, appears in the session log file.
To call a pre- or post-session shell command you must:
1. Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or batch file for Windows NT servers.
2. Configure the session to execute the pre- or post-session shell commands.
You can configure a session to stop if the Informatica Server encounters an error while executing pre-session shell commands.
For example, you might use a shell command to copy a file from one directory to another. For a Windows NT server you would use the
following shell command to copy the SALES_ADJ file from the target directory, L, to the source, H:
copy L:\sales\sales_adj H:\marketing
For a UNIX server, you would use the following command line to perform a similar operation:
cp sales/sales_adj marketing/
Tip: Each shell command runs in the same environment (UNIX or Windows NT) as the Informatica Server. Environment settings in one
shell command script do not carry over to other scripts. To run all shell commands in the same environment, call a single shell script
that in turn invokes other scripts.
49. What are Folder Versions?
In the Repository Manager, you can create different versions within a folder to help you archive work in development. You can
copy versions to other folders as well. When you save a version, you save all metadata at a particular point in development. Later
versions contain new or modified metadata, reflecting work that you have completed since the last version.
Maintaining different versions lets you revert to earlier work when needed. By archiving the contents of a folder into a version
each time you reach a development landmark, you can access those versions if later edits prove unsuccessful.
For example, you might create a folder version after completing a version of a difficult mapping, then continue working on the mapping. If you are
unhappy with the results of subsequent work, you can revert to the previous version, then create a new version to continue
development. Thus you keep the landmark version intact, but available for regression testing.
50. How do you automate/schedule sessions/batches, and did you use any tool for automating sessions/batches?
We scheduled our sessions/batches using the Server Manager.
You can either schedule a session to run at a given time or interval, or you can manually start the session.
You need the Create Sessions and Batches privilege with read and execute permissions, or the Super User privilege.
If you configure a batch to run only on demand, you cannot schedule it.
Note: We did not use any tool for automation process.
51. What are the differences between the 4.7 and 5.1 versions?
New transformations were added, such as the XML transformation and the MQ Series transformation; also, PowerMart and PowerCenter
are the same from version 5.1.
52. What procedures do you need to follow before moving mappings/sessions from testing/development to production?
53. How many values does the Informatica Server return when it passes through a Connected Lookup and an Unconnected Lookup?
A Connected Lookup can return multiple values, whereas an Unconnected Lookup returns only one value, the return value.
54. What is the difference between PowerMart and PowerCenter in 4.7.2?
If You Are Using PowerCenter
PowerCenter allows you to register and run multiple Informatica Servers against the same repository. Because you can run
these servers at the same time, you can distribute the repository session load across available servers to improve overall performance.
With PowerCenter, you receive all product functionality, including distributed metadata, the ability to organize repositories into
a data mart domain and share metadata across repositories

A PowerCenter license lets you create a single repository that you can configure as a global repository, the core component of
a data warehouse.
If You Are Using PowerMart
This version of PowerMart includes all features except distributed metadata and multiple registered servers. Also, the various
options available with PowerCenter (such as PowerCenter Integration Server for BW, PowerConnect for IBM DB2,
PowerConnect for SAP R/3, and PowerConnect for PeopleSoft) are not available with PowerMart.
55. What kinds of modifications can you perform with each Transformation?
Using transformations, you can modify data in the following ways:
--------------------------------------------------------------- -----------------------
Task                                                             Transformation
--------------------------------------------------------------- -----------------------
Calculate a value                                                Expression
Perform aggregate calculations                                   Aggregator
Modify text                                                      Expression
Filter records                                                   Filter, Source Qualifier
Order records queried by the Informatica Server                  Source Qualifier
Call a stored procedure                                          Stored Procedure
Call a procedure in a shared library or in the COM layer of      External Procedure
Windows NT
Generate primary keys                                            Sequence Generator
Limit records to a top or bottom range                           Rank
Normalize records, including those read from COBOL sources       Normalizer
Look up values                                                   Lookup
Determine whether to insert, delete, update, or reject records   Update Strategy
Join records from different databases or flat file systems       Joiner
56. Expressions in Transformations, Explain briefly how do u use?
Expressions in Transformations
To transform data passing through a transformation, you can write an expression. The most obvious examples of these are the
Expression and Aggregator transformations, which perform calculations on either single values or an entire range of values within a
port. Transformations that use expressions include the following:
--------------------- ------------------------------------------
Transformation        How It Uses Expressions
--------------------- ------------------------------------------
Expression            Calculates the result of an expression for each row passing through the transformation, using values from one or
                      more ports.
Aggregator            Calculates the result of an aggregate expression, such as a sum or average, based on all data passing through a
                      port or on groups within that data.
Filter                Filters records based on a condition you enter using an expression.
Rank                  Filters the top or bottom range of records, based on a condition you enter using an expression.
Update Strategy       Assigns a numeric code to each record based on an expression, indicating whether the Informatica Server should
                      use the information in the record to insert, delete, or update the target.
In each transformation, you use the Expression Editor to enter the expression. The Expression Editor supports the
transformation language for building expressions. The transformation language uses SQL-like functions, operators, and other
components to build the expression. For example, as in SQL, the transformation language includes the functions COUNT and SUM.
However, the Power Mart / Power Center transformation language includes additional functions not found in SQL.
When you enter the expression, you can use values available through ports. For example, if the transformation has two input
ports representing a price and sales tax rate, you can calculate the final sales tax using these two values. The ports used in the
expression can appear in the same transformation, or you can use output ports in other transformations.
57. In the case of a flat file source (which arrives through FTP), what happens if the file has not arrived? Where do you set this option?
You get a fatal error, which causes the server to fail/stop the session.
You can set the Event-Based Scheduling option in the Session Properties, under the General tab --> Advanced Options:
---------------------------- ------------------- -----------------
Event-Based                  Required/Optional   Description
---------------------------- ------------------- -----------------
Indicator File to Wait For   Optional            Required to use event-based scheduling. Enter the indicator file (or directory and file)
                                                 whose arrival schedules the session. If you do not enter a directory, the Informatica
                                                 Server assumes the file appears in the server variable directory $PMRootDir.
58. What is the Test Load Option and when you use in Server Manager?
When testing sessions in development, you may not need to process the entire source. If this is true, use the Test Load
option (Session Properties --> General tab --> Target Options): choose the target load type Normal (option button), check Test Load
(check box), and set the number of rows to test, e.g. 2000 (text box). Then start the session as usual.
59. What is difference between data scrubbing and data cleansing?
Scrubbing data is the process of cleaning up the junk in legacy data and making it accurate and useful for the next generations of
automated systems. This is perhaps the most difficult of all conversion activities. Very often, this is made more difficult when the
customer wants to make good data out of bad data. This is the dog work. It is also the most important and can not be done
without the active participation of the user.
Data Cleansing: a two-step process consisting of DETECTION and then CORRECTION of errors in a data set.

60. What is Metadata and Repository?


Metadata. Data about data.
It contains descriptive data for end users.
Contains data that controls the ETL processing.
Contains data about the current state of the data warehouse.
ETL updates metadata, to provide the most current state.
Repository. The place where you store the metadata is called a repository. The more sophisticated your repository, the more complex
and detailed metadata you can store in it. PowerMart and PowerCenter use a relational database as the repository.
61. What is a data-warehouse?
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management
decision-making processes. (OR) A data warehouse is a collection of data gathered from one or more data repositories to create a
new, central database.
Storage of large volumes of data, Historical data, Load and save - no updates
Reporting system, Query and analysis, Trends and forecasting
62. What are Data Marts?
A data mart is restricted to a single business process or group, focusing on group-specific analysis, and can be derived from the
more generic EDW.
The union of the data marts equals the data warehouse.
63. What is an ER Diagram?
Entity Relationship diagram. Used to represent OLTP systems. Highly normalized, depicting the relations between the
entities.
The relations can be (a) one to one, (b) one to many (crow's foot style), or (c) many to many.
64. What is a Star Schema?
Star schema: a modeling paradigm with a fact table in the middle connected radially to a number of dimension tables.
The dimension tables are highly de-normalized, which minimizes the number of joins required and gives good query performance; a sketch follows.
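An illustrative star-join query against a hypothetical schema (SALES_FACT with DATE_DIM and PRODUCT_DIM; the names are assumptions, not from this document):

-- One fact table joined to two dimensions on surrogate keys,
-- aggregated by dimension attributes.
SELECT d.calendar_month,
       p.product_category,
       SUM(f.sales_amount) AS total_sales
FROM   sales_fact  f
JOIN   date_dim    d ON d.date_key    = f.date_key
JOIN   product_dim p ON p.product_key = f.product_key
GROUP BY d.calendar_month, p.product_category;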
65. What is Dimensional Modelling?
A data warehousing modeling paradigm in which the entities and relations are remodeled around business entities for user
friendliness.
The normalized structures of the ER model are de-normalized and put into a STAR / SNOWFLAKE schema.
66. What is a Snowflake Schema?
Snowflake structure: Snowflake is a star schema with normalized dimensions.
67. What is Data cleaning?
Filling in missing values, smoothing noisy data, identifying & removing outliers, correcting inconsistencies, etc.;
68. What are the Different methods of loading Dimension tables?
Full load & Incremental load.
69. What are Aggregate tables?
After fact tables are built, any necessary aggregate fact tables must be built. Aggregate tables are structured to define "totals"
of data stored in granular fact tables. This pre-summarization allows for faster extracts from the warehouse, avoiding costly repeats of
"sum/group by" SQL requests. Aggregate tables may be built at the staging level or may be built as a post-load process in the
warehouse DBMS itself.
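A hedged sketch of such pre-summarization in SQL, rolling a hypothetical daily-grain SALES_FACT up to month and product (table and column names are assumptions):

-- Build a monthly aggregate so reports avoid repeated SUM/GROUP BY over detail rows.
CREATE TABLE sales_fact_monthly_agg AS
SELECT d.calendar_month,
       f.product_key,
       SUM(f.sales_amount) AS sales_amount,
       SUM(f.quantity)     AS quantity
FROM   sales_fact f
JOIN   date_dim   d ON d.date_key = f.date_key
GROUP BY d.calendar_month, f.product_key;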

70. What are the Differences between OLTP and OLAP?


OLTP                                                   OLAP
Functional: day-to-day operations                      Decision support
DB design: application oriented                        Subject oriented
Data: current, up to date; detailed, flat relational   Historical data; summarized, isolated
Unit of work: short, simple transaction                Complex query
Transaction oriented: quick response                   Analysis oriented: user friendly, robust

71. What is ETL?


The processes of Extracting, Transforming (or Transporting) and Loading (ETL) data from source systems into the data warehouse.
(OR) Extract, Transform and Load: a set of database utilities used to extract information from one database, transform it, and load it
into a second database. These tools are particularly useful for aggregating data from different database suppliers, e.g., Oracle and
Sybase, into a data warehouse.
72. What are the various ETL tools in the Market?
Data Stage, Data Junction, Abinitio, Informatica, Cognos, OWB
73. What are the various Reporting tools in the Market?
Seagate Crystal Reports, Business Objects, Microstrategy, Cognos
74. What is Fact table?
A table in a star schema that contains facts. A fact table typically has two types of columns: those that contain facts and those
that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign
keys.
(OR)
The tables which are extracted from heterogeneous sources and used in the Data Warehouse
75. What is a dimension table?
Dimension tables describe the business entities of an enterprise, represented as hierarchical, categorical information such as
time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.
76. What is a lookup table?
Another name for Dimension table.
77. What is a general purpose scheduling tool? Name some of them?
To schedule jobs. Operating system can be used for scheduling.
Tools: e.g., cron (Unix), Autosys, Control-M, Tivoli Workload Scheduler.
78. What are modeling tools available in the Market? Name some of them?
Visio-based database modeling component, ERWin

79. What is real time data-warehousing?


An enterprise-wide implementation that replicates data from the publication tables on different servers/platforms to a single
subscription table with minimum possible time lag. This implementation effectively consolidates data from multiple sources with least
time lag, so that business can see the real-time reports, as in OLTP.
80. What is data mining?
Data mining is about the discovery of knowledge, rules, trends, or patterns in large quantities of data.
The process of finding hidden patterns and relationships in data. For instance, a consumer goods company may track 200
variables about each consumer. There are scores of possible relationships among the 200 variables. Data mining tools will identify the
significant relationships
81. What is Normalization? First Normal Form, Second Normal Form , Third Normal Form?
Normalization is a step-by-step process of removing redundancies and dependencies of attributes in a data structure. The
condition of the data at completion of each step is described as a "normal form."
First Normal Form: every attribute holds atomic values and there are no repeating groups.
Second Normal Form: 1NF, plus every non-key attribute depends on the whole primary key (no partial dependencies on part of a composite key).
Third Normal Form: 2NF, plus no non-key attribute depends on another non-key attribute (no transitive dependencies).
82. What is ODS?
The form that the data warehouse takes in the operational environment. Operational data stores can be updated, provide rapid
and consistent response times, and contain only a limited amount of historical data.
83. What type of Indexing mechanism do we need to use for a typical data warehouse?
Bitmap and B-tree indexes; a sketch of both follows.
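Hedged Oracle examples of the two index types on hypothetical warehouse tables (all names are assumptions):

-- Bitmap index: suits low-cardinality columns such as dimension foreign keys
-- or flags in a read-mostly warehouse.
CREATE BITMAP INDEX bix_sales_fact_prod
    ON sales_fact (product_key);

-- B-tree index: suits high-cardinality, selective lookups,
-- e.g. a dimension's natural/business key.
CREATE INDEX ix_product_dim_code
    ON product_dim (product_code);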
84. Which columns go to the fact table and which columns go to the dimension table?
When the user needs to see a measure broken down by some elements, the measures being broken down go to the fact table, and the
elements they are broken down by go to the dimension tables.
85. What is the level of granularity of a fact table? What does this signify? (For example, with weekly-level
summarization there is no need to keep the invoice number in the fact table anymore.)
Granularity is the lowest possible level of detail of the data stored in a data warehouse.
86. How are the Dimension tables designed?
De-Normalized, Wide, Short, Use Surrogate Keys, Contain Additional date fields and flags.
87. What are slowly changing dimensions?
Dimension elements are static by definition, but by nature they may change, and this change is slower than the change in facts.
Hence the name slowly changing. We need to update such dimensions, which may change slowly over time.
There are three types in the SCD methodology: overwrite (Type 1), insert a new record (Type 2), and track a defined number of
columns (Type 3); a sketch of Types 1 and 2 follows.
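A hedged SQL sketch of the Type 1 and Type 2 approaches against a hypothetical CUSTOMER_DIM (the table, columns, and the sequence CUSTOMER_DIM_SEQ are assumptions):

-- Type 1: overwrite the changed attribute in place (no history kept).
UPDATE customer_dim
SET    city = 'Boston'
WHERE  customer_id = 1001;              -- natural/business key

-- Type 2: expire the current row and insert a new version, preserving history
-- (effective/end dates and a current-row flag are assumed on the dimension).
UPDATE customer_dim
SET    end_date     = SYSDATE,
       current_flag = 'N'
WHERE  customer_id  = 1001
AND    current_flag = 'Y';

INSERT INTO customer_dim
       (customer_key, customer_id, city, start_date, end_date, current_flag)
VALUES (customer_dim_seq.NEXTVAL, 1001, 'Boston', SYSDATE, NULL, 'Y');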
88. What are non-additive facts? (e.g., inventory levels, bank account balances)
Facts are generally additive, but some business facts may be non-additive, such as inventory levels and bank balances.

89. What is VLDB?


If a database is too large to back up within the available time frame, it is a VLDB (very large database).
90. What is SCD1 , SCD2 , SCD3 ?
The three types of Slowly Changing Dimensions; answered above under question 87.
91. How do you load the time dimension?
The time dimension can be loaded just once, because all of its attributes can be calculated in advance from the date itself; a sketch follows.
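A hedged Oracle sketch of that one-time load, generating one row per calendar day for a hypothetical DATE_DIM (the names and the 20-year span are assumptions):

-- Generate one row per day and derive the attributes from the date itself.
INSERT INTO date_dim (date_key, calendar_date, calendar_month, calendar_year, day_name)
SELECT TO_NUMBER(TO_CHAR(d, 'YYYYMMDD')),   -- surrogate key in YYYYMMDD form
       d,
       TO_CHAR(d, 'YYYY-MM'),
       TO_NUMBER(TO_CHAR(d, 'YYYY')),
       TO_CHAR(d, 'Day')
FROM  (SELECT DATE '2000-01-01' + LEVEL - 1 AS d
       FROM   dual
       CONNECT BY LEVEL <= 365 * 20);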
92. What are Semi-additive and factless facts? And in which scenario will you use such kinds of fact tables?
Facts that are not aggregatable across all dimensions are called semi-additive facts.
A fact table without any metrics in it is a factless fact table. Relational tables between dimensions, and the resultant tables created while resolving many-to-many relations, are essentially factless fact tables.
93. What are conformed dimensions?
Conformed dimensions can be used to analyze facts from two or more data marts. Suppose you have a shipping data mart
(telling you what you've shipped to whom and when) and a sales data mart (telling you who has purchased what and when). Both
marts require a customer dimension and a time dimension. If they're the same dimension, then you have conformed dimensions,
allowing you to extract and manipulate facts relating to a particular customer from both marts, answering questions such as whether
late shipments have affected sales to that customer.
94. Differences between star and snowflake
A snowflake schema is a set of tables comprised of a single, central fact table surrounded by normalized dimension
hierarchies.
Gives good performance, with aggregate tables, when there are reports touching most of the levels of dimensions.
Occupies less space.
A star schema is a set of tables comprised of a single, central fact table surrounded by de-normalized dimensions
Gives good performance with fewer joins.
More space is required
95. What is a staging area? Do we need it? What is the purpose of a staging area?
Data staging is actually a collection of processes used to prepare source system data for loading a data warehouse. Staging
includes the following steps:
Source data extraction, Data transformation (restructuring),
Data transformation (data cleansing, value transformations),
Surrogate key assignments.
96. What is a three tier data warehouse?
Three tiered data warehousing means there are 3 tiers of data, each designed to meet a specific set of end user requirements
Operational Data Systems (Tier 1)
Operations of a business on a day to day basis
Data Warehouse (Tier 2)
This data layer may be comprised of multiple data structures; the operational data store (ODS) for tactical decision support
applications which require transaction level detail as well as the data warehouse which provides a single common set of data bases
designed specifically for all decision support applications in a business.
Data Mart (Tier 3)
This tier is customized for a specific department or set of users like sales/marketing analysts, financial analysts, customer
satisfaction, etc.
97. What are the various methods of getting incremental records or delta records from the source systems? What are the
various tools? - Name a few
We can compare the record creation/modification dates against the last ETL run date to get the delta load from the source systems; a sketch follows.
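A hedged SQL sketch of that timestamp-based delta extraction; SRC_ORDERS, LAST_MODIFIED_DATE, and the ETL_CONTROL table holding the last run time are all assumptions:

-- Pull only rows created or changed since the previous successful ETL run.
SELECT s.*
FROM   src_orders s
WHERE  s.last_modified_date >
       (SELECT c.last_run_date
        FROM   etl_control c
        WHERE  c.source_name = 'SRC_ORDERS');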
98. What is latest version of Power Center / Power Mart?
Power center 7.0, Power mart 7.0
99. What is the difference between Power Center & Power Mart?
Informatica PowerCenter license - has all options, including distributed metadata, ability to organize repositories into a data mart
domain and share metadata across repositories.
PowerMart - a limited license (all features except distributed metadata and multiple registered servers). Only local repository can be
created
100. What are the various transformation available?
Source Qualifier, Filter, Router, Joiner, Aggregator, Expression, Rank, Lookup, Update Strategy, Sequence Generator, Stored
Procedure, External Procedure, Advanced External Procedure, XML Source Qualifier, Normalizer
101. What are the modules in Power Mart?
Same as Power center
102. What are the different Lookup methods used in Informatica?
Static and Dynamic
103. Can Informatica load heterogeneous targets from heterogeneous sources?
Informatica 6.0 can

104. How do we call shell scripts from Informatica?


Using pre-/post-session shell commands in the session tasks.
105. What is Informatica Metadata and where is it stored?
Data about data, its stored in the repository
106. What is a mapping, session, worklet, workflow, mapplet?
Mapping : logical flow of data with rules
Session: instance of mapping
Mapplet: reusable mapping
Workflow: logical flow of data with control points
Worklet: reusable workflow
107. How can we use mapping variables in Informatica? Where do we use them?
They make the mapping generic; we can supply their values at run time, for example through parameter files and session parameters.
108. What are parameter files? Where do we use them?
To pass session variable values.

109. Can we override a native sql query within Informatica? Where do we do it? How do we do it?
Yes, with the SQL Query override in the Source Qualifier properties within the mapping.
E.g., SELECT ... FROM ... WHERE ...; it is advised not to use an ORDER BY clause here. A sketch follows.
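An illustrative override query (hypothetical ORDERS/CUSTOMERS tables), pushing the filter and join to the source database and omitting ORDER BY as noted above:

-- Example SQL override for a Source Qualifier.
SELECT o.order_id,
       o.customer_id,
       o.order_amount,
       c.customer_name
FROM   orders o,
       customers c
WHERE  o.customer_id = c.customer_id
AND    o.order_date >= TRUNC(SYSDATE) - 1;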
110. Can we use procedural logic inside Informatica? If yes, how? If no, how can we use external procedural logic in
Informatica?
Yes; procedural logic can be invoked through the Stored Procedure transformation, and external procedural logic through the External
Procedure / Advanced External Procedure transformations.
Do we need an ETL tool? When do we go for the tools in the market?
When the ETL process requires less technical skill, needs more automation, and involves heavy volumes of data.
111. How do we extract SAP data Using Informatica?

112. What is ABAP? What are IDOCS?


113. How to determine what records to extract?
* Timestamps: use date/time columns that record when a record was created or changed.
* Deletes are logical with timestamped deletes: if a record is marked inactive with a date, we can treat it as a deleted record.
* Triggers on source system tables (generally avoided, as this decreases source system efficiency).
* Application integration software, e.g. TIBCO, MQSeries.
* File compares (the least preferred method).
* Snapshots in Oracle (e.g. daily): replica structures maintained in another system, holding data as of one point in time.
* Oracle Streams.
114. What is Full load & Incremental or Refresh load?
Full load and refresh load are the same, generally used for dimensions. Incremental load is used for facts. Both are defined above.
Techniques of Error Handling
Ignore; reject bad records to a flat file; or load the records (with default values) and review them.
115. What are snapshots? What are materialized views & where do we use them? What is a materialized view log?
Snapshots are used when we require the same structure in another database and want to pull data whenever needed.
Materialized views are structures that pull data from one or more tables like a view, but give better query performance and easier
maintenance than a plain view. A materialized view log records changes on the base table so the materialized view can be refreshed
incrementally; a sketch follows.
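A hedged Oracle sketch of a materialized view log plus a fast-refreshable aggregate materialized view over a hypothetical SALES_FACT (the names and log options are assumptions; exact requirements depend on the refresh mode):

-- The log captures changes on the base table so the summary can be
-- fast-refreshed instead of fully rebuilt.
CREATE MATERIALIZED VIEW LOG ON sales_fact
    WITH ROWID, SEQUENCE (product_key, sales_amount)
    INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW sales_by_product_mv
    BUILD IMMEDIATE
    REFRESH FAST ON DEMAND
AS
SELECT product_key,
       COUNT(*)            AS row_count,      -- required for fast refresh of aggregates
       COUNT(sales_amount) AS amount_count,
       SUM(sales_amount)   AS sales_amount
FROM   sales_fact
GROUP BY product_key;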

116. What is partitioning? What are the types of partitioning?


Logical splitting of a table so that only the minimum required chunk of data is accessed. Common types are range, list, hash, and composite partitioning; a sketch follows.
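A hedged Oracle sketch of range partitioning a hypothetical fact table by a YYYYMMDD date key (the names and boundaries are assumptions):

-- Each month lives in its own partition, so queries and purges
-- touch only the relevant chunk of data.
CREATE TABLE sales_fact_part (
    date_key     NUMBER       NOT NULL,
    product_key  NUMBER       NOT NULL,
    sales_amount NUMBER(12,2)
)
PARTITION BY RANGE (date_key) (
    PARTITION p_2003_12 VALUES LESS THAN (20040101),
    PARTITION p_2004_01 VALUES LESS THAN (20040201),
    PARTITION p_max     VALUES LESS THAN (MAXVALUE)
);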
117. When do we Analyze the tables? How do we do it?
This is Oracle-specific. Tables are analyzed to gather optimizer statistics, using the ANALYZE command or the DBMS_STATS package; a sketch follows.
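Hedged examples of both approaches (the owner, table, and sample percentage are assumptions):

-- Older command syntax:
ANALYZE TABLE sales_fact COMPUTE STATISTICS;

-- Package-based approach:
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'DWH',
        tabname          => 'SALES_FACT',
        estimate_percent => 20,
        cascade          => TRUE);  -- also gather statistics on the table's indexes
END;
/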
118. Compare ETL & Manual development of Business Intelligence?
119. What is Business Intelligence?
Embedding decision support into the business processes so that the fruits of the data warehouse are used effectively.
120. What is OLTP?
OLTP (Online Transaction Processing): the transaction processing that supports the daily business operations.

121. What is OLAP?


Online Analytical Processing: Drilling down on various data dimensions to gain a more detailed view of the data. For
instance, a user might begin by looking at North American sales and then drill down on regional sales, then sales by state, and then
sales by major metro area. Enables a user to view different perspectives of the same data to facilitate decision-making.
122. What is OLAP, MOLAP, ROLAP, DOLAP, HOLAP?
ROLAP = relational OLAP: users see cubes, but under the hood it is purely relational tables; MicroStrategy is a ROLAP product.
MOLAP = multidimensional OLAP: users see cubes and under the hood there is one big cube; Oracle Express used to be a MOLAP
product.
DOLAP = desktop OLAP: users see many cubes and under the hood there are many small cubes, e.g. Cognos PowerPlay.
HOLAP = hybrid OLAP: combines MOLAP and ROLAP, e.g. Essbase.
123. Name some of the standard Business Intelligence tools in the market?
BO, MSTR, Cognos
124.What are the various modules in Business Objects product Suite?
Designer, BO, Broadcast Agent, Infoview, WebI, Supervisor, ZABO
125. What is a Universe?
The Business Objects semantic layer: a mapping between the source system and the logical business entities (objects) used in reports.
126. What is BAS? What is the function?
127. How do we enhance the functionality of the reports in BO? (VBA)
