The following pages summarize some of the questions that typically arise during development and suggest potential resolutions.
Q: How does source format affect performance? (i.e., is it more efficient to source from a flat file rather than a database?)
In general, a flat file located on the server machine loads faster than a database located on the server machine. Fixed-width files are faster than delimited files because delimited files require extra parsing. However, if there is an intent to perform intricate
transformations before loading to the target, it may be advisable to first load the flat file into a relational database, which allows the
Power Center mappings to access the data in an optimized fashion by using filters and custom SQL SELECTs where appropriate.
Q: What are some considerations when designing the mapping? (i.e. what is the impact of having multiple targets populated
by a single map?)
With Power Center, it is possible to design a mapping with multiple targets. You can then load the targets in a specific order
using Target Load Ordering. The recommendation is to limit the amount of complex logic in a mapping. Not only is it easier to debug a
mapping with a limited number of objects, but they can also be run concurrently and make use of more system resources. When using
multiple output files (targets) consider writing to multiple disks or file systems simultaneously. This minimizes disk seeks and applies to
a session writing to multiple targets, and to multiple sessions running simultaneously.
Q: What are some considerations for determining how many objects and transformations to include in a single mapping?
There are several items to consider when building a mapping. The business requirement is always the first consideration,
regardless of the number of objects it takes to fulfill the requirement. The most expensive use of the DTM is passing unnecessary data
through the mapping. It is best to use filters as early as possible in the mapping to remove rows of data that are not needed. This is the
SQL equivalent of the WHERE clause. Using the filter condition in the Source Qualifier to filter out the rows at the database level is a
good way to increase the performance of the mapping.
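The effect of filtering early can be sketched in plain Python (this is an illustration of the principle, not Informatica syntax): when unneeded rows are dropped before an expensive transformation, the costly step only ever sees the rows that survive, mirroring a filter condition pushed down into the Source Qualifier.

```python
def expensive_transform(row):
    # Stand-in for a costly per-row transformation.
    return {**row, "amount_x2": row["amount"] * 2}

def load_with_early_filter(rows, predicate):
    out = []
    for row in rows:
        if not predicate(row):   # drop unneeded rows as early as possible
            continue
        out.append(expensive_transform(row))
    return out

rows = [{"region": "EU", "amount": 10}, {"region": "US", "amount": 20},
        {"region": "EU", "amount": 30}, {"region": "APAC", "amount": 40}]
loaded = load_with_early_filter(rows, lambda r: r["region"] == "EU")
# Only the two EU rows ever reach the expensive step.
```

Pushing the same predicate into the Source Qualifier's filter condition goes one step further: the rows are removed at the database level and never enter the mapping at all.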
Log File Organization
Q: Where is the best place to maintain Session Logs?
One often-recommended location is the default "SessLogs" folder in the Informatica directory, keeping all log files in the same
directory.
Q: What documentation is available for the error codes that appear within the error log files?
Log file errors and descriptions appear in Appendix C of the Power Center Troubleshooting Guide. Error information also
appears in the Power Center Help File within the Power Center client applications. For other database-specific errors, consult your
Database User Guide.
Scheduling Techniques
Q: What are the benefits of using workflows with multiple tasks rather than a workflow with a stand-alone session?
Using a workflow to group logical sessions minimizes the number of objects that must be managed to successfully load the
warehouse. For example, a hundred individual sessions can be logically grouped into twenty workflows. The Operations group can then
work with twenty workflows to load the warehouse, which simplifies the operations tasks associated with loading the targets.
Workflows can be created to run sequentially or concurrently or have tasks in different paths doing either.
A sequential workflow runs sessions and tasks one at a time, in a linear sequence. Sequential workflows help ensure that
dependencies are met as needed. For example, a sequential workflow ensures that session1 runs before session2 when session2 is
dependent on the load of session1, and so on. It's also possible to set up conditions to run the next session only if the previous session
was successful, or to stop on errors, etc.
A concurrent workflow groups logical sessions and tasks together, like a sequential workflow, but runs all the tasks at one
time. This can reduce the load times into the warehouse, taking advantage of hardware platforms' Symmetric Multi-Processing (SMP)
architecture.
Other workflow options, such as nesting worklets within workflows, can further reduce the complexity of loading the warehouse.
These capabilities allow for the creation of very complex and flexible workflow streams without the use of a third-party scheduler.
Q: Assuming a workflow failure, does Power Center allow restart from the point of failure?
No. When a workflow fails, you can choose to start a workflow from a particular task but not from the point of failure. It is
possible, however, to create tasks and flows based on error handling assumptions.
Q: What guidelines exist regarding the execution of multiple concurrent sessions / workflows within or across applications?
Workflow Execution needs to be planned around two main constraints:
Available system resources
Memory and processors
The number of sessions that can run at one time depends on the number of processors available on the server. The load
manager is always running as a process. As a general rule, a session will be compute-bound, meaning its throughput is limited by the
availability of CPU cycles. Most sessions are transformation intensive, so the DTM always runs. Also, some sessions require more I/O,
so they use less processor time. Generally, a session needs about 120 percent of a processor for the DTM, reader, and writer in total.
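The sizing rule above can be turned into a back-of-the-envelope calculation. The 120-percent-per-session figure and the one-CPU headroom reservation below are assumptions taken from the text, not an Informatica API; this is a rough planning sketch only.

```python
def max_concurrent_sessions(cpu_count, per_session_cpu=1.2, reserved_cpus=1):
    """Estimate concurrent sessions, reserving CPUs for the OS and load manager."""
    usable = max(cpu_count - reserved_cpus, 0)
    return int(usable / per_session_cpu)

# An 8-CPU server with 1 CPU reserved supports about 5 concurrent sessions.
print(max_concurrent_sessions(8))  # 5
```

Treat the result as a starting point for the "trial and error" tuning described below, not a hard limit.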
For concurrent sessions:
One session per processor is about right; you can run more, but that requires a "trial and error" approach to determine what
number of sessions starts to affect session performance and possibly adversely affect other executing tasks on the server.
The sessions should run at "off-peak" hours to have as many available resources as possible.
Even after available processors are determined, it is necessary to look at overall system resource usage. Determining memory usage is
more difficult than the processor calculation; it tends to vary according to system load and the number of Informatica sessions running.
The first step is to estimate memory usage, accounting for:
Operating system kernel and miscellaneous processes
Database engine
Informatica Load Manager
The DTM process creates threads to initialize the session, read, write and transform data, and handle pre- and post-session
operations.
More memory is allocated for lookups, aggregates, ranks, sorters and heterogeneous joins in addition to the shared memory
segment.
At this point, you should have a good idea of what is left for concurrent sessions. It is important to arrange the production run
to maximize use of this memory. Remember to account for sessions with large memory requirements; you may be able to run only one
large session, or several small sessions concurrently.
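The memory checklist above reduces to a simple budget: subtract the fixed consumers from total RAM, then see how many sessions of a given size fit. All of the figures in this sketch are illustrative assumptions, not measurements from a real server.

```python
def sessions_that_fit(total_mb, kernel_mb, db_engine_mb, load_manager_mb,
                      per_session_mb):
    # Whatever remains after the fixed consumers is available for sessions.
    remaining = total_mb - kernel_mb - db_engine_mb - load_manager_mb
    return max(remaining // per_session_mb, 0)

# e.g. a 16 GB box: 1 GB kernel/misc, 6 GB database engine, 512 MB load
# manager leaves ~8.5 GB; at 1 GB per session that is 8 concurrent sessions.
print(sessions_that_fit(16384, 1024, 6144, 512, 1024))  # 8
```

A single session with large lookup or aggregate caches can consume the whole remainder on its own, which is why large sessions often run alone.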
Load Order Dependencies are also an important consideration because they often create additional constraints. For example,
load the dimensions first, then facts. Also, some sources may only be available at specific times, some network links may become
saturated if overloaded, and some target tables may need to be available to end users earlier than others.
Q: Is it possible to perform two "levels" of event notification? At the application level and the Informatica Server level to notify
the Server Administrator?
The application level of event notification can be accomplished through post-session e-mail. Post-session e-mail allows you to
create two different messages, one to be sent upon successful completion of the session, the other to be sent if the session fails.
Messages can be a simple notification of session completion or failure, or a more complex notification containing specifics about the
session.
You can use the following variables in the text of your post-session e-mail:
E-mail Variable Description
%s Session name
%l Total records loaded
%r Total records rejected
%e Session status
%t Table details, including read throughput in bytes/second and write throughput in rows/second
%b Session start time
%c Session completion time
%i Session elapsed time (session completion time-session start time)
%g Attaches the session log to the message
%m Name and version of the mapping used in the session
%d Name of the folder containing the session
%n Name of the repository containing the session
%a Attaches the named file. The file must be local to the Informatica Server.
On Windows NT, you can attach a file of any type.
On UNIX, you can only attach text files. If you attach a non-text file, the send might fail.
Note: The filename cannot include the Greater Than character (>) or a line break.
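As a toy illustration of how these codes behave, here is a plain-Python substitution sketch. The %-codes come from the table above, but the expansion routine itself is hypothetical and is not how the PowerCenter Server implements the feature.

```python
def expand_email_body(template, session_info):
    # session_info maps each %-code to its value for this run.
    out = template
    for code, value in session_info.items():
        out = out.replace(code, str(value))
    return out

body = expand_email_body(
    "Session %s finished with status %e: %l rows loaded, %r rejected.",
    {"%s": "s_load_sales", "%e": "Completed", "%l": 1000, "%r": 0},
)
print(body)
# Session s_load_sales finished with status Completed: 1000 rows loaded, 0 rejected.
```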
The PowerCenter Server on UNIX uses rmail to send post-session e-mail. The repository user who starts the PowerCenter server must
have the rmail tool installed in the path in order to send e-mail.
To verify the rmail tool is accessible:
1. Login to the UNIX system as the PowerCenter user who starts the PowerCenter Server.
2. Type rmail at the prompt and press Enter.
3. Type '.' to indicate the end of the message and press Enter.
4. You should receive a blank e-mail from the PowerCenter user's e-mail account. If not, locate the directory where rmail resides and
add that directory to the path.
5. When you have verified that rmail is installed correctly, you are ready to send post-session e-mail.
The output should look like the following:
Session complete.
Session name: sInstrTest
Total Rows Loaded = 1
Total Rows Rejected = 0
Rows Loaded  Rows Rejected  Read Throughput (bytes/sec)  Write Throughput (rows/sec)  Table Name  Status
1            0              30                           1                            t_Q3_sales  Completed
No errors encountered.
Start Time: Tue Sep 14 12:26:31 1999
Completion Time: Tue Sep 14 12:26:41 1999
A mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping
variable to the repository at the end of a session run and uses that value the next time you run the session.
18. Can you use the mapping parameters or variables created in one mapping in another mapping?
No. You can use mapping parameters or variables only in transformations of the same mapping or mapplet in which you
created them.
19. Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
20. How can you improve session performance in an Aggregator transformation?
Use sorted input. When the input is sorted on the group-by ports, the Aggregator does not need to cache all rows before
producing output, which reduces cache usage and improves performance.
34. Which transformation should we use to normalize COBOL and relational sources?
The Normalizer transformation. When you drag a COBOL source into the Mapping Designer workspace, the Normalizer
transformation automatically appears, creating input and output ports for every column in the source.
35. How does the Informatica server sort string values in the Rank transformation?
When the Informatica server runs in the ASCII data movement mode, it sorts session data using a binary sort order. If you
configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the
specified number of rows with the highest binary values for the string.
36. What are the Rank caches?
During the session, the Informatica server compares an input row with rows in the data cache. If the input row out-ranks a
stored row, the Informatica server replaces the stored row with the input row. The Informatica server stores group information in an
index cache and row data in a data cache.
37. What is the Rank index in Rank transformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the Rank
Index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5
salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
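The top-salespersons example can be sketched in plain Python: pick the top N rows per group and assign a RANKINDEX of 1 to N within each group. This is an illustration of what the Rank transformation computes, not its implementation.

```python
from collections import defaultdict

def rank_top_n(rows, group_key, rank_key, n):
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    ranked = []
    for key, members in groups.items():
        members.sort(key=lambda r: r[rank_key], reverse=True)
        for idx, row in enumerate(members[:n], start=1):
            ranked.append({**row, "RANKINDEX": idx})
    return ranked

sales = [
    {"quarter": "Q1", "rep": "Ann", "amount": 900},
    {"quarter": "Q1", "rep": "Bob", "amount": 700},
    {"quarter": "Q1", "rep": "Cid", "amount": 800},
]
top2 = rank_top_n(sales, "quarter", "amount", 2)
# Ann gets RANKINDEX 1 (900), Cid gets RANKINDEX 2 (800); Bob is dropped.
```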
38. What is the Router transformation?
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test
data. However, a Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. A Router
transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the
conditions to a default output group. If you need to test the same input data based on multiple conditions, use a Router Transformation
in a mapping instead of creating multiple Filter transformations to perform the same task.
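The Router-versus-Filter distinction can be sketched in plain Python: each row is tested against the user-defined conditions, rows matching a condition go to that condition's group, and rows matching none land in the default group. This is an illustration of the semantics described above, not the transformation itself.

```python
def route(rows, conditions):
    groups = {name: [] for name in conditions}
    groups["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, cond in conditions.items():
            if cond(row):          # a row can satisfy several conditions
                groups[name].append(row)
                matched = True
        if not matched:
            groups["DEFAULT"].append(row)   # default group catches the rest
    return groups

routed = route(
    [{"amt": 5}, {"amt": 50}, {"amt": 500}],
    {"small": lambda r: r["amt"] < 10, "large": lambda r: r["amt"] > 100},
)
# {"amt": 50} matches neither condition, so it lands in DEFAULT.
```

A Filter transformation, by contrast, would simply drop the row that matched no condition.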
39. What are the types of groups in a Router transformation?
Input groups and output groups.
The Designer copies property information from the input ports of the input group to create a set of output ports for each output
group.
There are two types of output groups:
User-defined groups
Default group
You cannot modify or delete the default group.
56. What are the mappings that we use for slowly changing dimension tables?
Type1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1
Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions
of dimensions in the table.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the
target table by versioning the primary key and creating a version number for each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of
dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new
dimensions to the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing
dimension, the Informatica Server saves existing data in different columns of the same row and replaces the existing data with the
updates.
57. What are the different types of Type 2 dimension mappings?
Type 2 Dimension/Version Data mapping: the updated dimension from the source is inserted into the target along with a new version
number, and a newly added dimension in the source is inserted into the target with a new primary key.
Type 2 Dimension/Flag Current mapping: this mapping is also used for slowly changing dimensions. In addition, it creates a flag value
for each changed or new dimension. The flag indicates whether the dimension is current: current dimensions are saved with flag value
1, and superseded dimensions are saved with value 0.
Type 2 Dimension/Effective Date Range mapping: another flavor of Type 2 mapping used for slowly changing dimensions. It also
inserts both new and changed dimensions into the target, and changes are tracked by an effective date range for each version of each
dimension.
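The Version Data flavor can be sketched in a few lines of Python: new dimensions are inserted with version 0, and a changed dimension is inserted as a new row with an incremented version, leaving the history intact. The key names, the version numbering scheme, and the single comparison column here are illustrative assumptions, not the generated mapping's actual logic.

```python
def apply_type2_version(target, source):
    # target: rows of {"key", "name", "version"}; source: current snapshot.
    latest = {}
    for row in target:
        cur = latest.get(row["key"])
        if cur is None or row["version"] > cur["version"]:
            latest[row["key"]] = row
    for src in source:
        cur = latest.get(src["key"])
        if cur is None:
            target.append({**src, "version": 0})                   # brand-new dimension
        elif cur["name"] != src["name"]:
            target.append({**src, "version": cur["version"] + 1})  # changed: keep history
    return target

dim = [{"key": 1, "name": "Acme", "version": 0}]
dim = apply_type2_version(dim, [{"key": 1, "name": "Acme Corp"},
                                {"key": 2, "name": "Globex"}])
# dim now holds 3 rows: both versions of key 1, plus the new key 2.
```

The Flag Current and Effective Date Range flavors follow the same insert-only pattern, tracking currency with a 0/1 flag or a date range instead of a version number.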
58. How can you recognize whether newly added rows in the source were inserted into the target?
In a Type 2 mapping, there are three options for recognizing newly added rows:
Version number
Flag value
Effective date range
59. What are the two types of processes that Informatica runs for a session?
Load manager Process: Starts the session, creates the DTM process, and sends post-session email when the session completes.
The DTM process: Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session
operations.
60. What are the new features of the Server Manager in Informatica 5.0?
You can use command line arguments for a session or batch. This allows you to change the values of session parameters,
mapping parameters, and mapping variables.
Parallel data processing: this feature is available for PowerCenter only. If you run the Informatica server on an SMP system, you can
use multiple CPUs to process a session concurrently.
Process session data using threads: the Informatica server runs the session in two processes, as explained in the previous question.
61. Can you generate reports in Informatica?
Yes. By using the Metadata Reporter, you can generate reports in Informatica.
62. What is the Metadata Reporter?
It is a web-based application that enables you to run reports against repository metadata.
With the Metadata Reporter, you can access information about your repository without knowledge of SQL, the transformation language,
or the underlying tables in the repository.
63. Define mapping and sessions?
Mapping: It is a set of source and target definitions linked by transformation objects that define the rules for transformation.
Session: It is a set of instructions that describe how and when to move data from source to targets.
64. Which tool do you use to create and manage sessions and batches, and to monitor and stop the Informatica server?
The Informatica Server Manager.
65. Why do we partition a session in Informatica?
Partitioning improves session performance by reducing the time required to read the source and load the data into the
target.
66. What tasks are necessary to achieve session partitioning?
Configure the session to partition source data.
Install the Informatica server on a machine with multiple CPUs.
67. How does the Informatica server increase session performance through partitioning the source?
For relational sources, the Informatica server creates multiple connections, one for each partition of a single source, and
extracts a separate range of data through each connection. The Informatica server reads multiple partitions of a single source
concurrently. Similarly, for loading, the Informatica server creates multiple connections to the target and loads partitions of data
concurrently.
For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, the Informatica server
creates a separate file for each partition of a source file. You can choose to merge the targets.
68. Why do you use repository connectivity?
Each time you edit or schedule a session, the Informatica server communicates directly with the repository to check whether
the session and users are valid. All the metadata for sessions and mappings is stored in the repository.
69. What tasks does the Load Manager process perform?
Manages session and batch scheduling: when you start the Informatica server, the Load Manager launches and queries the
repository for a list of sessions configured to run on the Informatica server. When you configure a session, the Load Manager maintains
a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the
repository to perform validations and verifications prior to starting the DTM process.
Locking and reading the session: when the Informatica server starts a session, the Load Manager locks the session in the repository.
Locking prevents you from starting the same session again while it is running.
Reading the parameter file: if the session uses a parameter file, the Load Manager reads the parameter file and verifies that the
session-level parameters are declared in the file.
Verifying permissions and privileges: when the session starts, the Load Manager checks whether the user has the privileges to run the
session.
Creating log files: the Load Manager creates a log file containing the status of the session.
70. What is the DTM process?
After the Load Manager performs validations for the session, it creates the DTM process. The DTM process creates and
manages the threads that carry out the session tasks. It creates the master thread, and the master thread creates and manages all the
other threads.
Indicator file: if you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row,
the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
Output file: if the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the
session property sheet.
Cache files: when the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and
data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
74. In which circumstances does the Informatica server create reject files?
When it encounters DD_REJECT in an Update Strategy transformation.
When a row violates a database constraint.
When a field in the row was truncated or overflowed.
75. What is polling?
Polling displays updated information about the session in the monitor window. The monitor window displays the status of each
session when you poll the Informatica server.
76. Can you copy a session to a different folder or repository?
Yes. Using the Copy Session Wizard, you can copy a session to a different folder or repository. However, the target folder or
repository must contain the mapping for that session. If it does not, you must copy the mapping first, and then copy the session.
77. What is a batch, and what are the types of batches?
A grouping of sessions is known as a batch. There are two types of batches:
Sequential: runs sessions one after the other.
Concurrent: runs sessions at the same time.
If you have sessions with source-target dependencies, use a sequential batch to start the sessions one after another. If you
have several independent sessions, you can use a concurrent batch, which runs all the sessions at the same time.
78. Can you copy batches?
No.
79. How many sessions can you create in a batch?
Any number of sessions.
80. When does the Informatica server mark a batch as failed?
If one of its sessions is configured to "run if previous completes" and that previous session fails.
81. What command is used to run a batch?
pmcmd is used to start a batch.
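As a hedged illustration only: pmcmd syntax varies considerably across PowerCenter versions, and in later releases batches became workflows, started with the startworkflow command. Every service, user, folder, and workflow name below is a placeholder; consult the Command Reference for your version for the exact arguments.

```shell
# Placeholder names throughout; not valid as-is for every version.
pmcmd startworkflow -sv IntegrationService -d Domain_dev \
    -u Administrator -p MyPassword -f SalesFolder wf_nightly_load
```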
82. What options are used to configure sequential batches?
Two options:
Run the session only if the previous session completes successfully.
Always run the session.
83. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the option to always run the session.
84. Can you start a batch within a batch?
You cannot. If you want to start a batch that resides in another batch, create a new independent batch and copy the necessary
sessions into it.
85. Can you start a session inside a batch individually?
You can start a required session individually only in a sequential batch; in a concurrent batch this is not possible.
86. How can you stop a batch?
By using the Server Manager or pmcmd.
87. What are session parameters?
Session parameters, like mapping parameters, represent values you might want to change between sessions, such as
database connections or source files.
The Server Manager also allows you to create user-defined session parameters. The user-defined session parameters are:
Database connections
Source file name: use this parameter when you want to change the name or location of a session's source file between session runs.
Target file name: use this parameter when you want to change the name or location of a session's target file between session runs.
Reject file name: use this parameter when you want to change the name or location of a session's reject file between session runs.
88. What is a parameter file?
A parameter file defines the values for parameters and variables used in a session. A parameter file is created with a text
editor such as WordPad or Notepad.
You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
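A hedged sketch of what such a file can look like. The section-heading syntax and the built-in parameter name prefixes vary by PowerCenter version, and every name and value below is a placeholder:

```
[ProjectFolder.s_load_sales]
$DBConnectionSource=ORA_SALES_DEV
$InputFile1=/data/in/sales_current.csv
$BadFile1=/data/bad/sales_current.bad
$$LoadDate=2024-01-31
```

By convention, names beginning with a single $ are session parameters and names beginning with $$ are mapping parameters or variables.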
89. How can you access a remote source in your session?
Relational source: to access a relational source located on a remote machine, you need to configure a database connection to
the data source.
File source: to access a remote source file, you must configure an FTP connection to the host machine before you create the
session.
Heterogeneous: when your mapping contains more than one source type, the Server Manager creates a heterogeneous session that
displays source options for all types.
90. What is the difference between partitioning of relational targets and partitioning of file targets?
If you partition a session with a relational target, the Informatica server creates multiple connections to the target database to
write target data concurrently. If you partition a session with a file target, the Informatica server creates one target file for each partition.
You can configure session properties to merge these target files.
In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session
performance.
Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
If your session contains a Filter transformation, place the filter as close to the sources as possible, or use a filter condition in the
Source Qualifier.
Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing
it. To improve session performance in this case, use the sorted ports option.
93. What is the difference between a mapplet and a reusable transformation?
A mapplet consists of a set of transformations that is reusable; a reusable transformation is a single transformation that can be
reused.
Variables or parameters created in a mapplet cannot be used in another mapping or mapplet, whereas variables created in a
reusable transformation can be used in any other mapping or mapplet.
We cannot include source definitions in reusable transformations, but we can add sources to a mapplet.
The transformation logic of a mapplet is hidden, whereas it is transparent in a reusable transformation.
We cannot use COBOL Source Qualifier, Joiner, or Normalizer transformations in a mapplet, whereas we can make them
reusable transformations.
94. Define Informatica repository?
The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and
Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when
you want the Informatica Server to perform the transformations, and connect strings for sources and targets.
The repository also stores administrative information such as usernames and passwords, permissions and privileges, and
product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs
the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica server and client
tools use.
95. What are the types of metadata stored in the repository?
The following types of metadata are stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Shortcuts
Source definitions
Target definitions
Transformations
96. What is power center repository?
The PowerCenter repository allows you to share metadata across repositories to create a data mart domain. In a data mart
domain, you can create a single global repository to store metadata used across an enterprise, and a number of local repositories to
share the global metadata as needed.
97. How can you work with a remote database in Informatica? Do you work directly using remote connections?
To work with a remote data source, you need to connect to it with a remote connection. However, it is not preferable to work
with that remote source directly over the remote connection. Instead, bring the source to the local machine where the Informatica
server resides. If you work directly with a remote source, session performance decreases because less data can pass across the
network in a given time.
98. What are the new features in Informatica 5.0?
You can debug your mapping in the Mapping Designer.
You can view the workspace over the entire screen.
The Designer displays a new icon for invalid mappings in the Navigator window.
You can use a dynamic lookup cache in a Lookup transformation.
You can create mapping parameters or mapping variables in a mapping or mapplet to make mappings more flexible.
You can export objects to and import objects from the repository. When you export a repository object, the Designer or Server
Manager creates an XML file to describe the repository metadata.
The Designer allows you to use the Router transformation to test data for multiple conditions. The Router transformation allows you to
route groups of data to transformations or targets.
You can use XML data as a source or target.
Server enhancements:
You can use the command line program pmcmd to specify a parameter file to run sessions or batches. This allows you to change
the values of session parameters, mapping parameters, and mapping variables at runtime.
If you run the Informatica Server on a symmetric multi-processing system, you can use multiple CPUs to process a session
concurrently. You configure partitions in the session properties based on source qualifiers. The Informatica Server reads, transforms,
and writes partitions of data in parallel for a single session. This is available for PowerCenter only.
The Informatica server creates two processes, the Load Manager process and the DTM process, to run sessions.
Metadata Reporter: a web-based application used to run reports against repository metadata.
You can copy sessions across folders and repositories using the Copy Session Wizard in the Informatica Server Manager.
With new e-mail variables, you can configure post-session e-mail to include information such as the mapping used during the
session.
99. What is incremental aggregation?
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the
source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This
allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the
same calculations each time you run the session.
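The incremental-aggregation idea can be sketched in plain Python: instead of recomputing aggregates over the whole source, apply only the captured changes to the stored aggregate state (a dict here stands in for the aggregate cache files; the function is illustrative only).

```python
def apply_incremental(agg_cache, changed_rows, key, value):
    # Fold only the changed rows into the existing aggregate state.
    for row in changed_rows:
        agg_cache[row[key]] = agg_cache.get(row[key], 0) + row[value]
    return agg_cache

cache = {"EU": 100, "US": 250}            # aggregate state from previous runs
new_rows = [{"region": "EU", "amt": 10},  # only the incremental changes
            {"region": "APAC", "amt": 5}]
cache = apply_incremental(cache, new_rows, "region", "amt")
# cache == {"EU": 110, "US": 250, "APAC": 5}
```

The benefit grows with the ratio of unchanged to changed source rows; if most of the source changes each run, there is little to save.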
100. What are the scheduling options to run a session?
You can schedule a session to run at a given time or interval, or you can run the session manually.
The scheduling options are:
Run only on demand: the server runs the session only when the user starts the session explicitly.
Run once: Informatica server runs the session only once at a specified date and time.
Run every: Informatica server runs the session at regular intervals as u configured.
Customized repeat: Informatica server runs the session at the dates and times specified in the repeat dialog box.
101. What is tracing level, and what are the types of tracing levels?
The tracing level determines the amount of information the Informatica server writes to a log file. The types of tracing levels are:
Normal
Verbose
Verbose initialization
Verbose data
102. What is the difference between the Stored Procedure transformation and the External Procedure transformation?
In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source. You need a
database connection to import the stored procedure into your mapping.
In an External Procedure transformation, the procedure or function is executed outside the data source. You need to build it as
a DLL to access it in your mapping, and no database connection is needed.
103. Explain about Recovering sessions?
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of
failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of
the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
1 Run the session again if the Informatica Server has not issued a commit.
2 Truncate the target tables and run the session again if the session is not recoverable.
3 Consider performing recovery if the Informatica Server has issued at least one commit.
104. If a session fails after loading 10,000 records into the target, how can you load from the 10,001st record the next time you
run the session?
As explained above, the Informatica Server has three methods for recovering sessions. Use Perform Recovery to load the records
from the point where the session failed.
105. Explain about perform recovery?
When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of
the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next
row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the Informatica
Server bypasses the rows up to 10,000 and starts loading with row 10,001.
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica Server
setup before you run a session so the Informatica Server can create and/or write entries in the OPB_SRVR_RECOVERY table.
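The row-skipping behaviour described above can be sketched as follows (the function and variable names are illustrative; this is not the actual OPB_SRVR_RECOVERY mechanism):

```python
# Hypothetical sketch of row-based recovery: note the last committed row ID,
# then on restart skip rows up to and including it.
def load_with_recovery(source_rows, last_committed_row_id):
    """Yield only the rows after the last committed row ID."""
    for row_id, row in enumerate(source_rows, start=1):
        if row_id <= last_committed_row_id:
            continue  # already committed in the failed run
        yield row_id, row

rows = [f"rec{i}" for i in range(1, 15001)]
recovered = list(load_with_recovery(rows, last_committed_row_id=10000))
print(recovered[0])  # (10001, 'rec10001') - loading resumes at row 10,001
```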
Degenerate dimensions typically correspond to real-life items - invoices, orders and so on - and they can be a quick way to group together similar transactions for further analysis.
The key here is not to go overboard and make these degenerate dimensions into full dimension tables for example, an
Invoice dimension - as in all likelihood this dimension table will grow at the same rate as your fact table. If there is other interesting
information to go with the invoice - for example, who the customer was, what products were ordered - this is better placed in specific
dimensions for customers and products where it can be stored as a kind of 'master copy', rather than storing it alongside each order in
a ballooning Invoice dimension.
The other advantage with degenerate dimensions is that they're a lot easier to build and maintain when using ETL tools such
as Oracle Warehouse Builder, as you don't have to create dimension lookup tables, create synthetic keys, sequences and so on.
Indeed, if you're loading your dimensional model into a multidimensional database such as Oracle OLAP, your database will be much
smaller in size and easier to handle if you can keep the number of formal dimensions to a minimum, as they tend to 'explode' in size the
more dimensions you add to the database.
Judicious use of degenerate dimensions keeps your dimensional model rational and your database size reasonable, whilst
allowing you to keep useful items in the fact table that help tie the data warehouse back to the original source systems.
Parameter files and mapping variables: when do we use them and why, and how do they affect performance?
A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the
same value throughout the entire session. To use a mapping parameter, you declare the parameter in a mapping or
mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica
Server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time you run the session.
Parameter files are used at the session level. The syntax for parameter files is:
[folder_name.session_name]
parameter_name=value
variable_name=value
Enter this in any text editor and give the path in the session properties
(click the Properties tab in the session, then
enter the parameter directory and file name in the Parameter Filename field).
We use them generally whenever we want to change dates or pass specific date values.
We use them because hard-coding values is difficult to maintain; having them in a text file is easy to maintain and change.
They do not affect performance much (though this depends on how many records are in your source and which fields you are
passing).
114. What factors do we need to consider and take care of when we migrate our mappings from development to testing,
particularly in relation to connectivity objects?
Say we have three repositories: DEV, QA, and PROD. For migration from the DEV to the QA repository, the DBAs first create all the
required objects (tables, views, and so on) in the respective schemas (1) in the QA database. We then move (2) (drag and drop) all the
required mappings in the Designer to the specified folder (3) in the QA repository, and validate each mapping in the Designer. Then we move
(again by drag and drop) all the workflows in the Workflow Manager from DEV to QA (4), and validate the workflows again.
This is the procedure followed in most projects. There is also a procedure given in the guide for doing it directly; check
that as well.
1) All the tables in a project will be in different schemas according to their functionality.
2) You might need to follow the order of migration: first the shared
folder, then the folders that use shared-folder objects through shortcuts.
3) In a project we will have all source-to-staging mappings in one folder, the actual maps in another folder, and so on, so we have to take care
when moving from one repository to the other: mappings in folder A in DEV must move to folder A in QA.
4) After moving the workflows, we have to change the connections to point to the QA database.
115. How many mapplets do you create, on average, in a project?
It depends; usually around five. We had two in our project.
116. What do you need to take care of when upgrading from one version to another
(what changes do we need to make in
our mappings)?
None in our mappings, except in certain instances. These are the steps for an upgrade:
PROCESS FLOW (STEPS FOR INFORMATICA UPGRADE):
1) Prepare the Repository
2) Create a copy of the Repository
3) Installation and Configuration of PC7.1.2 Components (client and Repository server)
4) Upgrade the Repository
5) Install and Configure the PC Server
6) Register the PC Server with the Repository
7) Test the Upgrade/Installation
117. Will there be any difference when we are extracting from or loading data to SQL Server, DB2, or flat files?
None; importing SQL Server and DB2 tables is just like importing Oracle tables. For flat files the import is a little different, but loading is the same.
118. Where do we use UNIX shell scripts, and what do they contain?
They are used mainly for scheduling; they typically contain pmcmd commands to start and stop sessions and batches.
119. How do we collect performance details of a session?
The performance details provide counters that help you understand session and mapping efficiency; you enable them from a tab in
the session properties.
"You create performance details by selecting Collect Performance Data in the session properties before running the session.
By evaluating the final performance details, you can determine where session performance slows down. Monitoring also provides
session-specific details that can help tune the following:
Buffer block size
Index and data cache size for Aggregator, Rank, Lookup, and Joiner transformations
Lookup transformations
Before using performance details to improve session performance you must do the following:
Enable monitoring
Increase Load Manager shared memory
Understand performance counters
Note that Collect Performance Data is typically selected only in testing; in my experience it slows the session down considerably.
Tuning Mappings part1
Mapping-level optimization takes time to implement but can significantly boost performance. Sometimes the mapping is the
biggest bottleneck in the load process because business rules determine the number and complexity of transformations in a mapping.
Before deciding on the best route to optimize the mapping architecture, you need to resolve some basic issues. Tuning
mappings is a tiered process. The first tier can be of assistance almost universally, bringing about a performance increase in all
scenarios. The second tier of tuning processes may yield only a small performance increase, or may be of significant value, depending on
the situation.
Some factors to consider when choosing tuning processes at the mapping level include the specific environment, software/
hardware limitations, and the number of records going through a mapping. This Best Practice offers some guidelines for tuning
mappings.
Analyze mappings for tuning only after you have tuned the system, source, and target for peak performance. To optimize
mappings, you generally reduce the number of transformations in the mapping and delete unnecessary links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected
input/output or output ports. Doing so can reduce the amount of data the transformations store in the data cache. Too many Lookups
and Aggregators encumber performance because each requires index cache and data cache. Since both are fighting for memory
space, decreasing the number of these transformations in a mapping can help improve speed. Splitting them up into different mappings
is another option.
Limit the number of Aggregators in a mapping. A high number of Aggregators can increase I/O activity on the cache directory.
Unless the seek/access time is fast on the directory itself, having too many Aggregators can cause a bottleneck. Similarly, too many
Lookups in a mapping causes contention of disk and memory, which can lead to thrashing, leaving insufficient memory to run a
mapping efficiently.
Consider Single-Pass Reading
If several mappings use the same data source, consider a single-pass reading. Consolidate separate mappings into one
mapping with either a single Source Qualifier Transformation or one set of Source Qualifier Transformations as the data source for the
separate data flows.
Similarly, if a function is used in several mappings, a single-pass reading will reduce the number of times that function will be called in
the session.
Optimize SQL Overrides
When SQL overrides are required in a Source Qualifier, Lookup Transformation, or in the update override of a target object, be
sure the SQL statement is tuned. The extent to which and how SQL can be tuned depends on the underlying source or target database
system.
Scrutinize Datatype Conversions
The PowerCenter Server automatically makes conversions between compatible datatypes. When these conversions are
performed unnecessarily, performance slows. For example, if a mapping moves data from an Integer port to a Decimal port, then back
to an Integer port, the conversion may be unnecessary.
In some instances however, data type conversions can help improve performance. This is especially true when integer values
are used in place of other data types for performing comparisons using Lookup and Filter transformations.
Eliminate Transformation Errors
Large numbers of evaluation errors significantly slow performance of the PowerCenter Server. During transformation errors,
the PowerCenter Server engine pauses to determine the cause of the error, removes the row causing the error from the data flow, and
logs the error in the session log.
Transformation errors can be caused by many things including: conversion errors, conflicting mapping logic, any condition that
is specifically set up as an error, and so on. The session log can help point out the cause of these errors. If errors recur consistently for
certain transformations, re-evaluate the constraints for these transformations. Any source of errors should be traced and eliminated.
Tuning mappings -2
Optimize Lookup Transformations
There are a number of ways to optimize Lookup transformations that are set up in a mapping.
Optimize Joiner Transformations
Define the rows from the smaller set of data in the joiner as the Master rows. The Master rows are cached to memory and the
detail records are then compared to rows in the cache of the Master rows. In order to minimize memory requirements, the smaller set of
data should be cached and thus set as Master.
Use Normal joins whenever possible. Normal joins are faster than outer joins and the resulting set of data is also smaller.
Use the database to do the join when sourcing data from the same database schema. Database systems usually can perform
the join more quickly than the Informatica Server, so a SQL override or a join condition should be used when joining multiple tables from
the same database schema.
Optimize Sequence Generator Transformations
Sequence Generator transformations need to determine the next available sequence number, thus increasing the Number of
Cached Values property can increase performance. This property determines the number of values the Informatica Server caches at
one time. If it is set to cache no values, the Informatica Server must query the Informatica repository each time to determine the next
available number. Consider configuring the Number of Cached Values to a value greater than 1,000. Note that any cached values not
used in the course of a session are lost, since the repository pointer is already set to hand out the next block of cached values the next
time the sequence is called.
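The trade-off can be sketched in Python (an illustrative model, not the server's actual code): each repository round-trip reserves a block of values, and any unused values in a discarded block are lost because the repository pointer has already moved past them.

```python
# Sketch of the Number of Cached Values behaviour: reserve a block of
# sequence values per "repository" round-trip; unused values in a block
# are lost when the session ends.
class CachedSequence:
    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.repo_next = 1          # next value recorded in the "repository"
        self.cache = []

    def next_value(self):
        if not self.cache:          # one repository round-trip per block
            self.cache = list(range(self.repo_next,
                                    self.repo_next + self.cache_size))
            self.repo_next += self.cache_size
        return self.cache.pop(0)

seq = CachedSequence(cache_size=1000)
first_run = [seq.next_value() for _ in range(3)]   # uses 1, 2, 3
seq.cache = []                                     # session ends: cache discarded
resumed = seq.next_value()
print(first_run, resumed)  # [1, 2, 3] 1001 - values 4..1000 were lost
```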
Avoid External Procedure Transformations
For the most part, making calls to external procedures slows down a session. If possible, avoid the use of these
Transformations, which include Stored Procedures, External Procedures and Advanced External Procedures.
Field Level Transformation Optimization
As a final step in the tuning process, expressions used in transformations can be tuned. When examining expressions, focus
on complex expressions for possible simplification.
To help isolate slow expressions, do the following:
1. Time the session with the original expression.
2. Copy the mapping and replace half the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.
Processing field level transformations takes time. If the transformation expressions are complex, then processing will be
slower. It's often possible to get a 10-20% performance improvement by optimizing complex field-level transformations. Use the target
table mapping reports or the Metadata Reporter to examine the transformations. Likely candidates for optimization are the fields with
the most complex expressions. Keep in mind that there may be more than one field causing performance problems.
Factoring out Common Logic
This can reduce the number of times a mapping performs the same logic. If the same logic is performed multiple times
in a mapping, moving the task upstream may allow the logic to be done just once. For example, a mapping has five
target tables. Each target requires a Social Security Number lookup. Instead of performing the lookup right before each target, move
the lookup to a position before the data flow splits.
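The effect of moving the lookup upstream can be illustrated with a toy pipeline (the lookup function and data below are invented for the example):

```python
# Toy illustration of factoring common logic upstream: perform the SSN
# lookup once before the flow splits, instead of once per target.
lookup_calls = 0

def ssn_lookup(emp_id):
    global lookup_calls
    lookup_calls += 1
    return {"e1": "111-22-3333"}.get(emp_id)

rows = [{"emp_id": "e1"}]
# Upstream: enrich each row once...
for row in rows:
    row["ssn"] = ssn_lookup(row["emp_id"])
# ...then all five downstream targets reuse the enriched rows.
targets = [list(rows) for _ in range(5)]
print(lookup_calls)  # 1 lookup instead of 5
```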
Minimize Function Calls
Anytime a function is called it takes resources to process. There are several common examples where function calls can be
reduced or eliminated.
Aggregate function calls can sometimes be reduced. In the case of each aggregate function call, the Informatica Server must
search and group the data.
Thus the following expression:
SUM(Column A) + SUM(Column B)
Can be optimized to:
SUM(Column A + Column B)
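A quick numeric check of this equivalence, translated into plain Python (this ignores NULL handling, which can differ between the two forms in a real database):

```python
# SUM(A) + SUM(B) equals SUM(A + B) for plain numeric columns, which is
# why the single-aggregate form saves one aggregate function call.
col_a = [10, 20, 30]
col_b = [1, 2, 3]
two_aggregates = sum(col_a) + sum(col_b)
one_aggregate = sum(a + b for a, b in zip(col_a, col_b))
print(two_aggregates, one_aggregate)  # 66 66
```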
In general, operators are faster than functions, so operators should be used whenever possible.
For example if you have an expression which involves a CONCAT function such as:
CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)
It can be optimized to:
FIRST_NAME || ' ' || LAST_NAME
Remember that IIF() is a function that returns a value, not just a logical test. This allows many logical statements to be written
in a more compact fashion.
For example:
IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='Y', VAL_A+VAL_B+VAL_C,
IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='N', VAL_A+VAL_B,
IIF(FLG_A='Y' and FLG_B='N' and FLG_C='Y', VAL_A+VAL_C,
IIF(FLG_A='Y' and FLG_B='N' and FLG_C='N', VAL_A,
IIF(FLG_A='N' and FLG_B='Y' and FLG_C='Y', VAL_B+VAL_C,
IIF(FLG_A='N' and FLG_B='Y' and FLG_C='N', VAL_B,
IIF(FLG_A='N' and FLG_B='N' and FLG_C='Y', VAL_C,
IIF(FLG_A='N' and FLG_B='N' and FLG_C='N', 0.0))))))))
Can be optimized to:
IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0)
The original expression had 8 IIFs, 16 ANDs, and 24 comparisons. The optimized expression has 3 IIFs, 3 comparisons, and 2
additions.
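The equivalence of the two forms can be verified exhaustively. The check below is translated into Python for the sake of a runnable test; both functions mirror the Informatica expressions above:

```python
# Exhaustively verify that the flattened expression matches the original
# nested IIFs for all eight flag combinations.
from itertools import product

def nested(fa, fb, fc, va, vb, vc):
    # Mirrors the eight-branch nested IIF above.
    for a, b, c, result in [
        ("Y", "Y", "Y", va + vb + vc), ("Y", "Y", "N", va + vb),
        ("Y", "N", "Y", va + vc),      ("Y", "N", "N", va),
        ("N", "Y", "Y", vb + vc),      ("N", "Y", "N", vb),
        ("N", "N", "Y", vc),           ("N", "N", "N", 0.0),
    ]:
        if (fa, fb, fc) == (a, b, c):
            return result

def flattened(fa, fb, fc, va, vb, vc):
    return ((va if fa == "Y" else 0.0) +
            (vb if fb == "Y" else 0.0) +
            (vc if fc == "Y" else 0.0))

va, vb, vc = 1.0, 2.0, 4.0
matches = all(nested(a, b, c, va, vb, vc) == flattened(a, b, c, va, vb, vc)
              for a, b, c in product("YN", repeat=3))
print(matches)  # True
```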
Be creative in making expressions more efficient. The following is an example of reworking an expression to reduce three
comparisons to one:
For example:
IIF(X=1 OR X=5 OR X=9, 'yes', 'no')
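The text does not show the reworked Informatica expression, but one way to collapse the three OR comparisons into a single membership test can be sketched in Python:

```python
# Single membership test replacing the chain X=1 OR X=5 OR X=9.
def classify(x):
    return "yes" if x in {1, 5, 9} else "no"

results = [classify(x) for x in (1, 2, 5, 9, 10)]
print(results)  # ['yes', 'no', 'yes', 'yes', 'no']
```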
Designer uses ODBC to import a Microsoft Excel source. You do not need database permissions to import Microsoft Excel sources.
To import an Excel source definition, you need to complete the following tasks:
Install the Microsoft Excel ODBC driver on your system.
Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric data.
Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the Designer. Ranges display as source definitions
when you import the source.
9. Which databases are RDBMS and which are MDDB? Can you name them?
MDDB examples: Oracle Express Server (OES), Essbase by Hyperion Software, and PowerPlay by Cognos. RDBMS examples: Oracle, SQL
Server, etc.
10. What are the modules/tools in Business Objects? Explain their purpose briefly?
BO Designer, Business Query for Excel, BO Reporter, InfoView, Explorer, WebI, BO Publisher, Broadcast Agent, and BO
ZABO.
InfoView: IT portal entry into Web Intelligence & Business Objects.
Base module required for all options to view and refresh reports.
Reporter: Upgrade to create/modify reports on LAN or Web.
Explorer: Upgrade to perform OLAP processing on LAN or Web.
Designer: Creates semantic layer between user and database.
Supervisor: Administer and control access for group of users.
Web Intelligence: Integrated query, reporting, and OLAP analysis over the Web.
Broadcast Agent: Used to schedule, run, publish, push, and broadcast pre-built reports and spreadsheets, including event notification
and response capabilities, event filtering, and calendar based notification, over the LAN, e-mail, pager, Fax, Personal Digital
Assistant( PDA), Short Messaging Service(SMS), etc.
Set Analyzer - Applies set-based analysis to perform functions such as exclusion, intersections, unions, and overlaps visually.
Developer Suite: Builds packaged, analytical, or customized applications.
11. What are ad hoc queries and canned queries/reports, and how do you create them?
The data warehouse will contain two types of query. There will be fixed queries that are clearly defined and well understood, such as
regular reports, canned queries (standard reports) and common aggregations. There will also be ad hoc queries that are unpredictable,
both in quantity and frequency.
Ad Hoc Query: Ad hoc queries are the starting point for any analysis into a database. Any business analyst wants to know what is
inside the database. He then proceeds by calculating totals, averages, maximum and minimum values for most attributes within the
database. These are the unpredictable elements of a data warehouse. It is exactly the ability to run any query when desired and expect a
reasonable response that makes the data warehouse worthwhile, and makes the design such a significant challenge.
The end-user access tools are capable of automatically generating the database query that answers any question posed by
the user. The user will typically pose questions in terms that they are familiar with (for example, sales by store last week); this is
converted into the database query by the access tool, which is aware of the structure of information within the data warehouse.
Canned queries: Canned queries are predefined queries. In most instances, canned queries contain prompts that allow you to
customize the query for your specific needs. For example, a prompt may ask you for a School, department, term, or section ID. In this
instance you would enter the name of the School, department or term, and the query will retrieve the specified data from the
Warehouse. You can measure resource requirements of these queries, and the results can be used for capacity planning and for
database design.
The main reason for using a canned query or report rather than creating your own is that your chances of misinterpreting data
or getting the wrong answer are reduced. You are assured of getting the right data and the right answer.
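A canned query is, in effect, a predefined parameterized statement: the SQL is fixed and the prompt supplies the parameter. This can be sketched with Python's sqlite3 module; the schema and data below are invented for the example:

```python
# Canned query sketch: fixed SQL, prompt-supplied parameter.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (school TEXT, students INTEGER)")
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [("Engineering", 1200), ("Business", 900)])

CANNED_QUERY = "SELECT students FROM enrollment WHERE school = ?"
school = "Engineering"   # the value a prompt would collect from the user
row = conn.execute(CANNED_QUERY, (school,)).fetchone()
print(row[0])  # 1200
```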
12. How many fact tables and how many dimension tables did you have? Which table precedes what?
http://www.ciobriefings.com/whitepapers/StarSchema.asp
13. What is the difference between STAR SCHEMA & SNOW FLAKE SCHEMA?
http://www.ciobriefings.com/whitepapers/StarSchema.asp
14. Why did you choose STAR SCHEMA only? What are the benefits of STAR SCHEMA?
Because of its denormalized structure, i.e., the dimension tables are denormalized. Why denormalize? The first (and often
only) answer is speed. An OLTP structure is designed for data inserts, updates, and deletes, but not for data retrieval. Therefore, we can
often squeeze some speed out of it by denormalizing some of the tables and having queries go against fewer tables.
These queries are faster because they perform fewer joins to retrieve the same record set. Joins are also confusing to many end users;
by denormalizing, we can present the user with a view of the data that is far easier for them to understand.
Benefits of STAR SCHEMA:
Far fewer Tables.
Designed for analysis across time.
Simplifies joins.
Less database space.
Supports drilling in reports.
Flexibility to meet business and technical needs.
15. How do you load the data using Informatica?
By creating and running a session.
16. (i) What is FTP? (ii) How do you connect to a remote machine? (iii) Is there another way to use FTP without a special utility?
(i) The FTP (File Transfer Protocol) utility program is commonly used for copying files to and from other computers. These computers
may be at the same site or at different sites thousands of miles apart. FTP is a general protocol that works on UNIX systems as well as
other non- UNIX systems.
(ii) Remote connect commands:
ftp machinename
ex: ftp 129.82.45.181 or ftp iesg
If the remote machine has been reached successfully, FTP responds by asking for a login name and password. When you enter your own
login name and password for the remote machine, it returns a prompt like the one below
ftp>
and permits you access to your own home directory on the remote machine. You should be able to move around in your own directory
and to copy files to and from your local machine using the FTP interface commands.
Note: You can set the mode of file transfer to ASCII (the default, which transmits seven bits per character).
Use the ASCII mode with any of the following:
- Raw Data (e.g. *.dat or *.txt, codebooks, or other plain text documents)
- SPSS Portable files.
- HTML files.
Set the mode of file transfer to binary when needed (binary mode transmits all eight bits per byte, and thus provides less chance of
a transmission error; it must be used to transmit files other than ASCII files).
For example use binary mode for the following types of files:
- SPSS System files
- SAS Dataset
- Graphic files (eg., *.gif, *.jpg, *.bmp, etc.)
- Microsoft Office documents (*.doc, *.xls, etc.)
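The rule of thumb above can be captured in a small helper: plain-text formats go in ASCII mode, everything else in binary. This is a sketch, and the extension list is illustrative, not exhaustive:

```python
# Map a filename to the suggested FTP transfer mode.
ASCII_EXTENSIONS = {".txt", ".dat", ".htm", ".html", ".por"}  # .por: SPSS portable

def transfer_mode(filename):
    ext = filename[filename.rfind("."):].lower() if "." in filename else ""
    return "ascii" if ext in ASCII_EXTENSIONS else "binary"

modes = [transfer_mode(f) for f in ("codebook.txt", "photo.jpg", "report.doc")]
print(modes)  # ['ascii', 'binary', 'binary']
```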
(iii) Yes. If you are using Windows, you can access a text-based FTP utility from a DOS prompt.
To do this, perform the following steps:
1. From the Start menu, choose Programs > MS-DOS Prompt.
2. Enter ftp ftp.geocities.com. A prompt will appear.
(or)
Enter ftp to get the ftp prompt, then ftp> open hostname, e.g., ftp> open ftp.geocities.com (this connects to the specified host).
3. Enter your Yahoo! GeoCities member name.
4. Enter your Yahoo! GeoCities password.
You can now use standard FTP commands to manage the files in your Yahoo! GeoCities directory.
17. What cmd is used to transfer multiple files at a time using FTP?
mget ==> copies multiple files from the remote machine to the local machine. You will be prompted for a y/n answer before
each file is transferred; mget * copies all files in the current remote directory to your current local directory, using the same file names.
mput ==> copies multiple files from the local machine to the remote machine.
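The heart of mget is glob matching against the remote listing; that selection logic can be reproduced with Python's fnmatch module. The ftplib calls that would fetch the listing are sketched in comments, and the host name and file names below are made up:

```python
# Reproduce mget's file selection: match a glob pattern against a listing.
from fnmatch import fnmatch

def mget_selection(remote_listing, pattern):
    """Return the remote files an 'mget pattern' would transfer."""
    return [name for name in remote_listing if fnmatch(name, pattern)]

# With a live connection this listing would come from ftplib:
#   from ftplib import FTP
#   ftp = FTP("hostname"); ftp.login(user, password)
#   listing = ftp.nlst()
listing = ["jan.dat", "feb.dat", "notes.txt"]
selected = mget_selection(listing, "*.dat")
print(selected)  # ['jan.dat', 'feb.dat']
```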
18. What is a Filter transformation, and what options do you have in a Filter transformation?
The Filter transformation provides the means for filtering records in a mapping. You pass all the rows from a source
transformation through the Filter transformation, then enter a filter condition for the transformation. All ports in a Filter transformation are
input/output, and only records that meet the condition pass through the Filter transformation.
Note: Discarded rows do not appear in the session log or reject files
To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible. Rather
than passing records you plan to discard through the mapping, you then filter out unwanted data early in the flow of data from sources
to targets.
You cannot concatenate ports from more than one transformation into the Filter transformation; the input ports for the filter
must come from a single transformation. Filter transformations exist within the flow of the mapping and cannot be unconnected. The
Filter transformation does not allow setting output default values.
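The payoff of filtering close to the source can be shown with a toy pipeline (the threshold and transformation below are invented for the example): the expensive transformation runs only on rows that survive the filter.

```python
# Toy pipeline: filter first, so the costly transform sees fewer rows.
transform_calls = 0

def expensive_transform(row):
    global transform_calls
    transform_calls += 1
    return {**row, "derived": row["amount"] * 1.1}

rows = [{"amount": a} for a in (5, 50, 500, 15)]
# Filter close to the source, then transform only the survivors.
survivors = [expensive_transform(r) for r in rows if r["amount"] > 10]
print(transform_calls, len(survivors))  # 3 3
```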
19. What are the default sources supported by Informatica PowerMart?
Relational tables, views, and synonyms.
Fixed-width and delimited flat files that do not contain binary data.
COBOL files.
20. When do you create the Source Definition? Can you use this Source Definition with any transformation?
When working with a file that contains fixed-width binary data, you must create the source definition.
The Designer displays the source definition as a table, consisting of names, datatypes, and constraints. To use a source
definition in a mapping, connect a source definition to a Source Qualifier or Normalizer transformation. The Informatica Server uses
these transformations to read the source data.
21. What is Active & Passive Transformation?
Ans: Active and Passive Transformations
Transformations can be active or passive. An active transformation can change the number of records passed through it. A passive
transformation never changes the record count. For example, the Filter transformation removes rows that do not meet the filter
condition defined in the transformation.
Active transformations that might change the record count include the following:
Advanced External Procedure
Aggregator
Filter
Joiner
Normalizer
Rank
Source Qualifier
Note: If you use Power Connect to access ERP sources, the ERP Source Qualifier is also an active transformation.
You can connect only one of these active transformations to the same transformation or target, since the Informatica Server
cannot determine how to concatenate data from different sets of records with different numbers of rows.
Passive transformations that never change the record count include the following:
Lookup
Expression
External Procedure
Sequence Generator
Stored Procedure
Update Strategy
You can connect any number of these passive transformations, or connect one active transformation with any number of
passive transformations, to the same transformation or target.
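The row-count intuition behind the active/passive distinction can be sketched as follows (the functions model Filter-like and Expression-like behaviour in Python; they are illustrations, not Informatica code):

```python
# Active vs. passive: an active transformation can change the row count,
# a passive one emits exactly one row per input row.
rows = [1, -2, 3, -4]

def filter_active(rows):          # active: row count can shrink
    return [r for r in rows if r > 0]

def expression_passive(rows):     # passive: one row out per row in
    return [r * 10 for r in rows]

print(len(filter_active(rows)), len(expression_passive(rows)))  # 2 4
```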
22. What is staging Area and Work Area?
Staging Area
- Holding Tables on DW Server.
- Loaded from Extract Process
- Input for Integration/Transformation
- May function as Work Areas
- Output to a work area or Fact Table
Work Area
- Temporary Tables
- Memory
23. What is Metadata? (Refer to Data Warehousing in the Real World, page 125.)
Definition: Data About Data
Metadata contains descriptive data for end users. In a data warehouse the term metadata is used in a number of different situations.
Metadata is used for:
Data transformation and load
Data management
Query management
Data transformation and load:
Metadata may be used during data transformation and load to describe the source data and any changes that need to be made. The
advantage of storing metadata about the data being transformed is that as source data changes the changes can be captured in the
metadata, and transformation programs automatically regenerated.
For each source data field the following information is required:
Source Field:
Unique identifier (to avoid any confusion occurring between 2 fields of the same name from different sources).
Name (Local field name).
Type (storage type of data, like character, integer, floating point and so on).
Location
- system (the system it comes from, e.g. the Accounting system).
- object (the object that contains it, e.g. the Account table).
The destination field needs to be described in a similar way to the source:
Destination:
Unique identifier
Name
Type (database data type, such as Char, Varchar, Number and so on).
Table name (name of the table the field will be part of).
The other information that needs to be stored is the transformation or transformations that need to be applied to turn the source data
into the destination data:
Transformation:
Transformation (s)
- Name
- Language (name of the language that the transformation is written in).
- module name
- syntax
The Name is the unique identifier that differentiates this from any other similar transformations. The Language attribute
contains the name of the language that the transformation is written in. The other attributes are module name and syntax. Generally
these will be mutually exclusive, with only one being defined. For simple transformations such as simple SQL functions the syntax will
be stored. For complex transformations the name of the module that contains the code is stored instead.
Data management:
Metadata is required to describe the data as it resides in the data warehouse. This is needed by the warehouse manager to allow it to
track and control all data movements. Every object in the database needs to be described.
Metadata is needed for all the following:
Tables
- Columns
- name
- type
Indexes
- Columns
- name
- type
Views
- Columns
- name
- type
Constraints
- name
- type
- table
- columns
Aggregation and partition information also needs to be stored in metadata (for details, refer to page 30).
Query Generation:
Metadata is also required by the query manager to enable it to generate queries. The same metadata used by the warehouse
manager to describe the data in the data warehouse is also required by the query manager.
The query manager will also generate metadata about the queries it has run. This metadata can be used to build a history of all queries
run and generate a query profile for each user, group of users and the data warehouse as a whole.
The metadata that is required for each query is:
- query
- tables accessed
- columns accessed
- name
- reference identifier
- restrictions applied
- column name
- table name
- reference identifier
- restriction
- join Criteria applied
- group by criteria
- sort criteria
26. What are the two programs that communicate with the Informatica Server?
Informatica provides Server Manager and pmcmd programs to communicate with the Informatica Server:
Server Manager. A client application used to create and manage sessions and batches, and to monitor and stop the Informatica
Server. You can use information provided through the Server Manager to troubleshoot sessions and improve session performance.
pmcmd. A command-line program that allows you to start and stop sessions and batches, stop the Informatica Server, and verify if the
Informatica Server is running.
in a department. Since a standard transformation cannot be used by more than one mapping, you have to set up the same
transformation each time you want to calculate the average salary in a department.
A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can
contain as many transformations as you need. A mapplet can contain transformations, reusable transformations, and shortcuts to
transformations.
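The reuse idea behind reusable transformations and mapplets can be illustrated outside Informatica with a hypothetical Python analogy (not Informatica syntax): define the average-salary logic once and call it from any "mapping" instead of rebuilding it each time.

```python
def average_salary_by_department(rows):
    """Reusable 'transformation': average salary per department.

    rows: iterable of (department, salary) pairs.
    """
    totals, counts = {}, {}
    for dept, salary in rows:
        totals[dept] = totals.get(dept, 0) + salary
        counts[dept] = counts.get(dept, 0) + 1
    return {dept: totals[dept] / counts[dept] for dept in totals}

# Two different "mappings" reuse the same logic instead of duplicating it:
hr_feed = [("HR", 50000), ("HR", 60000)]
it_feed = [("IT", 80000), ("IT", 90000), ("IT", 100000)]
print(average_salary_by_department(hr_feed))   # {'HR': 55000.0}
print(average_salary_by_department(it_feed))   # {'IT': 90000.0}
```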
47. How do you copy Mappings, Repositories, and Sessions?
To copy an object (such as a mapping or reusable transformation) from a shared folder, press the Ctrl key and drag and drop
the mapping into the destination folder.
To copy a mapping from a non-shared folder, drag and drop the mapping into the destination folder. In both cases, the
destination folder must be open with the related tool active.
For example, to copy a mapping, the Mapping Designer must be active. To copy a Source Definition, the Source Analyzer
must be active.
Copying Mapping:
To copy the mapping, open a workbook.
In the Navigator, click and drag the mapping slightly to the right, not dragging it to the workbook.
When asked if you want to make a copy, click Yes, then enter a new name and click OK.
Choose Repository-Save.
Repository Copying: You can copy a repository from one database to another. You use this feature before upgrading, to preserve the
original repository. Copying repositories provides a quick way to copy all metadata you want to use as a basis for a new repository.
If the database into which you plan to copy the repository contains an existing repository, the Repository Manager deletes the
existing repository. If you want to preserve the old repository, cancel the copy. Then back up the existing repository before copying the
new repository.
To copy a repository, you must have one of the following privileges:
Administer Repository privilege
Super User privilege
To copy a repository:
1. In the Repository Manager, choose Repository-Copy Repository.
2. Select a repository you wish to copy, then enter the following information:
--------------------------  ------------------  --------------------------------------------------
Copy Repository Field       Required/Optional   Description
--------------------------  ------------------  --------------------------------------------------
Repository                  Required            Name for the repository copy. Each repository name
                                                must be unique within the domain and should be
                                                easily distinguished from all other repositories.
Database Username           Required            Username required to connect to the database. This
                                                login must have the appropriate database
                                                permissions to create the repository.
Database Password           Required            Password associated with the database username.
                                                Must be in US-ASCII.
ODBC Data Source            Required            Data source used to connect to the database.
Native Connect String       Required            Connect string identifying the location of the
                                                database.
Code Page                   Required            Character set associated with the repository. Must
                                                be a superset of the code page of the repository
                                                you want to copy.
--------------------------  ------------------  --------------------------------------------------
If you are not connected to the repository you want to copy, the Repository Manager asks you to log in.
3. Click OK.
4. If asked whether you want to delete existing repository data in the second repository, click OK to delete it. Click Cancel to
preserve the existing repository.
Copying Sessions:
In the Server Manager, you can copy stand-alone sessions within a folder, or copy sessions in and out of batches.
To copy a session, you must have one of the following:
Create Sessions and Batches privilege with read and write permission
Super User privilege
To copy a session:
1. In the Server Manager, select the session you wish to copy.
2. Click the Copy Session button or choose Operations-Copy Session.
The Server Manager makes a copy of the session. The Informatica Server names the copy after the original session, appending a
number, such as session_name1.
47. What are shortcuts, and what are their advantages?
Shortcuts allow you to use metadata across folders without making copies, ensuring uniform metadata. A shortcut inherits all
properties of the object to which it points. Once you create a shortcut, you can configure the shortcut name and description.
When the object the shortcut references changes, the shortcut inherits those changes. By using a shortcut instead of a copy,
you ensure each use of the shortcut exactly matches the original object. For example, if you have a shortcut to a target definition, and
you add a column to the definition, the shortcut automatically inherits the additional column.
Shortcuts allow you to reuse an object without creating multiple objects in the repository. For example, you use a source
definition in ten mappings in ten different folders. Instead of creating 10 copies of the same source definition, one in each folder, you
can create 10 shortcuts to the original source definition.
You can create shortcuts to objects in shared folders. If you try to create a shortcut to a non-shared folder, the Designer
creates a copy of the object instead.
You can create shortcuts to the following repository objects:
Source definitions
Reusable transformations
Mapplets
Mappings
Target definitions
Business components
You can create two types of shortcuts:
Local shortcut. A shortcut created in the same repository as the original object.
Global shortcut. A shortcut created in a local repository that references an object in a global repository.
Advantages: One of the primary advantages of using a shortcut is maintenance. If you need to change all instances of an object, you
can edit the original repository object. All shortcuts accessing the object automatically inherit the changes.
Shortcuts have the following advantages over copied repository objects:
You can maintain a common repository object in a single location. If you need to edit the object, all shortcuts immediately inherit the
changes you make.
You can restrict repository users to a set of predefined metadata by asking users to incorporate the shortcuts into their work instead of
developing repository objects independently.
You can develop complex mappings, mapplets, or reusable transformations, then reuse them easily in other folders.
You can save space in your repository by keeping a single repository object and using shortcuts to that object, instead of creating
copies of the object in multiple folders or multiple repositories.
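The shortcut-versus-copy distinction above mirrors reference versus value semantics. A hypothetical Python sketch (not repository API code) makes the inheritance behaviour concrete:

```python
# A "repository object" in a shared folder.
source_definition = {"name": "CUSTOMERS", "columns": ["ID", "NAME"]}

shortcut = source_definition            # shortcut: a reference to the original
copied = dict(source_definition)        # copy: an independent object
copied["columns"] = list(copied["columns"])

# Edit the original object: add a column to the source definition.
source_definition["columns"].append("EMAIL")

print(shortcut["columns"])   # shortcut automatically inherits the new column
print(copied["columns"])     # the copy is unaffected and drifts out of date
```

This is exactly why maintenance is cited as the primary advantage: editing the original updates every shortcut at once, while copies must each be edited separately.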
48. What are Pre-session and Post-session Options?
(Refer to the Help topics on using shell commands, post-session commands, and email.)
The Informatica Server can perform one or more shell commands before or after the session runs. Shell commands are
operating system commands. You can use pre- or post- session shell commands, for example, to delete a reject file or session log, or
to archive target files before the session begins.
The status of the shell command, whether it completed successfully or failed, appears in the session log file.
To call a pre- or post-session shell command you must:
1. Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or batch file for Windows NT servers.
2. Configure the session to execute the pre- or post-session shell commands.
You can configure a session to stop if the Informatica Server encounters an error while executing pre-session shell commands.
For example, you might use a shell command to copy a file from one directory to another. For a Windows NT server you would use the
following shell command to copy the SALES_ADJ file from the target directory, L, to the source, H:
copy L:\sales\sales_adj H:\marketing
For a UNIX server, you would use the following command line to perform a similar operation:
cp sales/sales_adj marketing/
Tip: Each shell command runs in the same environment (UNIX or Windows NT) as the Informatica Server. Environment settings in one
shell command script do not carry over to other scripts. To run all shell commands in the same environment, call a single shell script
that in turn invokes other scripts.
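The stop-on-error behaviour for pre-session commands can be sketched as a small driver loop. This is an illustration of the idea only, not how the Informatica Server is implemented; the function name is hypothetical.

```python
import subprocess

def run_pre_session_commands(commands, stop_on_error=True):
    """Run shell commands in order; optionally stop at the first failure.

    Returns a list of (command, return_code) pairs, mirroring the way the
    status of each shell command appears in the session log.
    """
    results = []
    for cmd in commands:
        # Each command runs in its own shell, so environment settings made
        # by one command do not carry over to the next.
        rc = subprocess.run(cmd, shell=True).returncode
        results.append((cmd, rc))
        if stop_on_error and rc != 0:
            break
    return results

# Example: the second command fails, so the third never runs.
log = run_pre_session_commands(["true", "false", "true"])
```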
49. What are Folder Versions?
In the Repository Manager, you can create different versions within a folder to help you archive work in development. You can
copy versions to other folders as well. When you save a version, you save all metadata at a particular point in development. Later
versions contain new or modified metadata, reflecting work that you have completed since the last version.
Maintaining different versions lets you revert to earlier work when needed. By archiving the contents of a folder into a version
each time you reach a development landmark, you can access those versions if later edits prove unsuccessful.
For example, you create a folder version after completing a version of a difficult mapping, then continue working on the mapping. If you are
unhappy with the results of subsequent work, you can revert to the previous version, then create a new version to continue
development. Thus you keep the landmark version intact, but available for regression testing.
50. How do you automate/schedule sessions/batches, and did you use any tool for automating sessions/batches?
We scheduled our sessions/batches using Server Manager.
You can either schedule a session to run at a given time or interval, or you can manually start the session.
You need the Create Sessions and Batches privilege with read and execute permissions, or the Super User privilege.
If you configure a batch to run only on demand, you cannot schedule it.
Note: We did not use any tool for automation process.
51. What are the differences between 4.7 and 5.1 versions?
New transformations were added, such as the XML transformation and the MQ Series transformation. Also, PowerMart and
PowerCenter are the same from version 5.1 onward.
52. What are the procedures you need to follow before moving mappings/sessions from testing/development to production?
53. How many values does it (the Informatica Server) return when it passes through a Connected Lookup and an Unconnected Lookup?
A Connected Lookup can return multiple values, whereas an Unconnected Lookup returns only one value: the return value.
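The distinction can be sketched in Python: an unconnected lookup behaves like a function call that returns a single value, while a connected lookup sits in the data flow and can return several columns per row. The table and port names below are illustrative, not Informatica syntax.

```python
LOOKUP_TABLE = {
    101: {"name": "Alice", "dept": "HR", "grade": "A"},
    102: {"name": "Bob",   "dept": "IT", "grade": "B"},
}

def unconnected_lookup(emp_id):
    """Called from an expression; returns exactly one value (the return port)."""
    return LOOKUP_TABLE.get(emp_id, {}).get("name")

def connected_lookup(row):
    """Part of the pipeline; can return multiple output ports per row."""
    match = LOOKUP_TABLE.get(row["emp_id"], {})
    return {**row, "dept": match.get("dept"), "grade": match.get("grade")}

print(unconnected_lookup(101))            # one value: 'Alice'
print(connected_lookup({"emp_id": 102}))  # adds both dept and grade
```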
54. What is the difference between PowerMart and PowerCenter in 4.7.2?
If You Are Using PowerCenter
PowerCenter allows you to register and run multiple Informatica Servers against the same repository. Because you can run
these servers at the same time, you can distribute the repository session load across available servers to improve overall performance.
With PowerCenter, you receive all product functionality, including distributed metadata, the ability to organize repositories into
a data mart domain, and the ability to share metadata across repositories.
A PowerCenter license lets you create a single repository that you can configure as a global repository, the core component of
a data warehouse.
If You Are Using PowerMart
This version of PowerMart includes all features except distributed metadata and multiple registered servers. Also, the various
options available with PowerCenter (such as PowerCenter Integration Server for BW, PowerConnect for IBM DB2,
PowerConnect for SAP R/3, and PowerConnect for PeopleSoft) are not available with PowerMart.
55. What kind of modifications can you perform with each transformation?
Using transformations, you can modify data in the following ways:
---------------------------------------------------  -------------------------
Task                                                 Transformation
---------------------------------------------------  -------------------------
Calculate a value                                    Expression
Perform aggregate calculations                       Aggregator
Modify text                                          Expression
Filter records                                       Filter, Source Qualifier
Order records queried by the Informatica Server      Source Qualifier
Call a stored procedure                              Stored Procedure
Call a procedure in a shared library or in the
  COM layer of Windows NT                            External Procedure
Generate primary keys                                Sequence Generator
Limit records to a top or bottom range               Rank
Normalize records, including those read from
  COBOL sources                                      Normalizer
Look up values                                       Lookup
Determine whether to insert, delete, update, or
  reject records                                     Update Strategy
Join records from different databases or flat
  file systems                                       Joiner
---------------------------------------------------  -------------------------
56. Expressions in Transformations: explain briefly how you use them.
Expressions in Transformations
To transform data passing through a transformation, you can write an expression. The most obvious examples of these are the
Expression and Aggregator transformations, which perform calculations on either single values or an entire range of values within a
port. Transformations that use expressions include the following:
---------------  --------------------------------------------------------------
Transformation   How It Uses Expressions
---------------  --------------------------------------------------------------
Expression       Calculates the result of an expression for each row passing
                 through the transformation, using values from one or more
                 ports.
Aggregator       Calculates the result of an aggregate expression, such as a
                 sum or average, based on all data passing through a port or
                 on groups within that data.
Filter           Filters records based on a condition you enter using an
                 expression.
Rank             Filters the top or bottom range of records, based on a
                 condition you enter using an expression.
Update Strategy  Assigns a numeric code to each record based on an expression,
                 indicating whether the Informatica Server should use the
                 information in the record to insert, delete, or update the
                 target.
---------------  --------------------------------------------------------------
In each transformation, you use the Expression Editor to enter the expression. The Expression Editor supports the
transformation language for building expressions. The transformation language uses SQL-like functions, operators, and other
components to build the expression. For example, as in SQL, the transformation language includes the functions COUNT and SUM.
However, the Power Mart / Power Center transformation language includes additional functions not found in SQL.
When you enter the expression, you can use values available through ports. For example, if the transformation has two input
ports representing a price and sales tax rate, you can calculate the final sales tax using these two values. The ports used in the
expression can appear in the same transformation, or you can use output ports in other transformations.
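The price-and-tax example above can be sketched as a per-row expression. The port names PRICE and TAX_RATE are hypothetical; this is an analogy for an Expression transformation, not transformation-language code.

```python
def sales_tax_expression(row):
    """Evaluates PRICE * TAX_RATE for each row passing through, like an
    Expression transformation with two input ports and one output port."""
    return row["PRICE"] * row["TAX_RATE"]

rows = [{"PRICE": 100.0, "TAX_RATE": 0.08},
        {"PRICE": 250.0, "TAX_RATE": 0.05}]
taxes = [sales_tax_expression(r) for r in rows]
print(taxes)   # [8.0, 12.5]
```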
57. In case a flat file (which comes through FTP as a source) has not arrived, what happens? Where do you set this option?
You get a fatal error, which causes the server to fail/stop the session.
You can set the Event-Based Scheduling option in Session Properties, under the General tab --> Advanced Options.
---------------------------  ------------------  ----------------------------------------------
Event-Based Scheduling       Required/Optional   Description
---------------------------  ------------------  ----------------------------------------------
Indicator File to Wait For   Optional            Required to use event-based scheduling. Enter
                                                 the indicator file (or directory and file)
                                                 whose arrival schedules the session. If you do
                                                 not enter a directory, the Informatica Server
                                                 assumes the file appears in the server variable
                                                 directory $PMRootDir.
---------------------------  ------------------  ----------------------------------------------
58. What is the Test Load Option and when you use in Server Manager?
When testing sessions in development, you may not need to process the entire source. If this is true, use the Test Load
option (Session Properties --> General tab --> Target Options: choose the Target Load option Normal (option button), check the
Test Load check box, and enter the number of rows to test, e.g. 2000 (text box)). You can then click the Start button.
59. What is difference between data scrubbing and data cleansing?
Scrubbing data is the process of cleaning up the junk in legacy data and making it accurate and useful for the next generations of
automated systems. This is perhaps the most difficult of all conversion activities. Very often, this is made more difficult when the
customer wants to make good data out of bad data. This is the dog work. It is also the most important and cannot be done
without the active participation of the user.
Data cleansing is a two-step process: DETECTION and then CORRECTION of errors in a data set.
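The two-step detection/correction process can be sketched as follows; the specific validation rules and field names are illustrative assumptions, not part of any standard.

```python
def detect_errors(records):
    """Step 1 -- DETECTION: flag records that violate simple rules."""
    return [i for i, r in enumerate(records)
            if not r.get("name") or r.get("age", 0) < 0]

def correct_errors(records, bad_indexes):
    """Step 2 -- CORRECTION: repair or default the flagged fields."""
    for i in bad_indexes:
        r = records[i]
        r["name"] = r.get("name") or "UNKNOWN"
        if r.get("age", 0) < 0:
            r["age"] = None   # unknowable value is better than bad data
    return records

data = [{"name": "Ann", "age": 34}, {"name": "", "age": -5}]
cleansed = correct_errors(data, detect_errors(data))
```

In practice the detection rules come from profiling the legacy data with the users, which is why cleansing cannot be done without their active participation.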
OLAP:
- Decision support
- Subject oriented
- Historical data
- Summarized, isolated data
- Complex queries
- Uses a data warehouse
72. What are the various ETL tools in the Market?
Data Stage, Data Junction, Abinitio, Informatica, Cognos, OWB
73. What are the various Reporting tools in the Market?
Seagate Crystal Reports, Business Objects, Microstrategy, Cognos
74. What is Fact table?
A table in a star schema that contains facts. A fact table typically has two types of columns: those that contain facts and those
that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign
keys.
(OR)
The tables which are extracted from heterogeneous sources and used in the Data Warehouse
75. What is a dimension table?
Dimension tables describe the business entities of an enterprise, represented as hierarchical, categorical information such as
time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.
76. What is a lookup table?
Another name for Dimension table.
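The fact/dimension relationship described above can be sketched with lookups: each fact row carries foreign keys that resolve against the dimension (lookup) tables, and the composite of those keys serves as the fact table's primary key. The table contents are illustrative.

```python
# Dimension (lookup) tables keyed by their primary keys.
time_dim = {1: "2001-Q1", 2: "2001-Q2"}
product_dim = {10: "Widget", 11: "Gadget"}

# Fact table: foreign keys to the dimensions plus a measure column.
# The composite (time_key, product_key) acts as the fact table's primary key.
fact_sales = [
    {"time_key": 1, "product_key": 10, "amount": 500.0},
    {"time_key": 2, "product_key": 11, "amount": 750.0},
]

# Resolving each fact row through its dimensions, as a star-schema join would:
report = [(time_dim[f["time_key"]], product_dim[f["product_key"]], f["amount"])
          for f in fact_sales]
print(report)   # [('2001-Q1', 'Widget', 500.0), ('2001-Q2', 'Gadget', 750.0)]
```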
77. What is a general purpose scheduling tool? Name some of them?
A general-purpose scheduling tool is used to schedule jobs; the operating system's scheduler can also be used.
Tools:
78. What are modeling tools available in the Market? Name some of them?
Visio-based database modeling component, ERWin
109. Can we override a native sql query within Informatica? Where do we do it? How do we do it?
Yes, using the SQL query override in the mapping properties.
E.g.: SELECT * FROM ... WHERE ... It is advised not to use an ORDER BY clause here.
110. Can we use procedural logic inside Informatica? If yes how , if no how can we use external procedural logic in
Informatica?
Do we need an ETL tool? When do we go for the tools in the market?
We go for an ETL tool when the ETL process requires less technical skill, more automation, and involves a heavy volume of data.
111. How do we extract SAP data Using Informatica?