Is Sorter Transformation Passive or Active?


Is sorter transformation passive or active?

The Sorter transformation is passive when the Distinct option is
unchecked, because it does not change the number of records passing
through it. Once the Distinct option is checked it may change the
number of records, if the input contains duplicates.
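The effect is the same as the difference between a plain SELECT and
SELECT DISTINCT in SQL; a sketch against a hypothetical ITEMS table:

    -- Distinct unchecked: every input row passes through; only the order changes
    SELECT ITEM_ID, ITEM_NAME FROM ITEMS ORDER BY ITEM_NAME;

    -- Distinct checked: duplicate rows collapse, so the row count can drop
    SELECT DISTINCT ITEM_ID, ITEM_NAME FROM ITEMS ORDER BY ITEM_NAME;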

Why do we use the Source Qualifier transformation?

The Source Qualifier is the first transformation; it reads data from a
relational or flat file source. Against a relational source, it can:
1. limit the source data,
2. sort or aggregate it, or
3. remove duplicate data via a SQL query (see the sketch below).
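A minimal sketch of such a SQL override (the SRC_ORDERS table and its
columns are hypothetical):

    -- Limits, sorts, and de-duplicates the source data in one SELECT
    SELECT DISTINCT CUST_ID, ORDER_DATE, AMOUNT
    FROM   SRC_ORDERS
    WHERE  AMOUNT > 0                  -- limit the source data
    ORDER  BY CUST_ID, ORDER_DATE;     -- sort before entering the pipeline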

When do we go for an unconnected Lookup transformation, and why?

1. When we want to get a single return value.
2. It can be called from anywhere in the mapping, for example from an
Expression transformation using the :LKP.lkp_name(port) syntax.
3. It gives you the flexibility to run the lookup only for selected
source rows, which you do not get with a connected Lookup.

1. Difference between connected and unconnected lookup; which one
is faster?
Connected is generally faster, as it is wired directly into the
pipeline data flow.
2. Star schema
3. Bottlenecks
4. XML case sensitive
5. Largest table size
6. Lead experience
7. APN Software experience
8. Informatica experience
9. Something about myself
10. Database Tuning
11. Active and passive transformation
12. How to create sequence generator in Mapplet
13. Different commit options
Session property – Commit Type and Commit interval
14. Different email options
15. Which folder do reject files go in?
The bad file directory ($PMBadFileDir)
16. What’s the use of copy paste in Repository manager
Copies all the related components
17. Need insert and update; how to proceed?
Use a simple Update Strategy transformation (flagging rows with
DD_INSERT or DD_UPDATE) and set the session property Treat Source
Rows As to Data Driven.
18. Can you create a Target transformation in a mapplet?
No
19. Can you attach more than one source to the same Source Qualifier?
Yes
20. What is constraint based loading?
For rows from the same source qualifier, the Integration Service loads
parent target tables before child target tables, following their
primary key / foreign key relationships (see the sketch after this
list).
21. What is Target load order?
The order in which the Integration Service loads the target load order
groups (one group per source qualifier) within a mapping.
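Constraint based loading is easiest to picture as the insert order a
database's foreign keys would force anyway; a sketch with hypothetical
DEPT (parent) and EMP (child) tables:

    -- The parent row must be loaded first...
    INSERT INTO DEPT (DEPT_ID, DEPT_NAME) VALUES (10, 'SALES');
    -- ...so that the child row does not violate the foreign key
    INSERT INTO EMP (EMP_ID, EMP_NAME, DEPT_ID) VALUES (100, 'SMITH', 10);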

Now what are some of the pros and cons of ELT?


Pros:
* ELT leverages RDBMS engine hardware for scalability
* ELT keeps all data in the RDBMS all the time
* ELT is parallelized according to the data set, and disk I/O is usually
optimized at the engine level for faster throughput.
* ELT scales as long as the hardware and RDBMS engine can continue
to scale.
* ELT can achieve 3x to 4x the throughput rates on an appropriately
tuned MPP RDBMS platform.

Cons:
* ELT relies on proper database tuning and a properly normalized
data model architecture
* ELT relies on MPP hardware
* ELT can easily eat 100% of the hardware resources available for
complex and huge operations
* ELT can't balance the workload
* ELT can't reach out to alternate systems (all data must exist in the
RDBMS BEFORE ELT operations take place)
* ELT easily doubles, triples, and quadruples disk storage
requirements (more processes, each simpler, each requiring
intermediate temporary tables).
* ELT (sometimes) is not 100% metadata lineage traceable.
* ELT can take longer to design and implement, more steps, less
complicated per step, but usually results in more custom SQL code
(sometimes this is where metadata is lost).
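In practice the "T" of ELT is SQL executed inside the target RDBMS
after a bulk load; a sketch with hypothetical staging and target
tables:

    -- "E" and "L": raw rows land in STG_SALES via a bulk loader.
    -- "T": the transformation then runs in-database as plain SQL.
    INSERT INTO TGT_SALES_SUMMARY (CUST_ID, TOTAL_AMOUNT)
    SELECT CUST_ID, SUM(AMOUNT)
    FROM   STG_SALES
    GROUP  BY CUST_ID;

Each such step typically writes an intermediate table, which is why
the disk storage requirements mentioned above multiply.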

Pros and Cons of ETL


Pros:
* ETL can balance the workload / share the workload with the RDBMS
* ETL can perform more complex operations within a single data flow
diagram (data map)
* ETL can scale with separate hardware.
* ETL can handle partitioning and parallelism independently of the
data model, database layout, and source data model architecture.
* ETL can process data in-stream, as it transfers from source to target
* ETL does not require co-location of data sets in order to do its work.
* ETL captures huge amounts of metadata lineage today.
* ETL can run on SMP or MPP hardware

Cons:
* ETL requires separate and equally powerful hardware in order to
scale.
* ETL can "bounce" data to and from the target database, requires
separate caching mechanisms which sometimes don't scale to the
magnitude of the data set - this can result in scalability and
performance issues.
* ETL cannot perform as fast as ELT without twice the size of
hardware (usually for RAM and CPU resources).

What are Sessions and Batches?

Session - A session is a set of instructions that tells the Informatica
Server how and when to move data from sources to targets. After
creating the session, we can use either the Server Manager or the
command line program pmcmd to start or stop the session (see the
command sketch below). A session is a type of task, similar to other
tasks available in the Workflow Manager. In the Workflow Manager, you
configure a session by creating a Session task. You can run as many
sessions in a workflow as you need, sequentially or concurrently,
depending on the requirement.
Batches - A batch is a set of tasks that may include one or more
tasks (sessions, event wait, email, command, etc.). It provides a way
to group sessions for either serial or parallel execution by the
Informatica Server. There are two types of batches:
Sequential - runs sessions one after the other; data moves from source
to target one session at a time.
Concurrent - runs sessions at the same time; data for all sessions
moves from source to target simultaneously.
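A sketch of starting a workflow with pmcmd, where the service, domain,
user, password, folder, and workflow names are all placeholders:

    pmcmd startworkflow -sv INT_SVC -d DOMAIN_DEV -u Administrator -p admin_pwd -f SALES_FOLDER wf_load_sales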

What are Target Types on the server?

• Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft
SQL Server, and Teradata.

• File. Fixed-width and delimited flat files and XML.

• Application. You can purchase additional PowerCenter Connect
products to load data into SAP BW. You can also load data into IBM
MQSeries message queues and TIBCO.

• Other. Microsoft Access.

You can load data into targets using ODBC or native drivers, FTP, or
external loaders.

What are Target Options on the Servers?

Target Options for File Target type are FTP File, Loader and MQ.

There are no target options for ERP target type

Target Options for Relational are Insert, Update (as Update), Update
(as Insert), Update (else Insert), Delete, and Truncate Table.
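The "Update (else Insert)" option behaves like the ANSI MERGE
statement; a sketch with hypothetical staging and target tables:

    -- Update the row if the key already exists, otherwise insert it
    MERGE INTO TGT_CUSTOMER t
    USING STG_CUSTOMER s
    ON (t.CUST_ID = s.CUST_ID)
    WHEN MATCHED THEN
        UPDATE SET t.CUST_NAME = s.CUST_NAME
    WHEN NOT MATCHED THEN
        INSERT (CUST_ID, CUST_NAME) VALUES (s.CUST_ID, s.CUST_NAME);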

How do you identify existing rows of data in the target table
using a lookup transformation?

There are two ways to look up the target table to verify whether a
row exists:
1. Use a connected Lookup with a dynamic cache, then check the value
of the NewLookupRow output port (0 = no change, 1 = insert, 2 =
update) to decide whether the incoming record already exists in the
table / cache.

2. Use an unconnected Lookup, call it from an Expression
transformation, and check the returned lookup port value (NULL / not
NULL) to decide whether the incoming record already exists in the
table.
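Either way, per input row the lookup effectively asks the target a
keyed question; a sketch against a hypothetical target table:

    -- A row returned means the record exists; no row means it is new
    SELECT CUST_ID
    FROM   TGT_CUSTOMER
    WHERE  CUST_ID = 42;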

What are 2 modes of data movement in Informatica Server?

a) Unicode - the Informatica Server allows up to two bytes for each
character, using the additional byte for non-ASCII characters (such
as Japanese characters).

b) ASCII - the Informatica Server holds each character in a single
byte.

What is Code Page used for?

A code page is used to identify characters that might be in different
languages. If you are importing Japanese data into a mapping, you
must select the Japanese code page for the source data.

What is Code Page Compatibility?

Compatibility between code pages is required for accurate data
movement when the Informatica Server runs in the Unicode data
movement mode. If the code pages are identical, there will not be any
data loss. One code page can be a subset or superset of another. For
accurate data movement, the target code page must be a superset of
the source code page.

Superset - A code page is a superset of another code page when it
contains all the characters encoded in the other code page and also
contains additional characters not contained in the other code page.
For example, MS Latin1 is a superset of 7-bit ASCII.

Subset - A code page is a subset of another code page when all
characters in the code page are encoded in the other code page.

What is Load Manager?


While running a workflow, the PowerCenter Server uses the Load
Manager process and the Data Transformation Manager (DTM) process to
run the workflow and carry out workflow tasks.

When the PowerCenter Server runs a workflow, the Load Manager
performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.

When the PowerCenter Server runs a session, the DTM performs the
following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is
enabled; checks query conversions if data code page validation is
disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL (see the sketch after
this list).
8. Creates and runs mapping, reader, writer, and transformation
threads to extract, transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
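Step 7's pre-session SQL is commonly used to prepare the target
environment; a hypothetical example that empties a staging table
before each run:

    -- Hypothetical pre-session SQL: clear the staging table per run
    TRUNCATE TABLE STG_SALES;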

What is source qualifier?

The Source Qualifier transformation represents the rows that the
Integration Service reads when it runs a session. It can be used to:
1. select only distinct values from the source,
2. sort and filter rows,
3. create a custom query to issue a special SELECT statement for the
Integration Service to read source data, and
4. join homogeneous source systems.
Any number of relational sources can be joined in a single Source
Qualifier (see the sketch below). Flat files cannot be joined in a
Source Qualifier, because flat file sources do not support SQL: when
a flat file is opened in a Source Qualifier, the join options are
disabled.
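A sketch of the homogeneous join a single Source Qualifier can
perform, assuming two hypothetical tables in the same database:

    -- One Source Qualifier reading two related tables with one query
    SELECT O.ORDER_ID, O.AMOUNT, C.CUST_NAME
    FROM   SRC_ORDERS O, SRC_CUSTOMERS C
    WHERE  O.CUST_ID = C.CUST_ID;   -- user-defined join condition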
While importing the relational source definition from a database,
what metadata of the source is imported?

Source name
Database location
Column names
Datatypes
Key constraints

Mainframe support
COBOL Copy-book files
VSAM files

What is a mapplet?

A mapplet is a reusable object containing a set of transformations
that can be used in multiple mappings. For example, suppose we have
several fact tables that require a series of dimension keys. We can
create a mapplet containing a series of Lookup transformations to
find each dimension key and use it in each fact table mapping,
instead of recreating the same lookup logic in every mapping.

What is a transformation?

A transformation is a repository object that converts a given input
to the desired output. It can generate, modify, and pass data.

A connected transformation is part of the data flow in the pipeline,
while an unconnected transformation is not.

How many ways can you create ports?

1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.

What are reusable transformations?

1. A reusable transformation can be used in multiple mappings.
2. The Designer stores each reusable transformation as metadata,
separate from any mappings that use the transformation.
3. Every reusable transformation falls within a category of
transformations available in the Designer.
4. An External Procedure transformation can only be created as a
reusable transformation.
When you need to incorporate a reusable transformation into a
mapping, you add an instance of it to the mapping. If you later
change the definition of the transformation, all instances of it
inherit the changes. Since the instance of a reusable transformation
is a pointer to that transformation, you can change the
transformation in the Transformation Developer and its instances
automatically reflect the changes. This feature can save you a great
deal of work.
