2. Define Data Services components.

Data Services includes the following standard components:

● Designer
● Repository
● Job Server
● Engines
● Access Server
● Adapters
● Real-time Services
● Address Server
● Cleansing Packages, Dictionaries, and Directories
● Management Console

3. What are the steps included in the data integration process?

● Stage data in an operational datastore, data warehouse, or data mart
● Update staged data in batch or real-time modes
● Create a single environment for developing, testing, and deploying the entire data integration platform
● Manage a single metadata repository to capture the relationships between different extraction and
access methods and provide integrated lineage and impact analysis

4. Define the terms Job, Work Flow, and Data Flow.

A job is the smallest unit of work that you can schedule independently for execution.
A work flow defines the decision-making process for executing data flows.
Data flows extract, transform, and load data. Everything having to do with data, including reading sources,
transforming data, and loading targets, occurs inside a data flow.
5. How many types of datastores are present in Data Services?
There are three types:

● Database Datastores: provide a simple way to import metadata directly from an RDBMS.
● Application Datastores: let users easily import metadata from most Enterprise Resource Planning
(ERP) systems.
● Adapter Datastores: can provide access to an application’s data and metadata or just metadata.

6. What are Memory Datastores?

Data Services also allows you to create a database datastore using Memory as the Database type.
Memory datastores are designed to enhance the processing performance of data flows executing in
real-time jobs.
7. What are file formats?
A file format is a set of properties describing the structure of a flat file (ASCII). File formats describe the
metadata structure. File format objects can describe files in:
● Delimited format — Characters such as commas or tabs separate each field
● Fixed width format — The column width is specified by the user
● SAP ERP and R/3 format

8. What is a repository? List the types of repositories.

A repository is a set of tables that holds user-created and predefined system objects, source and target
metadata, and transformation rules. There are three types of repositories:

● A local repository
● A central repository
● A profiler repository

9. What is the difference between a Repository and a Datastore?

A repository is a set of tables that holds system objects, source and target metadata, and transformation
rules. A datastore is an actual connection to a database that holds data.
10. What is the difference between a Parameter and a Variable?
A Parameter is an expression that passes a piece of information to a work flow, data flow, or custom
function when it is called in a job. A Variable is a symbolic placeholder for values.
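
As a minimal sketch (the names $gv_load_date and $p_load_date are hypothetical), a script step can
assign a value to a global variable, which is then mapped to a data flow parameter when the data flow
is called:

    # Script before the data flow: assign a value to the global variable.
    $gv_load_date = sysdate();

    # On the data flow call, the parameter $p_load_date is mapped to
    # $gv_load_date, so expressions inside the data flow can reference it:
    # SRC.LOAD_DATE = $p_load_date
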
11. When would you use a global variable instead of a local variable?

● When the variable needs to be used multiple times within a job
● When you want to reduce the development time required for passing values between job components
● When you need to create a dependency between a job-level global variable name and job components

13. List some reasons why a job might fail to execute.

● Incorrect syntax
● Job Server not running
● Port numbers for the Designer and Job Server not matching
14. List factors you consider when determining whether to run work flows or data flows serially or in
parallel.
Consider the following:

● Whether or not the flows are independent of each other
● Whether or not the server can handle the processing requirements of flows running at the same
time (in parallel)

16. What are Adapters?

Adapters are additional Java-based programs that can be installed on the Job Server to provide
connectivity to other systems, such as Salesforce.com or a Java messaging queue. There is also a
Software Development Kit (SDK) that allows customers to create adapters for custom applications.
17. List the Data Integrator transforms.

● Data_Transfer
● Date_Generation
● Effective_Date
● Hierarchy_Flattening
● History_Preserving
● Key_Generation
● Map_CDC_Operation
● Pivot
● Reverse Pivot
● Table_Comparison
● XML_Pipeline

18. List the Data Quality Transforms.

● Global_Address_Cleanse
● Data_Cleanse
● Match
● Associate
● Country_ID
● USA_Regulatory_Address_Cleanse

19. What are Cleansing Packages?

These are packages that enhance the ability of Data Cleanse to accurately process various forms of
global data by including language-specific reference data and parsing rules.
20. What is Data Cleanse?
The Data Cleanse transform identifies and isolates specific parts of mixed data, and standardizes your
data based on information stored in the parsing dictionary, business rules defined in the rule file, and
expressions defined in the pattern file.
21. What is the difference between Dictionary and Directory?
Directories provide information on addresses from postal authorities. Dictionary files are used to identify,
parse, and standardize data such as names, titles, and firm data.
22. Give some examples of how data can be enhanced through the Data Cleanse transform, and
describe the benefit of those enhancements.

● Gender Codes: Determine gender distributions and target marketing campaigns.
● Match Standards: Provide fields for improving matching results.

23. A project requires the parsing of names into given and family, validating address information,
and finding duplicates across several systems. Name the transforms needed and the task they will
perform.

● Data Cleanse: Parse names into given and family.
● Address Cleanse: Validate address information.
● Match: Find duplicates.

24. Describe when to use the USA Regulatory and Global Address Cleanse transforms.
Use the USA Regulatory Address Cleanse transform if USPS certification and/or additional options such
as DPV and Geocode are required. Use the Global Address Cleanse transform when processing
multi-country data.
25. What are the different strategies you can use to avoid duplicate rows of data when re-loading a
job?

● Using the auto-correct load option in the target table
● Including the Table Comparison transform in the data flow
● Designing the data flow to completely replace the target table during each execution
● Including a preload SQL statement to execute before the table loads (see the sketch below)
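
For example, the preload statement (or an equivalent sql() call in a script) can clear the slice that is
about to be re-loaded. A minimal sketch, assuming a hypothetical Target_DS datastore, SALES_FACT
table, and $gv_load_date global variable; in Data Services script syntax, {} substitutes a variable's
value enclosed in quotes:

    # Remove today's slice from the target before re-loading it,
    # so a re-run does not create duplicate rows.
    sql('Target_DS', 'DELETE FROM SALES_FACT WHERE LOAD_DATE = {$gv_load_date}');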

26. What is the use of Auto Correct Load?

Auto correct load prevents duplicate rows from entering the target table. It works like a Type 1 load:
rows that do not match an existing row are inserted, and rows that match are updated.
27. What is the use of Array fetch size?
Array fetch size indicates the number of rows retrieved in a single request to a source database. The
default value is 1000. A higher value reduces the number of requests, lowering network traffic and
possibly improving performance. The maximum value is 5000.
29. What is the use of the Number of Loaders option in the target table?
Loading with one loader is known as single-loader loading; loading with more than one loader is known
as parallel loading. The default number of loaders is 1, and the maximum is 5.
30. What is the difference between lookup(), lookup_ext(), and lookup_seq()?
lookup(): returns a single value based on a single condition.
lookup_ext(): returns multiple values based on one or more conditions.
lookup_seq(): returns multiple values based on a sequence number.
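
As a sketch of the typical lookup_ext() call shape (the datastore, table, and column names here are
hypothetical), the function takes the lookup table with a cache and return policy, the column(s) to
return, their defaults, and the lookup condition:

    # Return CUST_NAME from a customer dimension for the matching CUST_ID;
    # cache the whole table up front and take the maximum row on ties.
    lookup_ext([DS_Target.DBO.DIM_CUSTOMER, 'PRE_LOAD_CACHE', 'MAX'],
               [CUST_NAME],                  # column(s) to return
               [NULL],                       # default value(s) if no match
               [CUST_ID, '=', SRC.CUST_ID])  # lookup condition
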
32. What is the use of the Map_Operation transform?
The Map_Operation transform allows you to change operation codes on data sets to produce the desired
output. The operation codes are INSERT, UPDATE, DELETE, NORMAL, and DISCARD.
33. What is Hierarchy Flattening?
The Hierarchy_Flattening transform constructs a complete hierarchy from parent/child relationships, and
then produces a description of the hierarchy in vertically or horizontally flattened format. Its main inputs
are the parent and child columns and their attributes (Parent Column, Child Column, Parent Attributes,
Child Attributes).
34. What is the use of Case Transform?
Use the Case transform to simplify branch logic in data flows by consolidating case or decision-making
logic into one transform. The transform allows you to split a data set into smaller sets based on logical
branches.
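
A minimal sketch of how the branches might be configured (the table and column names here are
hypothetical): each output label is paired with a Boolean expression, and a row is routed to the output
whose expression evaluates to true.

    # Case transform branches (label : expression)
    REGION_EAST : SRC.REGION = 'EAST'
    REGION_WEST : SRC.REGION = 'WEST'
    # Rows that match no expression can be routed to a default
    # output (e.g. REGION_OTHER) if the default output option is enabled.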
