Informatica
Informatica PowerCenter – This component can promote a local repository into a global one. It
supports ERP sources as well as diverse local and global repositories.
Informatica PowerConnect – This component mines bulk raw data, extracting meaningful
insights and metadata from ERPs and third-party applications.
Informatica PowerMart – This component processes comparatively smaller volumes of data and
supports local repositories only. Unlike Informatica PowerCenter, PowerMart supports neither
ERPs nor global repositories.
Informatica PowerExchange – This component supports batch, real-time, and changed-data-capture
(CDC) options in various set-ups. It allows companies to leverage their data without
hand-coding data-extraction programs.
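To make the changed-data-capture idea concrete, here is a minimal, purely illustrative sketch that detects changes by diffing two snapshots of a source table keyed by primary key. This is not how PowerExchange works internally (production CDC tools typically read database logs); the function and data below are made up for illustration.

```python
# Toy CDC pass: compare two snapshots of a table keyed by primary key and
# classify each row as an insert, update, or delete. Real CDC products read
# the database's change log rather than diffing full snapshots.

def capture_changes(previous: dict, current: dict) -> dict:
    """Return rows inserted, updated, or deleted between two snapshots."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return {"insert": inserts, "update": updates, "delete": deletes}

if __name__ == "__main__":
    before = {1: {"name": "Ava"}, 2: {"name": "Ben"}}
    after = {2: {"name": "Benjamin"}, 3: {"name": "Cara"}}
    print(capture_changes(before, after))
```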
Informatica PowerAnalyzer – This component provides reporting facilities that give companies a
clear view into their business processes, with benefits ranging from accessing and examining
enterprise data to sharing it in a lucid way.
Informatica Data Quality – This component consists of a set of applications and components that
improve enterprise-wide data quality, and it scales those services so they can be shared across
multiple machines.
Informatica Domain – An administrative unit consisting of nodes and services, which can be
further organized into folders and subfolders. There are basically two types of services in the
Informatica Domain: the Service Manager and Application Services. The former authenticates and
authorizes logins and runs the application services; the latter comprises the Integration
Service, Repository Service, and Reporting Service.
Repository Service – This service maintains connections between clients and the PowerCenter
repository. It is a multi-threaded process that fetches, inserts, and updates metadata, and it
keeps the repository metadata consistent.
Nodes – Nodes are the computing platforms where the aforementioned services are executed.
Reporting Service – This service is responsible for handling the metadata and allowing other
services to access it.
Integration Service – This service is the engine that executes the tasks created in the
Informatica tool. It is essentially a process inside the server that waits for tasks to be
assigned. As soon as a workflow is executed, the Integration Service receives its details and
executes it.
PowerCenter Designer – It is a developer tool used for creating ETL mappings between source
and target.
Data Marts
A data mart can be defined as a subset of an organization's data warehouse that is limited to a
specific business unit or group of users. It is a subject-oriented database, also known as a
High-Performance Query Structure (HPQS).
Dependent Data Mart – This data mart depends on the enterprise data warehouse and works in a
top-down manner.
Independent Data Mart – This data mart does not depend on the enterprise data warehouse and
works in a bottom-up manner.
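The dependent (top-down) style can be sketched in a few lines: the mart is carved out of an already-existing warehouse table, so every mart inherits the warehouse's single version of the data. All table and field names below are invented for illustration.

```python
# Illustrative sketch of a dependent data mart: a subject-oriented subset
# of an enterprise warehouse table, filtered down to one business unit.

warehouse_sales = [
    {"region": "EMEA", "unit": "retail", "amount": 120},
    {"region": "APAC", "unit": "retail", "amount": 80},
    {"region": "EMEA", "unit": "wholesale", "amount": 200},
]

def build_data_mart(rows, business_unit):
    """Top-down: derive a per-unit mart from the existing warehouse rows."""
    return [r for r in rows if r["unit"] == business_unit]

retail_mart = build_data_mart(warehouse_sales, "retail")
print(retail_mart)
```

An independent mart, by contrast, would populate its rows straight from the source systems, which is how the inconsistencies mentioned later can creep in.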
The Credit Risk Report server will compile the required data from the source systems (Debt
Manager and Transact SM).
Strategic Reporting
Facts and Dimensions
In-memory processing relies on a huge cache
Landing Zone
Staging Zone
The pipeline is built upon the following AWS services and open source software:
AWS Step Functions to manage and orchestrate the event-driven process from file arrival to dataset publishing.
http://versent.com.au/insights/aws-re-invent-2017-recap-the-security-version
The results from GuardDuty can be pushed to AWS CloudWatch Events to trigger AWS Lambda functions
that perform specific actions based on the type of issue GuardDuty discovered.
https://aws.amazon.com/guardduty/
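A minimal sketch of such a Lambda function is below. The event shape is simplified but follows the CloudWatch Events pattern where the GuardDuty finding sits under `event["detail"]` with a `type` string; the response actions and thresholds here are hypothetical examples, not AWS recommendations.

```python
# Hypothetical Lambda handler invoked by a CloudWatch Events rule for
# GuardDuty findings. It inspects the finding type and picks a response
# action; the actions themselves are placeholders for illustration.

def lambda_handler(event, context):
    finding_type = event.get("detail", {}).get("type", "")
    if finding_type.startswith("UnauthorizedAccess"):
        # e.g. swap the instance into a quarantine security group
        action = "isolate-instance"
    elif finding_type.startswith("Recon"):
        action = "notify-security-team"
    else:
        action = "log-only"
    return {"finding": finding_type, "action": action}
```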
=========
1. Nimbus Node
2. ZooKeeper Nodes
3. Supervisor Nodes
Amazon Redshift
===============
• Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java
for data cleaning and preprocessing.
• Imported and exported data into HDFS and Hive using Sqoop
• Experienced in running Hadoop streaming jobs to process terabytes of XML-format data
• Loaded and transformed large sets of structured, semi-structured and unstructured data
• Responsible for managing data coming from different sources
• Ensured projects were completed on time, to quality standards, and within budget, in line with client expectations
• Managed the day-to-day relationship with client and internal stakeholders and the resolution of all issues
Monolithic to Microservices
• Installed and configured Cloudera Hadoop; developed multiple MapReduce jobs in Java for data
cleaning and preprocessing.
• Involved in creating Hive tables, loading them with data, and writing Hive queries that run
internally as MapReduce jobs.
• Gained very good business knowledge on health insurance, claim processing, fraud suspect
identification, appeals process etc.
• Reconciliation Concept
===================
Data Loader
Data Sync
Data Replication
Contact Validation
Custom Apps
Amazon EC2
Amazon Redshift
Amazon EMR
Amazon RDS
Amazon DynamoDB
Amazon Aurora
Amazon S3
Deliver in-depth technical workshops, executing on the defined learning path and
developing technical aspects of key scenarios defined in the technology roadmap. Enable and
teach the partner in performing and delivering Architecture Design Sessions (ADS). Provide
guidance for completing competency/qualification technical requirements.
Enable the partner in identifying technology opportunities that enable and/or support the
creation of differentiated offers in the market. Enable the partner in creating/adapting SOW for
the offers, defining the delivery/operational models and including adoption services and
activities.
Enable the partner to be successful during pre-sales activities, which include delivering
Proofs of Concept (POCs), pilots, and prototypes, and removing technical blockers and objections.
Enable the partner to discover ways to automate solutions in order to reduce costs and create
repeatability, while documenting processes for knowledge retention and IP.
Qualifications
Experiences Required: Education, Key Experiences, Skills and Knowledge:
Ability to create deep technical relationships, assess the level of the partner's technical
roles, design a time-effective learning path, and evaluate progress toward the defined
milestones.
Deep technical skill and significant experience in the relevant practice area and
technology focus.
Work with solutions architect(s) to provide a consensus based enterprise solution that is
scalable, adaptable and in synchronization with ever-changing business needs.
1. Data Analytics - Translate business objectives into analytic approaches and identify data
sources to support analysis. Design, develop and implement analytical techniques on
large, complex, structured and unstructured data sets to help make better decisions.
2. Data Integration - Connect offline and online data to continuously improve overall
understanding of customer behavior and journeys for personalization. Data pre-processing
includes collecting, parsing, managing, analyzing and visualizing large sets of data.
3. Data Quality Management - Cleanse the data and improve data quality and readiness
for analysis. Drive standards, define and implement/improve data governance strategies
and enforce best practices to scale data analysis across platforms.
4. Data Mining - Implement statistical and data mining techniques e.g. hypothesis testing,
joins, aggregations, regressions, associations, correlations, inferences, clustering, graph
analysis and retrieval processes on a large amount of data to identify trends, figures and
KPIs across business units, segments, regions etc.
5. Research - Research advanced and better ways of solving data-specific problems and
establish best practices.
6. Collaborate - Collaborate with other data scientists, subject matter experts, and business
teams around the globe to deliver strategic advanced data analytics projects from
design to execution.
Mandatory
Desirable
Skillset Required:
Primary skills:
10+ Yrs: Datastage Technical Manager (strong ETL Background); Technical Project
Manager - DWH Projects
Oversee/design information architecture for the data warehouse, including all information
structures, i.e. staging area, data warehouse, data marts, and operational data stores.
Oversee standardization of data definitions and development of physical/logical modeling;
develop strategies for warehouse and database implementation/management.
Support both development and production support activities.
Provides technical consulting and leadership in identifying and implementing new uses
of information technologies, which assist the functional business units in meeting their
strategic objectives.
Acts as technical resource to lead Delivery staff in all phases of the development and
implementation process.
Prepares estimates for project work.
Oversees the technical direction of design and development to ensure
alignment with architecture, business requirements, and industry best practices.
1. Sqoop
2. Flume
3. Storm
Document database
data management steps—for example, capturing, cleaning and integrating data. Second, natural-
language processing (NLP) has expanded my thinking on how to measure attitudes. This new knowledge
inspired me t
A microservice is an independent entity that executes a minimal amount of work upon each
service call; it is independent because it normally does not share its persistence
support with other microservices, and communication happens across service
boundaries through interfaces and message passing.
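The definition above can be sketched in miniature: two "services", each owning its own private persistence, interacting only through a message-passing interface. The service names and message shapes are invented for illustration; in a real system the boundary would be HTTP, gRPC, or a message queue rather than a function call.

```python
# Two toy microservices: each owns its own persistence (a private dict)
# and they communicate only via messages crossing an interface boundary.

class InventoryService:
    def __init__(self):
        self._stock = {"sku-1": 5}          # private persistence

    def handle(self, message):
        """Message-passing interface: reserve one unit of a SKU."""
        if message["op"] == "reserve":
            sku = message["sku"]
            if self._stock.get(sku, 0) > 0:
                self._stock[sku] -= 1
                return {"ok": True}
        return {"ok": False}

class OrderService:
    def __init__(self, send_to_inventory):
        self._orders = {}                   # private persistence
        self._send = send_to_inventory      # boundary: messages only

    def place_order(self, order_id, sku):
        reply = self._send({"op": "reserve", "sku": sku})
        self._orders[order_id] = "placed" if reply["ok"] else "rejected"
        return self._orders[order_id]

inventory = InventoryService()
orders = OrderService(inventory.handle)
print(orders.place_order("o-1", "sku-1"))
```

Note that neither service reads the other's dict directly; that separation is what lets each be deployed, scaled, and persisted independently.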
The data in a data warehouse is typically loaded through an extraction, transformation, and loading
(ETL) process from multiple data sources.
Modern data warehouses are moving toward an extract, load, transform (ELT) architecture in
which all or most data transformation is performed on the database that hosts the data warehouse.
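A toy ETL pass over in-memory "sources" illustrates the three stages; in an ELT architecture the transform step would instead run as SQL inside the warehouse after loading. The source names and row shapes are made up for illustration.

```python
# Minimal extract-transform-load sketch: pull rows from several sources,
# normalize them, then append them to a warehouse table (a plain list here).

def extract(sources):
    """Gather raw rows from every source system."""
    rows = []
    for source in sources:
        rows.extend(source)
    return rows

def transform(rows):
    """Normalize names and cast amounts before loading."""
    return [{"name": r["name"].strip().title(), "amount": float(r["amount"])}
            for r in rows]

def load(rows, warehouse):
    """Append cleaned rows to the warehouse; return the row count."""
    warehouse.extend(rows)
    return len(rows)

crm = [{"name": " alice ", "amount": "10"}]
erp = [{"name": "BOB", "amount": "20.5"}]
warehouse = []
load(transform(extract([crm, erp])), warehouse)
print(warehouse)
```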
Although the discussion above has focused on the term "data warehouse", there are two other
important terms that need to be mentioned. These are the data mart and the operational data
store (ODS).
Data marts exist in two styles. Independent data marts are those which are fed directly from source
data. They can turn into islands of inconsistent information. Dependent data marts are fed from an
existing data warehouse. Dependent data marts can avoid the problems of inconsistency, but they
require that an enterprise-level data warehouse already exist.
The DataStax Java Driver for Cassandra exposes interfaces such as:
Interfaces
com.datastax.driver.core.Cluster