Module 2 Edcmicdiforcdw 21588742042605
Today’s Agenda
1 Welcome
3 Efficiently Ingest Streaming Data, Files, and Databases to Cloud Data Warehouses & Data Lakes
4 Integrate & Cleanse Data into Cloud Data Warehouse & Data Lakes
5 Close
[Figure: reference architecture. Labels shown: sources (IoT, Machine Data, Apps, Log files, Social, Mobile, Documents, SaaS, Cloud Storage); Streaming, Stream Storage, Real-time Stream Processing; Data Ingestion; Data Integration & Quality; Cloud Data Lake; Cloud Data Warehouse; Data Provisioning; Streaming Analytics, Enterprise Analytics, Data Science/AI; roles: Business User, Data Analyst, Data Engineer. Numbered callouts: 2, 3, 5, 6.]
• 20.6 zettabytes per year in global data center traffic
• 500 million business data users and growing
• 20 billion connected devices
• Over 94% of data center traffic will come from the Cloud
• 1 billion workers will be assisted by machine learning or AI
Data Cataloging is the First Step
• Discovery of critical enterprise data provided foundation for data governance program
• Flipped the 80-20% rule for finding data to analyzing effort for data analysts
• Enterprise-scale metadata understanding allows them to simplify cloud migration projects
• Opening up data visibility is fueling development of new services and improving quality of patient care
Enterprise Data Catalog: Metadata System of Record for The Enterprise
Knowledge Graph + AI/ML
• Discovery
• Profiling
• Semantic Search
• Domain Discovery
• Relationships
• PK-FK Discovery
• Reviews/Ratings
• Questions/Answers
• Lineage
• Impact Analysis
• Similarity Clustering
• Business Term Association
• Business Context
• Custom Annotations
• Data Certifications
• Change Notifications
[Figure: Enterprise Data Catalog with metadata sources BigQuery, Cloud Storage, HANA, ADLS, on-prem databases, file systems, BI tools, on-prem/SaaS apps, and ETL.]
Enterprise | Unified Metadata | Intelligence
• What data assets should be migrated?
• Are there duplicate data sets?
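PK-FK Discovery is listed among the catalog capabilities but the slides do not say how it works. One common approach is an inclusion check: a column is a foreign-key candidate if all of its values appear in a unique column of another table. The sketch below illustrates that idea only. The function names and the sample `customers` and `orders` tables are hypothetical, and nothing here reflects how Enterprise Data Catalog actually implements discovery.

```python
def column_values(rows, column):
    """Non-null values of one column across a list of row dicts."""
    return [row[column] for row in rows if row.get(column) is not None]


def find_pk_fk_candidates(tables):
    """Return (child_table, child_col, parent_table, parent_col) tuples.

    A parent column qualifies as a key candidate when it is unique and
    non-null on every row; a child column qualifies as a foreign-key
    candidate when all of its values occur in that parent column.
    """
    candidates = []
    for parent_name, parent_rows in tables.items():
        if not parent_rows:
            continue
        for parent_col in parent_rows[0]:
            parent_vals = column_values(parent_rows, parent_col)
            is_key = (len(parent_vals) == len(parent_rows)
                      and len(set(parent_vals)) == len(parent_vals))
            if not is_key:
                continue
            parent_set = set(parent_vals)
            for child_name, child_rows in tables.items():
                if child_name == parent_name or not child_rows:
                    continue
                for child_col in child_rows[0]:
                    child_vals = column_values(child_rows, child_col)
                    if child_vals and set(child_vals) <= parent_set:
                        candidates.append(
                            (child_name, child_col, parent_name, parent_col))
    return candidates


# Hypothetical sample data for illustration.
tables = {
    "customers": [
        {"customer_id": 1, "name": "Ada"},
        {"customer_id": 2, "name": "Grace"},
        {"customer_id": 3, "name": "Edsger"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 3},
        {"order_id": 12, "customer_id": 1},
    ],
}
candidates = find_pk_fk_candidates(tables)
print(candidates)
```

On real data an inclusion check produces false positives (for example, two unrelated small integer columns), so candidates like these would normally be reviewed rather than accepted automatically.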
STRUCTURED
STREAMING
• Mass ingestion of files onto cloud & on-prem data lakes
• Streaming & IoT data ingestion onto data lake
• Mass Ingest on-prem database content onto cloud/on-prem data lake

• Mass & CDC Ingest on-prem database content onto Cloud DWH (Snowflake)
• Mass & CDC ingest mainframe content onto Cloud DWH

• Logs/Clickstream ingestion
• IoT data ingestion
• CDC Ingestion onto Kafka
Mass Ingestion
[Figure: Informatica Mass Ingestion. Sources: File, Database, IoT, Machine Data, Logs, Messaging. Ingestion types: File, Database, and Streaming Ingestion. Targets: Cloud Storage, Google Cloud Storage, ADLS Gen2, Cloud Data Warehouse, Messaging. Consumers: AI/ML & Data Science, Advanced Analytics, Reports & Dashboards, Self-Service Analytics, Streaming Analytics.]
© Informatica. Proprietary and Confidential.
Informatica Cloud Mass Ingestion Service Overview
• Cloud-native service for all ingestion use cases
• File
• Database (initial load and incremental CDC)
• Streaming & IoT
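The distinction between an initial load and incremental change data capture (CDC) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the service's implementation: the change-record shape (`op` plus `row`), the function names, and the in-memory dict standing in for the target table are all hypothetical.

```python
def initial_load(source_rows, key):
    """Copy every source row into the target, indexed by primary key."""
    return {row[key]: dict(row) for row in source_rows}


def apply_changes(target, changes, key):
    """Apply captured inserts, updates, and deletes to the target in order."""
    for change in changes:
        op, row = change["op"], change["row"]
        if op in ("insert", "update"):
            target[row[key]] = dict(row)
        elif op == "delete":
            target.pop(row[key], None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Initial load: full snapshot of the (hypothetical) source table.
target = initial_load(
    [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}], key="id")

# Incremental phase: only the changes captured since the snapshot.
apply_changes(target, [
    {"op": "update", "row": {"id": 1, "city": "Bergen"}},
    {"op": "delete", "row": {"id": 2}},
    {"op": "insert", "row": {"id": 3, "city": "Quito"}},
], key="id")
print(target)
```

The point of the split is cost: the snapshot is paid once, and afterwards only changed rows move, which is what makes continuous replication to a cloud warehouse practical.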
• Combine web searches & camera feeds to identify the customer and roll out real-time offers
• Identify stress signals coming from devices and act on them before it's too late
• Collect and process bedside monitor data for clinical researchers to more effectively understand and detect disease
• Run a real-time fraud detection machine-learning model on the transaction data
Streaming Ingestion
[Figure: streaming ingestion. Labels shown: Weblogs, Social, Messaging, Amazon Kinesis, Azure EventHub, Google Cloud Storage, Advanced Analytics, Self-Service Analytics, Streaming Analytics.]
• Ingest streaming data: Logs, clickstream, social media, Kafka, Kinesis, S3, ADLS, Firehose, etc.
• Apply simple transformation rules to ensure the data is ready for analytics
• Real-time monitoring of ingestion jobs with lifecycle management and alerting in case of issues
• Orchestrate streaming data ingestion in hybrid/cloud as managed and secure service
Connectivity
• Streaming sources
• Messaging and data lakes
Edge Transformation
• Filter, Segregator, Combiner
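The slide names three edge transformations (Filter, Segregator, Combiner) without defining them. The sketch below assumes one plausible reading: a filter drops events that fail a predicate, a segregator routes events into groups by a key, and a combiner packs several events into one batch. These semantics, the function names, and the sample events are assumptions for illustration and may differ from the product's behavior.

```python
from collections import defaultdict


def filter_events(events, predicate):
    """Keep only the events for which predicate(event) is true."""
    return [event for event in events if predicate(event)]


def segregate(events, key_fn):
    """Route events into separate groups according to key_fn(event)."""
    groups = defaultdict(list)
    for event in events:
        groups[key_fn(event)].append(event)
    return dict(groups)


def combine(events, batch_size):
    """Pack consecutive events into batches of at most batch_size."""
    return [events[i:i + batch_size]
            for i in range(0, len(events), batch_size)]


# Hypothetical log events.
events = [
    {"source": "web", "level": "INFO", "msg": "a"},
    {"source": "web", "level": "DEBUG", "msg": "b"},
    {"source": "iot", "level": "WARN", "msg": "c"},
    {"source": "web", "level": "ERROR", "msg": "d"},
    {"source": "iot", "level": "INFO", "msg": "e"},
]

kept = filter_events(events, lambda e: e["level"] != "DEBUG")
by_source = segregate(kept, lambda e: e["source"])
batches = combine(kept, batch_size=3)
print(len(kept), sorted(by_source), [len(b) for b in batches])
```

Applying such rules at the edge reduces what has to cross the network and keeps the downstream target free of events it would only discard.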
[Figure: database ingestion architecture. Labels shown: On Prem, Source Database, Secure Agent, Secured Communication, Cloud, IICS, Secured Data Flow, Database Ingestion; Apache Kafka, Microsoft ADLS, Microsoft SQL-DW, Snowflake.]
Database Ingestion
• Provides database ingestion capabilities as part of the IICS Mass Ingestion service
• Real-time monitoring of ingestion jobs with lifecycle management and alerting
• Orchestrate database data ingestion in hybrid/cloud as managed and secure service
• Ingest relational database data from Oracle, SQL Server & MySQL. Also supports schema drift on CDC-supported databases
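Schema drift means the source table's structure changes after ingestion has started. A minimal way to detect it is to compare the last known column-to-type mapping with the current one. The sketch below assumes schemas are available as plain dicts. The function name, the result layout, and the sample columns are hypothetical and do not represent the service's own drift handling.

```python
def detect_schema_drift(known_schema, incoming_schema):
    """Compare two {column: type} mappings and report the differences."""
    added = {col: typ for col, typ in incoming_schema.items()
             if col not in known_schema}
    removed = {col: typ for col, typ in known_schema.items()
               if col not in incoming_schema}
    retyped = {col: (known_schema[col], incoming_schema[col])
               for col in known_schema
               if col in incoming_schema
               and known_schema[col] != incoming_schema[col]}
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical schemas before and after a source-side change.
known = {"id": "NUMBER", "name": "VARCHAR", "fax": "VARCHAR"}
incoming = {"id": "NUMBER", "name": "CLOB", "email": "VARCHAR"}

drift = detect_schema_drift(known, incoming)
print(drift)
```

What to do with each kind of drift (propagate a new column, ignore a dropped one, halt on a type change) is a policy decision that a pipeline would configure separately.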
Database Ingestion
Benefits of new service
6. Auditability; needs to detect/trace changes, and ensure data has been applied where it should be
7. Aiming to provide strong monitoring capabilities with consistent look and feel
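The auditability point (item 6) asks that changes can be traced and that data is confirmed to have been applied where it should be. One simple way to check this is to reconcile source and target by key and by a per-row fingerprint. The sketch below is an illustration under assumptions: the function names, the SHA-256 fingerprint, and the sample rows are hypothetical, and the slides do not describe the service's actual audit mechanism.

```python
import hashlib


def row_fingerprint(row):
    """Stable hash of a row, independent of column order."""
    canonical = "|".join(f"{col}={row[col]}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key):
    """Report keys that are missing, unexpected, or differ in content."""
    source = {row[key]: row_fingerprint(row) for row in source_rows}
    target = {row[key]: row_fingerprint(row) for row in target_rows}
    return {
        "missing_in_target": sorted(k for k in source if k not in target),
        "unexpected_in_target": sorted(k for k in target if k not in source),
        "mismatched": sorted(k for k in source
                             if k in target and source[k] != target[k]),
    }


# Hypothetical source and target snapshots.
source_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
target_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "x"}, {"id": 4, "v": "d"}]

report = reconcile(source_rows, target_rows, key="id")
print(report)
```

An empty report on all three lists is the signal that every source row arrived unchanged and nothing extra was written.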
[Figure: file mass ingestion flow. Labels shown: 1 MI Task, 2 Secure Agent, 3 (no label recovered), 4 Update Job log; File Mass Ingestion service; Advanced FTP/SFTP/FTPS connector; Cloud; Azure Blob, Redshift, S3, Azure DW, Blob, Data Lake.]
Demo
File Ingestion
Summary
• Easy to use 4-step data ingestion wizard
• Mass ingest hundreds of data sources from files, databases, streams into Cloud Data Warehouse & Data Lakes
• Replicate on-premises data to Cloud Data Warehouse
• Automatically detect schema drift.

• Broad connectivity to on-premise & cloud end-points
• Out-of-the-box transformations & templates for common data warehousing patterns and advanced DI, DQ
• Built-in transformational capabilities to handle multiple data formats (XML, JSON, Avro, Parquet)

• Pushdown integration logic to cloud data warehouse end points automatically
• Intelligently auto-scale, provision, and auto-tune data pipelines using Spark-based serverless cloud clusters with built-in HA and Recovery.
• Integrated experience with Enterprise Data Catalog to search and discover data objects
• Out-of-the-box purpose-built transformations
• Built-in transformational capabilities to handle multiple data formats
• Rich set of functions for data conversion/manipulation
• No hand-coding required for data manipulation or transformation
Reusable mapplets and user-defined functions
Standardize integration tasks with parameterization
• Automated parameter file generation with default values.
• Support for connection and object parameter overrides for cloud databases + data warehouses
• Support for partial parameterization of SQL overrides, Pre-Post SQL queries in mapping
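The parameterization bullets describe generating a parameter file from default values and then overriding selected connection or object parameters per environment. A minimal sketch of that pattern follows. The parameter names, the plain `name=value` file layout, and both functions are hypothetical, and the real parameter file syntax used by the product is not shown in these slides.

```python
def build_parameters(defaults, overrides=None):
    """Merge overrides onto defaults, rejecting names that were never declared."""
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}


def render_parameter_file(params):
    """Render parameters as sorted name=value lines (assumed layout)."""
    return "".join(f"{name}={value}\n" for name, value in sorted(params.items()))


# Defaults generated once for the mapping (hypothetical names and values).
defaults = {
    "src_connection": "oracle_dev",
    "tgt_connection": "snowflake_dev",
    "tgt_table": "ORDERS",
}

# Production run overrides only the target connection.
prod_params = build_parameters(defaults, {"tgt_connection": "snowflake_prod"})
prod_file = render_parameter_file(prod_params)
print(prod_file)
```

Rejecting undeclared names catches typos in an override before a job runs against the wrong connection.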
Mapping Design
Connectivity
Transformations
Debugging
Workflows
Scheduling
Monitoring