P69 Latam Partner Bootcamp - Data Platform
Overview
Nathan Colossi
Cloud Solution Architect
Data and AI
Transform data into intelligent action
Data Sources → Information Management (Data Factory) → Big Data Stores (Data Lake Store) → Machine Learning and Analytics (Machine Learning) → Intelligence (Cognitive Services) → People
“We had an incident lasting for about 6 months. Before Intelligent Insights we have not had a way of figuring out where do we even start troubleshooting. Intelligent Insights gave us a list of things to do. What Intelligent Insight does is that it enables us to pinpoint where the problem is and to get a fix deployed within 24hrs.” Frans Lytzen – CTO, New Orbit
“Intelligent Insights proactively finds a database performance problem in a more efficient way and much faster than humans. With it we can proactively help customers until we have a fix for the problem.” Bauke Stil, Application Manager, SnelStart
“SQL Threat Detection helps us respond to activities, which were not visible beforehand.” Bauke Stil, App Manager, SnelStart
“SQL Threat Detection helps us to be ahead of the threats instead of chasing them.” Shahin Kohan, CTO
Manrique Logan, architect & technical lead
INTRODUCING AZURE SQL DATABASE MANAGED INSTANCE
• Managed Instance: instance-scoped programming model with high compatibility to SQL Server
• Single: standalone managed database for predictable and stable workloads
• Elastic Pool: shared resource model for greater efficiency through multi-tenancy
…
Diagram: Data Sources → Ingest → Transform & Analyze → Publish, with an on-premises customer table, game usage data in Azure Blob Storage, a geocode step, and a data mart in Azure DB.
Basic Concepts
• ADF is Microsoft’s unified platform for ETL/ELT services in the cloud
• ADF allows you to build data pipelines and execute them or schedule their runs
• A data pipeline is a chain or group of activities to be performed on your data, e.g. data movements and transformations
• Some activities are powered by services whose data store and compute resources are allocated outside ADF, e.g. HDInsight, Machine Learning, etc.
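To make the "chain of activities" idea concrete, here is a minimal Python sketch. It is not the ADF SDK or ADF's JSON pipeline schema: a pipeline is modeled as an ordered list of hypothetical activities (`copy_activity`, `geocode_activity`) that each take and return a list of records, and `run_pipeline` is an assumed helper that runs them in sequence. The record fields and the region lookup table are made up for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Activity = Callable[[List[Record]], List[Record]]


def copy_activity(rows: List[Record]) -> List[Record]:
    """Hypothetical data-movement step: copy each row unchanged."""
    return [dict(row) for row in rows]


def geocode_activity(rows: List[Record]) -> List[Record]:
    """Hypothetical transformation step: add a region looked up from the city."""
    regions = {"Sao Paulo": "LATAM", "Seattle": "NA"}  # assumed lookup table
    return [{**row, "region": regions.get(str(row.get("city")), "UNKNOWN")} for row in rows]


def run_pipeline(activities: List[Activity], rows: List[Record]) -> List[Record]:
    """Run the activities in order, feeding each one's output into the next."""
    for activity in activities:
        rows = activity(rows)
    return rows


if __name__ == "__main__":
    customers = [{"id": 1, "city": "Sao Paulo"}, {"id": 2, "city": "Seattle"}]
    print(run_pipeline([copy_activity, geocode_activity], customers))
```

In the real service the scheduling, retries, and the compute behind each activity (for example HDInsight) are handled by ADF; the sketch only shows the ordering and data hand-off.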
• UX & SDK: authoring, monitoring and management
• Data Factory: a data integration account; the location of orchestration and service metadata
• Azure Data Factory v2 service: scheduling, orchestration, and monitoring of pipelines and SSIS packages
• Integration Runtime (IR): ADF’s execution engine, available as a Self-Hosted Integration Runtime (for on-premises apps and data) and an Azure Integration Runtime (for cloud services, apps, and data)
To integrate data flow and control flow across the enterprise’s hybrid cloud, customers can instantiate multiple IR instances for different network environments:
- On premises (similar to the Data Management Gateway, DMG, in ADF V1)
- In the public cloud
- Inside a VNet
Azure SQL Data Warehouse
• Grow, shrink, and pause in seconds
• Compute-optimized for demanding workloads (preview)
• Unlimited columnar storage (preview)
• Fully managed PaaS
• Scale is expressed in Data Warehouse Units (DWU)
Architecture: a Control node, Compute nodes, and Storage
Distributing Data
CREATE TABLE [build].[FactOnlineSales]
(
[OnlineSalesKey] int NOT NULL
, [DateKey] datetime NOT NULL
, [StoreKey] int NOT NULL
, [ProductKey] int NOT NULL
, [PromotionKey] int NOT NULL
, [CurrencyKey] int NOT NULL
, [CustomerKey] int NOT NULL
, [SalesOrderNumber] nvarchar(20) NOT NULL
, [SalesOrderLineNumber] int NULL
, [SalesQuantity] int NOT NULL
, [SalesAmount] money NOT NULL
)
WITH
(
DISTRIBUTION = HASH([ProductKey])
)
;
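The `DISTRIBUTION = HASH([ProductKey])` clause assigns each row to one of the warehouse's 60 distributions based on a hash of the distribution column. The Python sketch below models that placement. The service's actual hash function is internal, so MD5 is only a deterministic stand-in, and `distribution_for` and `distribute` are hypothetical helpers, not part of any Azure API.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

NUM_DISTRIBUTIONS = 60  # SQL Data Warehouse spreads each table over 60 distributions


def distribution_for(key: int) -> int:
    """Map a distribution-column value to a distribution id.

    MD5 is a stand-in for the service's internal hash; it is used here because,
    unlike Python's built-in hash() for strings, it is stable across runs.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_DISTRIBUTIONS


def distribute(rows: List[dict], column: str) -> Dict[int, List[dict]]:
    """Group rows by the distribution their hash-column value falls into."""
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        buckets[distribution_for(row[column])].append(row)
    return dict(buckets)


if __name__ == "__main__":
    sales = [
        {"OnlineSalesKey": 1, "ProductKey": 42, "SalesAmount": 10.0},
        {"OnlineSalesKey": 2, "ProductKey": 7, "SalesAmount": 5.5},
        {"OnlineSalesKey": 3, "ProductKey": 42, "SalesAmount": 2.25},
    ]
    for dist_id, rows in sorted(distribute(sales, "ProductKey").items()):
        print(dist_id, [r["OnlineSalesKey"] for r in rows])
```

Because every row with the same ProductKey lands in the same distribution, a column with many distinct, evenly used values avoids skew, and joins between tables hash-distributed on the same key can be resolved within each distribution.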
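Query Execution
A query is submitted to the Control node, which has the Compute nodes process it in parallel against Storage; the result is returned through the Control node.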
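The control/compute split can be pictured as scatter/gather. In this simplified Python sketch, threads stand in for compute nodes: each one aggregates only its own distribution's rows, and a hypothetical `control_node_query` function merges the partial results. The real engine's plans also include data-movement steps that are not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def partial_sum(rows: List[dict]) -> Dict[int, float]:
    """Compute-node step: sum SalesAmount by ProductKey over local rows only."""
    totals: Dict[int, float] = {}
    for row in rows:
        totals[row["ProductKey"]] = totals.get(row["ProductKey"], 0.0) + row["SalesAmount"]
    return totals


def control_node_query(distributions: List[List[dict]]) -> Dict[int, float]:
    """Control-node step: run the fragment on every distribution in parallel, then merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, distributions))
    result: Dict[int, float] = {}
    for part in partials:
        for key, value in part.items():
            result[key] = result.get(key, 0.0) + value
    return result


if __name__ == "__main__":
    distributions = [
        [{"ProductKey": 42, "SalesAmount": 10.0}],
        [{"ProductKey": 42, "SalesAmount": 2.25}, {"ProductKey": 7, "SalesAmount": 5.5}],
    ]
    print(control_node_query(distributions))
```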
Azure Analysis Services
Enterprise grade analytics engine as a service
Diagram: Azure Analysis Services in the cloud (alongside SQL Server Analysis Services on-premises) provides lifecycle management and business logic & metrics over on-premises sources (SQL Server, Analytics Platform System, other data sources) and cloud data sources (SQL Database, other data sources), serving visualizations & insights to Power BI, Power BI Desktop, Excel, SSMS, and third-party BI tools.
Information Management
Diagram: data from apps, sensors and devices flows into the Information Management services: Data Factory, Data Catalog, Event Hubs, and IoT Hubs.
Data Catalog: get more value from your enterprise data assets
• Spend less time looking for data, and more time getting value from it
• Register enterprise data sources, discover data assets and unlock their potential, and capture tribal knowledge to make data understandable
• Bridge the gap between IT and the business, allowing everyone to contribute their insights, tags, and descriptions
• Intuitive search and filtering to understand the data sources and their purpose
• Let your data live where you want; connect using tools you choose
• Integrate into existing tools and processes with open REST APIs
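The register, annotate, and search workflow can be illustrated with a toy in-memory catalog. This is a hypothetical Python sketch, not the Data Catalog REST API: `DataAsset` and `HypotheticalCatalog` are made-up names. As in the slide's "let your data live where you want", only metadata is stored and the data itself stays in place.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataAsset:
    """A registered data source: only metadata is kept, the data stays where it lives."""
    name: str
    location: str  # hypothetical pointer, e.g. a server/table path
    description: str = ""
    tags: Set[str] = field(default_factory=set)


class HypotheticalCatalog:
    """Toy catalog illustrating register / annotate / search (not the Data Catalog API)."""

    def __init__(self) -> None:
        self._assets: List[DataAsset] = []

    def register(self, asset: DataAsset) -> None:
        self._assets.append(asset)

    def annotate(self, name: str, tag: str) -> None:
        """Any user can contribute a tag, capturing 'tribal knowledge'."""
        for asset in self._assets:
            if asset.name == name:
                asset.tags.add(tag.lower())

    def search(self, term: str) -> List[str]:
        """Substring match on name or description, exact match on tags (case-insensitive)."""
        term = term.lower()
        return [
            a.name for a in self._assets
            if term in a.name.lower() or term in a.description.lower() or term in a.tags
        ]


if __name__ == "__main__":
    catalog = HypotheticalCatalog()
    catalog.register(DataAsset("CustomerTable", "onprem-sql/sales", "Customer master data"))
    catalog.annotate("CustomerTable", "PII")
    print(catalog.search("pii"))
```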
Event Hubs: ingest events from websites, apps and devices at cloud scale
Diagram: Event Hubs connecting apps and other data sources with API Management, Azure Storage, and HDInsight.
• Log millions of events per second in near real time
• Connect devices using flexible authorization and throttling
• Use time-based event buffering
• Get a managed service with elastic scale
• Reach a broad set of platforms using native client libraries
• Pluggable adapters for other cloud services
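Event Hubs stores events in partitions and keeps them for a retention period, so consumers read at their own pace from an offset. The toy Python model below shows partition-key routing, per-partition ordering, and time-based retention. `HypotheticalEventHub`, its methods, and the default partition count and retention values are assumptions for illustration, not the Event Hubs client library.

```python
import hashlib
import time
from typing import List, Optional, Tuple

Event = Tuple[float, bytes]  # (enqueued time in seconds, body)


class HypotheticalEventHub:
    """Toy partitioned, append-only event log with time-based retention.

    Events sharing a partition key always go to the same partition, so their
    order is preserved there; each reader tracks its own offset per partition.
    """

    def __init__(self, partition_count: int = 4, retention_seconds: float = 86400.0) -> None:
        self.partitions: List[List[Event]] = [[] for _ in range(partition_count)]
        self.retention_seconds = retention_seconds

    def _partition_for(self, key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def send(self, key: str, body: bytes, now: Optional[float] = None) -> int:
        """Append an event and return the partition it was written to."""
        pid = self._partition_for(key)
        self.partitions[pid].append((time.time() if now is None else now, body))
        return pid

    def read(self, pid: int, offset: int, now: Optional[float] = None) -> List[bytes]:
        """Return bodies from `offset` onward that are still inside the retention window."""
        now = time.time() if now is None else now
        return [body for ts, body in self.partitions[pid][offset:]
                if now - ts <= self.retention_seconds]
```

The optional `now` argument exists only so the behaviour can be checked deterministically; a real consumer would simply checkpoint the last offset it processed.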
Specialized Device Scenarios with IoT Suite
Diagram: IP-capable devices connect directly to IoT Hub; PAN devices connect through a field gateway, and a cloud protocol gateway handles other protocols. Behind IoT Hub sit event processing and insight (e.g. hot and cold paths), device business logic and connectivity monitoring, and application device provisioning and management.
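"Hot and cold paths" means the same telemetry stream is handled two ways: the hot path reacts immediately and the cold path keeps everything for later batch analysis. This is a minimal Python sketch of that split, assuming a hypothetical `Telemetry` event shape and a simple temperature threshold as the hot-path rule.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Telemetry:
    device_id: str
    temperature: float


def process(events: List[Telemetry], alert_threshold: float) -> Tuple[List[str], List[Telemetry]]:
    """Split one event stream into a hot path and a cold path.

    Hot path: evaluated as each event arrives, emitting alerts above the threshold.
    Cold path: every event is kept for later batch analysis (e.g. in a data lake).
    """
    alerts: List[str] = []
    archive: List[Telemetry] = []
    for event in events:
        archive.append(event)  # cold path: store everything
        if event.temperature > alert_threshold:  # hot path: act right away
            alerts.append(f"ALERT {event.device_id}: {event.temperature}")
    return alerts, archive


if __name__ == "__main__":
    hot, cold = process([Telemetry("d1", 21.5), Telemetry("d2", 80.0)], alert_threshold=75.0)
    print(hot, len(cold))
```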
Data Lake Store: a hyper-scale repository for big data analytics workloads
Diagram: data from devices, social, LOB applications, video, web, and sensors lands in Data Lake Store, alongside ADL Analytics, HDInsight, Spark, R, SQL Data Warehouse, and Cosmos DB.
• A Hadoop Distributed File System for the cloud
• No fixed limits on file size
• No fixed limits on account size
• Unstructured and structured data in their native format
• Massive throughput to increase analytic performance
• High durability, availability, and reliability
• Azure Active Directory access control
The data lake is the center of a big data solution
A storage repository, usually Hadoop-based, that holds a vast amount of raw data in its native format until it is needed.
Comparison: Azure Data Lake Store | Azure Blob Storage
Use cases: Batch, interactive, streaming | App backend, backup data, media storage for streaming
Size limits: No limits on account size, file size, or number of files | 500TB account, 4.75TB file
Billing: Pay for data stored and for I/O | Pay for data stored and for I/O
Region availability: Two US regions (East, Central) & North Europe (other regions coming soon) | All Azure regions
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
• Multi-model: key-value, column-family, document, and graph
• Transparent server-side partition management
• Elastically scale storage (GB to PB) and throughput (100 to 100M req/sec) across many machines and multiple regions
Chart: Black Friday (values up to 12,000,000).
Resource model includes permissions, JavaScript triggers, and JavaScript user-defined functions.
• Designed for modern mobile and web applications
• Delivers consistently fast reads and writes, schema flexibility, and the ability to easily scale a database up and down on demand
• Offers native support for JavaScript, SQL query, and transactions over JSON documents
• Enables complex ad hoc queries using a dialect of SQL
• Supports multi-document transaction processing using the familiar programming model of stored procedures, triggers, and UDFs
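In Cosmos DB, stored procedures and triggers run as all-or-nothing transactions scoped to a single logical partition. The Python sketch below imitates that behaviour with a toy container. It is not the Cosmos DB SDK: `HypotheticalContainer`, `run_transaction`, and the `customerId` partition key used in the usage note are assumptions. The operations run against a copy of one partition, and the copy is committed only if every operation succeeds.

```python
import copy
from typing import Callable, Dict, List

Doc = Dict[str, object]


class HypotheticalContainer:
    """Toy document store partitioned by a partition-key property."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self.partitions: Dict[object, Dict[str, Doc]] = {}

    def upsert(self, doc: Doc) -> None:
        part = self.partitions.setdefault(doc[self.partition_key], {})
        part[str(doc["id"])] = doc

    def run_transaction(self, pk_value: object,
                        operations: List[Callable[[Dict[str, Doc]], None]]) -> bool:
        """Apply all operations to one logical partition atomically.

        Work happens on a deep copy that replaces the partition only if every
        operation succeeds; on any exception nothing changes (all-or-nothing).
        """
        working = copy.deepcopy(self.partitions.get(pk_value, {}))
        try:
            for op in operations:
                op(working)
        except Exception:
            return False  # roll back: the stored partition is untouched
        self.partitions[pk_value] = working
        return True
```

A usage example: after upserting two documents with `customerId` "c1", a debit operation and a credit operation passed together to `run_transaction("c1", ...)` either both take effect or, if one raises, neither does.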
Machine Learning and Analytics
Data Lake Analytics: big data analytics made easy
Diagram: Data Lake Analytics within Machine Learning and Analytics (alongside Machine Learning, HDInsight (Hadoop and Spark), Stream Analytics, and Azure Analysis Services), working over SQL Data Warehouse, SQL Database, Data Lake Store, Storage Blobs, and SQL Database in a VM.
• Analyze data of any kind and size
• Develop faster, debug and optimize smarter
• Interactively explore patterns in your data
• No learning curve: use U-SQL (SQL with C#)
• Managed and supported with an enterprise-grade SLA
• Dynamically scales to match your business priorities
• Enterprise-grade security with Azure Active Directory
• Built on YARN, designed for the cloud
Develop massively parallel programs with simplicity
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int,
Urls string,
ClickedUrls string
FROM @"/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
OUTPUT @searchlog
TO @"/Samples/Output/SearchLog_output.tsv"
USING Outputters.Tsv();
• A simple U-SQL script can scale from gigabytes to petabytes without learning complex big data programming techniques.
• U-SQL automatically generates a scaled-out and optimized execution plan to handle any amount of data.
• Execution nodes are rapidly allocated to run the program.
• Error handling, network issues, and runtime optimization are handled automatically.
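The SearchLog script above applies its schema at read time (EXTRACT with a column list) and writes the rowset back out as TSV. The following single-machine Python analogue shows the same schema-on-read idea; it has none of U-SQL's scale-out. The function names are hypothetical, and the `DATE_FORMAT` for the Start column is an assumption about the input file.

```python
import csv
import io
from datetime import datetime
from typing import Callable, Dict, Iterable, List, TextIO, Tuple

DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed format of the Start column

# Schema applied when the file is read, mirroring the EXTRACT column list.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("UserId", int),
    ("Start", lambda text: datetime.strptime(text, DATE_FORMAT)),
    ("Region", str),
    ("Query", str),
    ("Duration", int),
    ("Urls", str),
    ("ClickedUrls", str),
]


def extract_tsv(source: TextIO) -> List[Dict[str, object]]:
    """Rough analogue of EXTRACT ... USING Extractors.Tsv(): split, check, and type each row."""
    rows = []
    for fields in csv.reader(source, delimiter="\t"):
        if len(fields) != len(SCHEMA):
            raise ValueError(f"expected {len(SCHEMA)} columns, got {len(fields)}")
        rows.append({name: convert(value) for (name, convert), value in zip(SCHEMA, fields)})
    return rows


def _to_text(value: object) -> str:
    return value.strftime(DATE_FORMAT) if isinstance(value, datetime) else str(value)


def output_tsv(rows: Iterable[Dict[str, object]], sink: TextIO) -> None:
    """Rough analogue of OUTPUT ... USING Outputters.Tsv(): write the columns back as TSV."""
    writer = csv.writer(sink, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([_to_text(row[name]) for name, _ in SCHEMA])


if __name__ == "__main__":
    sample = "1001\t01/15/2012 11:53:04 AM\ten-us\tazure data lake\t73\twww.example.com\tNULL\n"
    out = io.StringIO()
    output_tsv(extract_tsv(io.StringIO(sample)), out)
    print(out.getvalue() == sample)
```

The sample line is made up. A malformed row fails at read time with a column-count error, which is the practical consequence of schema on read.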
U-SQL Basics
DECLARE @endDate DateTime = DateTime.Now;
DECLARE @startDate DateTime = @endDate.AddDays(-7);

@orders =
    EXTRACT OrderId int,
            Customer string,
            Date DateTime,
            Amount float
    FROM "/input/orders.txt"
    USING Extractors.Tsv();

OUTPUT @orders
TO "/output/output.txt"
USING Outputters.Tsv();

(1) DECLARE constant values using C# expressions
(2) EXTRACT performs schema on read for files and places the results in a RowSet
Notes:
• Whole-script optimization
Azure HDInsight: Hadoop and Spark as a Service on Azure
• Fully-managed Hadoop and Spark for the cloud
• 100% open source Hortonworks data platform
• Clusters up and running in minutes
• Managed, monitored and supported by Microsoft with the industry’s best SLA
• Familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Microsoft + Hortonworks
Promoting Open Hadoop
Engineering alignment
Corporate alignment
Field alignment
Hortonworks Data Platform (HDP) 2.6
(under the covers of HDInsight)
Simply put, Hortonworks ties all the open source products together (22)
Spark for Azure HDInsight
In Memory Processing on Multiple Workloads
• Single execution model for multiple tasks: Spark SQL, Spark Streaming, Machine Learning, and Graph
Chart (speedup comparison): labels Hive 10, HDP 1.3 / HDP 2.0, HDP 2.1; times 1400s, 44.3s, 35.1s, 15s; speedups 32x, 40x, 100x.
HDInsight Supports HBase
NoSQL database on data in HDInsight
Columnar, NoSQL database
Runs on top of the Hadoop Distributed File System (HDFS)
Provides flexibility in that new columns can be added to column families at any time
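The column-family model is what makes "new columns can be added at any time" possible: families are declared up front, but qualifiers inside a family are not. This toy Python model illustrates the idea. It is not the HBase client API; `HypotheticalWideColumnTable` and its methods are made up, and the in-memory dict stands in for HDFS-backed region servers.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Tuple


class HypotheticalWideColumnTable:
    """Toy model of an HBase-style table.

    Column families are fixed when the table is created; column qualifiers
    within a family are free-form, so any row can gain a new column at any
    time without a schema change.
    """

    def __init__(self, *families: str) -> None:
        self.families = set(families)
        # row key -> "family:qualifier" -> value
        self.rows: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def put(self, row_key: str, family: str, qualifier: str, value: bytes) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key: str, family: str, qualifier: str) -> Optional[bytes]:
        return self.rows.get(row_key, {}).get(f"{family}:{qualifier}")

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
        """Yield rows with start <= key < stop in sorted key order, as HBase orders by row key."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, dict(self.rows[key])
```

For example, `put("user#2", "info", "email", ...)` works even if no other row has an `info:email` column, whereas writing to an undeclared family raises an error.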
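Diagram: HBase architecture, with an HMaster (plus coordination) managing several Region Servers, alongside the HDFS Name Node and the Job Tracker.
Apache Storm on HDInsight
Diagram: events from web and social sources and devices arrive through field gateways and Kafka / RabbitMQ / ActiveMQ into Storm on HDInsight; storage adapters write to HBase and HDFS, and applications and web/thick client dashboards use the results to take action.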
R Server for HDInsight
Stream Analytics
Diagram: Stream Analytics reads from Event Hubs and Blob Storage and writes to outputs such as Power BI, Table Storage, and Blob Storage.
• Perform real-time analytics for your Internet of Things solutions
• Stream millions of events per second
• Correlate across multiple streams of data
• Get mission-critical reliability and performance with predictable results
• Create real-time dashboards and alerts over data from devices and applications
• Use familiar SQL-based language for rapid development
Tumbling Window
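A tumbling window splits a stream into fixed-size, non-overlapping, contiguous intervals, so each event is counted in exactly one window. In the Stream Analytics query language this is written as `GROUP BY TumblingWindow(...)`. The Python sketch below only illustrates the windowing logic. It labels each window by its end time and puts an event that falls exactly on a boundary into the next window, which is a choice made for this sketch; the service's own boundary convention may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[float, str]],
                           window_seconds: float) -> Dict[Tuple[float, str], int]:
    """Count events per key in fixed-size, non-overlapping, contiguous windows.

    Each (timestamp, key) event is assigned to exactly one window, identified by
    the window's end time; windows here are [start, end), so a timestamp equal
    to a boundary belongs to the following window.
    """
    counts: Counter = Counter()
    for timestamp, key in events:
        window_end = (timestamp // window_seconds + 1) * window_seconds
        counts[(window_end, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    stream = [(1, "d1"), (4, "d1"), (9.5, "d2"), (10, "d1"), (12, "d1")]
    print(tumbling_window_counts(stream, window_seconds=10))
```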
From greatest simplicity and lowest TCO to maximum control:
• Job service: Azure Stream Analytics
• Cluster service: Spark Streaming & Storm (HDInsight)
• Virtual Machines
Azure Databricks
Diagram: a collaborative workspace on top of the optimized Databricks Runtime Engine (Databricks I/O, Apache Spark, serverless), with REST APIs, reading from Hadoop storage and data warehouses and producing data exports.
• Enhance productivity
• Build on a secure & trusted cloud
• Scale without limits
KNOWING THE VARIOUS BIG DATA SOLUTIONS
Big data analytics:
• Azure Marketplace (HDP | CDH | MapR): any Hadoop technology, any distribution
• Azure HDInsight: workload optimized, managed clusters
• Frictionless & optimized Spark clusters
• Data engineering in a job-as-a-service model
Big data storage: Azure Storage
Intelligence
Dashboards & Visualizations
Diagram: data from sensors and devices flows through IoT Hubs, Stream Analytics, and Azure Databricks into Azure Analysis Services and Power BI for dashboards & visualizations.
Power BI: experience your data
Any data, any way, anywhere
Diagram: Power BI brings together on-premises data and out-of-the-box SaaS content packs and delivers them on the web and mobile.
Power BI product portfolio
• Author: Power BI Desktop, a free data analysis and report authoring tool
• Share and collaborate: the Power BI service, a cloud-based modern business analytics solution
• Large scale deployments: Power BI Premium, dedicated capacity for increased performance
• Share and collaborate: Power BI Report Server, an on-premises report server
• App dev: visual analytics embedded in your applications
Power BI Desktop
Free companion authoring tool for the Power BI service
• Get Data: easily connect, clean, and mash up data
• Analyze: build powerful models and flexible measures
• Visualize: create stunning interactive reports
• Publish: share insights with others
Power BI Premium
• Greater scale and performance
• Flexibility to license by capacity
• Extending on-premises capabilities
Diagram: each user's My workspace runs in shared capacity, while app workspaces (e.g. Sales, Marketing) are placed in Premium capacities P1, P2, and P3, which custom apps can also use through APIs.
Power BI Report Server
• Connect to data: 70+ data sources; data can be imported, queried directly, or accessed through a live connection to SSAS
• Power BI reports: fully interactive reports on-premises to visualize your data and gain insights
• SSRS reports: precisely formatted operational reports