Professional Documents
Culture Documents
Azure Data Platform End2End - 2day
Azure Data Platform End2End - 2day
<your name>
<your role>
<your email>
Begin with the end in mind
© Microsoft Corporation
Course Objectives
• We will understand Cloud and Big Data concepts and technologies used to solve the most common
advanced analytics problems
• We will understand the role of Microsoft Azure data services in a modern data platform architecture
• We will look at individual Azure Data Services and use them to implement a modern data platform
reference architecture
• We will have a ARM template of a data platform that will enable us to solve most of our data challenges
© Microsoft Corporation
Important Reminder
• The modern data platform architecture proposed in this course aims to help with your technology
decisions when architecting data solutions in Azure.
• The Azure services covered in this course are only a subset of a much larger family of data services.
Some real-world data scenarios may require the use of services not included in this course.
• This course does not replace in-depth training on each Azure service covered today.
• Some concepts presented in this course can be quite complex and you may need to seek for more
information from different sources.
© Microsoft Corporation
Modern Data Platform Reference Architecture
Load and Ingest Process
Stream Datasets
λ Lambda Architecture and Real-time
Stream Dashboards
Hot Path
V=Velocity Business User
Real Time
IoT devices, sensors, gadgets Analytics
(loosely-typed) Event Hubs Stream Analytics
Cold Path
History and
Trend Analysis
Semi-Structured Data Factory Azure Data Lake Gen2 Databricks CosmosDB Application
V=Volume
csv, logs, json, xml Scheduled / event-
triggered data ingestion
(loosely-typed) Integrate big data
Fast load
data with scenarios with
Polybase/ traditional data
ParquetDirect warehouse
Enterprise-grade
semantic model
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium Analytics
Store Serve
© Microsoft Corporation
Lab Guide Azure Data Platform End2End Lab 1: Load Data into Azure Synapse Analytics using Azure Data Factory Pipelines
Lab 2: Transform Big Data using Azure Data Factory Mapping Data Flows
5
ADPLogicApp 1 ADPEventHubs-suffix 3 SynapseStreamAnalytics-suffix
Logic App Event Hubs Event Hubs
4
2 5 6
ADPCosmosDB-suffix
Azure CosmosDB
1 2
3
PowerBI
MDWResources SynapseDataFactory-suffix SynapseDataLakesuffix ADPDatabricks
Power BI Desktop/
Storage Account Azure Data Factory Azure Data Lake Storage Gen2 Azure Databricks
Workspace
1 2 1
2
ADPComputerVision
4 Computer Vision API
3
operationalsql-suffix\NYCDataSets SynapseDataFactory-suffix
Azure SQL Database Azure Data Factory
1
RDP Connection 3 2
or
Azure Bastion
ADPDesktop
Virtual Machine
4
synapsesql-suffix\SynapseDW
Student͛s Azure Synapse Analytics
Computer 4
ADPVirtualNetwork
Virtual Network
© Microsoft Corporation
The modern data world out there
© Microsoft Corporation
I tried to understand it, but…
No-SQL Databricks
Storm Data Catalog
IoT PaaS vs IaaS
Hadoop Power BI Streaming
Deep Learning Machine Learning
Predictive Data Mart SMP vs MPP
ETL vs ELT
Data Visualisation Prescriptive
Data Warehouse
Data Lake Master Data
Big Data Data Factory Cloud vs On-prem
Data Quality Velocity, Variety and Volume
© Microsoft Corporation
Semantic Layer Spark AI
The modern data estate
Hybrid
Reason over any data, anywhere Flexibility of choice Security and performance
© Microsoft Corporation
The Microsoft offering
Hybrid
Easiest lift and shift
with no code changes
Industry leader 4 years in a row Operational databases Operational databases 70% faster
T-SQL query over any data Data lakes Data lakes 99.9% SLA
Reason over any data, anywhere Flexibility of choice Security and performance
© Microsoft Corporation
The Azure data landscape
SQL
>_
10
Azure CLI Azure SDK 01
Azure Blob Storage Azure Data Lake Azure Azure Azure ML ML Server Azure
Store HDInsight Databricks Databricks
NSG
Azure ExpressRoute Azure Active Directory Azure network Azure key Operations Azure Functions Visual Studio
security groups management service Management Suite
© Microsoft Corporation
The Azure Big Data landscape
SQL
>_
10
Azure CLI Azure SDK 01
Azure Blob Storage Azure Data Lake Azure Azure Azure ML ML Server Azure
Store HDInsight Databricks Databricks
NSG
Azure ExpressRoute Azure Active Directory Azure network Azure key Operations Azure Functions Visual Studio
security groups management service Management Suite
© Microsoft Corporation
Our job, pretty much
© Microsoft Corporation
Azure Data Architecture Guide
Valuable collection of architecture principles to help you with your technology choices
https://aka.ms/adag
© Microsoft Corporation
Azure Architecture Solutions
Collection of reference architectures for most common challenges
https://azure.microsoft.com/en-us/solutions/architecture/
© Microsoft Corporation
Modern Data Platform Solution Scenarios
Big Data and advanced analytics
SQL
“We want to integrate all our data “We’re trying to predict when “We’re trying to get insights
—including Big Data—with our our customers churn” from our devices in real-time”
data warehouse”
© Microsoft Corporation
Modern Data Platform Concepts
Part I
© Microsoft Corporation
IaaS vs PaaS vs SaaS
On Premises Infrastructure Platform Software
Physical / Virtual as a Service (IaaS) as a Service (PaaS) as a Service (SaaS)
You manage
Applications Applications Applications Applications
management by Microsoft
Runtime Runtime Runtime Runtime
management by Microsoft
Scale, Resilience and
O/S O/S O/S O/S
Managed by
Microsoft
Servers Servers Servers Servers
Azure Azure
Virtual Machines Cloud Services
© Microsoft Corporation
What is a Data Warehouse?
A data warehouse is a large collection of business data used to help an organization make decisions. Data in the
Data Warehouse has been identified as valuable to specifically defined business cases and is stored in a structured
way readily available for reporting and data analysis.
It is not an Operational Database
Different workload types: transactional (DB) versus analytics (DW)
It is not a Data Lake
These are different concepts, they can co-exist and they compliment each other
It is not a Data Mart
A data mart is a subject-oriented database populated from a subset of the Data Warehouse
© Microsoft Corporation
Modern Data Warehousing
© Microsoft Corporation
Modern data warehousing
The modern data warehouse extends the scope of the data warehouse to
serve Big Data that’s prepared with techniques beyond relational ETL
SQL
“We want to integrate all our data “We’re trying to predict when “We’re trying to get insights
—including Big Data—with our our customers churn” from our devices in real-time”
data warehouse”
© Microsoft Corporation
Modern data warehousing pattern
LOB
CRM
BI + Reporting
INGEST STORE PREP MODEL & SERVE
(& store)
Graph
Advanced Analytics
Image
Social Data orchestration Big data store Transform & Clean Data warehouse
and monitoring
AI
IoT
© Microsoft Corporation
SQL Server and Azure SQL Database
© Microsoft Corporation
Data platform continuum
Shared lower cost
IaaS
SQL Server in Azure VM
Virtualized Machines
Virtual
SQL Server Private Cloud
Virtualized Machine + Appliance
Dedicated higher cost
Physical
SQL Server
Physical Machine (raw iron)
Combine data from many sources without Store high volume data in a data lake and access Easily feed integrated data from many sources to
moving or replicating it it easily using either SQL or Spark your model training
Scale out compute and caching to boost Management services, admin portal, and Ingest and prep data and then train, store, and
performance integrated security make it all easy to manage operationalize your models all in one system
Azure SQL Database deployment option
Best for apps that require resource Best for SaaS apps with multiple databases that can share Best for modernization at scale with
guarantee at database level resources at database level, achieving better cost efficiency low friction and effort
General Purpose
Service Tiers
Business Critical
Hyperscale
Serverless
Azure SQL Managed Instance
Enable full isolation from other tenants without VNET support in SQL Database Managed Instance
resource sharing
Promote secure communication over private IP
addresses with native VNET integration
Enable your on-premise identities on cloud
instances, through integration with Azure Active
Directory and AD Connect
Combine the best of SQL Server with the
benefits of a fully-managed service
Use familiar SQL Server features in SQL
Database Managed Instance
Azure Data Factory
© Microsoft Corporation
Azure Data Factory
Hybrid data integration service for enabling code-free ETL
Industry leading Visual Hybrid Pay only for what Managed SSIS
data ingestion No Code you use
UX & SDK
Data Factory
Authoring | Monitoring/Mgmt A data integration account.
Location of orchestration, service metadata
Azure Data Factory v2 Service
Scheduling | Orchestration | Monitoring
Pipeline SSIS
Package
Integration Runtime (IR)
ADF’s execution engine
- Azure Integration Runtime
- Self-Hosted Integration Runtime
- SSIS Integration Runtime
Self-hosted Azure
Integration Runtime Integration Runtime
LEGEND
Linked Command and Control
On-prem Azure Services
Service Apps & Data Data
Azure Data Factory Data Flows
No-code data transformation and preparation @ scale
Code free data transformation @scale Code free data preparation @scale
© Microsoft Corporation
Azure Synapse Analytics
© Microsoft Corporation
Azure Synapse Analytics
Integrated data platform for BI, AI and continuous intelligence
Designed for analytics workloads
Artificial Intelligence / Machine Learning / Internet of Things
at any scale
Intelligent Apps / Business Intelligence
Azure Synapse Analytics
SaaS developer experiences for
code free and code first
Experience Azure Synapse Analytics Studio
Multiple languages suited to
Platform Languages different analytics workloads
MANAGE MENT
SQL Python .NET Java Scala R
Integrated analytics runtimes
Form Factors
available provisioned and
SECURITY
P ROVISIONE D ON- DEM AND serverless on-demand
SQL Analytics offering T-SQL for
Analytics Runtimes
MONITORING
batch, streaming and interactive
processing
SQL Spark for big data processing with
ME TASTORE Python, Scala, R and .NET
DATA INTE GRATION
Integrated platform services
for, management, security,
monitoring, and metastore
Azure Common Data Model
Data Lake Storage Enterprise Security Data lake integrated and
Optimized for Analytics Common Data Model aware
Azure Synapse Analytics MPP Architecture
Data
Snapshot backups
Log
© Microsoft Corporation
Table Distributions
CREATE TABLE [dbo].[FactInternetSales]
Round-robin distributed (
Distributes table rows evenly across all [ProductKey] int NOT NULL,
distributions at random.
[OrderDateKey] int NOT NULL,
Hash distributed [CustomerKey] int NOT NULL,
[PromotionKey] int NOT NULL,
Distributes table rows across the Compute [SalesOrderNumber] nvarchar(20) NOT NULL,
nodes by using a deterministic hash [OrderQuantity] smallint NOT NULL,
function to assign each row to one [UnitPrice] money NOT NULL,
distribution. [SalesAmount] money NOT NULL
)
Replicated WITH
(
Full copy of table accessible on each CLUSTERED COLUMNSTORE INDEX,
Compute node. DISTRIBUTION = HASH([ProductKey]) |
ROUND ROBIN |
REPLICATED
);
© Microsoft Corporation
Polybase
Data ingestion using external data sources -- Create Azure DataLake Gen2 Storage reference
CREATE EXTERNAL DATA SOURCE AzureStorage with
(
TYPE = HADOOP,
LOCATION='abfss://<container>@<storageaccnt>.blob.core.windows.net' ,
© Microsoft Corporation
Lab 1
Load data into Azure Synapse Analytics using Azure Data Factory Pipelines
Load and Ingest Process
Business User
Enterprise-grade
semantic model
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium Analytics
© Microsoft Corporation
Store Serve
Lab 1
Azure Data Platform
Lab Architecture Resource Group
PowerBI
MDWResources SynapseDataLakesuffix
Power BI Desktop/
Storage Account Azure Data Lake Storage Gen2
Workspace
operationalsql-suffix\NYCDataSets SynapseDataFactory-suffix
Azure SQL Database Azure Data Factory
1
RDP Connection 3
or
Azure Bastion
ADPDesktop
Virtual Machine
synapsesql-suffix\SynapseDW
Student͛s Azure Synapse Analytics
Computer 4
ADPVirtualNetwork
Virtual Network
© Microsoft Corporation
Modern Data Platform Concepts
Part II
© Microsoft Corporation
TY
CI
The Modern Data Problem
LO
VE
Re
al
-ti
m
e
VOLUM
E Ba
tc
h
ZB
How to derive value from data:
GB
What happened historically?
ed
What is happening now? Str u ctur
d at a
What is going to happen?
VAR
IET
Y
What is a Data Lake?
It is a central storage repository that holds data coming from many sources in a raw, granular format. It can store
structured, semi-structured, or unstructured data, which means data ingested quickly and can be kept in a
more flexible format for future use cases.
Best Practices
Benefits
(ELT) volumes of diverse needed to avoid
• Collection of data, data structures Data Swamp
not a platform • Enable advanced • Security
• Perfect place for analytics and data considerations
evolving data exploration • Design your Data
• Scalability and Lake
storage cost • Metadata
reduction management
© Microsoft Corporation
Data Warehouse or Data Lake?
Answer: both.
© Microsoft Corporation
Data Lake Design Considerations
Data Lake Zones Data Governance Considerations
Transient Landing Zone Security and Compliance
Temporary storage of data to meet regulatory and quality control Access Control at Folder/File level
requirements. Limited access. May not be required depending on Encryption at rest
requirements.
Metadata Management
Raw Zone
Data Quality
Original source of data ready for consumption. Metadata publicly
available but access to data still limited. Metadata Management
Trusted Zone Lifecycle Management
Standardized and enriched datasets ready for consumption to those
with appropriate role-based access. Metadata available to all.
Curated/Refined Zone
Data transformed from Trusted Zone to meet specific business
requirements.
Sandbox Zone
Playground for Data Scientists for ad hoc exploratory use cases.
© Microsoft Corporation
Data Lake Design Example
Data Lake Zone
Business Area
Source System
Entity Name
Year
Month
Data Files (standard naming convention)
Source: https://www.sqlchick.com/entries/2019/1/20/faqs-about-organizing-a-data-lake
© Microsoft Corporation
Azure Data Lake Storage Gen2
© Microsoft Corporation
Azure Data Lake Storage Gen2
A “no-compromises” Data Lake: Secure, performant and massively-scalable
A Data Lake that brings together the cost and scale of object storage with the performance and analytics feature set of data lake storage
Single service
Azure Data Lake Storage Gen2
High performance HDFS Endpoint to Azure Blob Storage
Hadoop Filesystem, File and Folder Server Backups, Archive Storage, Semi-
Hierarchy, Granular ACLs structured Data
Common SDK, Tools, Control Plane Object Tiering and Lifecycle AAD integration, RBAC, Storage HA/DR support through ZRS and RA-
Policy Management account security GRS
© Microsoft Corporation
Lab 2: Transform Big Data using
Azure Data Factory Mapping Data
Flows
© Microsoft Corporation
Lab 2
Transform Big Data using Azure Data Factory Mapping Data Flows
Load and Ingest Process
Business User
Enterprise-grade
semantic model
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium Analytics
© Microsoft Corporation
Store Serve
Lab 2
Lab Architecture Azure Data Platform
Resource Group
PowerBI
MDWResources SynapseDataFactory-suffix SynapseDataLakesuffix
Power BI Desktop/
Storage Account Azure Data Factory Azure Data Lake Storage Gen2
Workspace
1 2
3
operationalsql-suffix\NYCDataSets SynapseDataFactory-suffix
Azure SQL Database Azure Data Factory
1
RDP Connection 3
or
Azure Bastion
ADPDesktop
Virtual Machine
4
synapsesql-suffix\SynapseDW
Student͛s Azure Synapse Analytics
Computer 4
ADPVirtualNetwork
Virtual Network
© Microsoft Corporation
Advanced Analytics
© Microsoft Corporation
Advanced analytics
Advanced analytics goes beyond the traditional business intelligence (BI) and uses mathematical, probabilistic,
and statistical modeling techniques to enable predictive processing and automated decision making.
SQL
“We want to integrate all our data “We’re trying to predict when “We’re trying to get insights
—including Big Data—with our our customers churn” from our devices in real-time”
data warehouse”
© Microsoft Corporation
Modern Data Platform Concepts
Part III
© Microsoft Corporation
Hadoop and Spark in Azure
Open Source Apache Projects for Big Data Compute
It was the original open-source framework for distributed Effective, fast, general-purpose unified cluster computing framework with
processing and analysis of big data sets on clusters. high-level APIs in Java, Scala, Python and R.
© Microsoft Corporation
Apache Spark Value Proposition
Open
Unified and integrated analytics model
Performance
Productivity
100x faster than Hadoop quickly in Scala or libraries in integrated • Contributors: 50
MapReduce in memory Python, Java framework companies and over
• 10x faster on disk • Over 80 high-level • Combine SQL, 400 developers,
• Useful for real-time and operators that make it streaming, graph growing
batch processing easy to build parallel processing
apps • Runs on Hadoop (Yarn,
• Use interactively from Mesos), standalone, or
the Scala and Python in the cloud
shells • Diverse data sources
including HDFS (CSV,
JSON, etc.), Azure
Storage, Cosmos DB,
Azure SQL Data
Warehouse, Parquet,
JDBC
© Microsoft Corporation
Azure Databricks
© Microsoft Corporation
Azure Databricks
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, ADLS, Azure Storage, Azure Data Factory,
Azure AD, Event Hub, IoT Hub, HDInsight Kafka, SQL DB)
© Microsoft Corporation
Azure Databricks
Azure Databricks
Collaborative Workspace
Data warehouses
Optimized Databricks Runtime Engine Data exports
Hadoop storage
DATABRICKS I/O APACHE SPARK SERVERLESS Rest APIs
Data warehouses
Enhance Productivity Build on secure & trusted cloud Scale without limits
© Microsoft Corporation
Azure Databricks Deployment
Cosmos DB SQL DW
Jobs
Azure Resource Interact using UI or Azure Databricks REST API Data Factory
Manager APIs
© Microsoft Corporation
Azure Databricks Clusters
Driver Program
SparkContext
Azure Databricks clusters are the set of Azure Linux VMs that
host the Spark Worker and Driver Nodes
Cluster Manager
Your Spark application code (i.e. Jobs) runs on the provisioned
clusters.
Worker Node Worker Node Worker Node
Azure Databricks clusters are launched in your subscription—
but are managed through the Azure Databricks portal. Cache Cache
Cache
Azure Databricks provides a comprehensive set of graphical
Task Task Task
wizards to manage the complete lifecycle of clusters—from
creation to termination.
With Azure Databricks notebooks you have a default language but you can mix multiple languages in the same notebook:
%python Allows you to execute python code in a notebook (even if that notebook is not python)
%sql Allows you to execute sql code in a notebook (even if that notebook is not sql).
%r Allows you to execute r code in a notebook (even if that notebook is not r).
%scala Allows you to execute scala code in a notebook (even if that notebook is not scala).
%sh Allows you to execute shell code in your notebook.
%fs Allows you to use Databricks Utilities - dbutils filesystem commands.
%md To include rendered markdown
© Microsoft Corporation
Lab 3: Explore Big Data with Azure
Databricks
© Microsoft Corporation
Lab 3
Explore Big Data with Azure Databricks
Load and Ingest Process
Business User
Enterprise-grade
semantic model
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium Analytics
Store Serve
© Microsoft Corporation
Lab 3
Lab Architecture
Azure Data Platform
Resource Group
PowerBI
MDWResources SynapseDataFactory-suffix SynapseDataLakesuffix ADPDatabricks
Power BI Desktop/
Storage Account Azure Data Factory Azure Data Lake Storage Gen2 Azure Databricks
Workspace
1 2 1
2
ADPComputerVision
Computer Vision API
3
operationalsql-suffix\NYCDataSets SynapseDataFactory-suffix
Azure SQL Database Azure Data Factory
1
RDP Connection 3 2
or
Azure Bastion
ADPDesktop
Virtual Machine
4
synapsesql-suffix\SynapseDW
Student͛s Azure Synapse Analytics
Computer 4
ADPVirtualNetwork
Virtual Network
© Microsoft Corporation
Modern Data Platform Concepts
Part IV
© Microsoft Corporation
Artificial Intelligence
“The ability of a digital computer or computer-controlled robot to perform tasks commonly associated with
intelligent beings.” – Encyclopedia Britannica
Machine Learning
Supervised Learning
Regression
Classification
Unsupervised Learning
Cluster Analysis
Application Examples
Weather Forecast
Fraud Detection
Customer Churn
Insurance Premium
© Microsoft Corporation
What’s No-SQL?
Term coined in 2009 for a developer meetup – ”Not Only SQL” -> “NoSQL”.
Databases that allow you to store and retrieve data in various structures, formats, and models other than
tabular relational model.
Graph Databases
Document Databases
© Microsoft Corporation
Azure AI
© Microsoft Corporation
Azure AI
Solution Areas
Popular frameworks
To build advanced deep learning solutions
PyTorch TensorFlow Scikit-Learn ONNX
Productive services
To empower data science and development teams Azure Azure Machine Machine
Databricks Learning Learning VMs
Powerful infrastructure
To accelerate deep learning
CPU GPU FPGA
Custom
Vision
QnA Maker
Custom Decision Bing Visual Search
Video Indexer
Bing Search
Bing Autosuggest
Search Content
Moderator
Face
Cognitive Services capabilities
Infuse your apps, websites, and bots with human-like intelligence
© Microsoft Corporation
Azure Cosmos DB
Column-family Document
Key-value Graph
Turnkey global
Comprehensive
distribution
SLAs
RESOURCE MODEL
Account
Database
Database
Database
Database
Database
Container = Collection Graph Table
Database
Database
Item
C O N TA I N E R - L E V E L R E S O U R C E S
Account
Database
Database
Database
Database
Database
Container
Database
Database
Item Sproc Trigger UDF Conflict
Lab 4: Add AI to your Big Data
Pipeline with Cognitive Services
© Microsoft Corporation
Lab 4
Add AI to your Big Data Pipeline with Cognitive Services
Load and Ingest Process
Business User
Semi-Structured Data Factory Azure Data Lake Gen2 Databricks CosmosDB Application
V=Volume
Scheduled / event-
csv, logs, json, xml
triggered data ingestion
(loosely-typed) Integrate big data
Fast load
data with scenarios with
Polybase/ traditional data
ParquetDirect warehouse
Enterprise-grade
semantic model
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium Analytics
© Microsoft Corporation
Store Serve
Lab 4
Lab Architecture Azure Data Platform
Resource Group
5 6
ADPCosmosDB-suffix
Azure CosmosDB
1 2
3
PowerBI
MDWResources SynapseDataFactory-suffix SynapseDataLakesuffix ADPDatabricks
Power BI Desktop/
Storage Account Azure Data Factory Azure Data Lake Storage Gen2 Azure Databricks
Workspace
1 2 1
2
ADPComputerVision
4 Computer Vision API
3
operationalsql-suffix\NYCDataSets SynapseDataFactory-suffix
Azure SQL Database Azure Data Factory
1
RDP Connection 3 2
or
Azure Bastion
ADPDesktop
Virtual Machine
4
synapsesql-suffix\SynapseDW
Student͛s Azure Synapse Analytics
Computer 4
ADPVirtualNetwork
Virtual Network
© Microsoft Corporation
Real-time Analytics
© Microsoft Corporation
Real-time analytics
Deals with streams of data that are captured in real-time and processed with minimal latency to generate real-
time (or near-real-time) reports or automated responses.
SQL
“We want to integrate all our data “We’re trying to predict when “We’re trying to get insights
—including Big Data—with our our customers churn” from our devices in real-time”
data warehouse”
© Microsoft Corporation
Modern Data Platform Concepts
Part V
© Microsoft Corporation
Streaming Use Cases
Retail Financial Oil/Gas & Energy Security
CONSUMER ENGAGEMENT RISK AND REVENUE MANAGEMENT GRID OPS, ASSET OPTIMIZATION ACTIONABLE THREAT INTELLIGENCE
© Microsoft Corporation
Real-time Stream Processing
Simple Event Processing
Filter
Transform
Enrich
Split
Route
© Microsoft Corporation
Scenario Types
Automation
Dashboarding
Actions by Human Actors
“See and seize” insights
Live visualization
Alerts and alarms
Dynamic aggregation
© Microsoft Corporation
Lambda (λ) Architecture
Designed to handle Big Data use cases by taking advantage of both batch and stream-processing methods
© Microsoft Corporation
Event Hubs
© Microsoft Corporation
Event Hubs
Big data streaming platform and event ingestion service capable of receiving and processing millions of events
per second.
© Microsoft Corporation
Event Hubs Capture
Batch on stream
© Microsoft Corporation
Stream Analytics
© Microsoft Corporation
Stream Analytics
Event-processing engine that allows you to examine high volumes of data streaming from devices
© Microsoft Corporation
Differentiators
1,915 lines of code with Apache Storm
@ApplicationAnnotation(name="WordCountDemo")
Programmer Productivity public class Application implements StreamingApplication
{
Declarative SQL like language protected String fileName =
"com/datatorrent/demos/wordcount/samplefile.txt";
Built-in temporal semantics private Locality locality = null;
© Microsoft Corporation
Example Scenario: Tolling Stations
Tolling stations have multiple booths (identified by TollId)
Each booth has two sensors: Entry and Exit that send out EntryStream and ExitStream respectively
EntryStream – Data stream from the Entry sensor data on vehicles entering toll booths
ExitStream - Data stream from the Exit sensor on vehicles exiting toll booths
© Microsoft Corporation
Filters and Projections
SELECT VehicleCategory =
Case Type
From the incoming stream find only vehicles that: WHEN 1 THEN ‘Passenger’
• Are from either WA and CA state WHEN 2 THEN ‘Commercial’
ELSE THEN ‘Other’
• Have a weight less than 3000 lbs END,
• Have License plate number end in 999 TollId,
• Have a make that starts with a “M” LicensePlate,
State,
Make,
Display: Model,
• “Passenger” if type = 1 DATEPART(mi,EntryTime) AS ‘Mins’,
DATEPART(ss,EntryTime) AS ‘Seconds’
• “Commercial” if Type = 2
DATEPART(ms,EntryTime) AS ‘Milleseconds’
• “Other” for all other types FROM EntryStream TIMESTAMP BY EntryTime
WHERE (State = ‘CA’ OR State = ‘WA’)
AND Weight < 3000
• Display time as ‘Mins’, ‘Seconds’, ‘Milliseconds’
AND CHARINDEX (‘M’, model) = 0
AND PATINDEX(‘%999’, LicensePlate) = 5
© Microsoft Corporation
Temporal Joins
Report the time in seconds required for vehicles to pass the toll booth
SELECT ES.TollId,
ES.EntryTime,
EX.ExitTime,
ES.EntryTime,
ES.LicensePlate DATEDIFF(minute, ES.EntryTime, EX.ExitTime )
FROM EntryStream ES TIMESTAMP BY EntryTime
JOIN ExitStream EX TIMESTAMP BY ExitTime
ON (EX.TollId= ES.TollId and ES.LicensePlate = EX.LicensePlate)
AND DATEDIFF(minute, ES, EX) BETWEEN 0 AND 15
© Microsoft Corporation
Windowing Concepts
© Microsoft Corporation
Windowing Functions
Sliding Windows and Tumbling Windows
Sliding Windows
Tumbling Windows
© Microsoft Corporation
Windowing Functions
Hopping Windows and Session Windows
Hopping Windows
Session Windows
© Microsoft Corporation
Lab 5: Ingest and Analyse real-time
data with Event Hubs and Stream
Analytics
© Microsoft Corporation
Lab 5
Ingest and Analyse real-time data with Event Hubs and Stream Analytics
Load and Ingest Process
Stream Datasets
λ Lambda Architecture and Real-time
Stream Dashboards
Hot Path
V=Velocity Real Time Business User
IoT devices, sensors, gadgets Analytics
(loosely-typed) Event Hubs Stream Analytics
Cold Path
History and
Trend Analysis
Semi-Structured Data Factory Azure Data Lake Gen2 Databricks CosmosDB Application
V=Volume
Scheduled / event-
csv, logs, json, xml
triggered data ingestion
(loosely-typed) Integrate big data
Fast load
data with scenarios with
Polybase/ traditional data
ParquetDirect warehouse
Enterprise-grade
semantic model
Relational Databases
(strongly-typed, structured) Azure Synapse Analytics Power BI Premium Analytics
Store Serve
© Microsoft Corporation
Lab 5
Lab Architecture Azure Data Platform
Resource Group
5
ADPLogicApp 1 ADPEventHubs-suffix 3 SynapseStreamAnalytics-suffix
Logic App Event Hubs Event Hubs
4
2 5 6
ADPCosmosDB-suffix
Azure CosmosDB
1 2
3
PowerBI
MDWResources SynapseDataFactory-suffix SynapseDataLakesuffix ADPDatabricks
Power BI Desktop/
Storage Account Azure Data Factory Azure Data Lake Storage Gen2 Azure Databricks
Workspace
1 2 1
2
ADPComputerVision
4 Computer Vision API
3
operationalsql-suffix\NYCDataSets SynapseDataFactory-suffix
Azure SQL Database Azure Data Factory
1
RDP Connection 3 2
or
Azure Bastion
ADPDesktop
Virtual Machine
4
synapsesql-suffix\SynapseDW
Student͛s Azure Synapse Analytics
Computer 4
ADPVirtualNetwork
Virtual Network
© Microsoft Corporation
It’s all on
© Microsoft Corporation
© Copyright Microsoft Corporation. All rights reserved.