
Cloud Data Warehousing with Azure Synapse Analytics
(formerly Azure SQL Data Warehouse)

Igor Stanko
Principal Group PM Manager
igorstan@microsoft.com

Phinean Woodward
Data Architect, Unilever
Phinean.Woodward@unilever.com

BRK3051
Agenda
- Azure Synapse Analytics overview
- What’s new
- Demo
- Unilever story
Azure Synapse Analytics is Azure SQL Data Warehouse evolved, blending big data, data warehousing, and data integration into a single service for end-to-end analytics at cloud scale
Azure Synapse Analytics
limitless analytics service with unmatched time to insight

Designed for analytics workloads at any scale
Artificial Intelligence / Machine Learning / Internet of Things • Intelligent Apps / Business Intelligence

Synapse Analytics
- Experience: Synapse Analytics Studio – SaaS developer experiences for code-free and code-first
- Languages: SQL, Python, .NET, Java, Scala, R – multiple languages suited to different analytics workloads
- Form factors: provisioned and on-demand – integrated analytics runtimes available provisioned and serverless on-demand
- Analytics runtimes: SQL Analytics offering T-SQL for batch, streaming and interactive processing; Spark for big data processing with Python, Scala, R and .NET
- Platform services: management, security, monitoring, metastore, and data integration built into the service
- Storage: Azure Data Lake Storage – data lake integrated and Common Data Model aware, with enterprise security and optimized for analytics
SQL Analytics
new features available

GA features:
- Performance: Resultset caching
- Performance: Materialized views
- Performance: Ordered columnstore
- Heterogeneous data: JSON support
- Trustworthy: Dynamic Data Masking
- Continuous integration & deployment: SSDT support
- Language: Read committed snapshot isolation

Public preview features:
- Workload management: Workload isolation
- Data ingestion: Simple ingestion with COPY
- Data sharing: Share DW data with Azure Data Share
- Trustworthy: Private Link support

Private preview features:


- Data ingestion: Streaming ingestion & analytics in DW
- Built-in ML: Native Prediction/Scoring
- Data lake enabled: Fast query over Parquet files
- Language: Updateable distribution column 
- Language: FROM clause with joins
- Language: Multi-column distribution support
Note: private preview features require whitelisting
Scope: Generally Available

Best in class price performance
Interactive dashboarding with Resultset Caching
- Millisecond responses with resultset caching
- Cache survives pause/resume/scale operations
- Fully managed cache (1 TB in size)

Enable caching: ALTER DATABASE <DBNAME> SET RESULT_SET_CACHING ON
Purge cache: DBCC DropResultSetCache
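
As a hedged illustration of how the caching behavior can be observed, the sketch below enables the cache and checks a repeated dashboard query for a cache hit; the database name, table, and query label are placeholders, and the result_cache_hit column of sys.dm_pdw_exec_requests is assumed per the dedicated SQL pool DMVs.

-- Enable result set caching for the warehouse (run while connected to master)
ALTER DATABASE [MyDW] SET RESULT_SET_CACHING ON;

-- Run the same dashboard query twice; the second run can be answered from the cache
SELECT Region, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales                                 -- hypothetical fact table
GROUP BY Region
OPTION (LABEL = 'dashboard-sales-by-region');

-- result_cache_hit = 1 indicates the request was served from the result set cache
SELECT request_id, command, result_cache_hit
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'dashboard-sales-by-region';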
Scope: Generally Available

Best in class price performance
Interactive dashboarding with Materialized Views
- Automatic data refresh and maintenance
- Automatic query rewrites to improve performance
- Built-in advisor
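
For illustration, a minimal sketch of a materialized view behind such a dashboard, assuming a hypothetical dbo.FactSales table; the explicit distribution option and the COUNT_BIG(*)/ISNULL pattern are included to satisfy common restrictions on materialized view definitions.

-- Pre-aggregate sales by region; matching GROUP BY queries can be rewritten to use the view
CREATE MATERIALIZED VIEW dbo.mvSalesByRegion
WITH (DISTRIBUTION = HASH(Region))
AS
SELECT Region,
       COUNT_BIG(*) AS RowCnt,
       SUM(ISNULL(SalesAmount, 0)) AS TotalSales
FROM dbo.FactSales
GROUP BY Region;

-- The built-in advisor surfaces candidate views via EXPLAIN WITH_RECOMMENDATIONS <query>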
Scope: Public Preview

Workload aware query execution
Intra-cluster workload isolation (scale in): e.g. Sales reserved 60% of compute and Marketing 40% on a 1000c DWU warehouse, with local in-memory + SSD cache

Workload Isolation
- Multiple workloads share deployed resources
- Reservation or shared resource configuration
- Online changes to workload policies

CREATE WORKLOAD GROUP Sales
WITH
(
    [ MIN_PERCENTAGE_RESOURCE = 60 ]
    [ CAP_PERCENTAGE_RESOURCE = 100 ]
    [ MAX_CONCURRENCY = 6 ]
)
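
To make the Sales reservation apply to real requests, a classifier maps a login to the group; a minimal sketch follows, using the CREATE WORKLOAD CLASSIFIER syntax from the backup slides, with SalesETL as a hypothetical login.

-- Route requests from the (hypothetical) SalesETL login into the Sales workload group
CREATE WORKLOAD CLASSIFIER SalesClassifier
WITH ( WORKLOAD_GROUP = 'Sales',
       MEMBERNAME = 'SalesETL',
       IMPORTANCE = ABOVE_NORMAL );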
Scope: Private Preview (whitelisting needed)

T-SQL Language – Heterogeneous data preparation & ingestion
Native SQL Streaming: streaming ingestion from Event Hubs and IoT Hub directly into the data warehouse

Built-in streaming ingestion & analytics
- High throughput ingestion (up to 200 MB/sec)
- Delivery latencies in seconds
- Ingestion throughput scales with compute scale
- Analytics capabilities (SQL-based queries for joins, aggregations, filters)
Scope: Public Preview

T-SQL Language – Heterogeneous data preparation & ingestion
Streaming, batch & trickle loading from Event Hubs, IoT Hub and Azure Data Lake into the data warehouse

COPY statement
- Simplified permissions (no CONTROL required)
- No need for external tables
- Standard CSV support (e.g. custom row terminators, escape delimiters, SQL dates)
- User-driven file selection (wildcard support)

--Copy files in parallel directly into data warehouse table
COPY INTO [dbo].[weatherTable]
FROM 'abfss://<storageaccount>.blob.core.windows.net/<filepath>'
WITH (
    FILE_FORMAT = 'DELIMITEDTEXT',
    SECRET = CredentialObject);
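
For the wildcard-selection bullet, a hedged sketch follows; the storage account, folder, and table are placeholders, and the option names (FILE_TYPE, CREDENTIAL, FIRSTROW) reflect the form the statement later shipped with, which differs slightly from the preview syntax above.

-- Load every matching CSV under the folder in one parallel COPY; the wildcard selects the files
COPY INTO dbo.weatherTable
FROM 'https://<storageaccount>.blob.core.windows.net/weather/2019/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>'),
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    FIRSTROW = 2    -- skip the header row
);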
Scope: Private Preview (whitelisting needed)

Machine Learning enabled DW: create models, upload models, score models (Model + Data = Predictions)

Native PREDICT
- T-SQL based experience (interactive/batch scoring)
- Interoperability with models built elsewhere
- Execute scoring where the data lives

--T-SQL syntax for scoring data in SQL DW
SELECT d.*, p.Score
FROM PREDICT(MODEL = @onnx_model, DATA = dbo.mytable AS d)
WITH (Score float) AS p;
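
A minimal sketch of the surrounding workflow, assuming a hypothetical dbo.Models table that stores uploaded ONNX models as varbinary: load the model into a variable, then score where the data lives.

-- Load a previously uploaded ONNX model (dbo.Models is a hypothetical model store)
DECLARE @onnx_model varbinary(max) =
    (SELECT Model FROM dbo.Models WHERE ModelName = 'taxi-fare-v1');

-- Score every row of dbo.mytable in place; no data leaves the warehouse
SELECT d.*, p.Score
FROM PREDICT(MODEL = @onnx_model, DATA = dbo.mytable AS d)
WITH (Score float) AS p;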
Scope: Private Preview (whitelisting needed)

SQL Analytics – Data lake integration
ParquetDirect for interactive data lake exploration (up to 13X faster)
- >10X performance improvement
- Full columnar optimizations (optimizer, batch)
- Built-in transparent caching (SSD, in-memory, resultset)
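
As an illustration, a sketch of the external-table route over Parquet files that this feature accelerates; the data source, path, and columns are placeholders and assume an existing external data source pointing at the lake.

-- External file format and table over Parquet files in the data lake (names are placeholders)
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.TaxiRidesExternal
(
    RideId     bigint,
    PickupTime datetime2,
    FareAmount decimal(10, 2)
)
WITH (
    LOCATION = '/taxi/rides/',
    DATA_SOURCE = MyDataLake,     -- existing EXTERNAL DATA SOURCE for the ADLS account
    FILE_FORMAT = ParquetFormat
);

-- Interactive exploration directly over the lake
SELECT TOP 100 * FROM dbo.TaxiRidesExternal ORDER BY PickupTime DESC;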
Scope: Generally Available

Azure Data Share

Enterprise data sharing


- Built-in flexibility
• Share from DW to DW/DB/other systems
• Choose data format to receive data in (CSV, Parquet)
- One to many data sharing
- Share a single or multiple datasets
Demo scenario: explore & predict
Taxi ride pricing data: Event Hub streaming, Parquet files in the Data Lake, and the data warehouse (Arcadia workspace)
Unilever Story
Phinean Woodward, Data Architect, Unilever
WE MAKE MANY OF THE WORLD’S
FAVOURITE BRANDS

On any day, 2.5 billion people use Unilever products to look good, feel good and get more out of life – giving us a
unique opportunity to build a brighter future
Modern Data Warehouse Logical Representation

• Products: each product will have its own resource group for cataloguing and cross-charging purposes; generally implemented using SQL DW, AAS and PBI, but with flexibility based on requirements
• Business Data Lakes (BDL): implemented using ADLS
• Data processing between the lake layers: implemented using Azure Databricks
• Universal Data Lake (UDL): implemented using ADLS
• Orchestration: ADF
Azure Synapse Analytics (formerly SQL DW)

 Used in 50+ projects


 Model – facts & dimensions:
 data available to our user & data science community in a familiar format
 Scale up/down capability – power for processing & costs managed

 New features Unilever is leveraging


 Reserved instance pricing
 Workload management & workload isolation
 Result set cache
 Materialised views
 Fast queries over parquet files
Unilever and Azure Synapse Analytics (Preview)

 In preview for ~3 months


 Brings together data integration, data warehousing and big data processing capabilities
at scale
 Accelerates the delivery of BI, AI and Intelligent Applications
 Build analytics solutions from ingestion to BI reporting in a single workspace
 Easy Management
 Integrated Dev Ops environment
 Easy deployment
 Secure Environment
 Centralized Monitoring & Alerting
 Additional features
Q/A
Backup
Want to learn more about Azure Synapse Analytics?
Check out these sessions for more information

BRK2187 NEW! Introducing Azure Synapse Analytics: the Next Evolution of SQL Data Warehouse for Every Data Professional Monday, 3:15PM

BRK3044 Migrating Your Mission-Critical Data Warehouse to Azure Synapse Analytics Monday, 4:30PM

BRK3330 Unifying AI-to-BI with Azure Synapse Analytics Tuesday, 9:15AM

BRK3224 Modernizing your Data Warehouse with Data Ingestion, Preparation, and Serving using Azure Synapse Analytics Tuesday, 10:30AM

BRK3229 Securing Your Data Warehouse with Azure Synapse Analytics Tuesday, 11:45AM

BRK3050 Democratizing the Data Lake with On-Demand Capabilities in Azure Synapse Analytics Tuesday, 3:30PM

BRK3051 Cloud Data Warehousing with Azure Synapse Analytics Wednesday, 10:30AM
Want to learn more about analytics on Azure?
Check out these sessions for more information

BRK3045 Code-free ETL using Azure Data Factory & Data Share Wednesday, 2:15PM

BRK3094 Modern Data Integration Scenarios & B2B data sharing using Azure Data Share Wednesday, 3:30PM

BRK3046 Achieving Petabyte-Scale Data Ingestion with Azure Data Factory Thursday, 9:15AM

BRK3043 Maximizing your Azure Databricks Deployment Thursday, 10:30AM

BRK3042 Gaining Business Insights with Open Source Analytics on Azure HDInsight: Patterns and Best Practices Friday, 9:15AM

BRK3047 Prepare for the Next Era of Insights Using Azure Data Lake Storage Friday, 11:45AM

BRK2066 Enabling Real-Time Analytics Patterns from the Cloud to the Intelligent Edge with Azure Stream Analytics Tuesday, 2:15PM

BRK3048 Build High Performance Time Series and Log Data Analysis Solutions with Azure Data Explorer Friday, 10:30AM
Want to get hands-on?
Check out these labs to get hands-on with the latest in Azure Analytics.

WRK4002 Data Integration using Azure Data Factory and Azure Data Share Tuesday, 4:00PM

WRK4000 Build Solutions Powered by Real-time Analytics using Azure Stream Analytics and Azure Data Explorer Wednesday, 12:30PM

WRK4001 Building an End-to-end Analytics Pipeline with Azure Synapse Analytics Thursday, 4:00PM

Stopping by our booth?


Check out these theater sessions in the Hub for even more information

THR3110 Big Data Processing with Spark and .NET Tuesday, 12:40PM

THR3113 Maximizing ROI in SQL Data Warehouse with Enhanced Workload Management Wednesday, 12:40PM

THR3116 Streaming Data with Azure EventHubs and Kafka Wednesday, 3:05PM

THR3128 Time Series and Machine Learning with Azure Data Explorer Thursday, 1:15PM

THR3133 Staying Productive with Azure Data's Developer Tooling Thursday, 10:20AM

THR3119 What's New with Azure Data Lake Storage? Thursday, 2:30PM
Please evaluate this session
Your feedback is important to us!

Please evaluate this session through MyEvaluations on the mobile app or website.
Download the app:
https://aka.ms/ignite.mobileapp
Go to the website:
https://myignite.techcommunity.microsoft.com/evaluations
Find this session in Microsoft Tech Community
Visit aka.ms/MicrosoftIgnite2019/BRK3051
 Download slides and resources
 Access session recordings in 48 hours
 Ask questions & continue the conversation
© Copyright Microsoft Corporation. All rights reserved.
CREATE WORKLOAD CLASSIFIER classifier_name
WITH
(
    WORKLOAD_GROUP = 'name',
    MEMBERNAME = 'security_account'
    [ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL (default) | ABOVE_NORMAL | HIGH } ]
)

Workload aware query execution
Scheduler with Importance turned on

Workload Importance
With importance enabled, the scheduler runs high-importance requests (e.g. the CEO's queries) ahead of queued normal- and low-importance requests, rather than strictly in arrival order.
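
A small monitoring sketch to pair with the diagram: once classifiers assign importance, the request DMV shows which requests are running versus queued and at what importance; the importance column is assumed per the dedicated SQL pool DMVs.

-- Inspect running vs. queued requests together with their assigned importance
SELECT request_id,
       [status],        -- e.g. Running, Suspended (queued)
       importance,
       submit_time,
       command
FROM sys.dm_pdw_exec_requests
WHERE [status] NOT IN ('Completed', 'Failed', 'Cancelled')
ORDER BY submit_time;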
