Aniruddha BigDataandAnalytics


Transform your data estate with Big Data, Advanced Analytics

anid@microsoft.com
The world is changing

DATA: Data will grow to 44 ZB in 2020
CLOUD: Today, 80% of organizations adopt cloud-first strategies
AI: AI investment increased by 300% in 2017

Organizations that harness data, cloud, and AI outperform
Companies surveyed include well-known
enterprises across key industries

Financial services, Retail, Manufacturing, Consumer goods

Source: Keystone Strategy interviews Oct 2015 - Mar 2016


Nearly double operating margin
$100M in additional operating income
They rely on a modern data estate
Rising citizen demands are driving
transformation in government services

25% 15% 10%


of government organizations will roll out cognitive solutions by 2020
of government transactions (such as tax collection, welfare disbursement and immigration control) will have embedded analytics by 2019
of government digital services will use bots to personalize workflows and/or channels through which citizens and businesses access services by 2018
Modern government challenges

Changes in services, policy, and citizen requests require more responsive agencies
Large, expanding volume of data increasing complexity and storage costs
Unstructured data limits ability to analyze and take action
Data security, privacy and regulatory requirements are of paramount importance
Unique government opportunities

IT infrastructure optimization, Public Safety, Social network analysis, Logistics optimization, Web app optimization
Health analysis, Fraud detection, Analytics Data Consolidation, Weather forecasting, Healthcare outcomes
Scientific research, Equipment monitoring, Smart Sensor monitoring
constantly expanding
THE MODERN DATA ESTATE

HYBRID: On-premises, Private cloud, Cloud

Operational databases
Data warehouses
Data lakes

Reason over any data, anywhere
Flexibility of choice
Security and Performance
The Azure BIG Data Landscape

AZURE DATA FACTORY, AZURE IMPORT EXPORT SERVICE, AZURE SQL DB, AZURE COSMOS DB, AZURE SQL DATA WAREHOUSE, AZURE ANALYSIS SERVICES, POWER BI

AZURE CLI, AZURE SDK

AZURE STORAGE BLOBS, AZURE DATA LAKE STORE, AZURE DATA LAKE ANALYTICS, AZURE HDINSIGHT, AZURE DATABRICKS, AZURE ML, ML SERVER

AZURE IOT HUB, AZURE EVENT HUBS

AZURE SEARCH, AZURE BOT SERVICE, AZURE COGNITIVE SERVICES, AZURE STREAM ANALYTICS, AZURE HDINSIGHT, AZURE DATABRICKS, KAFKA ON AZURE HDINSIGHT, AZURE DATA CATALOG

AZURE EXPRESSROUTE, AZURE ACTIVE DIRECTORY, AZURE NETWORK SECURITY GROUPS, AZURE KEY MANAGEMENT SERVICE, OPERATIONS MANAGEMENT SUITE, AZURE FUNCTIONS, VISUAL STUDIO
SQL Server 2019
Azure Data Lake
Azure Databricks
Industry-leading performance and security, with intelligence over all your data

Intelligence over any data: AI and Machine Learning over all data with the power of SQL and Apache Spark
Choice of platform and language: T-SQL, PHP, Python, Java, Node.js, Ruby, C/C++, C#/VB.NET
Industry-leading performance: #1 OLTP performance [1]; #1 DW performance on 1TB [2], 10TB [3], and 30TB [4]; Intelligent Query Processing
Most secure over the last 8 years [5] (chart: Vulnerabilities, 2010-2017)
Insights in minutes and rich reports: The best of Power BI and SQL Server Reporting Services with Power BI Report Server

In-memory across all workloads
Most consistent data platform
Private cloud, Public cloud
1/10th the cost of Oracle

All TPC Claims as of 1/19/2018.
[1] http://www.tpc.org/4081; [2] http://www.tpc.org/3331; [3] http://www.tpc.org/3326; [4] http://www.tpc.org/3321; [5] National Institute of Standards and Technology Comprehensive Vulnerability Database
Data virtualization
Combine data from many sources without moving or replicating it
Scale out compute and caching to boost performance

Managed SQL Server, Spark, and data lake
Store high volume data in a data lake and access it easily using either SQL or Spark
Management services, admin portal, and integrated security make it all easy to manage

Complete AI platform
Easily feed integrated data from many sources to your model training
Ingest and prep data and then train, store, and operationalize your models all in one system

[Diagram labels: admin portal and management services; integrated AD-based security; analytics and apps over T-SQL; SQL Server external tables; compute pools and data pools; open database connectivity to NoSQL, relational databases, and HDFS; SQL Server and Spark over scalable, shared storage (HDFS); external data sources; REST API containers for models; SQL Server ML Services, Spark and Spark ML; external HDFS]
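The data virtualization idea above (query data where it lives instead of copying it) can be sketched with Python's standard-library sqlite3 module. This is only an analogy and not how SQL Server external tables are implemented: two separate SQLite files stand in for two data sources, and ATTACH DATABASE lets one connection join across them. The file names, table names, and rows are all hypothetical.

```python
import os
import sqlite3
import tempfile


def build_sources(folder):
    """Create two independent database files standing in for separate data sources."""
    crm_path = os.path.join(folder, "crm.db")      # hypothetical source 1
    sales_path = os.path.join(folder, "sales.db")  # hypothetical source 2

    crm = sqlite3.connect(crm_path)
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "Contoso"), (2, "Fabrikam")])
    crm.commit()
    crm.close()

    sales = sqlite3.connect(sales_path)
    sales.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    sales.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                      [(1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0)])
    sales.commit()
    sales.close()
    return crm_path, sales_path


def revenue_by_customer(crm_path, sales_path):
    """Join across both sources in place; no rows are copied between the files."""
    conn = sqlite3.connect(crm_path)
    try:
        # ATTACH makes the second file queryable under the schema name "sales".
        conn.execute("ATTACH DATABASE ? AS sales", (sales_path,))
        return conn.execute(
            "SELECT c.name, SUM(o.amount) "
            "FROM customers AS c JOIN sales.orders AS o ON o.customer_id = c.id "
            "GROUP BY c.name ORDER BY c.name"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(revenue_by_customer(*build_sources(folder)))
```

The point of the sketch is that the join runs against both sources through a single query surface, which is the property the slide attributes to data virtualization.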
[Architecture diagram. Custom apps, BI, and Analytics connect to the SQL Server master instance. Compute pools: SQL Compute Nodes. Data mart: SQL Data Nodes with storage. Storage pool: Kubernetes pods, each with SQL Server, Spark, and an HDFS Data Node, fed by IoT data and read directly from HDFS. Nodes run on persistent storage.]
Performance
Intelligent Query Processing
Accelerating I/O performance with Persistent Memory
Gain performance insights anytime and anywhere with Lightweight Query Profiling

Security
Always Encrypted with secure enclaves
Data Classification and auditing built-in
Manage certificates more easily with SQL Server Configuration Manager

Availability
Always On availability group enhancements
Resumable online index creation
Online Clustered Columnstore index creation and rebuild
Availability groups on Kubernetes
SQL Server 2019
Azure Data Lake
Azure Databricks
[Chart: analytics maturity, VALUE plotted against DIFFICULTY]
Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What will happen?
Prescriptive Analytics: How can we make it happen?
Theory, Hypothesis, Observation, Confirmation
Observation, Pattern, Hypothesis, Theory
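The four questions on the value/difficulty chart can be made concrete with a toy example. This is a minimal sketch in plain Python, assuming a hypothetical data set of monthly advertising spend and sales; a correlation coefficient stands in for diagnostic analysis and a least-squares line for the predictive and prescriptive steps, which is far simpler than anything a real project would use.

```python
from math import sqrt
from statistics import mean

# Hypothetical monthly figures: advertising spend and the sales that followed.
SPEND = [10, 20, 30, 40]
SALES = [30, 50, 70, 90]


def describe(ys):
    """Descriptive: what happened? Summarise the observed values."""
    return {"mean": mean(ys), "min": min(ys), "max": max(ys)}


def diagnose(xs, ys):
    """Diagnostic: why did it happen? Pearson correlation of driver and outcome."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(xs, ys, new_x):
    """Predictive: what will happen at a new value of the driver?"""
    slope, intercept = fit_line(xs, ys)
    return slope * new_x + intercept


def prescribe(xs, ys, target_y):
    """Prescriptive: how can we make it happen? Invert the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return (target_y - intercept) / slope


if __name__ == "__main__":
    print(describe(SALES))
    print(diagnose(SPEND, SALES))
    print(predict(SPEND, SALES, 50))
    print(prescribe(SPEND, SALES, 130))
```

Each function builds on the one before it, which mirrors the chart: the later questions are worth more but need a model, not just a summary.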
Understand Corporate Strategy
Gather Requirements: Business Requirements, Technical Requirements
Implement Data Warehouse:
Reporting & Analytics Design, Reporting & Analytics Development
Dimension Modelling, Physical Design
ETL Design, ETL Development
Setup Infrastructure, Install and Tune
Layers: BI and analytics, Data warehouse, ETL, Data sources
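The dimension modelling and ETL steps named above can be illustrated with a very small star schema. This is a sketch under stated assumptions, using Python's built-in sqlite3 in place of a real warehouse engine; the table names (dim_product, fact_sales) and the source rows are hypothetical. Note that the schema is designed first and the data is shaped to fit it at load time.

```python
import sqlite3

# Hypothetical extract from an operational system:
# (sale_date, product_name, category, quantity, unit_price)
SOURCE_ROWS = [
    ("2019-01-05", "Widget", "Hardware", 3, 10.0),
    ("2019-01-06", "Gadget", "Hardware", 2, 25.0),
    ("2019-01-06", "Manual", "Books", 1, 5.0),
    ("2019-01-07", "Widget", "Hardware", 1, 10.0),
]


def build_warehouse(rows):
    """Create the star schema, then run a tiny ETL load into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT UNIQUE,
            category   TEXT
        );
        CREATE TABLE fact_sales (
            sale_date  TEXT,
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity   INTEGER,
            amount     REAL
        );
        """
    )
    for sale_date, name, category, quantity, unit_price in rows:
        # Transform: deduplicate products into the dimension table.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (name, category) VALUES (?, ?)",
            (name, category),
        )
        (product_id,) = conn.execute(
            "SELECT product_id FROM dim_product WHERE name = ?", (name,)
        ).fetchone()
        # Load: one fact row per sale, keyed to the dimension.
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (sale_date, product_id, quantity, quantity * unit_price),
        )
    conn.commit()
    return conn


def sales_by_category(conn):
    """Reporting query: join the fact table to its dimension and aggregate."""
    return conn.execute(
        "SELECT p.category, SUM(f.quantity), SUM(f.amount) "
        "FROM fact_sales AS f JOIN dim_product AS p USING (product_id) "
        "GROUP BY p.category ORDER BY p.category"
    ).fetchall()


if __name__ == "__main__":
    print(sales_by_category(build_warehouse(SOURCE_ROWS)))
```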
Data Lake Uses A Bottom-Up Approach

Ingest all data regardless of requirements
Store all data in native format without schema definition
Do analysis using analytic engines like Hadoop

Sources: DEVICES; LOGS, FILES AND MEDIA (UNSTRUCTURED); BUSINESS / CUSTOM APPS (STRUCTURED)
Analysis: Batch queries, Interactive queries, Real-time analytics, Machine Learning, Data warehouse
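The bottom-up approach described above (store everything in native format, define the schema only when analysing) is often called schema-on-read. A minimal sketch, assuming a local folder stands in for the data lake and JSON-lines files stand in for raw feeds; the function names, field names, and records are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def ingest(lake_dir, source, raw_lines):
    """Land raw lines unchanged; nothing is validated at write time."""
    path = Path(lake_dir) / f"{source}.jsonl"
    with path.open("a", encoding="utf-8") as handle:
        for line in raw_lines:
            handle.write(line.rstrip("\n") + "\n")
    return path


def read_with_schema(lake_dir, schema):
    """Apply a schema at read time: keep records with every field, cast the values."""
    for path in sorted(Path(lake_dir).glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # unparseable data stays in the lake; this reader skips it
            if not isinstance(record, dict):
                continue
            if not all(field in record for field in schema):
                continue  # record belongs to some other shape of data
            try:
                yield {field: cast(record[field]) for field, cast in schema.items()}
            except (TypeError, ValueError):
                continue  # value cannot be cast to the requested type


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lake:
        ingest(lake, "devices", ['{"device": "t1", "temp": "21.5"}',
                                 '{"device": "t2", "temp": 19}',
                                 "not json at all"])
        ingest(lake, "apps", ['{"order_id": 7, "total": 42}'])
        for row in read_with_schema(lake, {"device": str, "temp": float}):
            print(row)
```

A different analysis could read the same files with a different schema, which is the flexibility the slide contrasts with a warehouse designed up front.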
WASB: Blob Storage (WASB). Scale and Availability; Cost Effectiveness.
WASB + ADLS: Blob Storage (WASB) + Azure Data Lake Store (ADLS). Scale and Availability; Cost Effectiveness; Speed to Insight; Rich Security.
Azure Data Lake Storage Gen2: Scalable, secure storage that speeds time to insight. Scale and Availability; Cost Effectiveness; Speed to Insight; Rich Security.

Azure Data Lake Storage Gen2: Single Data Lake Store that combines the performance and
innovation of ADLS with the scale and rich feature set of Blob Storage
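From an analytics engine, the practical difference between Blob Storage (WASB) and Azure Data Lake Storage Gen2 usually shows up first in the storage URI. The sketch below builds both forms, assuming the Hadoop-compatible driver schemes commonly documented for these stores (wasb/wasbs against the blob endpoint, abfs/abfss against the dfs endpoint). The helper names and the account, container, and path values are hypothetical.

```python
def blob_uri(account, container, path="", secure=True):
    """URI for Blob Storage accessed through the WASB driver."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def adls_gen2_uri(account, filesystem, path="", secure=True):
    """URI for Azure Data Lake Storage Gen2 accessed through the ABFS driver."""
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # "contosolake" and "raw" are placeholder names, not real resources.
    print(blob_uri("contosolake", "raw", "logs/2019/01.json"))
    print(adls_gen2_uri("contosolake", "raw", "logs/2019/01.json"))
```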
SQL Server 2019
Azure Data Lake
Azure Databricks
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure

Best of Databricks
Designed in collaboration with the founders of Apache Spark
One-click set up; streamlined workflows
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts

Best of Microsoft
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
Azure Databricks

Inputs: IoT / streaming data, Cloud storage, Data warehouses, Hadoop storage
Collaborative Workspace: DATA ENGINEER, DATA SCIENTIST, BUSINESS ANALYST
Deploy Production Jobs & Workflows: MULTI-STAGE PIPELINES, JOB SCHEDULER, NOTIFICATION & LOGS
Optimized Databricks Runtime Engine: DATABRICKS I/O, APACHE SPARK, SERVERLESS, Rest APIs
Outputs: Machine learning models, BI tools, Data exports, Data warehouses

Enhance Productivity
Build on secure & trusted cloud
Scale without limits

DATA MANAGEMENT SOLUTION FOR INTELLIGENCE IN THE CLOUD

Sources: Business apps, Custom apps, Sensors and devices
Ingest: Kafka, Event Hubs, IoT Hub
Store: Blobs, Data Lake
Prep & Train: HDInsight / Spark, Machine Learning
Model & Serve: Cosmos DB, SQL Database, SQL Data Warehouse, Analysis Services
Intelligence: Predictive apps, Operational reports, Analytical dashboards
Data Factory (Data movement, pipelines & orchestration)

DATA, INTELLIGENCE, ACTION


Empower today’s innovators to unleash the power of data
and reimagine possibilities that will improve our world
