Data Analytics with MATLAB

Adam Filion
Application Engineer
MathWorks

© 2015 The MathWorks, Inc.


Case Study: Day-Ahead Load Forecasting

• Goal:
– Implement a tool for easy and accurate computation of day-ahead system load forecast

• Requirements:
– Acquire and clean data from multiple sources
– Accurate predictive model
– Easily deploy to production environment

Challenges with Data Analytics

• Aggregating data from multiple sources
• Cleaning data
• Choosing a model
• Moving to production

NYISO Energy Load Data

Techniques to Handle Missing Data

• List-wise deletion
– Unbiased estimates (provided the data are missing completely at random)
– Reduces sample size
• Implementation options
– Built in to many MATLAB functions
– Manual filtering
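
The "manual filtering" option can be sketched in a few lines. This is a minimal illustration in Python rather than MATLAB, and the record layout (hour, temperature, load) and the values are made up for the example; it is not the code used in the talk.

```python
import math

def listwise_delete(rows):
    """Keep only the rows in which no value is missing (None or NaN)."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))
    return [row for row in rows if not any(is_missing(v) for v in row)]

# Hypothetical hourly records: (hour, temperature, load)
records = [
    (1, 21.5, 5300.0),
    (2, float("nan"), 5150.0),   # temperature missing
    (3, 20.9, None),             # load missing
    (4, 20.4, 5010.0),
]

clean = listwise_delete(records)
print(clean)
```

Two of the four rows are dropped here, which shows the cost named above: the sample size shrinks with every incomplete row.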

Techniques to Handle Missing Data

• Substitution: replace missing data points with a reasonable approximation
– Suitable when the missing values are easy to model
– Suitable when the data points are too important to exclude
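
One possible form of substitution is linear interpolation between the nearest known neighbours. The sketch below is in Python, uses made-up load values, and leaves leading or trailing gaps untouched; the deck does not say which approximation the demo used, so the choice of interpolation is an assumption.

```python
import math

def fill_linear(values):
    """Fill interior NaN runs by linear interpolation between the nearest
    known neighbours; leading and trailing NaNs are left unchanged."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            step = (filled[right] - filled[left]) * (i - left) / gap
            filled[i] = filled[left] + step
    return filled

# Hypothetical load series with a two-sample gap and a trailing gap
load = [100.0, float("nan"), float("nan"), 130.0, float("nan")]
result = fill_linear(load)
print(result)
```

The trailing value stays missing because there is no right-hand neighbour to interpolate towards; a real pipeline would need a separate rule for that case.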

Merge Different Sets of Data

• Join along a common axis
• Popular joins:
– Inner
– Full Outer
– Left Outer
– Right Outer

[Diagrams: Inner Join, Full Outer Join, Left Outer Join]
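
The four join types differ only in which keys survive the merge. A minimal Python sketch of that rule, using set operations on two illustrative key lists (the function name `join_keys` is hypothetical):

```python
def join_keys(left_keys, right_keys, how):
    """Return the sorted keys that appear in the result of each join type."""
    left, right = set(left_keys), set(right_keys)
    if how == "inner":
        keys = left & right      # keys present in both data sets
    elif how == "full_outer":
        keys = left | right      # keys present in either data set
    elif how == "left_outer":
        keys = left              # every key of the left data set
    elif how == "right_outer":
        keys = right             # every key of the right data set
    else:
        raise ValueError(f"unknown join type: {how}")
    return sorted(keys)

a = [1, 4, 7, 9]
b = [1, 3, 5, 7]
for how in ("inner", "full_outer", "left_outer", "right_outer"):
    print(how, join_keys(a, b, how))
```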

Full Outer Join

First Data Set (A)
Key  B
1    1.1
4    1.4
7    1.7
9    1.9

Second Data Set (X)
Key  Y    Z
1    0.1  0.2
3    0.3  0.4
5    0.5  0.6
7    0.7  0.8

Joined Data Set
Key  B    Y    Z
1    1.1  0.1  0.2
3    NaN  0.3  0.4
4    1.4  NaN  NaN
5    NaN  0.5  0.6
7    1.7  0.7  0.8
9    1.9  NaN  NaN
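
The joined table on this slide can be reproduced with a short sketch. It is written in Python, not MATLAB, and stores each data set as a dictionary from key to a tuple of column values; that representation and the function name `full_outer_join` are assumptions made for the example.

```python
NAN = float("nan")

def full_outer_join(left, right, left_width, right_width):
    """Join two {key: tuple_of_values} tables on their keys.
    A key missing from one table gets NaN in that table's columns."""
    rows = []
    for key in sorted(set(left) | set(right)):
        left_vals = left.get(key, (NAN,) * left_width)
        right_vals = right.get(key, (NAN,) * right_width)
        rows.append((key, *left_vals, *right_vals))
    return rows

# Values taken from the slide: Key -> (B,) and Key -> (Y, Z)
first = {1: (1.1,), 4: (1.4,), 7: (1.7,), 9: (1.9,)}
second = {1: (0.1, 0.2), 3: (0.3, 0.4), 5: (0.5, 0.6), 7: (0.7, 0.8)}

joined = full_outer_join(first, second, left_width=1, right_width=2)
for row in joined:
    print(row)
```

Every key from either data set appears once in the result, and the NaN entries mark exactly the cells that the missing-data techniques would later have to handle.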

Learn More: Big Data with MATLAB
www.mathworks.com/discovery/big-data-matlab.html
www.mathworks.com/discovery/matlab-mapreduce-hadoop.html

Challenges with Data Analytics

• Aggregating data from multiple sources
• Cleaning data
• Choosing a model
• Moving to production

Machine Learning
Characteristics and Examples

• Characteristics
– Lots of variables
– System too complex to know the governing equation (e.g., black-box modeling)

• Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)

Overview – Machine Learning

Machine Learning: Type of Learning and Categories of Algorithms

• Supervised Learning: develop predictive model based on both input and output data
– Classification
– Regression
• Unsupervised Learning: group and interpret data based only on input data
– Clustering

Supervised Learning

• Regression
– Neural Networks
– Ensemble Methods
– Decision Trees
– Non-linear Reg. (GLM, Logistic)
– Linear Regression
• Classification
– Support Vector Machines
– Discriminant Analysis
– Naive Bayes
– Nearest Neighbor
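
As a concrete instance of supervised regression, the sketch below fits a straight line by ordinary least squares and uses it to predict an unseen input. It is plain Python with made-up temperature and load values; the talk's forecasting model is not shown in the deck, so this stands in only for the simplest entry in the list, linear regression.

```python
def fit_line(x, y):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up training data: temperature (input) and load (output)
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
load = [50.0, 60.0, 70.0, 80.0, 90.0]

intercept, slope = fit_line(temperature, load)
predicted = intercept + slope * 22.0   # prediction for an unseen input
print(intercept, slope, predicted)
```

Both inputs and known outputs are needed to fit the model, which is what distinguishes this from the unsupervised methods.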

Unsupervised Learning

• Clustering
– k-Means, Fuzzy C-Means
– Hierarchical
– Neural Networks
– Gaussian Mixture
– Hidden Markov Model
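
To make the clustering idea concrete, here is a minimal k-means sketch (Lloyd's algorithm) in Python. The points and the starting centroids are made up, and the starting centroids are passed in explicitly so the run is deterministic; production implementations usually choose them randomly and repeat.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                              + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

No output labels are supplied: the groups are inferred from the input data alone.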

Learn More: Machine Learning with MATLAB

mathworks.com/machine-learning

Challenges with Data Analytics

• Aggregating data from multiple sources
• Cleaning data
• Choosing a model
• Moving to production

Deployment Highlights

Deployment targets: Database Servers, Desktop Applications, Spreadsheets, Hadoop / Big Data, Client Applications, Application Servers, Web Applications, Batch/Cron Jobs

• Share with others who may not have MATLAB
• Royalty-free deployment
• Encryption to protect your intellectual property


Deploying Applications with MATLAB

• MATLAB Compiler: Standalone Application, Excel Add-in, Hadoop
• MATLAB Compiler SDK: C/C++, Java, .NET, MATLAB Production Server

MATLAB Compiler for sharing MATLAB programs without integration programming

MATLAB Compiler SDK provides implementation and platform flexibility for software developers

MATLAB Production Server provides the most efficient development path for secure and scalable web and enterprise applications

Sharing Standalone Applications

[Workflow diagram: Application Author (MATLAB, Toolboxes); MATLAB Compiler; Standalone Application, Excel Add-in, Hadoop; End User; MATLAB Runtime]

Integrating MATLAB-based Components

[Workflow diagram, steps 1 to 4: Application Author (MATLAB, Toolboxes); MATLAB Compiler SDK; C/C++, Java, .NET, MATLAB Production Server; Software Developer; MATLAB Runtime]

Application author and software developer might be the same person

MATLAB Production Server

• Directly deploy MATLAB analytic programs into production
– Centrally manage multiple MATLAB programs & MATLAB Compiler Runtime (MCR) versions
– Automatically deploy updates without server restarts
• Scalable & reliable
– Service large numbers of concurrent requests
– Add capacity or redundancy with additional servers
• Use with web, database & application servers
– Lightweight client library isolates MATLAB processing
– Access MATLAB programs using native data types
– Integrates with Java, .NET, C and Python

[Diagram: HTML, XML and JavaScript clients; Web Server(s); MATLAB Production Server(s)]

Deployed Analytics
MATLAB Production Server

[Architecture diagram: Web Application Server (Apache Tomcat, Web Server/Webservice); MATLAB Production Server (Request Broker, CTF, Predictive Models); MATLAB Desktop (Train in MATLAB); data sources: Weather Data, Energy Data]

Learn More: Application Deployment with MATLAB
www.mathworks.com/solutions/desktop-web-deployment/

Data Analytics Products

• Access and Explore Data: Database Toolbox, Data Acquisition Toolbox, Mapping Toolbox, Image Acquisition Toolbox, OPC Toolbox
• Preprocess Data: Curve Fitting Toolbox, Signal Processing Toolbox, Image Processing Toolbox
• Develop Predictive Models: Neural Network Toolbox, Computer Vision System Toolbox, Econometrics Toolbox
• Integrate Analytics with Systems: MATLAB Production Server, MATLAB Compiler, MATLAB Compiler SDK
• Across stages: MATLAB; Parallel Computing Toolbox, MATLAB Distributed Computing Server; Statistics and Machine Learning Toolbox

Legend: Used in today’s demo; Additional Data Analytics products

Key Takeaways

• Data preparation can be a big job; leverage built-in MATLAB tools and spend more time on the analysis
• Rapidly iterate through different predictive models, and find the one that’s best for your application
• Leverage parallel computing to scale up your analysis to large datasets
• Eliminate the need to recode by deploying your MATLAB algorithms into production
