Machine Learning with Spark MLlib

Manuel Martín Márquez

Antonio Romero Marin
Joeri Hermans
Hadoop Tutorials
Machine Learning (ML)
• ML is a branch of artificial intelligence:
  • Uses computing-based systems to make sense out of data
  • Extracting patterns, fitting data to functions, classifying data, etc.
• ML systems can learn and improve
  • With historical data, time and experience
• Bridges theoretical computer science and real, noisy data.

ML in real life

Supervised and Unsupervised Learning
• Unsupervised Learning
  • There is no predefined, known set of outcomes
  • Looks for hidden patterns and relations in the data
  • A typical example: clustering
[Figure: clustering of the iris data, Petal.Width plotted against Petal.Length, points coloured by irisCluster$cluster]

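To make the clustering idea concrete, here is a minimal plain-Python sketch of Lloyd's k-means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. It is an illustration, not the method behind the figure. Spark MLlib ships its own `pyspark.ml.clustering.KMeans`, which by default uses the k-means|| initialisation. This sketch instead assumes the much simpler rule "the first k points are the initial centroids", so results depend on input order.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= number of points")
    # Naive deterministic initialisation (an assumption of this sketch):
    # the first k points become the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # no centroid moved: converged
        centroids = new_centroids
    return centroids, labels
```

For example, `kmeans([(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)], k=2)` puts the three points near (1, 1) in one cluster and the three near (8, 8) in the other. No outcome labels are supplied, which is what makes this unsupervised.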
Supervised and Unsupervised Learning
• Supervised Learning
  • Every example in the data has a predefined, known outcome
  • Models the relations between a set of descriptive features and a target (fits data to a function)
  • Two groups of problems:
    • Classification
    • Regression

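"Fitting data to a function" can be shown with the simplest supervised model: ordinary least squares with one feature. The sketch below is plain Python and assumes a single numeric feature. Spark MLlib's `pyspark.ml.regression.LinearRegression` solves the same kind of problem on DataFrames, with many features and optional regularisation.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature is constant; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Apply the fitted function to a new sample."""
    return slope * x + intercept
```

For instance, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` recovers slope 2 and intercept 1. Each training example comes with its known target, which is the defining property of supervised learning.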
Supervised Learning
• Classification
  • Predicts which class a given sample of data (a sample of descriptive features) belongs to (a discrete value).
  [Figure: confusion matrix for the iris data set, values in percent]
    Predicted \ Actual   setosa   versicolor   virginica
    setosa                100.0          0.0         0.0
    versicolor              0.0         96.0         4.0
    virginica               0.0          4.0        96.0
• Regression
  • Predicts continuous values.

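A confusion matrix like the one above can be computed in a few lines. The plain-Python sketch below reports each cell as a percentage of the actual class, which is an assumption about how the figure was normalised. It also computes overall accuracy. In Spark, the RDD-based `pyspark.mllib.evaluation.MulticlassMetrics` exposes a `confusionMatrix()` method for the same purpose.

```python
from collections import Counter
from typing import Dict, List, Sequence


def confusion_percent(actual: Sequence[str], predicted: Sequence[str],
                      classes: List[str]) -> Dict[str, Dict[str, float]]:
    """Return matrix[predicted][actual] as a percentage of each actual class."""
    pair_counts = Counter(zip(predicted, actual))
    class_totals = Counter(actual)
    return {
        p: {
            # Guard against classes that never occur in `actual`.
            a: 100.0 * pair_counts[(p, a)] / class_totals[a] if class_totals[a] else 0.0
            for a in classes
        }
        for p in classes
    }


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of samples whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

Reading the iris matrix this way, 4% of versicolor samples are predicted as virginica and vice versa, while setosa is always classified correctly.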
Machine Learning as a Process
• Define Objectives
  - Define measurable and quantifiable goals
  - Use this stage to learn about the problem
• Data Preparation
  - Normalization
  - Transformation
  - Missing values
  - Outliers
• Model Building
  - Data splitting
  - Feature engineering
  - Estimating performance
  - Evaluation and model selection
• Model Evaluation
  - Study the model's accuracy
  - Check that it works better than the naïve approach or the previous system
  - Check that the results make sense in the context of the problem
• Model Deployment

ML as a Process: Data Preparation
• Needed for several reasons:
  • Some models have strict data requirements
    • Scale of the data, data point intervals, etc.
  • Some characteristics of the data can dramatically affect model performance
• The time spent on data preparation should not be underestimated

[Diagram: Raw Data → Data Transformation → Data ready for the modeling phase]
• Typical problems in raw data: missing values, error values, different scales, data type problems, many others
• Data transformation deals with: scaling, centering, skewness, dimensionality, outliers, missing values, errors

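Two of the transformations listed above, filling missing values and centering plus scaling, are sketched below in plain Python for a single column. The sketch assumes that missing entries are encoded as `None` and that each column has at least two values, which `statistics.stdev` requires. It uses the sample standard deviation. In Spark MLlib, the corresponding DataFrame transformers are `pyspark.ml.feature.Imputer` and `pyspark.ml.feature.StandardScaler`.

```python
import statistics
from typing import List, Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def standardize(values: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation: divides by n - 1
    if std == 0:
        return [0.0 for _ in values]  # constant column: centering is all we can do
    return [(v - mean) / std for v in values]
```

For example, `impute_mean([2.0, None, 6.0, 4.0])` gives `[2.0, 4.0, 6.0, 4.0]`, and standardizing that result yields a column with mean 0 and standard deviation 1. Imputation is done first because the scaling statistics cannot be computed over missing values.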
ML as a Process: Feature engineering
• Determining the predictors (features) to use is one of the most critical questions
• Sometimes we need to add predictors
• Reducing their number:
  • Fewer predictors give a more interpretable and less costly model
  • Most models are affected by high dimensionality, especially when predictors are non-informative
  • Wrappers: algorithms that fit multiple models, adding and removing parameters, with models as input and performance as output (e.g. genetic algorithms)
  • Filters: evaluate the relevance of each predictor, normally based on correlations
• Binning predictors

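The filter approach and binning can both be sketched in plain Python. The first part keeps the predictors whose absolute Pearson correlation with the target reaches a threshold. The threshold and the helper names are hypothetical choices for illustration. The second part maps a numeric predictor to bucket indices. Here `edges` holds only the interior cut points, unlike Spark's `pyspark.ml.feature.Bucketizer`, whose `splits` also include the outer bounds. Spark users would typically compute correlations with `pyspark.ml.stat.Correlation`.

```python
import bisect
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_select(features: Dict[str, Sequence[float]], target: Sequence[float],
                  threshold: float) -> List[str]:
    """Filter method: keep predictors with |correlation| >= threshold, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: -scores[name])


def bin_values(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Binning: bucket i holds edges[i-1] <= v < edges[i]; edges must be sorted."""
    return [bisect.bisect_right(edges, v) for v in values]
```

A filter like this is cheap because it never trains a model. Its trade-off against wrappers is that it scores each predictor in isolation and can miss predictors that only help in combination.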
ML as a Process: Model Building
• Data Splitting
  • Allocate data to different tasks:
    • model training
    • performance evaluation
  • Define training, validation and test sets
• Feature Selection (review the decisions made previously)
• Estimating Performance
  • Visualization of results: discover interesting areas of the problem space
  • Statistics and performance measures
• Evaluation and Model Selection
  • The 'no free lunch' theorem: no a priori assumptions can be made
  • Avoid defaulting to favorite models when other models may be needed
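Data splitting and performance estimation can be sketched without Spark. The first function below shuffles once with a fixed seed and cuts the data into training, validation and test sets. The 60/20/20 default is a hypothetical choice. Spark's `DataFrame.randomSplit` samples rows independently, so its split sizes are only approximately proportional, whereas this shuffle-and-cut version gives exact sizes. The second function generates k-fold cross-validation index pairs, the idea behind `pyspark.ml.tuning.CrossValidator`. It assigns folds round-robin and assumes the rows are already shuffled.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(rows: Sequence[T],
                         fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),
                         seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into training/validation/test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # the test set takes the remainder


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """(train, validation) index pairs; each index is validated exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin fold assignment
    return [
        (sorted(j for other in folds[:i] + folds[i + 1:] for j in other), folds[i])
        for i in range(k)
    ]
```

Averaging a model's score over the k validation folds gives a more stable performance estimate than a single validation set. The held-out test set is then used once, after model selection, so that the final estimate is not biased by the choice of model.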
