5 Machine Learning Models You Should Know - DataRobot AI Cloud




5 machine learning models you should know


November 20, 2019 by DataRobot · 6 min read

This article was originally published on Algorithmia’s website. The company was acquired by DataRobot
in 2021. This article may not be entirely up to date, and it may refer to products and offerings that no
longer exist. Find out more about DataRobot MLOps here.

Getting started with machine learning starts with understanding the how and why behind employing
particular methods. We’ve chosen five of the most commonly used machine learning models on which to
base the discussion.

AI taxonomy 
Before diving too deep, we thought we’d define some important terms that are often confused when
discussing machine learning. 

Algorithm – A set of predefined rules used to solve a problem. For example, simple linear regression
is a prediction algorithm used to find a target value (y) based on an independent variable (x). 
Model – The actual equation or computation that is developed by applying sample data to the
parameters of the algorithm. To continue the simple linear regression example, the model is the
equation of the line of best fit of the x and y values in the sample set plotted against each other.
Neural network – A multilayered algorithm that consists of an input layer, output layer, and a hidden
layer in the middle. The hidden layer is a series of stacked algorithms that iterate until the computer
chooses a final output. Neural networks are sometimes referred to as “black box” algorithms because
humans don’t have a clear and structured idea how the computer is making its decisions.
Deep learning – Machine learning methods based on neural network architecture. “Deep” refers to the
large number of algorithms employed in the hidden layer (often more than 100).
Data science – A discipline that combines math, computer science, and business/domain knowledge.

Machine learning methods

Machine learning methods are often broken down into two broad categories: supervised learning
and unsupervised learning. 

Supervised learning – Supervised learning methods are used to find a specific target, which must also
exist in the data. The main categories of supervised learning include classification and regression. 

Classification – Classification models often have a binary target sometimes phrased as a “yes” or
“no.” A variation on this model is probability estimation in which the target is how likely a new
observation is to fall into a particular category. 
Regression – Regression models always have a numeric target. They model the relationship between
a dependent variable and one or more independent variables. 

Unsupervised learning – Unsupervised learning methods are used when there is no specific target to find.
Their purpose is to form groupings within the dataset or make observations about similarities. Further
interpretation would be needed to make any decisions on these results. 

Clustering – Clustering models look for subgroups within a dataset that share similarities. Observations
within one of these natural groupings are similar to each other but different from those in other groups.
The groupings may or may not have any actual significance.
Dimension reduction – These models reduce the number of variables in a dataset by grouping similar
or correlated attributes.

It’s important to note that individual models are not necessarily used in isolation. It often takes a
combination of supervised and unsupervised methods to solve a data science problem. For example, one
might use a dimension-reduction method on a large dataset and then use the new variables in a
regression model. 

To that end, model pipelining involves splitting machine learning workflows into modular, reusable
parts that can be coupled with other model applications to build more powerful software over time.

What are the most popular machine learning algorithms?

Below we’ve detailed some of the most common machine learning algorithms. They’re often mentioned in
introductory data science courses and books and are a good place to begin. We’ve also provided some
examples of how these algorithms are used in a business context.

Linear regression

Linear regression is a method in which you predict an output variable using one or more input variables.
With a single input, this is represented in the form of a line: y = bx + c, where b is the slope and c is
the intercept. The Boston Housing Dataset is one of the most commonly used resources for learning to
model using linear regression. With it, you can predict the median value of a home in the Boston area
based on 13 attributes, including per-capita crime rate by town, pupil/teacher ratio by town, and the
average number of rooms per dwelling.
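
As a rough illustration of the line y = bx + c, the sketch below fits a slope and intercept by ordinary least squares using only the Python standard library. The function names (fit_line, predict) and the tiny dataset are assumptions made up for this example; they are not taken from the Boston Housing Dataset or from any particular library.

```python
# Minimal sketch: fit y = b*x + c by ordinary least squares.
# The data is invented for illustration, not real housing data.

def fit_line(xs, ys):
    """Return slope b and intercept c of the least-squares line y = b*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    c = mean_y - b * mean_x
    return b, c

def predict(b, c, x):
    """Apply the fitted model (the line) to a new input value."""
    return b * x + c

rooms = [4.0, 5.0, 6.0, 7.0, 8.0]       # hypothetical input variable (x)
value = [14.0, 17.0, 20.0, 23.0, 26.0]  # hypothetical target (y), built as 3*x + 2
b, c = fit_line(rooms, value)
print(b, c, predict(b, c, 10.0))
```

In the terms used earlier, fit_line plays the role of the algorithm, and the fitted pair (b, c) is the model.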

K-means clustering
K-means clustering is a method that forms groups of observations around geometric centers called
centroids. The “k” refers to the number of clusters, which is determined by the individual conducting the
analysis. Clustering is often used as a market segmentation approach to uncover similarity among
customers or uncover an entirely new segment altogether. 
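
To make the idea of centroids concrete, here is a minimal sketch of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name kmeans, the fixed starting centroids, and the customer data are all assumptions for illustration; real implementations usually pick starting centroids more carefully and stop when assignments no longer change.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points; k is the number of starting centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical customers described by (monthly visits, monthly spend).
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(customers, centroids=[(0, 0), (10, 10)])
print(centroids, labels)
```

Here k = 2 because two starting centroids are supplied, which mirrors the point above that the analyst chooses k.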

Principal component analysis (PCA)

PCA is a dimension-reduction technique used to reduce the number of variables in a dataset by grouping
together variables that are measured on the same scale and are highly correlated. Its purpose is to distill
the dataset down to a new set of variables that can still explain most of its variability.

A common application of PCA is aiding in the interpretation of surveys that have a large number of
questions or attributes. For example, global surveys about culture, behavior or well-being are often broken
down into principal components that are easy to explain in a final report. In the Oxford Internet Survey,
researchers found that their 14 survey questions could be distilled down to four independent factors. 
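
The sketch below shows one way the first principal component can be found: build the covariance matrix of the centered data, then use power iteration to approximate its dominant eigenvector. The function name first_principal_component and the two-question survey scores are assumptions for illustration only; they are unrelated to the Oxford Internet Survey, and production code would normally rely on a linear algebra library instead.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (direction, explained_ratio) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v divided by the total variance in the data.
    along = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, along / total

# Hypothetical scores for two highly correlated survey questions.
answers = [(1, 1.9), (2, 4.1), (3, 6.2), (4, 7.8)]
direction, explained = first_principal_component(answers)
print(direction, explained)
```

Because the two made-up questions move together, a single new variable captures nearly all of the variability, which is the sense in which PCA distills a dataset.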

K-nearest neighbors (k-NN)


Nearest-neighbor reasoning can be used for classification or prediction depending on the variables
involved. It is a comparison of distance (often Euclidean) between a new observation and those already in
a dataset. The “k” is the number of neighbors to compare and is usually tuned, for example with
cross-validation, to minimize the chance of overfitting or underfitting the data.

In a classification scenario, the new observation is assigned to the class held by the majority of its k
nearest neighbors. For this reason, k is often an odd number to prevent ties. For a prediction model, an
average of the targeted attribute of the neighbors predicts the value for the new observation.
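
Both uses of nearest-neighbor reasoning can be sketched in a few lines: a majority vote for classification and an average for prediction. The function names (knn_classify, knn_regress) and the data are assumptions for illustration, and Euclidean distance is used here simply because it is the common default.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train holds (features, label) pairs; return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """train holds (features, target) pairs; return the mean target of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(target for _, target in nearest) / k

# Hypothetical labeled observations in two dimensions.
labeled = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
           ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B")]
class_near_a = knn_classify(labeled, (2, 2))
class_near_b = knn_classify(labeled, (7, 8))

# Hypothetical one-dimensional observations with a numeric target.
priced = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
estimate = knn_regress(priced, (2,), k=3)
print(class_near_a, class_near_b, estimate)
```

With k = 3 and two classes, a vote can never tie, which is the practical reason for preferring an odd k.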

(ResearchGate)

Classification and regression trees (CART)

Decision trees are a transparent way to separate observations and place them into subgroups. CART is a
well-known version of a decision tree that can be used for classification or regression. You choose a
response variable and make partitions through the predictor variables. The computer typically chooses
the number of partitions to prevent underfitting or overfitting the model. CART is useful in situations
where “black box” algorithms may be frowned upon due to inexplicability, because interested parties need
to see the entire process behind a decision. 
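
A single partition, the building block of a classification tree, can be sketched as a search for the threshold that leaves the two resulting subgroups as pure as possible. The example scores candidate cuts with Gini impurity, a criterion commonly used for classification trees; the function names (gini, best_split) and the loan data are assumptions for illustration. A full tree would apply the same search recursively to each subgroup.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best partition 'value <= threshold'."""
    best = (None, float("inf"))
    cuts = sorted(set(values))
    for lo, hi in zip(cuts, cuts[1:]):
        threshold = (lo + hi) / 2  # candidate cut halfway between neighboring values
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical predictor (income in thousands) and response (loan approved).
income = [20, 25, 30, 60, 70, 80]
approved = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(income, approved)
print(threshold, impurity)
```

The resulting rule ("income <= threshold") can be read directly by anyone reviewing the decision, which is the transparency described above.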

(community.jmp)

How do I choose the best model for machine learning?


The model you choose for machine learning depends greatly on the question you are trying to answer or
the problem you are trying to solve. Additional factors to consider include the type of data you are
analyzing (categorical, numerical, or maybe a mixture of both) and how you plan on presenting your
results to a larger audience. 

The five model types discussed herein do not represent the full collection of model types out there, but
are commonly used for most business use cases we see today. Using the above methods, companies
can conduct complex analysis (predict, forecast, find patterns, classify, etc.) to automate workflows.


