INTRODUCTION TO

ARTIFICIAL INTELLIGENCE
FOR IT & NON-IT PROFESSIONALS
HOW TO PERFORM MACHINE LEARNING

Performing machine learning involves the following seven steps:

1. Specify the problem
2. Prepare data
3. Choose learning method
4. Apply the learning method
5. Assess the method and results
6. Tune parameters
7. Make predictions
SPECIFY THE PROBLEM

• This involves understanding the problem, how we can solve it, and how
the solution can be evaluated
• It is useful to understand why you want to solve the problem
PREPARING DATA

• Data is information
• Any time we have a table with information, we have data
• Each row in the table is a data point
• Consider a dataset of pets. Each row represents a pet. Each pet is
described by certain features.
WHAT ARE FEATURES?

• Features are the columns of the table


• Some features are special, and we call them labels. If we are trying to
predict a feature based on the others, then that feature is called the
label
• Labels depend on the context of the problem we are solving
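As a small illustration of features and labels, the pets table can be written as a list of rows, with one column picked as the label. This is only a sketch: the column names (`species`, `weight_kg`, `age_years`), the values, and the helper `split_features_and_label` are made up for the example.

```python
# Hypothetical pets dataset: each dict is one row (one data point).
pets = [
    {"species": "dog", "weight_kg": 20.0, "age_years": 4},
    {"species": "cat", "weight_kg": 4.5, "age_years": 2},
    {"species": "dog", "weight_kg": 30.0, "age_years": 7},
]


def split_features_and_label(rows, label_name):
    """Separate the label column from the remaining feature columns."""
    features = [{k: v for k, v in row.items() if k != label_name} for row in rows]
    labels = [row[label_name] for row in rows]
    return features, labels


# If the task is "predict the species", then species is the label.
X, y = split_features_and_label(pets, "species")
print(y)     # ['dog', 'cat', 'dog']
print(X[0])  # {'weight_kg': 20.0, 'age_years': 4}

# For a different task (predict the weight), the label changes.
X_w, y_w = split_features_and_label(pets, "weight_kg")
```

The same table yields different labels depending on the question being asked, which is the point made in the last bullet.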
LABELED AND UNLABELED DATA.

Labeled data: Data that comes with a label.


Unlabeled data: Data that comes without a label.
PREPARING DATA

• ML algorithms learn from the data they are trained on
• It is mission critical to provide the model with valid, correct data to
learn from
• Data must be prepared in a usable format
• The data must then be processed to ensure correct formatting,
removal of erroneous data, and the fixing of any missing data
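The clean-up described above might look roughly like the following. It is a minimal sketch under stated assumptions: `None` marks a missing value, a non-positive weight counts as erroneous, and missing weights are filled with the mean of the known ones. Other rules may suit other datasets better.

```python
from statistics import mean

# Hypothetical raw rows: None marks a missing value, a negative weight is an error.
raw_rows = [
    {"weight_kg": 20.0, "age_years": 4},
    {"weight_kg": None, "age_years": 2},
    {"weight_kg": -5.0, "age_years": 3},
    {"weight_kg": 30.0, "age_years": 7},
]


def clean(rows):
    """Drop rows with impossible weights, then fill missing weights."""
    # Keep rows whose weight is missing (fixed below) or strictly positive.
    valid = [r for r in rows if r["weight_kg"] is None or r["weight_kg"] > 0]
    # Fill missing weights with the mean of the known weights (one simple choice).
    known = [r["weight_kg"] for r in valid if r["weight_kg"] is not None]
    fill = mean(known)
    return [
        {**r, "weight_kg": fill if r["weight_kg"] is None else r["weight_kg"]}
        for r in valid
    ]


cleaned = clean(raw_rows)
```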
PREPARING DATA

• Sampling: The dataset may be larger than required, in which case a
sample of the dataset can be used instead
• Data pre-processing is essential to have tidy, valid data
• Tidy, valid data is key to having robust, veracious (true) outcomes
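Sampling can be sketched with Python's standard `random` module. The helper name `sample_rows`, the sizes, and the seed are illustrative assumptions; a fixed seed is used only so that the same sample is drawn on every run.

```python
import random


def sample_rows(rows, n, seed=0):
    """Return n rows drawn at random without replacement."""
    rng = random.Random(seed)  # private generator, so the sample is reproducible
    return rng.sample(rows, n)


dataset = list(range(1000))         # stand-in for 1000 data points
subset = sample_rows(dataset, 100)  # keep a random 10 percent
```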
ML DATASET

• The UC Irvine ML Repository is a collection of databases, domain
theories, and data generators that are used by the ML community for the
empirical analysis of ML algorithms
ATTRIBUTE, VARIABLE, FEATURE SELECTION
• Feature selection is essentially filtering: it refers to selecting the
subset of the original attributes that is most relevant to the predictive
modeling task at hand
• Feature selection includes and excludes attributes rather than
creating new ones.
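One very simple filter, shown here only as a sketch, drops attributes that never vary, since a constant column cannot help distinguish one data point from another. Variance is a crude stand-in for "relevance" chosen for this illustration; the column names and the helper `select_varying_features` are hypothetical.

```python
from statistics import pvariance

rows = [
    {"weight_kg": 20.0, "age_years": 4, "legs": 4},
    {"weight_kg": 4.5, "age_years": 2, "legs": 4},
    {"weight_kg": 30.0, "age_years": 7, "legs": 4},
]


def select_varying_features(rows, min_variance=0.0):
    """Keep only the columns whose variance exceeds min_variance."""
    names = list(rows[0].keys())
    keep = [n for n in names if pvariance([r[n] for r in rows]) > min_variance]
    # Existing columns are included or excluded; no new columns are created.
    return [{n: r[n] for n in keep} for r in rows], keep


selected, kept = select_varying_features(rows)
print(kept)  # ['weight_kg', 'age_years']
```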
TRANSFORMING DATA

• Check your datasets for errors, biases, and inconsistencies
• Data may also need to be transformed. This is typically guided by the
algorithm you are using and the data available
• Scaling: Attributes can have very different ranges or units, so they may
need to be rescaled to a comparable range
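Min-max scaling is one common way to bring attributes onto a comparable range. The sketch below maps each attribute to the interval [0, 1]; the attribute values are invented for the example, and other scaling methods (such as standardization) are equally valid choices.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


weights_kg = [4.5, 20.0, 30.0]  # spans tens of kilograms
ages_years = [2, 4, 7]          # spans a few years

scaled_weights = min_max_scale(weights_kg)
scaled_ages = min_max_scale(ages_years)
print(scaled_ages)  # [0.0, 0.4, 1.0]
```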
APPLY LEARNING METHODS

• ML tasks are typically conducted in a variety of programming
languages: predominantly R, Python, MATLAB, SQL, Java, and C
• R is typically used for statistical analysis
• Python is well suited to ML
• MATLAB is often used for fast prototyping
• SQL is used for managing data held in a traditional database
management system
TRAINING AND TEST DATA

• The test set and training set are selected from the prepared data
• The algorithm is trained on the training dataset and evaluated against
the test dataset
• Signal: the true underlying pattern in a dataset
• Noise: random or irrelevant patterns in a dataset
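A train/test split can be sketched as a shuffle followed by a cut. The 80/20 ratio, the seed, and the function name `train_test_split` are assumptions made for this example, not something the slides prescribe.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))  # stand-in for ten prepared data points
train, test = train_test_split(data, test_fraction=0.2)
```

Keeping the test rows out of training is what lets the evaluation indicate whether the model picked up the signal rather than memorizing the noise.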
ASSESS METHOD AND RESULTS

• The performance of ML tasks depends on the representation of the data
given
• A complete feature set is not required as part of the representation
for the model to produce highly confident outputs
PARAMETER TUNING

• After evaluation, for further improvement, we can tune the parameters,
for example by showing the model our full dataset multiple times, rather
than just once, to increase accuracy. These parameters are called
“hyperparameters”
• It’s important to define what makes a model “good enough”;
otherwise you might find yourself tweaking parameters for a very long
time
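The tuning loop can be sketched on a toy problem. Here a one-parameter model y = w * x is fitted by gradient descent, and two hyperparameters are tried: the learning rate and the number of passes over the data (epochs). The toy data, the candidate values, and the `GOOD_ENOUGH` threshold are all assumptions for the illustration.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent; one epoch is one pass over the data."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # Gradient of the squared error (w*x - y)**2 with respect to w.
            w -= learning_rate * 2 * (w * x - y) * x
    return w


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 3x (an assumption for the demo).
train_x, train_y = [1.0, 2.0, 3.0], [3.0, 6.0, 9.0]
val_x, val_y = [4.0, 5.0], [12.0, 15.0]

GOOD_ENOUGH = 1e-4  # stop searching once the validation error is this small
best = None         # (error, learning_rate, epochs, w)
for lr in (0.001, 0.01, 0.05):
    for epochs in (1, 10, 100):
        w = train(train_x, train_y, lr, epochs)
        err = mse(w, val_x, val_y)
        if best is None or err < best[0]:
            best = (err, lr, epochs, w)
    if best[0] <= GOOD_ENOUGH:
        break  # good enough: no need to try further learning rates

best_err, best_lr, best_epochs, best_w = best
```

Defining the threshold up front gives the search a clear stopping point instead of an open-ended round of tweaking.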
PREDICTIONS

• Prediction, or inference, is the step where we get to answer some
questions
• This is the real objective of all this work, where the value of ML is
realized
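As a closing sketch, prediction means giving the trained model a new, unlabeled data point and reading off its answer. A 1-nearest-neighbour rule is used here purely because it is short; the training rows and feature values are invented for the example.

```python
import math

# Hypothetical labeled training data: (weight_kg, age_years) -> species.
training = [
    ((20.0, 4.0), "dog"),
    ((30.0, 7.0), "dog"),
    ((4.5, 2.0), "cat"),
    ((3.8, 5.0), "cat"),
]


def predict(features):
    """Return the label of the closest training point (1-nearest neighbour)."""
    nearest = min(training, key=lambda item: math.dist(item[0], features))
    return nearest[1]


# New, unlabeled pets: the model supplies the missing label.
print(predict((5.0, 3.0)))   # cat
print(predict((25.0, 6.0)))  # dog
```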