
EEE436

Introduction to Machine Learning

Khairul Alam
Professor, EEE Department
East West University
Aftabnagar, Dhaka, Bangladesh

Introduction to Machine Learning

Definition:
• Machine learning is the science of getting computers to learn and act as humans do,
and to improve their learning over time autonomously, by feeding them data and
information in the form of observations and real-world interactions.
• A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P if its performance at tasks in T, as measured by P,
improves with experience E.
• ML is the capability of a machine to imitate intelligent human behavior. ML uses
algorithms trained on data to produce adaptable models that can perform a variety of
complex tasks without human intervention.
• Machine learning is the science (and art) of programming computers so they can learn from data.

Why Use Machine Learning

To understand the necessity of ML, consider a spam email filter.

(1) In traditional programming, you might look for a pattern, say the word “free” in the email
body or subject, and flag matching emails as spam.
(2) You then write a detection rule for each such pattern, and your code detects
spam successfully.
(3) Over time, the pattern list grows, new rules are coded, and your program
becomes complex and eventually unmanageable.
(4) ML, on the other hand, automatically learns which words and phrases are good
predictors of spam by detecting patterns of words that are unusually frequent in the spam
examples. The program is much shorter, easier to maintain, and most likely
more accurate.
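
The contrast in steps (1) to (4) can be sketched in a few lines of Python. This is a toy illustration, not a production filter: the example emails, the function names, and the rule "a word seen in spam but never in ham predicts spam" are all assumptions made for the demonstration.

```python
from collections import Counter

# Hypothetical toy training data: (email text, label) pairs.
TRAINING_EMAILS = [
    ("win free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def rule_based_is_spam(text):
    """Traditional approach: a hand-written pattern list that must be maintained."""
    patterns = ["free"]  # every new spam trick needs a new entry here
    return any(p in text.lower().split() for p in patterns)

def learn_spam_words(emails):
    """Learned approach: keep words that occur in spam but never in ham."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in emails:
        target = spam_counts if label == "spam" else ham_counts
        target.update(text.lower().split())
    return {w for w in spam_counts if ham_counts[w] == 0}

def learned_is_spam(text, spam_words):
    """Flag an email if it contains any learned spam-indicating word."""
    return any(w in spam_words for w in text.lower().split())

spam_words = learn_spam_words(TRAINING_EMAILS)
```

With this toy data, the hand-written rule misses "claim your prize" because nobody coded a rule for "prize", while the learned word set catches it because "prize" appeared only in spam examples.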
Computer Program vs Machine Learning
In traditional programming:
• We write the code / program
• Give input
• Get output
In machine learning:
• We give input and output
• The machine develops the program / model
• The model predicts output for a new input
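
The comparison above can be made concrete with a small sketch. In the first function the rule is written by hand; in the second, the rule is recovered from input/output pairs. The rule y = 2x + 1, the data, and the choice of ordinary least squares as the learning method are assumptions chosen for illustration.

```python
def traditional_program(x):
    """Traditional programming: the rule y = 2x + 1 is written by hand."""
    return 2 * x + 1

def fit_line(xs, ys):
    """Machine learning: estimate slope and intercept from (input, output)
    pairs using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data produced by the hidden rule y = 2x + 1.
inputs = [0, 1, 2, 3, 4]
outputs = [1, 3, 5, 7, 9]

slope, intercept = fit_line(inputs, outputs)

def learned_model(x):
    """The 'program' the machine developed from the data."""
    return slope * x + intercept
```

Here `learned_model(10)` agrees with `traditional_program(10)` even though nobody typed the rule into the second version.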
How does a Machine Learn
How does it work?

• We write an algorithm
• Give it input data
• Give it the corresponding output data
• Train the model using the input/output data
• Once the model is trained, it can predict the output for a new input
• As more data are fed in, the model adjusts its
internal parameters to improve accuracy
How does a Machine Learn
(1) Initially, the machine learning model is provided with training data for which the result
is known. (2) The machine learning algorithm is run, and adjustments are
made until (3) the output of the ML model closely matches the known results. Once the
results of the ML model are as desired, the model is said to be trained, and (4)
testing data is fed as input to this trained model to obtain human-like, intelligent results.
[Figure: (1) Data (I/O are known) → (2) Run the algorithm → (3) Algorithm generates model → (4) Model takes a new input and produces an output]
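
The four steps above can be sketched as a small training loop. This is a minimal illustration under stated assumptions: the data follow y = 3x, the model has a single weight `w`, and gradient descent with a hypothetical learning rate of 0.01 stands in for "adjustments" to the model.

```python
# Hypothetical data following y = 3x; four pairs for training, two for testing.
train = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
test = [(5.0, 15.0), (6.0, 18.0)]

def mean_squared_error(w, data):
    """Average squared gap between the prediction w*x and the known result y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train_model(data, learning_rate=0.01, tolerance=1e-8, max_steps=10_000):
    """Adjust the weight w by gradient descent until the training error is tiny."""
    w = 0.0
    for _ in range(max_steps):
        if mean_squared_error(w, data) < tolerance:
            break  # step (3): the output now closely matches the known results
        gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * gradient  # step (2): one adjustment
    return w

w = train_model(train)                     # steps (1) to (3)
test_error = mean_squared_error(w, test)   # step (4): evaluate on unseen inputs
```

The test pairs are never shown to `train_model`, so a small `test_error` indicates that the learned weight carries over to new inputs.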

Training Data
The two fundamental components of ML are (1) training data and (2) an algorithm.

Collection of Data
This step involves finding data suited to the machine learning project we want to build. Data may
be gathered from various sources such as files, sensors, and databases.

Pre-processing of Data
The data collected from different sources cannot be used directly to build the machine learning model,
as it may contain noisy data, unorganized text, missing values,
unusually large values, or irrelevant text. Such unwanted data should be removed to obtain clean
data. The data may be of the following types:
– Numerical: for example age, salary, etc.
– Categorical: for example nationality, gender, etc.
– Ordinal: for example high, medium, low, etc.
Training Data
Data pre-processing may be performed in the following ways:
Dealing with Null Values
We can handle null values either by deleting the rows or columns
that contain them or by imputation, which is the process of
replacing missing values with substitute values, such as the column mean.
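
Both options for null values can be sketched as follows. The dataset, the use of `None` as the missing-value marker, and the choice of the column mean as the substitute value are assumptions for illustration; other substitutes (median, most frequent value) are equally common.

```python
from statistics import mean

# Hypothetical dataset: each row is (age, salary); None marks a missing value.
rows = [
    (25, 50000),
    (None, 62000),
    (35, None),
    (30, 58000),
]

def drop_rows_with_nulls(data):
    """Option 1: delete every row that contains a missing value."""
    return [row for row in data if None not in row]

def impute_with_column_mean(data):
    """Option 2: replace each missing value with the mean of its column."""
    columns = list(zip(*data))
    column_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(column_means[j] if v is None else v for j, v in enumerate(row))
        for row in data
    ]

cleaned = drop_rows_with_nulls(rows)
imputed = impute_with_column_mean(rows)
```

Deleting rows keeps only two of the four records here, while imputation keeps all four at the cost of inserting estimated values.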
Standardization
Standardization rescales the values of a feature so that their mean is 0
and their standard deviation is 1.
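
Standardization replaces each value x with z = (x - mean) / (standard deviation). A minimal sketch, assuming the population standard deviation and a hypothetical salary column:

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mu) / sigma for x in values]

salaries = [40000, 50000, 60000, 70000, 80000]  # hypothetical values
z_scores = standardize(salaries)
```

After the transformation, `mean(z_scores)` is 0 and `pstdev(z_scores)` is 1, up to floating-point rounding.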

Training Data
Dealing with Categorical Variables
Categorical variables take values from a discrete set of categories rather than a continuous
range; most algorithms need them converted to numbers before training.
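
The slides do not name a specific technique for categorical variables. Two widely used ones are one-hot encoding (for unordered categories) and ordinal encoding (for ordered ones); the sketch below shows both under that assumption, with hypothetical function names and data.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 vector with a single 1 at the category's index."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

def ordinal_encode(values, order):
    """Map ordered categories (e.g. low < medium < high) to integer ranks."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]

nationalities = ["Bangladeshi", "Indian", "Bangladeshi", "Nepali"]  # hypothetical
categories, encoded = one_hot_encode(nationalities)
levels = ordinal_encode(["low", "high", "medium"], order=["low", "medium", "high"])
```

One-hot encoding avoids implying an order among nationalities, whereas ordinal encoding deliberately preserves the order low < medium < high.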
Feature Scaling
It is a technique that brings the values of all features into a comparable range, so that
features with a large range of values do not dominate the others.
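
One common form of feature scaling is min-max scaling, which maps every feature onto the interval [0, 1]. The sketch below assumes that method and uses hypothetical age and salary columns whose raw ranges differ by a factor of 2000.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 60]                    # hypothetical, raw range 40
salaries = [30000, 50000, 90000, 110000]   # hypothetical, raw range 80000

scaled_ages = min_max_scale(ages)
scaled_salaries = min_max_scale(salaries)
```

After scaling, both columns span 0 to 1, so a distance-based algorithm no longer treats salary differences as thousands of times more important than age differences.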
Splitting the Data
In machine learning, we commonly split the data 70:30, meaning 70% of the data
is used for training and 30% is held out for testing.
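
A 70:30 split can be sketched as below. The function name `split_data`, the fixed seed, and the shuffle before cutting are assumptions; shuffling is a common precaution so that any ordering in the original data does not leak into the split.

```python
import random

def split_data(data, train_fraction=0.7, seed=0):
    """Shuffle a copy of the data and cut it into training and testing parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

samples = list(range(10))  # hypothetical dataset of 10 items
train_set, test_set = split_data(samples)
```

With 10 samples this yields 7 training items and 3 testing items, and no item appears in both parts.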

ML System (Algorithm)
Machine Learning systems can be classified according to the amount and type of
supervision they get during training. There are four major categories: supervised learning,
unsupervised learning, semi-supervised learning, and reinforcement learning.
Supervised learning
In supervised learning, the training data you feed to the algorithm includes the desired
solutions, called labels (Figure 1-5).

The machine is trained with many example emails along with their class (spam or ham),
and it must learn how to classify new emails.
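
The spam/ham setting can be sketched with k-nearest neighbors, one common supervised algorithm. Everything specific here is an assumption for illustration: each email is reduced to two hypothetical features (count of the word "free", count of exclamation marks), and k is set to 3.

```python
from collections import Counter
import math

# Hypothetical labeled training set: features are
# (count of the word "free", count of exclamation marks); the label is the class.
training = [
    ((3, 4), "spam"),
    ((2, 5), "spam"),
    ((4, 3), "spam"),
    ((0, 0), "ham"),
    ((0, 1), "ham"),
    ((1, 0), "ham"),
]

def knn_predict(features, labeled_examples, k=3):
    """Classify by majority vote among the k nearest labeled examples."""
    by_distance = sorted(
        labeled_examples,
        key=lambda example: math.dist(features, example[0]),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

prediction_a = knn_predict((3, 3), training)  # lies near the spam examples
prediction_b = knn_predict((0, 2), training)  # lies near the ham examples
```

The labels in `training` play the role of the "desired solutions": the algorithm never sees a rule for spam, only examples paired with their class.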

ML System (Algorithm)
Supervised learning
The most important supervised learning algorithms are:
• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees
• Random Forests
• Neural Networks

Unsupervised learning
In unsupervised learning, the training data is unlabeled. The system tries to learn
without a teacher. Algorithms include:
• Clustering
• Dimensionality Reduction
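
Clustering, named above as an unsupervised technique, can be illustrated with a small k-means sketch. The slides do not specify a clustering algorithm, so k-means is an assumption, as are the one-dimensional data and the fixed starting centroids (chosen to keep the example deterministic).

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```

No labels are supplied at any point; the two groups emerge purely from the distances between the points.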

ML System (Algorithm)
Semi-supervised learning
Some algorithms can deal with partially
labeled training data, usually a lot of
unlabeled data and a little bit of labeled
data. This is called semi-supervised
learning. Most semi-supervised learning
algorithms are combinations of
unsupervised and supervised algorithms.
Some photo-hosting services, such as Google
Photos, are good examples of this. Once you upload
all your family photos to the service, it automatically
recognizes that the same person A shows up in
photos 1, 5, and 11, while another person B shows up
in photos 2, 5, and 7.
ML System (Algorithm)
Reinforcement learning
The learning system, called an agent
in this context, can observe the
environment, select and perform
actions, and receive rewards or penalties in return. It
must then learn by itself what is the
best strategy, called a policy, to get the
most reward over time. A policy
defines what action the agent should
choose when it is in a given situation.
For example, many robots implement
Reinforcement Learning algorithms to
learn how to walk.
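
The agent, reward, and policy ideas above can be sketched with tabular Q-learning on a tiny corridor. All specifics are assumptions for illustration: five states in a row, a reward of +1 for reaching the last state, a penalty of -0.1 for every other move, and a discount factor of 0.9. To keep the example deterministic, every state-action pair is tried in turn rather than explored at random as a real agent would.

```python
N_STATES = 5            # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)      # move left or move right
GAMMA = 0.9             # discount factor for future reward

def step(state, action):
    """Environment: return (next_state, reward). Reaching state 4 pays +1;
    every other move costs a small penalty of -0.1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else -0.1
    return next_state, reward

def learn_q_values(sweeps=200, alpha=0.5):
    """Apply the Q-learning update to every (state, action) pair in turn."""
    q = {(s, a): 0.0 for s in range(N_STATES - 1) for a in ACTIONS}
    for _ in range(sweeps):
        for (s, a) in list(q):
            next_state, reward = step(s, a)
            # The goal state is terminal, so it contributes no future value.
            if next_state == N_STATES - 1:
                future = 0.0
            else:
                future = max(q[(next_state, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (reward + GAMMA * future - q[(s, a)])
    return q

q_values = learn_q_values()
# The policy picks, in each state, the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q_values[(s, a)]) for s in range(N_STATES - 1)}
```

The resulting `policy` maps every state to "move right", which is the strategy that collects the most reward over time in this corridor.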
A Few Applications of ML
Virtual Personal Assistants: Popular virtual personal assistants today
include Alexa, Siri, and Google Assistant. They help users retrieve
information in response to voice queries.

Traffic Prediction: GPS navigation devices are used for managing traffic. They
track the current location and velocity of a vehicle and store this information on
a central server, where it is used to generate the current traffic report.

Video Surveillance System: A single person cannot monitor multiple video cameras at once.
A video surveillance system uses machine learning at its back end to detect unusual
behavior in people, such as napping, stumbling, and standing.

Social Media like Facebook: Facebook notes the people we connect with, the profiles
we often visit, our workplace, the groups we share, our interests, etc. On the basis
of this information, Facebook suggests a list of people we might like to
become friends with.

A Few Applications of ML
Online Customer Support: Many websites today let users chat with the
company’s customer support representatives while browsing the site.
But not all websites provide live agents to answer queries. On some websites, the
user talks to a chatbot that is based on ML.

Search Engine: Google and many other companies nowadays use ML to improve their search
capabilities. Whenever we perform a search, an algorithm runs at the back end to see how
we respond to the results provided by the search engine. If we open the top-most result and
stay on the page for a long time, the search engine assumes that the result
matched the query. If, instead, we reach the second or
third page of the search results without opening any of the pages, the search engine
assumes that the results did not match the query. In this way, the
algorithm running at the back end tries to improve the quality of the search results.

In short, the applications of ML are countless: self-driving cars, automatic text
translation, robotics, healthcare, and many more.

Issues and Challenges in ML
Lack of Training Data: In general, machine learning models need training data:
information and examples representing exactly what you want them to do for you. And
the truth is, in many instances, you simply don’t have access to the millions of real examples
you need. That’s the very first challenge machine learning specialists have to overcome.

Poor Quality of Data: In other instances, the data you need is available, but the quality
of it leaves a lot to be desired. If you start work with poor-quality data, you can’t expect
to end up with a fully functional and effective algorithm. On the contrary, it will be
defective and inefficient. And this is where data quality tools come into play. They are
designed to remove formatting errors, typos, redundancies, missing entries, and other
issues that reduce the quality of your data.

Data Overfitting: In short, overfitting occurs when an overly complex
machine learning model is fitted to a limited set of data. As a result, your
machine learning model works brilliantly on the training dataset, but on new, unseen
data it fails to generalize properly.
Issues and Challenges in ML
Data Underfitting: Here, we deal with the reverse problem. Our model is too simple or lacks
parameters that it needs in order to produce a clear and unbiased result. This
means that our machine learning model cannot draw useful conclusions from the training data.
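
Overfitting and underfitting can be seen side by side by comparing training and test error for three models of increasing flexibility. The sketch below is an illustration under stated assumptions: the true relationship is y = 2x, the training outputs carry noise of +1 or -1, and the three model types (constant, straight line, memorizer) are hypothetical stand-ins for "too simple", "about right", and "too complex".

```python
# Hypothetical data: true relationship y = 2x; training outputs have noise of +/-1.
train = [(0, 1), (1, 1), (2, 5), (3, 5), (4, 9), (5, 9)]
test = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]

def mse(model, data):
    """Mean squared error of a model (a function of x) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_constant(data):
    """Too simple (underfits): always predicts the mean training output."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Reasonable complexity: least-squares straight line."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(data):
    """Too flexible (overfits): returns the output of the closest training input."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]

# For each model: (training error, test error).
errors = {
    name: (mse(model, train), mse(model, test))
    for name, model in [
        ("constant", fit_constant(train)),
        ("line", fit_line(train)),
        ("memorizer", fit_memorizer(train)),
    ]
}
```

The memorizer reaches zero training error by reproducing the noise, yet its test error is worse than the line's; the constant model is poor on both sets because it cannot represent the trend at all.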

Data Security: You need to make sure that every framework, every third-party app, and every
piece of your IT infrastructure is properly secured against diverse cyber threats. Bear
in mind, too, that your employees and coworkers can be a source of the problem. Another data
security-related problem is fake data, which arises when hackers attack your company
and replace your real data with fake information. The best way to avoid
such complications is to design encrypted authentication and validation
procedures so that users are verified before they can make any changes to the system
or the data it stores.

Accessibility: Many data science tools and platforms are available, but not all are accessible
to everyone. This can be a problem for people who are new to data science or who lack the resources
to invest in expensive tools. Additionally, some platforms may be more accessible than others, depending
on your skill set and experience.
Issues and Challenges in ML
Video Training Data: Today, the vast majority of machine learning models are trained on
static data, e.g., pictures and texts. Using dynamic data such as video to
“teach” machine learning algorithms remains a problem.

Object Detection: Although we know and understand how this technology works, object
detection (especially of moving objects) is still quite a challenge, and many algorithms
struggle with it. Of course, our solutions are getting better and better at it, but there’s
still a lot to achieve.
