
A Training Report On

Machine Learning
The training was conducted during the period (22/06/2020 to 05/10/2020) given under
the seal of

Eduonix

Submitted BY

(Rajat Saxena)
(1747910047)

(Computer Science & Engineering)


Under the Guidance of
(Mr.Ankur Bhatnagar)

Department of Computer Science & Engineering


Rajshree Institute of Management & Technology
National Highway 30, Pilibhit Road, Bareilly (UP)
ACKNOWLEDGEMENTS

It is always a pleasure to remember the fine people in the Engineering
workshops for the sincere guidance I received to uphold my theoretical
skills in Engineering. I would like to express my sincere gratitude to
Mr. Rajesh Sharma, for teaching me this toughest of subjects in the
easiest way. I would also like to thank Mr. Ankur Bhatnagar for providing
his invaluable guidance and suggestions throughout the course and for
motivating me to work harder. Finally, I would like to thank my dear
friends for helping me to complete my project.

Rajat Saxena
(1747910008)
CERTIFICATE
PREFACE

In this internship report I describe my experiences during my internship
period. The report contains an overview of the internship company and the
activities, tasks and projects that I worked on during my internship. In
writing this report, I also describe and reflect on the learning objectives
and personal goals that I set during my internship period. In compiling this
report I have intended to provide a synthesis of theoretical approaches and
methods of implementing them in the world of business. I have tried to
discover the relationship between theoretical and practical knowledge, and
to bridge the gap between theoretical assumptions and practical necessities.
During the entire course of our academic study we remain engaged in
theoretical learning, where the primary objective is academic success. A
sound knowledge of the modern business arena can only be attained through
the pragmatic implementation of the theoretical ideas that we learn from our
academic activities.
Declaration given by the student

The machine learning field, which can be briefly defined as
enabling computers to make successful predictions using past
experiences, has developed impressively in recent years, helped
by the rapid increase in the storage capacity and processing
power of computers. Together with many other disciplines,
machine learning methods have been widely employed in
bioinformatics. The difficulties and cost of biological analyses
have led to the development of sophisticated machine learning
approaches for this application area. In this course, we first
review the fundamental concepts of machine learning, such as
ML paradigms (i.e., supervised and unsupervised learning) and
types of classification. Then, I focus on one main subtopic of
machine learning, i.e., logistic regression, and perform a small
project on it.

Rajat Saxena
Organization Introduction

Who we are

Eduonix Learning Solutions is a premier training and skill development organization which was
started with a vision to bring world class training content, pedagogy and best learning practices to
everyone's doorstep. Eduonix aims to identify and provide the best learning and training
environment. It identifies industry veterans and content creators around the globe and brings their
content to a global audience using a number of intuitive platforms for easy and affordable access to
quality content. Eduonix offers easy to understand online courses and workshops for everyday people.
If you have ever wanted to learn a new skill, but don't want to attend four years of college to do it,
we have a solution for you.

Company Vision

To bring quality skill building content and world class learning experience to everyone using both
online and offline mediums. To add fun and joy back to learning.

Company mission

- To innovate and bring better learning experience for all our students

- To partner with both industry and academia to bridge the gap between Industry and Universities

- To consistently enhance our content portfolio and provide amazing consumer experience

- To encourage ideas and talent across industries

We offer training and skill building courses across Technology, Design, Management, Science
and Humanities. We have taught over 120000 students globally. We aim to touch millions of
lives and bring them the joy of knowledge.
Table of contents

Cover page
Acknowledgements
Certificate
Preface
Declaration given by the student
Organization Introduction

Chapter 1: INTRODUCTION TO MACHINE LEARNING

1.1 What is Machine Learning?
1.2 History of Machine Learning
1.3 How does Machine Learning work?
1.4 Steps to solve a problem in Machine Learning

Chapter 2: TYPES OF MACHINE LEARNING

2.1 Supervised Learning
2.1.1 Classification
2.1.2 Regression
2.2 Unsupervised Learning
2.2.1 Clustering
2.2.2 Association
2.3 Reinforcement Learning
2.3.1 Model-based
2.3.2 Model-free

Chapter 3: MULTIPLE LINEAR REGRESSION (MLR)

3.0 Definition
3.1 Formula and Calculation of Multiple Linear Regression
3.1.1 What Multiple Linear Regression (MLR) Can Tell You
3.1.2 Assumptions of the Multiple Regression Model
3.1.3 Graph
3.1.4 Data Visualization

Chapter 4: PROJECT REPORT

4.1 Overview
4.2 Dataset Description
4.3 Screenshot of Project Code and Output

CONCLUSION

BIBLIOGRAPHY
CHAPTER 1

INTRODUCTION TO MACHINE LEARNING

1.1 What is Machine Learning?

The term Machine Learning was first introduced by Arthur
Samuel (1959). We can define machine learning as:
“Machine Learning enables a machine to automatically learn from
data, improve performance from experience and predict things
without being explicitly programmed.”

Machine learning (ML) is the study of computer algorithms
that improve automatically through experience. It is seen as a
subset of artificial intelligence. Machine learning algorithms
build a model based on sample data, known as "training
data", in order to make predictions or decisions without
being explicitly programmed to do so.

Machine learning is the process of teaching a computer
system how to make accurate predictions when fed data.
Those predictions could be answering whether a piece of
fruit in a photo is a banana or an apple, spotting people
crossing the road in front of a self-driving car, whether the
use of the word book in a sentence relates to a paperback or a
hotel reservation, whether an email is spam, or recognizing
speech accurately enough to generate captions for a
YouTube video.

1.2 History of Machine Learning.

The name machine learning was coined in 1959 by Arthur Samuel. Tom M.
Mitchell provided a widely quoted, more formal definition of
the algorithms studied in the machine learning field: "A
computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P
if its performance at tasks in T, as measured by P, improves
with experience E."
This follows Alan Turing's proposal in his paper "Computing
Machinery and Intelligence", in which the question "Can
machines think?" is replaced with the question "Can
machines do what we (as thinking entities) can do?". In
Turing’s proposal, the characteristics that could be possessed
by a thinking machine and the various implications of
constructing one are set out.

1.3 How does Machine Learning work?

Machine learning is a form of artificial intelligence (AI) that
teaches computers to think in a similar way to how humans do:
learning and improving upon past experiences. It works by
exploring data and identifying patterns, and involves minimal
human intervention.

Almost any task that can be completed with a data-defined
pattern or set of rules can be automated with machine
learning. This allows companies to transform processes
that were previously only possible for humans to perform,
such as responding to customer service calls, bookkeeping,
and reviewing resumes.

Figure 1 Working of Machine Learning

As shown in Figure 1, the model takes past data as input
and is trained on that data using machine learning
algorithms. When new data is then given to the trained
model, it generates a predicted output.

1.4 Steps to solve a problem in Machine Learning

1) Data Collection
2) Data Preparation
3) Choose a Model
4) Train the Model
5) Evaluate the Model
6) Parameter Tuning
7) Make Predictions

1- Data Collection

• The quantity and quality of your data dictate how accurate
your model is
• The outcome of this step is generally a representation of
data (Guo simplifies this to specifying a table) which we will
use for training
• Using pre-collected data, by way of datasets from
Kaggle, UCI, etc., still fits into this step

2- Data Preparation

• Wrangle data and prepare it for training
• Clean that which may require it (remove duplicates,
correct errors, deal with missing values,
normalization, data type conversions, etc.)
• Randomize data, which erases the effects of the
particular order in which we collected and/or
otherwise prepared our data
• Visualize data to help detect relevant relationships
between variables or class imbalances (bias alert!), or
perform other exploratory analysis
• Split into training and evaluation sets
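
The preparation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `prepare`, the row layout (a feature tuple plus a label) and the sample rows are made up for the example and are not taken from the training material.

```python
import random

def prepare(rows, eval_fraction=0.2, seed=0):
    """Clean, shuffle and split a list of (features, label) rows."""
    # Deal with missing values: here we simply drop incomplete rows.
    complete = [
        (tuple(x), y) for x, y in rows
        if y is not None and all(v is not None for v in x)
    ]
    # Remove exact duplicates, keeping the first occurrence of each row.
    unique = list(dict.fromkeys(complete))
    # Randomize the order so the collection order cannot affect training.
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out the last part of the shuffled data as the evaluation set.
    n_eval = max(1, round(len(unique) * eval_fraction))
    return unique[:-n_eval], unique[-n_eval:]

# Made-up rows: one duplicate and one row with a missing value.
rows = [
    ((1.0, 2.0), 0), ((1.0, 2.0), 0),
    ((3.0, None), 1),
    ((2.0, 1.0), 1), ((4.0, 0.5), 0),
    ((0.5, 3.0), 1), ((5.0, 5.0), 0),
]
train_rows, eval_rows = prepare(rows)
```

Dropping incomplete rows is only one way to deal with missing values; filling them in with a typical value is another common choice.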

3- Choose a Model

• Different algorithms are for different tasks; choose the right one

4- Train the Model

• The goal of training is to answer a question or make a
prediction correctly as often as possible
• Linear regression example: algorithm would need to
learn values for m (or W) and b (x is input, y is output)
• Each iteration of process is a training step
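
As a rough illustration of the linear regression example, the sketch below learns w and b by gradient descent on the mean squared error. The function name `train_linear`, the learning rate, the number of steps and the sample data are assumptions chosen for the example.

```python
def train_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):  # each iteration is one training step
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move both parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up data generated from y = 2x + 1, so training should recover
# values of w and b close to 2 and 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear(xs, ys)
```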

5- Evaluate the Model

• Uses some metric or combination of metrics to
"measure" objective performance of model
• Test the model against previously unseen data
• This unseen data is meant to be somewhat
representative of model performance in the real world,
but still helps tune the model (as opposed to test data,
which does not)
• Good train/eval split? 80/20, 70/30, or similar,
depending on domain, data availability, dataset
particulars, etc.
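
Two commonly used metrics and an 80/20 split can be sketched as follows. The helper names (`split`, `mean_squared_error`, `accuracy`) are illustrative, and the split assumes the data has already been shuffled.

```python
def split(data, eval_fraction=0.2):
    """Split already-shuffled data into training and evaluation parts."""
    cut = len(data) - int(len(data) * eval_fraction)
    return data[:cut], data[cut:]

def mean_squared_error(y_true, y_pred):
    """Average squared difference; a typical metric for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of correct predictions; a typical metric for classification."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# An 80/20 split of ten items leaves eight for training, two for evaluation.
train_part, eval_part = split(list(range(10)))
```

Which metric is appropriate depends on the task: accuracy only makes sense for class labels, while mean squared error applies to continuous predictions.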

6- Parameter Tuning

• This step refers to hyperparameter tuning, which is an
"art form" as opposed to a science
• Tune model parameters for improved performance
• Simple model hyperparameters may include: number of
training steps, learning rate, initialization values and
distribution, etc.
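
One simple way to tune such hyperparameters is a grid search: train once per combination and keep the combination that scores best on the evaluation set. The sketch below does this for the learning rate and the number of training steps of a one-feature linear model. The candidate values, helper names and data are assumptions made for the illustration.

```python
def fit(xs, ys, learning_rate, steps):
    """Gradient-descent fit of y = w*x + b (same idea as in step 4)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w, b = w - learning_rate * gw, b - learning_rate * gb
    return w, b

def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, evaluation, learning_rates, step_counts):
    """Try every combination and keep the lowest evaluation error."""
    best = None
    for lr in learning_rates:
        for steps in step_counts:
            w, b = fit(*train, lr, steps)
            score = mse(w, b, *evaluation)
            if best is None or score < best[0]:
                best = (score, lr, steps)
    return best

# Made-up data generated from y = 2x + 1.
train = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
evaluation = ([4.0, 5.0], [9.0, 11.0])
best_score, best_lr, best_steps = grid_search(
    train, evaluation, learning_rates=[0.001, 0.01, 0.1], step_counts=[50, 500]
)
```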

7- Make Predictions

• Further (test set) data, which have been withheld from
the model until this point (and for which class labels
are known), are used to test the model; this gives a
better approximation of how the model will perform in
the real world.
CHAPTER 2

TYPES OF MACHINE LEARNING

As with any method, there are different ways to train
machine learning algorithms, each with its own advantages
and disadvantages. To understand the pros and cons of each
type of machine learning, we must first look at what kind of
data they ingest. In ML, there are two kinds of data:
labelled data and unlabelled data.

Labelled data has both the input and output parameters in a
completely machine-readable pattern, but requires a lot of
human labour to label the data to begin with. Unlabelled
data only has one or none of the parameters in a
machine-readable form. This negates the need for human
labour but requires more complex solutions.

There are also some types of machine learning algorithms
that are used in very specific use-cases, but three main
methods are used:

• Supervised Learning

• Unsupervised Learning

• Reinforcement Learning

2.1 Supervised Learning

Supervised learning is one of the most basic types of
machine learning. In this type, the machine learning
algorithm is trained on labelled data. Even though the data
needs to be labelled accurately for this method to work,
supervised learning is extremely powerful when used in the
right circumstances.

In supervised learning, the ML algorithm is given a small
training dataset to work with. This training dataset is a
smaller part of the bigger dataset and serves to give the
algorithm a basic idea of the problem, solution, and data
points to be dealt with. The training dataset is also very
similar to the final dataset in its characteristics and provides
the algorithm with the labelled parameters required for the
problem.

The algorithm then finds relationships between the parameters
given, essentially learning how the variables in the dataset
relate to one another (a statistical association rather than
proof of cause and effect). At the end of the training, the
algorithm has an idea of how the data works and the
relationship between the input and the output.

This solution is then deployed for use with the final dataset.
If the deployed model keeps being retrained on newly labelled
data, in the same way as on the training dataset, it can
continue to improve after deployment, discovering new
patterns and relationships.

Supervised learning can be split into two
subcategories: classification and regression.

2.1.1 Classification

During training, a classification algorithm is given data
points with an assigned category. The job of a classification
algorithm is then to take an input value and assign it a class,
or category, that it fits into based on the training data
provided.

The most common example of classification is determining
whether an email is spam or not. With two classes to choose from
(spam, or not spam), this problem is called a binary
classification problem. The algorithm is given training
data containing both spam and non-spam emails. The model
finds the features within the data that correlate with either
class and creates a mapping function Y = f(x) from an email's
features x to its class Y. Then, when provided with an unseen
email, the model uses this function to determine whether or
not the email is spam.

Classification problems can be solved with numerous
algorithms. Which algorithm you choose depends on the data
and the situation. Here are a few popular classification
algorithms:

• Linear Classifiers

• Support Vector Machines

• Decision Trees

• K-Nearest Neighbour

• Random Forest
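
To make the spam example concrete, here is a minimal sketch of one of the listed algorithms, K-Nearest Neighbour, in plain Python. The two features per email (number of links, number of promotional words) and the training points are invented for the illustration; a real spam filter would use far richer features.

```python
from collections import Counter
import math

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `training` is a list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance and keep the k closest.
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    # The most frequent label among those neighbours is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (number of links, number of promotional words).
training = [
    ((0, 0), "not spam"), ((1, 0), "not spam"), ((0, 1), "not spam"),
    ((5, 4), "spam"), ((6, 3), "spam"), ((4, 5), "spam"),
]
prediction = knn_predict(training, (5, 5))
```

Here the function f in Y = f(x) is not an explicit formula; the stored training points themselves act as the model.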

2.1.2 Regression

Regression is a predictive statistical process where the model
attempts to find the important relationship between
dependent and independent variables. The goal of a
regression algorithm is to predict a continuous number such
as sales, income, or test scores. The equation for basic
linear regression can be written as:

Y = w[0]*x[0] + w[1]*x[1] + ... + w[i]*x[i] + b        (2.1)

where x[i] are the features of the data and w[i] and b
are parameters which are learned during training. For
simple linear regression models with only one feature in the
data, the formula looks like this:

Y' = w*x + b        (2.2)

where w is the slope, x is the single feature and b is the
y-intercept. For simple regression problems such as this, the
model's predictions are represented by the line of best fit. For
models using two features, a plane is used. Finally,
for a model using more than two features, a hyperplane is
used.
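
For the single-feature case, the line of best fit can be computed directly with the standard least-squares formulas: w is the covariance of x and y divided by the variance of x, and b is mean(y) - w * mean(x). The sketch below uses made-up data, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope w and intercept b for y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (proportional to its variance).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

def predict(w, b, x):
    """Prediction of the fitted line for a new input x."""
    return w * x + b

# Made-up points lying exactly on y = 2x + 1.
w, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```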

There are many different types of regression algorithms. The
three most common are listed below:
• Linear Regression
• Logistic Regression (despite its name, mainly used for classification)
• Polynomial Regression

2.2 Unsupervised Learning

Unsupervised learning is a branch of machine learning that is
used to uncover underlying patterns in data and is often used
in exploratory data analysis.
Unsupervised learning does not use labelled data, but instead
focuses on the data’s features. Labelled training data has a
corresponding output for each input. When using
unsupervised learning, we are not concerned with the
targeted outputs because making predictions is not the
desired outcome of unsupervised learning algorithms.
Supervised learning is concerned with labelled data in order
to make predictions, but unsupervised learning is not.

The goal of unsupervised learning algorithms is to analyze
data and find important features. Unsupervised learning will
often find subgroups or hidden patterns within the dataset
that a human observer may not pick up on.

Unsupervised learning can be split into two subcategories:
clustering and association.

2.2.1 Clustering

Clustering is an important concept when it comes to
unsupervised learning. It mainly deals with finding a
structure or pattern in a collection of uncategorized data.
Clustering algorithms will process your data and find natural
clusters (groups) if they exist in the data. You can also
modify how many clusters your algorithm should identify,
which allows you to adjust the granularity of these groups.

There are different types of clustering you can utilize:

1. Exclusive (partitioning)

In this clustering method, data are grouped in such a way
that one data point can belong to one cluster only.
Example: K-means

2. Agglomerative

In this clustering technique, every data point starts as its
own cluster. Iterative unions between the two nearest
clusters then reduce the number of clusters.
Example: Hierarchical clustering

3. Overlapping

In this technique, fuzzy sets are used to cluster data. Each
point may belong to two or more clusters with separate
degrees of membership. Here, each data point is associated
with an appropriate membership value.
Example: Fuzzy C-Means

4. Probabilistic

This technique uses probability distributions to create the
clusters.
Example: the following keywords
• "man's shoe"
• "women's shoe"
• "women's glove"
• "man's glove"
can be clustered into two categories, "shoe" and "glove" or
"man" and "women".

Algorithms commonly listed alongside clustering are given
below. Note that K-NN is strictly a supervised method, and
the last three are dimensionality-reduction techniques
rather than clustering algorithms:
• Hierarchical clustering
• K-means clustering
• K-NN (k nearest neighbours)
• Principal Component Analysis
• Singular Value Decomposition
• Independent Component Analysis
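
As an illustration of exclusive (partitioning) clustering, the following is a minimal sketch of K-means in plain Python. To keep the result reproducible, the starting centroids are passed in by the caller rather than chosen at random, which is a simplification. The points and names are made up for the example.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain K-means with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two visibly separate groups of made-up 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Changing the number of starting centroids changes how many clusters are found, which is the granularity control mentioned above.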
2.2.2 Association

Association rules allow you to establish associations
amongst data objects inside large databases. This
unsupervised technique is about discovering interesting
relationships between variables in large databases. For
example, people who buy a new home are most likely to buy
new furniture.

Association rule mining is unsupervised learning: the
algorithm tries to learn without a teacher, as the data are not
labelled. It is a descriptive rather than a predictive method,
generally used to discover interesting relationships hidden in
large datasets. The relationships are usually represented in
the form of rules or frequent item sets.

Association rule mining is used to identify new and
interesting insights between different objects in a set, or
frequent patterns in transactional data or any sort of relational
database. It is commonly used for Market Basket
Analysis (which items are bought together), customer
clustering in retail (which stores people tend to visit
together), price bundling, assortment decisions, cross-selling
and others. It can be considered an advanced form of a
what-if scenario: if this, then that.
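
Two standard measures of an association rule are support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). A small sketch on made-up market-basket data:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
rule_support = support(transactions, {"bread", "butter"})
rule_confidence = confidence(transactions, {"bread"}, {"butter"})
```

For the rule "if bread, then butter", two of the four baskets contain both items, and two of the three baskets with bread also contain butter.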

Figure 2 Reinforcement Learning


2.3 Reinforcement Learning

Reinforcement learning is the training of machine
learning models to make a sequence of decisions.
The agent learns to achieve a goal in an uncertain,
potentially complex environment. In reinforcement
learning, an artificial intelligence faces a game-like
situation.

Reinforcement learning is an area of machine
learning. It is about taking suitable actions to
maximize reward in a particular situation. It is
employed by various software and machines to find
the best possible behaviour or path to take in a
specific situation. Reinforcement learning differs
from supervised learning in that, in supervised
learning, the training data comes with the answer
key, so the model is trained with the correct
answers, whereas in reinforcement learning there is
no answer key and the reinforcement agent decides
what to do to perform the given task. In the
absence of a training dataset, it is bound to learn
from its experience.

Environment: Physical world in which the agent operates

State: Current situation of the agent

Reward: Feedback from the environment

Policy: Method to map the agent’s state to actions

Value: Future reward that an agent would receive by
taking an action in a particular state

There are two main types of RL algorithms:
model-based and model-free.
2.3.1 Model-based

Model-based reinforcement learning refers to
learning optimal behaviour indirectly by learning a
model of the environment, by taking actions and
observing the outcomes, which include the next state
and the immediate reward. The models predict the
outcomes of actions and are used in lieu of, or in
addition to, interaction with the environment to learn
optimal policies.

2.3.2 Model-free
A model-free algorithm (as opposed to a model-based
one) is an algorithm which does not use the
transition probability distribution (and the reward
function) associated with the Markov decision
process (MDP), which, in RL, represents the
problem to be solved. The transition probability
distribution (or transition model) and the reward
function are often collectively called the "model" of
the environment (or MDP), hence the name
"model-free". A model-free RL algorithm can be thought
of as an "explicit" trial-and-error algorithm. An example
of a model-free algorithm is Q-learning.

Many different algorithms are used in
reinforcement learning. Two commonly used
algorithms are Q-learning and SARSA.

• Q-learning

Q-learning is an off-policy reinforcement learning
algorithm that seeks to find the best action to take
given the current state. It is considered off-policy
because the Q-learning function learns from actions
that are outside the current policy, like taking random
actions, and therefore does not need to follow the
policy it is learning. More specifically, Q-learning
seeks to learn a policy that maximizes the total reward.
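
A single Q-learning update can be sketched as below. The Q-table is kept in a dictionary keyed by (state, action); the learning rate `alpha`, discount factor `gamma`, and the state and action names are assumptions introduced only for this illustration.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one Q-learning step to the table q[(state, action)]."""
    # Off-policy target: use the best value reachable from the next
    # state, whichever action the agent actually takes next.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

actions = ["left", "right"]
# Hypothetical values already learned for state "s1".
q = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
new_value = q_learning_update(q, "s0", "right", reward=1.0,
                              next_state="s1", actions=actions)
```

With these numbers the target is 1.0 + 0.9 * 2.0 = 2.8, and the stored value moves halfway from 0 towards it.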

• SARSA (State-Action-Reward-State-Action)
SARSA is a slight variation of the popular
Q-learning algorithm. For a learning agent in any
reinforcement learning algorithm, its policy can be
of two types:
1. On-policy: the learning agent learns the value
function according to the current action derived from the
policy currently being used.
2. Off-policy: the learning agent learns the value
function according to the action derived from another
policy.
SARSA is an on-policy algorithm. Its update depends on the
current state, current action, reward obtained, next state
and next action, and this observation led to the naming of
the learning technique as SARSA.
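
The five quantities in the name map directly onto the arguments of one SARSA update, sketched below. As before, the dictionary-based table, `alpha`, `gamma`, and the state and action names are assumptions made for the illustration.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """Apply one SARSA step to the table q[(state, action)]."""
    old = q.get((state, action), 0.0)
    # On-policy target: use the value of the action the agent has
    # actually chosen for the next state, not the best available one.
    next_value = q.get((next_state, next_action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * next_value - old)
    return q[(state, action)]

# Hypothetical values already learned for state "s1".
q = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
new_value = sarsa_update(q, "s0", "right", reward=1.0,
                         next_state="s1", next_action="left")
```

Because the agent's next action is "left" (value 1.0), the target is 1.0 + 0.9 * 1.0 = 1.9, whereas an off-policy update would have used the larger value of "right".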
CHAPTER 3

Multiple Linear Regression (MLR)

3.0 DEFINITION

Multiple linear regression (MLR), also known simply as multiple
regression, is a statistical technique that uses several explanatory
variables to predict the outcome of a response variable. The goal
of multiple linear regression (MLR) is to model the linear
relationship between the explanatory (independent) variables and
the response (dependent) variable.

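3.1 Formula and Calculation of Multiple Linear Regression

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$$

where, for each observation $i$:
$y_i$ = dependent variable
$x_{i1}, \dots, x_{ip}$ = explanatory variables
$\beta_0$ = y-intercept (constant term)
$\beta_1, \dots, \beta_p$ = slope coefficients for each explanatory variable
$\epsilon$ = the model’s error term (also known as the residuals)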
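
One standard way to calculate the coefficients is ordinary least squares via the normal equations (XᵀX)β = Xᵀy. The sketch below solves them in plain Python with Gaussian elimination. It is an illustration only, not the code used in the project: the feature values (area in thousands of square feet, bedrooms, age) and the coefficients that generate the prices are made up, and no error term is added so that the fit can be checked exactly.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from all rows below.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_mlr(features, targets):
    """Return [b0, b1, ..., bp] from the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # leading 1 gives b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)

# Made-up observations: (area, bedrooms, age).
features = [(1.0, 2, 10), (1.5, 3, 5), (2.0, 3, 20),
            (1.2, 2, 2), (1.8, 4, 15), (0.9, 1, 30)]
# Prices generated from known coefficients, without an error term.
targets = [50 + 100 * a + 20 * b - 1.5 * g for a, b, g in features]
beta = fit_mlr(features, targets)
```

In practice a numerical library would normally be used for this step; the hand-written solver is here only to show what the calculation involves.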
3.1.1 What Multiple Linear Regression (MLR) Can Tell You

Simple linear regression is a function that allows an analyst or
statistician to make predictions about one variable based on
the information that is known about another variable.
Linear regression can only be used when one has two
continuous variables: an independent variable and a
dependent variable.
The independent variable is the parameter that is used to
calculate the dependent variable or outcome. A multiple
regression model extends this to several explanatory variables.

3.1.2 The multiple regression model is based on the following
assumptions:

• There is a linear relationship between the dependent variable and the
independent variables.
• The independent variables are not too highly correlated with each other.
• The yi observations are selected independently and randomly from the
population.
• Residuals should be normally distributed with a mean of 0 and a constant
variance σ².
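
The second assumption can be checked roughly by computing the Pearson correlation between each pair of independent variables. The sketch below is only an illustration: the sample values are made up, and the 0.8 cut-off is a rule of thumb assumed for the example, not a value given in the training material.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up columns for two independent variables.
area = [1.0, 1.5, 2.0, 1.2, 1.8, 0.9]
bedrooms = [2, 3, 3, 2, 4, 1]

# Assumed rule of thumb: flag the pair if |r| exceeds 0.8.
too_correlated = abs(pearson(area, bedrooms)) > 0.8
```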

3.1.3 Graph
Fig. 1.1.0 Multiple Linear Regression


3.1.4 Data visualization

Data visualization is the practice of translating information into a visual
context, such as a map or graph, to make data easier for the human brain to
understand and pull insights from. The main goal of data visualization is to
make it easier to identify patterns.

In binary classification, a dependent variable has only two possible types,
either 1 or 0 (or, with more types, 0, 1, 2, 3). For example, these values
may represent success or failure, yes or no, win or loss, etc.

Fig. 1.2.0 Data visualization of Multiple Linear Regression


CHAPTER 4

PROJECT REPORT

4.1 Overview

A dataset related to customer preference is given, described by
area, age and number of bedrooms. This project estimates the
correct price from these features and predicts the price for
future (new) entries.

4.2 Dataset Description

The dataset consists of anonymous information such as area, bedrooms,
age and price.

Fig. 1.3.0: Dataset

4.3 Screenshot of Project Code and Output

Project code and output with screenshots
CONCLUSION

This training has introduced us to Machine Learning. We now know that Machine
Learning is a technique of training machines to perform the activities a human brain can
do, albeit a bit faster and better than an average human being. Today we have seen that
machines can beat human champions in games such as Chess and Mahjong, which are
considered very complex. We have seen that machines can be trained to perform human
activities in several areas and can aid humans in living better lives. Machine learning is
a quickly growing field in computer science. It has applications in nearly every other field
of study and is already being implemented commercially because machine learning can
solve problems too difficult or time consuming for humans to solve. To describe machine
learning in general terms, a variety of models are used to learn patterns in data and make
accurate predictions based on the patterns they observe. Machine Learning can be
Supervised or Unsupervised. If we have a smaller amount of data that is clearly labelled
for training, we opt for Supervised Learning. Unsupervised Learning is generally
better suited to large data sets for which labels are not available. If we have a huge
data set easily available, we go for deep learning techniques. We have also learned about
Reinforcement Learning and Deep Reinforcement Learning. We know what Neural Networks
are, their applications and limitations. Specifically, we have developed a thought process
for approaching problems that machine learning works so well at solving. We have learned
how machine learning is different from descriptive statistics. Finally, when it comes to
the development of machine learning models of our own, we looked at the choices of
various development languages, IDEs and platforms. The next thing that we need to do is
start learning and practicing each machine learning technique. The subject is vast,
meaning that there is width, but if we consider the depth, each topic can be learned in a
few hours. Each topic is independent of the others. We need to take into consideration one
topic at a time, learn it, practice it and implement the algorithm(s) in it using a language
of our choice. This is the best way to start studying Machine Learning. Practicing one
topic at a time, very soon we can acquire the width that is eventually required of a
Machine Learning expert.
BIBLIOGRAPHY

[1] www.tutorialspoint.com
[2] en.wikipedia.org
[3] www.geeksforgeeks.org
[4] towardsdatascience.com
[5] www.potentiaco.com
[6] www.sciencedirect.com
[7] www.guru99.com
[8] www.google.com
