
SUMMER TRAINING REPORT

ON
MACHINE LEARNING

SUBMITTED IN PARTIAL FULFILMENT OF THE

REQUIREMENT FOR THE DEGREE OF B. TECH

At

Delhi Institute of Tool Engineering


Okhla-I, New Delhi

(Affiliated to Guru Gobind Singh Indraprastha University)

Department of Mechatronics Engineering


(2014-2018)

Submitted To:
Mrs. Charu Gaur
Assistant Professor
DITE

Submitted By:
Vishal Kumar
MET (VII Sem)
07270211214

ACKNOWLEDGEMENT

It is a matter of great pleasure for me to submit this Summer Training


Report on MACHINE LEARNING, as a part of curriculum for award of
BACHELOR’S IN TECHNOLOGY (MET) degree of DITE (Affiliated to
GGSIPU), Delhi.

It gives me an immense pleasure in acknowledging the effort of entire


technical and non-technical staff of CODING NINJAS, Delhi, for giving
me their valuable time and full cooperation for undertaking this Practical
Summer Training Program at their center. I am indebted to the members
of the department for their wholehearted cooperation and for their
extended support in the use of available resources.

I would especially like to thank Mr. Ankush Singla, Project Mentor &
Faculty Head, without whose guidance and support this training would
not have been possible. Their encouragement and experience helped me
to realize the practical aspect of programming. They gave me ample
support and help for the accomplishment of my project. I feel grateful to
them for giving me the opportunity to have practical experience in
this field. Their knowledge and immense work experience helped me a
lot in making this six-week Practical Summer Training Program a great
learning experience.

VISHAL KUMAR
ABSTRACT

Present-day computer applications require the representation of huge
amounts of complex knowledge and data in programs and thus require a
tremendous amount of work. Our ability to code computers falls short
of the demand for applications. If computers are endowed with the
ability to learn, then our burden of coding the machine is eased (or at
least reduced). This is particularly true for developing expert systems,
where the bottleneck is extracting the expert's knowledge and feeding
that knowledge to computers. Present-day computer programs in
general (with the exception of some Machine Learning programs)
cannot correct their own errors, improve from past mistakes, or learn
to perform a new task by analogy to a previously seen task. In contrast,
human beings are capable of all of the above. Machine Learning aims to
produce smarter computers capable of the same intelligent
behavior.

The area of Machine Learning deals with the design of programs that
can learn rules from data, adapt to changes, and improve performance
with experience. In addition to being one of the initial dreams of
Computer Science, Machine Learning has become crucial as
computers are expected to solve increasingly complex problems and
become more integrated into our daily lives. This is a hard problem, since
making a machine learn from its computational tasks requires work at
several levels, and complexities and ambiguities arise at each of those
levels.

So, here we study how machine learning takes place and what its
methods are, discuss the various projects implemented during training,
and review the applications and the present and future status of machine learning.
TABLE OF CONTENTS

Introduction to Machine Learning
Architecture of Machine Learning Model
Classification of Machine Learning
Types of Machine Learning Algorithms
Neural Networks
Reinforcement Learning
Python Machine Learning Packages
Projects
Future of Machine Learning
References

Introduction to Machine Learning

Machine learning is the science of getting computers to act without being


explicitly programmed. In the past decade, machine learning has given us
self-driving cars, practical speech recognition, effective web search, and a
vastly improved understanding of the human genome. Machine learning is
so pervasive today that you probably use it dozens of times a day without
knowing it. Many researchers also think it is the best way to make progress
towards human-level AI.

General Definition: The ability of a machine to improve its own
performance through the use of software that employs artificial
intelligence techniques to mimic the ways by which humans seem to
learn, such as repetition and experience.

ML Definition by Tom M. Mitchell: A computer program is said to


learn from experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T, as measured
by P, improves with experience E.

Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) which is
concerned with developing computational theories of learning and building
learning machines. The goal of machine learning, closely coupled with the
goal of AI, is to achieve a thorough understanding about the nature of
learning process (both human learning and other forms of learning), about
the computational aspects of learning behaviors, and to implant the
learning capability in computer systems. Machine learning has been
recognized as central to the success of Artificial Intelligence, and it has
applications in various areas of science, engineering and society.

Learning?
Learning is a phenomenon and process which has manifestations of various
aspects. Roughly speaking, learning process includes (one or more of) the
following:

1) Acquisition of new (symbolic) knowledge

2) Development of cognitive skills through instruction and practice.

3) Refinement and organization of knowledge into more effective


representations or more useful form

4) Discovery of new facts and theories through observation and experiment

The general effect of learning in a system is the improvement of the
system’s capability to solve problems. It is hard to imagine that a system
capable of learning could fail to improve its problem-solving performance. A
system with learning capability should be able to change itself in order
to perform better in its future problem solving.

We also note that learning cannot take place in isolation: We typically


learn something (knowledge K) to perform some tasks (T), through some
experience E, and whether we have learned well or not will be judged by
some performance criteria P at the task T.

There are various forms of improvement of a system’s problem-solving


ability:

1) To solve a wider range of problems than before and perform
generalization.

2) To solve the same problem more effectively and give better quality
solutions.

3) To solve the same problem more efficiently and faster.


The Goals of Machine Learning.


The goal of ML, in simple words, is to understand the nature of (human
and other forms of) learning, and to build learning capability in computers.
To be more specific, there are three aspects of the goals of ML.

1) To make the computers smarter, more intelligent. The more direct


objective in this aspect is to develop systems (programs) for specific
practical learning tasks in application domains.

2) To develop computational models of human learning process and


perform computer simulations. The study in this aspect is also called
cognitive modeling.

3) To explore new learning methods and develop general learning


algorithms independent of applications.

Why are the goals of ML important and desirable?


The present day computer programs in general (with the exception of some
ML programs) cannot correct their own errors or improve from past
mistakes, or learn to perform a new task by analogy to a previously seen
task. In contrast, human beings are capable of all the above. ML will
produce smarter computers capable of all the above intelligent behavior.

It is clear that central to our intelligence is our ability to learn. Thus a


thorough understanding of human learning process is crucial to understand
human intelligence. ML will give us insight into the underlying
principles of human learning and that may lead to the discovery of more
effective education techniques. It will also contribute to the design of
machine learning systems.
Architecture of Machine Learning Model

If we go into the details of the machine learning process, we first identify,
choose and get the data that we want to work with. The data with which we
start is raw and unstructured; it is almost never in the form needed for
actual processing. It could have duplicate data, or data that is missing, or
else a lot of extra data that is not needed. The data could come from
various sources, which may also end up producing duplicate or
redundant data. This is where the requirement for pre-processing the data
arises, so that the learning process can understand the data. The
good thing is that machine learning products usually provide
data pre-processing modules to process raw or unstructured data.
So, in order to apply the actual algorithm to the data, we need to turn the
unstructured data into structured and shaped data, for which a
process of pre-massaging is required, through which the data is passed.
Finally, we get a candidate copy of the data, which can be processed
through the algorithm to get the actual golden copy.
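
As a minimal sketch of the pre-processing step described above, the following Python snippet drops records with missing values and exact duplicates from a small made-up data set. The record layout and the function name `clean_records` are illustrative assumptions and do not come from any particular machine learning product.

```python
def clean_records(records):
    """Drop records with missing values, then drop exact duplicates.

    Records are assumed to be tuples; the original order is preserved.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records that contain a missing (None) field.
        if any(value is None for value in rec):
            continue
        # Skip records that have already been kept once.
        if rec in seen:
            continue
        seen.add(rec)
        cleaned.append(rec)
    return cleaned


# Hypothetical raw data: (feature 1, feature 2, label)
raw = [
    (5.1, 3.5, "A"),
    (5.1, 3.5, "A"),   # duplicate record
    (4.9, None, "B"),  # record with a missing value
    (6.2, 2.9, "B"),
]
candidate = clean_records(raw)
print(candidate)
```

Real data usually needs more than this (for example, filling in missing values instead of dropping the record), so the snippet only illustrates the idea of turning raw data into a cleaner candidate copy.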

After the data is pre-processed, we get some good structured data, and
this data is now an input for machine learning. But is this a one-time job?
Of course not, the process has to be iterative, and it has to be iterative
until the data is available. In machine learning the major chunk of time is
spent in this process. That is, working on the data to make it structured,
clean, ready and available. Once the data is available, the algorithms
could be applied to the data. Machine learning products offer not only
pre-processing tools but also a large number of machine learning
algorithms. The result of applying an algorithm to the data is a model,
but now the question is whether this is the final model we needed.

No, it is the candidate model that we got. Candidate model means the first
most appropriate model that we get, but still it needs to be massaged. But
do we get only one candidate model? Of course not, since this is an
iterative process, we do not actually know what the best candidate model
is, until we again and again produce several candidate models through the
iterative process. We do it until we get the model that is good enough to
be deployed. Once the model is deployed, applications start making use
of it, so there is iteration at small levels and at the largest level as well.
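
The idea of producing several candidate models and keeping the one that is good enough can be sketched as follows. This is only an illustration under stated assumptions: the candidates are hypothetical straight-line predictors, and "best" is taken to mean the lowest mean squared error on a small made-up validation set.

```python
def mean_squared_error(model, data):
    """Average squared difference between predictions and targets."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def select_best(candidates, validation_data):
    """Return the name of the candidate with the lowest validation error."""
    return min(
        candidates,
        key=lambda name: mean_squared_error(candidates[name], validation_data),
    )


# Hypothetical validation data following y = 2x.
validation_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Hypothetical candidate models produced by successive iterations.
candidates = {
    "slope_1": lambda x: 1.0 * x,
    "slope_2": lambda x: 2.0 * x,
    "slope_3": lambda x: 3.0 * x,
}

best = select_best(candidates, validation_data)
print(best)  # slope_2
```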

We need to repeat the entire process again and again and re-create the
model at regular intervals. The reason for this is very
simple: the scenarios and factors change, and we need to keep
our model up to date and realistic all the time. This could eventually also
mean processing new data or applying new algorithms altogether.

Classification of Machine Learning Systems
There are some variations in how the types of machine learning
systems are defined, but commonly they can be divided into categories according to
their purpose. The main categories are the following:
Supervised Machine Learning: Supervised learning is a machine learning
technique for learning a function from training data. The training data
consist of pairs of input objects (typically vectors), and desired outputs.
The output of the function can be a continuous value (called regression),
or can predict a class label of the input object (called classification).

The task of the supervised learner is to predict the value of the function for
any valid input object after having seen a number of training examples (i.e.
pairs of input and target output). To achieve this, the learner has to
generalize from the presented data to unseen situations in a "reasonable"
way. “Supervised learning is a machine learning technique whereby the
algorithm is first presented with training data which consists of examples
which include both the inputs and the desired outputs; thus enabling it to
learn a function. The learner should then be able to generalize from the
presented data to unseen examples.” By Tom M. Mitchell
Unsupervised Machine Learning: Unsupervised learning is a type of
machine learning where manual labels of inputs are not used. It is
distinguished from supervised learning approaches which learn how to
perform a task, such as classification or regression, using a set of human
prepared examples. Unsupervised learning means we are only given the X
(Feature Vector) and some (ultimate) feedback function on our
performance. We simply have a training set of vectors without function
values of them. The problem in this case, typically, is to partition the
training set into subsets in some appropriate way. Input data is not labeled
and does not have a known result. A model is prepared by deducing
structures present in the input data. This may be to extract general rules. It
may be through a mathematical process to systematically reduce
redundancy, or it may be to organize data by similarity.
Semi-Supervised Learning: Semi-Supervised learning uses both labeled
and unlabeled data to perform an otherwise supervised learning or
unsupervised learning task. There is a desired prediction problem but the
model must learn the structures to organize the data as well as make
predictions. The goal is to learn a predictor that predicts future test data
better than the predictor learned from the labeled training data alone.
Semi-supervised learning finds applications in cognitive psychology as a
computational model for human learning. In human categorization and
concept forming, the environment provides unsupervised data (e.g., a child
watching surrounding objects by herself) in addition to labeled data from a
teacher (e.g., Dad points to an object and says “bird!”). There is evidence
that human beings can combine labeled and unlabeled data to facilitate
learning.

Reinforcement Learning: Reinforcement Learning is a type of Machine


Learning, and thereby also a branch of Artificial Intelligence. It allows
machines and software agents to automatically determine the ideal
behavior within a specific context, in order to maximize their performance.
Simple reward feedback is required for the agent to learn its behavior; this
is known as the reinforcement signal. Some applications of
reinforcement learning algorithms are computer-played board games
(Chess, Go), robotic hands, and self-driving cars.
Types of Machine Learning Algorithms

Machine learning comes in many different flavors, depending on the


algorithm and its objectives. You can divide machine learning algorithms
into three main groups based on their purpose:

1.) Supervised Learning Algorithms


2.) Unsupervised Learning Algorithms
3.) Reinforcement Learning Algorithms

Supervised Learning Algorithms: Supervised learning is where you have


input variables (x) and an output variable (Y) and you use an algorithm to
learn the mapping function from the input to the output.

Y = F(X)

The goal is to approximate the mapping function so well that when you
have new input data (x) you can predict the output variable (Y) for
that data.

Because we know the correct answers, the algorithm can iteratively make
predictions on the training data and be corrected by the teacher. Learning
stops when the algorithm achieves an acceptable level of performance.

Supervised learning problems can be further grouped into regression and


classification problems.

• Classification: A classification problem is when the output variable is a


category, such as “red” or “blue” or “disease” and “no disease”.
• Regression: A regression problem is when the output variable is a
real (continuous) value, such as “dollars” or “weight”.

Some popular examples of supervised machine learning algorithms are:

• Linear Regression: Linear regression is a linear model, i.e. a model that


assumes a linear relationship between the input variables (x) and the single
output variable (y). More specifically, that y can be calculated from a linear
combination of the input variables (x).
When there is a single input variable (x), the method is referred to as simple
linear regression. When there are multiple input variables, literature
from statistics often refers to the method as multiple linear regression.
To define the supervised learning problem more formally: given a
training set, the aim is to learn a function $h$ so that $h(x)$ is a good
predictor for the corresponding value of $y$. This function is called a hypothesis.
The next thing we need to decide while designing a learning algorithm is the
representation of the hypothesis function as a function of $x$.
Let us initially assume that the hypothesis function looks like this:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

Here, $\theta_0$ and $\theta_1$ are called parameters.

In linear regression, we have a training set and we want to come up with
values for the parameters $\theta_0$ and $\theta_1$ so that the straight line
we get out of $h_\theta(x)$ fits the data well.
Let's try to choose values for the parameters so that, given the $x$ in the
training set, we make reasonable predictions for the $y$ values. Formally, we
want to solve a minimization problem, that is, we want to minimize the
difference between $h_\theta(x)$ and $y$. To achieve that, we solve the following
problem:

$$\min_{\theta_0, \theta_1} \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Here, $m$ is the number of training examples. To make the math a little bit
easier, we put in a factor of $\frac{1}{2m}$; this scaling does not change the
parameter values at which the minimum is reached.

By convention, we define a cost function:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

This cost function is also called the squared error function.

The expression $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$ means that we
want to find the values of $\theta_0$ and $\theta_1$ so
that the cost function is minimized.
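
A minimal Python sketch of the squared error cost function is given here. It assumes the usual straight-line hypothesis, that is, an intercept parameter plus a slope parameter times x, and the conventional factor of 1/(2m), where m is the number of training examples. The function names and the small data set are made up for illustration.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line hypothesis: intercept theta0 plus slope theta1 times x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost with the conventional 1/(2m) factor."""
    m = len(xs)
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


# Hypothetical training set that lies exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(cost(0.0, 2.0, xs, ys))  # perfect fit, so the cost is 0.0
print(cost(0.0, 1.0, xs, ys))  # worse fit, so the cost is larger
```

The parameters that fit the data exactly give a cost of zero, and any other choice gives a strictly larger value, which is what makes the cost function a sensible quantity to minimize.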

Gradient Descent
Gradient descent is an algorithm that is used to minimize a function.
Gradient descent is used not only in linear regression; it is a more general
algorithm.
We will now see how the gradient descent algorithm is used to minimize
some arbitrary function, written here as $J(\theta_0, \theta_1)$, and, later on,
we will apply it to a cost function to determine its minimum.
We will start off with some initial guesses for the values of $\theta_0$ and
$\theta_1$ and then keep on changing the values according to the formula:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1), \quad j = 0, 1$$

Here, $\alpha$ is called the learning rate, and it determines how big a step is
taken when updating the parameters. The learning rate is always a
positive number.
We want to update $\theta_0$ and $\theta_1$ simultaneously, that is, calculate
the right-hand side of the above equation for both $j = 0$ and $j = 1$ and only
then update the values of the parameters to the newly calculated ones. This
process is repeated until convergence is achieved.
If $\alpha$ is too small, then we will end up taking tiny baby steps, which means a
lot of steps before we get anywhere near the global minimum. If $\alpha$ is
too large, then there is a possibility that we overshoot the minimum entirely;
the algorithm may fail to converge, or it can even diverge.
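
The update rule can be sketched in Python for the two-parameter linear regression case. This is an illustrative sketch only: it assumes the squared error cost with the 1/(2m) factor, whose partial derivatives are the mean error (for the intercept) and the mean of error times x (for the slope). The learning rate, iteration count and data are made-up values.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit intercept theta0 and slope theta1 by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0  # initial guesses
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the squared error cost.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old
        # parameter values before either parameter is changed.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical training set generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 4), round(theta1, 4))
```

With this data and learning rate the parameters settle close to an intercept of 1 and a slope of 2. A much larger learning rate would make the same loop diverge, as described above.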

• Logistic Regression: Logistic regression is used for a different class of


problems known as classification problems. Here the aim is to predict the
group to which the current object under observation belongs.
Classification is all about partitioning the data into groups based on
certain features. Logistic regression is one of the most popular machine
learning algorithms for binary classification. This is because it is a simple
algorithm that performs very well on a wide range of problems.
$$z = \Theta^T X$$

where Θ is the coefficient vector and X is the feature vector.

In logistic regression, a sigmoid (also known as logistic) function is
applied over the hypothesis function used in linear regression to bring
its output into the range (0, 1). The sigmoid function is as follows:

$$g(z) = \frac{1}{1 + e^{-z}}$$

[Figure: plot of the sigmoid function]

The output is transformed into a probability using the logistic function:

$$g(\Theta^T X) = P(y = 1 \mid X; \Theta)$$

and as y can take only the values 0 and 1, the probability of the other value
is 1 minus the hypothesis value.
With the above interpretation we can set the decision boundary
with the following rule: y = 1 if $g(\Theta^T X) \ge 0.5$,
else y = 0. Since $g(z) \ge 0.5$ exactly when $z \ge 0$, the condition
$g(\Theta^T X) \ge 0.5$ implies $\Theta^T X \ge 0$, and similarly for the
less-than condition.
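
The sigmoid function and the 0.5 decision rule can be sketched in a few lines of Python. The coefficient values and feature vectors below are hypothetical, and the first feature is assumed to be a constant 1 so that the first coefficient acts as an intercept.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Return class 1 if the estimated probability is at least 0.5, else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))  # z is the dot product of theta and x
    return 1 if sigmoid(z) >= 0.5 else 0


theta = [-1.0, 2.0]  # hypothetical coefficient vector

print(sigmoid(0.0))                # 0.5, the value at the decision boundary
print(predict(theta, [1.0, 1.0]))  # z = 1, so class 1
print(predict(theta, [1.0, 0.0]))  # z = -1, so class 0
```

Note that this plain version of `sigmoid` can overflow for very large negative z; it is kept simple here for readability.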

Cost function
With the modified hypothesis function, taking a squared error function
won't work, as it is no longer convex in nature and is tedious to minimize. We
take up a new form of cost function, which is as follows:

$$E(g(\Theta, X), y) = \begin{cases} -\log(g(\Theta, X)) & \text{if } y = 1 \\ -\log(1 - g(\Theta, X)) & \text{if } y = 0 \end{cases}$$

This can be written in a simpler form as:

$$E(g(\Theta, X), y) = -y \log(g(\Theta, X)) - (1 - y) \log(1 - g(\Theta, X))$$

and it is quite evident that it is equivalent to the above cost function. For
estimation of parameters, we take the mean of the cost function over all
points in the training data. So,

$$H(\Theta) = \frac{1}{n} \sum_{i=1}^{n} E(g(\Theta, X_i), y_i)$$

where n is the number of points in the training data.

For parameter estimation, we use an iterative method called gradient
descent that improves the parameters at each step and minimizes the
cost function $H(\Theta)$ as far as possible.
In gradient descent, we start with random parameter values and then
update them at each step to reduce the cost function by some
amount, until we reach a minimum or until there is
negligible change over a certain number of consecutive steps. The steps of
gradient descent go as follows:

$$\Theta_j := \Theta_j - p \, \frac{\partial H(\Theta)}{\partial \Theta_j}$$

for each component $\Theta_j$ of the coefficient vector, where p is the learning
rate at which we move along the slope of the curve to minimize the cost function.
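
The following sketch puts the logistic cost function and the gradient descent step together in plain Python. It is an illustration under stated assumptions: the cost is the mean of the log loss over all training points, its partial derivative for each coefficient is the mean of (prediction minus label) times the corresponding feature, and the data set, learning rate and number of steps are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_cost(theta, X, y):
    """Mean log loss over all training points."""
    total = 0.0
    for xi, yi in zip(X, y):
        g = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += -yi * math.log(g) - (1 - yi) * math.log(1 - g)
    return total / len(X)


def gradient_step(theta, X, y, rate):
    """One gradient descent update of every coefficient."""
    n = len(X)
    grads = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        g = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        for j, v in enumerate(xi):
            grads[j] += (g - yi) * v / n
    return [t - rate * gr for t, gr in zip(theta, grads)]


# Hypothetical data: first column is a constant 1 (intercept term).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

theta = [0.0, 0.0]
initial = mean_cost(theta, X, y)   # equals log(2) when all coefficients are 0
for _ in range(500):
    theta = gradient_step(theta, X, y, rate=0.1)
final = mean_cost(theta, X, y)
print(initial, final)
```

The cost after training is lower than the starting cost, which is the behaviour the update rule is meant to produce.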

• Naïve Bayes Classifier: The Naive Bayes Classifier technique is based on
Bayes' theorem and is particularly suited when the
dimensionality of the inputs is high. Despite its simplicity, Naive Bayes
can often outperform more sophisticated classification methods.

[Figure: scatter plot of objects labelled GREEN and RED]
To demonstrate the concept of Naïve Bayes Classification, consider the
example displayed in the illustration above. As indicated, the objects can
be classified as either GREEN or RED. Our task is to classify new cases as
they arrive, i.e., decide to which class label they belong, based on the
currently existing objects.
Since there are twice as many GREEN objects as RED, it is reasonable to
believe that a new case (which hasn't been observed yet) is twice as likely
to have membership GREEN rather than RED. In the Bayesian analysis,
this belief is known as the prior probability. Prior probabilities are based
on previous experience, in this case the percentage of GREEN and RED
objects, and are often used to predict outcomes before they actually happen.
Thus, we can write:

$$\text{Prior probability of GREEN} \propto \frac{\text{number of GREEN objects}}{\text{total number of objects}}$$

$$\text{Prior probability of RED} \propto \frac{\text{number of RED objects}}{\text{total number of objects}}$$

Since there is a total of 60 objects, 40 of which are GREEN and 20 RED,
our prior probabilities for class membership are:

$$\text{Prior probability of GREEN} \propto \frac{40}{60}, \qquad \text{Prior probability of RED} \propto \frac{20}{60}$$

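Having formulated our prior probability, we are now ready to classify a
new object X (WHITE circle). Since the objects are well clustered, it is
reasonable to assume that the more GREEN (or RED) objects there are in the
vicinity of X, the more likely it is that the new case belongs to that particular
color. To measure this likelihood, we draw a circle around X which
encompasses a number (to be chosen a priori) of points irrespective of their
class labels. Then we calculate the number of points in the circle belonging
to each class label. From this we calculate the likelihood:

$$\text{Likelihood of } X \text{ given GREEN} \propto \frac{\text{number of GREEN in the vicinity of } X}{\text{total number of GREEN cases}}$$

$$\text{Likelihood of } X \text{ given RED} \propto \frac{\text{number of RED in the vicinity of } X}{\text{total number of RED cases}}$$

From the illustration above, it is clear that the likelihood of X given GREEN
is smaller than the likelihood of X given RED, since the circle encompasses
1 GREEN object and 3 RED ones. Thus:

$$\text{Likelihood of } X \text{ given GREEN} \propto \frac{1}{40}, \qquad \text{Likelihood of } X \text{ given RED} \propto \frac{3}{20}$$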

Although the prior probabilities indicate that X may belong to GREEN
(given that there are twice as many GREEN compared to RED), the
likelihood indicates otherwise: that the class membership of X is RED
(given that there are more RED objects in the vicinity of X than GREEN).
In the Bayesian analysis, the final classification is produced by combining
both sources of information, i.e., the prior and the likelihood, to form a
posterior probability using the so-called Bayes' rule (named after Rev.
Thomas Bayes 1702-1761).

$$\text{Posterior probability of } X \text{ being GREEN} \propto \frac{40}{60} \times \frac{1}{40} = \frac{1}{60}$$

$$\text{Posterior probability of } X \text{ being RED} \propto \frac{20}{60} \times \frac{3}{20} = \frac{1}{20}$$

Finally, we classify X as RED since its class membership achieves the
largest posterior probability.
(The above probabilities are not normalized. However, this does not affect
the classification outcome since their normalizing constants are the same.)
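
The arithmetic of this example can be checked with a short Python sketch. It uses the counts stated in the text (40 GREEN and 20 RED objects, with 1 GREEN and 3 RED inside the circle) and assumes that the likelihood of each class is estimated as the number of its points inside the circle divided by the total number of points of that class. The variable names are illustrative.

```python
from fractions import Fraction

# Counts taken from the example in the text.
total = {"GREEN": 40, "RED": 20}   # objects per class
nearby = {"GREEN": 1, "RED": 3}    # objects per class inside the circle around X
n_objects = sum(total.values())    # 60 objects in all

posterior = {}
for label in total:
    prior = Fraction(total[label], n_objects)
    likelihood = Fraction(nearby[label], total[label])
    # Unnormalized posterior: prior times likelihood.
    posterior[label] = prior * likelihood

prediction = max(posterior, key=posterior.get)
print(posterior["GREEN"], posterior["RED"])  # 1/60 1/20
print(prediction)                            # RED
```

Exact fractions are used instead of floating point numbers so that the printed values match the hand calculation.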
There are multiple variations of the Naive Bayes algorithm depending on
the distribution assumed for the features given the class, e.g. the Gaussian
Naive Bayes algorithm, the Multinomial Naive Bayes algorithm, and the
Bernoulli Naive Bayes algorithm.

• Support Vector Machine: A Support Vector Machine (SVM) is a
supervised machine learning algorithm which can be used for both
classification and regression challenges. However, it is mostly
used in classification problems. In this algorithm, we plot each data item
as a point in n-dimensional space (where n is the number of features)
with the value of each feature being the value of a particular coordinate.
Then, we perform classification by finding the hyperplane
that best separates the two classes. The margin is defined as the
distance between the separating hyperplane (decision boundary) and the
training samples that are closest to this hyperplane, which are the so-called
support vectors.
Maximum margin intuition

The rationale behind having decision boundaries with large margins is
that they tend to have a lower generalization error, whereas models with
small margins are more prone to overfitting. To get an intuition for the
margin maximization, let's take a closer look at those positive and
negative hyperplanes that are parallel to the decision boundary, which can
be expressed as follows:

$$w_0 + w^T x_{pos} = 1 \quad (1)$$

$$w_0 + w^T x_{neg} = -1 \quad (2)$$

If we subtract the linear equation (2) from equation (1), we
get:

$$w^T (x_{pos} - x_{neg}) = 2$$

We can normalize this by the length of the vector w, which is defined as
follows:

$$\lVert w \rVert = \sqrt{\sum_{j} w_j^2}$$

so that we arrive at:

$$\frac{w^T (x_{pos} - x_{neg})}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}$$

The left side of the preceding equation can then be interpreted as the
distance between the positive and negative hyperplanes, which is the so-called
margin that we want to maximize. Now the objective function of
the SVM becomes the maximization of this margin, $\frac{2}{\lVert w \rVert}$,
under the constraint that the training samples are classified correctly, a
problem that can be solved by quadratic programming.
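
The margin formula can be checked numerically with a small Python sketch. The weight vector, the bias and the two sample points are hypothetical values chosen so that one point lies on the positive hyperplane and the other on the negative hyperplane; the sketch does not train an SVM, it only evaluates the quantities from the derivation.

```python
import math


def norm(w):
    """Length of the weight vector w."""
    return math.sqrt(sum(wj * wj for wj in w))


def decision(w0, w, x):
    """Value of w0 + w.x for a sample x."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


def margin_width(w):
    """Distance between the positive and negative hyperplanes: 2 / length of w."""
    return 2.0 / norm(w)


w0, w = -1.0, [3.0, 4.0]   # hypothetical bias and weight vector
x_pos = [2.0, -1.0]        # lies on the positive hyperplane (value +1)
x_neg = [0.0, 0.0]         # lies on the negative hyperplane (value -1)

print(decision(w0, w, x_pos), decision(w0, w, x_neg))  # 1.0 -1.0

# Project the difference of the two points onto the direction of w.
projected = sum(wj * (p - q) for wj, p, q in zip(w, x_pos, x_neg)) / norm(w)
print(projected, margin_width(w))  # both equal 0.4
```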

• Tree Based Algorithms:


Decision Tree: Decision tree is a type of supervised learning algorithm
(having a pre-defined target variable) that is mostly used in classification
problems. It works for both categorical and continuous input and output
variables. In this technique, we split the population or sample into two or
more homogeneous sets (or sub-populations) based on most significant
splitter / differentiator in input variables.

The type of decision tree depends on the type of target variable we have. It
can be of two types:

1. Categorical Variable Decision Tree: A decision tree that has a
categorical target variable is called a categorical variable
decision tree. Example: a student problem in which the target
variable is “Student will play cricket or not”, i.e. YES or NO.
2. Continuous Variable Decision Tree: A decision tree that has a
continuous target variable is called a continuous variable
decision tree.

Random forest: Random forest is an improvement built on top of the
decision tree algorithm. The core idea behind Random Forest is to generate
multiple small decision trees from random subsets of the data (hence the
name “Random Forest”). Each of the decision trees gives a biased classifier
(as it only considers a subset of the data), and each captures different
trends in the data. This ensemble of trees is like a team of experts, each with
a little knowledge of the overall subject but thorough in their own area of
expertise. In the case of classification, the majority vote is used to
assign a class. In the analogy with experts, it is like asking the same
multiple-choice question to each expert and taking the answer that the largest
number of experts vote as correct. In the case of regression, we can use the
average of all trees as our prediction. In addition to this, we can also give
more decisive trees a higher weight relative to others by testing on the validation data.
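
The way a forest combines the outputs of its trees can be sketched as follows. The sketch covers only the aggregation step (majority vote for classification, average for regression); building the individual trees is not shown, and the lists of per-tree predictions are made-up values.

```python
from collections import Counter


def majority_vote(predictions):
    """Class predicted by the largest number of trees.

    If two classes tie, the one that appears first in the list wins.
    """
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Mean of the per-tree predictions, used for regression."""
    return sum(predictions) / len(predictions)


# Hypothetical outputs of five classification trees for one sample.
tree_votes = ["YES", "NO", "YES", "YES", "NO"]
print(majority_vote(tree_votes))  # YES

# Hypothetical outputs of three regression trees for one sample.
tree_values = [2.0, 3.0, 4.0]
print(average(tree_values))  # 3.0
```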
Unsupervised Learning Algorithms: Unsupervised learning is where you
only have input data (X) and no corresponding output variables. The goal
of unsupervised learning is to model the underlying structure or
distribution in the data in order to learn more about the data.

This is called unsupervised learning because, unlike supervised learning
above, there are no correct answers and there is no teacher. Algorithms are
left to their own devices to discover and present the interesting structure in
the data.

Unsupervised learning problems can be further grouped into clustering and


association problems.

• Clustering: A clustering problem is where you want to discover the


inherent groupings in the data, such as grouping customers by purchasing
behavior.
• Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as people that
buy X also tend to buy Y.

• Clustering: Clustering is the task of dividing the population or data points


into a number of groups such that data points in the same groups are more
similar to other data points in the same group than those in other groups.
In simple words, the aim is to segregate groups with similar traits and
assign them into clusters.
Clustering can be divided into two subgroups :
Hard Clustering: In hard clustering, each data point either belongs to a
cluster completely or not. For example, in the above example each
customer is put into one group out of the 10 groups.

Soft Clustering: In soft clustering, instead of putting each data point into
a separate cluster, a probability or likelihood of that data point being in
each cluster is assigned. For example, in the above scenario each
customer is assigned a probability of being in each of the 10 clusters of the
retail store.

K Means Clustering: K means is an iterative clustering algorithm that
improves the clustering in each iteration until it reaches a local optimum.
This algorithm works in the following steps:

1) Specify the desired number of clusters K: Let us choose k=2 for these 5
data points in 2-D space.

2) Randomly assign each data point to a cluster: Let’s assign three points
to cluster 1, shown using red color, and two points to cluster 2, shown using
grey color.

3) Compute cluster centroids: The centroid of the data points in the red cluster
is shown using a red cross and that of the grey cluster using a grey cross.

4) Re-assign each point to the closest cluster centroid: Note that the
data point at the bottom is assigned to the red cluster even though it is
closer to the centroid of the grey cluster. Thus, we re-assign that data point
to the grey cluster.

5) Re-compute cluster centroids: Now, we re-compute the centroids for
both clusters.

6) Repeat steps 4 and 5 until no improvements are possible: We
repeat the 4th and 5th steps until we reach a local optimum, that is, until
there is no further switching of data points between the two clusters for two
successive repeats. This marks the termination of the algorithm unless a
different stopping rule is explicitly specified.
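
The steps above can be sketched in plain Python. To keep the example reproducible, the initial assignment of points to clusters is passed in explicitly instead of being drawn at random, and the five 2-D points are made-up values; both are assumptions of this sketch.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, assignment, k, max_iterations=100):
    """Run k-means from a given initial assignment.

    Assumes every cluster has at least one point in the initial assignment.
    """
    centroids = [None] * k
    for _ in range(max_iterations):
        # Steps 3 and 5: (re-)compute the centroid of every cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = centroid(members)
        # Step 4: re-assign each point to the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop when no point switches cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centroids


# Hypothetical data: five points in 2-D space, k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0)]
initial_assignment = [0, 1, 0, 1, 0]  # three points in cluster 0, two in cluster 1

assignment, centroids = k_means(points, initial_assignment, k=2)
print(assignment)  # [0, 0, 0, 1, 1]
print(centroids)
```

Different initial assignments can lead to different final clusterings, which is why the result is a local optimum and not necessarily the best possible one.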
Hierarchical Clustering: Hierarchical clustering, as the name suggests, is
an algorithm that builds a hierarchy of clusters. This algorithm starts with all
the data points assigned to a cluster of their own. Then the two nearest clusters
are merged into the same cluster. In the end, this algorithm terminates when
there is only a single cluster left.
The results of hierarchical clustering can be shown using a dendrogram. The
dendrogram can be interpreted as:

At the bottom, we start with 25 data points, each assigned to separate


clusters. Two closest clusters are then merged till we have just one cluster
at the top. The height in the dendrogram at which two clusters are merged
represents the distance between two clusters in the data space.
The decision of the no. of clusters that can best depict different groups can
be chosen by observing the dendrogram. The best choice of the no. of
clusters is the no. of vertical lines in the dendrogram cut by a horizontal
line that can transverse the maximum distance vertically without
intersecting a cluster.
In the above example, the best choice of no. of clusters will be 4 as the red
horizontal line in the dendrogram below covers maximum vertical distance
AB.
Two important things that you should know about hierarchical clustering
are:

1 This algorithm has been implemented above using the bottom-up approach.
It is also possible to follow a top-down approach, starting with all data
points assigned to the same cluster and recursively performing splits till
each data point is assigned a separate cluster.

2 The decision of merging two clusters is taken on the basis of closeness
of these clusters. There are multiple metrics for deciding the closeness of
two clusters :
o Euclidean distance: $\|a-b\|_2 = \sqrt{\sum_i (a_i-b_i)^2}$
o Squared Euclidean distance: $\|a-b\|_2^2 = \sum_i (a_i-b_i)^2$
o Manhattan distance: $\|a-b\|_1 = \sum_i |a_i-b_i|$
o Maximum distance: $\|a-b\|_\infty = \max_i |a_i-b_i|$
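
The four metrics listed above can be written directly in Python. The function names and the two sample vectors are chosen only for this illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (1, 2, 3), (4, 6, 3)   # coordinate differences are 3, 4 and 0
print(euclidean(a, b))          # 5.0
print(squared_euclidean(a, b))  # 25
print(manhattan(a, b))          # 7
print(maximum(a, b))            # 4
```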
Neural Networks
An Artificial Neural Network (ANN) is a computational model that is
inspired by the way biological neural networks in the human brain
process information.

Single Neuron (Perceptron):
The basic unit of computation in a neural network is the neuron, often
called a node or unit. It receives input from some other nodes, or from an
external source and computes an output. Each input has an
associated weight (w), which is assigned on the basis of its relative
importance to other inputs. The node applies a function f (defined below)
to the weighted sum of its inputs as shown in Figure 1 below:

The above network takes numerical inputs X1 and X2 and has
weights w1 and w2 associated with those inputs. Additionally, there is
another input 1 with weight b (called the Bias) associated with it. We will
learn more details about the role of the bias later.

The output Y from the neuron is computed as shown in Figure 1.
The function f is non-linear and is called the Activation Function. The
purpose of the activation function is to introduce non-linearity into the
output of a neuron. This is important because most real world data is
non-linear and we want neurons to learn these non-linear representations.
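
As a minimal sketch of the computation described above, the output of a single neuron with two inputs can be written as follows. The function names and the numeric values are hypothetical, and the sigmoid is used here only as one possible choice of the activation function f.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x1, x2, w1, w2, b, f=sigmoid):
    # Weighted sum of the inputs; the bias b is the weight of a constant input 1.
    z = w1 * x1 + w2 * x2 + b * 1
    # The activation function f turns the weighted sum into the output Y.
    return f(z)

# Here the weighted sum is 0.5*1 + (-0.25)*2 + 0 = 0, and sigmoid(0) = 0.5.
print(neuron(1.0, 2.0, w1=0.5, w2=-0.25, b=0.0))  # 0.5
```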

Every activation function (or non-linearity) takes a single number and
performs a certain fixed mathematical operation on it [2]. There are several
activation functions you may encounter in practice:
• Sigmoid: takes a real-valued input and squashes it to range between
0 and 1

σ(x) = 1 / (1 + exp(−x))

• tanh: takes a real-valued input and squashes it to the range [-1, 1]

tanh(x) = 2σ(2x) − 1

• ReLU: ReLU stands for Rectified Linear Unit. It takes a real-valued
input and thresholds it at zero (replaces negative values with zero)

f(x) = max(0, x)
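
The three formulas above translate almost line by line into Python. The sketch below uses only the standard library and also checks the identity tanh(x) = 2σ(2x) − 1 against `math.tanh`; the test value 0.7 is arbitrary.

```python
import math

def sigmoid(x):
    # Squashes a real-valued input into the range between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Written in terms of the sigmoid, as in the formula above.
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    # Replaces negative values with zero.
    return max(0.0, x)

print(sigmoid(0.0))                            # 0.5
print(abs(tanh(0.7) - math.tanh(0.7)) < 1e-12) # True
print(relu(-3.0), relu(2.5))                   # 0.0 2.5
```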

The below figures show each of the above activation functions.

FeedForward Neural Network: The feedforward neural network was
the first and simplest type of artificial neural network devised. It contains
multiple neurons (nodes) arranged in layers. Nodes from adjacent layers
have connections or edges between them. All these connections have
weights associated with them. In a feedforward network, the information
moves in only one direction – forward – from the input nodes, through
the hidden nodes (if any) and to the output nodes. There are no cycles or
loops in the network (this property of feed forward networks is different
from Recurrent Neural Networks, in which the connections between the
nodes form a cycle).
A feedforward neural network can consist of three types of nodes:

1. Input Nodes – The Input nodes provide information from the
outside world to the network and are together referred to as the
“Input Layer”. No computation is performed in any of the Input
nodes – they just pass on the information to the hidden nodes.
2. Hidden Nodes – The Hidden nodes have no direct connection with
the outside world (hence the name “hidden”). They perform
computations and transfer information from the input nodes to the
output nodes. A collection of hidden nodes forms a “Hidden Layer”.
While a feedforward network will only have a single input layer and
a single output layer, it can have zero or multiple Hidden Layers.
3. Output Nodes – The Output nodes are collectively referred to as the
“Output Layer” and are responsible for computations and
transferring information from the network to the outside world.

Two examples of feedforward networks are given below:

Single Layer Perceptron – This is the simplest feedforward neural
network and does not contain any hidden layer.

Multi Layer Perceptron – A Multi Layer Perceptron has one or more
hidden layers. We will only discuss Multi Layer Perceptrons below
since they are more useful than Single Layer Perceptrons for practical
applications today.
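
A forward pass through a small Multi Layer Perceptron can be sketched as below. The layer sizes (2 input nodes, 3 hidden nodes, 1 output node), the weights and the use of the sigmoid in every layer are assumptions made purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    # Information moves in one direction only: input -> hidden -> output.
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# Hypothetical weights: 3 hidden nodes with 2 inputs each, 1 output node.
hidden = ([[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]], [0.0, 0.0, 0.0])
output = ([[0.7, -0.8, 0.9]], [0.1])

y = forward([1.0, 2.0], [hidden, output])
print(y)  # a list with one value between 0 and 1
```

The input nodes perform no computation here: the input list is passed unchanged to the first hidden layer.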
Backpropagation Algorithm: The Backpropagation algorithm is a
supervised learning method for multilayer feed-forward networks from the
field of Artificial Neural Networks.
Feed-forward neural networks are inspired by the information processing
of one or more neural cells, called neurons. A neuron accepts input signals
via its dendrites, which pass the electrical signal down to the cell body. The
axon carries the signal out to synapses, which are the connections of a cell’s
axon to other cells’ dendrites.

The principle of the backpropagation approach is to model a given function
by modifying the internal weightings of input signals to produce an expected
output signal. The system is trained using a supervised learning method,
where the error between the system’s output and a known expected output
is presented to the system and used to modify its internal state.
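
The idea can be illustrated on a very small network. The sketch below assumes 2 inputs, 2 sigmoid hidden nodes, 1 sigmoid output node and a squared-error loss; none of these choices, nor the function names or the learning rate, come from the report. The error at the output is propagated back through the weights to obtain a gradient for every weight, and `train_step` then moves each weight a small step against its gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def loss(x, t, W1, b1, W2, b2):
    _, y = forward(x, W1, b1, W2, b2)
    return 0.5 * (y - t) ** 2  # error between output y and expected output t

def backprop(x, t, W1, b1, W2, b2):
    h, y = forward(x, W1, b1, W2, b2)
    # Error signal at the output node; the sigmoid derivative is y * (1 - y).
    delta_out = (y - t) * y * (1 - y)
    grad_W2 = [delta_out * hi for hi in h]
    grad_b2 = delta_out
    # Propagate the error back to each hidden node through its weight in W2.
    delta_hid = [delta_out * w * hi * (1 - hi) for w, hi in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hid]
    grad_b1 = delta_hid
    return grad_W1, grad_b1, grad_W2, grad_b2

def train_step(x, t, W1, b1, W2, b2, lr=0.5):
    gW1, gb1, gW2, gb2 = backprop(x, t, W1, b1, W2, b2)
    W1 = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    b2 = b2 - lr * gb2
    return W1, b1, W2, b2

x, t = [1.0, 0.5], 1.0
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.2)
print("loss before:", loss(x, t, *params))
print("loss after: ", loss(x, t, *train_step(x, t, *params)))
```

A common way to check such a derivation is to compare the returned gradients with finite-difference estimates of the loss.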
Reinforcement Learning
In reinforcement learning, the goal is to develop a system (agent) that
improves its performance based on interactions with the environment.
Since the information about the current state of the environment typically
also includes a so-called reward signal, we can think of reinforcement
learning as a field related to supervised learning. However, in
reinforcement learning this feedback is not the correct ground truth label
or value, but a measure of how well the action was measured by a reward
function. Through the interaction with the environment, an agent can then
use reinforcement learning to learn a series of actions that maximizes this
reward via an exploratory trial-and-error approach or deliberative
planning.

Consider an example of a child learning to walk.

Let’s formalize the above example. The “problem statement” of the
example is to walk, where the child is an agent trying to manipulate
the environment (which is the surface on which it walks) by taking
actions (viz. walking), and he/she tries to go from one state (viz. each
step he/she takes) to another. The child gets a reward (let’s say
chocolate) when he/she accomplishes a submodule of the task (viz.
taking a couple of steps) and will not receive any chocolate (a.k.a.
negative reward) when he/she is not able to walk. This is a simplified
description of a reinforcement learning problem.
Markov Decision Process: The mathematical framework for defining a
solution in a reinforcement learning scenario is called a Markov Decision
Process. This can be defined as:

• Set of states, S
• Set of actions, A
• Reward function, R
• Policy, π
• Value, V

We have to take actions (A) to transition from our start state to our end
state (S), getting a reward (R) in return for each action we take. Our
actions can lead to a positive reward or a negative reward.
The set of actions we take defines our policy (π) and the rewards we get in
return define our value (V). Our task here is to maximize our rewards by
choosing the correct policy. So we have to maximize the expected reward
for all possible values of S for a time t.

Q-learning: Q-learning is a value based learning algorithm that learns the
value Q(s, a) of taking an action in a state. In deep Q-learning, the
function approximator for Q is a neural network. This approach was used by
Google DeepMind to beat humans at Atari games!

Let’s see a pseudocode of Q-learning:

1. Initialize the Values table ‘Q(s, a)’.
2. Observe the current state ‘s’.
3. Choose an action ‘a’ for that state based on one of the action
selection policies (e.g. epsilon greedy).
4. Take the action, and observe the reward ‘r’ as well as the new state
‘s′’.
5. Update the Value for the state-action pair using the observed reward
and the maximum Value possible for the next state. The updating is done
according to the rule Q(s, a) ← Q(s, a) + α [r + γ max Q(s′, a′) − Q(s, a)],
where α is the learning rate and γ is the discount factor.
6. Set the state to the new state, and repeat the process until a terminal
state is reached.
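
The pseudocode above can be turned into a small table-based sketch. Everything specific here is an assumption made for illustration: the environment is a hypothetical chain of 5 states in which the agent moves left or right and receives a reward of 1 on reaching the last state, and the values of the learning rate `alpha`, the discount factor `gamma` and `epsilon` are arbitrary.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(s, a):
    """Hypothetical environment: returns the new state and the reward."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == GOAL else 0.0)

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]      # 1. initialize Q(s, a)
    for _ in range(episodes):
        s = 0                                      # 2. observe current state
        while s != GOAL:
            # 3. epsilon-greedy selection (ties are broken at random)
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.choice(ACTIONS)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r = step(s, a)                     # 4. act, observe r and s'
            # 5. move Q(s, a) towards r + gamma * max over a' of Q(s', a')
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2                                 # 6. continue from new state
    return Q

Q = q_learning()
for s in range(GOAL):
    print(s, [round(v, 3) for v in Q[s]])
```

After training, the value of moving right is higher than the value of moving left in every non-terminal state, so the greedy policy walks straight to the goal.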
A simple description of Q-learning can be summarized as follows:

Some major domains where RL has been applied are as follows:

• Game Theory and Multi-Agent Interaction
• Robotics
• Computer Networking
• Vehicular Navigation
• Medicine
• Industrial Logistics
Python Machine Learning Packages
Python is often the choice for developers who need to apply statistical
techniques or data analysis in their work, or for data scientists whose tasks
need to be integrated with web apps or production environments. In
particular, Python really shines in the field of machine learning. Its
combination of machine learning libraries and flexibility makes Python
uniquely well-suited to developing sophisticated models and prediction
engines that plug directly into production systems.
One of Python’s greatest assets is its extensive set of libraries. Libraries
are sets of routines and functions that are written in a given language. A
robust set of libraries can make it easier for developers to perform complex
tasks without rewriting many lines of code.

Basic libraries for Machine Learning:

These are the basic libraries that transform Python from a general purpose
programming language into a powerful and robust tool for data analysis
and visualization. Sometimes called the SciPy Stack, they’re the
foundation that the more specialized tools are built on.

1.) NumPy is the foundational library for scientific computing in
Python, and many of the libraries on this list use NumPy arrays as
their basic inputs and outputs. In short, NumPy introduces objects
for multidimensional arrays and matrices, as well as routines that
allow developers to perform advanced mathematical and statistical
functions on those arrays with as little code as possible.
2.) SciPy builds on NumPy by adding a collection of algorithms and
high-level commands for manipulating and visualizing data. This
package includes functions for computing integrals numerically,
solving differential equations, optimization, and more.

3.) Pandas adds data structures and tools that are designed for practical
data analysis in finance, statistics, social sciences, and engineering.
Pandas works well with incomplete, messy, and unlabeled data
(i.e., the kind of data you’re likely to encounter in the real world),
and provides tools for shaping, merging, reshaping, and slicing
datasets.
4.) IPython (Jupyter Notebook) extends the functionality of Python’s
interactive interpreter with a souped-up interactive shell that adds
introspection, rich media, shell syntax, tab completion, and
command history retrieval. It also acts as an embeddable
interpreter for your programs that can be really useful for
debugging. If you’ve ever used Mathematica or MATLAB, you
should feel comfortable with IPython.
5.) matplotlib is the standard Python library for creating 2D plots and
graphs. It’s pretty low-level, meaning it requires more commands
to generate nice-looking graphs and figures than with some more
advanced libraries. However, the flip side of that is flexibility.
With enough commands, you can make just about any kind of
graph you want with matplotlib.
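
As a small taste of the first library in this list, the sketch below uses only NumPy (assumed to be installed) to build a two-dimensional array and apply a few array routines with very little code. The data values are made up for the example.

```python
import numpy as np

# A 2-D array (matrix) built from nested Python lists.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

col_means = data.mean(axis=0)   # mean of each column
centered = data - col_means     # broadcasting subtracts the means row by row
gram = centered.T @ centered    # matrix product, shape (3, 3)

print(data.shape)   # (2, 3)
print(col_means)    # [2.5 3.5 4.5]
print(gram.shape)   # (3, 3)
```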

Libraries for Machine Learning:

Machine learning sits at the intersection of Artificial Intelligence and
statistical analysis. By training computers with sets of real-world data,
we’re able to create algorithms that make more accurate and sophisticated
predictions, whether we’re talking about getting better driving directions
or building computers that can identify landmarks just from looking at
pictures. The following libraries give Python the ability to tackle a
number of machine learning tasks, from performing basic regressions to
training complex neural networks.

1. scikit-learn builds on NumPy and SciPy by adding a set of
algorithms for common machine learning and data mining tasks,
including clustering, regression, and classification. As a library,
scikit-learn has a lot going for it. Its tools are well-documented and
its contributors include many machine learning experts. What’s
more, it’s a very curated library, meaning developers won’t have to
choose between different versions of the same algorithm. Its power
and ease of use make it popular with a lot of data-heavy startups,
including Evernote, OKCupid, Spotify, and Birchbox.
2. Theano uses NumPy-like syntax to optimize and evaluate
mathematical expressions. What sets Theano apart is that it takes
advantage of the computer’s GPU in order to make data-intensive
calculations up to 100x faster than the CPU alone. Theano’s speed
makes it especially valuable for deep learning and other
computationally complex tasks.
3. TensorFlow is another high-profile entrant into machine learning,
developed by Google as an open-source successor to DistBelief,
their previous framework for training neural networks. TensorFlow
uses a system of multi-layered nodes that allow you to quickly set
up, train, and deploy artificial neural networks with large datasets.
It’s what allows Google to identify objects in photos or understand
spoken words in its voice-recognition app.
Projects
During the Summer Training, various Machine Learning projects were done.
A short introduction to one of the important projects is given below.

MNIST Handwritten Digit Recognition:

It is a digit recognition task. As such there are 10 digits (0 to 9) or 10
classes to predict. Results are reported using prediction error, which is
simply one minus the classification accuracy.
The images of digits (MNIST dataset) were taken from a variety of scanned
documents, normalized in size and centered. This makes it an excellent
dataset for evaluating models, allowing the developer to focus on the
machine learning with very little data cleaning or preparation required.
Each image is a 28 by 28 pixel square (784 pixels total). A standard split
of the dataset is used to evaluate and compare models, where 60,000 images
are used to train a model and a separate set of 10,000 images is used to
test it.
During the project, the Python scientific computing (NumPy) and data
visualization (Matplotlib) packages were used to explore the dataset and
to visualize the data to see whether there is any relation between the
features of the dataset. Scikit-Learn (a machine learning library for
Python) was used to model the machine learning algorithms.

Project Code:
Best Estimator learned through GridSearch

SVC(C=3, cache_size=200, class_weight=None, coef0=0.0, degree=3,
    gamma=0.001, kernel='rbf', max_iter=-1, probability=False,
    random_state=None, shrinking=True, tol=0.001, verbose=False)
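
The search that produced this estimator is not reproduced in the report. The sketch below only shows how such a grid search is typically set up with scikit-learn. It is not the project code: the parameter grid, the 3-fold cross-validation and the train/test split are hypothetical, and scikit-learn's small built-in 8 by 8 digits dataset is used as a stand-in for MNIST so that the example runs quickly.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: the built-in 8x8 digits dataset stands in for MNIST here.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical grid of candidate values for C and gamma.
param_grid = {"C": [1, 3, 10], "gamma": [0.0001, 0.001, 0.01], "kernel": ["rbf"]}

# Try every combination with 3-fold cross-validation on the training data.
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_train, y_train)

best = search.best_estimator_           # the best estimator found by the search
accuracy = best.score(X_test, y_test)   # classification accuracy on test data
error = 1.0 - accuracy                  # prediction error

print(search.best_params_)
print("prediction error:", error)
```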
Other projects which were implemented during the Summer Training:

1.) Sentiment Analysis of Tweets
2.) Face Recognition
3.) Stock Prediction
4.) Music Genre Classification
5.) Image Classification
Future of Machine Learning

Research in Machine Learning Theory is a combination of attacking
established fundamental questions, and developing new frameworks for
modeling the needs of new machine learning applications. While it is
impossible to know where the next breakthroughs will come, a few topics
one can expect the future to hold include:

• Better understanding how auxiliary information, such as unlabeled data,
hints from a user, or previously-learned tasks, can best be used by a
machine learning algorithm to improve its ability to learn new things.
Traditionally, Machine Learning Theory has focused on problems of
learning a task (say, identifying spam) from labeled examples (email
labeled as spam or not). However, often there is additional information
available. One might have access to large quantities of unlabeled data
(email messages not labeled by their type, or discussion-group transcripts
on the web) that could potentially provide useful information. One might
have other hints from the user besides just labels, e.g. highlighting relevant
portions of the email message. Or, one might have previously learned
similar tasks and want to transfer some of that experience to the job at hand.
These are all issues for which a solid theory is only beginning to be
developed.

• Further developing connections to economic theory. As software agents
based on machine learning are used in competitive settings, “strategic”
issues become increasingly important. Most algorithms and models to date
have focused on the case of a single learning algorithm operating in an
environment that, while it may be changing, does not have its own
motivations and strategies. However, if learning algorithms are to operate
in settings dominated by other adaptive algorithms acting in their own
users’ interests, such as bidding on items or performing various kinds of
negotiations, then we have a true merging of computer science and
economic models. In this combination, many of the fundamental issues are
still wide open.
REFERENCES

Books:

Sebastian Raschka (2015). Python Machine Learning.

Richard S. Sutton, Andrew G. Barto (2015 draft). Reinforcement Learning:
An Introduction. MIT Press.

Jiawei Han, Micheline Kamber, Jian Pei (2000). Data Mining: Concepts
and Techniques, 3rd Edition.

Links:

https://www.medium.com/

https://www.analyticsvidhya.com

http://www.tutorialspoint.com/numpy

http://www.tutorialpoint.com/pandas
