
MACHINE LEARNING

CLASS 7
In this class, we will be learning about some general concepts related to one of the simplest examples of
supervised learning, namely, the classification problem.

We consider mainly binary classification problems.

In this context we introduce the concepts of hypothesis, hypothesis space and version space.

Input Features or Input Representation: Only those features that are significant for assigning the
class labels need to be considered as inputs.

Example
Consider the problem of assigning the label “family car” or “not family car” to cars. Let us assume that the
features that separate a family car from other cars are the price and engine power. These attributes or
features constitute the input representation for the problem. While deciding on this input representation,
we are ignoring various other attributes like seating capacity or color as irrelevant.
A hypothesis is a supposition or proposed explanation made on the basis of limited
evidence or assumptions. It is a guess based on some known facts that has not yet been
proven. A good hypothesis is testable, so that it can be shown to be either true or false.
Hypothesis space (H):
The hypothesis space is the set of all possible legal hypotheses; hence it is also known as
the hypothesis set. A supervised machine learning algorithm searches this set for the
hypothesis that best describes the target function, that is, the one that best maps inputs to outputs.

It is often constrained by the framing of the problem, the choice of model, and the
choice of model configuration.

A hypothesis (h) is the approximate function that best describes the target in a supervised
machine learning algorithm. It is determined by the data as well as by the bias and
restrictions applied to the data.

Hence, a hypothesis (h) is a single candidate function that maps inputs to outputs and that
can be evaluated as well as used to make predictions.
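As a minimal sketch of a single hypothesis h for the family-car example, the code below treats h as a function from the two input features (price, engine power) to a class label. The rectangle form, the class name and the numeric bounds are illustrative assumptions, not values from these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectangleHypothesis:
    """One hypothesis h: a car is a family car if both features fall in a range.

    The bounds are hypothetical and chosen only for illustration.
    """
    price_min: float
    price_max: float
    power_min: float
    power_max: float

    def predict(self, price: float, power: float) -> bool:
        """Return True for 'family car', False for 'not family car'."""
        return (self.price_min <= price <= self.price_max
                and self.power_min <= power <= self.power_max)


# A single hypothesis with made-up bounds (price in thousands, power in hp).
h = RectangleHypothesis(price_min=10, price_max=30, power_min=80, power_max=150)

print(h.predict(price=20, power=100))  # point inside the rectangle
print(h.predict(price=60, power=300))  # point outside the rectangle
```

Changing any of the four bounds gives a different hypothesis; each such choice can be evaluated on data and used to make predictions.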
Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional
coordinate plane showing the distribution of data as follows:
Assume we have some test data for which an ML algorithm must predict the outputs, as follows:

We can divide this coordinate plane in such a way that it helps to predict the output or result, as follows:
However, depending on the data, the algorithm, and the constraints, this coordinate plane can also be
divided in the following ways:
From the above example, we can conclude the following.
The hypothesis space (H) is the set of all legal ways to divide the coordinate plane so
that inputs are mapped to their proper outputs.

Further, each individual way of dividing the plane is called a hypothesis (h). Hence, the hypothesis and
hypothesis space would be like this:
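In code, the distinction between h and H can be sketched for the family-car features (price, engine power). The sketch assumes that hypotheses are axis-aligned rectangles whose bounds come from a small grid; the grid values and the labelled examples are made up for illustration. The hypotheses that agree with every training example form the version space mentioned at the start of the class.

```python
from itertools import product

# Hypothetical labelled examples: (price, engine_power, is_family_car).
training_data = [
    (20, 100, True),
    (25, 120, True),
    (60, 300, False),
    (5, 50, False),
]

# Coarse, made-up grids of candidate bounds for each feature.
price_grid = [0, 10, 30, 70]
power_grid = [0, 75, 150, 350]

# Hypothesis space H: every rectangle (p_lo, p_hi, e_lo, e_hi) on the grid.
hypothesis_space = [
    (p_lo, p_hi, e_lo, e_hi)
    for p_lo, p_hi in product(price_grid, repeat=2) if p_lo < p_hi
    for e_lo, e_hi in product(power_grid, repeat=2) if e_lo < e_hi
]


def predict(h, price, power):
    """Apply one hypothesis h: True means 'family car'."""
    p_lo, p_hi, e_lo, e_hi = h
    return p_lo <= price <= p_hi and e_lo <= power <= e_hi


def is_consistent(h, data):
    """A hypothesis is consistent if it labels every example correctly."""
    return all(predict(h, price, power) == label for price, power, label in data)


# Version space: the hypotheses in H that agree with all training examples.
version_space = [h for h in hypothesis_space if is_consistent(h, training_data)]

print(len(hypothesis_space), "hypotheses in H")
print(len(version_space), "of them are consistent with the training data")
```

Each tuple in `hypothesis_space` is one hypothesis h; the list as a whole plays the role of H. A finer grid would give a larger H and, usually, a larger version space.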
Hypothesis in Statistics

•Null Hypothesis: A null hypothesis is a statistical hypothesis stating that no
statistically significant effect exists in the given set of observations. It is a type of conjecture
and is used in quantitative analysis to test theories about markets, investment, and finance, in order to
decide whether an idea is true or false.

•Alternative Hypothesis: An alternative hypothesis directly contradicts the null hypothesis,
which means that if one of the two hypotheses is true, the other must be false. In other words, an
alternative hypothesis is a statistical hypothesis stating that some significant
effect exists in the given set of observations.
Significance level

The significance level must be set before starting an experiment.

It defines the tolerance for error and the level at which an effect can be considered significant.

In practice, a 95% confidence level is commonly used, which corresponds to a significance level of 5%:
a 5% chance of wrongly rejecting a true null hypothesis is tolerated.

The significance level also serves as the critical (threshold) value against which the p-value is
compared. For example, if the confidence level is set to 98%, then the significance level, and hence
the critical value, is 0.02 (that is, 2%).
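The relationship between a percentage such as 95% or 98% and the corresponding threshold can be sketched as follows. The function name is hypothetical, and the sketch assumes the usual convention that the significance level (alpha) equals one minus the confidence level.

```python
def significance_level(confidence_level: float) -> float:
    """Return alpha = 1 - confidence level, with both given as fractions."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie strictly between 0 and 1")
    return 1.0 - confidence_level


for confidence in (0.95, 0.98):
    alpha = significance_level(confidence)
    print(f"confidence = {confidence:.2f} -> alpha = {alpha:.2f}")
```

Under this convention the threshold is a probability between 0 and 1, so a 98% confidence level gives 0.02 as a fraction, which is 2% when written as a percentage.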
P-value

The p-value in statistics measures the evidence against a null hypothesis. In other
words, the p-value is the probability, assuming the null hypothesis is true, of
obtaining data at least as extreme as (equal to or rarer than) the data actually observed.

The smaller the p-value, the stronger the evidence against the null hypothesis, and
vice versa; a sufficiently small p-value means that the null hypothesis can be rejected.
It is usually written in decimal form, such as 0.035.

Whenever a statistical test is carried out on a sample to find the p-value, the
conclusion depends on the critical value, that is, the significance level. If the p-value
is less than the critical value, the effect is significant and the null hypothesis can be
rejected. If it is higher than the critical value, there is no significant effect, and we
fail to reject the null hypothesis.
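The decision rule above can be sketched with a small worked example. The scenario is an assumption made for illustration: a coin is flipped 20 times, the null hypothesis is that the coin is fair, and the alternative is that it favours heads. The p-value is then the probability, under the null hypothesis, of seeing at least as many heads as were observed. The function name and the counts are hypothetical.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null)**(flips - k)
        for k in range(heads, flips + 1)
    )


alpha = 0.05  # significance level chosen before the experiment

for observed_heads in (12, 16):
    p_value = binomial_p_value(heads=observed_heads, flips=20)
    if p_value < alpha:
        decision = "reject the null hypothesis"
    else:
        decision = "fail to reject the null hypothesis"
    print(f"{observed_heads} heads: p-value = {p_value:.4f} -> {decision}")
```

With 20 flips, 12 heads is well within what a fair coin produces by chance, while 16 heads is rare enough under the null hypothesis to fall below the 0.05 threshold.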
