Unit IV

Statistical Methods
 Standard Deviation
 Normalization - Feature Scaling
 Min-Max Scaling
 Bias
 Variance
 Regularization
 Ridge Regression
 Lasso Regression
 Cross-Validation Techniques - K-fold, LOOCV, Stratified K-fold
 Grid Search CV
 CV Error
What is meant by statistical methods?
Definition. Statistical methods are mathematical formulas, models, and techniques
that are used in statistical analysis of raw research data. The application of statistical
methods extracts information from research data and provides different ways to assess
the robustness of research outputs.

Statistical methods refer to general principles and techniques which are commonly
used in the collection, analysis and interpretation of data.

Where are statistical methods used?


It is a broad discipline with applications in academia, business, the social sciences,
genetics, population studies, engineering and several other fields. Statistical
analysis has several functions. You can use it to make predictions, perform simulations,
create models, reduce risk and identify trends.

What are the two main statistical methods?


Two main statistical methods are used in data analysis: descriptive statistics, which
summarizes data using indexes such as the mean and median, and inferential statistics,
which draws conclusions from data using statistical tests such as Student's t-test.

What are statistical models?


Statistical models are tools to help you analyze sets of data. Professionals use statistical
models as part of statistical analysis, which is the process of gathering and interpreting
quantitative data. Using a statistical model can help you evaluate the characteristics of a
sample size within a given population and apply your findings to the larger group. While
statisticians and data analysts may use statistical models more than others, many
professionals can benefit from understanding statistical models, including marketing
representatives, business executives and government officials.
Why are statistical methods required?
Many organizations nowadays have a lot of data available about their customers,
operations, services or products and related factors. A statistical model can make all of
this data more comprehensible. When businesses have a way to analyze and understand
all of their data, they can perform tasks such as making predictions, creating models,
reducing risk and identifying trends.
Standard Deviation:-

What is Standard deviation?


Standard Deviation is a measure which shows how much variation (spread or dispersion)
from the mean exists. The standard deviation indicates a "typical" deviation from the
mean. It is a popular measure of variability because it is expressed in the original units
of measure of the data set. Like the variance, if the data points are close to the mean
there is little variation, whereas if the data points are highly spread out from the mean
the variation is high. Standard deviation calculates the extent to which the values differ
from the average. Standard Deviation, the most widely used measure of dispersion, is
based on all values; therefore a change in even one value affects the value of the
standard deviation. It is independent of origin but not of scale. It is also useful in certain
advanced statistical problems.

Standard Deviation Formula


The formulas for the variance and the standard deviation are given below.

The population variance is σ² = Σ (Xi − μ)² / N, and the population standard deviation is
its square root:

σ = √[ Σ (Xi − μ)² / N ]

Here,
σ = Population standard deviation
N = Number of observations in the population
Xi = ith observation in the population
μ = Population mean

Similarly, the sample standard deviation formula is:

s = √[ Σ (xi − x̄)² / (n − 1) ]

Here,
s = Sample standard deviation
n = Number of observations in the sample
xi = ith observation in the sample
x̄ = Sample mean

How is Standard Deviation calculated?


The formula for standard deviation makes use of three quantities: the value of each point
within the data set, with a subscript indicating each additional value (x1, x2, x3, etc.),
the mean of those values (M), and the number of data points (n). Variance is the average
of the squared differences from the arithmetic mean.
To calculate the mean value, the values of the data elements are added together and the
total is divided by the number of data points involved.
Standard deviation, denoted by the symbol σ, is the square root of the mean of the
squared deviations of all the values of a series from their arithmetic mean; it is also
called the root-mean-square deviation. The smallest possible value of the standard
deviation is 0, since it cannot be negative. The more the elements of a series are spread
out from the mean, the larger the standard deviation.
Standard deviation is a measure of dispersion: it quantifies the variability of the data
around the centre. The mean, median and mode, by contrast, are measures of central
tendency and are therefore considered first-order averages. Measures of dispersion such
as the standard deviation are averages of deviations from those average values, and are
therefore called second-order averages.
Standard Deviation Example
Let’s calculate the standard deviation for the number of gold coins on a ship run by pirates.
There are a total of 100 pirates on the ship, so statistically the population size is 100. We
would use the population standard deviation equation if we knew the number of gold coins
every pirate has.
Instead, let’s consider a sample of 5 pirates; in this case we use the sample standard
deviation equation.
Suppose the numbers of gold coins these 5 pirates have are 4, 2, 5, 8 and 6.

Standard deviation: the sample mean is (4 + 2 + 5 + 8 + 6) / 5 = 5, the squared deviations
from the mean sum to 1 + 9 + 0 + 9 + 1 = 20, so the sample variance is 20 / (5 − 1) = 5
and the sample standard deviation is √5 ≈ 2.24.
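As a quick check, the same numbers can be reproduced with a short Python sketch using the
standard library's statistics module (the coin counts are the ones from the example above):

import statistics

coins = [4, 2, 5, 8, 6]               # gold coins held by the 5 sampled pirates

mean = statistics.mean(coins)         # 5.0
pop_sd = statistics.pstdev(coins)     # population formula: sqrt(20 / 5) = 2.0
sample_sd = statistics.stdev(coins)   # sample formula: sqrt(20 / 4) ≈ 2.236

print(mean, pop_sd, round(sample_sd, 3))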
Normalization - Feature Scaling:-
Feature Scaling is a technique to standardize the independent features present in the
data to a fixed range. It is performed during the data pre-processing.
Working:
Given a data-set with features- Age, Salary, BHK Apartment with the data size of 5000
people, each having these independent data features.
Each data point is labeled as:
 Class1- YES (means with the given Age, Salary, BHK Apartment feature value one
can buy the property)
 Class2- NO (means with the given Age, Salary, BHK Apartment feature value one
can’t buy the property).
Using a dataset to train the model, one aims to build a model that can predict whether
one can buy a property or not with given feature values.

Once the model is trained, an N-dimensional (where N is the no. of features present in
the dataset) graph with data points from the given dataset, can be created. The figure
given below is an ideal representation of the model.

As shown in the figure, star data points belong to Class1 – Yes and circles
represent Class2 – No labels, and the model gets trained using these data points. Now
a new data point (diamond as shown in the figure) is given and it has different
independent values for the 3 features (Age, Salary, BHK Apartment) mentioned above.
The model has to predict whether this data point belongs to Yes or No.
Prediction of the class of new data points:
The model calculates the distance of this data point from the centroid of each class
group. Finally, this data point will belong to that class, which will have a minimum
centroid distance from it.

Feature Scaling is a technique to standardize the independent features present in the
data to a fixed range. It is performed during data pre-processing to handle highly
varying magnitudes, values or units. If feature scaling is not done, a machine learning
algorithm tends to weight greater values higher and treat smaller values as less
important, regardless of the unit of the values.
Example: if an algorithm does not use feature scaling, it can consider the value 3000
(metres) to be greater than 5 (km), which is not actually true; in this case the algorithm
will give wrong predictions. So, we use Feature Scaling to bring all values to comparable
magnitudes and thus tackle this issue.

Techniques to perform Feature Scaling


Consider the two most important ones:
 Min-Max Normalization: this technique re-scales a feature or observation value to a
distribution between 0 and 1.
 Standardization: a very effective technique which re-scales a feature value so that it
has a distribution with mean 0 and variance equal to 1.
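As a minimal sketch of both techniques with scikit-learn's preprocessing module (the Age and
Salary values below are made up for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical (Age, Salary) rows with very different magnitudes
X = np.array([[25, 40000],
              [32, 60000],
              [47, 120000],
              [51, 90000]], dtype=float)

print(MinMaxScaler().fit_transform(X))    # Min-Max Normalization: every column rescaled to [0, 1]
print(StandardScaler().fit_transform(X))  # Standardization: every column has mean 0 and variance 1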

Feature Scaling:-
Feature scaling is a method used to normalize the range of independent variables or
features of data. In data processing it is also known as data normalization and is generally
performed during the data preprocessing step.
Feature Engineering is a big part of Data Science and Machine Learning, and Feature Scaling is one of the last
steps in the Feature Engineering life cycle. It is a technique to standardize the independent features
in the data to a fixed range or scale, hence the name Feature Scaling.
In simple words, once we are done with all the other steps of feature engineering, such as encoding variables
and handling missing values, we scale all the variables to a small range of, say, -1 to +1, so all the
data gets squeezed to values between -1 and +1. This keeps the distribution of the data,
the correlation and the covariance exactly the same, while scaling every independent feature
column to a smaller scale. We do this because most ML algorithms perform significantly better
after scaling.

Types of Feature Scaling:

 Standardization:
 Standard Scaler
 Normalization:
 Min Max Scaling
 Mean Normalization
 Max Absolute Scaling
 Robust Scaling etc.
Normalization

Normalization is a technique often applied as part of data preparation for machine learning. The goal of
normalization is to change the values of numeric columns in the dataset to use a common scale, without
distorting differences in the ranges of values or losing information.
Min Max Scaling
Min-max normalization is one of the most common ways to normalize data. For every feature,
the minimum value of that feature gets transformed into a 0, the maximum value gets
transformed into a 1, and every other value gets transformed into a decimal between 0 and 1.
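In symbols the rule is X_scaled = (X − X_min) / (X_max − X_min). A small illustrative sketch
with made-up numbers (scikit-learn's MinMaxScaler applies the same formula column by column):

import numpy as np

x = np.array([3.0, 7.0, 10.0, 15.0, 23.0])      # made-up feature values

x_scaled = (x - x.min()) / (x.max() - x.min())  # minimum -> 0, maximum -> 1, the rest in between
print(x_scaled)                                 # [0.   0.2  0.35 0.6  1.  ]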
Bias:-

It is a lack of objectivity when looking at something. The bias can be both intentional and
unintentional. For example, a person may like one shirt more than two others when given a choice
because the shirt they picked is also their favorite color.
Statistical bias is anything that leads to a systematic difference between the true parameters
of a population and the statistics used to estimate those parameters.
Bias is a statistical term which means a systematic deviation from the actual value. It can arise
from the sampling procedure and may cause serious problems for the researcher, since a mere
increase in sample size cannot reduce it. Bias is the difference between the expected value of an
estimator and the real value of the parameter. In this section we discuss the classification of
bias and its different types.

Bias Definition in Statistics


In statistics, bias describes a tendency of the measurement or estimation process: it is the
systematic over- or underestimation of the value of a population parameter. For example, suppose
you have a rule (an estimator) for evaluating the mean of a population. Ideally, the estimate
produced by the rule is a true reflection of the population. For a biased estimator, however,
there is a systematic difference between the true value and the statistically expected value of
the population parameter.
Types of Bias
The following are the different types of biases, which are listed below-

 Selection Bias
 Spectrum Bias
 Cognitive Bias
 Data-Snooping Bias
 Omitted-Variable Bias
 Exclusion Bias
 Analytical Bias
 Reporting Bias
 Funding Bias

Classification of Bias
The bias is mainly categorized into two different types

Measurement Bias
Measurement bias arises while the survey is being carried out, chiefly for three reasons –
(i) The error happens while recording the data
While recording data, errors happen due to the malfunction of instruments that are used for data
collection or because of ineffective handling of these tools by the researchers concerned with data
collection.
(ii) Leading Questions
The questions prepared for the survey might be put in a manner to lead the responses that are
preferred by the researcher. There can be more choices for the preferred retort given than for the
conflicting views.
(iii) Respondents give inadvertently false responses
There can be situations where many respondents misunderstand a question and choose an
incorrect option. For example, if the sample group is composed of numerous senior citizens
who are asked to answer by remembering their previous experiences, they might provide some
false inputs because of lapses of memory.

Non-Representative Sampling Bias


Non-representative sampling bias is also referred to as selection bias. This inaccuracy occurs
when random methods are not used during the selection process, and it results in an excess
representation of some of the elements in the population. Samples collected using convenience
sampling, for example, are prone to this bias. This type of situation is called undercoverage bias.

Variance:-

What Is Variance?
The term variance refers to a statistical measurement of the spread between numbers
in a data set. More specifically, variance measures how far each number in the set is
from the mean (average), and thus from every other number in the set. Variance is often
depicted by this symbol: σ2. It is used by both analysts and traders to
determine volatility and market security.

The square root of the variance is the standard deviation (SD or σ), which helps
determine the consistency of an investment’s returns over a period of time.

 Variance is a measurement of the spread between numbers in a data set.


 In particular, it measures the degree of dispersion of data around the sample's
mean.
 Investors use variance to see how much risk an investment carries and whether
it will be profitable.
 Variance is also used in finance to compare the relative performance of each
asset in a portfolio to achieve the best asset allocation.
 The square root of the variance is the standard deviation.

Understanding Variance

In statistics, variance measures variability from the average or mean. It is calculated by
taking the differences between each number in the data set and the mean, then squaring
the differences to make them positive, and finally dividing the sum of the squares by the
number of values in the data set.

Variance is calculated by using the following formula: σ² = Σ (Xi − μ)² / N for a population,
or s² = Σ (xi − x̄)² / (n − 1) for a sample.


Advantages and Disadvantages of Variance

Statisticians use variance to see how individual numbers relate to each other within a
data set, rather than using broader mathematical techniques such as arranging numbers
into quartiles. The advantage of variance is that it treats all deviations from the mean the
same regardless of their direction. Because the deviations are squared, they cannot sum to
zero and give the false appearance of no variability at all in the data.

One drawback to variance, though, is that it gives added weight to outliers. These are
the numbers far from the mean. Squaring these numbers can skew the data. Another
pitfall of using variance is that it is not easily interpreted. Users often employ it primarily
to take the square root of its value, which indicates the standard deviation of the data.
As noted above, investors can use standard deviation to assess how consistent returns
are over time.

Example of Variance in Finance

Here’s a hypothetical example to demonstrate how variance works. Let’s say returns for
stock in Company ABC are 10% in Year 1, 20% in Year 2, and −15% in Year 3. The
average of these three returns is 5%. The differences between each return and the
average are 5%, 15%, and −20% for each consecutive year.

Squaring these deviations yields 0.25%, 2.25%, and 4.00%, respectively. If we add these
squared deviations, we get a total of 6.5%. When you divide the sum of 6.5% by one less
than the number of returns in the data set (3 − 1 = 2, since this is a sample), it gives us a variance
of 3.25% (0.0325). Taking the square root of the variance yields a standard deviation of
18% (√0.0325 = 0.180) for the returns.

How Do I Calculate Variance?

Follow these steps to compute variance:

1. Calculate the mean of the data.


2. Find each data point's difference from the mean value.
3. Square each of these values.
4. Add up all of the squared values.
5. Divide this sum of squares by n – 1 (for a sample) or N (for the population).
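Applied to the Company ABC returns from the earlier example, those steps might look like this
in Python (the return values are the ones used above):

returns = [0.10, 0.20, -0.15]                        # Year 1, Year 2, Year 3 returns

mean = sum(returns) / len(returns)                   # step 1: 0.05
squared_devs = [(r - mean) ** 2 for r in returns]    # steps 2-3: squared differences
sample_variance = sum(squared_devs) / (len(returns) - 1)   # steps 4-5: divide by n - 1 = 2

print(sample_variance, sample_variance ** 0.5)       # 0.0325 and ~0.18 (the standard deviation)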
What Is Variance Used for?

Variance is essentially the degree of spread in a data set about the mean value of that
data. It shows the amount of variation that exists among the data points. Visually, the
larger the variance, the "fatter" a probability distribution will be. In finance, if something
like an investment has a greater variance, it may be interpreted as more risky or volatile.

Why Is Standard Deviation Often Used More Than Variance?

Standard deviation is the square root of variance. It is sometimes more useful since
taking the square root removes the units from the analysis. This allows for direct
comparisons between different things that may have different units or different
magnitudes. For instance, to say that increasing X by one unit increases Y by two
standard deviations allows you to understand the relationship between X and Y
regardless of what units they are expressed in.

What are the 4 main measures of variability?


Variability is most commonly measured with the following descriptive statistics:

 Range: the difference between the highest and lowest values


 Interquartile range: the range of the middle half of a distribution
 Standard deviation: average distance from the mean
 Variance: average of squared distances from the mean

What’s the Difference between standard deviation and Variance?

Variance is the average squared deviations from the mean, while standard deviation is
the square root of this number. Both measures reflect variability in a distribution, but their
units differ:

 Standard deviation is expressed in the same units as the original values (e.g.,
minutes or meters).
 Variance is expressed in much larger units (e.g., meters squared).

Although the units of variance are harder to intuitively understand, variance is important
in statistical tests.

What is Variance used for in statistics?

Statistical tests such as variance tests or the analysis of variance (ANOVA) use
sample variance to assess group differences of populations. They use the variances of
the samples to assess whether the populations they come from significantly differ from
each other.

What is Homoscedasticity?

Homoscedasticity, or homogeneity of variances, is an assumption of equal or similar variances
in different groups being compared.
This is an important assumption of parametric statistical tests because they are sensitive
to any dissimilarities. Uneven variances in samples result in biased and skewed test
results.

What is Regularization?

Regularization is one of the most important concepts of machine learning. It is a technique
to prevent the model from overfitting by adding extra information to it.

Sometimes a machine learning model performs well with the training data but does not perform
well with the test data: it cannot predict the output accurately when dealing with unseen data
because it has also learned the noise in the training data, and hence the model is called
overfitted. This problem can be dealt with using a regularization technique.

This technique can be used in such a way that it allows us to maintain all variables or
features in the model while reducing their magnitude. Hence, it maintains accuracy as well
as the generalization of the model.

It mainly regularizes or reduces the coefficient of features toward zero. In simple words,
"In regularization technique, we reduce the magnitude of the features by keeping the
same number of features."

How does Regularization Work?

Regularization works by adding a penalty or complexity term to the complex model. Let's
consider the simple linear regression equation:

y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b

In the above equation, Y represents the value to be predicted,

X1, X2, … Xn are the features for Y, and

β0, β1, … βn are the weights or magnitudes attached to the features, respectively. Here
β0 represents the bias of the model, and b represents the intercept.

Linear regression models try to optimize the β values and b to minimize the cost function.
The loss function for linear regression is called RSS, or the residual sum of squares:

RSS = Σ (yi − (β0 + β1xi1 + β2xi2 + ⋯ + βnxin + b))²

Regularization adds a penalty term to this loss function, and the parameters are then
optimized so that the model can predict accurate values of Y without becoming overly complex.

Techniques of Regularization

There are mainly two types of regularization techniques, which are given below:

o Ridge Regression(L2 regularization)


o Lasso Regression(L1 regularization)

Ridge Regression
o Ridge regression is one of the types of linear regression in which a small amount
of bias is introduced so that we can get better long-term predictions.
o Ridge regression is a regularization technique, which is used to reduce the
complexity of the model. It is also called as L2 regularization.
o In this technique, the cost function is altered by adding a penalty term to it. The
amount of bias added to the model is called the Ridge Regression penalty. We can
calculate it by multiplying lambda by the squared weight of each individual feature.
o The equation for the cost function in ridge regression will be:

Cost = Σ (yi − ŷi)² + λ Σ βj²   (i.e. RSS plus λ times the sum of the squared coefficients)

o In the above equation, the penalty term regularizes the coefficients of the model,
and hence ridge regression reduces the amplitudes of the coefficients that
decreases the complexity of the model.
o As we can see from the above equation, if the values of λ tend to zero, the
equation becomes the cost function of the linear regression model. Hence,
for the minimum value of λ, the model will resemble the linear regression model.
o A general linear or polynomial regression will fail if there is high collinearity
between the independent variables, so to solve such problems, Ridge regression
can be used.
o It helps to solve the problems if we have more parameters than samples.
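For illustration, a minimal ridge regression sketch with scikit-learn, using randomly generated
data; sklearn's alpha parameter plays the role of λ:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # 100 samples, 5 features (made-up data)
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0)                       # alpha is the penalty strength (λ)
ridge.fit(X, y)
print(ridge.coef_)                             # coefficients shrunk toward, but not exactly to, zero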
Lasso Regression(L1 Regularization):-
o Lasso regression is another regularization technique to reduce the complexity of
the model. It stands for Least Absolute Shrinkage and Selection Operator.
o It is similar to the Ridge Regression except that the penalty term contains only the
absolute weights instead of a square of weights.
o Since it takes absolute values, hence, it can shrink the slope to 0, whereas Ridge
Regression can only shrink it near to 0.
o It is also called L1 regularization. The equation for the cost function of Lasso
regression will be:

Cost = Σ (yi − ŷi)² + λ Σ |βj|   (i.e. RSS plus λ times the sum of the absolute coefficients)

o Some of the coefficients in this technique are shrunk exactly to zero, so the
corresponding features are completely neglected for model evaluation.
o Hence, Lasso regression can help us to reduce overfitting in the model as well as
perform feature selection.
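A comparable sketch with Lasso, again on made-up data, showing how some coefficients are
shrunk exactly to zero and the corresponding features dropped:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # only the first two features actually matter
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.5)                       # a larger alpha forces more coefficients to 0
lasso.fit(X, y)
print(lasso.coef_)                             # some entries are exactly 0 (built-in feature selection)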

Difference between Ridge Regression and Lasso Regression


o Ridge regression is mostly used to reduce the overfitting in the model, and it
includes all the features present in the model. It reduces the complexity of the
model by shrinking the coefficients.
o Lasso regression helps to reduce the overfitting in the model as well as feature
selection.

Cross-validation:-
Cross-validation is a resampling method that uses different portions of the data to test
and train a model on different iterations. It is mainly used in settings where the goal is
prediction, and one wants to estimate how accurately a predictive model will perform in
practice.
Cross-Validation is a resampling technique with the fundamental idea of splitting the
dataset into 2 parts- training data and test data. Train data is used to train the model and
the unseen test data is used for prediction.
Cross-validation is classified into two broad categories – Non-exhaustive and Exhaustive
Methods.
Non-exhaustive Methods

Non-exhaustive cross-validation methods, as the name suggests, do not compute all possible
ways of splitting the original data. Let us go through these methods to get a clearer understanding.

Holdout method

This is a quite basic and simple approach in which we divide our entire dataset into two
parts: training data and testing data. As the name suggests, we train the model on the
training data and then evaluate it on the testing set. Usually the size of the training data
is set to more than twice that of the testing data, so the data is split in a ratio of 70:30 or 80:20.

In this approach, the data is first shuffled randomly before splitting. As the model is trained
on a different combination of data points, the model can give different results every time
we train it, and this can be a cause of instability. Also, we can never assure that the train
set we picked is representative of the whole dataset.

Also when our dataset is not too large, there is a high possibility that the testing data may
contain some important information that we lose as we do not train the model on the
testing set.

The hold-out method is good to use when you have a very large dataset, you’re on a time
crunch, or you are starting to build an initial model in your data science project.
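A hedged sketch of the hold-out split using scikit-learn's train_test_split (the 80:20 ratio,
model and random data are purely illustrative):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = np.random.rand(1000, 3)                    # made-up feature matrix
y = (X[:, 0] + X[:, 1] > 1).astype(int)        # made-up binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)   # shuffle, then split 80:20

model = LogisticRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))             # accuracy on the held-out 20%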

K fold cross validation:-

Cross-validation is a resampling procedure used to evaluate machine learning models on a
limited data sample. The procedure has a single parameter called k that refers to the number
of groups that a given data sample is to be split into.
K-fold cross validation is one way to improve the holdout method. This method
guarantees that the score of our model does not depend on the way we picked the train
and test set. The data set is divided into k number of subsets and the holdout method is
repeated k number of times. Let us go through this in steps:

1. Randomly split your entire dataset into k number of folds (subsets)


2. For each fold in your dataset, build your model on the other k – 1 folds of the dataset.
Then, test the model to check its effectiveness on the kth fold.
3. Repeat this until each of the k folds has served as the test set.
4. The average of your k recorded accuracies is called the cross-validation accuracy
and will serve as your performance metric for the model.
Because it ensures that every observation from the original dataset has the chance of
appearing in the training and test sets, this method generally results in a less biased model
compared to other methods. It is one of the best approaches if we have limited input data.
The disadvantage of this method is that the training algorithm has to be rerun from scratch
k times, which means it takes k times as much computation to make an evaluation.
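A minimal K-fold sketch with scikit-learn (k = 5, and the classifier and data are illustrative):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 3)                     # made-up data
y = (X[:, 0] > 0.5).astype(int)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])   # train on k - 1 folds
    scores.append(model.score(X[test_idx], y[test_idx]))           # test on the held-out fold

print(np.mean(scores))                         # cross-validation accuracy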

Stratified K Fold Cross Validation

Using K Fold on a classification problem can be tricky. Since we are randomly shuffling
the data and then dividing it into folds, chances are we may get highly imbalanced folds,
which may cause our training to be biased. For example, we may get a fold in which the
majority of points belong to one class (say positive) and only a few belong to the negative
class. This would certainly ruin our training, and to avoid it we make stratified folds using
stratification.

Stratification is the process of rearranging the data so as to ensure that each fold is a
good representative of the whole. For example, in a binary classification problem where
each class comprises 50% of the data, it is best to arrange the data such that in every
fold each class comprises about half the instances.
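A sketch of stratified folds on made-up, imbalanced labels; each test fold keeps roughly the
same class ratio as the whole dataset:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(100, 2)
y = np.array([0] * 90 + [1] * 10)              # imbalanced: 90% negative, 10% positive

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    print(np.bincount(y[test_idx]))            # each fold of 20 holds about 18 negatives and 2 positives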

Exhaustive Methods

Exhaustive cross-validation methods test all possible ways to divide the original sample
into a training and a validation set.

Leave-P-Out cross validation

When using this exhaustive method, we take p number of points out from the total number
of data points in the dataset(say n). While training the model we train it on these (n – p)
data points and test the model on p data points. We repeat this process for all the possible
combinations of p from the original dataset. Then to get the final accuracy, we average
the accuracies from all these iterations.

This is an exhaustive method as we train the model on every possible combination of


data points. Remember if we choose a higher value for p, then the number of
combinations will be more and we can say the method gets a lot more exhaustive.

Leave-one-out cross validation

This is a simple variation of Leave-P-Out cross validation and the value of p is set as one.
This makes the method much less exhaustive as now for n data points and p = 1, we
have n number of combinations.
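A minimal LOOCV sketch; with n data points there are exactly n train/test splits (the tiny
linear dataset is made up):

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

X = np.arange(10).reshape(-1, 1).astype(float)     # 10 made-up points
y = 2 * X.ravel() + 1

errors = []
for train_idx, test_idx in LeaveOneOut().split(X): # n iterations, one point held out each time
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append((model.predict(X[test_idx])[0] - y[test_idx][0]) ** 2)

print(len(errors), np.mean(errors))                # 10 splits, average squared error ≈ 0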

What is Rolling Cross Validation?

For time-series data the above-mentioned methods are not the best ways to evaluate the
models. Here are two reasons as to why this is not an ideal way to go:

1. Shuffling the data messes up the temporal order of the data, as it will disrupt the
order of events.
2. Using cross-validation, there is a chance that we train the model on future data
and test on past data, which breaks the golden rule of time series: “peeking
into the future is not allowed”.

1. What is the purpose of cross validation?

The purpose of cross–validation is to test the ability of a machine learning model to predict
new data. It is also used to flag problems like overfitting or selection bias and gives
insights on how the model will generalize to an independent dataset.

2. How do you explain cross validation?

Cross-validation is a statistical method used to estimate the performance (or accuracy)


of machine learning models. It is used to protect against overfitting in a predictive model,
particularly in a case where the amount of data may be limited. In cross-validation, you
make a fixed number of folds (or partitions) of the data, run the analysis on each fold, and
then average the overall error estimate.

3. What are the types of cross validation?

The 4 Types of Cross Validation are:

 Holdout Method
 K-Fold Cross-Validation
 Stratified K-Fold Cross-Validation
 Leave-P-Out Cross-Validation

4. What is cross validation and why we need it?

Cross-Validation is a very useful technique to assess the effectiveness of a machine


learning model, particularly in cases where you need to mitigate overfitting. It is also of
use in determining the hyperparameters of your model, in the sense that which
parameters will result in the lowest test error.

5. Does cross validation reduce Overfitting?

Cross-validation is a procedure that is used to avoid overfitting and estimate the skill of
the model on new data. There are common tactics that you can use to select the value of
k for your dataset.

Grid Search CV:-

What is GridSearchCV?
GridSearchCV is the process of performing hyperparameter tuning in order to determine
the optimal values for a given model. As mentioned above, the performance of a model
significantly depends on the value of hyperparameters. Note that there is no way to know
in advance the best values for hyperparameters so ideally, we need to try all possible
values to know the optimal values. Doing this manually could take a considerable amount
of time and resources and thus we use GridSearchCV to automate the tuning of
hyperparameters.

GridSearchCV is a function that comes in Scikit-learn’s (sklearn) model_selection
package. So an important point to note here is that we need to have the scikit-learn library
installed on the computer. This function helps to loop through predefined
hyperparameters and fit your estimator (model) on your training set. So, in the end, we
can select the best parameters from the listed hyperparameters.

How does GridSearchCV work?

As mentioned above, we pass predefined values for hyperparameters to the


GridSearchCV function. We do this by defining a dictionary in which we mention a
particular hyperparameter along with the values it can take. Here is an example of it

{'C': [0.1, 1, 10, 100, 1000],
 'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
 'kernel': ['rbf', 'linear', 'sigmoid']}

Here C, gamma and kernel are some of the hyperparameters of an SVM model. Note
that the rest of the hyperparameters will be set to their default values.

GridSearchCV tries all the combinations of the values passed in the dictionary and
evaluates the model for each combination using the Cross-Validation method. Hence
after using this function we get accuracy/loss for every combination of hyperparameters
and we can choose the one with the best performance.
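Putting the dictionary above to work might look like the following sketch (the iris dataset
and the 5-fold setting are just for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf', 'linear', 'sigmoid']}

grid = GridSearchCV(SVC(), param_grid, cv=5)   # every combination is evaluated with 5-fold CV
grid.fit(X, y)

print(grid.best_params_)                       # the combination with the best mean CV score
print(grid.best_score_)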

Difference between parameter and hyperparameter

 Parameters are part of the model’s configuration and are internal to the model;
hyperparameters are explicitly specified and control the training process.
 Predictions require the use of parameters; model optimization necessitates the use of
hyperparameters.
 Parameters are specified or estimated while the model is being trained; hyperparameters
are established prior to the start of the model’s training.
 Parameters are internal to the model; hyperparameters are external to the model.
 Parameters are learned and set by the model itself; hyperparameters are set manually by
a machine learning engineer/practitioner.

What is GridSearchCV used for?


GridSearchCV is a technique for finding the optimal parameter values from a given set of
parameters in a grid. It’s essentially a cross-validation technique. The model as well as
the parameters must be entered. After extracting the best parameter values, predictions
are made.

How do you define GridSearchCV?


GridSearchCV is the process of performing hyperparameter tuning in order to determine
the optimal values for a given model.

What does cv in GridSearchCV stand for?


GridSearchCV is also known as GridSearch cross-validation: an internal cross-validation
technique is used to calculate the score for each combination of parameters on the grid.

How do you use GridSearchCV in regression?


GridSearchCV in regression can be used by following the steps below (a sketch follows this list):
1. Import the library – GridSearchCV.
2. Set up the data.
3. Define the model and its parameter grid.
4. Run GridSearchCV and print the results.
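A hedged sketch of those steps for a regression model, here tuning the alpha of a Ridge
regressor on made-up data:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Steps 1-2: import the library and set up some illustrative data
X = np.random.rand(200, 4)
y = X @ np.array([1.5, -2.0, 0.0, 3.0]) + np.random.normal(scale=0.1, size=200)

# Step 3: the model and its parameter grid
param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0]}

# Step 4: run GridSearchCV and print the results
grid = GridSearchCV(Ridge(), param_grid, cv=5, scoring='neg_mean_squared_error')
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)    # best alpha and its mean CV error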

Does GridSearchCV use cross-validation?


GridSearchCV does, in fact, perform cross-validation. The idea is to hide a portion of your
data set from the model so that it can be used for testing: you train your models on the
training data and then test them on the testing data.

What is cross-validation?

Cross-Validation is a technique used in model selection to better estimate the test error
of a predictive model. The idea behind cross-validation is to create a number of partitions
of sample observations, known as the validation sets, from the training data set. After
fitting a model on to the training data, its performance is measured against each validation
set and then averaged, gaining a better assessment of how the model will perform when
asked to predict for new observations. The number of partitions to construct depends on
the number of observations in the sample data set as well as the decision made regarding
the bias-variance trade-off, with more partitions leading to a smaller bias but a higher
variance.
K-Fold cross-validation

This is the most common use of cross-validation. Observations are split into K partitions,
the model is trained on K – 1 partitions, and the test error is predicted on the left out
partition k. The process is repeated for k = 1,2…K and the result is averaged. If K=n, the
process is referred to as Leave One Out Cross-Validation, or LOOCV for short. This
approach has low bias, is computationally cheap, but the estimates of each fold are highly
correlated. In this tutorial we will use K = 5.

CV Error:-

Cross validated error is your best guess for the average error you would see with your
regression model on new data.

The basic idea in calculating cross validation error is to divide up training data into k-
folds (e.g. k=5 or k=10). Each fold will then be held out one at a time, the model will be
trained on the remaining data, and that model will then be used to predict the target for
the holdout observations.
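One common way to obtain this number with scikit-learn is cross_val_score with an error-based
scorer (the 5 folds, model and data below are illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = np.random.rand(100, 3)                     # made-up regression data
y = X @ np.array([2.0, -1.0, 0.5]) + np.random.normal(scale=0.2, size=100)

# Each fold is held out once; the model is trained on the rest and scored on the holdout.
mse_scores = -cross_val_score(LinearRegression(), X, y,
                              cv=5, scoring='neg_mean_squared_error')

print(mse_scores.mean())                       # the cross-validated error: average MSE over the 5 folds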
