Chi-Squared Test - Vishal (21DM217) - Vatsal (21DM216) - Preeti (21DM242) - Absent on 4th & 11th July.
Chi-Square, RFE, IG
Chi-Squared Test
A low value for chi-square means there is a high correlation between your two sets of data: the observed counts lie close to the expected counts. In theory, if your observed and expected values were equal (“no difference”) then chi-square would be zero, which is highly unlikely to happen.
You could also use a p-value. First state the null hypothesis and the alternate hypothesis. Then generate a chi-square curve for your results along with a p-value.
Example question: 256 visual artists were surveyed to find out their zodiac sign. The results were: Aries (29), Taurus
(24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius
(20), Pisces (23). Test the hypothesis that zodiac signs are evenly distributed across visual artists.
Step 2: Fill in your categories. Categories should be given to you in the question. There are 12 zodiac signs, so the table has 12 rows.
Step 3: Write your counts. Counts are the number of items in each category; write them in column 2. You’re given the counts in the question.
Step 4: Calculate your expected value for column 3. In this question, we would expect the 12 zodiac signs to be evenly distributed for all 256 people, so 256/12=21.333. Write this in column 3.
Step 5: Subtract the expected value (Step 4) from the observed value (Step 3) and place the result in the “Residual” column. For example, the first row is Aries: 29-21.333=7.667.
Step 6: Square your results from Step 5 and place the amounts in the (Obs-Exp)² column.
Step 7: Divide the amounts in Step 6 by the expected value (Step 4) and place those results in the final column. Finally add up (sum) all the values in the last column.
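The table-building steps above can be sketched in a few lines of Python. This is only an illustration using the counts from the example question; the variable names (`observed`, `expected`, `chi_square`) are our own and not part of the original slides.

```python
# Observed counts from the example question (256 visual artists).
observed = {
    "Aries": 29, "Taurus": 24, "Gemini": 22, "Cancer": 19,
    "Leo": 21, "Virgo": 18, "Libra": 19, "Scorpio": 20,
    "Sagittarius": 23, "Capricorn": 18, "Aquarius": 20, "Pisces": 23,
}

total = sum(observed.values())       # 256 people in total
expected = total / len(observed)     # Step 4: 256 / 12 = 21.333...

chi_square = 0.0
for sign, count in observed.items():
    residual = count - expected      # Step 5: observed minus expected
    squared = residual ** 2          # Step 6: (Obs-Exp) squared
    component = squared / expected   # Step 7: divide by expected
    chi_square += component          # Step 7: running sum of last column
    print(f"{sign:12s} {count:3d} {expected:7.3f} {residual:7.3f} "
          f"{squared:7.3f} {component:6.3f}")

print(f"Chi-square statistic: {chi_square:.3f}")  # about 5.094
```

With 12 categories there are 11 degrees of freedom. To turn the statistic into a p-value you would compare it against the chi-square distribution with 11 degrees of freedom, for instance with a statistics library.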
What is it?
Information gain measures the reduction in entropy, or surprise, obtained by splitting a dataset according to a given value of a random variable.
Information...
Information quantifies how surprising an event is, measured in bits. Lower-probability events carry more information; higher-probability events carry less.
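As a small illustration of this idea, the information of an event with probability p is commonly computed as -log2(p). The sketch below assumes that definition; the function name `information` is our own.

```python
import math

def information(p):
    """Information, in bits, of an event that occurs with probability p."""
    return -math.log2(p)

# A fair coin flip (p = 0.5) carries exactly one bit of information.
print(information(0.5))
# A rarer event, such as one face of a fair die, carries more.
print(information(1 / 6))
```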
Entropy
Example
For example, in a binary classification problem (two
classes), we can calculate the entropy of the data sample
as follows:
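The original formula did not survive on this slide. As a hedged sketch, the standard Shannon entropy for two classes with probabilities p(0) and p(1) is -(p(0) * log2(p(0)) + p(1) * log2(p(1))). The helper name `entropy` below is our own.

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits, of a list of class probabilities."""
    # Classes with probability 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A perfectly balanced binary sample is maximally uncertain: 1 bit.
print(entropy([0.5, 0.5]))
# A skewed sample is less surprising, so its entropy is lower.
print(entropy([0.9, 0.1]))
```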
Syntax
Information Gain
Example
Predicting the gender of an unborn baby.
Example
Predict whether there will be a golf game today or not.
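The data for these examples is not included in the slides, so the sketch below uses made-up toy labels (1 = game played, 0 = no game) purely to show how information gain is computed: the entropy of the whole sample minus the weighted entropy of the groups produced by a split. The function names are our own.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, groups):
    """Entropy of the parent minus the size-weighted entropy of the groups."""
    n = len(parent)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(parent) - weighted

# Hypothetical toy labels, NOT taken from the original examples.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
perfect_split = [[1, 1, 1, 1], [0, 0, 0, 0]]
noisy_split = [[1, 1, 1, 0], [1, 0, 0, 0]]

print(information_gain(parent, perfect_split))  # split removes all uncertainty
print(information_gain(parent, noisy_split))    # split removes only some
```

A split that separates the classes completely recovers the full entropy of the parent, while a noisier split yields a smaller gain.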
What is it?
Feature elimination in machine learning refers to choosing a subset of relevant features from the dataset to use in further model construction, keeping the optimum number needed to assure peak performance.
RFE (recursive feature elimination) can be used with the two types of model listed below:
Classification: Classification predicts the class of selected data points. Classes are also known as targets, labels, or categories. Classification predictive modeling involves approximating a mapping function (f) from input variables (X) to discrete output variables (y).
Regression: Regression models supply a function describing the relationship between one (or more) independent variables and a response, dependent, or target variable.
You and your five friends are trying to decide whether to go out to eat. The following factors come up for consideration:
Who is hungry enough to eat a full meal
How people’s available funds are holding up
How late people can stay up
What kind of food people want
The location and types of local eateries
How late people want to stay out
Who has a car
Steps
Data Preparation
To start with, we will import the following libraries.
Example
1. To show how this works in practice, we’ll start with a
contrived example using a dataset that has only 3
informative features out of 25.
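The code for this example is not included in the slides. A minimal sketch of such a contrived setup, assuming scikit-learn is available, might look like the following. The choice of `LogisticRegression` as the estimator, the sample size, and the random seed are our own assumptions, not the original authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Contrived dataset: 25 features, of which only 3 are informative.
# shuffle=False keeps the informative features in the first 3 columns.
X, y = make_classification(
    n_samples=500,
    n_features=25,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

# Repeatedly fit the model and drop the weakest feature (step=1)
# until only 3 features remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
selector.fit(X, y)

print("Selected feature mask:", selector.support_)
print("Feature ranking (1 = kept):", selector.ranking_)
```

Features ranked 1 are the ones RFE kept. Whether they match the three truly informative columns depends on the estimator and the data, so treat the result as something to inspect rather than assume.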