Unit 1 TE Honours
For example, the sum of the data set (2, 3, 4, 5, 6) is 20, so the mean is 4 (20/5). The mode of a data set is the value that appears most often, and the median is the value situated in the middle of the ordered data set: it separates the higher half of the values from the lower half. There are also less common types of descriptive statistics that are still very important.
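The small calculation above can be checked with Python's standard `statistics` module (a quick illustrative sketch, not part of the original text):

```python
import statistics

data = [2, 3, 4, 5, 6]

print(sum(data))               # 20
print(statistics.mean(data))   # 4
print(statistics.median(data)) # 4

# Every value in `data` occurs exactly once, so it has no meaningful mode;
# a data set like [2, 3, 3, 5] has a clear mode of 3.
print(statistics.mode([2, 3, 3, 5]))  # 3
```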
Inferential statistics are among the most useful tools for making educated
predictions about how a set of data will scale when applied to a larger
population of subjects. These statistics help set a benchmark for hypothesis
testing, as well as a general idea of where specific parameters will land
when scaled to a larger data set, such as the larger set’s mean.
The Importance of Statistics in Machine Learning
Geeta Kakrani
Jul 8, 2023
Introduction:
Machine learning has revolutionized numerous industries, enabling
computers to analyze vast amounts of data and make accurate
predictions or decisions. While machine learning algorithms and
models are at the core of this technology, it is crucial to recognize the
pivotal role that statistics plays in driving the success and
effectiveness of machine learning. In this blog post, we will delve
into the significance of statistics in machine learning and explore its
various applications and benefits.
Introduction
A measure of central tendency is a single value that attempts to describe a set of
data by identifying the central position within that set of data. As such, measures of
central tendency are sometimes called measures of central location. They are also
classed as summary statistics. The mean (often called the average) is most likely the
measure of central tendency that you are most familiar with, but there are others,
such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate
to use than others. In the following sections, we will look at the mean, mode and
median, and learn how to calculate them and under what conditions they are most
appropriate to be used.
Mean (Arithmetic)
The mean (or average) is the most popular and well known measure of central
tendency. It can be used with both discrete and continuous data, although its use is
most often with continuous data (see our Types of Variable guide for data types).
The mean is equal to the sum of all the values in the data set divided by the number
of values in the data set. So, if we have n values in a data set and they have
values x1, x2, ..., xn, the sample mean, usually denoted by x̄ (pronounced "x
bar"), is:
x̄ = (x1 + x2 + ... + xn) / n
This formula is usually written in a slightly different manner using the Greek capital
letter, Σ, pronounced "sigma", which means "sum of...":
x̄ = (Σx) / n
You may have noticed that the above formula refers to the sample mean. So, why
have we called it a sample mean? This is because, in statistics, samples and
populations have very different meanings and these differences are very important,
even if, in the case of the mean, they are calculated in the same way. To
acknowledge that we are calculating the population mean and not the sample mean,
we use the Greek lower case letter "mu", denoted as μ:
μ = (Σx) / N
where N is the number of values in the population.
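The sample mean formula can be sketched directly in Python (an illustrative snippet; the data values are made up for the example):

```python
def mean(values):
    """x-bar = (sum of the values) / (number of values)."""
    return sum(values) / len(values)

sample = [4, 8, 15, 16, 23, 42]
print(mean(sample))  # 18.0
```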
The mean is essentially a model of your data set. It is the value that is most
common. You will notice, however, that the mean is not often one of the actual
values that you have observed in your data set. However, one of its important
properties is that it minimises error in the prediction of any one value in your data
set. That is, it is the value that produces the lowest amount of error from all other
values in the data set.
An important property of the mean is that it includes every value in your data set as
part of the calculation. In addition, the mean is the only measure of central tendency
where the sum of the deviations of each value from the mean is always zero.
The mean has one main disadvantage: it is particularly susceptible to the influence
of outliers. These are values that are unusual compared to the rest of the data set by
being especially small or large in numerical value. For example, consider the wages
of staff at a factory below:
Staff:   1    2    3    4    5    6    7    8    9    10
Salary:  15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data
suggests that this mean value might not be the best way to accurately reflect the
typical salary of a worker, as most workers have salaries in the $12k to 18k range.
The mean is being skewed by the two large salaries. Therefore, in this situation, we
would like to have a better measure of central tendency. As we will find out later,
taking the median would be a better measure of central tendency in this situation.
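Using the salary figures from the table (in thousands of dollars), a quick Python sketch shows the effect of the two outliers on the mean versus the median:

```python
import statistics

# Salaries of the ten staff, in thousands of dollars (from the table above).
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

# The two large salaries (90k, 95k) drag the mean well above the typical wage,
# while the median stays inside the 12k-18k range where most workers sit.
print(statistics.mean(salaries))    # 30.7
print(statistics.median(salaries))  # 15.5
```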
Another time when we usually prefer the median over the mean (or mode) is when
our data is skewed (i.e., the frequency distribution for our data is skewed). If we
consider the normal distribution - as this is the most frequently assessed in statistics
- when the data is perfectly normal, the mean, median and mode are identical.
Moreover, they all represent the most typical value in the data set. However, as the
data becomes skewed the mean loses its ability to provide the best central location
for the data because the skewed data is dragging it away from the typical value.
However, the median best retains this position and is not as strongly influenced by
the skewed values. This is explained in more detail in the skewed distribution section
later in this guide.
Median
The median is the middle score for a set of data that has been arranged in order of
magnitude. The median is less affected by outliers and skewed data. In order to
calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark - in this case, 56. It is the
middle mark because there are 5 scores before it and 5 scores after it. This works
fine when you have an odd number of scores, but what happens when you have an
even number of scores? What if you had only 10 scores? Well, you simply have to
take the middle two scores and average the result. So, if we look at the example
below:
65 55 89 56 35 14 56 55 87 45
14 35 45 55 55 56 56 65 87 89
Now we have to take the 5th and 6th scores in our data set and average them to
get a median of 55.5.
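Both cases above (an odd and an even number of scores) can be verified with the standard library (an illustrative sketch):

```python
import statistics

# The two data sets from the worked examples above.
odd = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

# median() sorts internally; with an even count it averages the middle two.
print(statistics.median(odd))   # 56
print(statistics.median(even))  # 55.5
```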
Mode
The mode is the most frequent score in our data set. On a bar chart or histogram it
is represented by the highest bar. You can, therefore, sometimes consider
the mode as being the most popular option. An example of a mode is presented
below:
Normally, the mode is used for categorical data where we wish to know which is the
most common category, as illustrated below:
We can see above that the most common form of transport, in this particular data
set, is the bus. However, one of the problems with the mode is that it is not unique,
so it leaves us with problems when we have two or more values that share the
highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data.
This is particularly problematic when we have continuous data because we are
unlikely to have any one value that is more frequent than another. For example,
consider measuring the weights of 30 people (to the nearest 0.1 kg). How likely is it
that we will find two or more people with exactly the same weight (e.g., 67.4 kg)?
The answer is: probably very unlikely. Many people might be close, but with such a
small sample (30 people) and a large range of possible weights, you are unlikely to
find two people with exactly the same weight to the nearest 0.1 kg. This is why
the mode is very rarely used with continuous data.
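The mode and the non-uniqueness problem can be illustrated in Python. The data values below are hypothetical stand-ins for the missing charts (the transport counts are assumptions, not taken from the original figures):

```python
import statistics

# Hypothetical categorical data: "bus" is the most common category.
transport = ["bus", "car", "bus", "walk", "bus", "car"]
print(statistics.mode(transport))  # bus

# With two values sharing the highest frequency the mode is not unique;
# multimode() returns every value that shares the highest count.
tied = [3, 3, 7, 7, 9]
print(statistics.multimode(tied))  # [3, 7]
```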
Another problem with the mode is that it will not provide us with a very good measure
of central tendency when the most common mark is far away from the rest of the
data in the data set, as depicted in the diagram below:
In the above diagram the mode has a value of 2. We can clearly see, however, that
the mode is not representative of the data, which is mostly concentrated around the
20 to 30 value range. To use the mode to describe the central tendency of this data
set would be misleading.
We often test whether our data is normally distributed because this is a common
assumption underlying many statistical tests. An example of a normally distributed
set of data is presented below:
When you have a normally distributed sample you can legitimately use either the
mean or the median as your measure of central tendency. In fact, in any symmetrical
distribution the mean, median and mode are equal. However, in this situation, the
mean is widely preferred as the best measure of central tendency because it is the
measure that includes all the values in the data set for its calculation, and any
change in any of the scores will affect the value of the mean. This is not the case
with the median or mode.
However, when our data is skewed, for example, as with the right-skewed data set
below:
We find that the mean is being dragged in the direction of the skew. In these situations,
the median is generally considered to be the best representative of the central
location of the data. The more skewed the distribution, the greater the difference
between the median and mean, and the greater emphasis should be placed on using
the median as opposed to the mean. A classic example of the above right-skewed
distribution is income (salary), where higher-earners provide a false representation of
the typical income if expressed as a mean and not a median.
If you expected to be dealing with a normal distribution, but tests of normality show
that the data is non-normal, it is customary to use the median instead of the mean.
However, this is more a rule of thumb than a strict guideline. Sometimes, researchers wish to report the
mean of a skewed distribution if the median and mean are not appreciably different
(a subjective assessment), and if it allows easier comparisons to previous research
to be made.
Link https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-
median.php
Let’s learn about the measure of dispersion in statistics, its types, formulas, and
examples in detail.
Dispersion in Statistics
Dispersion in statistics is a way to describe how spread out or scattered the data is
around an average value. It helps to understand if the data points are close together or
far apart.
Dispersion shows the variability or consistency in a set of data. There are different
measures of dispersion like range, variance, and standard deviation.
Measure of Dispersion in Statistics
Measures of dispersion quantify the scattering of the data: they tell us how the
values are spread out within the data set. In other words, these measures capture
the variation between the different values of the data.
Types of Measures of Dispersion
Measures of dispersion can be classified into the following two types:
Absolute Measure of Dispersion
Relative Measure of Dispersion
Each of these types can be further divided into the various specific measures
described below.
Let’s learn about them in detail.
Absolute Measure of Dispersion
The measures of dispersion that are measured and expressed in the units of data
themselves are called Absolute Measure of Dispersion. For example – Meters,
Dollars, Kg, etc.
Some absolute measures of dispersion are:
Range: It is defined as the difference between the largest and the smallest value in the
distribution.
Mean Deviation: It is the arithmetic mean of the absolute differences between the
values and their mean.
Standard Deviation: It is the square root of the arithmetic average of the square of
the deviations measured from the mean.
Variance: It is defined as the average of the square deviation from the mean of the
given data set.
Quartile Deviation: It is defined as half of the difference between the third quartile
and the first quartile in a given data set.
Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles is
called the Interquartile Range. Its formula is given as Q3 – Q1.
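The absolute measures listed above can be sketched with Python's standard `statistics` module (an illustrative example, not part of the original text; note that `quantiles()` uses the "exclusive" method by default, so the quartiles may differ slightly from hand-computed textbook values):

```python
import statistics

data = [10, 20, 15, 0, 100]

rng = max(data) - min(data)                            # range: R = L - S
mu = statistics.mean(data)                             # mean = 29
mean_dev = sum(abs(x - mu) for x in data) / len(data)  # mean deviation
var = statistics.pvariance(data)                       # population variance
std = statistics.pstdev(data)                          # population standard deviation
q1, _, q3 = statistics.quantiles(data, n=4)            # first and third quartiles
iqr = q3 - q1                                          # interquartile range
quartile_dev = iqr / 2                                 # quartile deviation

print(rng)       # 100
print(mean_dev)  # 28.4
```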
Relative Measure of Dispersion
We use relative measures of dispersion to measure the two quantities that have
different units to get a better idea about the scattering of the data.
Here are some of the relative measures of dispersion:
Coefficient of Range: It is defined as the ratio of the difference between the highest
and lowest value in a data set to the sum of the highest and lowest value.
Coefficient of Variation: It is defined as the ratio of the standard deviation to the
mean of the data set. We use percentages to express the coefficient of variation.
Coefficient of Mean Deviation: It is defined as the ratio of the mean deviation to the
value of the central point of the data set.
Coefficient of Quartile Deviation: It is defined as the ratio of the difference between
the third quartile and the first quartile to the sum of the third and first quartiles.
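Two of these relative measures can be sketched in Python (an illustrative snippet reusing a data set from the range examples later in this guide):

```python
import statistics

data = [20, 24, 31, 17, 45, 39, 51, 61]

# Coefficient of range: (L - S) / (L + S), a unit-free ratio.
highest, lowest = max(data), min(data)
coeff_range = (highest - lowest) / (highest + lowest)

# Coefficient of variation: standard deviation over mean, expressed in %.
mu = statistics.mean(data)  # mean = 36
cv = statistics.pstdev(data) / mu * 100

print(round(coeff_range, 3))  # 0.564
```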
Range of Data Set
The range is the difference between the largest and the smallest values in the
distribution.
Thus, it can be written as
R=L–S
where,
L is the largest value in the Distribution
S is the smallest value in the Distribution
A higher value of range implies higher variation in the data set.
One drawback of this measure is that it takes into account only the maximum and
the minimum values, which might not always properly indicate how the values of the
distribution are scattered.
Example: Find the range of the data set 10, 20, 15, 0, 100.
Solution:
Smallest Value in the data = 0
Largest Value in the data = 100
Thus, the range of the data set is,
R = 100 – 0
R = 100
Note: Range cannot be calculated for the open-ended frequency distributions. Open-
ended frequency distributions are those distributions in which either the lower limit of
the lowest class or the higher limit of the highest class is not defined.
Range for Ungrouped Data
To find the range for the ungrouped data set, first we have to find the smallest and the
largest value of the data set by observing. The difference between them gives the
range of ungrouped data.
We can understand this with the help of the following example:
Example: Find out the range for the following observations, 20, 24, 31, 17, 45, 39,
51, 61.
Solution:
Largest Value = 61
Smallest Value = 17
Thus, the range of the data set is
Range = 61 – 17 = 44
Range for Grouped Data
The range of the grouped data set is found by studying the following example,
Example: Find out the range for the following frequency distribution table for
the marks scored by class 10 students.
Marks Interval    Number of Students
0-10              5
10-20             8
20-30             15
30-40             9
Solution:
For Largest Value: Taking the higher limit of Highest Class = 40
For Smallest Value: Taking the lower limit of Lowest Class = 0
Range = 40 – 0 = 40
Thus, the range of the given data set is 40.
Mean Deviation
Mean deviation measures the deviation of the observations from the mean of the
distribution.
Since the average is the central value of the data, some deviations might be positive
and some might be negative. If they are added like that, their sum will not reveal much
as they tend to cancel each other’s effect.
For example :
Let us consider this set of data : -5, 10, 25
Mean = (-5 + 10 + 25)/3 = 10
Now a deviation from the mean for different values is,
(-5 -10) = -15
(10 – 10) = 0
(25 – 10) = 15
Now adding the deviations, shows that there is zero deviation from the mean which is
incorrect. Thus, to counter this problem only the absolute values of the difference are
taken while calculating the mean deviation.
Mean Deviation Formula:
M.D. = (Σ|x − μ|) / n
where μ is the mean of the data and n is the number of values.
Mean Deviation for Ungrouped Data
For calculating the mean deviation for ungrouped data, the following steps must be
followed:
Step 1: Calculate the arithmetic mean for all the values of the dataset.
Step 2: Calculate the difference between each value of the dataset and the mean. Only
absolute values of the differences will be considered. |d|
Step 3: Calculate the arithmetic mean of these deviations using the formula,
M.D. = (Σ|d|) / n
This can be explained using the example.
Example: Calculate the mean deviation for the given ungrouped data, 2, 4, 6, 8,
10
Solution:
Mean (μ) = (2+4+6+8+10)/5
μ = 6
Absolute deviations |d|: |2−6| = 4, |4−6| = 2, |6−6| = 0, |8−6| = 2, |10−6| = 4
M.D. = (4+2+0+2+4)/5
⇒ M.D. = 12/5 = 2.4
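The three steps above can be sketched as a small Python function (illustrative):

```python
def mean_deviation(values):
    """M.D. = (sum of |x - mu|) / n, the mean of absolute deviations from the mean."""
    mu = sum(values) / len(values)                     # Step 1: arithmetic mean
    deviations = [abs(x - mu) for x in values]         # Step 2: absolute deviations |d|
    return sum(deviations) / len(values)               # Step 3: mean of the deviations

print(mean_deviation([2, 4, 6, 8, 10]))  # 2.4
```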
Chi-Square (χ2) Statistic: What It Is, Examples, How and When to Use
the Test
By
ADAM HAYES
VIKKI VELASQUEZ
For these tests, degrees of freedom are used to determine if a certain null
hypothesis can be rejected based on the total number of variables and
samples within the experiment. As with any statistic, the larger the sample
size, the more reliable the results.
KEY TAKEAWAYS
A χ2 test for independence can tell us how likely it is that random chance
can explain any observed difference between the actual frequencies in the
data and the theoretical expectations.
Goodness-of-Fit
χ2 provides a way to test how well a sample of data matches the (known or
assumed) characteristics of the larger population that the sample is
intended to represent. This is known as goodness of fit.
If the sample data do not fit the expected properties of the population that
we are interested in, then we would not want to use this sample to draw
conclusions about the larger population.
An Example
For example, consider an imaginary coin with exactly a 50/50 chance of
landing heads or tails and a real coin that you toss 100 times. If this coin is
fair, then it will also have an equal probability of landing on either side, and
the expected result of tossing the coin 100 times is that heads will come up
50 times and tails will come up 50 times.
In this case, χ2 can tell us how well the actual results of 100 coin flips
compare to the theoretical model that a fair coin will give 50/50 results.
The actual tosses could come up 50/50, or 60/40, or even 90/10. The farther
away the actual results of the 100 tosses are from 50/50, the worse this set
of tosses fits the theoretical expectation of 50/50, and the more likely we
might conclude that this coin is not actually a fair coin.
A chi-square test is appropriate for this when the data being analyzed are
from a random sample, and when the variable in question is a categorical
variable.
In addition, the chi-square test cannot establish whether one variable has
a causal relationship with another. It can only establish whether two
variables are related.
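The coin example can be sketched numerically. This is an illustration assuming an observed count of 60 heads and 40 tails; the 3.841 cutoff is the standard 5% critical value for one degree of freedom:

```python
# Observed heads/tails in 100 tosses vs. the 50/50 expectation of a fair coin.
observed = [60, 40]
expected = [50, 50]

# Chi-square statistic: sum over categories of (O - E)^2 / E.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # 4.0

# With 1 degree of freedom the 5% critical value is about 3.841;
# 4.0 > 3.841 suggests the coin may not be fair.
```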