Introduction to Inferential Statistics

Inferential Statistics
The primary goal of statistical inference is to draw conclusions about a population
parameter from a sample of that population. That parameter could be a mean, a median, a standard
deviation, a correlation coefficient, or any of several other statistical measures. The purpose of
inferential statistics is to draw a conclusion (an inference) about conditions that exist in a population
(the complete set of observations) by studying a sample (a subset) drawn from that population (King
& Minium, 2018).
Psychologists use inferential statistics to draw conclusions and to make inferences that are based on
the numbers from a research study but that go beyond the numbers. For example, inferential statistics
allow researchers to make inferences about a large group of individuals based on a research study in
which a much smaller number of individuals took part.

Two main concepts of inferential statistics are:

1. Population- A population is the complete set of observations about which an investigator wishes to
draw conclusions.

2. Sample- A sample is a part of the population. In inferential statistics, we try to make inferences
about the population from the sample.

Researchers frequently utilize a process known as hypothesis testing in inference.


First, the investigator poses a research question. After a sample is taken, its mean is computed, and
this mean would almost certainly differ if another sample of the same size were drawn. The relative
frequency distribution of the values of a statistic that an unlimited number of randomly selected
samples of a given size would yield from a given population is called the sampling distribution. In
statistics, the selection procedure is crucial: to obtain a random sample, each element of the population
must have an equal chance of being included in the sample. The fundamental component of this
methodology is the researcher's effort to act as a human randomizer.

Procedures of Inferential Statistics

NHST and Estimation


Null hypothesis significance testing (NHST) is a method of statistical inference by which an experimental factor is tested
against a hypothesis of no effect or no relationship based on a given observation. The
hypothesis that a researcher tests is called the null hypothesis, symbolized H0. It is the
hypothesis that he or she will decide to retain or reject. The approach combines the ideas of
acceptance based on critical rejection areas, developed by Neyman & Pearson in 1928, with
significance testing, developed by Fisher in 1925. The p-value is an index that is used to
reach this conclusion. According to Pernet (2015), NHST refers to using inference to test the
probability of an observed effect against the null hypothesis of no effect/relationship.

The use of NHST and its role in decision‐making in healthcare research is controversial, not
least because it is typically misunderstood and misused. The idea that an intervention is effective, or
that exposure to a risk factor is important, only if the value of p is less than 0.05 is a reductionist view that
does not always reflect clinical importance. There have been frequent calls for a statistics reform
regarding the use of NHST in decision‐making, including no longer using the concept of statistical
significance. Nonetheless, the binary approach to decision‐making is a convenient one, and its use has
remained ubiquitous. Regardless of the future for statistical significance, there are calls for greater
focus on the magnitude and accuracy of the observed effects and their clinical importance. This
ultimately seems sensible and accords with the original intentions of Fisher. However, to do so will
bring challenges not least because of the subjectivity that will exist when interpreting study results.
Journals have been cautious in their approach to calls for a statistics reform, and it appears that
statistical significance will continue to play a role in decision-making in obstetrics and gynecology as
well. This may be acceptable if we continue to educate ourselves in the role of statistics,
including controlling the probability of type I and II errors plus the importance of sample size. In
particular, statisticians, researchers, and clinicians all need to recognize that a statistical answer based
on NHST to the question posed is not necessarily an answer to the scientific question asked. Statistical
inference does not automatically reflect clinical inference (Sedgwick & Hammer, 2022).

A random sample can be drawn by using a table of random numbers, a table of
computer-generated numbers in which every digit from 0 to 9 has an equal chance of appearing. One
can read a table of random numbers as consecutive single digits, consecutive two-digit numbers,
consecutive three-digit numbers, and so on, and the numerals may be read horizontally or vertically.
If we are sampling with replacement and an element's identification number is chosen again, we
include that element again; if we are sampling without replacement and an identification number
resurfaces, we disregard it (King & Minium, 2018).
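
As an illustration of these two selection procedures, the short Python sketch below stands in for a printed table of random numbers; the population of 50 identification numbers is hypothetical.

import random

random.seed(1)                      # fixed seed so the illustration is reproducible
population = list(range(1, 51))     # hypothetical identification numbers 1 through 50

# Sampling WITHOUT replacement: no identification number can appear more than once
sample_without = random.sample(population, k=5)

# Sampling WITH replacement: each draw uses the full population, so repeats are possible
sample_with = [random.choice(population) for _ in range(5)]

print("without replacement:", sample_without)
print("with replacement:   ", sample_with)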

The 6 steps to hypothesis testing are:


1) State the Null and Alternative Hypotheses. Conducting a conventional
hypothesis test requires a precise, testable null hypothesis. In its most basic form, the null
hypothesis (H0) states that two sets of observations are identical, that is, that there is no
difference between the set of observations under examination and the hypothesized or
conjectured set. The comparison need not literally involve two samples; a single sample may,
for example, be compared with a hypothesized population. Conversely, the alternative
hypothesis (HA) states that there is a difference between the two sets of observations; the set
of data under examination may differ from the hypothesized set in either direction or in one
specified direction, depending on how the alternative is stated (King & Minium, 2018).
2) Select an Appropriate Inferential Statistical Test. A common rule of thumb is to use a z
test if the sample size is greater than or equal to 30 and a t test if it is less than 30. You
would then need to determine whether the sample mean differs significantly from the
population mean in order to reject the null hypothesis; this is where the level of
significance becomes relevant (King & Minium, 2018).
3) Select Level of Significance. By choosing a significance level, you are essentially
expressing how comfortable you are with answering a test question incorrectly.
Formally speaking, the probability of making a Type I Error—that is, rejecting the
null hypothesis when it is true—is what determines the level of significance. On the
other hand, accepting the null hypothesis when it is untrue constitutes a Type II error.
A typical significance level (α) for hypothesis testing is 0.05, signifying a 5%
likelihood of committing a Type I error. To indicate a 1% chance of committing a
Type I error, you would therefore probably use an α = 0.01 if you wanted to be more
cautious. (King & Minium,2018)
4) Determine the Rejection Region. Every significance level has a corresponding critical
Z score. A Z score different from 0 indicates that the sample mean lies above or below the
population mean. To ascertain whether the difference is significant, we must check
whether our Z score falls within the rejection region for our particular test. If the test is
non-directional, the rejection region is divided between the two tails of the frequency
distribution, one on each side of the mean; for α = 0.05, each tail contains 0.025 of the
distribution (King & Minium, 2018).
5) Perform the Test. Carry out the test selected in Step 2.
6) Make a Conclusive Statement Based on the Result of the Test. It is critical that your
statement accurately reflects the outcome of the test. Reporting the p-value conveys the
degree of uncertainty attached to the statistical conclusion.
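
As a minimal sketch of these six steps, the Python example below works through a one-sample z test; the population mean, population standard deviation, sample size, and sample mean are hypothetical values chosen only for illustration.

from statistics import NormalDist

# Hypothetical values used only for illustration
mu_0 = 100       # Step 1: H0: mu = 100, HA: mu != 100 (non-directional)
sigma = 15       # population standard deviation (assumed known)
n = 36           # Step 2: sample size >= 30, so a z test is used
x_bar = 106      # observed sample mean
alpha = 0.05     # Step 3: level of significance

# Step 4: rejection region for a two-tailed test at alpha = .05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96

# Step 5: perform the test
se = sigma / n ** 0.5                            # standard error of the mean
z = (x_bar - mu_0) / se

# Step 6: state the conclusion
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value
decision = "reject H0" if abs(z) >= z_crit else "retain H0"
print(f"z = {z:.2f}, p = {p_value:.4f} -> {decision}")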

Random Sampling Distribution of the Mean (RSDM)


The random sampling distribution of the mean (x̄) is a theoretical relative frequency
distribution of all the values of x̄ that would be obtained by chance from an infinite number
of samples of a particular size drawn from a given population. For samples of a specific size,
the sampling distribution of the mean displays every value that the sample mean may take,
along with the probability of each. If the population has a fixed size, there is only a limited
number of possible samples, which allows one to construct the complete sampling
distribution and investigate its characteristics. Consider a population of four scores (2, 4, 6,
and 8) from which samples of size 2 are to be chosen. There are four options for selecting the
first score and four options for selecting the second, so 4 × 4 = 16 samples are possible.
There is a 1/4 chance that the first element will be 2 and a 1/4 chance that the second will be
2; therefore, by the multiplication rule for independent events, the probability of the sample
(2, 2) is the product of the two probabilities, (1/4)(1/4) = 1/16. By the same reasoning, each of
the other samples also has a 1/16 chance of being obtained. Every one of the 16 potential
samples has an equal chance of occurring when samples are chosen in this way, so the
fundamental requirement of random sampling is satisfied. Note that random sampling
produces equal probability of all possible samples, not equal probability of all possible
sample means. When displayed graphically, the population's score distribution can be
compared with the sampling distribution of the mean, which is fitted well by a normal curve
(King & Minium, 2018).
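
The sixteen equally likely samples described above can be listed exhaustively. The Python sketch below builds the complete sampling distribution of the mean for this miniature population of four scores.

from itertools import product
from collections import Counter

population = [2, 4, 6, 8]

# All 4 x 4 = 16 possible samples of size 2, each with probability (1/4)(1/4) = 1/16
samples = list(product(population, repeat=2))

# Relative frequency distribution of the sample mean
mean_counts = Counter(sum(s) / 2 for s in samples)
for m, count in sorted(mean_counts.items()):
    print(f"x-bar = {m}: probability {count}/{len(samples)}")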

Characteristics of Random Sampling Distribution of Means


The endlessly long sequence of samples needed to generate the true relative
frequency distribution of x̄ from randomly selected samples of a population is impractical
in most real-world settings. Mathematicians have nevertheless been able to derive its
characteristics and describe what would happen over an infinite number of trials. The
fact that the samples are selected at random makes this possible. The properties that define a
distribution are its shape, mean, and standard deviation. The mean of any random sampling
distribution of x̄, which is the expected value of the sample mean, equals the mean of the
population of scores from which the samples were drawn. This holds regardless of n, 𝜎, and
the form of the population.
To put it symbolically,

𝜇x̄ = 𝜇x

The standard deviation of the random sampling distribution of the mean, usually referred
to as the standard error of the mean, is determined by the sample size and the population
standard deviation, 𝜎x:

𝜎x̄ = 𝜎x / √n

Because this distribution includes all possible means of samples of a given size, rather than
just some of them, it is itself a population: the population of sample means
(King & Minium, 2018).
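
These two properties can be checked numerically for the miniature population of four scores used earlier, again assuming samples of size 2 drawn with replacement:

from itertools import product

population = [2, 4, 6, 8]
n = 2

# Population mean and standard deviation
mu_x = sum(population) / len(population)
sigma_x = (sum((x - mu_x) ** 2 for x in population) / len(population)) ** 0.5

# Means of all 16 possible samples of size 2 (with replacement)
sample_means = [sum(s) / n for s in product(population, repeat=n)]

# Mean and standard deviation of the sampling distribution of the mean
mu_xbar = sum(sample_means) / len(sample_means)
sigma_xbar = (sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means)) ** 0.5

print(mu_xbar == mu_x)                               # True: mean of the x-bar distribution equals mu
print(abs(sigma_xbar - sigma_x / n ** 0.5) < 1e-12)  # True: standard error equals sigma / sqrt(n)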

If the population of scores is normally distributed, then the sampling distribution of the mean
will also be normally distributed, irrespective of sample size. Rather than a single random
sampling distribution of x̄ corresponding to a given population, there is a family of such
distributions, one for each possible sample size.

According to the central limit theorem, "the approximation to the normal distribution
improves as sample size increases; the random sampling distribution of the mean tends
toward a normal distribution irrespective of the shape of the population of observations
sampled." (King & Minium,2018) .
Figure 1: A Normally Distributed Population of Scores and Random Sampling Distributions
of Means for n = 3 and n = 9.
Sampling With and Without Replacement
There are two sampling techniques for generating a random sample: sampling with
replacement, in which an element may be sampled more than once, and sampling without
replacement, in which an element may be sampled no more than once. Under sampling with
replacement, every score that is selected is recorded and then returned to the population
before the next score is chosen. If there are 50 tickets in a lottery and we choose one, put it
away, draw another, and so on until five tickets are selected, we are sampling without
replacement; within this sampling plan, no element may occur more than once in a sample. If
instead we select a ticket, note its number, and then return it before drawing the next, we are
sampling with replacement; under this design it is possible to draw a sample in which an
element appears more than once. Both approaches satisfy the random sampling criterion, but
sampling without replacement precludes a number of sample outcomes that are possible with
replacement sampling. Sampling without replacement results in a somewhat lower standard
error of the mean. That said, the difference is negligible if the sample is small relative to
the population (King & Minium, 2018).
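
The claim that sampling without replacement yields a somewhat smaller standard error of the mean can be checked by simulation; the sketch below uses a hypothetical finite population of 50 scores and a sample size of 10.

import random
from statistics import mean, pstdev

random.seed(0)
population = list(range(1, 51))   # hypothetical finite population of 50 scores
n = 10
reps = 20_000

# Standard deviation of simulated sample means under each sampling plan
means_without = [mean(random.sample(population, n)) for _ in range(reps)]
means_with = [mean(random.choices(population, k=n)) for _ in range(reps)]

print("standard error without replacement:", round(pstdev(means_without), 3))
print("standard error with replacement:   ", round(pstdev(means_with), 3))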
Testing Hypotheses About the Mean

Testing a Hypothesis about a Single Mean


To answer her research question, "Do sixth-grade students in her school district perform as
well on the mathematics achievement test as students nationally?", Dr. Brown first formulates a
statistical hypothesis. To test it, she asks what sample means would occur by chance, if the
population mean of 85 is correct, when numerous samples of the same size are randomly selected
from her population. Dr. Brown therefore turns to the random sampling distribution of x̄: imagining
the theoretical random sampling distribution of x̄ for an infinite number of samples of size 100
drawn from a population whose mean is 85, she compares her sample mean with the means in this
sampling distribution. The key to any problem in statistical inference is to discover
what sample values will occur by chance in repeated sampling and with what probability.

Null and Alternative Hypothesis


The null hypothesis is a hypothesis about a population parameter (e.g., μx) that a researcher
tests; it is symbolized H0. It is the hypothesis that he or she will decide to retain or reject. In Dr.
Brown's problem, H0: μx = 85. For each null hypothesis there is an alternative hypothesis, denoted
HA: a hypothesis about the population parameter that contradicts the null hypothesis; in research, it
is the hypothesis the researcher wishes to show is true.
Dr. Brown wants to find out whether there is a difference, regardless of direction, so her alternative
hypothesis is expressed as HA: μx ≠ 85. Both H0 and HA are always statements about the population
parameter. The decision to reject or retain always refers to H0 and never to HA; it is H0 that is the
subject of the test.
When Do We Retain or Reject the Null Hypothesis?
When we draw a random sample from a population, our obtained value of x̄ will almost never
exactly equal μx. Whether we reject or retain the null hypothesis depends on the chosen criterion for
differentiating between those values of x̄ that would be common and those that would be rare if H0
were true. Common research practice is to reject H0 if the sample mean is so deviant that its
probability of occurrence by chance in random sampling is .05 or less. This criterion is known as the
level of significance, and the Greek letter α (alpha) stands for it. In other circumstances, the
researcher might want to be even stricter and employ a significance level of .01 or below. The level
of significance is the probability value used as a criterion for deciding that an obtained sample
statistic (x̄) has a low probability of occurring if the null hypothesis is true (resulting in rejection of
the null hypothesis).

Procedures for Hypothesis Testing


In summary, the process for evaluating any statistical hypothesis, be it related to
population characteristics, means, or frequencies, is as follows:
Step 1: A specific hypothesis, called the null hypothesis (H0), is formulated about a
parameter of the population (e.g., 𝜇x) along with an alternative hypothesis (HA).
Step 2: A random sample is drawn from the population of observations, and the value of
the sample statistic (e.g., x̄) is obtained.
Step 3: The random sampling distribution of the statistic under consideration is examined to
learn what sample outcomes would occur by chance over an infinite number of repetitions
(and with what relative frequencies) if H0 is true.
Step 4: H0 is retained if the particular sample outcome is in line with the outcomes expected if
the hypothesis is true; otherwise, it is rejected and HA is accepted.

The Statistical Decision


The final decision may depend on our decision criterion. The choice to "retain" H0 does not imply
that H0 is likely to be true. Instead, this choice only indicates that there is insufficient evidence to
rule out the null hypothesis. Other null hypotheses that might have been proposed would, if tested in
the same manner, likewise have been retained. For example, suppose the hypothesis is H0: μx = 85.
If the sample mean is 86 and it is referred to the hypothesized sampling distribution of x̄, our
decision will be to retain the null hypothesis. But suppose the hypothesis had been H0: μx = 87. If
the same sample result had occurred, x̄ = 86, we would again be led to retain the null hypothesis that
μx = 87.
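
A brief sketch of this point is given below; because the text does not specify the population standard deviation or sample size, the values of sigma and n are hypothetical, chosen so that a sample mean of 86 is consistent with both hypothesized values.

from statistics import NormalDist

x_bar = 86
sigma = 10                                     # hypothetical population standard deviation
n = 25                                         # hypothetical sample size
se = sigma / n ** 0.5
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)    # two-tailed critical value at alpha = .05

for mu_0 in (85, 87):
    z = (x_bar - mu_0) / se
    decision = "reject H0" if abs(z) >= z_crit else "retain H0"
    print(f"H0: mu = {mu_0}: z = {z:+.2f} -> {decision}")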

Choice of HA: One-Tailed and Two-Tailed Tests


In a nondirectional (two-tailed) test, the alternative hypothesis, HA, states that the population
parameter may be either less than or greater than the value stated in H0, and the critical region
is divided between both tails of the sampling distribution. A two-tailed test is conducted when
the alternative hypothesis is nondirectional; it makes it possible to detect a difference between
the parameter's true value and its hypothesized value regardless of the direction of that
difference.
In a directional (one-tailed) test, the alternative hypothesis, HA, states that the population
parameter differs from the value stated in H0 in one particular direction, and the critical
region is located in only one tail of the sampling distribution. In a one-tailed test, the
alternative hypothesis limits what the researcher has a chance to discover. The major
disadvantage of a one-tailed test is that it does not allow for any chance of discovering that
reality is just the opposite of what the alternative hypothesis says. In general, a directional
alternative hypothesis is appropriate only when there is no practical difference in meaning
between retaining the null hypothesis and concluding that a difference exists in a direction
opposite to that stated in the directional alternative hypothesis.
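
At α = .05, the two choices of HA lead to different critical values of z, as the short sketch below shows.

from statistics import NormalDist

alpha = 0.05

# Nondirectional HA: alpha is split between the two tails
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

# Directional HA: all of alpha is placed in one tail
z_one_tailed = NormalDist().inv_cdf(1 - alpha)       # about 1.645

print(round(z_two_tailed, 3), round(z_one_tailed, 3))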

Review of Assumptions in Testing Hypotheses about a Single Mean


The normal curve table is always used when doing a z test. Several requirements must
be met for the normal curve model of statistical inference about single means to be precisely
accurate:
1. A random sample has been drawn from the population.
2. The sample has been drawn by the with-replacement sampling plan.
3. The sampling distribution of x̄ follows the normal curve.
4. The standard deviation of the population of scores is known.

In practice, it is frequently challenging to obtain a truly random sample. If this first
assumption is violated, the sampling distribution's mean and standard deviation could be affected in
unknown ways. It is also common practice to sample without replacement; the consequent error in
inference is quite small as long as the sample size is a small fraction of the population size. The
third assumption is that the sampling distribution of x̄ is normally distributed. When the population's
scores are reasonably close to a normal distribution, this assumption is approximated quite well, and
the central limit theorem helps when the sample size is 25 or greater even if the scores in the
population are not normally distributed. The fourth assumption is that we know σx, the standard
deviation of the population from which our sample is drawn; it appears in the formula for the z test
(it is the numerator of the standard error of the mean). In actual practice, however, we will rarely
know the value of σx (King & Minium, 2018).

Estimating the Standard Error of the Mean When Sigma Is Unknown


The mean of the estimates made from all possible samples equals the value of the parameter
estimated (x̄ is an unbiased estimator of μx; Sx is not an unbiased estimator of σx). An unbiased
estimator in statistics is a crucial tool for accurately estimating population parameters from sample
data. Its essence lies in the expectation that, while individual sample estimates vary around the true
population parameter, the average of these estimates equals the actual parameter value. Take, for
instance, the sample mean, a quintessential example of an unbiased estimator: while any single
sample mean may deviate from the population mean, the average of sample means across all
possible samples precisely equals the population mean. This principle extends to other estimators,
such as the sample variance, albeit with nuances due to the intricacies of variance calculations and
degrees of freedom adjustments. Unbiased estimators serve as cornerstones in statistical analysis,
underpinning the reliability of inferential methods and ensuring the robustness of conclusions drawn
from sample data in representing broader populations. It has been shown that the tendency toward
underestimation is corrected if Σ(X − x̄)² is divided by n − 1 rather than by n. The following formula
incorporates this correction:

sx = √[ Σ(X − x̄)² / (n − 1) ]

When we substitute sx for σx, the result is called the estimated standard error of the mean,
symbolized sx̄:

sx̄ = sx / √n
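
A brief sketch of these two computations, using a small hypothetical data set:

scores = [12, 15, 11, 14, 18, 13]   # hypothetical sample data
n = len(scores)
x_bar = sum(scores) / n

# Sample standard deviation using the n - 1 correction
s_x = (sum((x - x_bar) ** 2 for x in scores) / (n - 1)) ** 0.5

# Estimated standard error of the mean
s_xbar = s_x / n ** 0.5

print(round(s_x, 3), round(s_xbar, 3))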

Degrees of Freedom

The notion of degrees of freedom in statistical analysis refers to the number of values in a
calculation that are free to vary independently. In computing the sample standard deviation sx for a
dataset comprising three scores X1, X2, X3, only two deviation scores can vary freely, because of
the constraint that the sum of the deviations from the mean, Σ(X − x̄), must equal zero.
Consequently, if the values of any two deviation scores are known, the third is fixed, resulting in
n − 1 degrees of freedom for a sample of size n. This principle is pivotal for estimating the
population variance from sample information. Degrees of freedom are also integral to hypothesis
testing, particularly in determining the critical values of t from statistical tables. For instance, with
20 degrees of freedom and a significance level (α) of .05, the critical t values delineating the central
95% of the distribution are ±2.086, illustrating the symmetric nature of the t distribution
around zero (King, Rosopa, & Minium, 2018).
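
Both points can be checked directly. The sketch below verifies that the deviations from the mean sum to zero for a hypothetical three-score sample and, assuming scipy is available, reproduces the critical t value for 20 degrees of freedom.

from scipy.stats import t

# The deviations from the mean always sum to zero, so only n - 1 of them are free to vary
scores = [4, 7, 10]                      # hypothetical sample of three scores
x_bar = sum(scores) / len(scores)
print(sum(x - x_bar for x in scores))    # 0.0

# Critical t value cutting off the central 95% of the distribution with 20 degrees of freedom
print(t.ppf(0.975, df=20))               # about 2.086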

Levels of Significance versus p-Values

In statistical hypothesis testing, the significance level (α) represents the probability of
incorrectly rejecting the null hypothesis (H0) when it is actually true. Researchers should determine α
before conducting tests to control the risk of this error. However, many fail to specify α, although it's
commonly assumed to be no higher than 0.05, a widely accepted criterion in behavioral science
journals. P-values indicate the probability of observing sample data as extreme as or more extreme
than what was obtained, given that H0 is true. Researchers often report p-values instead of α, with
lower values suggesting stronger evidence against H0. For instance, Dr. Brown's study found a p-
value of 0.0062, indicating a rare outcome if H0 were true. If the p-value is less than or equal to α, H0
is rejected. Researchers commonly compare p-values to predefined significance levels such as 0.05,
0.01, or 0.001. They report significance based on whether the p-value falls below or above these
thresholds. For example, reporting that a result is "significant at the 0.01 level" means that the p-value
fell below 0.01, an outcome that would be unlikely if H0 were true. However, the significance levels
reported in this way do not necessarily reflect the exact criterion used for every result; often, the level
of significance actually applied across the different tests is the same one, usually 0.05.

In conclusion, while significance levels and p-values are essential in hypothesis testing,
researchers often interchangeably use them in reporting results, which can lead to confusion. The
reported significance levels may not precisely reflect the thresholds used in the analysis (King &
Minium, 2018).
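
For instance, a two-tailed p-value can be computed directly from an observed z statistic and compared with several conventional significance levels; the z value below is hypothetical, chosen so that the resulting p-value is close to the 0.0062 mentioned above.

from statistics import NormalDist

z = 2.74                                        # hypothetical observed z statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value, roughly 0.006

for alpha in (0.05, 0.01, 0.001):
    decision = "reject H0" if p_value <= alpha else "retain H0"
    print(f"alpha = {alpha}: p = {p_value:.4f} -> {decision}")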
Interpreting Results of Hypothesis Testing

A Statistically Significant Difference versus a Practically Important Difference

In statistical analysis, if the t value is sufficiently large, it indicates statistical significance,
leading to the rejection of the null hypothesis. Significance, in this context, means that based on a
specific decision criterion the null hypothesis has been tested and rejected. The magnitude of the t
value depends on both the numerator and the denominator of its formula. The numerator reflects the
difference between the observed sample mean (x̄) and the hypothesized mean (μhyp), which is
influenced by random variation and by the true difference between μhyp and the actual population
mean; a larger difference between μhyp and the true mean results in a larger t value. The
denominator measures random variation due to sampling. With a large sample size the denominator
becomes small, so even a relatively small difference between x̄ and μhyp can yield a t value large
enough to reject the null hypothesis. However, the practical significance of this difference should be
considered. Dr. Brown's study exemplifies this: even small differences in means may be statistically
significant yet practically inconsequential. For smaller sample sizes, the standard error of the mean
increases, making it harder to detect significant differences unless the true difference between μhyp
and the population mean is substantial. Therefore, determining an appropriate sample size is crucial
in experimental design to ensure the sensitivity of statistical tests.

In essence, while statistical significance is important, researchers must also evaluate the
practical significance of their findings, considering the context of their research questions. The
importance of a difference depends on the specific implications of the research rather than just
statistical calculations (King & Minium, 2018).
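
The influence of sample size on the test statistic can be seen by holding the observed difference and the variability fixed while letting n grow; all numbers in the sketch below are hypothetical.

# Same observed difference and spread, increasing sample size
x_bar, mu_hyp, s_x = 86.0, 85.0, 10.0   # hypothetical values

for n in (25, 100, 400, 1600):
    t_stat = (x_bar - mu_hyp) / (s_x / n ** 0.5)
    print(f"n = {n:4d}: t = {t_stat:.2f}")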

Errors in Hypothesis Testing


In hypothesis testing, there are two potential states: either the null hypothesis (H0) is true
or false, leading to two possible decisions: rejecting or retaining it. This creates four possible
outcomes, where correct decisions occur when we retain H0 when it's true or reject it when it's false.
Errors can arise when our decision contradicts reality, resulting in Type I and Type II errors. A Type I
error occurs when we reject H0, believing it to be false, even though it's true. For instance, using a 5%
significance level, we might reject H0 based on a sample mean that appears rare but could occur due
to random chance. The probability of Type I errors is denoted by the Greek letter α (alpha). Therefore,
in hypothesis testing, it's essential to balance the risk of Type I errors with the significance level
chosen for the test.

Figure 3: Testing H0: H0 is true, but x̄ leads to a Type I error.

If we fail to reject the null hypothesis (H0) when it is false, we commit a Type II error. For
instance, suppose we are testing H0: μx = 150 at a 5% significance level. If we obtain a sample mean
of 152 but the true population mean is 154, our sample mean belongs to the true sampling
distribution centered on 154. However, when it is evaluated against the sampling distribution
centered on 150, as specified by H0, it does not appear sufficiently deviant to reject H0. We
therefore retain H0, unaware that a difference exists. This incorrect decision constitutes a Type II
error: we fail to detect a genuine difference that exists in reality. The probability of committing a
Type II error is indicated by the Greek letter β (beta).

Figure 4: Testing H0: H0 is false, but x̄ leads to a Type II error.

In hypothesis testing, researchers commonly evaluate outcomes based on significance levels
such as 0.05 or 0.01, but alternatives like 0.02 or 0.001 are permissible. The choice of significance
level involves balancing the risks of Type I and Type II errors. For instance, setting α at 0.25 increases
the risk of Type I errors, while reducing α to 0.001 raises the likelihood of Type II errors. Choosing
the significance level involves considering practical consequences, such as the implications of
rejecting the null hypothesis when it's true. In Dr. Brown's case, concluding superiority or inferiority
in mathematics achievement scores without valid evidence could lead to unwarranted actions. Hence,
the acceptable level of risk should be translated into concrete terms. Typically, 0.05 and 0.01 are
common choices as they provide reasonable assurance against Type I errors without overly increasing
the chance of retaining false hypotheses. Importantly, the significance level should be determined
before data collection to maintain objectivity in the analysis.

The Power of a Test


In Dr. Brown's problem, she would be making a correct decision (i.e., rejecting a false null
hypothesis) if she claimed that the mean mathematics achievement score of sixth graders in her school
district was superior to the national norm and it really was superior. Because the probability of
retaining a false null hypothesis is β, the probability of correctly rejecting a false null hypothesis is
1 − β.

The value of 1 − β is called the power of the test. Among several ways of conducting a test,
the most powerful one is the one offering the greatest probability of rejecting H0 when it should be
rejected. Because β and the power of a test are complementary, any condition that decreases β
increases the power of the test, and vice versa. In the next several sections, we will examine the
factors affecting the power of a test (and β) (King & Minium, 2018).
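
As an illustration, β and power can be computed for a z test using the scenario from the Type II error example (H0: μx = 150, true mean 154); the population standard deviation and sample size below are hypothetical.

from statistics import NormalDist

mu_0, mu_true = 150, 154     # hypothesized and true population means (from the example)
sigma, n = 20, 100           # hypothetical population standard deviation and sample size
alpha = 0.05

se = sigma / n ** 0.5
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed test at alpha = .05

# Retention region for x-bar when H0 is assumed true
lower = mu_0 - z_crit * se
upper = mu_0 + z_crit * se

# beta: probability that x-bar falls in the retention region when the true mean is mu_true
true_dist = NormalDist(mu=mu_true, sigma=se)
beta = true_dist.cdf(upper) - true_dist.cdf(lower)
power = 1 - beta

print(f"beta = {beta:.3f}, power = {power:.3f}")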
