ISNEWQB
Subject Code: 22BST4AA
Subject Title: Inferential Statistics
Question Bank (Q.No, Unit, Part, Marks):
Part A (5 marks each): Q3 (Unit 1); Q9 (Unit 2); Q18, Q19, Q20 (Unit 4); Q24, Q25 (Unit 5)
Part B (12 marks each): Q1, Q2, Q3, Q5 (Unit 1); Q6, Q9, Q10 (Unit 2); Q11, Q15 (Unit 3); Q17, Q18, Q19, Q20 (Unit 4); Q21, Q22, Q23, Q24, Q25 (Unit 5)
Part C (15 marks each): Q1 (Unit 1); Q2 (Unit 2); Q3 (Unit 3)

Question texts:
- Explain in detail the t and F testing. (Part B, High)
- Explain prior and posterior distributions. (Part B, High)
- Discuss the steps of the following testing strategies: a) Wilcoxon signed rank test, b) Mann-Whitney U test, c) Sign test, d) Signed rank test. (Part C, Q3, Unit 3, High)

Each question carries a difficulty rating (Medium or High).
Each question is additionally mapped to a course outcome (CO1–CO5) and a Bloom's taxonomy level (Remembering, Understanding, or Applying).
Answer Key
(Maximum of 4 to 5 points)
The power of a statistical test is the probability that the test correctly rejects a false null hypothesis; in other words, it is the ability of the test to detect a true effect or difference when one exists. The critical region (also known as the rejection region) is the set of all possible sample outcomes that lead to rejection of the null hypothesis.
Random Sample: A random sample is a subset of a population chosen in such a way that each member of the population has an equal chance of being included.
Parameter: A parameter is a numerical characteristic of a population, often denoted by Greek letters, that summarizes or describes a specific aspect of the entire group.
Statistic: A statistic is a numerical measure or summary calculated from a sample, used to estimate or infer information about a corresponding parameter of the population.
A hypothesis in inferential statistics is a statement or assumption about a population parameter, framed so that it can be tested through statistical analysis. The two types are the null hypothesis and the alternative hypothesis.
Null Hypothesis (H0): The null hypothesis states that there is no significant difference, effect, or relationship in the population; any observed difference in the sample is due to random chance.
Alternative Hypothesis (Ha or H1): The alternative hypothesis proposes that there is a significant difference, effect, or relationship in the population, contradicting the null hypothesis. It represents what researchers aim to support with their data.
The Neyman-Pearson Lemma states that among all possible tests for a simple versus simple hypothesis testing
problem (where each hypothesis specifies a unique probability distribution), the likelihood ratio test is the most
powerful test for a given level of significance.
A one-sample test of the mean (for example, the one-sample t-test) is used to determine whether the mean of a single sample is significantly different from a known or hypothesized population mean.
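A minimal pure-Python sketch of this statistic (the sample values below are invented for illustration; in practice the resulting t would be compared with the t critical value for n - 1 degrees of freedom):

```python
import math

def one_sample_t(sample, mu0):
    """t statistic for H0: population mean equals mu0."""
    n = len(sample)
    mean = sum(sample) / n
    # sample variance with n - 1 in the denominator (unbiased)
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    se = math.sqrt(var / n)          # standard error of the mean
    return (mean - mu0) / se

t = one_sample_t([5.1, 4.9, 5.3, 5.0, 4.7], mu0=4.8)   # t = 2.0 (up to rounding)
```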
A confidence interval for the population arithmetic mean is a range of values constructed from sample data that is
likely to contain the true population mean with a certain level of confidence. It provides a way to quantify the
uncertainty associated with estimating a population parameter based on a sample.
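For a large sample (or a known standard deviation), the interval can be sketched with the normal critical value 1.96 for 95% confidence; the summary numbers below are invented for illustration:

```python
import math

def mean_ci(xbar, s, n, z=1.96):
    """Approximate 95% CI for the population mean (large-sample z interval)."""
    half = z * s / math.sqrt(n)      # margin of error
    return xbar - half, xbar + half

lo, hi = mean_ci(xbar=50.0, s=10.0, n=100)   # approximately (48.04, 51.96)
```

For small samples, the t critical value with n - 1 degrees of freedom replaces 1.96.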
Parametric testing refers to a category of statistical tests that make certain assumptions about the distribution of the underlying population from which the sample is drawn.
One-Sample t-Test: Compares the mean of a single sample to a known or hypothesized population mean.
Two-Sample t-Test: Compares the means of two independent samples to assess if they are significantly different
from each other.
Paired Sample t-Test: Compares the means of two related or paired samples, such as repeated measurements
on the same subjects.
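The two-sample case can be sketched in pure Python with the pooled-variance form (which assumes equal population variances; the data below are invented):

```python
import math

def two_sample_t(a, b):
    """Pooled two-sample t statistic (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    # pooled variance combines both samples' variability
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = two_sample_t([1, 2, 3], [2, 3, 4])
```

The paired t-test is the one-sample t statistic applied to the within-pair differences.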
The F-test for equality of two variances is a statistical test used to assess whether the variances of two
independent samples are equal.
A confidence interval for the population variance is a statistical range of values that is constructed from sample
data and is used to estimate the true variance of a population with a certain level of confidence. This interval
provides a measure of the uncertainty associated with estimating the population variance based on a sample.
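The F statistic itself is just the ratio of the two sample variances, conventionally with the larger on top; a sketch with invented data:

```python
def f_ratio(a, b):
    """F statistic for equality of two variances (larger variance on top)."""
    def svar(s):
        m = sum(s) / len(s)
        return sum((x - m) ** 2 for x in s) / (len(s) - 1)
    v1, v2 = svar(a), svar(b)
    return max(v1, v2) / min(v1, v2)

F = f_ratio([1, 3, 5, 7], [2, 4, 6])   # (20/3) / 4 = 5/3
```

The result is compared with the F critical value for the corresponding numerator and denominator degrees of freedom.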
The sign test is a non-parametric statistical test used to determine whether the median of a sample is equal to a
hypothesized population median.
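Because the sign test reduces each observation to the sign of its difference from the hypothesized median, its exact p-value comes from a Binomial(n, 0.5) tail; a sketch (counts invented):

```python
from math import comb

def sign_test_p(n_pos, n_neg):
    """Exact two-sided sign-test p-value (zero differences already discarded)."""
    n = n_pos + n_neg
    k = max(n_pos, n_neg)
    # upper tail of Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p = sign_test_p(8, 2)   # 2 * 56/1024 = 0.109375
```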
The Median Test is a non-parametric statistical test used to determine whether there is a significant difference
between the medians of two or more independent groups. It is particularly useful when the assumption of
normality is not met or when dealing with ordinal or skewed interval data.
The Signed Rank Test, also known as the Wilcoxon Signed Rank Test, is a non-parametric statistical test used to
determine whether the median of a single sample is different from a hypothesized median. It is particularly useful
when the assumption of normality is not met or when dealing with ordinal or skewed interval data.
The Wilcoxon Signed Rank Test is a non-parametric statistical test used to determine whether the median of a
paired sample is different from a hypothesized median. It's commonly applied when the data do not meet the
assumptions of normality or when working with ordinal or skewed interval data. This test is particularly useful for
paired samples or repeated measures designs.
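The test statistic ranks the absolute differences and sums the ranks by sign; a pure-Python sketch (differences invented; the resulting W is then compared against Wilcoxon tables or a normal approximation):

```python
def signed_rank_W(diffs):
    """Wilcoxon signed-rank statistic: min of positive and negative rank sums.
    Zero differences are dropped; ties in |d| receive average ranks."""
    d = [x for x in diffs if x != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(d):
        j = i
        while j + 1 < len(d) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # average rank for a tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return min(w_plus, w_minus)

W = signed_rank_W([1, -2, 3, -4, 5])   # min(9, 6) = 6.0
```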
The Kruskal-Wallis H test (sometimes also called the "one-way ANOVA on ranks") is a rank-based nonparametric
test that can be used to determine if there are statistically significant differences between two or more groups of
an independent variable on a continuous or ordinal dependent variable.
Assumptions of the OLS model: Linearity, Independence, Homoscedasticity, Normality of residuals.
Properties of OLS estimators: Unbiased estimation, Efficiency, Minimum variance.
Desirable properties of a good estimator: Unbiasedness, Efficiency, Consistency, Sufficiency, Minimum variance.
Estimation, in statistics, refers to the process of making educated guesses or approximations about certain
characteristics or parameters of a population based on information derived from a sample of that population. The
goal of estimation is to infer or predict unknown values, such as population means, proportions, variances, or
other parameters, using observed data from a subset of the population.
Bayesian inference
Prior probability
Likelihood
Posterior probability
Bayes' theorem
Bayes factor
Bayesian inference is a fundamental concept in statistics that provides a framework for updating our beliefs about
uncertain quantities based on evidence or data.
Bayesian interval estimation, also known as Bayesian credible interval estimation, is a method used in Bayesian
statistics to estimate a range of plausible values for an unknown parameter of interest. Unlike classical
frequentist confidence intervals, which are constructed based on the sampling distribution of the estimator,
Bayesian credible intervals directly incorporate prior knowledge and uncertainties about the parameter.
Prior Distribution:
The prior distribution in Bayesian statistics represents our beliefs or uncertainty about a parameter before
observing any data. It encapsulates our subjective knowledge, information from previous studies, expert
opinions, or assumptions about the parameter's possible values.
Posterior Distribution:
The posterior distribution in Bayesian statistics represents our updated beliefs or uncertainty about a parameter
after observing data. It combines the prior distribution with the likelihood function, which quantifies the probability
of observing the data given different values of the parameter.
The Bayesian procedure is a statistical framework for making inferences, predictions, and decisions based on
probability theory, specifically Bayesian probability. It involves several steps
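These steps can be illustrated with the standard Beta-Binomial conjugate pair, where the update has a closed form (prior parameters and data below are invented for illustration):

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate Bayesian update: a Beta(alpha, beta) prior combined with
    binomial data gives a Beta(alpha + successes, beta + failures) posterior."""
    a_post = alpha + successes
    b_post = beta + failures
    post_mean = a_post / (a_post + b_post)   # a common Bayesian point estimate
    return a_post, b_post, post_mean

# Uniform Beta(1, 1) prior, then observe 7 successes in 10 trials.
a, b, m = beta_binomial_update(1, 1, 7, 3)   # posterior Beta(8, 4), mean 2/3
```

Conjugacy is a convenience, not a requirement; in general the posterior is computed numerically.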
Answer Key
(Maximum of 6 to 7 points)
The level of significance, denoted by α, represents the threshold for rejecting the null hypothesis in hypothesis testing. A Type I error occurs when the null hypothesis is incorrectly rejected when it is actually true; that is, we detect an effect or difference when there is none in reality. The probability of committing a Type I error is precisely the level of significance α chosen for the test. A Type II error occurs when the null hypothesis is incorrectly not rejected when it is actually false; that is, we fail to detect an effect or difference when one truly exists.
An unbiased critical region is a region of the sample space that ensures the test maintains the correct level of significance, avoiding undue influence on the test's outcome due to factors other than the data itself.
The Neyman-Pearson Lemma provides a systematic method for constructing the most powerful hypothesis test for a given significance level.
It helps in determining the critical region that maximizes the power of the test while maintaining a fixed Type I error rate.
It is widely used in various fields, including signal detection, quality control, medical diagnosis, and communication theory, to design optimal hypothesis tests with specific performance criteria.
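As a concrete instance of the likelihood-ratio test behind the Neyman-Pearson Lemma, consider testing H0: mean = mu0 against H1: mean = mu1 for normal data with known variance; the log likelihood ratio has a simple closed form (sample values invented):

```python
def log_lr_normal(sample, mu0=0.0, mu1=1.0, sigma=1.0):
    """Log likelihood ratio log L(mu1)/L(mu0) for known-variance normal data.
    The Neyman-Pearson test rejects H0 when this exceeds a threshold chosen
    to give the desired significance level."""
    return sum((mu1 - mu0) * (x - (mu0 + mu1) / 2) for x in sample) / sigma ** 2

llr = log_lr_normal([0.2, 0.8, 1.1, 0.9])   # sum(x) - n/2 = 3.0 - 2.0 = 1.0
```

Here the ratio is monotone in the sample sum, so the most powerful test reduces to rejecting H0 for large sample means.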
T-testing:
T-tests are used to determine whether the means of two groups are significantly different from each other. They are typically applied when comparing the means of two independent groups or when comparing the mean of a sample to a known population mean. F-tests, on the other hand, are used for comparing variances or testing the equality of means across multiple groups. One common application of F-tests is in analysis of variance (ANOVA), where the F-test is used to assess whether there are significant differences in the means of three or more groups.
Test for the equality of two or more than two normal distributions typically involves statistical tests such as the two-sample t-
test for comparing means of two groups, ANOVA for comparing means across multiple groups, or Levene's test for
comparing variances between groups, ensuring that assumptions like normality and homogeneity of variances are met.
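The one-way ANOVA F statistic compares between-group to within-group variability; a pure-Python sketch (the three groups below are invented):

```python
def one_way_anova_F(groups):
    """One-way ANOVA F statistic: between-group vs within-group mean squares."""
    all_x = [x for g in groups for x in g]
    grand = sum(all_x) / len(all_x)
    k, n = len(groups), len(all_x)
    # between-group sum of squares (group means vs grand mean)
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # within-group sum of squares (observations vs their group mean)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

F = one_way_anova_F([[1, 2, 3], [2, 3, 4], [3, 4, 5]])   # (6/2) / (6/6) = 3.0
```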
Confidence Interval for Population Arithmetic Mean:
A confidence interval for the population arithmetic mean provides a range of plausible values for the true mean of a
population, based on sample data. It is constructed using the sample mean and the standard error of the mean, and it
quantifies the uncertainty associated with estimating the population mean
Single mean: A single-mean hypothesis test or confidence interval is used to make inferences about the population mean based on a single sample.
Two means: A two-means hypothesis test or confidence interval compares the means of two independent samples to determine whether they are significantly different, or constructs a range of plausible differences between them.
Single proportion: A single-proportion hypothesis test or confidence interval is used to make inferences about the population proportion based on a single sample.
Two proportions: A two-proportions hypothesis test or confidence interval compares the proportions of two independent samples to determine whether they are significantly different, or constructs a range of plausible differences between them.
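The two-proportions case can be sketched with the pooled z statistic (counts below are invented):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # estimate of the common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(40, 100, 30, 100)   # roughly 1.48
```

The value of z is compared with standard normal critical values (e.g., 1.96 for a two-sided test at the 5% level).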
Parametric test: A statistical test that assumes specific parameters of the population distribution, such as mean and variance,
and relies on these assumptions for inference.
Nonparametric test: A statistical test that does not make assumptions about the underlying population
distribution, making it robust to violations of normality and suitable for ordinal or non-normally distributed data,
with applications in comparing medians, testing independence, and analyzing ranked data.
a) Wilcoxon signed-rank test: A nonparametric test used to assess whether the median difference between paired samples is
significantly different from zero.
b) Mann-Whitney U test: A nonparametric test used to determine if there is a significant difference between the medians of
two independent groups.
Friedman test: A non-parametric test used to determine whether there are statistically significant differences among multiple
paired groups, typically when comparing three or more related samples or treatments, as an alternative to repeated measures
ANOVA.
Example: Assessing if there's a difference in the performance of three different teaching methods across multiple classrooms.
Introduction to Ordinary Least Squares (OLS) Method: OLS is a statistical technique used to estimate the parameters of a
linear regression model by minimizing the sum of squared differences between observed and predicted values.
Assumptions of Ordinary Least Squares (OLS) Method: The key assumptions include linearity, independence of errors,
homoscedasticity, normality of errors, and absence of perfect multicollinearity.
Unbiasedness: An estimator is unbiased if, on average, it provides estimates that are equal to the true parameter value.
Consistency: An estimator is consistent if it converges to the true parameter value as the sample size increases indefinitely.
Efficiency: An estimator is efficient if it has the smallest possible variance among all unbiased estimators.
Sufficiency: A statistic is sufficient if it contains all the information in the sample needed to estimate the parameter, without
losing any additional information.
Estimation of parameters in multiple linear regression involves using methods like ordinary least squares to find coefficients
that minimize the sum of squared differences between observed and predicted values.
Best linear unbiased estimator (BLUE): OLS estimators have the smallest variance among all linear unbiased estimators.
Efficient: Under normally distributed errors, OLS estimators attain the Cramér-Rao lower bound, making them statistically efficient.
Consistent: OLS estimators converge to the true parameter values as sample size increases.
Unbiased: OLS estimators have zero bias under the assumptions of the linear regression model.
Gauss-Markov theorem: OLS estimators are the best linear unbiased estimators under the classical linear regression model
assumptions.
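For the simple (one-regressor) case, the OLS coefficients have the familiar closed form, sketched below in pure Python (the data are an invented exact line, y = 1 + 2x):

```python
def ols_fit(x, y):
    """Closed-form OLS estimates for simple linear regression y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # slope: covariance of x and y over variance of x
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar            # intercept passes through the means
    return b0, b1

b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])   # (1.0, 2.0)
```

With several regressors, the same idea generalizes to solving the normal equations in matrix form.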
Prior Distribution: Represents initial beliefs about a parameter before observing data.
Posterior Distribution: Represents updated beliefs about a parameter after incorporating observed data.
Point estimation in Bayesian statistics involves deriving a single value (e.g., posterior mean, median) as the best estimate of a
parameter, considering both prior information and observed data.
Prior probability: Initial belief about the parameter before observing data.
Likelihood: Probability of observing the data given different parameter values.
Posterior probability: Updated belief about the parameter after incorporating observed data.
Bayes' theorem: Formula to update prior beliefs using observed data.
Bayes factor: Ratio of the likelihoods of two competing hypotheses.
Bayesian testing procedures involve comparing hypotheses by assessing the posterior probabilities of competing models or
hypotheses.
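For two simple hypotheses the Bayes factor is just the ratio of their likelihoods; a sketch for a coin (hypothesized probabilities and data invented):

```python
from math import comb

def binom_lik(p, k, n):
    """Binomial likelihood of k successes in n trials at success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Bayes factor for H1: p = 0.7 against H0: p = 0.5, given 7 heads in 10 tosses.
bf = binom_lik(0.7, 7, 10) / binom_lik(0.5, 7, 10)   # roughly 2.28
```

A Bayes factor above 1 favors H1; conventional scales (e.g., Jeffreys') grade the strength of that evidence.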
Bayesian procedures involve updating beliefs about uncertain quantities using Bayes' theorem, incorporating prior knowledge
with observed data to derive posterior distributions for inference and decision-making.
Answer Key
(Maximum of 8 to 9 points)
The Neyman-Pearson Lemma provides a systematic method for constructing the most powerful hypothesis
test for a given significance level.
It helps in determining the critical region that maximizes the power of the test while maintaining a
fixed Type I error rate.
It is widely used in various fields, including signal detection, quality control, medical diagnosis, and
communication theory, to design optimal hypothesis tests with specific performance criteria.
Small sample tests are statistical tests designed for use with small sample sizes, often when population parameters are
unknown or when assumptions of large sample tests are violated.
Types include t-tests for comparing means, chi-square tests for independence, and Fisher's exact test for small contingency
tables.
a) Wilcoxon Signed Rank Test: Non-parametric test for paired data, assessing if median differences between paired
observations differ significantly from zero.
b) Mann-Whitney U Test: Non-parametric test for independent samples, determining if there's a significant difference
between their medians.
c) Sign Test: Non-parametric test assessing whether the median of a paired difference is significantly different from zero,
based on the signs of the differences.
d) Signed Rank Test: Another name for the Wilcoxon signed rank test; unlike the sign test, it uses the magnitudes (ranks) of the differences as well as their signs.
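The Mann-Whitney U statistic in b) counts, over all cross-group pairs, how often one group's value exceeds the other's; a pure-Python sketch (data invented):

```python
def mann_whitney_U(a, b):
    """Mann-Whitney U statistic (ties counted as half a win)."""
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return min(u, len(a) * len(b) - u)

U = mann_whitney_U([1, 3, 5], [2, 4, 6])   # min(3, 6) = 3.0
```

The resulting U is referred to Mann-Whitney tables, or to a normal approximation for larger samples.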
Estimation of mean and variance of a normal distribution using maximum likelihood estimator involves finding parameter
values that maximize the likelihood function given the observed data.
Bayesian statistical inference involves updating beliefs about uncertain quantities using Bayes' theorem, incorporating prior
knowledge with observed data to derive posterior distributions for inference and decision-making.