
STATISTICS

What is difference between z test and t test?


Generally, z-tests are used when we have large sample sizes (n > 30), whereas t-tests are
most helpful with a smaller sample size (n < 30). Both methods assume a normal distribution
of the data, but the z-tests are most useful when the standard deviation is known.
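The distinction can be seen in miniature in code. A minimal sketch of both statistics for the same one-sample test of H0: mu = 100 (the data are invented, and sigma = 3 is an assumed known population standard deviation for the z-test):

```python
# Comparing a t statistic (sigma estimated from the sample) and a
# z statistic (sigma assumed known) for the same one-sample test.
import math
import statistics

sample = [102, 98, 105, 101, 99, 103, 100, 104, 97, 106]
mu0 = 100.0

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)          # sample standard deviation (n - 1)

# t-test: population sigma unknown, estimated by s
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# z-test: population sigma assumed KNOWN (here we pretend sigma = 3)
sigma = 3.0
z_stat = (xbar - mu0) / (sigma / math.sqrt(n))

print(round(t_stat, 3), round(z_stat, 3))
```

With a large n the two statistics converge, which is why the z-test is reserved for large samples or a known sigma.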

ANOVA (ANALYSIS OF VARIANCE)


Is ANOVA Parametric?
In ANOVA, the dependent variable must be a continuous (interval or ratio) level of
measurement. The independent variables in ANOVA must be categorical (nominal or
ordinal) variables. Like the t-test, ANOVA is also a parametric test and has some
assumptions. ANOVA assumes that the data is normally distributed.

What is ANOVA
ANOVA is a statistical technique that assesses potential differences in a scale-level
dependent variable by a nominal-level variable having 2 or more categories.  For example, an
ANOVA can examine potential differences in IQ scores by Country (US vs. Canada vs. Italy
vs. Spain).  The ANOVA, developed by Ronald Fisher in 1918, extends the t and the z test
which have the problem of only allowing the nominal level variable to have two categories.
This test is also called the Fisher analysis of variance.

General Purpose of ANOVA


Researchers and students use ANOVA in many ways.  The use of ANOVA depends on the
research design.  Commonly, ANOVAs are used in three ways:
 one-way ANOVA,
 two-way ANOVA,
 and N-way ANOVA.
One-Way ANOVA
A one-way ANOVA has just one independent variable.  For example, difference in IQ can be
assessed by Country, and Country can have 2, 20, or more different categories to compare.
Two-Way ANOVA
A two-way ANOVA refers to an ANOVA using two independent variables.  Expanding the
example above, a 2-way ANOVA can examine differences in IQ scores (the dependent
variable) by Country (independent variable 1) and Gender (independent variable 2).  Two-
way ANOVA can be used to examine the interaction between the two independent variables.
Interactions indicate that differences are not uniform across all categories of the independent
variables.  For example, females may have higher IQ scores overall compared to males, but
this difference could be greater (or less) in European countries compared to North American
countries.  Two-way ANOVAs are also called factorial ANOVAs.
N-Way ANOVA
A researcher can also use more than two independent variables, and this is an n-way ANOVA
(with n being the number of independent variables you have).  For example, potential
differences in IQ scores can be examined by Country, Gender, Age group, Ethnicity, etc,
simultaneously.

General Purpose – Procedure


Omnibus ANOVA test:
The null hypothesis for an ANOVA is that there is no significant difference among the
groups.  The alternative hypothesis assumes that there is at least one significant difference
among the groups.  After cleaning the data, the researcher must test the assumptions of
ANOVA.  They must then calculate the F-ratio and the associated probability value (p-
value).  In general, if the p-value associated with the F is smaller than .05, then the null
hypothesis is rejected and the alternative hypothesis is supported.  If the null hypothesis is
rejected, one concludes that the means of all the groups are not equal.  Post-hoc tests tell the
researcher which groups are different from each other.
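The omnibus F-ratio described above can be computed by hand. A minimal sketch in pure Python, with invented IQ scores for three countries:

```python
# One-way ANOVA F-ratio computed from first principles.
import statistics

groups = {
    "US":     [100, 105, 98, 102],
    "Canada": [101, 99, 104, 100],
    "Italy":  [96, 103, 97, 100],
}

all_scores = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(all_scores)
k = len(groups)                       # number of groups
n = len(all_scores)                   # total observations

# Between-groups sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups.values())
# Within-groups sum of squares: spread of scores around their own group mean
ss_within = sum((x - statistics.mean(g)) ** 2
                for g in groups.values() for x in g)

ms_between = ss_between / (k - 1)     # df_between = k - 1
ms_within = ss_within / (n - k)       # df_within = n - k
f_ratio = ms_between / ms_within
print(round(f_ratio, 3))
```

The F-ratio is then compared against an F distribution with (k − 1, n − k) degrees of freedom to obtain the p-value.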

So what if you find statistical significance?  Multiple comparison tests


When you conduct an ANOVA, you are attempting to determine if there is a statistically
significant difference among the groups.  If you find that there is a difference, you will then
need to examine where the group differences lie.
At this point you could run post-hoc tests, which are t-tests examining mean differences
between the groups.  There are several multiple comparison tests that can be conducted that
will control for the Type I error rate, including the Bonferroni, Scheffé, Dunnett, and Tukey tests.
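Of these, the Bonferroni correction is the simplest to illustrate: with k groups there are k(k − 1)/2 pairwise comparisons, and each is tested at alpha divided by that number so the family-wise Type I error rate stays at alpha. A sketch (group names invented):

```python
# Bonferroni adjustment: divide alpha across all pairwise comparisons.
from itertools import combinations

group_names = ["US", "Canada", "Italy", "Spain"]
alpha = 0.05

pairs = list(combinations(group_names, 2))
n_comparisons = len(pairs)                 # 4 groups -> 6 pairwise tests
adjusted_alpha = alpha / n_comparisons     # each pair is tested at 0.05 / 6

print(n_comparisons, round(adjusted_alpha, 4))
```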
Types of Research Questions the ANOVA Examines
One-way ANOVA: Are there differences in GPA by grade level (freshmen vs. sophomores
vs. juniors)?
Two-way ANOVA: Are there differences in GPA by grade level (freshmen vs. sophomores
vs. juniors) and gender (male vs. female)?

Data Level and Assumptions


The level of measurement of the variables and assumptions of the test play an important role
in ANOVA.  In ANOVA, the dependent variable must be a continuous (interval or ratio)
level of measurement.  The independent variables in ANOVA must be categorical (nominal
or ordinal) variables.  Like the t-test, ANOVA is also a parametric test and has some
assumptions.  ANOVA assumes that the data is normally distributed.  The ANOVA also
assumes homogeneity of variance, which means that the variance among the groups should
be approximately equal.  ANOVA also assumes that the observations are independent of each
other.  Researchers should keep in mind when planning any study to look out for extraneous
or confounding variables.  ANOVA has methods (i.e., ANCOVA) to control for confounding
variables.
Testing of the Assumptions
1. The population from which samples are drawn should be normally distributed.
2. Independence of cases: the sample cases should be independent of each other.
3. Homogeneity of variance: Homogeneity means that the variance among the groups
should be approximately equal.
These assumptions can be tested using statistical software (like Intellectus Statistics!).  The
assumption of homogeneity of variance can be tested using tests such as Levene’s test or the
Brown-Forsythe Test.  Normality of the distribution of the scores can be tested using
histograms, the values of skewness and kurtosis, or using tests such as Shapiro-Wilk or
Kolmogorov-Smirnov.  The assumption of independence can be determined from the design
of the study.
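For a quick informal check of the homogeneity assumption (this is not Levene's test itself, just a common rule of thumb), one can compare the largest and smallest group variances; a ratio under about 4 is usually considered acceptable. A sketch with invented data:

```python
# Rough homogeneity-of-variance check via the max/min variance ratio.
import statistics

groups = {
    "A": [10, 12, 11, 13, 9],
    "B": [10, 14, 8, 12, 11],
    "C": [9, 13, 10, 12, 11],
}

variances = {name: statistics.variance(g) for name, g in groups.items()}
ratio = max(variances.values()) / min(variances.values())
roughly_homogeneous = ratio < 4        # common rule of thumb
print(round(ratio, 2), roughly_homogeneous)
```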
It is important to note that ANOVA is robust to violations of the homogeneity and normality
assumptions, but not to violations of the assumption of independence.  That is to say, even if
you violate the assumptions of homogeneity or normality, you can often conduct the test and
basically trust the findings.  However, the results of
the ANOVA are invalid if the independence assumption is violated.  In general, with
violations of homogeneity the analysis is considered robust if you have equal sized groups. 
With violations of normality, continuing with the ANOVA is generally ok if you have a
large sample size.

Related Statistical Tests: MANOVA and ANCOVA


Researchers have extended ANOVA in MANOVA and ANCOVA.  MANOVA stands for the
multivariate analysis of variance.  MANOVA is used when there are two or more
dependent variables.  ANCOVA is the term for analysis of covariance.  The ANCOVA is
used when the researcher includes one or more covariate variables in the analysis.

POWER OF A HYPOTHESIS TEST


What is statistical power?
The power of any test of statistical significance is defined as the probability that it will reject
a false null hypothesis. Statistical power is inversely related to beta or the probability of
making a Type II error. In short, power = 1 – β.
In plain English, statistical power is the likelihood that a study will detect an effect when
there is an effect there to be detected. If statistical power is high, the probability of making a
Type II error, or concluding there is no effect when, in fact, there is one, goes down.
Statistical power is affected chiefly by the size of the effect and the size of the sample used to
detect it. Bigger effects are easier to detect than smaller effects, while large samples offer
greater test sensitivity than small samples.

The probability of not committing a Type II error is called the power of a hypothesis test.


Effect Size
To compute the power of the test, one offers an alternative view about the "true" value of the
population parameter, assuming that the null hypothesis is false. The effect size is the
difference between the true value and the value specified in the null hypothesis.
Effect size = True value - Hypothesized value
For example, suppose the null hypothesis states that a population mean is equal to 100. A
researcher might ask: What is the probability of rejecting the null hypothesis if the true
population mean is equal to 90? In this example, the effect size would be 90 - 100, which
equals -10.
Factors That Affect Power
The power of a hypothesis test is affected by three factors.
 Sample size (n). Other things being equal, the greater the sample size, the greater the
power of the test.
 Significance level (α). The lower the significance level, the lower the power of the
test. If you reduce the significance level (e.g., from 0.05 to 0.01), the region of
acceptance gets bigger. As a result, you are less likely to reject the null hypothesis.
This means you are less likely to reject the null hypothesis when it is false, so you are
more likely to make a Type II error. In short, the power of the test is reduced when
you reduce the significance level; and vice versa.
 The "true" value of the parameter being tested. The greater the difference between
the "true" value of a parameter and the value specified in the null hypothesis, the
greater the power of the test. That is, the greater the effect size, the greater the power
of the test.
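These factors can be demonstrated numerically. A hedged sketch for a one-sided z-test with known sigma (all numbers invented; the normal CDF is built from math.erf):

```python
# Power of a one-sided z-test: same effect size, different sample sizes.
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def z_test_power(effect, sigma, n, alpha=0.05):
    # Reject when Z > z_alpha; power is the chance of that happening
    # when the true mean really is shifted by `effect`.
    z_alpha = 1.645                      # one-sided critical value, alpha = 0.05
    shift = effect / (sigma / math.sqrt(n))
    return 1.0 - norm_cdf(z_alpha - shift)

# Same effect, bigger sample -> greater power
low = z_test_power(effect=2.0, sigma=10.0, n=25)
high = z_test_power(effect=2.0, sigma=10.0, n=100)
print(round(low, 3), round(high, 3))
```

Quadrupling the sample size here more than doubles the power, illustrating the first factor in the list above.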

Test Your Understanding


Problem 1
Other things being equal, which of the following actions will reduce the power of a
hypothesis test?
I. Increasing sample size.
II. Changing the significance level from 0.01 to 0.05.
III. Increasing beta, the probability of a Type II error.
(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above
Solution
The correct answer is (C).  Increasing sample size (I) and raising the significance level from
0.01 to 0.05 (II) both increase power; only increasing beta (III) reduces it, since power = 1 – β.

What does a power of 0.8 mean?


Power is a number or percentage that indicates the probability a study will obtain a
statistically significant effect. For example, a power of 80 percent (or 0.8) means that a survey
or study (when conducted repeatedly over time) is likely to produce a statistically significant
result 8 times out of 10.
How is effect size reported?
Effect size is a statistical concept that measures the strength of the relationship
between two variables on a numeric scale. Cohen's d, for example, is the difference
between two population means divided by the standard deviation of the data.

Strength of Effect Size


Cohen suggested that d = 0.2 be considered a 'small' effect size, 0.5 represents a
'medium' effect size, and 0.8 a 'large' effect size. This means that if two groups' means don't
differ by 0.2 standard deviations or more, the difference is trivial, even if it is statistically
significant.
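Cohen's d as defined above can be computed directly. A sketch with invented scores for two groups, using the pooled standard deviation:

```python
# Cohen's d: mean difference divided by the pooled standard deviation.
import math
import statistics

group1 = [20, 22, 19, 24, 25]
group2 = [28, 30, 27, 26, 29]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled SD weights each group's variance by its degrees of freedom
pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
d = (m2 - m1) / pooled_sd
print(round(d, 2))
```

Here d comes out well above 0.8, a 'large' effect on Cohen's benchmarks.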

CASE CONTROL ANALYSIS

Definition
A study that compares patients who have a disease or outcome of interest (cases) with
patients who do not have the disease or outcome (controls), and looks back retrospectively to
compare how frequently the exposure to a risk factor is present in each group to determine
the relationship between the risk factor and the disease.
Case control studies are observational because no intervention is attempted and no attempt
is made to alter the course of the disease. The goal is to retrospectively determine the
exposure to the risk factor of interest from each of the two groups of individuals: cases and
controls. These studies are designed to estimate odds.
Case control studies are also known as "retrospective studies" and "case-referent studies."
Advantages
 Good for studying rare conditions or diseases
 Less time needed to conduct the study because the condition or disease has already
occurred
 Lets you simultaneously look at multiple risk factors
 Useful as initial studies to establish an association
 Can answer questions that could not be answered through other study designs
Disadvantages
 Retrospective studies have more problems with data quality because they rely on
memory and people with a condition will be more motivated to recall risk factors
(also called recall bias).
 Not good for evaluating diagnostic tests because it’s already clear that the cases
have the condition and the controls do not
 It can be difficult to find a suitable control group
Design pitfalls to look out for
Care should be taken to avoid confounding, which arises when an exposure and an outcome
are both strongly associated with a third variable. Controls should be subjects who might
have been cases in the study but are selected independent of the exposure. Cases and controls
should also not be "over-matched."
Is the control group appropriate for the population? Does the study use matching or pairing
appropriately to avoid the effects of a confounding variable? Does it use appropriate
inclusion and exclusion criteria?

Introduction
The first steps in learning to understand and appreciate evidence-based medicine are daunting
to say the least, especially when confronted with the myriad of statistics in any paper. This
short tutorial aims to introduce healthcare students to the interpretation of some of the most
commonly used statistics for reporting the results of medical research.
The scenario for this tutorial is centred around the diagram below, which outlines a fictional
parallel two arm randomised controlled trial of a new cholesterol lowering medication against
a placebo.

Odds ratio (OR)


An odds ratio is a relative measure of effect, which allows the comparison of the intervention
group of a study relative to the comparison or placebo group. 
So when researchers calculate an odds ratio they do it like this:
The numerator is the odds in the intervention arm
The denominator is the odds in the control or placebo arm = Odds Ratio (OR)
So if the outcome is the same in both groups the ratio will be 1, which implies there is no
difference between the two arms of the study.
However, for an undesirable outcome such as death:
If the OR is > 1, the control is better than the intervention.
If the OR is < 1, the intervention is better than the control.
Concept check 1
If the trial comparing SuperStatin to placebo with the outcome of all cause mortality found
the following:
Odds of all cause mortality for SuperStatin were 0.4
Odds of all cause mortality for placebo were 0.8
Odds ratio would equal 0.5
So if the trial comparing SuperStatin to placebo stated OR 0.5
What would it mean?
A) The odds of death in the SuperStatin arm are 50% less than in the placebo arm.
B) There is no difference between groups
C) The odds of death in the placebo arm are 50% less than in the SuperStatin arm.
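The arithmetic behind an odds ratio is simple enough to sketch from a 2x2 table (all counts invented):

```python
# Odds ratio from a 2x2 table: rows are study arms, columns are outcomes.
#                 died   survived
# SuperStatin:     20       80     -> odds = 20/80 = 0.25
# Placebo:         40       80     -> odds = 40/80 = 0.50
a, b = 20, 80   # intervention arm: events, non-events
c, d = 40, 80   # placebo arm: events, non-events

odds_intervention = a / b
odds_placebo = c / d
odds_ratio = odds_intervention / odds_placebo   # equivalently (a*d) / (b*c)
print(odds_ratio)
```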

Confidence interval (CI)


The confidence interval indicates the level of uncertainty around the measure of effect
(precision of the effect estimate) which in this case is expressed as an OR. Confidence
intervals are used because a study recruits only a small sample of the overall population so by
having an upper and lower confidence limit we can infer that the true population effect lies
between these two points. Most studies report the 95% confidence interval (95%CI).
If the confidence interval crosses 1 (e.g. 95%CI 0.9-1.1) this implies there is no difference
between arms of the study.
Concept check 2
So if the trial comparing SuperStatin to placebo stated OR 0.5 95%CI 0.4-0.6
What would it mean?
A)  The odds of death in the SuperStatin arm are 50% less than in the placebo arm with the
true population effect between 20% and 80%. 
B) The odds of death in the SuperStatin arm are 50% less than in the placebo arm with the
true population effect between 60% and 40%.
C) The odds of death in the SuperStatin arm are 50% less than in the placebo arm with the
true population effect between 60% and up to 10% worse.
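A 95% CI for an odds ratio is usually built on the log-odds scale, using the standard error sqrt(1/a + 1/b + 1/c + 1/d) (the usual Woolf approximation). A sketch with invented counts:

```python
# 95% confidence interval for an odds ratio via log(OR) +/- 1.96 * SE.
import math

a, b = 20, 80   # intervention arm: events, non-events
c, d = 40, 80   # placebo arm: events, non-events

or_ = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lower = math.exp(math.log(or_) - 1.96 * se_log_or)
upper = math.exp(math.log(or_) + 1.96 * se_log_or)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```

Because the interval here stays below 1, the result would be read as a statistically significant benefit of the intervention.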

P values
P < 0.05 indicates a statistically significant difference between groups. P>0.05 indicates there
is not a statistically significant difference between groups.
Concept check 3
So if the trial comparing SuperStatin to placebo stated OR 0.5 95%CI 0.4-0.6 p<0.01
What would it mean?
A) The odds of death in the SuperStatin arm are 50% less than in the placebo arm with the
true population effect between 60% and 40%.  This result was statistically significant.
B) The odds of death in the SuperStatin arm are 50% less than in the placebo arm with the
true population effect between 60% and 40%. This result was not statistically significant.
C) The odds of death in the SuperStatin arm are 50% less than in the placebo arm with the
true population effect between 60% and 40%. This result was equivocal.

Bringing it all together – Real world example


A drug company-funded double blind randomised controlled trial evaluated the efficacy of an
adenosine receptor antagonist Cangrelor vs Clopidogrel in patients undergoing urgent or
elective Percutaneous Coronary Intervention (PCI) who were followed up for specific
complications for 48 hrs as outlined in the diagram below (Bhatt et al. 2009).
The results section reported “The rate of the primary efficacy end point was … (adjusted
odds ratio with Cangrelor, 0.78; 95% confidence interval [CI], 0.66 to 0.93; P=0.005)
What does this mean?
A)  The odds of death, myocardial infarction, ischemia-driven revascularization, or stent
thrombosis at 48 hours after randomization in the Cangrelor arm were 22% less than in the
Clopidogrel arm with the true population effect between 34% and 7%.  This result was not
statistically significant.
B) The odds of death, myocardial infarction, ischemia-driven revascularization, or stent
thrombosis at 48 hours after randomization in the Cangrelor arm were 34% less than in the
Clopidogrel arm with the true population effect between 7% and 22%. This result was
statistically significant.
C) The odds of death, myocardial infarction, ischemia-driven revascularization, or stent
thrombosis at 48 hours after randomization in the Cangrelor arm were 22% less than in the
Clopidogrel arm with the true population effect between 34% and 7%. This result was
statistically significant.

Summary
This is a very basic introduction to interpreting odds ratios, confidence intervals and p values
only and should help healthcare students begin to make sense of published research, which
can initially be a daunting prospect.  However it should be stressed that any results are only
valid if the study was well designed and conducted, which highlights the importance of
critical appraisal as a key feature of evidence based medicine.

Self test Answers


Concept check 1.  The correct answer is A.
Concept check 2.  The correct answer is B.
Concept check 3.  The correct answer is A.
Bringing it all together – Real world example. The correct answer is C.

How do you interpret odds ratios?


An odds ratio of exactly 1 means that exposure to property A does not affect the odds of
property B. An odds ratio of more than 1 means that the odds of property B happening are
higher with exposure to property A. An odds ratio of less than 1 is associated with lower
odds.

What does an odds ratio of 0.5 mean?


An odds ratio of 0.5 means that the odds of the exposure being found in the case group are
50% less than the odds of finding the exposure in the control group.

BIVARIATE ANALYSIS FOR NOMINAL VARIABLES

Data Levels of Measurement


A variable has one of four different levels of measurement: Nominal, Ordinal, Interval, or
Ratio.  (Interval and Ratio levels of measurement are sometimes called Continuous or Scale).
It is important for the researcher to understand the different levels of measurement, as these
levels of measurement, together with how the research question is phrased, dictate what
statistical analysis is appropriate.
In descending order of precision, the four different levels of measurement are:
 Nominal–Latin for name only (Republican, Democrat, Green, Libertarian)
 Ordinal–Think ordered levels or ranks (small–8oz, medium–12oz, large–32oz)
 Interval–Equal intervals among levels (1 dollar to 2 dollars is the same interval as 88
dollars to 89 dollars)
 Ratio–Let the “o” in ratio remind you of a zero in the scale (Day 0, day 1, day 2, day
3, …)
The first level of measurement is nominal level of measurement.  In this level of
measurement, the numbers in the variable are used only to classify the data.  In this level of
measurement, words, letters, and alpha-numeric symbols can be used.  Suppose there are data
about people belonging to three different gender categories. In this case, the person belonging
to the female gender could be classified as F, the person belonging to the male gender could
be classified as M, and a transgender person as T.  This type of classification is the
nominal level of measurement.
The second level of measurement is the ordinal level of measurement.  This level of
measurement depicts some ordered relationship among the variable’s observations.  Suppose
a student scores the highest grade of 100 in the class.  In this case, he would be assigned the
first rank.  Then, another classmate scores the second highest grade of a 92; she would be
assigned the second rank.  A third student scores an 81 and he would be assigned the third
rank, and so on.   The ordinal level of measurement indicates an ordering of the
measurements.
The third level of measurement is the interval level of measurement.  The interval level of
measurement not only classifies and orders the measurements, but it also specifies that the
distances between each interval on the scale are equivalent along the scale from low interval
to high interval.  For example, in an interval-level measurement of anxiety, the interval
between scores of 10 and 11 is the same as the interval between scores of 40 and 41.  A
popular example of this level of measurement is temperature in centigrade, where, for
example, the distance between 94°C and 96°C is the same as the distance between 100°C
and 102°C.
The fourth level of measurement is the ratio level of measurement.  In this level of
measurement, the observations, in addition to having equal intervals, can have a value of zero
as well.  The zero in the scale makes this type of measurement unlike the other types of
measurement, although the properties are similar to that of the interval level of measurement.
In the ratio level of measurement, the divisions between the points on the scale have an
equivalent distance between them.
The researcher should note that among these levels of measurement, the nominal level is
simply used to classify data, whereas the levels of measurement described by the interval
level and the ratio level are much more exact.

Let’s deal with the importance part first.


Knowing the level of measurement of your variables is important for two reasons.  1. Each
of the levels of measurement provides a different level of detail.  Nominal provides the least
amount of detail, ordinal provides the next highest amount of detail, and interval and ratio
provide the most amount of detail.
In a nominal level variable, values are grouped into categories that have no meaningful order.
For example, gender and political affiliation are nominal level variables.  Members in the
group are assigned a label in that group and there is no hierarchy.  Typical descriptive
statistics associated with nominal data are frequencies and percentages.
Ordinal level variables are nominal level variables with a meaningful order.  For example,
horse race winners can be assigned labels of first, second, third, fourth, etc. and these labels
have an ordered relationship among them (i.e., first is higher than second, second is higher
than third, and so on).  As with nominal level variables, ordinal level variables are typically
described with frequencies and percentages.
Interval and ratio level variables (also called continuous level variables) have the most detail
associated with them.  Mathematical operations such as addition, subtraction, multiplication,
and division can be accurately applied to the values of these variables.  An example variable
would be the amount of milk used in cookie recipe (measured in cups).  This variable has
arithmetic properties such that 2 cups of milk is exactly twice as much as 1 cup of milk. 
Additionally, the difference between 1 and 2 cups of milk is exactly the same as the
difference between 2 and 3 cups of milk.  Interval and ratio level variables are typically
described using means and standard deviations.
2. The second reason levels of measurement are important to know is because different
statistical tests are appropriate for variables with different levels of measurement.  For
example, chi-square tests of independence are most appropriate for nominal level data.  The
Mann-Whitney U test is most appropriate for an ordinal level dependent variable and a
nominal level independent variable.  An ANOVA is most appropriate for a continuous level
dependent variable and a nominal level independent variable.  
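For instance, the chi-square statistic for nominal-level data can be computed by hand from a contingency table of observed counts against the counts expected under independence. A sketch with invented counts:

```python
# Chi-square test statistic of independence for a 2x2 table.
observed = [
    [30, 20],   # e.g. party A: male, female
    [10, 40],   # e.g. party B: male, female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were independent
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected
print(round(chi_square, 2))
```

The statistic is then compared to a chi-square distribution with (rows − 1)(columns − 1) degrees of freedom.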

What is an example of a nominal variable?


A nominal scale describes a variable with categories that do not have a natural order or
ranking. Examples of nominal variables include: genotype, blood type, zip code, gender,
race, eye color, political party.
What is bivariate data analysis?
Bivariate analysis means the analysis of bivariate data. It is one of the simplest forms
of statistical analysis, used to find out if there is a relationship between two sets of values.
It usually involves the variables X and Y. Univariate analysis is the analysis of one (“uni”)
variable.
While descriptive statistics describes the characteristics of a single variable, inferential
statistics examines the relationship between two or more variables. Bivariate statistics is
a type of inferential statistics that deals with the relationship between two variables. That is,
bivariate statistics examines how one variable compares with another or how one variable
influences another variable.

Types of Bivariate Analysis


The types of bivariate analysis depend upon the types of variables or attributes we use for
the analysis. The variables could be numerical, categorical or ordinal.
If the dependent variable is categorical, like a particular brand of pen chosen, then logit or
probit regression can be used.
If both the independent and dependent attributes are ordinal, which means they have position
or ranking, then we can measure a rank correlation coefficient.
If the dependent attribute is ordinal, then ordered logit or ordered probit can be utilised.
If the dependent attribute is either ratio or interval, like a temperature scale, then we can
use regression. So based on these data types, we can list the types of bivariate data
analysis:
1. Numerical and Numerical – both variables of the bivariate data, independent and
dependent, have numerical values. Typical tools: scatter plot, linear correlation.
2. Categorical and Categorical – both variables are categorical. Typical tools: stacked
column chart, chi-square test.
3. Numerical and Categorical – one variable is numerical and one is categorical.
Typical tools: line chart with error bars, z-test and t-test (z-tests and t-tests are
basically the same: they assess whether the averages of two groups are statistically
different from each other), ANOVA.
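For the numerical-and-numerical case, the linear correlation mentioned above can be computed by hand. A sketch of Pearson's r with invented data:

```python
# Pearson's linear correlation coefficient, computed from first principles.
import math
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = statistics.mean(x), statistics.mean(y)
# Numerator: co-movement of x and y around their means
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
# Denominator: product of the two spreads
sx = math.sqrt(sum((a - mx) ** 2 for a in x))
sy = math.sqrt(sum((b - my) ** 2 for b in y))
r = cov / (sx * sy)
print(round(r, 3))
```

r ranges from −1 to +1, with 0 meaning no linear relationship.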

REGRESSION ANALYSIS AND DIFFERENT MODELS


Regression analysis is a form of predictive modelling technique which investigates the
relationship between a dependent (target) and independent variable (s) (predictor). This
technique is used for forecasting, time series modelling and finding the causal effect
relationship between the variables. For example, relationship between rash driving
and number of road accidents by a driver is best studied through regression.
Regression analysis is an important tool for modelling and analyzing data. Here, we fit a
curve / line to the data points in such a manner that the distances between the data points
and the curve or line are minimized.

Why do we use Regression Analysis?


As mentioned above, regression analysis estimates the relationship between two or more
variables. Let’s understand this with an easy example:
Let’s say, you want to estimate growth in sales of a company based on current economic
conditions. You have the recent company data which indicates that the growth in sales is
around two and a half times the growth in the economy. Using this insight, we can predict
future sales of the company based on current & past information.
There are multiple benefits of using regression analysis. They are as follows:
1. It indicates the significant relationships between dependent variable and independent
variable.
2. It indicates the strength of impact of multiple independent variables on a dependent
variable.
Regression analysis also allows us to compare the effects of variables measured on different
scales, such as the effect of price changes and the number of promotional activities. These
benefits help market researchers / data analysts / data scientists to evaluate and select the
best set of variables to be used for building predictive models.
 
How many types of regression techniques do we have?
There are various kinds of regression techniques available to make predictions. These
techniques are mostly driven by three metrics (number of independent variables, type
of dependent variables and shape of regression line). We’ll discuss them in detail in the
following sections.
But before you start that, let us understand the most commonly used regressions:
 
1. Linear Regression
It is one of the most widely known modeling techniques. Linear regression is usually among
the first few topics which people pick while learning predictive modeling. In this technique,
the dependent variable is continuous, the independent variable(s) can be continuous or
discrete, and the nature of the regression line is linear.
Linear Regression establishes a relationship between dependent variable (Y) and one or
more independent variables (X) using a best fit straight line (also known as regression line).
It is represented by an equation Y=a+b*X + e, where a is intercept, b is slope of the line and e
is error term. This equation can be used to predict the value of target variable based on given
predictor variable(s).
The difference between simple linear regression and multiple linear regression is that
multiple linear regression has more than one independent variable, whereas simple linear
regression has only one independent variable.
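The equation Y = a + b*X can be fit by ordinary least squares in a few lines. A sketch with invented data that happens to lie exactly on a line:

```python
# Simple linear regression by ordinary least squares.
import statistics

x = [1, 2, 3, 4, 5]          # predictor
y = [3, 5, 7, 9, 11]         # target (here exactly y = 1 + 2x)

mx, my = statistics.mean(x), statistics.mean(y)
# Slope b: covariance of x and y divided by variance of x
b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
     / sum((a - mx) ** 2 for a in x))
a_ = my - b * mx             # intercept: line passes through the means

def predict(new_x):
    return a_ + b * new_x

print(b, a_, predict(6))
```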

2. Logistic Regression
Logistic regression is used to find the probability of event = Success and event = Failure. We
should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/
No) in nature. Here the predicted value of Y ranges from 0 to 1, and the model can be
written as ln(p / (1 − p)) = a + b*X, equivalently p = 1 / (1 + e^−(a + b*X)).
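The curve that logistic regression fits is the logistic (sigmoid) function, which maps a linear combination of predictors onto a probability between 0 and 1. A sketch with invented coefficients:

```python
# Logistic (sigmoid) function: linear log-odds mapped to a probability.
import math

def predicted_probability(x, a=-4.0, b=2.0):
    # The log-odds (logit) is linear in x; the sigmoid maps it to (0, 1)
    log_odds = a + b * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Where the log-odds is zero, the predicted probability is exactly 0.5
print(predicted_probability(2.0))
```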

3. Polynomial Regression
A regression equation is a polynomial regression equation if the power of independent
variable is more than 1.
4. Stepwise Regression
This form of regression is used when we deal with multiple independent variables. In this
technique, the selection of independent variables is done with the help of an automatic
process, which involves no human intervention.
This feat is achieved by observing statistical values like R-square, t-stats and AIC metric to
discern significant variables. Stepwise regression basically fits the regression model by
adding/dropping co-variates one at a time based on a specified criterion. Some of the most
commonly used Stepwise regression methods are listed below:
 Standard stepwise regression does two things: it adds and removes predictors as
needed at each step.
 Forward selection starts with the most significant predictor in the model and adds a
variable at each step.
 Backward elimination starts with all predictors in the model and removes the least
significant variable at each step.
The aim of this modeling technique is to maximize the prediction power with a minimum
number of predictor variables. It is one of the methods to handle higher dimensionality of a
data set.
 
5. Ridge Regression
Ridge Regression is a technique used when the data suffers from multicollinearity
( independent variables are highly correlated). In multicollinearity, even though the least
squares estimates (OLS) are unbiased, their variances are large which deviates the observed
value far from the true value. By adding a degree of bias to the regression estimates, ridge
regression reduces the standard errors.
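For a single centered predictor the shrinkage is easy to see: the ridge penalty λ is simply added to the denominator of the least-squares slope, pulling the estimate toward zero. A toy sketch (data made up; real ridge regression penalizes all coefficients of a multivariate model):

```python
import statistics

def ridge_slope(x, y, lam):
    """Ridge estimate for one predictor: the penalty `lam` is added to the
    denominator of the least-squares slope, shrinking it toward zero."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / (sxx + lam)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(ridge_slope(x, y, 0.0))  # 2.0 -- lambda = 0 recovers ordinary least squares
print(ridge_slope(x, y, 1.0))  # shrunk below 2.0
```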
How do you select the right regression model?
Life is usually simple when you know only one or two techniques. One of the training
institutes I know of tells its students: if the outcome is continuous, apply linear
regression; if it is binary, use logistic regression!

 NMHS and the sampling for the study, how to calculate prevalence,

DEGREES OF FREEDOM
Definition: The Degrees of Freedom refers to the number of values involved in the
calculations that have the freedom to vary. In other words, the degrees of freedom, in general,
can be defined as the total number of observations minus the number of independent
constraints imposed on the observations.

The degrees of freedom are calculated for the following statistical tests to check their
validity:
1. t-Distribution
2. F- Distribution
3. Chi-Square Distribution

These tests are usually done to compare the observed data with the data that is expected to be
obtained with a specific hypothesis.
Degrees of freedom is usually denoted by the Greek symbol ν (nu) and is commonly
abbreviated as df. The statistical formula to compute the value of degrees of freedom is
quite simple and is equal to the number of values in the data set minus one. Symbolically:
df = n − 1
Where n is the number of values in the data set or the sample size. The concept of df can be
further understood through an illustration given below:
Suppose there is a data set X that includes the values: 10,20,30,40. First of all, we will
calculate the mean of these values, which is equal to:
(10+20+30+40) /4 = 25.
Once the mean is calculated, apply the formula of degrees of freedom. As the number of
values in the data set or sample size is 4, so,
df = 4-1=3.
Thus, this shows that there are three values in the data set that have the freedom to vary as
long as the mean is 25.
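The illustration above can be checked in a few lines of Python: once the mean is fixed at 25, any three values can be chosen freely, and the fourth is forced.

```python
# The worked example above: with the mean fixed, only n - 1 values are free.
data = [10, 20, 30, 40]
n = len(data)
mean = sum(data) / n          # 25.0
df = n - 1                    # 3

# Fix the mean, choose any three values freely; the fourth is then forced:
free_values = [10, 20, 30]
forced = n * mean - sum(free_values)
print(df, forced)             # 3 40.0
```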

How do you find the degrees of freedom for a paired sample t test?
The most commonly encountered equation to determine degrees of freedom in statistics
is df = N-1

What is the calculation of the degrees of freedom for the two independent sample t test?
In the case of a t-test, there are two samples, so the degrees of freedom are N1 + N2 – 2 = df.
Once you determine the significance level (first row) and the degrees of freedom (first
column), the intersection of the two in the chart is the critical value for your particular study.

df for ANOVA
One-way ANOVA:
df1 = k (number of groups) − 1
Two-way ANOVA (e.g. a 2 × 3 design):
df1 = (2 − 1) × (3 − 1)

df2 = N (number of participants) − k (number of groups)

For within-subjects designs, things can become a bit more complicated. The calculation of
df2 for a repeated-measures ANOVA with one within-subjects factor is as follows:
df2 = df_total − df_subjects − df_factor, where df_total = number of observations (across
all levels of the within-subjects factor) − 1, df_subjects = number of participants (N) − 1,
and df_factor = number of levels (k) − 1. The take-home message for repeated-measures
ANOVA is that you lose one additional degree of freedom for the subjects.
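The df formulas above, sketched in code (assuming one observation per participant per level in the repeated-measures case; note the result simplifies to (N − 1) × (k − 1)):

```python
def df_one_way(k, N):
    """Between-groups (df1) and within-groups (df2) df for a one-way ANOVA."""
    return k - 1, N - k

def df_repeated_measures(N, k):
    """df2 for a repeated-measures ANOVA with one within-subjects factor,
    using the decomposition df2 = df_total - df_subjects - df_factor."""
    df_total = N * k - 1     # one observation per participant per level
    df_subjects = N - 1
    df_factor = k - 1
    return df_total - df_subjects - df_factor

print(df_one_way(3, 30))            # (2, 27)
print(df_repeated_measures(10, 3))  # 18, i.e. (N - 1) * (k - 1)
```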

df for chi-square

df = (rows − 1) × (columns − 1)

CHI SQUARE
The chi-square statistic is a non-parametric (distribution-free) tool designed to analyze
group differences when the dependent variable is measured at a nominal level. Cramer's V
is the most common strength test used to test the data when a significant chi-square result
has been obtained.

What is a chi square test used for?


The chi-squared test is used to determine whether there is a significant difference between the
expected frequencies and the observed frequencies in one or more categories. A chi-squared
test can be used to attempt rejection of the null hypothesis that the data are independent.

Important points before we get started:


 This test only works for categorical data (data in categories), such as Gender
{Men, Women} or Color {Red, Yellow, Green, Blue}, but not numerical data such as height
or weight.
 The numbers must be large enough. Each expected count must be 5 or more. In our
example we have values such as 207, 282, etc., so we are good to go.
Our first step is to state our hypotheses. The two hypotheses are:
 Null hypothesis: Gender and preference for cats or dogs are independent.
 Alternative hypothesis: Gender and preference for cats or dogs are not independent.
Lay the data out in a table:
  Cat Dog
Men 207 282
Women 231 242
Add up rows and columns:
  Cat Dog  
Men 207 282 489
Women 231 242 473
  438 524 962
Calculate "Expected Value" for each entry:
Multiply each row total by each column total and divide by the overall total:
  Cat Dog  
Men 489×438/962 489×524/962 489
Women 473×438/962 473×524/962 473
  438 524 962
Which gives us:
  Cat Dog  
Men 222.64 266.36 489
Women 215.36 257.64 473
  438 524 962
Subtract expected from actual, square it, then divide by expected:
  Cat Dog  
Men (207−222.64)²/222.64 (282−266.36)²/266.36 489
Women (231−215.36)²/215.36 (242−257.64)²/257.64 473
  438 524 962
Which is:
  Cat Dog  
Men 1.099 0.918 489
Women 1.136 0.949 473
  438 524 962
Now add up those values:
1.099 + 0.918 + 1.136 + 0.949 = 4.102
Chi-Square is 4.102
From Chi-Square to p
To get from chi-square to a p-value is a difficult calculation, so either look it up in a table, or
use a chi-square calculator.

But first you will need a "Degree of Freedom" (DF)


Calculate Degrees of Freedom
Multiply (rows − 1) by (columns − 1)
Example: DF = (2 − 1)(2 − 1) = 1×1 = 1
Result
The result is:
p = 0.04283
Done!
Chi-Square Formula
This is the formula for chi-square:
χ² = Σ (O − E)² / E
where:
 O = the Observed (actual) value
 E = the Expected value
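The whole worked example can be reproduced in plain Python. Note that the hand calculation above rounds each cell to three decimals, so the unrounded statistic comes out near 4.104 rather than 4.102; for df = 1 the p-value has a closed form via the complementary error function:

```python
import math

def chi_square_2x2(table):
    """Chi-square statistic for a 2x2 contingency table (no continuity
    correction), following the expected-value steps worked above."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2

observed = [[207, 282],   # Men:   cat, dog
            [231, 242]]   # Women: cat, dog
chi2 = chi_square_2x2(observed)

# For df = 1, p = erfc(sqrt(chi2 / 2)) -- the two-tailed normal tail area.
p = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), round(p, 4))  # 4.1 0.0428
```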

WHY USE THE ANOVA OVER A T-TEST? 


The point of conducting an experiment is to find a significant effect between the
stimuli being tested. To do this, various statistical tests are used; the 2 being
discussed in this blog are the ANOVA and the t-test. In a psychology
experiment, the independent variable and dependent variable are the stimulus being
manipulated and the behaviour being measured. Statistical tests are carried out to
confirm that the behaviour occurring is more than chance.
The t-test compares the means between 2 samples and is simple to conduct,
but if there are more than 2 conditions in an experiment an ANOVA is
required. The fact that the ANOVA can test more than one treatment is a major
advantage over other statistical analyses such as the t-test; it opens up many
testing capabilities, but it certainly doesn't help with mathematical headaches. It
is important to know that in the analysis of variance an IV is
called a factor, and the treatment conditions or groups in an experiment are
called the levels of the factor. ANOVAs use an F-ratio as the significance
statistic, which is a ratio of variances, because it is impossible to calculate a
single difference between sample means with more than two samples.

T-tests are easier to conduct, so why not conduct a t-test for each of the possible
comparisons in the experiment? A Type I error is the answer: the more
hypothesis tests you use, the more you risk making a Type I error, and the less
power a test has. There is no disputing that the t-test changed statistics with its
ability to find significance with a small sample, but as previously mentioned the
ANOVA allows for testing more than 2 means. ANOVAs are used a lot
professionally when testing pharmaceuticals and therapies.
The ANOVA is an important test because it enables us to see, for example, how
effective two different types of treatment are and how durable they are.
Effectively an ANOVA can tell us how well a treatment works, how long it lasts
and how budget-friendly it will be. An example is early intensive behavioural
intervention (EIBI) for autistic children, which lasts a long time with many hours,
has amazing results but costs a lot of money. The ANOVA is able to tell us if
another therapy can do the same task in a shorter amount of time, therefore
costing less and making the treatment more accessible. Conducting this test
would also help establish concurrent validity for the therapy against EIBI. The F-
ratio tells the researcher how big the difference between the conditions is
and whether the effect is more than just chance. The ANOVA test assumes three things:
 The population samples must be normal
 The observations must be independent in each sample
 The populations the samples are selected from have equal variance,
a.k.a. homogeneity of variance.
These assumptions are the same as for the independent and repeated measures t-tests, and
they are addressed in the same way for the t-test and the ANOVA: the population is
assumed to be normal, independence of the samples is achieved through the design of the
experiment, and if the variances are not equal then normally more data (participants) are
needed in the experiment.
In conclusion, it is necessary to use the ANOVA when the design of a study has
more than 2 conditions to compare. The t-test is simple and less daunting,
especially when you see that a 2x4x5 factorial ANOVA is needed, but the risk of
committing a Type I error is not worth it. The time spent conducting the
experiment, only to have the results declared invalid because the right statistical test
wasn't conducted, would be a waste of time and resources; statistical tests should
be used correctly for this reason.
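The "ratio of variances" idea can be made concrete by computing a one-way ANOVA F-ratio by hand (the three groups below are made up for illustration):

```python
def one_way_anova_f(groups):
    """F-ratio for a one-way ANOVA: mean square between groups divided by
    mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2), df1, df2

# Three made-up treatment groups (levels of one factor):
f, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f, df1, df2)  # 3.0 2 6
```

The F-ratio would then be compared against the critical F value for (df1, df2) at the chosen significance level.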

 standard error

TYPE 1 AND TYPE 2 ERROR


Statistical hypothesis testing implies that no test is ever 100% certain: that’s because we rely
on probabilities to experiment.
Even though hypothesis tests are meant to be reliable, there are two types of errors that can
occur.
These errors are known as type 1 and type 2 errors.

Understanding Type 1 errors


Type 1 errors – often called false positives – happen in hypothesis testing
when the null hypothesis is true but rejected. The null hypothesis is a general statement or
default position that there is no relationship between two measured phenomena.
Simply put, type 1 errors are “false positives” – they happen when the tester validates a
statistically significant difference even though there isn’t one.
Type 1 errors have a probability of “α” correlated to the level of confidence that you set. A
test with a 95% confidence level means that there is a 5% chance of getting a type 1
error.

Consequences of a type 1 Error


Type 1 errors can happen due to bad luck (the 5% chance has played against you) or because
you didn’t respect the test duration and sample size initially set for your experiment.
Consequently, a type 1 error will bring in a false positive. This means that you will
wrongfully assume that your hypothesis testing has worked even though it hasn’t.
In real life situations, this could potentially mean losing possible sales due to a faulty
assumption caused by the test.
A real-life example of a type 1 error
Let’s say that you want to increase conversions on a banner displayed on your website. For
that to work out, you’ve planned on adding an image to see if it increases conversions or not.
You start your A/B test running a control version (A) against your variation (B) that contains
the image. After 5 days, the variation (B) outperforms the control version by a staggering
25% increase in conversions with an 85% level of confidence.
You stop the test and implement the image in your banner. However, after a month, you
noticed that your month-to-month conversions have actually decreased.
That’s because you’ve encountered a type 1 error: your variation didn’t actually beat your
control version in the long run.
Understanding type 2 errors
If type 1 errors are commonly referred to as “false positives”, type 2 errors are referred to
as “false negatives”.
Type 2 errors happen when you inaccurately assume that no winner has been declared
between a control version and a variation although there actually is a winner.
In more statistically accurate terms, type 2 errors happen when the null hypothesis is
false and you subsequently fail to reject it.
If the probability of making a type 1 error is determined by “α”, the probability of a
type 2 error is “β”. Beta depends on the power of the test (i.e the probability of not
committing a type 2 error, which is equal to 1-β).
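The relationship between α, β and power can be made concrete for a simple one-sided z-test with known standard deviation (a textbook-style sketch; the effect sizes and sample size below are made up):

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test detecting a true mean shift of
    `effect`, with known population sd `sigma` and sample size `n`."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect * n ** 0.5 / sigma          # shift of the test statistic under H1
    return 1 - NormalDist().cdf(z_crit - shift)

# With no true effect, "power" collapses to alpha, the type 1 error rate:
print(round(z_test_power(0.0, 1.0, 50), 3))   # 0.05
# A real effect of 0.5 sd with n = 50 is detected with high probability:
print(round(z_test_power(0.5, 1.0, 50), 3))
```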

Consequences of a type 2 error


Similarly to type 1 errors, type 2 errors can lead to false assumptions and poor decision
making that can result in lost sales or decreased profits.
Moreover, getting a false negative (without realizing it) can discredit your conversion
optimization efforts even though you could have proven your hypothesis. This can be a
discouraging turn of events that could happen to all CRO experts and digital marketers.

A real-life example of a type 2 error


Let’s say that you run an e-commerce store that sells high-end, complicated hardware for
tech-savvy customers. In an attempt to increase conversions, you have the idea to implement
an FAQ below your product page.
You launch an A/B test to see if the variation (B) could outperform your control version (A).
After a week, you do not notice any difference in conversions: both versions seem to convert
at the same rate and you start questioning your assumption. Three days later, you stop the test
and keep your product page as it is.
At this point, you assume that adding an FAQ to your store didn’t have any effect on
conversions.
Two weeks later, you hear that a competitor has implemented an FAQ at the same time and
observed tangible gains in conversions. You decide to re-run the test for a month in order to
get more statistically relevant results based on an increased level of confidence (say 95%).
After a month – surprise – you discover positive gains in conversions for the variation (B).
Adding an FAQ at the bottom of your product page has indeed brought your company more
sales than the control version.
That’s right – your first test encountered a type 2 error!

SPECIFICITY AND SENSITIVITY


Sensitivity is defined as the ability of a test to identify as positive all the patients who
actually have the disease.
Specificity is defined as the ability of a test to identify as negative all the patients who do
not have the disease. Whether sensitivity or specificity matters more depends on the
purpose of the test.

 What does it mean if a test is sensitive but not specific?


A highly sensitive test means that there are few false negative results, and thus fewer
cases of disease are missed. The specificity of a test is its ability to designate
an individual who does not have a disease as negative. A highly specific test means that
there are few false positive results.

What does 90% specificity mean?


A test that is 90% specific will identify 90% of patients who do not have the disease. Tests
with a high specificity (a high true negative rate) are most useful when the result is positive.
A highly specific test can be useful for ruling in patients who have a certain disease
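Both quantities drop out of a 2x2 confusion matrix. A small sketch with invented screening counts:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = true positives / all diseased;
    specificity = true negatives / all non-diseased."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical screening results: 90 of 100 diseased patients detected,
# 180 of 200 healthy people correctly ruled out.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=180, fp=20)
print(sens, spec)  # 0.9 0.9
```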

PARAMETRIC AND NON PARAMETRIC TESTS


Parametric tests are those that make assumptions about the parameters of the population
distribution from which the sample is drawn. This is often the assumption that the population
data are normally distributed.
 Non-parametric tests are “distribution-free” and, as such, can be used for non-Normal
variables.

The following table lists the nonparametric tests and their parametric alternatives.
NONPARAMETRIC TEST                      PARAMETRIC ALTERNATIVE
1-sample sign test                      One-sample z-test, one-sample t-test
1-sample Wilcoxon signed-rank test      One-sample z-test, one-sample t-test
Friedman test                           Two-way ANOVA
Kruskal-Wallis test                     One-way ANOVA
Mann-Whitney test                       Independent-samples t-test
Mood’s median test                      One-way ANOVA
Spearman rank correlation               Pearson correlation coefficient

WHAT IS A GOLD STANDARD TEST?


A gold standard test is the best available diagnostic test for determining whether a patient
does or does not have a disease or condition. For example, a biopsy can identify breast
cancer cells with good accuracy, while an autopsy is usually accurate at identifying the
cause of death. It is usually used when an initial screening gives a positive result. People
who test negative are not usually given the gold standard tests, because they are often
expensive, invasive, or risky.

Gold standard tests mean that diseases and conditions can be correctly classified.
Examples
DISEASE/CONDITION       INITIAL TEST                    GOLD STANDARD
Breast cancer           Mammogram                       Tissue biopsy
Rabies (in animals)     ?                               DFA test
Tuberculosis            Tuberculin skin or blood test   Lowenstein-Jensen culture
Colorectal cancer       Blood in stool sample           Colonoscopy
Diabetes                Fasting plasma glucose          Oral GTT
Assessing new tests
New diagnostic tests are compared against the gold standard. Sensitivity and specificity for
these new tests are usually estimated from comparing them to the gold standard. This system
is not perfect: no test (even if the best available) is perfect and has some bias (error) attached
to it. For example, biopsies vary wildly in accuracy depending on the specific cancer type.
What this means is you’re essentially comparing any new test to a standard that has some
error attached to it, which means your new test will also have error. This can be mitigated by
using a third, “resolver” test, followed by re-testing some people whose results are the same
(either positive or negative) on both the new test and the resolver test.

SKEWNESS AND KURTOSIS


What does skewness mean?
Skewness is asymmetry in a statistical distribution, in which the curve appears distorted or
skewed either to the left or to the right. Skewness can be quantified to define the extent to
which a distribution differs from a normal distribution.
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution,
or data set, is symmetric if it looks the same to the left and right of the center point.
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal
distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data
sets with low kurtosis tend to have light tails, or a lack of outliers. A uniform distribution
would be an extreme case of light tails.
The skewness for a normal distribution is zero, and any symmetric data should have a
skewness near zero. Negative values for the skewness indicate data that are skewed left and
positive values for the skewness indicate data that are skewed right. By skewed left, we mean
that the left tail is long relative to the right tail. Similarly, skewed right means that the right
tail is long relative to the left tail. If the data are multi-modal, then this may affect the sign of
the skewness.
The kurtosis for a standard normal distribution is three. 
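The moment-based definitions can be computed directly in plain Python (under this definition a normal distribution has skewness 0 and kurtosis 3; some software reports "excess kurtosis", which subtracts 3):

```python
def skewness_kurtosis(data):
    """Moment-based skewness (m3 / m2^1.5) and kurtosis (m4 / m2^2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# A perfectly symmetric sample has zero skewness:
skew, kurt = skewness_kurtosis([1, 2, 3, 4, 5])
print(skew, kurt)  # 0.0 1.7
```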

CRITICAL VALUE
What is a critical value in statistics?
In hypothesis testing, a critical value is a point on the test distribution that is compared to the
test statistic to determine whether to reject the null hypothesis. If the absolute value of your
test statistic is greater than the critical value, you can declare statistical significance and
reject the null hypothesis.

When the sampling distribution of the statistic is normal or nearly normal, the critical value
can be expressed as a t score or as a z-score. To find the critical value, follow these steps.
 Compute alpha (α): α = 1 - (confidence level / 100)
 Find the critical probability (p*): p* = 1 - α/2
 To express the critical value as a z-score, find the z-score having a cumulative
probability equal to the critical probability (p*).
 To express the critical value as a t statistic, follow these steps.
 Find the degrees of freedom (df). Often, df is equal to the sample size minus
one.
 The critical t statistic (t*) is the t statistic having degrees of freedom equal to
df and a cumulative probability equal to the critical probability (p*).
Should you express the critical value as a t statistic or as a z-score? There are several ways to
answer this question. As a practical matter, when the sample size is large (greater than 40), it
doesn't make much difference. Both approaches yield similar results. Strictly speaking, when
the population standard deviation is unknown or when the sample size is small, the t statistic
is preferred. Nevertheless, many introductory texts and the Advanced Placement Statistics
Exam use the z-score exclusively. 
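The z-score steps above can be carried out with Python's statistics.NormalDist (the t critical value has no simple stdlib equivalent, so look it up in a t table using the df computed as described):

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical z-score, following the steps above:
    alpha = 1 - confidence/100, critical probability p* = 1 - alpha/2."""
    alpha = 1 - confidence_level / 100
    p_star = 1 - alpha / 2
    return NormalDist().inv_cdf(p_star)  # z with cumulative probability p*

print(round(z_critical(95), 3))  # 1.96
print(round(z_critical(99), 3))  # 2.576
```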

MEASURES OF DISPERSION
What do measures of dispersion show?
As the name suggests, a measure of dispersion shows the scattering of the data. It tells
how the data vary from one another and gives a clear idea about the distribution of the
data. The measure of dispersion shows the homogeneity or heterogeneity of the
distribution of the observations.

What are the four measures of variation?


Just as in the section on central tendency where we discussed measures of the center of a
distribution of scores, in this chapter we will discuss measures of the variability of a
distribution. There are four frequently used measures of variability: the range, interquartile
range, variance, and standard deviation.
RANGE
The range is the simplest measure of variability to calculate, and one you have probably
encountered many times in your life. The range is simply the highest score minus the lowest
score
INTERQUARTILE RANGE
The interquartile range (IQR) is the range of the middle 50% of the scores in a distribution. It
is computed as follows:
IQR = 75th percentile - 25th percentile
VARIANCE
Variability can also be defined in terms of how close the scores in the distribution are to the
middle of the distribution. Using the mean as the measure of the middle of the distribution,
the variance is defined as the average squared difference of the scores from the mean.

STANDARD DEVIATION
The standard deviation is simply the square root of the variance. This makes the standard
deviations of the two quiz distributions 1.225 and 2.588. The standard deviation is an
especially useful measure of variability when the distribution is normal or approximately
normal (see Chapter on Normal Distributions) because the proportion of the distribution
within a given number of standard deviations from the mean can be calculated. For example,
68% of the distribution is within one standard deviation of the mean and approximately 95%
of the distribution is within two standard deviations of the mean. Therefore, if you had a
normal distribution with a mean of 50 and a standard deviation of 10, then 68% of the
distribution would be between 50 - 10 = 40 and 50 +10 =60. Similarly, about 95% of the
distribution would be between 50 - 2 x 10 = 30 and 50 + 2 x 10 = 70. The symbol for the
population standard deviation is σ; the symbol for an estimate computed in a sample is s.
Figure 2 shows two normal distributions. The red distribution has a mean of 40 and a
standard deviation of 5; the blue distribution has a mean of 60 and a standard deviation of 10.
For the red distribution, 68% of the distribution is between 35 and 45; for the blue
distribution, 68% is between 50 and 70.
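All four measures for a small made-up data set, using Python's statistics module (quantiles with method="inclusive" corresponds to the usual linear-interpolation percentile rule; other percentile conventions give slightly different quartiles):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)                            # range
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                                                  # interquartile range
variance = statistics.pvariance(data)                          # population variance
sd = statistics.pstdev(data)                                   # standard deviation

print(value_range, iqr, variance, sd)  # 7 1.5 4.0 2.0
```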
Level Of Significance

The level of significance is defined as the probability of rejecting a null hypothesis by the test
when it is really true, which is denoted as α. That is, P (Type I error) = α.

Confidence level:

Confidence level refers to the probability that a parameter lies within a specified range of
values, which is denoted as c. Moreover, the confidence level is connected with the level of
significance. The relationship between the level of significance and the confidence level is
c = 1 − α.

The common level of significance and the corresponding confidence level are given below:

• The level of significance 0.10 is related to the 90% confidence level.

• The level of significance 0.05 is related to the 95% confidence level.

• The level of significance 0.01 is related to the 99% confidence level.

The rejection rule is as follows:

Rejection region:
The rejection region is the values of test statistic for which the null hypothesis is rejected.

Non-rejection region:

The set of all possible values for which the null hypothesis is not rejected is called the
non-rejection region.

In a two-tailed test, the rejection region is split between both tails of the distribution.

In a one-tailed test, the rejection region lies entirely in one tail:

• In the left-tailed test, the rejection region is in the left tail; in the right-tailed test, it is in
the right tail.

How do you find the level of significance?


To find the significance level, subtract the number shown from one. For example, a value of
".01" corresponds to a 99% (1 − .01 = .99) confidence level, i.e. a 1% risk of rejecting the
null hypothesis when it is actually true.

-importance of testing hypothesis


According to the San Jose State University Statistics Department, hypothesis testing is one
of the most important concepts in statistics because it is how you decide if something really
happened, or if certain treatments have positive effects, or if groups differ from each other or
if one variable predicts another.
The Importance of Hypothesis Testing
By Sirah Dubois
A hypothesis is a theory or proposition set forth as an explanation for the occurrence of
some observed phenomenon, asserted either as a provisional conjecture to guide
investigation, called a working hypothesis, or accepted as highly probable in the light of
established facts. A scientific hypothesis can become a theory or ultimately a law of nature
if it is proven by repeatable experiments. Hypothesis testing is common in statistics as a
method of making decisions using data. In other words, testing a hypothesis is trying to
determine if your observation of some phenomenon is likely to have really occurred based
on statistics.
Statistical Hypothesis Testing
Statistical hypothesis testing, also called confirmatory data analysis, is often used to decide
whether experimental results contain enough information to cast doubt on conventional
wisdom. For example, at one time it was thought that people of certain races or color had
inferior intelligence compared to Caucasians. A hypothesis was made that intelligence is
not based on race or color. People of various races, colors and cultures were given
intelligence tests and the data was analyzed. Statistical hypothesis testing then proved that
the results were statistically significant in that the similar measurements of intelligence
between races are not merely sample error.
Null and Alternative Hypotheses
Before testing for phenomena, you form a hypothesis of what might be happening. Your
hypothesis or guess about what’s occurring might be that certain groups are different from
each other, or that intelligence is not correlated with skin color, or that some treatment has
an effect on an outcome measure, for examples. From this, there are two possibilities: a
“null hypothesis” that nothing happened, or there were no differences, or no cause and
effect; or that you were correct in your theory, which is labeled the “alternative
hypothesis.” In short, when you test a statistical hypothesis, you are trying to see if
something happened and are comparing against the possibility that nothing happened.
Confusingly, you are trying to disprove that nothing happened. If you disprove that nothing
happened, then you can conclude that something happened.
Importance of Hypothesis Testing
As noted above (San Jose State University Statistics Department), hypothesis testing is how
you decide if something really happened, if certain treatments have positive effects, if
groups differ from each other, or if one variable predicts another. In short, you want to
show that your data are statistically significant and unlikely to have occurred by chance
alone. In essence, then, a hypothesis test is a test of significance.
Possible Conclusions
Once the statistics are collected and you test your hypothesis against the likelihood of
chance, you draw your final conclusion. If you reject the null hypothesis, you are claiming
that your result is statistically significant and that it did not happen by luck or chance. As
such, the outcome supports the alternative hypothesis. If you fail to reject the null hypothesis,
you must conclude that you did not find an effect or difference in your study. This method
is how many pharmaceutical drugs and medical procedures are tested.

-RCT
What is randomized controlled trial in research?
Randomized controlled trial: (RCT) A study in which people are allocated at random (by
chance alone) to receive one of several clinical interventions. One of these interventions is
the standard of comparison or control. The control may be a standard practice, a placebo
("sugar pill"), or no intervention at all.

Random Samples / Randomization Copyright © 1998 Gerard E. Dallal


Random Samples and Randomization are two different things, but they have something in
common as the presence of random in both names suggests — both involve the use of a
probability device. With random samples, chance determines who will be in the sample. With
randomization [aka, random assignment], chance determines the assignment of treatments. A
random sample is drawn from a population by using a probability device. We might put
everyone’s name on a slip of paper, mix thoroughly, and select the number of names we
need. The use of a probability device to select the subjects allows us to make valid
generalizations from the sample to the population. In an intervention trial, randomization
refers to the use of a probability device to assign subjects to treatment. This allows us to use
statistical methods to make valid statements about the difference between treatments for this
set of subjects. The subjects who are randomized may or may not be a random sample from
some larger population. If they are a random sample, then statistical theory lets us generalize
from this trial to the population from which the sample was drawn. If they are not a random
sample from some larger population, then generalizing beyond the trial is a matter of
nonstatistical judgment [italics added].
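The distinction can be illustrated in code: random.sample decides who is IN the study, while shuffling the treatment list decides who GETS which treatment (the population names and group sizes below are invented):

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Random sampling: chance determines who will be in the sample.
population = [f"person_{i}" for i in range(100)]
sample = random.sample(population, 10)

# Randomization (random assignment): chance determines which treatment
# each enrolled subject receives.
treatments = ["Active"] * 5 + ["Placebo"] * 5
random.shuffle(treatments)
assignment = dict(zip(sample, treatments))

print(len(sample), sorted(treatments))
```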

-methods of randomization - pros and cons


INTRODUCTION A good clinical trial minimizes variability of the evaluation and provides
an unbiased evaluation of the intervention by avoiding confounding from other factors.
Randomization insures that each patient have an equal chance of receiving any of the
treatments under study, generate comparable intervention groups which are alike in all
important aspects except for the intervention each group receives. It also provides a basis for
the statistical methods used in analyzing data. WHY RANDOMIZATION The basic benefits
of randomization include 1. Eliminates selection bias. 2. Balances arms with respect to
prognostic variables (known and unknown). 3. Forms basis for statistical tests, a basis for an
assumption-free statistical test of the equality of treatments. In general, a randomized trial is
an essential tool for testing the efficacy of the treatment. CRITERIA FOR
RANDOMIZATION 1. Unpredictability • Each participant has the same chance of receiving
any of the interventions. • Allocation is carried out using a chance mechanism so that neither
the participant nor the investigator will know in advance which will be assigned. 2. Balance •
Treatment groups are of a similar size & constitution, groups are alike in all important aspects
and only differ in the intervention each group receives 3. Simplicity • Easy for
investigator/staff to implement METHODS OF RANDOMIZATION The common types of
randomization include (1) simple, (2) block, (3) stratified and (4) unequal randomization.
Some other methods such as biased coin, minimization and response-adaptive methods may
be applied for specific purposes. 1. Simple Randomization This method is equivalent to
tossing a coin for each subject that enters a trial, such as Heads = Active, Tails = Placebo.
A random number generator is generally used. Simple randomization is easy to implement and the treatment assignment is completely unpredictable. However, the treatment assignment can become imbalanced, especially in smaller trials, and imbalanced randomization reduces statistical power. In a trial of 10 participants, the variance of the treatment effect for a 5-5 split relative to a 7-3 split is (1/5 + 1/5)/(1/7 + 1/3) = 0.84, so a 7-3 split is only 84% as efficient as a 5-5 split. Even if treatment is balanced at the end of a trial, it may not be balanced at some time during the trial. For example, the trial may be balanced at the end with 100 participants, but the first 10 assignments might be AAAATATATA. If the trial is monitored during the process, we would like balance in the number of subjects on each treatment over time.

2. Block Randomization
Simple randomization does not guarantee balance in numbers during the trial. In particular, if patient characteristics change with time (e.g. early patients are sicker than later ones), early imbalances cannot be corrected. Block randomization is often used to fix this issue. The basic idea is to divide potential patients into m blocks of size 2n, randomize each block such that n patients are allocated to A and n to B, and then choose the blocks randomly. This ensures equal treatment allocation within each block if the complete block is used.
Example: two treatments A and B, block size 2 x 2 = 4. The possible treatment allocations within each block are (1) AABB, (2) BBAA, (3) ABAB, (4) BABA, (5) ABBA, (6) BAAB.
The block size depends on the number of treatments: it should be short enough to prevent imbalance and long enough to prevent guessing of the allocation. The block size should be at least twice the number of treatments (ref ICH E9). The block size is not stated in the protocol, so the clinicians and investigators are blind to it. If blocking is not masked in open-label trials, the sequence becomes somewhat predictable (e.g. with 2n = 4: B A B ? must be A; A A ? ? must be B B), which could lead to selection bias. To avoid selection bias, (1) do not reveal the blocking mechanism, and (2) use random block sizes. If treatment is double-blinded, selection bias is not likely. Note that if only one block is requested, a single sequence of random assignments is produced, i.e. simple randomization.

3. Stratified Randomization
Imbalance in the numbers of subjects reduces statistical power,
and imbalance in prognostic factors also makes estimation of the treatment effect inefficient. The trial may not be valid if it is not well balanced across prognostic factors. For example, with 6 diabetics there is a 22% chance of a 5-1 or 6-0 split under block randomization alone. Stratified randomization is the solution for achieving balance within subgroups: use block randomization separately for diabetics and non-diabetics. For example, with age group (< 40, 41-60, > 60) and sex (M, F), the total number of strata is 3 x 2 = 6. Stratification balances subjects on baseline covariates and tends to produce groups that are comparable with regard to certain characteristics (e.g. gender, age, race, disease severity), and thus produces valid statistical tests. The block size should be relatively small to maintain balance in small strata. Increasing the number of stratification variables, or the number of levels within a stratum, leads to fewer patients per stratum. Subjects should have baseline measurements taken before randomization. Large clinical trials often do not use stratification, because imbalance in subject characteristics is unlikely in a large randomized trial.

4. Unequal Randomization
Most randomized trials allocate equal numbers of patients to the experimental and control groups. This is the most statistically efficient randomization ratio, as it maximizes statistical power for a given total sample size. However, it may not be the most economically efficient or the most ethically or practically feasible. When two or more treatments under evaluation differ in cost, it may be more economically efficient to randomize fewer patients to the expensive treatment and more to the cheaper one. Substantial cost savings can be achieved by adopting a smaller randomization ratio, such as 2:1, with only a modest loss of statistical power. Unequal allocation is also used when one arm saves lives and the other (e.g. placebo or medical care only) does little to save them, as in some oncology trials; because subject survival depends on the treatment received, a more extreme allocation may be used to assign fewer patients to the placebo group. Generally, a randomization ratio of 3:1 already loses considerable statistical power, and ratios more extreme than 3:1 are not very useful, since they require a much larger sample size.
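As an illustration (not from the source), simple and permuted-block randomization can be sketched with Python's standard library; the arm labels, block size and seeds here are arbitrary choices:

```python
import random

def simple_randomization(n, arms=("A", "B"), seed=None):
    """Simple randomization: an independent 'coin toss' for each subject.
    Balance is not guaranteed, especially in small trials."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n)]

def block_randomization(n_blocks, block_size=4, arms=("A", "B"), seed=None):
    """Permuted-block randomization: each block holds an equal number of
    assignments per arm, shuffled, so balance is maintained throughout
    the trial (block_size must be a multiple of the number of arms)."""
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm   # e.g. A, B, A, B for block size 4
        rng.shuffle(block)             # one of AABB, ABAB, ABBA, ...
        sequence.extend(block)
    return sequence

# With blocks of 4, every 4 consecutive assignments are balanced 2:2.
seq = block_randomization(n_blocks=5, block_size=4, seed=1)
for i in range(0, len(seq), 4):
    block = seq[i:i + 4]
    assert block.count("A") == block.count("B") == 2
```

The in-loop assertion demonstrates the property block randomization is designed for: balance at every block boundary, not just at the end of the trial.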

Advantages of randomised control trial study design:
 Comparative:
o One treatment is directly compared to another to establish superiority.
o This study design can make causal inferences, i.e. it is the strongest empirical evidence of a treatment's efficacy.
 Minimises bias:
o Randomisation minimises allocation bias and selection bias.
o Blinding minimises performance bias.
o Double-blinding minimises assessment bias.
o Allocation concealment minimises both performance and assessment bias.
o Prospective design minimises recall error and selection bias.
 Minimises confounding factors:
o Randomisation minimises confounding due to unequal distribution of prognostic factors.
o Randomisation makes groups comparable according to both known and unknown factors.
o Blocked randomisation makes groups comparable within known confounding factors.
 Statistical reliability:
o Statistical tests of significance are readily interpretable when the study is randomised.
o An adequately powered sample size avoids both Type 1 error (where the null hypothesis is incorrectly rejected) and Type 2 error (where the null hypothesis is incorrectly accepted).
 Publishable:
o Considered the gold standard: more publishable.

Disadvantages of randomised control trial study design:
 Logistics:
o The power calculation might demand a vast sample size, which requires more resources from the investigators.
o Validity requires multiple sites, which are difficult to manage.
o A long trial run time may result in loss of relevance, as practice may have moved on by the time the trial is published.
 Statistics:
o A disadvantage of block randomization is that the allocation of participants may be predictable and result in selection bias when the study groups are unmasked.
 Applicability:
o Trials which test for efficacy may not be widely applicable; trials which test for effectiveness are larger and more expensive.
o Results may not always mimic the real-life treatment situation (e.g. inclusion/exclusion criteria; highly controlled setting).
 Ethical limitations:
o Randomisation requires clinical equipoise: one cannot ethically randomise patients unless both treatments have equal support in the clinical community.
o Informed consent is often impossible.
o Some research cannot be ethically performed as an RCT (classically, an RCT of the effect of parachutes on the survival of sky-divers).
- sampling and sampling methods
There are several different sampling techniques available, and they can be subdivided into
two groups: probability sampling and non-probability sampling. In probability (random)
sampling, you start with a complete sampling frame of all eligible individuals from which
you select your sample. In this way, all eligible individuals have a chance of being chosen for
the sample, and you will be more able to generalise the results from your study. Probability
sampling methods tend to be more time-consuming and expensive than non-probability
sampling. In non-probability (non-random) sampling, you do not start with a complete
sampling frame, so some individuals have no chance of being selected. Consequently, you
cannot estimate the effect of sampling error and there is a significant risk of ending up with a
non-representative sample which produces non-generalisable results. However, non-
probability sampling methods tend to be cheaper and more convenient, and they are useful
for exploratory research and hypothesis generation.
 
Probability Sampling Methods
1. Simple random sampling
In this case each individual is chosen entirely by chance and each member of the population
has an equal chance, or probability, of being selected. One way of obtaining a random sample
is to give each individual in a population a number, and then use a table of random numbers
to decide which individuals to include. For example, if you have a sampling frame of 1000
individuals, labelled 0 to 999, use groups of three digits from the random number table to
pick your sample. So, if the first three numbers from the random number table were 094,
select the individual labelled “94”, and so on.
As with all probability sampling methods, simple random sampling allows the sampling error
to be calculated and reduces selection bias. A specific advantage is that it is the most
straightforward method of probability sampling. A disadvantage of simple random sampling
is that you may not select enough individuals with your characteristic of interest, especially if
that characteristic is uncommon. It may also be difficult to define a complete sampling frame
and inconvenient to contact them, especially if different forms of contact are required (email,
phone, post) and your sample units are scattered over a wide geographical area.
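A minimal sketch of this procedure in Python (the frame of 1000 labelled individuals is the hypothetical example above; the seed is arbitrary):

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Simple random sampling without replacement: every member of the
    sampling frame has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame of 1000 individuals labelled 0-999.
frame = list(range(1000))
sample = simple_random_sample(frame, n=100, seed=42)
```

`random.sample` plays the role of the random number table described in the text: it picks distinct positions purely by chance.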
 
2. Systematic sampling
Individuals are selected at regular intervals from the sampling frame. The intervals are chosen
to ensure an adequate sample size. If you need a sample size n  from a population of size x,
you should select every x/nth individual for the sample.  For example, if you wanted a sample
size of 100 from a population of 1000, select every 1000/100 = 10th member of the sampling
frame.
Systematic sampling is often more convenient than simple random sampling, and it is easy to
administer. However, it may also lead to bias, for example if there are underlying patterns in
the order of the individuals in the sampling frame, such that the sampling technique coincides
with the periodicity of the underlying pattern. As a hypothetical example, if a group of
students were being sampled to gain their opinions on college facilities, but the Student
Record Department’s central list of all students was arranged such that the sex of students
alternated between male and female, choosing an even interval (e.g. every 20th student) would
result in a sample of all males or all females. Whilst in this example the bias is obvious and
should be easily corrected, this may not always be the case.
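The interval-based selection can be sketched as follows (a stdlib-only illustration; the random starting point within the first interval is a common refinement, not stated in the text above):

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: take every k-th member of the frame,
    where k = len(frame) // n, from a random starting position."""
    k = len(frame) // n                      # sampling interval x/n
    start = random.Random(seed).randrange(k) # random start in first interval
    return frame[start::k][:n]

frame = list(range(1000))  # hypothetical frame of 1000 individuals
sample = systematic_sample(frame, n=100, seed=7)
# Consecutive selections are exactly k = 10 positions apart, which is
# why periodicity in the frame (e.g. alternating sexes) can bias results.
```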
 
3. Stratified sampling
In this method, the population is first divided into subgroups (or strata) who all share a
similar characteristic. It is used when we might reasonably expect the measurement of
interest to vary between the different subgroups, and we want to ensure representation from
all the subgroups. For example, in a study of stroke outcomes, we may stratify the population
by sex, to ensure equal representation of men and women. The study sample is then obtained
by taking equal sample sizes from each stratum. In stratified sampling, it may also be
appropriate to choose non-equal sample sizes from each stratum. For example, in a study of
the health outcomes of nursing staff in a county, if there are three hospitals each with
different numbers of nursing staff (hospital A has 500 nurses, hospital B has 1000 and
hospital C has 2000), then it would be appropriate to choose the sample numbers from each
hospital proportionally (e.g. 10 from hospital A, 20 from hospital B and 40 from hospital C).
This ensures a more realistic and accurate estimation of the health outcomes of nurses across
the county, whereas simple random sampling would over-represent nurses from hospitals A
and B. The fact that the sample was stratified should be taken into account at the analysis
stage.
Stratified sampling improves the accuracy and representativeness of the results by reducing
sampling bias. However, it requires knowledge of the appropriate characteristics of the
sampling frame (the details of which are not always available), and it can be difficult to
decide which characteristic(s) to stratify by.
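The proportional allocation in the nursing example can be sketched as follows (hypothetical data; a total sample of 70 reproduces the 10/20/40 split described above):

```python
import random

def proportional_stratified_sample(strata, total_n, seed=None):
    """Proportional stratified sampling: sample from each stratum in
    proportion to its share of the population. `strata` maps stratum
    name -> list of units."""
    rng = random.Random(seed)
    grand_total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(total_n * len(units) / grand_total)  # stratum quota
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical hospitals from the example: 500, 1000 and 2000 nurses.
strata = {
    "A": [f"A{i}" for i in range(500)],
    "B": [f"B{i}" for i in range(1000)],
    "C": [f"C{i}" for i in range(2000)],
}
sample = proportional_stratified_sample(strata, total_n=70, seed=3)
```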
 
4. Clustered sampling
In a clustered sample, subgroups of the population are used as the sampling unit, rather than
individuals. The population is divided into subgroups, known as clusters, which are randomly
selected to be included in the study. Clusters are usually already defined, for example
individual GP practices or towns could be identified as clusters. In single-stage cluster
sampling, all members of the chosen clusters are then included in the study. In two-stage
cluster sampling, a selection of individuals from each cluster is then randomly selected for
inclusion. Clustering should be taken into account in the analysis. The General Household
survey, which is undertaken annually in England, is a good example of a (one-stage) cluster
sample. All members of the selected households (clusters) are included in the survey.
Cluster sampling can be more efficient than simple random sampling, especially where a
study takes place over a wide geographical region. For instance, it is easier to contact lots of
individuals in a few GP practices than a few individuals in many different GP practices.
Disadvantages include an increased risk of bias, if the chosen clusters are not representative
of the population, resulting in an increased sampling error.
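Two-stage cluster sampling can be sketched as follows (hypothetical GP practices and patient lists; all sizes are arbitrary):

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sampling: randomly choose whole clusters first,
    then randomly sample individuals within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)      # stage 1
    return {c: rng.sample(clusters[c], n_per_cluster)      # stage 2
            for c in chosen}

# Hypothetical clusters: 20 GP practices with 50 patients each.
clusters = {f"practice_{p}": [f"patient_{p}_{i}" for i in range(50)]
            for p in range(20)}
sample = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=10, seed=5)
```

Single-stage cluster sampling would simply keep every member of the chosen clusters instead of sampling within them.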
 
Non-Probability Sampling Methods
1. Convenience sampling
Convenience sampling is perhaps the easiest method of sampling, because participants are
selected based on availability and willingness to take part. Useful results can be obtained, but
the results are prone to significant bias, because those who volunteer to take part may be
different from those who choose not to (volunteer bias), and the sample may not be
representative of other characteristics, such as age or sex. Note: volunteer bias is a risk of all
non-probability sampling methods.
 
2. Quota sampling
This method of sampling is often used by market researchers. Interviewers are given a quota
of subjects of a specified type to attempt to recruit. For example, an interviewer might be told
to go out and select 20 adult men, 20 adult women, 10 teenage girls and 10 teenage boys so
that they could interview them about their television viewing. Ideally the quotas chosen
would proportionally represent the characteristics of the underlying population.
Whilst this has the advantage of being relatively straightforward and potentially
representative, the chosen sample may not be representative of other characteristics that
weren’t considered (a consequence of the non-random nature of sampling).
 
3. Judgement (or Purposive) Sampling
Also known as selective, or subjective, sampling, this technique relies on the judgement of
the researcher when choosing who to ask to participate. Researchers may implicitly thus
choose a “representative” sample to suit their needs, or specifically approach individuals with
certain characteristics. This approach is often used by the media when canvassing the public
for opinions and in qualitative research.
Judgement sampling has the advantage of being time-and cost-effective to perform whilst
resulting in a range of responses (particularly useful in qualitative research). However, in
addition to volunteer bias, it is also prone to errors of judgement by the researcher and the
findings, whilst being potentially broad, will not necessarily be representative.
 
4. Snowball sampling
This method is commonly used in social sciences when investigating hard-to-reach groups.
Existing subjects are asked to nominate further subjects known to them, so the sample
increases in size like a rolling snowball. For example, when carrying out a survey of risk
behaviours amongst intravenous drug users, participants may be asked to nominate other
users to be interviewed.
Snowball sampling can be effective when a sampling frame is difficult to identify. However,
by selecting friends and acquaintances of subjects already investigated, there is a significant
risk of selection bias (choosing a large number of people with similar characteristics or views
to the initial individual identified).
 
Bias in sampling
There are five important potential sources of bias that should be considered when selecting a
sample, irrespective of the method used. Sampling bias may be introduced when:
1. Any pre-agreed sampling rules are deviated from
2. People in hard-to-reach groups are omitted
3. Selected individuals are replaced with others, for example if they are difficult to
contact
4. There are low response rates
5. An out-of-date list is used as the sample frame (for example, if it excludes people who
have recently moved to an area)

- Analysis used in RCTs


Approaches commonly used include longitudinal analysis of covariance, repeated measures analysis in which the baseline value is also used as an outcome, and the analysis of change scores.

- What is matching, how is it done, what is demerit of matching too many factors
What is matching in research design?
Matched group design (also known as matched subjects design) is used in experimental research so that different experimental conditions can be observed while controlling for individual differences by matching similar subjects or groups with each other.

Advantages of matching
Matching is a useful method to optimize resources in a case control study. 
Matching on a factor linked to other factors may automatically control for the confounding
role of those factors (e.g. matching on neighborhood may control for socio-economic
factors).
Matching allows the use of a smaller sample size, by setting up the stratified analysis "a priori" (before the study, at the time of case and control selection); the required sample is smaller than for an unmatched sample with a stratified analysis performed "a posteriori".
Matching avoids a stratified analysis with too many strata, some potentially containing no case or control, when several confounding factors must be controlled at the same time. Indeed, in an unmatched case-control study, whether we perform logistic regression or simply a stratified analysis, we might end up with empty strata (no cases or no controls in some strata). Matching avoids this situation.
Disadvantages of matching
The efficiency in data analysis that matching provides is limited by several disadvantages.
The greatest disadvantage of matching is that the effect of the matching factor on the occurrence of the disease of interest can no longer be studied. One should therefore limit matching to factors that are already known to be risk factors for the studied outcome.
If statistical software with logistic regression is available, it is possible to control for many confounding factors during the analysis of the study; preventing confounding by matching at the design stage might then not be needed, especially if the study includes a large population and there is little chance of ending up with empty strata.
If matching is performed, it must also be taken into account in the statistical analysis, because a matched OR needs to be calculated and conditional logistic regression needs to be used.
However, the matching factor can still be studied as an effect modifier by performing a stratified analysis over several categories of the matching factor. For example, when matching on age, analysis is still feasible within each age stratum created. However, using different age categories from those used for matching would require a multivariable analysis, and so would trying to identify a dose-response relationship involving a matching factor.
Matching on criteria that are only associated with exposure and not with outcome further
biases the measurement of the effect. In this situation the matching factor is not a
confounding factor and matching would bring the OR towards 1.
Another difficulty occurs when matching on several factors. It then becomes logistically difficult (in time and energy) to identify and recruit controls because of the high number of matching factors (e.g. same age, sex, socio-economic status, occupation, etc.). Matching on several criteria may improve the efficiency of the statistical analysis through a reduced sample size, but the difficulty of recruiting controls may jeopardize that efficiency. It may also exclude cases for which no matched controls can be identified. In addition, matching on many criteria increases the risk of matching on exposure (thereby bringing the OR closer to one). This is sometimes called overmatching.
One major challenge when matching is to properly define the strata of the matching variable. For example, when frequency matching on age, we need to make sure that, within each of the age groups created, age is no longer a confounding factor; any remaining within-stratum confounding is sometimes called residual confounding. Several analyses with different widths of age strata may be tested. For example, suppose we stratify into several age groups 20 years wide (0-19, 20-39, 40-59, 60-79, 80+). To assess whether age is still a confounder within one age group, we could further stratify (into five-year age groups) and test whether age remains a confounding factor inside a 20-year-wide age group. It may therefore still be important to account for age as a potential confounder in a multivariable analysis.
What are the 3 types of t tests?
There are three main types of t-test:
 An Independent Samples t-test compares the means for two groups.
 A Paired sample t-test compares means from the same group at different times (say,
one year apart).
 A One-sample t-test tests the mean of a single group against a known mean.
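The three t statistics can be sketched with the standard library only (the formulas are the textbook ones; in practice a library such as scipy.stats would also supply p-values via ttest_1samp, ttest_ind and ttest_rel):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """One-sample t: t = (xbar - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n)), n - 1

def independent_t(x, y):
    """Independent-samples t with pooled variance, df = nx + ny - 2."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny)), nx + ny - 2

def paired_t(before, after):
    """Paired-samples t: a one-sample t on the within-pair differences."""
    return one_sample_t([a - b for b, a in zip(before, after)], 0.0)
```

For example, one_sample_t([1, 2, 3, 4, 5], 3) gives t = 0 with df = 4, since the sample mean equals the hypothesised mean.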

RESEARCH METHODOLOGY
 Experimental method
 Experimental designs,
 What are the three principles of experimental design?
Thus, the three principles of experimental design are: replication, to provide an estimate of experimental error; randomization, to ensure that this estimate is statistically valid; and local control, to reduce experimental error by making the experiment more efficient.

The research design for studying factors of schizophrenia
 Prospective
 Retrospective

 Cohort
What is a cohort study in medical research?
 Finding causes
 Examples
 Limitations
Cohort studies are a type of medical research used to investigate the causes of disease and to
establish links between risk factors and health outcomes.
The word cohort means a group of people. These types of studies look at groups of people.
They can be forward-looking (prospective) or backward-looking (retrospective).
Prospective studies are planned in advance and carried out over a future period of time.
Retrospective cohort studies look at data that already exist and try to identify risk factors for
particular conditions. Interpretations are limited because the researchers cannot go back and
gather missing data.
These long-term studies are sometimes called longitudinal studies.
Fast facts on cohort studies
 Cohort studies typically observe large groups of individuals, recording their exposure
to certain risk factors to find clues as to the possible causes of disease.
 They can be prospective studies and gather data going forward, or retrospective
cohort studies, which look at data already collected.
 The Nurses' Health Study is one example of a large cohort study, and it has produced
many important links between lifestyle choices and health by following hundreds of
thousands of women across North America.
 Such research can also help identify social factors that influence health

 Confounding variables
 What is a confounding variable in psychology?
Confounding variables are factors other than the independent variable that may
cause a result. In your caffeine study, for example, it is possible that the students who
received caffeine also had more sleep than the control group. Or, the experimental
group may have spent more time overall preparing for the exam

 The non sampling errors


Sampling Error
“Sampling error is the error that arises in a data collection process as a result of taking a
sample from a population rather than using the whole population.
Sampling error is one of two reasons for the difference between an estimate of a population
parameter and the true, but unknown, value of the population parameter. The other reason is
non-sampling error. Even if a sampling process has no non-sampling errors then estimates
from different random samples (of the same size) will vary from sample to sample, and each
estimate is likely to be different from the true value of the population parameter.
The sampling error for a given sample is unknown but when the sampling is random, for
some estimates (for example, sample mean, sample proportion) theoretical methods may be
used to measure the extent of the variation caused by sampling error.”
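The idea that estimates vary from sample to sample can be illustrated with a small simulation (synthetic population; all the numbers are arbitrary):

```python
import random
from statistics import mean

# A fixed synthetic "population" with a known mean. Repeated random
# samples give mean estimates that differ from the true mean purely
# because of sampling error - no non-sampling error is present here.
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = mean(population)

# 200 independent samples of size 100; the spread of these estimates
# measures the extent of the sampling error.
estimates = [mean(rng.sample(population, 100)) for _ in range(200)]
```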
Non-sampling error:
“Non-sampling error is the error that arises in a data collection process as a result of factors
other than taking a sample.
Non-sampling errors have the potential to cause bias in polls, surveys or samples.
There are many different types of non-sampling errors and the names used to describe them
are not consistent. Examples of non-sampling errors are generally more useful than using
names to describe them.

Five Common Types of Sampling Errors


 Population Specification Error—This error occurs when the researcher does not
understand who they should survey. For example, imagine a survey about breakfast
cereal consumption. Who to survey? It might be the entire family, the mother, or the
children. The mother might make the purchase decision, but the children influence her
choice.
 Sample Frame Error—A frame error occurs when the wrong sub-population is used to
select a sample. A classic frame error occurred in the 1936 presidential election
between Roosevelt and Landon. The sample frame was from car registrations and
telephone directories. In 1936, many Americans did not own cars or telephones, and
those who did were largely Republicans. The results wrongly predicted a Republican
victory.
 Selection Error—This occurs when respondents self-select their participation in the
study – only those that are interested respond. Selection error can be controlled by
going extra lengths to get participation. A typical survey process includes initiating
pre-survey contact requesting cooperation, actual surveying, and post-survey follow-
up. If a response is not received, a second survey request follows, and perhaps
interviews using alternate modes such as telephone or person-to-person.
 Non-Response—Non-response errors occur when respondents differ from those who do not respond. This may occur because the potential respondent either was not contacted or refused to respond. The extent of this non-response error can be checked through follow-up surveys using alternate modes.
 Sampling Errors—These errors occur because of variation in the number or
representativeness of the sample that responds. Sampling errors can be controlled by
(1) careful sample designs, (2) large samples, and (3) multiple contacts to assure representative response.

Non-sampling error refers to any deviation between the results of a survey and the truth that is not caused by the random selection of observations. That is, non-sampling error is
the total of all forms of error other than sampling error. Common types of non-sampling error
include non-response error, measurement error, interviewer error, adjustment error, and
processing error.
Non-response error
Non-response error refers to errors that are caused by differences between people that
participate in surveys versus people who do not participate in surveys. For example, surveys
that ask people about how they spend their time likely have large amounts of non-response
error, as people who spend their time doing surveys are likely quite different from those who
do not.
Non-response error can be narrowly defined as relating to whether people selected to
participate actually do participate (e.g., the difference between people who opened their email
invitation and completed a survey versus those who did not); or it can be more broadly
defined to include all non-random aspects of sampling (e.g., selection of the list to be used in
the research). Errors relating to whether the lists used in research are representative are also
known as list selection error and coverage error.
Measurement error
Measurement error refers to all the errors relating to the specific measurement of each
sampling unit (as opposed to errors relating to how they were selected to be measured). For
example, these could include confusing question wordings, low-quality data due to
respondent fatigue, and low quality multi-item scales being used to measure abstract
concepts.
Interviewer error
Interviewer error occurs when an interviewer makes an error in how they administer the
survey or record responses. For example, in qualitative research, an interviewer may “lead” a
respondent to a certain answer, and in quantitative research a bored interviewer may choose
to ask a question in words that they regard as superior to those in the questionnaire.
Adjustment error
Adjustment error occurs where the analysis of the data inadvertently adjusts the data in such
a way that it becomes less accurate. The main forms of adjustment error are errors with
weighting, data cleaning, and imputation.
Processing error
Processing error occurs when the processing of the data has caused an error of some kind,
such as when it is incorrectly entered or corrupted.

RESEARCH METHODOLOGY
 Experimental method
 Experimental designs
 What is the definition of experimental research design?
Experimental research is a study that strictly adheres to a scientific research design. It
includes a hypothesis, a variable that can be manipulated by the researcher, and
variables that can be measured, calculated and compared. Most
importantly, experimental research is completed in a controlled environment.

 The prospective research biases


Prospective vs. Retrospective Studies
 
Prospective
A prospective study watches for outcomes, such as the development of a disease, during the
study period and relates this to other factors such as suspected risk or protection factor(s).
The study usually involves taking a cohort of subjects and watching them over a long period.
The outcome of interest should be common; otherwise, the number of outcomes observed
will be too small to be statistically meaningful (indistinguishable from those that may have
arisen by chance). All efforts should be made to avoid sources of bias such as the loss of
individuals to follow up during the study. Prospective studies usually have fewer potential
sources of bias and confounding than retrospective studies.
 
Retrospective
A retrospective study looks backwards and examines exposures to suspected risk or
protection factors in relation to an outcome that is established at the start of the study. Many
valuable case-control studies, such as Lane-Claypon's 1926 investigation of risk factors
for breast cancer, were retrospective investigations. Most sources of error due to confounding
and bias are more common in retrospective studies than in prospective studies. For this
reason, retrospective investigations are often criticised. If the outcome of interest is
uncommon, however, the size of prospective investigation required to estimate relative risk is
often too large to be feasible. In retrospective studies the odds ratio provides an estimate of
relative risk. You should take special care to avoid sources of bias and confounding in
retrospective studies.
 
Prospective investigation is required to make precise estimates of either the incidence of an
outcome or the relative risk of an outcome based on exposure.
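The point above, that a retrospective study can only estimate relative risk via the odds ratio, can be made concrete with a short sketch (purely hypothetical counts, not from any real study). In a 2 by 2 table, the relative risk compares outcome risks between exposure groups, while the odds ratio is what a case-control design can estimate; the two agree closely only when the outcome is rare.

```python
# Hypothetical 2x2 table (illustrative counts only):
#                 outcome   no outcome
# exposed            a=20        b=80
# unexposed          c=10        d=90
a, b, c, d = 20, 80, 10, 90

# Relative risk: ratio of outcome risks (valid in prospective/cohort designs)
relative_risk = (a / (a + b)) / (c / (c + d))

# Odds ratio: cross-product ratio; estimable in case-control designs
odds_ratio = (a * d) / (b * c)

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
# Here the outcome is common (20% vs 10%), so OR (2.25) overstates RR (2.00);
# with a rare outcome the two would be nearly equal.
```

This is why case-control studies of rare diseases report odds ratios and interpret them as approximate relative risks.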
 
Case-Control studies
Case-control studies are usually, but not exclusively, retrospective; the opposite is true for
cohort studies. The following notes compare case-control with cohort studies:
 outcome is measured before exposure
 controls are selected on the basis of not having the outcome
 good for rare outcomes
 relatively inexpensive
 smaller numbers required
 quicker to complete
 prone to selection bias
 prone to recall/retrospective bias
 related methods are risk (retrospective), the chi-square test for 2 by 2 tables, Fisher's
exact test, the exact confidence interval for the odds ratio, odds ratio meta-analysis and
conditional logistic regression.
 
Cohort studies
Cohort studies are usually, but not exclusively, prospective; the opposite is true for case-
control studies. The following notes compare cohort with case-control studies:
 outcome is measured after exposure
 yields true incidence rates and relative risks
 may uncover unanticipated associations with outcome
 best for common outcomes
 expensive
 requires large numbers
 takes a long time to complete
 prone to attrition bias (compensate by using person-time methods)
 prone to the bias of change in methods over time
 related methods are risk (prospective), relative risk meta-analysis, risk difference
meta-analysis and proportions
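The note above about compensating for attrition with person-time methods can be sketched numerically (hypothetical follow-up data, for illustration only): each subject contributes only the time they were actually observed, so early dropouts shrink the denominator rather than silently biasing the incidence rate.

```python
# Hypothetical cohort: (years followed, developed outcome?) per subject.
# Subjects lost to follow-up contribute only the person-time observed.
subjects = [(5.0, True), (5.0, False), (2.5, False),  # lost after 2.5 y
            (5.0, True), (1.0, False),                # lost after 1.0 y
            (5.0, False)]

cases = sum(1 for _, event in subjects if event)
person_years = sum(t for t, _ in subjects)

# Incidence rate expressed per 100 person-years of observation
rate = cases / person_years * 100
print(f"{cases} cases over {person_years} person-years "
      f"= {rate:.1f} per 100 person-years")
```

Counting each dropout as a full five years of follow-up would understate the rate; person-time methods avoid that by using only observed time.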
 

 Limitations of follow-up based studies


 What kind of bias is encountered if too many participants in a study are lost to follow
up?
Attrition Bias (Loss to follow-up)
Loss to follow-up is a situation in which the investigator loses contact with the
subject, resulting in missing data. If too many subjects are lost to follow-up, the
internal validity of the study is reduced.

 National mental health survey, sample size, statistics used, methodology


 Which study would I use to determine whether factor X causes a very rare disease Y?
A case-control study: as noted above, case-control designs are good for rare outcomes, since cases are recruited directly rather than waiting for a rare outcome to occur in a cohort.

As per the Global Burden of Disease report, mental disorders account for 13% of total DALYs
lost for Years Lived with Disability (YLD), with depression being the leading cause [2].

Mental Health and Sustainable Development Goals
Within the health-related SDGs, two targets are directly related to mental health and
substance abuse. Target 3.4: "By 2030, reduce by one third premature mortality from
non-communicable diseases through prevention and treatment and promote mental health and
well-being." Target 3.5 requests that countries: "Strengthen the prevention and treatment of
substance abuse, including narcotic drug abuse and harmful use of alcohol." (Source: 11)
1. The NMHS was undertaken as a large-scale, multi-centred national study on the various
dimensions and characteristics of mental health problems among individuals aged 18 years
and above across 12 Indian states (Figure 1) during 2014-16.
2. A pilot study was undertaken in Kolar district, Karnataka during Jan-Nov 2014, on a
sample of 3190 individuals (13 years and above) to examine the feasibility of conducting the
survey, the proposed sampling methodologies and the use of hand-held computing devices for
field data collection. Six well-trained data collectors carried out data collection across 50
clusters (villages and urban wards) of the district, which yielded a crude prevalence of 7.5%
for all mental disorders. The lessons learnt and experience gained helped in developing the
NMHS methodology.
3. The selection of states was based on the availability of an interested and reliable partner
organisation in that state, their willingness to undertake the study, and the availability of
screening and diagnostic data collection tools in the vernacular languages spoken in that
state. The states selected were: North: Punjab and Uttar Pradesh; South: Tamil Nadu and
Kerala; East: Jharkhand and West Bengal; West: Rajasthan and Gujarat; Central: Madhya
Pradesh and Chhattisgarh; and North-East: Assam and Manipur.

A multi-stage sampling design (District → Taluka → Village/Ward → Household) was adopted
in each state, and each selected state of India constituted its own sampling frame.
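The multi-stage scheme can be sketched as repeated random selection at each level. The frame below is a toy example with made-up names and stage sizes, not the actual NMHS procedure; in the survey itself, households were then sampled within the selected clusters.

```python
import random

random.seed(42)  # reproducible illustration

# Toy sampling frame: district -> talukas -> villages/wards (all hypothetical)
frame = {
    f"district_{d}": {
        f"taluka_{d}_{t}": [f"village_{d}_{t}_{v}" for v in range(10)]
        for t in range(4)
    }
    for d in range(8)
}

# Stage 1: sample districts; Stage 2: talukas within each selected district;
# Stage 3: villages/wards within each selected taluka.
districts = random.sample(list(frame), k=2)
clusters = []
for d in districts:
    for t in random.sample(list(frame[d]), k=2):
        clusters.extend(random.sample(frame[d][t], k=3))

print(f"{len(clusters)} clusters selected from {len(districts)} districts")
```

Sampling in stages like this keeps fieldwork feasible: interviewers visit a handful of clusters per district instead of households scattered across an entire state.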

In total, 34,802 adults and about 1,191 adolescents drawn from the 12 states were interviewed.

The study instruments collected sociodemographic information including completed age,
gender, education, occupation, income (household and individual) and marital status. For
assessment of mental morbidity, the Mini-International Neuropsychiatric Interview (MINI)
adult version and the MINI-Kid version were used for adults and for older children and
adolescents, respectively. In addition, questionnaires for tobacco use (the Fagerstrom
questionnaire) and screening tools for epilepsy, intellectual disability (ID) and autism
spectrum disorders (ASD) were incorporated. Further, questionnaires on health care
utilisation, assessment of disability (the modified Sheehan's scale) and the socioeconomic
impact of illness were used in the study.
