
MA 2140: Statistics

Dr. Sameen Naqvi


Department of Mathematics, IIT Hyderabad
Email id: sameen@iith.ac.in

Data Basics Types of studies

A. Observational Studies

Simply observing what happens


Researcher ‘does not impose’ a treatment or randomly assign subjects
to a group.
Only establishes an association

Examples
(i) Tanning and Skin Cancer
Select a group of people who have skin cancer and another group of
people who do not have skin cancer. Ask all participants whether
they used tanning beds.
(ii) Sodium and Blood Pressure
Select 100 individuals and give them diet diaries where they record
everything they eat each day. From this the amount of sodium in the
diet is found. Measure their blood pressure.


Types of observational studies

- Retrospective: look at past records and historical data.
  Example: Tanning and Skin Cancer.

- Prospective: identify subjects and collect data as events unfold.
  Example: Sodium and Blood Pressure.

Observational studies are often used in marketing or health studies.


Confounding

Suppose an observational study tracked sunscreen use and skin
cancer, and it was found that the more sunscreen someone used, the
more likely the person was to have skin cancer. Does this mean
sunscreen causes skin cancer?

No!

An important piece of information that is absent is sun exposure.


Here, sun exposure is a confounding variable.

Confounding variables are extraneous variables that affect both the
explanatory (independent) and the response (dependent) variables, and
that can make it seem as though there is a relationship between them.
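
The sunscreen example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the probabilities below are made up, and cancer risk is set to depend on sun exposure alone, never on sunscreen. The names simulate_people, cancer_rate and rates_by_sunscreen are hypothetical helpers.

```python
import random

def simulate_people(n, seed=0):
    """Return n tuples (high_sun, uses_sunscreen, has_cancer)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        high_sun = rng.random() < 0.5
        # Assumption: heavy sun exposure makes sunscreen use more likely.
        uses_sunscreen = rng.random() < (0.8 if high_sun else 0.2)
        # Assumption: cancer risk depends on sun exposure only.
        has_cancer = rng.random() < (0.30 if high_sun else 0.05)
        people.append((high_sun, uses_sunscreen, has_cancer))
    return people

def cancer_rate(group):
    """Fraction of the group that has skin cancer."""
    return sum(has_cancer for _, _, has_cancer in group) / len(group)

def rates_by_sunscreen(group):
    """Cancer rate among sunscreen users and among non-users."""
    users = [p for p in group if p[1]]
    non_users = [p for p in group if not p[1]]
    return cancer_rate(users), cancer_rate(non_users)

people = simulate_people(100_000)

# Ignoring sun exposure, sunscreen users appear to have more cancer.
overall_users, overall_non_users = rates_by_sunscreen(people)
print("everyone:", round(overall_users, 3), round(overall_non_users, 3))

# Holding sun exposure fixed, users and non-users look alike.
high_users, high_non_users = rates_by_sunscreen([p for p in people if p[0]])
low_users, low_non_users = rates_by_sunscreen([p for p in people if not p[0]])
print("high sun:", round(high_users, 3), round(high_non_users, 3))
print("low sun: ", round(low_users, 3), round(low_non_users, 3))
```

Comparing users and non-users within each sun-exposure group is one way to see that the overall association comes from the confounder rather than from sunscreen itself.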


Example: Fertilizer and Corn yield

All plots on the west side of the state receive fertilizer A.
All plots in the middle of the state receive fertilizer B.
All plots on the east side of the state receive fertilizer C.

If we see differences in yield between the fertilizer groups, is it due to
the fertilizer or is it due to the soil?

We don’t know. This is because ‘fertilizer’ and ‘soil’ are confounded
in this scenario.


B. Experiment

Randomly assign subjects to treatments.

Establish causal connections.


Uses randomization as a tool to fight the occurrence of confounding.

Why does randomization help?

The effects of other factors (besides the variable of interest) on the
response get ‘averaged out’ across the treatment groups.

All fertilizers get some good soil and some bad soil.
Fertilizer is no longer confounded with the soil.
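
A rough simulation of the fertilizer example shows the averaging-out effect. All numbers here are invented for illustration: soil quality is assumed to improve from west to east, and the three fertilizers are assumed to have no real effect at all, so any difference between group means comes from soil. The helper mean_yield_by_fertilizer is hypothetical.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical plots: 300 in each region of the state.
regions = ["west"] * 300 + ["middle"] * 300 + ["east"] * 300
# Assumption: soil adds more yield the further east the plot lies.
soil_bonus = {"west": 0.0, "middle": 5.0, "east": 10.0}
# Assumption: yield depends on soil and noise only, not on fertilizer.
yields = [50 + soil_bonus[r] + rng.gauss(0, 2) for r in regions]

def mean_yield_by_fertilizer(assignment):
    """Average yield of the plots assigned to each fertilizer."""
    groups = {"A": [], "B": [], "C": []}
    for fertilizer, y in zip(assignment, yields):
        groups[fertilizer].append(y)
    return {fertilizer: mean(ys) for fertilizer, ys in groups.items()}

# Confounded design: the region decides the fertilizer.
by_region = {"west": "A", "middle": "B", "east": "C"}
confounded = [by_region[r] for r in regions]

# Randomized design: the same labels, shuffled across all plots.
randomized = confounded[:]
rng.shuffle(randomized)

confounded_means = mean_yield_by_fertilizer(confounded)
randomized_means = mean_yield_by_fertilizer(randomized)
print("confounded:", {k: round(v, 1) for k, v in confounded_means.items()})
print("randomized:", {k: round(v, 1) for k, v in randomized_means.items()})
```

Under the confounded design the fertilizers look very different even though they do nothing in this simulation; after shuffling, every fertilizer gets a mix of soils and the group means are close to each other.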


Examples

(i) In a study, a group of adults was randomly divided into two groups.
One group was told to drink tea every night for a week, while the
other group was told not to drink tea that week. Researchers then
compared when each group fell asleep.

(ii) Sixty individuals who wish to lose weight are randomly divided into
two groups of 30. One group is given an exercise program to follow
while the other group follows a special diet. After three months, the
researcher compares mean weight losses in the two groups. What
type of study is this?

- Experimental study


Summary

The conclusions from observational studies are not as strong as those
from experiments (which use randomization), but sometimes we cannot
ethically perform an experiment.

“Can you randomly assign individuals to be in a cigarette smoking
group?” No.
“Can you randomly assign someone to use a tanning bed if there is a
risk of cancer?” No.

Observational studies are generally only sufficient to show
associations; drawing causal conclusions from them can be treacherous
and is not recommended. Experiments, by contrast, allow us to infer
causation.

Correlation does not imply Causation.


Collecting data


Census vs. Sampling

Census
Wouldn’t it be better to just include everyone and sample the entire
population, i.e., conduct a census?

Reasons for not conducting a census:

- Some individuals may be hard to locate or hard to measure, and these
  people may be different from the rest of the population.

- Populations rarely stand still.

- Sampling is actually quite natural.

Collecting data Strategies for data collection

Strategies for data collection

(A.) Non-probability Methods:

(i) Convenience sampling: for instance, surveying students as they pass
by the Auditorium.

(ii) Gathering volunteers: for instance, using an advertisement in a
magazine or on a website inviting people to complete a form or
participate in a study.


(B.) Probability Methods:

(i) Simple random sampling

Each case is equally likely to be selected.
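
As a minimal sketch, a simple random sample can be drawn with Python's standard random.sample, which selects without replacement so that every case has the same chance of being chosen. The population of numbered cases and the sample size of 50 are assumptions made up for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Hypothetical population: cases identified by the numbers 1..1000.
population = list(range(1, 1001))

# Draw 50 distinct cases; each case is equally likely to be selected.
sample = rng.sample(population, k=50)
print(sorted(sample))
```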


(ii) Stratified sampling

Divide the population into homogeneous strata

Then randomly sample from within each stratum
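
The two steps above can be sketched as follows. The strata (students grouped by year), their sizes, and the 10% sampling fraction are invented for illustration, and stratified_sample is a hypothetical helper, not a library function.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the given fraction from every stratum."""
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample[name] = rng.sample(members, k)
    return sample

rng = random.Random(7)

# Hypothetical population, already divided into homogeneous strata.
strata = {
    "first_year": [f"FY{i}" for i in range(400)],
    "second_year": [f"SY{i}" for i in range(250)],
    "third_year": [f"TY{i}" for i in range(150)],
}

sample = stratified_sample(strata, fraction=0.10, rng=rng)
for name, chosen in sample.items():
    print(name, len(chosen))
```

Sampling the same fraction from each stratum keeps the strata in the sample in the same proportions as in the population; other allocations are possible.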


(iii) Cluster sampling

Divide the population into clusters

Randomly sample a few clusters

Then sample all observations within these clusters
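
A small sketch of these steps, using made-up clusters (hostels and their residents) and a hypothetical helper cluster_sample: a few clusters are chosen at random and every observation inside them is kept.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick n_clusters clusters and return all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [unit for name in chosen for unit in clusters[name]]
    return chosen, members

rng = random.Random(11)

# Hypothetical population: 10 hostels (clusters) with 30 residents each.
clusters = {
    f"hostel_{h}": [f"hostel_{h}_resident_{r}" for r in range(30)]
    for h in range(10)
}

chosen, sample = cluster_sample(clusters, n_clusters=3, rng=rng)
print(chosen, len(sample))
```

Note the contrast with stratified sampling: there, every stratum contributes a random subset; here, only the selected clusters contribute, and they contribute everyone.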


Which method is better? Probabilistic or Non-probabilistic?


Example: Survey of Air India Passengers

Suppose you want to survey Air India passengers on what they like and
dislike about traveling on this airline.

Since you live in Hyderabad, you go to the airport and just interview
passengers as they approach the ticket counter.
- Convenience sampling

You randomly select a set of passengers flying on the airline and
question those that you have selected.
- Simple random sampling


Example contd.

You ask the ticket counter personnel to distribute a questionnaire to
each passenger requesting them to complete the survey and return it.
- Volunteer sampling

You stratify the passengers by the class they fly (business, premium
economy, economy), and then take a random sample from each of
these strata.
- Stratified sampling


Example: Phone survey

In predicting the 2008 Iowa Caucus results, a phone survey said that
Hillary Clinton would win, but instead Obama won.

Where did they go wrong?

Since the survey was conducted on landline phones, they mostly
reached older people and were unable to get opinions from younger
people who were using cellphones. However, lots of younger people
got involved in this election and voted for Obama.
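
The mechanism can be sketched with a toy simulation. None of the numbers below are real polling data: the share of young voters, their preferences, and landline ownership are all invented assumptions, and the candidate is simply called B. The point is only that a random sample of landline owners can miss the true level of support.

```python
import random

rng = random.Random(2008)

# Hypothetical electorate: tuples (is_young, supports_b, has_landline).
voters = []
for _ in range(50_000):
    is_young = rng.random() < 0.4
    # Assumption: younger voters favor candidate B more strongly.
    supports_b = rng.random() < (0.70 if is_young else 0.40)
    # Assumption: younger voters rarely have a landline.
    has_landline = rng.random() < (0.20 if is_young else 0.90)
    voters.append((is_young, supports_b, has_landline))

def support_for_b(group):
    """Fraction of the group that supports candidate B."""
    return sum(supports_b for _, supports_b, _ in group) / len(group)

true_support = support_for_b(voters)

# The phone survey can only reach voters who have a landline.
landline_voters = [v for v in voters if v[2]]
poll = rng.sample(landline_voters, 2000)
poll_support = support_for_b(poll)

print("true support for B:", round(true_support, 3))
print("landline poll:     ", round(poll_support, 3))
```

Even though the poll is a simple random sample, it is a random sample of the wrong population, so drawing more respondents would not remove the bias.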

Collecting data Sources of bias

Sources of sampling bias

- Convenience sample bias: individuals who are easily accessible are
  more likely to be included in the sample.

- Non-response bias: occurs when only a non-random fraction of the
  randomly sampled people responds to a survey, so that the sample is
  no longer representative of the population.

- Voluntary response bias: occurs when the sample consists of only
  people who volunteer to respond because they have strong opinions
  on the issue.
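
As a rough illustration of voluntary response bias, suppose people rate a service from 1 to 5 and those with strong opinions are more likely to answer. The response probabilities below are invented assumptions, chosen so that dissatisfied people respond most often.

```python
import random
from statistics import mean

rng = random.Random(3)

# Hypothetical population: ratings 1..5, all equally common.
ratings = [rng.choice([1, 2, 3, 4, 5]) for _ in range(20_000)]

# Assumption: people with extreme ratings are more likely to respond.
respond_prob = {1: 0.60, 2: 0.20, 3: 0.05, 4: 0.10, 5: 0.30}
respondents = [r for r in ratings if rng.random() < respond_prob[r]]

def extreme_share(values):
    """Fraction of ratings that are 1 or 5."""
    return sum(v in (1, 5) for v in values) / len(values)

print("population mean rating:", round(mean(ratings), 2))
print("respondent mean rating:", round(mean(respondents), 2))
print("extreme share, population: ", round(extreme_share(ratings), 2))
print("extreme share, respondents:", round(extreme_share(respondents), 2))
```

The respondents over-represent strong opinions, so their average rating need not match the population's.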


Methods of data collection


Thank you for listening!
