UNIT 1 Notes

UNIT 1: INTRODUCTION TO STATISTICS
WHAT IS STATISTICS?
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data.
WHY DO WE NEED STATISTICS?
Data is almost always an imperfect measure of what we are interested in because of:
- systematic error (error built into the measurement process, e.g. because of an uncalibrated instrument)
- random error (error produced by chance, which is harder to prevent)
We need to find ways to extract insights from these imperfect measures. Anything we
choose to measure will vary from person to person, from time to time, and from place
to place. We need to extract insights from this variability.
- What can we learn from the way a variable changes from person to person?
- What can we learn from the way a variable changes from place to place?
- Can you think of examples?
You will make decisions based on the statistical analysis of data:
- Placing a student in a special program
- Assessing the severity of depression in a patient
- In HR: recruitment decisions, promotions, etc.
In research (academic or industry), advanced knowledge of research methodology and
statistical techniques is required.
Generally, statistical literacy (the ability to understand and critically evaluate statistical results) is fundamental in the “age of data”.
POPULATIONS AND SAMPLES
• A population is a collection of all possible members of a defined group. It could be of any size.
- University students in Europe (≈ 20 million)
- University students in Spain (≈ 2 million)
• A sample is a subset of units of the population of interest. It is just a portion of the population.
- A sample of 3000 university students from different European universities.
- A sample of 500 university students from different universities across Spain.
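A simple random sample, where every unit of the population has the same chance of being selected, can be sketched with Python's standard `random` module. This is a minimal illustration that assumes the population is available as a numbered list of units. The ID numbers below are hypothetical stand-ins for the ≈ 2 million university students in Spain, and `simple_random_sample` is a hypothetical helper name.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: student ID numbers 1 to 2,000,000
population = range(1, 2_000_001)
sample = simple_random_sample(population, 500, seed=42)
print(len(sample), sample[:5])
```

Drawing units at random is what later justifies generalizing from the sample to the population.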
TWO BRANCHES OF STATISTICS
• Descriptive statistics: organize, summarize, and communicate numerical information. We make claims about the sample.
Ex: We measure the height (in cm) of 50 UEM students chosen at random:
Mean = 170, SD = 10. These values describe the average height (and the variability) in that group of 50 students.
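Python's built-in `statistics` module can compute these descriptive values. The sketch below uses a short, hypothetical list of heights in place of the 50 real measurements. It also uses `statistics.stdev`, which gives the sample standard deviation (dividing by n - 1). The notes do not say which version of the SD was reported.

```python
import statistics

def describe(heights_cm):
    """Return the sample mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(heights_cm), statistics.stdev(heights_cm)

# Hypothetical heights in cm, standing in for the 50 measured students
heights = [160, 165, 170, 175, 180]
mean, sd = describe(heights)
print(f"Mean = {mean:.1f} cm, SD = {sd:.2f} cm")
```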
• Inferential statistics: using representative sample data, we draw conclusions (we make inferences) about a population. We make claims about the population.
Ex: Suppose we expected a mean of 175 cm. Is the sample mean significantly different from 175 cm?
t(49) = 3.182, p = .002
This result tells us that the mean height of this group of students is significantly different from 175 cm. Because these students were chosen at random, we can extend the claim to the height of UEM students in general.
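The inferential example above is a one-sample t test. Here is a minimal sketch that uses only Python's standard library. The t statistic is t = (mean - mu0) / (SD / sqrt(n)), with n - 1 degrees of freedom. The two-tailed p-value is approximated by numerically integrating the t density with Simpson's rule; in practice a dedicated statistics package would normally be used instead.

The helper names are hypothetical. With the rounded summary values shown above (mean = 170, SD = 10, n = 50), the formula gives t ≈ -3.54 rather than 3.182. The exact reported value therefore presumably depends on the unrounded sample data. The negative sign only means that the sample mean lies below 175 cm.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """t statistic for H0: population mean = mu0, computed from sample summaries."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|), via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area

n = 50
t = one_sample_t(mean=170, sd=10, n=n, mu0=175)
print(f"t({n - 1}) = {t:.3f}, p = {two_tailed_p(t, n - 1):.4f}")
```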
STATISTICS IS MOSTLY ABOUT NUMBERS
So… how do we turn “things of interest” into numbers?
We come up with an objective and clear definition that allows us to assign numbers to
objects.
For example:
• Age = “years that have passed since date of birth”
• Economic status = “income declared in the yearly tax return form”
• Depression severity = “score on the BDI (Beck Depression Inventory)”
These measures result in outcomes (numbers), which we will treat as variables.
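An operational definition should be precise enough that anyone, or a computer, can apply it the same way. As an illustration, the age definition can be written as a rule. This sketch assumes that “years that have passed since date of birth” means completed years, and `age_in_years` is a hypothetical helper name.

```python
from datetime import date

def age_in_years(date_of_birth, on_date):
    """Completed years that have passed since date_of_birth, as of on_date."""
    years = on_date.year - date_of_birth.year
    # If this year's birthday has not happened yet, one fewer full year has passed
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_in_years(date(2000, 5, 20), date(2024, 5, 19)))  # 23
print(age_in_years(date(2000, 5, 20), date(2024, 5, 20)))  # 24
```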
WHAT IS A VARIABLE?
VARIABLES: Any observation that can take on different values in different circumstances. In other words, any value that is allowed to change or vary. We use the term to refer to all types of observations that make up data (or a dataset).
Examples: age, reaction times in experiments, grade in an exam, nationality…
TYPES OF VARIABLES
DISCRETE: Variables that can only take on whole numbers (0, 1, 2, 3…)
Ex: number of students
CONTINUOUS: Can take on almost any numerical value, including decimals (1.0, 1.1, 1.2…)
Ex: weight
CLASSIFICATION OF VARIABLES
CATEGORICAL VARIABLES:
• NOMINAL: category or name. Unordered categories
Ex: field of study, the city in which a person lives, marital status, blood type.
• ORDINAL: Ranking of data. Ordered categories
Ex: educational level, rank in the military, ratings (good, average, poor)
QUANTITATIVE VARIABLES (ALSO KNOWN AS SCALE VARIABLES):
• INTERVAL: numbers are equally spaced, but zero is not meaningful (it does not mean absence of the thing being measured).
Ex: IQ score
• RATIO: similar to interval variables, but zero has meaning (absence of the thing you are measuring).
Ex: drug dosage in mg
OPERATIONS
NOMINAL:
We can only count the number of observations in each category and report that number.
Ex:
- Nationalities: How many people come from France? How many from Spain?
- Marital status: How many people are married? Single? Divorced?
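Counting observations per category is the one operation that always makes sense for nominal data. A minimal sketch uses `collections.Counter` from Python's standard library. The responses below are hypothetical.

```python
from collections import Counter

# Hypothetical answers to a marital-status question
responses = ["married", "single", "single", "divorced", "married", "single"]

counts = Counter(responses)  # frequency of each category
print(counts.most_common())  # [('single', 3), ('married', 2), ('divorced', 1)]
```

Averaging the category labels (or arbitrary numeric codes assigned to them) would produce a number with no meaning.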
ORDINAL:
We can only rank observations.
Ex:
- Educational level: primary school, secondary school, bachelor's degree, master's degree, PhD.
- A person with a bachelor's degree has a higher level of education than a person who only completed secondary school.
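For ordinal data, the only information we can use is the order of the categories. In the sketch below, the position of each level in a list stands for its rank. The helper names and the people are hypothetical.

```python
# The order of this list encodes the ranking, from lowest to highest level
EDUCATION_LEVELS = ["primary school", "secondary school", "bachelor's degree",
                    "master's degree", "PhD"]

def education_rank(level):
    """Position of a level in the ordering (0 = lowest)."""
    return EDUCATION_LEVELS.index(level)

def is_higher(level_a, level_b):
    """True if level_a ranks above level_b."""
    return education_rank(level_a) > education_rank(level_b)

# Hypothetical people, sorted from lowest to highest education level
people = {"Ana": "master's degree", "Luis": "secondary school", "Marta": "PhD"}
ranked = sorted(people, key=lambda name: education_rank(people[name]))
print(ranked)  # ['Luis', 'Ana', 'Marta']
print(is_higher("bachelor's degree", "secondary school"))  # True
```

The ranks 0 to 4 only encode order. Subtracting them would wrongly assume that the levels are equally spaced.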
INTERVAL:
We can calculate differences in values, because numbers are meaningful and equally
spaced.
Ex:
- IQ score
- A person with an IQ of 110 is 10 points above the mean (100): 110 - 100 = 10
RATIO:
Same properties as interval data, but we can also multiply and divide values.
Ex: number of hours of study
- Sam studies for 6 hours; Alex studies for 2 hours.
- Sam studies 4 hours more than Alex (6 - 2 = 4).
- Sam also studies 3 times as much as Alex (6 / 2 = 3).
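The numbers themselves do not enforce the distinction between interval and ratio data. Python will happily divide two IQ scores, so applying the rule is up to the analyst. The sketch below makes that rule explicit for the examples above. The helper names are hypothetical.

```python
def difference(a, b):
    """Meaningful for interval and ratio data: equal spacing makes a - b interpretable."""
    return a - b

def times_as_much(a, b):
    """Meaningful only for ratio data, where zero means absence of the quantity."""
    if b == 0:
        raise ValueError("cannot compare against zero")
    return a / b

# Interval example: IQ relative to the mean of 100
print(difference(110, 100))  # 10

# Ratio example: hours of study
sam_hours, alex_hours = 6, 2
print(difference(sam_hours, alex_hours))     # 4
print(times_as_much(sam_hours, alex_hours))  # 3.0
```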
THE ROLE OF VARIABLES
To understand how we will analyze variables, we need to understand their context. When we design an experiment to explore a problem, we attempt to understand the relationship between two variables.
• Dependent variable (DV): the variable we want to understand, also known as the outcome. It is the variable we measure in the study to explore differences or changes based on the IV (or predictor). DVs are generally interval or ratio variables, but sometimes we are interested in nominal or ordinal outcomes.
• Independent variable (IV): the variable whose differences are expected to cause differences in the dependent variable, also known as the predictor. It is the variable you manipulate experimentally (if possible) or, alternatively, use to categorize participants.
- For a true experiment: it is manipulated (or controlled for).
Ex: treatment applied to a group or drug dosage
- For a quasi-experiment: it occurs naturally in a group.
Ex: gender or socioeconomic status
TYPES OF RESEARCH DESIGNS
EXPERIMENTS: studies in which participants are randomly assigned to a condition (or level) of one or more independent variables.
Ex: We randomly assign patients diagnosed with depression (with similar depression levels) to two different treatments.
- Group A: cognitive therapy
- Group B: group therapy
If we see a difference in recovery, we can say that one treatment is more effective than the other. Random assignment makes the groups equivalent, on average, in everything except the treatment they received.
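Random assignment can be sketched by shuffling the participants and dealing them out to the conditions in turn, which gives groups of (nearly) equal size. This is a minimal illustration with Python's `random` module. The patient IDs and the `randomly_assign` helper are hypothetical.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them in turn into conditions (near-equal group sizes)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical patient IDs
patients = [f"patient_{i}" for i in range(1, 21)]
groups = randomly_assign(patients, ["cognitive therapy", "group therapy"], seed=1)
for condition, members in groups.items():
    print(condition, len(members))
```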
QUASI-EXPERIMENTS: similar to an experimental design, but groups are not randomly created.
Ex: We want to compare cognitive and group therapy, but we cannot randomly assign patients. We need to apply each type of therapy to the groups we are given (patients attend different psychotherapy centers).
- Group A in center Y: cognitive therapy
- Group B in center X: group therapy
If we see a difference in recovery, we may say that one treatment is more effective than the other. However, we should also consider that the psychotherapy center itself may affect the results.
EX POST FACTO (AKA CORRELATIONAL) STUDIES: we don’t manipulate either variable. Variables are assessed as they exist, and we look for relationships between two or more variables.
These studies usually cannot determine causality.
Ex: We take a sample of students and apply two tests to measure:
- Stress levels, using the Perceived Stress Scale (PSS)
- Procrastination, using the Procrastination Assessment Scale (PASS)
We might find a relationship, but we don’t know what is causing what.
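One common way to quantify such a relationship is Pearson's correlation coefficient r, which ranges from -1 to 1. The notes do not name a specific statistic, so this is a swapped-in illustration. The sketch below computes r from its definition and uses hypothetical PSS and PASS scores for six students.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for six students
stress = [10, 14, 18, 22, 26, 30]           # PSS scores
procrastination = [20, 25, 24, 33, 35, 40]  # PASS scores
print(f"r = {pearson_r(stress, procrastination):.2f}")
```

A strong r (about 0.97 for these made-up scores) still does not say whether stress causes procrastination, or the reverse.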

BEWARE OF CONFOUNDING VARIABLES
CONFOUNDING (OR LURKING) VARIABLES: variables that systematically vary with the independent variable, so that we cannot determine which variable is at work. We will need to control for them (or randomize them away).
Ex: A recent study found that people who engage in cultural activities (going to museums, the opera, or the cinema) lived longer than people who did not.
- Dependent variable: life expectancy
- Independent variable: engaging in cultural activities
SIMPSON’S PARADOX
The total results are affected by the number of patients in poor health who arrived at hospital B.
Radelet and Pierce (1991): the lower overall percentage of death sentences for African-American defendants is due to two facts. Cases in which the victim is black are less likely to result in a death sentence, and same-race violence is more prevalent.
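The paradox can be reproduced with a few made-up numbers. The sketch below assumes two hospitals and two patient conditions. The counts are hypothetical and are not the figures from the hospital example these notes refer to.

Hospital B has the higher survival rate within each condition, yet the lower rate overall. This is because 600 of its 800 patients arrive in poor health, compared with 100 of hospital A's 700.

```python
# Hypothetical counts (survived, treated) by hospital and patient condition;
# illustrative numbers only, not data from any real hospital.
COUNTS = {
    "A": {"good health": (580, 600), "poor health": (60, 100)},
    "B": {"good health": (196, 200), "poor health": (420, 600)},
}

def subgroup_rates(hospital):
    """Survival rate within each patient condition."""
    return {cond: s / n for cond, (s, n) in COUNTS[hospital].items()}

def overall_rate(hospital):
    """Survival rate after pooling all conditions together."""
    survived = sum(s for s, _ in COUNTS[hospital].values())
    treated = sum(n for _, n in COUNTS[hospital].values())
    return survived / treated

for h in COUNTS:
    rates = ", ".join(f"{c}: {r:.1%}" for c, r in subgroup_rates(h).items())
    print(f"Hospital {h}: {rates}; overall: {overall_rate(h):.1%}")
```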
