Notes: UNIT 1

UNIT 1: INTRODUCTION TO STATISTICS

WHAT IS STATISTICS?

Statistics is a branch of mathematics dealing with the collection, analysis,
interpretation, and presentation of masses of numerical data.

WHY DO WE NEED STATISTICS?

Data is almost always an imperfect measure of what we are interested in because of:

- systematic error (error that arises in the measurement process itself, for example
  because of an uncalibrated instrument)
- random error (error that is produced by chance and is harder to prevent)
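
The difference between the two kinds of error can be sketched with a small simulation. This is only an illustration: the true weight, the size of the bias and the amount of noise are all made-up values, and the function name measure is hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_WEIGHT = 70.0  # hypothetical true value, in kg
BIAS = 1.5          # hypothetical offset of an uncalibrated scale, in kg
NOISE_SD = 0.5      # hypothetical spread of chance fluctuations, in kg


def measure(n, bias, noise_sd):
    """Return n simulated readings: true value + systematic bias + random noise."""
    return [TRUE_WEIGHT + bias + random.gauss(0, noise_sd) for _ in range(n)]


random_only = measure(1000, bias=0.0, noise_sd=NOISE_SD)
with_bias = measure(1000, bias=BIAS, noise_sd=NOISE_SD)

# Random error tends to average out: the mean stays close to the true value.
print(round(statistics.mean(random_only), 2))
# Systematic error does not average out: the mean stays shifted by about BIAS.
print(round(statistics.mean(with_bias), 2))
```

Taking more measurements reduces the impact of random error on the mean, but it does nothing about the systematic error, which has to be removed at the source (for example, by calibrating the instrument).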

We need to find ways to extract insights from these imperfect measures. Anything we
choose to measure will vary from person to person, from time to time, and from place
to place. We need to extract insights from this variability.

- What can we learn from the way a variable changes from person to person?
- What can we learn from the way a variable changes from place to place?
- Can you think of examples?

You will make decisions based on the statistical analysis of data:

- Placement of a student in a special program
- Assessing the severity of depression in a patient
- In HR: recruitment decisions, promotions, etc.

In research (academic or industry), advanced knowledge of research methodology and
statistical techniques is required.

Generally, statistical literacy (the ability to understand and critically evaluate statistical
results) is fundamental in the “age of data”.

POPULATIONS AND SAMPLES

• A population is a collection of all possible members of a defined group. It could
be of any size.
  - University students in Europe (≈ 20 million)
  - University students in Spain (≈ 2 million)

• A sample is a subset of units of the population of interest. It is just a portion of
the population.
  - A sample of 3000 university students from different European
    universities
  - A sample of 500 university students from different universities
    across Spain
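
A minimal sketch of drawing a sample from a population. The population here is just a list of hypothetical ID numbers (2,000 of them, far fewer than a real student population), and the sample size of 50 is arbitrary.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ID numbers standing in for 2,000 students.
population = list(range(1, 2001))

# Simple random sample without replacement: every student has the same
# chance of being selected, and nobody is selected twice.
sample = random.sample(population, k=50)

print(len(population), len(sample))
```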

TWO BRANCHES OF STATISTICS

• Descriptive statistics: organize, summarize and communicate numerical
information. We make claims about the sample.

Ex: We measure the height (in cm) of 50 UEM students chosen at random:
Mean = 170, S.D. = 10. This describes the average height (and
variability) in that group of 50 students.
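
As a rough illustration, the mean and standard deviation can be computed as follows. The ten heights are invented for the sketch; they are not the data of the 50 students in the example.

```python
import statistics

# Hypothetical heights in cm for a small sample (made up for illustration).
heights = [158, 162, 165, 168, 170, 171, 174, 176, 180, 186]

mean = statistics.mean(heights)
sd = statistics.stdev(heights)  # sample standard deviation (divides by n - 1)

print(f"Mean = {mean:.1f} cm, S.D. = {sd:.1f} cm")
```

Both numbers describe only these ten observations; on their own they say nothing about a wider population.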

• Inferential statistics: using representative sample data, we draw conclusions
(we make inferences) about a population. We make claims about the
population.

Ex: Let’s say we were expecting to see a mean of 175 cm. Is the sample mean
significantly different from 175 cm?
t(49) = 3.182, p = .002
This result tells us that the mean height of this group of students is significantly
different from 175 cm. Because these students were chosen at random, we
can make claims about the height of UEM students.
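
A sketch of how a one-sample t statistic is obtained from summary statistics: the difference between the sample mean and the expected mean, divided by the standard error. The function name one_sample_t is hypothetical. The inputs are the rounded values from the height example (mean 170, S.D. 10, n = 50), so the result need not reproduce the reported t(49) = 3.182, which was presumably computed from unrounded data. The sign only indicates the direction of the difference.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, expected_mean):
    """Return (t, df) for a one-sample t-test computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - expected_mean) / standard_error
    return t, n - 1


# Rounded summary values from the height example.
t, df = one_sample_t(sample_mean=170, sample_sd=10, n=50, expected_mean=175)
print(f"t({df}) = {t:.3f}")
```

The p-value is not computed here because it requires the t distribution, which is normally taken from a statistics package or a table.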

STATISTICS IS MOSTLY ABOUT NUMBERS

So… how do we make “things of interest” into numbers?

We come up with an objective and clear definition that allows us to assign numbers to
objects.

For example:

• Age = “years that have passed since date of birth”
• Economic status = “income declared in the yearly tax return form”
• Depression severity = “score on the BDI (Beck Depression Inventory)”

These measures result in outcomes (numbers), which we will treat as variables.

WHAT IS A VARIABLE?

VARIABLES: Any observation that can take on different values in different
circumstances. In other words: any value that is allowed to change or vary. We use the
term to refer to all types of observations that make up data (or a dataset).

Examples: age, reaction times in experiments, grade in an exam, nationality…

TYPES OF VARIABLES

DISCRETE: Variables that can only take on whole numbers (0,1,2,3…)

Ex: number of students

CONTINUOUS: Variables that can take on almost any numerical value, including decimals (1.0, 1.1, 1.2…)

Ex: weight

CLASSIFICATION OF VARIABLES

CATEGORICAL VARIABLES:

• NOMINAL: category or name. Unordered categories.
Ex: field of study, the city in which a person lives, marital status, blood type.
• ORDINAL: ranking of data. Ordered categories.
Ex: educational level, rank in the military, ratings (good, average, poor).

QUANTITATIVE VARIABLES (ALSO KNOWN AS SCALE VARIABLES):

• INTERVAL: values are equally spaced, but zero is arbitrary (it does not mean the
absence of the thing you are measuring).
Ex: IQ score

• RATIO: similar to interval variables, but zero has meaning (absence of
the thing you are measuring).
Ex: drug dosage in mg

OPERATIONS

NOMINAL:

We can only count the number of observations in each category and report that
number.

Ex:

- Nationalities: How many people come from France? How many from Spain?
- Marital status: How many people are married? Single? Divorced?

ORDINAL:

In addition to counting, we can rank observations, but the distances between ranks are
not meaningful.

Ex:

- Educational level: primary school, secondary school, bachelor's degree, master's degree,
  PhD.
- A person with a bachelor's degree has a higher level of education than a person
  who only completed secondary school

INTERVAL:

We can calculate differences in values, because numbers are meaningful and equally
spaced.

Ex:

- IQ score
- A person with an IQ = 110 is 10 points above the mean (100): 110 - 100 = 10

RATIO:

Same properties as interval data, but we can also multiply and divide values.

Ex: number of hours of study

- Sam studied for 6 hours; Alex studied for 2 hours
- Sam studied 4 hours more than Alex
- Sam also studied 3 times as much as Alex
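
The operations allowed at each level of measurement can be sketched in a few lines. The marital status list is made up for illustration; the other values come from the examples in these notes.

```python
from collections import Counter

# Nominal: we can only count observations per category.
marital_status = ["married", "single", "single", "divorced", "married", "single"]
counts = Counter(marital_status)

# Ordinal: categories can be ranked, but distances between ranks are undefined.
LEVELS = ["primary", "secondary", "bachelor", "masters", "PhD"]
rank = {level: position for position, level in enumerate(LEVELS)}
bachelor_is_higher = rank["bachelor"] > rank["secondary"]

# Interval: differences are meaningful.
iq_difference = 110 - 100

# Ratio: differences and ratios are both meaningful.
sam_hours, alex_hours = 6, 2
hours_difference = sam_hours - alex_hours
hours_ratio = sam_hours / alex_hours

print(counts["single"], bachelor_is_higher, iq_difference, hours_difference, hours_ratio)
```

Note that a ratio such as 110 / 100 for IQ would run without error, but it would not be meaningful, because an IQ of zero does not mean the absence of intelligence.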

THE ROLE OF VARIABLES

To understand how we will analyze variables, we need to understand their context.
When we design an experiment to explore a problem, we attempt to understand the
relationship between two variables:

• Dependent variable (DV): the variable that we want to understand. Also known as
the outcome. It is the variable that we measure in the study to explore
differences/changes based on the IV (or predictor). Generally these
variables are interval or ratio variables, but sometimes we are interested in
nominal or ordinal outcomes.
• Independent variable (IV): the variable whose differences cause differences in
the dependent variable. Also known as the predictor. It is the variable that we
manipulate experimentally (if possible) or, alternatively, use to categorize
participants.

- For a true experiment: it is manipulated (or controlled for).
  EX: treatment applied to a group or drug dosage
- For a quasi-experiment: it is naturally occurring in a group.
  EX: gender or socioeconomic status

TYPES OF RESEARCH DESIGNS

EXPERIMENTS: studies in which participants are randomly assigned to a condition (or
level) of one or more independent variables.

EX: We randomly assign patients diagnosed with depression (with similar depression
levels) to two different treatments.

- Group A: cognitive therapy
- Group B: group therapy

If we see a difference in recovery, then we can say that one treatment is more
effective than the other.
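
Random assignment can be sketched as shuffling the list of participants and splitting it in two. The patient IDs and the group size (20 patients, 10 per group) are hypothetical.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical patient IDs.
patients = [f"P{i:02d}" for i in range(1, 21)]

shuffled = patients[:]  # copy so the original list is left untouched
random.shuffle(shuffled)

half = len(shuffled) // 2
group_a = shuffled[:half]  # cognitive therapy
group_b = shuffled[half:]  # group therapy

print(group_a)
print(group_b)
```

Because chance alone decides who goes where, other characteristics of the patients should, on average, be balanced between the two groups.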

QUASI-EXPERIMENTS: similar to an experimental design, but groups are not randomly
created.

EX: We want to compare cognitive and group therapy, but we cannot randomly assign
patients. We need to apply each type of therapy in groups that are given to us
(patients attend different psychotherapy centers).

- Group A in center Y: cognitive therapy
- Group B in center X: group therapy

If we see a difference in recovery, we can say that one treatment is more effective
than the other. But we should also consider that the psychotherapy center may have an
effect on the results.

EX-POST FACTO (AKA CORRELATIONAL) STUDIES: we don’t manipulate any variable.
Variables are assessed as they exist. We look for relationships between two or more
variables.

They usually cannot determine causality.

Ex: We take a sample of students and we apply two tests to measure:

- Stress levels using the Perceived Stress Scale (PSS)
- Procrastination using the Procrastination Assessment Scale-Students (PASS)

We might find a relationship, but we don’t know what is causing what.
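
A relationship between two measured variables is often summarized with the Pearson correlation coefficient. The sketch below computes it by hand; the function name pearson_r is hypothetical and the scores for eight students are invented for illustration, not real PSS or PASS data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for eight students (made up for illustration).
stress = [12, 15, 18, 20, 22, 25, 28, 30]
procrastination = [30, 34, 33, 40, 42, 41, 48, 50]

r = pearson_r(stress, procrastination)
print(f"r = {r:.2f}")
```

A value of r close to 1 only says that the two scores rise together. It does not say whether stress leads to procrastination, procrastination leads to stress, or a third variable drives both.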


BEWARE OF CONFOUNDING VARIABLES

CONFOUNDING (OR LURKING) VARIABLES: variables that systematically vary with
the independent variable so that we cannot determine which variable is at work. We
will need to control for them (or randomize them away).

EX: a recent study found that people who engage in cultural activities (going to
museums, going to the opera, cinema) lived longer than people who did not do so.

- Dependent variable: life expectancy
- Independent variable: engaging in cultural activities

SIMPSON’S PARADOX

The total results are affected by the number of patients in poor health who arrived at
hospital B.

Radelet and Pierce (1991): the lower overall percentage of death sentences for
African-American defendants is due to the fact that cases in which the victim is black
are less likely to result in death sentences, and same-race violence is more prevalent.
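
The reversal can be reproduced with a small numerical sketch. All counts below are invented for illustration: two hypothetical hospitals, A and B, where hospital B receives most of the patients in poor health.

```python
# Hypothetical counts (survived, total), invented for illustration only.
data = {
    "A": {"good health": (870, 900), "poor health": (30, 100)},
    "B": {"good health": (196, 200), "poor health": (416, 800)},
}


def rate(survived, total):
    """Survival rate for one group of patients."""
    return survived / total


def overall_rate(hospital):
    """Survival rate of a hospital with both patient groups pooled together."""
    survived = sum(s for s, _ in data[hospital].values())
    total = sum(t for _, t in data[hospital].values())
    return survived / total


for condition in ("good health", "poor health"):
    a = rate(*data["A"][condition])
    b = rate(*data["B"][condition])
    print(f"{condition}: A = {a:.1%}, B = {b:.1%}")

print(f"overall: A = {overall_rate('A'):.1%}, B = {overall_rate('B'):.1%}")
```

With these numbers, hospital B has the higher survival rate within each group of patients, yet the lower survival rate overall, because its total is dominated by patients in poor health.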
