
https://www.student.uwa.edu.au/learning/resources/ace/respect-intellectual-property/copyright-and-uwa-unit-content

Adriano Polpo (UWA) STAT 1400


STAT 1400 - Statistics for Science

stat1400-ems@uwa.edu.au

Contributors to lecture material: Adrian Baddeley, Adriano Polpo, John Bamberg, Ed Cripps, Julie Marsh, Kevin Murray,
Gordon Royle, and Berwin Turlach.



What is Data?

“Data is systematically observed and recorded information.”

Data are numbers with a context, which tells us what the numbers mean and how the observations were collected.
Interpretation of data

Interpretation of data depends on context


Example: “42 out of 100 people approve of the prime
minister.”
Which people?
What does “approve” mean?
Was it a random sample of Australians?
Was it a self-selected online poll?
Always be sceptical. Ask yourself: what is this data? How was
it collected? What is its context?
Data Tables

To say anything useful, we must:

Gather the data
Organize and record the data

By far the most common way to arrange data is in a data table.


Data table about students

Student  Age   Degree    Height  Gender  Postcode
1        21.1  Science   175     M       6009
2        18.3  Commerce  168     F       6123
3        28.1  Science   180     M       6009
4        19.2  Arts      172     F       6014
...      ...   ...       ...     ...     ...
Rows are Observations
Each row of the data table contains the values for one individual or
one observation.


Depending on the study, these individuals may be called:

Respondents (in a survey)
Subjects or Participants (in an experiment)
Experimental Units (animals or inanimate objects)
Columns are Variables / Attributes

Each column of the data table records the observed values of one
of the variables, or attributes of the experimental unit.


A variable is a general property of every subject, like “Age” or “Height”.
Every subject has its own value for each of these properties.
The values are revealed by the experiment or observation.
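
As a minimal sketch (not part of the original slides), the student table could be held in Python as a list of rows, one dictionary per observation. The names `students` and `column` are illustrative assumptions, and only the four rows shown in the table are included.

```python
# Each dictionary is one row: all the values for one student (one observation).
students = [
    {"Student": 1, "Age": 21.1, "Degree": "Science", "Height": 175, "Gender": "M", "Postcode": 6009},
    {"Student": 2, "Age": 18.3, "Degree": "Commerce", "Height": 168, "Gender": "F", "Postcode": 6123},
    {"Student": 3, "Age": 28.1, "Degree": "Science", "Height": 180, "Gender": "M", "Postcode": 6009},
    {"Student": 4, "Age": 19.2, "Degree": "Arts", "Height": 172, "Gender": "F", "Postcode": 6014},
]


def column(rows, name):
    """Return the observed values of one variable (one column) across all rows."""
    return [row[name] for row in rows]


# A row holds every variable for one individual.
print(students[0])

# A column holds one variable for every individual.
print(column(students, "Age"))
```

Reading across a dictionary gives one observation; calling `column` reads down one variable.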
A famous field study

Joyce H. Poole, Mate guarding, reproductive success and female choice in African elephants.
Animal Behaviour 37 (1989) 842–849

Part of the paper reports a study of 41 male African elephants over a period of 8 years (1980-1987).
She recorded:
Number of successful matings of each elephant,
Age (as of 1987) of each elephant.
A day’s logbook
Age and Mating

Matings Ages (1987)

The data table

Age Matings
27 0
28 1
28 1
28 1
28 3
29 0
29 0
29 0
29 2
29 2
... ...
43 0
43 2
43 3
43 4
43 9
44 3
45 5
47 7
48 2
52 9
Frequencies (1 variable)
A first step in visualising data is to consider the frequency with
which each value occurs.

[Figure: frequency plot with Number of successful matings on the horizontal axis and Frequency on the vertical axis]
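
Counting how often each value occurs can be sketched with Python's `collections.Counter`. The list `matings` below is an assumption for illustration: it holds only the 20 rows visible in the data table (the elided rows are not filled in), so these counts will not match a plot of all 41 elephants.

```python
from collections import Counter

# Matings for the 20 rows visible in the data table (elided rows omitted).
matings = [0, 1, 1, 1, 3, 0, 0, 0, 2, 2, 0, 2, 3, 4, 9, 3, 5, 7, 2, 9]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(matings)

# Print the frequency table in order of the number of matings.
for value in sorted(frequency):
    print(value, frequency[value])
```

A value that never occurs, such as 6 here, simply has a frequency of zero.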


Associations (2 variables)
If each observation involves just two variables, then each
observation can be plotted as a point in two-dimensional space.

[Figure: scatterplot of Number of matings (vertical axis) against Age in years (horizontal axis)]
Questions

One immediately looks for trends and patterns.

Is there a clear relationship?


Is the relationship positive or negative?
Can the relationship be described by an equation?

Drawing a nice red line through the cloud of points does not mean
that there is a genuine association.
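
The direction of a relationship (positive or negative) can be checked numerically with the sample correlation coefficient. The sketch below computes Pearson's r by hand. The function name `pearson_r` is hypothetical, and the data are only the 20 rows visible in the data table, so the result says nothing definitive about the full study. A non-zero r on its own also does not establish a genuine association.

```python
from math import sqrt

# The 20 (age, matings) pairs visible in the data table (elided rows omitted).
ages = [27, 28, 28, 28, 28, 29, 29, 29, 29, 29,
        43, 43, 43, 43, 43, 44, 45, 47, 48, 52]
matings = [0, 1, 1, 1, 3, 0, 0, 0, 2, 2,
           0, 2, 3, 4, 9, 3, 5, 7, 2, 9]


def pearson_r(xs, ys):
    """Sample correlation: co-variation of xs and ys, scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(ages, matings)

# The sign of r gives the direction of the linear relationship.
print("positive" if r > 0 else "negative or zero", round(r, 2))
```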
Types of Data

Quantitative (numeric) data
measures the amount or quantity of something:
weight, height, number of things, length of time, ...

Categorical data
identifies the type or category into which something falls:
gender (Male/Female),
size (Small/Medium/Large),
emergency type (Fire/Police/Ambulance),
nucleotide (A,C,T,G).
From the 2016 Australian Census

Gender: Male, Female
Age
Marital status: Never married, widowed, divorced, separated, married
Australian citizen: Yes, No
English speaking ability: Very well, well, not well, not at all
Number of births (female respondents)
Postcode

Which are quantitative and which are categorical?


Not all numbers are quantitative

Some categorical variables use numbers as the categories.

This does not make them quantitative variables.

A good rule of thumb is to ask whether it makes any sense to take an average of the values. If not, then the data are not quantitative.

The average postcode of this class is

(6009 + 6123 + 6009 + 6014)/4 = 6038.75
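
The arithmetic above is easy to reproduce, which is exactly the problem: software will average any numbers it is given. A small sketch using Python's standard `statistics` module (the variable names are illustrative) contrasts the mean, which has no sensible interpretation for postcodes, with the most common value, which does.

```python
import statistics

# Postcodes of the four students shown in the data table.
postcodes = [6009, 6123, 6009, 6014]

# The mean can be computed, but it does not describe any real place.
average = statistics.mean(postcodes)

# The most common category is a meaningful summary of a categorical variable.
most_common = statistics.mode(postcodes)

print(average)      # 6038.75
print(most_common)  # 6009
```

The value 6038.75 is not the postcode of any student in the table, whereas 6009 is the postcode shared by the most students.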


Data Taxonomy
Types of Quantitative Data

Continuous: any value in some range can occur.

Heights 180.0 cm, 180.01 cm, etc.
Rainfall 21.12 mm, 21.125 mm, etc.

Discrete: every value is separated from every other.

Alleles of a gene (0, 1 or 2)
Girls in family of 5 children (0, 1, 2, ..., 5)
Matings of adult African elephants (0, 1, 2, ...)
Types of Categorical Data

Nominal: categories have no natural ordering.

Eye colour (green, brown, blue, etc)
Make of car (Mazda, Toyota, Ferrari, etc)

Ordinal: categories have a natural ordering.

Size of tree (small, medium, large)
Likert scale (strongly disagree, disagree, neutral, agree, strongly agree)
Dates (4th August, 5th August, 12th September)
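
The difference between nominal and ordinal categories shows up when sorting. In this sketch, the list `SIZE_ORDER` and the sample `trees` are assumptions for illustration: plain alphabetical sorting ignores the natural ordering, while sorting by position in `SIZE_ORDER` respects it.

```python
# The natural ordering of the ordinal categories, smallest first.
SIZE_ORDER = ["small", "medium", "large"]

# A hypothetical sample of tree sizes.
trees = ["large", "small", "medium", "small", "large"]

# Alphabetical order treats the labels as nominal: "large" comes before "small".
alphabetical = sorted(trees)

# Sorting by position in SIZE_ORDER uses the ordinal structure.
by_size = sorted(trees, key=SIZE_ORDER.index)

print(alphabetical)  # ['large', 'large', 'medium', 'small', 'small']
print(by_size)       # ['small', 'small', 'medium', 'large', 'large']
```

For a nominal variable such as eye colour there is no equivalent of `SIZE_ORDER`, so no ordering of the categories is more meaningful than any other.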
