
DESCRIPTIVE STATISTICS

(such statistics simply report what has been found; they describe and present
the data in a variety of ways)

FREQUENCY DISTRIBUTIONS.
MEASURES OF LOCATION AND VARIABILITY.

Mónica Marbán
VARIABLES

Statistical variable
In statistics, we generally want to study a population, that is, an entire collection of
persons, things or objects.
Because the whole population is usually too large to observe, we select a sample from it.
From the sample data, we can calculate a “statistic” (a number that is a property of the
sample).
A statistic is an estimate of a population “parameter” (a property of the population).

• A VARIABLE is a measurable or observable characteristic of interest for each person or
thing in the population (people, families, companies or objects) we are interested in.

Variables
  Quantitative: Discrete, Continuous
  Qualitative: Nominal, Ordinal


• REALISE THAT VARIABLES ARE NOT JUST THOSE THINGS THAT CAN BE
MEASURED IN THE TRADITIONAL SENSE.

Quantitative variables: the categories of the variable differ in magnitude, so the
differences between them can be expressed in numbers. There are two types:

• Discrete: the values are integers resulting from a counting process (the number
of children in a family, the number of employees in a company, the number of
siblings, …)
• Continuous: the values are real numbers resulting from a measurement process (the
height and weight of a person, the amount of time it takes a train to arrive at its
destination, the salary earned by an employee, …)


Qualitative variables: it makes no sense to distinguish two categories of the variable by
a numerical difference. There are two types:

• Nominal: the variable is an attribute whose categories are names or words which
cannot be ordered (nationality, profession, eye colour, type of accommodation, reason
for travel, place of vacation for next year, place of vacation last year, possibility of
a second residence, …)
• Ordinal: as above, but the categories can be ranked, so an order is introduced into the data
(category of hotel or apartment, opinion about the price, your feeling about the trip,
quality compared with previous visits, type of tourist package selected, …). Ordinal data
include items such as rating scales and Likert scales, and are frequently used when asking
for opinions and attitudes.


ABOUT ROLES AND PURPOSES

• The independent variable is what you manipulate: a treatment, a program or a
cause. It causes a particular outcome or response.
• The dependent variable is what is affected by the independent variable:
the effects or outcomes.
• Extraneous variables are factors which cannot be controlled.

• If you are studying the effects of a new educational program on student
achievement…
  – Independent variable: the program
  – Dependent variable: your measures of achievement
  – Extraneous variables: independent variables that have not been
    controlled. They may or may not influence the results. Normally these
    variables are distributed randomly and evenly among the groups
    (if not, we should take them into consideration).
SAMPLING

• POPULATION (N): the group you wish to generalize to in your study.
The units of this set can be observed individually. It comprises all the cases
that match what you wish to study.
  – Theoretical population: the population you would like to generalize to.
  – Accessible population: the population that will be accessible to you.
• SUBJECT or PARTICIPANT: a person from whom data are collected.
  – Subject: term used in a quantitative context
  – Participant: term used in a qualitative context
• SAMPLE: a subgroup of the population. The collective group of
subjects or participants from whom data are collected.

Considerations in sampling: the sample size (at least 30 if you are going to
use statistical analysis on your data), the representativeness of the sample (this
is what allows us to draw conclusions about the population), access to the sample,
and the sampling strategy to be used.

FREQUENCY DISTRIBUTIONS

Frequency distribution. Statistical analysis of one variable.

It is a table in which the data are summarized in columns. It is useful when the number of
observations is large. Elements:

• Values, xi: the different values taken by the variable. If the variable is not
nominal, its values must be ordered from lowest to highest.

• Absolute frequency, ni: the number of times xi is repeated.
The total number of data is N.

• Relative frequency, fi: the ratio of the absolute frequency to N (fi = ni / N). It is a proportion.

• Absolute cumulative frequency, Ni: the number of observations which are equal to
or lower than a given value.

• Relative cumulative frequency, Fi: the absolute cumulative frequency Ni
divided by the total number of data N.
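
The five elements above can be computed directly from a list of raw observations. The following is only a minimal Python sketch: the function name `frequency_table` and the tuple layout are assumptions made for illustration, and the ages are the ones used in Example 2 of this document. Python prints decimals with a point where this document uses a comma.

```python
from collections import Counter

def frequency_table(observations):
    """Return one row per distinct value: (xi, ni, fi, Ni, Fi)."""
    counts = Counter(observations)
    total = len(observations)          # N, the total number of data
    rows = []
    cumulative = 0                     # running absolute cumulative frequency Ni
    for value in sorted(counts):       # values ordered from lowest to highest
        n_i = counts[value]
        cumulative += n_i
        rows.append((value, n_i, n_i / total, cumulative, cumulative / total))
    return rows

# Ages of 20 students (the data of Example 2 in this document).
ages = [18] * 4 + [19] * 5 + [20] * 4 + [21] * 3 + [23] * 2 + [25, 27]

table = frequency_table(ages)
for x_i, n_i, f_i, N_i, F_i in table:
    print(f"{x_i:>3} {n_i:>3} {f_i:>6.2f} {N_i:>3} {F_i:>6.2f}")
```

Sorting the values only makes sense for variables that can be ordered; for a nominal variable the cumulative columns would have no meaning.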


Organizing the Data into Summary Tables:

xi    ni    fi    Ni        Fi
x1    n1    f1    N1        F1
x2    n2    f2    N2        F2
…     …     …     …         …
xi    ni    fi    Ni        Fi
…     …     …     …         …
xn    nn    fn    Nn = N    Fn = 1
∑     N     1


In order to build a complete frequency distribution it is enough to know the values of the
variable, any one of the other columns (ni, fi, Ni or Fi) and N.
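
As an illustration of this statement, the sketch below rebuilds ni, fi and Ni when only the Fi column and N are known. The function name `rebuild_from_cumulative` is hypothetical, the input is the Fi column of the student-year example worked out later in this document, and the rounding step is an assumption that covers Fi values given as rounded decimals.

```python
def rebuild_from_cumulative(values, cumulative_relative, total):
    """Recover ni, fi and Ni for each value from the Fi column and N."""
    rows = []
    previous_N = 0
    for value, F_i in zip(values, cumulative_relative):
        N_i = round(F_i * total)       # Ni = Fi * N (rounded: Fi may be a rounded decimal)
        n_i = N_i - previous_N         # ni = Ni - N(i-1)
        rows.append({"xi": value, "ni": n_i, "fi": n_i / total, "Ni": N_i, "Fi": F_i})
        previous_N = N_i
    return rows

# Fi column of the student-year example in this document, with N = 30.
rows = rebuild_from_cumulative(["1st", "2nd", "3rd", "4th"], [0.3333, 0.60, 0.80, 1.0], 30)
for row in rows:
    print(row)
```

Starting from another column works the same way: from fi, ni = fi * N; from Ni, ni is the difference between consecutive cumulative values.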
Two frequency distributions are equal if the values of the variable are the same and so
are the corresponding relative frequencies.
The range of a variable is the difference between its highest and its lowest value
(R = maximum value – minimum value).
When the data are grouped into intervals we have grouped frequency distributions.
These have additional elements:
– Classes or intervals in which the data are included: [li-1;li) or (li-1;li]
– The width (or length) of each interval, ci: the difference between the upper
and the lower class limits. The width can be constant or variable.
– Class mark or midpoint, x'i: the value representing the class, calculated by
dividing the sum of the upper and lower class limits by 2.
– Density, di: the concentration of data inside the interval, calculated by
dividing the absolute frequency by the width of the interval.


[li-1;li)    ci    x'i    ni    fi    di    Ni        Fi
[l0;l1)      c1    x'1    n1    f1    d1    N1        F1
[l1;l2)      c2    x'2    n2    f2    d2    N2        F2
…            …     …      …     …     …     …         …
[li-1;li)    ci    x'i    ni    fi    di    Ni        Fi
…            …     …      …     …     …     …         …
[ls-1;ls)    cs    x's    ns    fs    ds    Ns = N    Fs = 1
∑                         N     1

$$c_i = l_i - l_{i-1}; \qquad x'_i = \frac{l_{i-1} + l_i}{2}; \qquad d_i = \frac{n_i}{c_i}$$


Example: We asked 30 randomly chosen students for the highest year in which they are
taking a course:

10 students were in first year
8 students were in second year
6 students were in third year
The rest of the students (6) were in fourth year

Put these data into a frequency distribution table with columns for values, frequency
and percentage.
X      ni    Ni                            fi    Fi
1st    10    N1 = 10
2nd    8     N2 = 10 + 8 = 18
3rd    6     N3 = 10 + 8 + 6 = 24
4th    6     N4 = 10 + 8 + 6 + 6 = 30


X      ni    Ni    fi                       Fi
1st    10    10    f1 = 10/30 = 0,3333
2nd    8     18    f2 = 8/30 = 0,2667
3rd    6     24    f3 = 6/30 = 0,2
4th    6     30    f4 = 6/30 = 0,2

f1 + f2 + … + fn = f1 + f2 + f3 + f4 = 0,3333 + 0,2667 + 0,2 + 0,2 = 1

Relative frequencies can also be expressed as percentages (%) by multiplying the
result by 100. In this case: f1 + f2 + … + fn = 100%

To calculate the cumulative relative frequency, Fi, add the relative frequency of the
current row to the relative frequencies (fi) of all the previous rows. The last entry
of the cumulative relative frequency column is one, indicating that one hundred percent of the
data has been accumulated.


X      ni    Ni    fi        Fi
1st    10    10    33,33%    F1 = 33,33%
2nd    8     18    26,67%    F2 = 33,33% + 26,67% = 60%
3rd    6     24    20%       F3 = 33,33% + 26,67% + 20% = 80%
4th    6     30    20%       F4 = 33,33% + 26,67% + 20% + 20% = 100%

Fn = F4 = 100% (if we use percentages) or 1 (if we work with proportions)

Because Fi is a relative frequency, it can also be calculated as a fraction, dividing the
absolute cumulative frequency (Ni) by the total number of students in the sample (N):

$$F_i = \frac{N_i}{N}$$
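
The two routes to Fi described in this example (adding up the fi, or dividing Ni by N) can be checked against each other. The sketch below does this in Python for the 30 students; the variable names are illustrative, and using `fractions.Fraction` is an assumption made here so that the comparison is exact rather than subject to decimal rounding.

```python
from fractions import Fraction
from itertools import accumulate

n = [10, 8, 6, 6]                      # ni for 1st, 2nd, 3rd and 4th year
N = sum(n)                             # total number of students, 30

f = [Fraction(n_i, N) for n_i in n]    # relative frequencies fi = ni / N
F_from_f = list(accumulate(f))         # route 1: running sum of the fi
N_cum = list(accumulate(n))            # absolute cumulative frequencies Ni
F_from_N = [Fraction(N_i, N) for N_i in N_cum]   # route 2: Fi = Ni / N

for N_i, F_i in zip(N_cum, F_from_N):
    print(N_i, F_i, f"{float(F_i):.2%}")
```

Both lists contain the same fractions, which is why either column (fi or Ni) is enough to obtain Fi.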


2.3 Grouped frequency distributions

The challenge is to group the data in such a way that the most significant trends
become clearly visible. There is no single best solution to this problem, as so much of it
involves personal insight and judgement.

Let's see an example and how to calculate class intervals, class marks, widths, …

Example: The following table shows the amount of money that 50 employees
received as a bonus at the end of the year:

Money (€)        300   400   500   700   750   800   1000   1200   1500
No. employees    5     7     10    11    6     5     3      2      1

If the number of values is high enough, it is possible to group them in CLASSES OR
INTERVALS.


Class mark or midpoint (xic):

$$x_{ic} = \frac{L_{i-1} + L_i}{2}$$

Li-1: lower class limit
Li: upper class limit

In this example the number of classes and their widths don't follow any rule.
Later on we will see some criteria we could use to build intervals.

Money (€)     Class width
Li-1 - Li     (ci)
250-500       c1 = 500-250 = 250
500-750       c2 = 750-500 = 250
750-1000      c3 = 1000-750 = 250
1000-1500     c4 = 1500-1000 = 500

The first three intervals have equal width (250); the last one is wider (500), so
the class widths are unequal.


Li-1 - Li     ci     Midpoint (xic)
250-500       250    x1c = (250+500)/2 = 375
500-750       250    x2c = (500+750)/2 = 625
750-1000      250    x3c = (750+1000)/2 = 875
1000-1500     500    x4c = (1000+1500)/2 = 1250
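
The grouping of the bonus data can be sketched in Python as follows. The slides do not say which class a value on a boundary (500, 750, 1000) belongs to, so this sketch assumes intervals closed on the left, [Li-1; Li), with the last class also including its upper limit so that 1500 is counted; the helper name `class_index` is hypothetical.

```python
from bisect import bisect_right

# Bonus (€) -> number of employees, from the example table.
bonuses = {300: 5, 400: 7, 500: 10, 700: 11, 750: 6, 800: 5, 1000: 3, 1200: 2, 1500: 1}
limits = [250, 500, 750, 1000, 1500]   # class limits L0 ... L4

def class_index(value, limits):
    """Index of the class [L(i-1), Li) containing value; the last class includes its upper limit."""
    if value == limits[-1]:
        return len(limits) - 2
    index = bisect_right(limits, value) - 1
    if index < 0 or index >= len(limits) - 1:
        raise ValueError(f"{value} lies outside the classes")
    return index

counts = [0] * (len(limits) - 1)       # absolute frequency ni of each class
for amount, employees in bonuses.items():
    counts[class_index(amount, limits)] += employees

widths = [upper - lower for lower, upper in zip(limits, limits[1:])]
midpoints = [(lower + upper) / 2 for lower, upper in zip(limits, limits[1:])]

for lower, upper, c, x, n in zip(limits, limits[1:], widths, midpoints, counts):
    print(f"{lower}-{upper}: ci={c}, midpoint={x}, ni={n}")
```

With right-closed intervals, (Li-1; Li], the boundary values would move to the lower class and the counts would change, which is one reason the convention should always be stated.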


Statistics offers some guidelines for transforming ungrouped distributions into grouped
distributions:

1) Use no fewer than 5 classes and no more than 20. The number of classes is decided in
a somewhat arbitrary manner. The following table is a quick guide to the approximate number
of intervals.

SAMPLE SIZE        NUMBER OF INTERVALS
Fewer than 50      5 – 7
50 to 100          7 – 8
101 to 500         8 – 10
501 to 1000        10 – 11
1001 to 5000       11 – 14
More than 5000     14 – 20
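
The guide table can be turned into a small lookup. This is only a sketch: the function name `suggested_intervals` is hypothetical, and it returns the range given in the table without choosing a single number, since that choice remains a matter of judgement.

```python
def suggested_intervals(sample_size):
    """Return the (minimum, maximum) number of intervals suggested by the guide table."""
    if sample_size < 1:
        raise ValueError("sample size must be positive")
    guide = [
        (49, (5, 7)),                  # fewer than 50
        (100, (7, 8)),                 # 50 to 100
        (500, (8, 10)),                # 101 to 500
        (1000, (10, 11)),              # 501 to 1000
        (5000, (11, 14)),              # 1001 to 5000
    ]
    for largest_size, interval_range in guide:
        if sample_size <= largest_size:
            return interval_range
    return (14, 20)                    # more than 5000

print(suggested_intervals(50))         # the 50 employees of the bonus example
```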


2.4 Charts and graphs


A) Categorical variables (nominal and ordinal):
• Bar charts are appropriate to represent ni
• Pie charts are suitable for depicting fi or percentages
B) Discrete variables with ungrouped data:
• Bar charts for either ni or fi
• Pie charts are suitable for depicting fi

C) Quantitative variables with grouped data:
• Histograms: graphs with the classes on the X axis and a rectangle over each class.
The area of each rectangle is proportional to the corresponding frequency
(absolute or relative), so when the class widths differ the height must be the density.
• An ogive, or cumulative frequency polygon, for Ni or Fi


Example 1. Consider a group of 20 university students whose level of English
has been assessed, with the following results:

xi           ni    fi
Very poor    2     0,10
Poor         3     0,15
Average      5     0,25
Good         6     0,30
Very good    4     0,20
Total        20    1

The variable is ordinal.


Bar chart: the bar height is proportional to ni

[Figure: bar chart of absolute frequency ni (vertical axis, 0 to 7) for the categories
Very poor, Poor, Average, Good and Very good]


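Pie chart: the angle αi of each sector is proportional to fi

$$\frac{n_i}{N} = \frac{\alpha_i}{360^\circ}; \qquad f_i = \frac{n_i}{N}; \qquad \alpha_i = f_i \cdot 360^\circ$$

[Figure: pie chart of relative frequency fi. Very poor 10%, Poor 15%, Average 25%,
Good 30%, Very good 20%]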
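
As a worked check of the angle formula, the following Python sketch computes αi for the English-level data of Example 1. The variable names are illustrative; the angle is computed as ni · 360 / N, which equals fi · 360º.

```python
# Absolute frequencies ni from Example 1 (level of English of 20 students).
levels = {"Very poor": 2, "Poor": 3, "Average": 5, "Good": 6, "Very good": 4}
N = sum(levels.values())               # total number of students

# Sector angle in degrees: alpha_i = f_i * 360 = n_i * 360 / N
angles = {level: n_i * 360 / N for level, n_i in levels.items()}

for level, angle in angles.items():
    print(f"{level}: {angle} degrees")
```

The angles add up to 360º because the relative frequencies add up to 1.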


Example 2. Now let's ask for their ages:

xi       ni    fi
18       4     20%
19       5     25%
20       4     20%
21       3     15%
23       2     10%
25       1     5%
27       1     5%
Total    20    100%

The variable is discrete (with ungrouped data).


Its (approximate) bar chart for fi:

[Figure: bar chart of relative frequency fi (vertical axis, 0% to 30%) for the ages
18, 19, 20, 21, 23, 25 and 27]


Another example:

xi       ni
1        2
2        3
3        4
4        3
5        6
6        8
7        14
8        9
9        4
10       2
Total    55

[Figure: bar chart of ni (vertical axis, 0 to 16) against xi (horizontal axis, 1 to 10),
with each bar labelled with its frequency]


Example 3. Finally, we will study the students' height in
centimeters (cm), with grouped data:

[Li-1-Li)    x'i      ci    ni    fi      Ni    Fi      di
160-165      162,5    5     3     0,15    3     0,15    0,60
165-172      168,5    7     4     0,20    7     0,35    0,57
172-180      176      8     6     0,30    13    0,65    0,75
180-184      182      4     4     0,20    17    0,85    1,00
184-193      188,5    9     3     0,15    20    1       0,33
TOTAL                       20    1

The variable is continuous.
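
The columns of this grouped table follow from the class limits and the absolute frequencies alone. The sketch below recomputes them in Python; the dictionary keys are illustrative names, and Python prints decimals with a point where this document uses a comma.

```python
limits = [160, 165, 172, 180, 184, 193]   # class limits of the height example (cm)
counts = [3, 4, 6, 4, 3]                  # absolute frequency ni of each class
N = sum(counts)                           # 20 students

rows = []
cumulative = 0                            # running Ni
for lower, upper, n_i in zip(limits, limits[1:], counts):
    c_i = upper - lower                   # class width
    cumulative += n_i
    rows.append({
        "class": f"{lower}-{upper}",
        "midpoint": (lower + upper) / 2,  # class mark x'i
        "ci": c_i,
        "ni": n_i,
        "fi": n_i / N,
        "Ni": cumulative,
        "Fi": cumulative / N,
        "di": n_i / c_i,                  # density: frequency per cm of class width
    })

for row in rows:
    print(row["class"], row["midpoint"], row["ci"], row["ni"], round(row["di"], 2))
```

Multiplying each density by its class width gives back ni, which is why a histogram drawn with di as the height has areas proportional to the frequencies.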


Its (approximate) histogram: a graph with the classes on the X axis and a rectangle
over each of them. The area of each rectangle is proportional to the corresponding
frequency; because the class widths differ here, the height of each rectangle is the density di.

[Figure: histogram of density di (vertical axis, 0,0 to 1,0) over the classes
160-165, 165-172, 172-180, 180-184 and 184-193]


Another example:

Intervals    ni
0-2          6
2-4          10
4-6          30
6-8          35
8-10         10
10-12        6
Total        97

[Figure: histogram of ni (vertical axis, 0 to 40) over xi (horizontal axis, 0 to 14)]
