
STATISTICS-1.pdf

margaarp

Economic Statistics (Estadística Económica)

1st year, Degree in Economics and International Business

Faculty of Economics, Business and Tourism

Universidad de Alcalá

All rights reserved. Economic exploitation and transformation of this work are not permitted. Printing it in its entirety is permitted.
1. INTRODUCTION TO STATISTICS
1 INTRODUCTION
Statistics, in a general definition, is a set of tools used to study regularities in the data coming from stochastic
events that are subject to observation. It is usually perceived as a set of numbers, graphs, averages, etc.

Statistics: a science that allows us to interpret information, choose appropriate
representative samples to make inferences, estimate causal relationships and predict.

Statistics is a science articulated in three fields:

o Descriptive statistics: allows us to observe and numerically describe a population. It is a
deduction-based method.
o Probability: starting from the definition of probability, it studies theoretical models for populations.
o Inferential statistics: it infers general properties of the whole population from an observed
sample, using probability.

2 POPULATION, ELEMENTS AND CHARACTERISTICS


NATURE
o Deterministic phenomena (causality)
o Random events (stochastic), governed by chance: statistical regularity

KNOWLEDGE
o The behaviour of the events: description
o The explanation of their behaviour: causal analysis

OBSERVATION
o Characteristics
- According to the number of characteristics observed: one-dimensional or multidimensional
- According to the nature of the results: qualitative (attributes), which take modalities, or
quantitative (variables), which take values and may be discrete or continuous
o Elements, individuals or statistical units: simple or composed
o Population, group or universe: invariant or variable

3 METHODS OF OBSERVATION
A population can be observed in three ways:

• Exhaustively: all the elements of the population are observed. This is known as a CENSUS. It takes
a long time and is very laborious.
• Partially: only part of the population is observed. There are two ways to observe the population partially:
- Sample: the observed elements do not share any condition that makes them different from the
rest. The elements in a sample form a smaller population.
- Sub-population: the only elements observed are those that share a specific characteristic
that makes them different from the rest.
• Mixed: part of the population is observed exhaustively and a sample is selected from the rest of the
population.

4 MEASURING THE CHARACTERISTICS: VARIABLES AND MEASURE SCALES


o For qualitative characteristics (attributes)
- Nominal: xi = xj or xi ≠ xj
- Ordinal: in addition, if xi ≠ xj ⇒ xi ≤ xj or xi ≥ xj
o For quantitative characteristics (variables)
- Interval: in addition, if xi ≥ xj ⇒ ∃ xi – xj (differences are meaningful)
- Ratio: in addition, 0 is a meaningful value, so that if xj ≠ 0 ⇒ ∃ xi/xj ⇒ there is a standard unit of measurement

5 STAGES OF AN OLD (TRADITIONAL) STATISTICAL STUDY


1. Objective definition

• Identification of the characteristics to study
• Definition of the population
• Identification of the frame and its accessibility
• Decide on the observation method
• Specify the observation setting and the way to obtain the data.

With big data:

▪ I have this data; what can I do with this information?
- Estimate personal consumption
- Tourism statistics

2. Data collection

• Questionnaire
• Sample design
• Data collection
• Data processing

3. Data analysis

• Descriptive analysis.
• Sampling and non-sampling error

6 QUALITATIVE AND QUANTITATIVE DATA
If the observed characteristic cannot be described numerically and its values are qualitative, we say it is an
attribute. The observation of an attribute shows different modalities. Examples are a person's gender or
nationality.

If the characteristic can be numerically described, we say it is a variable. Depending on the set of possible
values it takes, we have the following classification:

o Discrete variables: they take values in a countable (finite or infinite) set. For instance,
age, the number of members in a family, etc.
o Continuous variables: they may take any value in an uncountable set, such as all the values in a
real interval. For instance, height, weight, distances, etc.

7 TABLES
TABLES FOR ATTRIBUTES

Let’s assume that X is the observed attribute and that it has k different modalities, which we denote by
x1,x2,...,xi,...,xk

We count the number of repetitions of each modality and summarise the result in a table like the following
one:

Modality   Absolute frequency
x1         n1
x2         n2
...        ...
xk         nk
Total      N

ni is the number of times that the attribute modality xi has been observed.
It is the observed frequency or absolute frequency of xi.

If there is a total of N observations, then:

n1 + n2 + ... + nk = N

Example 1:

Within a group of 100 people, we have observed that 50 are married, 25 single, 15 widowed and
10 divorced. Build a frequency table for the attribute civil status.
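
As an illustration that is not part of the original notes, the table for Example 1 could be sketched in Python as follows. The names `civil_status_counts` and `frequency_table` are hypothetical; the counts are the ones given in the example.

```python
# Absolute frequencies n_i for the attribute "civil status" (Example 1).
civil_status_counts = {
    "married": 50,
    "single": 25,
    "widowed": 15,
    "divorced": 10,
}


def frequency_table(counts):
    """Return the table rows (modality, n_i) followed by a 'Total' row."""
    rows = list(counts.items())
    rows.append(("Total", sum(counts.values())))
    return rows


table = frequency_table(civil_status_counts)
for modality, n_i in table:
    print(f"{modality:<10}{n_i:>5}")
```

The last row checks that the absolute frequencies add up to the N = 100 people observed.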



TABLES FOR NON-GROUPED VARIABLES

Let’s assume that X is a discrete statistical variable that takes only a few different values, each of them repeated
many times. First, we sort the different values in increasing order, x1<x2<…<xi<…<xk.

Now we count how many times each of them has been observed and build a table:

Value      Absolute frequency
x1         n1
x2         n2
...        ...
xk         nk
Total      N

ni is the number of times that the variable value xi
has been observed. It is the observed
frequency or absolute frequency of xi.

If there is a total of N observations, then:

n1 + n2 + ... + nk = N

Example 2:

In a community of 25 neighbours, we have asked how many people live in the house and we
have obtained the following data:

4 4 3 1 4 2 4 5 1 4 1 3 5

4 3 3 5 5 1 2 1 1 2 5 5

Build a table that summarises the distribution of the variable “Number of people in the house”.
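
The counting step for Example 2 can be sketched in Python as follows. This is a minimal illustration added to the notes, not the author's solution; it reads the 25 single-digit answers listed above and uses the standard-library `Counter`. The variable names are hypothetical.

```python
from collections import Counter

# The 25 answers from Example 2, one digit per household.
raw = "4431424514135" + "433551211255"
people_per_house = [int(ch) for ch in raw]

counts = Counter(people_per_house)   # value x_i -> absolute frequency n_i
table = sorted(counts.items())       # rows ordered by increasing x_i

for x_i, n_i in table:
    print(x_i, n_i)
print("N =", sum(counts.values()))
```

Sorting the rows by value mirrors the first step in the text, where the different values are placed in increasing order before counting.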

TABLES FOR VARIABLES GROUPED IN INTERVALS

Let’s assume X is a statistical variable that takes many different values. In this case, it is useful to group
the observed values into k intervals. The frequency ni is then the number of observations in the i-th
interval, that is, the number of observations x that satisfy Li-1<x≤Li.

ni is the observed frequency or absolute frequency of the i-th interval.

xi is the class midpoint, that is, the mean of the two limits of the interval: xi = (Li-1 + Li)/2
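
A small Python sketch of this grouping rule is given below, under stated assumptions: the function name `group_into_intervals` and the sample data are hypothetical and invented for illustration, and every interval is treated as open on the left and closed on the right, as in the condition Li-1<x≤Li.

```python
from bisect import bisect_left


def group_into_intervals(data, limits):
    """Count observations per interval (L[i-1], L[i]] for limits L[0] < ... < L[k].

    Returns rows (lower limit, upper limit, midpoint, n_i). Values outside
    (L[0], L[k]] are ignored in this sketch.
    """
    counts = [0] * (len(limits) - 1)
    for x in data:
        i = bisect_left(limits, x)   # smallest i with limits[i] >= x
        if 1 <= i < len(limits):     # i.e. limits[i-1] < x <= limits[i]
            counts[i - 1] += 1
    return [
        (lo, hi, (lo + hi) / 2, n)
        for lo, hi, n in zip(limits, limits[1:], counts)
    ]


# Hypothetical data, invented for illustration only.
sample = [1.5, 2.0, 2.5, 4.0, 4.1, 6.0]
rows = group_into_intervals(sample, limits=[0, 2, 4, 6])
for row in rows:
    print(row)
```

Note that an observation equal to an upper limit, such as 2.0 here, is counted in the interval that ends at that limit.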



Example 3:

The earnings (€) obtained by a shop during 25 days are the following:

Calculate the different frequency distributions:

First decide on the interval width depending on the number of groups that we want:

Then the intervals are [5.000, 9.000]; (9.000, 13.000], etc. We can see that in the first interval there
are 3 observations (marked in green). In the second interval there are 4 observations (marked in
blue)…

8 FREQUENCY DISTRIBUTIONS: ABSOLUTE, RELATIVE, ACCUMULATED


• Absolute frequency of the value xi (or of the i-th interval): the number of observations associated with xi
(or with the i-th interval). It is denoted by ni.

• Relative frequency fi of the value xi (or of the i-th interval): the
ratio between the absolute frequency of that value and the total
number of data N, that is:

fi = ni / N

• Accumulated frequency Ni of a value xi (with the values ordered from
lowest to highest) or of the i-th interval: the number of observations
less than or equal to it:

Ni = n1 + n2 + ... + ni

• Accumulated relative frequency Fi of a value xi (or of the
i-th interval): the ratio between its accumulated frequency and the
total number of data N:

Fi = Ni / N

Absolute and relative accumulated frequencies do not apply when the data are nominal, because the modalities have no order.
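
The four frequencies defined in this section can be computed together. The following Python sketch is an illustration added to the notes, using the Example 2 data; the function name `frequency_distributions` is hypothetical.

```python
from collections import Counter
from itertools import accumulate


def frequency_distributions(data):
    """Return rows (x_i, n_i, f_i, N_i, F_i) with the values x_i in increasing order."""
    counts = Counter(data)
    values = sorted(counts)
    n = [counts[x] for x in values]   # absolute frequencies n_i
    total = sum(n)                    # total number of data N
    cum = list(accumulate(n))         # accumulated frequencies N_i
    return [
        (x, n_i, n_i / total, N_i, N_i / total)
        for x, n_i, N_i in zip(values, n, cum)
    ]


# Data from Example 2 (number of people per house).
data = [int(ch) for ch in "4431424514135433551211255"]
rows = frequency_distributions(data)
for row in rows:
    print(row)
```

Because the values are sorted before accumulating, the sketch only makes sense for data that can be ordered, in line with the remark about nominal data.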

Example 4:

Calculate the frequencies associated to example 1.

Obtain all the frequency distributions for example 2.

Obtain all the frequency distributions for example 3.



FREQUENCY DISTRIBUTION PROPERTIES

• Absolute: 0 ≤ ni ≤ N, and n1 + n2 + ... + nk = N

• Relative: 0 ≤ fi ≤ 1, and f1 + f2 + ... + fk = 1

9 GRAPHS
GRAPHS FOR CHARACTERISTICS

[Figures: pie chart and bar chart]

GRAPHS FOR NON-GROUPED VARIABLES

With these data we can graph all the different frequency distributions.

The bar chart is a way to graph absolute or relative frequencies.

To build it, the x axis represents the different values of the variable, while the y axis shows the frequency
(absolute or relative) of each value.

Example 2: bar charts



To represent cumulative frequencies (absolute or relative) we make a step chart. This chart shows a step
line that joins the points (xi, Ni). The chart is equivalent if we use the relative cumulative frequencies and
join the points (xi, Fi).
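
The step line can be read as a function that stays flat between consecutive values and jumps at each xi. The Python sketch below is a hedged illustration rather than the notes' own method: `cumulative_step` is a hypothetical helper, and the values and accumulated frequencies are the ones obtained by counting the Example 2 data.

```python
from bisect import bisect_right


def cumulative_step(values, cum_freqs):
    """Return a function N(x): the accumulated frequency of the largest x_i <= x."""
    def N(x):
        i = bisect_right(values, x)   # how many of the x_i are <= x
        return cum_freqs[i - 1] if i > 0 else 0
    return N


# Values x_i and accumulated frequencies N_i counted from the Example 2 data.
values = [1, 2, 3, 4, 5]
cum_freqs = [6, 9, 13, 19, 25]
N = cumulative_step(values, cum_freqs)

for x in (0.5, 1, 2.7, 5, 10):
    print(x, N(x))
```

Below the smallest value the function is 0, and from the largest value onwards it stays at the total number of observations.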

Example 2: step charts

GRAPHS FOR GROUPED VARIABLES

There are two main charts for this kind of variable: one for absolute and relative frequencies and another
for cumulative ones. Absolute or relative frequencies are represented by means of the histogram.

This chart draws a rectangle over each interval with an area that is proportional to its frequency. This
makes it necessary to introduce the concept of frequency density, the frequency per unit of interval width:

di = ni / (Li – Li-1)
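
The bar heights of a histogram can be sketched in Python as follows, taking the frequency density of an interval to be its absolute frequency divided by its width. The function name `frequency_densities` and the grouped data are hypothetical and invented for illustration; the intervals have unequal widths on purpose.

```python
def frequency_densities(limits, counts):
    """Height of each histogram bar: n_i divided by the width of its interval."""
    return [
        n / (hi - lo)
        for lo, hi, n in zip(limits, limits[1:], counts)
    ]


# Hypothetical grouped data with unequal widths, invented for illustration.
limits = [0, 10, 20, 40]   # intervals (0, 10], (10, 20], (20, 40]
counts = [5, 10, 10]       # absolute frequencies n_i
heights = frequency_densities(limits, counts)

for lo, hi, h in zip(limits, limits[1:], heights):
    print(f"({lo}, {hi}]: height {h}, area {h * (hi - lo)}")
```

The second and third intervals have the same frequency, but the third is twice as wide, so its bar is half as tall and both bars keep the same area.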

Example 3: histogram

Example 5:

The next table shows monthly sales of 50 shops (thousand €). Make the histogram.



To graph the cumulative frequency distribution we use the cumulative frequency polygon. We join with straight
lines the following points:

(L0, 0), (L1, N1), ..., (Li, Ni), ..., (Lk, Nk)
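
A possible Python sketch of the polygon is shown below. It assumes that the polygon starts at height 0 on the lower limit of the first interval and joins the upper limit of each interval with its accumulated frequency. The helper names and the grouped data are hypothetical and invented for illustration.

```python
from itertools import accumulate


def polygon_points(limits, counts):
    """Vertices of the cumulative frequency polygon: (L_0, 0) and (L_i, N_i)."""
    return list(zip(limits, [0, *accumulate(counts)]))


def interpolate(points, x):
    """Read the polygon at x by linear interpolation between neighbouring vertices."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the grouped range")


# Hypothetical grouped data, invented for illustration.
points = polygon_points([0, 10, 20, 40], [5, 10, 10])
print(points)
print(interpolate(points, 15))
```

Reading the polygon between two vertices amounts to assuming that the observations are spread evenly inside each interval.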

Example 3: cumulative frequency polygon
