Statistics in Economics and Management
Ademir ABDIĆ
STATISTICS
IN ECONOMICS
AND MANAGEMENT
Sarajevo, 2010.
Title: STATISTICS IN ECONOMICS AND MANAGEMENT
Print run: 300
330.45:519.2]:005(075.8)
ISBN 978-9958-25-056-9
1. Resić, Emina
COBISS.BH-ID 18502150
PREFACE
All kinds of activities require the use of numbers. Students are expected
to work with confusing sets of data, and statistics helps them to make
sense of it. By using statistical tools, we aim to simplify complex
problems and present them to others in a comprehensible form.
This book is intended for students of Economics and the closely
related Accountancy and Business disciplines. It provides examples
and problems relevant to those subjects, using real data where possible.
It is an elementary-level book and requires no prior knowledge
of statistics or advanced mathematics. The book covers all the relevant
concepts, so that the reader gains an understanding of why a particular
statistical test should be used. These concepts are introduced naturally in the
course of the text, rather than having sections to themselves. The book
can form the basis of a one- or two-term course, depending upon the
intensity of the teaching.
Some tasks were done using Excel, in order to show the benefit that
Excel and other computer programs have for solving statistical
problems. We have included Excel output in the form of screenshots so
that readers become familiar with the program and are equipped to use it
on their own. Numerical differences in results are possible as
a consequence of differences in the precision of computing resources
and rounding.
Since this is the first edition of this book, the authors will be grateful for
any suggestions that would improve the quality of this book.
CONTENTS
1. DATA COLLECTION AND PRESENTATION
1.1. WHAT IS STATISTICS
1.2. DATA/INFORMATION/STATISTICS
1.3. SCALES OF MEASUREMENT
1.4. DISCRETE AND CONTINUOUS VARIABLES
1.5. DATA COLLECTION
1.5.1. Population and sample
1.5.2. Census
1.5.3. Sampling
1.6. TYPES OF SAMPLE
1.6.1. Simple random sample
1.6.2. Stratified sample
1.6.3. Cluster sampling
1.6.4. Quota sampling
1.6.5. Systematic sampling
1.6.6. Calculating a Sample Size
1.7. FREQUENCY DISTRIBUTION
1.7.1. Constructing frequency distribution table
1.7.2. Constructing cumulative frequency distribution tables
1.7.3. Class intervals
1.7.4. Outliers
1.8. DATA PRESENTATION: TABLES, DIAGRAMS AND GRAPHS
CHAPTER 1
DATA COLLECTION AND PRESENTATION
“The best thing about being a statistician is that you get to play in
everyone else’s backyard.” John Tukey, Princeton University¹
Data used in these examples are numerical. The human brain has limited
capacity to deal with large amounts of incoming information, and when faced with
large groups of numbers, most people cannot hold them all in
mind at once. It is difficult to draw any conclusions by simply looking
at the raw data; therefore, it is useful to create some kind of overall
picture or summary of what is going on. The main purpose of statistics
is to accurately summarize the data into fewer, easily interpretable
numbers.² The statistician’s role involves the extraction and synthesis of
the important features of a large body of numerical data. Statisticians try to make
sense of numerical data by summarizing them, which gives an
easily understandable picture while little of importance is lost.
¹ http://math.hunter.cuny.edu/, accessed 25.04.2010.
² http://www.marketresearchworld.net/index.php?option=com_content&task=view&id=21&Itemid=41, accessed 27.01.2010.
Here are some of the many real-world examples that require the use of
statistics:
Your company has created a new drug that may cure some disease.
How would you conduct a test to confirm the drug's effectiveness?
The latest sales data have just come in, and your manager wants you
to prepare a report for management about areas where the company
could improve its business. What should you look for? What should
you not look for?
A widget maker in your factory that normally breaks 4 widgets for
every 100 it produces has recently started breaking 5 widgets for
every 100. When is it time to buy a new widget maker? (And just
what is a widget, anyway?)
With the appropriate tools and solid grounding in statistics, one can use
a limited sample to make intelligent and accurate statements about the
population. In today's information-overloaded age, statistics is one of
the most useful subjects anyone can learn. Newspapers are filled with
statistical data, and anyone who is ignorant of statistics is at risk of being
seriously misled about important real-life decisions such as what to eat,
who is leading the polls, how dangerous smoking is, etc. Knowing a
little about statistics will help one to make more accurate decisions
about these and other important questions. Furthermore, statistics are
often used by politicians, advertisers and others to
twist the truth for their own gain. For example, a company selling the
cat food brand “Cat-sweet” may claim in their advertisements that eight
out of ten cat owners said that their cats really preferred brand “Cat-
sweet” food to "the other leading brand" of cat food. What they may
not mention is that the cat owners questioned were those they found in
a supermarket buying “Cat-sweet”.
Statistics is the most powerful tool available for assessing the significance
of experimental data and for drawing the right conclusions from the
vast amount of data faced by engineers, scientists, sociologists and
other professionals. There is no social, health-care, environmental or
political study that does not rely on statistical methodologies. Since
variation is ubiquitous, probability and statistics, the fields that
allow us to study, understand, model, embrace and interpret variation,
are ubiquitous as well.
1.2. DATA/INFORMATION/STATISTICS
Data: 28 kg, 30 kg, etc.
Information: 15 individuals in the 25-to-30-kg range
Statistics: Median weight = 28 kg
Data can take various forms, but are often numerical. As such, data can
relate to an enormous variety of aspects. Some examples are:
Other forms of data exist, such as radio signals, digitized images and
laser patterns on compact discs.
³ http://www.statcan.gc.ca/edu/power-pouvoir/ch1/definitions/5214853-eng.htm, accessed 25.05.2010.
Information, like data, can take various forms. Some examples of the
different types of information that can be derived from data include:
Using the previous examples, some of the statistics that can be obtained
include:
Nominal scale
We can only compare whether variables are equal or unequal. There are
no "less than" or "greater than" relations among them, nor operations
such as addition or subtraction.
Ordinal scale
An example is “product satisfaction”, where a customer can be very
satisfied, satisfied, neutral, unsatisfied or very unsatisfied. A physical
example is the Mohs scale of mineral hardness. Another example is the
results of a horse race; which horses arrived first, second, third, etc. are
reported, but the time intervals between the horses are not reported.
Most measurement in psychology and other social sciences is at
the ordinal level; for example, attitudes and IQ are only measured at
the ordinal level. If customers surveyed report preferring chocolate to
vanilla-flavored ice cream, the data are of this kind.
Ratio scale
All mathematical operations are possible with this type of data and lead
to meaningful results. There are numerous methods for analyzing this
type of data.
Interval scale
1.5.2. Census
There are various reasons why a census may or may not be chosen as
the method of data collection:
Census data
Advantages (+)
Sampling variance is zero: There is no sampling variability attributed to the
statistic because it is calculated using data from the entire population.
⁴ http://www.statcan.gc.ca/edu/power-pouvoir/ch2/types/5214777-eng.htm, accessed 20.05.2010.
Disadvantages (–)
Cost: In terms of money, conducting a census for a large population can be very
expensive.
Time: A census generally takes longer to conduct than a sample survey.
Response burden: Information needs to be received from every member of the
target population.
1.5.3. Sampling
⁵ A natural suggestion is that these chances should be equal.
Phoning the fifth person on every page of the local phonebook and
asking them how long they have lived in the area.
Selecting sub-populations in proportion to their incidence in the
overall population. For instance, a researcher may have reason to
select a sample consisting of 30% females and 70% males from a
population that has the same gender structure.
Selecting several cities in a country, several neighborhoods in
those cities and several streets in those neighborhoods to recruit
participants for a survey.
In a sample survey, data are gathered for only part of the total population.
If you collected data about the height of 10 students in a class of 30, that
would be a sample survey of the class rather than a census. Reasons one
may or may not choose to use a sample survey include:
Sample survey
Advantages (+)
Cost: A sample survey costs less than a census because data are collected from
only part of a group.
Time: Results are obtained far more quickly for a sample survey, than for a census.
Fewer units are contacted and less data needs to be processed.
Response burden: Fewer people have to respond in the sample.
⁶ http://www.statcan.gc.ca/edu/power-pouvoir/ch2/types/5214777-eng.htm, accessed 20.05.2010.
Control: The smaller scale of this operation allows for better monitoring and
quality control.
Disadvantages (–)
Sampling variance is non-zero: The data may not be as precise because the data
came from a sample of a population, instead of the total population.
Detail: The sample may not be large enough to produce information about small
population sub-groups or small geographical areas.
⁷ A measurement is unbiased when the average of a large set of such measurements
is close to the true value of the population parameter.
The chance that any particular member of the frame is selected on the
first draw is 1/N. Then the chance that any particular member of the frame
not previously selected will be selected on the second draw is 1/(N − 1),
etc. This process continues until the desired sample of size n is obtained.
Sampling without replacement deliberately avoids choosing any member
of the population more than once. An unbiased random selection of
individuals is important so that in the long run, the sample represents the
population. However, this does not guarantee that a particular sample
is a perfect representation of the population. Simple random sampling
merely allows one to draw externally valid conclusions about the entire
population based on the sample.
final sample. For the sake of the example, let's say you want to select
100 clients to survey and that there were 1000 clients over the past 12
months. Then, the sampling fraction is f = n/N = 100/1000 = 0.10 or
10%. Now, to actually draw the sample, you have several options. You
could print off the list of 1000 clients, tear them into separate strips, put
the strips in a hat, mix them up real good, close your eyes and pull out
the first 100. But this mechanical procedure would be tedious and the
quality of the sample would depend on how thoroughly you mixed them
up and how randomly you reached in. Perhaps a better procedure would
be to use the kind of ball machine that is popular with many of the state
lotteries. You would need three sets of balls numbered 0 to 9, one set
for each of the digits from 000 to 999 (if we select 000 we'll call that
1000). Number the list of names from 1 to 1000 and then use the ball
machine to select the three digits that select each person. The obvious
disadvantage here is that you need to get the ball machines.⁸
⁸ http://www.socialresearchmethods.net/kb/sampprob.php, accessed 26.05.2010.
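The hat and the ball machine can both be replaced by a pseudo-random number generator. The following is a minimal sketch, assuming the 1000 clients are simply numbered 1 to 1000; the function name `simple_random_sample` and the fixed seed are illustrative choices, not part of the original procedure.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of the sampling frame, without replacement."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(frame, n)


# Assumption: the clients are identified by the numbers 1..1000.
clients = list(range(1, 1001))
sample = simple_random_sample(clients, 100, seed=42)

sampling_fraction = len(sample) / len(clients)  # f = n/N
print(sampling_fraction)  # 0.1
```

Because `random.Random.sample` draws without replacement, no client can be selected twice, which matches the sampling scheme described above.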
Example 1.1.
The first step is to find the total number of staff (180) and calculate the
percentage in each group:
⁹ http://www.socialresearchmethods.net/kb/sampprob.php, accessed 25.01.2010.
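Proportional allocation in a stratified sample can be sketched as follows. Only the total of 180 staff comes from the example; the four groups, their sizes and the sample size of 40 are hypothetical values chosen for illustration.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate a sample across strata in proportion to stratum size.

    Rounded allocations need not add up to sample_size in general;
    the hypothetical figures below happen to divide evenly.
    """
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }


# Hypothetical strata (not from the text); they sum to 180 staff.
staff = {
    "male, full-time": 90,
    "male, part-time": 18,
    "female, full-time": 9,
    "female, part-time": 63,
}

percentages = {name: 100 * size / sum(staff.values()) for name, size in staff.items()}
allocation = proportional_allocation(staff, sample_size=40)

print(percentages)  # share of each group in the population
print(allocation)   # number of staff to draw from each group
```

Within each stratum, the allocated number of people would then be selected by simple random sampling.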
¹⁰ http://www.angelfire.com/empire/richardt/, accessed 26.01.2010.
Often the total sample size must be fairly large to enable cluster
sampling to be used effectively. Cluster sampling is usually more cost-effective
than simple random sampling, particularly if the population is spread over
a wide territory.
As with stratified sampling, the researcher first identifies the strata
and their proportions as they are represented in the population. Then
convenience or judgment sampling is used to select the required number
of subjects from each stratum. This differs from stratified sampling,
where the strata are filled by random sampling.
For instance, if you know the population has 40% women and 60% men,
and that you want a total sample size of 100, you will continue sampling
until you get those percentages and then you will stop. So, if you've
already got the 40 women for your sample, but not the 60 men, you
will continue to sample men but even if legitimate women respondents
come along, you will not sample them because you have already "met
your quota." The problem here (as in much purposive sampling) is that
you have to decide the specific characteristics on which you will base
the quota. Will it be by gender, age, education, race, religion, etc.?
Here, you're not concerned with having numbers that match the
proportions in the population. Instead, you simply want to have enough
to ensure that you will be able to talk even about small groups in the
population. This method is the nonprobabilistic analogue of stratified
random sampling in that it is typically used to assure that smaller groups
are adequately represented in your sample.
Example 1.2.
For example, suppose you want to sample 8 houses from a street of 120
houses. 120/8=15, so every 15th house is chosen after a random starting
point between 1 and 15. If the random starting point is 11, then the houses
selected are 11, 26, 41, 56, 71, 86, 101, and 116.
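The selection in Example 1.2 can be reproduced with a short sketch. The function name `systematic_sample` is an illustrative choice; in practice the starting point would be drawn at random, for example with `random.randint(1, step)`.

```python
def systematic_sample(population_size, sample_size, start):
    """Select every k-th unit, where k = population_size // sample_size."""
    step = population_size // sample_size
    if not 1 <= start <= step:
        raise ValueError("start must lie between 1 and the sampling interval")
    return [start + i * step for i in range(sample_size)]


# 8 houses from a street of 120 houses, random starting point 11.
houses = systematic_sample(population_size=120, sample_size=8, start=11)
print(houses)  # [11, 26, 41, 56, 71, 86, 101, 116]
```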
The three most important factors that determine sample size are:
How accurate you wish to be?
How confident you are in the results?
What budget you have available?
The first result we obtain from research is a series of raw data.
Example 1.3.
From the table we can conclude that 30 people “agree somewhat” with this
statement, etc.
This simple tabulation has two drawbacks. When a variable can take
continuous values instead of discrete values, or when the number of
possible values is too large, the table construction is cumbersome, if not
impossible. A slightly different tabulation scheme based on ranges
of values (classes or intervals) is used in such cases.
Example 1.4.
Here is an example of using Excel to create a frequency
distribution. According to the HBS 2004 database¹¹ we have information
about several variables for 7,413 households:
Entity
Canton
Gender
Marital status
Education level
Employment status
¹¹ Database Household Budget Survey 2004, B&H Agency for Statistics
First, we enter the modalities of the given variable in an empty column
of the Excel sheet. We take the variable “marital status” with the modalities
unmarried, married, informal marriage, divorced and widower/widow:
Example 1.5.
From that table we can see that 25 students have a height in the range 4.5–5.0,
etc.
Example 1.6.
1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0
Your frequency distribution table for this exercise should look like this:
we can use the Excel function FREQUENCY. First we have to list all modalities
(0, 1, 2, 3, and 4) in a new column:
Then we have to select free cells beside that column and choose Excel
function – Statistical – Frequency:
Data array – row, column or array with the original data,
Bins array – new column with the modalities,
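Outside Excel, the same frequency table can be obtained by counting how often each modality occurs. The sketch below uses the data of Example 1.6; the variable names are illustrative, and `collections.Counter` stands in for the Excel function rather than reproducing its interface.

```python
from collections import Counter

# Data of Example 1.6 (the "data array").
data = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

# The modalities play the role of the "bins array".
modalities = [0, 1, 2, 3, 4]

counts = Counter(data)
frequencies = [counts[m] for m in modalities]

for modality, frequency in zip(modalities, frequencies):
    print(modality, frequency)

print("Total", sum(frequencies))  # equals the number of observations, 20
```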
The last value will always be equal to the total for all
observations (N), since all frequencies will already have been
added to the previous total.
It is calculated by:
1. dividing the cumulative absolute frequency by the total number of
observations, then multiplying it by 100, or by
2. adding each percentage frequency from a frequency distribution
table to the sum of its predecessors:
The last value for increasing relative cumulative frequency will always
be equal to 100%.
Example 1.7.
The participants (10 in total) of a summer fair had to fill out a form with
personal information (sex, age, occupation, …). We were interested in
the age structure and hence sorted the ages of the participants:
Use the following steps to present these data and create cumulative
frequency distribution table:
Divide the results into intervals, and then count the number of results
in each interval. In this case, intervals of 10 are appropriate. Since 36
is the lowest age and 92 is the highest age, start the intervals at 35 to
44 and end the intervals with 85 to 94.
Create a table similar to the frequency distribution table but with
three new columns for cumulative frequencies.
In the first column or the Lower value column, put the lower value
of the result intervals. For example, in the first row, we would put the
number 35.
The next column is the Upper value column. Place the upper value
of the result intervals. For example, we would put the number 44 in
the first row.
The third column is the Frequency column. Record the number of
times a result appears between the lower and upper values for given
interval. In the first row, we would place the number 1.
The fourth column is the Interval or Class column. For the first interval,
the upper-bounded limits would be 35 – 45.
The fifth column is the Cumulative frequency column. Here we add
the cumulative frequency of the previous row to the frequency of the
current row. Since this is the first row, the cumulative frequency is
the same as the absolute frequency. However, in the second row, the
frequency for the 35–45 interval (i.e., 1) is added to the frequency
for the 45–55 interval (i.e., 2). Thus, the cumulative frequency is 3
(1+2=3), meaning we have 3 participants in the 35 to 54 age group.
The next column is the Percentage column. In this column, the
percentage frequency of each class is listed. To obtain it, divide the
frequency by the total number of data and multiply by 100. In this
case, the frequency of the first row is 1 and the total number of data
is 10. The percentage would then be 10.0 ((1/10)*100=10.0).
The final column is Cumulative percentage. In this column, divide
the cumulative frequency by the total number of results and then, to
make a percentage, multiply by 100. Note that the last number in this
column should always equal 100.0. In this example, the cumulative
frequency of the first row is 1 and the total number of data is 10, therefore the
cumulative percentage of the first row is 10.0 ((1/10)*100=10.0).
In the second row, the percentage for the 35–45 interval
(i.e., 10) is added to the percentage for the 45–55 interval (i.e.,
(2/10)*100=20). Thus, the cumulative percentage is 30 (10+20=30),
meaning that 30% of the participants are in the 35 to 54 age group.
Cumulative frequency distribution table.
Lower value   Upper value   Class     Frequency (fi)   Cumulative absolute frequency   Percentage   Cumulative percentage
35            44            35 - 45   1                1                               10.0         10.0
45            54            45 - 55   2                3                               20.0         30.0
55            64            55 - 65   2                5                               20.0         50.0
65            74            65 - 75   2                7                               20.0         70.0
75            84            75 - 85   2                9                               20.0         90.0
85            94            85 - 95   1                10                              10.0         100.0
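The cumulative columns of this table follow mechanically from the class frequencies. A minimal sketch, taking the six frequencies from the table as given (the individual ages are not needed); the variable names are illustrative.

```python
from itertools import accumulate

classes = ["35 - 45", "45 - 55", "55 - 65", "65 - 75", "75 - 85", "85 - 95"]
frequencies = [1, 2, 2, 2, 2, 1]  # taken from the table above

total = sum(frequencies)  # N = 10

# Running total of the absolute frequencies.
cumulative = list(accumulate(frequencies))

# Percentage and cumulative percentage of each class.
percentages = [100 * f / total for f in frequencies]
cumulative_percentages = [100 * c / total for c in cumulative]

for row in zip(classes, frequencies, cumulative, percentages, cumulative_percentages):
    print(*row)
```

The last cumulative frequency equals the number of observations and the last cumulative percentage equals 100, as stated in the text.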
For example, we can conclude that 50% of the participants are younger
than 65 (that is, 64 years of age or less), etc.
To illustrate, suppose we set out age ranges for a study of young people,
while allowing for the possibility that some older people may also fall
into the scope of our study. The absolute frequency of a class interval
is the number of observations that occur in a particular predefined
interval. So, for example, if 20 people aged 5 to 9 (9 is included) appear
in our study's data, the frequency for the [5–9] or [5–10[ interval is 20.
The endpoints of a class interval are the lowest and highest values that
a variable can take (L1i and L2i). So, the closed intervals in our study are
0 to 4 years, 5 to 9 years, 10 to 14 years, 15 to 19 years, 20 to 24 years,
and 25 years and over. The endpoints of the first interval are 0 and 4
if the variable is discrete, and 0 and 4.999 if the variable is continuous.
The endpoints of the other class intervals would be determined in the
same way.
In deciding on the width of the class intervals, you will have to find a
compromise between having intervals short enough so that not all of
the observations fall in the same interval, but long enough so that you
do not end up with only one observation per interval. It is also very
important to make sure that the class intervals are mutually exclusive.
Example 1.8.
Thirty AA batteries were tested to determine how long they would last.
The results, to the nearest minute, were recorded as follows:¹²
423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363,
391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410, 419, 386, 390
The lowest value is 363 and the highest is 431. Using the given data
and a width of class interval of 10, the interval for the first class is
[360 to 370[, where 363 (the lowest value) is included. Remember, there
should always be enough class intervals so that the highest value has
been included.
¹² http://www.statcan.gc.ca/edu/power-pouvoir/ch8/5214814-eng.htm#Top, accessed 26.01.2010.
First we have to find the minimum and maximum values in the data set in order
to decide on the number and width of the classes. We will use the Excel functions
MIN and MAX:
The lowest value in the data set is 363 and the highest value is 431. We will take
classes of width 10, and accordingly we will create 8 classes. Then we
will create new columns, one with the lower and one with the upper endpoints
of the closed classes:
Then we have to select free cells beside that column and choose Excel
function – Statistical – Frequency:
Data array – row or column or array with original data,
Bins array – new column with upper endpoint of classes,
Calculating relative frequency and percentage frequency.
xi          fi    pi     Pi
360 – 370   2     0.07   7
370 – 380   3     0.10   10
380 – 390   5     0.17   17
390 – 400   7     0.23   23
400 – 410   5     0.17   17
410 – 420   4     0.13   13
420 – 430   3     0.10   10
430 – 440   1     0.03   3
Total       30    1.00   100
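The class frequencies in this table can be checked directly against the raw data of Example 1.8. The sketch below assumes intervals that include the lower endpoint and exclude the upper one, [lower, upper[, as in the text; the variable names are illustrative.

```python
# Battery life in minutes, Example 1.8.
battery_minutes = [
    423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363,
    391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410, 419, 386, 390,
]

width = 10
lower_bounds = list(range(360, 440, width))  # 360, 370, ..., 430

# Absolute frequency: observations with lower <= x < lower + width.
frequencies = [
    sum(lower <= x < lower + width for x in battery_minutes)
    for lower in lower_bounds
]

n = len(battery_minutes)
relative = [round(f / n, 2) for f in frequencies]   # pi
percentage = [round(100 * f / n) for f in frequencies]  # Pi

for lower, f, p, pct in zip(lower_bounds, frequencies, relative, percentage):
    print(f"{lower} – {lower + width}", f, p, pct)
```

Note that a value such as 390 or 400 falls into the class that starts at that value, which is what makes the classes mutually exclusive.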
Calculating the center of the interval, cumulative absolute frequencies and cumulative relative frequencies.
xi          fi    ci    CAFi   CRF%i
360 – 370   2     365   2      6.67
370 – 380   3     375   5      16.67
380 – 390   5     385   10     33.33
390 – 400   7     395   17     56.67
400 – 410   5     405   22     73.33
410 – 420   4     415   26     86.67
420 – 430   3     425   29     96.67
430 – 440   1     435   30     100.00
Total       30
1.7.4. Outliers
There may be more than one outlier in a set of data. Sometimes, outliers
are significant pieces of information and should not be ignored. Other
times, they occur because of an error or misinformation and should be
ignored.
Example 1.9.
61.7, 58.4, 59.2, 61.5, 61.4, 59.8, 59.0, 61.1, 61.6, 56.3,
61.9, 65.7, 58.9, 59.0, 61.2, 61.4, 58.4, 60.0, 59.3, 61.9
In this case, the stems will be the whole number values and the leaves
will be the decimal values. The data range from 56.3 to 65.7, so the
stems should start at 56 and finish at 65. The following table is a stem
and leaf plot for lengths of 20 products:
Constructing stem and leaf plot.
Lengths of 20 products
Stem   Leaf
56     3
57
58     449
59     00238
60     0
61     124456799
62
63
64
65     7
In this case, 56.3 and 65.7 could be considered as outliers, since these
two values are quite different from the other values.
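The stem and leaf plot above can be generated from the raw data. This is a minimal sketch that assumes all values have exactly one decimal place; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Lengths of 20 products, Example 1.9.
lengths = [
    61.7, 58.4, 59.2, 61.5, 61.4, 59.8, 59.0, 61.1, 61.6, 56.3,
    61.9, 65.7, 58.9, 59.0, 61.2, 61.4, 58.4, 60.0, 59.3, 61.9,
]


def stem_and_leaf(values):
    """Return {stem: leaves} with whole numbers as stems and tenths as leaves."""
    leaves = defaultdict(list)
    for value in values:
        tenths = round(value * 10)        # 61.7 -> 617, avoids float artefacts
        stem, leaf = divmod(tenths, 10)   # 617 -> (61, 7)
        leaves[stem].append(leaf)
    # Keep empty stems so that gaps in the data remain visible.
    return {
        stem: "".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        for stem in range(min(leaves), max(leaves) + 1)
    }


plot = stem_and_leaf(lengths)
for stem, leaf_string in plot.items():
    print(stem, leaf_string)
```

The empty stems 57 and 62 to 64 are what make 56.3 and 65.7 stand out as possible outliers.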
A grouped series:
Split columns – bar charts (discrete series, no intervals)
Structured column
Structured circle - pie
histogram – adjoining columns (discrete series, intervals)
polygon of absolute frequencies
polygon of cumulative frequencies
line diagram (discrete non-interval grouped series)
You can also experiment with different types of graphs and select the
most appropriate one. Here are several suggestions for an appropriate
selection, according to the effect that you want to achieve with the graph:
pie chart (description of components)
horizontal bar graph (comparison of items and relationships, time
series)
vertical bar graph (comparison of items and relationships, time series,
frequency distribution)
line graph (time series and frequency distribution)
scatter plot (analysis of relationships)
histogram (continuous variable).
If you decide that a graph is the best way to present your information,
then no matter what type of graph you use, you need to keep in mind
the following rules:
convey an important message
decide on a clear purpose
draw attention to the message, not the source
experiment with various options and graph styles
Horizontal bar graph: Compares important data. Useful when category names are
too long to fit at the foot of a column.
CHAPTER 2
DESCRIPTIVE STATISTICS
2.1. INTRODUCTION
• The mode is the most frequent value in the list (or one
of the most frequent values, if there is more than one). Equivalently,
it is the value that differs from the fewest possible members of the list.
While probably not intuitively obvious, the mean has a very desirable
property: it is the “best guess” for a score in the distribution, when
we measure “best” as least in error¹³. This might seem especially odd
because, in some cases, no one would report 5.4 best friends, so if you
guessed 5.4 for someone, you would always be wrong! But if you measure
how far off your guess would tend to be from the actual score that you
are trying to guess, 5.4 would produce the smallest error in your guess.
It is worth elaborating on this point because it is important. Suppose I
put the data into a hat, pull the scores out of the hat one by one
and each time ask you to guess the score I pulled out of the hat. After
each guess, I record how far off your guess was, using the formula:
error = actual score - guess. Repeating this procedure for all 5 scores,
we can compute your mean error. Now, if you always guessed 5.4, your
mean error would be, guess what? Mean error would be 0! Any other
constant guess would produce a mean error different from
zero. Because of this, the mean is often used to characterize the “typical”
value in a distribution. No other single number we could report would
more accurately describe every data point in the distribution.
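The hat experiment can be checked numerically. The five scores below are hypothetical, chosen only so that their mean is 5.4 as in the text; exact fractions are used so that the zero is not blurred by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical scores (not from the text); their mean is 27/5 = 5.4.
scores = [3, 4, 5, 7, 8]


def mean_error(scores, guess):
    """Average of (actual score - guess) over all scores."""
    return sum(Fraction(s) - guess for s in scores) / len(scores)


mean = Fraction(sum(scores), len(scores))

print(float(mean))                      # 5.4
print(mean_error(scores, mean))         # 0
print(mean_error(scores, Fraction(5)))  # 2/5, any other constant guess is off on average
```

The mean error for a constant guess g equals the mean minus g, so it is zero only when g is the mean itself.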
Proof:
The arithmetic mean is placed between the lowest and highest value
of the series.
¹³ http://www.une.edu.au/WebStat/unit_materials/c4_descriptive_statistics/central_tendency_measure.html, accessed 14.11.2009.
Proof:
Proof:
Proof:
If we add the same constant to each observation, the arithmetic
mean of the new variable is equal to the sum of the constant and the
arithmetic mean of the original variable.
Proof:
¹⁴ http://www.statistics.com/resources/glossary/h/harmmean.php, accessed 25.01.2010.
This means that turnover and invested funds are inversely related, so we will
calculate the harmonic mean.
Instead of adding the set of numbers and then dividing the sum by the
count of numbers in the set, for the geometric mean the numbers are
multiplied and then the Nth root of the resulting product is taken. For
non-grouped data geometric mean is equal to:
We can use the geometric mean only for data sets in which all values are
positive (x_i > 0). It is used
when phenomena behave according to a geometric progression.
It is important in the analysis of time series, for calculating the
average growth rate.
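For non-grouped data, the description above (multiply the N values, then take the Nth root) corresponds to G = (x_1 · x_2 · … · x_N)^(1/N). A minimal sketch follows; the yearly growth factors are hypothetical values used only to illustrate the average growth rate.

```python
import math


def geometric_mean(values):
    """Nth root of the product of N positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors: +10%, +20% and -5% in three successive years.
growth_factors = [1.10, 1.20, 0.95]

g = geometric_mean(growth_factors)
average_growth_rate = (g - 1) * 100  # in percent per year

print(round(g, 4), round(average_growth_rate, 2))
```

Applying the average factor `g` three times gives the same overall growth as the three individual factors, which is why the geometric rather than the arithmetic mean is used here.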
2.2.4. Median
One way to compute the median is to list all scores in numerical order,
and then locate the score in the center of the sample. In theory, the median
corresponds to a cumulative relative frequency of 0.5. For example, if there
are 500 scores in the list, the median lies between the scores in positions 250 and 251.
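The rule is as follows. If N is an odd number, then for an ordered set of
data the median is equal to the value in position (N + 1)/2. If
N is an even number, then for an ordered set of data the median is equal to
the arithmetic mean of the values in positions N/2 and N/2 + 1.¹⁵ Or by formula,
$$Me = \begin{cases} x_{(N+1)/2}, & N \text{ odd} \\ \dfrac{x_{N/2} + x_{N/2+1}}{2}, & N \text{ even} \end{cases}$$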
¹⁵ If the two middle scores had different values, we would have to interpolate to determine the
median.
For example, the quiz scores of 8 students taking the exam are given:
15, 20, 21, 20, 36, 15, 25, 15
Arranged in order, the scores are 15, 15, 15, 20, 20, 21, 25, 36.
There are 8 scores, and scores x4 and x5 represent the halfway point.
Since both of these scores are 20, the median is 20.
2.2.5. Mode
In our example (quiz scores of 8 students taking the exam, where the
following scores were obtained: 15, 20, 21, 20, 36, 15, 25, 15), the mode
is 15, the value that occurs most frequently in the series (three times).
In some distributions there is more than one modal value. For instance, in
a bimodal distribution there are two values that occur most frequently.
Notice that for the same set of 8 scores we got three different values
(20.875, 20, and 15) for the mean, median and mode respectively. If the
distribution is truly normal (i.e., bell-shaped), the mean, median and
mode are all equal to each other.
The mode is used only for descriptive purposes because the mode is
more variable from sample to sample than other measures of central
tendency.
If we want to know which modality is the most common, we will use the mode
as the measure of central tendency. The mode is used less than either the
mean or the median in business applications. Perhaps its most obvious use
is by manufacturers who produce goods, such as clothing, in various sizes.
The modal size of items sold is then the one in heaviest demand.
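A short Python sketch shows how the mode can be found by counting, including the case of more than one modal value. The helper name `modes` and the second data set are hypothetical.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


quiz_scores = [15, 20, 21, 20, 36, 15, 25, 15]
quiz_modes = modes(quiz_scores)       # a single modal value
bimodal_modes = modes([1, 1, 2, 3, 3])  # hypothetical bimodal series
```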
2.2.6. Quartiles
Quartiles divide the ordered series into four parts, and each part contains
25% of the data from the series. There are three quartiles: Q1, Q2 = Me and
Q3. The first quartile is a value for which 25% of the observations are
smaller than or equal to it, while the other 75% are larger. The third
quartile is a value for which 75% of the observations are smaller than or
equal to it and 25% are larger.
Example 2.1.
The following data represent the total daily number of produced burgers
(’000) from a selected 20 fast-food chains in one town:
34, 15, 9, 19, 31, 34, 35, 39, 19, 34, 43, 7, 9, 15, 19, 35, 15, 19, 9, 31.
Solution:
Firstly, we will make a series arranged in order from the smallest to the
largest in size:
7, 9, 9, 9, 15, 15, 15, 19, 19, 19, 19, 31, 31, 34, 34, 34, 35, 35, 39, 43.
Constructing frequency distribution.
xi    fi
7     1
9     3
15    3
19    4
31    2
34    3
35    2
39    1
43    1
n     20
b)
xi fi xi . fi CAF
7 1 7 1
9 3 27 4
15 3 45 7
19 4 76 11
31 2 62 13
34 3 102 16
35 2 70 18
39 1 39 19
43 1 43 20
n 20 471
Mean:
Calculating and interpreting arithmetic mean.
$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{471}{20} = 23.55$
The average daily number of burgers produced in the analyzed sample was
23,550 burgers.
Median:
Calculating and interpreting median.
50% of analyzed fast-food chains have daily production of burgers equal
to or less than 19,000, while 50% of analyzed fast-food chains produce
more than 19,000 burgers daily.
First quartile:
Calculating and interpreting first quartile.
25% of analyzed fast-food chains have daily production of burgers equal
to or less than 15,000, while 75% of analyzed fast-food chains produce
more than 15,000 burgers daily.
Third quartile:
Calculating and interpreting third quartile.
75% of analyzed fast-food chains have daily production of burgers equal
to or less than 34,000, while 25% of analyzed fast-food chains produce
more than 34,000 burgers daily.
Mode:
Calculating and interpreting mode.
In this sample, the most frequent fast-food chain is one with a production
of 19,000 burgers per day.
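The results of Example 2.1 can be checked with a short Python sketch. It assumes that a quartile is read from the frequency table as the first value whose cumulative absolute frequency (CAF) reaches the theoretical position; the helper name `quantile_from_caf` is hypothetical.

```python
from collections import Counter

# Daily production in thousands of burgers, as given in Example 2.1.
burgers = [34, 15, 9, 19, 31, 34, 35, 39, 19, 34,
           43, 7, 9, 15, 19, 35, 15, 19, 9, 31]

freq = sorted(Counter(burgers).items())  # pairs (x_i, f_i)
n = sum(f for _, f in freq)
mean = sum(x * f for x, f in freq) / n


def quantile_from_caf(freq, share):
    """First x_i whose cumulative absolute frequency reaches share * n."""
    position = share * sum(f for _, f in freq)
    caf = 0
    for x, f in freq:
        caf += f
        if caf >= position:
            return x


q1 = quantile_from_caf(freq, 0.25)
median = quantile_from_caf(freq, 0.50)
q3 = quantile_from_caf(freq, 0.75)
mode = max(freq, key=lambda pair: pair[1])[0]
```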
Example 2.2.
Solution:
xi fi pi ci ci . fi CAF CRF
4.5 – 5.0 25 0.28 4.75 118.75 25 0.28
5.0 – 5.5 35 0.39 5.25 183.75 60 0.67
5.5 – 6.0 20 0.22 5.75 115.00 80 0.89
6.0 – 6.5 10 0.11 6.25 62.50 90 1.00
Total 90 1.00 480
Mean:
Calculating and interpreting arithmetic mean.
$\bar{x} = \frac{\sum c_i f_i}{n} = \frac{480}{90} = 5.33$
The average height of students in the class is 5.33 feet.
Median:
Calculating and interpreting median.
50% of analyzed students have height equal to or less than 5.286 feet,
while 50% of analyzed students are taller than 5.286 feet.
First quartile:
Calculating and interpreting first quartile.
25% of analyzed students have height equal to or less than 4.95 feet,
while 75% of analyzed students are taller than 4.95 feet.
Third quartile:
Calculating and interpreting third quartile.
75% of analyzed students have height equal to or less than 5.6875 feet,
while 25% of analyzed students are taller than 5.6875 feet.
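The median and quartiles of Example 2.2 can be reproduced with linear interpolation inside the class that contains the theoretical position. The sketch below assumes this standard interpolation rule, since the book's own formula is not visible in this text; `grouped_quantile` is a hypothetical helper name.

```python
def grouped_quantile(classes, share):
    """Interpolated quantile for interval-grouped data.

    classes: list of (lower limit, upper limit, frequency), ascending.
    share:   theoretical cumulative relative frequency, e.g. 0.5.
    """
    n = sum(f for _, _, f in classes)
    position = share * n
    caf_before = 0  # cumulative frequency of all earlier classes
    for lower, upper, f in classes:
        if caf_before + f >= position:
            return lower + (position - caf_before) / f * (upper - lower)
        caf_before += f
    raise ValueError("share must lie between 0 and 1")


# Height classes (in feet) and frequencies from the table of Example 2.2.
heights = [(4.5, 5.0, 25), (5.0, 5.5, 35), (5.5, 6.0, 20), (6.0, 6.5, 10)]

q1 = grouped_quantile(heights, 0.25)
median = grouped_quantile(heights, 0.50)
q3 = grouped_quantile(heights, 0.75)
```

The same function covers deciles and centiles by passing other shares, for example 0.10 or 0.99.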
Graphical presentation of median.
Mode:
Calculating and interpreting mode.
Graphical presentation of mode.
The range is simply the highest value minus the lowest value:
RV = xmax - xmin.
In our example distribution with quiz score for students that take exam,
the highest value is 36 and the lowest is 15, so the range is 36 - 15 = 21.
tendency and the characteristic that we can check using the theorem
König-Huygens.
The first member on the right side of expression does not depend on w
and it is a variance of the variable X. Hence, previous expression has a
minimum value when:
or when
16
According to characteristics of arithmetic mean
for population:
for sample:
for population:
where
for sample:
(as was true in the example with exam scores where the single outlier
value of 36 stands apart from the rest of the values).
Again let’s analyse the set of scores: 15, 20, 21, 20, 36, 15, 25, 15.
To compute the standard deviation, we will first calculate the distance
between each value and the arithmetic mean. We previously calculated
the mean of 20.875. So, the differences from the mean are as follows:
15 - 20.875 = -5.875
20 - 20.875 = -0.875
21 - 20.875 = 0.125
20 - 20.875 = -0.875
36 - 20.875 = 15.125
15 - 20.875 = -5.875
25 - 20.875 = 4.125
15 - 20.875 = -5.875
We should notice that values that are less than the mean have
negative discrepancies and values greater than the mean have positive
discrepancies. For next step, we will square each distance:
Now, we will sum these squares to get the Sum of Squares (SS) value.
That sum is 350.875. In the next step, we will divide this sum by the
number of scores minus 1 (n-1), because we are working with a sample,
which gives the variance: 350.875 / 7 = 50.125.
To get the standard deviation, we will take the square root of the
variance, because we squared the deviations at an earlier stage. This is
the square root of 50.125, which equals 7.0799. The standard deviation has
the same measurement unit as the analyzed variable, so we can find a
logical interpretation for the standard deviation value.
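The steps above can be followed in a short Python sketch, using the quiz scores from the text; the variable names are chosen here only for illustration.

```python
import math

scores = [15, 20, 21, 20, 36, 15, 25, 15]
n = len(scores)
mean = sum(scores) / n

# Step 1: distance of each value from the arithmetic mean.
deviations = [x - mean for x in scores]

# Step 2: square the distances and add them up (Sum of Squares).
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 3: divide by n - 1 because the scores are treated as a sample.
sample_variance = sum_of_squares / (n - 1)

# Step 4: the square root brings the result back to the original unit.
sample_sd = math.sqrt(sample_variance)
```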
This computation may seem confusing, but it is actually quite simple.
To see this, consider the formula for the standard deviation:
for population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
for sample: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
In the numerator of the ratio we can see that each score has the mean
subtracted from its value, the difference is squared, and the squares
are summed. In the denominator, we take the number of scores (or the
number of scores minus 1 for a sample). The ratio is the variance and the
square root is the standard deviation.
For population:
For sample:
For population:
For sample:
If we add the same number to each observation, the variance will not
change. Or mathematically:
Proof:
If then:
Proof:
If then:
From the two previous properties, we can observe and express the
following proposition:
The first term on the right side of the given relation is the weighted
arithmetic mean of the variances of the two series, and it is called the
variance within the series. The second term is the variance of the
arithmetic means, and it is called the variance between the series. This
rule can be generalized to cases of aggregation. Variance is a dispersion
parameter whose numerical value is hard to interpret directly, but which
has the computational properties analyzed above. Therefore, we define the
standard deviation, whose numerical value can be interpreted directly, but
which does not have the computational properties that we have demonstrated
for the variance.
or for sample
2.4.4. Z value
or for sample
We will calculate quartiles in the same way as the median, with theoretical
positions of 25% and 75%.
Example 2.3.
We use data from the sample of 7 participants at one seminar who had
to fill out a form that gave their name, address and age. The following
ages of the participants were recorded:
Solution:
There are only 7 data values (each different from the others), hence we
will not construct a frequency distribution. First we will arrange the data
in an ordered series:
Ordinal number    Age - xj   xj − x̄     (xj − x̄)²    |xj − x̄|
of participant
1.                36         -23.429    548.898      23.429
2.                48         -11.429    130.612      11.429
3.                54         -5.429     29.469       5.429
4.                57         -2.429     5.898        2.429
5.                63         3.571      12.755       3.571
6.                66         6.571      43.184       6.571
7.                92         32.571     1,060.898    32.571
Total             416        0.000      1,831.714    85.429
Median
The position of the median is (7 + 1)/2 = 4, so the median is Me = 57. 50%
of selected participants are 57 years old or younger, while 50% are more
than 57 years old.
Mode
All data has different value and we did not create frequency
distribution, so we cannot calculate mode.
Calculating and interpreting standard deviation.
$s = \sqrt{\frac{1{,}831.714}{7 - 1}} = 17.47$ (footnote 17) - Average linear distance
from average age (59.4286) in sample is 17.47 years.
17
Data are given for the sample, so we use formula for standard deviation of sample (with (N-1)).
Example 2.4.
Solution:
xi   fi   xi . fi   CAF   (xi − x̄)² . fi   |xi − x̄| . fi
1    6    6         6     8.64             7.2
2    7    14        13    0.28             1.4
3    4    12        17    2.56             3.2
4    3    12        20    9.72             5.4
n    20   44              21.2             28
Mean:
Arithmetic mean
Average number of cars that were registered for households in analyzed
sample is 2.2 cars.
Median:
Median
50% of analyzed households from sample have 2 registered cars or less.
Mode:
Mode
Households with 2 registered cars are the most frequent in this sample.
Calculating and interpreting standard deviation.
The average linear distance from the average number of registered cars per
household in the analyzed sample (2.2) is 1.056 cars (footnote 18).
Calculating and interpreting z value.
Households with 1 car registered are 1.136 standard deviations below average.
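The figures of Example 2.4 can be checked from the frequency table with a short Python sketch. It assumes the sample formula (division by n - 1), as the footnote to the example states, and the usual definition of the z value as the distance from the mean in standard deviations.

```python
import math

# (number of registered cars x_i, number of households f_i) from Example 2.4.
cars = [(1, 6), (2, 7), (3, 4), (4, 3)]

n = sum(f for _, f in cars)
mean = sum(x * f for x, f in cars) / n

# Weighted sum of squared deviations from the mean.
sum_of_squares = sum((x - mean) ** 2 * f for x, f in cars)

# Sample standard deviation: divide by n - 1.
s = math.sqrt(sum_of_squares / (n - 1))

# z value for households with 1 registered car.
z_one_car = (1 - mean) / s
```

A negative z value means the observation lies below the average.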
Example 2.5.
Thirty AA batteries from the sample were tested to determine how long
they would last. The results, to the nearest minute, were recorded as
follows:
18
The 20 households are a sample, so we use the formula for the standard deviation of a sample
(with (N-1)).
Solution:
xi fi CAF ci ci . fi
Mean:
Arithmetic mean
Average battery life in analyzed sample is 398 min.
Median:
Median
50% of analyzed batteries last 397.14 minutes or less, while 50% last
longer.
Mode:
Mode
The battery which lasts 395 minutes is the most frequent in the sample.
Quartile 1:
Quartile 1
25% of analyzed batteries have life of 385 minutes or less, while 75%
last longer than 385 minutes.
Quartile 3:
Quartile 3
Standard deviation
The average linear distance from the average battery life in the analyzed
sample (398 min.) is 18.22 minutes (footnote 19).
Calculating and interpreting quartile range.
When we remove 25% of the smallest and 25% of the highest data, the
new range of variation will be 26.25 minutes.
19
The thirty AA batteries are a sample, so we use the formula for the standard deviation of a
sample (with (N-1)).
Measure of skewness (for population):
α3 = 0 ⇒ symmetry
α3 > 0 ⇒ positively skewed
α3 < 0 ⇒ negatively skewed
Measure of skewness (for sample):
α3 = 0 ⇒ symmetry
α3 > 0 ⇒ positively skewed
α3 < 0 ⇒ negatively skewed
2.6.2. Kurtosis
Measure of kurtosis (for population):
α4 = 3 ⇒ normal
α4 > 3 ⇒ leptokurtic
α4 < 3 ⇒ platykurtic
Measure of kurtosis (for sample):
α4 = 3 ⇒ normal
α4 > 3 ⇒ leptokurtic
α4 < 3 ⇒ platykurtic
Example 2.6.
In the last 2 years in the company ICC 50 injuries have happened and
the number of hours lost due to injury was:
Solution:
xi fi xi . fi
a)
The average number of hours lost due to injury for analysed population
is 2.7.
20
b) - average linear deviation
20
This is population for two years, so we use formula for standard deviation for population
(with N).
c)
Calculating and interpreting measure of skewness.
Calculating and interpreting measure of kurtosis.
Graphical presentation of measures of skewness and kurtosis.
Example 2.7.
Solution:
xi fi ci ci . fi
- Average linear
Measure of skewness
Measure of kurtosis
The higher the Gini coefficient, the more unequal the distribution is.
Example 2.8.
Solution:
xi fi ci pi CRFi ci . fi Qi
Calculating and interpreting Gini coefficient – trapezoid method.
Calculating and interpreting Gini coefficient – triangle method.
Graphical presentation of Lorenz curve.
The simplest and fastest way to get several parameters that describe a
given series (xmin, xmax, average, deviation, mode, median, kurtosis and
skewness) is to use the Excel function: Tools – Data Analysis.
If that option is not available, we have to enable it:
1. Tools – Add-ins:
21
http://www.doingbusiness.org/CustomQuery/, data for 2008 year, access: 15. 04. 2009.
We will get a list of the analyses that we can run. Currently we are
interested in the option Descriptive Statistics, so we choose it and click
OK. In Input Range we can select all columns with several variables at the
same time and group them by columns ($B$1:$D$182). If the selection
includes the first cell with the variable name, we choose the option Labels
in first row. Then we choose an empty cell or a new sheet where we want to
save the results of the analysis and select which statistics we want to
determine:
Summary statistics - xmin, xmax, average, deviation, mode, median,
kurtosis and skewness, range, count...
Confidence Level for Mean – the margin (half-width) of the confidence
interval for the average at the given confidence level (for example 95%).
If we want to calculate quantiles, we choose the Kth Largest and Kth
Smallest options. For example, for the first and the 99th percentiles in
both cases we take 1, for the first and the third quartile in both cases
we take 25, for the first and ninth deciles in both cases we take 10…
For Input Range we select the column with the original data (C2:C182) and
for Bin Range we select the cells where we typed the upper limits of the
intervals (I22:I47). We choose where to save the result and tick the option
Chart Output:
The graph that we get has vertical bars separated by gaps, so we click on
the graph and choose Chart options – Options. There we set the gap between
bars to 0:
The conclusions about the shape of the distribution drawn from the
histogram are the same as those inferred from the previously calculated
parameters. The distribution is strongly positively (right) skewed and
peaked. It differs significantly from the normal curve.
Since we decided to set up intervals 5,000 wide, the upper limits included
in the intervals (bins) are: 4,999.99, 9,999.99, 14,999.99, …, 54,999.99.
We will type these limits in an empty column in the sheet where the
original data are:
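The counting that the Histogram tool performs on such bins can be sketched in Python. The sketch assumes that a value is counted in the first bin whose upper limit is greater than or equal to it, with one extra slot for values above the last limit; the data values and the helper name `bin_counts` are hypothetical.

```python
from bisect import bisect_left


def bin_counts(values, upper_limits):
    """Count values per bin; the last slot holds values above every limit."""
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        # Index of the first upper limit that is >= v.
        counts[bisect_left(upper_limits, v)] += 1
    return counts


# Upper limits 4,999.99, 9,999.99, ..., 54,999.99 (intervals 5,000 wide).
upper_limits = [4999.99 + 5000 * k for k in range(11)]

# Hypothetical observations, used only to show the counting rule.
values = [1200, 4999.99, 5000, 12000, 54000, 60000]
counts = bin_counts(values, upper_limits)
```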
For the line of perfect equality we will take the same data, the cumulative
relative frequencies, for both axes.
Now with Add we will insert a new series for the line of perfect equality:
Solution:
3, 4, 6, 7, 10, 18, 25
a)
50% of data have value 7 or less, while 50% of the data have value more
than 7.
Solution:
2, 2, 3, 3, 3, 4, 7, 8
a)
50% of data have value 3 or less, while 50% of the data have value more
than 3.
c)
Solution:
Geometric mean
Solution:
Note: Performance and the average time required to execute the observed
action are inversely related; hence we will calculate the harmonic mean.
xi (minutes) fi
20 10
25 17
30 7
35 6
Σ 40
Harmonic mean
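A minimal Python sketch of the weighted harmonic mean for the table above follows. It assumes the usual formula for grouped data, the sum of frequencies divided by the sum of f_i / x_i, since the book's formula is not visible in this text.

```python
# (time in minutes x_i, frequency f_i) from the table above.
times = [(20, 10), (25, 17), (30, 7), (35, 6)]

n = sum(f for _, f in times)

# Weighted harmonic mean: total frequency over the sum of f_i / x_i.
harmonic_mean = n / sum(f / x for x, f in times)

# Arithmetic mean of the same data, for comparison only.
arithmetic_mean = sum(x * f for x, f in times) / n
```

For positive data the harmonic mean never exceeds the arithmetic mean, which the comparison illustrates.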
80, 80, 80, 80, 90, 90, 90, 90, 90, 90, 100, 100, 100, 110, 110.
Solution:
Since we have a series with few data, a non interval grouped frequency
distribution will be formed:
b)
c)
The most frequent daily earning for the 15 observed employees is 90 KM.
To find the median, we firstly use the formula for the location (position).
The position is . Afterward, we look for the least value
of cumulative absolute frequency that is greater or equal to calculated
position. The corresponding modality represents median:
d)
2 3 4 2 4
3 3 1 3 4
4 5 5 1 2
2 1 4 0 3
3 3 2 2 1
Solution:
a)
Quiz scores   Number of students   CAF   xi . fi
0             1                    1     0
1             4                    5     4
2             6                    11    12
3             7                    18    21
4             5                    23    20
5             2                    25    10
Σ             25                         67
b)
c)
The average quiz score for the 25 observed students is 2.68 points.
d)
The most frequent quiz score for the 25 observed students is 3 points.
e)
2.7. The following values are the number of cars that households of one
rich part of city possess:
Solution:
b)
c)
d)
Determining and interpreting first decile.
In this case, there is no great difference between the actual (12%) and
theoretical (10%) cumulative frequency, so in our interpretation we use
the theoretical cumulative frequency. Therefore, 10% of households
have 1 car or less, while 90% of the households have more than 1 car.
Determining and interpreting ninth decile.
In this case, there is no great difference between the actual (92%) and
theoretical (90%) cumulative frequency, so in our interpretation we use
the theoretical cumulative frequency. Therefore, 90% of households
have 4 cars or less, while 10% of the households have more than 4 cars.
2.8. The numbers of new orders received by a company over the past 20
working days were recorded as follows:
Solution:
a)
Number of     Number of       pi       pi . 360°   xi . fi   CAF
new orders    working days
0             2               0.1000   36          0         2
1             2               0.1000   36          2         4
2             4               0.2000   72          8         8
3             6               0.3000   108         18        14
4             4               0.2000   72          16        18
5             2               0.1000   36          10        20
Σ             20              1        360         54
b)
c)
d)
e)
40% of time company received 2 new orders or less, while 60% of time
company received more than 2 new orders.
90% of time company received 4 new orders or less, while 10% of times
company received more than 4 new orders.
Solution:
a)
Speeds (in kph)   Number of cars   ci    ci . fi
[100 – 110[       2                105   210
[110 – 120[       3                115   345
[120 – 130[       4                125   500
[130 – 140[       8                135   1080
[140 – 150]       3                145   435
Σ                 20                     2570
b)
c)
d)
e)
2.10. The following frequency distribution shows the distance (in km)
that 50 workers need travel to work:
Solution:
Distance (in km)   Number of workers   ci     CAF   ci . fi
[0 – 5[            7                   2.5    7     17.5
[5 – 10[           20                  7.5    27    150
[10 – 15[          16                  12.5   43    200
[15 – 20]          7                   17.5   50    122.5
Σ                  50                               490
b)
The average distance that 50 observed workers need to travel is 9.8 km.
e)
Graphical presentation of quartiles.
f)
Graphical presentation of box plot.
2.11. A supervisor of a bank kept records of the time (in minutes) that
employees needed to complete a particular task. The data are given
in the next table:
11 29 16 24 15 23 10 21 18 20
15 22 13 24 16 28 21 14 26 27
25 20 19 23 17 23 18 22 19 29
Solution:
a)
Time (in min)   Frequency   ci     pi       pi . 360°   ci . fi   CAF
[10 – 15[       4           12.5   0.1333   48          50        4
[15 – 20[       9           17.5   0.3000   108         157.5     13
[20 – 25[       11          22.5   0.3667   132         247.5     24
[25 – 30]       6           27.5   0.2000   72          165       30
Σ               30                 1        360         620
b)
c)
d)
2.12. The table below shows the distribution of scores on driving test
undertaken by 90 candidates:
a) Draw a histogram.
b) Calculate the average score on the driving test.
c) Calculate and explain quartile.
d) Calculate and explain C1 and C99 .
Solution:
Scores       Number of candidates   ci   ci . fi   CAF
[0 – 20[     8                      10   80        8
[20 – 40[    16                     30   480       24
[40 – 60[    35                     50   1750      59
[60 – 80[    18                     70   1260      77
[80 – 100]   13                     90   1170      90
Σ            90                          4740
a)
b)
c)
25% of candidates on the driving test have 38.13 points or less, while
75% candidates have more than 38.13 points.
75% of candidates on the driving test have 69.44 points or less, while
25% candidates have more than 69.44 points.
d) Calculating and interpreting first centile.
1% of candidates on the driving test have 2.25 points or less, while 99%
candidates have more than 2.25 points.
Calculating and interpreting ninety-ninth centile.
99% of candidates on the driving test have 98.62 points or less, while
1% candidates have more than 98.62 points.
5, 6, 6, 8, 7, 7, 7, 8, 9, 9, 8, 7, 7, 10, 9, 8
Solution:
5, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10
xi   fi   xi . fi   xi − x̄   (xi − x̄)² . fi
5    1    5         -2.56    6.55
6    2    12        -1.56    4.87
7    5    35        -0.56    1.57
8    4    32        0.44     0.77
9    3    27        1.44     6.22
10   1    10        2.44     5.95
Σ    16   121                25.93
2.14. A company has produced the following table to describe the monthly
overhead expenses:
Determine:
a) Graphically present the frequency distribution by using histogram.
b) Calculate the average monthly overhead expenses.
c) Compute and explain mode and median.
d) Compute and explain middle absolute distance.
Solution:
a)
b)
The average monthly overhead expenses for the 12 observed months are
5330 KM.
c)
d)
The average absolute deviation of the individual data from the average
monthly overhead expenses amounts to 1780 KM.
Solution:
Number of   Number of       xi . fi   xi − x̄   (xi − x̄)² . fi
sold cars   working days
0           3               0         -1.87    10.49
1           10              10        -0.87    7.57
2           8               16        0.13     0.14
3           6               18        1.13     7.66
4           3               12        2.13     13.61
Σ           30              56                 39.47
The average number of sold cars for the 30 observed working days is
1.87.
The most frequent number of sold cars for the 30 observed working
days is 1.
The average linear deviation of the individual data from the average
number of sold cars amounts to 1.15.
2.16. We recorded the time cleaners needed to finish certain job and for
40 cleaners gained the following data (in minutes):
18 23 18 16 16 23 19 16 20 19
17 17 14 12 14 12 15 13 21 18
22 20 19 17 21 21 23 15 19 16
18 23 18 12 14 12 14 16 20 19
Solution:
a)
Ri fi ci ci . fi
b)
c)
The average time needed to finish job for the 40 cleaners is 18.00 min.
d)
50% of cleaners finished job in 18.23 min or less, while 50% of cleaners
need more than 18.23 min.
25% of cleaners finished job in 15.30 min or less, while 75% of cleaners
need more than 15.30 min.
75% of cleaners finished job in 20.54 min or less, while 25% of cleaners
need more than 20.54 min.
e)
Solution:
a)
Ri            fi   ci    ci . fi   CAF
[0 – 100[     5    50    250       5
[100 – 200[   4    150   600       9
[200 – 300[   8    250   2000      17
[300 – 400[   3    350   1050      20
[400 – 500]   4    450   1800      24
Σ             24         5700
b)
c)
d)
When we remove 25% of the smallest and 25% of the highest data, the
new range of variation will be 208.33 flats.
Solution:
b)
The average weekly amount spent for the 130 observed households is
663.08 KM.
c)
The average linear deviation of the individual data from the average
amount spent amounts to 235.38 KM.
Households that spend 759 KM weekly are 0.41 standard deviations above the
average spending.
e)
2.19. The number of working days lost by employees in the last month
is given in the following table:
Solution:
a)
b)
The average number of working days lost by employees in the last month
is 2.11.
c)
d)
Solution:
a)
b)
The average number of traffic offences for the 90 observed days is 2.12.
c)
The most common number of traffic offences for the 90 observed days
is equal to 2.
d)
Solution:
a)
b)
The average age of the workers for the 43 observed workers is 27.63 years.
c)
50% of workers are 26.50 years or younger, while the remaining 50% of
workers are older than 26.50 years.
d)
e)
2.22. A company collected the ages of its middle managers with the data
shown below (in years):
65 35 46 40 25 28 58 39 41 41
38 53 36 49 43 52 60 54 59 30
Solution:
a)
b)
The average age of the middle managers for the 20 observed managers
is 44.50 years.
c) The range of a data set is the difference between the largest and the
smallest data values. It is the simplest measure of variability. It is
very sensitive to the smallest and the largest data values.
Calculating
deciles range.
Calculating
centiles range.
Solution:
a)
b)
c)
The average linear deviation of the individual data from the average
returns is equal to 5.63 %.
d)
Solution:
Number of    Number of       xi . fi   xi − x̄   (xi − x̄)² . fi   (xi − x̄)³ . fi   (xi − x̄)⁴ . fi
new orders   working days
1            7               7         -2.20    33.88            -74.54           163.98
2            10              20        -1.20    14.40            -17.28           20.74
3            13              39        -0.20    0.52             -0.10            0.02
4            9               36        0.80     5.76             4.61             3.69
5            8               40        1.80     25.92            46.66            83.98
6            3               18        2.80     23.52            65.86            184.40
Σ            50              160       -2.20    104.00           25.21            456.81
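The column sums in the table above can be turned into measures of skewness and kurtosis with a short Python sketch. It assumes the moment definitions α3 = μ3/σ³ and α4 = μ4/σ⁴ with division by N (the population form); the book's own formulas are not visible in this text, so the exact variant used in the solution is an assumption.

```python
# (number of new orders x_i, number of working days f_i) from the table above.
orders = [(1, 7), (2, 10), (3, 13), (4, 9), (5, 8), (6, 3)]

n = sum(f for _, f in orders)
mean = sum(x * f for x, f in orders) / n


def central_moment(k):
    """k-th central moment of the frequency distribution (division by N)."""
    return sum((x - mean) ** k * f for x, f in orders) / n


variance = central_moment(2)
sigma = variance ** 0.5

alpha3 = central_moment(3) / sigma ** 3  # measure of skewness
alpha4 = central_moment(4) / sigma ** 4  # measure of kurtosis
```

Under these assumptions α3 is slightly above 0 (mild positive skewness) and α4 is below 3 (platykurtic).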
distribution
Solution:
Weekly Number
earnings of xi . fi
($) employees
a)
b)
c)
d)
frequency distribution.
2.26. The table below shows the distribution of the time students spend
on a particular homework assignment (sample of 30 students):
Solution:
Time (in min)   Number of students   ci   ci . fi   ci² . fi   ci − x̄   (ci − x̄)³ . fi
[0 − 20[        3                    10   30        300        -25.33   -48755.86
[20 − 40[       18                   30   540       16200      -5.33    -2725.55
[40 − 60[       7                    50   350       17500      14.67    22099.80
[60 − 80]       2                    70   140       9800       34.67    83347.30
Σ               30                        1060      43800      -25.33   53965.69
a)
b)
c)
d)
frequency distribution.
Solution:
The number of hours   Frequency   ci     ci . fi   ci − x̄   (ci − x̄)² . fi   (ci − x̄)⁴ . fi
[5 − 10[              13          7.5    97.5      -9.35    1136.49          99355.02
[10 − 15[             30          12.5   375       -4.35    567.68           10741.83
[15 − 20[             50          17.5   875       0.65     21.12            8.93
[20 − 25[             20          22.5   450       5.65     638.45           20380.92
[25 − 30]             10          27.5   275       10.65    1134.23          128646.64
Σ                     123                2072.5    -9.35    3497.97          259133.33
a)
b)
The most frequent hours spent studying the course material for the 123
observed students is 17.00 hours.
c)
The average linear deviation of the individual data from the average
value (mean) is equal to 5.33 hours.
d)
distribution
2.28. A supervisor of a bank kept records of the time (in minutes) that
employees needed to complete a particular task. The data are
given in next table:
11 29 16 24 15 23 10 21 18 20
15 22 13 24 16 28 21 14 26 27
25 20 19 23 17 23 18 22 19 29
Solution:
a)
Time (in min)   Frequency   ci     ci . fi   ci² . fi   ci − x̄   (ci − x̄)³ . fi   (ci − x̄)⁴ . fi
[10 − 15[       4           12.5   50        625.00     -8.17    -2181.35         17821.66
[15 − 20[       9           17.5   157.5     2756.25    -3.17    -286.70          908.82
[20 − 25[       11          22.5   247.5     5568.75    1.83     67.41            123.37
[25 − 30]       6           27.5   165       4537.50    6.83     1911.67          13056.72
Σ               30                 620       13487.50            -488.96          31910.57
b)
c)
distribution.
Annual salary
Number of workers
(in 000 KM)
[10 − 15[ 5
[15 − 20[ 15
[20 − 25[ 20
[25 − 30[ 30
[30 − 35] 15
Solution:
Annual salary   Number of   ci     pi       CRF      ci . fi   Share of   Qi
(in 000 KM)     workers                                        ci . fi
[10 − 15[       5           12.5   0.0588   0.0588   62.5      0.0299     0.0299
[15 − 20[       15          17.5   0.1765   0.2353   262.5     0.1258     0.1557
[20 − 25[       20          22.5   0.2353   0.4706   450       0.2156     0.3713
[25 − 30[       30          27.5   0.3529   0.8235   825       0.3952     0.7665
[30 − 35]       15          32.5   0.1765   1        487.5     0.2335     1
Σ               85                 1                 2087.5    1
a)
b)
Trapezoid method
Triangles method
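The trapezoid calculation for the salary data can be sketched in Python. The sketch assumes the common form G = 1 − Σ p_i (Q_{i−1} + Q_i), where p_i is the share of workers in class i and Q_i the cumulative share of total salary; the book's own formula is not visible in this text, so this form is an assumption.

```python
# (class midpoint c_i in thousand KM, number of workers f_i) from the table.
salary_classes = [(12.5, 5), (17.5, 15), (22.5, 20), (27.5, 30), (32.5, 15)]

n = sum(f for _, f in salary_classes)
total = sum(c * f for c, f in salary_classes)

p = [f / n for _, f in salary_classes]          # share of workers per class
q = [c * f / total for c, f in salary_classes]  # share of salary per class

# Cumulative salary shares Q_i: vertical coordinates of the Lorenz curve.
Q = []
running = 0.0
for share in q:
    running += share
    Q.append(running)

# Each class contributes a trapezoid of area p_i * (Q_{i-1} + Q_i) / 2 under
# the Lorenz curve; the Gini coefficient is 1 minus twice the total area.
previous_Q = [0.0] + Q[:-1]
gini = 1 - sum(p_i * (lo + hi) for p_i, lo, hi in zip(p, previous_Q, Q))
```

A value close to 0 indicates a nearly equal distribution of salaries, and a value close to 1 a very unequal one.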
Solution:
xi fi xi . fi CAF
Mean:
Median:
50% of students got grade 8 or less, while 50% students got grade higher
than 8.
Mode:
Measures of dispersion:
The average linear deviation of the individual data from the average
grade in the analyzed sample is 1.25 grades.
The average absolute deviation of the individual data from the average
grade in analyzed sample is 1.05 grades.
Coefficient of variation:
Students with grade 7 are 0.6 standard deviations below average.
22
There are 20 students in the sample, so we use formula for standard deviation from sample
(with (n-1)).
frequency distribution.
frequency distribution.
Solution:
xi fi CAF ci ci . fi
Mean:
Median:
50% of workers of the Melly Company are 35.63 years old or younger,
while the remaining 50% of workers are older than 35.63 years.
Mode:
The most frequent age of the workers of the Melly Company is 36.67
years.
Quartile 1:
25% of workers of the Melly Company are 26.79 years old or younger,
while the remaining 75% of workers are older than 26.79 years.
Quartile 3:
75% of workers of the Melly Company are 43.44 years old or younger,
while the remaining 25% of workers are older than 43.44 years.
Measures of dispersion:
The average linear deviation of the individual data from the average
years in analyzed sample is 11.90 years.
23
There are 25 students in the sample, so we use formula for standard deviation from sample
(with (n-1)).
The average absolute deviation of the individual data from the average
years in analyzed sample is 11.90 years.
Coefficient of variation:
Workers who are 40 years old are 0.34 standard deviations above average.
When we remove 25% of the smallest and 25% of the highest data, the
new range of variation will be 16.65 years.
frequency distribution.
frequency distribution.
2.32. A variable that can only take certain values (whole numbers) is
referred to as a:
a) continuous variable.
b) discrete variable.
c) constant.
d) statistical variable.
Answer: b)
2.34. You measure the width (in inches) of a number of fabric samples.
This would be an example of measurement at the:
a) nominal level.
b) ordinal level.
c) interval level.
d) ratio level.
Answer: d)
Answer:
Temperature Frequency
51 4
50 4
49 6
48 0
47 3
46 3
45 4
44 3
43 3
Ν 30
Answer:
59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7, 61.6, 56.3, 61.9, 68.7, 60.4,
58.9, 59.0, 61.2, 62.1, 61.4, 58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2
Answer:
a) Variable is continuous, since students’ weight can take any value
from certain interval and it is obtained by measurement procedure.
b) There are two outliers: 56.3 and 68.7.
d) histogram
a) bar graphs
b) histogram
c) pie chart
d) polygon of frequency.
Answer: d)
2.40. Given the following data set: 13, 15, 12, 13, 9, 13.
2.41. Given the following data set: 11, 9, 10, 13, 11, 12, 13, 14, 11, 15, 9.
Compute the arithmetic mean, mode and quartiles of the following data:
8 10 8 7 8 9 7 6 7 8 5 6
9 6 8 6 5 10 8 7 9 5 9 9
7 7 9 8 7 9 8 5 7 7 10 7
Answer: (7.53, 7, 7, 7, 9)
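The answer above can be checked with a short sketch. The mean and mode follow directly from the data; for the quartiles we assume a simple positional rule (the value at position ceil(p·n) in the sorted data), which is only one of several conventions and is chosen here because it reproduces the stated answer.

```python
from collections import Counter
from math import ceil

data = [
    8, 10, 8, 7, 8, 9, 7, 6, 7, 8, 5, 6,
    9, 6, 8, 6, 5, 10, 8, 7, 9, 5, 9, 9,
    7, 7, 9, 8, 7, 9, 8, 5, 7, 7, 10, 7,
]

n = len(data)
mean = sum(data) / n

# The mode is the value with the highest frequency.
mode, mode_count = Counter(data).most_common(1)[0]

# Assumed quartile convention: value at position ceil(p * n) of the sorted data.
ordered = sorted(data)
quartiles = tuple(ordered[ceil(p * n) - 1] for p in (0.25, 0.50, 0.75))

print(f"mean = {mean:.2f}, mode = {mode}, quartiles = {quartiles}")
```

Other quartile conventions (for example, averaging the two middle values for the median) can give slightly different results for the same data.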
2.44. The following frequency table summarizes the ages of 195 visitors
at the local museum:
2.45. The data about the number of persons that are temporarily
employed abroad, according to age, are given in the table below:
2.46. Suppose that you want to drive 10 km in your car. You will not
drive at the same speed all the time:
2 1 4 4 1 4 1 2 3 4
3 3 1 3 3 2 0 3 3 4
5 5 5 5 4 3 2 0 2 1
2.49. Data about the level of capacity utilization in 23 factories are given
in the table below:
Calculate:
a) The average level of capacity utilization.
b) The most common level of capacity utilization.
2.51. Data about the age of cell phone users are given in the following
table:
2.52. The following values are the numbers of cars owned by households
in a wealthy part of the city:
18 1.15
19 0.95
20 1.41
21 0.95
22 1.88
23 0.51
24 1.03
25 1.26
26 1.31
27 1.14
28 0.87
29 0.84
30 0.81
31 0.93
32 0.88
33 0.84
34 0.74
35 0.63
36 0.77
37 1.38
38 1.42
39 0.71
40 1.30
41 0.67
42 0.88
43 0.94
44 1.14
45 1.95
46 0.85
47 1.81
48 2.06
49 1.28
50 1.59
51 0.87
52 0.84
53 0.84
54 1.00
55 0.96
56 1.03
57 1.22
58 0.94
59 0.62
60 1.11
61 1.49
62 0.89
63 0.49
64 0.88
65 1.02
66 1.99
67 0.71
68 0.11
69 1.20
70 0.91
71 0.73
72 0.85
73 1.06
74 0.87
75 0.22
76 0.40
77 0.48
78 0.63
79 0.22
80 0.31
81 1.11
82 1.36
83 1.04
84 1.13
85 0.72
86 1.03
87 2.11
88 1.96
89 1.97
90 2.13
91 0.99
92 1.00
93 0.95
94 0.99
95 1.36
96 1.13
97 0.65
98 0.99
99 0.77
100 1.19
101 1.34
102 1.25
103 1.06
104 2.06
105 1.20
106 0.85
107 0.85
108 0.89
109 0.94
110 0.52
111 1.04
112 0.88
113 0.90
114 0.86
115 1.57
116 0.79
117 0.64
118 1.40
119 1.00
120 1.29
121 0.84
122 0.85
123 1.11
124 1.74
125 0.80
126 1.09
127 1.26
128 1.37
129 0.61
130 0.83
131 0.99
132 1.25
133 1.06
134 1.06
135 1.90
136 1.95
137 0.85
138 0.81
139 0.99
140 0.89
141 0.89
142 0.89
143 0.88
144 1.51
145 1.05
146 1.02
147 1.07
148 1.14
149 0.95
150 1.00
151 0.88
152 0.85
153 1.04
154 0.99
155 0.93
156 0.89
157 0.71
158 0.77
159 0.44
160 1.44
161 0.97
162 0.96
163 1.32
164 1.67
165 0.83
166 1.26
167 0.97
168 1.20
169 0.95
170 0.95
171 0.78
172 1.12
173 0.54
174 0.88
175 1.15
176 1.54
177 1.16
178 0.94
179 1.18
180 0.84
181 0.94
182 0.67
183 0.63
184 1.06
185 0.91
186 1.36
187 1.22
188 0.80
189 0.96
190 0.56
191 0.93
192 1.08
193 0.83
194 2.07
195 0.93
196 0.98
197 0.79
198 1.35
199 0.78
200 1.10
Answer:
Sample
Expense ratio (xi) fi
[0 - 0.5) 1
[0.5 - 1.0) 6
[1.0 - 1.5) 10
[1.5 - 2.0] 3
Total 20
Population
Expense ratio (xi) fi
[0 - 0.25) 3
[0.25 - 0.50) 5
[0.50 - 0.75) 21
[0.75 - 1.00) 83
[1.00 - 1.25) 46
[1.25 - 1.50) 22
[1.50 - 1.75) 6
[1.75 - 2.00) 9
[2.00 - 2.25] 5
Total 200
Answer:
Histogram
Mean: 105.67
Median: 44.93
Mode: 12.60
Standard deviation: 197.36
Variance: 38,949.87
Coefficient of variation: 186.77%
Q1: 20.28
Q3: 94.78
Inter-quartile range: 74.50
Answer:
First, we have to make sure that the access time intervals are of the
same length. In other words, our starting table will be the following:
3. REGRESSION AND CORRELATION
3.1. INTRODUCTION
Each object or individual has to provide two or more scores, one for
each variable. Across many cases, we wish to know whether there is
a relationship between the variables.
Example 3.1.
Student            John  Betty  Sarah  Peter  Fiona  Charlie  Tim  Gerry  Martine  Rachel
Maths score          72     65     80     36     50       21   79     64       44      55
Statistics score     78     70     81     31     55       29   74     64       47      53
24
http://richardbowles.tripod.com/maths/correlation/corr.htm, access: 28. 01. 2010.
We can see that the points follow a very strong pattern. Students who
are good at Maths tend to be good at Statistics as well. The marks lie
fairly close to an imaginary straight line that we can draw on the graph.
In the diagram above, we can draw in this straight line: we right-click
on the data points with the mouse and select the options shown below.
The fact that the points lie close to the straight line is called a strong
correlation. The fact that this line is upward sloping - indicating that the
Statistics mark tends to increase as the Maths mark increases - is called
a positive correlation.
The arithmetic means are found by adding the relevant scores for
exams, and dividing sum by 10. This is because there are results for ten
students in the table with original data.
We work out:
mean Maths scores =
= (72 + 65 + 80 + 36 + 50 + 21 + 79 + 64 + 44 + 55) / 10 = 56.6
mean Statistics scores =
= (78 + 70 + 81 + 31 + 55 + 29 + 74 + 64 + 47 + 53) / 10 = 58.2
and we can be sure that the line has to go through the point (56.6, 58.2).
We can see on the scatter plot from Example 3.1 that roughly the same
number of data points lie above this line as below it.
We can use the regression line to make predictions. For instance, what
Statistics mark would we expect someone to receive if they received a
Maths mark of 40? If we look at the straight line, we can see that when
the Maths mark is 40, the Statistics mark is approximately 42. Similarly,
we can assume that anyone who got 40 marks on Statistics exam, would
also get about 38 marks on Maths exam. However, there are limits on
the predictions that we can make, as we will elaborate later on.
There are steps to obtain the standard error of estimate and the coefficient
of determination:
3.
The stronger the correlation, the larger the explained variability will be:
If r = 0, then the explained variability is zero.
If r = 1, then the explained variability equals the total variability.
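The stronger the correlation, the smaller the unexplained variability will be:
If r = 0, then the unexplained variability equals the total variability.
If r = 1, then the unexplained variability is zero.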
Cohen²⁵ has observed that all such criteria are in some ways arbitrary
and should not be observed too strictly. This is because the interpretation
of a correlation coefficient depends on the context and purposes. A
correlation of 0.9 may be very low if one is verifying a physical law
using high-quality instruments, but may be regarded as very high in
the social sciences where there may be a greater contribution from
complicating, unobserved factors.
25
Cohen, J., Statistical power analysis for the behavioral sciences (2nd ed.), Lawrence
Erlbaum Associates, 1988.
The linear regression model is defined by two numbers: the slope and
the intercept on the vertical axis of the line that best fits the points.
We always refer to the slope of the line as b and the intercept as a, which
gives the equation of the regression line as: $\hat{y} = a + bx$
According to the least squares method (LSM), here are the formulas for
calculating the slope and the intercept, and general rules for their interpretation:
Solution:
Regression model: $\hat{y} = 5.083 + 0.938x$
Interpretation:
The Statistics score will rise by 0.938 on average if the Maths score rises by 1.
A student with a Maths score of 0 is expected to have a Statistics score
of 5.083.
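As a check on the slope and intercept, here is a minimal sketch of the least squares calculation on the Example 3.1 data. It assumes the usual formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄; the function and variable names are ours, not the book's.

```python
maths = [72, 65, 80, 36, 50, 21, 79, 64, 44, 55]
stats = [78, 70, 81, 31, 55, 29, 74, 64, 47, 53]


def least_squares(x, y):
    """Return (intercept a, slope b) of the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares(maths, stats)
print(f"y_hat = {a:.3f} + {b:.3f} x")
```

By construction, the fitted line passes through the point of means (56.6, 58.2).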
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.971121335
R Square            0.943076647
Adjusted R Square   0.935961228
Standard Error      4.68868839
Observations        10

ANOVA
             df   SS            MS            F          Significance F
Regression   1    2913.729609   2913.729609   132.5399   2.94E-06
Residual     8    175.8703905   21.98379882
Total        9    3089.6

             Coefficients   Standard Error   t Stat        P-value    Lower 95%   Upper 95%
Intercept    5.083182203    4.846187507      1.048903328   0.324874   -6.09215    16.25851
Math score   0.938459678    0.081515907      11.5125957    2.94E-06   0.750484    1.126436
RESIDUAL OUTPUT
Observation Predicted Y Residuals Standard Residuals
1 72.65227905 5.347720953 1.209744422
2 66.0830613 3.916938701 0.886077412
3 80.15995647 0.840043526 0.190031974
4 38.86773063 -7.867730625 -1.779812993
5 52.00616612 2.993833877 0.677255576
6 24.79083545 4.209164551 0.952183814
7 79.2214968 -5.221496796 -1.181190394
8 65.14460162 -1.14460162 -0.258928137
9 46.37540805 0.624591948 0.141293203
10 56.69846451 -3.698464515 -0.836654877
where:
is the covariance between
x (independent variable) and y (dependent variable).
- mean of variable x
- mean of variable y
n is number of objects.
Solution:
Object   Maths score (x)   Statistics score (y)   x²   y²   x·y
John 72 78 5184 6084 5616
Betty 65 70 4225 4900 4550
Sarah 80 81 6400 6561 6480
Peter 36 31 1296 961 1116
Fiona 50 55 2500 3025 2750
Charlie 21 29 441 841 609
Tim 79 74 6241 5476 5846
Gerry 64 64 4096 4096 4096
Martine 44 47 1936 2209 2068
Rachel 55 53 3025 2809 2915
Total 566 582 35344 36962 36046
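The column totals of the working table are enough to compute the covariance and the correlation coefficient. The sketch below assumes the population forms (division by n) for the covariance and standard deviations; the divisor cancels in r, so the result matches the "Multiple R" value in the Excel output.

```python
from math import sqrt

# Column totals from the working table (n = 10 students).
n = 10
sum_x, sum_y = 566, 582
sum_x2, sum_y2, sum_xy = 35344, 36962, 36046

mean_x = sum_x / n
mean_y = sum_y / n

# Covariance and standard deviations computed from the sums.
cov_xy = sum_xy / n - mean_x * mean_y
sd_x = sqrt(sum_x2 / n - mean_x ** 2)
sd_y = sqrt(sum_y2 / n - mean_y ** 2)

r = cov_xy / (sd_x * sd_y)
r_squared = r ** 2

print(f"cov = {cov_xy:.2f}, r = {r:.4f}, r^2 = {r_squared:.4f}")
```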
Calculating and
interpreting
correlation coefficient.
Those parameters could also be calculated by using Excel.
If students have a Maths score of 75, what is the expected score for Statistics?
Solution:
Forecasting values of dependent variable y.
According to the previous regression model, we expect that students
who have a Maths score of 75 will get a score of about 75.47 in Statistics
(5.083182 + 0.938460 · 75, using the coefficients from the Excel output).
The only difference between ρ and the standard r is that the data used
for calculation of ρ are ranks.
Example 3.2.
Two art historians were asked to rank six paintings from 1 (best) to 6
(worst). Their rankings are shown as a table:
Solution:
1.
2. Standard error for parameter b
where:
3.
4.
, parameter b is significant.
Example 3.3.
To examine the relationship between store size (i.e. square footage) and
annual sales, a sample of 14 stores was selected. The results for these
14 stores are summarized in the next table:
Store   Square feet (000)   Annual sales (in millions of $)
1 1.7 3.7
2 1.6 3.9
3 2.8 6.7
4 5.6 9.5
5 1.3 3.4
6 2.2 5.6
7 1.3 3.7
8 1.1 2.7
9 3.2 5.5
10 1.5 2.9
11 5.2 10.7
12 4.6 7.6
13 5.8 11.8
14 3.0 4.1
a) Create a scatter plot to examine the relationship between the store size
and its annual sales. Comment.
b) Create a regression model and explain the parameters.
c) Calculate and explain the coefficient of correlation and the coefficient of
determination.
d) If the store size is 4,200 square feet, what level of annual sales could
we expect for that store?
Creating the equation of the linear regression model using Excel.
Solution:
a) Scatter plot:
1. independent variable is store size,
2. dependent variable is annual sale
b) linear model:
Slope (parameter b)
Correlation
coefficient
The predicted average annual sale of a store with 4,200 square feet is
$7,978,000.
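The prediction for Example 3.3 can be reproduced without Excel. The following sketch fits the least squares line to the 14 stores and evaluates it at 4.2 (thousand square feet); the variable names are illustrative.

```python
from math import sqrt

# Example 3.3: store size (thousands of square feet) and annual sales (millions of $).
square_feet = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
annual_sales = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(annual_sales) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in square_feet)
syy = sum((y - mean_y) ** 2 for y in annual_sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, annual_sales))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)

# Forecast for a store of 4,200 square feet (x = 4.2).
predicted_sales = intercept + slope * 4.2

print(f"y_hat = {intercept:.4f} + {slope:.4f} x")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"predicted sales at x = 4.2: {predicted_sales:.3f} million $")
```

The forecast is reasonable here because 4.2 lies inside the observed range of store sizes (1.1 to 5.8).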
Replacement:
linear model:
a = antilogarithm A, b = antilogarithm B
= antilogarithm
We will again use the idea to convert a power model to a linear one,
using the logarithm, as follows:
Replacement: linear
model:
a = antilogarithm A, = antilogarithm
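The replacement idea can be sketched in code: take logarithms, fit a straight line, then transform the parameters back with the antilogarithm. We assume the exponential model has the form y = a·b^x and the power model the form y = a·x^b, and we use natural logarithms (any base works if the antilogarithm uses the same base). The data below are hypothetical, constructed so that the true parameters are known.

```python
from math import exp, log


def fit_line(x, y):
    """Least squares line y = A + B*x; returns (A, B)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    B = sxy / sxx
    return mean_y - B * mean_x, B


def fit_exponential(x, y):
    """Fit y = a * b**x via log y = A + B*x, with a = exp(A), b = exp(B)."""
    A, B = fit_line(x, [log(v) for v in y])
    return exp(A), exp(B)


def fit_power(x, y):
    """Fit y = a * x**b via log y = A + b*log x, with a = exp(A)."""
    A, b = fit_line([log(v) for v in x], [log(v) for v in y])
    return exp(A), b


# Hypothetical data generated exactly from y = 2 * 3**x and y = 3 * x**2.
a_exp, b_exp = fit_exponential([1, 2, 3, 4], [6, 18, 54, 162])
a_pow, b_pow = fit_power([1, 2, 4, 8], [3, 12, 48, 192])

print(f"exponential: a = {a_exp:.3f}, b = {b_exp:.3f}")
print(f"power:       a = {a_pow:.3f}, b = {b_pow:.3f}")
```

Note that both transformations require positive y values (and positive x values for the power model), because the logarithm is not defined otherwise.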
Or by expression:
1.
the basis of
3.
4.
2.
4.
Example 3.4.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.870475
R Square            0.757726
Adjusted R Square   0.742095
Standard Error      638.0653
Observations        34

ANOVA
             df   SS         MS         F          Significance F
Regression   2    39472731   19736365   48.47713   2.86E-10
Residual     31   12620947   407127.3
Total        33   52093677

                 Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept        5837.521       628.1502         9.293192   1.79E-10   4556.4      7118.642
Price            -53.2173       6.852221         -7.76644   9.2E-09    -67.1925    -39.2421
Promotion cost   3.613058       0.685222         5.272828   9.82E-06   2.215538    5.010578
26
The database column with the dependent variable must be either the first or last, because
the independent variables must be given as "block" variables.
– In the second column are the sums of squared deviations (SS).
– In the third column are the mean squares (MS), each calculated as
the sum of squared deviations divided by the number of degrees
of freedom
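The relationships between the ANOVA columns can be verified from the Example 3.4 output. This sketch recomputes MS, F, R Square, Adjusted R Square and the standard error from the reported sums of squares, with n = 34 observations and k = 2 explanatory variables; small differences from the Excel figures are due to rounding of the printed SS values.

```python
from math import sqrt

n, k = 34, 2  # observations and explanatory variables (price, promotion cost)
ss_regression = 39472731
ss_residual = 12620947
ss_total = 52093677

# Mean squares: sum of squares divided by the degrees of freedom.
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)

f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
standard_error = sqrt(ms_residual)

print(f"F = {f_statistic:.3f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adjusted_r_squared:.4f}")
print(f"standard error = {standard_error:.2f}")
```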
27
If n is less than 30 we use t distribution with (n-k-1) degrees of freedom.
When designing the research, an indicator variable is often used to set
boundaries between different groups. An indicator variable is very useful
because it does not require the construction of separate regression
models for each group or subset, and it makes it possible to use a single
regression equation to represent different groups.
Example of regression indicator variables in the simple model with a "dummy"
variable.
What is the interpretation of these model parameters?
Parameter a indicates that the expected wage of those who are not
married is equal to 798.44 KM.
Parameter b indicates that, on average, the wage of persons who are
married is 178.61 KM greater than the wage of persons who are
not married.
The sum of parameters a and b indicates that those who are married
have an average wage of 977.05 KM.
Example of multiple regression models with indicator and continuous
variables as explanatory variables in the model.
What is the interpretation of these parameters in the model?
Parameter a indicates that the expected wage of those who have not
completed university, and whose work experience is equal to 0 (they are
starting to work), is equal to 275 KM.
Parameter bd indicates that, on average, the wage of a person who has
completed university is 162 KM higher than the wage of those who
have not completed university, holding other things constant.
Parameter bx1 indicates that, if all other factors in the model remain
unchanged, an increase in length of service of 1 month leads to an
increase in wages of 6.3 KM, on average.
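The interpretations of both wage models can be made concrete with a small sketch. The coefficients are those quoted in the text; the function names and the 0/1 coding of the indicator variable are our assumptions.

```python
def wage_by_marital_status(married: bool) -> float:
    """Simple model with one dummy variable: wage = a + b*d (in KM)."""
    a, b = 798.44, 178.61
    d = 1 if married else 0
    return a + b * d


def wage_by_education_and_service(university: bool, months_of_service: float) -> float:
    """Model with a dummy and a continuous variable: wage = a + bd*d + bx1*x1 (in KM)."""
    a, b_d, b_x1 = 275.0, 162.0, 6.3
    d = 1 if university else 0
    return a + b_d * d + b_x1 * months_of_service


print(wage_by_marital_status(False))             # expected wage, not married
print(wage_by_marital_status(True))              # expected wage, married
print(wage_by_education_and_service(True, 24))   # graduate with 24 months of service
```

The dummy variable shifts the intercept only: in the second model both groups have the same slope of 6.3 KM per month of service.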
Example 3.5.
Construct the model to predict the sales value of the house depending
on its size and information about the system of fire protection. Interpret
the parameters obtained.
Solution:
Since the variable possession of the fire protection system indicates the
absence/presence of the system, we need to create a dummy variable:
Sale value - y   Size - x   Possession of fire protection systems   d
84.4 2 yes 1
77.4 1.71 no 0
75.7 1.45 no 0
85.9 1.76 yes 1
79.1 1.93 no 0
70.4 1.2 yes 1
75.8 1.55 yes 1
85.9 1.93 yes 1
78.5 1.59 yes 1
79.2 1.5 yes 1
86.7 1.9 yes 1
79.3 1.39 yes 1
74.5 1.54 no 0
83.8 1.89 yes 1
76.8 1.59 no 0
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.900587
R Square            0.811057
Adjusted R Square   0.779567
Standard Error      2.262596
Observations        15

ANOVA
        df   SS        MS   F   Significance F
Total   14   325.136

                                        Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Possession of fire protection systems   3.852982       1.241223         3.104183   0.009119   1.148591    6.557374
Since:
Further in the table are the parameters of the model, which form the
regression equation:
Interpretation of the coefficients is:
– For each additional 100 square meters, the sale value is higher by
16.186 KM on average, if other variables stay the same.
– A house that possesses a fire protection system has, on average,
a sale value that is 3.853 KM greater than a house without a fire
protection system, if other variables stay the same.
1. Unbiasedness
2. Consistency
3. Efficiency
4. Best linear unbiasedness (BLUE).
Multicollinearity
First, we inspect the correlation matrix. The rule of thumb says that
if the correlation coefficient between the independent variables is
higher than 0.8 (Gujarati, 2004, p.359), there could be the problem of
multicollinearity.
Outliers
Normality
Autocorrelation
1.
2.
3.
4. or
Heteroskedasticity
We will create two regressions for two samples and use the F test to
compare the residual deviations. Hypothesis H0 is accepted if there is
no significant difference between the sums of squared residuals.
This problem could be solved by weighted regression, with weights equal
to the inverse of the square root of the variable that is the source of
heteroskedasticity.
3.1. There has been extensive discussion in the media all over the world
about the unproductive public sector labour force in Greece, especially
in the light of the current crisis that Greece is facing. Foreign
analysts have complained about the high salaries that workers receive
for their poor performance. To see how workers' earnings affect
their productivity, we collect data on average earnings and the workers'
productivity index in five public institutions in Greece. The data are
given in the table:
Institution   Workers’ productivity index   Average earnings (in 00 KM)
I 103.3 139
II 103.9 140
III 104 140.5
IV 104.5 141
V 104.8 143
Solution:
Intercept
Slope
All sums needed in the formulas (for the means, the covariance and the
standard deviations) are obtained in the following working table.
x y x·y x² y²
139 103.3 14358.7 19321 10670.89
140 103.9 14546 19600 10795.21
140.5 104 14612 19740.25 10816
141 104.5 14734.5 19881 10920.25
143 104.8 14986.4 20449 10983.04
Total: 703.5 520.5 73237.6 98991.25 54185.39
Solution:
the vertical axis (the y-axis) and the income on the horizontal axis (the
x-axis). From the scatter plot above we conclude that weight increases
as monthly income increases.
Intercept
Slope
All sums needed in the formulas (for the means, the covariance and the
standard deviations) are obtained in the following working table.
x y x·y x² y²
1000 23 23000 1000000 529
1150 25.5 29325 1322500 650.25
1100 25 27500 1210000 625
1300 27 35100 1690000 729
1600 30 48000 2560000 900
1400 28 39200 1960000 784
Total: 7550 158.5 202125 9742500 4217.25
d)
If the monthly family income is 1500 KM, the estimated child’s weight
is 28.99 kg.
3.3. The data on monthly loan payment and amount of monthly savings
in 6 households are given in the following table:
Solution:
Intercept
Slope
All sums needed in the formulas (for the means, the covariance and the
standard deviations) are obtained in the following working table.
x y x·y x² y²
5 1.5 7.5 25 2.25
4.8 2 9.6 23.04 4
2.5 3 7.5 6.25 9
3.8 2.4 9.12 14.44 5.76
4 2.2 8.8 16 4.84
1.2 3.8 4.56 1.44 14.44
Total: 21.3 14.9 47.08 86.17 40.29
3.4. The table presents the production volume and costs in one
international company that were recorded during 6 year period:
Solution:
x y x.y x2 y2
4 100 400 16 10000
6 146 876 36 21316
8 178 1424 64 31684
10 220 2200 100 48400
12 256 3072 144 65536
13 280 3640 169 78400
Total: 53 1180 11612 529 255336
d)
3.5. In order to determine effect that the costs of advertising (x) have
on sales volume (y), we collected data at 10 different shopping
malls and obtained the following result:
11 33
16 41
26 63
29 87
Solution:
We will create a working table with all sums needed for our calculation.
The costs of advertising (x)   Volume of sales (y)   x·y   x²   y²   rx   ry   rx - ry   (rx - ry)²
18 55 990 324 3025 6 6 0 0
7 17 119 49 289 2 1 1 1
14 36 504 196 1296 4 4 0 0
31 85 2635 961 7225 10 9 1 1
21 62 1302 441 3844 7 7 0 0
5 18 90 25 324 1 2 -1 1
11 33 363 121 1089 3 3 0 0
16 41 656 256 1681 5 5 0 0
26 63 1638 676 3969 8 8 0 0
29 87 2523 841 7569 9 10 -1 1
Total: 178 497 10820 3890 30311 4
c)
3.6. The data of qualification rank and working efficiency rank for 6
employees are given in the table below:
Worker B E D A F C
Qualification rank 1 2 3 4 5 6
Efficiency score 25 30 23 21 18 20
Solution:
Worker   Efficiency score (y)   rx   ry   d = rx - ry   d²
B 25 1 2 -1 1
E 30 2 1 1 1
D 23 3 3 0 0
A 21 4 4 0 0
F 18 5 6 -1 1
C 20 6 5 1 1
Σ 4
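The working table above feeds directly into Spearman's rank correlation coefficient. The sketch below assumes the standard formula ρ = 1 − 6Σd² / (n(n² − 1)) and ranks the efficiency scores so that rank 1 goes to the highest score, as in the table; the helper name is ours.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's coefficient from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Exercise 3.6: workers B, E, D, A, F, C.
qualification_rank = [1, 2, 3, 4, 5, 6]
efficiency_score = [25, 30, 23, 21, 18, 20]

# Rank the scores in descending order: the highest score gets rank 1.
descending = sorted(efficiency_score, reverse=True)
efficiency_rank = [descending.index(score) + 1 for score in efficiency_score]

rho = spearman_rho(qualification_rank, efficiency_rank)
print(efficiency_rank, round(rho, 4))
```

A value this close to 1 indicates that better qualified workers tend to be ranked as more efficient.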
Monthly sales (in 1000 $)   Test results
10 55
11 62
29 80
12 62
20 70
13 62
24 75
18 80
15 65
Solution:
y x x·y y² x² ry rx d = ry - rx d²
10 55 550 100 3025 1 1 0 0
11 62 682 121 3844 2 3 -1 1
29 80 2320 841 6400 9 8.5 0.5 0.25
12 62 744 144 3844 3 3 0 0
20 70 1400 400 4900 7 6 1 1
13 62 806 169 3844 4 3 1 1
24 75 1800 576 5625 8 7 1 1
18 80 1440 324 6400 6 8.5 -2.5 6.25
15 65 975 225 4225 5 5 0 0
Σ 152 611 10717 2900 42107 10.5
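When values are tied, each tied observation receives the average of the ranks it would otherwise occupy, as in the rx column above (three values of 62 share rank 3, two values of 80 share rank 8.5). The following sketch assigns average ranks and then applies the simple formula ρ = 1 − 6Σd² / (n(n² − 1)); with ties this formula is only an approximation, and computing Pearson's r on the ranks is the exact alternative.

```python
def average_ranks(values):
    """Ascending ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first_position = ordered.index(v) + 1  # first rank occupied by v
        tie_count = ordered.count(v)           # how many times v occurs
        ranks.append(first_position + (tie_count - 1) / 2)
    return ranks


monthly_sales = [10, 11, 29, 12, 20, 13, 24, 18, 15]  # y, in 1000 $
test_results = [55, 62, 80, 62, 70, 62, 75, 80, 65]   # x

ry = average_ranks(monthly_sales)
rx = average_ranks(test_results)

n = len(rx)
sum_d2 = sum((a - b) ** 2 for a, b in zip(ry, rx))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rx)
print(f"sum of d^2 = {sum_d2}, rho = {rho:.4f}")
```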
3.8. There have been significant changes in the clothing market since
the beginning of the 21st century. Expansion of the discount
fashion sector and an increasing number and variety of competitors
(supermarket chains are becoming an increasingly important factor
in the clothing market) are just a few. In this competitive environment,
decision making is becoming more complex and requires more
information. The marketing manager of a popular clothing brand
would like to determine the effect of advertising expenditure on
the sales of clothes. To test the effectiveness of advertising, a
random sample of 5 markets is selected and the following values are
recorded:
Solution:
x y x.y x2 y2
1.6 5 8 2.56 25
2.2 7 15.4 4.84 49
1.4 4 5.6 1.96 16
1.9 6 11.4 3.61 36
2.4 10 24 5.76 100
Total: 9.5 32 64.4 18.73 226
r = 0.9481
Solution:
x y x.y x2 y2
30 60 1800 900 3600
60 59 3540 3600 3481
90 57 5130 8100 3249
120 55 6600 14400 3025
140 54.5 7630 19600 2970.25
160 53 8480 25600 2809
Total: 600 338.5 33180 72200 19134.25
d)
Answer: b)
3.11. The data have been collected to show that tenure affects workers'
monthly earnings (assuming that other worker characteristics,
such as educational level or job responsibilities, are the same):
Answer: b) c)
d)
Weight of the mail (pounds)   Orders (in 000)
216 6.1
283 9.1
237 7.2
203 7.5
259 6.9
374 11.5
342 10.3
301 9.5
365 9.2
384 10.6
404 12.5
426 12.9
482 14.6
432 13.6
409 12.8
3.15. The evil Swindler has been collecting data on the effect radiation
exposure has on Captain Amazing’s super powers. Here is the
number of minutes of exposure to radiation, paired with the
number of tons Captain Amazing is able to lift:
Your job is to use least squares regression to find the line of best fit,
and then find the correlation coefficient to describe the strength of the
relationship between your line and the data. Sketch the scatter diagram
too. If Swindler exposes Captain Amazing to radiation for 5 minutes,
what weight do you expect Captain Amazing to be able to lift?
3.16. Sample data showing the predicted hours of sunshine and concert
attendance for different events. We can use this to estimate ticket
sales based on the predicted hours of sunshine for the day.
4.7 38
5.5 49
5.9 42
7.2 55
4. TIME SERIES ANALYSIS
4.1. INTRODUCTION
Some effects are relatively stable, and did not show rapid changes in the
scope and structure, so it is enough to follow a year or even five-year
The basic assumption of time-series analysis is that the factors that have
influenced patterns of activity in the past and present will continue to do
so in more or less the same manner in the future. Because of that, the main
aim of time-series analysis is to identify and isolate these influencing
factors for the purpose of prediction.
To achieve this goal, many mathematical models have been devised for
exploring the changes and fluctuations among the component factors
of a time series. The most fundamental models are given for data recorded
annually, quarterly or monthly.
Many sales, production and other time series fluctuate with the seasons.
Typical examples of variables or phenomena with seasonal component are:
consumption of electricity and gas, production of agricultural products,
number of overnight stays in tourism,
intensity of construction, etc.
These deviations are positive in some years and negative in others, and,
in general, do not lead to changes in trend. But if the effect of random
factors is strongly expressed (e.g. in case of war or an earthquake
etc.) then it is possible that their effect (positive or negative) will
lead to changes in the basic course of development of phenomena
(the trend).
28
Source: Somun-Kapetanović R., Statistika u ekonomiji i menadžmentu, Ekonomski fakultet u
Sarajevu, Sarajevo 2008., page 202
When we have more series monitored in the same period then we can apply:
Arithmetic chart (lines)
Connected bars and
Split bars.
Example 4.1.
29
http://www.bhas.ba/new/indikatori.asp?Pripadnost=6, access: 28. 01. 2010.
Graphical presentation of time series by the bar chart.
Graphical presentation of time series by the arithmetic diagram.
Example 4.2.
30
http://www.bhas.ba/new/indikatori.asp?Pripadnost=6, access: 28. 01. 2010.
Graphical presentation of time series with connected bars.
Graphical presentation of time series with split bars.
Example 4.3.
For 2008, we monitored the monthly exports of B&H³¹. The results are given
in the next table:
Graphical presentation of time series with polar diagram.
31
http://www.bhas.ba/new/indikatori.asp?Pripadnost=6, access: 28. 01. 2010.
Year   GDP ('000 KM)   Absolute changes compared to 2000   Relative changes compared to 2000
In the same way as we take the initial year with which comparison
is made, we can take any of the years from a given period. Some
comparisons may always be made with the previous year:
Year   GDP ('000 KM)   Absolute changes compared to previous year   Relative changes compared to previous year
2000 6,722,631 / /
Index numbers are not concerned with absolute values but rather the
movement of values for analyzed variable. Index numbers can provide
summary of changes by aggregating the available information and
enabling a comparison to a starting figure of 100.
Fixed base indices or basis indices always take the same year
as a base year:
There is the connection between the base and chain index, as follows:
We can use this connection for conversion from basic to chain indices
or vice versa.
Also, we can find the connection between the basic indices with different
bases:
We apply the indices to calculate rate of change and vice versa according
to the following link between these parameters:
2006 12,146,338
2007 13,861,000
2008 15,632,000
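The links between base indices, chain indices and rates of change can be illustrated with the three GDP values above. This is a sketch under assumed standard definitions: a base index divides each value by the base-year value, a chain index divides each value by the previous year's value (both multiplied by 100), and the rate of change is the chain index minus 100. Taking 2006 as the base year is our choice for illustration only.

```python
# GDP in '000 KM for the last three years of the table.
gdp = {2006: 12_146_338, 2007: 13_861_000, 2008: 15_632_000}
years = sorted(gdp)
base_year = 2006  # illustrative choice of base

# Base indices: every year compared with the same base year.
base_index = {year: gdp[year] / gdp[base_year] * 100 for year in years}

# Chain indices: every year compared with the previous year.
chain_index = {
    year: gdp[year] / gdp[previous] * 100
    for previous, year in zip(years, years[1:])
}

# Conversion from base to chain indices: ratio of consecutive base indices.
chain_from_base = {
    year: base_index[year] / base_index[previous] * 100
    for previous, year in zip(years, years[1:])
}

# Rate of change in percent: chain index minus 100.
rate_of_change = {year: value - 100 for year, value in chain_index.items()}

for year in years[1:]:
    print(year, round(base_index[year], 2), round(chain_index[year], 2),
          round(rate_of_change[year], 2))
```

The conversion works because the base-year value cancels when two base indices are divided by each other.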
Example 4.4.
Solution:
The rate of change (in this case, an increase) in the third year relative
to the first year is 1.2%.
From the last formula we can express average annual rate r using formula:
Example 4.5.
Known levels of investment in one branch of the economy (in $000) are
given in the next table:
Year Investment
2003 150
2004 184
2005 192
2006 185
2007 187
2008 191
a) What is the average annual growth rate and what does it mean?
b) If we continue this growth per annum, in which year will investment
reach the level of 82% higher than the level of investment in 2003?
c) If the growth per annum stays the same, what investment level can
be expected in 2012?
Solution:
a)
Calculating and
interpreting average
annual growth rate.
b)
Determining of
number of years.
c)
Forecasting
the level of
phenomenon.
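The three parts of Example 4.5 can be sketched as follows, assuming the average annual growth rate is defined through the geometric mean of the yearly changes, G = (Y_last / Y_first)^(1/(n−1)) with r = G − 1. The results printed here are our own calculation and may differ slightly from the book's figures because of rounding.

```python
from math import log

investment = {2003: 150, 2004: 184, 2005: 192, 2006: 185, 2007: 187, 2008: 191}
first_year, last_year = 2003, 2008
periods = last_year - first_year  # 5 yearly changes between 6 observations

# a) Average annual growth factor and rate.
growth_factor = (investment[last_year] / investment[first_year]) ** (1 / periods)
rate = growth_factor - 1

# b) Years needed (counted from 2003) to reach a level 82% above the 2003 level.
years_needed = log(1.82) / log(growth_factor)

# c) Forecast for 2012 if the same average growth continues after 2008.
forecast_2012 = investment[last_year] * growth_factor ** (2012 - last_year)

print(f"average annual growth rate: {rate * 100:.2f}%")
print(f"years needed from {first_year}: {years_needed:.2f}")
print(f"forecast for 2012: {forecast_2012:.2f}")
```

Only the first and last observations enter the calculation, so the result says nothing about the fluctuations in between (for example, the decline from 2005 to 2006).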
For example, the price index means and expresses the common price
variations for all products in consumer basket for observed period.
Example 4.6.
We have information about prices and quantities sold for the four items
in the two periods (2007 and 2008):
Solution:
First, we complete a working table with the sums that we need for the
calculation according to the method of aggregation³²:
Calculating and
interpreting index
of values.
32
The method of aggregation is a simpler option for calculation than the method of weighted averages,
hence we use the method of aggregation. However, the results have to be the same.
Calculating and
interpreting
price aggregate
indices.
The observed volume of the consumer basket with 4 products increased
by 10.64% in 2008 compared to 2007 (by Fisher).
Calculating and
interpreting
quantity aggregate
indices.
The observed price of the consumer basket with 4 products increased
by 16.93% in 2008 compared to 2007 (by Fisher).
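The aggregate indices by the method of aggregation can be sketched in a few lines. Since the price and quantity table for Example 4.6 is not reproduced here, the data below are hypothetical and serve only to show the mechanics; we assume the usual Laspeyres (base-period weights), Paasche (current-period weights) and Fisher (geometric mean of the two) definitions.

```python
from math import sqrt

# Hypothetical prices and quantities for two items in a base and a current period.
p0, q0 = [1.0, 2.0], [10, 5]
p1, q1 = [2.0, 2.0], [10, 10]


def aggregate(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


value_index = aggregate(p1, q1) / aggregate(p0, q0)

laspeyres_price = aggregate(p1, q0) / aggregate(p0, q0)
paasche_price = aggregate(p1, q1) / aggregate(p0, q1)
fisher_price = sqrt(laspeyres_price * paasche_price)

laspeyres_quantity = aggregate(p0, q1) / aggregate(p0, q0)
paasche_quantity = aggregate(p1, q1) / aggregate(p1, q0)
fisher_quantity = sqrt(laspeyres_quantity * paasche_quantity)

print(f"value index:     {value_index * 100:.2f}")
print(f"Fisher price:    {fisher_price * 100:.2f}")
print(f"Fisher quantity: {fisher_quantity * 100:.2f}")
```

The Fisher price index multiplied by the Fisher quantity index equals the value index, which is the basis of index decomposition.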
Or:
Application
of index
decomposition.
Example 4.7.
Solution:
If we need only a general idea of where the trend is going, we can use
our judgment to draw a trend line onto the graph, that is, we use the method
"by eye". The first step in a time series analysis is to plot the original data
and observe any patterns that may occur over time.
The main problem with this method is that several persons drawing
such a trend line will tend to create slightly different lines, and it is
then debatable who has drawn the best line. Also, estimation "by eye" does
not provide an approach that would be appropriate for more complex
further analysis.
This type of trend tries to smooth out the oscillations in original data
series by looking at intervals of time that make sense, finding an average
value and then moving forward by one step and again calculating an
average. The process continues until we reach the end of data set.
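The procedure described above can be sketched as a short function. The series used here is hypothetical. For an even order (such as 4 for quarterly data), a common convention, assumed here, is to centre the result by averaging each pair of neighbouring moving averages so that the values line up with actual time periods.

```python
def moving_average(series, order):
    """Simple moving averages: mean of each window of `order` consecutive values."""
    return [
        sum(series[i:i + order]) / order
        for i in range(len(series) - order + 1)
    ]


def centered_moving_average(series, order):
    """Centred moving averages for an even order: mean of neighbouring averages."""
    averages = moving_average(series, order)
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]


# Hypothetical yearly series.
series = [10, 12, 17, 15, 20, 22]

print(moving_average(series, 3))           # 4 values: one less at each end
print(centered_moving_average(series, 4))  # 2 values: two less at each end
```

The shorter output lists show why long averaging periods are costly: each increase in the order removes more points from the beginning and end of the smoothed series.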
The longer the period selected for constructing the moving averages, the fewer moving averages can be computed and plotted. Selecting moving averages with periods longer than 7 time units (for example years) is therefore usually undesirable, because too many data points would be missing at the beginning and end of the original data set, and an overall impression of the whole series would be difficult to obtain.
By the method of moving averages we smooth ("press") the trend line and eliminate the impact of the residuum. We can then conclude whether a phenomenon has a growing or declining long-term character. When the original series consists of quarterly data, the calculated moving averages of order 4 do not contain seasonal variation, because averaging over a full year eliminates it.
Example 4.8.
We know the data on the treasury bill rates for the period 2000-2009:
Arithmetic
diagram
Arithmetic
diagram
As we can see on the graph, the moving averages of order 3 give a new smoothed (pressed) line.
Graphical presentation of moving averages of order 4
As we can see on the graph, the moving averages of order 4 give a new smoothed (pressed) line. By comparing the graph of the original data with the graph of the fourth-order moving averages, we can recognize the growth trend of the analyzed phenomenon in the observed period. We can conclude that this phenomenon mostly has a growing long-term character.
Example 4.9.
Q3 23
Q4 37
2007-Q1 23
Q2 26
Q3 22
Q4 39
2008-Q1 24
Q2 29
Q3 28
Q4 40
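The moving average of order 4 for quarterly data can be sketched in Python. The sketch uses only the ten quarterly values listed above and assumes centered moving averages (each pair of adjacent order-4 averages is averaged again); the function names are illustrative, and a trailing average such as the one drawn by a charting tool would be aligned differently.

```python
def moving_average(values, order):
    """Simple moving averages: one value for every full window of length `order`."""
    return [sum(values[i:i + order]) / order
            for i in range(len(values) - order + 1)]


def centered_moving_average(values, order):
    """Centered moving averages for an even order: average adjacent simple averages."""
    raw = moving_average(values, order)
    return [(raw[i] + raw[i + 1]) / 2 for i in range(len(raw) - 1)]


# The ten quarterly values listed above (the last ends at 2008-Q4).
quarterly = [23, 37, 23, 26, 22, 39, 24, 29, 28, 40]

ma4 = centered_moving_average(quarterly, 4)
print(ma4)  # six smoothed values; two quarters are lost at each end
```

With an odd order, `moving_average` alone is enough, because each average already falls on an observed period.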
2. On the graph in Excel we select the given line and click the right mouse button:
For Period we enter 4 and get a new line for the moving average of order 4:
In the same way we can get the graph for moving averages of order p.
Trend is the most frequently analyzed component of a time series and is studied as an aid to forecasting.
In this section the main focus is on the least-squares method (LSM) for fitting the best mathematical trend model as a guide for forecasting.
Relative error of trend can be used for comparison of the series expressed
in different units of measure.
Linear trend
Parabolic trend
The same rules for centering the independent time variable apply as in the linear trend model. Parameter a is the estimated intercept, parameter b is the estimated linear time effect on the dependent variable, and parameter c is the estimated quadratic time effect on the dependent variable.
The parameters are estimated by LSM, which gives the system of normal equations:
Exponential trend
LSM cannot be applied directly to the exponential trend model. First we have to linearize the model:
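The linearization step can be sketched in Python. The sketch assumes the exponential trend is written as y = a·b^x, so that log y = log a + x·log b is linear in x and ordinary least squares can be applied to log y; the model form, the function name and the data below are assumptions made for illustration only.

```python
import math


def fit_exponential_trend(x, y):
    """Fit y = a * b**x by least squares on log(y) = log(a) + x * log(b)."""
    n = len(x)
    log_y = [math.log(value) for value in y]
    sum_x = sum(x)
    sum_ly = sum(log_y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xly = sum(xi * li for xi, li in zip(x, log_y))
    slope = (n * sum_xly - sum_x * sum_ly) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_ly - slope * sum_x) / n
    # Transform the linear coefficients back to the original scale.
    return math.exp(intercept), math.exp(slope)


# Hypothetical series that grows by exactly 10% per period from a level of 100.
x = [-2, -1, 0, 1, 2]
y = [100 * 1.1 ** xi for xi in x]

a, b = fit_exponential_trend(x, y)
print(round(a, 4), round(b, 4))
```

Because the fit minimizes squared errors of log y rather than of y, the result can differ slightly from a direct nonlinear fit when the data do not follow an exponential curve exactly.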
Arithmetic
diagram
33
http://www.bhas.ba/new/indikatori.asp?Pripadnost=6, access: 28. 01. 2010.
Centering independent time variable - odd number of data.
Year y x
2000 6,722,631 -4
2001 7,273,874 -3
2002 7,942,665 -2
2003 9,688,863 -1
2004 10,321,440 0
2005 10,831,267 1
2006 12,146,338 2
2007 13,861,000 3
2008 15,632,000 4
Now we can apply linear trend model. First we need sums from working
table:
Year y x x² x·y
2000 6,722,631 -4 16 -26,890,524
2001 7,273,874 -3 9 -21,821,622
2002 7,942,665 -2 4 -15,885,330
2003 9,688,863 -1 1 -9,688,863
2004 10,321,440 0 0 0
2005 10,831,267 1 1 10,831,267
2006 12,146,338 2 4 24,292,676
2007 13,861,000 3 9 41,583,000
2008 15,632,000 4 16 62,528,000
Total 94,420,078 0 60 64,948,604
Calculating and
interpreting of
linear trend model
coefficients.
Determining linear
trend model.
Year y yt (y - yt)²
2000 6,722,631 6,161,213 315,190,345,387.40
2001 7,273,874 7,243,690 911,099,344.89
2002 7,942,665 8,326,166 147,073,255,623.94
2003 9,688,863 9,408,643 78,523,223,491.56
2004 10,321,440 10,491,120 28,791,226,986.72
2005 10,831,267 11,573,597 551,053,103,066.46
2006 12,146,338 12,656,073 259,830,019,428.84
2007 13,861,000 13,738,550 14,994,007,942.22
2008 15,632,000 14,821,027 657,677,675,291.26
Total 94,420,078 94,420,078 2,054,043,956,563.29
Calculating standard
error of trend.
The other way is to calculate relative error of trend that is given by trend
coefficient of variation:
Calculating trend
coefficient of
variation.
This value of the relative error of trend is low, so we can say that the linear trend model is representative of the original data set.
Now we can forecast the next period, for example 2010, assuming that the trend remains the same:
Forecasting values
of phenomenon for
the next period.
Assuming that the trend remains the same, we can expect that GDP for
2010 will be 16,985,980,200 KM.
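The calculations of this example can be cross-checked with a short Python sketch. It uses the GDP values from the working table with the time variable centered on 2004; the variable names are illustrative, and dividing the sum of squared deviations by n in the standard error of trend is an assumption of the sketch.

```python
import math

years = list(range(2000, 2009))
gdp = [6722631, 7273874, 7942665, 9688863, 10321440,
       10831267, 12146338, 13861000, 15632000]

# Centered time variable: the middle year (2004) gets x = 0, so sum(x) = 0.
x = [year - 2004 for year in years]
n = len(gdp)

# With sum(x) = 0 the normal equations reduce to these two formulas.
a = sum(gdp) / n
b = sum(xi * yi for xi, yi in zip(x, gdp)) / sum(xi * xi for xi in x)

# Trend values and the sum of squared deviations from the trend.
trend = [a + b * xi for xi in x]
ss = sum((yi - ti) ** 2 for yi, ti in zip(gdp, trend))

# Standard error of trend (assumed divisor: n) and its relative form in percent.
std_error = math.sqrt(ss / n)
coef_variation = 100 * std_error / a

# Forecast for 2010, where x = 2010 - 2004 = 6.
forecast_2010 = a + b * (2010 - 2004)

print(round(a, 2), round(b, 2))
print(round(coef_variation, 2))
print(round(forecast_2010, 1))
```

The forecast agrees with the value in the text once the unit of the table (thousands of KM) is taken into account.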
At the end, we can apply the trend isolation method. We need the original and predicted data:
Where the trend isolation expression has a value higher than 100 (years: 2000, 2001, 2003, 2007, 2008), the residuum has a positive impact on GDP movement. Where the trend isolation expression has a value lower than 100 (years: 2002, 2004, 2005, 2006), the residuum has a negative impact on GDP movement. We can present this on the graph:
Graphical presentation of trend isolation.
When the line for trend isolation is above 100, the residuum has a positive impact on GDP movement. When the line for trend isolation is below 100, the residuum has a negative impact on GDP movement.
Example 4.10.
2005 891
2006 992
2007 1110
2008 1148
Arithmetic
diagram
Year y x
1999 581 -4.5
2000 581 -3.5
2001 590 -2.5
Now we can apply linear trend model. First we need sums from the
working table:
Year y x x² x·y
1999 581 -4.5 20.25 -2,614.5
2000 581 -3.5 12.25 -2,033.5
2001 590 -2.5 6.25 -1,475
2002 620 -1.5 2.25 -930
2003 699 -0.5 0.25 -349.5
2004 781 0.5 0.25 390.5
2005 891 1.5 2.25 1,336.5
2006 992 2.5 6.25 2,480
2007 1,110 3.5 12.25 3,885
2008 1,148 4.5 20.25 5,166
Total 7,993 0 82.5 5,855.5
Calculating and
interpreting of
linear trend model
coefficients.
Determining linear
trend model.
Calculating linear trend model using Excel.
Using Excel we will check the quality of the given linear trend model. On the Excel graph we select Add trend line and tick the options for the model equation and the R square value:
Result is:
Assuming that the trend remains the same, we can expect that actual gross revenue in 2011 will be 1,331.65 million dollars.
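Centering with an even number of observations can be sketched in Python as follows. The helper `centered_time` is a hypothetical name; it produces the half-step values -4.5, ..., 4.5 used in the working table. The forecast from unrounded coefficients is about 1,331.62, which differs slightly from 1,331.65 above, presumably because the coefficients there were rounded first.

```python
def centered_time(n):
    """Time variable with step 1 whose sum is zero (half-steps when n is even)."""
    return [i - (n - 1) / 2 for i in range(n)]


# Gross revenue for 1999-2008 from the working table (million dollars).
revenue = [581, 581, 590, 620, 699, 781, 891, 992, 1110, 1148]

x = centered_time(len(revenue))          # -4.5, -3.5, ..., 4.5
a = sum(revenue) / len(revenue)          # intercept = mean, since sum(x) = 0
b = sum(xi * yi for xi, yi in zip(x, revenue)) / sum(xi * xi for xi in x)

# 2011 lies three steps after 2008, so x = 4.5 + 3 = 7.5.
x_2011 = x[-1] + (2011 - 2008)
forecast_2011 = a + b * x_2011

print(round(a, 2), round(b, 2), round(forecast_2011, 2))
```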
Example 4.11.
The following annual time series for the number of passengers (in
millions) on a particular airline is given:
Arithmetic
diagram
Set up independent time variable without centering.
Year y x
2000 30 1
2001 32.7 2
2003 36 4
2004 37.9 5
2005 39.2 6
2007 43.1 8
2008 45 9
2009 47.8 10
Now we can apply linear trend model. First we need sums from the
working table:
Year y x x² x·y
2000 30 1 1 30
2001 32.7 2 4 65.4
2003 36 4 16 144
2004 37.9 5 25 189.5
2005 39.2 6 36 235.2
2007 43.1 8 64 344.8
2008 45 9 81 405
2009 47.8 10 100 478
Total 311.7 45 327 1,891.9
Calculating and
interpreting of linear
trend model
coefficients.
Determining linear
trend model.
We will again use the same Excel procedure as in the previous example
to check quality of given linear trend model. The result is:
Graphical presentation of linear trend.
Assuming that the trend remains the same, we can expect that the number of passengers in 2013 will be 52.8 million.
Example 4.12.
Example 4.13.
Year Trade
2000 395
2001 459
2002 558
2003 607
2004 751
2005 816
2006 956
2007 1,137
2008 1,328
But if we choose exponential model in option Add trend line, the result
will be:
The R square value is higher for the exponential model, hence we decide to apply the exponential model, y = 6E-174·e^(0.2024·x), for forecasting retail trade turnover in company ABS.
4.1. Data about certain phenomenon are collected for the period 1998-
2004:
Solution:
Year Phenomenon Chain indices
1998 325 /
1999 338 104.00
2000 346 102.37
2001 342 98.84
2002 357 104.39
2003 359 100.56
2004 365 101.67
- chain index
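The chain indices in the table can be reproduced with a few lines of Python. Each index compares a year with the immediately preceding year; the variable names are illustrative.

```python
values = {1998: 325, 1999: 338, 2000: 346, 2001: 342,
          2002: 357, 2003: 359, 2004: 365}

years = sorted(values)

# Chain index: current value divided by the previous value, times 100.
chain = {year: round(100 * values[year] / values[previous], 2)
         for previous, year in zip(years, years[1:])}

for year in years[1:]:
    print(year, chain[year])
```

An index above 100 indicates growth relative to the previous year (for example 104.00 in 1999), while an index below 100 indicates a decline (98.84 in 2001).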
4.2. Data about certain phenomenon are collected in the period 1996-
2002.
Solution:
Solution:
Year   Number of graduate students - yi   xi   xi²   xi·yi   yti
2000   100   -5   25   -500   104.24
2001   112   -3   9    -336   110.61
2002   120   -1   1    -120   116.98
2003   127   1    1    127    123.36
2004   129   3    9    387    129.73
2005   133   5    25   665    136.10
Σ      721   0    70   223
b) In 2002.
4.4. Values of investment in the car industry (000 $) in the period 1999-2003 are given in the following table:
Year Investment
1999 185
2000 187
2001 191
2002 188
2003 193
Solution:
a)
Calculating and interpreting the average absolute growth.
Investment in the car industry increases by 2000 $ annually, on average.
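The average absolute growth can be checked in Python. The sketch also computes an average growth rate as a geometric mean of the yearly changes; that formula is the usual definition but is an assumption here, since the corresponding solution is not shown.

```python
# Investment in the car industry, 1999-2003, in thousands of dollars.
investment = [185, 187, 191, 188, 193]

periods = len(investment) - 1

# Average absolute growth: total change divided by the number of periods.
avg_abs_growth = (investment[-1] - investment[0]) / periods

# Average growth rate in percent (assumed: geometric mean of yearly ratios).
avg_growth_rate = ((investment[-1] / investment[0]) ** (1 / periods) - 1) * 100

print(avg_abs_growth)             # in thousands of dollars per year
print(round(avg_growth_rate, 2))  # in percent per year
```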
b)
c)
d)
4.5. Quantities and prices for the three products (A, B and C) in period
1998 – 1999 are presented in table below:
Product   Quantities        Prices
          1998    1999      1998    1999
A         10      11        61      65
B         4       5         54      37
C         5       6         82      83
Solution:
q0 q1 p0 p1 p0 . q0 p1 . q1 p0 . q1 p1 . q0
10 11 61 65 610 715 671 650
4 5 54 37 216 185 270 148
5 6 82 83 410 498 492 415
Total 1236 1398 1433 1213
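The sums in the working table are all that is needed for the aggregate indices. The following Python sketch assumes the standard Laspeyres, Paasche and Fisher definitions (base-period weights, current-period weights, and their geometric mean); the function and variable names are illustrative.

```python
import math

q0, q1 = [10, 4, 5], [11, 5, 6]        # quantities 1998, 1999
p0, p1 = [61, 54, 82], [65, 37, 83]    # prices 1998, 1999


def aggregate(prices, quantities):
    """Sum of price times quantity over all products."""
    return sum(p * q for p, q in zip(prices, quantities))


value_index = 100 * aggregate(p1, q1) / aggregate(p0, q0)

# Price indices: quantities are held fixed, prices change.
laspeyres_price = 100 * aggregate(p1, q0) / aggregate(p0, q0)
paasche_price = 100 * aggregate(p1, q1) / aggregate(p0, q1)
fisher_price = math.sqrt(laspeyres_price * paasche_price)

# Quantity indices: prices are held fixed, quantities change.
laspeyres_quantity = 100 * aggregate(p0, q1) / aggregate(p0, q0)
paasche_quantity = 100 * aggregate(p1, q1) / aggregate(p1, q0)
fisher_quantity = math.sqrt(laspeyres_quantity * paasche_quantity)

print(round(value_index, 2))
print(round(fisher_price, 2), round(fisher_quantity, 2))
```

The product of the Fisher price and quantity indices (divided by 100) reproduces the index of values, which is the index decomposition used in this chapter.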
a)
Year Phenomenon Y
1997 28
1998 36
1999 33
2000 39
2001 41
2002 40
2003 45
Solution:
t yi xi xi² xi·yi yti
1997 28 -3 9 -84 30.26
1998 36 -2 4 -72 32.65
1999 33 -1 1 -33 35.04
2000 39 0 0 0 37.43
2001 41 1 1 41 39.82
2002 40 2 4 80 42.21
2003 45 3 9 135 44.6
Σ 262 0 28 67
a) Arithmetic diagram:
c)
a) Calculate and explain the basic indices with the base in 1999.
b) Calculate and explain relative change.
c) Calculate and explain the average growth rate.
d) Calculate and explain the average absolute growth.
4.9. Data on meat, milk and cheese prices and quantities produced for
the period 1996-1998 are presented in the table below:
Products          Production              Prices
                  1996   1997   1998      1996    1997    1998
Meat (000 kg)     30     33     35        10.00   10.50   11.00
Milk (000 l)      25     27     30        1.10    1.20    1.25
Cheese (000 kg)   10     12     15        6.00    6.50    7.00
Answer:
a)
b)
c)
Year Investments
1996 175
1997 250
1998 280
1999 300
2000 350
2001 400
2002 480
2003 565
2004 690
2005 720
4.11. Average net salary in Bosnia and Herzegovina in the period 1998
- 2003 was:
7 3201
8 3075
9 3094
5. PROBABILITY AND THEORETICAL DISTRIBUTIONS
5.1. INTRODUCTION
Consider a standard deck of cards that has 26 red and 26 black cards. The probability of selecting a black card is:
p(black card) = 26/52 = 0.5
For example, if a poll is taken and 57% of the respondents indicate that they prefer candidate X, there is a 0.57 probability that a randomly selected respondent prefers candidate X.
When the outcome of one event does not affect the probability of occurrence of another event, the events are independent.
For example, if we roll a die and toss a coin at the same time, the outcomes are independent, as the outcome of the die does not influence the outcome of the coin. However, if we select two cards from a deck without replacement, the outcome of the second selection will be influenced by the first selection. The probability of getting a face card in the first selection is 12/52. But the probability of getting a face card in the second selection depends on the outcome of the first selection, and it is:
12/51, if the outcome of the first selection was not a face card,
For example, if two dice (black and white) are rolled, the events “4 on the black die” and “2 on the white die” are independent, and the probability that these two events occur simultaneously is 1/6 · 1/6 = 1/36.
Example 5.1.
Example 5.2.
Solution:
The probability that a randomly selected person is a woman, given that the person has a bachelor degree in business, is 26.86%.
Example 5.3.
Solution:
Event A – first computer will work for 1 year before any repair is needed
Event B – second computer will work for 1 year before any repair is needed
Event C – third computer will work for 1 year before any repair is needed
Example 5.4.
Illustration: general multiplication rule for independent events and general addition rule for events that are not mutually exclusive.
Suppose that the probability that you will get an A in Statistics is 0.65 and that the probability that you will get an A in Organizational Behaviour is 0.8. If these events are independent, what is the probability that:
a) you will get an A in both subjects,
b) you will get at least one A?
Solution:
b)
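Both parts of Example 5.4 follow from the multiplication rule for independent events and the general addition rule. A minimal Python check, with illustrative variable names:

```python
p_statistics = 0.65   # probability of an A in Statistics
p_behaviour = 0.8     # probability of an A in Organizational Behaviour

# a) Independent events: multiply the probabilities.
p_both = p_statistics * p_behaviour

# b) General addition rule: subtract the overlap counted twice.
p_at_least_one = p_statistics + p_behaviour - p_both

print(round(p_both, 2), round(p_at_least_one, 2))
```

Part b) can equally be obtained through the opposite event, as 1 minus the probability of getting no A at all, (1 - 0.65)·(1 - 0.8).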
Example 5.5.
Solution:
a) Contingency table
b) Simple events:
respondent chosen at random is male
respondent chosen at random is female
respondent chosen at random enjoys shopping for clothes
respondent chosen at random does not enjoy shopping for clothes
Joint events:
a female and enjoys shopping for clothes
a female and does not enjoy shopping for clothes
a female or does not enjoy shopping for clothes
d) i.
ii.
iii.
iv.
v.
vi.
vii.
Example 5.6.
Application of Bayes' theorem.
Suppose that a school has 60% boys and 40% girls. Half of the girls wear trousers and the other half wear skirts, while all boys wear trousers. An observer sees a (random) student from a distance; all they can see is that this student is wearing trousers. What is the probability this student is a girl?
Solution:
It is clear that the probability is less than 40%, but by how much? Is it half that, since only half of the girls are wearing trousers? The correct answer can be computed using Bayes' theorem.
The event A is that the observed student is a girl, and the event B is that the observed student is wearing trousers. In order to compute p(A/B), we first need to determine:
Given all this information, the probability of the observer having spotted a girl given that the observed student is wearing trousers can be computed by substituting these values in the formula:
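The substitution can be sketched in Python using the figures stated in the example (40% girls, half of whom wear trousers, and all boys wearing trousers). The variable names are illustrative.

```python
p_girl = 0.4                   # p(A): the student is a girl
p_boy = 0.6
p_trousers_given_girl = 0.5    # p(B/A): a girl wears trousers
p_trousers_given_boy = 1.0     # all boys wear trousers

# Total probability of seeing a student in trousers, p(B).
p_trousers = p_trousers_given_girl * p_girl + p_trousers_given_boy * p_boy

# Bayes' theorem: p(A/B) = p(B/A) * p(A) / p(B).
p_girl_given_trousers = p_trousers_given_girl * p_girl / p_trousers

print(round(p_trousers, 2), round(p_girl_given_trousers, 2))
```

The result, 0.25, is indeed less than 40%, but it is not simply half of it, because the boys enlarge the group of students wearing trousers.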
34
f(x) is the probability density function of a continuous random variable.
Example 5.7.
Solution:
Probability that he will realise three sales among these five contacts is
23%.
Mean
Variance
Shape
Example 5.8.
where:
• x is number of events per unit (number of successes per unit)
• p(x) is the probability of x successes given the knowledge of λ
• λ is the average or expected number of events per unit
(average or expected number of successes per unit)
• e=2.71828 (constant)
The horizontal axis is the index k. The function is only defined at integer
values of k (empty lozenges). The connecting lines are only guides for the eye.
Example 5.9.
Solution:
a)
b)
There is a 32.3% chance that, out of 2,000 individuals, more than 2 will suffer a bad reaction.
Solution:
Shape
Poisson distribution is always positively (right) skewed.
Mean
Variance
Example 5.11.
P(X ≤ 2) = {=POISSON(2;8.4;TRUE)} = 0.010047, that is 1.0047%
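The same cumulative Poisson probability can be cross-checked in Python with the standard library only. The sketch uses the usual Poisson formula p(x) = λ^x·e^(-λ)/x! with λ = 8.4 and x ≤ 2 taken from the Excel call; the function names are illustrative.

```python
import math


def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number is lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def poisson_cdf(x, lam):
    """Probability of at most x events (sum of the point probabilities)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


p_at_most_2 = poisson_cdf(2, 8.4)
print(round(p_at_most_2, 6))
```

The probability of the opposite event, more than 2 events, is obtained as 1 - poisson_cdf(2, 8.4).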
Example 5.12.
N = 32, n = 8, M = 22, k = 3
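A hypergeometric probability for these parameters can be sketched in Python. The sketch assumes the usual reading of the symbols: N is the population size, M the number of elements with the observed property in the population, n the sample size and k the number of such elements in the sample. This mapping is an assumption, and the function name is illustrative.

```python
from math import comb


def hypergeometric_pmf(k, N, M, n):
    """Probability of k marked elements in a sample of n drawn without replacement."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


p = hypergeometric_pmf(k=3, N=32, M=22, n=8)
print(round(p, 4))
```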
Example 5.13.
Finally, the probability that there are not more than 2 incorrect products is the sum of the previously computed probabilities (“or” probability for mutually exclusive events): 0.931034, or 93.1%.
Proof:
since it is:
We used following:
Since it is:
and
Finally:
1.
2.
3.
4.
Example 5.14.
Illustration of Normal distribution and standardized Normal distribution. Application of standardized Normal distribution rules. Use of statistical tables.
The tread life of a certain brand of tire has a normal distribution with mean 35,000 miles and standard deviation 4,000 miles. For a randomly selected tire, what is the probability that its life is:
a) less than 37,200 miles
b) more than 38,000 miles
c) between 30,000 and 36,000 miles
d) less than 34,000 miles
e) more than 33,000 miles.
Solution:
a)
b)
c)
d)
e)
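The five probabilities of Example 5.14 can be cross-checked with Python's standard library class `statistics.NormalDist`, which performs the standardization internally. The variable names are illustrative, and small differences from table values are due to rounding of z.

```python
from statistics import NormalDist

tire_life = NormalDist(mu=35000, sigma=4000)

p_a = tire_life.cdf(37200)                         # a) less than 37,200
p_b = 1 - tire_life.cdf(38000)                     # b) more than 38,000
p_c = tire_life.cdf(36000) - tire_life.cdf(30000)  # c) between 30,000 and 36,000
p_d = tire_life.cdf(34000)                         # d) less than 34,000
p_e = 1 - tire_life.cdf(33000)                     # e) more than 33,000

for label, p in zip("abcde", (p_a, p_b, p_c, p_d, p_e)):
    print(label, round(p, 4))
```

For "more than" questions the cumulative function is applied first and the opposite event is taken afterwards, exactly as with the statistical tables.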
Solution by using Excel.
First we have to standardize, that is, transform x into z.
This is the table value of the cumulative function, because z is positive and the relation is <. We are not asking for the probability at a point but for the cumulative function, so for the option Cumulative we take True:
This is not the table value of the cumulative function, because z is positive and the relation is >. We are not looking for the probability at a point but for the cumulative function, so for the option Cumulative we take True, but at the end we apply the formula for the opposite event:
And
and
Example 5.15.
Solution:
We can also use an Excel function to obtain the result. This is the inverse situation: we know the probability and need to find z and x for that probability. We will use the Excel function NORMINV:
Example 5.16.
A journal editor finds that the length of time that elapses between receipt of a manuscript and a decision on publication follows a normal distribution with mean 18 weeks and standard deviation 4 weeks. If the probability that it will take longer is 0.2, how many weeks will pass before a decision on a manuscript is made?
Solution:
We can also use an Excel function. This is the opposite of the table cumulative value, so we find z for the table value (1-0.2) = 0.8.
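The inverse calculation of Example 5.16 can be sketched in Python, where `NormalDist.inv_cdf` plays the role of the inverse normal function. The cumulative probability 0.8 is the complement of the stated probability 0.2 of taking longer; the variable names are illustrative.

```python
from statistics import NormalDist

decision_time = NormalDist(mu=18, sigma=4)

# P(X > x) = 0.2 is equivalent to P(X <= x) = 0.8.
z = NormalDist().inv_cdf(1 - 0.2)
weeks = decision_time.inv_cdf(1 - 0.2)

print(round(z, 2), round(weeks, 2))
```

The two results are linked by x = mean + z·standard deviation, which is the standardization formula solved for x.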
Example 5.17.
mi - observed frequency
ei - expected (theoretical) frequency.
Example 5.18.
We can also obtain the same result by using Excel function. When
, it is direct relation for Excel function CHIINV.
5.14. F DISTRIBUTION
Example 5.19.
We can also apply an Excel solution to this problem. The relation is >, so we can directly apply the Excel function FINV:
Solution:
The sample space is the collection of all possible outcomes. In our example, the possible outcomes on each of the dice are the numbers 1, 2, 3, 4, 5 and 6. Hence, the sample space is the set of all possible pairs of numbers, where the first number represents the result recorded on the first die and the second number represents the result recorded on the second die. Therefore, the required sample space is the following set:
Solution:
The probability that “three” will appear on both dice is equal to:
Notice that (in the same way as under a), we concluded that
satisfied if “three” will appear on the first dice or “three” will appear
on the second dice (both of last two events include the case that “three”
will appear on both dice). We conclude that event C will be realized if
any of events A and B is realized. Therefore: .
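The sample space of two dice is small enough to enumerate, so both probabilities can be verified by counting. In the Python sketch below, A is taken to be "three on the first die" and B "three on the second die"; this labeling and the helper name `probability` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """Share of equally likely outcomes for which the event holds."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


p_a = probability(lambda o: o[0] == 3)          # three on the first die
p_b = probability(lambda o: o[1] == 3)          # three on the second die
p_both = probability(lambda o: o == (3, 3))     # A and B
p_either = probability(lambda o: 3 in o)        # A or B

print(p_both, p_either)
```

Counting confirms the addition rule: p(A or B) = p(A) + p(B) - p(A and B), since the outcome (3, 3) belongs to both A and B and must not be counted twice.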
Solution:
c) Calculated in part b) :
5.4. A direct retailer can receive orders either from its catalogue or by repeat-customer order forms or by phone. The orders are classified as small, medium and large. The data about the last 1000 orders are given in the table below:
Contingency table, multiplication and addition rule, conditional probability.
Solution:
P – order by phone;
S – small order;
M – medium order;
L – large order;
a)
c)
d) In this case, the event L is already realized (we know that the order is large), so this is a conditional probability:
Solution:
The events:
b)
c)
5.6. Suppose that 5 out of 100 men and 25 out of 10000 women are colour blind and suppose that the number of men equals the number of women.
Application of Bayes' theorem.
Solution:
The events:
M – person is male,
F – person is female,
D – person is colour blind:
The events M and F are mutually exclusive and their union covers the
entire sample space (each person is a male or female).
a)
Solution:
c)
d)
f)
g)
Solution:
b)
c)
d)
e)
Solution:
a)
b)
c)
Graphical presentation:
Solution:
a)
b)
Answer: 86%
Answer: 45%
Forecast
Outcome
improvement about the same Worse
improvement 218 82 66
about the same 106 153 75
worse 75 84 141
Answer: 93.81%
Solution:
b) the probability that X is more than 93 and less than 162 (between 93
and 162)
Answer: 2.28%
a) What is the probability that four people drawn at random all have
blue eyes?
b) What is the probability that two individuals out of four in a sample
have blue eyes?
c) Calculate the mean and variance of blue eyed individuals in the
previous exercise
Answer:
a) 12.96%
b)
Solution:
a)
Answer: 0.05
Answer: 0.19
5.25. 512 out of 1000 newborns are boys. Find the probability that a
newborn is a boy.
Answer: 0.512
5.26. One card is drawn from the deck of cards consisting of 32 cards
(from 7 to ace). Find the probability that the drawn card is ace or
king.
Answer: 0.25
5.29. On the basis of experience, 10% of all shoes made in certain shoe
factory are damaged. Find the probability that:
5.30. If dice is thrown 10 times, find the probability that “four” falls 3
times.
Answer: 15.5%
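The answer to exercise 5.30 can be verified with the binomial formula in Python: n = 10 throws, k = 3 occurrences of "four", and success probability p = 1/6 for a fair die. The function name is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_three_fours = binomial_pmf(k=3, n=10, p=1 / 6)
print(round(100 * p_three_fours, 1))  # in percent
```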
Answer: 13.78%
Answer: 27.385%
Answer: 0.135%
6. INFERENTIAL STATISTICS
6.1. INTRODUCTION
35
ibidem
where:
• ϕ - statistic from sample
• - parameter from population
• h - surroundings
• (1 – α) - confidence
• α - first type error
36
In the most common use of hypothesis testing, a „straw man“ null hypothesis is put forward and it is determined whether the data are strong enough to reject it. For the sleep deprivation study, the null hypothesis would be that sleep deprivation has no effect on performance.
36
Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1
The samples are chosen randomly and the values of point estimators are random variables. The values of these variables are randomly
This proves that the arithmetic mean of the arithmetic means of the samples is equal to the arithmetic mean of the population. That means that we have an unbiased estimate of the population parameter (mean).
We can say that the larger the sample size n, the closer the sampling distribution of the sample mean is to normal. In other words, a larger n means a better approximation.
The standard error of the mean depends on the sample size (n): a larger sample leads to a smaller standard error of the mean.
where:
• is the sample mean
• z is the upper critical value for the standard
normal distribution and depends on required confidence
• is the standard error of the mean.
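A confidence interval of this form, sample mean plus and minus z times the standard error, can be sketched in Python. The standard error is taken as σ/√n, and all numbers in the usage example (mean 50, σ = 10, n = 100, 95% confidence) are hypothetical values chosen for illustration, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Interval for the population mean when the population sigma is known."""
    # Upper critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sigma / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample summary.
low, high = z_confidence_interval(sample_mean=50, sigma=10, n=100, confidence=0.95)
print(round(low, 2), round(high, 2))
```

Raising the confidence level widens the interval, while a larger sample narrows it through the smaller standard error.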
If we know the standard deviation of the population, there are some rules for determining sample size:
• In most applications, a sample size of n = 30 is adequate.
• If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
• If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.
• If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.
where:
• is the sample mean
• t is the upper critical value for the t distribution
with (n-1) degrees of freedom,
• is approximation for the standard error of the
mean
Example 6.1.
Solution:
Confidence interval of the population mean with unknown population standard deviation, large sample.
Example 6.2.
NGOs often argue in public that millionaires should be required to donate to charity. Hence, we take a sample of 19 millionaires and conduct a survey to find out what percent of their income the average millionaire donates to charity. The mean percent in the observed sample
Solution:
Example 6.3.
Solution:
Example 6.4.
Solution:
37
http://www.doingbusiness.org/CustomQuery/, predictions for 2009. year, access: 13. 12. 2009.
where:
• is the proportion in the sample,
• z depends on the level of desired confidence, and
• σ , the standard error of a proportion, is equal to:
where:
• pA is the proportion of the population and
• n is the sample size.
Example 6.5.
Solution:
• small sample
• large sample
Example 6.6.
This is a large sample, so the confidence interval is based on the normal distribution:
Solution:
This is a small sample, so the confidence interval is based on the chi-square distribution:
With 95% certainty, the population variance lies in the interval [7.3, 26.95].
When sample data is collected and the sample mean is calculated, that
sample mean is typically different from the population mean μ. This
difference between the sample and population means can be thought of
as an error.
where:
• z_{α/2} is known as the critical value, the positive z value that is at the vertical boundary for the area of α/2 in the right tail of the standard normal distribution.
• σ is the population standard deviation.
• n is the sample size.
Rearranging this formula, we can get the expression for the sample size necessary to produce results accurate to a specified confidence and margin of error:
Although it’s unlikely that you know σ when the population mean is not
known, you may be able to determine σ from a similar process or from
a pilot test/simulation.
Example 6.8.
Illustration of determining sample size for estimating population mean.
We want to estimate the average mobile phone bill of the inhabitants of a capital. Studies obtained elsewhere find a standard deviation of $25. The group wants to estimate the average bill within a specified margin of error of the true average and with 95% confidence. Determine the size of the sample needed.
Solution:
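The sample size calculation can be sketched in Python using n = (z·σ/E)², rounded up to the next whole number. The standard deviation ($25) and the 95% confidence come from the example; the margin of error E = 5 dollars is a hypothetical value assumed only for illustration, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin_of_error) ** 2)


# sigma = 25 and 95% confidence are from the example; E = 5 is an assumed value.
n_required = sample_size_for_mean(sigma=25, margin_of_error=5, confidence=0.95)
print(n_required)
```

The result is always rounded up, because rounding down would give a margin of error slightly larger than required.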
where:
• z_{α/2} is known as the critical value, the positive value that is at the vertical boundary for the area of α/2 in the right tail of the standard normal distribution.
• pA is the proportion of population.
• n is the sample size.
This formula can be used when you know pA and want to determine the sample size necessary to establish, with a confidence of (1 − α), the proportion for population within a given margin of error.
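For a proportion, the corresponding sample size formula is n = z²·p·(1 − p)/E². A minimal Python sketch follows; the values p = 0.5, E = 0.03 and 95% confidence are hypothetical and chosen only for illustration, as is the function name.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, margin_of_error, confidence):
    """Smallest n that keeps the margin of error of a proportion within the limit."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# Hypothetical inputs: p = 0.5 gives the largest (most cautious) sample size.
n_required = sample_size_for_proportion(p=0.5, margin_of_error=0.03, confidence=0.95)
print(n_required)
```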
You can still use this formula if you don’t know your population
proportion and you have a proportion from sample:
Example 6.9.
Solution:
38
Levine D.M. and others, Statistics for Managers Using Microsoft Excel, Prentice Hall, NY,
2005., p. 332
39
Levine D.M. and others, Statistics for Managers Using Microsoft Excel, Prentice Hall,
NY, 2005., p. 333
The sampling distributions of the test statistics are divided into two
regions:
Region of rejection (critical region) and
Region of non-rejection.
Next table illustrates the results of two possible decisions (do not reject
H0 or reject H0) that can occur in any hypothesis test. Depending on the
specific decision, one of two types of errors may occur or one of two
types of correct conclusion may be reached.
Statistical decision   Actual situation: H0 true                Actual situation: H0 false
do not reject H0       Correct decision, Confidence = (1-α)     Type II error, p(type II error) = β
reject H0              Type I error, p(type I error) = α        Correct decision, Power = (1-β)
We begin with the problem of testing the null hypothesis that the population mean is equal to, higher than or lower than some specified value μ0. The procedure for selecting the appropriate test depends on the answer to the question: “Do we know the standard deviation of the population or of the sample?” If we know only the standard deviation of the sample, we have to decide which theoretical distribution to apply according to the sample size.
To use the one-sample test about the mean, the obtained numerical data are assumed to represent a random sample from a population that is normally distributed. In practice, as long as the sample size is not very small and the population is not very skewed, the Student t distribution provides a good approximation to the sampling distribution of the mean when the population variance is unknown. When a large sample is available, the sample standard deviation estimates the population standard deviation precisely enough that there is little difference between the t and z distributions. Therefore, for a large sample, a z test can be used instead of a t test when the population variance is unknown.
1. Two-tailed test
1.
2.
3.
4.
2. One-tailed test
a. Lower boundary
1.
2.
3.
4.
b. Upper boundary
1.
2.
3.
4.
1. Two-tailed test
1.
2.
3.
4.
2. One-tailed test
a. Lower boundary
1.
2.
3.
4.
b. Upper boundary
1.
2.
3.
4.
1. Two-tailed test
1.
2.
3.
4.
2. One-tailed test
a. Lower boundary
1.
2.
3.
4.
b. Upper boundary
1.
2.
3.
4.
Example 6.10.
Solution:
1.
2.
3.
4.
There is evidence that the mean amount in the bottles is different from
2.0 liters.
Example 6.11.
Solution:
1.
2.
3.
4.
Example 6.12.
Solution:
1.
2.
3.
4.
There is evidence that the average number of products is more than 350.
A t test can be used with large or small samples. However, as the sample
size becomes smaller, mean differences have to be larger to become
significant. In addition to the requirement of continuous measurement,
the t test assumes that the variable being measured is normally
distributed in the population from which the sample was selected. Even
when the sample distribution is mildly skewed, it may be reasonable
to assume a normal distribution for the variable in the population.
However, when the sample distribution is badly skewed or you
doubt that the variable is normally distributed in the population, you
should not use a t test. As an alternative, you can compare medians or
convert the continuous data to a set of intervals and conduct a chi-square
test.
There are two main types of test for the significance of the difference
between two means when the population variances are unknown:
1.
1.
2.
3.
4.
2.
1.
2.
3.
4.
Example 6.13.
(Test of the difference between two population means, large samples.)
Let’s imagine that a new soft drink has been developed and its
manufacturers claim that it boosts memory recall. We need to test
whether or not the drink is effective. We start by collecting two random
samples, each of 100 students. We give all students a soft drink, but one
group receives the new drink (Total-Recall) and the other receives a
sugar water drink (this is known as a placebo). All 200 students think
they have received the memory drink. The students all take a memory
recall test, with the following results:
Group 1 (Total-Recall): Mean Score: 55; Standard Deviation: 12 marks
Group 2 (placebo): Mean Score: 51.8; Standard Deviation: 9 marks
The difference in the Mean Scores between the two groups is 3.2 marks,
in favour of the Total-Recall drink. Is this result significant (α = 1%)?
Solution:
1.
2.
3.
4.
The difference of 3.2 marks in the Mean Scores between the two groups
is not statistically significant at the 1% level.
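The conclusion of Example 6.13 can be checked with a few lines of Python. This is a minimal sketch rather than the book's own procedure: it assumes a two-tailed z test for two large independent samples, with the sample standard deviations used in place of the unknown population values. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 6.13
n1, mean1, sd1 = 100, 55.0, 12.0   # Group 1 (Total-Recall)
n2, mean2, sd2 = 100, 51.8, 9.0    # Group 2 (placebo)
alpha = 0.01

# Standard error of the difference between the two sample means
standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
z_empirical = (mean1 - mean2) / standard_error

# Two-tailed critical value (assumption: H1 states that the means differ)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z_empirical) > z_critical

print(f"z = {z_empirical:.4f}, critical value = {z_critical:.4f}, reject H0: {reject_h0}")
```

The empirical z of about 2.13 stays below the two-tailed critical value of about 2.58, so the null hypothesis of equal means is not rejected. A one-tailed test at the same α (critical value about 2.33) would lead to the same decision.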
Example 6.14.
Solution:
1.
2.
3.
4.
Example 6.15.
O. N. I test II test
1 32 28
2 34 26
3 28 30
4 27 25
5 35 33
6 19 21
7 24 22
8 30 30
9 30 27
10 27 22
11 40 32
12 28 29
13 35 31
14 37 36
15 15 20
16 18 20
17 19 15
18 21 20
19 27 26
20 30 28
21 38 34
22 32 30
23 30 20
24 28 21
25 27 26
26 29 33
27 22 20
28 14 15
29 35 30
30 33 32
Solution:
1.
The Data Analysis option in Excel (from the Tools menu) is used in the
analysis of the given paired samples. The result is:
I test II test
Mean 28.13333 26.06667
Variance 45.29195 32.61609
Observations 30 30
Pearson Correlation 0.853868
Hypothesized Mean Difference 0
Df 29
t Stat 3.231368
P(T<=t) one-tail 0.001531
t Critical one-tail 1.601972
P(T<=t) two-tail 0.003063
t Critical two-tail 1.957293
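The t statistic in the Excel output can be reproduced without Excel. The following sketch assumes the standard paired-samples formula, t = mean(d) / (s_d / sqrt(n)), where d are the differences between the two tests for each observation. It uses only the Python standard library, so critical values and p-values are not computed here.

```python
from math import sqrt
from statistics import mean, stdev

# Data from Example 6.15 (30 paired observations)
test_1 = [32, 34, 28, 27, 35, 19, 24, 30, 30, 27, 40, 28, 35, 37, 15,
          18, 19, 21, 27, 30, 38, 32, 30, 28, 27, 29, 22, 14, 35, 33]
test_2 = [28, 26, 30, 25, 33, 21, 22, 30, 27, 22, 32, 29, 31, 36, 20,
          20, 15, 20, 26, 28, 34, 30, 20, 21, 26, 33, 20, 15, 30, 32]

# Differences within each pair
differences = [a - b for a, b in zip(test_1, test_2)]
n = len(differences)

mean_difference = mean(differences)
standard_error = stdev(differences) / sqrt(n)   # stdev uses the n - 1 divisor
t_stat = mean_difference / standard_error
degrees_of_freedom = n - 1

print(f"mean difference = {mean_difference:.5f}")
print(f"t Stat = {t_stat:.6f}, df = {degrees_of_freedom}")
```

The result agrees with the "t Stat" and "Df" rows of the Excel table above.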
The one-way analysis of variance (One Way ANOVA) aims to test
whether there is a difference among the arithmetic means of more than
two populations by comparing variances. In other words, One Way
ANOVA investigates the influence of a certain factor with k dimensions
(levels) on one characteristic (variable). Therefore, we have k samples
related to the k factor dimensions. An example is investigating the
influence of different fertilizers on the harvest yield of some kind of
wheat. If the number of elements in the i-th sample is ni, and if we
designate the j-th element of the i-th sample by xij, we have the
following results of measurements:
total variance
From the
it is
Example 6.16.
Solution:
fibre n1
A1 7 1,680.0
A2 5 1,662.0
A3 8 1,636.2
A4 6 1,568.3
Then:
For 3 and 22 degrees of freedom and type I error 5%, we have the
theoretical F value: Ft = 3.05. Considering that the calculated F value
is lower than the theoretical one, we conclude that heterogeneity
in the results cannot be considered significant and we are indifferent to
the choice of fibre quality.
Example 6.17.
(Test of the differences of three population means – analysis of variance.
Solution by Excel – ANOVA.)
We have data on the number of days of sick leave of employees of
enterprise X in 2004. We analyzed three employee groups: younger
workers (20 - 30 yrs.), middle age workers (30 - 50 yrs.) and older
workers (50 - 60 yrs.).
Solution:
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 13 2 6.5 0.108552 0.897402 3.238096
Within Groups 2335.286 39 59.87912
Total 2348.286 41
Since the p value of the F test (0.897) is higher than 0.01, we cannot
reject the hypothesis that the averages are equal. The conclusion is that
there is no statistically significant difference in the number of days
of sick leave among the three age groups.
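To make the structure of the ANOVA table concrete, the sketch below computes the between-groups and within-groups sums of squares and the F ratio from raw data. The individual sick-leave observations of Example 6.17 are not reproduced in the text, so the function is demonstrated on small hypothetical data (`toy_groups`). The last lines only recompute F from the SS and df values printed in the Excel table.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within, f_value)
    for a list of samples, one list of observations per factor level."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_value


# Hypothetical data, used only to show the mechanics
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
ssb, dfb, ssw, dfw, f_toy = one_way_anova(toy_groups)
print(f"toy data: SSB = {ssb}, SSW = {ssw}, F = {f_toy}")

# F recomputed from the values in the Excel table of Example 6.17
f_table = (13 / 2) / (2335.286 / 39)
print(f"Example 6.17: F = {f_table:.6f}")
```

In both cases F is the ratio of the two mean squares (MS = SS / df), which is then compared with the critical F value for the corresponding degrees of freedom.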
The next problem is testing the simple null hypothesis that the
population proportion is equal to, higher or lower than some specified
value p0. We will start with the model for the confidence interval for the
population proportion, to find the test statistic for the test for the
population proportion:
Then we can evaluate the procedures for different kinds of test for the
population proportion.
1.
2.
3.
4.
1.
2.
3.
4.
1.
2.
3.
4.
Example 6.18.
Solution:
This is a one-sided test for the population proportion, upper boundary:
1.
2.
3.
4.
Example 6.19.
Solution:
This is a one-sided test for the population proportion, lower boundary:
1.
2.
3.
4.
Our conclusion is that we reject the assumption that less than 60% of the
population of college-educated 35- to 64-year-olds with household incomes
of more than $100,000 per year agree that the government should be more
involved in the regulation of private enterprise.
1.
2.
3.
4.
1.
2.
3.
4.
1.
2.
3.
4.
Example 6.20.
Solution:
This is a two-sided test for the population proportion, large sample:
1.
2.
a.
b.
3.
4.
Example 6.21.
Solution:
1.
2.
3.
4.
2.
3.
4.
Example 6.22.
Traveling time   Stress
                 high   moderate   low   total
Under 15 min     9      5          18    32
15-45 min        17     8          28    53
Over 45 min      18     6          7     31
Total            44     19         53    116
Solution:
Theoretical frequencies
Traveling time   Stress
                 high       moderate   low        total
Under 15 min     12.13793   5.241379   14.62069   32
15-45 min        20.10345   8.681034   24.21552   53
Over 45 min      11.75862   5.077586   14.16379   31
Total            44         19         53         116
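The theoretical frequencies of Example 6.22 follow from the row and column totals: each one equals (row total × column total) / grand total. The sketch below recomputes them and then forms the chi-square statistic. It assumes the usual Pearson statistic, the sum of (f - ft)²/ft over all cells, with (rows - 1)(columns - 1) degrees of freedom. The significance level is not stated in the surviving text, so no decision is drawn here.

```python
# Observed frequencies from Example 6.22
# rows: traveling time (under 15 min, 15-45 min, over 45 min)
# columns: stress (high, moderate, low)
observed = [
    [9, 5, 18],
    [17, 8, 28],
    [18, 6, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Theoretical frequency of a cell = row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic summed over all cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

for exp_row in expected:
    print([round(e, 5) for e in exp_row])
print(f"chi-square = {chi_square:.4f}, df = {degrees_of_freedom}")
```

The printed theoretical frequencies match the table above. The statistic is then compared with the chi-square critical value for 4 degrees of freedom at the chosen type I error.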
2.
3.
4.
1.
2.
3.
4.
where:
m - number of samples (number of populations)
Pk - proportion in k-th population
nk - sample size for sample from k-th population
fk (fkt) - empirical (theoretical) frequency
Example 6.23.
Solution:
C 150 37
D 250 43
Total 700 135
1.
2.
3.
4.
3.
4.
where:
r - number of parameters that are estimated from empirical data
m- number of modalities or intervals
fk (fkt) - empirical (theoretical) frequencies
Example 6.24.
(Chi-square test of adequacy to approximate with Poisson distribution.)
Modalities Frequencies
0 35
1 115
2 130
3 75
4 30
5 10
6 5
Solution:
xi fi xi . fi
0 35 0
1 115 115
2 130 260
3 75 225
4 30 120
5 10 50
6 5 30
Sum 400 800
xi fi pti
0 35 0.13534
1 115 0.27067
2 130 0.27067
3 75 0.18045
4 30 0.09022
5 10 0.03609
6 5 0.01656
Sum 400 1
xi fi fti
0 35 54.136
1 115 108.268
2 130 108.268
3 75 72.18
4 30 36.088
5 10 14.436
6 5 6.624
Sum 400 400
1.
2.
3.
xi fi fti (fi - fti)²/fti
0 35 54.136 6.7642
1 115 108.268 0.4186
2 130 108.268 4.3621
3 75 72.18 0.1102
4 30 36.088 1.0270
5 10 14.436 1.3631
6 5 6.624 0.3982
Sum 400 400 14.4434
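The whole calculation of Example 6.24 can be sketched in Python. The assumptions are as follows: the Poisson parameter is estimated by the sample mean; the last class (x = 6) receives the remaining probability so that the probabilities sum to 1, as in the table of pti above; and each squared deviation is divided by the theoretical frequency fti, as in the standard Pearson chi-square statistic. Because the probabilities are not rounded to five decimals, the figures can differ slightly from the tabulated ones.

```python
from math import exp, factorial

# Data from Example 6.24
x_values = [0, 1, 2, 3, 4, 5, 6]
observed = [35, 115, 130, 75, 30, 10, 5]

n = sum(observed)
# Estimate of the Poisson parameter: the arithmetic mean (800 / 400 = 2)
lam = sum(x * f for x, f in zip(x_values, observed)) / n

# Poisson probabilities for x = 0..5; the last class takes the remainder
probabilities = [exp(-lam) * lam ** x / factorial(x) for x in x_values[:-1]]
probabilities.append(1 - sum(probabilities))

theoretical = [n * p for p in probabilities]

# Pearson chi-square statistic: squared deviations divided by theoretical frequencies
chi_square = sum((f - ft) ** 2 / ft for f, ft in zip(observed, theoretical))

# m - r - 1 degrees of freedom: m = 7 modalities, r = 1 estimated parameter
degrees_of_freedom = len(x_values) - 1 - 1

print(f"lambda = {lam}")
print([round(ft, 3) for ft in theoretical])
print(f"chi-square = {chi_square:.4f}, df = {degrees_of_freedom}")
```

The statistic is compared with the chi-square critical value for 5 degrees of freedom at the chosen type I error.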
Solution:
Sample Mean Me Probability
000 0 0 1/27
001 0.33 0 1/27
002 0.67 0 1/27
010 0.33 0 1/27
011 0.67 1 1/27
012 1 1 1/27
020 0.67 0 1/27
021 1 1 1/27
022 1.33 2 1/27
100 0.33 0 1/27
101 0.67 1 1/27
102 1 1 1/27
110 0.67 1 1/27
111 1 1 1/27
112 1.33 1 1/27
120 1 1 1/27
121 1.33 1 1/27
122 1.67 2 1/27
200 0.67 0 1/27
201 1 1 1/27
202 1.33 2 1/27
210 1 1 1/27
211 1.33 1 1/27
212 1.67 2 1/27
220 1.33 2 1/27
221 1.67 2 1/27
222 2 2 1/27
From the table it can be concluded that the sample mean can assume the
values 0; 0.33; 0.67; 1; 1.33; 1.67 and 2. The value 0 occurs in only one
sample: 000. Similarly, the value 0.33 occurs in three samples: 001, 010
and 100, and so on. By calculating the probabilities of the remaining
values in the same way, we obtain the sampling distribution, given in
the table below:
Me 0 1 2
p(Me) 7/27 13/27 7/27
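The enumeration in the table can be reproduced programmatically. The sketch below assumes, as the table suggests, that samples of size 3 are drawn with replacement from a population with values 0, 1 and 2, so that all 27 ordered samples are equally likely. It then tabulates the sampling distributions of the median and of the mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import median

population = [0, 1, 2]

# All ordered samples of size 3 drawn with replacement: 3^3 = 27 samples
samples = list(product(population, repeat=3))

# Sampling distribution of the median
median_counts = Counter(median(s) for s in samples)
median_distribution = {
    m: Fraction(count, len(samples)) for m, count in sorted(median_counts.items())
}

# Sampling distribution of the mean (the mean equals the sample sum divided by 3)
sum_counts = Counter(sum(s) for s in samples)
mean_distribution = {
    Fraction(total, 3): Fraction(count, len(samples))
    for total, count in sorted(sum_counts.items())
}

for m, count in sorted(median_counts.items()):
    print(f"p(Me = {m}) = {count}/{len(samples)}")
for total, count in sorted(sum_counts.items()):
    print(f"p(mean = {total}/3) = {count}/{len(samples)}")
```

The counts for the median agree with the table above. Note that `Fraction` reduces fractions automatically, which is why the counts are printed directly over 27.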
(Confidence interval of the mean, unknown population variance, large
sample.)
6.2. Research has indicated that bicycle helmets save lives. A study
reported in Public Health Reports (1992) intended to identify ways of
encouraging helmet use by children. One of the variables measured
was the children’s perception of the risk. A 4-point scale was used,
with scores ranging from 1 (no risk) to 4 (very high risk). A sample
of 797 children in grades 4 – 6 yielded the following results
on the perception of risk variable: . Estimate a
90% confidence interval for the average perception of risk for all
children.
Solution:
Solution:
(Confidence interval of the mean, known population variance.)
6.4. A chain of “quick lube” shops has a standard service for performing
oil changes and basic checkups on automobiles. The chain has a
standard that says that the average time per car for this service
should be 12.5 minutes. There is considerable variability in times,
due to differences in layout of engines, degree of time pressure
from other jobs, and many other sources. The standard deviation
for the chain has been 2.4 minutes. The manager of one shop picked
48 times at random and timed the next job after each random time.
The data were analyzed and the following results were obtained:
Estimate the confidence interval for the mean with type I error of 5%.
Solution:
of population is known.
Solution:
Sample proportions:
Type I error:
Solution:
α = 0.08
Solution:
small sample.
(One-tailed test of the population mean, lower boundary, unknown
population variance, small sample.)
6.8. A new drug was tested on a sample of 12 patients. The results
obtained on the sample showed that the symptoms begin to
disappear on average after 16 hours, with a standard deviation of 0.8
hours. The pharmaceutical manufacturer claims that the drug works
in less than 14 hours. Test the manufacturer’s claim with a
confidence level of 95% (type I error of 5%).
Solution:
1.
2.
3.
1.
2.
3.
(Two-tailed test of the mean, known population variance.)
6.10. The factory produces chips packaged in 50-gram bags, with
a weight standard deviation of 2 grams. The quality control
supervisor wants to check the weight of the packaged bags of chips. In
a sample of 25 bags of chips, the average weight was 48.9 grams.
Test the hypothesis that the weight of the produced bags of chips is 50
grams with type I error of 6%.
Solution:
1.
2.
3.
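One possible way to check the calculation for problem 6.10 is sketched below. It assumes a two-tailed z test with known population standard deviation, as the margin note of the problem indicates. The critical value is taken from the standard normal distribution in the Python standard library instead of the statistical tables, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Data from problem 6.10
mu0 = 50.0          # hypothesized mean weight (grams)
sigma = 2.0         # known population standard deviation (grams)
n = 25              # sample size
sample_mean = 48.9  # observed average weight (grams)
alpha = 0.06        # type I error

# Test statistic for the mean with known population variance
z_empirical = (sample_mean - mu0) / (sigma / sqrt(n))

# Two-tailed critical value: alpha / 2 in each tail
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z_empirical) > z_critical

print(f"z = {z_empirical:.2f}, critical value = {z_critical:.4f}, reject H0: {reject_h0}")
```

Under these assumptions the empirical value lies in the region of rejection, which suggests that the mean weight differs from 50 grams.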
Solution:
1.
2.
3.
Solution:
1.
2.
3.
Solution:
1.
2.
3.
4. we cannot reject H0
Solution:
Let’s calculate the mean and the variance from given sample:
x1j (x1j - mean)²
67 16
51 400
89 324
72 1
80 81
55 256
74 9
92 441
58 169
72 1
Σ 1698
a) 1.
3.
b) 1.
2.
3.
4.
Solution:
1.
3.
6.21. In a sample of 100 patients, 78 respond that they are satisfied with
the health care they received. With type I error of 6%, find the
confidence interval for the proportion of satisfied patients in the
population.
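A sketch of how the interval in problem 6.21 could be computed is given below. It assumes the large-sample normal approximation for the sample proportion, p ± z·sqrt(p(1 - p)/n), and takes the z value from the Python standard library. No answer is printed in the text for this problem, so the result is offered only as a check.

```python
from math import sqrt
from statistics import NormalDist

# Data from problem 6.21
n = 100          # sample size
satisfied = 78   # number of satisfied patients
alpha = 0.06     # type I error

p_hat = satisfied / n
z = NormalDist().inv_cdf(1 - alpha / 2)          # two-sided interval
margin = z * sqrt(p_hat * (1 - p_hat) / n)       # half-width of the interval

lower, upper = p_hat - margin, p_hat + margin
print(f"{1 - alpha:.0%} confidence interval: ({lower:.4f}, {upper:.4f})")
```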
6.22. In the sample with 15 car parts, the calculated variance of particular
car part width was 4. With type I error of 5%, find the confidence
interval for variance.
73, 19, 16, 64, 28, 28, 31, 90, 60, 56, 31, 56, 22, 18, 45, 48, 17, 17,
17, 91, 92, 63, 50, 51, 69, 16, 17.
6.28. What proportion of people living in the USA use the Internet when
planning their vacation? According to a poll, 35% of them use
Internet. If you were to conduct a study that would provide 95%
confidence that the point estimate is correct to within ±0.04 of the
population proportion, how large a sample size would be required?
Answer: n=547
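The answer to problem 6.28 can be verified with the usual sample size formula for a proportion, n = z²·p(1 - p)/e², rounded up to the next whole number. In this sketch the function name and parameter names are illustrative, and e denotes the allowed margin of error.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence):
    """Smallest n for which the estimate of a proportion is within
    +/- margin of the population value at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Data from problem 6.28: p = 0.35, margin of error 0.04, confidence 95%
n_required = sample_size_for_proportion(p=0.35, margin=0.04, confidence=0.95)
print(f"n = {n_required}")
```

Rounding up is needed because a fractional sample size is not possible and rounding down would not guarantee the required precision.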
days between the receipt of the complaint and the resolution of the
complaint:
54, 5, 35, 137, 31, 27, 152, 2, 123, 81, 74, 27, 11, 19, 126, 110, 110,
29, 61, 35, 94, 31, 26, 5, 12, 4, 165, 32, 29, 28, 29, 26, 25, 1, 14, 13,
13, 10, 5, 27, 4, 52, 30, 22, 36, 26, 20, 23, 33, 68
Answer: ze = 0.39,
Answer: te =.,55 < tt =2.62 The null hypothesis is not rejected. Therefore,
supermarkets should not switch to selling the new brand.
Before 52 60 58 58 53 51 52 59 60 53 55
After 56 62 63 50 55 56 55 59 61 58 56
                   CYCLE TIME
EFFECTIVENESS      <1 month   1-2 months   2-4 months   >4 months   Total
Highly effective   15         28           24           6           73
Effective          9          26           33           19          87
Ineffective        5          2            3            5           15
Total              29         56           60           30          175
Answer:
Answer: Cannot reject H0: the average strength is less than 8000 lb.
Answer: .
We can accept the assumption.
6.41. A medicine producer claims that using his medicine for 8 months
will cure an allergy with 90% probability. In a sample consisting of
200 persons who have been taking the medicine for at least 8
months, 40 have not been cured. Test the producer’s claim
with type I error of 2%.
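A possible check for problem 6.41 is sketched below. It makes two assumptions that the problem does not state explicitly: the test is one-sided with a lower boundary (H0: p = 0.90 against H1: p < 0.90), and the large-sample z statistic for a proportion is used, with the standard error computed from the hypothesized value p0.

```python
from math import sqrt
from statistics import NormalDist

# Data from problem 6.41
p0 = 0.90        # cure probability claimed by the producer
n = 200          # sample size
not_cured = 40   # persons in the sample who were not cured
alpha = 0.02     # type I error

p_hat = (n - not_cured) / n                       # sample proportion of cured persons
z_empirical = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Lower-tail critical value (assumption: H1 is p < p0)
z_critical = NormalDist().inv_cdf(alpha)
reject_h0 = z_empirical < z_critical

print(f"p = {p_hat:.2f}, z = {z_empirical:.4f}, "
      f"critical value = {z_critical:.4f}, reject H0: {reject_h0}")
```

Under these assumptions the sample proportion of 0.80 lies far below the claimed 0.90, and the empirical z falls in the region of rejection.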
STATISTICAL TABLES
λ 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
x 0 904837 818731 740818 670320 606531 548812 496585 449329 406570 367879
1 090484 163746 222245 268128 303265 329287 347610 359463 365913 367879
2 004524 016375 033337 053626 075816 098786 121663 143785 164661 183940
3 000151 001092 003334 007150 012636 019757 028388 038343 049398 061313
4 000004 000055 000250 000715 001580 002964 004968 007669 011115 015328
5 000000 000002 000015 000057 000158 000356 000696 001227 002001 003066
λ 2 3 4 5 6 7 8 9 10 11
x 0 135335 049787 018316 006738 002479 000912 000335 000123 000045 000017
1 270671 149361 073263 033690 014873 006383 002684 001111 000454 000184
2 270671 224042 146525 084224 044618 022341 010735 004998 002270 001010
3 180447 224042 195367 140374 089235 052129 028626 014994 007567 003705
4 090224 168031 195367 175467 133853 091226 057252 033737 018917 010189
5 036089 100819 156293 175467 160623 127717 091604 060727 037833 022415
6 012030 050409 104196 146223 160623 149003 122138 091090 063055 041095
7 003437 021604 059540 104445 137677 149003 139587 117116 090079 064577
8 000859 008102 029770 065278 103258 130377 139587 131756 112599 088794
9 000191 002701 013231 036266 068838 101405 124077 131756 125110 108526
10 000038 000810 005292 018133 041303 070983 099262 118580 125110 119378
11 000007 000221 001925 008242 022529 045171 072190 097020 113736 119378
12 000001 000055 000642 003434 011264 026350 048127 072765 094780 109430
13 000000 000013 000197 001321 005199 014188 029616 050376 072908 092595
29 000001 000003
30 000000 000001
z (second decimal place) 0 1 2 3 4 5 6 7 8 9
0.1 396953 396536 396080 395585 395052 394479 393868 393219 392531 391806
0.2 391043 390242 389404 388529 387617 386668 385683 384663 383606 382515
0.3 381388 380226 379031 377801 376537 375240 373911 372548 371154 369728
0.4 368270 366782 365263 363714 362135 360527 358890 357225 355533 353812
0.5 352065 350292 348493 346668 344818 342944 341046 339124 337180 335213
0.6 333225 331215 329184 327133 325062 322972 320864 318737 316593 314432
0.7 312254 310060 307851 305627 303389 301137 298872 296595 294305 292004
0.8 289692 287369 285036 282694 280344 277985 275618 273244 270864 268477
0.9 266085 263688 261286 258881 256471 254059 251644 249228 246809 244390
1.0 241971 239551 237132 234714 232297 229882 227470 225060 222653 220251
1.1 217852 215458 213069 210686 208308 205936 203571 201214 198863 196520
1.2 194186 191860 189543 187235 184937 182649 180371 178104 175847 173602
1.3 171369 169147 166937 164740 162555 160383 158225 156080 153948 151831
1.4 149727 147639 145564 143505 141460 139431 137417 135418 133435 131468
1.5 129518 127583 125665 123763 121878 120009 118157 116323 114505 112704
1.6 110921 109155 107406 105675 103961 102265 100586 098925 097282 095657
1.7 094049 092459 090887 089333 087796 086277 084776 083293 081828 080380
1.8 078950 077538 076143 074766 073407 072065 070740 069433 068144 066871
1.9 065616 064378 063157 061952 060765 059595 058441 057304 056183 055079
2.0 053991 052919 051864 050824 049800 048792 047800 046823 045861 044915
2.1 043984 043067 042166 041280 040408 039550 038707 037878 037063 036262
2.2 035475 034701 033941 033194 032460 031740 031032 030337 029655 028985
2.3 028327 027682 027048 026426 025817 025218 024631 024056 023491 022937
2.4 022395 021862 021341 020829 020328 019837 019356 018885 018423 017971
2.5 017528 017095 016670 016254 015848 015449 015060 014678 014305 013940
2.6 013583 013234 012892 012558 012232 011912 011600 011295 010997 010706
2.7 010421 010143 009871 009606 009347 009094 008846 008605 008370 008140
2.8 007915 007697 007483 007274 007071 006873 006679 006491 006307 006127
2.9 005953 005782 005616 005454 005296 005143 004993 004847 004705 004567
3.0 004432 004301 004173 004049 003928 003810 003695 003584 003475 003370
3.1 003267 003167 003070 002975 002884 002794 002707 002623 002541 002461
3.2 002384 002309 002236 002165 002096 002029 001964 001901 001840 001780
3.3 001723 001667 001612 001560 001508 001459 001411 001364 001319 001275
3.4 001232 001191 001151 001112 001075 001038 001003 000969 000936 000904
3.5 000873 000843 000814 000785 000758 000732 000706 000681 000657 000634
3.6 000612 000590 000569 000549 000529 000510 000492 000474 000457 000441
3.7 000425 000409 000394 000380 000366 000353 000340 000327 000315 000303
3.8 000292 000281 000271 000260 000251 000241 000232 000223 000215 000207
3.9 000199 000191 000184 000177 000170 000163 000157 000151 000145 000139
4.0 000134 000129 000124 000119 000114 000109 000105 000101 000097 000093
z (second decimal place) 0 1 2 3 4 5 6 7 8 9
0.1 539828 543795 547758 551717 555670 559618 563559 567495 571424 575345
0.2 579260 583166 587064 590954 594835 598706 602568 606420 610261 614092
0.3 617911 621720 625516 629300 633072 636831 640576 644309 648027 651732
0.4 655422 659097 662757 666402 670031 673645 677242 680822 684386 687933
0.5 691462 694974 698468 701944 705401 708840 712260 715661 719043 722405
0.6 725747 729069 732371 735653 738914 742154 745373 748571 751748 754903
0.7 758036 761148 764238 767305 770350 773373 776373 779350 782305 785236
0.8 788145 791030 793892 796731 799546 802337 805105 807850 810570 813267
0.9 815940 818589 821214 823814 826391 828944 831472 833977 836457 838913
1.0 841345 843752 846136 848495 850830 853141 855428 857690 859929 862143
1.1 864334 866500 868643 870762 872857 874928 876976 879000 881000 882977
1.2 884930 886861 888768 890651 892512 894350 896165 897958 899727 901475
1.3 903200 904902 906582 908241 909877 911492 913085 914657 916207 917736
1.4 919243 920730 922196 923641 925066 926471 927855 929219 930563 931888
1.5 933193 934478 935745 936992 938220 939429 940620 941792 942947 944083
1.6 945201 946301 947384 948449 949497 950529 951543 952540 953521 954486
1.7 955435 956367 957284 958185 959070 959941 960796 961636 962462 963273
1.8 964070 964852 965620 966375 967116 967843 968557 969258 969946 970621
1.9 971283 971933 972571 973197 973810 974412 975002 975581 976148 976705
2.0 977250 977784 978308 978822 979325 979818 980301 980774 981237 981691
2.1 982136 982571 982997 983414 983823 984222 984614 984997 985371 985738
2.2 986097 986447 986791 987126 987455 987776 988089 988396 988696 988989
2.3 989276 989556 989830 990097 990358 990613 990863 991106 991344 991576
2.4 991802 992024 992240 992451 992656 992857 993053 993244 993431 993613
2.5 993790 993963 994132 994297 994457 994614 994766 994915 995060 995201
2.6 995339 995473 995604 995731 995855 995975 996093 996207 996319 996427
2.7 996533 996636 996736 996833 996928 997020 997110 997197 997282 997365
2.8 997445 997523 997599 997673 997744 997814 997882 997948 998012 998074
2.9 998134 998193 998250 998305 998359 998411 998462 998511 998559 998605
3.0 998650 998694 998736 998777 998817 998856 998893 998930 998965 998999
3.1 999032 999065 999096 999126 999155 999184 999211 999238 999264 999289
3.2 999313 999336 999359 999381 999402 999423 999443 999462 999481 999499
3.3 999517 999534 999550 999566 999581 999596 999610 999624 999638 999651
3.4 999663 999675 999687 999698 999709 999720 999730 999740 999749 999758
3.5 999767 999776 999784 999792 999800 999807 999815 999822 999828 999835
3.6 999841 999847 999853 999858 999864 999869 999874 999879 999883 999888
3.7 999892 999896 999900 999904 999908 999912 999915 999918 999922 999925
3.8 999928 999931 999933 999936 999938 999941 999943 999946 999948 999950
3.9 999952 999954 999956 999958 999959 999961 999963 999964 999966 999967
4.0 999968 999970 999971 999972 999973 999974 999975 999976 999977 999978
Fisher-Snedecor's F distribution
P(F)=0,1
m 1 2 3 4 5 6 7 8 9 10
n 1 39.8635 49.5000 53.5932 55.8330 57.2401 58.2044 58.9060 59.4390 59.8576 60.1950
2 8.5263 9.0000 9.1618 9.2434 9.2926 9.3255 9.3491 9.3668 9.3805 9.3916
3 5.5383 5.4624 5.3908 5.3426 5.3092 5.2847 5.2662 5.2517 5.2400 5.2304
4 4.5448 4.3246 4.1909 4.1072 4.0506 4.0097 3.9790 3.9549 3.9357 3.9199
5 4.0604 3.7797 3.6195 3.5202 3.4530 3.4045 3.3679 3.3393 3.3163 3.2974
6 3.7759 3.4633 3.2888 3.1808 3.1075 3.0546 3.0145 2.9830 2.9577 2.9369
7 3.5894 3.2574 3.0741 2.9605 2.8833 2.8274 2.7849 2.7516 2.7247 2.7025
8 3.4579 3.1131 2.9238 2.8064 2.7264 2.6683 2.6241 2.5893 2.5612 2.5380
9 3.3603 3.0065 2.8129 2.6927 2.6106 2.5509 2.5053 2.4694 2.4403 2.4163
10 3.2850 2.9245 2.7277 2.6053 2.5216 2.4606 2.4140 2.3772 2.3473 2.3226
12 3.1765 2.8068 2.6055 2.4801 2.3940 2.3310 2.2828 2.2446 2.2135 2.1878
15 3.0732 2.6952 2.4898 2.3614 2.2730 2.2081 2.1582 2.1185 2.0862 2.0593
20 2.9747 2.5893 2.3801 2.2489 2.1582 2.0913 2.0397 1.9985 1.9649 1.9367
24 2.9271 2.5383 2.3274 2.1949 2.1030 2.0351 1.9826 1.9407 1.9063 1.8775
30 2.8807 2.4887 2.2761 2.1422 2.0492 1.9803 1.9269 1.8841 1.8490 1.8195
40 2.8354 2.4404 2.2261 2.0909 1.9968 1.9269 1.8725 1.8289 1.7929 1.7627
60 2.7911 2.3933 2.1774 2.0410 1.9457 1.8747 1.8194 1.7748 1.7380 1.7070
120 2.7478 2.3473 2.1300 1.9923 1.8959 1.8238 1.7675 1.7220 1.6842 1.6524
∞ 2.7055 2.3026 2.0838 1.9449 1.8473 1.7741 1.7167 1.6702 1.6315 1.5987
P(F)=0,05
m 1 2 3 4 5 6 7 8 9 10
n 1 161.4476 199.5000 215.7073 224.5832 230.1619 233.9860 236.7684 238.8827 240.5433 241.8817
2 18.5128 19.0000 19.1643 19.2468 19.2964 19.3295 19.3532 19.3710 19.3848 19.3959
3 10.1280 9.5521 9.2766 9.1172 9.0135 8.9406 8.8867 8.8452 8.8123 8.7855
4 7.7086 6.9443 6.5914 6.3882 6.2561 6.1631 6.0942 6.0410 5.9988 5.9644
5 6.6079 5.7861 5.4095 5.1922 5.0503 4.9503 4.8759 4.8183 4.7725 4.7351
6 5.9874 5.1433 4.7571 4.5337 4.3874 4.2839 4.2067 4.1468 4.0990 4.0600
7 5.5914 4.7374 4.3468 4.1203 3.9715 3.8660 3.7870 3.7257 3.6767 3.6365
8 5.3177 4.4590 4.0662 3.8379 3.6875 3.5806 3.5005 3.4381 3.3881 3.3472
9 5.1174 4.2565 3.8625 3.6331 3.4817 3.3738 3.2927 3.2296 3.1789 3.1373
10 4.9646 4.1028 3.7083 3.4780 3.3258 3.2172 3.1355 3.0717 3.0204 2.9782
12 4.7472 3.8853 3.4903 3.2592 3.1059 2.9961 2.9134 2.8486 2.7964 2.7534
15 4.5431 3.6823 3.2874 3.0556 2.9013 2.7905 2.7066 2.6408 2.5876 2.5437
20 4.3512 3.4928 3.0984 2.8661 2.7109 2.5990 2.5140 2.4471 2.3928 2.3479
24 4.2597 3.4028 3.0088 2.7763 2.6207 2.5082 2.4226 2.3551 2.3002 2.2547
30 4.1709 3.3158 2.9223 2.6896 2.5336 2.4205 2.3343 2.2662 2.2107 2.1646
40 4.0847 3.2317 2.8387 2.6060 2.4495 2.3359 2.2490 2.1802 2.1240 2.0772
60 4.0012 3.1504 2.7581 2.5252 2.3683 2.2541 2.1665 2.0970 2.0401 1.9926
120 3.9201 3.0718 2.6802 2.4472 2.2899 2.1750 2.0868 2.0164 1.9588 1.9105
∞ 3.8415 2.9957 2.6049 2.3719 2.2141 2.0986 2.0096 1.9384 1.8799 1.8307
Fisher-Snedecor's F distribution
P(F)=0,01
m 1 2 3 4 5 6 7 8 9 10
n 1 4052.1807 4999.5000 5403.3520 5624.5833 5763.6496 5858.9861 5928.3557 5981.0703 6022.4732 6055.8467
2 98.5025 99.0000 99.1662 99.2494 99.2993 99.3326 99.3564 99.3742 99.3881 99.3992
3 34.1162 30.8165 29.4567 28.7099 28.2371 27.9107 27.6717 27.4892 27.3452 27.2287
4 21.1977 18.0000 16.6944 15.9770 15.5219 15.2069 14.9758 14.7989 14.6591 14.5459
5 16.2582 13.2739 12.0600 11.3919 10.9670 10.6723 10.4555 10.2893 10.1578 10.0510
6 13.7450 10.9248 9.7795 9.1483 8.7459 8.4661 8.2600 8.1017 7.9761 7.8741
7 12.2464 9.5466 8.4513 7.8466 7.4604 7.1914 6.9928 6.8400 6.7188 6.6201
8 11.2586 8.6491 7.5910 7.0061 6.6318 6.3707 6.1776 6.0289 5.9106 5.8143
9 10.5614 8.0215 6.9919 6.4221 6.0569 5.8018 5.6129 5.4671 5.3511 5.2565
10 10.0443 7.5594 6.5523 5.9943 5.6363 5.3858 5.2001 5.0567 4.9424 4.8491
12 9.3302 6.9266 5.9525 5.4120 5.0643 4.8206 4.6395 4.4994 4.3875 4.2961
15 8.6831 6.3589 5.4170 4.8932 4.5556 4.3183 4.1415 4.0045 3.8948 3.8049
20 8.0960 5.8489 4.9382 4.4307 4.1027 3.8714 3.6987 3.5644 3.4567 3.3682
24 7.8229 5.6136 4.7181 4.2184 3.8951 3.6667 3.4959 3.3629 3.2560 3.1681
30 7.5625 5.3903 4.5097 4.0179 3.6990 3.4735 3.3045 3.1726 3.0665 2.9791
40 7.3141 5.1785 4.3126 3.8283 3.5138 3.2910 3.1238 2.9930 2.8876 2.8005
60 7.0771 4.9774 4.1259 3.6490 3.3389 3.1187 2.9530 2.8233 2.7185 2.6318
120 6.8509 4.7865 3.9491 3.4795 3.1735 2.9559 2.7918 2.6629 2.5586 2.4721
∞ 6.6349 4.6052 3.7816 3.3192 3.0173 2.8020 2.6393 2.5113 2.4073 2.3209
P(F)=0,005
m 1 2 3 4 5 6 7 8 9 10
n 1 16210.7227 19999.5000 21614.7414 22499.5833 23055.7982 23437.1111 23714.5658 23925.4062 24091.0041 24224.4868
2 198.5013 199.0000 199.1664 199.2497 199.2996 199.3330 199.3568 199.3746 199.3885 199.3996
3 55.5520 49.7993 47.4672 46.1946 45.3916 44.8385 44.4341 44.1256 43.8824 43.6858
4 31.3328 26.2843 24.2591 23.1545 22.4564 21.9746 21.6217 21.3520 21.1391 20.9667
5 22.7848 18.3138 16.5298 15.5561 14.9396 14.5133 14.2004 13.9610 13.7716 13.6182
6 18.6350 14.5441 12.9166 12.0275 11.4637 11.0730 10.7859 10.5658 10.3915 10.2500
7 16.2356 12.4040 10.8824 10.0505 9.5221 9.1553 8.8854 8.6781 8.5138 8.3803
8 14.6882 11.0424 9.5965 8.8051 8.3018 7.9520 7.6941 7.4959 7.3386 7.2106
9 13.6136 10.1067 8.7171 7.9559 7.4712 7.1339 6.8849 6.6933 6.5411 6.4172
10 12.8265 9.4270 8.0807 7.3428 6.8724 6.5446 6.3025 6.1159 5.9676 5.8467
12 11.7542 8.5096 7.2258 6.5211 6.0711 5.7570 5.5245 5.3451 5.2021 5.0855
15 10.7980 7.7008 6.4760 5.8029 5.3721 5.0708 4.8473 4.6744 4.5364 4.4235
20 9.9439 6.9865 5.8177 5.1743 4.7616 4.4721 4.2569 4.0900 3.9564 3.8470
24 9.5513 6.6609 5.5190 4.8898 4.4857 4.2019 3.9905 3.8264 3.6949 3.5870
30 9.1797 6.3547 5.2388 4.6234 4.2276 3.9492 3.7416 3.5801 3.4505 3.3440
40 8.8279 6.0664 4.9758 4.3738 3.9860 3.7129 3.5088 3.3498 3.2220 3.1167
60 8.4946 5.7950 4.7290 4.1399 3.7599 3.4918 3.2911 3.1344 3.0083 2.9042
120 8.1788 5.5393 4.4972 3.9207 3.5482 3.2849 3.0874 2.9330 2.8083 2.7052
∞ 7.8794 5.2983 4.2794 3.7151 3.3499 3.0913 2.8968 2.7444 2.6210 2.5188
Fisher-Snedecor's F distribution
P(F)=0,1
m 12 15 20 24 30 40 60 120 ∞
n 1 60.7052 61.2203 61.7403 62.0020 62.2650 62.5291 62.7943 63.0606 63.3282
2 9.4081 9.4247 9.4413 9.4496 9.4579 9.4662 9.4746 9.4829 9.4912
3 5.2156 5.2003 5.1845 5.1764 5.1681 5.1597 5.1512 5.1425 5.1337
4 3.8955 3.8704 3.8443 3.8310 3.8174 3.8036 3.7896 3.7753 3.7607
5 3.2682 3.2380 3.2067 3.1905 3.1741 3.1573 3.1402 3.1228 3.1050
6 2.9047 2.8712 2.8363 2.8183 2.8000 2.7812 2.7620 2.7423 2.7222
7 2.6681 2.6322 2.5947 2.5753 2.5555 2.5351 2.5142 2.4928 2.4708
8 2.5020 2.4642 2.4246 2.4041 2.3830 2.3614 2.3391 2.3162 2.2926
9 2.3789 2.3396 2.2983 2.2768 2.2547 2.2320 2.2085 2.1843 2.1592
10 2.2841 2.2435 2.2007 2.1784 2.1554 2.1317 2.1072 2.0818 2.0554
12 2.1474 2.1049 2.0597 2.0360 2.0115 1.9861 1.9597 1.9323 1.9036
15 2.0171 1.9722 1.9243 1.8990 1.8728 1.8454 1.8168 1.7867 1.7551
20 1.8924 1.8449 1.7938 1.7667 1.7382 1.7083 1.6768 1.6433 1.6074
24 1.8319 1.7831 1.7302 1.7019 1.6721 1.6407 1.6073 1.5715 1.5327
30 1.7727 1.7223 1.6673 1.6377 1.6065 1.5732 1.5376 1.4989 1.4564
40 1.7146 1.6624 1.6052 1.5741 1.5411 1.5056 1.4672 1.4248 1.3769
60 1.6574 1.6034 1.5435 1.5107 1.4755 1.4373 1.3952 1.3476 1.2915
120 1.6012 1.5450 1.4821 1.4472 1.4094 1.3676 1.3203 1.2646 1.1926
∞ 1.5458 1.4871 1.4206 1.3832 1.3419 1.2951 1.2400 1.1686 1.0000
P(F)=0,05
m 12 15 20 24 30 40 60 120 ∞
n 1 243.9060 245.9499 248.0131 249.0518 250.0951 251.1432 252.1957 253.2529 254.3148
2 19.4125 19.4291 19.4458 19.4541 19.4624 19.4707 19.4791 19.4874 19.4957
3 8.7446 8.7029 8.6602 8.6385 8.6166 8.5944 8.5720 8.5494 8.5264
4 5.9117 5.8578 5.8025 5.7744 5.7459 5.7170 5.6877 5.6581 5.6281
5 4.6777 4.6188 4.5581 4.5272 4.4957 4.4638 4.4314 4.3985 4.3650
6 3.9999 3.9381 3.8742 3.8415 3.8082 3.7743 3.7398 3.7047 3.6689
7 3.5747 3.5107 3.4445 3.4105 3.3758 3.3404 3.3043 3.2674 3.2297
8 3.2839 3.2184 3.1503 3.1152 3.0794 3.0428 3.0053 2.9669 2.9276
9 3.0729 3.0061 2.9365 2.9005 2.8637 2.8259 2.7872 2.7475 2.7067
10 2.9130 2.8450 2.7740 2.7372 2.6996 2.6609 2.6211 2.5801 2.5379
12 2.6866 2.6169 2.5436 2.5055 2.4663 2.4259 2.3842 2.3410 2.2962
15 2.4753 2.4034 2.3275 2.2878 2.2468 2.2043 2.1601 2.1141 2.0658
20 2.2776 2.2033 2.1242 2.0825 2.0391 1.9938 1.9464 1.8963 1.8432
24 2.1834 2.1077 2.0267 1.9838 1.9390 1.8920 1.8424 1.7896 1.7330
30 2.0921 2.0148 1.9317 1.8874 1.8409 1.7918 1.7396 1.6835 1.6223
40 2.0035 1.9245 1.8389 1.7929 1.7444 1.6928 1.6373 1.5766 1.5089
60 1.9174 1.8364 1.7480 1.7001 1.6491 1.5943 1.5343 1.4673 1.3893
120 1.8337 1.7505 1.6587 1.6084 1.5543 1.4952 1.4290 1.3519 1.2539
∞ 1.7522 1.6664 1.5705 1.5173 1.4591 1.3940 1.3180 1.2214 1.0000
Fisher-Snedecor's F distribution
P(F)=0,01
m 12 15 20 24 30 40 60 120 ∞
n 1 6106.3207 6157.2846 6208.7302 6234.6309 6260.6486 6286.7821 6313.0301 6339.3913 6365.8685
2 99.4159 99.4325 99.4492 99.4575 99.4658 99.4742 99.4825 99.4908 99.4992
3 27.0518 26.8722 26.6898 26.5975 26.5045 26.4108 26.3164 26.2211 26.1251
4 14.3736 14.1982 14.0196 13.9291 13.8377 13.7454 13.6522 13.5581 13.4631
5 9.8883 9.7222 9.5526 9.4665 9.3793 9.2912 9.2020 9.1118 9.0204
6 7.7183 7.5590 7.3958 7.3127 7.2285 7.1432 7.0567 6.9690 6.8800
7 6.4691 6.3143 6.1554 6.0743 5.9920 5.9084 5.8236 5.7373 5.6495
8 5.6667 5.5151 5.3591 5.2793 5.1981 5.1156 5.0316 4.9461 4.8588
9 5.1114 4.9621 4.8080 4.7290 4.6486 4.5666 4.4831 4.3978 4.3105
10 4.7059 4.5581 4.4054 4.3269 4.2469 4.1653 4.0819 3.9965 3.9090
12 4.1553 4.0096 3.8584 3.7805 3.7008 3.6192 3.5355 3.4494 3.3608
15 3.6662 3.5222 3.3719 3.2940 3.2141 3.1319 3.0471 2.9595 2.8684
20 3.2311 3.0880 2.9377 2.8594 2.7785 2.6947 2.6077 2.5168 2.4212
24 3.0316 2.8887 2.7380 2.6591 2.5773 2.4923 2.4035 2.3100 2.2107
30 2.8431 2.7002 2.5487 2.4689 2.3860 2.2992 2.2079 2.1108 2.0062
40 2.6648 2.5216 2.3689 2.2880 2.2034 2.1142 2.0194 1.9172 1.8047
60 2.4961 2.3523 2.1978 2.1154 2.0285 1.9360 1.8363 1.7263 1.6006
120 2.3363 2.1915 2.0346 1.9500 1.8600 1.7628 1.6557 1.5330 1.3805
∞ 2.1847 2.0385 1.8783 1.7908 1.6964 1.5923 1.4730 1.3246 1.0000
P(F)=0,005
m 12 15 20 24 30 40 60 120 ∞
n 1 24426.3662 24630.2051 24835.9709 24939.5653 25043.6277 25148.1532 25253.1369 25358.5734 25464.4593
2 199.4163 199.4329 199.4496 199.4579 199.4663 199.4746 199.4829 199.4912 199.4996
3 43.3874 43.0847 42.7775 42.6222 42.4658 42.3082 42.1494 41.9895 41.8283
4 20.7047 20.4383 20.1673 20.0300 19.8915 19.7518 19.6107 19.4684 19.3247
5 13.3845 13.1463 12.9035 12.7802 12.6556 12.5297 12.4024 12.2737 12.1435
6 10.0343 9.8140 9.5888 9.4742 9.3582 9.2408 9.1219 9.0015 8.8793
7 8.1764 7.9678 7.7540 7.6450 7.5345 7.4224 7.3088 7.1933 7.0760
8 7.0149 6.8143 6.6082 6.5029 6.3961 6.2875 6.1772 6.0649 5.9506
9 6.2274 6.0325 5.8318 5.7292 5.6248 5.5186 5.4104 5.3001 5.1875
10 5.6613 5.4707 5.2740 5.1732 5.0706 4.9659 4.8592 4.7501 4.6385
12 4.9062 4.7213 4.5299 4.4314 4.3309 4.2282 4.1229 4.0149 3.9039
15 4.2497 4.0698 3.8826 3.7859 3.6867 3.5850 3.4803 3.3722 3.2602
20 3.6779 3.5020 3.3178 3.2220 3.1234 3.0215 2.9159 2.8058 2.6904
24 3.4199 3.2456 3.0624 2.9667 2.8679 2.7654 2.6585 2.5463 2.4276
30 3.1787 3.0057 2.8230 2.7272 2.6278 2.5241 2.4151 2.2998 2.1760
40 2.9531 2.7811 2.5984 2.5020 2.4015 2.2958 2.1838 2.0636 1.9318
60 2.7419 2.5705 2.3872 2.2898 2.1874 2.0789 1.9622 1.8341 1.6885
120 2.5439 2.3727 2.1881 2.0890 1.9840 1.8709 1.7469 1.6055 1.4311
∞ 2.3583 2.1868 1.9998 1.8983 1.7891 1.6691 1.5325 1.3637 1.0000
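The entries in row n = 2 of these tables can be spot-checked without software that has a built-in F distribution, because an F variable with 2 denominator degrees of freedom has a closed-form upper-tail critical value: F = 2 / (m · ((1 − α)^(−2/m) − 1)), which tends to −1 / ln(1 − α) as m tends to ∞. The short Python sketch below is an illustration only and is not taken from the book; the function name f_upper_critical_n2 is hypothetical, α denotes the tail probability written as P(F) in the table headings, and m is the numerator degrees of freedom shown in the column headings.

```python
import math


def f_upper_critical_n2(alpha, m):
    """Upper-tail critical value of F with m numerator and 2 denominator d.f.

    Hypothetical helper for checking row n = 2 of the tables only;
    it is not valid for any other number of denominator degrees of freedom.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    log_term = -math.log1p(-alpha)  # equals -ln(1 - alpha), a positive number
    if math.isinf(m):
        # Limiting case m -> infinity (last column of the tables).
        return 1.0 / log_term
    # expm1(x) = exp(x) - 1, computed accurately for small x.
    return 2.0 / (m * math.expm1(2.0 * log_term / m))


# Print row n = 2 for a few columns of the P(F)=0,1 and P(F)=0,05 tables.
for alpha in (0.10, 0.05):
    row = [f_upper_critical_n2(alpha, m) for m in (12, 15, 20, math.inf)]
    print(alpha, " ".join(f"{value:.4f}" for value in row))
```

Rounded to four decimals, the printed values should agree with the n = 2 rows above. Other rows have no such simple formula and are normally obtained from a statistical function, for example a spreadsheet's inverse F function.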
INDEX
Census 5, 22
Chain indices 328
Chi-square 10, 443, 525, 545, 546, 576, 577, 578
Class 5, 48, 49
Coefficient of determination 253, 289, 301
Coefficient of variation 6, 101, 198, 202, 210, 222
Complement of event 394
Confidence interval 127, 480, 481, 482, 484, 485, 487, 536, 537, 538, 539, 540, 541, 554
Contingency table 401, 452, 526
Continuous variables 20, 40, 48
Correlation 227, 237, 238, 247, 253, 262, 270, 287, 511
Cumulative frequency 47
Cumulative function 405, 406
Event 400
Sample 5, 21, 24, 34, 219, 220, 224, 260, 306, 393, 474, 475, 481, 507, 511, 529, 534, 539, 557
Sample bias 474
Sample size 507, 529
Variables 123
Variance 95, 100, 127, 222, 273, 406, 407, 410, 416, 511, 517, 546, 557