
AAMS1773 QUANTITATIVE STUDIES

CHAPTER 1: INTRODUCTION TO STATISTICS AND DATA PRESENTATION

INTRODUCTION TO STATISTICS

Statistics refers to the scientific procedures and methods for collecting,
organizing, summarizing, presenting, analyzing and interpreting data, as
well as drawing valid conclusions and making reasonable decisions based
on the analysis. The figures that result from a statistical analysis are also
referred to as “statistics”.

[Figure: STATISTICS at the center, surrounded by Collect Data, Organize Data, Summarize Data, Present Data, Analyse Data and Interpret Data]

PURPOSES OF STATISTICS
• Statistical techniques are used extensively by marketing managers,
accountants, consumers, educators, politicians, physicians, etc.
• Statistical techniques are used to make many decisions that affect our
lives. Regardless of your future line of work, you will make decisions
that involve data.

Reasons for learning statistics:

• To know how to properly present and describe information.
• To know how to obtain reliable forecasts of variables of interest.
• To know how to draw conclusions about large populations based on
information obtained from samples.

Population and Sample


Population: The set of all items under observation.

Sample: A subset of items selected from the population.

Statistic and Parameter


• A summary measure such as mean, median, mode or standard deviation,
computed from sample data is called a statistic.
• A summary measure for the entire population is called a parameter.
• Statisticians often estimate population parameters from the corresponding
sample statistics.
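To make the statistic/parameter distinction concrete, here is a minimal sketch using Python's standard `statistics` and `random` modules. The population of ten monthly incomes is hypothetical, made up purely for illustration; the parameter is the mean of the whole population, and the statistic is the same measure computed from a random sample of four items.

```python
import random
import statistics

# Hypothetical population: monthly incomes of 10 households (illustrative only).
population = [2100, 3400, 2800, 5000, 4100, 3900, 2600, 4700, 3300, 3100]

# Parameter: a summary measure computed from the entire population.
mu = statistics.mean(population)

# Simple random sample of 4 items; the seed only makes the run repeatable.
rng = random.Random(1)
sample = rng.sample(population, 4)

# Statistic: the same summary measure computed from the sample only.
x_bar = statistics.mean(sample)

print("population mean (parameter):", mu)
print("sample mean (statistic):", x_bar)
```

The sample mean will generally differ from the population mean; that gap is why a statistic is treated as an estimate of the corresponding parameter.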

TYPES OF VARIABLES

Variables measure the characteristics of the population that the researcher wants
to study.

Variable
• The characteristics of the population of interest
• E.g. monthly income of respondents, respondents’ age, gender, level of
education, number of children and type of house owned

Quantitative or Numerical
• Measured on a numerical scale
• Yields numerical responses
• E.g. How tall are you? The answer is numerical.

Qualitative or Attributive
• Measured on a non-numerical scale
• Yields categorical responses
• E.g. Are you a Malaysian? The answer is only “Yes” or “No”.

Quantitative variables are further classified as discrete or continuous:

Discrete
• Numerical response which arises from a counting process.
• E.g. How many mobile phones do you have?

Continuous
• Numerical response which arises from a measuring process.
• E.g. What is your weight?

DATA PRESENTATION

Raw data
• Data collected that have not been organized or processed are called
raw data.
• When every observed value of the random variable is listed, the data are
called ungrouped data.
• Grouping is one of the most common methods of organizing data. When
we group data, we are actually constructing frequency distributions for the
raw data.

Frequency Distribution
• A frequency distribution is a table in which possible values for a variable
are grouped into non-overlapping classes, and the number of observed
values which fall into each class is recorded.
• Data organized in a frequency distribution are called grouped data.

Example
The frequency distribution below represents the number of books read by
500 students in a school for one year:

No. of books read    No. of students (Frequency, f)
0 – 9                52
10 – 19              63
20 – 29              71
30 – 39              96
40 – 49              43
50 – 59              58
60 – 79              72
80 – 99              45

The variable is “number of books read”.
The data (number of books read) are grouped into 8 classes.

• Classes (class intervals) should be non-overlapping, so that no
observation is counted twice.
• Commonly, the number of classes is between 5 and 15.
• Use equal class sizes (widths) whenever possible.
• Sometimes, the FREQUENCY is given as a PROPORTION or a
PERCENTAGE instead.

Some common practices for classes:

* Class                  ** Class            *** Class
(exclusive type)         (inclusive type)    (with open-ended classes)
0 - < 10  or  0 - 10     0 – 9               Below 20
10 - < 20 or  10 - 20    10 – 19             20 - < 30
20 - < 30 or  20 - 30    20 – 29             30 - < 40
30 - < 40 or  30 - 40    30 – 39             40 - < 50
40 - < 50 or  40 - 50    40 – 49             50 and above

* The exclusive class type is mainly used for continuous data, or for discrete
data which have been rounded to the nearest ten, hundred, thousand, million,
etc.

** The inclusive class type is mainly used for discrete data, where there is a
gap between classes (e.g. between 9 and 10).

*** The size of an open-ended class is assumed to be the same as the class
size of its nearest (immediately neighboring) class.
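The open-ended rule (***) can be sketched in a few lines. The helper name `close_open_ended` is hypothetical; it assumes the closed classes are exclusive and given in order as `(lower, upper)` pairs, and it gives the "Below ..." and "... and above" classes the size of their immediate neighbor.

```python
def close_open_ended(inner):
    """Attach assumed limits to the open-ended first and last classes.

    `inner` lists the closed exclusive classes as (lower, upper) pairs in order.
    Each open-ended class gets the same size as its immediate neighbor.
    """
    first_lo, first_hi = inner[0]
    last_lo, last_hi = inner[-1]
    below = (first_lo - (first_hi - first_lo), first_lo)   # 'Below first_lo'
    above = (last_hi, last_hi + (last_hi - last_lo))        # 'last_hi and above'
    return [below] + list(inner) + [above]

# Open-ended column of the table: Below 20, 20 - < 30, 30 - < 40, 40 - < 50, 50 and above.
classes = close_open_ended([(20, 30), (30, 40), (40, 50)])
for lo, hi in classes:
    print(f"{lo} - < {hi}")
```

Here "Below 20" is treated as 10 - < 20 and "50 and above" as 50 - < 60, because the neighboring classes have size 10.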

Example
The following is a record of the number of books borrowed per week in the
library for 30 weeks:

21 47 64 42 89 76 55 100 75 67
89 15 97 25 35 12 92 36 93 34
87 27 74 21 66 25 47 10 89 30

Tabulate the data in the form of a frequency distribution, grouping by a
suitable class size.

Solution:

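The variable is the number of books borrowed per week, which is discrete.
Number of classes is set to be 5.

Lowest value = 10; highest value = 100
Range = 100 − 10 = 90, so each of the 5 classes must be at least 90 ÷ 5 = 18 wide.

Class size is set to be 20 (18 rounded up to a convenient number).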

Frequency distribution for the number of books borrowed per week in the
library for 30 weeks:

Number of books    Tally count    Number of weeks (f)
10 – 29
30 – 49
50 – 69
70 – 89
90 – 109
Total                             30
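The tally can be checked with a short sketch. It assumes each inclusive class is identified by its class boundaries (10 – 29 becomes 9.5 to 29.5, and so on), so a value belongs to the class whose lower boundary it reaches and whose upper boundary it stays below; `bisect_right` from the standard library finds that class. The function name `frequency_distribution` is hypothetical, and the printed "/" marks are a simplified tally (one mark per observation, without the usual groups of five).

```python
from bisect import bisect_right

def frequency_distribution(data, boundaries):
    """Count the values in each class [boundaries[i], boundaries[i + 1])."""
    counts = [0] * (len(boundaries) - 1)
    for x in data:
        i = bisect_right(boundaries, x) - 1   # index of the class containing x
        if not 0 <= i < len(counts):
            raise ValueError(f"{x} lies outside the classes")
        counts[i] += 1
    return counts

books = [21, 47, 64, 42, 89, 76, 55, 100, 75, 67,
         89, 15, 97, 25, 35, 12, 92, 36, 93, 34,
         87, 27, 74, 21, 66, 25, 47, 10, 89, 30]

min_width = (max(books) - min(books)) / 5       # range / number of classes = 18.0
# Inclusive classes 10 - 29, 30 - 49, ..., 90 - 109 have boundaries 9.5, 29.5, ..., 109.5.
boundaries = [9.5 + 20 * k for k in range(6)]
freq = frequency_distribution(books, boundaries)

for k, f in enumerate(freq):
    lower = 10 + 20 * k
    print(f"{lower} - {lower + 19}: {'/' * f} ({f})")
print("Total", sum(freq))
```

The resulting frequencies (8, 7, 4, 7, 4) are the same ones used for this data in the histogram and cumulative frequency examples later in the chapter.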

Example
The amount of rainfall (in cm) for a small town was recorded for each day of
the month of December.

20.42  21.06  22.40  21.117  22.6    33.01   22.89  22.9
30.34  25.61  23     24.5    26.881  24.49   23.7   28
25.0   25.69  27.14  26.321  27.216  19.22   29.6   26.5
24.15  24.18  26.4   25      25.7    28      25.556

Construct a grouped frequency distribution for the data using a suitable class
size.

Solution:

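The variable is the amount of rainfall, which is continuous.
Number of classes is set to be 5.

Lowest value = 19.22; highest value = 33.01
Range = 33.01 − 19.22 = 13.79, so each of the 5 classes must be at least
13.79 ÷ 5 ≈ 2.76 cm wide.

Class size is set to be 3 cm (2.76 rounded up to a convenient number).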

Frequency distribution for the amount of rainfall in the month of December:

Amount of rainfall (cm)    Tally count    Number of days (f)
19 - < 22
22 - < 25
25 - < 28
28 - < 31
31 - < 34
Total                                     31
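For continuous data with exclusive classes, counting is a direct comparison against each class's lower and upper boundary. This minimal sketch assumes the class size of 3 cm and the starting value of 19 chosen in the solution; a value equal to an upper limit (such as 25 or 28 in this data) is counted in the next class up.

```python
rain = [20.42, 21.06, 22.40, 21.117, 22.6, 33.01, 22.89, 22.9,
        30.34, 25.61, 23, 24.5, 26.881, 24.49, 23.7, 28,
        25.0, 25.69, 27.14, 26.321, 27.216, 19.22, 29.6, 26.5,
        24.15, 24.18, 26.4, 25, 25.7, 28, 25.556]

k, width = 5, 3                                  # number of classes, class size (cm)
print("minimum width:", (max(rain) - min(rain)) / k)   # about 2.76, rounded up to 3
lowers = [19 + width * j for j in range(k)]      # 19, 22, 25, 28, 31

# Exclusive classes lo - < lo + width: the upper limit itself is excluded.
freq = [sum(lo <= x < lo + width for x in rain) for lo in lowers]

for lo, f in zip(lowers, freq):
    print(f"{lo} - < {lo + width}: {f}")
print("Total", sum(freq))
```

The counts (4, 10, 12, 4, 1) match the rainfall frequencies used in the histogram example later in the chapter.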

Basic components of a frequency distribution:

• Class limits: the smallest and largest possible values in each class, known
as the lower and upper class limits.

• Class boundaries: the dividing lines (walls) between successive classes. For
inclusive classes of discrete data, a boundary lies halfway across the gap
between the upper limit of one class and the lower limit of the next (e.g. 29.5
between 10 – 29 and 30 – 49).

• Class size / class width = upper class boundary − lower class boundary.

• Class mark / mid-point (x): the value exactly at the middle of a class. It lies
halfway between the class limits or the class boundaries.

$$\text{Class mark, } x = \frac{1}{2}\left(\text{lower class limit} + \text{upper class limit}\right)$$
or
$$\text{Class mark, } x = \frac{1}{2}\left(\text{lower class boundary} + \text{upper class boundary}\right)$$
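These definitions can be applied mechanically to inclusive classes. The sketch below assumes integer classes separated by a gap of 1 (as between 29 and 30), so each boundary sits 0.5 beyond its limit; the function name `describe_inclusive_classes` is hypothetical. For the book-borrowing classes it reproduces the boundaries, class sizes and class marks tabulated in the example that follows.

```python
def describe_inclusive_classes(limits, gap=1):
    """Return (lower limit, upper limit, LCB, UCB, class size, class mark) per class."""
    rows = []
    for lo, hi in limits:
        lcb = lo - gap / 2          # lower class boundary
        ucb = hi + gap / 2          # upper class boundary
        size = ucb - lcb            # class size = UCB - LCB
        mark = (lo + hi) / 2        # class mark from the limits (equals (lcb + ucb) / 2)
        rows.append((lo, hi, lcb, ucb, size, mark))
    return rows

rows = describe_inclusive_classes([(10, 29), (30, 49), (50, 69), (70, 89), (90, 109)])
for lo, hi, lcb, ucb, size, mark in rows:
    print(f"{lo} - {hi}   boundaries {lcb} - {ucb}   size {size:g}   mark {mark}")
```

For exclusive classes such as 19 - < 22 no adjustment is needed: the limits already are the boundaries, so the same formulas apply with a gap of 0.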

Example
Class      Class boundaries    Class size             Class mark, x
10 – 29    9.5 – 29.5          29.5 − 9.5 = 20        19.5
30 – 49    29.5 – 49.5         49.5 − 29.5 = 20       39.5
50 – 69    49.5 – 69.5         69.5 − 49.5 = 20       59.5
70 – 89    69.5 – 89.5         89.5 − 69.5 = 20       79.5
90 – 109   89.5 – 109.5        109.5 − 89.5 = 20      99.5

[Figure: number line of the 1st, 2nd and 3rd classes, showing class limits 10, 29 | 30, 49 | 50, 69, class boundaries 9.5, 29.5, 49.5, 69.5 and class marks 19.5, 39.5, 59.5]

Example
Class        Class boundaries    Class size       Class mark, x
19 – < 22    19 – 22             22 − 19 = 3      20.5
22 – < 25    22 – 25             25 − 22 = 3      23.5
25 – < 28    25 – 28             28 − 25 = 3      26.5
28 – < 31    28 – 31             31 − 28 = 3      29.5
31 – < 34    31 – 34             34 − 31 = 3      32.5

[Figure: number line of the 1st, 2nd and 3rd classes, where the class limits and class boundaries coincide at 19, 22, 25, 28, with class marks 20.5, 23.5, 26.5]

Histogram
• A histogram is a graphical representation of a frequency distribution. A bar
is drawn for each class, and the area of each bar is proportional to the class
frequency. The bars are drawn adjacent to one another.

• The x-axis shows either the class BOUNDARIES or the class MID-POINTS.
The y-axis shows the frequency.

• For a frequency distribution with equal class sizes, the height of each bar
is drawn proportional to the actual frequency of the class, and the width
of each bar extends from the lower class boundary to the upper class
boundary of the class.

• For a frequency distribution with unequal class sizes, the frequency of each
class must be adjusted (so that the area of each bar stays proportional to its
frequency), where

$$\text{Adjusted frequency} = \frac{\text{Common class size} \times \text{Given frequency}}{\text{Given class size}}$$

Example
Construct a histogram for the frequency distribution of the number of books
borrowed per year in the library by 30 students:

Number of books    Number of students
10 – 29            8
30 – 49            7
50 – 69            4
70 – 89            7
90 – 109           4
Total              30

Solution:
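As a rough text-only stand-in for the histogram, the sketch below prints one bar per class with a length equal to its frequency. Because all classes here have the same size (20), bar length proportional to frequency is exactly what the equal-class-size rule requires. It is a plain printout, not a plotting-library call, and the bars run horizontally rather than vertically.

```python
classes = ["10 - 29", "30 - 49", "50 - 69", "70 - 89", "90 - 109"]
freq = [8, 7, 4, 7, 4]

# Equal class sizes, so each bar's length can simply equal the class frequency.
bars = [f"{c:>8} | {'#' * f} {f}" for c, f in zip(classes, freq)]
print("\n".join(bars))
```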

Example
Construct a histogram for the frequency distribution of the amount of rainfall in
the month of December:

Amount of rainfall (cm)    Number of days
19 - < 22                  4
22 - < 25                  10
25 - < 28                  12
28 - < 31                  4
31 - < 34                  1
Total                      31

Solution:

Example
Construct a histogram for the frequency distribution of sales of 46 branches
of a company in one week.
Sales (units)    No. of branches
0 – 99           10
100 – 199        18
200 – 299        8
300 – 499        6
500 – 699        4
Solution:
Sales (units)    No. of branches    Class boundaries    Class size    *Adjusted
                 (frequency)                                          frequency
0 – 99           10                 −0.5 – 99.5         100           10
100 – 199        18                 99.5 – 199.5        100           18
200 – 299        8                  199.5 – 299.5       100           8
300 – 499        6                  299.5 – 499.5       200           3
500 – 699        4                  499.5 – 699.5       200           2

*Adjusted frequency, where the common class size = 100:

$$\text{Adjusted frequency} = \frac{100 \times \text{given frequency}}{\text{given class size}}$$
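The adjustment can be reproduced with a short sketch, assuming (as in the solution) inclusive integer classes whose boundaries lie 0.5 beyond each limit and a common class size of 100. The adjusted values are the bar heights to draw, so that the wider 300 – 499 and 500 – 699 bars keep an area proportional to their frequencies.

```python
# Sales classes as (lower limit, upper limit, frequency); inclusive integer classes.
sales = [(0, 99, 10), (100, 199, 18), (200, 299, 8), (300, 499, 6), (500, 699, 4)]
COMMON_SIZE = 100

adjusted = []
for lo, hi, f in sales:
    size = (hi + 0.5) - (lo - 0.5)            # class size from the class boundaries
    adjusted.append(COMMON_SIZE * f / size)   # bar height for the histogram

for (lo, hi, f), adj in zip(sales, adjusted):
    print(f"{lo} - {hi}: given {f}, adjusted {adj:g}")
```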

• The term skewness is used to describe the shape of a frequency
distribution.

Positive skewness: the peak of the histogram lies to the left of the center
of the distribution.
Negative skewness: the peak of the histogram lies to the right of the center
of the distribution.
Symmetrical / Not skewed: the peak of the histogram lies at the center of the
distribution.

Cumulative Frequency Distribution
• Given a frequency distribution, a cumulative frequency distribution is
derived by successively adding the frequencies of the classes.
• There are two types of cumulative frequency distributions: “Less than”
and “More than”. In this course, the “Less than” cumulative frequency
distribution is used.

The “Less than” cumulative frequency distribution is a table showing the total
frequency of all values less than the upper class boundary of each class.

Example
Number of books    Number of weeks (freq.)    Class boundaries
10 – 29            8                          9.5 – 29.5
30 – 49            7                          29.5 – 49.5
50 – 69            4                          49.5 – 69.5
70 – 89            7                          69.5 – 89.5
90 – 109           4                          89.5 – 109.5

‘<’ Cum. Freq. table (using the upper class boundaries)
No. of books    Cum. freq.
< 9.5           0
< 29.5          8
< 49.5          15
< 69.5          19
< 89.5          26
< 109.5         30

Example
Amount of rainfall (cm)    Number of days (freq.)    Class boundaries
19 - < 22                  4                         19 – 22
22 - < 25                  10                        22 – 25
25 - < 28                  12                        25 – 28
28 - < 31                  4                         28 – 31
31 - < 34                  1                         31 – 34

‘<’ Cum. Freq. table (using the upper class boundaries)
Amount of rainfall (cm)    Cum. freq.
< 19                       0
< 22                       4
< 25                       14
< 28                       26
< 31                       30
< 34                       31
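Both “less than” tables are running totals of the class frequencies, paired with the upper class boundaries and preceded by a zero row at the lowest lower boundary. A minimal sketch using the standard library's `itertools.accumulate` (the helper name `less_than_table` is hypothetical):

```python
from itertools import accumulate

def less_than_table(first_lower_boundary, upper_boundaries, freq):
    """Pair each boundary with the number of values below it.

    The first row is the lowest lower class boundary (cumulative frequency 0);
    each later row is an upper class boundary with the running total.
    """
    bounds = [first_lower_boundary] + list(upper_boundaries)
    cum = [0] + list(accumulate(freq))
    return list(zip(bounds, cum))

books = less_than_table(9.5, [29.5, 49.5, 69.5, 89.5, 109.5], [8, 7, 4, 7, 4])
rain = less_than_table(19, [22, 25, 28, 31, 34], [4, 10, 12, 4, 1])

for b, c in books:
    print(f"< {b}: {c}")
```

The last cumulative frequency always equals the total frequency (30 weeks, 31 days), which is a quick check on the table.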

The “Less than” Cumulative Frequency Polygon (Ogive)
is a line chart of a cumulative frequency distribution, in which the
cumulative frequency less than each upper class boundary is plotted
against that upper class boundary.

Example
The following table shows the output produced by 20 employees in an hour
in a factory.

Output (units)    Number of employees
1 – 5             1
6 – 10            2
11 – 15           3
16 – 20           9
21 – 25           5

Construct a ‘less than’ cumulative frequency distribution and plot a ‘less than’
cumulative frequency polygon. Then, estimate
(i) the number of employees producing output less than 13 units
(ii) the proportion of employees producing output more than 22 units
(iii) the number of units of output which will be exceeded by 90% of the
employees
(iv) the number of employees producing output between 8 and 18 units.

Solution:
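Output (units)    Number of employees (freq.)    Class boundaries
1 – 5             1                              0.5 – 5.5
6 – 10            2                              5.5 – 10.5
11 – 15           3                              10.5 – 15.5
16 – 20           9                              15.5 – 20.5
21 – 25           5                              20.5 – 25.5

‘<’ Cum. Freq. table (using the upper class boundaries)
Output (units)    Cum. freq.
< 0.5             0
< 5.5             1
< 10.5            3
< 15.5            6
< 20.5            15
< 25.5            20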

[Figure: ‘<’ ogive of output produced by 20 employees; cumulative frequency (0 to 20) plotted against output at 0.5, 5.5, 10.5, 15.5, 20.5 and 25.5 units]

From the ‘<’ cumulative frequency polygon, we can estimate

(i) the number of employees producing output less than 13 units to be 4.5.

(ii) the proportion of employees producing output more than 22 units. About
16.5 employees produce less than 22 units, so the proportion is

$$\frac{20 - 16.5}{20} = \frac{3.5}{20} = 0.175$$

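(iii) 90% of the employees are producing more than x units
→ the other 10% of the employees (10% × 20 = 2 employees) are
producing less than x units.
→ From the ‘<’ cum. freq. polygon, a cumulative frequency of 2 corresponds
to x = 8 units.

(iv) the number of employees producing output between 8 and 18 units to
be 10.5 − 2 = 8.5, since about 10.5 employees produce less than 18 units and
about 2 employees produce less than 8 units.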
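Because the ogive joins its plotted points with straight lines, reading it is the same as linear interpolation between neighboring points, and reading it "backwards" (as in part (iii)) interpolates the other way. This sketch assumes exactly that straight-line polygon through the points in the ‘<’ table above; the function names are hypothetical. Readings taken by hand from a drawn graph may differ slightly.

```python
from bisect import bisect_left

# '<' ogive points: lowest lower boundary, then the upper class boundaries.
bounds = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5]
cum = [0, 1, 3, 6, 15, 20]
n = cum[-1]

def cum_freq_at(x):
    """Cumulative frequency below x, read off the polygon by linear interpolation."""
    if x <= bounds[0]:
        return cum[0]
    if x >= bounds[-1]:
        return cum[-1]
    i = bisect_left(bounds, x)                 # bounds[i-1] < x <= bounds[i]
    x0, x1, y0, y1 = bounds[i - 1], bounds[i], cum[i - 1], cum[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def value_at_cum(c):
    """Inverse reading: the output x at which the polygon reaches cumulative frequency c."""
    if c <= cum[0]:
        return bounds[0]
    if c >= cum[-1]:
        return bounds[-1]
    i = bisect_left(cum, c)                    # cum[i-1] < c <= cum[i]
    x0, x1, y0, y1 = bounds[i - 1], bounds[i], cum[i - 1], cum[i]
    return x0 + (c - y0) * (x1 - x0) / (y1 - y0)

est_i = cum_freq_at(13)                        # (i) employees below 13 units
est_ii = (n - cum_freq_at(22)) / n             # (ii) proportion above 22 units
est_iii = value_at_cum(0.10 * n)               # (iii) output exceeded by 90%
est_iv = cum_freq_at(18) - cum_freq_at(8)      # (iv) employees between 8 and 18 units
print(est_i, est_ii, est_iii, est_iv)
```

With these points, the four values agree with the graphical estimates of 4.5, 0.175, 8 units and 8.5.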

What is Business Analytics (BA)?


• Refers to the skills, technologies and practices for the continuous, iterative
exploration and investigation of past business performance to gain insight
and drive business planning.
• In short, BA is a rational, fact-based approach to decision making.
• BA relies on the analysis of real data; thus, BA is about the skills to turn data
into decisions.
• To summarize, we distinguish the following steps in a BA project.

(i) Data collection and pre-processing are always the first steps of a BA
project.
(ii) Data often need to be collected, cleansed and combined with other
sources, because the current and historical data stored do not always
contain all the information required for a certain analysis.
(iii) Descriptive analytics: the data are analyzed and patterns
(insight/information) are found.
(iv) Predictive analytics: the insight found in the descriptive phase is used
in this phase to predict what is likely to happen in the future, if the situation
remains the same.
(v) Prescriptive analytics: alternative decisions are determined that change
the situation and lead to desirable outcomes.
(vi) – The decision has to be implemented; this requires various skills, such as
knowledge of change management.
– Some of the steps above need to be repeated depending on the
outcome. For example, if predictions are not accurate enough for a particular
application, extra data are required to improve them.
– Not all BA projects include all the steps above. For example, prescriptive
analytics is not included if the project's goal is prediction; the project
finishes after the descriptive or predictive step.

Example:
A hotel chain analyzes its reservations to look for patterns: Which are the busiest
days of the week? What is the impact of events in the city? Is there a seasonal
pattern? The outcomes are used to predict the revenue for the upcoming months.
By changing the pricing of the rooms in certain situations (such as sports events
or school holidays), the expected revenue can be maximized.

Tutorial 1: Introduction to Statistics and Data Presentation

1. The data below are the marks obtained by 40 students in an examination.


62 54 38 33 80 66 56 60 68 52
57 71 85 47 50 71 52 76 49 69
48 68 55 49 79 41 61 65 75 81
64 58 66 59 52 43 65 48 41 56

(a) Construct a frequency distribution table using 30 – 39 as the first class,
40 – 49 as the second class and so on.
(b) Draw a histogram for the above data.
(c) Construct a “less than” cumulative frequency distribution.
(d) Draw a “less than” cumulative frequency polygon.

2. The following data are the heights (to the nearest centimeter) of 85
employees in a company:
169 179 183 186 166 181 177 173 167 193 176
183 162 170 186 174 188 165 168 174 170 176
186 177 185 175 179 166 190 182 182 180 194
177 184 175 168 181 180 172 178 192 175 189
180 175 183 191 172 188 180 176 185 178 179
173 165 170 178 181 181 189 187 191 179 196
179 182 171 169 171 184 198 182 175 190 187
176 164 187 167 185 177 184 178

(a) Tabulate the above data in the form of a frequency distribution, using 160 -
<165 as the first class, 165 - <170 as the second class and so on.
(b) Draw a histogram for the above data.
(c) Construct a “less than” cumulative frequency distribution.
(d) Draw a “less than” cumulative frequency polygon.
(e) Using the graph in part (d), estimate:
(i) the height which will be exceeded by 25% of the employees.
(ii) the number of employees who have heights less than 175 cm.
(iii) the proportion of employees who have heights exceeding 175 cm.
3. The following table shows the gross profit of a random sample of 500 small
companies in a year.

Gross Profit ($thousand)    Percentage of companies
Under 10                    8
10 and under 20             22
20 and under 30             36
30 and under 40             18
40 and under 60             10
60 and under 90             6

(a) Draw a histogram.
(b) Construct a “less than” cumulative frequency distribution.
(c) Plot a “less than” cumulative frequency polygon and use it to estimate
(i) the number of small companies which earned at least $38,000 of
gross profit;
(ii) the proportion of small companies which earned less than $45,000 of
gross profit.

4. The following data shows the number of rejects from the assembly line of a
local manufacturer recorded for a period of 80 days:
Number of rejects Number of days
0–4 1
5–9 14
10 – 14 23
15 – 19 20
20 – 24 16
25 – 29 6

(a) Draw a histogram for the data.


(b) Construct a “less than” cumulative frequency distribution and plot a “less
than” cumulative frequency polygon. Use the graph to estimate
(i) the number of days that produce at most 12 rejects;
(ii) the number of rejects exceeded by 10 % of the days.

5. The following cumulative frequency distribution shows the duration of each
telephone call made by an employee recorded for a period of one month:

Duration (minutes) Number of calls


Under 3 45
Under 6 104
Under 9 142
Under 12 173
Under 18 192
Under 24 200

(a) Draw the cumulative frequency polygon for the above distribution.
(b) Use the graph to estimate:
(i) the number of calls that lasted between 5 and 10 minutes;
(ii) the duration not exceeded by 90% of the calls.
(c) Redraft the above data in the form of a frequency distribution and construct a
histogram.

Answers:
2. (e) (i) 185.5 cm. (ii) 23 (iii) 0.7294
3. (c) (i) 100 (ii) 0.865
4. (b) (i) 26.5 days (ii) 24 rejects
5. (b) (i) 68 calls (ii) 14 min.
