
Chapter 1 Introduction to Statistics and Data Presentation

Introduction to Statistics
➢ Statistics is the science of collecting, organizing, presenting,
analyzing and interpreting numerical data to assist in making
more effective decisions.

➢ Purposes of Statistics
• Statistical techniques are used extensively by marketing
managers, accountants, consumers, educators, politicians,
physicians, etc.
• Statistical techniques are used to make many decisions that
affect our lives. Regardless of what your future line of work is,
you will make decisions that involve data.

➢ Population – a collection of all possible individuals, objects or
measurements of interest.
➢ Sample – a portion, or part, of the population of interest.

➢ Type of variables:

Data Presentation
➢ Raw data
• Data collected that have not been organized or processed are
called raw data.
• When every observed value of the random variables is listed,
the data are called ungrouped data.
• Grouping is one of the most common methods of organizing
data. When we group data, we are actually constructing frequency
distributions for the raw data.

➢ Frequency distribution
• A table in which possible values for a variable are grouped
into non-overlapping classes, and the number of observed
values which fall into each class is recorded.
• Data organized in a frequency distribution are called
grouped data.

[Margin illustration: two small Age / f tables, one with classes 1-3 (f = 4) and 2-4 (f = 5), one with classes 1-3 (f = 3) and 4-6 (f = 4)]

Eg 1: The frequency distribution below represents the number of
books read by 500 students in a school for one year.

No. of books read    No. of students (Frequency, f)
0 – 9                52
10 – 19              63
20 – 29              71
30 – 39              96
40 – 49              43
50 – 59              58
60 – 79              72
80 – 99              45

The variable is the number of books read.
The data (number of books read) are grouped into 8 classes.
• Number of classes - The number of classes in a frequency
distribution table usually varies from 5 to 15, depending
mainly on the number of observations in the data set.
• Class midpoint - A point that divides a class into two equal
parts. This is the average of the upper and lower class limits.
• Class frequency - The number of observations in each class.
• Class width (size)
Approximate class width = (largest value − smallest value) / (number of classes)
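The class width formula above can be sketched in a few lines of Python. The function name `approximate_class_width` is an assumption made for illustration; the values 10, 100 and 5 are the smallest value, largest value and number of classes used in Eg 2 of these notes. Rounding the result up to a convenient width (for example 18 to 20) remains a judgment call and is not automated here.

```python
def approximate_class_width(smallest, largest, num_classes):
    """Return (largest - smallest) / num_classes, before any rounding."""
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return (largest - smallest) / num_classes


# Values taken from Eg 2: smallest = 10, largest = 100, 5 classes.
raw_width = approximate_class_width(10, 100, 5)
print(raw_width)  # 18.0, which the notes round up to a width of 20
```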

Some common practices for classes:

* Exclusive class type is mainly used for continuous data or discrete
data which have been rounded to the nearest tens, hundreds,
thousands, millions and so on.
** Inclusive class type is mainly used for discrete data where there is
a gap between classes.
*** The size of an open-ended class is assumed to be the same as the
class size of the nearest (immediate neighbor) class.
Eg 2: The following is a record of the number of books borrowed per
week in the library for 30 weeks.

Tabulate the data in the form of a frequency distribution,
grouping by suitable class size.

Solution: The variable of interest is the number of books borrowed.
It is a quantitative discrete variable.

The smallest value = 10
The largest value = 100
We decide to group these data using 5 classes of equal width.
Approximate class width = (100 − 10) / 5 = 18 ≈ 20

Frequency distribution for the number of books borrowed per week in
the library for 30 weeks.

Number of books    Tally count    Frequency
10 – 29            ||||| |||      8
30 – 49            ||||| ||       7
50 – 69            ||||           4
70 – 89            ||||| ||       7
90 – 109           ||||           4
Total                             30
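The tallying step can be sketched as follows. This is a minimal illustration for inclusive integer classes of equal width, such as 10 – 29, 30 – 49 and so on. The function name `frequency_distribution` is an assumption, and `sample_values` is hypothetical data made up for the demonstration; it is not the record of 30 weeks from Eg 2.

```python
def frequency_distribution(values, first_lower, width, num_classes):
    """Count integer values into inclusive classes of equal width."""
    # Class i covers first_lower + i*width up to first_lower + (i+1)*width - 1.
    classes = [
        (first_lower + i * width, first_lower + (i + 1) * width - 1)
        for i in range(num_classes)
    ]
    counts = [0] * num_classes
    for value in values:
        index = (value - first_lower) // width
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} falls outside every class")
        counts[index] += 1
    return list(zip(classes, counts))


# Hypothetical values, for illustration only.
sample_values = [10, 25, 29, 30, 48, 55, 69, 70, 88, 100]
for (lower, upper), count in frequency_distribution(sample_values, 10, 20, 5):
    print(f"{lower} - {upper}: {count}")
```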

Eg 3: The amount of rainfall (in cm³) for a small town was recorded
for the month of December.

Construct a grouped frequency distribution for the data using
suitable class size.

Solution: The variable of interest is the amount of rainfall.
It is a quantitative continuous variable.

The lowest value = 19.22
The highest value = 33.01
We decide to group these data using 5 classes of equal width.
Approximate class width = (33.01 − 19.22) / 5 = 2.758 ≈ 3

Frequency distribution for the amount of rainfall in the month of
December.

Amount of rainfall (cm³)    Tally count       Number of days, f
19 – < 22                   ||||              4
22 – < 25                   ||||| |||||       10
25 – < 28                   ||||| ||||| ||    12
28 – < 31                   ||||              4
31 – < 34                   |                 1
Total                                         31

Eg 4:

Number of books    Class boundaries    Class size    Class midpoint, x
10 – 29            9.5 – 29.5          20            19.5
30 – 49            29.5 – 49.5         20            39.5
50 – 69            49.5 – 69.5         20            59.5
70 – 89            69.5 – 89.5         20            79.5
90 – 109           89.5 – 109.5        20            99.5

Eg 5:

Amount of rainfall (cm³)    Class boundaries    Class size    Class midpoint, x
19 – < 22                   19 – 22             3             20.5
22 – < 25                   22 – 25             3             23.5
25 – < 28                   25 – 28             3             26.5
28 – < 31                   28 – 31             3             29.5
31 – < 34                   31 – 34             3             32.5
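The columns of Eg 4 and Eg 5 can be reproduced with a short sketch. It assumes the usual convention that class boundaries sit halfway across the gap between adjacent classes: the gap is 1 for the integer classes in Eg 4 and 0 for the "19 – < 22" style classes in Eg 5. The function name `class_details` and the `gap` parameter are names chosen for this illustration.

```python
def class_details(lower_limit, upper_limit, gap):
    """Return (lower boundary, upper boundary, class size, midpoint)."""
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    class_size = upper_boundary - lower_boundary
    # Midpoint is the average of the class limits (same as for boundaries).
    midpoint = (lower_limit + upper_limit) / 2
    return lower_boundary, upper_boundary, class_size, midpoint


print(class_details(10, 29, gap=1))  # (9.5, 29.5, 20.0, 19.5)
print(class_details(19, 22, gap=0))  # (19.0, 22.0, 3.0, 20.5)
```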

Histogram
➢ A graph in which the classes are marked on the horizontal axis
and the class frequencies on the vertical axis.

➢ The class frequencies are represented by the heights of the bars
and the bars are drawn adjacent to each other.

➢ Shapes of histogram
❖ Symmetric
❖ Skewed (to the right or to the left)
❖ Uniform or rectangular

[Figure: example histogram shapes: symmetric, skewed to right, skewed to left, uniform]

Eg 6: Construct a histogram for the frequency distribution of the
number of books borrowed per week in the library for 30 weeks.

Solution:
Number of books    f    Class boundaries
10 – 29            8    9.5 – 29.5
30 – 49            7    29.5 – 49.5
50 – 69            4    49.5 – 69.5
70 – 89            7    69.5 – 89.5
90 – 109           4    89.5 – 109.5
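As a rough stand-in for the drawn histogram, the sketch below prints one bar per class using the class boundaries and frequencies from the table above. It is only a text approximation: the bars run sideways, so the classes appear on the vertical axis instead of the horizontal one. The function name `text_histogram` is an assumption made for illustration.

```python
def text_histogram(labels, frequencies, symbol="#"):
    """Return one text line per class, with bar length equal to the frequency."""
    width = max(len(label) for label in labels)
    return [
        f"{label:<{width}} | {symbol * freq} ({freq})"
        for label, freq in zip(labels, frequencies)
    ]


boundaries = ["9.5 - 29.5", "29.5 - 49.5", "49.5 - 69.5",
              "69.5 - 89.5", "89.5 - 109.5"]
frequencies = [8, 7, 4, 7, 4]
lines = text_histogram(boundaries, frequencies)
print("\n".join(lines))
```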

Eg 7: Construct a histogram for the frequency distribution of the
amount of rainfall in the month of December.

Solution:

➢ For a frequency distribution with unequal class sizes, the height of
each bar is drawn proportional to the adjusted frequency of each
bar, where
Adjusted frequency = (common class size × frequency) / (class size)

Eg 8: Construct a histogram for the frequency distribution of sales of
46 branches of a company in the course of one week.

Sales (units)    No. of branches
0 – 99           10
100 – 199        18
200 – 299        8
300 – 499        6
500 – 699        4

Solution:

Sales (units)    Frequency    Class boundaries    Class size    Adjusted frequency
0 – 99           10           -0.5 – 99.5         100           10
100 – 199        18           99.5 – 199.5        100           18
200 – 299        8            199.5 – 299.5       100           8
300 – 499        6            299.5 – 499.5       200           (100 × 6) / 200 = 3
500 – 699        4            499.5 – 699.5       200           (100 × 4) / 200 = 2
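The adjusted frequency column of Eg 8 can be checked with a short sketch. The function name `adjusted_frequencies` is chosen for this illustration; the common class size of 100 and the class sizes and frequencies are those of the table above.

```python
def adjusted_frequencies(class_sizes, frequencies, common_size):
    """Apply adjusted frequency = common class size * frequency / class size."""
    return [
        common_size * freq / size
        for size, freq in zip(class_sizes, frequencies)
    ]


class_sizes = [100, 100, 100, 200, 200]
frequencies = [10, 18, 8, 6, 4]
print(adjusted_frequencies(class_sizes, frequencies, common_size=100))
# [10.0, 18.0, 8.0, 3.0, 2.0]
```

Classes that already have the common size keep their original frequency; only the two wider classes are scaled down.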

➢ Cumulative Frequency Distribution
• Used to determine how many or what proportion of the data
values are below or above a certain value.
• The “Less than” cumulative frequency distribution is a table
showing the total frequency of all values less than the upper
class boundary of each class.

Eg 9:

Number of books    f     Class boundaries    Upper boundary    cf or F
                                             9.5               0
10 – 29            8     9.5 – 29.5          29.5              8
30 – 49            7     29.5 – 49.5         49.5              15
50 – 69            4     49.5 – 69.5         69.5              19
70 – 89            7     69.5 – 89.5         89.5              26
90 – 109           4     89.5 – 109.5        109.5             30

Eg 10:

Amount of rainfall (cm³)    f     Class boundaries    Upper boundary    cf
                                                      19                0
19 – < 22                   4     19 – 22             22                4
22 – < 25                   10    22 – 25             25                14
25 – < 28                   12    25 – 28             28                26
28 – < 31                   4     28 – 31             31                30
31 – < 34                   1     31 – 34             34                31
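The cf columns in Eg 9 and Eg 10 are running totals of the class frequencies, starting from 0 at the lower boundary of the first class. A minimal sketch using the standard library's `itertools.accumulate` follows; the function name `less_than_cumulative` is an assumption made for illustration.

```python
from itertools import accumulate


def less_than_cumulative(frequencies):
    """Return the 'less than' cumulative frequencies, starting with 0."""
    return [0] + list(accumulate(frequencies))


print(less_than_cumulative([8, 7, 4, 7, 4]))     # Eg 9:  [0, 8, 15, 19, 26, 30]
print(less_than_cumulative([4, 10, 12, 4, 1]))   # Eg 10: [0, 4, 14, 26, 30, 31]
```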

➢ The “less than” cumulative frequency polygon (ogive) is a line
chart of a cumulative frequency distribution in which the
cumulative frequency less than each upper class boundary is plotted
against that upper class boundary.

Eg 11: The following table shows the output produced by 20
employees in an hour in a factory.

Output (units)    Number of employees
1 – 5             1
6 – 10            2
11 – 15           3
16 – 20           9
21 – 25           5

Variable of interest = output (quantitative discrete)

Construct a “less than” cumulative frequency distribution and plot a
“less than” ogive. Hence, estimate
(i) the number of employees producing output less than 13 units.
(ii) the proportion of employees producing output more than 22 units.
(iii) the number of units of output which will be exceeded by 90% of
the employees.
(iv) the number of employees producing output between 8 and 18
units.

Solution:

Output (units)    Number of employees, f    Class boundaries    Upper boundary    cf
                                                                0.5               0
1 – 5             1                         0.5 – 5.5           5.5               1
6 – 10            2                         5.5 – 10.5          10.5              3
11 – 15           3                         10.5 – 15.5         15.5              6
16 – 20           9                         15.5 – 20.5         20.5              15
21 – 25           5                         20.5 – 25.5         25.5              20

From the ogive,
(i) number of employees producing output less than 13 units = 4.5
(ii) proportion of employees producing output more than 22 units
= (20 − 16.5) / 20 = 3.5 / 20 = 0.175

(iii) 90% of the employees are producing more than x units
→ the other 10% of the employees (10% × 20 = 2 employees) are
producing less than x units.
∴ From the ogive, x = 8 units

(iv) the number of employees producing output between 8 and 18
units = 10.5 – 2 = 8.5
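Reading values off the ogive amounts to linear interpolation between the plotted points, because the ogive joins them with straight lines. The sketch below reproduces the four estimates under that assumption; the function names `cumulative_at` and `value_at` are chosen for this illustration, and the boundaries and cumulative frequencies are those of the solution table for Eg 11.

```python
def cumulative_at(x, boundaries, cf):
    """Estimate the cumulative frequency below x by linear interpolation."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cf[-1])
    for i in range(1, len(boundaries)):
        if x <= boundaries[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cf[i - 1], cf[i]
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def value_at(target_cf, boundaries, cf):
    """Estimate the value below which target_cf observations fall."""
    for i in range(1, len(cf)):
        y0, y1 = cf[i - 1], cf[i]
        if y0 <= target_cf <= y1 and y1 > y0:
            x0, x1 = boundaries[i - 1], boundaries[i]
            return x0 + (target_cf - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target_cf is outside the cumulative range")


boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5]
cf = [0, 1, 3, 6, 15, 20]
total = cf[-1]

less_than_13 = cumulative_at(13, boundaries, cf)                    # (i)
more_than_22 = (total - cumulative_at(22, boundaries, cf)) / total  # (ii)
exceeded_by_90 = value_at(0.10 * total, boundaries, cf)             # (iii)
between_8_and_18 = (cumulative_at(18, boundaries, cf)
                    - cumulative_at(8, boundaries, cf))             # (iv)
print(less_than_13, more_than_22, exceeded_by_90, between_8_and_18)
```

A value read by eye from a hand-drawn ogive may differ slightly from these interpolated figures.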

Business Analytics (BA)


What is Business Analytics (BA)?
- Business analytics refers to the skills, technologies and practices
for continuous iterative exploration and investigation of past
business performance to gain insight and drive business planning.
- In short, BA is a rational, fact-based approach to decision making.
- BA uses analysis of real data; thus BA is about the skills needed to
turn data into decisions.
- To summarize, we distinguish the following steps in a BA project.

(i) Data collection and pre-processing are always the first steps of
a BA project.
(ii) Data often need to be collected, cleansed and combined with
other sources; not all current and historical stored data contain
all the information required for a certain analysis.
(iii) Descriptive analytics: data are analyzed and patterns
(insight/information) are found.
(iv) Predictive analytics: insight found in the descriptive phase is
used in this phase to predict what is likely to happen in the
future, if the situation remains the same.
(v) Prescriptive analytics: alternative decisions are determined that
change the situation and lead to desirable outcomes.
(vi) The decision has to be implemented; this requires various skills,
such as knowledge of change management.
- Some of the steps above need to be repeated depending on the
outcome. For example, if predictions are not accurate enough for a
particular application, then extra data is required to improve them.
- Not all BA projects include all the steps above. For example,
prescriptive analytics is not included if the goal of the project
is prediction; such a project finishes after the descriptive or
predictive steps.

Example:
A hotel chain analyzes its reservations to look for patterns: which are
the busiest days of the week? What is the impact of events in the city?
Is there a seasonal pattern? Etc. The outcomes are used to make a
prediction for the revenue in the upcoming months. By changing the
pricing of the rooms in certain situations (such as sports events or
school holidays), the expected revenue can be maximized.
