2021 CUMT 105 Module 1
INTRODUCTION TO STATISTICS
(Covid 2021)
by
Prof E. Chinamasa
echinamasa@cut.ac.zw
A. PREAMBLE
1. Develop an appreciation of the role of statistics in their lives and confidence in handling
quantitative data.
2. Develop statistical literacy and a foundation for further studies.
3. Apply statistical knowledge and skills in their disciplines.
4. Enhance students' ability to present, analyse and interpret findings for decision making.
5. Carry out statistical research and report its findings.
6. Infer population data from sample findings.
7. Carry out hypothesis tests for both parametric and non-parametric statistics.
8. Use calculators and computers to analyse and present findings for assignments.
B. COURSE CONTENT
1 Orientation
Statistics Concepts
Data presentation
C. Instructional Methodology: The course assumes that students can handle elementary
mathematics; hence instruction takes a participatory lecture approach with project-based
learning and experimentation, complemented by discussions, exercises and assignments.
D. Assessment: Introduction to Statistics is assessed by two in-class tests and two assignments,
which constitute the coursework mark and contribute 40% of the final mark. A three-hour
examination paper contributes 60% of the final mark. A candidate must get at least 50%
in both the coursework and the examination to pass the Introduction to Statistics course.
E. References
Ness, P.D., Miller, R.K., Kartchner, A.D. and Pentico, D.W. (1985). Quantitative Methods for
Management Decision. New York: McGraw-Hill.
Render, B., Stair, R.M. and Hanna, M.E. (2012). Quantitative Analysis for Management. New
York: Pearson.
Saha, S. (1995). Practical Business Mathematics and Statistics. New Delhi: Prentice Hall.
Tulsian, P.C. and Pandey, N. (2002). Quantitative Techniques: Theory and Problems. New
Delhi: Prentice Hall.
Wisniewski, M. (2006). Quantitative Methods for Decision Makers. New Delhi: Prentice Hall.
Wisniewski, M. and Stead, R. (1996). Quantitative Methods for Business. New Delhi:
Prentice Hall.
STATISTICS CONCEPTS
One of the variables affecting students’ performance in this module is the language and statistical
terms used. We will start by visiting them so that we speak the same language.
1. Statistics, as a process, is the collection, presentation, analysis and interpretation of data.
It can also be considered a processing tool for creating understanding from
numerical data.
2. Data is any collected information in its raw (unprocessed) state: the recorded values of a
variable to be analysed. For example, these ages in years of seven students:
24, 9, 35, 21, 30, 32 and 26 constitute raw data.
3. Descriptive Statistics refers to methods of organizing, summarizing and presenting raw
data so that it shows a picture of its distribution to enable it to be described. It can be in:
a) Graphical form such as tally tables, bar graphs, stem and leaf diagram, box and
whisker plot, histograms, scatter plots or pictograms.
b) Numerical form such as measures of central tendency (mean, mode, median) or
measures of scatter, dispersion or variability (variance, standard deviation).
4. Inferential Statistics involves methods of using information from sample data to infer or
draw conclusions about the population. For example, if 85 students from the Statistics
Social class say they enjoy statistics, then we can conclude that all students from the
Social class enjoy statistics. We use results that we get from the sample to approximate
the situation for the population. Why do you think this conclusion can work?
5. Population refers to all sources of the data. For example, all first year students from
Hospitality and Tourism, School of Art and Design, Wildlife Management and Visual
Arts form the population for the Introduction to Statistics class.
6. Sampling is the selection of participants subjected to the test on behalf of the
population. Hence, a sample is a portion or part of the population selected for analysis.
Samples are representative articles or people from the population. For example, students
from A-level day schools who are in the Introduction to Statistics Social class form a
sample of students for the Introduction to Statistics Social class. They are not a sample of
all first year students. Why? Can we infer that a sample is a proper subset of the
population? Justify your answer.
7. A variable is any attribute or characteristic of the population that is of interest to the
researcher or statistics student. It can be measured or observed. Examples of variables
include: (i) students’ hair styles. (ii) the pocket money students have. (iii) number of cars
in the university car park at the flag post. (iv) color of the cars. (v) students’ complexion.
A variable can be qualitative, quantitative, discrete or continuous.
a) A quantitative variable has a numerical measurement which can be added,
subtracted or averaged. For example, the number of library books a student can
borrow from the library per day is a number, say 4 books. The number makes it a
quantitative variable, because 4 is a quantity.
b) Qualitative variables describe the quality of objects or attribute of individuals by
categorizing or placing them in a group. For example, students can be categorized by
gender as Female or Male. So gender is a qualitative variable. Classify the variables
in the definition examples (i) to (v) as either qualitative or quantitative. Justify your
classification.
c) Discrete variables are counted in whole numbers. For example, the number of
students wearing hats in the class. There are no fractions here. Other examples are
cars in the car park and monkeys in the university. Why should we say that books
borrowed form a discrete quantitative variable?
d) Continuous variables take approximated numerical values. For example, we can say
your age is 23 years. That is an approximation. Your actual age may be 22 years,
7 months, 3 weeks, 5 days, 13 hours, 24 minutes and 12 seconds. Where is the
continuity? Other examples of continuous variables include speed, weight and height.
8. A Statistic is a descriptive measure from a sample. For example, the sample mean ($\bar{x}$).
If the average age of a representative sample of seven students from Creative Art is 25,
we can write $\bar{x} = 25$. The average 25 is a statistic (singular), not statistics.
9. A parameter is a descriptive measure of the population. For example, the population
mean ($\mu$). If the average age of all students in the Social Sciences class is 34, we can
write $\mu = 34$ years. The population average age, 34, is the parameter.
10. Statistics has basically five functions:
a) Condensing large volumes of data into simple forms through graphical and
descriptive presentations. For example, we reduce space by referring to the average
rather than listing all the values.
b) Prediction and forecasting (linear regression analysis)
c) Comparison of data distribution by use of percentages, bar graphs and line graphs.
d) Facilitation of description of variable distribution.
e) Inference of population parameters from sample statistics. There
are four branches of statistical inference, namely: estimation theory, hypothesis
testing, non-parametric tests and sequential analysis.
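To make the numerical form of descriptive statistics concrete, the following is a minimal sketch that computes the measures named above for the seven ages given as raw data in the definition of Data. It uses Python's standard `statistics` module; the variable names are illustrative, and the sample (n - 1) versions of the variance and standard deviation are an assumption, since the text does not say which version is intended.

```python
import statistics

# Ages (in years) of the seven students given as raw data above.
ages = [24, 9, 35, 21, 30, 32, 26]

mean_age = statistics.mean(ages)        # arithmetic mean
median_age = statistics.median(ages)    # middle value of the ordered data
modes = statistics.multimode(ages)      # every value is listed when none repeats
variance = statistics.variance(ages)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)        # sample standard deviation

print(f"mean = {mean_age:.2f}, median = {median_age}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
print(f"modes = {modes}")
```

Because no age is repeated, every value is returned as a mode, which is a reminder that the mode is not always informative for small data sets.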
1. It is important for the manager to know the appropriate data collection and sampling
methods used so that:
(a) He or she can advise subordinates who do not know how to collect data. This becomes a
source of the manager's expert power.
(b) He or she can evaluate data presented before making decisions which commit
organization resources.
(c) He or she knows how the data was collected, since data collection and sampling methods
have a bearing on the data itself.
(d) The population is identified, which helps to focus intervention strategies on the
appropriate and affected population.
(e) He or she knows how the data was collected in order to:
(i) account for errors in it
(ii) take remedial action to improve the reliability of findings
(iii) determine the representativeness and reliability of the data
(iv) justify the target population for intervention
2. There are basically four methods of collecting data which can be used in organizations:
a) Surveys: collect data from people's minds by asking questions. The questions can be oral
(an interview) or written (a test or a self-reporting questionnaire). Surveys collect large
quantities of data within a short period. Examples of surveys include an examination, a census and
elections. The following instruments are used in surveys.
Method | Instrument
Interview | Interview Guide
Examination | Examination Question Paper
Elections | Ballot Paper
Census | Optical Mark Recognition Form
Note: The key element of a survey is the collection of data from people's minds by asking
questions. Hence people's perceptions, opinions, preferences and knowledge levels
are collected through surveys.
b) Observations: collect data through sight. The variable indicator for observation is behavior or
action. The researcher infers the meaning of particular actions or behaviors observed.
Observation is ideal when:
1. The variable indicator is action.
2. The environment is used to interpret data.
3. Participants have no language or are unable to describe and explain their actions.
4. Participants are too involved in their actions to be able to describe them.
5. The observer thinks that participants would not say what they actually do.
6. It may be dangerous if participants know that data about them is being collected.
7. Manipulative skill (dexterity) is to be evaluated.
(d) Experimental Interventions: interventions such as campaigns and organization training
sessions are also data collection methods. For managers to evaluate the effectiveness of an
intervention, they need to measure the level of the variable before the intervention and after
the intervention.
For an intervention:
(i) Data must be collected from the same participants before and after the intervention.
(ii) The same variable must be measured before and after the intervention.
(iii) All participants must be exposed to the intervention at the same time.
(iv) Effectiveness is measured by comparing the levels of the variable before and after the
intervention.
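The before-and-after comparison can be sketched as follows. The participant labels and scores are hypothetical, invented only to illustrate the conditions listed above: the same participants and the same variable are measured in both rounds, and effectiveness is read from the change in level.

```python
# Hypothetical scores (out of 100) for the same five participants,
# measured on the same variable before and after a training session.
before = {"P1": 52, "P2": 61, "P3": 47, "P4": 70, "P5": 58}
after = {"P1": 60, "P2": 66, "P3": 55, "P4": 69, "P5": 67}

# The comparison is only valid if the same participants appear in both rounds.
assert before.keys() == after.keys()

# Change for each participant: level after minus level before.
changes = {p: after[p] - before[p] for p in before}
mean_change = sum(changes.values()) / len(changes)

print(changes)
print(f"mean change = {mean_change:.1f}")
```

A positive mean change suggests the intervention raised the level of the variable, although individual participants (here P4) may still move the other way.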
There are four probability sampling techniques presented in the table below:
Variable Distribution | Sampling Method | Procedure
1. Is uniform, evenly distributed within the population, e.g. your knowledge of quantitative methods taught by Doctor Chinamasa | Simple Random Sampling | Match each participant's numerical identity to simple random numbers generated by a computer or scientific calculator
2. Follows a linear dependence: each respondent's view depends on the position of the respondent in the queue, e.g. people in a queue for a scarce commodity, bottles of Coca Cola on a conveyor belt, people queuing for service at the passport offices | Systematic Random Sampling | 1. Apply simple random sampling to identify the starting point on the queue. 2. Count and pick every 5th item for inspection or interview
uniformly distributed.
Note: 1. Each probability sampling technique involves simple random sampling at some point.
2. Strategic managers are expected to apply probability sampling techniques during data
collection and in their research projects.
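The two techniques described in the table can be sketched with Python's standard `random` module. This is an illustration under stated assumptions, not a prescribed procedure: the function names, the queue of 30 people and the seed values are all hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, seed=None):
    """Pick a random start within the first `step` members, then every step-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # simple random sampling fixes the starting point
    return population[start::step]

# Hypothetical queue of 30 people, identified by their positions 1 to 30.
queue = list(range(1, 31))

print(simple_random_sample(queue, 5, seed=1))
print(systematic_sample(queue, 5, seed=1))
```

The systematic version still relies on simple random sampling for its starting point, which is the sense in which each probability technique involves simple random sampling at some stage.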
DATA PRESENTATION
1. A graph is a diagram (a series of one or more points, lines, line segments, curves or
areas) that represents the variation of a variable in comparison with that of one or more other
variables.
2. The purpose of graphs is to present data that are too numerous or complicated to be
described in words. Hence we use graphs:
a) for easy visualization (pictogram)
b) to display distribution patterns ( stem and leaf diagram, bar graph)
c) to illustrate relationships between two sets of variables (scatter plot)
d) to save time and space
e) to communicate with the illiterate (pictograms)
f) to compare parts of a whole (pie chart)
3. A graph must have an informative title, with the variables on the horizontal and vertical axes
labeled. The horizontal axis must always carry the independent variable. Diagrams must be
neither so small nor so big that they distort information.
20 15 5 40 10
i) How do you classify these variables; clothes, food, sweets, cosmetics and module?
Justify your answer. [4]
ii) Use your computer to present the data on a justified graph. [9]
Answer
The variables are quantitative and discrete. They can be counted in whole numbers.
The data is presented on a pie-chart. The aim is to show the distribution of the $100,00.
Therefore the appropriate graph is the pie-chart.
How do we draw a pie-chart at a University of Technology?
1. Close this Word Window, and Go to Excel.
2. Enter the Items and expenses in the cells as shown by the lecturer or tutor
3. Highlight the cells
4. Look at your computer tool bar, Click Insert.
5. Select pie-chart and click (the pie-chart is drawn but it has no title)
6. On the tool bar, check for the graph with a provisional space for the title and click
7. Type the title of the graph.
8. Highlight the graph and copy.
9. Save the graph and close the window.
10. Open your word document and Paste the graph as shown below
Qb) The following data shows the marks of 51 students from the Introduction to Statistics
Social class test. 74, 98, 42, 75, 83, 87, 65, 59, 63, 86, 78, 37, 99, 66, 90, 79,
80, 89, 68, 57, 95, 55, 79, 88, 76, 60, 77, 49, 92, 83, 71, 78, 53, 81,
77, 58, 93, 85, 70, 61, 15, 80, 74, 69, 90, 62, 84, 64, 73, 48, 72.
(i) Name two sources of this data. [2]
(ii) Present the data on a justified graph. [10]
(iii) Write the advantages of presenting data on a Stem and Leaf diagram.[4]
(iv) Name two variables from your area of specialization, whose distribution can be
presented on a Stem and Leaf diagram. [4]
Answer
i) Three possible sources of this data are: the lecturer's mark sheet, students' scripts (by
document analysis) and the students themselves (by interviews through surveys).
ii) Data is presented on a Stem and Leaf diagram to show the distribution and maintain
the identity of each entry.
2) Record each unit under the Leaf column, against the appropriate stem like this:
Stem | Leaf
1 | 5
2 |
3 | 7
4 | 2 9 8
5 | 9 7 5 3 8
6 | 5 3 6 8 0 1 9 2 4
7 | 4 5 8 9 9 6 7 1 8 7 0 4 3 2
8 | 3 7 6 0 9 8 3 1 5 0 4
9 | 8 9 0 5 2 3 0
3) Now draw a second diagram with the units (leaves) arranged in order of size, to produce this
second graph which you submit for marking.
Stem | Leaf
1 | 5
2 |
3 | 7
4 | 2 8 9
5 | 3 5 7 8 9
6 | 0 1 2 3 4 5 6 8 9
7 | 0 1 2 3 4 4 5 6 7 7 8 8 9 9
8 | 0 0 1 3 3 4 5 6 7 8 9
9 | 0 0 2 3 5 8 9
4) Provide a key: 7 | 6 = 76 (which reads: 7 vertical line 6 equals 76).
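The steps above can also be carried out by computer. The sketch below builds the ordered stem-and-leaf diagram for the 51 marks in Qb, using the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Marks of the 51 students from Qb.
marks = [74, 98, 42, 75, 83, 87, 65, 59, 63, 86, 78, 37, 99, 66, 90, 79,
         80, 89, 68, 57, 95, 55, 79, 88, 76, 60, 77, 49, 92, 83, 71, 78, 53, 81,
         77, 58, 93, 85, 70, 61, 15, 80, 74, 69, 90, 62, 84, 64, 73, 48, 72]

def stem_and_leaf(values):
    """Return {stem: ordered leaves} with tens as stems and units as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Keep empty stems (such as 2) so that gaps in the data stay visible.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

diagram = stem_and_leaf(marks)
print("Stem | Leaf")
for stem, leaf in diagram.items():
    print(f"{stem:>4} | {' '.join(map(str, leaf))}")
print("Key: 7 | 6 = 76")
```

Sorting the marks before splitting them produces the ordered diagram directly, so the unordered first draft is only needed when working by hand.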
iii) Advantages of presenting data on a Stem and Leaf diagram.
1. The data is arranged in order from 15 to 99.
2. It shows the distribution of the variable (marks). We can see that the majority of
the students scored in the 70s. That is the longest line.
3. There is one outlier, the mark of 15%.
4. The mark distribution is negatively skewed. The frequencies tail off gradually towards the
lower marks.
The skewness of a distribution can be shown by the box and whisker plot. How do we
construct the box and whisker plot?
1. Identify the quartiles in the distribution: divide the ordered distribution into 4 equal parts
(quarters). In this case Q1 = 62, Q2 = 75 and Q3 = 84.
2. Draw a horizontal line to cover the whole distribution, from 10 to 100.
3. Mark the quartiles and draw the box.
4. Extend the lower whisker to the lowest entry that is not an outlier, 37, and show the
outlier 15 by a dot. Extend the upper whisker to 99.
We now focus on the box. For any distribution, one of these three boxes will arise:
a) When Q2 - Q1 > Q3 - Q2, the distribution is negatively skewed.
b) When Q2 - Q1 < Q3 - Q2, the distribution is positively skewed.
c) When Q2 - Q1 = Q3 - Q2, the distribution is symmetrical.
What is the distribution shown by the box and whisker plot for the students’ mark distribution
above?
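As a check on the quartiles quoted above, the sketch below recomputes them for the 51 marks and applies the box rule. Two assumptions are made: quartile k is taken at position k(n + 1)/4 in the ordered data, which is what `statistics.quantiles` does with `method="exclusive"`, and outliers are flagged with the common 1.5 x IQR rule, which the text does not name explicitly.

```python
import statistics

# Marks of the 51 students from Qb.
marks = [74, 98, 42, 75, 83, 87, 65, 59, 63, 86, 78, 37, 99, 66, 90, 79,
         80, 89, 68, 57, 95, 55, 79, 88, 76, 60, 77, 49, 92, 83, 71, 78, 53, 81,
         77, 58, 93, 85, 70, 61, 15, 80, 74, 69, 90, 62, 84, 64, 73, 48, 72]

# For n = 51 the quartiles fall on the 13th, 26th and 39th ordered marks.
q1, q2, q3 = statistics.quantiles(marks, n=4, method="exclusive")

def box_skewness(q1, q2, q3):
    """Classify the shape from the position of the median inside the box."""
    if q2 - q1 > q3 - q2:
        return "negatively skewed"
    if q2 - q1 < q3 - q2:
        return "positively skewed"
    return "symmetrical"

# Assumed outlier rule: anything beyond 1.5 x IQR from the box is an outlier.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [m for m in marks if m < low_fence or m > high_fence]
whisker_low = min(m for m in marks if m >= low_fence)
whisker_high = max(m for m in marks if m <= high_fence)

print(q1, q2, q3, box_skewness(q1, q2, q3))
print(outliers, whisker_low, whisker_high)
```

Under these assumptions the whiskers end at 37 and 99 and the mark 15 is the only outlier, which agrees with the construction steps given above.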
Qc) The table shows the number of students in each of the programs, who were registered
for Introduction to Statistics, Social Class in December 2016.
Number of Candidates 21 53 55 77 16 30
i) Since the data covers everybody who registered for Introduction to Statistics in the Social
Class, a census was used.
ii) The majority of the students registered for Introduction to Statistics in the Social Class are
from BSIT. This is followed by BSCAD. The number of students is from BSWSM.
Qd) The table shows the average rainfall recorded at a weather station in Chinhoyi.
Month | Oct 2016 | Nov 2016 | Dec 2016 | Jan 2017 | Feb 2017 | March 2017
Rainfall (mm) | 30 | 72 | 45 | 70 | 92 | 53
Answer
i) We intend to describe trends (changes over time) for a single variable (rainfall), so we
use a line graph.
ii) Rainfall was lowest in October 2016 (30 mm). It increased from October (30 mm) to
November (72 mm), then dropped in December (45 mm). From December, rainfall
increased to its peak in February (92 mm), then dropped again in March (53 mm).
iii) Data was collected first by experimental methods. Daily rainfall was measured and
the average for each month calculated. These records were kept as the given data.
Qe) The table shows the speed of cars approaching robots at the intersection of Herbert
Chitepo and Harare/Chirundu road in Chinhoyi.
Speed (km/h) | 15 to 20 | 21 to 30 | 31 to 45 | 46 to 50 | 51 to 75
Number of Cars | 9 | 20 | 60 | 10 | 25
i) Name the method and instrument used to collect this data. [2]
ii) Identify three factors which justify the data collection method you named in (i).
[3]
iii) Present the data on a justified graph. [2+8]
iv) Name five differences between a bar graph and a frequency density graph. [5]
Answer
i) Data was collected by observation. The instrument used is a speed detector.
ii) Observation is the data collection method because vehicles have no language, the
variable indicator (speed) is action, and drivers may not tell the truth about the speed
of their vehicles.
iii) The variable (speed) is quantitative and continuous. Continuous variables are
presented on a histogram (which shows the distribution of continuous variables).
iv) Bar graph | Histogram (frequency density graph)
Used for single discrete variables | Used for single continuous variables
Bars are separated to show discreteness | Bars are joined to show continuity
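Because the speed classes in Qe have unequal widths, the height of each histogram bar is the frequency density (frequency divided by class width), not the raw frequency. The sketch below works this out; it assumes class boundaries half a unit either side of the stated limits (so 21 to 30 becomes 20,5 to 30,5), and the variable names are illustrative.

```python
# Speed classes (km/h) and numbers of cars from the table in Qe.
classes = [(15, 20), (21, 30), (31, 45), (46, 50), (51, 75)]
cars = [9, 20, 60, 10, 25]

rows = []
for (low, high), f in zip(classes, cars):
    # Class boundaries close the gaps between classes (assumed +/- 0.5).
    lower, upper = low - 0.5, high + 0.5
    width = upper - lower
    rows.append((lower, upper, width, f / width))

for lower, upper, width, density in rows:
    print(f"{lower:>5} to {upper:<5} width {width:>4}  density {density:.1f}")
```

The widest class (51 to 75) holds 25 cars but has the lowest density, which is why drawing raw frequencies as bar heights would exaggerate it.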
Qf) The table shows the time that a sample of students spent studying Introduction to
Statistics and the marks that they got in In-Class Test 2.
Student | A | B | C | D | E | F | G | H | I
Mark | 65 | 21 | 63 | 80 | 55 | 14 | 84 | 50 | 68
i) Why are letters and not names used in this study? [2]
ii) Show the relationship between time spent preparing for the in-class test and mark
attained on a justified graph. [12]
iii) Describe the relationship between the variables. [2]
Answer
i) Letters are used instead of students’ names to protect the individual students. This is
an ethical observation in research.
ii) The appropriate graph is a scatter plot. It is used to show relationships between two
variables (time) and (mark) which have a linear dependency.
iii) There is a positive correlation between the time spent studying and the mark a
student gets. As the time spent studying increases, the mark also increases.
STUDENT ACTIVITY 1
Q1. The ages of people who came for an HIV-test on Father’s day are recorded below:
24, 91, 32, 56, 59, 36, 68, 76, 23, 40, 54, 32, 60, 55, 40, 31, 30, 25, 28, 30,
36, 79, 76, 54, 37, 66, 56, 24, 74, 72, 69, 21, 30, 57, 70, 43, 53, 35, 60,
65, 42, 26, 54, 76, 70, 21, 38, 76.
Age Group | 21 to 35 | 36 to 51 | 52 to 61 | 62 to 91
Frequency | | | |
Q2. The figures below represent the number of cattle owned by farmers in Shangwe
Resettlement area.
70, 81, 17, 107, 69, 50, 62, 57, 69, 99, 30, 38, 89, 77, 55, 68, 19, 20, 89, 40,
27, 35, 84, 47, 35, 20, 83, 67, 20, 35, 54, 72, 53, 48, 35, 31, 22, 63, 75, 76, 68,
56, 48, 34, 49, 35, 49.
Q4. The table shows the sales of school uniforms and blankets from a Shop in 2015.
Month | Jan | Feb | March | April | May | June | July | Aug | Sept | Oct | Nov | Dec
Blankets | 10 | 13 | 20 | 26 | 30 | 21 | 10 | 8 | 7 | 8 | 9 | 8
Uniforms | 38 | 20 | 11 | 30 | 26 | 15 | 12 | 5 | 28 | 26 | 20 | 27
STUDENT ACTIVITY 2
1. Use your calculator to find the mean, mode, median and standard deviation for
Questions 1 to 4 in Student Activity 1. Use them to describe the distributions.
2. The variance and standard deviation show us how the values of a variable are spread within the
population. Hence
$\text{Pearson's Coefficient of Skewness} = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
3. The mean, mode, median and standard deviation are presented here because they help us to
describe in numerical terms, the distribution of the variables within the population.
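A minimal sketch of Pearson's coefficient of skewness follows, applied to the seven ages used as raw data in the Statistics Concepts section. The function name is illustrative, and the sample standard deviation (dividing by n - 1) is assumed.

```python
import statistics

def pearson_skewness(data):
    """Pearson's coefficient of skewness: 3(mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)  # sample standard deviation (assumed)
    return 3 * (mean - median) / s

# Ages (in years) of the seven students from the definition of Data.
ages = [24, 9, 35, 21, 30, 32, 26]

sk = pearson_skewness(ages)
print(f"skewness = {sk:.2f}")
```

A negative coefficient means the mean lies below the median (negative skew), a positive one means the mean lies above it, and a value near zero suggests a roughly symmetrical distribution.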
STUDENT ACTIVITY 3
Q1. The speed of cars at a certain section of a curved road is recorded in Table 1.
b) Calculate and interpret the (i) Mean (ii) Mode (iii) Median (iv) Standard Deviation
Q2. The height of tobacco plants on a seedbed after two-weeks of germination is presented
(a) Calculate and interpret the (i) Mean (ii) Mode (iii) Median (iv) Standard Deviation
Q3. The time taken by infants at a pre-school to solve a task in the form of a model was
recorded in the table:
Q4.The weights of calves delivered at a dairy farm in May are recorded in Table:
Q5. The time taken by Fine Art students to paint a still life composition is presented in the Table:
Q6. The ages of a sample of students from the Masters in Strategic Studies Group were recorded
In Table:
ANSWERS TO QUESTIONS
Q1 (a)
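Mean, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{1\,625}{56} = 29{,}017 \approx 29$ km/h.
The average speed of cars on this section of the road is 29 km/h.
Mode, $Mo = L + \dfrac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c$, where $L$ is the lower boundary of the
modal class, $f_1$ its frequency, $f_0$ and $f_2$ the frequencies of the classes before and after it,
and $c$ the class width.
$Mo = 20{,}5 + 3{,}2 = 23{,}7$ km/h
Median, $Me = L + \dfrac{\frac{n}{2} - F}{f} \times c$, where $L$ is the lower boundary of the median class, $f$ its
frequency, $F$ the cumulative frequency before it, and $c$ the class width.
$Me = 20{,}5 + 4{,}25 = 24{,}75$ km/h
$\sum f = 56$, $\sum fx = 1\,625$, $\sum fx^2 = 54\,892{,}5$
Variance, $s^2 = \dfrac{1}{n-1}\left(\sum fx^2 - \dfrac{(\sum fx)^2}{n}\right)$
$= \dfrac{1}{55}\left(54\,892{,}5 - \dfrac{1\,625^2}{56}\right)$
$= 140{,}7$
Standard Deviation, $s = \sqrt{s^2} = \sqrt{140{,}7} = 11{,}86 \approx 11{,}9$ km/h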
Q2 (a)
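(i) Mean, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{350}{31} \approx 11{,}29$ cm
(ii) Mode, $Mo = L + \dfrac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c$ (modal class frequency $f_1$, neighbouring
frequencies $f_0$ and $f_2$, lower boundary $L$, class width $c$)
$= 10{,}5 + \dfrac{12 - 10}{(12 - 10) + (12 - 7)} \times 3$
$= 10{,}5 + 0{,}857$
$= 11{,}357 \approx 11{,}4$ cm
(iii) Median, $Me = L + \dfrac{\frac{n}{2} - F}{f} \times c$ (median class frequency $f$, cumulative frequency before
it $F$)
$= 10{,}5 + \dfrac{15{,}5 - 12}{12} \times 3$
$= 10{,}5 + 0{,}875$
$= 11{,}375 \approx 11{,}4$ cm
(iv)
Midpoint x | f | fx | fx²
3,5 | 2 | 7 | 24,5
8 | 10 | 80 | 640
12 | 12 | 144 | 1 728
17 | 7 | 119 | 2 023
$\sum f = 31$, $\sum fx = 350$, $\sum fx^2 = 4\,415{,}5$
Variance, $s^2 = \dfrac{1}{n-1}\left(\sum fx^2 - \dfrac{(\sum fx)^2}{n}\right)$
$= \dfrac{1}{30}\left(4\,415{,}5 - \dfrac{350^2}{31}\right)$
$= \dfrac{1}{30}(463{,}89)$
$= 15{,}46$
Standard Deviation, $s = \sqrt{s^2} = \sqrt{15{,}46} = 3{,}93$ cm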
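The Q2 working can be checked with a short script. This is a sketch under one stated assumption: the class boundaries below are not given in the text and were chosen so that their midpoints match the working table (3,5; 8; 12; 17) and the lower boundary 10,5 used in the mode and median. The frequencies come from the table.

```python
import math

# Assumed class boundaries (cm), consistent with midpoints 3.5, 8, 12 and 17.
boundaries = [(1.5, 5.5), (5.5, 10.5), (10.5, 13.5), (13.5, 20.5)]
freqs = [2, 10, 12, 7]

n = sum(freqs)
mids = [(lo + hi) / 2 for lo, hi in boundaries]
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)   # sample variance
std_dev = math.sqrt(variance)

# Median: interpolate inside the class containing the (n/2)-th observation.
cum = 0
for (lo, hi), f in zip(boundaries, freqs):
    if cum + f >= n / 2:
        median = lo + (n / 2 - cum) / f * (hi - lo)
        break
    cum += f

# Mode: interpolate inside the class with the highest frequency.
m = freqs.index(max(freqs))
f0 = freqs[m - 1] if m > 0 else 0
f2 = freqs[m + 1] if m + 1 < len(freqs) else 0
lo, hi = boundaries[m]
mode = lo + (freqs[m] - f0) / ((freqs[m] - f0) + (freqs[m] - f2)) * (hi - lo)

print(f"mean = {mean:.2f}, mode = {mode:.3f}, median = {median:.3f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

Only the modal and median classes affect the interpolated mode and median, so a different choice for the outer boundaries would change the mean and variance only if it also changed the midpoints.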
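Q3 c (i) Mean, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{371}{40} = 9{,}275$ min
The average time taken by infants to complete the task is 9,275 min.
(ii) Mode, $Mo = L + \dfrac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c$
$= 10{,}5 + \dfrac{12 - 3}{(12 - 3) + (12 - 7)} \times 3$
$= 10{,}5 + 1{,}9286$
$\approx 12{,}43$ min
(iii) Median, $Me = L + \dfrac{\frac{n}{2} - F}{f} \times c$
$= 7{,}5 + 3\left(\dfrac{20 - 18}{3}\right)$
$= 7{,}5 + 2$
$= 9{,}5$ min
Midpoint x | f | fx | fx²
4,5 | 18 | 81 | 364,5
9 | 3 | 27 | 243
12 | 12 | 144 | 1 728
17 | 7 | 119 | 2 023
$\sum f = 40$, $\sum fx = 371$, $\sum fx^2 = 4\,358{,}5$
(iv) Variance, $s^2 = \dfrac{1}{n-1}\left(\sum fx^2 - \dfrac{(\sum fx)^2}{n}\right)$
$= \dfrac{1}{39}\left(4\,358{,}5 - \dfrac{371^2}{40}\right)$
$= \dfrac{1}{39}(917{,}48)$
$= 23{,}525$
Standard Deviation, $s = \sqrt{s^2} = \sqrt{23{,}525} = 4{,}85$ min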
ANSWERS
Q5. (a) $\bar{x}$ = 10,8 min (b) $Mo$ = 13,5 min (c) $Me$ = 10,88 min (d) s = 6,12 min
Q6. (a) $\bar{x}$ = 39,2 years (b) $Mo$ = 40,5 years (c) $Me$ = 39,75 years (d) s = 9 years