2021 CUMT 105 Module 1

INTRODUCTION TO STATISTICS
SOCIAL SCIENCES CLASS
 (Covid 2021)
by
Prof E. Chinamasa
COURSE: Introduction to Statistics and Data Analysis

CODE: CUMT 105
DURATION: January– March 2021, SEMESTER (48 HOURS)
LECTURER: Prof. E. Chinamasa
echinamasa@cut.ac.zw
A. PREAMBLE
The Introduction to Statistics course aims to introduce statistical concepts at university level.

Specifically, the course (CUMT 105) aims to enable students to:
1. Develop an appreciation of the role of statistics in their lives and confidence in handling quantitative data
2. Develop statistical literacy and a foundation for further studies
3. Apply statistical knowledge and skills in their disciplines
4. Enhance their ability to present, analyse and interpret findings for decision making
5. Carry out statistical research and report its findings
6. Infer population data from sample findings
7. Carry out hypothesis tests for both parametric and non-parametric statistics
8. Use calculators and computers to analyse and present findings for assignments
B. COURSE CONTENT
Week | Content | Notes
1 | Orientation; Statistics Concepts |
2 | Data collection and Sampling; Data presentation |
3 | Data presentation; Measures of central tendency | Assignment 1, Due: May 28 2021
4 | Measures of dispersion/Scatter; Tutorial | In-class One, last lecture week: 24 to 28 May 2021
5 | Linear Regression analysis; Linear Regression analysis |
6 | Tree diagram; Normal distribution | Assignment 2, due end of lesson today 02/08/2021
7 | Normal distribution; Concept of Hypothesis testing | In-class 2. 09/08/2021
8 | Test for mean, small samples |
EXAM, EXAM, Exams please!!!!!!!
C. Instructional Methodology: The course assumes that students are able to deal with elementary mathematics. Instruction therefore takes a participatory lecture approach, with project-based learning and experimentation complemented by discussions, exercises and assignments.
D. Assessment: Introduction to Statistics is assessed by two in-class tests and two assignments, which together constitute the coursework mark and contribute 40% of the final mark. A three-hour EXAMINATION paper contributes 60% of the final mark. A candidate must get at least 50% in both the coursework and the examination to pass the Introduction to Statistics course.
E. References
Crawshaw, J. and Chambers, J. (2001). Advanced Level Statistics. Cheltenham: Nelson Thornes.
Jothikumas, J. (2005). Statistics: Higher Secondary – First Year. [O]. www.textbooksonline.tn.nic.in/books/11/std11-stat-em.pdf.
McGrattan, E. R. (2007). Lecture notes: Quantitative Methods. [O]. ftp://ftp.mpls.frb.fed.us/pub/research/mcgrattan/minho07/minho1.pdf.
Morris, C. (1993). Quantitative Approaches in Business Studies. London: Pitman.
Ness, P.D., Miller, R.K., Kartchner, A.D. and Pentico, D.W. (1985). Quantitative Methods for Management Decision. New York: McGraw-Hill.
Rao, A. (2003). Quantitative Techniques in Business. New Delhi: Jenico.
Render, B., Stair, R.M. and Hanna, E. M. (2012). Quantitative Analysis for Management. New York: Pearson.
Saha, S. (1995). Practical Business Mathematics and Statistics. New Delhi: Prentice Hall.
Tulsian, P.C. and Pandey, N. (2002). Quantitative Techniques: Theory and Problems. New Delhi: Prentice Hall.
Valerie, P. (2006). Quantitative Techniques in Business. [O]. www.shsu.educ/mgt-ves/BAN530/ReadingAssignment1.doc.
Wisniewski, M. (2006). Quantitative Methods for Decision Makers. New Delhi: Prentice Hall.
Wisniewski, M. and Stead, R. (1996). Quantitative Methods for Business. New Delhi: Prentice Hall.
STATISTICS CONCEPTS
One of the factors affecting students’ performance in this module is the language and statistical terminology used. We will start by reviewing the key terms so that we speak the same language.
1. Statistics, as a process, is the collection, presentation, analysis and interpretation of data. It can also be considered a processing tool for creating understanding from numerical data.
2. Data is any collected information in its raw (unprocessed) state: the recorded values of a variable to be analyzed. For example, these ages in years of seven students, 24, 9, 35, 21, 30, 32 and 26, constitute raw data.
3. Descriptive Statistics refers to methods of organizing, summarizing and presenting raw data so that the shape of its distribution can be seen and described. It can be in:
a) Graphical form such as tally tables, bar graphs, stem and leaf diagrams, box and whisker plots, histograms, scatter plots or pictograms.
b) Numerical form such as measures of central tendency (mean, mode, median) or measures of scatter, dispersion or variability (variance, standard deviation).
4. Inferential Statistics involves methods of using information from sample data to infer or draw conclusions about the population. For example, if a sample of 85 students from the Statistics Social class say they enjoy statistics, then we can infer that students from the Social class as a whole enjoy statistics. We use the results that we get from the sample to approximate the situation for the population. Why do you think this conclusion can work?
5. Population refers to all sources of the data. For example, all first year students from Hospitality and Tourism, School of Art and Design, Wildlife Management and Visual Arts form the population for the Introduction to Statistics class.
6. Sampling is the selection of participants who are subjected to the test on behalf of the population. Hence, a sample is a portion or part of the population selected for analysis. Samples are representative articles or people from the population. For example, students from A-level day schools who are in the Introduction to Statistics Social class form a sample of students from the Introduction to Statistics Social class. They are not a sample of all first year students. Why? Can we infer that a sample is a proper subset of the population? Justify your answer.
7. A variable is any attribute or characteristic of the population that is of interest to the
researcher or statistics student. It can be measured or observed. Examples of variables
include: (i) students’ hair styles. (ii) the pocket money students have. (iii) number of cars
in the university car park at the flag post. (iv) color of the cars. (v) students’ complexion.
A variable can be qualitative, quantitative, discrete or continuous.
a) A quantitative variable has a numerical measurement which can be added, subtracted or averaged. For example, the number of library books a student can borrow from the library per day is a number, say 4 books. The number makes it a quantitative variable, because 4 is a quantity.
b) Qualitative variables describe the quality of objects or the attributes of individuals by categorizing them or placing them in a group. For example, students can be categorized by gender as Female or Male. So gender is a qualitative variable. Classify the variables in the definition examples (i) to (v) as either qualitative or quantitative. Justify your classification.
c) Discrete variables are counted in whole numbers. For example, the number of students wearing hats in the class. There are no fractions here. Other examples are cars in the car park and monkeys in the university. Why should we say that books borrowed form a discrete quantitative variable?
d) Continuous variables take approximated numerical values. For example, we can say your age is 23 years. That is an approximation. Your actual age may be 22 years, 7 months, 3 weeks, 5 days, 13 hours, 24 minutes and 12 seconds. Where is the continuity? Other examples of continuous variables include speed, weight and height.
8. A Statistic is a descriptive measure from a sample. For example, the sample mean ($\bar{x}$). If the average age of a representative sample of seven students from Creative Art is 25, we can write $\bar{x} = 25$. The average 25 is a Statistic. Not Statistics, No!!
9. A parameter is a descriptive measure of the population. For example, the population mean ($\mu$). If the average age of all students in the Social Sciences class is 34, we can write $\mu = 34$ years. The population average age, 34, is the parameter.
10. Statistics has basically five functions:
a) Condensing large volumes of data into simple forms through graphical and
descriptive presentations. For example, we reduce space by referring to the average
rather than listing all the values.
b) Prediction and forecasting (linear regression analysis)
c) Comparison of data distribution by use of percentages, bar graphs and line graphs.
d) Facilitation of description of variable distribution.
e) Inference of population parameters from sample statistics. There are four branches of statistical inference, namely: estimation theory, hypothesis testing, non-parametric tests and sequential analysis.
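As a small illustration of a sample statistic, the sketch below summarizes the seven ages given as raw data under the definition of data. It uses Python's standard `statistics` module. The variable names are illustrative only, and treating the seven ages as a sample (rather than a whole population) is an assumption made for the example.

```python
from statistics import mean, median, stdev

# Ages (in years) of the seven students listed as raw data in the notes.
ages = [24, 9, 35, 21, 30, 32, 26]

sample_mean = mean(ages)      # a statistic: it describes this sample only
sample_median = median(ages)  # middle value once the ages are sorted
sample_sd = stdev(ages)       # sample standard deviation (divisor n - 1)

print(f"n = {len(ages)}")
print(f"sample mean = {sample_mean:.2f} years")
print(f"sample median = {sample_median} years")
print(f"sample standard deviation = {sample_sd:.2f} years")
```

Each printed value condenses the seven raw ages into a single number, which is the first function of statistics listed above.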
DATA COLLECTION METHODS
1. It is important for a manager to know the appropriate data collection and sampling methods so that he or she can:
(a) Advise subordinates who do not know how to collect data. This becomes a source of the manager’s expert power.
(b) Evaluate data presented to him or her before making decisions which commit organization resources.
(c) Understand how the data was collected, since data collection and sampling methods have a bearing on the data itself.
(d) Identify the population, which helps to focus intervention strategies on the appropriate and affected population.
(e) Use knowledge of how the data was collected in order to:
• account for errors in it
• take remedial action to improve the reliability of findings
• determine the representativeness and reliability of the data
• justify the target population for intervention
2. There are basically four methods of collecting data which can be used in organizations:
a) Surveys: collect data from people’s minds by asking questions. The questions can be oral (an interview) or written (a test or a self-reporting questionnaire). Surveys collect large quantities of data within a short period. Examples of surveys include an examination, a census and elections. The following instruments are used in surveys.
Method | Instrument
Interview | Interview Guide
Examination | Examination Question Paper
Elections | Ballot Paper
Census | Optical Mark Recognition Form
Note that the key element of a survey is the collection of data from people’s minds by asking questions. Hence people’s perceptions, opinions, preferences and knowledge levels are collected through surveys.
b) Observations: collect data through sight. The variable indicator for observation is behavior or action. The researcher infers the meaning of the particular actions or behaviors observed.
Observation is ideal when:
1. The variable indicator is action.
2. The environment is used to interpret the data.
3. Participants have no language or are unable to describe and explain their actions.
4. Participants are too involved in their actions to be able to describe them.
5. The observer thinks that participants would not say what they actually do.
6. It may be dangerous if participants know that data about them is being collected.
7. A manipulative skill (dexterity) is being evaluated.
c) Documentary Analysis: is the collection of data from documents. Data is collected to deduce information about the author(s) or the events being described in those documents.
For example:
• A letter of dismissal informs me that my employers no longer want my services.
• Circulated minutes of a meeting tell us what the members present thought and agreed on. They can also show points of debate and contradictions.
• An examination answer script reveals what the candidate knows and understood.
• Policy documents tell us what our employers want us to do and how we should do it.
Note that data can be collected in the absence of the participants.
d) Experimental Interventions: interventions such as campaigns and organization training sessions are also data collection methods. For managers to evaluate the effectiveness of an intervention, they need to measure the level of the variable before and after the intervention.
For an intervention:
• Data must be collected from the same participants before and after the intervention.
• The same variable must be measured before and after the intervention.
• All participants must be exposed to the intervention at the same time.
• Effectiveness is measured by comparing the levels of the variable before and after the intervention.
3. Probability Sampling Methods

Sampling is the selection of items or participants from a population, who are subjected to a test
of a specific variable on behalf of the population. Sampling is either probability or non-
probability.
Probability sampling is appropriate when:

(a) The total population is known
(b) Every element or member is accorded an equal chance to participate
(c) The variable is expected to exist within each element or member.
There are four probability sampling techniques presented in the table below:
Variable Distribution | Sampling Method | Procedure
1. Is uniform, evenly distributed within the population, e.g. your knowledge of quantitative methods taught by Doctor Chinamasa | Simple Random Sampling | Match each participant’s numerical identity to simple random numbers generated by a computer or scientific calculator
2. Follows a linear dependence, e.g. people in a queue for a scarce commodity. Each respondent’s view depends on the position of the respondent in the queue. Other examples: bottles of Coca Cola on a conveyor belt, people queuing for service at the passport offices | Systematic Random Sampling | 1. Apply simple random sampling to identify the starting point on the queue. 2. Count and pick every 5th item for inspection or interview
3. Depends on the level of the individual in a hierarchy, e.g. Managerial, Supervisory or Operational, or Form 6, Form 3, Form 1. The levels differ in numbers. Views depend on the level (they differ from level to level) but can be uniform within the same level | Stratified Random Sampling | 1. Apply proportional sampling from stratum to stratum (level to level) to cater for their quantitative variation. 2. Apply simple random sampling within each stratum, since the variable is expected to be uniformly distributed within each stratum
4. Depends on the group to which the individual responding belongs, e.g. students’ views from different universities. Each university is considered as a unique cluster | Cluster Random Sampling | 1. Apply proportional sampling from cluster to cluster (group to group) to cater for the quantitative variations among the groups. 2. Apply simple random sampling in each group, since the variable in one group is expected to be uniformly distributed
Note: 1. Each probability sampling technique applies simple random sampling at some point.
2. Strategic managers are expected to apply probability sampling techniques during data collection and in their research projects.
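The first three techniques can be sketched in a few lines of Python using the standard `random` module. This is an illustrative sketch only: the function names, the populations of numbered participants, the seed and the stratum sizes are all hypothetical and chosen for the example, not taken from the module.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance; n members drawn without replacement."""
    return rng.sample(population, n)

def systematic_sample(queue, k, rng):
    """Random starting point among the first k members, then every k-th member."""
    start = rng.randrange(k)
    return queue[start::k]

def stratified_sample(strata, n, rng):
    """Proportional allocation per stratum, then simple random sampling within it.

    Rounding means the stratum sizes may not always add up to exactly n.
    """
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        share = max(1, round(n * len(members) / total))
        chosen[name] = rng.sample(members, share)
    return chosen

rng = random.Random(2021)  # fixed seed so the sketch is repeatable

students = list(range(1, 101))            # hypothetical class of 100 numbered students
srs = simple_random_sample(students, 10, rng)

queue = list(range(1, 51))                # hypothetical queue of 50 people
every_fifth = systematic_sample(queue, 5, rng)

staff = {                                 # hypothetical hierarchy of 100 employees
    "Managerial": list(range(0, 10)),
    "Supervisory": list(range(10, 40)),
    "Operational": list(range(40, 100)),
}
by_level = stratified_sample(staff, 10, rng)

print("Simple random:", sorted(srs))
print("Systematic:", every_fifth)
print("Stratified sizes:", {level: len(picked) for level, picked in by_level.items()})
```

Cluster sampling, as described in the table, would follow the same pattern as `stratified_sample`, with each university treated as one group.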
Factors affecting sample size

1. The research philosophy used. For qualitative research, whose purpose is to understand, a single case can suffice. For quantitative research, whose purpose is to generalize findings, large samples (n > 30) are required.
2. Variable distribution within the population. It can be homogeneous, heterogeneous or linear.
3. Number of cases proposed for the sample
4. The nature of the study. Surveys require large samples while case studies need smaller
samples to facilitate detailed analysis
5. The type of sampling used, either probability or non-probability
6. Size of the population and detail required.
DATA PRESENTATION
1. A graph is a diagram (a series of one or more points, lines, line segments, curves or areas) that represents the variation of a variable in comparison with that of one or more other variables.
2. The purpose of graphs is to present data that are too numerous or complicated to be described in words. Hence we use graphs:
a) for easy visualization (pictogram)
b) to display distribution patterns (stem and leaf diagram, bar graph)
c) to illustrate relationships between two sets of variables (scatter plot)
d) to save time and space
e) to communicate with the illiterate (pictograms)
f) to compare parts of a whole (pie chart)
3. A graph must have an informative title, with the variables on the horizontal and vertical axes labeled. The horizontal axis must always carry the independent variable. Diagrams must be neither so small nor so big that they distort the information.
4. Examples of questions requiring graphs:

Qa) The table shows how a first year student used her $100,00 pocket money during the orientation week.
Item | Clothes | Food | Sweets | Cosmetics | Statistics Module
Amount ($) | 20 | 15 | 5 | 40 | 10
i) How do you classify these variables; clothes, food, sweets, cosmetics and module?
Justify your answer. [4]
ii) Use your computer to present the data on a justified graph. [9]
Answer
The variables are quantitative and discrete. They can be counted in whole numbers.
The data is presented on a pie-chart. The aim is to show the distribution of the $100,00.
Therefore the appropriate graph is the pie-chart.
How do we draw a pie-chart at a University of Technology?
1. Close this Word Window, and Go to Excel.
2. Enter the Items and expenses in the cells as shown by the lecturer or tutor
3. Highlight the cells
4. Look at your computer tool bar, Click Insert.
5. Select pie-chart and click (the pie-chart is drawn but it has no title)
6. On the tool bar, check for the graph with a provisional space for the title and click
7. Type the title of the graph.
8. Highlight the graph and copy.
9. Save the graph and close the window.
10. Open your word document and Paste the graph as shown below
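The arithmetic that Excel performs when it draws the pie chart can be sketched as follows: each item's sector angle is its share of the total, multiplied by 360 degrees. This is only an illustration in Python, not part of the Excel procedure above. Note that the five amounts listed in the table add up to $90, so the sketch takes the shares relative to the listed total; that choice is an assumption.

```python
# Pocket money spent per item (in dollars), as listed in the table for Qa.
expenses = {
    "Clothes": 20,
    "Food": 15,
    "Sweets": 5,
    "Cosmetics": 40,
    "Statistics Module": 10,
}

total = sum(expenses.values())  # total of the listed amounts

# Share of the whole as a percentage, and the matching sector angle in degrees.
percents = {item: 100 * amount / total for item, amount in expenses.items()}
angles = {item: 360 * amount / total for item, amount in expenses.items()}

for item, amount in expenses.items():
    print(f"{item:<18} ${amount:>3}  {percents[item]:5.1f}%  {angles[item]:6.1f} degrees")
print(f"Total listed: ${total}")
```

The angles add up to 360 degrees, which is a quick check to make before drawing the chart by hand.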
Qb) The following data shows the marks of 51 students from the Introduction to Statistics
Social class test. 74, 98, 42, 75, 83, 87, 65, 59, 63, 86, 78, 37, 99, 66, 90, 79,
80, 89, 68, 57, 95, 55, 79, 88, 76, 60, 77, 49, 92, 83, 71, 78, 53, 81,
77, 58, 93, 85, 70, 61, 15, 80, 74, 69, 90, 62, 84, 64, 73, 48, 72.
(i) Name two sources of this data. [2]
(ii) Present the data on a justified graph. [10]
(iii) Write the advantages of presenting data on a Stem and Leaf diagram.[4]
(iv) Name two variables from your area of specialization, whose distribution can be
presented on a Stem and Leaf diagram. [4]
Answer
i) Three possible sources of this data are: the lecturer’s mark sheet, the students’ scripts (by document analysis) and the students themselves (by interviews through surveys).
ii) The data is presented on a Stem and Leaf diagram to show the distribution and maintain the identity of each entry.
How do we draw the Stem and Leaf Diagram?

1) Identify the Stem (In this case, tens because the data is made up of two digit values)
2) Record each unit under the Leaf column, against the appropriate stem, like this:
Stem | Leaf
1 | 5
2 |
3 | 7
4 | 2 9 8
5 | 9 7 5 3 8
6 | 5 3 6 8 0 1 9 2 4
7 | 4 5 8 9 9 6 7 1 8 7 0 4 3 2
8 | 3 7 6 0 9 8 3 1 5 0 4
9 | 8 9 0 5 2 3 0
3) Now draw a second diagram with the units (leaves) arranged in order of size. This second graph is the one you submit for marking.
Stem | Leaf
1 | 5
2 |
3 | 7
4 | 2 8 9
5 | 3 5 7 8 9
6 | 0 1 2 3 4 5 6 8 9
7 | 0 1 2 3 4 4 5 6 7 7 8 8 9 9
8 | 0 0 1 3 3 4 5 6 7 8 9
9 | 0 0 2 3 5 8 9
4) Provide a key: 7 | 6 = 76 (which reads: 7 vertical line 6 equals 76).
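The four steps can also be sketched in Python as a check on a hand-drawn diagram. The function name `stem_and_leaf` is illustrative, and the sketch assumes whole-number marks with tens as stems and units as leaves, as in this question.

```python
from collections import defaultdict

# The 51 test marks given in Qb.
marks = [
    74, 98, 42, 75, 83, 87, 65, 59, 63, 86, 78, 37, 99, 66, 90, 79,
    80, 89, 68, 57, 95, 55, 79, 88, 76, 60, 77, 49, 92, 83, 71, 78, 53, 81,
    77, 58, 93, 85, 70, 61, 15, 80, 74, 69, 90, 62, 84, 64, 73, 48, 72,
]

def stem_and_leaf(values):
    """Return the ordered stem and leaf diagram as a list of text lines."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):          # sorting first gives ordered leaves
        leaves_by_stem[value // 10].append(value % 10)
    lines = []
    # Include empty stems (such as 2) so that gaps in the data stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

diagram = stem_and_leaf(marks)
print("Stem | Leaf")
for line in diagram:
    print(line)
print("Key: 7 | 6 = 76")
```

The printed rows should match the ordered diagram in step 3.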
iii) Advantages of presenting data on a Stem and Leaf diagram:
1. The data is arranged in order, from 15 to 99.
2. It shows the distribution of the variable (marks). We can see that the largest group of students scored in the 70s. That is the longest line.
3. It reveals outliers. Here there is one outlier case of 15%.
4. It shows skewness. The mark distribution is negatively skewed: the frequencies tail off gradually towards the lower marks.
The skewness of a distribution can also be shown by the Box and Whisker plot. How do we construct the box and whisker plot?
1. Identify the quartiles, which divide the distribution into 4 equal parts (quarters). In this case Q1 = 62, Q2 = 75 and Q3 = 84.
2. Draw a horizontal line to cover the whole distribution, from 10 to 100.
3. Mark the quartiles and draw the box.
4. Extend the lower whisker to the lowest entry that is not an outlier, 37, and show the outlier 15 by a dot. Extend the upper whisker to 99.
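[Figure: box and whisker plot titled "Students’ Mark distribution", drawn on a horizontal axis from 15 to 100 in steps of 5, with Q1, Q2 and Q3 marked on the box and the outlier 15 shown as a dot.]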
We now focus on the box. For any distribution, one of these three boxes will arise:
• When Q2 – Q1 > Q3 – Q2, the distribution is negatively skewed.
• When Q2 – Q1 < Q3 – Q2, the distribution is positively skewed.
• When Q2 – Q1 = Q3 – Q2, the distribution is symmetrical (normal).
What is the distribution shown by the box and whisker plot for the students’ mark distribution above?
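The quartiles and the box comparison can be checked with a short Python sketch. It uses `statistics.quantiles` with the "exclusive" method, which places the quartiles at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 of the ordered data. The 1.5 × IQR rule used to flag outliers is an assumption added here; the notes identify the outlier 15 by inspection. The function name `box_skew` is illustrative.

```python
from statistics import quantiles

# The 51 test marks given in Qb.
marks = [
    74, 98, 42, 75, 83, 87, 65, 59, 63, 86, 78, 37, 99, 66, 90, 79,
    80, 89, 68, 57, 95, 55, 79, 88, 76, 60, 77, 49, 92, 83, 71, 78, 53, 81,
    77, 58, 93, 85, 70, 61, 15, 80, 74, 69, 90, 62, 84, 64, 73, 48, 72,
]

q1, q2, q3 = quantiles(marks, n=4, method="exclusive")

# Assumed outlier rule: anything beyond 1.5 x IQR from the box is an outlier.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [m for m in marks if m < low_fence or m > high_fence]
inside = [m for m in marks if low_fence <= m <= high_fence]
whiskers = (min(inside), max(inside))

def box_skew(q1, q2, q3):
    """Compare the two halves of the box, as in the three cases above."""
    lower_half, upper_half = q2 - q1, q3 - q2
    if lower_half > upper_half:
        return "negatively skewed"
    if lower_half < upper_half:
        return "positively skewed"
    return "symmetrical"

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"Whiskers run from {whiskers[0]} to {whiskers[1]}; outliers: {outliers}")
print(f"The distribution is {box_skew(q1, q2, q3)}")
```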
Qc) The table shows the number of students in each of the programs who were registered for Introduction to Statistics, Social Class, in December 2016.
Program | BSF | BSHT | BSCAD | BSIT | BSWSM | BSTRA
Number of Candidates | 21 | 53 | 55 | 77 | 16 | 30
i) Show the distribution of this data on a justified graph. [11]

ii) What sampling method was used to get this data? [2]
iii) Describe the distribution. [3]
Answer
i) Number of students is a discrete random variable. Its distribution is shown on a bar graph. The aim of presenting this data (number of students) is to compare the distribution of a single variable across groups, which justifies the use of a bar graph.
ii) Since the data shows everybody who registered for Introduction to Statistics in the Social Class, a census was used.
iii) The largest group of students registered for Introduction to Statistics in the Social Class is from BSIT (77). This is followed by BSCAD (55). The smallest number of students is from BSWSM (16).
Qd) The table shows the average rainfall recorded at a weather station in Chinhoyi.
Month | Oct 2016 | Nov 2016 | Dec 2016 | Jan 2017 | Feb 2017 | March 2017
Rainfall (mm) | 30 | 72 | 45 | 70 | 92 | 53
i) Present the data on a justified graph. [7]

ii) Describe the trends. [3]
iii) Suggest a data collection method for this data.[3]
Answer
i) We intend to describe trends (changes over time) for a single variable (rainfall). We use a line graph.
ii) Rainfall was lowest in October 2016 (30 mm). It increased from October (30 mm) to November (72 mm), then dropped in December (45 mm). From December, rainfall increased to its peak in February (92 mm), then dropped again in March (53 mm).
iii) Data was collected first by experimental methods. Daily rainfall was measured and the average for each month calculated. These records were kept as the given data.
Qe) The table shows the speed of cars approaching the robots at the intersection of Herbert Chitepo and Harare/Chirundu road in Chinhoyi.
Speed (km/h) | 15 to 20 | 21 to 30 | 31 to 45 | 46 to 50 | 51 to 75
Number of Cars | 9 | 20 | 60 | 10 | 25
i) Name the method and instrument used to collect this data. [2]
ii) Identify three factors which justify the data collection method you named in (i). [3]
iii) Present the data on a justified graph. [2+8]
iv) Name five differences between a bar graph and a frequency density graph. [5]
Answer
i) Data was collected by observation. The instrument used is a speed detector.
ii) Observation is the data collection method because vehicles have no language, the variable indicator (speed) is action, and drivers may not tell the truth about the speed of their vehicles.
iii) The variable (speed) is quantitative and continuous. Continuous variables are presented on a histogram (which shows the distribution of continuous variables).
iv)
Bar graph | Histogram or Frequency Density graph
Used for single discrete variables | Used for single continuous variables
Vertical axis has frequency | Vertical axis has frequency density
Bars have equal widths | Bars may have unequal class widths
Bars are separated to show discreteness | Bars are joined to show continuity
Height of each bar is equal to frequency | Height of each bar corresponds to frequency density
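The heights of the histogram bars for Qe can be worked out as frequency density = frequency ÷ class width, where the width is measured between real class limits (half a unit beyond the stated limits). A minimal sketch in Python, with illustrative variable names:

```python
# Stated class limits (km/h) and number of cars from the table in Qe.
speed_classes = [(15, 20), (21, 30), (31, 45), (46, 50), (51, 75)]
cars = [9, 20, 60, 10, 25]

rows = []
for (lower, upper), frequency in zip(speed_classes, cars):
    # Real class limits extend half a unit beyond the stated limits.
    real_lower, real_upper = lower - 0.5, upper + 0.5
    width = real_upper - real_lower
    rows.append((real_lower, real_upper, width, frequency / width))

densities = [density for _, _, _, density in rows]

print("Real limits      Width  Frequency density")
for real_lower, real_upper, width, density in rows:
    print(f"{real_lower:>5} - {real_upper:<5}  {width:>5}  {density:>5}")
```

Because the classes have unequal widths, the tallest bar belongs to the class with the highest frequency density, which is not necessarily the class with the highest frequency.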
Qf) The table shows the time that a sample of students spent studying Introduction to Statistics and the marks that they got in In-Class test 2.
Student | A | B | C | D | E | F | G | H | I
Time (h) | 6 | 3 | 8 | 7 | 5 | 2 | 9 | 4 | 7,5
Mark | 65 | 21 | 63 | 80 | 55 | 14 | 84 | 50 | 68
i) Why are letters and not names used in this study? [2]
ii) Show the relationship between the time spent preparing for the in-class test and the mark attained on a justified graph. [12]
iii) Describe the relationship between the variables. [2]
Answer
i) Letters are used instead of students’ names to protect the individual students. This is an ethical observation in research.
ii) The appropriate graph is a scatter plot. It is used to show the relationship between two variables, (time) and (mark), which have a linear dependency.
iii) There is a positive correlation between the time spent studying and the mark a student gets. As the time spent reading increases, the mark also increases.
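The strength of the relationship seen on the scatter plot can be summarized by Pearson's correlation coefficient r. This coefficient is not part of the answer above; it is added here as an illustrative sketch, with a hypothetical helper name `pearson_r`. Values of r close to +1 indicate a strong positive linear relationship.

```python
from math import sqrt

# Study time (hours) and test mark for students A to I, from the table in Qf.
hours = [6, 3, 8, 7, 5, 2, 9, 4, 7.5]
marks = [65, 21, 63, 80, 55, 14, 84, 50, 68]

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, marks)
print(f"r = {r:.2f}")
```

A value of r well above zero agrees with the upward pattern of points on the scatter plot.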
STUDENT ACTIVITY 1
Q1. The ages of people who came for an HIV-test on Father’s day are recorded below:
24, 91, 32, 56, 59, 36, 68, 76, 23, 40, 54, 32, 60, 55, 40, 31, 30, 25, 28, 30,
36, 79, 76, 54, 37, 66, 56, 24, 74, 72, 69, 21, 30, 57, 70, 43, 53, 35, 60,
65, 42, 26, 54, 76, 70, 21, 38, 76,
a) Present these ages on a stem and leaf diagram then box-plot

b) Describe the findings
c) Copy and complete this table of grouped data for the same ages:
Age Group 21 to 35 36 to 51 52 to 61 62 to 91
Frequency
d) Present the grouped data on a Frequency Density graph
Q2. The figures below represent the number of cattle owned by farmers in Shangwe
Resettlement area.
70, 81, 17, 107, 69, 50, 62, 57, 69, 99, 30, 38, 89, 77, 55, 68, 19, 20, 89, 40,
27, 35, 84, 47, 35, 20, 83, 67, 20, 35, 54, 72, 53, 48, 35, 31, 22, 63, 75, 76, 68,
56, 48, 34, 49, 35, 49.
(a) Present the data on a Stem and Leaf diagram

(b) Draw a box and whisker plot for the data
(c) Describe the findings
Q3. The salaries of workers at White Wheat Farm are:

130, 120, 132, 110, 112, 153, 165, 175, 120, 114, 115, 197, 126, 153, 129,
140, 162, 132, 172, 113, 146, 118, 135, 173, 129, 154, 132, 172, 156, 144,
159, 163, 118, 117, 125, 195, 176, 134, 123, 145, 132, 149, 165, 132, 144.
a) Suggest two methods of collecting this data.
b) Present the data on a Stem and Leaf diagram.
c) Construct a box and whisker plot
d) Describe the salary distribution
Q4. The table shows the sales of school uniforms and blankets from a Shop in 2015.
Month Jan Feb March April May June July Aug Sept Oct Nov Dec
Blankets 10 13 20 26 30 21 10 8 7 8 9 8
Uniforms 38 20 11 30 26 15 12 5 28 26 20 27
a) Use Excel to show the Sales trends for each commodity.

b) Describe the sales trends.
c) Suggest a stocking policy for the shop, based on the findings.
Q5. A student spent a day doing the following:

Lectures 9h, Eating 1h, Tutorials 1.5h, Drinking 2h, Library 4h, Sleeping 6.5h.
a) Use Excel to present the data on a pie-chart

b) Write three advantages of presenting data on a pie-chart.
c) Write three disadvantages of using pictograms.
MEASURES of CENTRAL TENDENCY

1. Measures of central tendency (mean, mode and median) show us where the values of a variable are concentrated within the population.
2. Their relationship is used to determine the skewness of the distribution, i.e.
a) if Mean = Mode = Median, the distribution is symmetrical (normal)
b) if Mean < Median < Mode, the variable is negatively skewed
c) if Mode < Median < Mean, the variable is positively skewed
STUDENT ACTIVITY 2
1.Use your calculator to find the following: Mean, Mode, standard deviation and Median of
Questions 1 to 4 in student’s activity 1. Use them to describe the distributions.
2. The variance and standard deviation show us how the variables are spread within the population. Hence
\[ \text{Pearson's Coefficient of Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} \]
3. The mean, mode, median and standard deviation are presented here because they help us to
describe in numerical terms, the distribution of the variables within the population.
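As an illustration, Pearson's coefficient of skewness can be computed for the 51 test marks from Qb in the Data Presentation section. The sketch uses Python's standard `statistics` module and the sample standard deviation (divisor n − 1), which is an assumption consistent with the worked answers later in this module. The function name is illustrative.

```python
from statistics import mean, median, stdev

# The 51 test marks given in Qb of the Data Presentation section.
marks = [
    74, 98, 42, 75, 83, 87, 65, 59, 63, 86, 78, 37, 99, 66, 90, 79,
    80, 89, 68, 57, 95, 55, 79, 88, 76, 60, 77, 49, 92, 83, 71, 78, 53, 81,
    77, 58, 93, 85, 70, 61, 15, 80, 74, 69, 90, 62, 84, 64, 73, 48, 72,
]

def pearson_skewness(values):
    """3 x (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

coefficient = pearson_skewness(marks)
print(f"mean = {mean(marks):.2f}, median = {median(marks)}, s = {stdev(marks):.2f}")
print(f"Pearson's coefficient of skewness = {coefficient:.2f}")
```

A negative coefficient (mean below median) indicates negative skewness, a positive one indicates positive skewness, and a value near zero indicates a roughly symmetrical distribution.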
STUDENT ACTIVITY 3
Q1. The speeds of cars at a certain section of a curved road are recorded in Table 1.
Speed (km/h) 10-20 21-25 26-40 41-60

Number of Cars 11 20 15 10
a) Present the data on a frequency-density graph.
b) Calculate and interpret the (i) Mean (ii) Mode (iii) Median (iv) Standard Deviation
Q2. The heights of tobacco plants on a seedbed, two weeks after germination, are presented in the table:
Height (cm) 2-5 6-10 11-13 14-20

Frequency 2 10 12 7
a) Draw a histogram
b) Calculate and interpret the (i) Mean (ii) Mode (iii) Median (iv) Standard Deviation
Q3. The time taken by infants at a pre-school to solve a task in the form of a model was recorded in the table:
Time (min) 2–7 8 – 10 11 – 13 14 – 20

Frequency 18 3 12 7
a) Suggest and justify one method of collecting this data [3]

b) Present the data on a justified graph [8]
c) Calculate and interpret the (i). Mean [4]
(ii) Mode [5] (iii) Median [5] and standard deviation [8] of this distribution.
Q4.The weights of calves delivered at a dairy farm in May are recorded in Table:
Weight (kg) 10-15 16-25 26-30 31-50

Frequency 6 20 15 10
a) Draw a histogram
b) Calculate and interpret the:
(i) Mean (ii) Mode (iii) Median (iv) Standard Deviation
Q5. The time taken by Fine Art students to paint a still life composition is presented in the Table:
Time (min) 2-7 8-10 11-13 14-25

(a) Present the data on a frequency density graph [7]

(b) Calculate the:
(i) Mean [4]
(ii) Mode [3]
(iii) Standard Deviation [6]
Q6. The ages of a sample of students from the Masters in Strategic Studies Group were recorded
In Table:
Ages in years 25-30 31-40 41-45 46-60

(a) Draw a frequency density graph

(b) Calculate and interpret the:
(i) Mean (ii) Mode (iii) Median (iv) Standard Deviation
ANSWERS TO QUESTIONS
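Q1 (a)
Speed (km/h) | 10-20 | 21-25 | 26-40 | 41-60
Class Width | 11 | 5 | 15 | 20
Frequency Density | 1 | 4 | 1 | 0,5

Q1 (b)
Speed (km/h) | 10-20 | 21-25 | 26-40 | 41-60
Frequency (f) | 11 | 20 | 15 | 10
Class Mid-Point (x) | 15 | 23 | 33 | 50,5

(i) Mean,
\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{1\,625}{56} = 29{,}017 \approx 29 \text{ km/h} \]
The average speed of cars on this section of the road is 29 km/h.

(ii) The modal class is the class with the highest frequency density. The modal class is (20,5 – 25,5).
\[ M_o = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c \]
where $L$ is the lower real limit of the modal class, $f_1$ is its frequency, $f_0$ and $f_2$ are the frequencies of the classes before and after it, and $c$ is its class width.
\[ M_o = 20{,}5 + \frac{20 - 11}{(20 - 11) + (20 - 15)} \times 5 = 20{,}5 + \frac{9}{14} \times 5 = 20{,}5 + 3{,}2 = 23{,}7 \text{ km/h} \]
Most cars were travelling at about 23,7 km/h.

(iii) The median class contains the central value.
Speed (km/h) | 10-20 | 21-25 | 26-40 | 41-60
Cumulative frequency (F) | 11 | 31 | 46 | 56
\[ \frac{n}{2} = \frac{56}{2} = 28 \]
The 28th value is in the class 21-25, so the median class is (20,5 – 25,5).
\[ M_e = L + \frac{\frac{n}{2} - F}{f} \times c, \quad \text{where } n = 56 \]
Here $L$ is the lower real limit of the median class, $F$ is the cumulative frequency before the median class, $f$ is the frequency of the median class and $c$ is its class width.
\[ M_e = 20{,}5 + \frac{28 - 11}{20} \times 5 = 20{,}5 + 4{,}25 = 24{,}75 \text{ km/h} \]
Half of the cars were travelling at less than 24,75 km/h.

(iv) Sample variance,
\[ s^2 = \frac{1}{n - 1}\left( \sum fx^2 - \frac{\left(\sum fx\right)^2}{n} \right) \]
where $x$ is the class mid-point, $f$ is the frequency of each class and $n$ is the total frequency.
\[ n = \sum f = 56, \quad \sum fx = 1\,625, \quad \sum fx^2 = 54\,892{,}5 \]
\[ s^2 = \frac{1}{55}\left( 54\,892{,}5 - \frac{1\,625^2}{56} \right) = \frac{7\,738{,}48}{55} = 140{,}7 \]
The distribution variance, $s^2$ = 140,7.
Standard deviation,
\[ s = \sqrt{140{,}7} = 11{,}86 \approx 11{,}9 \]
The speeds of the cars typically differed from the mean by about 11,9 km/h.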
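The grouped-data calculations for Q1 can be checked with the Python sketch below. It follows the conventions used in these worked answers: real class limits half a unit beyond the stated limits, the modal class chosen by highest frequency density, the mode and median found by interpolation within their classes, and the variance computed with divisor n − 1. The function name `grouped_summary` and its return format are illustrative, and the sketch assumes the modal class is not a flat plateau (so the mode denominator is not zero).

```python
from math import sqrt

def grouped_summary(limits, freqs):
    """Mean, mode, median, variance and standard deviation for grouped data.

    limits: stated class limits such as (10, 20); freqs: class frequencies.
    """
    # Real class limits extend half a unit beyond the stated limits.
    bounds = [(lo - 0.5, hi + 0.5) for lo, hi in limits]
    widths = [hi - lo for lo, hi in bounds]
    mids = [(lo + hi) / 2 for lo, hi in limits]

    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

    # Modal class: the class with the highest frequency density.
    density = [f / w for f, w in zip(freqs, widths)]
    m = density.index(max(density))
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    mode = bounds[m][0] + d1 / (d1 + d2) * widths[m]

    # Median class: the first class whose cumulative frequency reaches n/2.
    cumulative = 0
    for (lower, _), f, w in zip(bounds, freqs, widths):
        if cumulative + f >= n / 2:
            median = lower + (n / 2 - cumulative) / f * w
            break
        cumulative += f

    return {"mean": mean, "mode": mode, "median": median,
            "variance": variance, "sd": sqrt(variance)}

# Q1: speeds of cars (km/h).
speeds = grouped_summary([(10, 20), (21, 25), (26, 40), (41, 60)], [11, 20, 15, 10])
for name, value in speeds.items():
    print(f"{name}: {value:.2f}")
```

The same function can be applied to the class limits and frequencies of Q2 to Q4 to check those answers.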
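Q2 (a)
Height (cm) | 2-5 | 6-10 | 11-13 | 14-20
Number of plants | 2 | 10 | 12 | 7
Class Width | 4 | 5 | 3 | 7
Frequency Density | 0,5 | 2 | 4 | 1

Q2 (b)
Height (cm) | 2-5 | 6-10 | 11-13 | 14-20
Frequency (f) | 2 | 10 | 12 | 7
Class Mid-Point (x) | 3,5 | 8 | 12 | 17

(i) Mean,
\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{350}{31} = 11{,}29 \]
The average height of the tobacco plants is 11,3 cm.

(ii) The modal class is (10,5 – 13,5).
\[ M_o = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c = 10{,}5 + \frac{12 - 10}{(12 - 10) + (12 - 7)} \times 3 = 10{,}5 + 0{,}857 = 11{,}357 \approx 11{,}4 \text{ cm} \]
Most plants were about 11,4 cm high.

(iii)
Height (cm) | 2-5 | 6-10 | 11-13 | 14-20
Frequency (f) | 2 | 10 | 12 | 7
Cumulative frequency (F) | 2 | 12 | 24 | 31
\[ \frac{n}{2} = \frac{31}{2} = 15{,}5 \]
This value is in the class 11-13, so the median class is (10,5 – 13,5).
\[ M_e = L + \frac{\frac{n}{2} - F}{f} \times c = 10{,}5 + \frac{15{,}5 - 12}{12} \times 3 = 10{,}5 + 0{,}875 = 11{,}375 \approx 11{,}4 \text{ cm} \]
Half of the plants were less than 11,4 cm high.

(iv)
x | f | fx | fx²
3,5 | 2 | 7 | 24,5
8 | 10 | 80 | 640
12 | 12 | 144 | 1 728
17 | 7 | 119 | 2 023
Total | 31 | 350 | 4 415,5
Variance,
\[ s^2 = \frac{1}{n - 1}\left( \sum fx^2 - \frac{\left(\sum fx\right)^2}{n} \right) = \frac{1}{30}\left( 4\,415{,}5 - \frac{350^2}{31} \right) = \frac{1}{30}(463{,}89) = 15{,}46 \]
Standard deviation,
\[ s = \sqrt{15{,}46} = 3{,}93 \]
The heights of the plants typically differed from the mean by about 3,93 cm.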
Q3 (a) Data was collected by observation

 Infants have no skill to read time and no language to describe the time
 The variable indicator is behavior or action
(b) Data is presented on a histogram because time is a continuous variable.

Real class limits    1,5      7,5      10,5     13,5     20,5
Time (min)               2-7      8-10     11-13    14-20
Frequency                18       3        12       7
Class width              6        3        3        7
Frequency density        3        1        4        1
Class mid-point          4,5      9        12       17
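Q3 c (i) Mean, $\bar{x} = \dfrac{\sum f x}{\sum f} = \dfrac{371}{40}$
$= 9{,}275$
The average time taken by infants to complete the task is 9,275 min
(ii) The modal class is (10,5 – 13,5)
Mode, $M_o = L + \dfrac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c$, where $L = 10{,}5$ is the lower boundary of the modal class, $f_1 = 12$ is its frequency, $f_0 = 3$ and $f_2 = 7$ are the frequencies of the classes before and after it, and $c = 3$ is the class width.
$= 10{,}5 + \dfrac{12 - 3}{(12 - 3) + (12 - 7)} \times 3$
$= 10{,}5 + 1{,}9286$
$= 12{,}4286 \approx 12{,}4$ min
The most common completion time was about 12,4 minutes.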
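(iii) Median class is (7,5 – 10,5)
Median, $M_e = L + \dfrac{\frac{n}{2} - F}{f} \times c$, where $\dfrac{n}{2} = \dfrac{40}{2} = 20$, $F = 18$ is the cumulative frequency below the median class, $f = 3$ is the frequency of the median class and $c = 3$ is its width.
$= 7{,}5 + 3\left(\dfrac{20 - 18}{3}\right)$
$= 7{,}5 + 2$
$= 9{,}5$
Half of the infants completed the task in less than 9,5 minutes.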
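x      f      fx     fx²
4,5    18     81     364,5
9      3      27     243
12     12     144    1 728
17     7      119    2 023
$\sum f = 40$, $\sum f x = 371$, $\sum f x^2 = 4\,358{,}5$
(iv) Variance, $s^2 = \dfrac{1}{n - 1}\left(\sum f x^2 - \dfrac{(\sum f x)^2}{n}\right)$
$= \dfrac{1}{39}\left(4\,358{,}5 - \dfrac{371^2}{40}\right)$
$= \dfrac{1}{39}(4\,358{,}5 - 3\,441{,}025)$
$= \dfrac{1}{39}(917{,}475)$
$= 23{,}525$
Variance = 23,525
Standard deviation, $s = \sqrt{s^2}$
$s = \sqrt{23{,}525}$
$s = 4{,}85$
Typically, the times taken by the infants differed from the mean time by about 4,85 minutes.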
ANSWERS
Q4. (a) $\bar{x}$ = 26,3 kg (b) $M_o$ = 22,2 kg (c) $M_e$ = 26 kg (d) s = 8,8 kg
Q5. (a) $\bar{x}$ = 10,8 min (b) $M_o$ = 13,5 min (c) $M_e$ = 10,88 min (d) s = 6,12 min
Q6. (a) $\bar{x}$ = 39,2 years (b) $M_o$ = 40,5 years (c) $M_e$ = 39,75 years (d) s = 9 years