Lect.5


Data Analysis

In their effort to understand the phenomenon of interest, researchers may analyse:

•One variable related to that phenomenon, or

•The relationship between two variables, or

•The relationship between more than two variables.


•Where analysis involves one variable (like age of the respondents, education level, etc.) it is known as Univariate Data Analysis

•Where the analysis involves the relationship between two variables, it is known as Bivariate Data Analysis and

•Where the relationship between more than two variables is analysed, it is Multivariate Data Analysis
Univariate Data Analysis

• In Univariate Data Analysis we identify the central value and measure the way other values group around or disperse from it

• So we have Measures of Central Tendency, which include the Mode, Median and Mean, together with

• Measures of Dispersion, grouped into:

   • Measures of Dispersion in relation to the Median and
   • Measures of Dispersion in relation to the Mean
Measures of Central Tendency
1. Mode: Is the most frequently occurring value in a distribution

•In grouped data, the interval with the highest frequency is called the Modal Class

•Two types of distribution can be distinguished based on the Mode: Uni-modal Distributions and Bi-modal Distributions

•Uni-modal Distributions are those with only one mode, while Bi-modal Distributions are those with two modes
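The definition of the mode can be illustrated with a minimal Python sketch. The helper name `modes` and both sample lists are hypothetical and chosen only for illustration; they are not taken from the lecture data.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    # Keep every value tied for the highest frequency,
    # in order of first appearance.
    return [value for value, count in counts.items() if count == highest]

unimodal = [2, 3, 3, 4, 5]   # hypothetical data: one mode
bimodal = [1, 1, 2, 3, 3]    # hypothetical data: two modes

print(modes(unimodal))  # [3]
print(modes(bimodal))   # [1, 3]
```

A result with one element corresponds to a Uni-modal Distribution and a result with two elements to a Bi-modal Distribution.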
2. Median: Is the central value in a series of ordered values, i.e. the middle figure of the range of observations ranked in terms of magnitude

•If the data array has an odd number of values, the median is the central value

•If it has an even number of values, the median is the mid-point between the two middle values
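The odd and even cases can be sketched in Python as follows. The function name `median_of` and the two sample lists are hypothetical, used only to show the two rules.

```python
def median_of(values):
    """Median of a list: rank the values, then take the middle one(s)."""
    ordered = sorted(values)          # rank observations by magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the central value
    # even count: mid-point between the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 1, 5]))      # 5
print(median_of([1, 3, 5, 7]))   # 4.0
```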
3. Mean (also known as Arithmetic Mean or Average)

Computation of Mean using individual values:

• Sum up all individual values and divide the sum by the number of values

• Formula for computation of the mean using individual values:

$$\bar{X} = \frac{\sum X}{n}$$

Where: $\bar{X}$ is the Mean
X is the individual values making up the series of data
n is the number of occurrences/values
∑ is the summation notation
• Table 1: Calculate the Arithmetic Mean of the following hypothetical data

Month        Number of Vehicles
January      215
February     431
March        690
April        745
May          419
June         226
July         338
August       642
September    169
October      912
November     91
December     402
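The Table 1 exercise can be checked with a short Python sketch that applies the individual-values rule (sum of all values divided by the number of values). The variable names are illustrative only.

```python
# Monthly vehicle counts from Table 1
vehicles = {
    "January": 215, "February": 431, "March": 690, "April": 745,
    "May": 419, "June": 226, "July": 338, "August": 642,
    "September": 169, "October": 912, "November": 91, "December": 402,
}

total = sum(vehicles.values())   # sum of all individual values
n = len(vehicles)                # number of values
mean = total / n

print(total, n, mean)  # 5280 12 440.0
```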
Computation of Mean using Grouped Data:

• The arithmetic mean for grouped data is computed from class mid-points

• The following formula is used:

$$\bar{X} = X_0 + C \cdot \frac{\sum fd}{\sum f}$$

(NB. Other equations for the mean of grouped data may also be available in the literature and can be used)
Procedures:
1. Enter class mid-points for each class

2. Estimate visually the class in which the actual mean is likely to be (the mid-point of this class is called the assumed mean and is denoted Xo)

3. Enter the frequency (f) for each class

4. Then, calculate the deviation (d), in number of classes, from the assumed mean; assign -ve values for classes less than the assumed mean and +ve values for classes greater than it

5. Multiply f by d for each class

6. Sum the results of steps 3 and 5 (giving ∑f and ∑fd)

7. Use the formula above.

Please Note: 'C' is the class interval
Example:
•Data in Table 1 above were grouped as shown in Table 2

Table 2
Number of Vehicles   Frequency
1-200                2
201-400              3
401-600              3
601-800              3
801-1000             1

Calculate the Mean.


• Procedures:
1. Insert class mid-points
2. Estimate the assumed mean
3. Enter the frequency for each class (we have them already)
4. Calculate the deviation (d), in number of classes, from the assumed mean
5. Multiply f by d for each class
6. Sum the results of steps 3 and 5

Number of Vehicles   Mid-point   Frequency (f)   Deviation (d)   f x d
1-200                100.5       2               -2              -4
201-400              300.5       3               -1              -3
401-600              500.5 (Xo)  3                0               0
601-800              700.5       3               +1              +3
801-1000             900.5       1               +2              +2
Total                            12                              -2
• Now back to the formula:

$$\bar{X} = X_0 + C \cdot \frac{\sum fd}{\sum f}$$

• Inserting the data into the formula (Xo = 500.5, C = 200, ∑fd = -2, ∑f = 12):

= 500.5 + 200 x (-2/12)

= 500.5 + (-400/12)

= 500.5 - 33.3

= 467.2
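The assumed-mean procedure for Table 2 can be traced step by step in Python. This is a minimal sketch: the variable names are hypothetical, and the choice of the 401-600 class as the assumed-mean class follows the value 500.5 used in the calculation above.

```python
# (lower limit, upper limit, frequency) for each class in Table 2
classes = [(1, 200, 2), (201, 400, 3), (401, 600, 3),
           (601, 800, 3), (801, 1000, 1)]
C = 200  # class interval

# Step 1: class mid-points
midpoints = [(low + high) / 2 for low, high, _ in classes]

# Step 2: assumed mean Xo = mid-point of the 401-600 class (index 2)
assumed_index = 2
x0 = midpoints[assumed_index]

# Step 3: frequencies
freqs = [f for _, _, f in classes]

# Step 4: deviation d, counted in classes from the assumed-mean class
d = [i - assumed_index for i in range(len(classes))]

# Steps 5 and 6: f * d per class, then the two sums
sum_f = sum(freqs)
sum_fd = sum(f * di for f, di in zip(freqs, d))

# Step 7: apply the formula
grouped_mean = x0 + C * sum_fd / sum_f

print(midpoints)               # [100.5, 300.5, 500.5, 700.5, 900.5]
print(sum_f, sum_fd)           # 12 -2
print(round(grouped_mean, 1))  # 467.2
```

The grouped result (467.2) differs from the mean of the twelve individual values in Table 1 (5280 / 12 = 440) because grouping replaces each observation with its class mid-point, so the grouped mean is an approximation.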
Efficacy/Differences/Weaknesses of the Three Measures of Central Tendency Discussed
Mode:
•Where a distribution has more than one mode (e.g. a Bi-modal Distribution), deciding which value is the mode becomes difficult

•The mode does not possess true mathematical qualities

•Thus, it is the least recommended measure in statistical analysis.
The Median:
•Uses the relative positions of values in the distribution

•Every value is given the same weight in the calculation of the median

•However, the magnitude of an individual value is not important

•The median, just like the mode, has no real mathematical qualities

•Hence, it is not strongly recommended in statistical analysis


The Mean:

•All values are used in the computation of the mean; as such, it uses more information than the mode and median

•Weight is given to each value according to its magnitude

•Has much better mathematical qualities


However, the mean is not without some disadvantages:

• The mean is not a good measure for highly skewed distributions because it tends to stress the extreme values, i.e. it does not give an accurate impression of the location of the majority of the values.

E.g. Find the mean of the following data and see how it is pulled towards the extreme value:
10, 20, 30, 40, 1000

•In highly skewed distributions, the median is a much better measure of central tendency
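The exercise can be worked through with Python's standard `statistics` module, comparing the mean and the median of the five values given above. The variable names are illustrative only.

```python
from statistics import mean, median

data = [10, 20, 30, 40, 1000]

data_mean = mean(data)      # (10 + 20 + 30 + 40 + 1000) / 5
data_median = median(data)  # middle value of the ordered data

print(data_mean)    # 220
print(data_median)  # 30
```

Four of the five values lie at or below 40, yet the mean is 220 because the single extreme value (1000) pulls it upward; the median (30) stays with the bulk of the data.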
