
RESEARCH METHODOLOGY:

DATA ANALYSIS – DESCRIPTIVE STATISTICS

F. M. Kapepiso
Learning objectives

At the end of the lecture, you should be able to:


• Explain the difference between descriptive and
inferential statistics
• Calculate basic descriptive statistics
• Explain the importance of knowing the levels of
measurement in order to choose an appropriate
statistical technique
Introduction
Once data collection has been completed, an in-depth analysis
of the data is made. The data set generated may consist of:
• Variables and Attributes
• Quantitative Data and Qualitative Data
• A variable is a feature or an item under observation, the different
states of which can be measured and expressed in terms of
quantitative values.
– A variable represents a certain characteristic feature of the data. For analysis,
observations are classified quantitatively in respect of each
characteristic feature.
• An attribute is a feature of the research subject under observation that
is not amenable to measurement and quantification as such.
– Examples of such non-quantifiable features are colour of hair, quality of a
product, health condition of a person, intelligence level of a group, etc.
– Such features are quantified using definitions for each condition of the
attribute under observation.
Data Analysis - to be preceded by
• Data Presentation, which includes –
1. Editing, i.e., making data wholesome by removing errors and
omissions.
2. Coding, i.e., converting responses into numerals and symbols.
3. Classification, i.e., making Simple, Discrete and Continuous
Series.
4. Tabulation, i.e., making one-dimensional, two-dimensional and
multi-dimensional tables.
Objectives of Data Analysis
• Simplification and Summarization
• Draw conclusions
• Facilitate Comparison
• Usable for Forecasting and Estimation
• Suitable for Decision Making
• Useful for Planning and Policy Making
Nature of Data Analysis
(a) Descriptive Analysis – is concerned with the
description and/or summary of data obtained for a
group of individual units of analysis.
1. Univariate – One variable
2. Bivariate – Two variables
3. Multivariate – Three or more variables
(b) Inferential Analysis – allows inferences to be drawn
about a large population by collecting data on a
relatively small sample.
1. Parametric Tests – For higher quality data based on
probability sampling
2. Non-Parametric Tests – For lower quality, limited scope data
based on non-probability sampling
Descriptive Analysis
As their name implies, descriptive statistics describe a body of
data and allow some basic questions to be answered, for
example:
• What is the range of the data? This refers to the spread between
the maximum and minimum values.
• What is the central point of the data? This can be determined
in terms of the mean, the median or the mode.
Descriptive Analysis - Techniques
1. Univariate Analysis -
Measures of Central Tendency – mean, median, mode
Measures of Variability – Range, Mean deviation, Standard Deviation
Measures of Shape – Skewness, Kurtosis, Moments
2. Bivariate Analysis -
Measures of Relationship – Correlation, Association
Measures for Forecasting and Estimation – Regression,
Time Series Analysis -Trend, Seasonal and Cyclical Variations
3. Multivariate Analysis –
Multivariate Correlation, Partial Correlation
Multiple Regression,
Discriminant Analysis,
Factor Analysis.
Conjoint Analysis,
Cluster Analysis,
Decomposition Analysis, etc.
Arithmetic Mean
• Arithmetic mean is a value obtained by dividing the sum of all
the values of an item in a series by the number of items.
• For example, if there are five students in a class aged
13, 15, 12, 9 and 11 years respectively, find the mean age.
• For calculation of arithmetic mean first the values of all the
items are added together. Then the sum of all the values is
divided by the number of items.
• Thus,
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X}{n}$$
Where, $\bar{X}$ = arithmetic mean
$\sum X$ = sum of values of all the items [X1 + X2 + X3 + … + Xn]
n = number of items
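The student-age example can be worked through as a minimal Python sketch. The variable names are illustrative and not part of the lecture.

```python
# Ages of the five students, in years (from the example above)
ages = [13, 15, 12, 9, 11]

# Step 1: add the values of all the items together
total = sum(ages)

# Step 2: divide the sum by the number of items
n = len(ages)
arithmetic_mean = total / n

print(total)            # 60
print(arithmetic_mean)  # 12.0
```

The mean age of the class is therefore 12 years.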
Mode
• Mode is an item that occurs most frequently in a series.
• It is the item with maximum frequency or more precisely the
item with maximum concentration of frequencies.
• Mode represents the most common item of a series.
Example:
Locate the mode in the following data:
6, 7, 10, 8, 9, 9, 8, 8
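One way to locate the mode is to count how often each item occurs and pick the item with the highest count. The sketch below assumes Python's standard `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

data = [6, 7, 10, 8, 9, 9, 8, 8]

# Count the frequency of each distinct item
frequencies = Counter(data)

# Find the maximum frequency, then keep every item that reaches it
# (a series can have more than one mode)
highest = max(frequencies.values())
modes = [item for item, freq in frequencies.items() if freq == highest]

print(modes)    # [8]
print(highest)  # 3
```

Here 8 occurs three times, more often than any other item, so the mode is 8.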
Median
• Median is the value of the central item of a series when
these are arranged in ascending or descending order of
magnitude.
• Median refers to the position of a value in the series.
• For example, there are 5 students in a class with
respective heights of 60", 61", 65", 64" and 63". Find the
median height.
• Median divides a series into two equal parts.
• If the number of items in the series is odd, the
median is exactly at the center, with an equal number of items
on either side. If the number of items is even, the median
lies midway between the two central items.
• For example, there are six students with respective
heights of 60", 61", 65", 64", 67" and 63". Find the median
height of the group.
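Both height examples can be checked with a short sketch. The helper `median_of` is a hypothetical name introduced for illustration; it sorts the series and then applies the odd/even rule described above.

```python
def median_of(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)      # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of items: the central item is the median
        return ordered[mid]
    # Even number of items: average the two central items
    return (ordered[mid - 1] + ordered[mid]) / 2

five_students = [60, 61, 65, 64, 63]      # heights in inches
six_students = [60, 61, 65, 64, 67, 63]   # heights in inches

print(median_of(five_students))  # 63
print(median_of(six_students))   # 63.5
```

For the five students the sorted series is 60, 61, 63, 64, 65, so the median is 63". For the six students the two central items are 63" and 64", giving 63.5".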
Features of Ideal Measure of Central Tendency
1. Rigidly defined
2. Easy to calculate
3. Based on all observations
4. Suitable for further mathematical treatment
5. Not affected by fluctuations of sampling
6. Not affected by extreme values
However, the choice of measure depends on the nature of the data.
Measures of Variation

1. Nature of Measure
Absolute Measure
Relative Measure
2. Different Measures of Variation
1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation
5. Variance
Range
Range is the absolute difference between the values of the largest
(maximum) and the smallest (minimum) items of a series. For
example, if there are 30 students in a class and the heights of the
tallest and shortest students are 170 cm and 145 cm
respectively, the range of their heights is 170 – 145 = 25 cm.
Range is measured as
$$R = m_1 - m_0$$
Where R = range
$m_1$ = maximum value among the items
$m_0$ = minimum value among the items
The relative measure of range is known as the coefficient of range. The
coefficient of range is measured by dividing the difference between
the maximum and the minimum value by the sum of the two values.
$$\text{Coefficient of Range} = \frac{m_1 - m_0}{m_1 + m_0}$$
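Applying both formulas to the height example gives the following minimal sketch. The variable names are illustrative.

```python
m1 = 170  # maximum height in cm (tallest student)
m0 = 145  # minimum height in cm (shortest student)

# Absolute measure: range
value_range = m1 - m0

# Relative measure: coefficient of range (unit-free)
coefficient_of_range = (m1 - m0) / (m1 + m0)

print(value_range)                     # 25
print(round(coefficient_of_range, 4))  # 0.0794
```

The coefficient has no unit, which makes it usable for comparing series measured on different scales.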
Standard Deviation
Standard deviation is the square root of the arithmetic average of squared deviations
taken from the arithmetic average. It is the most widely used measure of
dispersion. It represents the average distance by which the data values vary from
the mean. As per Karl Pearson's method, it is calculated as
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$
The relative measure of standard deviation is known as the coefficient of
standard deviation. It is calculated as
$$\text{Coefficient of standard deviation} = \frac{\sigma}{\bar{x}}$$
For example, calculate the standard deviation given the following:
Sample         A   B   C   D   E   F   G   H   I   J   K   L   M
Value (days)   40  44  40  38  41  45  45  40  42  46  41  42  43
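The calculation for the 13 sample values can be sketched step by step as below. This assumes the N – 1 divisor shown in the formula above; the variable names are illustrative.

```python
from math import sqrt

# Values in days for samples A to M
values = [40, 44, 40, 38, 41, 45, 45, 40, 42, 46, 41, 42, 43]

n = len(values)
mean = sum(values) / n

# Squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in values]

# Divide the sum of squared deviations by N - 1, then take the square root
standard_deviation = sqrt(sum(squared_deviations) / (n - 1))

# Relative measure: coefficient of standard deviation
coefficient_of_sd = standard_deviation / mean

print(round(mean, 2))                # 42.08
print(round(standard_deviation, 2))  # 2.4
```

The values therefore vary by roughly 2.4 days around a mean of about 42.08 days.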
Variance, Coefficient of Variation

$$\text{Variance } (\sigma^2) = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$\text{Coefficient of Variation} = \frac{\sigma}{\bar{x}} \times 100$$
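As an illustration, the two formulas can be applied to the same 13 values (in days) used in the standard deviation example. This is a sketch with illustrative variable names, using the n – 1 divisor from the variance formula.

```python
# Values in days for samples A to M (from the standard deviation example)
values = [40, 44, 40, 38, 41, 45, 45, 40, 42, 46, 41, 42, 43]

n = len(values)
mean = sum(values) / n

# Variance: sum of squared deviations divided by n - 1
variance = sum((x - mean) ** 2 for x in values) / (n - 1)

# Standard deviation is the square root of the variance
standard_deviation = variance ** 0.5

# Coefficient of variation: standard deviation as a percentage of the mean
coefficient_of_variation = standard_deviation / mean * 100

print(round(variance, 2))                  # 5.74
print(round(coefficient_of_variation, 1))  # 5.7
```

A coefficient of variation of about 5.7% indicates that the values are tightly clustered relative to their mean.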
Data Analysis & Measurement
• Statistical technique is used as per the requirement of data
• Data may be based on one or more of the following
measurements (refer to measurement notes)-
1. Nominal Scale Measurements
2. Ordinal Scale Measurements
3. Cardinal (Interval) Scale Measurements
4. Ratio Scale Measurements
Nominal Scale Measurements
• Mode – For central tendency
• Correlation – taking Yes (1) and No (0)
• Chi square test – For homogeneity and independence
• Graphs – bar and pie charts
Ordinal Scale Measurements
• Median – for central tendency
• Correlation – Spearman’s correlation using ranks
• Tests of significance
• Chi square test for homogeneity and independence
• Wilcoxon matched pairs signed rank test
• Mann-Whitney U test
• Kruskal-Wallis test – in case of more than two groups
• Discriminant Analysis – for classification
Cardinal (Interval scaled) measurements
• Mean – for central tendency
• Karl Pearson’s Standard deviation
• Karl Pearson’s Coefficient of Correlation
• Tests of significance
Chi square test for attributes
t test for small samples
Z test for large samples
ANOVA – Analysis of Variance for comparing more than two
groups
• Discriminant Analysis – for classification
Ratio Scale Measurements

• All measures as applicable in case of cardinal measurements.


Limitations of Data Analysis
1. Statistical measures are basically designed for situations or aspects that are
amenable to quantitative expression and treatment. Qualitative attributes are at
times also dealt with in research and may need to be subjected to statistical
analysis, but in that case they must first be converted into quantities by
adopting certain definitions.
2. The statistical results are not the final 'truth' but represent averages of the aggregates.
They ignore individual intricacies, and this neglect may result in
superficial findings.
3. The quantities certainly provide a useful indicator and serve a very useful purpose
but these are greatly influenced by subjective factors outside the ambit of
mathematical treatment. No valid conclusions can be arrived at on their basis in
such cases.
4. Quantitative information cannot prevent its own misuse, and researchers may at
times be tempted to twist it to suit their preferred
conclusions or hypotheses.
5. The validity of certain statistical methods depends upon the nature of the data, the
level of measurement, the knowledge of pertinent aspects of the situation and
the assumptions made. To the extent that any of these is deficient, the conclusions
may also be wrong.
6. Users of statistical analysis techniques should therefore view the results in this
light and draw conclusions accordingly. No false expectations should be placed
on quantitative techniques as such.
7. They are only a support to be used with full awareness of their strengths as well as
limitations.
Thanks
