
Quantitative Analysis:

Descriptive Statistics
Akanksha Patra (TF, RMSSD)
Types of Quantitative Analysis
1. Univariate Analysis: analysis of a single variable; serves the purpose
of description
E.g.: frequency distributions, measures of central tendency & dispersion

2. Bivariate Analysis: analysis of 2 variables simultaneously (one dependent
variable, DV, and one independent variable, IV). Used to study the
relationship between the 2 variables
E.g.: percentages, correlation, linear regression

3. Multivariate Analysis: analysis of more than 2 variables
simultaneously. In a regression setting, the data has one DV and 2 or more IVs
E.g.: factor analysis, cluster analysis, multiple regression
What can you tell about this data set?
• Scores of 11 students from an in-class quiz:

• 9, 10, 15, 6, 8, 19, 13, 12, 10, 9, 14


Measures of Central Tendency: Mean
• The most common type of average computed. It is simply the sum of
all the values in a group, divided by the number of values in the
group.

• Example:
The scores of a student enrolled in RMSSD’23 over 7 in-class quizzes
are as follows:
12, 15, 10, 17, 17, 13, 18
What is the mean?
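
A minimal sketch of this calculation in Python, assuming the seven scores listed above (the variable names are illustrative):

```python
# Quiz scores from the example above.
scores = [12, 15, 10, 17, 17, 13, 18]

# Mean = sum of all the values divided by the number of values.
mean = sum(scores) / len(scores)

print(f"sum = {sum(scores)}, n = {len(scores)}, mean = {mean:.2f}")
# prints: sum = 102, n = 7, mean = 14.57
```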
Weighted Mean
• For a larger data set, it is easier to summarize data in a frequency
table before calculating the mean. In this case, you need to weigh
each possible value by the frequency of the value to calculate the
total.
• Example: Consider the following scores by students on in-class quiz 3.
What is the average performance of the class?
Scores    No. of Students
0         8
5         12
7         14
8         11
9         9
10        14
12        9
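
One way to work through this example is sketched below in Python. The dictionary holds the frequency table above (score mapped to number of students); the names are illustrative.

```python
# Frequency table from the example: score -> number of students.
frequency = {0: 8, 5: 12, 7: 14, 8: 11, 9: 9, 10: 14, 12: 9}

# Weight each score by the number of students who obtained it.
weighted_total = sum(score * count for score, count in frequency.items())
n_students = sum(frequency.values())

# Weighted mean = weighted total divided by the total number of students.
weighted_mean = weighted_total / n_students

print(f"total = {weighted_total}, n = {n_students}, mean = {weighted_mean:.2f}")
# prints: total = 575, n = 77, mean = 7.47
```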
Measures of Central Tendency: Mean
• The mean is sometimes represented by the letter M and is also called
the typical, average, or most central score.
• A small n represents the sample size for which the mean is being
computed. A large N would represent the population size.
• The sample mean is the measure of central tendency that most
accurately reflects the population mean.
• It is very sensitive to extreme scores. An extreme score can pull the
mean in one or the other direction and make it less representative of
the set of scores and less useful as a measure of central tendency.
Measures of Central Tendency: Median
• The median is also an average, but of a different kind: it is the
midpoint in a set of scores, with half the scores above it and half below.

• Example:
The scores of students enrolled in RMSSD’23 on in-class quiz 2 are as
follows:
7, 12, 15, 10, 17, 19, 13, 18, 16, 9, 5
What is the median?
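
A short sketch of how the median could be found in Python, assuming the eleven scores above. The manual step works as written only for an odd number of values; for an even count, the two middle values are averaged, which `statistics.median` handles.

```python
import statistics

scores = [7, 12, 15, 10, 17, 19, 13, 18, 16, 9, 5]

# Step 1: order the scores from lowest to highest.
ordered = sorted(scores)

# Step 2: with an odd number of scores, the median is the middle one.
middle_index = len(ordered) // 2
median_by_hand = ordered[middle_index]

print(ordered)
print(median_by_hand, statistics.median(scores))  # prints: 13 13
```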
Percentile Points
• Percentile points are used to define the percentage of cases equal to
and below a certain point in a distribution or set of scores.
• For example, if a score is at the 75th percentile, it means the score is
at or above 75% of the other scores in the distribution.
• The median is the 50th percentile.
• The 25th percentile is Q1, the 75th is Q3
• What is Q2?
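
The quartiles can be sketched in Python using the quiz 2 scores from the median example. This relies on the default ("exclusive") method of `statistics.quantiles`; other textbook or software conventions may give slightly different values for Q1 and Q3.

```python
import statistics

scores = [7, 12, 15, 10, 17, 19, 13, 18, 16, 9, 5]

# n=4 splits the data into four equal groups and returns the three cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(q1, q2, q3)  # prints: 9.0 13.0 17.0

# Q2 is the 50th percentile, so it matches the median.
print(q2 == statistics.median(scores))  # prints: True
```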
Measures of Central Tendency: Median
• Why use the median instead of the mean? It is insensitive to extreme
scores, also known as outliers.
• When you have a set of scores in which one or more scores are
extreme, the median better represents the centremost value of that
set of scores than any other measure of central tendency.
• Example: Consider the list of five incomes:
• Rs. 135,456 Rs. 54,365 Rs. 37,668 Rs. 32,456 Rs. 25,500
Measures of Central Tendency: Median
• The mean of the five incomes above is their sum divided by 5, which
turns out to be Rs. 57,089. The median of the same five incomes, on the
other hand, is Rs. 37,668. Which is more representative of the group?
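
The comparison can be checked with a small Python sketch, assuming the five incomes listed above (in rupees):

```python
import statistics

# The five incomes from the example, in rupees.
incomes = [135456, 54365, 37668, 32456, 25500]

mean_income = statistics.mean(incomes)      # pulled upward by the one extreme income
median_income = statistics.median(incomes)  # middle value; ignores how extreme it is

print(mean_income, median_income)  # prints: 57089 37668

# Only one of the five incomes lies above the mean.
print(sum(1 for x in incomes if x > mean_income))  # prints: 1
```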

• Social and economic variables such as income are often reported
using the median as a measure of central tendency.
• Too many extreme scores can skew or distort the distribution of
scores.
Measures of Central Tendency: Mode
• It is the most general and least precise measure of central tendency,
but plays an important part in understanding the characteristics of a
set of scores.
• It is the value that occurs most frequently.
• For example: the examination of the political party affiliation of 300
people might result in the following distribution of scores:
Party Affiliation    Frequency
BJP                  140
INDI Alliance        90
Others               70
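
A minimal Python sketch of reading the mode off this frequency table (the dictionary name is illustrative). Note that the mode is the category, not its frequency.

```python
# Frequency table from the example: party affiliation -> number of people.
affiliation = {"BJP": 140, "INDI Alliance": 90, "Others": 70}

# The mode is the category with the highest frequency.
mode = max(affiliation, key=affiliation.get)

print(mode, affiliation[mode], sum(affiliation.values()))
# prints: BJP 140 300
```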
Measures of Central Tendency: Mode
• Can a distribution be multimodal?
• If more than one value shares the same highest frequency, the
distribution is multimodal (bimodal when there are two such values).
Hair Color    Frequency
Black         45
Brown         45
Red           7
Blonde        3

• In a large set of data points, an exact tie for the most frequent value
is unlikely, but possible. Are the categories mutually exclusive?
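
The hair color example can be sketched in Python with `statistics.multimode`, which returns every value tied for the highest frequency. The table is first expanded into one entry per person, an illustrative step chosen for this sketch.

```python
import statistics

# Frequency table from the example: hair color -> number of people.
hair_color = {"Black": 45, "Brown": 45, "Red": 7, "Blonde": 3}

# Expand the table into one observation per person.
observations = [color for color, count in hair_color.items() for _ in range(count)]

# multimode lists all values that share the highest frequency.
modes = statistics.multimode(observations)

print(modes)  # prints: ['Black', 'Brown']
```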
When to Use What Measure of Central Tendency
• Depends on the type of data you are describing. Remember the
various scales of measurement? Nominal, Ordinal, Interval, Ratio
• Central tendency for qualitative, categorical, or nominal data (such
as gender, voting preference, hair color) is best described using the
mode.
• Median and Mean are best used with quantitative data, such as
height, income level (in currency, not categories), age, IQ scores etc.
• The mean is more precise than the median, and the median is more
precise than the mode. This is one reason the mean is the most often
used measure of central tendency.
Measures of Variability or Dispersion
• Consider the following data sets:

• 7, 6, 3, 3, 1
• 3, 4, 4, 5, 4
• 4, 4, 4, 4, 4
Measures of Variability or Dispersion
• Variability (or spread, or dispersion) is a measure of how different
data points or scores are from one another.
• It is even more accurate (and easier) to think of variability as how
different scores are from one particular score. Which one do you
think that is?
• Three measures of variability are used to reflect the degree of
variability: range, standard deviation and variance.
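
As a rough illustration, the three data sets shown earlier can be compared in Python. All three have the same mean, so the mean alone cannot tell them apart; the range and the standard deviation can.

```python
import statistics

# The three data sets from the earlier slide.
data_sets = [
    [7, 6, 3, 3, 1],
    [3, 4, 4, 5, 4],
    [4, 4, 4, 4, 4],
]

summary = []
for data in data_sets:
    mean = statistics.mean(data)
    value_range = max(data) - min(data)  # highest score minus lowest score
    s = statistics.stdev(data)           # sample standard deviation (n - 1)
    summary.append((mean, value_range, s))
    print(f"{data}: mean = {mean}, range = {value_range}, s = {s:.2f}")
```

The first set is the most spread out, the second less so, and the third shows no variability at all.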
Range
• The range is the distance between the highest and the lowest score. It
is computed simply by subtracting the lowest score from the highest (h-l).

• This is also called the exclusive range. The inclusive range is the
highest score minus the lowest score plus 1 (h-l+1).
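
Both versions of the range are sketched below in Python, using the 11 in-class quiz scores from the opening slide as sample data:

```python
# Scores of 11 students from the in-class quiz shown earlier.
scores = [9, 10, 15, 6, 8, 19, 13, 12, 10, 9, 14]

highest = max(scores)
lowest = min(scores)

exclusive_range = highest - lowest      # h - l
inclusive_range = highest - lowest + 1  # h - l + 1

print(exclusive_range, inclusive_range)  # prints: 13 14
```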
Standard Deviation
• The most frequently used measure of variability. It represents the
average amount of variability in a set of scores.
• It is the average distance from the mean. The larger the standard
deviation, the larger the average distance of each data point from the
mean of the distribution, and the more variable the set of scores is.
Standard Deviation
• Consider this data set:
• 5, 8, 5, 4, 6, 7, 8, 8, 3, 6

• Compute the s for it.
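
A step-by-step sketch of the computation in Python, assuming the usual sample formula (sum of squared deviations from the mean, divided by n - 1, then square-rooted). The result is compared against `statistics.stdev`, which uses the same n - 1 denominator.

```python
import math
import statistics

data = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]
n = len(data)

# Step 1: compute the mean.
mean = sum(data) / n

# Step 2: square each score's deviation from the mean and add them up.
sum_of_squares = sum((x - mean) ** 2 for x in data)

# Step 3: divide by n - 1 and take the square root.
s = math.sqrt(sum_of_squares / (n - 1))

print(f"mean = {mean}, sum of squares = {sum_of_squares}, s = {s:.2f}")
# prints: mean = 6.0, sum of squares = 28.0, s = 1.76

print(math.isclose(s, statistics.stdev(data)))  # prints: True
```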


Standard Deviation
• Remember, the sample standard deviation s is an estimate of the
population standard deviation. Dividing by n-1 instead of n makes the
sample variance an unbiased estimate, and reduces (though does not
fully remove) the bias in s. By subtracting 1 from the denominator, we
artificially force the s to be larger than it would be otherwise.
• The larger the s, the more spread out the values are, and the more
different they are from one another.
• It is also sensitive to extreme scores.
• What does s=0 mean?
• The standard deviation is unaffected when a constant is added to
every score, but it is multiplied by the absolute value of the constant
when every score is multiplied by that constant.
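
These properties can be checked with a short Python sketch. The constants 10 and 3 and the list of identical scores are arbitrary choices for illustration.

```python
import math
import statistics

data = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]
s = statistics.stdev(data)

# Adding a constant shifts every score and the mean equally: s is unchanged.
shifted = [x + 10 for x in data]
s_shifted = statistics.stdev(shifted)

# Multiplying by a constant stretches every deviation: s is multiplied too.
scaled = [x * 3 for x in data]
s_scaled = statistics.stdev(scaled)

# s = 0 means there is no variability: every score is identical.
s_identical = statistics.stdev([4, 4, 4, 4, 4])

print(math.isclose(s_shifted, s))     # prints: True
print(math.isclose(s_scaled, 3 * s))  # prints: True
print(s_identical == 0)               # prints: True
```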
Variance
• The variance is simply the standard deviation squared.
• Variance is a difficult number to interpret and apply to a set of data,
as it is based on squared deviation scores.
• Standard deviation is stated in the original units from which it was
derived. Variance is stated in units that are squared.
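
A brief Python sketch of the relationship, reusing the data set from the standard deviation slide. Both functions below use the sample (n - 1) denominator.

```python
import math
import statistics

data = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]

s = statistics.stdev(data)            # in the original units (points)
variance = statistics.variance(data)  # in squared units (points squared)

print(f"s = {s:.2f}, variance = {variance:.2f}")
# prints: s = 1.76, variance = 3.11

print(math.isclose(variance, s ** 2))  # prints: True
```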
Descriptive Statistics
“Descriptive” = computations that describe characteristics of a sample or the
relationship among variables in a sample
• Used merely for summarizing a data set / set of observations
• Mostly used to describe the characteristics of a sample (age, gender,
income, etc.) to give readers an overview of the sample

Paths to description:

1. Data reduction: transformation of numerical information derived empirically or
experimentally into a corrected, ordered, and simplified form
2. Measures of association: a wide variety of coefficients (including bivariate
correlation and regression coefficients) that measure the strength and direction
of the relationship between variables; these measures of strength, or
association, can be described in several ways, depending on the analysis.
Applications
• In the article ‘A survey of World Wide Web Use in middle-aged and
older adults’, Roger Morrell and his colleagues examined Internet use
patterns in 550 adults in several age groups, including middle-aged
(ages 40-59), young-old (60-74), and old-old (75-92).
• The survey primarily used descriptive statistics to reach its
conclusion.
