Quantitative Data Analysis


Quantitative data analysis

Data analysis
• Definition: quantifying and statistically reducing raw data in order to make interpretations and draw conclusions
• Descriptive: organising and summarising representative characteristics or values
• Inferential: hypothesis testing, making generalizations or predictions
A major distinction needs to be made
• Categorical – categories which do not inherently
represent a quantity
– Male, female; HIV+, HIV-.
– 10-19; 20-29; 30-39.
• Numerical – representing values
– Discrete – a variable with a finite number of values
between two points, e.g. 1, 2, 3 houses. A value of
2.5 houses is not meaningful.
– Continuous – values can be placed on a continuum,
with an infinite number of values between two points,
e.g. weight. An infinite number of values occur
between 2 g and 3 g, e.g. 2.001 g, 2.1 g, 2.5 g, etc.
Why the distinction between discrete
variables and continuous variables?

• In the case of discrete variables (data) we need to capture the gaps between values
• In the case of continuous variables (data) we need to represent the continuity between values of the data
• Hence a bar diagram (separated bars) for discrete data versus a histogram (adjoining bars) for continuous data
FREQUENCY DISTRIBUTIONS
Frequency distribution
• The frequency of a score refers to the
number of times that the given score
appears within a data set
• A frequency distribution is a tabular or
graphical representation of a data set
indicating the set of scores on a variable
together with their frequency
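As a minimal sketch of the definition above, a frequency distribution can be built by counting how often each score occurs. The data are the retirement ages used later in these slides; the variable names are illustrative assumptions, not part of the original material.

```python
from collections import Counter

# Retirement ages (in years) taken from the mode example later in the slides.
scores = [55, 55, 55, 56, 56, 57, 58, 58, 59, 60]

# Frequency: number of times each score appears in the data set.
frequency = Counter(scores)
n = len(scores)

# Tabular representation: score, frequency, percentage of all scores.
print("Score  Frequency  Percentage")
for score in sorted(frequency):
    percentage = 100 * frequency[score] / n
    print(f"{score:5d}  {frequency[score]:9d}  {percentage:9.1f} %")
```

The frequencies always add up to the number of observations, which is a quick check that no score was missed.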
Frequency distribution
Categories of nurses (n = 233 responses)
Group             Category                         Frequency  Percentage
Sub-professional  Enrolled nurse/assistant                82      35.1 %
Sub-professional  Senior enrolled nurse/assistant         54      23.1 %
                  Subtotal                               136      58.3 %
Professional      Professional nurse                      35      15.0 %
Professional      Senior professional nurse               16       6.8 %
Professional      Chief professional nurse                41      17.6 %
Professional      Deputy manager                           5       2.1 %
                  Subtotal                                97      41.6 %
Total                                                    233      99.9 %


Grouped frequency distributions
• A tabular or a graphical representation of ordinal,
interval or ratio data
• Scores are grouped into class intervals
• Frequencies are given for these intervals
• A class interval is a division or category of scores on a
grouped frequency distribution
• E.g.: Age can be grouped into intervals to represent:
neonate, toddler, … adult … aged
• Using an existing convention: A = 80-100%; B = 70-80%
• Using equally spaced ‘arbitrary’ categories:
  – Determine the range
  – Consider the interval width and the number of divisions
  – Consider the sample size in all of this
  – Consider apparent limits (provide for extreme values)
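The steps for equally spaced class intervals can be sketched as follows. This is only an illustration under stated assumptions: the ages are hypothetical, the scores are whole numbers, and the function name `grouped_frequencies` is made up for this example.

```python
from collections import Counter

def grouped_frequencies(scores, start, width):
    """Count integer scores per class interval of equal width.

    Interval k covers start + k*width up to start + (k+1)*width - 1.
    """
    if not scores:
        return []
    if any(s < start for s in scores):
        raise ValueError("start must not exceed the smallest score")
    counts = Counter((s - start) // width for s in scores)
    last = max(counts)  # index of the highest interval that is needed
    return [
        (start + k * width, start + (k + 1) * width - 1, counts.get(k, 0))
        for k in range(last + 1)
    ]

# Hypothetical ages, for illustration only.
ages = [22, 25, 29, 31, 34, 38, 41, 47, 52, 63]
table = grouped_frequencies(ages, start=20, width=10)

for low, high, freq in table:
    print(f"{low}-{high} years: {freq}")
```

Choosing `start` below the smallest score and letting the last interval extend past the largest score is one way to provide for extreme values.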
Grouped frequency distribution table
Age distribution of respondents (n = 227)
Age                 Frequency  Percentage
20-29 years               111     48.89 %
30-39 years                79     34.80 %
40-49 years                29     12.77 %
50-59 years                 3      1.32 %
60 years and more           5      2.20 %
Total                     227     99.98 %


Cumulative frequency

• Cumulative frequency refers to the frequency
of all data items with a value less than or equal
to a specific score
• Cumulative percentage frequency refers to the
percentage of items within a data set that have
a value less than or equal to a specific score
Cumulative frequency table
Age distribution of respondents (n = 227)
Age                 Frequency  Percentage  Cumulative percentage
20-29 years               111     48.89 %                48.89 %
30-39 years                79     34.80 %                83.69 %
40-49 years                29     12.77 %                96.46 %
50-59 years                 3      1.32 %                97.78 %
60 years and more           5      2.20 %                99.98 %
Total                     227     99.98 %
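A short sketch of how cumulative frequencies and cumulative percentages can be derived from the age frequencies in the table. The variable names are assumptions for illustration. Because this version computes each percentage from the cumulative count instead of adding up percentages that were already shortened to two decimals, its figures can differ from the table in the last digit, and the final value is exactly 100.

```python
from itertools import accumulate

# Frequencies per age group, copied from the table (n = 227).
labels = ["20-29", "30-39", "40-49", "50-59", "60 and more"]
frequencies = [111, 79, 29, 3, 5]
total = sum(frequencies)

# Running total: number of respondents in this group or any younger group.
cumulative = list(accumulate(frequencies))

# Cumulative percentage, computed from the cumulative counts.
cumulative_pct = [round(100 * c / total, 2) for c in cumulative]

for label, c, pct in zip(labels, cumulative, cumulative_pct):
    print(f"{label:12s} {c:4d} {pct:7.2f} %")
```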


Descriptive Statistics
Classification of statistical data
A. Descriptive statistics and
B. Inferential statistics.
Descriptive statistics
These are statistics that describe and
summarize data but do not allow conclusions
to be drawn about the whole population from
which the sample was selected.
• Summaries of the data are made by use of
charts, tables, and graphs.
Example of descriptive statistics
• Descriptive statistics simplify large amounts of
data in a meaningful way by reducing lots of it
into a summary.
Example
• In a college, the average test score of
incoming students in a Biochemistry course
summarizes how they performed, but it does not
explain why the scores are what they are or
reveal any trends.
• Inferential statistics: these are statistics used
to test a hypothesis, draw conclusions and
make predictions about a whole population,
based on a given sample.
Types of descriptive statistics
Descriptive statistics has 2 main types:
• Measures of Central Tendency (Mean,
Median, and Mode).

• Measures of Dispersion or Variation
(Variance, Standard Deviation, Range).
1. Central Tendency
• Central tendency (its measures are also called
measures of location or central location) is a way to
describe what’s typical for a group (set) of
data.

• It shows what is normal or average for a
given set of data. There are three key measures
of central tendency: mean, median, and
mode.
Examples of Measures of Central
Tendency
a. Mean
This is the average of a given set of numbers. The mean is
calculated in two very easy steps:
1. Add all the data values together to find the sum
2. Divide the sum by the total number of data values

E.g. The ladies’ heights in inches: 62, 70, 60, 63, 66.
62+70+60+63+66 = 321
Mean = 321/5 = 64.2 inches
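The two steps can be checked with a few lines of Python. This is a minimal sketch using the heights from the example; the five values sum to 321, so the mean is 64.2 inches. The variable names are illustrative.

```python
from statistics import mean

# Ladies' heights in inches, from the example.
heights = [62, 70, 60, 63, 66]

# Step 1: add the data values together.
total = sum(heights)

# Step 2: divide the sum by the number of data values.
manual_mean = total / len(heights)

# The standard library gives the same result.
library_mean = mean(heights)

print(total, manual_mean, library_mean)
```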
Advantage of the mean
• It can be calculated for both continuous and
discrete numerical data

Disadvantage of Mean
• Data must be numerical in order to calculate
the mean.
b. Mode
The mode is the value in the set that occurs most often.

E.g. consider a dataset with the retirement ages of 10
people, in years: 55, 55, 55, 56, 56, 57, 58, 58, 59, 60

• Check the frequency of each age: the most common
value is 55, which occurs three times. Thus, the mode
of this data set is 55 years.
Advantages of mode
Unlike mean and median, it can be calculated for
both numerical and categorical data
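A brief sketch of finding the mode with Python's `statistics` module. The retirement ages come from the example; the blood-group list is a hypothetical categorical data set added only to illustrate that the mode also works for categories, and that a data set can have more than one mode.

```python
from statistics import mode, multimode

# Retirement ages in years, from the example.
ages = [55, 55, 55, 56, 56, 57, 58, 58, 59, 60]
age_mode = mode(ages)

# Hypothetical categorical data: two categories tie for most frequent.
blood_groups = ["A", "B", "B", "A", "O"]
group_modes = multimode(blood_groups)

print(age_mode)      # most frequent age
print(group_modes)   # all most-frequent categories, in order of first appearance
```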
c. Median
• This refers to the middle value in a data
set. It is calculated by:
i. listing the data in numerical order
ii. locating the value in the middle of the list
(with an even number of values, take the mean of
the two middle values).

E.g.
• The middle number in this data set is 26, as
there are 4 numbers below it and 4 numbers
above it: 21, 22, 24, 24, 26, 27, 28, 29, 31.
Advantage of Median
• It is less affected by outliers and skewed data
than the mean.

• It’s preferred when the data set is not
symmetrical.
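The robustness of the median can be illustrated with a small sketch. The nine values are the ones from the median example; the extra value 310 is a hypothetical outlier added only for this demonstration.

```python
from statistics import mean, median

# Data set from the median example (already in numerical order).
values = [21, 22, 24, 24, 26, 27, 28, 29, 31]

# Same data with one hypothetical extreme value appended.
with_outlier = values + [310]

median_before = median(values)
median_after = median(with_outlier)   # even count: mean of the two middle values
mean_before = mean(values)
mean_after = mean(with_outlier)

print(median_before, median_after)
print(round(mean_before, 2), round(mean_after, 2))
```

The single outlier moves the median only from 26 to 26.5, while the mean roughly doubles, which is why the median is preferred for skewed data.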
2. Measures of Dispersion
• Central tendency gives us important information, but it
doesn’t tell us everything about a data set.

• It fails to reveal the extent to which the values of the
individual items vary in a data set.

• Dispersion in statistics describes the spread of the data
values in a given dataset.

• It shows how data is “dispersed” around the mean (the
central value).
Types of measures of dispersion
a. The Range
• The range is simply the difference between the
largest and smallest value in a data set.

• It gives a quick indication of how widely the
values are spread.

• Range = max. value – min. value

E.g.: Group of students A: 56, 58, 60, 62, 64
Group of students B: 40, 50, 60, 70, 80
The range:
• Group A: 64 – 56 = 8
• Group B: 80 – 40 = 40

Thus, the data values in Group A are much closer to the
mean (60 in both groups) than the ones in Group B.

Disadvantage of the Range:
It only uses the minimum and maximum of the data set, so
it says nothing about how the values in between are spread.
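The range calculation for the two groups can be sketched in a few lines. The helper name `value_range` is an assumption chosen for this illustration.

```python
def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)

# Scores of the two groups of students from the example.
group_a = [56, 58, 60, 62, 64]
group_b = [40, 50, 60, 70, 80]

range_a = value_range(group_a)
range_b = value_range(group_b)

print(range_a, range_b)
```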
b. The Standard Deviation

• It provides information on how much variation from
the mean exists.

• Unlike the range, it takes into account how far each
value in a dataset lies from the mean.

• As with the range, a low value indicates little spread: a
low standard deviation tells us that the data points are
very close to the mean.

• A high standard deviation shows the opposite: the data
points are spread far from the mean.


c. Variance
• Variance is a measure of how spread out a
data set is.

• It is calculated as the average squared
deviation of each number from the mean of a
data set.

• The standard deviation is the square root of
the variance.
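As a sketch, the variance and standard deviation of the two student groups from the range example can be computed as below. It follows the "average squared deviation" definition, i.e. it divides by the number of values (population variance). Dividing by n – 1 instead (sample variance, `statistics.variance`) is a common alternative that is not covered in these slides.

```python
from math import sqrt
from statistics import pstdev, pvariance

group_a = [56, 58, 60, 62, 64]
group_b = [40, 50, 60, 70, 80]

def average_squared_deviation(values):
    """Variance as the mean of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

var_a = average_squared_deviation(group_a)
var_b = average_squared_deviation(group_b)

# Standard deviation is the square root of the variance.
sd_a = sqrt(var_a)
sd_b = sqrt(var_b)

# The standard library's population versions agree with the manual result.
print(var_a, pvariance(group_a), round(sd_a, 3), round(pstdev(group_a), 3))
print(var_b, pvariance(group_b), round(sd_b, 3), round(pstdev(group_b), 3))
```

Both groups have the same mean, but Group B's standard deviation is five times larger, which matches the impression given by the ranges.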
d. The interquartile range (IQR)

• It is a measure of variability, based on dividing a
data set into quartiles.

• Quartiles divide a rank-ordered data set into four equal
parts.

• The values that divide each part are called the first, second,
and third quartiles; and they are denoted by Q1, Q2, and
Q3, respectively.
Calculating the IQR
1. Find the median of the lower half and of the upper half
of your data; these are Q1 and Q3. The median is the
"midpoint," or the number that is halfway into a set. ...
2. Subtract Q1 from Q3 to determine the IQR (IQR = Q3 – Q1).
NB: The IQR tells you how wide the interval between the
25th percentile and the 75th percentile is, i.e. the
spread of the middle 50% of the data.
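The two steps can be sketched as follows, using the data set from the median example. This is one convention among several: when the number of values is odd, this version leaves the overall median out of both halves, which is an assumption not stated in the slides. Other quartile conventions (and statistical software) can give slightly different values. The function name `iqr_by_halves` is made up for this example.

```python
from statistics import median

def iqr_by_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method.

    For an odd number of values the middle value is excluded
    from both halves (an assumed convention).
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median
    upper = ordered[-half:]   # values above the median
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1

data = [21, 22, 24, 24, 26, 27, 28, 29, 31]
q1, q3, iqr = iqr_by_halves(data)

print(q1, q3, iqr)
```

Here the lower half is 21, 22, 24, 24 and the upper half is 27, 28, 29, 31, so Q1 = 23, Q3 = 28.5 and the IQR is 5.5.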
The End
