
MEASUREMENT OF CENTRAL TENDENCY
OGUZHAN AKYILDIRIM
STATISTICIAN
ADM3101326
INTRODUCTION TO STATISTICS
Basic terminology
and concepts
• Statistical terms
• Ratio – different subjects
• Proportion – same subject
• Percentage – percentage of
proportion
• Rate – time sensitive
• Growth rate
The Second Bonus!!! (2 students, separately)
• Prepare a PowerPoint presentation with 12 slides that tells a story about
a subject of your choice, using 6 different chart types
Ratio
• Comparison of two numbers, expressed as: a to b, a per b, a:b
• Used to express such comparisons as clinicians to patients or beds to patients
• Calculation: a/b
• Example – In district X, there are 600 employees and 200 small and medium-sized
enterprises (SMEs). What is the ratio of employees to SMEs?
Examples for Ratios
Policy Making
• Student – Teacher Ratio in Education Policy
• Patient – Doctor Ratio in Health Policy
• Debt – to – GDP Ratio in Economic Policy
Policy Impact Evaluation
• Renewable to Non-Renewable Energy Ratio
• Employment – to - Population Ratio
• Ratio of Public to Private School Enrollments
Proportion
• A ratio in which all individuals in the numerator are also in the denominator.
• Used to compare a part to the whole, such as the proportion of all employees who
are less than 20 years old.
• Example: If 20 of 100 employees in a factory are less than 20 years of age, what
is the proportion of young employees in the factory?

20/100 = 1/5
Examples for Proportions
Policy Making
• Proportion of budget allocated to education – increase from 15% to 20%
• Proportion of renewable energy in total energy consumption – 2 in 5 units
• Proportion of population with access to (quality) healthcare – 6 of 10 people
Policy Impact Evaluation
• Proportion of Students Graduating On Time Before and After Education
Initiatives
• Proportion of Uninsured Individuals Before and After Healthcare Reform
Percentage
• A way to express a proportion (proportion multiplied by 100)
• Expresses a number in relation to the whole
• Example: Males comprise 2/5 of the employees, so 40% of the employees are male
(0.40 x 100)
• Allows us to express a quantity relative to another quantity. Can compare
different groups, facilities, or countries that may have different denominators
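The three definitions so far (ratio, proportion, percentage) can be checked with a few lines of Python. This is a minimal sketch that reuses the numbers from the slide examples; the variable names are illustrative only.

```python
from fractions import Fraction

# Ratio: numerator and denominator are different subjects (employees vs. SMEs).
employees, smes = 600, 200
ratio = employees / smes  # employees per SME

# Proportion: the numerator is part of the denominator
# (young employees among all employees).
young, all_employees = 20, 100
proportion = Fraction(young, all_employees)  # reduces to lowest terms

# Percentage: a proportion multiplied by 100.
male_proportion = Fraction(2, 5)
male_percentage = float(male_proportion * 100)

print(f"Ratio of employees to SMEs: {ratio:g} to 1")
print(f"Proportion of young employees: {proportion}")
print(f"Percentage of male employees: {male_percentage:g}%")
```

Using `Fraction` keeps the proportion exact, which makes it easy to see that 20/100 and 1/5 are the same value.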
Examples for Percentages
Policy Making
• Setting Unemployment Rate Targets – decreasing unemployment rate
from 35% to 25% for youth in 2 years
• Renewable Energy Goals
• Health Insurance Coverage Expansion
Policy Impact Evaluation
• Evaluating Education Reform Impact
• Assessing Public Health Campaigns
• Impact of Tax Policy Changes
Rate
• Measured with respect to another measured quantity during the same time period
• Used to express the frequency of specific events in a certain time period
(fertility rate, mortality rate)
• Numerator and denominator must be from the same time period
• Often expressed as a ratio (per 1,000)
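A rate "per 1,000" is the event count divided by the population, scaled by 1,000. The sketch below assumes a helper called `rate_per` and uses made-up figures; neither comes from the slides.

```python
def rate_per(events: int, population: int, per: int = 1_000) -> float:
    """Events per `per` people, both counted over the same time period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * per / population


# Hypothetical figures for illustration only (not from the slides):
births_in_year = 1_800
women_in_year = 120_000

fertility_rate = rate_per(births_in_year, women_in_year)
print(f"{fertility_rate:.1f} births per 1,000 women")
```

Changing `per` to 100,000 gives the scale used for the crime-rate style of indicator.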
Examples for Rates
Policy Making
• Crime Rate Reduction Policies – crimes per 100,000 people reduced from 500 to 400
• Literacy Rate Improvement
• Traffic Accident Rate Decrease – traffic accidents per million vehicles
Policy Impact Evaluation
• Healthcare Access Policies – changes in number of doctor visits per 100 people
• Birth and Fertility Rate Adjustments After Family Planning Policies – number of
births per 1000 women
• Evaluating Environmental Regulation Impact - Changes in pollution rates
Rate of increase – Growth rate
• Calculation: total increase ÷ time over which the increase occurred
• Used to calculate monthly, quarterly, or yearly increases in education service
delivery. Example: increase in the number of new clients or commodities distributed
• Consider a city whose population was 100,000 at the beginning of 2019 and grew
to 102,500 by the end of 2019. The population growth rate for that year would be:

(102,500 – 100,000) / 100,000 x 100 = 2.5%
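The 2.5% figure can be reproduced with a short sketch. The function name `growth_rate` is illustrative; it computes the percentage change relative to the starting value over one period.

```python
def growth_rate(start: float, end: float) -> float:
    """Percentage change from `start` to `end` over one period."""
    if start == 0:
        raise ValueError("start value must be non-zero")
    return (end - start) / start * 100


# City population at the beginning and at the end of 2019 (from the slide).
rate_2019 = growth_rate(100_000, 102_500)
print(f"Population growth rate in 2019: {rate_2019:.1f}%")
```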
Examples for Rates of Increase (Growth Rates)
Policy Making
• Economic growth rates - 3% increase in GDP per year.
• Rate of increase in educational enrollment
• Renewable energy consumption increase rate
Policy Impact Evaluation
• Evaluating healthcare coverage expansion
• Public transportation usage growth rate
• Rate of increase in employment opportunities in emerging sectors
Lecture outcomes
• Understand concepts of ungrouped / grouped data.
• Understand concepts of central tendency: Define and distinguish between the mean,
median, and mode as measures of central tendency and understand their relevance in
political science research.
• Application of central tendency measures: demonstrate the ability to calculate the
mean, median, and mode for given datasets and interpret these measures in the context
of political science data, such as survey responses, voting data, or policy analysis.
• Analyze distributions: gain the ability to analyze the distribution of political data to
determine the most appropriate measure of central tendency to use in different
scenarios, acknowledging the impact of outliers and skewed data.
• Communicate statistical findings: effectively communicate the results of their
statistical analyses, including the use of central tendency measures, in written and oral
formats suitable for academic and policy-making audiences.
Ungrouped vs. Grouped Data

Ungrouped data
• have not been summarized in any way
• are also called raw data
Grouped data
• have been organized into a frequency distribution
Why do we group data?
• Data Management and Simplification
• Overview and Clarity
• Statistical Analysis
• Reducing Complexity
• Comparisons
• Efficiency
• Confidentiality
• Data Stability
• Pattern Recognition
Ungrouped vs. Grouped Data
• Example: Ages of a sample of managers
42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23** 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74* 37 29 43 54
* largest value, ** smallest value
Ungrouped vs. Grouped Data
Frequency Distribution of Managers’ Ages:

Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
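The frequency distribution can be rebuilt from the raw ages. This is a minimal sketch, assuming classes of width 10 starting at 20 where each class includes its lower limit and excludes its upper limit ("20-under 30"); the function name is illustrative.

```python
# Ages of the sample of 50 managers, row by row as on the slide.
ages = [
    42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
    53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
    52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
    55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
    61, 31, 30, 40, 60, 74, 37, 29, 43, 54,
]


def frequency_distribution(values, lower, width, n_classes):
    """Count how many values fall in each class [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for value in values:
        index = (value - lower) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} lies outside the class intervals")
        counts[index] += 1
    return counts


frequencies = frequency_distribution(ages, lower=20, width=10, n_classes=6)
for k, frequency in enumerate(frequencies):
    low = 20 + 10 * k
    print(f"{low}-under {low + 10}: {frequency}")
```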
Ungrouped vs. Grouped Data
Range and Class

• Data Range: Range = Largest – Smallest
• Ex: Range = 74 – 23 = 51
Number of Classes and Class Width
• The number of classes should be between 5 and 20.
Fewer than 5 classes cause excessive summarization.
More than 20 classes leave too much detail.
Ungrouped vs. Grouped Data
• Class Width
- Divide the range by the number of classes for an approximate class width
- Round up to a convenient number
- Ex: Approximate Class Width = 51/6 = 8.5, so Class Width = 10
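The range and class-width steps can be sketched as follows. "Round up to a convenient number" is a judgment call; this sketch assumes it means rounding up to the next multiple of 5, which is an assumption rather than a rule from the slides.

```python
import math

smallest, largest = 23, 74   # youngest and oldest manager
n_classes = 6

data_range = largest - smallest          # range of the data
approx_width = data_range / n_classes    # approximate class width

# Assumption: a "convenient number" is the next multiple of 5.
CONVENIENT_STEP = 5
class_width = math.ceil(approx_width / CONVENIENT_STEP) * CONVENIENT_STEP

print(f"Range = {data_range}")
print(f"Approximate class width = {approx_width}")
print(f"Class width = {class_width}")
```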
Ungrouped vs. Grouped Data
Relative Frequency

Class Interval Frequency Relative Frequency


20-under 30 6 .12
30-under 40 18 .36
40-under 50 11 .22
50-under 60 11 .22
60-under 70 3 .06
70-under 80 1 .02
Total 50 1.00
Ungrouped vs. Grouped Data
Cumulative Frequency

Class Interval Frequency Cumulative Frequency


20-under 30 6 6
30-under 40 18 24
40-under 50 11 35
50-under 60 11 46
60-under 70 3 49
70-under 80 1 50
Total 50
Ungrouped vs. Grouped Data
Cumulative Relative Frequencies

Class Interval Frequency RF Cu. Frequency CRF


20-under 30 6 .12 6 .12
30-under 40 18 .36 24 .48
40-under 50 11 .22 35 .70
50-under 60 11 .22 46 .92
60-under 70 3 .06 49 .98
70-under 80 1 .02 50 1.00
Total 50 1.00
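The relative, cumulative, and cumulative relative frequency columns all follow from the class frequencies. A minimal sketch, using the frequencies from the table:

```python
from itertools import accumulate

frequencies = [6, 18, 11, 11, 3, 1]   # class frequencies from the table
total = sum(frequencies)

# Relative frequency: share of all observations that fall in each class.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: running total as a share of all observations.
cumulative_relative = [c / total for c in cumulative]

for f, rf, cf, crf in zip(frequencies, relative, cumulative, cumulative_relative):
    print(f"{f:>3} {rf:.2f} {cf:>3} {crf:.2f}")
```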
Measures of Central Tendency
• Central tendency: measures of the location of the middle or the center of a
distribution of data
• Mode, Mean, Median
Measures of Central Tendency
Ungrouped Data
Mode

• The most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more than two modes
Measures of Central Tendency
Ungrouped Data
Example:

35 37 37 39 40 40
41 41 43 43 43 43
44 44 44 44 44 45
45 46 46 46 46 48

• Value 44 occurs 5 times, so the mode is 44.
• Mode is often used in determining sizes (garment industry): S, M, L, XL, XXL
(modal sizes)
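The mode of this example can be found by counting occurrences. The sketch below uses `collections.Counter` and `statistics.multimode` (Python 3.8+); `multimode` returns every value tied for the highest count, so it also covers bimodal and multimodal data.

```python
from collections import Counter
from statistics import multimode

values = [
    35, 37, 37, 39, 40, 40,
    41, 41, 43, 43, 43, 43,
    44, 44, 44, 44, 44, 45,
    45, 46, 46, 46, 46, 48,
]

counts = Counter(values)      # value -> number of occurrences
modes = multimode(values)     # list of all most frequent values

print(f"44 occurs {counts[44]} times")
print(f"Mode(s): {modes}")
```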
Median
• The middle of a distribution (when numbers are in order, half of the numbers are
above the median and half are below the median)
• The median is not as sensitive to extreme values as the mean
• Odd number of numbers: median = the middle number. Median of 2, 4, 7 = 4
• Even number of numbers: median = mean of the two middle numbers.
Median of 2, 4, 7, 12 = (4+7)/2 = 5.5
Measures of Central Tendency
Ungrouped Data
Median

• Middle value in an ordered array of numbers.

• Applicable for ordinal and scale (interval and ratio) data

• Not applicable for nominal data

• Unaffected by extremely large and extremely small values

• Median is determined without using all information from the data set.
Measures of Central Tendency
Ungrouped Data
Computational Procedure

• Arrange the observations in an ordered array.
• If there is an odd number of terms, the median is the middle term of the ordered array.
• If there is an even number of terms, the median is the average of the middle two terms.
Measures of Central Tendency
Ungrouped Data
Example:

• Ordered Array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
- There are 17 terms in the ordered array.
- Position of median = (n+1)/2 = (17+1)/2 = 9
- The median is the 9th term, 15.

• Ordered Array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
- There are 16 terms in the ordered array.
- Position of median = (n+1)/2 = (16+1)/2 = 8.5
- The median is between the 8th and 9th terms: 14.5.
Measures of Central Tendency
Ungrouped Data
Calculate the median of each of the following sets of numbers
1) 3, 2, 1, 3, 4, 5, 3 (ordered: 1 2 3 3 3 4 5)
2) 5, 10, 8, 7, 12, 12 (ordered: 5 7 8 10 12 12)
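The computational procedure for the median (order the values, then take the middle term or the average of the two middle terms) can be sketched as below. The function is a plain illustration; Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Median of a list of numbers, following the ordered-array procedure."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of terms: the middle term, at position (n + 1) / 2.
        return ordered[middle]
    # Even number of terms: average of the two middle terms.
    return (ordered[middle - 1] + ordered[middle]) / 2


array_17 = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
array_16 = array_17[:-1]

print(median(array_17))                  # 17 terms
print(median(array_16))                  # 16 terms
print(median([3, 2, 1, 3, 4, 5, 3]))     # exercise 1
print(median([5, 10, 8, 7, 12, 12]))     # exercise 2
```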
Mean
• The average of your dataset
• The value obtained by dividing the sum of a set of quantities by the number of
quantities in the set
• Example: (22+18+30+19+37+33) ÷ 6 = 159 ÷ 6 = 26.5
• The mean is sensitive to extreme values


Measures of Central Tendency
Ungrouped Data
Arithmetic Mean

• Commonly called ‘the mean’: the average of a group of numbers

• Applicable only for interval and ratio data

• Affected by each value in the data set, including extreme values

• Computed by summing all values in the data set and dividing the sum by the
number of values in the data set
Measures of Central Tendency
Ungrouped Data
Population Mean
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Sample Mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Calculating mean
• Average number of migrants counseled per month
– January: 30
– February: 45
– March: 38
– April: 41
– May: 37
– June: 40
(30+45+38+41+37+40) ÷ 6 = 231 ÷ 6 = 38.5
Mean or average = 38.5
Types of Mean

• Weighted
• Geometric
• Harmonic
Geometric Mean
• Calculation of mean of growth rates
• Calculation of mean investment returns

$\text{GM} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$
• x1, x2, …, xn are the n data points in the dataset, and
• ∏ denotes the product of the values.
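The geometric mean is the n-th root of the product of the n values. When averaging growth rates, it is applied to growth factors (1 + rate) rather than to the rates themselves. The sketch below uses hypothetical returns of +100% followed by -50%, figures that are not from the slides.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical investment: +100% in year 1, then -50% in year 2.
growth_factors = [2.0, 0.5]

gm = geometric_mean(growth_factors)
mean_growth_rate = (gm - 1) * 100
arithmetic_rate = (sum(growth_factors) / len(growth_factors) - 1) * 100

print(f"Geometric mean growth rate: {mean_growth_rate:.1f}%")
print(f"Arithmetic mean growth rate: {arithmetic_rate:.1f}%")
```

The investment ends where it started, which the geometric mean reflects; the arithmetic mean of the two rates would suggest a gain that never happened.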
Weighted Mean
• Calculation of Grade Point Average (GPA)
• Investment Portfolio Performance

$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
• xi represents each data point in the dataset,
• wi is the weight corresponding to each data point,
• n is the total number of data points,
• ∑ denotes the summation over all data points.
Example
Weighted Mean
• If a student scores an A (4.0) in a 3-credit class, a B (3.0) in a 4-credit
class, and a BA (3.5) in a 3-credit class, their GPA would be calculated
as:
(4.0 x 3 + 3.0 x 4 + 3.5 x 3) / (3 + 4 + 3) = 34.5 / 10 = 3.45
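The GPA example is a weighted mean in which the grade points are the values and the credits are the weights. A minimal sketch, with illustrative names:

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


grade_points = [4.0, 3.0, 3.5]   # A, B, BA
credit_hours = [3, 4, 3]         # credits of each class

gpa = weighted_mean(grade_points, credit_hours)
print(f"GPA = {gpa:.2f}")
```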
Harmonic Mean
• Calculation of average speed

$\text{HM} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
• n is the total number of values in the dataset, and
• xi represents each individual value in the dataset.
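The harmonic mean divides the number of values by the sum of their reciprocals, which is why it gives the correct average speed over equal distances. The trip below (60 km/h out, 40 km/h back over the same distance) is a hypothetical example, not one from the slides.

```python
from statistics import harmonic_mean

# Hypothetical round trip: same distance each way.
speeds_kmh = [60, 40]

# Library function from the standard `statistics` module.
average_speed = harmonic_mean(speeds_kmh)

# The same result written out: n divided by the sum of reciprocals.
manual = len(speeds_kmh) / sum(1 / s for s in speeds_kmh)

print(f"Average speed: {average_speed:.1f} km/h")
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average because more time is spent travelling at the slower speed.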
When to use the mean and the median?
Use of the Mean
• Data is scale (interval or ratio) - general use
• Data is homogeneous (no outliers)
Use of the Median
• Data is skewed
• Presence of outliers
• Describing income, expenditure indicators
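The reason the median is preferred for skewed data such as income can be seen by adding one extreme value to a small data set. The incomes below are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical monthly incomes (in thousands), no outliers.
incomes = [30, 32, 35, 38, 40]

# The same group with one very high earner added.
incomes_with_outlier = incomes + [500]

print(f"Without outlier: mean = {mean(incomes)}, median = {median(incomes)}")
print(
    f"With outlier:    mean = {mean(incomes_with_outlier)}, "
    f"median = {median(incomes_with_outlier)}"
)
```

A single extreme value more than triples the mean, while the median barely moves.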
Measures of Central Tendency
Ungrouped Data
Percentiles

• Measures of central tendency that divide a group of data into 100 parts.
• Applicable for ordinal, interval, and ratio data
• At least n% of the data lie below the nth percentile, and at most (100 - n)% of the data lie
above the nth percentile

Example

• 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the
data lie above it
• The median is the 50th percentile.
Measures of Central Tendency
Ungrouped Data
Example

Raw Data: 14, 12, 19, 23, 5, 13, 28, 17


Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28

Location of 30th percentile: $i = \frac{30}{100}(8) = 2.4$

The location index, i, is not a whole number, so the location is [i] + 1 = 2 + 1 = 3.

The 30th percentile is at the 3rd location of the array: the 30th percentile is 13.
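The location-index procedure can be sketched as a small function. It assumes a whole-number percentile P with 0 < P < 100 and uses integer arithmetic to decide whether i = (P/100) × n is a whole number. When i is whole, the sketch averages the i-th and (i+1)-th terms, which is the rule the quartile example in this deck follows.

```python
def percentile(values, p):
    """P-th percentile using the location index i = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # whole = integer part of i, remainder == 0 exactly when i is a whole number.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is whole: average of the i-th and (i+1)-th terms (1-based positions).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: the term at 1-based position [i] + 1.
    return ordered[whole]


raw_data = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw_data, 30))

quartile_data = [106, 109, 114, 116, 121, 122, 125, 129]
print([percentile(quartile_data, p) for p in (25, 50, 75)])
```

Other textbooks and software packages use different interpolation rules, so results from library functions may differ slightly from this procedure.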
Measures of Central Tendency
Ungrouped Data
Quartiles

• Measures of central tendency that divide a group of data into four subgroups
• Q1 is equal to the 25th percentile
• Q2 is located at the 50th percentile and equals the median
• Q3 is equal to the 75th percentile
Measures of Central Tendency
Ungrouped Data
Example
Ordered array: 106, 109, 114, 116, 121, 122, 125, 129

$Q_1$: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$

$Q_2$: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$

$Q_3$: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
Measures of Central Tendency
Grouped Data
Mode of Grouped Data

• Midpoint of the modal class
• Modal class has the greatest frequency

Example (see the frequency distribution of managers' ages on the earlier slide)

The modal class is 30-under 40. So, Mode = 35.


Measures of Central Tendency
Grouped Data
Mean of Grouped Data

• Weighted average of class midpoints


• Class frequencies are the weights

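$\mu = \frac{\sum_{i} f_i M_i}{\sum_{i} f_i} = \frac{\sum_{i} f_i M_i}{N}$

where $f_i$ is the frequency of class i, $M_i$ is its midpoint, the sums run over all
classes, and $N = \sum_{i} f_i$ is the total frequency.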
Measures of Central Tendency
Grouped Data
Example
Class Interval Frequency Class Midpoint fM
20-under 30 6 25 150
30-under 40 18 35 630
40-under 50 11 45 495
50-under 60 11 55 605
60-under 70 3 65 195
70-under 80 1 75 75
Total 50 2150

$\mu = \frac{\sum_{i} f_i M_i}{N} = \frac{2150}{50} = 43.0$
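The grouped mean in this example is a weighted average of the class midpoints, with the class frequencies as weights. A minimal sketch using the table's classes:

```python
# (lower limit, upper limit, frequency) for each class in the table.
classes = [
    (20, 30, 6),
    (30, 40, 18),
    (40, 50, 11),
    (50, 60, 11),
    (60, 70, 3),
    (70, 80, 1),
]

total_frequency = sum(f for _, _, f in classes)

# Sum of frequency * midpoint over all classes (the fM column).
sum_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = sum_fm / total_frequency
print(f"Sum of fM = {sum_fm:g}, N = {total_frequency}, mean = {grouped_mean:.1f}")
```

Because every observation is replaced by its class midpoint, the grouped mean is an approximation of the mean of the raw data.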
Measures of Central Tendency
Grouped Data
Median of Grouped Data
$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}} (W)$

where:
$L$ : the lower limit of the median class
$cf_p$ : cumulative frequency of the class preceding the median class
$f_{med}$ : frequency of the median class
$W$ : width of the median class
$N$ : total frequency
Measures of Central Tendency
Grouped Data
Example
Class Interval Frequency Cu. Frequency
20-under 30 6 6
30-under 40 18 24
40-under 50 11 35
50-under 60 11 46
60-under 70 3 49
70-under 80 1 50
N = 50

$\text{Median} = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.9$

Note that N/2 = 25, therefore the median is the average of the 25th and 26th values.
The cumulative frequency reaches only 24 by the end of the class 30-under 40, so both
values fall in the next class. So, the median class is 40-under 50.
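The grouped-median formula can be sketched as a function that walks through the classes until the cumulative frequency reaches N/2, then interpolates inside that class. The function name and the tuple layout are illustrative choices.

```python
def grouped_median(classes):
    """Median from (lower limit, upper limit, frequency) tuples in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, frequency in classes:
        if cumulative + frequency >= n / 2:
            # Median class found: L + ((N/2 - cf_p) / f_med) * W
            width = upper - lower
            return lower + (n / 2 - cumulative) / frequency * width
        cumulative += frequency


classes = [
    (20, 30, 6),
    (30, 40, 18),
    (40, 50, 11),
    (50, 60, 11),
    (60, 70, 3),
    (70, 80, 1),
]

median_age = grouped_median(classes)
print(f"Median = {median_age:.1f}")
```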
Box Plot
Outliers
• Significantly differ from other observations in a data set
• Much higher or lower than the majority of data
• Outliers can result from variability in the measurement or indicate
experimental errors
• Outliers can have a significant impact on the results, including
skewing the mean, affecting the standard deviation, and influencing
the outcomes of statistical models.
• Depending on the context and purpose of the analysis, outliers may be
excluded, adjusted, or kept in the data to accurately reflect the
underlying distribution or to identify potential areas for further study.
How to use Measures of Central Tendency in
policy making and/or evaluation
• Informing policy decisions
• Evaluating policy impact
• Identifying target groups for policy interventions
• Resource allocation
• Comparing groups within a population
• Monitoring trends over time
• Communicating with stakeholders
Informing Policy Decisions
Examples:
• Public Health: The average (mean) number of new cases of a disease (e.g.,
flu or COVID-19) per day in a community can guide public health policies. If
the average number of cases rises above a certain threshold, it may prompt the
implementation of public health interventions such as vaccination campaigns,
mask mandates, or lockdown measures to control the spread.
• Education: The median score of students on standardized tests can inform
education policy. If the median score in certain schools or districts is
significantly below the national average, policymakers might allocate
additional resources, such as funding for after-school tutoring programs or
teacher training workshops, to help improve educational outcomes.
Informing Policy Decisions
• Economic Development: The mode of the income distribution within a
region can highlight the most common income bracket. If a significant
portion of the population falls into lower income brackets, economic
development policies may focus on job creation, skills training, and other
measures to increase employment opportunities and raise income levels.
• Environmental Policy: The average (mean) level of air or water pollution
in different areas can inform environmental policies. For areas with high
average pollution levels, governments might implement stricter
environmental regulations, promote cleaner technologies, or invest in
pollution control measures to protect public health and the environment.
Informing Policy Decisions
• Housing: The median home price in different regions can guide housing
policies. In areas where the median home price is significantly higher than
the average household income, policies may be needed to increase the
availability of affordable housing, such as subsidies for low-income
families or incentives for the construction of affordable units.
• Social Services: The average (mean) waiting time for accessing social
services (e.g., unemployment benefits, food assistance programs) can
inform policy improvements. High average waiting times might lead to
reforms aimed at streamlining processes, increasing staffing, or enhancing
online systems to make services more accessible to those in need.
Evaluating Policy Impact
• Educational Initiatives: To evaluate the impact of educational policies,
such as improvements in teacher training or the introduction of technology
in classrooms, the median scores of Turkish students on national or
international assessment tests (e.g., PISA) before and after the initiatives
could be examined. An upward shift in these median scores would point
towards an enhancement in educational outcomes.
• Healthcare Reforms: Turkey has undergone significant healthcare reforms
aimed at improving access and quality of care. The average (mean) number
of annual doctor visits per capita before and after the healthcare reform
could be analyzed. An increase in this average would suggest that more
people are accessing healthcare services, indicating the reform's success.
Evaluating Policy Impact
• Social Welfare Programs: The effectiveness of social welfare programs,
such as conditional cash transfers for families in need, can be assessed by
looking at the median household income or expenditure levels among
beneficiaries before and after receiving support. An increase in this median
would indicate that the program is successful in alleviating poverty.
• Public Transportation Expansion: In cities like Istanbul, where public
transportation projects have been implemented to ease congestion, the
median commute times for residents before and after the expansion of metro
lines or bus services can serve as an indicator of policy impact. Decreased
median commute times would signal an improvement in public
transportation efficiency and accessibility.
Identifying target groups for policy
interventions
• Youth Unemployment: By evaluating the average age of unemployed
individuals, policies could be targeted to the age groups that are most
affected by unemployment. If the mean age of the unemployed
population is lower, this might indicate the need for policies focusing
on youth employment opportunities.
• Public Health: The mode of healthcare utilization among different
regions could identify areas with the highest demand for healthcare
services. If a particular region shows a mode that indicates low
utilization of preventive services, health policies might focus on
enhancing access to these services in that region.
Identifying target groups for policy
interventions
• Elderly Support Services: The average (mean) age of the population
within various districts can help in identifying where the elderly
population is most concentrated. Social services could then be
directed more heavily towards these areas to provide age-appropriate
healthcare, housing, and recreational activities.
• Rural Development: Examining the median income or agricultural
yield in rural areas can help identify which regions are lagging.
Targeted interventions, such as subsidies or training programs for
modern farming techniques, could be directed towards these areas to
improve productivity and income levels.
Resource Allocation
• Disaster Response: In the aftermath of a natural disaster, the average
(mean) damage assessment costs in affected neighborhoods could be
used to prioritize and allocate resources for reconstruction and aid to
the most severely impacted areas.
• Education Resource Allocation: If the mean test scores of students in
eastern regions of Turkey are lower than those in other regions,
resources could be allocated for educational support, such as building
new schools, providing additional training for teachers, or supplying
educational materials to these areas.
Resource Allocation
• Healthcare Services: Should data show that the median number of
healthcare facilities per capita in rural areas of Turkey is significantly
lower than in urban areas, resources might be allocated to improve
healthcare infrastructure in those underserved regions.
• Environmental Protection: If certain regions have a higher average
level of industrial pollution, targeted resources could be allocated for
environmental cleanup projects, as well as for the implementation of
more stringent environmental protection policies.
Comparing groups within a population
• Social Media Usage: The average (mean) amount of time spent on
social media platforms can be compared across different age groups to
understand generational differences in technology use.
• Employee Satisfaction: In organizational studies, the mean or
median satisfaction scores from employee surveys can be compared
across different departments or job roles. This can highlight which
groups are more or less satisfied with their work conditions.
Comparing groups within a population
• Income Level Comparisons: To compare economic status among
different demographic groups, analysts might look at the median
income levels of each group. Since the median is not influenced by
outliers, it can provide a better sense of the typical income for each
group.
• Education Attainment: Educational researchers may compare the
mode of the highest education level achieved across various age
groups or ethnic groups. This can identify the most common level of
education within each group, which is useful for policy targeting.
Monitoring Trends Over Time
• Unemployment Rates: The mode of the unemployment rate over time
can show the most frequently occurring rate. If the mode shifts lower
consistently, it may suggest that more people are finding jobs,
indicating a positive trend in the labor market.
• Crime Rates: The median number of reported crimes in different
categories (like theft, violent crimes) can be monitored over time. If
the median decreases, it could suggest that policies and measures to
reduce crime are effective.
Monitoring Trends Over Time
• Economic Trends: By tracking the average (mean) income of citizens
over several years, economists can determine whether the general
population's earning capacity is increasing or decreasing, which can
influence fiscal policy decisions.
• Housing Market Trends: Real estate analysts might track the
average (mean) selling price of homes in a region over multiple years
to gauge the housing market's health. An upward trend could indicate
a growing economy or possibly a housing bubble.
Key messages
• The purpose of analysis is to provide
answers to policy-making
questions
• Descriptive analyses describe the
sample/target population
• Descriptive analyses do not
define causality – that is, they tell
you what, not why
Third Bonus!!! (4 students, each will present a
different subject)
• Prepare a PowerPoint presentation with at least 15 slides that tells a story
about one of the subjects given below, using different chart types,
parameters (proportions, rates, percentages, etc.), and
measures of central tendency.
• Subjects: Economy, Health, Education, Labour Market (in Turkiye)
• 10 points
Rules
• Use as many graphics as possible
• Use as many parameters as possible, such as mode, mean, median, proportion,
ratio, percentage, percentile, rate, growth rate, etc.
• Q&A sessions with the class
