
Public Health,

Health Economics
AKUOKO EBENEZER
PHD CANDIDATE, FEFU
EMAIL: AKUOKO.EBE@DVFU.RU
REGRESSION ANALYSIS
Objectives

• By the end of this lesson, you should be able to:
i. Explain the levels of measurement of data
ii. Explain regression analysis
iii. Perform and interpret regression analysis
Scales/Levels of Measurement
1. Nominal level of measurement: characterized by data that consist of names,
labels, or categories only. The data cannot be arranged in an ordering
scheme such as low to high.
• Properties:
i. Data are grouped into categories
ii. Categories are mutually exclusive (an individual can belong to only one category)
iii. Categories have no logical order
• Examples:
a. Classification of eye colours: amber, blue, brown, gray, green, hazel, or red.
b. Gender of people as male or female.
2. Ordinal level of measurement: this involves data that may be arranged in some
order. These variables have qualitative categories that are ordered in terms of
degree or magnitude.
• With the ordinal scale, the variables are divided into categories that are ranked
from lowest to highest or vice versa.
• Properties:
a. Data categories are mutually exclusive (an individual can belong to only one
category)
b. Data categories have a logical order
c. Data categories are scaled according to the amount of a particular characteristic
that they possess
• Examples:
a. Blood pressure of a person may be low, normal or high.
b. A student's grade may be excellent, good, satisfactory, unsatisfactory or fail.
3. Interval level of measurement: the variables have quantitative values or numbers. The interval scale
has the characteristics of rank order and equal intervals.
• It does not possess an absolute zero point. For example, the Fahrenheit and Celsius scales each have a zero
value, but a reading of zero does not mean that there is no temperature.
• An interval scale incorporates some features of the ordinal and nominal scales, with the additional feature that
the distance between levels on the scale can be specified.
• Examples:
a. Temperature: the difference between temperatures of 50° and 70° is the same as the difference between
temperatures of 10° and 30°.
b. Intelligence Quotient (IQ): the difference between an IQ of 140 and 150 is the same as that between 70 and 80.
• Properties:
a. Data categories are mutually exclusive
b. Data categories have a logical order
c. Data categories are scaled according to the amount of a particular characteristic that they possess.
4. Ratio level of measurement: the interval level modified to include a natural zero
starting point (where zero indicates that none of the quantity is present).
• Examples of the ratio scale of measurement include wages, weight and
height.
• Money is a good illustration. If you have 0 rubles, then you have no money.
• Weight is another example. If the dial on the scale is at zero, then there is a complete
absence of weight.
• Properties:
i. Data categories are mutually exclusive
ii. Data categories have a logical order
iii. Data categories are scaled according to the amount of a particular characteristic they
possess
iv. Equal differences in the characteristic are represented by equal differences in the
numbers assigned to the categories
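The practical difference between the interval and ratio levels can be checked numerically. The sketch below is illustrative only: the temperature and weight values and the helper function names are assumptions made up for this example, not data from the lesson. It compares the ratio of two readings before and after a change of units.

```python
# Illustrative sketch: interval vs ratio scales (all values are made up).

def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32

def kg_to_pounds(kg):
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462

# Interval scale: temperature. Zero is an arbitrary point, not "no temperature".
t1_c, t2_c = 10.0, 20.0
ratio_c = t2_c / t1_c
ratio_f = celsius_to_fahrenheit(t2_c) / celsius_to_fahrenheit(t1_c)

# Ratio scale: weight. Zero means no weight at all.
w1_kg, w2_kg = 40.0, 80.0
ratio_kg = w2_kg / w1_kg
ratio_lb = kg_to_pounds(w2_kg) / kg_to_pounds(w1_kg)

print("Temperature ratio in Celsius:   ", ratio_c)
print("Temperature ratio in Fahrenheit:", round(ratio_f, 2))
print("Weight ratio in kilograms:      ", ratio_kg)
print("Weight ratio in pounds:         ", round(ratio_lb, 2))
```

The temperature ratio changes when the unit changes (2.0 in Celsius, 1.36 in Fahrenheit), so a statement such as "twice as hot" is not meaningful on an interval scale. The weight ratio stays at 2 in both units because the scale has a true zero.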
REGRESSION ANALYSIS
• In our previous discussions, we looked at how to establish the strength and nature of
the linear relationship between two variables by conducting a correlation
analysis.
• In a Public Health study, it may be necessary to develop an equation or model so that we
can predict the values of the dependent variable from a set of values of the independent
variable.
• The process of developing this statistical equation or model is known as regression analysis.
• Regression analysis thus explains the variability in the dependent variable by means of one or
more independent or control variables.
• To perform regression analysis, the following data requirements must be met:
i. Two or more variables must be measured, one of which must be the dependent variable.
ii. The dependent variable must be measured on an interval or ratio scale.
iii. If independent variables are qualitative, i.e. measured on a nominal or ordinal scale (e.g. brand choice), then
they must be coded with care so that the results of the analysis can be interpreted. For example, it
may be necessary to create dummy variables that take the values 0 and 1.
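Dummy coding can be sketched in a few lines. In the example below, the function `make_dummies`, the variable `brand_choice` and its category labels are all hypothetical. It assumes the common convention of leaving out one baseline category, so that a variable with k categories yields k - 1 dummy columns.

```python
# Minimal sketch of dummy (0/1) coding for a qualitative independent variable.
# The function name, variable name and category labels are hypothetical.

def make_dummies(values, baseline):
    """Return one 0/1 column per category, leaving out the baseline category."""
    categories = sorted(set(values) - {baseline})
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }

# Hypothetical survey answers on preferred brand (a nominal variable).
brand_choice = ["A", "B", "C", "A", "B"]

dummies = make_dummies(brand_choice, baseline="A")
for category, column in dummies.items():
    print(f"brand_{category}:", column)
```

Each respondent who chose the baseline brand "A" has 0 in every dummy column, so the coefficients of the other dummies are read as differences from brand "A".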
Types of regression analysis
i. Simple Linear Regression: simple regression establishes the relationship between two
variables. Linear regression is graphically depicted using a straight line, with the slope
defining how a change in one variable affects the other. The y-intercept of a
linear regression relationship represents the value of the dependent variable when the value of the
independent variable is 0.
ii. Multiple Regression: For complex connections between data, the relationship might be
explained by more than one variable. In this case, an analyst uses multiple regression, which
attempts to explain a dependent variable using more than one independent variable.
iii. Logistic Regression: Values of some dependent variables may be one of two or more
outcomes (e.g. positive or negative, yes or no). In this case, logistic regression analysis is
performed.
Linear Regression
• In linear regression, the relationship between the dependent and independent
variables is a linear function.
• The straight line that best fits the data is of interest in establishing the linear
relationship between the variables.
• Consider a simple linear regression model: Y = a + bX
• where Y is termed the dependent or study variable and X is termed the
independent or explanatory variable.
• The terms “a” and “b” are the parameters of the model. The parameter “a” is
termed the intercept, and the parameter “b” is termed the slope
parameter.
• “a” and “b” are usually called the regression coefficients.
• “b” indicates the change in “y” for a corresponding unit change in “x”.
• As with correlation coefficients, when “b” is positive, it means a positive relationship between “x” and “y”.
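The slides do not state how “a” and “b” are calculated. The expressions below are the standard least-squares formulas from column sums, written on the assumption that the model has the form Y = a + bx and that n is the number of observations. They reproduce the Numerator and Denominator rows shown in the worked examples of this lesson.

```latex
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
a = \frac{\sum y \sum x^{2} - \sum x \sum xy}{n\sum x^{2} - \left(\sum x\right)^{2}}
```

An equivalent way to obtain the intercept is a = (mean of y) - b × (mean of x).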
Example 1
• A researcher wants to predict the life expectancy of Ghana given the country's death rate. The table below
shows the death rate (per 1,000 population) and the life expectancy (in years) of Ghana from 2012 to 2018.
i. Develop and interpret a regression model for the researcher.
ii. Due to the COVID-19 pandemic, the death rate in Ghana in 2020 increased by 40% from the 2018 figure.
Predict the life expectancy of Ghana in 2020 (assuming that the death rate is the only predictor of life expectancy).

Year    Death rate    Life Expectancy
2012    8             62
2013    8             62
2014    8             62
2015    8             63
2016    8             63
2017    7             63
2018    7             64
Solution (Excel)
Year    Death rate (x)    Life Expectancy (y)    x^2    y^2      xy
2012    8                 62                     64     3844     496
2013    8                 62                     64     3844     496
2014    8                 62                     64     3844     496
2015    8                 63                     64     3969     504
2016    8                 63                     64     3969     504
2017    7                 63                     49     3969     441
2018    7                 64                     49     4096     448
Sum     54                439                    418    27535    3385
n       7

               Intercept (a)    Slope (b)
Numerator      712.00           -11.00
Denominator    10.00            10.00
Value          71.20            -1.10
• Task one
• Model: Y = -1.1x + 71.2
• Interpretation:
• Life expectancy decreases by 1.1 years for every one-unit increase in the death rate (one additional death per
1,000 population). Equivalently, life expectancy increases by 1.1 years for every one-unit decrease in the death rate.
• If no deaths are recorded (x = 0), the predicted life expectancy is 71.2 years, assuming the death rate is the only
determinant of life expectancy. However, the death rate is not the only determinant of life expectancy, so
other variables also need to be investigated.
• Task two
• In 2020, the death rate increased by 40% from the 2018 figure (7). That means the death rate in 2020
was 7 × 1.4 = 9.8.
• Thus the regression model for 2020 is expressed as
Y = -1.1(9.8) + 71.2
Y = 60.42
Thus, due to COVID-19, the life expectancy in Ghana in 2020 is predicted to be about 60 years.
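The Excel calculation for Example 1 can be cross-checked with a short script. This is a minimal sketch that assumes the standard least-squares formulas from column sums; the variable names are chosen for illustration.

```python
# Sketch: least-squares fit for Example 1 using column sums (pure Python).

death_rate = [8, 8, 8, 8, 8, 7, 7]               # x, deaths per 1,000 population
life_expectancy = [62, 62, 62, 63, 63, 63, 64]   # y, years

n = len(death_rate)
sum_x = sum(death_rate)
sum_y = sum(life_expectancy)
sum_x2 = sum(x * x for x in death_rate)
sum_xy = sum(x * y for x, y in zip(death_rate, life_expectancy))

# Shared denominator of the slope and intercept formulas.
denominator = n * sum_x2 - sum_x ** 2
slope = (n * sum_xy - sum_x * sum_y) / denominator
intercept = (sum_y * sum_x2 - sum_x * sum_xy) / denominator

# Task two: death rate 40% above the 2018 figure (the last value in the list).
death_rate_2020 = death_rate[-1] * 1.4
predicted_2020 = intercept + slope * death_rate_2020

print(f"slope b = {slope:.2f}, intercept a = {intercept:.2f}")
print(f"predicted life expectancy for x = {death_rate_2020:.1f}: {predicted_2020:.2f}")
```

The script arrives at the same slope (-1.10), intercept (71.20) and 2020 prediction (60.42) as the table.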
Example 2
• A researcher is interested in finding out how the per capita health expenditure of Country A influences
the life expectancy of that country. He collected the data presented in the table below.

Year    Per Capita Health Expenditure (US Dollars)    Life Expectancy (in years)
2010    75                                            50
2011    76                                            52
2012    80                                            55
2013    85                                            57
2014    88                                            64
2015    90                                            69
2016    96                                            74
2017    100                                           84
2018    115                                           72
2019    120                                           80
2020    125                                           82

i. Using a regression model, present a brief report for the researcher.
ii. A presidential candidate is campaigning on increasing the life expectancy in the country to 100 years by the
year 2024. Assuming per capita expenditure is the only predictor of life expectancy, what evidence can be
provided to show that the politician will be able to achieve his target?
Answer (Excel)
Year    Per Capita Health Expenditure (US Dollars)    Life Expectancy (in years)    x^2       y^2      xy
2010    75                                            50                            5625      2500     3750
2011    76                                            52                            5776      2704     3952
2012    80                                            55                            6400      3025     4400
2013    85                                            57                            7225      3249     4845
2014    88                                            64                            7744      4096     5632
2015    90                                            69                            8100      4761     6210
2016    96                                            74                            9216      5476     7104
2017    100                                           84                            10000     7056     8400
2018    115                                           72                            13225     5184     8280
2019    120                                           80                            14400     6400     9600
2020    125                                           82                            15625     6724     10250
Sum     1050                                          739                           103336    51175    72423
n       11

               Intercept (a)    Slope (b)
Numerator      321154           20703.00
Denominator    34196            34196.00
Value          9.39             0.61
• Task one
• Model: Y = 0.61x + 9.39
• Interpretation:
• Life expectancy increases by 0.61 years for every one-dollar increase in per capita health expenditure.
• Task two
• The politician is campaigning on increasing life expectancy to 100 years by 2024.
• Given the model Y = 0.61x + 9.39, we have to find the value of “x” when “y” is 100.
• 100 = 0.61x + 9.39
• x = (100 - 9.39) / 0.61 = 148.54
• Thus, for us to believe that the politician can achieve his promise, he should convince us that he can ensure a
per capita expenditure of at least about $148.54 in any year from 2021 to 2024.
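Solving the fitted line for “x” at a target “y” can be sketched as follows. The script fits Example 2 with the mean-deviation form of least squares, which is algebraically equivalent to the column-sum formulas. The helper `required_x` is a hypothetical name introduced for this illustration.

```python
# Sketch: fit Example 2 with the mean-deviation form, then solve for x at a target y.

expenditure = [75, 76, 80, 85, 88, 90, 96, 100, 115, 120, 125]   # x, US dollars
life_expectancy = [50, 52, 55, 57, 64, 69, 74, 84, 72, 80, 82]   # y, years

n = len(expenditure)
mean_x = sum(expenditure) / n
mean_y = sum(life_expectancy) / n

# Slope = sum of cross-deviations / sum of squared x-deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expenditure, life_expectancy))
s_xx = sum((x - mean_x) ** 2 for x in expenditure)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def required_x(target_y, a, b):
    """Invert Y = a + b*x to find the x that gives target_y (hypothetical helper)."""
    return (target_y - a) / b

target = 100
# Coefficients rounded to two decimals, as in the worked solution.
x_rounded = required_x(target, round(intercept, 2), round(slope, 2))
# Coefficients at full floating-point precision.
x_full = required_x(target, intercept, slope)

print(f"slope b = {slope:.4f}, intercept a = {intercept:.4f}")
print(f"required expenditure (rounded coefficients): {x_rounded:.2f}")
print(f"required expenditure (full precision):       {x_full:.2f}")
```

With the rounded coefficients the script gives 148.54, matching the worked solution; with unrounded coefficients the figure is slightly higher, at roughly 149.66. Both values lie above the largest expenditure in the data ($125), so the estimate is an extrapolation beyond the observed range.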
Example 3
• In efforts to control the outbreak of malaria in the Volta Region of Ghana, the Ministry of Health launched the
Operation Distribution of Insecticide Mosquito Nets policy (2015-2021). Under the policy, the ministry set out to carry out
a mass distribution of the nets to the citizens of the region. After the successful completion of the program, the
ministry wants to evaluate the impact of the intervention on the annual number of new cases of malaria
registered in the region. The data are shown in the table below.
Year    Number of Insecticide Treated Nets distributed (in thousands) (x)    New cases of malaria (in thousands) (y)
2015    150                                                                  8
2016    200                                                                  7
2017    100                                                                  11
2018    80                                                                   15
2019    120                                                                  9
2020    90                                                                   12
2021    300                                                                  4

i. Using a regression model, write a brief report on the evaluation of the intervention.
ii. With the help of the World Bank, the ministry is set to distribute 350,000 nets in 2022. Predict the number of
new cases of malaria in 2022.
Answer (Excel)
Year    Number of Insecticide Treated Nets distributed (in thousands)    New cases of malaria (in thousands)    x^2       y^2    xy
2015    150                                                              8                                      22500     64     1200
2016    200                                                              7                                      40000     49     1400
2017    100                                                              11                                     10000     121    1100
2018    80                                                               15                                     6400      225    1200
2019    120                                                              9                                      14400     81     1080
2020    90                                                               12                                     8100      144    1080
2021    300                                                              4                                      90000     16     1200
Sum     1040                                                             66                                     191400    700    8260
n       7

               Intercept (a)    Slope (b)
Numerator      4042000          -10820.00
Denominator    258200           258200.00
Value          15.65            -0.04

• Task one
• Model: Y = -0.04x + 15.65
• Interpretation:
• The number of new cases of malaria decreases by 0.04 thousand (about 40 cases) for every additional thousand
insecticide-treated nets (ITNs) distributed. Equivalently, the number of new cases increases by 0.04 thousand for
every thousand fewer ITNs distributed.
• Task two
• In 2022, 350,000 ITNs will be distributed, so x = 350 (in thousands).
• Thus the regression model for 2022 is expressed as
Y = -0.04(350) + 15.65
Y = 1.65
Thus, in 2022, the number of new cases of malaria in the Volta Region of Ghana is predicted to be 1,650.
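Because the slope in Example 3 is small, the 2022 prediction is sensitive to how the coefficients are rounded. The sketch below refits the data and compares the two cases. The function `fit_line` is a hypothetical helper that applies the standard column-sum formulas.

```python
# Sketch: Example 3 fit and 2022 prediction, comparing rounded and unrounded coefficients.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return intercept, slope

nets = [150, 200, 100, 80, 120, 90, 300]   # x, nets distributed (thousands)
cases = [8, 7, 11, 15, 9, 12, 4]           # y, new malaria cases (thousands)

intercept, slope = fit_line(nets, cases)
nets_2022 = 350  # thousands of nets planned for 2022

# Prediction with coefficients rounded to two decimals, as in the worked solution.
pred_rounded = round(intercept, 2) + round(slope, 2) * nets_2022
# Prediction with the unrounded coefficients.
pred_full = intercept + slope * nets_2022

print(f"slope b = {slope:.4f}, intercept a = {intercept:.4f}")
print(f"prediction with rounded coefficients: {pred_rounded:.2f} thousand cases")
print(f"prediction with full precision:       {pred_full:.2f} thousand cases")
```

The rounded coefficients give 1.65 thousand cases, as in the worked solution. Keeping the slope at about -0.0419 instead of -0.04 lowers the prediction to roughly 0.99 thousand, so it may be worth carrying more decimal places when “x” is large.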
End of Lessons

Questions??
