Forecasting Mortality Trends in The 1980 Cohort - A Comparative Analysis of Cairns Blake Dowd (CBD) and Lee-Carter Models

Predicting Mortality Trends in the 1980 Kenyan Cohort: A Comparative Analysis of
Cairns-Blake-Dowd (CBD) and Renshaw-Haberman Models

A research proposal submitted in partial fulfillment of the requirements of the degree of
Bachelor of Science in Actuarial Science at the Jomo Kenyatta University of Agriculture and
Technology.

2023
DECLARATION
This research is our original work and has not been presented for a degree award in any
other university.

NAME REGISTRATION NUMBER SIGNATURE

IAN KARARI SCM221-0013/2020 .……………...

EMILY NGAHU SCM221-0004/2020 ……………….

BRIDGES NJIRU SCM221-0051/2020 ………………

GRACE MICHELLE SCM221-0005/2020 ………………

MARYANNE MUTHONI SCM221-0699/2020 ………………

The research proposal has been submitted for examination with my approval as a
university supervisor.

Signature………………………………Date…………………………

Dr. Matabel Odin

Department of Statistics and Actuarial Sciences, JKUAT.


ACKNOWLEDGEMENT

We primarily thank the Almighty God for His grace throughout our research work and
for enabling us to reach this stage in our career.

It would not have been possible to do this project proposal without the support of our
families. Their care, understanding, prayers, and continued support have enabled us to
reach this far and we are forever grateful. It has also been a great honor and privilege to
undergo training at this prestigious university.

We are also highly indebted to Dr. Matabel Odin for her guidance and constant
supervision throughout the research project proposal process.

LIST OF ABBREVIATIONS

LC: Lee-Carter

CBD: Cairns-Blake-Dowd

RH: Renshaw-Haberman

KNBS: Kenya National Bureau of Statistics

KDHS: Kenya Demographic and Health Survey

SVD: Singular Value Decomposition

OLS: Ordinary Least Squares

AKI: Association of Kenya Insurers

RMSE: Root Mean Squared Error

MAE: Mean Absolute Error


Table of Contents

DECLARATION
ACKNOWLEDGEMENT
LIST OF ABBREVIATIONS
Table of Contents
CHAPTER ONE
INTRODUCTION
1.1 Background of the study
1.2 Statement of the Problem
1.3 Objectives
1.4 Significance of the study
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
2.2 Theoretical review
2.2.1 Overall Mortality In Kenya
2.2.2 Causes of mortality
2.2.3 Mortality Prediction
2.2.4 History of Life Tables
2.3 Research gap
CHAPTER THREE
METHODOLOGY
3.1 Summary
3.2 Modelling Framework/Preliminaries
3.2.1 Mortality Trends
3.2.2 Model Selection and Justification
3.2.3 Data Collection
3.2.4 Software and Tools
3.3 Objective 1: To estimate mortality rates using the CBD and Renshaw-Haberman models from
their current age in the year 2022 up to age 80
3.3.1 Renshaw-Haberman Model
3.3.2 Cairns-Blake-Dowd Model
3.3.3 Assumptions on mortality
3.3.4 Model Validation
3.4 Objective 2: To construct cohort life tables, using the Renshaw-Haberman and CBD model
3.4.1 Assumptions of our Life Table
3.5 Objective 3: To compare the mortality trends based on the two models and model performance
3.5.1 Comparison of mortality trends
3.5.2 Performance evaluation of the models
WORK PLAN
BUDGET

CHAPTER ONE

INTRODUCTION

1.1 Background of the study

Mortality trends refer to the patterns and changes observed in the central death rate, life
expectancy, cumulative death rate, survivorship function, total number of years lived, and
probability of death within a population over a specific period of time. Studying these trends is
useful in evaluating the accuracy of different mortality prediction models. Over the years,
various models have been developed to project future changes in mortality rates. The most
well-known of these are the Cairns-Blake-Dowd (CBD) and Lee-Carter (LC) models, which
researchers have modified and adjusted to account for various effects on mortality. One such
modification of the Lee-Carter model adds a cohort effect, giving the Renshaw-Haberman model.
Similarly, the Cairns-Blake-Dowd model has undergone various modifications, one being the
addition of a cohort effect to the original CBD model to give the CBD (2) model.

According to Withers (2009), a cohort is a set of individuals grouped according to a
shared trait, which in most cases is age; in other words, a birth cohort. Cohort analysis follows
a specific group of people born in a specific year and tracks how their behavior changes over
time. The data from cohort analysis can be used to generate cohort life tables. The primary
reason for studying cohort mortality is to understand how certain external and internal factors
affect cohort mortality (Thelle & Laake, 2015). Such studies give a preview of how similar
internal and external factors may affect the mortality of the demographic group as a whole.

In the construction of life tables, the period and cohort approaches are used to analyze
and understand mortality. Period life tables examine demographic data within specific intervals
or periods, while cohort life tables analyze data for groups of individuals born during the same
time period and follow these cohorts throughout their lifetimes. Life tables make it possible to
observe mortality patterns, obtain life expectancy for different cohorts, and obtain survival
probabilities and population projections, which are important for summarizing mortality data
(Glenn, 1977).

The study seeks to predict mortality rates in the Kenyan 1980 cohort and to perform a
comparative analysis of the Renshaw-Haberman and Cairns-Blake-Dowd models. These two models
were chosen because both are widely recognized, and their widespread adoption in academic
research and practical applications makes them relevant for comparative evaluation.

The 1980 cohort is of interest because adequate data are available on it, and accurate
predictions require a sufficiently large dataset. Choosing this cohort gives us a substantial
portion of the data, at least 50%, and hence more robust and reliable predictions. Mortality
rates increased from the 1980s due to the HIV and AIDS epidemic, which adds interest to the
mortality trends of this cohort as it ages. Health insurance uptake is highest among Kenyans aged
45 and above, due to the age-related disease burden. Data published by the Kenya National Bureau
of Statistics (KNBS) show that 34 percent of Kenyans aged 45 and above have some sort of health
coverage, the highest percentage in any age group. In a few years, the 1980 cohort will fall
within this age group, making its mortality rates a point of interest, especially to life offices.

1.2 Statement of The Problem

Accurate mortality rate predictions are pivotal for guiding decisions in life insurance. It is
widely accepted in actuarial science that, for the pricing of and reserving for annuity and pension
products, we need to understand the trends in mortality rates over time so that the underlying
changes can be accurately modeled and projected into the future. These trends have been
predominantly downward for the populations of many countries in recent years. A failure to
account for these downward trends would mean that the premiums and reserves for annuity and
pension products would be understated with potentially disastrous consequences for the financial
institutions involved.
The existing divergence in approaches and assumptions among widely used models for
mortality predictions, such as the Renshaw-Haberman and Cairns-Blake-Dowd (CBD) models,
raises the debate as to which model gives predictions closer to actual mortality. This necessitates
a thorough examination of their performance. By systematically evaluating and comparing these
models, this research seeks to enhance the precision of mortality predictions and assess the
performance of these two models in predicting mortality rates.

1.3 Objectives

Main

To predict and compare mortality trends in the 1980 Kenyan cohort using the
Cairns-Blake-Dowd (CBD) and Renshaw-Haberman models.

Sub objectives

1. To estimate mortality rates using the CBD and Renshaw-Haberman models from their
current age in the year 2022 up to age 80.
2. To construct cohort life tables, using the mortality rates from the Renshaw-Haberman and
CBD model.
3. To compare the mortality trends based on the two models and model performance.

1.4 Significance of the study

This study holds great significance within the fields of actuarial science and
demographic research. Its outcomes will not only assist professionals in actuarial science,
providing them with more reliable tools for risk assessment, but will also offer valuable
insights to researchers, policymakers, and academics. For pension schemes, predicting mortality
rates makes it possible to estimate how long retirees are likely to live and, consequently, how
long they will be drawing pension benefits. This information is essential for determining the
amount of money needed to fund the pension obligations.

In life insurance, mortality rates are used in the management of longevity risk, which is
the risk of policyholders living longer than expected, resulting in increased payout amounts.
Accurate predictions will allow for better reserve calculation, ensuring the insurer's financial
stability. This study will contribute to the continuous improvement of mortality prediction
methodologies, thereby shaping the landscape of actuarial practice and demographic analysis. It
also contributes to the body of knowledge and is a stepping stone for more research on the
prediction of mortality.
CHAPTER TWO

LITERATURE REVIEW

2.1 Introduction

This literature review explores the history and developments in mortality, mortality
projection, and life tables. It examines various studies, methodologies, and models employed in
the field of mortality projections. The findings contribute to a comprehensive understanding of
the theoretical and empirical foundations of mortality prediction techniques and identify
research gaps from past studies.

2.2 Theoretical review

2.2.1 Overall Mortality In Kenya

According to the Kenya National Bureau of Statistics (2003), 14% of women and 18% of
men were likely to die between exact ages 15 and 50. The maternal mortality ratio was 362
maternal deaths per 100,000 live births for the seven-year period preceding the survey.
Comparing these results with those of the previous Kenya Demographic and Health Survey (KDHS)
report of 1998, where the maternal mortality ratio was 520 maternal deaths per 100,000 live
births, the study found that the differential was not large enough to conclude whether or not
there had been any change between the two surveys. In the same report, the estimated level of
adult mortality was slightly higher among men (4.78 deaths per 1,000 population) than among
women (3.72 deaths per 1,000 population).

A comparison of the 2003 KDHS and the 1998 KDHS rates indicated a substantial
increase in adult mortality rates for both males and females at all ages, with the exception of men
aged 15-19. The summary measure of mortality for the age group 15-49 showed an increase of
about 40% in female mortality rates and about 30% in male mortality rates from the 1998 KDHS
rates (Kenya National Bureau of Statistics, 2003). The overall mortality rates derived from the
2003 KDHS data are higher among females than males (6.6 and 6.2 deaths per 1,000 years of
exposure, respectively), which is unusual since male mortality typically exceeds female mortality
during these ages. It was noted that AIDS had become a significant cause of death and that its
emergence had altered the age and sex pattern of mortality.

According to the 2014 KDHS, infant mortality was reported to be 77 deaths per 1,000 live
births and under-five mortality was 115 deaths per 1,000 live births. Compared with the previous
report (2003), where the maternal mortality ratio was 362 maternal deaths per 100,000 live
births, the report indicated that the mortality rate had decreased. The population of Kenya
increased from 10.9 million in 1969 to 28.7 million in 1999 (Central Bureau of Statistics, 1994,
2001). The results of the previous censuses indicated that the population growth rate was 2.9
percent per annum during the 1989-1999 period, down from 3.4 percent reported for both the
1969-1979 and 1979-1989 intercensal periods (Kenya National Bureau of Statistics, n.d.). The
decline in population growth was a realization of the efforts contained in the National
Population Policy for Sustainable Development (National Council for Population and Development,
2000) and was a result of the decline in fertility rates since the mid-1980s. In contrast,
mortality rates have risen since the 1980s, presumably due to increased deaths from the HIV/AIDS
epidemic, deterioration of health services, and widespread poverty (National Council for
Population and Development, 2000).

According to the World Health Organization, the adult mortality rate is the probability
of dying between ages 15 and 60, expressed per 100 population, both sexes combined. In 2020, the
adult mortality rate in Kenya was 42.15, which means that out of 100 people who were alive at age
15, 42.15 would die before reaching age 60 (World Health Organization, 2022). This is an increase
from 2015, when the adult mortality rate was 41.91. The adult mortality rate is one of the
indicators of the health status of a population, as it reflects mortality due to both
communicable and non-communicable diseases, as well as injuries and violence.

Also, according to the World Health Organization (2022), life expectancy at birth in
Kenya has improved by 12.2 years, from 53.9 years in 2000 to 66.1 years in 2019. However,
Kenya still faces many health challenges, such as HIV/AIDS, tuberculosis, malaria, lower
respiratory infections, and maternal and neonatal conditions. These are some of the leading
causes of death for both males and females in Kenya.

UNICEF data shows that the under-five mortality rate in Kenya has declined from 108.8
deaths per 1,000 live births in 2000 to 40.9 deaths per 1,000 live births in 2020. However, this is
still higher than the global average of 36.4 deaths per 1,000 live births in 2020.

In 2020, at the height of the COVID-19 pandemic, Kenya's mortality rates increased
significantly. Worldometer reported that Kenya had recorded 344,070 confirmed cases and
5,689 deaths due to COVID-19 as of November 15, 2023. The country had administered
22,713,776 doses of COVID-19 vaccines, covering 21.1% of the population.

2.2.2 Causes of mortality

According to a 2022 study published in PLOS ONE, mortality from infectious diseases,
especially HIV/AIDS, was high in Kisumu County, but there was a shift toward higher mortality
from noncommunicable diseases, possibly reflecting an epidemiologic transition and improving HIV
outcomes. The epidemiologic transition suggests the need for increased focus on controlling
noncommunicable conditions despite the high communicable disease burden.

According to the WHO 2022 report, the main causes of death for males in Kenya in 2019
were neonatal conditions, tuberculosis, lower respiratory infections, HIV/AIDS, and road injury.
The main causes of death for females in Kenya in 2019 were lower respiratory infections,
HIV/AIDS, neonatal conditions, diarrhoeal diseases, and stroke. The life expectancy in Kenya in
2016 was 69.0 for females and 64.7 for males. This has been an improvement from the year 1990
when the life expectancy was 62.6 and 59.0 respectively. However, Kenya still faces many health
challenges, such as high maternal and child mortality, high burden of infectious diseases, and
low access to quality health services.

2.2.3 Mortality Prediction

Mortality prediction has been an area of much interest, evolving significantly over time.
The roots of mortality modeling trace back to the early 18th century, characterized by subjective
models that relied on expert opinions rather than data-driven approaches. De Moivre (1725)
proposed constructing life tables from mortality datasets using linear survival functions, marking
an early attempt to formalize mortality analysis. This was followed by Gompertz's (1825)
demonstration that mortality follows an exponential pattern across all ages, highlighting the
acceleration of death rates with an increase in age. This foundational insight laid the groundwork
for subsequent studies in mortality modeling. The works of Brass (1971) and Wilmoth (1990)
further refined models using logistic and logarithmic transforms to ensure positive mortality
rates.

Lee and Carter's groundbreaking work (1992) marked a paradigm shift in mortality
prediction, pioneering the introduction of the influential Lee-Carter model. This two-factor
model became a cornerstone in demographic literature and sparked the development of various
extensions and adaptations to address its limitations (Deaton and Paxson 2004).

Recognizing the limitations of the Lee-Carter model, researchers sought to refine its
assumptions and broaden its applicability. Brouhns et al. (2002) and Giacometti et al. (2009)
explored alternative distributions, like the Poisson and generalized hyperbolic distributions, to
mitigate the normality assumption in the Lee-Carter random component. Additionally, Mitchell
et al. (2013) proposed the Mitchell-Brockett-Mendoza-Muthuraman (MBMM) model by
applying parameterization to the detrended natural logarithm of mortality rates.

Lee and Carter (1992) proposed the earliest and best-known discrete-time stochastic
factor model, postulating that the true underlying death rate is
$m_{x,t} = -\log(1 - q_{x,t})$. The model implies that longevity risk is not affected by cohort
effects, i.e. changes in age-specific demographic parameters and lower infant mortality rates.
The Renshaw and Haberman (2006) model extended the Lee-Carter model to capture effects
attributable to the year of birth $(t - x)$. The Lee-Carter model has been widely used in both
demographic and actuarial applications for two reasons. First, it has produced satisfactory fits
and forecasts of mortality rates for different nations; for instance, it has been applied in
Japan, Austria, Australia, Belgium, and the Nordic countries. Second, its structure permits the
construction of confidence intervals for mortality projections. Despite its reasonable
performance, the LC model has a few limitations (Lee 2000) that drew criticism. As a result, new
stochastic models were developed, the most notable being those of Renshaw and Haberman (2006)
and Cairns et al. (2006, 2007, and 2008).
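
The relation between the central death rate and the one-year death probability quoted above can
be checked numerically. The following is a minimal sketch in Python; the function names and the
example probability are hypothetical and chosen only for illustration.

```python
import math


def central_rate_from_q(q: float) -> float:
    """Central death rate m implied by a one-year death probability q: m = -log(1 - q)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    # log1p(-q) evaluates log(1 - q) accurately when q is small.
    return -math.log1p(-q)


def q_from_central_rate(m: float) -> float:
    """Inverse relation: q = 1 - exp(-m)."""
    if m < 0.0:
        raise ValueError("m must be non-negative")
    # expm1(-m) evaluates exp(-m) - 1 accurately when m is small.
    return -math.expm1(-m)


# Hypothetical one-year death probability, used only to illustrate the conversion.
q_example = 0.01
m_example = central_rate_from_q(q_example)
q_roundtrip = q_from_central_rate(m_example)
```

For small probabilities the two quantities are close, with the central rate slightly larger than
the death probability.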

More significant advancements revisited the parameterization of the Lee-Carter model.
Renshaw and Haberman (2006) introduced the Age-Period-Cohort (APC) model, also known as
the Renshaw-Haberman model, by incorporating a cohort effect, whereas Cairns et al. (2006,
2007) elaborated on the implementation of a Generalized Linear Model (GLM) for mortality odds
ratios, resulting in the well-known Cairns-Blake-Dowd (CBD) model. The CBD model is a
two-factor model in which each of the two period parameters follows a random walk with drift.
The rate of drift is constant, and the changes in the two parameters are correlated. The CBD
model describes the logit of the initial mortality rate with an intercept term and a slope term,
allowing for the number of deaths to follow a Poisson distribution. Future stochastic simulations
are then obtained by projecting these two terms as correlated random walks.
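
The structure described above can be sketched in a few lines of Python. The sketch assumes the
standard CBD form logit(q) = k1 + k2 * (x - mean age) and, as a simplification, fits each
calendar year by ordinary least squares on the logit scale rather than by a Poisson GLM. It
projects only the central path of the random walk with drift, without the correlated noise. All
function names and the synthetic data are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_cbd_year(ages, q_values):
    """OLS fit of logit(q_x) = k1 + k2 * (x - mean_age) for one calendar year."""
    mean_age = sum(ages) / len(ages)
    centred = [x - mean_age for x in ages]
    y = [logit(q) for q in q_values]
    # With a centred regressor the OLS intercept is simply the mean of y.
    k1 = sum(y) / len(y)
    k2 = sum(c * v for c, v in zip(centred, y)) / sum(c * c for c in centred)
    return k1, k2


def drift(series):
    """Estimated drift of a random walk: the mean of the successive differences."""
    return (series[-1] - series[0]) / (len(series) - 1)


def project_cbd(k1_hist, k2_hist, horizon):
    """Central (noise-free) projection of both period indices `horizon` years ahead."""
    return (k1_hist[-1] + horizon * drift(k1_hist),
            k2_hist[-1] + horizon * drift(k2_hist))


# Synthetic illustration only: three hypothetical years of rates for ages 60 to 89.
AGES = list(range(60, 90))
MEAN_AGE = sum(AGES) / len(AGES)
TRUE_K1 = [-3.00, -3.02, -3.04]
TRUE_K2 = [0.100, 0.101, 0.102]

fitted = [
    fit_cbd_year(AGES, [inv_logit(a + b * (x - MEAN_AGE)) for x in AGES])
    for a, b in zip(TRUE_K1, TRUE_K2)
]
k1_hist = [f[0] for f in fitted]
k2_hist = [f[1] for f in fitted]
k1_next, k2_next = project_cbd(k1_hist, k2_hist, horizon=1)
```

Because the synthetic rates are generated exactly from the assumed form, the fit recovers the
period indices used to build them; real data would leave residuals around the fitted line.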

A study by Cairns et al. (2009) used formal methods of model selection to rank eight
mortality models on data for males in England and Wales from 1961 to 2004 and males in the
United States from 1968 to 2003. The study focused on males at the higher ages of 60 to 89, since
it was interested in the mortality rates of pensioners, which had been declining over the years.
To achieve this, Cairns et al. (2009) used various model selection methods, such as the Bayesian
Information Criterion (BIC), robustness of parameter estimates, standardized residuals, and the
comparison of nested models. From this, Cairns et al. (2009) concluded that no specific model
stands out as being better than the others. However, they observed that different models have
different strengths in the way they project mortality.

Among these strengths, the Renshaw-Haberman model allows for great flexibility in the
age effects, while the one-dimensional P-splines method allows for smoothing of the age effects
if their roughness is seen as a drawback in estimating mortality. Similarly, the CBD model and
its extensions were also seen to allow the smoothing of age effects but, in contrast to the
Renshaw-Haberman model, they allow for richer period effects. Based on all these strengths,
Cairns et al. (2009) concluded that, according to the BIC, the best model for the England and
Wales data would be an extension of the CBD model that assumes that the impact of the cohort
effect for any specific cohort diminishes over time instead of remaining constant. For the United
States dataset, the preferred model would be the Renshaw-Haberman model. However, based on the
robustness of parameter estimates, a generalization of the CBD model that adds a quadratic term
to the age effect was seen as the best fit for both datasets.
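
To illustrate how a BIC ranking trades goodness of fit against the number of parameters, here is
a minimal Python sketch using the conventional definition, BIC = k ln(n) - 2 ln(L), where smaller
is better. Sign and scaling conventions differ between authors, so only the ranking matters. The
log-likelihoods, parameter counts, and number of observations below are hypothetical and are not
taken from Cairns et al. (2009).

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Conventional BIC: n_params * ln(n_obs) - 2 * log_likelihood (smaller is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fit summaries for two candidate mortality models.
candidates = {
    "model_a": {"log_likelihood": -8900.0, "n_params": 102},
    "model_b": {"log_likelihood": -8920.0, "n_params": 88},
}
N_OBS = 1320  # hypothetical number of age-year cells

scores = {
    name: bic(c["log_likelihood"], c["n_params"], N_OBS)
    for name, c in candidates.items()
}
best = min(scores, key=scores.get)
```

In this made-up case the model with the higher likelihood is not preferred, because its extra
parameters cost more in penalty than they gain in fit.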

2.2.4 History of Life Tables

The evolution of life tables spans centuries, beginning with John Graunt's work on age
patterns in populations in his “Natural and Political Observations upon Bills of Mortality”. From
Graunt's conceptualizations, pioneers like Edmond Halley and Dr. Richard Price (the Northampton
table) contributed to the development of the first life tables. These early milestones laid the
foundation for subsequent advancements, including Milne's Carlisle table and Dr. Farr's English
Life Tables, constructed using census data.

Globally, life expectancy surged in the 20th century, particularly with the advent of
universal healthcare. However, Kenya's trajectory was marred by the HIV/AIDS epidemic,
causing a significant decline in life expectancy from the late 1980s until the early 2010s. Despite
this setback, the country managed to recover to pre-epidemic levels by the 2010s. Even with the
COVID-19 pandemic, life expectancy is still higher than in previous years (O'Neill, 2022).

Historically, Kenya relied on British mortality tables that failed to accurately
represent its population. The turning point came with the development of the KE 2001-2003
Tables, marking Kenya's shift from outdated British tables to its own mortality table. This
pivotal moment showcased Kenya on the global stage for taking charge of its mortality data.
Subsequent efforts, such as the KE 2007-2010 mortality tables, were based on meticulous data
collected by the Association of Kenya Insurers (AKI). This investigation focused specifically on
understanding the mortality patterns among assured lives in Kenya, and the tables are already in
use.

Collaborative efforts involving industry players, research bodies, and regulatory
institutions like the Insurance Regulatory Authority (IRA) played a vital role in these
developments. The meticulous data collection and analysis not only underscore the commitment
to refining life tables but also signify a significant step forward in understanding mortality
trends within Kenya.

Other works have also contributed to the history of life tables in Kenya. Mikhala (1985)
utilized life tables to explore adult mortality differences in Kenya, affirming disparities
between the sexes. Notably, he found more robust death registration data for males than for
females in most districts. His constructed mortality tables further suggested that males
generally held higher life expectancies than females.

Kibiwott Bett (2017) employed the Lee-Carter model to generate abridged life tables
using Kenyan population data, revealing a strikingly higher infant mortality rate compared to
other age groups. Additionally, adult males exhibited higher death rates than females, while
uncertainties surrounded old age mortality due to data limitations.

Machau (2014) utilized graduation techniques to model Kenyan mortality trends,
determining that higher-order polynomials effectively graduated crude rates. Comparison with
English life tables revealed higher mortality rates in the standard tables, notably with females
exhibiting higher mortality than males.

However, challenges persist, particularly in data collection and reliability, highlighted by
gaps in birth and death registration. Based on the most recent data presented in the Kenya Vital
Statistics Report, it was noted that in 2021, roughly 14% of births and 45% of deaths were not
officially documented (CRVS - Birth, Marriage and Death Registration in Kenya, n.d.). This
implies that approximately 86% of births and 55% of deaths were properly registered. It is
important to highlight the considerable progress made in the comprehensiveness of birth and
death registration over the past half-decade in Kenya (WHO, 2022). The reliability of this
registration data is contingent on factors like the efficiency of the data collection process and the
trustworthiness of the information sources. Efforts to improve data quality and methodologies, as
well as the need for periodic reviews of these tables, are essential for enhancing accuracy and
relevance.

Looking ahead, the future promises more accurate and representative life tables for
Kenya as data availability improves and collaborative efforts continue to refine the
understanding of mortality trends within the population. This historical journey from early
conceptualizations to specific developments in Kenya reflects not just the evolution of life tables
but also the nation's resilience and commitment to capturing and comprehending its population's
mortality patterns.

2.3 Research gap

According to the Kenya Demographic and Health Survey 2022 report, little is known
about adult mortality in Kenya compared with infant and child mortality, for several reasons.
First, while early childhood mortality can be estimated through the birth history approach,
there is no equivalent in adult mortality measurement. Second, death rates are much lower at
adult ages than in childhood, and hence estimates for particular age groups can be distorted by
sampling errors. Third, there is usually very limited information about the characteristics of
those who have died. While the same can be said about data on childhood mortality, it is
reasonable to expect the characteristics of parents to directly influence their children's
chances of survival.

As a result, adult mortality in Kenya has received little to no study, with research
focusing instead on child and infant mortality. Most researchers in Kenya have used traditional
methods to predict infant mortality, and only a few studies have applied machine learning
methods. Few or no Kenyan studies have used any of the mortality prediction models. There is
therefore a need to study mortality, make predictions with various mortality prediction models,
and then compare the predictions from these methods.

CHAPTER THREE

METHODOLOGY

3.1 Summary

In this project, we aim to predict mortality rates for the 1980 cohort in Kenya using the
Renshaw-Haberman and Cairns-Blake-Dowd models. We will then create two cohort life tables, one
from the mortality rates projected by the Renshaw-Haberman model and the other from those
projected by the CBD model, and compare the life expectancies, mortality rates, and crude death
rates of the two tables. To achieve this, our methodology will involve data collection with
relevant features, preprocessing, validation of the models, prediction of mortality rates using
the models, construction of cohort life tables for males and females, and comparison of the
mortality trends.

3.2 Modelling Framework/Preliminaries

3.2.1 Mortality Trends

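$q_{x,t}$: the probability that a person aged exactly x at time t dies before reaching age x + 1.

$m_{x,t}$: the central death rate at age x in year t.

$T_x = \sum_{k \ge 0} L_{x+k}$: the total number of person-years lived beyond age x, where $L_x$
is the number of person-years lived between ages x and x + 1.

$e_x = \frac{T_x}{l_x}$: the life expectancy of a person aged x, where $l_x$ is the number of
persons alive at age x.

Cumulative death rate $= \frac{\text{Total number of deaths}}{\text{Total population}} \times 1000$:
the total number of deaths over a specified period, per 1,000 of the population.

$S_x = \frac{\text{Number of individuals surviving to at least age } x}{\text{Initial number of individuals at birth}}$:
the proportion of individuals in a population that survives to each age.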


3.2.2 Model Selection and Justification

The selection of the Renshaw-Haberman and Cairns-Blake-Dowd (CBD) models for
evaluation in mortality rate prediction is driven by several key factors. In this study, we will
use the Renshaw-Haberman (2006) model, a modification of the Lee-Carter model that takes
cohort effects into account. Both the Renshaw-Haberman and CBD models are widely recognized,
and their widespread adoption in academic research and practical applications makes them
relevant for comparative evaluation. The Renshaw-Haberman model, utilizing singular value
decomposition to capture general trends and age-specific patterns, and the CBD model,
incorporating additional components for flexibility, offer methodological diversity, enabling a
comprehensive assessment of their respective strengths and weaknesses in mortality prediction.

Additionally, both the CBD and Renshaw-Haberman models are commended for their
simplicity and robustness. Comparative analysis with the Renshaw-Haberman model allows for
an understanding of how advancements in modeling, as seen in the CBD model, contribute to
improved accuracy or other desirable attributes. The well-documented methodologies and
implementation in widely used statistical software for both the Renshaw-Haberman and CBD
models ensure accessibility, facilitating a standardized evaluation process for researchers and
practitioners. The two models also differ in structure: the Renshaw-Haberman model contains an
interaction component between age and time, whereas the CBD model has no interaction component
but includes two time-specific components.

3.2.3 Data Collection

The quality of our data will be assessed by checking for missing values, outliers, and
inconsistencies. Any limitations or biases in the data that could affect our results, such as
under-reporting of deaths or inaccuracies in age reporting, will be discussed. According to the
most recent data presented in the Kenya Vital Statistics Report, in 2021 roughly 14% of births
and 45% of deaths were not officially documented (CRVS - Birth, Marriage and Death Registration
in Kenya, n.d.). This implies that approximately 86% of births and 55% of deaths were properly
registered. Kenya has made considerable progress in the completeness of birth and death
registration over the past half-decade (WHO, 2022). The reliability of this registration data is
contingent on factors like the efficiency of the data collection process and the trustworthiness
of the information sources.

3.2.4 Software and Tools

We plan to use R and Power BI. In R, we will use different packages to clean the data, test and
validate the models, and make the predictions. We will use Power BI for data visualizations.

3.3 Objective 1: To estimate mortality rates using the CBD and Renshaw-Haberman
models from their current age in the year 2022 up to age 80.

3.3.1 Renshaw-Haberman Model

The Renshaw-Haberman model is a stochastic statistical method for forecasting mortality
rates based on historical data. It is a modification of the Lee-Carter model that takes cohort
effects into account.

Its assumptions are:

● The logarithm of the central death rate at a given age and time can be decomposed into
three components: an age-specific component, an interaction component between age and
time (which contains the time-specific component), and a cohort-specific component.
● The age-specific component captures the general shape of the mortality curve across
different ages and is assumed to be constant over time.
● The time-specific component captures the overall level of mortality and its trend over
time and is assumed to follow a random walk with drift.
● The cohort-specific component captures the effect shared by individuals born in the year
t-x, that is, those aged x in year t. People born in the same generation are assumed to
experience the same trend in mortality.
The model can be expressed as

$\ln(m_{x,t}) = \alpha_x + \beta_x k_t + \gamma_{t-x} + \varepsilon_{x,t}$ (1)

where:

● $m_{x,t}$ is the central death rate at age x and time t.

● $\alpha_x$ is the age-specific component.

● $\beta_x$ is the age-specific sensitivity to the time component.

● $k_t$ is the time-specific component.

● $\gamma_{t-x}$ is the cohort-specific component.

● $\varepsilon_{x,t}$ is the error term.

The model parameters are estimated by singular value decomposition (SVD), which gives the
rank-one approximation of the log-rate matrix that minimizes the sum of squared errors. The
model can then forecast future mortality rates by extrapolating the time-specific component and
applying the estimated age-specific and interaction components.

Singular Value Decomposition (SVD) is a matrix factorization technique used in linear
algebra and numerical analysis. It breaks down a matrix into three constituent matrices, which
exposes the dominant patterns in the data and simplifies further computation and analysis.

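To estimate the parameters of the Renshaw-Haberman model, we first transform the
age-specific mortality rates using the logarithm $\ln(m_{x,t})$. We then create a matrix M where
each row corresponds to an age group, each column corresponds to a time period, and the entries
are the transformed mortality rates centred by their age-specific average, that is,
$\ln(m_{x,t}) - \alpha_x$ with $\alpha_x$ as in (3). We apply SVD to the matrix M, given by

$M = U \Sigma V^{T}$ (2)

where:

U is the matrix of left singular vectors (one row per age).

Σ (sigma) is a diagonal matrix of singular values.

$V^{T}$ is the transpose of the matrix V of right singular vectors (V has one row per time period).

The values of $\alpha_x$, $\beta_x$ and $k_t$ are obtained as follows:

The values of $\alpha_x$ are the averages of the log rates at each age over the time periods.

The values of $\beta_x$ are related to the first column of U, because the rows of M index ages.

The values of $k_t$ are related to the first singular value and the first column of V, because the
columns of M index time periods.

$\alpha_x = \frac{1}{T} \sum_{t} \ln(m_{x,t})$ (3)

$\beta_x = \frac{U_{x,1}}{\sum_{x} U_{x,1}}$ (4)

$k_t = \Sigma_{1,1} \, V_{t,1} \sum_{x} U_{x,1}$ (5)

Here, $U_{x,1}$ and $V_{t,1}$ denote the entries of the first columns of the matrices U and V
respectively, $\Sigma_{1,1}$ is the first singular value, and T is the number of time periods. The
scaling in (4) and (5) imposes the usual Lee-Carter constraints $\sum_x \beta_x = 1$ and
$\sum_t k_t = 0$. The SVD step yields only the age and time parameters. The cohort component
$\gamma_{t-x}$ is not produced by the SVD and has to be estimated in a further step; Renshaw and
Haberman (2006) fit the full model by iterative maximum likelihood.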
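
As an illustration of the SVD-based estimation of the age and time parameters, the following is a minimal sketch in Python with NumPy (the project itself plans to use R). It takes $\alpha_x$ as the row average of the log rates, applies the SVD to the centred matrix, and rescales the leading singular vectors so that the $\beta_x$ sum to one. The function name `fit_lee_carter_svd` and the toy numbers are assumptions made for illustration only, and the cohort component is not estimated here.

```python
import numpy as np

def fit_lee_carter_svd(log_m):
    """Estimate alpha_x, beta_x and k_t from a matrix of log central death rates.

    log_m has one row per age and one column per time period.
    """
    log_m = np.asarray(log_m, dtype=float)
    alpha = log_m.mean(axis=1)               # average log rate at each age
    centred = log_m - alpha[:, None]         # matrix M passed to the SVD
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    scale = U[:, 0].sum()                    # rescaling so that beta_x sums to 1
    beta = U[:, 0] / scale                   # ages index the rows, so beta comes from U
    k = s[0] * Vt[0, :] * scale              # periods index the columns, so k comes from V
    return alpha, beta, k

# Toy data built from known parameters (illustrative values, not Kenyan data).
alpha_true = np.array([-6.0, -5.0, -4.0, -3.0])
beta_true = np.array([0.1, 0.2, 0.3, 0.4])        # sums to 1
k_true = np.array([2.0, 1.0, 0.0, -1.0, -2.0])    # sums to 0
log_m = alpha_true[:, None] + np.outer(beta_true, k_true)

alpha_hat, beta_hat, k_hat = fit_lee_carter_svd(log_m)
```

Because the toy matrix is built exactly from the model without noise, the sketch should return the parameters used to construct it. With real data the rank-one fit is only approximate.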

3.3.2 Cairns-Blake-Dowd Model

The Cairns-Blake-Dowd (CBD) model is a two-factor stochastic method.

The Cairns-Blake-Dowd (CBD) model using the logit of the mortality rate can be
expressed as follows:
$\text{logit}(q_{x,t}) = k_t^{(1)} + k_t^{(2)} (x - \bar{x}) + \gamma_{t-x}^{(3)}$ (6)

where:

● $q_{x,t}$ is the mortality rate at age x and time t.

● $k_t^{(1)}$ and $k_t^{(2)}$ are the time-specific components.

● $\bar{x}$ is the average age in the data.

● $\gamma_{t-x}^{(3)}$ is the cohort-specific component.

This model is either a log-Poisson or a log-negative-binomial version of the CBD model.


The model parameters are estimated by the ordinary least squares method. Ordinary Least
Squares (OLS) is a method used in statistical regression analysis to estimate the parameters of a
linear regression model. The goal of linear regression is to find the best-fitting straight line
through a set of data points that minimizes the sum of the squared errors. The following formulas
will be used;
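
As an illustration of the OLS step, the sketch below regresses the logit of the mortality rates on the centred age separately for each year, so that the intercept and slope give the two time-specific components. This is a minimal sketch in Python with NumPy under stated assumptions: the function name `fit_cbd_ols` and the toy values are hypothetical, the project itself plans to use R, and the cohort component of (6) is left out for simplicity.

```python
import numpy as np

def fit_cbd_ols(q, ages):
    """Fit logit(q) = k1_t + k2_t * (x - mean age) by OLS, one regression per year.

    q has one row per age and one column per year.
    """
    q = np.asarray(q, dtype=float)
    ages = np.asarray(ages, dtype=float)
    logit_q = np.log(q / (1.0 - q))
    x_centred = ages - ages.mean()
    design = np.column_stack([np.ones_like(x_centred), x_centred])
    # lstsq solves all the yearly regressions at once, one per column of logit_q
    coef, _, _, _ = np.linalg.lstsq(design, logit_q, rcond=None)
    return coef[0], coef[1]                  # k1_t (intercepts), k2_t (slopes)

# Toy data built from known parameters (illustrative values, not Kenyan data).
ages = np.arange(30, 41)
k1_true = np.array([-5.0, -5.1, -5.2])
k2_true = np.array([0.090, 0.095, 0.100])
logit_true = k1_true[None, :] + np.outer(ages - ages.mean(), k2_true)
q = 1.0 / (1.0 + np.exp(-logit_true))

k1_hat, k2_hat = fit_cbd_ols(q, ages)
```

Centring the ages makes the intercept interpretable as the logit mortality level at the average age in each year.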

The Cairns-Blake-Dowd (CBD) model has several assumptions:

● The model assumes that the age effects are simple and allows different improvements at
different ages at different times.
● The model assumes that there is a perfect correlation across ages.
● The model accounts for cohort effects, which are the influences of shared experiences
and characteristics of people born in the same year or period.

Variables involved in the models:

(i) Age-specific component (α𝑥)

The age-specific component refers to the baseline level of mortality for a particular age
group or cohort in the absence of the effects of the time trend, cohort effects, or other
factors. It is the general shape of mortality by age.

(ii) Time-specific component

The time-specific component has different dynamics for the two models. In
Renshaw-Haberman, $k_t$ is the time index representing the level of mortality at time t.
In CBD, $k_t^{(1)}$ is the intercept of the model. It affects every age in the same way and
represents the level of mortality at time t. $k_t^{(2)}$ represents the slope of the model, so
each age is affected differently by this parameter.

(iii) Cohort Effects (γ𝑡−𝑥 )

The cohort effect represents the impact of shared experiences, environmental factors, and
other cohort-specific influences on mortality rates.

(iv) Interaction component

The interaction component is the product $\beta_x k_t$. It represents the impact of the overall
time trend $k_t$ on mortality at each age, over and above the baseline $\alpha_x$, and so
describes mortality dynamics over age and time. The $\beta_x$ values determine how sensitive
each age group is to the overall trend in mortality: $\beta_x$ describes the extent to which
mortality at age x changes given the overall temporal change in the general level of mortality,
and greater values of $\beta_x$ are associated with faster mortality change.

3.3.3 Assumption on mortality

In this study, we will assume that mortality was not affected by the Covid-19 pandemic.

3.3.4 Model Validation

To validate our model, we plan to use out-of-sample validation where some of our data
will be held out for testing the model’s predictions.

Data Partitioning:

Training Set:

The training set will comprise approximately 80% of the total dataset, i.e. ages 0-33 for the
1980 cohort. This subset will be used to train the two models. To achieve this, we will use R to
estimate the parameters of the models using the least squares method. The models will learn
patterns, relationships, and features from this portion of the data.

Test Set:

The test set will constitute the remaining 20% of the dataset, i.e. ages 34-42. This
independent subset serves as a benchmark to assess each model's generalization to new, unseen
data. We will use the models trained above to forecast mortality for ages 34-42 and compare the
forecasts with the held-out observations. Each model's performance will then be evaluated based
on its ability to make accurate predictions on this distinct set.
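
The partition and the forecasting step can be sketched as follows. This is a minimal illustration in Python with NumPy (the project itself plans to use R). It splits the ages into the training and test sets described above and produces a central forecast of a time index under the random walk with drift assumed in Section 3.3.1. The function name `forecast_rw_drift` and the toy index values are assumptions made for illustration only.

```python
import numpy as np

def forecast_rw_drift(k, steps):
    """Central forecast of a time index under a random walk with drift.

    The drift is estimated as the mean of the first differences of k.
    """
    k = np.asarray(k, dtype=float)
    drift = np.diff(k).mean()
    return k[-1] + drift * np.arange(1, steps + 1)

# Partition of the 1980 cohort by age: 0-33 for training, 34-42 for testing.
ages = np.arange(0, 43)
train_ages = ages[ages <= 33]
test_ages = ages[ages >= 34]
train_share = len(train_ages) / len(ages)    # share of observations used for training

# Toy time index (illustrative values, not estimates from Kenyan data).
k_train = np.array([2.0, 1.5, 1.1, 0.4, 0.0])
k_forecast = forecast_rw_drift(k_train, steps=3)
```

The forecasts for the held-out ages would then be compared with the observed rates using the error measures in Section 3.5.2.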

3.4 Objective 2: To construct cohort life tables, using the Renshaw-Haberman and CBD
model.

Using the collected data and predicted mortality rates we will construct the male and
female cohort life tables for the two models.
The values predicted from the Renshaw-Haberman model are the values of $m_{x,t}$, hence
we need to calculate the values of $q_{x,t}$ from the relationship between the two.

The death rate, $m_{x,t}$, and the mortality rate, $q_{x,t}$, are typically very close to one
another in value. With a simple assumption, we can formalize this relationship more precisely.

Assumption: For integers t and x, and for all $0 \le s, u < 1$, $\mu_{x+u,t+s} = \mu_{x,t}$; that
is, the force of mortality $\mu$ remains constant over each year of integer age and over each
calendar year. This implies that:

a. $m_{x,t} = \mu_{x,t}$ (9)

b. $q_{x,t} = 1 - \exp(-\mu_{x,t}) = 1 - \exp(-m_{x,t})$ (10)

Relationship (a) is often used in the analysis of death rate data. Relationship (b) is useful
in the analysis of parametric models for mortality that are formulated in terms of $q_{x,t}$. The
assumption does not normally hold exactly, but the resulting relationship between $m_{x,t}$ and
$q_{x,t}$ is generally felt to provide an accurate approximation.
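
As a small worked illustration of relationship (b), the sketch below converts a central death rate into a one-year death probability. It is written in Python for illustration (the project itself plans to use R). The function name `q_from_m` and the example rate of 0.01 are assumptions, not values from the data.

```python
import math

def q_from_m(m):
    """Convert a central death rate m into a one-year death probability q,
    assuming a constant force of mortality within the year: q = 1 - exp(-m)."""
    return -math.expm1(-m)       # expm1 keeps precision when m is small

m_example = 0.01                 # hypothetical central death rate
q_example = q_from_m(m_example)  # about 0.00995, slightly below m_example
```

For small rates the two quantities are nearly equal, which is why $m_{x,t}$ and $q_{x,t}$ are described above as very close in value.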

3.4.1 Assumptions of our Life Table

1. The only form of exit is death.
2. The cohort is closed: there are no new entrants through birth or immigration and no exits
through emigration. This assumption simplifies calculations and analysis.
3. Individuals within the cohort are assumed to have the same mortality characteristics, and
variations within the cohort are not explicitly considered.

Our life tables will contain the following variables;

x: age
$l_x$: the survivorship function, i.e. the number of persons alive at age x.

𝑑𝑥 : number of deaths in the interval (x, x + 1) for persons alive at age x, computed as

𝑑𝑥 = 𝑙𝑥 − 𝑙𝑥+1 (11)

$q_x$: the probability that a person aged exactly x will die before reaching age x + 1.

$m_x$: the central death rate at age x.

$L_x$: total number of person-years lived by the cohort from age x to x + 1. This is the sum of
the years lived by the $l_{x+1}$ persons who survive the interval and by the $d_x$ persons who die
during the interval, who are assumed to live half a year each on average.

$L_x = l_{x+1} + 0.5\,d_x$ (12)

$T_x = \sum_{k \ge 0} L_{x+k}$: total number of person-years lived by the cohort from age x until
all members of the cohort have died.


$e_x$: the life expectancy of persons alive at age x, computed as

$e_x = \frac{T_x}{l_x}$ (13)

We will use R to generate the tables.
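
The life table columns defined above can be derived from the one-year death probabilities as sketched below. This is a minimal illustration in Python (the tables themselves will be generated in R). The function name `build_life_table`, the radix, and the three toy probabilities are assumptions made for illustration only. If the table stops before every member of the cohort has died, as with a table ending at age 80, the resulting $T_x$ and $e_x$ cover only the ages in the table.

```python
def build_life_table(q, start_age=0, radix=100000.0):
    """Build life table columns from one-year death probabilities.

    q[i] is the probability of dying between ages start_age + i and start_age + i + 1.
    """
    n = len(q)
    l = [radix]                                      # l_x: survivors at each age
    for qx in q:
        l.append(l[-1] * (1.0 - qx))
    d = [l[i] - l[i + 1] for i in range(n)]          # d_x = l_x - l_{x+1}
    L = [l[i + 1] + 0.5 * d[i] for i in range(n)]    # L_x = l_{x+1} + 0.5 d_x
    T = [0.0] * n                                    # T_x: sum of L from age x onwards
    running = 0.0
    for i in range(n - 1, -1, -1):
        running += L[i]
        T[i] = running
    return [
        {"x": start_age + i, "l": l[i], "d": d[i], "q": q[i],
         "L": L[i], "T": T[i], "e": T[i] / l[i]}     # e_x = T_x / l_x
        for i in range(n)
    ]

# Toy example with three ages and a radix of 1000 (illustrative values only).
table = build_life_table([0.1, 0.2, 1.0], radix=1000.0)
```

In the toy example the last probability is set to one so that the cohort is extinct at the end of the table and $T_x$ matches its definition.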

3.5 Objective 3: To compare the mortality trends based on the two models

3.5.1 Comparison of mortality trends

The trend analysis involves analyzing the time trends of the mortality rate, the overall
mortality rate, survivorship curves, cumulative death rates, gender differences, the probability
of death, the crude death rate, and life expectancies against time.

For this comparison we will need the life expectancy values ($e_x$), the mortality rates ($q_x$),
and the central death rates ($m_x$) from the life tables. We will plot each of these values
against time (ages or years) to visualize how the mortality of the cohort changes as its members
age under the predictions of the two models. We will conduct a comprehensive analysis of these
trends to identify and explain any differences and similarities that occur.

3.5.2 Performance evaluation of the models

Evaluating the performance of a predictive model is essential to gauge its accuracy and
reliability in making predictions. By employing various metrics, we can objectively measure how
well the models perform in predicting outcomes based on the provided data. These metrics allow
us to understand the extent to which the model's predictions align with the actual observed
values. In this analysis, we'll be using RMSE and MAE, two widely accepted measures in the
field of predictive modeling, to quantify the accuracy of our predictions and understand the
average magnitude of errors between predicted and actual.

The Root Mean Squared Error (RMSE) is the square root of the average squared difference
between the values predicted by a model and the actual values. It provides an estimate of how
well the model is able to predict the target value (accuracy), and it penalizes large errors more
heavily than small ones. The lower the value of the RMSE, the better the model is.

It is given by:

$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$ (14)

where:

$y_i$ = actual mortality rates

$\hat{y}_i$ = predicted mortality rates

n = number of observations

The Mean Absolute Error (MAE) is a metric used to measure the average magnitude of
errors between predicted and actual values. It assesses the accuracy of a predictive model. Lower
MAE values indicate better performance, but the interpretation of ‘how good’ is often
problem-specific. It's also common to use MAE in conjunction with other metrics to get a more
comprehensive understanding of a model's performance.

It is given by:

$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$ (15)

where:

$y_i$ = actual mortality rates

$\hat{y}_i$ = predicted mortality rates
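
The two error measures can be computed as sketched below. This is a minimal illustration in Python (the project itself plans to use R). The function names and the three pairs of rates are hypothetical values chosen for illustration, not results.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n

# Hypothetical held-out and predicted mortality rates (illustrative values only).
actual = [0.010, 0.012, 0.015]
predicted = [0.011, 0.012, 0.013]

rmse_value = rmse(actual, predicted)   # about 0.00129
mae_value = mae(actual, predicted)     # about 0.00100
```

The RMSE is larger than the MAE here because squaring gives more weight to the single largest error.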


REFERENCES

De Moivre, A. (1731). Annuities upon lives, or, the valuation of annuities upon any number of
lives, as also, of reversions: To which is added, an appendix concerning the expectations
of life, and probabilities of survivorship. London printed, and Dublin re-printed, by and
for S. Fuller.

Gompertz, B. (1825). XXIV. On the nature of the function expressive of the law of human
mortality, and on a new mode of determining the value of life contingencies. In a letter to
Francis Baily, Esq. FRS &c. Philosophical Transactions of the Royal Society of London,
(115), 513-583.

Brass, William. 1971. Mortality models and their uses in demography. Transactions of the
Faculty of Actuaries 33: 123–142.

Cairns, A. J. G., Blake, D., Dowd, K., Coughlan, G. D., Epstein, D., Ong, A., & Balevich, I.
(2009). A Quantitative Comparison of Stochastic Mortality Models Using Data From
England and Wales and the United States. North American Actuarial Journal, 13(1),
1–35. https://doi.org/10.1080/10920277.2009.10597538

Wilmoth, John R. 1990. Variation in vital rates by age, period, and cohort. Sociological
methodology 20: 295–335.

Lee, Ronald D., and Lawrence R. Carter. 1992. Modeling and Forecasting US Sex Differentials
in Mortality. International Journal of Forecasting 8: 393–411.

Deaton, Angus S., and Christina Paxson. 2004. Mortality, Income, and Income Inequality over
Time in Britain and the United States. In Perspectives on the Economics of Aging.
Chicago: University of Chicago Press

Brouhns, Natacha, Michel Denuit, and Jeroen K. Vermunt. 2002. Measuring the Longevity Risk
in Mortality Projections. Bulletin of the Swiss Association of Actuaries 2: 105–30.

Giacometti, Rosella, Sergio Ortobelli, and Maria Ida Bertocchi. 2009. Impact of Different
Distributional Assumptions in Forecasting Italian Mortality Rates. Investment
Management and Financial Innovations 6: 65–72.

Mitchell, Daniel, Patrick Brockett, Rafael Mendoza-Arriaga, and Kumar Muthuraman. 2013.
Modeling and Forecasting Mortality Rates. Insurance: Mathematics and Economics 52:
275–85.

Renshaw, Arthur E., and Steven Haberman. 2006. A Cohort-Based Extension to the
Lee-Carter Model for Mortality Reduction Factors. Insurance: Mathematics
and Economics 38: 556–70.

Suzanne Davies Withers. (2009). Longitudinal Methods (Cohort Analysis, Life Tables). Elsevier
EBooks, 285–292. https://doi.org/10.1016/b978-008044910-4.00469-7

Thelle, D. S., & Laake, P. (2015). Epidemiology. Research in Medical and Biological Sciences,
275–320. https://doi.org/10.1016/b978-0-12-799943-2.00009-4

Wunsch, G. (2012). Introduction to Demographic Analysis. Springer Science & Business Media.

Glenn, N. D. (1977). Cohort analysis. Sage Publications.

CRVS - Birth, Marriage, and Death Registration in Kenya. (n.d.). UNICEF DATA.
https://data.unicef.org/crvs/kenya/

WHO. (2022, May 18). Improving Civil Registration, Vital Statistics And Health Data Through
Strengthened Partnerships In Kenya. Www.who.int; World Health Organization.
https://www.who.int/news-room/feature-stories/detail/strengthening-health-data-kenya

Kenya National Bureau of Statistics. (2003). Kdhs2003fulreport. Kenya National Bureau of
Statistics. https://www.knbs.or.ke/download/kdhs2003fulreport/

Kenya National Bureau of Statistics. (n.d.). 2014 Kenya Demographic And Health Survey.
Kenya National Bureau of Statistics. Retrieved November 19, 2023, from
https://www.knbs.or.ke/download/2014-kenya-demographic-and-health-survey/

World Health Organization. (2022). Kenya data | World Health Organization. Data.who.int.
https://data.who.int/countries/404

UNICEF(2015). KEN - UNICEF DATA https://data.unicef.org/country/ken/

Worldometers. Kenya COVID - Coronavirus Statistics. Worldometer (worldometers.info)

Waruru, A., Onyango, D., Nyagah, L., Sila, A., Waruiru, W., Sava, S., Oele, E., Nyakeriga, E.,
Muuo, S. W., Kiboye, J., Musingila, P. K., van der Sande, M. A. B., Massawa, T., Rogena,
E. A., DeCock, K. M., & Young, P. W. (2022). Leading causes of death and high mortality
rates in an HIV-endemic setting (Kisumu County, Kenya, 2019). PLOS ONE, 17(1),
e0261162. https://doi.org/10.1371/journal.pone.0261162

Kenya National Bureau of Statistics (2022). Kenya Demographic and Health Survey 2022.
https://www.knbs.or.ke/wp-content/uploads/2023/07/2022-KDHS-Summary-Report.pdf

Kibiwott Bett, N. (2017). Modeling and Forecasting Mortality and Longevity Risk Based on
Insufficient Data: Kenyan Population. Research Report in Mathematics, 18.
http://erepository.uonbi.ac.ke/bitstream/handle/11295/101300/Bett_Modelling%20and%2
0Forecasting%20Mortality%20and%20Longevity%20Risk%20Based%20on%20Insuffici
ent%20Data-%20Kenyan%20Population.pdf

WORK PLAN

Activity                                   Sep 2023  Oct 2023  Nov 2023  Dec 2023  Jan 2024  Feb 2024  Mar 2024

Study the theoretical background

Literature review

Methodology and proposal presentation

Data analysis and findings

Project report

BUDGET

ITEM               DESCRIPTION                                  QUANTITY    TOTAL COST (Ksh)

Proposal           Printing copies of the research proposals    6           1000

Internet           Monthly subscription                         7 months    14000

Data collection    Data sourcing, transport                                 2000

Research project   Printing copies of the final document        6           1500

Total cost                                                                  18500
