
Trick or Treat?

Application of Neural Networks in Insurance
January 10th, 2019

Dr Ben Flood, bflood@kpmg.com, https://www.linkedin.com/in/drbenflood/


Jan Brunckhorst, jbrunckhorst@kpmg.com, https://www.linkedin.com/in/jbrunckhorst/
Mathias Leppmeier, mleppmeier@kpmg.com, https://www.linkedin.com/in/mleppmeier/
Zeno Schneider, zschneider@kpmg.com, https://www.linkedin.com/in/zenoschneider/

1 Introduction
1.1 Purpose of this document
1.2 About the authors
1.3 What is a neural network?
1.4 What type of people build neural networks?
1.5 Where we learn about neural networks
2 Applying neural networks
2.1 What is a neural network used for?
2.2 Steps to set up a neural network
3 Future of risk & regulatory
3.1 Verification with a traditional model
3.2 Ethics of letting a machine make decisions
3.3 Going beyond best estimates
3.4 Individualised Pricing & Reserving
4 Technical deep dive
4.1 Architecture of a neural network
4.2 How do neural networks learn?
4.3 Architecture engineering is the new feature engineering
5 Tools & toys
5.1 Libraries
5.2 Programming Language
6 Discussion

1 Introduction

1.1 Purpose of this document
We have written this document to share our excitement about, and our experience with, neural networks. We give an overview of this new technology, including some examples and details to help you dive into this rapidly evolving area.

This is not an academic paper, and the opinions herein are our own. Our view is captured by the following three points.
1 Neural networks perform one trick, matching output to input, really well.
2 Lots of business processes can be seen as performing that trick.
3 The barrier to entry for applying neural networks is low.

1.2 About the authors
The authors work in consulting in the CFRO department of KPMG in Munich, in the areas of actuarial science and risk management.

1.3 What is a neural network?
A neural network picks the best output from a list of choices, given an input. If a business process can be expressed as picking an output from a list of choices, and you want to automate that, then a neural network is probably the way to go. Other methods to perform the same task have long existed; the key innovation, we think, is that a neural network is really good at this task.

A neural network is a machine learning technique. It falls under the label artificial intelligence (AI) because it can take over tasks where a human would be expected to make a decision. It is sometimes treated as synonymous with AI, but the field of AI is much broader. The idea of a neural network is simple, but it has many applications and is changing the way we work.

Neural networks can operate directly on raw data and need little or no manual encoding of the data into descriptive variables. This means image and speech files can be used directly as input. This is the innovation that allows Google Translate to understand our speech, Amazon’s Alexa to understand our voice commands, and social media and photo tools on our phones to identify the people in a photo.

The mathematical basis of a neural network is not hard to understand, but it can be very difficult to understand how the parts combine.

Importantly, with the available software libraries, the mathematics does not need to be understood to use neural networks. Nevertheless, training and debugging models is easier with an understanding of the mathematical basis.

1.4 What type of people build neural networks?
Machine learning experts and data scientists typically train neural networks. As a general rule, a machine learning expert is expected to be able to optimize the algorithms, and a data scientist is expected to know more about testing hypotheses.

More recently, the software has become so practical that people from other fields are training their own neural networks. Increasingly, the tools are used by actuaries, risk managers, underwriters, and more. National actuarial organizations in many countries are actively launching data science initiatives.

There is no formal definition of a data scientist. A panel discussion on registering data scientist as a protected profession, at the German Data Science Days conference in Munich in 2018, showed the idea to be very unpopular with practitioners.

In our experience in the actuarial and risk area, it is often easier to train people with mathematical qualifications and an affinity for coding and databases than to find data scientists directly. The computing part involves a
lot of failure and debugging, and the need to learn never stops. The affinity for computing appears to be particularly important to success in data science.

We have seen several companies very successfully roll out data science training for their existing staff. Anecdotally, the key, as shared in the community, is that there needs to be at least one initial resource who understands the techniques and technology, to guide training and recruiting the team.

While new graduates often have a very high level of technical skill, framing business problems so that they can be solved by machine learning or statistical methods is often a hurdle. Hires with these skills are hard to find, and harder to evaluate.

1.5 Where we learn about neural networks
The field of machine learning moves fast. To quote a postdoctoral researcher at a seminar at a recent Datageeks meetup in Munich: if you miss the last two or three papers in reinforcement learning, you are already behind. Keeping up with developments is an ongoing process and relies on several changing sources. Find a good data scientist, and you have probably found someone who enjoys learning new things.

Meetups
There are most likely multiple data science meetups in your city, findable via a search engine. Practitioners love to meet up and share their expertise. These are usually quite informal events (think pizza and beer). Datageeks and the Insurtech & Fintech meetups are particularly well established in Munich. Events with a commercial focus are less likely to attract practitioners.

Virtual Courses
We have used Coursera and Datacamp, but there are many other options for online courses, including IBM, Udemy, Google, Amazon Web Services, etc.

Forums
Hacker News (news.ycombinator.com) is a great source of community-submitted links on machine learning, computer science and startups. GitHub repositories often have lots of detail and explanations.

Workshops
We have hosted and taken part in workshops, both internal and external, on technology and use cases for machine learning in the insurance industry. A workshop in which participants present their own use cases, and then brainstorm, is usually a very efficient way to share knowledge and generate projects.

Graduates
Hiring new graduates is a good source of information. Only two or three years ago it was nigh on impossible to recruit graduates who had studied machine learning or data science, but the number of courses and interested students has since exploded. These graduates often have a good network within which information is shared.

2 Applying neural networks

2.1 What is a neural network used for?
Since a neural network picks an output from a list of choices, given an input, a neural network will only work on aspects of the business problem that can be described in this simple way. Fortunately, many business problems can be described in this way.

Insurance
For example, many health insurers face the business problem of trying to improve their net promoter score. This does not obviously generalize to picking an output from a list. It has, however, been identified that higher net promoter scores can be achieved by paying claims faster, and that many claims were obviously payable. This aspect of the overall business goal does generalize to picking a value from a list. A neural network is used to pick whether a claim is instantly payable
without review. This speeds up the claims payment process for many claims (in one example, 25% of claims are payable within 3 seconds). This process has been applied by many insurers internationally.

Another application is the decision to accept an application for life insurance automatically, without a medical underwriter needing to review the requirements for medical assessments. One company we know can already approve 70% of applications automatically.

A major insurer ran a data science competition to identify the driver of a car based on telematics data.

Further, many insurers and reinsurers are using neural networks to predict claims, for pricing or reserving. This has been considered for both the frequency and the severity of claims, and for loss triangles.

Finance
Portfolio managers, hedge funds, and hedging departments are testing neural networks to maximize returns subject to a given risk appetite, or to minimize risk subject to given returns.

Other Fields
Call centres and robo advisors are using neural networks to decide, based on live call monitoring, whether to pass the call to a human or not.

Neural networks are particularly effective for problems involving speech recognition and image recognition. Speech recognition is picking which word has been said, given waveforms as input. Image recognition is picking which object is in a picture, given pixels as input. Instead of humans painstakingly deriving features from waveforms and images, and designing algorithms to analyse them, neural networks are used to do the hard work.

An international energy company, using the wind as the input and the settings of a wind turbine as the output, has increased the energy output from its windfarms by 1% a year using neural networks.

Using the inputs from a wide range of sensors on a car, the major car companies are producing the output of the action a car should take next, and this forms the foundation of autonomous driving. Training data is being gathered in real and virtual trials (amusingly, in some cases the training data is created using computer games about driving, including Grand Theft Auto).

Table 1 Sample of Applications

| Item | Input | Output | Goal |
|---|---|---|---|
| Claims Payment | Claim Data | Automatically approve payment | Reduce time it takes to pay a claim |
| Customer Churn / Lapse | Customer data, interactions, products, etc. | Lapse in the next period or not | Minimise Lapse |
| Next Best Offer | Customer data, interactions, channel, products, etc. | Which product to sell | Maximise Sales |
| Medical Assessment | Customer application data | Automatically approve application | Maximise conversion rate |
| Portfolio Management | Market Data, Financial Reports, Social Media Feeds | A permitted portfolio allocation | Maximum Return |
| Wind Turbine | Winds | Turbine settings | Maximise Energy Output |
| Speech Recognition | Audio waveform | Words | Automate processes |
| Call Centre | Waveform, or text of the call | Pass the call to a human operator or not | Automate processes |
| Image Recognition | Pixels | Object | Automate processes |
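
To make the framing in Table 1 more concrete, here is a minimal Python/numpy sketch of how the last layer of a classifier turns an input into a choice from a list. It uses the Next Best Offer row as the example. The product list, the three customer features and the weight values are hypothetical placeholders. In a real application the weights would come from training rather than being typed in.

```python
import numpy as np

# Hypothetical list of choices (Next Best Offer row of Table 1)
PRODUCTS = ["motor", "home", "life", "health"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def pick_output(x, W, b, choices):
    """Final layer of a classifier: one score per choice, then pick the most likely."""
    scores = W @ x + b                  # one row of W per choice
    probs = softmax(scores)
    return choices[int(np.argmax(probs))], probs

# Hypothetical, untrained weights: 4 choices x 3 input features
W = np.array([[ 1.0, -0.5, 0.0],
              [ 0.2,  0.8, 0.1],
              [-0.3,  0.1, 1.2],
              [ 0.0,  0.0, 0.3]])
b = np.zeros(4)

# Hypothetical customer features: scaled age, owns a home (1/0), has children (1/0)
customer = np.array([0.4, 1.0, 0.0])
choice, probs = pick_output(customer, W, b, PRODUCTS)
print(choice, probs.round(3))
```

A yes/no decision such as "pay this claim instantly or send it for review" follows the same pattern with a list of two choices.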

2.2 Steps to set up a neural network

Frame the problem
The first, and often hardest, step is to frame the business problem as a machine learning problem. This does not mean changing the business problem, just describing it in a way that a neural network can handle.

This is always a cross-disciplinary effort. Neural network specialists are usually not good at identifying the business problems, and business specialists often have difficulty generalizing their business problem to one that can be solved by machine learning.

In the workshops we have been part of, we have sometimes seen machine learning specialists try to shoehorn everything into using the latest algorithm when a simpler solution would suit better. On the other hand, if you don’t let your techies play with the newest algorithms, they will quit and go work somewhere else. This needs to be balanced.

Framing the problem involves engaging business owners, and engaging business owners to help identify data science projects is always a challenge. They need to be incentivised to support the project with their time and expertise, and the fear that they or their staff will be replaced by a computer needs to be managed carefully. Anecdotally, many of the applications we see in insurance are to support overworked staff in claims handling and reporting, or to replace retiring staff. We often see more engagement where business owners are made part of the data analytics team.

Get the training data
Collecting enough training data is the next big challenge. To train the network, you need a data set consisting of inputs and the corresponding outputs. For getting data in the insurance industry, data privacy and data quality are key issues. In comparison with internet firms, for example, there is often more difficulty accessing databases across departments, and data from legacy systems. For instance, to train neural networks to predict the next best offer for cross-selling, multiple product databases need to be joined, and this is often a showstopper for a whole project.

Data is always a problem, and it is safe to assume that getting clean data will take longer than expected. A few years ago, when ‘big data’ was the key buzzword, many newly established data science departments contacted the business owners with the message, “Give us your data, and we will find great insights.” From the business owners’ perspective, creating data queries and cleaning data is hard work, and it is not clear who is going to get the credit for the insights. This needs to be managed sensitively for cooperation to be successful.

Train a neural network
We often hear of new data scientists who want to build their software from scratch. While it might be a useful academic exercise to implement your own neural network trainer for a simple example, we prefer to use one of the available libraries. The developers of open source libraries have already faced and solved an abundance of tricky technical problems, the majority of which are probably not relevant to the business problem you are solving.

Neural networks can be trained on your laptop. For larger applications, Amazon, Microsoft, Google, and others provide pre-built operating system images in the cloud with many software configurations already installed. Libraries provide support for alternative processors, such as graphics processors (GPUs), which allow speed-up through better parallelization and other properties.

Training a complicated neural network can take hours or days, and it is advisable to get into the habit of carefully documenting each experiment. Fitting mathematical models
involves mathematics, plus some engineering to set the computers up. But the bulk of the work looks more like chemistry: designing experiments, setting out what you expect to learn, trying to set up the experiment just right, waiting, and documenting the results. It is very easy to waste calendar time by not being systematic with the experimentation.

Validate the results
Validating neural networks is not much different from validating models in the actuarial, risk management, quant, or statistical world. Validation is still all about separating training and validation data, analysing residuals, and producing good graphics. However, explaining the parameters fitted by a neural network can be next to impossible.

Traditional statistical methods and machine learning usually use cross validation, where the training and testing is done multiple times. With lots of data, however, larger neural networks can take a lot of computing power to train. The training and validation data are therefore often defined only once, and a test data set is kept aside.

Use it to predict
Once the neural network has been trained, using it to predict is an almost instantaneous process. The input is passed to the network, and the answer comes out.

Decisions about how and when to update the network are complicated. The only new consideration compared with traditional modelling in insurance is that computing time might be expensive.

3 Future of risk & regulatory

3.1 Verification with a traditional model
A neural network is a black box. If it does something weird, it might be hard to explain to your stakeholders. Where regulated and risky activities are concerned, it is a good idea to have a simple, interpretable model as a baseline for the neural network.

One of the main problems preventing the wide use of neural networks in the insurance industry is the lack of interpretability of those models. Research has shown that some mathematical concepts can increase interpretability, but it is early days yet.

We see a ‘trust but verify’ method being used in practice. Say two models are fitted, a traditional model and a neural network. The neural network is assumed to have better predictive power, while the traditional model can be explained easily. The rule goes: if both models give a similar answer, use the neural network; if they give materially different answers, investigate.

3.2 Ethics of letting a machine make decisions
In some of the discussions that we have been part of, the consensus has been that a machine can act to benefit the customer, but should not take decisions that disadvantage the customer. This means automating, for example, paying a claim or accepting an application, but not automatically refusing a claim or application.

3.3 Going beyond best estimates
The main non-standard work we face when using neural networks for insurance is fitting probability distributions, which we solve by estimating the parameters of a conditional density function.

Standard neural networks provide point estimates, not estimates of distribution functions, so on their own they cannot compute percentiles or risk measures. Since Solvency II and other regulatory risk frameworks require metrics like value at risk to compute capital requirements, it is necessary to estimate features of probability distributions. This extension requires some extra programming work.
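
The following is a minimal sketch of estimating conditional density parameters and reading a risk measure off them. It is built on our own illustrative assumptions, not the method of any particular insurer. It assumes the log claim size is normally distributed given one hypothetical rating factor x, and it uses synthetic data. To keep it short, the "network" is a single linear layer with two output heads, one for the mean and one for the log standard deviation. In practice these heads would sit on top of hidden layers and be trained with a library. Training minimises the Gaussian negative log-likelihood by plain gradient descent. The 99.5% quantile (the Solvency II value-at-risk level) then follows from the fitted parameters, using `NormalDist` from Python's standard `statistics` module.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): log claim size y depends on a rating factor x
n = 5000
x = rng.uniform(0.0, 1.0, size=n)
y = rng.normal(loc=1.0 + 2.0 * x, scale=np.exp(-1.5 + 1.0 * x))

def heads(p, x):
    """Two output heads: conditional mean and conditional log standard deviation."""
    mu = p[0] + p[1] * x
    log_sigma = p[2] + p[3] * x      # modelling log(sigma) keeps sigma positive
    return mu, log_sigma

def nll_and_grad(p, x, y):
    """Mean Gaussian negative log-likelihood (constant dropped) and its gradient."""
    mu, log_sigma = heads(p, x)
    var = np.exp(2.0 * log_sigma)
    r = y - mu
    nll = np.mean(log_sigma + 0.5 * r**2 / var)
    d_mu = -r / var                  # d nll_i / d mu_i
    d_ls = 1.0 - r**2 / var          # d nll_i / d log_sigma_i
    grad = np.array([d_mu.mean(), (d_mu * x).mean(),
                     d_ls.mean(), (d_ls * x).mean()])
    return nll, grad

params = np.zeros(4)
for step in range(8000):
    _, grad = nll_and_grad(params, x, y)
    params -= 0.05 * grad            # plain gradient descent

def quantile(p, x0, level=0.995):
    """Conditional quantile of y at x0, e.g. a 99.5% value at risk."""
    mu, log_sigma = heads(p, np.asarray(x0))
    return mu + np.exp(log_sigma) * NormalDist().inv_cdf(level)

print(params.round(2), quantile(params, 0.5))
```

Because y is on the log scale here, `np.exp(quantile(params, x0))` gives the corresponding quantile of the claim size itself, since quantiles are preserved under monotone transformations. The same two-head idea carries over to other distributions (for example gamma or lognormal severities). The assumed distribution determines the log-likelihood that is minimised.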

3.4 Individualised Pricing & Reserving
In non-life pricing, there are cases in which machine learning cannot be used for pricing for regulatory reasons. In those cases, the difference between prices predicted via machine learning and via permitted methods can still be used to identify pricing gaps. Good and bad risks are identified using this method, and management actions are taken.

4 Technical deep dive

4.1 Architecture of a neural network
A neural network is composed of layers of neurons. It is believed that neural networks with many layers generalize better than their shallower counterparts, but they are harder to train. Neural networks for image classification can have hundreds or thousands of neurons if enough data is available and the task requires learning very complex, non-linear relationships.

Google’s natural language processing model BERT, released in November 2018, has, in its large version, 24 layers with a hidden size of 1024 and 340 million parameters. The model was trained on a 3.3-billion-word corpus. For tasks like the prediction of house prices or insurance claims, neural networks with ‘only’ a few layers are already enough and can over-fit quickly. How wide and deep a neural network should be depends mainly on the amount of data available and the complexity of the relationships.

Neuron
The base element of a neural network is a neuron. A neuron computes a non-linear function of a linearly weighted sum of its input data. The non-linear function is called the activation function. Training the neural network means estimating the weights.

Note that the specification of a neuron is equivalent to that of a generalized linear model (GLM). The weights are the coefficients, the inputs are the independent variables, and the inverse of the GLM’s link function corresponds to the activation function of the neuron. Activation functions, like link functions, are used to model non-linear relationships between the output and the input.

Figure 1 A neural network
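
The GLM correspondence can be checked in a few lines. This minimal Python/numpy sketch uses made-up inputs and weights. It computes a single neuron and shows that applying the GLM link function to the neuron's output recovers the linear predictor. A sigmoid activation corresponds to the logit link of logistic regression. An exponential activation corresponds to the log link used, for example, in Poisson claim-frequency models.

```python
import numpy as np

def neuron(x, w, b, activation):
    """A neuron: a non-linear activation of a linearly weighted sum of the inputs."""
    return activation(x @ w + b)

def sigmoid(z):
    """Activation function; the inverse of the logit link of logistic regression."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Logit link function g(mu) = log(mu / (1 - mu))."""
    return np.log(p / (1.0 - p))

# Hypothetical inputs (rows = policies, columns = rating factors) and weights
x = np.array([[0.2, 1.0],
              [1.5, -0.3],
              [-0.7, 0.4]])
w = np.array([0.8, -0.5])   # weights play the role of GLM coefficients
b = -0.1                    # bias plays the role of the GLM intercept

eta = x @ w + b                           # GLM linear predictor
p_claim = neuron(x, w, b, sigmoid)        # logistic GLM: claim probability
freq = neuron(x, w, b, np.exp)            # Poisson GLM with log link: expected frequency
print(p_claim.round(3), freq.round(3))
```

In principle, fitting such a single neuron by maximum likelihood with the matching loss leads to the same coefficients as the corresponding GLM fit. The matching loss is cross-entropy for the sigmoid and the Poisson log-likelihood for the exponential. This offers one way to sanity-check the smallest building block of a network.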

Neural networks stack potentially hundreds or thousands of neurons and can therefore model very complicated non-linear relationships. It has been proved (the universal approximation theorem) that any continuous relationship on a bounded domain can be approximated arbitrarily closely by a sufficiently large neural network.

Activation Function
There is no single activation function that works best in all cases. Recent research has shown that a mixture of activation functions can be better. The choice of activation functions for your problem affects both the predictive power and the time it takes to train the network.

The rectified linear unit (ReLU) function, or ‘rectifier’, f(x) = max(0, x), is currently favoured by many practitioners because it works well in practice. The ReLU activation function is quick to calculate. It also allows neurons to be switched on and off for different data items, making it easy to represent multiple relationships in the same network.

Several modifications to ReLU are available. Exponential linear units (ELUs) have been shown to allow higher classification accuracy than ReLUs. Leaky ReLUs allow a small positive gradient when the unit is not active (for negative inputs). Softplus, f(x) = log(1 + e^x), approximates the ReLU with a smooth function.

In the early days of neural networks, the sigmoid function, with its S-shaped curve, was a very popular choice, but networks using it were hard to train. It compresses the output values to the range between zero and one. The tanh function became accepted as an alternative to speed up training, and it is still used in recurrent neural networks for predicting sequential data.

The list of activation functions in use is continually growing.

Layers
In a basic neural network, all the neurons in a single layer usually have the same activation function. Extensions of the basic neural network architecture are used to solve particular problems.

For sequential data, like text or time series data, recurrent layers introduce a mechanism for learning time-dependent relationships. The neurons receive information not only from previous layers, but also from previous rounds (time steps) in the same layer. This means that the order of the given information is important.

Gated recurrent unit (GRU) layers and long short-term memory (LSTM) layers are used to represent deeper relationships in sequential data. The LSTM introduces a memory cell that is able to forget. LSTMs can learn very complex sequences and are used for time series analysis with seasonal effects. They are also widely used for handwriting and speech recognition.

In image classification, convolutional layers use a sliding window that moves across the image and calculates a function of groups of pixels at a time. The results of this function are then used by the other layers. The neurons in a convolutional layer are only connected to local regions of the image.

Pooling layers are also sometimes used to compress the data in image classification, but their use is controversial. To quote Geoffrey Hinton, a pioneer of deep learning: “The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster.”

4.2 How do neural networks learn?
The parameters of a neural network are trained via gradient descent. Given a performance measure (the loss), the gradients with respect to the network parameters can be computed via back propagation. In back propagation, the chain rule for derivatives is used recursively, starting at the last layer and passing the gradients backwards through the neural network. The parameters are then updated by taking small steps in the opposite direction of the calculated gradients.
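
To illustrate back propagation and gradient descent, here is a minimal numpy sketch of a network with one hidden layer of ReLU neurons, fitted to a hypothetical toy curve, y = sin(3x). The gradients are derived by hand with the chain rule, layer by layer from the output backwards. The parameters are then moved a small step against the gradient. The layer size, initialisation, learning rate and number of steps are arbitrary choices for this example. Libraries such as Keras, TensorFlow and PyTorch compute these gradients automatically.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: learn y = sin(3x) on [-1, 1]
X = rng.uniform(-1.0, 1.0, size=(256, 1))
Y = np.sin(3.0 * X)

# One hidden layer of 16 ReLU neurons and a linear output neuron
params = [rng.normal(0.0, 1.0, size=(1, 16)),   # W1
          rng.uniform(-1.0, 1.0, size=16),      # b1 (spreads out where the ReLUs switch on)
          rng.normal(0.0, 0.25, size=(16, 1)),  # W2
          np.zeros(1)]                          # b2

def relu(z):
    return np.maximum(0.0, z)

def loss_and_grads(X, Y, params):
    W1, b1, W2, b2 = params
    # Forward pass
    z1 = X @ W1 + b1
    h = relu(z1)
    out = h @ W2 + b2
    loss = np.mean((out - Y) ** 2)           # mean squared error
    # Back propagation: chain rule, starting at the last layer
    d_out = 2.0 * (out - Y) / X.shape[0]     # dL/d(out)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = d_out @ W2.T                       # dL/dh
    d_z1 = d_h * (z1 > 0)                    # ReLU passes the gradient only where active
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)
    return loss, [dW1, db1, dW2, db2]

initial_loss, _ = loss_and_grads(X, Y, params)
learning_rate = 0.05
for step in range(5000):
    loss, grads = loss_and_grads(X, Y, params)
    # Gradient descent: a small step in the opposite direction of the gradient
    params = [p - learning_rate * g for p, g in zip(params, grads)]
final_loss, _ = loss_and_grads(X, Y, params)
print(initial_loss, final_loss)
```

For simplicity, this sketch uses the whole data set in every step. A common way to test a hand-written back propagation like this one is to compare one of its gradients with a finite-difference estimate.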

The gradients are not usually computed on the whole training data set at once. Instead, a small, randomly chosen batch is used to calculate the gradients and update the model parameters in each training step. This process is called mini-batch gradient descent and is a variant of stochastic gradient descent. A training step is a single calculation of the gradients and parameter update, and an epoch is a full cycle through all the data.

Neural networks are prone to overfitting, but methods called regularization are available to mitigate this. Dropout regularization randomly selects neurons that are left out during a training step. This forces the neural network not to rely heavily on specific nodes and is believed to improve generalisation. Another regularization technique adds a penalty on the (squared) L2 norm of the neural network weights to the loss, which also smooths the loss surface and improves the learning process.

4.3 Architecture engineering is the new feature engineering
Deep learning allows raw data to be used in the model, without a human needing to decide on useful variables.

For example, one of the first algorithms in computer vision to successfully identify Chinese characters drew 5 lines through each character. It used the number of times the character crossed each line as the input data set. Ten or fifteen years ago, a lot of research was done on evaluating tens of thousands of automatically calculated descriptor variables from images, to identify useful variables.

Now, the first few layers of the neural network figure out by themselves which properties of the input data are useful for prediction. This makes neural networks very generally applicable, but there are still obstacles to overcome.

In theory, training a neural network is pretty easy. In practice, selecting the hyper-parameters and network structure is as much an art as a science. It takes a lot of trial and error to find a good network architecture, and the experimentation itself is very time-consuming.

5 Tools & toys
Applying mathematical models has long depended on having the right tools. At a conference on Bayesian statistics at Lago di Como in 2006, the late John Nelder described how he had released the GLIM software in the 1970s so that non-statisticians could apply generalized linear models. At a seminar at Google’s office in Munich in 2018, one of the lead engineers of TensorFlow described how the software is being developed to enable non-experts in machine learning to apply neural networks.

5.1 Libraries
We discuss three popular libraries here: Keras, which has the lowest barrier to entry, and TensorFlow and PyTorch, which are more powerful.

Keras
Keras is the easiest library to get started with. Several insurers we have spoken to use Keras in real applications. It is designed to be a high-level library for neural networks, with a focus on rapid prototyping. Besides standard neural networks, it also supports convolutional and recurrent neural networks, and it has gained favor for its out-of-the-box solutions.

TensorFlow and PyTorch
TensorFlow and PyTorch are open source libraries: TensorFlow is developed by Google, and PyTorch primarily by Facebook. TensorFlow is established; PyTorch is more recent, but is gaining momentum in academic papers. The licenses
and the projects’ main sponsors are different, but the functionality is increasingly the same.

All three libraries can be used on a laptop and are supported by cloud platforms. Operating system images with the software pre-installed are available on most cloud platforms.

In contrast to Keras, TensorFlow and PyTorch offer low-level as well as high-level application programming interfaces (APIs). The low-level APIs support machine learning generally and offer low-level functions, providing much more flexibility than Keras. PyTorch and TensorFlow also have C++ frontends, intended to enable research in high-performance, low-latency C++ applications.

Since the launch of their respective high-level APIs, TensorFlow’s Estimator framework and PyTorch’s Ignite, it has become much easier to get started with neural networks using PyTorch or TensorFlow.

TensorBoard
TensorBoard is a visualization tool developed specifically for TensorFlow, but it can also be used with PyTorch. It visualizes the whole training process of your models.

5.2 Programming Language
If your team consists only of computer scientists, software engineers, or similar, then they might prefer Scala, C++ or JavaScript, and that’s fine. If, like us, you have a mixed team from various backgrounds, either Python or R is a better choice.

Python & R
Python and R are high-level languages that let you apply algorithms without having to handle all the ugly details underneath. The two languages frequently rely on the same underlying libraries to do the hard work.

Both languages are powerful tools for statistical modeling, and both have advantages and disadvantages. R is primarily used for exploratory and statistical work. In comparison, Python focuses on productivity and code readability. The choice depends on the task, knowledge of the language and personal preference.

Packages and Modules
For statistical methods, Python’s package repository, PyPI, is far behind R’s package repository, CRAN, although it is catching up. Python libraries like numpy and scikit-learn continue to grow and improve. However, they focus on basic functionality, compared with the readily implemented statistical models available in R.

Visualization
Python and R both support visualization, but Python’s support is not built in. Python supports graphical visualization with basic libraries like Matplotlib or advanced ones like Plot.ly. It requires more effort to achieve basic results than in R, but you also have more possibilities, like interactive dashboards and better support for web applications.

6 Discussion
Neural networks have suddenly become easy to train and popular. In insurance, as in other industries, we are finding a gold rush of business problems to which we can apply this new technology.

Applying neural networks needs a combination of business and technological acumen.

Our outlook is that if there is a repeated task that involves picking an output from a list of choices, it is probably going to be automated using neural networks in the near- to medium-term future.
