1 Introduction to Bayesian data analysis

1.1 part 1: What is Bayes?

Hello, I’m Rasmus Bååth, and welcome to part one of a three-part introduction to Bayesian data
analysis. This is an introduction that I have given before, for example at the useR! 2015
conference, and it’s targeted at you who aren’t necessarily that well-versed in probability theory
and statistics, but who do know your way around a programming language such as R or Python. Even
though it is in three parts, it is going to be quite brief, and I’m going to warn you that it is
also going to be quite hand-wavy in parts. But I do hope it will give you some intuition about
what Bayesian data analysis is, why it is useful, and how you can perform Bayesian data analysis
yourself. This part one is about what Bayesian data analysis is. But before we go into that, I’m
going to fade myself out and we’re going to start by looking at some famous people.
So this is Nate Silver. He’s one of the more famous statisticians around, not least because he did
a very good job predicting the outcome of the two Obama elections, and he wasn’t completely off in
the Trump election. He’s currently the editor-in-chief of the well-known data-driven news site
FiveThirtyEight. And here is Sebastian Thrun. He won the DARPA 2005 Grand Challenge, which was
about building a self-driving car that could drive through over 200 kilometres of rough terrain,
and after that he worked on Google’s self-driving car. Finally, here is Alan Turing, a giant in
computer science who helped crack the German Enigma cipher during the Second World War, which
helped secure an Allied victory and likely shortened the war significantly.
So what do these three people have in common? Well, they all worked on complex problems where
there was a large inherent uncertainty that needed to be quantified, and that required efficient
integration of many sources of information. And they all used Bayesian data analysis. That’s
because Bayesian data analysis is a great tool, and R and Python are great tools for doing
Bayesian data analysis. But if you google “Bayesian”, there’s a good chance you won’t find
articles about how this tool could be used. Instead you might get the philosophy.
You’ll find articles discussing whether statistics should be subjective or objective, whatever
that means, or whether statisticians should adhere to frequentism or Bayesianism, as if they were
different religions within statistics. And you will find heated arguments about whether one
should or should not use subjective probabilities rather than p-values.
And in this tutorial I won’t talk about any of this. I will just talk about Bayesian data analysis
as one good tool among many that you should have in your data science tool belt. So this tutorial
is about the what, the why, and the how of Bayesian data analysis: (i) part one, which you are
watching right now, tries to answer what Bayesian data analysis is, (ii) part two touches on why
you would want to use Bayesian data analysis, and (iii) part three has some hints on how to
actually perform a Bayesian data analysis in practice.
So let’s start part one proper: What is Bayesian data analysis? Well, this can be characterized in
a number of ways, some more helpful than others. One that isn’t too helpful, but that is correct,
is that Bayesian statistics is when you use probability to represent uncertainty in all parts of a
statistical model. So, if you use probability to represent all uncertainty, then you are, by
definition, using a Bayesian model.
You could also see Bayesian data analysis as a flexible extension of maximum likelihood, maximum
likelihood being perhaps the most common way of fitting models in classical statistics.
You can also argue that Bayesian data analysis is potentially the most information-efficient
method to fit a statistical model, but it’s also the most computationally intensive method.
The characterization that we’re going to run with in this tutorial is the following: Bayesian data
analysis is a method for figuring out unknowns, often called parameters, that requires three
things:
● one, data,
● two, something called a generative model, and
● three, priors: what information the model has before seeing the data.
So what is a generative model here?
Well, it’s a very simple concept. It’s any kind of computer program, mathematical expression, or
set of rules that you can feed fixed parameter values and that will generate simulated data. A
typical example of a generative model is a probability distribution like the normal distribution,
which you can use to simulate data, but any kind of function that you can whip up in R or Python
that simulates data also counts as a generative model. So a generative model is great if you know
what parameter values you want but you’re interested in how much the data could vary given those
parameters. Because then you can simply plug in those parameter values, run your generative model
a large number of times, and look at how much the data jumps around. That is, it’s a classical
Monte Carlo simulation.
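As a minimal sketch of that idea, here is a Python generative model built on the normal distribution. The parameter values (`mu`, `sigma`, `n`) and the number of repetitions are made up for illustration and are not from the talk; the point is only to show fixed parameters going in and varying simulated data coming out.

```python
import random
import statistics

def generative_model(mu, sigma, n, rng):
    """Simulate one data set of n observations from a normal distribution."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Fixed (hypothetical) parameter values that we pretend to know.
mu, sigma, n = 10.0, 2.0, 20

# Run the generative model many times and record a summary of each data set.
simulated_means = [
    statistics.mean(generative_model(mu, sigma, n, rng)) for _ in range(5000)
]

mean_of_means = statistics.mean(simulated_means)
spread_of_means = statistics.stdev(simulated_means)
print("average of the simulated sample means:", round(mean_of_means, 2))
print("spread of the simulated sample means: ", round(spread_of_means, 2))
```

The spread printed at the end is the "how much the data jumps around" part: it describes the variation in a summary of the data when the parameters are held fixed.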
But often we are in the complete opposite situation: we know what the data is, it’s not
uncertain, and we want to know what the reasonable parameter values are that could have given rise
to this data. That is, we want to work our way backwards from the data that we know and learn
about the parameter values that we don’t know. And it is this step that Bayesian inference helps
you with. So now I’m going to explain how Bayesian inference works with a motivating example.
Well, it’s up to you if you think it’s motivating, but it’s about fish, and who doesn’t like fish?
But to keep things simple we’re actually going to start with just estimating the performance of
one method, so it’s just A testing, but we will come around to the B later.
Now, Swedish Fish Incorporated is a company that makes money by selling fish subscriptions. You
know, you sign up for a year and every month you get a frozen salmon in the mail. They are huge in
Sweden, but now they want to break into the lucrative Danish market.
But how should Swedish Fish Incorporated enter the Danish market? Well, the CEO has already come
up with a plan, let’s call it method A. He put together this colorful brochure, which advertises
the one-year salmon subscription plan, and marketing has actually already tried this out on 16
randomly chosen Danes. Out of the 16 Danes that got a brochure, six signed up for one year of
salmon.
So what we want to know now is how good method A is. What should we expect the percentage of
sign-ups to be if we start sending brochures on a large scale? Well, we could of course calculate
the percentage of sign-ups in our sample. That’s just 6 divided by 16, which is about 38 percent,
and maybe that is a good guess, but surely this guess is quite uncertain, especially since we have
such a small sample.
So not only do we want to know what’s a good guess for the percentage of sign-ups, but we also
want to know how uncertain this percentage is, and that’s what we’re going to use Bayesian data
analysis for. So remember that Bayesian data analysis requires three things. Data: we have data,
so check on that. Then we need a generative model, which we don’t have, so let’s come up with
that. Let’s come up with a generative model of people signing up for fish.
There are of course many ways of doing this; I’m just going to go with something simple here. So
first let’s assume that there is one underlying rate with which people sign up. For now let’s
just pick a number, let’s say 55%. Then we ask a number of people, where the chance of each person
signing up is 55%. So “ask” is in quotes here
And here, what is he doing in relation to 3b1b?
He is choosing, within the sample, the significant part, so as to fit hypothesis H.

Introduction to Bayesian data analysis - Part 2: Why use Bayes?

This is part 2 of a three-part introduction to Bayesian data analysis, which will go into the why
of Bayesian data analysis. If you haven’t checked out part 1 yet, I really recommend you do that.
So, why use Bayesian data analysis? Why could it be a useful approach rather than using, say,
classical statistics? Well, I’m going to give you a couple of reasons.
One reason to use Bayesian data analysis is that you have great flexibility when building models
and can focus on that rather than on computational issues.
Now, if you’ve done some Bayesian modeling before, this might sound a little bit strange to you,
as there are often computational issues when you want to fit your model. What I mean here is
that, since there is a very clean separation between specifying and fitting a model in a Bayesian
framework, you often don’t have to focus too much on how your model is going to be computed when
you construct it.
That means that you can focus on what assumptions are reasonable and what information you should
use, rather than on algorithms, when doing the actual modeling. And with many good tools that help
you fit Bayesian models, like Stan, JAGS, and PyMC, there is a good chance that just specifying
the model actually is enough, if it’s not too complicated. So let me give you an example of how
easy it is to change a Bayesian model while the computation stays the same. This is the CEO of
Swedish Fish Incorporated, and he is telling us: “I’ve come up with a new brilliant way of
marketing our salmon subscription service.”
So I guess we no longer have only one method to advertise salmon subscriptions with, and that
means it’s time to bring back that B in A/B testing. So remember that method A involves sending
out a colorful brochure to advertise the salmon subscription service, and when marketing tried
this on 16 randomly selected Danes, 6 out of 16 signed up.
The new method our CEO proposes, let’s call it method B, involves sending out the very same
colorful brochure, but this time accompanied by a sample frozen salmon. Marketing has actually
already tried this method on another 16 Danes, and this time 10 out of 16 signed up.
So what we now want to know is which seems to be the better method. Sure, there is some evidence
that method B is better, but how certain or uncertain should we be that this is the case? So what
we want to do is to specify and fit a Bayesian model that helps us answer these questions.
This is the model we had before, when we just had one advertising method. We drew a rate of
sign-up from one prior and ran a generative model that gave us one simulated data set. But now we
have two advertising methods, and the cool thing here is that all we need to do is to copy and
paste the one-group model. So instead we draw two rates of sign-up independently from two priors
and separately run two generative models to simulate two data sets.
This is the only change we need to make to fit this new model. We can use the same procedure as we
used in part one of this tutorial, going under the long name approximate Bayesian computation.
So here we again first draw fixed parameter values from the priors. This time we happen to draw a
sign-up rate of 20% for method A and a rate of 72% for method B, and then we plug these parameter
draws into the generative models and simulate some data. This time we got four sign-ups for
method A and 10 sign-ups for method B.
But then we keep these parameter draws only if the simulated data match the actual data. And this
time they didn’t, so we’re going to filter them away.
Sure, for method B the simulated data match the actual data, since we in reality got 10 sign-ups,
but they don’t match for method A, as we in reality got 6 sign-ups there. And we want all the
simulated data to match the real data, so these parameter draws have to go.
So we do it again. This time we draw some other parameter values, and when we run the generative
model this time, well, what do you know, this time we simulate data that match, so we’re keeping
these parameter draws. And now, as last time, we do this whole draw, simulate, reject procedure
many, many times, say a million times.
And what we are left with are two distributions: the distributions of the parameter draws for
method A and method B that made it past the rejection filtering step.
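The draw, simulate, reject procedure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code from the talk: the function name `simulate_signups` is made up, the priors are assumed to be uniform between 0 and 1, and it uses 100,000 draws instead of a million to keep the run time short.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

n_asked = 16        # people asked per method (from the talk)
observed_a = 6      # actual sign-ups for method A
observed_b = 10     # actual sign-ups for method B
n_draws = 100_000   # the talk suggests a million; fewer keeps this quick

def simulate_signups(rate, n, rng):
    """Generative model: each of n people signs up with probability `rate`."""
    return sum(rng.random() < rate for _ in range(n))

posterior_a, posterior_b = [], []
for _ in range(n_draws):
    # 1. Draw parameter values independently from the two uniform priors.
    rate_a = rng.random()
    rate_b = rng.random()
    # 2. Plug them into the generative models and simulate two data sets.
    sim_a = simulate_signups(rate_a, n_asked, rng)
    sim_b = simulate_signups(rate_b, n_asked, rng)
    # 3. Keep the pair of draws only if all simulated data match the real data.
    if sim_a == observed_a and sim_b == observed_b:
        posterior_a.append(rate_a)
        posterior_b.append(rate_b)

mean_a = sum(posterior_a) / len(posterior_a)
mean_b = sum(posterior_b) / len(posterior_b)
print(f"kept {len(posterior_a)} of {n_draws} draws")
print(f"posterior mean sign-up rate, A: {mean_a:.2f}, B: {mean_b:.2f}")
```

Only a small fraction of the draws survives the filter, which is why this method needs so many draws and gets slow as more data is added.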
Here is this distribution for the rate of sign-up for method A, and since it’s the probability
distribution over likely parameter values that we got after having used the data, it’s what’s
usually called a posterior distribution. It should look familiar to you, as it is the same as
before, when we only had the data for method A. So again it seems likely that the rate of sign-up
for method A is somewhere between 20 and 60 percent, with it most likely being somewhere around 35
percent.
And here is the posterior distribution for method B, and just looking at it, it seems there is
some evidence that method B would result in more sign-ups, as the bulk of the distribution is
between 40 and 80 percent, with a sign-up rate most likely being around 65 percent.
But this is just us eyeballing the posterior distributions, and we really would like to calculate
some probabilities, say the probability that method B does have a higher rate of sign-up than
method A.
Fortunately this is very easy to do, as these posterior probability distributions are represented
by a long list of parameter draws.
So here are the numbers behind the two posterior distributions. I only show the first eight rows,
but there are many, many more rows in this table. Here each row is a pair of parameter draws that,
when plugged into the generative model, simulated data matching the actual real data. So the way
these parameter draws are distributed represents the uncertainty around what the rate of sign-up
could be.
Now, if we calculate new measures, and we do it separately for each row, then we retain this
uncertainty, and the resulting distributions of these new measures can also be interpreted as
posterior probability distributions, that is, what is known about these new measures given the
model and the data.
So what could such a measure be? Well, since we’re interested in which of method A and method B
gives the highest rate of sign-up, why not calculate the difference between rate A and rate B?
Using some R-like pseudocode it could look something like this, and when applied to each row it
would give us a new column for the distribution of the difference between method A and method B,
where a positive number would be in favor of method B.
So now we could take a look at this new derived distribution.
Just eyeballing it, we see that it is quite likely that method B has a higher rate of sign-up.
Almost all of the probability is to the right of the zero mark, with rate B being most likely
around 25 percentage points higher than rate A.
Again, since we are working with a table of parameter draws, it is very easy to calculate the
probability that rate B is higher than rate A. We simply sum up in how many rows the rate
difference was above zero, that is, how many times rate B was higher than rate A, and then we
divide by the total number of draws. This time we get that 92% of the rate difference distribution
is above zero, that is, there is a 92% probability that rate B is better than rate A.
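The row-wise post-processing can be sketched in Python as follows. To keep the example self-contained, the table of posterior draws is not produced by rejection sampling here. Instead it relies on an assumption not made in the talk: with uniform priors, this kind of sign-up model has a known beta posterior, so draws can be taken directly with `random.betavariate`. The variable names are illustrative.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is reproducible
n_draws = 100_000

# Stand-in for the table of posterior draws. Assumption: uniform priors, for which
# the posterior is Beta(1 + sign-ups, 1 + non-sign-ups).
rate_a = [rng.betavariate(1 + 6, 1 + 10) for _ in range(n_draws)]   # 6 of 16 signed up
rate_b = [rng.betavariate(1 + 10, 1 + 6) for _ in range(n_draws)]   # 10 of 16 signed up

# The new measure, calculated separately for each row: positive favors method B.
rate_diff = [b - a for a, b in zip(rate_a, rate_b)]

# Count the rows where rate B is higher than rate A and divide by the number of draws.
prob_b_higher = sum(d > 0 for d in rate_diff) / n_draws
print(f"probability that rate B is higher than rate A: {prob_b_higher:.2f}")
```

The result should land close to the 92% mentioned in the talk, with small differences from run to run because of simulation noise.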
To arrive at this probability we didn’t need to change the way we fitted the model. We could use
the same method as when we just had data for method A. All we needed to do was change the model
and add a prior and a generative model for method B, and then we just did some simple
post-processing of the posterior draws using basic arithmetic.
So another reason to use Bayesian data analysis is that it allows you to include information
sources in addition to the data, for example expert opinion.
Here is again the CEO of Swedish Fish Incorporated, and he’s come to tell us that the sign-up rate
has never been higher than 20%, not even in Norway, and it’s usually between 5% and 15%.
Now, I’m not really sure exactly how much we should trust our CEO. I mean, I think he is smoking
tobacco, but I don’t know. But for now let’s roll with this new information and see how we can
include this expert opinion in the model. Again, this is the model we have so far. I’ve forgotten
about method B for the time being, so now we’re back to just estimating the rate of sign-up for
method A.
So how can we include the CEO’s information? Well, a natural place to include it is in the prior:
what the model knows about the rate of sign-up before seeing the data.
What we need to do is to change the prior from a uniform prior, which basically says that any
rate between zero and 100% is equally likely, to a more informative distribution that favors
values between 5 and 15 percent. Now there are many ways to define custom prior distributions. We
could stitch together a couple of uniform distributions, where we put more probability on the
distribution covering 5 to 15%, or we could even draw a probability distribution with pen and
paper and scan it in. But often the easiest solution is to use a standard probability distribution
that is flexible enough to represent the information that we have, and that we will tweak until it
represents that. For us, a good choice would be the beta distribution. The beta is a continuous
distribution bounded between 0 and 1, which is good because the rate of sign-up can’t be less than
0% nor more than 100%. It has two parameters, alpha and beta, that allow it to take all the forms
depicted here. For example, when alpha and beta are both one it becomes a uniform distribution.
The larger the alpha and beta parameters are, the more heap-shaped and peaked it will become.
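Since the slide with the different beta shapes is not part of the transcript, here is a small Python sketch of the two properties just mentioned. The helper names `beta_pdf` and `beta_sd` and the example parameter values are made up for illustration; the formulas are the standard density and standard deviation of the beta distribution.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of the beta distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x))

def beta_sd(alpha, beta):
    """Standard deviation of the beta distribution."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total ** 2 * (total + 1)))

# With alpha = beta = 1 the density is flat: every rate is equally likely.
flat = [beta_pdf(x, 1, 1) for x in (0.1, 0.5, 0.9)]
print("Beta(1, 1) density at 0.1, 0.5 and 0.9:", flat)

# Larger alpha and beta concentrate the probability: the spread shrinks
# and the density at the peak grows.
for a, b in [(1, 1), (5, 5), (50, 50)]:
    print(f"Beta({a}, {b}): sd = {beta_sd(a, b):.3f}, density at 0.5 = {beta_pdf(0.5, a, b):.2f}")
```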
So here is the uniform distribution we’re using right now as the prior for the sign-up rate. A
uniform prior is sometimes called a non-informative prior, as it really doesn’t contain that much
information with regard to what the sign-up rate could be.
And here is a proposal for what a more informative prior could be. This is a beta distribution
with the alpha parameter set to 3 and the beta parameter set to 25, but the specific parameter
values really don’t matter here. What matters is what shape the distribution has.
And here I wanted to capture the information from our CEO that the rate of sign-up usually is
between 5 and 15 percent, so this informative prior puts most of the probability between 5 and 15
percent, but does not rule out the possibility that the sign-up rate could be up to 30 percent.
You could certainly capture the CEO’s information in many other ways, but this is what we’re going
to roll with. So this is our new model. It’s the same as before, but now with the informative
prior on top, and the cool thing again is that we don’t need to change the computational part of
how we fit this model. We can use the same procedure as before. The only difference is that we
will draw the parameter draws from our new informative prior distribution instead of from the
uniform distribution as before.
Here is a distribution you should recognize. It’s the posterior probability distribution of the
likely rate of sign-up using the uniform, non-informative prior. And here is what we got using the
new informative prior.
Looking at it, it seems that after having used the info from the CEO and the info from the data,
it is most probable that the rate of sign-up is between 10 and 30%. So the information in the data
points to the rate of sign-up being somewhere around 40%, and the CEO stated that it’s usually
around 5 to 15%, so it shouldn’t come as a surprise that the resulting posterior distribution
looks like a mix between these two information sources.
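To illustrate that only the prior changes while the fitting procedure stays the same, here is a hedged Python sketch that runs the same rejection procedure twice for method A, once with a uniform prior and once with the Beta(3, 25) prior. The function names `simulate_signups` and `abc_posterior` and the number of draws are illustrative choices, not taken from the talk.

```python
import random

rng = random.Random(4)  # fixed seed so the sketch is reproducible
n_asked, observed = 16, 6   # method A data from the talk
n_draws = 100_000

def simulate_signups(rate, n, rng):
    """Generative model: each of n people signs up with probability `rate`."""
    return sum(rng.random() < rate for _ in range(n))

def abc_posterior(draw_from_prior):
    """Draw from the prior, simulate, and keep draws whose data match the real data."""
    kept = []
    for _ in range(n_draws):
        rate = draw_from_prior()
        if simulate_signups(rate, n_asked, rng) == observed:
            kept.append(rate)
    return kept

# Same fitting procedure, two different priors.
posterior_uniform = abc_posterior(rng.random)
posterior_informed = abc_posterior(lambda: rng.betavariate(3, 25))

mean_uniform = sum(posterior_uniform) / len(posterior_uniform)
mean_informed = sum(posterior_informed) / len(posterior_informed)
print(f"uniform prior:     kept {len(posterior_uniform)} draws, posterior mean {mean_uniform:.2f}")
print(f"Beta(3, 25) prior: kept {len(posterior_informed)} draws, posterior mean {mean_informed:.2f}")
```

The posterior mean under the informative prior ends up between what the data alone suggests and what the CEO stated, which is the "mix" of the two information sources.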
Now, if we had more data, the information in the prior would have less and less influence. With
enough data the prior wouldn’t matter at all. Similarly, if we had less data, the posterior would
look more like the prior, and if we had no data at all the posterior would be the same as the
prior.
Now we are in a slightly confusing situation, however, in that we have run two different models
and have two different results from the same data set, and at some point we should decide whether
we want to go with the non-informative prior or the prior from the CEO.
But it’s totally fine to try out different models and different priors, and it can be worthwhile
to try out an informative prior, because “if you’re not using an informative prior, you’re leaving
money on the table”, as Robert Weiss puts it.
That is, if you’re not using an informative prior you’re really leaving out information that you
have from the analysis, which seems like a waste.
All right, a third reason why Bayesian data analysis is useful is that the result of a Bayesian
analysis retains the uncertainty of the estimated parameters, which is very useful in prediction
and decision analysis. Here, decision analysis is when you take the results of an analysis and
bring them closer to what you care about. Usually you don’t ultimately care about the parameter
values. What you care about is often things like money, and what decision to make to get more of
it, or what you could do to avoid different types of loss.
We never seem to get rid of our CEO, and here he is again. He asks us: “So what should we do? And
by the way, marketing forgot to tell you that the cost of sending a brochure is 30 kronor, the
cost of sending a salmon is 300 kronor, and if a person signs up we make a thousand kronor on
average.”
Okay, so what should we do? Here are the two methods that we are considering: A, sending a
colorful brochure, or B, sending a brochure and a sample frozen salmon. And this is the result we
got after having fitted the model with the data from both method A and method B. We did that
before, remember?
And while this showed that it is probable that method B has a higher rate of sign-up, it doesn’t
directly tell us what to do, because we’re not really interested in the rate of sign-up. We’re
really interested in which method will give us the most money, and while method B seems to have a
higher rate of sign-up, it also involves sending out costly sample salmons.
But since we did a Bayesian analysis and we have access to the raw draws behind these two
probability distributions, it’s very easy to do a quick decision analysis to figure out which
method will probably give us the most money.
So to the left here we have the first eight rows from the many, many draws that make up these two
posterior probability distributions.
And the distribution of these draws represents the uncertainty regarding what the underlying
rates of sign-up are for these two methods.
And remember that any calculation we perform row-wise here will give us a new posterior
probability distribution that retains this uncertainty.
So some reasonable things to calculate here would be the expected profit when using method A,
which is the rate of sign-up times the thousand kronor we make per sign-up, minus the cost of
sending the brochure.
So for the first row that would be about 33% times 1,000 kronor, which means we would make 331
kronor on average, minus the 30 kronor the brochure costs.
So an expected profit of 301 kronor per sent brochure, and so on for all the rows.
And similarly we can calculate the expected profit for method B, which is almost the same
calculation, but now minus 300 kronor for the salmon.
And finally, since we’re interested in which of these two methods would give the higher profit, we
will calculate the difference in profit between the methods, where a positive difference here
means method B is better. Just looking at these first eight rows, we see that five out of eight
rows are actually favoring method A, but of course we should look at the profit difference
distribution for all the draws.
So here we see that there is much uncertainty regarding which method would give the highest
profit.
It could be that method B is better, but if we count up how many draws are in favor of method A,
we actually get that there is a 61 percent probability that method A would result in better
profits.
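The decision analysis can be sketched in Python along the same lines. As a labeled shortcut that is not part of the talk, the posterior draws are taken directly from the beta posteriors that uniform priors give for this model, instead of being produced by rejection sampling. The costs and the revenue per sign-up are the figures stated by the CEO; the variable names are illustrative.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is reproducible
n_draws = 100_000

# Stand-in for the posterior draws (assumption: uniform priors, so beta posteriors).
rate_a = [rng.betavariate(1 + 6, 1 + 10) for _ in range(n_draws)]   # 6 of 16 signed up
rate_b = [rng.betavariate(1 + 10, 1 + 6) for _ in range(n_draws)]   # 10 of 16 signed up

revenue_per_signup = 1000   # kronor made on average per sign-up
cost_a = 30                 # kronor, sending a brochure
cost_b = 300                # kronor, sending a salmon (the figure subtracted in the talk)

# Row-wise expected profit for each method, and the difference between them.
profit_a = [r * revenue_per_signup - cost_a for r in rate_a]
profit_b = [r * revenue_per_signup - cost_b for r in rate_b]
profit_diff = [b - a for a, b in zip(profit_a, profit_b)]   # positive favors method B

prob_a_better = sum(d < 0 for d in profit_diff) / n_draws
expected_diff = sum(profit_diff) / n_draws
print(f"probability that method A gives the higher profit: {prob_a_better:.2f}")
print(f"expected profit difference (B minus A): {expected_diff:.0f} kronor")
```

The probability should come out close to the 61 percent mentioned in the talk, which is far from certain and fits the point that the available data leaves much uncertainty.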
So if we had to decide, this small decision analysis tells us that we should go for method A, even
if method B has a higher rate of sign-up.
But the main takeaway here should really be that, given the data that we have, there is much
uncertainty, and we really would need some better data before making a decision.
All right, so we went from estimated rate parameters to a posterior probability distribution of
the likely difference in profit between these two methods.
And I hope you saw how easy that was, since we started from the result of a Bayesian analysis,
that is, probability distributions represented as a long table of parameter draws.
If we instead had used classical statistical methods like maximum likelihood estimation, we would
just have gotten point estimates, which we wouldn’t be able to post-process into something that
informed us about the expected profit and that included some measure of uncertainty or certainty
regarding the expected profits.
But with Bayes it was pretty simple.
So a last reason to use Bayesian data analysis (there are many more reasons, but a last reason to
use it) is because you probably already are.
What I mean here is that a lot of classical statistical procedures that you might already be
familiar with, such as classical linear regression or the bootstrap, can be interpreted as a
Bayesian model with priors and a generative model, and the same is true for many machine learning
procedures.
And while you don’t have to interpret the statistical models you use from a Bayesian perspective,
it helped me better understand what many statistical procedures actually do.
Not least, a Bayesian perspective was super useful for me when understanding how mixed models and
hierarchical models work, which are simple and straightforward from a Bayesian perspective but
slightly mysterious from a classical perspective.
So those were some reasons for why to use Bayesian data analysis.
Let’s look at some reasons for why not to use Bayesian data analysis. Maybe everything is working
fine as it is and you’re just happy with your tools and your workflow. Then you might not need
Bayesian data analysis. Or maybe you’re not that interested in uncertainty. There are many good
machine learning tools that just give predictions, but with no indication of these predictions’
uncertainty, and if that is what you want, maybe you don’t need Bayes. Or maybe Bayesian
statistics is too computationally demanding. Maybe you would want to fit a Bayesian model, but
your data set is so large that it just takes too long. Or maybe you just feel that Bayesian
statistics takes too much work to set up; even if you would want to try, the cost-benefit
situation doesn’t allow it. These are perfectly good reasons not to use Bayes, and what I wanted
to say with this slide is that Bayesian data analysis is just one tool out of many in your data
science tool belt, and while it can be a very useful tool, it’s not the be-all and end-all of data
analytical methods, even though it’s sometimes presented as such.
So that concludes part two of this three-part introduction to Bayesian data analysis. If you want
to try out what we talked about here, you could go back to your solution to the exercise in part 1
and change it according to the CEO’s requests. That is, try adding an informative prior to the
model, change the model so that it can accommodate data from both method A and method B, and you
can also try replicating the decision analysis where we looked at the expected profit of each
method. However, if you try this you could run into some trouble, because when you add the second
data source to the model you might find that it takes a really long time to run. That’s because
the method we used to fit Bayesian models in part one, approximate Bayesian computation, was
conceptually simple but also extremely slow. So in part three of this introduction I will give you
some hints on how you can do speedy Bayesian computation.
And especially we will look at a useful tool called Stan. But for now, I’m Rasmus Bååth, and
thanks for staying with me to the end.

2 Introduction to Bayesian data analysis - part 3: How to do Bayes?

Hallå, hello as we say in Sweden. I’m Rasmus Bååth, and welcome to this last part of a three-part
introduction to Bayesian data analysis, which will go into the how of Bayesian data analysis: how
to do it efficiently and how to do it in practice. But I’ll warn you, we will just make the
tiniest scratch on the far-reaching surface that is Bayesian computation.
The goal is just for you to get some familiarity with words like Markov chain Monte Carlo and
parameter space, and to be able to run a simple Bayesian model using a powerful Bayesian
computational framework called Stan. Also, if you haven’t watched part 1 and part 2 yet, and
especially if you didn’t do the exercise in part 1, I really recommend that you do that first,
because what’s going to follow isn’t going to make much sense otherwise. Also, this part 3 is
quite a lot longer than the other parts, so feel free to take a break at any time.
So what is the problem with how we have performed Bayesian data analysis so far? Well, we’ve
really coded things from scratch, and from my experience that is slow and error-prone. And it’s
especially slow because we’ve been doing approximate Bayesian computation, which is the most
general and conceptually simplest method, but also the slowest method for fitting a Bayesian
model.
But there are many faster methods for fitting Bayesian models, which are faster because they take
computational shortcuts, and these faster methods have in common that they require that the
likelihood that the generative model will generate any given data can be calculated rather than
simulated.
So what we did when we did approximate Bayesian computation was that we defined a generative
model function that took a fixed parameter value and simulated some data, and we figured out the
likelihood of the generative model simulating the actual data by running it many, many times and
then counting how many times it produced data matching the actual data. All this simulation can
be very, very time-consuming. Faster methods require a function like this, one that takes both
data and fixed parameters as input and directly calculates the likelihood of these parameters
producing the data. It’s not always straightforward to calculate the likelihood for any generative
model, but for a large number of generative models someone has already done this work for you. For
example, for most common probability distributions the likelihood is easy to calculate.
Another thing faster methods have in common is that they explore the parameter space in a smarter
way. Rather than just sampling from the prior as we have done, faster methods try to find and
explore the regions in parameter space that have higher probability

