Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag 😉
Takeaways:
- State space models and traditional time series models are well-suited to forecast loss ratios in the insurance industry, although actuaries have been slow to adopt modern statistical methods.
- Working with limited data is a challenge, but informed priors and hierarchical models can help improve the modeling process.
- Bayesian model stacking allows predictions from different candidate models to be blended together, taking the best of both worlds (or of all of them, when more than two models are combined).
- Model comparison is done using out-of-sample performance metrics, such as the expected log pointwise predictive density (ELPD). Brute-force leave-future-out cross-validation is often used because of the time-series nature of the data.
- Stacking or averaging models are trained on out-of-sample performance metrics to determine the weights for blending the predictions. Model stacking can be a powerful approach for combining predictions from candidate models. Hierarchical stacking in particular is useful when weights are assumed to vary according to covariates.
- BayesBlend is a Python package developed by Ledger Investing that simplifies the implementation of stacking models, including pseudo Bayesian model averaging, stacking, and hierarchical stacking.
- Evaluating the performance of Bayesian time series models requires considering multiple metrics, including log-likelihood-based metrics like ELPD, as well as more absolute metrics like RMSE and mean absolute error.
- Using robust variants of metrics like ELPD can help address issues with extreme outliers. For example, t-distribution estimators of ELPD as opposed to sample sum/mean estimators.
- It is important to evaluate model performance from different perspectives and consider the trade-offs between different metrics. Evaluating models based solely on traditional metrics can limit understanding and trust in the model. Consider additional factors such as interpretability, maintainability, and productionization.
- Simulation-based calibration (SBC) is a valuable tool for assessing parameter estimation and model correctness. It allows for the interpretation of model parameters and the identification of coding errors.
- In industries like insurance, where regulations may restrict model choices, classical statistical approaches still play a significant role. However, there is potential for Bayesian methods and generative AI in certain areas.
Chapters:
00:00 Introduction to Bayesian Modeling in Insurance
13:00 Time Series Models and Their Applications
30:51 Bayesian Model Averaging Explained
56:20 Impact of External Factors on Forecasting
01:25:03 Future of Bayesian Modeling and AI
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.
Links from the show:
- Nate’s website: http://haines-lab.com/
- Nate on GitHub: https://github.com/Nathaniel-Haines
- Nate on Linkedin: https://www.linkedin.com/in/nathaniel-haines-216049101/
- Nate on Twitter: https://x.com/nate__haines
- Nate on Google Scholar: https://scholar.google.com/citations?user=lg741SgAAAAJ
- LBS #14 Hidden Markov Models & Statistical Ecology, with Vianey Leos-Barajas: https://learnbayesstats.com/episode/14-hidden-markov-models-statistical-ecology-with-vianey-leos-barajas/
- LBS #107 Amortized Bayesian Inference with Deep Neural Networks, with Marvin Schmitt: https://learnbayesstats.com/episode/107-amortized-bayesian-inference-deep-neural-networks-marvin-schmitt/
- LBS #109 Prior Sensitivity Analysis, Overfitting & Model Selection, with Sonja Winter: https://learnbayesstats.com/episode/109-prior-sensitivity-analysis-overfitting-model-selection-sonja-winter/
- BayesBlend – Easy Model Blending: https://arxiv.org/abs/2405.00158
- BayesBlend documentation: https://ledger-investing-bayesblend.readthedocs-hosted.com/en/latest/
- SBC paper: https://arxiv.org/abs/1804.06788
- Isaac Asimov’s Foundation (Hari Seldon): https://en.wikipedia.org/wiki/Hari_Seldon
- Stancon 2023 talk on Ledger’s Bayesian modeling workflow: https://github.com/stan-dev/stancon2023/blob/main/Nathaniel-Haines/slides.pdf
- Ledger’s Bayesian modeling workflow: https://arxiv.org/abs/2407.14666v1
- More on Ledger Investing: https://www.ledgerinvesting.com/about-us
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
In this episode, I am thrilled to host Nate Haines, the head of data science research at Ledger Investing, who holds a PhD from Ohio State University. Nate's expertise in generative Bayesian modeling helps tackle the challenges in insurance-linked securities, especially with issues like measurement errors and small data sets. He delves into his use of state-space and traditional time series models to effectively predict loss ratios and discusses the importance of informed priors in these models. Nate also introduces the BayesBlend package, designed to enhance predictive performance by integrating diverse model predictions through model stacking. He also explains how they assess model performance using both traditional metrics like RMSE and innovative methods like simulation-based calibration, one of my favorites, to ensure accuracy and robustness in their forecasts. So join us as Nate unpacks the complexities of Bayesian modeling in the insurance sector, revealing how advanced statistical techniques can lead to more informed decision-making.

This is Learn Bayesian Statistics, episode 115, recorded June 25, 2024.
Welcome to Learn Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs.com.
Nate Haines, welcome to Learn Bayesian Statistics.

Thanks for having me, very excited to be here.

Same. Very, very excited to have you here. Also because a lot of patrons of the show have requested you to be here. One of the most convincing was Stefan Lorentz. I'm pronouncing that the German way because I think he's from somewhere around there. Maybe he's Austrian or Swiss, and then he hates me right now. But yeah, Stefan, thank you so much for recommending Nate on the show. And I hope you'll appreciate the episode. If you don't, this is entirely my fault, and not Nate's at all.

Yeah, well, I appreciate the shoutout.

Yeah, no, he was really enthusiastic. I'll tell you what he told me. It was: "For a while now, I've been thinking about an interesting hook to recommend Nathaniel Haines, someone who is not, like so many of my previous recommendations, currently in academia."

Yeah. Yeah.

"And today it seems to have presented itself. He just released a Python library for Bayesian model averaging, a very practical topic that hasn't been discussed yet in any episode." You know, he was really, really happy about your work.

Very cool. Yeah, that's all I wanted to hear.

Yeah, and we're definitely going to talk about that today, model averaging and a lot of cool stuff on the deck. But first, can you tell us basically what you're doing nowadays, if you're not in academia? Especially since you live in Columbus, which I think is mostly known for Ohio State University.
Right, yeah. We think of ourselves as a flyover state. Well, others think of us as that; we like to think that we're much cooler and hipper and all that. Yeah, so for the last few years I've been working as a data scientist remotely, and I've been at two different startups during my time in the last few years after graduating from my PhD. My PhD focused on clinical mathematical psychology, which is really where I did a lot of Bayesian modeling, where I learned a lot of Bayesian modeling, and which led me to where I am today at my current company. I'm with Ledger Investing. We are sort of in between what I would call the insurance industry and finance, in a way. We are not an insurance company, but that's the data that we deal with, and so a lot of our modeling is focused on that. At Ledger, I'm the manager of data science research, so a lot of my work focuses on building new models, productionizing those models, and finding out different ways to incorporate models into our production workflow. And yeah, I'll be happy to dive into more detail about that, or about how I got here, because I know it's something I've talked to a lot of people about too, especially on the academic side. The transition to industry itself can be a little bit opaque, but then also Bayes in industry is like, "I didn't know there were people doing a lot of that." So yeah, I'm excited to talk in more detail about all of that.

For sure.
Actually, how did you end up working on these topics? Because Bayes is already a niche, and then specializing in something Bayesian is even more niche. So I'm really curious about how you ended up doing that.

Yeah, just Bayes in general. So I actually got exposed to Bayes pretty early on. I guess you could say I have a weird background as a data scientist, because I did my undergrad degree in psychology and I just did a BA, so I didn't really take much math in undergrad. But I got involved in a mathematical psychology lab, a research lab, later on in my undergraduate degree. And this was run by Tricia Van Zandt at Ohio State. So actually, I grew up in Columbus and I've been here ever since. But I started to work with her and some grad students in her lab. And they were all, the best way to put it, hardcore Bayesians. They did a lot of mathematical modeling of more simple decision-making tasks. And by simple, I guess I just mean response time types of tasks. So they did a lot of reaction time modeling, which has a pretty deep history in psychology. And they were all Bayesians. That was the first time I saw the word. I remember seeing that word on a grad student's poster one time. Like, what is that? And so I got exposed to it a bit in undergrad.
And then, as I was going through undergrad, I knew I wanted to go to grad school. I wanted to do a clinical psychology program. I was really interested in cognitive mechanisms, things involved with mental health. And I got really lucky, because there was an incoming faculty member at Ohio State, Wooyoung Ahn, that's his name, and he was the one who brought a lot of that to Ohio State. He didn't end up being there for too long; he's now at Seoul National University. But I worked with him for a year as a lab manager. And in that first year, he really wanted to build some open-source software to allow other people to do decision-making modeling with psychological data. And the way to do that was to use hierarchical Bayes. So I kind of got exposed to all of that through my work with Young. Yeah, we did a lot of that work in Stan, and that was kind of the first time I really worked on it myself. But I'd kind of known about it and knew about some of the benefits that Bayes can offer over other philosophies of statistics. And that started pretty early on in grad school.
So I think I'm probably a weird case, because I didn't really have traditional stats training before I got the Bayes training. A lot of my perspective is very much that I think in terms of generative models, and I didn't have to unlearn a lot of frequentist stuff, because my understanding by the time I really started diving into Bayes was pretty rudimentary on the frequentist side. And so, naturally, I got really involved in some of that open-source work during graduate school on the package we released called hBayesDM, which is a mouthful, but really good for search, because nothing else pops up if you search hBayesDM. That was kind of my first foray into open-source Bayesian modeling software. And eventually I decided that I really liked doing this methods stuff; I was more focused on the modeling side of my work than on the domain per se. And I had a really interesting kind of track into industry. I wasn't actually originally pursuing that. That wasn't my intention, but I just got a cold email one day, the summer I graduated, from a co-founder at my previous company, which is called Avo Health. They were looking for someone who did something very particular, which was hierarchical Bayesian modeling, and who was familiar with psychological data. So I kind of fit the bill for that, and I decided that it'd be worth a shot to try it out. And I've been in industry ever since. So, yeah, what really got me into it originally was just being in a context surrounded by people doing it, which I don't think most people get to experience, because Bayes is still rather niche, like you said. But at least in the circles that I was in, in undergrad and grad school and things like that, it was just kind of the way to do things. And so I think that's colored my perspective of it quite a bit and definitely played a big role in why I ended up at Ledger today.

Yeah, super cool.
Yeah, and I definitely can relate to that background, in the sense that I too was introduced to stats mainly through the Bayesian framework. So thankfully, I mean, that was hard, but that was not as hard as having to forget everything again. And so...

Right, right.

It was great. I remember being very afraid when I opened a classic statistics book and was like, my God, how many tests are there? It's just terrible.

No, exactly. And it's hard to see how things connect together, yeah.

No, I was not liking stats at all at that point. And then thankfully, I did electoral forecasting, and you kind of have to do Bayes in these realms. You know, that was really cool. One of the best things that ever happened to me.

Exactly. So you're forced into it from the start. It doesn't give you much choice. And then you look back and you're happy that that ended up happening.

Exactly. Yeah. And actually, you do quite a lot of time series models, if I understood correctly. So yeah, could you talk a bit about that? I'm always very interested in time series and forecasting models, and in how useful they are in your work.

Yeah, yeah.
So I think, maybe first to start, I can give a bit of context on the core problem we're trying to solve at Ledger, and that'll help frame what we do with the time series models. Basically, what we provide is an alternative source of capital for insurance companies. It's like, if I wanted to start an insurance company, I'd have to have a ton of money in the bank so that, if something goes wrong and I write a bunch of policies for private auto, for example, for car insurance, I'm able to make people whole when an accident happens. And so when insurers are trying to fund different books of business, they often need to raise lots of capital for that. Traditionally, one of the methods that they have used to accomplish this is to approach reinsurers, which I didn't know anything about before I joined Ledger. I'm kind of an insurance newbie at Ledger. Well, now it's been a couple of years, so I can't say that anymore. But basically, you go to someone with even more money to provide the capital and allow them to write their business.

And so we basically work with insurers or other similar entities, and then investors, and allow the investors access to this insurance risk as an asset class. And then, from the perspective of the insurance side, they're getting the capital that they need to fund their programs. So the insurance companies like it because it's the source of capital they need to do business. The investors like it because they get to invest in how these portfolios of insurance programs are performing, as opposed to, say, investing in an insurance company's stock or something like that. And so it's a little bit more uncorrelated with the market, in terms of the other types of assets that investors might have access to. And so that's kind of the core problem.
Our data science team, that's the context that we're baked within. And what we're actually modeling, the thing we're trying to solve, is this: say an insurance company approaches us and they have a commercial auto or a private auto program, or a workers' compensation program. A lot of times they'll have been writing that kind of program, they've been in that business, for five, ten years or something. So they have historic data. And the way we look at the data is that you have different accident years. If we're looking at it today, in the year 2024, maybe they have ten years of business that they've been writing. So we look back all the way to 2014, 2015, 2016, and we see how much they have lost, how much in claims they have paid out versus the premium they have taken in. And so this quantity, the loss ratio, is really important. In a lot of areas of business it's around 60%. And this is before you're paying salaries and all of that, just the pure insurance side; around 60% might be pretty typical or pretty reasonable for a good-ish program. That's an overgeneralization, but just to keep some numbers in mind.

The interesting thing about this, though, is that we look back at 2014 and we have ten years of history, so we kind of know what the loss is for 2014 for a program that comes to us today. But what about, say, 2023, right? There's only been a year since then. And the way that insurance often works, if you've ever had to file a claim for homeowners or car insurance, something like this, you're probably familiar: it can take quite a while for you to get paid out. Sometimes there are lawsuits, sometimes people don't file a claim until years later. There can be a lot of different reasons that the information we have today about losses in any given year isn't complete, or, the way that we think about it, the loss ratio isn't developed. And so the data that we have takes the shape of this triangle, and we call them loss triangles. You can think of a matrix where the Y axis is the accident years, the different years that accidents are occurring, and the X axis is how much time has passed since that accident year; we call that the development period, or something similar. So for 2014 we have ten cells, ten years of data that we can look back on; for 2015 we have nine, and so on and so forth. And so it kind of forms this triangle.
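To make that shape concrete, here is a minimal, purely illustrative sketch of a loss triangle in Python. The values are simulated, not real program data; the roughly 60% ultimate loss ratio is just the ballpark figure mentioned above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

accident_years = np.arange(2014, 2024)   # 10 accident years
dev_periods = np.arange(1, 11)           # development periods (years since the accident year)
n = len(accident_years)

# Simulated cumulative loss ratios that "develop" toward an ultimate value near 60%.
ultimate = rng.normal(0.60, 0.05, size=n)
reported = ultimate[:, None] * (1 - np.exp(-0.7 * dev_periods[None, :]))

# Mask the future: accident year i has only n - i development periods observed,
# which is exactly what produces the triangular shape.
triangle = pd.DataFrame(reported, index=accident_years, columns=dev_periods)
for i in range(n):
    triangle.iloc[i, n - i:] = np.nan

print(triangle.round(3))  # 2014 has 10 observed cells, 2023 has only 1
```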
And so basically, for us to price these deals, what we end up needing to do is two things, and there are two basic modeling steps involved. The first is to find out where we think the loss ratio is going to end up for all of these accident years, as if we were looking back a hundred years from now. We want to know what that ultimate state of the loss ratio is, and that's the first part where some time series models come into play. We have this kind of weirdly shaped data, and we want to extrapolate out from the historical data; the accident year we're looking at is static, but the information that we have on it is dynamic, and we can learn more about it as time goes on. So our first task is to predict that ultimate state, and that just gives us a more accurate representation of what we think the history will look like. That's our first step.

And then the second step is where we use much more traditional time series models. The second step is to say, okay, given that history of the ultimate state for each previous year, what are the next two, three years going to look like? That's where we have more traditional forecasting models. But because we have this multi-stage process, there's uncertainty from one model's output that we need to account for in that second stage, and so we do some measurement error modeling and things like that. And that, at the end of the day, is really why Bayes ends up being such a useful tool for this problem: there are lots of sources of uncertainty. There's a rich history of actuarial science where actuaries have developed models to solve similar problems, so there are theory-informed models, and there's historic data that we can use. So we get to use really everything in the Bayes toolbox. We get to use priors, we get to use very theory-informed generative models, and then we also get to do some fun things like measurement error modeling between the various stages of the modeling workflow that we follow. I know this is a long explanation, but I think the context is helpful for understanding why we approach it that way and why we think Bayes is a useful way to do so.
Yeah. Thanks a lot for that context. I think it's very useful, because then I want to ask you which kinds of time series models you mostly use for these use cases, and what are some of the most significant challenges you face when dealing with that kind of time series data?

Yeah, it's a good question. For the time series models, we do a lot of state space modeling. We do a lot of research on exploring different forms of models, but for the stuff that we end up using in production, in that first stage where we do our development process, those models are more similar to the classic actuarial science models. So they technically are time series models, but they're kind of these non-parametric models where we're just estimating: say your loss is 20% during the first development period; can we estimate some parameters such that, if you multiply that by some factor, it gives us the next period? So there are these link-ratio-style models that we use in that context. Those are a little less traditional, but more in line with what actuaries have historically done.

And then for the forecasting piece, that's where we use more modern models, or not classical, but more what you would imagine when you think of time series models today: things like autoregressive styles of models. We do, like I said, state-space models, where we assume that these loss ratios are really this latent, drifting parameter over time, and then we have the latent dynamics paired with some observational model of how those losses are distributed. And a lot of times in the investment or finance world, people talk about whether they think some sort of asset is mean reverting, or whether it shows some sort of momentum in the underlying trends. And so we have different models that capture some of those different assumptions.
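As a rough illustration of that latent-drift-plus-observation structure, here is a minimal local-level state-space model in PyMC. It is a sketch with invented numbers, not Ledger's production model; momentum or mean-reversion variants would modify the state equation, and this is also where the measurement error from the first modeling stage can enter.

```python
import numpy as np
import pymc as pm

# Stage-one "ultimate" loss ratio estimates for 10 accident years, plus their
# standard errors (recent years are much more uncertain). Values are invented.
y = np.array([0.62, 0.58, 0.65, 0.61, 0.59, 0.63, 0.66, 0.60, 0.64, 0.70])
y_se = np.array([0.01, 0.01, 0.01, 0.02, 0.02, 0.03, 0.04, 0.06, 0.08, 0.10])

with pm.Model() as local_level:
    # State equation: the "true" loss ratio drifts across accident years.
    sigma_drift = pm.HalfNormal("sigma_drift", 0.05)
    theta = pm.GaussianRandomWalk(
        "theta",
        sigma=sigma_drift,
        init_dist=pm.Normal.dist(0.60, 0.10),  # where an informed prior would enter
        steps=len(y) - 1,
    )
    # Observation equation: each stage-one estimate is a noisy measurement of
    # the latent loss ratio, with its own (known) measurement error.
    pm.Normal("y_obs", mu=theta, sigma=y_se, observed=y)

    idata = pm.sample(1000, tune=1000, chains=4, random_seed=1)
```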
Actually, pretty interestingly, people all across the business tend to be pretty interested in whether a model has mean reversion in it or momentum in it. And that becomes a question that a lot of investors and insurance companies alike are interested in, because they might have disagreements about whether or not that component should be in the model. But I'd say those types of time series models, your traditional state space models, are what we use most regularly in production.

In terms of the challenges we face, I think the big challenge is that, based on the context that I was providing, we might have ten years of history on a program, and that would be a good outcome. So if our time series is ten previous data points, where some of the more recent ones are highly uncertain because they're actually predictions from a previous model, I think you might start to imagine where the issues can arise. I would say that's probably our biggest challenge: the data that we work with from a given program. The numbers are big, because we're talking about investment amounts of dollars; a program might write 10 to 100 million dollars of premium, so the loss values are pretty high themselves. There's a lot of information there, but the history that we have to build a time series model on is pretty short. With a lot of classical time series approaches, there's quite a bit more data that people are working with. You'll hear about things with seasonality and other types of things where you're decomposing a time series, and we don't really have the ability to do any of those classical modeling approaches, mostly because we don't have the history for it.

So one of the ways that we approach that problem, to help solve it at least to some extent, is that we do have information on many, many different insurance companies and their losses historically. Even if the history may not be very long (we might have at maximum 30, 40 years of history, 50 years sometimes, so basically 50 individual data points in a time series model, and typically we have much less), one of the things that happens in the insurance industry is that all of these companies need to publicly release certain information each year. And so we're able to take that information and use it to help us obtain informed, data-informed priors, so that when a smaller program comes our way and we are using our time series models on it, we have priors that have been pretty fine-tuned to the problem: priors that are fine-tuned to that particular line of business, whether it's commercial auto or workers' compensation or something like that. I'd say that's our biggest challenge, that small-data kind of problem, and then Bayes with informed priors is a way that we're able to tackle it in a more principled way.
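One crude way to picture that idea (this is not Ledger's actual workflow, and the industry data here is simulated) is to fit an industry-level model first and then reuse its posterior as the prior for a small program:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)

# Hypothetical ultimate loss ratios for many commercial auto programs, as could
# be assembled from public filings. Simulated here purely for illustration.
industry_lr = rng.normal(0.62, 0.08, size=500)

# Step 1: industry-level model for one line of business.
with pm.Model():
    mu = pm.Normal("mu", 0.6, 0.5)
    sigma = pm.HalfNormal("sigma", 0.5)
    pm.Normal("lr", mu=mu, sigma=sigma, observed=industry_lr)
    industry_idata = pm.sample(1000, tune=1000, chains=4, random_seed=2)

# Turn the posterior into data-informed prior hyperparameters.
mu_hat = float(industry_idata.posterior["mu"].mean())
sigma_hat = float(industry_idata.posterior["sigma"].mean())

# Step 2: a small program with only three accident years of data gets that
# fine-tuned prior, so its estimate is shrunk toward the line of business.
program_lr = np.array([0.55, 0.71, 0.64])
with pm.Model():
    theta = pm.Normal("theta", mu_hat, sigma_hat)
    pm.Normal("obs", mu=theta, sigma=0.05, observed=program_lr)
    program_idata = pm.sample(1000, tune=1000, chains=4, random_seed=3)
```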
Yeah, yeah. That makes a ton of sense. And those sound like very fun models to work on.

Yeah.

I really love that. So, state space models: are you mainly talking about HMMs, things like that?

Yeah, of the same form. We've done some experimenting with Gaussian processes. Actually, a colleague of mine is about to submit a paper doing some work with HMMs, hidden Markov models, for anyone who's listening who might not know what that acronym stands for. But typically, the models that we use end up being even simpler than that for our forecasting problem, mostly because we have such small data to work with. Oftentimes, the functional form of the model can't be too complex. So they end up being more similar to your typical ARIMA-style models, which you can write in a state space fashion. It tends to be more closely related to those than to models that do regime switching, things like that, because oftentimes we just don't have enough information to be able to fit those types of models; even with informed priors, it might not be as believable.
That being said, it's, one of those things that like, if we,
316
If we do think that something might work well, like if we think that, you know, adding a
more complicated mechanism on the momentum piece of the model or, or adding in different
317
assumptions about mean reversion and things like that, we, we typically do explore those
types of things, but surprisingly hard to beat simple time series models with the small n
318
in our context.
319
And so we, we do, we do quite a lot of
320
cross validation to determine which types of models we should be using in production.
321
And oftentimes it's a mix of evaluating those models based on their performance, but also
how well calibrated they are and things of that nature so that we know that the models
322
we're using are interpretable and we can kind of defend them if something ends up going
sideways.
323
We want to be able to go to the investor and say, like, you know, we did our due diligence
and here's why we think this was still a good choice at the time.
324
I'm not sure if that gets at that question, but let me know if I can expand on the models
in particular.
325
No, mean, it's funny you'd say that because definitely it's hard to beat, it's
surprisingly hard to beat regression in a lot of contexts.
326
If you do a generalized regression, that's already a very good baseline and that's
327
pretty hard to beat.
328
So I'm not supposed to hear that's the same here.
329
Right.
330
And I think part of the issue too with our data is that like the more recent observations
in the time series have this high uncertainty along with them.
331
So with the measurement error component in there, it's difficult to choose between
different model configurations.
332
And so the more complicated your forecasting model is, that uncertainty ends up kind of
333
making it even harder for a more complex model to win out in our tests.
334
And so that's one of the things that we've observed in something that I think probably
anyone who's been involved in similar contexts would be able to say they've run into as
335
well.
336
Yeah. And well, actually, I want to make sure we get to model averaging and comparison, and I still have a few questions for you on time series and these kinds of models. But let's switch gears a bit here. Tell us how you use Bayesian model averaging in your projects, and what advantages you see in this approach over single-model predictions.

Yeah, it's a good question. So I hadn't done a ton of work with Bayesian model averaging, or model averaging in general, before I joined Ledger, and I was really excited to get to work on some of that stuff. It comes up in multiple parts of our workflow now, but one of the first use cases was for our forecasting models. As I was describing a bit earlier, we have different models that make different assumptions about the underlying losses and how they might change over time. One example is: does the process have momentum or not? If a loss ratio is trending upward, is there going to be some component of the model that keeps it trending upward over time, versus do we have something in there where it functions more like a random walk? And this is something that a lot of industry experts might debate. If you're an actuary or the CEO of some insurance company and you're trying to explain why your losses are trending in a certain direction, people talk about these things pretty routinely: momentum or reversion, things like that. So, because people have varying opinions about this, one approach would be to try different models that make those different assumptions, then do some model comparison and just select one. But often there are certain contexts where it might make sense to assume momentum, others where it might make sense to assume reversion, and others where it might not. So model averaging became a very natural thing to do and try out in that context. That was really what inspired it: just this idea that we don't have to necessarily choose a model. If we think both are reasonable, we can allow the data to make that decision for us.

And so that's really where it came into our workflow. When we're doing our forecasts, we'll have these different models that we fit and make predictions with, and then we have our model averaging models, which, now, talking about models of models gets a little bit fun terminology-wise. But that's the stage where we bring those in, and we say, okay, we might have some covariates that we can use to build those averaging models. Things like: we know what line of business it is, commercial auto, workers' compensation, something like that; we know how big the volume is, how much premium the program brings in; we know the locations of these different businesses. All of those can then be used as covariates in a stacking model, for example. And we can train those models to rely more on the assumptions of one model over the other, depending on the context. That was the motivation, and that's where we still do that work today, mostly at that forecasting step.

But yeah, I think Bayesian model averaging is really nice because, if you have the capacity to fit the models that you want to blend together, we've found through our research that if we do that and compare it to a single model in isolation, not always, but oftentimes, it will end up performing better. So it's sort of like, why not take the best of both worlds, as opposed to having to worry about model selection? Especially when the underlying models that we're blending together are equally theoretically motivated and it's hard to really make a decision, even if the data were to suggest one over the other.
Yeah, I mean, that definitely makes sense if you have a bunch of good models. It's really cool to be able to average them. I remember when I started learning Bayesian stats, I was really blown away by the fact that this is even possible. It's just incredible. So actually, can you contrast model averaging with Bayesian model comparison, to make sure listeners understand both concepts and how they fit together, and then talk about how you implement these techniques in your modeling workflow?

Yeah, great question. So when I think of Bayesian model comparison, I often think of different types of metrics that we might have; whether they are approximations or done by brute force, we might have some sort of cross validation metrics that we evaluate the models on. In our forecasting case, we might have actual historical data from, say, the year 2000, and so we actually have ten years of history on it. We know what the ultimate state is, we know what the forecast should predict. In those cases, we can train our models, have them make the out-of-sample predictions, and then score how well they're performing on those out-of-sample predictions. So we often actually do the brute force, as opposed to doing something like, I know in the Stan community you might have, the Pareto smoothed importance sampling leave-one-out approximations, things like that, which are another way to approach the problem. But basically, a lot of the time when you're doing Bayesian model comparison, you'll have some out-of-sample metric, or an approximation to it, and you might have that for a bunch of out-of-sample data points. On those data points, you can do some statistical tests, or even just look at absolute values of how much better one model is predicting on the out-of-sample performance metrics than another. And in the Stan community, and, well, PyMC as well, I think the expected log pointwise predictive density, or ELPD, is a quantity that's often used, which is sort of a log-likelihood-based metric that we can use on out-of-sample data to compute expected predictive performance. And typically, for Bayesian model comparison, the practice almost stops after you get that ELPD value or something similar. You might do some test of how different they are, like some standard error on the difference of the ELPD between two models. But at the end of the day, once you have that metric, that's sort of the inference that you have at the end: okay, this model is performing better per this metric.
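For reference, the ELPD being described is just the out-of-sample log predictive density averaged over posterior draws and summed over held-out points. A small self-contained sketch, with simulated log likelihood arrays standing in for real model output:

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Shape (n_draws, n_holdout): log p(y_i | theta_s) for each held-out point i and
# posterior draw s, e.g. from a brute-force leave-future-out refit. Simulated here.
n_draws, n_points = 4000, 8
log_lik_a = rng.normal(-1.0, 0.3, size=(n_draws, n_points))
log_lik_b = rng.normal(-1.2, 0.3, size=(n_draws, n_points))

def pointwise_lpd(log_lik):
    # lpd_i = log( (1/S) * sum_s exp(log p(y_i | theta_s)) )
    return logsumexp(log_lik, axis=0) - np.log(log_lik.shape[0])

lpd_a, lpd_b = pointwise_lpd(log_lik_a), pointwise_lpd(log_lik_b)
elpd_a, elpd_b = lpd_a.sum(), lpd_b.sum()

# Difference and its standard error, as typically reported alongside the ELPD.
diff = lpd_a - lpd_b
se_diff = np.sqrt(len(diff)) * diff.std(ddof=1)
print(elpd_a, elpd_b, diff.sum(), se_diff)
```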
With stacking, what you're doing... and I guess there are different forms of model averaging; you have Bayesian model averaging, which is slightly different from stacking, and things like that. But they all follow the same basic principle: you have your out-of-sample performance metrics, and then, instead of just choosing the model that has the better out-of-sample performance, you build a model on those performance metrics to tell you when you should rely on model A versus model B. So the stacking or averaging models we can think of as a different model themselves, one that is trained not on the outcome measure of your actual substantive or candidate models that you care about, but on the performance metrics, the out-of-sample performance metrics that you are using. In this case, you wouldn't be doing model selection; you'd be blending together the predictions from each of your candidate models according to how the stacking model thinks you should weight them, based on the out-of-sample performance.

Going that route does require a bit more thought about how you're using your data, because if you want to evaluate how well a stacking model is performing, for example, you have to leave out a little bit more validation data. You don't want to do any double dipping. So you'll have your candidate models that you'll make out-of-sample predictions with; your performance on those predictions becomes the basis for training your stacking model; and then, at the end of the day, you might evaluate your stacking model on some other, third validation set of data. I think that's really the only big limitation, I would say, of using those approaches over your traditional model comparison, where you're kind of done once you select your model. That being said, being able to combine the predictions from the candidate models oftentimes ends up being well worth dividing your data up that way.

Yeah, yeah, definitely. That's an extremely good point and also a very useful method. I know in PyMC, for instance, with ArviZ, you can do that very easily: you basically do your model comparison, which is going to give weights to the models, and then those weights are used by PyMC with pm.sample_posterior_predictive_w, where we weight each model's posterior predictive samples according to the weights from the model comparison.
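A minimal sketch of that ArviZ-based recipe with two toy PyMC models is below. The blending step is done manually with NumPy, since pm.sample_posterior_predictive_w may not be available depending on your PyMC version; treat the details as illustrative rather than canonical.

```python
import arviz as az
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
y = rng.normal(0.6, 0.1, size=30)  # toy data

def fit(prior_sd):
    # Two candidate models that differ only in their prior on mu.
    with pm.Model():
        mu = pm.Normal("mu", 0.5, prior_sd)
        sigma = pm.HalfNormal("sigma", 0.2)
        pm.Normal("y", mu, sigma, observed=y)
        return pm.sample(
            1000, tune=1000, chains=4, random_seed=1,
            idata_kwargs={"log_likelihood": True},  # needed by az.compare
        )

idata_a, idata_b = fit(0.05), fit(0.5)

# Model comparison with stacking weights (based on PSIS-LOO ELPD).
cmp = az.compare({"model_a": idata_a, "model_b": idata_b}, method="stacking")
w_a, w_b = cmp["weight"]["model_a"], cmp["weight"]["model_b"]

# Blend the posteriors by drawing each sample from a model chosen in proportion
# to its weight; the same idea applies to posterior predictive samples.
draws_a = idata_a.posterior["mu"].values.ravel()
draws_b = idata_b.posterior["mu"].values.ravel()
pick = rng.random(draws_a.size) < w_a
blended_mu = np.where(pick, draws_a, draws_b)
```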
So is that how you usually end up implementing it, or do you use Stan? How do you do that? I think it's going to be interesting for the listeners who want to give that a try.

Yeah, it's a great question, and a good plug for the paper we just wrote. So we've actually been using some internal software to do a lot of this. Actually, all of our software historically has been our own; BRMS is not the right term for it, but we have our own language that we use to write Stan models. And then for a lot of our stacking and so on, we had our own internal code that we would use to do all of this. But we decided recently, and I think we were talking about this before the show started today, that this is not something that has gotten a lot of attention in terms of making it easy to do in a generic way, with a bunch of different types of stacking models. And so we've actually just released, and wrote a paper on, a package in Python called BayesBlend. The intent, what we hope it will allow users to do, and what it allows us to do at least, and hopefully other users as well, is this: given a model that I fit in Stan or PyMC or whatever probabilistic programming language of choice, the package lets you initialize a stacking model, one of various different ones. We have the pseudo Bayesian model averaging types of models, the pseudo-BMA+ models, which are based on the ELPD and blend based on that. And then we also have proper Bayesian stacking and hierarchical stacking models that you can use with BayesBlend, where, given the out-of-sample likelihood metrics that you get by training your model on one set of data and making out-of-sample predictions on another test set, given those as input, you can fit a variety of these different stacking models and then easily blend them all together and evaluate performance and things like that. We've built that in Python because that's the stack we use for our day-to-day work and in our production setting. And then we've been building integrations, so right now it's really easy to interface with CmdStan, because that's what we use; we built from that perspective first. But it does interface with ArviZ as well, so if you're using PyMC, for example, you can create the ArviZ InferenceData object and then use that as input for BayesBlend. And then what you get at the end of that workflow, if you use BayesBlend, is the blended predictions from your candidate models, as well as the blended likelihood, the posterior likelihood, which you can then use to evaluate performance and things like that.
And so, yeah, we're really excited about this. I'm really excited to get people outside of Ledger to use it and tell us what they think, make some complaints, open some issues; there's a discussion board on the GitHub page as well. We have a preprint on arXiv, and we've submitted the paper to a journal too, so we'll see how that goes. But it's something that we use regularly, and so it's something that we plan to keep contributing to. If there are quality-of-life or convenience things that would make it easier for other folks to use, we'd love to hear about them, because I think there's a lot of work that can still be done with stacking. There are a lot of really cool methods out there. I think hierarchical stacking in particular is something that I haven't really seen used much in the wild. It's something we use every day at Ledger, and so I'm hoping BayesBlend will allow other people to see that benefit and apply it in their own work easily, in a reproducible way.

Yeah, this is super cool. And of course I put the paper and the BayesBlend documentation website in the show notes for people who want to dig deeper, which I definitely encourage you to do. And when you're using BayesBlend, let's say I'm using it from a PyMC model, so I'm going to give it an InferenceData object. Do I get back an InferenceData object with my weighted posterior predictive samples? In which format do I get the predictions back?

Yeah, that's a great question. If I can remember correctly, and I don't want to give you the wrong information, I'm pretty sure that when you create the object that does the stacking, the model object, there's a from-ArviZ method, and I think we have a to-ArviZ method as well. It will use its own internal representation of the predictions and things like that, just for the sake of fitting the stacking model, but then I think you can return it back to an ArviZ InferenceData object at the end. And one of the things that I want to do, and we haven't had the bandwidth for it quite yet, but it's not that many steps, is to just have a from-PyMC method, for example. I think implementing something like that would be pretty straightforward. So I think we'll probably get to it eventually, but if anyone else wants to contribute, now you know; we have a doc on how to contribute on the GitHub page as well. Our intention is to make it as seamless as possible, and to the extent that there are ways we can make it easier to use, we're definitely open to adding those features or taking recommendations on how we should approach it. But yeah, I think right now, working through the ArviZ InferenceData object is the way you can interface with most things other than CmdStan.

Yeah.
I mean, for people listening in the PyMC world, and even the Python world, and even the Stan world: investing in understanding the InferenceData object and xarray better is definitely a great investment of your time. I know it sounds a bit frustrating, but basically it's like pandas; it's the pandas of our world. And if you become proficient with that format, it's going to help you tremendously in your Bayesian modeling workflow, because you may only want to interact with the model, but actually a huge part of your time is going to be spent making plots. And making plots is done with prior predictive or posterior predictive samples, and those live in the InferenceData object. I know it can be a bit frustrating because you have yet another thing to learn, but it is actually extremely powerful, because it's basically a multi-dimensional pandas data frame. So instead of only having 2D pandas data frames, you can do a lot of things with a lot more dimensions, which is always the case in Bayesian modeling.
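For listeners who haven't poked at it yet, here is a tiny sketch of that structure: labelled chain, draw, and model dimensions instead of a flat 2D table (the values are simulated):

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(0)

# 4 chains x 1000 draws x 3 accident years, with named dimensions.
idata = az.from_dict(
    posterior={"loss_ratio": rng.normal(0.6, 0.05, size=(4, 1000, 3))},
    coords={"accident_year": [2021, 2022, 2023]},
    dims={"loss_ratio": ["accident_year"]},
)

# xarray-style selection and reduction: the "multi-dimensional pandas" part.
print(idata.posterior["loss_ratio"].sel(accident_year=2023).mean(("chain", "draw")))
print(az.summary(idata, var_names=["loss_ratio"]))
```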
No, I totally agree with that. And I think the other thing that's nice about it is that ArviZ has integrations to work with a whole host of different PPLs. So whether you're using Stan or PyMC or whatever else, the ArviZ InferenceData object is always the commonality. It makes things easy if I'm in a different setting and using some other PPL; having to learn a bunch of different tools to do plotting and deal with the data can be quite annoying, so it's nice to have one place to do most of it. And I think we're going to lean on that pretty heavily in developing BayesBlend. There's more we could probably do to integrate with the InferenceData structure, in terms of making it easier to plot things and stuff like that. It's something I'm learning more and more about myself, and I would definitely also recommend others do the same.

Yeah, that's what I tell my students almost all the time: time spent learning how the InferenceData object works is time well spent.

Yeah, agreed.

Because you're going to have to do that anyway, so you might as well start now.

Right. Yeah. You'll encounter it at some point.

Yeah, yeah. And I'm wondering, so you talked about model stacking too. I'm not familiar with that term. Is that just the same as model averaging, or is it different?

Yeah, so there are technically some differences. I think historically the term Bayesian model averaging has meant something pretty specific in the literature. And I hope not to get this wrong, because sometimes I mix things up in my head when thinking about them; the names make it easy to do. But I'm pretty sure that, historically, Bayesian model averaging was done on in-sample fit statistics and not out-of-sample, which is a small thing, but it can make a big difference in terms of results and how the problem is approached. So when talking about model averaging, I'd say stacking is one form of model averaging; there are many ways that one could perform model averaging, whereas stacking is one specific variant.

The way that we actually implement stacking, and there are a couple of different ways you can do it, is that you're basically optimizing... if you have the out-of-sample log likelihood statistics, you can compute a pointwise ELPD, if you will. So it's not the sum of the log predictive density across all of your data points; rather, each data point has its own LPD. And then what you're essentially doing with stacking is fitting a model to optimize how you combine all of those points across your different models. So maybe for certain data points model A has a higher out-of-sample likelihood than model B, and for others it has a lower one. The goal of the stacking model is to fit to those, with those as the outcome measures, and the weights that you derive from that are basically just optimizing how to combine those likelihood values. The way that stacking is actually implemented, after you estimate those weights, is to sample from the posterior. So if, for a given data point, I have a 50% weight on one model and a 50% weight on another, we're blending the posteriors together by drawing samples in proportion to the weights. And that's how stacking is approached and how we've implemented it in BayesBlend.
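A bare-bones sketch of that stacking idea, separate from BayesBlend itself (whose implementation has more options; see the paper and documentation), might look like this: optimize weights on the pointwise out-of-sample log densities, then mix posterior draws in proportion to the weights. All values below are simulated stand-ins.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax

rng = np.random.default_rng(0)

# Pointwise out-of-sample log predictive densities: rows are held-out points,
# columns are candidate models. Simulated values for illustration.
lpd = np.column_stack([
    rng.normal(-1.0, 0.4, size=50),  # model A
    rng.normal(-1.1, 0.4, size=50),  # model B
])

def neg_score(alpha):
    # Softmax keeps the weights positive and summing to one; the objective is
    # the summed log of the weighted mixture of predictive densities.
    w = softmax(alpha)
    return -np.sum(logsumexp(lpd + np.log(w), axis=1))

res = minimize(neg_score, x0=np.zeros(lpd.shape[1]))
weights = softmax(res.x)  # stacking weights, summing to 1

# Blending: draw each posterior sample from model A or B in proportion to the
# weights, so the stacked posterior is a mixture of the candidate posteriors.
draws_a = rng.normal(0.62, 0.02, size=4000)  # stand-ins for real posterior draws
draws_b = rng.normal(0.58, 0.03, size=4000)
pick = rng.random(4000) < weights[0]
stacked = np.where(pick, draws_a, draws_b)
```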
I know for pseudo-BMA, I think Yuling Yao, who has done a lot of work with stacking and pseudo-BMA and whom we've had some talks with, as well as Aki Vehtari and some other folks who have worked on these methods, they're moving away from the pseudo Bayesian model averaging terminology, to call it something that is less suggestive of what classical Bayesian model averaging has typically referred to. So for folks interested in exploring more of that today, you can read the preprint; it has some definitions that do a pretty good job, I'd say, of laying out these different ideas, and it gives you the math you can look at to see how it's actually done. But if you're searching for resources, I think focusing on the stacking terminology will probably be more helpful than the Bayesian model averaging one. That's my two cents, at least.

Okay, yeah. So what I get from that is that they're basically trying to do the same thing, but using different approaches.

Yeah, right. That's my impression. I'm sure other people will have different reads on the literature. Like I said, it's something I've only really begun to explore in the last couple of years, and I'm sure there are many other people out there who know much more.

Okay, yeah, for sure. And if we've made a big mistake here and someone knows about it, please get in touch with me; you can be outraged in your message. At least I'll have learned something from it.

That's right. Me as well.
And so actually, I'd like to get back a bit to the previous models we talked about, now that we've covered your model averaging work. I'm curious: how do external factors like economic downturns or global health crises, for instance, affect your forecasting models, and what strategies do you employ to adjust models in response to such events?

Yeah, it's a great question. So economic factors definitely can influence the performance of these portfolios, but a lot of the time these loss ratios are actually surprisingly robust to many of these economic factors. And partly, it's just because of the way that insurance generally works. A good example of this is COVID times. For example, if you're thinking about insuring commercial auto or private auto insurance policies, when COVID happened, people stopped driving, and so people got into a lot fewer accidents. In that case, loss ratios went really far down for auto policies, for auto programs. And in some cases, insurance companies actually paid back some portion of the premium to policyholders, just because the loss ratios were so low. So there are examples of things like that happening.
603
like just due to the nature of how like policies are written out,
604
You do have to have, and how they're paid out.
605
So like I paid my insurance upfront and then I only, they only lose money when claims are
made.
So the things we think about mostly have to be things that would influence claims; I'd say that's the primary factor. If there's something economic that we believe is going to affect how many claims are made, whether we think it will push them up or down, that's going to be the primary force through which economic conditions could affect these models, mostly because the premium that is written is pretty stable. Generally, regardless of what's going on economically, the same types of insurance policies are often either required or close to it. So unless management is changing a lot about the program in terms of how they're pricing things, you don't tend to get huge swings in the premium that's coming in. And so that's what we focus on: mostly things that affect claims.
When we do look at that, one of the things we've implemented, that we've looked into, is modeling overall industry-level trends and using that as an input to our program-level models. It's not quite deriving priors from the industry. It's more that we actually know, across all of the commercial auto programs, for example, what the industry-level loss ratio is. That's where we might have some general idea of how economic factors might influence something at that high a scale, like the interest rate environment, where the industry is heading, and other things like that. We've built some models of industry-level trends that are then used so that, given we can predict an industry loss ratio for the next however many accident years, we can use that information in our program-level models and ask how much we think we need to weight where the industry is trending versus what we see in this program. That's how we've approached that problem historically.
I'd say we approach it that way mostly because it's really hard at the level of granularity of a lot of the programs we deal with; they're pretty small relative to the industry at large. So it's often hard to observe general industry trends in the data, especially when we have relatively few historic data points, and hard to do it in a data-driven way. That's the big way we've approached the problem: if we can better understand the industry and how economic factors influence where the industry is trending, we can then use that information in our program-level analysis. And so we do have some models that do that.
Yeah, fascinating. Fascinating. I really love that. That's super interesting because, by definition, these events are extremely low frequency, but at the same time they can have a huge magnitude. You'd be tempted to just forget about them because of their low frequency, but the magnitude means you can't really forget about them. So that's really tricky. I think also having that innovations framework is actually very helpful, because you can accommodate that in the model.
Yeah.
And another thing that's kind of interesting about the insurance portfolios we deal with is that some of this is actually on the underwriters or the management team who are actually writing the insurance policies. A lot of the time those folks are the ones who are way ahead of the game in terms of what we might call long-tail risks. Historically, workers' compensation and asbestos is an example of this: it was something introduced in a bunch of houses, used everywhere as an insulator, and decades down the road it turned out this stuff was causing cancer and doing horrible things. Those long-tail risks are pretty rare; you don't come by them often. But the underwriters who are pricing and writing the insurance policies are sort of the frontline defense for that, because they're on the lookout for all of these long-tail risks and take them into account when pricing the policies, for example, or when writing the policies themselves to exclude certain things if they think coverage shouldn't apply. So oftentimes
that makes its way into the perspective we have on modeling, because when we're modeling a loss ratio, for example, our perspective is that we're almost trying to evaluate the performance of the management team, since they're responsible for actually writing the policies and marketing their insurance product and all of that. We view ourselves as looking at the historic information as their track record. That doesn't stop big economic events from changing that track record, but it's something that has influenced how we think about our models, at least from a generative perspective.
And yeah, I think it's definitely important to have that perspective when you're in a case where the data you're getting arises in a rather complicated way.
Yeah, fantastic points, I completely agree with that.
And are there some metrics, or techniques, well, we've talked about techniques, but are there any metrics that you find most effective for evaluating the performance of these Bayesian time series models?
Yeah, historically we've used a lot of the log likelihood based metrics, so we use ELPD for a lot of our decision making. If we're exploring different models and doing our stacking workflow, at the end of the day, when we're deciding whether it's worth including another candidate model in the stacking model in production, we'll often compare what we currently have in production to the new proposed thing, which could be a single model, some stacked models, or what have you. Typically we're using ELPD, and we also look at things like RMSE and mean absolute error.
We tend not to rely on any single metric, just because sometimes, especially with ELPD and the types of models we work with, you can get pretty wild values that can really bias the comparison. This gets a little technical, but you have an ELPD score for each data point, and if one of those data points is quite off, then when you take the sum to get your total model performance metric, it acts like any outlier and can pull the sum in one direction quite a bit. So ELPD can be very sensitive to outlier data points compared to things like RMSE. The reason is that your prediction might be pretty close on an absolute scale, but if the uncertainty in your prediction is really low, what ELPD is really measuring is the height of your posterior predictive density at the observed data point. So if you're too certain, your data point lands way out in the tail of the distribution and gets a crazy value, even though RMSE might look pretty good, because on average you're actually pretty close.
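To make that concrete, here is a toy sketch, not production code: the pointwise contribution to ELPD is the log posterior predictive density at the held-out observation, which can be approximated by averaging the likelihood over posterior draws. With an overconfident predictive distribution, a single point in the tail produces an extreme value even when the absolute error looks modest. All numbers below are invented for the illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu_draws = rng.normal(0.65, 0.02, size=4000)   # overconfident posterior for the predictive mean
sigma = 0.01                                   # assumed (small) observation noise
y_obs = 0.80                                   # held-out loss ratio, far in the predictive tail

# Pointwise ELPD: log of the posterior predictive density at y_obs,
# approximated by averaging the likelihood over posterior draws.
lpd = np.log(np.mean(stats.norm.pdf(y_obs, loc=mu_draws, scale=sigma)))

# The absolute error looks moderate, but the log density is extremely negative
# because the prediction is too certain and y_obs sits way out in the tail.
abs_err = abs(y_obs - mu_draws.mean())
print(f"pointwise ELPD = {lpd:.1f}, absolute error = {abs_err:.3f}")
```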
So we've had to make some forays into more robust ways to estimate ELPD. We've done some research on that, and sometimes we'll use those metrics in production: instead of taking the sum of the pointwise ELPD values across all of your out-of-sample data points, we'll fit a t-distribution to those values. That way the expectation of the t-distribution isn't as influenced by a few extreme outliers. You also get a degrees-of-freedom parameter estimated from the t-distribution, which can serve as a diagnostic, because if it's too low, you're approaching a Cauchy distribution, which doesn't have an expectation or a variance. So we've explored methods like that, and we'll sometimes use them in production, just because we run so many tests. It's a shame not to be able to do a comparison because a few data points, out of the thousands in our historic database, throw everything off and make it so there's no consensus on which model is performing better.
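A minimal sketch of that robust variant, assuming the pointwise out-of-sample log predictive densities are already computed and using SciPy's t-distribution fit as the estimator; this is an illustration of the idea, not the production implementation described here:

```python
import numpy as np
from scipy import stats

def robust_elpd(pointwise_elpd):
    """Fit a Student-t to pointwise out-of-sample log predictive densities
    instead of summing them directly."""
    pointwise_elpd = np.asarray(pointwise_elpd, dtype=float)
    nu, loc, scale = stats.t.fit(pointwise_elpd)    # degrees of freedom, location, scale
    return {
        "elpd_naive": pointwise_elpd.sum(),          # ordinary sum estimator
        "elpd_robust": loc * len(pointwise_elpd),    # t-location times n, less outlier-driven
        "df": nu,                                    # diagnostic: very small df means Cauchy-like tails
    }
```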
So that's a long way of saying we mostly focus on ELPD, but we also use some more absolute metrics that are easily interpretable, plus what we think of as these more robust variants of ELPD, which at some point I think we'll try to write a paper on, to see what other people think. It's one of those things where you come up with a solution to something you think is a pretty big problem, and then you're very curious what other people might think about it, or whether they see any big holes in the approach. So maybe at some point we'll have a paper out on that. We'll see.
Yeah, that sounds like fun. But actually, I think it's a good illustration of something I always tell my students who come from a classical statistics background, because they tend to be much more oriented towards metrics and tests. That's always strange to me, because I'm like: you have posterior samples, you have a distribution for everything. Why do you want just one number? You worked hard to get all those posterior samples and distributions, so why do you want to throw them out the window as soon as you have them? I'm curious.
Well, you need to make a decision, right?
Yeah. And often they ask something like: but what's the metric to know that the model is good? You know, how do I compute R squared, for instance?
Right.
And I always give an answer that must be very annoying, which is: I understand that you want a metric, a statistic. That's great. But it's just a summary. It's nothing magic. So what you should probably do in all cases is use a lot of different metrics.
And that's just what you answered here: you don't have one go-to metric that's supposed to be a magic number so that then you're good. No, you're looking at different metrics, because each metric gives you an estimate of a different angle on the same model. A model is going to be good at some things but not at others, right? It's a bit like an athlete: an athlete is rarely a complete all-rounder, because they have to be extremely specialized. So that means you have trade-offs to make. Often you have to choose: I want my model to be really good at this, and I don't really care about it being good at that. But then if your metric is measuring the second thing, your model is going to appear really bad, even though you don't really care about that. So what you end up doing as the modeler is looking at the model from different perspectives and angles. That will also give you insights about your model, because often the models are huge and multi-dimensional, and you just have a small Homo sapiens brain that cannot see beyond three dimensions, right? So you have to boil everything down. I'm always saying: look at different metrics, don't always look at the same one. And maybe also sometimes invent your own metric, because often you're interested in a very particular thing. Well, just invent your own metric, because you can always compute it; it's just posterior samples. And in the end, with posterior samples, you just count them and see how it goes. That's not that hard.
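For example, a "custom metric" from posterior samples can be as simple as counting draws; the quantities and threshold below are made up for illustration:

```python
import numpy as np

# Hypothetical posterior predictive draws of next year's loss ratio (one value per draw).
loss_ratio_draws = np.random.default_rng(1).normal(0.65, 0.08, size=4000)

# "Inventing a metric" is often just counting draws that satisfy the event you care about:
p_above_80 = np.mean(loss_ratio_draws > 0.80)               # P(loss ratio > 80%)
interval_90 = np.quantile(loss_ratio_draws, [0.05, 0.95])   # central 90% interval

print(f"P(loss ratio > 80%) = {p_above_80:.3f}, 90% interval = {np.round(interval_90, 3)}")
```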
No, I think that's great, and it's great to get that into folks' heads when they're starting to learn about this stuff. I can't even count how many classic machine learning papers I've seen where you just have tables of bolded metrics, with the model with the lowest RMSE declared the best, right? And so therefore it's chosen. I think that perspective can make it a little harder to actually understand your models, for sure. And there are even other things: it reminds me that we do simulation-based calibration, prior sensitivity analysis, all of these things that aren't necessarily tied to a performance metric, but they're tied to how well you can interpret your model and how much you can trust the parameters it's outputting. I think all of those should definitely be considered too. Another thing we encounter quite a lot is that there's a cost to productionizing these models. If we have a new model and it performs better technically by a small amount, is it really worth it if it's very complicated, hard to interpret, and hard to maintain? I think sometimes the answer is no: actually, what we have is good enough, and we don't need this more complicated, hard-to-work-with model. That's something I feel is probably more common in industry settings, where you're expected to maintain and reuse these models repeatedly, versus academic work, where research is the primary objective and maybe you don't need to think as much about model maintainability or productionization. So I feel like having a holistic perspective on how your models are evaluated is very important, and definitely something that no single metric is going to give you.
Yeah, definitely. I'm really happy to hear that you guys use simulation-based calibration a lot, because that's quite new, but it's so useful.
It's very useful. It's nice to figure out whether you have problems with your model before you fit it to real data.
Yeah. I'm curious about how you do that. But first, for folks who want some detail about that, you can go back and listen to episode 107 with Marvin Schmitt, where we talked about amortized Bayesian inference and why that's super useful for simulation-based calibration, and also episode 109 with Sonja Winter, where we actually go into how you would implement simulation-based calibration and why that's useful. So that's a bit of context if you don't know what it's about.
And now the episodes have chapters. So if you go to the website, learnbayesstats.com, and you go to the episode page, you'll have the chapters of the episode, you can click directly on a timestamp, and you'll be able to jump straight to the part of the episode where we talk about that particular topic. And of course you have that on the YouTube channel too: if you go to any YouTube episode and click on the timestamp you're interested in, the video will jump right there. So that's pretty cool for referencing back to something when you're like, I've heard that somewhere in the episodes, but I don't remember exactly where.
Another thing you can do, which I actually use quite a lot: you go to learnbayesstats.com...
I'm going to be using this now. This is a good tip.
You do Ctrl-F and look for the terms you're interested in, and they will show up in the transcript, because you now also have the transcript on each episode page on learnbayesstats.com. You look at the timestamp, and from the timestamp you can infer which chapter it's discussed in, and then you get back to the part of the episode you're interested in much faster.
Yeah, very helpful, because searching in podcasts has historically been a challenging problem.
Yeah. Now that's getting much better. So yeah, definitely use that. I do that all the time, because I'm like, wait, we talked about that, I remember it's in that episode, but I don't remember when. So I use that all the time.
So yeah, I know we're already at one hour fifteen, but can you talk a bit about SBC, simulation-based calibration? How do you guys use that in the trenches? I'm very curious about that.
That's a good question. Yeah. For pretty much everything we do, we have pretty custom models, and we use pretty custom software, our own internal software that we've written to make things easy for the style of models we use. For any model we do research on, or any model we end up deploying in production, typically we start with simulation-based calibration and prior sensitivity analysis and that sort of thing. With simulation-based calibration, because, as I described earlier, our workflow is multi-stage, there are a couple of different models along that pipeline. So typically we will do SBC for each of those individual component models, just to see whether any clear issues arise from any sub-component of the whole pipeline. In some cases we might be able to do it for the whole pipeline, but typically we look at each individual model, and then, for the whole pipeline, we look more at the calibration of the posterior predictions against the true holdout points, for example.
But yeah, SVC is really nice because it, I mean, oftentimes we do want to be able to
interpret the parameters in our models.
819
Let's say like if you really don't care about interpretation, maybe it's.
820
it's maybe not as motivating to go through the whole SBC process.
821
But in our case, oftentimes we'll have parameters that represent like how much momentum a
program has, how much reversion it has, like where the average program level loss ratio is
822
that gets reverted back to is sitting, which is an important quantity.
823
And we want to know that when we get those numbers out of our models and the parameter
values that we can actually interpret them in a meaningful way.
824
And so SBC is, yeah, a great way to be able to look and see like, okay, are we able to
actually estimate parameters in our models in an unbiased and an interpretable way?
825
But then also like, have we programmed the models correctly?
826
I think it's another big thing.
827
SBC helps you resolve because often, you
828
The only time you really know if your model is actually coded up the right way is if you
simulate some fake data with known parameter values and then try to recover them with the
829
same model.
830
And SBC is sort of just like a comprehensive way to do that.
831
I remember like before I read this SBC paper I used to back in my PhD years, like we
would, you know, pick a random set of parameter values for these models and simulate some
832
data and refit just that single set of parameters and see
833
Like, okay, like are they falling in the posteriors?
834
And I feel like SBC is sort of just like a very nice way to do that in a much more
principled way.
835
And so I would definitely encourage regular use of SBC, even though, yeah, it takes a
little bit more time, but it saves you more headaches later on down the road.
836
Yeah, I mean, SBC is basically just an industrialized way of doing what you described, right? Fixing the parameters, sampling from the model, and then seeing whether the model is able to recover the parameters used to generate the data. From those parameters, you draw prior predictive samples, which you then use as the data to fit the model on. And yeah, amortized Bayesian inference is super useful for that, because once you've trained the neural network, it's essentially free to get posterior samples. But two things. First, it'd be great if you could add that SBC paper you mentioned to the show notes, because I think it's going to be interesting to listeners. And second, how do you do that concretely? When you do SBC, you're going to fit the model 50 or 100 times in a loop. Is that basically how you do it right now?
Yeah, usually we'll have will fit probably probably like 100 to 500 times closer to 500.
848
Usually a lot of our models actually don't take too long to fit.
849
Most of the most of the fitting time that is involved is in like the R &D process, like
backtesting, training the stacking models and all that.
850
But once you're like for the individual models to fit to data, they're pretty quick most
of the time.
851
So we we do a decent number of samples usually.
852
And yeah, it's sort of like, well,
853
We have like the way that it's programmed out in our internal software is that will like
refit the model basically to like save out all the sort of rank statistics that we need or
854
like the location of the the percentiles of the the true values in the posterior predicted
values for those parameters and just store all those like going over in a loop.
855
I think we've
856
We might have some stuff that we've messed around with like parallelizing that, but it
ends up usually just being faster to just parallelize the MCMC chains instead.
857
So a lot of times we just run this stuff locally because the models fit so quick.
858
That's usually how we approach it.
859
But yeah, so it's like if you can set a single set of parameter values and do a parameter
recovery simulation, SBC is basically just a loop on top of that.
860
it's not a lot.
861
too much overhead, really.
862
The overhead is in the added time it takes, not necessarily the added amount of code that
it takes.
863
Just kind of nice, I think.
864
Yeah, Yeah, the code is trivial.
865
It's just like you need to let the computer basically run a whole night.
866
Yeah, go make some tea, get a coffee.
867
Yeah, to do all the simulations.
868
mean, that's fine.
869
keep your house warm in the winter.
870
Usually it fits quite fast, the model, because you're using the prior predictive samples.
871
and as the data so it's not too weird.
872
In general, you have a rubber structure in your generative model.
873
So yeah, definitely.
874
Yeah, no, that's a point.
875
Please nurse to do that.
876
Yeah, sure.
877
Yeah.
878
And if you don't have completely stupid priors, Yeah, you will find that you probably do.
879
Yeah.
880
If you're using extremely wide priors, then yeah, your your data is gonna look very weird.
881
a good portion of the time.
882
so yeah, like then the model fitting is going to be longer, but then.
883
Yeah.
884
No, that's a good point.
885
Yeah.
886
It didn't bring up, but yeah, for, for helping come up with reasonable priors, SBC's
another, so it's a good way to do that because if it works, then that's a good sign.
887
If it's going off the rails, then probably your priors would be the first place to look
other than perhaps you'll get the code.
888
Yeah.
889
Yeah.
890
No, exactly.
891
And that's why SBC is really cool.
892
I think it's like, as you were saying,
893
Because then you have also much more confidence in your model when you actually start
fitting it on real data.
894
So maybe one last question for you, Nate.
895
I have so many questions for you.
896
It's like you do so many things, we're closing up to the one hour and a half.
897
So I want to be respectful of your time.
898
But I'm curious where you see the future of
899
patient modeling, especially in in your field, so insurance and financial markets,
particularly with respect to new technologies like, you know, the new machine learning
900
methods and especially generative AI.
901
Yeah, that's a great question.
902
I
903
I'm of two minds, think.
904
A part of me, from doing some Bayesian modeling and guess kind of like healthcare before
this and now more in like finance insurance side, I think there's like what you see in the
905
press about like all the advances in generative AI and all of that.
906
And then there's like the reality of the actual data structures and organization that you
see in the wild.
907
And I think...
908
I think there's still like a lot of room for more of what you might think of as like the
classical kind of workflow where people are, you know, not really necessarily relying on
909
any really complicated infrastructure or modeling techniques, but more following like your
traditional principle Bayesian workflow.
910
And I think especially in the insurance industry, the insurance industry is like very
heavily regulated.
911
And like if you're doing any pricing for insurance, for example,
912
you basically have to use a linear model.
913
There's really very little deviation you can get from there.
914
And so like, yeah, you could do base there, but you can't really, I think more of like
the, what we might think of when we think of like AI types of technologies.
915
think there's potentially room for that and like within organizations, but for some of the
core modeling work that's done that influences decisions that are made, I think.
916
there's still a ton of room for more of these classical statistics types of approaches.
917
That being said, I think there's a lot of interest in bays at scale and more modern
machine learning types of contexts.
918
And I think there's a lot of work that's going on with in -law and variational bays and
like
919
the Stan team just released Pathfinder, which is kind of a new algorithm, like on the
variational side of things.
920
And I think when I think of like Bayes at scale and like Bayesian machine learning types
of applications, I think there's probably a lot of interesting work that can be done in
921
that area.
922
think there's a lot of interesting future potential for those methods.
923
I have less experience with them myself, so I can't really speak to
924
to them in too much detail.
925
But I also think there's a lot of interesting things to explore with full bays.
926
As we have more compute power, it's easier to, for example, run a model with many chains
with relatively few samples.
927
And so I think with distributed computing, I think it would be great to have a future
where we can still do full bays.
928
like, you know, get our full posterior's with some, some variant of MCMC, but in a, in in
faster way, just with more compute.
929
And so I think, yeah.
930
So, so I guess all that to say, I think that there's going to be a long time before, you
know, the classical statistics, like modeling workflow becomes obsolete.
931
I don't see that happening anytime soon, but I think in terms of like using Bayes and
other things at scale, there's a lot of
932
really exciting different methods that are being explored that I haven't actually myself
had any real exposure to in like work setting or applied setting because the problems that
933
I've worked on have kind of retained there.
934
I can still mostly fit the models on my laptop or on like some EC2 instance that some
computer that doesn't require too much compute.
935
So yeah, I guess that's.
936
That's my current perspective.
937
see how it changes.
938
Yeah, yeah.
939
mean, these are good points for sure.
940
mean, you're still going to need to understand the models and make sure the assumptions
make sense and understand the edge cases, the different dimensions of the models and
941
angles as we talked about a bit earlier.
942
yeah, I think that's a really tremendous asset.
943
and a kick -ass sidekick.
944
So for sure, that's extremely useful right now.
945
I can't wait to see how much progress we're gonna make on that front.
946
I really dream about having a Jarvis, like Iron Man, like Tony Stark has Jarvis, and then
it's like, it's extreme.
947
That'd be perfect, but you're basically outsource a lot of that stuff that you're not very
good at and you focus on the thing you're really extremely good at and efficient and
948
productive.
949
Yeah, no, definitely think that like a lot of the generative AI types of tools definitely
can aid with productivity for sure.
950
Like, I can't tell you how many times I've just been like, hey, tell me how to do this
with Pandas because I don't want to figure it out.
951
Similarly with like Plotly or stuff like that.
952
feel there are certain parts of the workflow where Google or Stack Overflow is no longer
the first line of defense, right?
953
And I think a lot of that stuff that I don't like to spend time on sometimes can be sped
up by a lot of these tools, which is really nice to have, I would say.
954
But yeah, we'll definitely be curious to see.
955
if Bayes underlies any of some of these methods going forward.
956
I know there's an interest in it, but the scalability concerns have so far maybe made that
a little challenging.
957
Although I don't know in your case, in my case, I've never had a project where we were
like, no, actually we can't use Bayes here because the sketch is too big.
958
No, I agree.
959
think I've been similarly for me.
960
Usually there's a way.
961
And I think, yeah, I think there are definitely problems where that gets challenging, but
at the same time, like if it's getting challenging for Bayes, it's probably gonna be
962
challenging for other methods as well, I think.
963
And then you deal with other issues in these cases too.
964
And so I think, yeah, I've also been kind of biased by this, because a lot of times I'm
working with rather small datasets.
965
at least in terms of how much memory they're taking up on my computer or something like
that.
966
They're small enough that we can do some fun modeling and not have to worry too much.
967
Yeah.
968
That's a good point.
969
But yeah, definitely I'm still waiting for a case where we'll be like, yeah, no, actually
we cannot use base here.
970
Right.
971
Yeah.
972
That's actually interesting.
973
Hopefully it never happens.
974
Right.
975
Yeah.
976
That's the dream.
977
And so, Nate, let's call it a show.
978
I've already taken a lot of your time and energy.
979
I'm guessing you need a coffee.
980
yeah.
981
I'll probably go get some tea after this.
982
Maybe a Red Bull.
983
We'll see how I'm feeling.
984
This has been a great time.
985
Yeah.
986
It was great.
987
I mean, of course, I'm going to ask you the last two questions.
988
I escaped against the end of show.
989
So first one, if you had unlimited time and resources, which problem would you try to
solve?
Yeah, so I've thought a lot about this, because I've listened to the podcast for so long and I contemplate it every time. I feel like I always miss the other guests' responses because I'm lost in my own thoughts about how I would answer. I think I would want to do some work in mental health, which is sort of the field that I grew up in. There are a lot of open problems in psychiatry and clinical psychology, both in terms of how we measure what a person is experiencing, mental illness in general, how we dissociate different types of disorders and things like that, but also in terms of treatment selection, as well as automated treatments that are maybe scaffolded through apps first, or that have different delivery models rather than just the face-to-face therapy model. And I think, yeah, if I had unlimited resources, unlimited time and funding, I would be exploring
solutions to that, or I guess solutions isn't even the right word: ways to approach the mental health crisis, how to better measure it, and how to better get people into the treatments they need. Some of the work I was doing at AVO before was related to this, but it's a surprisingly hard field to get funding for, just because there are a lot of barriers. Working in healthcare is a hard thing to navigate, and there are a lot of snake-oil treatments out there that seem to suck up a lot of the interest and funding. If I didn't have to worry about that, there'd be a lot of interesting things to do. But yeah, I think that's what I would focus my energy on.
Yeah, definitely a worthwhile quest. And since you never hear the guests' answers, I think you're the first one to give that answer.
It goes with my Freudian Sips mug here. It's my favorite.
And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?
Yeah, this one was a lot harder for me to think about, but I came to what I think is my answer, and I'm going to pick a fictional person. I don't know if you've read the Foundation series or watched the television series, but Hari Seldon is the architect of what he calls psychohistory, which is essentially this science of predicting mass behaviors, population-level behaviors, and he's developed this mathematical model. It allows him to predict, thousands of years into the future, how people will be interacting, and he saves the galaxy and all that. Spoiler alert, but sort of; I'll leave some ambiguity, right? I think that would be the person, just because it's an interesting concept. I think he's an interesting character, and psychohistory, given my background, is a whole concept I'm just interested in. So if someone were able to do that, I'd sure like to better understand how exactly they would be doing it. Maybe there's Bayes involved, we'll see.
Yeah, great answer. And here again, it's the first time I've heard that one on the show. That's awesome. You should definitely put a reference to that series in the show notes. That sounds like fun. I'm definitely going to check it out, and I'm sure a lot of listeners will too.
Yeah, definitely.
Well, awesome. Thanks again, Nate, that was a blast. I really learned a lot, and that was great. I think we now have an updated episode about model averaging and model comparison. I hope, Stéphane, you were happy with how it turned out. Me too. Well, as usual, I'll put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Nate, for taking the time and being on the show.
Awesome, yeah, thanks a lot for having me. I had a blast, and I look forward to being a continued listener.
Yeah, thank you so much for listening to the show for so many years. It definitely means a lot to me, and you're welcome back anytime on the show, of course. Just let me know.
This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, featuring MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information in. And if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.