Learning Bayesian Statistics

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag 😉

Takeaways:

  • State space models and traditional time series models are well-suited to forecast loss ratios in the insurance industry, although actuaries have been slow to adopt modern statistical methods.
  • Working with limited data is a challenge, but informed priors and hierarchical models can help improve the modeling process.
  • Bayesian model stacking allows for blending together different model predictions and taking the best of both worlds (or of all of them, when more than two models are combined).
  • Model comparison is done using out-of-sample performance metrics, such as the expected log pointwise predictive density (ELPD). Brute-force leave-future-out cross-validation is often used due to the time-series nature of the data.
  • Stacking or averaging models are trained on out-of-sample performance metrics to determine the weights for blending the predictions. Model stacking can be a powerful approach for combining predictions from candidate models. Hierarchical stacking in particular is useful when weights are assumed to vary according to covariates.
  • BayesBlend is a Python package developed by Ledger Investing that simplifies the implementation of stacking models, including pseudo Bayesian model averaging, stacking, and hierarchical stacking.
  • Evaluating the performance of Bayesian time series models requires considering multiple metrics, including log-likelihood-based metrics like ELPD, as well as more absolute metrics like RMSE and mean absolute error.
  • Using robust variants of metrics like ELPD can help address issues with extreme outliers. For example, t-distribution estimators of ELPD as opposed to sample sum/mean estimators.
  • It is important to evaluate model performance from different perspectives and consider the trade-offs between different metrics. Evaluating models based solely on traditional metrics can limit understanding and trust in the model. Consider additional factors such as interpretability, maintainability, and productionization.
  • Simulation-based calibration (SBC) is a valuable tool for assessing parameter estimation and model correctness. It allows for the interpretation of model parameters and the identification of coding errors.
  • In industries like insurance, where regulations may restrict model choices, classical statistical approaches still play a significant role. However, there is potential for Bayesian methods and generative AI in certain areas.

Chapters:

00:00 Introduction to Bayesian Modeling in Insurance

13:00 Time Series Models and Their Applications

30:51 Bayesian Model Averaging Explained

56:20 Impact of External Factors on Forecasting

01:25:03 Future of Bayesian Modeling and AI

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.

Alex:

In this episode, I am thrilled to host Nate Haines, the head of data science research at Ledger Investing and a PhD from Ohio State University. Nate's expertise in generative Bayesian modeling helps tackle the challenges in insurance-linked securities, especially with issues like measurement error and small data sets. He delves into his use of state-space and traditional time series models to effectively predict loss ratios, and discusses the importance of informed priors in these models. Nate also introduces the BayesBlend package, designed to enhance predictive performance by integrating diverse model predictions through model stacking. He also explains how they assess model performance using both traditional metrics like RMSE and innovative methods like simulation-based calibration, one of my favorites, to ensure accuracy and robustness in their forecasts. So join us as Nate unpacks the complexities of Bayesian modeling in the insurance sector, revealing how advanced statistical techniques can lead to more informed decision-making. This is Learning Bayesian Statistics, episode 115, recorded June 25, 2024.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be: show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs.com.

Alex:

Nate Haines, welcome to Learning Bayesian Statistics.

Nate:

Thanks for having me, very excited to be here.

Alex:

Same. Very, very excited to have you here. Also because a lot of patrons of the show have requested you to be here. One of the most convincing was Stefan Lorentz. I'm pronouncing that the German way because I think he's somewhere from there. Maybe he's Austrian or Swiss, and then he hates me right now. But yeah, Stefan, thank you so much for recommending Nate on the show. And I hope you'll appreciate the episode. If you don't, this is entirely my fault, and not Nate's at all.

Nate:

Yeah, well, I appreciate the shoutout.

Alex:

Yeah, he was really enthusiastic. I'll tell you what he told me. It was: "For a while now, I've been thinking about an interesting hook to recommend Nathaniel Haines, someone who is not, like so many of my previous recommendations, currently in academia."

Nate:

Yeah. Yeah.

Alex:

And he was like, "Today it seems to have presented itself. He just released a Python library for Bayesian model averaging, a very practical topic that hasn't been discussed yet in any episode." You know, he was really, really happy about your work.

Nate:

Very cool. Yeah, that's all I wanted to hear.

Alex:

Yeah, and we're definitely going to talk about that today, model averaging and a lot of cool stuff on the deck. But first, can you tell us basically what you're doing nowadays, if you're not in academia? Especially since you live in Columbus, which I think is mostly known for Ohio State University.

Nate:

Right, yeah. We think of ourselves as a flyover state. Well, others think of us as that. We like to think that we're much cooler and hipper and all that. Yeah, so for the last few years I've been working as a data scientist remotely, and I've been at two different startups during that time, after graduating from my PhD. My PhD focused on clinical mathematical psychology, which is really where I did a lot of Bayesian modeling, where I learned a lot of Bayesian modeling, and which led me to where I am today at my current company. I'm with Ledger Investing. We are sort of in between what I would call the insurance industry and finance, in a way. We are not an insurance company, but that's the data that we deal with, and so a lot of our modeling is focused on that. And at Ledger, I'm the manager of data science research, so a lot of my work focuses on building new models, productionizing those models, and finding different ways to incorporate models into our production workflow. And yeah, I'll be happy to dive into more detail about that, or about how I got here, because I know it's something I've talked to a lot of people about too, especially on the academic side. The transition to industry itself can be a little bit opaque, but then also Bayes in industry is like, I didn't know there were people doing a lot of that. And so yeah, excited to talk in more detail about all of that.

Alex:

For sure. Actually, how did you end up working on these topics? Because Bayes is already a niche, and then specializing in something within Bayes is even more niche. So I'm really curious about how you ended up doing that.

Nate:

Yeah, just like Bayes in general. So I actually got exposed to Bayes pretty early on. I guess you could say I have a weird background as a data scientist, because I did my undergrad degree in psychology, and I just did a BA, so I didn't really take much math in undergrad. But I got involved in a mathematical psychology research lab later on in my undergraduate degree. This was run by Tricia Van Zandt at Ohio State. So actually, I grew up in Columbus and I've been here ever since. But I started to work with her and some grad students in her lab. And they were all, the best way to put it, hardcore Bayesians. They did a lot of mathematical modeling of sort of more simple decision-making tasks. And by simple, I guess I just mean response-time types of tasks. So they did a lot of reaction-time modeling, which has a pretty deep history in psychology. And they were all Bayesians. That was the first time I saw the word. I remember seeing that word on a grad student's poster one time. Like, what is that?

So I got exposed to it a bit in undergrad. And then, as I was going through undergrad, I knew I wanted to go to grad school. I wanted to do a clinical psychology program. I was really interested in cognitive mechanisms, things involved with mental health. And I got really lucky, because there was an incoming faculty member at Ohio State, Woo-Young Ahn is his name, and he was the one who brought a lot of that to Ohio State. He didn't end up being there for too long; he's now at Seoul National University. But I worked with him for a year as a lab manager. And in that first year, he really wanted to build some open-source software to allow other people to do decision-making modeling with psychological data. And the way to do that was to use hierarchical Bayes. So I kind of got exposed to all of that through my work with Young. And yeah, we did a lot of that work in Stan. That was the first time I really worked on it myself. But I'd kind of known about it and knew about some of the benefits that Bayes can offer over other philosophies of statistics.

And that started pretty early on in grad school. So I think I'm probably a weird case, because I didn't really have traditional stats training before I got the Bayes training. A lot of my perspective is very much that I think in terms of generative models, and I didn't have to unlearn a lot of frequentist stuff, because my understanding by the time I really started diving into Bayes was pretty rudimentary on the frequentist side. And so, yeah, kind of naturally, I got really involved in some of that open-source work during graduate school, on the package we released called hBayesDM, which is a mouthful, but really good for search, because nothing else pops up if you search hBayesDM. That was my first foray into open-source Bayesian modeling software.

Eventually, I decided that I really liked doing this methods stuff. I was more focused on the modeling side of my work than I was on the domain per se. And I had a really interesting track into industry. I wasn't actually originally pursuing that; that wasn't my intention. But I just got a cold email one day, the summer I graduated, from a co-founder at my previous company, which is called AVO Health. They were looking for someone who did something very particular, which was hierarchical Bayesian modeling, and who was familiar with psychological data. So I kind of fit the bill for that, and I decided that it'd be worth the shot to try that out. And I've been in industry ever since. So, yeah, really what got me into it originally was just being in a context surrounded by people doing it, which I don't think most people get to experience, because Bayes is still rather niche, like you said. But at least in the circle that I was in, in undergrad and grad school and things like that, it was just kind of the way to do things. And so I think that's colored my perspective of it quite a bit, and definitely played a big role in why I ended up at Ledger today.

Alex:

Yeah, super cool. And I definitely can relate to the background, in the sense that I too was introduced to stats mainly through the Bayesian framework. So thankfully, I mean, that was hard, but that was not as hard as having to forget everything again, so...

Nate:

Right, right.

Alex:

It was great. I remember being very afraid when I opened a classic statistics book and was like, my God, how many tests are there? It's just terrible.

Nate:

No, exactly. And it's hard to see how things connect together, yeah.

Alex:

No, I was not liking stats at all at that point. And then thankfully, I did electoral forecasting, and you kind of have to do Bayes in these realms. You know, that was really cool. One of the best things that ever happened to me.

Nate:

Exactly. So you're forced into it from the start. It doesn't give you much choice. And then you look back and you're happy that that ended up happening.

Alex:

Exactly. Yeah. And actually, you do quite a lot of time series models, if I understood correctly. So yeah, could you talk a bit about that? I'm always very interested in time series and forecasting models, and in how useful they are in your work.

Nate:

Yeah, yeah. So I think maybe first, to start, I can give a bit of context on the core problem we're trying to solve at Ledger, and that'll help frame what we do with the time series models. Basically, what we provide is an alternative source of capital for insurance companies. It's like, if I wanted to start an insurance company, I'd have to have a ton of money in the bank so that, if something goes wrong and I write a bunch of policies for private auto, for example, for car insurance, I'm able to make people whole when an accident happens. And so when insurers are trying to fund different books of business, they often need to raise lots of capital for that. Traditionally, one of the methods they have used to accomplish this is to approach reinsurers, which I didn't know anything about before I joined Ledger. I was kind of an insurance newbie at Ledger; now it's been a couple of years, so I can't say that anymore. But basically, they go to someone with even more money to provide the capital and allow them to write their business.

So we basically work with insurers or other similar entities, and then with investors, and we allow the investors access to this insurance risk as an asset class. From the perspective of the insurance side, they're getting the capital they need to fund their programs. So the insurance companies like it because it's the source of capital they need to do business, and the investors like it because they get to invest in how these portfolios of insurance programs are performing, as opposed to, say, investing in an insurance company's stock or something like that. It's a little bit more uncorrelated with the market, in terms of the other types of assets that investors might have access to. And so that's the core problem, the context that our data science team is baked within.

And what we're actually modeling, the thing we're trying to solve, is this: say an insurance company approaches us and they have a commercial auto or a private auto program or a workers' compensation program. A lot of times they'll have been writing that kind of program, they've been in that business, for five, ten years or something. So they have historic data. And the way we look at the data is that you have different accident years. If we're looking at it today, in year 2024, maybe they have 10 years of business that they've been writing. So we look back all the way to 2014, 2015, 2016, and we see how much they have lost, how much in claims they have paid out versus how much premium they have taken in. And so this quantity, the loss ratio, is really important.

In a lot of areas of business, it's around 60%. And this is before you're paying salaries and all of that, just the pure insurance side: around 60% might be pretty typical or pretty reasonable for a good-ish program. It's an overgeneralization, but just to keep some numbers in mind. And the interesting thing about this, though, is that we look back at 2014 and we have 10 years of history. So we kind of know what the loss is for 2014 for a program that comes to us today. But what about 2023, right? There's only been a year since then. And the way that insurance often works, if you've ever had to file a claim for homeowners or car insurance, something like this, you're probably familiar: it can take quite a while for you to get paid out. Sometimes there are lawsuits; sometimes people don't file a claim until years later. There can be a lot of different reasons that the information we have today about losses in any given year isn't complete, or, the way that we think about it, that the loss ratio isn't developed.

And so if you think about the data that we have, it takes the shape of a triangle; we call them loss triangles. You can think of a matrix where the Y axis would be the accident years, the different years that accidents are occurring, and the X axis would be how much time has passed since that accident year. We call that the development period, or something similar. So for 2014, we have 10 cells, 10 years of data that we can look back on. For 2015, we have nine, and so on and so forth. And so it kind of forms this triangle.
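To make the triangle shape concrete, here is a minimal Python sketch; the numbers are invented for illustration and are not from the episode.

```python
import numpy as np

# Cumulative paid losses (in $M): rows are accident years, columns are
# development years. NaN marks cells not yet observed, so the data form
# an upper-left triangle: the oldest year has a full row, the newest
# year has a single cell.
triangle = np.array([
    [20.0, 32.0, 38.0, 40.0],        # oldest accident year, fully developed
    [22.0, 35.0, 42.0, np.nan],
    [25.0, 41.0, np.nan, np.nan],
    [28.0, np.nan, np.nan, np.nan],  # most recent year: one observation
])
premium = np.array([60.0, 65.0, 70.0, 75.0])

# Loss ratio per cell = cumulative losses / premium for that accident year.
loss_ratio = triangle / premium[:, None]
print(np.round(loss_ratio, 2))
```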

And so basically, for us to price these deals, what we end up needing to do is two things; there are two basic modeling steps involved. The first is to find out where we think the loss ratio is going to end up for all of these accident years, as if we were looking back a hundred years from now. We want to know what that ultimate state of the loss ratio is. And that's the first part where some time series models come into play. We have this weirdly shaped data and we want to extrapolate out from the historical data: the year we're looking at is static, but the information that we have on it is dynamic, and we learn more about it as time goes on. So our first task is to predict that ultimate state. That just gives us a more accurate representation of what we think the history will look like. That's our first step.

And then the second step, which is where we use much more traditional time series models, is to say: OK, given that history of the ultimate state for each previous year, what are the next two, three years going to look like? That's where we have more traditional forecasting models. But because we have this multi-stage process, there's uncertainty from one model's output that we need to account for in that second stage. And so we do some measurement error modeling and things like that. And that, at the end of the day, is really why Bayes ends up being such a useful tool for this problem: there are lots of sources of uncertainty. There's a rich history of actuarial science where actuaries have developed models to solve similar problems, so there are theory-informed models and historic data that we can use. We get to use really everything in the Bayes toolbox: we get to use priors, we get to use very theory-informed generative models, and then we also get to do some fun things like measurement error modeling between the various stages of the modeling workflow that we follow. I know this is a bit of a long explanation, but I think the context is helpful to understand why we approach it that way and why we think Bayes is a useful way to do so.

Alex:

Yeah, thanks a lot for that context. I think it's very useful, because now I want to ask you which kind of time series models you mostly use for these use cases, and what some of the most significant challenges are that you face when dealing with that kind of time series data.

Nate:

Yeah, it's a good question. For the time series models, we do a lot of state space modeling. We do a lot of research on exploring different forms of models, but for the stuff that we end up using in production, in that first stage where we do our development process, those models are more similar to the classic actuarial science models. So they technically are time series models, but they're kind of these non-parametric models where we're just estimating: say your loss ratio is 20% during the first development period, can we estimate some parameters such that, if you multiply that by some factor, it gives us the next period? So there are these link-ratio-style models that we use in that context. They're a little less traditional, but more in line with what actuaries have historically done.
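As a rough illustration of the link-ratio idea, here is the classic chain-ladder calculation on the toy triangle from the sketch above. This is the textbook method, shown for flavor; it is not necessarily Ledger's production model.

```python
import numpy as np

triangle = np.array([            # same toy cumulative triangle as above
    [20.0, 32.0, 38.0, 40.0],
    [22.0, 35.0, 42.0, np.nan],
    [25.0, 41.0, np.nan, np.nan],
    [28.0, np.nan, np.nan, np.nan],
])

# Volume-weighted age-to-age factors: how much, on average, losses grow
# from development period j to j+1, pooled over the accident years that
# are observed at both periods.
n_dev = triangle.shape[1]
factors = np.ones(n_dev - 1)
for j in range(n_dev - 1):
    seen = ~np.isnan(triangle[:, j]) & ~np.isnan(triangle[:, j + 1])
    factors[j] = triangle[seen, j + 1].sum() / triangle[seen, j].sum()

# Roll each accident year forward to its "ultimate" loss by chaining the
# remaining factors after its latest observed development period.
ultimates = []
for row in triangle:
    last = int(np.max(np.where(~np.isnan(row))[0]))
    ultimates.append(row[last] * np.prod(factors[last:]))
print(np.round(factors, 3), np.round(ultimates, 1))
```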

And then for the forecasting piece, that's where we use more of what you would imagine when you think of time series models today: things like autoregressive styles of models. We do, like I said, state-space models, where we assume that these loss ratios are really this latent, drifting parameter over time, and then we have the latent dynamics paired with some observational model of how those losses are distributed. And a lot of times in the investment or finance world, people talk about whether they think some sort of asset is mean-reverting, or whether it shows some sort of momentum in the underlying trends. And so we have different models that capture some of those different assumptions.
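Here is a minimal PyMC sketch of the kind of latent-state forecasting model being described, with a persistence parameter spanning the momentum versus mean-reversion assumptions and a measurement-error layer on the stage-one ultimates. The priors and data are illustrative assumptions for the example, not Ledger's production code.

```python
import numpy as np
import pymc as pm

# Stage-one output: posterior means and sds of the ultimate loss ratio per
# accident year; recent years are more uncertain (made-up numbers).
lr_mean = np.array([0.62, 0.58, 0.65, 0.70, 0.66])
lr_sd = np.array([0.01, 0.02, 0.03, 0.06, 0.10])
n, horizon = len(lr_mean), 3

with pm.Model():
    mu = pm.Normal("mu", 0.65, 0.10)       # long-run loss ratio level
    phi = pm.Beta("phi", 2.0, 2.0)         # persistence of deviations
    sigma = pm.HalfNormal("sigma", 0.05)   # innovation scale

    # AR(1) around mu: phi near 1 behaves like a random walk (momentum),
    # phi near 0 snaps back to mu (mean reversion).
    rho = pm.math.stack([mu * (1 - phi), phi])
    latent = pm.AR("latent", rho=rho, sigma=sigma, constant=True,
                   init_dist=pm.Normal.dist(mu, 0.10), shape=n + horizon)

    # Measurement error: stage-one ultimates are noisy views of the latent
    # state, with known (stage-one posterior) standard deviations.
    pm.Normal("obs", mu=latent[:n], sigma=lr_sd, observed=lr_mean)

    idata = pm.sample(1000, tune=1000)

# latent[n:] holds the forecast for the next `horizon` accident years.
```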

Actually, it's pretty interesting: people all across the business tend to be pretty interested in whether a model has mean reversion in it or momentum in it. And that becomes a question that a lot of investors and insurance companies alike are interested in, because they might have disagreements about whether or not that component should be in the model. But so, I'd say those types of time series models are what we use most regularly in production, your traditional state space models.

In terms of the challenges we face, I think the big challenge, based on the context that I was providing, is that we might have 10 years of history on a program, and that would be a good outcome. So if our time series is 10 previous data points, where some of the more recent ones are highly uncertain because they're actually predictions from a previous model, I think you might start to imagine where the issues can arise. I would say that's probably our biggest challenge: the data that we work with from a given program. The numbers are big, because we're talking about investment amounts of dollars. A program might write 10 to 100 million dollars of premium, so the loss values are pretty high themselves, and there's a lot of information there. But the history that we have to build a time series model on is pretty short. In a lot of classical time series approaches, there's quite a bit more data that people are working with. You'll hear about things with seasonality and other settings where you're decomposing a time series. We don't really have the ability to do any of those classical modeling approaches, mostly just because we don't have the history for it.

So one of the ways that we approach that problem, at least to solve it to some extent, is that we do have information on many, many different insurance companies and their losses historically. Even if the history may not be very long, we might have at maximum 30, 40, sometimes 50 years of history, so basically 50 individual data points in a time series model; typically we have much less. But one of the things that happens in the insurance industry is that all of these companies need to publicly release certain information each year. And so we're able to take that information and use it to help us obtain informed, data-informed priors, so that when a smaller program comes our way and we are using our time series models on it, we have priors that have been pretty fine-tuned to the problem: priors that are fine-tuned to that particular line of business, whether it's commercial auto or workers' compensation or something like that. I'd say that's our biggest challenge, that small-n kind of problem, and then Bayes with informed priors is a way that we're able to tackle it in a more principled way.
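A minimal sketch of that idea, assuming, purely for illustration, normally distributed loss ratios and made-up industry numbers: summarize the industry-wide history for a line of business, then use the summary as the prior for a new program with only a few observations.

```python
import numpy as np
import pymc as pm

# Stand-in for public industry filings: historical loss ratios across many
# companies in one line of business (illustrative numbers).
industry_lr = np.random.default_rng(0).normal(0.62, 0.08, size=500)

# Step 1: summarize the industry distribution for this line of business.
mu_hat, sd_hat = industry_lr.mean(), industry_lr.std()

# Step 2: use it as a data-informed prior for a new, small program.
program_lr = np.array([0.71, 0.66, 0.75])  # only three accident years
with pm.Model():
    theta = pm.Normal("theta", mu=mu_hat, sigma=sd_hat)  # informed prior
    sigma = pm.HalfNormal("sigma", 0.10)
    pm.Normal("obs", mu=theta, sigma=sigma, observed=program_lr)
    idata = pm.sample(1000, tune=1000)
```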

Alex:

Yeah, yeah, that makes a ton of sense. And those sound like very fun models to work on. I really love that. So, state space models: you're mainly talking about HMMs, things like that?

Nate:

Yeah, of the same form. We've done some experimenting with Gaussian processes. Actually, my colleague is about to submit a paper doing some work with HMMs, hidden Markov models, for anyone who's listening who might not know what that acronym stands for. But typically, the models that we use end up being even simpler than that for our forecasting problem, mostly because we have such small data to work with. Oftentimes, the functional form of the model can't be too complex. So they end up being more similar to your typical ARIMA-style models, which you can write in a state space fashion. It tends to be more closely related to those than to models that do regime switching, things like that, because oftentimes we just don't have enough information to be able to fit those types of models; even with informed priors, it might not be as believable.

That being said, if we do think that something might work well, like adding a more complicated mechanism on the momentum piece of the model, or adding in different assumptions about mean reversion and things like that, we typically do explore those types of things. But it's surprisingly hard to beat simple time series models with the small n in our context. And so we do quite a lot of cross-validation to determine which types of models we should be using in production. Oftentimes it's a mix of evaluating those models based on their performance, but also on how well calibrated they are and things of that nature, so that we know that the models we're using are interpretable and we can defend them if something ends up going sideways. We want to be able to go to the investor and say, you know, we did our due diligence, and here's why we think this was still a good choice at the time. I'm not sure if that gets at the question, but let me know if I can expand on the models in particular.

Alex:

No, I mean, it's funny you'd say that, because it's surprisingly hard to beat regression in a lot of contexts. If you do a generalized regression, that's already a very good baseline, and that's pretty hard to beat. So I'm not surprised to hear that it's the same here.

Nate:

Right. And I think part of the issue with our data is that the more recent observations in the time series have this high uncertainty along with them. So with the measurement error component in there, it's difficult to choose between different model configurations. And the more complicated your forecasting model is, that uncertainty ends up making it even harder for a more complex model to win out in our tests. That's one of the things we've observed, and something that I think anyone who's been involved in similar contexts would say they've run into as well.

Alex:

Yeah. Well, actually, I want to make sure we get to model averaging and comparison. So even though I still have a few questions for you on time series and these kinds of models, let's switch gears a bit here. Tell us how you use Bayesian model averaging in your projects, and what advantages do you see in this approach over single-model predictions?

Nate:

Yeah, it's a good question. So I hadn't done a ton of work with Bayesian model averaging, or model averaging in general, before I joined Ledger, and I was really excited to get to work on some of that stuff. I'd say it comes up in multiple parts of our workflow now, but one of the first use cases was for our forecasting models. As I was describing a bit earlier, we have different models that make different assumptions about the underlying losses and how they might change over time. One example is: does the process have momentum or not? If a loss ratio is trending upward, is there going to be some component of the model that keeps it trending upward over time, versus do we have something in there where it functions more like a random walk? And this is something that a lot of industry experts might debate. If you're an actuary or the CEO of some insurance company and you're trying to explain why your losses are trending in a certain direction, people talk about these things pretty normally: momentum or reversion, things like that.

So, because people have varying opinions about this, one approach would be to try different models that make those different assumptions, then do some model comparison and just select one. But because there are certain contexts where it might make sense to assume momentum, others where it might make sense to assume reversion, and other contexts where it might not, model averaging became a very natural thing to do and try out in that context. That was really what inspired it: just this idea that we don't have to necessarily choose a model. If we think both are reasonable, we can allow the data to make that decision for us.

And so that's really where it came into our workflow. When we're doing our forecasts, we'll have these different models that we fit and make predictions with. And then we have our model averaging models, which, now that we're talking about models of models, gets a little bit fun terminology-wise. But that's the stage where we bring those in, and we might have some covariates that we can use to build those averaging models. Things like: we know what line of business it is, commercial auto or workers' compensation, something like that. We know how big the volume is, how much premium the program brings in. We know the locations of these different businesses. All of those can then be used as covariates in, say, a stacking model, and we can train those models to rely more on the assumptions of one model over the other, depending on the context.

That was the motivation, and that's where we still do that work today, mostly at that forecasting step. But yeah, I think Bayesian model averaging is really nice because, if you have the capacity to fit the models that you want to blend together, we found through our research that, compared to a single model in isolation, not always, but oftentimes it will end up performing better. So it's sort of like, why not take the best of both worlds, as opposed to having to worry about model selection? Especially when the underlying models that we're blending together are equally theoretically motivated and it's hard to really make a decision, even if the data were to suggest one over the other.

00:35:32,171 --> 00:35:37,825

Yeah, I mean, that definitely makes sense if you have a bunch of good models.

381

00:35:37,825 --> 00:35:40,397

That's really cool to be able to average them.

382

00:35:40,397 --> 00:35:47,511

I remember when I started learning Bayesian stanza, I was really blown away by the fact

that this is even possible.

383

00:35:48,392 --> 00:35:50,793

that's just incredible.

384

00:35:50,854 --> 00:35:52,595

Can you...

385

00:35:52,675 --> 00:36:06,434

So actually, can you contrast model averaging with Bayesian model comparison to make sure

listeners understand both concepts and how they fit together, and then talk about how you

386

00:36:06,434 --> 00:36:07,585

implement

387

00:36:07,585 --> 00:36:10,357

these techniques in your modeling workflow?

388

00:36:10,357 --> 00:36:11,497

Yeah, no, great question.

389

00:36:11,497 --> 00:36:24,605

think so when I think of Bayesian model comparison, I often think of different types of

metrics that we might have, whether it's approximations or done by brute force, we might

390

00:36:24,605 --> 00:36:29,738

like we might have some sort of cross validation metrics that we evaluate the models on.

391

00:36:29,738 --> 00:36:35,341

So like in our forecasting case, you know, we might have actual historical

392

00:36:35,341 --> 00:36:39,360

you know, maybe we look, have actual data from like 2000.

393

00:36:39,360 --> 00:36:41,891

And so we actually have like 10 years of history on it.

394

00:36:41,891 --> 00:36:43,531

We know what the ultimate state is.

395

00:36:43,531 --> 00:36:45,901

We know what like the forecast should predict.

396

00:36:45,901 --> 00:36:49,201

In those cases, you know, we can train our models.

397

00:36:49,201 --> 00:36:52,421

We can have them do the out of sample predictions.

398

00:36:52,421 --> 00:36:57,841

And then we can score on those out of sample predictions, like how well they're

performing.

399

00:36:57,841 --> 00:37:01,893

So, you know, we often actually do the brute force.

400

00:37:01,893 --> 00:37:11,676

as opposed to doing something like the, I know in the Stan community, you might have like

the Pareto smooth importance sampling, leave one out approximations, things like that is

401

00:37:11,676 --> 00:37:13,236

another way to approach the problem.

402

00:37:13,236 --> 00:37:20,238

But basically a lot of times when you're doing Bayesian model comparison, you'll have some

out of sample metric or approximation to it.

403

00:37:20,238 --> 00:37:26,260

And then you like, you might have that for a bunch of out of sample data points.

404

00:37:26,260 --> 00:37:28,961

And then those data points, can

405

00:37:29,611 --> 00:37:39,638

do some statistical tests or even just look at sort of absolute values of how much better

one model is predicting now to sample performance metrics versus another.

406

00:37:39,638 --> 00:37:53,589

And in the STAND community, and well, PMC as well, think like the expected log point -wise

predictive density or the ELPD is a quantity that's often used, which is sort of a log

407

00:37:53,589 --> 00:37:57,301

likelihood based metric that we can use on.

408

00:37:57,445 --> 00:38:01,886

out of sample data to compute like expected predictive performance.

409

00:38:02,007 --> 00:38:11,990

And typically for Bayesian model comparison, the practice almost stops after you get that

ELPD value or something similar.

410

00:38:11,990 --> 00:38:19,052

might be, you might do some test of like how different they are, like some standard error

on the difference of the ELPD between two models.

411

00:38:19,172 --> 00:38:23,233

But at the end of the day, like once you have that metric,

412

00:38:23,275 --> 00:38:31,527

that's sort of the inference that you might have at the end is that, okay, this model is

performing better per this metric.

413

00:38:33,288 --> 00:38:41,130

with stacking, what you're doing, and I guess there's different forms of model averaging.

414

00:38:41,130 --> 00:38:46,541

have like Bayesian model averaging, which is slightly different than stacking and things

like that.

415

00:38:46,541 --> 00:38:53,093

But what would all of them follow the same basic principle is that you have your out of

sample performance metrics.

416

00:38:53,109 --> 00:39:04,087

And then what you do is instead of just choosing one model based on the model that has

better out of sample performance metrics, you build a model on those performance metrics

417

00:39:04,087 --> 00:39:11,792

to kind of tell you when you should rely on, you know, model A versus model B.

418

00:39:11,792 --> 00:39:22,389

And so, so the stacking or averaging models we can think of as just like a different model

themselves that are trained instead of on your outcome.

419

00:39:22,389 --> 00:39:26,832

measure in your actual substantive or your candidate model that you care about.

420

00:39:27,072 --> 00:39:37,540

It's trained on the performance metrics, the auto sample performance metrics that you are

using to, in this case, you wouldn't be doing model selection.

421

00:39:37,540 --> 00:39:49,128

You'd be blending together the predictions from each of your candidate models according to

the model and how it thinks you should weight both based on the auto sample performance.

422

00:39:49,128 --> 00:39:50,188

And so.

423

00:39:50,701 --> 00:40:01,340

So kind of going that route does require a bit more, you have to think a little bit more

about like how you're using your data because if you want to evaluate how well a stacking

424

00:40:01,340 --> 00:40:07,455

model is performing, for example, you have to leave out a little bit more validation data.

425

00:40:07,455 --> 00:40:10,007

So you don't want to do any double dipping.

426

00:40:10,007 --> 00:40:17,803

so you'll have your candidate models that you'll make out of sample predictions on.

427

00:40:17,904 --> 00:40:19,649

Those predictions become

428

00:40:19,649 --> 00:40:23,853

that your performance on those predictions become the basis for training your stacking

model.

429

00:40:23,853 --> 00:40:30,589

And then at the end of the day, you might train your stacking model on some other third

validation set of data.

430

00:40:30,589 --> 00:40:42,911

So I think that's really the only big limitation, I would say, of using those approaches

over just like your traditional model comparison, where you're kind of done once you

431

00:40:42,911 --> 00:40:44,292

select your model.

432

00:40:45,217 --> 00:40:53,662

That being said, think, yeah, being able to combine the predictions from the candidate

models ends up oftentimes being well worth, well worth kind of dividing your data up that

433

00:40:53,662 --> 00:40:53,832

way.

434

00:40:53,832 --> 00:40:54,122

Yeah.

435

00:40:54,122 --> 00:40:56,063

Yeah, yeah, definitely.

436

00:40:56,063 --> 00:41:04,968

That's, that's an extremely, extremely good point and also very useful method.

437

00:41:05,048 --> 00:41:14,573

I know in, in PIMC, for instance, with RVs, you can do that very easily where you

438

00:41:15,073 --> 00:41:30,600

basically do your model comparison with all these, it's gonna give weights to the models

and then those weights are used by a plan C with the PMW sample, post -hera predictive W

439

00:41:30,600 --> 00:41:39,674

where we weight each models, predictions, each models, post -hera predictive samples,

according to the weights from the model comparison.

440

00:41:39,674 --> 00:41:42,145

So is that.

441

00:41:42,541 --> 00:41:47,885

how you usually end up implementing that or using Stan?

442

00:41:47,885 --> 00:41:48,776

How do you do that?

443

00:41:48,776 --> 00:41:54,130

I think it's going to be interesting for the listeners who want to give that a try.

444

00:41:54,191 --> 00:41:55,472

Yeah, no, it's a great question.

445

00:41:55,472 --> 00:42:00,276

So that's good plug for the paper we just wrote.

446

00:42:00,276 --> 00:42:05,110

So yeah, we've been actually using some internal software to do a lot of this.

447

00:42:05,110 --> 00:42:11,104

like actually, all of our software historically has been like we have our own kind of

448

00:42:11,425 --> 00:42:18,569

BRMS is not the right term for it, but we have our own language that we use to write Stan

models.

449

00:42:19,390 --> 00:42:24,793

And then we do a lot of our stacking and stuff.

450

00:42:24,793 --> 00:42:28,435

We had our own internal code that we would use to do all of this.

451

00:42:29,155 --> 00:42:39,091

But we decided recently that this was something that, yeah, I think we were talking about

before the show started today, that it's not something that

452

00:42:39,153 --> 00:42:48,118

has gotten a lot of attention, like in terms of like making this easy to do in a generic

way with like a bunch of different types of stacking models.

453

00:42:48,118 --> 00:42:54,621

And so we've actually just released, and wrote a paper on, this package in Python called BayesBlend.

454

00:42:54,742 --> 00:43:05,447

And what this package is intended to do — what we hope it will allow users to do, what it allows us to do at least,

455

00:43:05,447 --> 00:43:08,209

and hopefully other users as well — is:

456

00:43:08,491 --> 00:43:18,044

like, given, you know, I might have a model that I fit in Stan or PyMC or, you know, whatever probabilistic programming language of choice.

457

00:43:19,104 --> 00:43:26,226

We built the package such that you can kind of initialize a stacking model — one of various different ones.

458

00:43:26,226 --> 00:43:36,981

So we have like the pseudo Bayesian model averaging types of models, the pseudo BMA plus

models, which are things that are based on the

459

00:43:36,981 --> 00:43:39,742

ELPD and they blend based on that.

460

00:43:39,742 --> 00:43:54,349

And then we also have, like, proper Bayesian stacking and hierarchical stacking models that you can use with BayesBlend, where, given the out-of-sample likelihood metrics that you can

461

00:43:54,349 --> 00:44:03,893

get by training your model on one set of data and making out-of-sample predictions on another test set — given those as input,

462

00:44:03,893 --> 00:44:12,962

you can fit a variety of these different stacking models and then easily blend them all

together and evaluate performance and things like that.

463

00:44:12,962 --> 00:44:21,910

And so we've built that in Python just because that's the stack that like we use Python

for our day to day work and in our production setting.

464

00:44:21,970 --> 00:44:29,217

And then we've been building integrations so that like right now it's really easy to

interface with.

465

00:44:29,217 --> 00:44:31,108

CmdStan, because that's what we use.

466

00:44:31,108 --> 00:44:36,262

So we kind of built from that perspective first, but it does interface with ArviZ as well.

467

00:44:36,262 --> 00:44:48,150

So if you're using PyMC, for example, you can kind of create that ArviZ InferenceData object and then use that as input for BayesBlend.

468

00:44:48,591 --> 00:44:58,335

And then, yeah, what you will get at the end of that workflow if you use BayesBlend is you get the blended predictions from your candidate models,

469

00:44:58,335 --> 00:45:06,587

as well as the blended likelihood, like the posterior likelihood, which you can use then

to evaluate performance and things like that.
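In code, the intended workflow looks roughly like the sketch below. The class and method names follow the BayesBlend paper as I understand it, so treat them as assumptions and check the package documentation; fit_a and fit_b stand in for CmdStanPy fits of two candidate models.

```python
# A sketch of the BayesBlend workflow described here (names assumed from
# the paper; verify against the current docs before relying on them).
from bayesblend import MleStacking

# fit_a and fit_b are hypothetical CmdStanPy fits of two candidate models,
# each exposing out-of-sample log-likelihoods and posterior predictions.
stacking = MleStacking.from_cmdstanpy({"model_a": fit_a, "model_b": fit_b})
stacking.fit()              # estimate the stacking weights
blend = stacking.predict()  # blended predictions plus blended log-likelihood
```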

470

00:45:07,808 --> 00:45:09,448

And so, yeah, we're really excited about this.

471

00:45:09,448 --> 00:45:19,491

I'm really excited to get other people outside of Ledger to use it and tell us what they think — make some complaints, open some issues.

472

00:45:19,491 --> 00:45:23,172

There's a discussion board on the GitHub page as well.

473

00:45:24,072 --> 00:45:27,253

And we have a paper that we've submitted to

474

00:45:27,914 --> 00:45:33,458

We have a preprint on arXiv, and we've submitted the paper to a journal as well — see how that goes.

475

00:45:33,458 --> 00:45:41,365

But it's something that we use regularly and so it's something that we plan to keep

contributing to.

476

00:45:41,946 --> 00:45:47,501

If there are, like, quality-of-life or convenience things to make it easier for other folks to use, we'd love to hear about it.

477

00:45:47,501 --> 00:45:52,575

Because I think there's a lot of work that can still be done with stacking.

478

00:45:52,575 --> 00:45:54,733

There's a lot of really cool methods out there.

479

00:45:54,733 --> 00:45:59,786

I think hierarchical stacking in particular is something that I haven't really seen used

much in the wild.

480

00:45:59,786 --> 00:46:13,864

It's something we use every day at Ledger, which I think is, yeah — so I'm hoping BayesBlend will allow other people to kind of see that benefit and apply it in their own work

481

00:46:13,864 --> 00:46:16,385

easily in a reproducible way.

482

00:46:16,765 --> 00:46:18,366

Yeah, this is super cool.

483

00:46:18,366 --> 00:46:23,549

And so of course I put in the show notes the paper.

484

00:46:23,725 --> 00:46:37,654

and the documentation website for BayesBlend, for people who want to dig deeper, which I definitely encourage you to do.

485

00:46:38,635 --> 00:46:46,390

And when you're using BayesBlend, let's say I'm using BayesBlend with a PyMC model.

486

00:46:46,390 --> 00:46:50,023

So I'm going to give an inference data object.

487

00:46:51,004 --> 00:46:53,353

Do I get back an inference data object?

488

00:46:53,353 --> 00:46:57,805

An object with my weighted posterior predictive samples?

489

00:46:57,805 --> 00:47:03,007

How do I get back the — like, in which format am I going to get back the predictions?

490

00:47:03,007 --> 00:47:04,908

Yeah, that's a great question.

491

00:47:04,908 --> 00:47:08,900

I think if I can remember correctly, I don't want to give you the wrong information.

492

00:47:08,900 --> 00:47:19,545

I'm pretty sure we have — like, when you create the object that does the stacking, so the model object — there's a from-ArviZ method.

493

00:47:19,851 --> 00:47:31,950

And then I think we have a to-ArviZ method — it will use its own internal representation of the predictions and things like that, for just the sake of

494

00:47:31,950 --> 00:47:33,792

fitting the stacking model.

495

00:47:33,792 --> 00:47:38,334

But then I think you can return it back to an ArviZ InferenceData object at the end.

496

00:47:38,375 --> 00:47:46,793

And one of the things that I wanna do, we haven't had the bandwidth for it quite yet, but

it's not that many steps to then just kind of get rid of like.

497

00:47:46,793 --> 00:47:49,994

we should just have, like, a from-PyMC method, for example.

498

00:47:49,994 --> 00:47:54,345

And I think implementing something like that would be pretty straightforward.

499

00:47:54,345 --> 00:48:07,199

So I think we'll probably get to it eventually, but if anyone else wants to contribute, now they know — we have that doc on how to contribute as well on the GitHub

500

00:48:07,199 --> 00:48:07,629

page.

501

00:48:07,629 --> 00:48:15,785

So, but yeah, so I think we, our intention is to make it as seamless as possible.

502

00:48:15,785 --> 00:48:27,368

And so to the extent that there's ways that we can make it easier to use, definitely open

to add those features or take recommendations on how we should approach it.

503

00:48:27,828 --> 00:48:35,781

But yeah, I think probably right now working through the ArviZ InferenceData object is the way you can interface with most things other than CmdStan.

504

00:48:35,781 --> 00:48:36,521

Yeah.

505

00:48:36,521 --> 00:48:37,211

Yeah.

506

00:48:37,211 --> 00:48:37,511

Yeah.

507

00:48:37,511 --> 00:48:44,863

I mean, for people listening in the PyMC world and even

508

00:48:45,005 --> 00:48:56,660

Python world and even Stan world, investing in understanding better the InferenceData object and xarray is definitely a great investment of your time, because I know it sounds a

509

00:48:56,660 --> 00:49:01,012

bit frustrating, but it's like, basically it's like pandas.

510

00:49:01,012 --> 00:49:03,173

It's the pandas of our world.

511

00:49:03,173 --> 00:49:12,801

And if you become proficient at that format, it's gonna help you tremendously in your

Bayesian modeling workflow because

512

00:49:12,801 --> 00:49:19,424

You may only want to interact with the model, but actually a huge part of your time is

going to be making plots.

513

00:49:19,424 --> 00:49:23,705

And making plots is done with prior predictive or posterior predictive samples.

514

00:49:23,705 --> 00:49:27,647

And that means they live in the inference data object.

515

00:49:28,647 --> 00:49:40,012

I know it can be a bit frustrating because you have yet another thing to learn, but it is

actually extremely powerful because it's a multi-dimensional pandas data frame,

516

00:49:40,012 --> 00:49:40,572

basically.

517

00:49:40,572 --> 00:49:42,649

So instead of only having to have.

518

00:49:42,649 --> 00:49:52,517

2D pandas data frames, you can do a lot of things with a lot more dimensions, which is always the case in...

No, I totally agree with that.
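A tiny example of what that buys you, using a dataset bundled with ArviZ: the posterior draws live in a labelled, multi-dimensional xarray structure rather than a flat 2D table.

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")  # example InferenceData

post = idata.posterior                 # an xarray Dataset
print(post["theta"].dims)              # ('chain', 'draw', 'school')

# Label-based selection and reductions across named dimensions:
choate = post["theta"].sel(school="Choate")
print(choate.mean(dim=("chain", "draw")))
```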

519

00:49:52,517 --> 00:49:56,260

And I think the other thing that's nice about it is you can use it everywhere.

520

00:49:56,260 --> 00:50:01,795

They have integrations in ArviZ to work with a whole host of different PPLs.

521

00:50:01,795 --> 00:50:12,235

It's like, whether you're using Stan or PyMC or whatever else, the ArviZ InferenceData object is always the commonality. It...

522

00:50:12,235 --> 00:50:21,801

makes it easy — if, like, I'm in a different setting and I'm using this other PPL in this case, having to learn a bunch of different tools to do plotting and deal with the data

523

00:50:21,801 --> 00:50:24,022

can be quite annoying.

524

00:50:24,022 --> 00:50:26,854

So it's nice to have one place to do most of it.

525

00:50:26,854 --> 00:50:31,957

And I think we're gonna lean on that pretty heavily with, like, developing BayesBlend.

526

00:50:31,957 --> 00:50:38,857

I think there's more we could probably do to integrate with the inference data structure

and like.

527

00:50:38,857 --> 00:50:41,669

in terms of making it easier to plot things and stuff like that.

528

00:50:41,669 --> 00:50:49,885

I think it's something I'm learning more and more about myself and would definitely also

recommend others to do.

529

00:50:49,885 --> 00:50:52,857

Yeah, that's what I tell my students almost all the time.

530

00:50:52,857 --> 00:50:59,812

It's like, time spent learning how the InferenceData object works is well spent.

531

00:51:00,112 --> 00:51:01,593

Yeah, agreed.

532

00:51:01,613 --> 00:51:03,625

Because you're going to have to do that anyways.

533

00:51:03,625 --> 00:51:04,585

So you might as well start now.

534

00:51:04,585 --> 00:51:05,376

Right.

535

00:51:05,376 --> 00:51:06,276

Yeah.

536

00:51:07,127 --> 00:51:08,998

You'll encounter it at some point.

537

00:51:09,079 --> 00:51:10,339

yeah, yeah.

538

00:51:10,420 --> 00:51:17,746

And I'm wondering, so you talked about model stacking too.

539

00:51:18,487 --> 00:51:20,458

I'm not familiar with that term.

540

00:51:20,458 --> 00:51:26,413

Is that just the same as model averaging or is that different?

541

00:51:26,494 --> 00:51:30,127

Yeah, I mean, so there's like, there's technically some differences.

542

00:51:30,127 --> 00:51:35,421

And I think some of the ways that like when

543

00:51:35,797 --> 00:51:43,240

I think historically the term Bayesian model averaging has meant something pretty specific

in the literature.

544

00:51:43,400 --> 00:51:51,044

And yeah, I want to hope to not get this wrong because sometimes I mix things up in my

head when thinking about them.

545

00:51:51,044 --> 00:51:53,785

It's just that the names make it easy to mix up.

546

00:51:53,785 --> 00:52:04,319

But I'm pretty sure historically Bayesian model averaging was done on, like, in-sample fit statistics and not out-of-sample, which can kind of —

547

00:52:04,385 --> 00:52:10,270

it's a small thing, but it can make a big difference in terms of like results and how the

problem is approached and things like that.

548

00:52:11,031 --> 00:52:19,157

And so when talking about model averaging, I'd say like stacking is one form of model

averaging.

549

00:52:19,218 --> 00:52:26,023

And there are many ways that one could perform model averaging, whereas like stacking is

one specific variant of that.

550

00:52:27,845 --> 00:52:32,168

like the way that we like actually implement stacking is

551

00:52:33,429 --> 00:52:47,754

There's a couple of different ways that you can do it, but you're basically optimizing

the, like if you have the out of sample log likelihood statistics, you can compute like a

552

00:52:47,754 --> 00:52:49,644

point-wise ELPD, if you will.

553

00:52:49,644 --> 00:52:58,197

So it's not like the sum of the log predictive density across all of your data points, but

just like each data point has its own LPD.
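With ArviZ, for instance, you can get exactly those point-wise values (here via PSIS-LOO on a bundled example dataset):

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")

loo = az.loo(idata, pointwise=True)
print(loo.loo_i)     # one ELPD contribution per observed data point
print(loo.elpd_loo)  # their sum: the usual total ELPD estimate
```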

554

00:52:58,877 --> 00:53:08,204

And then what you're essentially doing with stacking is you're fitting a model to optimize

combining all of those points across your different models.

555

00:53:08,204 --> 00:53:18,391

So it's like, maybe for certain data points, yeah, model A has a higher out of sample

likelihood than model B and for others it has lower.

556

00:53:18,391 --> 00:53:26,177

And so the goal of the stacking model is to fit it to those, with those as outcome

measures.

557

00:53:26,177 --> 00:53:26,989

And then,

558

00:53:26,989 --> 00:53:36,989

the weights that you derive from that are basically just optimizing how to combine those

likelihood values.

559

00:53:38,749 --> 00:53:46,829

so the way that stacking is actually implemented after you estimate those weights is to

sample from the posterior.

560

00:53:46,829 --> 00:53:56,909

So if, for a given data point, I have a 50% weight on one model, 50% weight on another, you're

561

00:53:57,333 --> 00:54:05,375

kind of blending together the posteriors by drawing samples in proportion to the weights.

562

00:54:05,896 --> 00:54:15,598

so that's kind of how stacking is approached and how we've implemented it in BayesBlend.
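That last step — drawing from each model's posterior in proportion to the weights — is simple to sketch with NumPy; the draws and weights below are toy values, purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior draws of the same quantity from two candidate models.
draws_a = rng.normal(0.60, 0.05, size=4000)  # e.g. model A's loss ratio
draws_b = rng.normal(0.70, 0.08, size=4000)  # e.g. model B's loss ratio
weights = np.array([0.5, 0.5])               # stacking weights

# For each blended draw, pick model A or B with probability = its weight.
pick = rng.choice(2, size=4000, p=weights)
blended = np.where(pick == 0, draws_a, draws_b)
print(blended.mean())
```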

563

00:54:15,598 --> 00:54:25,941

I know, like, pseudo-BMA — I think Yuling Yao, who has done a lot of work with stacking and pseudo-BMA and stuff, we've had some talks with him,

564

00:54:26,199 --> 00:54:31,071

as well as Aki Vehtari and some other folks who have done some work on these methods.

565

00:54:31,071 --> 00:54:42,847

I think they're moving away from the pseudo Bayesian model averaging terminology to start

to call it something that is less suggestive of like what classical Bayesian model

566

00:54:42,847 --> 00:54:45,618

averaging has typically referred to.

567

00:54:46,098 --> 00:54:51,520

And so I think like for folks interested in exploring more of that today,

568

00:54:52,161 --> 00:55:03,880

I mean, you can read the preprint — it has some definitions, and it does a pretty good job, I'd say, of kind of defining some of these different ideas, and gives you the math that you can

569

00:55:03,880 --> 00:55:07,453

look at to see how it's actually done mathematically.

570

00:55:07,974 --> 00:55:19,082

But then if you're kind of searching for resources to, I think, focusing on like the

stacking terminology should be probably pretty helpful over like Bayesian model averaging.

571

00:55:20,044 --> 00:55:22,025

That's my two cents, at least.

572

00:55:22,658 --> 00:55:23,558

Okay, yeah.

573

00:55:23,558 --> 00:55:31,024

Yeah, so what I get from that is that it's basically trying to do the same thing, but

using different approaches.

574

00:55:31,345 --> 00:55:32,746

Yeah, right, right, right.

575

00:55:32,746 --> 00:55:33,667

And that's my impression.

576

00:55:33,667 --> 00:55:36,579

I'm sure other people will have different reads on the literature.

577

00:55:36,579 --> 00:55:41,924

Like I said, it's something I've only really begun to explore in the last couple of years.

578

00:55:41,924 --> 00:55:45,567

And so there's I'm sure there are many other people out there that know much more.

579

00:55:45,567 --> 00:55:49,120

Okay, yeah, yeah, for sure.

580

00:55:49,120 --> 00:55:50,580

If we said

581

00:55:51,189 --> 00:55:57,534

If we've made a big mistake here and someone knows about that, please get in touch with me.

582

00:55:57,534 --> 00:56:00,446

You can, you can be outraged in your message.

583

00:56:00,446 --> 00:56:04,689

That way, I've learned something from that.

584

00:56:04,689 --> 00:56:06,740

I welcome it.

585

00:56:07,421 --> 00:56:08,842

That's right.

586

00:56:08,842 --> 00:56:09,942

Me as well.

587

00:56:11,984 --> 00:56:19,369

And so actually I'd like to get back a bit to the previous

588

00:56:20,993 --> 00:56:39,483

the previous models we talked about, you know, now that we've talked about your model averaging work. And I'm curious about how external factors like economic downturns or

589

00:56:39,483 --> 00:56:49,997

global health crises, for instance, how does that affect your forecasting models and what

strategies do you employ?

590

00:56:49,997 --> 00:56:53,318

to adjust models in response to such events?

591

00:56:53,819 --> 00:56:55,639

no, it's a great question.

592

00:56:55,979 --> 00:57:07,984

So yeah, economic factors definitely, definitely can influence kind of the performance of these portfolios.

593

00:57:07,984 --> 00:57:18,539

But a lot of times it's actually surprisingly, like, these loss ratios are surprisingly

robust to a lot of these economic factors.

594

00:57:18,539 --> 00:57:19,909

And partly,

595

00:57:20,021 --> 00:57:34,623

It's just because of the way that I think insurance generally works where, you know, if a

good example of this is in, yeah, like COVID times, people like, for example, if you're

596

00:57:34,623 --> 00:57:43,701

thinking about insuring commercial auto or private auto insurance policies, and like when

COVID happened, people stopped driving.

597

00:57:43,701 --> 00:57:46,123

And so people got into a lot less accidents.

598

00:57:46,123 --> 00:57:49,415

And so in that case, loss ratios went

599

00:57:49,441 --> 00:57:55,705

really far down for auto policies or for auto programs.

600

00:57:55,705 --> 00:58:07,521

And in some cases, insurance companies actually paid back some of the policyholders, like

some portion of the premium, just because things were so — like, the loss ratios were so

601

00:58:07,521 --> 00:58:10,273

low.

602

00:58:10,273 --> 00:58:13,334

And so there's examples of things like that happening.

603

00:58:15,015 --> 00:58:18,837

like just due to the nature of how like policies are written out,

604

00:58:18,841 --> 00:58:21,562

and how they're paid out.

605

00:58:21,562 --> 00:58:27,845

So, like, I pay my insurance upfront, and then they only lose money when claims are made.

606

00:58:28,605 --> 00:58:35,788

The things that we think about, they have to be things that would, mostly things that

would influence claims, I would say is the primary factor.

607

00:58:35,788 --> 00:58:46,052

So if there's something economic that we believe is going to affect how much claims are

made, whether we think it will make them go up or down, like that's going to be our, like

608

00:58:46,052 --> 00:58:48,053

the primary force through which

609

00:58:48,053 --> 00:58:54,947

like economic conditions could affect these models, mostly because like the premium that

is written is pretty stable.

610

00:58:54,947 --> 00:59:03,702

Like generally, regardless of what's going on economically, the same types of insurance

policies like are often either required or things like that.

611

00:59:03,702 --> 00:59:16,499

So unless management is changing a lot about the program in terms of how they're pricing

things or something of that nature, you don't tend to get huge swings in the premium

612

00:59:16,499 --> 00:59:17,599

that's coming in.

613

00:59:18,117 --> 00:59:20,808

And so that's what we focus on.

614

00:59:20,808 --> 00:59:23,359

Mostly it would be things that affect claims.

615

00:59:23,359 --> 00:59:40,163

when we do look at that, one of the things that we've implemented, that we've looked into

is we look at like modeling overall industry level trends and using that as sort of input

616

00:59:40,163 --> 00:59:41,824

to our program level models.

617

00:59:41,824 --> 00:59:45,485

And so it's not quite like deriving priors from industry.

618

00:59:45,485 --> 00:59:46,625

It's more like

619

00:59:46,625 --> 00:59:56,402

we actually know across all of the commercial auto programs, for example, what the sort of

industry level loss ratio is.

620

00:59:56,402 --> 01:00:06,359

And if we can understand, like that's where we might have some general idea of how

economic factors might influence something at that high of a scale.

621

01:00:06,359 --> 01:00:16,425

like the interest rate environment or like the location of the industry and other things

like that.

622

01:00:16,705 --> 01:00:30,975

We've built some models of like industry level trends that are then used as, so it's like

given we can predict like an industry loss ratio for the next so many accident years, we

623

01:00:30,975 --> 01:00:41,362

can use that information in our program level models and say like, how much do we think we

need to weight, you know, the industry where the industry is trending versus what we see

624

01:00:41,362 --> 01:00:42,762

in this program.

625

01:00:43,584 --> 01:00:46,281

That's kind of how we've approached that problem historically.

626

01:00:46,281 --> 01:00:56,509

I'd say we approach it that way mostly just because it's really hard at the level of

granularity that a lot of the programs that we deal with, like they're pretty small

627

01:00:56,509 --> 01:00:58,031

relative to industry at large.

628

01:00:58,031 --> 01:01:06,508

And so it's often hard to observe like general industry trends in the data, especially

when we have relatively few historic data points.

629

01:01:06,508 --> 01:01:09,780

It's hard to do it in a data-driven way.

630

01:01:11,081 --> 01:01:15,841

that's the big way that we've approached that problem is to kind of...

631

01:01:15,841 --> 01:01:22,385

If we can better understand the industry, and understand how economic factors influence where the industry is trending —

632

01:01:22,385 --> 01:01:26,648

We can then use that information in our program level analysis.

633

01:01:26,648 --> 01:01:32,150

And so we do have some models that do that.

634

01:01:33,231 --> 01:01:34,252

Yeah, fascinating.

635

01:01:34,252 --> 01:01:34,892

Fascinating.

636

01:01:34,892 --> 01:01:35,503

I really love that.

637

01:01:35,503 --> 01:01:42,987

That's super interesting because, by definition, these events are extremely low frequency.

638

01:01:43,329 --> 01:01:47,611

But at the same time, they can have a huge magnitude.

639

01:01:47,631 --> 01:01:52,203

You would be tempted to just forget about them because of their low frequency.

640

01:01:52,203 --> 01:01:57,035

But the magnitude means you can't really forget about them.

641

01:01:57,035 --> 01:01:59,435

so that's really weird.

642

01:01:59,495 --> 01:02:07,939

I think also having that in a Bayesian framework is actually very helpful, because you can actually accommodate that in the model.

643

01:02:08,539 --> 01:02:09,140

Yeah.

644

01:02:09,140 --> 01:02:12,421

And I think that too, another thing that

645

01:02:12,833 --> 01:02:23,061

This is kind of interesting about the kind of insurance portfolios that we deal with is

that some of this is actually on the underwriters or the management team who's actually

646

01:02:23,061 --> 01:02:24,763

like writing the insurance policies.

647

01:02:24,763 --> 01:02:34,511

And so a lot of times, those folks are the ones who are way ahead of the game in terms of spotting what we might call a long-tail risk or

648

01:02:34,511 --> 01:02:37,232

something like that, historically.

649

01:02:38,293 --> 01:02:45,880

Like, in workers' compensation, asbestos was an example of this, where it was something that was introduced in a bunch of houses.

650

01:02:45,880 --> 01:02:54,987

It was used everywhere as an insulator and you know, decades down the road come to find

out that this stuff is causing cancer and doing horrible things.

651

01:02:55,028 --> 01:02:59,762

And like those long-tailed risks are — they're pretty rare.

652

01:02:59,762 --> 01:03:01,733

You don't come by them often.

653

01:03:01,914 --> 01:03:05,717

But it's something that a lot of times the underwriters who are

654

01:03:05,773 --> 01:03:19,090

kind of pricing the policies and writing the insurance policies, they are sort of the

frontline defense for that because they're on the lookout for all of these long tailed

655

01:03:19,090 --> 01:03:31,547

risks and taking that into account when like pricing the policies, for example, or when

writing policies themselves to exclude certain things if they think it shouldn't apply.

656

01:03:32,347 --> 01:03:35,585

so oftentimes like that

657

01:03:35,585 --> 01:03:44,517

that kind of makes its way into the perspective we have on modeling because when we're

modeling a loss ratio, for example, our perspective is that we're almost trying to

658

01:03:44,517 --> 01:03:54,620

evaluate the performance of the management team because they're responsible for actually

writing the policies and marketing their insurance product and all of that.

659

01:03:54,620 --> 01:04:01,242

And we view ourselves as looking at the historic information is just like their track

record.

660

01:04:01,242 --> 01:04:04,375

so, I mean, that doesn't stop big economic things from.

661

01:04:04,375 --> 01:04:06,226

from changing that track record.

662

01:04:06,246 --> 01:04:11,238

But that's something that's kind of influenced how we think about our models, at least

from a generative perspective.

663

01:04:11,238 --> 01:04:25,224

yeah, I think it's definitely important to have that perspective when you're in such a

case where the data that you're getting is generated, and it kind of arises, in a rather

664

01:04:25,224 --> 01:04:26,734

complicated way.

665

01:04:28,035 --> 01:04:32,957

Yeah, fantastic points, I completely agree with that.

666

01:04:33,557 --> 01:04:46,297

And are there some metrics or, well, techniques we've talked about them, but are there any

metrics, if any, that you find most effective for evaluating the performance of these

667

01:04:46,297 --> 01:04:48,268

Bayesian time series models?

668

01:04:49,169 --> 01:04:56,384

Yeah, no, I think, yeah — historically we've done a lot of the sort of log-likelihood-based metrics.

669

01:04:56,384 --> 01:05:00,437

So we use ELPD for a lot of our decision making.

670

01:05:01,053 --> 01:05:10,098

So if we're exploring different models and we're doing our stacking workflow and all of

that, at the end of the day, if we're deciding whether it's worth including another

671

01:05:10,098 --> 01:05:18,953

candidate model in the stacking model in production, we'll often compare like what we

currently have in production to the new proposed thing, which could be a single model.

672

01:05:18,953 --> 01:05:23,065

It could be some stacked models or what have you.

673

01:05:23,246 --> 01:05:28,909

Typically we're using ELPD and we also look at things like

674

01:05:29,313 --> 01:05:32,976

like RMSE and mean absolute error.

675

01:05:33,738 --> 01:05:46,830

We tend to not rely necessarily on any given metric just because sometimes, especially

with ELPD, with the types of models we work with, there are some times where you can get

676

01:05:46,830 --> 01:05:52,896

pretty wild values for ELPD that can really bias like.

677

01:05:53,905 --> 01:05:59,747

Like at the end of the day, I guess this gets a little technical, but you might have an

LPD score for each data point.

678

01:05:59,747 --> 01:06:11,820

And if one of those data points is quite off, when you take the sum to get your total

model performance metric, it can often, can sometimes, it acts like any outlier and can

679

01:06:11,820 --> 01:06:15,871

kind of make the sum go in one direction quite a bit.

680

01:06:16,831 --> 01:06:22,073

so sometimes the LPD might be very sensitive to...

681

01:06:22,743 --> 01:06:25,995

like outlier data points compared to things like RMSE.

682

01:06:25,995 --> 01:06:34,201

So you might be, actually — and the reason is just because your prediction might be pretty close, like on an absolute scale, but if your uncertainty

683

01:06:34,201 --> 01:06:45,799

is really low in your prediction — what ELPD is really measuring is the height of your posterior predictive density at where the data point is.

684

01:06:45,799 --> 01:06:51,853

And so if you're too certain, and your data point is, like, way out in the tail of some distribution,

685

01:06:51,897 --> 01:07:00,803

and it ends up getting this crazy value even though RMSE might be pretty good because on

average you're pretty close actually.
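For reference, the usual estimator being described is, for held-out points $y_1, \dots, y_n$ and posterior draws $\theta^{(1)}, \dots, \theta^{(S)}$:

$$
\widehat{\mathrm{elpd}} \;=\; \sum_{i=1}^{n} \log \Bigl( \frac{1}{S} \sum_{s=1}^{S} p\bigl(y_i \mid \theta^{(s)}\bigr) \Bigr),
$$

so a single $y_i$ sitting far out in the tail of an overconfident predictive density contributes a hugely negative log term that can dominate the sum.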

686

01:07:00,823 --> 01:07:13,552

So we have had to do some forays into more robust ways to compute or to estimate ELPD.

687

01:07:13,552 --> 01:07:20,957

We've done some research on that and sometimes we'll use those metrics in production where

we will say instead of

688

01:07:21,665 --> 01:07:33,053

instead of taking a sum of the ELPD values across all your data points — out-of-sample data points — we'll fit, like, a t-distribution to all of those data points.

689

01:07:33,093 --> 01:07:42,860

And that's one way that, like, the expectation of that t-distribution is not going to be as influenced by some extreme outliers.

690

01:07:43,201 --> 01:07:49,889

You also get the benefit of — you get, like, a degrees-of-freedom parameter estimated from the t-distribution that way.

691

01:07:49,889 --> 01:07:57,701

And that can be a sort of diagnostic because if it's too low, then you're approaching like

a Cauchy distribution that doesn't have an expectation.

692

01:07:57,701 --> 01:07:58,921

It doesn't have a variance.

693

01:07:58,921 --> 01:08:07,684

So you can — we've explored methods like that that we'll sometimes use in production, just because we do so many tests.
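As a hedged sketch of that flavour of robust estimator (not necessarily the exact production variant): fit a Student-t to the point-wise values with SciPy and use its location, with the fitted degrees of freedom doubling as a diagnostic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Stand-in point-wise out-of-sample log densities, plus one extreme
# outlier of the kind an overconfident prediction produces.
lpd_i = np.append(rng.normal(-1.0, 0.3, size=200), -80.0)

print(lpd_i.sum())       # the plain ELPD sum, dragged down by one point

# Robust variant: fit a t-distribution and use n * location instead.
df, loc, scale = stats.t.fit(lpd_i)
print(len(lpd_i) * loc)  # far less sensitive to the outlier
print(df)                # very low df (toward Cauchy) is itself a warning
```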

694

01:08:07,684 --> 01:08:17,407

It's a shame to like not be able to do a comparison because there's like a few data points

out of the thousands of data points that we have in our historic database that kind of

695

01:08:17,407 --> 01:08:19,571

throw everything off and make it such that

696

01:08:19,571 --> 01:08:22,843

there's no consensus on which model is performing better.

697

01:08:23,564 --> 01:08:35,945

And so yeah, that's a long way of saying we mostly focus on ELPD, but use some other like

more absolute metrics that are easily interpretable and then also some what we kind of

698

01:08:35,945 --> 01:08:48,545

think of as these more robust variants of ELPD, which I think at some point I think we'll

try to write a paper on it, see what other people think because one of those things that

699

01:08:48,777 --> 01:08:57,413

comes up, you come up with a solution to something that you think is a pretty big problem

and then very curious what other people might actually think about that or if they see any

700

01:08:57,413 --> 01:09:00,325

big holes in the approach.

701

01:09:00,325 --> 01:09:04,228

so, yeah, maybe at some point we'll have a paper out on that.

702

01:09:04,228 --> 01:09:05,168

We'll see.

703

01:09:06,230 --> 01:09:07,870

Yeah, that sounds like fun.

704

01:09:07,971 --> 01:09:16,376

But actually, I think it's a good illustration of something I always answer to my students who come from the statistics framework,

705

01:09:17,203 --> 01:09:23,975

and they tend to be much more oriented towards metrics and tests.

706

01:09:25,195 --> 01:09:31,077

And that's always weird to me because I'm like, you have posterior samples, you have

distribution for everything.

707

01:09:31,077 --> 01:09:33,167

Why do you want just one number?

708

01:09:33,167 --> 01:09:37,659

And actually you worked hard to get all those posterior samples and distributions.

709

01:09:37,659 --> 01:09:42,520

So why do you want to throw them out the window as soon as you have them?

710

01:09:42,660 --> 01:09:43,980

I'm curious.

711

01:09:44,881 --> 01:09:47,091

So yeah. You need to make a decision, right?

712

01:09:47,157 --> 01:09:48,328

Yeah.

713

01:09:48,328 --> 01:09:57,524

And often they ask, like, something related to: but what's the metric to know that, basically, the model is good?

714

01:09:57,524 --> 01:10:01,066

You know, so how do I compute R squared, for instance?

715

01:10:01,066 --> 01:10:02,267

Right.

716

01:10:02,688 --> 01:10:15,095

And I always give an answer that must be very annoying, that's like: I understand that you want a statistic — you know, a metric.

717

01:10:15,496 --> 01:10:16,677

That's great.

718

01:10:17,229 --> 01:10:19,070

But it's just a summary.

719

01:10:19,070 --> 01:10:20,281

It's nothing magic.

720

01:10:20,281 --> 01:10:26,896

So what you should probably do in all of the cases is use a lot of different metrics.

721

01:10:26,896 --> 01:10:34,400

And that's just what you answered here. It's like: you don't have one go-to metric that's supposed to be a magic number, and then you're good.

722

01:10:34,901 --> 01:10:44,247

It's like, no, you're looking at different metrics because each metric gives you an

estimation of a different angle of the same model.

723

01:10:44,247 --> 01:10:46,785

And a model is going to be good at some things

724

01:10:46,785 --> 01:10:48,606

but not at others, right?

725

01:10:48,606 --> 01:10:51,566

It's a bit like an athlete.

726

01:10:51,566 --> 01:10:56,068

An athlete is rarely extremely complete, because they have to be extremely specialized.

727

01:10:56,068 --> 01:10:59,258

So that means you have trade-offs to make.

728

01:10:59,258 --> 01:11:04,450

And so for your model, often you have to choose: well, I want my model to be really good at this.

729

01:11:04,450 --> 01:11:06,991

Don't really care about it being really good at that.

730

01:11:06,991 --> 01:11:14,733

But then if your metric is measuring the second option, your model is gonna appear really

bad, but you don't really care about that.

731

01:11:15,617 --> 01:11:20,829

what you end up doing as the modeler is looking at the model from different perspectives

and angles.

732

01:11:20,829 --> 01:11:33,323

And that will also give you insights about your model because often the models are huge

and multi -dimensional and you just have a small homo sapiens brain that cannot see beyond

733

01:11:33,323 --> 01:11:34,763

three dimensions, right?

734

01:11:34,763 --> 01:11:41,245

So you have to trim down everything and basically...

735

01:11:41,385 --> 01:11:43,826

I'm always saying, look at different metrics.

736

01:11:43,826 --> 01:11:45,477

Don't always look at the same one.

737

01:11:45,477 --> 01:11:52,019

And maybe also sometimes invent your own metric, because often that's something you're

interested in.

738

01:11:52,019 --> 01:11:54,550

You're interested in a very particular thing.

739

01:11:54,550 --> 01:12:00,393

Well, just invent your own metric, because you can always compute it, because it's just

posterior samples.

740

01:12:00,393 --> 01:12:05,214

And in the end, posterior samples, you just count them, and you see how it goes.

741

01:12:05,295 --> 01:12:06,955

That's not that hard.
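Concretely, a "custom metric" is often just a count over draws; for example (with toy draws):

```python
import numpy as np

rng = np.random.default_rng(2)
loss_ratio = rng.normal(0.95, 0.10, size=4000)  # stand-in posterior draws

# The posterior probability that the loss ratio exceeds 1 (i.e. the
# program loses money) is just the fraction of draws above 1.
print((loss_ratio > 1.0).mean())
```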

742

01:12:07,496 --> 01:12:09,306

No, I think that's great.

743

01:12:09,306 --> 01:12:10,957

And it's great to get that.

744

01:12:11,549 --> 01:12:15,392

in folks heads when they're like starting to learn about this stuff.

745

01:12:15,392 --> 01:12:26,950

It's like — I can't even count, like, how many, you know, classic machine learning papers I've seen where you just have tables of bolded metrics, with the model with the

746

01:12:26,950 --> 01:12:28,892

lowest RMSE is the best, right?

747

01:12:28,892 --> 01:12:30,483

And so therefore it's chosen.

748

01:12:30,483 --> 01:12:38,048

You know, I think that perspective can, it can make it a little harder to actually

understand your models for sure.

749

01:12:39,229 --> 01:12:49,466

And yeah, because there are even other things — it reminds me, like, yeah, we look at, like, we do simulation-based calibration, prior sensitivity analysis, all

750

01:12:49,466 --> 01:12:58,803

of these things that aren't necessarily tied to a performance metric, but they're tied to

how well you can interpret your model and how much you can trust it in the parameters that

751

01:12:58,803 --> 01:12:59,543

it's outputting.

752

01:12:59,543 --> 01:13:03,706

And so I think like all of those should also definitely be considered.

753

01:13:03,706 --> 01:13:08,129

And, you know, another thing that we encounter quite a lot is like,

754

01:13:08,129 --> 01:13:10,250

there's a cost to productionize these models.

755

01:13:10,250 --> 01:13:20,074

Like if we have a new model and it performs better technically by a small amount, like is

it really worth it if it's like a very complicated, hard to interpret and hard to

756

01:13:20,074 --> 01:13:20,644

maintain?

757

01:13:20,644 --> 01:13:26,397

And I think sometimes the answer is no, actually what we have is good enough.

758

01:13:26,397 --> 01:13:33,240

And so we don't actually need this more complicated, you know, hard to work with model.

759

01:13:33,240 --> 01:13:35,261

And that's something that I feel is

760

01:13:35,873 --> 01:13:46,958

probably more common in industry settings, where you're expected to maintain and reuse these models repeatedly, versus maybe more in academic work, where research

761

01:13:46,958 --> 01:13:55,921

is the primary objective and maybe you don't need to think as much about like model

maintainability or productionization things like that.

762

01:13:55,921 --> 01:14:04,937

And so I feel like having a holistic perspective on how your model is evaluated is, I think, very important, and

763

01:14:04,937 --> 01:14:12,742

definitely something that no single metric is going to allow you to do.

764

01:14:13,323 --> 01:14:15,764

Yeah, definitely.

765

01:14:16,405 --> 01:14:26,432

I'm really happy to hear that you guys use simulation -based calibration a lot because

that's quite new, but it's so useful.

766

01:14:26,432 --> 01:14:28,173

It's very useful.

767

01:14:28,173 --> 01:14:29,293

Yeah.

768

01:14:29,293 --> 01:14:33,416

It's nice to figure out if you have problems with your model before you fit it to real

data.

769

01:14:33,416 --> 01:14:34,517

Yeah.

770

01:14:34,701 --> 01:14:37,541

Yeah, I'm curious about how you do that.

771

01:14:37,541 --> 01:14:46,971

But first, for folks who want some detail about that, you can go back and listen to

episodes 107 with Marvin Schmidt.

772

01:14:46,971 --> 01:14:53,781

We talked about amortized Bayesian inference and why that's super useful for simulation

-based calibration.

773

01:14:53,781 --> 01:15:04,023

And also episode 109 with Sonja Winter, where we actually go into how would you implement

simulation -based calibration.

774

01:15:04,023 --> 01:15:04,924

why that's useful.

775

01:15:04,924 --> 01:15:08,256

So it's a bit of context here if you don't know what that is about.

776

01:15:08,757 --> 01:15:11,819

And now there are chapters to the episodes.

777

01:15:11,819 --> 01:15:26,910

So if you go to the website, learnbayesstats.com, you go to the episode page, you'll have the chapters of the episodes, and you can directly click the timestamp, and you'll see —

778

01:15:26,910 --> 01:15:30,815

you'll be able to jump directly to the...

779

01:15:30,815 --> 01:15:39,712

the part of the episode where we talk about that stuff in particular. And of course you have that also on the YouTube channel.

780

01:15:39,773 --> 01:15:48,810

So now, if you go to any YouTube episode, you click on the timestamp you're interested in and the video will just jump there.

781

01:15:48,810 --> 01:15:59,541

So that's pretty cool to reference back to something — you know, you're like, I've heard that somewhere in the episodes, but I don't remember exactly where. So yeah.

782

01:15:59,541 --> 01:16:02,203

like something you can do that I use actually quite a lot.

783

01:16:02,203 --> 01:16:03,413

You go to LearnBayesStats.

784

01:16:03,413 --> 01:16:04,564

I'm going to be using this now.

785

01:16:04,564 --> 01:16:05,704

This is a good tip.

786

01:16:05,704 --> 01:16:08,124

You can do Control-F.

787

01:16:08,146 --> 01:16:18,201

You do Control-F and you look for the terms you're interested in, and it will show up in the transcript, because you have the transcript now also on each episode page on

788

01:16:18,201 --> 01:16:19,852

LearnBayesStats.com.

789

01:16:20,132 --> 01:16:29,239

You look at the timestamp, and then with the timestamp you can infer which chapter this is talked about in, and then you get back to the

790

01:16:29,239 --> 01:16:32,490

to the part of the episode you're interested in much faster.

791

01:16:32,490 --> 01:16:33,390

yeah.

792

01:16:33,391 --> 01:16:37,562

Yeah, very helpful because searching in podcasts has historically been a challenging

problem.

793

01:16:37,562 --> 01:16:38,693

yeah.

794

01:16:38,953 --> 01:16:40,183

Now that's getting much better.

795

01:16:40,183 --> 01:16:43,875

So yeah, definitely use that.

796

01:16:43,875 --> 01:16:47,436

I do that all the time because I'm like, wait, we talked about that.

797

01:16:47,436 --> 01:16:50,578

I remember it's in that episode, but I don't remember when.

798

01:16:50,578 --> 01:16:53,299

So I use that all the time.

799

01:16:53,299 --> 01:16:59,053

So yeah, maybe like I know we're running like we're already at

800

01:16:59,053 --> 01:17:07,643

one hour fifteen — but just, can you talk a bit about SBC, simulation-based calibration?

801

01:17:07,643 --> 01:17:10,973

How do you guys use that in the trenches?

802

01:17:10,973 --> 01:17:12,733

I'm very curious about that.

803

01:17:12,953 --> 01:17:14,793

That's a good question.

804

01:17:15,253 --> 01:17:16,013

Yeah.

805

01:17:16,533 --> 01:17:22,933

So, for pretty much everything we do, we have, like, pretty custom models, and we use pretty custom software.

806

01:17:22,933 --> 01:17:27,703

So we have like our own internal software that we've written to like make it easy for

807

01:17:27,703 --> 01:17:29,894

the style of models that we use.

808

01:17:31,195 --> 01:17:42,551

yeah, for any model that we do research on or any model that we end up deploying in

production, typically, yeah, we start with simulation -based calibration and prior

809

01:17:42,551 --> 01:17:44,722

sensitivity analysis and that sort of thing.

810

01:17:44,722 --> 01:17:55,067

And so with simulation-based calibration, typically — because I described our workflow earlier, where it's kind of, like, multi-stage —

811

01:17:57,837 --> 01:17:59,828

but there are multiple different models.

812

01:17:59,828 --> 01:18:02,879

Like there are a couple of different models along that pipeline.

813

01:18:02,879 --> 01:18:16,666

And so typically we will do SBC for each of those individual component models just to see

if there are any clear issues that arise from any kind of sub component of our whole

814

01:18:16,666 --> 01:18:17,826

pipeline.

815

01:18:18,507 --> 01:18:22,949

And then we will also try to do it at least for...

816

01:18:23,979 --> 01:18:31,676

Like in some cases we might be able to do it for the whole pipeline, but typically you

might look at it for each individual model and then we'll, for the whole pipeline, we

817

01:18:31,676 --> 01:18:39,663

might look more at, like, the calibration of the posterior predictions, for example, against the true holdout points.

818

01:18:39,663 --> 01:18:47,711

But yeah, SBC is really nice because — I mean, oftentimes we do want to be able to interpret the parameters in our models.

819

01:18:47,711 --> 01:18:50,603

Let's say like if you really don't care about interpretation, maybe it's.

820

01:18:50,603 --> 01:18:54,736

it's maybe not as motivating to go through the whole SBC process.

821

01:18:55,698 --> 01:19:06,066

But in our case, oftentimes we'll have parameters that represent like how much momentum a

program has, how much reversion it has, like where the average program level loss ratio is

822

01:19:06,066 --> 01:19:09,809

that gets reverted back to is sitting, which is an important quantity.

823

01:19:09,910 --> 01:19:18,857

And we want to know that when we get those numbers out of our models and the parameter

values that we can actually interpret them in a meaningful way.

824

01:19:18,859 --> 01:19:33,091

And so SBC is, yeah, a great way to be able to look and see like, okay, are we able to

actually estimate parameters in our models in an unbiased and an interpretable way?

825

01:19:33,091 --> 01:19:35,873

But then also like, have we programmed the models correctly?

826

01:19:35,873 --> 01:19:38,105

I think it's another big thing.

827

01:19:38,105 --> 01:19:42,969

SBC helps you resolve because often, you

828

01:19:43,153 --> 01:19:50,919

The only time you really know if your model is actually coded up the right way is if you

simulate some fake data with known parameter values and then try to recover them with the

829

01:19:50,919 --> 01:19:51,639

same model.

830

01:19:51,639 --> 01:19:56,413

And SBC is sort of just like a comprehensive way to do that.

831

01:19:56,413 --> 01:20:07,921

I remember, like, before I read the SBC paper — back in my PhD years, we would, you know, pick a random set of parameter values for these models and simulate some

832

01:20:07,921 --> 01:20:11,223

data and refit just that single set of parameters and see

833

01:20:11,223 --> 01:20:14,395

Like, okay, like are they falling in the posteriors?

834

01:20:15,096 --> 01:20:21,000

And I feel like SBC is sort of just like a very nice way to do that in a much more

principled way.
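As a minimal self-contained sketch of that loop, using a conjugate normal-mean model so the "fit" is analytic (in practice each fit would be an MCMC run):

```python
import numpy as np

rng = np.random.default_rng(3)
n_sims, n_obs, n_draws, sigma = 500, 30, 999, 1.0

ranks = []
for _ in range(n_sims):
    mu = rng.normal(0.0, 1.0)                  # draw parameter from prior
    y = rng.normal(mu, sigma, size=n_obs)      # prior predictive data
    # Conjugate posterior for a normal mean (stands in for a real fit).
    post_var = 1.0 / (1.0 + n_obs / sigma**2)
    post_mean = post_var * y.sum() / sigma**2
    draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    ranks.append((draws < mu).sum())           # rank of truth among draws

# If model + code are correct, ranks are uniform on {0, ..., n_draws}.
hist, _ = np.histogram(ranks, bins=10)
print(hist)  # roughly equal counts per bin indicates calibration
```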

835

01:20:22,822 --> 01:20:34,712

And so I would definitely encourage regular use of SBC, even though, yeah, it takes a

little bit more time, but it saves you more headaches later on down the road.

836

01:20:35,413 --> 01:20:41,257

Yeah, I mean, SBC is definitely just an industrialized way of doing

837

01:20:41,271 --> 01:20:42,572

what you described, right?

838

01:20:42,572 --> 01:20:53,750

Just fixing the parameters, sampling the model, and then seeing if the model was able to

recover the parameters we used to fit it, basically.

839

01:20:53,750 --> 01:20:55,190

yeah.

840

01:20:55,190 --> 01:21:01,305

From these parameters, you sample prior predictive samples, which you use as the data to fit the model on.

841

01:21:01,305 --> 01:21:09,940

And yeah, amortized Bayesian inference is super useful for that, because then, once you've trained the neural network, it's free to get posterior samples.

842

01:21:11,117 --> 01:21:12,977

But two things.

843

01:21:12,977 --> 01:21:21,717

First, it'd be great if you could add that SBC paper you mentioned to the show notes, because I think it's going to be interesting to listeners.

844

01:21:22,617 --> 01:21:28,497

And second, how do you do that concretely?

845

01:21:28,777 --> 01:21:36,327

So when you do SBC, you're going to fit the model 50 or 100 times in a loop.

846

01:21:36,327 --> 01:21:38,245

That's basically how you do it, right?

847

01:21:38,445 --> 01:21:46,845

Yeah, usually we'll fit probably like 100 to 500 times — closer to 500.

848

01:21:46,845 --> 01:21:50,425

Usually a lot of our models actually don't take too long to fit.

849

01:21:50,425 --> 01:21:58,025

Most of the fitting time that is involved is in, like, the R&D process — backtesting, training the stacking models and all that.

850

01:21:58,025 --> 01:22:02,775

But, like, for the individual models to fit to data, they're pretty quick most of the time.

851

01:22:02,775 --> 01:22:06,385

So we do a decent number of samples usually.

852

01:22:06,385 --> 01:22:08,387

And yeah, it's sort of like, well,

853

01:22:08,657 --> 01:22:21,639

The way that it's programmed in our internal software is that we'll, like, refit the model, basically, to save out all the sort of rank statistics that we need, or

854

01:22:21,639 --> 01:22:35,260

like the location of the percentiles of the true values in the posterior predicted values for those parameters, and just store all those, going over in a loop.

855

01:22:35,400 --> 01:22:36,225

I think we've

856

01:22:36,225 --> 01:22:44,350

We might have some stuff that we've messed around with like parallelizing that, but it

ends up usually just being faster to just parallelize the MCMC chains instead.

857

01:22:44,350 --> 01:22:49,332

So a lot of times we just run this stuff locally because the models fit so quick.

858

01:22:49,332 --> 01:22:52,194

That's usually how we approach it.

859

01:22:52,494 --> 01:23:02,300

But yeah, so it's like if you can set a single set of parameter values and do a parameter

recovery simulation, SBC is basically just a loop on top of that.

860

01:23:02,300 --> 01:23:04,281

it's not a lot.

861

01:23:04,589 --> 01:23:05,619

too much overhead, really.

862

01:23:05,619 --> 01:23:11,872

The overhead is in the added time it takes, not necessarily the added amount of code that

it takes.

863

01:23:11,872 --> 01:23:13,093

Just kind of nice, I think.

864

01:23:13,093 --> 01:23:14,733

Yeah, Yeah, the code is trivial.

865

01:23:14,733 --> 01:23:19,235

It's just like you need to let the computer basically run a whole night.

866

01:23:19,235 --> 01:23:22,096

Yeah, go make some tea, get a coffee.

867

01:23:22,096 --> 01:23:24,327

Yeah, to do all the simulations.

868

01:23:24,327 --> 01:23:25,188

mean, that's fine.

869

01:23:25,188 --> 01:23:28,219

keep your house warm in the winter.

870

01:23:28,659 --> 01:23:33,521

Usually it fits quite fast, the model, because you're using the prior predictive samples.

871

01:23:34,297 --> 01:23:39,121

and as the data so it's not too weird.

872

01:23:39,161 --> 01:23:43,455

In general, you have a proper structure in your generative model.

873

01:23:43,455 --> 01:23:45,016

So yeah, definitely.

874

01:23:45,016 --> 01:23:46,017

Yeah, no, that's a point.

875

01:23:46,017 --> 01:23:47,728

Please nurse to do that.

876

01:23:47,889 --> 01:23:49,389

Yeah, sure.

877

01:23:49,430 --> 01:23:49,680

Yeah.

878

01:23:49,680 --> 01:23:56,095

And if you don't have completely stupid priors... Yeah, you will find that you probably do.

879

01:23:56,095 --> 01:23:57,776

Yeah.

880

01:23:57,877 --> 01:24:03,231

If you're using extremely wide priors, then yeah, your data is gonna look very weird

881

01:24:03,317 --> 01:24:05,058

a good portion of the time.

882

01:24:05,058 --> 01:24:08,358

so yeah, like then the model fitting is going to be longer, but then.

883

01:24:08,358 --> 01:24:09,169

Yeah.

884

01:24:09,169 --> 01:24:10,559

No, that's a good point.

885

01:24:10,559 --> 01:24:10,969

Yeah.

886

01:24:10,969 --> 01:24:21,542

I didn't bring it up, but yeah — for helping come up with reasonable priors, SBC is another good way to do that, because if it works, then that's a good sign.

887

01:24:21,682 --> 01:24:28,024

If it's going off the rails, then probably your priors would be the first place to look, other than perhaps a bug in the code.

888

01:24:28,024 --> 01:24:28,304

Yeah.

889

01:24:28,304 --> 01:24:28,564

Yeah.

890

01:24:28,564 --> 01:24:29,404

No, exactly.

891

01:24:29,404 --> 01:24:31,165

And that's why SBC is really cool.

892

01:24:31,165 --> 01:24:33,425

I think it's like, as you were saying,

893

01:24:33,473 --> 01:24:40,637

Because then you have also much more confidence in your model when you actually start

fitting it on real data.

894

01:24:44,319 --> 01:24:47,460

So maybe one last question for you, Nate.

895

01:24:47,841 --> 01:24:49,071

I have so many questions for you.

896

01:24:49,071 --> 01:24:54,925

It's like, you do so many things — but we're closing in on the one hour and a half,

897

01:24:54,925 --> 01:24:57,946

So I want to be respectful of your time.

898

01:24:58,927 --> 01:25:03,095

But I'm curious where you see the future of

899

01:25:03,095 --> 01:25:15,523

Bayesian modeling, especially in your field — so insurance and financial markets — particularly with respect to new technologies like, you know, the new machine learning

900

01:25:15,523 --> 01:25:18,195

methods and especially generative AI.

901

01:25:18,896 --> 01:25:20,487

Yeah, that's a great question.

902

01:25:20,487 --> 01:25:21,117

I

903

01:25:23,465 --> 01:25:24,986

I'm of two minds, I think.

904

01:25:24,986 --> 01:25:38,676

A part of me — from doing some Bayesian modeling in, I guess, kind of like healthcare before this, and now more on the finance and insurance side — I think there's, like, what you see in the

905

01:25:38,676 --> 01:25:43,379

press about like all the advances in generative AI and all of that.

906

01:25:43,379 --> 01:25:50,384

And then there's like the reality of the actual data structures and organization that you

see in the wild.

907

01:25:50,384 --> 01:25:52,235

And I think...

908

01:25:52,833 --> 01:26:04,339

I think there's still like a lot of room for more of what you might think of as like the

classical kind of workflow where people are, you know, not really necessarily relying on

909

01:26:04,339 --> 01:26:10,893

any really complicated infrastructure or modeling techniques, but more following, like, your traditional principled Bayesian workflow.

910

01:26:10,893 --> 01:26:17,267

And I think especially in the insurance industry, the insurance industry is like very

heavily regulated.

911

01:26:17,267 --> 01:26:21,889

And like if you're doing any pricing for insurance, for example,

912

01:26:21,889 --> 01:26:24,400

you basically have to use a linear model.

913

01:26:24,400 --> 01:26:27,641

There's really very little deviation you can get from there.

914

01:26:27,641 --> 01:26:38,384

And so, like, yeah, you could do Bayes there, but you can't really do, I think, more of, like, what we might think of when we think of, like, AI types of technologies.

915

01:26:38,384 --> 01:26:50,117

think there's potentially room for that and like within organizations, but for some of the

core modeling work that's done that influences decisions that are made, I think.

916

01:26:50,477 --> 01:26:54,958

there's still a ton of room for more of these classical statistics types of approaches.

917

01:26:56,119 --> 01:27:07,462

That being said, I think there's a lot of interest in Bayes at scale and more modern machine learning types of contexts.

918

01:27:07,462 --> 01:27:17,805

And I think there's a lot of work that's going on with INLA and variational Bayes and, like,

919

01:27:18,173 --> 01:27:25,019

the Stan team just released Pathfinder, which is kind of a new algorithm, like on the

variational side of things.

920

01:27:25,139 --> 01:27:37,630

And I think when I think of like Bayes at scale and like Bayesian machine learning types

of applications, I think there's probably a lot of interesting work that can be done in

921

01:27:37,630 --> 01:27:38,250

that area.

922

01:27:38,250 --> 01:27:42,753

think there's a lot of interesting future potential for those methods.

923

01:27:43,154 --> 01:27:46,897

I have less experience with them myself, so I can't really speak to

924

01:27:47,117 --> 01:27:48,938

to them in too much detail.

925

01:27:49,999 --> 01:27:57,264

But I also think there's a lot of interesting things to explore with full Bayes.

926

01:27:57,344 --> 01:28:06,231

As we have more compute power, it's easier to, for example, run a model with many chains

with relatively few samples.

927

01:28:06,231 --> 01:28:15,297

And so I think with distributed computing, I think it would be great to have a future

where we can still do full Bayes —

928

01:28:15,539 --> 01:28:25,792

like, you know, get our full posteriors with some variant of MCMC, but in a faster way, just with more compute.

929

01:28:25,792 --> 01:28:27,732

And so I think, yeah.

930

01:28:27,732 --> 01:28:38,185

So I guess all that to say, I think it's going to be a long time before, you know, the classical statistics modeling workflow becomes obsolete.

931

01:28:38,185 --> 01:28:45,399

I don't see that happening anytime soon, but in terms of using Bayes and other things at scale, there's a lot of

932

01:28:45,399 --> 01:28:58,830

really exciting, different methods being explored that I haven't actually had any real exposure to myself in a work or applied setting, because the problems that

933

01:28:58,830 --> 01:29:01,973

I've worked on have kind of remained at a scale where

934

01:29:01,973 --> 01:29:11,841

I can still mostly fit the models on my laptop, or on some EC2 instance, some machine that doesn't require too much compute.

935

01:29:11,841 --> 01:29:14,573

So yeah, I guess that's...

936

01:29:14,647 --> 01:29:16,598

That's my current perspective.

937

01:29:16,598 --> 01:29:18,079

We'll see how it changes.

938

01:29:19,079 --> 01:29:19,960

Yeah, yeah.

939

01:29:19,960 --> 01:29:22,271

I mean, these are good points for sure.

940

01:29:22,271 --> 01:29:34,818

I mean, you're still going to need to understand the models, make sure the assumptions make sense, and understand the edge cases, the different dimensions of the models and

941

01:29:34,818 --> 01:29:37,690

angles as we talked about a bit earlier.

942

01:29:37,690 --> 01:29:43,693

So yeah, I think that's a really tremendous asset,

943

01:29:44,605 --> 01:29:47,226

and a kick-ass sidekick.

944

01:29:48,047 --> 01:29:54,449

So for sure, that's extremely useful right now.

945

01:29:55,410 --> 01:30:01,972

I can't wait to see how much progress we're gonna make on that front.

946

01:30:01,972 --> 01:30:13,057

I really dream about having a Jarvis, like Iron Man, like Tony Stark has Jarvis. That would be extreme.

947

01:30:13,057 --> 01:30:24,142

That'd be perfect, because you'd basically outsource a lot of the stuff that you're not very good at, and you focus on the things you're really extremely good at and efficient and

948

01:30:24,142 --> 01:30:25,242

productive.

949

01:30:25,662 --> 01:30:34,666

Yeah, no, I definitely think that a lot of the generative AI types of tools can definitely aid with productivity, for sure.

950

01:30:34,666 --> 01:30:41,309

Like, I can't tell you how many times I've just been like, hey, tell me how to do this

with Pandas because I don't want to figure it out.

951

01:30:41,798 --> 01:30:44,670

Similarly with like Plotly or stuff like that.

952

01:30:44,670 --> 01:30:51,206

I feel there are certain parts of the workflow where Google or Stack Overflow is no longer the first line of defense, right?

953

01:30:52,287 --> 01:31:04,017

And I think a lot of that stuff that I don't like to spend time on sometimes can be sped

up by a lot of these tools, which is really nice to have, I would say.

954

01:31:04,919 --> 01:31:07,721

But yeah, I'll definitely be curious to see

955

01:31:08,121 --> 01:31:11,953

if Bayes underlies some of these methods going forward.

956

01:31:11,953 --> 01:31:20,126

I know there's an interest in it, but the scalability concerns have so far maybe made that

a little challenging.

957

01:31:21,387 --> 01:31:31,051

Although, I don't know about your case, but in my case, I've never had a project where we were like, no, actually we can't use Bayes here because the data is too big.

958

01:31:31,431 --> 01:31:32,291

No, I agree.

959

01:31:32,291 --> 01:31:34,772

I think it's been similar for me.

960

01:31:34,772 --> 01:31:36,153

Usually there's a way.

961

01:31:36,681 --> 01:31:48,559

And I think, yeah, there are definitely problems where that gets challenging, but at the same time, if it's getting challenging for Bayes, it's probably gonna be

962

01:31:48,559 --> 01:31:51,251

challenging for other methods as well, I think.

963

01:31:51,251 --> 01:31:55,154

And then you deal with other issues in these cases too.

964

01:31:55,274 --> 01:32:04,500

And so I think, yeah, I've also been kind of biased by this, because a lot of times I'm

working with rather small datasets.

965

01:32:05,407 --> 01:32:13,389

at least in terms of how much memory they're taking up on my computer or something like

that.

966

01:32:13,389 --> 01:32:19,463

They're small enough that we can do some fun modeling and not have to worry too much.

967

01:32:22,484 --> 01:32:24,665

Yeah.

968

01:32:25,005 --> 01:32:26,325

That's a good point.

969

01:32:26,746 --> 01:32:34,269

But yeah, I'm definitely still waiting for a case where we'll be like, yeah, no, actually we cannot use Bayes here.

970

01:32:34,495 --> 01:32:34,865

Right.

971

01:32:34,865 --> 01:32:36,046

Yeah.

972

01:32:36,046 --> 01:32:37,628

That's actually interesting.

973

01:32:37,628 --> 01:32:38,669

Hopefully it never happens.

974

01:32:38,669 --> 01:32:39,529

Right.

975

01:32:39,770 --> 01:32:40,340

Yeah.

976

01:32:40,340 --> 01:32:43,172

That's the dream.

977

01:32:43,633 --> 01:32:46,936

And so, Nate, let's call it a show.

978

01:32:46,936 --> 01:32:49,438

I've already taken a lot of your time and energy.

979

01:32:49,438 --> 01:32:51,238

I'm guessing you need a coffee.

980

01:32:51,820 --> 01:32:51,940

yeah.

981

01:32:51,940 --> 01:32:53,661

I'll probably go get some tea after this.

982

01:32:53,661 --> 01:32:54,522

Maybe a Red Bull.

983

01:32:54,522 --> 01:32:56,544

We'll see how I'm feeling.

984

01:32:56,544 --> 01:32:58,025

This has been a great time.

985

01:32:58,025 --> 01:32:59,126

Yeah.

986

01:32:59,126 --> 01:33:00,306

It was great.

987

01:33:00,467 --> 01:33:02,943

I mean, of course, I'm going to ask you the last two questions.

988

01:33:02,943 --> 01:33:05,685

that I ask every guest at the end of the show.

989

01:33:05,685 --> 01:33:12,259

So first one, if you had unlimited time and resources, which problem would you try to

solve?

990

01:33:12,420 --> 01:33:13,500

Yeah.

991

01:33:13,500 --> 01:33:18,764

So I thought a lot about this, because I've listened to the podcast for so long and I've contemplated it every time.

992

01:33:18,764 --> 01:33:25,929

I feel like I always miss the other guests' responses because I'm lost in my thoughts about, like, how would I answer this?

993

01:33:26,590 --> 01:33:31,573

And I think probably, I think I would want to do

994

01:33:33,101 --> 01:33:41,847

some work in mental health, which is sort of the field that I grew up in, right?

995

01:33:41,888 --> 01:33:57,639

There are a lot of open problems in psychiatry and clinical psychology, both in terms of how we measure what a person is experiencing, mental illness in

996

01:33:57,639 --> 01:34:00,461

general, how we dissociate different types of

997

01:34:00,577 --> 01:34:13,786

disorders and things like that, but then also in terms of treatment selection, as well as, you know, treatments, like automated treatments that are maybe scaffolded through

998

01:34:13,786 --> 01:34:20,531

apps first, or that have different types of models rather than just the face-to-face therapy model.

999

01:34:20,851 --> 01:34:28,736

And I think, yeah, if I had unlimited resources, unlimited time and funding, I would be exploring

Speaker:

01:34:30,069 --> 01:34:43,733

kind of solutions to that, I guess solutions isn't even the right word, ways to approach the mental health crisis, and how to both better measure and better

Speaker:

01:34:43,733 --> 01:34:46,534

get people into treatments that they need.

Speaker:

01:34:46,534 --> 01:34:54,996

And some of the work I was doing at AVO before was related to this, but it's a

surprisingly hard field to get funding for.

Speaker:

01:34:55,056 --> 01:34:58,117

Just because there are a lot of barriers.

Speaker:

01:34:59,637 --> 01:35:02,838

Working in healthcare is a hard thing to navigate.

Speaker:

01:35:04,138 --> 01:35:11,500

And there are a lot of snake oil treatments out there that seem to suck up a lot of the interest and funding.

Speaker:

01:35:11,640 --> 01:35:16,512

And so I think, you know, if I didn't have to worry about that, there'd be a lot of

interesting things to do.

Speaker:

01:35:16,512 --> 01:35:23,384

But yeah, I think that would be what I would focus my energy on.

Speaker:

01:35:23,464 --> 01:35:27,765

Yeah, definitely a worthwhile quest.

Speaker:

01:35:28,005 --> 01:35:36,872

And since you never hear the guests' answers, I think you're the first one to give that answer.

Speaker:

01:35:37,774 --> 01:35:41,887

It goes with my Freudian sips mug here.

Speaker:

01:35:41,887 --> 01:35:45,880

It's my favorite.

Speaker:

01:35:47,762 --> 01:35:55,879

And second question, if you could have dinner with any great scientific mind, dead, alive

or fictional, who would it be?

Speaker:

01:35:56,071 --> 01:35:59,863

Yeah, this one was a lot harder for me to think about.

Speaker:

01:36:00,604 --> 01:36:04,306

But I came to what I think would be the answer.

Speaker:

01:36:05,007 --> 01:36:08,009

And so I'm going to pick a fictional person.

Speaker:

01:36:08,350 --> 01:36:22,749

And I don't know if you've read the Foundation series or watched the television series, but Hari Seldon is the architect of what he calls psychohistory, which is essentially

Speaker:

01:36:22,749 --> 01:36:25,020

this science of

Speaker:

01:36:26,017 --> 01:36:32,349

predicting mass behaviors, like population-level behaviors, and he's developed this mathematical model.

Speaker:

01:36:32,349 --> 01:36:46,083

It allows him to predict, you know, thousands of years into the future how people will be interacting and, you know, save the galaxy and all that. Spoiler alert, but sort of.

Speaker:

01:36:46,403 --> 01:36:48,423

I'll leave some ambiguity, right?

Speaker:

01:36:49,064 --> 01:36:54,475

But I think that would be the person just because it's kind of an interesting concept.

Speaker:

01:36:54,661 --> 01:37:06,238

I think he's an interesting character, and psychohistory, given my background, I'm just kind of interested in that whole concept.

Speaker:

01:37:06,238 --> 01:37:13,632

And so if someone were able to do that, I'd sure like to better understand how exactly

they would be doing it.

Speaker:

01:37:13,632 --> 01:37:15,693

Maybe there's Bayes involved, we'll see.

Speaker:

01:37:20,194 --> 01:37:21,854

Yeah, great answer.

Speaker:

01:37:22,094 --> 01:37:24,945

And here again, first time I hear that on the show.

Speaker:

01:37:24,945 --> 01:37:26,075

That's awesome.

Speaker:

01:37:26,075 --> 01:37:29,987

You should definitely put a reference to that show in the show notes.

Speaker:

01:37:29,987 --> 01:37:32,097

That sounds like fun.

Speaker:

01:37:32,097 --> 01:37:34,718

I'm definitely going to check that out.

Speaker:

01:37:36,579 --> 01:37:39,899

And I'm sure a lot of listeners also will.

Speaker:

01:37:40,420 --> 01:37:41,780

yeah, definitely.

Speaker:

01:37:42,280 --> 01:37:43,481

Well, awesome.

Speaker:

01:37:43,481 --> 01:37:44,811

Thanks again.

Speaker:

01:37:44,811 --> 01:37:46,052

Nate, that was a blast.

Speaker:

01:37:46,052 --> 01:37:47,783

I really learned a lot.

Speaker:

01:37:48,063 --> 01:37:49,895

And that was great.

Speaker:

01:37:49,895 --> 01:37:55,548

I think you have an updated episode about model averaging and model comparison.

Speaker:

01:37:55,569 --> 01:37:59,361

I hope, Stéphane, you were happy with how it turned out.

Speaker:

01:37:59,361 --> 01:38:01,653

Me too.

Speaker:

01:38:01,653 --> 01:38:09,359

Well, as usual, I'll put resources and a link to your website in the show notes for those

who want to dig deeper.

Speaker:

01:38:09,359 --> 01:38:13,331

Thank you again, Nate, for taking the time and being on this show.

Speaker:

01:38:14,165 --> 01:38:15,556

Awesome, yeah, thanks a lot for having me.

Speaker:

01:38:15,556 --> 01:38:20,208

I had a blast and, yeah, I look forward to being a continued listener.

Speaker:

01:38:20,769 --> 01:38:27,413

Yeah, thank you so much for listening to the show for so many years.

Speaker:

01:38:27,413 --> 01:38:33,676

Definitely means a lot to me. And you're welcome back anytime on the show, of course.

Speaker:

01:38:34,297 --> 01:38:35,637

Yeah, just let me know.

Speaker:

01:38:39,117 --> 01:38:42,820

This has been another episode of Learning Bayesian Statistics.

Speaker:

01:38:42,820 --> 01:38:53,309

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more

Speaker:

01:38:53,309 --> 01:38:57,392

episodes to help you reach a true Bayesian state of mind.

Speaker:

01:38:57,392 --> 01:38:59,354

That's learnbayesstats.com.

Speaker:

01:38:59,354 --> 01:39:02,216

Our theme music is Good Bayesian by Baba Brinkman.

Speaker:

01:39:02,216 --> 01:39:04,198

Feat. MC Lars and Mega Ran.

Speaker:

01:39:04,198 --> 01:39:07,360

Check out his awesome work at bababrinkman.com.

Speaker:

01:39:07,360 --> 01:39:08,555

I'm your host,

Speaker:

01:39:08,555 --> 01:39:09,606

Alex Andorra.

Speaker:

01:39:09,606 --> 01:39:13,709

You can follow me on Twitter at Alex underscore Andorra, like the country.

Speaker:

01:39:13,709 --> 01:39:21,014

You can support the show and unlock exclusive benefits by visiting patreon.com/LearnBayesStats.

Speaker:

01:39:21,014 --> 01:39:23,396

Thank you so much for listening and for your support.

Speaker:

01:39:23,396 --> 01:39:25,688

You're truly a good Bayesian.

Speaker:

01:39:25,688 --> 01:39:29,200

Change your predictions after taking information in.

Speaker:

01:39:29,200 --> 01:39:35,873

And if you're thinking I'll be less than amazing, let's adjust those expectations.

Speaker:

01:39:35,873 --> 01:39:49,029

Let me show you how to be a good Bayesian
Change calculations after taking fresh data in
Those predictions that your brain is making
Let's get them on a solid foundation
