Learning Bayesian Statistics

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag 😉

Takeaways:

  • User experience is crucial for the adoption of Stan.
  • Recent innovations include adding tuples to the Stan language, new features and improved error messages.
  • Tuples allow for more efficient data handling in Stan.
  • Beginners often struggle with the compiled nature of Stan.
  • Improving error messages is crucial for user experience.
  • BridgeStan allows for integration with other programming languages and makes it very easy for people to use Stan models.
  • Community engagement is vital for the development of Stan.
  • New samplers are being developed to enhance performance.
  • The future of Stan includes more user-friendly features.

Chapters:

00:00 Introduction to the Live Episode

02:55 Meet the Stan Core Developers

05:47 Brian Ward’s Journey into Bayesian Statistics

09:10 Charles Margossian’s Contributions to Stan

11:49 Recent Projects and Innovations in Stan

15:07 User-Friendly Features and Enhancements

18:11 Understanding Tuples and Their Importance

21:06 Challenges for Beginners in Stan

24:08 Pedagogical Approaches to Bayesian Statistics

30:54 Optimizing Monte Carlo Estimators

32:24 Reimagining Stan’s Structure

34:21 The Promise of Automatic Reparameterization

35:49 Exploring BridgeStan

40:29 The Future of Samplers in Stan

43:45 Evaluating New Algorithms

47:01 Specific Algorithms for Unique Problems

50:00 Understanding Model Performance

54:21 The Impact of Stan on Bayesian Research

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke and Robert Flannery.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.

Speaker:

This episode is the first of its kind.

2

00:00:07,874 --> 00:00:18,194

Welcome to the very first live episode of the Learning Bayesian Statistics podcast, recorded

at StanCon on September 10, 2024.

3

00:00:18,194 --> 00:00:25,614

Again, I want to thank the whole StanCon committee for their help, trust, and support in

organizing this event.

4

00:00:25,614 --> 00:00:27,968

I surely had a blast and I hope

5

00:00:27,968 --> 00:00:28,838

Everybody did.

6

00:00:28,838 --> 00:00:37,342

In this episode, you will hear from not one, but two Stan core developers, Charles

Margossian and Brian Ward.

7

00:00:37,342 --> 00:00:44,866

They'll tell us all about Stan's future, as well as give us some practical advice for

better statistical modeling.

8

00:00:44,866 --> 00:00:49,368

And of course, there is a Q&A session with the audience at the end.

9

00:00:49,368 --> 00:00:53,990

This is Learning Bayesian Statistics, episode 118.

10

00:00:59,181 --> 00:01:19,522

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,

the projects, and the people who make it possible.

11

00:01:19,522 --> 00:01:21,772

I'm your host, Alex Andorra.

12

00:01:21,772 --> 00:01:25,324

You can follow me on Twitter at alex_andorra.

13

00:01:25,324 --> 00:01:26,154

like the country.

14

00:01:26,154 --> 00:01:30,376

For any info about the show, learnbayesstats.com is Laplace to be.

15

00:01:30,376 --> 00:01:37,558

Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on

Patreon, everything is in there.

16

00:01:37,558 --> 00:01:39,488

That's learnbayesstats.com.

17

00:01:39,488 --> 00:01:50,522

If you're interested in one-on-one mentorship, online courses, or statistical consulting,

feel free to reach out and book a call at topmate.io/alex_andorra.

18

00:01:50,522 --> 00:01:52,022

See you around, folks.

19

00:01:52,022 --> 00:01:53,905

and best Bayesian wishes to you all.

20

00:01:53,905 --> 00:02:00,998

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can

help bring them to life.

21

00:02:00,998 --> 00:02:04,102

Check us out at pymc-labs.com.

22

00:02:07,022 --> 00:02:13,967

Hello my dear Bayesians, today I want to welcome a new patron to the Learning Bayesian Statistics

family.

23

00:02:13,967 --> 00:02:19,431

Thank you so much, Rob Flannery, your support truly makes this show possible.

24

00:02:19,431 --> 00:02:29,137

I can't wait to talk to you in the Slack channel and hope that you will enjoy the

exclusive merch coming your way very soon.

25

00:02:29,422 --> 00:02:32,752

Before we start, I have great news for you.

26

00:02:32,752 --> 00:02:44,242

Because if you like live shows, we have two new live shows of LBS coming up on

November 7 and November 8 at PyData New York.

27

00:02:44,302 --> 00:02:55,566

So if you want to be part of the live experience, join the Q&As and connect with the

speakers and myself, and also get some pretty cool stickers, well...

28

00:02:55,566 --> 00:03:02,326

You can get your ticket already at pydata.org/nyc2024.

29

00:03:02,446 --> 00:03:04,446

I can't wait to see you there.

30

00:03:04,446 --> 00:03:06,790

OK, on to the show now.

31

00:03:09,270 --> 00:03:10,492

So, welcome.

32

00:03:10,492 --> 00:03:12,284

Thank you so much for being here.

33

00:03:12,284 --> 00:03:24,770

You are going to have the immense honor and privilege of being the first-ever live audience of the

Learning Bayesian Statistics podcast.

34

00:03:27,266 --> 00:03:28,586

Thank you.

35

00:03:29,947 --> 00:03:34,839

Of course, as usual, a huge thank you to all the organizers of StanCon.

36

00:03:34,839 --> 00:03:36,669

Charles, of course, thank you so much.

37

00:03:36,669 --> 00:03:38,750

I know you worked a lot.

38

00:03:39,070 --> 00:03:42,712

Michael also who organized all of that.

39

00:03:42,712 --> 00:03:47,193

So I think you can give them a big round of applause.

40

00:03:53,150 --> 00:03:56,472

Okay, so let's get started.

41

00:03:56,753 --> 00:04:01,217

So for those of you who don't know me, I'm Alex Andorra.

42

00:04:01,337 --> 00:04:05,330

I am an open source developer.

43

00:04:05,330 --> 00:04:07,963

I am actually a PyMC core developer.

44

00:04:07,963 --> 00:04:10,065

Am I allowed to say those words here?

45

00:04:10,065 --> 00:04:11,205

That's fine.

46

00:04:11,606 --> 00:04:13,638

Don't worry.

47

00:04:13,638 --> 00:04:21,217

Yes, and very recently started as the senior applied scientist at the Miami Marlins.

48

00:04:21,217 --> 00:04:23,578

So if you're ever in Miami, let me know.

49

00:04:24,638 --> 00:04:35,861

And today we are gonna talk, and yeah, no, of course I am the host and creator of the

Learning Bayesian Statistics podcast, which is the best show about Bayesian stats.

50

00:04:35,861 --> 00:04:39,682

I think we can say that confidently because it's the only one.

51

00:04:41,823 --> 00:04:43,384

it's not that hard.

52

00:04:43,464 --> 00:04:45,814

But today we have amazing guests with us.

53

00:04:45,814 --> 00:04:50,986

We're gonna talk about everything Stan; today is the nerd panel.

54

00:04:51,070 --> 00:04:59,776

Anything you wanted to know about Stan, about samplers, about all the technical stuff

behind Stan.

55

00:04:59,857 --> 00:05:04,340

Why does it take so long to have INLA there, for instance, you know, stuff like that.

56

00:05:05,642 --> 00:05:06,522

You can ask that.

57

00:05:06,522 --> 00:05:09,985

It's going to be like the last 10 minutes of the show, I think.

58

00:05:09,985 --> 00:05:15,409

But before that, we're going to talk with Brian and Charles.

59

00:05:17,609 --> 00:05:26,114

So I'm going to be without the mic, which goes to the room for the rest of the show, so that

you can hear mainly from the guys.

60

00:05:26,515 --> 00:05:28,816

So let's start with Brian.

61

00:05:28,897 --> 00:05:35,281

So Brian Ward, you are a Stan core developer, if I understood correctly.

62

00:05:36,482 --> 00:05:43,096

Can you first give us a bit of background, the origin story of Brian?

63

00:05:43,096 --> 00:05:44,767

How did you end up doing what you're doing?

64

00:05:44,767 --> 00:05:47,231

Because it seems to me that you're doing a lot of

65

00:05:47,231 --> 00:05:54,125

software engineering stuff, which is a priori quite far from the Bayesian world.

66

00:05:54,125 --> 00:05:57,126

So how did you end up doing what you're doing today?

67

00:05:57,126 --> 00:06:04,130

Yeah, so I majored in computer science and I sort of came into this from a very software

development angle.

68

00:06:04,570 --> 00:06:08,713

So I sort of was always interested in how things work.

69

00:06:08,713 --> 00:06:11,214

So I learned to program, and then I was like, well, how do programming languages work?

70

00:06:11,214 --> 00:06:15,976

So I learned about compilers and then I stopped before going any deeper because there are

dragons down there.

71

00:06:17,131 --> 00:06:24,337

But as part of my studies, I started working on a project with a couple of my professors

that was about Stan.

72

00:06:24,337 --> 00:06:33,766

And they were mostly interested in Stan because in their words, it was the probabilistic

programming language that had the most thorough formal documentation of the language and

73

00:06:33,766 --> 00:06:34,466

its semantics.

74

00:06:34,466 --> 00:06:39,240

They really liked that they could form an abstract model of the Stan language.

75

00:06:39,240 --> 00:06:42,413

And so that was my first time ever using a probabilistic programming language.

76

00:06:42,413 --> 00:06:44,895

It was really coming in from that angle.

77

00:06:44,949 --> 00:06:55,022

And then since 2021, I've been working a lot on the Stan compiler, but then also just on,

like you said, general software engineering for the different Python libraries and trying

78

00:06:55,022 --> 00:06:59,893

to improve the installation process on systems like Windows and that sort of thing.

79

00:07:00,733 --> 00:07:01,674

OK.

80

00:07:02,154 --> 00:07:07,055

So we'll get back to that because I think there are a lot of interesting threads here.

81

00:07:07,055 --> 00:07:09,816

But first, let's switch to Charles.

82

00:07:09,816 --> 00:07:13,693

So maybe for the rest of the audience, Charles was already

83

00:07:13,693 --> 00:07:17,154

on the podcast; he's got a classic episode.

84

00:07:17,154 --> 00:07:22,906

So if you're really interested in Charles' background, you can go and check out his

episode.

85

00:07:22,906 --> 00:07:28,817

But maybe just for now, if you can quickly tell us who you are, how you ended up doing

that.

86

00:07:29,796 --> 00:07:33,138

Yes, I should mention that I am an understudy.

87

00:07:33,138 --> 00:07:38,840

There were actually two other Stan developers we were hoping to have on this panel.

88

00:07:38,840 --> 00:07:43,041

But because of circumstances, I ended up being here.

89

00:07:43,041 --> 00:07:50,686

I'm in very good company and I have a lot of thoughts about the future of Stan, which is

the topic of this conversation.

90

00:07:50,686 --> 00:07:58,691

But essentially, I've been a Stan developer for eight years now.

91

00:07:59,152 --> 00:08:13,033

And I started when I was working in biotech in pharmacometrics where Stan was up and

coming, but it lacked certain features to be used in pharmacometrics modeling.

92

00:08:13,033 --> 00:08:21,180

Notably, you know, support for ODE systems, features to model clinical trials.

93

00:08:21,180 --> 00:08:34,221

So my first project for Stan was developing an extension of Stan called Torsten, but also

in the process developed some features that directly appeared in Stan.

94

00:08:34,221 --> 00:08:39,295

For example, the matrix exponential, which is used to solve linear ODEs, the algebraic

solvers.

95

00:08:40,217 --> 00:08:41,197

And then,

96

00:08:41,197 --> 00:08:53,117

I became a statistician, I pursued a PhD in statistics, and I continued developing certain

features for Stan, kind of in that theme of implicit functions.

97

00:08:53,117 --> 00:08:55,837

And I think we'll talk a little bit about that.

98

00:08:56,357 --> 00:09:07,497

Nowadays, what I am is a research fellow, which is a glorified postdoc at the Flatiron

Institute, where I'm actually a colleague with Brian.

99

00:09:07,497 --> 00:09:10,157

And I mostly do research.

100

00:09:10,157 --> 00:09:22,037

around Bayesian computation, so that includes Markov chain Monte Carlo, variational

inference, and thinking about probabilistic programming languages today, tomorrow, but

101

00:09:22,037 --> 00:09:25,257

also maybe in five or 10 years, what these might look like.

102

00:09:25,937 --> 00:09:27,517

Yeah, thanks, Charles.

103

00:09:27,957 --> 00:09:33,497

Quick legal announcement that I forgot, of course.

104

00:09:33,857 --> 00:09:37,757

For the questions, we're going to record your voice.

105

00:09:37,757 --> 00:09:39,575

So if you ask a question, you're

106

00:09:39,575 --> 00:09:41,586

consenting to being recorded.

107

00:09:41,586 --> 00:09:50,289

If you don't want your voice to be recorded, just come ask the question afterwards or find

a buddy who is willing to ask the question for you.

108

00:09:50,630 --> 00:09:52,330

And that will be all fine.

109

00:09:52,630 --> 00:09:53,431

So that's that.

110

00:09:53,431 --> 00:09:59,913

Also, write down your questions because we're going to have the Q&A at the end of the

episode.

111

00:10:00,193 --> 00:10:01,774

So let's continue.

112

00:10:01,774 --> 00:10:04,935

Maybe with a question that's for both of you.

113

00:10:05,476 --> 00:10:09,537

I'm wondering before we talk about the future,

114

00:10:09,825 --> 00:10:22,389

You guys work with Stan all the time, so you do a lot of things, but what has been your

most exciting recent project involving Stan, of course?

115

00:10:24,107 --> 00:10:25,337

I can go first.

116

00:10:26,138 --> 00:10:35,610

So this goes back a bit, but one of the first real major, major wins for me was adding

tuples to the language.

117

00:10:35,610 --> 00:10:39,119

It's a slightly more advanced type than had previously appeared in Stan.

118

00:10:39,119 --> 00:10:46,083

It had a lot of implementation difficulty, but it was a really big change to the language

and the compiler that finally made it in.

119

00:10:46,404 --> 00:10:51,489

But more recently, working directly on Stan, I've

120

00:10:51,489 --> 00:11:01,225

been trying to add features to make it easier to do some of the things that are

built into Stan, especially related to the constraints and the transforms directly in

121

00:11:01,225 --> 00:11:01,665

Stan.

122

00:11:01,665 --> 00:11:09,720

So trying to take some of the magic that's built in out and let you be able to do things

yourself that work much closer to that.

123

00:11:09,720 --> 00:11:17,544

And that's been interesting to think about how to make Stan a language that is easier to

extend for newer people.

124

00:11:17,544 --> 00:11:19,845

This next release will have

125

00:11:19,905 --> 00:11:27,808

functions that make it a little easier to write your own user-defined transforms that do

the right thing during optimization, for example.

126

00:11:27,808 --> 00:11:28,869

Hmm, okay.

127

00:11:28,869 --> 00:11:29,369

That's cool.

128

00:11:29,369 --> 00:11:35,191

Can you maybe give an example about such a function that people could use in a model?

129

00:11:35,191 --> 00:11:35,591

Sure.

130

00:11:35,591 --> 00:11:44,245

So one thing you might want to do is you might want a simplex parameter, but you want,

because you have some understanding of the posterior geometry, you want an alternative

131

00:11:44,245 --> 00:11:45,155

parameterization.

132

00:11:45,155 --> 00:11:49,707

You want to use softmax or you want to use some other thing than what's built into Stan.

133

00:11:49,749 --> 00:11:55,330

And you can do this right now and it will work almost the same in almost all of the cases.

134

00:11:56,311 --> 00:11:59,605

going forward, we're trying to make it work the same in all of the cases.

135

00:11:59,605 --> 00:12:02,032

We're trying to sort of cover off those last things.

136

00:12:02,032 --> 00:12:11,015

in particular, if you're finding a maximum likelihood estimate, that is done without the

Jacobian adjustment for the change of variables there.

137

00:12:11,935 --> 00:12:18,277

That works for the built-in types in Stan, but right now there's no way to have that also happen

for your custom transforms.

138

00:12:18,277 --> 00:12:20,037

But there will be going forward.
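[Editor's note: to make that concrete, here is a hedged sketch, in the Stan language, of the kind of user-defined transform being discussed: a hand-rolled softmax parameterization of a simplex. The comment about a `jacobian +=` statement refers to the upcoming functionality Brian describes; the exact syntax and release details are assumptions, not a confirmed API.]

```stan
data {
  int<lower=1> K;
  vector<lower=0>[K] alpha;
}
parameters {
  // unconstrained representation; one coordinate is pinned to 0
  // so the softmax map is identifiable
  vector[K - 1] y_raw;
}
transformed parameters {
  // hand-rolled alternative to the built-in simplex constraint
  simplex[K] theta = softmax(append_row(y_raw, 0));
}
model {
  theta ~ dirichlet(alpha);
  // With the user-defined transform features discussed here, the
  // log-Jacobian of the softmax map could be registered (e.g. via a
  // `jacobian +=` statement) so that maximum likelihood estimation
  // skips it and full Bayes includes it, just like built-in types do.
}
```

[Today a model like this samples fine with MCMC; the gap being closed is making optimization treat the custom transform the same way it treats built-in constraints.]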

139

00:12:20,313 --> 00:12:24,175

Okay, that's really cool.

140

00:12:24,935 --> 00:12:36,000

so I have to admit that a lot of my recent work has been more Stan-adjacent rather than

specific contributions to Stan.

141

00:12:39,371 --> 00:12:55,050

And so I could talk about that, but maybe one of the features that we are hoping to

release soon and that I developed a few years ago, I prototyped a few years ago, was we

142

00:12:55,050 --> 00:12:59,072

wanted to build a nested Laplace approximation inside of Stan.

143

00:13:01,193 --> 00:13:08,493

And actually, we developed one and we had a prototype in 2020.

144

00:13:08,677 --> 00:13:12,759

So that already goes back and we published a paper about that.

145

00:13:12,780 --> 00:13:23,627

And then another year or two later when I wrote my PhD thesis, I had a more thorough

prototype that was also released, and then we kind of got stuck.

146

00:13:24,768 --> 00:13:38,613

And I can talk a little bit about that, but essentially Steve Bronder, who was supposed to

join us today, had something come up; hopefully he'll be here in the next few days

147

00:13:38,613 --> 00:13:52,687

at StanCon. He has really been pushing the C++ code and the development, and we have this idea

that maybe by the next Stan release we'll actually have that integrated Laplace

148

00:13:52,687 --> 00:13:58,058

approximation and we'll make it available to the users.

149

00:13:58,058 --> 00:14:07,927

And of course there are a lot of interesting things in moving parts that are happening

around these features both from a technical

150

00:14:07,927 --> 00:14:09,057

point of view.

151

00:14:09,157 --> 00:14:16,559

So the automatic differentiation that we had to deploy is, I think, very interesting, very

challenging.

152

00:14:18,380 --> 00:14:26,452

Also, the ways in which, what are the features that we put in our integrated Laplace?

153

00:14:26,452 --> 00:14:36,885

So I don't think it's going to be as performant as the integrated Laplace approximation

that's implemented in INLA.

154

00:14:37,259 --> 00:14:48,601

and I can discuss a little bit what are some of the features we lacked, but we also

focused on what are some unique things that having this integrated Laplace approximation

155

00:14:48,601 --> 00:14:53,315

in Stan can give to the users in terms of modeling capabilities.

156

00:14:53,916 --> 00:14:55,807

And those are things I'm excited about.

157

00:14:55,807 --> 00:15:04,060

And there are going to be a few challenges about using these approximate algorithms, just

as they are whenever you use an approximate algorithm.

158

00:15:04,060 --> 00:15:06,911

And that's going to motivate, you know,

159

00:15:07,542 --> 00:15:22,650

new elements of a Bayesian workflow, new diagnostics, new checks that will have to be

semi-automated, that will have to be very well documented, and that will also need to be

160

00:15:22,650 --> 00:15:23,450

demonstrated.

161

00:15:23,450 --> 00:15:32,275

These are all the pieces you need for users to use an algorithm effectively.

162

00:15:32,275 --> 00:15:35,405

And that's part of the journey between

163

00:15:35,405 --> 00:15:37,126

We have a prototype.

164

00:15:37,167 --> 00:15:45,513

We can publish this in what's considered a top machine learning conference, the paper

appeared in NeurIPS, versus.

165

00:15:47,181 --> 00:15:49,742

I can almost say we have something that's Stan-worthy.

166

00:15:52,763 --> 00:15:55,364

And the requirements are a little bit orthogonal.

167

00:15:55,364 --> 00:16:03,448

So it's not like one is superior, but there's a lot of extra work that needs to happen.

168

00:16:03,448 --> 00:16:06,689

And that will continue to happen.

169

00:16:06,689 --> 00:16:16,439

Because one of the, I think, open questions is when we make a new feature available, how

much responsibility

170

00:16:16,439 --> 00:16:21,391

do we take and how much responsibility do we give to the users?

171

00:16:23,552 --> 00:16:27,103

So maybe those are some of the topics that we can dive into.

172

00:16:27,103 --> 00:16:39,959

But one thing that I'll say is the tuples that Brian mentioned, that was one of the key

technical components that we needed to develop in order to have an interface that's

173

00:16:39,959 --> 00:16:43,820

user-friendly enough to use this integrated Laplace.

174

00:16:44,421 --> 00:16:45,653

Yeah, I love that because

175

00:16:45,653 --> 00:16:54,019

I don't know for you folks, but me, if I hear, yeah, we integrated tuples, I don't

think it's that important.

176

00:16:54,019 --> 00:17:07,428

But then when you talk to the guys who actually code the stuff and implement that, it's a

building block that then unlocks a ton of incredible features and new stuff for users.

177

00:17:07,428 --> 00:17:11,051

Yeah, and we can make that very, very concrete.

178

00:17:11,051 --> 00:17:11,961

Yeah, for sure.

179

00:17:11,961 --> 00:17:14,072

Actually, to give an example.

180

00:17:15,569 --> 00:17:18,470

Well, Brian, how would you define a tuple?

181

00:17:21,853 --> 00:17:23,853

So in type, no, I'm joking.

182

00:17:24,374 --> 00:17:28,836

So a tuple is essentially just a grouping of different types of things.

183

00:17:28,836 --> 00:17:34,719

So the simplest one to think of is like a point in R2, like an xy coordinate.

184

00:17:34,719 --> 00:17:37,881

It's just a tuple of a real number and another real number.

185

00:17:37,881 --> 00:17:42,374

But the nice thing about tuples as compared to like an array is that those don't have to

be the same type.

186

00:17:42,374 --> 00:17:45,195

So for example, in more recent versions of Stan,

187

00:17:45,281 --> 00:17:56,231

there is a function called eigendecompose, which gives you a matrix of the eigenvectors

and a vector of the eigenvalues both back to you at the same time.

188

00:17:56,231 --> 00:18:04,999

And so this actually cuts the amount of computation that has to be done in half because in

previous versions you had to call the eigenvectors function and the eigenvalues function

189

00:18:04,999 --> 00:18:10,074

separately and they were repeating some work and now it can just give you this object that

has both at once.

190

00:18:10,074 --> 00:18:11,305

And so that's like.

191

00:18:11,369 --> 00:18:18,143

One of the really useful things about tuples is it lets you have a principled way to talk

about a combination of different types like that.
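[Editor's note: in Stan syntax, that eigendecomposition example looks roughly like the sketch below; the function name `eigendecompose_sym` and the tuple element order follow recent Stan releases, but double-check them against the documentation for your version.]

```stan
data {
  int<lower=1> N;
  matrix[N, N] A;  // assumed symmetric
}
generated quantities {
  // a single call returns both results packed into one tuple,
  // sharing the underlying computation
  tuple(matrix[N, N], vector[N]) ed = eigendecompose_sym(A);
  matrix[N, N] eigenvectors = ed.1;  // first tuple element
  vector[N] eigenvalues = ed.2;      // second tuple element
}
```

[Before tuples, the only option was two separate calls, `eigenvectors_sym(A)` and `eigenvalues_sym(A)`, each repeating the decomposition.]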

192

00:18:18,143 --> 00:18:18,853

Yeah, yeah.

193

00:18:18,853 --> 00:18:26,957

And so one place where having this grouping of different types is very useful is in

functionals.

194

00:18:27,218 --> 00:18:28,678

So what's an example of a functional?

195

00:18:28,678 --> 00:18:31,540

The ODE solver in Stan, it's a functional.

196

00:18:32,681 --> 00:18:39,564

One of its arguments is a function, so the function that defines the right-hand side of

your differential equation.

197

00:18:39,564 --> 00:18:41,065

And then you need to pass

198

00:18:41,065 --> 00:18:43,767

arguments to that function.

199

00:18:43,927 --> 00:18:51,814

And of course, the user is specifying the function, and so they're going to specify what

are the arguments that we pass to that function.

200

00:18:51,814 --> 00:18:55,997

There was this time where this function needed to have a strict signature.

201

00:18:55,997 --> 00:19:10,769

So we told the user, you're first going to pass the time, the state, then the parameters,

and then the real data and the integer data.

202

00:19:10,821 --> 00:19:12,162

And you have the strict format.

203

00:19:12,162 --> 00:19:20,825

So basically, those are just ways of taking the arguments, packing them into a specific

structure, and then inside the ODE function, you unpack them.

204

00:19:21,125 --> 00:19:36,752

And so not only was this tedious, it can lead you to make your code less efficient if

you're not being careful about distinguishing what's a parameter and what's a data point.

205

00:19:36,752 --> 00:19:39,653

And one experience of that

206

00:19:40,193 --> 00:19:47,999

I had was collaborating with applied people, with epidemiologists, so with Julien Riou.

207

00:19:47,999 --> 00:19:51,281

This was during the pandemic, during COVID-19.

208

00:19:51,321 --> 00:20:05,871

At some point, Julien reached out to the Stan development team and he said he's

developing this really cool model, but right now it takes two, three days to fit, right?

209

00:20:05,871 --> 00:20:06,952

Something like that.

210

00:20:06,952 --> 00:20:09,203

And we're not at the...

211

00:20:09,653 --> 00:20:12,054

level of complexity that we want to be at.

212

00:20:12,375 --> 00:20:20,259

And so I have to give really most of the credit to Ben Bales, who was also a Stan

developer at the time.

213

00:20:20,259 --> 00:20:27,943

And we took a look at how the ODE was implemented and how it was coded up and how the

different types were being handled.

214

00:20:27,943 --> 00:20:33,686

And we realized that way more of the arguments that were being passed were parameters than

was necessary.

215

00:20:33,686 --> 00:20:38,588

And once you correct for that, the running time of the model went from two, three days to

two hours.

216

00:20:39,211 --> 00:20:49,224

So not only is that much faster and that's good in terms of reproducibility, that also

means you can then keep developing the model and go to something more complicated.

217

00:20:49,344 --> 00:20:59,627

So having these kinds of tuples, well, really what it gave us was variational, what's

called variadic arguments, sorry.

218

00:20:59,627 --> 00:21:05,009

That was a big step actually, where now you don't have those strict signatures when you

pass the functionals.

219

00:21:05,009 --> 00:21:06,913

People can really pass different things.
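[Editor's note: as an illustration of the variadic interface, here is a toy SIR-style sketch, not Julien's actual model: everything after `(real t, vector y)` in the right-hand-side function is chosen freely by the user and simply forwarded by the solver. Marking an argument like the population size with the `data` qualifier is exactly the parameter-versus-data distinction Charles mentions, since it keeps autodiff from tracking that argument.]

```stan
functions {
  // RHS of a toy SIR system; the trailing arguments are variadic.
  // `data real pop` promises this argument is data, not a parameter,
  // so no derivatives are propagated through it.
  vector sir_rhs(real t, vector y, real beta, real gamma, data real pop) {
    vector[3] dydt;
    dydt[1] = -beta * y[1] * y[2] / pop;
    dydt[2] = beta * y[1] * y[2] / pop - gamma * y[2];
    dydt[3] = gamma * y[2];
    return dydt;
  }
}
data {
  int<lower=1> T;
  real t0;
  array[T] real ts;
  vector[3] y0;
  real<lower=0> pop;
}
parameters {
  real<lower=0> beta;
  real<lower=0> gamma;
}
transformed parameters {
  // trailing arguments are forwarded to sir_rhs with no manual packing
  array[T] vector[3] y = ode_rk45(sir_rhs, y0, t0, ts, beta, gamma, pop);
}
```

[Under the old fixed-signature interface, `beta`, `gamma`, and `pop` would all have been packed into `theta`, `x_r`, and `x_i` arrays and unpacked inside the function, and accidentally putting data into `theta` meant it was autodiffed as if it were a parameter.]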

220

00:21:06,913 --> 00:21:17,056

Now for the integrated Laplace, so I realize we haven't really defined what it is, but

basically what I'll say is that there are two functionals that you need to pass.

221

00:21:17,056 --> 00:21:23,078

One is you're defining a likelihood function and the other one is you're defining a

covariance function.

222

00:21:23,078 --> 00:21:33,380

And so we want the users to be able to use variadic arguments for both those functions

that they're defining.

223

00:21:33,380 --> 00:21:35,777

So they're not constrained by types.

224

00:21:35,777 --> 00:21:42,002

That way it's not tedious, it's not error prone, or it's not prone to inefficiencies.

225

00:21:42,002 --> 00:21:54,272

And that's why those tuples matter: to make the code user-friendly, to probably decrease the

compute time that users will spend on this algorithm.

226

00:21:54,272 --> 00:21:56,814

That's why that kind of stuff is important.

227

00:21:56,814 --> 00:22:00,517

The power users, they don't need it.

228

00:22:00,517 --> 00:22:02,699

They can handle the strict signatures.

229

00:22:02,699 --> 00:22:04,800

I handle the strict signatures.

230

00:22:05,515 --> 00:22:06,308

No problem.

231

00:22:06,308 --> 00:22:10,220

But once you start using other probabilistic programming languages,

232

00:22:12,523 --> 00:22:23,017

You realize that one of the big strengths of Stan is the attention it gives to users, to

the API, how mindful it is of the users.

233

00:22:23,438 --> 00:22:30,701

Other languages, you can tell that it really feels like sometimes they're written for

software engineers.

234

00:22:30,701 --> 00:22:35,463

And the software engineers are the ones who are going to be the best ones at using those

languages.

235

00:22:37,164 --> 00:22:40,641

But I think that that's one of the strengths of Stan.

236

00:22:40,641 --> 00:22:50,285

and that some of the innovations are maybe gonna be less technical or algorithmic,

although those exist, and maybe we'll have time to talk about it, but actually making this

237

00:22:50,285 --> 00:22:53,546

more user-friendly, less error-prone, less inefficiency-prone.

238

00:22:53,546 --> 00:22:58,258

Yeah, and that definitely comes up, and I think it will come up whenever we're working on

new features for Stan.

239

00:22:58,258 --> 00:23:00,359

There's always sort of two users we have in our head.

240

00:23:00,359 --> 00:23:10,213

There's the user who is already at the limit of what Stan can do and wants to fit the next

biggest model, and how can we help that user, but also the user of like, you

241

00:23:10,241 --> 00:23:16,083

they have a relatively small model that they just can't figure out right now and can we

make that user's life easier too?

242

00:23:16,083 --> 00:23:24,605

Sometimes they're actually sort of fighting each other, but usually we can find features that

actually make both of their lives better, which is like the ideal circumstance.

243

00:23:24,605 --> 00:23:35,068

But by the way, kind of in the spirit of that, apparently most of our Stan users are brms

users.

244

00:23:35,568 --> 00:23:37,348

I think that's established, right?

245

00:23:39,533 --> 00:23:47,113

brms really gives you this beautiful syntax that people can play with, that people can

reason with.

246

00:23:47,453 --> 00:23:49,873

Personally, I like the Stan language.

247

00:23:49,873 --> 00:23:52,493

That syntax is a bit more explicit.

248

00:23:52,493 --> 00:24:02,113

But even that syntax in the Stan model is a simplification of what Stan is doing under the

hood.

249

00:24:02,113 --> 00:24:03,593

I'll give you a simple example.

250

00:24:03,593 --> 00:24:06,553

You know those tilde statements that you have in the model block, right?

251

00:24:06,553 --> 00:24:07,989

That's because

252

00:24:08,269 --> 00:24:13,634

You know, people like Andrew Gelman like reasoning about models in a data-generating fashion, right?

253

00:24:13,634 --> 00:24:21,050

But really, you know, what's going on under the hood is we're incrementing a log

probability density, right?
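A minimal Python sketch of what a tilde statement boils down to, assuming a toy model with prior `mu ~ normal(0, 1)` and likelihood `y ~ normal(mu, 1)`. The function names here are illustrative, not Stan's actual internals:

```python
import math

def normal_lpdf(y, mu, sigma):
    # Log density of a normal distribution (constants included)
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_posterior(mu, data):
    # What a Stan model block effectively does: start at zero and
    # accumulate `target += ...` terms for the prior and each data point.
    target = 0.0
    target += normal_lpdf(mu, 0.0, 1.0)      # prior: mu ~ normal(0, 1)
    for y in data:
        target += normal_lpdf(y, mu, 1.0)    # likelihood: y ~ normal(mu, 1)
    return target
```

Each `~` statement is just another increment to this accumulated log density, which is all the sampler ever sees.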

254

00:24:21,050 --> 00:24:36,312

So different users function with different levels of abstraction, depending on whether they're statisticians or, you know, more software-engineering, maybe ML-oriented people,

255

00:24:36,312 --> 00:24:37,223

or maybe

256

00:24:37,223 --> 00:24:42,734

scientists who primarily reason about covariates, right?

257

00:24:42,734 --> 00:24:47,816

That's where I see one of the big roles that brms is playing.

258

00:24:47,816 --> 00:25:00,439

And we need a way that's maintainable, that, you know, avoids compromises, to kind of cater to these different users.

259

00:25:00,439 --> 00:25:07,149

And in fact, we should talk about BridgeStan and a new community of users we're hoping to reach with Stan.

260

00:25:07,149 --> 00:25:09,920

maybe at some point.

261

00:25:09,920 --> 00:25:11,991

Yeah, I'll add that to the notes.

262

00:25:12,052 --> 00:25:13,562

Good, good.

263

00:25:13,562 --> 00:25:15,013

Yeah, so many questions.

264

00:25:15,013 --> 00:25:16,354

Thank you so much, guys.

265

00:25:16,354 --> 00:25:20,376

I think, yeah, something I'd like to pick up.

266

00:25:20,376 --> 00:25:22,877

We'll get back to INLA also at some point.

267

00:25:22,877 --> 00:25:28,180

I think it's going to be like the, how do you say, fil rouge in English?

268

00:25:28,180 --> 00:25:29,181

The thread.

269

00:25:29,181 --> 00:25:31,082

The thread, thank you.

270

00:25:31,902 --> 00:25:33,763

The red thread, you can say that.

271

00:25:33,763 --> 00:25:34,564

I don't know.

272

00:25:34,564 --> 00:25:36,104

So it's going to be the thread.

273

00:25:37,335 --> 00:25:47,051

Talking a bit more about the beginners you were talking about and the user who is trying

to get their model to work but cannot figure it out yet.

274

00:25:48,493 --> 00:25:58,930

Do you see a common difficulty that these kinds of users are having lately, maybe in the Stan forums, things like that?

275

00:25:58,930 --> 00:26:06,285

And maybe you can tell them how to use that right now or maybe tell us what you guys are

doing.

276

00:26:06,291 --> 00:26:11,003

in the coming months to address those kinds of obstacles.

277

00:26:11,463 --> 00:26:14,868

I think there are two, and they're sort of different.

278

00:26:14,868 --> 00:26:24,729

So I think a lot of users who are coming from more traditional like R or Python and are

trying to write Stan themselves for the first time, the difficulty of just having a

279

00:26:24,729 --> 00:26:31,022

compiled language at all, both in terms of the extra installation steps, but then also

like dealing with static typing.

280

00:26:31,022 --> 00:26:34,443

And if you're not used to sort of thinking about variables in this way.

281

00:26:34,599 --> 00:26:44,836

And so there are things we've talked about of trying to work on that, but a lot of what

I've invested in is just trying to improve the error messages the compiler gives you and

282

00:26:44,836 --> 00:26:51,801

trying to have them less be like what a compiler engineer knows went wrong and make it

more like what you think went wrong.

283

00:26:52,582 --> 00:27:03,085

But I think the second class that I see, and this is sort of going back to Charles's

point, is I think we have a lot of users who will use a tool like BRMS or Rstan Arm.

284

00:27:03,085 --> 00:27:07,125

and it will get them as far as it gets them and then they want to go a bit further.

285

00:27:07,125 --> 00:27:14,625

But I think the issue is if they've never written any Stan code at that point, they ask brms, hey, can you give me your Stan code?

286

00:27:14,625 --> 00:27:20,385

And they're given this model that would have taken them several months to write themselves

and now they have no hope.

287

00:27:20,905 --> 00:27:27,165

They're starting off in the deep end already because they already have a very powerful

model that they just want to tune one bit further.

288

00:27:27,165 --> 00:27:31,845

And that's a much harder thing, both in terms of

289

00:27:31,935 --> 00:27:34,947

software, but also pedagogically. I don't know how to handle that.

290

00:27:34,947 --> 00:27:37,068

I don't know if you have more.

291

00:27:38,769 --> 00:27:42,271

I think a bit less about beginners.

292

00:27:42,831 --> 00:27:50,436

No, no, okay, okay, so let me, let me nuance that a little bit.

293

00:27:50,436 --> 00:27:59,300

So I teach workshops, I've had opportunities to teach.

294

00:28:00,371 --> 00:28:12,846

And actually, I think about some fundamental questions that a beginner is likely to ask,

but for which we don't have great answers.

295

00:28:13,487 --> 00:28:15,307

And I'll give you one example.

296

00:28:15,427 --> 00:28:19,969

For how many iterations should we run Markov chain Monte Carlo?

297

00:28:20,089 --> 00:28:20,969

Right?

298

00:28:21,190 --> 00:28:28,873

That's an elementary question, and it's not an easy one to answer.

299

00:28:28,873 --> 00:28:34,405

especially if you start digging and thinking about what is the optimal length of a Markov

chain?

300

00:28:34,405 --> 00:28:38,016

What is the optimal length of a warm-up phase, of a sampling phase?

301

00:28:38,016 --> 00:28:47,398

What is the number of Markov chains that I should run given some compute that's available

to me?

302

00:28:48,638 --> 00:28:58,251

And then you get into a more fundamental question, which is what is the precision that

people need from their Monte Carlo estimators?

303

00:28:58,343 --> 00:29:03,316

So I asked an audience of scientists, well, what effective sample size do you need?

304

00:29:03,897 --> 00:29:07,419

What summaries of the posterior distribution do you need?

305

00:29:07,419 --> 00:29:15,424

Are you really interested in the expectation value, or do you need the variance, or maybe

you need these quantiles or these other quantiles?
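One way to make "how precise do my Monte Carlo estimators need to be" concrete: the Monte Carlo standard error of a posterior-mean estimate scales like the posterior standard deviation divided by the square root of the effective sample size. A small sketch of that arithmetic (my own illustration, not Stan code):

```python
import math

def mcse(sd, ess):
    # Monte Carlo standard error of a posterior-mean estimate
    return sd / math.sqrt(ess)

def required_ess(sd, target_mcse):
    # Effective sample size needed so the Monte Carlo error
    # of the mean estimate falls below target_mcse
    return math.ceil((sd / target_mcse) ** 2)
```

So if the posterior standard deviation is about 2 and you only need the mean to within about 0.1, an effective sample size of 400 is plenty, which is one reason "just run it for tens of thousands of iterations" can be wasteful.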

306

00:29:16,185 --> 00:29:19,248

And we have some unfortunate terminology.

307

00:29:19,248 --> 00:29:21,869

People say we're computing the posterior.

308

00:29:23,050 --> 00:29:25,051

That doesn't mean much.

309

00:29:27,359 --> 00:29:30,871

It conveys a good first order intuition, but not a good second order intuition.

310

00:29:30,871 --> 00:29:33,172

I like to say we're probing the posterior.

311

00:29:33,172 --> 00:29:38,073

And then we need to think about what are the properties of the posterior that we're

actually pursuing.

312

00:29:38,174 --> 00:29:43,516

And so then we get into, people ask me, when should I use MCMC or variational inference?

313

00:29:44,497 --> 00:29:47,738

So people criticize variational inference.

314

00:29:47,738 --> 00:29:53,220

They say, well, even when you solve the... so, what does VI do?

315

00:29:53,220 --> 00:29:55,361

Maybe just as a summary.

316

00:29:55,409 --> 00:30:00,090

You have a family of approximations, for example, Gaussians.

317

00:30:00,130 --> 00:30:06,352

And then within that family of approximations, it tries to find the best approximation to your posterior.

318

00:30:06,772 --> 00:30:14,314

And people will dismiss it because they say, look, even if you solve the optimization

problem, at the end of the day, your posterior is not a Gaussian.

319

00:30:14,314 --> 00:30:16,735

So your optimal solution is not good.

320

00:30:16,735 --> 00:30:21,336

It has what's called, what people call an asymptotic bias.

321

00:30:21,416 --> 00:30:24,533

Whereas with MCMC, you know that if we have enough compute power,

322

00:30:24,533 --> 00:30:27,075

and enough can be a lot, right?

323

00:30:27,075 --> 00:30:30,318

Eventually you will hit arbitrary precision, right?

324

00:30:30,318 --> 00:30:42,498

But now if I think about, I'm trying to probe the posterior, well maybe that Gaussian

approximation does match the expectation value, does match the summary quantities that I'm

325

00:30:42,498 --> 00:30:43,028

interested in.

326

00:30:43,028 --> 00:30:49,553

Maybe it captures the variance, or maybe it captures the entropy, right?
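A numeric illustration of this point, with a hypothetical skewed "posterior" (a lognormal) standing in for a real model: a Gaussian approximation matched to the first two moments reproduces the expectation and variance exactly, but misses the tail quantiles.

```python
import math
from statistics import NormalDist

# "True" posterior: lognormal with parameters mu = 0, sigma = 0.5
mu, sigma = 0.0, 0.5
true_mean = math.exp(mu + sigma**2 / 2)
true_var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)

# Moment-matched Gaussian approximation
approx = NormalDist(true_mean, math.sqrt(true_var))

# By construction it captures the expectation and the variance exactly,
# but the 95th percentile of the two distributions disagrees:
z95 = NormalDist().inv_cdf(0.95)
true_q95 = math.exp(mu + sigma * z95)   # lognormal 95th percentile
approx_q95 = approx.inv_cdf(0.95)       # Gaussian 95th percentile
```

Whether that disagreement matters depends entirely on which summaries of the posterior you actually need, which is exactly the question above.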

327

00:30:49,553 --> 00:30:54,411

So maybe that is the pedagogical work that

328

00:30:54,411 --> 00:31:06,297

I'm trying to do for beginners with the caveat that I don't have great answers to all

those questions.

329

00:31:06,297 --> 00:31:09,429

I think these are real research topics.

330

00:31:10,230 --> 00:31:18,714

But if I think about one goal, for example, that I would like to achieve, I would like to,

I want it to be part of the workflow.

331

00:31:19,735 --> 00:31:20,845

People are doing work on that.

332

00:31:20,845 --> 00:31:24,647

Aki Vehtari is doing great work on that, to name only one person.

333

00:31:24,913 --> 00:31:30,936

Once people figure out this is how precise my Monte Carlo estimators need to be, I want

that to be the input to Stan.

334

00:31:31,637 --> 00:31:45,364

And then I want it to run the Markov chains for the right number of iterations in a way

that gives you that precision without wasting too much computational power.

335

00:31:45,364 --> 00:31:47,185

And we're not there yet.

336

00:31:47,185 --> 00:31:54,489

We have promising directions to do that, which also come with their fair share of

challenges.

337

00:31:55,341 --> 00:32:03,305

But yeah, that's the kind of thing I want to do for beginners and for intermediates and

for advanced and for myself.

338

00:32:03,305 --> 00:32:08,126

But yeah, the beginners ask the right questions and the difficult questions.

339

00:32:08,427 --> 00:32:09,697

Okay, thanks Charles.

340

00:32:09,697 --> 00:32:10,537

Nice save.

341

00:32:10,537 --> 00:32:24,713

No, so more seriously, yeah, Brian, I was wondering, like, so if you had, let's say, Stanislaw Ulam,

342

00:32:24,737 --> 00:32:36,126

He comes to you in a dream and he's like, okay, Brian, you've got one wish to make Stan

better for everybody, including the beginners, Charles.

343

00:32:37,828 --> 00:32:39,649

So what would it be?

344

00:32:40,629 --> 00:32:43,493

This is like a genie powerful wish.

345

00:32:43,493 --> 00:32:45,414

I can rewrite the history of the...

346

00:32:46,996 --> 00:32:50,609

Something that we've talked about again and again, but it would just be such a huge lift.

347

00:32:50,609 --> 00:32:53,761

But if I'm allowed to go back to the start, I think that...

348

00:32:53,847 --> 00:33:01,751

There's been a lot of talk about how the block structure of Stan gives a lot of power, but

it also makes a lot of things limiting.

349

00:33:01,751 --> 00:33:09,585

It's, right now, if you want to do a prior predictive check, you oftentimes need a separate model that looks a little different from the model you're actually writing.

350

00:33:09,746 --> 00:33:16,269

And this is one of the things that's great about BRMS, right, is the single formula can be

turned into all these models at once.

351

00:33:17,330 --> 00:33:22,393

But there has been previous research, so Maria Gorinova, Gorinova?

352

00:33:23,033 --> 00:33:29,516

She did a master's thesis and a PhD thesis on a tool she called SlicStan, which was a Stan with no blocks.

353

00:33:29,516 --> 00:33:38,800

And so it sort of would automatically, you would write your Stan model as you do now, but

without saying what's data and what's parameters, and then you would just give it data,

354

00:33:38,800 --> 00:33:46,283

and it would then figure out, okay, these are the data, these are the parameters, here are

things I can move to generated quantities, and it would sort of be a much more powerful

355

00:33:46,283 --> 00:33:52,481

form of the compiler that would really capture a lot of these ideas, but it would also be

sort of a fundamentally different.

356

00:33:52,481 --> 00:33:54,522

thing than Stan.

357

00:33:54,563 --> 00:33:57,385

If I could really do anything in the world, that would probably be it.

358

00:33:57,385 --> 00:34:01,668

But I don't know if that will ever make it there.

359

00:34:01,789 --> 00:34:04,852

There's a lot of existing stuff that we would have to give up, I think.

360

00:34:04,852 --> 00:34:07,554

Yeah.

361

00:34:07,554 --> 00:34:08,695

I understand.

362

00:34:08,695 --> 00:34:12,077

If you're interested, Maria Gorinova was on the podcast.

363

00:34:12,618 --> 00:34:15,821

You can go on the website, learnbayesstats.com.

364

00:34:15,881 --> 00:34:19,384

There is a search bar on the right.

365

00:34:19,384 --> 00:34:20,875

On the top, you can...

366

00:34:21,247 --> 00:34:22,858

look for any guests.

367

00:34:22,858 --> 00:34:30,261

So Maria Gorinova, that was a great episode because I think she's also working on

automatic reparameterization, if I remember correctly.

368

00:34:30,621 --> 00:34:40,285

So if you ever had to reparameterize a model, that can be quite frustrating if you're a

beginner because you're like, but it's the same model.

369

00:34:40,285 --> 00:34:42,746

I'm just doing that for the sampler.

370

00:34:42,746 --> 00:34:48,941

And so one of the goals of that is just having the sampler figure that out by itself.
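For listeners who haven't hit this: the classic example is the non-centered parameterization, where `theta ~ normal(mu, tau)` is rewritten as `theta = mu + tau * theta_raw` with `theta_raw ~ normal(0, 1)`. Same model, easier geometry for the sampler. A quick Python sanity check of the equivalence (illustrative values, not from the episode):

```python
import random
import statistics

random.seed(1)
mu, tau = 3.0, 0.5
n = 50_000

# Centered parameterization: draw theta ~ normal(mu, tau) directly.
centered = [random.gauss(mu, tau) for _ in range(n)]

# Non-centered parameterization: draw theta_raw ~ normal(0, 1), then
# shift and scale. Same distribution, but the sampler now explores a
# standard normal, avoiding the funnel geometry that couples theta to tau.
non_centered = [mu + tau * random.gauss(0.0, 1.0) for _ in range(n)]
```

Both sets of draws target the same distribution, which is exactly why reparameterization feels frustrating to beginners: nothing about the model changed, only how the sampler sees it.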

371

00:34:48,941 --> 00:34:57,841

Yeah, and then she also did some interesting work on automatic marginalization where it's

tractable, which was very cool, because that's another, I don't feel confident in my own

372

00:34:57,841 --> 00:35:03,541

ability to marginalize a model off the top of my head, so it's like a, I know that's a

thing that new users hit a lot.

373

00:35:03,541 --> 00:35:10,901

Yeah, yeah, yeah, I mean, you hit that quite a lot, and yeah, if we could automate that at

some point, that'd be absolutely fantastic, yeah.
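For context, "marginalizing a model" here usually means summing a discrete parameter out of the joint density, typically via log-sum-exp for numerical stability. A hedged sketch for a two-component normal mixture (my own example, not from the episode):

```python
import math

def normal_lpdf(y, mu, sigma):
    # Log density of a normal distribution
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_sum_exp(a, b):
    # Numerically stable log(exp(a) + exp(b))
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def mixture_lpdf(y, lam, mu1, mu2, sigma):
    # Discrete indicator z in {1, 2} marginalized out:
    # log p(y) = log( lam * N(y | mu1, sigma) + (1 - lam) * N(y | mu2, sigma) )
    return log_sum_exp(
        math.log(lam) + normal_lpdf(y, mu1, sigma),
        math.log(1 - lam) + normal_lpdf(y, mu2, sigma),
    )
```

Doing this by hand for every discrete structure is exactly the chore that automatic marginalization would remove.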

374

00:35:11,861 --> 00:35:18,327

Charles, I think we've got nine minutes before the Q&As.

375

00:35:18,355 --> 00:35:20,485

So I'm going to give you the choice.

376

00:35:22,086 --> 00:35:34,810

No, so we could go back to talk about INLA a bit, because I realize we should have done something at the beginning, which is defining INLA and telling people why that would be

377

00:35:34,810 --> 00:35:36,230

useful and when.

378

00:35:36,590 --> 00:35:41,031

We can also talk about BridgeStan, but I think, Brian, you can talk about BridgeStan too.

379

00:35:41,031 --> 00:35:44,392

So your call, Charles.

380

00:35:49,536 --> 00:35:51,957

Let's talk about BridgeStan.

381

00:35:52,117 --> 00:35:54,677

Or let's talk about BridgeStan.

382

00:35:57,117 --> 00:35:58,977

Let's see how fast I can do it.

383

00:35:58,977 --> 00:36:00,317

Maybe we can do both.

384

00:36:01,157 --> 00:36:03,537

Yes and yes.

385

00:36:04,437 --> 00:36:07,557

So Simon's talk earlier mentioned BridgeStan.

386

00:36:07,557 --> 00:36:15,937

And if people aren't familiar, this was something that Edward Roualdes, who's a Stan

developer, started a few years ago when he was visiting us in New York.

387

00:36:16,651 --> 00:36:19,623

It drives me crazy that I didn't think of this.

388

00:36:19,623 --> 00:36:27,918

Edward deserves so much credit because it was sitting there all this time, but what it

essentially does is it, through a lot of technical mumbo jumbo that you should ask me

389

00:36:27,918 --> 00:36:35,493

about later, it makes it very easy for people to use Stan models outside of Stan's C++

ecosystem.

390

00:36:35,493 --> 00:36:41,477

And so if you have a model in Stan, but you want to use a...

391

00:36:41,733 --> 00:36:50,520

like an algorithm that's only implemented in an R package or that you're developing

yourself, it really lets you get the log densities and the gradients with all of the speed

392

00:36:50,520 --> 00:36:57,306

and quality of the Stan Math library, but you can use these Python libraries or these like

experimental things that you're working on.

393

00:36:57,306 --> 00:37:06,703

And so, we have a paper, and it has a few citations already from people who have been using it to develop new algorithms, and like, I know a lot of work that Bob has

394

00:37:06,703 --> 00:37:11,293

been doing recently has been using it and so like that's one way we're, especially

395

00:37:11,293 --> 00:37:18,008

One of the things we're thinking of for those users who want to push the edge is new forms

of variational inference and new forms of HMC.

396

00:37:18,008 --> 00:37:22,882

And it has already been a really huge boon for that research.

397

00:37:22,882 --> 00:37:24,023

Yeah, yeah.

398

00:37:24,023 --> 00:37:31,549

At the Flatiron Institute, we do a lot of algorithmic work on new samplers and new

variational inference.

399

00:37:31,549 --> 00:37:34,971

And we now use BridgeStan all the time.

400

00:37:36,717 --> 00:37:45,717

I'll give you two good reasons, and there are probably more, but one of them is that it gives

us access to Stan's automatic differentiation and if you look at a lot of papers that

401

00:37:45,717 --> 00:38:01,357

evaluate the performance of algorithms they do it not against time but against number of

gradient evaluations because that tends to be the dominant operation computationally and

402

00:38:01,357 --> 00:38:06,205

so now you write your sampler in Python or

403

00:38:06,209 --> 00:38:16,053

maybe in R, or you write your VI in Python or in R, but you still get the high performance

from using Stan.

404

00:38:16,353 --> 00:38:16,943

So that's great.
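To make that concrete, here is a toy sketch of the pattern Charles describes: a sampler written in plain Python against a `log_density_gradient` callable, which is the shape of interface BridgeStan exposes for a compiled Stan model. Everything below is a stand-in (a standard normal target and a simple MALA step, not real BridgeStan calls), so you can see why cost is naturally counted in gradient evaluations:

```python
import math
import random

random.seed(0)
grad_evals = 0

def log_density_gradient(x):
    # Toy stand-in for the interface BridgeStan exposes; for a real model
    # this would call into compiled Stan C++. Here: standard normal target.
    global grad_evals
    grad_evals += 1
    return -0.5 * x * x, -x   # (log density, gradient)

def mala_step(x, eps):
    # One Metropolis-adjusted Langevin step; the cost is dominated by
    # the two gradient evaluations it makes.
    lp_x, g_x = log_density_gradient(x)
    prop = x + 0.5 * eps**2 * g_x + eps * random.gauss(0.0, 1.0)
    lp_p, g_p = log_density_gradient(prop)

    def log_q(a, b, g):
        # log proposal density q(a | b), with drift from the gradient at b
        m = b + 0.5 * eps**2 * g
        return -((a - m) ** 2) / (2 * eps**2)

    log_alpha = lp_p + log_q(x, prop, g_p) - lp_x - log_q(prop, x, g_x)
    return prop if math.log(random.random()) < log_alpha else x

x = 1.0
for _ in range(100):
    x = mala_step(x, eps=0.5)
```

Every step costs exactly two calls into the model, so 100 steps is 200 gradient evaluations, which is the unit papers tend to report instead of wall-clock time.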

405

00:38:16,943 --> 00:38:27,258

And then the second thing is that means that you can now test those new algorithms that

you've developed in a pretty straightforward way on Stan models and the library of Stan

406

00:38:27,258 --> 00:38:32,180

models, including posteriordb or maybe some other models that you've been using.

407

00:38:32,180 --> 00:38:35,391

And those models are very readable.

408

00:38:35,629 --> 00:38:40,122

It standardizes a little bit the testing framework.

409

00:38:40,122 --> 00:38:50,517

so it has changed my thinking a little bit as someone who works a lot on the Stan

compiler, thinking of Stan not just as its own sort of ecosystem, but also as like a

410

00:38:50,517 --> 00:38:52,508

language for communicating models.

411

00:38:52,508 --> 00:38:54,740

I find it really helpful.

412

00:38:54,740 --> 00:38:59,883

Someone can describe a model in LaTeX up on a slide, but as soon as they show me the Stan

code, I'm like, I get it.

413

00:38:59,883 --> 00:39:04,685

And even if my job now was to go implement it in PyMC or something, I think it's still

helped.

414

00:39:04,929 --> 00:39:13,788

Having this language that is a little bit bigger than itself or a little bit bigger than

it used to be where now, I see Adrian here is in the audience and he has an implementation

415

00:39:13,788 --> 00:39:15,379

of HMC in Rust.

416

00:39:15,379 --> 00:39:18,472

But you can use Stan models with it because of BridgeStan.

417

00:39:18,472 --> 00:39:22,966

it has opened up the, sorry, Adrian's in the back.

418

00:39:23,007 --> 00:39:29,665

But it's opened up the world of things that Stan can be, which is one thing that I think

is very cool.

419

00:39:29,665 --> 00:39:40,928

Yeah, and I think, so when I spoke about the new community of users that I think we're going to reach: there are people who write their own samplers who have particularly

420

00:39:40,928 --> 00:39:41,878

difficult problems.

421

00:39:41,878 --> 00:39:56,512

And even today, we've had two examples, at least two examples of people who departed from

the traditional samplers that are implemented in Stan, either to implement tempering or to

422

00:39:56,512 --> 00:39:59,113

implement massive parallelization.

423

00:39:59,289 --> 00:40:12,014

And so, you know, I really think that, you know, there is a group of people who for their

problems, you know, like to develop and try out certain samplers.

424

00:40:12,014 --> 00:40:24,960

And, you know, that's also going to drive research for what could be the next default

sampler or variational inference or approximation in Stan.

425

00:40:24,960 --> 00:40:28,381

They are candidates for that.

426

00:40:29,195 --> 00:40:35,768

Although it's true that the more we learn, the more we develop new samplers, the more we

realize how good NUTS is.

427

00:40:39,031 --> 00:40:43,012

But things are going to change over the years.

428

00:40:43,073 --> 00:40:44,514

OK, awesome.

429

00:40:44,514 --> 00:40:45,393

Thanks a lot, guys.

430

00:40:45,393 --> 00:40:47,776

So I still have a ton of questions.

431

00:40:47,776 --> 00:40:50,817

But already, let's open it up to the audience.

432

00:40:50,817 --> 00:40:52,228

Are there already any questions?

433

00:40:52,228 --> 00:40:55,239

Or should I ask one?

434

00:40:55,960 --> 00:40:57,220

OK, perfect.

435

00:40:58,393 --> 00:41:14,810

So, mentioning the new samplers that you guys are developing at the Flatiron and also I

have a lot of guests who come on the show and talk about new samplers, normalizing flows

436

00:41:14,810 --> 00:41:25,544

for instance, Marylou Gabrié was on the show, also Marvin Schmitt; Paul Bürkner is here, he works a lot on BayesFlow with Marvin Schmitt.

437

00:41:27,425 --> 00:41:30,747

They are doing amortized Bayesian inference.

438

00:41:30,747 --> 00:41:40,474

So I'm really curious how you guys think about that and Stan, basically.

439

00:41:40,474 --> 00:41:47,420

Because most of the time, it's also tied to increasing data sizes.

440

00:41:47,420 --> 00:41:55,253

And so people are looking into new samplers which can adapt to their use case better.

441

00:41:55,253 --> 00:42:04,541

So I'm curious how you guys think about that in the Stan team and what you're thinking of

developing in the coming months about that.

442

00:42:04,582 --> 00:42:16,783

Yeah, I think one of the challenges with these approaches, sort of one of the motivating reasons for them, is that you can get a wall-clock time reduction by just

443

00:42:16,783 --> 00:42:20,757

throwing a massive amount of compute at it with GPUs, which is one place where...

444

00:42:21,165 --> 00:42:28,825

Stan's GPU support is still kind of piecemeal, like we're working on it, but it's sort of

like we can't compete with Google developing JAX, you know?

445

00:42:28,825 --> 00:42:38,905

And so like, you know, Simon's presentation earlier showed that like on CPU, Stan actually

beats JAX, or BridgeStan, you know, can be faster than JAX.

446

00:42:38,905 --> 00:42:41,785

But on GPU, we have sort of no hope.

447

00:42:41,785 --> 00:42:44,545

And I think that like, or at least at the moment, no hope.

448

00:42:44,545 --> 00:42:50,309

But I think that's where these approaches become really challenging is like trying to

think of.

449

00:42:50,833 --> 00:42:58,047

And I think it's sort of an almost existential question of like, is Stan just like the CPU

solution, right?

450

00:42:58,047 --> 00:42:59,798

And is something else better?

451

00:42:59,798 --> 00:43:06,281

Because there are things about Stan's like, sort of core design that don't like GPUs.

452

00:43:06,442 --> 00:43:14,726

It's a very expressive language and GPUs really like less expressive languages that are

much easier to guess what you're gonna do next.

453

00:43:15,927 --> 00:43:19,068

And so I think that is something that, you know,

454

00:43:19,909 --> 00:43:25,353

I personally believe there will always be sort of a community of like, you know, researchers

working on their laptop or that sort of thing.

455

00:43:25,353 --> 00:43:30,746

And so I think there will always be a place for these like CPU bound implementations.

456

00:43:30,746 --> 00:43:34,328

But yeah, if you can predict that, you can probably make a lot of money.

457

00:43:34,328 --> 00:43:37,940

Charles?

458

00:43:39,241 --> 00:43:45,605

Yeah, I'm going to try and return to the original question, which is, you know,

459

00:43:45,901 --> 00:43:55,747

So there are a lot of algorithms that are being developed, and there are a lot of good ideas that go into developing these algorithms, and there are some good experiments and some good

460

00:43:55,747 --> 00:44:01,089

empirical evidence that supports why you might want to use those algorithms.

461

00:44:02,070 --> 00:44:14,677

Nonetheless, 80 to 90% of the time when I read a paper about a new algorithm, it doesn't

give me enough information as to whether

462

00:44:14,677 --> 00:44:17,999

I should now start using this algorithm to solve my problem.

463

00:44:18,681 --> 00:44:21,963

And there is a, so what does that mean?

464

00:44:21,963 --> 00:44:33,213

That means that usually you need to somehow implement that algorithm and test it yourself

on your own problem, and that's fine, but I think that a lot of these algorithms out there

465

00:44:33,213 --> 00:44:35,074

are not yet battle tested.

466

00:44:35,776 --> 00:44:42,281

And we're kind of in a situation where, okay, we,

467

00:44:42,505 --> 00:44:49,110

maybe we like the prototype and maybe it's promising, do we put in the developer time to

build this in Stan?

468

00:44:49,331 --> 00:44:53,504

And it's a bit of a cycle because once it appears in Stan, then it really gets battle

tested.

469

00:44:53,504 --> 00:44:59,499

And then we get feedback from the community and we can try to learn things about this

algorithm, we can try to improve it.

470

00:44:59,499 --> 00:45:07,585

That's actually what happened to the no U-turn sampler which has evolved since its

original inception.

471

00:45:11,541 --> 00:45:15,950

You know, I'm of the opinion that,

472

00:45:18,433 --> 00:45:24,918

My bar for scientific papers is it presents a good idea and it's thought-stimulating.

473

00:45:25,079 --> 00:45:31,143

But I don't think it tells me this is the next thing we should build in Stan.

474

00:45:31,944 --> 00:45:45,245

I think BridgeStan can alleviate some of that because it makes it easier for people to

build implementations that can then be tested in Stan and then we kind of get into battle

475

00:45:45,245 --> 00:45:45,845

testing things.

476

00:45:45,845 --> 00:45:48,205

Maybe someone builds a Python package

477

00:45:48,205 --> 00:46:04,205

that is compatible with BridgeStan, and maybe the process becomes, instead of the Stan developers, the Stan community, brutally evaluating an algorithm before deciding to put

478

00:46:04,205 --> 00:46:14,205

in some amount of work, maybe first this package gets used and it's developed by the algorithm developers.

479

00:46:14,205 --> 00:46:15,589

But this...

480

00:46:15,789 --> 00:46:21,470

This is the broader question of how do algorithms get developed, implemented, and adopted?

481

00:46:22,171 --> 00:46:28,272

And I'll tell you what, another big criterion here is the simplicity of the algorithm.

482

00:46:30,413 --> 00:46:37,915

That plays a huge role into whether an algorithm is adopted by developers, by users, or

not.

483

00:46:38,835 --> 00:46:44,137

So the answer is I don't know.

484

00:46:45,963 --> 00:46:48,726

Yeah, that's always a fine answer.

485

00:46:49,328 --> 00:46:50,689

Any questions?

486

00:46:52,252 --> 00:46:55,246

I'm going to bring one up for my neighbor.

487

00:46:55,246 --> 00:46:56,999

Wait. Perfect.

488

00:46:56,999 --> 00:46:58,561

We needed the mic.

489

00:47:01,429 --> 00:47:07,355

So what do we do about algorithms that are good for specific situations but not good for

other things?

490

00:47:07,355 --> 00:47:12,020

Like so far we've only developed like black box algorithms that we kind of hope work

everywhere.

491

00:47:12,020 --> 00:47:15,364

We don't have any kind of real specific algorithms for anything.

492

00:47:15,364 --> 00:47:16,915

Is there any future for that?

493

00:47:20,577 --> 00:47:21,747

I mean, this is...

494

00:47:24,213 --> 00:47:34,361

I think this is one advantage, so I'm gonna quote the person who just asked the question,

but one thing Bob has said a lot is the reason we don't wanna just put 30 samplers into

495

00:47:34,361 --> 00:47:43,749

Stan is then a lot of practitioners would try all 30 of them and then just report the... there's an advantage to sort of being a great filter and being very conservative in what

496

00:47:43,749 --> 00:47:45,590

is actually in Stan.

497

00:47:45,590 --> 00:47:54,089

But I do think this is one advantage to making it easier to broaden the ecosystem where

now I think a future for that kind of

498

00:47:54,089 --> 00:48:07,217

algorithm is in a R package or a Python package that can interface with, there are now

existing examples out there of an implementation of an algorithm that has support for Stan

499

00:48:07,217 --> 00:48:08,748

models and PyMC models.

500

00:48:08,748 --> 00:48:18,663

So it can kind of bridge gaps between communities, also sort of, if you have to install a

separate package, that makes it fairly clear that this is for a separate purpose.

501

00:48:18,663 --> 00:48:21,925

And so I think that's what I would say the future is for those.

502

00:48:23,917 --> 00:48:25,517

Yeah, I agree.

503

00:48:30,573 --> 00:48:38,858

Do you have an intuition how easy it is for the Stan compiler to figure out whether a

model is generative and then to be able to sample from it?

504

00:48:38,858 --> 00:48:43,781

I mean, of course we can do it in generated quantities, but it's always awkward to double

code our models.

505

00:48:43,781 --> 00:48:58,239

This is a question that also sort of does expose a bit of my sort of not traditional

statistics background, is that I have never been presented with a definition of like,

506

00:48:58,411 --> 00:49:02,943

generative or graphical model that is precise enough for me to actually answer this

question.

507

00:49:03,843 --> 00:49:07,945

I think that there are definitely easy cases and hard cases.

508

00:49:07,945 --> 00:49:18,129

I suspect that in general it would be impossible, but it's also, I think it's probably

likely that we could have a system where it tries really hard and then if it doesn't

509

00:49:18,129 --> 00:49:21,130

succeed in a minute it gives up or something like that.

510

00:49:21,131 --> 00:49:26,393

There are all these sorts of tricks in the compiler world, but I think that the...

511

00:49:27,917 --> 00:49:37,537

This is another one of these things, kind of like GPU support, that because you can write

basically anything you want, you can also write sort of the worst possible case for this

512

00:49:37,537 --> 00:49:39,377

kind of automated analysis.

513

00:49:39,697 --> 00:49:47,317

An open question I've had for a long time is like, what percentage of Stan models in the

wild are generative or not?

514

00:49:47,437 --> 00:49:54,387

If that number just naturally is 80, 90%, I think then this is like a very fruitful thing.

515

00:49:54,387 --> 00:49:57,497

But if it's like 60, I don't know.

516

00:49:57,516 --> 00:49:58,908

less, I'm not sure.

517

00:50:00,673 --> 00:50:14,204

That's what I've heard, that it is fairly high. Yeah, I think it

would be something that's worth looking into, but I would need some handholding on the

518

00:50:14,204 --> 00:50:16,347

statistical modeling side of that, actually.

519

00:50:20,513 --> 00:50:22,270

Sorry, I shouldn't call on people.

520

00:50:24,695 --> 00:50:31,010

Hi, so I have a question about more on the people trying to implement models in Stan.

521

00:50:31,311 --> 00:50:37,375

And say there's a model and it's just, you know, it's taking a very long time.

522

00:50:37,796 --> 00:50:43,661

And people think, well, Stan, you know, they might have some complaints or say it's too

slow.

523

00:50:43,661 --> 00:50:51,126

But what I've found in practice also is that it's never clear sometimes what parts of my model are

causing the delay.

524

00:50:51,347 --> 00:50:53,889

So what are the slow bits or?

525

00:50:54,635 --> 00:51:03,070

It can either just be like mathematically this is just harder to estimate or there's some

shape of my posterior that's really hard to navigate.

526

00:51:03,070 --> 00:51:09,273

But I don't really get that feedback unless I'm like fixing certain parameters, toying

with other things.

527

00:51:09,273 --> 00:51:16,057

Is there any way to, you know, give that feedback of what's causing some issues?

528

00:51:22,029 --> 00:51:24,760

Have you ever thought about modeling that?

529

00:51:24,760 --> 00:51:25,641

Sorry.

530

00:51:28,333 --> 00:51:39,133

So I remember maybe a year ago, I was actually, I met Andrew Gelman and Mitzi Morris in

Paris at a cafe.

531

00:51:39,133 --> 00:51:41,964

We all just so happened to be in Paris.

532

00:51:43,006 --> 00:51:47,389

And we started brainstorming.

533

00:51:47,950 --> 00:51:55,957

We had an idea of a research project, which is how much can you learn about your model and

your sampler by running 20 iterations of HMC?

534

00:51:56,921 --> 00:52:06,646

And the idea is, you know, fail fast, learn fast, that the early iterations

of a Bayesian workflow should be based on that.

535

00:52:06,646 --> 00:52:14,991

And I think that a lot of the statistics literature and the more formal literature, you

know, kind of imagines that, you know, you've done a really good job fitting your model,

536

00:52:14,991 --> 00:52:19,093

you've thrown a lot of computation, you've waited a long time.

537

00:52:20,394 --> 00:52:25,476

And we want to figure out, you know, what are the lessons that you can learn quickly,

right?

538

00:52:25,476 --> 00:52:26,977

So now,

539

00:52:27,945 --> 00:52:42,965

I can talk a little bit from experience and I can give you that, but we kind of want to

make that also part of the workflow, so that in your early iterations we can learn with fast

540

00:52:42,965 --> 00:52:43,565

approximations.

541

00:52:43,565 --> 00:52:48,065

And then hopefully we'll have a good answer to your question.

542

00:52:48,125 --> 00:52:50,457

There's also a tool for instrumentation.

543

00:52:50,457 --> 00:52:54,840

Yeah, I was gonna say, in the immediate sense, there is the ability to profile Stan models.

544

00:52:54,840 --> 00:53:03,696

You can write a block that starts with the word profile and then a name, and then you can

turn that on when you're running it, and it will give you a printout of like, the block

545

00:53:03,696 --> 00:53:10,551

named X took this percentage of the time, the block named Y took that percentage, and it

can help you identify at least like, here's the bad line.

546

00:53:10,551 --> 00:53:13,591

Now, it might not help you figure out what you need to do instead.
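As a rough sketch of the feature being described (the block names and variables here are hypothetical, not from the episode), a model block split into named profile sections looks like this; running the model with profiling enabled then reports how much time each named section took:

```stan
model {
  // Time spent evaluating these statements is reported under "priors"
  profile("priors") {
    beta ~ normal(0, 1);
    sigma ~ exponential(1);
  }
  // Time spent here is reported under "likelihood"
  profile("likelihood") {
    y ~ normal(X * beta, sigma);
  }
}
```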

547

00:53:13,591 --> 00:53:20,845

But that's where I've found that there are some real wizards who live on the Stan Forums,

some of whom are in the room and some of whom are completely anonymous and we will never meet

548

00:53:20,845 --> 00:53:21,265

them.

549

00:53:21,265 --> 00:53:23,046

But they're super helpful.

550

00:53:23,046 --> 00:53:28,889

And if it's a model that you can share, or that you can share a snippet of, there is a lot of

human capital.

551

00:53:28,889 --> 00:53:33,792

And yeah, automating that and putting that into documentation is an ongoing thing.

552

00:53:33,792 --> 00:53:36,644

Yeah, I mean, plus one to the human capital.

553

00:53:36,644 --> 00:53:43,057

And the contributions of everyone here who comes to this conference, who teaches

tutorials, who demonstrates

554

00:53:44,167 --> 00:53:49,760

their models, who shares the documentation, who makes their code open source.

555

00:53:49,760 --> 00:53:54,079

I think that's also one of the things that makes a programming language work.

556

00:53:55,853 --> 00:53:57,933

Time for one last question.

557

00:54:01,693 --> 00:54:12,213

So I was thinking, if you go back some decades, 40, 50, 60 years, if you developed a

model, then you had to develop a way to sample from the posterior and stuff like that.

558

00:54:12,213 --> 00:54:20,613

But maybe fast forward to today and maybe my advisor could be thinking, when I was a boy,

I had to write my own sampler.

559

00:54:21,653 --> 00:54:31,418

Now you can have people that can be designing models or new ways to model, observe data,

but they maybe don't have to think too much about that computational side.

560

00:54:31,519 --> 00:54:42,965

So what do you think about the effect of Stan and similar languages on opening up this

research in Bayesian modeling to people who maybe are not numerical analysts or stuff like

561

00:54:42,965 --> 00:54:43,765

that.

562

00:54:44,066 --> 00:54:47,307

I think you should bring your advisor to StanCon.

563

00:54:49,249 --> 00:54:50,409

Yeah, so...

564

00:54:50,673 --> 00:54:55,615

One way to think about this question is to think about how old Hamiltonian Monte Carlo is.

565

00:54:55,815 --> 00:54:58,713

So the original paper is from 1987.

566

00:54:59,697 --> 00:55:08,301

And yet it was largely unused by the broader scientific community until Stan came out.

567

00:55:08,301 --> 00:55:17,995

And what were the technological developments that enabled Stan to make

Hamiltonian Monte Carlo

568

00:55:18,175 --> 00:55:20,716

the workhorse of so many scientists?

569

00:55:20,716 --> 00:55:22,486

I think that's something worth thinking about.

570

00:55:22,486 --> 00:55:29,598

Though I should say the one exception, the one person who did use HMC through the 90s and

2000s is Radford Neal, right, who did manage it.

571

00:55:29,598 --> 00:55:39,171

But otherwise, the tuning parameters, the control parameters, the requirement to calculate

gradients, that was an obstacle to many people.

572

00:55:39,171 --> 00:55:47,413

And so instead of using HMC, they were using other samplers, which we know perform

573

00:55:48,877 --> 00:55:52,540

between less well and dramatically less well in many cases.

574

00:55:53,681 --> 00:55:58,805

So I think it's great that we have these black box methods.

575

00:55:59,166 --> 00:56:08,033

But the one nuance that I will say is that the algorithm is not the only thing that's

black-boxified in Stan.

576

00:56:08,033 --> 00:56:17,175

The diagnostics, the warning messages, the generation of those things, the fact that these

things are generated automatically.

577

00:56:17,175 --> 00:56:21,351

That's what makes a black box algorithm reliable.

578

00:56:23,309 --> 00:56:25,329

It was the derivatives too.

579

00:56:25,489 --> 00:56:28,989

There wasn't a good autodiff system when we built Stan.

580

00:56:30,069 --> 00:56:32,169

I mentioned gradients, no?

581

00:56:32,909 --> 00:56:38,489

I'll caveat this a bit: the previous question hints at the fact that these things are

never truly black box.

582

00:56:38,489 --> 00:56:46,069

Because when you're facing performance difficulties, when you're at the edge, you do need

to have a fairly sophisticated understanding of what's happening.

583

00:56:46,069 --> 00:56:52,077

If you have ever used the reduce_sum function in Stan, that is technically like an

implementation detail.

584

00:56:52,077 --> 00:56:55,139

that you are having to exploit to get the speed you need.
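For a sense of what that implementation detail looks like (a minimal sketch with hypothetical names, assuming a simple normal model), reduce_sum breaks a big sum over the data into partial sums that can be computed in parallel, with a user-written partial-sum function:

```stan
functions {
  // Log density of one slice y_slice = y[start:end] of the data;
  // reduce_sum calls this on chunks of the data, possibly in parallel.
  real partial_normal(array[] real y_slice, int start, int end,
                      real mu, real sigma) {
    return normal_lpdf(y_slice | mu, sigma);
  }
}
data {
  int<lower=0> N;
  array[N] real y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 1);
  sigma ~ exponential(1);
  // grainsize = 1 lets the runtime pick the chunk size automatically
  target += reduce_sum(partial_normal, y, 1, mu, sigma);
}
```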

585

00:56:55,139 --> 00:57:07,946

And so there's always a fuzzy boundary here, but I think that it does help lower the

barrier to entry, even if the hypothetical ceiling can stay as high as your imagination.

586

00:57:07,946 --> 00:57:08,396

That's true.

587

00:57:08,396 --> 00:57:10,147

We could be more black box.

588

00:57:10,337 --> 00:57:11,868

That's seriously, huh?

589

00:57:11,868 --> 00:57:18,991

I think that people do tweak and manipulate the methods a lot, and they need to understand

some fundamental concepts.

590

00:57:19,792 --> 00:57:20,413

Awesome.

591

00:57:20,413 --> 00:57:21,833

Well, I think we're good.

592

00:57:22,263 --> 00:57:26,791

Thank you so much, folks, for being part of the first live show.

593

00:57:38,967 --> 00:57:42,660

This has been another episode of Learning Bayesian Statistics.

594

00:57:42,660 --> 00:57:53,149

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit

learnbayestats.com for more resources about today's topics, as well as access to more

595

00:57:53,149 --> 00:57:57,232

episodes to help you reach true Bayesian state of mind.

596

00:57:57,232 --> 00:57:59,194

That's learnbayestats.com.

597

00:57:59,194 --> 00:58:02,056

Our theme music is Good Bayesian by Baba Brinkman.

598

00:58:02,056 --> 00:58:04,038

Feat. MC Lars and Mega Ran.

599

00:58:04,038 --> 00:58:07,200

Check out his awesome work at bababrinkman.com.

600

00:58:07,200 --> 00:58:08,395

I'm your host.

601

00:58:08,395 --> 00:58:09,226

Alex Andorra.

602

00:58:09,226 --> 00:58:13,505

You can follow me on Twitter at Alex underscore Andorra like the country.

603

00:58:13,505 --> 00:58:20,854

You can support the show and unlock exclusive benefits by visiting Patreon.com slash

LearnBayesStats.

604

00:58:20,854 --> 00:58:23,236

Thank you so much for listening and for your support.

605

00:58:23,236 --> 00:58:25,548

You're truly a good Bayesian.

606

00:58:25,548 --> 00:58:32,363

Change your predictions after taking information in and if you're thinking I'll be less

than amazing.

607

00:58:32,363 --> 00:58:35,895

Let's adjust those expectations.

608

00:58:35,895 --> 00:58:48,869

Let me show you how to be a good Bayesian Change calculations after taking fresh data in

Those predictions that your brain is making Let's get them on a solid foundation

Previous post
Next post