*Proudly sponsored by **PyMC Labs**, the Bayesian Consultancy. **Book a call**, or **get in touch**!*

*Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his **awesome work**!*

Visit our Patreon page to unlock exclusive Bayesian swag 😉

**Takeaways**:

- User experience is crucial for the adoption of Stan.
- Recent innovations include adding tuples to the Stan language, new features and improved error messages.
- Tuples allow for more efficient data handling in Stan.
- Beginners often struggle with the compiled nature of Stan.
- Improving error messages is crucial for user experience.
- BridgeStan allows for integration with other programming languages and makes it very easy for people to use Stan models.
- Community engagement is vital for the development of Stan.
- New samplers are being developed to enhance performance.
- The future of Stan includes more user-friendly features.

**Chapters**:

00:00 Introduction to the Live Episode

02:55 Meet the Stan Core Developers

05:47 Brian Ward’s Journey into Bayesian Statistics

09:10 Charles Margossian’s Contributions to Stan

11:49 Recent Projects and Innovations in Stan

15:07 User-Friendly Features and Enhancements

18:11 Understanding Tuples and Their Importance

21:06 Challenges for Beginners in Stan

24:08 Pedagogical Approaches to Bayesian Statistics

30:54 Optimizing Monte Carlo Estimators

32:24 Reimagining Stan’s Structure

34:21 The Promise of Automatic Reparameterization

35:49 Exploring BridgeStan

40:29 The Future of Samplers in Stan

43:45 Evaluating New Algorithms

47:01 Specific Algorithms for Unique Problems

50:00 Understanding Model Performance

54:21 The Impact of Stan on Bayesian Research

**Thank you to my Patrons for making this episode possible!**

*Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke and Robert Flannery.*

**Links from the show:**

- Come see the show live at PyData NYC: https://pydata.org/nyc2024/
- LBS #90, Demystifying MCMC & Variational Inference, with Charles Margossian: https://learnbayesstats.com/episode/90-demystifying-mcmc-variational-inference-charles-margossian/
- Charles’ website: https://charlesm93.github.io/
- Charles on GitHub: https://github.com/charlesm93
- Charles on LinkedIn: https://www.linkedin.com/in/charles-margossian-3428935b/
- Charles on Google Scholar: https://scholar.google.com/citations?user=nPtLsvIAAAAJ&hl=en
- Charles on Twitter: https://x.com/charlesm993
- Brian’s website: https://brianward.dev/
- Brian on GitHub: https://github.com/WardBrian
- Brian on LinkedIn: https://www.linkedin.com/in/ward-brianm/
- Brian on Google Scholar: https://scholar.google.com/citations?user=bzosqW0AAAAJ&hl=en
- Brian on Twitter: https://x.com/ward_brianm
- Bob Carpenter’s reflections on StanCon: https://statmodeling.stat.columbia.edu/category/bayesian-statistics/

**Transcript**

*This is an automatic transcript and may therefore contain errors. Please **get in touch** if you’re willing to correct them.*

This episode is the first of its kind.

2

Welcome to the very first live episode of the Learning Bayesian Statistics podcast, recorded

at StanCon on September 10, 2024.

3

Again, I want to thank the whole StanCon committee for their help, trust and support in

organizing this event.

4

I surely had a blast and I hope

5

everybody did.

6

In this episode, you will hear from not one, but two Stan core developers, Charles

Margossian and Brian Ward.

7

They'll tell us all about Stan's future as well as give us some practical advice for

better statistical modeling.

8

And of course, there is a Q&A session with the audience at the end.

9

This is Learning Bayesian Statistics, episode 118.

10

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,

the projects, and the people who make it possible.

11

I'm your host, Alex Andorra.

12

You can follow me on Twitter at alex_andorra,

13

like the country.

14

For any info about the show, learnbayesstats.com is Laplace to be.

15

Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on

Patreon, everything is in there.

16

That's learnbayesstats.com.

17

If you're interested in one-on-one mentorship, online courses, or statistical consulting,

feel free to reach out and book a call at topmate.io/alex_andorra.

18

See you around, folks.

19

And best Bayesian wishes to you all.

20

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can

help bring them to life.

21

Check us out at pymc-labs.com.

22

Hello my dear Bayesians, today I want to welcome a new patron in the Learn Bayes Stats

family.

23

Thank you so much, Rob Flannery, your support truly makes this show possible.

24

I can't wait to talk to you in the Slack channel and hope that you will enjoy the

exclusive merch coming your way very soon.

25

Before we start, I have great news for you.

26

Because if you like live shows, I'm going to have two new live shows of LBS coming up on

November 7 and November 8 at PyData New York.

27

So if you want to be part of the live experience, join the Q&As and connect with the

speakers and myself, and also get some pretty cool stickers, well...

28

You can get your ticket already at pydata.org/nyc2024.

29

I can't wait to see you there.

30

OK, on to the show now.

31

So, welcome.

32

Thank you so much for being here.

33

You are going to have the immense honor and privilege of being the first-ever live audience of the

Learning Bayesian Statistics podcast.

34

Thank you.

35

Of course, as usual, a huge thank you to all the organizers of StanCon.

36

Charles, of course, thank you so much.

37

I know you worked a lot.

38

Michael also who organized all of that.

39

So I think you can give them a big round of applause.

40

Okay, so let's get started.

41

So for those of you who don't know me, I'm Alex Andorra.

42

I am an open source developer.

43

I am actually a PyMC core developer.

44

Am I allowed to say those words here?

45

That's fine.

46

Don't worry.

47

Yes, and very recently started as the senior applied scientist at the Miami Marlins.

48

So if you're ever in Miami, let me know.

49

And today we are gonna talk, and yeah, no, of course I am the host and creator of the

Learning Bayesian Statistics podcast, which is the best show about Bayesian stats.

50

I think we can say that confidently because it's the only one.

51

It's not that hard.

52

But today we have amazing guests with us.

53

We're gonna talk about everything Stan, today's the nerd panel.

54

Anything you wanted to know about Stan, about samplers, about all the technical stuff

behind Stan.

55

Why does it take so long to have INLA in there, for instance, you know, stuff like that.

56

You can ask that.

57

It's going to be like the last 10 minutes of the show, I think.

58

But before that, we're going to talk with Brian and Charles.

59

So I'm going to be without the mic that goes to the room for the rest of the show so that

you can hear from the guys mainly.

60

So let's start with Brian.

61

So Brian Ward, you are a Stan core developer, if I understood correctly.

62

Can you first give us a bit of background, the origin story of Brian?

63

How did you end up doing what you're doing?

64

Because it seems to me that you're doing a lot of

65

software engineering things, which is a priori quite far from the Bayesian world.

66

So how did you end up doing what you're doing today?

67

Yeah, so I majored in computer science and I sort of came into this from a very software

development angle.

68

So I sort of was always interested in how things work.

69

So I learned to program and then I was like, well, how do programming languages work?

70

So I learned about compilers and then I stopped before going any deeper because there are

dragons down there.

71

But as part of my studies, I started working on a project with a couple of my professors

that was about Stan.

72

And they were mostly interested in Stan because in their words, it was the probabilistic

programming language that had the most thorough formal documentation of the language and

73

its semantics.

74

They really liked that they could form an abstract model of the Stan language.

75

And so that was my first time ever using a probabilistic programming language.

76

It was really coming in from that angle.

77

And then since 2021, I've been working a lot on the Stan compiler, but then also just on,

like you said, general software engineering for the different Python libraries and trying

78

to improve the installation process on systems like Windows and that sort of thing.

79

OK.

80

So we'll get back to that because I think there are a lot of interesting threads here.

81

But first, let's switch to Charles.

82

So maybe for the rest of the audience, Charles was already

83

on the podcast; he's got a classic episode.

84

So if you're really interested in Charles' background, you can go and check out his

episode.

85

But maybe just for now, if you can quickly tell us who you are, how you ended up doing

that.

86

Yes, I should mention that I am an understudy.

87

There were actually two other Stan developers we were hoping to have on this panel.

88

Because of circumstances, I ended up being here.

89

I'm in very good company and I have a lot of thoughts about the future of Stan, which is

the topic of this conversation.

90

But essentially, I've been a Stan developer for eight years now.

91

And I started when I was working in biotech in pharmacometrics where Stan was up and

coming, but it lacked certain features to be used in pharmacometrics modeling.

92

Notably, you know, support for ODE systems, features to model clinical trials.

93

So my first project for Stan was developing an extension of Stan called Torsten, but also

in the process developed some features that directly appeared in Stan.

94

For example, the matrix exponential, which is used to solve linear ODEs, the algebraic

solvers.

95

And then,

96

I became a statistician, I pursued a PhD in statistics and I continued developing certain

features for Stan, kind of in that theme of implicit functions.

97

And I think we'll talk a little bit about that.

98

Nowadays, what I am is a research fellow, which is a glorified postdoc at the Flatiron

Institute, where I'm actually a colleague with Brian.

99

And I mostly do research.

100

around Bayesian computation, so that includes Markov chain Monte Carlo, variational

inference, and thinking about probabilistic programming languages today, tomorrow, but

101

also maybe in five or 10 years, what these might look like.

102

Yeah, thanks, Charles.

103

Quick legal announcement that I forgot, of course.

104

For the questions, we're going to record your voice.

105

So if you ask a question, you're

106

consenting to being recorded.

107

If you don't want your voice to be recorded, just come ask the question afterwards or find

a buddy who is willing to ask the question for you.

108

And that will be all fine.

109

So that's that.

110

Also, write down your questions because we're going to have the Q &A at the end of the

episode.

111

So let's continue.

112

Maybe with, like, this one is for both of you.

113

I'm wondering before we talk about the future,

114

You guys work with Stan all the time, so you do a lot of things, but what has been your

most exciting recent project involving Stan, of course?

115

I can go first.

116

So this is a bit further back, but one of the first real major, major wins for me was adding

tuples to the language.

117

It's a slightly more advanced type than had previously appeared in Stan.

118

It had a lot of implementation difficulty, but it was a really big change to the language

and the compiler that finally made it in.

119

But more recently, working directly on Stan, I've been working on

120

been trying to add features to try to make it easier to do some of the things that are

built into Stan, especially related to the constraints and the transforms directly in

121

Stan.

122

So trying to take some of the magic that's built in out and let you be able to do things

yourself that work much closer to that.

123

And that's been interesting to think about how to make Stan a language that is easier to

extend for newer people.

124

This next release will have

125

functions that make it a little easier to write your own user-defined transforms that do

the right thing during optimization, for example.

126

Hmm, okay.

127

That's cool.

128

Can you maybe give an example about such a function that people could use in a model?

129

Sure.

130

So one thing you might want to do is you might want a simplex parameter, but you want,

because you have some understanding of the posterior geometry, you want an alternative

131

parameterization.

132

You want to use softmax or you want to use some other thing than what's built into Stan.

133

And you can do this right now and it will work almost the same in almost all of the cases.

134

Going forward, we're trying to make it work the same in all of the cases.

135

We're trying to sort of cover off those last things.

136

In particular, if you're finding a maximum likelihood estimate, that is done without the

Jacobian adjustment for the change of variables there.

137

That holds for the built-in types in Stan, but right now there's no way to have that also happen

for your custom transforms.

138

But there will be going forward.
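
To make the role of the Jacobian adjustment concrete, here is a toy illustration in plain Python (not Stan code). Everything in it is an assumption chosen for the example: the Gamma(3, 1) density on a positive quantity `sigma`, the function names, and the crude grid search. It maximizes the same density over the unconstrained variable `u = log(sigma)`, once without and once with the log-Jacobian term, and shows that the two maximizers map back to different values of `sigma`.

```python
import math


def log_density_sigma(sigma, shape=3.0, rate=1.0):
    """Unnormalized log density of a Gamma(shape, rate) variable, sigma > 0."""
    return (shape - 1.0) * math.log(sigma) - rate * sigma


def log_density_unconstrained(u, jacobian):
    """Same density written in terms of u = log(sigma)."""
    sigma = math.exp(u)
    lp = log_density_sigma(sigma)
    if jacobian:
        # Change of variables: log |d sigma / d u| = log(exp(u)) = u
        lp += u
    return lp


def argmax_on_grid(f, lo=-3.0, hi=3.0, n=60001):
    """Crude maximizer: evaluate f on an evenly spaced grid and keep the best point."""
    step = (hi - lo) / (n - 1)
    best = max(range(n), key=lambda i: f(lo + i * step))
    return lo + best * step


u_no_jac = argmax_on_grid(lambda u: log_density_unconstrained(u, jacobian=False))
u_jac = argmax_on_grid(lambda u: log_density_unconstrained(u, jacobian=True))

sigma_no_jacobian = math.exp(u_no_jac)    # mode of the density in sigma
sigma_with_jacobian = math.exp(u_jac)     # mode of the density in u, mapped back

print(sigma_no_jacobian, sigma_with_jacobian)
```

Without the adjustment, the maximizer maps back to the mode of the density in `sigma` (analytically 2 for this toy density). With it, the maximizer is the mode of the density over `u`, which corresponds to `sigma = 3`. That difference is why an optimizer and a sampler want different treatment of the same transform.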

139

Okay, that's really cool.

140

So I have to admit that a lot of my recent work has been more Stan-adjacent rather than

specific contributions to Stan.

141

And so I could talk about that, but maybe one of the features that we are hoping to

release soon and that I developed a few years ago, I prototyped a few years ago, was we

142

wanted to build a nested Laplace approximation inside of Stan.

143

And actually, we developed one and we had a prototype in 2020.

144

So that already goes back and we published a paper about that.

145

And then another year or two later when I wrote my PhD thesis, I had a more thorough

prototype that was also released and then we kind of got stuck.

146

And I can talk a little bit about that, but essentially Steve Bronder, who was supposed to

join us today but had something come up (hopefully he'll be here in the next few days

147

at StanCon), has really been pushing the C++ code and the development, and we have this idea

that maybe by the next Stan release we'll actually have that integrated Laplace

148

approximation and we'll make it available to the users.

149

And of course there are a lot of interesting things and moving parts that are happening

around these features both from a technical

150

point of view.

151

So the automatic differentiation that we had to deploy is, I think, very interesting, very

challenging.

152

Also, the ways in which, what are the features that we put in our integrated Laplace?

153

So I don't think it's going to be as performant as the integrated Laplace approximation

that's implemented in INLA,

154

and I can discuss a little bit what are some of the features we lacked, but we also

focused on what are some unique things that having this integrated Laplace approximation

155

in Stan can give to the users in terms of modeling capabilities.

156

And those are things I'm excited about.

157

And there are going to be a few challenges about using these approximate algorithms, just

as there are whenever you use an approximate algorithm.

158

And that's going to motivate, you know,

159

new elements of a Bayesian workflow, new diagnostics, new checks that will have to be

semi-automated, that will have to be very well documented, and that will also need to be

160

demonstrated.

161

These are all the pieces you need for users to use an algorithm effectively.

162

And that's part of the journey between

163

We have a prototype.

164

We can publish this in what's considered a top machine learning conference, the paper

appeared in NeurIPS, versus

165

I can almost say we have something that's Stan-worthy.

166

And the requirements are a little bit orthogonal.

167

So it's not like one is superior, but there's a lot of extra work that needs to happen.

168

And that will continue to happen.

169

Because one of the, I think, open questions is when we make a new feature available, how

much responsibility

170

do we take and how much responsibility do we give to the users?

171

So maybe those are some of the topics that we can dive into.

172

But one thing that I'll say is the tuples that Brian mentioned, that was one of the key

technical components that we needed to develop in order to have an interface that's

173

user-friendly enough to use this integrated Laplace.

174

Yeah, I love that because

175

I don't know about you folks, but me, if I hear, yeah, we integrated tuples, I don't

think it's that important.

176

But then when you talk to the guys who actually code the stuff and implement that, it's a

building block that then unlocks a ton of incredible features and new stuff for users.

177

Yeah, and we can make that very, very concrete.

178

Yeah, for sure.

179

Actually, to give an example.

180

Well, Brian, how would you define a tuple?

181

So in type theory, no, I'm joking.

182

So a tuple is essentially just a grouping of different types of things.

183

So the simplest one to think of is like a point in R2, like an xy coordinate.

184

It's just a tuple of a real number and another real number.

185

But the nice thing about tuples as compared to like an array is that those don't have to

be the same type.

186

So for example, in more recent versions of Stan,

187

there is a function called eigendecompose which gives you a matrix of the eigenvectors

and a vector of the eigenvalues both back to you at the same time.

188

And so this actually cuts the amount of computation that has to be done in half because in

previous versions you had to call the eigenvectors function and the eigenvalues function

189

separately and they were repeating some work and now it can just give you this object that

has both at once.
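
The idea of returning both results at once can be sketched outside Stan. The following is a minimal plain-Python illustration, assuming a 2x2 symmetric matrix; the function name `eigendecompose_sym_2x2` is hypothetical and is not Stan's implementation. It returns a tuple whose two components have different types (a matrix and a vector), and the intermediate quantities `mean` and `radius` are computed once and shared by both.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def eigendecompose_sym_2x2(a: float, b: float, d: float) -> Tuple[Matrix, Vector]:
    """Eigendecomposition of the symmetric matrix [[a, b], [b, d]].

    Returns a tuple (eigenvectors, eigenvalues): the eigenvectors are the
    columns of a 2x2 matrix, the eigenvalues are in ascending order.
    """
    # Shared work: both outputs depend on these two quantities.
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    eigenvalues = [mean - radius, mean + radius]

    if b == 0.0:
        # Already diagonal: eigenvectors are the coordinate axes,
        # ordered to match the ascending eigenvalues.
        if a <= d:
            eigenvectors = [[1.0, 0.0], [0.0, 1.0]]
        else:
            eigenvectors = [[0.0, 1.0], [1.0, 0.0]]
    else:
        columns = []
        for lam in eigenvalues:
            # (A - lam * I) v = 0 is solved by v = (b, lam - a)
            vx, vy = b, lam - a
            norm = math.hypot(vx, vy)
            columns.append((vx / norm, vy / norm))
        eigenvectors = [
            [columns[0][0], columns[1][0]],
            [columns[0][1], columns[1][1]],
        ]
    return eigenvectors, eigenvalues


# Unpack the heterogeneous tuple in one call.
eigenvectors, eigenvalues = eigendecompose_sym_2x2(2.0, 1.0, 2.0)
print(eigenvalues)
print(eigenvectors)
```

An array could not hold this result, because the two components have different shapes and types; a tuple can.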

190

And so that's like.

191

One of the really useful things of tuples is it lets you have a principled way to talk

about a combination of different types like that.

192

Yeah, yeah.

193

And so one place where having this grouping of different types is very useful is in

functionals.

194

So what's an example of a functional?

195

The ODE solver in Stan, it's a functional.

196

One of its arguments is a function, so the function that defines the right-hand side of

your differential equation.

197

And then you need to pass

198

arguments to that function.

199

And of course, the user is specifying the function, and so they're going to specify what

are the arguments that we pass to that function.

200

There was this time where this function needed to have a strict signature.

201

So we told the user, you're first going to pass the time, the state, then the parameters,

and then the real data and the integer data.

202

And you have the strict format.

203

So basically, that's just a way of taking the arguments, packing them into a specific

structure, and then inside the ODE, you unpack them.

204

And so not only was this tedious, it can lead you to make your code less efficient if

you're not being careful about distinguishing what's a parameter and what's a data point.

205

And one experience of that

206

I had collaborating with applied people, with epidemiologists, so with Julien Riou.

207

This was during the pandemic, during the COVID-19.

208

At some point, Julien reached out to the Stan development team and he said he's

developing this really cool model, but right now it takes two, three days to fit, right?

209

Something like that.

210

And we're not at the...

211

level of complexity that we want to be at.

212

And so I have to give really most of the credit to Ben Bales, who was also a Stan

developer at the time.

213

And we took a look at how the ODE was implemented and how it was coded up and how the

different types were being handled.

214

And we realized that way more of the arguments that were being passed were parameters than

was necessary.

215

And once you correct for that, the running time of the model went from two, three days to

two hours.

216

So not only is that much faster and that's good in terms of reproducibility, that also

means you can then keep developing the model and go to something more complicated.

217

So having these kinds of tuples, well, really what it gave us was variational, what's

called variadic arguments, sorry.

218

That was a big step actually, where now you don't have those strict signatures when you

pass the functionals.

219

People can really pass different things.
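
The difference between a strict signature and variadic arguments can be sketched in plain Python. This is only an illustration under stated assumptions: `solve_ode_euler` and `decay` are hypothetical names, the integrator is a basic fixed-step forward Euler scheme, and none of it reflects how Stan's solvers are implemented. The point is that the solver forwards whatever extra arguments the user supplies, each with its own type, instead of forcing them into pre-packed arrays.

```python
import math


def solve_ode_euler(rhs, y0, t0, t1, n_steps, *args):
    """Fixed-step forward Euler integrator.

    Any extra positional arguments are forwarded to `rhs` unchanged, so the
    user decides what to pass (floats, ints, lists) without packing them.
    """
    h = (t1 - t0) / n_steps
    t = t0
    y = list(y0)
    for _ in range(n_steps):
        dy = rhs(t, y, *args)
        y = [yi + h * di for yi, di in zip(y, dy)]
        t += h
    return y


def decay(t, y, rate, n_compartments):
    """Right-hand side of dy/dt = -rate * y for each compartment.

    `rate` plays the role of a real-valued parameter, `n_compartments`
    the role of integer data; they arrive as separate, typed arguments.
    """
    return [-rate * y[i] for i in range(n_compartments)]


y_end = solve_ode_euler(decay, [1.0, 2.0], 0.0, 1.0, 1000, 0.5, 2)
exact = [1.0 * math.exp(-0.5), 2.0 * math.exp(-0.5)]
print(y_end, exact)
```

Because `rate` and `n_compartments` stay separate, it is obvious which one is a quantity you would differentiate with respect to and which one is fixed data, which is the distinction behind the speed-up described above.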

220

Now for the integrated Laplace, so I realize we haven't really defined what it is, but

basically what I'll say is that there are two functionals that you need to pass.

221

One is you're defining a likelihood function and the other one is you're defining a

covariance function.

222

And so we want the users to be able to use variadic arguments for both those functions

that they're defining.

223

So they're not constrained by types.

224

That way it's not tedious, it's not error prone, or it's not prone to inefficiencies.

225

And that's why those tuples, to make the code user-friendly, to probably decrease the

compute time that users will spend on this algorithm.

226

That's why that kind of stuff is important.

227

The power users, they don't need it.

228

They can handle the strict signatures.

229

I handle the strict signatures.

230

No problem.

231

But once you start using other probabilistic programming languages,

232

You realize that one of the big strengths of Stan is the attention it gives to users, to

API, how mindful it is of the users.

233

Other languages, you can tell that it really feels like sometimes they're written for

software engineers.

234

And the software engineers are the ones who are going to be the best ones at using those

languages.

235

But I think that that's one of the strengths of Stan.

236

and that some of the innovations are maybe gonna be less technical or algorithmic,

although those exist, and maybe we'll have time to talk about it, but actually making this

237

more user-friendly, less error-prone, less inefficiency-prone.

238

Yeah, and that definitely comes up, and I think it will come up whenever we're working on

new features for Stan.

239

There's always sort of two users we have in our head.

240

There's the user who is already at the limit of what Stan can do and wants to fit the next

biggest model, and how can we help that user, but also the user who, you know,

241

has a relatively small model that they just can't figure out right now, and can we

make that user's life easier too?

242

Sometimes they're actually sort of fighting each other, but usually we can find features that

actually make both of their lives better, which is like the ideal circumstance.

243

But by the way, kind of in the spirit of that, apparently most of our Stan users are BRMS

users.

244

I think that's established, right?

245

BRMS really gives you this beautiful syntax that people can play with, that people can

reason with.

246

Personally, I like the Stan language.

247

That syntax is a bit more explicit.

248

But even that syntax in the Stan model is a simplification of what Stan is doing under the

hood.

249

I'll give you a simple example.

250

You know those tilde statements that you have in the model block, right?

251

That's because

252

You know, people like Andrew Gelman like reasoning about models in a data-generating

fashion, right?

253

But really, you know, what's going on under the hood is we're incrementing a log

probability density, right?
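
What "incrementing a log probability density" means can be shown with a minimal plain-Python sketch. The model (a normal prior on `mu` and a normal likelihood), the function names, and the numbers are assumptions made for illustration. Each comment shows the tilde statement one would write in a Stan model block, and the line below it shows the accumulation it stands for.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of a normal distribution evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def log_density(mu, sigma, data):
    """Accumulate the joint log density term by term."""
    target = 0.0

    # mu ~ normal(0, 10);
    target += normal_lpdf(mu, 0.0, 10.0)

    # y ~ normal(mu, sigma);
    for y in data:
        target += normal_lpdf(y, mu, sigma)

    return target


value = log_density(0.0, 1.0, [0.0])
print(value)
```

One caveat: in Stan, the tilde form is allowed to drop additive constants that do not depend on the parameters, whereas this sketch keeps every term, so the two agree only up to a constant.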

254

So different users function with different levels of abstraction, depending on whether

they're statisticians or, you know, more software engineering, maybe ML-oriented people,

255

or maybe

256

scientists who primarily reason about covariates, right?

257

That's where I see one of the big roles that BRMS is playing.

258

And we need a way that's maintainable, that, you know, avoids compromises, you know, to

kind of like cater to these different users.

259

And in fact, we should talk about BridgeStan and a new community of users we're hoping to

reach

260

with Stan, maybe at some point.

261

Yeah, I'll add that to the notes.

262

Good, good.

263

Yeah, so many questions.

264

Thank you so much, guys.

265

I think, yeah, there's something I'd like to pick up.

266

We'll get back to INLA also at some point.

267

I think it's going to be like the, how do you say, fil rouge in English?

268

The thread.

269

The thread, thank you.

270

The red thread, you can say that.

271

I don't know.

272

So it's going to be the thread.

273

Talking a bit more about the beginners you were talking about and the user who is trying

to get his model to work but cannot figure it out yet.

274

Do you see a common difficulty that these kinds of users are having lately, maybe in the

Stan forums, things like that?

275

And maybe you can tell them how to use that right now or maybe tell us what you guys are

doing

276

in the coming months to address that kind of obstacle.

277

I think there are two, and they're sort of different.

278

So I think for a lot of users who are coming from more traditional languages like R or Python and are

trying to write Stan themselves for the first time, there's the difficulty of just having a

279

compiled language at all, both in terms of the extra installation steps, but then also

like dealing with static typing.

280

And if you're not used to sort of thinking about variables in this way.

281

And so there are things we've talked about of trying to work on that, but a lot of what

I've invested in is just trying to improve the error messages the compiler gives you and

282

trying to have them less be like what a compiler engineer knows went wrong and make it

more like what you think went wrong.

283

But I think the second class that I see, and this is sort of going back to Charles's

point, is I think we have a lot of users who will use a tool like BRMS or rstanarm,

284

and it will get them as far as it gets them and then they want to go a bit further.

285

But I think the issue is if they've never written any Stan code at that point, they ask

BRMS, hey, can you give me your Stan code?

286

And they're given this model that would have taken them several months to write themselves

and now they have no hope.

287

They're starting off in the deep end already because they already have a very powerful

model that they just want to tune one bit further.

288

And that's a much harder thing, both in terms of

289

software and also pedagogically. I don't know how to handle that.

290

I don't know if you have more.

291

I think a bit less about beginners.

292

No, no, okay, okay, so let me, let me nuance that a little bit.

293

So I teach workshops, I've had opportunities to teach.

294

And actually, I think about some fundamental questions that a beginner is likely to ask,

but for which we don't have great answers.

295

And I'll give you one example.

296

For how many iterations should we run Markov chain Monte Carlo?

297

Right?

298

That's an elementary question, and it's not an easy one to answer.

299

Especially if you start digging and thinking about what is the optimal length of a Markov

chain?

300

What is the optimal length of a warm-up phase, of a sampling phase?

301

What is the number of Markov chains that I should run given some compute that's available

to me?

302

And then you get into a more fundamental question, which is what is the precision that

people need from their Monte Carlo estimators?

303

So I asked an audience of scientists, well, what effective sample size do you need?

304

What summaries of the posterior distribution do you need?

305

Are you really interested in the expectation value, or do you need the variance, or maybe

you need these quantiles or these other quantiles?

306

And we have some unfortunate terminology.

307

People say we're computing the posterior.

308

That doesn't mean much.

309

It conveys a good first order intuition, but not a good second order intuition.

310

I like to say we're probing the posterior.

311

And then we need to think about what are the properties of the posterior that we're

actually pursuing.

312

And so then we get into, people ask me, when should I use MCMC or variational inference?

313

So people criticize variational inference.

314

They say, well, even when you solve the... so, what does VI do?

315

Maybe just as a summary:

316

You have a family of approximations, for example, Gaussians.

317

And then within that family of approximations, it tries to find the best approximation to

your posterior.

318

And people will dismiss it because they say, look, even if you solve the optimization

problem, at the end of the day, your posterior is not a Gaussian.

319

So your optimal solution is not good.

320

It has what's called, what people call an asymptotic bias.

321

Whereas with MCMC, you know that if you have enough compute power,

322

and enough can be a lot, right,

323

eventually you will hit arbitrary precision, right?

324

But now if I think about, I'm trying to probe the posterior, well maybe that Gaussian

approximation does match the expectation value, does match the summary quantities that I'm

325

interested in.

326

Maybe it captures the variance, or maybe it captures the entropy, right?

327

So maybe that is the pedagogical work that

328

I'm trying to do for beginners with the caveat that I don't have great answers to all

those questions.

329

I think these are real research topics.

330

But if I think about one goal, for example, that I would like to achieve, I would like to,

I want it to be part of the workflow.

331

People are doing work on that.

332

Aki Vehtari is doing great work on that, to only name one person.

333

Once people figure out this is how precise my Monte Carlo estimators need to be, I want

that to be the input to Stan.

334

And then I want it to run the Markov chains for the right number of iterations in a way

that gives you that precision without wasting too much computational power.

335

And we're not there yet.

336

We have promising directions to do that, which also come with their fair share of

challenges.
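
As a rough illustration of what "precision as an input" could mean, the sketch below uses the standard relation between the Monte Carlo standard error of a posterior mean, the posterior standard deviation, and the effective sample size: MCSE = sd / sqrt(ESS). The function names are hypothetical and this is not an existing Stan option; as said above, Stan is not there yet. In practice the posterior standard deviation is itself unknown and has to be estimated, which is part of the difficulty.

```python
import math


def mcse(posterior_sd, ess):
    """Monte Carlo standard error of a posterior mean estimate."""
    return posterior_sd / math.sqrt(ess)


def required_ess(posterior_sd, target_mcse):
    """Smallest effective sample size that reaches the target MCSE."""
    if target_mcse <= 0:
        raise ValueError("target_mcse must be positive")
    return math.ceil((posterior_sd / target_mcse) ** 2)


# Hypothetical example: posterior sd of 2.0, desired MCSE of 0.125.
needed = required_ess(posterior_sd=2.0, target_mcse=0.125)
print(needed, mcse(2.0, needed))
```

Halving the target error quadruples the required effective sample size, which is why asking for more precision than the analysis needs wastes computation.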

337

But yeah, that's the kind of thing I want to do for beginners and for intermediates and

for advanced and for myself.

338

But yeah, the beginners ask the right questions and the difficult questions.

339

Okay, thanks Charles.

340

Nice save.

341

No, so more seriously, yeah, Brian, I was wondering, so if you had, let's say Stan

Ulam,

342

He comes to you in a dream and he's like, okay, Brian, you've got one wish to make Stan

better for everybody, including the beginners, Charles.

343

So what would it be?

344

This is like a genie powerful wish.

345

I can rewrite the history of the...

346

Something that we've talked about again and again, but it would just be such a huge lift.

347

But if I'm allowed to go back to the start, I think that...

348

There's been a lot of talk about how the block structure of Stan gives a lot of power, but

it also makes a lot of things limiting.

349

Right now, if you want to do a prior predictive check, you oftentimes need a separate

model that looks a little different than the model you're actually writing.

350

And this is one of the things that's great about brms, right, is the single formula can be

turned into all these models at once.

351

But there has been previous research, so Maria Gorinova, Gorinova?

352

She did a master's thesis and a PhD thesis on a tool she called SlicStan, which was a

Stan with no blocks.

353

And so it sort of would automatically, you would write your Stan model as you do now, but

without saying what's data and what's parameters, and then you would just give it data,

354

and it would then figure out, okay, these are the data, these are the parameters, here are

things I can move to generated quantities, and it would sort of be a much more powerful

355

form of the compiler that would really capture a lot of these ideas, but it would also be

sort of a fundamentally different

356

thing than Stan.
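
A toy sketch of the idea just described, where the role of each variable is inferred from what the user supplies. This is purely illustrative and is not how SlicStan works internally; the function name and the variable names are hypothetical, and the real analysis also has to decide what can move to generated quantities.

```python
def split_variables(declared, supplied):
    """Partition declared model variables by whether a value was supplied.

    Variables with supplied values are treated as data; the rest are
    treated as parameters to be inferred.
    """
    data = [name for name in declared if name in supplied]
    parameters = [name for name in declared if name not in supplied]
    return data, parameters


# Hypothetical blockless model with four declared variables.
declared = ["N", "y", "mu", "sigma"]

# Supplying y makes this a posterior fit for mu and sigma.
data, parameters = split_variables(declared, {"N": 3, "y": [0.1, -0.4, 1.2]})
print(data, parameters)

# Supplying only N leaves y unobserved, as in a prior predictive check.
prior_data, prior_unknowns = split_variables(declared, {"N": 3})
print(prior_data, prior_unknowns)
```

The second call hints at why this helps with prior predictive checks: the same model text serves both purposes, and only the supplied values change.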

357

If I could really do anything in the world, that would probably be it.

358

But I don't know if that will ever make it there.

359

There's a lot of existing stuff that we would have to give up, I think.

360

Yeah.

361

I understand.

362

If you're interested, Maria Gorinova was on the podcast.

363

You can go on the website, learnbayesstats.com.

364

There is a small stuff on the right.

365

On the top, you can...

366

look for any guests.

367

So Maria Gorinova, that was a great episode because I think she's also working on

automatic reparameterization, if I remember correctly.

368

So if you ever had to reparameterize a model, that can be quite frustrating if you're a

beginner because you're like, but it's the same model.

369

I'm just doing that for the sampler.

370

And so one of the goals of that is just having the sampler figure that out by itself.
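
To illustrate why a reparameterization is "the same model", here is a minimal sketch using the common centered versus non-centered example (an assumption: the episode does not name a specific reparameterization). Drawing theta directly from Normal(mu, tau) and drawing z from Normal(0, 1) and then setting theta = mu + tau * z give the same distribution for theta; only the geometry the sampler sees differs.

```python
import random
import statistics


def draw_centered(rng, mu, tau):
    """Centered form: sample theta directly from Normal(mu, tau)."""
    return rng.gauss(mu, tau)


def draw_non_centered(rng, mu, tau):
    """Non-centered form: sample a standard normal, then shift and scale."""
    z = rng.gauss(0.0, 1.0)
    return mu + tau * z


rng = random.Random(0)  # fixed seed so the comparison is reproducible
mu, tau, n_draws = 1.0, 2.0, 20000

centered = [draw_centered(rng, mu, tau) for _ in range(n_draws)]
non_centered = [draw_non_centered(rng, mu, tau) for _ in range(n_draws)]

for name, draws in [("centered", centered), ("non-centered", non_centered)]:
    print(name, round(statistics.fmean(draws), 2), round(statistics.stdev(draws), 2))
```

In a hierarchical model the sampler would explore z rather than theta in the non-centered form, which is the part that automatic reparameterization aims to choose for the user.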

371

Yeah, and then she also did some interesting work on automatic marginalization where it's

tractable, which was very cool, because that's another, I don't feel confident in my own

372

ability to marginalize a model off the top of my head, so it's like a, I know that's a

thing that new users hit a lot.
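
For readers who have not marginalized a model by hand, here is a minimal sketch of the simplest tractable case: summing out a discrete component indicator in a finite mixture. The two-component normal mixture and all names below are assumptions chosen for illustration. Gradient-based samplers cannot sample discrete parameters directly, so the indicator is summed out on the log scale with a log-sum-exp.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def normal_logpdf(y, mean, sd):
    """Log density of Normal(mean, sd) at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((y - mean) / sd) ** 2


def mixture_log_lik(y, weights, means, sds):
    """Log likelihood of y with the discrete component indicator summed out."""
    terms = [
        math.log(w) + normal_logpdf(y, m, s)
        for w, m, s in zip(weights, means, sds)
    ]
    return log_sum_exp(terms)


# Hypothetical mixture: 30% Normal(-1, 1) and 70% Normal(2, 0.5).
weights, means, sds = [0.3, 0.7], [-1.0, 2.0], [1.0, 0.5]
print(mixture_log_lik(0.3, weights, means, sds))
```

Automatic marginalization would derive the equivalent of `mixture_log_lik` from a model written with the explicit discrete indicator.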

373

Yeah, yeah, yeah, I mean, you hit that quite a lot, and yeah, if we could automate that at

some point, that'd be absolutely fantastic, yeah.

374

Charles, I think we've got nine minutes before the Q&As.

375

So I'm going to give you a choice.

376

No, so we could go back to talk about INLA a bit, because I realize we should have done

something at the beginning, which is defining INLA and telling people why that would be

377

useful and when.

378

We can also talk about BridgeStan, but I think, Brian, you can talk about BridgeStan

too.

379

So your call, Charles.

380

Let's talk about BridgeStan.

381

Or let's talk about BridgeStan.

382

Let's see how fast I can do it.

383

Maybe we can do both.

384

Yes and yes.

385

So Simon's talk earlier mentioned BridgeStan.

386

And if people aren't familiar, this was something that Edward Roualdes, who's a Stan

developer, started a few years ago when he was visiting us in New York.

387

It drives me crazy that I didn't think of this.

388

Edward deserves so much credit because it was sitting there all this time, but what it

essentially does is it, through a lot of technical mumbo jumbo that you should ask me

389

about later, it makes it very easy for people to use Stan models outside of Stan's C++

ecosystem.

390

And so if you have a model in Stan, but you want to use a...

391

like an algorithm that's only implemented in an R package or that you're developing

yourself, it really lets you get the log densities and the gradients with all of the speed

392

and quality of the Stan Math library, but you can use these Python libraries or these like

experimental things that you're working on.
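
A minimal sketch of what "getting the log densities and the gradients" buys an algorithm developer. The leapfrog integrator below, the building block of HMC, is written against a plain callable that returns a (log density, gradient) pair. The stand-in target is a hand-written standard normal, not a BridgeStan call; the assumption is that a compiled Stan model can be wrapped to expose the same interface on the unconstrained scale, so the algorithm code would not need to change.

```python
def std_normal_logp_grad(theta):
    """Stand-in target: log density (up to a constant) and gradient."""
    logp = -0.5 * sum(t * t for t in theta)
    grad = [-t for t in theta]
    return logp, grad


def leapfrog(logp_grad, theta, momentum, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    theta, momentum = list(theta), list(momentum)
    _, grad = logp_grad(theta)
    for _ in range(n_steps):
        # Half step for momentum, full step for position, half step for momentum.
        momentum = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
        theta = [t + step_size * p for t, p in zip(theta, momentum)]
        _, grad = logp_grad(theta)
        momentum = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
    return theta, momentum


def hamiltonian(logp_grad, theta, momentum):
    """Total energy: negative log density plus kinetic energy."""
    logp, _ = logp_grad(theta)
    return -logp + 0.5 * sum(p * p for p in momentum)


theta0, p0 = [1.0, -0.5], [0.3, 0.8]
theta1, p1 = leapfrog(std_normal_logp_grad, theta0, p0, step_size=0.05, n_steps=20)
h0 = hamiltonian(std_normal_logp_grad, theta0, p0)
h1 = hamiltonian(std_normal_logp_grad, theta1, p1)
print(abs(h1 - h0))  # small energy error indicates a stable trajectory
```

Because the integrator only sees the callable, swapping the toy target for a real model is a one-line change on the caller's side.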

393

And so we have a paper, and it has a few citations already from people who

have been using it to develop new algorithms and like I know a lot of work that Bob has

394

been doing recently has been using it and so like that's one way we're, especially

395

One of the things we're thinking of for those users who want to push the edge is new forms

of variational inference and new forms of HMC.

396

And it has already been a really huge boon for that research.

397

Yeah, yeah.

398

At the Flatiron Institute, we do a lot of algorithmic work on new samplers and new

variational inference.

399

And we now use BridgeStan all the time.

400

I'll give you two good reasons, and there are probably more, but one of them is that it gives

us access to Stan's automatic differentiation and if you look at a lot of papers that

401

evaluate the performance of algorithms they do it not against time but against number of

gradient evaluations because that tends to be the dominant operation computationally and

402

so now you write your sampler in Python or

403

maybe in R, or you write your VI in Python or R, but you still get the high performance

from using Stan.

404

So that's great.
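
Since papers often report cost in gradient evaluations rather than wall-clock time, a small wrapper that counts calls is a common pattern. The sketch below is an illustration under assumptions: the class name is hypothetical, the target is a hand-written standard normal, and the "algorithm" is plain gradient ascent chosen only because it is short.

```python
class CountingTarget:
    """Wrap a (log density, gradient) callable and count its evaluations."""

    def __init__(self, logp_grad):
        self._logp_grad = logp_grad
        self.n_grad_evals = 0

    def __call__(self, theta):
        self.n_grad_evals += 1
        return self._logp_grad(theta)


def std_normal_logp_grad(theta):
    """Stand-in target: log density (up to a constant) and gradient."""
    return -0.5 * sum(t * t for t in theta), [-t for t in theta]


def gradient_ascent(target, theta, step_size, n_steps):
    """Toy algorithm: climb the log density using one gradient per step."""
    for _ in range(n_steps):
        _, grad = target(theta)
        theta = [t + step_size * g for t, g in zip(theta, grad)]
    return theta


target = CountingTarget(std_normal_logp_grad)
mode = gradient_ascent(target, [2.0, -1.0], step_size=0.5, n_steps=25)
print(mode, target.n_grad_evals)
```

Reporting `n_grad_evals` alongside accuracy makes comparisons independent of the machine and of the language the algorithm itself is written in.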

405

And then the second thing is that means that you can now test those new algorithms that

you've developed in a pretty straightforward way on Stan models and the library of Stan

406

models, including posteriordb or maybe some other models that you've been using.

407

And those models are very readable.

408

It standardizes a little bit the testing framework.

409

So it has changed my thinking a little bit as someone who works a lot on the Stan

compiler, thinking of Stan not just as its own sort of ecosystem, but also as like a

410

language for communicating models.

411

I find it really helpful.

412

Someone can describe a model in LaTeX up on a slide, but as soon as they show me the Stan

code, I'm like, I get it.

413

And even if my job now was to go implement it in PyMC or something, I think it would still

help.

414

Having this language that is a little bit bigger than itself or a little bit bigger than

it used to be where now, I see Adrian here is in the audience and he has an implementation

415

of HMC in Rust.

416

But you can use Stan models with it because of BridgeStan.

417

It has opened up the, sorry, Adrian's in the back.

418

But it's opened up the world of things that Stan can be, which is one thing that I think

is very cool.

419

Yeah, and I think, so when I spoke about the new community of users that I think we're

going to reach is there are people who write their own samplers who have particularly

420

difficult problems.

421

And even today, we've had two examples, at least two examples of people who departed from

the traditional samplers that are implemented in Stan, either to implement tempering or to

422

implement massive parallelization.

423

And so, you know, I really think that, you know, there is a group of people who for their

problems, you know, like to develop and try out certain samplers.

424

And, you know, that's also going to drive research for what could be the next default

sampler or variational inference or approximation in Stan.

425

They are candidates for that.

426

Although it's true that the more we learn, the more we develop new samplers, the more we

realize how good NUTS is.

427

But things are going to change over the years.

428

OK, awesome.

429

Thanks a lot, guys.

430

So I still have a ton of questions.

431

But already, let's open it up to the audience.

432

Are there already any questions?

433

Or should I ask one?

434

OK, perfect.

435

So, mentioning the new samplers that you guys are developing at the Flatiron and also I

have a lot of guests who come on the show and talk about new samplers, normalizing flows

436

for instance, Marylou Gabrié was on the show, also Marvin Schmitt. Paul Bürkner is

here, he works a lot on BayesFlow with Marvin Schmitt.

437

They are doing amortized Bayesian inference.

438

So I'm really curious how you guys think about that and Stan, basically.

439

Because most of the time, it's also tied to increasing data sizes.

440

And so people are looking into new samplers which can adapt to their use case better.

441

So I'm curious how you guys think about that in the Stan team and what you're thinking of

developing in the coming months about that.

442

Yeah, I think one of the challenges that these approaches often, sort of one of the

motivating reasons for them is that you can get a wall clock time reduction by just

443

throwing a massive amount of compute at it with GPUs, which is one place where...

444

Stan's GPU support is still kind of piecemeal, like we're working on it, but it's sort of

like we can't compete with Google developing JAX, you know?

445

And so like, you know, Simon's presentation earlier showed that like on CPU, Stan actually

beats JAX, or BridgeStan, you know, can be faster than JAX.

446

But on GPU, we have sort of no hope.

447

And I think that like, or at least at the moment, no hope.

448

But I think that's where these approaches become really challenging is like trying to

think of.

449

And I think it's sort of an almost existential question of like, is Stan just like the CPU

solution, right?

450

And is something else better?

451

Because there are things about Stan's like, sort of core design that don't like GPUs.

452

It's a very expressive language and GPUs really like less expressive languages that are

much easier to guess what you're gonna do next.

453

And so I think that is something that, you know,

454

I personally believe there will always be sort of a community of, like, you know, researchers

working on their laptop or that sort of thing.

455

And so I think there will always be a place for these like CPU bound implementations.

456

But yeah, if you can predict that, you can probably make a lot of money.

457

Charles?

458

Yeah, I'm going to try and return to the original question, which is, you know,

459

So there are a lot of algorithms that are being developed and there are a lot of good ideas

that go into developing these algorithms and there are some good experiments and some good

460

empirical evidence that supports why you might want to use those algorithms.

461

Nonetheless, 80 to 90% of the time when I read a paper about a new algorithm, it doesn't

give me enough information as to whether

462

I should now start using this algorithm to solve my problem.

463

And there is a, so what does that mean?

464

That means that usually you need to somehow implement that algorithm and test it yourself

on your own problem, and that's fine, but I think that a lot of these algorithms out there

465

are not yet battle tested.

466

And we're kind of in a situation where, okay, we,

467

maybe we like the prototype and maybe it's promising, do we put in the developer time to

build this in Stan?

468

And it's a bit of a cycle because once it appears in Stan, then it really gets battle

tested.

469

And then we get feedback from the community and we can try to learn things about this

algorithm, we can try to improve it.

470

That's actually what happened to the No-U-Turn Sampler, which has evolved since its

original inception.

471

You know, I'm of the opinion that,

472

My bar for scientific papers is it presents a good idea and it's thought stimulating.

473

But I don't think it tells me this is the next thing we should build in Stan.

474

I think BridgeStan can alleviate some of that because it makes it easier for people to

build implementations that can then be tested in Stan and then we kind of get into battle

475

testing things.

476

Maybe someone builds a Python package

477

that is compatible with BridgeStan and maybe the process becomes, instead of the Stan

developers, the Stan community, brutally evaluating an algorithm before deciding to put

478

in some amount of work, maybe first this package gets used and it's developed by algorithm

developers.

479

But this...

480

This is the broader question of how do algorithms get developed, implemented, and adopted?

481

And I'll tell you what, another big criterion here is the simplicity of the algorithm.

482

That plays a huge role into whether an algorithm is adopted by developers, by users, or

not.

483

So the answer is I don't know.

484

Yeah, that's always a fine answer.

485

Any questions?

486

I'm going to bring one up for my neighbor.

487

Wait. Perfect.

488

We needed the mic.

489

So what do we do about algorithms that are good for specific situations but not good for

other things?

490

Like so far we've only developed like black box algorithms that we kind of hope work

everywhere.

491

We don't have any kind of real specific algorithms for anything.

492

Is there any future for that?

493

I mean, this is...

494

I think this is one advantage, so I'm gonna quote the person who just asked the question,

but one thing Bob has said a lot is the reason we don't wanna just put 30 samplers into

495

Stan is then a lot of practitioners would try all 30 of them and then just report the,

there's an advantage to sort of being a great filter and being very conservative in what

496

is actually in Stan.

497

But I do think this is one advantage to making it easier to broaden the ecosystem where

now I think a future for that kind of

498

algorithm is in a R package or a Python package that can interface with, there are now

existing examples out there of an implementation of an algorithm that has support for Stan

499

models and PyMC models.

500

So it can kind of bridge gaps between communities, also sort of, if you have to install a

separate package, that makes it fairly clear that this is for a separate purpose.

501

And so I think that's what I would say the future is for those.

502

Yeah, I agree.

503

Do you have an intuition how easy it is for the Stan compiler to figure out whether a

model is generative and then to be able to sample from it?

504

I mean, of course we can do it in generated quantities, but it's always awkward to double

code our models.

505

This is a question that also sort of does expose a bit of my sort of not traditional

statistics background, is that I have never been presented with a definition of like,

506

generative or graphical model that is precise enough for me to actually answer this

question.

507

I think that there are definitely easy cases and hard cases.

508

I suspect that in general it would be impossible, but it's also, I think it's probably

likely that we could have a system where it tries really hard and then if it doesn't

509

succeed in a minute it gives up or something like that.

510

There are all these sorts of tricks in the compiler world, but I think that the...

511

This is another one of these things, kind of like GPU support, that because you can write

basically anything you want, you can also write sort of the worst possible case for this

512

kind of automated analysis.

513

An open question I've had for a long time is like, what percentage of Stan models in the

wild are generative or not?

514

If that number just naturally is 80, 90%, I think then this is like a very fruitful thing.

515

But if it's like 60, I don't know.

516

less, I'm not sure.

517

That's been what I've heard is that it is more like, it is fairly high, yeah, I think it

would be something that's worth looking into, but I would need some handholding on the

518

statistical modeling side of that, actually.

519

Sorry, I shouldn't call on people.

520

Hi, so I have a question about more on the people trying to implement models in Stan.

521

And say there's a model and it's just, you know, it's taking a very long time.

522

And people think, well, Stan, you know, they might have some complaints or say it's too

slow.

523

But what I've found in practice also is I'm never clear sometimes what parts of my model are

causing the delay.

524

So what are the slow bits or?

525

It can either just be like mathematically this is just harder to estimate or there's some

shape of my posterior that's really harder to navigate.

526

But I don't really get that feedback unless I'm like fixing certain parameters, toying

with other things.

527

Is there any way to, you know, give that feedback of what's causing some issues?

528

Have you ever thought about modeling that?

529

Sorry.

530

So I remember maybe a year ago, I was actually, I met Andrew Gelman and Mitzi Morris in

Paris at a cafe.

531

We just all so happened to be in Paris.

532

And we started brainstorming.

533

We had an idea of a research project, which is how much can you learn about your model and

your sampler by running 20 iterations of HMC?

534

And the idea is that, you know, fail fast, learn fast, that, you know, the early iterations

of a Bayesian workflow should be based on that.

535

And I think that a lot of the statistics literature and the more formal literature, you

know, kind of imagines that, you know, you've done a really good job fitting your model,

536

you've thrown a lot of computation, you've waited a long time.

537

And we want to figure out, you know, what are the lessons that you can learn quickly,

right?

538

So now,

539

I can talk a little bit from experience and I can give you that, but we kind of want to

make that also part of the workflow and your early iterations that we can learn with fast

540

approximation.

541

And then hopefully we'll have a good answer to your question.

542

There's also a tool for instrumentation.

543

Yeah, I was gonna say, in the immediate sense, there is the ability to profile Stan models.

544

You can write a block that starts with the word profile and then a name, and then you can

turn that on when you're running it, and it will give you a printout of like, the block

545

named X took this percentage of the time, the block named Y took that percentage, and it

can help you identify at least like, here's the bad line.
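
A sketch of what such profile blocks look like, kept as a Stan program inside a Python string so it can be written to disk for whichever interface is used to run it. The regression model, the profile names "priors" and "likelihood", and the file name are assumptions for illustration; how the timing report is retrieved depends on the interface and is not shown here.

```python
from pathlib import Path

# Stan program with two named profile sections in the model block.
STAN_PROGRAM = """
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  profile("priors") {
    alpha ~ normal(0, 1);
    beta ~ normal(0, 1);
    sigma ~ exponential(1);
  }
  profile("likelihood") {
    y ~ normal(alpha + beta * x, sigma);
  }
}
"""


def write_model(directory, filename="regression_profiled.stan"):
    """Write the Stan program to disk and return its path."""
    path = Path(directory) / filename
    path.write_text(STAN_PROGRAM)
    return path
```

Each named section is timed separately, so splitting a slow model block into a few profile sections is a quick way to find which statements dominate the runtime.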

546

Now, it might not help you figure out what you need to do instead.

547

But that's where I found that there are some real wizards who live on the Stan Forums,

some of whom are in the room and some of whom are completely anonymous and we'll never meet

548

them.

549

But they're super helpful.

550

If it's a model that you can share, that you can share a snippet of, there is a lot of

human capital.

551

Yeah, automating that and putting that into documentation is an ongoing thing.

552

Yeah, I mean, plus one to the human capital.

553

And the contributions of everyone here who comes to this conference, who teaches

tutorials, who demonstrates

554

their models, who shares the documentation, who makes their code open source.

555

I think that's also one of the things that makes a programming language work.

556

Time for one last question.

557

So I was thinking, if you go back some decades, 50, 60 years or 48, if you develop a

model, then you have to develop a way to sample from the posterior and stuff like that.

558

But maybe fast forward to today and maybe my advisor could be thinking, when I was a boy,

I had to write my own sampler.

559

Now you can have people that can be designing models or new ways to model, observe data,

but they maybe don't have to think too much about that computational side.

560

So what do you think about the effect of Stan and similar languages on opening up this

research in Bayesian modeling to people who maybe are not numerical analysts or stuff like

561

that.

562

I think you should bring your advisor to StanCon.

563

Yeah, so...

564

One way to think about this question is to think about how old Hamiltonian Monte Carlo is.

565

So the original paper is from 1987.

566

And yet it was largely unused by the broader scientific community until Stan came out.

567

And what were the technologies, technological developments that enabled Stan to make

Hamiltonian Monte Carlo

568

the workhorse of so many scientists.

569

I think that's something worth thinking about.

570

Though I should say the one exception, the one person who did use HMC through the 90s and

2000s is Radford Neal, right, who did manage.

571

But otherwise, the tuning parameters, the control parameters, the requirement to calculate

gradients, that was an obstacle to many people.

572

And so instead of using HMC, they're using other samplers, which we know perform

573

between less well and dramatically less well in many cases.

574

So I think it's great that we have these black box methods.

575

But the one nuance that I will say is that the algorithm is not the only thing that's

black boxified in Stan.

576

The diagnostics, the warning messages, the generation of those things, the fact that these

things are generated automatically.

577

That's what makes a black box algorithm reliable.

578

It was the derivatives too.

579

There wasn't a good autodiff system when we built Stan.

580

I mentioned gradients, no?

581

I'll caveat this a bit with the previous question hints at the fact that these things are

never truly black box.

582

Because when you're facing performance difficulties, when you're at the edge, you do need

to have a fairly sophisticated understanding of what's happening.

583

If you ever have used the reduce_sum function in Stan, that is technically like an

implementation detail.

584

that you are having to exploit to get the speed you need.

585

And so there's always a fuzzy boundary here, but I think that it does help lower the

barrier to entry, even if the hypothetical ceiling can stay as high as your imagination.

586

That's true.

587

We could be more black box.

588

That's seriously, huh?

589

I think that people do tweak and manipulate the methods a lot, and they need to understand

some fundamental concepts.

590

Awesome.

591

Well, I think we're good.

592

Thank you so much, folks, for being part of the first live show.

593

This has been another episode of Learning Bayesian Statistics.

594

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit

learnbayesstats.com for more resources about today's topics, as well as access to more

595

episodes to help you reach a true Bayesian state of mind.

596

That's learnbayesstats.com.

597

Our theme music is Good Bayesian by Baba Brinkman.

598

Feat. MC Lars and Mega Ran.

599

Check out his awesome work at bababrinkman.com.

600

I'm your host.

601

Alex Andorra.

602

You can follow me on Twitter at Alex underscore Andorra like the country.

603

You can support the show and unlock exclusive benefits by visiting Patreon.com slash

learnbayesstats.

604

Thank you so much for listening and for your support.

605

You're truly a good Bayesian.

606

Change your predictions after taking information in and if you're thinking I'll be less

than amazing.

607

Let's adjust those expectations.

608

Let me show you how to be a good Bayesian. Change calculations after taking fresh data in.

Those predictions that your brain is making, let's get them on a solid foundation.