Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag 😉
Takeaways:
- User experience is crucial for the adoption of Stan.
- Recent innovations include tuples in the Stan language, new functions, and improved error messages.
- Tuples allow for more efficient data handling in Stan.
- Beginners often struggle with the compiled nature of Stan.
- Improving error messages is crucial for user experience.
- BridgeStan allows for integration with other programming languages and makes it very easy for people to use Stan models.
- Community engagement is vital for the development of Stan.
- New samplers are being developed to enhance performance.
- The future of Stan includes more user-friendly features.
Chapters:
00:00 Introduction to the Live Episode
02:55 Meet the Stan Core Developers
05:47 Brian Ward’s Journey into Bayesian Statistics
09:10 Charles Margossian’s Contributions to Stan
11:49 Recent Projects and Innovations in Stan
15:07 User-Friendly Features and Enhancements
18:11 Understanding Tuples and Their Importance
21:06 Challenges for Beginners in Stan
24:08 Pedagogical Approaches to Bayesian Statistics
30:54 Optimizing Monte Carlo Estimators
32:24 Reimagining Stan’s Structure
34:21 The Promise of Automatic Reparameterization
35:49 Exploring BridgeStan
40:29 The Future of Samplers in Stan
43:45 Evaluating New Algorithms
47:01 Specific Algorithms for Unique Problems
50:00 Understanding Model Performance
54:21 The Impact of Stan on Bayesian Research
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke and Robert Flannery.
Links from the show:
- Come see the show live at PyData NYC: https://pydata.org/nyc2024/
- LBS #90, Demystifying MCMC & Variational Inference, with Charles Margossian: https://learnbayesstats.com/episode/90-demystifying-mcmc-variational-inference-charles-margossian/
- Charles’ website: https://charlesm93.github.io/
- Charles on GitHub: https://github.com/charlesm93
- Charles on LinkedIn: https://www.linkedin.com/in/charles-margossian-3428935b/
- Charles on Google Scholar: https://scholar.google.com/citations?user=nPtLsvIAAAAJ&hl=en
- Charles on Twitter: https://x.com/charlesm993
- Brian’s website: https://brianward.dev/
- Brian on GitHub: https://github.com/WardBrian
- Brian on LinkedIn: https://www.linkedin.com/in/ward-brianm/
- Brian on Google Scholar: https://scholar.google.com/citations?user=bzosqW0AAAAJ&hl=en
- Brian on Twitter: https://x.com/ward_brianm
- Bob Carpenter’s reflections on StanCon: https://statmodeling.stat.columbia.edu/category/bayesian-statistics/
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
This episode is the first of its kind.
Welcome to the very first live episode of the Learning Bayesian Statistics podcast, recorded at StanCon on September 10, 2024.
Again, I want to thank the whole StanCon committee for their help, trust and support in organizing this event.
I surely had a blast and I hope everybody did.
In this episode, you will hear not from one, but two Stan core developers, Charles Margossian and Brian Ward.
They'll tell us all about Stan's future as well as give us some practical advice for better statistical modeling.
And of course, there is a Q&A session with the audience at the end.
This is Learning Bayesian Statistics, episode 118.
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible.
I'm your host, Alex Andorra.
You can follow me on Twitter at alex_andorra, like the country.
For any info about the show, learnbayesstats.com is Laplace to be.
Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there.
That's learnbayesstats.com.
If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra.
See you around, folks.
And best Bayesian wishes to you all.
And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life.
Check us out at pymc-labs.com.
Hello my dear patrons, today I want to welcome a new patron into the LBS family.
Thank you so much, Rob Flannery, your support truly makes this show possible.
I can't wait to talk to you in the Slack channel and hope that you will enjoy the exclusive merch coming your way very soon.
Before we start, I have great news for you.
Because if you like live shows, we have two new live shows of LBS coming up on November 7 and November 8 at PyData New York.
So if you want to be part of the live experience, join the Q&As and connect with the speakers and myself, and also get some pretty cool stickers, well...
You can get your ticket already at pydata.org/nyc2024.
I can't wait to see you there.
OK, on to the show now.
So, welcome.
Thank you so much for being here.
You are getting the immense honor and privilege of being the first ever live audience of the Learning Bayesian Statistics podcast.
Thank you.
Of course, as usual, a huge thank you to all the organizers of StanCon.
Charles, of course, thank you so much.
I know you worked a lot.
Michael also, who organized all of that.
So I think you can give them a big round of applause.
Okay, so let's get started.
So for those of you who don't know me, I'm Alex Andorra.
I am an open source developer.
I am actually a PyMC core developer.
Am I allowed to say those words here?
That's fine.
Don't worry.
Yes, and I very recently started as the senior applied scientist at the Miami Marlins.
So if you're ever in Miami, let me know.
And today we are gonna talk, and yeah, no, of course I am the host and creator of the Learning Bayesian Statistics podcast, which is the best show about Bayesian stats.
I think we can say that confidently because it's the only one.
So it's not that hard.
But today we have amazing guests with us.
We're gonna talk about everything Stan; today's the nerd panel.
Anything you wanted to know about Stan, about samplers, about all the technical stuff behind Stan.
Why does it take so long to have INLA there, for instance, you know, stuff like that.
You can ask that.
It's going to be like the last 10 minutes of the show, I think.
But before that, we're going to talk with Brian and Charles.
So I'm going to stay off the mic for the rest of the show, so that you can hear mainly from the guys.
So let's start with Brian.
So Brian Ward, you are a Stan core developer, if I understood correctly.
Can you first give us a bit of background, the origin story of Brian?
How did you end up doing what you're doing?
Because it seems to me that you're doing a lot of software engineering, which is a priori quite far from the Bayesian world.
So how did you end up doing what you're doing today?
Yeah, so I majored in computer science and I sort of came into this from a very software development angle.
So I sort of was always interested in how things work.
So I learned to program, and then I was like, well, how do programming languages work?
So I learned about compilers, and then I stopped before going any deeper because there are dragons down there.
But as part of my studies, I started working on a project with a couple of my professors that was about Stan.
And they were mostly interested in Stan because, in their words, it was the probabilistic programming language that had the most thorough formal documentation of the language and its semantics.
They really liked that they could form an abstract model of the Stan language.
And so that was my first time ever using a probabilistic programming language.
It was really coming in from that angle.
And then since 2021, I've been working a lot on the Stan compiler, but then also just on, like you said, general software engineering for the different Python libraries, and trying to improve the installation process on systems like Windows and that sort of thing.
OK.
So we'll get back to that because I think there are a lot of interesting threads here.
But first, let's switch to Charles.
So maybe for the rest of the audience: Charles was already on the podcast, he's got a classic episode.
So if you're really interested in Charles' background, you can go and check out his episode.
But maybe just for now, if you can quickly tell us who you are, how you ended up doing that.
Yes, I should mention that I am an understudy.
There were actually two other Stan developers we were hoping to have on this panel.
But because of circumstances, I ended up being here.
I'm in very good company and I have a lot of thoughts about the future of Stan, which is the topic of this conversation.
But essentially, I've been a Stan developer for eight years now.
And I started when I was working in biotech, in pharmacometrics, where Stan was up and coming, but it lacked certain features to be used in pharmacometrics modeling.
Notably, you know, support for ODE systems, features to model clinical trials.
So my first project for Stan was developing an extension of Stan called Torsten, but also in the process I developed some features that directly appeared in Stan.
For example, the matrix exponential, which is used to solve linear ODEs, and the algebraic solvers.
And then I became a statistician, I pursued a PhD in statistics, and I continued developing certain features for Stan, kind of in that theme of implicit functions.
And I think we'll talk a little bit about that.
Nowadays, what I am is a research fellow, which is a glorified postdoc, at the Flatiron Institute, where I'm actually a colleague of Brian's.
And I mostly do research around Bayesian computation, so that includes Markov chain Monte Carlo, variational inference, and thinking about probabilistic programming languages today, tomorrow, but also maybe in five or 10 years, what these might look like.
Yeah, thanks, Charles.
Quick legal announcement that I forgot, of course.
For the questions, we're going to record your voice.
So if you ask a question, you're consenting to being recorded.
If you don't want your voice to be recorded, just come ask the question afterwards, or find a buddy who is willing to ask the question for you.
And that will be all fine.
So that's that.
Also, write down your questions, because we're going to have the Q&A at the end of the episode.
So let's continue.
Maybe with, like, that's for both of you.
I'm wondering, before we talk about the future: you guys work with Stan all the time, so you do a lot of things, but what has been your most exciting recent project involving Stan, of course?
I can go first.
So this was a bit further back, but one of the first real major wins for me was adding tuples to the language.
It's a slightly more advanced type than had previously appeared in Stan.
It had a lot of implementation difficulty, but it was a really big change to the language and the compiler that finally made it in.
But more recently, working directly on Stan, I've been trying to add features to make it easier to do some of the things that are built into Stan, especially related to the constraints and the transforms, directly in Stan.
So trying to take some of the magic that's built in out, and let you be able to do things yourself that work much closer to that.
And that's been interesting to think about, how to make Stan a language that is easier to extend for newer people.
So this next release will have functions that make it a little easier to write your own user-defined transforms that do the right thing during optimization, for example.
Hmm, okay, that's cool.
Can you maybe give an example of such a function that people could use in a model?
Sure.
So one thing you might want to do is: you might want a simplex parameter, but, because you have some understanding of the posterior geometry, you want an alternative parameterization.
You want to use softmax, or you want to use some other thing than what's built into Stan.
And you can do this right now, and it will work almost the same in almost all of the cases.
But going forward, we're trying to make it work the same in all of the cases.
We're trying to sort of cover off those last things.
In particular, if you're finding a maximum likelihood estimate, that is done without the Jacobian adjustment for the change of variables there, for the built-in types in Stan, but right now there's no way to have that also happen for your custom transforms.
But there will be going forward.
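To make the idea concrete, here is a minimal sketch of a user-defined simplex transform of the kind described here. It assumes the `jacobian +=` statement available in recent Stan releases (the exact functions shipping in the release mentioned may differ); the model itself is purely illustrative.

```stan
data {
  int<lower=2> K;
}
parameters {
  vector[K - 1] y;  // unconstrained logits; the K-th logit is pinned to 0
}
transformed parameters {
  // Softmax of the padded logits lands on the simplex.
  simplex[K] theta = softmax(append_row(y, 0));
  // Log |det Jacobian| of this transform. With `jacobian +=`, the
  // adjustment is applied during sampling but skipped automatically
  // when computing a maximum likelihood estimate.
  jacobian += sum(log(theta));
}
model {
  theta ~ dirichlet(rep_vector(1.0, K));
}
```

The point of the feature is exactly the comment above: the same user-written transform behaves correctly under both sampling (Jacobian on) and optimization (Jacobian off), which previously only the built-in constrained types could do.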
Okay, that's really cool.
So I have to admit that a lot of my recent work has been more Stan-adjacent rather than specific contributions to Stan.
And so I could talk about that, but maybe one of the features that we are hoping to release soon, and that I prototyped a few years ago, was: we wanted to build a nested Laplace approximation inside of Stan.
And actually, we developed one and we had a prototype in 2020.
So that already goes back a while, and we published a paper about that.
And then another year or two later, when I wrote my PhD thesis, I had a more thorough prototype that was also released, and then we kind of got stuck.
And I can talk a little bit about that, but essentially Steve Bronder, who was supposed to join us today but had something come up (hopefully he'll be here in the next few days at StanCon), has really been pushing the C++ code and the development, and we have this idea that maybe by the next Stan release we'll actually have that integrated Laplace approximation and we'll make it available to the users.
And of course there are a lot of interesting moving parts that are happening around these features, both from a technical point of view.
So the automatic differentiation that we had to deploy is, I think, very interesting, very challenging.
Also, what are the features that we put in our integrated Laplace?
So I don't think it's going to be as performant as the integrated Laplace approximation that's implemented in INLA, and I can discuss a little bit what are some of the features we lacked, but we also focused on what are some unique things that having this integrated Laplace approximation in Stan can give to the users in terms of modeling capabilities.
And those are things I'm excited about.
And there are going to be a few challenges about using these approximate algorithms, just as there are whenever you use an approximate algorithm.
And that's going to motivate, you know, new elements of a Bayesian workflow, new diagnostics, new checks that will have to be semi-automated, that will have to be very well documented, and that will also need to be demonstrated.
These are all the pieces you need for users to use an algorithm effectively.
And that's part of the journey between "we have a prototype, we can publish this in what's considered a top machine learning conference" (the paper appeared in NeurIPS), versus "I can almost say we have something that's Stan-worthy".
And the requirements are a little bit orthogonal.
So it's not like one is superior, but there's a lot of extra work that needs to happen.
And that will continue to happen.
Because one of the, I think, open questions is: when we make a new feature available, how much responsibility do we take and how much responsibility do we give to the users?
So maybe those are some of the topics that we can dive into.
But one thing that I'll say is: the tuples that Brian mentioned, that was one of the key technical components that we needed to develop in order to have an interface that's user-friendly enough to use this integrated Laplace.
Yeah, I love that, because I don't know about you folks, but me, if I hear "yeah, we added tuples", I don't think it's that important.
But then when you talk to the guys who actually code the stuff and implement that, it's a building block that then unlocks a ton of incredible features and new stuff for users.
Yeah, and we can make that very, very concrete.
Yeah, for sure.
Actually, to give an example.
Well, Brian, how would you define a tuple?
In type theory... no, I'm joking.
So a tuple is essentially just a grouping of different types of things.
So the simplest one to think of is like a point in R2, like an xy coordinate.
It's just a tuple of a real number and another real number.
But the nice thing about tuples as compared to, like, an array is that the elements don't have to be the same type.
So for example, in more recent versions of Stan, there is a function called eigendecompose which gives you a matrix of the eigenvectors and a vector of the eigenvalues, both back to you at the same time.
And so this actually cuts the amount of computation that has to be done in half, because in previous versions you had to call the eigenvectors function and the eigenvalues function separately, and they were repeating some work, and now it can just give you this object that has both at once.
And so one of the really useful things about tuples is it lets you have a principled way to talk about a combination of different types like that.
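In code, that looks something like the sketch below (tuple syntax from recent Stan releases; the symmetric-matrix variant `eigendecompose_sym` is used here for illustration):

```stan
transformed data {
  matrix[3, 3] A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]];
  // One call returns both results as a tuple: the eigenvectors matrix
  // and the eigenvalues vector, sharing the underlying computation.
  tuple(matrix[3, 3], vector[3]) eig = eigendecompose_sym(A);
  matrix[3, 3] vecs = eig.1;  // tuple elements are accessed with .1, .2, ...
  vector[3] vals = eig.2;
}
```

Before tuples, a function like this had no way to return a matrix and a vector together, which is why the eigenvectors and eigenvalues had to be computed twice.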
Yeah, yeah.
And so one place where having this grouping of different types is very useful is in functionals.
So what's an example of a functional?
The ODE solver in Stan, it's a functional.
One of its arguments is a function, the function that defines the right-hand side of your differential equation.
And then you need to pass arguments to that function.
And of course, the user is specifying the function, and so they're going to specify what are the arguments that we pass to that function.
There was this time where this function needed to have a strict signature.
So we told the user: you're first going to pass the time, the state, then the parameters, then the real data and the integer data.
And you have this strict format.
So basically, those were just ways of taking the arguments, packing them into a specific structure, and then inside the ODE, you unpack them.
And so not only was this tedious, it can lead you to make your code less efficient if you're not being careful about distinguishing what's a parameter and what's a data point.
And one experience of that I had was collaborating with applied people, with epidemiologists, so with Julien Riou.
This was during the pandemic, during COVID-19.
At some point, Julien reached out to the Stan development team and he said he was developing this really cool model, but right now it takes two, three days to fit, right?
Something like that.
And we're not at the level of complexity that we want to be at.
And so I have to give really most of the credit to Ben Bales, who was also a Stan developer at the time.
And we took a look at how the ODE was implemented and how it was coded up and how the different types were being handled.
And we realized that way more of the arguments that were being passed were parameters than was necessary.
And once you correct for that, the running time of the model went from two, three days to two hours.
So not only is that much faster, and that's good in terms of reproducibility, it also means you can then keep developing the model and go to something more complicated.
So having these kinds of tuples, well, really what it gave us was what's called variadic arguments, sorry.
That was a big step actually, where now you don't have those strict signatures when you pass the functionals.
People can really pass different things.
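As a sketch of the contrast being described, here are the two styles of right-hand-side function for the ODE solver (signatures follow the documented old `integrate_ode_rk45` and newer variadic `ode_rk45` conventions; the toy decay model is illustrative):

```stan
functions {
  // Old interface: one fixed signature. Every parameter travels in theta,
  // every real datum in x_r, every integer in x_i, packed and unpacked
  // by hand, and everything in theta is treated as a parameter.
  array[] real decay_old(real t, array[] real y,
                         array[] real theta, array[] real x_r,
                         array[] int x_i) {
    return { -theta[1] * y[1] };
  }
  // Variadic interface: pass exactly the arguments you need, with their
  // natural types; the `data` qualifier marks arguments that never carry
  // parameters, so they stay out of the autodiff stack.
  vector decay_new(real t, vector y, real rate, data real scale) {
    return [ -rate * scale * y[1] ]';
  }
}
```

The call sites differ accordingly: `integrate_ode_rk45(decay_old, y0, t0, ts, theta, x_r, x_i)` versus `ode_rk45(decay_new, y0, t0, ts, rate, scale)`. Accidentally shipping data through `theta` in the old interface is exactly the "parameters that didn't need to be parameters" inefficiency behind the two-days-to-two-hours story.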
Now for the integrated Laplace, so I realize we haven't really defined what it is, but basically what I'll say is that there are two functions that you need to pass.
One is, you're defining a likelihood function, and the other one is, you're defining a covariance function.
And so we want the users to be able to use variadic arguments for both those functions that they're defining.
So they're not constrained by types.
That way it's not tedious, it's not error-prone, and it's not prone to inefficiencies.
And that's why those tuples matter: to make the code user-friendly, to probably decrease the compute time that users will spend on this algorithm.
That's why that kind of stuff is important.
The power users, they don't need it.
They can handle the strict signatures.
I handle the strict signatures.
No problem.
But once you start using other probabilistic programming languages, you realize that one of the big strengths of Stan is the attention it gives to users, to the API, how mindful it is of the users.
Other languages, you can tell that it really feels like sometimes they're written for software engineers.
And the software engineers are the ones who are going to be the best at using those languages.
But I think that that's one of the strengths of Stan, and that some of the innovations are maybe gonna be less technical or algorithmic, although those exist, and maybe we'll have time to talk about it, but actually making things more user-friendly, less error-prone, less inefficiency-prone.
Yeah, and that definitely comes up, and I think it will come up whenever we're working on new features for Stan.
There are always sort of two users we have in our head.
There's the user who is already at the limit of what Stan can do and wants to fit the next biggest model, and how can we help that user; but also the user who, you know, has a relatively small model that they just can't figure out right now, and can we make that user's life easier too?
Sometimes they're actually sort of fighting each other, but usually we can find features that actually make both of their lives better, which is like the ideal circumstance.
But by the way, kind of in the spirit of that, apparently most of our Stan users are brms users.
I think that's established, right?
brms really gives you this beautiful syntax that people can play with, that people can reason with.
Personally, I like the Stan language.
Its syntax is a bit more explicit.
But even the syntax in the Stan model is a simplification of what Stan is doing under the hood.
I'll give you a simple example.
You know those tilde statements that you have in the model block, right?
That's because, you know, people like Andrew Gelman like reasoning about models in a data-generating fashion, right?
But really, you know, what's going on under the hood is we're incrementing a log probability density, right?
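Concretely, the two notations define the same posterior; the `target +=` form is what the tilde statement desugars to, up to constant terms:

```stan
model {
  // Data-generating notation: reads as "y is distributed normal(mu, sigma)".
  y ~ normal(mu, sigma);
  // ...which under the hood increments the log density accumulator,
  // equivalent (up to constants) to:
  //   target += normal_lupdf(y | mu, sigma);
}
```

The two levels of abstraction are exactly the point being made: one reads like a generative story, the other exposes the log-density bookkeeping Stan actually performs.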
So different users function at different levels of abstraction, depending on whether they're statisticians or, you know, more software engineering, maybe ML-oriented people, or maybe scientists who primarily reason about covariates, right?
That's where I see one of the big roles that brms is playing.
And we need a way that's maintainable, that, you know, avoids compromises, you know, to kind of cater to these different users.
And in fact, we should talk about BridgeStan and a new community of users we're hoping to reach with Stan, maybe at some point.
Yeah, I'll add that to the notes.
Good, good.
Yeah, so many questions.
Thank you so much, guys.
I think, yeah, something I'd like to pick up on.
We'll get back to INLA also at some point.
I think it's going to be like the, how do you say "fil rouge" in English?
The thread.
The thread, thank you.
The red thread, you can say that.
I don't know.
So it's going to be the thread.
Talking a bit more about the beginners you were talking about, and the user who is trying to get their model to work but cannot figure it out yet.
Do you see a common difficulty that these kinds of users are having lately, maybe on the Stan forums, things like that?
And maybe you can tell them how to deal with that right now, or maybe tell us what you guys are doing in the coming months to address those kinds of obstacles.
I think there are two, and they're sort of different.
So for a lot of users who are coming from more traditional R or Python and are trying to write Stan themselves for the first time, there's the difficulty of just having a compiled language at all, both in terms of the extra installation steps, but then also, like, dealing with static typing, if you're not used to sort of thinking about variables in this way.
And so there are things we've talked about to try to work on that, but a lot of what I've invested in is just trying to improve the error messages the compiler gives you, and trying to have them be less like what a compiler engineer knows went wrong and more like what you think went wrong.
But I think the second class that I see, and this is sort of going back to Charles's point, is I think we have a lot of users who will use a tool like brms or rstanarm, and it will get them as far as it gets them, and then they want to go a bit further.
But I think the issue is, if they've never written any Stan code at that point, they ask brms, hey, can you give me your Stan code?
And they're given this model that would have taken them several months to write themselves, and now they have no hope.
They're starting off in the deep end already, because they already have a very powerful model that they just want to tune one bit further.
And that's a much harder thing, both in terms of software, but also pedagogically; I don't know how to handle that.
I don't know if you have more.
I think a bit less about beginners.
No, no, okay, okay, so let me, let me nuance that a little bit.
So I teach workshops, I've had opportunities to teach.
And actually, I think about some fundamental questions that a beginner is likely to ask, but which we don't have great answers to.
And I'll give you one example.
For how many iterations should we run Markov chain Monte Carlo?
Right?
That's an elementary question, and it's not an easy one to answer.
Especially if you start digging and thinking about: what is the optimal length of a Markov chain?
What is the optimal length of a warm-up phase, of a sampling phase?
What is the number of Markov chains that I should run, given some compute that's available to me?
And then you get into a more fundamental question, which is: what is the precision that people need from their Monte Carlo estimators?
So I ask an audience of scientists, well, what effective sample size do you need?
What summaries of the posterior distribution do you need?
Are you really interested in the expectation value, or do you need the variance, or maybe you need these quantiles or those other quantiles?
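The standard way to connect these questions, for the common case of estimating a posterior mean, is through the Monte Carlo standard error: the estimator's error shrinks with the square root of the effective sample size, so a target precision implies a target ESS.

```latex
\mathrm{MCSE}(\hat{\mu}) \approx \frac{\hat{\sigma}}{\sqrt{\mathrm{ESS}}}
\qquad \Longrightarrow \qquad
\mathrm{ESS}_{\text{target}} \approx
\left( \frac{\hat{\sigma}}{\mathrm{MCSE}_{\text{target}}} \right)^{2}
```

Here $\hat{\sigma}$ is the posterior standard deviation of the quantity of interest; other summaries (variances, quantiles) have analogous but different error formulas, which is part of why "how long should I run it" has no single answer.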
And we have some unfortunate terminology.
People say we're computing the posterior.
That doesn't mean much.
It conveys a good first-order intuition, but not a good second-order intuition.
I like to say we're probing the posterior.
And then we need to think about what are the properties of the posterior that we're actually pursuing.
And so then we get into, people ask me, when should I use MCMC or variational inference?
So people criticize variational inference.
They say, well, even when you solve the... so, what does VI do?
Maybe just as a summary: you have a family of approximations, for example Gaussians.
And then within that family of approximations, it tries to find the best approximation to your posterior.
And people will dismiss it because they say, look, even if you solve the optimization problem, at the end of the day, your posterior is not a Gaussian.
So your optimal solution is not good.
It has what people call an asymptotic bias.
Whereas with MCMC, you know that with enough compute power, and enough can be a lot, right, eventually you will hit arbitrary precision.
But now if I think about it as "I'm trying to probe the posterior", well, maybe that Gaussian approximation does match the expectation value, does match the summary quantities that I'm interested in.
Maybe it captures the variance, or maybe it captures the entropy, right?
So maybe that is the pedagogical work that I'm trying to do for beginners, with the caveat that I don't have great answers to all those questions.
I think these are real research topics.
But if I think about one goal, for example, that I would like to achieve: I want this to be part of the workflow.
People are doing work on that.
Aki Vehtari is doing great work on that, to name only one person.
Once people figure out "this is how precise my Monte Carlo estimators need to be", I want that to be the input to Stan.
And then I want it to run the Markov chains for the right number of iterations, in a way that gives you that precision without wasting too much computational power.
And we're not there yet.
We have promising directions to do that, which also come with their fair share of challenges.
But yeah, that's the kind of thing I want to do for beginners and for intermediates and for advanced users and for myself.
But yeah, the beginners ask the right questions, and the difficult questions.
339
Okay, thanks Charles.
340
Nice save.
341
No, so more seriously, yeah, Brian, was wondering like, so if you had, let's say Stan
Wulham,
342
He comes to you in a dream and he's like, okay, Brian, you've got one wish to make Stan
better for everybody, including the beginners, Charles.
343
So what would it be?
344
This is like a genie powerful wish.
345
I can rewrite the history of the...
There's something that we've talked about again and again, but it would just be such a huge lift.
But if I'm allowed to go back to the start, I think that...
There's been a lot of talk about how the block structure of Stan gives a lot of power, but it also makes a lot of things limiting.
Right now, if you want to do a prior predictive check, you oftentimes need a separate model that looks a little different from the model you're actually writing.
And this is one of the things that's great about brms, right: a single formula can be turned into all these models at once.
But there has been previous research. Maria Gorinova did a master's thesis and a PhD thesis on a tool she called SlicStan, which was a Stan with no blocks.
You would write your Stan model as you do now, but without saying what's data and what's parameters, and then you would just give it data, and it would figure out: okay, these are the data, these are the parameters, here are things I can move to generated quantities.
It would be a much more powerful form of the compiler that would really capture a lot of these ideas, but it would also be a fundamentally different thing than Stan.
If I could really do anything in the world, that would probably be it.
But I don't know if it will ever make it there.
There's a lot of existing stuff that we would have to give up, I think.
Yeah.
I understand.
If you're interested, Maria Gorinova was on the podcast.
You can go to the website, learnbayesstats.com.
There's a search function at the top; you can look for any guest.
So Maria Gorinova, that was a great episode, because I think she's also working on automatic reparameterization, if I remember correctly.
So if you've ever had to reparameterize a model, that can be quite frustrating if you're a beginner, because you're like, but it's the same model.
I'm just doing that for the sampler.
And so one of the goals of that work is to have the sampler figure that out by itself.
Yeah, and then she also did some interesting work on automatic marginalization where it's tractable, which was very cool. Because that's another thing: I don't feel confident in my own ability to marginalize a model off the top of my head, and I know that's something new users hit a lot.
Yeah, yeah, you hit that quite a lot, and if we could automate that at some point, that'd be absolutely fantastic.
Charles, I think we've got nine minutes before the Q&As.
So I'm going to give you a choice.
We could go back and talk about INLA a bit, because I realize we should have done something at the beginning, which is defining INLA and telling people why and when it would be useful.
We can also talk about BridgeStan, but I think, Brian, you can talk about BridgeStan too.
So your call, Charles.
Let's talk about BridgeStan.
Or let's talk about BridgeStan.
Let's see how fast I can do it.
Maybe we can do both.
Yes and yes.
So Simon's talk earlier mentioned BridgeStan.
If people aren't familiar, this is something that Edward Roualdes, who's a Stan developer, started a few years ago when he was visiting us in New York.
It drives me crazy that I didn't think of this.
Edward deserves so much credit, because it was sitting there all this time. What it essentially does, through a lot of technical mumbo jumbo that you should ask me about later, is make it very easy for people to use Stan models outside of Stan's C++ ecosystem.
So if you have a model in Stan, but you want to use an algorithm that's only implemented in another package, or one that you're developing yourself, it lets you get the log densities and the gradients with all of the speed and quality of the Stan Math library, while using Python libraries or experimental things that you're working on.
We have a paper, and it already has a few citations from people who have been using BridgeStan to develop new algorithms. A lot of the work Bob has been doing recently has been using it too.
One of the things we're thinking of, for those users who want to push the edge, is new forms of variational inference and new forms of HMC.
And it has already been a really huge boon for that research.
Yeah, yeah.
At the Flatiron Institute, we do a lot of algorithmic work on new samplers and new variational inference.
And we now use BridgeStan all the time.
I'll give you two good reasons, and there are probably more. One of them is that it gives us access to Stan's automatic differentiation. If you look at a lot of papers that evaluate the performance of algorithms, they do it not against time but against the number of gradient evaluations, because that tends to be the dominant operation computationally.
So now you write your sampler in Python, or maybe in R, or you write your VI in Python or in R, but you still get the high performance from using Stan.
So that's great.
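The pattern Charles describes, writing the sampler in Python while the model supplies log densities and gradients, can be sketched roughly as follows. This is not BridgeStan itself: the `log_density_gradient` stub below stands in for a real BridgeStan model (in the actual Python interface you would construct a `bridgestan.StanModel` and call its `log_density_gradient` method), and the MALA sampler is just an illustrative "algorithm you're developing yourself".

```python
import numpy as np

# Stub standing in for a BridgeStan model. The real object would be
# bridgestan.StanModel("model.stan", data="data.json"), whose
# log_density_gradient(theta) returns (log p(theta), grad log p(theta)).
def log_density_gradient(theta):
    # Standard-normal target so the sketch runs on its own.
    return -0.5 * np.dot(theta, theta), -theta

def mala(theta0, step_size=0.5, n_draws=5000, seed=1):
    """Metropolis-adjusted Langevin sampler written in plain Python,
    needing only log densities and gradients from the model."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp, grad = log_density_gradient(theta)
    draws = []
    for _ in range(n_draws):
        # Langevin proposal: a gradient step plus Gaussian noise.
        mean = theta + 0.5 * step_size**2 * grad
        prop = mean + step_size * rng.standard_normal(theta.shape)
        lp_p, grad_p = log_density_gradient(prop)
        mean_rev = prop + 0.5 * step_size**2 * grad_p
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_q_fwd = -np.sum((prop - mean) ** 2) / (2 * step_size**2)
        log_q_rev = -np.sum((theta - mean_rev) ** 2) / (2 * step_size**2)
        if np.log(rng.uniform()) < lp_p - lp + log_q_rev - log_q_fwd:
            theta, lp, grad = prop, lp_p, grad_p
        draws.append(theta.copy())
    return np.array(draws)

draws = mala(np.zeros(2))
print(draws.mean(axis=0), draws.var(axis=0))
```

Because the sampler only ever touches `log_density_gradient`, swapping the stub for a compiled Stan model changes nothing else in the code, which is exactly the separation that makes this kind of algorithm research convenient.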
And then the second thing is that you can now test those new algorithms you've developed in a pretty straightforward way on Stan models and the library of Stan models, including posteriordb, or maybe some other models that you've been using.
And those models are very readable.
It standardizes the testing framework a little bit.
It has changed my thinking a little bit, as someone who works a lot on the Stan compiler: thinking of Stan not just as its own sort of ecosystem, but also as a language for communicating models.
I find it really helpful.
Someone can describe a model in LaTeX up on a slide, but as soon as they show me the Stan code, I'm like, I get it.
And even if my job now was to go implement it in PyMC or something, I think it still helps.
Having this language that is a little bit bigger than itself, or a little bit bigger than it used to be... I see Adrian here in the audience, and he has an implementation of HMC in Rust.
But you can use Stan models with it because of BridgeStan.
It has opened up... sorry, Adrian's in the back.
But it's opened up the world of things that Stan can be, which is one thing that I think is very cool.
Yeah, and I think the new community of users I spoke about, that I think we're going to reach, is people who write their own samplers, who have particularly difficult problems.
And even today, we've had at least two examples of people who departed from the traditional samplers implemented in Stan, either to implement tempering or to implement massive parallelization.
So I really think there is a group of people who, for their problems, like to develop and try out certain samplers.
And that's also going to drive research for what could be the next default sampler or variational inference or approximation in Stan.
They are candidates for that.
Although it's true that the more we learn, the more we develop new samplers, the more we realize how good NUTS is.
But things are going to change over the years.
OK, awesome.
Thanks a lot, guys.
So I still have a ton of questions.
But already, let's open it up to the audience.
Are there any questions?
Or should I ask one?
OK, perfect.
So, mentioning the new samplers that you guys are developing at Flatiron: I also have a lot of guests who come on the show and talk about new samplers, normalizing flows for instance. Marylou Gabrié was on the show, also Marvin Schmitt. Paul Bürkner is here; he works a lot on BayesFlow with Marvin Schmitt.
They are doing amortized Bayesian inference.
So I'm really curious how you guys think about that and Stan, basically.
Because most of the time, it's also tied to increasing data sizes.
And so people are looking into new samplers which can adapt to their use case better.
So I'm curious how you think about that in the Stan team, and what you're thinking of developing in the coming months.
Yeah, I think one of the motivating reasons for these approaches is that you can get a wall-clock time reduction by just throwing a massive amount of compute at the problem with GPUs, which is one place where Stan's GPU support is still kind of piecemeal. We're working on it, but we can't compete with Google developing JAX, you know?
Simon's presentation earlier showed that on CPU, Stan actually beats JAX, or BridgeStan can be faster than JAX.
But on GPU, we have sort of no hope. At least at the moment, no hope.
And I think that's where these approaches become really challenging.
It's almost an existential question: is Stan just the CPU solution, right?
And is something else better?
Because there are things about Stan's core design that don't like GPUs.
It's a very expressive language, and GPUs really like less expressive languages where it's much easier to guess what you're going to do next.
I personally believe there will always be a community of researchers working on their laptop, or that sort of thing.
And so I think there will always be a place for these CPU-bound implementations.
But yeah, if you can predict that, you can probably make a lot of money.
Charles?
Yeah, I'm going to try and return to the original question.
There are a lot of algorithms being developed, and there are a lot of good ideas that go into developing these algorithms, and some good experiments and some good empirical evidence that support why you might want to use them.
Nonetheless, 80 to 90% of the time when I read a paper about a new algorithm, it doesn't give me enough information as to whether I should now start using this algorithm to solve my problem.
So what does that mean?
That means that usually you need to somehow implement that algorithm and test it yourself on your own problem. And that's fine, but I think a lot of these algorithms out there are not yet battle-tested.
And we're kind of in a situation where, okay, maybe we like the prototype, and maybe it's promising; do we put in the developer time to build this into Stan?
And it's a bit of a cycle, because once it appears in Stan, then it really gets battle-tested.
And then we get feedback from the community, and we can try to learn things about this algorithm; we can try to improve it.
That's actually what happened with the no-U-turn sampler, which has evolved since its original inception.
You know, my bar for scientific papers is that a paper presents a good idea and is thought-stimulating.
But I don't think it tells me this is the next thing we should build into Stan.
I think BridgeStan can alleviate some of that, because it makes it easier for people to build implementations that can then be tested on Stan models, and then we kind of get into battle-testing things.
Maybe someone builds a Python package that is compatible with BridgeStan, and maybe the process becomes: instead of the Stan developers and the Stan community brutally evaluating an algorithm before deciding to put in some amount of work, maybe first this package gets used and developed by the algorithm's developers.
But this is the broader question of how algorithms get developed, implemented, and adopted.
And I'll tell you what, another big criterion here is the simplicity of the algorithm.
That plays a huge role in whether an algorithm is adopted by developers and users, or not.
So the answer is: I don't know.
Yeah, that's always a fine answer.
Any questions?
I'm going to bring one up for my neighbor.
Wait. Perfect.
We needed the mic.
So what do we do about algorithms that are good for specific situations but not good for other things?
So far we've only developed black-box algorithms that we kind of hope work everywhere.
We don't have any really problem-specific algorithms for anything.
Is there any future for that?
I mean, this is...
I think this is one advantage. I'm going to quote the person who just asked the question: one thing Bob has said a lot is that the reason we don't want to just put 30 samplers into Stan is that a lot of practitioners would try all 30 of them and then just report the best-looking one. There's an advantage to being a great filter and being very conservative about what is actually in Stan.
But I do think this is one advantage of making it easier to broaden the ecosystem. Now I think a future for that kind of algorithm is in an R package or a Python package that can interface with Stan. There are now existing examples out there of an implementation of an algorithm that has support for Stan models and PyMC models.
So it can kind of bridge gaps between communities. Also, if you have to install a separate package, that makes it fairly clear that this is for a separate purpose.
And so I think that's what I would say the future is for those.
Yeah, I agree.
Do you have an intuition for how easy it would be for the Stan compiler to figure out whether a model is generative, and then to be able to sample from it?
I mean, of course we can do it in generated quantities, but it's always awkward to double-code our models.
This is a question that exposes a bit of my non-traditional statistics background: I have never been presented with a definition of a generative or graphical model that is precise enough for me to actually answer this question.
I think that there are definitely easy cases and hard cases.
I suspect that in general it would be impossible, but it's probably likely that we could have a system where it tries really hard, and then if it doesn't succeed in a minute it gives up, or something like that.
There are all these sorts of tricks in the compiler world.
But this is another one of these things, kind of like GPU support: because you can write basically anything you want, you can also write the worst possible case for this kind of automated analysis.
An open question I've had for a long time is: what percentage of Stan models in the wild are generative?
If that number is naturally 80 or 90%, I think this is a very fruitful thing.
But if it's 60% or less, I'm not sure.
That's what I've heard, that it is fairly high. I think it would be something worth looking into, but I would need some handholding on the statistical modeling side of that, actually.
Sorry, I shouldn't call on people.
Hi, so I have a question more about people trying to implement models in Stan.
Say there's a model and it's taking a very long time.
And people think, well, Stan... they might have some complaints, or say it's too slow.
But what I've found in practice is that I'm sometimes never quite clear on which parts of my model are causing the delay.
So what are the slow bits?
It can either be that mathematically this is just harder to estimate, or there's some shape of my posterior that's really harder to navigate.
But I don't really get that feedback unless I'm fixing certain parameters and toying with other things.
Is there any way to give that feedback about what's causing the issues?
Have you ever thought about modeling that?
Sorry.
So I remember, maybe a year ago, I met Andrew Gelman and Mitzi Morris in Paris at a cafe.
We all just happened to be in Paris.
And we started brainstorming.
We had an idea for a research project, which is: how much can you learn about your model and your sampler by running 20 iterations of HMC?
The idea is fail fast, learn fast: the early iterations of a Bayesian workflow should be based on that.
And I think that a lot of the statistics literature, the more formal literature, kind of imagines that you've done a really good job fitting your model, you've thrown a lot of computation at it, you've waited a long time.
And we want to figure out what lessons you can learn quickly, right?
So now, I can talk a little bit from experience and give you that, but we kind of want to make that part of the workflow too, in your early iterations, where we can learn with fast approximations.
And then hopefully we'll have a good answer to your question.
There's also a tool for instrumentation.
Yeah, I was going to say, in the immediate sense, there is the ability to profile Stan models.
You can write a block that starts with the word profile and then a name, and then you can turn that on when you're running it, and it will give you a printout of: the block named X took this percentage of the time, the block named Y took that percentage. It can help you identify at least, here's the bad line.
Now, it might not help you figure out what you need to do instead.
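A minimal sketch of what that looks like in Stan code (the section names here are arbitrary, and `mu`, `sigma`, and `y` are assumed to be declared elsewhere in the model):

```stan
model {
  profile("priors") {
    mu ~ normal(0, 1);
    sigma ~ normal(0, 1);
  }
  profile("likelihood") {
    // Usually the expensive part; the profile report shows its share of time.
    y ~ normal(mu, sigma);
  }
}
```

The interfaces can then emit per-section timings, for example via CmdStan's `profile_file` output option.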
547
But that's where I found that there are some real wizards who live on the Stand Forum,
some of whom are in the room and some of whom are completely anonymous and will never meet
548
them.
549
But they're super helpful.
550
if it's a model that you can share, that you can share a snippet of, there is a lot of
human capital.
551
yeah, automating that and putting that into documentation is an ongoing thing.
Yeah, I mean, plus one to the human capital.
And the contributions of everyone here who comes to this conference, who teaches tutorials, who demonstrates their models, who shares documentation, who makes their code open source.
I think that's also one of the things that makes a programming language work.
Time for one last question.
So I was thinking, if you go back some decades, 50 or 60 years, if you developed a model, then you had to develop a way to sample from the posterior, and stuff like that.
But fast forward to today, and maybe my advisor could be thinking, when I was a boy, I had to write my own sampler.
Now you can have people designing models, or new ways to model observed data, who maybe don't have to think too much about the computational side.
So what do you think about the effect of Stan and similar languages on opening up this research in Bayesian modeling to people who maybe are not numerical analysts, or stuff like that?
I think you should bring your advisor to StanCon.
Yeah, so...
One way to think about this question is to think about how old Hamiltonian Monte Carlo is.
The original paper is from 1987.
And yet it was largely unused by the broader scientific community until Stan came out.
So what were the technological developments that enabled Stan to make Hamiltonian Monte Carlo the workhorse of so many scientists?
I think that's something worth thinking about.
Though I should say the one exception, the one person who did use HMC through the 90s and 2000s, is Radford Neal, right, who did manage.
But otherwise, the tuning parameters, the control parameters, the requirement to calculate gradients: that was an obstacle to many people.
And so instead of using HMC, they were using other samplers, which we know perform between less well and dramatically less well in many cases.
So I think it's great that we have these black-box methods.
But the one nuance I will add is that the algorithm is not the only thing that's black-boxed in Stan.
The diagnostics, the warning messages, the fact that these things are generated automatically.
That's what makes a black-box algorithm reliable.
It was the derivatives too.
There wasn't a good autodiff system when we built Stan.
I mentioned gradients, no?
I'll caveat this a bit: the previous question hints at the fact that these things are never truly black box.
Because when you're facing performance difficulties, when you're at the edge, you do need to have a fairly sophisticated understanding of what's happening.
If you've ever used the reduce_sum function in Stan, that is technically an implementation detail that you are having to exploit to get the speed you need.
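For context, `reduce_sum` is Stan's within-chain parallelization tool: you supply a function that computes a partial log-likelihood over a slice of the data, and Stan parallelizes the slices. A minimal sketch, assuming data `array[N] real y` and parameters `mu` and `sigma` are declared in the usual blocks:

```stan
functions {
  // Partial log likelihood over y[start:end]; Stan calls this per slice.
  real partial_sum(array[] real y_slice, int start, int end,
                   real mu, real sigma) {
    return normal_lpdf(y_slice | mu, sigma);
  }
}
model {
  // grainsize = 1 lets the scheduler choose slice sizes automatically.
  target += reduce_sum(partial_sum, y, 1, mu, sigma);
}
```

The choice of which argument gets sliced and what grainsize to use is exactly the kind of implementation detail Brian means: it changes speed, not the posterior.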
585
And so there's always a fuzzy boundary here, but I think that it does help lower the
barrier to entry, even if the hypothetical ceiling can stay as high as your imagination.
586
That's true.
587
We could be more black box.
588
That's seriously, huh?
589
I think that people do tweak and manipulate the methods a lot, and they need to understand
some fundamental concepts.
Awesome.
Well, I think we're good.
Thank you so much, folks, for being part of the first live show.
This has been another episode of Learning Bayesian Statistics.
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind.
That's learnbayesstats.com.
Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
Check out his awesome work at bababrinkman.com.
I'm your host, Alex Andorra.
You can follow me on Twitter at alex_andorra, like the country.
You can support the show and unlock exclusive benefits by visiting patreon.com/LearnBayesStats.
Thank you so much for listening and for your support.
You're truly a good Bayesian.
Change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations.
Let me show you how to be a good Bayesian.
Change calculations after taking fresh data in.
Those predictions that your brain is making?
Let's get them on a solid foundation.