How does it feel to switch careers and start a postdoc at age 47? How was it to be one of the people who created the probabilistic programming language Stan? What should the Bayesian community focus on in the coming years?
These are just a few of the questions I had for my illustrious guest in this episode — Bob Carpenter. Bob is, of course, a Stan developer, and comes from a math background, with an emphasis on logic and computer science theory. He then did his PhD in cognitive and computer sciences, at the University of Edinburgh.
He moved from a professor position at Carnegie Mellon to industry research at Bell Labs, to working with Andrew Gelman and Matt Hoffman at Columbia University. Since 2020, he’s been working at Flatiron Institute, a non-profit focused on algorithms and software for science.
In his free time, Bob loves to cook, see live music, and play role playing games — think Monster of the Week, Blades in the Dark, and Fate.
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Thomas Wiecki, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Elea McDonnell Feit, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Joshua Duncan, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, David Haas, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin and Raphaël R.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉
Links from the show:
- Bob’s website: https://bob-carpenter.github.io
- Bob on GitHub: https://github.com/bob-carpenter
- Bob on Google Scholar: https://scholar.google.com.au/citations?user=kPtKWAwAAAAJ&hl=en
- Stat modeling blog: https://statmodeling.stat.columbia.edu
- Stan home page: https://mc-stan.org/
- BridgeStan home page: https://github.com/roualdes/bridgestan
- bayes-infer home page: https://github.com/bob-carpenter/bayes-infer
- Crowdsourcing with item difficulty: https://github.com/bob-carpenter/rater-difficulty-paper
- Pathfinder VI system: https://www.jmlr.org/papers/v23/21-0889.html
- Flatiron Institute home page: https://www.simonsfoundation.org/flatiron/
- 0 to 100K in 10 years – Nurturing an open-source software community: https://www.youtube.com/watch?v=P9gDFHl-Hss&t=81s
- Information Theory, Inference and Learning Algorithms: https://www.amazon.com/Information-Theory-Inference-Learning-Algorithms/dp/0521642981
- LBS #20 – Regression and Other Stories, with Andrew Gelman, Jennifer Hill & Aki Vehtari: https://learnbayesstats.com/episode/20-regression-and-other-stories-with-andrew-gelman-jennifer-hill-aki-vehtari/
- LBS #27 – Modeling the US Presidential Elections, with Andrew Gelman & Merlin Heidemanns: https://learnbayesstats.com/episode/27-modeling-the-us-presidential-elections-with-andrew-gelman-merlin-heidemanns/
- LBS #17 – Reparametrize Your Models Automatically, with Maria Gorinova: https://learnbayesstats.com/episode/17-reparametrize-your-models-automatically-with-maria-gorinova/
- LBS #36 – Bayesian Non-Parametrics & Developing Turing.jl, with Martin Trapp: https://learnbayesstats.com/episode/36-bayesian-non-parametrics-developing-turing-julia-martin-trapp/
- LBS #19 – Turing, Julia and Bayes in Economics, with Cameron Pfiffer: https://learnbayesstats.com/episode/19-turing-julia-and-bayes-in-economics-with-cameron-pfiffer/
- LBS #74 – Optimizing NUTS and Developing the ZeroSumNormal Distribution, with Adrian Seyboldt: https://learnbayesstats.com/episode/74-optimizing-nuts-developing-zerosumnormal-distribution-adrian-seyboldt/
- Bayesian Workflow paper: https://arxiv.org/abs/2011.01808
- BAyesian Model-Building Interface (Bambi) in Python: https://bambinos.github.io/bambi/
- On Being Certain: Believing You Are Right Even When You’re Not: https://www.amazon.com/Being-Certain-Believing-Right-Youre/dp/031254152X
In this episode, you meet the man behind the code: Bob Carpenter, one of the core developers of Stan, a popular statistical programming language.
After working in computational linguistics for some time, Bob became a postdoc with Andrew Gelman to really learn statistics and modeling.
There, he and a small team developed the first implementation of Stan. We talk about the challenges associated with the growing team and open-source conventions.
Besides the initial intention behind Stan and its beginnings, we talk about the future of probabilistic programming.
Creating a tool for people with different degrees of mathematical and programming knowledge is a big challenge, and working with these tools may also be more difficult for the user.
We discuss why Bayesian statistical programming is popular nonetheless and what makes it uniquely adequate for research.
Unknown Speaker 0:14
Bob Carpenter, welcome to Learning Bayesian Statistics. Thank you. Yeah, thanks so much for taking the time. I'm really super happy to have you on the show. I've been meaning to invite you for a long time, so I'm super glad that you could find some time out of your busy schedule to come and talk to the listeners, who, I can tell you, are very excited to hear from you. So, okay, we start then. As usual, I always like starting with the background of my guests because there are so many diverse and inspiring backgrounds, and I find that fascinating. So let's do that with you as usual. Let me ask you: how did you come to the stats and data world? Was it more of a sinuous or a straight path?
Unknown Speaker 2:28
Well, I don't know. I started doing a lot of probability when I was a kid because I played a lot of games, especially role playing games, and my dad was a bookie who used to have me calculate the odds for bets. So I started doing probability very early on, but when I went to undergrad, I really wanted to do logic, AI and natural language semantics, and did for many, many years. Only when natural language processing started to make a turn into statistics did I take this up seriously, sort of in the mid 90s. We had just hired Chris Manning at Carnegie Mellon. He was teaching an intro NLP class with stats, and that's what got me hooked.
Unknown Speaker 3:12
I see. Okay, so your statistical practice actually started very early.
Unknown Speaker 3:20
Yeah, although, you know, the way statistics was done in natural language processing in the 1990s was not particularly sophisticated. So I felt like I really needed to leave my job working in industry in natural language processing and go hang out with Andrew Gelman for a few years to actually learn statistics. I found it very hard to understand just reading textbooks. I could learn the basics; I could learn how to code BUGS. It was actually understanding the subtleties; there's a lot there.
Unknown Speaker 3:50
Yeah, I definitely relate to that. Reading the textbook is a good first step, but when you start practicing, it just makes a huge difference in the practical way that you can use it and really understand it. To me, it's really when I started making classic errors and mistakes that I started to understand what all this was about. I'm actually already very curious about a lot of things, especially the NLP that you mentioned, but let me not get off course right now; I'll come back to that a bit later. So, to continue painting your picture for the listeners about who you are, where you come from, and what you're doing: can you tell us how you would define the work that you're doing nowadays, and what are the topics that you are particularly interested in?
Unknown Speaker 4:54
Well, I'd say pretty much my whole career I've been doing the same sort of thing, which is building programming languages and APIs for people to use to do whatever they're doing, because I love working at that meta level of a language. You know, it's what led me to develop Stan in the first place, the language side of it. Nowadays, though, now that I've moved (three years ago now) to the Flatiron Institute from Columbia University, that's cleared up a lot of my time, which used to go into fundraising and managing a group of postdocs, so I can do a lot more work on my own now. I'd say nowadays I'm working much more on algorithms. We're working on variational inference algorithms; we're working on the geometry of transforms for constrained variables. So I'm doing much higher-level stuff than just low-level Stan coding. My work looks less engineering oriented and more research science oriented, which I would say is the biggest difference. Having said that, though, I still spend half my time developing code. We're really excited about a bunch of packages Brian Ward and I are rolling out with Edward Roualdes that are sort of exposing Stan through different languages like Python and Julia, and then we're building algorithms on top of that. So I'd say that's what's keeping me busy day to day now.
Unknown Speaker 6:14
That sounds very cool. Do you want to talk a bit about that?
Unknown Speaker 6:20
Sure. I mean, you know, it's been a struggle with the interfaces in Stan, because we have a language that's independent of any existing programming language, that we built from scratch, which means people have to wrap it in the languages that they're using. The original versions of RStan and PyStan that we designed were communicating in memory with the Stan C++ code on the back end. The way we did that, though, was through high-level interfaces like Rcpp and Cython in R and Python, which imposed a lot of C++ binary compatibility. Recently, when Edward Roualdes was visiting Flatiron last summer (he's one of the Stan developers, a professor at Cal State Chico), he wanted to develop algorithms in Julia, and we didn't have any way to expose Stan models in Julia. So he started working on low-level in-memory access, just a very low-level C interface, a foreign function interface to Julia. And while he was at it, we were like, can you do the same thing in Python and R? And he figured out the low-level interfaces to both of those: there's something called the .C interface to R and the ctypes interface to Python. The nice part about those is that they only require memory compatibility. They don't require the whole compiler chain, which was a huge pain for RStan and PyStan installation. CRAN as well, on the R side, is a very difficult thing to deal with, because they insist on packages being very small, and they don't have a dependency management system for packages. So we're about two years out of date on RStan because of CRAN, and I don't know that we'll ever get another CRAN version of RStan up again, because they changed policies. And RStan was the only way to really get at a Stan model. A Stan model is a C++ encapsulation: data basically goes into the constructor, and then it defines a log density function, which is automatically differentiable.
We wanted to be able to access the derivatives (the gradients, the Hessians) and the variable transforms, because Stan automatically transforms variables from constrained spaces to unconstrained spaces and back. So BridgeStan basically is just an interface that exposes the Stan C++ model to those languages in an efficient way that doesn't involve any copying in memory. And now we're building an inference system, starting in Python, to just implement all the standard algorithms, because it turns out there are no good reference implementations of any of the MCMC stuff in Python. It's all tied up in Stan or in PyMC or something; it's all tied up with a particular probabilistic programming language. So there are a lot of good implementations of it, but none that make it particularly easy to just wrap your own gradient and log density function and go. So we're combining these two things: BridgeStan, which will expose those gradients, and then just building reference sampling and posterior analysis tools, which will hopefully be useful both pedagogically and for us to be doing research and for building things, because you don't really want to be developing algorithms in C++. It's not a good experimental prototyping language, for programmers like me.
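The "wrap your own log density and gradient and go" idea Bob describes can be sketched in a few lines. To be clear, this is not the BridgeStan or bayes-infer API; it is a hypothetical illustration of a gradient-based reference sampler (MALA here, much simpler than Stan's NUTS) whose only contract with the user is two callables:

```python
import math
import random

def mala_step(theta, logp, grad, eps=0.1, rng=random):
    """One Metropolis-adjusted Langevin step on a 1-D log density.

    `logp` and `grad` are any user-supplied callables; the sampler needs
    nothing else -- no probabilistic programming language in sight.
    """
    # Langevin proposal: a gradient step plus Gaussian noise
    prop = theta + 0.5 * eps**2 * grad(theta) + eps * rng.gauss(0.0, 1.0)

    def log_q(to, frm):
        # log of the asymmetric proposal density, up to a constant
        mean = frm + 0.5 * eps**2 * grad(frm)
        return -((to - mean) ** 2) / (2 * eps**2)

    log_accept = logp(prop) - logp(theta) + log_q(theta, prop) - log_q(prop, theta)
    return prop if math.log(rng.random()) < log_accept else theta

# a standard-normal target, supplied as plain callables
logp = lambda x: -0.5 * x * x
grad = lambda x: -x

random.seed(0)
theta, draws = 0.0, []
for _ in range(5000):
    theta = mala_step(theta, logp, grad, eps=0.9)
    draws.append(theta)

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The sample mean and variance should land near 0 and 1 respectively, since the target is standard normal.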
Unknown Speaker 9:43
Yeah, like me too. And have you looked into BlackJAX? That's in Python.
Unknown Speaker 9:52
That's one of the things where we tried to get our algorithms working in BlackJAX. We tried to get just our log density and gradient function in, but it makes a lot of assumptions that you're going to actually be using JAX underlyingly, so we couldn't get it to work without it trying to autodiff our code with JAX. Maybe there's some way to do that, but we couldn't figure out how. On the other hand, we've gotten the JAX models working with our system; that we can do.
Unknown Speaker:
Thank you. Okay, already super cool stuff. Let me backpedal a bit and ask you another traditional question, which is: do you remember how you first got introduced to Bayesian methods, and how frequently do you use them nowadays?
Unknown Speaker:
Yeah, I would say it was kind of a subtle introduction, because I started in natural language processing and speech recognition, and, you know, like in a lot of places, people are implicitly Bayesian without actually understanding that they're taking a Bayesian interpretation of things. So I think the NLP stuff kind of has a Bayesian perspective in it: it tends not to do Bayesian inference, but it tends to fit Bayesian models, which I think is true of a lot of machine learning, where they talk about the models in a Bayesian way. You talk about speech recognition as a noisy channel decoding model, where you build an acoustic model of what people say, and then you build a language model of what they're going to say, and then you just do the standard Bayesian inversion to actually decode what they're saying. So I got exposed to that theory early. But getting exposed to really taking uncertainty seriously, to building hierarchical models, to pushing your uncertainty through inference to do posterior predictive inference, that all came much later, after I started learning BUGS and JAGS, maybe around 2005. I was working at a small NLP company, dealing with a lot of crowdsourcing problems. Every time we got a new customer, they would be like, hey, can you guys recognize rap artists and songs in text? And we'd be like, sure, we just need a training corpus. Then me and my partner Breck Baldwin would go back and start annotating a lot of data and try to do our best with that. But we wound up with lots of data annotation problems; we worked with epidemiologists who were annotating chief complaints in emergency rooms. We were constantly faced with these crowdsourcing problems, and it seemed like a natural place to apply Bayesian methods.
So, I had known Andrew Gelman personally for many years before I started working with him, so I asked him if I could start hanging out in his research group. So I'm working at my company, I'm hanging out with Andrew and Jennifer Hill and their multiple imputation group, and I'm, like, stealing some of their time on the side so they could help me build one of these models. We wound up building a crowdsourcing model, which turned out to be absolutely isomorphic to Dawid and Skene's model. Phil Dawid built a model in 1979 which is a really cool crowdsourcing model. People are using it all over the machine learning world now, and many people like me rediscovered it; really, it was like Andrew and Jennifer rediscovered it by me telling them what my problem was. So I think, like a lot of things, having that one particular instance mattered. It was kind of a complicated discrete model problem, but having that one problem that I actually cared about, where I really cared about the answer, where it wasn't a textbook problem and I wanted to put it into practice, made me come to grips with hierarchical priors, with all the computation, with all the inference that you were doing, and it just changed my entire perspective on how to do calculations. After about a year of working with Andrew and Jennifer on these problems and applying them to crowdsourcing, I wrote dozens of blog posts; I wasn't really reading papers at the time. I finally decided that both I and my partner at my company were kind of bored. We were a two-person natural language company, so we could build classifiers, we could build spell checkers, but we couldn't build anything large scale. And we were just getting bored building another, you know, Twitter classifier. Sure, maybe it's in Korean this time instead of in French, but it's the same technology. So I'm like, okay, I'm gonna quit, go start another postdoc.
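As a rough illustration of the Dawid and Skene (1979) setup Bob mentions, where each annotator has a confusion matrix over the true categories, here is a hypothetical generative simulation. All the numbers are invented; a real treatment would infer the confusion matrices and true categories from the labels alone:

```python
import random

# Generative sketch of the Dawid & Skene crowdsourcing model: each item
# has a latent true category, and each annotator has a confusion matrix
# annotators[a][k][j] = p(label j | true category k).
random.seed(1)
prevalence = [0.7, 0.3]             # p(true category) for 2 categories
annotators = [
    [[0.9, 0.1], [0.2, 0.8]],       # a fairly accurate rater
    [[0.6, 0.4], [0.4, 0.6]],       # a noisy rater
]

def sample_category(probs, rng):
    # draw an index from a discrete distribution
    u, total = rng.random(), 0.0
    for k, p in enumerate(probs):
        total += p
        if u < total:
            return k
    return len(probs) - 1

items = [sample_category(prevalence, random) for _ in range(1000)]
labels = [[sample_category(conf[z], random) for conf in annotators]
          for z in items]

# empirical agreement of each rater with the latent truth
acc = [sum(lab[a] == z for z, lab in zip(items, labels)) / len(items)
       for a in range(2)]
```

With these made-up confusion matrices, the first rater should agree with the truth roughly 87% of the time and the second roughly 60%, which is the structure the model exploits when it pools noisy annotators.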
I basically said, Andrew, can you hire me, for whatever you can pay me? And he's like, well, I've got a couple of postdoc openings, maybe I can hire you that way. So at a very late stage in my career, I went and, you know, jumped into a learning situation, which was a lot of fun. I don't think I could have ever learned this on my own while working a full-time job, even though I was spending all my spare time doing it. I just don't think I could have crossed the gap without a mentor really helping me.
Unknown Speaker:
Yeah, that's definitely hard. But I love that story, because you were 47, right, when you started a postdoc in a new field? Super inspiring.
Unknown Speaker:
My peers were very jealous; my peers who were deans and department heads, they were like, you're gonna actually get to go to work and learn stuff again? All I do is management. So it's nice. I recommend it.
Unknown Speaker:
I'm curious, how did it feel when you did that?
Unknown Speaker:
I mean, it felt just like doing research. Research always feels the same once you've gotten to a grad student or postdoc level; everyone kind of works together. It wasn't like everybody treated me like a kid again going back. It was really a great learning environment, because for a couple of years I didn't have any responsibility. When we were building the first version of Stan, I had no responsibilities for teaching, for supervising, for fundraising. I had a couple of years of funding and didn't have to worry about anything, which is a great way to go and learn things. So it felt like going back to grad school, really, more than doing a postdoc, except I didn't have classes, though I'd sit in on Andrew's classes.
Unknown Speaker:
Which sounds like fun. I mean, Andrew has been on the podcast a couple of times, and I was definitely absorbed by the conversation. I'd really love to attend one of his lectures for a semester.
Unknown Speaker:
His classes are great master classes, you know, if you know the material and want to discuss it, because Andrew is kind of bored of teaching the intro material. So if you go to one of his classes, it's almost always a discussion of the high-level stuff. It kind of reminds me of David MacKay's book on information theory, which is a great intro to neural nets, Bayesian inference, machine learning, but it's very particular: it's these couple-page chapters that really give you insight into something rather than just regurgitating the formulas that everybody knows. I think Andrew's classes are very much like that. You always learn something going to one of Andrew's classes; you learn some subtlety that's not in the textbook.
Unknown Speaker:
Yeah, for sure. And if people don't know Andrew's style, I would recommend listening back to the episodes he was here for. If I remember correctly, it was episode 20, with Aki Vehtari and Jennifer Hill, for their book Regression and Other Stories, and I think episode 27 (I'd have to check), where he was with Merlin Heidemanns, talking about the model they did for the 2020 US election. I put those in the show notes for people who want to listen to them again. And we should also put in the book you just told us about. So it's by David MacKay, and it's called Information Theory, right?
Unknown Speaker:
Yeah, it's been a long time. It's got five nouns in the title or something, but Information Theory is part of it.
Unknown Speaker:
Yeah, I think it's Information Theory, Inference and Learning Algorithms.
Unknown Speaker:
There you go. Yeah. It's characterized by describing logistic regression as classification with one neuron, because it takes this neural network approach to everything. So it's quirky, but it's very insightful. There's a really nice discussion of Hamiltonian Monte Carlo in there; basically, we coded HMC based on MacKay's discussion.
Unknown Speaker:
It's in the show notes, people, so you can check it out. Awesome. Well, you already touched a bit on it, but you're one of the cofounders of Stan, so I'm guessing that's when you started all that postdoc stuff and so on. Can you tell us a bit about the origins of Stan, and how and why you got involved? If people want more details, I put in the show notes the talk that you did about ten years of Stan and how to nurture an open-source software community; it's a YouTube video, and it's in the show notes already. But yeah, maybe tell us something here about the beginnings of Stan, and maybe something that's not in your talk on YouTube?
Unknown Speaker:
Yeah, I can't remember exactly what was in the talk, so I apologize if I repeat myself. But the project really started because Andrew wanted to fit better hierarchical models. His and Jennifer's book (before they rewrote it with Aki) had a bunch of models in it, and he used lme4 a lot, the super popular max marginal likelihood package in R. It turns out the modeling language of lme4 was not quite rich enough to express all the models in Gelman and Hill's book, now the new one with Aki. And he hired me and Matt because, basically, Andrew got money that sort of fell on him out of the sky: some people who had grants were leaving Columbia and gave the grants to Andrew. So he just took a flyer on hiring a couple of computer scientists. He hired me and Matt Hoffman, who had just finished his PhD with Dave Blei. We spent the first month trying to get our terminology together so we could all talk to each other, which was challenging, because Matt and I were both coming out of machine learning, and Andrew is very, very particular about language. So it was a big learning curve in the beginning just so we could talk to each other. But then Matt and I were trying to design a language that was like an extension of lme4, that would cover all of Andrew's models, and eventually we just couldn't figure out how to do it. And then finally I said, why don't we just build something like BUGS? At that point, we'd gotten the advice to do Hamiltonian Monte Carlo, to do automatic differentiation, and I could see how to do that, because my own background, I should say, is in programming languages and natural language processing.
So I went to Edinburgh for a PhD, which has a great programming languages group, and I spent a lot of time hanging out with those people and doing programming language design in my early academic career. I wrote a book on programming language theory back when I was doing logic programming, though it doesn't look anything at all like Stan. So I had a strong background in designing languages and compilers and things, and I thought, well, really all we need to do for HMC is have a system like BUGS that will compute the log density function in a way we can auto-diff. So I just sat down with BUGS and asked: what would the statements in BUGS look like if, instead of doing sampling and defining a graphical model, which is what BUGS or JAGS does, we took them from a more procedural programming language and thought of them as just defining a log density function? What would the contribution of each little piece be? Once I made sense of that, it was pretty easy to sit down and write a prototype for the Stan language that was very much like BUGS. But my own background is in type theory; I spent a lot of time, both in natural language semantics and in programming languages, working on type systems. So I wanted to build something that didn't look like R, Python, JAGS, or BUGS, and instead was strongly statically typed like C or Java, or a language like OCaml or Haskell. That's what I thought was going to be the main bottleneck for people: I thought people were really not going to like the typing. Certainly people have a bit of difficulty with arrays versus vectors and things in Stan (we use vector types for all of our linear algebra and arrays for everything else, which still confuses people), but overall, that was not the bottleneck I thought it would be.
But at the same time, we were pulling the lid off of JAGS. JAGS has a very clean code base, but it's very inefficient code. It's amazing for somebody who's not a computer scientist and wrote it by himself; the fact that Martyn Plummer got this done, and it's as cool as it is, is just amazing. But it's written with a bunch of R infrastructure: basically, it's using R code, R classes, R definitions, and R itself is super inefficient the way it's coded. So as a result, JAGS inherits a lot of the clunkiness from R. So we were thinking we would just rebuild that. We're like, okay, build a better Gibbs sampler; we'll just do the software engineering. And, you know, Matt started writing vectorized versions of things in JAGS. I still hadn't learned C++ at this point, so Matt was doing the actual coding, and I figured, if we're going to do something like this, I'd better start learning C and C++. I mean, I had known C, but I hadn't known C++, so that was a huge effort as well. But as Matt was optimizing the JAGS code, we realized that we didn't really just want to make a faster Gibbs sampler. Gibbs sampling, no matter how fast it went, was still not going to go fast enough in high dimensions; it still scales very poorly. All the state-of-the-art methods are gradient based, so we knew we needed to jump up from Gibbs sampling. So after a few months of trying to develop a new language that looked like lme4 and trying to make JAGS go faster, we were just like, okay, we need to start from scratch with our own language. This was before we had constrained parameters or anything like that; there were just three main blocks: data, parameters, and model. But it was actually quite fast building the prototype. The hardest part was just learning C++.
So Daniel Lee and I were the ones who built most of the first version of the code. Matt built the sampler, and he helped with the memory design for everything, because Matt is a crack C++ programmer, but he doesn't want to spend his time doing that. So he was like the C++ advisor for us on the project, but Daniel Lee and I did most of the coding. That's how it started. It was like: can you make hierarchical models go faster, in something like lme4? And we were like, yeah, not really, but we can build a faster version of something like BUGS or JAGS that will actually solve some of these problems for you. Although, ironically, it turns out hierarchical models are still some of the hardest models to fit. If you just take Radford Neal's funnel example, which is a hierarchical prior (a normal prior with no data), that's still something you can't fit with HMC. There's no fixed step size that will deal with the bad conditioning of that distribution; NUTS doesn't help. So it's still something that you can't solve. One of the things I've been doing recently is trying to figure out how to build samplers that will sample those problems. I think what we did with Stan was we just pushed this back on the user: we're like, no, it's not going to sample the model the way it's written, you have to reparameterize the model. It's your job as a user to give me a parameterization where the posterior looks close to standard normal; then we're good to go.
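The reparameterization Bob describes can be checked numerically. Below is a minimal sketch of Neal's funnel (assuming the common form where the local scale is exp(tau/2)): the centered and non-centered parameterizations describe the same distribution, differing exactly by the log-Jacobian of the change of variables, which is what makes the trick safe to use.

```python
import math

def normal_lpdf(x, mu, sigma):
    # log density of a normal(mu, sigma) distribution
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

def funnel_centered(tau, theta):
    # Neal's funnel: tau ~ normal(0, 3), theta ~ normal(0, exp(tau / 2))
    return normal_lpdf(tau, 0.0, 3.0) + normal_lpdf(theta, 0.0, math.exp(tau / 2))

def funnel_noncentered(tau, theta_raw):
    # Non-centered: theta_raw ~ normal(0, 1), with theta = exp(tau / 2) * theta_raw,
    # so the sampler sees two well-conditioned standard-ish normals.
    return normal_lpdf(tau, 0.0, 3.0) + normal_lpdf(theta_raw, 0.0, 1.0)

# The two log densities differ exactly by the log-Jacobian of
# theta = exp(tau / 2) * theta_raw, which is log(exp(tau / 2)).
tau, theta = 1.7, 0.4
sigma = math.exp(tau / 2)
diff = funnel_centered(tau, theta) - (funnel_noncentered(tau, theta / sigma)
                                      - math.log(sigma))
```

The difference should be zero to floating-point precision: same posterior, but the non-centered geometry is the one HMC can actually sample.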
Unknown Speaker:
Yeah, and that definitely is still a bottleneck for sure, especially for beginners. I mean, it's the same in PyMC, as you're saying. Ideally, we would like the computer and the algorithms to do that by themselves, but they still can't, so we have to push that onto the user's side, which makes things even trickier to adapt and adopt for beginners.
Unknown Speaker:
Yeah, and it gets into issues; it's even worse in multivariate problems. For Neal's funnel, it's relatively easy to build the non-centered parameterization, but when you have something multivariate, it gets much harder. All of a sudden there are Cholesky factors and everything else floating around; the code gets a lot clunkier. This was one of the things that led Maria Gorinova to build SlicStan for her master's thesis. The worst part of Stan, in my opinion, is that there's no way to encapsulate modules; there's no way to write, like, a hierarchical prior in Stan. This was largely the focus of my ProbProg talk this year, which was based on what Maria was doing, kind of surveying a bunch of the systems. In the end, I decided I really like Turing.jl as a language, for its expressiveness; it's like BUGS, only better.
Unknown Speaker:
I see, yeah. Oh, for listeners, by the way, Maria Gorinova was on the podcast in episode 17, where she talked exactly about the project that Bob was mentioning, so definitely check that out if you want a more detailed view of how you can reparametrize your models automatically, and the whole project around that. That was a super cool episode; I really loved it.
Unknown Speaker:
You know, she also did a cool internship with Matt, where she worked on discrete parameters. I don't know if she was talking about that then, but she's done a bunch of cool work. I haven't actually contacted her to see what happened after the Twitter blowout — she had moved to Twitter, so maybe she's back on the job market.
Yeah, that's true. And actually, I also have a few episodes about Turing — I'll put them in the show notes — episodes 19 and 36, about developing Turing.jl and the Julia ecosystem, with Cameron Pfiffer and Martin Trapp, who are both Turing developers. So definitely check those out.
I'm very jealous of how clean Zygote is for their autodiff. We just did complex derivatives and fast Fourier transforms and stuff, and I was almost crying looking at the three lines of code it took to write that in Zygote versus the hundred lines of code it took to write it in Stan.
Yeah. Well, I mean, you also started way, way earlier, and Stan was one of the motherships of the probabilistic programming languages — all the PPLs out there owe something big to Stan. So I think I'm speaking in the name of everybody when I thank you, Bob, and all the team, for your hard work on getting the first version of Stan out to the world. That was definitely trailblazing work. So yeah, thank you for that.
Oh, you're welcome. It's certainly been fun. And it's not just me, obviously — it's a big team of people, as you'll see if you look.
Yeah, for sure.
At the first version release, I think we had six or seven developers by release day. Ben Goodrich was really critical for getting all of our multivariate stuff in. I still remember the first implementation I did. I was so ignorant that I literally wrote our first multivariate normal implementation as just, you know, (y minus mu) transpose times inverse Sigma times (y minus mu). And I'm like, what's wrong with that? People are like, no — Bob, did you not take inverses in linear algebra? Where did you go to school, you idiot? I had to learn a lot about computational linear algebra and other stuff like that as I went along. But that's something I would never have been able to do by myself. We had Michael Betancourt, Ben Goodrich, and Andrew, all of whom were bringing heavy-duty stats backgrounds into this.
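The implementation story translates directly to code. Here is a NumPy sketch of both versions — the naive one Bob describes, and a Cholesky-based one of the kind computational linear algebra recommends. They agree numerically on well-conditioned inputs, but the second avoids the explicit inverse and is more stable and efficient:

```python
import numpy as np

def mvn_logpdf_naive(y, mu, Sigma):
    """The 'what's wrong with that?' version: explicit matrix inverse."""
    diff = y - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + quad)

def mvn_logpdf_cholesky(y, mu, Sigma):
    """Solve against the Cholesky factor instead of inverting Sigma."""
    L = np.linalg.cholesky(Sigma)
    alpha = np.linalg.solve(L, y - mu)         # L @ alpha = y - mu
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log|Sigma| from the factor
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + alpha @ alpha)
```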
Yeah, I mean, that's the beauty and magic of these open-source projects, right? There's just so much work from people behind the scenes. But I also really love that open-garage spirit, you know? Nobody is pretending to know more than they do, and you can openly make mistakes, and it's going to be fine — people are not going to scold you for that. On the contrary, you're going to learn a lot. It's definitely one of the safest spaces I've encountered to work and learn in.
Yeah, we've tried to be very, very open. Now, having said that, I did kind of get kicked off our own mailing list for being too rude to people. I'm back on the mailing list again, but it's a very delicate operation, talking to users. You try to be nice, but it's very challenging — not just to be nice, but to have the users believe you're being nice.
Oh yeah, definitely, I understand that. Especially since the open-source way of giving feedback — being open and transparent — can be very different from what people are used to at a lot of private companies. So if people come in with that mindset and clash with the open-source mindset, it can be a really different world for them, and it can seem very harsh, even though it's not at all. That's definitely something I also have to keep in mind each time I talk to users.
Yeah, because people are very invested in their ideas. They're often very stressed out because they've spent days trying to get your system installed or working, and they have deadlines coming up. So there's a lot of pressure, in different ways. But overall, our users are great and understanding, and I find if you just tell them — look, the reason this is broken is because we don't have a developer for this; you're welcome to come help us fix it, otherwise we'll get to it when we can get to it — people will be understanding. Rather than getting defensive and saying, look, it's okay as it is, we've just owned up to all of our mistakes. I was very happy to see — what's the name of that package — Nimble do exactly the same thing. They broke their inference, and then they ran a big campaign publicizing that they had broken their inference, that they'd fixed it, and that people needed to upgrade. We went through that in one of our releases. We had an off-by-one error in HMC, and one of our users found it with a million iterations — in something like the second or third decimal place of a correlation in a multivariate normal. Our unit tests, which ran 10,000 iterations, didn't pick this up. This user ran a million iterations and said, you're off by about 1% on the correlation parameter, and you shouldn't be. We're like, no, we shouldn't be. And we realized we'd introduced an off-by-one error when we were moving from the slice-sampling version of HMC to the multinomial version. So I went to the blog and posted on our forums, just saying: sorry, everyone, this is our fault, we blew it — please upgrade as soon as you can.
And I find if you're open like that, people are much more understanding than if you try to sweep it under the rug. I won't mention packages by name, but I've seen one of these packages put out a release note saying, well, we've had a bug for a year that we didn't tell anybody about, and we just fixed it now. That's not a way to earn trust with your users.
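A rough way to see why the unit tests missed the bug: the Monte Carlo standard error of a posterior estimate shrinks like 1/√N, so a ~1% bias hides inside the noise at 10,000 draws but stands out at a million. A back-of-the-envelope sketch, ignoring autocorrelation:

```python
import numpy as np

def mc_standard_error(draw_sd, n_draws):
    """Standard error of a Monte Carlo estimate from n_draws independent draws."""
    return draw_sd / np.sqrt(n_draws)

# With unit-scale draws, the noise floor at 10,000 iterations is about 1% --
# the same order as the off-by-one bias -- while a million draws pin it to 0.1%.
print(mc_standard_error(1.0, 10_000))     # 0.01
print(mc_standard_error(1.0, 1_000_000))  # 0.001
```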
Yeah, for sure. That's definitely an interesting story, and it definitely relates to what I was saying about the open-source culture. So thanks for sharing that. And actually, you were saying that it was super fun to work on Stan — and I'm guessing you're still involved, of course — but I can't imagine there weren't a lot of difficulties. I'm always interested in that: can you tell us which difficulties you encountered with this project and what you learned from them?
There were a lot of difficulties with this project. One: I've learned that C++ is about as bad as it looked from the outside. It's super great that you can control memory, but the fact that there's so much undefined behavior in the compiler spec is just a disaster for consistency. So I learned a lot about C++, a lot about continuous integration and all of that stuff. But the thing I learned the most is that these projects are about people. It's not about the ideas — it's really about managing people. The biggest trouble we had was when we moved from two or three people sitting in an office, where we'd pretty much divided the project up and everybody was doing their own thing, to where we had to coordinate a bunch of stuff. We had different ideas about how to move from version one to version two of Stan, and we could not agree. We spent about a year deadlocked before we put governance in place for the project. A project feels very different at different scales. It feels one way when it's two or three people in an office — it was just Daniel, me, and Matt, and we could just talk to each other. Now we've got 40 developers spread all over the world, and it feels very, very different. Getting from there to here was really the hard part, and I think it was about managing the community — managing both the developers and the communities of users. I think where we went wrong is that Sean Talts and I, at the point where we needed governance, had both been in industry, and we thought, you know, industrial models work really well: it's very clear who has decision-making power, it's very fast, somebody always makes decisions.
They may not make the right decision, but you get stuff done. I was talking to other friends of mine who are tech managers, and they're like, look, if you've been debating these options for a year and you can't decide, just do one — obviously they're too close to call. In theory, fine — but trying to get a bunch of academics to just do something before they've figured out which option is best is challenging. It's just not how we're brought up. We all like arguing about third-order details of things — that's what drives you into academia in the first place. So at that point, we moved to a governance model where we had a technical manager and then technical leads for each of our projects. First Sean was technical manager, because I didn't want to do it, and then I became technical manager — or maybe we did it the other way around. But anyway, our developers did not like having somebody in charge. It cuts against the grain of open-source projects; I think it undercuts the ownership that people feel in what they're doing. So that didn't work well, and I was probably more stressed than I've ever been in my work life at that point, when Stan was moving from version one to version two. It really felt like the project might fall apart because we couldn't all decide where to go. Then we put me and Sean in charge, and that also didn't work well. Finally we moved to an Apache-style voting system where nobody's in charge, but if issues come up that people don't agree on — like a pull request where some people say don't merge it and others say merge it — there's a mechanism to resolve it. Just having that mechanism there was huge. And, you know, I learned a lot about how people work on big projects.
I'm still not a good manager. When I made the move from academia to industry, I was really apprehensive about giving day-to-day control to a manager, but in fact I loved it. My manager, John Irwin, when I worked at SpeechWorks, was way better than I was at figuring out what I should be doing on a day-to-day basis — and he figured out what all 30 people on his team should be doing on a day-to-day basis, which meant we went really fast. If you think you're going to get that on an open-source project, you're going to be very disappointed. Sean and I came in with the idea that, hey, we can run Stan like a real industrial project and really start flying with some decision-making — and, not so much. So I think dealing with the people is the hardest aspect of this. There are a lot of technical challenges too, obviously. Working cross-platform is huge. If I could go back and undo the decision to support Windows, that would probably save us most of the technical hassles of the project. Yeah, we'd lose a lot of users, but I'm still surprised at the number of statisticians who use Windows day to day and don't want to use Linux. They've got a Windows machine, but they will not open up the Linux subsystem. You'd think — hey, everything's easy, the Linux subsystem just works — but nope, that's a bridge too far.
Bob, I feel like you're saying out loud what a lot of open-source developers are thinking.
Mac is almost as bad. At one point their Xcode distribution changed a template parameter and broke all of our programs. It was their release, but it basically broke everybody's Stan program, and then we had to scramble to put out a quick release to patch the bug that Apple introduced. So it's a lot of time dealing with that kind of stuff.
Yeah, I agree. I was surprised — and I keep being surprised — at the time we have to spend on cross-platform problems. It's actually something that motivates the current work we're doing on the PyMC side, where we're trying to ship the Numba backend, because that way we could rely on Numba for all the backend work and wouldn't have to deal with the cross-platform problems — they'd be dealt with, in a much better way, by the Numba team, and we'd just plug into that. It's something that Adrian Seyboldt, in particular, is working a lot on. I had him on the podcast for episode 74. Adrian is an absolutely brilliant and generous guy, so if you're curious about what he's doing, definitely listen to episode 74. We talked about nutpie, which is the reimplementation of NUTS that he did in Rust, with actually really, really good results and speedups. We also talked about the zero-sum normal distribution that he came up with, which is now in PyMC. So yeah — cross-platform problems in general, for sure.
I'm afraid I haven't been able to keep up with the PyMC platform stuff. Last I heard, I thought you were going to a JAX backend.
Yeah. I mean, there have been a lot of changes. But basically, we're now on a more stable course, where we have our own backend again, which is called PyTensor. And the goal is to use Numba through PyTensor, so that PyMC's custom C backend can go away and we can use Numba for the backend — which should simplify a lot of the PyMC code base, the PyMC installation, and, as I was saying, the handling of cross-platform problems. So actually, I'd like to talk with you a bit about the hurdles. I'm curious: what do you think the biggest hurdles currently are in the Bayesian workflow? What do you see people being bothered by most of the time, mainly in the user base?
Yeah, we're putting a lot of effort into thinking about this — it's basically what Andrew is doing pretty much full time now. Andrew and Aki wrote a long paper — an 80- or 90-page sprawling paper; about ten Stan developers contributed, threw paint at the wall, and we got a 90-page paper. Now Andrew and Aki are trying to turn it into a book. And what we're realizing is that so much of this stuff is not written down — so much of what you actually need to do isn't in the textbooks. So I think the thing that's really challenging, that's been challenging all along, is still the two sides of being able to express your model: being able to write it down, being able to convert from the math you've written down to the model — or, in some cases, formulating the math you need to write down in the first place. Because — and I'm sure you get this in PyMC too — we get a lot of users who are applied practitioners; they may be physicists, biologists, or sports analysts of some kind, and they don't all have PhD-level statistics backgrounds. A lot of those people need a lot of help formulating models, but they're often good programmers. The flip side is that we get a lot of statisticians who are excellent at differential equations and stochastic differential equations — they know measure theory — but they don't know how to write a program with indexes. So getting somebody from the conception of the model to actual working code — me, coming from computer science, I thought that was the trivial part, because figuring out how to code something once I know what I want to do has always been the easy part for me. That turns out to be a huge bottleneck for most people. And it's a huge bottleneck for me when what I want to do is explore a bunch of different models.
So the problem with workflow isn't so much: if I have a model, can I code it? It's that I don't know what model I want. I've got some data, and I generally want to fit as big a model as I can, with as many interactions and predictors as my data will support — but I don't really know where that limit is until I start playing around with the model. So there's a lot of back and forth, and none of these systems provide really good tools for exploring multiple models. It's something I keep asking the Pyro developers and others: how much workflow support is there at that higher level? And we have a problem in that there isn't a good way to formulate chunks of models. Whereas in something like PyMC, you could easily formulate the code for a hierarchical prior, formulate the code for a couple of different priors as Python functions, and swap them out as part of a Python workflow. But I don't really see people doing that a lot. I see them doing the same kind of thing people did in BUGS and JAGS: they write ten full PyMC models — you write one, copy and paste it, modify it a bit, write another one. If there are examples of people organizing their code into a workflow like that, I'd really love to hear about it. I ask basically every other PPL developer I meet whether their users do that, because we want to figure out how to do it. You can't really do it easily in Stan at all, but we want to start thinking about how to encapsulate pieces of the workflow that are smaller than a whole program but bigger than one line. And then there's another problem: once you've written your model down, have you got the right parameterization? Can you debug it? Can you actually sample from it? The debugging tools for all of these things are not great compared to regular code — trying to debug Stan code is like writing print statements inside of it, and it's terrible.
It's a little better for some of the embedded systems, but I think it's still difficult to get into the middle of sampling and see what's going wrong. If gradients explode somewhere, trying to diagnose all of that — all of a sudden you're into geometry problems, sampling problems, convergence problems. You're into all these computational-stats things you don't want to be in, because all you want to do is fit your damn model. You're like: I've got some data, I've got a model or a few models, I want to fit them — and that is still hard. I've wondered why it isn't so hard in the frequentist world, but I think that's just because they don't fit a lot of different models. Not everyone is trying to fit a hundred different kinds of models to everything.
Yeah, that I don't know about, I have to say — basically all the statistics I've ever done is on the Bayesian side.
Me too. So take that with a grain of salt. Yeah.
What I can say, though, is that you definitely can encapsulate bits of models with PyMC, because you can wrap them in Python functions. We do that a lot at PyMC Labs, actually — most of the time. In the development phase, it's usually useful to do the copy-pasting stuff you're talking about, but once you have your models settled, encapsulating them in Python functions is what we do to deploy to production, basically. And hierarchical priors, for instance — I do that all the time, especially with a non-centered hierarchical prior. It's very useful.
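The pattern being discussed can be sketched framework-free. This is plain NumPy forward sampling rather than actual PyMC code (in PyMC, an analogous function would create random variables inside a model context); the function names and numbers here are illustrative, not from either library:

```python
import numpy as np

def noncentered_hierarchical(rng, mu, tau, n_groups):
    """Reusable non-centered hierarchical prior: returns group-level effects."""
    z = rng.normal(size=n_groups)  # standardized offsets
    return mu + tau * z            # shift and scale into group effects

def poisson_outcomes(rng, log_rate):
    """A likelihood 'block' that can be swapped out independently."""
    return rng.poisson(np.exp(log_rate))

# Compose the encapsulated pieces into one generative model,
# instead of copy-pasting the same lines into every model file.
rng = np.random.default_rng(42)
effects = noncentered_hierarchical(rng, mu=0.5, tau=0.3, n_groups=6)
counts = poisson_outcomes(rng, effects)
```

Swapping the prior or the likelihood then means swapping one function call, which is the workflow-level modularity Bob is asking about.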
I'd be really curious if you have code you can share for that, because I've been looking for examples of it pretty much for the last ten years.
Okay, yeah, I'll see what we can share. I know we did that for a client, so I don't know — it's confidential, right? But it's not that complicated to come up with some sample data. And definitely, I've found that super useful too, because I'm terrible at keeping multiple dimensions in my head — it's awful. Each time I'm working on a hierarchical model, my main problem is: wait, where am I in the pyramid now? What's the shape of this thing? So having the dimensions and coordinates we have now in PyMC, instead of specifying a hard-coded shape — and also having those small functions where you can just say, okay, hierarchical prior on that, with the appropriate arguments — stuff like that is super useful to me. And we also use Bambi more and more, which is like brms for PyMC, and that's definitely super useful. Not for all models — some of them you really want to write in PyMC, completely customized — but for a lot of models you can use Bambi and get a great model out of it, one that encapsulates all those things and automates a lot of the decisions for you.
Right. We've found that particularly useful for getting good priors and doing things like preconditioning the data matrix — stuff we show people how to do in our user's guide. You know, you want to run a linear regression: the textbook version is like ten lines of code, but the real version — the one that preconditions, that post-processes, that does everything efficiently — all of a sudden looks really complicated, because you transform the data, you transform the parameters, then you transform them back. It's really a lot of work.
Yeah, exactly. I'm actually working on an online course right now and developing a lesson about categorical regression, and I'm showing how you would do it with PyMC and then how you would do it with Bambi. That shows you how much Bambi automates for you — as you're saying: transforming the data or not, setting the priors. If you're dealing with categorical data, well, you need a reference category. You can use the zero-sum normal, which is what I do when I build those models with PyMC, but if you use Bambi, Bambi will choose the reference category for you and do all the pivoting. You also have to take care of the categorical predictors, which need to be pivoted too — if you do that in PyMC, it's a lot of code, and you have to go back and use PyTensor to do it, whereas Bambi will handle all of that for you.
That's super cool. Yeah, I think the future might be combining some of these systems — you know, combining the Bambi-like ideas into the PyMC code.
Yeah, definitely. That's a world I'd like to live in. And talking about that — thanks for the great segue — I wanted to ask you about the future of Bayesian stats. What does it look like to you? And, more specifically, what would you like to see, and not see?
I don't know — I'm not much of a precog, a future-seer. Pretty much every guess I've ever made about the future has been wrong, so I'm reluctant to speculate.
I mean, instead of a prediction, you could, you know…
In two years, maybe ChatGPT is going to be writing all of our code, right? I don't know. Maybe we're going to give up all this parametric modeling and it's all going to be normalizing flows, and we're just going to fit variational approximations to things. I really don't know; I'm reluctant to speculate on what the future looks like. What I do know is that the future has a long tail. No matter what it looks like, there's going to be a long tail of people using things like Stan and PyMC, because people learn things and it takes a while — generational change in academia is slow. I remember being an eager grad student thinking, oh, this whole field is going to be revolutionized in five years, and then finding people still doing what they were doing five years ago. That just goes with the territory; everyone doesn't change at once. So I don't know. I think there's going to be a lot more machine learning — a lot more things like normalizing flows and diffusion processes, and more nonparametrics than we're doing now.
So more Gaussian processes, you think?
Well, Gaussian processes are very expensive, but there's a lot of work going on there. I'm at the Flatiron Institute now with a bunch of really great applied-math people, and there's work going on here on really accelerating GPs. The way applied-math people work is sort of like the way INLA works, if you know that system — the integrated nested Laplace approximation. If you don't know INLA, I'd suggest looking it up, and maybe inviting some of their devs, because it's a really cool system.
Yeah, I definitely want to invite some people from INLA — if you can make some introductions for me.
Yeah, it's a really cool system. But the way they work is very much in an applied-math mode, where they do very careful analysis of one model, or one limited family of models, and then develop a method that works for that. We've got people here building Fourier-embedded Gaussian processes that can scale to hundreds of millions of data points with exact solutions in one or two dimensions — but they can't really give you variance estimates at the same time; they can do the mean. So there's a lot of applied-math work that can make particular instances really fast. But I don't know how much of the future is going to look like that, and how much is going to be black-box, where we just write down models and the computation keeps up. I think the state of the art is always going to be some mix of automation and custom building.
Yeah — so still some humans in the loop. And definitely, about INLA: it's an episode I've been wanting to make for a long time, and I have some patrons who have asked for it, but to be honest, it always fell apart because of scheduling issues. So as of now, I haven't been able to get a member of the INLA team to come on the show, and I definitely want to. So if you, or anybody listening, can make some introductions…
I know a few of them. I'll send you an introduction. Yep.
Awesome. Thanks a lot.
It's a small world. Everyone's always asking me if we're competing with PyMC and all these things, and I'm like, no, we're all on the same side. We just want to get more people using these kinds of systems — that will be good for all of us.
Yeah, exactly. And that's actually something I'm curious about, from your experience: how do you answer people's questions like, why would I use a Bayesian model here? You know — I already have a machine-learning model or a regression model that works. Why would I use a Bayesian model, especially if it's going to be harder to fit?
Yeah, I don't think there's one good answer to that. For a lot of situations, a non-Bayesian model is fine for what you're doing, so I'm generally not in the business of trying to convert people into Bayesians — enough people have converted themselves that I can just work with them. But I think the motivation for Bayes is the one we all understand, which is the natural interpretation of probabilities: everything just goes through probability theory. We haven't tied one of our hands behind our back by saying parameters can't be random. And the practical systems, like PyMC or Stan, give you a way to fit models together compositionally. It's that compositional aspect: I can build a COVID model, pull out an AR(2) prior, couple that with an ICAR spatial model, and then build a GLM for Poisson-type data, and it's just really easy to put all those pieces together. If you're in a frequentist world trying to calibrate confidence intervals for that, it's a huge amount of math every time you change your model, whereas the Bayesian world is plug and play and go. You can do some of that on the frequentist side with something like bootstrap confidence intervals, but they tend to be not so stable — which is presumably why the whole frequentist world isn't all doing black-box bootstrap. At least that's what they tell me; I've never really tried it myself.
But I think the key is the black-box nature of Bayes — just being able to write down the model you want, being able to write the data-generating process. I find there are two different ways people look at statistics. Often in engineering — what I think of as the data-geometry people — you'll see people trying to figure out how they can transform their data so it somehow matches a standard model. The standard Bayesian way is the other way around: you try to code the generative process. When I'm teaching people how to code models, I'm mainly trying to get them to stop trying to control things themselves and just write down how they think the data came about. And it's hard, because people don't believe you can just invert that process — it seems like magic. The first time I saw a mixture model fit, I was like, how in the world did it do this? Looking at the data, I'm like, there's no way — and yet it just teases these things apart, in ways it doesn't seem like humans could. But, you know, we can — that's why we built computers.
Yeah, I agree with that. That's also what I love in the Bayesian framework: that customizable, building-block idea. Often in the Labs team we talk about Bayesian modeling being more like Lego, whereas traditional machine-learning models are more like Playmobil — way less customizable, even though they're very fun to play with. You can only play with them in the intended way.
Yeah, they're like the new Lego, where you build a starship with your Lego and that's all you get — and then you get another box of Lego and build another Star Wars product, or something.
And yeah, that aspect of Bayes as a black box you can open up, and the interpretation it gives you of the model — that's just priceless, for sure.
Yeah. And I think the other big thing is that it's nice to be able to propagate your uncertainty, right? It's just very natural to propagate your estimation uncertainty and your sampling uncertainty through inference. Right. And that's not something you see done very well outside the Bayesian world.Unknown Speaker:
Yeah, true. And something I love also is that it makes counterfactual analysis way easier, right? Because it's just forward sampling; it's baked into the model already that you can change the data and see what that would mean on the inference side. I mean, we call that forward sampling, but basically it's related to counterfactual analysis. And that part for me is really amazing, especially when you see people understand that you can do counterfactual analysis by just swapping the data. It's priceless to see the smile on their faces.Unknown Speaker:
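(Editor's note: a tiny sketch of "counterfactual analysis by forward sampling." It assumes we already have posterior draws of a regression's intercept, slope, and residual scale; here those draws are faked with NumPy so the snippet is self-contained. Swapping in a counterfactual predictor value and pushing it through every posterior draw yields the implied predictive distribution.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are posterior draws from a fitted model y ~ normal(a + b*x, s).
n_draws = 4000
a = rng.normal(1.0, 0.1, n_draws)            # intercept draws
b = rng.normal(2.0, 0.2, n_draws)            # slope draws
s = np.abs(rng.normal(0.5, 0.05, n_draws))   # residual sd draws

def forward_sample(x):
    """Push a (possibly counterfactual) x through every posterior draw."""
    return rng.normal(a + b * x, s)

y_at_observed = forward_sample(1.0)   # predictive at an observed-ish x
y_counterfact = forward_sample(3.0)   # predictive under a counterfactual x

print(y_at_observed.mean(), y_counterfact.mean())  # roughly 3.0 vs 7.0
```

The same posterior draws answer both the factual and the counterfactual question; only the data fed forward changes.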
It seems very easy, you know, but it's a big jump in statistics, and I find it one of the harder things to teach people: retrospectively, we take our observed data, like the y's in regression, and we treat them counterfactually, as if they could have come out differently. Right? So even once we've observed something, we treat it probabilistically. And that's the key. It's like what Andrew says: this greases the wheels of commerce. You apply some uncertainty and that lets your inference get unstuck. If you were trying to solve all this stuff exactly, you wouldn't get a solution, but you allow some slippage from the uncertainty, and then all of a sudden everything can chunk along.Unknown Speaker:
Yeah, yeah, exactly. I love that metaphor. Okay, time is getting on, so I want to ask you just two more questions before the last two questions. Talking about discovering Bayes, and the philosophy and practicality of the framework: are there any mistakes that you think are good to make when you start learning Bayesian statistics?Unknown Speaker:
No, I'm not even sure what you're thinking of there. Like, a mistake that's good to make? What do you have in mind?Unknown Speaker:
So, for instance, for me, I know it took me a long time to understand the difference between posterior samples and posterior predictive samples, related to what we just talked about, right? Posterior predictive samples are actually samples that integrate the whole uncertainty of your model, and they live on the scale of the observed data, the y's, as you were saying. Say the y's have a normal likelihood, so y follows a normal with mu and sigma. Then the y's would of course be more dispersed than the mus, and the mus are your posterior samples. It took me a long time to understand that difference between the mus and the y's, and I think I made several mistakes trying to understand it, until at some point I was like, oh, okay, I get it now. I also tried to compute my own posterior predictive samples instead of using pm.sample_posterior_predictive: I would do it afterwards with NumPy, based on the posterior samples I got, to see how you get to the posterior predictive samples. So does that make the question clearer?Unknown Speaker:
Yeah, I wouldn't say that's a good mistake to make, though. I wouldn't encourage people to make that mistake. It's something you need to tease apart, for sure, and it's a problem a lot of people have. I think the other problem is understanding the difference between the standard error and the standard deviation of the posterior: the idea that a larger sample size will bring your standard error down for estimating the mean, but there's still residual uncertainty, because the mean is the mean and you've still got the standard deviation. So we're always trying to convince people to use fewer posterior draws, because once you've got 100 posterior draws, your posterior mean is identified to within 10% of the posterior standard deviation; you've eliminated most of the estimation uncertainty at that point. And getting people to let go of that is, I think, really hard: realizing that that's enough, and that we've still got residual uncertainty. Even if I know this parameter exactly, even if I know mu exactly, there's still residual uncertainty in sampling the y's. So we have to deal with two kinds of posterior uncertainty: our estimation uncertainty in the parameters, and our sampling uncertainty in whatever the underlying model is.Unknown Speaker:
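(Editor's note: a small NumPy illustration of both points in this exchange: posterior predictive draws of y are more dispersed than posterior draws of mu, and with 100 draws the Monte Carlo standard error of the posterior mean is already only a tenth of the posterior standard deviation. The numbers are made up for illustration.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend posterior: mu ~ normal(10, 0.5), with a known likelihood sd of 2.
mu_draws = rng.normal(10.0, 0.5, 100)   # posterior draws of the parameter mu
y_rep = rng.normal(mu_draws, 2.0)       # posterior predictive draws of y

# The y's integrate both estimation and sampling uncertainty,
# so they are more dispersed than the mus.
print(mu_draws.std(), y_rep.std())

# With 100 draws, the Monte Carlo standard error of the posterior mean
# is posterior sd / sqrt(100), i.e. one tenth of the posterior sd.
mcse = mu_draws.std() / np.sqrt(len(mu_draws))
print(mcse)
```

Even if `mcse` were driven to zero with more draws, the spread of `y_rep` would not shrink below the likelihood's sd of 2: that is the residual sampling uncertainty Bob describes.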
Yeah, definitely, super important indeed. Okay, good. I could continue, but can you tell us quickly: what's the next thing you have in mind that you want to learn, that you're curious about?Unknown Speaker:
Oh, that I want to learn? Yeah, well, we just had a really cool workshop here on diffusion processes, normalizing flows, and measure transport. So the main thing I've been getting into now is normalizing flows, and trying to understand variational measures like the Wasserstein distance, and thinking about variational inference as a kind of Wasserstein-distance minimization, kind of like quasi-Monte Carlo. So I want to understand much more of those geometry ideas. I'm also trying to learn more Riemannian geometry, because we're trying to build better samplers that take curvature information into account.Unknown Speaker:
And how do you go about that, by the way? Do you read? Are you more of a book reader? Do you listen to podcasts? Do you talk to people?Unknown Speaker:
I'm very much a book and case-study reader, and I like to implement stuff to understand it. I generally try to understand things myself, then write them down in math notation I can understand, and then code them. At that point I have to go talk to experts, because usually I get stuck coding. So with normalizing flows, I could understand all the math in the tutorials, but all the tutorials were like, oh, just plug a function in here, just plug in any kind of thing. And I'm like, can you just give me one example? I want one normalizing flow I can code where I don't have to make five arbitrary decisions, because I'm not quite sure of the space you're asking me to make decisions in. They're like, you need a link function: maybe you're going to use a ReLU, maybe a leaky ReLU, maybe a softplus, you're going to use something there. Right, and all the tutorials just leave that open. The stats literature is the same way: it's written for people who already understand it, so they leave a lot of decision points open, and it's very hard to code. For me, I don't understand stuff until I can actually get it coded, until I actually have an algorithm. I'm a computer scientist, so reading the math sounds good, but I don't like papers where the algorithms are written as "see formula three" and "see formula five," and then you realize formula three is not enough for me to fill in this blank in the pseudocode. So I'm usually arguing with my co-authors to write more explicit pseudocode. Although on our last paper they pushed back and said, this is too much, everybody knows this. And I'm trying to say, no, everybody doesn't know this. We learned it because Lu Zhang, the person I was working with on this paper, went and read 1970s Fortran code to learn it; there were no papers, no textbooks.
It was like, we're trying to do a service here, and people are like, nah, that's too pedantic; let's give them something sketchy that no one will be able to replicate. We compromised and pushed it all off to the appendix. Yeah.Unknown Speaker:
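(Editor's note: in the spirit of Bob's "just give me one example," here is the simplest possible normalizing flow with every decision made concretely: a single affine transform of a standard normal, with the change-of-variables formula giving the log density of the transformed variable. It is purely illustrative and not from any paper discussed; the check against the closed-form normal density works because an affine flow of a standard normal is exactly a normal.)

```python
import math
import numpy as np

rng = np.random.default_rng(7)

# Flow: y = exp(log_scale) * z + shift, with base distribution z ~ N(0, 1).
log_scale, shift = 0.3, 1.5

def sample(n):
    """Draw from the flow by transforming base-distribution samples."""
    z = rng.normal(size=n)
    return math.exp(log_scale) * z + shift

def log_density(y):
    # Change of variables: log p(y) = log p(z) - log|dy/dz|,
    # with z = (y - shift) / exp(log_scale) and log|dy/dz| = log_scale.
    z = (y - shift) / math.exp(log_scale)
    log_p_z = -0.5 * z ** 2 - 0.5 * math.log(2 * math.pi)
    return log_p_z - log_scale

# This flow is exactly N(shift, exp(log_scale)); check against the closed form.
y = 2.0
sigma = math.exp(log_scale)
closed_form = -0.5 * ((y - shift) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
print(log_density(y), closed_form)  # identical up to floating point
```

Real flows stack many such invertible layers (with nonlinearities like the softplus Bob mentions), but the sample/log-density pair and the log-Jacobian correction are the whole pattern.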
Damn. Well, Bob, thanks so much for taking the time. I mean, I still have zillions of questions for you, but let's close out the show. Before that, of course, I'm going to ask you the two final questions I ask every guest. First one: if you had unlimited time and resources, which problem would you try to solve?Unknown Speaker:
I would go back to natural language semantics, my original love, and the problem I'd want to solve is metaphor: how we do associative reasoning as humans. How do our words have meanings? How do we understand each other? How does our language connect to the world, and how does having all this stuff floating around in our heads let us talk to each other? Those deep natural language questions are the ones I still find most interesting. It's not like there are technical questions in Bayesian stats I want to figure out; to me, that's a much more practical thing, and I'm not good enough at the theory to really think I'll ever make progress in Bayesian theory. Right, and when I was doing linguistics, I told people I wanted to work on metaphor, and Barbara Partee, a very senior person in my field, told me: tell you what, when you get tenure, you can work on it in your sabbatical years. That's a good amount of time to work on a problem that hard. Bummer. So what you learn when you come to academia is that everybody works on very narrow problems. You want to solve metaphor? You're going to work on one preposition. That's what you're going to work on: one kind of metaphor, for one word, for 20 years.Unknown Speaker:
Well, actually, related to that, I'm reading a book right now called On Being Certain: Believing You Are Right Even When You're Not, by the neuroscientist Robert Burton. That's a bit related to what you're saying, and the stuff in the book is really fascinating.Unknown Speaker:
Yeah, in my mind it's a cognitive question. It's really: how do our brains work? How do our brains let us do this kind of associative reasoning we do?Unknown Speaker:
Yep, and the book talks a lot about that. It's really fascinating; I love it, definitely recommend it, and I'll put it in the show notes. And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?Unknown Speaker:
I think I'd go back to my first problem. I'd probably choose Wittgenstein, or maybe Quine, you know, one of the early 20th-century philosophers of language who kind of threw out that whole program. There was a huge movement in the 20th century, logical positivism, trying to build a very reductionist notion of language, and Wittgenstein, and then Quine, finally put the nail in that coffin: language is really constructed by human agents; it's really associative; it's a social reality. It's about human agents; it's not about discovering meaning in the world. So I think Quine or Wittgenstein would be my pick, or maybe Richard Rorty, going up in time. He's the most recent; he's no longer alive, but an incredible philosopher who changed philosophy of language dramatically in the later 20th century, in a way that I think was good.Unknown Speaker:
Okay, well, I have nothing to argue with there. That sounds like an absolutely fascinating dinner, so just make sure to invite me, please!