#74 Optimizing NUTS and Developing the ZeroSumNormal Distribution, with Adrian Seyboldt

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

We need to talk. I had trouble writing this introduction. Not because I didn’t know what to say (that’s hardly ever an issue for me), but because a conversation with Adrian Seyboldt always takes deliciously unexpected turns.

Adrian is one of the most brilliant, interesting and open-minded people I know. It turns out he’s courageous too: although he’s not a fan of public speaking, he accepted my invitation on this show, and I’m really glad he did!

Adrian studied math and bioinformatics in Germany and now lives in the US, where he enjoys doing maths, baking bread and hiking.

We talked about the why and how of his new project, Nutpie, a more efficient implementation of the NUTS sampler in Rust. We also dived deep into the new ZeroSumNormal distribution he created and that’s available from PyMC 4.2 onwards — what is it? Why would you use it? And when?

Adrian will also tell us about his favorite type of models, as well as what he currently sees as the biggest hurdles in the Bayesian workflow.

Each time I talk with Adrian, I learn a lot and am filled with enthusiasm — and now I hope you will too!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, Adam Bartonicek, William Benton, James Ahloy, Robin Taylor, Thomas Wiecki, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Elea McDonnell Feit, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Joshua Duncan, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, David Haas, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey and Andreas Kröpelin.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

Links from the show:

LBS on Twitter: https://twitter.com/LearnBayesStats
LBS on Linkedin: https://www.linkedin.com/company/learn-bayes-stats/
Adrian on GitHub: https://github.com/aseyboldt
Nutpie repository: https://github.com/pymc-devs/nutpie
ZeroSumNormal distribution: https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.ZeroSumNormal.html
Pathfinder – A parallel quasi-Newton algorithm for reaching regions of high probability mass: https://statmodeling.stat.columbia.edu/2021/08/10/pathfinder-a-parallel-quasi-newton-algorithm-for-reaching-regions-of-high-probability-mass/

Abstract

by Christoph Bamberg

Adrian Seyboldt, the guest of this week’s episode, is an active developer of the PyMC library in Python and his new tool nutpie in Rust. He is also a colleague at PyMC-Labs and friend. So naturally, this episode gets technical and nerdy.

We talk about parametrisation, a topic important for anyone trying to implement a Bayesian model and what to do or avoid (don’t use the mean of the data!).

Adrian explains a new approach to setting categorical parameters, using the Zero Sum Normal Distribution that he developed. The approach is explained in an accessible way with examples, so everyone can understand and implement it themselves.

We also talked about further technical topics like initialising a sampler, the use of warm-up samples, mass matrix adaptation and much more. The difference between probability theory and statistics as well as his view on the challenges in Bayesian statistics complete the episode.

Transcript

[00:00:00] Okay, we need to talk. I had trouble writing this introduction. No, not because I didn't know what to say. That's hardly ever an issue for me, but because a conversation with Adrian Seyboldt always takes deliciously unexpected turns. Adrian is one of the most brilliant, interesting, and open-minded people I know.

It turns out he's courageous too. Although he's not a fan of public speaking, he accepted my invitation on this show and I'm really glad he did. Adrian studied math and bioinformatics in Germany and now lives in the US, where he enjoys doing math, baking bread, and hiking. We talked about the why and the how of his new project, Nutpie, which is a more efficient implementation of the NUTS sampler that he wrote in the programming language called Rust.

And of course, this project marries very well with PyMC from the get-go, because Adrian is a fellow PyMC developer. What else? We also dived deep into the new ZeroSumNormal distribution that Adrian created, [00:01:00] and that's already available from PyMC 4.2 onwards. We'll see what it is, why you would use it and when.

Adrian will also tell us about his favorite type of models, as well as what he currently sees as the biggest hurdles in the Bayesian workflow. Honestly, each time I talk with Adrian, I learn a lot and I'm filled with enthusiasm. And now I hope you will too. This is Learning Bayesian Statistics, episode 74, recorded November 4th, 2022.

Welcome to Learning Bayesian Statistics, a fortnightly podcast on Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the podcast, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor,

supporting LBS on Patreon and unlocking Bayesian merch: everything is in there. [00:02:00] That's learnbayesstats.com. If with all that info a Bayesian model is still resisting you, or if you find my voice especially smooth and want me to come and teach Bayesian stats in your company, then reach out at alex.andorra@pymc-labs.io or book a call with me at learnbayesstats.com.

Thanks a lot folks, and best Bayesian wishes to you all. Let me show you how to be a good Bayesian, change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations. What's a Bayesian? It's someone who cares about evidence and doesn't jump to assumptions based on intuitions and prejudice.

A Bayesian makes predictions on the best available info and adjusts the probability, 'cause every belief is provisional. And when I kick a flow, mostly I'm watching eyes widen, maybe 'cause my likeness lowers expectations of tight rhyming. How would I know unless I'm rhyming in front of a [00:03:00] bunch of blind men, dropping placebo-controlled science like I'm Richard Feynman?

Hello Bayesians. Before we go on with the show, I'd like to warmly thank Andreas Kröpelin, who just joined the LBS Patreon in the Full Posterior tier. Your support makes a big difference; for instance, it helped me pay for the new mic I'm using right now to record these words. So may the Bayes be with you, Andreas. Oh, and if you wanna stay connected with your favorite podcast, LBS now has its own Twitter account and LinkedIn page.

Both are at LearnBayesStats, but if you don't like mystery, the links are in the show notes. Now let's hear from Adrian Seyboldt on Learning Bayesian Statistics. Thank you for having me. Glad to be here. Yeah, I'm super happy. So we're not gonna continue in German because, well, I wanted to, but I don't know why, Adrian told me that he has to practice his [00:04:00] English.

So that's really a shame. Yeah, that's how it goes. No, it's not true, it's that my German is way too rusty. So yeah, let's continue in English. But I'm super happy to have you here, because I've known you for a while now and you've done so many things that it was actually hard to pick the questions I wanted to ask you for the podcast, because you've done so many projects and I had so many, yeah.

Exciting questions for you. So I'm happy that you're finally on the show. I guess I was a bit shy, but Alex worked hard to get me here. That's among my superpowers, but people should not know. Yeah, I'm just relentless. So let's start easy and, as usual, I like to start with the background of the guest.

And so let's start with your origin story. I'm curious how you came to the stats and data world and whether it was more of a sinuous or a straight path. Definitely not a straight [00:05:00] path. I was always really interested in mathematics, and that's also what I started studying at university. And at that time, I never would've dreamed that I'd end up doing statistics.

That's just not what I was really interested in. I guess I was somewhat interested in the philosophical part of statistics, like what is a probability, and the whole debate of frequentist statistics versus Bayesian statistics. That did interest me a little bit, I guess. But it was more a side interest, just something I was curious about.

Not really something I would have wanted to work on. But during the first part of mathematics at university, I learned probability theory, measure theory and things like that, which I guess come in handy now. But nothing at all about applying probability theory in any sense. So what I would call statistics, that came later.

So after some time at university doing mathematics, I switched to bioinformatics because I wanted to have more [00:06:00] applied content in what I do. And there statistics really did become more important. And yeah, at some point I just started, I don't really know why, looking into that more.

I think it came up in a couple of projects I was working on and I was just curious. So I actually learned Stan at the beginning, which was a lot of fun. But I was also into Python quite a lot, so I started looking into PyMC. And PyMC didn't do a couple of things that Stan could, and I wanted to use Python.

So I started contributing to PyMC, actually doing mass matrix adaptation things. I think it was pretty much the first, well, definitely among the first pull requests I did, I think in 2018 or sometime around that. I became a core contributor to PyMC at some point and continued doing that till now, I guess, and just continued doing statistics.

[00:07:00] And now I work at PyMC Labs together with Alex, where we develop statistical models. Exactly. Yeah, we have a lot of fun. We're gonna talk more about that later. But okay, actually, I didn't know you started with Stan. And then, well, that is so typical of you: oh, I'm gonna start with the mass matrix adaptation.

So I wanna be clear to listeners: you don't need to do a new mass matrix adaptation for your first PR in PyMC, but that would be welcome. I think there's still a lot of potential for improvement in that area. So if you want to do that as your first PR, please go ahead.

That's all I'm saying. Sure. Yeah, if you wanna, but don't be intimidated. You don't need to. I think my first PR was probably a typo somewhere. You know, I'm not sure it actually was my first PR, to be honest. It's the first one I remember; I don't know if it actually was the first.

It's too late now. That's too [00:08:00] much. Ok. So that's the story. Now drop the bump. Ok. You know how the news works, right? Okay. Yeah. It's like now it's gonna be on CNN and like, you're, you're done. Like it's gonna be, can't change that anymore. It's out now. Yeah. It's like, yeah, like now to change it. It's too many details and nobody likes details, so that's all.

Okay. I see. So you started with the pure math world and then were drawn into the dark side. Yeah. It's just that I wanted to do applications, and as soon as you start working on applications, statistics pretty much, probably not in every field, but in really a lot of fields, just pops up at some point.

And if you're already a bit interested in the philosophical part of it, then I think it's easy to be drawn in. Yeah, no, for sure. And actually, how do you define statistics to people? I guess, how do you explain your job? Because I always have a hard time, because it's not math.

But it's related to math, because you use numbers all the time. Yeah, sure. It's not software [00:09:00] development because, well, you use numbers and math, so it's not just developing programs. So how do you actually define statistics compared to math? Yes, interesting question. So I guess probability theory for me really is math.

So that's where you just say probabilities have these properties, and you don't say what a probability is. You just say what properties something should have for it to be called a probability. Then you just prove theorems based on those assumptions, on those properties. I mean, maybe you think about that, but the subject in the mathematical world really isn't how do I apply that to a real-world problem; it's just what automatically follows from those,

from those properties, and I'd call that probability theory. Statistics would for me then be when you start applying probability theory, mostly that I guess, to [00:10:00] real-world problems, to model uncertainty, frequencies, how often something happens. I think the core thing, for me at least, is really uncertainty quantification.

So if we say we wanna know something, and we also wanna know how well we know it, it turns out we can interpret that as a probability if we want. And then I guess you can have really philosophical discussions around what this probability actually means. I mean, it's interesting, but in practice I think it's probably not always relevant, because in the end it's more, yeah.

How you apply it is also kind of a big topic, a big subject. Mm. Yeah. Okay, I see. Yeah, I kind of agree. To me that rings a bell with, you know, the difference between math and physics. Yeah, I think so. In physics we really try and care about how to explain the world and how things around us are happening and how it's [00:11:00] possible and how certain we are about that.

So if you had a Venn diagram, would you place physics and statistics at the same layer? Yeah. I think maybe not exactly, because physics really tries to explain, or describe, how the world works around us, while statistics, I think, doesn't really. It's more

trying to quantify or tell what we know about it. So it's on a slightly different level, but I think the relationship to mathematics is a bit similar, in that it's applying parts of mathematics to our knowledge of something, or to frequencies of something happening. So I guess in that case it might actually be describing something directly in the real world and not just our knowledge of it.

That again depends a bit on how you define probabilities, right? Mm. Yeah. I see. And so that got nerdy very quickly. [00:12:00]

Cool. Okay. Yeah. Interesting. And so you said you work now with us at PyMC Labs and you're doing some fun statistical and mathematical projects. So can you tell us about the topics that you are particularly interested in these days? I know you like, you know, getting obsessed by some topics and projects.

So yeah, maybe tell us about some of your latest obsessions and topics that you're particularly interested in these days. Okay. Something I've been looking into a lot: in a lot of projects we had, one problem came up repeatedly, where we have a data set and we build a model that works really well with that data set, hopefully really well.

But then a client might apply that model to a different data set and suddenly things don't work as well, [00:13:00] either because of really statistical problems, where we didn't understand part of the data generating process and things are just different, or because of computational issues: the parameterization that we found with the first data set works really well there, but it doesn't work as well on a different data set.

So I was interested in trying to make the computational part of that more robust, which then actually got me back to the mass matrix adaptation things, which never really left me, I guess. That stayed around. And so I'm trying to find better algorithms to approximate the posterior, so that sampling methods can be more robust.

So that's definitely something I've been thinking about quite a lot recently. Then I guess you mentioned that you might wanna talk about the ZeroSumNormal question. Yeah. Which is in essence, I think, a question about [00:14:00] how we want to write regression models or hierarchical regression models.

Mm-hmm. Partially pooled, whatever you want to call them. Yeah. How to write those so that they end up being fast for the sampler, but also easy to interpret. And I think there are still some open questions there. I think we can also improve priors for standard deviations, which is kind of an eternal subject, I guess.

That never really goes away. Yeah, true. I remember you posting something very interesting about the priors for standard deviations in our Discord. It's this kind of thought experiment about basically trying to set the standard deviation on the whole data set.

Well, yeah, on the whole data set instead of one parameter at a time. And then you could just trickle down, like a domino effect, what that means for individual parameters. So the basic idea would be to ask [00:15:00] how much variance do we have in total in the model? And say we might wanna have that as a parameter in some sense.

Yeah. How we parameterize that is then, I guess, a bit of a different question. And then we ask how much of that variance comes from the various sources. That way, if we increase the number of predictors, for instance, the total variance doesn't grow larger and larger each time we add a predictor, which I think doesn't really make sense.
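Editor's sketch: the idea of fixing a total variance and then asking how much of it each source contributes can be illustrated in a few lines of Python. This is only a toy under stated assumptions, not Adrian's actual approach (he says he has not worked it out yet). The helper name `split_total_variance` is hypothetical, and the choice of a flat Dirichlet for the proportions is an assumption made purely for illustration.

```python
import math
import random


def split_total_variance(total_sd, n_predictors, rng):
    """Split a fixed total variance across predictors (hypothetical helper)."""
    # Assumption: proportions come from a flat Dirichlet, drawn here by
    # normalizing independent Gamma(1, 1) variates.
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(n_predictors)]
    norm = sum(weights)
    proportions = [w / norm for w in weights]
    # Each predictor's standard deviation is the square root of its share
    # of the total variance.
    return [math.sqrt(p * total_sd**2) for p in proportions]


rng = random.Random(0)
for n in (2, 10, 50):
    sds = split_total_variance(total_sd=2.0, n_predictors=n, rng=rng)
    total_variance = sum(sd**2 for sd in sds)
    print(n, round(total_variance, 6))
```

However many predictors are added, the per-predictor variances sum to `total_sd**2`, which is the property described above: the total does not grow with each new predictor.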

That shouldn't happen. Yeah, no, I agree. And it's also easier to set the priors, because it's true that it's always like, I don't know how to set the prior on that slope. What does that mean? Whereas having a total variance for the whole phenomenon you're explaining is more intuitive.

And then it's: how do you automate the trickle-down effect on the individual parameters? Well, I don't think I really have that worked out in any sense yet. No, no, I don't. What I definitely [00:16:00] liked was that I had this whole different approach of trying to find out how we wanna set the priors.

Then I looked at that for a while and worked with it, and after some time I realized, oh, this is actually equivalent to setting half-normal priors on the variances, which is the default, what we do all the time. So this great new thing turns out to be, well, just the same old. But in a sense I also like that, because maybe that points to the fact that

maybe that was actually not such a bad idea. But yeah, let's see how that works out. Yeah. Are you testing that approach already on a project? No, no. I tested that a bit on artificial data sets where I played around with it, and I think the natural next step now would be to do [00:17:00] that.

Yeah, a couple of real projects, and just compare first: does it actually do something different? Does it make more sense? Does it actually make it easier to set the priors, or maybe it doesn't? I don't know. So you have to experiment with that a bit. Yeah, that'd be very interesting. I have some projects in mind where we could try that.

Yeah. As usual, the problem is time. But that'd definitely be super interesting. Cool. Okay. So actually, before we dive into those different topics, I'm wondering if you remember how you first got introduced to Bayesian methods. Basically, you got introduced more to stats while doing bioinformatics, so I'm guessing it happened at that time, but did it, or was it actually later?

That definitely happened in stages. I definitely remember having discussions around Bayesian stats with friends in first year at university, I think. [00:18:00] That was more like I'd read something, or friends read something, and we just talked about it for a bit. Nobody really did anything with Bayesian stats.

Then I think the first time I really used Bayesian statistics to do anything was relatively near the end of my time at university, I think using it to model RNA counts, for instance. So you have actually pretty large data sets and, I think, actually pretty complicated,

well, not that complicated models, but with horseshoe priors. So I kind of went all in and did all the things that are actually pretty difficult to sample correctly, where it's really hard to avoid getting divergences and to make sure you actually sample from the posterior. So it didn't start easy that way, I guess.

Okay. And that's when you started with Stan, I guess? Yeah. Okay, nice. I see. And then you stuck to it because it was mainly an appropriate method for the kind of [00:19:00] problems you were dealing with? I mean, I'm talking about Bayesian stats here, not Stan. I think maybe it's also a bit the usual thing that if you learn the tools and the methods of working in one framework, you tend to use that for the next problem that comes,

comes around, which I don't think is necessarily a good thing, but it's also not necessarily a bad thing. So yeah, I don't know. I think that's definitely part of it. And then I think you also notice the problems more where you could use those methods, and you kind of gravitate to one set of problems.

Yeah. If you have a hammer, you're gonna see all the problems as nails. Or, well, you'll just find nails everywhere. Just ignore those screws that are around as well. Yeah, screw those screws. Yeah, that's a good one. Okay, I see. Actually, let's dive into the first part you talked about, so actually mass matrix [00:20:00] adaptation. If I understood correctly, it's called Nutpie, and yeah,

I'll stop there. What can you tell us about Nutpie? Basically give us the elevator pitch, and then we'll... Yeah, sure. So that also started more as a small hobby thing I wanted to try out. I just thought, hey, it would be fun. Rust is a fun language, I always liked it. Why not write NUTS,

Hamiltonian Markov chain Monte Carlo methods, in Rust and see how that goes. So I did that some time ago with a really basic implementation, and it was sitting around for a while. At some point we then had discussions about the Aesara backend, which we use to compute logp graphs, that we could compile that to Numba so we can just get out a plain C function that doesn't have any Python in it anymore.

And I thought, ah, maybe it would be interesting to see if we can use that Rust implementation I had around to then call that. But in order to [00:21:00] make that actually real, I had to develop the Rust implementation quite a bit. I also worked in a couple of new ideas for mass matrix adaptation.

So there's actually a pretty simple change we can make to mass matrix adaptation, to use the gradients as well as the draws. So we just use a bit of additional information. The method looks really similar, but in all my experiments it seems like it's actually working quite a bit better. Especially early in tuning, we can get to a good mass matrix, and I think to the posterior where we want to get, with fewer gradient evaluations. Not orders of magnitude fewer, but definitely fewer. And it also seems, so far,

pretty stable, more stable and robust actually than the default, in my experiments. But never trust the author of something like that, I guess. So do your own experiments. [00:22:00] But by now I think it's a pretty stable library, actually.

It's a relatively small library written in Rust that implements just the basic Hamiltonian Markov chain Monte Carlo sampler. And you can actually use that to sample Stan or PyMC models, both with little asterisks in there, because for Stan you'll have to use a different branch of httpstan so that we can actually get at the gradient function easily.

And for PyMC, that works out of the box and I think much nicer. So you can just install Nutpie using conda or mamba or whatever, and just call two functions to sample using Nutpie. But that requires the new Numba backend for Aesara, which is still a bit of a work in progress. So depending on your model, it might work really well, or you might just get an error message if you're unlucky.

In that case, if you try it, please open an issue, that would be great, so I [00:23:00] know where that stands. But the library itself I think is pretty stable by now. Just getting the gradient functions, that's actually the tricky part. Yeah. So actually, if people wanna try it out, it's downloadable from PyPI and also conda, right?

I think actually not PyPI yet. It should actually be pretty easy to add, I just haven't done that yet. So, conda, yeah. Yeah, I think that's how I installed it; I think I installed it with mamba. So yeah, mamba install nutpie, and then you have it and you can try it. And we'll put a link to the GitHub repo in the show notes so that people can look at the examples you are talking about: how do you sample a PyMC model or a Stan model with that.

And then, if you find some issues, for sure open issues on the repo, because it helps make [00:24:00] the library better for everybody. That's the TL;DR: it's a library written in Rust that implements HMC NUTS, which is the most robust algorithm we have right now to do MCMC. That's already what we're using in PyMC, that's what's used in Stan.

And the last twist is that you used a new mass matrix adaptation in this implementation of HMC. Exactly. Did I get that right? Yeah. So let's dig a bit into that. Can you first remind listeners what mass matrix adaptation is, and why we would care about that when we are sampling a Bayesian model? Okay.

Yeah, sure. So I think most people who have actually played with Bayesian models and tried to sample those with HMC noticed that sometimes you can rewrite your model a tiny bit so that it does [00:25:00] exactly the same thing, but somehow in one version it samples really fast and in a different version

it samples really slowly. Or in one version you get divergences, so it doesn't really work, and in a different version it works just fine. So that's always the question of parameterizations. The model might actually be the same, but the numbers you use to describe that model are different.

And some parameterizations are good for the sampler and some are bad for the sampler. Now, when we implement HMC, we pretty much always try to do some of those reparameterizations automatically. Namely, we rescale all parameters. So you could just say, I sample a standard normal and scale that by the standard deviation, or I just, yeah.

So each parameter you have in your model, you could just multiply by some value as the parameter and then divide by that value again when you use it. That would be the same model as before, [00:26:00] but it might turn out to have really different performance for the sampler. Yeah. And so for instance, a positive parameter, like a standard deviation that has a half-normal prior, is actually sampled on the log scale: we transform it to the log scale so that we sample on the real line, right? It's not sampled on the positive numbers.
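Editor's sketch: the log-scale trick mentioned here can be written out by hand. This is a minimal illustration under assumptions, not the code PyMC or Stan actually run; the function names are hypothetical. A positive parameter `sigma` with a half-normal prior is replaced by `y = log(sigma)`, which lives on the whole real line, and the density picks up a Jacobian term.

```python
import math


def halfnormal_logpdf(sigma, scale=1.0):
    """Log density of a HalfNormal(scale) distribution at sigma > 0."""
    return 0.5 * math.log(2.0 / math.pi) - math.log(scale) - 0.5 * (sigma / scale) ** 2


def unconstrained_logpdf(y, scale=1.0):
    """Log density of y = log(sigma), defined for any real y."""
    sigma = math.exp(y)
    # The extra "+ y" is the log Jacobian of the map sigma = exp(y).
    return halfnormal_logpdf(sigma, scale) + y


# Riemann sum over y in [-20, 5) to check the transformed density is still
# a proper density (the sum should come out close to 1).
step = 0.001
n_points = 25_000
grid = [-20.0 + i * step for i in range(n_points)]
total = sum(math.exp(unconstrained_logpdf(y)) * step for y in grid)
print(round(total, 3))
```

The sampler then moves freely in `y` without ever hitting the boundary at zero, and the reported draws are mapped back with `sigma = exp(y)`.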

And then you rescale all those transformed variables so that their posterior standard deviation is one. That's kind of the usual way of doing it, so that all of those have the same variance. So you mean all the parameters in the model?

Yeah. And so just rescaling each individual parameter so that it has posterior standard deviation one, that's usually what we refer to as diagonal mass matrix adaptation. And non-diagonal mass matrix adaptation would then be something where we actually do a linear transformation [00:27:00] of all those individual parameters.
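Editor's sketch: diagonal rescaling, as described here, amounts to one scale factor per parameter. The toy below assumes we already have draws from the posterior (in a real sampler these come from earlier warm-up iterations) and uses made-up standard deviations of 0.01 and 1000 to mimic badly scaled parameters.

```python
import random
import statistics

rng = random.Random(42)

# Toy "posterior" draws for two parameters on wildly different scales.
true_sds = [0.01, 1000.0]
draws = [[rng.gauss(0.0, sd) for sd in true_sds] for _ in range(2000)]

# Diagonal adaptation: one scale factor per parameter, estimated from the draws.
scales = [statistics.stdev(column) for column in zip(*draws)]

# In the rescaled coordinates every parameter has standard deviation 1.
rescaled = [[value / scale for value, scale in zip(row, scales)] for row in draws]
rescaled_sds = [statistics.stdev(column) for column in zip(*rescaled)]

print([round(s, 4) for s in scales])
print([round(s, 4) for s in rescaled_sds])
```

The sampler works in the rescaled coordinates and multiplies by `scales` before reporting draws, so the user only ever sees the original parameters. A non-diagonal adaptation would replace the per-parameter division with a full linear transformation.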

And to be clear, we do that only during the sampling, but the posterior you get back from PyMC or Stan is then rescaled back. Yeah. So that's completely hidden. You don't notice it when you use the library. That happens automatically in the background and you don't need to worry about it if you just use the library.

But during sampling, that's really important, because the sampler will have really bad performance if one posterior standard deviation, for instance, is 10 to the minus two and another posterior standard deviation is maybe a thousand. Then it will just be horribly slow. So we need to avoid that. And the usual way of doing that is just to try and sample a little bit, until you get a sense of how large the posterior standard deviation is, and then you rescale everything to fit,

to fit that. Then you sample a bit more to get a better idea of what the actual posterior standard deviation is, and you rescale again, and you iterate that a [00:28:00] couple of times until you find something that you actually like and think works well. And that's mostly what's happening during tuning. Yeah, exactly.

Yeah, that's the warmup or tuning phase that people have probably heard about. And then once you get to the phase where it's pretty much stable, you think that you've reached the typical set, and then you can start the sampling phase per se. Those are the samples that you get in your trace

once the model has finished running. I mean, during warmup or tuning a couple of other things are also happening. For instance, you are actually moving to the typical set, because you start somewhere that's just not in the typical set at all, where you just don't wanna be. So you need to move in the right direction first, and you need to find step sizes and the mass matrix.

So I guess those are the three different things. But the mass matrix adaptation is usually the part that takes so long. So that's why you have a thousand samples [00:29:00] or something before you actually start drawing the samples that you actually want. That's the whole reason why there is this first section that often takes a long time, and we would like to reduce that and make it faster.

And the basic idea then behind modifying that a bit is saying, okay, maybe we actually have more information than just the posterior standard deviation. For HMC we already computed the gradient of the logp at each draw, and gradients and draws both provide information about how much variation there is

for a certain variable. And it turns out that if you use both, you can usually, not always, converge to something reasonable faster. And it's not just that you can converge to something faster, but the thing that you converge to might also be a bit different. And that tends to lead to better effective sample size per draw.
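The conversation does not spell out the formula, so the following is a sketch under an assumption: one simple way to combine the two sources of information is to estimate each parameter's variance as the square root of (variance of the draws) divided by (variance of the gradients). For an independent normal posterior the gradient of the logp is `-(x - mu) / sigma**2`, so this combined estimate recovers `sigma**2` exactly, even from very few draws, while the draws-only estimate is still noisy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy posterior: three independent normals with mean zero.
sigma = np.array([1e-2, 1.0, 1e3])
draws = rng.normal(0.0, sigma, size=(20, 3))  # deliberately few draws

# Gradient of the log density of N(0, sigma^2) at each draw.
grads = -draws / sigma**2

var_from_draws = draws.var(axis=0, ddof=1)
var_from_both = np.sqrt(var_from_draws / grads.var(axis=0, ddof=1))

print("true variance:         ", sigma**2)
print("draws only:            ", var_from_draws)
print("draws and gradients:   ", var_from_both)
```

Real posteriors are not exactly normal, so the gain is smaller in practice; the toy case only shows why gradients carry scale information at all.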

So the [00:30:00] tuning tends to happen faster, and after tuning you tend to get a better result, at least in all the experiments I did. I mean, maybe there were one or two models where it was a bit worse, but you get the idea. I see. Okay, that's very cool. So basically you're using more information, because you're using the gradient, and so that helps you arrive faster at the typical set and, probably,

most of the time, have a more precise idea of what the typical set actually is. Yeah. Or a more accurate idea, maybe not more precise. And I think the basic idea of how I derived that can also be extended to reparametrizations that are not just rescalings, but that are more complicated. So hopefully that can be developed in a way so that correlations in the posterior, for instance, are way less of a problem, even for large models.

So right now you can do something called full mass matrix adaptation, where you try to find a [00:31:00] linear transformation of the parameter space that gets rid of all correlations. That doesn't really work if you have a large number of parameters. So if you have 5,000 parameters, that would be a 5,000 by 5,000 matrix.

And working with that is just no fun. And it's also hard to estimate then. So it definitely seems like, if you use the same math I used to derive the new diagonal mass matrix adaptation and apply it to full mass matrix adaptation, or something in between, actually, that also seems to work pretty well.
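To put the 5,000-parameter example in numbers, here is a small back-of-the-envelope calculation. The choice of 8-byte floats is an assumption for the illustration.

```python
n_params = 5_000

# A dense (full) mass matrix has one entry per pair of parameters.
n_entries = n_params * n_params

# Because the matrix is symmetric, only this many entries are free,
# and each of them has to be estimated from the warmup draws.
n_free = n_params * (n_params + 1) // 2

# Memory for a dense float64 copy of the matrix.
megabytes = n_entries * 8 / 1e6

print(n_entries, n_free, megabytes)
```

With a typical warmup of around a thousand draws, there are far more free entries than draws, which is one way to see why the full matrix is hard to estimate.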

But that's not in nutpie yet. That's still experimental stuff I'm working on, and hopefully it works out well, but let's see. Mm, okay. Yeah, and then that would be a new kind of adaptation that you would add to nutpie, for instance, and that people would be able to use in... So there actually is already an implementation on GitHub somewhere, covadapt, which actually works with the default PyMC sampler.

Not in nutpie, but that's more of an [00:32:00] experimental implementation, and I think it works. So I don't think it's a completely experimental, broken thing in a sense, but I wouldn't just use that in production if I... Yeah, that probably doesn't sound like a good idea at this point. Yeah, yeah, yeah.

No, I get it. So, yeah, first, that reminds me: you see, folks, that's why we tell you not to code your own samplers. You can see the amount of research and expertise in math, and the collective wisdom and thought, that go into all those samplers that you use through probabilistic programming languages.

It's because it's really hard work and extremely technical. Or you do implement your own samplers and just do that work, which is really a lot of fun. So that... Sure. Yeah. Yeah. But it's usually harder. Yeah. Yeah. I definitely think so. If you're like me, I would recommend not doing that, just use the samplers [00:33:00] that smarter people than you say...

Yeah. Yeah. You should use that and, like, get rid of the sampler that you've written in Python. So, yeah, that's the first thing. And second: okay, so thanks, that actually helped me understand what the mass matrix adaptation is, because I always forget, so that's cool. Yeah. You also talked about the step size. That's right. So of course I would have questions about that, but a question I often get when I teach Bayesian stats is: you said that the sampler starts somewhere, and that very often is not in the typical set.

Mm-hmm. So that's the init value, right? The initial value where the sampler starts. And at the moment we see that less and less, so it seems that we've done a good job at communicating that, but for a long time it was quite frequent that people would force the initialization, in PyMC for instance, to start at the mean of the [00:34:00] data or the maximum a posteriori of the parameter.

Mm-hmm. And we told people not to do that. So, can you tell us why it's usually a bad idea to do that, and why you should usually leave that choice to PyMC or Stan? So I think one reason is definitely that, I don't think it ever... never use the word ever, but it typically doesn't really help. And it definitely adds an additional thing that needs to be done.

So you need to run optimization first. And, I mean, in the literature there are actually interesting ideas around doing that. So Pathfinder, for instance, is I think a really interesting paper that tries to develop that idea. But if you just do it as is, it basically doesn't really help, and I think there are cases where it can definitely make things worse.

So if you just optimize naively, you might end up in a funnel somewhere, or you might end up somewhere... I mean, the gradient is [00:35:00] zero at the MAP by definition, or at the maximum a posteriori estimate at least, by definition. So, is that good for an initialization? I think not. So mostly I think it's a "why do that?". I don't really see the point, it doesn't help much.

So I think maybe it did actually help before the whole HMC thing came around, before the gradient-based methods. So that might actually be why it was a thing. But with HMC, I mean, there might be models where that's different, but usually, if it's a well-defined model, you have the first couple of draws where the sampler just goes to the right spot, and after that it doesn't matter anymore.
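A standard illustration of why the mode is not automatically a good starting point, which goes slightly beyond what is said in the conversation: in high dimensions, the draws of even a simple posterior sit far away from the mode. The sketch below uses a 1,000-dimensional standard normal as a stand-in posterior (an assumption for the example). Its mode is the origin, yet essentially no draw comes anywhere near it.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 1000

# Draws from a standard normal "posterior" whose mode (MAP) is the origin.
draws = rng.normal(size=(5000, dim))

# Distance of each draw from the mode.
radii = np.linalg.norm(draws, axis=1)

print("distance of the mode from itself: 0.0")
print("mean distance of draws from the mode:", radii.mean())
print("closest any draw gets to the mode:", radii.min())
print("sqrt(dim) for comparison:", np.sqrt(dim))
```

The draws concentrate in a thin shell at a distance of about `sqrt(dim)`, so a sampler started at the mode still has to travel outwards to reach the typical set.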

Yeah. Okay. I see. And another question that I get is, why aren't the tuning samples also given back by PyMC by default? You can get them, for [00:36:00] instance, in PyMC, if you say, I think it's discard_tuned_samples equals False. By default, it's True. But can you explain why, most of the time, you don't need those tuning samples, and when would you need them?

If you develop mass matrix adaptation methods, for instance, you definitely want to use them, because then you can see what it's actually doing behind the scenes. Yeah, that's definitely where you need them. But other than that, I mean, it's just a phase where it tries to find good parameters, and those are not trustworthy samples during that period, in a sense.

So other than for diagnostics and trying to understand what's happening, I don't really see what you would do with those draws. They're there to find good parameters for the sampler. And after you've found the parameters, you then sample. Yeah. They don't necessarily tell you anything about the inferences that your model makes, right?

It's the posterior samples that you actually get from your [00:37:00] PPL, that's what you care about if you want to see if your model is making good inferences. Cool. And so I'm taking from what you're saying that, basically, if listeners are interested in nutpie, they can try it on any of their models, right?

There shouldn't be any restrictions on the kind of models that you can, in theory, use nutpie on. The main restriction was the one you mentioned, which is, well, sometimes an Op would be missing in Numba, in which case, open an issue. But, uh, in general, you can try nutpie on any model.

Yeah, I don't think there's... Again, if you have custom Ops and call into C code or something, which I like to do a lot, but I think 99% of people don't do that. So you're safe, and if you do that, I think you know that you're not safe. So in that sense, go for it. Yeah. So if we zoom out a bit, I'm actually curious about any difficulties that you [00:38:00] encountered with implementing nutpie and, in general, what you learned from them,

if you encountered any difficulties. Good question. I definitely encountered lots of small difficulties. Mm-hmm. Fighting... yeah, thinking about how to structure different parts of the library. The whole adaptation things definitely tend to cut across concerns a bit, so they are a bit tricky to separate out in a nice and clean way, I think.

But to be honest, it went surprisingly smoothly. I think it probably helped that I had worked on the PyMC sampler quite a bit before, so that it wasn't the first NUTS implementation I ever did. Then definitely getting all the details of NUTS right, that's always a bit tricky, because there are a lot of small things you can do wrong, and they don't look wrong, they just are wrong.

So you get incorrect results if you mess something up, or you get less [00:39:00] efficient sampling, but it looks right. Mm-hmm. So it's definitely a tricky thing to implement right, I think. Oh yeah. But I mean, I was looking at the Stan implementation and the PyMC implementation a bit, to compare and make sure things actually work the same way.

So, no, that was actually really a lot of fun, and I really enjoyed working in Rust as well. It's definitely the largest project I ever did in Rust, and, uh, I really enjoyed that. Yeah. Okay. Well, that's cool. I didn't expect that answer. I thought you'd done a lot of banging your head against the walls.

Oh, yeah. I definitely had banging my head against walls. I mean, I think that's just a given for any software project, to be honest. Mm-hmm. And, uh, I don't really have this one thing that... Yeah. You didn't have, like, one big block. And, uh, I want to reassure the listeners, because they don't have the video, but, uh, your head looks good now. It looks like, yeah,

all the bumps from the wall seem [00:40:00] to have disappeared. Yeah. They healed nicely. I'm glad. Yes. Yeah. Maybe that's also why that project was so fun, because the bumps disappeared so fast. That's good. Okay. Before we switch to another project of yours, do you actually need any help on nutpie? Like, if yes, what can people help you with?

So nutpie itself feels relatively finished to me. If people have nice ideas, that's always great. And I mean, you can definitely clean up the code more. I think it's decent, I hope, but it's not like there's nothing left to do. But it is a library that has a relatively small scope: implement this one algorithm and do it nicely, hopefully.

So I don't think there's that much work to do on that library itself at this point, unless somebody wants to add more samplers. That might be really interesting. SMC would be... I'd be glad about something like that. [00:41:00] I guess where it could really use help, a couple of things, is the Numba backend.

So, first, just testing the new mass matrix adaptation on different models, to see if we can find something where it's worse, definitely worse, than the old implementation. That would be interesting to see. And then, from the implementation point of view, the Numba Ops that might be missing: still figure out which ones those actually are and implement them.

And test them, to make sure they actually do the right thing as well. That would be nice. So that's definitely something where I think lots of people could help, which would be great. Good. And so the way to contact you about that is to go to the GitHub repo of nutpie.

Yep. So if you run it with a PyMC model and you notice, okay, this Op doesn't work, or the compiler throws an error message or something, just open an issue. Simple as [00:42:00] that. And, uh, I'll be there to look at that. Good. Okay. Cool. And, um, yeah, actually, when you look at the examples for nutpie on the GitHub repo, there is a comment on the PyMC model example that says that this distribution should be a ZeroSumNormal.

And it's the perfect segue for the next topic, because you know that, uh, the Statistical Academy now calls that ZeroSumNormal, Adrian. Okay. So I guess that would be a contribution, maybe a PR that, uh, fixes that, especially since now pm.ZeroSumNormal actually exists, it's merged. You can even use that in the example.

Thanks to Alex, because I was... Yeah, I will. Yeah. Thanks. It was really a collective endeavor. So Luciano Paz helped me a lot. Andra, of course, as usual. I feel like Fido now is kind of, um, you know, an extension of the [00:43:00] repo. So, um, yeah, let's talk about that, the ZeroSumNormal. So yeah, I worked on the PR to merge it into PyMC, but you are actually the father of the distribution.

So yeah, can you tell us what it is and how you came up with the idea? First, I guess, about being the father of that distribution: I'm pretty sure I made that up on my own in a sense. I mean, as "on your own" as you can. I'm not at all sure I was the first person to make this up.

So somebody else... I think definitely on the Stan Discourse there were people discussing very similar things. Yeah. So no clue who was first there in any way. I don't want to... Maybe it's like, you know that story about Laplace, who rediscovered Bayes' formula, and [00:44:00] he thought he was the first one, and he was, like, super happy.

And, uh, and then... I think this is maybe actually a bit less important than that as well. Well, I'm not sure, we'll see what history says, you know. But yeah, then it was like, oh, actually, that has already been discovered. So yeah, he was quite sad about that. You know, that was actually quite reassuring to me.

A genius like Laplace could be sad sometimes, and depressed. You know, that was inspiring in some way. Yeah. But okay. Anyways, so yeah, now that we've done the usual caveats, can you tell us what that distribution is and how you came up with the idea, basically? Perfect, okay. So from a mathematical point of view, it's actually a really simple distribution in some sense.

It's just a multivariate normal distribution where one of the eigenvalues happens to be zero. So it's a weird covariance matrix, but it's just a constant [00:45:00] covariance matrix. In that sense, nothing special, move on. I guess where I started to think about this is in linear regressions. It's a well-known issue, with well-known solutions in a sense, where you have discrete predictors. So you might have a model where you have an intercept plus something in each US state or so, and you might want to have a different parameter for each US state.
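The "one eigenvalue is zero" remark can be checked numerically. The sketch below assumes the covariance has the form sigma² · (I − J/n), where J is the all-ones matrix; as far as I know this matches how the PyMC documentation describes ZeroSumNormal, but treat the exact form as an assumption. The helper name `zerosum_cov` is made up for the example.

```python
import numpy as np


def zerosum_cov(n, sigma=1.0):
    """Covariance of n values with a common scale sigma that are
    constrained to sum to zero (assumed form: sigma^2 * (I - J / n))."""
    return sigma**2 * (np.eye(n) - np.ones((n, n)) / n)


cov = zerosum_cov(4, sigma=2.0)
eigvals = np.linalg.eigvalsh(cov)  # sorted in ascending order

print(cov)
print("eigenvalues:", eigvals)
print("row sums:", cov.sum(axis=1))
```

The zero eigenvalue belongs to the direction (1, 1, ..., 1): the distribution has no variance along "shift every value by the same amount", which is exactly the degree of freedom that the intercept already covers.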

You have too many parameters in a sense, because you are overparametrizing your model. So you would like to get rid of one of the degrees of freedom, and how statisticians have often done this... And it's, sorry, I'm interrupting you, but yeah, it's overparametrized because once you know the behavior of 49 states, in that example, you would know the behavior of the 50th state without

having to compute it. [00:46:00] Yeah, I guess so. You have one parameter for each state, and you have the intercept, so you have... How many states are there? Sorry? 50. 50, okay. But then you have 51 parameters. So you have 50 states, so actually 50 output variables in a sense, but you have 51 parameters because you also have the intercept.

And a normal way of dealing with that is to just drop one of those states, in a sense, and say that the corresponding parameter is defined to be zero. So we'll basically just use the intercept value for predictions there. Now, if you do that in a frequentist setting, where you don't have any regularization, that's perfectly fine, in the sense that it doesn't matter which state you choose.

I mean, you get different numbers out of your model, but you get the same results. The different numbers just mean something else. But no matter which state you choose, you get the same meaning for your results. This is not true, however, if you do it in a Bayesian model, where you assume maybe that the values for the states come from a normal distribution, for instance. [00:47:00] Then suddenly it does matter which state you drop.
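This claim can be checked with a small numerical sketch. It uses a ridge penalty on the dummy coefficients as a stand-in for a zero-mean normal prior (the penalized least-squares solution is the corresponding MAP estimate); the data, the helper name `fitted_values` and the penalty value are all made up for the illustration.

```python
import numpy as np


def fitted_values(y, group, ref, n_groups, penalty):
    """Fit intercept + dummy coding with category `ref` dropped.

    `penalty` is a ridge penalty on the dummy coefficients only, which
    plays the role of a zero-mean normal prior on them.
    """
    kept = [g for g in range(n_groups) if g != ref]
    X = np.column_stack(
        [np.ones(len(y))] + [(group == g).astype(float) for g in kept]
    )
    P = penalty * np.eye(X.shape[1])
    P[0, 0] = 0.0  # leave the intercept unpenalized
    beta = np.linalg.solve(X.T @ X + P, X.T @ y)
    return X @ beta


group = np.array([0, 0, 1, 1, 2, 2])
y = np.array([1.0, 1.2, 3.0, 3.1, 8.0, 7.9])

# No regularization: the reference category does not change predictions.
ols_ref0 = fitted_values(y, group, ref=0, n_groups=3, penalty=0.0)
ols_ref2 = fitted_values(y, group, ref=2, n_groups=3, penalty=0.0)

# With a prior-like penalty: the reference category does matter.
map_ref0 = fitted_values(y, group, ref=0, n_groups=3, penalty=1.0)
map_ref2 = fitted_values(y, group, ref=2, n_groups=3, penalty=1.0)

print("same predictions without penalty:", np.allclose(ols_ref0, ols_ref2))
print("same predictions with penalty:   ", np.allclose(map_ref0, map_ref2))
print(map_ref0)
print(map_ref2)
```

With the penalty, every non-reference group is shrunk towards the reference group, so changing the reference changes what the groups are shrunk towards.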

So you don't really want to drop a state like you tend to do in, uh, frequentist regression, because then you introduce this weird, arbitrary choice that changes your results. I mean, in some settings that choice isn't arbitrary. So maybe one category is a control. With states, that doesn't really make much sense.

Yeah. But maybe for a drug trial. Yeah, exactly. Placebo, it's the control. And it makes sense to define that as zero, so that's fine. But in other cases it doesn't make sense. Yeah, and in a lot of cases, actually, you don't... So what you're talking about, I think, is called reference encoding, or like... Yeah, yeah.

Reference categorical encoding. And you use that also in multinomial regressions, where, well, now that you have several categories, the usual way of not having the overparametrization, which comes with a non-identifiability of some parameters, is to, as you were saying, [00:48:00] take one of the categories as the reference, or pivot. That's also called the pivot.

And so, pivot encoding, reference encoding, that's exactly what we're talking about: taking one of the categories as the reference. And a lot of the time it's quite hard to have a reference category in mind that really makes sense, unless it's really a placebo or something that it really makes sense to refer to all the time.

But this leads to the fact that often, in a Bayesian model, we actually have it overparametrized in some sense, which isn't really a problem in some sense, because everything works out fine because we have our priors. But it might slow down the sampler quite significantly in some cases.

And it also sometimes makes interpretation more difficult, I think. So then there was the idea, I mean, pretty similar to difference coding in the frequentist sense: can't we say we have parameters that tell us not [00:49:00] what the value is for each state, but parameters that tell us how much that

state differs from the mean of all the states? And, I mean, in this case, I mean the sample mean. Okay, for states there's no difference between the mean of all states and the sample mean of states, but let's ignore that difference for a moment. So, what's the difference to the mean for each individual state?

So you could say we have a parameter that tells us that some state has a larger than average value, and you have another parameter that tells us another is lower than average. So if there were only two states, you could say, actually, we just say what the difference between the states is. So we define a new distribution that gives us the values for both of them, but that distribution has one degree of freedom

fewer, because it just says, okay, it's required to sum to zero. So for two, one would then just be plus a value and the other would be minus that value. [00:50:00] For three values it's a bit more complicated, but it still works out similarly. And it turns out that you can then rewrite the original model, which is just intercept plus one parameter for each state, each from a normal distribution.

You can take any normal distribution, if you like, and split it into two parts: a zero-sum normal distribution and the sample mean distribution. So you could ask, if you have 50 states, what's the distribution of the mean of those states, and what's the distribution of the differences to the mean? And both of those distributions, you can make sure, if you write it down correctly, are normal distributions,

so multivariate normal in this case. And then you have the intercept plus this sample mean distribution. But those two are just normal distributions, so you can combine them into one normal distribution without really changing the model, if you like. You just adapt the standard deviation of that parameter a bit.
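The split described above can be simulated directly. This is a sketch with made-up numbers (50 "states", sigma = 2): independent normal values are separated into their sample mean and the deviations from that mean. The deviations sum to zero, the mean has variance sigma²/n, and the two parts are uncorrelated, which is what allows the mean to be folded into the intercept.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, sigma = 50, 2.0

# Many simulated sets of 50 independent N(0, sigma^2) state effects.
raw = rng.normal(0.0, sigma, size=(20_000, n_states))

# Split each set into its sample mean and the zero-sum remainder.
mean_part = raw.mean(axis=1, keepdims=True)
zerosum_part = raw - mean_part

var_of_mean = mean_part.var()
corr = np.corrcoef(mean_part[:, 0], zerosum_part[:, 0])[0, 1]

print("largest |sum| of a zero-sum part:", np.abs(zerosum_part.sum(axis=1)).max())
print("variance of the mean:", var_of_mean, "expected:", sigma**2 / n_states)
print("correlation between mean and first deviation:", corr)

# If the intercept has prior sd `sigma_intercept` (a made-up value here),
# intercept + mean is again normal with this standard deviation:
sigma_intercept = 1.0
combined_sd = np.sqrt(sigma_intercept**2 + sigma**2 / n_states)
print("sd of intercept + mean:", combined_sd)
```

The last lines are the "adapt the standard deviation a bit" step: the intercept absorbs the sample mean, and only the zero-sum part remains as per-state parameters.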

So you can write hierarchical models like that using just the zero-sum normal, [00:51:00] in the sense that it's just a reparametrization to make the sampler faster. But then, if you do that, I think there are many cases where it actually makes more sense to look at the parameters of the zero-sum normal.

So for the states, for instance, the number of states is fixed, it's a fixed set of states, and it doesn't really make sense to want to make predictions for new states coming into the US in the future or something like that. So it's not really a distribution for an infinite number of states, it's just a finite set of states, and it makes sense to say, okay, the mean of those has to be zero.

So sometimes, I think, depending on the context, it also makes it much easier to interpret results, if you just say, okay, how different is this state from the mean of all states? Yeah. Yeah. Because then your reference is changing. It's not one category, it's the mean [00:52:00] behavior of all the categories. So here it would be the mean of the...

And so, yeah, in a lot of cases that makes more sense. First, to interpret the model, that helps you understand: oh yeah, okay, so Alaska is really below average, whereas Massachusetts is way above. And also, that means that the results are not going to change, because you're not changing the reference category.

Whereas if you are using classic reference encoding and then you change the pivot, or the reference category, then your results are going to change, the parameter values are going to change. And also, maybe you'll have to change the priors. Yeah. And not just the values change, but also their meaning might change.

So you can really get different results, not just the same results written in a different way, but really something that's fundamentally different. Yeah. And so that's something that would be helpful to use. So anytime you [00:53:00] do linear regression with categorical predictors, for instance, so that's what you are talking about.

And also, as I was saying, multinomial regression. Well, there you're trying to infer the latent probabilities of a given number of categories, so here that's the same. And what I really love about that is that I do use quite a lot of multinomial models, and before, you had to do that pivot thing. And so imagine you have a multinomial regression, and in the regression you have a categorical predictor.

That means you have to do two pivots at some point. Yeah. And you drop a dimension each time you do a pivot, so you have to keep all of that in mind and in the code, and also you have to stack with Aesara, and it makes the code way more complicated. So in those cases, I would say, if you really need to use classic reference encoding, then use Bambi if you're in Python, because Bambi will [00:54:00] do that dropping and reference encoding for you, the pivoting.

But then, if you can use ZeroSumNormal because it makes more sense to you, what I love is that you can actually write the multinomial regression like you write a normal regression. It's just that instead of using pm.Normal, you're going to use pm.ZeroSumNormal. And that's super cool. It makes everything way easier.

So I think these are, I would say, the two main examples, and I have to work on a Jupyter Notebook example with Ben Vincent to go through at least two examples: linear regression with categorical predictors, and multinomial regression. So we'll get there, folks. At the time of recording, we haven't... well, Ben has started that, and Tomi Capretto; I just haven't looked at it at all.

Yeah, that's definitely something. I think it also gets a bit more interesting if you add interactions as well, because the ZeroSumNormal [00:55:00] implementation that we have now also works in multiple dimensions. So you can say it needs to sum to zero if you sum across one axis, and across a second axis it separately needs to sum to zero, or something.

Yeah. Which is really nice, uh, if you're working with interactions as well. Yeah. So in your example, for instance, there would be that common intercept. Then you have one parameter per state, where you have a ZeroSumNormal that sums to zero across states. Then you have one parameter per... what could we use here,

in the example, as a second categorical predictor? That could be age group. Yeah, that could be age group. Yeah, exactly. So then you have age groups, which here are categories again. And so you have the main effect of age groups, where you would use the ZeroSumNormal on, uh, on age this time. And then you could look at the interaction of age group and states.

So that means another [00:56:00] set of parameters. But here you would have to force the parameters on the interaction of age and state to be zero-sum normal, to be zero-sum across ages and across states. And so that will be in the example that I was talking about, folks, but that gives you a first idea of what that means.
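What "zero-sum across both axes" means for an interaction term can be shown with plain NumPy. The sketch below builds such an array by centering along each axis in turn. This only illustrates the constraint; it is not a claim about how PyMC implements the distribution internally. The sizes (5 states, 3 age groups) are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
n_states, n_ages = 5, 3

# Unconstrained values for every state and age group combination.
raw = rng.normal(size=(n_states, n_ages))

# Center along the state axis, then along the age axis.
interaction = raw - raw.mean(axis=0, keepdims=True)
interaction = interaction - interaction.mean(axis=1, keepdims=True)

print("sums across states:", interaction.sum(axis=0))
print("sums across ages:  ", interaction.sum(axis=1))

# Number of values that are still free after both constraints.
free_values = (n_states - 1) * (n_ages - 1)
print("free values:", free_values, "out of", n_states * n_ages)
```

Each constrained axis removes one degree of freedom along that axis, so a 5 by 3 interaction keeps 8 free values instead of 15.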

And yeah, the cool stuff is that, with the implementation we have in PyMC, you can just say pm.ZeroSumNormal and the usual stuff. So give a name, then your parameter, uh, your prior for sigma. Then you just say zerosum_axes equals two, and that means PyMC will understand. You just need to know what the two zero-sum axes are.

Basically, the dimensions that you... You could maybe also mention a case where it might not make sense to use the ZeroSumNormal. That might, for instance, be if you actually have a predictor where it makes sense to say, we can draw arbitrary elements from this. So it's not just a fixed, finite set.

Yeah. But it's a potentially infinite set, so, I don't know, patients or [00:57:00] something. Mm-hmm. I mean, that's not potentially infinite, but it's potentially very large. Yeah. So you could just draw new patients, or see new patients, and it doesn't really make sense to compare patients to the mean of the patients I have already seen.

You then want to compare to the mean of all patients there are, probably. Yeah. And then it makes sense that that's actually a parameter in the model. If that set is finite and fixed, then I think fixing it to something based on the mean of those makes more sense. Mm-hmm. And so if that set is not finite,

so, is using ZeroSumNormal here, yeah, the best idea? What would you use? Would you use classic reference encoding? Uh, no, I think that would probably... I mean, you can use the zero-sum normal in the reparametrization sense, where you just say, I just want to make the model sample faster, so I use this, but in the end I work out again what it would have been if I hadn't done that.

But you can just use a pm.Normal. I mean, [00:58:00] that's overparametrized in a sense, but I don't think there's really something wrong with that in that setting. Yeah. I mean, yeah, mathematically there isn't, so the problem could be in the sampling. But if you can get away with it in sampling, that works, because, well, NUTS is quite robust, especially if you use nutpie.

And so then you could get away with a model that's overparametrized, but actually just way easier to read and code, and then to understand. And then if you want to make predictions on a new patient, for instance, well, you can still do that. Cool. Nice. Yeah, I love that distribution. It's really interesting because, as you say, it's quite simple mathematically, but, uh, it still takes a bit of time to understand how it behaves, and how useful it is, and when it's useful.

I think it's definitely kind of a variation on well-understood things. So difference coding and dropping columns in linear regressions, all that definitely isn't new. That has been around, yeah, [00:59:00] I don't know how long. Long, yeah, for sure. But I think this is kind of a new take. New in the sense that I haven't seen it before, let's put it like that, but a new take

on that, and I think a useful way of looking at it. Okay. Time is flying by and, um, I still have a few questions for you. So actually, what I'd like to ask you is a bit more general. I don't know if you'll have an answer to that, but what do you think are the biggest hurdles right now in the Bayesian workflow?

Is there something you think would be very, very neat to have improved in the workflow? That's a good question. So I think figuring out sensitivity to some choices, for instance, would definitely be on a list of important things to look at. So in any model I have ever written, I just put in numbers at some point, somewhere, for priors, for instance. I mean, numbers for the standard deviation, [01:00:00] or which distribution do I choose. And, mm-hmm,

in many cases, it's hard to really investigate all of them and really carefully think about all of those. In a small model you can definitely do that and, um, try to get a feel for which ones are important. And often some of those priors really just don't matter at all. So if you have a large dataset, whether you put a half-normal prior with a standard deviation of five or 2.5 on your final sigma, that might just not make any measurable difference after sampling.

So it's kind of, yeah, figuring out which of those things that you put in and think, eh, they're probably not going to matter, actually do matter. That, I think, is an important point. So it would be great, for instance, and I don't know if that's the best way of doing that, if we could mark things [01:01:00] where we think, okay, that is a somewhat arbitrary choice, and I put in a number, but I could kind of mark that, okay, this is one that I could have chosen differently and wouldn't really know the difference.

Yeah. So that you could mark those in a model by using, I don't know, pm.HyperParameter or something, and then automatically have some method that helps you figure out which of those are actually important. Mm-hmm. So either by resampling or by looking at gradients, trying to, yeah, do a sensitivity analysis in some sense of how the posterior depends on those choices that I make there.
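A brute-force version of the "resampling" idea is simply to refit the model under two candidate priors and compare. The sketch below does that on a grid for a toy model, y ~ Normal(0, sigma) with a half-normal prior on sigma, using the two prior scales mentioned earlier (5 and 2.5). The model, the data and the helper names are assumptions made for the illustration; no sampler or PyMC feature is involved.

```python
import numpy as np


def posterior_mean_sigma(y, prior_sd, grid):
    """Posterior mean of sigma for y ~ Normal(0, sigma) with a
    HalfNormal(prior_sd) prior, computed on a grid."""
    n, ss = len(y), np.sum(y**2)
    log_post = (
        -n * np.log(grid)               # likelihood normalisation
        - ss / (2 * grid**2)            # likelihood exponent
        - grid**2 / (2 * prior_sd**2)   # half-normal prior
    )
    weights = np.exp(log_post - log_post.max())
    return np.sum(grid * weights) / np.sum(weights)


def prior_gap(y, grid):
    """How much the posterior mean moves between the two priors."""
    return abs(
        posterior_mean_sigma(y, 5.0, grid) - posterior_mean_sigma(y, 2.5, grid)
    )


grid = np.linspace(0.01, 30.0, 30_000)
rng = np.random.default_rng(5)

y_large = rng.normal(0.0, 1.0, size=5000)  # large dataset
y_small = np.array([-4.0, 1.0, 5.0])       # three observations

gap_large = prior_gap(y_large, grid)
gap_small = prior_gap(y_small, grid)
print("shift in posterior mean, large dataset:", gap_large)
print("shift in posterior mean, small dataset:", gap_small)
```

With thousands of observations the choice between the two priors is invisible, while with three observations it visibly moves the estimate. An automated tool would have to detect the second situation without refitting every model by hand.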

Mm-hmm. And have that semi-automated at least, so that I can more easily get ahold of which are the important things, which priors I actually need to think about, and which ones are just, yeah, who cares, doesn't make a difference. Yeah. Yeah, that'd be cool for sure. That's definitely also something that, yeah, I agree, is hard [01:02:00] to handle in a principled... Yeah.

In a principled way. Yeah. Like definitely, and I mean, it ties back to what we're talking about at the beginning about how do you choose the sickness for any given individual parameter. Right. Whereas it makes way more sense often to just say, okay, that's the, the amount of variation I expect from that whole phenomenon.

And here is how I think it is distributed among the parameters. I don't know. Exactly. Then another thing that I think is important, mostly for highly non-linear models: let's say you have something with an ODE. That's something my wife actually does at the university, in her postdoc, where it comes up a lot that you have an ODE and you don't really think that this ODE exactly describes the data-generating process.

Mm-hmm. How can you put error models in there? So, I mean, the whole setting where [01:03:00] I don't actually think my model is the model and something else is going on. Mm-hmm. And then you kind of have this, okay, Bayesian statistics will just assume that the model you specify is exactly the right model and give you exact, perfect answers for that model.

But if that's wrong, then yeah, well, what does the output mean? And definitely in highly non-linear models you can easily see examples where this just goes horribly wrong, and it's really hard to fix. So the question is how we can get uncertainty about more complicated processes into the model, about how exactly they may work, and make the uncertainty estimates more robust to changes to the model.
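The failure mode described here can be illustrated with a toy case. This is a sketch under assumptions, not an example from the episode: the "real" process (decay toward a baseline of 0.2), the assumed ODE dy/dt = -k y with its closed-form solution, the known noise level, and the flat prior on k are all made up for illustration. The posterior for k comes out narrow and excludes the value used to simulate the data, and the residuals are strongly autocorrelated, which is one simple signal that the model is not the data-generating process.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0.0, 5.0, 50)
true_k, noise_sd = 1.0, 0.02

# Hypothetical "real" process: decay toward a baseline of 0.2 that the model ignores.
y = 0.2 + 0.8 * np.exp(-true_k * t) + rng.normal(0.0, noise_sd, size=t.size)

# Assumed model: dy/dt = -k * y with y(0) = 1, so y(t) = exp(-k * t).
k_grid = np.linspace(0.1, 3.0, 3000)
pred = np.exp(-np.outer(k_grid, t))  # shape (grid points, time points)
log_post = -0.5 * np.sum((y - pred) ** 2, axis=1) / noise_sd**2  # flat prior on k
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Central 95% interval and posterior mean from the grid posterior.
cdf = np.cumsum(post)
k_lo = k_grid[np.searchsorted(cdf, 0.025)]
k_hi = k_grid[np.searchsorted(cdf, 0.975)]
k_mean = float(np.sum(post * k_grid))

# Lag-1 autocorrelation of the residuals; near 0 would be expected for a well-specified model.
resid = y - np.exp(-k_mean * t)
r = resid - resid.mean()
lag1 = float(np.sum(r[:-1] * r[1:]) / np.sum(r**2))

print(f"95% interval for k: [{k_lo:.3f}, {k_hi:.3f}], simulated with k = {true_k}")
print(f"lag-1 residual autocorrelation: {lag1:.2f}")
```

The check only detects the problem. It does not answer the harder question raised here of how to build an error model that makes the uncertainty estimate robust.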

Mm-hmm. That's more of something where, yeah, I'm not actually sure how to approach that, to be honest. Yeah, no, for sure. I mean, that reminds me, I think, of the software-assisted Bayesian workflow, you know, [01:04:00] that Aki Vehtari et al. are working on and talked about on this podcast. And I mean, it's a very active area of research.

So, yeah. I mean, posterior predictive checking definitely is an important thing there, where you can then easily notice that something is wrong, at least. Yeah. But that doesn't necessarily make it that easy to fix. No, no, for sure. For sure. And yeah, for the record, I think that pm.careful is a really good name for that kind of automated method you were talking about.

It'd be cool. I'd like to use pm.careful. Okay. Well, so before asking you the last two questions, I'm actually wondering, kind of a fun question, but is there a method or type of model that you particularly like and would like to tell us about here? Ah, I dunno, actually just [01:05:00] plain old linear regressions.

I think they're a family of models that seems really simple and easy to understand, but actually, I mean generalized linear models, I guess, so the more general version, is surprisingly useful. Unreasonably useful, I think. It really looks simple, it seems like you could learn it pretty quickly, but actually learning all the details and making sure you can use it in different settings is really tricky.

And yeah, I think that's just in general an interesting thing. Yeah. Yeah, I agree. I really love generalized linear regressions. I mean, they're kind of, you know, the illustration of the Pareto principle in statistics. It sounds like 20% of the models, but you get 80% of the results thanks to those.

It's really cool. And also, most [01:06:00] things that are models seem to kind of be, okay, just a regression with some twist in it somewhere. There's often a twist where it's not the standard give-me-a-regression thing. Yeah. Other than that twist, it's still a really, really useful building block that you can do a lot with.
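As one concrete reading of "a regression with some twist", here is a minimal sketch of a Poisson regression, where the twist is a log link and count-valued data. The simulated data, the coefficient values, and the helper `fit_poisson_glm` are assumptions for illustration, and the fit uses plain maximum likelihood with Newton steps rather than the Bayesian sampling discussed in the episode.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25):
    """Maximum-likelihood fit of y ~ Poisson(exp(X @ beta)) by Newton's method.

    Assumes the first column of X is the intercept.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start the intercept at the log of the mean count
    for _ in range(n_iter):
        mu = np.exp(X @ beta)            # expected counts under the current beta
        grad = X.T @ (y - mu)            # gradient of the log-likelihood
        hess = X.T @ (X * mu[:, None])   # negative Hessian (Fisher information)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=2000)
X = np.column_stack([np.ones_like(x), x])  # intercept plus one predictor
true_beta = np.array([0.5, 1.2])           # made-up coefficients
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, y)
print("estimated coefficients:", beta_hat)
```

The linear predictor `X @ beta` is the same building block as in ordinary linear regression. Only the link function and the likelihood change.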

Yeah. Yeah, I agree. It's like, you know, a traditional cooking recipe where each time you make it, it's like, yeah, that stuff's good. That works. Kind of like pizza, you know? It's like, yeah, pizza works most of the time, you just love it. You try to do other stuff, some new fancy fusion stuff in a Michelin-star restaurant, but yeah, just pizza will get you a long way.

Okay. Maybe a model, the pizza of statistics. Yeah. Oh, that's good. That's good. Should be, it could be. Yeah. So I was recording an episode the other day with an ex-fighter pilot [01:07:00] of the Canadian Army. Your episode's gonna air before that, so for listeners it's actually a teaser. And I realized that, I mean, it would be so cool to have a movie about Bayesian statistics, right?

Like, it would make it so sexy. So definitely we need a movie about Bayes stats, and that should be the name of the movie, probably: Regression Is the Pizza of Statistics. Such a bad name. Yeah. Okay. Which projects are you most excited about for the coming months? Do you have anything you are working on right now that you are super excited about?

Yeah, that would be Nutpie, I think. And just in general the mass matrix adaptation changes. I also kind of like the mathematical framework that I came up with for that, which, mm-hmm, which I'm horribly slow with writing down properly, which I just, yeah. Anyway. Yeah. But I'm really, really excited about that.

I think that [01:08:00] was a lot of fun to work on. Cool. I bet. I can hear it again, I can hear the excitement. Okay. So before letting you go, though, I'm gonna ask you the last two questions I ask every guest at the end of the show. So, first one: if you had unlimited time and resources, which problem would you try to solve?

Yeah, probably, I mean, getting the mass matrix adaptation, yeah, getting the samplers more robust, I think. I knew it. That's probably it. Yeah. I love that answer. You're definitely the first one to say that. So now I guess it shows that I'd have to actually listen to your podcast more often, because I don't really know what most people answer there.

Sorry about that. I guess I have some catching up to do. That's good. That means you were not anchored at all. So that's perfect. That's a truly independent sample here, like, no autocorrelation with the rest of the samples. Any problem within statistics, [01:09:00] that's how I was answering that.

Not kind of a worldwide problem. There, I think I might have other priorities, to be honest. Okay. Okay. I see what you did here. That's not gonna fly, but I see what you're trying to do. So, second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be? That's also a hard question.

Someone like Euclid would actually be pretty interesting, assuming there's no language barrier, because otherwise I'd definitely go for somebody German or English or British or whatever. Yeah, I think that would be interesting. I mean, I don't know if he was actually a nice guy. I don't think we actually know that much about him personally.

Yeah, I don't think we do. But to have this axiomatic way of thinking about mathematics 2,000 or something years ago, that I think is actually really fascinating, and it might be fun to kind of figure out how he thinks about how math developed, for instance. Mm-hmm. Yeah. Also kind of in [01:10:00] general.

I guess I like antiquity. Interesting period. So even apart from math and science and stuff, there might be things to ask. Yeah, for sure. That must be fun. And the cool thing is that if he turns out to be a jerk, well, after the dinner he would probably die because of all the germs that you gave him and that he has no immunity against. Hopefully it goes in that direction.

I'm not entirely sure, but yeah, true. Oh, true. Yeah. I mean, it's more probable that it goes in that direction than the other, but yeah, you never know. Yeah, you never know. Vielen Dank. I would've liked to ask so many more questions, but it's already been a long time, so let's call it a show. As usual, I put resources and a link to your different projects in the show notes for those who wanna dig deeper.

Thank you again, Adrian, for taking the time and being on this show. Thank you. That was a lot of fun. This has been another episode of Learning Bayesian [01:11:00] Statistics. Be sure to rate, review, and subscribe to the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes that will help you reach true Bayesian state of mind.

That's learnbayesstats.com. Our theme music is Good Bayesian, by Baba Brinkman, feat MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats.

Thanks so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian, change calculations after taking fresh data in. Those predictions that your brain is [01:12:00] making,

let's get them on a solid foundation.
