Learning Bayesian Statistics

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

What’s the difference between MCMC and Variational Inference (VI)? Why is MCMC called an approximate method? When should we use VI instead of MCMC?

These are some of the captivating (and practical) questions we’ll tackle in this episode. I had the chance to interview Charles Margossian, a research fellow in computational mathematics at the Flatiron Institute, and a core developer of the Stan software.

Charles was born and raised in Paris, and then moved to the US to pursue a bachelor’s degree in physics at Yale university. After graduating, he worked for two years in biotech, and went on to do a PhD in statistics at Columbia University with someone named… Andrew Gelman — you may have heard of him.

Charles also specializes in pharmacometrics and epidemiology, so we also talked about some practical applications of Bayesian methods and algorithms in these fascinating fields.

Oh, and Charles’ life doesn’t only revolve around computers: he practices ballroom dancing and pickup soccer, and used to do improvised musical comedy!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar and Matt Rosinski.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

Links from the show:

Abstract

by Christoph Bamberg

In episode 90 we cover both methodological advances, namely variational inference and MCMC sampling, and their application in pharmacometrics.

And we have just the right guest for this topic – Charles Margossian! You might know Charles from his work on Stan, his workshop teaching, or his current position at the Flatiron Institute.

His main focus now is on two topics: variational inference and MCMC sampling. When is variational inference (or approximate Bayesian methods) appropriate? And when does it fail? Charles answers these questions convincingly, clearing up some discussion around this topic.

In his work on MCMC, he tries to answer some fundamental questions: How much computational power should we invest? When is MCMC sampling more appropriate than approximate Bayesian methods? The short answer: when you care about quantifying uncertainty. We even talk about what the R-hat diagnostic actually measures and how to improve on it with the nested R-hat.

After covering these two topics, we move to his practical work: pharmacometrics. For example, he has worked on modelling the speed at which drugs dissolve in the body and on the role of genetics in how drugs act.

Charles also contributes to making Bayesian methods more accessible for pharmacologists: he co-developed the Torsten library for Stan, which facilitates Bayesian analysis of pharmacometric data.

We discuss the nature of pharmacometric data and how it is usually modelled with Ordinary Differential Equations. 

In the end we briefly cover one practical example of pharmacometric modelling: the Covid-19 pandemic.

All in all, episode 90 is another detailed one, covering many state-of-the-art techniques and their application.

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.

[Alex Andorra]:

Charles Margossian, welcome to Learning Bayesian Statistics.

[Charles Margossian]:

Hi Alex, thanks for having me.

[Alex Andorra]:

Yeah, thanks a lot for taking the time. I'm super happy to have you here.

[Alex Andorra]:

It's been a while since I wanted to have you on the show and now I managed to

[Alex Andorra]:

find the slot and thank you also to a few patrons who sent me messages

[Alex Andorra]:

to tell me that they would really like to hear about you on the show. So thanks

[Alex Andorra]:

a lot, folks, for being so proactive and giving me ideas for the show. So Charles,

[Alex Andorra]:

let's dive in and

[Charles Margossian]:

But

[Alex Andorra]:

as

[Charles Margossian]:

maybe

[Alex Andorra]:

usual...

[Charles Margossian]:

I can bring up a very quick anecdote, which

[Alex Andorra]:

Oh sure.

[Charles Margossian]:

is I think two, three weeks ago, your show came up. And I told my

[Charles Margossian]:

colleagues, I would feel like I have made it in the Bayesian statistics

[Charles Margossian]:

world if I get an invitation to speak on learning Bayesian statistics. So

[Charles Margossian]:

I'm thrilled to be here, and I think it's the mysterious patrons

[Charles Margossian]:

who have incited this meeting.

[Alex Andorra]:

Yes, I'm sure they will recognize themselves. So, yeah, let's start with your

[Alex Andorra]:

origin story, actually. That's also something I found super interesting. How

[Alex Andorra]:

did you come to the world of statistics and pharmacometrics and epidemiology?

[Charles Margossian]:

Mm-hmm. Right. So I became interested in statistics and data as an undergrad.

[Charles Margossian]:

Actually,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

I was studying physics. I was already in the US. I was at Yale and

[Charles Margossian]:

I was working with an astronomy lab

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

on exoplanets. We had a lot of data.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

So that generally got me interested in data science. But I wasn't

[Charles Margossian]:

introduced to Bayesian methods. And I always was a bit uncomfortable

[Charles Margossian]:

with how we were handling uncertainty, what we might do about it. And I was fortunate

[Charles Margossian]:

after I graduated to actually get a job at a, at a biotech company where,

[Charles Margossian]:

so the company was Metrum Research Group. They were

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

based in Connecticut, and the supervisor I got there, William Gillespie,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

is one of the pioneers of Bayesian methods in pharmacometrics. And

[Charles Margossian]:

so he introduced me to Bayesian statistics and it made a lot of sense.

[Charles Margossian]:

And I realized, oh, this is what I have been looking for when I was

[Charles Margossian]:

doing astronomy. Right. And I started, you know, piecing this together.

[Charles Margossian]:

And to be fair, I hadn't done a lot of statistics before. I don't have

[Charles Margossian]:

the experience of struggling with classical statistics for a decade before

[Charles Margossian]:

being rescued by Bayesian statistics. I encountered Bayesian very early

[Charles Margossian]:

on. But even then, the way we quantify uncertainty, the fact that

[Charles Margossian]:

everything reduced to essentially one equation was extremely compelling

[Charles Margossian]:

to me. And Bill Gillespie hired me

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

at Metrum Research Group to work on Stan. So this was back in 2015.

[Charles Margossian]:

And at the time, Stan was fairly new, very exciting, very promising.

[Charles Margossian]:

And you had these specialized software in pharmacometrics, but they were

[Charles Margossian]:

not open source. They were not really supporting Bayesian statistics,

[Charles Margossian]:

or at least that was not their priority. And they didn't give you flexibility

[Charles Margossian]:

that a lot of my colleagues were looking for. On the other hand, you had

[Charles Margossian]:

Stan, which was open source, which had great algorithms to do Bayesian modeling,

[Charles Margossian]:

but which lacked a lot of the features required to do pharmacometrics

[Charles Margossian]:

modeling, right? One example was support for differential equation solvers.

[Charles Margossian]:

And all these models are based

[Alex Andorra]:

Yeah.

[Charles Margossian]:

on ODEs. And at the time, Stan had limited support for ODEs and more

[Charles Margossian]:

generally implicit functions.

[Charles Margossian]:

And then there were some more specialized things like handling, you

[Charles Margossian]:

know, the event schedule of clinical trials. And the project ended up

[Charles Margossian]:

being, we're going to write some general features, we're going to

[Charles Margossian]:

contribute them to Stan, and then we're going to write a specialized

[Charles Margossian]:

extension called Torsten. And that's going to have, you know, more bespoke

[Charles Margossian]:

features targeted at a pharmacometrics audience. And so... That was

[Charles Margossian]:

my exposure to Stan. I can tell you about my first pull request, which

[Charles Margossian]:

Bob Carpenter and Daniel Lee reviewed

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

extensively. They were very patient with my C++. And

[Alex Andorra]:

Hehehe

[Charles Margossian]:

my plan, my loose plan had been, well, I'm gonna work for one or two

[Charles Margossian]:

years. I'm gonna learn programming and statistics because I had gained

[Charles Margossian]:

an appreciation for it in astronomy. And then I'm going to do a PhD

[Charles Margossian]:

in physics. But Andrew Gelman, who I had met in an official capacity

[Charles Margossian]:

like through my work, encouraged me to apply to the statistics program

[Charles Margossian]:

at Columbia University, saying, you know, if you do statistics, you'll

[Charles Margossian]:

still be able to work on the natural sciences and the physics that

[Charles Margossian]:

you're interested in, but you

[Alex Andorra]:

Yeah.

[Charles Margossian]:

know, you'll have, maybe you'll be able to make a more unique contribution

[Charles Margossian]:

as a statistician. So I took his word for it. And yeah, and once

[Charles Margossian]:

I had the offer from Columbia, I mean, it was a difficult decision. Um, but

[Charles Margossian]:

I decided, okay, I'm going to do a PhD in statistics. And that was a big

[Charles Margossian]:

change of field. Um, but I did pursue that. I was able to continue

[Charles Margossian]:

working on Stan and on Stan adjacent, uh, projects. And just a year

[Charles Margossian]:

ago, I completed the PhD. So I would say that's the origin story. Um,

[Charles Margossian]:

that eventually led me to the Flatiron Institute. So that's where

[Charles Margossian]:

I'm currently, it's in New York. We're a non-profit. We focus on applying

[Charles Margossian]:

computational methods to the basic sciences.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

So big emphasis on collaboration. We have people who do astronomy, who do quantum

[Charles Margossian]:

physics, do biology. And so that really resonates with why I wanted

[Charles Margossian]:

to do statistics in the first place, which is that I wanna solve problems in

[Charles Margossian]:

the sciences.

[Alex Andorra]:

Yeah, I can really relate to that. That was also kind of what happened to

[Alex Andorra]:

me too, even though the first field was political science and not astronomy.

[Charles Margossian]:

Okay.

[Alex Andorra]:

But yeah,

[Charles Margossian]:

Yeah.

[Alex Andorra]:

definitely in the end, applying the methods became more interesting than the

[Alex Andorra]:

field in itself. So continued

[Charles Margossian]:

Right,

[Alex Andorra]:

on that path.

[Charles Margossian]:

right, and I think that, you know, the attitude I had, so I had offers

[Charles Margossian]:

to do a PhD in physics, a PhD in statistics, and a bunch of different

[Charles Margossian]:

fields. And at the time, my attitude was, look, wherever I go, I'm gonna

[Charles Margossian]:

have to do, you know, analyze some data, understand where that data

[Charles Margossian]:

comes from. I'm gonna have to, you know, have the tools to do the

[Charles Margossian]:

analysis or do the statistics.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

So the idea is, I think our loyalty is not to a discipline or to a

[Charles Margossian]:

field, it's really to a problem.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Whatever

[Alex Andorra]:

Yeah.

[Charles Margossian]:

problem we work on, there's always a lot that we need to learn. We're never

[Charles Margossian]:

experts in a new problem, in a new research problem. And so kind of like,

[Charles Margossian]:

try not to take the field too seriously. I mean, whether it's for the PhD or even,

[Charles Margossian]:

right now I'm in computational mathematics, right? You know, whatever that

[Charles Margossian]:

means. The point is that I care more about the problems I work on than the department

[Charles Margossian]:

I'm affiliated with.

[Alex Andorra]:

Yeah, yeah, completely, I completely relate to that for sure. I like the fact of

[Alex Andorra]:

using the method to solve an interesting problem, whether it is about astronomy,

[Alex Andorra]:

political science, biology,

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

epidemiology, it's like, and in the end, I find that even more interesting

[Alex Andorra]:

because you get to work on a variety of topics that you wouldn't have otherwise.

[Alex Andorra]:

And also you

[Charles Margossian]:

Right.

[Alex Andorra]:

learn so much. So that's really

[Charles Margossian]:

Yeah,

[Alex Andorra]:

the cool thing.

[Charles Margossian]:

yeah.

[Alex Andorra]:

Yeah. Thanks a lot for this introduction. And so right now, well, these

[Alex Andorra]:

days you're working at the Flatiron Institute, as you were saying, and it seems

[Alex Andorra]:

to be a very diverse organization. We had Bob Carpenter actually

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

recently on the show, so I'm going to link to his episode in the show notes.

[Alex Andorra]:

But you, Charles, what are the topics you are... particularly interested in

[Alex Andorra]:

these days at the Flatiron.

[Charles Margossian]:

Yeah, I think that I have two pulls right now. They are

[Charles Margossian]:

a bit on the methodology side

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

of things. And one of them is variational inference, which is a form

[Charles Margossian]:

of what's called an approximate Bayesian method. And really

[Charles Margossian]:

what it's... What I'm trying to understand is when should we use

[Charles Margossian]:

an approximate method like variational inference, because it's incredibly

[Charles Margossian]:

popular. It's used in a lot of fields of machine learning.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And it's used a lot of times in artificial intelligence. And yet you

[Charles Margossian]:

have so many smart, brilliant people who are completely distrustful of variational

[Charles Margossian]:

inference. And it's also not difficult to construct an example

[Charles Margossian]:

where it really doesn't do the job, where it really fails in a spectacular

[Charles Margossian]:

way. And I think that ultimately it depends on the problem you apply

[Charles Margossian]:

it to. And what we need to do is understand, you know, when can we

[Charles Margossian]:

get away with the approximations that variational inference proposes to do?

[Charles Margossian]:

Why does it sometimes work really well? Or why do we sometimes really

[Charles Margossian]:

get punished for using variational inference? And I can give you a

[Charles Margossian]:

very simple example of that,

[Alex Andorra]:

Yeah.

[Charles Margossian]:

which was, so this was a recent work with Lawrence Saul, who is

[Charles Margossian]:

also at the Flatiron Institute. He kind of does machine learning there. And

[Charles Margossian]:

there's a statement that goes around, which is to say, well, variational

[Charles Margossian]:

inference will give you good estimates of the expectation value for

[Charles Margossian]:

a target distribution. So that's great. We often care about expectation

[Charles Margossian]:

values, whether

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

that's in statistical physics or in Bayesian statistics.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

But on the other hand, it will underestimate uncertainty.

[Alex Andorra]:

Mm-hmm. Okay.

[Charles Margossian]:

Okay, well, what does it mean to underestimate uncertainty? How do

[Charles Margossian]:

we, you know, how do we make sense of a statement like that, which

[Charles Margossian]:

has appeared time and time again, over the last two decades.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And really we realized that there were two measures of uncertainty

[Charles Margossian]:

that seem to come up again and again, one is the marginal variances and

[Charles Margossian]:

the other one is the entropy

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

and so people like the entropy because it's a multivariate notion

[Charles Margossian]:

of uncertainty and it's a generalization of the marginal variance.

[Charles Margossian]:

So when people say variational inference underestimates uncertainty, they

[Charles Margossian]:

usually mean you're underestimating the marginal variances and you're underestimating

[Charles Margossian]:

the entropy.

[Alex Andorra]:

Okay.

[Charles Margossian]:

And so what we ended up doing is demonstrating this on the Gaussian

[Charles Margossian]:

case, right? You have a Gaussian target, you're approximating it

[Charles Margossian]:

with a Gaussian with a diagonal covariance matrix. So that's called

[Charles Margossian]:

the factorized approximation or the mean field approximation.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And indeed you underestimate the marginal variance and you underestimate

[Charles Margossian]:

the entropy. Here's where it gets interesting, which is as you start

[Charles Margossian]:

going to higher dimensions. And for some reason, I don't think people,

[Charles Margossian]:

people have only looked at, you know, really the two dimensional case,

[Charles Margossian]:

because that's the figure that fits on that page. Right. But if you start

[Charles Margossian]:

going to higher dimensions and you take limits where the dimension goes

[Charles Margossian]:

to infinity, you can construct examples where you actually get very, very

[Charles Margossian]:

accurate estimates of the entropy, but you are underestimating the

[Charles Margossian]:

marginal variances in every dimension in an arbitrarily bad manner.
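
A minimal numerical sketch of the Gaussian case Charles describes, assuming a NumPy environment; the equicorrelated target below is an illustration chosen here, not the construction from the paper. It shows the reverse-KL mean-field solution keeping the means but shrinking every marginal variance, while the entropy stays comparatively much closer:

```python
import numpy as np

# Mean-field (factorized) Gaussian approximation of a correlated Gaussian target.
# For a target N(mu, Sigma), the reverse-KL-optimal factorized approximation has
# the correct means and per-coordinate variances 1 / Lambda_ii, where
# Lambda = Sigma^{-1}; these are always <= the true marginal variances Sigma_ii.
d, rho = 50, 0.9                                   # dimension, pairwise correlation
Sigma = (1 - rho) * np.eye(d) + rho * np.ones((d, d))
Lambda = np.linalg.inv(Sigma)

true_marginal_var = np.diag(Sigma)                 # all equal to 1 here
vi_marginal_var = 1.0 / np.diag(Lambda)            # badly shrunk when rho is large

def gaussian_entropy(cov):
    """Differential entropy of a multivariate Gaussian with covariance cov."""
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

print("marginal variance, true vs mean-field:",
      true_marginal_var[0], vi_marginal_var[0])
print("entropy, true vs mean-field:",
      gaussian_entropy(Sigma), gaussian_entropy(np.diag(vi_marginal_var)))
```

With rho = 0.9 the marginal variances come out roughly a factor of ten too small while the two entropies are much closer in relative terms; the precise high-dimensional limits Charles refers to are worked out in the paper with Lawrence Saul.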

[Charles Margossian]:

And so the two notions of uncertainty are not at all equivalent and

[Charles Margossian]:

not at all interchangeable. And what ends up happening is you look at fields

[Charles Margossian]:

where variational inference is applied. So

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

for example, in statistical physics, where, where people want to estimate

[Charles Margossian]:

the entropy of an Ising model, for example, and that's

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

where this factorized, this mean field approximation comes

[Alex Andorra]:

Okay.

[Charles Margossian]:

from. Well, here it works just fine. You actually get good estimates

[Charles Margossian]:

of the entropy, right? In certain limits. In machine learning, where

[Charles Margossian]:

you're trying to maximize marginal likelihoods, rather than do full Bayes,

[Charles Margossian]:

actually, you know, you get good estimates of those marginal likelihoods.

[Charles Margossian]:

At least that's our working conjectures. But in

[Alex Andorra]:

Okay.

[Charles Margossian]:

Bayesian statistics, where we have interpretable quantities, and we know

[Charles Margossian]:

those marginal variances mean something for those parameters that have

[Charles Margossian]:

a meaning. Well, here, variational inference might really not work, or at

[Charles Margossian]:

least not vanilla implementations of it. And that's an example of how, by studying

[Charles Margossian]:

an example, we can start understanding why is it that some people are

[Charles Margossian]:

so enthusiastic about variational inference and other people are so

[Charles Margossian]:

distrustful of it. It really depends on what is the measure of uncertainty

[Charles Margossian]:

that you care about, and that in turn is informed by the problem you

[Charles Margossian]:

want to solve. So I bring this up as an archetype of the work that we're

[Charles Margossian]:

trying to do.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

We wanna understand this method inside and out.

[Alex Andorra]:

Mmm. Yeah, that's really interesting.

[Charles Margossian]:

First big topic.

[Alex Andorra]:

Yeah.

[Charles Margossian]:

The other big topic is I still do a lot of MCMC.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

I care a lot about MCMC because as we'll see in pharmacometrics, I

[Charles Margossian]:

really think that is our best tool to solve many problems.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And here I'm trying to understand some more fundamental questions,

[Charles Margossian]:

right? So people often say, well, you know, yeah, MCMC is great but

[Charles Margossian]:

it's too computationally expensive. That's why we'll

[Alex Andorra]:

Yeah.

[Charles Margossian]:

use approximate methods. And one thing I will argue is actually it's

[Charles Margossian]:

computationally expensive, but we don't really have a very good sense

[Charles Margossian]:

of how much computation we should throw at MCMC. Because ultimately

[Charles Margossian]:

we have three fundamental tuning parameters. One is the number of chains

[Charles Margossian]:

that we use. The other one is how long is the warmup or the burn-in

[Charles Margossian]:

phase? And then the third one is how long is the sampling phase? And

[Charles Margossian]:

actually,

[Charles Margossian]:

it's not clear what is the optimal computation that you should throw

[Charles Margossian]:

at an MCMC problem. I think people rely on heuristics. More often,

[Charles Margossian]:

they rely on conventions. These are the defaults in Stan, in PyMC, in

[Charles Margossian]:

TensorFlow probability. But actually, you know, we need to think about

[Charles Margossian]:

have we used a warm up phase that's too long or too short or

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

sampling phase that's too long or too short, right?

[Alex Andorra]:

Yeah.

[Charles Margossian]:

Or how many chains is it useful to do? Especially now we have GPUs,

[Charles Margossian]:

right? In theory, I could run MCMC with 10,000 chains. And people do

[Charles Margossian]:

that now, right? I mean, some people do that. It selects you do that

[Charles Margossian]:

now, right? But what are the implications of doing that, right? I'll

[Charles Margossian]:

tell you an implication. If I have 10,000 chains, that I'm running on

[Charles Margossian]:

a GPU, then I think the sampling phase can be one iteration per chain.

[Charles Margossian]:

We can discuss this and we can argue for or against it, but it changes

[Charles Margossian]:

a little bit our perspective of how much computation we throw at it. Another

[Charles Margossian]:

perspective I like is if you have a strict computational budget,

[Charles Margossian]:

and then you say, well, I'm not going to use MCMC, it's too expensive.

[Charles Margossian]:

Let me run variation inference. I know it's biased. I know it's approximate,

[Charles Margossian]:

but at least it, it finishes running in, you know, within 10 minutes and

[Charles Margossian]:

like, well, run 10 minutes of MCMC. It'll be biased. It'll be approximate

[Charles Margossian]:

and it will finish running in 10 minutes. And then ask yourself, well,

[Charles Margossian]:

how good is this estimate? thinking a little bit more carefully about,

[Charles Margossian]:

you know, how much computation do we really need for MCMC and for the different

[Charles Margossian]:

problems that we might be trying to solve.

[Alex Andorra]:

Mm-hmm.

[Alex Andorra]:

Yeah, that's really interesting. Thanks a lot for that very clear presentation.

[Alex Andorra]:

I really love it. It's actually continue on that path because I wanted to ask

[Alex Andorra]:

you about that a bit later in the show anyways. Yeah, several things that

[Alex Andorra]:

bumped into my mind. First thing is MCMC is also an approximation method. So...

[Alex Andorra]:

Why

[Charles Margossian]:

Yes.

[Alex Andorra]:

do we say, why do we, and I know we usually do that in the field, we define

[Alex Andorra]:

variational inference as an approximation method, which kind of implies,

[Alex Andorra]:

assume that MCMC is not, but it is. So can you maybe draw the distinction?

[Alex Andorra]:

What makes the difference between the two methods? And why do we call it an approximation

[Alex Andorra]:

for variational inference?

[Charles Margossian]:

Yeah, absolutely. So I think that people think, like a lot of statisticians,

[Charles Margossian]:

asymptotically. So, asymptotically in what sense? When

[Charles Margossian]:

you run a single chain for an infinite number of iterations, MCMC

[Charles Margossian]:

is not only gonna generate samples from the stationary distribution, which

[Charles Margossian]:

oftentimes is the posterior distribution, but also Monte Carlo estimators

[Charles Margossian]:

with arbitrary precision. Your Monte Carlo estimator will converge to the true

[Charles Margossian]:

expectation value or the true variance or whatever it is you're trying

[Charles Margossian]:

to estimate. Whereas with variational inference, here we have to be a

[Charles Margossian]:

little bit careful because what does it mean for the asymptotics of variational

[Charles Margossian]:

inference? So you might say, okay, I'm gonna run the optimization for

[Charles Margossian]:

an infinite number of iterations. So let's assume that the optimizer

[Charles Margossian]:

does converge. And actually until recently, this was not really shown.

[Charles Margossian]:

There's a recent pre-print that I know Robert Gower and Justin Domke

[Charles Margossian]:

have been working on where they actually showed that, yes, under certain

[Charles Margossian]:

conditions, stochastic optimization will converge for variational inference.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

But then even if it doesn't converge, you say, well, let me think about

[Charles Margossian]:

the approximation that minimizes my objective function. So oftentimes

[Charles Margossian]:

the Kullback-Leibler divergence. That's what I get asymptotically. Well, that

[Charles Margossian]:

will still not be in general, my target distribution.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Even asymptotically, I'm still approximate. I think that's why people

[Charles Margossian]:

draw this distinction between MCMC and variational inference. It's

[Charles Margossian]:

really in the asymptotic sense that MCMC is an exact method, whereas

[Charles Margossian]:

VI remains approximate. Even if you've thrown an infinite amount of

[Charles Margossian]:

computation, you're probably not exact. Right? Now, in practice, we're not asymptotic,

[Charles Margossian]:

right? We work with finite computation. And so I think it's very important

[Charles Margossian]:

to recognize that, yes, MCMC is also an approximate method. It's not

[Charles Margossian]:

unbiased because you don't initialize from the stationary distribution.

[Charles Margossian]:

You do not reach the stationary distribution, right? So when I hear statements

[Charles Margossian]:

like, well, first we wait for MCMC to converge to the stationary distribution.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

I think people have the right intuition, right? I don't think it's

[Charles Margossian]:

a misleading statement, but nonetheless, it's an incorrect statement.

[Charles Margossian]:

What we do is we wait for MCMC to get close enough to the stationary

[Charles Margossian]:

distribution. And the reason why we care about being close enough

[Charles Margossian]:

to the stationary distribution is because we want the bias to be small, right?

[Charles Margossian]:

And so

[Alex Andorra]:

Thank you.

[Charles Margossian]:

when we think about, you know, convergence diagnostics, the way I think

[Charles Margossian]:

we really should start thinking about a quantity like the R-hat statistic,

[Charles Margossian]:

for example. And R-hat is interesting. R-hat has been around for three

[Charles Margossian]:

decades and frankly, there are still debates about what R-hat measures.

[Charles Margossian]:

I mean, this is very existential, right? We have an estimator, but it's not clear

[Charles Margossian]:

what the estimand is. And my perspective, my most recent perspective

[Charles Margossian]:

is that what really matters, the reason I care about convergence is because

[Charles Margossian]:

I want the bias of my Monte Carlo estimator to be sufficiently small.

[Charles Margossian]:

It's not going to be zero, but it has to be sufficiently small. And so

[Charles Margossian]:

can R hat tell me something about how small my bias is? And here's the

[Charles Margossian]:

paradox, which is that R-hat, the way you compute it, is a ratio

[Charles Margossian]:

of two standard deviations. Right. So you know that when you measure variance,

[Charles Margossian]:

it doesn't tell you something about bias. Right. And yet, you know, we say

[Charles Margossian]:

R-hat tells you if your warmup phase is long enough.
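
For readers who have not seen it written out, here is a minimal sketch of the classic (non-split, non-rank-normalized) R-hat computation being alluded to, assuming NumPy; it really is a ratio of two standard-deviation-like quantities, which is exactly the paradox Charles describes:

```python
import numpy as np

def rhat(draws):
    """Basic Gelman-Rubin R-hat for an array of shape (n_chains, n_iterations)."""
    m, n = draws.shape
    chain_means = draws.mean(axis=1)
    within = draws.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)            # B: n * variance of chain means
    var_plus = (n - 1) / n * within + between / n    # pooled variance estimate
    return np.sqrt(var_plus / within)                # ratio of two standard deviations

# Chains that have not forgotten overdispersed initializations give R-hat >> 1.
rng = np.random.default_rng(0)
starts = np.array([-10.0, -3.0, 3.0, 10.0])
chains = starts[:, None] + rng.normal(size=(4, 1000))
print(rhat(chains))
```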

[Alex Andorra]:

Yep.

[Charles Margossian]:

Not everyone agrees that that's what it tells you, but you know, that's

[Charles Margossian]:

my perspective and, um, and I think it's a reasonable perspective,

[Charles Margossian]:

right? But the, the point of the warmup phase, you know, the primary

[Charles Margossian]:

point, not the only point is for the bias to go down. Right? So we're

[Charles Margossian]:

faced with this paradox, this very fundamental question. Can R hat actually

[Charles Margossian]:

give us any useful information? And there was a recent paper that argued that,

[Charles Margossian]:

well, since R hat is really just looking at the variance, it's a one-to-one

[Charles Margossian]:

map. So this was some really nice work by,

[Charles Margossian]:

I know that Dootika Vats is one of the co-authors on the paper, and

[Charles Margossian]:

there's another co-author. They call it revisiting the Gelman-Rubin statistic.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And they argue, well, R hat is just a reframing of variance and of

[Charles Margossian]:

effective sample size. And to me, effective sample size only matters

[Charles Margossian]:

when you're looking at the sampling phase, because it tells you, is

[Charles Margossian]:

your variance low enough? And so we had to think a little bit hard about

[Charles Margossian]:

this because it wasn't completely satisfactory because either it means

[Charles Margossian]:

that r hat is not a useful convergence diagnostic in some sense, or actually

[Charles Margossian]:

there's more going on. And what we realize is you look at the variance

[Charles Margossian]:

that's being measured by r hat. So really what r hat ends up measuring

[Charles Margossian]:

is you're running a bunch of chains. It could be four chains, it could

[Charles Margossian]:

be more. Each chain generates one Monte Carlo estimator. And then you average

[Charles Margossian]:

the per chain Monte Carlo estimator. So now you look at the variance

[Charles Margossian]:

of a single chain Monte Carlo estimator, that variance you can actually

[Charles Margossian]:

decompose it into a non-stationary variance and a persistent variance. And what

[Charles Margossian]:

we realize, and you have to be careful, you have to do that analysis

[Charles Margossian]:

for non-stationary Markov chains. Otherwise

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

you completely miss the non-stationary variance.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And the non-stationary variance is, you know, actually, it's a measure

[Charles Margossian]:

of how well you've forgotten your initial point.
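
One plausible way to write down the decomposition being described, assuming it is the law of total variance applied to the per-chain estimator, conditioned on that chain's initialization (the precise definitions and conditions are in the nested R-hat preprint):

```latex
\[
\operatorname{Var}\!\left(\bar{\theta}^{(m)}\right)
  = \underbrace{\operatorname{Var}_{\theta_0}\!\left(\mathbb{E}\!\left[\bar{\theta}^{(m)} \mid \theta_0\right]\right)}_{\text{non-stationary variance: memory of the initialization}}
  \;+\;
  \underbrace{\mathbb{E}_{\theta_0}\!\left[\operatorname{Var}\!\left(\bar{\theta}^{(m)} \mid \theta_0\right)\right]}_{\text{persistent variance}}
\]
```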

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And you can show in some cases that it decays at the same rate as the

[Charles Margossian]:

squared bias. So you're not directly measuring the squared bias, but

[Charles Margossian]:

because MCMC is what it is, the non-stationary variance gives you a

[Charles Margossian]:

proxy clock for the bias. And so our argument is that, well, what's interesting

[Charles Margossian]:

about R-hat is not that it measures, you know, the persistent variance,

[Charles Margossian]:

which you can then relate to the effective sample size, but that it measures

[Charles Margossian]:

the non-stationary variance, right? And this then led us to, you know, we're

[Charles Margossian]:

coming up with revisions of R-hat, which more directly measure the

[Charles Margossian]:

non-stationary variance rather than the total variance. So that we

[Charles Margossian]:

actually get an estimator that unambiguously tells you something about

[Charles Margossian]:

the length of the warm up phase. And then you ask the question of

[Charles Margossian]:

the length of the sampling phase in a second and separate step.

[Charles Margossian]:

So, sorry, it's a conceptual explanation. So this is a paper that we

[Charles Margossian]:

have a preprint out called nested R-hat.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And this is joint work with, so Andrew Gelman, Aki Vehtari, Matt

[Charles Margossian]:

Hoffman, so some of the usual suspects. We also have Pavel Sountsov

[Charles Margossian]:

and Lionel Riou-Durand.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

So we've all worked on this together. And

[Charles Margossian]:

A cool anecdote here is that what we were really interested in is those

[Charles Margossian]:

regimes where we're running hundreds of chains in parallel, or thousands

[Charles Margossian]:

of chains in parallel, like GPU-friendly MCMC. And what this made

[Charles Margossian]:

us do is instead of thinking asymptotics in the limit where we have an infinitely

[Charles Margossian]:

long chains, which is how asymptotics for MCMC have worked for the

[Charles Margossian]:

past, you know, five or six decades, right? because that's what people

[Charles Margossian]:

did. They ran long chains, right?

[Alex Andorra]:

Yep.

[Charles Margossian]:

And so even though asymptotics are a property of infinity, we want to somehow

[Charles Margossian]:

get close to the asymptotic regime, right? That's why we care about this

[Charles Margossian]:

asymptotic analysis. And here we thought, well, let's take asymptotics

[Charles Margossian]:

in another direction. Let's say we have a finite number of chains,

[Charles Margossian]:

but what happens when we have an infinite number of chains? And then

[Charles Margossian]:

suddenly you can do asymptotic analysis on non-stationary Markov chains.

[Charles Margossian]:

Right? So the problem is if I take an asymptotic in the length of

[Charles Margossian]:

the chain, well, I've made my chain stationary. And then there are

[Charles Margossian]:

only so many properties that I can study. If I take asymptotics in

[Charles Margossian]:

the other direction, which is the number of chains, and

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

suddenly I can start making statements about non-stationary Markov chains,

[Charles Margossian]:

I can elicit terms such as, you know, non-stationary variance. And

[Charles Margossian]:

this was a cool example where the hardware, you know, trying to work

[Charles Margossian]:

with, you know, GPUs running a lot of chains, actually, I think, is really

[Charles Margossian]:

changing our theoretical approach and our conceptual understanding of

[Charles Margossian]:

MCMC. And I think we're going to get a lot of breakthroughs from this

[Charles Margossian]:

kind of perspective.

[Alex Andorra]:

Hmm. Yeah, super interesting. I'll put the two papers you mentioned in the show

[Alex Andorra]:

notes. And so that makes me think, basically, what would you say right

[Alex Andorra]:

now practically for people, when would variational inference usually be

[Alex Andorra]:

most helpful, especially in comparison to MCMC?

[Charles Margossian]:

Yeah, so great. So I think there are two things to unpack here. One is

[Charles Margossian]:

when can variational inference still give you accurate answers?

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And two, when do you actually not need an accurate answer?

[Charles Margossian]:

Right? So, in the first question, it really depends on the family of

[Charles Margossian]:

variational inference that you're willing to use, the family of approximations.

[Charles Margossian]:

And also it turns out the objective function. So I mentioned earlier

[Charles Margossian]:

that when I have this mean field approximation, and I'll just remind

[Charles Margossian]:

what mean field means is, I'm assuming that all my latent variables

[Charles Margossian]:

are independent. Now we don't believe that that's true in practice,

[Charles Margossian]:

but that makes the computation much cheaper. And when you have, millions of

[Charles Margossian]:

observations, millions of parameters, you need an algorithm where

[Charles Margossian]:

the cost scales linearly with the number of observations. And if I don't

[Charles Margossian]:

have this mean field assumption, I get things that can scale quadratically or

[Charles Margossian]:

cubically.
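
A rough sketch of the scaling argument above, using NumPy timings as a stand-in: a full-covariance Gaussian approximation carries on the order of d^2 parameters and needs an O(d^3) factorization, whereas the mean-field family only touches the d diagonal entries:

```python
import time
import numpy as np

d = 2000
A = np.random.standard_normal((d, d))
full_cov = A @ A.T + d * np.eye(d)        # dense covariance: ~d^2 numbers to learn

t0 = time.perf_counter()
np.linalg.cholesky(full_cov)              # O(d^3) work per use of the full matrix
t1 = time.perf_counter()
np.sqrt(np.diag(full_cov))                # O(d) work for the mean-field analogue
t2 = time.perf_counter()
print(f"full covariance: {t1 - t0:.3f}s   mean-field diagonal: {t2 - t1:.6f}s")
```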

[Charles Margossian]:

And so when I do this approximation, well, I'm gonna get some things

[Charles Margossian]:

wrong, but maybe I'll get the things that I care about right, which

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

could be the first moment or it could be the entropy, right? if you change

[Charles Margossian]:

the objective function, which is usually the KL divergence, you can kind

[Charles Margossian]:

of reverse it. You can show that you get arbitrarily poor estimates

[Charles Margossian]:

of the entropy, but good estimates of the marginal variances. Right,

[Charles Margossian]:

so it turns out that the choice of objective function that you use to

[Charles Margossian]:

measure, you know, the disagreement between your approximation and

[Charles Margossian]:

your target matters. So there's a real question of, you know, what

[Charles Margossian]:

are the quantities you care about? because we don't care about the

[Charles Margossian]:

whole distribution. We care about some summaries of the posterior

[Charles Margossian]:

distribution.
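
Reading the "reversed" objective Charles mentions as the forward KL divergence, the Gaussian, mean-field picture can be sketched as follows (with Sigma the target covariance and Lambda its inverse; this is a summary of standard facts, not a statement from the episode's paper):

```latex
\begin{align*}
q^{\mathrm{rev}} &= \arg\min_{q \in \mathcal{Q}_{\text{mean-field}}} \mathrm{KL}(q \,\|\, p):
  \quad \operatorname{Var}_{q^{\mathrm{rev}}}(\theta_i) = \Lambda_{ii}^{-1} \le \Sigma_{ii},
  \quad H(q^{\mathrm{rev}}) \le H(p), \\
q^{\mathrm{fwd}} &= \arg\min_{q \in \mathcal{Q}_{\text{mean-field}}} \mathrm{KL}(p \,\|\, q):
  \quad q^{\mathrm{fwd}}_i = p_i \ \text{(marginal variances exact)},
  \quad H(q^{\mathrm{fwd}}) = \sum_i H(p_i) \ge H(p).
\end{align*}
```

So which direction of the divergence you optimize determines which measure of uncertainty you can trust, which is the point being made here.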

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And then that informs when you use them. So I think that's the first

[Charles Margossian]:

question. And

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

I wanna emphasize that there's a lot of great work by Tamara Broderick

[Charles Margossian]:

and her group, Justin Domke and his colleagues about trying to get

[Charles Margossian]:

more accurate variation inference. And sometimes that works really

[Charles Margossian]:

well. Which is a bit of a... I can't really give you a more precise

[Charles Margossian]:

prescription than that, because we have to go into the details of the

[Charles Margossian]:

different problems.

[Charles Margossian]:

But then the second point I made is sometimes you don't need a really

[Charles Margossian]:

accurate answer. So what are examples of that? So in machine learning,

[Charles Margossian]:

let's say you're just training a model, and maybe you're more interested

[Charles Margossian]:

in either a more complicated model or using more data than improving the

[Charles Margossian]:

accuracy of the inference. And then you

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

look at something like performing a task, a classification prediction

[Charles Margossian]:

and so forth under a computational budget,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

right? It turns out that it's better to have a sophisticated model with

[Charles Margossian]:

very approximate inference than a less sophisticated model with more accurate

[Charles Margossian]:

inference.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Now, it's hard to know, and my big problem with variational inference

[Charles Margossian]:

is it's hard to know which regime you're gonna be in, and to actually

[Charles Margossian]:

justify it. And even once you've run variational inference, you don't have

[Charles Margossian]:

that many diagnostics that can tell you, well, you know, you're doing

[Charles Margossian]:

okay, but you would do much better if you improve the inference. So

[Charles Margossian]:

I think that it's an open question. But then the other example that I

[Charles Margossian]:

wanna bring up where we don't always need accurate inference is when

[Charles Margossian]:

we're developing models. So this ties back into this idea of the Bayesian

[Charles Margossian]:

workflow. Uh, that now has been championed. Uh, you know, so Andrew

[Charles Margossian]:

Gelman and colleagues wrote a lot about it. Michael Betancourt wrote

[Charles Margossian]:

a lot about it. David Blei wrote a lot about it. You know, arguably

[Charles Margossian]:

George Box, right, wrote a lot about it. And, um, and you know, if you,

[Charles Margossian]:

if you ever work on an applied project, you sit down, you come up

[Charles Margossian]:

with the first iteration of your model and, uh, arguably there are a

[Charles Margossian]:

lot of problems with that model. And you don't need super accurate inference

[Charles Margossian]:

to diagnose the problems with this model. You do a quick fit, a quick

[Charles Margossian]:

approximation, and usually something obvious is gonna pop up.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Then you revise the model. Then you do again, quick inference, okay? And

[Charles Margossian]:

you keep refining and refining. And only once you have actually a

[Charles Margossian]:

polished version of the model, do I think that it makes sense to, you

[Charles Margossian]:

know, get out the big gun and the very accurate inference. And I

[Charles Margossian]:

think that, you know, if we talk about pharmacometrics and epidemiology,

[Charles Margossian]:

I'll give you some very precise examples of those situations.

[Alex Andorra]:

Yeah, for sure. We'll do that in a few minutes. But yeah, thanks a lot for

[Alex Andorra]:

that tour. That makes a lot of sense, actually, all of that. I will use

[Charles Margossian]:

I realize my

[Alex Andorra]:

those.

[Charles Margossian]:

answers are very long, but your questions are very deep.

[Alex Andorra]:

Yeah, yeah, yeah. No, no, that's really good. I mean, that's why the podcast

[Alex Andorra]:

is for also, you know, going deep into the explanation that you cannot really

[Alex Andorra]:

do in a paper, right? In the papers, usually it's a more technical audience

[Alex Andorra]:

first.

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

And so like, you're not going to really explain the difference between variational

[Alex Andorra]:

inference and MCMC in a paper, because if the audience already is supposed

[Alex Andorra]:

to know that, well, why would you do that in a paper? So.

[Charles Margossian]:

Yeah, and frankly the paper, so what ends up being the discussion

[Charles Margossian]:

that ends up being in the paper is usually the discussion you've

[Charles Margossian]:

had with the reviewer, which is a subset, which is really a subset

[Charles Margossian]:

of everything you would like to discuss. So, yeah, it's nice to have

[Charles Margossian]:

a more free format

[Alex Andorra]:

Yeah.

[Charles Margossian]:

to really think about those questions. And what I will say is that

[Charles Margossian]:

actually the one format that was... that I really like where all these

[Charles Margossian]:

questions come up is, you know, is teaching, is like workshops.

[Alex Andorra]:

Yeah.

[Charles Margossian]:

And so when you do a workshop on Bayesian modeling, you do a workshop on

[Charles Margossian]:

Stan, on PyMC or something like that, all the questions that I've brought

[Charles Margossian]:

up, you know, how long should the sampling phase be? How long should

[Charles Margossian]:

the warmup phase be? Should I use this or that algorithm? Those are questions

[Charles Margossian]:

that a person sitting at a workshop Intro to Stan, intro to your

[Charles Margossian]:

favorite language would ask. And so these end up being forums where

[Charles Margossian]:

we do discuss these fundamental questions. Because even though they're deep,

[Charles Margossian]:

they're elementary nonetheless. And I mean that in the most positive sense

[Charles Margossian]:

of the word elementary possible.

[Alex Andorra]:

Yeah, now you have completely unmasked the way I pick questions

[Alex Andorra]:

for the show. I'm just doing the same as you did. Yeah, I teach a lot of

[Alex Andorra]:

workshops too. And these questions are basically the questions that a lot of beginners

[Alex Andorra]:

ask, where it's like they often have used variational inference because for

[Alex Andorra]:

some reason, especially when they come from the classical machine learning

[Alex Andorra]:

world, then using variational inference makes more sense to them because

[Alex Andorra]:

it's closer from home, basically

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

closer to home. And so yeah, like in the end the natural question is when

[Alex Andorra]:

should I use variational inference? Why should I use it? Why should I even bother

[Alex Andorra]:

with MCMC? Things like that. So that's where I get a lot of the questions I've been

[Charles Margossian]:

Yeah.

[Alex Andorra]:

asking.

[Charles Margossian]:

And these are totally open questions. I mean, correct me. Maybe you

[Charles Margossian]:

have an answer that I missed. You know, we have heuristics and we

[Charles Margossian]:

have good pointers and we have good case studies. But so much of this

[Charles Margossian]:

remains unanswered. I think

[Alex Andorra]:

Yeah, yeah,

[Charles Margossian]:

not

[Alex Andorra]:

yeah.

[Charles Margossian]:

unapproachable. Let me be clear. Totally approachable. And you can,

[Charles Margossian]:

you know, it's not crippling. We can totally still do Bayesian modeling.

[Charles Margossian]:

But there are open questions that linger.

[Alex Andorra]:

Yeah. And then you have to strike the right balance between when you're answering

[Alex Andorra]:

such a question from a beginner, where you want to be like intellectually honest

[Alex Andorra]:

and saying,

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

yeah, like these are still open questions, but at the same time, you don't

[Alex Andorra]:

want them to walk away with the feeling that, well, this is completely...

[Alex Andorra]:

like is completely undefined and I cannot really use these methods because

[Alex Andorra]:

no, um, there is no clear rule about what I would use, when,

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

and why. So, uh, like that's always a, an important balance to strike and

[Alex Andorra]:

not, not always an easy one.

[Charles Margossian]:

Yeah, yeah, I absolutely relate to that.

[Alex Andorra]:

Yeah, so actually before we dig into a bit more of the applications, I'm

[Alex Andorra]:

curious basically on the work you're doing right now because I really love

[Alex Andorra]:

the fact that you're both working on MCMC and on approximate Bayesian inference.

[Alex Andorra]:

So I'm wondering what are the frontiers currently in that field of algorithms

[Alex Andorra]:

that you're particularly excited about. You already mentioned basically the

[Alex Andorra]:

progress of the hardware, which opens a lot of avenues. I'm curious if there

[Alex Andorra]:

are other things you have your eye on.

[Charles Margossian]:

Yeah, I think that...

[Charles Margossian]:

So. Well, I think hardware is an important question. And I think it's

[Charles Margossian]:

a difficult question.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And I wanna talk, I think, you know, maybe I wanna say a little bit more

[Charles Margossian]:

because I think

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

frontier is good. And not only that, I think it's an ambiguous frontier.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Because I don't think, you know, to me, it's not clear. you know,

[Charles Margossian]:

how much we're gonna get out of hardware for MCMC, for example,

[Charles Margossian]:

and you know, what are gonna be the limits of that? And so what I'm

[Charles Margossian]:

excited about is that now we have GPUs, now we have several algorithms that

[Charles Margossian]:

are GPU friendly. And I'll explain a little bit what that means,

[Charles Margossian]:

but essentially, So at a very fundamental level, what it means is

[Charles Margossian]:

you can run a lot of Markov chains

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

in a synchronized time. And so you're not waiting for the slowest chain,

[Charles Margossian]:

basically. That's kind of the intuition here. And people are like,

[Charles Margossian]:

well, this is great. If you run a lot of chains, we can really make

[Charles Margossian]:

the sampling phase much shorter. Like again, so let's say your target

[Charles Margossian]:

effective sample size is 2000. And actually what effective sample

[Charles Margossian]:

size you should target. That's another interesting and very fundamental

[Charles Margossian]:

question. And it turns out a question where different people have

[Charles Margossian]:

different opinions on

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

within the Bayesian community. But let's say now for sake of argument, you're

[Charles Margossian]:

targeting an effective sample size of 2000. And I'm not saying this

[Charles Margossian]:

is what you should do or not do, but let's just say 2000. Then once

[Charles Margossian]:

you run 2000 chains and you warm them up enough, right? But really

[Charles Margossian]:

you only need one good sample per chain,


[Charles Margossian]:

which means your sampling phase can be one iteration. And then all your

[Charles Margossian]:

questions about how long should the warmup be, it's no longer about

[Charles Margossian]:

adapting the kernels so that I have a low auto correlation during the

[Charles Margossian]:

sampling phase. Actually the auto correlation only matters in so far

[Charles Margossian]:

as it reduces your bias. All right, and so

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

I think that suddenly we've greatly simplified the question of diagnosing

[Charles Margossian]:

how much computation we need to throw at the algorithm. If we have

[Charles Margossian]:

a lot of chains, then suddenly it just becomes about bias decay.

[Charles Margossian]:

And the sampling problem becomes much closer to the optimization problem.
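
A tiny illustration of the regime described above, assuming warmup has done its job so that the single draw retained from each chain is effectively an independent posterior draw; the standard-normal generator below is only a placeholder for such draws:

```python
import numpy as np

rng = np.random.default_rng(1)
n_chains = 2000                                         # target ESS ~= number of chains
draws = rng.normal(loc=1.0, scale=2.0, size=n_chains)   # one post-warmup draw per chain

estimate = draws.mean()
mcse = draws.std(ddof=1) / np.sqrt(n_chains)            # Monte Carlo error ~ 1/sqrt(M)
print(estimate, mcse)
```

Once the variance is handled this way, the remaining question is entirely about how quickly the bias decays during warmup.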

[Charles Margossian]:

Right. And then I can go into questions. So then there are interesting

[Charles Margossian]:

things where you actually run a lot of chains. You can pool information

[Charles Margossian]:

between the different chains to

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

make the warmup phase shorter. And for some

[Alex Andorra]:

Yeah.

[Charles Margossian]:

problems, like you have these multimodal problems, like it's exponentially

[Charles Margossian]:

shorter, like you're never going to get a good answer with a single chain

[Charles Margossian]:

that doesn't use cross adaptation. Right. And so here, I want to give

[Charles Margossian]:

a shout out to Marylou Gabrié and her work on MCMC that uses normalizing

[Charles Margossian]:

flows for adaptation.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Right. That's the thing, that's a technique where they actually, you

[Charles Margossian]:

know, run 10,000 workers or chains. And I remember when I was talking

[Charles Margossian]:

to her and colleagues, I was like, well, you know, once you've, you're

[Charles Margossian]:

already running 10,000 chains to jump between the modes, you actually

[Charles Margossian]:

don't need a long sampling phase at all,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

right? So that's one aspect of it. But then even for more ordinary problems,

[Charles Margossian]:

you can show that the time it takes you to reduce the bias goes down

[Charles Margossian]:

with the number of chains, because you are pooling information between

[Charles Margossian]:

the Markov chains. And this is not something that we really understand.

[Alex Andorra]:

Hmm.

[Charles Margossian]:

Um, and so, you know, where I see the frontier is that actually,

[Charles Margossian]:

if I run a lot of chains, I also get more accurate diagnostics. My computations

[Charles Margossian]:

of R-hat and its generalizations become much more reliable. And I think the

[Charles Margossian]:

Holy grail would be something where we don't have users specify, um, the

[Charles Margossian]:

length of the warmup phase or the length of the sampling phase. We have

[Charles Margossian]:

them think about what is your target ESS? That's the number of chains

[Charles Margossian]:

that you run. And then we're going to automatically stop the warmup

[Charles Margossian]:

phase when we hit a certain target, right? And then suddenly,

[Charles Margossian]:

we're starting to do optimal computation for MCMC. And I think that to do

[Charles Margossian]:

optimal computation, at least in the way that I've described it, we

[Charles Margossian]:

need those GPUs. And at the same time, I think that there are a lot

[Charles Margossian]:

of problems that are not gonna be amenable to GPUs, right? It's still,

[Charles Margossian]:

there's still this fundamental sequential component, which is the bias has

[Charles Margossian]:

to go down, the warmup needs to happen, right? At some point, adding

[Charles Margossian]:

more chains is not gonna help you. Or the speed up you're gonna

[Charles Margossian]:

get on this is not gonna be arbitrarily large, right? And then the benefit

[Charles Margossian]:

you're gonna get from variance reduction by running more chains, well,

[Charles Margossian]:

once you've reached it. If your target ESS is 2000, maybe it doesn't help to

[Charles Margossian]:

run 10,000 chains, right?

[Alex Andorra]:

Yeah.

[Charles Margossian]:

At least not immediately, right? So those are very clear, you know, questions

[Charles Margossian]:

that arise about, you know, ultimately, what are, how far are we

[Charles Margossian]:

going to be able to go with this ES, this, you know, running many

[Charles Margossian]:

chains. But what I want to emphasize is that there's a, you know, there's

[Charles Margossian]:

a computational gain, which people think about: our algorithms are

[Charles Margossian]:

faster. I also think there's a conceptual gain. And there is the opportunity

[Charles Margossian]:

to make MCMC more black box.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

We're less reliant on some very fundamental tuning algorithms.

[Charles Margossian]:

And therefore, you know, we kind of get into this regime where the computation

[Charles Margossian]:

that we're using is just the right amount. And that's really, you know,

[Charles Margossian]:

if I have to say, like, I have another one or two years as a postdoc,

[Charles Margossian]:

very optimistically, that's the problem I'd like to solve. Right? Like,

[Charles Margossian]:

we remove the fundamental tuning parameters of MCMC. And there are other

[Charles Margossian]:

approaches towards that. I'm not going to pretend that this is the only

[Charles Margossian]:

angle to tackle this problem. Let me be absolutely clear. I think it's a

[Charles Margossian]:

very, very promising one.

[Alex Andorra]:

Yeah, yeah,

[Charles Margossian]:

Yeah,

[Alex Andorra]:

for sure. That's fascinating.

[Charles Margossian]:

what I will say, since we're going to go into applications, is one

[Charles Margossian]:

limitation

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

with GPUs is they're very bad at solving ODEs.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And a lot of the problems that I care about, you know,

[Charles Margossian]:

have likelihoods. And in order to evaluate those likelihoods, you

[Charles Margossian]:

need to solve an ODE.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

So I don't think we're at a stage where running a thousand chains

[Charles Margossian]:

on a GPU is going to solve all the problems in pharmacometrics that I

[Charles Margossian]:

am, you know, deeply invested in, right? That said, you know, we have clusters

[Charles Margossian]:

of CPUs where maybe we can run 60 to 120 chains and that can get us

[Charles Margossian]:

some of the way.

[Alex Andorra]:

Hmm, yeah. Yeah, I mean, when you were talking about that, I was thinking

[Alex Andorra]:

that'd be super cool to have, you know, in Stan or PyMC afterwards, like

[Alex Andorra]:

at some point, optimize the number of chains and samples that are

[Alex Andorra]:

taken, instead of having them fixed, because right now it's like, okay, we're gonna

[Alex Andorra]:

run as many chains as we can with the GPU or CPU we have. So that's

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

already kind of automated. But the number of samples is not really automated,

[Alex Andorra]:

it's just a rule of thumb, where it's like, OK, we think in general, this number

[Alex Andorra]:

of samples work well. In PyMC, it's 1,000 per chain after warming up. But

[Alex Andorra]:

what would be super cool is like, OK, PyMC or Stan, see what's there, the resources. And then it's

[Alex Andorra]:

like, OK, so given the complexity of the posterior that we can see right now,

[Alex Andorra]:

we are going

[Charles Margossian]:

Yeah.

[Alex Andorra]:

to run that many chains and that many samples per chain. That'd be super

[Alex Andorra]:

cool, because also that would be something that would be easier for the beginners,

[Alex Andorra]:

because they are

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

really, really sometimes very anxious about having a lot of samples, even

[Alex Andorra]:

though you know, no, you don't need 10,000 samples per chain for that simple

[Alex Andorra]:

regression, but it's hard to explain.

[Charles Margossian]:

Yeah, and so what ends up, also what ends up happening, very real

[Charles Margossian]:

experience, and I do it myself, is the simple models that probably

[Charles Margossian]:

don't need that many iterations. I run them for a lot of iterations because

[Charles Margossian]:

that's computationally cheap to do. And then the hard models, where

[Charles Margossian]:

each iteration is very expensive and the posterior distribution is

[Charles Margossian]:

much more complicated, I actually end up, and where I would need more

[Charles Margossian]:

iterations, I end up running fewer iterations.

[Alex Andorra]:

Yeah.

[Charles Margossian]:

Right, and I think a lot of people will sympathize with that. That's

[Charles Margossian]:

my experience interacting with practitioners. I'll give you another example

[Charles Margossian]:

of things I've seen my colleagues in epidemiology do is that when

[Charles Margossian]:

their model starts getting really complicated, they start using somewhat

[Charles Margossian]:

less overdispersed initializations.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Even though we know that that's what we need for the convergence diagnostic

[Charles Margossian]:

to be reliable. And I'll tell you a little bit more about that because

[Charles Margossian]:

actually that's another question: what does it mean for an initialization

[Charles Margossian]:

to be overdispersed? And I have some answers to that that are not quite,

[Charles Margossian]:

you know, the regular answers. Um, but that's a huge problem. Like

[Charles Margossian]:

when models get hard, and again, those ODE-based models, right? You cannot

[Charles Margossian]:

solve those ODEs if you throw insane parameter values at your model.

[Charles Margossian]:

And so now people have to make compromises. And I think that,

[Charles Margossian]:

you know, especially statisticians were a little bit cavalier. We tend

[Charles Margossian]:

to be conservative, uh,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

because maybe in a way, you know, that's the role of the statistician

[Charles Margossian]:

and the theorist is to be conservative and to play it safe.

[Charles Margossian]:

But when the safe and conservative heuristics, um, become impractical,

[Charles Margossian]:

uh, we need to think harder about, you know, okay, if we don't want to be

[Charles Margossian]:

too conservative but we still want to be safe, what do we need to do?

[Charles Margossian]:

and that's where

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

these questions of, you know, optimal warmup length, optimal sampling

[Charles Margossian]:

phase, optimal overdispersed initialization, uh, really come into play,

[Charles Margossian]:

because if you're too conservative in your prescriptions, you might

[Charles Margossian]:

make your editor happy, but actually practitioners are going to have

[Charles Margossian]:

a really hard time following those prescriptions.

[Alex Andorra]:

Yeah,

[Charles Margossian]:

And then they do

[Alex Andorra]:

yeah.

[Charles Margossian]:

things. I'm not saying that they do silly things instead, right? But

[Charles Margossian]:

they do other things that are maybe less principled.

[Alex Andorra]:

Mm-hmm. Yeah. Yeah, fascinating. I could spend the whole episode on these topics.

[Alex Andorra]:

I really love it. Thanks a lot for diving so deep into these, Charles. But

[Alex Andorra]:

let's get a bit more practical here and talk about

[Charles Margossian]:

Uh huh.

[Alex Andorra]:

what you do, basically, with epidemiology and pharmacometrics. So first,

[Alex Andorra]:

can you define pharmacometrics for us? I personally don't know what that is,

[Charles Margossian]:

Yeah.

[Alex Andorra]:

how that differs from epidemiology, and what does Bayesian statistics bring to epidemiology

[Alex Andorra]:

and pharmacometrics.

[Charles Margossian]:

Yeah, so pharmacometrics, I mean, the way I would think about pharmacometrics

[Charles Margossian]:

is pharmacometrics is to pharmacology, what econometrics is to economics.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

There's some people who want to emphasize that they're using quantitative

[Charles Margossian]:

methods.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Now, the particular field of pharmacometrics that I've worked on is called

[Charles Margossian]:

PKPD modeling, so pharmacokinetics and pharmacodynamics. And essentially, what

[Charles Margossian]:

happens is, let's say there's a new treatment that's being developed.

[Charles Margossian]:

So it could be a new drug compound or it could be a number of things.

[Charles Margossian]:

We're usually interested in two questions. One, how does the drug get

[Charles Margossian]:

absorbed and how does it diffuse in the patient's body? That's called

[Charles Margossian]:

the pharmacokinetics. And two, once the drug diffuses in the body,

[Charles Margossian]:

what does it do to the body? And so that includes targeting a disease,

[Charles Margossian]:

but also side effects. We're worried about how toxic the treatment

[Charles Margossian]:

might be. And what people are trying to do with these models is based

[Charles Margossian]:

on early data from clinical trials. So either on human beings or even on

[Charles Margossian]:

animals, they try to predict what is going to happen when we look at a broader

[Charles Margossian]:

population of individuals and when we kind of like start changing the

[Charles Margossian]:

treatment. So you have some, you know, some medical treatments can

[Charles Margossian]:

get very complicated. You have questions of, you know,

[Charles Margossian]:

I have a certain drug compound. How much do I administer? How often

[Charles Margossian]:

do I administer it? Is it better to take, you know, half a dose every

[Charles Margossian]:

half hour or only a single dose every hour? And so you have all these

[Charles Margossian]:

combinatorics of possibilities. If I really increase the dose, do I immediately

[Charles Margossian]:

get better effects? Or does it saturate? We often have these nonlinear

[Charles Margossian]:

effects. These are called Michaelis-Menten models, where at some point,

[Charles Margossian]:

adding more dose just raises the cost and doesn't help the patient.
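
As a point of reference (standard pharmacology background rather than anything derived from this conversation), the saturating dose-response Charles is alluding to is usually written in Emax / Michaelis-Menten form:

```latex
E(C) \;=\; \frac{E_{\max}\, C}{EC_{50} + C},
\qquad
\lim_{C \to \infty} E(C) \;=\; E_{\max},
```

so past a few multiples of the half-maximal concentration, adding more dose barely increases the effect.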

[Alex Andorra]:

Yeah.

[Charles Margossian]:

Right. And one way to do it would be

[Charles Margossian]:

you run a ton of clinical trials and then you hope that something works

[Charles Margossian]:

out. That's extremely expensive and time consuming and, you know, well, maybe

[Charles Margossian]:

it's safer for the humans than the animals. Let me put it this way. Or

[Charles Margossian]:

based on a bit of data, and then a really mechanistic model where you

[Charles Margossian]:

actually really bake in some of your expertise as a pharmacologist,

[Charles Margossian]:

as a biomedical engineer, you try to understand the underlying system

[Charles Margossian]:

so that when you're trying out different regimens, you can really

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

predict what are gonna be the doses that are more promising. So it's

[Charles Margossian]:

very useful to do exploratory analysis. And then it's also useful to do,

[Charles Margossian]:

once a drug hits the market, you actually collect very imperfect data, you

[Charles Margossian]:

actually have uncertainty, but you still want to keep learning about the

[Charles Margossian]:

dose and the dosing regimen, right, from

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

data from hospitals. And sometimes the data is rare, you have rare diseases,

[Charles Margossian]:

rare conditions, and that kind of thing. And so that is... If you will,

[Charles Margossian]:

then...

[Charles Margossian]:

the domain of pharmacometrics. And what's extremely interesting is that within

[Charles Margossian]:

that domain, you have, so first of all, you have what's called mechanistic

[Charles Margossian]:

models. And what I mean by that is the parameters are interpretable.

[Charles Margossian]:

The relationships in the model

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

is interpretable. Now, contrast that with a neural network, for

[Charles Margossian]:

example, where I might get good predictions, but then if I want to

[Charles Margossian]:

do out of sample predictions, which is actually really what we want to do,

[Charles Margossian]:

right, in pharmacometrics? Examples of out-of-sample predictions would be a

[Charles Margossian]:

different dosing regimen, or I've tested the drug on an adult population.

[Charles Margossian]:

What happens if I do it for children, right? That's the kind of pediatric

[Charles Margossian]:

questions. We need to bake in the mechanistic understanding, which

[Charles Margossian]:

doesn't exclude a role for neural networks. I think these can also

[Charles Margossian]:

play a role, but I'll leave that for now. But then you have various

[Charles Margossian]:

degrees of detail in the mechanism. You have some equations that are

[Charles Margossian]:

very simple. So the two-compartment model with first-order absorption says the

[Charles Margossian]:

human body is three compartments. There's the gut, where the drug arrives when

[Charles Margossian]:

you orally administer it. There is the central compartment where

[Charles Margossian]:

the drug diffuses quickly. So usually that includes the blood. And then

[Charles Margossian]:

maybe there's a peripheral compartment, so tissues where the drug diffuses

[Charles Margossian]:

more slowly. That's obviously not a very detailed description of the human

[Charles Margossian]:

body. And then you have models that are more complicated. So at Metrum,

[Charles Margossian]:

Matthew Riggs and some colleagues, they work on this bone mineral

[Charles Margossian]:

density, where actually they had a lot of different parameters. And

[Charles Margossian]:

now instead of having a system of three differential equations, you

[Charles Margossian]:

have 30 differential equations. You have a ton of parameters, but you have

[Charles Margossian]:

a lot of information, prior information, about these parameters.
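
For readers who want to see the structure of the simpler model Charles described a moment ago, here is a minimal sketch of the two-compartment model with first-order absorption (gut, central, peripheral) using SciPy's ODE solver. The parameter values and the dose are illustrative placeholders, not numbers from the episode or from any real drug.

```python
# Minimal two-compartment PK model with first-order absorption (illustrative).
import numpy as np
from scipy.integrate import solve_ivp

ka, CL, Q, Vc, Vp = 1.0, 5.0, 8.0, 20.0, 70.0  # 1/h, L/h, L/h, L, L (assumed values)

def two_cpt_rhs(t, y):
    gut, central, peripheral = y  # drug amounts in each compartment
    return [
        -ka * gut,                                                  # absorption out of the gut
        ka * gut - (CL + Q) / Vc * central + Q / Vp * peripheral,  # central compartment (blood)
        Q / Vc * central - Q / Vp * peripheral,                     # peripheral tissues
    ]

dose = 100.0  # oral dose placed in the gut at t = 0
sol = solve_ivp(two_cpt_rhs, (0.0, 24.0), [dose, 0.0, 0.0],
                t_eval=np.linspace(0.0, 24.0, 49), rtol=1e-8, atol=1e-8)
concentration = sol.y[1] / Vc  # central concentration, what a likelihood would use
print(concentration[:5])
```

Note the `rtol` / `atol` arguments: these are exactly the solver tolerances that come back later in the conversation as tuning parameters of the inference.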

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And then you have people who really throw differential equations with

[Charles Margossian]:

you know, hundreds of states at them and, you know, thousands of

[Charles Margossian]:

interpretable parameters. And frankly, I don't think we have the Bayesian

[Charles Margossian]:

computation to fit those models, even though in theory, they lend themselves

[Charles Margossian]:

extremely well to a Bayesian analysis, right? I think that realistically

[Charles Margossian]:

we're somewhere in the semi-mechanistic regime. So these are models

[Charles Margossian]:

that have some level of sophistication, but already we pay a dire price

[Charles Margossian]:

for this sophistication, which is that the computation can take hours

[Charles Margossian]:

or days to fit. And so there's really this potential where better

[Charles Margossian]:

Bayesian computation can really allow people to deploy better models

[Charles Margossian]:

and more sophisticated ones. The other big aspect of pharmacometrics is

[Charles Margossian]:

usually we have trials with data from different patients, there's

[Charles Margossian]:

heterogeneity

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

between the patients or similarity

[Alex Andorra]:

Yeah.

[Charles Margossian]:

between patients, so that lends itself very well to hierarchical modeling.

[Alex Andorra]:

Yeah,

[Charles Margossian]:

And

[Alex Andorra]:

for

[Charles Margossian]:

we

[Alex Andorra]:

sure.

[Charles Margossian]:

know hierarchical modeling is hard, right? It tends to create these posterior

[Charles Margossian]:

distributions with geometries that are very frustrating. And I spent a lot

[Charles Margossian]:

of time worrying about hierarchical models. I've done a lot of work

[Charles Margossian]:

on the... nested Laplace approximations. So

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

that's another nice example of an approximation. It's not variational inference,

[Charles Margossian]:

it has very complementary qualities. And what the nested Laplace approximation

[Charles Margossian]:

allows you to do is marginalize out the latent variables in a hierarchical

[Charles Margossian]:

model.

[Charles Margossian]:

And people often explain nested Laplace as, oh, it's great, it reduces

[Charles Margossian]:

the dimension of your problem. And then we can throw a quadrature

[Charles Margossian]:

at the remaining parameters. We've had models where we had thousands of

[Charles Margossian]:

hyperparameters, those were genetic problems where we were using horseshoe

[Charles Margossian]:

priors to select genes. So even once you marginalize out, you know,

[Charles Margossian]:

the latent variables, you still have a high dimensional problem.

[Charles Margossian]:

So we threw Hamiltonian Monte Carlo at it, but it still made a big

[Charles Margossian]:

difference because we simplified the geometry of the posterior distribution

[Charles Margossian]:

by doing this marginalization for the hierarchical models. So I'm very excited

[Charles Margossian]:

about the prospect of having, you know, and that's the Laplace approximation

[Charles Margossian]:

in Stan.

[Charles Margossian]:

We have a prototype that works really well. We have some really cool automatic

[Charles Margossian]:

differentiation supporting it.

[Alex Andorra]:

Hmph.

[Charles Margossian]:

But, and you know, the problem is again, I wanna try this on ODE-based

[Charles Margossian]:

models.

[Alex Andorra]:

Yeah.

[Charles Margossian]:

I don't get a good approximation. I don't get efficient automatic differentiation.

[Charles Margossian]:

I get something that's unstable. Now the simple examples where I got

[Charles Margossian]:

it working, it actually gave surprisingly accurate results, right?

[Charles Margossian]:

But this is again an example where here's this awesome algorithm and statistical

[Charles Margossian]:

methods, and it just gets frustrated by the nature of the problems we

[Charles Margossian]:

encounter in pharmacometrics. Even though, you know, these are hierarchical

[Charles Margossian]:

models and these methods are designed for hierarchical models. But again,

[Charles Margossian]:

if your likelihood is not a generalized linear model, yeah, suddenly

[Charles Margossian]:

those approximations become much more tricky. And that's why, that's why,

[Charles Margossian]:

you know, I think that we have to use MCMC for these models.

[Alex Andorra]:

Hmm.

[Alex Andorra]:

Yeah, that's interesting. So in these cases, yeah, that makes it a bit clearer,

[Alex Andorra]:

I think, for people in the practical cases where you would have to do that trade-off,

[Alex Andorra]:

basically. How do you handle the trade-off between the different

[Alex Andorra]:

methods? I think it's very important.

[Charles Margossian]:

And that said, I do want to say that there's some really cool approximations

[Charles Margossian]:

that people

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

do deploy in pharmacometrics.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

I recently read a...

[Charles Margossian]:

I'm an anonymous reviewer, so I'm not going to give too many details.

[Alex Andorra]:

Ha, sure.

[Charles Margossian]:

But what I liked, you know, is there were questions of what if we fix

[Charles Margossian]:

those parameters or what if, you know, we draw these parameters from

[Charles Margossian]:

their priors, because they're far enough removed from the data,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

but maybe they're not influenced that much by the data, so the posterior

[Charles Margossian]:

stays close to the prior, but nonetheless, we need that uncertainty for

[Charles Margossian]:

the interpretable quantities in the middle. Right.

[Alex Andorra]:

Hm-hm.

[Charles Margossian]:

And so people are coming up with these compromises. And of course, now

[Charles Margossian]:

we're again in the business of we have these awesome computational constraints.

[Charles Margossian]:

People come up with these approximations either of the inference or even,

[Charles Margossian]:

you know, they say, well, let's use a simpler model. We know there's

[Charles Margossian]:

a more complicated model out there but maybe we still get all the answers

[Charles Margossian]:

that we need with the simpler model. Right. And so now we get again

[Charles Margossian]:

in this, you know, the question of understanding what are the simplifications

[Charles Margossian]:

that we get away with? What are the ones where we pay a heavy price?

[Charles Margossian]:

Can we actually quantify the price we're paying? Can we diagnose when

[Charles Margossian]:

the simplification is too dire or not? And as far as I can tell,

[Charles Margossian]:

the answer is, you know, we're not at the stage of diagnosing the problem,

[Charles Margossian]:

but at least now people are taking this problem seriously enough that

[Charles Margossian]:

they're building these case studies. Now based on these case studies

[Charles Margossian]:

where, you know, we do try out the different models and we do fit

[Charles Margossian]:

the complicated methods and we're able to say something about the simpler

[Charles Margossian]:

methods because we did fit the complicated methods. So it's an academic

[Charles Margossian]:

exercise in a way.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

But at least it gives us, you know, that's how it starts. And that's

[Charles Margossian]:

how we're going to start developing the intuition and the heuristics

[Charles Margossian]:

to then deal with, you know, deploying the approximations and the

[Charles Margossian]:

simplifications in practice.

[Alex Andorra]:

Yeah, that makes sense. And so actually, do you have an example of a research

[Alex Andorra]:

project where you applied Bayesian statistics in this field of pharmacometrics

[Alex Andorra]:

and where Bayesian statistics really helped you uncover very important insights,

[Alex Andorra]:

thanks to all those methods that you've talked about.

[Charles Margossian]:

Yeah. So I've never led a project in pharmacometrics. I've always collaborated

[Charles Margossian]:

with pharmacologists.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

So, you know, and it's true that my work has been more methodological,

[Charles Margossian]:

has been more developing Torsten itself. That said, what I can talk

[Charles Margossian]:

about is some of the interactions I've had with pharmacometricians and some of

[Charles Margossian]:

the works where I was maybe more a contributor than, say, the project lead.

[Charles Margossian]:

And I'll give you one example. And then I think that if we have time

[Charles Margossian]:

and we talk a little bit about epidemiology, I have a very

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

good example in epidemiology.

[Alex Andorra]:

Ah, okay, I was...

[Charles Margossian]:

But this is a, this was super cool actually. This is, this is a bit

[Charles Margossian]:

anecdotal.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

I don't know that there's a preprint on this now, but I was in Paris,

[Charles Margossian]:

I was visiting Inserm, which does a lot of medical work. And I was

[Charles Margossian]:

visiting France Mentré's group, and they have fantastic people there.

[Charles Margossian]:

And I'm talking with Julie Bertrand, and she's interested in pharmacogenetics,

[Charles Margossian]:

right? So we're... We have some genome sequencing or some gene tracking

[Charles Margossian]:

that's involved, and we're trying to see how a patient reacts to a treatment,

[Charles Margossian]:

to a certain condition. And in that case, what ended up happening,

[Charles Margossian]:

so one of the things that was frustrating is they're trying to identify

[Charles Margossian]:

what are the genes that are meaningful, that seem to have a meaningful

[Charles Margossian]:

connection to the outcome of the treatments. And they couldn't get anything

[Charles Margossian]:

that is statistically significant in the traditional sense, right?

[Charles Margossian]:

There wasn't one gene or one SNP that unambiguously stood

[Charles Margossian]:

out as, yes, this is a meaningful SNP and it should intervene in

[Charles Margossian]:

our analysis.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And what they had done was a Bayesian analysis where they had used

[Charles Margossian]:

a horseshoe prior.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And so what the horseshoe prior does is, it's like a regularization

[Charles Margossian]:

tool that does a soft selection. And the way that soft selection

[Charles Margossian]:

manifests is there is a quantity that you can look at, and it will

[Charles Margossian]:

usually be bimodal. And one mode will indicate that the covariate that

[Charles Margossian]:

corresponds to this SNP is completely

[Charles Margossian]:

regressed to zero. So that's an indication that this is not a very

[Charles Margossian]:

useful explanatory variable. And then the second mode tells you actually

[Charles Margossian]:

this variable matters and it's not regressed to zero. And so why

[Charles Margossian]:

there are two modes is because there is uncertainty. But the very

[Charles Margossian]:

cool thing is that even though there was no single SNP that stood

[Charles Margossian]:

out as this is the meaningful SNP, uh, you had two SNPs that came up and

[Charles Margossian]:

they were both bimodal, right? So what that means is you couldn't definitively

[Charles Margossian]:

say that, uh, either SNP mattered, right? But you say, okay, with some

[Charles Margossian]:

probability, either of the SNPs can matter.
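
As a rough illustration of the horseshoe mechanism Charles describes, here is a generic sketch in PyMC (mentioned earlier in the conversation), not the actual pharmacogenetic model from Julie Bertrand's group. The per-coefficient shrinkage factor computed at the end is one common quantity whose posterior tends to be bimodal: mass near one means the covariate is regressed to zero, mass near zero means it is kept; the exact quantity used in the study Charles recalls is not specified in the episode.

```python
# Generic horseshoe regression sketch (illustrative data and priors).
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[0] = 2.0                                   # only one truly active covariate
y = X @ beta_true + rng.normal(scale=1.0, size=n)

with pm.Model() as horseshoe:
    sigma = pm.HalfNormal("sigma", 1.0)
    tau = pm.HalfCauchy("tau", beta=1.0)             # global shrinkage
    lam = pm.HalfCauchy("lam", beta=1.0, shape=p)    # local, per-covariate scales
    beta = pm.Normal("beta", mu=0.0, sigma=tau * lam, shape=p)
    pm.Normal("y", mu=pm.math.dot(X, beta), sigma=sigma, observed=y)
    idata = pm.sample(500, tune=500, chains=4, target_accept=0.95, random_seed=1)

# Shrinkage factor per covariate and draw: near 1 => regressed to zero,
# near 0 => the covariate is kept. Bimodality reflects selection uncertainty.
lam_d = idata.posterior["lam"].values
tau_d = idata.posterior["tau"].values[..., None]
kappa = 1.0 / (1.0 + (tau_d * lam_d) ** 2)
print(kappa.mean(axis=(0, 1))[:5])
```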

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And here's where it gets very interesting is that actually, because

[Charles Margossian]:

you have a multivariate posterior distribution, you can go a bit further

[Charles Margossian]:

and you realize that the two SNPs are anti-correlated, right? And so

[Charles Margossian]:

what that means is when you have a lot of posterior mass at one mode

[Charles Margossian]:

for one SNP that says, this variable matters, don't regress it to

[Charles Margossian]:

zero, the other covariate would always get regressed to zero

[Alex Andorra]:

Hmm.

[Charles Margossian]:

and vice versa, right?

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

So what the multivariate analysis tells you and what this proper treatment

[Charles Margossian]:

of uncertainty tells you is like, yeah, you can't say that one SNP

[Charles Margossian]:

is statistically significant in the traditional sense, but now you have

[Charles Margossian]:

this more comprehensive treatment of uncertainty that tells you, but

[Charles Margossian]:

you know what? It has to be one of those two. You can't tell for

[Charles Margossian]:

sure which one, but it has to be one of those two. And that's a nice

[Charles Margossian]:

example where we're really not just looking at the maximum likelihood

[Charles Margossian]:

estimator or even the expectation value or just a variance, we're

[Charles Margossian]:

really looking at essentially the quantiles, the extreme quantiles

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

of the posterior distribution. And excuse me.

[Charles Margossian]:

Sorry about that.

[Charles Margossian]:

we're looking at multiple variables and

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

the uncertainty across those multiple variables, right? So I think

[Charles Margossian]:

that's a very neat example and, you know, a paper to look forward to when

[Charles Margossian]:

it comes out. Again, this was anecdotal. This was a conversation

[Charles Margossian]:

in the laboratory,

[Alex Andorra]:

Yeah.

[Charles Margossian]:

but I got very excited about that example.

[Alex Andorra]:

Yeah, I mean, that sounds super exciting. Thanks a lot for sharing that. And

[Alex Andorra]:

yeah, when the paper is out, please get in touch and then that'd

[Alex Andorra]:

be a fun thing to talk about again. And actually, so we're running a bit

[Alex Andorra]:

long, but do you still have a few more minutes because I still have a few

[Alex Andorra]:

questions for you. Or you want

[Charles Margossian]:

Yeah,

[Alex Andorra]:

to

[Charles Margossian]:

yeah,

[Alex Andorra]:

close out?

[Charles Margossian]:

I can stick around for a bit. Yeah.

[Alex Andorra]:

Okay, awesome. Yeah. So, uh, yeah, I'd like to, to talk to you about, uh,

[Alex Andorra]:

priors and then maybe, um, Torsten, uh, or an epidemiology example, uh, or both

[Alex Andorra]:

of them. Um, so yeah, maybe, um, basically you have, uh, been working on,

[Alex Andorra]:

on Torsten, which is, if I understood correctly, uh, a pharmacometrics application

[Alex Andorra]:

of Stan models. So that was interesting to me because the Bayesian field

[Alex Andorra]:

has been evolving really, really fast lately, especially with the new techniques,

[Alex Andorra]:

the new software tools, and you've definitely been part of that effort,

[Alex Andorra]:

Charles. So I'm wondering if there are any recent developments that have particularly

[Alex Andorra]:

excited you, especially in your field of pharmacometrics and epidemiology,

[Alex Andorra]:

and also in relationship with what you're doing with Torsten.

[Charles Margossian]:

Yeah, okay, so let me talk about, let me give one example, right?

[Alex Andorra]:

Mm-hmm, yeah.

[Charles Margossian]:

So not a comprehensive answer, but an illustrative answer.

[Alex Andorra]:

Yeah, it's great.

[Charles Margossian]:

Um, and so one, one fundamental, you know, yet another fundamental question

[Charles Margossian]:

that would come up at a workshop, um, is okay. I have this ODE integrator,

[Charles Margossian]:

right? I need to solve an ODE to evaluate my likelihood and the ODE solvers

[Charles Margossian]:

come with certain tuning parameters. In

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

particular, what is the precision with which you should solve your

[Charles Margossian]:

ODE? And that is going to have an impact on the quality of your

[Charles Margossian]:

inference and also the run time. Right, because if you solve the ODE

[Charles Margossian]:

with a very strict tolerance, it takes much longer to solve that ODE. Right.

[Charles Margossian]:

And so again, that comes back to the question of how much computation

[Charles Margossian]:

should we throw at the problem. And for the longest time, I didn't

[Charles Margossian]:

have a good answer. And maybe I still don't to that question. And the

[Charles Margossian]:

way this manifested is either, you know, at workshops when teaching

[Charles Margossian]:

the subject, but even when we were writing the Stan manual, we have

[Charles Margossian]:

a page on ODE integrators. We state, you know, these

[Charles Margossian]:

are the default values that we use, but we don't have any clear recommendations

[Charles Margossian]:

on what is the precision you should use. And we kind of assume that

[Charles Margossian]:

the user knows their ODE well enough that they'll know which ODE integrator

[Charles Margossian]:

to pick and what tolerance to set.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

which is not realistic. But it's just we didn't have an answer to that

[Charles Margossian]:

question. And so recently, there was a paper. And so I know that.

[Charles Margossian]:

Let me look up exactly who the... I know Aki is a co-author on it,

[Charles Margossian]:

Aki Vehtari.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

But I want to give a shout out to the lead author.

[Alex Andorra]:

Yeah, for sure. And we'll put that also in the show notes for this episode.

[Charles Margossian]:

Yeah, so there you go. So Juho Timonen is the

[Charles Margossian]:

lead author, and then there are a bunch of other people on it. So

[Charles Margossian]:

their paper is an importance sampling approach for Bayesian ODE models. And so

[Charles Margossian]:

essentially, what they realize is, when you're

[Charles Margossian]:

using a numerical integrator to evaluate your likelihood, really you're

[Charles Margossian]:

not computing the true likelihood, you're computing an approximation

[Charles Margossian]:

of this likelihood. And we have a lot of tools in statistics when we're

[Charles Margossian]:

not dealing with the exact likelihood, but some approximation to this

[Charles Margossian]:

likelihood, notably in the field of importance sampling.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And what they came up with is a way to use those tools that exist

[Charles Margossian]:

in importance sampling to actually check whether the approximate likelihood,

[Charles Margossian]:

and therefore the tuning parameters of the ODE integrator, are precise

[Charles Margossian]:

enough

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

or not, right? And so that gives you a diagnostic that you can use.
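
A hedged sketch of how such an importance-sampling check can look in practice: posterior draws obtained with a coarse ODE tolerance are re-weighted against the likelihood recomputed with a strict tolerance, and the Pareto k diagnostic from Pareto-smoothed importance sampling indicates whether the coarse solves were good enough. The exponential-decay model, the data, and the stand-in "posterior draws" below are all made up for illustration; see the Timonen et al. paper for the actual method.

```python
# Re-weight coarse-tolerance draws against a strict-tolerance likelihood and
# check the Pareto k-hat (everything here is a toy stand-in).
import numpy as np
from scipy.integrate import solve_ivp
import arviz as az

rng = np.random.default_rng(0)
t_obs = np.linspace(0.5, 8.0, 10)
y_obs = 5.0 * np.exp(-0.7 * t_obs) + rng.normal(scale=0.2, size=t_obs.size)

def log_lik(theta, rtol):
    """Gaussian log-likelihood whose mean requires an ODE solve."""
    sol = solve_ivp(lambda t, y: -theta * y, (0.0, t_obs[-1]), [5.0],
                    t_eval=t_obs, rtol=rtol, atol=rtol * 1e-3)
    resid = y_obs - sol.y[0]
    return -0.5 * np.sum(resid**2) / 0.2**2

# Stand-in for posterior draws produced by MCMC run with the coarse tolerance.
theta_draws = rng.normal(0.7, 0.05, size=500)

coarse = np.array([log_lik(th, rtol=1e-2) for th in theta_draws])
strict = np.array([log_lik(th, rtol=1e-10) for th in theta_draws])

log_w = strict - coarse           # importance weights, coarse -> strict
_, khat = az.psislw(log_w)        # Pareto-smoothed importance sampling
print(f"Pareto k-hat = {float(khat):.2f} (below ~0.7 suggests the coarse solves were fine)")
```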

[Charles Margossian]:

It's not a completely perfect diagnostic and I think I'm still trying

[Charles Margossian]:

to test that idea and play around with it and see how well it really

[Charles Margossian]:

works. So I wanna try it out on pharmacometrics problems. Um, right

[Charles Margossian]:

now I'm writing, uh, I've been tasked with writing, um, a tutorial on

[Charles Margossian]:

Torsten. So we released part one a while ago. Now we're writing part

[Charles Margossian]:

two and we promised that in part two, we'd explain how to tune the ODEs.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Um, except we only know so much about how to tune the ODEs. So I am trying

[Alex Andorra]:

Yeah.

[Charles Margossian]:

this method. I am trying this method, um, on the ODEs, but it's just

[Charles Margossian]:

getting me, you know, thinking about: are the tolerances we're using

[Charles Margossian]:

too conservative? Are they too strict? We actually get important

[Charles Margossian]:

speed ups. I'm teaching a course this September in Leuven, the Advanced

[Charles Margossian]:

Summer School in Bayesian Methods, where I'm gonna have the students

[Charles Margossian]:

on an epidemiology problem, on a pharmacokinetic problem, try out different

[Charles Margossian]:

tolerances and see the differences and then build this diagnostic

[Charles Margossian]:

that's based on importance sampling to check whether the precision with

[Charles Margossian]:

which they're solving their ODEs is making meaningful changes to

[Charles Margossian]:

the inference, right?

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And so again, I think that this is one tuning parameter where either

[Charles Margossian]:

we're using ODE solvers that are not precise enough or ODE solvers

[Charles Margossian]:

that are too slow. We're not being optimal in our computation. And this

[Charles Margossian]:

is preventing us from either getting accurate answers or deploying

[Charles Margossian]:

models with the sophistication that we would want. And so that's a development

[Charles Margossian]:

that I'm excited about. The one caveat that I will throw in this is

[Charles Margossian]:

that right now we're still thinking about it as a single tuning parameter.

[Charles Margossian]:

Whereas what I've observed in practice is that the behavior of the ODE

[Charles Margossian]:

can change widely depending on where you are in the parameter space.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

So for certain parameter values, you don't need to be super precise.

[Charles Margossian]:

And for other parameter values, you need a lot more precision or you

[Charles Margossian]:

need a different type of integrator because the ODE just behaves in

[Charles Margossian]:

a different way.

[Alex Andorra]:

Yeah.

[Charles Margossian]:

Now very concretely, how does this manifest? So I don't wanna say it's

[Charles Margossian]:

hopeless, but what ends up happening is during the warmup phase,

[Charles Margossian]:

where we start the Markov chains far off in the parameter space, or we

[Charles Margossian]:

haven't tuned the MCMC sampler, so the Markov chains are still jumping

[Charles Margossian]:

left and right.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

During the warm-up phase, we are more vulnerable to those extreme parameter

[Charles Margossian]:

values. Whereas during the sampling phase, we get away with less strict

[Charles Margossian]:

ODE solvers. So I think that somehow, what I would like to do is two

[Charles Margossian]:

things. One, I would like to have a very automatic way of running this

[Charles Margossian]:

diagnostic in Torsten. But I also wanna give users control over what

[Charles Margossian]:

ODE solver they use at different stages of MCMC. I think that

[Charles Margossian]:

makes a crucial difference. Another way to approach this problem is coming

[Charles Margossian]:

up with good initializations. If I can start... my MCMC near, you know, within

[Charles Margossian]:

the region of parameter space where I might land under the stationary distribution.

[Charles Margossian]:

And I know that here, the parameter values are a bit less absurd.

[Charles Margossian]:

And so solving the ODEs is a bit more feasible computationally. If

[Charles Margossian]:

I can start there, then maybe I'm skipping the early regions that really

[Charles Margossian]:

frustrate my ODE integrator and my MCMC sampler. And the way this manifests

[Charles Margossian]:

in practice is you'll have, you know, you run, let's say you run

[Charles Margossian]:

eight chains. You have six of them that finish quickly, and then you

[Charles Margossian]:

have two of them that are lagging because they're stuck somewhere

[Charles Margossian]:

during the warmup phase. They're encountering this region where the

[Charles Margossian]:

parameter values are a little bit absurd. Your ODE is super hard to

[Charles Margossian]:

solve, and that's eating up all your computation. And the truth

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

is, you know, at least... And the way we do things right now, we always

[Charles Margossian]:

wait for the slowest chain. By

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

the way, we don't have to do it, right? So I'm excited about methods

[Charles Margossian]:

to come up with good initializations. And I think that this is a place where variational

[Charles Margossian]:

inference can be good, right? So the, especially

[Alex Andorra]:

Aha.

[Charles Margossian]:

now the pathfinder variational inference, right,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

was originally designed to produce good initializations for MCMC. And

[Charles Margossian]:

so getting good initializations, that's a great example of, I need a good answer,

[Charles Margossian]:

but not a super precise answer. Right. Uh, and so, you know, if somehow

[Charles Margossian]:

Pathfinder can help me skip the regions that frustrate ODE integrators,

[Charles Margossian]:

I think that's a big win for pharmacometrics. Again, that's, that's

[Charles Margossian]:

something we have to test and really try out. But I will say now going

[Charles Margossian]:

back, I'm going to make another connection back to R-hat, which is

[Charles Margossian]:

when we think about overdispersion, really what overdispersion means.

[Charles Margossian]:

So we have shown a bit formally that what makes R-hat reliable, and

[Charles Margossian]:

we define reliability in a formal sense, is that the initial variance has

[Charles Margossian]:

to be large relative to the initial bias. Now, if you have an initialization,

[Charles Margossian]:

like you draw your sample from your prior and that's reliable,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

and then you throw variational inference, right? If variational inference reduces

[Charles Margossian]:

your squared bias more than it reduces your variance, it turns out

[Charles Margossian]:

you preserve the property of reliability. So there's actually a sense

[Charles Margossian]:

that we might be able to get good initializations for MCMC without

[Charles Margossian]:

compromising, uh, the reliability of our diagnostics for convergence,

[Charles Margossian]:

right? And these are all the pieces that I think can come together and

[Charles Margossian]:

really help a great deal with, you know, pharmacometrics, but more generally,

[Charles Margossian]:

uh, with ODE-based models, and even more generally, models based on implicit

[Charles Margossian]:

functions. And I do include things that use nested Laplace approximations

[Charles Margossian]:

in that, because that's an optimization problem, that's an implicit

[Charles Margossian]:

function. It has the same kind of misbehaviors that an ODE has, but also

[Charles Margossian]:

technologies that we develop for ODEs in a Bayesian context, better initializations,

[Charles Margossian]:

different tolerances, importance sampling corrections, would apply

[Charles Margossian]:

to nested Laplace stuff. So those are the things that I'm excited about

[Charles Margossian]:

Um, but it's going to take time.

[Alex Andorra]:

Hey,

[Charles Margossian]:

I just

[Alex Andorra]:

yeah.

[Charles Margossian]:

want to, I just want to be perfectly honest. It takes time to really,

[Charles Margossian]:

you know, uh, you know, between the paper and the software implementation

[Charles Margossian]:

and, uh, you know, clear description in the manual that the users can,

[Charles Margossian]:

can follow. It takes a lot of time.

[Alex Andorra]:

Yeah, yeah, for sure. I mean, that stuff is really at the frontier of the

[Alex Andorra]:

research. So it does make sense that it takes time to percolate

[Charles Margossian]:

Yeah, yeah.

[Alex Andorra]:

from finding a solution to, okay, this is how we can implement it and

[Alex Andorra]:

reproduce it reliably.

[Charles Margossian]:

Yeah, exactly. And I'm most concerned about how speculative I am for

[Charles Margossian]:

some of the ideas I'm sharing. But I think these are directions where

[Charles Margossian]:

it's worth pushing the research. That has the potential to have a

[Charles Margossian]:

really big impact.

[Alex Andorra]:

Yeah, yeah, for sure, for sure. And so I put in the show notes the Pathfinder

[Alex Andorra]:

paper, actually. That made me think that I should do an episode about the

[Alex Andorra]:

Pathfinder paper, basically, what that is about and what that means concretely.

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

So yeah, I'll try to do that

[Charles Margossian]:

Yeah.

[Alex Andorra]:

in

[Charles Margossian]:

And

[Alex Andorra]:

the

[Charles Margossian]:

if you haven't

[Alex Andorra]:

near

[Charles Margossian]:

reached

[Alex Andorra]:

future.

[Charles Margossian]:

out to Lu Zhang, you know, I mean, it, you know, first of all, the paper

[Charles Margossian]:

is great. But actually sitting down and discussing this with her,

[Charles Margossian]:

you know, at the blackboard or whatever, like, she has so many ideas that

[Charles Margossian]:

have not appeared in the paper itself. I think, you know, if I can recommend

[Charles Margossian]:

a guest. If you haven't had her already, I don't know if you... But

[Charles Margossian]:

yeah, Lu

[Alex Andorra]:

No,

[Charles Margossian]:

Zhang, I think,

[Alex Andorra]:

I didn't.

[Charles Margossian]:

would be fantastic to interview.

[Alex Andorra]:

Yeah, yeah, exactly. I was actually thinking about inviting her on the podcast

[Alex Andorra]:

to talk about Pathfinder because she's the lead author on the paper and also

[Alex Andorra]:

all the other authors have been on the show, Bob

[Charles Margossian]:

Ah

[Alex Andorra]:

Carpenter,

[Charles Margossian]:

cool.

[Alex Andorra]:

Andrew

[Charles Margossian]:

Cool.

[Alex Andorra]:

Gelman, Aki Vehtari. Lu Zhang is missing, so definitely

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

need to correct that. So yeah, in the near future, definitely try to have

[Alex Andorra]:

that episode that would be a very interesting one. So maybe can you talk

[Alex Andorra]:

Charles about... So I'll give you two avenues and you pick the one you prefer

[Alex Andorra]:

because I don't want to take too much of your time but basically I'm curious

[Alex Andorra]:

either about hearing an example of your work in the epidemiology field. Or you

[Alex Andorra]:

can talk a bit more in general about a very, very common question that students

[Alex Andorra]:

always ask me, and it's about priors. So basically, how do you choose the

[Alex Andorra]:

prior, how do you approach the challenge of choosing appropriate prior distributions,

[Alex Andorra]:

and especially when you're dealing with complex models. So these are the two avenues

[Alex Andorra]:

I have in mind. And feel

[Charles Margossian]:

Okay.

[Alex Andorra]:

free to... Pick one or... Pick both.

[Charles Margossian]:

Okay, so let me say that about priors, I think that...

[Charles Margossian]:

I want to get better answers,

[Alex Andorra]:

Yeah.

[Charles Margossian]:

a better answer to this question. And, you know, like the next

[Charles Margossian]:

workshop that I'm giving, I don't have a module on priors that I find

[Charles Margossian]:

satisfactory. And so I'm still undergoing this journey. But what I will

[Charles Margossian]:

do is I'll give, I'll talk about epidemiology and I will talk about

[Charles Margossian]:

the prior that we use there. So that will be, that would be an example.

[Charles Margossian]:

And, you know, I like to think through examples, I like to think

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

through the anecdotal, as complementary to the formal, right? I'm

[Charles Margossian]:

a big fan of fairy tales and fables, simple stories that have good themes.

[Alex Andorra]:

Yeah.

[Charles Margossian]:

So what happened in epidemiology, and this will be a good example, is, well,

[Charles Margossian]:

the pandemic happened, and suddenly COVID, we're all going home.

[Charles Margossian]:

And we had colleagues in epidemiology, in particular Julien Riou.

[Charles Margossian]:

I actually met Julien Riou at StanCon, it was in Cambridge,

[Charles Margossian]:

he was a PhD student at the time and he demonstrated his model and

[Charles Margossian]:

he was using those ODE-based models. So now instead of having a drug

[Charles Margossian]:

compound that flows between different parts of the body, what you have

[Charles Margossian]:

is... you separate the population, the human population, into these

[Charles Margossian]:

different compartments. So susceptible individuals, infected individuals,

[Charles Margossian]:

recovered individuals, and then the individuals flow between the compartments.

[Charles Margossian]:

And there are a bit more layers. But basically the mathematical formalism

[Charles Margossian]:

that I had familiarized myself with in the context of pharmacometrics

[Charles Margossian]:

turned out to be very relevant to... certain classes of epidemiological

[Charles Margossian]:

models. And essentially, Julien was working on an early model of COVID-19.
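
For readers less familiar with these compartmental models, here is a minimal SIR-type sketch of the structure Charles describes, with susceptible, infected, and recovered individuals flowing between compartments. The real COVID-19 models in the papers he mentions have more layers plus a measurement model on top; the rates and population size below are illustrative assumptions.

```python
# Minimal SIR compartmental model (illustrative rates, not the study's values).
import numpy as np
from scipy.integrate import solve_ivp

beta, gamma, N = 0.4, 0.15, 1_000_000.0  # transmission rate, recovery rate, population (assumed)

def sir_rhs(t, y):
    S, I, R = y
    new_infections = beta * S * I / N
    return [-new_infections, new_infections - gamma * I, gamma * I]

y0 = [N - 100.0, 100.0, 0.0]  # 100 initially infected individuals
sol = solve_ivp(sir_rhs, (0.0, 180.0), y0, t_eval=np.arange(0.0, 181.0, 1.0))

# In a Bayesian model, something like the daily incidence (or deaths, filtered
# through a measurement model for who actually gets tested) enters the likelihood.
incidence = beta * sol.y[0] * sol.y[1] / N
print(incidence.max())
```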

[Charles Margossian]:

They were trying to estimate the mortality rate. There was a lot of uncertainty

[Charles Margossian]:

in the data. There were a lot of

[Charles Margossian]:

things to correct for. Right. So, for example, early on, not everyone

[Charles Margossian]:

got tested and testing was not widely available.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

Who got tested? The people with severe symptoms. So now if you think about

[Charles Margossian]:

you're trying to estimate the mortality rate

[Alex Andorra]:

Yeah.

[Charles Margossian]:

and according to your data, the people who catch the disease are

[Charles Margossian]:

only the ones who have severe symptoms, then it looks like a lot of

[Charles Margossian]:

people are dying from the disease. I mean, a lot of people were dying

[Charles Margossian]:

from the disease, but that inflates the number. There's a bias

[Alex Andorra]:

Yeah,

[Charles Margossian]:

because you're

[Alex Andorra]:

yeah,

[Charles Margossian]:

only

[Alex Andorra]:

for sure.

[Charles Margossian]:

testing the people who are sick. You're not testing the people with

[Charles Margossian]:

mild symptoms or even with no symptoms.

[Alex Andorra]:

No, for sure. So, clear sampling bias.

[Charles Margossian]:

The other bias was some of the people who had caught the virus had

[Charles Margossian]:

not died yet. So you count them as living, but that doesn't mean they

[Charles Margossian]:

survived the disease. So that's a bias

[Alex Andorra]:

Yeah.

[Charles Margossian]:

in another direction. And that's an example where you actually have

[Charles Margossian]:

a somewhat mechanistic model. That's based on the epidemiology of

[Charles Margossian]:

how a disease transmits and circulates in a population. Then on top

[Charles Margossian]:

of that, you need to build a measurement model to account for, you know,

[Charles Margossian]:

how the data is collected. Right. But at the end of the day, none of the,

[Charles Margossian]:

you know, we were not able to draw any conclusions unless we understood

[Charles Margossian]:

what was the rate of people who were symptomatic or had severe symptoms.

[Charles Margossian]:

Right. And so there's one parameter in the model, which is the asymptomatic

[Charles Margossian]:

rate. And so now you have two options in a classical statistics

[Charles Margossian]:

framework. Either you fix the parameter and then you're making a

[Charles Margossian]:

strong assumption and maybe you try different values of the fixed

[Charles Margossian]:

parameter. Right. Or you just say, well, I don't know this. So really

[Charles Margossian]:

I have no idea what the mortality rate is because maybe the entire

[Charles Margossian]:

population was infected or maybe only a small fraction was infected

[Charles Margossian]:

and everyone in that small fraction had the severe disease, right? And

[Charles Margossian]:

so we needed an in-between, between saying we don't know anything and saying

[Charles Margossian]:

we know everything. And this is where I think Bayes shines, which

[Charles Margossian]:

is we can quantify uncertainty. We have more nuanced statements, through the

[Charles Margossian]:

language of probability about what our state of knowledge is. And actually

[Charles Margossian]:

what had happened is there were some instances where we had measured asymptomatic

[Charles Margossian]:

rates. The example was a cruise ship, the Diamond Princess, off the

[Charles Margossian]:

coast of Japan. So they had identified some cases of COVID-19. They

[Charles Margossian]:

put the cruise ship in quarantine and they tested everybody, regardless

[Charles Margossian]:

of whether they had symptoms or not. So now you have a small population

[Charles Margossian]:

and based on that small population, you get an estimate of the

[Alex Andorra]:

Yeah.

[Charles Margossian]:

asymptomatic rate. And then you had one or two incidents where people

[Charles Margossian]:

do that. There were some cities where they had done some other experiments

[Charles Margossian]:

and measured some other data. And so then you bring all that information

[Charles Margossian]:

and you use that information to construct priors and then to propagate uncertainty

[Charles Margossian]:

into the model, right? The reason we're able to make predictions and

[Charles Margossian]:

then to

[Charles Margossian]:

calculate things like the mortality rate with the appropriate uncertainty

[Charles Margossian]:

is because we had a prior on the asymptomatic rate. And that's a very

[Charles Margossian]:

nice example. This is more an example of why it was crucial to have

[Charles Margossian]:

a prior rather than, you know, how you should construct priors in general,

[Charles Margossian]:

right? This is a bit of a specific case,
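
A toy sketch of that kind of prior construction: complete testing of a small population (Diamond-Princess-like) informs a Beta prior on the asymptomatic rate, and the uncertainty in that prior is then propagated into a corrected mortality estimate. All counts below are made up for illustration and are not the numbers from the actual study.

```python
# Informed prior on the asymptomatic rate, propagated into a toy IFR correction.
import numpy as np
from scipy import stats

n_tested, n_asymptomatic = 300, 100              # made-up complete-testing counts
prior_asym = stats.beta(1 + n_asymptomatic, 1 + n_tested - n_asymptomatic)

rng = np.random.default_rng(0)
p_asym = prior_asym.rvs(10_000, random_state=rng)  # draws from the informed prior

# Toy propagation: deaths over symptomatic confirmed cases overstates the
# infection fatality ratio; dividing confirmed cases by the symptomatic
# fraction (1 - p_asym) corrects it, with the prior uncertainty carried along.
deaths, confirmed_symptomatic = 50, 2_000        # made-up surveillance counts
ifr = deaths / (confirmed_symptomatic / (1.0 - p_asym))
print(np.percentile(ifr, [5, 50, 95]))
```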

[Alex Andorra]:

Yep.

[Charles Margossian]:

but it's a very good example. And so I'll recommend two papers. One is the

[Charles Margossian]:

one by Julien Riou; this appeared in PLOS Medicine.

[Charles Margossian]:

And it's with a lot of contributors, but it's an estimation of SARS-CoV-2

[Charles Margossian]:

mortality during the early stages of an epidemic. And then the

[Charles Margossian]:

other paper, that goes a bit more into what are the lessons that we

[Charles Margossian]:

learned from this, from a Bayesian workflow perspective, is Bayesian

[Charles Margossian]:

workflow for disease transmission modeling in Stan. And so here the first author

[Charles Margossian]:

was Léo Grinsztajn.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And then we had also Liza Semenova and Julien Riou as co-authors.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And this is a beautiful paper that really goes into the, you know,

[Charles Margossian]:

here are the priors we use. Here's the first version of the model. Here

[Charles Margossian]:

are the limitations with this model. Here's how we diagnose the models.

[Charles Margossian]:

Here are the next iterations. And we go through all the iterations. And I

[Charles Margossian]:

like to show the numbers: the model that eventually we used to model

[Charles Margossian]:

COVID-19 was the 15th iteration. And along the way we had the model

[Charles Margossian]:

that took three days to run. And we had to change the way we wrote

[Charles Margossian]:

the Stan model to improve the computation. So that had to do with how

[Charles Margossian]:

the ODE was parameterized and how the automatic differentiation was happening.

[Charles Margossian]:

We got it from three days to two days. And that's useful not, sorry,

[Charles Margossian]:

three days to two hours, not two days, two hours, drastic speed up.

[Charles Margossian]:

And so that not only was good because the inference is faster, but that's

[Charles Margossian]:

what allowed us to then use more sophisticated versions of the model.

[Charles Margossian]:

And so

[Alex Andorra]:

Yeah.

[Charles Margossian]:

all that is described in that Bayesian workflow for disease transmission

[Charles Margossian]:

paper.

[Alex Andorra]:

Hmm, oh yeah, need

[Charles Margossian]:

So

[Alex Andorra]:

that.

[Charles Margossian]:

that's my epidemiology fairy tale. And I don't mean

[Alex Andorra]:

Yeah.

[Charles Margossian]:

that in the sense that it had a happy ending or that everything was

[Charles Margossian]:

nice and glowing. I mean that this is a very nice story that touches

[Charles Margossian]:

upon a lot of interesting things.

[Alex Andorra]:

Yeah, yeah, for sure. Definitely going to link to this paper in the show

[Alex Andorra]:

notes. Already have it on the Stan website. Actually, you've done a case

[Alex Andorra]:

study on this, so that's perfect with all the co-authors. So I'm going to put

[Alex Andorra]:

that right now in the show notes. And maybe last question before letting you

[Alex Andorra]:

go, Charles. I'm breaking records these days on the episodes. Like episode 89

[Alex Andorra]:

is going out this week actually. And it's so far the longest episode. Uh, it's

[Alex Andorra]:

about two hours. And right now we're like approaching this record, uh, Charles.

[Alex Andorra]:

So, uh, like well done. And at the same time.

[Charles Margossian]:

Okay. There's a fantastic podcast and I forget the name. It's a German

[Charles Margossian]:

podcast.

[Alex Andorra]:

Uh huh.

[Charles Margossian]:

I can't believe I forget the name. I'll try and email you what it is,

[Charles Margossian]:

but basically they do interviews with these, you know, intellectuals

[Charles Margossian]:

and well established, you know, politicians and well, I'm not saying

[Charles Margossian]:

all politicians are intellectuals, but people have opinions and thoughts and there's

[Charles Margossian]:

no time limit to the interview. And they just go on and on and on and on and

[Charles Margossian]:

on. And they just have so many topics to discuss. It's really, I'm

[Charles Margossian]:

not saying you should do that with me, but you could imagine like,

[Charles Margossian]:

you know, someone like, you know, some of the other co-authors that

[Charles Margossian]:

have come up, I feel like you could talk six hours with them and you

[Charles Margossian]:

would just pick their brain, and they would have so many insights to share.

[Alex Andorra]:

Oh yeah, for sure.

[Alex Andorra]:

No, for sure.

[Charles Margossian]:

So they are.

[Alex Andorra]:

Like, I mean, most of the time the limitation is the guest's own time.

[Charles Margossian]:

Right, right,

[Alex Andorra]:

Yeah.

[Charles Margossian]:

right. But yeah, I think it's really cool, this idea of: what if we didn't have a time limit on the interview? What would happen?

[Alex Andorra]:

Yeah, for sure.

[Charles Margossian]:

And eventually somebody gets hungry, and that's what happens. Ha ha ha.

[Alex Andorra]:

Exactly.

[Alex Andorra]:

Yeah, yeah, exactly. Yeah, but so basically before asking you the last two

[Alex Andorra]:

questions, I'm also wondering, because you also teach, so that's super interesting

[Alex Andorra]:

to me. And I often hear that a lot of practitioners and beginners in particular

[Alex Andorra]:

might be hesitant or even intimidated to adopt Bayesian methods because they perceive

[Alex Andorra]:

them as complex. So I don't necessarily agree with that underlying assumption,

[Alex Andorra]:

but without necessarily disputing it, what do you do in those cases? Basically,

[Alex Andorra]:

what resources or strategies do you recommend to those who want to learn

[Alex Andorra]:

and apply Bayesian techniques in their work, but might be intimidated or hesitant?

[Charles Margossian]:

Yeah, okay, so I think that usually when I teach.

[Charles Margossian]:

Most of the time it's people who are already interested in Bayesian

[Charles Margossian]:

methods,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

especially if it's a workshop on Stan, the people who do sign up, they already

[Charles Margossian]:

have adhered to Bayesian methodology. So I don't know that I've really

[Charles Margossian]:

had to convince that many people. Definitely when I was a TA at Columbia,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

I had those conversations a bit more with the

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

students, especially when I TA'd the PhD-level course that everyone has to

[Charles Margossian]:

take, applied statistics, and

[Alex Andorra]:

Hm.

[Charles Margossian]:

not everyone is gonna do Bayesian, right? And so we have those conversations.

[Charles Margossian]:

But I think that ultimately what it is, is not that Bayesian is complex,

[Charles Margossian]:

it's that analysis is complex. Analysis is difficult. And when you use less

[Charles Margossian]:

complicated methods,

[Charles Margossian]:

like, maximum likelihood estimates or point estimates. And these are

[Charles Margossian]:

not simple. I really don't want to understate the difficulty related to

[Charles Margossian]:

those methods and the fact that they are useful in a lot of applications.

[Charles Margossian]:

But anyway, those methods are simpler, and they will work for simple analyses.

[Charles Margossian]:

But if an analysis requires you to quantify uncertainty, to propagate uncertainty,

[Charles Margossian]:

to make decisions with imperfect data. And then someone says,

[Charles Margossian]:

well, I don't wanna be Bayesian because that's complicated, but I

[Charles Margossian]:

still want to quantify uncertainty and I still want to propagate uncertainty

[Charles Margossian]:

and I still want to make predictions. Well, suddenly the classical methods,

[Charles Margossian]:

you really have to do a lot of gymnastics to get them to do what you're

[Charles Margossian]:

interested in. And so the classical methods also become complicated.

[Charles Margossian]:

And that's more because you're trying to use them for, uh, you know,

[Charles Margossian]:

to do a sophisticated analysis. And so I think that what matters

[Charles Margossian]:

is, you know, to really meet practitioners, uh, where they are, what

[Charles Margossian]:

is the problem they're interested in? And, you know, is it a problem where, you know, the complexity of the

[Charles Margossian]:

analysis would be handled in a relatively straightforward way by a Bayesian

[Charles Margossian]:

analysis? And the answer could be yes, it could also be no. And if it's

[Charles Margossian]:

no, that's fine. You know, I think that again, we have, we have to be

[Charles Margossian]:

loyal to the problem, more so than to the field or to the

[Charles Margossian]:

method, but that's how I would start this conversation. And so when,

[Charles Margossian]:

you know, when I'm sitting with pharmacometricians, and by the way, a

[Charles Margossian]:

lot of them don't use Bayesian. I think I can talk for 30 minutes

[Charles Margossian]:

about what do we want to get out of a pharmacokinetic analysis without

[Charles Margossian]:

bringing up what kind of statistics we do. And once we've established

[Charles Margossian]:

the common goals, then we think about what are the methods that are

[Charles Margossian]:

going to get there. That's how I would do it. Frankly, I haven't had that

[Charles Margossian]:

much experience doing it. So take what I say with a grain of salt.

[Charles Margossian]:

Yeah, but I really think it's, Bayesian is complicated because it confronts

[Charles Margossian]:

you with the complexity of data analysis. Right? It doesn't

[Alex Andorra]:

Yeah.

[Charles Margossian]:

introduce the complexity. Let me put it this way.
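(To make this point about propagating uncertainty concrete, here is a minimal sketch in Python, a hypothetical example rather than anything from the episode: a plug-in maximum likelihood estimate gives a single predicted number, whereas a Bayesian posterior predictive carries the parameter uncertainty through to the prediction you actually care about.)

```python
# Hypothetical example: plug-in point estimate vs. posterior predictive.
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: 7 successes out of 20 trials.
successes, trials = 7, 20
n_future = 50  # number of future trials we want to predict over

# Plug-in approach: maximum likelihood estimate of the success probability.
p_mle = successes / trials
plugin_prediction = n_future * p_mle  # a single number, no uncertainty attached

# Bayesian approach with a conjugate Beta(1, 1) prior:
# the posterior is Beta(1 + successes, 1 + trials - successes).
posterior_draws = rng.beta(1 + successes, 1 + trials - successes, size=10_000)
# Posterior predictive for the number of successes in n_future trials.
predictive_draws = rng.binomial(n_future, posterior_draws)

print(f"plug-in prediction: {plugin_prediction:.1f}")
print(f"posterior predictive mean: {predictive_draws.mean():.1f}")
print("90% predictive interval:", np.percentile(predictive_draws, [5, 95]))
```

The width of that predictive interval is exactly the uncertainty the plug-in estimate throws away, and it shrinks as more data comes in.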

[Alex Andorra]:

Yeah, yeah. No, completely agree. And that usually is... I usually give an answer

[Alex Andorra]:

along those lines. So

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

that's interesting, to see that we have converged, even though we didn't, you know, talk about that before the show. So that people don't

[Alex Andorra]:

start having conspiracy theories about Charles and me trying to push the same

[Alex Andorra]:

message.

[Charles Margossian]:

This is not rehearsed.

[Alex Andorra]:

Exactly. Awesome. Well, is there a topic I didn't ask you about that you'd like

[Alex Andorra]:

to mention before we close up the show?

[Charles Margossian]:

Um, no. Let me keep

[Alex Andorra]:

Perfect.

[Charles Margossian]:

it simple.

[Alex Andorra]:

Yeah. I mean, we've talked about a lot of things.

[Charles Margossian]:

There are a lot of topics that I think we could have very interesting conversations

[Charles Margossian]:

about, but I also think that we're starting to hit the time constraints.

[Alex Andorra]:

Awesome. Well, let's close the show then. Me too, I could keep asking you a lot

[Alex Andorra]:

of things, but let's do another episode. Another day. That'd be a fun thing.

[Alex Andorra]:

Or if one day you have a model you've been working on and that you think

[Alex Andorra]:

would be beneficial for listeners to go through. I have this new format now,

[Alex Andorra]:

which are the modeling webinars, where basically you would come on the show

[Alex Andorra]:

and share your screen and show us your code and do some live coding about

[Alex Andorra]:

basically a project that you've been working on. And so if one day you have

[Alex Andorra]:

something like that, feel free to get in touch and we'll get that organized

[Alex Andorra]:

because... It's a really cool new format and I really love it. In August,

[Alex Andorra]:

we've had Justin Boyce showcasing how to do Bayesian modeling workflow in the

[Alex Andorra]:

biostatistics world. So even if you're not a biostatistician, it's a really

[Alex Andorra]:

useful thing because basically, as we were saying, Bayes is basically a set of methods.

[Alex Andorra]:

Even though the field might... not be yours, the methods are definitely transferable

[Alex Andorra]:

in the workflow. So that was a very fun one. And then we've got one coming

[Alex Andorra]:

up in September in theory with Benjamin Vincent and we're going to dive into

[Alex Andorra]:

the new do operator that we have in PyMC and how to use that. So that's

[Alex Andorra]:

going to be a very fun one too.

[Charles Margossian]:

Yeah, I love the idea of that format, and I'm definitely going to

[Charles Margossian]:

check out the two webinars.

[Alex Andorra]:

Yeah,

[Charles Margossian]:

The

[Alex Andorra]:

yeah.

[Charles Margossian]:

one that's already there and the one that's coming up. Actually, the

[Charles Margossian]:

two topics sound extremely interesting.

[Alex Andorra]:

Yeah, I'll send that to you. And yeah, for sure that that's something I've

[Alex Andorra]:

been starting to do. And so if you have one such analysis one day, feel

[Alex Andorra]:

free to reach out that'd be a fun one. Um,

[Charles Margossian]:

Absolutely.

[Alex Andorra]:

so before letting you go, of course, I'm going to ask you the last two questions

[Alex Andorra]:

I ask every guest at the end of the show. One, if you had unlimited time

[Alex Andorra]:

and resources, which problem would you try to solve?

[Charles Margossian]:

Oh yeah, I thought about this and

[Alex Andorra]:

Hehe.

[Charles Margossian]:

I feel like so much of my work is about working under computational

[Charles Margossian]:

constraints,

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

you know, and time constraints and resource constraints and suddenly you relax

[Charles Margossian]:

all of this. And...

[Charles Margossian]:

I think, for me, there are really big questions,

[Charles Margossian]:

let's say at a curiosity level, right? I think there are, you know,

[Charles Margossian]:

there are utilitarian questions and philanthropic questions that

[Charles Margossian]:

I might prioritize. But my thinking right now is, I really love the

[Charles Margossian]:

problems I worked on in astronomy. I am mind blown by some of the stuff

[Charles Margossian]:

my colleagues do in cosmology, where they're trying to understand, you know,

[Charles Margossian]:

the early universe. They have models with six parameters that apparently

[Charles Margossian]:

explain the entire structure of the universe. I'd like to understand that

[Charles Margossian]:

a little bit better. I love the work I did on exoplanets. I think

[Charles Margossian]:

thinking about... Yeah, is there life on other planets? How does it

[Charles Margossian]:

manifest? My advisor when I was an undergrad, she...

[Charles Margossian]:

She put it in a good way. She said, you know, astronomy helps us think

[Charles Margossian]:

about our position in the universe, our place in the universe.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And you know, yes, I have unlimited time and resources in this completely

[Charles Margossian]:

ideal scenario. I think I would gravitate towards these really, really

[Charles Margossian]:

big questions, which by the way, I can still get involved in

[Alex Andorra]:

Heh

[Charles Margossian]:

right

[Alex Andorra]:

heh.

[Charles Margossian]:

now as a researcher at Flatiron, but it's true that there's more

[Charles Margossian]:

competition for that. Um, and just, there's a theme that I, um,

[Charles Margossian]:

Yeah, okay. If I had thought about this more, I would have just started

[Charles Margossian]:

there. But there's a theme that I really like. This goes back to cosmology,

[Charles Margossian]:

to astronomy, you know, which is this idea of a theory of everything

[Charles Margossian]:

and, you know, a very fundamental model.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And, you know, maybe it all reduces to one equation or one set of equations.

[Charles Margossian]:

And what I really wonder is, you know, if we did have that model and

[Charles Margossian]:

that theory, how much insight would we actually get from it? Because

[Charles Margossian]:

you can have a simple system with simple particles and simple interaction

[Charles Margossian]:

rules. And that doesn't mean you understand the emergent behavior of

[Charles Margossian]:

the system. And if I had unlimited resources, I could actually figure

[Charles Margossian]:

out what that equation is and then run the simulations with infinite

[Charles Margossian]:

computation and then study the behavior. And actually, at a very conceptual

[Charles Margossian]:

level, understand how much insight we get out of this.

[Alex Andorra]:

Yeah.

[Charles Margossian]:

And the example I like to give is: the rules of chess are simple,

[Charles Margossian]:

but just because you understand the rules of chess doesn't mean you understand

[Charles Margossian]:

chess.

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

And I think that there is a tension between the reductionist

[Charles Margossian]:

view of physics and the state of the world that we live in, that is

[Charles Margossian]:

kind of related to that. And that sometimes the simplifications that

[Charles Margossian]:

we use, involving probability theory and statistics. That's still

[Charles Margossian]:

useful, not just because we don't have the computation to run all the

[Charles Margossian]:

simulations based on fundamental equations, but because it actually is more

[Charles Margossian]:

intelligible to us. And yeah, I think if we only had the time and

[Charles Margossian]:

resources, you could really explore that. Once

[Alex Andorra]:

Hmm.

[Charles Margossian]:

you can run all the simulations you want, what are the actual models that teach

[Charles Margossian]:

you something and that give you insight?

[Alex Andorra]:

Mm hmm.

[Charles Margossian]:

Let's try that.

[Alex Andorra]:

Yeah.

[Alex Andorra]:

Yeah, that sounds like a fun one for sure. And second question, if you could

[Alex Andorra]:

have dinner with any great scientific mind, dead, alive or fictional, who would it

[Alex Andorra]:

be?

[Charles Margossian]:

I read the question yesterday and I really have...

[Charles Margossian]:

I, um...

[Charles Margossian]:

I like the idea of... it's not a definitive answer, but let's put it as an answer.

[Charles Margossian]:

But I like the idea of talking, you know, with one of the founders

[Charles Margossian]:

of hypothesis testing. So, you know, Neyman, Pearson, Fisher, and,

[Charles Margossian]:

you know, and by the way, just because I'm having the dinner with them,

[Charles Margossian]:

that doesn't mean I condone everything they've done and their character

[Charles Margossian]:

and their behavior. Just for full disclosure. But I think that I'd

[Charles Margossian]:

be interested to ask them, what were you thinking when you came up with

[Charles Margossian]:

those ideas? And what do you make of how this method that you've

[Charles Margossian]:

developed has been used and misused

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

these days? And what do

[Alex Andorra]:

Mm-hmm.

[Charles Margossian]:

you, and also like, I don't know how much Bayesian methods were on

[Charles Margossian]:

their radar, but what would they think of Bayesian methods now that

[Charles Margossian]:

we have the computational power that's available? So kind of, you know, someone

[Charles Margossian]:

from the early 20th, late 19th century, one of those, you know, statisticians

[Charles Margossian]:

that's described as having done something foundational, but also that

[Charles Margossian]:

has worked, you know, on a branch of statistics that

[Charles Margossian]:

historically has been opposed to the branch of statistics that I

[Charles Margossian]:

work on. And I think that could be, you know, hopefully a pleasant, certainly

[Charles Margossian]:

an engaging conversation for dinner.

[Alex Andorra]:

Yeah, very interesting answer and very original. You're the first one to

[Alex Andorra]:

answer that. And I had not even thought about that, but that definitely makes

[Alex Andorra]:

sense.

[Charles Margossian]:

Again, an answer.

[Alex Andorra]:

Very interesting.

[Charles Margossian]:

If having dinner with Laplace is on the table, I'm not saying I wouldn't

[Charles Margossian]:

take that.

[Alex Andorra]:

Yeah, for sure, for sure. But yeah, that's definitely super interesting.

[Alex Andorra]:

And I do remember that I, I mean, in my recollection, which of course is

[Alex Andorra]:

very fuzzy, as any Homo sapiens memory is. We talked about that in episode 51

[Alex Andorra]:

with Aubrey Clayton. And we talked about his book, Bernoulli's Fallacy

[Alex Andorra]:

and the Crisis of Modern Science. I'll put that in the show notes. And I seem

[Alex Andorra]:

to remember also from his book that the founder of hypothesis testing definitely

[Alex Andorra]:

had an active role in putting aside Bayesian statistics at the time, and that

[Alex Andorra]:

he was definitely very, very motivated in that regard too, but I don't

[Alex Andorra]:

remember why off the top of my head. But yeah. So yeah, I'll put that

[Alex Andorra]:

into the show notes. I should really re-listen to that episode myself.

[Alex Andorra]:

This was a very interesting one. So I recommend it to people who are getting started with the podcast; actually, I think it's a very good first one to understand a bit more, basically, why you would want to think about the foundations of hypothesis testing and

[Charles Margossian]:

Mm-hmm.

[Alex Andorra]:

why it's interesting to think about other frameworks and why the Bayesian

[Alex Andorra]:

framework is an interesting one.

[Charles Margossian]:

Oh yeah, that sounds great. Yeah. So maybe I could have dinner with, remind

[Charles Margossian]:

me the name of your guest.

[Alex Andorra]:

Aubrey Clayton.

[Charles Margossian]:

Yeah. So, I mean, add that to my answer.

[Alex Andorra]:

Awesome. Well, thanks a lot, Charles, for taking the time. As usual, I put

[Alex Andorra]:

resources and a link to your website in the show notes for those who want

[Alex Andorra]:

to dig deeper. Thank you again, Charles, for taking the time and being on this show.

[Charles Margossian]:

Yeah, thanks Alex for the invitation and also the service to the community that

[Charles Margossian]:

I think this podcast is. It's really a fantastic resource. So thank you.
