#82 Sequential Monte Carlo & Bayesian Computation Algorithms, with Nicolas Chopin

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

——————————————————————————

Max Kochurov’s State of Bayes Lecture Series: https://www.youtube.com/playlist?list=PL1iMFW7frOOsh5KOcfvKWM12bjh8zs9BQ

——————————————————————————

We talk a lot about different MCMC methods on this podcast, because they are the workhorses of the Bayesian models. But other methods exist to infer the posterior distributions of your models — like Sequential Monte Carlo (SMC) for instance. You’ve never heard of SMC? Well perfect, because Nicolas Chopin is gonna tell you all about it in this episode!

A lecturer at the French university of ENSAE since 2006, Nicolas is one of the world experts on SMC. Before that, he graduated from Ecole Polytechnique and… ENSAE, where he did his PhD from 1999 to 2003.

Outside of work, Nicolas enjoys spending time with his family, practicing aikido, and reading a lot of books.

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady and Kurt TeKolste.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

Links from the show:

Old episodes relevant to these topics:
LBS #14, Hidden Markov Models & Statistical Ecology, with Vianey Leos-Barajas: https://learnbayesstats.com/episode/14-hidden-markov-models-statistical-ecology-with-vianey-leos-barajas/
LBS #41, Thinking Bayes, with Allen Downey: https://learnbayesstats.com/episode/41-think-bayes-allen-downey/
Nicolas’ show notes:
Nicolas on Mastodon: nchopin@mathstodon.xyz
2-hour introduction to particle filters: https://www.youtube.com/watch?v=mE_PJ9ASc8Y
Nicolas’ website: https://nchopin.github.io/
Nicolas on GitHub: https://github.com/nchopin
Nicolas on Linkedin: https://www.linkedin.com/in/nicolas-chopin-442a78102/
Nicolas’ blog (shared with others): https://statisfaction.wordpress.com/
INLA original paper: https://people.bath.ac.uk/man54/SAMBa/ITTs/ITT2/EDF/INLARueetal2009.pdf
Nicolas’ book, An introduction to Sequential Monte Carlo: https://nchopin.github.io/books.html
Laplace’s Demon, A Seminar Series about Bayesian Machine Learning at Scale: https://ailab.criteo.com/laplaces-demon-bayesian-machine-learning-at-scale/
Paper about Expectation Propagation, Leave Pima Indians Alone – Binary Regression as a Benchmark for Bayesian Computation: https://projecteuclid.org/journals/statistical-science/volume-32/issue-1/Leave-Pima-Indians-Alone–Binary-Regression-as-a-Benchmark/10.1214/16-STS581.full
Blackjax website: https://blackjax-devs.github.io/blackjax/

Abstract

by Christoph Bamberg

In episode 82 Nicolas Chopin is our guest. He is a graduate from the Ecole Polytechnique and currently lectures at the French university of ENSAE.

He is a specialist in Sequential Monte Carlo (SMC) samplers and explains in detail what they are, clearing up some confusion about what SMC stands for and when to use it.

We discuss the advantages of SMC over other commonly used samplers for Bayesian models, such as MCMC or Gibbs samplers.

Besides a detailed look at SMC, we also cover INLA. INLA stands for Integrated Nested Laplace Approximation.

INLA can be a fast, approximate inference method for specific kinds of models. It works well for geographic data, such as relationships between regions in a country.

We discuss the difficulties with and future of SMC and INLA and probabilistic sampling in general.

Transcript

Please note that the transcript was generated automatically and may therefore contain errors. Feel free to reach out if you’re willing to correct them.

Nicolas Chopin, welcome to Learning Bayesian Statistics. So yes, again, a French person: I'm starting to expand the universe of Bayesian French people, and I'm very happy to have you on the show. Several people have been recommending you, including Bob Carpenter and Osvaldo Martin, so thank you to both of you; I hope you're listening, because Nicolas is on the show now. And actually, Nicolas, I think you're the first one from ENSAE to be on the show, so I'm quite glad. I'm also glad to hear that Bayesian stats are in good hands with you at ENSAE. I don't know if other people know the school, but as a French-educated person, of course I do. So I'm pretty happy to have you on the show, and thank you for taking the time. Thank you. So, as usual, I have so many questions. To give listeners a summary: we'll be talking mainly about SMC, sequential Monte Carlo, because that's one of your specialties. We'll also talk a bit about INLA, that is, integrated nested Laplace approximation, which a lot of listeners have been asking about. It's going to be mainly about SMC because that's really your specialty, but you are also very knowledgeable about INLA, so I wanted to take the opportunity before actually doing a full show about it. So basically, that's the plan. But of course, as usual, let's start with your origin story. So can you tell us, Nicolas, how did you come to the stats and data world, and how sinuous of a path was it?

My path to applied mathematics was quite direct, because I was always a fan of mathematics. When I was a kid, I remember reading a magazine, and there was this scientist who was doing applied mathematics for rocket science, actual rocket science, and I thought: applied mathematics, that's what I want to do. So it's not very original, but I always wanted to do maths. But in France, going into statistics was not so obvious for my generation, because you are not taught statistics in high school, and at university you will learn probability maybe in your undergraduate degree and maybe stats later on. This is maybe not very interesting to the listeners, but at the beginning I wanted to go into computer science. I was a bit of a nerd, and I guess I still am. For some reason I ended up at ENSAE. There were parts I really did not like, but the statistics part was the part that really attracted me most. It was not taught so well at the time. Oops, I shouldn't say that so directly, and I hope we do a better job than we used to. But still, I really liked it and I got interested. And I had no idea what machine learning was, because at the time statistics was really detached from it. It was like a war between probability and statistics, and between machine learning and statistics. So yeah, that's the gist of it: it was maths from the start, but statistics took me more time. It's quite common, this kind of thing.

Yeah, I can relate, at least for the French geeky people. For me it was exactly the same thing. I was actually much more into math, and it's true that this is what you get taught much more, at least at the time. I don't know about now; I hope it has changed. I mean, the fact that you're teaching it at ENSAE is a first testimonial that they do teach it. But it's true that I did a lot of linear algebra, basically very conceptual math, which I loved. Linear algebra is actually super useful too, so I'm not saying we shouldn't teach linear algebra; it's a pillar of the methods we're using today. But stats, for me at least, was just really awful, because it was only pen-and-paper stats. And so that was super boring. I'm not a machine, so I don't compute that fast or that reliably. So it was like, I don't understand why I'm doing this stuff, you know. It also limits the kind of problems you can work on, because you have to compute with your brain, and the human brain is not very good at that. So I didn't really understand the problems I was working on, and I didn't see how they were interesting. Whereas if you had taught me statistics the way I learned it later on by myself, which was with a computer, simulating distributions and events and scenarios, I think I would have been hooked way faster, because it's awesome to simulate everything and basically see the distributions existing in front of you.

Amen to that.

Yeah, I mean, if we have time, for sure we'll talk a bit more about that educational aspect, because I'm really curious about how you see it and how you do it at ENSAE. But first, can you tell us: you said you started with applied math really early, but now you're doing much more stats-oriented research. So how did that happen? Oh, wow.

So I guess it's a bit like you said. Initially statistics didn't look so exciting, but it was really by following Christian Robert's course (Christian was my PhD advisor) that I started to see the connection between computation and statistics. Bayesian statistics was also very appealing to me, and so I progressively moved towards statistics in my last year at ENSAE. Following Christian's course was a big influence on me even before I did a PhD. Now, maybe I should not say too openly that Christian's teaching was not to everyone's taste: a few students like me would really like it, and a big majority would find that it goes too fast. It was super fast. But I used to really love it.

Yeah, I mean, it's really cool to hear about kind of life-changing courses and classes. And what about Bayesian stats in particular: do you remember when you first got introduced to them?

So, to be a bit more precise, what happened is that I did Ecole Polytechnique in France, and after Polytechnique you might go into special tracks for civil servants. I did that, but I was really unsure about my choice. In this track you go to ENSAE, for instance, in my case, and I realized I really wanted to do a PhD. I had to contact people in the administration to be sure I could do a PhD, and they said, oh, it should be about economics or econometrics. And what I wanted to do in that area was Bayesian statistics. So that's why I got in touch with Christian Robert even before I started his course, and I started reading his book. I started to discover MCMC and Bayesian statistics at the same time, so I got really excited about it even before starting his course. And I guess it's the idea of staying in the world of probability; that's what Bayesian statistics does. Probability is beautiful: it's very technical, but it's quite beautiful. And with Bayesian statistics you're still in the realm of probability. I think that's what was really appealing to me. Whereas with standard statistics, classical stuff, it's always: oh, we have this problem, let's do this; then we see another problem, let's do that. It looks much less principled. At least to my naive eyes at the time, it looked like that. So it looked like Bayesian statistics was much more principled, with a very nice idea: you represent your prior knowledge, or lack of knowledge, with a distribution, and you get a posterior. I got the idea immediately, and I stuck to it. And I teach a Bayesian statistics course, and it's funny, because I can see that it really clicks for some students, while for others it does not. So I guess it's something in your brain: some people are Bayesian.

I mean, that's actually a topic I like talking about on the podcast, especially with neuroscientists. There is episode 77, with Pascal Wallisch, and I think also episode 81, with Alan Stocker, where we go into that, especially how people perceive visual speed, for instance, which involves priors and helps you understand which parts of the brain are truly Bayesian and which parts are not. That's a pretty fascinating topic. I'll definitely put these two episodes in the show notes for people who want to dig deeper, if they haven't listened to them. And so now, can you define for listeners the work that you're doing nowadays, and the topics that you are particularly interested in?

Right, so I'm still very much interested in the fundamentals of Monte Carlo. Even if you're not in statistics, you just want to approximate an integral: how do you do it? And that's funny, because there's still work to be done, there's still stuff to learn and results to establish. We don't have the final word yet, so that's kind of fun. And you don't necessarily consider a posterior distribution as your target; it could be anything. But sometimes knowing about statistics gives you insight into constructing good Monte Carlo algorithms. To be a bit more precise, I'm talking about a paper I wrote with Mathieu Gerber, where we try to answer the following question. You want to compute an integral in dimension d, and your budget is n evaluations of your integrand f. What is the best rate of convergence you can get for your approximation? With vanilla Monte Carlo it would be one over the square root of n, but you can actually do better if you're ready to assume that your function is smooth in some sense. And there is this result that tells you the optimal rate, whatever the algorithm: the optimal rate is n to the power minus one half minus r over d, where r is the regularity and d is the dimension, and it has been known since the 50s. We managed to come up with algorithms that reach this optimal rate. And we are not the only ones working on this: there is another paper whose authors look at the same problem and come up with a different solution. So that's funny, because Monte Carlo is an old method, and yet there is still stuff to do in this direction. So that's one thing. And also these days, I work a lot on sequential Monte Carlo samplers.
So that's where you use sequential Monte Carlo, maybe to sample from one fixed distribution: even if you define a sequence of distributions, you are actually interested in just a single distribution. So I've done work essentially on that, trying to come up with more efficient SMC samplers, and I'm still working on it. That's a big part of my research. And these days I happen to supervise PhD theses which are more applied, which is very interesting too. So, one in cardiology, with a cardiologist, where we're trying to predict sudden death. That's a pretty dramatic problem, and if we could predict it, then you could maybe save lives. And other applied work in cosmology and cognitive sciences. Each time I was lucky to end up with a PhD student who wanted to do this kind of applied work, and that's difficult, but it's also very rewarding. And also to find colleagues in other areas of science who want to work with you. It's something you also do, and I find it really interesting too.
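
As a rough illustration of the convergence-rate question discussed above, here is a minimal Python sketch of plain ("vanilla") Monte Carlo integration over the unit cube. The integrand `f`, the dimension and the sample sizes are hypothetical choices made for this example, not taken from the paper mentioned in the episode. The point is only that the root-mean-square error shrinks roughly like n^(-1/2), the baseline that smoothness-exploiting methods improve on, towards n^(-1/2 - r/d).

```python
import numpy as np

def mc_integrate(f, d, n, rng):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]^d."""
    u = rng.random((n, d))  # n uniform points in the unit cube
    return f(u).mean()

def f(u):
    # Hypothetical smooth integrand; each factor integrates to 1 over [0, 1],
    # so the exact integral over [0, 1]^d is 1.
    return np.prod(1.0 + 0.5 * (u - 0.5), axis=1)

def rmse(n, d=3, reps=200, seed=0):
    """Root-mean-square error of the estimator over independent repetitions."""
    rng = np.random.default_rng(seed)
    est = np.array([mc_integrate(f, d, n, rng) for _ in range(reps)])
    return np.sqrt(np.mean((est - 1.0) ** 2))

# Multiplying n by 100 should divide the error by roughly 10.
for n in (100, 10_000):
    print(f"n = {n:6d}   RMSE = {rmse(n):.5f}")
```

The ratio between the two printed errors should be close to 10, which is the square-root behaviour described in the episode.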

Yeah, that's really cool. You kind of answered my next question, which was: you're doing very theoretical stuff, so can you explain to us which impact it actually has, like the applications? And you basically answered; I didn't even know you were doing all these very different applications. That sounds really, really interesting. So please put links in the show notes to any papers or PhD theses or anything that came out of that, because I think people are going to be very interested in them, like any cosmology papers, and the paper you just mentioned about improving Monte Carlo algorithms, for sure. That's awesome. Thank you. Sure. There are all types of listeners on this podcast, so they will all be happy. So yeah, for sure, these will be in the show notes, people, so definitely take a look. And actually, you mentioned it, so I think now is a good time to dig into sequential Monte Carlo, because that's really one of your main areas of research. We haven't talked a lot about that on the show yet. So can you basically tell us what sequential Monte Carlo is, to start with?

Yes, I think it's important to start with that. Thank you for this question, because there are a few misconceptions about SMC. I like to start by saying that I think most people have heard about SMC, but they will be confused, because by SMC we mean slightly different things. On one hand, there is what I would call a particle filter. That's an SMC algorithm you use in a specific scenario, when you have time series data and you want to learn sequentially some latent variable from your model. The model will be a hidden Markov or state-space model. And these models arise a lot, for instance, in signal processing, where you want to track a boat or a car, or do car navigation, stuff like that.

Yeah, yeah. Prediction models, for instance, like when you're trying to infer the population of sharks or things like that. Actually, for listeners, there is a whole episode about hidden Markov models; I put that in the show notes. It was with Vianey Leos-Barajas, who is doing cool work, especially with these kinds of models. So, I'll let you continue.

So when you have a hidden Markov model, or a state-space model, which is a dynamic model with latent state variables, you might want to do filtering. Filtering is trying to recover this latent variable sequentially, and then you use this kind of sequential Monte Carlo algorithm. The alternative approach is the Kalman filter. The Kalman filter is what you use when you have a specific type of state-space model, which is linear and Gaussian. Then computing the filtering distribution just means computing the mean and variance of a certain Gaussian distribution, and you can do everything exactly. So you don't need Monte Carlo, and Kalman filtering has been popular since the 60s and 70s. It's really important in a lot of signal processing tasks and other problems. But if your problem is not linear or not Gaussian, you might use a particle filter. So that's one thing, and that's very interesting. I did some work on this, but I am not the main contributor; a lot of people have worked on this.
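
To make the filtering idea above concrete, here is a minimal sketch of a bootstrap particle filter in Python with NumPy, checked against a Kalman filter. It assumes a toy linear Gaussian state-space model (an AR(1) state observed with noise), hypothetical parameter values, and multinomial resampling at every step. These are choices made for this illustration only; in a linear Gaussian model the Kalman filter is exact, which is why it serves as the reference here.

```python
import numpy as np

def simulate(T, rho, sigma_x, sigma_y, rng):
    """Simulate x_t = rho * x_{t-1} + state noise, y_t = x_t + observation noise."""
    x = np.zeros(T)
    x[0] = rng.normal(0.0, sigma_x / np.sqrt(1 - rho**2))
    for t in range(1, T):
        x[t] = rho * x[t - 1] + rng.normal(0.0, sigma_x)
    y = x + rng.normal(0.0, sigma_y, size=T)
    return x, y

def bootstrap_filter(y, N, rho, sigma_x, sigma_y, rng):
    """Particle estimates of the filtering means E[x_t | y_0, ..., y_t]."""
    means = np.zeros(len(y))
    # Initial particles drawn from the stationary distribution of the state.
    x = rng.normal(0.0, sigma_x / np.sqrt(1 - rho**2), size=N)
    for t in range(len(y)):
        if t > 0:
            # Propagate each particle through the state dynamics.
            x = rho * x + rng.normal(0.0, sigma_x, size=N)
        # Importance weights: likelihood of y_t given each particle.
        logw = -0.5 * ((y[t] - x) / sigma_y) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * x)
        # Multinomial resampling: duplicate heavy particles, drop light ones.
        x = x[rng.choice(N, size=N, p=w)]
    return means

def kalman_filter(y, rho, sigma_x, sigma_y):
    """Exact filtering means for the same linear Gaussian model."""
    m, P = 0.0, sigma_x**2 / (1 - rho**2)  # prior mean and variance of x_0
    means = np.zeros(len(y))
    for t in range(len(y)):
        if t > 0:
            m, P = rho * m, rho**2 * P + sigma_x**2  # predict
        K = P / (P + sigma_y**2)                     # Kalman gain
        m, P = m + K * (y[t] - m), (1 - K) * P       # update
        means[t] = m
    return means

rng = np.random.default_rng(1)
x_true, y = simulate(T=50, rho=0.9, sigma_x=1.0, sigma_y=1.0, rng=rng)
pf = bootstrap_filter(y, N=2000, rho=0.9, sigma_x=1.0, sigma_y=1.0, rng=rng)
kf = kalman_filter(y, rho=0.9, sigma_x=1.0, sigma_y=1.0)
print("max |particle - Kalman| filtering mean:", np.abs(pf - kf).max())
```

The particle filter only needs to simulate the state dynamics and evaluate the observation density, so the same loop would still apply if the model were nonlinear or non-Gaussian, where the Kalman recursion no longer does.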

Yeah, that's really nice. Thanks for talking about that, because it's not something I've covered a lot on the show. And I also saw that you did a two-hour introduction to particle filters on YouTube, and that's in the show notes. So if you're interested in particle filters, definitely take a look at that.

And the other thing: when we say SMC we might also mean SMC samplers, which are the same type of algorithm, because you do something sequentially. What you do sequentially is to simulate random variables, compute weights in an importance sampling step, and then you might have some Markov step. But what is different is that now you consider any sequence of distributions. It doesn't need to be related to a hidden Markov model, and in particular, you could be interested in sampling from the posterior distribution, but you just introduce a sequence of distributions. For instance, it could be the prior, then the posterior given one data point, then two data points, three data points. Or instead you could do tempering, where between the prior and the posterior you consider a pseudo-posterior, which is the prior times the likelihood raised to some exponent between zero and one, and you try to interpolate between the prior and the posterior. Or it could be any sequence of distributions, and maybe you're really interested in each member of the sequence, or maybe you're just creating an artificial sequence to simulate from the terminal distribution, which is the distribution of interest. And when you do this kind of thing, and in particular when the way you move the particles is through MCMC steps, then we're in the realm of SMC samplers. And if I may add a few words about the advantages of SMC samplers, because you could use MCMC instead: one of the big advantages is that, because you have many particles together, it's much easier to parallelize your algorithm compared to MCMC, which is sequential in nature. So you have this parallelization which comes for free, which is embarrassingly easy to do. And also, compared to MCMC, you get an estimate of the marginal likelihood, which is something you don't get in MCMC.
A further advantage is that, because you have this particle sample at every time t, it is much easier to come up with recipes to automatically choose certain tuning parameters of the algorithm, just because you have a population of points rather than one point as in standard MCMC. I'm not sure this is clear to the listeners, but what I'm trying to say is that adaptive MCMC is actually hard, because at every time t you have just one point, or maybe you have a complete chain, but maybe it was started in a bad state and maybe it is not mixing so well. Whereas in SMC samplers you have many particles, and we have ways to make sure that most of them essentially follow the current target distribution. Sorry, I've talked for ten minutes; feel free to interrupt me.
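
The ingredients described above (a tempering sequence between prior and posterior, reweighting, resampling, MCMC moves tuned from the particle population, and a marginal likelihood estimate as a by-product) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation found in any particular package: the Gaussian toy model (`MU`, `S2`), the fixed linear temperature ladder, multinomial resampling at every step, and the 2.38²/d proposal scaling (a common rule of thumb) are all choices made for this example. The toy model is conjugate, so the exact answers are available for comparison.

```python
import numpy as np

MU, S2 = np.array([2.0, -1.0]), 0.25  # hypothetical "data" and noise variance

def log_prior(x):
    """Standard normal prior; x has shape (N, d)."""
    return -0.5 * np.sum(x**2, axis=1) - 0.5 * x.shape[1] * np.log(2 * np.pi)

def log_lik(x):
    """Gaussian likelihood N(MU; x, S2 * I), evaluated for each particle."""
    d = x.shape[1]
    return -0.5 * np.sum((MU - x) ** 2, axis=1) / S2 - 0.5 * d * np.log(2 * np.pi * S2)

def tempering_smc(N=2000, n_temps=20, n_mcmc=5, seed=0):
    rng = np.random.default_rng(seed)
    d = len(MU)
    x = rng.standard_normal((N, d))  # exact draws from the prior
    temps = np.linspace(0.0, 1.0, n_temps + 1)
    log_Z = 0.0
    for lam_prev, lam in zip(temps[:-1], temps[1:]):
        # Reweight: the target moves from prior * lik^lam_prev to prior * lik^lam.
        logw = (lam - lam_prev) * log_lik(x)
        m = logw.max()
        w = np.exp(logw - m)
        log_Z += m + np.log(w.mean())  # running marginal likelihood estimate
        w /= w.sum()
        # Resample (multinomial); fancy indexing returns a fresh array.
        x = x[rng.choice(N, size=N, p=w)]
        # Move: random walk Metropolis, calibrated on the particle covariance.
        cov = np.cov(x, rowvar=False) * (2.38**2 / d)
        L = np.linalg.cholesky(cov + 1e-10 * np.eye(d))
        logp = log_prior(x) + lam * log_lik(x)
        for _ in range(n_mcmc):
            prop = x + rng.standard_normal((N, d)) @ L.T
            logp_prop = log_prior(prop) + lam * log_lik(prop)
            accept = np.log(rng.random(N)) < logp_prop - logp
            x[accept] = prop[accept]
            logp[accept] = logp_prop[accept]
    return x, log_Z

sample, log_Z = tempering_smc()
d = len(MU)
exact_log_Z = -0.5 * d * np.log(2 * np.pi * (1 + S2)) - MU @ MU / (2 * (1 + S2))
print("estimated log marginal likelihood:", log_Z, " exact:", exact_log_Z)
print("posterior mean estimate:", sample.mean(axis=0), " exact:", MU / (1 + S2))
```

Every operation inside the loop acts on all N particles at once, which is the "embarrassingly easy" parallelism mentioned in the episode, and the proposal covariance is re-estimated from the particles at each temperature, which is the automatic tuning that a single MCMC chain cannot do as easily.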

Yeah. So basically, for all the different topics you just talked about, so hidden Markov models, particle filters, all the different types of models you've just mentioned, it would be very appropriate to use an SMC method or an SMC sampler, because it would be more efficient?

Yes. I mean, when I talk about SMC samplers... So, in which models or statistical problems would we use SMC samplers? Essentially any situation where you could use a Metropolis sampler instead, like HMC, NUTS, Langevin, etc. The reason I'm saying this is that you could find models where you would like to do Gibbs sampling instead, and maybe the Gibbs sampler would be hard to beat, because the big advantage of the Gibbs sampler is that it really takes into account the specific structure of your model. With that said, I've been known to be a bit critical of Gibbs sampling: I wrote a paper with James Ridgway on logistic regression, and for this particular problem we found that Gibbs sampling was not actually very competitive, even with a random walk Metropolis. The big drawback of Gibbs samplers, on the other hand, is that they are very, very specialized. You give me a model, maybe I can derive a Gibbs sampler, and then you change the prior and I have to work again to get a Gibbs sampler. Maybe you can use WinBUGS or related software to derive the Gibbs sampler for you, but still, it's a very specialized tool. And it requires a lot of work compared to random walk Metropolis and approaches like that, which just require you to be able to compute the log target density. So I'm pretty sure that, among all the statistical models you may consider, you can end up in situations where Gibbs samplers are the only viable option and are to be preferred. But if you're not in such a case, I would say SMC samplers can be used in any problem where you would otherwise use MCMC or any kind of Metropolis sampler: Langevin, NUTS, random walk, whatever your fancy.
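
The remark above, that random walk Metropolis only requires the ability to evaluate the log target density, can be illustrated with a short sketch. The target below (a two-dimensional standard normal), the starting point and the step size are hypothetical choices for this example; any function returning an unnormalised log density could be passed in instead.

```python
import numpy as np

def random_walk_metropolis(log_target, x0, n_steps, step, rng):
    """Random walk Metropolis; needs only pointwise evaluations of log_target."""
    x = np.asarray(x0, dtype=float)
    logp = log_target(x)
    chain = np.empty((n_steps, x.size))
    n_accept = 0
    for i in range(n_steps):
        # Symmetric Gaussian proposal centred at the current state.
        prop = x + step * rng.standard_normal(x.size)
        logp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if np.log(rng.random()) < logp_prop - logp:
            x, logp = prop, logp_prop
            n_accept += 1
        chain[i] = x
    return chain, n_accept / n_steps

def log_target(x):
    # Hypothetical unnormalised log density: standard normal in 2 dimensions.
    return -0.5 * np.sum(x**2)

rng = np.random.default_rng(0)
chain, acc_rate = random_walk_metropolis(
    log_target, x0=[3.0, -3.0], n_steps=20_000, step=1.0, rng=rng
)
print("acceptance rate:", acc_rate)
print("mean after burn-in:", chain[5_000:].mean(axis=0))
```

Nothing in the sampler depends on the structure of the model, in contrast with a Gibbs sampler, whose conditional distributions must be derived again whenever the prior or likelihood changes. The price is the `step` parameter, which has to be tuned by hand here.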

Okay. And I think a good thing, to give listeners a nice idea of this, and you started doing that, so feel free now to dive deeper: maybe give people an intuition of what the main differences between MCMC and SMC samplers are, basically. That will also, I think, give people an intuition of when and where SMC would be interesting for their problems.

Okay, so that's a tricky one, because my understanding of this has changed recently. The thing to see is that when you use an SMC sampler, at time t, once you have resampled the particles, you have to move them according to an MCMC kernel. So we're using MCMC inside. And we actually have a paper, which I will add to the show notes, with Hai-Dang Dau, my former PhD student, where we propose a slightly different type of SMC sampler called waste-free SMC. We derive the asymptotic variance, and we get an asymptotic variance which is very much the same as what you would get if you used the MCMC kernel the same number of times. So you could say it's not better than MCMC; it's exactly the same, right? We're using MCMC, and we're not doing better or worse than MCMC, in a sense. What we gain is, as I said before, the ability to run something like independent MCMC chains, by construction. So that's good for parallelization. In this paper we also propose a way to estimate the asymptotic variance, which basically relies on the fact that, when running this algorithm, you have almost independent chains, so you can use them to estimate the variance. So the way to understand what we do is that we define a sequence of distributions, we go from one distribution to the next through the reweighting steps, and we do these MCMC steps. So, yes, at every time t it looks like we do MCMC. But for instance, let's say we do random walk Metropolis. The big issue with random walk Metropolis is that you get very bad performance if you don't calibrate the random walk steps. It's critical that you have your current particle sample, which gives you, not exactly, but a very good idea of the shape of the current distribution. And you can use that to calibrate your random walk Metropolis.
So we are not doing better than MCMC, but we kind of get the best out of MCMC, which is hard to get if you do standard MCMC, or even adaptive MCMC. That's my pitch; I don't know how clear it is. For instance, if you know JAX in Python, you should invite Rémi Louf, if you have not already interviewed him; he develops Blackjax. In Blackjax there is waste-free SMC already. And if you use this, your SMC sampler is going to be implemented more or less automatically on the GPU, and it's going to be 100 times faster than any MCMC, just because it's parallel and, by using the GPU, you exploit this property.

Yeah, that's really cool.

I'm not sure I'm really able to explain this properly, because it's a bit subtle.

No, I think you're doing a good job of it. And I mean, these episodes are here to introduce people to the topics, and then they can see if it actually fits what they are working on, or what they are curious about these days. And if they want to go deeper, they can refer to the show notes, or get in touch with me or with Nicolas, and we'll get going.

Yeah. So if I could add a story, maybe it will clarify things. I have this colleague, Robin Ryder, who is doing great work in applied Bayesian statistics, in linguistics for instance. You might consider inviting him, actually, because his talks are always very interesting. And he gave this talk which he started by saying, more or less: I think we all agree that the only reason we do SMC is because Nicolas is in the room and we want to make him happy; otherwise MCMC is better. But this is a problem where we started with MCMC and we never managed to make it work, so we turned to SMC. It was an applied problem which was quite high-dimensional, and there was some structure but no obvious way to exploit it. Whatever kind of MCMC they tried to use, it was impossible to get decent performance, and in the end they used this tempering SMC approach. So tempering is just a way to define a sequence of distributions. And then they managed to make it work. So there are people who end up with very hard problems, because the posterior is high-dimensional and it's very hard to calibrate. And if you try SMC, it's not really SMC that saves you; it is that, using SMC, you can use tempering, for instance, and with tempering you can start at a flattish distribution, where it's easy to move around quickly, and then you progressively reduce the temperature, and in the end you usually get something that starts to work. I just wanted to say that there are still not so many people who are convinced, but there are examples in the wild, concrete difficult problems, where the claims I'm making on this show actually hold, in this case at least.

Yeah, definitely super interesting. So I encourage people who are interested to look at the show notes and study all of these, because I'm definitely super curious about that. And also read about it, because it's another way of getting to the posterior distribution. It's always a bit tricky; it always takes a bit of time to get familiar with it. So if you're completely lost, don't worry, it's completely normal. It comes with time and repetition, and with actually using those tools. And actually, before diving into when SMC breaks, basically the frontiers of SMC right now: for people who want to try out SMC, do you have any package that you would recommend? How can people try that out in their own work and projects?

So I've developed this Python package called particles. And I am a PhD advisor, so I force my PhD students to use it, so that they can tell me where the documentation is not so great and so on. So there are my PhD students, but I also teach a course on SMC at ENSAE, and I recommend that the students use it. So every year the package improves, documentation-wise. So yes, you can use my package. I developed it while writing the book I wrote with Omiros Papaspiliopoulos, so it was meant to illustrate the book, but it also implements SMC samplers, including waste-free SMC, for different scenarios. Alternatively, again, in Blackjax you have an implementation of SMC samplers. So those are the two implementations I'm familiar with. Otherwise, you will mostly find code that people have developed for a certain project, but those are the two packages I know of which try to offer a general package you might use in different scenarios. And it's on GitHub. If you don't understand how to use it, feel free to raise an issue on GitHub, and I'll try to answer quickly.

Yeah, for sure. I just added BlackJAX to the show notes. And when you have the time, definitely add the link to your particles package to the show notes. And can people just conda install it or pip install it? Perfect. So folks, it's available on pip, so install away, and we'll put a link to the GitHub repo in the show notes so that people can look around. If you find any issues, open one on GitHub, and if you know how to solve it, or are willing to learn how to solve it, well, I'm sure Nicolas will be absolutely thrilled. Okay, so that's cool, people can use that. And then one question would be: what are the current frontiers of SMC right now? What I mean by that is, when does it break, and when should it preferably be avoided?

Right. So I have this not-so-recent paper with Pierre Jacob and Alexander Buchholz where we managed to make it work in something like dimension 4000. So dimension is not such an issue, specifically if you use tempering sequences, I think. But I'm not sure the type of SMC sampler I'm talking about will work so well if you have a high-dimensional problem with very separated modes. Because, as I told you, you manage to get the best out of MCMC, but MCMC is not so good at jumping from one region to another. So yes, multimodality is still not something that is really handled well by SMC samplers, just because the MCMC kernels we're going to use are not so well suited for that. When you start tempering, at least at the beginning, you move around between the different regions, because tempering drastically reduces the walls between the different regions. So you might manage to get a lot of particles everywhere. It will still be better than just a standard MCMC, but not great. That's one of the things I want to work on. That's one of my projects at the moment: to try to make SMC samplers work better, particularly for problems where the posterior is far from Gaussian. Because multimodality is a funny problem: at the end of the day, people always look at a target distribution which is a Gaussian mixture, and we don't really care about that. I'm not sure the posteriors people actually produce always look like that. Maybe, maybe not. But you could also imagine banana shapes or weird shapes that are difficult to explore. That's still something we need to improve.
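
Editor's illustration: a small worked example of why tempering lowers the walls between regions. For a tempered density proportional to target^lam, the log-density gap between a mode and the valley between the modes scales linearly with lam. The two-component Gaussian mixture and the function names below are assumptions chosen for illustration only.

```python
import math


def log_target(x):
    # Assumed toy target: equal mixture of N(-4, 0.5^2) and N(4, 0.5^2), unnormalized.
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def barrier(lam, mode=4.0, valley=0.0):
    # Log-density drop a chain must accept to go from a mode to the valley
    # when it targets target^lam instead of the target itself.
    return lam * (log_target(mode) - log_target(valley))


if __name__ == "__main__":
    for lam in (0.05, 0.25, 1.0):
        # exp(-barrier) is the Metropolis acceptance ratio for a direct
        # jump from the mode to the valley.
        print(lam, barrier(lam), math.exp(-barrier(lam)))
```

At small lam the gap is a couple of log-units, so a random-walk kernel crosses it routinely; at lam = 1 the same crossing is essentially never accepted, which is why particles stop changing mode late in the sequence.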

Okay, yeah, well, that's good to know. I'm guessing these are topics that you're working on. And actually, on that, I'm curious: what is the next improvement that you'd like to see in SMC?

Right, that's exactly what I want to work on. I just don't want to reveal too much, because it's the topic of a PhD project that might start in September. But as I said before, the very nice thing about SMC samplers is that you have a sample that you can use to learn features of the distribution, to better calibrate your MCMC step. And I want to improve this part for nasty posteriors or nasty targets, where by nasty I mean really weird shapes, high dimension, whatever, stuff like that. So I have some ideas, and I have a student who is willing to tackle this. So yeah, that's the kind of thing I'd like to work on. Maybe looking into machine learning techniques, like normalizing flows, could be useful too, but I'm not sure. That's something we could look into as well.

I don't like kung fu. Movies.

I've activated it and I found no conflict here. We

Okay, so cool. Thanks a lot for that overview of SMC. That's super cool. We had never done that on the show yet, even though we're now at almost 90 episodes. So that's awesome, and I hope it will be useful to people. I'm pretty sure the show notes will be quite full for this episode, folks, so if you're interested, definitely take a look. Now we're going to switch a bit and talk about another method that you're familiar with. Even though you're saying that you're not familiar with it, I think people will disagree. It's called INLA, which stands for Integrated Nested Laplace Approximations. I actually have a pending invitation to one of the main inventors of INLA, whose name is Håvard Rue. I'm working on getting him on the show, folks. But in the meantime, while we're waiting for Håvard to do a full INLA episode: Nicolas, can you give us the rundown on INLA? Basically, what does integrated nested Laplace approximations mean?

All right, so the key word, well, the important one, is the Laplace approximation. INLA is one of these fast deterministic approximations of a posterior, so it's a completely different beast compared to sampling algorithms like MCMC or SMC. It's not going to be exact in the limit as the number of samples goes to infinity; it's just a fast approximation that you compute deterministically. We could also mention variational Bayes or expectation propagation, but those come more from machine learning. What's nice about INLA is that it really comes from some statistical understanding of your model. You look at spatial models, where you have a latent variable which is Gaussian, and which typically has Markov properties. For instance, think about a country like France, and you divide it into districts, like the départements, and for each district you have a latent variable, which is the propensity of its population to have a certain disease. The Markov property here says that these latent variables are correlated, but each one depends only on its neighbors. If you're in Paris, it depends on the départements north of Paris and south of Paris, but not on Marseille. You can encode this Markov property through some properties of your Gaussian distribution. And so your posterior is going to be the distribution of x, your latent variables from your spatial model, and you might also have some fixed parameters in a vector theta, which is typically low-dimensional. The big insight is that if you fix theta and look at the Laplace approximation of the posterior of x given theta, it's going to be pretty accurate. This is because, for this type of spatial model where you assume this Gaussian latent variable, which is called a Gaussian Markov random field, the prior is quite informative.
The dependency between neighbors, the so-called spatial dependency, is really given by the prior. Relative to this, the data that you observe in each part of France, for instance, is informative about the state at each location, but not very informative about these dependencies. Because of this, the Laplace approximation works pretty well. And then in INLA, you build on this. You do it for fixed theta, then you approximate the marginal of theta, and then you integrate over the theta values. I don't want to get too much into specific details, first because it's hard to do without a blackboard, and second because I don't think I'm so legitimate: as I told you, it's really Håvard Rue who has developed INLA over the last five to ten years, and he would be in a better position to explain it. I was involved more or less because I was visiting Håvard at the right time, and these are mostly his ideas. He is more familiar than I am with this type of spatial model where you can use it. The big thing to understand is that INLA works wonderfully for a certain class of models. You understand that if you have a good statistical understanding of these models, which are called Gaussian Markov random fields, and there are also nice extensions based on SPDEs and stuff like that, so you have to read all these papers by Håvard to make sense of this. What is also beautiful, and that had some influence on me, is that Håvard understood from the start that this method would catch on only if you developed some software behind it that would be really plug and play for practitioners. So you have the ability to describe your model in two or three lines of R. Then you have this underlying engine that you run, and it will compute your approximation almost immediately. It's super fast.
And these are models for which MCMC and Gibbs samplers had been developed for some time, but compared to INLA they were a hundred, a thousand times slower. So they are not serious alternatives in that respect. So that's really neat, and I'm a big fan. But I have to be honest, again, I'm not the best person to explain it, especially the recent progress that has been made. What is really important is that software has been developed which is super user-friendly, and it works very well for certain models. It's not a universal approach, which is also why I like it: it works very well, and there is a good understanding of why it works so well for certain models.
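
Editor's illustration: a toy sketch of the two ingredients described above, namely a Laplace approximation of the latent variable for fixed theta, followed by numerical integration over a grid of theta values. It is not the INLA algorithm or the R-INLA software: the model has a single latent variable instead of a Gaussian Markov random field, and the Poisson likelihood, the exponential prior on theta, the equally spaced grid and all names are assumptions made for illustration.

```python
import math


def laplace_given_theta(y, theta, n_iter=50):
    """Laplace approximation for x ~ N(0, 1/theta), y_j | x ~ Poisson(exp(x)).

    Returns the mode of p(x | y, theta), the approximate variance, and the
    Laplace approximation of log p(y | theta).
    """
    n, s = len(y), sum(y)
    x = math.log((s + 0.5) / n)  # start close to the data
    for _ in range(n_iter):      # Newton steps on a concave log-posterior
        grad = s - n * math.exp(x) - theta * x
        hess = n * math.exp(x) + theta  # minus the second derivative
        x += grad / hess
    hess = n * math.exp(x) + theta
    log_joint = (
        s * x - n * math.exp(x) - sum(math.lgamma(v + 1) for v in y)
        + 0.5 * math.log(theta / (2 * math.pi)) - 0.5 * theta * x * x
    )
    log_marginal = log_joint + 0.5 * math.log(2 * math.pi / hess)
    return x, 1.0 / hess, log_marginal


def integrate_over_theta(y, thetas, log_prior_theta):
    """Weight each fixed-theta Gaussian by the approximate posterior of theta.

    Assumes an equally spaced grid, so the grid spacing cancels in the weights.
    """
    fits = [laplace_given_theta(y, t) for t in thetas]
    logw = [lm + log_prior_theta(t) for (_, _, lm), t in zip(fits, thetas)]
    mx = max(logw)
    w = [math.exp(lw - mx) for lw in logw]
    total = sum(w)
    w = [v / total for v in w]
    mean = sum(wk * m for wk, (m, _, _) in zip(w, fits))
    second = sum(wk * (v + m * m) for wk, (m, v, _) in zip(w, fits))
    return w, mean, second - mean ** 2


if __name__ == "__main__":
    counts = [3, 5, 4, 6, 2]                         # made-up data
    grid = [0.05 + 0.05 * k for k in range(100)]     # theta in [0.05, 5.0]
    weights, post_mean, post_var = integrate_over_theta(
        counts, grid, lambda t: -t                   # assumed Exponential(1) prior
    )
    print(post_mean, post_var)
```

The result is a mixture of Gaussians in x, one per grid point, which is the sense in which the fixed-theta approximations are nested inside an integration over theta.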

Yes, no, definitely, I think that's already useful. I already put the original INLA paper in the show notes, and I think that will be useful for people. So if you know of any good tutorials about INLA, first, put them in the show notes, and second, add a link to the software you're talking about, so that people who want to try out INLA can experiment with it.

The documentation of INLA is really beautiful. There are all these models, and for each model there's an example. It's a bit like, I don't know if you're familiar with the WinBUGS manual, which is pretty famous because it had all these data set examples, and for each example it gave exactly the BUGS code to implement it. If you look at the INLA documentation, it's the same, and that's very convenient. So for practitioners, if your model has already been implemented, you just copy and paste.

That's perfect. Yeah, so definitely put a link to that in the show notes. Perfect.

There are also books that have come out recently that people should check out. I will try to add the links to these books.

In the show notes, that's going to be awesome. Cool. Anything you want to add about INLA that you didn't mention and feel like you should?

No, not really. I would just like to add that I don't feel like I'm an important contributor to INLA compared to Håvard. But I really appreciate that working on this gave me a different perspective on Bayesian computation and got me more interested in deterministic approximations, and also in the idea of offering software: if you don't provide software for a new method, people might never use it. So it had a very positive impact on my research, I think.

Yeah, I completely agree with that. It's definitely super interesting to have the other part of the research, which is the more deterministic approach, instead of sampling-based approximations like MCMC and SMC. If we can really have both sides, that's awesome, because it will make Bayes even more versatile and able to adapt to a lot of different circumstances. So I'm definitely all in for that. And that's why I think it's important to cover it on the show, also because personally I don't know a lot about INLA. I'm way more used to approximation methods, especially any flavor of MCMC. SMC is already something that I use way less, but I have more intuitions about it. INLA is something that I still need to learn, and want to learn. It's kind of a different paradigm, so I'm definitely curious about it. And then the software part is, as you were saying, extremely important to me. That's also why I think open source software like Stan or PyMC is extremely important, because it allows you to basically outsource the heavy-lifting math part that just a few people can and want to do. Then people who are more interested in the code, or who are just not very math-prone, can code their model and leave the math parts, which are extremely complicated, to people who actually enjoy them and are really good at them. And that's why, instead of making your own samplers, using samplers that have been bulletproofed by the open source development workflow is actually extremely important. It's a multiplier of power, in a sense. I don't think I could work on all the models I'm working on right now if it were not for PyMC, for instance, because I'm not a mathematician. If I had to develop my own algorithms, it would take way longer, and I would end up with algorithms that are way less efficient than those I can use right now. So that's cool.
I think we're getting short on time. So before I ask you the last two questions, I want to talk a bit about expectation propagation, because that's one of the topics you also work a lot on. What can you tell us about expectation propagation? I think you in particular have a paper about that. I like the title, "Leave Pima Indians Alone", and it's about binary regression. I already put it in the show notes. So yeah, tell us a bit about that, so that people get an idea of what expectation propagation is, and why it would be interesting.

If you don't mind, what I'm going to do is briefly describe this paper. I had this super bright, super efficient PhD student, James Ridgway. And at the time I had this honest question: every time I open a paper on a new MCMC method, the first numerical experiment is logistic regression, or probit regression. And there are also people who develop specialized Gibbs samplers specifically for this problem. So I was wondering, at the end of the day, which of these methods actually works better for these regressions? So we wrote this paper, which was a tentative review of all the methods you can use to compute such a posterior, either Monte Carlo, so exact in the limit, or approximate, including expectation propagation. I thought that would be nice, because people in stats are not so familiar with expectation propagation, since it comes from machine learning. To make it short, there were a few points we wanted to make. The first is: what's the point of developing a Gibbs sampler for a given problem if a more generic approach, even random walk Metropolis, works better? Because we realized that, for the data sets people were looking at, the Gibbs samplers were not definitely better, and they are more work because they are specialized algorithms. So what's the point? The second point we wanted to make is: guys, there are these deterministic approximations, even Laplace if you want. Okay, you may want to ignore them, but if you use one only to get a first idea of what the posterior looks like, and maybe to calibrate your MCMC sampler, it might make a huge difference. So there is a bit of a rant in there, at the end of the day. Initially, my main motivation for this paper was to complain about the lack of a benchmarking culture: people just propose an algorithm and don't compare it to anything. So that's one point we wanted to make.
Then the third point was: guys, you are developing a Gibbs sampler for this problem. Are you comparing it to other methods? And people say: oh, we do, we compare it to a basic MCMC sampler. But they have not calibrated it properly using this kind of approximation, which is a good thing to do. So that was the point of the paper. Listeners can have a look at the paper and yell at me if they completely disagree. You will not be the first one, because the paper is maybe not controversial, but not everybody was happy with our findings. And anyway, it would be good if we could start a discussion in the community about how we compare algorithms and what the best approach is. If you have two approaches, can you make sure that if your approach is more specialized and more work, you actually get better results? And it would be good to be a bit more open-minded about different methods like expectation propagation, because even if you dismiss it as something approximate, at least you could use it just to start your chain, so that you have a good starting point. Why not? Et cetera, et cetera. So that's the point we were trying to make, in two minutes. Please have a look, and throw tomatoes at me later if you want, I won't be mad. I've stopped talking about this paper; I used to give talks about it for some time. But if this paper could have some impact on the community, that would be great. Some people like the paper, but I also know people who don't.
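
Editor's illustration: a minimal sketch of the calibration idea discussed above, in which a Laplace approximation supplies both the starting point and the proposal scale of a random-walk Metropolis sampler. It uses a one-coefficient logistic regression on made-up data. The 2.38 scaling factor is the usual optimal-scaling rule of thumb for a one-dimensional Gaussian target; the data, the prior scale and all function names are assumptions, and this is not the benchmark code from the paper.

```python
import math
import random


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_post(beta, xs, ys, prior_sd):
    # Logistic log-likelihood plus N(0, prior_sd^2) log-prior, up to a constant.
    total = -0.5 * (beta / prior_sd) ** 2
    for x, y in zip(xs, ys):
        t = x * beta
        log1pexp = t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))
        total += y * t - log1pexp
    return total


def laplace(xs, ys, prior_sd, n_iter=50):
    # Newton iterations to the posterior mode; the curvature gives the scale.
    beta = 0.0
    hess = 1.0
    for _ in range(n_iter):
        probs = [sigmoid(x * beta) for x in xs]
        grad = sum(x * (y - p) for x, y, p in zip(xs, ys, probs)) - beta / prior_sd ** 2
        hess = sum(x * x * p * (1 - p) for x, p in zip(xs, probs)) + 1 / prior_sd ** 2
        beta += grad / hess
    return beta, math.sqrt(1.0 / hess)


def rw_metropolis(xs, ys, prior_sd, start, step_sd, n_iter=5000, seed=0):
    rng = random.Random(seed)
    beta, lp = start, log_post(start, xs, ys, prior_sd)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = beta + rng.gauss(0.0, step_sd)
        lp_prop = log_post(prop, xs, ys, prior_sd)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            beta, lp = prop, lp_prop
            accepted += 1
        chain.append(beta)
    return chain, accepted / n_iter


# Made-up, non-separable data: one covariate, binary response.
XS = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, -0.3, 0.3]
YS = [0, 0, 1, 0, 1, 0, 1, 1, 1, 0]

if __name__ == "__main__":
    mode, sd = laplace(XS, YS, prior_sd=10.0)
    chain, acc_rate = rw_metropolis(XS, YS, 10.0, start=mode, step_sd=2.38 * sd)
    print(mode, sd, acc_rate, sum(chain) / len(chain))
```

Starting at the mode removes the need for a warm-up phase in this toy case, and the proposal scale comes for free from the curvature rather than from trial and error.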

I mean, if you're looking for a reaction, the fact that some people don't like the paper is actually interesting, because it means that at least it starts a debate, which, it seems, is something you want. So that's pretty positive.

And in a few words, since I did not really say what expectation propagation is: it's a way to compute essentially a Gaussian approximation to your posterior, except it's a bit smarter than Laplace, so for various reasons it's going to give you a better Gaussian approximation. It relies on the factorization of the posterior, so if you have n independent data points, you might be able to do it. In EP you have this particular technique to update your local approximation for a given site, and that's model-dependent, but for probit and logit we have a very nice solution. EP is a very curious algorithm. I don't pretend I understand it fully, and I can give you references to some papers that give a better understanding. But it's pretty neat, and it works well.
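
Editor's illustration: a minimal sketch of the site-by-site update described above, for a probit model with a single coefficient, so that every quantity is a scalar. Each likelihood factor is replaced by a Gaussian site; the update removes one site (the cavity), matches the mean and variance of the cavity times the exact probit factor, and puts the site back. The closed-form moments used here are the standard ones for a Gaussian multiplied by a probit factor. The data, the prior variance, the number of sweeps and all names are assumptions, and real implementations work with multivariate Gaussians.

```python
import math


def norm_pdf(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def norm_cdf(a):
    return 0.5 * math.erfc(-a / math.sqrt(2.0))


def ep_probit_1d(zs, prior_var=4.0, n_sweeps=20):
    """EP for p(beta) proportional to N(beta; 0, prior_var) * prod_i Phi(z_i * beta).

    zs[i] = y_i * x_i with y_i in {-1, +1}. Returns the mean and variance
    of the Gaussian approximation.
    """
    n = len(zs)
    site_prec = [0.0] * n    # natural parameters of the Gaussian sites
    site_shift = [0.0] * n
    prec, shift = 1.0 / prior_var, 0.0   # global approximation starts at the prior
    for _ in range(n_sweeps):
        for i, z in enumerate(zs):
            # Cavity: the current approximation with site i removed.
            c_prec = prec - site_prec[i]
            c_shift = shift - site_shift[i]
            v_c, m_c = 1.0 / c_prec, c_shift / c_prec
            # Mean and variance of N(beta; m_c, v_c) * Phi(z * beta).
            denom = math.sqrt(1.0 + z * z * v_c)
            a = z * m_c / denom
            r = norm_pdf(a) / norm_cdf(a)
            m_new = m_c + v_c * z * r / denom
            v_new = v_c - (v_c * z) ** 2 * r * (a + r) / (denom * denom)
            # New site = matched Gaussian divided by the cavity.
            site_prec[i] = 1.0 / v_new - c_prec
            site_shift[i] = m_new / v_new - c_shift
            prec = c_prec + site_prec[i]
            shift = c_shift + site_shift[i]
    return shift / prec, 1.0 / prec


if __name__ == "__main__":
    zs = [1.2, 0.4, -0.7, 2.0, 0.9, -0.3, 1.5, 0.6]   # made-up values of y_i * x_i
    print(ep_probit_1d(zs))
```

Unlike Laplace, which only looks at the mode and the curvature there, each update uses moments of the whole tilted distribution, which is one intuition for why the resulting Gaussian tends to be closer to the true posterior mean and variance.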

Yeah, so expectation propagation is something that's more related to INLA, for instance, that is, to this kind of deterministic algorithm, than to SMC and MCMC, which are more sampling-based approximation algorithms. Yeah. Okay. Right.

Related to variational Bayes.

Okay. Yeah. Which is also a topic you're working on. Awesome. Well, I think these three are amazing, and I've already kept you a long time, so we can call it a show. Thanks, Nicolas. But of course, I'm going to ask you the last two questions. Don't worry, it's not finished yet. So, first one: if you had unlimited time and resources, which problem would you try to solve?

Well, I don't know. I like to work on many different things. It would be a nightmare to work on one problem. I have to make a confession: I hate applying for grants and stuff like that, mostly because you have to commit yourself to something, and I could change my mind tomorrow and work on a different problem. I never know what I'm going to work on next, and I don't want to constrain myself.

I mean, that's what I love about academia and research is that

I mean, you can always work on something else and you never know what's going to come next. I like something to

No, for sure. Hey, nothing in the question says that you should absolutely commit to working on one thing, because you have unlimited time and resources. So, you know.

I mean, I'm always excited about the project I'm currently working on. Then I will work on something else, and so on, et cetera. But as I've already said on the show, trying to make SMC samplers better, that's one of the things I really want to do.

Your passion. And second question: if you could have dinner with any great scientific mind, dead, alive or fictional, who would it be?

Okay, I can't make up my mind, so I can mention several people. In Bayesian computation, Charles Geyer is someone who wrote a lot of very nice papers, and because he is of another generation, I have never met him, and I'd be curious to meet him. So Charles Geyer is one. Also Seymour Haber, though most people might not have heard about him. Seymour Haber wrote two very nice papers in the 60s about Monte Carlo, which are kind of forgotten, and that's very sad. So I'd like to meet him and tell him that I've worked recently on stuff that is a bit like an extension of his work. And to make sense of what I've just said, you might want to have a look at the introduction of the paper on integration and Monte Carlo that I mentioned at the beginning. Otherwise, maybe Alan Turing, of course, but he was very shy, apparently. And Sophie Germain. Sophie Germain was a female mathematician who lived around the time of the French Revolution, which is a very interesting time in history.

A very interesting, very strange life. So Sophie Germain, an extremely brilliant woman, who of course had difficulties doing math at a time when it was difficult for women to work in anything.

She looks like an interesting character. And I'm also a bit fascinated by the figure of Euclid, because it was 2000 years ago. I'm not sure we could talk at all, in any way, but close to 2000 years ago they had already figured out geometry completely. I find that fascinating. Greek philosophers and mathematicians figured out so much, and I've always found it fascinating. I'm not sure we would communicate in a relevant way, to be honest.

Yeah, well, super interesting people. If you're having dinner with Euclid, feel free to invite me. Okay. All right. So I think it's time to call it a show. Nicolas, thanks so much for coming on the show. That was really fascinating. I learned a lot, and I'm tired because I had to think a lot, so now I need to eat something. But yeah, thanks a lot. I'm sure this was also challenging and interesting for listeners. As usual, I put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again for taking the time and being on this show.

It was a nice experience for me. Thanks.

