#150 Fast Bayesian Deep Learning, with David Rügamer, Emanuel Sommer & Jakob Robnik
• Join this channel to get access to perks:
https://www.patreon.com/c/learnbayesstats
• Proudly sponsored by PyMC Labs! Get in touch at alex.andorra@pymc-labs.com
• Intro to Bayes Course (first 2 lessons free): https://topmate.io/alex_andorra/503302
• Advanced Regression Course (first 2 lessons free): https://topmate.io/alex_andorra/1011122
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Chapters:
00:00 Scaling Bayesian Neural Networks
04:26 Origin Stories of the Researchers
09:46 Research Themes in Bayesian Neural Networks
12:05 Making Bayesian Neural Networks Fast
16:19 Microcanonical Langevin Sampler Explained
22:57 Bottlenecks in Scaling Bayesian Neural Networks
29:09 Practical Tools for Bayesian Neural Networks
36:48 Trade-offs in Computational Efficiency and Posterior Fidelity
40:13 Exploring High Dimensional Gaussians
43:03 Practical Applications of Bayesian Deep Ensembles
45:20 Comparing Bayesian Neural Networks with Standard Approaches
50:03 Identifying Real-World Applications for Bayesian Methods
57:44 Future of Bayesian Deep Learning at Scale
01:05:56 The Evolution of Bayesian Inference Packages
01:10:39 Vision for the Future of Bayesian Statistics
Thank you to my Patrons (https://learnbayesstats.com/#patrons) for making this episode possible!
Come meet Alex at the Field of Play Conference in Manchester, UK, March 27, 2026!
https://www.fieldofplay.co.uk/
Links from the show:
David Rügamer:
Emanuel Sommer:
Jakob Robnik:
General references:
Today, we're going deep into one of the hardest problems in modern machine learning: how to scale Bayesian neural networks to truly high-dimensional settings without giving up uncertainty or computational sanity.
My guests are David Rügamer, who leads a research group at LMU Munich working on uncertainty quantification and deep learning, together with Emanuel Sommer and Jakob Robnik, two PhD researchers at the core of this work.
In this conversation, we unpack why Bayesian neural networks still matter in a deep learning dominated world, what uncertainty quantification really buys you in practice, and
where standard approaches break down in high dimensions.
We also talk about applications, community-driven tooling, and the long-term future of Bayesian deep learning: where it already outperforms standard neural networks, what's still missing for broader adoption, and why there is real optimism that scalable Bayesian methods are finally within reach.
This is Learning Bayesian Statistics, episode 150, recorded November 20, 2025.
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible.
I'm your host, Alex Andorra.
You can follow me on Twitter at alex_andorra, like the country.
For any info about the show, LearnBayesStats.com is Laplace to be.
Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there.
That's LearnBayesStats.com.
If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra.
See you around, folks.
and best Bayesian wishes to you all.
And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life.
Check us out at pymc-labs.com.
Hello my dear Bayesians!
Before today's episode, I wanted to let you know that this year, we'll be talking about Bayesian modeling in soccer at the Field of Play conference in Manchester, UK, on March 27,
2026.
So if you want to meet me, you can come there, in the audience if you want, but also as a speaker; we have already locked in most of the speakers, with an announcement coming soon.
Stay tuned and follow on LinkedIn.
Last year we had speakers from baseball, cycling, education, fantasy sports, and soccer obviously, because it's Manchester, and that mix genuinely raised the level of the conversation.
The theme for this year, 2026, is communicating complex ideas: how do you take something technical, nuanced, uncertain, like models, probabilities, trade-offs, and make it understandable and useful for people who are not data experts?
Like last year, we are opening up one of the final speaker slots.
So if this theme resonates with you or someone you know, whether they work in football or somewhere completely different, feel free to contact me and I will take a look.
And in any case, you can already buy your tickets: go to Field of Play's website or LinkedIn page and you'll have all the information there.
I'm really looking forward to seeing you there, and I will for sure have some LBS merch with me, so please come say hello. And come to my talk too, so that I can say the room was full, because otherwise, I don't know what I will do.
Thank you so much, folks, I will see you there, and now, let's go on with the episode.
David Rügamer, Emanuel Sommer, and Jakob Robnik, welcome to Learning Bayesian Statistics.
Thanks.
No, you bet.
Thanks for taking the time, Emanuel and David.
It's late for you in the evening, so I definitely appreciate it.
And actually, you were recommended to me by a listener. Of course, I'm forgetting his name right now, but it will come back to me during the recording.
um So we'll get back to that and I will thank you properly at that time.
But in the meantime, let's start with you guys.
As usual, em we're going to dive into what you're doing because you're doing really some fascinating stuff, the three of you.
um But first, your origin story.
What are you doing nowadays?
How did you end up working on that?
And also, lastly, listeners, please forgive me for the weird voice. I am a bit sick today, but I have a dream team to record with today, so I could not cancel on them.
So David let's start with you.
What's your origin story?
What are you doing nowadays?
How did you end up doing that?
Yeah, first of all, thanks for having us. So I think Raphael was the one who brought up the connection.
um And yeah, what I'm doing nowadays, I'm a professor at the statistics department of LMU Munich.
um There I run a lab called the Munich Uncertainty Quantification AI Lab, short munich.ai.
I do teaching at the department.
I'm teaching statistics courses now and then, but
My main duty is to teach deep learning, all sorts of deep learning, applied deep learning, the foundations of deep learning.
um And I'm also like a principal investigator of Indie the Munich Center for Machine Learning, which is I'm very grateful for because they, they bring up, they give you so
much like network uh funding, et cetera.
So it's actually one of the six big AI centers in Germany.
and being part of this is uh pretty amazing.
And yeah, how did I end up working on that?
em I did my PhD in statistics, actually.
And during my PhD, I already found these algorithms quite intriguing, and quantifying their uncertainty. And that's something that stuck, and then at some point I started working on deep learning, and nowadays that's my main focus.
Okay.
I heard it's a pretty popular focus lately for some reason.
I don't know why.
Emanuel, what about you?
Yeah.
So, well, my background actually is in math.
So also in Munich, from TUM; however, now I switched sides to LMU, to David's lab, where I currently do my PhD, with a very strong focus on Bayesian neural networks. So uncertainty quantification for quite generic networks, with a particular focus again on sampling, which also brought up the connection with Jakob.
And, well, I actually did a wide variety of stuff before, like probabilistic forecasting in financial domains and learning to rank in industry. The common theme was always kind of a fascination with probabilistic machine learning methods, I would say.
So quantifying some kind of uncertainty, I guess.
However, the time as a practitioner also taught me that I can motivate myself much better if I envision my work to be of practical value.
So that's why I kind of try to fuse the probabilistic rigor also with a little bit of practicality, I would say.
And I think that's, I guess, quite defining of my work now.
Yeah, I would say the same from what I've seen from your work preparing for this episode, but we'll dive into that in just a few seconds.
First, Jakob, what about you?
I am actually a physicist by training and I'm currently a PhD student at Berkeley.
I generally work on different problems in statistics and physics and astronomy, basically whatever is data-driven.
And recently I've also been developing MCMC algorithms, and that's how we met at a conference with Emanuel and started collaborating.
Okay.
Okay.
So you all met at a conference and then, boom, this happened.
Awesome.
Yeah, I love it.
Nice, nice origin story.
Well, maybe one of you can just give us a big-picture review of your group's work.
Actually, you guys are focused mostly on Bayesian neural networks.
What are the overarching research themes?
Who wants to tackle that?
I can at least say a few words about what my group is doing. We're not only doing Bayesian neural networks, but also things related to optimization, sparsity, and then uncertainty quantification; those are the two pillars in my group, and we also try to combine them.
And um the idea is usually that this somehow helps in understanding neural networks better.
So not from an interpretable machine learning, XAI point of view, but more like: can we understand their learning dynamics? Can we understand what the optimal solutions of those models are?
And then also, maybe most importantly, what is the distribution of the parameters of these models? And that's exactly where the Bayesian neural networks come into play, because you get the distribution of the parameters.
And by that, you can understand a lot even beyond the applications that I think we might also talk about later.
By having the distribution of the parameters, you can sort of already understand better what is going on in these models.
And yeah, that's the big picture in the group: as I said, one of the pillars is Bayesian neural networks and uncertainty quantification. Ideally, at some point, we'd be able to get big transformers or other big neural networks to give us the uncertainty for any prediction task immediately, without any computational overhead, so that in any AI application you get the uncertainty essentially right away.
Yeah.
Yeah.
Of course you're preaching to the choir here.
That'd be absolutely fantastic.
How does it work? Like, what does it mean to make Bayesian neural networks fast? How did you start doing that, and what does it mean concretely?
I think Emanuel can maybe give more details about that, but I can briefly mention that when we started, there were actually a couple of choices, and we found that everything was super slow. And then at some point we realized that JAX is notably faster, at least when you sample these Bayesian neural networks. But Emanuel can maybe give you a bit more detail about that.
Yeah.
Well, basically, maybe we can talk about comparisons with other competitive methods or other approximations later, but let's focus on the sampling point of view first. So I basically got the idea for JAX from David. I think before starting in his group I had never touched it; I had heard of it, but never touched it. And it worked like a charm.
um But then in the first project, I actually went for NumPyro as um kind of the sampling engine with these classical samplers implemented.
And in fact, it was actually super frustrating for me because uh it was reasonably fast, but not very modular.
And I couldn't really get my hands on the core algorithms.
And also some quite
small details were actually not configurable, which made a huge difference when you actually want to scale these things up.
So what I did quite fast is switch to BlackJAX. It was quite a painful refactor, but it was well worth it. It's much more modular, and I felt like it was really good. Again, shout-out to Raphael, by the way; he was the first to mention BlackJAX to me.
Because there are some tiny things that actually make a big difference if you want to scale this up, like memory management, really boring stuff. For instance, you would not want to carry, let's say, 100 samples of ResNet-50 models in your memory, which blows up quite fast; instead you want to be able to implement callbacks, let's say, to save samples at each time point, which is very unusual in the classical Bayesian workflow, I guess. But if you work with neural networks, this actually makes things much faster. So there are little tweaks that make this faster. And of course, compilation with JAX works perfectly on GPU, TPU, and so on. That's pretty nice.
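To make that memory-management point concrete, here is a minimal sketch of the "save as you go" pattern, assuming a hypothetical BlackJAX-style `step_fn` and a hypothetical output file; it is not the group's actual code, just an illustration of streaming draws to disk instead of stacking hundreds of ResNet-sized parameter pytrees in device memory.

```python
import pickle
import jax

def run_chain_streaming(rng_key, init_state, step_fn, n_samples, out_path):
    """Run a sampler, writing each draw to disk instead of keeping it in memory.

    `step_fn(rng_key, state) -> (state, info)` stands in for any BlackJAX-style
    kernel, and `state.position` is assumed to hold the parameter pytree.
    """
    state = init_state
    with open(out_path, "ab") as f:
        for _ in range(n_samples):
            rng_key, step_key = jax.random.split(rng_key)
            state, _ = step_fn(step_key, state)
            # Move the current draw to host memory and append it to the file,
            # so the device only ever holds the current state.
            pickle.dump(jax.device_get(state.position), f)
    return state
```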
But that's only one major pillar; of course, software is important. You also have to challenge these classical methodological stances that are well established in classical Bayesian inference. For instance, for us, some big steps were a parallel-first budget allocation, so actually parallelizing a lot of things, having rather short chains, putting a lot of emphasis on ensembles, and using hybrid optimization techniques, actually getting a lot of information from optimization by using warm starts cleverly. I mean, these ideas are not entirely new; Aki, who was also on the podcast with you at some point, I think wrote a paper in 2000 where he sketched that roughly, with warm starts. So it's nothing super fancy, but you actually have to put all of these things together. And finally, you also have to have a powerful sampler, and that's where Jakob comes into play. That also makes a big difference.
Yeah, okay.
Yeah, that's really fascinating.
And thanks for doing my job, Emanuel, because actually, Jakob, I think it's a perfect segue for you: if I understood correctly, you worked on something called the microcanonical Langevin sampler. I have no idea what that is, so you're going to have to explain it to me. And why did it play a critical role in scaling these methods to very high dimensions?
Yeah, right.
So there are actually two things here. One is the actual dynamics that the sampler moves according to, and the second is how you actually implement the numerical integration. For the first thing: you know, the standard MCMC sampler in these kinds of settings is Hamiltonian Monte Carlo, where you have some Hamiltonian dynamics that makes your samples move very efficiently and makes them less correlated. It's the gold standard.
So what we changed here was, instead of Hamiltonian Monte Carlo, we introduce what we call microcanonical Hamiltonian or Langevin Monte Carlo, which is based on a slightly different type of dynamics. The key difference is that the velocity of the chain is fixed throughout the sampling. This makes it somewhat more stable where you have sharp transitions in the likelihood and so on; the velocity doesn't just blow up, it stays constant. That makes it more stable, and it also acts somewhat more deterministically.
So this is one thing, but it's actually not the main reason why MCLMC, our microcanonical Langevin Monte Carlo sampler, is so fast. The main reason is actually how you do the integration. Because for either Hamiltonian Monte Carlo or microcanonical Monte Carlo, you have some dynamics, and these dynamics in principle are designed in such a way that you get correct samples. But in practice, you then have to numerically integrate these equations to get your samples, and this numerical integration gives you some error, which translates into bias in your samples.
The way this is usually approached is that you treat the numerical integration as a proposal in a Metropolis-Hastings scheme. So this is a scheme where you do the integration and then accept or reject it as a sample, and this accept-reject probability can be determined in such a way that you remove the bias. This is what people typically do. It's a great idea, but it has a problem in high dimensions: you need an increasingly smaller step size as you increase the dimensionality to keep the acceptance probability high. And so this is our main innovation: we actually do not do this Metropolis correction, but instead we have an alternative scheme that controls the bias to be small enough so that it's negligible compared to the other types of error. And this is what really allows you to keep a constant step size when you scale up.
Okay.
And so having this constant step size is really what gives you the most important improvement in sampling speed?
Yes.
So, you know, the step size determines how fast you move. If the step size is small, then you have large correlation between samples. But with this scheme, you can actually keep a constant step size and don't lose efficiency as the number of parameters grows.
Okay.
But how is that? What's the difference with the step size from Hamiltonian Monte Carlo then? Because if I understand correctly, this is doing the same thing. So why does it, in this case, make such a big impact on the neural network sampling?
So in principle, Hamiltonian Monte Carlo could use the same scheme; it just usually doesn't, because people usually complement it with the Metropolis test. What we did was find a scheme that makes this possible, and it's also applicable to Hamiltonian Monte Carlo.
Okay.
Okay.
Super interesting.
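To make the "skip the Metropolis correction" idea concrete, here is a deliberately simple sketch using the unadjusted Langevin algorithm (ULA) on a toy Gaussian; this is not MCLMC itself (MCLMC swaps in constant-speed microcanonical dynamics and is available in BlackJAX), but the trade-off it shows, keeping every step at a fixed step size and accepting a small, step-size-controlled discretization bias instead of running an accept/reject test, is the same one Jakob describes.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a BNN posterior: a 1,000-dimensional standard Gaussian.
dim = 1_000
logdensity = lambda x: -0.5 * jnp.sum(x**2)
grad_logdensity = jax.grad(logdensity)

def ula_step(x, key, step_size=1e-2):
    # One unadjusted Langevin step: follow the gradient plus noise and keep
    # every proposal. There is no accept/reject, so the step size can stay
    # fixed as the dimension grows; the price is a small, step-size-controlled
    # discretization bias instead of exactness.
    noise = jax.random.normal(key, x.shape)
    x = x + 0.5 * step_size * grad_logdensity(x) + jnp.sqrt(step_size) * noise
    return x, x

keys = jax.random.split(jax.random.PRNGKey(0), 2_000)
_, samples = jax.lax.scan(ula_step, jnp.zeros(dim), keys)
```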
Emanuel, it looks like you wanted to add something.
This ties in with a few errors that are quite prominent in Bayesian neural networks, to kind of put it all together. At least that's my view on the whole thing, because this sampler is one of the major pillars, as Jakob just mentioned. And then we can also connect it back to the main key ingredients that actually make it fast and valid in these very high-dimensional settings.
For instance, this Metropolis-Hastings correction also requires the full batch, so a full epoch through your data loader, which is not feasible for the large datasets that you encounter in deep learning. So that's basically a no-brainer.
Actually, there was a really good paper from 2021, with Vincent involved, who was also here on the show, that showed that this acceptance probability goes to zero in these very high dimensions. So that's a very hard result in that sense, and Metropolis-Hastings adjustment doesn't really work.
We can control that with the sampler.
But then, if you think about the approximation error in the sampling of Bayesian neural networks, you also have the initialization error and the Monte Carlo error. And these tie into the key ingredients that I mentioned before. If you warm start with optimization, which is really good because we have known recipes in the community, everyone by now knows how to train a ResNet or almost any type of neural network, and you've got pre-trained models everywhere on Hugging Face and so on, then you can actually start from a very well-researched foundation. And this almost eliminates, or greatly reduces, your initialization error. Then you still have the discretization error, which we tackle with the above-mentioned approach. And then we also have the Monte Carlo error, which is basically just the error that occurs from not having an infinite compute budget. That's where being smart about your budget allocation makes a huge difference, especially in these multimodal, complex geometries, where we put a lot of emphasis on our ensembling and then enrich this via very flexible local approximations, which basically means we use rather short chains. So that kind of puts it all in context.
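As a rough illustration of the parallel-first budget allocation Emanuel describes, here is a sketch of running many short chains from an ensemble of warm starts with jax.vmap; `warm_starts` and `kernel_step` are hypothetical placeholders, not the group's actual code.

```python
import jax

def short_chain(key, position, kernel_step, n_steps=200):
    # One short chain: a few hundred steps of some gradient-based sampler,
    # started from an already optimized solution.
    def body(position, key):
        position = kernel_step(key, position)
        return position, position
    keys = jax.random.split(key, n_steps)
    _, draws = jax.lax.scan(body, position, keys)
    return draws

def parallel_first(rng_key, warm_starts, kernel_step, n_chains):
    # Parallel-first budget allocation: many short chains, each started from
    # a different optimized mode (e.g. a deep-ensemble member), rather than
    # one long chain. `warm_starts` is a parameter pytree with a leading axis
    # of size n_chains; `kernel_step(key, position) -> position` is assumed.
    keys = jax.random.split(rng_key, n_chains)
    return jax.vmap(lambda k, p: short_chain(k, p, kernel_step))(keys, warm_starts)
```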
Yeah.
Okay.
Yeah.
Thanks, Emanuel.
Very, very helpful.
And so I think you guys kind of answered that next question here and there, but I'm curious if you can give a summary of the main bottlenecks you encountered when scaling Bayesian neural nets to high-dimensional parameter spaces. How you overcame them we just talked about, but what was the main list of frustrations, the ones I'm guessing a lot of practitioners are encountering right now without your work?
to, I think, to some extent, actually already answered it in some way or the other.
So, I mean, first of all, we, think the software was one of the bottlenecks that um if you use PyTorch and then you do sampling, that even in very, very, very small neural networks
was 10 times slower, even maybe even slower than 10 times slower.
um And then this is kind of...
a bummer because if you work with, I don't know, a single hidden layer, three neural network, and you have to wait for a day to sample it, um which is actually still the case
if you, so you can ask JET GPT to write code for sampling, for sampling of Asian neural network.
And if it does that in Pyro um and you run it on, I don't know, Google Colab, then it will probably take uh quite some time and you will
maybe ask yourself whether um this is actually working or maybe if you want to give up.
And there, I think it's very important to know the software to maybe switch to JAX.
I think you can still make it work with PyTorch, um but then, um yeah, I think JAX can be faster.
So why not use JAX?
um I think there's also a lot of other communities now potentially.
switching over to Jax.
um Maybe Emmanuel can say something about this later.
um And once you have this hurdle, you manage this hurdle, the software hurdle, then I think there's a couple of other things that um are challenging in Bayes-Noor network
sampling.
um The standard stochastic samplers don't work off the shelf.
This is um pretty difficult to tune them.
um So you mean something like the nut sampler here, right?
be clear with listeners.
uh Well, the nut sampler, I think in most cases is, I guess, used with a full batch setting where you feed in the whole data set.
But then there's stochastic samplers that only feed a small part of the data set into the sample based on a smaller part of the data set.
And then um this is...
will become quite unreliable and also it requires hyper parameter tuning.
So this is not something that, at least there are not these defaults that are available maybe like for Adam and stuff like that, that where you can just run it off the shelf and
it will work.
And I think we put some effort into this too. I mean, the one thing Emanuel mentioned was this initialization thing, right? If you don't initialize in a good region already, that can be another big bottleneck or hurdle, because samplers are not necessarily good, at least from our perspective, at doing the job that an optimizer does: traversing the weight space and finding a completely different region, completely different weight distributions. They can do it, I think, but in practice, step sizes play a crucial role and stuff like that.
And so, at least from our experience, it was way more efficient to, as Emanuel mentioned before, start your sampling from a very good starting point already, from an already optimized neural network. Then you can also overcome this bottleneck: the sampler needs to find the typical set first of all and sample from that, and it has a much easier time finding the typical set if you initialize it the right way.
Okay, so here, very concretely, you would fit your neural network in a classic way with the Adam optimizer, from PyTorch for instance, and then use that as the initialization of your Bayesian neural network sampling algorithm, if I understood correctly, not with NUTS but with another sampler, probably based on JAX. How wrong am I?
No, you could also use NUTS. You could use any sampler; I think they will all profit from the fact that you initialize them in the right region. But then, as Jakob said, MCLMC-based samplers can be more efficient in high dimensions. I mean, NUTS still works, and we did that. You get similar performance in these Bayesian neural networks, at least in maybe not super large but practically relevant Bayesian neural networks; you get similar performance with NUTS as with MCLMC-based sampling. But it's slower, notably slower.
Okay.
yeah.
So actually, that was something I wanted to ask you later, but since we're on that, and maybe Emanuel, because I think David said you might have things to add to that: concretely, how can curious listeners try what you're talking about right now? For these methods that you've developed, is there a reliable Python package they can install and play around with?
Yeah.
So basically, if you want to check out MCLMC, Jakob and his crew have made a tremendous effort to put that into BlackJAX, which works really well, as I said before. But at the same time, maybe BlackJAX is a little more for the tech-savvy folks, right? Because it's quite configurable.
But what's actually quite nice about the whole JAX ecosystem is that it has this functional character, right? And this serves the purpose of, for instance, sampling a Bayesian neural network really well, because if you think about it, what you need for gradient-based samplers like MCLMC or HMC is just to evaluate your likelihood, your neural network, and get some gradient out of it. And this is the same thing that you need for optimizing your neural network. If you just have a set of parameters, let's say you can imagine it as a dictionary, you can handle this through the whole pipeline. You use your classical Optax optimizer, you optimize your network, and then at each step you basically take out the same tree of parameters and put it into the next step, whether that's an optimization step or a sampling step. So you can play around with your parameters, you can even switch things up right in between, like, for instance, cyclical SGLD would do. So you can actually be quite flexible. And I think this functional approach to sampling and optimization also teaches you that both of these approaches are kind of similar and share a lot of things. This hybrid approach can also be done in a very smooth way, like transitioning from a highly tempered, so basically optimization, phase into a non-tempered sampling phase, for instance. So you have a lot of things you can do.
And I think, yeah, JAX and BlackJAX are a good combo.
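Here is a minimal sketch of that handoff, assuming hypothetical `loss_fn` and `sampler_step` functions: the same pytree of parameters flows through an Optax optimization phase and then becomes the initial position of the sampling phase, exactly as described above.

```python
import jax
import optax

def warm_start_then_sample(params, batches, keys, loss_fn, sampler_step):
    """Phase 1: plain Optax optimization; Phase 2: hand the same pytree to a sampler.

    `loss_fn(params, batch)` is the negative log-likelihood of your network,
    `params` is a pytree (e.g. a nested dict of weight arrays), and
    `sampler_step(key, params) -> params` is one step of a gradient-based
    sampler such as MCLMC or HMC; all three are assumptions for illustration.
    """
    # Phase 1: optimize, exactly as for a non-Bayesian network.
    opt = optax.adam(1e-3)
    opt_state = opt.init(params)
    for batch in batches:
        grads = jax.grad(loss_fn)(params, batch)
        updates, opt_state = opt.update(grads, opt_state, params)
        params = optax.apply_updates(params, updates)

    # Phase 2: the optimized pytree simply becomes the first position of the
    # chain; nothing about the parameters changes shape or type.
    draws = []
    for key in keys:
        params = sampler_step(key, params)
        draws.append(params)
    return draws
```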
And maybe we will also publish some code which is more user-friendly than our usual research code.
Also, for PyTorch lovers, I would like to advertise Sam Duffield's posteriors package. It comes with a lot of ready-to-use samplers, especially for the stochastic case, so they scale up quite well. They also come with some variational or approximate, optimization-based Bayesian methods.
uh We also use
these methods a lot for our comparisons because that's just a well-maintained and uh easy-to-use ah package for Python.
Soon, our group will also probably release a hopefully very accessible, very user-friendly, sklearn-compatible package that is tailored to the tabular case, classification and regression, and that implements basically this hybrid Bayesian deep ensemble approach together with an MCLMC-based sampler. So that's maybe my two cents on that.
Yeah, this sounds really cool, really, really practical.
I love that.
As soon as you have these packages available that you just talked about, send them to me, and I'll make sure to post that around in the LBS universe.
You have a lot of people here who I'm sure are waiting for that.
And also, if you can already post the link to at least the posteriors package from Sam that you talked about in the show notes for this episode, that'd be great, because it sounds like that's the main way of interacting with these methods right now, the fastest and easiest way to interact with them.
And I already put the link to BlackJAX in the show notes, so if listeners wanna dive into that, they already have BlackJAX and posteriors.
And once you guys have your package, and I'm guessing some notebook examples, we'll make sure to add that to the show notes a posteriori, but also to make sure that people know about it, because I think it's extremely helpful to distill the great work you guys do at the research level and disseminate it at the practitioner level.
So yeah, thank you for investing in this.
That's, that's extremely important.
Anyone want to add something on that before I ask another question?
I mean, I think we certainly will. We will certainly have a package from our group as well. And also, we could already link to research code: for those who work in research, nowadays you often need to work through other groups' research code anyway, and I think we can happily provide that as well.
Well, maybe I should also say that if you are not interested just in Bayesian neural networks, but in general statistical sampling, BlackJAX actually integrates pretty well with other standard probabilistic languages. You can write your model in NumPyro or Pyro or whatever, and you just extract the log likelihood from it; it's a one-line thing. And then you can plug it into any sampler from BlackJAX, and there are also tutorials on how to do that. So it's pretty straightforward.
Yeah, yeah.
Like, if you want something a bit less at the frontier of Bayesian research, because Bayesian neural nets are pretty much at the frontier right now, if you want something a bit more classical, BlackJAX plugs and plays very well with Pyro and NumPyro and, of course, PyMC. And Bambi also: if you're using Bambi, you can access it directly from there, which is great, especially for beginners, who can write Wilkinson-notation models and then just use BlackJAX to get great samplers.
That's amazing.
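For listeners who want to try that bridge, here is a rough sketch of the pattern Jakob describes, assuming the usual NumPyro `initialize_model` and BlackJAX `nuts` APIs; treat the exact signatures as assumptions to check against both libraries' documentation.

```python
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer.util import initialize_model
import blackjax

# A tiny NumPyro model standing in for whatever model you already have.
def model(y):
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    numpyro.sample("y", dist.Normal(mu, 1.0), obs=y)

y = jnp.array([0.3, -0.1, 0.8])
rng_key = jax.random.PRNGKey(0)

# Extract an unconstrained log density from the model; `potential_fn` is the
# negative log joint, so flip its sign for BlackJAX.
init_params, potential_fn, *_ = initialize_model(rng_key, model, model_args=(y,))
logdensity = lambda params: -potential_fn(params)

# Plug it into any BlackJAX sampler; here NUTS with a fixed step size and
# identity mass matrix (in practice you would run window adaptation, or swap
# in MCLMC for very high-dimensional problems).
nuts = blackjax.nuts(logdensity, step_size=0.1, inverse_mass_matrix=jnp.ones(1))
state = nuts.init(init_params.z)
state, info = nuts.step(jax.random.PRNGKey(1), state)
```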
Actually a more generic question I had for you guys.
Well before that actually, you were right.
David, that's Raphael, Raphael Rems, who recommended me to you guys. So thank you so much, Raphael, for being such a long-time listener and also, you know, now being a matchmaker. An official LBS matchmaker.
Thank you so much.
But yeah, a bit broader question: I'm curious how you guys think about the trade-off between computational efficiency and posterior fidelity when designing, using, and choosing which algorithms to use for which model. Because something you usually need to do as a practitioner is justify your choices. When people know you use Bayesian models, they usually associate that with NUTS or MCMC. And we know NUTS and MCMC take longer to fit, but they have these guarantees of convergence in the long run, which you don't necessarily have with other approximation algorithms.
People may be familiar with ADVI or DADVI, which is something we talked about on the show in episode, I think, 147.
uh So I'll put that in the show notes.
But yeah, or Laplace or INLA, things like that.
But these come with more assumptions than NUTS, although they are faster. So in this case, you have to be careful about your assumptions: make sure the assumptions hold, that your prior predictive samples make sense most of the time, and that the model is able to recover the parameters from a prior predictive sampling analysis with fake data. So yeah, how do you guys think about that in these cases, when you're doing Bayesian neural networks?
I can maybe add... or sorry. I actually wanted to say that Emanuel can probably say a couple of things there, but maybe just as a general note: I think the VI methods that are currently out there are not necessarily fast. I would say that in the same amount of time, you can potentially get better performance with sampling, actually. But yeah, Emanuel is the expert there; he can probably say more about this.
can probably say more about this.
I just wanted to say that maybe I will just give a perspective from the B &N case.
And I think Jakob also knows a couple of other perspectives quite well.
So he can maybe also give a slightly different perspective also from different um areas where maybe also
You actually go this way of thinking that you kind of outlined.
I would like to kind of reverse it completely and maybe start with the most crude approximation that you could think of, which is also often discussed by Andrew Gordon Wilson and his group at NYU. If you think about these Bayesian neural networks and the approximation of the posterior over these huge spaces, you can think first of the simplest thing: you just have an optimized model and you put a Dirac delta on exactly that model. That's the most crude approximation that you can put onto that very complex posterior.
Well, then you can step it up a notch and actually include an explicit prior; you do MAP estimation. Then you can choose a restrictive family of distributions, like a factorized Gaussian, and approach the whole thing from a variational standpoint. You can also put a Gaussian on top of your local approximation, which is then the Laplace approximation, which already, as we and many researchers across the field have shown, improves generalizability, performance, robustness, and in many cases the calibration of the predictions, for instance credible intervals from these predictions. And then you can also ensemble these things, right? So you have an ensemble, let's say, of very high-dimensional Gaussians, a mixture of Gaussians, but still that's not quite flexible, right?
So taking it from there, you can again use an ensemble, but then you don't impose any restrictive assumptions, as you said; you actually do some sampling, which is very flexible and very efficient if you use, for instance, very strong gradient-based samplers like MCLMC. And then you get much more flexible local approximations.
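To make one rung of that ladder concrete, here is a tiny, self-contained sketch of a MAP estimate followed by a Laplace approximation on a toy objective; it is only meant to illustrate the "Gaussian on top of a local optimum" step, not the group's actual setup.

```python
import jax
import jax.numpy as jnp

# Toy negative log posterior over a small parameter vector, standing in for
# a neural network's loss plus prior.
def neg_log_post(theta):
    return 0.5 * jnp.sum((theta - 1.0) ** 2) + 0.1 * jnp.sum(theta**4)

# MAP estimate via a few plain gradient steps (the "optimized model" rung).
theta = jnp.zeros(5)
for _ in range(500):
    theta = theta - 0.05 * jax.grad(neg_log_post)(theta)

# Laplace rung: a Gaussian centered at the MAP whose covariance is the
# inverse Hessian of the negative log posterior at that point.
hessian = jax.hessian(neg_log_post)(theta)
cov = jnp.linalg.inv(hessian)

# Draw approximate posterior samples from that local Gaussian.
key = jax.random.PRNGKey(0)
samples = jax.random.multivariate_normal(key, theta, cov, shape=(1_000,))
```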
And still, if you want to be correct and not just lie about things, obviously this is an approximation to this huge-dimensional space. In the last weeks, in a project where Jakob is also involved, we sampled a vision transformer with 22 million parameters. We sampled all of those, and we obviously did not collect millions of samples from this very large network, but we did collect many thousands of samples of this model. Still, in comparison with the dimensionality of the problem, it is likely a very crude approximation.
But if you see the whole transition from this very crude Dirac delta, to these restrictive local approximations, to these flexible local approximations, you can see that you have come quite a long way with all this flexibility.
And if you then look at performance, functionally, on uncertainty quantification metrics, or even just predictive precision, you can actually see that this works. And it is also not much more expensive than the optimization after all.
And, at least if you are again a little bit smarter about how you do the whole thing, in my experience it is also more robust compared to, let's say, variational approaches in these high dimensions, because those also come with a lot of hyperparameters and can be quite brittle, because of very noisy gradients and so on.
So overall, I think that this Bayesian deep ensemble approach, while still being a crude approximation in the big picture, is a much more flexible, precise, and quite beautiful approximation of the whole thing than, let's say, other very crude things, but of course you still incur a lot of error. So for the other side, the very high-fidelity posterior reconstruction, you have to ask Jakob.
Yeah, for sure. Thanks, first of all, for this very practical rundown. I'm really looking forward to your example notebooks; I think you should definitely write this stuff up and make it accessible. This stuff is in your papers, but we need it for practitioners. I think it's going to be super helpful, honestly. So, Jakob, yeah, what do you have to say about that?
Yeah, so I usually think mostly about more scientific applications where you don't have a Bayesian neural network, but you're trying to learn the posterior and you want it to be the exact posterior. So MCLMC, if you include Metropolis adjustments, has the same type of guarantees as the standard HMC samplers, which is to say, if you assume something about the target, like, you know, that it's log-concave, something like that, then you can show that they actually converge. But as soon as the target has multiple modes or something like that, then you lose all of the guarantees in any case.
So there are no theoretical guarantees in either case, whether it includes the adjustment or not; everything is about the same setting here. The critical part is how you make it work in practice, and how you look at the diagnostics in practice after you run the chains. So, for example, people standardly look at things like the Gelman-Rubin statistic and stuff like that.
And we are actually right now also working on similar types of diagnostics for our scheme without Metropolis adjustment, where you can basically check that your bias is as small as you would like it to be. So yeah, I think it's a very practical question.
It's an art.
Yeah, yeah, yeah.
I know, I know. Sorry for asking these practical questions, but that's what I do here. No, no, I think it's an important question, definitely, and we are thinking a lot about it.
Yeah.
Yeah.
And actually, since we're on the practical side of things, I'm curious: how do your Bayesian neural network models compare to standard deep learning approaches, both in predictive performance and uncertainty calibration? I think that one is going to be for you, David.
Yeah.
I mean, of course in our papers, well, not of course, but they usually perform better both in predictive performance and uncertainty quantification metrics. This is maybe sort of obvious, because essentially we have a big model average: every sample is a weight vector of this one neural network, and then we have a huge ensemble of all of these networks. And that has been shown in the past to be very promising; ensemble methods have worked for random forests, boosting, stuff like that. They also work extremely well for Bayesian neural networks.
And as Emanuel said before, on the uncertainty quantification metrics, at least those that are currently used, these Bayesian neural networks also outperform the standard deep learning approaches by a larger margin. Again, you could say this is obvious because you're comparing one neural network against a big ensemble of neural networks. Well, you first have to come up with these samples, and that's the art: sampling from these Bayesian neural network posteriors.
Emanuel, do you want to add something, or...?
I think that puts it very well.
I mean, for some problems we even sometimes cut metrics in half, right? Like RMSE; it's task-specific in a way. It also works for all different kinds of models.
In this year's ICLR paper with Jakob as well, we also had applications to different data modalities, like text, images, tabular data, all different kinds of models. We have now sampled ViTs, nanoGPT, you name it, ResNets. It is basically a very generic way of formulating the problem.
And in the end, of course, you don't get a huge margin in each and every experiment. But I think what's quite encouraging is that you're basically never worse than the standard deep ensemble, which is a really strong baseline. You can actually make it work to always give an improvement, and then it's up to the user whether this improvement is worth the extra effort, which we try to keep very low.
And that is also kind of the choice of the user: how long do you sample from these very good starting points? In most cases it's actually quite fast to reach good performance, so you don't have to sample for an excessive amount of time. But obviously, you can take just a few samples, and maybe a few samples more would give you slightly better performance, right? So you basically have to trade off, as you said, computational efficiency versus your performance at inference.
I'd also like to showcase that it's not just compute, right? If you spend the same amount of compute on mean-field variational inference, or on just optimizing a lot of ensemble members for a deep ensemble, and the same amount of time on sampling, you can see that for this given budget, you can actually make sampling work much better, especially on these uncertainty quantification metrics.
There is some magic sauce to the sampling itself, because if you think about it, you obviously also bias the model average that you have in the end. If you have a deep ensemble, for instance, and you just optimize each model, you don't have any kind of exploitation part in the whole thing. And other approaches have already addressed that as well, this exploration-exploitation trade-off. I think you had Yingzhen Li on the podcast as well, with cyclical SGLD, which is kind of the personification of the exploration-exploitation trade-off. So this again holds here as well.
Okay.
Damn.
And something I'm curious about, actually: the way I see the models we're talking about right now is mainly as an add-on for when you're limited by classic deep neural networks, right? Where you actually need the Bayesian machinery for some reason. So first, maybe you'll tell me, no, no, actually there is a need for this in itself, and I'm happy to hear about that. But what I'd also like to hear from you is: in which cases would you recommend listeners look into this? What kind of real-world problems do you see as the best fit for these fast Bayesian neural network approaches?
I can briefly answer, and maybe Emanuel and Jakob can add to this. I think in general, the computational overhead is not that large anymore.
I would actually rather ask whether it's not worth doing it for all applications, because, as I said, the overhead is not that large, and then you get better performance and you get uncertainty quantification. You could argue about what this uncertainty quantification really is: do I actually know the Monte Carlo error that I incurred by not drawing a million samples, stuff like that. But the metrics that we measure for uncertainty quantification are usually notably better than if you apply variational inference or Laplace or something. So maybe you are not there yet; maybe you don't get the uncertainty quantification that you would want to have for your Bayesian neural networks, because there's still a limited compute budget.
But then you're still doing better than everything else that is out there, so I don't see any reason not to use it. And there are no specific domains, I think; as I said, you can apply this to all sorts of tasks with Bayesian neural networks. I think it's maybe still not clear how effective it is, or how much it brings to the table, if you apply this to generative models, right? You could apply it to generative models too.
You could sample, as Emanuel said, a nanoGPT model and then generate text from it. Does the additional uncertainty in the parameters, in the weights, somehow improve the generation? That's still unclear, I think. But then again, for all the other supervised learning tasks: why should you not try it, if it has not that much overhead?
Yeah, no, 100%. That's why I always push for these reproducible workflows accompanied by a reliable Python package, because then the entry cost is super low for practitioners. It's like, well, you have the package here, you have the example notebook, just fire it up and see what it gives you. Maybe it's still too slow for what you need, and that's fine, but maybe it's actually really what you need, because all the bells and whistles you get from the Bayesian approach are going to be extremely important for your use case. So yeah, I completely agree with that.
Emanuel and Jakob, anything to add on that? No, I think David did a good job.
Yeah.
Actually, I'm sure, probably Jakob, you'll have something to say about that. Do you have an example in mind, or maybe a theoretical use case, since you know the theory so well, where a Bayesian neural network with the microcanonical sampling approach could reveal something, or has revealed something, that a classical neural net would have missed?
So there seems to be really,
something that you gain by mixing experts, by combining models.
I think you would profit in, I mean, there's certainly examples and we saw that also in benchmarks.
saw benchmarks where, I mean, Emanuel was, I think, very open about this before um for certain combinations of data set and neural network architecture.
there might not be an improvement in predictive performance if you ensemble models, or there might not be better uncertainty quantification than if you just put a Gaussian around your optimized value. But it's usually not worse than that; it's not worse than these comparison models.
So yeah, I don't know. I think maybe the thing that Jakob could have said is that there are some non-Bayesian-neural-network applications where you can definitely show that MCLMC does notably better than other methods.
I mean, if that's what you mean, applications outside Bayesian neural networks, there are plenty where without MCLMC you cannot explore the posterior sufficiently.
The correlation length is just too high. For example, we're working on cosmology applications, and the big thing now is field-level inference, where, together with your parameters describing how the universe evolves, you also want to learn the entire field of the universe. This is millions of parameters, and as the surveys get bigger, it's getting bigger. So we're also getting into the machine learning dimensionality regime.
And yes, definitely, if you just run NUTS, you will basically not converge in a reasonable time. So people have just given up on that and tried the Laplace approximation and other types of approximations. But if you run MCLMC, then you actually get a good posterior that's more reliable than the approximations for sure, and it's actually even faster than the approximations. So yeah, in this type of application, it's very important.
Hmm.
Hmm.
Okay.
Yeah, I see.
I see what you mean.
um
Actually, Emanuel, anything you want to add on that, or is that fine?
Okay.
Cool.
And Jakob, and of course David and Emanuel too if you want to jump in, I'm curious: do you expect microcanonical methods to become more widely used in Bayesian machine learning, or do you see them as mostly suited to the specific classes of models we've been discussing today?
No, they're actually very general purpose. We have been trying it across models, various different models, from Bayesian neural networks to physics applications to quantum problems, basically you name it, and on benchmarks. On none of the problems did we find it to perform worse; usually it performs much better, especially in high dimensions, where it can be orders of magnitude faster. So yeah, it's definitely not a problem-specific method.
Okay.
Okay.
Yeah, that's cool.
That's great to hear.
And actually, that opens me up to two questions I have regarding the future, which is: how optimistic are you guys about the feasibility of getting to where you were talking about at the very beginning, David, which is Bayesian deep learning at scale without latency, to put it in a very ambitious way, as a default in the future? How optimistic are you about that, and what breakthroughs do we need to get there?
So I would be very optimistic about that. I think that in a couple of years, I'm optimistic that we can sample Bayesian neural networks of the size that people in research, for example, optimize now. Maybe not the company-scale neural network sizes, like OpenAI's GPT-5 model; I don't think we'd be able to sample that, and I probably also don't want that.
But other than that, I think there's almost no limit for the typical use cases. All the researchers that I know outside of OpenAI or the other companies that run these large language models could, I think, in two or three years, sample their models as well instead of just optimizing them.
And then there might even be approaches that researchers from OpenAI and other companies could use instead of sampling to get something like a Bayesian treatment, for example the IVON optimizer from the people in the RIKEN group in Japan. There are still some open problems that maybe Emanuel can mention, but I'm very optimistic.
There was a paper in the past from Andrew Wilson's group, again, where they used a lot of TPUs and a lot of runtime to sample these Bayesian neural networks. I think essentially what they did, we can now do in an hour or two with standard GPUs. And I think this will just get better and better.
And then, as I said in the beginning, maybe at some point we actually have a meta-model that gives us the samples right away without doing the actual sampling; that is, a generative model that can generate probability distributions right away.
We have shown that this already works for small applications, for small models. You can essentially replace an MCMC sampler with a transformer, and it gives you an identical posterior by learning a distribution in this transformer instead of sampling from the model's posterior. Sorry, I think Emanuel also had some comments about the scale.
Yeah, thanks, David. That was already super interesting. I love that future. Well done on that part; yeah, I completely agree. Emanuel? Yeah, to be honest, I totally agree with that future.
There are still a couple of obstacles, but I think they're also manageable.
I think one thing, for instance, that is shared with variational inference, Laplace, and so on, is that at inference time you sample again, or you basically have to do forward passes through all the models that you saved, basically your samples.
So this obviously incurs extra latency. And I've been in industry, and latency is relevant for the end user, especially if you have user-facing applications.
But I can tell you that there are groups that are very talented, and we are maybe also among them, that are working on basically distilling this knowledge down. David mentioned a future where transformers can learn that, which is not too unrealistic by now. And also classical distillation, or methods from statistics, could be applied there.
And I've even had some proofs of concept on my laptop that didn't make it out to the public, where I've seen that you could speed up the inference cost by a factor of 100 quite easily without losing any performance. So I guess there's a bright future ahead for these sampling methods, which are not classical sampling methods anymore.
And I think that's something people have to realize: a lot of these classical things don't work anymore, especially for BNNs. We've also put this into one of our papers; the Gelman-Rubin statistic or the effective sample size was mentioned, and they don't make any sense in non-identifiable models, for instance. So you actually have to switch gears a little bit and be smart about things.
With all the understanding that you can also get from, let's say, the optimization theory side, if you actually digest that, then you can come up with quite simple recipes that are quite robust, that are often hybrids between, let's say, optimization and sampling. And applying those is actually very feasible and pretty performant.
I also think...
There's quite a bright future ahead, and I can actually name four papers right now that sampled quite large networks and were published this year at AISTATS or ICLR. So there's a lot of progress in this field.
Yeah, that's really incredible.
Great time to be alive.
Jakob, do you want to add anything on this topic?
No, no, I think you guys already answered everything.
Okay.
So actually, I'm also curious to hear from you guys: okay, that's what you would like to see, and that's how you envision Bayesian deep learning at scale looking. Even more concretely, I'm wondering where you see the concrete packages and methods we talked about, so JAX, BlackJAX, and Bayesian inference, heading in the next few years, especially with these new samplers, these new methods we just talked about.
Probably best if Jakob or Emanuel answer this one. Maybe I can say two words and kind of kick it over to you, Jakob.
So basically I think...
We're developing quite good code, and I think there needs to be kind of a community effort to actually commit to a certain package. The big flexibility of something like posteriors is already a good step in the right direction. But I think what's missing is actually delivering a package for the whole pipeline, because posteriors is packaged mostly as the sampling part; in fact it also does some of the optimization part, but at least for these hybrid methods you need optimization, you need sampling, and then you also need to take care of the samples, as I said. You have to actually manage all these samples, and that's something many people forget, because it's this boring engineering that goes into actually making these things work at scale. You also have to do it at inference: you need to access all of these samples, you have to do this in batches, you have to do this efficiently, but you can do that.
With JAX, quite easily; you just have to put in the engineering.
But also, what I've seen on the blog of, for instance, Andrew Gelman is quite encouraging; he also just recently mentioned Jakob by name, and also BlackJAX. And I think this will also spread a lot in the classical Bayesian world. And I think, Jakob, you have a better view on that.
Yeah, it seems like a lot of Bayesianists are turning to Jax.
um And so black Jax philosophy is to um somehow provide a skeleton for samplers.
It doesn't provide the end results.
That's the philosophy.
I don't think that will change.
So the goal is not to, you know, provide one-line calls that do the sampling for you, but to give you the skeleton already set up, like the ability to tweak all the little details if you want to.
So I think that's a great start.
I like how it's framed.
But we also of course need to put out the final results, as Emanuel is doing.
And also in the more classical sampling world, I think that's maybe one missing piece: something to put together the different samplers and make them very user friendly.
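For listeners who haven't seen it, the "skeleton" philosophy roughly looks like this in practice: BlackJAX hands you init and step functions and you write the inference loop yourself. The log-density, step size, and mass matrix below are placeholder assumptions, not a tuned setup.

```python
import jax
import jax.numpy as jnp
import blackjax

def logdensity_fn(position):
    # Standard Gaussian log-density, purely for illustration.
    return -0.5 * jnp.sum(position ** 2)

# BlackJAX returns an algorithm object with .init and .step; the loop is yours.
nuts = blackjax.nuts(logdensity_fn, step_size=0.5, inverse_mass_matrix=jnp.ones(10))
initial_state = nuts.init(jnp.zeros(10))

def run(rng_key, state, num_samples=1000):
    @jax.jit
    def one_step(state, key):
        state, info = nuts.step(key, state)
        return state, state.position

    keys = jax.random.split(rng_key, num_samples)
    _, positions = jax.lax.scan(one_step, state, keys)
    return positions  # array of draws, shape (num_samples, 10)

draws = run(jax.random.PRNGKey(0), initial_state)
```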
Yeah, it's like, this is really full-stack work, right?
Which is kind of the difficulty.
You don't need only the technical implementation, which is already extremely hard in itself, but then you also need the thing around it, all the communication that comes afterwards, you know, how to make sure people know about your method.
How do you explain it to them without being overwhelmed by the customer support, let's say?
All that stuff that's beyond technical abilities is extremely important, because in the end a great package that nobody uses is a shame.
That's also why I'm doing this: making sure we shine the light as bright as possible on things that really can change the game for a lot of people.
um Fantastic, guys, that's...
That's really cool.
I'm going to start to wind us down here because it's late for you guys.
Is there anything I didn't mention that you would like to mention here before I ask you the last two questions?
Cool.
That means I did a good job because I had a lot of questions for you guys.
I'm very happy about that.
Again, to all the listeners, check out the show notes because they are...
They are quite big for this episode.
I think you'll really appreciate them.
There's a lot of really cool nuggets in there.
But as usual, guys, before letting you go, I'm going to ask you the last two questions I ask every guest at the end of the show.
So if you had unlimited time and resources, which problem would you try to solve?
Maybe let's start with David.
That's a very good question.
So actually...
Because I recently saw some groups celebrating that they got access to supercomputer clusters,
I was actually thinking to myself, well, we actually don't need that.
So they wanted to also do something Bayesian, and I thought, well, we can do that now without the super cluster from Germany.
But I mean, with unlimited time and resources, I think what would definitely improve our understanding would be to actually, sledgehammer-style, brute-force extract the loss landscapes from neural networks, so that you exactly get the whole loss landscape of a neural network, and then we would have an understanding of what is going on there.
Same for the learning dynamics, because this is something that people now try to get a grip on, to understand how it is possible that gradient descent, stochastic gradient descent approaches work so well.
And if you had infinitely many resources and unlimited time, then you could do that and understand it.
Of course, the more relevant thing for our world would be to do something towards, yeah, can we turn the current trends, be it AGI or not in the end, can we turn that into something for humanity and not against humanity?
I think this is the broader answer.
Jakob, what about you?
Okay.
If it was purely driven by curiosity, not by ethical concerns, I think I would like to understand life: how it forms, what it means, and where it is.
Actually, my main science goal is to characterize Earth-like planets.
So this is a small step in that direction.
But yeah, I'm really curious about, you know, are we alone and what other forms are there?
Yeah, I understand that.
Who would not be?
Love it.
Love that answer.
That's great.
And Emmanuel, what about you?
Well, really hard to add something to those two good answers.
Well, again, certainly there would be more influential and more important problems, but I'm far from the right person to sketch a roadmap in those domains.
So I will just say: flexible yet affordable, generalized, maybe universal probabilistic modeling at all scales.
Because I actually truly believe that, well, calibrated probabilities can lead to truly transparent and trustworthy systems.
And well, this doesn't necessarily require a Bayesian twist.
I'm not sure how this would turn out, right?
But since I started to work with probabilities and distributions, it almost always gave me a feeling of, like, I'm on the right track, compared to the somewhat more limited knowledge from just looking at one point.
Like the broader perspective that actually there is always uncertainty, and that is also kind of reflected in our everyday life everywhere.
And to actually really grasp that in some sense is quite inspirational, I think.
Yeah, yeah, definitely.
Yeah, love that answer.
For the interest of time, I will not comment.
But second question, if you could have dinner with any great scientific mind, dead, alive or fictional, who would it be?
David, let's start with you again.
Great question.
Also a difficult question.
I think maybe something that I'm very lucky to have is that we have this Bayesian deep learning consortium.
Like this is a lot of people that do Bayesian deep learning.
They're writing a book together.
So there will be, maybe next year, a huge book coming out on Bayesian deep learning, co-authored by, I guess, hundreds of authors.
So I could probably have dinner with them already, with all these great minds.
But if there would be one thing, if you say fictional, then maybe someone from the future who knows what will happen in a couple of years, and then talk to them.
That would be, I think, quite enlightening, obviously.
Yeah.
I love that.
um Let me know when that dinner happens.
I definitely need to come in with some mics.
It sounds like there will be some very interesting conversation there.
Emanuel, what about you?
Yeah, pretty difficult.
I mean, to be honest, maybe someone who's alive right now, maybe it will happen at some point.
Let's put it out there: shout out to Pavel Izmailov.
I probably pronounced the name wrongly, but I just started out with Bayesian neural networks when I came to David's group and I read a lot of his papers.
And I really appreciated them.
They were really good.
And I feel like I would be interested in some of his takes, because we've looked at very similar problems and, I think, would agree on many things, maybe on some things not.
And I would be interested in that. But maybe just more generally, I'm actually quite fascinated by Newton himself, due to his very broad influence on science in general.
Yeah, that sounds about right.
And the good thing is that he spoke English, though maybe harder English than today's.
So I'm curious how you would communicate, actually.
And well, Jakob, let's finish with you.
Okay, given that David went into the future and Emanuel stays in the present, I will go into the past.
I think I would like to speak with von Neumann.
He's a character from history, supposedly very smart, and I heard he was fun to be around.
So I think it would be an enjoyable dinner.
I don't think I would learn that much.
He would probably be way too smart for me, but should be fun.
Yeah.
Yeah.
I love that.
I don't remember, but I think you're the first one to answer von Neumann.
So amazing.
That'd be weird if you were the first one though, because I've been doing this for six years; strange that he didn't come up yet. But maybe there are some listeners...
Raphael, you know better than me probably, so let me know.
Fantastic.
Well guys, thank you so much for taking the time.
That was a great show.
I really enjoyed it.
Thank you so much also for the work you do.
I think it's extremely important, not only because it's going to unlock a lot of issues practitioners are currently having, but also because I think it's helping a lot to make Bayesian inference still relevant in today's age, and I can only thank you for that, because obviously we all love Bayesian stats here and we think they have something to contribute.
But we always need to make sure they stay at the frontier, because otherwise we're going to lag behind.
So thank you for being our knights in binary armor at the frontier of Bayesian stats.
And again, thank you so much for taking the time and being on this show.
Thank you for having us.
Thanks.
This has been another episode of Learning Bayesian Statistics.
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit LearnBayesStats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind.
That's LearnBayesStats.com.
Our theme music is Good Bayesian by Baba Brinkman, featuring MC Lars and Mega Ran.
Check out his awesome work at BabaBrinkman.com.
I'm your host, Alex Andorra.
You can follow me on Twitter at alex_andorra, like the country.
You can support the show and unlock exclusive benefits by visiting Patreon.com/LearnBayesStats.
Thank you so much for listening and for your support.
You're truly a good Bayesian.
Change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations.
Let me show you how to
Let's get them on a solid foundation