Listen on your favorite platform:

• Join this channel to get access to perks:

https://www.patreon.com/c/learnbayesstats

• Proudly sponsored by PyMC Labs! Get in touch at alex.andorra@pymc-labs.com

• Intro to Bayes Course (first 2 lessons free): https://topmate.io/alex_andorra/503302

• Advanced Regression Course (first 2 lessons free): https://topmate.io/alex_andorra/1011122

Our theme music is « Good Bayesian », by Baba Brinkman (feat. MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Takeaways:

  • Bayesian neural networks are crucial for uncertainty quantification.
  • Scaling Bayesian methods to high dimensions is a significant challenge.
  • JAX offers substantial speed improvements for Bayesian sampling.
  • Initialization errors can hinder the performance of Bayesian neural networks.
  • Microcanonical Langevin sampler enhances sampling efficiency.
  • Practical tools are essential for wider adoption of Bayesian methods.
  • Understanding neural networks requires better uncertainty quantification.
  • Ensemble methods can improve the performance of Bayesian models.
  • Computational efficiency must be balanced with posterior fidelity.
  • Community-driven tools are vital for advancing Bayesian deep learning.
  • Bayesian deep ensembles provide a more flexible approximation.
  • Sampling methods can yield better predictive performance.
  • Uncertainty quantification is crucial for practical applications.
  • The overhead of Bayesian methods is decreasing.
  • Bayesian neural networks outperform standard approaches in many cases.
  • Exploration-exploitation trade-offs are important in sampling.
  • Future advancements may allow for Bayesian deep learning at scale.
  • Community efforts are needed to improve Bayesian inference packages.
  • Practical applications of Bayesian methods are expanding.
  • Understanding life and probabilistic modeling are key future goals.

Chapters:

00:00 Scaling Bayesian Neural Networks

04:26 Origin Stories of the Researchers

09:46 Research Themes in Bayesian Neural Networks

12:05 Making Bayesian Neural Networks Fast

16:19 Microcanonical Langevin Sampler Explained

22:57 Bottlenecks in Scaling Bayesian Neural Networks

29:09 Practical Tools for Bayesian Neural Networks

36:48 Trade-offs in Computational Efficiency and Posterior Fidelity

40:13 Exploring High Dimensional Gaussians

43:03 Practical Applications of Bayesian Deep Ensembles

45:20 Comparing Bayesian Neural Networks with Standard Approaches

50:03 Identifying Real-World Applications for Bayesian Methods

57:44 Future of Bayesian Deep Learning at Scale

01:05:56 The Evolution of Bayesian Inference Packages

01:10:39 Vision for the Future of Bayesian Statistics

Thank you to my Patrons (https://learnbayesstats.com/#patrons) for making this episode possible!

Come meet Alex at the Field of Play Conference in Manchester, UK, March 27, 2026!

https://www.fieldofplay.co.uk/

Links from the show:

David Rügamer:

Emanuel Sommer:

Jakob Robnik:

General references:

Today, we're going deep into one of the hardest problems in modern machine learning: how to scale Bayesian neural networks to truly high-dimensional settings without giving up uncertainty or computational sanity.

My guests are David Rügamer, who leads a research group at LMU Munich working on uncertainty quantification and deep learning, together with Emanuel Sommer and Jakob Robnik, two PhD researchers at the core of this work.

In this conversation, we unpack why Bayesian neural networks still matter in a deep learning dominated world, what uncertainty quantification really buys you in practice, and

where standard approaches break down in high dimensions.

We also talk about applications, community-driven tooling, and the long-term future of Bayesian deep learning: where it already outperforms standard neural networks, what's still missing for broader adoption, and why there is real optimism that scalable Bayesian methods are finally within reach.

This is Learning Bayesian Statistics, episode 150, recorded November 20, 2025.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible.

I'm your host, Alex Andorra.

You can follow me on Twitter at alex_andorra, like the country.

For any info about the show, LearnBayesStats.com is Laplace to be.

Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there.

That's LearnBayesStats.com.

If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra.

See you around, folks.

and best Bayesian wishes to you all.

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life.

Check us out at pymc-labs.com.

Hello my dear Bayesians!

Before today's episode, I wanted to let you know that this year, we'll be talking about Bayesian modeling in soccer at the Field of Play conference in Manchester, UK, on March 27, 2026.

So if you want to meet me, you can come there uh in the audience if you want, but also as a speaker, because we have already locked in most of the speakers, with an announcement coming soon.

Stay tuned and follow on LinkedIn.

Last year we had speakers from baseball, cycling, education, fantasy sports, soccer obviously, because it's Manchester, and that mix genuinely raised the level of conversation.

The theme for this year, 2026, is communicating complex ideas: how do you take something technical, nuanced, uncertain, like models, probabilities, trade-offs, and make it understandable and useful for people who are not data experts?

Like last year, we are opening up one of the final speaker slots.

So if this theme resonates with you or someone you know, whether they work in football or somewhere completely different, feel free to contact me and I will take a look.

And in any case, you can already buy your tickets: go to Field of Play's website or LinkedIn page and you'll have all the information there.

I'm really looking forward to seeing you there, and well, I will for sure have some LBS merch with me, so please come say hello, and well, come to my talk also so that I can say that the room was full, because otherwise, I don't know what I will do.

Thank you so much, people, I will see you there, and now, let's go on with the episode.

David Rügamer, Emanuel Sommer, and Jakob Robnik, welcome to Learning Bayesian Statistics.

Thanks.

No, you bet.

Thanks for taking the time, Emanuel and David.

It's late for you in the evening, so I definitely appreciate it.

And actually, I was... I mean, you were recommended to me by a listener. Of course, I'm forgetting his name right now, but it will come back to me during the recording.

um So we'll get back to that and I will thank you properly at that time.

But in the meantime, let's start with you guys.

As usual, em we're going to dive into what you're doing because you're doing really some fascinating stuff, the three of you.

um But first, your origin story.

What are you doing nowadays?

How did you end up working on that?

And also, lastly, listeners, please forgive me for the weird voice. I am a bit uh sick today, but I have a dream team to record with today, so I could not cancel on them.

So David let's start with you.

What's your origin story?

What are you doing nowadays?

How did you end up doing that?

Yeah. First of all, thanks for having us. em So I think uh Raphael was the one that um brought up the connection.

um And yeah, what I'm doing nowadays, I'm a professor at the statistics department of LMU Munich.

um There I run a lab called the Munich Uncertainty Quantification AI Lab, short munich.ai.

I do teaching at the department.

I'm teaching statistics courses now and then, but

My main duty is to teach deep learning, all sorts of deep learning, applied deep learning, the foundations of deep learning.

um And I'm also like a principal investigator of Indie the Munich Center for Machine Learning, which is I'm very grateful for because they, they bring up, they give you so

much like network uh funding, et cetera.

So it's actually one of the six big AI centers in Germany.

and being part of this is uh pretty amazing.

And yeah, how did I end up working on that?

em I did my PhD in statistics, actually.

And during my PhD, I already found these uh algorithms quite intriguing and quantifying their uncertainty.

And that's actually something that then stuck. And then, yeah, at some point I started working on deep learning, and nowadays that's my main focus.

Okay.

I heard it's a pretty popular focus lately for some reason.

I don't know why.

Emanuel, what about you?

Yeah.

So, well, my background actually is in math.

So also in Munich, from TUM; however, now I switched sides to LMU, to David's lab, where I currently do my PhD with a very strong focus on Bayesian neural networks.

So uncertainty quantification, quite generic networks with a particular focus again on sampling, which also brought up the connection with Jakob.

And, um well, I did actually a wide variety of stuff before, like probabilistic forecasting in financial domains and learning to rank in industry. uh Well, the common theme always was kind of the fascination with probabilistic machine learning methods, I would say.

So quantifying some kind of uncertainty, I guess.

And, however, also the time as a practitioner, in fact, taught me that I can actually motivate myself much better if I envision my work to be of practical value.

So that's why I kind of try to fuse the probabilistic rigor also with a little bit of practicality, I would say.

And I think that's, I guess, quite defining of my work now.

Yeah, I would say the same from what I've seen from your work preparing for this episode, but we'll dive into that in just a few seconds.

First, Jakob, what about you?

I am actually a physicist by training and I'm currently a PhD student at Berkeley.

I generally work on different problems in statistics and physics and astronomy, basically whatever is data-driven.

And recently I've also been developing MCMC algorithms, and that's how we met with Emanuel at a conference and we started collaborating.

Okay.

Okay.

So you met, you all met at a conference and then, and then boom, this, this happened.

Awesome.

Yeah, I love it.

Nice, nice origin story.

Well, maybe one of you, em can you just give us a big-picture overview of your group's work?

Actually, you guys are focused mostly on Bayesian neural networks.

What are the overarching research themes?

Who wants to tackle that?

I can at least say a few words about what my group is doing. We're not only doing Bayesian neural networks, but also things related to optimization, sparsity... Optimization and uncertainty quantification are the two pillars in my group, and we try also to combine them.

And um the idea is usually that this somehow helps in understanding neural networks better.

So not from an interpretable machine learning, XAI point of view, but more like, um can we understand their learning dynamics?

Can we understand uh what the optimal solutions of those are?

And then also, maybe most importantly, what is the distribution of the parameters of these models?

And that's exactly where the Bayesian neural networks come into play, because you get the distribution of the parameters.

And by that, you can understand a lot even beyond the applications that I think we might also talk about later.

um By having the distribution of the parameters, you can sort of already maybe understand better what is going on in these models.

And yeah, that's the big picture in the group: as I said, one of the pillars is the Bayesian neural networks, the uncertainty quantification.

And ideally, at some point, we will be able to, I don't know, get big transformers or other big neural networks to give us the uncertainty for any prediction task immediately, without any computational overhead. um So then, in any AI application, you get the uncertainty essentially right away.

Yeah.

Yeah.

Of course you're preaching to the choir here.

That'd be absolutely fantastic.

um How does it work?

like, yeah, what does it mean to make Bayesian neural networks fast?

How did you start doing that, and what does it mean concretely?

I think Emanuel can maybe uh give more details about that, but I can maybe briefly mention that when we started there were actually a couple of choices, and we found that this is uh

super slow, everything is super slow.

And then um at some point we realized that JAX um is notably faster, at least when you sample these Bayesian neural networks.

um I don't know, Emanuel can maybe give you a bit more details about that.

Yeah.

Well, basically we're still talking about, maybe we can talk about comparisons with other uh competitive methods or other approximations.

um But like focusing on the sampling point of view first.

Yeah.

uh So I basically got the idea for JAX from David. I think before starting in his group, I'd never touched it.

I've heard of it, but never touched it.

and it worked quite well, like a charm.

um But then in the first project, I actually went for NumPyro as um kind of the sampling engine with these classical samplers implemented.

And in fact, it was actually super frustrating for me because uh it was reasonably fast, but not very modular.

And I couldn't really get my hands on the core algorithms.

And also some quite

small details were actually not configurable, which made a huge difference when you actually want to scale these things up.

So basically what I did quite fast was switch to BlackJAX.

Then it was quite a painful refactor, but it was well worth it.

It's much more modular and I felt like it was really good.

Again, shout-out to Raphael, by the way. He was the first to mention BlackJAX to me.

because basically there are some tiny things that actually make a big difference if you want to scale this up like memory management, like really boring stuff.

For instance, you would not want to carry, let's say, 100 samples of ResNet-50 models in your memory, which blows up quite fast; instead you actually want to be able to implement callbacks, let's say, to save samples at each time point, which is very unusual for the classical Bayesian workflow, I guess.

But if you work with neural networks, this actually makes things much faster.

So there are actually little tweaks that kind of ah make this faster.

But ah of course, compilation with JAX works perfectly on GPU, TPU and so on.

That's pretty nice.
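A minimal sketch of that callback idea (all names here are mine, not from any of the packages mentioned): the sampler loop hands each draw to a callback instead of accumulating the whole chain in memory, so only running summaries are kept per chain.

```python
import numpy as np

def run_chain(step_fn, position, n_draws, callback):
    """Run a sampler loop, handing each draw to `callback` instead of storing it."""
    running_mean = np.zeros_like(position)   # the only per-chain state we keep
    for i in range(n_draws):
        position = step_fn(position)
        callback(i, position)                # in real use: stream to disk here
        running_mean += position
    return running_mean / n_draws

# Toy stand-in for an MCLMC/HMC kernel: a simple random-walk step.
rng = np.random.default_rng(0)
def step_fn(x):
    return x + 0.1 * rng.normal(size=x.shape)

saved = []                                   # this callback just appends
mean = run_chain(step_fn, np.zeros(3), n_draws=100,
                 callback=lambda i, x: saved.append(x.copy()))
print(len(saved), mean.shape)
```

In a real run the callback would write each draw to disk (or update a running posterior-predictive average), so memory stays constant no matter how large the model is.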

But that's only one ah major pillar, of course, like software is important.

But you also have to kind of challenge these classical methodological um

stances that are uh well established in classical Bayesian inference.

For instance, for us, some big steps were a parallel first budget allocation.

So actually parallelizing a lot of things, having rather short chains, putting a lot of emphasis on ensembles, uh using hybrid optimization techniques, like actually getting a lot of information from optimization by using warm starts cleverly.

I mean, these ideas are not entirely new; like, Aki, who was also on the podcast with you at some point, I think wrote a paper in 2000 where he kind of sketched that roughly, with warm starts.

So it's nothing super fancy, but you actually just have to put all of these things together.

And finally, you also have to have a powerful sampler.

And that's where kind of Jakob comes into play.

um That also makes a big difference.
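To make the "parallel-first budget allocation" idea concrete, here is a hedged toy sketch of my own (not the group's code): a bimodal 1-D "posterior" stands in for a multimodal BNN posterior, each ensemble member is warm-started by gradient ascent, and then only a short Langevin chain is run around each mode.

```python
import numpy as np

rng = np.random.default_rng(1)

def logp(x):  # toy bimodal target, standing in for a multimodal BNN posterior
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

def grad_logp(x, eps=1e-5):  # numerical gradient keeps the sketch short
    return (logp(x + eps) - logp(x - eps)) / (2 * eps)

def warm_start(x, n_steps=200, lr=0.1):
    """Optimization phase: gradient ascent to a mode (one ensemble member)."""
    for _ in range(n_steps):
        x = x + lr * grad_logp(x)
    return x

def short_chain(x, n_draws=50, step=0.3):
    """Sampling phase: a few unadjusted Langevin steps around that mode."""
    draws = []
    for _ in range(n_draws):
        x = x + 0.5 * step ** 2 * grad_logp(x) + step * rng.normal()
        draws.append(x)
    return draws

# Parallel-first budget: many short, warm-started chains, not one long chain.
inits = np.linspace(-5.0, 5.0, 8)
draws = np.concatenate([short_chain(warm_start(x0)) for x0 in inits])
print(draws.min() < 0 < draws.max())  # the ensemble covers both modes
```

A single long chain started at one init would typically stay in one mode here; the short-chains-plus-ensemble budget covers both for the same total number of gradient evaluations.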

Yeah, okay.

Yeah, that's really fascinating.

uh And thanks for doing my job, Emanuel, because yeah, actually, Jakob, I think it's a perfect segue for you, because if I understood correctly, you worked on something called a microcanonical Langevin sampler. I have no idea what that is, so you're going to have to explain it to me.

uh

And why did it play a critical role in scaling these methods to very high dimensions?

Yeah, right.

So um there are actually two things here.

One is the actual dynamics that the sampler moves according to.

And the second is how you actually implement numerical integration.

um So the first thing that we introduced... so, you know, the standard MCMC sampler in these kinds of settings is Hamiltonian Monte Carlo.

um

where you have some Hamiltonian dynamics that makes your samples move very efficiently uh and makes them less correlated. um And it's the gold standard.

uh So what we changed here was, instead of Hamiltonian Monte Carlo, we introduce what we call microcanonical Langevin Monte Carlo, which is based on a slightly different type of dynamics. The key difference somehow is that the velocity of the chain is fixed throughout the sampling.

um And this makes it somewhat more stable where you have sharp transitions in the likelihood and so on.

The velocity doesn't just blow up; it stays constant. This makes it more stable.

It also acts somewhat more deterministically.

So this is one thing, but it's actually not the main reason why MCLMC, our microcanonical Langevin Monte Carlo, is so fast.

um The main reason is actually how you do integration.

um Because, uh basically for either Hamiltonian Monte Carlo or microcanonical Monte Carlo, um you have some dynamics, and uh these dynamics in principle are designed in such a way that you get correct samples.

But in practice, then you have to actually numerically integrate these equations to uh get your samples.

And this numerical integration gives you some error.

um And this translates to bias of your samples.

And the way this is usually approached is that you treat the numerical integration as a proposal in a Metropolis-Hastings scheme.

So this is a scheme where you do the integration and then accept or reject it as a sample.

um And this accept-reject probability can be determined in such a way that you remove the bias.

And this is what people typically do.

It's a great idea, but it has a problem in high dimensions: you need an increasingly smaller step size as you increase the dimensionality um to keep the acceptance probability high.

And so this is our main innovation: we actually do not do this Metropolis uh correction, but instead we have an alternative scheme that controls the bias to be small enough so that it's negligible compared to the other types of error.

um And this is what really allows you to keep a constant step size when you scale up.
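Here is a cartoon of the fixed-speed idea in code. To be clear, this is my own caricature for intuition, not the actual MCLMC integrator from the paper or from BlackJAX; all names and constants are assumptions. The one microcanonical trait it does show is that the velocity's norm is held constant at every step.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 50                                     # dimension of the toy target

def grad_logp(x):                          # standard Gaussian target
    return -x

def fixed_speed_step(x, u, step=0.1, noise=0.1):
    """One cartoon step: move at constant speed, let the gradient bend the
    direction, partially refresh it, and renormalize so |u| stays fixed."""
    x = x + step * u                        # position update at unit speed
    u = u + step * grad_logp(x)             # gradient bends the direction
    u = u + noise * rng.normal(size=d)      # stochastic (Langevin) refresh
    u = u / np.linalg.norm(u)               # the microcanonical trait: |u| = 1
    return x, u

x = rng.normal(size=d)
u = rng.normal(size=d)
u = u / np.linalg.norm(u)
samples = []
for i in range(5000):
    x, u = fixed_speed_step(x, u)
    if i >= 1000:                           # crude warm-up
        samples.append(x.copy())
S = np.array(samples)
print(S.shape, abs(np.linalg.norm(u) - 1.0) < 1e-9)
```

Because the speed never blows up, the step size can stay fixed; the real algorithm then adds the careful bias control Jakob describes, which this sketch does not attempt.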

Okay.

So having this constant step size is really what gives you the most important improvement in sampling speed?

Yes.

So, you know, the step size determines how fast you move. And if the step size is low, then you have big correlations in your samples. But with this scheme, you can actually keep a constant step size and not lose efficiency with a higher number of parameters.

Okay.

But how is that?

So how's that...

What's the difference with the step size from the Hamiltonian Monte Carlo then?

Because if I understand correctly, this is doing the same thing.

So why does this in this case make such a big impact on the neural network sampling?

So in principle, Hamiltonian Monte Carlo could use the same scheme. uh It just usually doesn't; people usually complement it with the Metropolis test.

So yeah.

What we did was find a scheme that makes this possible.

And it's also applicable to Hamiltonian Monte Carlo.

Okay.

Okay.

Super interesting.

Emanuel, it looks like you wanted to add something.

This ties in with a few errors that are quite prominent in Bayesian neural networks, to kind of put it together. At least that's my view on the whole thing, because that's one of the major pillars, uh as Jakob just mentioned. And then to also put it back together with the main key ingredients to actually make it fast and valid in these very high-dimensional settings.

uh For instance, this Metropolis-Hastings correction also requires full batch, so a full epoch through your data loader, which is not feasible for the large datasets that you encounter in deep learning.

So that's basically a no brainer.

Actually, there was a really good paper from 2021.

Vincent, who was also here on the show, was also involved; it showed that actually this acceptance probability goes to zero in these very high dimensions.

So that's a very hard result in that sense.

So Metropolis-Hastings adjustment doesn't really work.

We can control that with the sampler.
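That vanishing-acceptance effect is easy to demonstrate on a toy target. The sketch below is my own generic MALA experiment (not the one from the 2021 paper): a standard Gaussian target, one fixed step size, and growing dimension.

```python
import numpy as np

rng = np.random.default_rng(3)

def mala_acceptance(d, eps=0.6, n=2000):
    """Average Metropolis-Hastings acceptance of MALA on a standard
    d-dimensional Gaussian, at one FIXED step size eps."""
    logp = lambda x: -0.5 * np.dot(x, x)
    grad = lambda x: -x
    x = rng.normal(size=d)
    accepted = 0
    for _ in range(n):
        y = x + 0.5 * eps ** 2 * grad(x) + eps * rng.normal(size=d)
        # log proposal densities for the asymmetric MALA correction
        fwd = -np.sum((y - x - 0.5 * eps ** 2 * grad(x)) ** 2) / (2 * eps ** 2)
        bwd = -np.sum((x - y - 0.5 * eps ** 2 * grad(y)) ** 2) / (2 * eps ** 2)
        if np.log(rng.uniform()) < logp(y) - logp(x) + bwd - fwd:
            x, accepted = y, accepted + 1
    return accepted / n

rates = [mala_acceptance(d) for d in (2, 20, 200)]
print([round(r, 2) for r in rates])  # acceptance decays as d grows
```

To keep acceptance constant you would have to shrink the step size with dimension, which is exactly the scaling problem the unadjusted MCLMC scheme sidesteps.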

But then if you think about the approximation error in the sampling of Bayesian neural networks, you also have the initialization error and the Monte Carlo error.

And these two then tie into these key ingredients that I mentioned before, because if you then warm start with optimization, which is really good because we have known recipes in the community, everyone knows by now, almost, how to train a ResNet or any type of neural network, and you've got pre-trained models everywhere on Hugging Face and so on. So you can actually start from a very well-researched foundation.

And this almost eliminates or...

greatly reduces your initialization error.

Then you still have the discretization error, which we tackle with the above-mentioned approach.

And then ah we also have the Monte Carlo error, which is basically just the error that occurs from not having an infinite compute budget.

And that's where basically being smart about your budget allocation makes a huge difference, especially in these multimodal, complex geometries, where we then put a lot of emphasis um on our ensembling. And then we enrich this via very flexible local approximations, uh which basically means we use rather short chains.

So that kind of puts it all maybe in context.

Yeah.

Okay.

Yeah.

Thanks, Emmanuel.

Very, very helpful.

And so I think you guys kind of answered here and there that next question, but I'm curious if you can make like...

a summary maybe of the main bottlenecks you encountered when doing the scaling work of Bayesian neural nets to high-dimensional parameter spaces? Well, how you overcame them we just talked about, but what was the list of main frustrations? The ones I'm guessing a lot of practitioners are encountering right now without your work.

I mean, I can try to answer, but I think Emanuel and Jakob also, to some extent, actually already answered it in some way or the other.

So, I mean, first of all, I think the software was one of the bottlenecks: um if you use PyTorch and then you do sampling, even in very, very, very small neural networks it was 10 times slower, maybe even slower than 10 times slower.

um And then this is kind of...

a bummer, because if you work with, I don't know, a single-hidden-layer neural network and you have to wait for a day to sample it... um which is actually still the case: you can ask ChatGPT to write code for sampling a Bayesian neural network.

And if it does that in Pyro um and you run it on, I don't know, Google Colab, then it will probably take uh quite some time, and you will maybe ask yourself whether um this is actually working, or maybe you'll want to give up.

And there, I think it's very important to know the software to maybe switch to JAX.

I think you can still make it work with PyTorch, um but then, um yeah, I think JAX can be faster.

So why not use JAX?

um I think there are also a lot of other communities now potentially switching over to JAX.

um Maybe Emanuel can say something about this later.

um And once you manage this hurdle, the software hurdle, then I think there are a couple of other things that um are challenging in Bayesian neural network sampling.

um The standard stochastic samplers don't work off the shelf.

It is um pretty difficult to tune them.

um So you mean something like the NUTS sampler here, right? To be clear for listeners.

uh Well, the NUTS sampler, I think, is in most cases used in a full-batch setting where you feed in the whole data set. But then there are stochastic samplers that only feed a small part of the data set into the sampler.

And then um this will become quite unreliable, and it also requires hyperparameter tuning.

So at least there are not these defaults available, like for Adam and stuff like that, where you can just run it off the shelf and it will work.

um And I think we put in some effort also into this.

I mean, the one thing that Emanuel mentioned was this initialization thing, right?

If you don't initialize already in a good region, then this um can be another big bottleneck or big hurdle, because samplers are not necessarily good, at least from our perspective, at doing the job that an optimizer does: traversing the weight space and finding a completely different topological region, um in terms of completely different weight distributions.

um It can do that, I think, but then in practice, maybe

step sizes play a crucial role and stuff like that.

And so, at least from our experience, it was way more efficient to, as Emanuel mentioned before, start your sampling from a very good starting point already, from an already optimized neural network.

Then you can also overcome the bottleneck that the sampler needs, first of all, to find the typical set and to sample from that. And it has a much easier time doing that, um finding the typical set, if you initialize it um the right way.
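Here is a hedged toy version of that recipe (a linear "network" so the sketch stays tiny; this is my illustration, not their pipeline): phase one runs a plain gradient optimizer to the mode, phase two starts unadjusted Langevin sampling from that optimum.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy "network": Bayesian linear regression, so the mode is easy to reach.
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)

def grad_log_post(w):  # N(0, I) prior + Gaussian likelihood (noise sd 0.1)
    return X.T @ (y - X @ w) / 0.01 - w

# Phase 1: plain gradient ascent, standing in for Adam/SGD training.
w = np.zeros(5)
for _ in range(500):
    w = w + 1e-5 * grad_log_post(w)

# Phase 2: warm-started unadjusted Langevin sampling from the optimum.
eps = 1e-3
draws = []
for _ in range(1000):
    w = w + 0.5 * eps ** 2 * grad_log_post(w) + eps * rng.normal(size=5)
    draws.append(w.copy())
w_hat = np.asarray(draws).mean(axis=0)
print(np.linalg.norm(w_hat - w_true) < 0.5)  # chain starts in the typical set
```

Started from a random point instead, the sampler would first have to spend its budget just finding the high-probability region; the warm start removes that initialization error almost entirely, which is the point David is making.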

Okay, so here, very concretely, you would fit your neural network in a classic way with the Adam optimizer from PyTorch, for instance, and then use that as the initialization of your Bayesian neural network sampling algorithm, uh if I understood correctly, not with NUTS but with another package, probably based on JAX.

How wrong am I?

No, you could also use NUTS.

You could use any sampler.

I think they will all profit from the fact that you initialize these in the right region.

um But then, um as Jakob said, uh

uh MCLMC based samplers could be more efficient in high dimensions.

um I mean, NUTS still works.

So we did that.

And you get similar performance um in these Bayesian neural networks, at least in maybe not super large, but um practically relevant, Bayesian neural networks. You get a similar performance with NUTS as with the um MCLMC-based sampling.

But then um it's slower.

It's notably slower.

Yeah.

Okay.

yeah.

So actually that was something I wanted to ask you later, but since we're on that, um, maybe Emanuel, because I think David said you might have things to add to that.

Um, like, concretely, how can curious listeners today try what you're talking about right now? Like, for these methods that you've developed, is there a reliable Python package they can install and play around with?

Yeah.

So basically, if you want to check out MCLMC, then Jakob and his crew have made a tremendous effort to put that into BlackJAX, which works really well, as I said before. But at the same time, maybe BlackJAX is a little bit more for the tech-savvy guys, right?

Because it's quite configurable.

um

But what's actually quite nice about the whole JAX ecosystem is also that it has this functional character, right?

And this actually serves the purpose of, ah for instance, sampling a Bayesian neural network really well, because if you think about it, what you need for gradient-based

samplers like MCLMC or HMC is you just need to evaluate your likelihood, your neural network, get some gradient out of this.

And this is the same thing that you need for optimizing your neural network.

And if you just have a set of uh parameters, let's say you can imagine it as a dictionary, you can just handle this through the whole pipeline.

You just use your classical uh Optax, for instance, as your optimizer, and you just optimize your network.

And then you just take out, basically at each step, the same uh tree of parameters and you just put it into the next step, whether it be an optimization step or a sampling step.

So it basically kind of, you can play around with your parameters.

You can even uh switch things up right in between, like, for instance, ah cyclical SGLD does.

So you can actually be quite flexible.

And I think this functional approach to um sampling and optimization also teaches you that both of these approaches are kind of similar and also share a lot of things.

And actually this hybrid approach can also be done in a very smooth way, like transitioning from a highly tempered, ah so basically optimization, phase into a non-tempered sampling phase, for instance.

So you have a lot of ah things to do.
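In plain Python, the pattern Emanuel describes might look like the following hypothetical sketch (the names `sgd_step` and `sgld_step` are my stand-ins, not Optax or BlackJAX APIs): one parameter "tree" (a dict of arrays) flows through an optimization phase and then straight into a sampling phase, because both steps consume and return the same structure.

```python
import numpy as np

rng = np.random.default_rng(5)

# One "tree" of parameters (a dict of arrays), shared by both phases.
params = {"w": np.zeros((3, 2)), "b": np.zeros(2)}

def grads(p):  # toy gradient of a quadratic loss, standing in for backprop
    return {k: -v for k, v in p.items()}

def sgd_step(p, g, lr=0.1):        # optimization step
    return {k: p[k] + lr * g[k] for k in p}

def sgld_step(p, g, lr=0.01):      # sampling step: same update plus noise
    return {k: p[k] + 0.5 * lr * g[k]
               + np.sqrt(lr) * rng.normal(size=p[k].shape) for k in p}

# Warm up with optimization, then switch the SAME tree into sampling --
# the two phases compose freely because they share one interface.
for _ in range(100):
    params = sgd_step(params, grads(params))
draws = []
for _ in range(200):
    params = sgld_step(params, grads(params))
    draws.append(params)
print(sorted(draws[-1]), draws[-1]["w"].shape)  # the tree keeps its structure
```

This is the design choice he is praising in JAX's functional style: because an update is just `tree -> tree`, interleaving or alternating optimization and sampling (as cyclical schemes do) is trivial.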

And I think, like, yeah, JAX and BlackJAX are a good ah combo.

And maybe also we will publish some code.

which is more user-friendly than our usual research code.

Also, for PyTorch lovers, I would like to advertise Sam Duffield's posteriors package.

um This comes with a lot of ready-to-use samplers, especially for the stochastic case, so they scale up quite well.

They also come with some variational or approximate Bayesian optimization-based methods.

uh We also use

these methods a lot for our comparisons because that's just a well-maintained and uh easy-to-use ah package for Python.

Soon also, uh our group will probably release a hopefully very accessible, very user-friendly, sklearn-compatible package um that is tailored to the tabular case, ah classification and regression, and that implements basically this uh hybrid Bayesian deep ensemble um approach together with an MCLMC-based sampler.

um that's maybe my two cents on that.

Yeah, this sounds really cool, really, really practical.

I love that.

As soon as you have these packages available that you just talked about, definitely send them to me, and I'll make sure um to post that around in the LBS universe.

You have a lot of people here who I'm sure are waiting for that.

And also, if you can already post the link to um at least the posteriors package from Sam that you talked about in the show notes for this episode, that'd be great, because it sounds like it's mainly... the main way of interacting with these methods.

Like the fastest and easiest way to interact with those methods would be there.

And then I put the link to BlackJAX already in the show notes.

So if listeners wanna dive into that, they already have BlackJAX and posteriors.

And once you guys have your package, and I'm guessing some um notebook examples, we'll make sure to...

uh

to add that to the show notes a posteriori, but also to make sure that people know about it, because I think it's extremely helpful to distill the great work you guys do at the research level and disseminate it at the practitioner level.

So yeah, thank you for investing in this.

That's extremely important.

Anyone want to add something on that before I ask another question?

I mean, I think we will certainly... we will certainly have a package also from our group. And also, I think we could already link to research code. For those who work in research, ah nowadays you often need to um work through other groups' research code.

I think we happily can provide that as well.

Well, maybe I should also say that if you are interested not just in Bayesian neural networks but in statistical sampling in general, BlackJAX actually integrates pretty well um with other standard probabilistic programming languages.

I don't know, you can write your model in Pyro or NumPyro or whatever, and you just extract the log likelihood from it.

It's a one line thing.

And then you can plug it into any sampler from BlackJax, and there are also tutorials on how to do that.

So it's pretty straightforward.
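The contract being described is just "a callable that returns the log density". A minimal sketch of that contract, using a hand-rolled random-walk Metropolis sampler in NumPy (illustrative only, not the BlackJax API; a real BlackJax sampler would consume the same kind of callable):

```python
import numpy as np

def random_walk_metropolis(logdensity, init, n_steps, step=0.5, seed=0):
    """Sample from any model that exposes a log-density callable."""
    rng = np.random.default_rng(seed)
    x = np.asarray(init, dtype=float)
    logp = logdensity(x)
    samples = np.empty((n_steps, x.size))
    for i in range(n_steps):
        proposal = x + step * rng.normal(size=x.size)
        logp_prop = logdensity(proposal)
        # Accept or reject with the Metropolis criterion.
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples[i] = x
    return samples

# Stand-in for a log density extracted from a PPL: a standard 2-D normal.
logdensity = lambda x: -0.5 * np.sum(x ** 2)
samples = random_walk_metropolis(logdensity, np.zeros(2), 5000)
print(samples.mean(axis=0))  # roughly [0, 0]
```

Swapping in NUTS or MCLMC changes the sampler internals, but the interface stays the same: a log-density function in, samples out.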

Yeah, yeah.

Yeah, like if you want something a bit less at the frontier of Bayesian research, because Bayesian neural nets are pretty much at the frontier right now, if you want something a bit more classical, BlackJax plugs and plays very well with Pyro and NumPyro and, of course, PyMC.

And Bambi also: if you're using Bambi, you can access it directly from there, which is great, especially for beginners, who can write models with Wilkinson notation and then just use BlackJax to get great samplers.

That's amazing.

Actually a more generic question I had for you guys.

Well, before that, actually, you were right, David: it's Raphael, Raphael Rems, who recommended me to you guys.

So thank you so much, Raphael, for being such a long-time listener and also, you know, now being a matchmaker.

So official LBS matchmaker.

Thank you so much.

But yeah, a bit broader question: I'm curious how you guys think about the trade-off between computational efficiency and posterior fidelity when designing, using, and choosing which algorithms to use for which model.

Because something you usually need to do as a practitioner is justify your choices. When people know you use Bayesian models, they usually associate that with NUTS or MCMC.

And we know NUTS and MCMC take longer to fit, but they have these guarantees of convergence in the long run, which you don't necessarily have with other approximation algorithms.

People may be familiar with ADVI or DADVI, which is something we talked about on the show in episode, I think, 147.

So I'll put that in the show notes.

But yeah, or Laplace or INLA, things like that.

These come with more assumptions than NUTS, but they are faster.

So in this case, you have to be careful about your assumptions, make sure that the assumptions hold and that your prior predictive samples make sense most of the time, and that the model is able to recover the parameters in a prior predictive sampling analysis with fake data.

Let's say, yeah, how do you guys think about that in these cases, like when you're doing Bayesian neural networks?

I can maybe add, or sorry.

I actually wanted to say that Emanuel can probably say a couple of things there. But maybe just as a general note, I think the VI methods that are currently out there are not necessarily fast. I would say that in the same amount of time, you can potentially get better performance with sampling, actually. But yeah, Emanuel is the expert there; he can probably say more about this.

I just wanted to say that maybe I will give a perspective from the BNN case.

And I think Jakob also knows a couple of other perspectives quite well, so he can maybe give a slightly different view from different areas where you actually do go the way of thinking that you outlined.

I would like to invert it completely and start with the most crude approximation that you could think of, which is also often advertised by Andrew Gordon Wilson and his group at NYU.

If you think about these Bayesian neural networks and the approximation of the posterior over these huge spaces, you can first think of the most simple thing: you have just an optimized model and you put a Dirac delta on exactly that model.

That's the most crude approximation that you can basically put onto that very complex posterior.

Then you can step it up a notch and actually include an explicit prior.

You do MAP estimation.

Then you choose a restrictive family of distributions like a factorized Gaussian.

Then you approach basically the whole thing from a variational standpoint.

You can also put a Gaussian on top of your local approximation, which is then the Laplace approximation. As we, and many researchers across the field, have shown, this already improves generalizability, performance, and robustness, and in many cases the calibration of the predictions, for instance of credible intervals derived from these predictions.
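The Laplace step on this ladder can be sketched in a few lines: find the MAP by optimization, then match a Gaussian to the local curvature. A toy 1-D sketch with a hypothetical log posterior and finite-difference derivatives (real BNN implementations use structured curvature approximations, not this brute-force version):

```python
import numpy as np

def laplace_1d(log_post, x0=0.0, lr=0.1, n_iter=2000, h=1e-4):
    """Toy Laplace approximation: gradient-ascent to the MAP,
    then a Gaussian with variance = -1 / (second derivative)."""
    grad = lambda x: (log_post(x + h) - log_post(x - h)) / (2 * h)
    x = x0
    for _ in range(n_iter):          # crude MAP optimization
        x += lr * grad(x)
    hess = (log_post(x + h) - 2 * log_post(x) + log_post(x - h)) / h**2
    return x, -1.0 / hess            # mean and variance of the Gaussian

# A Gaussian posterior N(2, 0.5): here the Laplace approximation is exact.
log_post = lambda x: -0.5 * (x - 2.0) ** 2 / 0.5
mean, var = laplace_1d(log_post)
print(round(mean, 3), round(var, 3))  # 2.0 0.5
```

For a non-Gaussian posterior the same recipe returns only a local Gaussian fit, which is exactly the restrictiveness being discussed here.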

And then you can also ensemble these things, right? You have an ensemble, let's say, of very high-dimensional Gaussians. So you have a mixture of Gaussians, but still, that's not quite flexible, right?

So taking it from there, you can again use an ensemble, but then you don't impose any restrictive assumptions, as you said; you actually do some sampling, which is very flexible and very efficient if you use, for instance, very strong gradient-based samplers like MCLMC.

And then you get much more flexible local approximations. And still, if you want to be correct and not just lie about things: obviously this is an approximation to this huge-dimensional space.

In the last weeks, in a project where Jakob is also involved, we sampled a vision transformer with 22 million parameters. We sampled all of those, and we obviously did not collect millions of samples from this very large network, but we did collect many thousands of samples of this model.

But still, in comparison with the dimensionality of the problem, that is likely a very crude approximation.

But if you look at the whole transition, from this very crude Dirac delta to these restrictive local approximations to these flexible local approximations, you can see that you have come quite a long way in flexibility.

And if you then look at performance, functionally, on uncertainty quantification metrics, or even just predictive precision, you can actually see that this works.

And this is then also not much more expensive than the optimization, after all. And, at least if you are a little bit smarter about how you do the whole thing, it is in my experience also more robust compared to, let's say, variational approaches in these high dimensions, because those come with a lot of hyperparameters and can also be quite brittle because of very noisy gradients and so on.

So overall, I think that this Bayesian deep ensemble approach, while still being a crude approximation in the big picture, is a much more flexible, precise, and quite beautiful approximation of the whole thing than, let's say, other very crude alternatives; but of course you still incur a lot of error. For the other side, the very high-fidelity posterior reconstruction, we have to ask Jakob.

Yeah, for sure.

Thanks for this very practical rundown.

I'm really looking forward to you guys' example notebooks; I think you should definitely write these up. This stuff is in your papers, but we need it for practitioners.

I think it's going to be super helpful, honestly.

So, Jakob, yeah, what do you have to say about that?

Yeah, so I usually think about more scientific applications, where you don't have a Bayesian neural network, but you're trying to learn the posterior and you want it to be the exact posterior.

So MCLMC, if you include a Metropolis adjustment, has similar types of guarantees as the standard HMC samplers, which is to say, if you assume something about the target, like that it's log-concave, something like that, then you can show that they actually converge.

But as soon as the target has multiple modes or something like that, then you lose all of the guarantees in any case.

So.

There are no theoretical guarantees in either case then, whether it includes the adjustment or not. So everything is in about the same setting here.

The critical part is how you make it work in practice and how you look at the diagnostics after you run the chains.

So for example, as a standard, people look at things like the Gelman-Rubin statistic and stuff like that.


And we are actually right now working on similar types of diagnostics for our scheme without Metropolis adjustment, where you can basically check that your bias is as small as you would like it to be.
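The Gelman-Rubin diagnostic mentioned here compares between-chain and within-chain variance. A compact version, without the split-chain and rank-normalization refinements of modern implementations:

```python
import numpy as np

def gelman_rubin(chains):
    """R-hat for an array of shape (n_chains, n_draws).
    Values near 1 suggest the chains agree; clearly above 1 signals trouble."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))            # 4 chains on the same target
bad = good + np.array([[0.], [0.], [0.], [3.]])  # one chain stuck elsewhere
print(gelman_rubin(good) < 1.01, gelman_rubin(bad) > 1.5)  # True True
```

As discussed later in the episode, this diagnostic assumes an identifiable target, which is exactly what breaks down for BNN posteriors, hence the need for new diagnostics.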

um Yeah.

So yeah, I think it's a very practical question.


It's an art.

Yeah, yeah, yeah.

I know, I know.

Sorry for asking these practical questions, but that's what I do here.

No, no, I think it's an important question.

Definitely.

And we are thinking a lot about it, yeah.

Yeah.

Yeah.

And actually, since we're on the practical side of things, I'm curious: how do your Bayesian neural network models compare to standard deep learning approaches, both in predictive performance and uncertainty calibration?

I think that one is going to be for you, David.

Yeah.

I mean, in our papers they usually perform better, both in predictive performance and in uncertainty quantification metrics. This is maybe sort of obvious, because essentially we have a big model average: every sample is a weight vector of this one neural network, and then we have a huge ensemble of all of these networks.

And this has been shown in the past to be very promising: ensemble methods have worked for random forests, boosting, stuff like that. They also work extremely well for Bayesian neural networks.

And as Emanuel said before, on the uncertainty quantification metrics, at least those that are currently used, these Bayesian neural networks also outperform the standard deep learning approaches by a larger margin.

Again, you could say this is obvious because you're comparing one neural network versus a big ensemble of neural networks.

Well, you first have to come up with these samples, and that's the art: sampling from these Bayesian neural network posteriors.
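The "big model average" described here is the Monte Carlo estimate of the posterior predictive: you average the predicted probabilities of each posterior weight sample, not the weights or the logits. A sketch with a toy softmax classifier (shapes and sizes are illustrative, not from any of the papers discussed):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def posterior_predictive(weight_samples, x):
    """Average class probabilities over posterior weight samples.
    weight_samples: (n_samples, n_features, n_classes); x: (n_features,)."""
    probs = softmax(x @ weight_samples)       # (n_samples, n_classes)
    return probs.mean(axis=0)                 # the Bayesian model average

rng = np.random.default_rng(1)
W = rng.normal(size=(200, 5, 3))              # 200 posterior weight samples
x = rng.normal(size=5)
p = posterior_predictive(W, x)
print(round(p.sum(), 6))  # 1.0, a proper probability vector
```

A deep ensemble does exactly the same averaging, just over independently optimized members instead of posterior samples.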

Emanuel, do you want to add something?

I think that puts it very well.

I mean, for some problems you even cut metrics in half sometimes, right? Like an RMSE for some problems; it is task-specific in a way. It also works for all different kinds of models.

In this year's ICLR paper with Jakob as well, we had applications to different data modalities like text, images, tabular data, and all different kinds of models. We have now sampled ViTs, nanoGPT, ResNets, you name it.

It basically is a very generic way of formulating the problem. And then in the end, of course, you don't get a huge margin in each and every experiment. But I think what's quite encouraging is that you're basically never worse than the standard deep ensemble, which is a really strong baseline.

You can actually make it work to always give an improvement, and then it's up to the user whether this improvement is worth the extra effort, which we try to keep very low. And that is also kind of the user's choice: how long do you sample from these very good starting points?

And in most cases it's actually quite fast, so you don't have to sample for an excessive amount of time.

But obviously, you can just take a few samples, and maybe a few samples more would give you slightly better performance, right? Then you basically have to trade off, as you said, computational efficiency versus your performance at inference.

I'd also like to showcase that it's not just compute, right? If you spend the same amount of compute on mean-field variational inference, or on optimizing a lot of ensemble members for a deep ensemble, and spend the same amount of time on sampling, you can see that for this given budget, you can actually make sampling work much better, especially on these uncertainty quantification metrics.

So there is some magic sauce to the sampling itself, because if you think about it, obviously you also bias the model average that you have in the end.

If you, for instance, have a deep ensemble and you just optimize each model, you don't have any kind of exploitation part in the whole thing.

And other approaches have already explored that as well, like the exploration-exploitation trade-off. I think you had Yingzhen Li on the podcast as well with cyclical SGLD, which is kind of the personification of the exploration-exploitation trade-off.

So this again holds here as well.
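The cyclical schedule behind that trade-off is simple to write down: within each cycle the step size decays from a large value (exploration) toward zero (exploitation), following the cosine schedule from the cyclical SG-MCMC literature (the base step size and cycle count here are illustrative):

```python
import numpy as np

def cyclical_stepsize(k, total_steps, n_cycles, alpha0=0.1):
    """Cosine step-size schedule from cyclical SG-MCMC:
    each cycle starts large (explore) and decays toward 0 (exploit)."""
    cycle_len = total_steps // n_cycles
    pos = (k % cycle_len) / cycle_len         # position within the cycle
    return alpha0 / 2 * (np.cos(np.pi * pos) + 1)

steps = np.arange(1000)
alphas = np.array([cyclical_stepsize(k, 1000, 4) for k in steps])
print(alphas[0], alphas[250])  # cycle restarts: both equal alpha0 = 0.1
```

Samples are typically collected only near the end of each cycle, when the step size is small and the chain is exploiting the mode it has landed in.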

Okay.

Damn.

And something I'm curious about actually is...

The way I see the models we're talking about right now is mainly as an add-on for when you're limited by classic deep neural networks, right? Where you actually need the Bayesian machinery for some reason.

So first, maybe you'll tell me, no, no, actually there is a need in itself for this.

And I'm happy to hear about that.

But also what I'd like to hear from you is...

In which cases would you recommend that listeners look into this?

What kind of real-world problems do you see as the best fit for these fast Bayesian neural network approaches?

I can briefly answer, and maybe Emanuel and Jakob can add to this.

I think in general, the computational overhead is not that large anymore.


I would actually rather ask whether it's not worth doing it for all applications, because, as I said, the overhead is not that large, and then you get better performance and you get uncertainty quantification. You could argue about the quality of this uncertainty quantification: do I actually know the Monte Carlo error that I incurred by not sampling one million samples, stuff like that?

But the metrics that we measure for uncertainty quantification are usually notably better than if you apply variational inference or Laplace or something.

So maybe you are not there yet. Maybe you don't get the uncertainty quantification that you would want for your Bayesian neural networks, because compute is still limited. But then you're still doing better than everything else that is out there.

So I don't see any reason not to use it.

um Yeah.

And I mean, there are no specific domains, I think; as I said, you can apply it to all sorts of Bayesian neural networks. I think it's maybe still not clear how effective it is, or how much it brings to the table, if you apply this to generative models, right?

You could apply those also to generative models. You could sample, as Emanuel said, a nanoGPT model and then generate text from it.

Does the additional uncertainty in the parameters, in the weights, somehow improve the generation? That's, I think, still unclear. But then again, for all the other supervised learning tasks, why should you not try it if it does not have that much overhead?

Yeah, no, 100%.

That's why I always push for these reproducible workflows accompanied by a reliable Python package, because then the entry cost is super low for practitioners.

And so it's like, well, you have the package here, you have the example notebook. Just, you know, fire it up and see what it gives you.

Maybe it's still too slow for what you need, and that's fine, but maybe it's actually really what you need, because all the bells and whistles you get from the Bayesian approach are going to be extremely important for your use case.

So yeah, I completely agree with that.

Emanuel and Jakob, anything to add on that?

No, I think David did a good job.

Yeah.

Actually, I'm sure you'll probably have something to say about that, Jakob. Do you have an example in mind, or maybe a theoretical use case, since you know the theory so well, where a Bayesian neural network with the microcanonical sampling approach could reveal, or has revealed, something that a classical neural net would have missed?

I mean, from the large language models, there's this mixture-of-experts hype right now, right? So there really seems to be something that you gain by mixing experts, by combining models.

There are certainly examples, and we saw that also in benchmarks: benchmarks where, and Emanuel was, I think, very open about this before, for certain combinations of dataset and neural network architecture, there might not be an improvement in predictive performance if you ensemble models, or there might not be better uncertainty quantification than if you just put a Gaussian around your optimized value.

But it's usually not worse than that. It's not worse than these comparison models.

But yeah, I don't know. I think maybe the thing that Jakob could have said is that there are, I think, some non-neural-network applications where you can definitely show that MCLMC does notably better than other methods.

I mean, if that's what you mean, non-BNN applications, there are plenty where without MCLMC you cannot explore the posterior sufficiently; the correlation length is just too high.

For example, we're working on cosmology applications.

And the big thing now is field-level inference, where, together with the parameters that describe how the universe evolves, you also want to learn the entire field of the universe. And this is millions of parameters, and as the surveys get bigger, it's getting bigger.

So we're also getting into the machine learning dimensionality regime.

And yes, definitely, if you just run NUTS, you will basically not converge in reasonable time.

So people had just given up on that and tried the Laplace approximation and other approximations of that type.

But if you run MCLMC, then you actually get a good posterior that's more reliable than the approximations, for sure.

And it's actually even faster than the approximations.

So yeah, in these types of applications it's very important.

Hmm.

Okay.

Yeah, I see.

I see what you mean.


Actually, Emanuel, anything you want to add on that, or is that fine?

Okay.

Cool.

And Jakob, and of course David and Emanuel too, if you want to jump in: do you expect microcanonical methods to become more widely used in Bayesian machine learning, or do you see them as mostly suited to the specific classes of models we've been discussing today?

No, they're actually very general-purpose. We have been trying them across various different models, from Bayesian neural networks to physics applications to quantum problems, basically you name it, and on benchmarks, and on none of the problems did we find them to perform worse; usually they perform much better.

Especially in high dimensions, they can be orders of magnitude faster.

So yeah, it's definitely not problem-specific.

Okay.

Okay.

Yeah, that's cool.

That's great to hear.

And actually, that opens me up to a couple of questions I have regarding the future, which is: how optimistic are you guys about the feasibility of getting to where you were talking about at the very beginning, David, which is Bayesian deep learning at scale, without latency, to put it in a very ambitious way, as a default in the future? How optimistic are you about that, and what breakthroughs do we need to get there?

So I would be very optimistic about that.

I'm optimistic that in a couple of years we can sample Bayesian neural networks of the size that people in research, for example, optimize now. Maybe not company-scale neural networks like OpenAI's GPT-5 model.

I don't think we'd be able to sample that; this is just not feasible, and I probably also don't want that. But other than that, I think there's almost no limit, well, there is a limit, but not for the typical use cases.

For all the researchers that I know outside of OpenAI or the other companies that run these large language models, I think they could, in two or three years, sample their models as well instead of just optimizing them.

And then there might even be approaches that researchers from OpenAI and other companies could use instead of sampling to get something like a Bayesian treatment, like, for example, the IVON optimizer from people in the RIKEN group in Japan.

There are still some open problems that maybe Emanuel can mention, but I'm very optimistic.

There was a paper in the past from Andrew Wilson's group, again, where they used a lot of TPUs and a lot of runtime to sample these Bayesian neural networks. I think essentially what they did, we can now do in an hour or two with standard GPUs.

And I think this will just get better and better.

And then, as I said in the beginning, maybe at some point we actually have a meta-model that gives you the samples right away without doing the actual sampling; that is, a generative model that can generate probability distributions right away.


We have shown that this already works for small applications, for small models. You can essentially replace an MCMC sampler with a transformer, and it gives you an identical posterior by learning the distribution in this transformer instead of sampling from the model's posterior.

Sorry, I think Emanuel also had some comments about the scale.

Yeah, thanks, David. That was already super interesting. I love that future. But yeah, I completely agree. Emanuel?

Yeah, to be honest, I totally agree with that future.

There are still a couple of obstacles, but I think they're also manageable.

One thing, for instance, that is shared with variational inference, Laplace, and so on, is that at inference time you sample again, or you basically have to do forward passes through all the models that you saved, basically your samples.

So this obviously incurs extra latency.

And I've been in industry, and latency is relevant for the end user, especially if you have user-facing applications.


But I can tell you that there are groups that are very talented, and we are maybe also among them, that are working on basically distilling this knowledge down.

And David mentioned a future where transformers can learn that, which is not too unrealistic by now.

And also classical distillation, or methods from statistics, could be applied there.

And I've even had some POCs on my laptop, which didn't make it out to the public, where I've seen that you could speed up inference by a factor of 100 quite easily without losing any performance.

So I guess there's a bright future ahead for these sampling methods, which are not classical sampling methods anymore.

And I think that's actually something people have to realize: a lot of these classical things don't work anymore, especially for BNNs. We've also put this into one of our papers; the Gelman-Rubin statistic and the effective sample size were mentioned. They don't make any sense for non-identifiable models, for instance.

So you actually have to switch gears a little bit and be smart about things.

With all the understanding that you can also get from, let's say, the optimization theory side, if you actually digest that, you can come up with quite simple recipes that are quite robust, often hybrid methods between optimization and sampling. And applying those is actually very feasible and pretty performant.

I also think there's quite a bright future ahead. I can name four papers right now that sampled quite large networks, and they were published this year at AISTATS or ICLR. So there's actually a lot of progress in this field.

Yeah, that's really incredible.

Great time to be alive.

Jakob, do you want to add anything on this topic?

No, no, I think the guys already answered everything.

Okay.

So actually, I'm also curious to hear from you guys: okay, that's what you would like to see, and that's how you envision Bayesian deep learning at scale looking. Even more concretely, I'm wondering where you see the concrete packages and methods we talked about, so JAX, BlackJax, and Bayesian inference, heading in the next few years?

Especially with these new samplers, these new methods we just talked about.

Probably best if Jakob or Emanuel answers this one.

Maybe I can say two words and then kick it over to you, Jakob.

So basically, I think we're developing quite good code. I think there needs to be kind of a community effort to actually commit to a certain package.

Posteriors is already a good step in the right direction. But I think we should actually deliver a package for the whole pipeline, because Posteriors is packaged around just the sampling; in fact it also does the optimization part, but at least for these hybrid methods you need optimization, you need sampling, and then you also need to take care of the samples, as I said.

You have to actually manage all these samples.

And that's something many people forget, because that's the kind of boring engineering that goes into actually making these things work at scale.

You have to do it at inference too: you need to access all of these samples, you have to do this in batches, you have to do this efficiently. But you can do that with JAX quite easily; you just have to put in the engineering.

Also, what I've seen on the blog of, for instance, Andrew Gelman is quite encouraging; he just recently mentioned Jakob by name, and also BlackJax. And I think this will also spread a lot in the classical Bayesian world.

And Jakob, I think you have a better view on that.

Yeah, it seems like a lot of Bayesians are turning to JAX. And the BlackJax philosophy is to somehow provide a skeleton for samplers; it doesn't provide the end result.

That's the philosophy.

I don't think that will change.

So the goal is not to, you know, provide one-line calls that do the sampling for you, but to give you the sampler building blocks, like the ability to tweak all the little details if you want to.

So I think that's a great start.

I like how it's framed.

But of course we also need to put out the final results, as Emanuel is doing.

And also in the more classical sampling world, I think that's maybe the one missing piece: something to put together the different samplers and make them very user-friendly.

Yeah, this is really full-stack work, right? Which is kind of the difficulty.

You don't only need the technical implementation, which is already extremely hard in itself; you also need everything around it, all the communication that comes afterwards, you know, how to make sure people know about your method, how to explain it to them without being overwhelmed by the customer support, let's say. All that stuff beyond technical abilities is extremely important, because in the end, a great package that nobody uses is a shame.

That's also why I'm doing this: making sure we shine the light as brightly as possible on things that can really change the game for a lot of people.

Fantastic, guys. That's really cool.

I'm going to start to wind us down here because it's late for you guys.

Is there anything I didn't mention that you would like to bring up before I ask you the last two questions?

Cool.

That means I did a good job because I had a lot of questions for you guys.

I'm very happy about that.

Again, to all the listeners, check out the show notes, because they are quite big for this episode.

I think you'll really appreciate them.

There's a lot of really cool nuggets in there.

But as usual, guys, before letting you go, I'm going to ask you the last two questions I ask every guest at the end of the show.

So if you had unlimited time and resources, which problem would you try to solve?

Maybe let's start with David.

That's a very good question.

So actually...

Because I recently saw some groups celebrating that they got access to supercomputer clusters, I was actually thinking to myself, well, we don't need that. They wanted to do something Bayesian as well, and I thought, well, we can do that now without the supercluster from Germany.

But with unlimited time and resources, I think what would definitely improve our understanding would be to, sledgehammer-style, brute-force extract the loss landscapes of neural networks, so that you get the exact, whole loss landscape of a neural network, and then we would have an understanding of what is going on there.

Same for the learning dynamics, because this is something people now try to get a grip on: to understand how it is possible that gradient descent, stochastic gradient descent approaches, work so well. And if you had infinite resources and unlimited time, then you could do that and understand it.

Of course, the more relevant thing for our world would be to do something towards, yeah: can we turn the current trends, be it AGI or not in the end, into something for humanity and not against humanity?

I think this is the broader answer. Jakob, what about you?

Okay.

If it were purely driven by curiosity, not by ethical concerns, I think I would like to understand life: how it forms, what it means, and where it is.

Actually, my main science goal is to characterize Earth-like planets.

So this is a small step in that direction.

But yeah, I'm really curious about, you know, are we alone, and what other life forms are there?

Yeah, I understand that.

Who would not be?

Love it.

Love that answer.

That's great.

And Emanuel, what about you?

Well, it's really hard to add something to those two good answers. Again, there would certainly be more influential and more important problems, but I'm far from the right person to sketch a roadmap in those domains.

So I will just say flexible yet affordable, generalized, maybe universal probabilistic modeling at all scales.

Because I truly believe that well-calibrated probabilities can lead to truly transparent and trustworthy systems.

And, well, this doesn't necessarily require a Bayesian twist.

I'm not sure how this would turn out, right?

But since I started working with probabilities and distributions, it has almost always given me the feeling that I'm on the right track.

Compared to the more limited knowledge from just looking at one point, as if there were no uncertainty, the broader perspective is reflected everywhere in our everyday life. And to really grasp that in some sense is quite inspirational, I think.

Yeah, yeah, definitely.


Yeah, love that answer.

For the interest of time, I will not comment.

But second question, if you could have dinner with any great scientific mind, dead, alive or fictional, who would it be?

David, let's start with you again.

Great question.

Also a difficult question.

I think maybe something that I'm very lucky to have is this Bayesian deep learning consortium.

This is a lot of people who do Bayesian deep learning.

They're writing a book together.

So maybe next year there will be a huge book coming out on Bayesian deep learning, co-authored by, I guess, hundreds of authors.

So I could probably already have dinner with all these great minds.

But if there would be one thing, since you say fictional, then maybe someone from the future who knows what will happen in a couple of years, and then talk to them.

That would be, I think, quite enlightening, obviously.

Yeah.

I love that.

Let me know when that dinner happens.

I definitely need to come in with some mics.

It sounds like there will be some very interesting conversation there.


Emmanuel, what about you?

Yeah, pretty difficult.

I mean, to be honest, maybe someone who's alive right now, maybe it will happen at some point.

Let's put it out there, shout out to Pavel Izmailov.

I probably pronounced the name wrongly, but I just started out with Bayesian neural networks when I came to David's group, and I read a lot of his papers.

And I really appreciate them.

They were really good.

And I would be interested in some of his takes, because we've looked at very similar problems, and I think we would agree on many things, maybe on some things not.

And I would be interested in that, but maybe just more generally, I'm actually quite fascinated by Newton himself, due to his very broad influence on science in general.

Yeah, that sounds about right.

And the good thing is that he spoke English, though maybe harder English than today's.

So I'm curious how you would communicate actually.

And well, Jakob, let's finish with you.

Okay, given that David went into the future and Emmanuel stays in the present, I will go into the past.

I think I would like to speak with von Neumann.

He's a character from history, supposedly very smart, and I heard he was fun to be around.

So I think it would be an enjoyable dinner.

I don't think I would learn that much.

He would probably be way too smart for me, but should be fun.

Yeah.

Yeah.

I love that.

I don't remember, but I think you're the first one to answer von Neumann.

So amazing.

That'd be weird if you were the first one though, because I've been doing this for six years, so it'd be strange that he didn't come up yet. But maybe there are some listeners.

Raphael, you know better than me probably, so let me know.

Fantastic.

Well guys, thank you so much for taking the time.

That was a great show.

I really enjoyed it.

Thank you so much also for the work you do.

I think it's extremely important, not only because it's going to unlock a lot of issues practitioners are currently having, but also because I think it's helping a lot to keep Bayesian inference relevant in today's age. And I can only thank you for that, because obviously

we all love Bayesian stats here and we think they have something to contribute, but we always need to make sure they stay at the frontier, because otherwise we're going to lag behind. So thank you for being our knights in binary armor at the frontier of Bayesian stats.

And again, thank you so much for taking the time and being on this show.

Thank you for having us.

Thanks.

This has been another episode of Learning Bayesian Statistics.

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit LearnBayesStats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind.

That's LearnBayesStats.com.

Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.

Check out his awesome work at BabaBrinkman.com.

I'm your host, Alex Andorra.

You can follow me on Twitter at Alex underscore Andorra, like the country.

You can support the show and unlock exclusive benefits by visiting Patreon.com slash LearnBayesStats.

Thank you so much for listening and for your support.

You're truly a good Bayesian.

Change your predictions after taking information in, and if you're thinking I'll be less than amazing.

Let's adjust those expectations.

Let me show you how to

Let's get them on a solid foundation

Related Episodes