
In this episode, Alex Andorra sits down with Adam Foster, a researcher at Microsoft Research, to cover a topic that’s long overdue on the show: Bayesian experimental design. The question driving Adam’s work is one most practitioners never stop to ask — not just how do you analyze data, but how do you decide which data to collect in the first place?

Adam came to Bayesianism not out of philosophical conviction but because it was the most natural language for the problem he wanted to solve. That problem, broadly, is this: if you have a model and a budget, which experiment would teach you the most?

What Is Bayesian Experimental Design?

Most of the time, we treat the dataset as a given. Bayesian experimental design flips that. It asks — given your model and the uncertainty it encodes — which experiment would reduce that uncertainty the most? Which reagents go into the reaction? What questions go in the questionnaire?

Adam is clear about when it’s worth the trouble: when experiments are expensive, when you have a model you trust, and when you have real control over how data is collected. If data is cheap, just collect more of it. The machinery only earns its place when the cost of a wrong experiment is real.

Expected Information Gain: The Score That Ties It Together

Once you accept the framing, you need a way to score candidate experiments. That’s where Expected Information Gain (EIG) comes in — a measure of how much you’d expect to learn about your model parameters by running a given experiment.

What’s striking is that you can arrive at EIG from two completely different directions — one from reducing posterior uncertainty, one from maximising outcome entropy while correcting for noise — and they turn out to be mathematically identical. The fact that researchers across many fields keep independently re-deriving it is itself a signal it’s the right quantity to optimise.

Computing it, though, is another matter. In simple discrete cases it’s trivial. In the general continuous case you run into what Adam calls double intractability: Bayesian inference is already hard, and now you’re doing it in a loop over many synthetic datasets. His variational BED paper tackles this directly, using amortized variational inference to short-circuit the repeated integration.
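In the simple discrete case the whole loop, scoring every candidate design by EIG and picking the best, really does fit in a few lines. Here is a minimal sketch; the three-value parameter, the two candidate designs, and all the probabilities are invented for illustration, not taken from Adam's papers:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

def eig(prior, likelihood):
    """Expected information gain of one candidate design.

    prior:      (K,) prior over K discrete parameter values.
    likelihood: (K, M) outcome probabilities p(y | theta) under this design.
    EIG is the mutual information H[p(y)] - E_theta H[p(y | theta)].
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    marginal = prior @ likelihood                        # p(y), shape (M,)
    conditional = np.array([entropy(row) for row in likelihood])
    return entropy(marginal) - float(prior @ conditional)

# Toy setup: a parameter with three possible values and two candidate
# yes/no experiments. One experiment's outcome depends on the parameter,
# the other's never does, so the first should win.
prior = np.array([1/3, 1/3, 1/3])
designs = {
    "informative":   np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]),
    "uninformative": np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]),
}
scores = {name: eig(prior, lik) for name, lik in designs.items()}
best = max(scores, key=scores.get)  # "informative"
```

The uninformative design scores exactly zero, since its outcome distribution is the same under every hypothesis. The double intractability shows up the moment the parameter space is continuous: both the marginal and the conditional term become integrals you can no longer enumerate.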

BALD and EPIG: EIG in Disguise

This is where active learning enters the picture — and where Adam is refreshingly blunt. BALD (Bayesian Active Learning by Disagreement), one of the most widely used active learning methods, is just EIG written differently. It scores designs by how much different model hypotheses disagree about the predicted outcome. For a neural net classifier with dropout, that means finding inputs where weight samples produce maximally different predictions.

BED is the general framework; BALD is one concrete, computable instantiation of it for a specific model class. That specificity is exactly why it’s been so widely adopted — it’s clear what to do.
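To make "it's clear what to do" concrete, here is a minimal sketch of the BALD score. The probability arrays are invented stand-ins for real stochastic forward passes of a classifier:

```python
import numpy as np

def bald_scores(probs):
    """BALD acquisition scores from S stochastic forward passes.

    probs: (S, N, C) class probabilities for N candidate inputs, e.g. from
    S MC-dropout samples of a classifier. The score is
    H[mean prediction] - mean H[per-sample prediction]: high exactly when
    the individual hypotheses are confident but disagree with each other.
    """
    p = np.clip(probs, 1e-12, 1.0)                      # guard against log(0)
    mean_p = p.mean(axis=0)                             # (N, C) ensemble prediction
    h_mean = -np.sum(mean_p * np.log(mean_p), axis=-1)  # total uncertainty
    h_each = -np.sum(p * np.log(p), axis=-1)            # (S, N) per-sample entropy
    return h_mean - h_each.mean(axis=0)                 # (N,) disagreement

# Two hypothetical "dropout samples" over two candidate inputs:
# input 0: the samples agree, so the score should be ~0;
# input 1: the samples confidently disagree, so the score should be ~log 2.
probs = np.array([[[0.9, 0.1], [1.0, 0.0]],
                  [[0.9, 0.1], [0.0, 1.0]]])
scores = bald_scores(probs)
```

Note that an input where every sample predicts 50/50 also scores zero: BALD seeks disagreement between hypotheses, not mere predictive uncertainty.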

EPIG, a variant Adam developed with Freddie Bickford-Smith, pushes the idea one step further: rather than reducing uncertainty globally across all parameters, it focuses the budget specifically on predictions you actually care about. If you already know what inputs matter downstream, why waste experiments on the rest?
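A rough sketch of that idea, with invented posterior samples standing in for a real model. This is an illustrative estimator in the spirit of EPIG (mutual information between the candidate's label and labels at target inputs), not code from the paper:

```python
import numpy as np

def epig_score(cand, targ):
    """EPIG-style score for one candidate input (illustrative sketch).

    cand: (S, C)    class probabilities at the candidate input, from S
                    posterior samples of the model.
    targ: (S, T, C) predictions at T target inputs we care about downstream.
    Estimates the mutual information between the candidate's label and each
    target's label, averaged over targets: high only when resolving the
    candidate would also resolve predictions that actually matter.
    """
    def H(p):
        p = np.clip(p, 1e-12, 1.0)
        return -np.sum(p * np.log(p), axis=-1)

    S = cand.shape[0]
    # Joint predictive p(y_cand, y_targ) per target, averaged over samples.
    joint = np.einsum('sc,std->tcd', cand, targ) / S
    h_joint = H(joint.reshape(joint.shape[0], -1))      # (T,)
    h_cand = H(cand.mean(axis=0))                       # scalar
    h_targ = H(targ.mean(axis=0))                       # (T,)
    return float(np.mean(h_cand + h_targ - h_joint))    # avg I(y_cand; y_targ)

# Two posterior samples, one target input (all numbers invented).
targ = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])        # samples disagree on target
correlated = np.array([[1.0, 0.0], [0.0, 1.0]])      # candidate tracks the target
uninformative = np.array([[0.5, 0.5], [0.5, 0.5]])   # candidate tells us nothing
```

The "correlated" candidate scores log 2: observing its label would fully resolve the target prediction. The "uninformative" one scores zero even though the model is maximally uncertain about it, which is exactly the budget-focusing behavior described above.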

Deep Adaptive Design: When You Can’t Wait

BALD and EPIG assume you have time to score designs between experiments. But what if you don’t? The motivating case for Deep Adaptive Design is human-in-the-loop settings — an AI interviewing a person one question at a time, needing to pick the next question the moment the previous answer arrives.

The solution is to pre-train a neural network that takes in all observed data so far and directly outputs the next best design, bypassing the posterior computation entirely. It’s amortized inference taken one level up: instead of amortizing the posterior, you amortize the whole design policy. Expensive upfront, but fast at query time — and for sequential, interactive settings, often the only practical path.
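A toy sketch of the shape of such a policy. The weights here are random placeholders, whereas in the actual method they would be trained offline against an information-gain objective; the point is only the architecture, a permutation-invariant encoding of the history followed by a head that emits the next design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative DAD-style policy network (weights are random placeholders;
# in the real method they are trained offline to maximise a bound on the
# total expected information gain of the whole experiment sequence).
W_enc = rng.normal(size=(2, 16)) / 4    # encodes one (design, outcome) pair
W_head = rng.normal(size=16) / 4        # maps pooled history to next design

def next_design(history):
    """Pick the next design from the history of (design, outcome) pairs.

    Per-pair encodings are sum-pooled, so the policy is invariant to the
    order of past observations; no posterior is computed at query time.
    """
    if history:
        pooled = np.tanh(np.asarray(history, dtype=float) @ W_enc).sum(axis=0)
    else:
        pooled = np.zeros(16)           # empty history -> a fixed first design
    return float(np.tanh(pooled @ W_head))

d1 = next_design([])                    # instant: one forward pass, no inference
d2 = next_design([(d1, 1.0)])           # adapts the moment an outcome arrives
```

Each query is a single forward pass, which is what makes the approach viable when the next question has to be chosen in milliseconds.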

Why Hasn’t This Crossed Over?

The most honest thread in the episode is Adam’s reflection on why BED hasn’t seen wider adoption outside academia. It’s not the math, and it’s not really the tools. It’s that BED is almost infinitely flexible — which makes it genuinely hard to know where to start.

His recommendation: begin with BALD. Score a finite set of candidate designs, pick the best one, and see if it helps. The failure modes will tell you which direction to go next. What the field needs most right now isn’t more theory — it’s deep collaborations with domain experts who are running real experiments and willing to try a different approach to designing them.

Looking Ahead

Adam is candid that BED’s role in his current protein structure work is still an open question — the models aren’t Bayesian in the right way yet. In quantum chemistry, the connections are cleaner: using model uncertainty to trigger when to label new data with expensive simulations is BED in all but name, and it’s already happening in the machine learning force fields literature.

Check out the full episode above and the show notes for links to Adam’s papers on variational BED, EPIG, and Deep Adaptive Design.

You can also interact with the episode on NotebookLM! Ask questions, generate flashcards, and more.

Hope you enjoyed it, and see you in two weeks, my dear Bayesians!

Chapters

00:00:00 What is Bayesian experimental design and why does it matter?

00:06:02 What problem does Bayesian experimental design actually solve?

00:08:54 When should practitioners use Bayesian experimental design?

00:12:00 Is Bayesian experimental design changing how scientists work in practice?

00:15:04 What are the limitations of Bayesian experimental design?

00:17:55 What is expected information gain (EIG) and how does it work?

00:21:05 How do you compute expected information gain in practice?

00:23:48 What is active learning and how does it connect to Bayesian experimental design?

00:41:02 What is active learning by disagreement?

00:48:57 What is deep adaptive design and when should you use it?

00:56:02 How is Bayesian experimental design applied in protein dynamics and quantum chemistry?

01:01:58 What does a practical Bayesian experimental design workflow look like?

01:06:50 What are the future directions for Bayesian experimental design research?

Thank you to my Patrons for making this episode possible!

Today, we are diving into Bayesian experimental design with Adam Foster, whose work tries to answer exactly this.

How do you use Bayesian reasoning not just to analyze data, but to decide which data to collect in the first place?

Adam works at Microsoft Research AI for Science, where he focuses on protein structure sampling.

In this episode, we build the whole picture from scratch.

What even is Bayesian experimental design and when should you actually care about it?

We also get into active learning by disagreement, deep adaptive design, and how all of these are actually just the same idea, wearing different clothes.

And then we zoom out to the bigger picture.

Why is this field struggling to cross over from academia into practice?

And what would it actually take to change that?

That part hit close to home for me.

This is Learning Bayesian Statistics, episode 156, recorded April 19, 2026.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects,

and the people who make it possible.

I'm your host, Alex Andorra.

You can follow me on Twitter at alex_andorra, like the country.

For any info about the show, learnbayesstats.com is Laplace to be.

Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there.

That's learnbayesstats.com.

If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra.

See you around, folks, and best Bayesian wishes to you all!

Hello, my dear Bayesians!

Continuing on my Bayesian skill project and open source repo, a few weeks ago I wrote a skill to do causal inference in your models, and that way we are making sure that your agent is

taught causal inference the right way.

Well, I also wrote a blog post about the full story behind the causal inference agent skill, and the key insight in there is that agents know

causal inference, they identify the right design, pick the right estimator, write clean code, but they never stop to ask, is my DAG right?

Or does this survive a placebo test?

In other words, the word causes was never heard.

I go into details in the blog post and I show you that the skill doesn't add knowledge, it just adds discipline.

You'll see the full detail in the blog post, but basically we go through why agents confuse correlation with causation.

I show you a side-by-side comparison, same prompt, two agents, very different conclusions.

And I show you also the four mandatory checkpoints where the agent stops and asks you as the user for your input.

And a lot more things, especially the philosophy I use for these skills, which is basically the idea of doing less but carefully and why that beats doing more but faster in

that domain where confidence without rigor is dangerous.

So, see you soon.

My dear Bayesians, you can check out the blog post in the show notes.

And in the meantime, let's listen to Adam Foster.

Adam Foster, welcome to Learning Bayesian Statistics.

Thank you very much.

Thank you very much for having me on the show.

Yeah, I'm very excited about that.

We're going to talk about new topics that we haven't covered yet on the show in eight years.

So that was long overdue.

To start with, obviously, as usual, what is your origin story?

You know, what are you doing nowadays?

And how did you end up doing that?

Yeah.

First of all, shocked that you haven't had Bayesian experimental design yet.

You know, but I'm glad you're correcting that now.

Yeah.

So about me, I guess I would say right now I work for Microsoft Research AI for Science, where I'm currently working on protein structure sampling.

But to kind of rewind, I started out doing maths, you know, as my first degree and masters.

And then, you know, I realized, okay, machine learning is kind of a thing.

What's the part of maths closest to that?

So then I started doing stats and probability.

And then I worked in the US for a year for a company called Rome Analytics that was kind of machine learning based.

Then I decided I wanted to do my PhD in that topic.

So I joined Oxford Stats, where I worked with Yee Whye Teh and Tom Rainforth on Bayesian experimental design as my main topic.

Yep.

So I worked through, you know, my PhD on that.

um Many interesting kind of chapters and side quests went into that as well.

And yeah, I think that's going to be the meat of what we're going to talk about today.

And then since then, yeah, I joined Microsoft Research, worked on a few interesting topics, joined AI for Science, which I think is a really, really exciting organization within MSR, and worked on a couple of topics, quantum chemistry.

And now yeah, protein structure.

Yeah.

Okay.

Yeah.

Yeah.

This is so, um, technical and interesting.

I love that.

We'll definitely, we'll definitely dig into that.

Um, yeah, I think you're right.

I did have... I'm pretty sure I had Bayesian experimental design already on the show several times, but I don't think I had active learning.

So this is mostly going to be new things for listeners.

It's debatable.

If they're different, we can get into that later.

There's a lot of, um, people vying for territory in this kind of namespace.

Yeah, I can't guess that.

So that's your more general, let's say broad, story, but I'm also curious about what drew you to Bayesian stats in particular, and then even more particularly to Bayesian experimental design, and whether it was a slow realization or if it was more of a light bulb moment.

It's interesting that you ask about what drew me to sort of Bayesianism.

So in Cambridge, in the stats lab, when I was there sort of around 2015, there was a very strong anti-Bayesian sort of culture, and I don't think there are that many groups that are strongly anti-Bayesian.

But I think I was hanging out with some of the most anti-Bayesian people.

So how did I end up doing Bayesian?

Well, I think partly it was just who I was around when I joined my PhD, Yee Whye Teh and Tom Rainforth, who both have done a lot of serious Bayesian stuff.

But I think also it was the topic of experimental design and iterated experimental design where I felt like just, yeah, Bayesianism

was just sort of a language that allows you to talk about that quite concretely and quite naturally.

Um, and that was sort of maybe missing from other frameworks.

So I think they actually kind of came onto the scene kind of together a little bit.

Um, where, you know, me getting into Bayesianism was also driven by me wanting to think more about experimental design.

And then the experimental design specifically.

So I think it was Tom Rainforth, my co-supervisor at Oxford, who first actually brought this topic to my attention.

Um, and then I went to do an internship with Uber AI Labs, a now sort of defunct part of Uber.

And there I worked with Noah Goodman and he was also really interested.

So I kind of had these two like people who were pushing me in that direction.

Um, and it did seem like a really interesting, underexplored area, you know, um, everyone's doing:

you have your data set, download MNIST or download some tabular data set and try and get a better number on that, get a better sort of marginal likelihood.

And this was a completely different setup and a different problem that felt underexplored.

Hmm.

Okay.

Yeah.

So it was really, it was really motivated by something that you saw as a gap in the literature at that time, if I understood correctly.

Yeah, definitely.

And also, you know, Noah had a lot of practical stuff with sort of cog-sci experiments and psychology experiments.

So it's sort of like, okay, there's a, there's a reason to do this.

There's a gap in the literature and it's a very fun problem that I think gets to the heart of why we do Bayesian.

I really think it does get to the heart of why we do Bayesian stuff in the first place.

Hmm.

Okay.

Yeah.

So actually let's, let's dig a bit into that.

You know, when I say Bayesian experimental design, what does that mean and what problem are you actually solving?

Yeah, interesting.

So why don't, yeah, let's start from the sort of problem on the practical side.

By the way, we have written a couple of papers trying to answer this question.

So it's not, it's not that clear where the boundaries lie.

And I think some people might, might be doing things that we are actually classifying as Bayesian experimental design.

But anyway, that's a bit by the by.

So experimental design, right?

You, you don't have your data set yet, or you maybe only have a partial data set and you're going to collect more data and you have control over some of the parameters of that

collection.

Right.

So for example, you want to go and collect data from humans.

Well, how are you going to interact with them?

If it's a questionnaire, what are the questions?

Or you're going to go into the chemistry lab.

Well, what do you do?

What are the reagents and what are the conditions?

Right.

So.

I guess it's sort of upstream from data collection.

Whereas typically like I was saying, you have a fixed data set and you just start doing analysis.

So it's a slight change in mindset there.

And then once you have this, then obviously anyone would ask, well, how do I choose the settings?

Right.

Like I can talk to a hundred people.

Well, what should I do?

Or, you know, I have this budget or whatever.

I have this much time.

Well, what settings should I choose?

So that's just experimental design in general.

there's like many, many ways you could try and tackle that.

In Bayesian experimental design, I think we take the view that your model, right, your Bayesian model already has uncertainty within it.

That's one of the reasons why people really like Bayesian, right?

It comes with this uncertainty measure, which you wouldn't necessarily get from other paradigms.

And then you could ask the question, well, what new data would reduce uncertainty?

And that I think is really the Zen of Bayesian experimental design.

You're asking, well, what new data could I add that would then reduce uncertainty?

And then you could flesh that out and say, uncertainty in exactly what?

Um, and, you know, there are different answers depending on what your aim is.

Hmm.

Yeah.

Why, I mean, why should a practitioner even care about Bayesian experimental design?

You know, why not just, well, collect more data?

You know, why would that even be interesting?

Well, how, how are you going to collect the data?

Right.

That's the whole question.

Like, yes, you could, you could go out.

I mean, I think it just depends on what your topic is.

Right.

If you're doing LLMs, yeah, you can just go and crawl the internet and that is data.

And maybe there are some parameters about how you would do the crawl, but the data just kind of exists.

I mean, if you're writing a questionnaire, well, what are the questions?

Like, you can't avoid the topic of experimental design.

If you're in that setting, right.

You have to fill in something.

Well, then you can ask the question, well, why not just put random stuff, or just do what feels right?

I mean, often you can, and if you're really going in blind, you have no model, you have no expectations, then maybe that's legitimate.

Um, if you have some prior, or if you have a model, or if you have some concept of how you're going to train the model afterwards, right?

I think this is important.

If you already have an idea of what you're going to do after you've collected the data, you can sort of forecast that ahead, right?

You can say, okay, well, when I've got the data, I would do this.

So let's analyze that actually before doing the kind of committal step, right?

The committal step is to, you know, do the survey, pay people, or spend volunteers' time, or, you know, build the object and, I don't know, blow it up, whatever your experiment is.

That's the committal step.

So why not do some analysis of then, well, now let's imagine this was the dataset.

Well, what would it look like at the end?

um So I think that's the general pitch.

And then whether you would choose a specific algorithm would come a little bit more down to your exact model, your exact assumptions.

Hmm.

Okay.

Can you maybe clarify a bit for listeners and give them a lay of the land on when you think these kinds of methods and models are most useful?

I think, I think they're most useful.

First of all, are you in a position to actively collect data?

Right.

So if, for example, you're running your website and you're just tracking what people do, but you don't actively engage with them in a way to change the data, then

it probably doesn't make sense, right?

It's like you are just observing the world as it goes past and you're not going to do anything to it.

So that's probably a case where you wouldn't really think much about experimental design, although you could think about it.

So we'd think more about the case where you do have control parameters, right?

So like I said,

You're writing a questionnaire, you are choosing reagents to put into your reaction.

You are designing an object that you're going to explode in some exciting way.

So are you in that case?

Right.

And then if you want to now use sort of specific methods that I've worked on, you would think about, do I have a model of this process?

And can I capture some of my uncertainty in a Bayesian model in the sense that

If you were to look at the prior predictive distribution of your Bayesian model, that would give you a good idea.

Okay.

You'd say, okay, yeah, this is what typical data sets might look like.

If you're in that setting, then you know, you've, you've hit the money.

Like that is the time to kind of switch on your Bayesian experimental design.

um Another case could be that you've already done some data collection and you want to, you want to scale up.

Right.

Cause then you're not just going in with pure assumptions.

Right.

You can validate a lot of assumptions on your small data set.

You can use that to build yourself a Bayesian model that, you know, again, you think sort of captures a lot of your uncertainties about the data generating process.

And then you want to do that scale up.

So you want to spend more on data collection.

Well, that would be the time to do it.

Okay.

Okay.

Yeah.

Yeah, thanks.

I think this is very practical, the kind of thing the audience appreciates.

Um, actually, you know, staying on the practical side: do you see these kinds of methods, so a Bayesian approach to experimental design, actually changing how scientists work in practice, or is it still mostly a research endeavor?

Let's say, because I think you're in a very interesting position because you've been in there for years.

You're at the forefront of the research, and I'm guessing that you have an interesting view of how things are going.

Yeah.

I think it's really dependent on the area and what exact experiments people do and how they've typically analyzed their data.

So, you know, I've been fortunate to just sort of get the odd email from a practitioner, and they've been from really diverse areas, you know, from manufacturing to some sort of

neutron physics and

a whole range of really interesting applications.

I think in most of these cases, it was about um research and is this a new way that we could do things?

I know that there are some areas that are a bit more advanced and I think if they're quite advanced, they've kind of redeveloped Bayesian experimental design a little bit because it

was such a natural fit to what they had to do.

I think the domain specialists have kind of rediscovered it in some way.

You often find this.

I think Bayesian experimental design is one of the most rediscovered ideas.

Um, cause it sort of keeps coming back with a slightly different name and things like that.

I know for example, like in sensor placement, it's quite advanced.

Um, for example, cause I think that's just a place where the practicalities really can use Bayesian experimental design and it's quite high value.

Um, and yeah, I know, you know,

With people like Noah doing the cog-sci research, I think it's relatively studied there, for example.

So I think it very much depends on the domain and how the data is analyzed and what tools people are familiar with and whether they think they could get a big gain from it.

And it would be very valuable.

Yeah.

Do you think there is one domain, like is there one domain in your mind that is

still not applying your research and that you think would get a big gain from it, you know, like it would be a low-hanging fruit?

And when I say your research, I mean not only yours personally, but your field's research.

Yeah, yeah.

The subfield.

Um, nothing really springs to mind,

because I think you do need to be a little bit expert.

I think that would be slightly arrogant of me to say, oh, look at these silly people.

Why are they not using this?

No, for sure.

There are reasons why people do things the way they do them.

I know, for example, you know, like with A/B testing, right.

It's so studied.

And you could say, well, do they use Bayesian experimental design?

And if not, why not?

And I think they do use it.

Some people do use it, but one of the reasons people don't is that, um, they just have other criteria by which they judge things, and they have kind of these

unknown unknowns that they are a bit worried about.

And so they just have slightly different criteria, um, with what they do.

And they're not, I think they're not comfortable basically relying on prior knowledge from previous experiments.

Right.

That would be the way that you would apply it in A/B testing.

You'd say, well, actually a lot of the variations we're experimenting with, we already have data on, so our prior is strong.

So we should, we should invest less time in those.

I think some people adopt that mentality, but I think other people are like, Ooh, but there could have been an unknown change that we don't have eyes on.

So we actually just need to just run the baseline again with exactly the same sample size.

Yeah, I would very much hesitate to come out and say, this is the specific application that should use it.

I think it's an area that a lot of fields could investigate and I think really interesting research could be done.

I think it also revolves around models and how advanced are the models?

How confident are people with the level of uncertainty?

Do they think that's accurate or is it just, you know, something that's unreliable itself?

Yeah.

Yeah.

Um, and something I'm always curious about when we talk about methods on the show is when they're useful, which I think you covered pretty well already, but also when they break down. You know, when would you say Bayesian experimental design breaks down or underperforms simpler heuristics?

When would you tell someone, hey, you know, like, don't bother with that, just keep doing the classic thing?

Yeah.

I think there's many cases where I would say that.

So one would be if just picking random data is really easy for you: just do that.

Right.

Like I think there's no point doing a lot of computational work to design your experiment.

If the data is so cheap, you can just get 10 times more of it.

um So I think if you're...

And that you think it's valuable, right?

Like, I think there are settings where you can get 10 times more data, but it's just all telling you nothing.

And then you obviously would want to invest in some experimental design.

Um, but there are cases where you can get quite reasonable data at a very low cost, and you can sort of apply simple heuristics about, you know, what your train and test sets look like.

And you can sort of correlate your data with that.

And then you're like, okay, well, this data should help answer my question.

Um, so I think it would typically be a case where your experiments are quite expensive, to sort of justify wanting to design them properly.

And I think the other thing is about models, right?

Um, you need to have some kind of model that you're somewhat committed to.

It doesn't mean it has to be a hundred percent totally right, but it has to represent a pretty good guess at how things work in the system.

I mean, I think this is a topic of much debate in the field.

Can you take a black box model?

you know, random forest, Bayesian neural network.

Um, and can you run experimental design with that, and how much initial training data would you need?

Right.

Obviously I think your Bayesian neural network untrained prior is not going to tell you a great deal.

but you might be able to bootstrap it with a small amount of data or something.

I think that can work definitely.

I mean, we've done research on that topic and we've shown that it definitely can help.

Um, but yeah, I think if you're just really not wanting to commit to any model, then that would also be a sign maybe that things aren't working.

Or the other thing would be that your model is not Bayesian or it doesn't express any uncertainties that you would want, right?

That's quite common.

You know, people have a model.

They know it's wrong, but it just makes these point estimates and that they have some kind of bias that's not quantifiable.

And there's no parameter of the model that you could vary that would sort of change those predictions.

So I think that would be another setting where it maybe wouldn't make sense.

Okay.

Okay.

Very valuable.

Thanks for entertaining me here.

And actually, I remembered... I already had an episode about Bayesian experimental design with Desi Ivanova.

That was episode 117.

So folks, if you want more background about this: I remember we dug quite a lot into the intuitions and the concepts behind Bayesian experimental design.

Do look into that, because, well, what we did with Adam here was more of a refresher, and we're also going to talk about other topics.

So definitely check out the show notes for that.

And Adam, actually, something I've seen a lot in your work is something that's called the expected information gain.

Yeah.

So can you tell us what it is measuring intuitively and why maximizing it is the right objective?

First of all, everyone should go and listen to that episode with Desi.

I've worked with her pretty closely back at the beginning of her PhD.

I'm sure she's got some probably quite complementary things to say, given her slightly different background.

I would really recommend people to check that out.

Okay, so, on to expected information gain, or EIG, as we acronymize it.

So as I was telling you,

Um, Alex, what we do is we have parameters of our experiment and we want to sort of decide them, right? The questions of your questionnaire or something.

Um, and there's a sense that you could choose them optimally.

So what would that mean?

Well, optimization in this case means that you can associate a score to every one of your designs.

And then you choose the design with the highest score.

And that would be your optimal design.

Okay.

And then the score is this thing, expected information gain.

So it's the amount of information you expect to gain about your model parameters by performing an experiment with a given design.

Now, so why don't we unpack that a little bit?

Cause I think that's, you know, that's sort of maths language just put out into an English sentence.

I'm just trying to think of an example that might kind of, um, bring this home. Well, if you remember, we said you have uncertainty, right?

Your Bayesian model sort of reflects the uncertainty that you have.

If you choose a design of your experiment, you can also forecast what the datasets would look like through your prior predictive distribution.

Right?

So now you can take a dataset, a prior predictive sample, which is a simulated dataset.

Then you can do your

Bayesian inference, and then you can look at how the uncertainty changed.

And then you can repeat that for various different synthetic data sets.

And then you take the average and that's your expected information gain.

That's one way to arrive at it.
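The loop Adam just described, simulate a dataset from the prior predictive, run inference, measure the drop in uncertainty, and average, can be written out directly in a toy model. The coin-bias setup below is invented for illustration; inference happens to be exact here, which is precisely what stops being true in the general continuous case:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)

# Toy model, invented for illustration: a coin's bias is one of three
# values with a uniform prior, and the "design" is how many flips to buy.
thetas = np.array([0.2, 0.5, 0.8])
prior = np.full(3, 1 / 3)

def entropy(p):
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

def eig_by_simulation(n_flips, n_sims=2000):
    """Average drop in posterior entropy over prior-predictive datasets."""
    gains = []
    for _ in range(n_sims):
        theta = rng.choice(thetas, p=prior)        # sample a plausible "truth"
        y = int(rng.binomial(n_flips, theta))      # simulate one dataset
        lik = comb(n_flips, y) * thetas**y * (1 - thetas)**(n_flips - y)
        post = prior * lik / np.sum(prior * lik)   # exact inference (easy here)
        gains.append(entropy(prior) - entropy(post))
    return float(np.mean(gains))

eig_1 = eig_by_simulation(1)   # one flip buys a little information
eig_4 = eig_by_simulation(4)   # four flips buy strictly more, on average
```

Swap the three-point prior for a continuous one and the inner `post` line becomes a full Bayesian inference problem, repeated once per simulated dataset: the double intractability mentioned earlier.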

I think what's really cool about it is that, you know, if you actually work through the maths, you can arrive at it from quite different starting points.

So

Here's a second starting point, which sounds completely different to the first one, but actually arrives at the same quantity.

So I don't know if you're familiar with this.

It's in Brooklyn Nine-Nine.

It's this question where you have, I think it's like, 12 prisoners on an island.

One of them is slightly heavier or lighter.

You know, how would you find out which one it is?

um Okay.

I don't know if you know how to solve that problem, but basically, the kind of way to think about solving that is...

And that's an experimental design problem, right?

Because you have settings, which of these people you want to weigh against, which other ones, and you have outcomes, which is, well, did the scales balance or not balance, right?

Have I told you enough about the setting, or do you want me to just clarify that little problem?

Yeah, maybe clarify it, I think, for listeners.

Yeah.

Okay.

I think this is quite a fun puzzle; people might enjoy it.

So.

I think it's 12. Yeah, I'm pretty sure you have 12 people on an island, and you're told that one of them is slightly heavier or lighter than all the rest.

Okay.

And you have access to like a ginormous balance scale, right?

One of those old fashioned scales where you weigh two things against the other and it's either level or it's tipping one way or it's tipping the other way.

Right.

My mum still has one, a nice old one, with a stack of weights that you compare to your flour or something.

Okay.

So now imagine you have a giant one that can weigh people against each other.

And then the question is, well, how would you most efficiently find this odd person, right?

Of the 12, you've got an odd one out.

How would you find them most efficiently with the fewest weighings?

That's the puzzle.

I won't tell you the answer.

I think it might be fun for people to think about it a little bit.

It comes up in the show, and I don't think they solve it in the show.

At least if I remember rightly.

But the best way to think about it is there are three outcomes, right?

The scale can either tip left, tip right, or it can balance.

And ideally what you would want is for all three of those to be equally represented.

Now, why would you want that?

It's because you're dividing your search space down as quickly as possible.

Right.

If all three of them are equally likely under the different hypotheses that you might have. For example, suppose it was a third, a third, a third based on your hypotheses; then, if you see it tip left, the search space has divided by three.

So you've made really good progress on identifying, you know, the true hypothesis.

If the probabilities were not a third, a third, a third, you might have only reduced the search space by quite a small amount.

So you can mathematize this and talk about the entropy of the outcome, which is maximised if you make all the outcomes equally likely.

And it turns out that for this experiment, the right approach is to maximise the outcome entropy.

Um, so I think hopefully with that hint, people could now figure out this EIG.
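The hint can be checked in a few lines. Assuming 12 people and the 24 equally likely hypotheses (each person either heavier or lighter), a weighing of k people against k has a simple outcome distribution over the hypotheses, and we can score each k by the outcome entropy:

```python
import numpy as np
from math import log

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def weighing_entropy(k, n=12):
    """Outcome entropy of weighing k people against k other people,
    under 2*n equally likely hypotheses (each person heavier or lighter).
    'Tips left' is predicted by the k heavy-on-left plus the k
    light-on-right hypotheses; symmetrically for 'tips right';
    every remaining hypothesis predicts balance."""
    tips_left = 2 * k
    tips_right = 2 * k
    balances = 2 * n - 4 * k
    return entropy([tips_left, tips_right, balances])

scores = {k: weighing_entropy(k) for k in range(1, 7)}
best = max(scores, key=scores.get)
print(best)  # → 4: weighing 4 vs 4 splits the 24 hypotheses into equal thirds
```

So the first weighing in the classic solution is 4 against 4, exactly the design that makes all three outcomes equally likely.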

That's basically it.

So I recommend you tell us the solution, but at the end of the show.

Okay.

Well, I actually don't know it.

I only know how to derive it.

I will have to email you afterwards with an actual solution.

perfect.

Yeah.

All I know is what I've told you, which is that you need to divide the search space in three every time.

Yeah.

But then you need to come up with a decision tree that actually does that, which I can't do off the top of my head.

So go ahead, sorry, I was lost taking in the maths enigma.

So, I think to get people back in the flow of where we were, right?

We're talking about EIG.

And I've given you kind of two arguments about how to design an experiment.

One was about reducing uncertainty in the model parameters.

And then this one is about maximizing the uncertainty of the outcome.

And it turns out there's kind of one missing piece, which is, you know, in this sort of balance scale problem, there's no notion of noise, right?

So you want a really uncertain outcome because that would divide your search space, but you can imagine that actually some experiments have noise, and you want to avoid experiments that are very noisy, right?

Because they could be very uncertain just because they're noise, right?

If on the island you also had a six-sided die, you'd be like, wow, that has log six of information because it has six outcomes, whereas my mass balance only has three.

Yeah, but it's not correlated with your problem.

It's not correlated with the problem of the sailors who you're weighing.

So you need to correct for this noise problem.

And once you correct for this, you have two formulas.

One is about the reduction in uncertainty about your model parameters.

And one is about how uncertain your outcome is, accounting for noise, right?

And it turns out that these are equal, right?

If you actually work through the maths, they are the same thing, and that is EIG.
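Written out (in standard Bayesian experimental design notation, where $d$ is the design, $\theta$ the parameters, $y$ the outcome, and $H$ denotes entropy), the two starting points Adam describes are:

```latex
\mathrm{EIG}(d)
  = \mathbb{E}_{p(y \mid d)}\!\left[ H[p(\theta)] - H[p(\theta \mid y, d)] \right]
  = H[p(y \mid d)] - \mathbb{E}_{p(\theta)}\!\left[ H[p(y \mid \theta, d)] \right]
```

The first form is the expected reduction in posterior entropy; the second is the marginal outcome entropy minus the expected noise entropy. Both equal the mutual information between $\theta$ and $y$ given $d$, which is why the two derivations meet.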

And that's why I think EIG is really central: you can re-derive it in many ways.

It keeps coming back.

As I said before, it's one of those things that's been re-derived at least 10 times um by different people.

So I think the fact that people keep coming back to it as the objective criterion for choosing experiments tells us that it's quite important.

And then just to kind of link that back then to the big topic.

So I said, you know, it's all about choosing inputs for experiments.

So the idea now is you have a scalar value associated with every candidate design, which is this EIG, and then you just choose the candidate design with the highest EIG score.

It's very simple.

Yeah, I mean, it's very simple once you explain it.

So then, how do you compute the EIG score?

Uh, because I've, I've understood from what I've read to prepare for the show that it can be hard.

So what are ways you can do that?

And when it's too hard analytically, are there ways to get around it?

Yeah.

So this is actually now touching on my own sort of real research, actual papers written by me.

But let me just back up to say it may not be hard, right?

So with this prisoner island question, there's a finite number of outcomes.

There's a finite number of hypotheses.

It's just a very small, like you could just do it with a small NumPy array, right?

So it's not hard.

It's actually very easy to do on your laptop to run all the calculations.

So it really depends on your model: the size of your search space, the size of your hypothesis space, the size of your outcome space, and the size of your design space.

Right.

And the type of outcome, right?

Is it continuous?

Is it a classification task?

For example, like we had a three-way classification, right?

Left, right or balance.

Or in your questionnaire, it could be a yes/no tick box, or the person could be writing free text.

Obviously they require completely different models to analyze and to sort of predict.

I mean, if your model is meant to be a generative model of text, that's quite a bit more complicated than a checkbox that could be on or off.

Now, one case that I looked at was about continuous outcomes, right?

So suppose it's kind of a regression model, multi-dimensional regression or something like that.

And in this case, it does get a little bit hairy due to what we called (I mean, Tom coined this term) double intractability. If you think that Bayesian inference is already kind of intractable, right?

Because it involves this denominator that you have to integrate over all of your prior space: the marginal likelihood.

If you think that's already intractable, then in experimental design, if you recall, I was talking about synthesizing lots of datasets and repeating inference, right, to compute the score.

That's kind of the heart of your double intractability.

So you would synthesize datasets, do Bayesian inference, and then do that in a loop.

So you're kind of doing this very hard intractable thing many times over.
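The double intractability is easy to see in a nested Monte Carlo estimator. The model below is a hypothetical toy (a conjugate Gaussian, chosen so the true EIG has a closed form to compare against); the point is the inner loop, a marginal-likelihood estimate repeated for every simulated dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
SIGMA = 1.0  # observation noise (an assumption of this toy model)

def log_lik(y, theta, d):
    """Log density of y ~ N(d * theta, SIGMA^2)."""
    return -0.5 * np.log(2 * np.pi * SIGMA**2) - (y - d * theta)**2 / (2 * SIGMA**2)

def nested_mc_eig(d, n_outer=2000, n_inner=2000):
    """Nested Monte Carlo EIG for theta ~ N(0, 1), y ~ N(d*theta, SIGMA^2).
    The inner average estimates the marginal likelihood p(y|d) separately
    for every simulated dataset: inference inside a loop, which is
    exactly where the double intractability bites."""
    theta_out = rng.standard_normal(n_outer)
    y = d * theta_out + SIGMA * rng.standard_normal(n_outer)   # outer: simulate datasets
    theta_in = rng.standard_normal(n_inner)                    # inner: fresh prior samples
    ll = log_lik(y, theta_out, d)                              # log p(y_n | theta_n, d)
    inner = log_lik(y[:, None], theta_in[None, :], d)          # (n_outer, n_inner) grid
    log_marg = np.logaddexp.reduce(inner, axis=1) - np.log(n_inner)
    return float(np.mean(ll - log_marg))

d = 2.0
analytic = 0.5 * np.log(1 + d**2 / SIGMA**2)  # closed form for this conjugate model
print(nested_mc_eig(d), analytic)
```

With 2,000 inner samples per outer sample, that is four million likelihood evaluations for a single design's score, in a model where exact inference is actually available; swap in a model that needs MCMC per dataset and the cost explodes.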

Now, like I said, with the prisoners on the island or maybe with a classification model, there are ways around that.

And it might be that it's just not really a big deal.

But in the general case with these continuous outcomes, you have to do something about it.

And that's what we looked at in this paper; I think it's called Variational Bayesian Optimal Experimental Design.

And we looked at how you could use variational inference.

Right.

I mean, variational inference, or amortized variational inference, is a way to get to the posterior just as a neural network function of the data.

Right.

That's the purpose of amortized inference: you feed in your data, and it will immediately tell you, approximately, how the posterior should behave.

And that is obviously going to be really useful for this concept of synthesizing many data sets and doing inference, right?

You say, well, I'll amortize that, so I won't have to keep solving ginormous integrals.

I'll just take forward passes through my network.

I mean, that is quite a dense paper.

I think if someone enjoys a lot of maths, that would be a good one to read. That's just one aspect of what we studied.

And we looked at other kinds of estimators.

Because the EIG can be rewritten, right?

As I said, we had these two different viewpoints on it.

Because it can be rewritten in so many ways, you can estimate it in different ways.

And it sort of depends on what the dimensions of different things are in your model.

But yeah, that was kind of the thrust of this paper on variational Bayesian optimal experimental design.

Yeah.

Yeah.

And I think these are really interesting and brilliant papers, so please do add them to the show notes, because here we can only do them partial justice.

First, maths in audio form is always harder, and second, these are long, so for people who want to dig deeper, definitely do check them out.

Adam will put the links in the show notes, you'll have them on the website, and if these topics are something you're working on, you'll definitely want to check them out.

And actually, you've also written about the connection between Bayesian active learning by disagreement, which is a new concept we haven't talked about yet on the show, and Bayesian experimental design.

So there is a lot to unpack here.

So first, can you define what active learning is for us?

Wow.

Now that's a really hard question.

I don't think there's a universally accepted definition of active learning.

I think it's more a vibe, right?

The vibe is, it's quite similar to what I've been talking about, but usually people imagine the model now to be a neural network.

And people imagine that it's about getting improved test set performance or validation set performance, right?

So again, you can actively choose what kind of data points you want.

And it might be that you have a pool, right?

You have a pool of candidates, but they don't have labels and you want to add labels.

That's kind of a typical setup.

Now I would sort of controversially state that this is a special case of experimental design, right?

You have certain inputs, which is your set of candidates to be labeled.

And you have experimental outcomes, which are the labels that you're going to reveal.

And then you might have a Bayesian model.

So for example, with Freddie Bickford-Smith, we worked on a lot of stuff about active learning, and there we kind of boiled it down to: it's about doing experimental design with a view to improving your performance on a certain test set or, you know, target set, for example.

So you sort of know what predictions you want the model to be good at and you try and then find data that would help the model to achieve that goal.

People use active learning in many different ways.

I think some people use it to mean experimental design in general.

So what I've given you is very much my biased definition of active learning.

And I'm sure if there are any professors of active learning listening, they're going to be shaking their fists at the speaker. But that's my definition.

You started a feud. You just started a big feud.

This is not the beginning of the feud.

This has been going on for a while.

Yeah, we just reignited it, put oil on the fire.

And so, yeah, that's my, that's my definition of active learning.

And then BALD, right, we were talking about Bayesian active learning by disagreement.

Well, what is that?

So this comes from a 2011 paper by Neil Houlsby.

I believe that's the first use of the term.

It's basically EIG again, right?

So I said it keeps being reinvented; it's just the EIG.

It's just a different way of writing it as an objective function.

I think this paper was one of the first to talk about this rearrangement that I was talking about, where I said, well, you have reducing uncertainty on your model parameters, and then sort of increasing uncertainty on the outcomes whilst accounting for noise.

And BALD is one of the first papers that really focused on that rewriting and the fact that those two are equivalent.

And then they showed that. I think the intuitive one is the first one, right?

I have model uncertainty and want to reduce it.

Makes sense, right?

That's probably the most intuitive, but then maybe the more computationally useful one is the second one, which says, well, I want to have an uncertain outcome, subject to not being too noisy.

But you can prove that they're the same, and then you can calculate the second one if you think it's easier to calculate, and then you can score designs using that.

So really, I would say it is the EIG, a different way to write the EIG, and it can be really good.

It can be much more efficient to do it that way for a classifier model, for example, right? A neural network classifier.
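In code, the classifier form of BALD is just a few lines. The "dropout samples" below are hand-made stand-ins for real network outputs; the score is the outcome entropy of the averaged prediction minus the average per-hypothesis entropy, i.e. uncertainty corrected for noise:

```python
import numpy as np

def entropy(p, axis=-1):
    q = np.clip(p, 1e-12, 1.0)
    return -np.sum(q * np.log(q), axis=axis)

def bald_score(probs):
    """BALD acquisition for one input. probs has shape
    (n_mc_samples, n_classes): e.g. softmax outputs from several
    dropout samples of the same network on the same input.
    Score = H[mean prediction] - mean[H[each prediction]]."""
    mean_pred = probs.mean(axis=0)
    return float(entropy(mean_pred) - entropy(probs, axis=-1).mean())

# Two hypothetical inputs, three "dropout samples" each:
confident_disagreement = np.array([[0.99, 0.01], [0.01, 0.99], [0.99, 0.01]])
agreed_noise = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])

print(bald_score(confident_disagreement))  # high: the hypotheses disagree
print(bald_score(agreed_noise))            # zero: the uncertainty is pure noise
```

The second input shows why raw predictive entropy is not enough: every hypothesis is maximally uncertain there, but they all agree, so labelling it teaches you nothing about the parameters.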

So yeah, that's what I would say BALD is, and just to say,

You know, I wasn't writing papers in 2011, so this whole concept is not due to me.

It's due to either Neil Houlsby or somebody before him, basically.

Yes.

And BALD, to make sure people understood correctly, that's Bayesian active learning by disagreement.

So that's the active learning part.

Now some people are fighting right now as they listen to you, Adam.

I'm sure you already have a lot of angry mail, but let's...

Let's continue.

Then what is the disagreement part about, and how does that connect back to Bayesian experimental design?

Yeah.

I think that's a great question.

I really like the term, actually, because I think it's kind of a nice way to think about what the EIG is.

So the disagreement is between candidate hypotheses of your model.

Right.

So it could be samples from your prior, or, if you have some old data, samples from the posterior conditional on your old data.

So actually let's think, let's think about that island prisoner or sailor example again.

There's 24 hypotheses, right?

There's 12 sailors and either one is heavier or one is lighter.

So there's 24 candidate hypotheses in this model space.

And then disagreement would mean the predictions of the experimental outcome made by different hypotheses are different.

Right.

So for example, we talked about, why don't you just roll a die on the beach?

Well, none of the hypotheses will make a different prediction about that outcome.

But now if I weigh two sailors against each other, hypothesis 15, say, might claim that this one is the heavy one, and that would make a different prediction to a hypothesis that says neither of them is, in which case it would balance.

Right.

So that's where the disagreement comes from.

And I think it sort of connects to thinking of your model as a sort of mixture of experts.

I think that's a nice way to think about it.

Right.

So if you had a finite number of hypotheses, so your posterior is just a weighting over a finite number of hypotheses, then active learning by disagreement would basically boil down to: what are the experimental designs where the predictions of different experts disagree?

And again, you can sort of write out the maths and then it all comes back to this sort of magical EIG quantity that's I think really at the center of a lot of this.

Okay, yeah.

All right.

This is actually a good name, which is rare, because we have a lot of bad names in statistics. But that one, I would have to say, actually helps you understand what this is all about.

Right.

So can you give us the elevator pitch again, because this was a long and technical answer, to make sure that listeners understand how BALD connects to BED?

Yeah, it's the same thing.

It's my statement. It's just that BED, I think this is why people find it a bit hard to get into: it's extremely abstract.

Yeah.

Right.

So it's sort of like a chameleon, in that it can appear as a completely different algorithm in a different model class, in a different setting.

BALD, I think as it's used now, which by the way is not, I would say, the full extent of what was introduced in the original paper, but a sub-case of BALD, would be: you have a classifier model.

You sample the weights of your neural net with dropout.

That gives you a variety of predictions and you find out where those predictions maximally disagree.

Right.

I would say that that is an approach to doing BED with a specific model, with a specific way to generate hypotheses, a sort of approximate Bayesian posterior, and with a certain approximation to the EIG.

Because to get the true EIG, you would need to sample the entire hypothesis space, which would probably be infinite.

So yeah, I think that's the link.

BED is the chameleon, the very abstract concept, and BALD is something very concrete that you could actually use, because it's a specific choice.

There are so many free choices in the BED framework, and BALD is more like: here's a list of specific choices that could work in certain types of problem, such as classification.

Yeah, yeah, I completely agree with that.

I would say, yeah, probably one of the problems of BED is that it's very large, it's very broad.

Whereas with BALD, it's a bit clearer what it is.

And it's also probably less intimidating to people new to it, just because it's so much more specific.

Right.

Like it's a specific type of model and a specific type of problem.

And BED sort of isn't, but because it's so general, it's hard to work out, well, what is BED for my case?

I don't know.

Right.

You have to do a lot of thinking to answer that question.

Yeah.

No, exactly.

Exactly.

So again, though, as I asked you for BED, when do you think BALD is appropriate?

And when is it not?

And when it's not appropriate, what does it get wrong?

And what's the key insight behind the fix?

Yeah.

Well, there's kind of two answers to this, right?

So one is about supposing that you want to reduce uncertainty in your model parameters: is it the right thing to do?

To which I would say probably, right.

Like I said, it's more specific.

Do you check all the boxes that BALD requires you to check? Is your model a classification model?

And do you have a way to sort of generate hypotheses?

So some kind of approximate posterior.

So, you know, are you happy with that?

Do you think that the prior predictive of that is going to be reasonable? Throwing dropout on your neural network is a little risky, let's say; you have to decide whether you are okay with that.

So I think if you tick the boxes for using BALD, yeah, go for it.

Use BALD.

And I think most of what I've written in my own research is not like an alternative.

It's more like understanding: what is that algorithm really doing, on a sort of more math-sy level?

Right.

So I wouldn't say you have to choose between BED and BALD.

That's actually completely wrong.

Now.

There's another paper written with Freddie, right? Freddie Bickford-Smith, who I worked with quite a bit.

Where we said, well, actually, is reducing uncertainty in the model parameters what you want to do?

Probably not. Well, maybe not.

And it could be that what you're interested in is specific target predictions, right?

So actually it's not the parameters of the model that are important; it's the predictions that you want to make.

And you might have a good idea of the kind of inputs where you want to make good quality predictions.

And in this case, there's something that we came up with called EPIG, E-P-I-G; you should be able to find that.

It just slightly changes the objective, in such a way that you are now focused on making predictions on the specific inputs that are of interest to you.
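A minimal sketch of the EPIG idea for a finite hypothesis space (an illustration of the objective, not the estimator from the paper): score a candidate input by the mutual information between its outcome and the prediction at a target input you care about.

```python
import numpy as np

def mutual_info(joint):
    """Mutual information of a discrete joint distribution (2-D array)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask])))

def epig(pred_x, pred_star, prior):
    """EPIG-style score for finite hypotheses.
    pred_x, pred_star: (n_hyps, n_classes) predictive probabilities at
    the candidate input and at the target input respectively."""
    # joint over (y, y*), marginalising the hypotheses under the prior
    joint = np.einsum('h,hy,hz->yz', prior, pred_x, pred_star)
    return mutual_info(joint)

prior = np.array([0.5, 0.5])  # two hypotheses, equally weighted
target = np.array([[0.9, 0.1], [0.1, 0.9]])   # target-input predictions
cand_A = np.array([[0.9, 0.1], [0.1, 0.9]])   # label tracks the target prediction
cand_B = np.array([[0.5, 0.5], [0.5, 0.5]])   # label is pure noise everywhere

print(epig(cand_A, target, prior), epig(cand_B, target, prior))
```

Candidate A gets a positive score because learning its label tells you which hypothesis governs the target prediction; candidate B scores exactly zero, even though its outcome is maximally uncertain, because that uncertainty carries nothing about the predictions you care about.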

Okay.

Yeah.

There are so many things. Please make sure to add these links to the show notes, because I think this is going to be extremely helpful.

I'd already added a few of them, but you are going to be much better than me at that, of course.

And so, to add a bit more to the complexity, I've also read on your blog about deep adaptive design.

So what is that?

What is the difference from what we just talked about, so BALD?

And again, when would that be useful?

Yeah, again, I think these are all just different forms of the same thing.

It's like making something more specific.

So this is still BED, it's still Bayesian experimental design.

It's just a different sort of way to do it and with different kinds of needs and assumptions.

So again, it really comes back to like, what is important in any specific setting?

And we kind of make the case in this deep adaptive design paper that sometimes you're in a case, for example a human-computer interaction setting, where you cannot go and do a very large calculation to determine what's the optimal design to ask somebody, right?

So imagine it's like an interview, but imagine like one of us is an AI basically.

And you want to extract information. Okay, I'm sorry to put this on you, but imagine you're the AI, right?

You're the interviewer.

I'm the interviewee.

You want to extract information from me and you want to do that as efficiently as possible.

Well, use BED, right?

We've just been talking about that.

How would you extract information from a human?

You use BED.

The problem is, right, I give you some information.

You need to update your internal model, meaning you need to do Bayesian inference.

And then you need to go and do this EIG calculation, find the best candidate design, blah, blah.

So it's not going to be a very natural conversation, because your kind of AI machine would have to go and do this huge calculation.

Now, obviously you could just use a fallback, right?

You could use a much faster, cheaper heuristic algorithm.

And that's what most people do.

Right.

So for example, in the age of LLMs, obviously you could just get the LLM to do in-context inference, right?

Or in sort of more old school stuff, people defined sort of heuristics for certain types of experiment, right?

We looked at one that's basically behavioral economics, right?

It's kind of like barter, or haggling.

You're offering people, would you take this financial trade off?

Would you take this financial trade off?

You know, I could give you a hundred pounds in a year, or I could give you 75 today; which would you prefer?

Right.

And you're trying to build a model of their decision-making process.

In that case, you want your AI or your experimental design to respond very fast.

And in that case, what you want to do is you want to kind of absorb the new data and generate the optimal design immediately afterwards.

So you need to kind of cut out this intermediate computation.

So then the idea of deep adaptive design is that you could pre-train all of that.

So you could have a neural network that is sort of pre-trained to take in data.

And it's kind of like amortized inference, or like the step above amortized inference, right? Because with amortized inference, you're just doing inference.

Yeah, it's like the inputs and outputs are different, right?

So with amortized inference, you just get the posterior back.

But with this, you just jump straight to the best design, right?

So you actually skip the posterior.

Okay.

And so the posterior, or the right aspects of it, is just kind of learned implicitly by the model, you would think.

Now, obviously there's a bunch of approximations that go into that and it's only as good as your neural network design space.

But that's basically the idea.

So again, it's just BED, but it's BED in a case where you want this fast call-and-response, and you want to have a pre-trained network that can be adaptive.

Right.

Because you could also imagine a case where you cannot, or don't need to, be adaptive; then you wouldn't use it.

Right.

Like, for example, you're doing high-throughput screening in the chemistry lab.

Well, the experiments go in parallel, right?

It's high throughput because it's parallel.

Well, then it's not adaptive because you're not deciding one design on the basis of the first experiment.

You have to do them all simultaneously.

So it just really comes back to what your need is, and how BED actually translates into an algorithm for this case.

So yeah, deep adaptive design is just about this problem of how you adapt very quickly to new data and get very good designs that will keep reducing uncertainty.

And how do you build one kind of end-to-end neural network that can do that?
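The deployment-time interface can be sketched like this. The `policy` function below is a hand-coded stand-in for the pre-trained design network (a real one would be a neural network trained end-to-end); what matters is the shape of the loop: the history goes in, the next design comes out in one cheap call, and no posterior is ever computed at deployment time.

```python
def policy(history):
    """Hypothetical stand-in for a pre-trained design network: maps the
    experiment history straight to the next design in one cheap call,
    with no explicit inference step. Here it is a hand-coded bisection
    rule for a 1-D 'would you accept this offer?' experiment."""
    if not history:
        return 0.5
    # move the next offer toward the participant's indifference point
    lo = max((d for d, y in history if y == 1), default=0.0)
    hi = min((d for d, y in history if y == 0), default=1.0)
    return (lo + hi) / 2

# Simulated participant with a hidden acceptance threshold
threshold = 0.73
history = []
for _ in range(8):
    d = policy(history)        # one fast call, no EIG optimisation in the loop
    y = int(d < threshold)     # participant accepts offers below the threshold
    history.append((d, y))

print(history[-1][0])  # the designs home in on the hidden threshold
```

In deep adaptive design proper, the heavy lifting (simulating outcomes, scoring information gain) all happens at training time, so a loop like this can run at conversational speed.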

Okay.

Yes.

This is, yeah, that's really fascinating.

I didn't know that was possible actually.

So...

I put a link to your blog post about that in the show notes for people who want to give it a look.

I will also link episode 107, which we had with Marvin Schmitt from the BayesFlow team, which explains what amortized Bayesian inference is, folks, for you to understand what it is and when you would use it.

And actually the next episode, just after yours, Adam, will be with Stefan Radev, who is the creator of BayesFlow, the Python package for amortized Bayesian inference.

So yeah, we'll dig a bit more into amortized Bayesian inference, what it is and how to do it these days, because episode 107 was a few months ago and things have changed quite a lot.

So how do you use it now in your workflows with the new version of BayesFlow, 2.0?

But also, Stefan and I worked on an agent skill to help you do that when you're working with AI agents.

It trains the agent to actually do really good amortized Bayesian inference; it's like having Stefan teaching your agent how to do that.

So if you're interested in that, make sure to tune in for the next episode.

All of that will be in the show notes.

In the meantime, Adam, you've also worked, and work often, on protein dynamics, as you were saying at the beginning of the show, and quantum chemistry.

This is super interesting to me.

What is that about?

And how does Bayesian experimental design show up there, or does it even show up? I don't know.

Yeah, that's a great question.

Yeah, these are some things that I've worked on more recently in my job at Microsoft.

I think the connection to Bayesian experimental design is maybe a little tangential.

I mean, it's something I'd like to bring into those topics more.

I think, as I mentioned, you need your model to be Bayesian if you want to use Bayesian experimental design.

And I think, for example, with protein dynamics models, they currently are just sort of feed-forward neural networks, basically.

There is an uncertainty module in some of them, but it's not quite the same as a hypothesis-based model.

So yeah, I think it's TBD on the Bayesian experimental design. I mean, I think what everyone agrees on, for example in the protein structure sampling community, is that more experimental data is good.

Experimental data is very important and helpful.

And then there's a question about, how do we collect it and how do we choose it?

I think, having worked in the field, I now realize that it's quite a complicated question.

I think another reason why BED is maybe only nascent in a lot of fields is that the design space is incredibly complicated.

This is state-of-the-art stuff, if you're talking about cutting-edge biological or biochemical experiments.

And they come with certain cost implications, and there are tricks you might want to play that are not necessarily represented in your Bayesian model.

So.

For protein structure sampling, I would say experimental data is really interesting and incredibly, incredibly important.

TBD on how BED, sorry, that's two acronyms next to each other.

It's very good.

Well done.

Well done.

It's to be decided how Bayesian experimental design will fit into that.

I think that's pretty cutting edge, and involves really quite a lot of complexity, more than me in my PhD days, just doing abstract mathematics, might ever have appreciated.

Now you also asked about quantum chemistry.

So I think that's interesting because the type of quantum chemistry I worked on, it's not supervised learning, actually.

That's really interesting.

It's unsupervised.

So there actually isn't any data that you have to get.

It's just about doing big computations and trying to amortize them.

So it's kind of like the deep adaptive design effort in the deep learning aspects in a way, right?

You're trying to amortize a very expensive computation.

So if people are interested in this, I have a blog post all about Orbformer; it's Orbformer in the title. That's the name of the model.

So yeah, that's about trying to essentially amortize solutions of the Schrödinger equation.

And in that case, it's not based on labels.

Essentially, it's based on just pure computation, and trying to amortize that.

Now, there are other parts of the quantum chemistry stack where you might want to get experimental data, or you might want to use a very expensive quantum chemistry method to label a less expensive one.

And that's another experimental design problem, right?

Well, which ones do I label, and which method do I use?

What fidelity do I use?

I think in certain aspects, this is very studied, not by me, but other people.

For example, if you're, if you're training a machine learning force field.

I'm thinking of papers like NequIP, which is a machine learning force fields paper.

I'm 99% sure that what they did in that paper, and again, this is my memory of the paper, apologies if it's wrong, was run a kind of molecular dynamics simulation with their model, and they had a GP-based model.

So it had its Bayesian uncertainty built in, and they basically ran their molecular dynamics simulation until that uncertainty got too high.

And then they said, okay, now we're going to label more data.

um

Again, apologies if my memory of that paper is faulty.

So that's a form of BED.

It's not globally optimal across the whole design space, right?

You're using your MD simulation to explore a small part of the design space and then you're choosing based on high uncertainty.

But you know, I think the principle applies.

It's the same principle.

And the principle does just sort of keep coming back.

And I think we'll see...

Because that's all in silico, which is a scientist's way of saying it's done on a computer.

It's kind of easier to create the back-and-forth, iterative setup that you would need, right?

Where you add more data, you fit your model, and you could just leave that running autonomously.

Doing that with a wet lab is really hard, because, you know, these things take months sometimes.

Yeah, I can guess, but I think it was the first time I heard about quantum chemistry.

I was really curious about what that was.

It sounds like we could even make a whole episode about that, actually.

So maybe you should come back on the show.

So I'm going to start closing this out here, because it's going to be late for you at some point. But I'm curious.

You know, to wrap things up: for someone who's listened to us for the past hour and wants to start applying BED and BALD practically, not at the frontier like you're doing, but just usefully in their work.

Where do you point them?

Are there any packages that you recommend, things like that, that people can practically pick up and use to get started?

That's a great question.

I think the first thing, you know, as we've said, the struggle is that it's such an abstract and multifaceted way of viewing the world that it's very hard to be very specific.

So I think the first question is, do you use a Bayesian model?

Right.

Are you doing Bayesian analysis on your data?

If the answer is yes.

And do you like it, right?

Like, do you feel that it's good, that it's giving you the right uncertainties?

You're looking at your posterior predictives and you're happy with them.

If that's all check, check, check, then you're a good candidate.

Okay.

And in that case, the assumption would be that you want to collect more data that's either very similar or is literally just more data of the same type that you already have.

Now, if it's the same type, that's exactly the setting that's, you know, studied mathematically.

So you fit your posterior, you generate candidate hypotheses, you scan over the candidate designs, and you choose the best one.

So that would basically be BALD, right?

I think BALD is a good starting point.

If you can make BALD fit your setting, that's probably a great starting point, right?

And then there's more stuff you can explore after that by kind of reading around.

And then if for whatever reason it doesn't work, or it doesn't give you the outcomes you're looking for, then you need to think about: why doesn't it?

Right.

So I mean, you know, I've written a number of papers, right?

And they all kind of dig into a different aspect.

For example, you might have continuous outcomes and you might feel that just picking a hundred hypotheses and integrating over those to estimate your marginal likelihood is

rubbish.

Then you might need something like what we studied in the variational paper.

You might have a huge design space of continuous design parameters, where you'd be like, you know what, I want to take gradients in the design space.

That's something that we studied in another paper.

You might be in the case, right, as we discussed with deep adaptive design, where the problem is that you're interacting with humans and you need a very fast adaptive response: you're going to consume what they've told you and then immediately give the next question.

Then you would want to look at something like that.

Or you might be in the case where, like I mentioned, you're interested in predictions on a target set.

So your BALD is quite good, but it's exploring stuff that you don't need to know about, because it's not relevant to what you want to test on.

And then you might look into EPIG and things like that.

So I think, yeah.

So I recently interacted with a scientist, I won't talk about the domain or her work, who wanted to start on this.

And I kind of said, well, the very first thing you should do is the really simple version, right?

So you have a small set of candidate designs.

You do a kind of BALD.

You generate a finite set of candidate hypotheses.

You look at the predictions for each one, and then you apply the BALD formula, which, you know, I don't want to say out loud because that would be incomprehensible, but it's a simple formula.

It's a difference of two entropies, basically.

And you see if that works, and if that works, you're doing great.

And then if you find, oh, it doesn't work because I'm estimating entropies of continuous variables, you know, that's one direction to go in.

If it doesn't work because, oh, my design space is too huge, you know, that's another direction.

So I think BALD can be a nice starting point.

And that's the older work, within the kind of machine learning I'm most familiar with, where it's specifically thinking about computation.

Obviously a lot of people have studied the theory before.

So I think that can be a nice starting point, and then you can think, well, what's the problem?

And then you look for the solution.

Yeah, that makes sense to me.

And something we often recommend here on the show: always start with the simplest one, and then build your way up to it, because otherwise you can get lost pretty easily.

Maybe one last question before the final questions, of course.

Just, you know, more broadly, with your vision as a researcher: what do you think is the most underrated or underexplored direction in Bayesian experimental design as a whole right now?

Oh, wow.

That's a really interesting question.

The most underexplored area within Bayesian experimental design.

I think application.

I think it's been picked up by the statistical machine learning community.

And I think there's a lot of really interesting, cool stuff being done there.

And I really do think the next step is finding collaborators, and finding quite a few of them,

and getting more experience about, okay, well, where does this really change the way people do experiments, right?

Is there something that's kind of crying out for this, but hasn't tried it or hasn't succeeded due to like computational or theoretical limits, right?

Which is the thing that statistical machine learning departments could help you with.

They're not going to help you with your specific setup of your wet lab experiments.

So I think...

Yeah, for a while now, maybe the last year or so, what I've been doing is answering emails from people who've contacted me, who've been working on very, very specific problems.

And I've been trying to help them set up BED in their specific case.

And it's been really, really interesting.

I've learned a lot, but also every setting is different, right?

And every setting has different requirements and hits different kinds of walls and stuff.

So I think what's needed for the field is to go deep on certain applications, and it's happening, right?

I mean, Tom Rainforth. Yeah, by the way, if you're looking to do a PhD or postdoc, you should go and work with Tom Rainforth.

If this is the topic, I mean, he's really one of the leading people in this field.

And he is pushing forward on these questions, right?

Like, what kinds of applications are interesting, and where can BED make an impact?

But I still think that's probably the thing that, if I were back doing that research full time, I would also be interested in: building collaborations with non-mathematicians who are doing experiments.

Yeah.

Thanks for the call for PhD applications, for sure.

If people are interested in that, they should definitely see if they can go to Tom's department.

Awesome.

Well, I hope he actually is taking people on, now that I've said that.

He'd better be.

Well, Adam, that was great.

Thank you so much for taking the time.

I think we can call it a show.

We've covered so much ground.

I'm very happy about that.

I think it's going to be a very useful episode and practical one, hopefully, for listeners.

Before letting you go, of course, I'm going to ask you the last two questions I ask every guest at the end of the show.

So first one.

If you had unlimited time and resources, which problem would you try to solve?

Yeah.

I mean, you said unlimited.

So I'm going to just take a massive, massive challenge, which is basically AI for science, right?

Which is what I'm working on right now.

How do you build an AI model that understands the real world, like the scientific and physical world?

I think it's one of the most intellectually interesting and most societally impactful challenges out there.

I mean, that's a ginormous challenge, but you've given me unlimited resources.

So I could burn up a couple of copies of Earth to solve that.

And then, you know, I'll come back to you with my AI model that can understand all of science.

Yeah.

Love that.

Let me know.

Let me know how I can help.

I would definitely be down for that kind of project.

So yeah, for sure.

Definitely agree with that, with the objective and the idea.

And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

Yeah, I think just for BED, I would choose Dennis Lindley. Very underrated; he's a statistician from the 1950s.

He kind of first wrote down this EIG equation.

And I think, you know, it'd be interesting for him to know where it's gone and that people are still working on it.

I don't know if he'd be a barrel of laughs.

No idea, you know, what his character was thought to be like, or what he was like on a personal level.

But, you know, that's the BED choice, let's say.

If it was more just like a random, exciting person.

Oh, I don't know.

No, I'll stick with Dennis Lindley.

Okay.

Let's do that.

Awesome.

Well, Adam, that's it for today.

Thank you so much for taking the time. It was really detailed and I really loved it; it was really awesome.

Again folks, show notes are going to be quite detailed for that one.

Again, remember that we have a new website and a new format for the show notes, which I hope is easier for you to parse through, with some key takeaways and the related episodes in their own sections in the show notes.

And even a small blog post I write each time, when the episode is ready to publish.

So yeah, for sure, folks, feel free to look at that if you want to dig deeper.

And Adam, thank you again for taking the time and being on this show.

Thank you for having me.

This has been another episode of Learning Bayesian Statistics.

Be sure to rate, review and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind.

That's learnbayesstats.com.

Our theme music is "Good Bayesian" by Baba Brinkman, feat. MC Lars and Mega Ran.

Check out his awesome work at BabaBrinkman.com.

I'm your host, Alex Andorra.

You can follow me on Twitter at alex_andorra, like the country.

You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats.

Thank you so much for listening and for your support.

You're truly a good Bayesian.

Change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations.

Let me show you how to be a good Bayesian.

Change calculations after taking fresh data in; those predictions that your brain is making, let's get them on a solid foundation.

Key Takeaways

Bayesian experimental design is the practice of using a Bayesian model to decide how to collect data before you collect it. Most statistical thinking starts with a fixed dataset. Bayesian experimental design sits upstream -- you have control over experimental parameters (which questions to ask, which reagents to mix, which conditions to test) and you want to choose them optimally. The Bayesian angle is to ask: what new data would most reduce my current uncertainty?

It's worth using when two conditions hold: you have active control over how data is collected (not just passive observation), and you have a Bayesian model whose prior predictive distribution gives a reasonable picture of what typical data might look like. It's especially valuable when data collection is expensive or irreversible -- when the "committal step" of running an experiment has real cost, it's worth doing the analysis first.

Expected Information Gain (EIG) is the score you assign to a candidate experimental design -- the amount of information you expect to gain about your model parameters by running an experiment with that design. You compute it by simulating datasets from your prior predictive, doing Bayesian inference on each, and averaging how much the uncertainty decreased. What's remarkable is that you can derive the same quantity from two completely different starting points -- reducing parameter uncertainty, or maximizing outcome uncertainty while correcting for noise -- and arrive at the same formula. That convergence is why EIG keeps being re-discovered independently across fields.
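
For a concrete sense of that simulate-infer-average loop, here is a minimal sketch (not from the episode; all names and the toy model are illustrative) of the standard nested Monte Carlo EIG estimator, for a model with a standard normal prior on theta and outcomes y ~ N(d * theta, 1) under design d:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(y, theta, d):
    # Gaussian log-likelihood: y ~ N(d * theta, 1)
    return -0.5 * (y - d * theta) ** 2 - 0.5 * np.log(2 * np.pi)

def eig_nmc(d, n_outer=2000, n_inner=2000):
    # Nested Monte Carlo estimate of expected information gain:
    #   EIG(d) = E_{theta, y}[ log p(y | theta, d) - log p(y | d) ]
    theta = rng.standard_normal(n_outer)          # parameters from the prior
    y = d * theta + rng.standard_normal(n_outer)  # prior predictive outcomes
    # Inner loop: estimate the marginal p(y | d) = E_theta'[ p(y | theta', d) ]
    # with fresh prior draws for every simulated outcome
    theta_inner = rng.standard_normal((n_inner, 1))
    ll = log_lik(y[None, :], theta_inner, d)      # (n_inner, n_outer)
    m = ll.max(axis=0)
    log_marg = m + np.log(np.exp(ll - m).mean(axis=0))  # stable log-mean-exp
    return float(np.mean(log_lik(y, theta, d) - log_marg))
```

For this toy model the EIG is available in closed form, 0.5 * log(1 + d^2), so the estimate at d = 1 should land near 0.35. The inner averaging nested inside the outer simulation is exactly the "doing something hard in a loop" structure that makes the general problem doubly intractable.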

Bayesian inference is already intractable in general because it requires integrating over the full prior space. Computing EIG compounds this: you have to synthesize many datasets, run Bayesian inference on each one, and average the results -- so you're doing something hard in a loop. This is what Adam and Tom Rainforth call "double intractability," and it's the main computational bottleneck for continuous-outcome models.

The variational BED approach trains an amortized variational inference network that approximates the posterior as a neural network function of the data, so you avoid having to solve a full inference problem from scratch for every simulated dataset. Instead, you feed data into the network and get an immediate posterior approximation -- making the EIG computation loop tractable.
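
One standard tool in this line of work is the Barber-Agakov lower bound, EIG(d) >= H[p(theta)] + E[log q(theta | y)], where q is an amortized approximate posterior fit once across all simulated outcomes. Below is a hedged sketch for the same toy linear-Gaussian model; q is a Gaussian whose mean is a learned linear function of y, and a grid search stands in for the gradient-based training a neural network would use. All names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_amortized_bound(d, n=20000):
    # Barber-Agakov bound: EIG(d) >= H[p(theta)] + E_{theta,y}[log q(theta | y)]
    # Toy model: theta ~ N(0, 1), y ~ N(d * theta, 1).
    theta = rng.standard_normal(n)
    y = d * theta + rng.standard_normal(n)
    prior_entropy = 0.5 * np.log(2 * np.pi * np.e)

    def bound(a, s):
        # Amortized posterior q(theta | y) = N(a * y, s^2): one cheap
        # function of the data, reused for every simulated dataset.
        log_q = -0.5 * np.log(2 * np.pi * s**2) - (theta - a * y) ** 2 / (2 * s**2)
        return prior_entropy + log_q.mean()

    # Stand-in for gradient-based training of the recognition parameters:
    return max((bound(a, s), a, s)
               for a in np.linspace(0.0, 1.0, 21)
               for s in np.linspace(0.3, 1.5, 25))

best_bound, best_a, best_s = fit_amortized_bound(d=1.0)
```

For d = 1 the exact posterior is N(y/2, 1/2), so the fitted map should recover a near 0.5, and the maximized bound should approach the true EIG of 0.5 * log 2, roughly 0.35; the key point is that the fit is done once, not once per simulated dataset.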

Adam's honest answer is that active learning is "more of a vibe than a strict definition" -- a broad label for any approach where the model influences which data gets collected next. Bayesian experimental design is a principled, mathematically grounded instantiation of that vibe: you're still picking the next datapoint to label or the next experiment to run, but you're doing it by optimizing a well-defined objective (EIG) rather than using heuristics.

BALD (Bayesian Active Learning by Disagreement) is a concrete, practical implementation of BED for classification models. You sample model weights using dropout to get a population of hypotheses, then find the input where those hypotheses disagree most -- that's your next datapoint. It's BED with specific choices made: a classifier model, dropout as the approximate posterior, and disagreement as the EIG proxy. If your problem is classification and you're comfortable with those assumptions, BALD is a reasonable and accessible starting point.
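
As a toy illustration (not the original paper's code), here is the BALD score computed from an explicit population of sampled hypotheses; in practice the samples would come from dropout forward passes of a neural classifier, but any set of posterior samples works the same way. The 1-D logistic model and all names are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    # Binary entropy in nats, safe at p = 0 or 1
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def bald_scores(x_pool, weights):
    # Each entry of `weights` is one posterior sample of the slope of a
    # 1-D logistic classifier (a stand-in for one dropout forward pass).
    probs = 1.0 / (1.0 + np.exp(-np.outer(weights, x_pool)))  # (samples, pool)
    # BALD = entropy of the mean prediction minus the mean entropy:
    # high where individual hypotheses are confident but disagree.
    return entropy(probs.mean(axis=0)) - entropy(probs).mean(axis=0)

x_pool = np.linspace(-3.0, 3.0, 61)
weights = rng.normal(1.0, 0.8, size=200)   # uncertain slope hypotheses
next_x = x_pool[np.argmax(bald_scores(x_pool, weights))]
```

At x = 0 every hypothesis predicts 0.5, so the two entropy terms cancel and the score is zero; the hypotheses disagree most at the extremes of the pool, which is where this acquisition sends the next query.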

EPIG (Expected Predictive Information Gain) is a variant that changes the optimization objective from reducing uncertainty about model parameters to reducing uncertainty about predictions on specific inputs of interest. If you already know where you'll need to make predictions, EPIG focuses your data collection on the inputs that matter most for those predictions, rather than trying to learn the model globally.
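
Continuing the same toy setup, here is a hedged sketch of an EPIG-style score: the mutual information between the outcome at a candidate input and the outcomes at the target inputs, averaged over targets. This is an illustrative ensemble-based estimator written for this sketch, not the EPIG paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def epig_scores(probs_cand, probs_targ):
    # probs_cand: (S, P) sampled class-1 probabilities at P pool inputs
    # probs_targ: (S, T) sampled class-1 probabilities at T target inputs
    # EPIG(x) = mean over targets of I(y_x ; y_target), where the joint
    # couples the two outcomes through the shared hypothesis sample.
    S, P = probs_cand.shape
    scores = np.zeros(P)
    for i in range(P):
        p = probs_cand[:, i]
        for a in (0, 1):                 # candidate outcome
            for b in (0, 1):             # target outcome
                pa = p if a else 1.0 - p
                qb = probs_targ if b else 1.0 - probs_targ
                joint = np.clip((pa[:, None] * qb).mean(axis=0), 1e-12, None)
                indep = np.clip(pa.mean() * qb.mean(axis=0), 1e-12, None)
                scores[i] += (joint * np.log(joint / indep)).mean()
    return scores

# Slope hypotheses shared between candidate and target predictions
weights = rng.normal(1.0, 0.8, size=200)
x_pool = np.linspace(-3.0, 3.0, 61)
x_targ = np.array([1.5, 2.5])            # where predictions actually matter
probs_cand = 1.0 / (1.0 + np.exp(-np.outer(weights, x_pool)))
probs_targ = 1.0 / (1.0 + np.exp(-np.outer(weights, x_targ)))
scores = epig_scores(probs_cand, probs_targ)
```

A query at x = 0 reveals nothing about any hypothesis, so its EPIG is zero; unlike BALD, every other candidate's score is weighted by how much it actually sharpens predictions at the chosen targets rather than about the model globally.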

Deep adaptive design is BED for settings where you need to respond to new data in real time -- like a behavioral economics study or a human-computer interaction scenario where there's no time for a full EIG computation between observations. The idea is to pre-train an end-to-end neural network that takes in observed data and outputs the next optimal design directly, skipping explicit posterior computation. It's like amortized inference taken one step further: you're not amortizing the posterior, you're amortizing the entire design decision.
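
A schematic of that interface, with untrained random weights and made-up dimensions: the network embeds each past (design, outcome) pair, pools the embeddings by summing so the policy does not depend on observation order, and maps the pooled summary straight to the next design. In DAD proper, the weights would be trained offline to maximize a bound on the total EIG of the whole experiment sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

class DesignPolicy:
    """Schematic DAD-style policy: history of (design, outcome) pairs -> next design."""

    def __init__(self, hidden=32):
        # Untrained random weights; DAD trains these offline on simulations.
        self.w_embed = rng.standard_normal((2, hidden)) * 0.1
        self.w_out = rng.standard_normal((hidden, 1)) * 0.1

    def __call__(self, history):
        h = np.asarray(history, dtype=float).reshape(-1, 2)   # (t, 2)
        pooled = np.tanh(h @ self.w_embed).sum(axis=0)        # order-invariant pooling
        return float(np.tanh(pooled @ self.w_out)[0])         # next design in [-1, 1]

policy = DesignPolicy()
history = [(0.3, 1.0), (-0.8, 0.0), (0.5, 1.0)]
next_design = policy(history)
```

Because the only history computation is a sum of per-pair embeddings, re-planning after each new observation costs one cheap forward pass instead of a fresh posterior fit and EIG optimization, which is what makes real-time adaptive designs feasible.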

Quantum chemistry and protein dynamics are domains where experiments are expensive and simulators are available -- the ideal conditions for BED. In quantum chemistry, you can use BED to decide which molecular configurations to probe next to most efficiently characterize an energy landscape. In protein dynamics, the goal is to guide sampling toward conformational states that are most informative about the protein's behavior. Both fields benefit from the ability to reason about uncertainty over complex, high-dimensional spaces.

The main obstacles to adoption are three: model commitment (you have to be willing to specify a Bayesian model of your data-generating process, which many practitioners aren't), computational complexity (double intractability is still a real constraint), and the need for domain-specific collaboration. BED is abstract enough that applying it to any specific field requires deep knowledge of that field's experimental constraints -- which means it can't just be dropped in as a plug-and-play tool.

Adam's view is that the next frontier is deep collaboration with domain scientists -- finding the fields where the experimental setup is rich enough, the data is scarce enough, and the models are good enough for BED to genuinely change how people work. Theory has largely caught up. What's needed now is the harder, messier work of going deep on specific applications and building the collaborations that make that possible.

Related Episodes