*Proudly sponsored by **PyMC Labs**, the Bayesian Consultancy. **Book a call**, or **get in touch**!*

How does the world of statistical physics intertwine with machine learning, and what groundbreaking insights can this fusion bring to the field of artificial intelligence?

In this episode, we delve into these intriguing questions with Marylou Gabrié, an assistant professor at CMAP, École Polytechnique in Paris. Having completed her PhD in physics at École Normale Supérieure, Marylou ventured to New York City for a joint postdoctoral appointment at New York University’s Center for Data Science and the Flatiron Institute’s Center for Computational Mathematics.

As you’ll hear, her research is not just about theoretical exploration; it also extends to the practical adaptation of machine learning techniques in scientific contexts, particularly where data is scarce.

In this conversation, we’ll traverse the landscape of Marylou’s research, discussing her recent publications and her innovative approaches to machine learning challenges, latest MCMC advances, and ML-assisted scientific computing.

Beyond that, get ready to discover the person behind the science – her inspirations, aspirations, and maybe even what she does when not decoding the complexities of machine learning algorithms!

*Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at **https://bababrinkman.com/** !*

**Thank you to my Patrons for making this episode possible!**

*Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser*.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

**Takeaways**

- Developing methods that leverage machine learning for scientific computing can provide valuable insights into high-dimensional probabilistic models.
- Generative models can be used to speed up Markov Chain Monte Carlo (MCMC) methods and improve the efficiency of sampling from complex distributions.
- The Adaptive Monte Carlo algorithm augmented with normalizing flows offers a powerful approach for sampling from multimodal distributions.
- Scaling the algorithm to higher dimensions and handling discrete parameters are ongoing challenges in the field.
- Open-source packages, such as flowMC, provide valuable tools for researchers and practitioners to adopt and contribute to the development of new algorithms.
- The scaling of algorithms depends on the number of parameters and the amount of data. While some methods work well with a few hundred parameters, larger numbers can lead to difficulties.
- Generative models, such as normalizing flows, offer benefits in the Bayesian context, including amortization and the ability to adjust the model with new data.
- Machine learning and MCMC are complementary and should be used together rather than replacing one another.
- Machine learning can assist scientific computing in the context of scarce data, where expensive experiments or numerics are required.
- The future of MCMC lies in the exploration of sampling multimodal distributions and understanding resource limitations in scientific research.

**Links from the show:**

- Marylou’s website: https://marylou-gabrie.github.io/
- Marylou on LinkedIn: https://www.linkedin.com/in/marylou-gabri%C3%A9-95366172/
- Marylou on Twitter: https://twitter.com/marylougab
- Marylou on GitHub: https://github.com/marylou-gabrie
- Marylou on Google Scholar: https://scholar.google.fr/citations?hl=fr&user=5m1DvLwAAAAJ
- Adaptive Monte Carlo augmented with normalizing flows: https://arxiv.org/abs/2105.12603
- Normalizing-flow enhanced sampling package for probabilistic inference: https://flowmc.readthedocs.io/en/main/
- Flow-based generative models for Markov chain Monte Carlo in lattice field theory: https://journals.aps.org/prd/abstract/10.1103/PhysRevD.100.034515
- Boltzmann generators – Sampling equilibrium states of many-body systems with deep learning: https://www.science.org/doi/10.1126/science.aaw1147
- Solving Statistical Mechanics Using Variational Autoregressive Networks: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.122.080602
- An example of discrete version of similar algorithms: https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.3.L042024
- Grothendieck’s conference: https://www.youtube.com/watch?v=ZW9JpZXwGXc

**Transcript**

*This is an automatic transcript and may therefore contain errors. Please **get in touch** if you’re willing to correct them.*

How does the world of statistical physics intertwine with machine learning, and what groundbreaking insights can this fusion bring to the field of artificial intelligence?

In this episode, we'll delve into these intriguing questions with Marylou Gabrié. Having completed her doctorate in physics at École Normale Supérieure, Marylou ventured to New York City for a joint postdoctoral appointment at New York University's Center for Data Science and the Flatiron Institute's Center for Computational Mathematics.

As you'll hear, her research is not just about theoretical exploration; it also extends to the practical adaptation of machine learning techniques in scientific contexts, particularly where data are scarce.

In this conversation, we'll traverse the landscape of Marylou's research, discussing her recent publications and her innovative approaches to machine learning challenges. Beyond that, get ready to discover the person behind the science: her inspirations, aspirations, and maybe even what she does when she's not decoding the complexities of machine learning algorithms.

This is Learning Bayesian Statistics, episode 98, recorded November 23, 2023.

Let me show you how to be a good Bayesian and change your predictions.

Marylou Gabrié, welcome to Learning Bayesian Statistics.

Thank you very much, Alex, for having me.

Yes, thank you. And thank you to Virgile Andreani for putting us in contact. This is a French connection network here. So thanks a lot, Virgile. Thanks a lot, Marylou, for taking the time. I'm probably going to say "Marylou" the English way because it flows better in my English; if I say it the French way and then continue in English, I'm going to have the French accent, which nobody wants to hear. So let's start. I gave a bit of your background in the intro to this episode, Marylou, but can you define the work that you're doing nowadays and the topics that you are particularly interested in?

I would define my work as being focused on developing methods, and more precisely methods that leverage all the progress in machine learning for scientific computing. I have a special focus within this realm, which is to study high-dimensional probabilistic models, because they really come up everywhere, and I think they give us a very particular lens on our world. So I would say I'm working broadly in this direction.

Well, that sounds like a lot of fun. So I understand why Virgile put me in contact with you. Could you start by telling us about your journey into the field of statistical physics, and how it led you to merge these interests with machine learning and what you're doing today?

Absolutely. My background is actually in physics, so I studied physics. Among the topics in physics, I quickly became interested in statistical mechanics. I don't know if all listeners are familiar with statistical mechanics, but I would define it broadly as the study of complex systems with many interacting components. So it could be really anything. You could think of molecules, which are networks of interacting agents that have non-trivial interactions and non-trivial behaviors when put all together within one system. And I think it's a really important viewpoint on the world today, as I was saying, to look at those big macroscopic systems that you can study probabilistically. So I was quickly interested in this field of statistical mechanics.

At some point machine learning got into the picture. The way it did is that I was looking for a PhD in 2015, and some of my friends were students in computer science and early comers to machine learning. So I started to know that it existed, that deep neural networks were revolutionizing the field, that you could expect a program to, I don't know, give names to people in pictures. And I thought, well, if this is possible, I really want to know how it works. I don't want this technology to sound like magic to me; I want to know about it. This is how I started to become interested, and to find out that people knew how to make it work, but not why it worked so well.

This is how, in the end, I was put into contact with Florent Krzakala, who was my PhD advisor. And I started with this angle of trying to use the statistical mechanics framework to study deep neural networks, which are precisely those complex systems I was just mentioning, and which are so big that we have trouble really making sense of what they are doing.

Yeah, I mean, that must be quite challenging indeed. We could already dive into that. That sounds like fun. Do you want to talk a bit more about that project?

Since then, I really shifted my angle. I studied in this direction for, say, three or four years. Now, I'm actually going back to applications to real-world systems, let's say, using all the potential of deep learning. So it's the same intersection, but looking at it from the other side: now I'm really looking at applications and using machine learning as a tool, whereas before I was looking at machine learning as my object of study and using statistical mechanics as the tool. So I'm keen on talking about what I'm doing now.

Yeah. So basically you changed; now you're doing it the other way around, right? You're studying statistical physics with machine learning tools instead of doing the opposite. So what does that look like? What does that mean concretely? Maybe you can talk about an example from your own work so that listeners can get a better idea?

Yeah, absolutely. As I was saying, statistical mechanics is really about large systems that we study probabilistically. And here there's a tool, one of the most active directions of research in machine learning today I would say, which is generative models. They are very natural here because they are ways of building probabilistic models that you can control: you can produce samples from them in one command, whereas you need much more challenging algorithms if you want to do that for a general physical system. So we have those machines that we can leverage and combine with our typical computational tools, such as Markov chain Monte Carlo algorithms, and that will allow us to speed up those algorithms. Of course, it requires some adaptation compared to what people usually do in machine learning and how those generative models were developed, but it's possible and it's fascinating to try to make those adaptations.

Hmm. So that's interesting, because if I understand correctly, you're saying that one of the aspects of your job is to understand how to use MCMC methods to speed up these models?

Actually, it's the other way around: it's how to use those models to speed up MCMC methods.

Okay. Can you talk about that? That sounds like fun.

Yeah, of course. MCMC algorithms, so Markov chain Monte Carlo algorithms, are really the go-to when you are faced with a probabilistic model describing whichever system you care about. Say it's a molecule, and this molecule has a bunch of atoms, so you know that you can describe your system, at least classically, by giving the Cartesian coordinates of all the atoms. Then you can describe the equilibrium properties of your system by using the energy function of this molecule. If you believe that you have an energy function for this molecule, then you believe that it's distributed as the exponential of minus beta times the energy. This is the Boltzmann distribution.
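Written out, the distribution described here reads as follows. The notation is chosen for this note rather than taken from the conversation: $x$ stands for the atomic coordinates, $E$ for the energy function, $Z$ for the normalizing constant, and $\beta$ for the inverse temperature.

```latex
p(x) = \frac{1}{Z}\, e^{-\beta E(x)},
\qquad
Z = \int e^{-\beta E(x)}\, \mathrm{d}x,
\qquad
\beta = \frac{1}{k_B T}
```

Low-energy configurations are exponentially more probable than high-energy ones, and $Z$ is typically intractable, which is why sampling methods that only need the unnormalized density are used.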

And then you are left with your probabilistic model. If you want to approach it, a priori you have no control over what this energy function imposes as constraints. It may be very, very complicated. Well, the go-to algorithm is Markov chain Monte Carlo. It's a go-to algorithm that is "always going to work", and here I'm putting quotes around this, because it's going to be a greedy algorithm that looks for plausible configurations next to other plausible configurations. It makes a local search of the configuration space, tries to visit it, and the configurations it visits will then be representative of the thermodynamics.

Of course, it's not that easy. Although you can make such a local search, sometimes it's really not enough to fully describe the probabilistic model, in particular how different regions of your configuration space are related to one another. If I come back to my molecule example, it would be that I have two different conformations of my molecule, two main templates that my molecule is going to look like. They may be separated by what we call an energy barrier, or in the language of probabilities, low-probability regions in between high-probability regions. In this case, local MCMC is going to fail. This is where we believe that generative models could help us and, let's say, fill this gap to answer some very important questions.
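The failure mode described here can be reproduced on a toy problem. The following is a minimal sketch, not anything discussed in the episode: the one-dimensional double-well energy, the inverse temperature `beta=20.0`, and the step size are all illustrative assumptions, chosen so that a barrier separates two modes at -1 and +1.

```python
import math
import random


def energy(x):
    """Double-well energy with minima at x = -1 and x = +1 (illustrative choice)."""
    return (x * x - 1.0) ** 2


def local_metropolis(n_steps, beta, step, x0, seed=0):
    """Random-walk Metropolis targeting the density proportional to exp(-beta * energy(x))."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step, step)  # small, local move
        log_ratio = -beta * (energy(proposal) - energy(x))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal  # accept; otherwise the chain stays where it is
        samples.append(x)
    return samples


# Start in the right-hand well and count how often the chain visits the left one.
samples = local_metropolis(n_steps=5000, beta=20.0, step=0.2, x0=1.0)
left_fraction = sum(s < 0 for s in samples) / len(samples)
print(f"fraction of samples in the left well: {left_fraction:.3f}")
```

Both wells carry half of the probability mass, yet with these settings the chain has essentially no chance of crossing the barrier in 5000 steps, so the left well goes unvisited.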

And how would that work then? Would you run a first model that would help you infer that, and then use that in the MCMC algorithm? What does that look like?

I think your intuition is correct. You cannot do it in one go. For example, the paper that I published, I think it was last year, in PNAS, called "Adaptive Monte Carlo augmented with normalizing flows", implements precisely something where you have feedback loops.

The idea is that the local Monte Carlo chains you can run within the different regions you have identified as interesting will help you seed the training of a generative model that targets generating configurations in those different regions. Once you have this generative model, you can include it in your Markov chain strategy: you can use it as a proposal mechanism to propose new locations for your MCMC to jump to. So you're creating a Monte Carlo chain that is going to slowly converge towards the target distribution you're really after. And you're going to do it by using the data you're producing to train a generative model, which will help you produce better data as it's incorporated within the MCMC kernel you are actually jumping with. So you have this feedback mechanism that makes things work.

This idea of adaptivity really stems from the fact that in scientific computing, we are going to do machine learning with scarce data. We are not going to have all the data we wish we had to start with, so we are going to have this type of method where we do things adaptively: doing, recording information, doing again. In a few words.
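The feedback loop described above can be sketched on a toy problem. This is a hedged illustration only, not the paper's or flowMC's implementation: a two-component Gaussian mixture (the hypothetical `MixtureProposal` class) stands in for the normalizing flow, "training" is a crude hard-assignment refit, and the double-well target and all numerical settings are assumptions made for the example. What it shares with the description is the structure: walkers started in each known mode, local moves mixed with global proposals from a model that has an exact density, and retraining on the samples the chains produce.

```python
import math
import random

BETA = 20.0


def log_target(x):
    """Unnormalized log-density of a double well with modes near -1 and +1."""
    return -BETA * (x * x - 1.0) ** 2


class MixtureProposal:
    """Gaussian mixture standing in for a normalizing flow: it can be
    sampled in one call and its density can be evaluated exactly."""

    def __init__(self, means, stds):
        self.means, self.stds = list(means), list(stds)
        self.weights = [1.0 / len(self.means)] * len(self.means)

    def sample(self, rng):
        k = rng.choices(range(len(self.means)), weights=self.weights)[0]
        return rng.gauss(self.means[k], self.stds[k])

    def log_prob(self, x):
        dens = sum(
            w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(self.weights, self.means, self.stds)
        )
        return math.log(dens + 1e-300)  # guard against log(0) far from the modes

    def fit(self, data):
        """Crude 'training': assign each sample to the nearest mean, then refit."""
        groups = [[] for _ in self.means]
        for x in data:
            k = min(range(len(self.means)), key=lambda j: abs(x - self.means[j]))
            groups[k].append(x)
        for k, g in enumerate(groups):
            if len(g) > 1:
                m = sum(g) / len(g)
                var = sum((x - m) ** 2 for x in g) / len(g)
                self.means[k], self.stds[k] = m, max(math.sqrt(var), 0.05)
        counts = [max(len(g), 1) for g in groups]
        self.weights = [c / sum(counts) for c in counts]


def run(n_rounds=20, steps_per_round=200, seed=0):
    rng = random.Random(seed)
    walkers = [-1.0, 1.0]  # one walker started in each known mode
    proposal = MixtureProposal(means=walkers, stds=[0.5, 0.5])
    history = []
    for _ in range(n_rounds):
        batch = []
        for _ in range(steps_per_round):
            for i, x in enumerate(walkers):
                if rng.random() < 0.5:
                    # Local random-walk move (symmetric proposal).
                    y = x + rng.uniform(-0.2, 0.2)
                    log_ratio = log_target(y) - log_target(x)
                else:
                    # Global move proposed by the generative model:
                    # ratio is p(y) q(x) / (p(x) q(y)).
                    y = proposal.sample(rng)
                    log_ratio = (log_target(y) - proposal.log_prob(y)) - (
                        log_target(x) - proposal.log_prob(x)
                    )
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    walkers[i] = y
                batch.append(walkers[i])
        proposal.fit(batch)  # feedback: chain samples retrain the proposal
        history.extend(batch)
    return history, proposal


history, proposal = run()
left_fraction = sum(x < 0 for x in history) / len(history)
print(f"fraction of samples in the left well: {left_fraction:.2f}")
print("fitted means:", [round(m, 2) for m in sorted(proposal.means)])
```

Because the proposal is retrained on samples from a chain that itself uses the proposal, this is an adaptive scheme, and a production implementation would need to take care over when adaptation is stopped or how its validity is justified.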

Yeah, yeah. So if I understand correctly, it's a way of going one step further than what HMC is already doing, where we're looking at the gradients and trying to adapt based on that. Now, basically, the idea is to find some way of getting even more information as to where the next sample should come from in the typical set, and then being able to navigate the typical set more efficiently?

Yes. Let's say that it's an algorithm that is more ambitious than HMC. Of course, there are caveats. HMC tries to follow a dynamics to travel towards interesting regions, but it has to be tuned quite finely in order to actually end up in the next interesting region, provided that it started from one, and so to cross those energy barriers. Here, with machine learning, we would really be jumping over energy barriers. We would have models that pretty much only target the interesting regions and just don't care about what's in between. That really focuses the effort where you believe it matters. However, there are cases in which those machine learning models will have trouble scaling where HMC would be more robust. So there is of course always a trade-off in the algorithms you are using: how efficient they can be per MCMC step and how general you can expect them to be.

Hmm. I see. Yeah. So actually, one of my questions would be: when do you think this kind of new algorithm would be interesting to use instead of the classic HMC? In which cases would you say people should give it a try instead of using the classic, robust HMC method we have right now?

So that's an excellent question. I think right now, on paper, the algorithm we propose is really, really powerful because it will allow you to jump throughout your space and so to decorrelate your MCMC configurations extremely fast. However, for this to happen, you need the proposal made by your deep generative model, a new configuration in your MCMC chain, to be accepted. In the end, you no longer rely on jumping locally, with decorrelation coming from making lots of local jumps. Here you could decorrelate in one step, but you need to accept. So the acceptance will really be what you need to care about in running the algorithm. And what determines whether or not your acceptance is high is the agreement between your deep generative model and the target distribution you're after.
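The link between acceptance and the agreement between proposal and target can be checked numerically. The sketch below is an illustration under stated assumptions, not the setup from the paper: the target is a standard normal, the "generative model" is a plain Gaussian with a chosen mean and width, and `acceptance_rate` is a hypothetical helper that runs an independence Metropolis-Hastings chain and reports the fraction of accepted proposals.

```python
import math
import random


def log_normal(x, mean, std):
    """Log-density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def acceptance_rate(proposal_mean, proposal_std, n_steps=5000, seed=0):
    """Independence Metropolis-Hastings on a standard normal target."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        y = rng.gauss(proposal_mean, proposal_std)  # proposal ignores the current x
        # Log of p(y) q(x) / (p(x) q(y)): compares the importance weights p/q.
        log_ratio = (
            log_normal(y, 0.0, 1.0) - log_normal(y, proposal_mean, proposal_std)
        ) - (
            log_normal(x, 0.0, 1.0) - log_normal(x, proposal_mean, proposal_std)
        )
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = y, accepted + 1
    return accepted / n_steps


matched = acceptance_rate(0.0, 1.0)     # proposal equals the target
mismatched = acceptance_rate(3.0, 1.0)  # proposal shifted away from the target
print(f"matched proposal:    {matched:.2f}")
print(f"mismatched proposal: {mismatched:.2f}")
```

When the proposal equals the target, the ratio is always one and every proposal is accepted; as the proposal drifts away, most proposals land where the target has little mass and are rejected.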

And we have the traditional challenges here in making the generative model look exactly like the target we want. There are issues with scalability and there are issues with, I would say, constraints. Let's say you're interested in Bayesian inference, another case where we can apply this kind of algorithm, because you have a posterior and you just want to sample from it to make sense of it. If you give me 10 or 100 dimensions, I tell you: I know how to train normalizing flows, which are the specific type of generative model we are using here, in 10 or 100 dimensions. So if you believe that your posterior is multimodal, that it will be hard for traditional algorithms to visit the entire landscape and equilibrate because there are low-density regions in between high-density regions, go for it.

If you are actually an astronomer and you want to marginalize over your initial conditions on a grid that represents the universe, and the posterior distribution you're interested in is over variables in millions of dimensions, I'm sorry: we're not going to do it with you, and you should use something more general, something that uses a local search but that is going to be imperfect, right? Because it's also going to be very, very hard for that algorithm to work. But the magic of machine learning will not scale yet to this type of dimension.

Yeah, I see. And is that an avenue you're actively researching, basically how to scale these algorithms better to bigger scales?

Yeah, of course. Of course we can always try to do better. As far as I'm concerned, I'm also very interested in sampling physical systems. And in physical systems, there is a lot of prior information that you have on the system. You have symmetries, you have physical rules that you know the system has to fulfill, or maybe some multi-scale property of the probability distribution, where you know that there are some self-similarities. This is information you can try to exploit in two ways. Either in the sampling part, since you have this MCMC coupled with the generative model: in the way you make proposals, you can try to symmetrize them, you can try to exploit the symmetry by any means. Or you can directly put it in the generative model. Those are things that are really crucial. We understand very well nowadays that it's naive to think you will learn it all. You should really use as much information on your system as you can, and after that, you can go one step further with machine learning. But in non-trivial systems, it would be deceiving to believe that you could just learn things.

Yeah. I mean, I completely resonate with that. It's definitely something we always tell students or clients: don't just throw everything you can at the model and pray that it works like that. You should probably use a generative perspective to try and find out what the best way of thinking about the problem is, what would be the good enough, simple enough model that you can come up with, and then try to run that. So I think that resonates with a lot of the audience: think generatively. And from what I understand from what you said, it's also about putting as much knowledge and information as you have in your generative model. The deep neural network, the normalizing flow, is here to help, but it's not going to be a magical solution to a suboptimally specified model.

Yes, yes. Of course, in all those problems, what's hidden behind is the curse of dimensionality. If we are trying to learn something in very high dimension, it could be arbitrarily hard. It could be that you cannot learn something in high dimension just because you would need to observe every location in this high-dimensional space to get the information. Of course, this is in general not the case, because what we are trying to learn has some underlying structure that is actually described by fewer dimensions, and you need fewer observations to learn it. But the question is: how do you find those structures, and how do you put them in? Therefore, we need to take into account as much of the knowledge we have on the system as possible to make this learning as efficient as possible.
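A back-of-the-envelope count makes the "observe every location" point concrete. The sketch below assumes a naive regular grid with 10 values per axis; the helper `grid_points` and that resolution are illustrative choices, not something from the conversation.

```python
def grid_points(points_per_axis, dimension):
    """Number of locations on a regular grid covering a `dimension`-dimensional box."""
    return points_per_axis ** dimension


# Ten values per axis is a coarse grid, yet the count explodes with dimension.
for d in (1, 2, 10, 100):
    print(f"dimension {d:>3}: {grid_points(10, d):.1e} grid points")
```

Exploiting structure, such as a symmetry or a lower-dimensional description, amounts to never having to visit most of that grid.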

Yeah, yeah, yeah. I mean, that's super interesting. And that's your paper, "Adaptive Monte Carlo augmented with normalizing flows", right?

So this is the paper where we did this generally. I don't have a paper out yet where we really try to put the structure in the generative models, but that's the direction I'm actively...

Okay, yeah. So for sure, we'll put that paper I just cited in the show notes for people who want to dig deeper. And also, if by the time this episode is out you have the paper or a preprint, feel free to add that to the show notes, or just tell me and I'll add it. That sounds really interesting for people to read. And so I'm curious about this idea of using normalizing flows, a deep neural network, to help MCMC sample faster and converge faster to the typical set. What was the main objective of doing that? I'm curious why you even started thinking about and working on that.

So yes, I think for me, the answer is really this question of multimodality: the fact that you may be interested in a probability distribution for which it's very hard to connect the different interesting regions. In statistical mechanics, it's something that we actually call metastability. I don't know if it's a word you've already heard, but where some communities talk about multimodality, we talk about metastability. And metastability is at the heart of many interesting phenomena in physics, phase transitions for example. Therefore, it's something very challenging in the computations, but at the same time something it's crucial to understand. So for us, it felt like there was this big opportunity with those probabilistic models that were so malleable. Of course they are hard to train, but then they give you so much: an exact value for the density that they encode, plus the possibility of sampling from them very easily, getting a bunch of i.i.d. samples in just one pass through a neural network. So for us, there was really this opportunity of studying multimodal distributions, in particular metastable systems from statistical mechanics, with those tools.

Yeah. Okay. So in theory, these normalizing flows are especially helpful for handling multimodal posteriors. I didn't get that at first, so that's interesting.

Yep. What they're really going to offer you is the possibility of making large jumps, jumps within your Markov chain that can go from one location of high density to another one in just one step. So this is what you are really interested in. First of all, you're going far in one step. And second of all, regardless of how low the density is between them. If you were to run some other type of local MCMC, you would, in a sense, need to find a path between the two modes in order to visit both of them. In our case, that's not true. You're just jumping out of the blue thanks to your normalizing flow, which is trying to mimic your target distribution, which has therefore developed mass everywhere you believe it matters, and from which you can very easily produce an i.i.d. sample anywhere on its support.

I see, yeah. And I'm guessing you did some benchmarks for the paper?

I think that's actually a very interesting question you're asking, because I feel benchmarks are extremely difficult, both in MCMC and in deep learning. I mean, you can make benchmarks that say, okay, I changed the architecture and I see that I'm getting something different. But otherwise, I think it's one of the big challenges that we have today. If I tell you, okay, with my algorithm I can run an MCMC that is going to mix between the different modes, between the different metastable states, that's something I don't know how to do by any other means. So the benchmark is: actually, you won. There is nothing to compare with, so that's fine.

But if I need to compare on other cases, where I can actually find other algorithms that will work but that I know will probably take more iterations, then I still need to factor in a lot of things for a truly honest benchmark. I need to factor in the fact that I ran a lot of experiments to choose the architecture of my normalizing flow, I ran a lot of experiments to choose the hyperparameters of my training, and so on and so forth. And I don't see how we can make those honest benchmarks nowadays. I can make one, but I don't think I would think very highly of it as revealing some profound truth about which solution is really working. The only way of making an honest benchmark would be to take different teams, give them problems, lock them in a room, and see who comes out first with the solution. But I mean, how can we do that?

Well, we can call on listeners who are interested in doing the experiment to contact us. That would be the first thing. But yeah, that's actually a very good point. And in a way, that's a bit frustrating, right? Because it means that, at least experimentally, it's hard to differentiate between the efficiency of the different algorithms. So I'm guessing the claims that you make about this new algorithm being more efficient for multimodality are based on the theoretical underpinning of the algorithm?

433

No, I mean, it's just based on the fact

that I don't know of any other algorithm,

434

which under the same premises, which can

do that.

435

So, I mean, it's an easy way out of making

any benchmark, but also a powerful one

436

because I really don't know who to compare

to.

437

But indeed, I think then it's...

438

As far as I'm concerned, I'm mostly

interested in developing methodologies.

439

I mean, that's just what I like to do.

440

But of course, what's important is that

those methods are going to work and they

441

are going to be useful to some communities

that really have research questions that

442

they want to answer.

443

I mean, research or not actually could be

engineering questions, decisions to be

444

taken that require to do an MCMC.

445

And I think the true tests of

446

whether or not the algorithm is useful is

going to be this, the test of time.

447

Are people adopting the algorithms?

448

Are they seeing that this is really

something that they can use and that would

449

make their inference work where they could

not find another method that was as

450

efficient?

451

And in this direction, there is the

close collaborator Kaze Wong, who is

452

working at the Flatiron Institute and with

whom we developed a package that is called

453

FlowMC,

454

that is written in JAX and that implements

these algorithms.

455

And the idea was really to try to write a

package that was as user-friendly as

456

possible.

457

So of course we have the time we have to

take care of it and the experience we have

458

as regards, you know, making software available

is what it is, but we really try hard.

459

And at least in this community of people

studying gravitational waves, it seems

460

that people are really trying, starting to

use this in their research.

461

And so I'm excited, and I think it is

useful.

462

But it's not the proper benchmark you

would dream of.

463

Yeah, you just stole one of my questions.

464

Basically, I was exactly going to ask you,

but then how can people try these?

465

Is there a package somewhere?

466

So yeah, perfect.

467

That's called FlowMC, you told me.

468

Yes, it's called FlowMC.

469

You can pip install FlowMC, and you will

have it.

470

If you are allergic to Jax...

471

Right, I have it here.

472

Yeah, there is a Read the Docs.

473

So I'll put that in the show notes for

sure.

474

Yes, we have even documentation.

475

That's how far you go when you are

committed to having something that is used

476

and useful.

477

So I mean, of course, we are also open to

both comments and contributions.

478

So just write to us if you're interested.

479

Yeah, for sure.

480

Yeah, that folks, if you are interested in

contributing, if you see any bugs, make

481

sure to open some issues on the GitHub

repo or even better, contribute pull

482

requests.

483

I'm sure Marylou and the co-authors

will be very happy about that.

484

Yes, you know typos in the documentation,

all of this.

485

Yeah, exactly.

486

That's what I...

487

I tell everyone also who wants to start

doing some open source package, start with

488

the smallest PRs.

489

You don't have to write a new algorithm,

like already fixing typos, making the

490

documentation look better, and stuff like

that.

491

That's extremely valuable, and that will

be appreciated.

492

So for sure, do that, folks.

493

Do not be shy with that kind of stuff.

494

So yeah, I already put the paper you have

out on arXiv, Adaptive Monte Carlo, and

495

FlowMC, I put that in the show notes.

496

And yeah, to get back to what you were

saying, basically, I think as more of a

497

practitioner than a person who developed

the algorithms, I would say the reasons I

498

would...

499

you know, adopt that kind of new

algorithms would be that, well, I know,

500

okay, that algorithm is specialized,

especially for handling multimodal,

501

multimodal posteriors.

502

So then I'd be, if I have a problem like

that, I'll be like, oh, okay, yeah, I can

503

use that.

504

And then also ease of adoption.

505

So is there an open source package in

which languages that can I just, you know,

506

What kind of trade-off basically do I have

to make?

507

Is that something that's easy to adopt?

508

Is that something that's really a lot of

barriers to adoptions?

509

But at the same time, it really seems to

be solving my problem.

510

You know what I'm saying?

511

It's like, indeed, it's not only the

technical and theoretical aspects of the

512

method, but also how easy it is to...

513

adopt in your existing workflows.

514

Yes.

515

And for this, I guess it's, I mean, the

feedback is extremely valuable because

516

when you know the methods, you're really,

it's hard to exactly locate where people

517

will not understand what you meant.

518

And so it's really welcome.

519

No, for sure.

520

And already I find that absolutely

incredible that now

521

Almost all new algorithms, at least that I

talk about on the podcast and that I see

522

in the community, in the PyMC community,

almost all of them now, when they come up

523

with a paper, they come out with an open

source package that's usually installable

524

in a Python, in the Python ecosystem.

525

Which is really incredible.

526

I remember that when I started on these a

few years ago, it was really not the norm

527

and much more the exception and now almost

528

the accompanying open source package is

almost part of the paper, which is really

529

good because way more people are going to

use the package than read the paper.

530

So, this is absolutely a fantastic

evolution.

531

And thank you in the name of us all for

having taken the time to develop the

532

package, clean up the code, put that on

PyPI and making the documentation because

533

That's where the academic incentives are a

bit misaligned with what I think they

534

should be.

535

Because unfortunately, literally it takes

time for you to do that.

536

And it's not very much appreciated by the

academic community, right?

537

It's just like, you have to do it, but

they don't really care.

538

We care as the practitioners, but the

academic world doesn't really.

539

And what counts is the paper.

540

So for now, unfortunately, it's really

just time that you take.

541

out of your paper writing time.

542

So I'm sure everybody appreciates it.

543

Yes, but I don't know.

544

I see true value to it.

545

And I think, although it's maybe not as

rewarded as it should, I think many of us

546

see value in doing it.

547

So you're very welcome.

548

Yeah, yeah.

549

No, for sure.

550

Lots of value in it.

551

Just saying that value should be more

recognized.

552

Just a random question, but something I'm

always curious about.

553

I think I know the answer, but I still want

to ask.

554

Can you sample discrete parameters

with these algorithms?

555

Because that's one of the grails of the

field right now.

556

How do you sample discrete parameters?

557

So, okay, the pack, so what I've

implemented, tested, is all on continuous

558

space.

559

But, but what I need for this algorithm to

work is a generative model of which I can

560

sample from easily.

561

I.i.d., I mean: not that I have to run a Monte

Carlo to sample from it, but that I can,

562

in just one Python command, or a command in whichever

language you want, get an i.i.d.

563

sample,

564

and that I can write what is the

likelihood of this sample.

565

Because a lot of generative models

actually don't have tractable likelihoods.

566

So if you think, I don't know, of

generative adversarial networks or

567

variational autoencoders for people who

might be familiar with those very, very

568

common generative models, they don't have

this property.

569

You can generate samples easily, but you

cannot write down with which density of

570

probability you've generated this sample.

571

This is really what we need in order to

use this generative model inside a Markov

572

chain and inside an algorithm that we know

is going to converge towards the target

573

distribution.

574

So normalizing flows are playing this role

for us with continuous variables.

575

They give us easy sampling and easy

evaluation of the likelihood.
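
To make those two requirements concrete (i.i.d. sampling and a density you can evaluate), here is a minimal Python sketch of an independence Metropolis-Hastings step, which is one standard way to use such a generative model inside a Markov chain. It is not the FlowMC implementation: a fixed, wide Gaussian stands in for a trained normalizing flow, the bimodal target is a toy, and every name (`GaussianProposal`, `independence_mh_step`, `run_chain`) is hypothetical.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a toy bimodal target (unit Gaussians at -3 and +3)."""
    a = -0.5 * (x - 3.0) ** 2
    b = -0.5 * (x + 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class GaussianProposal:
    """Stand-in for a trained flow: it offers the two properties the chain needs."""

    def __init__(self, mean=0.0, std=4.0):
        self.mean, self.std = mean, std

    def sample(self, rng):
        # Property 1: an i.i.d. sample in a single call.
        return rng.gauss(self.mean, self.std)

    def log_prob(self, x):
        # Property 2: the exact log-density of any sample.
        z = (x - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2.0 * math.pi)


def independence_mh_step(x, proposal, rng):
    """Propose y independently of x, then accept or reject with the MH ratio."""
    y = proposal.sample(rng)
    log_alpha = (log_target(y) - log_target(x)) + (proposal.log_prob(x) - proposal.log_prob(y))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return y, True
    return x, False


def run_chain(n_steps, seed=0):
    rng = random.Random(seed)
    proposal = GaussianProposal()
    x, samples, accepted = 0.0, [], 0
    for _ in range(n_steps):
        x, ok = independence_mh_step(x, proposal, rng)
        samples.append(x)
        accepted += ok
    return samples, accepted / n_steps


samples, acc_rate = run_chain(20000)
frac_right = sum(s > 0 for s in samples) / len(samples)
print(f"acceptance rate: {acc_rate:.2f}, fraction in right mode: {frac_right:.2f}")
```

The acceptance ratio needs `proposal.log_prob` at both the current and the proposed point, which is why a model that can only generate samples (a GAN, for instance) would not fit here.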

576

But you also have equivalents for discrete

distributions.

577

And if you want...

578

generative model that would have those two

properties on discrete distribution, you

579

should turn yourself to autoregressive

models.

580

So I don't know if you've learned about

them, but the idea is just that they use a

581

factorization of probability distributions

that is just with conditional

582

distributions.

583

And that's something that in theory has

full expressivity, that any distribution

584

can be written as a factorized

distribution where you are conditioning progressively

585

on the degrees of freedom that you have

already sampled.

586

And you can rewrite the algorithm,

training an autoregressive model in the

587

place of a normalizing flow.
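
As a rough illustration of that factorization, p(s) = p(s_1) p(s_2 | s_1) ... p(s_n | s_1, ..., s_{n-1}), here is a small Python sketch of an autoregressive model over binary spins. It offers the same two operations as a flow: exact sampling and an exact log-probability. The logistic parameterization with random weights is an assumption made for the example, not something from the episode, and the class name is hypothetical.

```python
import itertools
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BinaryAutoregressive:
    """p(s) = prod_i p(s_i | s_1..s_{i-1}) over spins s_i in {-1, +1}."""

    def __init__(self, n, rng):
        self.n = n
        self.bias = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        # weights[i] only has entries for the spins that come before spin i.
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(i)] for i in range(n)]

    def _prob_up(self, i, prefix):
        """Conditional probability that spin i is +1 given the earlier spins."""
        z = self.bias[i] + sum(w * s for w, s in zip(self.weights[i], prefix))
        return sigmoid(z)

    def sample(self, rng):
        """Draw one exact sample by sampling the spins one after the other."""
        spins = []
        for i in range(self.n):
            spins.append(1 if rng.random() < self._prob_up(i, spins) else -1)
        return spins

    def log_prob(self, spins):
        """Exact log-probability: the sum of the conditional log-probabilities."""
        total = 0.0
        for i, s in enumerate(spins):
            p_up = self._prob_up(i, spins[:i])
            total += math.log(p_up if s == 1 else 1.0 - p_up)
        return total


rng = random.Random(0)
model = BinaryAutoregressive(n=5, rng=rng)
config = model.sample(rng)

# Sanity check: the probabilities of all 2^n configurations sum to one.
total_prob = sum(
    math.exp(model.log_prob(list(c))) for c in itertools.product((-1, 1), repeat=5)
)
print(config, model.log_prob(config), total_prob)
```

Because each conditional is normalized by construction, the product is normalized too, so no partition function has to be estimated.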

588

So honest answer, I haven't tried, but it

can be done.

589

Well, it can be done.

590

And now that I'm thinking about it, people

have done it because in statistical

591

mechanics, there are a lot of systems that

we like.

592

a lot of our toy systems that are binary.

593

So that's, for example, the Ising model,

which is a model of spins that are just

594

binary variables.

595

And I know of at least one paper where

they are doing something of this sort.

596

So making jumps, they're actually not

trying to refresh full configurations, or

597

they are doing two, both refreshing full

configurations and partial configurations.

598

And they are doing...

599

something that, in essence, is exactly

this algorithm, but with discrete

600

variables.

601

So I'll happily add the reference to this

paper, which is, I think, it's by the

602

group of Giuseppe Carleo from EPFL.

603

And OK, I haven't, I don't think they

train exactly like, so it's not exactly

604

the same algorithm, but things around this

have been tested.

605

OK, well, it sounds like a.

606

Sounds like fun, for sure.

607

Definitely something I'm sure lots of

people would like to test.

608

So folks, if you have some discrete

parameters somewhere in your models, maybe

609

you'll be interested by normalizing flows.

610

So the FlowMC package is in the show

notes.

611

Feel free to try it out.

612

Another thing I'm curious about is how do

you train the neural network, actually?

613

And how much of a bottleneck is it on the

sampling time, if any?

614

Yes.

615

So it will definitely depend on the space.

616

No, let me rephrase.

617

The thing is, whether or not it's going to

be worth it to train a neural network in

618

order to help you sampling.

619

depends on how difficult it is for you to

sample in, I mean, with the more

620

traditional MCMCs that you have on your

hand.

621

So again, if you have a multimodal

distribution, it's very likely that your

622

traditional MCMC algorithms are just not

going to cut it.

623

And so then, I mean, if you really care

about sampling this posterior distribution

624

or this distribution of configurations of

a physical system,

625

then you will be willing to pay the price

on this sampling.

626

So instead of, say, having to use a local

sampler that will take you billions of

627

iterations in order to see transitions

between the modes, you can train a

628

normalizing flow or the autoregressive

model if you're discrete, and then have

629

those jumps happening every other time.

630

Then it's more than clear that it's worth

doing it.
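
The contrast between a purely local sampler and one that also makes global jumps "every other time" can be sketched on a toy problem. The Python below is only an illustration under stated assumptions: the target is a 1D mixture with modes at -8 and +8, the "learned" global proposal is hand-written as a slightly too wide mixture on the same modes rather than a trained flow, and all function names are hypothetical.

```python
import math
import random

MODES = (-8.0, 8.0)


def log_mixture(x, std):
    """Log-density (up to an additive constant) of an equal-weight Gaussian mixture on MODES."""
    terms = [-0.5 * ((x - c) / std) ** 2 - math.log(std) for c in MODES]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_target(x):
    return log_mixture(x, std=1.0)


def log_proposal(x):
    # The dropped constant cancels in the acceptance ratio below.
    return log_mixture(x, std=1.5)


def accept(log_alpha, rng):
    return rng.random() < math.exp(min(0.0, log_alpha))


def local_step(x, rng, step=0.5):
    """Random-walk Metropolis: small moves around the current point."""
    y = x + rng.gauss(0.0, step)
    return y if accept(log_target(y) - log_target(x), rng) else x


def global_step(x, rng):
    """Independence Metropolis-Hastings with the global proposal."""
    y = rng.gauss(rng.choice(MODES), 1.5)
    log_alpha = log_target(y) - log_target(x) + log_proposal(x) - log_proposal(y)
    return y if accept(log_alpha, rng) else x


def count_mode_switches(n_steps, use_global, seed=0):
    rng = random.Random(seed)
    x, switches = MODES[0], 0
    for t in range(n_steps):
        # With use_global, every other step is a global jump.
        if use_global and t % 2 == 1:
            new_x = global_step(x, rng)
        else:
            new_x = local_step(x, rng)
        if (new_x > 0) != (x > 0):
            switches += 1
        x = new_x
    return switches


local_only = count_mode_switches(5000, use_global=False)
with_jumps = count_mode_switches(5000, use_global=True)
print(f"mode switches, local only: {local_only}; with global jumps: {with_jumps}")
```

In this toy setting the local chain stays in the mode it started in, while the interleaved chain crosses between modes regularly. The cost of obtaining a good global proposal (here given for free) is exactly the training price discussed above.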

631

OK, yeah, so the answer is it depends

quite a lot.

632

Of course, of course.

633

Yeah, yeah.

634

And I guess, how does it scale with the

quantity of parameters and quantity of

635

data?

636

So quantity of parameters, it's really

this dimension I was already discussing a

637

bit about and telling you that there is a

cap on what you can really expect these

638

methods will work on.

639

I would say that if the quantity of

parameters is something like tens or

640

hundreds, then things are going to work

well, more or less out of the box.

641

But if it's larger than this, you will

likely run into trouble.

642

And then the number of data is actually

something I'm less familiar with because

643

I'm less from the Bayesian communities

than the stat-mech community to start

644

with.

645

So my distributions don't have data

embedded in them, in a sense, most of the

646

time.

647

But for sure, what people argue, why it's

a really good idea to use generative

648

models such as normalizing flows to sample

in the Bayesian context.

649

is the fact that you have an amortization

going on.

650

And what do I mean by that?

651

I mean that you're learning a model.

652

Once it's learned, it's going to be easy

to adjust it if things are changing a

653

little.

654

And with little adjustments, you're going

to be able to sample still a very

655

complicated distribution.

656

So say you have data that is arriving

online, and you keep on having new samples

657

to be added to your posterior

distribution.

658

then it's very easy to just adjust the

normalizing flow with a few training

659

iterations to get back to the new

posterior you actually have now, given

660

that you have this amount of data.

661

So this is what some people call

amortization, the fact that you can really

662

encapsulate in your model all the

knowledge you have so far, and then just

663

adjust it a bit, and don't have to start

from scratch, as you would have to in

664

other.

665

Monte Carlo methods.

666

Yeah.

667

Yeah, so what I'm guessing is that maybe

the tuning time is a bit longer than a

668

classic HMC.

669

But then once you're out of the tuning

phase, the sampling is going to be way

670

faster.

671

Yes, I think that's a correct way of

putting it.

672

And otherwise, for the kind of the number

of, I mean, the dimensionality that the

673

algorithm is comfortable with.

674

In general, the running times of the

model, how have you noticed that being

675

like, has that been close to when you use

a classic HMC or is it something you

676

haven't done yet?

677

I don't think I can honestly answer this

question.

678

I think it will depend because it will

also depend how easily your HMC reaches

679

all the

680

regions you actually care about.

681

So I mean, probably there are some

distributions that are very easy for HMC

682

to cover and where it wouldn't be worth it

to train the model.

683

But then plenty of cases where things are

the other way around.

684

Yeah, yeah, yeah.

685

Yeah, I can guess.

686

That's always something that's really

fascinating in this algorithm world is how

687

dependent everything is on the model.

688

use case, really dependent on the model

and the data.

689

So on this project, on this algorithm,

what are the next steps for you?

690

What would you like to develop next on

this algorithm precisely?

691

Yes, so as I was saying, one of my main

questions is how to scale this algorithm

692

and

693

We kind of wrote it in an all-purpose

fashion.

694

And all-purpose is nice, but all-purpose

does not scale.

695

So that's really what I'm focusing on,

trying to understand how we can learn

696

structures we can know or we can learn

from the system, how to explore them and

697

put them in, in order to be able to tackle

more and more complex systems with higher,

698

I mean, more degrees of freedom.

699

So more parameters than what we are

currently doing.

700

So there's this.

701

And of course, I'm also very interested in

having some collaborations with people

702

that care about actual problems for which

this method is actually solving something

703

for them.

704

As it's really what gives you the idea of

what's next to be developed, what are the

705

next methodologies that

706

will be useful to people?

707

Can they already solve their problem?

708

Do they need something more from you?

709

And that's the two things I'm having a

look at.

710

Yeah.

711

Well, it definitely sounds like fun.

712

And I hope you'll be able to work on that

and come up with some new, amazing,

713

exciting papers on this.

714

I'll be happy to look at that.

715

And so that's it.

716

It was a great deep dive on this project.

717

And thank you for indulging my

questions, Marylou.

718

Now, if we want to zoom out a bit and talk

about other things you do, you also mention that you're

719

interested in machine learning in the context

of scarce data.

720

So I'm curious on what you're doing on

these, if you could elaborate a bit.

721

Yes, so I guess what I mean by scarce data

is precisely that when we are using

722

machine learning in scientific computing,

usually what we are doing is exploiting

723

the great tool that are deep neural

networks to play the role of a surrogate

724

model somewhere in our scientific

computation.

725

But most of the time, this is without data

a priori.

726

We know that there is a function we want

to approximate somewhere.

727

But in order to have data, either we have

to pay the price of costly experiments,

728

costly observations, or we have to pay the

price of costly numerics.

729

So if you, I mean, a very famous example

of applications of machine learning

730

to scientific computing is molecular

dynamics at quantum precision.

731

So this is what people call density

functional theory.

732

So if you want to.

733

observe the dynamics of a molecule with

the accuracy of what's going on really at

734

the level of quantum mechanics, then you

have to make very, very costly call to a

735

function that predicts what's the energy

predicted by quantum mechanics and what

736

are the forces predicted by quantum

mechanics.

737

So people have seen here an opportunity to

use deep neural nets in order to just

738

regress what's the value of this quantum

potential.

739

at the different locations that you're

going to visit.

740

And the idea is that you are creating your

own data.

741

You are deciding when you are going to pay

the price of doing the full numerical

742

computation and then obtain a training

point of given Cartesian coordinates, what

743

is the value of this energy here.

744

And then you have to, I mean, conversely

to what you're doing traditionally in

745

machine learning, where you believe that

you have...

746

huge data sets that are encapsulating a

rule, and you're going to try to exploit

747

them at best.

748

Here, you have the choice of where you

create your data.

749

And so you, of course, have to be as smart

as possible in order to have to create as

750

few training points as possible.

751

And so this is this idea of working with

scarce data that has to be infused in the

752

usage of machine learning in scientific

computing.

753

My example of application is just what we

have discussed, where we want to learn a

754

deep generative model where, when we

start, we just have our target

755

distribution as an objective, but we don't

have any sample from it.

756

That would be the traditional data that

people will be using in generative

757

modeling to train a generative model.

758

So if you want, we are playing this

adaptive game.

759

I was already a bit hinting at,

760

where we are creating data that is not

exactly the data we want, but that we

761

believe is informative of the data we want

to train the generative model that is in

762

turn going to help us to converge the MCMC

and in the same time as you are training

763

your model, generate the data you would

have needed to train your model.

764

Yeah, that is really cool.

765

And of course I asked about that because

scarce data is something that's extremely

766

common in the Bayesian world.

767

That's where usually Bayesian statistics

are the most, yeah, helpful and useful because

768

when you don't have a lot of data, you

need more structure and more priors.

769

So if you want to say anything about your

phenomenon of interest.

770

So that's really cool that you're working

on that.

771

I love that.

772

And from also, you know, a bit broader

perspective, you know MCMC really well.

773

You work on it a lot.

774

So I'm curious where you think MCMC is

heading in the next few years.

775

And if you see its relevance waning in

some way.

776

Well, I don't think MCMC can go out of

fashion in a sense because it's absolutely

777

ubiquitous.

778

So practical use cases are everywhere.

779

If you have a large probabilistic model,

usually it's given to you by the nature of

780

the problem you want to study.

781

And if you cannot choose anything about

putting in the right properties, you're

782

just going to be.

783

you know, left with something that you

don't know how to approach except by MCMC.

784

So it's absolutely ubiquitous as an

algorithm for probabilistic inference.

785

And I would also say that one of the

things that are going to, you know, keep

786

MCMC going for a long time is how much

it's a cherished object of study by

787

actually researchers from different

communities, because I mean...

788

You can see people really from statistics

that are kind of the prime researchers on,

789

okay, how should you make a Monte Carlo

method that has the best convergence

790

properties, the best speed of convergence,

and so on and so forth.

791

But you can also see that the fields where

those algorithms are used a lot, be it

792

statistical mechanics, be it Bayesian

inference, also have full communities that

793

are working on developing MCMCs.

794

And so I think it's really a matter that

they are an object of curiosity and

795

intriguing to a lot of people.

796

And therefore it's something that's for

now is still very relevant and really

797

unsolved.

798

I mean, something that I love about MCMC

is that when you look at it first, you

799

say, yeah, that's simple, you know?

800

Yeah.

801

Yes, that's, but then you start thinking

about it.

802

Then you...

803

I mean, realize how subtle are all the

properties of those algorithms.

804

And you're telling yourself, but I cannot

believe it's so hard to actually sample

805

from distributions that are not that

complicated when you're a naive newcomer.

806

And so, yeah, I mean, for now, I think

they are still here and in place.

807

And if I could even comment a bit more

regarding exactly the context of my

808

research, where

809

it could seemingly be the case that I'm

trying to replace MCMCs with machine

810

learning.

811

I would warn the listeners that it's not

at all what we are concluding.

812

I mean, that's not at all the direction we

are going to.

813

It's really a case where we need both.

814

That MCMC can benefit from learning, but

learning without MCMC is never going to

815

give you something that you have enough

guarantees on, that something that you can

816

really trust for sure.

817

So I think here there is a really nice

combination of MCMC and learning and that

818

they're just going to nurture each other

and not replace one another.

819

Yeah, yeah, for sure.

820

And I really love the, yeah, that these

projects of trying to make basically MCMC

821

more informed instead of having first

random draws, you know, almost random

822

draws with Metropolis in the end.

823

making that more complicated, more

informed with the gradients, with HMC, and

824

then normalizing flows, which try to

squeeze a bit more information out of the

825

structure that you have to make the

sampling go faster.

826

I found that one super useful.

827

And also, yeah, that's also a very, very

fascinating part of the research.

828

And this is part also of a lot of the

research

829

a lot of initiatives that you have focused

on, right?

830

Personally, basically, what we could

describe as machine-learning-assisted

831

scientific computing.

832

You know, and do you have other examples

to share with us on how machine learning

833

is helping traditional scientific

computing methods?

834

Yes.

835

So, for example, I was giving already the

example of

836

of the learning of the regression of the

potentials of molecular force fields in

837

people that are studying molecules.

838

But we are seeing a lot of other things

going on.

839

So there are people that are trying to

even use machine learning as a black box

840

in order to, how should I say, to make

classifications between things they care

841

about.

842

So for example, you have samples that come

from a model.

843

But you're not sure if they come from this

model or this other one.

844

You're not sure if they are above a

critical temperature or below a critical

845

temperature, if they belong to the same

phase.

846

So you can really try to play this game of

creating an artificial data set where you

847

know what is the answer, train a

classifier, and then use your black box to

848

tell you when you see a new configuration

which type of configuration it is.

849

And it's really.

850

given to you by deep learning because you

would have no idea why the neural net is

851

deciding that it's actually from this or

from this.

852

You don't have any other statistics that

you can gather and that will tell you

853

what's the answer and this is why.

854

But it's kind of like opening this new

conceptual door that sometimes there are

855

things that are predictable.

856

I mean, you can check that, okay, on the

data that you know the answer of the

857

machine is extremely efficient.

858

But then you don't know why things are

happening this way.

859

I mean, there's this, but there are plenty

of other directions.

860

So people that are, for example, using

neural networks to try to discover a

861

model.

862

And here, model would be actually what

people call partial differential

863

equations, so PDEs.

864

So I don't know if you've heard about

those physics-informed neural networks.

865

But there are neural networks that people

are training, such that they are solution

866

of a PDE.

867

So instead of actually having training

data, what you do is that you use the

868

properties of the deep neural nets, which

are that they are differentiable with

869

respect to their parameters, but also with

respect to their inputs.

870

And for example, you have a function f.

871

And you know that the Laplacian of f is

supposed to be equal to.

872

the derivative in time of f, well, you can

write mean squared loss on the fact that

873

the laplacian of your neural network has

to be close to its derivative in time.

874

And then, given boundary conditions, so

maybe initial condition in time and

875

boundary condition in space, you can ask a

neural net to predict the solution of the

876

PDE.
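
The loss described here, a mean squared penalty on the difference between the Laplacian and the time derivative, can be written down in a few lines. The Python sketch below is a simplified stand-in and not a physics-informed neural network: closed-form candidate functions replace the network, central finite differences replace automatic differentiation, and boundary and initial conditions are left out. The equation is the 1D heat equation, and the function names are hypothetical.

```python
import math
import random


def pde_residual(f, x, t, h=1e-3):
    """Residual of the 1D heat equation, d2f/dx2 - df/dt, at the point (x, t).

    Central finite differences stand in for the automatic differentiation
    that would be applied to a neural network's inputs.
    """
    d_t = (f(x, t + h) - f(x, t - h)) / (2.0 * h)
    d_xx = (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / (h * h)
    return d_xx - d_t


def residual_loss(f, points):
    """Mean squared PDE residual over a set of collocation points."""
    return sum(pde_residual(f, x, t) ** 2 for x, t in points) / len(points)


def exact_solution(x, t):
    # Satisfies the equation: d2f/dx2 = -f and df/dt = -f.
    return math.exp(-t) * math.sin(x)


def wrong_candidate(x, t):
    # Does not satisfy it: d2f/dx2 = -4f while df/dt = -f.
    return math.exp(-t) * math.sin(2.0 * x)


rng = random.Random(0)
# Collocation points are chosen freely in the domain; no observed data is needed.
points = [(rng.uniform(0.0, math.pi), rng.uniform(0.0, 1.0)) for _ in range(200)]

loss_exact = residual_loss(exact_solution, points)
loss_wrong = residual_loss(wrong_candidate, points)
print(f"loss for a true solution: {loss_exact:.2e}, for a wrong candidate: {loss_wrong:.2e}")
```

Training a network would amount to minimizing this kind of loss over its parameters, together with extra terms that enforce the initial and boundary conditions.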

877

And even better, you can give to your

878

learning mechanism a library of terms that

would be possible candidates for being

879

part of the PDE.

880

And you can let the network tell you which

terms of the PDE in the library are

881

the ones that actually seem to be in the data

you are observing.

882

So, I mean, there are all kinds of

inventive ways that researchers are now

883

using the fact that deep neural nets are

differentiable.

884

smooth, can generalize easily, and yes,

those universal approximators.

885

I mean, seemingly you can use neural nets

to represent any kind of function and use

886

that inside their computation problems to

try to, I don't know, answer all kinds of

887

scientific questions.

888

So it's, I believe, pretty exciting.

889

Yeah, yeah, that is super fun.

890

I love how

891

You know, this comes together to help on

really hard sampling problems like

892

sampling ODE's or PDE's, just extremely

hard.

893

So yeah, using that.

894

Maybe one day also we'll get something for

GPs.

895

I know for Gaussian processes a lot of

the effort is on decomposing them and

896

finding some useful

897

algebraic decompositions, so like the

Hilbert space Gaussian processes that Bill

898

Engels especially has added to the PyMC

API, or eigenvalue decomposition, stuff

899

like that.

900

But I'd be curious to see if there are

also some initiatives on trying to help

901

the convergence of Gaussian processes using

probably deep neural networks, because

902

there is a mathematical connection between

neural networks and GPs.

903

I mean, everything is a GP in the end, it

seems.

904

So yeah, using a neural network to

facilitate the sampling of a Gaussian

905

process would be super fun.

906

So I have so many more questions.

907

But I want to be mindful of your time, we've

already been recording for some time.

908

So I try to make my thoughts more packed.

909

But something I wanted to ask you

910

You teach actually a course in

Polytechnique in France that's called

911

Emerging Topics in Machine Learning.

912

So I'm curious to hear you say what are

some of the emerging topics that excite

913

you the most and how do you approach

teaching them?

914

So in this class, it's actually the nice

class where we have a wild card to just

915

talk about whatever we want.

916

So as far as I'm concerned, I'm really

teaching about the last point that we

917

discussed, which is how can we hope to use

the technology of machine learning to

918

assist scientific computing.

919

And I have colleagues that are jointly

teaching this class with me that are, for

920

example, teaching about optimal transport

or about private and federated learning.

921

So it can be different topics.

922

But we all have the same approach to it,

which is to introduce to the students the

923

main ideas quite briefly and then to give

them the opportunity to learn, to read

924

papers that we believe are important or at

least really illustrative of those ideas

925

and the direction in which the research is

going and to read these papers, of course,

926

critically.

927

So the idea is that we want to make sure

that they are understood.

928

We also want them to implement the

methods.

929

And once you implement the methods, you

realize everything that is sometimes under

930

the rug in the paper.

931

So where is it really difficult?

932

Where the method is really making a

difference?

933

And so on and so forth.

934

So that's our approach to it.

935

Yeah, that must be a very fun course.

936

At which level do you teach that?

937

So our students are third year at Ecole

Polytechnique.

938

So that would be equivalent to the first

year of graduate program.

939

Yeah.

940

And actually, looking forward, what do you

think are the most promising areas of

941

research in what you do?

942

So basically, interaction of machine

learning and statistical physics.

943

Well, I think something that actually has

been and will continue being a very, very

944

fruitful field between statistical

mechanics and machine learning are

945

generative models.

946

So you probably heard of diffusion models,

and they are a new kind of generative

947

models that are relying on learning how to

reverse a diffusion process, a diffusion

948

process that is noising the data and that,

949

once you've learned how to reverse it,

will allow you to transform noise into

950

data.
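
For readers who have not seen one, the noising half of such a model is easy to sketch. The Python below only shows the forward process, under assumptions of my own choosing: a variance-preserving update with a constant noise level `beta`, applied to toy 1D data. The learned reverse process, which is the actual generative model, is not shown, and the function names are hypothetical.

```python
import math
import random


def forward_noising(x0, n_steps, beta, rng):
    """Apply n_steps of x <- sqrt(1 - beta) * x + sqrt(beta) * noise to one data point."""
    x = x0
    for _ in range(n_steps):
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(0)
# Toy "data": two narrow clusters around 2 and 4, far from a standard Gaussian.
data = [rng.choice((2.0, 4.0)) + rng.gauss(0.0, 0.1) for _ in range(2000)]
noised = [forward_noising(x, n_steps=200, beta=0.05, rng=rng) for x in data]

print(f"data:   mean {mean(data):.2f}, variance {variance(data):.2f}")
print(f"noised: mean {mean(noised):.2f}, variance {variance(noised):.2f}")
```

After enough steps the structure of the data is gone and the samples look like standard Gaussian noise, which is the starting point from which a learned reverse process would generate new data.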

951

It's something that is really close to

statistical mechanics because the

952

diffusion really comes from studying

Brownian particles that are all around

953

us.

954

And this is where this mathematics comes

from.

955

And this is still an object of study in

the field of statistical mechanics.

956

And it has served a lot of machine

learning models.

957

I could also cite Boltzmann machines.

958

I mean, they even have the name of the

father of statistical mechanics,

959

Boltzmann.

960

And it's here again, I mean, something

where it's really inspiration from the

961

models studied by physicists that gave the

first forms of models that were used by

962

machine learners in order to do density

estimation.

963

So there is really this cross-fertilization that

964

has been here for, I guess, the last 50

years.

965

The field of machine learning has really

emerged in the communities.

966

And I'm hoping that my work and all the

groups that are working in this direction

967

are also going to demonstrate the other

way around, that generative models can

968

help also a lot in statistical mechanics.

969

So that's definitely what I am looking

forward to.

970

Yeah.

971

Yeah, I love that and understand why

you're talking about that, especially now

972

with the whole conversation we've had.

973

So your answer is not surprising to me.

974

Actually, something also that I mean, even

broader than that, I'm guessing you

975

already care a lot about these questions

from what I get, but if you could choose

976

the questions you'd like to see the answer

to before you die, what would they be?

977

That's obviously a very vast question.

978

If I stick to a bit really this...

979

what we've discussed about the sampling

problems and where I think they are hard

980

and why they are so intriguing.

981

I think that something I'm very keen on

seeing some progress around is this

982

question of sampling multimodal

distributions, but coming up with

983

guarantees.

984

Here, there's really, in a sense, sampling

a multimodal distribution could be

985

just

986

undoable.

987

I mean, there is some NP-hardness that is

hidden somewhere in this picture.
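
To make the difficulty concrete, here is a minimal Python sketch that is not taken from the episode. It assumes a toy target, an equal mixture of two unit-variance Gaussians centered at -10 and +10, and a plain random-walk Metropolis sampler started in the left-hand mode. The function names and all numerical settings are hypothetical choices for the example.

```python
import numpy as np

def log_density(x):
    """Unnormalized log-density of an equal mixture of N(-10, 1) and N(10, 1)."""
    return np.logaddexp(-0.5 * (x + 10.0) ** 2, -0.5 * (x - 10.0) ** 2)

def random_walk_metropolis(x0, n_steps=5000, step=0.5, seed=0):
    """Local sampler: propose a small Gaussian move, then accept or reject it."""
    rng = np.random.default_rng(seed)
    x, chain = x0, np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + step * rng.standard_normal()
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < log_density(proposal) - log_density(x):
            x = proposal
        chain[i] = x
    return chain

chain = random_walk_metropolis(x0=-10.0)
print("fraction of samples in the right-hand mode:", np.mean(chain > 0))
```

The target puts half of its mass on each mode, yet with these settings the chain stays in the mode where it started, because crossing the low-density region between the two modes requires a sequence of moves that is almost never accepted.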

988

So of course, it's not going to be

something general, but I'm really

989

wondering, I mean, I'm really thinking

that there should be some assumption, some

990

way of formalizing the problem under which

we could understand how to construct

991

algorithms that will provably, you know,

succeed in making this sampling happen.

992

And so here, I don't know, it's a

theoretical question, but I'm

993

very curious about what we will manage to

say in this direction.

994

Yeah.

995

And actually that sets us up, I think, for

the last two questions of the show.

996

So, I mean, I have other questions, but

we've already been recording for a long

997

time.

998

So I need to let you go and have dinner.

999

I know it's late for you.

Speaker:

So let me ask you the last two questions

Speaker:

I ask every guest at the end of the show.

Speaker:

First one.

Speaker:

If you had unlimited time and resources,

which problem would you try to solve?

Speaker:

I think it's an excellent question because

it's an excellent opportunity maybe to say

Speaker:

that we don't have unlimited resources.

Speaker:

I think it's probably the biggest

challenge we have right now to understand

Speaker:

and to collectively understand because I

think now we individually understand that

Speaker:

we don't have unlimited resources.

Speaker:

And in a sense the...

Speaker:

the biggest problem is how do we move this

complex system of human societies we have

Speaker:

created in order to move in the

direction where we are using precisely

Speaker:

fewer resources.

Speaker:

And I mean, it has nothing to do with

anything that we have discussed before,

Speaker:

but it feels to me that it's really where

the biggest question lies, the one that really

Speaker:

matters today.

Speaker:

And I have no clue how to approach it.

Speaker:

But

Speaker:

I think it's actually what matters.

Speaker:

And if I had unlimited time and

resources, that's definitely what I would

Speaker:

be researching towards.

Speaker:

Yeah.

Speaker:

Love that answer.

Speaker:

And you're definitely in good company.

Speaker:

Lots of people have talked about that for

this question, actually.

Speaker:

And second question, if you could have

dinner with any great scientific mind,

Speaker:

dead, alive, or fictional, who would it

be?

Speaker:

So, I mean, a logical answer given my last

response is actually Grothendieck.

Speaker:

So, I don't know, you probably know about

this mathematician who, I mean, was

Speaker:

somebody worried about, you know, our

relationship to the world, let's say, as

Speaker:

scientists very early on, and who had

concluded that to some extent we should

Speaker:

not be doing research.

Speaker:

So...

Speaker:

I don't know that I agree, but I also

don't think it's obviously wrong.

Speaker:

So I think it would be really probably one

of the most interesting discussions to be

Speaker:

had. On top of that, he was a fantastic

speaker.

Speaker:

And I do invite you to listen to his

lectures. And it would be really

Speaker:

fascinating to have this conversation.

Speaker:

Yeah.

Speaker:

Great.

Speaker:

Great answer.

Speaker:

You know, definitely the first one to

answer Grothendieck.

Speaker:

But that'd be cool.

Speaker:

Yeah.

Speaker:

If you have a favorite lecture of his,

feel free to put that in the show notes

Speaker:

for listeners, I think it's going to be

really interesting and fun for people.

Speaker:

Might be in French, but...

Speaker:

I mean, there are a lot of subtitles now.

Speaker:

If it's on YouTube, it's doing a pretty

good job at the automated transcription,

Speaker:

especially in English.

Speaker:

So I think it will be okay.

Speaker:

And that will be good for people's French

lessons.

Speaker:

So yeah, you know, two birds with one

stone.

Speaker:

So definitely include that now.

Speaker:

Awesome, Marylou.

Speaker:

So that was really great.

Speaker:

Thanks a lot for taking the time and being

so generous with your time.

Speaker:

I'm happy because I had a lot of

questions, but I think we did a pretty

Speaker:

good job at tackling most of them.

Speaker:

As usual,

Speaker:

I'll put resources and a link to your website

in the show notes for those who want to

Speaker:

dig deeper.

Speaker:

Thank you again, Marylou, for taking the

time and being on this show.

Speaker:

Thank you so much for having me.