Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
How does the world of statistical physics intertwine with machine learning, and what groundbreaking insights can this fusion bring to the field of artificial intelligence?
In this episode, we delve into these intriguing questions with Marylou Gabrié, an assistant professor at CMAP, École Polytechnique in Paris. Having completed her PhD in physics at École Normale Supérieure, Marylou ventured to New York City for a joint postdoctoral appointment at New York University’s Center for Data Science and the Flatiron Institute’s Center for Computational Mathematics.
As you’ll hear, her research is not just about theoretical exploration; it also extends to the practical adaptation of machine learning techniques in scientific contexts, particularly where data is scarce.
In this conversation, we’ll traverse the landscape of Marylou’s research, discussing her recent publications, her innovative approaches to machine learning challenges, the latest MCMC advances, and ML-assisted scientific computing.
Beyond that, get ready to discover the person behind the science – her inspirations, aspirations, and maybe even what she does when not decoding the complexities of machine learning algorithms!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉
Takeaways
- Developing methods that leverage machine learning for scientific computing can provide valuable insights into high-dimensional probabilistic models.
- Generative models can be used to speed up Markov Chain Monte Carlo (MCMC) methods and improve the efficiency of sampling from complex distributions.
- The Adaptive Monte Carlo algorithm augmented with normalizing flows offers a powerful approach for sampling from multimodal distributions.
- Scaling the algorithm to higher dimensions and handling discrete parameters are ongoing challenges in the field.
- Open-source packages, such as flowMC, provide valuable tools for researchers and practitioners to adopt and contribute to the development of new algorithms.
- The scaling of the algorithms depends on the quantity of parameters and data: while some methods work well with a few hundred parameters, larger quantities can lead to difficulties.
- Generative models, such as normalizing flows, offer benefits in the Bayesian context, including amortization and the ability to adjust the model with new data.
- Machine learning and MCMC are complementary and should be used together rather than replacing one another.
- Machine learning can assist scientific computing in the context of scarce data, where expensive experiments or numerics are required.
- The future of MCMC lies in the exploration of sampling multimodal distributions and understanding resource limitations in scientific research.
Links from the show:
- Marylou’s website: https://marylou-gabrie.github.io/
- Marylou on Linkedin: https://www.linkedin.com/in/marylou-gabri%C3%A9-95366172/
- Marylou on Twitter: https://twitter.com/marylougab
- Marylou on Github: https://github.com/marylou-gabrie
- Marylou on Google Scholar: https://scholar.google.fr/citations?hl=fr&user=5m1DvLwAAAAJ
- Adaptive Monte Carlo augmented with normalizing flows: https://arxiv.org/abs/2105.12603
- Normalizing-flow enhanced sampling package for probabilistic inference: https://flowmc.readthedocs.io/en/main/
- Flow-based generative models for Markov chain Monte Carlo in lattice field theory: https://journals.aps.org/prd/abstract/10.1103/PhysRevD.100.034515
- Boltzmann generators – Sampling equilibrium states of many-body systems with deep learning: https://www.science.org/doi/10.1126/science.aaw1147
- Solving Statistical Mechanics Using Variational Autoregressive Networks: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.122.080602
- An example of discrete version of similar algorithms: https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.3.L042024
- Grothendieck’s conference: https://www.youtube.com/watch?v=ZW9JpZXwGXc
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
How does the world of statistical physics intertwine with machine learning, and what groundbreaking insights can this fusion bring to the field of artificial intelligence?

In this episode, we'll delve into these intriguing questions with Marylou Gabrié. Having completed her doctorate in physics at École Normale Supérieure, Marylou ventured to New York City for a joint postdoctoral appointment at New York University's Center for Data Science and the Flatiron Institute's Center for Computational Mathematics.

As you'll hear, her research is not just about theoretical exploration, it also extends to the practical adaptation of machine learning techniques in scientific contexts, particularly where data are scarce.

In this conversation, we'll traverse the landscape of Marylou's research, discussing her recent publications and her innovative approaches to machine learning challenges. Beyond that, get ready to discover the person behind the science: her inspirations, aspirations, and maybe even what she does when she's not decoding the complexities of machine learning algorithms.

This is Learning Bayesian Statistics, episode 98, recorded November 23, 2023.

Let me show you how to be a good Bayesian and change your predictions.

Marylou Gabrié, welcome to Learning Bayesian Statistics.

Thank you very much, Alex, for having me.

Yes, thank you. And thank you to Virgile Andreani for putting us in contact. This is a French connection network here. So thanks a lot, Virgile. Thanks a lot, Marylou, for taking the time. I'm probably going to say "Marylou" the English way because it flows better in my English; if I say it the French way and then continue in English, I'm going to have the French accent, which nobody wants to hear. So let's start.
So I gave a bit of your background in the intro to this episode, Marylou, but can you define the work that you're doing nowadays and the topics that you are particularly interested in?

I would define my work as being focused on developing methods, and more precisely developing methods that use and leverage all the progress in machine learning for scientific computing. I have a special focus within this realm, which is to study high-dimensional probabilistic models, because they really come up everywhere. And I think they give us a very particular lens on our world. And so I would say I'm working broadly in this direction.

Well, that sounds like a lot of fun. So I understand why Virgile put me in contact with you. And could you start by telling us about your journey, actually, into the field of statistical physics, and how it led you to merge these interests with machine learning and what you're doing today?

Absolutely. My background is actually in physics, so I studied physics. Among the topics in physics, I quickly became interested in statistical mechanics. I don't know if all listeners would be familiar with statistical mechanics, but I would define it broadly as the study of complex systems with many interacting components. So it could be really anything. You could think of molecules, which are networks of interacting agents that have non-trivial interactions and that have non-trivial behaviors when put all together within one system. And I think it's a really important viewpoint on the world today, as I was saying, to look at those big macroscopic systems that you can study probabilistically. And so I was quickly interested in this field that is statistical mechanics.

And at some point machine learning entered the picture. The way it did is that I was looking for a PhD in 2015, and I had some friends that were, you know, students in computer science and kind of early comers to machine learning. And so I started to know that it existed. I started to know that actually deep neural networks were revolutionizing the field, that you could expect a program to, I don't know, give names to people in pictures. And I thought, well, if this is possible, I really want to know how it works. I really don't want this technology to sound like magic to me, and I want to know about it. And so this is how I started to become interested, and to find out that people knew how to make it work, but not how it worked, why it worked so well.

And so this is how I, in the end, was put into contact with Florent Krzakala, who was my PhD advisor. And I started to have this angle of trying to use the statistical mechanics framework to study deep neural networks, which are precisely those complex systems I was just mentioning, and that are so big that we are having trouble making real sense of what they are doing.

Yeah, I mean, that must be quite... Indeed, it must be quite challenging. We could already dive into that. That sounds like fun. Do you want to talk a bit more about that project?
Since then, I really shifted my angle. I studied in this direction for, say, three, four years. Now, I'm actually going back to really the applications to real-world systems, let's say, using all the potentialities of deep learning. So it's like the same intersection, but looking at it from the other side. Now I'm really looking at applications and using machine learning as a tool, where before I was looking at machine learning as my object of study and using statistical mechanics. So I'm keen on talking about what I'm doing now.

Yeah. So basically you changed, now you're doing it the other way around, right? You're studying statistical physics with machine learning tools instead of doing the opposite. And so, yeah, what does that look like? What does that mean concretely? Maybe can you talk about an example from your own work so that listeners can get a better idea?

Yeah, absolutely. So, as I was saying, statistical mechanics is really about large systems that we study probabilistically. And here there's a tool, I mean, that would be one of the, I would say, most active directions of research in machine learning today, which is generative models. And they are very natural because they are ways of making probabilistic models that you can control, that you can produce samples from within one command, where you would be in need of much more challenging algorithms if you wanted to do that for a general physical system. So we have those machines that we can leverage and that we can actually combine with our typical computation tools, such as Markov chain Monte Carlo algorithms, and that will allow us to speed up the algorithms. Of course, it requires some adaptation compared to what people usually do in machine learning and how those generative models were developed, but it's possible and it's fascinating to try to make those adaptations.

Hmm. So, yeah, that's interesting, because if I understand correctly, you're saying that one of the aspects of your job is to understand how to use MCMC methods to speed up these models?

Actually, it's the other way around: it's how to use those models to speed up MCMC methods.

Okay. Can you talk about that? That sounds like fun.

Yeah, of course. Say MCMC algorithms, so Markov chain Monte Carlo, are really the go-to algorithm when you are faced with a probabilistic model that is describing whichever system you care about. Say it might be a molecule, and this molecule has a bunch of atoms, and so you know that you can describe your system, I mean at least classically, at the level of giving the Cartesian coordinates of all the atoms in your system. And then you can describe the equilibrium properties of your system by using the energy function of this molecule. So if you believe that you have an energy function for this molecule, then you believe that it's distributed as the exponential of minus beta times the energy. This is the Boltzmann distribution.
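For reference, the Boltzmann distribution mentioned here can be written, with β the inverse temperature and Z the (usually intractable) normalizing constant, as:

```latex
p(x) \;=\; \frac{e^{-\beta E(x)}}{Z}, \qquad Z = \int e^{-\beta E(x)}\, dx .
```

MCMC only needs pointwise evaluations of E(x), never Z, which is what makes it the go-to tool for such models.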
And then, okay, you are left with your probabilistic model. And if you want to approach it, a priori you have no control over what this energy function is imposing as constraints. It may be very, very complicated. Well, the go-to algorithm is Markov chain Monte Carlo. And it's a go-to algorithm that is "always" going to work, and here I'm putting quotes around this, because it's going to be a greedy algorithm that is going to be looking for plausible configurations next to other plausible configurations. And locally, it makes a search on the configuration space, tries to visit it, and then what it visits will be representative of the thermodynamics.

Of course, it's not that easy. And although you can make such a local search, sometimes it's really not enough to fully describe the probabilistic model, in particular how different regions of your configuration space are related to one another. So if I come back to my molecule example, it would be that I have two different, let's say, conformations of my molecule, two main templates that my molecule is going to look like. And they may be divided by what we call an energy barrier, or, in the language of probabilities, it's just low-probability regions in between high-probability regions. And in this case, local MCMCs are gonna fail. And this is where we believe that generative models could help us, and, let's say, fill this gap to answer some very important questions.

And how would that work then? Like, would you run a first model that would help you infer that, and then use that in the MCMC algorithm? Or, like, yeah, what does that look like?

I think your intuition is correct. So you cannot do it in one go. And, for example, the paper that I published, I think it was last year, in PNAS, called Adaptive Monte Carlo Augmented with Normalizing Flows, is precisely implementing something where you have feedback loops. So the idea is that the fact that you have those local Monte Carlos that you can run within the different regions you have identified as being interesting will help you to seed the training of a generative model that is going to target generating configurations in those different regions. Once you have this generative model, you can include it in your Markov chain strategy: you can use it as a proposal mechanism to propose new locations for your MCMC to jump to. And so you're creating a Monte Carlo chain that is going to slowly converge towards the target distribution you're really after. And you're gonna do it by using the data you're producing to train a generative model that will help you produce better data as it's incorporated within the MCMC kernel you are actually jumping with. So you have this feedback mechanism that makes things work.

And this idea of adaptivity really stems from the fact that in scientific computing, we are going to do machine learning with scarce data. We are not going to have all the data we wish we had to start with, but we are going to have these types of methods where we are doing things in what we call an adaptive way. So it's: do, record information, do again. In a few words.
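To make that feedback loop concrete, here is a minimal, editorial sketch in Python — not the paper's implementation, and with the normalizing flow replaced by a single Gaussian fit (SciPy's multivariate_normal) so the example stays self-contained. Local moves inside each region of interest generate data, the model is refit on that data, and the model is then used as an independence proposal for global jumps:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_target(x):
    # Toy bimodal target in 2D: equal mixture of two well-separated Gaussians.
    return np.logaddexp(-0.5 * np.sum((x - 3.0) ** 2),
                        -0.5 * np.sum((x + 3.0) ** 2))

def local_step(x, step=0.5):
    # Local random-walk Metropolis move (stand-in for MALA/HMC in the real method).
    prop = x + step * np.random.randn(*x.shape)
    return prop if np.log(np.random.rand()) < log_target(prop) - log_target(x) else x

def adaptive_sampler(n_outer=200, n_local=20, dim=2):
    # One walker per region identified as interesting (here, the two known modes).
    walkers = [np.full(dim, 3.0), np.full(dim, -3.0)]
    history = []
    for _ in range(n_outer):
        # 1) Local exploration within each region produces training data.
        for k in range(len(walkers)):
            for _ in range(n_local):
                walkers[k] = local_step(walkers[k])
                history.append(walkers[k].copy())
        # 2) "Train" the generative model on the data produced so far
        #    (a Gaussian fit here; the paper trains a normalizing flow).
        data = np.array(history)
        model = multivariate_normal(data.mean(0), np.cov(data.T) + 1e-3 * np.eye(dim))
        # 3) Use the model as an independence-Metropolis proposal for global jumps.
        for k in range(len(walkers)):
            prop = model.rvs()
            log_alpha = (log_target(prop) + model.logpdf(walkers[k])
                         - log_target(walkers[k]) - model.logpdf(prop))
            if np.log(np.random.rand()) < log_alpha:
                walkers[k] = prop
    return np.array(history)

samples = adaptive_sampler()
```

In the actual method, the Gaussian would be a normalizing flow trained by maximum likelihood on the accumulated samples, which is what lets the proposal match a genuinely multimodal target.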
Yeah. Yeah, yeah. So, I mean, if I understand correctly, it's a way of going one step further than what HMC is already doing, where we're looking at the gradients and we're trying to adapt based on that. Now, basically, the idea is to find some way of getting even more information as to where the next sample should come from, from the typical set, and then being able to navigate the typical set more efficiently?

Yes. Yes, so let's say that it's an algorithm that is more ambitious than HMC. Of course, there are caveats. But HMC is trying to follow a dynamics to try to travel towards interesting regions. But it has to be tuned quite finely in order to actually end up in the next interesting region, provided that it started from one. And so, to cross those energy barriers, here with machine learning we would really be jumping over energy barriers. We would have models that pretty much only target the interesting regions and just don't care about what's in between. And that really focuses the effort where you believe it matters. However, there are cases in which those machine learning models will have trouble scaling, where HMC would be more robust. So there is of course always a trade-off with the algorithms that you are using, between how efficient they can be per MCMC step and how general you can expect them to be.

Hmm. I see. Yeah. So, actually, yeah, one of my questions would be: when do you think this kind of new algorithm would be interesting to use instead of the classic HMC? Like, in which cases would you say people should give that a try instead of using the classic, robust HMC method we have right now?

So that's an excellent question. I think right now, so on paper, the algorithm we propose is really, really powerful, because it will allow you to jump throughout your space and so to decorrelate your MCMC configurations extremely fast. However, for this to happen, you need the proposal that is made by your deep generative model as a new location, I mean a new configuration in your MCMC chain, to be accepted. So in the end, you don't have anymore the fact that you are jumping locally and that your decorrelation comes from the fact that you are going to make lots of local jumps. Here you can decorrelate in one step, but you need to accept. So the acceptance will be really what you need to care about in running the algorithm. And what is going to determine whether or not your acceptance is high is actually the agreement between your deep generative model and the target distribution you're after.
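The acceptance rate being discussed is the standard Metropolis–Hastings ratio for an independence proposal drawn from the trained model q_θ, with p the (possibly unnormalized) target:

```latex
a(x \to x') \;=\; \min\!\left(1,\; \frac{p(x')\, q_\theta(x)}{p(x)\, q_\theta(x')}\right).
```

The closer q_θ is to p, the closer this ratio stays to one, and the more of the global jumps get accepted.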
And we have traditional, you know, challenges here in making the generative model look exactly like the target we want. There are issues with scalability, and there are issues with, I would say, constraints. So, let's say you're interested in Bayesian inference, so another case where we can apply these kinds of algorithms, right? Because you have a posterior and you just want to sample from your posterior to make sense of it, in dimension 10, 100. I tell you, I know how to train normalizing flows, which are the specific type of generative models we are using here, in 10 or 100 dimensions. So if you believe that your posterior is multimodal, that it will be hard for traditional algorithms to visit the entire landscape and equilibrate because there are some low-density regions in between high-density regions: go for it. If you actually are an astronomer and you want to marginalize over your initial conditions on a grid that represents the universe, and actually the posterior distribution you're interested in is on, you know, variables that are in millions of dimensions — I'm sorry, we're not going to do it with you, and you should actually use something that is more general, something that will use a local search, but that is actually going to, you know, be imperfect, right? Because it's going to be very, very hard also for this algorithm to work. But the magic of the machine learning will not scale yet to this type of dimension.

Yeah, I see. And is that an avenue you're actively researching, basically how to scale these algorithms better, to bigger scales?

Yeah, of course. Of course we can always try to do better. So, I mean, as far as I'm concerned, I'm also very interested in sampling physical systems. And in physical systems, there is a lot of, you know, prior information that you have on the system. You have symmetries, you have, I don't know, yeah, physical rules that you know the system has to fulfill. Or maybe some, I don't know, multi-scale property of the probability distribution; you know that there are some self-similarities. You have information you can try to exploit in two ways. Either in the sampling part, so you're having this coupled MCMC with the generative models, so either in the way you make proposals — you can try to symmetrize them, you can try to exploit the symmetry by any means. Or you can also directly put it in the generative model. So those are things that really are crucial. And we understand very well nowadays that it's naive to think you will learn it all. You should really use as much information on your system as you may, as you can. And after that, you can go one step further with machine learning. But in non-trivial systems, it would be, I mean, a bit deceiving to believe that you could just learn things.

Yeah. I mean, I completely resonate with that. It's definitely something we will always tell students or clients: like, don't just, you know, throw everything you can at the model and just try to pray that the model works like that. Actually, you should probably use a generative perspective to try and find out what the best way of thinking about the problem is, what would be the good enough, simple enough model that you can come up with, and then try to run that. Yeah, so definitely, I think that resonates with a lot of the audience: think generatively. And from what I understand from what you said, it's also trying to put as much knowledge and information as you have in your generative model. The deep neural network is here, the normalizing flow is here to help, but it's not going to be a magical solution to a suboptimally specified model.

Yes, yes. Of course, in all those problems, what's hidden behind is the curse of dimensionality. If we are trying to learn something in very high dimension, it could be arbitrarily hard. It could be that you cannot learn something in high dimension just because you would need to observe all the locations in this high dimension to get the information. Of course, this is in general not the case, because what we are trying to learn has some structure, some underlying structure that is actually described by fewer dimensions, and you actually need fewer observations to learn it. But the question is: how do you find those structures, and how do you put them in? Therefore, we need to take into account as much of the knowledge we have on the system as possible to make this learning as efficient as possible.

Yeah, yeah, yeah. Now, I mean, that's super interesting. And that's your paper, Adaptive Monte Carlo Augmented with Normalizing Flows, right?
So this is the paper where we did this generally. And I don't have a paper out yet where we are trying to really put the structure in the generative models. But that's the direction I'm actively working on.

Okay, yeah. I mean, so for sure, we'll put that paper I just cited in the show notes for people who want to dig deeper. And also, if by the time this episode is out you have the paper or a preprint, feel free to add that to the show notes, or just tell me and I'll add it to the show notes. That sounds really interesting for people to read. And so I'm curious, like, you know, this idea of normalizing flows, a deep neural network, to help MCMC sample faster, converge faster to the typical set — what was the main objective of doing that? I'm curious why you even started thinking and working on that.

So yes, I think for me, the answer is really this question of multimodality. So the fact that you may be interested in a probability distribution for which it's very hard to connect the different interesting regions. In statistical mechanics, it's something that we actually call metastability. I don't know if it's a word you've already heard, but where some communities talk about multimodality, we talk about metastability. And metastability is at the heart of many interesting phenomena in physics, be it phase transitions. And therefore, it's something very challenging in the computations, but at the same time, very crucial that we have an understanding of. So for us, it felt like there was this big opportunity with those probabilistic models that were so malleable, that were so, I mean, of course, hard to train, but then they give you so much. They give you an exact value for the density that they encode, plus the possibility of sampling from them very easily, getting just a bunch of IID samples in one run through a neural network. So for us, there was really this opportunity of studying multimodal distributions, in particular metastable systems from statistical mechanics, with those tools.

Yeah. Okay. So in theory, these normalizing flows are especially helpful to handle multimodal posteriors. I didn't get that at first, so that's interesting.

Yep. That's really what they're going to offer you: the possibility to make large jumps, actually to make jumps within your Markov chain that can go from one location of high density to another one, just in one step. So this is what you are really interested in. Well, first of all, in one step, so you're going far in one step. And second of all, regardless of how low the density is between them. Because if you were to run some other type of local MCMC, you would, in a sense, need to find a path between the two modes in order to visit both of them. In our case, it's not true. You're just completely jumping out of the blue thanks to your normalizing flow, which is trying to mimic your target distribution, and which therefore has developed mass everywhere that you believe matters, and from which you can very easily produce an IID sample anywhere on its support.

I see, yeah. And I'm guessing you did some benchmarks for the paper?

So I think that's actually a very interesting question you're asking, because I feel benchmarks are extremely difficult, both in MCMC and in deep learning. So, I mean, you can make benchmarks that say, okay, I changed the architecture and I see that I'm getting something different. But otherwise, I think it's one of the big challenges that we have today. So if I tell you, okay, with my algorithm I can write an MCMC that is going to mix between the different modes, between the different metastable states — that's something that I don't know how to do by any other means. So the benchmark is: actually, you won. There is nothing to be compared with, so that's fine.
But if I need to compare on other cases, where actually I can find those algorithms that will work, but I know that they are probably going to take more iterations, then I still need to factor in a lot of things in my truly honest benchmark. I need to factor in the fact that I ran a lot of experiments to choose the architecture of my normalizing flow. I ran a lot of experiments to choose the hyperparameters of my training, and so on and so forth. And I don't see how we can make those honest benchmarks nowadays. So I can make one, but I don't think I would think very highly of it — I mean, you know, that it's really revealing some profound truth about which solution is really working. The only way of making an honest benchmark would be to take different teams, give them problems, and lock them in a room and see who comes out first with the solution. But, I mean, how can we do that?

Well, we can call on listeners who are interested in doing the experiments to contact us. That would be the first thing. But yeah, that's actually a very good point. And in a way, that's a bit frustrating, right? Because then it means, at least experimentally, it's hard to differentiate between the efficiency of the different algorithms. So I'm guessing the claims that you make about this new algorithm being more efficient for multimodalities are based on the theoretical underpinning of the algorithm?

No, I mean, it's just based on the fact that I don't know of any other algorithm which, under the same premises, can do that. So, I mean, it's an easy way out of making any benchmark, but also a powerful one, because I really don't know what to compare to. But indeed, I think then... As far as I'm concerned, I'm mostly interested in developing methodologies. I mean, that's just what I like to do. But of course, what's important is that those methods are going to work and they are going to be useful to some communities that really have research questions that they want to answer. I mean, research or not — it could actually be engineering questions, decisions to be taken that require doing an MCMC. And I think the true test of whether or not the algorithm is useful is going to be just this: the test of time. Are people adopting the algorithms? Are they seeing that this is really something that they can use and that would make their inference work where they could not find another method that was as efficient?

And in this direction, there is this close collaborator, Kaze Wong, who is working at the Flatiron Institute and with whom we developed a package that is called flowMC, which is written in JAX and implements these algorithms. And the idea was really to try to write a package that was as user-friendly as possible. So of course, we have the time we have to take care of it, and the experience we have with, you know, the available software out there, but we really try hard. And at least in this community of people studying gravitational waves, it seems that people are really trying, starting to use this in their research. And so I'm excited, and I think it is useful. But it's not the proper benchmark you would dream of.

Yeah, you just stole one of my questions. Basically, I was exactly going to ask you: but then, how can people try this? Is there a package somewhere? So yeah, perfect. That's called flowMC, you told me.

Yes, it's called flowMC. You can pip install flowMC, and you will have it.
If you are allergic to JAX...

Right, I have it here. Yeah, there is a Read the Docs. So I'll put that in the show notes for sure.

Yes, we even have documentation. That's how far you go when you are committed to having something that is used and useful. So, I mean, of course, we are also open to both comments and contributions. So just write to us if you're interested.

Yeah, for sure. Yeah, so folks, if you are interested in contributing, if you see any bugs, make sure to open some issues on the GitHub repo or, even better, contribute pull requests. I'm sure Marylou and the co-authors will be very happy about that.

Yes — you know, typos in the documentation, all of this.

Yeah, exactly. That's what I tell everyone who wants to start doing some open-source work: start with the smallest PRs. You don't have to write a new algorithm; already fixing typos, making the documentation look better, and stuff like that — that's extremely valuable, and that will be appreciated. So for sure, do that, folks. Do not be shy with that kind of stuff. So yeah, I already put the paper you have out on arXiv, Adaptive Monte Carlo Augmented with Normalizing Flows, and flowMC — I put that in the show notes.

And yeah, to get back to what you were saying, basically, I think, as more of a practitioner than a person who develops the algorithms, I would say the reasons I would, you know, adopt that kind of new algorithm would be that, well, I know, okay, that algorithm is specialized, especially for handling multimodal posteriors. So then, if I have a problem like that, I'll be like, oh, okay, yeah, I can use that. And then also ease of adoption. So, is there an open-source package, in which languages, can I just, you know... What kind of trade-off basically do I have to make? Is that something that's easy to adopt? Is that something with really a lot of barriers to adoption, but at the same time it really seems to be solving my problem? You know what I'm saying? It's like, indeed, it's not only the technical and theoretical aspects of the method, but also how easy it is to adopt in your existing workflows.

Yes. And for this, I guess, I mean, the feedback is extremely valuable, because when you know the methods, it's hard to exactly locate where people will not understand what you meant. And so it's really welcome.

No, for sure. And already I find it absolutely incredible that now almost all new algorithms, at least that I talk about on the podcast and that I see in the community, in the PyMC community — almost all of them now, when they come out with a paper, they come out with an open-source package that's usually installable in the Python ecosystem. Which is really incredible. I remember that when I started on this a few years ago, it was really not the norm and much more the exception, and now the accompanying open-source package is almost part of the paper, which is really good, because way more people are going to use the package than read the paper. So this is absolutely a fantastic evolution. And thank you, in the name of us all, for having taken the time to develop the package, clean up the code, put that on PyPI, and make the documentation. Because that's where the academic incentives are a bit misaligned with what I think they should be. Because, unfortunately, it literally takes time for you to do that. And it's not very much appreciated by the academic community, right? It's just like, you have to do it, but they don't really care. We care as the practitioners, but the academic world doesn't really. And what counts is the paper. So for now, unfortunately, it's really just time that you take out of your paper-writing time.
So I'm sure everybody appreciates it.

Yes, well, I don't know. I see true value in it. And I think, although it's maybe not as rewarded as it should be, many of us see value in doing it. So you're very welcome.

Yeah, yeah. No, for sure. Lots of value in it. Just saying that value should be more recognized. Just a random question, but something I'm always curious about — I think I know the answer, but I still want to ask. Can you sample discrete parameters with these algorithms? Because that's one of the holy grails of the field right now: how do you sample discrete parameters?

So, okay, what I've implemented and tested is all on continuous spaces. But what I need for this algorithm to work is a generative model that I can sample from easily — IID, I mean: not one where I have to run a Monte Carlo to sample from my model, but one from which I can, in one Python command, or whichever language you want, get an IID sample — and for which I can write down the likelihood of that sample. Because a lot of generative models actually don't have tractable likelihoods. So if you think, I don't know, of generative adversarial networks or variational autoencoders, for people who might be familiar with those very, very common generative models, they don't have this property. You can generate samples easily, but you cannot write down with which probability density you've generated this sample. This is really what we need in order to use this generative model inside a Markov chain, and inside an algorithm that we know is going to converge towards the target distribution. So normalizing flows are playing this role for us with continuous variables: they give us easy sampling and easy evaluation of the likelihood.

But you also have equivalents for discrete distributions. And if you want a generative model that has those two properties on discrete distributions, you should turn to autoregressive models. I don't know if you've heard about them, but the idea is just that they use a factorization of probability distributions that is just with conditional distributions. And that's something that in theory has full expressivity: any distribution can be written as a factorized distribution where you progressively condition on the degrees of freedom that you have already sampled. And you can rewrite the algorithm, training an autoregressive model in the place of a normalizing flow.
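The factorization she refers to is the chain rule of probability, which any distribution over d variables satisfies:

```latex
p(x_1, \dots, x_d) \;=\; \prod_{i=1}^{d} p(x_i \mid x_1, \dots, x_{i-1}).
```

An autoregressive model parameterizes each conditional and samples the variables one at a time in a fixed order, which gives exactly the two properties needed here — exact likelihood evaluation and cheap IID sampling — now for discrete variables.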
So, honest answer: I haven't tried, but it can be done. Well, it can be done — and now that I'm thinking about it, people have done it, because in statistical mechanics there are a lot of systems that we like, a lot of our toy systems, that are binary. That's, for example, the Ising model, which is a model of spins that are just binary variables. And I know of at least one paper where they are doing something of this sort. So making jumps — they're actually not only trying to refresh full configurations, or rather they are doing both: refreshing full configurations and partial configurations. And they are doing something that, in essence, is exactly this algorithm, but with discrete variables. So I'll happily add the reference to this paper, which is, I think, by the group of Giuseppe Carleo from EPFL. And, okay, I don't think they train in exactly the same way, so it's not exactly the same algorithm, but things around this have been tested.

OK, well, it sounds like... sounds like fun, for sure. Definitely something I'm sure lots of people would like to test. So folks, if you have some discrete parameters somewhere in your models, maybe you'll be interested in normalizing flows. The flowMC package is in the show notes; feel free to try it out. Another thing I'm curious about is: how do you actually train the neural network? And how much of a bottleneck is it on the sampling time, if any?

Yes. So it will definitely depend on the space. No, let me rephrase. The thing is, whether or not it's going to be worth it to train a neural network in order to help your sampling depends on how difficult it is for you to sample with the more traditional MCMCs that you have at hand. So again, if you have a multimodal distribution, it's very likely that your traditional MCMC algorithms are just not going to cut it. And so then, I mean, if you really care about sampling this posterior distribution, or this distribution of configurations of a physical system, then you will be willing to pay the price for this sampling. So instead of, say, having to use a local sampler that will take you billions of iterations in order to see transitions between the modes, you can train a normalizing flow, or an autoregressive model if you're discrete, and then have those jumps happening every other time. Then it's more than clear that it's worth doing it.

OK, yeah, so the answer is: it depends quite a lot.

Of course, of course.

Yeah, yeah. And I guess, how does it scale with the quantity of parameters and the quantity of data?

So, quantity of parameters: it's really this dimension I was already discussing a bit, telling you that there is a cap on what you can really expect these methods to work on. I would say that if the quantity of parameters is something like tens or hundreds, then things are going to work well, more or less out of the box. But if it's larger than this, you will likely run into trouble. And then the amount of data is actually something I'm less familiar with, because I come less from the Bayesian community than from the stat-mech community to start with. So my distributions don't have data embedded in them, in a sense, most of the time. But for sure, what people argue, why it's a really good idea to use generative models such as normalizing flows to sample in the Bayesian context, is the fact that you have an amortization going on. And what do I mean by that? I mean that you're learning a model. Once it's learned, it's going to be easy to adjust it if things are changing a little. And with little adjustments, you're going to be able to still sample a very complicated distribution. So say you have data that is arriving online, and you keep on having new samples to be added to your posterior distribution. Then it's very easy to just adjust the normalizing flow with a few training iterations to get back to the new posterior you actually have now, given that you have this amount of data. So this is what some people call amortization: the fact that you can really encapsulate in your model all the knowledge you have so far, and then just adjust it a bit, and you don't have to start from scratch, as you would have to with other Monte Carlo methods.
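One way to read the amortization argument in Bayesian terms: when a new batch of data D′ arrives and is conditionally independent of the old data D given the parameters, the posterior is only reweighted by the new likelihood,

```latex
p(\theta \mid \mathcal{D} \cup \mathcal{D}') \;\propto\; p(\mathcal{D}' \mid \theta)\, p(\theta \mid \mathcal{D}),
```

so a flow already trained to approximate p(θ | D) is typically a good starting point and only needs a few fine-tuning steps, rather than restarting the sampler from scratch.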
Yeah. So what I'm guessing is that maybe the tuning time is a bit longer than a classic HMC, but then, once you're out of the tuning phase, the sampling is going to be way faster.

Yes, I think that's a correct way of putting it.

And otherwise, for, I mean, the dimensionality that the algorithm is comfortable with — in general, the running times of the model, how have you noticed that being, like, has that been close to when you use a classic HMC, or is it something you haven't done yet?

I don't think I can honestly answer this question. I think it will depend, because it will also depend on how easily your HMC reaches all the regions you actually care about. So, I mean, probably there are some distributions that are very easy for HMC to cover and where it wouldn't be worth it to train the model. But then there are plenty of cases where things are the other way around.

Yeah, yeah, yeah. Yeah, I can guess. That's always something that's really fascinating in this algorithm world: how dependent everything is on the model, the use case — really dependent on the model and the data. So, on this project, on this algorithm, what are the next steps for you? What would you like to develop next on this algorithm precisely?

Yes, so as I was saying, one of my main questions is how to scale this algorithm. And we kind of wrote it in an all-purpose fashion. And all-purpose is nice, but all-purpose does not scale. So that's really what I'm focusing on: trying to understand how we can learn structures, that we can know or that we can learn from the system, how to exploit them and put them in, in order to be able to tackle more and more complex systems with higher, I mean, more degrees of freedom — so more parameters than what we are currently doing. So there's this. And of course, I'm also very interested in having some collaborations with people that care about actual problems for which this method is actually solving something for them. As it's really what gives you the idea of what's next to be developed: what are the next methodologies that will be useful to people? Can they already solve their problem? Do they need something more from you? And those are the two things I'm having a look at.

Yeah. Well, it definitely sounds like fun. And I hope you'll be able to work on that and come up with some new, amazing, exciting papers on this. I'll be happy to look at that. And so, that's it — it was a great deep dive on this project, and thank you for indulging my questions, Marylou. Now, if we want to zoom out a bit and talk about other things you do: you also mentioned adapting machine learning in the context of scarce data. So I'm curious what you're doing on this, if you could elaborate a bit.

Yes, so I guess what I mean by scarce data is precisely that when we are using machine learning in scientific computing, usually what we are doing is exploiting the great tool that deep neural networks are to play the role of a surrogate model somewhere in our scientific computation. But most of the time, this is without data a priori. We know that there is a function we want to approximate somewhere, but in order to have data, either we have to pay the price of costly experiments, costly observations, or we have to pay the price of costly numerics. So, I mean, a very famous example of applications of machine learning to scientific computing is molecular dynamics at quantum precision. This is what people call density functional theory. So if you want to observe the dynamics of a molecule with the accuracy of what's really going on at the level of quantum mechanics, then you have to make very, very costly calls to a function that predicts what's the energy predicted by quantum mechanics and what are the forces predicted by quantum mechanics. So people have seen here an opportunity to use deep neural nets in order to just regress the value of this quantum potential at the different locations that you're going to visit.

And the idea is that you are creating your own data. You are deciding when you are going to pay the price of doing the full numerical computation and then obtain a training point: given Cartesian coordinates, what is the value of the energy here. And then, conversely to what you're doing traditionally in machine learning, where you believe that you have huge data sets that are encapsulating a rule and you're going to try to exploit them as best you can, here you have the choice of where you create your data. And so you, of course, have to be as smart as possible in order to have to create as few training points as possible. And so this is this idea of working with scarce data that has to be infused in the usage of machine learning in scientific computing.
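As an editorial illustration of "creating your own data" — not a method described in the episode — here is a toy active-learning loop in Python, with a cheap stand-in for the expensive computation and a polynomial playing the role the deep surrogate would play in practice. New training points are only requested where an ensemble of surrogates disagrees the most:

```python
import numpy as np

def expensive_oracle(x):
    # Stand-in for a costly computation (e.g. a DFT energy evaluation).
    return np.sin(3 * x) + 0.5 * x ** 2

def fit_surrogate(X, y, degree=3):
    # Cheap polynomial surrogate; a deep net would play this role in practice.
    coeffs = np.polyfit(X, y, degree)
    return lambda x: np.polyval(coeffs, x)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=8)     # a handful of initial training points
y = expensive_oracle(X)
grid = np.linspace(-2, 2, 200)     # candidate locations for new data

for _ in range(10):
    # Acquisition: query where surrogates fit on bootstrap resamples
    # disagree the most (a crude uncertainty estimate).
    preds = []
    for _ in range(20):
        idx = rng.integers(0, len(X), len(X))
        preds.append(fit_surrogate(X[idx], y[idx])(grid))
    x_new = grid[np.argmax(np.std(preds, axis=0))]
    X = np.append(X, x_new)
    y = np.append(y, expensive_oracle(x_new))  # pay the price only here

surrogate = fit_surrogate(X, y)    # final cheap model of the costly function
```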
My example of application is just what we have discussed, where we want to learn a deep generative model, whereas when we start, we just have our target distribution as an objective, but we don't have any samples from it — which would be the traditional data that people would be using in generative modeling to train a generative model. So, if you want, we are playing this adaptive game I was already hinting at a bit, where we are creating data that is not exactly the data we want, but that we believe is informative of the data we want, to train the generative model that is in turn going to help us converge the MCMC — and, at the same time as you are training your model, generate the data you would have needed to train your model.

Yeah, that is really cool. And of course I asked about that because scarce data is something that's extremely common in the Bayesian world. That's where usually Bayesian statistics are, yeah, helpful and useful, because when you don't have a lot of data, you need more structure and more priors if you want to say anything about your phenomenon of interest. So that's really cool that you're working on that. I love that. And also, from a bit broader perspective: you know MCMC really well, you work on it a lot. So I'm curious where you think MCMC is heading in the next few years, and if you see its relevance waning in some way.

Well, I don't think MCMC can go out of fashion, in a sense, because it's absolutely ubiquitous. So practical use cases are everywhere. If you have a large probabilistic model, usually it's given to you by the nature of the problem you want to study. And if you cannot choose anything about putting in the right properties, you're just going to be, you know, left with something that you don't know how to approach except by MCMC. So it's absolutely ubiquitous as an algorithm for probabilistic inference. And I would also say that one of the things that are going to, you know, keep MCMC going for a long time is how much it's a cherished object of study by researchers from different communities. Because, I mean, you can see people really from statistics that are kind of the prime researchers on, okay, how should you make a Monte Carlo method that has the best convergence properties, the best speed of convergence, and so on and so forth. But you can also see that the fields where those algorithms are used a lot, be it statistical mechanics, be it Bayesian inference, also have full communities that are working on developing MCMCs. And so I think it's really a matter of them being an object of curiosity and interest to a lot of people. And therefore it's something that, for now, is still very relevant and really unsolved. I mean, something that I love about MCMC is that when you look at it first, you say, yeah, that's simple, you know?

Yeah.

Yes, but then you start thinking about it. Then you, I mean, realize how subtle all the properties of those algorithms are. And you're telling yourself: I cannot believe it's so hard to actually sample from distributions that are not that complicated, when you're a naive newcomer. And so, yeah, I mean, for now, I think they are still here and in place. And if I could even comment a bit more regarding exactly the context of my research, where it could seemingly be the case that I'm trying to replace MCMCs with machine learning: I would warn the listeners that it's not at all what we are concluding. I mean, that's not at all the direction we are going in. It's really a case where we need both. MCMC can benefit from learning, but learning without MCMC is never going to give you something that you have enough guarantees on, something that you can really trust for sure. So I think here there is a really nice combination of MCMC and learning, and they're just going to nurture each other and not replace one another.

Yeah, yeah, for sure. And I really love, yeah, these projects of trying to make MCMC basically more informed, instead of having, at first, you know, almost random draws with Metropolis, then making that more complicated, more informed with the gradients, with HMC, and then normalizing flows, which try to squeeze a bit more information out of the structure that you have to make the sampling go faster. I find that one super useful. And also, yeah, that's also a very, very fascinating part of the research. And this is also part of a lot of the initiatives that you have focused on, right? Personally — basically what we could describe as machine-learning-assisted scientific computing. You know, do you have other examples to share with us on how machine learning is helping traditional scientific computing methods?

Yes. So, for example, I was already giving the example of the learning, of the regression, of the potentials of molecular force fields for people that are studying molecules. But we are seeing a lot of other things going on. So there are people that are trying to even use machine learning as a black box in order to, how should I say, make classifications between things they care about. So, for example, you have samples that come from a model, but you're not sure if they come from this model or this other one. You're not sure if they are above a critical temperature or below a critical temperature, if they belong to the same phase. So you can really try to play this game of creating an artificial data set where you know what the answer is, train a classifier, and then use your black box to tell you, when you see a new configuration, which type of configuration it is. And it's really given to you by deep learning, because you would have no idea why the neural net is deciding that it's actually from this or from that. You don't have any other statistics that you can gather and that will tell you what the answer is and why. But it's kind of like opening this new conceptual door: sometimes there are things that are predictable — I mean, you can check that, okay, on the data where you know the answer, the machine is extremely efficient — but then you don't know why things are happening this way.

I mean, there's this, but there are plenty of other directions. So there are people that are, for example, using neural networks to try to discover a model. And here, a model would actually be what people call partial differential equations, so PDEs. I don't know if you've heard about those physics-informed neural networks, but these are neural networks that people are training such that they are solutions of a PDE. So instead of actually having training data, what you do is use the properties of deep neural nets, which are that they are differentiable with respect to their parameters, but also with respect to their inputs. For example, you have a function f, and you know that the Laplacian of f is supposed to be equal to the derivative in time of f. Well, you can write a mean squared loss on the fact that the Laplacian of your neural network has to be close to its derivative in time. And then, given boundary conditions, so maybe an initial condition in time and boundary conditions in space, you can ask a neural net to predict the solution of the PDE.
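Spelled out for the example given here (a heat-type equation, ∂_t f = Δf), the physics-informed loss is a mean squared residual over collocation points, plus weighted terms enforcing the initial and boundary conditions:

```latex
\mathcal{L}(\theta) \;=\; \frac{1}{N} \sum_{n=1}^{N} \Big( \partial_t f_\theta(x_n, t_n) - \Delta_x f_\theta(x_n, t_n) \Big)^2 \;+\; \lambda_{\mathrm{ic}}\, \mathcal{L}_{\mathrm{ic}}(\theta) \;+\; \lambda_{\mathrm{bc}}\, \mathcal{L}_{\mathrm{bc}}(\theta),
```

where all the derivatives of f_θ are obtained by automatic differentiation with respect to the network's inputs.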
And even better, you can give to your learning mechanism a library of terms that would be possible candidates for being part of the PDE. And you can let the network tell you which terms of the library actually seem to be present in the data you are observing.
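One common way to realize this library idea, sketched here under assumptions (synthetic heat-equation data, finite-difference derivatives, a Lasso penalty), is sparse regression of the observed time derivative onto the candidate terms; the terms that keep clearly non-zero coefficients are the ones "in the data".

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic observations from the heat equation u_t = u_xx (assumed ground truth),
# using two decaying sine modes so the candidate terms are not collinear.
x = np.linspace(0, 1, 128)
t = np.linspace(0, 0.1, 100)
u = (np.exp(-np.pi**2 * t[:, None]) * np.sin(np.pi * x[None, :])
     + 0.5 * np.exp(-4 * np.pi**2 * t[:, None]) * np.sin(2 * np.pi * x[None, :]))

dt, dx = t[1] - t[0], x[1] - x[0]
u_t = np.gradient(u, dt, axis=0)
u_x = np.gradient(u, dx, axis=1)
u_xx = np.gradient(u_x, dx, axis=1)

# Candidate library: any terms you suspect might appear in the PDE.
library = np.stack([u, u_x, u_xx, u * u_x], axis=-1).reshape(-1, 4)
target = u_t.reshape(-1)

model = Lasso(alpha=1e-4, fit_intercept=False).fit(library, target)
print(dict(zip(["u", "u_x", "u_xx", "u*u_x"], model.coef_.round(3))))
# The u_xx coefficient should come out close to 1, the others close to 0.
```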
So, I mean, there are all kinds of inventive ways that researchers are now using the fact that deep neural nets are differentiable, smooth, can generalize easily, and, yes, are those universal approximators. I mean, seemingly you can use neural nets to represent any kind of function, and use that inside these computational problems to try to, I don't know, answer all kinds of scientific questions. So it's, I believe, pretty exciting.
Yeah, yeah, that is super fun.
I love how, you know, these come together to help on really hard sampling problems, like sampling ODEs or PDEs, which are just extremely hard. So yeah, using that. Maybe one day we'll also get something for GPs. I know that for Gaussian processes, a lot of the effort is on decomposing them and finding some useful algebraic decompositions, like the Hilbert space Gaussian processes that Bill Engels especially has added to the PyMC API, or eigenvalue decompositions, stuff like that.
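For intuition, here is a rough NumPy illustration of the Hilbert space idea (not the PyMC implementation): a 1-D GP with a squared-exponential kernel is approximated by a finite set of Laplacian eigenfunctions weighted by the kernel's spectral density, so prior draws become cheap matrix products. The lengthscale, domain size, and number of basis functions below are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
ell, sigma = 0.3, 1.0            # kernel lengthscale and amplitude (assumed)
L, m = 2.0, 50                   # domain half-width and number of basis functions

x = np.linspace(-1, 1, 200)
j = np.arange(1, m + 1)
sqrt_eig = j * np.pi / (2 * L)                                        # sqrt of Laplacian eigenvalues
phi = np.sqrt(1 / L) * np.sin(sqrt_eig[None, :] * (x[:, None] + L))   # eigenfunctions on [-L, L]

# Spectral density of the 1-D squared-exponential kernel evaluated at the eigenfrequencies.
spd = sigma**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * sqrt_eig) ** 2)

# Approximate GP prior draws: f(x) ~ sum_j sqrt(S(w_j)) * phi_j(x) * beta_j, beta_j ~ N(0, 1).
beta = rng.normal(size=(m, 3))
f_samples = phi @ (np.sqrt(spd)[:, None] * beta)                      # shape (200, 3)
```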
But I'd be curious to see if there are also some initiatives on trying to help the computation of Gaussian processes, probably using deep neural networks, because there is a mathematical connection between neural networks and GPs. I mean, everything is a GP in the end, it seems. So yeah, using a neural network to facilitate the sampling of a Gaussian process would be super fun. So I have so many more questions, but I want to be mindful of your time; we've already been recording for some time, so I'll try to keep my thoughts more compact. But something I wanted to ask you:
You actually teach a course at Polytechnique in France that's called Emerging Topics in Machine Learning. So I'm curious to hear what are some of the emerging topics that excite you the most, and how do you approach teaching them?
So this class is actually the nice class where we have a wild card to just talk about whatever we want.
So as far as I'm concerned, I'm really teaching about the last point that we discussed, which is how we can hope to use the technology of machine learning to assist scientific computing. And I have colleagues who are jointly teaching this class with me and who are, for example, teaching about optimal transport or about private and federated learning. So it can be different topics. But we all have the same approach to it, which is to introduce to the students the main ideas quite briefly, and then to give them the opportunity to read papers that we believe are important, or at least really illustrative of those ideas and of the direction in which the research is going, and to read these papers, of course, critically. So the idea is that we want to make sure that they are understood. We also want them to implement the methods. And once you implement the methods, you realize everything that is sometimes swept under the rug in the paper. So where is it really difficult? Where is the method really making a difference? And so on and so forth. So that's our approach to it.
Yeah, that must be a very fun course. At which level do you teach that?
So our students are in their third year at Ecole Polytechnique. So that would be equivalent to the first year of a graduate program.
Yeah.
And actually, looking forward, what do you think are the most promising areas of research in what you do? So basically, the interaction of machine learning and statistical physics.
Well, I think something that has been, and will continue to be, a very, very fruitful area between statistical mechanics and machine learning is generative models.
So you've probably heard of diffusion models; they are a new kind of generative model that relies on learning how to reverse a diffusion process, a diffusion process that is noising the data. Once you've learned how to reverse it, it will allow you to transform noise into data.
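As a bare-bones illustration of "learn to reverse the noising process", here is a sketch on a 1-D toy mixture: a small network is trained to predict the noise added at a random step of the forward process, and samples are then generated by a standard DDPM-style reverse update. Everything here (network size, noise schedule, toy data) is an assumption for the example, not the full machinery of modern diffusion models.

```python
import torch

torch.manual_seed(0)
T = 200
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)                    # cumulative product alpha-bar_t

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def sample_data(n):
    # Toy "data": two modes at -2 and +2 with small noise (assumed for illustration).
    return 2.0 * torch.sign(torch.randn(n, 1)) + 0.1 * torch.randn(n, 1)

for step in range(3000):                               # train the noise predictor
    x0 = sample_data(256)
    t = torch.randint(0, T, (256, 1))
    eps = torch.randn_like(x0)
    xt = abar[t].sqrt() * x0 + (1 - abar[t]).sqrt() * eps        # forward noising
    pred = net(torch.cat([xt, t / T], dim=1))
    loss = ((pred - eps) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Reverse process: start from pure noise and denoise step by step.
with torch.no_grad():
    x = torch.randn(1000, 1)
    for t in reversed(range(T)):
        eps_hat = net(torch.cat([x, torch.full_like(x, t / T)], dim=1))
        mean = (x - betas[t] / (1 - abar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        x = mean + (betas[t].sqrt() * torch.randn_like(x) if t > 0 else 0.0)
```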
It's something that is really close to statistical mechanics, because diffusion really comes from studying Brownian particles that are all around us. And this is where this mathematics comes from. And this is still an object of study in the field of statistical mechanics, and it has served a lot of machine learning models. I could also cite Boltzmann machines. I mean, they even carry the name of the father of statistical mechanics, Boltzmann.
And here again, I mean, it's really inspiration from the models studied by physicists that gave the first forms of models used by machine learners in order to do density estimation. So this cross-fertilization has really been here for, I guess, the last 50 years, since the field of machine learning emerged from these communities. And I'm hoping that my work, and all the groups working in this direction, are also going to demonstrate the other way around: that generative models can also help a lot in statistical mechanics. So that's definitely what I am looking forward to.
Yeah.
Yeah, I love that, and I understand why you're talking about that, especially with the whole conversation we've had; your answer is not surprising to me. Actually, something even broader than that, and I'm guessing you already care a lot about these questions from what I get: if you could choose the questions you'd like to see answered before you die, what would they be?
That's obviously a very vast question. If I stick a bit to what we've discussed about the sampling problems, where I think they are hard and why they are so intriguing: I think that something I'm very keen on seeing progress around is this question of sampling multimodal distributions, but with guarantees. Here, in a sense, sampling a multimodal distribution could just be judged undoable. I mean, there is some NP-hardness hidden somewhere in this picture. So of course, it's not going to be something general, but I'm really wondering, I mean, I'm really thinking, that there should be some assumptions, some way of formalizing the problem, under which we could understand how to construct algorithms that will provably, you know, succeed in making this happen. And so here, I don't know, it's a theoretical question, but I'm very curious about what we will manage to say in this direction.
Yeah.
And actually, that sets us up, I think, for the last two questions of the show. So, I mean, I have other questions, but we've already been recording for a long time, so I need to let you go and have dinner. I know it's late for you.
So let me ask you the last two questions I ask every guest at the end of the show. First one: if you had unlimited time and resources, which problem would you try to solve?
I think it's an excellent question, because it's an excellent opportunity maybe to say that we don't have unlimited resources. I think it's probably the biggest challenge we have right now, to understand, and to collectively understand, because I think now we individually understand that we don't have unlimited resources. And in a sense, the biggest problem is how we move this complex system of human societies we have created in the direction where we are using precisely fewer resources. And I mean, it has nothing to do with anything that we have discussed before, but it feels to me that it's really where the biggest question, the one that really matters today, is lying. And I have no clue how to approach it. But I think it's actually what matters. And if I had unlimited time and resources, that's definitely what I would be researching towards.
Yeah. Love that answer. And you're definitely in good company; lots of people have talked about that for this question, actually. And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?
So, I mean, a logical answer, given my last response, is actually Grothendieck. So, I don't know, you probably know about this mathematician who, I mean, was somebody worried about, you know, our relationship to the world, let's say, as scientists, very early on, and who had concluded that to some extent we should not be doing research. So... I don't know that I agree, but I also don't think it's obviously wrong. So I think it would probably be one of the most interesting discussions. Added on top of that, he was a fantastic speaker, and I do invite you to listen to his lectures. It would be really fascinating to have this conversation.
Yeah. Great. Great answer. You know, you're definitely the first one to answer Grothendieck. But that'd be cool. Yeah. If you have a favorite lecture of his, feel free to put that in the show notes; I think it's going to be really interesting and fun for people.
It might be in French, but...
I mean, there are a lot of subtitles now. If it's on YouTube, it does a pretty good job at the automated transcription, especially in English. So I think it will be okay. And that will be good for people's French lessons. So yeah, you know, two birds with one stone. So definitely include that.
Awesome, Marylou. So that was really great. Thanks a lot for taking the time and being so generous with your time. I'm happy, because I had a lot of questions, but I think we did a pretty good job at tackling most of them. As usual, I put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Marylou, for taking the time and being on this show.
Thank you so much for having me.