Learning Bayesian Statistics

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

How does the world of statistical physics intertwine with machine learning, and what groundbreaking insights can this fusion bring to the field of artificial intelligence?

In this episode, we delve into these intriguing questions with Marylou Gabrié, an assistant professor at CMAP, École Polytechnique in Paris. Having completed her PhD in physics at École Normale Supérieure, Marylou ventured to New York City for a joint postdoctoral appointment at New York University’s Center for Data Science and the Flatiron Institute’s Center for Computational Mathematics.

As you’ll hear, her research is not just about theoretical exploration; it also extends to the practical adaptation of machine learning techniques in scientific contexts, particularly where data is scarce.

In this conversation, we’ll traverse the landscape of Marylou’s research, discussing her recent publications and her innovative approaches to machine learning challenges, latest MCMC advances, and ML-assisted scientific computing.

Beyond that, get ready to discover the person behind the science – her inspirations, aspirations, and maybe even what she does when not decoding the complexities of machine learning algorithms!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

Takeaways

  • Developing methods that leverage machine learning for scientific computing can provide valuable insights into high-dimensional probabilistic models.
  • Generative models can be used to speed up Markov Chain Monte Carlo (MCMC) methods and improve the efficiency of sampling from complex distributions.
  • The Adaptive Monte Carlo algorithm augmented with normalizing flows offers a powerful approach for sampling from multimodal distributions.
  • Scaling the algorithm to higher dimensions and handling discrete parameters are ongoing challenges in the field.
  • Open-source packages, such as FlowMC, provide valuable tools for researchers and practitioners to adopt and contribute to the development of new algorithms.
  • The scaling of these algorithms depends on the number of parameters and the amount of data. While some methods work well with a few hundred parameters, larger problems can lead to difficulties.
  • Generative models, such as normalizing flows, offer benefits in the Bayesian context, including amortization and the ability to adjust the model with new data.
  • Machine learning and MCMC are complementary and should be used together rather than replacing one another.
  • Machine learning can assist scientific computing in the context of scarce data, where expensive experiments or numerics are required.
  • The future of MCMC lies in the exploration of sampling multimodal distributions and understanding resource limitations in scientific research.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.

Speaker:

How does the world of statistical physics

intertwine with machine learning, and what

2

00:00:08,154 --> 00:00:12,196

groundbreaking insights can this fusion

bring to the field of artificial

3

00:00:12,196 --> 00:00:13,316

intelligence?

4

00:00:13,337 --> 00:00:18,339

In this episode, we'll delve into these

intriguing questions with Marylou Gabrié.

5

00:00:18,339 --> 00:00:22,862

Having completed her doctorate in physics

at École Normale Supérieure, Marylou

6

00:00:22,862 --> 00:00:27,524

ventured to New York City for a joint

postdoctoral appointment at New York

7

00:00:27,524 --> 00:00:30,194

University's Center for Data Science.

8

00:00:30,194 --> 00:00:33,894

and the Flatiron Institute's Center for Computational

Mathematics.

9

00:00:33,895 --> 00:00:38,316

As you'll hear, her research is not just

about theoretical exploration, it also

10

00:00:38,316 --> 00:00:42,917

extends to the practical adaptation of

machine learning techniques in scientific

11

00:00:42,917 --> 00:00:46,998

contexts, particularly where data are

scarce.

12

00:00:47,118 --> 00:00:51,119

And this conversation will traverse the

landscape of Marylou's research,

13

00:00:51,119 --> 00:00:55,080

discussing her recent publications and her

innovative approaches to machine learning

14

00:00:55,080 --> 00:00:56,021

challenges.

15

00:00:59,310 --> 00:01:04,514

her inspirations, aspirations, and maybe

even what she does when she's not decoding

16

00:01:04,514 --> 00:01:07,797

the complexities of machine learning

algorithms.

17

00:01:07,977 --> 00:01:13,982

This is Learning Bayesian Statistics,

episode 98, recorded November 23, 2023.

18

00:01:15,925 --> 00:01:18,867

Let me show you how to be a good Bayesian and

change your predictions.

19

00:01:19,368 --> 00:01:24,252

Marylou Gabrié, welcome to Learning

Bayesian Statistics.

20

00:01:24,252 --> 00:01:26,133

Thank you very much, Alex, for having me.

21

00:01:26,730 --> 00:01:27,750

Yes, thank you.

22

00:01:27,750 --> 00:01:32,713

And thank you to Virgile Andreani for

putting us in contact.

23

00:01:32,733 --> 00:01:36,095

This is a French connection network here.

24

00:01:36,095 --> 00:01:38,296

So thanks a lot, Virgile.

25

00:01:38,296 --> 00:01:41,037

Thanks a lot, Marylou, for taking the

time.

26

00:01:41,578 --> 00:01:44,699

I'm probably going to say Marylou

because it flows better in my English

27

00:01:44,699 --> 00:01:47,581

because saying Marylou and then

continuing with English.

28

00:01:47,581 --> 00:01:51,343

I'm going to have the French accent, which

nobody wants to hear.

29

00:01:52,163 --> 00:01:54,404

So let's start.

30

00:01:54,505 --> 00:01:56,225

So I gave a bit of...

31

00:01:56,394 --> 00:02:03,779

of your background in the intro to this

episode, Marylou, but can you define the

32

00:02:03,779 --> 00:02:07,782

work that you're doing nowadays and the

topics that you are particularly

33

00:02:07,782 --> 00:02:10,443

interested in?

34

00:02:10,443 --> 00:02:15,347

I would define my work as being focused on

developing methods and more precisely

35

00:02:15,347 --> 00:02:19,390

developing methods that use and leverage

all the progress in machine learning for

36

00:02:19,390 --> 00:02:21,791

scientific computing.

37

00:02:21,791 --> 00:02:25,133

I have a special focus within this realm.

38

00:02:25,226 --> 00:02:30,770

which is to study high-dimensional

probabilistic models, because they really

39

00:02:30,770 --> 00:02:31,811

come up everywhere.

40

00:02:31,811 --> 00:02:35,874

And I think they give us a very particular

lens on our world.

41

00:02:35,874 --> 00:02:40,358

And so I would say I'm working broadly in

this direction.

42

00:02:42,140 --> 00:02:44,482

Well, that sounds like a lot of fun.

43

00:02:44,482 --> 00:02:49,065

So I understand why Virgile put me in

contact with you.

44

00:02:50,667 --> 00:02:54,002

And could you start by telling us about

your journey,

45

00:02:54,002 --> 00:02:59,565

actually into the field of statistical

physics and how it led you to merge these

46

00:02:59,826 --> 00:03:02,748

interests with machine learning and what

you're doing today.

47

00:03:03,689 --> 00:03:04,669

Absolutely.

48

00:03:05,410 --> 00:03:09,193

My background is actually in physics, so I

studied physics.

49

00:03:09,333 --> 00:03:13,476

Among the topics in physics, I quickly

became interested in statistical

50

00:03:13,476 --> 00:03:14,417

mechanics.

51

00:03:14,657 --> 00:03:19,441

I don't know if all listeners would be

familiar with statistical mechanics, but I

52

00:03:19,441 --> 00:03:20,194

would define it.

53

00:03:20,194 --> 00:03:24,535

broadly as the study of complex systems

with many interacting components.

54

00:03:25,035 --> 00:03:26,415

So it could be really anything.

55

00:03:26,415 --> 00:03:31,037

You could think of molecules, which are

networks of interacting agents that have

56

00:03:31,037 --> 00:03:35,498

non-trivial interactions and that have

non-trivial behaviors when put all

57

00:03:35,498 --> 00:03:38,038

together within one system.

58

00:03:38,459 --> 00:03:42,880

And I think it's a really important, as I

was saying, viewpoint on the world today

59

00:03:42,880 --> 00:03:49,321

to look at those big macroscopic systems

that you can study probabilistically.

60

00:03:49,778 --> 00:03:55,082

And so I was quickly interested in this

field that is statistical mechanics.

61

00:03:56,303 --> 00:03:59,166

And at some point machine learning entered the

picture.

62

00:03:59,166 --> 00:04:04,310

And the way it did is that I was looking

for a PhD in 2015.

63

00:04:05,051 --> 00:04:10,235

And I had some of my friends that were,

you know, students in computer science and

64

00:04:10,235 --> 00:04:13,357

kind of early comers to machine

learning.

65

00:04:13,650 --> 00:04:16,472

And so I started to know that it existed.

66

00:04:16,472 --> 00:04:20,054

I started to know that actually deep

neural networks were revolutionizing the

67

00:04:20,054 --> 00:04:25,818

fields, that you could expect a program

to, I don't know, give names to people in

68

00:04:25,818 --> 00:04:26,859

pictures.

69

00:04:27,179 --> 00:04:31,022

And I thought, well, if this is possible,

I really wanna know how it works.

70

00:04:31,022 --> 00:04:35,865

I really want to, for this technology, not

to sound like magic to me, and I want to

71

00:04:35,865 --> 00:04:36,906

know about it.

72

00:04:37,446 --> 00:04:42,029

And so this is how I started to become

interested and to...

73

00:04:42,450 --> 00:04:46,492

find out that people knew how to make it

work, but not how it worked, why it worked

74

00:04:46,492 --> 00:04:47,373

so well.

75

00:04:47,533 --> 00:04:52,957

And so this is how I, in the end, was put

into contact with Florent Krzakala, who was

76

00:04:52,957 --> 00:04:54,457

my PhD advisor.

77

00:04:54,558 --> 00:05:00,161

And I started to have this angle of trying

to use the statistical mechanics framework to

78

00:05:00,161 --> 00:05:03,464

study deep neural networks that are

precisely those complex systems I was just

79

00:05:03,464 --> 00:05:08,847

mentioning, and that are so big that we

are having trouble making really sense of

80

00:05:08,847 --> 00:05:09,448

what they are doing.

81

00:05:09,448 --> 00:05:12,429

Yeah, I mean, that must be quite...

82

00:05:12,818 --> 00:05:14,518

Indeed, it must be quite challenging.

83

00:05:15,239 --> 00:05:18,740

We could already dive into that.

84

00:05:18,740 --> 00:05:20,301

That sounds like fun.

85

00:05:20,301 --> 00:05:24,983

Do you want to talk a bit more about that

project?

86

00:05:25,183 --> 00:05:28,864

Since then, I really shifted my angle.

87

00:05:29,104 --> 00:05:34,286

I studied in this direction for, say,

three, four years.

88

00:05:35,927 --> 00:05:40,549

Now, I'm actually going back to really the

applications to real-world systems, let's

89

00:05:40,549 --> 00:05:41,209

say.

90

00:05:41,490 --> 00:05:44,791

using all the potentialities of deep

learning.

91

00:05:44,791 --> 00:05:50,153

So it's like the same intersection, but

looking at it from the other side.

92

00:05:50,694 --> 00:05:54,636

Now really looking at application and

using machine learning as a tool, where I

93

00:05:54,636 --> 00:05:59,358

was looking at machine learning as my

study, my object of study, and using

94

00:05:59,358 --> 00:06:01,159

statistical mechanics before.

95

00:06:01,159 --> 00:06:04,300

So I'm keen on talking about what I'm

doing now.

96

00:06:04,880 --> 00:06:05,461

Yeah.

97

00:06:05,461 --> 00:06:07,301

So basically you...

98

00:06:08,706 --> 00:06:12,309

You changed, now you're doing the other

way around, right?

99

00:06:12,309 --> 00:06:16,952

You're studying statistical physics with

machine learning tools instead of doing

100

00:06:16,952 --> 00:06:17,912

the opposite.

101

00:06:19,694 --> 00:06:23,457

And so how does, yeah, what does that look

like?

102

00:06:23,877 --> 00:06:25,679

What does that mean concretely?

103

00:06:25,679 --> 00:06:31,563

Maybe can you talk about an example from

your own work so that listeners can get a

104

00:06:31,563 --> 00:06:33,425

better idea?

105

00:06:33,425 --> 00:06:34,165

Yeah, absolutely.

106

00:06:34,165 --> 00:06:34,825

So.

107

00:06:35,458 --> 00:06:40,019

As I was saying, statistical mechanics is

really about large systems that we study

108

00:06:40,019 --> 00:06:41,339

probabilistically.

109

00:06:41,459 --> 00:06:48,461

And here there's a tool, I mean, that

would be one of the, I would say, most

110

00:06:49,241 --> 00:06:53,803

active directions of research in machine

learning today, which are generative

111

00:06:53,803 --> 00:06:54,383

models.

112

00:06:54,383 --> 00:07:00,444

And they are very natural because they

are ways of making probabilistic models,

113

00:07:00,444 --> 00:07:02,185

but that you can control.

114

00:07:02,365 --> 00:07:03,645

That you have control to

115

00:07:05,122 --> 00:07:13,044

produce samples from within one command,

where you are in need of very much more

116

00:07:13,044 --> 00:07:17,765

challenging algorithms if you want to do

it in a general physical system.

117

00:07:17,825 --> 00:07:24,007

So we have those machines that we can

leverage and that we can actually combine

118

00:07:24,007 --> 00:07:30,649

in our typical computation tools such as

Markov chain Monte Carlo algorithms, and

119

00:07:30,649 --> 00:07:34,329

that will allow us to speed up the

algorithms.

120

00:07:34,570 --> 00:07:40,353

Of course, it requires some adaptation

compared to what people usually do in

121

00:07:40,353 --> 00:07:47,457

machine learning and how those generative

models were developed, but it's possible

122

00:07:47,457 --> 00:07:51,359

and it's fascinating to try to make those

adaptations.

123

00:07:51,579 --> 00:07:52,680

Hmm.

124

00:07:52,680 --> 00:07:58,763

So, yeah, that's interesting because if I

understand correctly, you're saying that

125

00:08:00,504 --> 00:08:01,285

one of your...

126

00:08:01,285 --> 00:08:02,650

One of the aspects of your...

127

00:08:02,650 --> 00:08:10,075

job is to understand how to use MCMC

methods to speed up these models?

128

00:08:10,075 --> 00:08:14,118

Actually, it's the other way around, is

how to use those models to speed up MCMC

129

00:08:14,118 --> 00:08:14,998

methods.

130

00:08:15,459 --> 00:08:17,180

Okay.

131

00:08:18,381 --> 00:08:19,622

Can you talk about that?

132

00:08:19,762 --> 00:08:21,043

That sounds like fun.

133

00:08:21,624 --> 00:08:22,824

Yeah, of course.

134

00:08:24,426 --> 00:08:29,894

Say MCMC algorithms, so Markov Chain

Monte-Carlo's are really the go-to

135

00:08:29,894 --> 00:08:35,118

algorithm when you are faced with

a probabilistic model that is describing

136

00:08:35,118 --> 00:08:41,224

whichever system you care about, say it

might be a molecule, and this molecule has

137

00:08:41,224 --> 00:08:46,469

a bunch of atoms, and so you know that you

can describe your system, I mean at least

138

00:08:46,469 --> 00:08:50,793

classically, at the level of giving the

Cartesian coordinates of all the atoms in

139

00:08:50,793 --> 00:08:51,813

your system.

140

00:08:52,414 --> 00:08:56,310

And then you can describe the equilibrium

properties of your system.

141

00:08:56,310 --> 00:08:59,450

by using the energy function of this

molecule.

142

00:08:59,450 --> 00:09:03,591

So if you believe that you have an energy

function for this molecule, then you

143

00:09:03,591 --> 00:09:07,772

believe that it's distributed as

exponential minus beta the energy.

144

00:09:08,093 --> 00:09:09,873

This is the Boltzmann distribution.
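
(For readers who want it spelled out: the Boltzmann distribution mentioned here assigns to a configuration x with energy E(x), at inverse temperature β, the probability density

p(x) = \frac{1}{Z} e^{-\beta E(x)}, \qquad Z = \int e^{-\beta E(x)}\, dx.

The normalizing constant Z is usually intractable, which is exactly why sampling methods such as MCMC, which only need E(x) up to that constant, are the go-to tool. An energy barrier of height ΔE between two regions then corresponds to a probability suppression of order e^{-\beta \Delta E}.)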

145

00:09:09,873 --> 00:09:13,334

And then, okay, you are left with your

probabilistic model.

146

00:09:13,354 --> 00:09:18,856

And if you want to approach it, a priori

you have no control onto what this energy

147

00:09:18,856 --> 00:09:21,356

function is imposing as constraints.

148

00:09:21,436 --> 00:09:22,857

It may be very, very complicated.

149

00:09:22,857 --> 00:09:26,057

Well, the go-to algorithm is Markov chain

Monte Carlo.

150

00:09:26,170 --> 00:09:31,694

And it's a go-to algorithm that is always

going to work.

151

00:09:31,694 --> 00:09:36,258

And here I'm putting quotes around this

thing.

152

00:09:37,019 --> 00:09:42,783

Because it's going to be a greedy

algorithm that is going to be looking for

153

00:09:43,464 --> 00:09:47,528

plausible configurations next to other

plausible configurations.

154

00:09:47,528 --> 00:09:53,753

And locally, make a search on the

configuration space, try to visit it, and

155

00:09:53,753 --> 00:09:54,373

then what you visit

156

00:09:54,598 --> 00:09:56,938

will be representative of the

thermodynamics.

157

00:09:57,479 --> 00:09:59,440

Of course, it's not that easy.

158

00:09:59,440 --> 00:10:03,181

And although you can make such a search locally,

sometimes it's really not enough to

159

00:10:03,181 --> 00:10:08,643

describe the probabilistic model fully, in

particular, how different regions of your

160

00:10:08,643 --> 00:10:11,505

configuration space are related to one

another.

161

00:10:11,505 --> 00:10:17,287

So if I come back to my molecule example,

it would be that I have two different,

162

00:10:18,167 --> 00:10:22,409

let's say, conformations of my molecule,

two main templates that my molecule is

163

00:10:22,409 --> 00:10:23,669

going to look like.

164

00:10:23,718 --> 00:10:28,340

And they may be divided by what we call an

energy barrier, or in the language of

165

00:10:28,340 --> 00:10:33,202

probabilities, it's just low probability

regions in between large probability

166

00:10:33,202 --> 00:10:34,142

regions.

167

00:10:34,502 --> 00:10:37,323

And in this case, local MCMCs are gonna

fail.

168

00:10:37,784 --> 00:10:41,325

And this is where we believe that

generative models could help us.

169

00:10:41,325 --> 00:10:48,168

And let's say fill this gap to answer some

very important questions.

170

00:10:48,168 --> 00:10:50,789

And how would that work then?

171

00:10:50,789 --> 00:10:51,809

Like you would...

172

00:10:52,246 --> 00:10:56,990

Would you run a first model that would

help you infer that and then use that into

173

00:10:56,990 --> 00:10:58,491

the MCMC algorithm?

174

00:10:58,491 --> 00:11:01,814

Or like, yeah, what does that look like?

175

00:11:01,814 --> 00:11:03,315

I think your intuition is correct.

176

00:11:03,315 --> 00:11:05,577

So you cannot do it in one go.

177

00:11:05,577 --> 00:11:11,582

And what's, for example, the paper that I

published, I think it was last year in

178

00:11:11,582 --> 00:11:16,306

PNAS that is called Adaptive Monte Carlo

Augmented with Normalizing Flows is

179

00:11:16,306 --> 00:11:20,189

precisely implementing something where you

have feedback loops.

180

00:11:20,189 --> 00:11:20,849

So

181

00:11:21,362 --> 00:11:26,604

The idea is that the fact that you have

those local Monte-Carlo's that you can run

182

00:11:26,604 --> 00:11:31,866

within the different regions You have

identified as being interesting Will help

183

00:11:31,866 --> 00:11:36,148

you to see the training of a generative

model that is going to target generating

184

00:11:36,148 --> 00:11:41,811

configurations in those different regions

Once you have this generative model you

185

00:11:41,811 --> 00:11:48,133

can include it in your Markov chain

strategy. You can use it as a proposal

186

00:11:48,133 --> 00:11:49,046

mechanism

187

00:11:49,046 --> 00:11:52,947

to propose new locations for your MCMC to

jump.

188

00:11:53,067 --> 00:12:01,291

And so you're creating a Monte Carlo chain

that is going to slowly converge towards

189

00:12:01,291 --> 00:12:03,651

the target distribution you're really

after.

190

00:12:04,372 --> 00:12:09,714

And you're gonna do it by using the data

you're producing to train a generative

191

00:12:09,714 --> 00:12:15,136

model that will help you produce better

data as it's incorporated within the MCMC

192

00:12:15,136 --> 00:12:18,777

kernel you are actually jumping with.

193

00:12:18,902 --> 00:12:23,165

So you have this feedback mechanism that

makes things work.
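
(For readers who think in code, here is a minimal, runnable sketch of the feedback loop just described: local moves explore the regions already found, the chain's own history is used to re-fit a generative model, and that model then serves as a global proposal inside a Metropolis-Hastings test. A Gaussian-mixture fit stands in for the normalizing flow purely to keep the sketch short, and everything in it is illustrative rather than the FlowMC implementation or the exact scheme of the paper.)

    # Toy sketch of the adaptive feedback loop described above. A Gaussian-mixture
    # fit stands in for the normalizing flow; the target and schedule are made up
    # for illustration, not taken from FlowMC or the paper.
    import numpy as np
    from scipy.stats import norm
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    def log_target(x):
        # bimodal 1D target: two well-separated Gaussians ("energy barrier" in between)
        return np.logaddexp(norm.logpdf(x, -4.0, 0.5), norm.logpdf(x, 4.0, 0.5))

    x = 4.0                     # start stuck in one mode
    history = [x]
    gm = None                   # generative model, trained on the chain's own history
    for t in range(5000):
        # 1. local move (random-walk Metropolis, standing in for the local MCMC kernel)
        prop = x + 0.3 * rng.standard_normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        # 2. periodically re-fit the generative model on everything produced so far
        if t % 500 == 499:
            gm = GaussianMixture(n_components=2).fit(np.array(history).reshape(-1, 1))
        # 3. global jump: independence proposal from the fitted model, accepted with
        #    the Metropolis-Hastings ratio so the target stays exact (the real
        #    algorithm treats the adaptation more carefully than this sketch does)
        if gm is not None:
            y = gm.sample(1)[0][0, 0]
            log_q = lambda z: gm.score_samples(np.array([[z]]))[0]
            if np.log(rng.uniform()) < (log_target(y) + log_q(x)) - (log_target(x) + log_q(y)):
                x = y
        history.append(x)

    print("fraction of samples in each mode:",
          np.mean(np.array(history) < 0), np.mean(np.array(history) > 0))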

194

00:12:23,165 --> 00:12:28,468

And this idea of adaptivity really stems

from the fact that in scientific

195

00:12:28,468 --> 00:12:31,650

computing, we are going to do machine

learning with scarce data.

196

00:12:31,651 --> 00:12:36,554

We are not going to have all the data we

wish we had to start with, but we are

197

00:12:36,554 --> 00:12:41,397

going to have these type of methods where

we are doing things in what we call

198

00:12:41,397 --> 00:12:42,318

adaptively.

199

00:12:42,318 --> 00:12:47,681

So it's doing, recording information,

doing again.

200

00:12:48,982 --> 00:12:51,123

In a few words.

201

00:12:51,123 --> 00:12:51,803

Yeah.

202

00:12:52,463 --> 00:12:52,743

Yeah, yeah.

203

00:12:52,743 --> 00:12:52,843

Yeah.

204

00:12:52,843 --> 00:13:01,087

So I mean, if I understand correctly, it's

a way of going one step further than what

205

00:13:01,087 --> 00:13:06,249

HMC is already doing where we're looking

at the gradients and we're trying to adapt

206

00:13:06,249 --> 00:13:06,949

based on that.

207

00:13:06,949 --> 00:13:15,112

Now, basically, the idea is to find some

way of getting even more information as to

208

00:13:15,112 --> 00:13:18,053

where the next sample should come from.

209

00:13:18,210 --> 00:13:22,392

from the typical set and then being able

to navigate the typical set more

210

00:13:22,392 --> 00:13:23,272

efficiently?

211

00:13:23,873 --> 00:13:24,633

Yes.

212

00:13:24,733 --> 00:13:29,275

Yes, so let's say that it's an algorithm

that is more ambitious than HMC.

213

00:13:29,796 --> 00:13:31,317

Of course, there are caveats.

214

00:13:32,037 --> 00:13:39,781

But HMC is trying to follow a dynamic to

try to travel towards interesting regions.

215

00:13:40,322 --> 00:13:45,965

But it has to be tuned quite finely in

order to actually end up in the next

216

00:13:45,965 --> 00:13:47,010

interesting region.

217

00:13:47,010 --> 00:13:49,011

provided that it started from one.
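
(For reference, the HMC being compared against augments the position x with a momentum p and simulates Hamiltonian dynamics for

H(x, p) = -\log \pi(x) + \tfrac{1}{2}\|p\|^{2}, \qquad \dot{x} = \frac{\partial H}{\partial p} = p, \qquad \dot{p} = -\frac{\partial H}{\partial x} = \nabla_x \log \pi(x),

usually with a leapfrog integrator. The step size and trajectory length are the tuning knobs referred to here: the trajectory has to be integrated well enough, and run long enough, to carry the sampler from one high-probability region toward another.)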

218

00:13:49,011 --> 00:13:53,415

And so to cross those energy barriers,

here with machine learning, we would

219

00:13:53,415 --> 00:13:56,398

really be jumping over energy barriers.

220

00:13:56,398 --> 00:14:02,083

We would have models that pretty much only

target the interesting regions and just

221

00:14:02,083 --> 00:14:03,784

don't care about what's in between.

222

00:14:03,784 --> 00:14:08,489

And that really focuses the efforts where

you believe it matters.

223

00:14:08,489 --> 00:14:12,873

However, there are cases in which those

machine learning models will have trouble

224

00:14:12,873 --> 00:14:14,230

scaling where

225

00:14:14,230 --> 00:14:15,531

HMC would be more robust.

226

00:14:15,531 --> 00:14:22,075

So there is of course always a trade-off

on the algorithms that you are using, how

227

00:14:22,175 --> 00:14:27,959

efficient they can be per MCMC step and

how general you can expect them to be.

228

00:14:27,959 --> 00:14:28,900

Hmm.

229

00:14:28,900 --> 00:14:31,281

I see.

230

00:14:31,281 --> 00:14:31,482

Yeah.

231

00:14:31,482 --> 00:14:37,486

So, and actually, yeah, that would be one

of my questions would be, when do you

232

00:14:37,486 --> 00:14:42,969

think this kind of new algorithm would be?

233

00:14:43,982 --> 00:14:49,245

would be interesting to use instead of the

classic HMC?

234

00:14:49,245 --> 00:14:54,529

Like in which cases would you say people

should give that a try instead of using

235

00:14:54,529 --> 00:14:58,131

the classic, robust HMC method

we have right now?

236

00:14:58,952 --> 00:15:00,893

So that's an excellent question.

237

00:15:01,234 --> 00:15:07,138

I think right now, so on paper, the

algorithm we propose is really, really

238

00:15:07,138 --> 00:15:12,738

powerful because it will allow you to jump

throughout your space and so to...

239

00:15:12,738 --> 00:15:18,121

to decorrelate your MCMC configurations

extremely fast.

240

00:15:18,121 --> 00:15:23,265

However, for this to happen, you need that

the proposal that is made by your deep

241

00:15:23,265 --> 00:15:27,528

generative model as a new location, I

mean, a new configuration in your MCMC

242

00:15:27,528 --> 00:15:29,549

chain is accepted.

243

00:15:29,549 --> 00:15:36,234

So in the end, you don't have anymore the

fact that you are jumping locally and that

244

00:15:36,234 --> 00:15:40,477

your de-correlation comes from the fact

that you are going to make lots of local

245

00:15:40,477 --> 00:15:41,277

jumps.

246

00:15:41,622 --> 00:15:45,143

Here you could decorrelate in one step, but

you need to accept.

247

00:15:45,143 --> 00:15:50,104

So the acceptance will be really what you

need to care about in running the

248

00:15:50,104 --> 00:15:51,084

algorithm.

249

00:15:51,424 --> 00:15:58,867

And what is going to determine whether or

not your acceptance is high is actually

250

00:15:58,867 --> 00:16:02,668

the agreement between your deep generative

model and the target distribution you're

251

00:16:02,668 --> 00:16:03,468

after.
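
(Concretely, when the flow with density ρ is used as an independence proposal for the target π, the Metropolis-Hastings test accepts a proposed jump from x to x' with probability

\alpha(x \to x') = \min\!\left(1, \frac{\pi(x')\,\rho(x)}{\pi(x)\,\rho(x')}\right),

so the closer ρ is to π, the closer this ratio stays to 1 and the more of the global jumps get accepted; if ρ misses part of the target, acceptance collapses.)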

252

00:16:04,968 --> 00:16:08,989

And we have traditional, you know,

253

00:16:09,342 --> 00:16:13,803

challenges here in making the generative

model look like exactly the target we

254

00:16:13,803 --> 00:16:14,583

want.

255

00:16:14,943 --> 00:16:21,645

There are issues with scalability and

there are issues with, I would say,

256

00:16:21,645 --> 00:16:22,705

constraints.

257

00:16:22,885 --> 00:16:28,047

So you give me, let's say you're

interested in Bayesian inference, so

258

00:16:28,047 --> 00:16:30,547

another case where we can apply these kind

of algorithms, right?

259

00:16:30,547 --> 00:16:34,288

Because you have a posterior and you just

want to sample from your posterior to make

260

00:16:34,288 --> 00:16:34,968

sense

261

00:16:39,510 --> 00:16:41,170

10, 100.

262

00:16:41,911 --> 00:16:45,213

I tell you, I know how to train

normalizing flows, which are the specific

263

00:16:45,213 --> 00:16:49,795

type of generative models we are using

here, in 10 or 100 dimension.

264

00:16:50,236 --> 00:16:55,179

So if you believe that your posterior is

multimodal, that it will be hard for

265

00:16:55,179 --> 00:16:59,541

traditional algorithms to visit the entire

landscape and equilibrate because there

266

00:16:59,541 --> 00:17:04,944

are some low density regions in between

high density regions, go for it.

267

00:17:05,965 --> 00:17:06,745

If you...

268

00:17:07,110 --> 00:17:12,133

actually are an astronomer and you want to

marginalize over your initial conditions

269

00:17:12,133 --> 00:17:16,096

on a grid that represents the universe and

actually the posterior distribution you're

270

00:17:16,096 --> 00:17:21,980

interested in is on, you know, variables

that are in millions of dimension.

271

00:17:22,721 --> 00:17:23,661

I'm sorry.

272

00:17:24,122 --> 00:17:28,585

We're not going to do it for you, and you

should actually use something that is more

273

00:17:28,585 --> 00:17:34,349

general, something that will use a local

search, but that is actually going to, you

274

00:17:34,349 --> 00:17:35,549

know, be

275

00:17:36,146 --> 00:17:36,906

Imperfect, right?

276

00:17:36,906 --> 00:17:40,849

Because it's going to be very, very hard

also for this algorithm to work.

277

00:17:40,849 --> 00:17:46,392

But the magic of the machine learning will

not scale yet to this type of dimensions.

278

00:17:48,294 --> 00:17:49,755

Yeah, I see.

279

00:17:49,755 --> 00:17:58,601

And is that an avenue you're actively

researching to basically how to scale

280

00:17:58,601 --> 00:18:01,743

these algorithms better to bigger scales?

281

00:18:01,903 --> 00:18:02,984

Yeah, of course.

282

00:18:03,384 --> 00:18:05,505

Of course we can always try to do better.

283

00:18:05,526 --> 00:18:11,947

So, I mean, as far as I'm concerned, I'm

also very interested in sampling physical

284

00:18:11,947 --> 00:18:12,867

systems.

285

00:18:13,168 --> 00:18:17,549

And in physical systems, there are a lot

of, you know, prior information that you

286

00:18:17,549 --> 00:18:19,469

have on the system.

287

00:18:19,509 --> 00:18:27,252

You have symmetries, you have, I don't

know, yeah, physical rules that you know

288

00:18:27,252 --> 00:18:30,012

that the system has to fulfill.

289

00:18:30,152 --> 00:18:34,173

Or maybe some, I don't know, multi-scale.

290

00:18:35,234 --> 00:18:39,457

property of the probability distribution,

you know that there are some

291

00:18:39,457 --> 00:18:45,241

self-similarities, you have

information you can try to exploit in two

292

00:18:45,241 --> 00:18:51,786

ways, either in the sampling part, so

you're having this coupled MCMC with the

293

00:18:51,786 --> 00:18:56,330

generative models, so either in the way

you make proposals you can try to

294

00:18:56,330 --> 00:19:02,053

symmetrize them, you can try to exploit

the symmetry by any means.

295

00:19:02,706 --> 00:19:05,307

Or you can also directly put it in the

generative model.

296

00:19:05,307 --> 00:19:08,608

So those are things that really are

crucial.

297

00:19:08,608 --> 00:19:15,031

And we understand very well nowadays that

it's naive to think you will learn it all.

298

00:19:15,191 --> 00:19:20,613

You should really use as much information

on your system as you may, as you can.

299

00:19:20,773 --> 00:19:25,616

And after that, you can go one step

further with machine learning.

300

00:19:25,616 --> 00:19:30,506

But in non-trivial systems, it would be, I

mean, it's a bit self-

301

00:19:30,506 --> 00:19:33,266

deceiving to believe that you could just

learn things.

302

00:19:33,506 --> 00:19:33,686

Yeah.

303

00:19:33,686 --> 00:19:36,987

I mean, completely resonate with that.

304

00:19:36,987 --> 00:19:45,250

It's definitely something we will always

tell students or clients, like, don't

305

00:19:45,250 --> 00:19:51,692

just, you know, throw everything at the

model that you can and just try to pray

306

00:19:51,692 --> 00:19:53,432

that the model works like that.

307

00:19:53,432 --> 00:19:58,653

And, but actually you should probably use

a generative perspective to

308

00:19:58,722 --> 00:20:03,605

try and find out what the best way of

thinking about the problem is, what would

309

00:20:03,605 --> 00:20:08,449

be the good enough, simple enough model

that you can come up with and then try to

310

00:20:08,449 --> 00:20:08,789

run that.

311

00:20:08,789 --> 00:20:14,753

Yeah, so definitely I think that resonates

with a lot of the audience where think

312

00:20:14,753 --> 00:20:15,694

generatively.

313

00:20:15,694 --> 00:20:20,658

And from what I understand from what you

said is also trying to put as much

314

00:20:20,658 --> 00:20:23,640

knowledge and information as you have in

your generative model.

315

00:20:25,846 --> 00:20:30,467

the deep neural network is here, the

normalizing flow is here to help, but it's

316

00:20:30,467 --> 00:20:38,929

not going to be a magical solution to a

suboptimally specified model.

317

00:20:38,929 --> 00:20:40,250

Yes, yes.

318

00:20:40,410 --> 00:20:46,072

Of course, in all those problems, what's

hidden behind is the curse of

319

00:20:46,072 --> 00:20:47,072

dimensionality.

320

00:20:47,072 --> 00:20:51,873

If we are trying to learn something in

very high dimension and...

321

00:20:52,734 --> 00:20:54,094

It could be arbitrarily hard.

322

00:20:54,094 --> 00:20:57,837

It could be that you cannot learn

something in high dimension just because

323

00:20:57,997 --> 00:21:01,740

you would need to observe all the location

in this high dimension to get the

324

00:21:01,740 --> 00:21:02,240

information.

325

00:21:02,240 --> 00:21:05,902

So of course, this is in general not the

case, because what we are trying to learn

326

00:21:05,902 --> 00:21:10,406

has some structure, some underlying

structure that is actually described by

327

00:21:10,406 --> 00:21:11,846

fewer dimensions.

328

00:21:11,907 --> 00:21:15,389

And you actually need fewer observations

to actually learn it.

329

00:21:15,409 --> 00:21:20,532

But the question is, how do you find those

structures, and how do you put them in?

330

00:21:22,422 --> 00:21:26,663

Therefore, we need to take into account as

much of the knowledge we have on the

331

00:21:26,663 --> 00:21:30,244

system to make this learning as efficient

as possible.

332

00:21:30,244 --> 00:21:33,985

Yeah, yeah, yeah.

333

00:21:33,985 --> 00:21:35,505

Now, I mean, that's super interesting.

334

00:21:35,505 --> 00:21:39,906

And that's your paper, Adaptive Monte

Carlo Augmented with Normalizing Flows,

335

00:21:39,906 --> 00:21:40,646

right?

336

00:21:41,167 --> 00:21:44,287

So this is the paper where we did this

generally.

337

00:21:44,287 --> 00:21:49,249

And I don't have yet a paper out where we

are trying to really put the structure in

338

00:21:49,249 --> 00:21:50,009

the generative models.

339

00:21:50,009 --> 00:21:52,430

But that's the direction I'm actively

340

00:21:52,430 --> 00:21:53,650

Okay, yeah.

341

00:21:53,650 --> 00:21:58,733

I mean, so for sure, we'll put that paper

you just cited in the show notes for people

342

00:21:58,733 --> 00:22:00,033

who want to dig deeper.

343

00:22:00,134 --> 00:22:06,757

And also, if by the time this episode is

out, you have the paper or a preprint,

344

00:22:06,977 --> 00:22:10,579

feel free to add that to the show notes or

just tell me and I'll add that to the show

345

00:22:10,579 --> 00:22:11,380

notes.

346

00:22:11,380 --> 00:22:14,401

That sounds really interesting for people

to read.

347

00:22:15,042 --> 00:22:21,185

And so I'm curious, like, you know, this

idea of normalizing flows, using a

348

00:22:22,646 --> 00:22:30,049

deep neural network to help MCMC sample

faster, converge faster to the typical

349

00:22:30,049 --> 00:22:32,510

set.

350

00:22:33,711 --> 00:22:38,433

What was the main objective of doing that?

351

00:22:39,593 --> 00:22:45,656

I'm curious why did you even start

thinking and working on that?

352

00:22:45,656 --> 00:22:47,877

So yes, I think for me,

353

00:22:49,182 --> 00:22:52,442

The answer is really this question of

multimodality.

354

00:22:52,442 --> 00:22:57,684

So the fact that you may be interested in

probability distributions for which it's very

355

00:22:57,684 --> 00:23:01,025

hard to connect the different interesting

regions.

356

00:23:01,225 --> 00:23:05,186

In statistical mechanics, it's something

that we called actually metastability.

357

00:23:05,186 --> 00:23:10,227

So I don't know if it's a word you've

already heard, but where some communities

358

00:23:10,227 --> 00:23:13,348

talk about multimodality, we talk about

metastability.

359

00:23:13,568 --> 00:23:18,809

And metastability is at the heart of many

interesting phenomena in physics,

360

00:23:18,910 --> 00:23:20,430

be it phase transitions.

361

00:23:20,430 --> 00:23:27,453

And therefore, it's something very

challenging in the computations, but in

362

00:23:27,453 --> 00:23:31,175

the same time, very crucial that we have

an understanding of.

363

00:23:31,175 --> 00:23:37,397

So for us, it felt like there was this big

opportunity with those probabilistic

364

00:23:37,397 --> 00:23:43,040

models that were so malleable, that were

so, I mean, of course, hard to train, but

365

00:23:43,040 --> 00:23:44,641

then they give you so much.

366

00:23:44,641 --> 00:23:47,081

They give you an exact...

367

00:23:47,102 --> 00:23:52,403

value for the density that they encode,

plus the possibility of sampling from them

368

00:23:52,403 --> 00:23:59,025

very easily, getting just a bunch of

IID samples just in one run through a

369

00:23:59,025 --> 00:24:00,065

neural network.

370

00:24:00,065 --> 00:24:05,827

So for us, there was really this

opportunity of studying multimodal

371

00:24:05,827 --> 00:24:09,388

distribution, in particular, metastable

systems from statistical mechanics with

372

00:24:09,388 --> 00:24:10,628

those tools.

373

00:24:10,628 --> 00:24:11,268

Yeah.

374

00:24:12,169 --> 00:24:12,909

Okay.

375

00:24:13,449 --> 00:24:15,949

So in theory,

376

00:24:18,135 --> 00:24:27,663

these normalizing flows are especially

helpful to handle multimodal posteriors.

377

00:24:28,063 --> 00:24:30,385

I didn't get that at first, so that's

interesting.

378

00:24:30,385 --> 00:24:32,227

Yep.

379

00:24:32,227 --> 00:24:37,371

That's really what they're going to offer

you is the possibility to make large

380

00:24:37,371 --> 00:24:43,416

jumps, actually to make jumps within your

Markov chain that can go from one location

381

00:24:43,416 --> 00:24:45,377

of high density to another one.

382

00:24:45,706 --> 00:24:47,146

just in one step.

383

00:24:47,146 --> 00:24:49,727

So this is what you are really interested

in.

384

00:24:49,727 --> 00:24:53,529

Well, first of all, in one step, so you're

going far in one step.

385

00:24:53,529 --> 00:24:58,971

And second of all, regardless of how low

is the density between them, because if

386

00:24:58,971 --> 00:25:04,054

you were to run some other type of local

MCMC, you would, in a sense, need to find

387

00:25:04,054 --> 00:25:07,975

a path between the two modes in order to

visit both of them.

388

00:25:07,975 --> 00:25:09,396

In our case, it's not true.

389

00:25:09,396 --> 00:25:13,226

You're just completely jumping out of the

blue thanks to...

390

00:25:13,226 --> 00:25:17,187

your normalizing flows that is trying to

mimic your target distribution, and

391

00:25:17,187 --> 00:25:21,588

therefore that has developed mass

everywhere that you believe matters, and

392

00:25:21,588 --> 00:25:28,830

that from which you can produce an IID

sample wherever it has support very easily.

393

00:25:28,910 --> 00:25:29,750

I see, yeah.

394

00:25:29,750 --> 00:25:33,811

And I'm guessing you did some benchmarks

for the paper?

395

00:25:34,971 --> 00:25:38,432

So I think that's actually a very

interesting question you're asking,

396

00:25:38,432 --> 00:25:43,133

because I feel benchmarks are extremely

difficult, both in MCMC...

397

00:25:43,942 --> 00:25:45,342

and in deep learning.

398

00:25:45,342 --> 00:25:50,225

So, I mean, you can make benchmarks say,

okay, I changed the architecture and I see

399

00:25:50,225 --> 00:25:52,185

that I'm getting something different.

400

00:25:53,306 --> 00:25:59,248

I can say, I mean, but otherwise, I think

it's one of the big challenges that we

401

00:25:59,248 --> 00:26:00,749

have today.

402

00:26:00,749 --> 00:26:07,612

So if I tell you, okay, with my algorithm,

I can write an MCMC that is going to mix

403

00:26:07,612 --> 00:26:11,193

between the different modes, between the

different metastable states.

404

00:26:11,330 --> 00:26:15,132

that's something that I don't know how to

do by any other means.

405

00:26:15,192 --> 00:26:19,555

So the benchmark is, actually you won.

406

00:26:19,615 --> 00:26:22,337

There is nothing to be compared with, so

that's fine.

407

00:26:22,337 --> 00:26:29,382

But if I need to compare on other cases

where actually I can find those algorithms

408

00:26:29,382 --> 00:26:34,605

that will work, but I know that they are

going to probably take more iterations,

409

00:26:35,246 --> 00:26:40,318

then I still need to factor in a lot of

things in my true

410

00:26:40,318 --> 00:26:41,438

honest benchmark.

411

00:26:41,438 --> 00:26:45,041

I need to factor in the fact that I run a

lot of experiments to choose the

412

00:26:45,041 --> 00:26:47,202

architecture of my normalizing flow.

413

00:26:47,222 --> 00:26:54,347

I run a lot of experiments to choose the

hyperparameters of my training and so on

414

00:26:54,347 --> 00:26:55,048

and so forth.

415

00:26:55,048 --> 00:27:00,171

And I don't see how we can make those

honest benchmarks nowadays.

416

00:27:00,171 --> 00:27:05,975

So I can make one, but I don't think I

will think very highly that it's, I mean,

417

00:27:05,975 --> 00:27:10,017

you know, really revealing some profound

truth about

418

00:27:10,162 --> 00:27:12,443

which solution is really working.

419

00:27:12,904 --> 00:27:17,648

The only way of making an honest

benchmark would be to take different

420

00:27:17,648 --> 00:27:23,032

teams, give them problems, and lock them

in a room and see who comes out first with

421

00:27:23,032 --> 00:27:23,653

the solution.

422

00:27:23,653 --> 00:27:28,437

But I mean, how can we do that?

423

00:27:28,437 --> 00:27:35,923

Well, we can call on listeners who are

interested to do the experiments to

424

00:27:35,923 --> 00:27:36,944

contact us.

425

00:27:36,944 --> 00:27:38,325

That would be the first thing.

426

00:27:39,190 --> 00:27:41,331

But yeah, that's actually a very good

point.

427

00:27:41,331 --> 00:27:46,596

And in a way, that's a bit frustrating,

right?

428

00:27:46,596 --> 00:27:53,401

Because then it means at least

experimentally, it's hard to differentiate

429

00:27:53,401 --> 00:27:57,544

between the efficiency of the different

algorithms.

430

00:27:58,606 --> 00:28:04,510

So I'm guessing the claims that you make

about this new algorithm being more

431

00:28:04,510 --> 00:28:06,432

efficient for multimodalities,

432

00:28:08,570 --> 00:28:13,432

are based on the theoretical underpinning of the algorithm?

433

00:28:14,953 --> 00:28:18,695

No, I mean, it's just based on the fact

that I don't know of any other algorithm,

434

00:28:18,755 --> 00:28:23,278

which under the same premises, which can

do that.

435

00:28:23,278 --> 00:28:28,801

So, I mean, it's an easy way out of making

any benchmark, but also a powerful one

436

00:28:28,801 --> 00:28:33,583

because I really don't know who to compare

to.

437

00:28:33,583 --> 00:28:37,405

But indeed, I think then it's...

438

00:28:37,930 --> 00:28:42,634

As far as I'm concerned, I'm mostly

interested in developing methodologies.

439

00:28:42,634 --> 00:28:45,376

I mean, that's just what I like to do.

440

00:28:45,476 --> 00:28:49,720

But of course, what's important is that

those methods are going to work and they

441

00:28:49,720 --> 00:28:53,304

are going to be useful to some communities

that really have research questions that

442

00:28:53,304 --> 00:28:54,425

they want to answer.

443

00:28:54,425 --> 00:28:59,109

I mean, research or not actually could be

engineering questions, decisions to be

444

00:28:59,109 --> 00:29:01,531

taken that require to do an MCMC.

445

00:29:01,591 --> 00:29:04,446

And I think the true test of

446

00:29:04,446 --> 00:29:08,869

whether or not the algorithm is useful is

going to be this, the test of time.

447

00:29:08,869 --> 00:29:10,551

Are people adopting the algorithms?

448

00:29:10,551 --> 00:29:17,516

Are they seeing that this is really

something that they can use and that would

449

00:29:17,516 --> 00:29:21,439

make their inference work where they could

not find another method that was as

450

00:29:21,439 --> 00:29:22,400

efficient?

451

00:29:22,620 --> 00:29:28,345

And in this direction, there is the

close collaborator, Kaze Wong, who is

452

00:29:28,345 --> 00:29:33,009

working at the Flatiron Institute and with

whom we developed a package that is called

453

00:29:33,009 --> 00:29:33,889

FlowMC.

454

00:29:34,770 --> 00:29:39,554

that is written in Jax and that implements

these algorithms.

455

00:29:40,135 --> 00:29:44,638

And the idea was really to try to write a

package that was as user-friendly as

456

00:29:44,638 --> 00:29:45,159

possible.

457

00:29:45,159 --> 00:29:49,282

So of course we have the time we have to

take care of it and the experience we have

458

00:29:49,282 --> 00:29:53,906

with, you know, the available software

we have, but we really try hard.

459

00:29:53,906 --> 00:29:58,370

And at least in this community of people

studying gravitational waves, it seems

460

00:29:58,370 --> 00:30:03,154

that people are really trying, starting to

use this in their research.

461

00:30:03,154 --> 00:30:07,537

And so I'm excited, and I think it is

useful.

462

00:30:07,537 --> 00:30:11,920

But it's not the proper benchmark you

would dream of.

463

00:30:11,920 --> 00:30:16,002

Yeah, you just stole one of my questions.

464

00:30:16,303 --> 00:30:21,065

Basically, I was exactly going to ask you,

but then how can people try these?

465

00:30:21,366 --> 00:30:23,107

Is there a package somewhere?

466

00:30:23,267 --> 00:30:24,028

So yeah, perfect.

467

00:30:24,028 --> 00:30:26,409

That's called FlowMC, you told me.

468

00:30:26,409 --> 00:30:28,391

Yes, it's called FlowMC.

469

00:30:28,391 --> 00:30:32,573

You can pip install FlowMC, and you will

have it.

470

00:30:32,838 --> 00:30:34,138

If you are allergic to Jax...

471

00:30:34,138 --> 00:30:35,579

Right, I have it here.

472

00:30:35,579 --> 00:30:38,401

Yeah, there is a read the docs.

473

00:30:38,401 --> 00:30:41,444

So I'll put that in the show notes for

sure.

474

00:30:41,444 --> 00:30:43,005

Yes, we even have documentation.

475

00:30:43,005 --> 00:30:50,430

That's how far you go when you are

committed to having something that is used

476

00:30:50,430 --> 00:30:51,050

and useful.

477

00:30:51,050 --> 00:30:57,214

So I mean, of course, we are also open to

both comments and contributions.

478

00:30:57,214 --> 00:31:00,677

So just write to us if you're interested.

479

00:31:00,937 --> 00:31:01,890

Yeah, for sure.

480

00:31:01,890 --> 00:31:07,972

Yeah, that folks, if you are interested in

contributing, if you see any bugs, make

481

00:31:07,972 --> 00:31:15,956

sure to open some issues on the GitHub

repo or even better, contribute pull

482

00:31:15,956 --> 00:31:16,736

requests.

483

00:31:16,736 --> 00:31:22,598

I'm sure Marylou and the co-authors

will be very happy about that.

484

00:31:22,598 --> 00:31:25,920

Yes, you know typos in the documentation,

all of this.

485

00:31:26,620 --> 00:31:28,201

Yeah, exactly.

486

00:31:29,061 --> 00:31:29,914

That's what I...

487

00:31:29,914 --> 00:31:35,596

I tell everyone also who wants to start

doing some open source package, start with

488

00:31:35,596 --> 00:31:36,717

the smallest PRs.

489

00:31:36,717 --> 00:31:43,820

You don't have to write a new algorithm,

like already fixing typos, making the

490

00:31:43,820 --> 00:31:45,661

documentation look better, and stuff like

that.

491

00:31:45,661 --> 00:31:48,702

That's extremely valuable, and that will

be appreciated.

492

00:31:48,702 --> 00:31:53,265

So for sure, do that, folks.

493

00:31:53,265 --> 00:31:55,425

Do not be shy with that kind of stuff.

494

00:31:56,002 --> 00:32:02,206

So yeah, I already put the paper you have

out on arXiv, the adaptive Monte Carlo one, and

495

00:32:02,206 --> 00:32:04,528

FlowMC, I put that in the show notes.

496

00:32:05,469 --> 00:32:12,554

And yeah, to get back to what you were

saying, basically, I think as more of a

497

00:32:12,554 --> 00:32:22,241

practitioner than a person who developed

the algorithms, I would say the reasons I

498

00:32:22,241 --> 00:32:22,921

would...

499

00:32:23,414 --> 00:32:27,755

you know, adopt that kind of new

algorithms would be that, well, I know,

500

00:32:27,755 --> 00:32:35,457

okay, that algorithm is specialized,

especially for handling multimodal,

501

00:32:35,457 --> 00:32:36,717

multimodal posteriors.

502

00:32:36,717 --> 00:32:40,018

So then I'd be, if I have a problem like

that, I'll be like, oh, okay, yeah, I can

503

00:32:40,018 --> 00:32:40,978

use that.

504

00:32:40,978 --> 00:32:42,459

And then also ease of adoption.

505

00:32:42,459 --> 00:32:50,121

So is there an open source package in

which language, that I can just, you know,

506

00:32:53,194 --> 00:32:56,275

What kind of trade-off basically do I have

to make?

507

00:32:56,575 --> 00:32:58,716

Is that something that's easy to adopt?

508

00:32:58,716 --> 00:33:03,358

Is that something that has really a lot of

barriers to adoption?

509

00:33:03,478 --> 00:33:06,279

But at the same time, it really seems to

be solving my problem.

510

00:33:06,279 --> 00:33:07,220

You know what I'm saying?

511

00:33:07,220 --> 00:33:15,844

It's like, indeed, it's not only the

technical and theoretical aspects of the

512

00:33:15,844 --> 00:33:20,685

method, but also how easy it is to...

513

00:33:22,162 --> 00:33:24,864

adopt in your existing workflows.

514

00:33:26,325 --> 00:33:26,545

Yes.

515

00:33:26,545 --> 00:33:32,009

And for this, I guess it's, I mean, the

feedback is extremely valuable because

516

00:33:32,009 --> 00:33:37,173

when you know the methods, you're really,

it's hard to exactly locate where people

517

00:33:37,173 --> 00:33:39,134

will not understand what you meant.

518

00:33:39,134 --> 00:33:42,636

And so I really welcome it.

519

00:33:43,277 --> 00:33:44,158

No, for sure.

520

00:33:44,158 --> 00:33:50,141

And already I find that absolutely

incredible that now

521

00:33:50,842 --> 00:33:56,265

Almost all new algorithms, at least that I

talk about on the podcast and that I see

522

00:33:56,265 --> 00:34:01,127

in the community, in the PyMC community,

almost all of them now, when they come up

523

00:34:01,127 --> 00:34:05,790

with a paper, they come out with an open

source package that's usually installable

524

00:34:05,790 --> 00:34:09,031

in a Python, in the Python ecosystem.

525

00:34:09,032 --> 00:34:10,232

Which is really incredible.

526

00:34:10,232 --> 00:34:15,935

I remember that when I started on these a

few years ago, it was really not the norm

527

00:34:16,236 --> 00:34:19,757

and much more the exception and now almost

528

00:34:19,990 --> 00:34:25,632

the accompanying open source package is

almost part of the paper, which is really

529

00:34:25,632 --> 00:34:30,414

good because way more people are going to

use the package than read the paper.

530

00:34:30,414 --> 00:34:34,956

So, this is absolutely a fantastic

evolution.

531

00:34:34,956 --> 00:34:40,758

And thank you, in the name of us all, for

having taken the time to develop the

532

00:34:40,758 --> 00:34:48,501

package, clean up the code, put that on

PyPI and make the documentation, because

533

00:34:48,918 --> 00:34:55,061

That's where the academic incentives are a

bit misaligned with what I think they

534

00:34:55,061 --> 00:34:55,961

should be.

535

00:34:56,482 --> 00:35:00,664

Because unfortunately, literally it takes

time for you to do that.

536

00:35:00,664 --> 00:35:04,286

And it's not very much appreciated by the

academic community, right?

537

00:35:04,286 --> 00:35:07,527

It's just like, you have to do it, but

they don't really care.

538

00:35:07,588 --> 00:35:10,969

We care as the practitioners, but the

academic world doesn't really.

539

00:35:11,190 --> 00:35:12,951

And what counts is the paper.

540

00:35:12,951 --> 00:35:18,266

So for now, unfortunately, it's really

just time that you take.

541

00:35:18,266 --> 00:35:21,608

out of your paper writing time.

542

00:35:21,608 --> 00:35:24,170

So I'm sure everybody appreciates it.

543

00:35:25,431 --> 00:35:27,373

Yes, but I don't know.

544

00:35:27,373 --> 00:35:28,434

I see true value to it.

545

00:35:28,434 --> 00:35:34,619

And I think, although it's maybe not as

rewarded as it should, I think many of us

546

00:35:34,619 --> 00:35:36,000

see value in doing it.

547

00:35:36,000 --> 00:35:38,121

So you're very welcome.

548

00:35:38,662 --> 00:35:39,383

Yeah, yeah.

549

00:35:39,383 --> 00:35:40,104

No, for sure.

550

00:35:40,104 --> 00:35:41,384

Lots of value in it.

551

00:35:41,405 --> 00:35:44,367

Just saying that value should be more

recognized.

552

00:35:47,818 --> 00:35:51,460

Just a random question, but something I'm

always curious about.

553

00:35:53,222 --> 00:35:56,125

I think I know the answer, but I still want

to ask.

554

00:35:56,145 --> 00:36:02,471

Can you handle, or sample, discrete parameters

with these algorithms?

555

00:36:02,471 --> 00:36:08,616

Because that's one of the grails of the

field right now.

556

00:36:08,796 --> 00:36:10,417

How do you sample discrete parameters?

557

00:36:12,126 --> 00:36:18,728

So, okay, the package, so what I've

implemented, tested, is all on continuous

558

00:36:18,728 --> 00:36:19,408

space.

559

00:36:19,408 --> 00:36:28,652

But, but what I need for this algorithm to

work is a generative model that I can

560

00:36:28,652 --> 00:36:30,193

sample from easily.

561

00:36:30,733 --> 00:36:35,315

IID, I mean, not that I have to run a Monte

Carlo to sample from it, but one that I can

562

00:36:35,315 --> 00:36:41,037

just in one Python command, or whichever

language you want, get an IID

563

00:36:41,037 --> 00:36:41,997

sample from.

564

00:36:42,414 --> 00:36:47,015

and that I can write what is the

likelihood of this sample.

565

00:36:47,155 --> 00:36:51,256

Because a lot of generative models

actually don't have tractable likelihoods.

566

00:36:51,256 --> 00:36:55,297

So if you think, I don't know, of

generative adversarial networks or

567

00:36:55,297 --> 00:37:01,139

variational autoencoders for people who

might be familiar with those very, very

568

00:37:01,139 --> 00:37:04,379

common generative models, they don't have

this property.

569

00:37:04,379 --> 00:37:08,941

You can generate samples easily, but you

cannot write down with which density of

570

00:37:08,941 --> 00:37:11,161

probability you've generated this sample.

571

00:37:12,182 --> 00:37:17,324

This is really what we need in order to

use this generative model inside a Markov

572

00:37:17,324 --> 00:37:21,727

chain and inside an algorithm that we know

is going to converge towards the target

573

00:37:21,727 --> 00:37:22,827

distribution.

574

00:37:22,927 --> 00:37:28,431

So normalizing flows are playing this role

for us with continuous variables.

575

00:37:28,431 --> 00:37:34,274

They give us easy sampling and easy

evaluation of the likelihood.
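
(The property being described is the change-of-variables construction behind normalizing flows: a base variable z ~ p_Z, typically a standard Gaussian, is pushed through an invertible map f with a tractable Jacobian, so x = f(z) can be sampled in one pass and its exact log-density evaluated as

\log \rho(x) = \log p_Z\big(f^{-1}(x)\big) + \log\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|,

which is what lets the flow sit inside an exact MCMC acceptance test.)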

576

00:37:34,374 --> 00:37:39,217

But you also have equivalence on discrete

distributions.

577

00:37:39,217 --> 00:37:40,717

And if you want a

578

00:37:41,658 --> 00:37:45,380

generative model that would have those two

properties on discrete distributions, you

579

00:37:45,380 --> 00:37:47,361

should turn to autoregressive

models.

580

00:37:47,361 --> 00:37:53,624

So I don't know if you've learned about

them, but the idea is just that they use a

581

00:37:53,624 --> 00:37:57,966

factorization of probability distributions

that is just with conditional

582

00:37:57,966 --> 00:37:58,807

distributions.

583

00:37:58,807 --> 00:38:05,250

And that's something that in theory has

full expressivity: any distribution

584

00:38:05,250 --> 00:38:11,093

can be written as a factorized

distribution where you progressively condition

585

00:38:11,790 --> 00:38:16,033

on the degrees of freedom that you have

already sampled.

586

00:38:16,894 --> 00:38:22,699

And you can rewrite the algorithm,

training an autoregressive model in the

587

00:38:22,699 --> 00:38:24,401

place of a normalizing flow.
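
A minimal sketch of the autoregressive idea for binary variables, assuming logistic conditionals of the already-sampled coordinates (a real model would parameterize the conditionals with a neural network); it shows the two properties that matter here, exact sampling and an exact log-probability.

```python
# The joint is factorized as p(x) = prod_i p(x_i | x_1, ..., x_{i-1}).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class BinaryAutoregressive:
    def __init__(self, weights, biases):
        # Strictly lower-triangular weights: coordinate i only sees x_1..x_{i-1}.
        self.W = np.tril(np.asarray(weights, dtype=float), k=-1)
        self.b = np.asarray(biases, dtype=float)

    def sample(self, rng):
        d = self.b.size
        x = np.zeros(d)
        for i in range(d):
            p_i = sigmoid(self.W[i] @ x + self.b[i])   # depends only on x[:i]
            x[i] = rng.random() < p_i
        return x

    def log_prob(self, x):
        p = sigmoid(self.W @ x + self.b)                # same conditionals as in sample()
        return np.sum(x * np.log(p) + (1 - x) * np.log1p(-p))

rng = np.random.default_rng(1)
d = 4
model = BinaryAutoregressive(rng.standard_normal((d, d)), rng.standard_normal(d))
x = model.sample(rng)
print(x, model.log_prob(x))   # an exact log-probability, unlike a GAN or VAE
```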

588

00:38:24,401 --> 00:38:31,046

So honest answer, I haven't tried, but it

can be done.

589

00:38:31,206 --> 00:38:32,007

Well, it can be done.

590

00:38:32,007 --> 00:38:35,971

And now that I'm thinking about it, people

have done it because in statistical

591

00:38:35,971 --> 00:38:39,570

mechanics, there are a lot of systems that

we like.

592

00:38:39,570 --> 00:38:42,111

a lot of our toy systems that are binary.

593

00:38:42,111 --> 00:38:48,675

So that's, for example, the Ising model,

which is a model of spins that are just

594

00:38:48,675 --> 00:38:50,135

binary variables.
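
For readers who have not met it, here is a small sketch of the 2D Ising model and the classic single-spin-flip Metropolis move, the kind of local MCMC update that learned "jump" proposals are meant to complement; the parameter values are illustrative.

```python
# Spins s_ij in {-1, +1} with energy E(s) = -J * sum over nearest neighbours s_i s_j.
import numpy as np

def delta_energy(s, i, j, J=1.0):
    # Energy change from flipping spin (i, j), with periodic boundaries.
    L = s.shape[0]
    nb = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
    return 2.0 * J * s[i, j] * nb

def metropolis_sweep(s, beta, rng):
    L = s.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L), rng.integers(L)
        dE = delta_energy(s, i, j)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            s[i, j] *= -1
    return s

rng = np.random.default_rng(2)
L, beta = 16, 0.6                       # beta above ~0.44 is the ordered phase for this model
s = rng.choice([-1, 1], size=(L, L))
for _ in range(200):
    s = metropolis_sweep(s, beta, rng)
print("magnetization per spin:", s.mean())
```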

595

00:38:50,376 --> 00:38:56,179

And I know of at least one paper where

they are doing something of this sort.

596

00:38:56,419 --> 00:39:00,962

So making jumps, they're actually not only

trying to refresh full configurations;

597

00:39:00,962 --> 00:39:06,105

they are doing the two, both refreshing full

configurations and partial configurations.

598

00:39:06,705 --> 00:39:07,685

And they are doing...

599

00:39:07,826 --> 00:39:13,227

something that, in essence, is exactly

this algorithm, but with discrete

600

00:39:13,227 --> 00:39:13,687

variables.

601

00:39:13,687 --> 00:39:19,429

So I'll happily add the reference to this

paper, which is, I think, it's by the

602

00:39:19,429 --> 00:39:21,889

group of Giuseppe Carleo from EPFL.

603

00:39:22,430 --> 00:39:27,651

And OK, I don't think they

train it exactly the same way, so it's not exactly

604

00:39:27,651 --> 00:39:33,793

the same algorithm, but things around this

have been tested.

605

00:39:33,793 --> 00:39:36,013

OK, well, it sounds like a.

606

00:39:36,586 --> 00:39:38,066

Sounds like fun, for sure.

607

00:39:38,686 --> 00:39:43,628

Definitely something I'm sure lots of

people would like to test.

608

00:39:43,628 --> 00:39:48,590

So folks, if you have some discrete

parameters somewhere in your models, maybe

609

00:39:48,590 --> 00:39:51,672

you'll be interested in normalizing flows.

610

00:39:51,672 --> 00:39:57,774

So the flowMC package is in the show

notes.

611

00:39:57,774 --> 00:40:00,975

Feel free to try it out.

612

00:40:00,975 --> 00:40:06,717

Another thing I'm curious about is how do

you train the typical network, actually?

613

00:40:07,638 --> 00:40:13,440

And how much of a bottleneck is it on the

sampling time, if any?

614

00:40:14,381 --> 00:40:14,821

Yes.

615

00:40:14,821 --> 00:40:23,104

So it will definitely depend on the space.

616

00:40:23,665 --> 00:40:25,946

No, let me rephrase.

617

00:40:26,786 --> 00:40:33,189

The thing is, whether or not it's going to

be worth it to train a neural network in

618

00:40:33,189 --> 00:40:34,849

order to help you sample

619

00:40:35,102 --> 00:40:40,564

depends on how difficult it is for you to

sample, I mean, with the more

620

00:40:40,564 --> 00:40:44,005

traditional MCMCs that you have on your

hand.

621

00:40:44,025 --> 00:40:50,668

So again, if you have a multimodal

distribution, it's very likely that your

622

00:40:50,668 --> 00:40:53,789

traditional MCMC algorithms are just not

going to cut it.

623

00:40:54,049 --> 00:40:59,232

And so then, I mean, if you really care

about sampling this posterior distribution

624

00:40:59,232 --> 00:41:04,313

or this distribution of configurations of

a physical system,

625

00:41:04,394 --> 00:41:08,016

then you will be willing to pay the price

on this sampling.

626

00:41:08,016 --> 00:41:16,562

So instead of, say, having to use a local

sampler that will take you billions of

627

00:41:16,562 --> 00:41:22,506

iterations in order to see transitions

between the modes, you can train a

628

00:41:22,506 --> 00:41:27,129

normalizing flow, or an autoregressive

model if you're discrete, and then have

629

00:41:27,129 --> 00:41:29,811

those jumps happening every other time.

630

00:41:30,431 --> 00:41:33,993

Then it's more than clear that it's worth

doing it.
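
A minimal sketch of that mixed scheme, assuming a `flow` object with the `sample`/`log_prob` interface of the earlier affine sketch and a user-supplied `log_target` (the unnormalized log-density you care about); cheap local random-walk moves are interleaved with independence Metropolis-Hastings "jumps" proposed by the generative model, so the chain still targets the right distribution. This is only an illustration of the idea, not the flowMC implementation.

```python
import numpy as np

def local_step(x, log_target, scale, rng):
    prop = x + scale * rng.standard_normal(x.shape)        # symmetric random walk
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        return prop
    return x

def flow_jump_step(x, log_target, flow, rng):
    prop = flow.sample(1, rng)[0]
    # Independence MH ratio: [p(prop) q(x)] / [p(x) q(prop)]
    log_alpha = (log_target(prop) + flow.log_prob(x[None, :])[0]
                 - log_target(x) - flow.log_prob(prop[None, :])[0])
    if np.log(rng.random()) < log_alpha:
        return prop
    return x

def run_chain(x0, log_target, flow, n_steps, jump_every=10, scale=0.2, rng=None):
    rng = rng or np.random.default_rng()
    x, chain = np.asarray(x0, dtype=float), []
    for t in range(n_steps):
        if t % jump_every == 0:
            x = flow_jump_step(x, log_target, flow, rng)   # global move, can hop between modes
        else:
            x = local_step(x, log_target, scale, rng)      # local exploration
        chain.append(x.copy())
    return np.array(chain)
```

The jump move uses the independence Metropolis-Hastings ratio, which is exactly why the generative model must provide both IID samples and an exact density.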

631

00:41:36,410 --> 00:41:40,576

OK, yeah, so the answer is it depends

quite a lot.

632

00:41:41,158 --> 00:41:45,185

Of course, of course.

633

00:41:45,185 --> 00:41:45,425

Yeah, yeah.

634

00:41:45,425 --> 00:41:49,893

And I guess, how does it scale with the

quantity of parameters and quantity of

635

00:41:49,893 --> 00:41:50,613

data?

636

00:41:52,494 --> 00:41:58,398

So quantity of parameters, it's really

this dimension I was already discussing a

637

00:41:58,398 --> 00:42:03,743

bit about and telling you that there is a

cap on what you can really expect these

638

00:42:03,743 --> 00:42:05,384

methods will work on.

639

00:42:05,625 --> 00:42:10,348

I would say that if the quantity of

parameters is something like tens or

640

00:42:11,029 --> 00:42:16,313

hundreds, then things are going to work

well, more or less out of the box.

641

00:42:17,514 --> 00:42:20,697

But if it's larger than this, you will

likely run into trouble.

642

00:42:22,698 --> 00:42:29,460

And then the number of data is actually

something I'm less familiar with because

643

00:42:29,460 --> 00:42:34,201

I'm less from the Bayesian communities

than the stat-mech community to start

644

00:42:34,201 --> 00:42:34,401

with.

645

00:42:34,401 --> 00:42:39,663

So my distributions don't have data

embedded in them, in a sense, most of the

646

00:42:39,663 --> 00:42:40,383

time.

647

00:42:40,563 --> 00:42:45,924

But for sure, what people argue, why it's

a really good idea to use generative

648

00:42:45,924 --> 00:42:49,765

models such as normalizing flows to sample

in the Bayesian context.

649

00:42:49,850 --> 00:42:53,713

is the fact that you have an amortization

going on.

650

00:42:53,713 --> 00:42:55,654

And what do I mean by that?

651

00:42:55,735 --> 00:42:57,636

I mean that you're learning a model.

652

00:42:58,037 --> 00:43:04,662

Once it's learned, it's going to be easy

to adjust it if things are changing a

653

00:43:04,662 --> 00:43:05,443

little.

654

00:43:05,443 --> 00:43:09,446

And with little adjustments, you're going

to be able to sample still a very

655

00:43:09,446 --> 00:43:10,667

complicated distribution.

656

00:43:10,667 --> 00:43:16,372

So say you have data that is arriving

online, and you keep on having new samples

657

00:43:16,372 --> 00:43:18,466

to be added to your posterior

distribution.

658

00:43:18,466 --> 00:43:22,787

then it's very easy to just adjust the

normalizing flow with a few training

659

00:43:22,787 --> 00:43:29,168

iterations to get back to the new

posterior you actually have now, given

660

00:43:29,168 --> 00:43:31,069

that you have this amount of data.

661

00:43:31,069 --> 00:43:36,270

So this is what some people call

amortization, the fact that you can really

662

00:43:36,771 --> 00:43:41,492

encapsulate in your model all the

knowledge you have so far, and then just

663

00:43:41,492 --> 00:43:47,293

adjust it a bit, and don't have to start

from scratch, as you would have to in

664

00:43:47,293 --> 00:43:47,870

other

665

00:43:47,870 --> 00:43:49,130

Monte Carlo methods.
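
A toy illustration of the amortization point, with a 1D Gaussian standing in for the normalizing flow: once the approximation has been fitted, a slightly changed target only needs a few more gradient steps instead of a fit from scratch. The Gaussian family, the KL objective and the numbers are assumptions made for the sake of a small, dependency-free example.

```python
import numpy as np

def kl_grad(m, s, m_star, s_star):
    # Gradients of KL( N(m, s^2) || N(m_star, s_star^2) ) w.r.t. m and s.
    dm = (m - m_star) / s_star**2
    ds = -1.0 / s + s / s_star**2
    return dm, ds

def fit(m, s, m_star, s_star, lr=0.1, tol=1e-4, max_iter=10_000):
    for it in range(max_iter):
        dm, ds = kl_grad(m, s, m_star, s_star)
        if abs(dm) < tol and abs(ds) < tol:
            return m, s, it
        m, s = m - lr * dm, s - lr * ds
    return m, s, max_iter

# Fit to the "old posterior", then adjust to a slightly changed one.
m, s, it_cold = fit(0.0, 1.0, m_star=3.0, s_star=0.5)      # from scratch
_, _, it_warm = fit(m, s, m_star=3.2, s_star=0.45)          # warm start after new data arrives
print("iterations from scratch:", it_cold, "| after a small update:", it_warm)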

666

00:43:49,170 --> 00:43:50,170

Yeah.

667

00:43:50,231 --> 00:43:54,592

Yeah, so what I'm guessing is that maybe

the tuning time is a bit longer than a

668

00:43:54,592 --> 00:43:56,233

classic HMC.

669

00:43:56,233 --> 00:44:00,455

But then once you're out of the tuning

phase, the sampling is going to be way

670

00:44:00,455 --> 00:44:01,996

faster.

671

00:44:01,996 --> 00:44:05,117

Yes, I think that's a correct way of

putting it.

672

00:44:05,857 --> 00:44:13,081

And otherwise, for the kind of

dimensionality that the

673

00:44:13,081 --> 00:44:15,661

algorithm is comfortable with,

674

00:44:16,294 --> 00:44:21,377

in general, how have you noticed the

running times of the model being?

675

00:44:21,377 --> 00:44:27,342

Has that been close to when you use

a classic HMC, or is it something you

676

00:44:27,342 --> 00:44:34,607

haven't done yet?

677

00:44:34,607 --> 00:44:37,289

I don't think I can honestly answer this

question.

678

00:44:37,289 --> 00:44:42,873

I think it will depend because it will

also depend on how easily your HMC reaches

679

00:44:42,873 --> 00:44:43,653

all the

680

00:44:44,262 --> 00:44:46,403

regions you actually care about.

681

00:44:47,324 --> 00:44:52,288

So I mean, probably there are some

distributions that are very easy for HMC

682

00:44:52,288 --> 00:44:56,271

to cover and where it wouldn't be worth it

to train the model.

683

00:44:56,672 --> 00:45:01,075

But then plenty of cases where things are

the other way around.

684

00:45:01,075 --> 00:45:03,577

Yeah, yeah, yeah.

685

00:45:03,577 --> 00:45:05,719

Yeah, I can guess.

686

00:45:05,719 --> 00:45:10,843

That's always something that's really

fascinating in this algorithm world: how

687

00:45:11,163 --> 00:45:14,165

dependent everything is on the model, on the

688

00:45:14,270 --> 00:45:17,651

use case, really dependent on the model

and the data.

689

00:45:18,852 --> 00:45:26,056

So on this project, on this algorithm,

what are the next steps for you?

690

00:45:26,596 --> 00:45:34,420

What would you like to develop next on

this algorithm precisely?

691

00:45:34,721 --> 00:45:42,805

Yes, so as I was saying, one of my main

questions is how to scale this algorithm

692

00:45:42,805 --> 00:45:43,274

and

693

00:45:43,274 --> 00:45:45,976

We kind of wrote it in an all-purpose

fashion.

694

00:45:45,976 --> 00:45:49,158

And all-purpose is nice, but all-purpose

does not scale.

695

00:45:49,578 --> 00:45:55,002

So that's really what I'm focusing on,

trying to understand what structures we

696

00:45:55,002 --> 00:46:03,268

can know or we can learn

from the system, how to explore them and

697

00:46:03,268 --> 00:46:08,712

put them in, in order to be able to tackle

more and more complex systems with higher,

698

00:46:08,712 --> 00:46:10,713

I mean, more degrees of freedom.

699

00:46:10,950 --> 00:46:14,931

So more parameters than what we are

currently doing.

700

00:46:14,931 --> 00:46:15,831

So there's this.

701

00:46:15,831 --> 00:46:21,513

And of course, I'm also very interested in

having some collaborations with people

702

00:46:21,513 --> 00:46:27,515

that care about actual problems for which

this method is actually solving something

703

00:46:27,515 --> 00:46:28,355

for them.

704

00:46:29,515 --> 00:46:34,857

As it's really what gives you the idea of

what's next to be developed, what are the

705

00:46:34,857 --> 00:46:36,417

next methodologies that

706

00:46:36,806 --> 00:46:38,247

will be useful to people?

707

00:46:38,247 --> 00:46:39,668

Can they already solve their problem?

708

00:46:39,668 --> 00:46:41,569

Do they need something more from you?

709

00:46:41,669 --> 00:46:46,012

And that's the two things I'm having a

look at.

710

00:46:46,192 --> 00:46:46,853

Yeah.

711

00:46:46,853 --> 00:46:49,835

Well, it definitely sounds like fun.

712

00:46:50,415 --> 00:46:56,319

And I hope you'll be able to work on that

and come up with some new, amazing,

713

00:46:56,500 --> 00:46:58,701

exciting papers on this.

714

00:46:59,101 --> 00:47:01,563

I'll be happy to look at that.

715

00:47:01,723 --> 00:47:05,510

And so that's it.

716

00:47:05,510 --> 00:47:07,892

It was a great deep dive on this project.

717

00:47:07,892 --> 00:47:11,255

And thank you for indulging my

questions, Marylou.

718

00:47:12,276 --> 00:47:17,982

Now, if we want to zoom out a bit and talk

about other things you do, you're also

719

00:47:17,982 --> 00:47:25,769

interested in machine learning in the context

of scarce data.

720

00:47:26,169 --> 00:47:30,773

So I'm curious about what you're doing on

this, if you could elaborate a bit.

721

00:47:31,818 --> 00:47:37,479

Yes, so I guess what I mean by scarce data

is precisely that when we are using

722

00:47:38,099 --> 00:47:43,881

machine learning in scientific computing,

usually what we are doing is exploiting

723

00:47:43,881 --> 00:47:50,063

the great tool that are deep neural

networks to play the role of a surrogate

724

00:47:50,063 --> 00:47:53,144

model somewhere in our scientific

computation.

725

00:47:53,224 --> 00:47:58,965

But most of the time, this is without data

a priori.

726

00:47:59,018 --> 00:48:02,339

We know that there is a function we want

to approximate somewhere.

727

00:48:02,339 --> 00:48:07,281

But in order to have data, either we have

to pay the price of costly experiments,

728

00:48:07,281 --> 00:48:11,783

costly observations, or we have to pay the

price of costly numerics.

729

00:48:11,883 --> 00:48:16,505

So if you, I mean, a very famous example

of applications of machine learning

730

00:48:16,505 --> 00:48:21,587

for scientific computing is molecular

dynamics at quantum precision.

731

00:48:21,587 --> 00:48:24,728

So this is what people call density

functional theory.

732

00:48:25,209 --> 00:48:26,729

So if you want to.

733

00:48:27,538 --> 00:48:33,079

observe the dynamics of a molecule with

the accuracy of what's going on really at

734

00:48:33,079 --> 00:48:38,120

the level of quantum mechanics, then you

have to make very, very costly calls to a

735

00:48:38,120 --> 00:48:44,402

function that predicts what's the energy

predicted by quantum mechanics and what

736

00:48:44,402 --> 00:48:47,023

are the forces predicted by quantum

mechanics.

737

00:48:47,423 --> 00:48:53,285

So people have seen here an opportunity to

use deep neural nets in order to just

738

00:48:53,285 --> 00:48:57,105

regress what's the value of this quantum

potential.

739

00:48:58,350 --> 00:49:00,971

at the different locations that you're

going to visit.

740

00:49:01,192 --> 00:49:04,634

And the idea is that you are creating your

own data.

741

00:49:04,634 --> 00:49:09,717

You are deciding when you are going to pay

the price of doing the full numerical

742

00:49:09,717 --> 00:49:15,500

computation and then obtain a training

point: given these Cartesian coordinates, what

743

00:49:15,500 --> 00:49:17,722

is the value of this energy here.

744

00:49:17,982 --> 00:49:22,245

And then you have to, I mean, conversely

to what you're doing traditionally in

745

00:49:22,245 --> 00:49:24,138

machine learning, where you believe that

you have...

746

00:49:24,138 --> 00:49:30,061

huge data sets that are encapsulating a

rule, and you're going to try to exploit

747

00:49:30,061 --> 00:49:31,061

them at best.

748

00:49:31,061 --> 00:49:34,983

Here, you have the choice of where you

create your data.

749

00:49:34,983 --> 00:49:40,686

And so you, of course, have to be as smart

as possible in order to create as

750

00:49:40,686 --> 00:49:43,468

few training points as possible.

751

00:49:43,468 --> 00:49:49,831

And so this is this idea of working with

scarce data that has to be infused in the

752

00:49:49,831 --> 00:49:53,493

usage of machine learning in scientific

computing.
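
A small sketch of that "create your own data" loop: keep a cheap surrogate of an expensive function and only pay for the expensive call where the surrogate is most uncertain. The `expensive_energy` function, the polynomial surrogate, and the jittered-ensemble uncertainty proxy are all illustrative stand-ins, not the DFT calculations or neural-network potentials used in practice.

```python
import numpy as np

def expensive_energy(x):
    # Hypothetical stand-in for a costly quantum-mechanical energy evaluation.
    return np.sin(3 * x) + 0.5 * x**2

def fit_ensemble(X, y, degree=3, n_members=10, noise=0.05, rng=None):
    # Crude uncertainty proxy: refit the surrogate several times on jittered targets.
    rng = rng or np.random.default_rng()
    return [np.polyfit(X, y + noise * rng.standard_normal(len(y)), degree)
            for _ in range(n_members)]

def ensemble_std(members, grid):
    preds = np.array([np.polyval(c, grid) for c in members])
    return preds.std(axis=0)

rng = np.random.default_rng(3)
grid = np.linspace(-2, 2, 200)
X = np.linspace(-2, 2, 4)                                    # a handful of initial "experiments"
y = expensive_energy(X)

for _ in range(10):
    members = fit_ensemble(X, y, rng=rng)
    x_next = grid[np.argmax(ensemble_std(members, grid))]    # query where the surrogate is least sure
    X = np.append(X, x_next)
    y = np.append(y, expensive_energy(x_next))                # one more costly call, by choice

print("training points used:", len(X))
```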

753

00:49:54,534 --> 00:49:58,675

My example of application is just what we

have discussed, where we want to learn a

754

00:49:58,675 --> 00:50:03,998

deep generative model, where, when we

start, we just have our target

755

00:50:03,998 --> 00:50:07,560

distribution as an objective, but we don't

have any sample from it.

756

00:50:07,560 --> 00:50:12,442

That would be the traditional data that

people will be using in generative

757

00:50:12,442 --> 00:50:15,863

modeling to train a generative model.

758

00:50:15,863 --> 00:50:18,905

So if you want, we are playing this

adaptive game

759

00:50:18,905 --> 00:50:20,985

I was already hinting at a bit,

760

00:50:21,330 --> 00:50:24,752

where we are creating data that is not

exactly the data we want, but that we

761

00:50:24,752 --> 00:50:28,794

believe is informative of the data we want

to train the generative model that is in

762

00:50:28,794 --> 00:50:34,017

turn going to help us converge the MCMC,

and at the same time as you are training

763

00:50:34,017 --> 00:50:37,179

your model, generate the data you would

have needed to train your model.

764

00:50:39,380 --> 00:50:41,822

Yeah, that is really cool.

765

00:50:41,822 --> 00:50:46,344

And of course I asked about that because

scarce data is something that's extremely

766

00:50:46,344 --> 00:50:48,145

common in the Bayesian world.

767

00:50:49,142 --> 00:50:55,845

That's where usually Bayesian statistics

are, yeah, helpful and useful, because

768

00:50:56,586 --> 00:51:00,127

when you don't have a lot of data, you

need more structure and more priors

769

00:51:00,127 --> 00:51:04,449

if you want to say anything about your

phenomenon of interest.

770

00:51:06,134 --> 00:51:10,157

So that's really cool that you're working

on that.

771

00:51:10,257 --> 00:51:11,198

I love that.

772

00:51:12,239 --> 00:51:18,104

And also, you know, from a bit broader

perspective, you know MCMC really well.

773

00:51:18,104 --> 00:51:19,385

We work on it a lot.

774

00:51:19,385 --> 00:51:25,590

So I'm curious where you think MCMC is

heading in the next few years.

775

00:51:25,731 --> 00:51:29,013

And if you see its relevance waning in

some way.

776

00:51:30,774 --> 00:51:39,798

Well, I don't think MCMC can go out of

fashion in a sense because it's absolutely

777

00:51:39,798 --> 00:51:40,739

ubiquitous.

778

00:51:40,739 --> 00:51:45,662

So practical use cases are everywhere.

779

00:51:45,662 --> 00:51:52,305

If you have a large probabilistic model,

usually it's given to you by the nature of

780

00:51:52,305 --> 00:51:54,126

the problem you want to study.

781

00:51:54,246 --> 00:51:58,949

And if you cannot choose anything about

putting in the right properties, you're

782

00:51:58,949 --> 00:52:00,042

just going to be,

783

00:52:00,042 --> 00:52:04,883

you know, left with something that you

don't know how to approach except by MCMC.

784

00:52:04,884 --> 00:52:10,746

So it's absolutely ubiquitous as an

algorithm for probabilistic inference.

785

00:52:11,146 --> 00:52:17,789

And I would also say that one of the

things that are going to, you know, keep

786

00:52:17,789 --> 00:52:23,532

MCMC going for a long time is how much

it's a cherished object of study by

787

00:52:23,532 --> 00:52:27,690

actually researchers from different

communities, because I mean...

788

00:52:27,690 --> 00:52:33,931

You can see people really from statistics

that are kind of the prime researchers on,

789

00:52:34,031 --> 00:52:39,493

okay, how should you make a Monte Carlo

method that has the best convergence

790

00:52:39,493 --> 00:52:43,774

properties, the best speed of convergence,

and so on and so forth.

791

00:52:43,814 --> 00:52:48,375

But you can also see that the fields where

those algorithms are used a lot, be it

792

00:52:48,375 --> 00:52:53,977

statistical mechanics, be it Bayesian

inference, also have full communities that

793

00:52:53,977 --> 00:52:56,177

are working on developing MCMCs.

794

00:52:56,242 --> 00:53:03,164

And so I think it's really a matter that

they are an object of curiosity, intriguing

795

00:53:03,164 --> 00:53:04,724

to a lot of people.

796

00:53:04,724 --> 00:53:13,666

And therefore it's something that for

now is still very relevant and really

797

00:53:13,666 --> 00:53:14,367

unsolved.

798

00:53:14,367 --> 00:53:17,488

I mean, something that I love about MCMC

is that when you look at it first, you

799

00:53:17,488 --> 00:53:20,849

say, yeah, that's simple, you know?

800

00:53:20,849 --> 00:53:20,989

Yeah.

801

00:53:20,989 --> 00:53:23,569

Yes, that's, but then you start thinking

about it.

802

00:53:23,569 --> 00:53:24,369

Then you...

803

00:53:25,034 --> 00:53:28,836

I mean, realize how subtle all the

properties of those algorithms are.

804

00:53:28,836 --> 00:53:34,580

And you're telling yourself, but I cannot

believe it's so hard to actually sample

805

00:53:34,580 --> 00:53:40,404

from distributions that are not that

complicated when you're a naive newcomer.

806

00:53:41,085 --> 00:53:47,269

And so, yeah, I mean, for now, I think

they are still here and in place.

807

00:53:47,269 --> 00:53:52,573

And if I could even comment a bit more

regarding exactly the context of my

808

00:53:52,573 --> 00:53:53,546

research, where

809

00:53:53,546 --> 00:53:57,968

it could seemingly be the case that I'm

trying to replace MCMCs with machine

810

00:53:57,968 --> 00:53:58,808

learning.

811

00:53:59,369 --> 00:54:04,271

I would warn the listeners that it's not

at all what we are concluding.

812

00:54:04,271 --> 00:54:06,953

I mean, that's not at all the direction we

are going to.

813

00:54:06,953 --> 00:54:11,195

It's really a case where we need both.

814

00:54:11,195 --> 00:54:16,338

That MCMC can benefit from learning, but

learning without MCMC is never going to

815

00:54:16,338 --> 00:54:21,220

give you something that you have enough

guarantees on, something that you can

816

00:54:21,220 --> 00:54:23,626

really trust for sure.

817

00:54:23,626 --> 00:54:28,847

So I think here there is a really nice

combination of MCMC and learning and that

818

00:54:28,847 --> 00:54:32,408

they're just going to nurture each other

and not replace one another.

819

00:54:33,649 --> 00:54:34,989

Yeah, yeah, for sure.

820

00:54:34,989 --> 00:54:41,631

And I really love, yeah, these

projects of trying to make basically MCMC

821

00:54:41,631 --> 00:54:46,773

more informed instead of having first

random draws, you know, almost random

822

00:54:46,773 --> 00:54:48,746

draws with Metropolis in the end.

823

00:54:48,746 --> 00:54:52,428

making that more complicated, more

informed with the gradients, with HMC, and

824

00:54:52,428 --> 00:54:57,992

then normalizing flows, which try to

squeeze a bit more information out of the

825

00:54:58,012 --> 00:55:02,155

structure that you have to make the

sampling go faster.

826

00:55:03,156 --> 00:55:05,878

I found that one super useful.

827

00:55:05,878 --> 00:55:11,923

And also, yeah, that's also a very, very

fascinating part of the research.

828

00:55:11,923 --> 00:55:15,874

And this is also part of a lot of the

research,

829

00:55:15,874 --> 00:55:19,634

a lot of initiatives that you have focused

on, right?

830

00:55:19,755 --> 00:55:25,616

Personally, basically what we could

describe as machine learning-assisted

831

00:55:25,616 --> 00:55:26,577

scientific computing.

832

00:55:26,577 --> 00:55:34,679

You know, and do you have other examples

to share with us on how machine learning

833

00:55:34,679 --> 00:55:40,240

is helping traditional scientific

computing methods?

834

00:55:40,240 --> 00:55:40,640

Yes.

835

00:55:40,640 --> 00:55:44,001

So, for example, I was giving already the

example of

836

00:55:45,754 --> 00:55:50,316

the learning, the regression, of the

potentials of molecular force fields for

837

00:55:50,316 --> 00:55:52,976

people that are studying molecules.

838

00:55:53,757 --> 00:55:57,239

But we are seeing a lot of other things

going on.

839

00:55:57,239 --> 00:56:03,201

So there are people that are trying to

even use machine learning as a black box

840

00:56:03,381 --> 00:56:09,984

in order to, how should I say, to make

classifications between things they care

841

00:56:09,984 --> 00:56:10,344

about.

842

00:56:10,344 --> 00:56:13,765

So for example, you have samples that come

from a model.

843

00:56:14,002 --> 00:56:16,964

But you're not sure if they come from this

model or this other one.

844

00:56:16,964 --> 00:56:20,686

You're not sure if they are above a

critical temperature or below a critical

845

00:56:20,686 --> 00:56:23,288

temperature, if they belong to the same

phase.

846

00:56:23,989 --> 00:56:31,234

So you can really try to play this game of

creating an artificial data set where you

847

00:56:31,234 --> 00:56:37,078

know what is the answer, train a

classifier, and then use your black box to

848

00:56:37,078 --> 00:56:41,441

tell you when you see a new configuration

which type of configuration it is.
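
A minimal sketch of that classification workflow: build a labelled data set where the answer is known by construction, train a classifier, then apply it to new configurations. The "configurations" below are a crude synthetic stand-in (independently biased spins for the two phases) and the classifier is plain logistic regression in NumPy, not the deep networks used in the work being described.

```python
import numpy as np

def make_configs(n, d, p_up, rng):
    # Spins in {-1, +1}; p_up controls how ordered the configuration is.
    return np.where(rng.random((n, d)) < p_up, 1.0, -1.0)

def train_logreg(X, y, lr=0.1, n_iter=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (p - y) / len(y)                  # gradient of the log-loss
        grad_b = np.mean(p - y)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

rng = np.random.default_rng(4)
n, d = 500, 64
X = np.vstack([make_configs(n, d, 0.9, rng),             # "ordered" phase, label 1
               make_configs(n, d, 0.5, rng)])             # "disordered" phase, label 0
y = np.concatenate([np.ones(n), np.zeros(n)])

w, b = train_logreg(X, y)
X_new = make_configs(10, d, 0.85, rng)
p_new = 1.0 / (1.0 + np.exp(-(X_new @ w + b)))
print("probability of being in the ordered phase:", np.round(p_new, 2))
```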

849

00:56:41,441 --> 00:56:42,741

And it's really

850

00:56:43,098 --> 00:56:50,200

given to you by deep learning because you

would have no idea why the neural net is

851

00:56:50,200 --> 00:56:52,541

deciding that it's actually from this or

from this.

852

00:56:52,541 --> 00:56:56,703

You don't have any other statistics that

you can gather and that will tell you

853

00:56:56,703 --> 00:56:58,764

what the answer is and why.

854

00:56:58,764 --> 00:57:04,386

But it's kind of like opening this new

conceptual door that sometimes there are

855

00:57:04,386 --> 00:57:05,607

things that are predictable.

856

00:57:05,607 --> 00:57:09,748

I mean, you can check that, okay, on the

data where you know the answer, the

857

00:57:09,748 --> 00:57:11,649

machine is extremely efficient.

858

00:57:12,874 --> 00:57:17,295

But then you don't know why things are

happening this way.

859

00:57:17,475 --> 00:57:23,818

I mean, there's this, but there are plenty

of other directions.

860

00:57:23,818 --> 00:57:28,740

So people that are, for example, using

neural networks to try to discover a

861

00:57:28,740 --> 00:57:29,580

model.

862

00:57:29,580 --> 00:57:35,903

And here, model would be actually what

people call partial differential

863

00:57:35,903 --> 00:57:37,503

equations, so PDEs.

864

00:57:38,284 --> 00:57:41,605

So I don't know if you've heard about

those physics-informed neural networks.

865

00:57:43,010 --> 00:57:47,753

But there are neural networks that people

are training such that they are solutions

866

00:57:47,753 --> 00:57:48,894

of a PDE.

867

00:57:48,894 --> 00:57:56,459

So instead of actually having training

data, what you do is that you use the

868

00:57:56,459 --> 00:57:59,981

properties of the deep neural nets, which

are that they are differentiable with

869

00:57:59,981 --> 00:58:03,844

respect to their parameters, but also with

respect to their inputs.

870

00:58:04,164 --> 00:58:06,706

And for example, you have a function f.

871

00:58:06,706 --> 00:58:11,769

And you know that the Laplacian of f is

supposed to be equal to

872

00:58:12,174 --> 00:58:17,979

the derivative in time of f, well, you can

write a mean squared loss on the fact that

873

00:58:17,979 --> 00:58:23,863

the Laplacian of your neural network has

to be close to its derivative in time.

874

00:58:24,124 --> 00:58:28,508

And then, given boundary conditions, so

maybe initial condition in time and

875

00:58:28,508 --> 00:58:35,253

boundary condition in space, you can ask a

neural net to predict the solution of the

876

00:58:35,253 --> 00:58:36,254

PDE.
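
A sketch of that physics-informed loss for the 1D heat equation u_t = u_xx: penalize the PDE residual of a small network at interior points, plus the mismatch at initial and boundary points. Real PINNs obtain the derivatives by automatic differentiation and then minimize this loss; the dependency-free sketch below uses finite differences and only evaluates the loss at a random initialization, with illustrative initial and boundary conditions.

```python
import numpy as np

def net(params, x, t):
    # A tiny tanh network u_theta(x, t); params = (W1, b1, w2, b2).
    W1, b1, w2, b2 = params
    h = np.tanh(np.outer(x, W1[0]) + np.outer(t, W1[1]) + b1)
    return h @ w2 + b2

def pinn_loss(params, x, t, eps=1e-3):
    u = lambda xx, tt: net(params, xx, tt)
    u_t = (u(x, t + eps) - u(x, t - eps)) / (2 * eps)                  # finite-difference d/dt
    u_xx = (u(x + eps, t) - 2 * u(x, t) + u(x - eps, t)) / eps**2      # finite-difference d2/dx2
    residual = np.mean((u_t - u_xx) ** 2)                              # the PDE itself as a loss
    x0 = np.linspace(0, np.pi, 50)
    initial = np.mean((u(x0, np.zeros_like(x0)) - np.sin(x0)) ** 2)    # u(x, 0) = sin(x)
    tb = np.linspace(0, 1, 50)
    boundary = np.mean(u(np.zeros(50), tb) ** 2 + u(np.full(50, np.pi), tb) ** 2)  # u = 0 at the walls
    return residual + initial + boundary

rng = np.random.default_rng(5)
params = (rng.standard_normal((2, 16)) * 0.5, np.zeros(16),
          rng.standard_normal(16) * 0.5, 0.0)
x = rng.uniform(0, np.pi, 200)
t = rng.uniform(0, 1, 200)
print("loss at random initialization:", pinn_loss(params, x, t))
```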

877

00:58:36,334 --> 00:58:40,237

And even better, you can give to your

878

00:58:40,294 --> 00:58:46,637

learning mechanism a library of terms that

would be possible candidates for being

879

00:58:46,637 --> 00:58:48,358

part of the PDE.

880

00:58:48,378 --> 00:58:53,481

And you can let the network tell you which

terms of the PDE in the library

881

00:58:53,481 --> 00:58:59,244

actually seem to be present in the data

you are observing.
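
The term-selection idea can also be sketched with a classical tool rather than the neural-network mechanism described here: build a library of candidate terms, regress the observed time derivative onto it, and keep only the large coefficients (SINDy-style sparse regression). The synthetic data below solve the heat equation u_t = u_xx exactly, so the regression should keep u_xx with a coefficient near 1 and drop the other candidates.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
t = np.linspace(0, 1, 50)
X, T = np.meshgrid(x, t, indexing="ij")
u = np.exp(-T) * np.sin(X) + np.exp(-4 * T) * np.sin(2 * X)   # exact heat-equation solution

dx, dt = x[1] - x[0], t[1] - t[0]
u_t = np.gradient(u, dt, axis=1)
u_x = np.gradient(u, dx, axis=0)
u_xx = np.gradient(u_x, dx, axis=0)

# Candidate library: [u, u_x, u_xx, u * u_x]
library = np.column_stack([f.ravel() for f in (u, u_x, u_xx, u * u_x)])
coeffs, *_ = np.linalg.lstsq(library, u_t.ravel(), rcond=None)
coeffs[np.abs(coeffs) < 0.1] = 0.0                             # hard threshold: keep only strong terms

for name, c in zip(["u", "u_x", "u_xx", "u*u_x"], coeffs):
    print(f"{name:6s} coefficient: {c:+.3f}")
```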

882

00:58:59,244 --> 00:59:04,767

So, I mean, there are all kinds of

inventive ways that researchers are now

883

00:59:04,828 --> 00:59:08,569

using the fact that deep neural nets are

differentiable,

884

00:59:09,378 --> 00:59:16,562

smooth, can generalize easily, and, yes,

are universal approximators.

885

00:59:16,562 --> 00:59:23,346

I mean, seemingly you can use neural nets

to represent any kind of function and use

886

00:59:23,346 --> 00:59:31,130

that inside their computation problems to

try to, I don't know, answer all kinds of

887

00:59:31,130 --> 00:59:31,950

scientific questions.

888

00:59:31,950 --> 00:59:34,531

So it's, I believe, pretty exciting.

889

00:59:34,692 --> 00:59:36,933

Yeah, yeah, that is super fun.

890

00:59:36,933 --> 00:59:38,033

I love how

891

00:59:38,378 --> 00:59:46,023

you know, these come together to help on

really hard sampling problems, like

892

00:59:46,023 --> 00:59:50,086

sampling ODEs or PDEs, which is just extremely

hard.

893

00:59:50,967 --> 00:59:52,609

So yeah, using that.

894

00:59:52,609 --> 00:59:55,291

Maybe one day also we'll get something for

GPs.

895

00:59:55,291 --> 01:00:02,156

I know with Gaussian processes a lot of

the effort is on decomposing them and

896

01:00:02,156 --> 01:00:04,417

finding some useful

897

01:00:04,510 --> 01:00:10,813

algebraic decompositions, like the

Hilbert space Gaussian processes that Bill

898

01:00:10,813 --> 01:00:17,257

Engels especially has added to the PyMC

API, or eigenvalue decompositions, stuff

899

01:00:17,257 --> 01:00:18,017

like that.

900

01:00:18,017 --> 01:00:24,701

But I'd be curious to see if there are

also some initiatives on trying to help

901

01:00:24,701 --> 01:00:28,543

the convergence of Gaussian processes using

probably deep neural networks, because

902

01:00:28,543 --> 01:00:33,125

there is a mathematical connection between

neural networks and GPs.

903

01:00:33,838 --> 01:00:36,159

I mean, everything is a GP in the end, it

seems.

904

01:00:36,159 --> 01:00:42,824

So yeah, using a neural network to

facilitate the sampling of a Gaussian

905

01:00:42,824 --> 01:00:46,286

process would be super fun.

906

01:00:46,286 --> 01:00:49,048

So I have so many more questions.

907

01:00:49,048 --> 01:00:54,212

But I want to be mindful of your time; we've

already been recording for some time.

908

01:00:54,212 --> 01:01:00,236

So I'll try to make my thoughts more compact.

909

01:01:00,236 --> 01:01:02,617

But something I wanted to ask you:

910

01:01:03,278 --> 01:01:08,960

you teach actually a course at

Polytechnique in France that's called

911

01:01:08,960 --> 01:01:11,001

Emerging Topics in Machine Learning.

912

01:01:11,061 --> 01:01:19,845

So I'm curious to hear you say what are

some of the emerging topics that excite

913

01:01:19,845 --> 01:01:23,987

you the most and how do you approach

teaching them?

914

01:01:23,987 --> 01:01:29,629

So in this class, it's actually the nice

class where we have a wild card to just

915

01:01:29,629 --> 01:01:31,269

talk about whatever we want.

916

01:01:31,542 --> 01:01:37,346

So as far as I'm concerned, I'm really

teaching about the last point that we

917

01:01:37,346 --> 01:01:42,730

discussed, which is how can we hope to use

the technology of machine learning to

918

01:01:42,730 --> 01:01:44,451

assist scientific computing.

919

01:01:45,392 --> 01:01:49,815

And I have colleagues that are jointly

teaching this class with me that are, for

920

01:01:49,815 --> 01:01:56,900

example, teaching about optimal transport

or about private and federated learning.

921

01:01:56,900 --> 01:01:59,481

So it can be different topics.

922

01:01:59,606 --> 01:02:05,588

But we all have the same approach to it,

which is to introduce to the students the

923

01:02:05,588 --> 01:02:14,372

main ideas quite briefly and then to give

them the opportunity to learn, to read

924

01:02:14,372 --> 01:02:20,214

papers that we believe are important or at

least really illustrative of those ideas

925

01:02:20,214 --> 01:02:24,876

and the direction in which the research is

going and to read these papers, of course,

926

01:02:24,876 --> 01:02:25,456

critically.

927

01:02:25,456 --> 01:02:29,430

So the idea is that we want to make sure

that they are understood.

928

01:02:29,430 --> 01:02:32,271

We also want them to implement the

methods.

929

01:02:32,271 --> 01:02:36,654

And once you implement the methods, you

realize everything that is sometimes swept under

930

01:02:36,654 --> 01:02:38,154

the rug in the paper.

931

01:02:38,154 --> 01:02:41,236

So where is it really difficult?

932

01:02:42,277 --> 01:02:46,019

Where is the method really making a

difference?

933

01:02:47,040 --> 01:02:48,200

And so on and so forth.

934

01:02:48,200 --> 01:02:50,921

So that's our approach to it.

935

01:02:52,778 --> 01:02:55,399

Yeah, that must be a very fun course.

936

01:02:56,300 --> 01:02:58,161

At which level do you teach that?

937

01:02:59,002 --> 01:03:02,785

So our students are third year at Ecole

Polytechnique.

938

01:03:02,785 --> 01:03:07,608

So that would be equivalent to the first

year of graduate program.

939

01:03:07,768 --> 01:03:10,250

Yeah.

940

01:03:10,250 --> 01:03:14,573

And actually, looking forward, what do you

think are the most promising areas of

941

01:03:14,573 --> 01:03:17,115

research in what you do?

942

01:03:17,115 --> 01:03:20,757

So basically, interaction of machine

learning and statistical physics.

943

01:03:21,770 --> 01:03:28,832

Well, I think something that actually has

been and will continue being a very, very

944

01:03:28,832 --> 01:03:32,334

fruitful field between statistical

mechanics and machine learning are

945

01:03:32,334 --> 01:03:33,314

generative models.

946

01:03:33,314 --> 01:03:42,258

So you probably heard of diffusion models,

and they are a new kind of generative

947

01:03:42,258 --> 01:03:48,961

models that are relying on learning how to

reverse a diffusion process, a diffusion

948

01:03:48,961 --> 01:03:50,921

process that is noising the data.

949

01:03:52,070 --> 01:03:56,531

once you've learned how to reverse it,

it will allow you to transform noise into

950

01:03:56,531 --> 01:03:57,411

data.
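
A 1D sketch of that mechanism: a forward process gradually noises the data, and the reverse process turns noise back into data using the score of the noised distribution. Real diffusion models learn that score with a neural network; here the data distribution is assumed to be a two-component Gaussian mixture, so the score of every noised marginal is available in closed form and no learning is needed.

```python
import numpy as np

weights, means, stds = np.array([0.5, 0.5]), np.array([-2.0, 2.0]), np.array([0.3, 0.3])
T = 200
betas = np.linspace(1e-4, 0.05, T)      # forward noising schedule
alphas = 1.0 - betas
abars = np.cumprod(alphas)

def score(x, t):
    # Exact score of the noised marginal q_t(x) for the Gaussian-mixture data.
    m = np.sqrt(abars[t]) * means
    v = abars[t] * stds**2 + (1.0 - abars[t])
    diff = x[:, None] - m[None, :]
    comp = weights * np.exp(-0.5 * diff**2 / v) / np.sqrt(2 * np.pi * v)
    return (comp * (-diff / v)).sum(axis=1) / comp.sum(axis=1)

rng = np.random.default_rng(6)
x = rng.standard_normal(5000)                     # start from pure noise
for t in reversed(range(T)):                      # reverse the noising process step by step
    mean = (x + betas[t] * score(x, t)) / np.sqrt(alphas[t])
    noise = rng.standard_normal(x.size) if t > 0 else 0.0
    x = mean + np.sqrt(betas[t]) * noise

print("fraction of samples near each mode:",
      np.mean(x < 0).round(2), np.mean(x > 0).round(2))
```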

951

01:03:58,651 --> 01:04:03,212

It's something that is really close to

statistical mechanics because the

952

01:04:03,453 --> 01:04:09,094

diffusion really comes from studying

Brownian particles that are all around

953

01:04:09,094 --> 01:04:09,574

us.

954

01:04:09,574 --> 01:04:12,815

And this is where this mathematics comes

from.

955

01:04:12,835 --> 01:04:17,897

And this is still an object of study in

the field of statistical mechanics.

956

01:04:17,897 --> 01:04:21,817

And it has inspired a lot of machine

learning models.

957

01:04:22,034 --> 01:04:25,836

I could also cite Boltzmann machines.

958

01:04:26,157 --> 01:04:29,560

I mean, they have even the name of the

father of statistical mechanics,

959

01:04:29,560 --> 01:04:30,520

Boltzmann.

960

01:04:31,021 --> 01:04:36,705

And it's here again, I mean, something

where it's really inspiration from the

961

01:04:36,705 --> 01:04:44,352

models studied by physicists that gave the

first forms of models that were used by

962

01:04:44,352 --> 01:04:47,234

machine learners in order to do density

estimation.

963

01:04:47,234 --> 01:04:51,977

So there is really this cross-fertilization that

964

01:04:52,238 --> 01:04:55,518

has been here for, I guess, the last 50

years.

965

01:04:55,518 --> 01:05:02,780

The field of machine learning has really

emerged from these communities.

966

01:05:02,780 --> 01:05:09,282

And I'm hoping that my work and all the

groups that are working in this direction

967

01:05:09,282 --> 01:05:13,123

are also going to demonstrate the other

way around, that generative models can

968

01:05:13,123 --> 01:05:15,964

help also a lot in statistical mechanics.

969

01:05:15,964 --> 01:05:21,105

So that's definitely what I am looking

forward to.

970

01:05:21,105 --> 01:05:21,785

Yeah.

971

01:05:22,510 --> 01:05:28,918

Yeah, I love that and understand why

you're talking about that, especially now

972

01:05:28,918 --> 01:05:31,300

with the whole conversation we've had.

973

01:05:32,322 --> 01:05:34,704

That your answer is not surprising to me.

974

01:05:38,354 --> 01:05:44,116

Actually, something also that I mean, even

broader than that, I'm guessing you

975

01:05:44,116 --> 01:05:50,179

already care a lot about these questions

from what I get, but if you could choose

976

01:05:50,179 --> 01:05:58,962

the questions you'd like to see the answer

to before you die, what would they be?

977

01:05:58,962 --> 01:06:01,783

That's obviously a very vast question.

978

01:06:03,544 --> 01:06:06,745

If I stick to a bit really this...

979

01:06:07,122 --> 01:06:13,306

what we've discussed about the sampling

problems and where I think they are hard

980

01:06:13,306 --> 01:06:15,868

and why they are so intriguing.

981

01:06:16,048 --> 01:06:22,153

I think that something I'm very keen on

seeing some progress around is this

982

01:06:22,153 --> 01:06:27,176

question of sampling multimodal

distributions but coming with

983

01:06:27,176 --> 01:06:28,317

guarantees.

984

01:06:28,877 --> 01:06:33,781

Here, there's really, in a sense, sampling

a multimodal distribution could be just

985

01:06:33,781 --> 01:06:34,581

judged

986

01:06:34,974 --> 01:06:35,734

undoable.

987

01:06:35,734 --> 01:06:40,756

I mean, there is some NP-hardness that is

hidden somewhere in this picture.

988

01:06:40,756 --> 01:06:44,557

So of course, it's not going to be

something general, but I'm really

989

01:06:44,557 --> 01:06:48,639

wondering, I mean, I'm really thinking

that there should be some assumption, some

990

01:06:48,639 --> 01:06:54,481

way of formalizing the problem under which

we could understand how to construct

991

01:06:54,481 --> 01:06:59,943

algorithms that will provably, you know,

succeed in making this happen.

992

01:07:00,664 --> 01:07:04,565

And so here, I don't know, it's a

theoretical question, but I'm

993

01:07:05,350 --> 01:07:09,531

very curious about what we will manage to

say in this direction.

994

01:07:10,051 --> 01:07:12,432

Yeah.

995

01:07:12,432 --> 01:07:16,513

And actually that sets us up, I think, for

the last two questions of the show.

996

01:07:16,513 --> 01:07:22,274

So, I mean, I have other questions, but

we've already been recording for a long

997

01:07:22,274 --> 01:07:22,474

time.

998

01:07:22,474 --> 01:07:26,215

So I need to let you go and have dinner.

999

01:07:26,215 --> 01:07:27,196

I know it's late for you.

Speaker:

01:07:27,196 --> 01:07:29,896

So let me ask you the last two questions.

Speaker:

01:07:29,896 --> 01:07:32,437

I ask every guest at the end of the show.

Speaker:

01:07:33,097 --> 01:07:33,874

First one.

Speaker:

01:07:33,874 --> 01:07:38,321

If you had unlimited time and resources,

which problem would you try to solve?

Speaker:

01:07:40,562 --> 01:07:45,244

I think it's an excellent question because

it's an excellent opportunity maybe to say

Speaker:

01:07:45,244 --> 01:07:49,485

that we don't have unlimited resources.

Speaker:

01:07:50,306 --> 01:07:57,149

I think it's probably the biggest

challenge we have right now to understand

Speaker:

01:07:57,149 --> 01:08:02,631

and to collectively understand because I

think now we individually understand that

Speaker:

01:08:02,631 --> 01:08:04,792

we don't have unlimited resources.

Speaker:

01:08:05,132 --> 01:08:07,913

And in a sense the...

Speaker:

01:08:08,834 --> 01:08:14,156

the biggest problem is how do we move this

complex system of human societies we have

Speaker:

01:08:14,196 --> 01:08:19,778

created in order to move within the

direction where we are using precisely

Speaker:

01:08:19,778 --> 01:08:21,058

less resources.

Speaker:

01:08:21,279 --> 01:08:25,881

And I mean, it has nothing to do with

anything that we have discussed before,

Speaker:

01:08:25,881 --> 01:08:32,083

but it feels to me that it's really where

the biggest question is lying that really

Speaker:

01:08:32,083 --> 01:08:33,304

matters today.

Speaker:

01:08:33,504 --> 01:08:35,965

And I have no clue how to approach it.

Speaker:

01:08:36,425 --> 01:08:37,105

But

Speaker:

01:08:38,046 --> 01:08:39,606

I think it's actually what matters.

Speaker:

01:08:39,606 --> 01:08:46,389

And if I had unlimited time and

resources, that's definitely what I would

Speaker:

01:08:46,529 --> 01:08:47,989

be researching towards.

Speaker:

01:08:49,270 --> 01:08:51,791

Yeah.

Speaker:

01:08:51,791 --> 01:08:52,431

Love that answer.

Speaker:

01:08:52,431 --> 01:08:54,752

And you're definitely in good company.

Speaker:

01:08:54,932 --> 01:08:59,494

Lots of people have talked about that for

this question, actually.

Speaker:

01:08:59,875 --> 01:09:04,076

And second question, if you could have

dinner with any great scientific mind,

Speaker:

01:09:04,136 --> 01:09:07,417

dead, alive, or fictional, who would it

be?

Speaker:

01:09:09,518 --> 01:09:14,201

So, I mean, a logical answer, given my last

response, is actually Grothendieck.

Speaker:

01:09:14,201 --> 01:09:20,584

So, I don't know, you probably know about

this mathematician who, I mean, was

Speaker:

01:09:20,744 --> 01:09:27,988

somebody worried about, you know, our

relationship to the world, let's say, as

Speaker:

01:09:27,988 --> 01:09:34,732

scientists very early on, and who had

concluded that to some extent we should

Speaker:

01:09:34,732 --> 01:09:36,213

not be doing research.

Speaker:

01:09:36,573 --> 01:09:37,253

So...

Speaker:

01:09:38,174 --> 01:09:44,835

I don't know that I agree, but I also

don't think it's obviously wrong.

Speaker:

01:09:44,835 --> 01:09:50,617

So I think it would be really probably one

of the most interesting discussions, adding

Speaker:

01:09:50,617 --> 01:09:54,258

on top that he was a fantastic

speaker.

Speaker:

01:09:54,258 --> 01:09:58,579

And I do invite you to listen to his

talks, and I think it would be really

Speaker:

01:09:58,579 --> 01:10:00,580

fascinating to have this conversation.

Speaker:

01:10:01,340 --> 01:10:02,120

Yeah.

Speaker:

01:10:02,180 --> 01:10:02,920

Great.

Speaker:

01:10:03,420 --> 01:10:04,061

Great answer.

Speaker:

01:10:04,061 --> 01:10:06,741

You know, definitely the first one to

answer Grothendieck.

Speaker:

01:10:07,922 --> 01:10:09,402

But that'd be cool.

Speaker:

01:10:09,402 --> 01:10:09,542

Yeah.

Speaker:

01:10:09,542 --> 01:10:14,624

If you have a favorite talk of his,

feel free to put that in the show notes

Speaker:

01:10:14,624 --> 01:10:19,266

for listeners, I think it's going to be

really interesting and fun for people.

Speaker:

01:10:19,727 --> 01:10:21,247

Might be in French, but...

Speaker:

01:10:22,328 --> 01:10:26,369

I mean, there are a lot of subtitles now.

Speaker:

01:10:26,369 --> 01:10:31,932

If it's in YouTube, it's doing a pretty

good job at the automated transcription,

Speaker:

01:10:31,932 --> 01:10:32,772

especially in English.

Speaker:

01:10:32,772 --> 01:10:35,213

So I think it will be okay.

Speaker:

01:10:36,498 --> 01:10:40,059

And that will be good for people's French

lessons.

Speaker:

01:10:40,059 --> 01:10:43,961

So yeah, you know, two birds with one

stone.

Speaker:

01:10:44,241 --> 01:10:45,981

So definitely include that now.

Speaker:

01:10:47,642 --> 01:10:48,563

Awesome, Marylou.

Speaker:

01:10:48,563 --> 01:10:51,124

So that was really great.

Speaker:

01:10:51,124 --> 01:10:54,705

Thanks a lot for taking the time and being

so generous with your time.

Speaker:

01:10:55,546 --> 01:10:59,268

I'm happy because I had a lot of

questions, but I think we did a pretty

Speaker:

01:10:59,268 --> 01:11:02,789

good job at tackling most of them.

Speaker:

01:11:03,029 --> 01:11:03,969

As usual,

Speaker:

01:11:04,070 --> 01:11:08,375

I put resources and a link to your website

in the show notes for those who want to

Speaker:

01:11:08,375 --> 01:11:09,316

dig deeper.

Speaker:

01:11:09,316 --> 01:11:12,580

Thank you again, Marylou, for taking the

time and being on this show.

Speaker:

01:11:13,301 --> 01:11:14,883

Thank you so much for having me.
