Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Changing perspective is often a great way to solve burning research problems. Riemannian spaces are such a perspective change, as Arto Klami, an Associate Professor of computer science at the University of Helsinki and member of the Finnish Center for Artificial Intelligence, will tell us in this episode.
He explains the concept of Riemannian spaces, their application in inference algorithms, how they can help sampling Bayesian models, and their similarity with normalizing flows, that we discussed in episode 98.
Arto also introduces PreliZ, a tool for prior elicitation, and highlights its benefits in simplifying the process of setting priors, thus improving the accuracy of our models.
When Arto is not solving mathematical equations, you’ll find him cycling, or around a good board game.
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser and Julio.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉
Takeaways:
– Riemannian spaces offer a way to improve computational efficiency and accuracy in Bayesian inference by considering the curvature of the posterior distribution.
– Riemannian spaces can be used in Laplace approximation and Markov chain Monte Carlo algorithms to better model the posterior distribution and explore challenging areas of the parameter space.
– Normalizing flows are a complementary approach to Riemannian spaces, using non-linear transformations to warp the parameter space and improve sampling efficiency.
– Evaluating the performance of Bayesian inference algorithms in challenging cases is a current research challenge, and more work is needed to establish benchmarks and compare different methods.
– PreliZ is a package for prior elicitation in Bayesian modeling that facilitates communication with users through visualizations of predictive and parameter distributions.
– Careful prior specification is important, and tools like PreliZ make the process easier and more reproducible.
– Teaching Bayesian machine learning is challenging due to the combination of statistical and programming concepts, but it is possible to teach the basic reasoning behind Bayesian methods to a diverse group of students.
– The integration of Bayesian approaches in data science workflows is becoming more accepted, especially in industries that already use deep learning techniques.
– The future of Bayesian methods in AI research may involve the development of AI assistants for Bayesian modeling and probabilistic reasoning.
Chapters:
00:00 Introduction and Background
02:05 Arto’s Work and Background
06:05 Introduction to Bayesian Inference
12:46 Riemannian Spaces in Bayesian Inference
27:24 Availability of Riemannian-based Algorithms
30:20 Practical Applications and Evaluation
37:33 Introduction to PreliZ
38:03 Prior Elicitation
39:01 Predictive Elicitation Techniques
39:30 PreliZ: Interface with Users
40:27 PreliZ: General Purpose Tool
41:55 Getting Started with PreliZ
42:45 Challenges of Setting Priors
45:10 Reproducibility and Transparency in Priors
46:07 Integration of Bayesian Approaches in Data Science Workflows
55:11 Teaching Bayesian Machine Learning
01:06:13 The Future of Bayesian Methods with AI Research
01:10:16 Solving the Prior Elicitation Problem
Links from the show:
- LBS #29, Model Assessment, Non-Parametric Models, And Much More, with Aki Vehtari: https://learnbayesstats.com/episode/model-assessment-non-parametric-models-aki-vehtari/
- LBS #20 Regression and Other Stories, with Andrew Gelman, Jennifer Hill & Aki Vehtari: https://learnbayesstats.com/episode/20-regression-and-other-stories-with-andrew-gelman-jennifer-hill-aki-vehtari/
- LBS #98 Fusing Statistical Physics, Machine Learning & Adaptive MCMC, with Marylou Gabrié: https://learnbayesstats.com/episode/98-fusing-statistical-physics-machine-learning-adaptive-mcmc-marylou-gabrie/
- Arto’s website: https://www.cs.helsinki.fi/u/aklami/
- Arto on Google Scholar: https://scholar.google.com/citations?hl=en&user=v8PeLGgAAAAJ
- Multi-source probabilistic inference Group: https://www.helsinki.fi/en/researchgroups/multi-source-probabilistic-inference
- FCAI web page: https://fcai.fi
- Probabilistic AI summer school lectures: https://www.youtube.com/channel/UCcMwNzhpePJE3xzOP_3pqsw
- Keynote: “Better priors for everyone” by Arto Klami: https://www.youtube.com/watch?v=mEmiEHsfWyc&ab_channel=ProbabilisticAISchool
- Variational Inference and Optimization I by Arto Klami: https://www.youtube.com/watch?v=60USDNc1nE8&list=PLRy-VW__9hV8s–JkHXZvnd26KgjRP2ik&index=3&ab_channel=ProbabilisticAISchool
- PreliZ, A tool-box for prior elicitation: https://preliz.readthedocs.io/en/latest/
- AISTATS paper that presents the new computationally efficient metric in the context of MCMC: https://researchportal.helsinki.fi/en/publications/lagrangian-manifold-monte-carlo-on-monge-patches
- TMLR paper that scales up the solution for larger models, using the metric for sampling-based inference in deep learning: https://openreview.net/pdf?id=dXAuvo6CGI
- Riemannian Laplace approximation (to appear in AISTATS’24): https://arxiv.org/abs/2311.02766
- Prior Knowledge Elicitation — The Past, Present, and Future: https://projecteuclid.org/journals/bayesian-analysis/advance-publication/Prior-Knowledge-Elicitation-The-Past-Present-and-Future/10.1214/23-BA1381.full
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
Let me show you how to be a good b...

He explains the concept of Riemannian spaces, their application in inference algorithms, how they can help sampling Bayesian models, and their similarity with normalizing flows that we discussed in episode 98. Arto also introduces PreliZ, a tool for prior elicitation, and highlights its benefits in simplifying the process of setting priors, thus improving the accuracy of our models. When Arto is not solving mathematical equations, you'll find him cycling, or around a good board game.

This is Learning Bayesian Statistics, episode 103, recorded February 15, 2024.
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country, for any info about the show. LearnBayesStats.com is Laplace to be: show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's LearnBayesStats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best wishes to you all.
Arto Klami, welcome to Learning Bayesian Statistics.

Thank you. You're welcome.

How was my Finnish pronunciation?

Oh, I think that was excellent.

For people who don't have the video, I don't think that was true. So thanks a lot for taking the time, Arto. I'm really happy to have you on the show. And I've had a lot of questions for you for a long time, and the longer we postponed the episode, the more questions. So I'm gonna do my best to not take three hours of your time. And let's start by... maybe defining the work you're doing nowadays, and, well, how did you end up working on this?
Yes, sure. So I personally identify as a machine learning researcher. So I do machine learning research, but very much from a Bayesian perspective. My original background is in computer science. I'm essentially a self-educated statistician, in the sense that I've never really studied statistics properly, well, except for a few courses here and there. But I've been building models and algorithms on the Bayesian principles for addressing various kinds of machine learning problems.

So you're basically like a self-taught statistician through learning, let's say.

More or less, yes. I think the first things I started doing with anything that had to do with Bayesian statistics was pretty much already going into the deep end and trying to learn posterior inference for fairly complicated models, even actually non-parametric models in some ways.

Yeah, we're going to dive a bit into that. Before that, can you tell us the topics you are particularly focusing on within that umbrella of topics you've named?
Yes, absolutely. So I think I actually have a few somewhat distinct areas of interest. On one hand, I'm working really on the kind of core inference problem: how do we computationally efficiently, and accurately enough, approximate the posterior distributions? Recently, we've been especially working on inference algorithms that build on concepts from Riemannian geometry. So we're trying to really account for the actual manifold induced by this posterior distribution, and somehow utilize these concepts to speed up inference. So that's one very technical aspect.

Then the other main theme, on the Bayesian side, is priors. So we're working on prior elicitation: how do we actually go about specifying the prior distributions? And ideally, maybe not even specifying. So how would we extract that knowledge from a domain expert who doesn't necessarily even have any sort of statistical training? And how do we flexibly represent their true beliefs and then encode them as part of a model? Those are maybe the main technical aspects there.

Yeah. No, super fun. And we're definitely going to dive into those two aspects a bit later in the show. I'm really interested in that. Before that, do you remember how you first got introduced to Bayesian inference, actually, and also why it stuck with you?
Yeah, like I said, I'm in some sense self-trained. I mean, coming with the computer science background, it happened more or less sometime during my PhD. I was working in a research group that was led by Samuel Kaski. When I joined the group, we were working on neural networks of the kind that people were interested in back then. That was like 20 years ago. So we were working on things like self-organizing maps and these kinds of methods. And then we started working on applications where we really bumped into the kind of small sample size problems. So looking at DNA microarray data that had tens of thousands of dimensions, in medical applications with 20 samples. We essentially figured out that we're gonna need to take the uncertainty into account properly.

We started working on the Bayesian modeling side of these, and one of the very first things I was doing was trying to create Bayesian versions of some of the classical analysis methods, especially canonical correlation analysis, whose original derivation is like an information-theoretic formulation. So I kind of dove directly into this: let's do Bayesian versions of models. But I actually do remember that around the same time I also took a course by Aki Vehtari. He's one of the authors of this Gelman et al. book. I think the first version of the book had been released just before that. So Aki was giving a course where he was teaching based on that book. And I think that's the first real official contact on trying to understand the actual details behind the principles.
Yeah, and actually I'm pretty sure listeners are familiar with Aki. He's been on the show already, so I'll link, of course, to the episodes where Aki was. And yeah, for sure, I also recommend going through these episodes' show notes for people who are interested in, well, starting to learn about Bayes stuff and things like that.

Something I'm wondering from what you just explained is: so you define yourself as a machine learning researcher, right? And you work in artificial intelligence too. But there is this interaction with the Bayesian framework. How does that framework underpin your research in statistical machine learning and artificial intelligence? How does that all combine?

Yeah. Well, that's a broad topic. There's of course a lot in that intersection. I personally do view all learning problems in some sense from a Bayesian perspective. I mean, no matter whether it's a very simple fitting-a-linear-regression type of problem, or whether it's figuring out the parameters of a neural network with 1 billion parameters, it's ultimately still a statistical inference problem. In most of the cases, I'm quite confident that we can't figure out the parameters exactly. We need to somehow quantify the uncertainty, and I'm not really aware of any other kind of principled way of doing it. So I would just think about it as: we're always doing Bayesian inference, in some sense. But then there's the issue of how far we can go in practice. So it's going to be approximate; it's possibly going to be very crude approximations. But I would still view it through the lens of Bayesian statistics in my own work. And that's what I do when I teach my BSc students, for example. I mean, not all of them explicitly formulate the learning algorithms from these perspectives, but we are still talking about what the relationship is, what we can assume about the algorithms, what we can assume about the result, and how it would relate to properly estimating everything exactly how it should be done.

Yeah, okay, that's an interesting perspective. So, basically putting that in that framework. And that makes me think, then: what do you believe the impact of Bayesian machine learning is on the broader field of AI? What does that bring to that field?
Let's say it has a big effect. It has a very big impact, in the sense that pretty much most of the stuff that is happening on the machine learning front, and hence also in all the learning-based AI solutions... I think a lot of people are thinking about it roughly the same way as I am: that there is an underlying learning problem that we would ideally want to solve more or less following exactly the Bayesian principles. They don't necessarily talk about it from this perspective. So you might be happily writing algorithms where all the justification for the choices you make comes from somewhere else. But I think a lot of people are accepting the kind of probabilistic basis of these. So for instance, if you think about the objectives that people are optimizing in deep learning, they're all essentially likelihoods of some assumed probabilistic model. And most of the regularizers they are considering do have an interpretation as some kind of a prior distribution.
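To make that correspondence concrete, here is the standard identity behind it, with weight decay as an illustrative example (the example is ours, not Arto's): minimizing a regularized loss is maximum a posteriori estimation under the matching prior.

```latex
% Regularized training loss read as MAP estimation:
% data-fit term = negative log-likelihood, regularizer = negative log-prior.
\hat{\theta}
  = \arg\min_{\theta}\;
    \underbrace{-\log p(\mathcal{D} \mid \theta)}_{\text{training loss}}
    + \underbrace{\lambda \lVert \theta \rVert_2^2}_{\text{weight decay}}
  = \arg\max_{\theta}\; \log p(\theta \mid \mathcal{D}),
  \qquad
  p(\theta) \propto e^{-\lambda \lVert \theta \rVert_2^2}.
```

Weight decay thus corresponds to a zero-mean Gaussian prior on the weights; other regularizers map to other priors in the same way.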
And I think a lot of people are, all the time, going deeper and deeper into actually explicitly thinking about it from these perspectives. So we have a lot of these deep learning types of approaches, variational autoencoders, Bayesian neural networks, various kinds of generative AI models, that are actually even explicitly formulated as probabilistic models plus some sort of an approximate inference scheme. So I think these things are two sides of the same coin, and people are more and more thinking about them from the same perspective.
Okay, yeah, that's super interesting. Actually, let's start diving into these topics from a more technical perspective. So, you've mentioned the research and advances you are working on regarding Riemannian spaces. I think it'd be super fun to talk about that, because we've never really talked about it on the show. So maybe, can you give listeners a primer on what a Riemannian space is? Why would you even care about that? And what you are doing in this regard, what your research is in this regard?

Yes, let's try. I mean, this is a bit of a mathematical concept to talk about. But ultimately, if you think about most of the learning algorithms, we are thinking that there are some parameters that live in some space. And essentially, without thinking about it, we just assume that it's a Euclidean space, in the sense that we can measure distances between two parameters, how similar they are, and it doesn't matter which direction we go: if the distance is the same, we think that they are equally far away. Now, a Riemannian geometry is one that is curved in some sense. So we may be stretching the space in certain ways, and we'll be doing this stretching locally. What it actually means, for example, is that if you start interpolating between two possible values for a parameter, two parameter configurations, the shortest path between them is a shortest path in this Riemannian geometry, which is not necessarily a straight line in the underlying Euclidean space. So that's what Riemannian geometry is in general: the tools and machinery we need to work with these kinds of settings.

And the relationship to statistical inference comes from trying to define such a Riemannian space that has somehow nice characteristics. Maybe the concept most people might actually be aware of would be the Fisher information matrix, which characterizes the curvature induced by a particular probabilistic model.
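For readers who want the formal picture: a Riemannian metric is a position-dependent inner product G(θ), curve lengths are measured through it, and the Fisher information matrix is the classic choice of G for a probabilistic model. A minimal sketch of the two definitions:

```latex
% Length of a curve \gamma(t) under a position-dependent metric G:
L(\gamma) = \int_0^1
  \sqrt{\dot{\gamma}(t)^\top \, G(\gamma(t)) \, \dot{\gamma}(t)} \; dt,
\qquad \text{geodesic} \;=\; \arg\min_{\gamma} L(\gamma).

% Fisher information metric induced by a model p(x \mid \theta):
G(\theta) = \mathbb{E}_{x \sim p(x \mid \theta)}\!
  \left[ \nabla_{\theta} \log p(x \mid \theta) \,
         \nabla_{\theta} \log p(x \mid \theta)^{\top} \right].
```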
These tools then allow, for example, a very recent thing that we did, which is going to come out later this spring at AISTATS: an extension of the Laplace approximation to a Riemannian geometry. For those of you who know what the Laplace approximation is, it's essentially just fitting a normal distribution at the mode of a distribution. But if we now fit the same normal distribution in a suitably chosen Riemannian space, we can actually also model the curvature around the posterior mode, and even how it stretches. So we get a more flexible approximation: we are still fitting a normal distribution, we're just doing it in a different space.
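As a point of reference, here is a minimal sketch of the standard Euclidean Laplace approximation that the Riemannian version generalizes; the toy log-posterior and the finite-difference Hessian are illustrative choices of ours, not code from the papers.

```python
import numpy as np
from scipy.optimize import minimize

# Toy unnormalized log-posterior (stand-in for a real model).
def log_post(theta):
    return -0.5 * np.sum(theta**2) - 0.1 * theta[0]**4

def neg_log_post(theta):
    return -log_post(theta)

def numerical_hessian(f, x, eps=1e-4):
    """Finite-difference Hessian of f at x."""
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

# 1) Find the posterior mode.
mode = minimize(neg_log_post, x0=np.ones(2)).x
# 2) The inverse curvature at the mode becomes the Gaussian covariance.
cov = np.linalg.inv(numerical_hessian(neg_log_post, mode))

# Approximation: theta ~ N(mode, cov). Exact only for Gaussian
# posteriors; fitting the same Gaussian in a curved space is what
# lets the Riemannian variant track skew and stretching.
print("mode:", mode, "\ncov:\n", cov)
```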
Not sure how easy that was to follow, but at least maybe it gives some sort of an idea.

Yeah, yeah, yeah. That was actually, I think, a pretty approachable introduction. And so, if I understood correctly, you're gonna use these Riemannian approximations to come up with better algorithms. Is that what you do, and why you focus on Riemannian spaces? And yeah, if you can introduce that, and tell us basically why it is interesting to look at geometry in these different ways instead of the classical Euclidean way of seeing geometry.

Yeah, I think that's exactly what it is about. One other thing, maybe another perspective for thinking about it, is that we've also been doing Markov chain Monte Carlo algorithms, so MCMC, in these Riemannian spaces. And what we can achieve with those is that if you have, let's say, a posterior distribution that has some sort of a narrow funnel, some very narrow area that extends far away in one corner of your parameter space, it's actually very difficult to get there with something like standard Hamiltonian Monte Carlo. But with the Riemannian methods, we can make these narrow funnels equally easy compared to the flatter areas.
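The funnel he describes is easy to reproduce; here is a minimal sketch of Neal's funnel in PyMC (our illustration, not from the episode), whose narrow neck routinely produces divergences under default HMC/NUTS:

```python
import pymc as pm

# Neal's funnel: the scale of x depends exponentially on v, so the
# "neck" at very negative v becomes extremely narrow.
with pm.Model() as funnel:
    v = pm.Normal("v", mu=0.0, sigma=3.0)
    x = pm.Normal("x", mu=0.0, sigma=pm.math.exp(v / 2), shape=9)
    idata = pm.sample()  # expect divergence warnings near the neck

# Divergences concentrated at small v are the telltale sign that the
# sampler cannot adapt to the local geometry, the regime where
# Riemannian methods (or a reparameterization) help.
print(int(idata.sample_stats["diverging"].sum()))
```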
Now, of course, this may sound like a magic bullet, and that we should be doing all inference with these techniques. Of course, it does come with certain computational challenges. Like I said, the shortest paths are no longer straight lines, so we need numerical integration to follow the geodesic paths in these metrics, and so on. So it's a bit of a compromise, of course. They have very nice theoretical properties, and we've been able to get them working also in practice, in many cases, so that they are comparable with the current state of the art. But it's not always easy.

Yeah, there is no free lunch.

Yes. Yeah.
Do you have any resources about these? Well, first on the concept of Riemannian spaces, and then on the algorithms that you folks derived in your group using these Riemannian spaces, for people who are interested?

Yeah, I wouldn't know, let's say, a very particular resource I would recommend on Riemannian geometry. It is actually a rather, let's say, mathematically involved topic. But regarding the specific methods, I think they are... It's a couple of my recent papers. So we have this Laplace approximation coming out at AISTATS this year. The MCMC sampler we had, I think, two years ago at AISTATS, similarly: the first MCMC method building on these. And then, last year, one paper in Transactions on Machine Learning Research. I think they are more or less accessible.

Let's definitely link to those papers, if you can, in the show notes, because I'm personally curious about it, but I also think listeners will be. It sounds, from what you're saying, that this idea of doing algorithms in this Riemannian space is somewhat recent. Am I right? And why would it appear now? Why would it become interesting now?
Well, it's not actually that recent. I think the basic principle goes back, I don't know, maybe 20 years or so. I think the main reason why we've been working on this right now is that we've been able to resolve some of the computational challenges. The fundamental problem with these methods is always this numeric integration for following the shortest paths. Depending on the algorithm, we needed it for different reasons, but we always needed to do it, and it usually requires operations like inversion of a metric tensor, which has the dimensionality of the parameter space. So we came up with a particular metric that happens to have a computationally efficient inverse. So there are these kinds of concrete algorithmic techniques that are bringing the computational cost down to a level where it's no longer notably more expensive than doing standard Euclidean methods. So we can, for example, scale them to Bayesian neural networks. That's one of the application cases we are looking at: really very high-dimensional problems, where we are still able to do some of these Riemannian techniques, or approximations of them.
That was going to be my next question. In which cases are these approximations interesting? In which cases would you recommend listeners actually invest time to use these techniques, because they have a better chance of working than the classic Hamiltonian Monte Carlo samplers that are the default in most probabilistic programming languages?

Yeah, I think the easy answer is: when the inference problem is hard. So, essentially, one very practical criterion would be that you realize that you can't really get Hamiltonian Monte Carlo to explore the space, the posterior, properly. It may be difficult to find out that this is happening, of course: if you're never visiting a certain corner, you wouldn't actually know. But if you have some sort of a reason to believe that you really are dealing with such a complex posterior, you may be willing to spend a bit more extra computation to be careful, so that you really try to cover every corner there is.

Another example is that we realized, in the scope of these Bayesian neural networks, that there are certain kinds of scenarios where we can show that if you do inference with too simple methods, so something in the Euclidean metric with the standard Langevin dynamics type of thing, what we actually see is that if you switch to using better prior distributions in your model, you don't actually see an advantage from those, unless you at the same time switch to using an inference algorithm that is able to handle the extra complexity. So if you have, for example, heavy-tailed spike-and-slab types of priors in the neural network, you just fail to get any benefit from these better priors if you don't pay a bit more attention to how you do the inference.
Okay, super interesting. And also, it seems it's also quite interesting to look at that when you have, well, or when you suspect that you have, multi-modal posteriors.

Yes, well, yeah, multimodal posteriors are interesting. We haven't specifically studied that question. We have actually thought about some ideas for creating metrics that would specifically encourage exploring the different modes, but we haven't done that concretely. So we're now still focusing on these kinds of narrow, thin areas of posteriors, and how you can reach those.

Okay. And do you know of normalizing flows?

Sure, yes.

So yeah, we've had Marylou Gabrié on the show recently; it was episode 98. And she's working a lot on these normalizing flows and the idea of assisting MCMC sampling with these machine learning methods. And it's amazing. It can sound somewhat similar to what you do in your group. And so, for listeners, could you explain the difference between the two ideas, and maybe also the use cases that both apply to?
Yeah, I think you're absolutely right. So they are very closely related. There is, for example, the basic idea of neural transport, which uses normalizing flows for essentially transforming the parameter space in a suitable non-linear way, and then running standard Euclidean Hamiltonian Monte Carlo. It can actually be proven, I think it is in the original paper as well, that it is mathematically equivalent to conducting Riemannian inference in a suitable metric.
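The warping he refers to is an ordinary change of variables; a minimal sketch of the textbook identity underlying flow-based preconditioning (standard form, not taken from that paper):

```latex
% Sample z in a simple base space, map it through a flow f, and
% correct the density with the Jacobian (change-of-variables formula):
\theta = f(z), \quad z \sim q_0,
\qquad
q(\theta) = q_0\!\left(f^{-1}(\theta)\right)
            \left| \det J_{f^{-1}}(\theta) \right|.
% Running HMC on the simple z-space is then equivalent to sampling
% \theta under a correspondingly warped geometry.
```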
So I would say that it's a complementary approach to solving exactly the same problem. You have a way of warping your parameter space in a flexible way: you either do it through a metric, or you do it as a pre-transformation. So there's a lot of similarities. It's also similar in terms of computation, in some sense: if you think about mapping a sample through a normalizing flow, it's actually very close to what we do with the Riemannian Laplace approximation, where you take a sample and start propagating it through some sort of a transformation. It's just whether that transformation is defined through a metric or as a flow. So yes, they are very close.

So now the question is: when should I be using one of these? I'm afraid I don't really have an answer, in the sense that, I mean, there are computational properties. For example, if you work with flows, you do need to pre-train them: you need to train some sort of a flow to be able to use it in certain applications, so it comes with some pre-training cost. Quite likely, when you're actually using it, it's going to be faster than working in a Riemannian metric, where you need to invert some metric tensors and so on. So there are these kinds of technical differences. Then I think the bigger question is, of course, that if we go to really challenging problems, for example very high dimensions, which of these methods actually works well there? For that, I don't quite have an answer, in the sense that I wouldn't dare to say, or even speculate, which one; I might miss some kind of obvious limitations of one of the approaches if trying to extrapolate too far from what we've actually tried in practice.
Yeah, that's what I was going to say. It's also that these methods are really at the frontier of the science. So I guess we're lacking, for now, the practical cases, right? And probably in a few years we'll have more ideas about these, and about when one is more appropriate than another. But for now, I guess we have to try those algorithms and see what we get back. And so, actually, what if people want to try these Riemannian-based algorithms? Do you already have packages that we can link to, that people can try and plug their own model into?

Yes and no. So we have released open-source code with each of the research papers, so there is a reference implementation that can be used. We have internally been working a bit towards integrating these into proper open ecosystems that would make, for example, model specification easy. It's not quite there yet. One particular challenge is that many of the environments don't actually have all the support functionality you need for the Riemannian methods: they're essentially simplifying some of the things by directly encoding the assumption that the shortest path is an interpolation, that it's a straight line. So you need a bit of extra machinery in the most established libraries. There are some libraries, I believe, that are actually making it fairly easy to do kind of plug-and-play Riemannian metrics. I don't remember the names right now, but that's where we've been planning on putting in the algorithms. They're not really there yet, though.
Hmm, OK, I see. Yeah, definitely, that would be, I guess, super interesting. If by the time of release you see something that people could try, we'll definitely link to that, because I think listeners will be curious, and I'm definitely super curious to try that. Any new stuff like that, you'd like to try and see what you can do with it; it's always super interesting. And I've already seen some very interesting experiments done with normalizing flows, especially with bayeux, by Colin Carroll and other people. Colin Carroll is one of the PyMC developers also. And yeah, now you can use bayeux to take any JAX-ifiable model, you plug that into it, and you can use the flowMC algorithm to sample your JAX-ifiable PyMC model. So that's really super cool. And I'm really looking forward to more experiments like that, to see, well, okay, what can we do with those algorithms? Where can we push them, to what extent, to what degree, where do they fall down?
That's really super interesting, at least for me, because I'm not a mathematician. So when I see that, I find that super... like, I love the idea, because basically the idea is somewhat simple. It's like, okay, we have that problem when we think about geometry that way, because then the geometry becomes a funnel, for instance, as you were saying. And then sampling at the bottom of the funnel is just super hard in the way we do it right now, because of the just super small distances. What if we change the definition of distance? What if we change the definition of geometry, basically? Which is this idea of: OK, let's switch to a Riemannian space. And when we do that, then, well, the funnel disappears, and it just becomes something easier. It's like going beyond the idea of the centered versus non-centered parameterization, for instance, when you do that in a model, right? But it's going big with that, because it's more general. So I love that idea. I understand it, but I cannot really read the math and be like, oh, OK, I see what that means. So I have to see the model and see what I can do and where I can push it, and then I get a better understanding of what that entails.
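For listeners who haven't met the trick Alex mentions: the non-centered parameterization removes a funnel by sampling a standardized variable and rescaling it deterministically. A minimal PyMC sketch of both versions of a toy hierarchical term (our illustration, not from the episode):

```python
import pymc as pm

# Centered: theta's scale depends directly on sigma, so the joint
# posterior has a funnel shape that HMC struggles to explore.
with pm.Model() as centered:
    sigma = pm.HalfNormal("sigma", 1.0)
    theta = pm.Normal("theta", mu=0.0, sigma=sigma, shape=8)

# Non-centered: sample a standardized offset z, then rescale it.
# The sampler now sees well-behaved standard normals.
with pm.Model() as non_centered:
    sigma = pm.HalfNormal("sigma", 1.0)
    z = pm.Normal("z", mu=0.0, sigma=1.0, shape=8)
    theta = pm.Deterministic("theta", z * sigma)
```

The reparameterization is a hand-crafted change of geometry for one specific pathology; the Riemannian approach aims to make the analogous correction automatically and locally, everywhere in the space.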
Yeah, I think you gave a much better summary of what it is doing than I did, so thanks for that. I mean, you are actually touching on something there, of course. So one point is making the algorithms available, so that everyone could try them out. But then there's also the other aspect that we need to worry about, which is the proper evaluation of what they're doing. Of course, in most of the papers, when you release a new algorithm, you need to emphasize things like, in our case, computational efficiency. And you do demonstrate it, maybe, for example, by quite explicitly showing that with these very strong funnels, it does work better. But then the question is, of course: how reliable are these things if used in a black-box manner, so that someone just runs them on their favorite model?

And one of the challenges we realized is that it's actually very hard to evaluate how well an algorithm is working in an extremely difficult case, because there is no baseline. I mean, in some of the cases, we've been comparing by trying to run standard Hamiltonian MCMC, or NUTS, as carefully as we can, and then thinking of that as the ground truth, the true posterior. But we don't really know whether that's the case. If it's a hard enough case, our supposed ground truth is failing as well. We might be able to see that our solution differs from it, but then we would need to separately go and investigate which one was wrong. And that is a practical challenge, especially if you would like to have a broad set of models, and if we want to show somehow transparently for end users that in these and these kinds of problems, this and that particular method, whether it's one of ours or any other fancy new one, works or doesn't work. Without some particular method that people already trust, if we just compare against it, we can't really convince others that ours is correct when it differs from what we're used to relying on.
Yeah, that's definitely a problem. That's also a question I asked Marylou when she was on the show, and it was kind of the same answer, if I remember correctly: that for now, it's kind of hard to do benchmarks, in a way. Which is definitely an issue if you're trying to work on that from a scientific perspective as well. If we were astrologers, that'd be great; then we'd be good. But if you're a scientist, then you want to evaluate your methods, and finding a method to evaluate the method is almost as valuable as finding the method in the first place. And where do you think we are on that in your field? Is that an active branch of the research, to try and evaluate these algorithms? What would that even look like? Or are we still really, really at a very early time for that work?
That's a... very good question. I'm not aware of a lot of people who would specifically focus on evaluation. So, for example, Aki has of course been working a lot on that, trying to create diagnostics and so on. But if we think more about the flexible machine learning side, my hunch is that the individual research groups are all circling around the same problems, all trying to figure this out. Every now and then, someone invents a fancy way of evaluating something, introducing a particular type of synthetic scenario. I think the most common tries, what people do, is that you create problems where you actually have an analytic posterior: it's somehow an artificial problem, where you take a problem, transform it in a given way, and then pretend that you didn't have the analytic solution. But they are all... I mean, they feel a bit artificial. They feel a bit synthetic. So let's see. It would maybe be something that the community should be talking a bit more about, in a workshop or something: OK, let's try to really think about how to verify the robustness, or possibly identify that these things are not yet ready or reliable for practical use in very serious applications. I haven't been following very closely what's happening, though, so I may be missing some important works that are already out there.

Okay, yeah. Well, Aki, if you're listening, send us a message if we forgot something. And second, that sounds like there are some interesting PhDs to do on the issue, if that's still a very new branch of the research. So, people, if you're interested in that, maybe contact Arto, and we'll see: maybe in a few months or years, you can come here on the show and answer the question I just asked.
Another aspect of your work I really want to talk about, and that I really love, and now listeners can relax, because it's going to be, I think, less abstract and closer to their user experience, is priors. You talked about it a bit at the beginning: in particular, you are working, and have worked a lot, on a package called PreliZ that I really love. One of my friends and fellow PyMC developers, Osvaldo Martin, is also collaborating on that. And you guys have done a tremendous job on that. So yeah, can you give people a primer about PreliZ? What is it? When could they use it, and what's its purpose in general?
Maybe I need to start by saying that I haven't worked a lot on PreliZ itself. Osvaldo has, and a couple of others; I've been kind of just hovering around and giving a bit of feedback. But yeah, I'll maybe start a bit further away, so not directly from PreliZ, but from the whole question of prior elicitation.

Prior elicitation, I would frame it as some sort of, usually iterative, approach of communicating with the domain expert, where the goal is to estimate what their actual subjective prior knowledge is on whatever parameters the model has, and to do it in a way that is cognitively easy for the expert. Many of the algorithms that we've been working on here are based on this idea of predictive elicitation. So if you have a model where the parameters don't actually have a very concrete, easily understandable meaning, you can't actually start asking the expert questions about the parameters; it would require them to fully understand the model itself. The predictive elicitation techniques instead communicate with the expert in the space of the observable quantities. So they're asking things like: is this realization somehow more likely than this other one?

And now, this is where PreliZ comes into play. When we are communicating with the user, most of the time the information we show the user is some sort of visualization, of predictive distributions, or possibly also of the parameter distributions themselves. So we need an easy way of communicating, whether it's histograms of predicted values or whatnot. How do we show those to a user in scenarios where the model itself is some sort of a probabilistic program, so we can't fixate on a given model family? That's actually the main role of PreliZ: essentially making it easy to interface with the user. Of course, PreliZ also includes the algorithms themselves: algorithms for estimating the prior, and the interface components for the expert to give information. So, make a selection, use a slider to say I would want my distribution to be a bit more skewed towards the right, and so on. That's what we are aiming at: a general-purpose tool. It's essentially a platform for developing, and bringing into use, all kinds of prior elicitation techniques. So it's not tied to any given algorithm or anything; you just have the components, and you could then easily contribute, let's say, a new type of prior elicitation algorithm to the library.
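To give a flavor of the direct side of the package: PreliZ can, for instance, search for a maximum-entropy distribution matching a stated interval. A minimal sketch using its maxent helper (check the PreliZ docs for the current signature and defaults):

```python
import preliz as pz

# "I believe the value lies between 2 and 6 with 95% probability":
# ask for the maximum-entropy Gamma satisfying that constraint.
dist = pz.Gamma()
pz.maxent(dist, lower=2, upper=6, mass=0.95)

# `dist` now carries the fitted parameters; inspect and plot it to
# sanity-check the implied prior before committing it to a model.
print(dist.summary())
dist.plot_pdf()
```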
Yeah, and again, I encourage folks to go take a look at the PreliZ package. I put the link in the show notes, because, yeah, as you were saying, it's a really easier way to specify your priors, and also to elicit them if you need the intervention of non-statisticians in your model, which you often do if the model is complex enough. So yeah, I'm using it myself quite a lot, so thanks a lot, guys, for this work. As you were saying, Arto, Osvaldo Martin is one of the main contributors; Oriol Abril-Pla also, and Alejandro Icazatti, if I remember correctly. So at least these four people are the main contributors. And yeah, I definitely encourage people to go there. What would you say, Arto, are the... like, the Pareto effect: what would it be if people want to get started with PreliZ? Like, the 20% of uses that will give you 80% of the benefits of PreliZ, for someone who doesn't know anything about it?

That's a very good question. I think the most important thing, actually, is to realize that we need to be careful when we set the priors. So, simply being aware that you need a tool for this. You need a tool that makes it easy to do something like a prior predictive check. You need a tool that relieves you from figuring out how to inspect your priors, or the effects they have on the model. That's actually where the real benefit is.
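A prior predictive check, in its simplest form, just pushes draws from the priors through the model before any data are involved. A minimal PyMC sketch (the toy regression is our illustration, not a model from the episode):

```python
import pymc as pm
import arviz as az

# Toy regression: do the priors imply plausible outcomes?
with pm.Model() as model:
    alpha = pm.Normal("alpha", 0.0, 10.0)
    beta = pm.Normal("beta", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    mu = alpha + beta * 1.5  # a representative predictor value
    y = pm.Normal("y", mu, sigma)
    prior_draws = pm.sample_prior_predictive(500)

# If the implied outcomes look absurd (wrong scale, wrong sign),
# revisit the priors before any fitting happens.
az.plot_dist(prior_draws.prior["y"])
```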
You get most of the benefit when you bring it into your Bayesian workflow as a concrete step, where you identify: I need to do this. Then the remaining tail of this is, of course, that maybe in some cases you have such a complicated model that you really need to deep dive and start running algorithms that help you elicit the priors. And I would actually even say that the elicitation algorithms, I perceive them as useful even when the person is actually a statistician. I mean, there's a lot of models where we may think that we know how to set the priors, but what we are actually doing is following some very vague ideas of what the effect is. And we may also make severe mistakes, or spend a lot of time doing it. So, to an extent, these elicitation interfaces, I believe that ultimately they will be helping even hardcore statisticians to just do it faster, do it slightly better, and do it perhaps in a better-documented manner. You could, for example, store all the interactions the modeler had with these things, and put that aside: this is where we got the prior from, instead of just trial and error, where we only see the result at the end. So you could revisit the choices you made during an elicitation process: I discarded these predictive distributions for some reason; and you can later say, okay, I made a mistake there, maybe I go and change my answer in that part; and then the algorithm provides you an updated prior, without you needing to go through the whole prior specification process again.
Yeah. Yeah. Yeah, I really love that. And that makes the process of setting priors more reproducible, more transparent, in a way. That makes me think a bit of the scikit-learn pipelines that you use to transform the data. For instance, you just set up the pipeline and you say: I want to standardize my data. And then you have that pipeline ready, and when you do the out-of-sample predictions, you can use the pipeline and say, okay, now do that same transformation on these new data, so that we're sure it's done the right way, but it's still transparent, and people know what's going on here. It's a bit the same thing, but with the priors.
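For reference, the scikit-learn pattern Alex is alluding to looks like this; a minimal generic sketch, not code from the episode:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)

# The pipeline records the preprocessing decision once...
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", LinearRegression())])
pipe.fit(X_train, y_train)

# ...and replays exactly the same transformation on new data, so
# out-of-sample predictions can't silently skip or redo the step.
print(pipe.predict(rng.normal(size=(5, 3))))
```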
And I really love that, because it also makes it easier for people to think about the priors, and to actually choose the priors. Because what I've seen in teaching is that, especially for beginners, and even more when they come from the frequentist framework, setting the priors can be just paralyzing. It's like the paradox of choice: it's way too many choices, and then they end up not choosing anything, because they are too afraid to choose the wrong prior.
Yes, I fully agree with that. I mean, there's a lot of very simple models that already start having six, seven, eight different univariate priors in there. And I've been working with these things for a long time, and I still very easily make stupid mistakes, where I'm thinking that by increasing the variance of this particular prior here, what I'm achieving is, for example, higher predictive variance as well. And then I realize that, no, that's not the case: later in the model, it plays some sort of a role and actually has the opposite effect. It's hard.

Yeah. Yeah. That stuff is really hard, and same here. When I discover that, I'm extremely frustrated, because I'm like, I lost hours on these, whereas if I had a more reproducible pipeline, that would just have been handled automatically for me. So...
Yeah, for sure. We're not there yet in the workflow, but that definitely makes it way easier.

So yeah, I absolutely agree that we are not there yet. I mean, PreliZ is a very well-defined tool that allows us to start working on it. But then there are the actual concrete algorithms that would make it easy to, let's say, avoid these kinds of stupid mistakes, and be able to really reduce the effort. So if it now takes two weeks for a PhD student to think about and fiddle with the prior, can we get to one day? Can we get it to one hour? Can we get it to two minutes of a quick interaction? Probably not two minutes, but maybe we can get it to one hour. It will require lots of things. It will require even better tooling of this kind: how do we visualize it, how do we play around with it? But I think it's also going to require quite a bit better algorithms for how, from a maximally limited interaction, you estimate what the prior is, and how you design the optimal questions you should be asking the expert. There's no point in reiterating the same things just to fine-tune a bit one of the variances of the priors, if there is still a massive mistake somewhere in the prior, and a single question would be able to rule out half of the possible scenarios. It's going to be an interesting, let's say, rising research direction, I would say, for the next 5 to 10 years.

Yeah, for sure. And very valuable also, because very practical. So, for sure, again: a great PhD opportunity, folks.
Yeah, yeah. Also, I mean, it may be hard to find those algorithms that you were talking about, because it is hard, right? I know, because I worked on the find_constrained_prior function that we have in PyMC now. And it seemed like a very simple case. It's not even doing all the fancy stuff that PreliZ is doing: it's mainly just optimizing a distribution so that it fits the constraints that you are giving it. Like, for instance: I want a Gamma with 95% of the mass between 2 and 6; give me the parameters that fit that constraint. That's actually surprisingly hard, mathematically. You have a lot of choices to make, you have a lot of things to really be careful about. And so I'm guessing that's also one of the hurdles right now in that research.
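That exact example can be written with PyMC's find_constrained_prior; a minimal sketch (the init_guess values are arbitrary starting points of ours, see the PyMC docs for the full signature):

```python
import pymc as pm

# Find Gamma parameters putting 95% of the mass between 2 and 6.
# The optimizer needs a rough starting point for the free parameters.
params = pm.find_constrained_prior(
    pm.Gamma,
    lower=2,
    upper=6,
    mass=0.95,
    init_guess={"alpha": 4.0, "beta": 1.0},
)
print(params)  # fitted {"alpha": ..., "beta": ...} meeting the constraint
```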
Yeah, it absolutely is. I mean, I would say at least I'm approaching this more or less from an optimization perspective: yes, we are trying to find a prior that best satisfies whatever constraints we have, and trying to formulate an optimization problem of some kind that gets us there. This is also where I think there's a lot of room for the, let's say, flexible machine learning types of tools. So, if you think about the prior that satisfies these constraints, we could be specifying it not with a particular parametric prior, but with some sort of flexible representation, and then just optimizing within a much broader set of distributions. But then, of course, it requires completely different kinds of tools than we are used to working with.

It also requires people accepting that our priors may take arbitrary shapes. They may be distributions that we could have never specified directly. Maybe they're multimodal priors that we just infer. And there's also going to be a lot of educational work in getting people to accept this. Even if I had a perfect algorithm that somehow cranks out a prior, you then look at the prior and you're saying: I don't even know what distribution this is; I would have never, ever converged to this if I was doing it manually. So will you accept that that's your prior, or will you insist that the method is doing something stupid? I mean, I still want to use my Gaussian prior here.
Yeah, that's a good point. And in a way, that's kind of related to a classic problem that you have when you're trying to automate a process. I think there's the same issue with automated cars, like those self-driving cars, where people actually trust the cars more if they think they have some control over them. I've seen interesting experiments where they put a placebo button in the car that people could push to override it if they wanted to, but the button wasn't doing anything. People were saying they trusted these cars more than the completely self-driving cars. That's also definitely something to take into account, but that's more related to human psychology than to the algorithms per se.

It is related to human psychology, but it's also related to this evaluation perspective. I mean, of course, if we did have a very robust evaluation pattern that somehow tells you that once you start using these techniques, your final conclusions in some sense will be better, and if we can make that very convincing, then it will be easier. I mean, think about it: there were a lot of people who would say that a very massive neural network with four billion parameters would never, ever be able to answer a question given in natural language. A lot of people were saying, five years ago, that this is a pipe dream, it's never gonna happen. Now we do have it, and now everyone is ready to accept that yes, it can be done. And they are willing to actually trust these ChatGPT types of models in a lot of things, and they are investing a lot of effort into figuring out what to do with them. It just needs this kind of very concrete demonstration that there is value and that it works well enough. It will still take time for people to really accept it, but I think that's the key ingredient.
Yeah, yeah. I mean, it's also good in some way: that skepticism makes the tools better. So that's good. I mean, we could keep talking about PreliZ, because I have other technical questions about it. But actually, that's a perfect segue to a question I also had for you, because you have a lot of experience in that field. So, how do you think industries can better integrate the Bayesian approaches into their data science workflows? Because that's basically what we ended up talking about right now, without me nudging you towards it.
Yeah, I have actually indeed been thinking about that quite a bit. So I do a lot of collaboration with industrial partners in different domains. I think there are a couple of perspectives on this. One is that people are finally, I think, starting to accept the fact that probabilistic programming, with kind of black-box automated inference, is the only sensible way of doing statistical modeling. Looking back, like 10 to 15 years ago, you would still have a lot of people, maybe not in industry, but in research in different disciplines, in meteorology or physics or whatever, who would actually be writing Metropolis-Hastings algorithms from scratch, which is simply not reliable in any sense. It took time for them to accept that yes, we can actually now do it with something like Stan. And I think this is, of course, the way, to the extent that there are problems that fit well with what something like Stan or PyMC offers. I think we've been educating for long enough master's students who are familiar with these concepts. Once they go to the industry, they will use these tools; they know roughly how to use them. So that's one side.

But then the other thing is that, especially in many of these predictive industries, whether it's marketing or recommendation or sales or whatever, people are anyway already doing a lot of deep learning types of models there. That's a routine tool in what they do. And now, if we think about that, at least in my opinion, these fields are getting closer to each other. So we have more and more deep learning techniques, the variational autoencoder is a prime example, that are ultimately Bayesian models in themselves. So it may actually be that all this Bayesian thinking and reasoning creeps into use through the next generation of these deep learning techniques they are using. They've been building those models, they've been figuring out that they cannot get reliable estimates of uncertainty, they maybe tried some ensembles or whatnot. And they will be following. So once the tools are out there, and there are good enough tutorials on how to use them, they might start using things like, let's say, Bayesian neural networks, or whatever the latest tool is at that point. And I think this may be the easiest way for the industries to do so. They're not going to switch back to very simple classical linear models when they do their analysis. But they're going to make their deep learning solutions Bayesian, on some time scale. Maybe not tomorrow, but maybe in five years.
Yeah, that's a very good point. I love that. And of course, I'm very happy about that, being one of the actors making the industry more Bayesian, so I have a vested interest in this. But yeah, I've also seen the same evolution you were talking about. Right now, it's not even really an issue of convincing people to use these kinds of tools. I mean, still from time to time, but less and less. And now the question is really more about making those tools more accessible, more versatile, easier to use, more reliable, easier to deploy in industry, things like that, which is a really good point to be at, for sure.
And to some extent, it's an interesting question also from the perspective of the tools. It may mean that we just end up doing a lot of this kind of Bayesian analysis on top of what we would now call deep learning frameworks. And of course, it's going to be libraries building on top of those. So, for example, Pyro is a library building on PyTorch, and NumPyro does the same on JAX, but the syntax is intentionally similar to what people are used to in deep learning types of modeling. And this is perfectly fine. We are anyway using a lot of stochastic optimization routines in Bayesian inference and so on, so these are actually very good tools for building all kinds of Bayesian models. And I think this may be the layer where the industry use happens, because it's going to be there anyway. They need the GPU type of scaling and everything anyway, so we're just happy to have our systems work on top of these libraries.
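[Editor's note: to make this concrete, here is a minimal sketch of what "Bayesian analysis on top of a deep learning framework" can look like, using Pyro's standard stochastic variational inference API on PyTorch. The toy regression model and data are invented for illustration; they are not from the episode.]

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

def model(x, y=None):
    # Priors on the regression parameters
    w = pyro.sample("w", dist.Normal(0.0, 1.0))
    b = pyro.sample("b", dist.Normal(0.0, 1.0))
    sigma = pyro.sample("sigma", dist.HalfNormal(1.0))
    with pyro.plate("data", x.shape[0]):
        pyro.sample("obs", dist.Normal(w * x + b, sigma), obs=y)

# Toy data (hypothetical)
x = torch.linspace(0.0, 1.0, 100)
y = 2.0 * x + 0.5 + 0.1 * torch.randn(100)

# Fit an approximate posterior by stochastic optimization, the same
# machinery deep learning practitioners already use every day
guide = AutoNormal(model)
svi = SVI(model, guide, Adam({"lr": 0.02}), loss=Trace_ELBO())
for step in range(2000):
    svi.step(x, y)
```

The point is the one Arto makes: the inference loop is an ordinary stochastic optimization loop, so it inherits the framework's autodiff and GPU scaling for free.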
Yeah, very good point. And also, to come back to one of the points you made in passing: education is helping a lot with that. You have been educating the data scientists who now go into industry. And I know that in Finland (in France, where I'm originally from, not so much) there is this really great integration between the research side, the university, and the industry. You can really see that in the PhD positions, in the professorship positions, and so on. So I think that's really interesting, and that's why I wanted to talk to you about it. To go back to the education part: what challenges and opportunities do you see in teaching Bayesian machine learning, as you do, at the university level?
Yeah, it's challenging, I must say. Especially if we get to the point of, well, Bayesian machine learning: it is a combination of two topics that are each somewhat difficult in themselves. If we want to talk about normalizing flows, and then about statistical properties of estimators or MCMC convergence, these require different kinds of mathematical tools, and they require a certain level of expertise on the software and programming side. What this means is that if we look at the population of, let's say, data science students, we will always have a lot of people who are missing background on one of these sides. So I think this is a difficult topic to teach. If it was a small class, it would be fine. But it appears that at least our students are really excited about these things. I can launch a course explicitly titled Bayesian machine learning, which is an advanced-level machine learning course, and I would still get 60 to 100 students enrolling. And that means that within that group there are going to be some CS students with almost no background in statistics, and some statisticians who certainly know how to program but are not really used to thinking about GPU acceleration of a very large model.

But it's interesting, and it's not an impossible thing. I think it is also a topic that you can teach at a sufficient level for everyone, so that everyone is able to understand the basic reasoning behind why we are doing these things. Some of the students may struggle with figuring out all the math behind it, but they might still be able to use these tools very nicely. They might be able to say: if I make this or that kind of modification, I see that my estimates are better calibrated. And some others will then go really deep into figuring out why these things work. So it just needs a bit of creativity about how we do it and what we expect from the students: what should they know once they've completed a course like this?
Yeah, that makes sense. Have you also seen an increase in the number of students in recent years?

Well, we get as many students as we can take. For quite a while already, by far the most popular master's and bachelor's programs in our university have essentially been data science and computer science, so we can't take in everyone we would want. It actually looks to us like a more or less stable number of students, but it's always been a large number since we launched, for example, the data science program. It went up very fast. So there's definitely interest.
Yeah. That's fantastic. And... so, I've been taking a lot of your time, so we're going to start to close up the show, but there are at least two questions I want to get your insight on. The first one is: what do you think the biggest hurdle in the Bayesian workflow currently is? We've talked about that a bit already, but I'd love to get your structured answer.
Well, I think the first thing is getting people to actually start using more or less systematic workflows. I mean, the idea is great. We know more or less how we should be thinking about it, but it's a very complex object. We can tell experts, statisticians, that yes, this is roughly how you should do it, and then we should still convince them, almost force them, to stick to it. But especially if we think about newcomers, people who are just starting with these things, it's a very complicated thing. If you need to read a 50-page or 100-page book about the Bayesian workflow to even know how to do it, that's a technical challenge. So I think in the long term we are going to get tools for assisting it, really streamlining the process. Think of something like an AI assistant for a person building a model, one that prompts you: now I see that you are trying to go there and do this, but I see that you haven't done prior predictive checks; I actually already created some plots for you, so please take a look at these and confirm whether this is what you were expecting. It's going to take a lot of effort to create those. It's something that we've been trying to think about, how to do it, but it's still open. I think that's where the challenge is. We know most of the stuff within the workflow, roughly how it should be done; at least we have good enough solutions. But really helping people to actually follow these principles, that's going to be hard.
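[Editor's note: for readers who want to see what such an assistant would be automating, here is a minimal sketch of a prior predictive check using PyMC's standard API. The toy model and placeholder data are invented for illustration, not taken from the episode.]

```python
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm

x = np.linspace(0.0, 1.0, 50)
y_placeholder = np.zeros(50)  # priors are checked before any fitting

with pm.Model():
    # Priors whose downstream implications we want to inspect
    w = pm.Normal("w", 0.0, 1.0)
    b = pm.Normal("b", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y", w * x + b, sigma, observed=y_placeholder)
    # Simulate datasets implied by the priors alone
    idata = pm.sample_prior_predictive(draws=200)

# Plot a handful of simulated datasets and eyeball their plausibility
y_sim = idata.prior_predictive["y"].sel(chain=0).values  # (200, 50)
plt.plot(x, y_sim[:30].T, color="C0", alpha=0.2)
plt.xlabel("x")
plt.ylabel("prior predictive y")
plt.show()
```

If the simulated outcomes look wildly implausible, the priors get revised before any data enter the model, which is exactly the kind of nudge Arto's hypothetical assistant would give.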
Yeah, yeah, yeah. But damn, that would be super cool. We're talking about something like a Jarvis, you know, the AI assistant, but for Bayesian models. How cool would that be? Love that. And looking forward, how do you see Bayesian methods evolving with artificial intelligence research?
Yeah, I think... For quite a while I was about to say that I've been building on this basic idea that deep learning models as such will become more and more Bayesian anyway, so that's kind of a given. But now, of course, the recent very large-scale AI models are getting so big that computational resources are a major hurdle for doing learning on those models even in the crudest possible way. So there are of course clear needs for uncertainty quantification in the large language model type of scope. They are really quite unreliable; they're really poor at, for example, evaluating their own confidence. There have been examples where, if you ask the model how sure it is, it gives a similar number more or less irrespective of the statement: yeah, 50% sure, I don't know. So it may be that, at least in the very short run, it's not going to be Bayesian techniques that solve all the uncertainty quantification in those types of models. In the long term, maybe it is. But I think there's a lot of... It's going to be interesting.

It looks to me a bit like a lot of the stuff built to address specific limitations of these large language models consists of separate components: some sort of external tool that reads in those inputs, or an external tool that the LLM can use. So maybe this is going to be that kind of separate element that somehow integrates. An LLM, of course, could have an API interface where it can query, let's say, Stan to figure out the answer to a type of question that requires probabilistic reasoning. People have been plugging things in; there are these famous public examples where the LLM can query mathematical reasoning engines and so on, so that if you ask a specific type of question, it goes outside of its own realm and does something. It already kind of knows how to program, so maybe we just need to teach LLMs to do statistical inference by actually running an MCMC algorithm on a model that they specify together with the user. I don't know whether anyone is actually working on that; it's something that just came to my mind, so I haven't really thought about it too much.
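[Editor's note: as a purely hypothetical sketch of that tool-use idea (nobody in the episode claims this exists), an LLM-facing "run MCMC" tool could be as simple as a function that compiles a Stan program and returns a posterior summary. The function name and tool wiring are invented; only the cmdstanpy calls follow that library's actual API.]

```python
import json
import tempfile

from cmdstanpy import CmdStanModel

def run_mcmc_tool(stan_code: str, data_json: str) -> str:
    """Hypothetical tool an LLM could call with a model it wrote
    together with the user: fit the model, return a summary."""
    # Write the LLM-specified Stan program to a temporary file
    with tempfile.NamedTemporaryFile(
        "w", suffix=".stan", delete=False
    ) as f:
        f.write(stan_code)
        stan_file = f.name
    model = CmdStanModel(stan_file=stan_file)       # compile the model
    fit = model.sample(data=json.loads(data_json))  # run NUTS/MCMC
    # Hand the posterior summary back to the LLM as plain text
    return fit.summary().to_string()
```

The LLM would handle the conversation and model specification; the actual probabilistic reasoning is delegated to a sampler that does it reliably.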
Yeah, but again, we're getting so many PhD ideas for people right now.

We are.

Yeah, I feel like we should be compiling a best-of of all your awesome PhD ideas. Awesome. Well, I still have so many questions for you, but let's close the show, because I don't want to take too much of your time and I know it's getting late in Finland. So let's close up the show with the last two questions I always ask at the end of the show.
First one: if you had unlimited time and resources, which problem would you try to solve?

Let's see. The lazy answer is that I am now trying to get, well, not unlimited resources, but I'm really trying to tackle this prior elicitation question. I think for most of the other parts of the Bayesian workflow we have reasonably good solutions, but this whole question of really how to figure out complex multivariate priors over arbitrarily complex models, that's a very practical thing that I am investing in. But if the resources really were infinite, then maybe I could actually continue with the quick idea we just talked about: really getting this probabilistic reasoning into the core of these large language model type of AI applications, so that they would reliably give proper probabilistic judgments for the kind of decision-making and reasoning problems we ask of them. That would be interesting.

Yeah. Yeah, for sure.
And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

Yes, this is something I actually thought about, because I figured you would be asking it of me too. And I decided to go with a fictional character; I like fictional characters. So I chose Daniel Waterhouse from Neal Stephenson's Baroque Cycle books. They are kind of semi-historical books about the era when Isaac Newton and others were living and establishing the Royal Society, with a lot of high-fantasy components involved. And Daniel Waterhouse in those novels is a roommate of Isaac Newton and a friend of Gottfried Leibniz, so he knows both sides of this great debate on who invented calculus and who copied whom. If I had dinner with him, I would get to talk about innovations that I think are among the foundational ones, but I wouldn't actually need to get involved with either party. I wouldn't need to choose sides, whether it's Isaac or Gottfried I would be talking to.

Love it. Yeah, love that answer. Make sure to record that dinner and post it on YouTube; I'm pretty sure lots of people would be interested in it.
Fantastic. Thanks. Thanks a lot, Arto. That was a great discussion. I'm really happy we could go through, well, not the whole depth of what you do, because you do so many things, but a good chunk of it. So I'm really happy about that. As usual, I'll put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Arto, for taking the time and being on this show.

Thank you very much. It was my pleasure. I really enjoyed the discussion.
This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com. Our theme music is "Good Bayesian" by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information in. And if you're thinking I'll be less than amazing, let me show you how to be a good Bayesian: change calculations after taking fresh data in. Those predictions that your brain is making? Let's get them on a solid foundation.