Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag 😉
Takeaways:
- Teaching Bayesian Concepts Using M&Ms: Tomi Capretto uses an engaging classroom exercise involving M&Ms to teach Bayesian statistics, making abstract concepts tangible and intuitive for students.
- Practical Applications of Bayesian Methods: Discussion on the real-world application of Bayesian methods in projects at PyMC Labs and in university settings, emphasizing the practical impact and accessibility of Bayesian statistics.
- Contributions to Open-Source Software: Tomi’s involvement in developing Bambi and other open-source tools demonstrates the importance of community contributions to advancing statistical software.
- Challenges in Statistical Education: Tomi talks about the challenges and rewards of teaching complex statistical concepts to students who are accustomed to frequentist approaches, highlighting the shift to thinking probabilistically in Bayesian frameworks.
- Future of Bayesian Tools: The discussion also touches on the future enhancements for Bambi and PyMC, aiming to make these tools more robust and user-friendly for a wider audience, including those who are not professional statisticians.
Chapters:
05:36 Tomi’s Work and Teaching
10:28 Teaching Complex Statistical Concepts with Practical Exercises
23:17 Making Bayesian Modeling Accessible in Python
38:46 Advanced Regression with Bambi
41:14 The Power of Linear Regression
42:45 Exploring Advanced Regression Techniques
44:11 Regression Models and Dot Products
45:37 Advanced Concepts in Regression
46:36 Diagnosing and Handling Overdispersion
47:35 Parameter Identifiability and Overparameterization
50:29 Visualizations and Course Highlights
51:30 Exploring Niche and Advanced Concepts
56:56 The Power of Zero-Sum Normal
59:59 The Value of Exercises and Community
01:01:56 Optimizing Computation with Sparse Matrices
01:13:37 Avoiding MCMC and Exploring Alternatives
01:18:27 Making Connections Between Different Models
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.
Links from the show:
- Tomi’s website: https://tomicapretto.com/
- Tomi on GitHub: https://github.com/tomicapretto
- Tomi on Linkedin: https://www.linkedin.com/in/tom%C3%A1s-capretto-a89873106/
- Tomi on Twitter: https://x.com/caprettotomas
- Advanced Regression course (get 10% off if you’re a Patron of the show): https://www.intuitivebayes.com/advanced-regression
- Bambi: https://bambinos.github.io/bambi/
- LBS #35 The Past, Present & Future of BRMS, with Paul Bürkner: https://learnbayesstats.com/episode/35-past-present-future-brms-paul-burkner/
- LBS #1 Bayes, open-source and bioinformatics, with Osvaldo Martin: https://learnbayesstats.com/episode/1-bayes-open-source-and-bioinformatics-with-osvaldo-martin/
- patsy – Describing statistical models in Python: https://patsy.readthedocs.io/en/latest/
- formulae – Formulas for mixed-models in Python: https://bambinos.github.io/formulae/
- Introducing Bayesian Analysis With m&m’s®: An Active-Learning Exercise for Undergraduates: https://www.tandfonline.com/doi/full/10.1080/10691898.2019.1604106
- Richly Parameterized Linear Models: Additive, Time Series, and Spatial Models Using Random Effects: https://www.routledge.com/Richly-Parameterized-Linear-Models-Additive-Time-Series-and-Spatial-Models-Using-Random-Effects/Hodges/p/book/9780367533731
- Dan Simpson’s Blog (link to blogs with the ‘sparse matrices’ tag): https://dansblog.netlify.app/#category=Sparse%20matrices
- Repository for Sparse Matrix-Vector dot product: https://github.com/tomicapretto/dot_tests
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
Today I am thrilled to host my friend Tomi Capretto, a multifaceted data scientist from PyMC Labs, a dedicated statistics educator at Universidad Nacional de Rosario, and an avid contributor to the open-source software community, especially known for his work on Bambi.

In our conversation, Tomi shares insights from his dual role as an industry practitioner and an academic. We dive deep into the practicalities and pedagogical approaches of teaching complex statistical concepts, making them accessible and engaging.

We also explore Tomi's contributions to Bambi, which he describes as BRMS for Python. And indeed, it is a Python library designed to make Bayesian modeling more approachable for beginners and non-experts.

This discussion leads us into the heart of our newly launched course, Advanced Regression with Bambi and PyMC, where Tomi, Ravin Kumar and myself unpack the essentials of regression models, tackle the challenges of parameter identifiability and overparameterization, and address overdispersion and the new zero-sum normal distribution.

So whether you're a student, a professional, or just a curious mind, I'm sure this episode is packed with insights that will enrich your understanding of the statistical world.

This is Learning Bayesian Statistics, episode 112, recorded June 24, 2024.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon: everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you.
Hello, my dear Bayesians! A quick note before today's episode. StanCon 2024 is approaching! It's in Oxford, UK this year, from September 9 to 13, and it's shaping up to be an incredible event for anybody interested in statistical modeling and Bayesian inference.

Actually, we're currently looking for sponsors to help us offer more scholarships and make StanCon more accessible to everyone. And we encourage you to buy your tickets as soon as possible. Not only will this help with making a better conference, but it will also support our scholarship fund.

For more details on tickets, sponsorships or community involvement, you'll find the StanCon website in the show notes. We're counting on you! OK, on to the show.
Tomi Capretto, bienvenido a Learning Bayesian Statistics.

Hello Alex, muchas gracias. Thank you.

Yeah, thanks a lot for taking the time. It's actually a bit weird to talk to you in English now, because we usually talk in Spanish.

Yeah.

But for the benefit of the world, we're gonna do that in English. So, it's awesome. I'm really happy to have you on the show because you started as a colleague and, with the years now, you're definitely a friend. Or at least I consider you a friend.

I will tell you after the recording if I consider you a friend, depending on how it goes.

That's a smart move, a smart move. I've lost quite a few friends because of my editing skills. Yeah, so I mean, it's been a long time coming. I've had a lot of people tell me, "you should have Tomi Capretto on the show", and I always answered, yeah, he'll come on the show very soon, don't worry, we're finishing working on a project together right now, so I'll invite him at that point so that he can talk about the project. And you guys maybe know what the project is about, but you'll see around the middle of the episode, more or less, folks. But I mean, if you listen to the show regularly, you know which project I'm talking about.

But first, Tomi, we'll talk a bit about you. Yeah, basically, can you tell people what you're doing nowadays? You know, and yeah, like, what do your days look like?
So I'm doing quite a lot of things regularly. Mainly, I work at PyMC Labs with a great team. We work on very interesting projects doing Bayesian stats. I have the pleasure of working with the people making the tool that I love. That's amazing. It's also great, I don't know, when we are working on a project and we realize PyMC isn't able to do something, or there's something broken, we are not wondering, is this going to be fixed at some point in time? Are the developers working on it? We can just go and change the things. Well, we have to be responsible, because otherwise the community will hate us for changing the things all the time, but I definitely really like it.

So I work at PyMC Labs, it's my main job. I've been at Labs for around three years, I think. In parallel, I also teach at university here in Argentina. I live in Rosario, Argentina, which is like the third largest city in the country.

After, sorry, for those nerds who don't know Argentina, and we'll see if I know Argentina well enough: after Buenos Aires, of course, and Córdoba.

Yeah, I think that's the correct order.

And of course, for the football fans, the city of Ángel Di María and Lionel Messi, of course.

Yeah, correct. And for some niche fans of football, like if you are from the UK or from some very particular areas of Spain, also Marcelo Bielsa, who is a coach. A very particular coach. He is also from...

I didn't know he was from Rosario too, OK. Yeah, yeah, yeah, we have very particular characters in the city. Yeah, now I understand why he's called El Loco. OK, OK. Yeah, yeah. That's how we call Tomi inside PyMC Labs. You are not supposed to tell that to people. Yeah, right, ooh, I'm sorry. I'm not supposed to lie to you. On the show I can't.
Yeah, and so yeah, I live here in Rosario. In Rosario... I don't know why I'm telling that in English. I teach at our national university. There's a program in statistics, which is the program where I studied. Now I'm also teaching Bayesian statistics there. There's a whole course dedicated to Bayesian statistics in the final year of the program. It's a new course. It started in 2023. That was the first edition. Now we are finishing the second edition. The students hate and love us at the same time, because we make them work a lot, but at the end of the day they learn, or at least that's what they say and what we find in the things that they present.

So yeah, those are my two main activities today. I'm also an open-source developer contributing mainly to Bambi, PyMC, ArviZ. Sometimes I am creating a random repository to play with something or some educational tool. Yeah. And from time to time I teach courses. I've just finished teaching a Python course. But yeah, it's like a mixture between statistics, computers, Bayesian statistics, Python, also R, which was my first language. And yeah, that's the world we're living in.
Yeah, yeah, definitely. You do a lot of things, for sure. Yeah, I think we can go in different directions, but I'm actually curious if you can talk about... I know you have an exercise in your class where you teach Bayesian stats, and you introduce it with M&Ms.

Oh yes!

Can you talk a bit about that exercise on the show? I think it will be interesting for our listeners.

Yeah, yeah, definitely. To be completely honest and fair, it's not our idea. I mean, it's an idea that was actually published in a paper. I don't remember the name of the paper, but I'm gonna find it. I have it. And I'm gonna give you the real source of the game. But we have adapted it.
Basically, the first day you enter a Bayesian classroom, the teachers present you with a problem, saying, hey, something happened with the M&Ms. In our case, we used the local version, which are called Rocklets. It's basically the same. It's chocolate, different colors. And we tell them, hey, the owner of the factory suspects that there's something happening with the machine that creates the M&Ms of a particular color, and you need to figure out what's happening.

And so we divide the students into groups, we give a bag to each group, and they have to open the bag and count the number of pieces that they have of the different colors. At that point, the students realize that what they care about is whether each piece is of that particular color or not, and the idea is to start thinking in a statistical plus Bayesian way: what is the quantity we are trying to estimate, or what is the quantity that will tell us the answer? And then you say, okay, we are talking about a proportion. All right, and do we know anything about that proportion? Well, it's a proportion. It can be between 0 and 1. It's a continuous quantity. And then, okay, we are going to work manually, so let's discretize that proportion. And we have 11 values from 0 to 1. And then, okay, what else do we know about that proportion? Are all the values equally likely? And you can notice that we are starting to build a prior. And students are like, no, we have five colors. The probability of this color being present 80% of the time is not the same as the probability of this color being present 20% of the time, for example. And so we start, in a very manual way, to build a probability distribution, which is the prior for the proportion of items that are of that color.

And then we say, okay, what's the kind of data that we are collecting? And we end up saying, okay, this is a binomial experiment. And we talk about the different assumptions: independence, constant probability. And then, okay, how can we combine this information together? And we naturally talk about Bayes' theorem. And yeah, we do all the math by hand with very simple numbers, but in a very intuitive way, with a problem that is interesting for students because they know those chocolates. They can feel it makes sense to put what they know about the problem into a probability distribution, because they know that they know something about the problem. And doing some very simple math using probability rules that they already know, we can arrive at a solution in a Bayesian way.

And the end of that lesson is, okay, everything we did so far is what we are going to do in this course. Like, we are going to learn more about this approach to doing statistics. And yeah. In the very end, they can eat the data, basically.
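If you want to try the computation described above yourself, here is a minimal sketch in Python. It assumes a hypothetical bag with 30 pieces, 4 of them of the color in question, and a hand-built prior over 11 discretized values of the proportion; all the numbers are made up for illustration.

```python
import numpy as np
from scipy.stats import binom

# Discretized proportion: 11 values from 0 to 1
theta = np.linspace(0, 1, 11)

# Hand-built prior: small proportions considered more plausible
# (weights are made up for illustration)
prior = np.array([1.0, 4.0, 8.0, 6.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0])
prior /= prior.sum()

# Hypothetical data: 4 pieces of the color of interest out of 30
n, y = 30, 4

# Binomial likelihood evaluated at each discretized proportion
likelihood = binom.pmf(y, n, theta)

# Bayes' rule: posterior is proportional to prior times likelihood
posterior = prior * likelihood
posterior /= posterior.sum()

for t, p in zip(theta, posterior):
    print(f"theta = {t:.1f}   posterior = {p:.3f}")
```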
And that's really interesting. In the very first edition, we used Rocklets, which are like M&M's. And in the second edition, we used gummy bears. The logic was more or less the same, but we changed the product. And I don't know what we're going to do in the next edition, but it will have something edible involved.

It's definitely very interesting, and I'm fascinated by these approaches to introduce stats to people, which are more intuitive. The student is involved in the problem from the very beginning. You don't start with a list of 10 abstract concepts. Perhaps they know how to follow them, but it's less attractive. So yeah, we do that, and I really like that approach.

Yeah, yeah. I mean, that's definitely super fun. That's why I wanted you to talk about that on the show. I think it's a great way to teach stats, and we'll definitely add that paper to the show notes, as you were saying. And for next year, well, I think you definitely should do that with alfajores.

Let's see if we have the budget to do that.

Yeah, it's gonna be a bit more budget, yeah, for sure. I mean, the best would be with empanadas, but that shouldn't be very easy to do, you know, like the empanada can break.

Nah, it's gonna be a whole mess, you know... Yeah, I know, and that usually happens like early in the morning, so the students will be like, what are we doing here? Yeah, it's a nice confusion, because it creates an impact. Like, they enter the classroom and instead of having people saying, hey, this is my name, we are going to work on that, it's like, hey, you have this problem, take some gummy bears. And they're like, what? What's happening? So that's... it's attractive. Yeah. Yeah. No, for sure.
So most of your students, do they already know about stats?

Yes.

And you're teaching them the Bayesian way?

Yeah, yeah, so at that point...

What's the most confusing part to them? How do they react to that new framework?

I would say, in general, we had good experiences, especially at the end of the journey. But in the very beginning, so when they start the course... They already have like 20 courses, let's say 15, because other courses are focused on mathematics or programming, but they already have like 15 courses about statistics, and they are all about the non-Bayesian approach. So, the frequentist approach. They know a lot about maximum likelihood estimation and all the properties. At that point, they already spent hours writing mathematical formulas and demonstrating results and all that. But they are very new to Bayesian statistics, because all they know about Bayes is Bayes' rule. That's the only thing they know. And they also know there's an estimation method called the Bayesian method, but they are not using that at that point.

And one thing, there may be other things, but one thing that takes some time for them to adapt to is: okay, parameters are not fixed anymore, and we put a probability distribution on top of them. Because in all the courses they took before our course, there's a lot of emphasis on how to interpret confidence intervals, p-values and classical statistics. At that point, they are not the typical student that is confused about interpreting confidence intervals, p-values and frequentist stats, because they practice that a lot. But then it's hard for them to switch from "parameters are fixed, our interval either contains the parameter or not, but we don't know it" to "parameters are random quantities and we put probability distributions on top of them". So there's a cost there, which is not huge.

And what was really nice for us: Monte Carlo is something that really helped us. From very early on, we start computing quantities of interest with Monte Carlo, and when they realize the power of that approach, they're like, I really like this. Because I have a probability distribution and I'm interested in this particular probability, or I'm interested in a probability involving two random variables, or in many things. Once they discover how powerful that approach is, they're like, this is really nice. But yeah, it's a challenge, but I really like it. And I think at the end of the day, they also like it and they see the power in the approach. In fact, I have a student that's right now working on a Google Summer of Code project with Bambi. So it's Bayesian stats. And it seems I'm going to have another student working on a hierarchical model for his... So yeah, it's really nice.

Nice, yeah, yeah, for sure.
Who is the... So I know also, I think if I remember correctly, there is... So you know Gabriel, who works on Bambi. I don't remember his last name right now, do you?

It's hard, it's Gabriel Stechschulte. I don't know...

Yes, something like that. So sorry, Gabriel. But Gabriel is also a patron of the show, of Learn Bayes Stats, so he's really in the Bayesian state of mind. Thank you so much, Gabriel, for all the support to Learn Bayes Stats, but also, and even more importantly, for the work you do on Bambi. I know you helped me a few months ago on a PR for HSGP, where I was testing Bambi's HSGP capabilities to the limit. Thank you so much, Gabriel, and Tomi, of course, for developing Bambi all the time and pushing the boundaries on that.

I know Gabriel. So he was working in the industry and now he's back in academia, but in a more research-oriented role. And sorry, Gabriel, I don't remember all the details about this, but I do remember he was doing something very cool, applying Bayesian stats. So I'm nudging him publicly to someday tell the world about what he does. Because I remember being like, this is quite interesting. So yeah.

Definitely. Yeah, for sure. Yeah. Actually, let's talk about Bambi. I think it's going to be very interesting to listeners. So yeah, can you tell us what Bambi is about, basically, and why people would use it?
The way I usually present it, at least when people know about it, or the way I tell them, is that it's like BRMS in Python. If you're interested in BRMS and don't know what that is, I think it's episode 35 with Paul Bürkner, he was on the show, I'll put that in the show notes. But if you want Tomi's definition of Bambi now, from one of the main core devs of Bambi, well, here it is, folks.

To be honest, your definition was already really good, because it's one of the definitions I usually give when I know the other party knows about BRMS. But basically, if you don't know R, I can tell you in 30 seconds: R has a very particular syntax to specify regression models, where you basically say, okay, this is my outcome variable, you use a symbol, which is a tilde, and you say, these are my predictors. And you pass that to a function together with a data frame, which is a very convenient structure. And that function knows how to map the names of the predictors to parameters and variables in the model. It knows how to take a model formula and a data frame, and some other information that's not always needed, and it constructs a model with that information.

So that's very built into R. Like, if you go back to, I think, the S language, the formula syntax already existed. Then the R language has the formula syntax in the base packages. And a lot of packages built by people in R use the formula syntax to specify regression models. And a lot of people also extended the formula syntax to account for other things. Like, one extension that we incorporated in Bambi is the syntax to have what in frequentist stats you call random effects. That appeared, I think, for the first time in the lme4 package, which is a very popular package in R to work with mixed effects models, which is another name for hierarchical models. It's crazy how many names you have for that. So basically, in R you have this formula syntax and this very short way of writing a statistical model, and a lot of people created a lot of packages to have a larger variety of models.

Then go to Python. Let's go to Python. Python is a more general programming language. It has great support for statistics, machine learning, Bayesian stats, and all that. But you don't have something like a model formula built into the language. I think one of the very first attempts to build that, which was extremely successful, is Patsy, which is a library developed by... I don't remember the name of the guy, sorry. I think it's Nathaniel, but I don't remember the last name. But that's, as far as I know, the first package and the largest package that brought model formulas to Python, and then other libraries started to build on top of that Patsy library. For example, statsmodels. And statsmodels allows you, not to copy and paste your R code, but basically to say: this is how I would create a linear regression model in R; okay, in Python, what do I need to do? Okay, I need a pandas data frame and a model formula that is passed as a string, and it works the same way.
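As a quick illustration of the formula interface being described, here is roughly what that looks like with statsmodels, using a small made-up data frame:

```python
import pandas as pd
import statsmodels.formula.api as smf

# A made-up data frame for illustration
df = pd.DataFrame({
    "y": [2.3, 3.1, 4.0, 5.2, 6.1, 7.3],
    "x": [1, 2, 3, 4, 5, 6],
    "group": ["a", "a", "b", "b", "c", "c"],
})

# "outcome ~ predictors", the same R-style formula syntax described above
model = smf.ols("y ~ x + group", data=df)
result = model.fit()
print(result.summary())
```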
And so, as happened in R with people creating packages to extend those capabilities, the same happened in Python. You have statsmodels, which is very popular, but there are also many other libraries. And one of those libraries is Bambi, which extends the model formula and uses the model formula in a Bayesian context. Bambi stands for BAyesian Model-Building Interface. It uses a model formula and a syntax very similar to the syntax that you find in R to create Bayesian models. I think what's great about it is that you're not only creating the model, but you also have a lot of functionality to work with the model. For example, obtain predictions, which is not trivial in many cases, or compute some summary of interest, or help you find priors that are sensible for the problem that you have.

And so yeah, I joined the Bambi project, I think it was in 2020 or 2021, while working with Osvaldo. He was my director in CONICET, which is like a national institute for science and technology here in Argentina. And yeah, I really liked the interface, and I saw many points that could be improved, mainly that Bambi didn't support the syntax for random effects. Actually, no Python library supported that, because Patsy didn't support that. And at that point in time, I was learning about programming languages and I was like, well, maybe it's time to write a parser for model formulas. And that's what I did. And that was my first big contribution to Bambi.

And then we started to add, I don't know, more model families. So Bambi now supports many more likelihood functions. We started to add better default priors, because the goal of these libraries is to allow you to iterate quickly. It's not that we are rooting for "you should all use default priors and automatic priors". No, please don't do that. But if you want to have something quick and iterate quickly, then that's not a bad idea. Once you more or less have a more refined idea of your model, then you can sit down and say, okay, let's really think about the priors.

So, to summarize: Bambi is a package built on top of PyMC, I didn't mention that before, that allows people to write, fit and work with Bayesian models in Python without having to write a model in a probabilistic programming language. There's a trade-off. Like, you can write a very complex model in two or three lines of code. If you want full flexibility, you should use PyMC.
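To give a sense of what those two or three lines look like, here is a minimal sketch of the Bambi workflow. The data frame and column names (y, x, group) are made up for illustration:

```python
import pandas as pd
import bambi as bmb

# Made-up data for illustration
df = pd.DataFrame({
    "y": [2.3, 3.1, 4.0, 5.2, 6.1, 7.3],
    "x": [1, 2, 3, 4, 5, 6],
    "group": ["a", "a", "b", "b", "c", "c"],
})

# A hierarchical (mixed effects) regression in one formula:
# a common effect of x plus group-level intercepts
model = bmb.Model("y ~ x + (1 | group)", df)

# Fit with PyMC under the hood
idata = model.fit()

# Work with the fitted model, e.g. add in-sample predictions to idata
model.predict(idata)
```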
And to conclude, as you said, Bambi is the BRMS of Python. We always take BRMS as an inspiration, and also as, yeah, what we want to have in many cases. Because implementing Bambi, I learned a lot about BRMS and how great it actually is, because the complexity it can handle and the variety of models and kinds of things you can have in a model in BRMS is huge. I mean, I'm not aware of any other interface like this that supports as many things, Bayesian and non-Bayesian. I mean, it's really amazing. And yeah, we are always taking ideas from BRMS.

Yeah. Yeah, great summary. Thanks, Tomi. And even, like, a brief history of Bambi, I love that. So in the show notes, I added the link to the paper that you mentioned, and also the link to the very first Learn Bayes Stats episode, which was with Osvaldo Martin. So it was episode number one. It's definitely a vintage one, people. Feel free to...
I have a fun story about that.

Oh yeah? I don't know if I told you about this story, but when Osvaldo recorded that...

I think I know.

Yeah, you know, you know. When Osvaldo... but I don't know if the public knows about that story. So Osvaldo and I used to work in the same building. Not in the exact same office, but his office was in front of my office. So if he was talking to someone, I could listen. Not very clearly, but I could tell he was talking. And some random day I was in the office and I noticed that he was talking in English, but alone. Like, not with another person. And I said, what is he doing? And then, after that, he told me, yes, I was interviewed on a podcast that this other guy who's been contributing to ArviZ is starting. And yeah, I think it's very cool. I think it went very well. And at that point in time, I didn't know you, but I knew there was a podcast guy, and it turns out that I witnessed the first recording of Learning Bayesian Statistics, which is pretty fun. And look where we are now. Pretty interesting.

Yeah, this is really cool. I love that story. It was already all linked together. I love that. Yeah. Yeah.
I really love Bambi for what you just said, I think. It's a great way to start and iterate very fast on the model. And then, if you validate the concept, you can switch to PyMC and build the model again, but then build on top of that. And that's going to make your whole modeling workflow way faster. Yeah, I really love that. Another thing also that's really good is for teaching, especially beginners, because it will abstract away a lot of the choices that need to be made in the model. As you were saying, it's not necessarily what you want to do all the time, but at least to start with, you know, it's like when you start learning a new sport. Yes, there are tons of nuances to learn, but, you know, if you focus on one or two things, you already have the Pareto effect. Well, Bambi allows you to do that, and I think that's extremely valuable.

Yeah, and another point I'm realizing I forgot to mention is that it lowers the entrance barrier. Like, there are a lot of people who are not statisticians, but they do stats because they have experiments, or they are studying something and they have data, and they have some level of familiarity with some models, and they know that that's the model they want to fit. But probably writing PyMC code and working with indexes and dims and coords is too much, and going to Stan and typing everything is also too much, and they don't work with R, and they want some higher-level interface to work with. Then Bambi is also what they use. And yeah, I also really like that. It makes Bayesian stats more welcoming for people who are not experts at writing code, which is completely fine. Because a lot of people out there are trying to solve already difficult problems, and adding the extra complexity of being an expert in a PPL may be too much. So that's also another reason to have these interfaces.

Yeah, yeah, yeah. I definitely completely agree. I think that's also...
So basically, if people are curious about Bambi and want to get started with it, I definitely recommend taking a look at Bambi's website, which I put in the show notes. And also, well, that brings us to our new course, Tomi, which is the project I was teasing earlier in the episode. So this is also why I'm happy to have you on the show here.

So the course is called Advanced Regression with Bambi and PyMC. Precisely. It's on the Intuitive Bayes website, so of course I put that in the show notes for people who want to take a look at it. If you're a patron of the show, you have 10% off. This is the only discount that we do, so I hope you appreciate it. That's how special you are. Thank you so much, patrons.

And yeah, maybe, Tomi, tell us about, you know, the course, what it is about, and who it is for in particular. We spent a lot of time on this course. It took us two years to develop. So, yeah, I'm super happy about it. I'm also super happy that it's done. But yeah, maybe give us the elevator pitch for the course, who it's for, and why people would even care about it.
So the Advanced Regression course is a very interesting course with a lot of material, a lot of very well-thought-out material, which in all cases went through a lot of reviews. As the title says, it's a course about regression, but also, as the title says, it's an advanced regression course. It doesn't mean it starts from the beginning being extremely advanced, and it doesn't mean it involves the craziest mathematical formulas that you're going to see in your life, but it means it's the course you have to take if you want to give, sorry, if you want to take that second or third step in your learning journey. Like, for example, if you took an introductory course like yours or another introductory course and you feel that's not enough, or you are open to learning more, you are eager to learn more, then that's the course for you. Of course, it has a Bayesian approach and it uses a lot of Python, Bambi and PyMC.

Every time I talk about regression, I want to qualify something. I remember a conversation I had with colleagues when I was just starting a previous job. They were telling me they were taking a course about statistics, like those courses where you have a ton of topics, but each covered only very lightly. And they were like, yeah, the first two units are regression. And this is a lot. And I was telling them, in university, I had six courses about regression. It was not just two units in a course. And that's because I think in many cases, people think that regression is something very simple. It's the linear regression that you learn in a basic statistics course: you have an outcome variable and you have a predictor, then that's simple linear regression. You have multiple predictors, you have multiple linear regression. And that's it. That's all linear regression gives you. And all the rest are crazier things that fall under the machine learning umbrella. But in the course, we see that that's not the whole story.

So many things are regressions, or if you don't like the term, maybe we can give you a better term in the future, but so many things are linear models, which sounds pretty basic, right? You say, this is a linear model, this is a linear equation, it's like, this is for dummies. But if you're curious, take the course and you will see that with linear models you can do a lot of crazy things. Of course, we start with simple linear regression and we do multiple linear regression. But then, very quickly, we go to logistic regression, Poisson regression, we talk about categorical regression, multinomial regression, when your outcome is categories and you have multiple categories. And then it goes crazy, and we have zero inflation and we have overdispersion, and we finish the course talking about hierarchical models in the context of regressions, and it ends with a very interesting model that you developed.

So the course is very complete. It starts with a few things that we assume people know, but we review them. But then, very soon, we start covering new things. I think in all cases we show how to do things with Bambi and how to do them with PyMC. We have a lot of visualizations. Our editor did an amazing job at editing the videos, so we also have animations and all that. Yeah, it's a product I'm proud of.
Yeah, it's nice. Yeah, definitely. There is so much that we've done in this course. Well, I learned so much because...

Me too.

Yeah, as you were saying, it sounds like a regression is just something from the past. But it's actually used all the time. You know, even the big LLMs now, in the end, it's a lot of dot products, and dot products are matrices multiplied with vectors, and, you know, a linear regression is actually not that far from that. It's actually exactly that.
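To make that connection concrete, the mean of a linear regression is literally a matrix-vector dot product; the numbers below are purely illustrative:

```python
import numpy as np

# Design matrix: a column of ones (intercept) plus one predictor
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])

# Coefficients: intercept and slope
beta = np.array([2.0, 3.0])

# The regression mean is exactly a matrix-vector dot product
mu = X @ beta
print(mu)  # [3.5 6.5 9.5]
```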
So if you learn and really understand the nitty-gritty of regressions, complex regressions, you already know a lot of the things you're going to need to know when doing Bayesian modeling in the trenches. That's for sure. And that's also why I learned so much in this course, because I had to really dig into the regression models. And we show you how to do that, from simple regression to binomial regression, Poisson regression, stuff you guys obviously at least have heard about, but then we teach you more niche and advanced concepts like zero-inflated regressions, overdispersed regression, which is one of the chapters you worked on, Tomi, and you folks are gonna learn a lot on that: not only how to do the models, but then what to do with the models after, how to diagnose them, how to become confident about the model's predictions.

And also, we teach you about a personal favorite of mine, which is categorical and multinomial regressions, which I use a lot for electoral forecasting. But you're also going to use them a lot, for instance, for any outcome with more than two categories: you're going to use a multinomial or a categorical. And it's just extremely important to know about them, because they are not trivial. There are a lot of subtleties and difficulties, and we show you how to handle that. So personally, I learned so much.

Something I really loved is what you did in the overdispersed lesson, you know, where you were diagnosing the overdispersion and coming up with a bunch of custom plots to show that the model is underdispersed...

Yeah, that's the term.

...compared to the data. And also then coming up with a test statistic, a custom test statistic, to actually see whether the model is underdispersed or not. And I think that's really powerful, because that shows you also that in the Bayesian framework... I often get that question from beginners: can I compute test statistics? Because that's a big one in the frequentist framework. I'm like, yeah, sure. But you can also invent your own test statistics for your own purpose here. You don't have to use a pre-baked test statistic. You have posterior samples. You can do whatever you want with them.
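Here is a rough sketch of the kind of custom check being described: comparing a dispersion statistic (the variance-to-mean ratio) between posterior predictive draws and the observed counts. The names idata, "y" and y_obs are placeholders for your own fitted model's InferenceData, observed variable name and data array:

```python
import numpy as np
import arviz as az

def dispersion(counts):
    """Variance-to-mean ratio; values well above 1 suggest overdispersion."""
    return counts.var() / counts.mean()

# Posterior predictive draws stacked as (n_draws, n_observations);
# "y" is assumed to be the name of the observed variable in idata
pp = az.extract(idata, group="posterior_predictive")["y"].values.T

# The custom test statistic, for each replicated dataset and for the data
t_rep = np.array([dispersion(draw) for draw in pp])
t_obs = dispersion(y_obs)

# Posterior predictive p-value: how often the replicated statistic
# is at least as extreme as the observed one
print("posterior predictive p-value:", (t_rep >= t_obs).mean())
```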
I thought that was like... that's definitely one of my favorite parts of the course.

And something I realize we forgot to mention, and that I really like about the course, I really like having that in the course, is all the different parts where we talk about parameter identifiability and overparameterization. It's like, we don't tell you, take this outcome, take these three predictors, put them into the machine and you're good to go. I think that's probably... that will be a difficult part the first time you encounter it in the course, but we cover it multiple times in multiple lessons. And the reason is that it's a very important topic that's covered in many places, but I think with not enough emphasis. So we did our best to include that topic in many lessons, to show it from different angles, to show how it can happen under different circumstances. And that's something I'm really proud of: how much time and effort we invested in non-identifiability, parameter redundancy, and all that. And the different approaches to deal with that. That's something I'm proud of. I'm very happy we did that.

Yeah, definitely. That's a very good point. I think I finally understood overparameterization by working on this course, because we see it from, I think, lesson two or three, up until the last lesson, which is lesson nine.

Yes.

And we see it repeatedly. And I think that's really good, because it's a hard concept that's related to unidentifiability. That happens a lot in models, not only Bayesian models, like any statistical model, but it's a mathematical thing. And then it appears all the time in models. And it's related to identifiability, but it's hard to understand. So you have to see it repeatedly and really, really understand what it means, and only then can you develop an intuition of what it really is and when it happens. So yeah, definitely, that's also something I personally learned a lot from and enjoyed a lot in building this course.

Yeah, me too.

What would you say is your favorite part of the whole curriculum right now, and also, what is the part that was much more complicated than you anticipated?
Good question. I don't know if this is a favorite part, but something I really like about the course is how many visualizations we created. Like, for every model, we always created a visualization to explore the posterior, to plot predictions, to do things like that. I really like when you create a model and you don't just show two numbers; you make a beautiful thing to communicate what you found. That's something I really like. And definitely, my favorite parts are the more advanced parts, starting perhaps in lesson five, lesson six, when we talk about categorical regression, multinomial regression, and then everything that happens after that. Because I think that every lesson has many things to learn. So I couldn't say, okay, this is the part I enjoy the most, because I enjoy all of them, but definitely the second half.

And something that was difficult: actually, while working on the lesson about overdispersion, I looked through a lot of books, papers and all that, and it was not easy at all to find many references, examples, datasets, very well-worked examples from end to end. Honestly, I thought I would find a lot more, many more resources, and it was not that easy. I read papers from 50 years ago. Those scanned papers, typed on machines. Yeah, that was harder than what I anticipated. Crafting that lesson required a lot of reading, not only because of the complexity, but also to find resources that helped me build the lesson. Yeah, definitely, that was challenging and unanticipated.

Yeah, that lesson was hard, for sure. That was a difficult one.
Yeah, I mean, for me, I think my favorite part was really, as I was saying, not learning, but really getting to another level of understanding of unidentifiability and overparameterization. And also the next level in my understanding of the zero-sum normal distribution, because I had to use it a lot throughout the course. And so, I mean, in all the lessons I'm teaching in this course, so three of them, I'm using the zero-sum normal. So I got a really deep, deep... And actually, that's something the students have said since the beta version: it's very interesting to see how you solve one of the unidentifiabilities that can happen in models.

So, for instance, with multinomial models, one of the probabilities, the last category's probability, is entirely determined by the n minus one previous categories. So that's basically what an overparameterization is: if you put a parameter on each of the n categories, then your model is overparameterized, because the last category is entirely determined once you know about the n minus one previous ones. And so there are at least two ways to solve that, as we show in the course. One of the classic ones, and it's the one that's automatically implemented in Bambi, is reference encoding. So you take one of the categories, you consider that it is the reference one, and you fix it to an arbitrary number. So you fix that parameter to an arbitrary number, usually zero. And then all the other categories' parameters are in reference to that category.

So you could do that, but you can also do, and that's what we also show you in the course, you can also say, well, instead of fixing one category to zero, I'm going to fix the sum of the categories to zero. And that way you can still have n parameters, one for each category, which is really cool, because that way you don't have to think about one category as a reference. And you just use a zero-sum normal distribution instead of a normal distribution. And that distribution is going to make sure that the parameters of the categories sum to zero.

So which one you prefer will depend on the case. But usually, when you don't have a natural reference, like a placebo, you will probably prefer the zero-sum normal parameterization, because then there is no obvious reference. Whereas if a placebo is an obvious reference, you probably want all the parameters in reference to that category. But the zero-sum normal is going to be in reference to the average of all the categories. And you can actually model an average for all the categories with this parameterization, and then all the categories will be an offset of that baseline.

So that was definitely something super interesting that helped me level up my understanding of that distribution in this course. And definitely a lot of beta testers appreciated it. I guess you want to say something also, but that's only because you know the zero-sum normal quite well.
Yeah, yeah. But something... something nice I wanna say about the zero-sum normal. In PyMC, the ZeroSumNormal is implemented as a distribution, and I think it would be better if we could say, okay, this is a normal distribution plus a transformation, or a restriction. But having something called ZeroSumNormal, and being able to use it as a prior like any other PyMC distribution, is very convenient, because the user doesn't have to deal with all the details to get that constraint. While if in PyMC you wanna have the other encoding, like you wanna have a reference level, you have to do it in a very manual way. You have to create a vector of normals with shape n minus one. Then you have to concatenate a zero to that vector. And then you get a new vector, and that's the vector you use in your model. And you end up having a constant in your trace, and then ArviZ complains about not being able to compute R-hat, for example, because the values are all zeros, all constant. And the ZeroSumNormal is also more appealing for general users. They just replace Normal with ZeroSumNormal and you're good to go.
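Here is a minimal sketch of the two parameterizations just described, for a hypothetical categorical effect with four levels (variable names are made up for illustration):

```python
import pymc as pm
import pytensor.tensor as pt

n_categories = 4

with pm.Model() as manual_reference:
    # Manual reference encoding: n - 1 free parameters,
    # then a zero is concatenated for the reference category
    beta_free = pm.Normal("beta_free", 0, 1, shape=n_categories - 1)
    beta = pm.Deterministic("beta", pt.concatenate([pt.zeros(1), beta_free]))

with pm.Model() as zero_sum:
    # Zero-sum parameterization: n parameters constrained to sum to zero,
    # each one an offset from the average across categories
    beta = pm.ZeroSumNormal("beta", sigma=1, shape=n_categories)
```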
That doesn't mean we shouldn't think about what we're doing. I'm just talking about the user experience: it's much easier to use a ZeroSumNormal, and also more intuitive in most cases. But yeah, I think the summary, and how this relates to the course, is: think about the parameter restrictions that you add to the model, think about how that changes the meaning of the parameters, and then be responsible with what you do. But know that there's not a single recipe for solving that kind of problem.

Yeah, yeah. Yeah, and that's also why we have the whole community at Intuitive Bayes, and we have the Discourse where people can ask questions, because unfortunately there is no one-size-fits-all. I mean, I say unfortunately, but that's actually pretty cool, because otherwise I guess what we're doing would be pretty boring.

Time is running by, and I think we've covered that topic quite well. I mean, I could talk about regression for quite a long time, but I think that's a good overview. And of course, if people are interested in some of the topics we talked about here, let me know and I can do a special episode about some parts of regressions that you're interested in or really wondering about. Or we can even do a modeling webinar showing you some things, some answers to the most frequently asked questions you have about regressions. So for sure, let us know about that. And well, if we made you curious enough to take the course, that's awesome. I think this will be a lot of hours well invested.
Yeah, because it's nine lessons. I don't know how many hours of video, but a lot. You have lifetime access to it. You have exercises, which are very important. Folks, I know I sound like a very old professor here, but actually I think the most valuable part of the course is not only watching the videos, but also doing the exercises, going through the solutions that you have on the repo, asking questions on the Discourse, answering questions on the Discourse, being part of that community. Basically, that's really how you're going to get the most out of it. Yeah, like, you cannot learn how to ride a horse by just watching people riding horses. It's the same with Bayesian modeling. If you just watch the videos, that will be entertaining for sure, but you're not gonna get the most out of it. So, yeah. And if you do take the course, please say hi. We are gonna be very happy to have you there, and we definitely wanna hear from you.
Tomi, maybe, yeah, something I wanted to ask you before letting you go: I know you've done some work lately on sparse matrices, if I remember correctly, in PyTensor. Is that something you think would be useful to share a bit here for listeners?

Yeah, yeah, I can. It's a topic I really like, and I wish I knew more about it; I'm always trying to learn. Like, there's some depth at which I know nothing about how it works. But basically, you already mentioned this: many things can be expressed as dot products. And a subset of those many things can be expressed as a dot product between a matrix and a vector. That happens all the time in linear models. That's basically the gist of a linear model. And in a subset of those cases, one of the matrices in that dot product is very sparse. And if it's very sparse...

So, define sparse...

Yeah. In a sparse matrix, for example, you have many entries, but most of them, the great majority of them, are zero. So it means in the multiplication they are not going to contribute anything to the final result. If you do a dot product between a sparse matrix and a dense vector (dense is the opposite of sparse, meaning that you can have some zeros, but you don't have so many zeros to the point where non-zeros are the rare value), anyway, if you have a big sparse matrix and a dense vector and you multiply them, you do a dot product, you're going to spend a lot of time computing things that are zero, will always be zero, and contribute nothing to the end result.

Of course, for a long time there have been structures to store these special matrices in computers in such a way that you save space. Because if you have a huge matrix with a lot of zeros stored in a dense way, that takes memory. If you don't tell the computer those values are all the same, it doesn't know about that, so it's going to take a lot of memory to store that matrix. But with a sparse matrix, first, you can save a lot of space in the storage of the matrix. And then you can exploit the sparsity to do fewer computations. And at the end of the day, you have computations that run faster.
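As a small, concrete illustration of that idea with SciPy (the sizes and the 1% density are arbitrary), the sparse format stores only the non-zero entries, and the dot product only touches those:

```python
import numpy as np
from scipy import sparse

# A matrix where ~99% of the entries are zero
X_sparse = sparse.random(5_000, 2_000, density=0.01, format="csr", random_state=0)
X_dense = X_sparse.toarray()         # the same values, stored densely

rng = np.random.default_rng(0)
beta = rng.normal(size=2_000)         # a dense vector

# The same dot product; the sparse version skips all the zeros
mu_dense = X_dense @ beta
mu_sparse = X_sparse @ beta

print(np.allclose(mu_dense, mu_sparse))                      # True
print(X_sparse.data.size, "stored values vs", X_dense.size)  # far fewer values stored
```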
And if you are doing MCMC, which means that you are evaluating the log p and its derivative many, many times, it means you're doing that matrix-vector multiplication a lot of times. So gaining time, making that computation faster, is something that we want to have. And yeah, PyTensor has some support for sparse matrices and sparse objects in general. But as far as I know, that support comes from the old Theano days. There has been some maintenance, but not a lot of features have been added. And yeah, for some projects at Labs, I've been writing my own custom things to do dot products between sparse matrices and dense vectors. Unfortunately, I didn't have time yet to put that into PyTensor, but I want to do that. If someone wants to collaborate on that endeavor, I'm more than happy.

But yeah, I think it's something that we should do more. And the main motivation was that I wanted Bambi to do that by default, because Bambi is doing the simple thing of multiplying big dense matrices, when some of those matrices could have been sparse. It's definitely not new theory or new computational techniques, but it's taking things that already exist and making them usable, first available and then usable, for the wider community. And I don't know, I have fun doing those kinds of things.
657
I hope you'll have time to include that in Python.
658
In a few weeks or months.
659
I mean, if I had time, but I definitely helped you, Matt.
660
Unfortunately, now with the new job and the other projects that have
661
to finish, like, don't have a lot of time for that.
662
yeah, but I mean, this is also definitely something that I want to learn more about
because it happens quite a lot.
663
And this is extremely frustrating.
664
Yeah, it's just like your brain, it feels weird because your brain when it sees a zero, it
knows if this term is not going to be useful.
665
So you can kind of get rid of it when you do the computation
666
You you do any computation by hand, you get rid of the zeros very easy.
667
But the computation does, the computer doesn't know that.
668
So you have to tell it because otherwise it spends a lot of time doing useless
computation.
669
And then in the end it's like, yeah, that's a zero.
670
But then you spent a lot of seconds doing that.
671
And that's stupid.
672
But you have to tell it, right?
673
It's what I tell with computers a lot, Computers are very powerful, but often they are
very dumb.
674
So you need to tell them exactly what you want.
675
And that's basically what you're trying to do here.
676
That's really interesting because that also happens very frequently, doesn't it?
677
Yeah, yeah.
678
For those who are curious about it and want to take a deeper dive, Daniel Simpson, he has
a very interesting blog.
679
And in that blog, he has many posts about doing things with sparse mentacies.
680
because I didn't mention this, but these matrices can have particular structures and if
they have that particular structure, can exploit some property of matrices and then do the
681
computation even faster.
682
like dot products, inverses, transposes, and things like that, determinants.
683
If you have matrices with particular structures, you can exploit those structures to save
684
and perhaps also memory.
685
And Daniel wrote a lot of posts doing things with sparse matrices using Jax, which, know,
PyTensor has these multiple backends.
686
It has a C backend, it has a Numba backend and a Jax backend.
687
And what has been frustrating to be honest is that the support for sparse matrices
688
varies a lot in those backends.
689
And that's one of the reasons that makes it harder to have something available that works
for most of the cases.
690
So in my use case, I implemented what I needed for the particular model that I had.
But if you want to have something public, available for the wider community, it should work in more than just one single case.
But yeah, I think what's needed is a few people with some time to work on that, and that should be it, because many things are already invented.
I'm not saying the task is trivial, not at all.
I'm saying it's...
It's about investing time, programming, designing, testing, and all that.
Yeah.
Yeah, so you heard it, folks.
Really, if you're interested in working on that, you don't need to be an expert, because we have people like Tomi on the PyMC repo who can mentor you.
If you're interested in that and you want to dive a bit into open source, please contact me and I'll put you in contact with the appropriate authorities, as we say.
And yeah, we should definitely put that blog post by Dan Simpson in the show notes, Tomi, if you can do that.
Also, is there anything from your custom implementation that you can already share in the show notes?
Yeah, I have a repository that is public.
Perhaps I can update it with the latest things.
But I do have a few things to share.
Both implementations and experiments of me testing those implementations.
Nice.
Which implementations are those?
In which cases could people use them?
Just the matrix product.
If you write it out, it's SpMV, I think.
Sparse matrix, dense vector.
But basically sparse matrix-dense vector multiplication.
That's what I care about.
But that's in PyTensor?
PyTensor, C, Numba, JAX, many things?
But yeah, it's PyTensor with different backends.
Okay, so it would be like, for instance, you could use that function that's written in PyTensor in a PyMC model.
Yeah, yeah, that's the goal, and that's what I did in my use case.
Yeah, yeah, yeah.
It's like you have a sparse matrix multiplication somewhere in your PyMC model.
Instead of just doing pm.math.dot, you would use that custom...
Another function.
You would use that custom PyTensor function.
Correct.
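For readers wondering what that could look like in practice, here is a hedged sketch (not Tomi's actual implementation) using the pytensor.sparse module with made-up data; whether it runs efficiently depends on the backend, which is exactly the caveat discussed above.

```python
# Sketch: replacing a dense pm.math.dot(X, beta) with a sparse product in a PyMC model.
import numpy as np
import pymc as pm
import pytensor.sparse as pts
from scipy import sparse

rng = np.random.default_rng(0)
n, p = 1_000, 50

# Hypothetical sparse design matrix (mostly zeros) and outcome.
X_scipy = sparse.random(n, p, density=0.05, format="csr", random_state=0)
y_obs = rng.normal(size=n)

X = pts.as_sparse_variable(X_scipy)  # wrap the SciPy CSR matrix for PyTensor

with pm.Model():
    beta = pm.Normal("beta", 0, 1, shape=p)
    sigma = pm.HalfNormal("sigma", 1)
    # Sparse matrix @ dense vector, instead of a dense dot product.
    mu = pts.structured_dot(X, beta[:, None]).flatten()
    pm.Normal("y", mu=mu, sigma=sigma, observed=y_obs)
```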
Yeah, but the problem I was mentioning is, let's say you want to use the great new nutpie sampler: then you need the Numba backend to be supported, so you need that sparse thing implemented in Numba, and so on.
It definitely would be awesome to have people help out on that.
I'd definitely love to do that; unfortunately, I cannot extend my days.
That's really fascinating work.
That's really cool.
I'm hoping to have to do that at one point for work.
So you are forced to do it?
Yeah, either for the Marlins or for a Labs project.
Because then I'm forced to dive in and do it, and probably do a PR to finally push that to PyTensor itself.
That's how a lot of my PRs end up happening, you know.
That'd be great, I'd say.
I'd love that.
I'd love that, because I've definitely been bitten by that before.
So that's, yeah...
I had also looked into implementing a sparse softmax in PyTensor.
If I remember correctly, it didn't seem like it would be very hard, but I didn't have a lot of time to work on that project, so I had to abandon it.
But yeah, definitely, that'd be super fun.
Great. So, Tomi, it's already been a lot of time, so maybe I have just one more question before we go to the last two questions.
Now, I know you learn a lot of stuff, and we kind of work similarly, so something I'd like to ask you is: what are you thinking about these days?
What do you want to learn in the coming weeks or coming months?
That's an interesting question.
I've been learning more about hierarchical models.
And it may sound like, but shouldn't you already know about that topic?
Yeah, but it turns out there are a lot of things to learn.
And so I've been learning about Bayesian modeling and hierarchical models in multiple ways, definitely gaining intuition through computer exercises, which helped me a lot.
But lately, I went to more formal sources to have a look at the math and the properties, to better understand the assumptions, the consequences of those assumptions, and to understand when we can avoid computations.
At some point, my understanding was: okay, we have HMC.
This is the best thing in the world.
We pass it any model, in quotes, because it's not really any model, but let's say any model, and it just works.
Okay, yes, you can have some problems, but let's say it just works.
But then I've been learning more about those cases where you can avoid using such a sampler, or you can...
I know it sounds boring to write your own MCMC routine, but if you have a model that you know very well, and that's the model you want to use, and NUTS is going to take 30 hours because you have millions of parameters, it's probably worth having a look at the theory and figuring out whether you can do something better.
I'm learning about that and I really like it; it's challenging, but I think that with the experience of having worked a lot with Bayesian models, it is much easier to digest all that.
So that's one of the things that I'm learning about.
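As a concrete, hedged illustration of that idea (my own toy example, not anything discussed in the episode): for a small conjugate normal hierarchical model with fixed variances, the full conditionals are available in closed form, so a few lines of Gibbs updates can stand in for a general-purpose gradient-based sampler.

```python
# Toy Gibbs sampler for a conjugate normal hierarchical model (fixed variances).
import numpy as np

rng = np.random.default_rng(1)

J = 8
sigma = np.full(J, 1.0)                      # known within-group standard deviations
y = rng.normal(rng.normal(0, 2, J), sigma)   # one observed mean per group

tau = 2.0            # between-group sd, kept fixed so every update stays conjugate
mu0, s0 = 0.0, 5.0   # prior on the overall mean

n_draws = 2_000
mu_draws = np.empty(n_draws)
theta_draws = np.empty((n_draws, J))

mu = 0.0
for t in range(n_draws):
    # theta_j | mu, y_j is normal: precision-weighted average of data and prior.
    prec = 1 / sigma**2 + 1 / tau**2
    theta = rng.normal((y / sigma**2 + mu / tau**2) / prec, np.sqrt(1 / prec))
    # mu | theta is normal as well.
    prec_mu = J / tau**2 + 1 / s0**2
    mu = rng.normal((theta.sum() / tau**2 + mu0 / s0**2) / prec_mu, np.sqrt(1 / prec_mu))
    mu_draws[t], theta_draws[t] = mu, theta
```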
Another thing that I'm always learning about: there's a book that we have been sharing lately with folks at Labs and on Twitter.
The book is called Richly Parameterized Linear Models, or something like that.
Something about models with a lot of parameters and how to work with those models.
And the book is great.
I enjoyed it.
And the topic is the connection between many models that seem to be different, and how they are actually connected to each other.
And I really enjoy that.
Like, you have a spline model.
You have a model with splines, and then you have a hierarchical model, and if you use these particular priors and work out the model's distribution, it matches that other thing.
Seeing those connections between the different models and modeling approaches is really nice, because it may seem boring at some point, but that's how you really grasp the depths of something.
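For readers who want the flavor of that correspondence, here is the standard way it is written down in the penalized-spline literature (my notation, not the book's): a penalized spline fit is equivalent to a hierarchical model with a normal prior on the spline coefficients.

```latex
% Penalized spline as a hierarchical (random-effects) model: a sketch.
\begin{aligned}
\text{Penalized spline:}\quad
  & \min_{\beta,\,u}\ \lVert y - X\beta - Z u \rVert^{2} + \lambda \lVert u \rVert^{2} \\
\text{Hierarchical model:}\quad
  & y \mid \beta, u \sim \mathcal{N}\!\left(X\beta + Z u,\ \sigma^{2} I\right),
    \qquad u \sim \mathcal{N}\!\left(0,\ \sigma_{u}^{2} I\right) \\
\text{Connection:}\quad
  & \lambda = \sigma^{2} / \sigma_{u}^{2}
\end{aligned}
```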
So yeah, those are two things I'm learning about these days, and I enjoy learning about those things.
Yeah, I can tell you love learning about new things.
I do too; I think that's also why we work so well together.
And if you have a link to the book you just mentioned...
Yeah, I will share the book so you can add it to the show notes.
I'm very bad at remembering exact names.
Fortunately, I can just search my computer, so as long as I know one or two words I can get what I want.
That's cool.
Sounds about right.
Well, Tomi, that's great.
I think it's time to call it a show.
We've covered a lot of ground.
Of course, there are a ton of questions I could still ask you, but let's be respectful of your time.
But before I let you go, of course, I'm going to ask you the last two questions I ask every guest at the end of the show.
You could...
No, sorry.
First one is: if you had unlimited time and resources, which problem would you try to solve?
I don't know if this problem has a particular name, but you know, I enjoy working with samples obtained with MCMC methods.
And it's really nice learning about how they work and how to diagnose them and all that.
But if we could have a method that gives us exact samples from any posterior distribution that we work with, or a very clever machine that knows the details of every model and, without us noticing, uses a specific method to give us draws from the posterior, meaning that you don't need to worry about divergences, convergence, and things like that, and you can just focus on the analysis of the outcome...
I would work on that.
Because, and it's something I've been thinking about more these days: now I need to wait for the compilation, and now I need to wait a few hours to get the draws.
If I could have something that saved me from that, even though I enjoy learning about how it works and how to improve it depending on the kind of problems I'm having, yeah, I would definitely like getting rid of MCMC and just doing MC.
But I don't know if it's possible.
But if I'm here to dream, I'm going to have, like...
Yeah, a very ambitious dream.
Sure.
Yeah, let's dream big.
Yeah, I agree with that.
Kind of having a...
Yeah, what I often dream about is having kind of like a Jarvis, like in Iron Man.
I mean, like, can you try that version of the model?
Something like that.
That'd be fantastic.
Yeah.
Nice.
And then, second question.
If you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?
And keep in mind that you cannot say myself, because you already had dinner with me.
Then we'd have to finish the recording.
Yeah, I knew you were going to answer "myself," and I definitely appreciate that.
But you already had dinner with me, so you have to choose someone else.
Yeah.
Again, let me explain the answer.
I don't know why, but I'm a fan of movies and documentaries about World War II.
And one movie I enjoyed a lot, one I watched with a lot of attention, very interested in what was happening, was... I think in English it is called The Imitation Game, but in Spanish we call it The Enigma Code or something like that.
And I really enjoyed that movie.
And I was fascinated watching the machine moving its parts and making noise, trying to crack the codes to understand the messages, and then: okay, now we have the information.
What do we do with that information?
So definitely...
I'm talking about Alan Turing, and I would have dinner with him to talk about everything.
How he was recruited, how they came up with the ideas, how they used them, what was hard about making choices, because it was both a technical problem but also a political, human problem.
And then to talk about what happened after that.
So yeah, I think the bad thing about that dinner would be that I would want it to last for many hours, because I would have many questions.
But yeah, that would be one person I would like to have dinner with, to interview and ask a lot of things.
Yeah, great choice.
Fantastic choice.
Invite him for Christmas.
Christmas dinner takes hours, so I think that's a very good opportunity.
Whether in France or Argentina, they always last for hours, so you know.
That's good.
Awesome.
Well, thanks a lot, Tomi. That was a blast to finally have you on the show.
More than 100 episodes after you eavesdropped at Osvaldo's door at CONICET.
In Spanish, I think you would say... a little bit of Quechua, and yeah, I'm sure.
Yeah, yeah.
Yeah, it's great to have you on the show. And as usual, we'll put a link to your website, to your socials, and a lot of links for those who want to dig deeper.
Thanks again, Tomi, for taking the time and being on this show.
Thank you, it was a lot of fun, to be honest.
And if Alex happens to invite you to the podcast, you have to say yes.
Thank you, Alex.
This has been another episode of Learning Bayesian Statistics.
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind.
That's learnbayesstats.com.
Our theme music is Good Bayesian by Baba Brinkman, featuring MC Lars and Mega Ran.
Check out his awesome work at bababrinkman.com.
I'm your host, Alex Andorra.
You can follow me on Twitter at alex_andorra, like the country.
You can support the show and unlock exclusive benefits by visiting patreon.com/LearnBayesStats.
Thank you so much for listening and for your support.
You're truly a good Bayesian.
Change your predictions after taking information in.
And if you're thinking I'll be less than amazing, let's adjust those expectations.
Let me show you how to be a good Bayesian.
Change calculations after taking fresh data in.
Those predictions that your brain is making?
Let's get them on a solid foundation.