Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
In this episode, I had the pleasure of speaking with Allen Downey, a professor emeritus at Olin College and a curriculum designer at Brilliant.org. Allen is a renowned author in the fields of programming and data science, with books such as “Think Python” and “Think Bayes” to his credit. He also authors the blog “Probably Overthinking It” and has a new book by the same name, which he just released in December 2023.
In this conversation, we tried to help you differentiate between right and wrong ways of looking at statistical data, discussed the Overton paradox and the role of Bayesian thinking in it, and detailed a mysterious Bayesian killer app!
But that’s not all: we even addressed the claim that Bayesian and frequentist methods often yield the same results — and why it’s a false claim. If that doesn’t get you to listen, I don’t know what will!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉
Links from the show:
- LBS #41, Thinking Bayes, with Allen Downey: https://learnbayesstats.com/episode/41-think-bayes-allen-downey/
- Allen’s blog: https://www.allendowney.com/blog/
- Allen on Twitter: https://twitter.com/allendowney
- Allen on GitHub: https://github.com/AllenDowney
- Order Allen’s book, Probably Overthinking It, at a 30% discount with the code UCPNEW: https://press.uchicago.edu/ucp/books/book/chicago/P/bo206532752.html
- The Bayesian Killer App: https://www.allendowney.com/blog/2023/03/20/the-bayesian-killer-app/
- Bayesian and Frequentist Results Are Not the Same, Ever: https://www.allendowney.com/blog/2021/04/25/bayesian-and-frequentist-results-are-not-the-same-ever/
- Allen’s presentation on the Overton paradox: https://docs.google.com/presentation/d/1-Uvby1Lfe1BTsxNv5R6PhXfwkLUgsyJgdkKtO8nUfJo/edit#slide=id.g291c5d4559e_0_0
- Video on the Overton Paradox, from PyData NYC 2022: https://youtu.be/VpuWECpTxmM
- Thompson sampling as a dice game: https://allendowney.github.io/TheShakes/
- Causal quartets – Different ways to attain the same average treatment effect: http://www.stat.columbia.edu/~gelman/research/unpublished/causal_quartets.pdf
- LBS #89, Unlocking the Science of Exercise, Nutrition & Weight Management, with Eric Trexler: https://learnbayesstats.com/episode/89-unlocking-science-exercise-nutrition-weight-management-eric-trexler/
- How Minds Change, David McRaney: https://www.davidmcraney.com/howmindschangehome
Abstract
We are happy to welcome Allen Downey back to our show, and he has great news for us: his new book, “Probably Overthinking It”, is available now.
You might know Allen from his blog by the same name or his previous work. Or maybe you watched some of his educational videos which he produces in his new position at brilliant.org.
We delve right into exciting topics like collider bias and how it can explain the “low birth weight paradox” and other situations that only seem paradoxical at first, until you apply causal thinking to them.
Another classic Allen can demystify for us is Simpson’s paradox. The problem is not the data, but your expectations of the data. We talk about some cases of Simpson’s paradox, for example from statistics on the Covid-19 pandemic, also featured in his book.
We also cover the “Overton paradox” – which Allen named himself – on how people report their ideologies as liberal or conservative over time.
Next to causal thinking and statistical paradoxes, we return to the common claim that frequentist statistics and Bayesian statistics often give the same results. Allen explains that they are fundamentally different and that Bayesians should not shy away from pointing that out and emphasising the strengths of their methods.
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
In this episode, I had the pleasure of speaking with Allen Downey, a professor emeritus at Olin College and a curriculum designer at Brilliant.org. Allen is a renowned author in the fields of programming and data science, with books such as Think Python and Think Bayes to his credit. He also authors the blog Probably Overthinking It, and has a new book by the same name, which he just released in December 2023.

In this conversation, we tried to help you differentiate between right and wrong ways of looking at statistical data, we discussed the Overton paradox and the role of Bayesian thinking in it, and we detailed a mysterious Bayesian killer app. But that is not all. We even addressed the claim that Bayesian and frequentist methods often yield the same results, and why it is a false claim. If that doesn't get you to listen, I don't know what will.

This is Learning Bayesian Statistics, episode 97, recorded October 25, 2023.

Hello, my dear Bayesians! I have two announcements for you today. First, congratulations to the 10 patrons who won a digital copy of Allen's new book. The publisher will soon get in touch and send you the link to your free digital copy. If you didn't win, well, you still won, because you get a 30% discount if you order with the discount code UCPNEW from the University of Chicago Press website. I put the link in the show notes, of course.

Second, a huge thank you to Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser for supporting the show on Patreon. I can assure you, this is the best way to start the year. Thank you so much for your support. It literally makes this show possible and it made my day. Now onto the show with Allen Downey.

Show you how to be a good Bayesian and change your predictions...

Allen Downey, welcome back to Learning Bayesian Statistics.

Thank you. It's great to be here.

Yeah, thanks again for taking the time. And so for people who know you already, or who are getting to know you, Allen was already on Learning Bayesian Statistics in episode 41. So if you are interested in a bit more detail about his background, and also much more about his previous book, Think Bayes, I recommend listening back to episode 41, which will be in the show notes. Today, we'll focus on other topics, especially your new book, Allen. I don't know how you do that. But well done, congratulations, again, on another great book that's getting out. But first, maybe a bit more generally, how do you define the work that you're doing nowadays and the topics that you're particularly interested in?
46
It's a little hard to describe now because
I was a professor for more than 20 years.
47
And then I left higher ed about a year, a
year and a half ago.
48
And so now my day job, I'm at
brilliant.org and I am writing online
49
lessons for them in programming and data
science, which is great.
50
I'm enjoying that.
51
Yeah.
52
Sounds like fun.
53
It is.
54
And then also working on these books and
blogging.
55
And
56
I think of it now as almost being like a
gentleman scientist or an independent
57
scientist.
58
I think that's my real aspiration.
59
I want to be an 18th century gentleman
scientist.
60
I love that.
61
Yeah, that sounds like a good objective.
62
Yeah, it definitely sounds like fun.
63
It also sounds a bit similar to what I'm doing on my end with the podcast and also the online courses for Intuitive Bayes. And I also teach a lot of the workshops at PyMC Labs. So yeah, a lot of teaching and educational content on my end too, which I really love.
69
So that's also why I do it.
70
And yeah, it's fun because most of the
time, like you start
71
teaching a topic and that's a very good
incentive to learn it in lots of details.
72
Right.
73
So, lately I've been myself diving way more into Gaussian processes again, because this is a very fascinating topic, but quite complex. And causal inference also, I've been reading up again on this.
76
So it's been quite fun.
77
What has been on your mind recently?
78
Well, you mentioned causal inference and
that is certainly a hot topic.
79
It's one where I always feel I'm a little
bit behind.
80
I've been reading about it and written
about it a little bit, but I still have a
81
lot to learn.
82
So it's an interesting topic.
83
Yeah, yeah, yeah.
84
And the cool thing is that honestly, when
you're coming from the Bayesian framework,
85
to me that feels extremely natural.
86
It's just a way of...
87
Some concepts are the same, but they're
just named differently.
88
So that's all you have to make the
connection in your brain.
89
And some of them are somewhat new.
90
But if you've been doing generative
modeling for a while, then just coming up
91
with the directed acyclic graph for your
model and just updating it from a
92
generative perspective and doing counterfactual analysis, it's really something you can do in the Bayesian workflow. So that's really good, that really helps you. To me, you already have the foundations.
96
And you just have to, well, kind of add a
bit of a toolbox to it, you know, like,
97
OK, so what's regression discontinuity
design?
98
What's interrupted time series?
99
Things like that.
100
But otherwise, what's difference in
differences?
101
things like that, but these are kind of
just techniques that you add on top of the
102
foundations, but the concepts are pretty
easy to pick up if you've been a Bayesian for a while.
104
I guess that's really the good news for
people who are looking into that.
105
It's not completely different from what
you've been doing.
106
No, I think that's right.
107
And in fact, I have a recommendation for
people if they're coming from Bayes and
108
getting into causal inference.
109
Judea Pearl's book, The Book of Why,
follows exactly the progression that you
110
just described because he starts with
Bayesian nets and then says, well, no,
111
actually, that's not quite sufficient.
112
Now for doing causal inference, we need
the next steps.
113
So that was his professional progression.
114
And it makes, I think, a good logical
progression for learning these topics.
115
Yeah, exactly.
116
And well, funny enough, I've been, I've
started rereading The Book of Why recently.
118
I had read it like two, three years ago
and I'm reading it again because surely
119
there are a lot of things that I didn't
pick up at the time, didn't understand.
120
And there are some stuff that are going to
resonate with me more now that I have a
121
bit more background, let's say, or...
122
Some other people would say more wrinkles on my forehead, but I don't know why they would say that. So, Allen, we're already getting off topic, but yeah, I really love that.
125
The causal inference stuff has been fun.
126
I'm teaching that next Tuesday.
127
First time I'm going to teach three hours
of causal inference.
128
That's going to be very fun.
129
I can't wait for it.
130
Like you try to study the topic and there
are all angles to consider and then a
131
student will come up with a question that
you're like, huh, I did not think about
132
that.
133
Let me come back to you.
134
That's really the fun stuff to me.
135
As you say, I think every teacher has that
experience that you really learn something
136
when you teach it.
137
Oh yeah.
138
Yeah, yeah.
139
I mean, definitely.
140
That's really one of the best ways for me
to learn.
141
a deadline, first, I have to teach that
stuff.
142
And then having a way of talking about the
topic, whether that's teaching or
143
presenting, is really one of the most
efficient ways of learning, at least to
144
me.
145
Because I don't have the personal
discipline to just learn for the sake of
146
learning.
147
That doesn't really happen for me.
148
Now, we might not be as off topic as you
think, because I do have a little bit of
149
causal inference in the new book.
150
Oh, yeah?
151
I've got a section that is about collider
bias.
152
And this is an example where if you go
back and read the literature in
153
epidemiology, there is so much confusion.
154
There was the low birth weight paradox was
one of the first examples, and then the
155
obesity paradox and the twin paradox.
156
And they're all baffling.
157
if you think of it in terms of regression
or statistical association, and then once
158
you draw the causal diagram and figure out
that you have selected a sample based on a
159
collider, the light bulb goes on and it's,
oh, of course, now I get it.
160
This is not a paradox at all.
161
This is just another form of sampling
bias.
162
What's a collider for the, I was going to
say the students, for the listeners?
163
And also then what does collider bias mean
and how do you get around that?
164
Yeah, no, this was really interesting for
me to learn about as I was writing the
165
book.
166
And the example that I started with is the
low birth weight paradox.
167
And this comes from the 1970s.
168
It was a researcher in California.
169
who was studying low birth weight babies
and the effect of maternal smoking.
170
And he found out that if the mother of a
newborn baby smoked, it is more likely to
171
be low birth weight.
172
And low birth weight babies have health
effects, including higher mortality.
173
But what he found is that if you zoom in
and you just look at the low birth weight
174
babies,
175
you would find that the ones whose mother
smoked had better health outcomes,
176
including lower mortality.
177
And this was a time, this was in the 70s,
when people knew that cigarette smoking
178
was bad for you, but it was still, you
know, public health campaigns were
179
encouraging people to stop smoking, and
especially mothers.
180
And then this article came out that said
that smoking appears to have some
181
protective effect.
182
for low birth weight babies.
183
That in the normal range of birth weight,
it appears to be minimally harmful and for
184
low birth weight babies, it's good.
185
And so, he didn't quite recommend maternal
smoking but he almost did.
186
And there was a lot of confusion.
187
It was, I think it wasn't until the 80s
that somebody explained it in terms of
188
causal inference.
189
And then finally in the 90s where someone
was able to show using data that not only
190
was this a mistake, but you could put the
numbers on it and say, look, this is
191
exactly what's going on.
192
If you correct for the bias, you will find
that not surprisingly smoking is bad
193
across the board, even for low birth
weight babies.
194
So the explanation is that there's a
collider and a collider in a causal graph
195
means that there are two arrows coming into the same box, meaning two potential causes for the same thing.
197
So in this case, it's low birth weight.
198
And here's what I think is the simplest
explanation of the low birth weight
199
paradox, which is there are two things
that will cause a baby to be low birth
200
weight, either the mother smoked or
there's something else going on like a
201
birth defect.
202
The maternal smoking is relatively benign.
203
It's not good for you, but it's not quite
as bad as the other effects.
204
So you could imagine being a doctor.
205
You've been called in to treat a patient.
206
The baby is born at a low birth weight.
207
And now you're worried.
208
You're saying to yourself, oh, this might
be a birth defect.
209
And then you find out that the mother
smoked.
210
You would be relieved.
211
because that explains the low birth weight
and it decreases the probability that
212
there's something else worse going on.
213
So that's the effect.
214
And again, it's caused because when they
selected the sample, they selected low
215
birth weight babies.
216
So in that sense, they selected on a
collider.
217
And that's where everything goes wrong.
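To make the mechanism Allen describes concrete, here is a minimal simulation; the smoking, birth defect, and mortality rates below are entirely made up for illustration, not taken from the study. Selecting only the low birth weight babies conditions on the collider and makes smoking look protective, even though it is harmful in the full population.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Hypothetical probabilities, chosen only to illustrate the structure of the paradox.
smoke  = rng.random(n) < 0.4                      # maternal smoking
defect = rng.random(n) < 0.02                     # some other, more serious cause

# Low birth weight is a collider: both smoking and the defect cause it.
p_lbw = np.where(defect, 0.5, np.where(smoke, 0.15, 0.03))
lbw   = rng.random(n) < p_lbw

# Mortality: the defect is much worse than smoking.
p_death = np.where(defect, 0.20, np.where(smoke, 0.02, 0.01))
death   = rng.random(n) < p_death

def rate(mask):
    return death[mask].mean()

print("Whole population:")
print("  smokers    ", rate(smoke))         # higher mortality
print("  non-smokers", rate(~smoke))        # lower mortality

print("Low birth weight babies only (conditioning on the collider):")
print("  smokers    ", rate(lbw & smoke))   # now *lower* mortality
print("  non-smokers", rate(lbw & ~smoke))  # now higher mortality
```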
Yeah.
219
And it's like, I find that really
interesting and fascinating because in a
220
way,
221
it comes down to a bias in the sample in a
way here.
222
But also, here, in a way, you don't really have any way of doing the analysis without going back to the data collection step.
224
But also, colliders are very tricky in the
sense that if you so you have that path,
225
as you were saying.
226
So the collider is a common effect of two
causes.
227
And the two causes can be completely
unrelated.
228
As is often said, if you control for the
collider, then it's going to open the path
229
and it's going to allow information to
flow from, let's say, X to Y and C is the
230
collider.
231
X is not related to Y in the causal graph.
232
But if you control for C, then X is going
to become related to Y.
233
That's really the tricky thing.
234
That's why we're telling people: do not just throw predictors at random in your models when you're doing a linear regression, for instance.
237
Because if there is a collider in your
graph, and very probably there is one at
238
some point, if it's a complicated enough
situation, then you're going to have
239
spurious statistical correlations which
are not causal.
240
But you've created that by basically
opening the collider path.
241
So the good news is that the path is closed, if you like, naturally. So if you don't control for that, if you don't add that in your model, you're good.
244
But if you start adding just predictors
all over the place, you're very probably
245
going to create collider biases like that.
246
So that's why it's not as easy when you have a confounder, which is kind of the opposite situation. So let's say now C is the common cause of X and Y. Well, then if you have a confounder, you want to block the path, the path that's going from X to Y through C, to see if there is a direct path from X to Y.
252
Then you want to control for C, but if
it's a collider, you don't.
253
So that's why, like, don't control for
everything.
254
Don't put predictors all over the place
because that can be very tricky.
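And a minimal sketch of that regression point, with simulated data (all numbers are arbitrary): X and Y below are generated as completely independent, but once you condition on their common effect C, the collider, a spurious association appears.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X and Y are completely unrelated causes; C is their common effect (a collider).
x = rng.normal(size=n)
y = rng.normal(size=n)
c = x + y + rng.normal(scale=0.5, size=n)

print("corr(X, Y) overall:        ", np.corrcoef(x, y)[0, 1])            # ~ 0

# "Controlling" for the collider, e.g. by selecting cases with similar C,
# opens the path and creates a spurious negative association.
sel = np.abs(c) < 0.5
print("corr(X, Y) given C near 0: ", np.corrcoef(x[sel], y[sel])[0, 1])  # clearly negative
```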
Yeah, and I think that's a really valuable
insight because when people start playing
256
with regression,
257
Sure, they just, you know, you add more to
the model, more is better.
258
And yes, once you think about colliders
and mediators, and I think this vocabulary
259
is super helpful for thinking about these
problems, you know, understanding what
260
should and shouldn't be in your model if
what you're trying to do is causal.
261
Yeah.
262
And that's also definitely something I...
263
can see a lot.
264
It depends on where the students are
coming from.
265
But yeah, where it's like they show me a
regression with, I don't know, 10
266
predictors already.
267
And then I can't.
268
I swear, the model doesn't really make sense.
269
I'm like, wait, did you try with less
predictors?
270
Like, you just do first the model with
just an intercept and then build up from
271
that?
272
And no, often it turns out it's the first
version of the model with 10 predictors.
273
So you're like, oh, wait.
274
Look at that again from another
perspective, from a more minimalist
275
perspective.
276
But that's awesome.
277
I really love that you're talking about
that in the book.
278
I recommend people then looking at it
because it's not only very interesting,
279
it's also very important if you're looking
into, well, are my models telling me
280
something valuable?
281
Are they?
282
helping me understand what's going on or
is it just something that helps me predict
283
better?
284
But other than that, I cannot say a lot.
285
So definitely listeners refer to that.
286
And actually, the editor was really kind to me and Allen because, well, first, 10 of the patrons are going to get the book for free at random. So thank you so much. And with the link that you have in the show notes, you can buy the book at a 30% discount.
290
So, even if you don't win, you will win.
291
So, definitely go there and buy the book,
or if you're a patron, enter the random
292
draw, and we'll see what randomness has in
stock for you.
293
And actually, so we already started diving
in one of your chapters, but
294
Maybe let's take a step back and can you
provide an overview of your new book
295
that's called Probably Overthinking It and
what inspired you to write it?
296
Yeah, well, Probably Overthinking It is
the name of my blog from more than 10
297
years ago.
298
And so one of the things that got this
project started was kind of a greatest
299
hits from the blog.
300
There were a number of articles that
had...
301
either got a lot of attention or where I
thought there was something really
302
important there that I wanted to collect
and present a little bit more completely
303
and more carefully in a book.
304
So that's what started it.
305
And it was partly like a collection of
puzzles, a collection of paradoxes, the
306
strange things that we see in data.
307
So like Collider Bias, which is Berkson's
paradox is the other name for that.
308
There's Simpson's paradox.
309
There's one paradox after another.
310
And that's when I started, I thought that
was what the book was going to be about.
311
It was, here are all these interesting
puzzles.
312
Let's think about them.
313
But then what I found in every chapter was
that there was at least one example that
314
bubbled up where these paradoxes were
having real effects in the world.
315
People were getting things genuinely
wrong.
316
And.
317
those errors had consequences for public
health, for criminal justice, for all
318
kinds of real things that affect real
lives.
319
And that's where the book kind of took a
turn toward not so much the paradox
320
because it's fun to think about, although
it is, but the places where we use data to
321
make better decisions and get better
outcomes.
322
And then a little bit of the warnings
about what can go wrong when we make some
323
of these errors.
324
And most of them boil down, when you think
about it, to one form of sampling bias or
325
another.
326
That should be the subtitle of this book
is like 12 chapters of sampling bias.
327
Yeah, I mean, that's really interesting to
see that a lot of problems come from
328
sampling biases, which is almost
disappointing in the sense that it sounds
329
really simple.
330
But I mean, as we can see in your book,
it's maybe easy to understand the problem,
331
but then solving it is not necessarily
easy.
332
So that's one thing.
333
And then I'm wondering.
334
How would you say Probably Overthinking It helps the readers differentiate between the right and wrong ways of looking at statistical data?
336
Yeah, I think there are really two
messages in this book.
337
One of them is the optimistic view that we
can use data to answer questions and
338
settle debates and make better decisions.
339
and we will be better off if we do.
340
And most of the time, it's not super hard.
341
If you can find or collect the right data,
most of the time you don't need fancy
342
statistics to answer the questions you
care about with the right data.
343
And usually, with a good data visualization, you can show what you want to show in a compelling way.
345
So that's the good news.
346
And then the bad news is these warnings.
347
I think the key to these things is to
think about them and to see a lot of
348
examples.
349
And I'll take like Simpson's paradox as an
example.
350
If you take an intro stats class, you
might see one or two examples.
351
And I think you come away thinking that
it's just weird, like, oh, those were
352
really confusing and I'm not sure I really
understand what's happening.
353
Whereas at some point, you start thinking about Simpson's paradox and you just realize that there's no paradox there.
355
It's just a thing that can happen because
why not?
356
If you have different groups and you plot
a line that connects the two groups, that
357
line might have one slope.
358
And then when you zoom in and look at one
of those groups in isolation and plot a
359
line through it, there's just no reason that second line within the group should have the same slope as the line that connects the different groups.
362
And so I think that's an example where
when you see a lot of examples, it changes
363
the way you think about the thing.
364
Not from, oh, this is a weird, confusing
thing to, well, actually, it's not a thing
365
at all.
366
The only thing that was confusing is that
my expectation was wrong.
367
Yeah, true.
368
Yeah, I love that.
369
I agree.
370
I've always found it a bit weird to call all these phenomena paradoxes, in a way.
371
Because as you're saying, it's more prior
expectation that makes it a paradox.
372
Whereas, why should nature obey our simple
minds and priors?
373
there is nothing that says it should.
374
And so most of the time, it's just that,
well, reality is not the way we thought it
375
was.
376
That's OK.
377
And I mean, in a way, thankfully,
otherwise, it would be quite boring.
378
But yeah, that's a bit like when data is
dispersed a lot, there is a lot of
379
variability in the data.
380
And then we tend to say data is over
dispersed.
381
which I always find weird.
382
It's like, well, it's not the data that's
over dispersed.
383
It's the model that's under dispersed.
384
The data doesn't have to do anything.
385
It's the model that has to adapt to the
data.
386
So just adapt the model.
387
But yeah, it's a fun way of phrasing it,
whereas it's like it's the data's fault.
388
But no, not really.
389
It's just, well, it's just a lot of
variation.
390
And.
391
And that made me think actually the
Simpson paradox that also made me think
392
about, did you see that recent paper by, I
mean from this year, so it's quite recent
393
for a paper, from Andrew Gelman, Jessica Hullman, and Lauren Kennedy, about the causal quartets?
395
No, I missed it.
396
Awesome, well, I'll send it your way and I'll put it in the show notes.
397
But basically the idea is,
398
taking Simpson's paradox, but instead of
looking at it from a correlation
399
perspective, looking at it from a causal
perspective.
400
And so that's basically the same thing.
401
It's different ways to get the same
average treatment effect.
402
So, you know, like Simpson's paradox where
you have four different data points and
403
you get the same correlation between them,
well, here you have four different
404
causal structures that give you different
data points.
405
But if you just look at the average
treatment effect, you will think that it's
406
the same for the four, whereas it's not.
407
You know, so the point is also, well,
that's why you should not only look at the
408
average treatment effect, right?
409
Look at the whole distribution of
treatment effects, because if you just
410
look at the average, you might be in a
situation where the population is really
411
not diverse, and then, yeah, the average treatment effect is something representative.
413
But what if you're in a very dispersed
population and the treatment effects can
414
be very negative or very positive, but
then if you look at the averages, it looks
415
like there is no average treatment effect.
416
So then you could conclude that there is
no treatment effect, whereas there is
417
actually a big treatment effect just that
when you look at the average, it cancels
418
out.
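A tiny illustration of that point, with made-up individual treatment effects: everyone is strongly affected, yet the average treatment effect is zero.

```python
import numpy as np

# Hypothetical individual treatment effects: half the population benefits a lot,
# the other half is harmed just as much.
effects = np.concatenate([np.full(500, +2.0), np.full(500, -2.0)])

print("average treatment effect:", effects.mean())                  # 0.0, looks like "no effect"
print("share strongly affected: ", np.mean(np.abs(effects) >= 2))   # 1.0, everyone
```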
So yeah, that's the main idea of the paper.
421
And that's, I mean, I think this will be
completely trivial to you, but I think
422
it's a good way of teaching this, where
you can, if you just look at the average,
423
you can get beaten by that later on.
424
Because basically, if you average, you summarize. And if you summarize, you're losing some information somewhere. You have to cut some dimension of information to average, naturally.
428
So if you do that, it comes at a cost.
429
And the paper does a good job at showing
that.
430
Yes, that's really interesting because
maybe coincidentally, this is something
431
that I was thinking about recently,
looking at the evidence for pharmaceutical
432
treatments for depression.
433
There was a meta-analysis a few months
ago.
434
that really showed quite modest treatment
effects, that the average is not great.
435
And the conclusion that the paper drew was
that the medications were effective for
436
some people and they said something like
15%, which is also not great, but
437
effective for 15% and ineffective or
minimally effective for others.
438
And I was actually surprised by that
result because it was not clear to me how
439
they were distinguishing between having a
very modest effect for everybody or a
440
large effect for a minority that was
averaged in with a zero effect for
441
everybody else, or even the example that
you mentioned, which is that you could
442
have something that's highly effective for
one group and detrimental for another
443
group.
444
And exactly as you said, if you're only
looking at the mean, you can't tell the
445
difference.
446
But what I don't know and I still want to
find out is in this study, how did they
447
draw the conclusion that they drew, which
is they specified that it's effective for
448
15% and not for others.
449
So yeah, I'll definitely read that paper
and see if I can connect it with that
450
research I was looking at.
451
Yeah.
452
Yeah, I'll send it to you and I already
put it in the show notes for people who
453
want to dig deeper.
454
And I mean, that's a very common pitfall,
especially in the social sciences, where
455
doing big experiments with lots of
subjects is hard and very costly.
456
And so often you're doing inferences on
very small groups.
457
And that's even more complicated to just
look at the average treatment effect.
458
It can be very problematic.
459
And interestingly, I talked about that.
460
I mentioned that paper first in episode 89
with Eric Trexler, who works on the
461
science of nutrition and exercise,
basically.
462
So in this field, especially, it's very
hard to have big samples when they do
463
experiments.
464
And so most of the time, they have 10, 20
people per group.
465
And each time I read that literature, first, they don't use Bayesian stats a lot. And I'm like, with such low sample sizes, yeah, you should use them more; use brms, use Bambi if you don't really know how to do the models, but really, you should.
469
And also, if you do that and then you also only look at the average treatment effects, I'm guessing you have big uncertainties on the conclusions you can draw.
473
So yeah, I will put that episode also in
the show notes for people who when I
474
referred to it, that was a very
interesting episode where we talked about
475
exercise science, nutrition, how that
relates to weight management and how from
476
an anthropological perspective, also how the body reacts. It mostly will fight you when you're trying
to lose a lot of weight, but doesn't
478
really fight you when you gain a lot of
weight.
479
And that's also very interesting to know
about these things, especially with the
480
rampant amount of obesity in the Western
societies where it's really concerning.
481
And so these signs helps understand what's
going on and how also we can help.
482
people getting into more trajectories that
are better for their health, which is the
483
main point basically of that research.
484
I'm also wondering, if your book, when you
wrote it, and especially now that you've
485
written it, what would you say, what do
you see as the key takeaways for readers?
486
And especially for readers who may not
have a strong background in statistics.
487
Part of it is I hope that it's empowering
in the sense that people will feel like
488
they can use data to answer questions.
489
As I said before, it often doesn't require
fancy statistics.
490
So...
491
There are two parts of this, I think.
492
And one part is as a consumer of data, you
don't have to be powerless.
493
You can read data journalism and
understand the analysis that they did,
494
interpret the figures and maintain an
appropriate level of skepticism.
495
In my classes, I sometimes talk about
this, a skeptometer, where if you believe
496
everything that you read,
497
That is clearly a problem.
498
But at the other extreme, I often
encounter students who have become so
499
skeptical of everything that they read
that they just won't accept an answer to a
500
question ever.
501
Because there's always something wrong
with a study.
502
You can always look at a statistical
argument and find a potential flaw.
503
But that's not enough to just dismiss
everything that you read.
504
If you think you have found a potential
flaw, there's still a lot of work to do to
505
show that actually that flaw is big enough
to affect the outcome substantially.
506
So I think one of my hopes is that people
will come away with a well-calibrated
507
skeptometer, which is to look at things
carefully and think about the kinds of
508
errors that there can be, but also take
the win.
509
If we have the data and we come up with a
satisfactory answer, you can accept that
510
question as provisionally answered.
511
Of course, it's always possible that
something will come along later and show
512
that we got it wrong, but provisionally,
we can use that answer to make good
513
decisions.
514
And by and large, we are better off.
515
This is my argument for evidence and
reason.
516
But by and large, if we make decisions
that are based on evidence and reason, we
517
are better off than if we don't.
518
Yeah, yeah.
519
I mean, of course I agree with that.
520
It's like preaching to the choir.
521
It shouldn't be controversial.
522
No, yeah, for sure.
523
A difficulty I have, though, is how do you explain to people why they should care?
524
You know?
525
Why do you think...
526
we should care about even making decisions
based on data.
527
Why is that even important?
528
Because that's just more work.
529
So why should people care?
530
Well, that's where, as I said, in every
chapter, something bubbled up where I was
531
a little bit surprised and said, this
thing that I thought was just kind of an
532
academic puzzle actually matters.
533
People are getting it wrong.
534
because of this.
535
And there are examples in the book,
several from public health, several from
536
criminal justice, where we don't have a
choice about making decisions.
537
We're making decisions all the time.
538
The only choice is whether they're
informed or not.
539
And so one of the examples, actually, Simpson's paradox, is a nice example.
540
Let me see if I remember this.
541
It came from a journalist, and I
deliberately don't name him in the book
542
because I just don't want to give him any
publicity at all.
543
but the Atlantic magazine named him the
pandemic's wrongest man because he made a
544
career out of committing statistical
errors and misleading people.
545
And he actually features in two chapters
because he commits the base rate fallacy
546
in one and then gets fooled by Simpson's
paradox in another.
547
And if I remember right, in the Simpson's paradox example, he looked at people who
548
were vaccinated and compared them to
people who were not vaccinated and found
549
that during a particular period of time in
the UK, the death rate was higher for
550
people who were vaccinated.
551
The death rate was lower for people who
had not been vaccinated.
552
So on the face of it, okay, well, that's
surprising.
553
Okay, that's something we need to explain.
554
It turns out to be an example of Simpson's
paradox, which is the group that he was
555
looking at was a very wide age range from
I think 15 to 89 or something like that.
556
And at that point in time during the
pandemic, by and large, the older people
557
had been vaccinated and younger people had
not, because that was the priority
558
ordering when the vaccines came out.
559
So in the group that he compared, the ones
who were vaccinated were substantially
560
older than the ones who were unvaccinated.
561
And the death rates, of course, were much
higher in older age groups.
562
So that explained it. When you lumped the whole range of ages together into one group, you saw one effect.
564
And if you broke it up into small age
ranges, that effect reversed itself.
565
So it was a Simpson's paradox.
566
If you appropriately break people up by
age, you would find that in every single
567
age group, death rates were lower among
the vaccinated, just as you would expect
568
if the vaccine was safe and effective.
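A minimal numerical sketch of that reversal, with entirely made-up counts (not the actual UK data): within each age band the vaccinated do better, but because the vaccinated group is much older, the pooled rates point the other way.

```python
# Made-up numbers, only to show how the reversal can happen.
groups = {
    #           (vaccinated: n, deaths)   (unvaccinated: n, deaths)
    "under 50": ((10_000, 5),             (90_000, 90)),
    "over 50":  ((90_000, 900),           (10_000, 200)),
}

tot = {"vax": [0, 0], "unvax": [0, 0]}
for age, ((nv, dv), (nu, du)) in groups.items():
    print(f"{age}: vaccinated {dv/nv:.2%}  unvaccinated {du/nu:.2%}")  # vaccinated lower in each band
    tot["vax"][0] += nv;   tot["vax"][1] += dv
    tot["unvax"][0] += nu; tot["unvax"][1] += du

# Pooled across ages, the comparison reverses.
print(f"pooled: vaccinated {tot['vax'][1]/tot['vax'][0]:.2%}  "
      f"unvaccinated {tot['unvax'][1]/tot['unvax'][0]:.2%}")
```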
And that's also where I feel like if you
start thinking about the causal graph, you
570
know, and the causal structure, that's
also where that would definitely help.
571
Because it's not that hard, right?
572
The idea here is not hard.
573
It's not even hard mathematically.
574
I think anybody can understand it even if
they don't have a mathematical background.
575
So yeah, it's mainly that.
576
And I think the most important point is that, yeah, it matters because it affects decisions in the real world.
578
That thing has literally life and death
consequences.
579
I'm glad you mentioned it because you do
discuss the base rate fallacy and its
580
connection to Bayesian thinking in the
book, right?
581
It starts with the example that everybody
uses, which is interpreting the results of
582
a medical test.
583
Because that's a case that's surprising
when you first hear about it and where
584
Bayesian thinking clarifies the picture
completely.
585
Once you get your head around it, it is like these other examples: it not only gets explained, it stops being surprising.
587
And I'll give the example. I'm sure this is familiar to a lot of your listeners, but
589
if you take a medical test, let's take a
COVID test as an example, and suppose that
590
the test is accurate, 90% accurate, and
let's suppose that means both specificity
591
and sensitivity.
592
So if you have the condition, there's a
90% chance that you correctly get a
593
positive test.
594
If you don't have the condition, there's a
90% chance that you correctly get a
595
negative test.
596
And so now the question is, you take the
test, it comes back positive, what's the
597
probability that you have the condition?
598
And that's where people kind of jump onto
that accuracy statistic.
599
And they think, well, the test is 90%
accurate, so there's a 90% chance that I
600
have, let's say, COVID in this example.
601
And that can be totally wrong, depending on the base rate or, in Bayesian terms, depending on the prior.
603
And here's where the Bayesian thinking
comes out, which is that different people
604
are going to have very different priors in
this case.
605
If you know that you were exposed to somebody with COVID, and three days later you feel a scratchy throat.
607
The next day you wake up with flu
symptoms.
608
Before you even take a test, I'm going to
say there's at least a 50% chance that you
609
have COVID, maybe higher.
610
Could be a cold.
611
So, you know, it's not 100%.
612
So let's say it's 50-50.
613
You take this COVID test.
614
And let's say, again, 90% accuracy, which
is lower than the home test.
615
So I'm being a little bit unfair here.
616
But let's say 90%.
617
Your prior was 50-50.
618
The likelihood ratio is about 9 to 1.
619
And so your posterior belief is about 9 to
1, which is roughly 90%.
620
So it's quite likely that test is correct and that you, in this example, have COVID.
622
But the flip side is, let's say you're in
New Zealand, which has a very low rate of
623
COVID infection.
624
You haven't been exposed.
625
You've been working from home for a week,
and you have no symptoms at all.
626
You feel totally fine.
627
What's your base rate there?
628
What's the probability that you
miraculously have COVID?
629
1 in 1,000 at most, probably lower.
630
And so if you.
631
took a test and it came back positive,
it's still probably only about one in a
632
hundred that you actually have COVID and a
99% chance that that's a false positive.
633
So that's, you know, as I said, that's the
usual example.
634
It's probably familiar, but it's a case
where if you neglect the prior, if you
635
neglect the base rate, you can be not just
a little bit wrong, but wrong by orders of
636
magnitude.
637
Yeah, exactly.
638
And it is a classical example for us in
the stats world, but I think it's very
639
effective for non-stats people because
that also talks to them.
640
And it's also the gut reaction to a
positive test is so geared towards
641
thinking you do have the disease that I
think that that's also why
642
It's a good one.
643
Another paradox you're talking about in
the book is the Overton paradox.
644
Could you share some insights into this
one?
645
I don't think I know that one and how
Bayesian analysis plays a role in
646
understanding it, if any.
647
Sure.
648
Well, you may not have heard of the
Overton paradox, and that's because I made
649
the name up.
650
We'll see, I don't know if it will stick.
651
One of the things I'm a little bit afraid
of is it's possible that this is something
652
that has been studied and is well known
and I just haven't found it in the
653
literature.
654
I've done my best and I've asked a number
of people, but I think it's a thing that
655
has not been given a name.
656
So maybe I've given it a name, but we'll
find out.
657
But that's not important.
658
The important part is I think it answers
an interesting question.
659
And this is
660
If you compare older people and younger
people in terms of their political
661
beliefs, you will find in general that
older people are more conservative.
662
So younger people, more liberal, older
people are more conservative.
663
And if you follow people over time and you
ask them, are you liberal or conservative,
664
it crosses over.
665
When people are roughly 25 years old, they
are more likely to say liberal.
666
By the time they're 35 or 40, they are
more likely to say conservative.
667
So we have two patterns here.
668
We have older people actually hold more
conservative beliefs.
669
And as people get older, they are more
likely to say that they are conservative.
670
Nevertheless, if you follow people over
time, their beliefs become more liberal.
671
So that's the paradox.
672
By and large, people don't change their
beliefs a lot over the course of their
673
lives.
674
Excuse me.
675
But when they do, they become a little bit
more liberal.
676
But nevertheless, they are more likely to
say that they are conservative.
677
So that's the paradox.
678
And let me put it to you.
679
Do you know why?
680
I've heard about the two in isolation, but
I don't think I've heard them linked that
681
way.
682
And no, for now, I don't have an intuitive
explanation to that.
683
So I'm very curious.
684
So here's my theory, and it is partly that
conservative and liberal are relative
685
terms.
686
I am to the right of where I perceive the
center of mass to be.
687
And the center of mass is moving over
time.
688
And that's the key, primarily because of
generational replacement.
689
So as older people die and they are
replaced by younger people, the mean
690
shifts toward liberal pretty consistently
over time.
691
And it happens in all three groups among
people who identify themselves as
692
conservative, liberal, or moderate.
693
All three of those lines are moving almost
in parallel toward more liberal beliefs.
694
And what that means is if you took a time
machine to 1970 and you collected the
695
average liberal and you put them in a time
machine and you bring them to the year
696
2000.
697
they would be indistinguishable from a
moderate in the year 2000.
698
And if you bring them all the way to the
present, they would be indistinguishable
699
from a current conservative, which is a
strange thing to realize.
700
If you have this mental image of people in
tie dye with peace medallions from the
701
seventies being transported into the
present, they would be relatively
702
conservative compared to current views.
703
And that time traveler example is almost exactly what happens to people over the course of their lives.
705
That in their youth, they hold views that
are left of center.
706
And their views change slowly over time,
but the center moves faster.
707
And that's, I call it chasing the Overton
window.
708
The Overton window, I should explain where that term comes from: in political science, it is the set of ideas that are politically acceptable at any point in time.
712
And it shifts over time: something that might have been radical in the 1970s might be mainstream now.
714
And there are a number of views from the
seventies that were pretty mainstream.
715
Like a large fraction.
716
I don't think it was a majority, but I
forget the number.
717
It might, might've been 30% of people in
the 1970s thought that mixed race
718
marriages should be illegal.
719
Yeah.
720
That wasn't the majority view, but it was
mainstream.
721
And now that's pretty out there.
722
That's a pretty small minority still hold
that view and it's considered extreme.
723
Yeah, and it changed quite, quite fast.
724
Yes.
725
Also, like, the acceptability of same sex
marriage really changed very fast.
726
If you look in it, you know,
727
time series perspective.
728
That's also a very interesting thing that
these opinions can change very fast.
729
So yeah, okay.
730
I understand.
731
It's kind of like how you define liberal
and conservative in a way explains that
732
paradox.
733
Very interesting.
734
This is a little speculative, but that's
something that might have accelerated
735
since the 1990s.
736
that in many of the trends that I saw
between 1970 and 1990, they were
737
relatively slow and they were being driven
by generational replacement.
738
By and large, people were not changing
their minds.
739
It's just that people would die and be
replaced.
740
There's a line from the sciences that says
that the sciences progress one funeral at
741
a time.
742
Just a little morbid.
743
But that is in some sense the baseline rate of societal change, and it's relatively slow.
745
It's about 1% a year.
746
Yeah.
747
Starting in the 1990s, and particularly, you mentioned support for
748
same sex marriage, also just general
acceptance of homosexuality changed
749
radically.
750
In 1990, about 75% of the US population would have said that
751
homosexuality was wrong.
752
That was one of the questions in the
general social survey.
753
Do you think it's wrong?
754
75%?
755
That's, I think, below 30 now.
757
So between 1990 and now, let's say roughly
40 years, it changed by about 40
758
percentage points.
759
So that's about the speed of light in
terms of societal change.
760
And one of the things that I did in the
book was to try to break that down into
761
how much of that is generational
replacement and how much of that is people
762
actually changing their minds.
763
And that was an example where I think 80%
of the change was changed minds.
764
not just one funeral at a time.
765
So that's something that might be
different now.
766
And one obvious culprit is the internet.
767
So we'll see.
768
Yeah.
769
And another proof that the internet is
neither good nor bad, right?
770
It's just a tool, and it depends on what
we're doing with it.
771
The internet is helping us right now
having that conversation and me having
772
that podcast for four years.
773
Otherwise, that would have been virtually impossible.
775
So yeah, really depends on what you're
doing with it.
776
And another topic, I mean, I don't remember it being in the book, but I think you mentioned it in one of your blog posts, is the idea of a Bayesian killer app.
778
So I have to ask you about that.
779
Why is it important in the context of
decision making and statistics?
780
I think it's a perpetual question, which is, you know, if Bayesian methods are so great, why are they not taking off? Why isn't everybody using them?
783
And I think one of the problems is that
when people do the comparison of Bayesian
784
and frequentism, and they trot out the usual debates, they often show an
785
example where you do the frequentist
analysis and you get a point estimate.
786
And then you do the Bayesian analysis and
you generate a point estimate.
787
And sometimes it's the same or roughly the
same.
788
And so people sort of shrug and say, well,
you know, what's the big deal?
789
The problem there is that when you do the
Bayesian analysis, the result is a
790
posterior distribution that contains all
of the information that you have about
791
whatever it was that you were trying to
estimate.
792
And if you boil it down to a point
estimate, you've discarded all the useful
793
information.
794
So.
795
If all you do is compare point estimates,
you're really missing the point.
796
And that's where I was thinking about what
is the killer app that really shows the
797
difference between Bayesian methods and
the alternatives.
798
And my favorite example is the Bayesian
bandit strategy or Thompson sampling,
799
which is an application to anything that's
like A-B testing or running a medical test
800
where you're comparing two different
treatments.
801
you are always making a decision about
which thing to try next, A or B or one
802
treatment or the other, and then when you
see the result you're updating your
803
beliefs.
804
So you're constantly collecting data and
using that data to make decisions.
805
And that's where I think the Bayesian
methods show what they're really good for,
806
because if you are making decisions, those decisions use the whole posterior distribution, because
most of the time you're doing some kind of
808
optimization.
809
You are integrating over the posterior or
in discrete world, you're just looping
810
over the posterior and for every possible
outcome, figuring out the cost or the
811
benefit and weighting it by its posterior
probability.
812
That's where you get the real benefit.
813
And so, Thompson sampling is an end-to-end application where people
understand the problem and where the
815
solution is a remarkably elegant and
simple one.
816
And you can point to the outcome and say,
this is an optimal balance of exploitation
817
and exploration.
818
You are always making the best decision
based on the information that you have at
819
that point in time.
820
Yeah.
821
Yeah, I see what you're saying.
822
And I...
823
In a way, it's a bit of a shame that it's
the simplest application because it's not
824
that simple.
825
But yeah, I agree with that example.
826
And for people, I put the blog post where you talk about that Bayesian killer app in the show notes because, yeah, it's not super easy;
828
I think it's way better in a written
format, or at least a video.
829
But yeah, definitely these kind of
situations in a way where you have lots of
830
uncertainty and you really care about
updating your belief as accurately as
831
possible, which happens a lot.
832
But yeah, in this case also, I think it's
extremely valuable.
833
But I think it can be.
834
Because first of all, I think if you do it
using conjugate priors, then the update
835
step is trivial.
836
You're just updating beta distributions.
837
And every time a new data comes in, a new
datum, you're just adding one to one of
838
your parameters.
839
So the computational work is the increment
operator, which is not too bad.
840
But I've also done a version of Thompson
sampling as a dice game.
841
I want to take this opportunity to point
people to it.
842
I gave you the link, so I hope it'll be in
the notes.
843
But the game is called The Shakes.
844
And I've got it up on a GitHub repository.
845
But you can do Thompson sampling just by
rolling dice.
846
Yeah.
847
So we'll definitely put that in the show
notes.
848
And also to come back to something you
said just a bit earlier.
849
For sure.
850
Then also something that puzzles me is
when people have a really good Bayesian model, it's awesome.
852
It's a good representation of the
underlying data generating process.
853
It's complex enough, but not too much.
854
It samples well.
855
And then they do decision making based on
the mean of the posterior estimates.
856
And I'm like, no, that's a shame.
857
Why are you doing that? Pass the whole distribution to your optimizer so that you can make
decisions based on the full uncertainty of
859
the model and not just take the most
probable outcome.
860
Because first, maybe that's not really
what you care about.
861
And also, by definition, it's going to
sample your decision.
862
It's going to bias your decision.
863
So yeah, that always kind of breaks my
heart.
864
But you've worked so well to get that.
865
It's so hard to get those posterior
distributions.
866
And now you're just throwing everything away.
868
That's a shame.
869
Yeah.
870
Do Bayesian decision making, folks.
871
You're losing all that information.
872
And especially in any case where you've
got very nonlinear costs, nonlinear in the
873
size of the error, and especially if it's
asymmetric.
874
Thinking about almost anything that you
build, you always have a trade off between
875
under building and over building.
876
Over building is bad because it's
expensive.
877
And underbuilding is bad because it will
fail catastrophically.
878
So that's a case where you have very
nonlinear costs and very asymmetric.
879
If you have the whole distribution, you
can take into account what's the
880
probability of extreme catastrophic
effects, where the tail of that
881
distribution is really important to
potential outcomes.
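A rough sketch of that kind of decision analysis, with an entirely made-up posterior and cost function: because failure is catastrophic and asymmetric, the capacity that minimizes expected cost sits well above the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are posterior samples of the load a structure must withstand.
load_posterior = rng.lognormal(mean=2.0, sigma=0.4, size=10_000)

def expected_cost(capacity, samples):
    overbuild = 1.0 * capacity                    # cost grows with what you build
    fail_prob = np.mean(samples > capacity)       # tail probability from the posterior
    fail_cost = 1_000.0 * fail_prob               # catastrophic, asymmetric penalty
    return overbuild + fail_cost

candidates = np.linspace(5, 30, 200)
costs = [expected_cost(c, load_posterior) for c in candidates]
best = candidates[int(np.argmin(costs))]

print("posterior mean load:             ", load_posterior.mean())  # building to this fails often
print("capacity minimizing expected cost:", best)                  # well above the mean
```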
882
Yeah, definitely.
883
And.
884
Well, I mean, I could continue, but we're
getting short on time and I still have a
885
lot of things to ask you.
886
So let's move on.
887
And actually, I think you mentioned it a
bit at the beginning of your answer to my
888
last question.
889
But in another of your blog posts, you
addressed the claim that Bayesian and frequentist methods often yield the same results.
891
And so I know you like to talk about that.
892
So could you elaborate on this and why
you're saying it's a false claim?
893
Yeah, as I mentioned earlier, you
know, frequentist methods produce a point
894
estimate and a confidence interval.
895
And Bayesian methods produce a posterior
distribution.
896
So they are different kinds of things.
897
They cannot be the same.
898
And I think Bayesians sometimes say this
as a way of being conciliatory, that, you know, let's all get along.
900
And often, frequentist and Bayesian
methods are compatible.
901
So that's good.
902
The Bayesian methods aren't scary.
903
I think strategically that might be a
mistake, because you're conceding the
904
thing that makes Bayesian methods better.
905
It's the posterior distribution that is
useful for all the reasons that we just
906
said.
907
So it is never the same.
908
It is sometimes the case that if you take
the posterior distribution and you
909
summarize it,
910
with a point estimate or an interval, that
yes, sometimes those are the same as the
911
frequentist methods.
912
But the analogy that I use is, if you are
comparing a car and an airplane, but the
913
rule is that the airplane has to stay on
the ground, then you would come away and
914
you would think, wow, that airplane is a
complicated, expensive, inefficient way to
915
drive on the highway.
916
And you're right.
917
If you want to drive on the highway, an
airplane is a terrible idea.
918
The whole point of an airplane is that it
flies.
919
If you don't fly the plane, you are not
getting the benefit of an airplane.
920
That is a good point.
921
And same, if you are not using the
posterior distribution, you are not
922
getting the benefit of doing Bayesian
analysis.
Yeah, exactly. Don't drive your airplane on the highway.

Actually, a really good question: you do see, and I think I do, and I'm pretty sure you do too in your work, many practitioners who might be hesitant to adopt Bayesian methods because of some perceived complexity. So I wonder, in general, what resources or strategies you would recommend to those who want to learn and apply Bayesian techniques in their work.
Yeah, I think Bayesian methods get a reputation for complexity largely because of MCMC. If that's your first exposure, it's scary and complicated. Or if you do it mathematically and you start with big scary integrals, I think that also makes it seem more complex than it needs to be. I think there are a couple of alternatives. The one that I use in Think Bayes is that everything is discrete and everything is computational, so all of those integrals become for loops or just array operations. And I think that helps a lot. Those are grid algorithms. I think grid algorithms can get you a really long way with very little tooling, basically arrays. You lay out a grid, you compute a prior, you compute a likelihood, you do a multiplication, which is usually just an array multiplication, and you normalize, dividing through by the total. That's it. That's a Bayesian update. So I think that's one approach.
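As a concrete illustration of the grid recipe described here, this is a minimal sketch of my own (not code from Think Bayes; the 7-successes-in-10-trials data are invented): lay out a grid, compute a prior and a likelihood, multiply, and normalize.

```python
import numpy as np

# Grid of candidate values for the unknown proportion p.
p_grid = np.linspace(0, 1, 1001)

# Uniform prior over the grid.
prior = np.ones_like(p_grid)

# Likelihood of the observed data: say, 7 successes in 10 trials.
successes, failures = 7, 3
likelihood = p_grid**successes * (1 - p_grid)**failures

# The Bayesian update: multiply, then divide through by the total.
unnormalized = prior * likelihood
posterior = unnormalized / unnormalized.sum()

# The posterior is now just an array you can query directly.
posterior_mean = np.sum(p_grid * posterior)
prob_above_half = posterior[p_grid > 0.5].sum()
print(f"posterior mean:    {posterior_mean:.3f}")
print(f"P(p > 0.5 | data): {prob_above_half:.3f}")
```

With a uniform prior and 7 successes in 10 trials, the posterior mean comes out near 8/12 ≈ 0.667, matching the conjugate beta-binomial answer described next.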
The other one: I would consider an introductory stats class that does everything using Bayesian methods, using conjugate priors. And don't derive anything. Don't prove why the beta-binomial model works. Just take it as given that when you are estimating a proportion, you run a bunch of trials and you'll have some number of successes and some number of failures; let's call them A and B. You build a beta distribution that has the parameters A plus one and B plus one. That's it. That's your posterior. And now you can take that posterior beta distribution and answer all the questions. What's the mean? What's a confidence or credible interval? But more importantly, what are the tail probabilities? What's the probability that I exceed some critical value? Or, again, loop over that posterior and answer interesting questions with it. You could do all of that on the first day of a statistics class. And use a computer, because we can compute: scipy.stats.beta will tell you everything you want to know about a beta distribution. That's day one of a stats class, and that's estimating proportions. It's everything you need to do. And it handles all of the weird cases. If you want to estimate a very small probability, it's okay; you can still get a confidence interval. It's all perfectly well behaved. If you have an informative prior, sure, no problem. Just start with some pre-counts in your beta distribution.
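Here is roughly what that "day one" lesson might look like in code, assuming a uniform prior; the counts A and B and the 0.6 threshold are invented for the example, and scipy.stats.beta does all the work.

```python
from scipy.stats import beta

# Say we ran some trials and observed A successes and B failures.
A, B = 140, 110

# With a uniform prior, the posterior is Beta(A + 1, B + 1). That's it.
posterior = beta(A + 1, B + 1)

print("posterior mean:        ", posterior.mean())
print("94% credible interval: ", posterior.interval(0.94))
# Tail probability: the chance the true proportion exceeds a critical value.
print("P(p > 0.6):            ", posterior.sf(0.6))

# An informative prior is just pre-counts: e.g. a prior equivalent to having
# already seen 10 successes and 10 failures before collecting the data.
informed = beta(A + 10, B + 10)
print("mean with informative prior:", informed.mean())
```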
So day one, estimating proportions. Day two, estimating rates. You can do exactly the same thing with a Poisson-gamma model, and the update is just as trivial. And you can talk about Poisson distributions and exponential distributions and estimating rates. My favorite example is goal-scoring rates; I always use either soccer, football, or hockey. And you can generate predictions. You can say, what are the likely outcomes of the next game? What's the chance that I'm going to win, let's say, a best-of-seven series? The update is computationally nothing.

Yeah.

And you can answer all the interesting questions about rates. So that's day two.
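And "day two" might look something like this sketch (the teams, goal counts, and the Gamma(1, 1) prior are all invented): the conjugate gamma-Poisson update just adds goals and games to the prior parameters, and game and series predictions come from simulating the posterior predictive distribution.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)

def posterior_rate(goals, games, prior_alpha=1.0, prior_beta=1.0):
    """Gamma-Poisson update: alpha accumulates goals, beta accumulates games
    (starting from a weak Gamma(1, 1) prior on the scoring rate)."""
    return gamma(a=prior_alpha + goals, scale=1.0 / (prior_beta + games))

# Invented data: team A scored 25 goals in 10 games, team B scored 18 in 10.
team_a = posterior_rate(goals=25, games=10)
team_b = posterior_rate(goals=18, games=10)

# Posterior predictive simulation: draw a scoring rate per simulation,
# then draw scores for 7 games; break tied games with a coin flip.
n = 100_000
lam_a = team_a.rvs(n, random_state=rng)[:, None]
lam_b = team_b.rvs(n, random_state=rng)[:, None]
goals_a = rng.poisson(lam_a, size=(n, 7))
goals_b = rng.poisson(lam_b, size=(n, 7))
a_wins_game = np.where(goals_a == goals_b,
                       rng.integers(0, 2, size=(n, 7)),
                       goals_a > goals_b)

print("P(A wins a single game):   ", a_wins_game[:, 0].mean())
# Winning a best-of-seven is the same as winning at least 4 of 7 games.
print("P(A wins best-of-7 series):", (a_wins_game.sum(axis=1) >= 4).mean())
```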
I don't know what to do with the rest of the semester, because we've just done 90% of an intro stats class.

Yes. Yeah, that sounds like something that would work; at least that was my experience. Funny story: I used to not like stats, which is funny when you see what I'm doing today. But when I was in university, I did a lot of math, and the thing is, the stats we were doing was with pen and paper. So it was incredibly boring. It was always, you know, dice problems and very trivial stuff that you have to do that way because the human brain is not good at computing that kind of thing. That changed when I started having to use statistics to do electoral forecasting. I was like, but this is awesome. I can just simulate the distributions, I can see them on the screen, I can almost touch them. That was much more concrete, and also much more empowering, because I could work on topics that were not trivial stuff I would only use for board games. So I think it's a very powerful way of teaching, for sure.
So, to play us out, I'd like to zoom out a bit and ask you what you hope readers will take away from Probably Overthinking It, and how the insights from your book can be applied to improve decision making in various fields.

Yeah. Well, I think I'll come back to where we started, which is that it is about using data to answer questions and make better decisions. And my thesis, again, is that we are better off when we use evidence and reason than when we don't. So I hope it's empowering. I hope people come away from it thinking that you don't need graduate degrees in statistics to work with data, to interpret the results that you're seeing in research papers and in newspapers, and that it can be straightforward. And then occasionally there are some surprises that you need to know about.

Yeah, for sure. Personally, have you changed some of the ways you're making decisions based on your work for this book?
Maybe. I think a lot of the examples in the book come from me thinking about something in real life. There's one example where, when I was running a relay race, I noticed that everybody was either much slower than me or much faster than me. It seemed like there was nobody else in the race who was running at my speed. And that's the kind of thing where, when you're running and you're oxygen-deprived, it seems really confusing. And then with a little bit of reflection, you realize, well, there's some statistical bias there, which is: if someone is running the same speed as me, I'm unlikely to see them. But if they are much faster or much slower, then I'm going to overtake them or they're going to overtake me.
Yeah, exactly. And that makes me think about an absolutely awesome joke from, of course I don't remember the name of the comedian, but a very well-known US comedian that you may know. The joke was: have you ever noticed that everybody who drives slower than you on the road is a jackass, and everybody who drives faster than you is a moron? It's really the same idea, right? You have the right speed, you're doing the right thing, and everybody else is just either a moron or a jackass.

That's exactly right. I believe that is George Carlin.

Exactly, George Carlin, yeah. And, I mean, George Carlin is just absolutely incredible. But yeah, that's already a very keen observation of human nature too, I think. It's also an interesting joke in the sense that it relates to, you know, concepts of how minds change and how people think about reality, and so on. And I find it very interesting. So for people interested, I know we're short on time, so I'm just going to mention that there is an awesome book called How Minds Change, by David McRaney. I'll put that in the show notes. He talks about these kinds of topics, and it's especially interesting. And of course, Bayesian statistics are mentioned in the book, because if you're interested in optimal decision making, at some point you're going to talk about Bayesian stats. But he's a journalist; he didn't know anything about Bayesian stats originally. And then at some point, it just appears.

I will check that out.

Yeah, I'll put that in the show notes.
So before asking you the last two questions, Allen, I'm curious about your predictions, because we're all scientists here, and we're interested in predictions. In the realm of statistics education, are there any innovative approaches or technologies that you believe have the potential to transform how people learn and apply statistical concepts?

Well, I think the things we've been talking about, computation, simulation, and Bayesian methods, have the best chance to really change statistics education. I'm not sure how it will happen. It doesn't look like statistics departments are changing enough or fast enough. I think what's going to happen is that data science departments are going to be created, and I think that's where the innovation will be. But the question is what that will mean. When you create a data science department, is it going to be all machine learning and algorithms, or statistical thinking and basic use of data for decision making, as I'm advocating for? So obviously, I hope it's the latter. I hope data science becomes, in some sense, what statistics should have been, and starts doing a better job of using, as I said, computation, simulation, Bayesian thinking, and causal inference, which is probably the other big one.

Yeah, exactly. And they really go hand in hand, as we were saying at the very beginning of the show. Of course, I do hope that that's going to be the case.
You've already been very generous with your time, so let me ask you the last two questions I ask everyone at the end of the show. And you're in a very privileged position, because it's your second episode here, so you can give different answers from your previous ones. That's a luxury, because usually the difficulty of these questions is that you have to choose and you cannot answer everything. So you get a second round, Allen. First, if you had unlimited time and resources, which problem would you try to solve?

I think the problem of the 21st century is: how do we get to 2100 with a habitable planet and a good quality of life for everybody on it? And I think there is a path that gets us there. It's a little hard to believe when you focus on the problems that we currently see, but I'm optimistic. I really do think we can solve climate change, and I believe in the slow process of making things better. If you look at history over a long enough term, you will find that almost everything is getting better, in ways that are often invisible, because bad things happen quickly and visibly, and good things happen slowly and in the background. But my hope for the 21st century is that we will continue to make slow, gradual progress toward a good ending for everybody on the planet. So that's what I want to work on.
Yeah, I love the optimistic tone to close out the show. And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

I think I'm going to argue with the question. It's based on this idea of great scientific minds, which is a little bit related to the great person theory of history, the idea that big changes come from unique, special individuals. I'm not sure I buy it. The thing about science that is exciting to me is that it is a social enterprise. It is intrinsically collaborative. It is cumulative. Making large contributions, I think, very often comes down to being the right person in the right place at the right time. And often they deserve that recognition. But even then, I'm going to say it's the system, the social enterprise of science, that makes progress. So I want to have dinner with the social enterprise of science.
Well, you call me if you know how to do that. But yeah.

I mean, joking aside, I completely agree with you, and I think it's a very good reminder to say it right now, because we're recording very close to the time when the Nobel prizes are awarded. These prizes feed into the fame, making science basically another movie industry, or like other industries that are driven by fame and everything that comes with it. And I completely agree that this is an especially big problem in science, because scientists are often specialized in a very small part of their field. That happened a lot during COVID, when some scientists started talking about epidemiology even though it was not their specialty. To me, that's usually a red flag. But the problem is that if they are very well-known scientists who may end up having a Nobel Prize, well, then everybody listens to them, even though they probably shouldn't. When you rely too much on fame and popularity, that's a huge problem. Trying to make heroes is also a problem. It helps from a narrative perspective, to make people interested in science so that they start learning about it. But there is a limit where it also discourages people. Because, you know, if it's that hard, if you have to be that smart, if you have to be Einstein or Oppenheimer or Laplace or any of these big names, then you don't even want to start working on this. And that's a big problem, because, as you're saying, scientific progress is made of small incremental steps taken by a community that works together. There is competition, of course, but it really works together. And if you start implying that you have to be a once-in-a-century genius to do science, we're going to have problems, especially HR problems in the universities. So no, you don't need that. And you're also right that if you look at the previous work, even for Einstein, the idea of relativity was already there at the time. If you look at some writings from Poincaré, one of the great French mathematicians of the late 19th and early 20th centuries, just a few years before Einstein he is already talking about this idea of relativity, and you can even see the equations in one of his books prior to Einstein's publications. So often it's, as you were saying, an incredible person who is also there at the right time, in the right place, and who is immersed in the ideas of their time. That's also very important to highlight.
I completely agree with that. Yeah, in almost every case that you look at, if you ask the question, if this person had not done X, when would it have happened? Or who else might have done it? Almost every time, the ideas were there; they would have come together.

Yeah, maybe a bit later, or maybe even a bit earlier, we never know. But yeah, that's definitely the case.
And I think the best proxy for the dinner we'd want to have is to have a dinner with the LBS community. We should organize that, you know, like an LBS dinner where everybody can join. That would actually be very fun. Maybe one day I'll get to do that. One of my wildest dreams is to organize a live episode somewhere, where people could come join the show, with a live audience and so on. We'll see if I can do that one day. If you have ideas or opportunities, feel free to let me know, and I'll think about it.

Awesome. Allen, let's call it a show. I could really record with you for like three hours; I literally still have a lot of questions on my cheat sheet. But let's call it a show and let you get back to your main activities for the day. So thank you a lot, Allen. As I was saying, I put a lot of resources and a link to your website in the show notes for those who want to dig deeper. Thanks again, Allen, for taking the time and being on this show.

Thank you. It's been really great. It's always a pleasure to talk with you.

Yeah. Feel free to come back to the show and answer the last two questions for a third time.