Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag 😉
Takeaways:
- Bob’s research focuses on corruption and political economy.
- Measuring corruption is challenging due to the unobservable nature of the behavior.
- The challenge of studying corruption lies in obtaining honest data.
- Innovative survey techniques, like randomized response, can help gather sensitive data.
- Non-traditional backgrounds can enhance statistical research perspectives.
- Bayesian methods are particularly useful for estimating latent variables.
- Bayesian methods shine in situations with prior information.
- Expert surveys can help estimate uncertain outcomes effectively.
- Bob’s novel, ‘The Bayesian Hitman,’ explores academia through a fictional lens.
- Writing fiction can enhance academic writing skills and creativity.
- The importance of community in statistics is emphasized, especially in the Stan community.
- Real-time online surveys could revolutionize data collection in social science.
Chapters:
00:00 Introduction to Bayesian Statistics and Bob Kubinec
06:01 Bob’s Academic Journey and Research Focus
12:40 Measuring Corruption: Challenges and Methods
18:54 Transition from Government to Academia
26:41 The Influence of Non-Traditional Backgrounds in Statistics
34:51 Bayesian Methods in Political Science Research
42:08 Bayesian Methods in COVID Measurement
51:12 The Journey of Writing a Novel
01:00:24 The Intersection of Fiction and Academia
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström and Stefan.
Links from the show:
- Robert’s website (includes blog posts): https://www.robertkubinec.com/
- Robert on GitHub: https://github.com/saudiwin
- Robert on Linkedin: https://www.linkedin.com/in/robert-kubinec-9191a9a/
- Robert on Google Scholar: https://scholar.google.com/citations?user=bhOaXR4AAAAJ&hl=en
- Robert on Twitter: https://x.com/rmkubinec
- Robert on Bluesky: https://bsky.app/profile/rmkubinec.bsky.social
- The Bayesian Hitman: https://www.amazon.com/Bayesian-Hitman-Robert-M-Kubinec/dp/B0D6M4WNRZ/
- Ordbetareg overview: https://www.robertkubinec.com/ordbetareg
- Idealstan – this isn’t out yet, but you can access an older working paper here: https://osf.io/preprints/osf/8j2bt
- Ordinal Regression tutorial, Michael Betancourt: https://betanalpha.github.io/assets/case_studies/ordinal_regression.html
- Andrew Heiss blog: https://www.andrewheiss.com/blog/
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
Did you know there is a novel out there about...
2
...Bayesian statistics?
3
It even has a great title, The Bayesian Hitman, and an even greater author, Robert Kubinec.
4
When I heard about that, I, of course, had to invite Bob on the show.
5
An assistant professor at the University of South Carolina, Bob's research focuses on
wealth creation and democratization,
6
causal inference, and Bayesian statistics.
7
In this episode, Bob takes us through his fascinating journey from working in government
to pursuing a career in academia, exploring his current work on measuring corruption and
8
how Bayesian methods help in estimating latent variables.
9
This is Learning Bayesian Statistics, episode 119, recorded October 8, 2024.
10
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the
projects, and the people who make it possible.
11
I'm your host, Alex Andorra.
12
You can follow me on Twitter at alex_andorra,
13
like the country.
14
For any info about the show, learnbayesstats.com is Laplace to be.
15
Show notes, becoming a corporate sponsor, unlocking Bayesian Merch, supporting the show on
Patreon, everything is in there.
16
That's learnbayesstats.com.
17
If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
18
See you around, folks.
19
And best Bayesian wishes to you all.
20
And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
21
Check us out at pymc-labs.com.
22
Hello, my dear Bayesians!
23
Today I want to welcome two new patrons in the Learn Bayes Stats family.
24
Thank you so much to Rasmus Hindström and to the mysterious Stefan.
25
Your support truly makes this show possible.
26
I can't wait to talk with you in the Slack channel and I hope you will enjoy being there
and talking about everything Bayes.
27
Okay, on to the show now.
28
Bob Kubinec, welcome to Learning Bayesian Statistics.
29
Thank you.
30
Alex, it's so great to be on here.
31
Thanks so much for inviting me.
32
Yeah, that's awesome.
33
And I also really love how that episode came around because I discovered your book, I
mean, your novel, which is your first novel.
34
We'll talk about that during the show, which is called The Bayesian Hitman.
35
Of course, everybody should go and check it out.
36
It will be in the show notes.
37
And so I discovered that book when I was at StanCon a few weeks ago.
38
And I recorded a bunch of, like, two live episodes.
39
And then of course, afterwards you go for a drink.
40
And I was having a drink with, I think it was Francis D'Italia and Richard McElreath.
41
And we started.
42
Of course, we were talking about Bayes stuff, and I don't know why, at some point
Francis mentioned your book.
43
It was like, Richard, I remember.
44
And I was like, wait, there is a novel that's based around Bayesian statistics?
45
I need to have that person on the show.
46
So that's how it came to my knowledge.
47
And then Francis, I think, tagged you on Twitter and we talked a bit
48
on Twitter, and then here we are. So that's definitely one of the most random episodes that
I've ever done.
Yeah, I mean, that's fantastic. And hopefully Richard McElreath does
49
read my novel. I'd love to get his feedback on it. His textbook, like for many
people, was very influential for me in doing Bayesian statistics, and he's an excellent
50
writer. I don't know if he's written any fiction. The fun thing I will say about writing a
novel:
51
What's unusual about it isn't actually so much that I wrote a novel.
52
There actually are a fair number of novels written by academics.
53
It's more that I wrote it under my actual name.
54
There's some untold number of academics that are publishing under pen names.
55
Yeah, which is really fun.
56
So next time you see a book that's anonymous, it could be by an academic.
57
For me, I use my real name because the book actually is also about academia, even about
the work that I do.
58
And I wanted
59
I wanted people in the field to read it.
60
That was part of the fun of writing it.
61
That's why it's under my actual name, not some cool made up name.
62
Okay.
63
Well, that's already interesting.
64
I wasn't aware of that at all.
65
We'll definitely get back to that.
66
But that's my job here to do the teasing.
67
Stay tuned for more about why Bob wrote the book.
68
But first, let's talk about your
69
your origin story and all that because you have a very interesting one.
70
But first, as usual, can you tell us what you're doing nowadays?
71
Yeah.
72
So I'm assistant professor of political science at the University of South Carolina.
73
I just started here.
74
Before this, I was at New York University Abu Dhabi in Abu Dhabi, the United Arab Emirates
for five years.
75
I did my PhD at the University of Virginia.
76
and I did a one-year postdoc at Princeton.
77
My work here, so in political science, I'm what they call a political economist.
78
I study a lot of, what I tell people is you take kind of the worst parts about political
science and economics, then you put them together, and you've got political economy.
79
So it's all like analyzing things like corruption, money in politics, business influence
over politics.
80
And maybe on the brighter side, I do some work on economic development.
81
I have a big project right now studying entrepreneurship in developing countries and how
young people can kind of get more involved.
82
But yeah, I do a lot of this sort of dark money stuff.
83
The connection really to Bayesian statistics is that a lot of the stuff that I study is very
hard to measure.
84
And a lot of my work in Bayesian statistics deals with measurement and using
85
models, especially latent variable models, and very difficult measurement problems.
86
How do you find an estimate?
87
I'm just writing a reviewer response today about this.
88
How do you find an estimate of something that's hard to study that you can't directly
observe that kind of best incorporates everything you know?
89
And Bayesian frameworks are really, I think, best at that.
90
You know, roughly half of my research is sort of, let's say, statistical in nature, making new
models, especially in measurement modeling.
91
And then the other half is sort of empirical, like going out into the real world and
trying to discover new things about political economy.
92
That is super fun.
93
So you mean that people don't self-declare that they are corrupted or corrupting someone?
94
Yeah, I know.
95
Yeah, yeah, it is really strange.
96
And corruption is this really fun thing to study, precisely because, you know, how do you
get people to admit to it, right?
97
And there's this whole, you know, field of what they call sensitive survey questions,
which is, you know, if someone has done something wrong or illegal, how can you get them
98
to admit that on a survey? And in some sense, you
talk about things that are unobservable:
99
it's very hard to observe someone doing something completely illegal or unethical because,
sort of by definition, if it's illegal, they're not going to do it in front of other people
100
and they're not going to admit to it.
101
So you have this sort of big central problem of how do you
study social behavior that you can't observe, right?
102
And yeah it's a complex issue.
103
Yeah for sure but that sounds...
104
That sounds fascinating.
105
Maybe you can talk a bit about that, we'll go back to your origin story.
106
Since we're talking about that, how do you do that?
107
How do you make people admit that they did something wrong without having them admit it?
108
Do you use hypnosis?
109
What are you using, Bob?
110
There's two approaches.
111
One is to rely on...
112
sort of government data.
113
And this they do a lot in developed democracies.
114
So primarily the EU and the United States.
115
And that's because, you know, in these countries, they have enough regulatory capability
that they can force people to report.
116
So like in the US and the EU, there are like lobbying registries, right, where every time
a business deals with a politician, there's a record of it.
117
Also, even when there's not a record of things, there's so much data available.
118
For example, there's a lot of work on contracting.
119
people will get access to the records of all the government contracts that have been
issued in a certain country and then search through them and try to find companies that
120
have political connections to certain politicians.
121
And there, again, it's tricky because you can't necessarily prove that there was a corrupt
transaction or something like this.
122
But let's say you see that companies that
123
have someone on the board who was a former member of parliament get contracts
at a higher rate relative to other companies that don't. You know, that's sort of
124
suggestive. So that's kind of one way of doing it. And the other, and this is more what
I do, is trying to directly collect information about
125
corruption and these sorts of issues. And that's a lot about trying to assure people of some
kind of anonymity and confidentiality.
126
And I do a lot of online survey research which is better at confidentiality because if you
can fill out something on your phone or in your computer by yourself, I've done a lot of
127
getting employees to talk about what their companies do.
128
So they don't necessarily have to report that they themselves did a corrupt transaction.
129
But do you know if your business is working with some political parties or has offered
130
Do you know if your boss or CEO seems to be involved in some kind of transactions with
political parties?
131
Which happens all the time in developing countries.
132
Anyway, yeah.
133
So that's how I've been really getting at it.
134
Lately, I've been experimenting.
135
I kind of mentioned sensitive survey questions, and these are really fun.
136
They're basically ways of trying to encrypt a survey.
137
So encryption uses keys, right?
138
And so there's a method called randomized response, and I actually have a blog post about
this. I have a recent study that I did in Tunisia using this, where essentially you use a
139
key, and the key is the respondent's birthday. So you ask the respondent:
140
were you born in these three months of the year?
141
And then you ask them a question that combines that answer, which is random, right,
142
with the actual question. So you say,
143
you know, did you, let's say, give a bribe or
do this sensitive thing?
144
One answer is, like, you did that or you were born in these three months of the year.
145
And the other answer is, you know, yes, I did it or something like that.
146
What you do is you combine this natural randomness from this question that's
irrelevant, but random with the actual thing you're trying to measure.
147
And just like
148
encryption that you use on a computer or whatever, it's actually the same process.
149
Because I don't know the respondent's birthday, I don't know their true answer.
150
But when I take all that data, because I know the proportion of people in the population
who are born in those three months of the year, which is roughly uniform, you can then
151
back out what the population estimate is, the latent estimate.
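As a rough illustration of the idea Bob describes here, the following Python sketch simulates such a survey and backs out the population rate. It assumes a simplified design in which a respondent answers "yes" if they did the sensitive thing or were born in the chosen three-month window, that birthdays are independent of the behavior, and that about a quarter of people are born in the window. The function names, the 30% true rate, and the sample size are hypothetical and are not taken from Bob's Tunisia study.

```python
import random


def simulate_responses(n, true_rate, p_birth=0.25, seed=0):
    """Simulate masked yes/no answers under the assumed design."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        did_it = rng.random() < true_rate        # the sensitive behavior (never seen directly)
        born_in_window = rng.random() < p_birth  # the respondent's private "key"
        answers.append(did_it or born_in_window) # only this combined answer is recorded
    return answers


def estimate_prevalence(answers, p_birth=0.25):
    """Back out the population rate from the share of 'yes' answers.

    Under the assumptions above, P(yes) = rate + (1 - rate) * p_birth,
    so rate = (P(yes) - p_birth) / (1 - p_birth).
    """
    yes_share = sum(answers) / len(answers)
    estimate = (yes_share - p_birth) / (1 - p_birth)
    return min(1.0, max(0.0, estimate))  # keep the estimate inside [0, 1]


answers = simulate_responses(n=20000, true_rate=0.30)
estimate = estimate_prevalence(answers)
print(round(estimate, 3))
```

No individual answer reveals whether that person did the sensitive thing, yet the aggregate estimate lands close to the simulated rate. A Bayesian version would put a prior on the rate and treat the same mixture as the likelihood.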
152
So these are some really clever, very
153
counterintuitive methods.
154
And so I'm experimenting with some of these now and yeah, doing things like that.
155
But.
156
This is super fun.
157
Sounds a bit like also, you know, detective work.
158
Yeah.
159
That's really fun.
160
Yeah.
161
Anti-corruption research is fun.
162
It's extremely difficult and difficult to get data.
163
And often you're kind of guessing with like...
164
what you can gather, but I do enjoy it a lot.
165
Yeah.
166
That makes me think a bit of a project I worked on with some researchers from Dalhousie
University.
167
That sounds very different, but it made me think of it because they were trying to infer
the trade of shark meat across, you know, between countries,
168
but trying to do it at the species level, and countries don't have to report the
species
169
they trade. They have to report the species they fish, but not the species they
trade.
170
And the thing is, there are some species they cannot trade.
171
And so of course, they can report the species that they trade, the
species
172
for trade, but they don't always do it.
173
So you're like, okay, if they don't do it, does that mean they are trading some species that
they are not supposed to?
174
So actually the whole work was to try and infer from both the trade data and the landings
data, so the fishing data, which species are actually traded by which country to which
175
country.
176
And so that's a bit like this where you don't have.
177
The data is based on self-reports.
178
The reports are not very constraining, so you have to do all that detective work and
that's where all the Bayesian methods are very powerful.
179
Yeah, totally.
180
That's where I've gotten a lot of leverage from them in my own work.
181
I think too, because most Bayesian models have a frequentist analog.
182
For a lot of people, that's like, well, I don't see the difference.
183
When you're running a regression model, often there really isn't, right?
184
If it's like a maximum likelihood simple linear regression model.
185
But where the Bayesian methods really shine is when you're studying some latent quantity
and especially when you have prior information about that.
186
Because when you have some, let's say, very subtle prior information, like, let's say,
you know, experts think that the wildlife trade isn't higher than like this threshold.
187
But there's uncertainty.
188
You're not sure exactly what the threshold is.
189
If you do your work with Stan and everything, you can include that information basically
almost directly into the model in a way that will give you much better estimates than
190
starting from some population sampling perspective.
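To make that point concrete, here is a small Python sketch that compares a flat prior with a prior encoding "experts think the share is probably below some threshold, but the threshold is fuzzy". It is not Bob's model and does not use Stan: it is a plain grid approximation for a single proportion, and the data (3 flagged shipments out of 10), the 20% threshold, and the softness value are all made-up assumptions for illustration.

```python
import math


def posterior_mean(k, n, prior, grid_size=1001):
    """Grid approximation of the posterior mean of a proportion theta,
    given k 'successes' out of n and an unnormalized prior function."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = []
    for theta in grid:
        likelihood = theta**k * (1 - theta) ** (n - k)  # binomial kernel
        weights.append(prior(theta) * likelihood)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


def flat_prior(theta):
    """No prior information: every value of theta is equally plausible."""
    return 1.0


def soft_threshold_prior(theta, threshold=0.20, softness=0.03):
    """Hypothetical expert prior: theta is probably below `threshold`,
    with the cutoff blurred by `softness` to reflect uncertainty about it."""
    return 1.0 / (1.0 + math.exp((theta - threshold) / softness))


k, n = 3, 10  # made-up data: 3 flagged shipments out of 10 inspected
flat_mean = posterior_mean(k, n, flat_prior)
expert_mean = posterior_mean(k, n, soft_threshold_prior)
print(round(flat_mean, 3), round(expert_mean, 3))
```

With so little data, the soft threshold pulls the estimate down toward what the experts consider plausible, while still letting the data push past the threshold if the evidence is strong enough. In Stan or PyMC the same idea would be written as a prior on the parameter rather than a grid.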
191
And that's, I'd say, my favorite projects, the ones where Bayesian approaches have been so
helpful are those where you have this subtle prior information.
192
and that's where Bayes can really shine.
193
The flip side, of course, is it's not easy.
194
I'm sure it's not easy to do that type of modeling, and especially when you start deriving
custom models, custom distributions, it's intense.
195
It can take a long time, but the answer can be just so much better than alternatives
because it's just so much more nuanced.
196
I could preach on this topic for a long time.
197
No, I mean, that's great.
198
Although you would be preaching to the choir here.
199
Yeah, which is great.
200
That's wonderful.
201
Yeah, for sure.
202
But yeah, that just sounds super fun.
203
And I'm happy that I was able also to bring up that project because, for that fish project,
that shark trade project, we'll have one of the main authors on the show very soon, in
204
November: Aaron MacNeil,
205
from Dalhousie University with whom I've worked on this project and the whole team.
206
So stay tuned, guys, for that episode.
207
That's going to be a very fun one.
208
Aaron is a very good communicator and also a very interesting person to talk to.
209
So it should be a very cool episode.
210
But let's get back to you, because I said that you had a...
211
a very interesting origin story, very original one.
212
And that's because when I, so to prepare the show, I of course stalk all my future guests,
right?
213
I hope you understand.
214
And while stalking you, I saw that you've transitioned from working at IBM and the US
Department of State to
215
Well, now academia, which is what you just said.
216
So I love that.
217
How did that happen?
218
Yeah, you know, life takes a lot of twists and turns.
219
And yeah, so I'd say essentially for the first part of my career, I very much wanted
to work in government, especially in foreign affairs.
220
You know, life changed.
221
Course changes are always a bit complicated.
222
I did one tour with the State Department in Saudi Arabia.
223
And a lot of my research is in the Middle East and North Africa, so that hasn't really
changed a whole lot.
224
But there are things that I loved about the State Department, and some of my colleagues,
just really amazing people.
225
Personally, actually when I was finishing my master's degree at George Washington, I was at
a policy school, it wasn't a program that was really emphasizing preparation for PhD.
226
But I did some research there as part of my thesis sort of thing.
227
And I realized I had this sort of epiphany that I actually really liked doing research.
228
It was this eureka moment.
229
And what I also really came to value was independence.
230
And I think you can probably see some of that with my academic trajectory, that I like to
work on the things that I work on, and I like to kind of take different approaches.
231
And the State Department is a giant bureaucracy.
232
You know, and it kind of has to be: its job is to implement the legislation and all that
stuff.
233
And I just sort of personally realized that I would do better in a sort of less structured
environment, which, you know, there's still rules in universities and stuff, but
234
relatively speaking, it's much less structured.
235
And, you know, you work in smaller teams and your work is really more your own.
236
And so I really valued that.
237
So that was part of it.
238
And the other part was personal.
239
My fiance at the time was in the US and she was studying in a graduate program.
240
In the State Department, you have to have worldwide availability.
241
You have to go wherever they tell you to go.
242
They were going to send me next to Mexico.
243
That was the closest that they could send me back to the States.
244
And I talked to them.
245
I was like, my fiancée is in Richmond, Virginia, in the US,
246
and they said, yeah, we're sending you to Mexico.
247
Because in the State Department world, that's close.
248
You're in the same hemisphere, but it's not that close.
249
And so it was a combination of that and then getting accepted to the PhD program at
University of Virginia.
250
So it was kind of all these things combined.
251
And honestly, there are definitely challenges to working in academia, as I'm sure you're
familiar with, but I really
252
think I made the right choice, at least for me.
253
I really do enjoy the freedom and flexibility of academia.
254
That's probably why I've stuck with it.
255
One thing I really care a lot about is open science and transparency.
256
Unfortunately, those aren't always hallmarks of academia, but I do think that at least in
principle, we have the ability to be much more honest and transparent than people in
257
government.
258
And so that's one thing I've always really enjoyed about
259
academic research: the ability to just be upfront about what you think, release your
conclusions without a lot of political pressure to change them.
260
Yeah.
261
This is really interesting.
262
And I can definitely relate to that background, because it's also what happened to
me.
263
I was working at the French Central Bank, so it's not
264
the Department of State, but I actually worked a bit before the Central Bank for the
French Foreign Ministry.
265
Definitely had the same experience.
266
Interesting.
267
Okay.
268
Yeah.
269
Cool.
270
And then, yeah, for sure, getting much more autonomy and freedom in not only what I do,
but how I do it is
271
definitely something I tremendously appreciate, and for sure I've never looked
back since that time.
272
But I'm curious though, did your background in the State Department influence your
research methodologies in political science?
273
How much of a back and forth
274
is there between the work you once did and the work you're doing now?
275
Yeah.
276
I think that where you start out has a lot of influence.
277
That's where you train.
278
Those are the questions you're exposed to.
279
And absolutely, working at the State Department influenced a lot of the stuff I worked on
later.
280
Part of it simply might be that I tend to work on sort of what they call policy relevant
topics.
281
Most of my research is on very contemporary Middle East issues.
282
I have colleagues who do a lot of amazing historical research, can be really fascinating
stuff, but the long-term influence of history on the contemporary world.
283
And I don't, I tend to focus more on what's kind of currently developing or happening in
different countries.
284
I certainly do spend time writing
285
for a policy audience. So I've written for the Washington Post, the Carnegie Endowment, and had something in
the Brookings Institution. I could certainly do more of that, but I do. And I think
286
that probably influences my writing style too, which tends to kind of focus on
simplifying things and making them clear. And that was certainly, you know, when I
287
was a diplomat, that was
288
very much stressed, because you're writing for a policymaker audience and you cannot
use lots of jargon. You cannot give them ideas they can't digest in 30 seconds, because
289
that's all the time they have. And so there's a real focus on being
short and to the point. I think I benefited from that. I think it also, yeah, definitely
290
influences the way that I do things. And I think too that
291
I came into my PhD program and definitely into sort of the statistical quantitative world
with a very nontraditional background.
292
So the State Department is like the least quantitative place in the world.
293
Like it's all diplomats who, you know, just sort of know a little bit about everything
and, you know, write, you know, pieces that just reflect their experience.
294
And I think honestly, my feeling is that coming into my program with that background was
actually super helpful.
295
because I wasn't sort of locked into existing paradigms.
296
So I really had studied some statistics prior to grad school, but not very much.
297
And so when I kind of came around to be introduced to Bayesian methods, I was just like,
this is great.
298
And I think too that it's led me to question things maybe in a way that I wouldn't have
otherwise.
299
And a lot of my papers and projects in the statistics world have often come out of me kind
of questioning things and
300
thinking that things are unclear and wanting to know why that's the case.
301
I think too, you know, there are people listening to this podcast, let's say, who are just
starting out in the world of statistics and don't have that background, you know, they didn't
302
grow up doing, let's say, the math Olympiad, playing chess all the time or whatever the
stereotype is, you know, that's really fine.
303
And there's a lot that you can contribute.
304
Like I really think almost anyone can do statistics or data science.
305
They're going to do it differently.
306
They're going to stress different things.
307
They're going to learn differently.
308
They're going to communicate differently.
309
But you can make a contribution.
310
And part of it is just that as people, we think differently.
311
And if you think somewhat differently than the average statistician, that actually can be
a really good thing, especially when you're doing research.
312
Because research is all about finding solutions, right?
313
And if you want to find a solution, it has to be something that
314
no one else has thought of yet.
315
You know, I think that for me, it's actually been really fun being a statistician without
that background.
316
It's also been at times intimidating.
317
So you mentioned StanCon.
318
I went to the first StanCon in 2018, and that was also really the first time that I had
presented at, like, these political methodology conferences, where you have a lot of
319
quantitative social scientists, but
320
It was definitely my first presentation that had a math stats focus, and I was just
absolutely petrified that someone like Andy Gelman was going to ask me this horrifically
321
hard question about deriving the analytical posterior distribution or something, and I was
just going to kind of collapse on stage.
322
And that didn't happen, and I had a lot of fun.
323
And I even had time to talk to some of the Stan devs and ask them these deep questions
about Hamiltonian Monte Carlo.
324
And I didn't really understand the answers, but it was still fun.
325
Yeah, I think that that reputation of intimidation and stuff is not good for the field.
326
But when people get past that, it actually can be a lot of fun.
327
I definitely, yeah, I agree with
328
everything you just said and definitely recommend people to check out events like
StanCon.
329
They are really absolutely fantastic.
330
As I said, I recorded two live episodes there.
331
They are not out yet at the time when your episode is going to be out.
332
They require a bit more editing, but they will drop in your feed, folks, in a few weeks.
333
Cool.
334
But yeah, that's just a fantastic experience because a lot of the Stan developers are
there and you can ask them all the questions that you want that you think are stupid, but
335
actually are very interesting.
336
And yeah, I get that it can be intimidating, but that's the great thing about the Bayesian
community.
337
From the beginning, I found that it's a very welcoming
338
community.
339
As you were saying, you started going into that world without a math degree, you know, or an
engineering degree.
340
It's the same for me.
341
I studied management and political science.
342
So for a long time, I was, you know, kind of fearing that part of my background, with
a lot of imposter syndrome.
343
But as you're saying,
344
actually, that can make you an interesting statistician, precisely because you
haven't gone through the classic way to statistics.
345
So for sure, you're not going to weigh in on the mass matrix adaptation routines if you
don't want to, because that's clearly another wheelhouse.
346
But you'll have a lot to say on a lot of other topics, especially applied statistics, which in
the end is extremely important because all these software are here to be applied to use
347
cases.
348
Yeah.
349
And I think for listeners, and I'm sure you talk about this more, but if you are
interested in Bayesian statistics in general
350
and getting into it,
351
the Stan community is the place to be.
352
There's a discourse site that you can post questions about using Stan.
353
But if you just simply dig up a lot of the documentation that they've made, both for Stan,
but also they have these case studies online, they're excellent.
354
And a lot of this has to do really with Andy Gelman.
355
And Andy, you know, his own writing, if you read his articles, is very clear.
356
And that always set him apart in the statistics world, this love of clarity, this love
of simplicity over formal notation and dense and impenetrable text.
357
And that has really defined the Stan community as well.
358
I don't know if it's as much so now; Stan is now so big.
359
I kind of started out relatively early in it.
360
And part of it was, so John Kropko was sort of my stats advisor at UVA and he had
361
done a two-year postdoc with Andy Gelman's team as they were developing the first
edition of Stan.
362
So I was sort of exposed to it relatively early on, and, you know, back then the
community was small enough that it was a Google group and, like, you know, people really
363
knew each other if they posted on there. And now it's a lot bigger, but I think that ethos
is still there, and I, you know, really encourage people
364
to post, to ask questions.
365
Obviously, people can always be rude to newcomers and things, but it really is a great, if
you're going to start somewhere, it's a great place to start because the ethos is, how can
366
we include more people?
367
I was just looking at Stan, they were talking about how they, I think they have like,
368
They're talking about how they're measuring who goes to StanCon and who has sort of a
non-traditional background and stuff like this.
369
And that's because they care about that.
370
And a lot of people don't.
371
So that's a beautiful thing.
372
And I think Stan has been responsible really for raising the level of statistical literacy
really across the entire Applied Statistics community because
373
They're smart, they do amazing work, but they also explain things, right?
374
And there's certain models that I know how they work because of the Stan documentation.
375
Because it's just a lot clearer and to the point.
376
you know, that's...
377
And I think, I honestly think there's a lot of...
378
There's actually a fair amount of bad vibes against Stan in the Bayesian statistics world
among people who...
379
were kind of around before it and are attached to different, older-style methods.
380
But I think part of it too is they don't like the vibe like the you know, anyone can do
this.
381
We can help anyone understand that's not popular in all circles.
382
I'll just say that.
383
Yeah.
384
Damn, yeah, that's the first time I hear that.
385
But that's good to know.
386
Yeah, and for sure, completely second everything you just said.
387
If you're coming from the Python world, I also definitely encourage you to look at the
PyMC community.
388
That's where I started.
389
So the PyMC Discourse is a great place to get your questions answered.
390
Also answer some questions yourself because that's really how you're going to learn.
391
So definitely make use of all that open source community and make some PRs.
392
PRs are always welcome, I can tell you that.
393
And I'm actually curious also, before we switch to your novel, to talking about your
novel, how, how...
394
Like first, do you remember when you were first introduced to Bayesian stats?
395
And also if these methods have shaped your research approach in political science?
396
Yeah.
397
Yeah.
398
Good.
399
Yeah.
400
I should talk about like some of my actual work in this field as part of this podcast.
401
Yeah.
402
So it's really funny, but my first project, so I was working on this paper.
403
that I had to do in grad school that was like a capstone paper for one of my minors.
404
It was an applied statistics field concentration.
405
I had to write this paper and I was doing mixture modeling.
406
And I think the funny thing about this project is that it was in many ways like a failure.
407
Because if you've ever played around with mixtures of Gaussians, mixtures of other
distributions,
408
there are these horrific identification issues.
409
And I was trying to fit a model where I had to identify certain clusters in the data, or I
was trying to do this with mixture modeling.
410
And it just wasn't working.
411
So essentially, my advisor, I was using a frequentist package, an R package, called
flexmix.
412
I remember this very distinctly.
413
And it was just...
414
Basically, every time I ran it, I got a different result.
415
And that was because of having a multimodal likelihood.
416
There wasn't a single solution, and so the algorithm would end up in a different place
each time.
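One common source of the multimodal likelihood described here is label switching: relabeling the mixture components leaves the likelihood unchanged, so there are at least two equally good solutions. The sketch below illustrates that with a two-component Gaussian mixture in plain Python. The data values, the shared standard deviation, and the function names are illustrative assumptions, not anything taken from the project discussed in the episode.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def mixture_loglik(data, weight, mu_a, mu_b, sigma=1.0):
    """Log-likelihood of a two-component Gaussian mixture with a shared sigma.

    `weight` is the mixing proportion of component A; component B gets 1 - weight.
    """
    total = 0.0
    for x in data:
        a = math.log(weight) + normal_logpdf(x, mu_a, sigma)
        b = math.log(1 - weight) + normal_logpdf(x, mu_b, sigma)
        m = max(a, b)  # log-sum-exp for numerical stability
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


# Made-up data with two visible clusters.
data = [-2.1, -1.8, -2.4, 1.9, 2.2, 2.0, 2.5]

# Same model, with the component labels (and their weights) swapped.
ll_original = mixture_loglik(data, weight=0.4, mu_a=-2.0, mu_b=2.0)
ll_swapped = mixture_loglik(data, weight=0.6, mu_a=2.0, mu_b=-2.0)

print(ll_original, ll_swapped)
```

Because both parameter settings give the same log-likelihood, an optimizer or sampler has no reason to prefer one labeling over the other, which is why prior information about where the components sit (or an ordering constraint on the means) is usually needed.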
417
And this was driving me nuts.
418
So then I switched to Stan, an early version of Stan at the time.
419
But there was sufficient documentation of mixture modeling.
420
And the funny thing was the model didn't actually work any better in Stan.
421
But...
422
it was doing it in Stan that I actually understood the model.
423
Right?
424
And after beating my head against the wall for a few months, I was like, I understand.
425
Yes, of course, this model is not identified.
426
A mixture is, you know, without sufficient prior information about the location of the
mixtures, like you don't know where they are.
427
And so that was sort of my gateway drug.
428
And then I was like, then I read McElreath's text, like when I was, because it had,
429
I was just lucky, you know, we all get lucky.
430
It came out just as I, his first edition came out just as I was starting out and I was
using Twitter, which, you know, now is let's say not as useful as it was, but the social
431
media things are useful.
432
It appeared on social media, people were like, wow, this is really different.
433
And I got it.
434
And it just blew me away.
435
I mean, just the clarity, the simplicity, how he connected a lot of things that I had
thought of before.
436
And, you know, I think
437
after reading it I went from being like, this is cool, to really being a little bit more
of a zealot.
438
You know, this is how we should do research, this can really make a big difference.
439
And then of course I also read, you know, the canonical Bayesian Data Analysis.
440
The first time I went through it, it was a little intense.
441
McElreath's text is definitely more approachable.
442
But then a lot of my learning was, yeah, these informal case studies, blog posts, people
sharing online, people answered my questions on the forum.
443
That was all super, super helpful.
444
ultimately, what I really got into...
445
So I have two R packages that are Bayesian modeling.
446
One is called ordbetareg, or ordered beta regression.
447
And this is a model that...
448
We don't have to go all into the weeds, but traditional beta regression is supposed to be
a model of proportions, but the beta distribution can't handle observations so-called at the
449
bounds, meaning if your proportion goes from 0 to 1, you can't include any observations that
have a 0 or a 1.
450
And ordered beta regression is basically a compound model where one part is a beta
distribution and the other part is actually a simple sort of ordered logit.
451
that allows the model to include those.
452
So there's two cut points in the model, like an ordered logistic regression.
453
And you have one linear model.
454
you can include, basically, the model has sort of three components, zero, anything between
zero and one, and then one.
455
And the payoff to the model, and it's getting a lot of exposure, is that
456
It's a pretty simple model.
457
It's not much more complicated than beta regression, but you can include an outcome that has
zeros and ones or zero and 100 or whatever your scale is.
458
And that's very useful for people, especially in the social sciences.
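The three-component structure described above (zero, anything strictly between zero and one, and one, governed by two cut points and a single linear predictor) can be sketched as a log-likelihood in plain Python. This is only a reading of the verbal description: the mean-precision parameterization of the beta distribution, the names `cut_low`, `cut_high`, and `phi`, and the example values are all assumptions, and the actual package is an R wrapper around brms rather than this code.

```python
import math


def inv_logit(z):
    """Map a real number to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))


def beta_logpdf(y, mu, phi):
    """Log density of a beta distribution with mean mu and precision phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def component_probs(eta, cut_low, cut_high):
    """Probabilities of y == 0, 0 < y < 1, and y == 1 (requires cut_low < cut_high)."""
    p_zero = 1.0 - inv_logit(eta - cut_low)
    p_one = inv_logit(eta - cut_high)
    p_middle = 1.0 - p_zero - p_one
    return p_zero, p_middle, p_one


def ordered_beta_loglik(y, eta, cut_low, cut_high, phi):
    """Log-likelihood of one observation y in [0, 1] given linear predictor eta."""
    p_zero, p_middle, p_one = component_probs(eta, cut_low, cut_high)
    if y == 0.0:
        return math.log(p_zero)
    if y == 1.0:
        return math.log(p_one)
    # Continuous part: weight of the middle component times the beta density.
    return math.log(p_middle) + beta_logpdf(y, inv_logit(eta), phi)


# Hypothetical values: one linear predictor, two ordered cut points.
probs = component_probs(eta=0.3, cut_low=-2.0, cut_high=2.0)
logliks = [ordered_beta_loglik(y, 0.3, -2.0, 2.0, phi=5.0) for y in (0.0, 0.5, 1.0)]
print(probs, logliks)
```

The same linear predictor `eta` drives all three components, which is what keeps the model only slightly more complicated than an ordinary beta regression.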
459
My other...
460
And that one's available.
461
It's really a wrapper around brms to allow people to fit these types of models.
462
And that's out on CRAN and everything.
463
The other one is much more ambitious and is actually just coming to fruition.
464
And it's called idealstan.
465
And ideal there comes from a social science model called the Ideal Point Model, which is
itself a variant of something called item response theory, which people may have come across.
467
It's from psychometrics.
468
And what I've been doing is
469
sort of expanding and generalizing this model using Stan.
470
And what it allows people to do, and I've used it for multiple papers and publications
already, but I've never released the final version, is it allows you to apply what's
471
essentially a very general purpose measurement model to all kinds of distributions of data
that people couldn't before.
472
So one of the big innovations is non-ignorable missing data.
473
Like when you're fitting a latent variable model,
474
or any kind of measurement model, missing data is really, really tough because missing
data is essentially its own latent problem.
475
When you have missing data and a latent variable, what do you do?
476
So this package has a way of dealing with that in a way that avoids bias in your estimate
of the latent variable.
477
But it also has a bunch of other things.
478
So Stan is really good at time series, so there's a lot of work I've done on time-varying latent
variables.
479
And that's particularly useful nowadays because as social scientists and data scientists,
we're getting so much more time-grain data, right?
480
From Twitter.
481
Well, I guess not anymore from Twitter.
482
But there's plenty of data sets that have and can get time stamped right down into
seconds.
483
And if you want to estimate some latent quality from that, like let's say you want to know
about corruption or polarization or...
484
some other quantity and it's noisy, how do you handle that variation over time, especially
when you have these sparse data sets, right?
485
You have only a few observations in a given time window, but your time series is really
long.
486
So that's the kind of stuff I've been working on, and I'm hoping to finally release a
final version of that by the end of this year.
487
Knock on wood.
488
And I've already used it for a range of things.
489
It's really useful for survey data sets when you have like missing data.
490
Like I used it to measure essentially people's wealth from a survey when there was a lot
of missing data in different variables in the survey.
491
I've used it to measure countries' policy responses to COVID-19 when there's a lot of
complexity in how they respond and which countries are the most prepared.
492
And yeah, so on and so forth.
493
So that package hopefully will be out soon and that uses Stan, that uses like raw Stan
code in an R framework.
494
I know you're, it sounds like you're a Python guy. In principle, it can be estimated in
Python as well, but that would have to be a future project.
495
But that's my, those are my big kind of Bayesian method stuff.
496
I have done a, I have a new paper out in the Journal of the Royal Statistical Society
497
that fits a big Bayesian measurement model in Stan to measure COVID infections in the US
in the early pandemic period.
498
So, and this is talking about, you know, prior information, how useful it is in the US
context.
499
Well, most countries, right, during the early part of COVID, like we didn't know, we
didn't even have biased data, right?
500
Like there were so few tests available and
501
I started working on this actually in sort of the early part of the pandemic and it just
was recently published, but I got really fascinated by this topic of like, well, how do
502
you measure COVID infections if you just don't have any data, right?
503
Like you can't test, you can't do anything.
504
And it's a really cool problem from a Bayesian point of view, because as a Bayesian, you
think, well, the best answer that you can give to a problem is to include all of your
505
prior information, right?
506
Beyond that, that's the best answer you can give.
507
And so I started to think about that from a very kind of purely, so I've told people that
this is my most Bayesian project ever because I just kind of was like sat down and I
508
worked with an epidemiologist, Luiz Carvalho, who's also a big Stan person.
509
He's on the Discourse site and we worked together on this and kind of came up with a new
approach and the approach really emphasizes using priors.
510
And so we show how you can get a decent estimate of COVID infections.
511
in this very early period like March of 2020 by using things like expert surveys, right?
512
Where you simply go and ask a bunch of experts, well, how many infections do you think
there are?
513
And then you have uncertainty in that estimate, right?
514
And then you allow that uncertainty to propagate into your model, into the final estimates
of COVID infections.
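As a rough illustration of letting expert uncertainty propagate into an estimate, the sketch below turns a handful of expert guesses about the infection rate into a prior on the logit scale and pushes draws from that prior through to infection counts. This is a prior-only toy and not the model from the paper: the expert guesses, the population size, and the normal-on-logit prior are all made-up assumptions.

```python
import math
import random
import statistics


def infection_interval(expert_rates, population, n_draws=10_000, seed=1):
    """Return (5%, 50%, 95%) quantiles of infection counts implied by expert guesses.

    The expert guesses are treated as draws from a normal distribution on the
    logit scale, so disagreement among experts widens the resulting interval.
    """
    rng = random.Random(seed)
    logits = [math.log(r / (1.0 - r)) for r in expert_rates]
    mu = statistics.mean(logits)
    sd = statistics.stdev(logits)
    # Draw infection rates from the prior and convert each to a count.
    draws = sorted(
        population / (1.0 + math.exp(-rng.gauss(mu, sd)))
        for _ in range(n_draws)
    )
    return (draws[int(0.05 * n_draws)],
            draws[n_draws // 2],
            draws[int(0.95 * n_draws)])


# Hypothetical expert guesses of the share of the population infected.
expert_rates = [0.001, 0.005, 0.002, 0.01]
low, median, high = infection_interval(expert_rates, population=1_000_000)
print(low, median, high)
```

In a full model, these prior draws would be combined with whatever test or hospitalization data exist, so the interval narrows as data arrive instead of being asserted up front.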
515
And you essentially, can get a pretty good estimate that's still uncertain.
516
but actually incorporates this information. You know, you don't
have tests or, you know, hospitalizations even, but if you basically
517
take advantage of the information that you have, you can give a much better answer than just
spitballing or, as many people did, assuming away uncertainty and pretending to be much more
518
certain than they were. Anyway, so that's, I think, some of my recent stuff. I'm always
tinkering, you know, I'm sure as you are, with different things, but
519
Yeah, same.
520
There is always a part of me that is looking forward to the day where I will have
everything figured out and understood.
521
that day never comes.
522
It's like, I will finally understand Gaussian processes.
523
And then I think I understand.
524
And then a new use case comes and I'm like, wait, how do I do that?
525
Wait, why doesn't...
526
Why doesn't that work?
527
It's like, my God, I have to learn that new method.
528
But I guess that's part of the job description.
529
And that's actually the fun part, I would say.
530
It's like, you just have to reassure that other voice in your head.
531
like, that's fine.
532
That's normal.
533
That's part of the job.
534
And that's actually why it's fun and interesting.
535
But congrats on all those projects.
536
That sounds really cool and really fascinating.
537
And interesting that, like, I didn't know about the beta, the ordered beta regression, but
definitely makes sense.
538
I have some experience with the ordered logistic.
539
I've used that most recently on a football analytics paper slash project I'm working on.
540
But the ordered beta, I didn't know about that, but that sounds like fun.
541
And yeah, I should say it is getting used in industry.
542
I know there's a guy from Amazon who actually wanted me to make a code change so it could
be deployed somewhere in Amazon.
543
So I know it's out there doing something.
544
I don't know.
545
I guess if your Amazon order doesn't come through, then it's my package.
546
That was the problem.
547
yeah, so it's getting...
548
I'll be honest, I'm really, really happy with how the model is getting used in different
places.
549
And essentially it really matters for predictions.
550
Like if you fit your model and you want to predict, okay, given this many orders, what
proportion, or this many ads, what proportion will buy the product?
551
You want that prediction to respect the bounds.
552
You don't want, if you use OLS or something, then you could end up predicting that 115 %
of your customers will buy a product.
553
That doesn't make any sense.
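The point about predictions respecting the bounds can be seen with a few made-up numbers. The sketch below fits ordinary least squares by hand to proportions that rise with a predictor, then extrapolates: the linear fit predicts a proportion above 1, while a prediction passed through an inverse-logit link always stays between 0 and 1. The data and the logit-scale coefficients are illustrative assumptions, and a plain inverse-logit stands in here for the link used by bounded-outcome models in general.

```python
import math

# Made-up data: proportion of customers buying, rising with ad exposure.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.3, 0.5, 0.7, 0.9]

# Ordinary least squares for a single predictor, computed by hand.
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean

# Extrapolating the straight line leaves the valid range for a proportion.
ols_prediction = intercept + slope * 6.0


def inv_logit(z):
    """Inverse-logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients on the logit scale; the prediction stays bounded.
bounded_prediction = inv_logit(-2.0 + 1.0 * 6.0)

print(ols_prediction, bounded_prediction)
```

The linear extrapolation lands above 1, which is the "more than 100% of customers" problem, while the link-based prediction cannot leave the unit interval no matter how far the predictor is pushed.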
554
You know, the ordered beta regression allows you to take into account that non-linearity in
the outcome, that you have these bounds. And yeah, and I will say for ordered logit, again,
555
you're talking about the Stan documentation, but Michael Betancourt, who's, you know, kind of
a legend in the Stan community, he has an amazing case study on ordered logit. And this
556
stuff does get somewhat technical, but it's really brilliant. I've never seen anyone
557
sort of go through the model the way he does, but if you can get through his case study,
you really understand ordered logit.
558
And as he does, in that case study, this is just a case study, he derives a novel
distribution for the priors in the ordered logit model, the cut points.
559
He just does this part of the case study.
560
Like, yeah, here's this new statistical distribution no one's ever used before.
561
So yeah, ordered logit's really cool.
562
And again, ordered beta,
563
It's really about thinking out of the box because this was an area, as I mentioned, where
it's somewhat well-trodden issue.
564
And that was something where it was sort of really combining two things that are very
different.
565
So ordered logit's a model for discrete data, beta's for continuous data, and it was
really combining them that made it work.
566
People think of statistics as a sort of dry rote, read formulas off of a page.
567
But in a lot of, I think, actual problems, a lot of it's really creative thinking.
568
How do you get stuck in a dead end?
569
How do you find the way out?
570
And sometimes that's taking a very different approach.
571
Yeah, definitely.
572
And so we should add these links to the show notes.
573
So if you have anything to share regarding your package,
574
ordered beta, please add that to the show notes for the people because I know a lot of them
are going to want to dig deeper.
575
And I'll also add the link to Michael's case study about ordered logistic for sure.
576
My football project doesn't have anything yet that's ready to be shared, but I will do
that as soon as possible.
577
For sure.
578
Maybe that's something we'll do.
579
We'll teach with Chris Fonnesbeck at PyData New York in November.
580
Maybe we'll do that.
581
We'll see how that works.
582
Maybe at that point, I'll be able to share that and add that to the show notes.
583
In the meantime, let's add your package or any paper or case study or things like that
that you've...
584
written or read and think is interesting or even videos and tutorials and I'll add
Michael's case study.
585
And so I know you have a hard stop in a few minutes and I definitely want to talk about
your novel.
586
I mean, I still have like tons of questions for you and the work you do and how you use
Bayes and so on because honestly, I love all the work you do and I...
587
We could do like a three hour episode very easily.
588
But I definitely want to talk about your novel.
589
So let's do that because you're the first novelist on the show.
590
So first question, you know, if I were talking to you in the street or in a bar, would be
like, why?
591
What inspired you to write The Bayesian Hitman?
592
Yeah.
593
Well, my interest in writing, and this is the thing that for me, like doing statistics
kind of came a little bit later in life, my interest in writing predated my doing
594
statistics by a number of years, actually.
595
And I was always sort of interested in writing.
596
I got into writing fiction, actually, when I was a diplomat in Saudi Arabia, where, not to
put too fine a point on it, there's not very much to do in Saudi Arabia.
597
And there's very few creative outlets.
598
And I used to love doing things in the States.
599
I used to actually be very much involved in improv theater at one time.
600
And there was very little of that.
601
But you can write a novel anywhere.
602
And so that's actually where I got into writing.
603
And all of that stuff that I wrote will be forever locked away in my computer and never
released.
604
But eventually I just kept writing from time to time, even in grad school, just a really
nice outlet.
605
It's a different way of using your brain.
606
After I get locked into, you
607
doing research papers and stuff and then fiction is just so different.
608
At least it should be.
609
I was posting on Twitter about this.
610
There's some academic studies that have turned out to be fiction and have had to be
retracted lately.
611
But in theory, as academics, you're not doing fiction, you're doing real research.
612
And yeah, the genesis of this novel actually came out of me being on the academic job
market in the fall when I was a grad student, so my last year of grad school.
613
I kind of had this
614
I don't know, as you get these sort of visions, I had this idea of someone going to a
university that wasn't their top choice, that was in an area that they didn't like, but
615
then things being radically different than they expect in a good way.
616
And that was really where the idea of the novel came from because when I was on the job
market, the thing about the job market
617
and that I think appeals as a sort of human story is how our lives are just so disrupted
as academics.
618
You know, people move, you know, across the country, sometimes across the world.
619
They end up in places that they never thought they would live.
620
They're culturally very different.
621
And I thought that's a really great setting for a story.
622
And yeah, and you know, I think a lot of writing advice when you're starting out that
you'll get is to write what you know and...
623
And that was kind of what I knew.
624
That was the world I was living in.
625
One thing I want to clarify, it's not an autobiographical novel.
626
And I always worry about that a bit.
627
But the main character is not me.
628
And it's actually set in a fictional town with a fictional university.
629
I did that very intentionally because I didn't want to like...
630
I'd say the main character has, let's say, very uncensored opinions about his institution.
631
And I didn't want to like...
632
you know, critique somewhat, you know, an actual university or college.
633
But the main character is very much informed by my experiences, but also by all my friends
and the things they went through and the places they traveled.
634
And I think ultimately, as I wrote in the acknowledgements of the book, I really wanted it
to be a novel about academia that was much more realistic, that really got into the
635
problems people have, issues and the challenges they face, and to try to, you know,
636
Because I felt a lot of writing about academics, it's always these like mysterious
literature professors that hang around in like, you know, beautiful Ivy League, you know,
637
places and like solve crimes in their spare time or something.
638
And I wanted it to be a little more hard hitting and, and also really, you know, about,
you know, life as a research academic and what that's like and the pressures you face.
639
And so that was really the Genesis.
640
And then the main character, you know,
641
became about a Bayesian statistician, I don't know, I don't remember distinctly making
that choice, but the main character is a Bayesian statistician, and that became
642
really fun because I still think, I mean, it's impossible to verify this claim, but I'd
say there's a high posterior probability that it is the only novel that has a Bayesian
643
statistician as the protagonist.
644
And that was really fun, because it's fun getting to take an area that
645
really doesn't appear in fiction and like put it into a story and you know, see what
happens.
646
Yeah, I mean, that is really interesting to see how like, you know, the way of thinking that
got you there.
647
okay, so from like, that was one of the scenarios I had in my head where it's like
actually something you wanted to do already.
648
So you write of course academic content.
649
How was the experience?
650
How different was the experience?
651
And did you enjoy it more to write the fiction?
652
Do you feel freer writing fictional work because you you don't have to check everything,
cite everything and things like that?
653
Or was the experience in the end quite similar?
654
That's a great question.
655
I think that, unfortunately, I think at the end, they tend to converge in some ways.
656
But I think that's partly the, I'd say the initial work is very, very different.
657
I mean, when you're writing fiction, you really are, you're really trying to feed your creative
instinct.
658
And hopefully when you're writing an academic paper, that's not the, hopefully you have like
data that, you know, constrains, you know, these sorts of things.
659
But when you're trying to get that story out for the first time, and yeah, that's a very
different, and I don't know how good I am at that, but that's really about just going
660
wherever the story leads and trying to figure out the next thing.
661
And it's difficult, but it's definitely a very different kind of skill set or experience
than writing an academic paper.
662
I'd say that they tend to converge at the end because
663
and this was my experience.
664
I noticed your words in my first novel and that made me very nervous that maybe there will
be another novel.
665
But yeah, it's not easy and it's not easy to finish a novel.
666
I think, I mean, it's hard to write one for sure, but it's that finishing that really,
that's where you have to come back and make it coherent and edit it.
667
And my book was in a kind of editing phase for
668
years probably of just going back and forth and adding things, taking away, trying to make
the story really move at the right pace.
669
You know, you don't want to have too much detail or too little, and things
that become even technical.
670
Like we have what are called plot holes where, you know, you said something happened in a
certain place, but that couldn't happen because the character was here.
671
So then, you know, at the end of the day, you know, when you're finishing a novel really
comes down to a lot of details and
672
cross-checking things and stuff that's not that different from finishing an academic
paper.
673
So, and that's not, you know, I mean, the editing phase is not usually people's favorite phase
unless they're kind of weird.
674
But yeah, so I would say ultimately they start to converge.
675
But of course, the writing style is very, very different.
676
I do think, you know, honestly, you know, my advice would be especially for academic
researchers, I really encourage you to write.
677
creatively.
678
Creative fiction, creative nonfiction, poetry.
679
No one has to see it.
680
No one has to know.
681
You can post it online under a pseudonym if you want, but it actually really does help you
write.
682
And I would say it has made my academic writing better.
683
I mean, academic writing at the end of the day, especially after it goes through the peer
review process, you end up responding to reviewers and anything beautiful you made gets
684
destroyed.
685
But also, I write a lot of blog posts.
686
These have almost, my blog posts have far greater reach than my academic writing, I'm sure
of that.
687
And those are definitely much more informed by my creative writing.
688
And the nice thing is they're not peer reviewed and I'm able to add tone, I'm able to add
fun things.
689
And that definitely is much more connected to my fiction writing and issues of clarity.
690
But I'd say that you really can become a better, it's sort of like cross training as an
athlete.
691
Trying out different forms is really helpful.
692
I'm sure that writing my novel, I didn't write it to get ahead, although I've joked with
people that part of my tenure case is writing a novel.
693
But I don't think it'll actually help.
694
But I'd say without maybe meaning to, it definitely has made me a better writer and
that's great for my career at the end of the day.
695
Yeah, I completely agree with that.
696
Personally, I hate reading academic papers.
697
I do it because I have to, but each time I have to do that, I'm like, my God, no.
698
Actually, something I would like to try is feed a paper to ChatGPT and ask it to
rewrite it from a very, more like a novel or a more exciting tone because honestly, the
699
writing is just terrible.
700
Yeah, tell me a story, you know, something like that.
701
But as you were saying, like to your point, I really like Richard McElreath's style because,
I talked a bit about that with him at StanCon.
702
So I won't, you know, divulge all the details because I don't know if he wants to.
703
But long story short, it's like he also, you know, he, he definitely trained his writing
styles and he's aware of things he wants to.
704
to how he wants to do it.
705
And I think that's also a big part of why his book has been so successful.
706
It's the writing, it's so much more engaging.
707
And I've never understood that, honestly, from the academic world, where it's like, no, it
seems like to look serious, you have to be as boring as possible.
708
And that's just terrible.
709
That doesn't make people wanna read papers.
710
I'm not saying it should be completely entertaining, and I'm not saying it should be TikTok-ing
papers, on the contrary, but people like stories.
711
And if you can tell stories and at the same time, teach them something or show them a new
method, I think that's much better.
712
Everybody wins.
713
And I think it's also much more enjoyable for the writer.
714
So, you know, why not do a
715
little bit more of that.
716
that's why I think it's awesome that Richard writes that way, that you also like to write
that way and you're even writing novels.
717
I think that's awesome and that's probably going to change gradually.
718
I know other authors also doing that.
719
Osvaldo Martin, for instance, who's going to be back on the show next week.
720
Osvaldo was the first ever guest of Learning Bayesian Statistics.
721
Episode one was with him five years ago.
722
And he's been kind of a mentor to me.
723
And I really like his style of writing, for instance, also.
724
He's someone who has a lot of humor.
725
And, you know, you can see that in the writing.
726
Also, I like it because he doesn't, you know, drown.
727
you with technical details from the get-go.
728
His writing is much more applied where it's like, okay, let me tell you about Bayesian
additive regression trees.
729
Here's the theory you need to know, but not too much.
730
And then, okay, here is how we do it.
731
Here are the limitations and so on.
732
And I think that's much more efficient and also much more engaging.
733
Totally.
734
And I think blogs have changed statistics.
735
maybe even helped create data science because it's this form of publishing that allows us
to just skirt around the whole archaic academic system.
736
you're just, you know, especially for listeners who aren't, let's say, aren't as plugged
into academia, haven't been through like the paper publishing process.
737
I mean, reviewers really do like, you know, even if you try to, I mean,
738
If you write a paper better, it'll be a better paper.
739
they really, I mean, I've had reviewers like kill titles because they didn't think that
they were like academic-y enough.
740
you know, and the title I had to replace it with was definitely like a worse title, right?
741
So, and I've had reviewers like comments say, your writing style is too informal.
742
Like nothing to do with the actual substance of paper.
743
Just this doesn't sound, you know,
744
technical.
745
So when you're reading an academic paper, and it's turgid, it's like, well, some of that
is, I mean, and also, I mean, some people just struggle to write. For some people,
746
English is not a language they're very comfortable with.
747
And so that, you know, not everyone's going to be Richard McElreath, like that
guy has a gift.
748
Okay.
749
But I think too, you know, blogs allow people to just completely circum-, you know, do
an end run around that.
750
The other person I'd mention who's a fantastic blogger on stats issues, I don't know if he
would explicitly say he's Bayesian, but a lot of his stuff is, is Andrew Heiss.
751
He's a fellow political scientist at Georgia State.
752
And we can put his link in the blog.
753
And he is, I think, really one of the best sort of stats bloggers out there because he has
a remarkable gift for visualization, but he's also very good at sort of the explanation
754
side.
755
And so if there's people listening who have not added him to their list, I mean, he's really
great at, and very applied, you know, like the embedded code chunks in the blog and stuff.
756
So yeah, I mean, so I think this is all fantastic.
757
And, you know, the only challenge, and I know as an academic who does some blogging is, you
know, we're not paid to do it, and it is a public good.
758
And when I say we're not paid to do it, I mean, theoretically, yes, like,
759
anything you do as an academic you're quote unquote paid for, but it's not really part of
your evaluation.
760
Most people ignore it, right, for tenure and for these sorts of things.
761
And so I'd say it's not incentivized.
762
It's up to people to do it yourself.
763
I think my projects have been, I mean, I think some of my blog posts that relate to my
academic work, they help with, let's say, getting citations.
764
But at the end of the day, it's something that you kind of have to want to do.
765
But I would say personally, I found it just so rewarding to write that way and to see
stuff get out there in the world.
766
I'll just throw this in.
767
It's just a funny tidbit.
768
My most visited blog post for years now is actually a blog post I wrote about a pregnancy
test called a cell-free pregnancy screening for Down syndrome.
769
Which, if you know anything about Bayesian statistics, right, testing is this, like, core part,
you know, it's like the example all the intro books use, you know, some kind of how many
770
vampires are in the population or whatever.
771
And so this blog post came out of actually a very personal story of my wife and I having a
child who was tested or recommended for testing.
772
And to make a long story short, I was really upset about how these tests were being
interpreted in a way that was like statistically invalid.
773
right?
774
Not being aware of prior distributions and how they affect the interpretation of the test.
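The point about priors can be made concrete with Bayes' rule. Here is a minimal sketch in Python; the prevalence, sensitivity, and specificity values are hypothetical numbers chosen for illustration, not figures for any actual test, and the function name is made up for this example.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition | positive test), computed with Bayes' rule."""
    # Share of the population that has the condition AND tests positive.
    true_positives = prevalence * sensitivity
    # Share of the population that does NOT have the condition but tests positive.
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical screening test: 99% sensitive, 99.9% specific.
# Only the prior (prevalence) differs between the two calls.
ppv_rare = positive_predictive_value(prevalence=0.001, sensitivity=0.99, specificity=0.999)
ppv_common = positive_predictive_value(prevalence=0.01, sensitivity=0.99, specificity=0.999)

print(f"Prior of 1 in 1000: P(condition | positive) = {ppv_rare:.2f}")
print(f"Prior of 1 in 100:  P(condition | positive) = {ppv_common:.2f}")
```

With these made-up numbers, the same positive result means roughly a coin flip when the condition is rare and about a nine-in-ten chance when it is ten times more common, which is why a result cannot be interpreted without the prior.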
775
So I wrote a blog post about this and I tried to make it very, very clear, even to people
of no stats background, right?
776
And I mean, it's just my personal blog.
777
I just posted it up there and somehow it's become one of the top Google search results for
this particular test.
778
It's called MaterniT21.
779
It's a top five.
780
So it gets like 100
781
view, you know, unique views a week, sometimes higher than that of people.
782
And I've gotten like emails from people all over the world.
783
And sometimes I have to say, I'm sorry, I'm not a medical doctor.
784
I'm just commenting on, you know, how you correctly interpret the statistics of a test.
785
Like, that's, as an academic, I guess, really cool to have that kind of impact.
786
And I thought that was going to be a post I wrote that would be quickly forgotten, but
ended up.
787
And still, people read it all the time.
788
And hopefully,
789
make fewer statistical errors, right?
790
From using those tests.
791
Anyway, yeah.
792
Yeah, yeah, for sure.
793
That's cool because that was actually going to be a question I had for you, you know, like the
way you see your novel and your writing in general contributing to public understanding of
794
Bayesian stats and scientific thinking in general.
795
Yeah.
796
But since we're short on time, because I think you have to leave in like 14 minutes.
797
I'm going to ask you the last two questions that I ask every guest at the end of
the show.
798
But before that, one last question regarding your novel.
799
Did you get any feedback already from your academic peers and readers?
800
And what kind of feedback was it?
801
Not a ton yet.
802
I mean, you know, the novel's only been out for a month and we're all busy.
803
So the people who have read it really like it.
804
Of course, as a Bayesian statistician who works on, I know that the reported feedback is
not always the same as the true feedback, right?
805
That's a latent quantity.
806
So that being said, the way I would interpret people's feedback so far is that, wow, this
novel is actually not that bad.
807
I mean, it has a sort of claim to fame as being one of the first novels with a Bayesian
statistician as a protagonist, but that doesn't mean that it's actually fun to read.
808
But the people who have gotten through it said, it's actually well-paced.
809
There's some kind of mystery thriller elements in it.
810
The plot moves along nicely.
811
They enjoyed it.
812
And that's honestly what I want people to take from it at the end of the day.
813
There are some things in the book that,
814
if you know me, are probably not surprising.
815
Some things get into some questions about science, what is it about, philosophy of
science, things like this.
816
But it's not too heavy-handed and it's fun and it's a little bit escapist.
817
And that's what I wanted was for people who do research to have a fun book to read and
enjoy it.
818
It doesn't get super into the weeds on Bayesian stats.
819
There is some, and actually you mentioned Gaussian processes.
820
That's one of the few.
821
There is actually a time when the characters sort of make fun of Gaussian process
regression.
822
So I was very happy that that ended up in the novel.
823
And maybe I'll get angry emails from people who love Gaussian processes.
824
Yeah, for me.
825
Yeah, for sure.
826
Yeah.
827
I mean, yeah.
828
I mean, I'd say if you understand Bayesian statistics, you'll understand more of what the
main character is doing.
829
I don't write out models in the book, but you'd have a much better sense of the technical
side of what's happening.
830
In the novel, I tried to make it more fun for people, even people who don't have a
background at all in statistics and stuff like that.
831
But yeah.
832
So it's still early for...
833
There's a lot of people who have the book and are reading it.
834
Yeah.
835
If the pilot distribution of feedback is the same as the true distribution, so far people
enjoyed it.
836
Nice.
837
Yeah, that's awesome.
838
And well done again for taking the time to do that, because I know how long it takes to
write a book and how much dedication and sacrifice of free time it asks for.
839
So yeah, thanks a lot for doing that.
840
I think it's super...
841
important for science communication.
842
And I do think we should teach science from much more of a storytelling perspective
because science is done by people.
843
And this is not just a bunch of dry theorems and papers.
844
So I think your novel definitely contributes to that.
845
So thanks a lot. And now...
846
So I need to ask you the last two questions before you can get out to your next
engagement.
847
First one: if you had unlimited time and resources, which problem would you try to solve?
848
And the caveat is that you have to solve it with a Gaussian process.
849
No, of course not.
850
It's just, what problem if you had?
851
unlimited time and resources?
852
Yeah, unlimited time and resources.
853
Well, I would buy a football team.
854
Important for the world.
855
I have actually thought about this, but I do a lot of work with online surveys.
856
I do think that we haven't really fully exploited them.
857
People maybe shouldn't be on social media as much as they are, but they're on it all the time.
858
And there's incredible possibilities through that for data collection.
859
And one thing that I think would be really cool would be to do real-time online surveys
across the whole world that happen almost every day.
860
So this is, yeah, I guess, the social scientist's dream.
861
But for some of the stuff I study, like
862
corruption, like how people report issues of corruption or what's happening with their
business or things that look shady.
863
Having a survey like that that would run around the world all the time every day would be
pretty awesome.
864
Facebook did this during COVID.
865
They had a COVID poll and it made me so jealous because they could, because they're Facebook,
they could just have this thing appear on people's feeds.
866
It would just pop up and say, "Want to take a survey about COVID?"
867
So there's this incredible data out there, like daily, sometimes it gets down to like
the county or state level, of, you know, stuff like, you know, how much
868
contact do people have with other people?
869
And we have this information for, like, literally almost the entire world.
870
It's just stunning.
871
And so I would love to do, to do that kind of thing about, you know, topics I care about,
like corruption and just see, you know, because in general in academia, we do,
872
The most we get away with is like a sort of point in time survey.
873
Like here's what people thought about President Trump at this particular time.
874
But longitudinal data is so much more interesting.
875
And there's so many more important questions you can ask when you get into things like
when do people change their minds?
876
How do they change their minds?
877
Like even with the current election, which we haven't discussed yet.
878
So we have to discuss that, right?
879
But, you know, like there's all this, you know, conversation about, who supports, you
know, Kamala Harris, who supports Donald Trump.
880
And the thing is we don't really have longitudinal data.
881
So we really don't know who has changed their mind or not because we don't know.
882
People answer a survey and they say which one they like at the moment.
883
But does that mean they really changed their mind or just that's who they saw on TV last
night?
884
That's the most recent candidate they've heard of.
885
What we really want to know about are people who used to support one candidate and now support
another.
886
Those are the really interesting ones.
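To illustrate why repeated cross-sections cannot answer this, here is a small sketch in Python. The respondents and their answers are invented for the example, and the helper names are hypothetical. Two survey waves show identical topline numbers, yet a panel that follows the same people reveals that some of them switched.

```python
from collections import Counter

# Hypothetical panel: the same six respondents, in the same order, in two waves.
wave_1 = ["A", "A", "A", "B", "B", "B"]
wave_2 = ["A", "A", "B", "A", "B", "B"]


def topline(wave):
    """Aggregate support per candidate: all a one-off cross-section can show."""
    return Counter(wave)


def count_switchers(first, second):
    """Number of respondents whose answer changed between waves (needs panel data)."""
    return sum(before != after for before, after in zip(first, second))


same_topline = topline(wave_1) == topline(wave_2)
n_switchers = count_switchers(wave_1, wave_2)

print("Topline identical across waves:", same_topline)
print("Respondents who changed their mind:", n_switchers)
```

In this toy data the topline is unchanged at three supporters each, while two of the six respondents changed sides. Only the linked, longitudinal responses make that visible.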
887
So that's sort of my dream ambition.
888
Maybe that's a very
889
I don't know, uninteresting dream ambition, but that would be one thing I would love to do.
890
And it really would require unlimited funding.
891
So if you know of a source of unlimited funding, please put me in touch.
892
Yeah.
893
I mean, I'll use it first, and then I'll tell you.
894
Great answer.
895
I'm not surprised.
896
You seem like you're really passionate about what you're doing.
897
I'm not surprised you came up with a very appropriate answer and of course, a very nerdy
answer.
898
I was really hoping for that.
899
That's a prerequisite to be on the show.
900
I know.
901
And so second question, if you could have dinner with any great scientific mind, dead,
alive or fictional, who would it be?
902
Yeah.
903
So this is a great question.
904
And one I've had to think about.
905
But it ended up being actually very clear for me, and that's Leonardo da Vinci.
906
And I've always heard about him.
907
I took a trip to Italy a few years ago.
908
And there's a museum of Leonardo da Vinci.
909
I believe it's in Rome, but don't quote me on that.
910
If I remember correctly, it's in Rome.
911
And it wasn't the biggest museum, but it was so fascinating.
912
And what I loved about this guy and this kind of relates to our conversation right about
913
statisticians writing novels.
914
This guy had no rules, right?
915
And he would, you know, he'd like wake up one day, he'd paint the Mona Lisa, he'd wake up
the next day, he'd like invent a new way to build a dam.
916
And no one was there to tell him like, hey, you should, you know, no, no, no, you're an
artist, like you should just stay painting all the time, or, my gosh, you're really good
917
at, you know, science, like you should, you know, just write scientific treatises.
918
Like he just decided he was going to do it all.
919
Now, obviously he was tremendously gifted and maybe there's not another person alive who
has ever been that multi-talented, but I just think that's so fascinating.
920
His scientific discoveries probably don't measure up to let's say Newton or Bacon or
Einstein, but I think as a person he's fascinating in the way that he's doing art.
921
He's doing science and the two can blend together.
922
And so I think for me, hands down, I'd just love to sit down and chat with him.
923
And where did you get your ideas from, right?
924
Where did they come from, this steady stream of insights? And you look him up, I mean, his
contributions across fields are staggering, right?
925
And, yeah, so he gets my vote.
926
If you can set that up too, that'd be great.
927
If you know how to bring dead people to life or whatever.
928
Well, I definitely will, and I will join the dinner because honestly, yeah, I think it's
a great choice.
929
Yeah, some other people have also made that choice.
930
So that will be a very interesting dinner.
931
I thought it was super original, but I guess not.
932
I mean, it is original.
933
It's not the bulk of the distribution, but you're not the first one.
934
Yeah, that's fine.
935
Yeah, it's a great one.
936
It would have been very original if you wanted to optimize that to say myself.
937
If you have the source of unlimited funding, I would definitely be saying that.
938
If you like Leonardo da Vinci stuff as I do, there is, in my native region in France,
939
his last house, which was gifted to him by the King of France,
Francis I.
940
And at the time, the King was in the small city of Amboise, which is in the Loire Valley.
941
And so if you go to, so I definitely recommend the region.
942
This is really.
943
It's really beautiful.
944
It's like, it's the same vibe as Tuscany since you know Italy, but without the mountains.
945
But it's great food, great wine, lots of history, lots of castles.
946
Leonardo da Vinci spent his last years in Amboise and his castle, which is called the Clos
Lucé, is actually a museum now that you can visit.
947
Lots of his inventions are there.
948
Even handwritten notes, always very original because he was writing with the left hand
from right to left.
949
So it's very hard to decipher actually.
950
There's an amazing park and there is also a bit of, there are some vines because he was
making some wine.
951
So yeah, definitely recommend that.
952
That's a really beautiful place.
953
That sounds fascinating.
954
And I didn't need another reason to go visit France, but you've given me another one, so I
will.
955
I'll have another excuse to visit for sure.
956
Yeah.
957
Thank you.
958
Also, it's not a very touristic one.
959
I mean, it is touristic, but it's mainly European tourism and some Americans.
960
I mean, now that I've talked on the podcast, of course it's going to become much more
touristic because I'm kind of an influencer, but it's just like...
961
Yeah.
962
All these really nerdy people will show up and...
963
Yeah, you know, trying to.
964
T-shirts and stuff.
965
Yeah.
966
Did you get a Stan T-shirt at the conference or?
967
No, I don't think there were T-shirts this year, but I have some cool stickers.
968
I have some cool stickers.
969
Okay, okay.
970
Yeah, they have stickers.
971
Yeah.
972
I got a T-shirt back when. But I will say the first Stan conference, I wouldn't say it was
the best, because I think probably the quality, you know, like, there are so many more people who
973
use Stan, but it was in California on the coast.
974
Like, they had a resort that was on the Pacific.
975
It was pretty, it was very pretty.
976
Yeah.
977
Yeah.
978
It was really fun.
979
But yeah.
980
No, this year it was at Oxford University.
981
So as expected, it was raining.
982
But the university is pretty cool to look at.
983
Yeah.
984
Yeah.
985
No, that's absolutely beautiful.
986
And, again, like, you know, you go to the UK, you expect it to be raining,
you know, so it's like going to France and not expecting some strikes.
987
It's like,
988
You're missing part of the experience, you know.
989
Awesome.
990
Well, Bob, I need to let you go.
991
I know you have another engagement, but thank you so much for taking the time.
992
That was absolutely great.
993
Awesome conversation.
994
I have still a gazillion questions for you, but let's do that when your next novel comes
around.
995
So you told me in about three months, right?
996
So that's awesome.
997
Oh my gosh.
998
And yeah, as usual, I put resources and a link to your website in the show notes for those
who want to dig deeper.
999
Thank you again, Bob, for taking the time and being on the show.
Speaker:
Thank you, man.
Speaker:
I had a great time and I'll definitely get everything else to you.
Speaker:
But thanks so much.
Speaker:
It was so fun to have this conversation.
Speaker:
You asked great questions.
Speaker:
You made me think a lot.
Speaker:
So I really appreciate it.
Speaker:
I hope you have a good week.
Speaker:
This has been another episode of Learning Bayesian Statistics.
Speaker:
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
learnbayesstats.com for more resources about today's topics, as well as access to more
Speaker:
episodes to help you reach a true Bayesian state of mind.
Speaker:
That's learnbayesstats.com.
Speaker:
Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
Speaker:
Check out his awesome work at bababrinkman.com.
Speaker:
I'm your host.
Speaker:
Alex Andorra.
Speaker:
You can follow me on Twitter at Alex underscore Andorra like the country.
Speaker:
You can support the show and unlock exclusive benefits by visiting Patreon.com slash
LearnBayesStats.
Speaker:
Thank you so much for listening and for your support.
Speaker:
You're truly a good Bayesian.
Speaker:
Change your predictions after taking information in and if you're thinking I'll be less
than amazing.
Speaker:
Let's adjust those expectations.
Speaker:
Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those
predictions that your brain is making, let's get them on a solid foundation.