Learning Bayesian Statistics

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Structural Equation Modeling (SEM) is a key framework in causal inference. As I’m diving deeper and deeper into these topics to teach them and, well, finally understand them, I was delighted to host Ed Merkle on the show.

A professor of psychological sciences at the University of Missouri, Ed discusses his work on Bayesian applications to psychometric models and model estimation, particularly in the context of Bayesian SEM. He explains the importance of BSEM in psychometrics and the challenges encountered in its estimation.

Ed also introduces his blavaan package in R, which enhances researchers’ capabilities in BSEM and has been instrumental in the dissemination of these methods. Additionally, he explores the role of Bayesian methods in forecasting and crowdsourcing wisdom.

When he’s not thinking about stats and psychology, Ed can be found running, playing the piano, or playing 8-bit video games.

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser and Julio.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

Takeaways:

– Bayesian SEM is a powerful framework in psychometrics that allows for the estimation of complex models involving multiple variables and causal relationships.

– Understanding the principles of Bayesian inference is crucial for effectively applying Bayesian SEM in psychological research.

– Informative priors play a key role in Bayesian modeling, providing valuable information and improving the accuracy of model estimates.

– Challenges in BSEM estimation include specifying appropriate prior distributions, dealing with unidentified parameters, and ensuring convergence of the model.

– Incorporating prior information is crucial in Bayesian modeling, especially when dealing with large models and imperfect data.

– The blavaan package enhances researchers’ capabilities in Bayesian structural equation modeling, providing a user-friendly interface and compatibility with existing frequentist models.

– Bayesian methods offer advantages in forecasting and subjective probability by allowing for the characterization of uncertainty and providing a range of predictions.

– Interpreting Bayesian model results requires careful consideration of the entire posterior distribution, rather than focusing solely on point estimates.

– Latent variable models, also known as structural equation models, play a crucial role in psychometrics, allowing for the estimation of unobserved variables and their influence on observed variables.

– Balancing the speed of MCMC estimation against the need for a slower, more thoughtful modeling process is a common challenge in the Bayesian workflow.

– The future of Bayesian psychometrics may involve advancements in parallel computing and GPU-accelerated MCMC algorithms.

Chapters:

00:00 Introduction to the Conversation

02:17 Background and Work on Bayesian SEM

04:12 Topics of Focus: Structural Equation Models

05:16 Introduction to Bayesian Inference

09:30 Importance of Bayesian SEM in Psychometrics

10:28 Overview of Bayesian Structural Equation Modeling (BSEM)

12:22 Relationship between BSEM and Causal Inference

15:41 Advice for Learning BSEM

21:57 Challenges in BSEM Estimation

34:40 The Impact of Model Size and Data Quality

37:07 The Development of the Blavaan Package

42:16 Bayesian Methods in Forecasting and Subjective Probability

46:27 Interpreting Bayesian Model Results

51:13 Latent Variable Models in Psychometrics

56:23 Challenges in the Bayesian Workflow

01:01:13 The Future of Bayesian Psychometrics

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.

Speaker:

Structural Equation Modeling, or SEM, is a

key framework in causal inference.

2

00:00:10,366 --> 00:00:15,406

As I'm diving deeper and deeper into these

topics to teach them and, well, finally

3

00:00:15,406 --> 00:00:19,946

understand them, I was delighted to host

Ed Merkle on the show.

4

00:00:19,946 --> 00:00:25,106

A professor of psychological sciences at

the University of Missouri, Ed discusses

5

00:00:25,106 --> 00:00:29,518

his work on Bayesian applications to

psychometric models and model estimation,

6

00:00:29,518 --> 00:00:32,738

particularly in the context of Bayesian

SEM.

7

00:00:32,738 --> 00:00:37,818

He explains the importance of Bayesian SEM

in psychometrics and the challenges

8

00:00:37,818 --> 00:00:39,918

encountered in its estimation.

9

00:00:39,918 --> 00:00:44,518

Ed also introduces his blavaan package in

R, which enhances researchers'

10

00:00:44,518 --> 00:00:49,908

capabilities in Bayesian SEM and has been

instrumental in the dissemination of these

11

00:00:49,908 --> 00:00:50,858

methods.

12

00:00:51,246 --> 00:00:55,486

Additionally, he explores the role of

Bayesian methods in forecasting and

13

00:00:55,486 --> 00:01:00,166

crowdsourcing wisdom, and when he's not

thinking about stats and psychology, Ed

14

00:01:00,166 --> 00:01:04,946

can be found running, playing the piano,

or playing 8-bit video games.

15

00:01:04,946 --> 00:01:11,426

This is Learning Bayesian Statistics,

episode 102, recorded February 14, 2024.

16

00:01:12,238 --> 00:01:33,078

Welcome to Learning Bayesian Statistics, a

podcast about Bayesian inference, the

17

00:01:33,078 --> 00:01:36,578

methods, the projects, and the people who

make it possible.

18

00:01:36,578 --> 00:01:38,778

I'm your host, Alex Andorra.

19

00:01:38,778 --> 00:01:42,318

You can follow me on Twitter at alex

underscore andorra,

20

00:01:42,318 --> 00:01:43,148

like the country.

21

00:01:43,148 --> 00:01:47,478

For any info about the show, learnbayesstats.com

is the place to be.

22

00:01:47,478 --> 00:01:52,258

Show notes, becoming a corporate sponsor,

unlocking Bayesian merch, supporting the

23

00:01:52,258 --> 00:01:54,918

show on Patreon, everything is in there.

24

00:01:54,918 --> 00:01:56,738

That's learnbayesstats.com.

25

00:01:56,738 --> 00:02:01,188

If you're interested in one-on-one

mentorship, online courses, or statistical

26

00:02:01,188 --> 00:02:06,388

consulting, feel free to reach out and

book a call at topmate.io slash alex

27

00:02:06,388 --> 00:02:08,358

underscore andorra.

28

00:02:08,358 --> 00:02:12,126

See you around, folks, and best Bayesian

wishes to you all.

29

00:02:17,166 --> 00:02:20,666

Thank you for having me.

30

00:02:20,666 --> 00:02:21,686

Yeah, you bet.

31

00:02:21,686 --> 00:02:24,046

Thanks a lot for taking the time.

32

00:02:24,046 --> 00:02:29,086

I am really happy to have you on and I

have a lot of questions.

33

00:02:29,086 --> 00:02:31,106

So that is perfect.

34

00:02:31,106 --> 00:02:36,916

Before that, as usual, how would you

define the work you're doing nowadays and

35

00:02:36,916 --> 00:02:39,066

how did you end up working on this?

36

00:02:40,366 --> 00:02:46,780

Well, a lot of my work right now is with

37

00:02:46,894 --> 00:02:52,234

Bayesian applications to psychometric

models and model estimation.

38

00:02:53,134 --> 00:03:00,264

Over time, I've gotten more and more into

the model estimation and computation as

39

00:03:00,264 --> 00:03:02,554

opposed to applications.

40

00:03:02,954 --> 00:03:06,794

And it was a slow process to get here.

41

00:03:06,794 --> 00:03:12,274

I started doing some Bayesian modeling

when I was working on my PhD.

42

00:03:12,274 --> 00:03:15,438

I finished that in 2005 and...

43

00:03:15,438 --> 00:03:21,818

I felt a bit restricted by what I could do

with the tools I had at that time, but

44

00:03:21,818 --> 00:03:24,208

things have improved a lot since then.

45

00:03:24,208 --> 00:03:27,018

And also I've learned a lot since then.

46

00:03:27,058 --> 00:03:33,438

So I have over time left some things and

come back to them.

47

00:03:33,438 --> 00:03:39,598

And when I come back to them, I find

there's more progress that can be made.

48

00:03:40,498 --> 00:03:42,338

Yeah, that makes sense.

49

00:03:42,358 --> 00:03:43,822

And that's always super...

50

00:03:43,822 --> 00:03:50,512

interesting and inspiring to see such

diverse backgrounds on the show.

51

00:03:50,512 --> 00:03:52,982

I'm always happy to see that.

52

00:03:53,122 --> 00:03:59,062

And by the way, thanks a lot to Jorge

Sinval for doing the introduction.

53

00:03:59,082 --> 00:04:02,682

Today is February 14th and he was our

matchmaker.

54

00:04:02,942 --> 00:04:05,702

So thanks a lot, Jorge.

55

00:04:05,802 --> 00:04:10,022

And yeah, like this promises to be a great

episode.

56

00:04:10,022 --> 00:04:12,072

So thanks a lot for the suggestion.

57

00:04:12,470 --> 00:04:20,060

And Ed, actually, could you tell us the

topics that you are particularly focusing

58

00:04:20,060 --> 00:04:21,090

on?

59

00:04:22,530 --> 00:04:29,060

Yeah, recently, so in psychology,

psychometrics, education, there's this

60

00:04:29,060 --> 00:04:32,010

class of models, structural equation

models.

61

00:04:32,010 --> 00:04:39,410

It's a pretty large class of models and I

think some special cases have been really

62

00:04:39,410 --> 00:04:40,366

useful.

63

00:04:40,366 --> 00:04:45,116

Others sometimes get a bad reputation

with, I think, certain groups of

64

00:04:45,116 --> 00:04:46,486

statistics people.

65

00:04:46,486 --> 00:04:52,216

But it's this big class and it has

interested me for a long time because so

66

00:04:52,216 --> 00:04:56,246

much can be done with this class of

models.

67

00:04:56,246 --> 00:05:03,666

So the Bayesian estimation part has

especially been interesting to me because

68

00:05:03,666 --> 00:05:07,374

it was relatively underexplored for a long

time.

69

00:05:07,374 --> 00:05:13,014

And there's some unique challenges there

that I have found and I've tried to make

70

00:05:13,014 --> 00:05:14,694

some progress on.

71

00:05:16,814 --> 00:05:17,314

Yeah.

72

00:05:17,314 --> 00:05:24,234

And we're going to dive into these topics

for sure in the coming minutes.

73

00:05:24,394 --> 00:05:31,054

But to still talk about your background,

do you remember how you first got

74

00:05:31,054 --> 00:05:36,594

introduced to Bayesian inference and also

why it stuck with you?

75

00:05:36,854 --> 00:05:38,014

Yes.

76

00:05:40,114 --> 00:05:46,350

I think part of how I got interested in

Bayesian inference,

77

00:05:46,350 --> 00:05:51,850

starts a lot earlier to when I was growing

up.

78

00:05:51,850 --> 00:05:57,130

I'm about the age where the first half of

my childhood, there were no computers.

79

00:05:57,130 --> 00:06:03,560

And the second half of growing up,

computers were in people's houses, the

80

00:06:03,560 --> 00:06:05,680

internet was coming around and so on.

81

00:06:05,680 --> 00:06:12,530

So I grew up with having a computer in my

house for the first time.

82

00:06:12,530 --> 00:06:13,326

And then...

83

00:06:13,326 --> 00:06:17,466

just messing around with it and learning

how to do things on it.

84

00:06:17,466 --> 00:06:24,426

So then later, a while later when I was

working on my PhD, I grew up with the

85

00:06:24,426 --> 00:06:28,786

computing topics and I enjoyed that.

86

00:06:29,466 --> 00:06:36,306

So I felt at the time with Bayesian

estimation, some of the interesting

87

00:06:36,306 --> 00:06:41,390

computing things were coming out around

the time I was working on my PhD.

88

00:06:41,390 --> 00:06:48,650

So for example, WinBUGS was a big thing,

say around 2000, 2001 or so.

89

00:06:48,650 --> 00:06:51,850

That was when I was starting to work on my

PhD.

90

00:06:52,630 --> 00:06:57,520

And that seemed like a fun little program

where you could build these models and do

91

00:06:57,520 --> 00:06:59,370

some Bayesian estimation.

92

00:06:59,550 --> 00:07:03,590

At the time, I didn't always know exactly

what I was doing, but I still found it

93

00:07:03,590 --> 00:07:09,448

interesting and perhaps a bit more

intuitive than some of the other.

94

00:07:09,448 --> 00:07:12,158

methods that were out there at the time.

95

00:07:13,078 --> 00:07:13,918

Yeah.

96

00:07:13,918 --> 00:07:22,768

And actually it seems like you've been

part of that movement, which introduced

97

00:07:22,768 --> 00:07:27,958

Bayesian stats a lot in the psychological

sciences.

98

00:07:28,318 --> 00:07:34,818

Can you elaborate on the role of the

Bayesian framework in psychological

99

00:07:34,818 --> 00:07:35,858

research?

100

00:07:35,858 --> 00:07:39,118

Always a hard word to say when you have a

French accent.

101

00:07:39,118 --> 00:07:43,118

I understand.

102

00:07:45,638 --> 00:07:52,638

So yeah, when I was working on my PhD, I

think there was not a lot of psychology

103

00:07:52,638 --> 00:07:56,738

applications necessarily, or maybe it was

just in certain areas.

104

00:07:56,738 --> 00:08:03,478

So when I started on my PhD, I was doing

like some cognitive psychology modeling

105

00:08:03,478 --> 00:08:05,422

where you would bring.

106

00:08:05,422 --> 00:08:10,272

someone into a room for an experiment and

it could be about memory or something

107

00:08:10,272 --> 00:08:15,442

where you have them remember a list of

words and then you give them a new list of

108

00:08:15,442 --> 00:08:19,422

words and ask them which did you see

before and which are new and then you can

109

00:08:19,422 --> 00:08:23,182

model people's response times or accuracy.

110

00:08:24,542 --> 00:08:28,902

So there were some Bayesian applications

definitely related to like memory modeling

111

00:08:28,902 --> 00:08:33,702

at that time but more generally there were

less applications.

112

00:08:33,806 --> 00:08:39,886

I did my PhD on some Bayesian structural

equation modeling applications to missing

113

00:08:39,886 --> 00:08:41,086

data.

114

00:08:41,126 --> 00:08:45,786

At the time, I had a really hard time

publishing that work.

115

00:08:45,786 --> 00:08:49,966

I think it was partly because I just

wasn't that great at writing papers at the

116

00:08:49,966 --> 00:08:53,486

time, but also there weren't as many

Bayesian applications.

117

00:08:53,486 --> 00:08:56,686

So I think people were less interested.

118

00:08:57,166 --> 00:09:00,782

But over time that has changed, I think

with...

119

00:09:00,782 --> 00:09:05,482

with improved tools and more attention to

Bayesian modeling.

120

00:09:05,542 --> 00:09:08,502

You see it more and more in psychology.

121

00:09:08,502 --> 00:09:13,242

Sometimes it's just an alternative to

frequentism.

122

00:09:13,242 --> 00:09:18,302

Like if you're doing a regression or a

mixed model, Bayesian is just an

123

00:09:18,302 --> 00:09:19,602

alternative.

124

00:09:20,322 --> 00:09:24,022

Other times, like for the structural

equation models, there can be some

125

00:09:24,022 --> 00:09:28,492

advantages to the Bayesian approach,

especially related to characterizing

126

00:09:28,492 --> 00:09:29,802

uncertainty.

127

00:09:30,125 --> 00:09:34,995

And so I think there's more and more

attention in psychology and psychometrics

128

00:09:34,995 --> 00:09:36,845

to some of those issues.

129

00:09:37,985 --> 00:09:39,365

Yeah.

130

00:09:39,845 --> 00:09:46,645

And definitely interesting to see, to hear

that the publishing has, has gotten, has

131

00:09:46,645 --> 00:09:49,545

become easier, at least for you.

132

00:09:49,745 --> 00:09:58,701

And a method you're especially working on

and developing is Bayesian structural

133

00:09:58,701 --> 00:10:01,541

equation modeling or BSEM.

134

00:10:01,541 --> 00:10:04,351

So we've never covered that yet on the

show.

135

00:10:04,351 --> 00:10:09,071

So could you give our listeners a primer

on BSEM and its importance in

136

00:10:09,071 --> 00:10:10,201

psychometrics?

137

00:10:11,261 --> 00:10:12,441

Yes.

138

00:10:12,781 --> 00:10:18,801

So this Bayesian structural equation

modeling framework, or maybe I can start

139

00:10:18,801 --> 00:10:28,253

with just the structural equation modeling

part, that overlaps with lots of other

140

00:10:28,253 --> 00:10:30,693

modeling frameworks.

141

00:10:31,113 --> 00:10:36,463

So item response models and factor

analysis models, these are more on the

142

00:10:36,463 --> 00:10:44,593

measurement side, examining how say some

tests or scales help us to measure a

143

00:10:44,593 --> 00:10:46,193

person's aptitude.

144

00:10:46,313 --> 00:10:51,913

Those could all be viewed as special cases

of structural equation models, but the

145

00:10:51,913 --> 00:10:55,113

heart of structural equation models

involves,

146

00:10:57,365 --> 00:11:03,365

Like a series of regression models all in

one big model.

147

00:11:03,365 --> 00:11:09,415

So if you know, like the directed

acyclic graphs that come from causal

148

00:11:09,415 --> 00:11:18,995

research, especially Judea Pearl, you can

think of structural equation models as a

149

00:11:18,995 --> 00:11:21,155

way to estimate those types of models.

150

00:11:21,155 --> 00:11:24,173

Like these graphs will often have many

variables.

151

00:11:24,173 --> 00:11:29,313

and you have arrows between variables that

reflect some causal relationships.

152

00:11:29,333 --> 00:11:33,493

Well, now structural equation models are

throwing likelihoods on top of that,

153

00:11:33,493 --> 00:11:36,613

typically normal likelihoods.

154

00:11:36,613 --> 00:11:41,393

And that gives us a way to fit these sorts

of models to data.

155

00:11:41,613 --> 00:11:47,673

Whereas with a directed acyclic graph,

you look at that and that helps you

156

00:11:47,673 --> 00:11:53,197

to know what is estimable and what is not

estimable, say.

157

00:11:53,197 --> 00:11:58,537

and now the structural equation model is

a way to fit that sort of thing to data.

158

00:11:59,117 --> 00:12:02,617

But it also overlaps with mixed models.

159

00:12:04,537 --> 00:12:09,067

Like I said, the item response models,

there's some ideas related to principal

160

00:12:09,067 --> 00:12:10,937

components in there.

161

00:12:11,197 --> 00:12:13,717

It overlaps with a lot of things.
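
(To make that concrete, here is a minimal sketch of "a series of regressions with normal likelihoods" written in PyMC. This is not blavaan's implementation, just an illustration: the data are simulated placeholders and all variable names and priors are made up.)

```python
# A toy SEM: one latent variable eta, measured by three indicators,
# and regressed on an observed predictor x.
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)                                   # observed predictor
y = np.column_stack([0.8 * x + rng.normal(size=n)
                     for _ in range(3)])                  # three indicators (placeholder data)

with pm.Model() as sem:
    # structural part: latent eta regressed on x
    beta = pm.Normal("beta", 0, 1)
    eta = pm.Normal("eta", mu=beta * x, sigma=1, shape=n)  # residual sd fixed to set the latent scale

    # measurement part: the indicators load on eta
    lam = pm.Normal("lam", 1, 0.5, shape=3)                 # loadings; prior centered at 1 helps identification
    sigma = pm.HalfNormal("sigma", 1, shape=3)
    pm.Normal("y", mu=eta[:, None] * lam, sigma=sigma, observed=y)

    idata = pm.sample()
```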

162

00:12:14,437 --> 00:12:22,729

Yeah, that's really interesting to have

that take of yours on

163

00:12:22,885 --> 00:12:30,885

structural equation modeling and the

relationship to causal inference in a way.

164

00:12:31,145 --> 00:12:38,025

And so as you were saying, it also relates

to Judea Pearl's do-calculus and things like

165

00:12:38,025 --> 00:12:38,765

that.

166

00:12:38,765 --> 00:12:44,915

So I definitely encourage the listener to

dive deeper into this literature, which is

167

00:12:44,915 --> 00:12:45,775

absolutely fascinating.

168

00:12:45,775 --> 00:12:46,805

I really love that.

169

00:12:46,805 --> 00:12:51,949

And also, from my own perspective,

learning about those

170

00:12:51,949 --> 00:12:58,949

things recently, I found that it was way

easier being already a Bayesian.

171

00:12:58,949 --> 00:13:05,259

If you already do Bayesian models from a

generative modeling perspective, then

172

00:13:05,259 --> 00:13:11,489

intervening on the graph, like

in do-calculus, doing an intervention is

173

00:13:11,489 --> 00:13:16,169

basically like doing posterior predictive

sampling as you were already doing on your

174

00:13:16,169 --> 00:13:17,229

Bayesian model.

175

00:13:17,229 --> 00:13:20,009

But instead of having already

176

00:13:20,109 --> 00:13:30,439

conditioned on some data, you come up with

the platonic idea of the data generative

177

00:13:30,439 --> 00:13:31,719

model that you have in mind.

178

00:13:31,719 --> 00:13:37,049

And then you intervene on the model by

setting some values on some of the nodes

179

00:13:37,049 --> 00:13:41,719

and then seeing what that gives you, what

that intervention gives you on the

180

00:13:41,719 --> 00:13:42,549

outcome.

181

00:13:42,549 --> 00:13:47,749

And I find that really, really natural to

learn already from a Bayesian perspective.
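
(A tiny sketch of that idea, with entirely made-up numbers: forward-simulate a small generative graph, then "intervene" by fixing one node instead of drawing it from its parents.)

```python
# Sketch only: a three-node graph z -> x -> y, z -> y, with illustrative coefficients.
# An intervention do(x = value) simply replaces x's own generating equation.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, do_x=None):
    z = rng.normal(size=n)
    x = 0.7 * z + rng.normal(size=n) if do_x is None else np.full(n, do_x)
    y = 0.5 * x - 0.3 * z + rng.normal(size=n)
    return y

print(simulate(10_000).mean())            # y under the observational model
print(simulate(10_000, do_x=1.0).mean())  # y under the intervention do(x = 1)
```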

182

00:13:47,749 --> 00:13:49,837

I don't know what your experience has

been.

183

00:13:49,837 --> 00:13:57,637

Oh, yeah, I think the Bayesian perspective

really helps you keep these models at like

184

00:13:57,637 --> 00:14:00,157

the raw data level.

185

00:14:00,157 --> 00:14:06,367

So you're thinking about how do individual

variables cause other variables and what

186

00:14:06,367 --> 00:14:08,897

does that mean about data predictions?

187

00:14:09,297 --> 00:14:14,177

If you look at how frequentists often

present these models...

188

00:14:16,749 --> 00:14:19,769

We have something like random effects in

these models.

189

00:14:19,769 --> 00:14:24,449

And so from a frequentist perspective, you

wanna get rid of those random effects,

190

00:14:24,449 --> 00:14:26,329

marginalize them out of a model.

191

00:14:26,329 --> 00:14:31,509

And then for these models, we're left with

some structured covariance matrix.

192

00:14:31,509 --> 00:14:35,849

And often the frequentist will start with,

okay, you have an observed covariance

193

00:14:35,849 --> 00:14:39,629

matrix and then our model implies a

covariance matrix.

194

00:14:39,629 --> 00:14:42,893

But I find that so it's...

195

00:14:42,893 --> 00:14:46,493

it's unintuitive to think about compared

to raw data.

196

00:14:46,493 --> 00:14:52,403

You know, like I can see how the data from

one variable can influence another

197

00:14:52,403 --> 00:14:56,673

variable, but now to think about what does

that mean about the prediction for a

198

00:14:56,673 --> 00:15:02,073

covariance that I think makes it less

intuitive and that's really where some of

199

00:15:02,073 --> 00:15:04,573

the Bayesian models have an advantage.

200

00:15:05,613 --> 00:15:06,673

Yeah, yeah, definitely.

201

00:15:06,673 --> 00:15:11,725

And that's why my learning myself on

202

00:15:11,725 --> 00:15:17,185

on this front and also teaching about

these topics has been extremely helpful

203

00:15:17,185 --> 00:15:23,185

for myself because to teach it, you really

have to understand it really well.

204

00:15:23,205 --> 00:15:28,175

So that was a great... Or, said differently,

you don't understand it until you

205

00:15:28,175 --> 00:15:29,125

teach it.

206

00:15:29,125 --> 00:15:34,385

I've thought that I understood things

before, but then when I teach it, I

207

00:15:34,385 --> 00:15:37,725

realized, well, I didn't quite understand

everything.

208

00:15:39,025 --> 00:15:39,805

Yeah, for sure.

209

00:15:39,805 --> 00:15:41,069

Definitely.

210

00:15:41,069 --> 00:15:52,369

And what advice would you give to someone

who is already a Bayesian and wants to

211

00:15:52,369 --> 00:15:57,599

learn about this structural equation

modeling, and to someone who is already

212

00:15:57,599 --> 00:16:01,989

doing psychometrics and would like to now

learn about this structural equation

213

00:16:01,989 --> 00:16:02,339

modeling?

214

00:16:02,339 --> 00:16:06,889

What advice would you give to help them

start on this path?

215

00:16:07,509 --> 00:16:09,231

Yeah, I think.

216

00:16:10,541 --> 00:16:13,701

For people who already know Bayesian

models.

217

00:16:17,837 --> 00:16:24,817

I think I would explain structural

equation models as like a combination of

218

00:16:24,817 --> 00:16:30,497

say principal components or factor

analysis and then regression.

219

00:16:30,877 --> 00:16:37,357

And I think you can, there's these

expressions for the structural equation

220

00:16:37,357 --> 00:16:42,107

modeling framework where you have these

big matrices and depending on what goes in

221

00:16:42,107 --> 00:16:45,133

the matrices, you get certain models.

222

00:16:45,133 --> 00:16:50,093

I would almost advise against starting

there because you can have this giant

223

00:16:50,093 --> 00:16:57,143

framework that's expressing matrices, but

it gets very confusing about what goes in

224

00:16:57,143 --> 00:17:01,913

what matrix or what does this mean from a

general perspective.

225

00:17:01,913 --> 00:17:07,513

I would almost advise starting smaller,

say with some factor analysis models, or

226

00:17:07,513 --> 00:17:13,073

you can have these models where there's

one unobserved variable regressed on

227

00:17:13,073 --> 00:17:15,181

another unobserved variable.

228

00:17:15,181 --> 00:17:20,441

I would say like starting with some of

those models and then working your way up.

229

00:17:20,701 --> 00:17:25,171

On the other hand, if someone already

knows the psychometric models and is

230

00:17:25,171 --> 00:17:32,481

moving to Bayesian modeling, I think the

challenge is to think of these models

231

00:17:32,481 --> 00:17:36,741

again as models of data, not as models of

a covariance matrix.

232

00:17:36,741 --> 00:17:39,881

I guess that's related to what we talked

about earlier.

233

00:17:39,881 --> 00:17:45,229

But if you know the frequentist models,

typically the

234

00:17:45,229 --> 00:17:51,549

just how they talk about these models

involves just a covariance matrix or

235

00:17:51,549 --> 00:17:57,839

tricks for marginalizing over the random

effects or the random parameters in the

236

00:17:57,839 --> 00:17:58,729

model.

237

00:17:58,729 --> 00:18:04,289

And I think taking a step back and looking

at what does the model say about the data

238

00:18:04,289 --> 00:18:09,209

before we try to get rid of these random

parameters, I think that is helpful for

239

00:18:09,209 --> 00:18:11,809

thinking through the Bayesian approach.

240

00:18:12,229 --> 00:18:13,569

Okay, yeah.

241

00:18:13,569 --> 00:18:14,989

Yeah, super interesting.

242

00:18:14,989 --> 00:18:24,819

And then I would also want to ask you,

once you've done that, so once

243

00:18:24,819 --> 00:18:33,989

you're into BSEM why is that useful and

what is its importance in your field of

244

00:18:33,989 --> 00:18:35,817

psychometrics these days?

245

00:18:37,549 --> 00:18:47,459

Yeah, so the Bayesian part, I would say

one use is, I think it slows you down a

246

00:18:47,459 --> 00:18:47,809

bit.

247

00:18:47,809 --> 00:18:51,389

There are certain steps, say, specifying prior

distributions and really thinking through

248

00:18:51,389 --> 00:18:53,189

the prior distributions.

249

00:18:53,189 --> 00:18:56,829

This is something you don't encounter on

the frequentist side.

250

00:18:56,829 --> 00:19:01,409

It's going to slow you down, but I think

for these models, that ends up being

251

00:19:01,409 --> 00:19:03,781

useful because...

252

00:19:04,309 --> 00:19:09,839

You know, if you simulate data from priors

and really look at what are these priors

253

00:19:09,839 --> 00:19:15,139

saying about the sort of data I can

expect, I find that helps you understand

254

00:19:15,139 --> 00:19:21,889

these models in a way that you don't often

get from the frequentist side.
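
(Here is the kind of quick prior check Ed is describing, sketched in PyMC with a deliberately vague, made-up prior; the point is simply to look at the data the prior implies before seeing any real data.)

```python
# Sketch: simulate from the priors alone and inspect the implied data.
import pymc as pm

with pm.Model():
    loading = pm.Normal("loading", 0, 10)     # a "vague" prior on a loading
    sigma = pm.HalfNormal("sigma", 5)
    # implied indicator value for a hypothetical respondent with eta = 1
    pm.Normal("indicator", mu=loading * 1.0, sigma=sigma)

    prior = pm.sample_prior_predictive(500)

# With loading ~ Normal(0, 10), the prior allows wildly implausible
# indicator values -- a hint that the prior says more than intended.
print(prior.prior["indicator"].quantile([0.05, 0.95]).values)
```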

255

00:19:24,397 --> 00:19:32,817

And then I guess said differently, I think

over say the past 30, 40 years with these

256

00:19:32,817 --> 00:19:38,057

structural equation models, I think often

in the field we've come to expect that I

257

00:19:38,057 --> 00:19:42,567

can specify this giant model and hit a

button and run it.

258

00:19:42,567 --> 00:19:48,477

And then I get some results and report

just a few results from this big model.

259

00:19:48,477 --> 00:19:52,723

I think we've lost something with

understanding what.

260

00:19:52,723 --> 00:19:55,653

exactly as this model is saying about the

data.

261

00:19:55,653 --> 00:20:00,133

And that's a place where the Bayesian

versions of these models can be really

262

00:20:00,133 --> 00:20:01,193

helpful.

263

00:20:02,673 --> 00:20:06,963

I think there was a second part to your

question, but I forgot the second part.

264

00:20:06,963 --> 00:20:12,953

Yeah, what is the importance of BSEM these

days in psychometrics?

265

00:20:13,413 --> 00:20:14,813

Yeah, yeah.

266

00:20:16,013 --> 00:20:19,593

I think there's a couple, I think key

advantages.

267

00:20:19,653 --> 00:20:24,143

One, again, we have random parameters that

are sort of like random effects if you

268

00:20:24,143 --> 00:20:25,873

know mixed models.

269

00:20:26,933 --> 00:20:33,183

And with MCMC, we can sample these

parameters and characterize their

270

00:20:33,183 --> 00:20:38,213

uncertainty or allow the uncertainty in

these random parameters to filter through

271

00:20:38,213 --> 00:20:40,153

to other model predictions.

272

00:20:40,153 --> 00:20:44,633

That's something that's very natural to do

from a Bayesian perspective.

273

00:20:45,421 --> 00:20:48,081

potentially not from other perspectives.

274

00:20:50,661 --> 00:20:54,081

So there's a random parameter piece.

275

00:20:54,081 --> 00:20:59,371

Another thing that people talk about a lot

is fitting these models to smaller sample

276

00:20:59,371 --> 00:21:00,461

sizes.

277

00:21:00,461 --> 00:21:05,591

So for some of these structural equation

models, there's a lot happening and you

278

00:21:05,591 --> 00:21:10,621

can get these failures to converge if

you're estimating frequentist versions of

279

00:21:10,621 --> 00:21:11,821

the model.

280

00:21:12,421 --> 00:21:13,861

Bayesian models,

281

00:21:14,093 --> 00:21:15,653

can still work there.

282

00:21:15,653 --> 00:21:19,513

I think you still have to be careful

because of course if you don't have much

283

00:21:19,513 --> 00:21:26,663

data, the priors are going to be more

influential and sensitivity analyses and

284

00:21:26,663 --> 00:21:28,453

things become very important.

285

00:21:28,453 --> 00:21:35,533

So I think it's not just a full solution

to if you don't have much data, but I

286

00:21:35,533 --> 00:21:39,793

think you can make some progress there

with Bayesian models that are maybe more

287

00:21:39,793 --> 00:21:42,113

difficult with frequentist models.

288

00:21:44,461 --> 00:21:45,381

Okay, I see.

289

00:21:45,381 --> 00:21:50,561

And on the other end, what are some of the

biggest challenges you've encountered in

290

00:21:50,561 --> 00:21:54,853

BSEM estimation and how does your work

address them?

291

00:21:57,197 --> 00:22:06,697

I've found I encounter problems as I'm

working on my R package or just

292

00:22:06,697 --> 00:22:08,037

estimating the models.

293

00:22:08,037 --> 00:22:12,417

There's a number of problems that aren't

completely evident when you start.

294

00:22:12,457 --> 00:22:19,617

And one I've worked on recently and I

continue to work on is specifying prior

295

00:22:19,617 --> 00:22:25,407

distributions for these models in a way

that you know exactly what the prior

296

00:22:25,407 --> 00:22:27,085

distributions are.

297

00:22:27,085 --> 00:22:31,785

in a non-software-dependent way.

298

00:22:32,405 --> 00:22:41,565

So in some of these models, there's, say

there's a covariance matrix, a free

299

00:22:41,565 --> 00:22:42,095

parameter.

300

00:22:42,095 --> 00:22:45,165

So you're estimating a full covariance

matrix.

301

00:22:45,165 --> 00:22:52,685

Now, in certain cases of these models, I'm

going to fix some off diagonal elements of

302

00:22:52,685 --> 00:22:55,103

this covariance matrix to zero.

303

00:22:55,245 --> 00:22:59,625

but then I want to freely estimate the

rest of this covariance matrix.

304

00:22:59,745 --> 00:23:06,425

That becomes very difficult when you're

specifying prior distributions now because

305

00:23:06,425 --> 00:23:10,465

we have to keep this full covariance

matrix positive definite.

306

00:23:10,625 --> 00:23:14,805

And I have prior distributions for like an

unrestricted covariance matrix.

307

00:23:14,805 --> 00:23:18,245

You could do a Wishart or an LKJ, say.

308

00:23:18,245 --> 00:23:22,925

But to have this covariance matrix where

some of the entries are, say, fixed to

309

00:23:22,925 --> 00:23:23,917

zero,

310

00:23:23,917 --> 00:23:28,097

but I still have to keep this full

covariance matrix positive definite.

311

00:23:28,117 --> 00:23:31,657

The prior distributions become very

challenging there.

312

00:23:31,657 --> 00:23:37,797

And there's some workarounds that are, I

would say, allow you to estimate the

313

00:23:37,797 --> 00:23:42,447

model, but make it difficult to describe

exactly what prior distribution did you

314

00:23:42,447 --> 00:23:43,837

use here.

315

00:23:44,117 --> 00:23:48,577

That's a piece that continues to challenge

me.

316

00:23:50,097 --> 00:23:53,097

Yeah, and so what are you,

317

00:23:53,483 --> 00:23:57,305

what are you working on these days to try and

address that.

318

00:23:59,213 --> 00:24:00,293

Um

319

00:24:01,889 --> 00:24:08,209

I've been, I've looked at some ways to

decompose a covariance matrix.

320

00:24:08,209 --> 00:24:12,759

So let's say the Cholesky factors or

things, and we have put prior

321

00:24:12,759 --> 00:24:17,989

distributions on some decomposition of

this covariance matrix so that it's easy

322

00:24:17,989 --> 00:24:26,199

to put, say, some normal priors on the

elements of the decomposition while

323

00:24:26,199 --> 00:24:30,209

maintaining this positive definite full

covariance matrix.

324

00:24:30,689 --> 00:24:31,321

And,

325

00:24:31,437 --> 00:24:38,527

I think I made some progress there, but

then you get into this situation where I

326

00:24:38,527 --> 00:24:42,637

want to put my prior distributions on

intuitive things.

327

00:24:42,677 --> 00:24:50,617

If I get to like some Cholesky factor that

might have some intuitive interpretation,

328

00:24:51,057 --> 00:24:52,877

but sometimes maybe not.

329

00:24:52,877 --> 00:24:56,677

And you run into this problem then of,

okay, if I want to put a prior

330

00:24:56,677 --> 00:24:58,517

distribution on this.

331

00:24:59,085 --> 00:25:03,205

could I meaningfully do that or could a

user meaningfully do that versus they

332

00:25:03,205 --> 00:25:08,145

would just use some default because they

don't know what else they would put on

333

00:25:08,145 --> 00:25:08,945

that.

334

00:25:08,945 --> 00:25:12,185

That becomes a bit of a problem too.

335

00:25:12,545 --> 00:25:14,605

Yeah, yeah.

336

00:25:15,265 --> 00:25:23,915

That's definitely also something I have to

handle when I am teaching these kinds of

337

00:25:23,915 --> 00:25:25,085

decompositions.

338

00:25:26,285 --> 00:25:28,069

Like usually the way I...

339

00:25:28,109 --> 00:25:34,069

teach that is when you do that in a linear

regression, for instance, and you would

340

00:25:34,069 --> 00:25:40,619

try and infer not only the intercept and

the slope, but the correlation of

341

00:25:40,619 --> 00:25:42,249

intercept and slope.

342

00:25:42,289 --> 00:25:48,089

And so that way, if the intercept, like if

you have a negative covariance matrix, for

343

00:25:48,089 --> 00:25:50,489

instance, that's inferred between the

intercept and the slope.

344

00:25:50,489 --> 00:25:54,559

That means, well, if you observe a group

and if you do that in a hierarchical

345

00:25:54,559 --> 00:25:57,801

model, particularly, that's very useful.

346

00:25:57,801 --> 00:26:02,581

Because that means, well, if I'm in a

group of the hierarchical model where the

347

00:26:02,581 --> 00:26:07,761

intercepts are high, that probably means

that the slopes are low.

348

00:26:08,541 --> 00:26:12,721

So, because we have that negative

covariation.

349

00:26:13,601 --> 00:26:18,051

And that's interesting because that allows

the model to squeeze even more information

350

00:26:18,051 --> 00:26:22,501

from the data and so make even more

informed and accurate predictions.

351

00:26:22,501 --> 00:26:26,541

But of course, to do that, the challenge,

352

00:26:26,541 --> 00:26:31,091

is that you have to infer a covariance

matrix between the intercept and the

353

00:26:31,091 --> 00:26:31,461

slope.

354

00:26:31,461 --> 00:26:35,761

How do you infer that covariance matrix?

That usually tends to be hard and

355

00:26:35,761 --> 00:26:37,161

computationally intensive?

356

00:26:37,161 --> 00:26:41,421

And so that's where the decomposition of

the covariance matrix comes into play.

357

00:26:41,421 --> 00:26:47,441

So especially the Cholesky decomposition of

the covariance matrix, that's what we

358

00:26:47,441 --> 00:26:50,201

usually recommend doing in PyMC.

359

00:26:50,341 --> 00:26:55,747

And we have that pm.LKJCholeskyCov

distribution.

360

00:26:56,141 --> 00:27:05,671

And to parametrize that, you have to give

a prior on the correlation matrix, which

361

00:27:05,671 --> 00:27:06,801

is a bit weird.

362

00:27:06,801 --> 00:27:11,001

But when you think about it, when people

think about it, it's like, wait, a prior as

363

00:27:11,001 --> 00:27:17,051

a distribution... understanding a prior as a

distribution on a correlation matrix is

364

00:27:17,051 --> 00:27:18,581

hard to understand.

365

00:27:18,981 --> 00:27:22,157

But actually, when you decompose, it's not

that hard.

366

00:27:22,157 --> 00:27:25,947

because it's mainly, well, what's the

parameter that's inside a correlation

367

00:27:25,947 --> 00:27:26,497

matrix?

368

00:27:26,497 --> 00:27:30,937

That's the parameter that says there is a

correlation between A and B.

369

00:27:30,937 --> 00:27:36,147

And so what is your a priori belief of

that correlation between the intercept and

370

00:27:36,147 --> 00:27:37,137

the slope?

371

00:27:37,857 --> 00:27:43,717

And so usually you don't want the

completely flat prior, which says any

372

00:27:43,717 --> 00:27:46,117

correlation is possible with the same

degree of belief.

373

00:27:46,117 --> 00:27:51,173

So that means I really think that there is

as much possibility

374

00:27:51,213 --> 00:27:57,353

for slopes and intercepts to be completely

positively correlated as there is a

375

00:27:57,353 --> 00:27:59,453

possibility for them to be not at all correlated.

376

00:27:59,453 --> 00:28:00,373

I'm not sure.

377

00:28:00,373 --> 00:28:05,193

So if you think that, then you need to use

regularizing, weakly informative

378

00:28:05,193 --> 00:28:07,733

priors as you do for any other parameters.

379

00:28:07,733 --> 00:28:14,143

So you could think of coming up with a

prior that's a bit more bell-shaped,

380

00:28:14,143 --> 00:28:19,933

in a way that gives more mass to the low.

381

00:28:20,053 --> 00:28:20,953

Yeah.

382

00:28:21,005 --> 00:28:23,325

to smaller correlations.

383

00:28:23,365 --> 00:28:28,205

And then that's how usually you would do

that in PyMC.

384

00:28:28,205 --> 00:28:31,005

And that's what you're basically talking

about.
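
(For concreteness, here is a minimal sketch of what that looks like in PyMC: correlated group-level intercepts and slopes with an LKJ prior on the Cholesky factor of their covariance. All data and hyperparameter choices below are made up for illustration.)

```python
# Sketch: correlated varying intercepts and slopes via pm.LKJCholeskyCov.
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
n_groups, n_obs = 20, 500
group = rng.integers(0, n_groups, size=n_obs)
x = rng.normal(size=n_obs)
y = 1.0 + 0.5 * x + rng.normal(size=n_obs)   # placeholder data

with pm.Model():
    # Prior on the 2x2 covariance of (intercept, slope) through its Cholesky
    # factor; eta=2 puts more mass on small correlations than a flat eta=1.
    chol, corr, sds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0)
    )
    mu_ab = pm.Normal("mu_ab", 0, 1, shape=2)
    ab = pm.MvNormal("ab", mu=mu_ab, chol=chol, shape=(n_groups, 2))

    sigma = pm.HalfNormal("sigma", 1)
    mu = ab[group, 0] + ab[group, 1] * x
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)

    idata = pm.sample()
```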

385

00:28:31,845 --> 00:28:34,925

Of course, that's more complicated and it

makes your model more complex.

386

00:28:34,925 --> 00:28:40,105

But once you have run that model and have

that inference, that can be extremely

387

00:28:40,105 --> 00:28:44,425

useful and powerful for posterior

analysis.

388

00:28:44,425 --> 00:28:46,185

So it's a trade-off.

389

00:28:46,445 --> 00:28:48,065

Yeah, yeah, definitely.

390

00:28:48,065 --> 00:28:50,525

But that reminds me of...

391

00:28:51,213 --> 00:28:58,613

I would say like in psychology, in

psychometrics, there's still a lot of

392

00:28:58,613 --> 00:29:00,873

hesitance to use informative priors.

393

00:29:00,873 --> 00:29:06,713

There's still the idea of I want to do

something objective.

394

00:29:06,713 --> 00:29:12,923

And so I want my priors to be all flat,

which especially like you say for a

395

00:29:12,923 --> 00:29:17,673

correlation or even for other parameters,

I'm against that.

396

00:29:17,673 --> 00:29:20,109

Now I would like to put some...

397

00:29:20,109 --> 00:29:26,199

information in my priors always, but that

is always a challenge because like for the

398

00:29:26,199 --> 00:29:32,299

models I work with, users are accustomed,

like I said, to specifying this big model

399

00:29:32,299 --> 00:29:35,729

and pressing a button and it runs and it

estimates.

400

00:29:36,029 --> 00:29:42,809

But now you do that in a Bayesian context

with these uninformative priors.

401

00:29:43,069 --> 00:29:46,869

Sometimes you just run into problems and

you have to think more about the priors

402

00:29:46,869 --> 00:29:48,609

and add some information.

403

00:29:48,649 --> 00:29:49,517

Yeah.

404

00:29:49,517 --> 00:29:52,187

Which is, if you ask me, a blessing in

disguise, right?

405

00:29:52,187 --> 00:29:58,747

Because just because a model seems to run

doesn't mean it is giving you sensible

406

00:29:58,747 --> 00:30:00,817

results and unbiased results.

407

00:30:00,937 --> 00:30:08,927

I actually love the fact that usually HMC

is really unforgiving of really bad

408

00:30:08,927 --> 00:30:10,037

priors.

409

00:30:11,197 --> 00:30:17,187

So of course, it's usually something we

tend to teach is, try to use priors that

410

00:30:17,187 --> 00:30:17,917

make sense, right?

411

00:30:17,917 --> 00:30:18,625

A priori.

412

00:30:18,625 --> 00:30:21,365

Most of the time you have more information

than you think.

413

00:30:21,365 --> 00:30:25,075

And if you're thinking from a betting

perspective, like let's say that any

414

00:30:25,075 --> 00:30:27,965

decision you make with your model is

actually something that's going to cost

415

00:30:27,965 --> 00:30:29,565

you money or give you money.

416

00:30:29,565 --> 00:30:36,225

If you were to bet on that prior, why

wouldn't you use any information that you

417

00:30:36,225 --> 00:30:37,525

have at your disposal?

418

00:30:37,705 --> 00:30:41,795

Why would you throw away information if

you knew that actually you had information

419

00:30:41,795 --> 00:30:45,037

that would help you make a more

informed...

420

00:30:45,037 --> 00:30:49,297

bet, and so a bet that actually gives you

more money instead of losing money.

421

00:30:49,517 --> 00:30:57,267

And so I find that this way of framing the

priors usually works on

422

00:30:57,267 --> 00:31:00,757

beginners because that helps them see the

idea.

423

00:31:00,757 --> 00:31:05,037

It's like the idea is not to fudge your

analysis, even though I can show you how

424

00:31:05,037 --> 00:31:06,657

to fudge your analysis, but in both ways.

425

00:31:06,657 --> 00:31:11,877

I can use priors which are going to bias

the model, but I can also use priors that

426

00:31:11,877 --> 00:31:13,537

are going to completely

427

00:31:14,347 --> 00:31:18,257

unbias the model, but just make it so

variable that it's just going to react

428

00:31:18,257 --> 00:31:20,697

very aggressively to any data point.

429

00:31:21,117 --> 00:31:22,677

And do you really want that?

430

00:31:22,677 --> 00:31:23,877

I'm not sure.

431

00:31:23,877 --> 00:31:30,617

Do you really want to make very hard

claims based on very small data?

432

00:31:30,637 --> 00:31:31,537

I'm not sure.

433

00:31:31,537 --> 00:31:36,957

So again, if you come back to this idea

of, imagine that you're betting.

434

00:31:37,677 --> 00:31:39,977

Wouldn't you use all the information you

have at your disposal?

435

00:31:39,977 --> 00:31:40,717

That's all.

436

00:31:40,717 --> 00:31:43,277

That's everything you're doing.

437

00:31:43,277 --> 00:31:45,397

That doesn't mean that information is

golden.

438

00:31:45,397 --> 00:31:47,997

That doesn't mean you have to be extremely

certain about the information you're

439

00:31:47,997 --> 00:31:48,577

putting in.

440

00:31:48,577 --> 00:31:54,557

That just means let's try to put some more

structure because that doesn't make any

441

00:31:54,557 --> 00:31:59,157

sense if you're modeling football players.

442

00:31:59,157 --> 00:32:07,237

That doesn't make any sense to allow them

to be able to score 20 goals in a game.

443

00:32:07,237 --> 00:32:08,997

It doesn't ever happen.

444

00:32:08,997 --> 00:32:11,939

Why would you let the model...

445

00:32:12,267 --> 00:32:13,577

allow for that possibility.

446

00:32:13,577 --> 00:32:14,577

You don't want that.

447

00:32:14,577 --> 00:32:18,807

It's going to make your model harder to

estimate, longer, it's going to take

448

00:32:18,807 --> 00:32:20,057

longer to estimate also.

449

00:32:20,057 --> 00:32:22,557

And so that's just less efficient.
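
(A quick back-of-the-envelope version of that football example, with made-up numbers, just to show how a seemingly "neutral" prior can allow absurd games.)

```python
# Sketch: what does a vague prior imply about goals scored in one game?
# Suppose goals ~ Poisson(rate) with log(rate) ~ Normal(0, 5).
import numpy as np

rng = np.random.default_rng(3)
rate = np.exp(rng.normal(0.0, 5.0, size=10_000))
goals = rng.poisson(np.minimum(rate, 1e6))   # clip only to avoid numerical overflow

# A sizeable share of prior draws allow 20+ goals in a single game --
# something that essentially never happens, so this prior is too loose.
print((goals >= 20).mean())
```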

450

00:32:23,357 --> 00:32:24,077

Yeah.

451

00:32:24,337 --> 00:32:28,877

You mentioned too of HMC being

unforgiving.

452

00:32:29,557 --> 00:32:34,817

And yeah, a lot of the software that I've

been working on, the models are run in

453

00:32:34,817 --> 00:32:35,757

Stan.

454

00:32:36,297 --> 00:32:40,857

And from time to time, well, for some of

these structural equation models, there's

455

00:32:40,857 --> 00:32:41,701

some...

456

00:32:41,837 --> 00:32:46,777

Like, weakly identified parameters, or

maybe even unidentified parameters, but I

457

00:32:46,777 --> 00:32:49,237

run into these situations where.

458

00:32:49,317 --> 00:32:53,567

Somebody runs a Gibbs sampler and they

say, look, it just worked and it converged

459

00:32:53,567 --> 00:33:00,387

and now I move this model over to Stan and

I'm getting these bimodal posteriors or

460

00:33:00,387 --> 00:33:01,377

such and such.

461

00:33:01,377 --> 00:33:07,157

It's sort of like a bit of an education of

saying, well, the problem isn't Stan.

462

00:33:07,157 --> 00:33:11,213

The problem was the model all along, but

the Gibbs sampler just didn't.

463

00:33:11,213 --> 00:33:13,313

tell you that there was a problem.

464

00:33:13,313 --> 00:33:13,993

Yeah, exactly.

465

00:33:13,993 --> 00:33:14,493

Exactly.

466

00:33:14,493 --> 00:33:14,853

Yeah.

467

00:33:14,853 --> 00:33:15,023

Yeah.

468

00:33:15,023 --> 00:33:17,813

That's like, that's a joke.

469

00:33:17,813 --> 00:33:23,223

I have actually a sticker like that, which

is a meme of, you know, that

470

00:33:23,223 --> 00:33:26,883

guy from, I

think it's from The Notebook, right?

471

00:33:26,883 --> 00:33:33,323

Who, who is crying and yeah, basically the

sticker I have is when someone tells me

472

00:33:33,323 --> 00:33:37,753

that their model has divergences in HMC.

473

00:33:37,753 --> 00:33:41,133

So they are switching to the Metropolis

sampler.

474

00:33:41,133 --> 00:33:44,393

And I'm just like, yeah, sure.

475

00:33:44,473 --> 00:33:47,853

You're not going to have divergences with

the metropolis sampler.

476

00:33:47,913 --> 00:33:50,913

Doesn't mean the model is converging as

you want.

477

00:33:51,953 --> 00:33:59,173

And yeah, so that's really that thing

where, yeah, actually, you had problems

478

00:33:59,173 --> 00:33:59,963

with the model already.

479

00:33:59,963 --> 00:34:03,233

It's just that you were using a crude

instrument that wasn't able to give you

480

00:34:03,233 --> 00:34:04,613

these diagnostics.

481

00:34:04,613 --> 00:34:09,553

It's like doing an MRI with a stethoscope.

482

00:34:09,693 --> 00:34:10,477

Yeah.

483

00:34:10,477 --> 00:34:12,177

Yeah, that's not going to work.

484

00:34:12,177 --> 00:34:15,477

It's going to look like you don't have any

problems, but maybe you do.

485

00:34:15,477 --> 00:34:17,757

It's just like you're not using the right

tool.

486

00:34:18,357 --> 00:34:19,357

So yeah.

487

00:34:19,357 --> 00:34:25,097

And also this idea of, well, let's use

flat priors and just let the data speak.

488

00:34:25,097 --> 00:34:27,397

That can work from time to time.

489

00:34:27,397 --> 00:34:30,877

And that's definitely going to be the case

anyways, if you have a lot of data.

490

00:34:30,897 --> 00:34:35,857

Even if you're using weakly regularizing

priors, that's exactly the goal.

491

00:34:35,857 --> 00:34:38,577

It's just to give you enough structure to

the model in case the data are not

492

00:34:38,577 --> 00:34:40,515

informative for some parameters.

493

00:34:40,941 --> 00:34:45,511

The bigger the model, the more parameters,

well, the less informed the parameters are

494

00:34:45,511 --> 00:34:49,621

going to be if your data stay what they

are, keep being what they are, right?

495

00:34:49,621 --> 00:34:51,321

If you don't have more.

496

00:34:52,161 --> 00:35:02,701

And also that assumes that the data are

perfect, that there's no bias, that the

497

00:35:02,701 --> 00:35:05,321

data are completely trustworthy.

498

00:35:05,901 --> 00:35:07,171

Do you actually believe that?

499

00:35:07,171 --> 00:35:09,485

If you don't, well, then...

500

00:35:09,485 --> 00:35:11,175

You already know something about your

data, right?

501

00:35:11,175 --> 00:35:12,335

That's your prior right here.

502

00:35:12,335 --> 00:35:18,045

If you think that there is sampling bias

and you kind of know why, well, that's a

503

00:35:18,045 --> 00:35:19,065

prior information.

504

00:35:19,065 --> 00:35:20,935

So why wouldn't you tell that in the

model?

505

00:35:20,935 --> 00:35:23,735

Again, from that betting perspective,

you're just making your model's life

506

00:35:23,735 --> 00:35:28,505

harder and your inference is potentially

wrong.

507

00:35:28,825 --> 00:35:31,725

I'm guessing that's not what you want as

the modeler.

508

00:35:33,485 --> 00:35:36,865

Yeah, you can trust the data blindly.

509

00:35:36,905 --> 00:35:38,039

Should you though?

510

00:35:38,189 --> 00:35:41,429

That's a question you have to answer each

time you're doing a model.

511

00:35:42,169 --> 00:35:42,409

Yep.

512

00:35:42,409 --> 00:35:44,409

More often than not, you cannot.

513

00:35:45,009 --> 00:35:46,269

Yeah, yeah.

514

00:35:46,649 --> 00:35:53,429

Yeah, the HMC failing thing, I think

that's a place where you can really see

515

00:35:53,429 --> 00:35:57,869

the progress that's been made in Bayesian

estimation.

516

00:35:57,969 --> 00:36:03,549

Just like say in the 20 some years that

I've been doing it, I can think back to

517

00:36:03,549 --> 00:36:05,389

starting out with WinBUGS.

518

00:36:05,429 --> 00:36:07,597

You're just happy to get the thing to run.

519

00:36:07,597 --> 00:36:12,357

and to give you some decent convergence

diagnostics.

520

00:36:12,897 --> 00:36:19,997

I think a lot of the things we did around

the start of WinBUGS, if you try to run

521

00:36:19,997 --> 00:36:26,127

them in Stan now, you find there were a

lot of problems that were just hidden or

522

00:36:26,127 --> 00:36:28,177

just kind of overlooked.

523

00:36:29,477 --> 00:36:31,657

Yeah, yeah, yeah, for sure.

524

00:36:31,717 --> 00:36:37,453

And definitely that I think we've hammered

that point in the community quite a lot.

525

00:36:37,453 --> 00:36:39,013

in the last few years.

526

00:36:39,193 --> 00:36:45,333

And so definitely those points that I've

been making in the last few minutes are

527

00:36:45,333 --> 00:36:47,233

clearly starting to percolate.

528

00:36:47,273 --> 00:36:51,823

And I think the situation is way better

than it was a few years ago, just to be

529

00:36:51,823 --> 00:36:56,953

clear and not come across as complaining

statisticians.

530

00:36:56,973 --> 00:36:58,813

Because I'm already French.

531

00:36:58,813 --> 00:37:02,153

So people already imagine that I'm going

to assume that I'm going to complain.

532

00:37:02,153 --> 00:37:05,173

So if on top of that, I complain about

stats, I'm done.

533

00:37:05,173 --> 00:37:07,373

People are not going to listen to the

podcast anymore.

534

00:37:07,373 --> 00:37:10,213

I think you'll be all right.

535

00:37:12,933 --> 00:37:21,753

So to continue, I'd like to talk about

your blavaan package and what inspired the

536

00:37:21,753 --> 00:37:28,393

development of this package and how does

it enhance the capabilities of researchers

537

00:37:28,393 --> 00:37:30,753

in doing BSEM?

538

00:37:31,633 --> 00:37:36,269

Yeah, I think I said earlier my...

539

00:37:36,269 --> 00:37:42,919

PhD was about some Bayesian factor

analysis models and looking at some

540

00:37:42,919 --> 00:37:44,649

missing data issues.

541

00:37:44,649 --> 00:37:50,469

I would say it wasn't the greatest PhD

thesis, but it was finished.

542

00:37:50,469 --> 00:37:57,859

And at the time, I thought it would be

nice to have some software that would give

543

00:37:57,859 --> 00:38:02,259

you some somewhat simple way to specify a

model.

544

00:38:02,259 --> 00:38:04,589

And then it could be translated to

545

00:38:04,589 --> 00:38:12,709

like at the time WinBUGS, so that you

could have some easier MCMC estimation.

546

00:38:13,849 --> 00:38:21,479

But at that time, like, I, the, like R

wasn't as quite as developed and my skills

547

00:38:21,479 --> 00:38:24,989

weren't quite there to be able to do that

all on my own.

548

00:38:25,069 --> 00:38:34,041

So I left it for a few years, then around

2009 or so, I think.

549

00:38:34,091 --> 00:38:40,581

Some R packages for frequentist structural

equation models were becoming better

550

00:38:40,581 --> 00:38:43,121

developed and more supported.

551

00:38:43,821 --> 00:38:51,841

So a few years later, I met the developer

of the lavaan package, which does frequentist

552

00:38:51,841 --> 00:38:56,201

structural equation models and did some

work with him.

553

00:38:56,201 --> 00:39:00,109

And from there I thought, well,

554

00:39:00,109 --> 00:39:04,499

he's done some of the hard work already

just with model specification and setting

555

00:39:04,499 --> 00:39:06,309

up the model likelihood.

556

00:39:06,429 --> 00:39:10,449

So I built this package on top of what was

already there to do like the Bayesian

557

00:39:10,449 --> 00:39:13,129

version of that model estimation.

558

00:39:13,489 --> 00:39:16,669

And then it has just gone from there.

559

00:39:16,669 --> 00:39:23,089

I think I continue to learn more things

about these models or encounter tricky

560

00:39:23,089 --> 00:39:26,489

issues that I wasn't quite aware of when I

started.

561

00:39:26,829 --> 00:39:29,421

And I just have...

562

00:39:29,421 --> 00:39:31,081

continue it on.

563

00:39:32,221 --> 00:39:32,801

Yeah.

564

00:39:32,801 --> 00:39:35,921

Well, that sounds like a fun project for

sure.

565

00:39:37,021 --> 00:39:42,441

And how would people use it right now?

566

00:39:42,641 --> 00:39:47,221

When would you recommend using your

package for which type of problems?

567

00:39:48,161 --> 00:39:52,293

Well, the idea from the start was

always...

568

00:39:53,101 --> 00:39:57,951

make the model specification and

everything very similar to the lavaan

569

00:39:57,951 --> 00:40:03,381

package for frequentist models because that

package was already fairly popular among

570

00:40:03,381 --> 00:40:05,561

people that use these models.

571

00:40:05,881 --> 00:40:11,081

And the idea was, well, they could move to

doing a Bayesian version without having to

572

00:40:11,081 --> 00:40:13,501

learn a brand new model specification.

573

00:40:13,501 --> 00:40:17,551

They could already do something similar to

what they had been doing on the frequentist

574

00:40:17,551 --> 00:40:18,521

side.
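
To make that concrete, here is a minimal sketch (not from the episode) of how the same lavaan-style model string carries over to blavaan: the classic Holzinger-Swineford CFA, first fit with lavaan's maximum likelihood and then with blavaan's MCMC, leaving priors and sampler settings at their defaults.

```r
library(lavaan)   # frequentist SEM; ships the HolzingerSwineford1939 data
library(blavaan)  # Bayesian SEM built on top of lavaan's model syntax

# "=~" defines latent factors measured by observed items x1..x9
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit_ml    <- cfa(model,  data = HolzingerSwineford1939)  # maximum likelihood
fit_bayes <- bcfa(model, data = HolzingerSwineford1939)  # MCMC, same syntax

summary(fit_bayes)
```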

575

00:40:19,301 --> 00:40:22,797

So that's like,

576

00:40:22,797 --> 00:40:28,457

from the start, the idea that we

had or what we wanted to do with the package,

577

00:40:28,457 --> 00:40:32,167

and then who would use it?

578

00:40:32,167 --> 00:40:37,067

I think it could be for some of these

measurement problems, like I said, with

579

00:40:37,067 --> 00:40:41,887

item response modelers or things if they

wanted to do a Bayesian version of some of

580

00:40:41,887 --> 00:40:48,217

these models, that's currently possible in

blavaan. And another place is...

581

00:40:50,061 --> 00:40:57,221

With something kind of similar to the

DAGs, the directed acyclic graphs we talk

582

00:40:57,221 --> 00:41:02,871

about, especially in the social sciences,

people have these theories about they have

583

00:41:02,871 --> 00:41:07,211

a collection of variables and what

variables cause what other variables and

584

00:41:07,211 --> 00:41:11,941

they want to estimate some regression type

relationships between these things.

585

00:41:11,941 --> 00:41:15,591

You would see it often, like, with

observational data where you can't really

586

00:41:15,591 --> 00:41:16,653

do these.

587

00:41:16,653 --> 00:41:20,973

these manipulations the way you could in

an experiment.

588

00:41:21,193 --> 00:41:27,883

But the idea is that you could specify a

graph like that and use blavaan to try to

589

00:41:27,883 --> 00:41:32,883

estimate these regression-like

relationships that if the graph is

590

00:41:32,883 --> 00:41:36,553

correct, you might interpret it as causal

relationships.
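
As a rough illustration of that use (a sketch, not something shown in the episode), a graph like exposure to mediator to outcome, plus a direct exposure-to-outcome path, can be written as a path model in the same syntax. Here, `dat` and the variable names x, m, and y are placeholders, and the causal reading only holds if the assumed graph is right.

```r
library(blavaan)

# Hypothetical graph: x -> m -> y and x -> y
path_model <- '
  m ~ a * x
  y ~ b * m + c * x
  indirect := a * b   # defined quantity: effect along x -> m -> y
'

# dat is assumed to be a data.frame with columns x, m, and y
fit <- bsem(path_model, data = dat)
summary(fit)
```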

591

00:41:39,213 --> 00:41:40,833

Yeah, fascinating, fascinating.

592

00:41:40,833 --> 00:41:42,053

I love that.

593

00:41:42,053 --> 00:41:46,477

And I'll put the package, of course, in

the show notes.

594

00:41:46,477 --> 00:41:50,597

And I encourage people to take a look at

the website.

595

00:41:50,597 --> 00:41:56,327

There are some tutorials and packages of

the, sorry, some tutorials on how to use

596

00:41:56,327 --> 00:41:58,077

the package on there.

597

00:41:58,077 --> 00:42:02,277

So yeah, definitely take a look at the

resources that are on the website.

598

00:42:02,277 --> 00:42:05,057

And of course, everything is on the show

notes.

599

00:42:05,817 --> 00:42:11,977

Another topic I thought was very

interesting from your background is that

600

00:42:11,977 --> 00:42:16,727

your research also touches on forecasting

and subjective probability.

601

00:42:16,727 --> 00:42:22,567

Can you discuss how Bayesian methods

improve these processes, particularly in

602

00:42:22,567 --> 00:42:26,837

crowdsourcing wisdom, which is something

you've worked on quite a lot?

603

00:42:27,217 --> 00:42:30,957

Yeah, I started working on that.

604

00:42:30,957 --> 00:42:35,117

It was probably 2009 or 2010.

605

00:42:35,117 --> 00:42:38,949

So at that time, I think...

606

00:42:39,479 --> 00:42:45,829

Tools like Mechanical Turk were becoming

more usable and so people were looking at

607

00:42:45,829 --> 00:42:50,949

this wisdom of crowds, saying, can we

recruit a large group of people from the

608

00:42:50,949 --> 00:42:51,949

internet?

609

00:42:51,949 --> 00:42:58,109

And if we average their predictions, do

those make for good predictions?

610

00:42:58,789 --> 00:43:03,219

I got involved in some of that work,

especially through some forecasting

611

00:43:03,219 --> 00:43:06,609

tournaments that were being run by

612

00:43:07,245 --> 00:43:12,505

the US government or some branches of the

US government at the time.

613

00:43:14,505 --> 00:43:21,685

I think Bayesian tools there first made

some model estimations easier just the way

614

00:43:21,685 --> 00:43:23,945

they sometimes do in general.

615

00:43:23,945 --> 00:43:27,985

But also with forecasting, it's all about

uncertainty.

616

00:43:27,985 --> 00:43:30,775

You might say, here's what I think will

happen.

617

00:43:30,775 --> 00:43:34,189

But then you also want to have some

characterization of

618

00:43:34,189 --> 00:43:38,029

your certainty or uncertainty that

something happens.

619

00:43:38,029 --> 00:43:44,329

I think that's where the Bayesian approach

was really helpful.

620

00:43:44,329 --> 00:43:51,069

Of course, you always have this trade-off

with you are giving a forecast often to

621

00:43:51,069 --> 00:43:56,309

like a decision maker or an executive or

someone that is a leader.

622

00:43:56,489 --> 00:44:01,369

Those people sometimes want the simplest

forecast possible and it's sometimes

623

00:44:01,369 --> 00:44:03,373

difficult to convince them that,

624

00:44:03,373 --> 00:44:07,493

Well, you also want to look at the

uncertainty around this forecast as

625

00:44:07,493 --> 00:44:09,413

opposed to just a point estimate.

626

00:44:09,413 --> 00:44:09,703

Yeah.

627

00:44:09,703 --> 00:44:15,003

But that's some of the ways we were using

Bayesian methods, at least to try to

628

00:44:15,003 --> 00:44:16,813

characterize uncertainty.
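
A toy sketch of that reporting style (illustrative numbers only, not from the episode): given posterior draws for a forecast quantity, report an interval and a tail probability rather than a single number.

```r
set.seed(1)
draws <- rnorm(4000, mean = 0.62, sd = 0.08)  # stand-in for MCMC output

# A 90% interval communicates the uncertainty a point forecast hides
quantile(draws, probs = c(0.05, 0.5, 0.95))

# Probability that the forecast quantity exceeds a decision threshold of 0.5
mean(draws > 0.5)
```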

629

00:44:17,593 --> 00:44:18,493

Yeah.

630

00:44:18,573 --> 00:44:19,193

Yeah.

631

00:44:19,193 --> 00:44:27,113

I'm becoming more and more authoritative

on these fronts, you know, just not even

632

00:44:27,113 --> 00:44:30,923

giving the point estimates anymore and by

default giving a range for the

633

00:44:30,923 --> 00:44:31,725

predictions.

634

00:44:31,725 --> 00:44:35,145

and then people have to ask you for the

point estimates.

635

00:44:35,265 --> 00:44:37,785

Then I can make the point of, do you

really want that?

636

00:44:37,785 --> 00:44:39,325

Why do you want that one?

637

00:44:39,465 --> 00:44:42,805

And why do you want the mean more than the

tail?

638

00:44:42,805 --> 00:44:46,485

Maybe in your case, actually, the tail

scenarios are more interesting.

639

00:44:47,345 --> 00:44:49,125

So keep that in mind.

640

00:44:49,145 --> 00:44:54,225

So yeah, people have to opt in to get the

point estimates.

641

00:44:54,505 --> 00:44:59,525

And well, the human brain being what it

is, usually it's happy with the default.

642

00:44:59,525 --> 00:45:00,109

And so...

643

00:45:00,109 --> 00:45:04,529

Making the default better is something I'm

trying to actually actively do.

644

00:45:05,029 --> 00:45:06,969

That's a good point.

645

00:45:07,509 --> 00:45:12,249

So, for reporting modeling results,

you avoid posterior means?

646

00:45:12,249 --> 00:45:16,279

All you give them is like a posterior

interval or something.

647

00:45:16,279 --> 00:45:17,249

A range.

648

00:45:17,389 --> 00:45:17,509

Yeah.

649

00:45:17,509 --> 00:45:18,489

Yeah.

650

00:45:18,609 --> 00:45:19,849

Yeah, exactly.

651

00:45:20,249 --> 00:45:22,969

Not putting particular emphasis on the

mean.

652

00:45:22,969 --> 00:45:27,449

Because otherwise what's going to end up

happening, and that's extremely

653

00:45:27,449 --> 00:45:29,823

frustrating to me, is...

654

00:45:29,823 --> 00:45:32,693

I mentioned that you're comparing two

options.

655

00:45:34,333 --> 00:45:39,713

And so you have the posterior on option A,

the posterior on option B.

656

00:45:39,713 --> 00:45:42,693

You're looking at the first plot of A and

B.

657

00:45:42,693 --> 00:45:44,093

They seem to overlap.

658

00:45:44,093 --> 00:45:47,353

So then you compute the difference of the

posteriors.

659

00:45:47,353 --> 00:45:48,433

So B minus A.

660

00:45:48,433 --> 00:45:53,573

And you're seeing where it spans on the

real line.

661

00:45:53,973 --> 00:45:57,741

And if option A and B are close enough,

662

00:45:57,741 --> 00:46:02,581

the HDI, so the highest density interval,

is going to overlap with zero.

663

00:46:02,921 --> 00:46:08,251

And it seems like zero is a magic number

that makes the whole HDI collapse on one

664

00:46:08,251 --> 00:46:08,801

point.

665

00:46:08,801 --> 00:46:12,821

So basically, the zero is a black hole

which just sucks everything onto itself,

666

00:46:12,821 --> 00:46:16,101

and then the whole range is zero.

667

00:46:16,401 --> 00:46:21,581

And then people are just going to say, oh,

but that's weird because, no, I think

668

00:46:21,581 --> 00:46:24,581

there is some difference between A and B.

669

00:46:24,581 --> 00:46:27,309

And then you have to say, but that's not

what the model is saying.

670

00:46:27,309 --> 00:46:31,989

You're just looking at zero and you see

that the HDI overlaps zero at some point.

671

00:46:31,989 --> 00:46:36,709

But actually the model is saying that, I

don't know, there is an 86% chance that

672

00:46:36,709 --> 00:46:41,349

option B is actually better than option A.

673

00:46:41,389 --> 00:46:46,279

So, you know, there is a five in six

chance, which is absolutely non-negligible,

674

00:46:46,279 --> 00:46:49,999

that B is indeed better than A, but

we can't actually rule out the possibility

675

00:46:49,999 --> 00:46:51,249

that A is better than B.

676

00:46:51,249 --> 00:46:52,809

That's what the model is saying.

677

00:46:52,809 --> 00:46:55,169

It's not telling you that there is no

difference.

678

00:46:55,169 --> 00:46:56,813

And it's not telling you that

679

00:46:56,813 --> 00:46:59,453

A is definitely better than B.

680

00:46:59,453 --> 00:47:01,013

And that's still a nut

681

00:47:01,013 --> 00:47:02,993

I'm trying to crack.

682

00:47:03,013 --> 00:47:09,643

But yeah, here you cannot make the zero

disappear, right?

683

00:47:09,643 --> 00:47:13,563

But the only thing you can do is make sure

that people don't interpret the zero as a

684

00:47:13,563 --> 00:47:14,553

black hole.

685

00:47:14,593 --> 00:47:15,873

That's the main thing.
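
Here is a small sketch of that A-versus-B situation with made-up numbers (not from the episode): the interval on the difference can overlap zero while the posterior probability that B beats A is still high.

```r
set.seed(42)
a <- rnorm(4000, mean = 0.50, sd = 0.10)  # stand-in posterior for option A
b <- rnorm(4000, mean = 0.60, sd = 0.10)  # stand-in posterior for option B

delta <- b - a

# A wide credible interval on the difference may well include zero...
quantile(delta, probs = c(0.03, 0.97))

# ...while the model still says B is very probably better than A
mean(delta > 0)
```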

686

00:47:15,873 --> 00:47:16,593

Yeah, yeah.

687

00:47:16,593 --> 00:47:19,213

Yeah, yeah, that's a good point.

688

00:47:22,573 --> 00:47:26,723

I can see that being challenging for

people that come from frequentist models

689

00:47:26,723 --> 00:47:32,133

because what they're accustomed to is the

maximum likelihood estimate.

690

00:47:32,173 --> 00:47:35,133

And it's all about those point estimates.

691

00:47:35,613 --> 00:47:40,153

But I like the idea of not even supplying

those point estimates.

692

00:47:40,833 --> 00:47:41,193

Yeah.

693

00:47:41,193 --> 00:47:41,693

Yeah, yeah.

694

00:47:41,693 --> 00:47:45,933

I mean, and that makes sense in that

it's just a distraction.

695

00:47:45,933 --> 00:47:47,503

It doesn't mean anything in particular.

696

00:47:47,503 --> 00:47:48,573

That's mainly a distraction.

697

00:47:48,573 --> 00:47:50,731

What's more important here is the range.

698

00:47:50,797 --> 00:47:53,037

of the estimates.

699

00:47:53,037 --> 00:47:57,817

So, you know, like give the range and give

the point estimates if people ask for it.

700

00:47:57,817 --> 00:48:01,857

But otherwise, that's more distraction

than anything else.

701

00:48:01,857 --> 00:48:09,057

And I think I got that idea from listening

to a talk by Richard McElreath, who was

702

00:48:09,057 --> 00:48:12,817

talking about something he called the

Table 2 fallacy.

703

00:48:13,237 --> 00:48:15,297

Yeah, I know that.

704

00:48:15,437 --> 00:48:19,949

Where usually they present the table of

estimates in table 2.

705

00:48:19,949 --> 00:48:28,359

And his point with

that is, people tend to interpret the

706

00:48:28,359 --> 00:48:36,459

coefficients of a linear regression, for

instance, all of them, as causal, but

707

00:48:36,459 --> 00:48:37,109

they are not.

708

00:48:37,109 --> 00:48:41,909

The only parameter that's really causally

interpretable is the one that relates the

709

00:48:41,909 --> 00:48:43,229

treatment to the outcome.

710

00:48:43,229 --> 00:48:49,589

The other ones, for instance, from a

mediator to the outcome, or...

711

00:48:49,869 --> 00:48:56,469

the one from a confounder to the outcome,

you cannot interpret that parameter as

712

00:48:56,469 --> 00:48:57,509

causal.

713

00:48:57,809 --> 00:49:02,489

Or you have to do the causal graph

analysis and then see if the linear

714

00:49:02,489 --> 00:49:09,229

regression you ran actually corresponds to

the one you would have to run in this new

715

00:49:09,229 --> 00:49:16,299

causal DAG to identify either the direct or

the total causal effect of that new

716

00:49:16,299 --> 00:49:18,477

variable that you're taking as the

treatment.

717

00:49:18,477 --> 00:49:20,357

basically you're changing the treatment

here.

718

00:49:20,357 --> 00:49:23,297

So you have to change the model

potentially.

719

00:49:23,297 --> 00:49:28,237

And so you cannot interpret and should

absolutely not interpret the parameters

720

00:49:28,237 --> 00:49:33,237

that are not the one from the treatment to

the outcome as causally interpretable.

721

00:49:33,617 --> 00:49:41,267

And so to avoid that fallacy, he was

suggesting two options: either you actually

722

00:49:41,267 --> 00:49:46,007

provide the interpretation of that

parameter in the current DAG that you

723

00:49:46,007 --> 00:49:46,727

have.

724

00:49:47,181 --> 00:49:51,851

And say, if it's not causally

interpretable in that case, which DAG you

725

00:49:51,851 --> 00:49:56,981

would have, which regression, sorry, which

model you would have to use, which is

726

00:49:56,981 --> 00:50:02,271

different from the one you actually have

run, to actually be able to interpret that

727

00:50:02,271 --> 00:50:03,761

coefficient causally.

728

00:50:03,901 --> 00:50:08,231

Or you just don't report these parameters,

these coefficients, because they are not

729

00:50:08,231 --> 00:50:10,681

the point of the analysis.

730

00:50:10,681 --> 00:50:13,861

The point of the analysis is to relate the

treatment to the outcome and see what the

731

00:50:13,861 --> 00:50:15,405

effect of the treatment is on the outcome.

732

00:50:15,405 --> 00:50:18,425

not what the effect of a confounder

on the outcome is.

733

00:50:18,425 --> 00:50:20,605

So why would you report that in the first

place?

734

00:50:20,605 --> 00:50:24,815

You can report it if people ask for it,

but you don't, you should not report it by

735

00:50:24,815 --> 00:50:25,705

default.
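
To make the Table 2 point tangible, here is a hedged sketch using the dagitty R package (an illustration under an assumed graph, not something discussed in the episode): the adjustment set tells you what to control for, but only the treatment coefficient in that regression carries a causal interpretation.

```r
library(dagitty)

# Assumed graph with a confounder and a mediator
g <- dagitty('dag {
  treatment -> outcome
  confounder -> treatment
  confounder -> outcome
  treatment -> mediator -> outcome
}')

# Variables to adjust for to identify the total effect of treatment on outcome
adjustmentSets(g, exposure = "treatment", outcome = "outcome")

# In the resulting regression of outcome on treatment + confounder, the
# coefficient on confounder (or on a mediator, if included) is not causal.
```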

736

00:50:26,585 --> 00:50:27,175

Yeah, yeah.

737

00:50:27,175 --> 00:50:30,835

There's some good, like, tie-ins to

structural equation models there too,

738

00:50:30,835 --> 00:50:37,135

because I think like in some of those,

some of McElreath's examples, he dabbles a

739

00:50:37,135 --> 00:50:42,545

little bit in structural equation models,

and it's kind of like one possible

740

00:50:42,545 --> 00:50:44,585

solution here to,

741

00:50:45,069 --> 00:50:50,379

to really saying what could we interpret

causally or not in the presence of

742

00:50:50,379 --> 00:50:56,189

confounding variables or like there's the

colliders that also cause problems if you

743

00:50:56,189 --> 00:50:57,989

include them in a regression.

744

00:50:59,449 --> 00:51:00,829

Yeah, he does a little bit.

745

00:51:00,829 --> 00:51:04,399

I've seen some of his examples with

structural equation models, those sorts of

746

00:51:04,399 --> 00:51:04,989

things.

747

00:51:04,989 --> 00:51:11,229

I think there's something interesting

there about informing what predictors

748

00:51:11,229 --> 00:51:13,389

should go in a regression or.

749

00:51:13,389 --> 00:51:18,589

what could we interpret causally out of a

particular model?

750

00:51:18,929 --> 00:51:20,749

Yeah, exactly.

751

00:51:21,469 --> 00:51:34,839

And I have actually linked to the table 2

fallacy thing I was talking about, his

752

00:51:34,839 --> 00:51:36,129

video of that.

753

00:51:36,129 --> 00:51:41,789

So this will be in the show notes for

people who want to dig deeper.

754

00:51:41,789 --> 00:51:42,949

Yes.

755

00:51:43,437 --> 00:51:46,347

And, yeah, so we're in this discussion.

756

00:51:46,347 --> 00:51:50,307

I really love to talk about these topics,

as you can see, and I've really deeply

757

00:51:50,307 --> 00:51:54,017

enjoyed diving deeper into them.

758

00:51:54,037 --> 00:51:58,897

And still, I'm diving deeper into these

topics for 2024.

759

00:51:58,897 --> 00:52:02,817

That's one of my objectives, so that's

really fun.

760

00:52:03,177 --> 00:52:04,157

Yeah.

761

00:52:04,437 --> 00:52:10,027

Maybe let's talk about latent variable

models, because you also work on that.

762

00:52:10,027 --> 00:52:13,389

And if I understood correctly, they are

quite crucial in psychology.

763

00:52:13,389 --> 00:52:20,429

So how do you approach these models,

especially in the context of Bayesian

764

00:52:20,429 --> 00:52:21,389

stats?

765

00:52:21,569 --> 00:52:26,369

And maybe explain, also give us a primer

on what latent variable models are.

766

00:52:26,369 --> 00:52:27,789

Yeah, I would.

767

00:52:27,789 --> 00:52:32,859

So sometimes I almost use them as like

just another term for structural equation

768

00:52:32,859 --> 00:52:33,769

model.

769

00:52:33,769 --> 00:52:36,469

They're very related.

770

00:52:36,469 --> 00:52:37,707

I would say.

771

00:52:38,221 --> 00:52:43,271

I would say if I'm around psychology or

psychometrics people, I would use the term

772

00:52:43,271 --> 00:52:44,981

structural equation model.

773

00:52:44,981 --> 00:52:49,051

But if I'm around statistics people, I

might more often use the term latent

774

00:52:49,051 --> 00:52:57,401

variable model because I think that term

latent variable, or maybe sometimes people

775

00:52:57,401 --> 00:53:02,441

might say a hidden variable or something

that's unobserved.

776

00:53:03,341 --> 00:53:05,377

But it's like in...

777

00:53:05,377 --> 00:53:09,417

in structural equation modeling, that is

sort of just like a random effect or a

778

00:53:09,417 --> 00:53:15,557

random parameter that we assume has some

influence on other observed variables.

779

00:53:17,897 --> 00:53:22,457

And that you can never observe it.

780

00:53:22,677 --> 00:53:23,377

That's right.

781

00:53:23,377 --> 00:53:27,777

And so the traditional example is...

782

00:53:28,461 --> 00:53:33,121

maybe something related to intelligence or

say like a person's math aptitude,

783

00:53:33,121 --> 00:53:36,161

something you would use a standardized

test for.

784

00:53:36,161 --> 00:53:38,231

You can't directly observe it.

785

00:53:38,231 --> 00:53:43,661

You can ask many questions that get at a

person's math aptitude.

786

00:53:43,661 --> 00:53:49,701

And we could assume, yes, there's this

latent aptitude that each person has that

787

00:53:49,701 --> 00:53:54,841

we are trying to measure with all of our

questions on a standardized test.

788

00:53:54,901 --> 00:53:57,987

That sort of gets at the idea of latent

variable.
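
In equation form, a generic one-factor measurement model behind that math-aptitude example (a standard formulation, not a quote from the episode) looks like this, with $\eta_i$ the latent aptitude of person $i$, $y_{ij}$ the response to question $j$, loadings $\lambda_j$, intercepts $\nu_j$, and residual variances $\theta_j$:

$$
y_{ij} = \nu_j + \lambda_j \, \eta_i + \varepsilon_{ij},
\qquad \eta_i \sim \mathcal{N}(0, \psi),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \theta_j).
$$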

789

00:53:58,093 --> 00:53:58,753

Yeah.

790

00:53:58,753 --> 00:53:59,673

Yeah.

791

00:53:59,673 --> 00:54:10,233

And like, or another example would be the

latent popularity of political parties.

792

00:54:10,233 --> 00:54:12,243

Like, you never really observe them.

793

00:54:12,243 --> 00:54:14,853

Actually, you just have an idea with

polls.

794

00:54:14,853 --> 00:54:19,243

You had a better idea with elections, but

even elections are not a perfect image of

795

00:54:19,243 --> 00:54:23,993

that because nobody, like, not everybody

goes and votes.

796

00:54:24,293 --> 00:54:26,573

So that's why you

797

00:54:26,573 --> 00:54:33,433

actually never observe the actual

popularity of political parties in the

798

00:54:33,433 --> 00:54:38,143

total population because, well, even

elections don't make a perfect job of

799

00:54:38,143 --> 00:54:39,073

that.

800

00:54:39,073 --> 00:54:40,833

Yeah, yeah, yeah.

801

00:54:40,833 --> 00:54:47,733

Yeah, and then people will get into a lot

of deep philosophy conversations about

802

00:54:47,733 --> 00:54:54,713

does this latent variable even exist and

how could one characterize that?

803

00:54:54,773 --> 00:54:55,551

And

804

00:54:55,821 --> 00:55:00,431

Personally, I don't often get into those

deep philosophy conversations.

805

00:55:00,431 --> 00:55:05,201

I just think of this more as a model, and

then, within this model,

806

00:55:05,201 --> 00:55:07,281

it could be a random parameter.

807

00:55:07,421 --> 00:55:11,291

And I guess maybe it's just my personal

bias.

808

00:55:11,291 --> 00:55:13,401

I don't think about it too abstractly.

809

00:55:13,401 --> 00:55:17,541

I just think about how does this latent

variable function in a model and how can I

810

00:55:17,541 --> 00:55:19,433

fit this model to data?

811

00:55:22,093 --> 00:55:23,333

Yeah, I see.

812

00:55:23,593 --> 00:55:34,423

And so in these cases, how have you found

that using a Bayesian framework has been

813

00:55:34,423 --> 00:55:35,247

helpful?

814

00:55:36,813 --> 00:55:45,993

Yeah, I think, related to what I was

discussing before, these latent

815

00:55:45,993 --> 00:55:48,473

variables are often like random effects.

816

00:55:48,473 --> 00:55:56,713

And so from a Bayesian point of view, you

can sample those parameters and look at

817

00:55:56,713 --> 00:56:01,173

how their uncertainty filters through to

other parts of your model.

818

00:56:01,173 --> 00:56:02,153

That's all.

819

00:56:02,157 --> 00:56:04,817

very straightforward from a Bayesian point

of view.

820

00:56:04,817 --> 00:56:07,457

I think those are some of the big

advantages.
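
A rough sketch of how that looks in blavaan (argument and function names as I recall them, so treat this as an assumption and check the package documentation): keeping posterior draws of the latent variables themselves lets their uncertainty carry through to any downstream summary.

```r
library(blavaan)

model <- ' aptitude =~ q1 + q2 + q3 + q4 '

# save.lvs = TRUE asks blavaan to retain posterior samples of each person's
# latent score; test_scores is a hypothetical data.frame with columns q1..q4
fit <- bcfa(model, data = test_scores, save.lvs = TRUE)

# Posterior draws of the latent variables themselves; summarize them like any
# other parameter instead of plugging in a single point estimate per person
lv_draws <- blavInspect(fit, what = "lvs")
str(lv_draws, max.level = 1)
```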

821

00:56:08,297 --> 00:56:09,357

OK, I see.

822

00:56:09,357 --> 00:56:09,937

I see.

823

00:56:09,937 --> 00:56:11,297

Yeah.

824

00:56:11,377 --> 00:56:18,947

If we zoom out a bit, I'm actually

curious, what would you say is the biggest

825

00:56:18,947 --> 00:56:21,721

hurdle in the Bayesian workflow currently?

826

00:56:23,297 --> 00:56:24,325

Um

827

00:56:29,261 --> 00:56:35,831

There's always challenges with how long

does it take MCMC to run, especially for

828

00:56:35,831 --> 00:56:40,901

people coming from frequentist models or

things where, for some frequentist models,

829

00:56:40,901 --> 00:56:45,991

especially with these structural equation

or latent variable models, you can get

830

00:56:45,991 --> 00:56:48,531

some maximum likelihood estimates in a

couple of seconds.

831

00:56:48,531 --> 00:56:55,611

And there's cases with MCMC, it might take

much longer depending on how the model was

832

00:56:55,611 --> 00:56:57,677

set up or how tailored.

833

00:56:57,677 --> 00:57:01,977

your estimation strategy is to a

particular model.

834

00:57:02,617 --> 00:57:05,537

So I think speed is always an issue.

835

00:57:06,317 --> 00:57:13,087

And that I think could maybe detract some

people from doing Bayesian modeling

836

00:57:13,087 --> 00:57:14,517

sometimes.

837

00:57:14,757 --> 00:57:21,467

I would say maybe the other barrier to the

workflow is just getting people to slow

838

00:57:21,467 --> 00:57:27,549

down and just be happy with slowing down

with working through their model.

839

00:57:27,565 --> 00:57:36,945

I think especially in the social sciences

where I work, people become too accustomed

840

00:57:36,945 --> 00:57:42,425

to specifying their model, pressing a

button, getting the results immediately

841

00:57:42,425 --> 00:57:44,785

and writing it and being done.

842

00:57:45,045 --> 00:57:50,005

And I think that's not how good Bayesian

modeling happens.

843

00:57:50,005 --> 00:57:56,245

Good Bayesian modeling, you sit back a

little bit and think through everything.

844

00:57:56,285 --> 00:57:57,101

And...

845

00:57:57,101 --> 00:58:04,041

I think it's a challenge convincing people

sometimes to make that a habitual part of

846

00:58:04,041 --> 00:58:05,221

the workflow.

847

00:58:05,821 --> 00:58:06,221

Yeah.

848

00:58:06,221 --> 00:58:08,041

Bayesian models need love.

849

00:58:08,341 --> 00:58:10,421

You need to give it love for sure.

850

00:58:10,761 --> 00:58:17,201

I personally have been working lately on

an academic project like that where we're

851

00:58:17,201 --> 00:58:26,683

writing a paper on, basically it's a trade

paper on biology, marine biology trade.

852

00:58:26,701 --> 00:58:30,461

And the model is extremely complex.

853

00:58:30,461 --> 00:58:38,771

And that's why I'm on this project: to

work with the academics working on it who

854

00:58:38,771 --> 00:58:42,501

are extremely knowledgeable, of course,

but on their domain.

855

00:58:42,621 --> 00:58:47,071

And me, I don't understand anything about

the biology part, but I'm just here to try

856

00:58:47,071 --> 00:58:49,121

and make the model work.

857

00:58:49,481 --> 00:58:53,921

And the model is tremendously complicated

because the phenomenon they are studying

858

00:58:53,921 --> 00:58:55,641

is extremely complex.

859

00:58:55,641 --> 00:58:56,421

So.

860

00:58:56,557 --> 00:59:02,297

Yeah, but like here, the amazing thing is

that the person leading the project, Aaron

861

00:59:02,297 --> 00:59:06,377

MacNeil, has a huge appetite for that kind

of work, right?

862

00:59:06,377 --> 00:59:14,057

And really loves doing the Bayesian model,

coding it, and then improving it together.

863

00:59:14,657 --> 00:59:18,607

But definitely that's a big endeavor,

takes a lot of time.

864

00:59:18,607 --> 00:59:22,187

But then the model is extremely powerful

afterwards and you can get a lot of

865

00:59:22,187 --> 00:59:26,189

inferences that you cannot have with a

classic trivial model.

866

00:59:26,189 --> 00:59:29,439

So, you know, there is no free lunch,

right?

867

00:59:29,439 --> 00:59:34,249

If your model is trivial, your inferences

probably will be, unless you're extremely

868

00:59:34,249 --> 00:59:37,969

lucky and you're just working on something

that nobody has worked on before.

869

00:59:37,969 --> 00:59:43,169

So then it's like a forest that's

completely new.

870

00:59:43,169 --> 00:59:48,779

But otherwise, if you want interesting

inferences, you have to have an

871

00:59:48,779 --> 00:59:49,649

interesting model.

872

00:59:49,649 --> 00:59:54,765

And that takes time, takes dedication, but

for sure it's extremely...

873

00:59:54,765 --> 00:59:57,645

interesting, and then afterwards it gives

you a lot of power.

874

00:59:58,345 --> 01:00:01,185

So, you know, it's a bit of a...

875

01:00:01,185 --> 01:00:04,915

That's also a bit frustrating to me in the

sense that the model is actually not going

876

01:00:04,915 --> 01:00:07,085

to be really part of the paper, right?

877

01:00:07,085 --> 01:00:09,665

People just care about the results of the

model.

878

01:00:09,945 --> 01:00:12,345

But me, it's like, and I mean, it makes

sense, right?

879

01:00:12,345 --> 01:00:17,885

It's like when you buy a car, yeah, the

engine is important, but you care about

880

01:00:17,885 --> 01:00:19,175

the whole car, right?

881

01:00:19,175 --> 01:00:22,495

But I'm guessing that the person who built

the engine is like, yeah, but without the

882

01:00:22,495 --> 01:00:24,285

engine, it's not even a car.

883

01:00:24,525 --> 01:00:28,525

So why don't you give credit to the

engine?

884

01:00:28,965 --> 01:00:30,125

But that makes sense.

885

01:00:30,125 --> 01:00:34,635

But it was really fun for me to see

because for me, the model is really the

886

01:00:34,635 --> 01:00:35,585

thing.

887

01:00:36,045 --> 01:00:41,705

But it's actually almost not even going to

be a part of the paper.

888

01:00:41,705 --> 01:00:43,885

It's going to be an annex or something

like that.

889

01:00:44,245 --> 01:00:45,085

Yeah.

890

01:00:45,085 --> 01:00:48,505

That's really weird.

891

01:00:48,885 --> 01:00:51,425

Put it in the appendix.

892

01:00:51,925 --> 01:00:53,575

Yeah.

893

01:00:53,575 --> 01:00:54,365

Yeah.

894

01:00:54,701 --> 01:00:58,941

So I've already taken a lot of your time,

Ed.

895

01:00:58,941 --> 01:01:04,001

So let's wrap up with the last two

questions.

896

01:01:04,001 --> 01:01:09,711

Before that, though, I'm curious, looking

forward, what exciting developments do you

897

01:01:09,711 --> 01:01:12,159

foresee in Bayesian psychometrics?

898

01:01:13,709 --> 01:01:21,329

Uh, the one that I see coming is related

to the speed issue again.

899

01:01:21,329 --> 01:01:28,779

So, um, there's more and

more MCMC stuff with GPUs.

900

01:01:28,779 --> 01:01:35,619

And I was at a Stan meeting last year

where they're talking about, um, you know,

901

01:01:35,619 --> 01:01:40,239

imagine being able to run hundreds of

parallel chains that all like share a burn

902

01:01:40,239 --> 01:01:41,773

in so that, you know,

903

01:01:41,773 --> 01:01:46,353

one chain isn't going to go off and do

something really crazy.

904

01:01:46,553 --> 01:01:49,073

I think all of that is really interesting.

905

01:01:49,073 --> 01:01:55,513

And I think that could really improve some

of these bigger psychometric models that

906

01:01:55,513 --> 01:02:02,453

can take a while to run if we could do

lots of parallel chains and be pretty sure

907

01:02:02,453 --> 01:02:04,613

that they're gonna converge.

908

01:02:04,613 --> 01:02:08,393

I think that's something coming that will be

very useful.

909

01:02:10,061 --> 01:02:14,501

Yeah, that definitely sounds like an

awesome project.

910

01:02:15,781 --> 01:02:20,591

So before letting you go, Ed, I'm going to

ask you the last two questions I ask every

911

01:02:20,591 --> 01:02:22,121

guest at the end of the show.

912

01:02:22,121 --> 01:02:26,951

First one, if you had unlimited time and

resources, which problem would you try to

913

01:02:26,951 --> 01:02:27,525

solve?

914

01:02:29,547 --> 01:02:30,757

Yes.

915

01:02:34,085 --> 01:02:41,525

So I guess people should say, you know,

world hunger or world peace or something,

916

01:02:41,985 --> 01:02:50,725

but I think I would probably go for

something that's closer to what I do.

917

01:02:51,105 --> 01:03:00,355

And one thing that comes to mind involves

maybe improving math education or making

918

01:03:00,355 --> 01:03:03,425

it more accessible to more people.

919

01:03:03,917 --> 01:03:11,457

I think at least in the US, like for

younger kids growing up with math, it

920

01:03:11,457 --> 01:03:16,987

feels a little bit like sports where if

you are fortunate to have gotten into it

921

01:03:16,987 --> 01:03:23,317

really early, then you like have this

advantage and you do well.

922

01:03:23,317 --> 01:03:30,837

But if you come into math late, say maybe

as a teenager, I think what happens

923

01:03:30,837 --> 01:03:32,365

sometimes is,

924

01:03:32,365 --> 01:03:36,875

You see other people that are way ahead of

you, like solving problems you have no

925

01:03:36,875 --> 01:03:38,825

idea how to do.

926

01:03:38,825 --> 01:03:45,965

And then you get maybe not so enthusiastic

and you just leave and do something else

927

01:03:45,965 --> 01:03:47,165

with your life.

928

01:03:47,165 --> 01:03:54,665

I think more could be done just to try to

get more interested people like staying in

929

01:03:54,665 --> 01:03:58,885

math related fields and doing more work

there.

930

01:03:58,885 --> 01:03:59,789

I think.

931

01:03:59,949 --> 01:04:04,749

with unlimited resources, that's the sort

of thing that I would try to do.

932

01:04:06,109 --> 01:04:08,869

Yeah, I love that.

933

01:04:08,889 --> 01:04:14,549

And definitely I can, yeah, I can

understand why you would say that.

934

01:04:14,549 --> 01:04:16,409

That's a very good point.

935

01:04:17,349 --> 01:04:22,029

I should say, I was late coming around

to math myself.

936

01:04:22,529 --> 01:04:28,261

I think I don't know what happens in every

country, but in the US, it feels like...

937

01:04:28,397 --> 01:04:33,497

You're just expected to think that math is

this tough thing that's not for you.

938

01:04:33,637 --> 01:04:39,187

And unless you have like influences in

your life that would convince you

939

01:04:39,187 --> 01:04:44,527

otherwise, I think a lot of kids just

don't even make an attempt to do something

940

01:04:44,527 --> 01:04:45,607

with math.

941

01:04:47,117 --> 01:04:49,657

Yeah, yeah, that's a good point.

942

01:04:50,817 --> 01:04:55,957

And second question, if you could have

dinner with any great scientific mind,

943

01:04:55,957 --> 01:04:59,173

dead, alive, or fictional, who would it

be?

944

01:05:01,133 --> 01:05:08,893

Yeah, this is one that is easy to

overthink or to really make a big thing

945

01:05:08,893 --> 01:05:09,693

about.

946

01:05:09,693 --> 01:05:13,373

But so here's one thing that I think

about.

947

01:05:13,553 --> 01:05:21,613

There's, I think it's called Stigler's law,

which is related to this idea that the

948

01:05:21,613 --> 01:05:27,993

person who is known for like a major

finding or scientific result often isn't

949

01:05:27,993 --> 01:05:29,981

the one that did the hard work.

950

01:05:30,113 --> 01:05:37,853

Maybe they were the ones that, like,

promoted themselves the most, or

951

01:05:37,853 --> 01:05:43,313

otherwise just got their name attached and

so if I'm having dinner, I want it to be

952

01:05:43,313 --> 01:05:44,563

more of a low-key dinner.

953

01:05:44,563 --> 01:05:50,963

So I don't necessarily want to go for the

most famous person that is the most known

954

01:05:50,963 --> 01:05:55,573

for something because I worry that they

would just like promote themselves the

955

01:05:55,573 --> 01:05:59,949

whole time or you would feel like you're

talking to a robot because they're

956

01:05:59,949 --> 01:06:03,809

like, they see themselves as kind

of above everyone.

957

01:06:04,689 --> 01:06:11,739

So with that in mind, and keeping it on

the Bayesian viewpoint, one person that

958

01:06:11,739 --> 01:06:20,309

comes to mind is Arianna Rosenbluth, who

was, I think, the first to

959

01:06:20,309 --> 01:06:26,629

like, program a Metropolis-Hastings

algorithm and did it in the context of the

960

01:06:26,629 --> 01:06:29,517

Manhattan project during World War II.

961

01:06:29,517 --> 01:06:33,997

So I think she would be an interesting

person to have dinner with.

962

01:06:33,997 --> 01:06:37,697

She clearly did some important work.

963

01:06:38,917 --> 01:06:44,127

Didn't quite get the recognition that some

others did, but also I think she didn't

964

01:06:44,127 --> 01:06:46,357

have a traditional academic career.

965

01:06:46,357 --> 01:06:50,427

So that means that dinner, you know, you

could talk about some work things, but

966

01:06:50,427 --> 01:06:57,737

also I think she would be interesting to

talk to just, you know, just about other

967

01:06:57,737 --> 01:06:58,829

non-work things.

968

01:06:58,829 --> 01:07:01,949

That's the kind of dinner that I would

like to have.

969

01:07:01,949 --> 01:07:03,589

So that's my answer.

970

01:07:03,589 --> 01:07:04,199

Love it.

971

01:07:04,199 --> 01:07:05,369

Love it, Ed.

972

01:07:05,369 --> 01:07:06,929

Fantastic answer.

973

01:07:07,509 --> 01:07:10,609

And definitely invite me to that dinner.

974

01:07:10,609 --> 01:07:12,509

That would be fascinating.

975

01:07:12,589 --> 01:07:14,009

Fantastic.

976

01:07:14,009 --> 01:07:16,129

Thanks a lot, Ed.

977

01:07:16,489 --> 01:07:18,749

We can call it a show.

978

01:07:19,049 --> 01:07:20,229

That was great.

979

01:07:20,229 --> 01:07:21,689

I learned a lot.

980

01:07:22,069 --> 01:07:28,941

And as usual, I will put a link to your

website and your socials and tutorials.

981

01:07:28,941 --> 01:07:32,581

in the show notes for those who want to

dig deeper.

982

01:07:32,581 --> 01:07:33,201

Thank you again.

983

01:07:33,201 --> 01:07:33,661

All right.

984

01:07:33,661 --> 01:07:35,361

Thanks for taking the time and being on

the show.

985

01:07:35,361 --> 01:07:36,441

Thanks for having me.

986

01:07:36,441 --> 01:07:37,609

It was fun.

987

01:07:41,549 --> 01:07:45,329

This has been another episode of Learning

Bayesian Statistics.

988

01:07:45,329 --> 01:07:50,299

Be sure to rate, review, and follow the

show on your favorite podcatcher, and

989

01:07:50,299 --> 01:07:55,169

visit learnbayesstats.com for more

resources about today's topics, as well as

990

01:07:55,169 --> 01:07:59,909

access to more episodes to help you reach

true Bayesian state of mind.

991

01:07:59,909 --> 01:08:01,889

That's learnbayesstats.com.

992

01:08:01,889 --> 01:08:06,709

Our theme music is Good Bayesian by Baba

Brinkman, feat. MC Lars and Mega Ran.

993

01:08:06,709 --> 01:08:09,849

Check out his awesome work at

bababrinkman.com.

994

01:08:09,849 --> 01:08:11,029

I'm your host.

995

01:08:11,029 --> 01:08:11,989

Alex Andorra.

996

01:08:11,989 --> 01:08:16,249

You can follow me on Twitter at alex

underscore andorra, like the country.

997

01:08:16,249 --> 01:08:21,329

You can support the show and unlock

exclusive benefits by visiting patreon

998

01:08:21,329 --> 01:08:23,529

.com/learnbayesstats.

999

01:08:23,529 --> 01:08:25,929

Thank you so much for listening and for

your support.

Speaker:

01:08:25,929 --> 01:08:31,839

You're truly a good Bayesian, change your

predictions after taking information in, and

Speaker:

01:08:31,839 --> 01:08:35,129

if you're thinking I'll be less than

amazing.

Speaker:

01:08:35,209 --> 01:08:38,109

Let's adjust those expectations.

Speaker:

01:08:38,285 --> 01:08:43,695

Let me show you how to be a good Bayesian

Change calculations after taking fresh

Speaker:

01:08:43,695 --> 01:08:49,735

data in. Those predictions that your brain

is making, let's get them on a solid

Speaker:

01:08:49,735 --> 01:08:51,365

foundation

Previous post
Next post