Learning Bayesian Statistics

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag 😉

Takeaways

  • Convincing non-stats stakeholders in sports analytics can be challenging, but building trust and confirming their prior beliefs can help in gaining acceptance.
  • Combining subjective beliefs with objective data in Bayesian analysis leads to more accurate forecasts.
  • The availability of massive data sets has revolutionized sports analytics, allowing for more complex and accurate models.
  • Sports analytics models should consider factors like rest, travel, and altitude to capture the full picture of team performance.
  • The impact of budget on team performance in American sports and the use of plus-minus models in basketball and American football are important considerations in sports analytics.
  • The future of sports analytics lies in making analysis more accessible and digestible for everyday fans.
  • There is a need for more focus on estimating distributions and variance around estimates in sports analytics.
  • AI tools can empower analysts to do their own analysis and make better decisions, but it’s important to ensure they understand the assumptions and structure of the data.
  • Measuring the value of certain positions, such as midfielders in soccer, is a challenging problem in sports analytics.
  • Game theory plays a significant role in sports strategies, and optimal strategies can change over time as the game evolves.

Chapters

00:00 Introduction and Overview

09:27 The Power of Bayesian Analysis in Sports Modeling

16:28 The Revolution of Massive Data Sets in Sports Analytics

31:03 The Impact of Budget in Sports Analytics

39:35 Introduction to Sports Analytics

52:22 Plus-Minus Models in American Football

01:04:11 The Future of Sports Analytics

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.

Speaker:

Folks, you may know it by now, I am a huge

sports fan.

2

00:00:09,082 --> 00:00:12,712

So needless to say that this episode was

like being in a candy store for me.

3

00:00:12,712 --> 00:00:16,362

Well, more appropriately, in a chocolate

store.

4

00:00:16,362 --> 00:00:21,622

Paul Sabin is so knowledgeable that this

conversation was an absolute blast for me.

5

00:00:21,622 --> 00:00:26,182

In it, Paul discusses his experience with

non-stats stakeholders in sports

6

00:00:26,182 --> 00:00:31,054

analytics and the challenges of convincing

them to adopt evidence-based decisions.

7

00:00:31,054 --> 00:00:35,354

He also explains his soccer power ratings

and projections model, which uses a

8

00:00:35,354 --> 00:00:39,374

Bayesian approach and expected goals, as

well as the importance of understanding

9

00:00:39,374 --> 00:00:43,254

player value in difficult-to-measure

positions and the need for more accessible

10

00:00:43,254 --> 00:00:46,174

and digestible sports analytics for fans.

11

00:00:46,174 --> 00:00:50,494

We also touch on the impact of budget on

team performance in American sports and

12

00:00:50,494 --> 00:00:54,974

the use of plus-minus models in

basketball and American football.

13

00:00:54,974 --> 00:00:59,494

Paul is a senior fellow at the Wharton

Sports Analytics and Business Initiative

14

00:00:59,494 --> 00:01:00,654

and a lecturer

15

00:01:00,654 --> 00:01:04,694

in the Department of Statistics and Data

Science at the Wharton School of the

16

00:01:04,694 --> 00:01:06,594

University of Pennsylvania.

17

00:01:06,754 --> 00:01:11,214

He has spent his entire career as a sports

analytics professional, teaching and

18

00:01:11,214 --> 00:01:13,914

leading sports analytics research

projects.

19

00:01:13,914 --> 00:01:20,614

This is Learning Bayesian Statistics,

episode 108, recorded April 11, 2024.

20

00:01:28,598 --> 00:01:42,018

Welcome to Learning Bayesian Statistics, a

podcast about Bayesian inference, the

21

00:01:42,018 --> 00:01:45,558

methods, the projects, and the people who

make it possible.

22

00:01:45,558 --> 00:01:47,818

I'm your host, Alex Andorra.

23

00:01:47,818 --> 00:01:52,418

You can follow me on Twitter at Alex

underscore Andorra, like the country, for

24

00:01:52,418 --> 00:01:53,968

any info about the show.

25

00:01:53,968 --> 00:01:56,418

LearnBayesStats.com is the place to be.

26

00:01:56,418 --> 00:01:57,326

Show notes,

27

00:01:57,326 --> 00:02:01,546

becoming a corporate sponsor, unlocking

Bayesian merch, supporting the show on

28

00:02:01,546 --> 00:02:03,946

Patreon, everything is in there.

29

00:02:03,946 --> 00:02:05,806

That's LearnBayesStats.com.

30

00:02:05,806 --> 00:02:10,186

If you're interested in one-on-one

mentorship, online courses, or statistical

31

00:02:10,186 --> 00:02:15,366

consulting, feel free to reach out and

book a call at topmate.io slash Alex

32

00:02:15,366 --> 00:02:17,266

underscore Andorra.

33

00:02:17,266 --> 00:02:21,126

See you around, folks, and best Bayesian

wishes to you all.

34

00:02:24,694 --> 00:02:28,974

Welcome to Learning Bayesian Statistics.

35

00:02:54,542 --> 00:02:58,082

a full conversation in French as we just

had before recording.

36

00:02:58,462 --> 00:02:59,182

Well done.

37

00:02:59,182 --> 00:03:00,702

It used to be though.

38

00:03:00,762 --> 00:03:02,942

Go back two to three hundred years.

39

00:03:02,942 --> 00:03:06,062

Maybe you just don't go to Africa enough.

40

00:03:06,062 --> 00:03:09,182

That's where French is spoken a lot now

too.

41

00:03:09,222 --> 00:03:10,342

Exactly.

42

00:03:10,342 --> 00:03:14,801

But other than that, you can see French

used to be a very international language

43

00:03:14,801 --> 00:03:20,642

because in my travels, almost all the time

people tell me, yeah, I studied French in

44

00:03:20,642 --> 00:03:21,762

high school.

45

00:03:21,762 --> 00:03:24,302

And the only thing they can say is just a

few words.

46

00:03:24,302 --> 00:03:27,462

Which is normal, like if you don't use it,

right?

47

00:03:27,462 --> 00:03:31,902

But yeah, you can see that because French

is still, or was still taught in high

48

00:03:31,902 --> 00:03:34,162

school and now less and less.

49

00:03:34,282 --> 00:03:37,262

So yeah, so well done Paul for that.

50

00:03:37,262 --> 00:03:41,462

I know, I don't think French is an easy

language to learn.

51

00:03:41,462 --> 00:03:43,152

What has been your experience?

52

00:03:43,152 --> 00:03:44,882

I'm actually very curious.

53

00:03:45,902 --> 00:03:50,892

You know, it's hard to say, so this is a

statistics pod or data science podcast.

54

00:03:50,892 --> 00:03:53,492

So I guess I can't really, I can't really

compare it to anything else.

55

00:03:53,492 --> 00:03:57,102

That's the only other language I've

learned besides my native English.

56

00:03:57,102 --> 00:04:03,062

So, you know, I guess, you know, a sample

size of one for me. I took it in high

57

00:04:03,062 --> 00:04:03,742

school as well.

58

00:04:03,742 --> 00:04:04,742

I hated it.

59

00:04:04,742 --> 00:04:11,082

I had, so, you know, coming from America,

you know, so the reason I chose, you know,

60

00:04:11,082 --> 00:04:15,076

seventh grade is when I had to choose

whether I was taking French or Spanish.

61

00:04:15,086 --> 00:04:19,466

And I'm the youngest of four kids in my

family growing up.

62

00:04:19,466 --> 00:04:23,266

And my older siblings told me that the

Spanish teacher was really mean.

63

00:04:23,266 --> 00:04:26,166

And that's originally why I took

French.

64

00:04:26,866 --> 00:04:29,126

and then I took it for the required two to

three years.

65

00:04:29,126 --> 00:04:30,266

And then I was done.

66

00:04:30,266 --> 00:04:34,566

I had in high school, I had this teacher

from Belgium and I still remember her

67

00:04:34,566 --> 00:04:39,666

name, Madame Vendon Plus, and I couldn't

stand her, but come, come to find out

68

00:04:39,666 --> 00:04:42,986

looking back in life that she was actually

a really nice person.

69

00:04:42,986 --> 00:04:45,030

She was just Belgian.

70

00:04:45,102 --> 00:04:51,982

And the cultural, you know, like Americans

think they're the best, and French-speaking

71

00:04:51,982 --> 00:04:56,642

Europeans also think

they're the best because they ruled the

72

00:04:56,642 --> 00:05:01,222

world in the 1700s and 1800s, and America felt

like they've ruled the world for the last

73

00:05:01,222 --> 00:05:02,642

100 years.

74

00:05:02,642 --> 00:05:06,602

And so when you get into a room together

and you think both of your cultures are

75

00:05:06,602 --> 00:05:09,262

superior, you know, that doesn't go well

together.

76

00:05:09,282 --> 00:05:10,798

But actually, so after that, I didn't...

77

00:05:10,798 --> 00:05:11,698

speak French at all.

78

00:05:11,698 --> 00:05:15,738

And then I did church service for my

church for two years and I lived in

79

00:05:15,738 --> 00:05:19,918

Montreal, I lived in Quebec, not actually

in the city, I lived in a lot of rural

80

00:05:19,918 --> 00:05:21,398

small towns.

81

00:05:21,398 --> 00:05:23,528

And so I studied French really hard.

82

00:05:23,528 --> 00:05:27,098

I had to learn the very strong Quebecois

accent.

83

00:05:27,158 --> 00:05:32,788

And then when I went back to school, it's

when I really honed my French.

84

00:05:32,788 --> 00:05:37,638

I was very conversational, could speak

very fluently in Quebec, but then, you

85

00:05:37,638 --> 00:05:39,182

know, I had to learn the grammar a little

bit more.

86

00:05:39,182 --> 00:05:39,722

in depth.

87

00:05:39,722 --> 00:05:43,182

So then I studied French

at university as well.

88

00:05:43,182 --> 00:05:47,622

So, you know, immersing yourself and the

actually like learning languages because

89

00:05:47,622 --> 00:05:50,922

when I learned it in school, it

never made sense to me.

90

00:05:50,922 --> 00:05:55,161

But when I studied it on my own and I

studied conjugation and all these things,

91

00:05:55,161 --> 00:05:56,902

it became kind of like a math problem.

92

00:05:56,902 --> 00:06:00,942

And so when I would speak a sentence in my

head, I'd always be like, I need a

93

00:06:00,942 --> 00:06:01,602

subject.

94

00:06:01,602 --> 00:06:03,182

I need to conjugate the verb.

95

00:06:03,182 --> 00:06:06,766

And then I need to say like what I'm, you

know, just

96

00:06:06,766 --> 00:06:08,646

do an adverb or an adjective after it.

97

00:06:08,646 --> 00:06:12,206

And like it made sense in my head, but

that's not how I was taught in school.

98

00:06:12,206 --> 00:06:15,736

I was taught, I had to memorize all these

words, like everything in the kitchen.

99

00:06:15,736 --> 00:06:16,936

How do you say dishwasher?

100

00:06:16,936 --> 00:06:18,066

How do you say refrigerator?

101

00:06:18,066 --> 00:06:19,656

How do you say fork?

102

00:06:19,656 --> 00:06:20,786

How do you say spoon?

103

00:06:20,786 --> 00:06:25,026

I couldn't learn like that, but by, like,

living and thinking about French as a

104

00:06:25,026 --> 00:06:28,736

math equation, it made sense in my head

and I was able to pick it up.

105

00:06:28,736 --> 00:06:29,926

You know, sure.

106

00:06:29,926 --> 00:06:33,276

I made tons of mistakes and embarrassed

myself, but it wasn't too bad.

107

00:06:33,276 --> 00:06:34,346

And that's how you learn.

108

00:06:34,346 --> 00:06:34,736

Yeah.

109

00:06:34,736 --> 00:06:36,398

So I'm guessing.

110

00:06:36,398 --> 00:06:40,618

Like from that answer, I'm guessing people

already know why I invited you on the

111

00:06:40,618 --> 00:06:41,318

podcast.

112

00:06:41,318 --> 00:06:44,098

Very nerdy answer about languages,

that's perfect.

113

00:06:44,098 --> 00:06:45,458

Thanks a lot.

114

00:06:45,458 --> 00:06:47,468

And yeah, I completely relate actually.

115

00:06:47,468 --> 00:06:54,698

I learned English and German in high

school and yeah, kind of the same.

116

00:06:54,698 --> 00:06:58,698

I always hated formal language learning.

117

00:06:58,878 --> 00:07:02,798

And like in the end I learned these

languages and Spanish that was the same

118

00:07:02,798 --> 00:07:06,542

and Italian that was the same, just going

to the country basically.

119

00:07:06,542 --> 00:07:12,562

And yeah, as you were saying, I think also

what it adds is you've got skin in the

120

00:07:12,562 --> 00:07:13,122

game.

121

00:07:13,122 --> 00:07:17,522

You're in the country, you're having a

conversation with someone.

122

00:07:17,522 --> 00:07:21,862

If you're not able to talk, you look

extremely stupid.

123

00:07:21,862 --> 00:07:27,182

So it's a very good incentive for the

brain to step up and learn.

124

00:07:27,182 --> 00:07:28,602

And that's really awesome.

125

00:07:28,602 --> 00:07:31,214

And then when you are in the situation

that you...

126

00:07:31,214 --> 00:07:33,434

don't know what to say, you remember that.

127

00:07:33,434 --> 00:07:36,814

And then when you learn, this is what I

should have said, it sticks with you

128

00:07:36,814 --> 00:07:40,554

because it has an emotional attachment to

it.

129

00:07:40,554 --> 00:07:40,814

Yeah.

130

00:07:40,814 --> 00:07:41,174

Yeah.

131

00:07:41,174 --> 00:07:42,254

No, exactly.

132

00:07:42,254 --> 00:07:47,394

And I mean, and that's going to be a good

segue to my first question to you, but I

133

00:07:47,394 --> 00:07:55,454

think it's also one of the situations in

life where you can really feel and see

134

00:07:55,454 --> 00:07:56,634

your brain learning.

135

00:07:56,634 --> 00:07:59,994

So that's why I also really love learning

new languages and going to countries to do

136

00:07:59,994 --> 00:08:01,134

that because.

137

00:08:01,134 --> 00:08:03,834

Like you arrive in the country, you don't

know how to say anything.

138

00:08:03,834 --> 00:08:08,954

And in just a few weeks, your brain starts

picking up stuff and you can really,

139

00:08:08,954 --> 00:08:14,054

really feel your brain doing its amazing

work that it's been like conditioned to do

140

00:08:14,054 --> 00:08:16,074

from years of evolution.

141

00:08:16,074 --> 00:08:19,174

And to me, that's just absolutely

incredible that the brain is able to do

142

00:08:19,174 --> 00:08:19,734

that.

143

00:08:19,734 --> 00:08:24,714

Even when you're like in your thirties and

beyond, you can do that.

144

00:08:24,714 --> 00:08:27,914

And it's just, I found that absolutely

incredible.

145

00:08:27,914 --> 00:08:30,798

And that's kind of like a Bayesian.

146

00:08:30,798 --> 00:08:35,938

neural network, you know, so I mean, see

that segue, I should definitely have a

147

00:08:35,938 --> 00:08:36,798

podcast.

148

00:08:37,958 --> 00:08:40,678

So, actually, talking about Bayes.

149

00:08:41,118 --> 00:08:45,038

Yeah, I invited you on the podcast because

you do absolutely awesome work on sports

150

00:08:45,038 --> 00:08:46,798

modeling.

151

00:08:46,818 --> 00:08:52,238

And people know that I'm a big fan of a

lot of sports.

152

00:08:52,238 --> 00:08:54,058

I love modeling sports and so on.

153

00:08:54,058 --> 00:08:55,958

So I'm super happy to have you here.

154

00:08:55,958 --> 00:08:59,790

And I have a list of questions that is

embarrassingly long.

155

00:08:59,790 --> 00:09:05,470

But maybe can you tell us if you are

actually yourself using some Bayesian

156

00:09:05,470 --> 00:09:09,330

methods, if you're familiar with those or

not?

157

00:09:09,510 --> 00:09:13,670

And yeah, in general, what does that look

like in your work?

158

00:09:13,850 --> 00:09:14,390

Yeah.

159

00:09:14,390 --> 00:09:16,830

So yeah, I mean, just a quick background

about myself, right?

160

00:09:16,830 --> 00:09:23,410

I've worked in sports, what we call sports

analytics for almost 10 years now.

161

00:09:24,650 --> 00:09:27,566

Actually, I was getting my PhD

162

00:09:27,566 --> 00:09:34,736

in statistics, and, you know, there was

this job opportunity at ESPN, you know,

163

00:09:34,736 --> 00:09:38,506

which is a sports broadcasting television

channel in the US and a few other

164

00:09:38,506 --> 00:09:38,906

countries.

165

00:09:38,906 --> 00:09:45,066

And, you know, I got the job offer to work

on their sports analytics team where

166

00:09:45,066 --> 00:09:50,146

essentially what the team there does is

make forecasts so that, you know, they can

167

00:09:50,146 --> 00:09:54,426

show on TV, you know, on the bottom line,

like who's expected to win, or they can,

168

00:09:54,426 --> 00:09:56,782

we will run simulations on.

169

00:09:56,782 --> 00:09:59,502

you know, who's likely to win the

championship, you know, all throughout the

170

00:09:59,502 --> 00:09:59,982

season.

171

00:09:59,982 --> 00:10:03,722

And so, you know, you can tell stories

with that saying, you know, the team was

172

00:10:03,722 --> 00:10:04,931

just like the beginning of the season.

173

00:10:04,931 --> 00:10:08,282

No one thought they were going to be any

good, but just look how it, you know, they

174

00:10:08,282 --> 00:10:09,682

got better or the opposite.

175

00:10:09,682 --> 00:10:13,562

Like they were supposed to be really good

and everything just went wrong.

176

00:10:13,562 --> 00:10:19,922

And so in my field in sports modeling, I

would think actually you can't, you can't

177

00:10:19,922 --> 00:10:21,152

do it without being Bayesian.

178

00:10:21,152 --> 00:10:24,814

And so when I would interview people, I'd

always focus on, on those.

179

00:10:24,814 --> 00:10:28,354

So as people coming out of school,

sometimes they don't always learn Bayesian

180

00:10:28,354 --> 00:10:30,654

methods very well.

181

00:10:30,654 --> 00:10:34,814

And the reason is in sports, sample sizes

are very small and you have to make

182

00:10:34,814 --> 00:10:37,874

forecasts with very limited data.

183

00:10:38,094 --> 00:10:42,734

And the great thing about Bayesian

statistics is that you actually have more

184

00:10:42,734 --> 00:10:43,164

data.

185

00:10:43,164 --> 00:10:44,334

You just haven't observed it.

186

00:10:44,334 --> 00:10:48,110

You have expertise or you have opinions,

but those opinions actually matter.

187

00:10:48,110 --> 00:10:51,930

And so maybe we'll get into this, but I'm

actually a very strong advocate, because of

188

00:10:51,930 --> 00:10:55,450

my field, of subjective Bayesian

analysis.

189

00:10:55,450 --> 00:10:59,610

It's okay to insert some information into

your models and it usually makes them

190

00:10:59,610 --> 00:11:00,910

better.
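To make Paul's point concrete that a prior behaves like data you haven't observed yet, here is a minimal sketch of a conjugate Beta-Binomial update for a team's win probability. It is an illustration only, not Paul's actual model (he mentions working in R and Stan), and the prior Beta(6, 4) and the three-game record are hypothetical numbers chosen for the example.

```python
def beta_binomial_update(prior_a, prior_b, wins, losses):
    """Conjugate update: a Beta(a, b) prior plus binomial win/loss data
    gives a Beta(a + wins, b + losses) posterior."""
    return prior_a + wins, prior_b + losses


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior: expert opinion says this is roughly a .600 team,
# encoded as if we had already "seen" 6 wins and 4 losses.
prior_a, prior_b = 6.0, 4.0

# Hypothetical early-season data: 3 games, all losses.
wins, losses = 0, 3

post_a, post_b = beta_binomial_update(prior_a, prior_b, wins, losses)
raw_rate = wins / (wins + losses)            # overreacts to only 3 games
posterior_mean = beta_mean(post_a, post_b)   # 6 / 13

print(f"raw win rate:   {raw_rate:.3f}")
print(f"posterior mean: {posterior_mean:.3f}")
```

With only three losses observed, the raw win rate says 0.000, while the posterior mean (6 / 13, about 0.462) moves away from the prior only as far as three games justify. A stronger prior (larger a + b) would move even less.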

191

00:11:01,910 --> 00:11:02,260

Yeah.

192

00:11:02,260 --> 00:11:03,690

Well, awesome.

193

00:11:03,990 --> 00:11:07,110

Couldn't have dreamt better, and I have to

fully disclose.

194

00:11:07,110 --> 00:11:10,630

I didn't know Paul was going to answer

that because that's not really, I haven't

195

00:11:10,630 --> 00:11:14,910

seen that in your, you know, on your

website or elsewhere.

196

00:11:15,086 --> 00:11:18,786

So before, while preparing the episode, I

didn't know if you were already using

197

00:11:18,786 --> 00:11:20,346

Bayesian methods or not.

198

00:11:20,346 --> 00:11:23,546

But definitely, definitely happy to hear

that.

199

00:11:23,546 --> 00:11:26,146

And so that people know that was not a

conspiracy.

200

00:11:26,146 --> 00:11:29,286

I didn't know anything that Paul was going

to say.

201

00:11:30,486 --> 00:11:32,686

OK, so that's awesome.

202

00:11:33,086 --> 00:11:37,446

So I'm an open source developer, so I'm

always very curious about the stack you're

203

00:11:37,446 --> 00:11:38,446

using.

204

00:11:39,046 --> 00:11:45,312

What are you using actually when you're

doing Bayesian analysis of a sports model?

205

00:11:46,094 --> 00:11:50,194

So in my career, I almost always use R and

Stan.

206

00:11:50,194 --> 00:11:53,714

So if I'm doing Bayes analysis, I write a

lot of Stan code.

207

00:11:55,154 --> 00:11:58,254

It's gotten easier with ChatGPT.

208

00:11:58,474 --> 00:12:00,714

It doesn't do it all the way, right?

209

00:12:00,714 --> 00:12:04,934

But if it's like, hey, I want to build

this kind of model, it'll at least give me

210

00:12:04,934 --> 00:12:05,724

a good framework.

211

00:12:05,724 --> 00:12:10,914

And then I can adjust it and edit it as I

want from there.

212

00:12:10,994 --> 00:12:11,694

Yeah.

213

00:12:11,714 --> 00:12:12,434

Yeah.

214

00:12:12,614 --> 00:12:15,662

And I mean, for sure, you cannot go wrong

with the.

215

00:12:15,662 --> 00:12:17,522

with R and Stan.

216

00:12:17,522 --> 00:12:19,022

So yeah, definitely.

217

00:12:19,022 --> 00:12:26,902

And we've had the, one of the creators of

Stan, Andrew Gelman, was back on the

218

00:12:26,902 --> 00:12:29,662

podcast a few weeks ago.

219

00:12:29,742 --> 00:12:35,382

It's not released yet, but through time

travel, it will have been released when

220

00:12:35,382 --> 00:12:36,702

your episode is out.

221

00:12:36,702 --> 00:12:42,042

So folks, you can go back to - Right,

because I am definitely a lesser draw than

222

00:12:42,042 --> 00:12:44,602

Andrew Gelman is, but that's great.

223

00:12:44,846 --> 00:12:52,646

No, yeah, so if people are curious about

what Andrew has been up to lately, it's

224

00:12:52,646 --> 00:12:56,066

the third time he's been on the show and

he just released a new book, Active

225

00:12:56,066 --> 00:12:58,256

Statistics, that I definitely recommend.

226

00:12:58,256 --> 00:13:00,646

It's really fun to read.

227

00:13:00,746 --> 00:13:05,246

It's like, it's how to teach statistics

with stories, which actually relates to

228

00:13:05,246 --> 00:13:11,146

something you just said, Paul, about the,

like, cool and fun way to relate

229

00:13:11,146 --> 00:13:12,934

statistics to...

230

00:13:12,974 --> 00:13:20,234

non-stats people was to be able to tell

stories about a team's probability of

231

00:13:20,234 --> 00:13:23,034

winning or any forecast like that.

232

00:13:23,034 --> 00:13:27,774

So that's definitely interesting to hear

you talk about that.

233

00:13:27,894 --> 00:13:35,134

And actually I'm curious because I've been

following that field of sports analytics

234

00:13:35,134 --> 00:13:40,176

for a few years and I've seen it

personally mature.

235

00:13:40,206 --> 00:13:44,886

quite a lot and evolved quite a lot when

it comes to the technology and the data

236

00:13:44,886 --> 00:13:46,066

availability.

237

00:13:46,086 --> 00:13:52,906

So I'm curious what an expert like you

thinks about that evolution of technology

238

00:13:52,906 --> 00:13:58,046

and data availability and how that changed

the landscape of sports analytics.

239

00:13:59,246 --> 00:14:02,576

Yeah, I mean, it's exploded in the last 10

to 15 years.

240

00:14:02,576 --> 00:14:07,822

So I mean, if people are familiar with the

book slash movie Moneyball, which is

241

00:14:07,822 --> 00:14:10,092

20, about 20 years, the book is about 20

years old now.

242

00:14:10,092 --> 00:14:13,002

The movie is about 12, 13 years old now.

243

00:14:13,782 --> 00:14:19,642

you know, back then in baseball, baseball

was the sport that sort of took off in

244

00:14:19,642 --> 00:14:20,252

sports analytics.

245

00:14:20,252 --> 00:14:22,502

I mean, for a couple of reasons.

246

00:14:22,502 --> 00:14:25,142

One, the game is very discrete.

247

00:14:25,142 --> 00:14:27,322

So there are starting and stopping points.

248

00:14:27,322 --> 00:14:28,522

So you can measure.

249

00:14:28,522 --> 00:14:29,052

Right.

250

00:14:29,052 --> 00:14:32,742

Discrete events very well in baseball, but

two, like they're the only sport that

251

00:14:32,742 --> 00:14:36,500

actually had a really long running data

set.

252

00:14:36,558 --> 00:14:39,938

And that went back and they've been

keeping statistics in baseball and you can

253

00:14:39,938 --> 00:14:46,178

actually go back to the 1800s and find out

how people were playing baseball in 1895.

254

00:14:46,178 --> 00:14:47,698

No other sport has that.

255

00:14:47,698 --> 00:14:50,288

So that's, that's probably the reason why

baseball took off.

256

00:14:50,568 --> 00:14:54,438

but since then, you know, every sport for

a while after that, every sport had what

257

00:14:54,438 --> 00:14:57,638

we call play-by-play data, which is like,

this is what happens.

258

00:14:57,638 --> 00:15:01,478

Soccer had a, a version that was called

event data.

259

00:15:01,478 --> 00:15:02,670

So people would

260

00:15:02,670 --> 00:15:06,350

watch a game and every time someone

touched the ball or made a pass, they

261

00:15:06,350 --> 00:15:10,990

would mark, the ball was touched here on

the field and it was passed to there or

262

00:15:10,990 --> 00:15:12,670

they dribbled from here to there.

263

00:15:12,670 --> 00:15:18,070

So it was, they kind of were discretizing

soccer in a way to make it a similar

264

00:15:18,070 --> 00:15:18,809

format.
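Event data of the kind Paul describes turns a continuous game into a list of discrete records: who touched the ball, where, and where it went. A minimal sketch, assuming a hypothetical `Event` record with 0-100 pitch coordinates and made-up players; real data providers each define their own schema and coordinate system.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One discretized on-ball action (hypothetical schema)."""
    minute: int
    player: str
    kind: str                                  # e.g. "pass", "dribble", "shot"
    start: Tuple[float, float]                 # (x, y) where the ball was touched
    end: Optional[Tuple[float, float]] = None  # where it went, if recorded

    def distance(self) -> float:
        """Straight-line distance the ball travelled, 0 if no end point."""
        if self.end is None:
            return 0.0
        return math.dist(self.start, self.end)


# A tiny hypothetical sequence: a pass, a dribble, then a shot.
events = [
    Event(12, "Player A", "pass", (30.0, 50.0), (60.0, 90.0)),
    Event(12, "Player B", "dribble", (60.0, 90.0), (75.0, 90.0)),
    Event(13, "Player B", "shot", (75.0, 90.0)),
]

passes = [e for e in events if e.kind == "pass"]
dribble_distance = sum(e.distance() for e in events if e.kind == "dribble")
print(len(passes), passes[0].distance(), dribble_distance)
```

Storing each touch as an immutable record keeps the data easy to filter and aggregate, which is what made this format a natural fit for the play-by-play style analysis used in other sports.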

265

00:15:18,809 --> 00:15:22,150

But then about 10 years ago, we started

getting this player tracking data, which

266

00:15:22,150 --> 00:15:27,650

is the location of everybody and the ball

or the puck on the field, you know,

267

00:15:27,650 --> 00:15:30,090

depending on the sport, 10 to 25 times per

second.
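For a rough sense of scale, the arithmetic below counts raw position records for a single soccer match at the 10 to 25 samples per second Paul mentions. It assumes a 90-minute match with no stoppage time and 23 tracked objects (22 players plus the ball); these are illustrative assumptions, and real feeds store several fields per record.

```python
def tracked_positions(match_minutes: int, hz: int, n_objects: int) -> int:
    """Number of (x, y) position records for one match."""
    frames = match_minutes * 60 * hz   # one frame per sample
    return frames * n_objects          # one position per tracked object per frame


low = tracked_positions(90, 10, 23)
high = tracked_positions(90, 25, 23)
print(f"{low:,} to {high:,} positions per match")
```

That is on the order of one to three million positions for a single match, before multiplying by every match in a season.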

268

00:15:30,090 --> 00:15:32,334

And that's drastically changed.

269

00:15:32,334 --> 00:15:34,934

the methodologies and things that are

used.

270

00:15:34,934 --> 00:15:39,174

So, I mean, Bayesian analysis was great

for this play-by-play data or even, you

271

00:15:39,174 --> 00:15:44,034

know, game-by-game data and measuring how,

how players or teams performed.

272

00:15:44,034 --> 00:15:48,174

And then now we've started getting such

huge data sets that, you know, more of the

273

00:15:48,174 --> 00:15:51,654

computer science world, neural networks,

things like that started becoming much

274

00:15:51,654 --> 00:15:56,194

more prevalent in sports analysis just

because the data sets were so massive.

275

00:15:56,194 --> 00:15:58,134

Not that statistics doesn't play a role.

276

00:15:58,134 --> 00:15:58,974

It still does.

277

00:15:58,974 --> 00:15:59,854

And I think.

278

00:15:59,854 --> 00:16:02,654

People sometimes overly rely on these

black box methods.

279

00:16:02,654 --> 00:16:06,594

They don't think about the implications or

the biases in the data, which are still

280

00:16:06,594 --> 00:16:07,214

important.

281

00:16:07,214 --> 00:16:11,504

But we have these huge amounts of data now

and it's just exploded to like, you know,

282

00:16:11,504 --> 00:16:17,474

if you want all the data in a season in

the NFL, it's like over one terabyte of

283

00:16:17,474 --> 00:16:22,194

locations of everybody on the field, on

every play, 25 times a second.

284

00:16:22,194 --> 00:16:22,994

It's just massive.

285

00:16:22,994 --> 00:16:23,214

Right.

286

00:16:23,214 --> 00:16:27,874

So it's, it's really changed the way

people have done things.

287

00:16:27,874 --> 00:16:28,270

Right.

288

00:16:28,270 --> 00:16:31,660

And we started going from really simple

questions to huge big questions.

289

00:16:31,660 --> 00:16:36,270

And the funny thing is now, I actually

think with the data being so large, people

290

00:16:36,270 --> 00:16:39,270

are now actually going back to answering

more simple questions.

291

00:16:39,270 --> 00:16:41,730

Like we're not trying to measure

everything all at once.

292

00:16:41,730 --> 00:16:46,330

Let's try to measure very specific things

that we weren't able to measure before.

293

00:16:46,330 --> 00:16:47,430

Hmm.

294

00:16:47,430 --> 00:16:50,560

Yeah, that is definitely interesting.

295

00:16:51,250 --> 00:16:55,810

And so, first:

296

00:16:56,270 --> 00:17:03,750

Is that availability of data, massive

availability of data, the case across the

297

00:17:03,750 --> 00:17:04,990

whole sports industry?

298

00:17:04,990 --> 00:17:09,330

Or is it more, well, the most historical

ones, as you were saying, maybe more

299

00:17:09,330 --> 00:17:10,130

baseball.

300

00:17:10,130 --> 00:17:15,610

I know the data sets are more massive there

and maybe other sports like soccer are

301

00:17:15,610 --> 00:17:21,250

less prevalent, the data sets are less

prevalent, less massive, or is that a

302

00:17:21,250 --> 00:17:21,920

uniform trend?

303

00:17:21,920 --> 00:17:22,650

First question.

304

00:17:22,650 --> 00:17:25,165

And then second question is,

305

00:17:25,165 --> 00:17:26,525

Where does that data live?

306

00:17:26,525 --> 00:17:32,665

Is that mostly open source or is that

still quite closed-source data?

307

00:17:33,025 --> 00:17:33,685

Yeah.

308

00:17:33,685 --> 00:17:38,125

So I mean, baseball is usually like the

cutting edge of everything because they

309

00:17:38,125 --> 00:17:39,905

had a head start.

310

00:17:40,065 --> 00:17:44,685

And basketball and then like kind of

American football, international soccer

311

00:17:44,685 --> 00:17:49,505

football and hockey kind of trail behind.

312

00:17:49,785 --> 00:17:53,565

But the data sets now in all those sports

are very massive.

313

00:17:53,565 --> 00:17:55,054

Hockey just got

314

00:17:55,054 --> 00:18:00,794

The NHL just got their player and puck

tracking data just a couple of years ago.

315

00:18:01,634 --> 00:18:05,634

Now baseball and basketball have moved on

beyond just knowing where players are on

316

00:18:05,634 --> 00:18:06,454

the field.

317

00:18:06,454 --> 00:18:09,224

They actually have data of what's called

pose data.

318

00:18:09,224 --> 00:18:14,054

So they know where the different joints in

their arms and legs are for every

319

00:18:14,054 --> 00:18:16,594

player on the field or on the court.

320

00:18:16,594 --> 00:18:18,174

So that data is massive.

321

00:18:18,174 --> 00:18:19,734

It's massive everywhere.

322

00:18:19,734 --> 00:18:24,902

There are companies that are trying to

collect new data based on

323

00:18:26,064 --> 00:18:30,934

video, so they're using computer vision

algorithms to do that, but largely to

324

00:18:30,934 --> 00:18:32,954

answer your second question.

325

00:18:33,374 --> 00:18:35,344

This is not open source data.

326

00:18:35,344 --> 00:18:38,914

So the old-school data, the play-by-play

data is open source.

327

00:18:38,914 --> 00:18:43,934

You can find that on every sport pretty

much via an open source mechanism now.

328

00:18:43,934 --> 00:18:49,934

But this huge, these huge data sets of the

tracking of the players, you know, 10 to

329

00:18:49,934 --> 00:18:51,834

25 times per second.

330

00:18:51,834 --> 00:18:53,454

It's usually all closed source.
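To put "massive" in perspective, here is a back-of-the-envelope count of tracking records for a single soccer match. It is only a sketch: the 23 tracked entities (22 players plus the ball), the flat 90 minutes with no stoppage time, and the use of the 10 and 25 samples per second mentioned above are simplifying assumptions, not figures from any particular data provider.

```python
# Hypothetical assumptions: 22 players plus the ball, 90 minutes, no stoppage time.
ENTITIES = 23
MINUTES = 90

def tracking_rows(entities: int, minutes: int, hz: int) -> int:
    """Position records for one match: one row per tracked entity per sample."""
    return entities * minutes * 60 * hz

rows_10hz = tracking_rows(ENTITIES, MINUTES, 10)
rows_25hz = tracking_rows(ENTITIES, MINUTES, 25)
print(f"{rows_10hz:,} to {rows_25hz:,} position records per match")
```

That is millions of rows per match before adding pose data (joint positions), which multiplies the count again.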

331

00:18:53,454 --> 00:18:54,510

There are a few

332

00:18:54,510 --> 00:18:59,490

releases of that here and there, you know,

the NFL does a competition where they

333

00:18:59,490 --> 00:19:04,230

release some of that data each year, like

a very small set.

334

00:19:04,870 --> 00:19:07,750

And a few other leagues have done

something similar as well.

335

00:19:07,750 --> 00:19:10,570

So, you know, that kind of gives

you a taste.

336

00:19:10,970 --> 00:19:14,570

If you have money, there are companies

that try to create that data themselves

337

00:19:14,570 --> 00:19:16,310

and they'll sell it to you.

338

00:19:16,310 --> 00:19:20,870

But you know, that's usually pretty

expensive for an individual person to buy.

339

00:19:21,190 --> 00:19:23,290

So again, just that.

340

00:19:24,044 --> 00:19:24,814

I see.

341

00:19:24,814 --> 00:19:25,314

Okay.

342

00:19:25,314 --> 00:19:26,214

Yeah, interesting.

343

00:19:26,214 --> 00:19:27,354

Definitely.

344

00:19:27,854 --> 00:19:31,624

Because data is kind of the oil of our

industry, right?

345

00:19:31,624 --> 00:19:38,514

So it's definitely interesting to know

what's the state of the supply of oil in a

346

00:19:38,514 --> 00:19:39,594

way.

347

00:19:39,814 --> 00:19:48,094

Maybe for people who are less versed in

sports modeling, can you give us an

348

00:19:48,094 --> 00:19:53,038

example of how analytical insights have

349

00:19:53,038 --> 00:19:59,498

directly influenced team strategy or

player selection in one of your consulting

350

00:19:59,498 --> 00:20:00,390

roles?

351

00:20:01,934 --> 00:20:02,214

Yeah.

352

00:20:02,214 --> 00:20:05,034

So I mean, I'll just kind of talk broadly

at first.

353

00:20:05,074 --> 00:20:08,104

I mean, so sometimes it's just the most

basic things, right?

354

00:20:08,104 --> 00:20:13,074

So like in basketball, people shoot more

three-pointers because all they did was

355

00:20:13,074 --> 00:20:18,153

figure out the expected value was larger

for a three-point shot than it was for most

356

00:20:18,153 --> 00:20:19,214

two-point shots.

357

00:20:19,214 --> 00:20:22,174

Not, not those layups and the dunks,

right?

358

00:20:22,174 --> 00:20:23,584

Those are very high percentages.

359

00:20:23,584 --> 00:20:30,074

So the expected value of a high

percentage times two is, you know,

360

00:20:30,074 --> 00:20:30,584

pretty good.

361

00:20:30,584 --> 00:20:32,230

But then even if

362

00:20:32,494 --> 00:20:36,254

the percentage drops off a lot when you

multiply it by three to get the expected

363

00:20:36,254 --> 00:20:37,294

value of a three-point shot,

364

00:20:37,294 --> 00:20:39,034

You know, it's also pretty good.

365

00:20:39,034 --> 00:20:42,134

So that means basketball has changed

drastically because of that.
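The expected-value argument is just make probability times point value. A minimal sketch; the shooting percentages below are made-up placeholders chosen only to illustrate the comparison, not real league numbers.

```python
# Placeholder make probabilities (hypothetical, not league averages).
SHOTS = {
    "layup/dunk": (0.65, 2),
    "long two": (0.40, 2),
    "three": (0.36, 3),
}

def expected_points(p_make: float, points: int) -> float:
    """Expected points per attempt = probability of making it x its point value."""
    return p_make * points

for shot, (p_make, points) in SHOTS.items():
    print(f"{shot:10s} {expected_points(p_make, points):.2f} points per attempt")
```

Under these illustrative numbers a 36% three is worth more per attempt than a 40% long two, while shots at the rim stay the best option.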

366

00:20:42,134 --> 00:20:48,154

And in my roles, I guess, you know, I

think in a lot of sports, there have just

367

00:20:48,154 --> 00:20:49,434

been a lot of open questions.

368

00:20:49,434 --> 00:20:50,864

People kind of move one way.

369

00:20:50,864 --> 00:20:56,414

And then I think actually, I think the

sports analysis does a really good job of

370

00:20:56,414 --> 00:20:58,990

tackling very easy problems first.

371

00:20:58,990 --> 00:21:02,430

But then I think there's actually a

tendency for the analysts themselves to be

372

00:21:02,430 --> 00:21:07,690

overconfident in their analysis and

they're not factoring in all of the

373

00:21:07,690 --> 00:21:10,130

sources of variation that might be there.

374

00:21:10,130 --> 00:21:16,790

And something I'm also very curious about

is, what's your experience with non-stats

375

00:21:16,790 --> 00:21:18,670

stakeholders?

376

00:21:18,670 --> 00:21:25,410

So coaches, scouts, players, how do they

typically respond to the analytics and the

377

00:21:25,410 --> 00:21:28,334

insights you provide, and are there any

378

00:21:28,334 --> 00:21:33,934

differences in reception across sports,

maybe across roles?

379

00:21:34,734 --> 00:21:35,114

Yeah.

380

00:21:35,114 --> 00:21:38,734

So, I mean, it really does vary; as in all

things, there's variance.

381

00:21:38,734 --> 00:21:45,894

There are some typically younger, you

know, coaches or scouts that are a little

382

00:21:45,894 --> 00:21:49,794

bit more receptive than people who have

been doing something for a long time.

383

00:21:49,794 --> 00:21:51,034

And I think that's just human nature.

384

00:21:51,034 --> 00:21:52,394

You're used to doing things a certain way.

385

00:21:52,394 --> 00:21:53,374

You don't like.

386

00:21:54,222 --> 00:21:57,982

You know, to stereotype, you don't like

some young person coming and telling you

387

00:21:57,982 --> 00:21:58,952

how to do your job.

388

00:21:58,952 --> 00:21:59,122

Right.

389

00:21:59,122 --> 00:22:01,402

So you have to be really careful about

that.

390

00:22:02,022 --> 00:22:06,942

And the funny thing is, you know,

everything that I have learned or, you

391

00:22:06,942 --> 00:22:12,682

know, believe in, in terms of making

data-driven decisions and not

392

00:22:12,682 --> 00:22:17,562

overestimating based on small sample sizes

goes out the window when I'm trying to

393

00:22:17,562 --> 00:22:20,782

convince a stakeholder of something.

394

00:22:20,782 --> 00:22:21,934

So for example,

395

00:22:21,934 --> 00:22:26,874

if I have a model and I want them to use

it, and I think it's going to help them.

396

00:22:26,874 --> 00:22:33,174

Of course, I've done the analysis to say,

you know, over the long run, how it

397

00:22:33,174 --> 00:22:36,214

would improve our efficiency, or if we

make a decision in this way, it'd be a

398

00:22:36,214 --> 00:22:38,174

better process, et cetera.

399

00:22:38,294 --> 00:22:42,074

I've done that analysis and I've done it

over a larger sample size.

400

00:22:42,074 --> 00:22:47,334

But when I tell them things, what they

want is confirmation

401

00:22:47,334 --> 00:22:48,844

bias, right?

402

00:22:48,844 --> 00:22:51,726

They love confirming their beliefs.

403

00:22:51,726 --> 00:22:57,706

So in order to get them to agree with

what you're saying, this works so much

404

00:22:57,706 --> 00:23:02,946

better than saying, you know, out of

the thousand players that I did this on,

405

00:23:02,946 --> 00:23:07,586

you know, you were only correct 60% of

the time, but my model would have been

406

00:23:07,586 --> 00:23:08,626

correct 70%.

407

00:23:08,626 --> 00:23:10,186

Like they don't want to hear that.

408

00:23:10,186 --> 00:23:13,146

I essentially say, well, you

know, you love this player.

409

00:23:13,146 --> 00:23:14,186

So does my model.

410

00:23:14,186 --> 00:23:18,546

I find the one guy, even if it's literally

only one person, they're like, yeah.

411

00:23:18,546 --> 00:23:20,014

Like, if your model can.

412

00:23:20,014 --> 00:23:23,514

If your model can see that, then it must

be doing something right.

413

00:23:23,514 --> 00:23:26,454

And then it's like, then they start to

trust you a little bit.

414

00:23:26,454 --> 00:23:31,834

And over time you give them little pieces,

little crumbs of a cookie that can

415

00:23:31,834 --> 00:23:34,074

help them, you know, gain confidence.

416

00:23:34,074 --> 00:23:39,074

And then, you know, that's when you share

with them, okay, well, but it's also

417

00:23:39,074 --> 00:23:42,864

suggesting this, which is different than

what you've been doing in the past.

418

00:23:42,864 --> 00:23:43,084

Right?

419

00:23:43,084 --> 00:23:48,046

So you don't ever start with, you know,

trust me,

420

00:23:48,046 --> 00:23:50,506

because you might be wrong, because you're

a human.

421

00:23:50,506 --> 00:23:53,866

I mean, like, you know, humans always make

mistakes, but we usually don't think we

422

00:23:53,866 --> 00:23:56,026

make as many mistakes as we do.

423

00:23:56,586 --> 00:24:01,686

And so what I've found over time is, if you

get people to trust you by confirming

424

00:24:01,686 --> 00:24:03,466

their prior held beliefs, right?

425

00:24:03,466 --> 00:24:04,626

It's another Bayesian concept.

426

00:24:04,626 --> 00:24:09,846

If you can confirm their prior beliefs,

they're going to accept your future

427

00:24:09,846 --> 00:24:16,166

recommendations or future things that the

model might suggest more than if you start

428

00:24:16,166 --> 00:24:16,718

with

429

00:24:16,718 --> 00:24:18,518

the differences upfront.

430

00:24:18,518 --> 00:24:20,648

And so that's like a little bit of human

bias, right?

431

00:24:20,648 --> 00:24:22,398

That you have just learned over time.

432

00:24:22,398 --> 00:24:26,638

And some things are just really hard for

people to accept, but over time, if you

433

00:24:26,638 --> 00:24:29,618

get people to trust you and you build that

relationship, and there are a lot of human

434

00:24:29,618 --> 00:24:33,678

elements here, and they trust your

work by confirming their prior held

435

00:24:33,678 --> 00:24:38,258

beliefs, then they'll trust you and open

up a little bit more to being a little bit

436

00:24:38,258 --> 00:24:40,228

more open-minded about other things as

well.

437

00:24:40,228 --> 00:24:42,698

Because then they're like, okay, well, I know

you're not an idiot.

438

00:24:42,698 --> 00:24:45,486

Like, you can speak my language some,

439

00:24:45,486 --> 00:24:49,026

so now I might be more open to learning a

little bit of your language.

440

00:24:49,026 --> 00:24:53,806

And that's just sort of a human

relationship thing that you have to always

441

00:24:53,806 --> 00:24:54,790

work on.

442

00:24:56,714 --> 00:25:00,174

Yeah, that is very interesting.

443

00:25:00,234 --> 00:25:04,434

And I'm very, yeah, I'm always very

interested to hear about that because I

444

00:25:04,434 --> 00:25:10,794

also face clients daily and have to

explain models to them.

445

00:25:10,794 --> 00:25:17,814

And so as you were saying, that definitely

varies a lot in how people react to the model.

446

00:25:18,154 --> 00:25:25,902

But that nugget of wisdom, maybe

indulging

447

00:25:25,902 --> 00:25:33,142

the confirmation bias at the beginning and

then slowly going towards a bit more of

448

00:25:33,142 --> 00:25:34,282

speaking the truth.

449

00:25:34,282 --> 00:25:35,052

It's very interesting.

450

00:25:35,052 --> 00:25:39,682

I had not thought of that, but yeah,

definitely, I can see that being a

451

00:25:39,682 --> 00:25:49,242

valid strategy when you are in front

of someone who doesn't really understand

452

00:25:49,242 --> 00:25:52,522

the value of the modeling, I would say.

453

00:25:53,002 --> 00:25:54,606

Whereas when I

454

00:25:54,606 --> 00:25:59,706

encounter clients who are already

convinced of what the models can do for

455

00:25:59,706 --> 00:25:59,996

them,

456

00:25:59,996 --> 00:26:04,666

they are usually looking for things that contradict

what they already think.

457

00:26:04,666 --> 00:26:07,866

And that's when they find the model

interesting.

458

00:26:07,866 --> 00:26:10,206

So I find that really, really cool to see.

459

00:26:10,206 --> 00:26:13,666

The contradictions are really where

there's value, right?

460

00:26:13,666 --> 00:26:17,026

But there's no value in a model if no one

uses it, right?

461

00:26:17,026 --> 00:26:20,760

Even if the model is really good, if no

one uses it, it has zero value.

462

00:26:20,974 --> 00:26:25,254

If they use it, the contradictions are

valuable if they're right, correct?

463

00:26:25,254 --> 00:26:30,994

So in soccer analysis, you know, I've

spent my career doing lots of different

464

00:26:30,994 --> 00:26:33,824

sports, but there's this sort of, this

applies to every sport.

465

00:26:33,824 --> 00:26:37,154

In basketball, we can call it the LeBron

test, and in soccer, we'll call it the Messi

466

00:26:37,154 --> 00:26:40,394

test, where it's essentially, if you build

a model and it's trying to evaluate

467

00:26:40,394 --> 00:26:46,746

players and Messi is not one of the

top players in your model, then

468

00:26:46,798 --> 00:26:49,778

you're not going to share it with anybody

because no one's going to believe you.

469

00:26:49,778 --> 00:26:49,928

Right.

470

00:26:49,928 --> 00:26:54,018

That's like the first thing everyone does

is like, okay, well, is Messi up top?

471

00:26:54,098 --> 00:26:57,498

And if Messi is near the top,

then people will at least listen

472

00:26:57,498 --> 00:26:58,678

to you a little bit longer.

473

00:26:58,678 --> 00:26:58,818

Right.

474

00:26:58,818 --> 00:27:00,418

But they're not going to listen to you at

all

475

00:27:00,418 --> 00:27:02,898

if you're like, yeah, Messi is an okay

player.

476

00:27:03,238 --> 00:27:03,478

Right.

477

00:27:03,478 --> 00:27:04,928

Like I don't care what your model says.

478

00:27:04,928 --> 00:27:05,058

Right.

479

00:27:05,058 --> 00:27:05,588

That's wrong.

480

00:27:05,588 --> 00:27:05,878

Right.

481

00:27:05,878 --> 00:27:07,168

That's what people believe.

482

00:27:07,168 --> 00:27:09,358

So it's like a little bit of like, I need

to feed you like, no, no, no.

483

00:27:09,358 --> 00:27:12,818

Like I'm taking a different approach than

what you do, but you know, my approach

484

00:27:12,818 --> 00:27:14,488

also thinks that Messi is the best.

485

00:27:14,488 --> 00:27:14,978

Right.

486

00:27:14,978 --> 00:27:16,278

And then I'm like, it's okay.

487

00:27:16,278 --> 00:27:16,846

You know,

488

00:27:16,846 --> 00:27:17,856

Okay, yeah, we agree.

489

00:27:17,856 --> 00:27:19,546

He is really good.

490

00:27:20,986 --> 00:27:23,386

Yeah, it's like a sniff test, right?

491

00:27:23,386 --> 00:27:27,326

And it's like, in a way, it's like, well,

that's a strong prior.

492

00:27:27,406 --> 00:27:30,046

And it's like, it's saying, well, I have a

very strong prior.

493

00:27:30,046 --> 00:27:31,446

That Messi is really good.

494

00:27:31,446 --> 00:27:36,086

To convince me otherwise, you're going to

need really, really good data.
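The "strong prior" idea can be made concrete with the textbook conjugate normal update for an unknown mean with known noise. In this hypothetical sketch a fan's prior puts Messi's rating at 2.0 with a tight standard deviation of 0.2, while a model's per-match evidence averages 0.5 with noise 1.0; the rating scale and every number are invented for illustration.

```python
def normal_update(prior_mean: float, prior_sd: float, data_mean: float,
                  noise_sd: float, n: int):
    """Posterior mean and sd for a normal mean with a normal prior and known noise."""
    prior_prec = 1.0 / prior_sd**2   # how strongly the prior is held
    data_prec = n / noise_sd**2      # evidence grows with the number of matches
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec ** -0.5

# Hypothetical: strong prior that Messi is elite vs. a model saying "okay player".
for n_matches in (5, 100, 500):
    mean, sd = normal_update(2.0, 0.2, 0.5, 1.0, n_matches)
    print(f"{n_matches:4d} matches -> posterior mean {mean:.2f} (sd {sd:.2f})")
```

With five matches of contrary evidence the posterior only moves from 2.0 to 1.75; it takes on the order of a hundred matches to pull it to 0.8.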

495

00:27:36,086 --> 00:27:41,386

It's like, well, the earth is very

probably somewhat round.

496

00:27:41,506 --> 00:27:44,430

It's going to be very hard for you to...

497

00:27:44,430 --> 00:27:48,250

move that prior by telling me

it's not, in a way.

498

00:27:48,250 --> 00:27:48,690

Yeah.

499

00:27:48,690 --> 00:27:51,310

And in sports, people have really strong

priors, right?

500

00:27:51,310 --> 00:27:54,770

So, you know, those sniff tests do really

matter.

501

00:27:54,770 --> 00:27:57,950

And as a modeler, even for myself, like,

I'm a human.

502

00:27:57,950 --> 00:27:59,170

So like, I do the same thing.

503

00:27:59,170 --> 00:28:02,350

If I'm building a model, I always want to

see the results.

504

00:28:02,350 --> 00:28:05,810

And it's like, I don't look at the median,

like I do, but I don't look at who the

505

00:28:05,810 --> 00:28:08,650

median result is in my model half the

time.

506

00:28:08,650 --> 00:28:11,630

I usually look at the best and I look at

the worst.

507

00:28:11,630 --> 00:28:15,350

And if I don't understand it, then I'm

like, maybe my model is doing something

508

00:28:15,350 --> 00:28:15,990

wrong.

509

00:28:15,990 --> 00:28:19,630

And I'm like, I'm going to dive

in a little bit more.
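The "look at the best and the worst" habit, combined with the Messi test, amounts to a quick sanity check on a fitted ratings table. A minimal sketch; the function name, the ratings, and the must-be-top player are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def sniff_test(ratings: Dict[str, float], k: int = 5,
               must_be_top: Optional[str] = None) -> Tuple[List[str], List[str], bool]:
    """Return the top-k and bottom-k names and whether a 'sanity' player made the top k."""
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    top, bottom = ranked[:k], ranked[-k:]
    passed = must_be_top is None or must_be_top in top
    return top, bottom, passed

# Hypothetical model output on an arbitrary rating scale.
ratings = {"Messi": 1.9, "Player B": 1.2, "Player C": 0.3,
           "Player D": -0.4, "Player E": -1.1}
top, bottom, passed = sniff_test(ratings, k=2, must_be_top="Messi")
print("best:", top, "worst:", bottom, "passes sniff test:", passed)
```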

510

00:28:19,630 --> 00:28:23,250

If it confirms my prior held beliefs,

I'm like, it's probably correct.

511

00:28:23,250 --> 00:28:23,520

Right.

512

00:28:23,520 --> 00:28:26,370

And even as a modeler, right, you have to

be careful of that.

513

00:28:26,370 --> 00:28:31,550

But at the same time in sports, you know,

it's like I said, subjective analysis can

514

00:28:31,550 --> 00:28:32,290

be helpful.

515

00:28:32,290 --> 00:28:35,770

It's because in people's subjective views,

there's wisdom.

516

00:28:35,770 --> 00:28:38,822

Coaches have been playing a game

for

517

00:28:39,118 --> 00:28:43,358

20, 30, or coaching a game for 20 or 30

years, and to think that they don't have

518

00:28:43,358 --> 00:28:46,658

something to offer a model is kind of

crazy in my opinion.

519

00:28:46,658 --> 00:28:51,838

They might have biases, and of course they

do, but the information that they can

520

00:28:51,838 --> 00:28:53,618

provide is useful.

521

00:28:54,378 --> 00:28:55,938

Yeah, definitely.

522

00:28:55,938 --> 00:28:59,958

And that's where we go back to what we

were talking about at the beginning: the

523

00:28:59,958 --> 00:29:02,908

value of Bayesian inference in that

context.

524

00:29:02,908 --> 00:29:08,462

Because if you can leverage that deep and

hard-earned knowledge,

525

00:29:08,462 --> 00:29:15,722

from the coaches, from the scouts, and add

that to your model, it's like getting the

526

00:29:15,722 --> 00:29:16,782

best of both worlds.

527

00:29:16,782 --> 00:29:21,802

And that can make your analysis extremely

powerful and useful, as you were saying.

528

00:29:21,842 --> 00:29:22,252

Yeah.

529

00:29:22,252 --> 00:29:24,782

And people have done studies like this,

I've done studies like this.

530

00:29:24,782 --> 00:29:30,902

If you build a model just on the data and

ignore the human element, right?

531

00:29:30,902 --> 00:29:34,662

Or if you build a model just on human and

scouting analysis and ignore the other

532

00:29:34,662 --> 00:29:35,540

data.

533

00:29:35,694 --> 00:29:35,954

Right.

534

00:29:35,954 --> 00:29:39,364

Neither one of those is going to do as

well as when you combine both.

535

00:29:39,364 --> 00:29:42,074

And that's really what, you know,

Bayesian analysis is: you're

536

00:29:42,074 --> 00:29:47,734

combining subjective belief with objective

data and then making forecasts based on

537

00:29:47,734 --> 00:29:48,024

them.

538

00:29:48,024 --> 00:29:54,974

And we know that if you have priors that

are not really, really bad, a subjective

539

00:29:54,974 --> 00:30:01,194

Bayesian forecast is going to have smaller

error than a data-only, you know, what we call a

540

00:30:01,194 --> 00:30:02,684

maximum likelihood forecast, right,

541

00:30:02,684 --> 00:30:04,434

in stats terms, right,

542

00:30:04,434 --> 00:30:05,294

or

543

00:30:05,294 --> 00:30:11,174

than, you know, just the human one, the

no-data, you know, feelings forecast as

544

00:30:11,174 --> 00:30:11,554

well, right?

545

00:30:11,554 --> 00:30:14,514

So the combination of the two

always does better.
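A small simulation illustrates the claim under idealized assumptions: true player skill, a scout's subjective estimate, and a data-only (maximum likelihood) estimate are all normal, both estimates are unbiased, and their error spreads (0.7 and 0.9 here) are invented for the example. Treating the scout's estimate as the prior mean, the posterior mean is a precision-weighted average of the two.

```python
import random

def simulate(n_players: int = 20_000, scout_sd: float = 0.7,
             data_sd: float = 0.9, seed: int = 1) -> dict:
    """Mean squared error of scout-only, data-only, and combined estimates."""
    rng = random.Random(seed)
    w_scout = 1.0 / scout_sd**2  # precision of the subjective prior
    w_data = 1.0 / data_sd**2    # precision of the data-only (MLE) estimate
    sq_err = {"scout only": 0.0, "data only": 0.0, "combined": 0.0}
    for _ in range(n_players):
        skill = rng.gauss(0.0, 1.0)               # true (unknown) skill
        scout = skill + rng.gauss(0.0, scout_sd)  # prior mean from scouting
        mle = skill + rng.gauss(0.0, data_sd)     # estimate from data alone
        # Normal prior x normal likelihood: posterior mean is precision-weighted.
        post = (w_scout * scout + w_data * mle) / (w_scout + w_data)
        sq_err["scout only"] += (scout - skill) ** 2
        sq_err["data only"] += (mle - skill) ** 2
        sq_err["combined"] += (post - skill) ** 2
    return {k: v / n_players for k, v in sq_err.items()}

for name, mse in simulate().items():
    print(f"{name:10s} MSE = {mse:.3f}")
```

In theory the errors are about 0.49 (scout only), 0.81 (data only), and 0.31 (combined). A badly biased prior erodes that gain, which is the "not really, really bad" caveat.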

546

00:30:15,074 --> 00:30:15,454

Yeah.

547

00:30:15,454 --> 00:30:15,934

Yeah.

548

00:30:15,934 --> 00:30:16,474

Yeah.

549

00:30:16,474 --> 00:30:20,654

Preaching, preaching to the choir here for

sure.

550

00:30:21,194 --> 00:30:26,954

And actually, I think that's a good time

now in the episode to get a bit more

551

00:30:26,954 --> 00:30:34,174

nerdy, if we can, because I've seen you,

so you've obviously worked extensively

552

00:30:34,174 --> 00:30:34,766

with

553

00:30:34,766 --> 00:30:41,426

soccer analytics, and you have some

interesting soccer power ratings and

554

00:30:41,426 --> 00:30:48,606

projections on your website that I'm gonna

link to in the show notes. But can you tell

555

00:30:48,606 --> 00:30:54,346

us about it and what makes these

projections unique, from your perspective, in

556

00:30:54,346 --> 00:31:01,466

evaluating team and player performance? And

don't be afraid to dig into the nerdy

557

00:31:01,466 --> 00:31:03,182

details, because...

558

00:31:03,182 --> 00:31:05,362

my audience definitely likes that.

559

00:31:05,362 --> 00:31:05,832

Yes.

560

00:31:05,832 --> 00:31:06,142

Sure.

561

00:31:06,142 --> 00:31:06,992

I'll dig in.

562

00:31:06,992 --> 00:31:09,202

So what's on my website is...

563

00:31:09,202 --> 00:31:10,422

Sorry if you can hear my dog there.

564

00:31:10,422 --> 00:31:16,402

What's on my website is perhaps the

simplest power ratings forecast that I've

565

00:31:16,402 --> 00:31:17,482

ever done.

566

00:31:17,482 --> 00:31:20,902

So I say that, not that it's like stupid

or anything.

567

00:31:20,902 --> 00:31:26,162

So when I was at ESPN, I built power

ratings in American football, both

568

00:31:26,162 --> 00:31:30,094

professional and collegiate, and

basketball, professional and collegiate.

569

00:31:30,094 --> 00:31:33,944

and hockey, I mean, like almost every

sport, right?

570

00:31:33,944 --> 00:31:38,714

So what's on my website, I'll explain the

model very simply: it's a Bayesian model

571

00:31:38,714 --> 00:31:42,114

where you have an effect for each team,

right?

572

00:31:42,114 --> 00:31:48,974

And the response variable is the expected

goals for each team.

573

00:31:48,974 --> 00:31:53,854

So usually when we do a power ratings and

we're trying to estimate for a team, you

574

00:31:53,854 --> 00:31:55,398

know, there are two sorts of

575

00:31:55,438 --> 00:31:59,058

things that we're trying to estimate: their

offensive ability and their defensive

576

00:31:59,058 --> 00:32:04,198

ability, and then you assume essentially

that their overall team ability, you know,

577

00:32:04,198 --> 00:32:08,178

if it's a linear model, right, is the

combination of their offense and their

578

00:32:08,178 --> 00:32:09,778

defensive abilities.

579

00:32:09,778 --> 00:32:12,498

Okay, so essentially in each match,

right,

580

00:32:12,498 --> 00:32:18,098

you have two rows of data

where you have the expected goals for the

581

00:32:18,098 --> 00:32:21,058

one team and then the expected goals for

the other. And the reason we use expected

582

00:32:21,058 --> 00:32:23,182

goals, although I actually have a

583

00:32:23,182 --> 00:32:24,732

lot of issues with expected goals, is that

584

00:32:24,732 --> 00:32:31,022

they are a better indicator of how

good the team performed on offense than

585

00:32:31,022 --> 00:32:32,592

just the raw number of goals.
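The "two rows per match" layout can be sketched as a simple reshape: each match becomes one row with the home side on offense and one with the away side on offense. The `Match` record, field names, and team names are hypothetical.

```python
from typing import List, NamedTuple

class Match(NamedTuple):
    home: str
    away: str
    home_xg: float
    away_xg: float

def to_team_rows(matches: List[Match]) -> List[dict]:
    """One row per side per match: who attacked, who defended, venue, and xG."""
    rows = []
    for m in matches:
        rows.append({"offense": m.home, "defense": m.away, "home": 1, "xg": m.home_xg})
        rows.append({"offense": m.away, "defense": m.home, "home": 0, "xg": m.away_xg})
    return rows

print(to_team_rows([Match("Team A", "Team B", 1.7, 0.9)]))
```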

586

00:32:32,592 --> 00:32:32,812

Right.

587

00:32:32,812 --> 00:32:34,262

I don't need to go into details, right?

588

00:32:34,262 --> 00:32:39,062

It's essentially an expected value

as opposed to an observation from a

589

00:32:39,062 --> 00:32:44,082

Poisson distribution, and soccer scores

roughly reflect a Poisson, or

590

00:32:44,082 --> 00:32:46,162

pretty close to a Poisson distribution,

right?

591

00:32:46,162 --> 00:32:48,582

The expected goals is that expectation.
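If expected goals are taken as each side's Poisson mean, match outcome probabilities follow by summing over scorelines. This sketch additionally assumes the two teams' goal counts are independent, a common simplification rather than something stated in the episode, and truncates at 10 goals per side.

```python
import math
from typing import Tuple

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probs(xg_home: float, xg_away: float,
                  max_goals: int = 10) -> Tuple[float, float, float]:
    """Home win / draw / away win, assuming independent Poisson goal counts."""
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, xg_home) * poisson_pmf(a, xg_away)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away

# Hypothetical matchup: 1.8 expected goals vs 1.0.
home, draw, away = outcome_probs(1.8, 1.0)
print(f"home {home:.3f}, draw {draw:.3f}, away {away:.3f}")
```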

592

00:32:48,642 --> 00:32:52,790

And so essentially I have a hierarchical

Bayesian model where I

593

00:32:52,878 --> 00:32:54,458

actually do a few things.

594

00:32:54,458 --> 00:32:58,478

So I actually assume the expected goals is

the mean of a Poisson distribution.

595

00:32:58,478 --> 00:33:04,518

The observed goals is the actual outcome

of the Poisson distribution.

596

00:33:04,518 --> 00:33:09,018

And then I fit a linear model essentially

where I look at, okay, team A was on

597

00:33:09,018 --> 00:33:10,938

offense, team B was the opponent.

598

00:33:10,938 --> 00:33:13,958

And this was team A's expected goals.

599

00:33:13,958 --> 00:33:17,578

And I'm essentially fitting a regression

model, right?

600

00:33:17,578 --> 00:33:20,288

A Bayesian regression model where I have

individual team effects.

601

00:33:20,288 --> 00:33:22,382

I have a prior on

602

00:33:22,930 --> 00:33:25,710

each team's offense and each team's

defense.

603

00:33:25,710 --> 00:33:29,110

And that prior, you know, rough, I don't

have to get too crazy.

604

00:33:29,110 --> 00:33:34,630

You know, I just use a normal distribution

and, you know, sometimes, actually,

605

00:33:34,630 --> 00:33:38,550

when I code in Stan, I like

using a distribution with a little

606

00:33:38,550 --> 00:33:39,370

bit thicker tails.
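Why thicker tails can matter for a prior: a Student-t puts far more density on extreme values than a normal of the same scale, so a genuinely exceptional team is shrunk less toward average. A minimal comparison of the two densities; the choice of 3 degrees of freedom is an arbitrary illustration, not the value used in the model described here.

```python
import math

def normal_pdf(x: float, scale: float = 1.0) -> float:
    z = x / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2 * math.pi))

def student_t_pdf(x: float, nu: float, scale: float = 1.0) -> float:
    z = x / scale
    # Normalizing constant computed on the log scale for numerical stability.
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + z * z / nu) ** (-(nu + 1) / 2) / scale

for x in (0.0, 2.0, 4.0):
    ratio = student_t_pdf(x, nu=3) / normal_pdf(x)
    print(f"x = {x:.0f}: t(3) density is {ratio:.2f}x the normal density")
```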

607

00:33:39,370 --> 00:33:42,770

But I think for this model, I was just

trying to go simple: a normal distribution

608

00:33:42,770 --> 00:33:47,650

prior with a mean, you know, for my

expected, essentially each team's expected

609

00:33:47,650 --> 00:33:50,382

goals per game, on offense versus

610

00:33:50,382 --> 00:33:54,982

defense, right? And for the defensive value,

I usually do the subtraction.

611

00:33:54,982 --> 00:33:59,862

So it's the offensive team minus the

defensive team, and that way the

612

00:33:59,862 --> 00:34:05,582

defensive team's value is higher if

they're a good defense. So essentially, if

613

00:34:05,582 --> 00:34:09,802

team A's, you know, expected goals

in a game against an average opponent

614

00:34:09,802 --> 00:34:17,562

is like 1.5, and the defense's average

expected goals in the game, you know,

615

00:34:17,562 --> 00:34:19,822

that they allowed was 1.4,

616

00:34:19,822 --> 00:34:22,922

then you would say the difference is like

0.1, okay.

617

00:34:22,922 --> 00:34:28,522

I also include effects for being at home

in this model.

618

00:34:28,522 --> 00:34:29,942

I think, actually, I think that's all I

do.
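Putting the pieces together, the model described above can be sketched as a regression of expected goals on an intercept, a home effect, the attacking team's offense, and minus the defending team's defense, with normal priors on the team effects. This is a simplified stand-in, not the guest's Stan model: it assumes Gaussian noise on xG instead of the Poisson layer, and it returns only the posterior mode, which under those assumptions equals a ridge regression with penalty (noise sd / prior sd) squared. A full fit in Stan would give the whole posterior. Team names, true effects, and both standard deviations are hypothetical.

```python
from typing import Dict, List, Tuple

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_ratings(rows: List[dict], teams: List[str], prior_sd: float = 0.5,
                noise_sd: float = 0.5) -> Tuple[float, float, Dict[str, float], Dict[str, float]]:
    """Posterior mode of xg = mu + h*home + off[offense] - def[defense] + noise.

    Normal(0, prior_sd) priors on each offense/defense effect, flat priors on
    mu and h, Gaussian noise with sd noise_sd: the mode solves a ridge system.
    """
    idx = {t: i for i, t in enumerate(teams)}
    T = len(teams)
    p = 2 + 2 * T  # [mu, h, off_1..off_T, def_1..def_T]
    XtX = [[0.0] * p for _ in range(p)]
    Xty = [0.0] * p
    for r in rows:
        x = [0.0] * p
        x[0] = 1.0
        x[1] = float(r["home"])
        x[2 + idx[r["offense"]]] = 1.0
        x[2 + T + idx[r["defense"]]] = -1.0  # a better defense lowers xG
        for i in range(p):
            Xty[i] += x[i] * r["xg"]
            for j in range(p):
                XtX[i][j] += x[i] * x[j]
    ridge = (noise_sd / prior_sd) ** 2
    for i in range(2, p):  # shrink team effects only
        XtX[i][i] += ridge
    beta = solve(XtX, Xty)
    off = {t: beta[2 + idx[t]] for t in teams}
    dfn = {t: beta[2 + T + idx[t]] for t in teams}
    return beta[0], beta[1], off, dfn

# Hypothetical noiseless double round robin to check the fit recovers known effects.
true_off = {"A": 0.4, "B": 0.1, "C": -0.1, "D": -0.4}
true_def = {"A": 0.3, "B": -0.1, "C": 0.0, "D": -0.2}
teams = list(true_off)
rows = []
for home in teams:
    for away in teams:
        if home != away:
            rows.append({"offense": home, "defense": away, "home": 1,
                         "xg": 1.3 + 0.25 + true_off[home] - true_def[away]})
            rows.append({"offense": away, "defense": home, "home": 0,
                         "xg": 1.3 + true_off[away] - true_def[home]})
mu, h, off, dfn = fit_ratings(rows, teams, prior_sd=1.0, noise_sd=0.1)
print(f"mu={mu:.2f} home={h:.2f}", {t: round(v, 2) for t, v in off.items()})
```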

619

00:34:29,942 --> 00:34:33,822

But in other models I've done, you can

look at things such as how much rest

620

00:34:33,822 --> 00:34:36,122

they've had since their last match.

621

00:34:36,182 --> 00:34:39,162

You can look at the difference between

each team's rest.

622

00:34:39,162 --> 00:34:40,942

And those are not linear effects, right?

623

00:34:40,942 --> 00:34:44,102

You have to do some sort of nonlinear

effects for that, right?

624

00:34:44,102 --> 00:34:49,454

Because like one day of rest is, two days

of rest is not,

625

00:34:49,454 --> 00:34:52,094

Like the difference between two days of

rest and one day of rest is very different

626

00:34:52,094 --> 00:34:54,254

than seven days versus eight days of rest,

right?

627

00:34:54,254 --> 00:34:58,234

Seven and eight days of rest are pretty

much the same thing, but two and one is

628

00:34:58,234 --> 00:34:59,344

very different, right?

629

00:34:59,344 --> 00:35:04,194

There's a much bigger effect for having two

days of rest than just one day of rest.
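One simple way to encode that diminishing return is a saturating transform of days of rest, so the model gets a single coefficient on a nonlinear feature. The exponential form, the 2-day scale, and the 0.3 effect size are hypothetical choices for illustration; a log transform, splines, or one indicator per rest bucket would be equally reasonable.

```python
import math

def rest_effect(days: float, max_effect: float = 0.3, scale_days: float = 2.0) -> float:
    """Hypothetical saturating benefit of rest: steep at first, nearly flat after a week."""
    return max_effect * (1.0 - math.exp(-days / scale_days))

def rest_diff_feature(team_rest: float, opp_rest: float) -> float:
    """Rest advantage of one team over its opponent on the transformed scale."""
    return rest_effect(team_rest) - rest_effect(opp_rest)

gain_1_to_2 = rest_effect(2) - rest_effect(1)
gain_7_to_8 = rest_effect(8) - rest_effect(7)
print(f"1->2 days: {gain_1_to_2:.3f}, 7->8 days: {gain_7_to_8:.3f}")
```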

630

00:35:04,194 --> 00:35:09,454

And so you can do things like that, or how

far away they had to travel, those sorts

631

00:35:09,454 --> 00:35:09,934

of things.

632

00:35:09,934 --> 00:35:13,434

Now in European soccer, that's not a huge

deal, because especially in the

633

00:35:13,434 --> 00:35:16,686

competitions within each country, no team

is traveling that far.

634

00:35:16,686 --> 00:35:18,866

But in American sports, it is a pretty big

deal.

635

00:35:18,866 --> 00:35:22,366

Like, you know, you, you have to fly five,

six hours across the country on short

636

00:35:22,366 --> 00:35:22,786

notice.

637

00:35:22,786 --> 00:35:25,726

Like that can, that can really affect

performance.

638

00:35:26,346 --> 00:35:29,646

And another thing, like I said, I

don't have this in the soccer model, but

639

00:35:29,646 --> 00:35:34,306

if anyone's interested in modeling

sports outcomes, that people typically

640

00:35:34,306 --> 00:35:39,766

tend to overlook, and I've always been a

big proponent of this, is elevation, meaning that

641

00:35:39,766 --> 00:35:43,886

in certain sports there

are certain teams that play at higher

642

00:35:43,886 --> 00:35:45,070

altitudes.

643

00:35:45,070 --> 00:35:48,650

And if you're not used to playing at

higher altitudes, it's actually a very

644

00:35:48,650 --> 00:35:52,070

noticeable effect in a model that you're

going to have a lower offensive output and

645

00:35:52,070 --> 00:35:55,470

you'll actually allow more points on the

other end due to fatigue.

646

00:35:55,470 --> 00:36:00,170

And so in the United States, it's the teams

that are playing in Colorado and in Utah.

647

00:36:00,170 --> 00:36:03,810

But in Europe, it could be the teams that

have to go to Switzerland or the teams

648

00:36:03,810 --> 00:36:09,990

that have to go to some of these alpine

regions that are higher up in altitude.

649

00:36:10,062 --> 00:36:13,312

In Mexico, if you have to go to Mexico

City, it's extremely high.

650

00:36:13,312 --> 00:36:14,292

Or Colombia, right?

651

00:36:14,292 --> 00:36:17,502

I mean, depending on what you're doing,

these are very high altitude places that

652

00:36:17,502 --> 00:36:21,322

have been shown to have a measurable impact on

an opponent's performance.
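A hypothetical way to turn elevation into a model input: count only how far the venue sits above both the visiting team's home elevation and a threshold below which the effect is assumed negligible. The 1,000 m threshold is an invented assumption, and the example elevations are approximate. The feature would enter the visiting team's rows, both when it attacks and when it defends, with its own estimated coefficient.

```python
def altitude_feature(venue_elev_m: float, visitor_home_elev_m: float,
                     threshold_m: float = 1000.0) -> float:
    """Kilometres of unaccustomed altitude for the visiting team (hypothetical feature)."""
    acclimatized_to = max(visitor_home_elev_m, threshold_m)
    return max(0.0, venue_elev_m - acclimatized_to) / 1000.0

MEXICO_CITY_M = 2240.0  # approximate
DENVER_M = 1609.0       # approximate ("Mile High City")
SEA_LEVEL_M = 0.0

print(altitude_feature(MEXICO_CITY_M, SEA_LEVEL_M))  # sea-level visitor in Mexico City
print(altitude_feature(MEXICO_CITY_M, DENVER_M))     # visitor already used to altitude
print(altitude_feature(SEA_LEVEL_M, MEXICO_CITY_M))  # high-altitude team at sea level
```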

653

00:36:21,922 --> 00:36:23,922

Yeah, that's very fun.

654

00:36:23,922 --> 00:36:25,992

My God, I love those kind of models.

655

00:36:25,992 --> 00:36:27,522

That's so much fun.

656

00:36:27,742 --> 00:36:35,242

And I would also guess that, I mean, at

least my prior would be that there is a

657

00:36:35,242 --> 00:36:39,854

reverse mechanism also for teams who are

used to playing at altitude.

658

00:36:39,854 --> 00:36:45,254

Do they get a boost of performance when

they play closer to sea level?

659

00:36:45,254 --> 00:36:51,314

Because they could have adaptations

that make them better when they go down to

660

00:36:51,314 --> 00:36:52,434

sea level.

661

00:36:52,994 --> 00:36:53,194

Yeah.

662

00:36:53,194 --> 00:36:55,524

I mean, I think there's certainly science

behind that.

663

00:36:55,524 --> 00:37:00,714

I've found that's a lot harder to show in a

model than the reverse.

664

00:37:00,714 --> 00:37:03,934

Not that it might not be there, but I

think the effect size, if it is there, is

665

00:37:03,934 --> 00:37:08,034

definitely smaller than the reverse.

666

00:37:08,274 --> 00:37:08,584

Yeah.

667

00:37:08,584 --> 00:37:09,198

That's what...

668

00:37:09,198 --> 00:37:10,618

That's what I would expect too, like,

669

00:37:10,618 --> 00:37:17,578

I think the effect is there mainly because,

well, I've seen it.

670

00:37:17,598 --> 00:37:25,218

Like it seems to be pretty well seated in

the science literature, but that doesn't

671

00:37:25,218 --> 00:37:26,938

mean the effect is big.

672

00:37:27,478 --> 00:37:28,718

So yeah.

673

00:37:28,918 --> 00:37:29,138

Yeah.

674

00:37:29,138 --> 00:37:33,118

I mean, I'm a runner and I know that all

of the distance runners that are training

675

00:37:33,118 --> 00:37:37,198

for marathons that are elites and

professionals, they all train at higher

676

00:37:37,198 --> 00:37:38,228

altitudes, right?

677

00:37:38,228 --> 00:37:38,894

For the...

678

00:37:38,894 --> 00:37:44,074

six weeks leading up to a competition and

then they travel to the competition at a

679

00:37:44,074 --> 00:37:44,974

lower altitude.

680

00:37:44,974 --> 00:37:48,354

And, you know, they think they have an

oxygen performance boost due to that.

681

00:37:48,354 --> 00:37:49,274

Yeah.

682

00:37:49,274 --> 00:37:49,614

Yeah.

683

00:37:49,614 --> 00:37:53,594

Kind of like legal oxygen doping, legal

blood doping.

684

00:37:53,594 --> 00:37:53,694

Yeah.

685

00:37:53,694 --> 00:37:54,374

Yeah, exactly.

686

00:37:54,374 --> 00:37:54,694

Yeah.

687

00:37:54,694 --> 00:37:54,774

Yeah.

688

00:37:54,774 --> 00:37:59,034

I mean, I think it seems to be pretty much

proven.

689

00:37:59,034 --> 00:38:04,614

I would say maybe it has more of an impact

on individual sports like marathon running

690

00:38:04,614 --> 00:38:07,982

or else, because it's more like, you know,

it's just like,

691

00:38:07,982 --> 00:38:13,682

Even if you're gaining just a few tenths

of a second, well, it can help you have a

692

00:38:13,682 --> 00:38:19,962

better time in the end because, well, at

this level, just having the smallest

693

00:38:19,962 --> 00:38:26,462

increase in performance could be the

difference between first and second place.

694

00:38:26,822 --> 00:38:36,082

But maybe it's harder to see such a

small effect in a collective sport, a

695

00:38:36,082 --> 00:38:37,582

collective game because, well,

696

00:38:37,582 --> 00:38:39,462

Maybe there are some...

697

00:38:39,462 --> 00:38:41,152

Maybe it's just not additive.

698

00:38:41,152 --> 00:38:43,882

Maybe the effects actually cancel out.

699

00:38:43,882 --> 00:38:47,582

So in the end, you don't really see a big

effect.

700

00:38:48,162 --> 00:38:49,302

But that would be...

701

00:38:49,302 --> 00:38:49,942

Yeah.

702

00:38:49,942 --> 00:38:52,922

I'd love to do an experiment on that.

703

00:38:52,922 --> 00:38:54,142

Like an RCT.

704

00:38:54,142 --> 00:38:56,102

That would be so much fun.

705

00:38:57,282 --> 00:38:57,632

Yeah.

706

00:38:57,632 --> 00:38:59,782

Well, good luck trying to do experiments

in sports.

707

00:38:59,782 --> 00:39:00,682

It's hard.

708

00:39:00,682 --> 00:39:01,742

Yeah, I know.

709

00:39:01,742 --> 00:39:02,402

I know.

710

00:39:02,402 --> 00:39:03,242

But that...

711

00:39:03,242 --> 00:39:05,742

I mean, if the multiverse exists...

712

00:39:05,742 --> 00:39:08,922

Then there is a universe where we can do

those kinds of experiments.

713

00:39:08,922 --> 00:39:12,702

And my god, these scientists must have so

much fun.

714

00:39:14,942 --> 00:39:20,842

And yeah, so thanks a lot, first, for

detailing the model that clearly and in so

715

00:39:20,842 --> 00:39:21,602

much detail.

716

00:39:21,602 --> 00:39:23,042

That's super cool.

717

00:39:23,342 --> 00:39:27,082

So the results of the model are in a cool

dashboard on your website.

718

00:39:27,082 --> 00:39:32,522

Do you have the model and data available

freely, maybe on your GitHub, that we can

719

00:39:32,522 --> 00:39:34,292

put in the show notes?

720

00:39:35,534 --> 00:39:36,454

Yeah, I'm not sure.

721

00:39:36,454 --> 00:39:40,964

I think, I don't know if the model is

on my public GitHub.

722

00:39:40,964 --> 00:39:41,524

It's on GitHub.

723

00:39:41,524 --> 00:39:44,614

I don't know if it's private or not, but I

can let you know.

724

00:39:45,414 --> 00:39:49,194

You know, I actually use open-source data

for that.

725

00:39:49,194 --> 00:39:51,264

So I, I, let me double check.

726

00:39:51,264 --> 00:39:55,734

I can actually double check and get back

to you after the show on if, yeah, if I

727

00:39:55,754 --> 00:39:58,494

could have it in my public GitHub or not.

728

00:39:58,494 --> 00:39:59,564

So, yeah.

729

00:39:59,564 --> 00:39:59,844

Yeah.

730

00:39:59,844 --> 00:40:05,326

But essentially it uses, there's a

package called worldfootballR.

731

00:40:05,326 --> 00:40:08,626

It uses data from there to build the

model.

732

00:40:08,626 --> 00:40:13,346

So some of that data is just from, it's

scraped from, like, Transfermarkt.

733

00:40:13,666 --> 00:40:17,666

So I use, I use, I didn't really talk

about how I set prior means for each of

734

00:40:17,666 --> 00:40:24,366

the teams, but very, a very simple, very

simple, hierarchical model is essentially

735

00:40:24,366 --> 00:40:32,286

just to use the expenditures of the club

and use that as a prior mean for how good

736

00:40:32,286 --> 00:40:34,206

the club will be going into the season.

737

00:40:34,206 --> 00:40:34,542

And, and.

738

00:40:34,542 --> 00:40:40,442

Unlike some other sports, in soccer, world

football, how much a club spends is very

739

00:40:40,442 --> 00:40:45,062

highly correlated with how successful they

are, which makes sense, but it's not true

740

00:40:45,062 --> 00:40:47,082

necessarily in like baseball.
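
To make the spending-as-prior idea concrete, here is a minimal sketch, not the speaker's actual model: a club's strength is measured in goal difference per game, its prior mean is assumed to be linear in log spending, and early-season results update that prior with a conjugate normal formula. The intercept, slope, prior SD, per-game noise SD, and spending figures are all hypothetical placeholders.

```python
import numpy as np

def spending_prior_mean(spend_millions, intercept=-3.0, slope=0.6):
    """Hypothetical prior mean for a club's strength (goal difference per
    game) as a linear function of log spending. Coefficients are made up."""
    return intercept + slope * np.log(spend_millions)

def update_strength(prior_mean, prior_sd, observed_gd, n_games, noise_sd=1.5):
    """Conjugate normal update of club strength.

    Combines the spending-based prior with the average goal difference
    observed over n_games, assuming each game's goal difference is normal
    around the true strength with standard deviation noise_sd.
    """
    prior_prec = 1.0 / prior_sd**2
    data_prec = n_games / noise_sd**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * observed_gd) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)

# Usage: a big spender and a small spender with identical early results.
rich_prior = spending_prior_mean(200.0)
poor_prior = spending_prior_mean(20.0)
rich_post, rich_sd = update_strength(rich_prior, 0.5, observed_gd=0.0, n_games=10)
poor_post, poor_sd = update_strength(poor_prior, 0.5, observed_gd=0.0, n_games=10)
```

After ten level games, both posteriors have moved toward zero, but the spending prior still keeps the richer club ahead; with more games the data increasingly dominates.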

741

00:40:47,562 --> 00:40:52,342

So, do you see these effects of budget?

742

00:40:52,342 --> 00:40:57,292

So, yeah, first, before I go to a follow-up

question, yeah, for sure.

743

00:40:57,292 --> 00:40:58,962

Get back to me after the show.

744

00:40:58,962 --> 00:41:03,758

And if that's possible, we'll put that in

the show notes because I'm sure

745

00:41:03,758 --> 00:41:06,718

a lot of listeners will be interested in

checking that out.

746

00:41:06,718 --> 00:41:10,578

I personally will be very interested in

checking that out, definitely.

747

00:41:10,658 --> 00:41:12,598

So that'd be awesome.

748

00:41:12,758 --> 00:41:21,558

And second, that effect of budget that you

see on the performance of a team.

749

00:41:21,918 --> 00:41:28,258

And so I guess in football, performance

means the expected number of games

750

00:41:28,258 --> 00:41:29,538

won.

751

00:41:30,638 --> 00:41:32,418

Do you see that, of course?

752

00:41:32,418 --> 00:41:33,382

Do you see

753

00:41:33,454 --> 00:41:39,894

that much of an effect also in a closed

league system like the MLS?

754

00:41:40,714 --> 00:41:45,214

Or is that so? Because my prior would be

the effect of budget would be even

755

00:41:45,234 --> 00:41:51,814

stronger in open leagues like we have in

Europe because it's like there is no

756

00:41:51,814 --> 00:41:53,694

compensation mechanism, right?

757

00:41:53,694 --> 00:41:58,994

Clubs can go down and usually in Europe

the strongest clubs are the historical

758

00:41:58,994 --> 00:41:59,854

clubs.

759

00:41:59,854 --> 00:42:04,214

or the new clubs are just the ones that

were lucky to be bought by very, very

760

00:42:04,214 --> 00:42:06,494

wealthy shareholders.

761

00:42:06,734 --> 00:42:11,234

And like, there is not a lot of switching

of the hierarchy and changing of the

762

00:42:11,234 --> 00:42:14,754

hierarchy, mainly because of budget, as

you were saying.

763

00:42:14,754 --> 00:42:20,134

But I would think that maybe the effect of

budget is less strong in a closed league

764

00:42:20,134 --> 00:42:21,474

like the MLS.

765

00:42:21,474 --> 00:42:22,224

Is that true?

766

00:42:22,224 --> 00:42:24,934

Is that something you see or is it

something that's still in the air?

767

00:42:25,806 --> 00:42:26,346

Yes.

768

00:42:26,346 --> 00:42:30,326

So I haven't looked specifically at the

MLS, but in general in American sports,

769

00:42:30,326 --> 00:42:35,206

which all have closed leagues, the budget,

well, for various reasons, the budget

770

00:42:35,206 --> 00:42:37,326

effects are not super strong.

771

00:42:37,326 --> 00:42:42,226

So, you know, in American baseball, there

is no spending limit.

772

00:42:42,226 --> 00:42:46,066

So in some American sports, like the NFL

in football, there's a salary cap,

773

00:42:46,066 --> 00:42:47,976

meaning you can't spend more than a

certain amount.

774

00:42:47,976 --> 00:42:52,086

So there is no relationship between

overall spending and winning because

775

00:42:52,086 --> 00:42:55,346

everyone has to spend a minimum and

there's a maximum.

776

00:42:55,406 --> 00:42:58,596

In baseball, there is no limit.

777

00:42:58,596 --> 00:42:59,406

There's a tax.

778

00:42:59,406 --> 00:43:02,526

If you spend too much money, they do tax

you.

779

00:43:02,666 --> 00:43:04,506

But there's still not a huge correlation.

780

00:43:04,506 --> 00:43:06,926

And then in MLS, like I said, I'm not

entirely sure.

781

00:43:06,926 --> 00:43:10,846

Most of the clubs, they are constrained

about how much they can spend.

782

00:43:11,286 --> 00:43:14,506

And so there isn't as much variance also

in spending.

783

00:43:14,506 --> 00:43:19,066

So like, you know, Messi going to Inter

Miami, it wasn't that Inter Miami could

784

00:43:19,066 --> 00:43:20,430

pay him a lot of money.

785

00:43:20,430 --> 00:43:24,510

They actually, you know, there's a couple

of exemptions that an MLS club could use

786

00:43:24,510 --> 00:43:26,050

to pay an international player.

787

00:43:26,050 --> 00:43:29,010

They have, they're called, you know, a

couple of exemption players they have.

788

00:43:29,010 --> 00:43:34,250

And that originally started when David

Beckham went to Los Angeles and they kind

789

00:43:34,250 --> 00:43:38,570

of made that rule essentially just so he

could, they could afford paying him what

790

00:43:38,570 --> 00:43:43,310

he was used to or close to what he was

used to being paid in Europe.

791

00:43:43,970 --> 00:43:45,490

And in the MLS, that's still kind of the

case.

792

00:43:45,490 --> 00:43:49,646

You have one or two players you're allowed

to have on these exemptions.

793

00:43:49,646 --> 00:43:55,786

The way Messi was able to make it work is

he's getting paid by Apple, and

794

00:43:55,786 --> 00:43:58,866

Apple's broadcasting the MLS games.

795

00:43:58,866 --> 00:44:04,346

So they're paying him essentially to play

in the MLS because they're hoping, more

796

00:44:04,346 --> 00:44:07,596

people are going to watch our broadcasts

are going to pay us.

797

00:44:07,596 --> 00:44:09,716

And so we're going to give you a

percentage of that.

798

00:44:09,716 --> 00:44:12,946

And that's where actually a lot of his

salary or like his earnings are coming

799

00:44:12,946 --> 00:44:18,366

from is from a, a deal with Apple versus

the actual MLS club in Miami, which can

800

00:44:18,366 --> 00:44:19,566

only pay him so much.

801

00:44:19,566 --> 00:44:23,966

So my guess is, my prior is, I haven't

looked specifically at the MLS with this,

802

00:44:23,966 --> 00:44:27,846

but my prior is yes, that there isn't a

huge relationship in the MLS between

803

00:44:27,846 --> 00:44:30,726

winning and spending just because there's

not much of a variance.

804

00:44:30,726 --> 00:44:34,926

In order to see those correlations, you

have to have a large enough variance in

805

00:44:34,926 --> 00:44:37,716

the spending to notice the relationship,

right?
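
The variance point can be illustrated with a small simulation, which is not from the episode: the true effect of spending on winning is held fixed, and only the spread of spending across teams changes. The slope, noise level, and the 5,000 simulated team-seasons (used only to keep sampling noise small) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def spend_win_correlation(spend_sd, n_teams=5000, slope=1.0, noise_sd=1.0):
    """Correlation between spending and winning when the true effect of
    spending (slope) is identical and only the spread of spending differs."""
    spend = rng.normal(0.0, spend_sd, size=n_teams)
    wins = slope * spend + rng.normal(0.0, noise_sd, size=n_teams)
    return np.corrcoef(spend, wins)[0, 1]

wide = spend_win_correlation(spend_sd=2.0)    # large spending gaps between clubs
narrow = spend_win_correlation(spend_sd=0.2)  # spending compressed by league rules
```

With these settings the correlation is roughly 0.9 in the wide case and roughly 0.2 in the narrow case, even though each extra unit of spending is worth exactly the same in both.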

806

00:44:37,716 --> 00:44:38,406

So.

807

00:44:38,926 --> 00:44:41,306

Yeah, definitely interesting.

808

00:44:42,046 --> 00:44:46,638

I mean, I love also looking at these, you

know, the...

809

00:44:46,638 --> 00:44:55,498

how the structure of a league impacts the

show and the wins is extremely

810

00:44:55,498 --> 00:44:56,028

interesting.

811

00:44:56,028 --> 00:45:00,558

That can seem very nerdy and I think

that's my political science training that

812

00:45:00,558 --> 00:45:06,678

kicks in here, but really how you

structure the game also makes the game

813

00:45:06,678 --> 00:45:09,698

what it is and the results and the show

you're going to get.

814

00:45:10,298 --> 00:45:15,378

I find that extremely interesting to see

how the American games, the US games are

815

00:45:15,378 --> 00:45:16,370

structured.

816

00:45:16,398 --> 00:45:24,138

Because ironically, it's a system where

there are many more social transfers, if

817

00:45:24,138 --> 00:45:28,558

you want, like we have in Europe for

social security and health and education.

818

00:45:28,558 --> 00:45:33,338

American sports are socialist, and

European sports are capitalist.

819

00:45:33,338 --> 00:45:37,118

But typically, we consider Americans to be

more capitalist and the Europeans to be

820

00:45:37,118 --> 00:45:38,058

more socialist.

821

00:45:38,058 --> 00:45:41,078

So it's an interesting inversion.

822

00:45:41,078 --> 00:45:41,478

Yeah.

823

00:45:41,478 --> 00:45:42,718

No, definitely.

824

00:45:42,838 --> 00:45:45,006

And I mean, I think...

825

00:45:45,006 --> 00:45:48,366

Honestly, that's going to be interesting

in the coming years to see what's

826

00:45:48,366 --> 00:45:51,986

happening on the European side because

there are more and more debates about

827

00:45:51,986 --> 00:45:59,266

whether we should have a closed European

wide league, which would basically be an

828

00:45:59,266 --> 00:46:02,266

extension of the current Champions League.

829

00:46:02,346 --> 00:46:07,406

And honestly, I think it's going to take

that road because more and more

830

00:46:07,926 --> 00:46:12,946

championships, at least all the

championships, I would say, with the

831

00:46:12,946 --> 00:46:14,574

exception of the Premier League,

832

00:46:14,574 --> 00:46:17,094

get more and more concentrated on just a

few clubs.

833

00:46:17,094 --> 00:46:22,314

And just from time to time, you have one

club that bumps onto the top, like

834

00:46:22,314 --> 00:46:27,634

Leverkusen this year in Germany, Monaco in

France a few years ago, Montpellier.

835

00:46:27,634 --> 00:46:29,834

But those are really exceptions.

836

00:46:29,834 --> 00:46:34,084

And in the end, you almost always get the

same clubs that win all the time.

837

00:46:34,084 --> 00:46:39,564

And so the idea of open leagues is not

really true for the top of the leagues.

838

00:46:39,564 --> 00:46:43,694

It's definitely true for the bottom, but

the big clubs never go down.

839

00:46:43,694 --> 00:46:44,110

And...

840

00:46:44,110 --> 00:46:48,190

And so I think at some point, this

illusion of the open leagues is going to

841

00:46:48,190 --> 00:46:53,770

disappear and probably we'll get a

European wide championship where like

842

00:46:53,770 --> 00:46:58,970

basically the leagues are going to get a

bit more even because I think it's better

843

00:46:58,970 --> 00:47:00,970

for the show and that's going to make more

money.

844

00:47:00,970 --> 00:47:03,490

And in the end, I think that's what the

question is also.

845

00:47:03,490 --> 00:47:05,750

Yeah, you might be right, but I hope, I

hope not.

846

00:47:05,750 --> 00:47:11,350

I really, as an American, have always

dreamed of American leagues doing relegation and

847

00:47:11,350 --> 00:47:12,750

promotion just because...

848

00:47:12,750 --> 00:47:16,900

You know, in America, we have this problem

where we call it tanking, right?

849

00:47:16,900 --> 00:47:20,810

Because we have the socialist draft system

where the worst teams are incentivized

850

00:47:20,810 --> 00:47:24,150

to lose because they know they're not

going to win.

851

00:47:24,150 --> 00:47:29,570

So they want to get the best possible

players in the draft the next season.

852

00:47:29,570 --> 00:47:33,050

And so they're incentivized, you know, to,

to lose a little bit more.

853

00:47:33,050 --> 00:47:36,370

And so that really does kind of, you know,

the promotion/relegation system is nice because

854

00:47:36,370 --> 00:47:40,710

it solves that, you know, if you keep

losing, you lose a lot of money because

855

00:47:40,710 --> 00:47:42,052

you get sent down.

856

00:47:42,480 --> 00:47:47,410

So everyone's motivated even at the bottom

of each league to keep winning games,

857

00:47:47,410 --> 00:47:47,730

right?

858

00:47:47,730 --> 00:47:48,680

As much as possible.

859

00:47:48,680 --> 00:47:50,920

Otherwise they lose a lot of money.

860

00:47:50,920 --> 00:47:54,890

And in American leagues with the closed

system, it's like, well, hey, you know,

861

00:47:54,890 --> 00:47:56,530

it's actually, we talk about cycles.

862

00:47:56,530 --> 00:47:59,950

And one thing that sports analytics

has done is essentially say,

863

00:47:59,950 --> 00:48:06,810

it's really hard in American sports to go

from being an average team to a really

864

00:48:06,810 --> 00:48:07,580

good team.

865

00:48:07,580 --> 00:48:09,318

And the reason

866

00:48:09,518 --> 00:48:10,728

is the draft system.

867

00:48:10,728 --> 00:48:15,358

So in the draft system, people are always

overconfident in how good the players are,

868

00:48:15,358 --> 00:48:20,258

but there are really thick right tails in

how good a player can be.

869

00:48:20,258 --> 00:48:23,958

So when you get a new player who's young

and you can draft them at the top of the

870

00:48:23,958 --> 00:48:27,908

draft, they might not pan out, but they

also have a really thick right tail,

871

00:48:27,908 --> 00:48:31,958

meaning that if they do pan out, you could

go from being one of the worst teams to

872

00:48:31,958 --> 00:48:34,018

one of the best teams really quickly.

873

00:48:34,018 --> 00:48:35,206

And so,

874

00:48:35,662 --> 00:48:38,982

You know, it's this other analysis of

like, well, if you don't ever have an

875

00:48:38,982 --> 00:48:43,182

opportunity to draft someone in a

position where there's that right tail,

876

00:48:43,182 --> 00:48:47,642

where, you know, once out of every five

years, you get a player who transcends

877

00:48:47,642 --> 00:48:52,562

everyone else that comes in, then you

can't move up from average to really good,

878

00:48:52,562 --> 00:48:55,482

but you can go from being bad to really

good.

879

00:48:55,482 --> 00:48:59,742

So often teams and the smarter teams, if

they're really good, they stay really good.

880

00:48:59,742 --> 00:49:03,082

But once they start noticing the players

are getting older, they just trade

881

00:49:03,082 --> 00:49:04,174

everybody away.

882

00:49:04,174 --> 00:49:08,394

They get rid of all their best players and

they just stink for a year or two and

883

00:49:08,394 --> 00:49:09,744

hopefully they can get some good draft picks.

884

00:49:09,744 --> 00:49:11,474

They get a lot of draft picks.

885

00:49:11,474 --> 00:49:11,594

Essentially.

886

00:49:11,594 --> 00:49:14,974

They try to trade their players away, get

more draft picks, and then it becomes a

887

00:49:14,974 --> 00:49:15,694

sample size problem.

888

00:49:15,694 --> 00:49:21,074

And it says, well, if we have more draft

picks, our probability of getting someone

889

00:49:21,074 --> 00:49:22,834

on the right tail goes up.

890

00:49:22,834 --> 00:49:25,954

And so that's all we're going to do is

we're just going to increase our odds of

891

00:49:25,954 --> 00:49:27,114

getting that right tail player.

892

00:49:27,114 --> 00:49:30,714

And if we get that player, then we'll be

good again.
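
The "more draft picks, better odds" logic is the standard at-least-one calculation, 1 - (1 - p)^n. A minimal sketch, assuming every pick independently has the same hypothetical probability p of landing a right-tail player (in reality top picks have higher hit rates than late ones):

```python
def prob_at_least_one_star(n_picks, p_star):
    """P(at least one right-tail player) = 1 - (1 - p)^n,
    assuming independent picks that share the same hit rate p_star."""
    return 1.0 - (1.0 - p_star) ** n_picks

# Hypothetical 5% hit rate per pick: one, three, and five picks.
for n in (1, 3, 5):
    print(n, round(prob_at_least_one_star(n, 0.05), 3))
```

Going from one pick to five raises the chance of at least one star from 5% to about 23%, which is the lottery-ticket logic described above.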

893

00:49:31,134 --> 00:49:31,274

Yeah.

894

00:49:31,274 --> 00:49:31,734

Yeah.

895

00:49:31,734 --> 00:49:32,494

It's like

896

00:49:32,494 --> 00:49:34,494

buying a lot of lottery tickets.

897

00:49:34,494 --> 00:49:36,294

Yeah, that's what they're doing.

898

00:49:36,574 --> 00:49:38,754

Yeah, now that's fascinating.

899

00:49:38,894 --> 00:49:41,794

Yeah, I wasn't aware of these effects.

900

00:49:41,794 --> 00:49:43,694

That's super interesting.

901

00:49:43,934 --> 00:49:48,734

Because basically, what you're saying is

there is an incentive to be extreme,

902

00:49:48,734 --> 00:49:49,194

basically.

903

00:49:49,194 --> 00:49:53,254

Either you want to be among the top ones

or you want to be among the worst ones.

904

00:49:53,254 --> 00:49:55,794

But being in the middle is the worst,

actually.

905

00:49:55,794 --> 00:49:56,794

It is the worst.

906

00:49:56,794 --> 00:49:57,134

Yeah.

907

00:49:57,134 --> 00:49:57,454

Yeah.

908

00:49:57,454 --> 00:49:59,614

That is extremely interesting.

909

00:49:59,614 --> 00:50:01,254

And that's...

910

00:50:01,710 --> 00:50:06,490

Yeah, I mean, I actually don't know which

system I prefer.

911

00:50:06,670 --> 00:50:12,310

Honestly, I'm just saying I think Europe

is getting, is going there because we have

912

00:50:12,310 --> 00:50:16,330

more and more basically concentration of

the wealth at the very top of the leagues

913

00:50:16,330 --> 00:50:19,070

and that's going to make the national

leagues less and less interesting

914

00:50:19,070 --> 00:50:20,310

basically.

915

00:50:21,110 --> 00:50:26,950

But I don't know either if I prefer the

European wide championship.

916

00:50:27,050 --> 00:50:30,382

Well, I think I would prefer European wide

championship.

917

00:50:30,382 --> 00:50:33,342

for sure, but I think it would be great to

have it still open.

918

00:50:33,342 --> 00:50:36,542

So where you could have, you know, like

basically countries would become regions

919

00:50:36,542 --> 00:50:42,222

and then you get from like, if you, if

you're the best in France, basically in

920

00:50:42,222 --> 00:50:45,342

one year, then you get to the highest

level, which is the European one.

921

00:50:45,342 --> 00:50:49,622

And then if you're among the worst, you

get down to your country the next year.

922

00:50:49,622 --> 00:50:55,262

I think that would be very fun because

the, like, especially now that players can

923

00:50:55,262 --> 00:50:58,478

be traded very easily between the, the...

924

00:50:58,478 --> 00:51:01,758

continental Europe because it's basically

the same country legally.

925

00:51:01,758 --> 00:51:07,178

That also makes sense because, you

know, basically PSG versus

926

00:51:07,178 --> 00:51:14,518

Barcelona is much tighter than PSG

versus literally any team in France.

927

00:51:14,578 --> 00:51:17,298

So yeah, that's going to be very

interesting.

928

00:51:17,558 --> 00:51:24,494

But at the same time, I'm very, yeah, I

love hearing about the wrong incentives

929

00:51:24,494 --> 00:51:27,074

of the closed system at the same time.

930

00:51:27,074 --> 00:51:28,594

So thanks a lot for that.

931

00:51:28,594 --> 00:51:30,354

That's food for thought.

932

00:51:30,774 --> 00:51:36,234

And that's again, like that's very close

to elections, actually, like how you

933

00:51:36,234 --> 00:51:39,394

count the votes impacts the winner.

934

00:51:39,394 --> 00:51:46,274

And so here, like, really in sports too, how

you structure your game has an impact on

935

00:51:46,274 --> 00:51:47,014

the winners.

936

00:51:47,014 --> 00:51:52,914

And I think it's extremely important to

keep in mind because in the end, like how

937

00:51:52,914 --> 00:51:53,838

the

938

00:51:53,838 --> 00:52:01,678

the organizations, so the MLS in the US or

UEFA in Europe, actually have huge

939

00:52:01,678 --> 00:52:04,678

power over the game.

940

00:52:06,258 --> 00:52:09,498

Well, thanks for that political science

aside.

941

00:52:09,498 --> 00:52:12,638

I wasn't expecting that, but that's

definitely super interesting.

942

00:52:13,018 --> 00:52:17,838

To get back to the modeling because time

is flying by and I definitely want to ask

943

00:52:17,838 --> 00:52:22,286

you about the plus minus models because

you're using that also to...

944

00:52:22,286 --> 00:52:25,006

estimate player value in American

football.

945

00:52:25,006 --> 00:52:26,385

So I'm curious about that.

946

00:52:26,385 --> 00:52:28,506

What is that kind of model?

947

00:52:28,506 --> 00:52:33,406

Is that mainly for American football, or are

you using that also for other sports?

948

00:52:33,406 --> 00:52:37,986

Or if it's only for American football, why

is that particularly tailored to that

949

00:52:37,986 --> 00:52:38,906

sport?

950

00:52:39,486 --> 00:52:39,766

Yeah.

951

00:52:39,766 --> 00:52:45,906

So plus-minus models actually

originated in basketball and they're, they

952

00:52:45,906 --> 00:52:46,936

work the best in basketball.

953

00:52:46,936 --> 00:52:48,166

They're not perfect.

954

00:52:48,166 --> 00:52:51,746

And so the concept in basketball

is you have 10 players on the court at

955

00:52:51,746 --> 00:52:52,526

each.

956

00:52:52,526 --> 00:52:54,886

at each moment and they substitute in and

out.

957

00:52:54,886 --> 00:52:58,706

But while those 10 players are on the

court, you know how many points are scored

958

00:52:58,706 --> 00:53:00,286

for each team, right?

959

00:53:00,286 --> 00:53:04,706

So, you know, five players on the offense

side and five players on defensive side.

960

00:53:04,706 --> 00:53:09,086

There's essentially just a big linear

model and you look at and you want to

961

00:53:09,086 --> 00:53:12,746

adjust for how long they're on the court

or how many possessions they were on the

962

00:53:12,746 --> 00:53:13,136

court for.

963

00:53:13,136 --> 00:53:16,846

So you can say, okay, these 10 players are

on the court for two and a half minutes.

964

00:53:16,846 --> 00:53:21,186

And in those two and a half minutes, this

team scored six points and the other team

965

00:53:21,186 --> 00:53:22,318

scored four points.

966

00:53:22,318 --> 00:53:26,678

And essentially what you're doing then is

a plus minus model, essentially.

967

00:53:26,678 --> 00:53:31,618

So sometimes you might see in a, in a

statistic after the game, like the total

968

00:53:31,618 --> 00:53:36,518

difference in the net points for the team

when a player was on the court versus when

969

00:53:36,518 --> 00:53:37,438

they're not.

970

00:53:37,438 --> 00:53:40,238

Well, that's not too useful because

there's a lot of correlation, right?

971

00:53:40,238 --> 00:53:42,738

You're playing with someone else a lot.

972

00:53:42,738 --> 00:53:46,538

So what we call an adjusted plus minus

model, right, is a linear model that then

973

00:53:46,538 --> 00:53:51,618

tries to fit those player effects of, you

know, you get a one when you're on the

974

00:53:51,618 --> 00:53:52,014

court.

975

00:53:52,014 --> 00:53:53,984

on offense and negative one if you're on

defense.

976

00:53:53,984 --> 00:53:56,354

And we look at your team's efficiency,

right?

977

00:53:56,354 --> 00:54:01,634

Your points divided by some denominator,

whether it's minutes or possessions.
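
A minimal sketch of the adjusted plus-minus setup described above, shrunk to a hypothetical 2-on-2 league so it stays readable (real basketball has 5 on 5). Each stint row gets an intercept for league-average efficiency, +1 for each offensive player, and -1 for each defensive player; the response is points per possession and the weight is the number of possessions. The player names, true ratings, and possession counts are invented so the fit can be checked against known values.

```python
import itertools
import numpy as np

def build_apm_design(stints, players):
    """Design matrix for a toy adjusted plus-minus model.

    Each stint is (offense_players, defense_players, points, possessions).
    Column 0 is an intercept (league-average points per possession); each
    player column is +1 if the player was on offense and -1 if on defense.
    """
    col = {p: i + 1 for i, p in enumerate(players)}
    X = np.zeros((len(stints), len(players) + 1))
    y = np.zeros(len(stints))
    w = np.zeros(len(stints))
    for r, (offense, defense, points, possessions) in enumerate(stints):
        X[r, 0] = 1.0
        for p in offense:
            X[r, col[p]] = 1.0
        for p in defense:
            X[r, col[p]] = -1.0
        y[r] = points / possessions  # efficiency: points per possession
        w[r] = possessions           # longer stints count for more
    return X, y, w

def fit_weighted_ls(X, y, w):
    """Weighted least squares; lstsq returns the minimum-norm solution if
    the design is rank deficient."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical league: team 1 = A, B, E; team 2 = C, D, F.
players = ["A", "B", "E", "C", "D", "F"]
true = {"A": 0.10, "B": -0.05, "E": -0.05, "C": 0.05, "D": 0.0, "F": -0.05}
league_avg = 1.0
stints = []
for l1, l2 in itertools.product(itertools.combinations("ABE", 2),
                                itertools.combinations("CDF", 2)):
    for offense, defense in ((l1, l2), (l2, l1)):
        poss = 8 + len(stints) % 5
        ppp = league_avg + sum(true[p] for p in offense) - sum(true[p] for p in defense)
        stints.append((offense, defense, ppp * poss, poss))

X, y, w = build_apm_design(stints, players)
coef = fit_weighted_ls(X, y, w)
ratings = dict(zip(players, coef[1:]))
```

Every row has as many +1s as -1s, so adding the same constant to every player leaves the predictions unchanged: ratings are only relative. In this toy example the minimum-norm least-squares answer resolves that by returning zero-sum ratings, which match the invented true values.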

978

00:54:01,634 --> 00:54:01,834

Okay.

979

00:54:01,834 --> 00:54:04,124

And that's sort of the basketball thing.

Over time,

980

00:54:04,124 --> 00:54:08,574

they realized, okay, well, there's so much

correlation between who is playing

981

00:54:08,574 --> 00:54:09,174

together.

982

00:54:09,174 --> 00:54:10,664

We need to adjust for that.

983

00:54:10,664 --> 00:54:12,254

So they used ridge regression.

984

00:54:12,254 --> 00:54:15,554

And so that would divvy up the credit a

little bit better.

985

00:54:15,554 --> 00:54:18,542

And you know, ridge regression is very

good when there's

986

00:54:18,542 --> 00:54:22,102

a lot of multicollinearity or correlation

between two effects, right?

987

00:54:22,102 --> 00:54:25,822

And on any basketball team, for all

the players, you have teammates

988

00:54:25,822 --> 00:54:29,262

that play a lot together and they don't

play with other people a lot.
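
A small illustration of why ridge helps with teammates who always share the floor, using invented data: players A and B are never apart, so plain least squares can pin down only their combined effect, not the split. The closed-form ridge estimate (X^T X + λI)^{-1} X^T y, with a small hypothetical λ and no intercept, stays well defined and divides the credit evenly.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X^T X + lam * I)^{-1} X^T y (no intercept)."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Hypothetical lineups: A and B are always on the floor together, C is not.
x_a = np.array([1, 1, 0, 0, 1, 1], dtype=float)
x_b = x_a.copy()
x_c = np.array([0, 1, 1, 1, 0, 1], dtype=float)
X = np.column_stack([x_a, x_b, x_c])
y = 0.4 * x_a - 0.2 * x_c  # only A and B's combined effect (0.4) is identifiable

print(np.linalg.matrix_rank(X))  # 2: least squares alone cannot split A from B
coef = ridge(X, y, lam=1e-3)
```

The data identify A + B ≈ 0.4 and C ≈ -0.2. How that 0.4 is divided between A and B comes entirely from the penalty, which is the "divvying up credit" behavior mentioned above.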

989

00:54:29,282 --> 00:54:33,642

But ridge regression has done a decently

good job in basketball over a big sample

990

00:54:33,642 --> 00:54:37,022

of estimating how effective players are.

991

00:54:37,022 --> 00:54:40,902

And if you look at these things, you'll

see, we talked about the sniff test.

992

00:54:40,902 --> 00:54:43,694

In 2012, LeBron is the number one player.

993

00:54:43,694 --> 00:54:46,914

And he's the number one player for a lot

of the years, not so much anymore because

994

00:54:46,914 --> 00:54:47,964

he's older, et cetera.

995

00:54:47,964 --> 00:54:48,144

Right.

996

00:54:48,144 --> 00:54:50,794

But that's sort of those sniff tests that

we get.

997

00:54:50,834 --> 00:54:55,644

Well, some people in basketball, and

I'm a proponent of this, like, you know,

998

00:54:55,644 --> 00:55:00,894

this is a Bayesian podcast, say that ridge

regression, you know, for those unfamiliar

999

00:55:00,894 --> 00:55:04,864

is, is a frequentist way to write a

Bayesian model.

Speaker:

00:55:04,864 --> 00:55:09,474

That's very specific where you have a

normal prior on each player with a mean

Speaker:

00:55:09,474 --> 00:55:10,474

zero.

Speaker:

00:55:10,474 --> 00:55:10,774

Okay.

Speaker:

00:55:10,774 --> 00:55:12,270

And that's ridge regression.
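
The equivalence can be written out as follows, assuming Gaussian noise with variance σ² and independent N(0, τ²) priors on the player effects (these symbols are introduced here for illustration). Multiplying the negative log posterior by 2σ² gives the ridge objective, so the ridge estimate is the posterior mode; because the posterior is Gaussian, it is also the posterior mean.

```latex
y \mid \beta \sim \mathcal{N}(X\beta,\ \sigma^2 I), \qquad \beta_j \sim \mathcal{N}(0,\ \tau^2)

-\log p(\beta \mid y) = \frac{1}{2\sigma^2}\lVert y - X\beta \rVert^2 + \frac{1}{2\tau^2}\lVert \beta \rVert^2 + \text{const}

\hat{\beta} = \arg\min_{\beta}\ \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2 = (X^\top X + \lambda I)^{-1} X^\top y, \qquad \lambda = \frac{\sigma^2}{\tau^2}
```

A larger penalty λ corresponds to a tighter prior (smaller τ²) around zero.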

Speaker:

00:55:12,270 --> 00:55:16,310

So we think about it from that perspective

with adjusted plus minus models.

Speaker:

00:55:16,350 --> 00:55:20,030

What happens when you have a normal prior

with mean zero is that when you have

Speaker:

00:55:20,030 --> 00:55:24,710

players that play less, we shrink more

towards the prior mean.

Speaker:

00:55:24,770 --> 00:55:28,349

And it's only when we have more data for

players that we can deviate from that

Speaker:

00:55:28,349 --> 00:55:29,150

prior mean.
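
For a single player, the shrinkage described here has a simple closed form. A minimal sketch, assuming a player's raw per-possession estimate comes from n possessions with per-possession noise SD σ and a N(0, τ²) prior: the posterior mean keeps a fraction n / (n + σ²/τ²) of the raw estimate. The values of σ and τ are hypothetical.

```python
def shrunk_rating(raw_rating, n_possessions, sigma=1.2, tau=0.05):
    """Posterior mean of a player's per-possession effect under a N(0, tau^2)
    prior, given a raw estimate from n_possessions possessions with
    per-possession noise SD sigma. Both sigma and tau are hypothetical.

    The fraction of the raw estimate that survives is n / (n + sigma^2 / tau^2).
    """
    k = n_possessions / (n_possessions + sigma**2 / tau**2)
    return k * raw_rating

bench = shrunk_rating(0.10, n_possessions=100)     # few possessions: pulled hard toward 0
starter = shrunk_rating(0.10, n_possessions=5000)  # many possessions: mostly the data
```

With these settings σ²/τ² = 576, so a player seen for 100 possessions keeps only about 15% of the raw rating, while one seen for 5,000 keeps about 90%.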

Speaker:

00:55:29,150 --> 00:55:33,490

Well, one thing we know about sports is if

you're not playing as much, that actually

Speaker:

00:55:33,490 --> 00:55:34,630

is pretty useful information.

Speaker:

00:55:34,630 --> 00:55:35,670

And what does that tell us?

Speaker:

00:55:35,670 --> 00:55:36,990

You're not very good.

Speaker:

00:55:36,990 --> 00:55:39,570

Because if you're good, you're going to

play more.

Speaker:

00:55:39,570 --> 00:55:41,710

And if you're bad, you play less.

Speaker:

00:55:41,710 --> 00:55:45,900

So other people have come around and, you

know, in the last 10, 15 years and said,

Speaker:

00:55:45,900 --> 00:55:50,090

okay, well, instead of a ridge regression

model for basketball, we should do a

Speaker:

00:55:50,090 --> 00:55:51,070

Bayesian regression model.

Speaker:

00:55:51,070 --> 00:55:55,650

And instead of having a mean zero for a

player, we should have a mean of something

Speaker:

00:55:55,650 --> 00:55:56,120

else.

Speaker:

00:55:56,120 --> 00:55:58,730

So there's a few different versions that

people have done.

Speaker:

00:55:58,730 --> 00:56:04,270

One thing, a very simple version, is to say

just everybody has a prior mean of,

Speaker:

00:56:04,270 --> 00:56:06,630

you know, what we call a replacement

player.

Speaker:

00:56:06,630 --> 00:56:07,110

Okay.

Speaker:

00:56:07,110 --> 00:56:08,750

Someone that doesn't play very much.

Speaker:

00:56:08,750 --> 00:56:10,574

If you're really good and you play a lot,

Speaker:

00:56:10,574 --> 00:56:14,633

it doesn't matter what the prior mean is

too much because the data is going to

Speaker:

00:56:14,633 --> 00:56:15,794

overwhelm the prior.

Speaker:

00:56:15,794 --> 00:56:20,294

But if you don't play very much, we're

going to stick with that sort of negative

Speaker:

00:56:20,294 --> 00:56:23,354

prior mean because it means you're below

average.

Speaker:

00:56:23,354 --> 00:56:25,144

And so that's one thing you can do.
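
Moving the prior mean away from zero only changes the ridge formula slightly: minimizing ||y - Xβ||² + λ||β - m||² gives (X^T X + λI)^{-1}(X^T y + λm). The sketch below uses invented data, a regular seen in 200 stints and a bench player seen in 5 who both truly rate +1.0, and compares a zero prior mean with a made-up replacement-level mean of -0.5.

```python
import numpy as np

def ridge_toward(X, y, lam, prior_mean):
    """MAP estimate with prior beta_j ~ N(prior_mean_j, sigma^2 / lam):
    argmin ||y - X beta||^2 + lam * ||beta - prior_mean||^2
    = (X^T X + lam I)^{-1} (X^T y + lam * prior_mean)."""
    k = X.shape[1]
    rhs = X.T @ y + lam * np.asarray(prior_mean, dtype=float)
    return np.linalg.solve(X.T @ X + lam * np.eye(k), rhs)

# Hypothetical: a regular seen in 200 stints, a bench player in 5; both truly +1.0.
X = np.zeros((205, 2))
X[:200, 0] = 1.0
X[200:, 1] = 1.0
y = X @ np.array([1.0, 1.0])
replacement = -0.5  # made-up replacement-level prior mean

zero_prior = ridge_toward(X, y, lam=20.0, prior_mean=[0.0, 0.0])
repl_prior = ridge_toward(X, y, lam=20.0, prior_mean=[replacement, replacement])
```

The regular barely moves (about 0.91 versus 0.86) because the data overwhelm the prior, while the bench player goes from +0.2 to -0.2, i.e. stays below average until proven otherwise.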

Speaker:

00:56:25,144 --> 00:56:28,174

A more sophisticated thing sometimes

people will do is they'll build a

Speaker:

00:56:28,174 --> 00:56:33,874

hierarchical model where you have

essentially a, a prior mean that is based

Speaker:

00:56:33,874 --> 00:56:37,454

on other statistics that we observe.

Speaker:

00:56:37,474 --> 00:56:40,270

So how many points you score or how many

assists you have.

Speaker:

00:56:40,270 --> 00:56:46,610

And that's called a box score

prior mean or a box score plus minus.
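
One way to sketch a box-score prior mean, under stated assumptions that are not necessarily how published box plus-minus models are built: each player's rating is β_j = z_jᵀγ + u_j, where z_j holds that player's box-score stats, u_j ~ N(0, τ²), and γ gets a flat prior. Substituting into y = Xβ + noise gives a regression in which γ is unpenalized and only the deviations u are ridge-penalized, with λ = σ²/τ². All data below are synthetic.

```python
import numpy as np

def fit_box_prior_apm(X, y, Z, lam):
    """MAP fit of y = X @ beta + noise with beta = Z @ gamma + u,
    u ~ N(0, tau^2 I), flat prior on gamma, lam = sigma^2 / tau^2.

    X: stints x players on-court matrix; Z: players x box-score features.
    Returns (beta, gamma): player ratings and box-score weights.
    """
    n_players, k = Z.shape
    D = np.hstack([X @ Z, X])                                  # columns for gamma, then u
    P = np.diag(np.r_[np.zeros(k), np.full(n_players, lam)])  # penalize u only
    theta = np.linalg.solve(D.T @ D + P, D.T @ y)
    gamma, u = theta[:k], theta[k:]
    return Z @ gamma + u, gamma

# Synthetic example: 6 players, an intercept plus 2 hypothetical box-score stats.
rng = np.random.default_rng(1)
n_rows, n_players = 300, 6
X = rng.integers(0, 2, size=(n_rows, n_players)).astype(float)
Z = np.column_stack([np.ones(n_players), rng.normal(size=(n_players, 2))])
beta_true = Z @ np.array([0.0, 1.0, 0.5]) + rng.normal(0.0, 0.3, n_players)
y = X @ beta_true + rng.normal(0.0, 1.0, n_rows)
beta_hat, gamma_hat = fit_box_prior_apm(X, y, Z, lam=50.0)
```

A player with little on-court data ends up close to Zγ, the prediction from their box score, rather than close to zero; as λ grows, every rating collapses onto that box-score prediction.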

Speaker:

00:56:46,610 --> 00:56:47,790

So that's sort of the basketball.

Speaker:

00:56:47,790 --> 00:56:49,190

So that's what plus-minus models are.

Speaker:

00:56:49,190 --> 00:56:51,210

So that's sort of the basketball approach.

Speaker:

00:56:51,229 --> 00:56:51,850

Now.

Speaker:

00:56:51,920 --> 00:56:56,190

Basketball is really nice because you have

lots of games in the NBA.

Speaker:

00:56:56,190 --> 00:57:00,550

You play every team at least twice and you

substitute a lot and there's lots of

Speaker:

00:57:00,550 --> 00:57:01,450

scoring.

Speaker:

00:57:01,450 --> 00:57:06,690

Now my work in American football tried to

address a lot of these issues in American

Speaker:

00:57:06,690 --> 00:57:07,080

football.

Speaker:

00:57:07,080 --> 00:57:08,886

You don't play every team.

Speaker:

00:57:09,262 --> 00:57:11,182

you don't substitute very much.

Speaker:

00:57:11,182 --> 00:57:16,322

And if you do play, you only play with

certain people like all the time.

Speaker:

00:57:16,322 --> 00:57:18,812

And then there's not a lot of scoring

compared to basketball.

Speaker:

00:57:18,812 --> 00:57:22,742

There's some scoring, but you know,

there's, you know, American football point

Speaker:

00:57:22,742 --> 00:57:24,462

scoring is unique, right?

Speaker:

00:57:24,462 --> 00:57:27,542

You get six or seven points for a

touchdown, you get three points for a

Speaker:

00:57:27,542 --> 00:57:30,902

field goal, you know, and then on more

rare occasions, you get these two-point

Speaker:

00:57:30,902 --> 00:57:32,182

safeties.

Speaker:

00:57:32,202 --> 00:57:32,878

Yeah.

Speaker:

00:57:32,878 --> 00:57:38,258

So there's roughly maybe 10 scoring events

in an American football game versus in

Speaker:

00:57:38,258 --> 00:57:42,318

basketball where you have, you know, a

score of a hundred to a hundred.

Speaker:

00:57:42,318 --> 00:57:45,968

So with each one worth, you know, about two

to three points, there's, you know,

Speaker:

00:57:45,968 --> 00:57:48,428

80 to 120 scoring events in a basketball

game.

Speaker:

00:57:48,428 --> 00:57:48,598

Right.

Speaker:

00:57:48,598 --> 00:57:49,988

So these models work a lot better.

Speaker:

00:57:49,988 --> 00:57:53,398

My work in American football has been to

sort of, how do we take the basketball

Speaker:

00:57:53,398 --> 00:57:56,588

model and make some modifications so we

can do a football model?

Speaker:

00:57:56,588 --> 00:57:59,718

And so one of the things that is tricky in

football is

Speaker:

00:57:59,886 --> 00:58:03,606

that certain positions never get

substituted out.

Speaker:

00:58:03,606 --> 00:58:07,925

So on offense, the quarterback plays every

single play unless they're hurt or they

Speaker:

00:58:07,925 --> 00:58:08,706

stink.

Speaker:

00:58:08,706 --> 00:58:10,296

So they get benched.

Speaker:

00:58:10,296 --> 00:58:14,966

Well, the quarterback also always plays

with the same offensive line as long as

Speaker:

00:58:14,966 --> 00:58:17,286

they're healthy and they don't get

substituted out.

Speaker:

00:58:17,286 --> 00:58:22,146

So how does a model separate credit when

the same players are on the field all the

Speaker:

00:58:22,146 --> 00:58:22,336

time?

Speaker:

00:58:22,336 --> 00:58:27,470

And so my work in that was sort of to use

Bayesian statistics and take

Speaker:

00:58:27,470 --> 00:58:31,590

the Bayesian regression model where we had

a prior mean, I used some information to

Speaker:

00:58:31,590 --> 00:58:36,690

inform the prior mean for each player, but

I also did this unique thing where I

Speaker:

00:58:36,690 --> 00:58:37,510

shrink by position.

Speaker:

00:58:37,510 --> 00:58:43,810

So the prior variance is a function of position:

actually, there's one prior variance for

Speaker:

00:58:43,810 --> 00:58:48,410

all players and then it's multiplied by

another parameter, which is unique for the

Speaker:

00:58:48,410 --> 00:58:50,150

position that they play.

Speaker:

00:58:50,150 --> 00:58:54,610

And so quarterbacks have a different

shrinkage parameter, essentially, or prior

Speaker:

00:58:54,610 --> 00:58:56,238

variance than

Speaker:

00:58:56,238 --> 00:58:58,278

a different position.
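The credit-splitting problem and the position-specific prior variance can be sketched in the extreme case where a quarterback and a lineman are on the field for every play. Their columns in the design matrix are then identical, and ordinary least squares cannot separate them. Assume zero prior means and a prior variance of `sigma2 * position_scale[pos]`; both values are hypothetical, not estimates from the guest's model. Under those assumptions, the posterior splits the shared effect in proportion to the prior variances.

```python
import numpy as np

def posterior_mean(X, y, prior_mean, prior_var, noise_var):
    """Posterior mean of player effects with independent Normal priors."""
    prior_prec = np.diag(1.0 / prior_var)
    A = X.T @ X / noise_var + prior_prec
    b = X.T @ y / noise_var + prior_prec @ prior_mean
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
n_plays = 500
on_field = np.ones(n_plays)
X = np.column_stack([on_field, on_field])           # QB and lineman on every play: identical columns
y = 0.3 * on_field + rng.normal(0.0, 1.0, n_plays)  # hypothetical per-play value

sigma2 = 0.05                                # shared prior variance (assumed)
position_scale = {"QB": 4.0, "OL": 0.5}      # hypothetical per-position multipliers
prior_var = sigma2 * np.array([position_scale["QB"], position_scale["OL"]])

qb, ol = posterior_mean(X, y, np.zeros(2), prior_var, noise_var=1.0)
print(f"QB credit {qb:.3f}, OL credit {ol:.3f}, ratio {qb / ol:.1f}")
```

Here the quarterback's multiplier is 8 times the lineman's, so the quarterback receives 8 times as much of the shared credit. In the model described in the episode, the multipliers are estimated rather than fixed.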

Speaker:

00:58:58,478 --> 00:59:02,218

And then instead of just looking at

scoring plays in football, we have what we

Speaker:

00:59:02,218 --> 00:59:03,868

call expected points added.

Speaker:

00:59:03,868 --> 00:59:09,098

So at each play, we look at on average,

how many points are you going to score if

Speaker:

00:59:09,098 --> 00:59:11,238

you have the ball in this position?

Speaker:

00:59:11,238 --> 00:59:13,578

And I look at the difference between two

plays, right?

Speaker:

00:59:13,578 --> 00:59:18,378

And that tells you essentially how much

value you got in the result of the play.

Speaker:

00:59:18,438 --> 00:59:22,238

So instead of using every scoring play, I

just use every single play in football.
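Expected points added for a single play can be sketched as the expected points of the state after the play minus the expected points of the state before it. The `expected_points` function below is a hypothetical straight-line toy that depends only on yards to the end zone and the down. Real EP models are fit to historical play-by-play data and also use distance to go, time, score, and more. For simplicity, a scoring play here just uses the points scored (7 for a touchdown plus the extra point) and ignores the ensuing kickoff.

```python
from typing import Optional, Tuple

State = Tuple[float, int]  # (yards to the opponent's end zone, down)

def expected_points(yards_to_goal: float, down: int) -> float:
    """Hypothetical toy EP curve: closer to the end zone and earlier downs are worth more."""
    return 6.0 - 0.08 * yards_to_goal - 0.5 * (down - 1)

def expected_points_added(before: State, after: Optional[State],
                          points_scored: Optional[float] = None) -> float:
    """EPA = EP after the play minus EP before it; a scoring play uses the points scored."""
    ep_before = expected_points(*before)
    ep_after = points_scored if points_scored is not None else expected_points(*after)
    return ep_after - ep_before

plays = [
    ((75, 1), (65, 1), None),  # 10-yard gain for a new first down
    ((65, 1), (65, 2), None),  # incomplete pass
    ((20, 3), None, 7),        # touchdown plus extra point
]
epa = [expected_points_added(before, after, pts) for before, after, pts in plays]
print([round(v, 2) for v in epa])
```

Every play now has a value, not just the scoring ones, which is what makes a per-play regression possible.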

Speaker:

00:59:22,238 --> 00:59:24,782

And I do this unique shrinkage

Speaker:

00:59:24,942 --> 00:59:28,901

dependent on position and doing that, and

it's a huge model.

Speaker:

00:59:28,901 --> 00:59:32,582

So I did this in college football, which

has way too many parameters because

Speaker:

00:59:32,582 --> 00:59:34,342

there's like 16,000 kids.

Speaker:

00:59:34,342 --> 00:59:39,652

But even in the NFL, I've done this and

you get interesting results.

Speaker:

00:59:39,652 --> 00:59:42,632

Sometimes they match up with what you

think, sometimes they don't.

Speaker:

00:59:42,632 --> 00:59:45,642

But the interesting thing is you can

actually estimate how much you should

Speaker:

00:59:45,642 --> 00:59:47,182

shrink each position.

Speaker:

00:59:47,182 --> 00:59:52,222

And so actually the model is nice because

it essentially tells you how much of the

Speaker:

00:59:52,222 --> 00:59:53,902

variance in the outcome of the play

Speaker:

00:59:53,902 --> 00:59:58,622

is dependent on how good players are

across different positions.
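The guest estimates position-level variances inside the full Bayesian model. As a simpler, swapped-in stand-in, a method-of-moments empirical Bayes sketch shows the same idea. The observed spread of noisy player estimates at a position is roughly the true spread plus the estimation noise. Subtracting the average squared standard error therefore gives the position's variance, which in turn sets how much to shrink. The true standard deviations, the standard error, and the sample sizes below are hypothetical.

```python
import numpy as np

def position_variance_mom(estimates, std_errors):
    """Method-of-moments estimate of the true variance of player effects at one position.

    Var(observed estimates) is roughly true variance + mean squared standard error.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    return max(est.var(ddof=1) - np.mean(se**2), 0.0)

rng = np.random.default_rng(3)
true_sd = {"QB": 1.0, "WR": 0.3}  # hypothetical spread of true player effects
se, n_players = 0.5, 400          # hypothetical standard error and players per position
shrink = {}
for pos, sd in true_sd.items():
    noisy = rng.normal(0.0, sd, n_players) + rng.normal(0.0, se, n_players)
    tau2 = position_variance_mom(noisy, np.full(n_players, se))
    shrink[pos] = tau2 / (tau2 + se**2)  # weight left on a player's own data
print({pos: round(w, 2) for pos, w in shrink.items()})
```

A position with a larger estimated variance keeps more weight on each player's own data, so it is shrunk less.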

Speaker:

00:59:58,762 --> 01:00:02,942

So in football, we all know that

quarterbacks are the most impactful

Speaker:

01:00:02,942 --> 01:00:05,062

position in the game.

Speaker:

01:00:05,062 --> 01:00:10,542

And I did give somewhat subjective priors,

but I still left a lot of

Speaker:

01:00:10,542 --> 01:00:14,862

uncertainty around them, and the model could

very well see and estimate that quarterbacks

Speaker:

01:00:14,862 --> 01:00:20,182

are in fact the most important position

because you shrink them the least; they have

Speaker:

01:00:20,182 --> 01:00:21,742

the largest variance.

Speaker:

01:00:21,742 --> 01:00:23,556

So.

Speaker:

01:00:23,886 --> 01:00:24,816

You could look at that.

Speaker:

01:00:24,816 --> 01:00:29,206

If you look at the most impactful players

in football, it should be a quarterback.

Speaker:

01:00:29,206 --> 01:00:33,866

But in the same measure, the worst players

in football are also quarterbacks because

Speaker:

01:00:33,866 --> 01:00:37,906

in order to negatively hurt your team, you

can only hurt your team really a lot

Speaker:

01:00:37,906 --> 01:00:40,966

if you're a quarterback compared to other

positions, I mean, every position you can

Speaker:

01:00:40,966 --> 01:00:43,846

hurt your team, but no one can hurt a team

as much as a bad quarterback hurts their

Speaker:

01:00:43,846 --> 01:00:44,566

team.

Speaker:

01:00:44,566 --> 01:00:47,276

Just like a good quarterback can help

their team more.

Speaker:

01:00:47,276 --> 01:00:52,866

So that's sort of a rough

overview of my plus-minus modeling in

Speaker:

01:00:52,866 --> 01:00:53,614

football.

Speaker:

01:00:53,614 --> 01:00:59,784

I think, when I wrote the paper,

I had a version of that written in Stan.

Speaker:

01:00:59,784 --> 01:01:04,354

The data set itself was not public, but I

did have a version of the Stan model

Speaker:

01:01:04,354 --> 01:01:08,194

written and uploaded on my GitHub that you

can look at.

Speaker:

01:01:08,194 --> 01:01:10,194

It's pretty massive.

Speaker:

01:01:10,474 --> 01:01:15,694

In recent years, I've tried to expand it

and to do a state-space model type

Speaker:

01:01:15,694 --> 01:01:16,074

version.

Speaker:

01:01:16,074 --> 01:01:19,022

So I have effects for each player for each

season over time.
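A state-space version can be sketched by assuming that each player's true effect follows a Gaussian random walk from season to season, and that each season provides a noisy estimate of it. A Kalman filter then gives a season-by-season posterior. This is a one-player illustration with hypothetical variances, not the guest's Stan model, which would estimate all players and variances jointly.

```python
import numpy as np

def local_level_filter(obs, obs_var, drift_var, m0=0.0, v0=1.0):
    """Kalman filter for one player's effect following a Gaussian random walk.

    obs[t]: that season's noisy plus-minus estimate; obs_var[t]: its variance.
    drift_var: how much the true effect may move between seasons (assumed).
    (m0, v0): prior mean and variance for the first season.
    """
    m, v = m0, v0
    means, variances = [], []
    for t, (y, r) in enumerate(zip(obs, obs_var)):
        if t > 0:
            v = v + drift_var  # predict: the effect can drift since last season
        k = v / (v + r)        # Kalman gain: weight on this season's data
        m = m + k * (y - m)    # update the mean
        v = (1.0 - k) * v      # update the variance
        means.append(m)
        variances.append(v)
    return np.array(means), np.array(variances)

seasons = 10
means, variances = local_level_filter(
    obs=[2.0] * seasons, obs_var=[1.0] * seasons, drift_var=0.1
)
print(np.round(means, 2), np.round(variances, 3))
```

Because the effect is allowed to drift, the variance never collapses to zero. Old seasons are gradually down-weighted, which is the point of a time-varying model.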

Speaker:

01:01:19,022 --> 01:01:20,932

Yeah, that was exactly what I meant.

Speaker:

01:01:20,932 --> 01:01:23,822

Computationally, that gets a little bit

trickier.

Speaker:

01:01:23,822 --> 01:01:28,342

And my dataset, actually, I was able to

scrape some data for that.

Speaker:

01:01:28,342 --> 01:01:29,742

And then actually, I can't anymore.

Speaker:

01:01:29,742 --> 01:01:32,622

The NFL just stopped releasing that.

Speaker:

01:01:32,622 --> 01:01:34,492

So that work is on hold for now.

Speaker:

01:01:34,492 --> 01:01:38,842

But I probably need to find a graduate

student that can help me finish it.

Speaker:

01:01:40,162 --> 01:01:43,102

Yeah, definitely we should put that in the

show notes.

Speaker:

01:01:43,462 --> 01:01:44,642

That's super interesting.

Speaker:

01:01:44,642 --> 01:01:46,414

Your paper

Speaker:

01:01:46,414 --> 01:01:49,354

and the link to the GitHub repo.

Speaker:

01:01:49,674 --> 01:01:51,554

That's for sure.

Speaker:

01:01:52,014 --> 01:02:01,094

And that makes me think of a recent episode I

did, and also a recent interest of mine, I

Speaker:

01:02:01,094 --> 01:02:09,514

started contributing to that package

called BayesFlow, where that's precisely

Speaker:

01:02:09,794 --> 01:02:15,074

something that could be useful in your case here,

because your model structure doesn't

Speaker:

01:02:15,074 --> 01:02:15,950

change.

Speaker:

01:02:15,950 --> 01:02:19,430

If I understand correctly, because well,

once you have the model structure, it's

Speaker:

01:02:19,430 --> 01:02:20,770

kind of like a physics model.

Speaker:

01:02:20,770 --> 01:02:25,040

It's not going to change when you have new

data, but the data sets do change.

Speaker:

01:02:25,040 --> 01:02:27,230

So you have new data sets coming in.

Speaker:

01:02:27,230 --> 01:02:33,210

And so that's where probably using this

kind of inference, called amortized

Speaker:

01:02:33,210 --> 01:02:38,550

Bayesian inference could be extremely

useful because, basically, the

Speaker:

01:02:38,550 --> 01:02:42,320

computational bottleneck would

just happen once.

Speaker:

01:02:42,729 --> 01:02:46,192

That would be when you train the deep

neural network

Speaker:

01:02:46,222 --> 01:02:50,522

to learn the posterior structure and

parameters.

Speaker:

01:02:50,582 --> 01:02:57,342

So instead of MCMC, you're using the deep

neural network to learn the posterior.

Speaker:

01:02:57,342 --> 01:03:03,442

But then once you have trained the deep

neural network, then doing

Speaker:

01:03:03,442 --> 01:03:05,502

posterior inference is trivial.

Speaker:

01:03:06,182 --> 01:03:12,782

And so for that kind of model, where you

have a lot of data, but the model is the

Speaker:

01:03:12,782 --> 01:03:13,882

same.

Speaker:

01:03:14,318 --> 01:03:18,378

That's a very good use case for amortized

Bayesian inference.
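The amortization idea can be sketched with a toy conjugate model where the exact answer is known. You simulate many (parameter, dataset) pairs once, fit a mapping from a data summary to the parameter, and then reuse that mapping for every new dataset, with no MCMC. BayesFlow trains neural networks for this. Here, as a swapped-in simplification, a linear regression plays the role of the network; it happens to be exact for this Normal-Normal model. The prior, the noise level, and the sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
tau, sigma, n_obs = 1.0, 2.0, 10  # assumed prior sd, noise sd, observations per dataset
n_sims = 100_000

# 1) Simulate (parameter, dataset) pairs from the model: the one-off "training" cost.
theta = rng.normal(0.0, tau, n_sims)
data = rng.normal(theta[:, None], sigma, size=(n_sims, n_obs))
summary = data.mean(axis=1)  # summary statistic fed to the estimator

# 2) "Train" the amortized estimator: linear regression stands in for a neural network.
design = np.column_stack([np.ones(n_sims), summary])
coef, *_ = np.linalg.lstsq(design, theta, rcond=None)
post_var_hat = np.mean((theta - design @ coef) ** 2)

def amortized_posterior(new_data):
    """Posterior mean and variance for a new dataset: one dot product, no MCMC."""
    x = np.array([1.0, np.mean(new_data)])
    return float(x @ coef), float(post_var_hat)

# Exact conjugate answer, for comparison.
shrink = tau**2 / (tau**2 + sigma**2 / n_obs)
exact_post_var = shrink * sigma**2 / n_obs
print(amortized_posterior([1.0] * n_obs), (shrink * 1.0, exact_post_var))
```

Once the mapping is trained, each new season of data costs almost nothing to analyze, which is the appeal when the model structure stays fixed.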

Speaker:

01:03:18,598 --> 01:03:21,538

So that could be something very

interesting here.

Speaker:

01:03:21,958 --> 01:03:22,058

Yeah.

Speaker:

01:03:22,058 --> 01:03:22,798

Yeah.

Speaker:

01:03:22,798 --> 01:03:23,078

Yeah.

Speaker:

01:03:23,078 --> 01:03:23,178

Yeah.

Speaker:

01:03:23,178 --> 01:03:28,518

Happy to tell you more about that

afterwards if you're interested.

Speaker:

01:03:28,518 --> 01:03:32,498

But yeah, I've started digging into that,

and that's super fun for sure.

Speaker:

01:03:32,498 --> 01:03:36,118

So yeah, and I think this is a cool use

case.

Speaker:

01:03:36,958 --> 01:03:37,308

Awesome.

Speaker:

01:03:37,308 --> 01:03:41,646

Well, I still have a few questions, but

can I?

Speaker:

01:03:41,646 --> 01:03:45,986

We are getting short on time, so can I

keep you a bit longer?

Speaker:

01:03:46,146 --> 01:03:47,426

Yeah, just a few more minutes.

Speaker:

01:03:47,426 --> 01:03:47,566

Sure.

Speaker:

01:03:47,566 --> 01:03:48,406

Yeah.

Speaker:

01:03:48,406 --> 01:03:48,786

Okay.

Speaker:

01:03:48,786 --> 01:03:49,366

Awesome.

Speaker:

01:03:49,366 --> 01:03:49,986

Yeah.

Speaker:

01:03:49,986 --> 01:03:56,626

So actually, I'd like to pick your brain

now, talking a bit more about the

Speaker:

01:03:56,626 --> 01:03:57,666

future.

Speaker:

01:03:57,666 --> 01:03:59,066

I'm curious.

Speaker:

01:03:59,066 --> 01:04:01,846

So let me fuse two questions.

Speaker:

01:04:01,846 --> 01:04:09,606

So first, I'm curious what, where do you

see the field of sports analytics heading

Speaker:

01:04:09,606 --> 01:04:11,438

in the next

Speaker:

01:04:11,438 --> 01:04:13,038

five to 10 years.

Speaker:

01:04:13,278 --> 01:04:19,118

And also, a sub-question: are there other sports,

specific sports, where you see significant

Speaker:

01:04:19,118 --> 01:04:21,958

potential for growth in analytics.

Speaker:

01:04:23,918 --> 01:04:25,378

Yeah, those are, those are good questions.

Speaker:

01:04:25,378 --> 01:04:27,478

I think they go kind of hand in hand.

Speaker:

01:04:27,478 --> 01:04:33,718

You know, I think it's hard to say. If I

could predict the future, right?

Speaker:

01:04:33,718 --> 01:04:36,857

I would probably have a different job.

Speaker:

01:04:37,258 --> 01:04:39,258

I'd probably be retired.

Speaker:

01:04:39,258 --> 01:04:41,216

But.

Speaker:

01:04:41,486 --> 01:04:47,366

You know, I think a lot of the future is

going to be, you know,

Speaker:

01:04:47,366 --> 01:04:52,776

sports like soccer, American football,

and hockey catching up.

Speaker:

01:04:52,776 --> 01:04:56,846

And I think a lot of the growth is

actually going to be making sports

Speaker:

01:04:56,846 --> 01:05:00,866

analytics more digestible for just

everyday people.

Speaker:

01:05:00,866 --> 01:05:02,306

So the fans, right.

Speaker:

01:05:02,306 --> 01:05:03,686

And that's happened over time, right?

Speaker:

01:05:03,686 --> 01:05:07,266

If you watched a broadcast of a soccer

game

Speaker:

01:05:07,854 --> 01:05:10,144

20 years ago, no one talked about expected

goals.

Speaker:

01:05:10,144 --> 01:05:12,314

Now, most broadcasts will show it.

Speaker:

01:05:12,314 --> 01:05:13,634

They might not always talk about it.

Speaker:

01:05:13,634 --> 01:05:14,534

They'll show it.

Speaker:

01:05:14,534 --> 01:05:19,234

Like I said, expected goals, it's better

than just showing the score, but there's a

Speaker:

01:05:19,234 --> 01:05:22,094

lot left to be done.

Speaker:

01:05:22,094 --> 01:05:26,714

I think, to date, there's been

a lot of sports analytics that's very

Speaker:

01:05:26,714 --> 01:05:29,974

much focused on expected values.

Speaker:

01:05:29,974 --> 01:05:35,354

And not enough has been focused on

distributions and variance around

Speaker:

01:05:35,354 --> 01:05:36,654

estimates.

Speaker:

01:05:36,654 --> 01:05:41,334

And so I think that's one place it's going

to have to end up going,

Speaker:

01:05:41,344 --> 01:05:44,104

and part of the reason is, right, we, we

talk about neural networks.

Speaker:

01:05:44,104 --> 01:05:49,014

Neural networks are very good at expected

values, with really large data sets.

Speaker:

01:05:49,014 --> 01:05:50,284

It's a lot harder, right?

Speaker:

01:05:50,284 --> 01:05:54,754

Modeling variance is a lot harder in

anything than modeling expectations.

Speaker:

01:05:54,754 --> 01:05:57,544

So I think catching up on some of those

things.

Speaker:

01:05:57,544 --> 01:06:02,094

And I think also, like I said, taking a

step back and I think, you know, there's

Speaker:

01:06:02,094 --> 01:06:04,694

been a lot of good work that has been

done, but I think we're going to find a

Speaker:

01:06:04,694 --> 01:06:05,550

few things where,

Speaker:

01:06:05,550 --> 01:06:08,850

hey, maybe we were a little bit

overconfident, right?

Speaker:

01:06:08,850 --> 01:06:12,450

And with everything in sports, it's always

about game theory.

Speaker:

01:06:12,450 --> 01:06:18,310

So even if something is optimal today,

that strategy is not always going to be

Speaker:

01:06:18,310 --> 01:06:20,030

optimal in the future.

Speaker:

01:06:20,030 --> 01:06:25,570

And so, you know, in basketball,

for example, we talked about three-pointers.

Speaker:

01:06:25,570 --> 01:06:28,350

Of course, three-pointers are really good

right now because they have higher

Speaker:

01:06:28,350 --> 01:06:32,830

expected value, but you know, defensively

players are learning to play against three

Speaker:

01:06:32,830 --> 01:06:34,446

pointers better than they used to.

Speaker:

01:06:34,446 --> 01:06:38,606

Or in American football, the numbers have

said you should pass the ball more.

Speaker:

01:06:38,606 --> 01:06:42,385

Well, now the defenses are learning how to

defend it better.

Speaker:

01:06:42,385 --> 01:06:45,216

And so running is going to be more

important than it used to be.

Speaker:

01:06:45,216 --> 01:06:45,366

Right.

Speaker:

01:06:45,366 --> 01:06:47,446

And so these things are always going to

change.
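The expected-value argument for three-pointers is simple arithmetic: points per shot equal the shot value times the make probability. The percentages below are hypothetical. They are picked to show how a modest improvement in three-point defense can flip which shot is better, which is the game-theory point being made here.

```python
def points_per_shot(shot_value: int, make_prob: float) -> float:
    """Expected points from one shot attempt (ignores fouls and offensive rebounds)."""
    return shot_value * make_prob

two_pointer = points_per_shot(2, 0.50)           # 1.00 points per shot
three_now = points_per_shot(3, 0.36)             # about 1.08 points per shot
three_after_adapting = points_per_shot(3, 0.32)  # about 0.96 once defenses adjust

print(f"2PT: {two_pointer:.2f}, 3PT now: {three_now:.2f}, 3PT later: {three_after_adapting:.2f}")
```

A four-point drop in three-point percentage is enough to reverse the recommendation, so an "optimal" shot mix is only optimal relative to how opponents currently defend.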

Speaker:

01:06:47,446 --> 01:06:50,626

And so in five to 10 years, I don't know

exactly what it's going to be, but I think

Speaker:

01:06:50,626 --> 01:06:55,446

in some ways, you know, you might find

some analytics person in 10 years giving

Speaker:

01:06:55,446 --> 01:07:00,166

the exact opposite advice of what we're seeing

now, just because the game has evolved.

Speaker:

01:07:00,166 --> 01:07:01,606

The game has changed.

Speaker:

01:07:02,030 --> 01:07:04,250

And so now you should do something else,

right?

Speaker:

01:07:04,250 --> 01:07:05,570

To get an edge.

Speaker:

01:07:06,130 --> 01:07:09,450

And so I think the growth is twofold.

Speaker:

01:07:09,450 --> 01:07:12,310

One is always staying on the cutting edge

of like, what's next.

Speaker:

01:07:12,310 --> 01:07:15,870

Sometimes that's going back to where you

were.

Speaker:

01:07:16,000 --> 01:07:21,490

And two, like I said, making the numbers more

digestible for the everyday consumer.

Speaker:

01:07:21,850 --> 01:07:24,440

You know, it's one thing: you and I,

we can talk about models.

Speaker:

01:07:24,440 --> 01:07:26,280

I had to do this at ESPN all the time.

Speaker:

01:07:26,280 --> 01:07:29,230

I can't talk about prior distributions on

TV.

Speaker:

01:07:29,230 --> 01:07:29,560

Right?

Speaker:

01:07:29,560 --> 01:07:31,840

So how do we explain these things?

Speaker:

01:07:31,840 --> 01:07:31,990

Right?

Speaker:

01:07:31,990 --> 01:07:35,150

And I think what's really going to be key

is over time, this has happened already,

Speaker:

01:07:35,150 --> 01:07:38,810

but it's going to keep on happening that

the analysts themselves are going to be

Speaker:

01:07:38,810 --> 01:07:41,990

much more data literate than they have

been in the past.

Speaker:

01:07:41,990 --> 01:07:45,990

Not just because they have more people

working with them or they're younger.

Speaker:

01:07:45,990 --> 01:07:51,170

Also, the analysts in the future are going

to be able to use AI to do their own

Speaker:

01:07:51,170 --> 01:07:51,870

analysis.

Speaker:

01:07:51,870 --> 01:07:57,084

And that could be scary because they might

make some bad assumptions,

Speaker:

01:07:57,230 --> 01:08:01,130

but they're also going to be more data

savvy and they could load up a data set

Speaker:

01:08:01,130 --> 01:08:02,440

and use an AI tool,

Speaker:

01:08:02,440 --> 01:08:06,830

even if they can't code, to get

insights that, you know, I used to have to

Speaker:

01:08:06,830 --> 01:08:10,510

write some code to get, and now they

can just do it themselves.

Speaker:

01:08:10,510 --> 01:08:10,740

Right.

Speaker:

01:08:10,740 --> 01:08:14,150

And so that's, I think, somewhere else:

teams and coaches are going to be able to

Speaker:

01:08:14,150 --> 01:08:16,490

do more analysis on their own.

Speaker:

01:08:16,490 --> 01:08:19,180

And it's not that the data people

aren't needed.

Speaker:

01:08:19,180 --> 01:08:22,530

In fact, they're going to be needed even

more to make sure that the coach isn't

Speaker:

01:08:22,530 --> 01:08:24,280

missing an assumption, right,

Speaker:

01:08:24,280 --> 01:08:26,894

that he needs to be thinking about in the

structure of the data.

Speaker:

01:08:26,894 --> 01:08:28,094

'Cause he might just be like, great.

Speaker:

01:08:28,094 --> 01:08:29,184

Now I can run a regression.

Speaker:

01:08:29,184 --> 01:08:29,794

I don't even know.

Speaker:

01:08:29,794 --> 01:08:32,034

I don't even need to know how to code it.

Speaker:

01:08:32,034 --> 01:08:32,414

Right.

Speaker:

01:08:32,414 --> 01:08:33,054

That's great.

Speaker:

01:08:33,054 --> 01:08:35,534

But are you thinking about this?

Speaker:

01:08:35,534 --> 01:08:35,654

Right.

Speaker:

01:08:35,654 --> 01:08:38,934

And so there's going to be a lot of

education about using some of these tools

Speaker:

01:08:38,934 --> 01:08:42,784

better, but everyone's going to

have access to it.

Speaker:

01:08:42,784 --> 01:08:42,894

Right.

Speaker:

01:08:42,894 --> 01:08:45,854

It's going to be so much more accessible

in the future than it has been in the

Speaker:

01:08:45,854 --> 01:08:46,914

past.

Speaker:

01:08:46,914 --> 01:08:47,494

Yeah.

Speaker:

01:08:47,494 --> 01:08:48,174

Yeah.

Speaker:

01:08:48,174 --> 01:08:49,194

Yeah.

Speaker:

01:08:49,694 --> 01:08:50,194

Yeah, for sure.

Speaker:

01:08:50,194 --> 01:08:53,144

Completely, completely agree with that.

Speaker:

01:08:53,144 --> 01:08:54,954

And that's also something I'm very

passionate about.

Speaker:

01:08:54,954 --> 01:08:56,750

That's also what this show

Speaker:

01:08:56,750 --> 01:08:57,750

is here for, right?

Speaker:

01:08:57,750 --> 01:09:05,490

It's to have the bridge between the

modelers and the non-stats people be

Speaker:

01:09:05,490 --> 01:09:06,600

easier, in a way.

Speaker:

01:09:06,600 --> 01:09:09,370

And that's something I really love doing

also in my job, basically being that

Speaker:

01:09:09,370 --> 01:09:14,150

bridge between the really nitty gritty

details of the model.

Speaker:

01:09:14,150 --> 01:09:19,410

And then, OK, now that we have the model,

how do we explain to the people who are

Speaker:

01:09:19,410 --> 01:09:23,890

actually going to consume the model

results what the model can do, what it

Speaker:

01:09:23,890 --> 01:09:26,573

cannot do, and how we can

Speaker:

01:09:26,573 --> 01:09:30,293

make decisions based on that, that

hopefully are going to be better decisions

Speaker:

01:09:30,293 --> 01:09:31,793

than we used to make.

Speaker:

01:09:31,793 --> 01:09:33,583

And also, how do we update our decisions?

Speaker:

01:09:33,583 --> 01:09:37,453

Because, well, the game changes, as you

said so well.

Speaker:

01:09:37,453 --> 01:09:42,273

So yeah, for sure, all that stuff is

absolutely crucial.

Speaker:

01:09:42,273 --> 01:09:45,433

And I like using the metaphor of the

engine and the car, right?

Speaker:

01:09:45,433 --> 01:09:48,573

Building the model is like building the engine

of the car.

Speaker:

01:09:48,573 --> 01:09:52,133

So surely, you want the best engine

possible, but you also need a very cool

Speaker:

01:09:52,133 --> 01:09:55,393

car, because otherwise, nobody's going to

want your engine.

Speaker:

01:09:55,393 --> 01:09:55,950

And so...

Speaker:

01:09:55,950 --> 01:10:00,670

then building all the communication

around the model, the visualizations,

Speaker:

01:10:00,670 --> 01:10:05,050

things like that, is extremely important

because then in the end, as you were

Speaker:

01:10:05,050 --> 01:10:10,130

saying at the beginning of the show, if

the model isn't used, well, that's not a

Speaker:

01:10:10,130 --> 01:10:11,830

very good investment.

Speaker:

01:10:13,950 --> 01:10:14,550

Yeah.

Speaker:

01:10:14,550 --> 01:10:20,030

So I would literally have a

lot more questions; they're on my list,

Speaker:

01:10:20,030 --> 01:10:24,142

but we are going to call it a show

because I don't want to keep you

Speaker:

01:10:24,142 --> 01:10:28,062

for three hours; you've already been very

generous with your time.

Speaker:

01:10:28,142 --> 01:10:32,122

You can come back to the show anytime if

you want to, if you have a cool new

Speaker:

01:10:32,122 --> 01:10:34,302

project you want to talk about for sure.

Speaker:

01:10:34,622 --> 01:10:38,932

Yeah, maybe we can record the French

version of the podcast sometime, you know.

Speaker:

01:10:38,932 --> 01:10:39,842

Yeah, yeah.

Speaker:

01:10:39,842 --> 01:10:41,092

I'll definitely be down for that.

Speaker:

01:10:41,092 --> 01:10:44,922

You know, someone who will be very happy

is my mother.

Speaker:

01:10:44,922 --> 01:10:48,222

She's always asking me, so when are you

going to do the French version of your

Speaker:

01:10:48,222 --> 01:10:49,962

courses and your podcast and so on?

Speaker:

01:10:49,962 --> 01:10:52,722

I'm like, that's not going to happen, mom.

Speaker:

01:10:54,078 --> 01:10:58,978

Maybe that's what moms are for though.

Speaker:

01:10:59,138 --> 01:11:00,398

Exactly.

Speaker:

01:11:03,178 --> 01:11:07,438

Before letting you go, Paul, I'm going to

ask you the last two questions

Speaker:

01:11:07,438 --> 01:11:11,818

I ask every guest at the end of the show

because it's a Bayesian show, so what

Speaker:

01:11:11,818 --> 01:11:15,878

counts is not the individual point

estimate, but the distribution of the

Speaker:

01:11:15,878 --> 01:11:17,018

responses.

Speaker:

01:11:17,498 --> 01:11:21,838

First question, if you had unlimited time

and resources, which problem

Speaker:

01:11:21,838 --> 01:11:23,332

would you try to solve?

Speaker:

01:11:29,838 --> 01:11:30,058

Good.

Speaker:

01:11:30,058 --> 01:11:30,708

That's a good question.

Speaker:

01:11:30,708 --> 01:11:34,478

You sent me this ahead of time and I spent

a couple seconds and I was like, man, I

Speaker:

01:11:34,478 --> 01:11:35,438

don't know.

Speaker:

01:11:35,438 --> 01:11:37,848

But I, it's tough.

Speaker:

01:11:37,848 --> 01:11:39,458

There's so many questions in sports.

Speaker:

01:11:39,458 --> 01:11:40,218

Yeah.

Speaker:

01:11:40,218 --> 01:11:41,038

I know.

Speaker:

01:11:41,038 --> 01:11:48,658

I, I mean, my, one of my passions is

American football and I just keep going

Speaker:

01:11:48,658 --> 01:11:49,358

back.

Speaker:

01:11:49,358 --> 01:11:53,338

So I could tell, I love American football

and I love soccer, international football.

Speaker:

01:11:53,338 --> 01:11:53,738

Right.

Speaker:

01:11:53,738 --> 01:11:57,582

And in both of those games,

Speaker:

01:11:57,582 --> 01:12:02,082

there are certain positions that are just

really hard to understand how valuable

Speaker:

01:12:02,082 --> 01:12:03,162

they are.

Speaker:

01:12:03,162 --> 01:12:05,482

And so in soccer, it's like the midfield.

Speaker:

01:12:05,482 --> 01:12:09,102

We know you need a good midfielder,

but how do you measure that?

Speaker:

01:12:09,102 --> 01:12:10,582

That's a really hard problem.

Speaker:

01:12:10,582 --> 01:12:13,522

And in football, there's a lot of

positions in American football.

Speaker:

01:12:13,522 --> 01:12:14,852

There's a lot of positions like that as

well.

Speaker:

01:12:14,852 --> 01:12:16,352

So I'd probably go somewhere along those lines.

Speaker:

01:12:16,352 --> 01:12:20,942

Like I want to, I want to discover and

measure the value in these really hard to

Speaker:

01:12:20,942 --> 01:12:25,202

measure traits and values in these two

sports.

Speaker:

01:12:25,202 --> 01:12:26,902

Yeah.

Speaker:

01:12:27,182 --> 01:12:28,862

Yeah, I definitely understand.

Speaker:

01:12:29,582 --> 01:12:35,901

The battle for the middle is always extremely

important in soccer.

Speaker:

01:12:35,962 --> 01:12:42,102

And if you look at all the teams which win

the Champions League, so the Holy Grail,

Speaker:

01:12:42,102 --> 01:12:48,262

like the Super Bowl of the soccer world,

almost all the time they have an amazing

Speaker:

01:12:48,262 --> 01:12:54,302

and impressive pair or trio of

midfielders.

Speaker:

01:12:54,302 --> 01:12:56,222

And that's like a sine qua non.

Speaker:

01:12:56,222 --> 01:12:57,262

But...

Speaker:

01:12:57,262 --> 01:13:03,062

As you were saying, it's extremely hard to

come up with a metric that's going to not

Speaker:

01:13:03,062 --> 01:13:08,982

only explain why the midfielders are good,

but also help you consistently choose

Speaker:

01:13:08,982 --> 01:13:12,562

midfielders that will increase your

probability of winning the Champions

Speaker:

01:13:12,562 --> 01:13:13,062

League.

Speaker:

01:13:13,062 --> 01:13:17,702

And I'm saying that as a very frustrated

Paris fan, because it's been years since

Speaker:

01:13:17,702 --> 01:13:21,622

Thiago Motta basically retired, and we're

looking for a number six.

Speaker:

01:13:21,622 --> 01:13:25,562

So the player, the midfielder just in front of

the defense, and we're still looking for

Speaker:

01:13:25,562 --> 01:13:26,722

him.

Speaker:

01:13:26,842 --> 01:13:27,576

Yeah.

Speaker:

01:13:28,270 --> 01:13:31,970

So please, Paul, let me know when you're

done with that.

Speaker:

01:13:31,990 --> 01:13:32,510

Yeah.

Speaker:

01:13:32,510 --> 01:13:36,980

Well, unfortunately, there are several

really good French midfielders.

Speaker:

01:13:36,980 --> 01:13:38,930

They just don't play for PSG.

Speaker:

01:13:38,930 --> 01:13:40,790

I know.

Speaker:

01:13:40,790 --> 01:13:41,250

I know.

Speaker:

01:13:41,250 --> 01:13:43,330

Not a lot of French players stay in

France.

Speaker:

01:13:43,330 --> 01:13:46,430

That's why I'm telling you, we need a

European-wide league.

Speaker:

01:13:46,430 --> 01:13:51,670

Then many more players would stay in France and

play for PSG, I guess.

Speaker:

01:13:52,090 --> 01:13:56,750

And second question, if you could have

dinner with any great scientific mind,

Speaker:

01:13:56,750 --> 01:14:00,110

dead, alive or fictional, who would it be?

Speaker:

01:14:00,110 --> 01:14:01,030

Fictional?

Speaker:

01:14:01,030 --> 01:14:03,880

I haven't really thought about fictional

scientific minds.

Speaker:

01:14:03,880 --> 01:14:08,470

That is a good question.

Speaker:

01:14:10,650 --> 01:14:11,716

Geez.

Speaker:

01:14:17,038 --> 01:14:18,298

Man.

Speaker:

01:14:19,238 --> 01:14:23,418

Well, I mean, I thought you were going to

answer very fast.

Speaker:

01:14:23,418 --> 01:14:27,018

Actually, that one, I thought you were

going to answer Bill James like super

Speaker:

01:14:27,018 --> 01:14:27,598

fast.

Speaker:

01:14:27,598 --> 01:14:28,518

Bill James.

Speaker:

01:14:28,518 --> 01:14:28,838

Yeah.

Speaker:

01:14:28,838 --> 01:14:30,538

Well, I've met Bill James.

Speaker:

01:14:30,538 --> 01:14:31,038

So, okay.

Speaker:

01:14:31,038 --> 01:14:34,478

So I haven't had dinner with him, but I have met

him.

Speaker:

01:14:34,478 --> 01:14:40,558

I'll go a little, how liberal are you with

the word scientific mind here?

Speaker:

01:14:40,558 --> 01:14:41,118

Yeah.

Speaker:

01:14:41,118 --> 01:14:46,278

So I think scientific mind, I think

Galileo, I think Newton, I think Einstein,

Speaker:

01:14:46,278 --> 01:14:46,518

right?

Speaker:

01:14:46,518 --> 01:14:47,278

Like,

Speaker:

01:14:47,278 --> 01:14:54,368

You know, those are all... but I'm sure from

the sports world,

Speaker:

01:14:54,368 --> 01:15:00,458

there is a former football player whom

very few people have ever heard of, and his

Speaker:

01:15:00,458 --> 01:15:01,838

name is Virgil Carter.

Speaker:

01:15:01,838 --> 01:15:07,518

And the reason why I love him (he played

in the seventies) is that he wrote a paper

Speaker:

01:15:07,518 --> 01:15:11,878

about expected points in football while he

was playing in the NFL.

Speaker:

01:15:11,958 --> 01:15:15,566

And it was sort of the first sports

analytics work

Speaker:

01:15:15,566 --> 01:15:20,586

ever done in American football, and he was

a player in American football at the same

Speaker:

01:15:20,586 --> 01:15:21,326

time.

Speaker:

01:15:21,326 --> 01:15:22,956

So, not very well known.

Speaker:

01:15:22,956 --> 01:15:23,826

He's still alive.

Speaker:

01:15:23,826 --> 01:15:26,846

I don't know him at all, but he would be a

really cool person.

Speaker:

01:15:26,846 --> 01:15:31,846

If I go, like, classical scientific

minds, I would

Speaker:

01:15:31,846 --> 01:15:38,086

probably, maybe, pick Gauss. Like, hey, this

distribution that has your name is, like,

Speaker:

01:15:38,086 --> 01:15:41,006

used everywhere and it's very useful.

Speaker:

01:15:41,006 --> 01:15:43,246

So I would probably stick with him.

Speaker:

01:15:43,466 --> 01:15:45,650

Normal distributions.

Speaker:

01:15:45,742 --> 01:15:48,682

Gaussian distributions, like, rule the

world nowadays.

Speaker:

01:15:48,682 --> 01:15:52,402

So I'd probably stick with that if I were

to go traditional scientific mind.

Speaker:

01:15:52,402 --> 01:15:52,582

Yeah.

Speaker:

01:15:52,582 --> 01:15:53,242

Yeah.

Speaker:

01:15:53,242 --> 01:15:54,212

No, good choices.

Speaker:

01:15:54,212 --> 01:15:55,322

Good choices.

Speaker:

01:15:55,322 --> 01:15:58,072

I am amazed by that Virgil Carter

story.

Speaker:

01:15:58,072 --> 01:15:59,642

That's so amazing.

Speaker:

01:15:59,642 --> 01:16:00,042

Yeah.

Speaker:

01:16:00,042 --> 01:16:05,682

So if anybody knows Virgil Carter, please

contact us and we'll try to get that

Speaker:

01:16:05,682 --> 01:16:06,642

dinner for Paul.

Speaker:

01:16:06,642 --> 01:16:11,802

If you do that, I'll definitely be here to

grab the dinner and have a conversation

Speaker:

01:16:11,802 --> 01:16:15,062

with Virgil, because having someone

like that on the show would be absolutely

Speaker:

01:16:15,062 --> 01:16:15,694

amazing.

Speaker:

01:16:15,694 --> 01:16:16,774

I love that story.

Speaker:

01:16:16,774 --> 01:16:17,874

That's so amazing.

Speaker:

01:16:17,874 --> 01:16:21,234

It's like, you know, the myth of the

philosopher king.

Speaker:

01:16:21,234 --> 01:16:24,034

Well, here is like the myth of the

scientist player.

Speaker:

01:16:24,034 --> 01:16:26,253

It's just like, I love that.

Speaker:

01:16:26,474 --> 01:16:27,394

Yeah.

Speaker:

01:16:27,394 --> 01:16:28,234

That's fantastic.

Speaker:

01:16:28,234 --> 01:16:28,974

Damn.

Speaker:

01:16:28,974 --> 01:16:30,514

Thanks a lot, Paul.

Speaker:

01:16:30,614 --> 01:16:32,014

Let's call it a show.

Speaker:

01:16:32,014 --> 01:16:33,294

Thanks for having me.

Speaker:

01:16:33,294 --> 01:16:35,694

Yeah, that was amazing.

Speaker:

01:16:35,874 --> 01:16:41,034

As usual, we'll put resources and a link

to your website in the show notes for

Speaker:

01:16:41,034 --> 01:16:42,494

those who want to dig deeper.

Speaker:

01:16:42,494 --> 01:16:45,678

Thanks again, Paul, for taking the time

and being on this show.

Speaker:

01:16:46,126 --> 01:16:48,614

Thanks once again. I really enjoyed it.

Speaker:

01:16:52,558 --> 01:16:56,258

This has been another episode of Learning

Bayesian Statistics.

Speaker:

01:16:56,258 --> 01:17:01,218

Be sure to rate, review, and follow the

show on your favorite podcatcher, and

Speaker:

01:17:01,218 --> 01:17:06,138

visit learnbayesstats.com for more

resources about today's topics, as well as

Speaker:

01:17:06,138 --> 01:17:10,858

access to more episodes to help you reach

true Bayesian state of mind.

Speaker:

01:17:10,858 --> 01:17:12,798

That's learnbayesstats.com.

Speaker:

01:17:12,798 --> 01:17:17,618

Our theme music is Good Bayesian by Baba

Brinkman, feat. MC Lars and Mega Ran.

Speaker:

01:17:17,618 --> 01:17:20,798

Check out his awesome work at

bababrinkman.com.

Speaker:

01:17:20,798 --> 01:17:21,966

I'm your host.

Speaker:

01:17:21,966 --> 01:17:22,966

Alex Andorra.

Speaker:

01:17:22,966 --> 01:17:27,226

You can follow me on Twitter at

alex_andorra, like the country.

Speaker:

01:17:27,226 --> 01:17:32,286

You can support the show and unlock

exclusive benefits by visiting

Speaker:

01:17:32,286 --> 01:17:34,466

patreon.com/learnbayesstats.

Speaker:

01:17:34,466 --> 01:17:36,886

Thank you so much for listening and for

your support.

Speaker:

01:17:36,886 --> 01:17:42,806

You're truly a good Bayesian. Change your

predictions after taking information in, and

Speaker:

01:17:42,806 --> 01:17:45,946

if you're thinking I'll be less than

amazing,

Speaker:

01:17:45,946 --> 01:17:49,262

Let's adjust those expectations.

Speaker:

01:17:49,262 --> 01:17:54,662

Let me show you how to be a good Bayesian.

Change calculations after taking fresh

Speaker:

01:17:54,662 --> 01:18:00,702

data in. Those predictions that your brain

is making, let's get them on a solid

Speaker:

01:18:00,702 --> 01:18:02,502

foundation.
