Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag 😉
Takeaways
- Convincing non-stats stakeholders in sports analytics can be challenging, but building trust and confirming their prior beliefs can help in gaining acceptance.
- Combining subjective beliefs with objective data in Bayesian analysis leads to more accurate forecasts.
- The availability of massive data sets has revolutionized sports analytics, allowing for more complex and accurate models.
- Sports analytics models should consider factors like rest, travel, and altitude to capture the full picture of team performance.
- The impact of budget on team performance in American sports and the use of plus-minus models in basketball and American football are important considerations in sports analytics.
- The future of sports analytics lies in making analysis more accessible and digestible for everyday fans.
- There is a need for more focus on estimating distributions and variance around estimates in sports analytics.
- AI tools can empower analysts to do their own analysis and make better decisions, but it’s important to ensure they understand the assumptions and structure of the data.
- Measuring the value of certain positions, such as midfielders in soccer, is a challenging problem in sports analytics.
- Game theory plays a significant role in sports strategies, and optimal strategies can change over time as the game evolves.
Chapters
00:00 Introduction and Overview
09:27 The Power of Bayesian Analysis in Sports Modeling
16:28 The Revolution of Massive Data Sets in Sports Analytics
31:03 The Impact of Budget in Sports Analytics
39:35 Introduction to Sports Analytics
52:22 Plus-Minus Models in American Football
01:04:11 The Future of Sports Analytics
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.
Links from the show:
- LBS Sports Analytics playlist: https://www.youtube.com/playlist?list=PL7RjIaSLWh5kDiPVMUSyhvFaXL3NoXOe4
- Paul’s website: https://sabinanalytics.com/
- Paul on GitHub: https://github.com/sabinanalytics
- Paul on Linkedin: https://www.linkedin.com/in/rpaulsabin/
- Paul on Twitter: https://twitter.com/SabinAnalytics
- Paul on Google Scholar: https://scholar.google.com/citations?user=wAezxZ4AAAAJ&hl=en
- Soccer Power Ratings & Projections: https://sabinanalytics.com/ratings/soccer/
- Estimating player value in American football using plus–minus models: https://www.degruyter.com/document/doi/10.1515/jqas-2020-0033/html
- World Football R Package: https://github.com/JaseZiv/worldfootballR
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
Folks, you may know it by now, I am a huge
sports fan.
2
:So needless to say that this episode was
like being in a candy store for me.
3
:Well, more appropriately, in a chocolate
store.
4
:Paul Sabin is so knowledgeable that this
conversation was an absolute blast for me.
5
:In it, Paul discusses his experience with
non-stats stakeholders in sports
6
:analytics and the challenges of convincing
them to adopt evidence-based decisions.
7
:He also explains his soccer power ratings
and projections model, which uses a
8
:Bayesian approach and expected goals, as
well as the importance of understanding
9
:player value in difficult-to-measure
positions and the need for more accessible
10
:and digestible sports analytics for fans.
11
:We also touch on the impact of budget on
team performance in American sports and
12
:the use of plus-minus models in
basketball and American football.
13
:Paul is a senior fellow at the Wharton
Sports Analytics and Business Initiative
14
:and a lecturer
15
:in the Department of Statistics and Data
Science at the Wharton School of the
16
:University of Pennsylvania.
17
:He has spent his entire career as a sports
analytics professional, teaching and
18
:leading sports analytics research
projects.
19
:This is Learning Bayesian Statistics, [...]
20
:Welcome to Learning Bayesian Statistics, a
podcast about Bayesian inference, the
21
:methods, the projects, and the people who
make it possible.
22
:I'm your host, Alex Andorra.
23
:You can follow me on Twitter at Alex
underscore Andorra, like the country, for
24
:any info about the show.
25
:LearnBayesStats.com is Laplace to be.
26
:Show notes,
27
:becoming a corporate sponsor, unlocking
Bayesian merch, supporting the show on
28
:Patreon, everything is in there.
29
:That's LearnBayesStats.com.
30
:If you're interested in one-on-one
mentorship, online courses, or statistical
31
:consulting, feel free to reach out and
book a call at topmate.io slash alex
32
:underscore Andorra.
33
:See you around, folks, and best Bayesian
wishes to you all.
34
:Welcome to Learning Bayesian Statistics.
35
:a full conversation in French as we just
had before recording.
36
:Well done.
37
:It used to be though.
38
:Go back two to three hundred years.
39
:Maybe you just don't go to Africa enough.
40
:That's where French is spoken a lot now
too.
41
:Exactly.
42
:But other than that, you can see French
used to be a very international language
43
:because in my travels, almost all the time
people tell me, yeah, I studied French in
44
:high school.
45
:And the only thing they can say is just a
few words.
46
:Which is normal, like if you don't use it,
right?
47
:But yeah, you can see that because French
is still, or was still taught in high
48
:school and now less and less.
49
:So yeah, so well done Paul for that.
50
:I know, I don't think French is an easy
language to learn.
51
:What has been your experience?
52
:I'm actually very curious.
53
:You know, it's hard to say, so this is a
statistics pod or data science podcast.
54
:So I guess I can't really, I can't really
compare it to anything else.
55
:That's the only other language I've
learned besides my native English.
56
:So, you know, I guess, you know, a
sample size of one for me. I took it in high
57
:school as well.
58
:I hated it.
59
:I had, so, you know, coming from America,
you know, so the reason I chose, you know,
60
:seventh grade is when I had to choose
whether I was taking French or Spanish.
61
:And I'm the youngest of four kids in my
family growing up.
62
:And my older siblings told me that the
Spanish teacher was really mean.
63
:And that's originally why I took
French.
64
:and then I took it for the required two to
three years.
65
:And then I was done.
66
:I had in high school, I had this teacher
from Belgium and I still remember her
67
:name, Madame Vendon Plus, and I couldn't
stand her, but come, come to find out
68
:looking back in life that she was actually
a really nice person.
69
:She was just Belgian.
70
:And the cultural, you know, like Americans
think they're the best and the French
71
:language in Europe people also think
they're the best because they ruled the
72
:and [...] like they've ruled the world for the last
73
:100 years.
74
:And so when you get into a room together
and you think both of your cultures are
75
:superior, you know, that doesn't go well
together.
76
:But actually, so after that, I didn't...
77
:speak French at all.
78
:And then I did church service for my
church for two years and I lived in
79
:Montreal, I lived in Quebec, not actually
in the city, I lived in a lot of rural
80
:small towns.
81
:And so I studied French really hard.
82
:I had to learn the very strong Quebecois
accent.
83
:And then when I went back to school, it's
when I like really honed in my French.
84
:I was very conversational, could speak
very fluently in Quebec, but then, you
85
:know, I had to learn the grammar a little
bit more.
86
:in depth.
87
:So then I studied French at
university as well.
88
:So, you know, immersing yourself and
actually, like, learning languages, because
89
:when I learned it in school, it
never made sense to me.
90
:But when I studied it on my own and I
studied conjugation and all these things,
91
:it became kind of like a math problem.
92
:And so when I would speak a sentence in my
head, I'd always be like, I need a
93
:subject.
94
:I need to conjugate the verb.
95
:And then I need to say like what I'm, you
know, just
96
:do an adverb or an adjective after it.
97
:And like it made sense in my head, but
that's not how I was taught in school.
98
:I was taught, I had to memorize all these
words, like everything in the kitchen.
99
:How do you say dishwasher?
100
:How do you say refrigerator?
101
:How do you say fork?
102
:How do you say spoon?
103
:I couldn't learn like that, but, like,
living and thinking about French as a
104
:math equation, it made sense in my head
and I was able to pick it up.
105
:You know, sure.
106
:I made tons of mistakes and embarrassed
myself, but it wasn't too bad.
107
:And that's how you learn.
108
:Yeah.
109
:So I'm guessing.
110
:Like from that answer, I'm guessing people
already know why I invited you on the
111
:podcast.
112
:Very nerdy answer about languages,
that's perfect.
113
:Thanks a lot.
114
:And yeah, I completely relate actually.
115
:I learned English and German in high
school and yeah, kind of the same.
116
:I always hated formal language learning.
117
:And like in the end I learned these
languages and Spanish that was the same
118
:and Italian that was the same, just going
to the country basically.
119
:And yeah, as you were saying, I think also
what it adds is you've got skin in the
120
:game.
121
:You're in the country, you're having a
conversation with someone.
122
:If you're not able to talk, you look
extremely stupid.
123
:So it's a very good incentive for the
brain to step up and learn.
124
:And that's really awesome.
125
:And then when you are in the situation
that you...
126
:don't know what to say, you remember that.
127
:And then when you learn, this is what I
should have said, it sticks with you
128
:because it has an emotional attachment to
it.
129
:Yeah.
130
:Yeah.
131
:No, exactly.
132
:And I mean, and that's going to be a good
segue to my first question to you, but I
133
:think it's also one of the situations in
life where you can really feel and see
134
:your brain learning.
135
:So that's why I also really love learning
new languages and going to countries to do
136
:that because.
137
:Like you arrive in the country, you don't
know how to say anything.
138
:And in just a few weeks, your brain starts
picking up stuff and you can really,
139
:really feel your brain doing its amazing
work that it's been like conditioned to do
140
:from years of evolution.
141
:And to me, that's just absolutely
incredible that the brain is able to do
142
:that.
143
:Even when you're like in your thirties and
beyond, you can do that.
144
:And it's just, I found that absolutely
incredible.
145
:And that's kind of like a Bayesian
146
:neural network, you know, so I mean, see
that segue, I should definitely have a
147
:podcast.
148
:So actually, talking about Bayes.
149
:Yeah, I invited you on the podcast because
you do absolutely awesome work on sports
150
:modeling.
151
:And people know that I'm a big fan of a
lot of sports.
152
:I love modeling sports and so on.
153
:So I'm super happy to have you here.
154
:And I have a list of questions that is
embarrassingly long.
155
:But maybe can you tell us if you are
actually yourself using some Bayesian
156
:methods, if you're familiar with those or
not?
157
:And yeah, in general, what does that look
like in your work?
158
:Yeah.
159
:So yeah, I mean, just a quick background
about myself, right?
160
:I've worked in sports, what we call sports
analytics for almost 10 years now.
161
:Actually, I was getting my PhD
162
:in statistics, and there was
this job opportunity at ESPN, you know,
163
:which is a sports broadcasting television
channel in the U.S. and a few other
164
:countries.
165
:And, you know, I got the job offer to work
on their sports analytics team where
166
:essentially what the team there does is
make forecasts so that, you know, they can
167
:show on TV, you know, on the bottom line,
like who's expected to win, or they can,
168
:we would run simulations on
169
:you know, who's likely to win the
championship, you know, all throughout the
170
:season.
171
:And so, you know, you can tell stories
with that saying, you know, the team was
172
:just like the beginning of the season.
173
:No one thought they were going to be any
good, but just look how it, you know, they
174
:got better or the opposite.
175
:Like they were supposed to be really good
and everything just went wrong.
176
:And so in my field in sports modeling, I
would think actually you can't, you can't
177
:do it without being Bayesian.
178
:And so when I would interview people, I'd
always focus on, on those.
179
:So as people coming out of school,
sometimes they don't always learn Bayesian
180
:methods very well.
181
:And the reason is in sports, sample sizes
are very small and you have to make
182
:forecasts with very limited data.
183
:And the great thing about Bayesian
statistics is that you actually have more
184
:data.
185
:You just haven't observed it.
186
:You have expertise or you have opinions,
but those opinions actually matter.
187
:And so maybe we'll get into this, but I'm
actually a very strong advocate because of
188
:my field, of subjective Bayesian
analysis.
189
:It's okay to insert some information into
your models and it usually makes them
190
:better.
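Paul's point that expert opinion behaves like data you haven't observed yet can be sketched with a conjugate Beta-Binomial update, where the prior's two parameters act as pseudo-makes and pseudo-misses. This is a minimal illustration under stated assumptions, not Paul's actual model: the shooter, the 36% prior belief, and the prior strengths (worth 50 and 5 pseudo-attempts) are hypothetical numbers.

```python
def beta_binomial_posterior_mean(prior_alpha, prior_beta, successes, attempts):
    """Posterior mean of a success rate under a Beta(prior_alpha, prior_beta) prior.

    prior_alpha and prior_beta behave like previously "seen" makes and misses,
    so their sum is the prior's weight measured in pseudo-attempts.
    """
    post_alpha = prior_alpha + successes
    post_beta = prior_beta + (attempts - successes)
    return post_alpha / (post_alpha + post_beta)


# Hypothetical small sample: 7 makes in the first 10 three-point attempts.
makes, attempts = 7, 10
print(f"raw rate:     {makes / attempts:.3f}")  # 0.700

# Hypothetical scouting prior: roughly a 36% shooter, worth 50 attempts.
strong = beta_binomial_posterior_mean(18, 32, makes, attempts)
# The same 36% belief held loosely, worth only 5 attempts.
weak = beta_binomial_posterior_mean(1.8, 3.2, makes, attempts)
print(f"strong prior: {strong:.3f}")  # 0.417
print(f"weak prior:   {weak:.3f}")    # 0.587
```

With only 10 attempts the raw 70% is mostly noise; a confident prior keeps the estimate near the scouts' view, and as attempts accumulate the observed data dominate under either prior.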
191
:Yeah.
192
:Well, awesome.
193
:Couldn't have dreamt of a better answer, and I have to
be fully transparent.
194
:I didn't know Paul was going to answer
that because that's not really, I haven't
195
:seen that in your, you know, on your
website or elsewhere.
196
:So before, while preparing the episode, I
didn't know if you were already using
197
:Bayesian methods or not.
198
:But definitely, definitely happy to hear
that.
199
:And so that people know that was not a
conspiracy.
200
:I didn't know anything that Paul was going
to say.
201
:OK, so that's awesome.
202
:So I'm an open source developer, so I'm
always very curious about the stack you're
203
:using.
204
:What are you using actually when you're
doing Bayesian analysis of a sports model?
205
:So in my career, I almost always use R and
Stan.
206
:So if I'm doing Bayes analysis, I write a
lot of Stan code.
207
:It's gotten easier with ChatGPT.
208
:It doesn't do it all the way, right?
209
:But if it's like, hey, I want to build
this kind of model, it'll at least give me
210
:a good framework.
211
:And then I can adjust it and edit it as I
want from there.
212
:Yeah.
213
:Yeah.
214
:And I mean, for sure, you cannot go wrong
215
:with R and Stan.
216
:So yeah, definitely.
217
:And one of the creators of
Stan, Andrew Gelman, was back on the
218
:podcast a few weeks ago.
219
:It was not released yet, but through time
travel, it's gonna have been released when
220
:your episode is out.
221
:So folks, you can go back to... Right,
because I am definitely a lesser draw than
222
:Andrew Gelman is, but that's great.
223
:No, yeah, so if people are curious about
what Andrew has been up to lately, it's
224
:the third time he's been on the show and
he just released a new book, Active
225
:Statistics, that I definitely recommend.
226
:It's really fun to read.
227
:It's like, it's how to teach statistics
with stories, which actually relates to
228
:something you just said, Paul, about the,
like, cool and fun way to relate
229
:statistics to...
230
:non-stats people, which was to be able to tell
stories about a team's probability of
231
:winning or any forecast like that.
232
:So that's definitely interesting to hear
you talk about that.
233
:And actually I'm curious because I've been
following that field of sports analytics
234
:for a few years and I've seen it
personally mature
235
:quite a lot and evolved quite a lot when
it comes to the technology and the data
236
:availability.
237
:So I'm curious what an expert like you
thinks about that evolution of technology
238
:and data availability and how that changed
the landscape of sports analytics.
239
:Yeah, I mean, it's exploded in the last 10
to 15 years.
240
:So I mean, if people are familiar with the
book slash movie Moneyball, which is
241
:20, about 20 years, the book is about 20
years old now.
242
:The movie is about 12, 13 years old now.
243
:you know, back then in baseball, baseball
was the sport that sort of took off in
244
:sports analytics.
245
:I mean, for a couple of reasons.
246
:One, the game is very discrete.
247
:So there are start and stopping points.
248
:So you can measure.
249
:Right.
250
:Discrete events very well in baseball, but
two, like they're the only sport that
251
:actually had a really long running data
set.
252
:And that went back and they've been
keeping statistics in baseball and you can
253
:actually go back to the [...], when people were playing baseball in [...].
254
:No other sport has that.
255
:So that's, that's probably the reason why
baseball took off.
256
:but since then, you know, every sport for
a while after that, every sport had what
257
:we call play-by-play data, which is like,
this is what happens.
258
:Soccer had a, a version that was called
event data.
259
:So people would
260
:watch a game and every time someone
touched the ball or made a pass, they
261
:would mark, the ball was touched here on
the field and it was passed to there or
262
:they dribbled from here to there.
263
:So it was, they kind of were discretizing
soccer in a way to make it a similar
264
:format.
265
:But then about 10 years ago, we started
getting this player tracking data, which
266
:is the location of everybody and the ball
or the puck on the field, you know,
267
:depending on the sport, 10 to 25 times per
second.
268
:And that's drastically changed
269
:the methodologies and things that are
used.
270
:So, I mean, Bayesian analysis was great
for this play-by-play data or even, you
271
:know, game by game data and measuring how,
how players or teams performed.
272
:And then now we've started getting such
huge data sets that, you know, more of the
273
:computer science world, neural networks,
things like that started becoming much
274
:more prevalent in sports analysis just
because the data sets were so massive.
275
:Not that statistics doesn't play a role.
276
:It still does.
277
:And I think
278
:People sometimes overly rely on these
black box methods.
279
:They don't think about the implications or
the biases in the data, which are still
280
:important.
281
:But we have these huge amounts of data now
and it's just exploded to like, you know,
282
:if you want all the data in a season in
the NFL, it's like over one terabyte of
283
:locations of everybody on the field, on
every play, 25 times a second.
284
:It's just massive.
285
:Right.
286
:So it's, it's really changed the way
people have done things.
287
:Right.
288
:And we started going from really simple
questions to huge big questions.
289
:And the funny thing is now, I actually
think with the data being so large, people
290
:are now actually going back to answering
more simple questions.
291
:Like we're not trying to measure
everything all at once.
292
:Let's try to measure very specific things
that we weren't able to measure before.
293
:Hmm.
294
:Yeah, that is definitely interesting.
295
:And so, first:
296
:Is that availability of data, massive
availability of data, the case in all the
297
:sports industry?
298
:Or is it more, well, the most historical
ones, as you were saying, maybe more
299
:baseball.
300
:I know the data sets are more massive there
and maybe other sports like soccer are
301
:less prevalent, the data sets are less
prevalent, less massive, or is that a
302
:uniform trend?
303
:First question.
304
:And then second question is,
305
:Where does that data live?
306
:Is that mostly open source or is that
still quite closed-source data?
307
:Yeah.
308
:So I mean, baseball is usually like the
cutting edge of everything because they
309
:had a head start.
310
:And basketball and then like kind of
American football, international soccer
311
:football and hockey kind of trail behind.
312
:But the data sets now in all those sports
are very massive.
313
:Hockey just got
314
:The NHL just got their player puck
tracking data just a couple of years ago.
315
:Now baseball and basketball have moved on
beyond just knowing where players are on
316
:the field.
317
:They actually have data of what's called
pose data.
318
:So they know where different joints and
their arms and the legs are of every
319
:player on the field or on the court.
320
:So that data is massive.
321
:It's massive everywhere.
322
:There's companies that are trying to
collect new data based on
323
:video, so they're using computer vision
algorithms to do that, but largely to
324
:answer your second question.
325
:This is not open source data.
326
:So the old-school data, the play-by-play
data is open source.
327
:You can find that on every sport pretty
much via an open source mechanism now.
328
:But this huge, these huge data sets of the
tracking of the players, you know, 10 to
329
:25 times per second.
330
:It's usually all closed source.
331
:There are a few
332
:releases of that here and there, you know,
the NFL does a competition where they
333
:release some of that data each year, like
a very small set.
334
:and a few other leagues have done
something similar as well.
335
:So that kind of gives
you a taste.
336
:if you have money, there are companies
that try to create that data themselves
337
:and they'll sell it to you.
338
:But you know, that's usually pretty
expensive for an individual person to buy.
339
:So again, just that.
340
:I see.
341
:Okay.
342
:Yeah, interesting.
343
:Definitely.
344
:Because like data is kind of oil in our
industry, right?
345
:So it's definitely interesting to know
what's the state of the supply of oil in a
346
:way.
347
:Maybe for people who are less versed in
sports modeling, can you give us an
348
:example of how analytical insights have
349
:directly influenced team strategy or
player selection in one of your consulting
350
:roles.
351
:Yeah.
352
:So I mean, I'll just kind of talk broadly
at first.
353
:I mean, so sometimes it's just the most
basic things, right?
354
:So like in basketball, people shoot
three-pointers more because all they did is
355
:figure out that the expected value was larger
for a three-point shot than it was for most
356
:two-point shots.
357
:Not, not those layups and the dunks,
right?
358
:Those are very high percentages.
359
:So the expected value of a, of a high
percentage times two is, you know, is, is
360
:pretty good.
361
:But then, even if
362
:the percentage drops off a lot, when you
multiply it by three to get the expected
363
:value of a three-point shot.
364
:You know, it's also pretty good.
365
:So that means basketball has changed
drastically because of that.
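The three-point arithmetic Paul describes is an expected-value comparison: make probability times the points the shot is worth. The sketch below uses hypothetical round-number percentages rather than real league data, and adds a hypothetical `break_even_three_pct` helper that solves 3 × p3 = 2 × p2 for the three-point rate that matches a given two.

```python
def expected_points(make_prob: float, point_value: int) -> float:
    """Expected points per attempt: P(make) * points awarded."""
    return make_prob * point_value


def break_even_three_pct(two_pct: float) -> float:
    """Three-point make rate that matches a two-point shot's expected value."""
    # Solve 3 * p3 = 2 * p2 for p3.
    return 2 * two_pct / 3


# Hypothetical make probabilities, for illustration only.
shots = {
    "layup/dunk": (0.60, 2),
    "mid-range two": (0.40, 2),
    "three-pointer": (0.35, 3),
}

for name, (p, pts) in shots.items():
    print(f"{name:>14}: {expected_points(p, pts):.2f} points per attempt")

# Under these assumptions, a 40% mid-range two is worth only as much as a ~26.7% three.
print(f"break-even three: {break_even_three_pct(0.40):.3f}")
```

With these assumed numbers the close-range shot still comes out on top (1.20 points), but a 35% three (1.05) clearly beats a 40% mid-range two (0.80), which is the trade-off that pushed teams toward threes.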
366
:and in my roles, I guess, you know, I
think in a lot of sports, there's just
367
:been a lot of open questions.
368
:People kind of move one way.
369
:And then I think actually, I think the
sports analysis does a really good job of
370
:tackling very easy problems first.
371
:But then I think there's actually a
tendency for the analysts themselves to be
372
:overconfident in their analysis and
they're not factoring in all of the
373
:sources of variation that might be there.
374
:And something I'm also very curious about
is: what's your experience with
375
:non-stats stakeholders?
376
:So coaches, scouts, players, how do they
typically respond to the analytics and the
377
:insights you provide, and are there
378
:differences in reception across sports,
maybe across roles.
379
:Yeah.
380
:So, I mean, it really does vary as in all
things, there's variance.
381
:There are some typically younger, you
know, coaches or scouts that are a little
382
:bit more receptive than people who have
been doing something for a long time.
383
:And I think that's just human nature.
384
:You're used to doing things a certain way.
385
:You don't like.
386
:You know, to stereotype, you don't like
some young person coming and telling you
387
:how to do your job.
388
:Right.
389
:So you have to be really careful about
that.
390
:and the, and the funny thing is, you know,
everything that I have learned or, you
391
:know, I believe in, in terms of making
data-driven decisions and don't
392
:overestimate based on small sample sizes
goes out the window when I'm trying to
393
:convince a stakeholder of something.
394
:So for example,
395
:If I have a model and I want them to use
it, and I think it's going to help them.
396
:Of course, I've done the analysis to say,
you know, what over the long run, how it
397
:would improve our efficiency, or if we
make a decision in this way, it'd be
398
:better process, et cetera.
399
:I've done that analysis and I've done it
over a larger sample size.
400
:But when I, when I tell them what they
want to know is they want confirmation
401
:bias, right?
402
:They love confirming their beliefs.
403
:So in order to get them to agree with
what you're saying, this works so much
404
:better than saying, you know, out of
the thousand players that I did this in,
405
:you know, you only were correct 60% of
the time, but my model would have been
406
:correct 70%.
407
:Like they don't want to hear that.
408
:I essentially say, well, my model, you
know, you love this player.
409
:So does my model.
410
:I find the one guy, even if it's literally
only one person, they're like, yeah.
411
:Like, if your model can.
412
:If your model can see that, then it must
be doing something right.
413
:And then it's like, then they start to
trust you a little bit.
414
:And over time you give them little pieces,
little crumbs of a cookie that they can
415
:help, you know, get confidence in.
416
:And then, you know, then is when you share
with them, okay, well, but it's also
417
:suggesting this, which is different than
what you've been doing in the past.
418
:Right?
419
:So you don't ever start with, you know,
trust me.
420
:because you might be wrong, because you're
a human.
421
:I mean, like, you know, humans always make
mistakes, but we usually don't think we
422
:make as many mistakes as we do.
423
:And so I found just over time is if you
get people to trust you by confirming
424
:their prior held beliefs, right?
425
:It's another Bayesian concept.
426
:If you can confirm their prior beliefs,
they're going to accept your future
427
:recommendations or future things that the
model might suggest more than if you start
428
:with
429
:the differences upfront.
430
:And so that's like a little bit of human
bias, right?
431
:That you have just learned over time.
432
:And some things are just really hard for
people to accept, but over time, if you
433
:get people to trust you and you build that
relationship, there's a lot of human
434
:elements here and then they trust your
work by confirming their prior held
435
:beliefs, then they'll trust you and open
up a little bit more to being a little bit
436
more open-minded about other things as
well.
437
:Because then like, okay, well, I know
you're not an idiot.
438
:Like you could speak my language some.
439
:now I might be more open to learning a
little bit of your language.
440
:And that's just sort of a human
relationship thing that you have to always
441
:work on.
442
:Yeah, that is very interesting.
443
:And I'm very, yeah, I'm always very
interested to hear about that because I
444
:also face clients daily and have to
explain models to them.
445
:And so as you were saying, that definitely
varies a lot in reactions to the model.
446
:But that nugget of wisdom of maybe
indulging the...
447
:the confirmation bias at the beginning and
then slowly going towards a bit more of
448
:speaking the truth.
449
:It's very interesting.
450
:I had not thought of that, but that's
yeah, definitely I can see that being a
451
:valid strategy when you also are in front
of someone who doesn't really understand
452
:the value of the modeling, I would say.
453
:Whereas when I
454
:encounter clients who are already
convinced of what the models can do for
455
:them.
456
:They are usually looking for contradicting
what they already think.
457
:And that's when they find the model
interesting.
458
:So I find that really, really cool to see.
459
:The contradictions are really where
there's value, right?
460
:But there's no value in a model if no one
uses it, right?
461
:Even if the model is really good, if no
one uses it, it has zero value.
462
:If they use it, the contradictions are
valuable if they're right, correct?
463
:So in soccer analysis, you know, I've
spent my career doing lots of different
464
:sports, but there's this sort of, this
applies to every sport.
465
:In basketball, we can call it the LeBron
test, and in soccer, we'll call it the Messi
466
:test, where it's essentially, if you build
a model and it's trying to evaluate
467
:players and Messi is not, like, one of the
top players in your model, then
468
:You're not going to share it with anybody
because no one's going to believe you.
469
:Right.
470
:That's like the first thing everyone does
is like, okay, well, is Messi up top?
471
:And if like, if Messi is near the top,
then like people, at least they'll listen
472
:to you a little bit longer.
473
:Right.
474
:But they're not going to listen to you at
all.
475
:If you're like, yeah, Messi is an okay
player.
476
:Right.
477
:Like I don't care what your model says.
478
:Right.
479
:That's wrong.
480
:Right.
481
:That's that, that's what people believe.
482
:So it's like a little bit of like, I need
to feed you like, no, no, no.
483
:Like I'm taking a different approach than
what you do, but you know, my approach
484
:also thinks that Messi is the best.
485
:Right.
486
:And then I'm like, it's okay.
487
:You know,
488
:Okay, yeah, we agree.
489
:He is really good.
490
:Yeah, it's like a sniff test, right?
491
:And it's like, in a way, it's like, well,
that's a strong prior.
492
:And it's like, it's saying, well, I have a
very strong prior.
493
:That Messi is really good.
494
:To convince me otherwise, you're going to
need really, really good data.
495
:It's like, well, the earth is very
probably somewhat round.
496
:It's going to be very hard for you to...
497
:move that prior from me by telling me
it's not, in a way.
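Alex's "strong prior" framing can be made quantitative with a conjugate normal-normal update, in which prior and data are weighted by their precisions (inverse variances). The sketch below is hypothetical throughout: the rating scale, the prior of 2.0 with standard deviation 0.2 for an elite player, and the per-game noise standard deviation of 1.0 are assumed numbers, and the observation noise is treated as known.

```python
import math


def normal_posterior(prior_mean, prior_sd, data_mean, n_obs, obs_sd):
    """Conjugate normal-normal update with known observation noise.

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the data mean.
    Returns (posterior_mean, posterior_sd).
    """
    prior_prec = 1.0 / prior_sd**2
    data_prec = n_obs / obs_sd**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


def games_to_move_halfway(prior_sd, obs_sd):
    """Games needed before the data carry as much weight as the prior."""
    return (obs_sd / prior_sd) ** 2


# Hypothetical: the prior says an elite rating of 2.0, held tightly (sd 0.2),
# but 10 noisy games (sd 1.0 each) average out to a merely average 0.0.
strong_mean, _ = normal_posterior(2.0, 0.2, 0.0, 10, 1.0)
weak_mean, _ = normal_posterior(2.0, 1.0, 0.0, 10, 1.0)
print(f"strong prior after 10 games: {strong_mean:.2f}")  # 1.43
print(f"weak prior after 10 games:   {weak_mean:.2f}")    # 0.18
print(f"games to move halfway: {games_to_move_halfway(0.2, 1.0):.0f}")  # 25
```

Under these assumptions, ten ordinary games barely dent the strong prior while nearly erasing a weak one; the data only match the prior's weight after (obs_sd / prior_sd)² = 25 games.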
498
:Yeah.
499
:And in sports, people have really strong
priors, right?
500
:So, you know, those sniff tests do really
matter.
501
:And as a modeler, even for myself, like,
I'm a human.
502
:So like, I do the same thing.
503
:If I'm building a model, I always want to
see the results.
504
:And it's like, I don't look at the median,
like I do, but I don't look at who the
505
:median result is in my model half the
time.
506
:I usually look at the best and I look at
the worst.
507
:And if I don't understand it, then I'm
like, maybe my model is doing something
508
:wrong.
509
:And I'm like, I'm going to dive
in a little bit more.
510
:If it confirms my prior-held beliefs,
I'm like, it's probably correct.
511
:Right.
512
:And even as a modeler, right, you have to
be careful of that.
513
:But at the same time in sports, you know,
it's like I said, subjective analysis can
514
:be helpful.
515
:It's because people are subjective, and I'm
like, there's wisdom in that.
516
:People, coaches, have been playing a game
for
517
:20, 30 years, or coaching a game for 20 or 30
years; to think that they don't have
518
:something to offer a model is kind of
crazy, in my opinion.
519
:They might have biases, and of course they
do, but the information that they can
520
:provide is useful.
521
:Yeah, definitely.
522
:And that's where we go back to what we
were talking about at the beginning in the
523
:value of Bayesian inference in that
context.
524
:Because if you can leverage that deep and
hard-earned knowledge,
525
:from the coaches, from the scouts, and add
that to your model, it's like getting the
526
:best of both worlds.
527
:And that can make your analysis extremely
powerful and useful, as you were saying.
528
:Yeah.
529
:And people have done studies like this,
I've done studies like this.
530
:If you build a model just on the data and
ignore the human element, right?
531
:Or if you build a model just on human and
scouting analysis and ignore the other
532
:data.
533
:Right.
534
:Neither one of those is going to do as
well as when you combine both.
535
:And that's really, that's what, you know,
that's Bayesian analysis is you're
536
:combining subjective belief with objective
data and then making forecasts based on
537
:them.
538
:And we know that if you have priors that
are not really, really bad, a subjective
539
:Bayesian forecast is going to have smaller
error than a data-only, you know, what we call
540
:maximum likelihood forecast, right?
541
:In stats terms, right.
542
:Or.
543
:You know, just the human one, just the no-data,
you know, feelings forecast as
544
:well, right?
545
:So the combination of the two
always does better.
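To make the "combination beats either alone" claim concrete, here is a minimal simulation sketch in the simplest conjugate setting, which is an assumption rather than anything described in the episode: a normal prior standing in for the subjective belief, and normally distributed noisy observations standing in for the data. The posterior mean is a precision-weighted average of the prior mean and the sample mean, while the maximum likelihood estimate is the sample mean alone. The prior and noise scales are illustrative, and the true abilities are drawn from the prior itself, i.e. the "prior is not really bad" case.

```python
import random
import statistics


def posterior_mean(prior_mean, prior_sd, data, noise_sd):
    """Normal-normal conjugate update: a precision-weighted average of prior and data."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(data) / noise_sd**2
    xbar = statistics.fmean(data)  # the data-only (maximum likelihood) estimate
    return (prior_precision * prior_mean + data_precision * xbar) / (
        prior_precision + data_precision
    )


def compare_errors(n_sims=2000, n_obs=5, prior_mean=0.0, prior_sd=1.0, noise_sd=2.0, seed=1):
    """Mean squared error of prior-only, data-only (MLE) and combined estimates."""
    rng = random.Random(seed)
    sq_err = {"prior_only": 0.0, "mle": 0.0, "combined": 0.0}
    for _ in range(n_sims):
        truth = rng.gauss(prior_mean, prior_sd)  # true ability, drawn from the prior
        data = [rng.gauss(truth, noise_sd) for _ in range(n_obs)]
        sq_err["prior_only"] += (prior_mean - truth) ** 2
        sq_err["mle"] += (statistics.fmean(data) - truth) ** 2
        sq_err["combined"] += (posterior_mean(prior_mean, prior_sd, data, noise_sd) - truth) ** 2
    return {name: total / n_sims for name, total in sq_err.items()}


print(compare_errors())
```

With these settings the combined estimate should land well below both the prior-only and the data-only error. A badly miscentered prior would erode that advantage, which is the caveat in the claim above.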
546
:Yeah.
547
:Yeah.
548
:Yeah.
549
:Preaching, preaching to the choir here for
sure.
550
:And actually, I think that's a good time
now in the episode to get a bit more
551
:nerdy, if we can, because I've seen you,
so you've obviously worked extensively
552
:with.
553
:soccer analytics and you have an
interesting soccer power ratings and
554
:projections on your website that I'm gonna
link to in the show notes but can you tell
555
:us about it and what makes these
projections unique in your perspective in
556
:evaluating team and player performance and
don't be afraid to dig into the nerdy
557
:details because...
558
:My audience definitely likes that.
559
:Yes.
560
:Sure.
561
:I'll dig in.
562
:So what's on my website is...
563
:Sorry if you can hear my dog there.
564
:What's on my website is perhaps the most
simple power ratings forecast that I've
565
:ever done.
566
:So I say that, not that it's like stupid
or anything.
567
:So when I was at ESPN, I built power
ratings in American football, both
568
:professional and collegiate, and
basketball, professional and collegiate.
569
:and hockey, I mean, like almost every
sport, right?
570
:So what's on my website, I'll explain the
model very simply is it's a Bayesian model
571
:where you have an effect for each team,
right?
572
:And the response variable is the expected
goals for each team.
573
:So usually when we do a power ratings and
we're trying to estimate for a team, you
574
:know, there's two sorts of
575
:things that we're trying to estimate their
offensive ability and their defensive
576
:ability and then you assume essentially
that their overall team ability, you know,
577
:if it's a linear model, right is the
combination of their offense and their
578
:defensive abilities.
579
:Okay, so essentially in each match,
right?
580
:You have essentially two rows of data
where you have the expected goals for the
581
:one team and then the expected goals for
the other and the reason we use expected
582
:goals, although I actually have a
583
:lot of issues with expected goals, is that
584
:they are a better indicator of how
good the team performed on offense than
585
:just the raw number of goals.
586
:And right.
587
:I don't need to go into details, right?
588
:It's essentially an expected value
as opposed to an observation from a
589
:Poisson distribution; soccer scores
roughly reflect a Poisson, or
590
:something pretty close to a Poisson distribution,
right?
591
:The expected goals is that expectation.
592
:And so essentially I have a hierarchical
Bayesian model where I actually.
593
:I actually do a few things.
594
:So I actually assume the expected goals is
the mean of a Poisson distribution.
595
:The observed goals is the actual outcome
of the Poisson distribution.
596
:And then I fit a linear model essentially
where I look, okay, I have team A was on
597
:offense, team B was the opponent.
598
:And this was team A's expected goals.
599
:And I'm essentially fitting a regression
model, right?
600
:A Bayesian regression model where I have
individual team effects.
601
:I have a prior on each team:
602
:each team's offense and each team's
defense.
603
:And that prior, you know, rough, I don't
have to get too crazy.
604
:You know, I just use a normal distribution
and, and, you know, sometimes I actually,
605
:when I code in Stan, I actually like
using a distribution with a little, a little
606
:bit thicker tails.
607
:But I think for this model, I was just
trying to go simple, normal distribution
608
:prior with a mean, you know, for my
expected, essentially each team's expected
609
:goals per game, on offense versus
610
:defense, right. And for the defensive value, I
usually do the subtraction.
611
:So it's the offensive team minus the
defensive team, and that way the
612
:defensive team's value is higher if
they're a good defense. So essentially, if
613
:team A's, you know, expected goals
in a game against an average opponent
614
:is like 1.5, and the defense's average
expected goals in a game, you know,
615
:that they allowed was 1.4,
616
:then you would say the difference is like
0.1, okay.
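One way to write the model down as described is shown below. The symbols are ours, and the normal observation model for the xG regression is an assumption, since the episode does not say which likelihood is used at that step. Here g indexes team-match rows, a[g] is the attacking team, b[g] the defending team, h_g = 1 when the attacking team is at home, and η is a home-field effect:

$$
\begin{aligned}
\text{goals}_g &\sim \operatorname{Poisson}(\mathrm{xG}_g) \\
\mathrm{xG}_g &\sim \mathcal{N}\!\left(\mu + \eta\, h_g + o_{a[g]} - d_{b[g]},\ \sigma^2\right) \\
o_t &\sim \mathcal{N}(m^{o}_t, \tau^2), \qquad d_t \sim \mathcal{N}(m^{d}_t, \tau^2)
\end{aligned}
$$

Because d enters with a minus sign, a larger d_t means a better defense, which is the point of doing the subtraction.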
617
:I also include effects for being at home
in this model.
618
:I think, actually, I think that's all I
do.
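For a quick point estimate without Stan, one can compute the posterior mode instead of sampling. Under normal priors centred at zero and a normal observation model for xG (both assumptions here), the mode is a penalized least-squares fit, solved below by coordinate descent. This is a swapped-in MAP approximation, not the full posterior a Stan fit would give; `fit_ratings`, `lam`, and the three made-up teams are hypothetical.

```python
from collections import defaultdict


def fit_ratings(rows, lam=2.0, n_iter=200):
    """MAP fit of xG = mu + eta*home + off[attacker] - dfn[defender].

    rows: list of (attacking_team, defending_team, is_home, xg) tuples.
    lam:  sigma^2 / tau^2, i.e. how strongly team effects shrink toward 0.
    """
    teams = sorted({r[0] for r in rows} | {r[1] for r in rows})
    off = {t: 0.0 for t in teams}
    dfn = {t: 0.0 for t in teams}
    mu, eta = sum(r[3] for r in rows) / len(rows), 0.0
    by_att, by_def = defaultdict(list), defaultdict(list)
    for r in rows:
        by_att[r[0]].append(r)
        by_def[r[1]].append(r)
    home_rows = [r for r in rows if r[2]]
    for _ in range(n_iter):
        # Intercept and home effect get flat priors (no penalty).
        mu = sum(xg - eta * h - off[a] + dfn[b] for a, b, h, xg in rows) / len(rows)
        if home_rows:
            eta = sum(xg - mu - off[a] + dfn[b] for a, b, h, xg in home_rows) / len(home_rows)
        # Each team effect has a closed-form minimizer given everything else.
        for t in teams:
            res = [xg - (mu + eta * h - dfn[b]) for a, b, h, xg in by_att[t]]
            off[t] = sum(res) / (len(res) + lam)
        for t in teams:
            res = [(mu + eta * h + off[a]) - xg for a, b, h, xg in by_def[t]]
            dfn[t] = sum(res) / (len(res) + lam)
    return mu, eta, off, dfn


# Hypothetical, noise-free data: every ordered pair plays once at home and once away.
true_off = {"A": 0.5, "B": 0.0, "C": -0.5}
true_def = {"A": 0.3, "B": 0.0, "C": -0.3}
rows = [
    (a, b, h, 1.3 + 0.2 * h + true_off[a] - true_def[b])
    for a in true_off for b in true_off if a != b for h in (0, 1)
]
mu, eta, off, dfn = fit_ratings(rows)
print(round(eta, 3), {t: round(v, 2) for t, v in off.items()})
```

The team effects come back shrunk toward zero (that is what `lam` does), but their ordering and the home effect are recovered.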
619
:But in other models I've done, you can
look at things such as how much rest
620
:they've had since their last match.
621
:You can look at the difference between
each team's rest.
622
:And those are not linear effects, right?
623
:You have to do some sort of nonlinear
effects for that, right?
624
:Because like one day of rest is, two days
of rest is not,
625
:Like the difference between two days of
rest and one day of rest is very different
626
:than seven days versus eight days of rest,
right?
627
:Seven and eight days of rest are pretty
much the same thing, but two and one is
628
:very different, right?
629
:Like much bigger effect for having two
days of rest than just one day of rest.
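One simple way to encode this kind of diminishing return, offered as an assumption rather than the transform used in the models discussed: pass days of rest through a concave function such as a capped logarithm, then give the transformed value (or the difference between the two teams) a single regression coefficient. Splines would be a more flexible alternative. The cap of 7 days is arbitrary.

```python
import math


def rest_feature(days_rest, cap=7):
    """Hypothetical nonlinear rest covariate: log of rest days, floored at 1 and capped."""
    return math.log(min(max(days_rest, 1), cap))


def rest_diff_feature(home_rest, away_rest, cap=7):
    """Difference in transformed rest between the two teams, for a match-level model."""
    return rest_feature(home_rest, cap) - rest_feature(away_rest, cap)


gain_1_to_2 = rest_feature(2) - rest_feature(1)   # a large step
gain_7_to_8 = rest_feature(8) - rest_feature(7)   # zero once the cap is reached
print(round(gain_1_to_2, 3), gain_7_to_8)
```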
630
:And so you can do things like that, or how
far away they had to travel, those sorts
631
:of things.
632
:Now in European soccer, that's not a huge
deal, because especially in the
633
:competitions within each country, no team
is traveling that far.
634
:But in American sports, it is a pretty big
deal.
635
:Like, you know, you, you have to fly five,
six hours across the country on short
636
:notice.
637
:Like that can, that can really affect
performance.
638
:And another thing, like I said, I
don't have this in the soccer model, but
639
:if anyone's interested in modeling
sports outcomes, one thing that people typically
640
:tend to overlook, and I've always been a
big proponent of this, is elevation, meaning that
641
:there are certain sports where there
are certain teams that play at higher
642
:altitudes.
643
:And if you're not used to playing at
higher altitudes, it's actually a very
644
:noticeable effect in a model that you're
going to have a lower offensive output and
645
:you'll actually allow more points on the
other end due to fatigue.
646
:And so in the United States, it's the teams
that are playing in Colorado and in Utah.
647
:But in Europe, it could be the teams that
have to go to Switzerland or the teams
648
:that have to go to some of these alpine
regions that are higher up in altitude.
649
:In Mexico, if you have to go to Mexico
City, it's extremely high.
650
:Or Colombia, right?
651
:I mean, depending on what you're doing,
these are very high altitude places that
652
:have been shown to have a measurable impact on
an opponent's performance.
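A minimal sketch of how elevation could enter a model like the one above, under assumed choices that are not from the episode: measure how far the venue sits above the visiting team's home altitude, count it only above an arbitrary threshold, and give the result coefficients on the visitor's attacking rows (expected negative) and on the rows where the visitor defends. The altitudes below are illustrative, not real stadium data.

```python
def altitude_feature(venue_alt_m, team_home_alt_m, threshold_m=1000.0):
    """Hypothetical covariate: kilometres the venue sits above the visiting team's
    home altitude, counted only for venues above `threshold_m` metres."""
    if venue_alt_m < threshold_m:
        return 0.0
    return max(0.0, venue_alt_m - team_home_alt_m) / 1000.0


print(altitude_feature(2000, 0))     # sea-level team at a high venue
print(altitude_feature(2000, 1500))  # team that already lives fairly high up
print(altitude_feature(0, 2000))     # high-altitude team at sea level: no term
```

The last case returns zero by construction, so this feature cannot capture the reverse effect discussed next; that would need its own term.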
653
:Yeah, that's very fun.
654
:My God, I love those kinds of models.
655
:That's so much fun.
656
:And I would also guess that, I mean, at
least my prior would be that there is a
657
:reverse mechanism also for teams who are
used to playing at altitude.
658
:Do they get a performance boost when
they play closer to sea level?
659
:Because they could have had adaptations
that make them better when they go down to
660
:sea level.
661
:Yeah.
662
:I mean, I think there's certainly science
behind that.
663
:I found that is a lot harder to show in a
model than the reverse.
664
:Not that it might not be there, but I
think the effect size, if it is there, is
665
:definitely smaller than the reverse.
666
:Yeah.
667
:That's what...
668
:That's what I would expect too.
669
:I think the effect is there, mainly because,
well, I've seen it.
670
:Like, it seems to be pretty well established in
the science literature, but that doesn't
671
:mean the effect is big.
672
:So yeah.
673
:Yeah.
674
:I mean, I'm a runner and I know that all
of the distance runners that are training
675
:for marathons that are elites and
professionals, they all train at higher
676
:altitudes, right?
677
:For the...
678
:six weeks leading up to a competition and
then they travel to the competition at a
679
:lower altitude.
680
:And, you know, they think they have an
oxygen performance boost due to that.
681
:Yeah.
682
:Yeah.
683
:Kind of like legal oxygen doping, legal
blood doping.
684
:Yeah.
685
:Yeah, exactly.
686
:Yeah.
687
:Yeah.
688
:I mean, I think it seems to be pretty much
proven.
689
:I would say maybe it has more of an impact
on individual sports like marathon running
690
:or others, because it's more like, you know,
it's just like,
691
:Even if you're winning just a few tenths
of a second, well, it can help you have a
692
:better time in the end because, well, at
this level, just having the smallest
693
:increase in performance could be the
difference between first and second place.
694
:But maybe it's harder to see such a
small effect in a collective sport, a
695
:collective game, because, well,
696
:Maybe there are some...
697
:Maybe it's just not additive.
698
:Maybe the effects actually cancel out.
699
:So in the end, you don't really see a big
effect.
700
:But that would be...
701
:Yeah.
702
:I'd love to do an experiment on that.
703
:Like an RCT.
704
:That would be so much fun.
705
:Yeah.
706
:Well, good luck trying to do experiments
in sports.
707
:It's hard.
708
:Yeah, I know.
709
:I know.
710
:But that...
711
:I mean, if the multiverse exists...
712
:Then there is a universe where we can do
those kinds of experiments.
713
:And my god, these scientists must have so
much fun.
714
:And yeah, so thanks a lot, first, for
detailing the model that clearly and in so
715
:much detail.
716
:That's super cool.
717
:So the results of the model are in a cool
dashboard on your website.
718
:Do you have the model and data available
freely, maybe on your GitHub, that we can
719
:put in the show notes?
720
:Yeah, I'm not sure.
721
:I think my GitHub, I don't know if the
model is in there.
722
:It's on GitHub.
723
:I don't know if it's private or not, but I
can let you know.
724
:You know, I use actually open source data
for that.
725
:So I, I, let me double check.
726
:I can actually double check and get back
to you after the show on if, yeah, if I
727
:could have it in my public GitHub or not.
728
:So, yeah.
729
:Yeah.
730
:But essentially it uses the, there's a
package called worldfootballR, and
731
:It uses data from there to build the
model.
732
:So some of that data is just from, it's
scraped from, like, Transfermarkt.
733
:So I didn't really talk
about how I set prior means for each of
734
:the teams, but a very
simple hierarchical model is essentially
735
:just to use the expenditures of the club
and use that as a prior mean for how good
736
:the club will be going into the season.
737
:And, and.
738
:Unlike some other sports, in soccer, world
football, how much a club spends is very
739
:highly correlated with how successful they
are, which makes sense, but it's not true
740
:necessarily in like baseball.
741
:So, do you see these effects of budget?
742
:So, yeah, first, before I go on to a
follow-up question, yeah, for sure.
743
:Get back to me after the show.
744
:And if that's possible, we'll put that in
the show notes because I'm sure
745
:A lot of listeners will be interested in
checking that out.
746
:I personally will be very interested in
checking that out, definitely.
747
:So that'd be awesome.
748
:And second, that effect of budget that you
see on the performance of a team.
749
:And so I guess in football, performance
means the expected number of games
750
:won.
751
:Do you see that, of course?
752
:Do you see
753
:that much of an effect also in a closed
league system like the MLS?
754
:Or is that so? Because my prior would be
that the effect of budget would be even
755
:stronger in open leagues like we have in
Europe because it's like there is no
756
:compensation mechanism, right?
757
:Clubs can go down and usually in Europe
the strongest clubs are the historical
758
:clubs.
759
:or the new clubs are just the ones that
were lucky to be bought by very, very
760
:wealthy shareholders.
761
:And like, there is not a lot of switching
of the hierarchy and changing of the
762
:hierarchy, mainly because of budget, as
you were saying.
763
:But I would think that maybe the effect of
budget is less strong in a closed league
764
:like the MLS.
765
:Is that true?
766
:Is that something you see or is it
something that's still in the air?
767
:Yes.
768
:So I haven't looked specifically at the
MLS, but in general in American sports,
769
:which all have closed leagues, the budget,
well, for various reasons, the budget
770
:effects are not super strong.
771
:So, you know, in American baseball, there
is no spending limit.
772
:So in some American sports, like the NFL
in football, there's a salary cap,
773
:meaning you can't spend more than a
certain amount.
774
:So there is no relationship between
overall spending and winning because
775
:everyone has to spend a minimum and
there's a maximum.
776
:In baseball, there is no limit.
777
:There's a tax.
778
:If you spend too much money, they do tax
you.
779
:But there's still not a huge correlation.
780
:And then in MLS, like I said, I'm not
entirely sure.
781
:Most of the clubs, they are constrained
about how much they can spend.
782
:And so there isn't as much variance also
in spending.
783
:So like, you know, Messi going to Inter
Miami, it wasn't that Inter Miami could
784
:pay him a lot of money.
785
:They actually, you know, there's a couple
of exemptions that an MLS club could use
786
:to pay an international player.
787
:They have, they're called, you know, a
couple of exemption players they have.
788
:And that originally started when David
Beckham went to Los Angeles and they kind
789
:of made that rule essentially just so he
could, they could afford paying him what
790
:he was used to or close to what he was
used to being paid in Europe.
791
:And in the MLS, that's still kind of the
case.
792
:You have one or two players you're allowed
to have on these exemptions and.
793
:The way Messi was able to make it work is
he's getting paid by Apple, because
794
:Apple's broadcasting the MLS games.
795
:So they're paying him essentially to play
in the MLS because they're hoping more
796
:people are going to watch their broadcasts and
pay them.
797
:And so they give him a
percentage of that.
798
:And that's where actually a lot of his
salary or like his earnings are coming
799
:from is from a deal with Apple versus
the actual MLS club in Miami, which can
800
:only pay him so much.
801
:So my guess is, my prior is, I haven't
looked specifically at the MLS with this,
802
:but my prior is yes, that there isn't a
huge relationship in the MLS between
803
:winning and spending just because there's
not much of a variance.
804
:In order to see those correlations, you
have to have a large enough variance in
805
:the spending to notice the relationship,
right?
806
:So.
807
:Yeah, definitely interesting.
808
:I mean, I love also looking at these, you
know, the...
809
:how the structure of a league impacts the
show and the wins is extremely
810
:interesting.
811
:That can seem very nerdy and I think
that's my political science training that
812
:kicks back here, but really how you
structure the game also makes the game
813
:what it is and the results and the show
you're going to get.
814
:I find that extremely interesting to see
how the American games, the US games are
815
:structured.
816
:Because ironically, it's a system where
there are many more social transfers, if
817
:you want, like we have in Europe for
social security and health and education.
818
:American sports are socialist, and
European sports are capitalist.
819
:But typically, we consider Americans to be
more capitalist and the Europeans to be
820
:more socialist.
821
:So it's an interesting inversion.
822
:Yeah.
823
:No, definitely.
824
:And I mean, I think...
825
:Honestly, that's going to be interesting
in the coming years to see what's
826
:happening on the European side because
there are more and more debates about
827
:whether we should have a closed European
wide league, which would basically be an
828
:extension of the current Champions League.
829
:And honestly, I think it's going to take
that road because more and more
830
:championships, at least all the
championships, I would say, with the
831
:exception of the Premier League,
832
:get more and more concentrated on just a
few clubs.
833
:And just from time to time, you have one
club that bumps onto the top, like
834
:Leverkusen this year in Germany, Monaco in
France a few years ago, Montpellier.
835
:But that's like really exceptions.
836
:And in the end, you almost always get the
same clubs that win all the time.
837
:And so the idea of open leagues is not
really true for the top of the leagues.
838
:It's definitely true for the bottom, but
the big clubs never go down.
839
:And...
840
:And so I think at some point, this
illusion of the open leagues is going to
841
:disappear and probably we'll get a
European-wide championship where like
842
:basically the leagues are going to get a
bit more even because I think it's better
843
:for the show and that's going to make more
money.
844
:And in the end, I think that's what the
question is also.
845
:Yeah, you might be right, but I hope, I
hope not.
846
:I really, as an American, always have
dreamed of Americans doing relegation and
847
:promotion just because...
848
:You know, in America, we have this problem
where we call it tanking, right?
849
:Because we have the socialist draft system
where the worst teams are incentivized
850
:to lose because they know they're not
going to win.
851
:So they want to get the best possible
players in the draft the next season.
852
:And so they're incentivized, you know, to,
to lose a little bit more.
853
:And so that really does kind of, you know,
the promotion relegation is nice because
854
:it solves that, you know, if you keep
losing, you lose a lot of money because
855
:you get sent down.
856
:So everyone's motivated even at the bottom
of each league to keep winning games,
857
:right?
858
:As much as possible.
859
:Otherwise they lose a lot of money.
860
:And in American leagues with the closed
system, it's like, well, hey, you know,
861
:it's actually, we talk about sick sickle.
862
:And one thing that sports analytics
has done is essentially say,
863
:it's really hard in American sports to go from
being an average team to a really
864
:good team.
865
:And the reason
866
:is the draft system.
867
:So in the draft system, people are always
overconfident in how good the players are,
868
:but there's really thick right tails of
how good a player can be.
869
:So when you get a new player who's young
and you can draft them at the top of the
870
:draft, they might not pan out, but they
also have a really thick right tail,
871
:meaning that if they do pan out, you could
go from being one of the worst teams to
872
:one of the best teams really quickly.
873
:And so,
874
:You know, it's this other analysis of,
like, well, if you don't ever have an
875
:opportunity to draft someone in a
position where there's that right tail,
876
:where, you know, once out of every five
years, you get a player who transcends
877
:everyone else that comes in, then you
can't move up from average to really good,
878
:but you can go from being bad to really
good.
879
:So often teams, and the smarter teams, if
they're really good, they stay really good.
880
:But once they start noticing the players
are getting older, they just trade
881
:everybody away.
882
:They get rid of all their best players and
they just stink for a year or two and
883
:hopefully they can get some good draft picks.
884
:They get a lot of draft picks.
885
:Essentially.
886
:They try to trade their players away, get
more draft picks, and then it becomes a
887
:sample size problem.
888
:And it says, well, if we have more draft
picks, our probability of getting someone
889
:on the right tail goes up.
890
:And so that's all we're going to do is
we're just going to increase our odds of
891
:getting that right tail player.
892
:And if we get that player, then we'll be
good again.
893
:Yeah.
894
:Yeah.
895
:It's like
896
:buying a lot of lottery tickets.
897
:Yeah, that's what they're doing.
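The lottery-ticket logic is the complement rule: if each pick independently has probability p of becoming a right-tail player, the chance of landing at least one in n picks is 1 − (1 − p)^n. Independence and the value p = 0.05 are illustrative assumptions, not estimates from the episode.

```python
def prob_at_least_one_star(n_picks, p_star):
    """P(at least one right-tail player) from n independent picks, each a star
    with probability p_star (independence is a simplifying assumption)."""
    return 1.0 - (1.0 - p_star) ** n_picks


for n in (1, 3, 6):
    print(n, round(prob_at_least_one_star(n, 0.05), 3))
```

More picks push the probability up, but with diminishing returns per extra pick, which is why teams try to stockpile them.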
898
:Yeah, now that's fascinating.
899
:Yeah, I wasn't aware of these effects.
900
:That's super interesting.
901
:Because basically, what you're saying is
there is an incentive to be extreme,
902
:basically.
903
:Either you want to be among the top ones
or you want to be among the worst ones.
904
:But being in the middle is the worst,
actually.
905
:It is the worst.
906
:Yeah.
907
:Yeah.
908
:That is extremely interesting.
909
:And that's...
910
:Yeah, I mean, I actually don't know which
system I prefer.
911
:Honestly, I'm just saying I think Europe
is getting, is going there because we have
912
:more and more basically concentration of
the wealth at the very top of the leagues
913
:and that's going to make the national
leagues less and less interesting
914
:basically.
915
:But I don't know either if I prefer the
European wide championship.
916
:Well, I think I would prefer a European-wide
championship,
917
:for sure, but I think it would be great to
have it still open.
918
:So where you could have, you know, like
basically countries would become regions
919
:and then you get from like, if you, if
you're the best in France, basically in
920
:one year, then you get to the highest
level, which is the European one.
921
:And then if you're among the worst, you
get down to your country the next year.
922
:I think that would be very fun because
the, like, especially now that players can
923
:be traded very easily between the, the...
924
:continental Europe because it's basically
the same country legally.
925
:That also makes sense because, you
know, basically a PSG versus
926
:Barcelona match is much tighter than PSG
versus literally any team in France.
927
:So yeah, that's going to be very
interesting.
928
:But at the same time, yeah, I
love hearing about the wrong incentives
929
:of the closed system.
930
:So thanks a lot for that.
931
:That's food for thought.
932
:And that's again, like, that's very close
to elections, actually, like how you
933
:count the votes impacts the winner.
934
:And so here, like really in sports too, how
you structure your game has an impact on
935
:the winners.
936
:And I think it's extremely important to
keep in mind because in the end, like how
937
:the
938
:the organization, so the MLS in the US or
the UEFA in Europe have actually huge
939
:power over the game.
940
:Well, thanks for that political science
parenthesis.
941
:I wasn't expecting that, but that's
definitely super interesting.
942
:To get back to the modeling because time
is running by and I definitely want to ask
943
:you about the plus minus models because
you're using that also to...
944
:estimate player value in American
football.
945
:So I'm curious about that.
946
:What is that kind of model?
947
:Is that mainly for American football, or are
you using that also for other sports?
948
:Or if it's only for American football, why
is that particularly tailored to that
949
:sport?
950
:Yeah.
951
:So plus minus models actually
originated in basketball and they, they
952
:work the best in basketball.
953
:They're not perfect.
954
:And the concept in basketball
is you have 10 players on the court at
955
:each...
956
:at each moment and they substitute in and
out.
957
:But while those 10 players are on the
court, you know how many points are scored
958
:for each team, right?
959
:So, you know, five players on the offense
side and five players on defensive side.
960
:There's essentially just a big linear
model and you look at and you want to
961
:adjust for how long they're on the court
or how many possessions they were on the
962
:court for.
963
:So you can say, okay, these 10 players are
on the court for two and a half minutes.
964
:And in those two and a half minutes, this
team scored six points and their team
965
:scored four points.
966
:And essentially what you're doing then is
a plus minus model, essentially.
967
:So sometimes you might see in a, in a
statistic after the game, like the total
968
:difference in the net points for the team
when a player was on the court versus when
969
:they're not.
970
:Well, that's not too useful because
there's a lot of correlations, right?
971
:You're playing with someone else a lot.
972
:So what we call an adjusted plus minus
model, right, is a linear model that then
973
:tries to fit those player effects of, you
know, you get a one when you're on the
974
:court.
975
:on offense and negative one when you're on
defense.
976
:And we look at your team's efficiency,
right?
977
:Your points divided by some denominator,
whether it's minutes or possessions.
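The row construction described here can be sketched as follows, with one coefficient per player as stated; some variants instead give each player separate offensive and defensive coefficients. Each stint yields two rows, one per team's offensive possessions. The function name, player IDs, and the possession counts (the episode gives points and minutes, not possessions) are hypothetical.

```python
def stint_rows(team1, team2, pts1, pts2, poss1, poss2):
    """Turn one lineup stint into two adjusted plus-minus regression rows.

    Each row is (features, response): players on offense get +1, players on
    defense get -1, and the response is points per offensive possession.
    """
    rows = []
    for offense, defense, pts, poss in ((team1, team2, pts1, poss1), (team2, team1, pts2, poss2)):
        x = {player: 1.0 for player in offense}         # on the court, on offense
        x.update({player: -1.0 for player in defense})  # on the court, on defense
        rows.append((x, pts / poss))
    return rows


# Hypothetical stint: 6 points to 4, with 5 possessions each (possession counts assumed).
home = ["h1", "h2", "h3", "h4", "h5"]
away = ["a1", "a2", "a3", "a4", "a5"]
rows = stint_rows(home, away, 6, 4, 5, 5)
for features, efficiency in rows:
    print(len(features), efficiency)
```

Stacking these rows over a season gives the design matrix the linear model, or the ridge version discussed next, is fit on.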
978
:Okay.
979
:And that's sort of the basketball thing
over time.
980
:They realized, okay, well, there's so much
correlation between who is playing
981
:together.
982
:We need to adjust for that.
983
:So they used ridge regression.
984
:And so that would divvy up the credit a
little bit better.
985
:And you know, ridge regression is very
good when there's
986
:A lot of multicollinearity or correlation
between two effects, right?
987
:And on the basketball team or all
basketball players, you have teammates
988
:that play a lot together and they don't
play with other people a lot.
989
:But ridge regression has done a decently
good job in basketball over a big sample
990
:of estimating how effective players are.
991
:And if you look at these things, you'll
see, we talked about the sniff test.
992
:In:
993
:And he's the number one player for a lot
of the years, not so much anymore because
994
:he's older, et cetera.
995
:Right.
996
:But that's sort of those sniff tests that
we get.
997
:Well, some people in, in basketball, and
I'm a proponent of this, like, you know,
998
:this is a Bayesian podcast is that ridge
regression, you know, for those unfamiliar
999
:is, is a frequentist way to write a
Bayesian model.
::
That's very specific, where you have a
normal prior on each player with mean
::
zero.
::
Okay.
::
And that's ridge regression.
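The equivalence can be written out, assuming a Gaussian likelihood y | β ~ N(Xβ, σ²I) and independent N(0, τ²) priors on the player effects β_j:

$$
\begin{aligned}
-\log p(\beta \mid y) &= \frac{\lVert y - X\beta \rVert^2}{2\sigma^2} + \frac{\lVert \beta \rVert^2}{2\tau^2} + \text{const} \\
\arg\min_{\beta}\, \Big[ \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2 \Big] &= (X^\top X + \lambda I)^{-1} X^\top y, \qquad \lambda = \sigma^2/\tau^2
\end{aligned}
$$

Multiplying the first line by 2σ² gives the ridge objective in the second, so the ridge estimate is the posterior mode; because the prior and likelihood are both Gaussian, it is also the posterior mean.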
::
So we think about it from that perspective
with adjusted plus minus models.
::
What happens when you have a normal prior
with mean zero is that when you have
::
players that play less, we shrink more
towards the prior mean.
::
And it's only when we have more data for
players that we can deviate from that
::
prior mean.
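The shrinkage-by-playing-time effect is easiest to see in a deliberately stripped-down case that ignores teammates: one player with n observations averaging ȳ, noise variance σ², and a N(0, τ²) prior. The posterior mean is then

$$
\mathbb{E}[\beta \mid y] = \frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2}\,\bar{y} = \frac{n}{n + \lambda}\,\bar{y}, \qquad \lambda = \sigma^2/\tau^2,
$$

so a player with few possessions (small n) is pulled much harder toward zero than a regular.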
::
Well, one thing we know about sports is if
you're not playing as much, that actually
::
is pretty useful information.
::
And what does that tell us?
::
You're not very good.
::
Because if you're good, you're going to
play more.
::
And if you're bad, you play less.
::
So other people have come around and, you
know, in the last 10, 15 years and said,
::
okay, well, instead of a ridge regression
model for basketball, we should do a
::
Bayesian regression model.
::
And instead of having a mean zero for a
player, we should have a mean of something
::
else.
::
So there's a few different versions that
people have done.
::
One thing, a very simple version, is to say
everybody has a prior mean of,
::
you know, what we call a replacement
player.
::
Okay.
::
Someone that doesn't play very much.
::
If you're really good and you play a lot,
::
It doesn't matter what the prior mean is
too much because the data is going to
::
overwhelm the prior.
::
But if you don't play very much, we're
going to stick with that sort of negative
::
prior mean because it means you're below
average.
::
And so that's one thing you can do.
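A one-player sketch contrasting a zero prior mean with a replacement-level prior mean, using the same simplified normal-normal setup that ignores teammates. The replacement level of −2.0, the shrinkage ratio, and the two players' numbers are hypothetical.

```python
def posterior_mean(raw_mean, n, lam, prior_mean):
    """Normal-normal posterior mean for one player's effect (lam = sigma^2 / tau^2)."""
    w = n / (n + lam)  # weight on the player's own data
    return w * raw_mean + (1 - w) * prior_mean


REPLACEMENT = -2.0  # hypothetical replacement-level effect
LAM = 500.0         # hypothetical sigma^2 / tau^2

players = {"starter": (3.0, 3000), "bench": (3.0, 100)}  # (raw mean, possessions)
for name, (raw, n) in players.items():
    zero = posterior_mean(raw, n, LAM, 0.0)
    repl = posterior_mean(raw, n, LAM, REPLACEMENT)
    print(name, round(zero, 2), round(repl, 2))
```

The heavy-minutes player barely moves when the prior mean changes, while the low-minutes player goes from a small positive estimate to a below-average one.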
::
A more sophisticated thing sometimes
people will do is they'll build a
::
hierarchical model where you have
essentially a, a prior mean that is based
::
on other statistics that we observe.
::
So how many points you score or how many
assists you have.
::
And that's called a box score
prior mean, or a box score plus-minus.
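In symbols, a box-score prior mean replaces the fixed prior mean with a regression on observed statistics. The two covariates below are just the two mentioned (points and assists), the γ's are estimated along with everything else, and a real model could use more box-score columns:

$$
\beta_j \sim \mathcal{N}\!\left(\gamma_0 + \gamma_1\,\mathrm{PTS}_j + \gamma_2\,\mathrm{AST}_j,\ \tau^2\right)
$$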
::
So that's sort of the basketball.
::
So that's what plus-minus models are.
::
So that's sort of the basketball approach.
::
Now.
::
Basketball is really nice because you have
lots of games in the NBA.
::
You play every team at least twice and you
substitute a lot and there's lots of
::
scoring.
::
Now my work in American football tried to
address a lot of these issues in American
::
football.
::
You don't play every team.
::
you don't substitute very much.
::
And if you do play, you only play with
certain people like all the time.
::
And then there's not a lot of scoring
compared to basketball.
::
There's some scoring, but you know,
there's, you know, American football point
::
scoring is unique, right?
::
You get six or seven points for a
touchdown, you get three points for a
::
field goal, you know, and then on more
rare occasions, you get these two point
::
safeties.
::
Yeah.
::
So there's roughly maybe 10 scoring events
in an American football game versus in
::
basketball where you have, you know, scores
like a hundred to a hundred.
::
So at, you know, about two to
three points each, there's, you know,
::
80 to 120 scoring events in a basketball
game.
::
Right.
::
So these models work a lot better.
::
My work in American football has been to
sort of, how do we take the basketball
::
model and make some modifications so we
can do a football model?
::
And so one of the things that is tricky in
football is
::
that certain positions never get
substituted out.
::
So on offense, the quarterback plays every
single play unless they're hurt or they
::
stink
::
so badly that they get benched.
::
Well, the quarterback also always plays
with the same offensive line as long as
::
they're healthy and they don't get
substituted out.
::
So how does a model separate credit when
the same players are on the field all the
::
time?
::
And so my work there was to use
Bayesian statistics and take
::
the Bayesian regression model with
a prior mean. I used some information to
::
inform the prior mean for each player, but
I also did this unique thing with the
::
shrinkage.
::
So the prior variance has two parts:
there's one prior variance for
::
all players, and it's multiplied by
another parameter, which is unique to the
::
position that they play.
::
And so quarterbacks essentially have a different
shrinkage parameter, or prior
::
variance, than
::
other positions.
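Here is a minimal sketch of that prior structure in a Gaussian linear model, written from scratch in Python rather than Stan. Each player's prior variance is a shared `global_var` times a position-specific multiplier from `position_scale`; the helper names, the three-player lineup, the play outcomes, and the multipliers are all hypothetical, and unlike the model described here the multipliers are fixed rather than estimated. Because the quarterback and the lineman are on the field for every play, the data only pin down their combined effect, and the posterior splits that sum according to their prior variances.

```python
def solve(a, b):
    """Solve the linear system a @ x = b with Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented copy
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def position_shrinkage_posterior_mean(X, y, positions, prior_means,
                                      global_var, position_scale, noise_var):
    """Posterior mean of player effects in a Gaussian linear model.

    Prior:      beta_j ~ Normal(prior_means[j], global_var * position_scale[pos_j])
    Likelihood: y_i ~ Normal(sum_j X[i][j] * beta_j, noise_var)
    """
    p = len(positions)
    prior_prec = [1.0 / (global_var * position_scale[pos]) for pos in positions]
    # Posterior precision matrix: X^T X / noise_var + diag(prior precisions).
    A = [[sum(row[j] * row[k] for row in X) / noise_var for k in range(p)]
         for j in range(p)]
    for j in range(p):
        A[j][j] += prior_prec[j]
    # Right-hand side: X^T y / noise_var + prior precision * prior mean.
    b = [sum(row[j] * y_i for row, y_i in zip(X, y)) / noise_var
         + prior_prec[j] * prior_means[j] for j in range(p)]
    return solve(A, b)


players = ["QB", "OL", "WR"]  # one hypothetical player per position
# 100 hypothetical plays: QB and OL on every play, WR on every other play.
X = [[1.0, 1.0, 1.0 if i % 2 == 0 else 0.0] for i in range(100)]
# Hypothetical per-play outcomes: 0.5 with the WR on the field, 0.3 without.
y = [0.5 if row[2] else 0.3 for row in X]
scale = {"QB": 4.0, "OL": 1.0, "WR": 1.0}  # hypothetical position multipliers
beta = position_shrinkage_posterior_mean(
    X, y, positions=players, prior_means=[0.0, 0.0, 0.0],
    global_var=1.0, position_scale=scale, noise_var=1.0)
for name, effect in zip(players, beta):
    print(f"{name}: {effect:+.3f}")
```

With zero prior means, the quarterback ends up with four times the lineman's share of their joint credit, exactly the 4:1 ratio of their prior variances; the position multiplier is what decides how credit is separated when players always appear together.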
::
And then instead of just looking at
scoring plays in football, we use what we
::
call expected points added.
::
So at each play, we look at how many points,
on average, you're going to score if
::
you have the ball in this position.
::
And I look at the difference between two
consecutive plays, right?
::
And that tells you essentially how much
value you got in the result of the play.
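The expected points added idea reduces to a difference of two expected-points values. This sketch assumes a made-up linear expected-points curve in field position with a penalty for later downs; real EP models are fit to historical play-by-play data and use more context (distance to go, clock, score), so `expected_points`, its coefficients, and the flat 7-point touchdown value are hypothetical. Per-play values like these are the kind of outcome a play-level plus-minus regression would use.

```python
def expected_points(yards_from_own_goal, down):
    """Hypothetical expected-points curve (made-up coefficients).

    Real EP models are fit to historical play-by-play data and also use
    distance to go, time remaining, score, and more.
    """
    down_penalty = {1: 0.0, 2: 0.4, 3: 0.9, 4: 1.5}[down]
    return -0.5 + 0.065 * yards_from_own_goal - down_penalty


def expected_points_added(before, after, touchdown=False):
    """EPA = expected points after the play minus expected points before it.

    `before` and `after` are (yards_from_own_goal, down) tuples. A touchdown
    is valued at a hypothetical flat 7 points (6 plus an assumed extra point).
    """
    ep_before = expected_points(*before)
    ep_after = 7.0 if touchdown else expected_points(*after)
    return ep_after - ep_before


plays = [
    ("10-yard gain, new first down", (25, 1), (35, 1), False),
    ("incomplete pass", (25, 1), (25, 2), False),
    ("25-yard touchdown", (75, 1), None, True),
]
for label, before, after, td in plays:
    epa = expected_points_added(before, after, touchdown=td)
    print(f"{label}: EPA {epa:+.2f}")
```

An incomplete pass scores no points but still has negative EPA, which is exactly the extra signal that using every play, not just scoring plays, adds.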
::
So instead of using only scoring plays, I
use every single play in football.
::
And I do this unique shrinkage
::
dependent on position, and doing that,
it's a huge model.
::
So I did this in college football, which
has way too many parameters because
::
there's like 16,000 kids.
::
But even in the NFL, I've done this and
you get interesting results.
::
Sometimes they match up with what you
think, sometimes they don't.
::
But the interesting thing is you can
actually estimate how much you should
::
shrink each position.
::
And so actually the model is nice because
it essentially tells you how much of the
::
variance in the outcome of the play
::
is dependent on how good players are
across different positions.
::
So in football, we all know that
quarterbacks are the most impactful
::
position in the game.
::
And I did give somewhat subjective priors,
but I still left a lot of
::
uncertainty around them, and the model could very well
estimate that quarterbacks
::
are in fact the most important position:
you shrink them the least because they have
::
the largest variance.
::
So
::
you could look at that.
::
If you look at the most impactful players
in football, they should be quarterbacks.
::
But by the same measure, the worst players
in football are also quarterbacks, because
::
you can only hurt your team really badly
::
if you're a quarterback. Compared to other
positions, I mean, at every position you can
::
hurt your team, but no one can hurt a team
as much as a bad quarterback hurts their
::
team,
::
just like no one can help a team as much
as a good quarterback.
::
So that's a rough
overview of my plus-minus modeling in
::
football.
::
When I wrote the paper,
I had a version of that written in Stan.
::
The data set itself was not public, but I
did upload a version of the Stan model
::
to my GitHub that you
can look at.
::
It's pretty massive.
::
In recent years, I've tried to expand it
into a state-space model
::
version,
::
so I have effects for each player for each
season over time.
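One common way to let a player's effect vary by season is a random-walk (local-level) state-space model, filtered with a Kalman filter. The sketch below is a deliberately simplified single-player version: it assumes we already have a noisy per-season estimate for the player and known variances, whereas a full model would estimate all players jointly. The helper name, season values, and variances are hypothetical.

```python
def kalman_local_level(obs, obs_vars, init_mean, init_var, drift_var):
    """Filter a player's latent effect across seasons (local-level model).

    State:       theta_s = theta_{s-1} + eta_s,  eta_s ~ Normal(0, drift_var)
    Observation: y_s     = theta_s + eps_s,      eps_s ~ Normal(0, obs_vars[s])
    (init_mean, init_var) describe the effect before the first season.
    Returns a list of (filtered_mean, filtered_var), one per season.
    """
    mean, var = init_mean, init_var
    out = []
    for y, r in zip(obs, obs_vars):
        # Predict: the effect may drift between seasons.
        var = var + drift_var
        # Update: blend the prediction with this season's noisy estimate.
        gain = var / (var + r)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        out.append((mean, var))
    return out


seasons = [2019, 2020, 2021, 2022]
estimates = [0.5, 1.5, 2.0, 0.2]   # hypothetical per-season estimates
est_vars = [1.0, 0.25, 0.25, 4.0]  # last season: few snaps, very noisy
filtered = kalman_local_level(estimates, est_vars,
                              init_mean=0.0, init_var=1.0, drift_var=0.1)
for season, (m, v) in zip(seasons, filtered):
    print(f"{season}: effect {m:+.2f} (sd {v ** 0.5:.2f})")
```

The noisy final season barely moves the estimate, because its large observation variance gives it a small Kalman gain; `drift_var` controls how quickly a player's effect is allowed to change from one season to the next.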
::
Yeah, that was exactly what I meant.
::
Computationally, that gets a little bit
trickier.
::
As for my dataset, I was able to
scrape some data for that,
::
but now I can't anymore:
::
the NFL stopped releasing it.
::
So that work is on hold for now.
::
But I probably need to find a graduate
student that can help me finish it.
::
Yeah, we should definitely put that in the
show notes.
::
That's super interesting.
::
Your paper
::
and the link to the GitHub repo,
::
for sure.
::
And that makes me think of a recent episode I
did, and also a recent interest of mine: I
::
started contributing to a package
called BayesFlow, which could be precisely
::
what's useful in your case here,
because your model structure doesn't
::
change,
::
if I understand correctly. Once you have the model structure, it's
::
kind of like a physics model.
::
It's not going to change when you have new
data, but the data sets do change.
::
So you have new data sets coming in.
::
And so that's where this
kind of inference, called amortized
::
Bayesian inference, could be extremely
useful, because
::
the computational bottleneck would
just happen once.
::
That would be when you train the deep
neural network
::
to learn the posterior structure and
parameters.
::
So instead of MCMC, you're using the deep
neural network to learn the posterior.
::
But once you have trained the deep
neural network,
::
posterior inference is trivial.
::
And so for that kind of model, where you
have a lot of data but the model stays the
::
same,
::
that's a very good use case for amortized
Bayesian inference.
::
So that could be something very
interesting here.
::
Yeah.
::
Happy to tell you more about that
afterwards if you're interested.
::
But yeah, I've started digging into that,
and that's super fun for sure.
::
So yeah, and I think this is a cool use
case.
::
Awesome.
::
Well, I still have a few questions, but
::
we're getting short on time, so can I
keep you a bit longer?
::
Yeah, just a few more minutes.
::
Sure.
::
Yeah.
::
Okay.
::
Awesome.
::
Yeah.
::
So actually, I'd like to pick your brain
a bit more about the
::
future.
::
I'm curious.
::
So let me fuse two questions.
::
So first, I'm curious: where do you
see the field of sports analytics heading
::
in the next
::
five to 10 years?
::
And as a sub-question, are there other sports,
specific sports, where you see significant
::
potential for growth in analytics?
::
Yeah, those are, those are good questions.
::
I think they go kind of hand in hand.
::
You know, it's hard. If I could predict the future, right,
::
I would probably have a different job.
::
I'd probably be retired.
::
But
::
I think a lot of the future is
going to be
::
sports like soccer, American football, and
hockey catching up.
::
And I think a lot of the growth is
actually going to be making sports
::
analytics more digestible for
everyday people,
::
the fans, right?
::
And that's happened over time, right?
::
If you watched a broadcast of a soccer
game
::
20 years ago, no one talked about expected
goals.
::
Now, most broadcasts will show it.
::
They might not always talk about it,
::
but they'll show it.
::
Like I said, expected goals are better
than just showing the score, but there's a
::
lot left to be done.
::
I think, to date, a lot of sports analytics
has been
::
very much focused on expected values,
::
and not enough has been focused on
distributions and variance around
::
estimates.
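To illustrate the gap between an expected value and a distribution, here is a sketch using expected goals. Assuming shots are independent (real shots within one attack are not), a team's goal count follows a Poisson-binomial distribution whose mean is exactly the summed xG, so two teams with the same xG can have very different spreads. Both shot lists and the helper names are hypothetical.

```python
def goals_distribution(shot_xg):
    """P(exactly k goals) for k = 0..len(shot_xg), via dynamic programming.

    Treats each shot as an independent Bernoulli trial whose success
    probability is its expected-goals value (a simplifying assumption).
    """
    dist = [1.0]  # before any shot: zero goals with probability 1
    for p in shot_xg:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # this shot misses
            new[k + 1] += prob * p      # this shot scores
        dist = new
    return dist


def mean_and_variance(dist):
    """Mean and variance of a distribution given as a list of P(k)."""
    mean = sum(k * p for k, p in enumerate(dist))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(dist))
    return mean, var


big_chances = [0.75, 0.75]  # hypothetical: two high-quality chances
many_shots = [0.1] * 15     # hypothetical: fifteen low-quality shots
for name, shots in [("big chances", big_chances), ("many shots", many_shots)]:
    dist = goals_distribution(shots)
    mean, var = mean_and_variance(dist)
    print(f"{name}: xG {sum(shots):.2f}, mean {mean:.2f}, "
          f"variance {var:.2f}, P(0 goals) {dist[0]:.2f}")
```

Both teams show 1.5 xG on a broadcast graphic, yet the many-shots team has a much larger variance and a far higher chance of being shut out.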
::
And so I think that's one place it's going
to have to end up going,
::
and part of the reason is, right, we
talk about neural networks.
::
Neural networks are very good at expected
values with really large data sets.
::
It's a lot harder, right?
::
Modeling variance is a lot harder, in
anything, than modeling expectations.
::
So I think catching up on some of those
things.
::
And also, like I said, I think we'll be taking a
step back. You know, there's
::
been a lot of good work that has been
done, but I think we're going to find a
::
few things where,
::
hey, maybe we were a little bit
overconfident, right?
::
And with everything in sports, it's always
about game theory.
::
So even if something is optimal today,
that strategy is not always going to be
::
optimal in the future.
::
And so in basketball, for example,
we talked about three-pointers.
::
Of course, three-pointers are really good
right now because they have higher
::
expected value, but defensively,
players are learning to play against
::
three-pointers better than they used to.
::
Or in American football, the numbers have
said you should pass the ball more.
::
Well, now the defenses are learning how to
defend it better.
::
And so running is going to be more
important than it used to be.
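The game-theory point can be illustrated with the textbook solution of a 2×2 zero-sum game. In this sketch the offense chooses pass or run, the defense chooses to focus on the pass or the run, and each cell is a hypothetical expected-points payoff to the offense; the numbers and the helper name are invented, and real play-calling involves far more than two options. When there is no pure-strategy equilibrium, the offense's optimal passing rate is the one that makes the defense indifferent between its two options.

```python
def solve_2x2_zero_sum(a):
    """Return (p_row0, value) for the row player of a 2x2 zero-sum game.

    a[i][j] is the row player's payoff (here: hypothetical expected points
    per play) when the row player picks i and the column player picks j.
    """
    (a11, a12), (a21, a22) = a
    # Check for a pure-strategy saddle point first.
    maximin = max(min(a11, a12), min(a21, a22))
    minimax = min(max(a11, a21), max(a12, a22))
    if maximin == minimax:
        # Pure strategy: play the row that guarantees the maximin payoff.
        p = 1.0 if min(a11, a12) >= min(a21, a22) else 0.0
        return p, maximin
    # Otherwise mix so the column player is indifferent between columns.
    denom = a11 - a12 - a21 + a22
    p = (a22 - a21) / denom
    value = (a11 * a22 - a12 * a21) / denom
    return p, value


# Rows: offense (pass, run); columns: defense (defend pass, defend run).
before = [[0.05, 0.30],
          [0.15, -0.05]]
# Hypothetical "defenses learned to defend the pass" scenario.
after = [[-0.10, 0.30],
         [0.15, -0.05]]
for label, game in [("before", before), ("after", after)]:
    p_pass, value = solve_2x2_zero_sum(game)
    print(f"{label}: pass {p_pass:.0%} of the time, value {value:+.3f} EP/play")
```

In this example, making the pass less effective against a pass-focused defense lowers the optimal passing rate, so the offense should run more often; the optimum moves as the payoffs move, which is the sense in which today's best strategy need not stay optimal.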
::
Right.
::
And so these things are always going to
change.
::
And so in five to 10 years, I don't know
exactly what it's going to be, but I think
::
in some ways, you might find
some analytics person in 10 years giving
::
the exact opposite advice of what we're seeing
now, just because the game has evolved.
::
The game has changed.
::
And so now you should do something else,
right,
::
to get an edge.
::
And so I think the growth is twofold.
::
One is always staying on the cutting edge
of what's next.
::
Sometimes that's going back to where you
were.
::
And the other, like I said, is making the numbers more
digestible for the everyday consumer.
::
You know, it's one thing for you and me
to talk about models.
::
I had to do this at ESPN all the time.
::
I can't talk about prior distributions on
TV,
::
right?
::
So how do we explain these things?
::
And I think what's really going to be key,
and this has happened already,
::
but it's going to keep on happening, is that
the analysts themselves are going to be
::
much more data literate than they have
been in the past.
::
Not just because they have more people
working with them or because they're younger.
::
The analysts of the future are also going
to be able to use AI to do their own
::
analysis.
::
And that could be scary because they might
make some bad assumptions,
::
but they're also going to be more data
savvy, and they could load up a data set
::
and use an AI tool,
::
even if they can't code, to get
insights that I used to have to
::
write some code to get. Now they
can just do it themselves.
::
Right.
::
And so that's another place, I think, where
teams and coaches are going to be able to
::
do more analysis on their own.
::
And it's not that the data people aren't
needed.
::
In fact, they're going to be needed even
more, to make sure that the coach isn't
::
missing an assumption, right,
::
about the structure of the data that he needs to be thinking about.
::
Because he might just think, great,
::
now I can run a regression,
::
I don't even need to know how to code it.
::
Right,
::
that's great.
::
But are you thinking about this?
::
And so there's going to be a lot of
education about using some of these tools
::
better, but everyone's going to
have access to them.
::
Right.
::
It's going to be so much more accessible
in the future than it has been in the
::
past.
::
Yeah, for sure.
::
Completely, completely agree with that.
::
And that's also something I'm very
passionate about.
::
That's also what this show
::
is here for, right?
::
It's to make the bridge between the
modelers and the non-stats people
::
easier, in a way.
::
And that's something I really love doing
in my job too: basically being that
::
bridge between the really nitty-gritty
details of the model
::
and then, OK, now that we have the model,
how do we explain to the people who are
::
actually going to consume the model
results what the model can do, what it
::
cannot do, and how we can
::
make decisions based on that, which
hopefully are going to be better decisions
::
than we used to make.
::
And also, how do we update our decisions?
::
Because, well, the game changes, as you
said so well.
::
So yeah, for sure, all that stuff is
absolutely crucial.
::
And I like using the metaphor of the
engine and the car, right?
::
Building the model is like building the engine
of the car.
::
So surely, you want the best engine
possible, but you also need a very cool
::
car, because otherwise, nobody's going to
want your engine.
::
And so
::
building all the communication
around the model, the visualizations,
::
things like that, is extremely important,
because in the end, as you were
::
saying at the beginning of the show, if
the model isn't used, well, that's not a
::
very good investment.
::
Yeah.
::
So I would literally have a
lot more questions on my list,
::
but we are going to call it a show, Paul,
because I don't want to keep you
::
three hours; you've already been very
generous with your time.
::
You can come back to the show anytime
you want, if you have a cool new
::
project you want to talk about, for sure.
::
Yeah, maybe we can record the French
version of the podcast sometime, you know.
::
Yeah, yeah.
::
I'll definitely be down for that.
::
You know, someone who will be very happy
is my mother.
::
She's always asking me, so when are you
going to do the French version of your
::
courses and your podcast?
::
I'm like, that's not going to happen, Mom.
::
Maybe that's what moms are for, though.
::
Exactly.
::
Before letting you go, Paul, I'm going to
ask you the last two questions
::
I ask every guest at the end of the show.
Because it's a Bayesian show, what
::
counts is not the individual point
estimate, but the distribution of the
::
responses.
::
First question: if you had unlimited time
and resources, which problem
::
would you try to solve?
::
Good.
::
That's a good question.
::
You sent me this ahead of time and I spent
a couple seconds and I was like, man, I
::
don't know.
::
But it's tough.
::
There's so many questions in sports.
::
Yeah.
::
I know.
::
I mean, one of my passions is
American football, and I just keep going
::
back to it.
::
As you can tell, I love American football,
and I love soccer, international football,
::
right?
::
And in both of those games,
::
there are certain positions where it's just
really hard to understand how valuable
::
they are.
::
And so in soccer, it's like the midfield.
::
We know you need a good midfielder,
but how do you measure that?
::
That's a really hard problem.
::
And in American football, there are a lot of
positions
::
like that as
well.
::
So I'd probably go somewhere along those lines.
::
I want to discover and
measure the value of these really
::
hard-to-measure traits in these two
sports.
::
Yeah.
::
Yeah, I definitely understand.
::
The battle for the middle is always extremely
important in soccer.
::
And if you look at all the teams that win
the Champions League, the Holy Grail,
::
like the Super Bowl of the soccer world,
almost all the time they have an amazing
::
pair or trio of
midfielders.
::
And that's like a sine qua non.
::
But...
::
As you were saying, it's extremely hard to
come up with a metric that's going to not
::
only explain why the midfielders are good,
but also help you consistently choose
::
midfielders that will increase your
probability of winning the Champions
::
League.
::
And I'm saying that as a very frustrated
Paris fan, because it's been years since
::
Thiago Motta basically retired, and we've been
looking for a number six,
::
the midfielder just in front of
the defense, and we're still looking for
::
him.
::
Yeah.
::
So please, Paul, let me know when you're
done with that.
::
Yeah.
::
Well, unfortunately, there are several
really good French midfielders.
::
They just don't play for PSG.
::
I know.
::
I know.
::
Not a lot of French players stay in
France.
::
That's why I'm telling you, we need a
Europe-wide league.
::
Then many more players would stay in France and
play for PSG, I guess.
::
And second question: if you could have
dinner with any great scientific mind,
::
dead, alive or fictional, who would it be?
::
Fictional?
::
I haven't really thought about fictional
scientific minds.
::
That is a good question.
::
Geez.
::
Man.
::
Well, I mean, I thought you were going to
answer very fast.
::
Actually, that one, I thought you were
going to answer Bill James like super
::
fast.
::
Bill James.
::
Yeah.
::
Well, I've met Bill James.
::
So, okay.
::
So I haven't had dinner with him, but I have met
him.
::
How liberal are you with
the words "scientific mind" here?
::
Yeah.
::
So when I think scientific mind, I think
Galileo, Newton, Einstein,
::
right?
::
Like,
::
you know, there's all those, but
from the sports world,
::
there's a former football player that
very few people have ever heard of, and his
::
name is Virgil Carter.
::
And the reason why I love him (he played
in the seventies) is that he wrote a paper
::
about expected points in football while he
was playing in the NFL.
::
And it was sort of the first sports
analytics
::
ever done in American football, and he was
a player in American football at the same
::
time.
::
So, not very well known.
::
He's still alive.
::
I don't know him at all, but he would be a
really cool person to have dinner with.
::
If I go classical
scientific minds, I would
::
probably go with Gauss, like, hey, this
distribution that has your name is
::
used everywhere, and it's very useful.
::
So I would stick with him.
::
Normal distributions,
::
Gaussian distributions, they rule the
world nowadays.
::
So I'd probably stick with that if I were
to go with a traditional scientific mind.
::
Yeah.
::
Yeah.
::
Good choices.
::
Good choices.
::
I am amazed by that Virgil Carter
story.
::
That's so amazing.
::
Yeah.
::
So if anybody knows Virgil Carter, please
contact us and we'll try to get that
::
dinner for Paul.
::
If you do that, I'll definitely be there to
grab the dinner and have a conversation
::
with Virgil, because having someone
like that on the show would be absolutely
::
amazing.
::
I love that story.
::
That's so amazing.
::
It's like, you know, the myth of the
philosopher king.
::
Well, here it's like the myth of the
scientist player.
::
I love that.
::
Yeah.
::
That's fantastic.
::
Damn.
::
Thanks a lot, Paul.
::
Let's call it a show.
::
Thanks for having me.
::
Yeah, that was amazing.
::
As usual, we'll put resources and a link
to your website in the show notes for
::
those who want to dig deeper.
::
Thanks again, Paul, for taking the time
and being on this show.
::
Thanks once again, I really enjoyed it.
::
This has been another episode of Learning
Bayesian Statistics.
::
Be sure to rate, review, and follow the
show on your favorite podcatcher, and
::
visit learnbayesstats.com for more
resources about today's topics, as well as
::
access to more episodes to help you reach
true Bayesian state of mind.
::
That's learnbayesstats.com.
::
Our theme music is Good Bayesian by Baba
Brinkman, feat. MC Lars and Mega Ran.
::
Check out his awesome work at bababrinkman.com.
::
I'm your host,
::
Alex Andorra.
::
You can follow me on Twitter at alex
underscore andorra, like the country.
::
You can support the show and unlock
exclusive benefits by visiting
::
patreon.com/LearnBayesStats.
::
Thank you so much for listening and for
your support.
::
You're truly a good Bayesian. Change your
predictions after taking information in,
::
and if you're thinking I'll be less than
amazing,
::
let's adjust those expectations.
::
Let me show you how to be a good Bayesian.
Change calculations after taking fresh
::
data in. Those predictions that your brain
is making, let's get them on a solid
::
foundation.