Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag 😉
Takeaways:
- Communicating Bayesian concepts to non-technical audiences in sports analytics can be challenging, but it is important to provide clear explanations and address limitations.
- Understanding the model and its assumptions is crucial for effective communication and decision-making.
- Involving domain experts, such as scouts and coaches, can provide valuable insights and improve the model’s relevance and usefulness.
- Customizing the model to align with the specific needs and questions of the stakeholders is essential for successful implementation.
- Understanding the needs of decision-makers is crucial for effectively communicating and utilizing models in sports analytics.
- Predicting the impact of training loads on athletes’ well-being and performance is a challenging frontier in sports analytics.
- Identifying discrete events in team sports data is essential for analysis and development of models.
Chapters:
00:00 Bayesian Statistics in Sports Analytics
18:29 Applying Bayesian Stats in Analyzing Player Performance and Injury Risk
36:21 Challenges in Communicating Bayesian Concepts to Non-Statistical Decision-Makers
41:04 Understanding Model Behavior and Validation through Simulations
43:09 Applying Bayesian Methods in Sports Analytics
48:03 Clarifying Questions and Utilizing Frameworks
53:41 Effective Communication of Statistical Concepts
57:50 Integrating Domain Expertise with Statistical Models
01:13:43 The Importance of Good Data
01:18:11 The Future of Sports Analytics
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.
Links from the show:
- LBS Sports Analytics playlist: https://www.youtube.com/playlist?list=PL7RjIaSLWh5kDiPVMUSyhvFaXL3NoXOe4
- Patrick’s website: http://optimumsportsperformance.com/blog/
- Patrick on GitHub: https://github.com/pw2
- Patrick on Linkedin: https://www.linkedin.com/in/patrickward02/
- Patrick on Twitter: https://twitter.com/OSPpatrick
- Patrick & Ellis Screencast: https://github.com/thebioengineer/TidyX
- Patrick on Research Gate: https://www.researchgate.net/profile/Patrick-Ward-10
Transcript:
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
Today's episode takes us into the dynamic intersection of Bayesian statistics and sports analytics with Patrick Ward, the Director of Research and Analysis for the Seattle Seahawks. With a rich background that spans from the Nike Sports Research Lab to teaching statistics, Patrick brings a wealth of knowledge to the table.
In our discussion, Patrick delves into how these methods are revolutionizing the way we understand player performance and manage injury risks in professional sports. He sheds light on the particular challenges of translating complex Bayesian concepts for coaches and team managers who may not be versed in statistical methods but need to leverage these insights for strategic decisions.
Patrick also walks us through the practical aspects of applying Bayesian stats in the high-stakes world of the NFL. From selecting the right players to optimizing training loads, he illustrates the profound impact that thoughtful statistical analysis can have on a team's success and players' [...]
For those of you who appreciate the blend of science and strategy, this conversation offers a behind-the-scenes look at the sophisticated analytics powering team decisions. And when he's not dissecting data or strategizing for the Seahawks, Patrick enjoys the simple pleasures of reading, savoring coffee, and playing jazz guitar.
This is Learning Bayesian Statistics, episode 111, recorded June 19, 2024.
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all.
And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs
Hello, my dear Bayesians, I have some exciting personal news to share with you. I am thrilled to announce that I have recently taken on a new role as a senior applied scientist with the Miami Marlins. In this position, I'll be diving even deeper into the world of sports analytics, leveraging Bayesian modeling, of course, to enhance team performance and player development. And honestly, this move is so exciting to me and solidifies my commitment to advancing the application of Bayesian stats in sports. If you find yourself in Miami, or if you're curious about the intersection of Bayesian methods and baseball or team sports in general, don't hesitate to reach out. OK, back to the show now.
Patrick Ward, welcome to Learning Bayesian Statistics.
Thanks for having me. I listen to every episode. I think every year at the end of the year, Spotify tells me that it's one of my highly listened-to podcasts. So it's a pleasure to be here. Hopefully... I don't know if I can live up to your prior. You've had some pretty big timers, but yeah.
No, yeah. So first, thanks a lot for being such a faithful listener. I definitely appreciate that. And I'm always amazed at the diversity of people who listen to the show. That's really awesome. And also I want to thank Scott Morrison, who put us in contact. Scott is working at the Miami Marlins. He's a fellow colleague now. That's a change for me, a great change. I'm extremely excited about that new step in my life. But today we're not going to talk a lot about baseball. We're going to talk a lot about US football. So today, European listeners, when you hear football, we're going to talk about American football, the one with a ball that looks like a rugby ball.
And so Patrick, we're going to talk about that. But first, as usual, I want to talk a bit more about you. Can you tell the listeners what you're doing nowadays? I gave your title and your bio in the intro, but maybe tell us a bit more in the flesh what you're doing and also how you ended up doing what you're doing.
Yeah. Well, currently I'm at the Seattle Seahawks, which is one of the American football teams in the NFL. And I'm the director of research and analysis there. So we kind of work across all of football operations: everything from player acquisition and front office type of stuff to team-based analysis and opponent analysis. And just kind of coordinating a research strategy around how we attack questions for the key decision makers or the key stakeholders across coaching, acquisition, even into player health and performance and development and things like that.
And I got here, this is my 10th year, I got here from Nike. So I was at Nike in the sports research lab, actually working for nearly two years as a researcher.
And the way that I got here was I was doing some projects for Nike around applied sports research, and at the time, I think they had just become like the biggest sponsor of the newly minted National Women's Soccer League. And they said, we want to do something around this. And so, you know, we were kind of kicking around ideas. And one of the ideas we had was, what if we went out and we tested all of the women in the league, like tested them sprinting and jumping and power output and things like that. And then we could basically build like archetypes, and that would be useful for, you know, apps in your watch and on your phone, and girls in the field could compare themselves to their favorite athletes and stuff. So they let us do it. And they sent me on the road for an entire off season, the entire off-season training of the National Women's Soccer League. I went around the country to every single team, myself and four colleagues, and we tested every woman in the league. And so we had the largest data set on women's soccer players that anyone could have.
So we did some conference presentations and things like that with that data. And lo and behold, Nike was there and the Seattle Seahawks called down to Nike and said, hey, we hear there's this test battery and we'd love to see what our players do on it. And so I went up and I did a project for them around that. And then they kind of just said, what if you just did this kind of stuff all the time? And so that's how I started out 10 years ago.
And I basically started out just in applied physiology, which was my background. And I was doing wearables, wearable tech for the team, like GPS and accelerometry and things like that. And then that kind of progressed into draft analysis and player evaluation and things like that. And it just kind of kept growing until, yeah, 10 years later, here we are.
Yeah, that's a great background. I love it because, I mean, it definitely seems like you've been into sports since you were at least a college graduate, but there is also a bit of randomness in this. So I love that. Of course, as a fellow Bayesian, I'm always interested in the random parts of anybody's journey. Actually, how much of the Bayesian stats do you have in that journey and also in your current work? How Bayesian, in a way, is your work right now, but also how were you introduced to Bayesian stats?
Well, I mean, anyone who has watched American football knows it's a game of very, very small sample sizes. We play 17 games now; up until two years ago, we used to only play 16 games. So it's unlike most of the other sports: baseball has got 162 games and several hundred at-bats; basketball and hockey have 82 games and many attempts. Also, the players in a lot of these sports are all doing the same things. In baseball, aside from the pitchers, everybody's going to go to the plate and hit. In basketball, everybody has a chance on the court to dribble the ball, shoot, score, pass, get assists, get blocks, et cetera. Football is really unique because it's a very tactical game. There are discrete events in terms of plays, stop and start. But because of the tactical nature of it and one ball, there's only certain positions that touch the ball. There's only certain opportunities that players are going to have. So that was always an issue.
159
And when I did my PhD, a big part of my
PhD was using mixed models to look at
160
physiological differences between players
on the field with GPS and accelerometry.
161
And I always thought of mixed models, even
though I didn't know it at the time,
162
because I hadn't really...
163
learned anything Bayesian yet.
164
I always thought of mixed models.
165
I think of them as like this bridge to
Bayesian analysis because you have these
166
fixed effects which behave like our
population averages, our population base
167
rates, I guess you could say.
168
And the random effects are sort of like,
hey, we know something about you or your
169
group and therefore we know how you
deviate from the population.
170
And then with those two bits of
information, we're also like, hey, here's
171
someone new in the population, or maybe
someone that we've only seen or observed
172
do the thing one time.
173
best guess therefore is the fixed effects
portion of this until proven otherwise.
174
So I always had that in the back of my
head going through this, but you know, my
175
first two or three years in the NFL, we
always just used to kind of throw our
176
hands up when we see small samples, we'd
be like, yeah, it's this, it's
177
50%, but it's such a small sample, we
can't really know.
178
And we didn't really have a good way of
like sorting out what to do with that
179
information.
180
Because as you know, know, something like
one out of 10 and 10 out of a hundred and
181
you know, a hundred out of a thousand,
those are the same proportions, but
182
different levels of information are
contained within those proportions.
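To make the point about proportions concrete, here is a minimal Python sketch. It assumes a uniform Beta(1, 1) prior and a binomial likelihood, which are illustrative choices and not something stated in the episode; the function name is hypothetical. The three records share the same raw proportion, but the posterior standard deviation shrinks as the sample grows.

```python
from math import sqrt

def beta_posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a success rate.

    Assumes a Beta(prior_a, prior_b) prior (uniform by default, an
    illustrative choice) combined with a binomial likelihood.
    """
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

# Same raw proportion (10%), very different amounts of information.
for successes, trials in [(1, 10), (10, 100), (100, 1000)]:
    mean, sd = beta_posterior_summary(successes, trials)
    print(f"{successes}/{trials}: posterior mean={mean:.3f}, sd={sd:.3f}")
```

The posterior means also differ slightly across the three cases because the prior carries more weight when there are only 10 observations.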
And I stumbled upon a paper, I think it was from like 1977 or something, by Efron and Morris, and it was called Stein's paradox. I probably stumbled on it because I was like, you know, there's so much in sabermetrics, someone in baseball has probably figured this out before. And so I was probably googling something like "small samples baseball statistics sabermetrics" blah blah blah, and I stumbled upon this paper about Stein's paradox. And the crux of the paper was: if we observe these, I think it was 12 or 18, baseball players through the first half of the season up to the All-Star break, and we see the number of times they went to the plate and the number of times they hit, we have a batting average. If we take the observed batting average through the first half of the season, how well does that predict the batting average at the end of the season, meaning now they've gone through the second half?
And you look at that and you're like, okay, what's this all about? And so the first thing they do is they set up this argument that, well, that doesn't do a very good job, because some of these players batted, you know, five times or three times. Certainly a player who went three for three has a hundred percent batting average. We don't think this is the greatest baseball player of all time yet, because we've only seen them do this thing three times. So the basic naive prediction of using the first half of the season to predict the second half wasn't very good.
And so in that paper, they introduced this kind of simple Bayesian model of saying, well, we know something about average baseball players. What if we weighted everybody to that? And lo and behold, that did a bit better of a job constraining the small-sampled players, you know, a guy that goes 0 for 10, which is totally possible in baseball when you have hundreds of at-bats. We don't think that's the worst hitter in baseball. And so, you know, constraining those players told them something about what they expected to then see at the end of the season.
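The idea of weighting everybody toward the average player can be sketched as a beta-binomial shrinkage estimate. This is an illustration only, not the estimator from the Efron and Morris paper: the prior mean of .265, the prior strength of 100 pseudo at-bats, and the third player's record are all assumptions picked for the example, and the function name is hypothetical.

```python
def shrunk_average(hits, at_bats, prior_mean=0.265, prior_strength=100.0):
    """Posterior mean batting average under a Beta prior.

    prior_mean and prior_strength (in pseudo at-bats) are assumed values
    for illustration; in practice they would be estimated from league data.
    """
    prior_hits = prior_mean * prior_strength
    return (prior_hits + hits) / (prior_strength + at_bats)

# (hits, at-bats): the first two records echo the examples in the
# conversation, the third is a hypothetical larger sample.
records = {"3 for 3": (3, 3), "0 for 10": (0, 10), "60 for 200": (60, 200)}

for label, (hits, at_bats) in records.items():
    raw = hits / at_bats
    print(f"{label}: raw={raw:.3f}, shrunk={shrunk_average(hits, at_bats):.3f}")
```

Players with only a handful of at-bats are pulled strongly toward the prior mean, while the player with 200 at-bats stays close to the observed average.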
And so through that paper, I then found this blog by David Robinson, who's an R programmer. And it was all about using empirical Bayesian analysis for baseball. And then he made it into a nice little book that you could buy on Amazon for like, I don't know, $20 or something. You know, and I read those two things and I was like, this is incredible. This is exactly what I've always wanted to know. And so I went in the next day to our other analyst, at the time there were only two of us, and I said, I think I figured out a way we could solve small sample problems. And that was it. After that, you really couldn't convince me that this wasn't a great way of thinking.
That doesn't mean that everything we do has to be Bayesian. Certainly there are other things that we do that use, you know, different tools like machine learning models and neural networks and things like that. But certainly when we start thinking about decision-making: how do I incorporate priors, domain expertise? How do I fit the right prior? You know, like if you went 0 for 5 in your first at-bats, let's say in baseball, but you were a college standout and you were an amazing player in AAA, I probably have a stronger prior that you're maybe a slightly better than average baseball player than if you went 0 for 5 and you were a horrific college player and you weren't very good in AAA and you were really the last person on the bench that we needed to call. And maybe that prior is much lower.
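One way to picture the two 0-for-5 players is to express each prior as pseudo-observations. The sketch below is a hedged illustration: the prior counts (29 and 21 hits per 100 pseudo at-bats) are invented stand-ins for what college and AAA performance might suggest, and the function name is hypothetical.

```python
def posterior_mean(hits, at_bats, prior_hits, prior_at_bats):
    """Posterior mean when the prior is expressed as pseudo-observations."""
    return (prior_hits + hits) / (prior_at_bats + at_bats)

# Both players start 0 for 5 in the majors. The priors are hypothetical
# summaries of what college and AAA performance suggested.
standout = posterior_mean(0, 5, prior_hits=29, prior_at_bats=100)
last_call_up = posterior_mean(0, 5, prior_hits=21, prior_at_bats=100)

print(f"college/AAA standout: {standout:.3f}")
print(f"last player called up: {last_call_up:.3f}")
```

The same five hitless at-bats lower both estimates a little, but the two players remain far apart because their prior information differs.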
And so utilizing that information in order to help us make decisions going forward, that was kind of the money for me. And so how much do we use it? I mean, one of our new analysts started two years ago. I think the first thing was like, how much do you know about Bayes? And it was like, well, I never really learned that in school and blah, blah. And it was like, okay, here's two books. Here's a 12-week curriculum. We're going to meet every week and you're going to do projects and homework and reading. And that was it. It was like, you have to learn this because this is how we're going to think. And this is how we're going to process information and communicate information.
Well, what about that? I told the listeners that we were not going to talk a lot about baseball, but in the end we are. It all comes back to baseball, I think.
Yeah, in sports analytics, it all comes back to baseball. Certainly, yeah.
Yeah, okay. If I understand correctly, you were motivated a lot by low sample sizes and being able to handle all of that in your models. That makes a ton of sense. Like a lot of people, and I've seen that with a lot of clients, you were definitely motivated by a very practical problem that you were having. I mean, most people enter the Bayesian field through that. I could keep talking about that for hours, but I really want to dive into what you're doing at the Seahawks and also, you know, how Bayesian stats is helpful to what you guys are doing. I think that's the most interesting for the listeners, to understand basically how they themselves could apply Bayesian stats to their own problems, which are not necessarily in sports. But I think sports is a really good field to think about that, because you have a lot of diversity and you also have a lot of somewhat controlled experiments. You have a lot of constraints, and it's always extremely interesting to talk about that.
Maybe you can start by basically explaining how Bayesian stats are applied in your current role for analyzing player performance and injury risk. Because now that I work directly in sports, something I'm starting to understand is that projecting player performance and also being able to handle injury risk are two extremely important topics. So maybe let's start with that. What can you tell us about that?
Okay.
Let's see. Which one should I start with? I guess I'll start with injury risk, I suppose. Injury is like... I mean, this is a super difficult problem to solve. You know, I've written a number of papers on those. I think you can link to my ResearchGate. And there's a number of methodology papers that we've written that have looked at things like this. And I think it's complicated because, one, there's a ton of inter-individual differences as far as why people get hurt. There's a ton of things that we probably, you know, don't know are important yet because we can't measure them, or we at least can't measure them in the real-world applied setting; maybe in a lab you can. And then there's other things that we just don't know because it's an epistemic problem. Like we're just stupid about it. We're naive; there's other things out there that maybe we're just unaware of yet. And so it's a really hard problem to try and solve.
So when I see papers that basically come out and say like "an injury prediction model", and they're estimating prediction as like a one or a zero, like a yes or a no, like a binary response, and they give a nice little two-by-two table and they talk about how well their model did, I'm always like, how is that useful to the people who actually have to do the work?
Because in reality, what we're dealing with is probably not unlike a hedge fund manager managing the risk of their portfolio. And if you think of each player or each athlete that you deal with as a portfolio, they each have some level of base risk. So if we know nothing about you, you really have to have a pretty good handle in your sport on what the base rates of injury risk are for position groups and players of different ages and things like that. So that might be an initial model, right? And then from there, the players go out and they do things and they play and they perform and they compete and they get dinged up and they take hits and they get, you know, hit by pitches or they get tackled really hard or things like that. And we collect that information and we're basically just shifting the probabilities up and down based on what we observe over time. And when that probability reaches a certain threshold (and of course you could use a posterior distribution, so you have an integral of how much of the probability distribution is above or below a certain threshold), then you have the opportunity to have a discussion about when to act or what to do.
And how you act and when to act is going to be dependent on your tolerance for risk or your coach's tolerance for risk. If it's your best player, if it's the MVP of your team and it's week two of the season and the risk probability... or let's say we're using this as a model. Some of the stuff we've worked on with Scott, who you mentioned earlier, is like return-to-play type of models, where it's like, okay, the athlete has, you know, an ankle sprain and we're there rehabbing. And we have, you know, a test or several tests, a test battery that tells us where that athlete is on their return-to-play timeline. Let's say it's week two of the season and we say, well, the posterior distribution looks like this. Here's the threshold at which we'd feel comfortable releasing this athlete back to full-on competition. And there's a 30% chance they're in good shape and there's a 70% chance that they're below that threshold. In week two of the season, we probably want to say, you know what? Let's not take that risk this week. Let's be a little bit more risk averse here because it is the best player. And let's wait till we have more of the distribution on the right side of the threshold. Alternatively, if it's the final game of the season, it's the Super Bowl or the World Series or the Champions League final or something like that, you're going to probably take that risk because you need the best player out there.
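The return-to-play reasoning above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Seahawks' actual model: the readiness scale, the simulated draws standing in for a posterior, the threshold of 100, and the required probabilities are all hypothetical, as are the function names.

```python
import random

def prob_above(draws, threshold):
    """Monte Carlo estimate of the posterior mass above a threshold."""
    return sum(d > threshold for d in draws) / len(draws)

def clear_to_play(draws, threshold, required_prob):
    """True when enough posterior mass sits above the readiness threshold."""
    return prob_above(draws, threshold) >= required_prob

# Stand-in for posterior draws of a readiness score (hypothetical scale),
# chosen so that roughly 30% of the mass is above the threshold.
rng = random.Random(0)
draws = [rng.gauss(95.0, 9.5) for _ in range(20_000)]
threshold = 100.0

p_ready = prob_above(draws, threshold)
print(f"P(ready) = {p_ready:.2f}")
# Week two: risk averse, so demand a high probability of readiness.
print("week two:", clear_to_play(draws, threshold, required_prob=0.80))
# Championship game: more risk tolerant, so accept a lower probability.
print("final:", clear_to_play(draws, threshold, required_prob=0.25))
```

The posterior stays the same in both situations; only the required probability changes, which is where the coach's or the staff's tolerance for risk enters the decision.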
And so when I think about injury risk modeling, what I really think about is how do we evaluate this individual's current status on our sort of risk score or our risk distribution? And when do we feel like we need to intervene and do something? And when are we going to feel like, this is fine, continue training as is? And I think that's the tricky part. I don't think it's easy. I don't think I've solved anything. I don't think anyone has, but certainly from the perspective of our staff, we can all sit down with a performance staff of strength coaches and dieticians and medical people and have these conversations. And what makes it nice about using a Bayesian approach is that we can also take into account domain expertise that we might not have in the data. So if we sit down in a Monday meeting and we say, you know, this player, this is where they're currently at and this is their risk status, and I don't know, I don't really feel comfortable with that. How do you feel about it? And then one of the medical people says, you know, he's been complaining that his hamstring feels really tight and he's been getting treatment every morning. Well, that's not data that we would be collecting, but that's valuable domain information that this individual who's working with the player now adds to this. And it's just like anything in probability. If we have two or three or four independent sources all kind of converging on the same outcome, on the same end point, we probably need to feel really good about making that decision and saying, hey, let's do something about this, let's act now, right? So that's how I think about it from that side of things.
From the performance side of things, the development side of things, it's probably going to be... I mean, it'd be way different for you guys in baseball, because you draft a player and you don't expect them to maybe get to the major leagues and contribute till 23, 24, 25 years old. You know, for us, you draft a player and, you know, next year they're playing, they're ready, they're in the mix. So in that regard, in my head, I would be thinking of it as models that are mapping the growth potential of an individual. How are they progressing through the minor leagues? Which attributes matter? And then maybe from there answering questions like, what's the probability that this player makes 20 starts in the major leagues or starts for three seasons, whatever end point makes sense to the decision makers, obviously.
You know, for us, it's more about player identification. And again, football is a sport of small samples. And so in their college years, some of these kids might really only be a starter or a full-time player in their junior and senior year, or maybe just their senior year of college. Additionally, unlike the NFL, where at that highest level the talent is much more homogeneous, you get to the college football ranks and you have just this diversity of talent, where you might have a big-time team playing a really lower-level opponent. And so you have to adjust things. Being able to hand off the ball to your running back who's playing against a very low-level opponent, and he goes for 500 yards or something absurd, 200, 300 yards in a game, that has to be adjusted and weighted in some way, because it's not the same as going 200 or 300 yards against a big-time opponent. And the big-time opponents are more similar to the NFL players that they're going to play against. And so, you know, all of these types of things fit into models, hierarchical models and Bayesian models, which help us utilize prior information.
And the other way that the Bayesian models are useful here: you know, sometimes we're dealing with information that's incomplete because we can't observe all of the cases. You know, for example, in college sport, Division I is the top division. You know, and then you have FBS and then there's Division II and Division III. So if you pull all the Division II kids that have ever made it as a pro athlete, the list is very small, but they're kids that made it. And so if you were to just build a normal model on this, it would say, well, the best players clearly come from these lower-level schools, because all of the ones that we have seen have made it, have been successful. And in theory, there's hundreds of thousands of kids from that level that have never made it. So we have to adjust that model in some way. We have to weight that prior back down. Yeah, this guy is really, really good at that level, but our prior belief on him making it is very, very low. And I mean, he'd have to be so exceptional in order to... and this is where, like, oftentimes people rail on, like, use weakly informative priors, let the data speak a little bit. But there are times in these situations where I feel like you could probably put a slightly stronger prior on this and be like, man, this guy's really gonna have to do something outstanding to get outside of the distribution that we believe is on this, just given what we know.
Okay, yeah, that's very interesting. That's a very good point. It's, yeah, related to survivor bias in a way. Concretely, how do you handle these kinds of cases? Is it a matter of using a different prior for these types of players or something?
I try to do this in a few different ways. One is you try and make basically like equivalency metrics, like saying if you did X at this low level, it in some way relates to Y at this other level. So you try and normalize players based on players that you've seen that have moved, say, between levels of the game. And so, again, if you think about it from a baseball perspective, you know, hitting 40 home runs in AA baseball might in some way convert to like 33 home runs in AAA and 24 home runs in the MLB, or 12 home runs in the MLB, or whatever it might be. Right. So trying to identify equivalencies between those that we can then use to constrain everybody.
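As a rough sketch of an equivalency metric, the snippet below converts a counting stat between levels with fixed ratios. The ratios are simply the ones implied by the illustrative 40, 33, and 24 home run figures in the conversation, not real estimates; in practice they would be estimated from players who actually moved between levels. The names `FACTOR_VS_AA` and `translate` are hypothetical.

```python
# Ratios implied by the illustrative numbers in the conversation:
# 40 home runs in AA ~ 33 in AAA ~ 24 in MLB. These are not real estimates.
FACTOR_VS_AA = {"AA": 1.0, "AAA": 33 / 40, "MLB": 24 / 40}

def translate(value, from_level, to_level, factors=FACTOR_VS_AA):
    """Convert a counting stat from one level to a rough equivalent at another."""
    return value * factors[to_level] / factors[from_level]

print(round(translate(40, "AA", "AAA"), 1))
print(round(translate(33, "AAA", "MLB"), 1))
```

A single multiplicative factor per level is the simplest possible choice; a fuller treatment would model the conversion with uncertainty and let it vary by player type.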
467
Another way is just, like you said,
putting a prior on it.
468
Knowing the level that the person is
playing at, you would have a lower
469
prior.
470
For example, it's just like playtime.
471
If I think about playtime and performance
as sort of this,
472
this kind of like rising curve that goes
to an asymptote of some upper level of
473
performance.
474
The players way at the left, who have a very
small number of observations, it would be
475
silly to say that my prior for those
players is the league average.
476
There's a reason why they're not playing
very much.
477
It's probably because people don't think
they're very good, right?
478
So somewhere in that curve,
479
for each of those numbers of observations
across whatever performance metric we're
480
looking at, there's going to be a specific
prior on that continuous distribution.
481
And that's where I would, you know, that's
where we would kind of draw a stake in the
482
ground and say like, we probably think
based on what we know that this player is
483
closer to these players than he is to
those players.
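The playing-time idea can be sketched as a prior mean that rises with the number of observations toward an asymptote, combined with the usual shrinkage of the observed rate toward that prior. The curve shape, the `floor`, `ceiling`, `scale`, and `prior_strength` values are all assumptions for illustration.

```python
import math


def prior_mean(n_obs, floor=0.35, ceiling=0.55, scale=200.0):
    """Saturating curve: the prior mean rises with playing time toward a ceiling."""
    return ceiling - (ceiling - floor) * math.exp(-n_obs / scale)


def shrunk_estimate(observed_rate, n_obs, prior_strength=50.0):
    """Weighted average of the observed rate and the playing-time-specific prior mean."""
    mu = prior_mean(n_obs)
    return (prior_strength * mu + n_obs * observed_rate) / (prior_strength + n_obs)


# Hypothetical players: a little-used player with a hot start and an everyday player.
bench_player = shrunk_estimate(observed_rate=0.70, n_obs=20)
regular = shrunk_estimate(observed_rate=0.60, n_obs=800)

print(f"prior mean at 20 observations:  {prior_mean(20):.3f}")
print(f"prior mean at 800 observations: {prior_mean(800):.3f}")
print(f"bench player estimate: {bench_player:.3f}")
print(f"regular estimate:      {regular:.3f}")
```

The little-used player is pulled toward a below-average prior rather than toward the league average, which is the "stake in the ground" described above.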
484
Okay, yeah, yeah, I see.
485
Yeah, definitely makes sense.
486
And yeah, yeah, like that point of play
time already tells you something.
487
Because if the player plays less, then
you very probably already have
488
information about his level.
489
And that means he's at least not as good
as the A-level players that play much more.
490
The only time you get in trouble with that
is like an endowment effect where if you,
491
you know, like in major league baseball,
there's been some research on players who
492
are drafted very high, in the first round
or second round, get progressed up and through
493
the minor leagues faster than players who
were drafted lower, even if they don't
494
outperform those players, just
as a consequence of being a
495
high draft pick.
496
That one's a tricky one, but there has to
be, at some point it's like actually, and
497
this is where, you know, posterior
distributions can really help. I mean,
498
it's almost like doing an A/B test.
499
Like we've got two players and what's the
probability that this guy is actually
500
outperforming the other guy, even though
the other guy might've been, you know, a
501
higher draft pick or something like that.
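The A/B-test framing can be sketched with two Beta posteriors and a Monte Carlo estimate of the probability that one player's true rate exceeds the other's. The success counts, the flat Beta(1, 1) priors, and the function name are assumptions for illustration.

```python
import random


def prob_a_beats_b(a_succ, a_trials, b_succ, b_trials, draws=20000, seed=1):
    """Monte Carlo estimate of P(rate_A > rate_B) under independent Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Flat Beta(1, 1) priors updated with each player's observed record.
        rate_a = rng.betavariate(1 + a_succ, 1 + a_trials - a_succ)
        rate_b = rng.betavariate(1 + b_succ, 1 + b_trials - b_succ)
        wins += rate_a > rate_b
    return wins / draws


# Hypothetical records: a lower pick (A) versus a higher pick (B).
p = prob_a_beats_b(a_succ=45, a_trials=100, b_succ=38, b_trials=100)
print(f"P(A is outperforming B) is roughly {p:.2f}")
```

Reporting this as a probability rather than a yes or no keeps the uncertainty in the conversation.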
502
And so you try and at least display, you
know, we try and at least display that
503
visually and have those conversations.
504
It's,
505
kind of in my head, at least maybe I'm
wrong, but a nice way of like helping
506
people understand the uncertainty, you
know, which is really important.
507
I always try, you know. I used
to work with a guy who, whenever I would
508
present some of the stuff at work,
would be like, stop doing that.
509
Like every, every time you present, you
talk about like what the uncertainty and
510
the assumptions and the limitations are,
like just give them the answers.
511
And I'm like, well, it's important that
they know what the limitations
512
and what assumptions are behind this
because we can't, we don't want to talk
513
past the sale and sell them on something
that, you know, isn't really there.
514
Like there's been times where I've had to
stop someone and just be like, hold on.
515
This analysis definitely can't tell us
that.
516
Like what you're saying right now, it
can't tell us that.
517
like, let's not, let's not try and make
this more than it is.
518
And also just, you know, conveying your
uncertainty.
519
I mean, that's just super important because
520
it's really, really hard.
521
I mean, we're all going to fail at trying
to identify talent.
522
It's really hard to identify why one
player is going to succeed over another.
523
So, you know, in some way it's not binary.
524
It's not a like, do you like this guy or
not?
525
Is he good or bad?
526
Is this guy better or worse than the other
guy?
527
There are a lot of factors that go into why
someone has success.
528
And so I think conveying that uncertainty
is really important.
529
And obviously, the more observations that
we have of you doing the thing, the more
530
certain we are that this is your true
level of performance.
531
But it takes a while to get there.
532
So we have to just be honest about that.
533
Yeah, yeah.
534
I think that's actually related to
something I wanted to ask you about also, a
535
bit more generally, you know:
536
the most significant challenges that you
face when applying Bayesian stats in
537
sports science and how you address
them, because I'm guessing that you
538
already started talking a bit about that.
539
So, let's go there.
540
And then I have other technical
questions for you about the kind of
541
models and usefulness that
Bayesian stats has in your field.
542
But I think this is a good moment to
address these
543
questions.
544
I think the biggest, well, there are a few
challenges.
545
One challenge is not everybody is excited
about a posterior distribution like you
546
might be.
547
Most of the time, they just want an
answer.
548
Tell me what to do.
549
Give me the yes or no, make it binary.
550
And so that's always tough.
551
And you're oftentimes trying to convey
this to non-technical audiences or people
552
who are good at doing other things.
553
They're not math people or they're not
stats people.
554
And that's okay.
555
So what always makes it challenging is: why
are you showing me this distribution?
556
I don't understand what I'm supposed to
take from this.
557
Just tell me
558
what to do.
559
Tell me which guy's better.
560
Tell me which guy's worse.
561
So that's always hard.
562
And that takes a lot of patience and
communication.
563
For a while, we used to do just weekly
sit-downs with our scouts where we would teach
564
them about, like, one stat a week.
565
And we'd go slow.
566
And we'd also try,
567
as best as possible, to relate things back to
the currency that they speak in.
568
And scouts and coaches, the currency they
speak in is video, not charts and graphs.
569
So the more that we can connect our
analysis to video cut-ups, the better, because then
570
they can see it.
571
And then they understand why a model says
what it says or makes a decision or why it
572
has assumptions.
573
And this is also super valuable too,
because they give feedback.
574
And they say, you know,
the model is saying that this is the
575
outcome, but I can see why: it's because
these four other things happened.
576
It's like, wow.
577
Well, we could probably account for that.
578
And I just didn't know it,
right?
579
That's why they're the domain experts and
I'm not.
580
so.
581
You know, the patience around
communicating stats and numbers is always
582
difficult and also knowing what people
like.
583
When I first started, everybody would tell
you, you need to have, you know, you've got to have
584
an amazing dashboard, you've got to have
charts and graphs, you know, and all that
585
stuff.
586
And what I found was there were a lot of
people who were like,
587
I don't even know what I'm looking
at.
588
Like, I hate these things.
589
Just give me the table of numbers.
590
It's like, okay.
591
Well, maybe a table of numbers with just
some conditionally formatted information.
592
And also, you know,
593
I have an academic side. I do supervise
PhD students and master's students, and I do
594
teach a master's class in statistics at
college.
595
So I guess what I'm about to say,
you know, people on the academic side would
596
hate it, but you have to recognize
the environment you're in.
597
And sometimes just changing the
verbiage helps. Like, instead of calling
598
things the
599
lower bound and the upper bound of the
credible interval, we just call them
600
the floor and the ceiling.
601
And people are like, yeah, this guy's
floor, it's a bit higher than the other
602
guy's floor.
603
And that guy's ceiling, this guy's got a
better ceiling.
604
And, you know, academically you'd get
shot for that, but those kinds of
605
things go a long way because it brings the
information to the end user.
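The "floor" and "ceiling" wording maps directly onto the two ends of a credible interval. A minimal sketch, assuming posterior draws are already available (here they are simulated from hypothetical Beta posteriors) and using a 90% equal-tailed interval, which is an arbitrary choice:

```python
import random


def floor_and_ceiling(draws, mass=0.90):
    """Equal-tailed credible interval from posterior draws, relabelled for the room."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    low_index = int(tail * (len(ordered) - 1))
    high_index = int((1.0 - tail) * (len(ordered) - 1))
    return ordered[low_index], ordered[high_index]


rng = random.Random(7)

# Hypothetical posteriors: player A has many observations, player B has few.
player_a = [rng.betavariate(46, 56) for _ in range(10000)]
player_b = [rng.betavariate(10, 12) for _ in range(10000)]

floor_a, ceiling_a = floor_and_ceiling(player_a)
floor_b, ceiling_b = floor_and_ceiling(player_b)

print(f"Player A  floor: {floor_a:.2f}  ceiling: {ceiling_a:.2f}")
print(f"Player B  floor: {floor_b:.2f}  ceiling: {ceiling_b:.2f}")
```

The player with fewer observations shows a lower floor and a higher ceiling, which is the same statement as a wider credible interval.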
606
And if you want them to start to...
607
take this information into their decision
calculus, you have to get them
608
comfortable.
609
And sometimes it's just meeting them with
terminology that helps.
610
And so I think that's a really, you know,
that's a big one.
611
Those are big challenges in communicating
this stuff.
612
Yeah, definitely.
613
And I resonate with that.
614
I've had the same issues.
615
I'll be able to
616
talk more precisely about sports in a few
months.
617
But when it comes to a lot of other
fields, whether it's marketing or biostats
618
or electoral forecasting, yeah, the
issues are related to these.
619
They're also extremely diverse.
620
So that's interesting.
621
You definitely don't have a one size fits
all.
622
Definitely what's extremely important
basically is to know the model extremely
623
well from my experience.
624
And yeah, if you have coded the model
yourself, you usually know it really well
625
because you spent hours on it to try and
get it to work and understand what it's
626
doing.
627
And when it's not able to do something, as you were
saying, I think it's extremely important
628
to be able to tell people what the model
cannot tell you.
629
And yeah, I think these are extremely good
points to try and balance what people are
630
usually wondering about.
631
And that's also where I think having the
Bayesian model is extremely interesting,
632
right?
633
Because the Bayesian model by definition
is extremely open-box and you have to write
634
down your assumptions.
635
And so you know much better what the model
is doing than a black box model.
636
Yeah, I mean, that's another good
point.
637
If you go into a meeting and you have
model outputs and, when
638
asked why does it prefer this over that,
639
your only reason is because the model said
so,
640
people aren't going to be super excited
about that.
641
Knowing why things are happening, you know,
I mean, this really
642
plays into how you validate and check
your models.
643
And so, you know,
644
within that Bayesian sort of world,
building simulations is a big part of it.
645
And building simulations to see how the
model behaves under different constraints
646
and different pieces of information,
that's really important because it gives
647
you useful context to talk about and it
gives you useful information in order to
648
head things off at the pass when you know
there's gonna be some gotchas and some
649
trouble if, you know,
650
people have certain types of questions.
651
You can head things off at the pass
because you're already aware of them.
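One kind of simulation that fits this description is a simulate-and-recover check: generate data from a known truth, fit the model, and count how often the interval contains that truth. The sketch below assumes a simple Beta-Binomial model with a hypothetical Beta(2, 2) prior and a 90% interval; the scenarios are invented to show where such a model behaves well and where the prior starts to dominate.

```python
import random


def interval_covers(true_rate, trials, rng, prior=(2.0, 2.0), n_draws=1000):
    """Simulate one dataset, fit the Beta-Binomial model, check the 90% interval."""
    successes = sum(rng.random() < true_rate for _ in range(trials))
    a = prior[0] + successes
    b = prior[1] + trials - successes
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    low = draws[int(0.05 * (n_draws - 1))]
    high = draws[int(0.95 * (n_draws - 1))]
    return low <= true_rate <= high


def coverage(true_rate, trials, n_sims=200, seed=3):
    """Share of simulated datasets whose interval contains the true rate."""
    rng = random.Random(seed)
    hits = sum(interval_covers(true_rate, trials, rng) for _ in range(n_sims))
    return hits / n_sims


# A typical rate with a decent sample, then a rare rate with very few observations.
typical = coverage(true_rate=0.40, trials=50)
sparse = coverage(true_rate=0.05, trials=10)

print(f"coverage, typical case: {typical:.2f}")
print(f"coverage, sparse case:  {sparse:.2f}")
```

A drop in coverage in the sparse case is exactly the sort of "gotcha" worth knowing about before someone asks the question in a meeting.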
652
Another thing that I do think is really
useful in this and maybe in some of your
653
prior work in consulting, I'm sure you've
stumbled on or used frameworks
654
like CRISP-DM and things like that.
655
In statistics, there's PPDAC:
problem, plan, data, analysis, and
656
conclusion.
657
Those types of frameworks help just
because again,
658
a lot of times we're dealing with
non-technical audiences and they're trying to
659
give you a question and say like, hey, can
we look at this?
660
And oftentimes these things are very vague
and, you know,
661
not clearly defined.
662
You know, my younger self would take
that and run away and, you know, do something
663
for a week or two and then come back and
be like, hey, here's this thing, you know,
664
that you asked about.
665
And, you know, the reply is usually,
that's kind of cool, but I was
666
thinking of it like this and I would do
this with it.
667
It's like, man, if you, you know, if you
told me that two weeks ago, I would have
668
done something else.
669
So using those kinds of frameworks
does a few things.
670
One, it gives us the opportunity to,
671
like I always tell our analysts,
question the question. You know,
672
question the question.
673
Right?
674
So when they have a question, I'm always
sitting there and I'm like, okay, well,
675
you know, what would you want to do with
this?
676
How do you foresee yourself using it to
make a decision?
677
What's the cadence that you would need to
access this information?
678
If I were to get it to you tomorrow, you
know, what would you, what kind of
679
decision would you want to make?
680
Like really kind of Socratic questions,
you know, question the question.
681
And that does a few things.
682
One, we get to, you know,
683
usually one of two different results.
684
Both of them are good.
685
The first is I get them to then walk
through that five minutes with me and
686
clearly define what it is they're looking
for.
687
That's great.
688
The other result is the opposite, but it's
also a good result, which is we get about
689
three minutes in and they go, you know
what?
690
I haven't thought about this well enough.
691
Let me think through it a bit more and
come back to you.
692
In which case I didn't waste the time
building things and scraping and cleaning
693
data and doing all that stuff.
694
The other thing that those frameworks do
is, and I try and get analysts to think
695
like this, is utilize each step within
those frameworks as touch points back to
696
the person who asked you the question.
697
Hey, this is where we're at.
698
We've collected this kind of data.
699
These are the things we're thinking.
700
These are the features that we're thinking
about using.
701
What do you think about that?
702
Anything else you can think of?
703
By doing that, along each step of the way,
they get to see the model developed.
704
They get to provide input.
705
And what that does is it gives them a bit
of ownership over it.
706
So when you get to the end result, they're
like, geez, this was built exactly in my
707
vision, and now I'm excited to use it.
708
And that's a really cool thing too.
709
Yeah.
710
Yeah.
711
Thanks for that detailed answer, Patrick.
712
I can definitely hear the 10 years of
experience working on that.
713
That makes me think about a lot of other
things.
714
Yeah, definitely the same for me, I would
say, where my personal evolution has been
715
trying to really understand the question
the consumer of the model is trying to get
716
to, right?
717
Like what actually is your question?
718
Because you have something in mind, but
maybe the way we're talking about it right
719
now and the way I have it in mind is not
what you want.
720
And so, yeah, as you were saying, a good
model is really one that's custom-made, and that's
721
fine and hard work, and that takes time.
722
So before investing all that time in doing
the model, let's actually make sure we
723
align and agree on what we're actually
looking at and studying.
724
That, I think, is extremely important.
725
Yeah, no doubt.
726
I think that's often the hardest part,
because it's just getting people to really
727
define the question.
728
That's probably, I mean, that and making
sure that you have good data.
729
Those are the two biggest things.
730
The model building part and things like
that sort of happen a little bit easier
731
once you do the first two things.
732
That's always the tough part.
733
Yeah, yeah, yeah.
734
Actually, continuing on that topic, how do
you communicate these statistical
735
concepts?
736
And honestly, a lot of them are really
complex.
737
So how do you communicate that to
non-stats people in your line of work?
738
I'm guessing that would be scouts, as you
talked about, coaches, players.
739
How do you make sure they understand?
740
what you're doing and in the end are able
to use it because we talked about that in
741
episode 108 with Paul Sabin.
742
If your model is awesome but not used,
it's not very interesting.
743
So yeah, how do you do that?
744
First, trying to really understand what
kind of cadence this is going to be on.
745
So some questions,
746
especially in sport, get asked
747
more from the knowledge
generation standpoint, meaning that I have
748
a question.
749
I think it'll help us with, you know,
updating our priors, our prior beliefs
750
about the game.
751
Maybe things have changed.
752
Maybe rule changes have altered things or
something like that.
753
Can we study this?
754
A question like that,
755
for knowledge generation, requires a
different output than something that's
756
for weekly or daily consumption.
757
So if it's for knowledge generation,
that's usually communicated in the form of
758
like a short written report.
759
The question at the top, the bottom line
up front, here's the four bullet points,
760
and then the nitty gritty.
761
Like this is how we went about studying
it.
762
Charts and graphs, and usually it's like a
page or two in a PDF or maybe like an
763
interactive HTML file where they can see
things and have a table of contents and go
764
to different sections.
765
If the question is directed at stuff
that's required to be evaluated weekly or
766
daily, like I need to see this every week
because we're going to be evaluating a
767
certain player or an opponent or I need to
see this daily because it's
768
player health related, something like
that.
769
We're always thinking in terms of like web
applications.
770
So, you know, I have to think
through the full-stack pipeline of, like,
771
where do we get the data?
772
Where does it live in the database?
773
What's the analysis layer?
774
Kick it out to an output.
775
Where's that output stored?
776
And then how does the website ingest that
output and make it consumable?
777
And for that,
778
It's usually some form of charts and
graphs and a table.
779
And usually it's interactive stuff.
780
So they can sort and filter and hover over
points and access the information.
781
And again, as best as possible, I'm always
thinking to try and develop that in the
782
way that they're going to use it.
783
So like I was sitting down, for example,
today with our director of player health,
784
and he was like, you know,
785
I'd love to have this information daily so
that I can relay it to the new coaching
786
staff.
787
And I want to say it, you know, say these
things.
788
Okay, great.
789
I have all that information.
790
I have all of those models, but come
over to the whiteboard and draw for me the
791
path that you want to take in going from
sitting at your desk
792
and reading the information from a webpage
to how you want to communicate it.
793
And as soon as he started drawing it out,
it's like, okay, I know exactly what to do
794
now.
795
That's perfect.
796
Otherwise I would have built something
that in my head I thought would be useful,
797
but maybe not useful to him.
798
And then he uses like part of it or maybe
because he's super motivated, he's going
799
to use it.
800
And he's also going to,
801
use like 10 other things to get the other
stuff he wants, but he's a nice guy and he
802
doesn't want to tell me that it doesn't
have all the things that he needs.
803
And so then like four weeks later, I walk
in his office, I'm like, what are you
804
doing?
805
He's like, oh, I go here and then I get
this information from this webpage, but
806
then I go to these other three webpages
again.
807
So, whoa, whoa, whoa, why didn't you just
tell me that?
808
Like, I could make this all into
one thing.
809
Like, you don't have to do that.
810
That's a really important piece is knowing
how the data is going to be utilized,
811
making sure that it's exactly in the order
that the decision maker requires it.
812
Yeah.
813
Awesome points.
814
Yeah.
815
Thanks for that, Patrick.
816
And I think it's also very valuable to a
lot of listeners because we're talking
817
about a professional sports team here, but
it is definitely transferable to
818
basically, I think, any company where
you're working with
819
different people who are using the models
but are not themselves producing the
820
models.
821
It's like almost every company out there.
822
yeah, I think and also from my experience
doing consulting in a lot of different
823
fields, I can definitely vouch for the
things you've touched on here.
824
yeah, thanks.
825
That's definitely, I think, very valuable.
826
Let's turn back a bit more to the technical
stuff because I see time is running and I
827
definitely want to touch a bit more on the
sport side of things and how Bayesian stats
828
is applied in the field.
829
Obviously a very important part of your
work is, I'm guessing,
830
drafting players, player selection
processes.
831
So yeah, how might Bayesian methods be
applied here to improve the draft
832
strategies in the player selection
processes?
833
Yeah, well, again, like I think I said
earlier, everybody's going to miss.
834
It's impossible to be, you know...
835
to have a good hit rate and always be
picking, you know, picking players who are
836
going to reach high level success.
837
And a lot of that is just because, you
know, performance and talent are extremely
838
right-tailed.
839
You know, you have a whole bunch of
players that never make it.
840
You have a small group that make it and
are good enough to make it.
841
You have an even smaller group that are
good enough to make it and like really
842
good to play all the time.
843
And then you have
844
a few Hall of Famers sprinkled in, right?
845
So it's really right-tailed.
846
It is very hard to do this stuff.
847
So, you know, understanding or modeling
your uncertainty, that's really important.
848
And
849
information from the domain experts, you know,
scouts see things on film that we can't
850
see in numbers and vice versa.
851
One of the values that we have is we can
process way more players than any one
852
human can actually watch.
853
So we have the ability to build models
that can identify players and hopefully
854
get them,
855
over to the domain experts who have to
then watch the film and write the reports
856
and say like, hey, did you know this guy
was really good in these things?
857
This is his potential ceiling.
858
And we think that we have, you know, we
think that this would be valuable for our
859
team, right?
860
Building models like that, that help us
861
identify talent and give us a range of
plausible outcomes.
862
One, it helps us get information to the
people who have to watch the film and make
863
the decisions.
864
Two, it helps us have discussions about
when the appropriate time to acquire
865
people is.
866
If you're sitting there, obviously, you
know, in the major league draft, major
867
league baseball draft, it would be the
same thing.
868
Everybody knows who the first round picks
are and the second round picks.
869
It's after that, that things become pretty
sparse.
870
And if you can identify players that have
unique abilities later in the draft, that
871
opens up a lot of opportunities to,
872
select players that might be able to
contribute successfully to your team.
873
And so that's really where those models
help us.
874
The other area that they help us in is, I
always talk about with our analysts, like,
875
what is the benchmark that you're trying
to beat?
876
So every model, like you can't just build
a model.
877
I mean, I remember one of our analysts,
she had a model and she said, I built a
878
model and I think it's really good.
879
And I said, cool.
880
How well does it do against the benchmark?
881
She's like, well, what do you mean?
882
And I was like, well, like how well does
it do against if we just use, let's say
883
scout grades or if we just use public
perception, how well does it do
884
historically against that?
885
She's like, no, no, no, I don't care about
that.
886
Like, this model is just with their stats,
you know. And it's like, no, no, but
887
you have to care about that because if
it's not better than those things, then
888
why would we use it?
889
Right?
890
You have to be able to beat that
benchmark.
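A benchmark comparison can be as simple as scoring the model and the alternatives on the same historical outcomes. The sketch below uses the Brier score, which is one reasonable choice rather than the metric named in the conversation, and all probabilities and outcomes are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical historical prospects: 1 = became a regular contributor, 0 = did not.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1]

# Hypothetical probabilities from the new model and from scout grades alone.
model_probs = [0.7, 0.2, 0.3, 0.6, 0.1, 0.4, 0.2, 0.5]
scout_probs = [0.6, 0.4, 0.3, 0.5, 0.3, 0.5, 0.2, 0.4]

# Simplest possible benchmark: give every player the historical base rate.
base_rate = sum(outcomes) / len(outcomes)
baseline_probs = [base_rate] * len(outcomes)

model_score = brier_score(model_probs, outcomes)
scout_score = brier_score(scout_probs, outcomes)
baseline_score = brier_score(baseline_probs, outcomes)

print(f"model:     {model_score:.3f}")
print(f"scouts:    {scout_score:.3f}")
print(f"base rate: {baseline_score:.3f}")
```

If the model's score is not clearly better than the scout grades or the base rate on past data, there is little reason to use it.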
891
One of the areas where we can really beat
a benchmark is when we combine the domain
892
experts' information with the actual
observed data.
893
And a Bayesian model allows us to do that,
right?
894
It allows us to take the domain
expert, who's maybe scoring the player a
895
certain way, writing information about the
player.
896
It allows us to take that information,
897
mix it with the numbers and get a model
that is, I guess, man and machine, right?
898
And those models beat our benchmark much
better than any one of these alone, right?
899
If we just use numbers, never watched any
film, never knew anything about the
900
player, or if we just use domain expert
information.
901
When we combine those things, we tend to
do a much better job.
902
And so that's where Bayesian analysis
really helps us.
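One simple way to sketch the "man and machine" combination is a normal-normal update in which the scout's grade sets the prior and the statistical estimate plays the role of the data. The scale, the means, and the standard deviations below are hypothetical, and treating the scout grade as a normal prior is an assumption made for brevity.

```python
def combine(prior_mean, prior_sd, data_mean, data_sd):
    """Precision-weighted combination of a normal prior and a normal data estimate."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_sd ** 2
    posterior_variance = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_variance * (
        prior_precision * prior_mean + data_precision * data_mean
    )
    return posterior_mean, posterior_variance ** 0.5


# Hypothetical grades on a common scale: the scout is lukewarm and fairly sure,
# while the numbers are bullish but come from weaker competition (more noise).
scout_mean, scout_sd = 50.0, 5.0
stats_mean, stats_sd = 65.0, 8.0

post_mean, post_sd = combine(scout_mean, scout_sd, stats_mean, stats_sd)

print(f"combined estimate: {post_mean:.1f} (sd {post_sd:.1f})")
```

The combined estimate lands between the two sources, closer to whichever one is more certain, and it is tighter than either source alone.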
903
And also,
904
That's where you start to get interesting
discussions about the floor and the
905
ceiling of a player.
906
Because now once you run their posterior
distribution and the domain expert's
907
information is in there and you're saying,
yeah, this guy, he's awesome at tackling
908
and he'll be a great tackler and blah,
blah, blah.
909
And these are his numbers.
910
The numeric model says like, yeah, I think
this guy's a pretty good tackler.
911
The domain expert's saying like, no, no, no, I
watched him and he doesn't play against
912
great competition, and his technique is
really bad.
913
It's not going to translate against these
bigger players.
914
It's like, well, that's not information
that maybe our stats would have.
915
But when we combine those two bits of
information, all of a sudden, our maybe
916
overly bullish belief in this player gets
brought down a bit.
917
And utilizing the information like that
918
is interesting and it also makes it unique
to the people that are in that room, the
919
domain experts that you have in that room
and things like that.
920
How you weight those things is really
important.
921
For our own analytics staff, we'll do
things like we'll build our own separate
922
models and have our own meetings and we'll
build our own analysis.
923
So we'll have independent models all
against each other and maybe we'll have
924
them weighted or we'll use,
925
you know, like a triangular prior and build
them together and, you know, mix them
926
together and get posterior simulations.
927
And we try and do those things in a way
that allows us to understand all the
928
plausible outcomes that might be relevant
for this individual.
929
It's fascinating.
930
Yeah.
931
And I really love about that field the fact
that you have to blend a lot of different
932
information.
933
Like the domain knowledge from the scouts,
the benchmark from the markets, the models
934
that you have in house, also scientific
knowledge of all the scientists that the
935
team has inside of it.
936
And that makes it all that much more complicated,
right?
937
I'm guessing sometimes as the modeler, you
would probably be like, my God, that'd be
938
so much easier if we could just run some
very big neural network and that'd be
939
done.
940
But at the same time, I think it's what makes
the thrill of that field, at least for me,
941
is that, no, that stuff is really hard.
942
There is a lot of randomness.
943
There are a lot of things we don't really
understand either.
944
And you have to blend all of these
elements together to try and make the best
945
decisions you can, even though you know
you're not making the optimal decisions,
946
as you are saying.
947
And I think it's a fascinating field to
study important decision-making under
948
uncertainty.
949
Yeah, for sure.
950
I think that's the thing that's most
951
interesting about it to me.
952
Like, yeah, that stuff is
fascinating.
953
Decision-making under uncertainty is
really challenging, and I think that's the
954
thing that makes this, you know,
the coolest stuff to work on.
955
Yeah, yeah, no, definitely.
956
Actually, maybe a last question on the
technical side.
957
Now if we look, so we've talked about the
beginning of the career of a player,
958
right?
959
Like the draft.
960
We've talked about...
961
kind of the whole lifetime of the player,
which is projection, performance
962
projection over the whole career.
963
Now I'm wondering about the day-to-day
stuff.
964
What can Bayesian models tell us here or
how can they help us in predicting the
965
impact of training loads on the athletes'
wellbeing and performance?
966
I know, I think it's kind of a frontier
967
in almost all the sports, but I'm curious
what the state of the art here is,
968
especially in US football.
969
Yeah, it really is, I think, the sort of
one of the final frontiers, I guess, in
970
sport.
971
Team sport is just challenging because you
perform well or you win or you lose due to
972
a whole bunch of issues that sometimes
973
have nothing to do with you.
974
For example, I can train you, you know, we
could train you and you could be very fit
975
and strong.
976
And if in the last play of the game, the
quarterback throws the ball to a patch of
977
grass and you lose, it had nothing to do
with you being fit and strong.
978
You know, contrast that with individual
sport athletes.
979
If you're a 400 meter runner, a cyclist, a
swimmer, a runner, a marathoner, you know,
980
physiologically,
981
if we build you up, we have a much more
direct line between how you develop and
982
how it directly relates to your
performance.
983
There's not a lot of other information
there.
984
No one's trying to tackle you on the bike
or in the pool or something like that.
985
So that makes team sport much
more difficult.
986
Baseball is probably the closest because
even though it is a team sport,
987
it really is this sort of zero-sum duel
between a pitcher and a batter.
988
And one guy wins and one guy loses.
989
And the events are very discrete.
990
The states of the game have been played
out, you know, runner on first and second with
991
two outs, bottom of the third, blah, blah,
blah.
992
So it's maybe a little bit more clear in
baseball.
993
I think in the other team sports, in the
kind of invasion sports,
994
what makes this challenging is
identifying,
995
I always try and take it back to
identifying, the discrete events that we're
996
trying to maybe measure
against.
997
For example, I can give you
a pretty clear example from basketball.
998
I was talking with a friend at an NBA
team and he was like, yeah, you know,
999
our coach and our scouts and
the, you know,
Speaker:
coaches feel like our players don't close
out three pointers fast enough.
Speaker:
And I was like, well, is that a tactical
problem or is it a physical problem?
Speaker:
And he's like, well, how would we look at
that?
Speaker:
And I was like, you have the player
tracking data.
Speaker:
And if you know every time your team's on
defense, which is easy to know, and you
Speaker:
know every three pointer that's been shot
against your defense, if you were to take
Speaker:
that frame,
Speaker:
out of the player tracking data and maybe
like the frame a second to a second and a
Speaker:
half before that.
Speaker:
So all of that information for every one
of those three pointers.
Speaker:
You have an idea of the relationship
between your player and the player who's
Speaker:
taking the three point shot.
Speaker:
You have an idea of the relationship
between your player and the other players
Speaker:
on his team.
Speaker:
So you know from a technical, a tactical
standpoint.
Speaker:
you know what type of like formation or
defense you're trying to run.
Speaker:
So first things first, are the players in
the right position to close out that three
Speaker:
pointer?
Speaker:
Maybe, you know what?
Speaker:
Our guys consistently mess up the
defensive shape and when they get in
Speaker:
there, they give too much ground to the
guy shooting a three pointer.
Speaker:
The other is the physical standpoint of,
well, no, they're in good position, but
Speaker:
when they go to close it out over that
second and a half,
Speaker:
They're not fast enough to get there.
Speaker:
Okay, great.
Speaker:
Now roll it back to what you can measure
in the gym.
Speaker:
Is there some measure, let's say on a
force plate of the amount of impulse or
Speaker:
force under the force-time curve that the
player outputs that can tell us something
Speaker:
about their ability to move rapidly, apply
force into the ground, move rapidly to
Speaker:
close out that three pointer?
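As a rough illustration of "impulse under the force-time curve", here is a minimal sketch that integrates a force-plate trace with the trapezoidal rule. The function name `net_impulse`, the 1000 Hz sample rate, the body-weight subtraction, and the numbers are all assumptions for the example, not details from the conversation.

```python
def net_impulse(force_n, body_weight_n, hz):
    """Net impulse in newton-seconds from a vertical force trace.

    force_n: force samples in newtons.
    body_weight_n: the athlete's body weight in newtons, subtracted so
        that only force above body weight counts.
    hz: assumed force-plate sample rate in samples per second.
    """
    dt = 1.0 / hz
    net = [f - body_weight_n for f in force_n]
    # Trapezoidal rule: average each pair of neighbouring samples, times dt
    return sum((a + b) / 2.0 * dt for a, b in zip(net, net[1:]))


# Made-up trace: a constant 1800 N push for 0.2 s by an 800 N athlete
trace = [1800.0] * 201
impulse = net_impulse(trace, body_weight_n=800.0, hz=1000.0)
```

A per-player value like this is the kind of gym measure that could then be set against on-court closeout performance.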
Speaker:
And maybe if you look at several years
worth of data, you'd find
Speaker:
The top players on your team all do this
thing really well, and some of the worst
Speaker:
players at closing out the three do this
thing poorly.
Speaker:
And so now you have something to say about
like, hey, what if we develop this quality
Speaker:
in the off-season in our players, would
we be able to close out the three pointers
Speaker:
more effectively, more efficiently?
Speaker:
And so I think from that standpoint,
linking the development piece to sport,
Speaker:
team sport, invasion sport.
Speaker:
You have to really think about the
discrete events of the game and how you
Speaker:
can kind of tease those out of, let's say
the player tracking data.
Speaker:
And it's like super hard in something
like, you know, in football, because
Speaker:
players all do really different things.
Speaker:
You know, the linebacker does something
totally different than the offensive
Speaker:
lineman.
Speaker:
And so you have to really get down to the,
the domain of each of those positions and
Speaker:
say like, gosh, what are the discrete
events?
Speaker:
that define what this position does, then
how do we measure success in those?
Speaker:
And then if we can measure success, how do
we identify the archetype of players who
Speaker:
are good at those things?
Speaker:
And then if we can do that, maybe then we
can start to talk about, is this something
Speaker:
that you can develop in a player?
Speaker:
Is it something that you have to identify
in a player?
Speaker:
That's sort of the, in my head, I mean, I
don't know, I could be wrong.
Speaker:
This is not.
Speaker:
Nobody... I think everybody's trying to figure
this out, but I could be wrong.
Speaker:
But in my head, that's at least the
process that I would, you know, I try and
Speaker:
think through when I think about these
things.
Speaker:
Yeah.
Speaker:
Yeah.
Speaker:
It makes a ton of sense.
Speaker:
I mean, and it seems like, yeah, that, and
there are so many areas, open areas of
Speaker:
research on all of that stuff.
Speaker:
That's just, just fascinating.
Speaker:
I'm already thinking, that'd be amazing to
have a huge Bayesian model where you have
Speaker:
all of those topics that we've talked
about.
Speaker:
Basically, it could be a big Bayesian model
where you have a bunch of likelihoods.
Speaker:
And yeah, that'd be super fun.
Speaker:
I'm guessing we're still a bit far from
that, but maybe not too far.
Speaker:
Hopefully in a few years, that'd be
definitely super fun.
Speaker:
Yeah, no doubt.
Speaker:
Yeah, I mean, and that's, definitely
doable.
Speaker:
But yeah, you need really good
data and you need really good structure in
Speaker:
your model.
Speaker:
Yeah, that's the part too, is getting
good data. You know, player tracking
Speaker:
data is fine.
Speaker:
I mean, it has errors, you know, people
who think that it's like a panacea, you
Speaker:
know, it's like, have you really worked
with it?
Speaker:
I mean, there's
Speaker:
Sampling at 10 hertz for humans that move
really, really fast.
Speaker:
Acceleration is a derivative of speed.
Speaker:
At 10 hertz, people who are moving really
fast, that data gets noisy pretty quick.
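To see why differentiating 10 Hz data gets noisy, here is a minimal simulation sketch. It assumes a player moving at a constant true speed, independent Gaussian measurement noise on speed with a standard deviation of 0.1 m/s, and a simple finite-difference derivative; all of these are illustrative assumptions rather than properties of any particular tracking system.

```python
import random
import statistics


def differentiate(values, hz):
    """Finite-difference derivative of evenly sampled values."""
    return [(b - a) * hz for a, b in zip(values, values[1:])]


random.seed(0)
hz = 10.0              # assumed tracking sample rate
speed_noise_sd = 0.1   # assumed measurement noise on speed, in m/s
true_speed = 5.0       # constant speed, so the true acceleration is zero

measured = [true_speed + random.gauss(0.0, speed_noise_sd) for _ in range(2000)]
accel = differentiate(measured, hz)

# Everything in `accel` is noise, because the true acceleration is zero.
accel_sd = statistics.pstdev(accel)

# For independent noise, the difference of two samples has sd * sqrt(2),
# and dividing by the 0.1 s time step multiplies that by hz.
expected_sd = speed_noise_sd * hz * 2 ** 0.5
```

Under these assumptions a small error in speed becomes an error in acceleration more than ten times larger in numeric terms, which is why tracking-derived acceleration is usually smoothed or modelled before use.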
Speaker:
I think one of the things is as we
progress, as the technology keeps
Speaker:
improving, things get better.
Speaker:
You get better data and maybe that helps
you also answer some of these questions a
Speaker:
little bit more specifically.
Speaker:
yeah.
Speaker:
And then we'll be able to have our huge
Bayesian model with a lot of different
Speaker:
likelihoods in there that fit into each
other.
Speaker:
And then we don't even need to play the
game.
Speaker:
We don't have to play the game.
Speaker:
They just let the computers play the game
and it's over.
Speaker:
We're done.
Speaker:
Yeah, no.
Speaker:
No, you still have to play the game
because you still have randomness.
Speaker:
Then you're like, yeah.
Speaker:
I mean, because otherwise the model is kind
of like if you want kind of a quantum
Speaker:
state, right?
Speaker:
Where the model can see the probabilities
of things happening, but then you have to
Speaker:
open the box and see what is actually
happening.
Speaker:
So you can have the best model.
Speaker:
In the end, you still have to play the
game to see what's going to happen because
Speaker:
it's not deterministic.
Speaker:
Yeah, thankfully.
Speaker:
yeah, that's right.
Speaker:
Yeah.
Speaker:
Yeah.
Speaker:
But I mean, I always love
doing these big models.
Speaker:
And that's definitely doable.
Speaker:
I've done that for election forecasting,
for instance, where you have several
Speaker:
likelihoods, one for polls, for instance,
and one for elections.
Speaker:
So yeah, I know that's definitely
doable in the Bayesian framework, because
Speaker:
I mean, why not?
Speaker:
It's just part
Speaker:
of the same big model in a directed
acyclic graph, if you want.
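A minimal sketch of what "several likelihoods in one model" means: two data sources that both inform the same latent quantity, so their log-likelihoods simply add. This is not the host's election model. It assumes a flat prior, treats both sources as binomial counts for simplicity, uses a plain grid approximation instead of a sampler, and the counts are made up.

```python
import math


def log_binom_kernel(k, n, theta):
    """Binomial log-likelihood up to a constant that does not depend on theta."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def grid_posterior(datasets, n_grid=1000):
    """Posterior over a shared proportion theta, given several (k, n) datasets.

    A flat prior is assumed, so the log-posterior is the sum of one
    log-likelihood per data source.
    """
    # Midpoints of equal-width cells, which avoids theta = 0 and theta = 1
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_post = [sum(log_binom_kernel(k, n, t) for k, n in datasets) for t in grid]
    top = max(log_post)  # subtract the maximum before exponentiating
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


polls = (52, 100)      # made-up: 52 of 100 poll respondents in favour
results = (530, 1000)  # made-up: 530 of 1000 counted votes in favour
grid, post = grid_posterior([polls, results])
post_mean = sum(t * p for t, p in zip(grid, post))
```

In a realistic model each source would get its own likelihood with its own noise structure, but the idea is the same: every observed dataset is another node hanging off the shared parameters of one graph.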
Speaker:
But yeah, I'm curious to see that done in
sports.
Speaker:
Maybe we'll get back together for another
episode, Patrick, where we talk about that
Speaker:
and how we did that.
Speaker:
That'd be cool.
Speaker:
Yeah, there you go.
Speaker:
Yeah, actually, I wanted to ask you to
close us out here.
Speaker:
about, you know, you've started talking about
that right now, like some emerging trends
Speaker:
in sports analytics that you believe will
significantly impact how teams manage
Speaker:
training, performance, drafting in the
near future.
Speaker:
And also if there are any sports you see as
more promising than others.
Speaker:
Well, I mean, yeah, trends.
Speaker:
Yeah.
Speaker:
We talked a lot about that stuff and I
think, you know, better data and better,
Speaker:
you know, better technology.
Speaker:
All of those things, I think,
will help us.
Speaker:
I think also it's getting, you know,
getting the decision makers comfortable
Speaker:
with the utility of some of this stuff,
you know, baseball, has always been a game
Speaker:
of numbers.
Speaker:
And I think early,
Speaker:
maybe mid-2000s, 2004, five, six,
seven, you know, releasing data kind of to
Speaker:
the public, really the first sport to get
player tracking data, things like that.
Speaker:
I think that opened up a lot of
opportunities for people to do really
Speaker:
interesting work in the public space,
which then sort of got
Speaker:
teams interested and then sort of a, you
know, more of a shift in people in the
Speaker:
front office where, maybe historically it
was ex players who kind of played out
Speaker:
until they retired and then became scouts
and managers and things like that.
Speaker:
I think that, you know, that happening in
baseball was a really good thing for that
Speaker:
sport.
Speaker:
And I think slowly for the other sports,
that really
Speaker:
probably needs to happen because the more
that these things are open and sort of
Speaker:
curbside, I think the more the decision
makers become comfortable with them and
Speaker:
can say like, I can see how I would use
this.
Speaker:
I can see what this might help me with.
Speaker:
So I think: never underestimate the
work that you do in the public space
Speaker:
because I think there's an opportunity to
always,
Speaker:
you know, help things evolve,
crowdsourcing, I guess.
Speaker:
Yeah, I mean, preaching to the choir here.
Speaker:
Yeah, for me, a lot more of these data
would be open sourced.
Speaker:
Yeah, I mean, there is also an extremely
interesting trend right now towards open
Speaker:
sourcing more and more parts of large
language models.
Speaker:
I think that's going to be extremely
interesting to see that develop because
Speaker:
at the same time, this is very hard
because these kinds of models are just so
Speaker:
huge.
Speaker:
You need a lot of computing power to make
them run.
Speaker:
So I don't know how open source can help
in that, but I know how open source can
Speaker:
help in the development and sustainability
and trustworthiness and openness of all
Speaker:
that stuff.
Speaker:
So that's going to be super interesting.
Speaker:
And I'm also going to be very interested
in how
Speaker:
the different sports evolve.
Speaker:
Now that basically the nerds are they are
much more right than before.
Speaker:
You know, so like, probably baseball is going
to be at the forefront of that because
Speaker:
they just have a lot of, you know,
advance in years compared to the other
Speaker:
sports.
Speaker:
So it's going to be interesting to see how
things play out here when it comes to
Speaker:
data.
Speaker:
Because at the same time,
Speaker:
I'm not sure it makes a lot of sense for all
the clubs to have their own data
Speaker:
collection structure if in the end they
just have the same data because you're
Speaker:
mainly, I think, to gather data, you are
limited, I'm guessing, by the technology
Speaker:
much more than by the ideas of a coach or
manager or a scientist being like, I
Speaker:
data, I think in the end, the data
collection is something that can be pretty
Speaker:
much, you know, collective, but then how
you use the data is more the
Speaker:
proprietary stuff.
Speaker:
It's going to be interesting to see that
play out.
Speaker:
Yeah, no doubt.
Speaker:
Great.
Speaker:
Well, Patrick, I've taken a lot of your
time already.
Speaker:
I need to let you go because...
Speaker:
You need to drink some coffee.
Speaker:
I definitely need to because that was very
intense.
Speaker:
But man, so interesting.
Speaker:
Before letting you go, so I have the last
two questions, of course, as usual.
Speaker:
You told me before we started the show
that when the season is going to start
Speaker:
again for you in US football, your days
are going to be extremely busy.
Speaker:
Like basically working from 5 a.m.
Speaker:
to 10 p.m.
Speaker:
or something like that.
Speaker:
How is that possible?
Speaker:
When do you sleep?
Speaker:
We do have some long days.
Speaker:
It depends on the day of the week and when
the full practice days are.
Speaker:
Usually, yeah, I'd get in around 4:45 or
5, have a bit of a workout, and then kind
Speaker:
of start the day around 6:30 or 7.
Speaker:
And it's really long.
Speaker:
I mean, there's a ton of meetings.
Speaker:
It's a very tactical sport if you've ever
watched it.
Speaker:
And so the players are nonstop in and out
of meetings and walk through practices and
Speaker:
full practices and then more meetings.
Speaker:
it's all a big, you know, tactical pattern
recognition type of thing.
Speaker:
And so, you know, we're in, you know,
working on projects and data and getting
Speaker:
you know, things set up so that model set
up and identifying things in data for the
Speaker:
staff and things like that.
Speaker:
it just becomes this really long day.
Speaker:
And I mean, like, yeah, if we go home at
eight or nine, maybe 9:30 sometimes, maybe
Speaker:
10, but I mean, there's people there
that'll stay even later than that, just
Speaker:
going through film and watching it.
Speaker:
They are very long days.
Speaker:
Usually those types of days are about
three days a week and then the other days,
Speaker:
I might be in there at five and get out at
like five or six.
Speaker:
So still 12 hour days, but it's a long
week for sure.
Speaker:
This is brutal.
Speaker:
Yeah.
Speaker:
But is it like that during the whole
season or is that mainly the start of the
Speaker:
season?
Speaker:
No, that's the season.
Speaker:
That is
Speaker:
18 weeks. We have a bye, 17 games
this season.
Speaker:
Damn, impressive.
Speaker:
You have to be sharp with your sleep also,
I guess in these weeks.
Speaker:
You do, yes.
Speaker:
You try and catch up on the weekends.
Speaker:
Yeah, damn.
Speaker:
Awesome, well Patrick, I think it's time
to call it a show.
Speaker:
Thank you so much, that was amazing.
Speaker:
Of course, I'm going to ask you the last
two questions I ask every guest at the end
Speaker:
of the show.
Speaker:
You knew that was coming, right?
Speaker:
Yes.
Speaker:
So what's the first one?
Speaker:
You know the first one.
Speaker:
The first one is, if you had unlimited resources,
what problem would you solve?
Speaker:
Yeah, unlimited time and resources.
Speaker:
I'll take one outside of sport, but one
Speaker:
I witnessed in sport.
Speaker:
so when I first started, I used to do all
of the GPS stuff, like live on the field.
Speaker:
Now someone else does it, but coding it or
cutting it up and stuff like that during
Speaker:
practice.
Speaker:
And on Friday practices at the time, that
was the day for our Make-A-Wish, the
Speaker:
Make-A-Wish child.
Speaker:
So they'd have kids that had a Make-A-Wish
and their wish was to see a practice and
Speaker:
meet their favorite NFL players.
Speaker:
And these were usually kids that were, you
know, were small and terminally ill.
Speaker:
I think that's probably the thing
that I would solve because standing there
Speaker:
and you watch that and you work with all
these guys that are healthy and young.
Speaker:
And then you see this little kid who'll never
have a chance to be
Speaker:
healthy and young, but they're just so
happy to meet these guys.
Speaker:
I think like that's a super unfair thing
for those little kids.
Speaker:
if I could solve anything, it'd be like
that, you know, kids and cancer and stuff
Speaker:
like that.
Speaker:
I think it's just a horrible thing.
Speaker:
And then your second question is always, if I
could have dinner with anyone dead or
Speaker:
alive, who would it be?
Speaker:
There's so many good ones, but I think I
would pick...
Speaker:
a previous guest that you've had, I think
three times, if I'm correct, which is
Speaker:
Andrew Gelman.
Speaker:
I think he's fascinatingly interesting and
I think dinner would be pretty amazing.
Speaker:
Yeah.
Speaker:
yeah.
Speaker:
Both good choices, amazing answers.
Speaker:
Thanks, Patrick.
Speaker:
I can tell you're a faithful listener because
you're like, yeah, you knew the
Speaker:
questions.
Speaker:
Like you're taking my job basically, I can
see that.
Speaker:
No, that's great.
Speaker:
So Andrew, if you're listening, well, if
you're ever in New York, Patrick will try
Speaker:
and make that work.
Speaker:
That'd be fun for sure.
Speaker:
Yeah, Andrew is always fantastic to talk
to.
Speaker:
So yeah, that's definitely a great choice.
Speaker:
Awesome.
Speaker:
Well, that's it, Patrick.
Speaker:
Thank you so much for being in the show.
Speaker:
I really had a blast and learned a lot
about US football because, I
Speaker:
think that's not the sport I know most
about.
Speaker:
So definitely thank you so much for taking
the time.
Speaker:
We'll put resources to your website in the
show notes for those who want to dig
Speaker:
deeper.
Speaker:
have a bunch of links over there and
Speaker:
Thank you again, Patrick, for taking the
time and being on the show.
Speaker:
Thank you.
Speaker:
This has been another episode of Learning
Bayesian Statistics.
Speaker:
Be sure to rate, review, and follow the
show on your favorite podcatcher, and
Speaker:
visit learnbayesstats.com for more
resources about today's topics, as well as
Speaker:
access to more episodes to help you reach
true Bayesian state of mind.
Speaker:
That's learnbayesstats.com.
Speaker:
Our theme music is Good Bayesian by Baba
Brinkman, feat. MC Lars and Mega Ran.
Speaker:
Check out his awesome work at
bababrinkman.com.
Speaker:
I'm your host.
Speaker:
Alex Andorra.
Speaker:
You can follow me on Twitter at Alex
underscore Andorra like the country.
Speaker:
You can support the show and unlock
exclusive benefits by visiting
Speaker:
Patreon.com slash LearnBayesStats.
Speaker:
Thank you so much for listening and for
your support.
Speaker:
You're truly a good Bayesian.
Speaker:
Change your predictions after taking
information in and if you're thinking of
Speaker:
me less than amazing, let's adjust those
expectations.
Speaker:
Let me show you how to be a good Bayesian
Change calculations after taking fresh
Speaker:
data in Those predictions that your brain
is making Let's get them on a solid
Speaker:
foundation