Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag 😉
Takeaways:
- Communicating Bayesian concepts to non-technical audiences in sports analytics can be challenging, but it is important to provide clear explanations and address limitations.
- Understanding the model and its assumptions is crucial for effective communication and decision-making.
- Involving domain experts, such as scouts and coaches, can provide valuable insights and improve the model’s relevance and usefulness.
- Customizing the model to align with the specific needs and questions of the stakeholders is essential for successful implementation.
- Understanding the needs of decision-makers is crucial for effectively communicating and utilizing models in sports analytics.
- Predicting the impact of training loads on athletes’ well-being and performance is a challenging frontier in sports analytics.
- Identifying discrete events in team sports data is essential for analysis and development of models.
Chapters:
00:00 Bayesian Statistics in Sports Analytics
18:29 Applying Bayesian Stats in Analyzing Player Performance and Injury Risk
36:21 Challenges in Communicating Bayesian Concepts to Non-Statistical Decision-Makers
41:04 Understanding Model Behavior and Validation through Simulations
43:09 Applying Bayesian Methods in Sports Analytics
48:03 Clarifying Questions and Utilizing Frameworks
53:41 Effective Communication of Statistical Concepts
57:50 Integrating Domain Expertise with Statistical Models
01:13:43 The Importance of Good Data
01:18:11 The Future of Sports Analytics
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.
Links from the show:
- LBS Sports Analytics playlist: https://www.youtube.com/playlist?list=PL7RjIaSLWh5kDiPVMUSyhvFaXL3NoXOe4
- Patrick’s website: http://optimumsportsperformance.com/blog/
- Patrick on GitHub: https://github.com/pw2
- Patrick on LinkedIn: https://www.linkedin.com/in/patrickward02/
- Patrick on Twitter: https://twitter.com/OSPpatrick
- Patrick & Ellis Screencast: https://github.com/thebioengineer/TidyX
- Patrick on ResearchGate: https://www.researchgate.net/profile/Patrick-Ward-10
Transcript:
This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.
Today's episode takes us into the dynamic intersection of Bayesian statistics and sports analytics with Patrick Ward, the Director of Research and Analysis for the Seattle Seahawks. With a rich background that spans from the Nike Sports Research Lab to teaching statistics, Patrick brings a wealth of knowledge to the table.
In our discussion, Patrick delves into how these methods are revolutionizing the way we understand player performance and manage injury risks in professional sports. He sheds light on the particular challenges of translating complex Bayesian concepts for coaches and team managers who may not be versed in statistical methods but need to leverage these insights for strategic decisions.
Patrick also walks us through the practical aspects of applying Bayesian stats in the high-stakes world of the NFL. From selecting the right players to optimizing training loads, he illustrates the profound impact that thoughtful statistical analysis can have on a team's success and players' …
For those of you who appreciate the blend of science and strategy, this conversation offers a behind-the-scenes look at the sophisticated analytics powering team decisions.
And when he's not dissecting data or strategizing for the Seahawks, Patrick enjoys the simple pleasures of reading, savoring coffee, and playing jazz guitar.
This is Learning Bayesian Statistics, …
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs…
Hello, my dear Bayesians, I have some exciting personal news to share with you. I am thrilled to announce that I have recently taken on a new role as a senior applied scientist with the Miami Marlins. In this position, I'll be diving even deeper into the world of sports analytics, leveraging Bayesian modeling, of course, to enhance team performance and player development. And honestly, this move is so exciting to me and solidifies my commitment to advancing the application of Bayesian stats in sports. If you find yourself in Miami, or if you're curious about the intersection of Bayesian methods and baseball, or team sports in general, don't hesitate to reach out.
OK, back to the show now.
Patrick Ward, welcome to Learning Bayesian Statistics.
Thanks for having me. I listen to every episode. I think every year at the end of the year, Spotify tells me that it's one of my most-listened-to podcasts. So it's a pleasure to be here. Hopefully... I don't know if I can live up to your prior. You've had some pretty big-timers, but yeah.
No, yeah. So first, thanks a lot for being such a faithful listener. I definitely appreciate that. And I'm always amazed at the diversity of people who listen to the show. That's really awesome. I also want to thank Scott Morrison, who put us in contact. Scott is working at the Miami Marlins; he's a colleague now. That's a change for me, a great change, and I'm extremely excited about that new step in my life. But today we're not going to talk a lot about baseball. We're going to talk a lot about US football. So, European listeners, when you hear "football" today, we're talking about American football, the one with a ball that looks like a rugby ball. And so, Patrick, we're going to talk about that. But first, as usual, I want to talk a bit more about you. Can you tell the listeners what you're doing nowadays? I gave your title and your bio in the intro, but maybe tell us a bit more, in the flesh, what you're doing and how you ended up doing it.
Yeah. Well, currently I'm at the Seattle Seahawks, which is one of the American football teams in the NFL, and I'm the director of research and analysis there. We work across all of football operations: everything from player acquisition and front-office work to team-based analysis and opponent analysis. And we coordinate a research strategy around how we attack questions for the key decision-makers, the key stakeholders across coaching and acquisition, even into player health, performance, and development.
This is my 10th year here, and I got here from Nike. I was in the Nike Sports Research Lab, working as a researcher for nearly two years. The way I got here was that I was doing some projects for Nike around applied sports research, and at the time I think they had just become the biggest sponsor of the newly minted National Women's Soccer League. And they said, we want to do something around this. So we were kicking around ideas, and one of the ideas we had was: what if we went out and tested all of the women in the league, tested their sprinting and jumping and power output and things like that? Then we could build archetypes, and that would be useful for apps on your watch and on your phone, and girls in the field could compare themselves to their favorite athletes.
So they let us do it. They sent me on the road for an entire off-season, the entire off-season training of the National Women's Soccer League. Myself and four colleagues went around the country to every single team, and we tested every woman in the league. So we had the largest data set on women's soccer players that anyone could have. We did some conference presentations and things like that with that data. And lo and behold, the Seattle Seahawks called down to Nike and said, hey, we hear there's this test battery and we'd love to see what our players do on it. So I went up and did a project for them around that, and then they said, what if you just did this kind of stuff all the time?
So that's how I started out, 10 years ago. I started out in applied physiology, which was my background, and I was doing wearable tech for the team, like GPS and accelerometry. Then that progressed into draft analysis and player evaluation and things like that, and it just kept growing until, yeah, 10 years later, here we are.
Yeah, that's a great background. I love it because it definitely seems like you've been into sports at least since college, but there is also a bit of randomness in it. So, sorry, I love that. Of course, as a fellow Bayesian, I'm always interested in the random parts of anybody's journey. Actually, how much Bayesian stats is there in that journey and in your current work? How Bayesian is your work right now, and how were you introduced to Bayesian stats?
:Bayesian stats?
141
:Well, mean, anyone who has watched
American football knows it's a game of
142
:very, very small sample sizes.
143
:So we only play up until two years ago.
144
:We play 17 games now.
145
:We used to only play 16 games.
146
:So unlike most of the other sports,
baseball has got 162, several hundred at
147
:bats, basketball, hockey, 82 games.
148
:many attempts.
149
:Also the players in a lot of these sports
are all doing the same things in baseball.
150
:Aside from the pitchers, everybody's going
to go to the plate and hit in basketball.
151
:Everybody has a chance on the court to
dribble the ball, shoot, score, pass, get
152
:assists, get blocks, et cetera.
153
:Football is really unique because it's a
very tactical game.
154
:There's discrete events in terms of plays,
stop and start.
155
:But because of the tactical nature of it
and one ball, there's only certain
156
:positions that touch the ball.
157
:There's only certain opportunities that
players are going to have.
158
:So that was always an issue.
159
:And when I did my PhD, a big part of my
PhD was using mixed models to look at
160
:physiological differences between players
on the field with GPS and accelerometry.
161
:And I always thought of mixed models, even
though I didn't know it at the time,
162
:because I hadn't really...
163
:learned anything Bayesian yet.
164
:I always thought of mixed models.
165
:I think of them as like this bridge to
Bayesian analysis because you have these
166
:fixed effects which behave like our
population averages, our population base
167
:rates, I guess you could say.
168
:And the random effects are sort of like,
hey, we know something about you or your
169
:group and therefore we know how you
deviate from the population.
170
:And then with those two bits of
information, we're also like, hey, here's
171
:someone new in the population, or maybe
someone that we've only seen or observed
172
:do the thing one time.
173
:best guess therefore is the fixed effects
portion of this until proven otherwise.
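The "fixed effect until proven otherwise" idea can be sketched for the simplest case: a random-intercept model where the within-player spread (sigma) and between-player spread (tau) are treated as known. This is a minimal illustration, not the models used at the Seahawks; the helper name and the numbers are hypothetical, and a real mixed model (for example, one fit with lme4 or PyMC) would estimate sigma and tau from the data.

```python
from statistics import mean


def partial_pooling_estimate(obs, pop_mean, sigma, tau):
    """Hypothetical helper: best guess for one player's mean under a
    random-intercept model with known variance components.

    obs      -- the player's observations (may be empty)
    pop_mean -- population average, i.e. the fixed effect
    sigma    -- within-player (observation-level) standard deviation
    tau      -- between-player standard deviation of the random effects
    """
    n = len(obs)
    if n == 0:
        # Nobody has seen this player yet: fall back on the fixed effect.
        return pop_mean
    # Weight on the player's own mean grows with n and with tau relative to sigma.
    weight = n / (n + sigma**2 / tau**2)
    return weight * mean(obs) + (1 - weight) * pop_mean


# Made-up numbers: population mean 100, sigma 10, tau 5 (so sigma^2 / tau^2 = 4).
print(partial_pooling_estimate([], 100, 10, 5))          # brand-new player
print(partial_pooling_estimate([120], 100, 10, 5))       # one observation
print(partial_pooling_estimate([120] * 10, 100, 10, 5))  # ten observations
```

With one observation at 120, the estimate moves only a fifth of the way from 100 (to 104); with ten observations at 120, it moves about 71% of the way (to roughly 114.3).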
So I always had that in the back of my head going through this. But, you know, my first two or three years in the NFL, we always used to throw our hands up when we saw small samples. We'd be like, yeah, it's 50%, but it's such a small sample, we can't really know. And we didn't really have a good way of sorting out what to do with that information. Because, as you know, one out of 10, 10 out of 100, and 100 out of 1,000 are the same proportion, but different levels of information are contained within those proportions.
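To see why 1/10, 10/100, and 100/1000 carry different amounts of information, here is a small sketch assuming a uniform Beta(1, 1) prior on the underlying rate and a binomial likelihood. The raw proportion is 0.10 in every case, but the posterior standard deviation shrinks by roughly a factor of three (about the square root of 10) with each tenfold increase in attempts. The helper name is hypothetical.

```python
import math


def beta_posterior(successes, trials, a0=1.0, b0=1.0):
    """Posterior mean and standard deviation of a proportion under a
    Beta(a0, b0) prior (uniform by default) and a binomial likelihood."""
    a = a0 + successes
    b = b0 + trials - successes
    post_mean = a / (a + b)
    post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return post_mean, post_sd


for k, n in [(1, 10), (10, 100), (100, 1000)]:
    m, s = beta_posterior(k, n)
    print(f"{k}/{n}: raw={k / n:.3f}  posterior mean={m:.3f}  posterior sd={s:.3f}")
```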
And I stumbled upon a paper by Efron and Morris, I think from 1977 or so, called "Stein's Paradox in Statistics." I probably stumbled on it because I was thinking, there's so much in sabermetrics, someone in baseball has probably figured this out before. So I was probably googling something like "small samples baseball statistics sabermetrics," blah, blah, blah, and I stumbled upon this paper about Stein's paradox. The crux of the paper was: say we observe these, I think it was 12 or 18, baseball players through the first half of the season, up to the All-Star break. We see the number of times they went to the plate and the number of times they hit, so we have a batting average. If we take the observed batting average through the first half of the season, how well does that predict the batting average at the end of the season, once they've gone through the second half?
You look at that and you're like, okay, what's this all about? The first thing they do is set up this argument that, well, the observed average doesn't do a very good job, because some of these players batted, you know, five times or three times. Certainly a player who went three for three has a hundred percent batting average, but we don't think this is the greatest baseball player of all time yet, because we've only seen them do this thing three times. So the basic naive prediction of using the first half of the season to predict the second half wasn't very good. In that paper, they introduced this kind of simple Bayesian model, saying: well, we know something about average baseball players; what if we weighted everybody toward that? And lo and behold, that did a better job of constraining the small-sample players. Take a guy who goes 0 for 10, which is totally possible in baseball when you have hundreds of at-bats: we don't think that's the worst hitter in baseball. So constraining those players told them something about what they could expect to see at the end of the season.
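The Efron and Morris idea can be sketched with the positive-part James-Stein estimator, which pulls every observed batting average toward the grand mean by a common factor. This minimal version assumes every player has the same number of at-bats, so each average has roughly the binomial variance p(1 - p)/n; the function name and the five averages are made up, not the data from the paper.

```python
def james_stein(averages, at_bats):
    """Positive-part James-Stein shrinkage of batting averages toward
    their grand mean, assuming equal at-bats for every player."""
    k = len(averages)
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 players")
    grand_mean = sum(averages) / k
    # Approximate sampling variance of each observed average.
    sigma2 = grand_mean * (1 - grand_mean) / at_bats
    # Observed spread of the averages around the grand mean.
    spread = sum((y - grand_mean) ** 2 for y in averages)
    if spread == 0:
        return [grand_mean] * k
    # Fraction of each player's deviation from the mean that we keep.
    keep = max(0.0, 1 - (k - 3) * sigma2 / spread)
    return [grand_mean + keep * (y - grand_mean) for y in averages]


first_half = [0.400, 0.350, 0.300, 0.250, 0.200]  # hypothetical, 45 at-bats each
print(james_stein(first_half, at_bats=45))
```

Here the grand mean is 0.300 and about 63% of each deviation is kept, so the .400 hitter is pulled down to roughly .363 and the .200 hitter up to roughly .237.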
And through that paper, I found this blog by David Robinson, who's an R programmer, and it was all about using empirical Bayesian analysis for baseball. He then made it into a nice little book that you could buy on Amazon for, I don't know, $20 or something. I read those two things and I was like, this is incredible. This is exactly what I've always wanted to know.
know.
223
:And so like I went in the next day to our
other analysts at the time, there was only
224
:two of us.
225
:And I said, I think I figured out a way we
could solve small sample problems.
226
:And, and that was it.
227
:Like then after that, you really couldn't
convince me otherwise that this wasn't a
228
:great way of thinking.
229
:That doesn't mean that everything we do
has to be Bayesian.
230
:Certainly like there's other things that
we do that are used.
231
:you know, different tools like machine
learning models and neural networks and
232
:things like that.
233
:But certainly when we start thinking about
like decision -making, how do I
234
:incorporate priors, domain expertise?
235
:How do I fit the right prior?
236
:You know, like if you went 0 for 5 and
you're first at bats, let's say in
237
:baseball, but you were a college standout
and you were an amazing
238
:player in the AAA, I probably have a
stronger prior that you're maybe a
239
:slightly better than average baseball
player than if you went 0 for 5 and you
240
:were a horrific college player and you
weren't very good in AAA and you were
241
:really the last person on the bench that
we needed to call.
242
:And maybe that prior is much lower.
243
:so utilizing that information in order to
help us make decisions going forward,
244
:that's really
245
:That was kind of the money for me.
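The 0-for-5 example can be made concrete by encoding scouting and minor-league information as a Beta prior with a chosen mean and a "worth this many at-bats" strength. The helper name, prior means, and prior strengths below are invented for illustration.

```python
def posterior_mean_avg(hits, at_bats, prior_mean, prior_strength):
    """Posterior mean batting average with a Beta prior expressed as a mean
    and a pseudo-count (how many at-bats the prior is worth)."""
    alpha = prior_mean * prior_strength
    beta = (1 - prior_mean) * prior_strength
    return (hits + alpha) / (at_bats + alpha + beta)


# Same 0-for-5 start, different priors (hypothetical values).
standout = posterior_mean_avg(0, 5, prior_mean=0.270, prior_strength=200)
fringe = posterior_mean_avg(0, 5, prior_mean=0.220, prior_strength=200)
print(f"AAA standout: {standout:.3f}, fringe call-up: {fringe:.3f}")
```

Five hitless at-bats barely move either estimate; the gap between the two players comes almost entirely from the prior.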
And so how much do we use it? I mean, one of our new analysts started two years ago, and I think the first thing was: how much do you know about Bayes? And it was like, well, I never really learned that in school, blah, blah. And it was like, okay, here are two books. Here's a 12-week curriculum. We're going to meet every week, and you're going to do projects and homework and reading. And that was it. You have to learn this, because this is how we're going to think, and this is how we're going to process and communicate information.
Well, what about that? I told the listeners that we were not going to talk a lot about baseball, but in the end we are. It all comes back to baseball, I think.
Yeah, in sports analytics, it all comes back to baseball. Certainly, yeah.
Yeah, okay. If I understand correctly, you were motivated a lot by low sample sizes and being able to handle all of that in your models. That makes a ton of sense. Like a lot of people, and I've seen it with a lot of clients, you were motivated by a very practical problem. I mean, most people enter the Bayesian field through that. Something I'm actually very curious about (I could keep talking about that for hours) is what you're doing at the Seahawks and how Bayesian stats is helpful to what you guys are doing. I think that's the most interesting part for the listeners, to understand how they could apply Bayesian stats to their own problems, which are not necessarily in sports. But I think sports is a really good field to think about that, because you have a lot of diversity and also a lot of somewhat controlled experiments. You have a lot of constraints, and that's always extremely interesting to talk about. So maybe you can start by explaining how Bayesian stats are applied in your current role for analyzing player performance and injury risk. Because now that I work directly in sports, something I'm starting to understand is that projecting player performance and handling injury risk are two extremely important topics. So maybe let's start with that. What can you tell us about that?
Okay.
Let's see. Which one should I start with? I guess I'll start with injury risk. Injury is a super difficult problem to solve. I've written a number of papers on it (I think you can link to my ResearchGate), including a number of methodology papers that have looked at things like this. I think it's complicated because, one, there are a ton of inter-individual differences in why people get hurt. There are a ton of things that we probably don't know are important yet, because we can't measure them, or at least can't measure them in a real-world applied setting (maybe in a lab you can). And then there are other things we just don't know about; it's an epistemic problem. We're just ignorant about it, naive to other things out there that we're simply unaware of yet. So it's a really hard problem to try and solve.
So when I see papers that come out with an injury prediction model, estimating the prediction as a one or a zero, a yes or a no, a binary response, and they give a nice little two-by-two table and talk about how well their model did, I'm always like: how is that useful to the people who actually have to do the work? Because in reality, what we're dealing with is probably not unlike a hedge fund manager managing the risk of their portfolio.
If you think of each player, or each athlete you deal with, as a portfolio, they each have some level of base risk. So if we know nothing about you... you really have to have a pretty good handle, in your sport, on the base rates of injury risk for position groups and for players of different ages and things like that. That might be an initial model, right? And then from there, the players go out and do things: they play and perform and compete, they get dinged up and take hits, they get hit by pitches or tackled really hard. We collect that information, and we're basically shifting the probabilities up and down based on what we observe over time. And when that probability reaches a certain threshold (and of course you could use a posterior distribution, so you have an integral of how much of the probability distribution is above or below a certain threshold), then you have the opportunity to have a discussion about when to act or what to do. And how and when you act is going to depend on your tolerance for risk, or your coach's tolerance for risk.
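The "how much of the posterior is above the threshold" idea can be sketched with a conjugate normal model for a latent risk score. Everything here is an assumption for illustration: the 0-100 score scale, the prior taken from a position-group base rate, the weekly signals and their noise level, the threshold of 50, and the helper names.

```python
from statistics import NormalDist


def update_normal(prior_mu, prior_sd, obs, obs_sd):
    """Conjugate normal update of a latent risk score with one noisy observation."""
    prior_prec = 1 / prior_sd**2
    obs_prec = 1 / obs_sd**2
    post_var = 1 / (prior_prec + obs_prec)
    post_mu = post_var * (prior_prec * prior_mu + obs_prec * obs)
    return post_mu, post_var**0.5


def prob_above(mu, sd, threshold):
    """Posterior probability mass above a threshold, i.e. 1 - CDF(threshold)."""
    return 1 - NormalDist(mu, sd).cdf(threshold)


mu, sd = 40.0, 10.0  # hypothetical prior from the position-group base rate
for weekly_signal in [55.0, 60.0, 58.0]:  # hypothetical weekly observations
    mu, sd = update_normal(mu, sd, weekly_signal, obs_sd=15.0)
    print(f"mean={mu:.1f} sd={sd:.1f} P(score > 50)={prob_above(mu, sd, 50.0):.2f}")
```

Each week shifts the distribution up and narrows it, and the printed probability is the quantity a staff would compare against its own tolerance for risk.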
If it's your best player, the MVP of your team, and it's week two of the season, and the risk probability... or let's say we're using this as a model. Some of the stuff we've worked on with Scott, whom you mentioned earlier, is return-to-play models, where the athlete has, say, an ankle sprain and we're rehabbing them. And we have a test, or several tests, a test battery, that tells us where that athlete is on their return-to-play timeline. Let's say it's week two of the season, and the posterior distribution looks like this. Here's the threshold at which we'd feel comfortable releasing this athlete back to full competition. And there's a 30% chance they're in good shape and a 70% chance they're below that threshold. In week two of the season, we probably want to say: you know what, let's not take that risk this week. Let's be a little more risk-averse here, because it's the best player, and let's wait until we have more of the distribution on the right side of the threshold. Alternatively, if it's the final game of the season (the Super Bowl, the World Series, the Champions League final, or something like that), you're probably going to take that risk, because you need your best player out there.
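One way to formalize "it depends on the stakes" is a simple expected-value rule: release the player when p × gain ≥ (1 − p) × cost, which rearranges to p ≥ cost / (gain + cost). The 30% figure comes from Patrick's example; the helper name and the gain and cost values are hypothetical utilities chosen for illustration.

```python
def min_prob_to_release(gain_if_ready, cost_if_not_ready):
    """Smallest P(ready) at which releasing the player has non-negative
    expected value: p * gain >= (1 - p) * cost  <=>  p >= cost / (gain + cost)."""
    return cost_if_not_ready / (gain_if_ready + cost_if_not_ready)


p_ready = 0.30  # posterior mass above the return-to-play threshold

# Hypothetical utilities: a week-two game is worth little relative to a setback,
# while the final game of the season is worth a lot.
week_two_bar = min_prob_to_release(gain_if_ready=1.0, cost_if_not_ready=4.0)
final_bar = min_prob_to_release(gain_if_ready=10.0, cost_if_not_ready=4.0)

print("week two:", "release" if p_ready >= week_two_bar else "hold")
print("final:", "release" if p_ready >= final_bar else "hold")
```

The same 30% posterior leads to "hold" in week two (the bar is 80%) and "release" in the final (the bar is about 29%), which is the risk-tolerance shift described above.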
And so when I think about injury risk modeling, what I really think about is how we evaluate this individual's current status on our risk score or risk distribution, when we feel we need to intervene and do something, and when we feel this is fine and can continue training as is. And I think that's the tricky part. It's not easy. I don't think I've solved anything; I don't think anyone has. But certainly, from the perspective of our staff, we can all sit down with a performance staff of strength coaches, dietitians, and medical people and have these conversations. What makes a Bayesian approach nice is that we can also take into account domain expertise that we might not have in the data. So say we sit down in a Monday meeting and say: this player, this is where they're currently at and this is their risk status, and I don't really feel comfortable with that; how do you feel about it? And then one of the medical people says: you know, he's been complaining that his hamstring feels really tight, and he's been getting treatment every morning. Well, that's not data we would be collecting, but that's valuable domain information that the person working with the player now adds to this. It's just like anything in probability: if we have two or three or four independent sources all converging on the same outcome, the same endpoint, we can probably feel really good about making that decision and saying: hey, let's do something about this, let's act now, right? So that's how I think about it from that side of things.
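The "several independent sources converging" intuition corresponds to multiplying the odds by each source's likelihood ratio, or equivalently adding log likelihood ratios. The sketch assumes the sources are conditionally independent given the player's true state; the helper name, the 20% model estimate, and the likelihood ratio of 2.5 assigned to each staff report are invented for illustration.

```python
import math


def combine_independent_evidence(prior_prob, likelihood_ratios):
    """Update a probability with conditionally independent pieces of evidence
    by adding their log likelihood ratios on the log-odds scale."""
    log_odds = math.log(prior_prob / (1 - prior_prob))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1 / (1 + math.exp(-log_odds))


model_risk = 0.20           # hypothetical model output for the player
staff_reports = [2.5, 2.5]  # hypothetical LRs: medical report, strength coach's observation
print(combine_independent_evidence(model_risk, staff_reports))
```

Two moderately informative reports lift the 20% estimate to about 61%. If the two reports were really the same signal seen twice, counting both would overstate the risk, which is why the independence assumption matters.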
From the performance side of things, the development side of things, it would be way different for you guys in baseball, because you draft a player and you don't expect them to get to the major leagues and contribute until they're 23, 24, 25 years old. For us, you draft a player and next year they're playing, they're ready, they're in the mix. So in baseball, I would be thinking of models that map the growth potential of an individual: how are they progressing through the minor leagues, and which attributes matter? And then maybe, from there, answering questions like: what's the probability that this player makes 20 starts in the major leagues, or starts for three seasons, or whatever endpoint makes sense to the decision-makers, obviously.
For us, it's more about player identification. And again, football is a sport of small samples. In their college years, some of these kids might only be a starter, a full-time player, in their junior and senior years, or maybe just their senior year of college. Additionally, unlike the NFL, where at that highest level the talent is much more homogeneous, in the college football ranks you have this diversity of talent, where you might have a big-time team playing a much lower-level opponent. So you have to adjust things. Say you hand off the ball to your running back, who's playing against a very low-level opponent, and he goes for 200 or 300 yards in a game, or something absurd like 500. That has to be adjusted and weighted in some way, because it's not the same as going for 200 or 300 yards against a big-time opponent, and the big-time opponents are more similar to the NFL players he's going to play against. All of these types of things fit into hierarchical models and Bayesian models, which help us utilize prior information.
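A very simple version of that adjustment shifts each game's yards per carry by how much easier (or harder) the opponent's run defense is than average, then pools games by carries. This additive adjustment is one illustrative choice, not the method described in the episode; the helper name, the games, and the defensive numbers are made up.

```python
def opponent_adjusted_ypc(games, league_avg_allowed):
    """Hypothetical additive opponent adjustment for yards per carry (YPC).

    games: list of (carries, yards, opp_ypc_allowed) tuples, where
    opp_ypc_allowed is the YPC that opponent's defense usually allows.
    """
    total_carries = sum(carries for carries, _, _ in games)
    adjusted_yards = 0.0
    for carries, yards, opp_allowed in games:
        softness = opp_allowed - league_avg_allowed  # > 0 for weak run defenses
        adjusted_yards += yards - carries * softness
    return adjusted_yards / total_carries


games = [(25, 300, 6.5), (20, 90, 3.5)]  # big day vs. a weak defense, grind vs. a strong one
raw = sum(y for _, y, _ in games) / sum(c for c, _, _ in games)
print(f"raw YPC {raw:.2f}, opponent-adjusted YPC {opponent_adjusted_ypc(games, 4.5):.2f}")
```

In a hierarchical model, the opponent effects themselves would be estimated (and shrunk) from all games rather than plugged in as fixed numbers.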
And the other way the Bayesian models are useful here: sometimes we're dealing with incomplete information, because we can't observe all of the cases. For example, in college sports, Division I is the top division, you know, and then you have FBS, and then Division II and Division III. So if you pull all the Division II kids who have ever made it as pro athletes, the list is very small, but they're kids who made it. If you were to build a naive model on this, it would say: well, the best players clearly come from these lower-level schools, because all of the ones we have seen made it and have been successful. And in theory, there are hundreds of thousands of kids from that level who have never made it. So we have to adjust that model in some way. We have to weight that prior back down: yeah, this guy is really, really good at that level, but our prior belief in him making it is very, very low, and he'd have to be so exceptional in order to make it. This is where people often rail: use weakly informative priors, let the data speak a little bit. But there are times, in situations like these, where I feel like you could probably put a slightly stronger prior on this and say: man, this guy's really going to have to do something outstanding to get outside of the distribution we believe in, just given what we know.
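Bayes' rule shows why looking only at the lower-division players who made it is misleading: even if most of them were standouts, a standout at that level can still have a small chance of making it when the base rate is tiny. The helper name and all three input probabilities are invented for illustration.

```python
def prob_makes_it(base_rate, p_standout_given_made_it, p_standout_given_not):
    """Bayes' rule: P(makes the pros | was a standout at a lower level)."""
    numerator = p_standout_given_made_it * base_rate
    evidence = numerator + p_standout_given_not * (1 - base_rate)
    return numerator / evidence


# Hypothetical: 0.1% of lower-division players reach the pros; 90% of those
# were standouts, compared with 5% of the players who never made it.
print(prob_makes_it(0.001, 0.90, 0.05))
```

Looking only at the pros, "standout" appears in 90% of the records, yet under these numbers a standout's chance of making it is under 2%: the base rate acts as the strong prior Patrick describes.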
Okay, yeah, that's very interesting. That's a very good point, and it's related to survivorship bias in a way. Concretely, how do you handle these kinds of cases? Is it a matter of using a different prior for these types of players, or something?
454
We try to do this in a few different ways. One is to build what are basically equivalency metrics: if you did X at this lower level, it relates in some way to Y at this other level. So you try to normalize players based on players you've seen move between levels of the game. Again, from a baseball perspective, hitting 40 home runs in AA baseball might convert to something like 33 home runs in AAA and 24 home runs in the MLB, or 12 home runs in the MLB, or whatever it might be. So we try to identify equivalencies between those levels that we can then use to put everybody on a common scale.
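One simple way to build equivalency metrics like these is to estimate a translation factor from players observed at both levels and apply it to everyone else. The sketch below assumes a single multiplicative factor on a rate stat (home runs per plate appearance) and uses made-up paired seasons; real translations usually also adjust for age, park, and the fact that players who move up are themselves a selected group.

```python
def translation_factor(pairs):
    """Estimate a multiplicative level-to-level factor from paired seasons.

    pairs: list of (rate_at_lower_level, rate_at_higher_level) for players
    who played at both levels. A ratio of sums is less sensitive to a single
    player with a tiny lower-level rate than an average of per-player ratios.
    """
    lower_total = sum(lo for lo, _ in pairs)
    higher_total = sum(hi for _, hi in pairs)
    return higher_total / lower_total

# Hypothetical HR per plate appearance for players who moved from AA to AAA.
aa_to_aaa = [(0.060, 0.050), (0.040, 0.034), (0.050, 0.041)]
factor = translation_factor(aa_to_aaa)

# Translate a hypothetical AA season of 40 HR in 600 PA into an AAA-equivalent count.
aa_rate = 40 / 600
aaa_equivalent_hr = aa_rate * factor * 600
print(f"AA->AAA factor: {factor:.3f}, AAA-equivalent HR: {aaa_equivalent_hr:.1f}")
```

Factors estimated the same way for AAA to MLB can be chained, which is one way to arrive at statements like "40 in AA is roughly 33 in AAA".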
The other way is, like you said, just putting a prior on it: knowing the level the person is playing at, you would have a lower prior. Playing time is a good example. If I think of playing time and performance as a rising curve that approaches an asymptote at some upper level of performance, then for the players way at the left, who have a very small number of observations, it would be silly to say my prior for them is the league average. There's a reason they're not playing very much: it's probably because people don't think they're very good, right? So somewhere along that curve, for each number of observations and for whatever performance metric we're looking at, there's a specific prior on that continuous distribution. And that's where we would draw a stake in the ground and say, based on what we know, we probably think this player is closer to these players than he is to those players.
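The playing-time idea can be sketched with a beta-binomial model whose prior mean depends on how much a player has played, instead of being the league average. The rising curve (a logistic function of log attempts), its parameters, and the prior strength are hypothetical placeholders; in practice they would be estimated from league data, for example in a hierarchical model.

```python
import math

def prior_mean_for_volume(attempts, rate_low=0.30, rate_high=0.38, midpoint=100.0, slope=1.5):
    """Hypothetical rising curve: expected success rate given how much a player has played.

    Climbs from rate_low toward the asymptote rate_high as attempts grow,
    following a logistic curve in log(attempts).
    """
    z = slope * (math.log(attempts + 1) - math.log(midpoint))
    return rate_low + (rate_high - rate_low) / (1 + math.exp(-z))

def posterior_rate(successes, attempts, prior_mean, prior_strength=200.0):
    """Beta-binomial posterior mean under a Beta(m * k, (1 - m) * k) prior."""
    alpha = prior_mean * prior_strength + successes
    beta = (1 - prior_mean) * prior_strength + (attempts - successes)
    return alpha / (alpha + beta)

league_average = 0.35
# Hypothetical bench player: 6 successes in 12 attempts (50%, on a tiny sample).
s, n = 6, 12
naive = posterior_rate(s, n, league_average)
volume_aware = posterior_rate(s, n, prior_mean_for_volume(n))
print(f"league-average prior: {naive:.3f}; playing-time prior: {volume_aware:.3f}")
```

Both estimates shrink the 50% start heavily, but the playing-time prior pulls the bench player toward players with similar usage rather than toward the league as a whole.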
Okay, yeah, I see. That definitely makes sense. Playing time already tells you something: if a player plays less, you very probably already have information about his level, and that means he's at least not as good as the A-level players who play much more.
The only time you get in trouble with that is when there's something like an endowment effect. In Major League Baseball, for example, there's been some research showing that players drafted very high, in the first or second round, progress up through the minor leagues faster than players drafted lower, even if they don't outperform those players, simply as a consequence of being high draft picks. That one's tricky. But this is where posterior distributions really help; it's almost like doing an A/B test. We've got two players: what's the probability that this guy is actually outperforming the other guy, even though the other guy might have been a higher draft pick? So we try to at least display that visually and have those conversations. In my head at least, and maybe I'm wrong, it's a nice way of helping people understand the uncertainty, which is really important.
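The A/B-test framing can be sketched as a Monte Carlo comparison of two posteriors. Assumptions: each player's performance is a success rate with a uniform Beta(1, 1) prior, and the counts are made up; any model that produces posterior draws would work the same way.

```python
import random

def prob_a_better(a_successes, a_trials, b_successes, b_trials, draws=20_000, seed=1):
    """Monte Carlo estimate of P(rate_A > rate_B) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + a_successes, 1 + a_trials - a_successes)
        rate_b = rng.betavariate(1 + b_successes, 1 + b_trials - b_successes)
        wins += rate_a > rate_b  # True counts as 1
    return wins / draws

# Hypothetical: late-round pick A vs first-round pick B over the same stretch.
p = prob_a_better(a_successes=48, a_trials=150, b_successes=40, b_trials=150)
print(f"P(A's true rate > B's true rate) = {p:.2f}")
```

The result is a probability well above one half rather than a flat verdict, which is the kind of statement that supports a conversation about a lower pick who may be outperforming a higher one.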
I used to work with a guy who, whenever I presented some of this stuff at work, would say: stop doing that. Every time you present, you talk about the uncertainty and the assumptions and the limitations; just give them the answers. And I'm like, well, it's important that they know what limitations and assumptions are behind this, because we don't want to talk past the sale and sell them on something that isn't really there. There have been times when I've had to stop someone and say: hold on. This analysis definitely can't tell us that. What you're saying right now, it can't tell us that. Let's not try to make this more than it is.

Conveying your uncertainty is also super important, because this is really, really hard. We're all going to fail at identifying talent; it's really hard to identify why one player is going to succeed over another. So in some ways it's not binary. It's not: do you like this guy or not? Is he good or bad? Is this guy better or worse than the other guy? A lot of factors go into why someone has success, so I think conveying that uncertainty is really important. Obviously, the more observations we have of you doing the thing, the more certain we are that this is your true level of performance. But it takes a while to get there, so we just have to be honest about that.
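Why "it takes a while to get there" can be seen with a normal-normal model of latent talent: the posterior standard deviation shrinks roughly like 1/sqrt(n) once the data dominate the prior. The prior and noise values here are hypothetical.

```python
import math

def posterior_sd(n, prior_sd=1.0, obs_sd=3.0):
    """Posterior SD of latent talent after n noisy observations (normal-normal model)."""
    return math.sqrt(1.0 / (1.0 / prior_sd**2 + n / obs_sd**2))

for n in (0, 5, 20, 80, 320):
    print(f"n={n:>3}: posterior sd = {posterior_sd(n):.3f}")
```

Going from 80 to 320 observations only roughly halves the uncertainty (about 0.32 to 0.17), so each additional increment of certainty costs more playing time than the last.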
Yeah, yeah. I think that's actually related to something I wanted to ask you about a bit more generally: what are the most significant challenges you face when applying Bayesian stats in sports science, and how do you address them? I'm guessing you've already started talking a bit about that, so let's go there. I have other technical questions for you about the kinds of models and the usefulness Bayesian stats has in your field, but I think this is a good moment to address these questions.
I think there are a few challenges. One challenge is that not everybody is as excited about a posterior distribution as you might be. Most of the time, they just want an answer: tell me what to do, give me the yes or no, make it binary. That's always tough. And you're often trying to convey this to non-technical audiences, people who are good at doing other things. They're not math people or stats people, and that's okay. So it's always challenging when the reaction is: why are you showing me this distribution? I don't understand what I'm supposed to take from this. Just tell me what to do. Tell me which guy's better. Tell me which guy's worse. That's always hard, and it takes a lot of patience and communication.
For a while, we used to do weekly sit-downs with our scouts where we would teach them about one stat a week. We'd go slow, and we'd try, as best as possible, to relate things back to the currency they speak in. For scouts and coaches, that currency is video, not charts and graphs. So the more we can connect our analysis to video cut-ups, the better, because then they can see it, and then they understand why a model says what it says, why it makes a decision, or why it has certain assumptions.
This is also super valuable in the other direction, because they'll say: the model is saying this is the outcome, but I can see it's because these four other things happened. And it's like, wow, we could probably account for that. I just didn't know it, right? That's why they're the domain experts and I'm not.
Communicating stats and numbers always takes patience, and it's also about knowing what people like. When I first started, everybody would tell you that you need an amazing dashboard, with charts and graphs and all that stuff. What I found was that a lot of people said: I don't even know what I'm looking at. I hate these things. Just give me the table of numbers. And it's like, okay, maybe a table of numbers with some conditional formatting.
I also have an academic side: I supervise PhD and master's students, and I teach a master's class in statistics at a college. So people on the academic side would probably hate what I'm about to say, but you have to recognize the environment you're in. Sometimes just changing the vocabulary helps. Instead of calling things the lower and upper bounds of the credible interval, we just call them the floor and the ceiling. And people go, yeah, this guy's floor is a bit higher than the other guy's floor, but that guy's got a better ceiling. Academically you'd get shot for that, but those kinds of things go a long way, because they bring the information to the end user. If you want them to start taking this information into their decision calculus, you have to get them comfortable, and sometimes meeting them with terminology they understand is what helps. So I think that's a big one. Those are the big challenges in communicating this stuff.
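Renaming the interval bounds is only a presentation change, so the computation stays the same: take a lower and an upper percentile of the posterior draws. The 10th/90th percentiles and the simulated draws below are assumptions for illustration; the episode doesn't say which interval width is reported.

```python
import random
from statistics import quantiles

def floor_and_ceiling(draws, low=10, high=90):
    """Report an equal-tailed credible interval as a 'floor' and a 'ceiling'.

    low, high: percentiles (integers from 1 to 99) of the posterior draws.
    """
    cuts = quantiles(draws, n=100)  # 99 cut points: the 1st through 99th percentiles
    return cuts[low - 1], cuts[high - 1]

rng = random.Random(42)
# Hypothetical posterior draws of two players' projected value.
player_a = [rng.gauss(2.0, 0.5) for _ in range(4000)]  # steadier projection
player_b = [rng.gauss(1.8, 1.2) for _ in range(4000)]  # more uncertain projection
for name, draws in (("A", player_a), ("B", player_b)):
    floor, ceiling = floor_and_ceiling(draws)
    print(f"Player {name}: floor={floor:.2f}, ceiling={ceiling:.2f}")
```

Here player A ends up with the higher floor and player B with the higher ceiling, which is exactly the kind of comparison the floor/ceiling vocabulary makes easy to say out loud.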
Yeah, definitely, and I resonate with that; I've had the same issues. I'll be able to talk more precisely about sports in a few months, but in a lot of other fields, whether it's marketing, biostats, or electoral forecasting, the issues are related to these. They're also extremely diverse, which is interesting: there's definitely no one-size-fits-all. From my experience, what's extremely important is to know the model extremely well. If you've coded the model yourself, you usually know it really well, because you spent hours getting it to work and understanding what it's doing. And, as you were saying, I think it's extremely important to be able to tell people what the model cannot tell you. These are extremely good points for balancing what people usually wonder about. That's also where I think having a Bayesian model is extremely interesting, right? A Bayesian model is by definition an open box: you have to write down your assumptions. So you know much better what the model is doing than you would with a black-box model.
Yeah, that's another good point. If you go into a meeting with model outputs and, when asked why the model prefers this over that, your only reason is "because the model said so," people aren't going to be super excited about that. Knowing why things are happening really plays into how you validate and check your models. Within that Bayesian world, building simulations is a big part of it: simulating how the model behaves under different constraints and with different pieces of information. That's really important because it gives you useful context to talk about, and it gives you the information you need to head things off at the pass when you know there are going to be some gotchas and some trouble if people have certain types of questions. You can head those things off because you're already aware of them.
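One common form of this simulation work is a calibration check: draw "true" talent from a known distribution, simulate data, fit the model, and see whether the 90% intervals contain the truth about 90% of the time. The sketch assumes a conjugate normal-normal model (chosen for speed) and also shows how a prior that is too tight breaks calibration; all settings are hypothetical.

```python
import random
from statistics import NormalDist

def interval_coverage(sim_prior_sd, fit_prior_sd, n_obs=10, obs_sd=2.0, sims=5000, seed=7):
    """Share of simulated players whose 90% posterior interval contains their true talent.

    sim_prior_sd: spread of the talent distribution used to generate fake players
    fit_prior_sd: spread of the prior the model assumes when fitting (prior mean 0 in both)
    """
    rng = random.Random(seed)
    prior_prec = 1 / fit_prior_sd**2
    data_prec = n_obs / obs_sd**2
    post_sd = (prior_prec + data_prec) ** -0.5
    hits = 0
    for _ in range(sims):
        true_talent = rng.gauss(0.0, sim_prior_sd)
        obs_mean = rng.gauss(true_talent, obs_sd / n_obs**0.5)  # average of n_obs noisy games
        post_mean = post_sd**2 * data_prec * obs_mean  # conjugate update toward prior mean 0
        posterior = NormalDist(post_mean, post_sd)
        hits += posterior.inv_cdf(0.05) <= true_talent <= posterior.inv_cdf(0.95)
    return hits / sims

print(f"correct prior:   {interval_coverage(1.0, 1.0):.3f}")
print(f"prior too tight: {interval_coverage(1.0, 0.2):.3f}")
```

When the fitted prior matches the simulation, coverage sits near 0.90; when it is much too tight, coverage collapses well below that, the kind of gotcha that is far better discovered in simulation than in a meeting.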
Another thing I think is really useful here, and in your prior consulting work I'm sure you've stumbled on or used frameworks like CRISP-DM, is a structured framework. In statistics, there's PPDAC: Problem, Plan, Data, Analysis, and Conclusion. Those types of frameworks help because, again, a lot of the time we're dealing with non-technical audiences who come to you with a question: hey, can we look at this? Oftentimes these questions are very vague and not clearly defined. My younger self would take that, run away, do something for a week or two, and then come back and say, hey, here's this thing. And the reply was usually: that's kind of cool, but I was thinking of it like this, and I would do this with it. And it's like, man, if you'd told me that two weeks ago, I would have done something else.
Using those kinds of frameworks does a few things. First, it gives us an opportunity. I always tell our analysts: question the question. So when someone brings a question, I'm always asking: okay, what would you want to do with this? How do you foresee yourself using it to make a decision? At what cadence would you need to access this information? If I got it to you tomorrow, what kind of decision would you want to make? Really Socratic questions: question the question. That usually leads to one of two results, and both are good. The first is that they walk through those five minutes with me and clearly define what they're looking for. That's great. The other result is the opposite, but it's also a good one: about three minutes in, they say, you know what? I haven't thought about this well enough. Let me think it through a bit more and come back to you. In which case I didn't waste time building things, scraping and cleaning data, and doing all that stuff.
The other thing those frameworks do, and I try to get analysts to think like this, is let you use each step as a touch point back to the person who asked the question: hey, this is where we're at. We've collected this kind of data. These are the things we're thinking. These are the features we're thinking about using. What do you think about that? Anything else you can think of? By doing that at each step along the way, they get to see the model develop and they get to provide input, and that gives them a bit of ownership over it. So when you get to the end result, they're like, geez, this was built exactly to my vision, and now I'm excited to use it. And that's a really cool thing too.
Yeah, yeah. Thanks for that detailed answer, Patrick. I can definitely hear the 10 years of experience behind it, and it makes me think about a lot of other things. It's definitely the same for me: I would say my personal evolution has been toward really understanding the question the consumer of the model is trying to get to. What actually is your question? Because you have something in mind, but maybe the way we're talking about it right now, and the way I have it in mind, is not what you want. And so, as you were saying, a good model is really custom-made, and that's hard work that takes time. So before investing all that time in building the model, let's make sure we align and agree on what we're actually studying. I think that's extremely important.
Yeah, no doubt. I think that's often the hardest part, because it's about getting people to really define what they want. That, and making sure you have good data: those are the two biggest things. The model-building part happens a little more easily once you've done those first two things. They're always the tough part.
Yeah, yeah, yeah. Actually, continuing on that topic: how do you communicate these statistical concepts, many of which are honestly really complex, to non-stats people in your line of work? I'm guessing that would be scouts, as you mentioned, coaches, and players. How do you make sure they understand what you're doing and, in the end, are able to use it? We talked about that in episode 108 with Paul Sabin: if your model is awesome but not used, it's not very interesting. So, how do you do that?
First, I try to really understand what cadence this is going to be on. Some questions, especially in sport, get asked more from a knowledge-generation standpoint: I have a question that I think will help us update our priors, our prior beliefs about the game. Maybe things have changed; maybe rule changes have altered things. Can we study this? A knowledge-generation question requires a different output than something meant for weekly or daily consumption. If it's for knowledge generation, it's usually communicated as a short written report: the question at the top, the bottom line up front, four bullet points, and then the nitty-gritty of how we went about studying it, with charts and graphs. It's usually a page or two as a PDF, or maybe an interactive HTML file with a table of contents so they can jump to different sections.
If the question is about something that needs to be evaluated weekly or daily (I need to see this every week because we're going to be evaluating a certain player or an opponent, or I need to see this daily because it's related to player health), we're always thinking in terms of web applications. So I have to think through the full-stack pipeline: Where do we get the data? Where does it live in the database? What's the analysis layer? Where does it output to, and where is that output stored? And then how does the website ingest that output and make it consumable? For that, it's usually some form of charts, graphs, and a table, and usually it's interactive, so they can sort, filter, hover over points, and access the information. And again, as best as possible, I'm always trying to develop it in the way they're going to use it.
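The full-stack flow described above (data source, database, analysis layer, stored output, web app) can be sketched end to end in a few lines. Everything here is a hypothetical stand-in: an in-memory SQLite table plays the database, a simple aggregate plays the model, and a JSON file is the output artifact a web front end would read.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

def run_pipeline(rows, output_dir):
    # 1) Ingest: land raw data in the database (in-memory SQLite as a stand-in).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sessions (player TEXT, day TEXT, load REAL)")
    con.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)

    # 2) Analysis layer: a placeholder aggregate instead of a real model.
    summary = con.execute(
        "SELECT player, AVG(load), MAX(load) FROM sessions GROUP BY player ORDER BY player"
    ).fetchall()
    con.close()

    # 3) Output: store a JSON artifact the web app can ingest, sort, and filter.
    records = [{"player": p, "mean_load": round(m, 1), "max_load": mx} for p, m, mx in summary]
    out_path = Path(output_dir) / "daily_summary.json"
    out_path.write_text(json.dumps(records, indent=2))
    return out_path

# Hypothetical training-load rows: (player, date, session load).
rows = [("A", "2024-07-01", 410.0), ("A", "2024-07-02", 520.0), ("B", "2024-07-01", 300.0)]
with tempfile.TemporaryDirectory() as tmp:
    path = run_pipeline(rows, tmp)
    print(path.read_text())
```

Keeping the output as a stored artifact, separate from the web app, means the analysis layer can be rerun or swapped without touching the front end.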
For example, I was sitting down today with our director of player health, and he said, I'd love to have this information daily so that I can relay it to the new coaching staff, and I want to say these things. Okay, great. I have all that information and all of those models, but come over to the whiteboard and draw for me the path you want to take, from sitting at your desk reading the information on a webpage to how you want to communicate it. As soon as he started drawing it out, I knew exactly what to do. That's perfect. Otherwise I would have built something that I thought would be useful but maybe wasn't useful to him. Then he'd use part of it, or, because he's super motivated, he'd use it, but he'd also use ten other things to get the other stuff he wants, and because he's a nice guy, he wouldn't want to tell me that it doesn't have everything he needs. Then four weeks later I walk into his office and ask, what are you doing? And he says, oh, I go here and get this information from this webpage, but then I go to these three other webpages. Whoa, whoa, whoa, why didn't you just tell me that? I could make this all into one thing. So a really important piece is knowing how the data is going to be used, and making sure it's in exactly the order the decision-maker needs it.
Yeah. Awesome points. Thanks for that, Patrick. I think it's also very valuable to a lot of listeners, because we're talking about a professional sports team here, but it's definitely transferable to basically any company where you're working with different people who use the models but don't produce them themselves. That's almost every company out there. And from my experience doing consulting in a lot of different fields, I can definitely vouch for the things you've touched on here. So thanks, that's very valuable.

Let's turn back to the technical stuff, because I see time is running and I definitely want to touch a bit more on the sports side of things and how Bayesian stats are applied in the field. I'm guessing a very important part of your work is drafting players and player selection. So how might Bayesian methods be applied here to improve drafting strategies and player selection processes?
Yeah, well, like I said earlier, everybody's going to miss. It's impossible to have a great hit rate and always pick players who are going to reach a high level of success. A lot of that is because performance and talent are extremely right-tailed. You have a whole bunch of players who never make it. You have a small group who are good enough to make it. You have an even smaller group who are good enough to make it and really good enough to play all the time. And then you have a few Hall of Famers sprinkled in, right? So it's really right-tailed, and it is very hard to do this stuff. So understanding, or modeling, your uncertainty is really important. And so is information from the domain experts: scouts see things on film that we can't see in the numbers, and vice versa.
One of the values we bring is that we can process way more players than any one human can actually watch. So we can build models that identify players and, hopefully, get them over to the domain experts, who then watch the film and write the reports: hey, did you know this guy was really good at these things? This is his potential ceiling, and we think he would be valuable for our team, right? Building models like that, which help us identify talent and give us a range of plausible outcomes, does two things. One, it helps us get information to the people who have to watch the film and make the decisions. Two, it helps us have discussions about the appropriate time to acquire people. It's the same thing in the Major League Baseball draft: everybody knows who the first-round and second-round picks are. It's after that that things become pretty sparse. If you can identify players with unique abilities later in the draft, that opens up a lot of opportunities to select players who might be able to contribute successfully to your team. That's really where those models help us.
The other area where they help us is something I always talk about with our analysts: what is the benchmark you're trying to beat? You can't just build a model. I remember one of our analysts said, I built a model and I think it's really good. And I said, cool, how well does it do against the benchmark? She said, well, what do you mean? And I said, how well does it do historically against, say, just using scout grades, or just using public perception? She said, no, no, no, I don't care about that; this model just uses their stats. And it's like, no, you have to care about that, because if it's not better than those things, why would we use it? You have to be able to beat that benchmark.
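Checking a model against a benchmark can start with scoring both on the same historical outcomes. The sketch below uses the Brier score (the mean squared error of predicted probabilities), which is one reasonable choice rather than the metric the team necessarily uses; the prospects, outcomes, and "scout-grade benchmark" probabilities are all made up, and a real comparison would use out-of-sample predictions over many more players.

```python
def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

# Hypothetical historical prospects: 1 = became a regular starter, 0 = did not.
outcomes = [1, 0, 0, 1, 0, 0, 1, 0]
benchmark = [0.6, 0.5, 0.3, 0.4, 0.2, 0.4, 0.5, 0.3]  # e.g., probabilities derived from scout grades
model = [0.7, 0.3, 0.2, 0.6, 0.1, 0.4, 0.6, 0.2]      # the new stats-based model

b, m = brier(benchmark, outcomes), brier(model, outcomes)
print(f"benchmark Brier={b:.3f}, model Brier={m:.3f}, model beats benchmark: {m < b}")
```

The point of the exercise is the comparison itself: a model that can't beat the cheap benchmark on history has no case for being used.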
One of the areas where we can really beat a benchmark is when we combine the domain experts' information with the actual observed data, and a Bayesian model allows us to do that. It lets us take the domain expert who's maybe grading the player a certain way and writing information about him, take that information, mix it with the numbers, and get a model that is, I guess, man and machine. And those models beat our benchmark by much more than either source alone: just the numbers, having never watched any film or known anything about the player, or just the domain expert information. When we combine those things, we tend to do a much better job, and that's where Bayesian analysis really helps us.
That's also where you start to get interesting discussions about the floor and the ceiling of a player. Once you run their posterior distribution with the domain experts' information in there, you might have something like this: the numeric model says, yeah, I think this guy's a pretty good tackler; these are his numbers. The domain expert says, no, no, no, I watched him: he doesn't play against great competition, and his technique is really bad. It's not going to translate against these bigger players. Well, that's information our stats might not have. But when we combine those two bits of information, all of a sudden our maybe overly bullish belief in this player gets brought down a bit.
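To make the tackling example concrete, here is a minimal grid-approximation sketch. It treats the player's stats and the scout's assessment as two independent, noisy measurements of the same latent tackling skill on a z-score scale, combined with a population prior. That framing, the noise levels, and every number are assumptions for illustration, not the team's actual model.

```python
from statistics import NormalDist

def grid_posterior(grid, prior, measurements):
    """Normalized posterior weights over a 1-D grid of latent skill values.

    measurements: list of (observed_value, noise_sd) pairs, each treated as an
    independent normal measurement of the same latent skill.
    """
    weights = []
    for skill in grid:
        w = prior.pdf(skill)
        for value, sd in measurements:
            w *= NormalDist(skill, sd).pdf(value)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

grid = [i / 100 for i in range(-300, 301)]   # latent tackling skill, z-score scale
prior = NormalDist(0.0, 1.0)                  # hypothetical population prior
stats_only = [(1.8, 0.6)]                     # strong numbers against weak competition
with_scout = stats_only + [(-0.5, 0.8)]       # scout: the technique won't translate

for label, meas in (("stats only", stats_only), ("stats + scout", with_scout)):
    post = grid_posterior(grid, prior, meas)
    mean = sum(g * p for g, p in zip(grid, post))
    print(f"{label}: posterior mean skill = {mean:.2f}")
```

Adding the scout's assessment pulls the bullish stats-only estimate (about 1.32) down to roughly 0.79. How much it moves depends entirely on the noise assigned to each source, which is the "how you weight those things" question.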
Using information like that is interesting, and it also makes the model specific to the people in that room, the domain experts you have, and things like that. How you weight those things is really important. Within our own analytics staff, we'll build our own separate models, have our own meetings, and build our own analyses. So we'll have independent models set against each other, and maybe we'll weight them, or use something like a triangular prior, combine them, and get posterior simulations. We try to do those things in a way that lets us understand all the plausible outcomes that might be relevant for this individual.
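The exact weighting scheme isn't described in the episode, so the sketch below swaps in one simple way to pool independently built models: a linear opinion pool, where each pooled draw comes from model k with probability equal to its weight. The three "models" are stand-in normal distributions and the weights are hypothetical.

```python
import random
from statistics import fmean

def pooled_draws(model_draws, weights, n=10_000, seed=3):
    """Linear opinion pool: pick model k with probability weights[k], then one of its draws."""
    rng = random.Random(seed)
    names = list(model_draws)
    picks = rng.choices(names, weights=[weights[k] for k in names], k=n)
    return [rng.choice(model_draws[k]) for k in picks]

rng = random.Random(0)
# Hypothetical posterior draws of a prospect's value from three independently built models.
model_draws = {
    "stats_model": [rng.gauss(3.0, 1.0) for _ in range(4000)],
    "scout_model": [rng.gauss(1.5, 0.8) for _ in range(4000)],
    "analyst_model": [rng.gauss(2.2, 1.5) for _ in range(4000)],
}
weights = {"stats_model": 0.5, "scout_model": 0.3, "analyst_model": 0.2}
pooled = pooled_draws(model_draws, weights)
print(f"pooled mean = {fmean(pooled):.2f}")
```

The pooled mean should land near the weighted average of the component means (0.5 * 3.0 + 0.3 * 1.5 + 0.2 * 2.2 = 2.39), but unlike simply averaging point estimates, the pooled draws keep each model's spread and the disagreement between models, so a floor and ceiling can be read straight off them.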
It's fascinating. Yeah, and I really love that about this field: the fact that you have to blend a lot of different information, like the domain knowledge from the scouts, the benchmark from the markets, the models you have in house, and the scientific knowledge of all the scientists the team has. That makes everything much more complicated, right? I'm guessing that sometimes, as the modeler, you'd be like, my God, this would be so much easier if we could just run a very big neural network and be done. At the same time, I think that's what makes the thrill of this field, at least for me: this stuff is really hard. There's a lot of randomness, and there are a lot of things we don't really understand either. You have to blend all of these elements together to make the best decisions you can, even though you know you're not making optimal decisions, as you were saying. I think it's a fascinating field for studying important decision-making under uncertainty.
Yeah, for sure. I think that's the most interesting thing about it to me. Decision-making under uncertainty is really challenging, and I think that's what makes this the coolest stuff to work on.
Yeah, yeah, no, definitely. Maybe a last question on the technical side. We've talked about the beginning of a player's career, like the draft, and about the whole lifetime of the player, which is performance projection over the whole career. Now I'm wondering about the day-to-day stuff. What can Bayesian models tell us here, or how can they help us predict the impact of training loads on athletes' well-being and performance? I think it's kind of a frontier in almost all sports, but I'm curious what the state of the art is, especially in American football.
Yeah, it really is, I think, one of the final frontiers in sport. Team sport is just challenging because you perform well, win, or lose due to a whole bunch of factors that sometimes have nothing to do with you. For example, we could train you and you could be very fit and strong, and if on the last play of the game the quarterback throws the ball into a patch of grass and you lose, that had nothing to do with you being fit and strong. Contrast that with individual-sport athletes. If you're a 400-meter runner, a cyclist, a swimmer, or a marathoner, then physiologically, if we build you up, there's a much more direct line between how you develop and your performance. There's not a lot of other information in the way. No one's trying to tackle you on the bike or in the pool. So that makes team sport much more difficult.
Baseball is probably the closest, because even though it's a team sport, it really is a zero-sum duel between a pitcher and a batter: one guy wins and one guy loses. And the events are very discrete. The states of the game are well defined: runners on first and second with two outs, bottom of the third, and so on. So it's maybe a bit clearer in baseball.
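The discreteness of baseball's game states is easy to see in code: each of the three bases is occupied or empty, and there are 0, 1, or 2 outs, giving 2 * 2 * 2 * 3 = 24 base-out states. Inning and score extend the state further. The tuple encoding below is just one convenient choice.

```python
from itertools import product

def base_out_states():
    """Enumerate the 24 base-out states: (runner on 1st, 2nd, 3rd, outs in {0, 1, 2})."""
    return [
        (first, second, third, outs)
        for first, second, third in product((False, True), repeat=3)
        for outs in range(3)
    ]

def describe(state):
    """Human-readable label for one base-out state."""
    first, second, third, outs = state
    runners = [name for name, on in zip(("1st", "2nd", "3rd"), (first, second, third)) if on]
    return f"{', '.join(runners) or 'bases empty'}, {outs} out{'s' if outs != 1 else ''}"

states = base_out_states()
print(len(states))  # 24
print(describe((True, True, False, 2)))  # 1st, 2nd, 2 outs
```

Because every plate appearance moves the game from one of these states to another, events can be measured against a small, fixed set of situations, which is much harder to do in invasion sports.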
In the other team sports, the invasion sports, what makes this challenging is identifying the discrete events we're trying to measure against; I always try to take it back to that. I can give you a pretty clear example from basketball. I was talking with a friend who works for an NBA team, and he said, our coaches and scouts feel like our players don't close out on three-pointers fast enough.
And I was like, well, is that a tactical
problem or is it a physical problem?
::
And he's like, well, how would we look at
that?
And I was like, you have the player tracking data. You know every time your team's on defense, which is easy to know, and you know every three-pointer that's been shot against your defense. If you were to take that frame out of the player tracking data, and maybe the frames from a second to a second and a half before it, you'd have all of that information for every one of those three-pointers. You have an idea of the relationship between your player and the player who's taking the three-point shot. You have an idea of the relationship between your player and the other players on his team.
So, from a tactical standpoint, you know what type of formation or defense you're trying to run. So first things first: are the players in the right position to close out that three-pointer? Maybe, you know what, our guys consistently mess up the defensive shape, and when they get in there, they give too much ground to the guy shooting the three. The other is the physical standpoint: well, no, they're in good position, but when they go to close it out over that second and a half, they're not fast enough to get there.
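The closeout analysis just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the approach any team actually uses: the frame format (a timestamp plus a dict of `(x, y)` court coordinates in feet, sorted by time), the player IDs, and the function name `closeout_summary` are all hypothetical, and only the 1.5-second window comes from the conversation. It separates the tactical question (how far away was the defender when the window opened?) from the physical one (how fast did he close the gap?).

```python
import math
from typing import Dict, List, Tuple

# Hypothetical frame format: (time in seconds, {player_id: (x, y)}), court
# coordinates in feet, frames sorted by time. Real tracking feeds differ.
Frame = Tuple[float, Dict[str, Tuple[float, float]]]


def closeout_summary(frames: List[Frame], shot_time: float, shooter: str,
                     defender: str, window_s: float = 1.5) -> Dict[str, float]:
    """Summarize one defender's closeout over the window_s seconds before a shot."""
    window = [f for f in frames if shot_time - window_s <= f[0] <= shot_time]
    if len(window) < 2:
        raise ValueError("need at least two frames inside the window")

    def gap(frame: Frame) -> float:
        # Straight-line distance between shooter and defender in this frame.
        (sx, sy), (dx, dy) = frame[1][shooter], frame[1][defender]
        return math.hypot(sx - dx, sy - dy)

    start, end = window[0], window[-1]
    return {
        # Tactical side: was the defender in position when the window opened?
        "start_gap_ft": gap(start),
        # How much space the shooter had at release.
        "release_gap_ft": gap(end),
        # Physical side: average rate at which the gap closed over the window.
        "closing_speed_ft_s": (gap(start) - gap(end)) / (end[0] - start[0]),
    }


# Synthetic example at 10 frames per second: a stationary shooter and a defender
# closing from 12 ft to 3 ft in 1.5 s.
frames = [(i / 10, {"shooter": (22.0, 0.0), "defender": (10.0 + 6.0 * (i / 10), 0.0)})
          for i in range(16)]
print(closeout_summary(frames, shot_time=1.5, shooter="shooter", defender="defender"))
# {'start_gap_ft': 12.0, 'release_gap_ft': 3.0, 'closing_speed_ft_s': 6.0}
```

Aggregated over every three-pointer allowed, a large starting gap would point toward a positioning (tactical) issue, while a small starting gap with a slow closing speed would point toward a physical one.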
Okay, great. Now roll it back to what you can measure in the gym. Is there some measure, let's say on a force plate, of the amount of impulse (the area under the force-time curve) that the player outputs that can tell us something about their ability to apply force into the ground and move rapidly to close out that three-pointer?
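Impulse here means the area under the force-time curve. A minimal Python sketch, assuming a vertical ground-reaction-force trace in newtons sampled at a fixed rate (the function names are hypothetical, not a force-plate vendor's API): subtract body weight to get net impulse by trapezoidal integration, then use the impulse-momentum relationship (net impulse equals mass times change in velocity) to turn it into a change in velocity.

```python
from typing import Sequence

G = 9.81  # gravitational acceleration in m/s^2


def net_impulse(force_n: Sequence[float], sample_rate_hz: float,
                body_mass_kg: float) -> float:
    """Trapezoidal area under (vertical force - body weight), in newton-seconds."""
    dt = 1.0 / sample_rate_hz
    net = [f - body_mass_kg * G for f in force_n]
    return sum((a + b) * dt / 2 for a, b in zip(net, net[1:]))


def velocity_change(force_n: Sequence[float], sample_rate_hz: float,
                    body_mass_kg: float) -> float:
    """Impulse-momentum relationship: change in vertical velocity = net impulse / mass."""
    return net_impulse(force_n, sample_rate_hz, body_mass_kg) / body_mass_kg


# Hypothetical trial: a 100 kg athlete producing a constant 1981 N (about 1000 N
# above body weight) for 0.3 s, recorded at 1000 Hz, i.e. 301 samples.
trial = [1981.0] * 301
print(net_impulse(trial, 1000.0, 100.0))      # about 300 N*s
print(velocity_change(trial, 1000.0, 100.0))  # about 3 m/s
```

Subtracting body weight is one common convention for net impulse; metrics defined over a specific phase of a jump would need the trace segmented first.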
And maybe if you look at several years' worth of data, you'd find that the top players on your team all do this thing really well, and some of the worst players at closing out the three do this thing poorly. And so now you have something to say, like, "Hey, what if we develop this quality in the off-season? Would our players be able to close out three-pointers more effectively, more efficiently?"
And so I think, from that standpoint, linking the development piece to team sport, invasion sport, you have to really think about the discrete events of the game and how you can tease those out of, let's say, the player tracking data. And it's super hard in something like football, because players all do really different things. The linebacker does something totally different than the offensive lineman. So you have to really get down to the domain of each of those positions and say, "Gosh, what are the discrete events that define what this position does? Then how do we measure success in those? And if we can measure success, how do we identify the archetype of players who are good at those things?" And if we can do that, maybe then we can start to talk about: is this something that you can develop in a player, or is it something that you have to identify in a player?
That's sort of, in my head... I mean, I don't know, I could be wrong. I think everybody's trying to figure this out. But in my head, that's at least the process that I try and think through when I think about these things.
Yeah. Yeah, it makes a ton of sense. I mean, there are so many open areas of research on all of that stuff. That's just fascinating. I'm already thinking it'd be amazing to have a huge Bayesian model with all of those topics that we've talked about. Basically, it could be one big Bayesian model where you have a bunch of likelihoods. And yeah, that'd be super fun. I'm guessing we're still a bit far from that, but maybe not too far. Hopefully in a few years; that'd definitely be super fun.
Yeah, no doubt.
Yeah, I mean, that's definitely doable. But you need really good data, and you need really good structure in your model.
Yeah, that's the other part: getting good data. Player tracking data is fine, but it has errors. People who think it's a panacea, it's like, have you really worked with it? I mean, it's sampling at 10 hertz for humans that move really, really fast. Acceleration is the derivative of speed, and at 10 hertz, for people who are moving really fast, that data gets noisy pretty quickly.
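The point about 10 Hz data getting noisy can be made concrete. Estimating acceleration from positions means differentiating twice, and a central finite difference divides by dt², which is 0.01 s² at 10 Hz, so small position errors are amplified about a hundredfold. A minimal Python sketch on synthetic data (a hypothetical runner at a constant 8 m/s, so the true acceleration is zero, with one 5 cm position error injected by hand):

```python
from typing import List


def second_difference(x: List[float], dt: float) -> List[float]:
    """Central-difference acceleration (x[i+1] - 2*x[i] + x[i-1]) / dt**2 at interior samples."""
    return [(x[i + 1] - 2 * x[i] + x[i - 1]) / dt ** 2 for i in range(1, len(x) - 1)]


dt = 0.1          # 10 Hz sampling interval in seconds
speed = 8.0       # constant speed in m/s, so the true acceleration is zero
positions = [speed * i * dt for i in range(11)]

clean = second_difference(positions, dt)

noisy_positions = list(positions)
noisy_positions[5] += 0.05  # one 5 cm tracking error in a single frame
noisy = second_difference(noisy_positions, dt)

print(max(abs(a) for a in clean))  # essentially 0 (floating-point round-off only)
print(min(noisy), max(noisy))      # about -10 and +5 m/s^2
```

A single 5 cm error produces spurious accelerations of about +5, -10, and +5 m/s² in the three surrounding estimates, which is why tracking-derived accelerations are usually smoothed or filtered before use; the choice of filter is a separate assumption.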
I think as we progress and the technology keeps improving, things get better. You get better data, and maybe that also helps you answer some of these questions a little more specifically.
Yeah. And then we'll be able to have our huge Bayesian model with a lot of different likelihoods in there that fit into each other. And then we don't even need to play the game.
We don't have to play the game. Just let the computers play the game and it's over. We're done.
Yeah, no. No, you still have to play the game, because you still have randomness.
Then you're like, yeah. I mean, otherwise the model is kind of like, if you want, a quantum state, right? The model can see the probabilities of things happening, but then you have to open the box and see what is actually happening. So you can have the best model; in the end, you still have to play the game to see what's going to happen, because it's not deterministic.
Yeah, thankfully.
Yeah, that's right. Yeah. Yeah.
But I mean, I always love doing these big models, and that's definitely doable. I've done that for election forecasting, for instance, where you have several likelihoods: one for polls and one for elections. So yeah, I know that's definitely doable in the Bayesian framework, because, I mean, why not? It's all just part of the same big model, a directed acyclic graph, if you want.
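The idea of one big model with several likelihoods can be shown in its simplest possible form. A real model like the election forecast mentioned above would normally be written in a probabilistic programming language and sampled; this Python sketch instead assumes a normal prior on one shared parameter and normal likelihoods with known standard deviations, which makes the posterior available in closed form (precisions add). The function name and the example numbers are hypothetical.

```python
from typing import Iterable, Tuple


def combine_normal_likelihoods(
    prior_mean: float,
    prior_sd: float,
    sources: Iterable[Tuple[Iterable[float], float]],
) -> Tuple[float, float]:
    """Posterior mean and sd of a shared parameter theta.

    Model: theta ~ Normal(prior_mean, prior_sd), and for every source
    (observations, sd), each observation y ~ Normal(theta, sd) with sd known.
    This is conjugate, so the posterior is Normal: precisions (1 / sd**2) add,
    and the posterior mean is the precision-weighted average.
    """
    precision = 1.0 / prior_sd ** 2
    weighted_sum = prior_mean * precision
    for observations, sd in sources:  # one likelihood term per data source
        for y in observations:
            precision += 1.0 / sd ** 2
            weighted_sum += y / sd ** 2
    return weighted_sum / precision, precision ** -0.5


# Hypothetical numbers: a latent vote share informed by three polls (sd 0.03 each)
# and by one other, noisier source (sd 0.05), under a weak prior.
polls = ([0.52, 0.54, 0.50], 0.03)
other = ([0.48], 0.05)
mean, sd = combine_normal_likelihoods(0.5, 0.1, [polls, other])
print(round(mean, 3), round(sd, 3))
```

Each data source enters through its own likelihood term, and noisier sources automatically get less weight. Once the noise levels become parameters to estimate, the model is no longer conjugate, and that is where a sampler becomes necessary.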
But yeah, I'm curious to see that done in sports. Maybe we'll get back together for another episode, Patrick, where we talk about that and how we did it.
That'd be cool.
Yeah, there you go.
Yeah, actually, to close us out here, I wanted to ask you about (and you've started talking about that right now) some emerging trends in sports analytics that you believe will significantly impact how teams manage training, performance, and drafting in the near future. And also, if there are any sports you see as more promising than others.
Well, I mean, yeah, trends. Yeah. We talked a lot about that stuff, and I think better data and better technology, all of those things, will help us. I think it's also getting the decision-makers comfortable with the utility of some of this stuff. You know, baseball has always been a game of numbers. And, I think, early, maybe mid-[…] seven, releasing data to the public, really being the first sport to get player tracking data, things like that.
I think that opened up a lot of opportunities for people to do really interesting work in the public space, which then got teams interested, and then more of a shift in the people in the front office, where historically it was maybe ex-players who played until they retired and then became scouts and managers and things like that. I think that happening in baseball was a really good thing for that sport.
And I think that slowly needs to happen for the other sports, because the more these things are open and sort of curbside, the more the decision-makers become comfortable with them and can say, "I can see how I would use this. I can see what this might help me with." So never underestimate the work that you do in the public space, because I think there's always an opportunity to help things evolve. Crowdsourcing, I guess.
Yeah, I mean, you're preaching to the choir here. Personally, I'd love for a lot more of these data to be open-sourced.
Yeah, I mean, there is also an extremely interesting trend right now toward open-sourcing more and more parts of large language models. I think that's going to be extremely interesting to see develop, because at the same time this is very hard: these kinds of models are just so huge, and you need a lot of computing power to make them run. So I don't know how open source can help with that, but I know how open source can help with the development, sustainability, trustworthiness, and openness of all that stuff. So that's going to be super interesting.
And I'm also going to be very interested in how the different sports evolve, now that, basically, the nerds are much more present than before. Probably baseball is going to be at the forefront of that, because they're just years ahead of the other sports. So it's going to be interesting to see how things play out there when it comes to data.
Because at the same time, I'm not sure it makes a lot of sense for all the clubs to have their own data-collection structure if, in the end, they just have the same data. I'm guessing that gathering data is mainly limited by the technology, much more than by the ideas of a coach, a manager, or a scientist being like, "I […]". I think in the end, data collection is something that can be pretty much collective, and then how you use the data is the proprietary stuff. It's going to be interesting to see that play out.
Yeah, no doubt.
Great. Well, Patrick, I've taken a lot of your time already. I need to let you go because...
You need to drink some coffee.
I definitely need to, because that was very intense. But man, so interesting.
Before letting you go, I have the last two questions, of course, as usual. You told me before we started the show that when the season starts again for you in US football, your days are going to be extremely busy, like basically working from 5 a.m. to 10 p.m. or something like that. How is that possible? When do you sleep?
We do have some long days. It depends on the day of the week and when the full practice days are. Usually, yeah, I'd get in around 4:45 or 5, have a bit of a workout, and then start the day around 6:30 or 7. And it's really long. I mean, there's a ton of meetings. It's a very tactical sport, if you've ever watched it, and so the players are nonstop in and out of meetings and walk-through practices and full practices and then more meetings. It's all a big tactical pattern-recognition type of thing. And so we're in there working on projects and data, getting models set up, identifying things in the data for the staff, and things like that.
It just becomes this really long day. I mean, we go home at eight or nine, maybe 9:30 sometimes, maybe 10, but there are people there who'll stay even later than that, just going through film and watching it. They are very long days. Usually those types of days are about three days a week, and then on the other days I might be in there at five and get out at, like, five or six. So still 12-hour days, but it's a long week for sure.
This is brutal.
Yeah.
But is it like that during the whole season, or is that mainly the start of the season?
No, that's the season. That's 18 weeks: we have a bye, and 17 games this season.
Damn, impressive. You have to be sharp with your sleep in these weeks too, I guess.
You do, yes. You try and catch up on the weekends.
Yeah, damn.
Awesome. Well, Patrick, I think it's time to call it a show. Thank you so much, that was amazing. Of course, I'm going to ask you the last two questions I ask every guest at the end of the show. You knew that was coming, right?
Yes.
So what's the first one?
You know the first one.
The first one is: if you had unlimited resources, what problem would you solve?
Yeah, unlimited time and resources.
I'll take one outside of sport, but one I witnessed in sport. When I first started, I used to do all of the GPS stuff live on the field (now someone else does it), coding it or cutting it up and stuff like that during practice. At the time, Friday practice was the day for our Make-A-Wish child. They'd have kids whose wish was to see a practice and meet their favorite NFL players. These were usually kids who were small and terminally ill.
I think that's probably the thing that I would solve, because you stand there and watch that, and you work with all these guys who are healthy and young, and then you see this little kid who will never have a chance to be healthy and young, but they're just so happy to meet these guys. I think that's a super unfair thing for those little kids. So if I could solve anything, it'd be that: kids and cancer and stuff like that. I think it's just a horrible thing.
And then your second question is always: if you could have dinner with anyone, dead or alive, who would it be? There are so many good ones, but I think I would pick a previous guest you've had, I think three times if I'm correct, which is Andrew Gelman. I think he's fascinatingly interesting, and I think dinner would be pretty amazing.
Yeah. Yeah. Both good choices, amazing answers. Thanks, Patrick. I can tell you're a faithful listener, because, yeah, you knew the questions. You're basically taking my job, I can see that. No, that's great. So Andrew, if you're listening: if you're ever in New York, Patrick will try and make that work.
That'd be fun for sure.
Yeah, Andrew is always fantastic to talk to, so that's definitely a great choice.
Awesome. Well, that's it, Patrick. Thank you so much for being on the show. I really had a blast and learned a lot about US football, because I think that's not the sport I know the most about. So thank you so much for taking the time. We'll put resources and a link to your website in the show notes for those who want to dig deeper; we have a bunch of links over there. Thank you again, Patrick, for taking the time and being on the show.
Thank you.
This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach true Bayesian state of mind. That's learnbayesstats.com. Our theme music is "Good Bayesian" by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at @alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support. You're truly a good Bayesian.
Change your predictions after taking information in,
And if you're thinking of me less than amazing, let's adjust those expectations.
Let me show you how to be a good Bayesian,
Change calculations after taking fresh data in.
Those predictions that your brain is making,
Let's get them on a solid foundation.