Learning Bayesian Statistics

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag 😉

Takeaways:

  • Communicating Bayesian concepts to non-technical audiences in sports analytics can be challenging, but it is important to provide clear explanations and address limitations.
  • Understanding the model and its assumptions is crucial for effective communication and decision-making.
  • Involving domain experts, such as scouts and coaches, can provide valuable insights and improve the model’s relevance and usefulness.
  • Customizing the model to align with the specific needs and questions of the stakeholders is essential for successful implementation. 
  • Understanding the needs of decision-makers is crucial for effectively communicating and utilizing models in sports analytics.
  • Predicting the impact of training loads on athletes’ well-being and performance is a challenging frontier in sports analytics.
  • Identifying discrete events in team sports data is essential for analysis and development of models.

Chapters:

00:00 Bayesian Statistics in Sports Analytics

18:29 Applying Bayesian Stats in Analyzing Player Performance and Injury Risk

36:21 Challenges in Communicating Bayesian Concepts to Non-Statistical Decision-Makers

41:04 Understanding Model Behavior and Validation through Simulations

43:09 Applying Bayesian Methods in Sports Analytics

48:03 Clarifying Questions and Utilizing Frameworks

53:41 Effective Communication of Statistical Concepts

57:50 Integrating Domain Expertise with Statistical Models

01:13:43 The Importance of Good Data

01:18:11 The Future of Sports Analytics

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.

Links from the show:

Transcript:

This is an automatic transcript and may therefore contain errors. Please get in touch if you’re willing to correct them.


Today's episode takes us into the dynamic intersection of Bayesian statistics and sports analytics with Patrick Ward, the Director of Research and Analysis for the Seattle Seahawks. With a rich background that spans from the Nike Sports Research Lab to teaching statistics, Patrick brings a wealth of knowledge to the table.

In our discussion, Patrick delves into how these methods are revolutionizing the way we understand player performance and manage injury risks in professional sports. He sheds light on the particular challenges of translating complex Bayesian concepts for coaches and team managers who may not be versed in statistical methods but need to leverage these insights for strategic decisions.

Patrick also walks us through the practical aspects of applying Bayesian stats in the high-stakes world of the NFL. From selecting the right players to optimizing training loads, he illustrates the profound impact that thoughtful statistical analysis can have on a team's success and players'...

For those of you who appreciate the blend of science and strategy, this conversation offers a behind-the-scenes look at the sophisticated analytics powering team decisions. And when he's not dissecting data or strategizing for the Seahawks, Patrick enjoys the simple pleasures of reading, savoring coffee, and playing jazz guitar.

This is Learning Bayesian Statistics.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all.

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs

Hello, my dear Bayesians, I have some exciting personal news to share with you. I am thrilled to announce that I have recently taken on a new role as a senior applied scientist with the Miami Marlins. In this position, I'll be diving even deeper into the world of sports analytics, leveraging Bayesian modeling, of course, to enhance team performance and player development. Honestly, this move is so exciting to me, and it solidifies my commitment to advancing the application of Bayesian stats in sports. If you find yourself in Miami, or if you're curious about the intersection of Bayesian methods and baseball or team sports in general, don't hesitate to reach out. OK, back to the show now.

Patrick Ward, welcome to Learning Bayesian Statistics.

Thanks for having me. I listen to every episode. I think every year at the end of the year, Spotify tells me that it's one of my most listened-to podcasts. So it's a pleasure to be here. Hopefully... I don't know if I can live up to your prior. You've had some pretty big-timers, but yeah.

No, yeah. So first, thanks a lot for being such a faithful listener. I definitely appreciate that. And I'm always amazed at the diversity of people who listen to the show. That's really awesome. I also want to thank Scott Morrison, who put us in contact. Scott is working at the Miami Marlins. He's a fellow colleague now. That's a change for me, a great change. I'm extremely excited about that new step in my life. But today we're not going to talk a lot about baseball. We're going to talk a lot about US football. So, European listeners, when you hear football today, we're talking about American football, the one with a ball that looks like a rugby ball.

And so Patrick, we're going to talk about that. But first, as usual, I want to talk a bit more about you. Can you tell the listeners what you're doing nowadays? I gave your title and bio in the intro, but maybe tell us a bit more in the flesh about what you're doing and how you ended up doing it.

Yeah. Well, currently I'm at the Seattle Seahawks, which is one of the American football teams in the NFL, and I'm the director of research and analysis there. We work across all of football operations: everything from player acquisition and front-office type of stuff to team-based analysis and opponent analysis, and coordinating a research strategy around how we attack questions for the key decision makers or key stakeholders across coaching, acquisition, even into player health, performance, development and things like that.

This is my 10th year, and I got here from Nike. I was at Nike in the sports research lab, working for nearly two years as a researcher. The way that I got here was I was doing some projects for Nike around applied sports research, and at the time, I think they had just become the biggest sponsor of the newly minted National Women's Soccer League. They said, we want to do something around this. So we were kicking around ideas, and one of the ideas we had was: what if we went out and tested all of the women in the league, tested them sprinting and jumping and power output and things like that? Then we could basically build archetypes, and that would be useful for apps in your watch and on your phone, and girls in the field could compare themselves to their favorite athletes and stuff.

So they let us do it. They sent me on the road for an entire off-season, the entire off-season training of the National Women's Soccer League. I went around the country to every single team, myself and four colleagues, and we tested every woman in the league. So we had the largest data set on women's soccer players that anyone could have, and we did some conference presentations and things like that with that data. Lo and behold, Nike was there and the Seattle Seahawks called down to Nike and said, hey, we hear there's this test battery and we'd love to see what our players do on it. So I went up and did a project for them around that. And then they kind of just said, what if you just did this kind of stuff all the time?

So that's how I started out 10 years ago. I basically started out just in applied physiology, which was my background, doing wearable tech for the team, like GPS and accelerometry. Then that progressed into draft analysis and player evaluation and things like that. And it just kept growing until, yeah, 10 years later, here we are.

Yeah, that's a great background. I love it because, I mean, it definitely seems like you've been into sports since you were at least a college graduate, but there is also a bit of randomness in this. So I love that. Of course, as a fellow Bayesian, I'm always interested in the random parts of anybody's journey. Actually, how much Bayesian stats do you have in that journey and also in your current work? How Bayesian, in a way, is your work right now, and how were you introduced to Bayesian stats?

Well, I mean, anyone who has watched American football knows it's a game of very, very small sample sizes. Up until two years ago we only played 16 games; we play 17 games now. Compare that with most of the other sports: baseball has 162 games and several hundred at-bats; basketball and hockey have 82 games and many attempts. Also, the players in a lot of these sports are all doing the same things. In baseball, aside from the pitchers, everybody's going to go to the plate and hit. In basketball, everybody has a chance on the court to dribble the ball, shoot, score, pass, get assists, get blocks, et cetera. Football is really unique because it's a very tactical game. There are discrete events in terms of plays, stop and start. But because of the tactical nature of it and one ball, only certain positions touch the ball. There are only certain opportunities that players are going to have. So that was always an issue.

And when I did my PhD, a big part of it was using mixed models to look at physiological differences between players on the field with GPS and accelerometry. Even though I didn't know it at the time, because I hadn't really learned anything Bayesian yet, I always thought of mixed models as a bridge to Bayesian analysis. You have the fixed effects, which behave like our population averages, our population base rates, I guess you could say. And the random effects are sort of like, hey, we know something about you or your group, and therefore we know how you deviate from the population. With those two bits of information, we can also say: here's someone new in the population, or someone we've only observed do the thing one time. Our best guess is therefore the fixed-effects portion of this until proven otherwise.
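Editor's sketch: the idea that a new or rarely observed player defaults to the population average, and moves toward their own data as observations accumulate, can be illustrated with a precision-weighted average. This assumes a simple normal model with known variances; the function name `partial_pool` and every number are hypothetical, not taken from the guest's models.

```python
def partial_pool(player_mean, n_obs, population_mean, within_var, between_var):
    """Blend a player's observed mean with the population mean.

    Assumes a normal-normal model with known variances (an illustrative
    simplification). The weight on the player's own data grows with n_obs.
    """
    if n_obs == 0:
        # Nothing observed yet: the best guess is the population average.
        return population_mean
    weight = between_var / (between_var + within_var / n_obs)
    return weight * player_mean + (1 - weight) * population_mean


# Hypothetical numbers: population average 20.0, player observed at 24.0.
for n in (0, 1, 10, 100):
    estimate = partial_pool(24.0, n, 20.0, within_var=4.0, between_var=1.0)
    print(n, round(estimate, 2))
```

With one observation the estimate stays close to the population value; with many observations it approaches the player's own mean.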

So I always had that in the back of my head, but my first two or three years in the NFL, we used to just throw our hands up when we saw small samples. We'd say, yeah, it's 50%, but it's such a small sample, we can't really know. We didn't really have a good way of sorting out what to do with that information. Because, as you know, one out of 10, 10 out of 100, and 100 out of 1,000 are the same proportion, but different levels of information are contained within those proportions.
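Editor's sketch: one way to see that equal proportions carry different amounts of information is to compare the posterior spread of the underlying rate. This assumes a uniform Beta(1, 1) prior, which is a choice made here for illustration; `beta_posterior_summary` is a hypothetical helper name.

```python
import math


def beta_posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a rate under a Beta prior."""
    a = prior_a + successes
    b = prior_b + trials - successes
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Same observed proportion (10%), very different sample sizes.
for successes, trials in [(1, 10), (10, 100), (100, 1000)]:
    mean, sd = beta_posterior_summary(successes, trials)
    print(f"{successes}/{trials}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

The posterior standard deviation shrinks as the sample grows, even though the raw proportion is identical in all three cases.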

And I stumbled upon a paper, I think it was from like 1977 or something, by Efron and Morris, and it was called Stein's paradox. I probably stumbled on it because I figured there's so much in sabermetrics that someone in baseball has probably figured this out before. So I was probably googling something like "small samples baseball statistics sabermetrics blah blah blah", and I stumbled upon this paper about Stein's paradox. The crux of the paper was: we observe these, I think it was 12 or 18, baseball players through the first half of the season up to the All-Star break. We see the number of times they went to the plate and the number of times they hit, so we have a batting average. If we take the observed batting average through the first half of the season, how well does that predict the batting average at the end of the season, meaning now they've gone through the second half?

You look at that and you're like, okay, what's this all about? The first thing they do is set up the argument that this doesn't do a very good job, because some of these players batted five times or three times. Certainly a player who went three for three has a hundred percent batting average, but we don't think this is the greatest baseball player of all time yet, because we've only seen them do this thing three times. So the basic naive prediction of using the first half of the season to predict the second half wasn't very good.

In that paper, they introduced this kind of simple Bayesian model of saying, well, we know something about average baseball players. What if we weighted everybody toward that? And lo and behold, that did a better job by constraining the small-sample players. Take a guy that goes 0 for 10, which is totally possible in baseball when you have hundreds of at-bats: we don't think that's the worst hitter in baseball. So constraining those players told them something about what they expected to see at the end of the season.
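Editor's sketch: the "weight everybody toward the average player" idea can be illustrated with beta-binomial shrinkage. This is not the estimator from the Efron and Morris paper, only a simple model in the same spirit; the prior mean of .260 and the prior strength of 300 pseudo at-bats are made-up values, and `shrunk_average` is a hypothetical name.

```python
def shrunk_average(hits, at_bats, prior_mean=0.260, prior_strength=300):
    """Posterior mean batting average under a Beta prior.

    prior_mean and prior_strength (pseudo at-bats) are made-up values,
    not estimates from real data.
    """
    prior_hits = prior_mean * prior_strength
    return (hits + prior_hits) / (at_bats + prior_strength)


# Small samples get pulled strongly toward the prior; large samples much less.
for hits, at_bats in [(3, 3), (0, 10), (150, 500)]:
    raw = hits / at_bats
    shrunk = shrunk_average(hits, at_bats)
    print(f"{hits}-for-{at_bats}: raw {raw:.3f}, shrunk {shrunk:.3f}")
```

The three-for-three hitter no longer looks like the greatest player of all time, and the 0-for-10 hitter no longer looks like the worst.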

Through that paper, I then found this blog by David Robinson, who's an R programmer, and it was all about using empirical Bayesian analysis for baseball. He then made it into a nice little book that you could buy on Amazon for, I don't know, $20 or something. I read those two things and I was like, this is incredible. This is exactly what I've always wanted to know. So I went in the next day to our other analyst (at the time, there were only two of us) and I said, I think I've figured out a way we could solve small-sample problems. And that was it. After that, you really couldn't convince me that this wasn't a great way of thinking.

That doesn't mean that everything we do has to be Bayesian. Certainly there are other things we do that use different tools, like machine learning models and neural networks and things like that.

But certainly when we start thinking about decision-making: how do I incorporate priors and domain expertise? How do I fit the right prior? If you went 0 for 5 in your first at-bats, let's say in baseball, but you were a college standout and an amazing player in AAA, I probably have a stronger prior that you're maybe a slightly better than average baseball player than if you went 0 for 5 and you were a horrific college player, you weren't very good in AAA, and you were really the last person on the bench that we needed to call. Maybe that prior is much lower. So utilizing that information to help us make decisions going forward, that was kind of the money for me.
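Editor's sketch: the 0-for-5 example can be made concrete by giving two players the same data but different priors. The Beta parameters below are hypothetical stand-ins for "college standout, strong in AAA" and "last call-up off the bench"; they are not estimated from anything.

```python
def posterior_mean(hits, at_bats, prior_a, prior_b):
    """Posterior mean batting average for a Beta(prior_a, prior_b) prior."""
    return (prior_a + hits) / (prior_a + prior_b + at_bats)


# Hypothetical priors: (a, b) chosen so the prior means are .280 and .220.
priors = {
    "college standout, strong in AAA": (84, 216),
    "last call-up off the bench": (66, 234),
}

# Both players start their major league careers 0 for 5.
for label, (a, b) in priors.items():
    print(f"{label}: {posterior_mean(0, 5, a, b):.3f}")
```

After five hitless at-bats the two estimates barely move from their priors, which is the point: with so little data, what we knew beforehand dominates.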

And so how much do we use it? I mean, one of our new analysts started two years ago, and I think the first thing was, how much do you know about Bayes? And it was like, well, I never really learned that in school, and blah, blah. And it was like, okay, here's two books. Here's a 12-week curriculum. We're going to meet every week and you're going to do projects and homework and reading. And that was it. It was like, you have to learn this because this is how we're going to think, and this is how we're going to process information and communicate information.

Well, what about that? I told the listeners that we were not going to talk a lot about baseball, but in the end we are. It all comes back to baseball, I think.

Yeah, in sports analytics, it all comes back to baseball. Certainly, yeah.

Yeah, okay. If I understand correctly, you were motivated a lot by low sample sizes and being able to handle all of that in your models. That makes a ton of sense. Like a lot of people and a lot of clients I've seen, you were definitely motivated by a very practical problem that you were having. I mean, most people enter the Bayesian field through that.

Something I'm actually very curious about, because I could keep talking about that for hours, is what you're doing at the Seahawks and how Bayesian stats is helpful to what you guys are doing. I think that's the most interesting for the listeners, to understand how they themselves could apply Bayesian stats to their own problems, which are not necessarily in sports. But I think sports is a really good field to think about that, because you have a lot of diversity and also a lot of somewhat controlled experiments. You have a lot of constraints, and that's always extremely interesting to talk about.

Maybe you can start by explaining how Bayesian stats are applied in your current role for analyzing player performance and injury risk. Because now that I work directly in sports, something I'm starting to understand is that projecting player performance and being able to handle injury risk are two extremely important topics. So maybe let's start with that. What can you tell us about that?

Okay.

Let's see. Which one should I start with? I guess I'll start with injury risk, I suppose. I mean, injury is a super difficult problem to solve. I've written a number of papers on it. I think you can link to my ResearchGate, and there's a number of methodology papers we've written that have looked at things like this. I think it's complicated because, one, there's a ton of inter-individual differences as far as why people get hurt. There's a ton of things that we probably don't know are important yet because we can't measure them, or at least can't measure them in the real-world applied setting; maybe in a lab you can. And then there are other things that we just don't know because it's an epistemic problem. We're just stupid about it. We're naive; there are other things out there that maybe we're just unaware of yet. So it's a really hard problem to try and solve.

So when I see papers that come out with an injury prediction model, and they're estimating the prediction as a one or a zero, a yes or a no, a binary response, and they give a nice little two-by-two table and talk about how well their model did, I'm always like, how is that useful to the people who actually have to do the work? Because in reality, what we're dealing with is probably not unlike a hedge fund manager managing the risk of their portfolio. If you think of each player or athlete you deal with as a portfolio, they each have some level of base risk. So if we know nothing about you, we really have to have a pretty good handle on the base rates of injury risk in the sport for position groups, players of different ages and things like that. That might be an initial model, right?

And then from there, the players go out and they play and perform and compete, and they get dinged up and take hits, they get hit by pitches or they get tackled really hard or things like that. We collect that information and we're basically just shifting the probabilities up and down based on what we observe over time. And when that probability reaches a certain threshold (and of course you could use a posterior distribution, so you have an integral of how much of the probability distribution is above or below a certain threshold), then you have the opportunity to have a discussion about when to act or what to do.
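Editor's sketch: "how much of the posterior is above a threshold" can be approximated by drawing from the posterior and counting. The toy model below treats risk as a single rate with a Beta prior set from a base rate and updates it with flagged sessions. That modelling choice, the numbers, and the name `prob_risk_above` are all assumptions made for illustration; they are not the guest's method.

```python
import random


def prob_risk_above(alpha, beta, threshold, draws=20_000, seed=0):
    """Monte Carlo estimate of the posterior mass above a threshold."""
    rng = random.Random(seed)
    above = sum(rng.betavariate(alpha, beta) > threshold for _ in range(draws))
    return above / draws


# Hypothetical base-rate prior: roughly a 10% risk, Beta(2, 18).
prior = (2.0, 18.0)

# Hypothetical evidence: 4 of the last 6 sessions flagged as concerning.
flagged, sessions = 4, 6
posterior = (prior[0] + flagged, prior[1] + sessions - flagged)

threshold = 0.20  # hypothetical level at which staff would want to talk
print("before:", prob_risk_above(*prior, threshold))
print("after: ", prob_risk_above(*posterior, threshold))
```

The output is a probability rather than a yes or no, which leaves the decision about when to act with the staff.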

How you act and when to act is going to depend on your tolerance for risk, or your coach's tolerance for risk. If it's your best player, the MVP of your team, and it's week two of the season... let's say we're using this as a model. Some of the stuff that Scott, whom you mentioned earlier, and I have worked on is return-to-play type models, where the athlete has, say, an ankle sprain and we're rehabbing them. We have a test, or several tests, a test battery, that tells us where that athlete is on their return-to-play timeline. Let's say it's week two of the season, and the posterior distribution looks like this: here's the threshold at which we'd feel comfortable releasing this athlete back to full-on competition, and there's a 30% chance they're in good shape and a 70% chance that they're below that threshold. In week two of the season, we probably want to say, you know what, let's not take that risk this week. Let's be a little more risk-averse here because it is the best player, and let's wait until we have more of the distribution on the right side of the threshold. Alternatively, if it's the final game of the season, the Super Bowl or the World Series or the Champions League final or something like that, you're probably going to take that risk because you need the best player out there.
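Editor's sketch: the same 30% readiness probability can lead to different decisions depending on the stakes. One simple way to express that is an expected-value rule. The gain and loss values below are invented to mimic "week two" versus "final game"; a real staff would not reduce the decision to two numbers.

```python
def should_play(p_ready, gain_if_ready, loss_if_not_ready):
    """Return True when the expected value of playing is positive."""
    expected_value = p_ready * gain_if_ready - (1 - p_ready) * loss_if_not_ready
    return expected_value > 0


p_ready = 0.30  # posterior probability the athlete is above the threshold

# Week two: little to gain from one game, a lot to lose over a long season.
print("Week 2:", should_play(p_ready, gain_if_ready=1.0, loss_if_not_ready=5.0))

# Final game: a lot to gain, and no remaining season to protect.
print("Final: ", should_play(p_ready, gain_if_ready=10.0, loss_if_not_ready=3.0))
```

The posterior stays the same in both cases; only the cost of being wrong changes.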

So when I think about injury risk modeling, what I really think about is: how do we evaluate this individual's current status on our risk score or risk distribution? When do we feel like we need to intervene and do something, and when are we going to feel like this is fine, continue training as is? I think that's the tricky part. I don't think it's easy. I don't think I've solved anything. I don't think anyone has. But certainly from the perspective of our staff, we can all sit down with a performance staff of strength coaches, dieticians and medical people and have these conversations.

What makes a Bayesian approach nice is that we can also take into account domain expertise that we might not have in the data. Say we sit down at a Monday meeting and say, this is where this player is currently at, this is their risk status, and I don't really feel comfortable with that. How do you feel about it? And then one of the medical people says, you know, he's been complaining that his hamstring feels really tight and he's been getting treatment every morning. Well, that's not data we would be collecting, but it's valuable domain information that this individual, who's working with the player, now adds. And it's just like anything in probability: if we have two or three or four independent sources all converging on the same outcome, on the same end point, we should probably feel really good about making that decision and saying, hey, let's do something about this, let's act now, right? So that's how I think about it from that side of things.
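Editor's sketch: when several independent sources point the same way, their evidence multiplies on the odds scale. The prior probability and the likelihood ratios below are hypothetical, and the independence assumption is doing a lot of work; `update_with_evidence` is an illustrative name.

```python
def update_with_evidence(prior_prob, likelihood_ratios):
    """Combine a prior probability with independent likelihood ratios."""
    odds = prior_prob / (1 - prior_prob)
    for ratio in likelihood_ratios:
        odds *= ratio  # each independent source scales the odds
    return odds / (1 + odds)


# Hypothetical: 10% prior risk, then three sources that each raise concern
# (model output, a physio's report of hamstring tightness, daily treatment).
posterior_prob = update_with_evidence(0.10, [3.0, 2.0, 2.5])
print(round(posterior_prob, 3))
```

No single source is decisive here, but together they move a modest prior a long way.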

From the performance side of things, the development side of things, it'd be way different for you guys in baseball, because you draft a player and you don't expect them to maybe get to the major leagues and contribute until 23, 24, 25 years old. For us, you draft a player and next year they're playing, they're ready, they're in the mix. So in your case, in my head, I would be thinking of models that map the growth potential of an individual. How are they progressing through the minor leagues? Which attributes matter? And then maybe from there answering questions like, what's the probability that this player makes 20 starts in the major leagues, or starts for three seasons, whatever end point makes sense to the decision makers, obviously. For us, it's more about player identification. And again, football is a sport of small samples.

In their college years, some of these kids might really only be a starter or a full-time player in their junior and senior years, or maybe just their senior year of college. Additionally, unlike the NFL, where at that highest level the talent is much more homogeneous, you get to the college football ranks and you have this diversity of talent, where you might have a big-time team playing a really lower-level opponent. So you have to adjust things. Say you hand off the ball to your running back who's playing against a very low-level opponent, and he goes for 500 yards or something absurd, 200, 300 yards in a game. That has to be adjusted and weighted in some way, because it's not the same as going for 200 or 300 yards against a big-time opponent. And the big-time opponents are more similar to the NFL players they're going to play against. So all of these types of things fit into models, hierarchical models and Bayesian models, which help us utilize prior information.
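Editor's sketch: a very crude version of "adjusting for opponent quality" is to scale a rushing total by how generous the opposing defense usually is compared with the league. This ratio adjustment is a stand-in chosen for brevity, not the hierarchical model mentioned in the conversation, and all the numbers are hypothetical.

```python
def opponent_adjusted_yards(yards, opponent_allowed_avg, league_allowed_avg):
    """Scale raw yards by the opponent's typical yards allowed vs. the league."""
    return yards * league_allowed_avg / opponent_allowed_avg


league_avg = 150.0  # hypothetical rushing yards allowed per game, league-wide

# The same 300-yard game against a weak defense and against a strong one.
print(opponent_adjusted_yards(300.0, opponent_allowed_avg=250.0, league_allowed_avg=league_avg))
print(opponent_adjusted_yards(300.0, opponent_allowed_avg=100.0, league_allowed_avg=league_avg))
```

A hierarchical model would estimate opponent strength and player ability jointly and would also shrink noisy single-game results, which this ratio does not do.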

425

:

And the other way that the Bayesian models

are useful here

426

:

You know, sometimes we're dealing with

information that's incomplete because we

427

:

can't observe all of the cases.

428

:

You know, for example, in college sport,

division one is the top division.

429

:

You know, and then you have FBS and then

they division two and division three.

430

:

So if you pull all the division two kids

that have ever made it as a pro athlete,

431

:

the list is very small.

432

:

but they're kids that made it.

433

:

And so if you were to just build a normal

model on this, it would say like, well,

434

:

the best players clearly come from these

lower level schools because all of the

435

:

ones that we have seen have made it, have

been successful.

436

:

And in theory, there's hundreds of

thousands of kids from that level that

437

:

have never made it.

438

:

So we have to adjust that model in some

way.

439

:

We have to weight that prior back down.

440

:

Yeah, this guy is really, really good at

that level.

441

:

but our prior belief on him making it is

very, very low.

442

:

And you mean he'd have to be so

exceptional in order to, and this is where

443

:

like, oftentimes people rail on like, use

weakly informative priors, let the data

444

:

speak a little bit.

445

:

But there are times where in these

situations where I feel like you could

446

:

probably put a slightly stronger prior on

this and be like, man, this guy's really

447

:

gonna have to do something outstanding to

get outside

448

:

of the distribution that we believe is on

this just given what we know.
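
The base-rate argument above can be made concrete with Bayes' rule. The sketch below is illustrative only: the base rates and the two likelihoods are made-up numbers, not figures from the episode, and `prob_makes_it` is a hypothetical helper name.

```python
def prob_makes_it(base_rate, p_evidence_if_pro, p_evidence_if_not):
    """Bayes' rule: P(makes it | evidence) from a base rate and two likelihoods."""
    numerator = p_evidence_if_pro * base_rate
    denominator = numerator + p_evidence_if_not * (1.0 - base_rate)
    return numerator / denominator


# Same "outstanding season" evidence, two very different prior base rates (assumed values).
low_level = prob_makes_it(base_rate=0.001, p_evidence_if_pro=0.9, p_evidence_if_not=0.05)
top_level = prob_makes_it(base_rate=0.20, p_evidence_if_pro=0.9, p_evidence_if_not=0.05)
print(round(low_level, 3), round(top_level, 3))
```

With a very low base rate, even strong evidence leaves the posterior probability small, which is the sense in which the player has to do something outstanding to move the estimate.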

449

:

Okay, yeah, that's very interesting.

450

:

That's a very good point.

451

:

It's, yeah, related to survivor bias in

a way.

452

:

Concretely, how do you

handle these kind of cases?

453

:

Is it a matter of using a different prior

for these type of players or something

454

:

Try to do this in a few different ways.

455

:

One is you try and make basically like

equivalency metrics, like saying if you

456

:

did X at this low level, it in some way

relates to Y at this other level.

457

:

So you try and normalize players based on

players that you've seen that have moved,

458

:

say, between levels

459

:

of the game.

460

:

so like, again, if you think about it from

a baseball perspective, you know, hitting

461

:

40 home runs in AA baseball might be

related to, you know, might be in some way

462

:

convert to like 33 home runs in AAA and 24

home runs in the MLB or 12 home runs in

463

:

the MLB or whatever it might be.

464

:

Right.

465

:

So trying to,

466

:

identify equivalencies between those that

we can then like constrain everybody.
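
A minimal sketch of an equivalency metric, assuming we have paired seasons for players who moved from a lower level to a higher one. The data and the simple ratio estimator are hypothetical; in practice this would usually be a regression or hierarchical model with uncertainty around the translation factor.

```python
def equivalency_factor(paired_seasons):
    """Ratio estimate of how output at a lower level translates to a higher level."""
    total_lower = sum(lower for lower, _ in paired_seasons)
    total_higher = sum(higher for _, higher in paired_seasons)
    return total_higher / total_lower


# Hypothetical (lower-level, higher-level) home-run totals for players who moved up.
movers = [(40, 30), (20, 15), (30, 24)]
factor = equivalency_factor(movers)

# Translate a new player's 40 lower-level home runs to the higher level.
projected = 40 * factor
print(round(factor, 3), round(projected, 1))
```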

467

:

Other ways is just like, like you said,

like putting a prior on it,

468

:

knowing the level that the person is

playing at, you would have like a lower

469

:

level of prior.

470

:

For example, it's just like playtime.

471

:

If I think about playtime and performance

as sort of this,

472

:

this kind of like rising curve that goes

to an asymptote of some upper level of

473

:

performance.

474

:

The players way at the left who have very

small number of observations, it would be

475

:

silly to say that my prior for those

players is the league average.

476

:

There's a reason why they're not playing

very much.

477

:

It's probably because people don't think

they're very good, right?

478

:

So somewhere in that curve,

479

:

for each of those numbers of observations

across whatever performance metric we're

480

:

looking at, there's going to be a specific

prior on that continuous distribution.

481

:

And that's where I would, you know, that's

where we would kind of draw a stake in the

482

:

ground and say like, we probably think

based on what we know that this player is

483

:

closer to these players than he is to

those players.
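
The rising curve with an asymptote can be sketched as a prior mean that depends on the number of observations. The exponential saturation form and the parameter values (`floor`, `ceiling`, `scale`) are assumptions chosen for illustration; the episode does not specify the functional form.

```python
import math


def prior_mean_for_playing_time(n_obs, floor, ceiling, scale):
    """Saturating curve: the prior mean rises with playing time toward an upper asymptote."""
    return floor + (ceiling - floor) * (1.0 - math.exp(-n_obs / scale))


# Players with few observations get a prior centered well below the asymptote.
for n_obs in (10, 100, 500):
    prior_mean = prior_mean_for_playing_time(n_obs, floor=0.30, ceiling=0.50, scale=150.0)
    print(n_obs, round(prior_mean, 3))
```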

484

:

Okay, yeah, yeah, I see.

485

:

Yeah, definitely makes sense.

486

:

And yeah, yeah, like that point of play

time already tells you something.

487

:

Because if the player plays less, then

very probably you already have

488

:

information about his level.

489

:

And that means he's at least not as good

as the A level players that play much more

490

:

The only time you get in trouble with that

is like an endowment effect where if you,

491

:

you know, like in major league baseball,

there's been some research on players who

492

:

are drafted very high in the first round,

second round get progressed up and through

493

:

the minor leagues faster than players who

were drafted lower, even if they don't

494

:

outperform those players just because

they're high as a consequence of being a

495

:

high draft pick.

496

:

That one's a tricky one, but there has to

be, at some point it's like actually, and

497

:

this is where, like, you know, posterior

distributions, you can really, I mean,

498

:

it's almost like doing an A/B test.

499

:

Like we've got two players and what's the

probability that this guy is actually

500

:

outperforming the other guy, even though

the other guy might've been, you know, a

501

:

higher draft pick or something like that.
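
The A/B-test style comparison can be sketched by drawing from each player's posterior and counting how often one beats the other. The example assumes a success-rate metric with a flat Beta(1, 1) prior, and the counts are hypothetical.

```python
import random


def prob_a_beats_b(successes_a, attempts_a, successes_b, attempts_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_a > rate_b) from two Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Beta(1 + successes, 1 + failures) is the posterior under a flat Beta(1, 1) prior.
        rate_a = rng.betavariate(1 + successes_a, 1 + attempts_a - successes_a)
        rate_b = rng.betavariate(1 + successes_b, 1 + attempts_b - successes_b)
        wins += rate_a > rate_b
    return wins / draws


# Hypothetical: player A has the better rate but far fewer attempts than player B.
p_a_better = prob_a_beats_b(12, 20, 55, 100)
print(round(p_a_better, 2))
```

The result is a probability rather than a yes or no, which is what makes it usable for the kind of conversation described here.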

502

:

And so you try and at least display, you

know, we try and at least display that

503

:

visually and have those conversations.

504

:

It's,

505

:

kind of in my head, at least maybe I'm

wrong, but a nice way of like helping

506

:

people understand the uncertainty, you

know, which is really important.

507

:

always, maybe it's try, you know, I used

to work with a guy who whenever I would

508

:

present some of the stuff at work and he'd

be like, stop doing that.

509

:

Like every, every time you present, you

talk about like what the uncertainty and

510

:

the assumptions and the limitations are,

like just give them the answers.

511

:

And I'm like, well, it's important that

they know what the limitations are

512

:

and what assumptions are behind this

because we can't, we don't want to talk

513

:

past the sale and sell them on something

that, you know, isn't really there.

514

:

Like there's been times where I've had to

stop someone and just be like, hold on.

515

:

This analysis definitely can't tell us

that.

516

:

Like what you're saying right now, it

can't tell us that.

517

:

Like, let's not try and make

this more than it is.

518

:

And also just, you know, conveying your

uncertainty.

519

:

I mean, that's just super important because

520

:

It's really, really hard.

521

:

I mean, we're all going to fail at trying

to identify talent.

522

:

It's really hard to identify why one

player is going to succeed over another.

523

:

so, you know, in some way it's not binary.

524

:

It's not a like, do you like this guy or

not?

525

:

Is he good or bad?

526

:

Is this guy better or worse than the other

guy?

527

:

There are a lot of factors that go into why

someone has success.

528

:

And so I think conveying that uncertainty

is really important.

529

:

And obviously, the more observations that

we have of you doing the thing, the more

530

:

certain we are that this is your true

level of performance.

531

:

But it takes a while to get there.

532

:

So we have to just be honest about that.

533

:

Yeah, yeah.

534

:

I think that's actually related to

something I wanted to ask you about also a

535

:

bit more generally, you know, but

536

:

the most significant challenges that you

face when applying Bayesian stats in

537

:

sports science and how you address

them, because I'm guessing that you, you

538

:

already started talking a bit about that.

539

:

So, let's go there.

540

:

And then, then I have other technical

questions for you about the kind of

541

:

models and usefulness that

Bayesian stats has in your field.

542

:

But I think this is a good moment to

address these.

543

:

questions.

544

:

I think the biggest, or there's a few

challenges.

545

:

One challenge is not everybody is excited

about a posterior distribution like you

546

:

might be.

547

:

Most of the time, they just want an

answer.

548

:

Tell me what to do.

549

:

Give me the yes or no, make it binary.

550

:

And so that's always tough.

551

:

And you're trying to oftentimes convey

this to non-technical audiences or people

552

:

who are good at doing other things.

553

:

They're not math people or they're not

stats people.

554

:

And that's okay.

555

:

So that always makes it challenging is why

are you showing me this distribution?

556

:

I don't understand what I'm supposed to

take from this.

557

:

Just tell me.

558

:

What to do?

559

:

Tell me which guy's better.

560

:

Tell me which guy's worse.

561

:

So that's always hard.

562

:

And that takes a lot of patience and

communication.

563

:

For a while, we used to do just weekly sit

downs with our scouts where we would teach

564

:

them about like one stat a week.

565

:

And we'd go slow.

566

:

And we'd also try and...

567

:

as best as possible, relate things back to

the currency that they speak in.

568

:

And scouts and coaches, the currency they

speak in is video, not charts and graphs.

569

:

So the more that we can connect our

analysis to video cut-ups, because then

570

:

they can see it.

571

:

And then they understand why a model says

what it says or makes a decision or why it

572

:

has assumptions.

573

:

And this is also super valuable too,

because they give

574

:

And they say, it's saying that, you know,

the model is saying that this is the

575

:

outcome, but I can see why it's because

these four other things happen.

576

:

It's like, wow.

577

:

Well, we could probably account for that.

578

:

And we never, I just didn't know it,

right?

579

:

That's why they're the domain experts and

I'm not.

580

:

so.

581

:

You know, the patience around

communicating stats and numbers is always

582

:

difficult and also knowing what people

like.

583

:

When I first started, everybody would tell

you, you need to have, you know, you've got to have

584

:

an amazing dashboard, got to have like

charts and graphs, you know, and all that

585

:

stuff.

586

:

And what I found was there was a lot of

people who were like, I don't, what do

587

:

you, I don't even know what I'm looking

at.

588

:

Like, I hate these things.

589

:

Just give me the table of numbers.

590

:

It's like, okay.

591

:

Well, maybe a table of numbers with just

some conditionally formatted information.

592

:

And also, you know,

593

:

I have an academic side, I do supervise

PhD students and master students, and I do

594

:

teach a master's class in statistics at

college.

595

:

So I guess what I'm about to say would,

you know, people on the academic side would

596

:

hate it, but you have to like recognize

the environment you're in.

597

:

And sometimes just like changing the

verbiage helps, like instead of calling

598

:

things the...

599

:

low credible interval and the high

credible interval, like we just call it

600

:

the floor and the ceiling.

601

:

And people are like, yeah, this guy's

floor, it's a bit higher than the other

602

:

guy's floor.

603

:

And that guy's ceiling, this guy's got a

better ceiling.

604

:

And like, you know, academically you'd get

shot for that, but it's like, those kinds of

605

:

things go a long way because it brings the

information to the end user.
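
A sketch of the floor and ceiling language: take posterior draws for each player and report the two ends of a central credible interval. The 90% mass, the normal draws, and the player values are assumptions for illustration only.

```python
import random


def floor_and_ceiling(posterior_draws, mass=0.90):
    """Return the lower and upper ends of a central credible interval."""
    ordered = sorted(posterior_draws)
    tail = (1.0 - mass) / 2.0
    lower_index = int(tail * (len(ordered) - 1))
    upper_index = int((1.0 - tail) * (len(ordered) - 1))
    return ordered[lower_index], ordered[upper_index]


rng = random.Random(0)
# Hypothetical posterior draws: same typical level, but player B is far less certain.
player_a = [rng.gauss(5.0, 1.0) for _ in range(10000)]
player_b = [rng.gauss(5.0, 2.0) for _ in range(10000)]

floor_a, ceiling_a = floor_and_ceiling(player_a)
floor_b, ceiling_b = floor_and_ceiling(player_b)
print(f"A: floor {floor_a:.2f}, ceiling {ceiling_a:.2f}")
print(f"B: floor {floor_b:.2f}, ceiling {ceiling_b:.2f}")
```

Player A ends up with the higher floor and player B with the higher ceiling, which is the comparison the scouts are making in their own words.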

606

:

And if you want them to start to...

607

:

take this information into their decision

calculus, you have to get them

608

:

comfortable.

609

:

And sometimes it's just meeting them with

terminology that helps.

610

:

And so I think that's a really, you know,

that's a big one.

611

:

Those are big challenges in communicating

this stuff.

612

:

Yeah, definitely.

613

:

And I resonate with that.

614

:

I've had the same issues.

615

:

I'll be able to

616

:

talk more precisely about sports in a few

months.

617

:

But when it comes to a lot of other

fields, whether it's marketing or biostats

618

:

or electrical forecasting, yeah, the

issues are related to these.

619

:

They're also extremely diverse.

620

:

So that's interesting.

621

:

You definitely don't have a one size fits

all.

622

:

Definitely what's extremely important

basically is to know the model extremely

623

:

well from my experience.

624

:

And yeah, if you have coded the model

yourself, you usually know it really well

625

:

because you spent hours on it to try and

get it to work and understand what it's

626

:

doing.

627

:

And when it's not able to do as you were

saying, I think it's extremely important

628

:

to be able to tell people what the model

cannot tell you.

629

:

And yeah, I think these are extremely good

points to try and balance what people are

630

:

usually wondering about.

631

:

And that's also where I think having the

Bayesian model is extremely interesting,

632

:

right?

633

:

Because the Bayesian model by definition

is extremely open box and you have to write

634

:

down your assumptions.

635

:

And so you know much better what the model

is doing than a black box model.

636

:

Yeah, I mean, that's another good point

is.

637

:

If you go into a meeting and you have

model outputs and your only reason when

638

:

asked, why does it prefer this over that?

639

:

Your only reason is because the model said

so.

640

:

People aren't going to be super excited

about that.

641

:

Knowing why things are happening, you know,

this also, you know, I mean, this really

642

:

plays into like how you validate and check

your models.

643

:

And so building, you know, kind of

644

:

within that Bayesian sort of world,

building simulations is a big part of it.

645

:

And building simulations to see how the

model behaves under different constraints

646

:

and different pieces of information,

that's really important because it gives

647

:

you useful context to talk about and it

gives you useful information in order to

648

:

head things off at the pass when you know

there's gonna be some gotchas and some

649

:

trouble if, you know,

650

:

people have certain types of questions.

651

:

You can head things off at the pass

because you're already aware of them.
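
A minimal simulation of this kind, assuming a normal model with known noise and a normal prior: generate fake players whose true value is known, fit the model, and see how far the estimate is from the truth at different sample sizes. All settings (prior, noise, true value, number of simulations) are hypothetical.

```python
import random
import statistics


def posterior_mean(observations, prior_mean, prior_sd, noise_sd):
    """Posterior mean for a normal mean with known noise and a normal prior."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(observations) / noise_sd ** 2
    sample_mean = statistics.fmean(observations)
    weighted_sum = prior_precision * prior_mean + data_precision * sample_mean
    return weighted_sum / (prior_precision + data_precision)


def average_error(true_value, n_obs, n_sims=500, seed=0):
    """Simulate many fake players with a known true value and average the absolute error."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_sims):
        fake_data = [rng.gauss(true_value, 2.0) for _ in range(n_obs)]
        estimate = posterior_mean(fake_data, prior_mean=0.0, prior_sd=1.0, noise_sd=2.0)
        errors.append(abs(estimate - true_value))
    return statistics.fmean(errors)


# With few observations the prior dominates; the error shrinks as data accumulates.
for n_obs in (3, 30, 300):
    print(n_obs, round(average_error(true_value=1.0, n_obs=n_obs), 3))
```

Running this before a meeting shows where the model is mostly echoing the prior, which is the kind of gotcha worth knowing in advance.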

652

:

Another thing that I do think is really

useful in this and maybe in some of your

653

:

prior work in consulting, I'm sure you've

like stumbled on like, or used frameworks

654

:

like CRISP-DM and things like that.

655

:

Like in statistics, there's PPDAC:

problem, plan, data, analysis, and

656

:

conclusion.

657

:

Those types of frameworks help just

because again,

658

:

A lot of times we're dealing with

non-technical audiences and they're trying to

659

:

give you a question and say like, Hey, can

we look at this?

660

:

And oftentimes these things are very vague

and very sort of like, you know,

661

:

not clearly defined.

662

:

Like, you know, my younger self would take

that and run away and, you know, do something

663

:

for a week or two and then come back and

be like, Hey, here's this thing, you know,

664

:

and you ask about

665

:

you know, they're usually like, the reply

is, that's kind of cool, but I was

666

:

thinking of it like this and I would do

this with it.

667

:

it's like, man, if you, you know, if you

told me that two weeks ago, I would have

668

:

done something else.

669

:

So using those kinds of frameworks

does a few things.

670

:

One, it gives us the opportunity.

671

:

Like I always tell our analysts like

question the question, like, you know,

672

:

question the question.

673

:

Right?

674

:

So when they have a question, I'm always

sitting there and I'm like, okay, well,

675

:

you know, what would you want to do with

this?

676

:

How do you foresee yourself using it to

make a decision?

677

:

What's the cadence that you would need to

access this information?

678

:

If I were to get it to you tomorrow, you

know, what would you, what kind of

679

:

decision would you want to make?

680

:

Like really kind of Socratic questions,

you know, question the question.

681

:

And, that does a few things.

682

:

One, we get to, you know,

683

:

usually two different results.

684

:

Both of them are good.

685

:

The first is I get them to then walk

through that five minutes with me and

686

:

clearly define what it is they're looking

for.

687

:

That's great.

688

:

The other result is the opposite, but it's

also a good result, which is we get about

689

:

three minutes in and they go, you know

what?

690

:

I haven't thought about this well enough.

691

:

Let me think through it a bit more and

come back to you.

692

:

In which case I didn't waste the time

building things and scraping and cleaning

693

:

data and doing all that stuff.

694

:

The other thing that those frameworks do

is, and I try and get analysts to think

695

:

like this, is utilize each step within

those frameworks as touch points back to

696

:

the person who asked you the question.

697

:

Hey, this is where we're at.

698

:

We've collected this kind of data.

699

:

These are the things we're thinking.

700

:

These are the features that we're thinking

about using.

701

:

What do you think about that?

702

:

Anything else you can think of?

703

:

By doing that, along each step of the way,

they get to see the model developed.

704

:

They get to provide input.

705

:

And what that does is it gives them a bit

of ownership over it.

706

:

So when you get to the end result, they're

like, geez, this was built exactly in my

707

:

vision, and now I'm excited to use it.

708

:

And that's a really cool thing too.

709

:

Yeah.

710

:

Yeah.

711

:

Thanks for that detailed answer, Patrick.

712

:

I can definitely hear the 10 years of

experience working on that.

713

:

That makes me think about a lot of other

things.

714

:

Yeah, definitely the same for me, I would

say, where my personal evolution has been

715

:

trying to really understand the question

the consumer of the model is trying to get

716

:

to, right?

717

:

Like what actually is your question?

718

:

Because you have something in mind, but

maybe the way we're talking about it right

719

:

now and the way I have it in mind is not

what you want.

720

:

And so, yeah, as you were saying, a good

model is really that's custom made, that's

721

:

fine and hard work and that takes time.

722

:

so before investing all that time in doing

the model, let's actually make sure we

723

:

align and agree on what we're actually

looking at and studying.

724

:

That, I think, is extremely important.

725

:

Yeah, no doubt.

726

:

I think that's often the hardest part,

because it's just getting people to really

727

:

define.

728

:

that's probably, I mean, that and making

sure that you have good data.

729

:

Those are the two biggest things.

730

:

The model building part and things like

that sort of happen a little bit easier

731

:

once you do the first two things.

732

:

That's always the tough part.

733

:

Yeah, yeah, yeah.

734

:

Actually, continuing on that topic, how do

you communicate these statistical

735

:

concepts?

736

:

And honestly, a lot of them are really

complex.

737

:

So how do you communicate that to

non-stats people in your line of work?

738

:

I'm guessing that would be scouts, as you

talked about, coaches, players.

739

:

How do you make sure they understand?

740

:

what you're doing and in the end are able

to use it because we talked about that in

741

:

episode 108 with Paul Sabin.

742

:

If your model is awesome but not used,

it's not very interesting.

743

:

So yeah, how do you do that?

744

:

First, trying to really understand what

kind of cadence this is going to be on.

745

:

So some questions.

746

:

especially in sport, get asked.

747

:

And they're more asked from the knowledge

generation standpoint, meaning that I have

748

:

a question.

749

:

I think it'll help us with, you know,

updating our priors, our prior beliefs

750

:

about the game.

751

:

Maybe things have changed.

752

:

Maybe rule changes have altered things or

something like that.

753

:

Can we study this?

754

:

A question like

755

:

for knowledge generation requires a

different output than something that's

756

:

like weekly or daily consumption.

757

:

So if it's for knowledge generation,

that's usually communicated in the form of

758

:

like a short written report.

759

:

The question at the top, the bottom line

up front, here's the four bullet points,

760

:

and then the nitty gritty.

761

:

Like this is how we went about studying

it.

762

:

charts and graphs and usually it's like a

page or two in a PDF or maybe like an

763

:

interactive HTML file that they can see

things and have a table of contents and go

764

:

to different sections.

765

:

If the question is directed at stuff

that's required to be evaluated weekly or

766

:

daily, like I need to see this every week

because we're going to be evaluating a

767

:

certain player or an opponent or I need to

see this daily because it's

768

:

player health related, something like

that.

769

:

We're always thinking in terms of like web

applications.

770

:

So how do I get, you know, I have to think

through the full stack pipeline of like,

771

:

where do we get the data?

772

:

Where does it live in the database?

773

:

What's the analysis layer?

774

:

Kick it out to an output.

775

:

Where's that output stored?

776

:

And then how does the website ingest that

output and make it consumable?

777

:

And for that,

778

:

It's usually some form of charts and

graphs and a table.

779

:

And usually it's interactive stuff.

780

:

So they can sort and filter and hover over

points and access the information.

781

:

And again, as best as possible, I'm always

thinking to try and develop that in the

782

:

way that they're going to use it.

783

:

So like I was sitting down, for example,

today with our director of player health,

784

:

and he was like, you know,

785

:

I'd love to have this information daily so

that I can relay it to the new coaching

786

:

staff.

787

:

And I want to say it, you know, say these

things.

788

:

Okay, great.

789

:

I have all that information.

790

:

I have all of that, those models, but come

over to the whiteboard and draw for me the

791

:

path that you want to take going from

sitting at your desk.

792

:

and reading the information from a webpage

to how you want to communicate it.

793

:

And as soon as he started drawing it out,

it's like, okay, I know exactly what to do

794

:

now.

795

:

That's perfect.

796

:

Otherwise I would have built something

that in my head I thought would be useful,

797

:

but maybe not useful to him.

798

:

And then he uses like part of it or maybe

because he's super motivated, he's going

799

:

to use it.

800

:

And he's also going to,

801

:

use like 10 other things to get the other

stuff he wants, but he's a nice guy and he

802

:

doesn't want to tell me that it doesn't

have all the things that he needs.

803

:

And so then like four weeks later, I walk

in his office, I'm like, what are you

804

:

doing?

805

:

It's like, oh, I go here and then I get

this information from this webpage, but

806

:

then I go to these other three webpages

again.

807

:

So, whoa, whoa, whoa, why didn't you just

tell me that?

808

:

Like I'll just, I could make this all into

one thing.

809

:

Like you don't have to, and so.

810

:

That's a really important piece is knowing

how the data is going to be utilized,

811

:

making sure that it's exactly in the order

that the decision maker requires it.

812

:

Yeah.

813

:

Awesome points.

814

:

Yeah.

815

:

Thanks for that, Patrick.

816

:

And I think it's also very valuable to a

lot of listeners because we're talking

817

:

about a professional sports team here, but

it is definitely transferable to

818

:

basically, I think, any company where

you're working with

819

:

different people who are using the models

but are not themselves producing the

820

:

models.

821

:

It's like almost every company out there.

822

:

yeah, I think and also from my experience

doing consulting in a lot of different

823

:

fields, I can definitely vouch for the

things you've touched on here.

824

:

yeah, thanks.

825

:

That's definitely, I think, very valuable.

826

:

Let's turn back a bit more to the technical

stuff because I see time is running and I

827

:

definitely want to touch a bit more on the

sport side of things and how Bayesian stats

828

:

is applied in the field.

829

:

Obviously a very important part of your

work is, I'm guessing,

830

:

drafting players, player selection

processes.

831

:

So yeah, how might Bayesian methods be

applied here to improve the draft

832

:

strategies in the player selection

processes?

833

:

Yeah, well, again, like I think I said

earlier, everybody's going to miss.

834

:

It's impossible to be, you know...

835

:

to have a good hit rate and always be

picking, you know, picking players who are

836

:

going to reach high level success.

837

:

And a lot of that is just because, you

know, performance and talent are extremely

838

:

right-tailed.

839

:

You know, you have a whole bunch of

players that never make it.

840

:

You have a small group that make it and

are good enough to make it.

841

:

You have an even smaller group that are

good enough to make it and like really

842

:

good to play all the time.

843

:

And then you have

844

:

a few Hall of Famers sprinkled in, right?

845

:

So it's really right-tailed.

846

:

it is very hard to do this stuff.

847

:

So, you know, understanding or modeling

your uncertainty, that's really important.

848

:

And

849

:

information from the domain experts, you know,

scouts see things on film that we can't

850

:

see in numbers and vice versa.

851

:

One of the values that we have is we can

process way more players than any one

852

:

human can actually watch.

853

:

So we have the ability to build models

that can identify players and hopefully

854

:

get them,

855

:

over to the domain experts who have to

then watch the film and write the reports

856

:

and say like, hey, did you know this guy

was really good in these things?

857

:

This is his potential ceiling.

858

:

And we think that we have, you know, we

think that this would be valuable for our

859

:

team, right?

860

:

Building models like that, that help us.

861

:

Identify talent, give us a range of

plausible outcomes.

862

:

One, it helps us get information to the

people who have to watch the film and make

863

:

the decisions.

864

:

Two, it helps us have discussions about

where the appropriate time to acquire

865

:

people

866

:

If you're sitting there, obviously, you

know, in the major league draft, major

867

:

league baseball draft, it would be the

same thing.

868

:

Everybody knows who the first round picks

are and the second round picks.

869

:

It's after that, that things become pretty

sparse.

870

:

And if you can identify players that have

unique abilities later in the draft, that

871

:

opens up a lot of opportunities to,

872

:

select players that might be able to

contribute successfully to your team.

873

:

And so that's really where those models

help us.

874

:

The other area that they help us in is, I

always talk about with our analysts, like,

875

:

what is the benchmark that you're trying

to beat?

876

:

So every model, like you can't just build

a model.

877

:

I mean, I remember one of our analysts,

she had a model and she said, I built a

878

:

model and I think it's really good.

879

:

And I said, cool.

880

:

How well does it do against the benchmark?

881

:

She's like, well, what do you mean?

882

:

And I was like, well, like how well does

it do against if we just use, let's say

883

:

scout grades or if we just use public

perception, how well does it do

884

:

historically against that?

885

:

She's like, no, no, no, I don't care about

that.

886

:

Like this model is just with their stats

and, you know, it's like, no, no, but

887

:

you have to care about that because if

it's not better than those things, then

888

:

why would we use it?

889

:

Right?

890

:

You have to be able to beat that

benchmark.
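
One way to make "beat the benchmark" concrete is to score the model and the benchmark on the same historical outcomes. The sketch below uses the Brier score as the metric, which is my choice for illustration and not something named in the episode; the outcomes and probabilities are hypothetical.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(outcomes)


# Hypothetical history: 1 = the player made it, 0 = he did not.
outcomes = [1, 0, 0, 1, 0, 0]
model_probs = [0.7, 0.2, 0.1, 0.6, 0.3, 0.2]        # the new stats-only model
scout_grade_probs = [0.6, 0.4, 0.3, 0.5, 0.5, 0.2]  # benchmark derived from scout grades

model_score = brier_score(model_probs, outcomes)
benchmark_score = brier_score(scout_grade_probs, outcomes)
beats_benchmark = model_score < benchmark_score
print(round(model_score, 3), round(benchmark_score, 3), beats_benchmark)
```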

891

:

One of the areas where we can really beat

a benchmark is when we combine the domain

892

:

experts' information with the actual

observed data information.

893

:

And a Bayesian model allows us to do that,

right?

894

:

It allows us to take the domain

expert who's maybe scoring the player a

895

:

certain way, writing information about the

player.

896

:

It allows us to take that information.

897

:

mix it with the numbers and get a model

that is, I guess, man and machine, right?

898

:

And those models beat our benchmark much

better than any one of these alone, right?

899

:

If we just use numbers, never watched any

film, never knew anything about the

900

:

player, or if we just use domain expert

information.

901

:

When we combine those things, we tend to

do a much better job.

902

:

And so that's where Bayesian analysis

really helps us.

903

:

And also,

904

:

That's where you start to get interesting

discussions about the floor and the

905

:

ceiling of a player.

906

:

Because now once you run their posterior

distribution and the domain expert's

907

:

information is in there and you're saying,

yeah, this guy, he's awesome at tackling

908

:

and he'll be a great tackler and blah,

blah, blah.

909

:

And these are his numbers.

910

:

the numeric model says like, yeah, I think

this guy's a pretty good tackler.

911

:

Domain experts saying like, no, no, no, I

watched him and he doesn't play against

912

:

great competition, but his technique is

really bad.

913

:

It's not going to translate against these

bigger players.

914

:

It's like, well, that's not information

that maybe our stats would have.

915

:

But when we combine those two bits of

information, all of a sudden, our maybe

916

:

overly bullish belief in this player gets

brought down a bit.
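
A minimal sketch of how a scout's view can pull down an optimistic stats-based estimate, assuming both are expressed as a normal mean with a standard deviation on the same rating scale. The ratings, the scale, and the function name are hypothetical, and treating the scout grade as a normal prior is a simplifying assumption.

```python
def combine_grade_and_stats(scout_mean, scout_sd, stats_mean, stats_sd):
    """Precision-weighted blend of a scout grade (as the prior) and a stats-based estimate."""
    scout_precision = 1.0 / scout_sd ** 2
    stats_precision = 1.0 / stats_sd ** 2
    total_precision = scout_precision + stats_precision
    combined_mean = (scout_precision * scout_mean + stats_precision * stats_mean) / total_precision
    combined_sd = total_precision ** -0.5
    return combined_mean, combined_sd


# Hypothetical tackling ratings on a 0-100 scale: the stats are bullish, the scout is not.
combined_mean, combined_sd = combine_grade_and_stats(
    scout_mean=55.0, scout_sd=8.0, stats_mean=80.0, stats_sd=10.0
)
print(round(combined_mean, 1), round(combined_sd, 1))
```

The blended estimate lands between the two sources, closer to whichever is more certain, and its uncertainty is smaller than either one alone.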

917

:

And utilizing the information like that

918

:

is interesting and it also makes it unique

to the people that are in that room, the

919

:

domain experts that you have in that room

and things like that.

920

:

How you weight those things is really

important.

921

:

For our own analytics staff, we'll do

things like we'll build our own separate

922

:

models and have our own meetings and we'll

build our own analysis.

923

:

So we'll have independent models all

against each other and maybe we'll have

924

:

them weighted or we'll use

925

:

you know, like a triangular prior and build

them together and, you know, mix them

926

:

together and get posterior simulations.

927

:

And we try and do those things in a way

that allows us to understand all the

928

:

plausible outcomes that might be relevant

for this individual.
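
One possible reading of weighting independent models and mixing them is a weighted mixture of their posterior draws. The sketch below assumes each analyst's model produces draws for the same player on the same scale; the weights, the draws, and the function name are hypothetical, and this is not necessarily the method used by the team.

```python
import random
import statistics


def mix_posterior_draws(model_draws, weights, n_draws=10000, seed=0):
    """Sample from a weighted mixture of several models' posterior draws."""
    rng = random.Random(seed)
    # Pick which model each mixed draw comes from, in proportion to the weights.
    picks = rng.choices(range(len(model_draws)), weights=weights, k=n_draws)
    return [rng.choice(model_draws[index]) for index in picks]


rng = random.Random(1)
# Hypothetical posterior draws from three independently built models of one player.
analyst_models = [
    [rng.gauss(4.0, 1.0) for _ in range(5000)],
    [rng.gauss(5.0, 1.5) for _ in range(5000)],
    [rng.gauss(7.0, 2.0) for _ in range(5000)],
]
blended = mix_posterior_draws(analyst_models, weights=[0.5, 0.3, 0.2])
print(round(statistics.fmean(blended), 2))
```

The blended draws keep the disagreement between models as extra spread, so the range of plausible outcomes reflects more than any single model's view.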

929

:

It's fascinating.

930

:

Yeah.

931

:

And I really love about that field the fact

that you have to blend a lot of different

932

:

information.

933

:

Like the domain knowledge from the scouts,

the benchmark from the markets, the models

934

:

that you have in house, also scientific

knowledge of all the scientists that the

935

:

team has inside of it.

936

:

that makes all that much more complicated,

right?

937

:

I'm guessing sometimes as the modeler, you

would probably be like, my God, that'd be

938

:

so much easier if we could just run some

very big neural network and that'd be

939

:

done.

940

:

at the same time, I think it's what makes

the thrill of that field, at least for me,

941

:

is that, no, that stuff is really hard.

942

:

There is a lot of randomness.

943

:

There are a lot of things we don't really

understand either.

944

:

And you have to blend all of these

elements together to try and make the best

945

:

decisions you can, even though you know

you're not making the optimal decisions,

946

:

as you are saying.

947

:

And I think it's a fascinating field to

study important decision-making under

948

:

uncertainty.

949

:

Yeah, for sure.

950

:

I think that's the thing that's most

951

:

interesting about it to me.

952

:

Like, yeah, I think that's the most that

stuff is fascinating just knowing

953

:

Yeah, decision making under uncertainty is

really challenging and I think that's the

954

:

thing that makes this the most, you know,

the coolest stuff to work on.

955

:

Yeah, yeah, no, definitely.

956

:

Actually, maybe a last question on the

technical side.

957

:

Now if we look, so we've talked about the

beginning of the career of a player,

958

:

right?

959

:

Like the draft.

960

:

We've talked about...

961

:

kind of the whole lifetime of the player,

which is projection, performance

962

:

projection over the whole career.

963

:

Now I'm wondering about the day-to-day

stuff.

964

:

What can Bayesian models tell us here or

how can they help us in predicting the

965

:

impact of training loads on the athletes'

wellbeing and performance?

966

:

I know, I think it's kind of a frontier

967

:

in almost all the sports, but I'm curious

what the state of the art here is,

968

:

especially in US football.

969

:

Yeah, it really is, I think, the sort of

one of the final frontiers, I guess, in

970

:

sport.

971

:

Team sport is just challenging because you

perform well or you win or you lose due to

972

:

a whole bunch of issues that sometimes

973

:

have nothing to do with you.

974

:

For example, I can train you, you know, we

could train you and you could be very fit

975

:

and strong.

976

:

And if in the last play of the game, the

quarterback throws the ball to a patch of

977

:

grass and you lose, it had nothing to do

with you being fit and strong.

978

:

You know, contrast that with, like, individual

sport athletes.

979

:

If you're a 400 meter runner, a cyclist, a

swimmer, a runner, a marathoner, you know,

980

:

physiologically.

981

:

If we build you up, we have a much more

direct line between how you develop and

982

:

how it directly relates to your

performance.

983

:

There's not a lot of other information

there.

984

:

No one's trying to tackle you on the bike

or in the pool or something like that.

985

:

So that makes, that makes a sport much

more difficult.

986

:

Baseball is probably the closest because

even though it is a team sport,

987

:

it really is this sort of zero-sum duel

between a pitcher and a batter.

988

:

And one guy wins and one guy loses.

989

:

And the events are very discrete.

990

:

The states of the game have been played

out, you know, runners on first and second with

991

:

two outs, bottom of the third, blah, blah,

blah.

992

:

So it's maybe a little bit more clear in

baseball.

993

:

I think in the other team sports, in the

kind of invasion sports,

994

:

what makes this challenging is

identifying.

995

:

I always try and take it back to

identifying the discrete events that we're

996

:

trying to, trying to maybe measure

against.

997

:

Like, for example, I can give you an example,

a pretty clear example from basketball.

998

:

I was talking with a friend at an NBA

team and, he was like, yeah, you know,

999

:

our, our, our coach and our scouts and

the, you know,

::

coaches, feel like our players don't close

out three pointers fast enough.

::

And I was like, well, is that a tactical

problem or is it a physical problem?

::

And he's like, well, how would we look at

that?

::

And I was like, you have the player

tracking data.

::

And if you know every time your team's on

defense, which is easy to know, and you

::

know every three pointer that's been shot

against your defense, if you were to take

::

that frame,

::

out of the player tracking data and maybe

like the frame a second to a second and a

::

half before that.

::

So all of that information for every one

of those three pointers.

::

You have an idea of the relationship

between your player and the player who's

::

taking the three point shot.

::

You have an idea of the relationship

between your player and the other players

::

on his team.

::

So you know from a technical, a tactical

standpoint.

::

you know what type of like formation or

defense you're trying to run.

::

So first things first, are the players in

the right position to close out that three

::

pointer?

::

Maybe, you know what?

::

Our guys consistently mess up the

defensive shape and when they get in

::

there, they give too much ground to the

guy shooting a three pointer.

::

The other is the physical standpoint of,

well, no, they're in good position, but

::

when they go to close it out over that

second and a half,

::

They're not fast enough to get there.
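[Editor's sketch] The split described above (tactical positioning versus physical closing speed) can be made concrete with a small calculation on one window of tracking frames. This is a minimal sketch under assumptions: the 10 Hz sample rate, the coordinate layout in meters, and the function name `closeout_summary` are illustrative and not taken from any real tracking provider.

```python
import numpy as np

HZ = 10  # assumed tracking sample rate (frames per second)

def closeout_summary(defender_xy, shooter_xy, hz=HZ):
    """Summarize one closeout from frames ending at the shot release.

    defender_xy, shooter_xy: arrays of shape (n_frames, 2), in meters.
    """
    defender_xy = np.asarray(defender_xy, dtype=float)
    shooter_xy = np.asarray(shooter_xy, dtype=float)
    gap = np.linalg.norm(defender_xy - shooter_xy, axis=1)  # distance per frame
    duration = (len(gap) - 1) / hz  # seconds covered by the window
    return {
        "start_gap_m": gap[0],      # positioning question (tactical)
        "release_gap_m": gap[-1],   # how contested the shot was
        "closing_speed_mps": (gap[0] - gap[-1]) / duration,  # physical question
    }

# Toy window: 1.5 s before release, shooter stationary,
# defender closes from 6 m to 1.5 m along a straight line.
n = int(1.5 * HZ) + 1
shooter = np.tile([0.0, 0.0], (n, 1))
defender = np.column_stack([np.linspace(6.0, 1.5, n), np.zeros(n)])
summary = closeout_summary(defender, shooter)
print(summary)
```

A large starting gap points toward a positioning problem, while a good starting gap with a low closing speed points toward a physical one.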

::

Okay, great.

::

Now roll it back to what you can measure

in the gym.

::

Is there some measure, let's say on a

force plate of the amount of impulse or

::

force under the force time curve that the

player outputs that can tell us something

::

about their ability to move rapidly, apply

force into the ground, move rapidly to

::

close out that three pointer?

::

And maybe if you look at several years

worth of data, you'd find

::

The top players on your team all do this

thing really well, and some of the worst

::

players at closing out the three do this

thing poorly.

::

And so now you have something to say about

like, hey, what if we develop this quality

::

in the off season and our players, would

we be able to close out the three pointers

::

more effectively, more efficiently?

::

And so I think from that standpoint,

linking the development piece to sport,

::

team sport, invasion sport.

::

You have to really think about the

discrete events of the game and how you

::

can kind of tease those out of, let's say

the player tracking data.

::

And it's like super hard in something

like, you know, in football, because

::

players all do really different things.

::

You know, the linebacker does something

totally different than the offensive

::

lineman.

::

And so you have to really get down to the,

the domain of each of those positions and

::

say like, gosh, what are the discrete

events?

::

that define what this position does, then

how do we measure success in those?

::

And then if we can measure success, how do

we identify the archetype of players who

::

are good at those things?

::

And then if we can do that, maybe then we

can start to talk about, is this something

::

that you can develop in a player?

::

Is it something that you have to identify

in a player?

::

That's sort of the, in my head, I mean, I

don't know, I could be wrong.

::

This is not.

::

Nobody, I think everybody's trying to figure

this out, but I could be wrong.

::

But in my head, that's at least the

process that I would, you know, I try and

::

think through when I think about these

things.

::

Yeah.

::

Yeah.

::

It makes a ton of sense.

::

I mean, and it seems like, yeah, that, and

there are so many areas, open areas of

::

research on all of that stuff.

::

That's just, just fascinating.

::

I'm

::

I'm already thinking, that'd be amazing to

have a huge Bayesian model where you have

::

all of those topics that we've talked

about.

::

Basically, it could be a big Bayesian model

where you have a bunch of likelihoods.

::

And yeah, that'd be super fun.

::

I'm guessing we're still a bit far from

that, but maybe not too far.

::

Hopefully in a few years, that'd be

definitely super fun.

::

Yeah, no doubt.

::

Yeah, I mean, and that's, definitely

doable.

::

But yeah, you need you need really good

data and you need really good structure in

::

your model.

::

Yeah, that's the part too, is getting

getting good data, you know, player tracking

::

data is fine.

::

I mean, it has errors, you know, people

who think that it's like a panacea, you

::

know, it's like, have you really worked

with it?

::

I mean, there's

::

Sampling at 10 hertz for humans that move

really, really fast.

::

Acceleration is a derivative of speed.

::

At 10 hertz, people who are moving really

fast, that data gets noisy pretty quick.
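[Editor's sketch] The point about acceleration being a derivative of speed can be illustrated numerically: differencing a noisy 10 Hz speed signal divides the noise by a small time step, which amplifies it. The noise level (0.1 m/s) and the constant-acceleration signal below are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
hz = 10.0
dt = 1.0 / hz
t = np.arange(0, 5, dt)

true_accel = 2.0                      # constant acceleration in m/s^2
true_speed = true_accel * t
speed_noise_sd = 0.1                  # assumed measurement noise in m/s
noisy_speed = true_speed + rng.normal(0, speed_noise_sd, size=t.size)

# Finite-difference acceleration: each estimate subtracts two noisy samples
# and divides by dt, so the noise grows by roughly sqrt(2) / dt.
accel = np.diff(noisy_speed) / dt
accel_error_sd = np.std(accel - true_accel)

print(f"speed noise sd: {speed_noise_sd:.2f} m/s")
print(f"acceleration error sd: {accel_error_sd:.2f} m/s^2")
```

In practice this is why derived quantities from tracking data are usually smoothed or modeled rather than differenced directly.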

::

I think one of the things is as we

progress, as the technology keeps

::

improving, things get better.

::

You get better data and maybe that helps

you also answer some of these questions a

::

little bit more specifically.

::

yeah.

::

And then we'll be able to have our huge

Bayesian model with a lot of different

::

likelihoods in there that fit into each

other.

::

And then we don't even need to play the

game.

::

We don't have to play the game.

::

They just let the computers play the game

and it's over.

::

We're done.

::

Yeah, no.

::

No, you still have to play the game

because you still have randomness.

::

Then you're like, yeah.

::

I mean, because otherwise the model is kind

of like if you want kind of a quantum

::

state, right?

::

Where the model can see the probabilities

of things happening, but then you have to

::

open the box and see what is actually

happening.

::

So you can have the best model.

::

In the end, you still have to play the

game to see what's going to happen because

::

it's not deterministic.

::

Yeah, thankfully.

::

yeah, that's right.

::

Yeah.

::

Yeah.

::

But I mean, it's definitely I always love

doing these these big models.

::

And that's definitely doable.

::

I've done that for election forecasting,

for instance, where you have several

::

likelihoods, one for polls, for instance,

and one for elections.

::

So yeah, that's I know that's definitely

doable in the Bayesian framework, because

::

I mean, why not?

::

It's just part of the big

::

of the same big model in a directed acyclic

graph, if you want.
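[Editor's sketch] The idea of several likelihoods informing the same model can be shown with a tiny grid approximation in which two data sources share one latent parameter. This is a toy under assumptions, not the election model mentioned in the conversation: the poll counts, the observed share, and the Normal noise scale are all hypothetical.

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over latent support for a party

# Likelihood 1 (hypothetical poll): 520 of 1000 respondents back the party.
k, n = 520, 1000
loglik_polls = k * np.log(theta) + (n - k) * np.log1p(-theta)

# Likelihood 2 (hypothetical election result): observed share 0.49,
# modeled as Normal(theta, 0.02).
obs_share, sigma = 0.49, 0.02
loglik_election = -0.5 * ((obs_share - theta) / sigma) ** 2

log_prior = np.zeros_like(theta)          # flat prior on the grid

def normalize(log_post):
    """Turn unnormalized log posterior values into grid probabilities."""
    p = np.exp(log_post - log_post.max())
    return p / p.sum()

# The joint posterior just adds the log-likelihoods of both data sources.
post_polls = normalize(log_prior + loglik_polls)
post_joint = normalize(log_prior + loglik_polls + loglik_election)

mean_polls = np.sum(theta * post_polls)
mean_joint = np.sum(theta * post_joint)
print(f"polls only: {mean_polls:.3f}, polls + election: {mean_joint:.3f}")
```

Because both likelihoods depend on the same `theta`, the second data source pulls the posterior toward its own evidence, which is the mechanism a larger multi-likelihood model would rely on.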

::

But yeah, I'm curious to see that done in

sports.

::

Maybe we'll get back together for another

episode, Patrick, where we talk about that

::

and how we did that.

::

That'd be cool.

::

Yeah, there you go.

::

Yeah, actually, I wanted to ask you to

close us out here.

::

about, you know, you've started talking about

that right now, like some emerging trends

::

in sports analytics that you believe will

significantly impact how teams manage

::

training, performance, drafting in the

near future.

::

And also if there are any sports you see as

more promising than others.

::

Well, I mean, yeah, trends.

::

Yeah.

::

We talked a lot about that stuff and I

think, you know, better data and better,

::

you know, better technology.

::

all of those things will, will, I think

will help us.

::

I think also it's getting, you know,

getting the decision makers comfortable

::

with the utility of some of this stuff,

you know, baseball, has always been a game

::

of numbers.

::

And, I think early.

::maybe mid:

seven, you know, releasing data kind of to

::

the public, really the first sport to get

player tracking data, things like that.

::

I think that opened up a lot of

opportunities for people to do really

::

interesting work in the public space,

which then sort of got

::

teams interested and then sort of a, you

know, more of a shift in people in the

::

front office where, maybe historically it

was ex players who kind of played out

::

until they retired and then became scouts

and managers and things like that.

::

I think that, you know, that happening in

baseball was a really good thing for that

::

sport.

::

And I think slowly for the other sports,

that's really

::

probably needs to happen because the more

that these things are open and sort of

::

curbside, I think the more the decision

makers become comfortable with them and

::

can say like, I can see how I would use

this.

::

I can see what this might help me with.

::

So I think, never underestimate the

work that you do in the public space

::

because I think there's an opportunity to

always.

::

you know, help things evolve,

crowdsourcing, I guess.

::

Yeah, I mean, preaching to the choir here.

::

Yeah, for me, a lot more of these data

would be open sourced.

::

Yeah, I mean, there is also an extremely

interesting trend right now towards open

::

sourcing more and more parts of large

language models.

::

I think that's going to be extremely

interesting to see that develop because

::

At the same time, this is very hard

because these kind of models are just so

::

huge.

::

You need a lot of computing power to make

them run.

::

So I don't know how open source can help

in that, but I know how open source can

::

help in the development and sustainability

and trustworthiness and openness of all

::

that stuff.

::

So that's going to be super interesting.

::

And I'm also going to be very interested

in

::

how the different sports evolve.

::

Now that basically the nerds are they are

much more right than before.

::

You know, so like, probably baseball is going

to be at the forefront of that because

::

they just have a lot of, you know,

advance in years compared to the other

::

sports.

::

So it's going to be interesting to see how

things play out here when it comes to

::

data.

::

Because at the same time, I'm

::

not sure it makes a lot of sense for all

the clubs to have their own data

::

collection structure if in the end they

just have the same data because you're

::

mainly, I think, to gather data, you are

limited, I'm guessing, by the technology

::

much more than by the ideas of a coach or

manager or a scientist being like, I

::

data, I think in the end, the data

collection is something that can be pretty

::

much, you know, collective, but then how

you use the data is more the

::

proprietary stuff.

::

It's going to be interesting to see that

play out.

::

Yeah, no doubt.

::

Great.

::

Well, Patrick, I've taken a lot of your

time already.

::

I need to let you go because...

::

You need to drink some coffee.

::

I definitely need to because that was very

intense.

::

But man, so interesting.

::

Before letting you go, so I have the last

two questions, of course, as usual.

::

You told me before we started the show

that when the season is going to start

::

again for you in US football, your days

are going to be extremely busy.

::

Like basically working from 5 a.m.

::

to 10 p.m.

::

or something like that.

::

How is that possible?

::

When do you sleep?

::

We do have some long days.

::

It depends on the day of the week and when

the full practice days are.

::

Usually, yeah, I'd get in around 4:45 or

5, have a bit of a workout, and then kind

::

of start the day around 6:30 or 7.

::

And it's really long.

::

I mean, there's a ton of meetings.

::

It's a very tactical sport if you've ever

watched it.

::

And so the players are nonstop in and out

of meetings and walk through practices and

::

full practices and then more meetings.

::

It's all a big, you know, tactical pattern

recognition type of thing.

::

And so, you know, we're in, you know,

working on projects and data and getting

::

you know, things set up so that model set

up and identifying things in data for the

::

staff and things like that.

::

It just becomes this really long day.

::

And I mean, like, yeah, if we go home at

eight or nine, maybe 9:30 sometimes, maybe

::

10, but I mean, there's people there

that'll stay even later than that, just

::

going through film and watching it.

::

They are very long days.

::

Usually those types of days are about

three days a week and then the other days,

::

I might be in there at five and get out at

like five or six.

::

So still 12 hour days, but it's a long

week for sure.

::

This is brutal.

::

Yeah.

::

But is it like that during the whole

season or is that mainly the start of the

::

season?

::

No, that's the season.

::

That is

::

18 weeks later we have a bye, 17 games

this season.

::

Damn, impressive.

::

You have to be sharp with your sleep also,

I guess in these weeks.

::

You do, yes.

::

You try and catch up on the weekends.

::

Yeah, damn.

::

Awesome, well Patrick, I think it's time

to call it a show.

::

Thank you so much, that was amazing.

::

Of course, I'm going to ask you the last

two questions I ask every guest at the end

::

of the show.

::

You knew that was coming, right?

::

Yes.

::

So what's the first one?

::

You know the first one.

::

The first one is, if you had unlimited resources,

what problem would you solve?

::

Yeah, unlimited time and resources.

::

I'll take one outside of sport, but one

::

I witnessed in sport.

::

So when I first started, I used to do all

of the GPS stuff, like live on the field.

::

Now someone else does it, but coding it or

cutting it up and stuff like that during

::

practice.

::

And on Friday practices at the time, that

was the day for our Make-A-Wish, the

::

Make-A-Wish child.

::

So they'd have kids that had a Make-A-Wish,

and their wish was to see a practice and

::

meet their favorite NFL players.

::

And these were usually kids that were, you

know, were small and terminally ill.

::

I think the, that's probably the thing

that I would solve because standing there

::

and you watch that and you work with all

these guys that are healthy and young.

::

And then you see this little kid who will never

have a chance to be

::

healthy and young, but they're just so

happy to meet these guys.

::

I think like that's a super unfair thing

for those little kids.

::

If I could solve anything, it'd be like

that, you know, kids and cancer and stuff

::

like that.

::

I think it's just a horrible thing.

::

And then your second question is always, if I

could have dinner with anyone dead or

::

alive, who would it be?

::

There's so many good ones, but I think I

would pick...

::

a previous guest that you've had, I think

three times, if I'm correct, which is

::

Andrew Gelman.

::

I think he's fascinatingly interesting and

I think dinner would be pretty amazing.

::

Yeah.

::

yeah.

::

Both good choices, amazing answers.

::

Thanks, Patrick.

::

I can tell you're a faithful listener because

you're like, yeah, you knew the

::

questions.

::

Like you're taking my job basically, I can

see that.

::

No, that's great.

::

So Andrew, if you're listening, well, if

you're ever in New York, Patrick will try

::

and make that work.

::

That'd be fun for sure.

::

Yeah, Andrew is always fantastic to talk

to.

::

So yeah, that's definitely a great choice.

::

Awesome.

::

Well, that's it, Patrick.

::

Thank you so much for being on the show.

::

I really had a blast and learned a lot

about US football because that's, I

::

think that's not the sport I know most

about.

::

So definitely thank you so much for taking

the time.

::

We'll put resources to your website in the

show notes for those who want to dig

::

deeper.

::

You have a bunch of links over there, and

::

Thank you again, Patrick, for taking the

time and being on the show.

::

Thank you.

::

This has been another episode of Learning

Bayesian Statistics.

::

Be sure to rate, review, and follow the

show on your favorite podcatcher, and

::

visit learnbayesstats.com for more

resources about today's topics, as well as

::

access to more episodes to help you reach

true Bayesian state of mind.

::

That's learnbayesstats.com.

::

Our theme music is Good Bayesian by Baba

Brinkman, feat. MC Lars and Mega Ran.

::

Check out his awesome work at bababrinkman.com.

::

I'm your host.

::

Alex Andorra.

::

You can follow me on Twitter at Alex

underscore Andorra like the country.

::

You can support the show and unlock

exclusive benefits by visiting Patreon.com

::

slash LearnBayesStats.

::

Thank you so much for listening and for

your support.

::

You're truly a good Bayesian.

::

Change your predictions after taking

information in and if you're thinking of

::

me less than amazing, let's adjust those

expectations.

::

Let me show you how to be a good Bayesian

Change calculations after taking fresh

::

data in Those predictions that your brain

is making Let's get them on a solid

::

foundation
