#85 A Brief History of Sports Analytics, with Jim Albert

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

In this episode, I am honored to talk with a legend of sports analytics in general, and baseball analytics in particular. I am of course talking about Jim Albert.

Jim grew up in the Philadelphia area and studied statistics at Purdue University. He then spent his entire 41-year academic career at Bowling Green State University, which gave him a wide diversity of classes to teach – from intro statistics through doctoral level.

As you’ll hear, he’s always had a passion for Bayesian education, Bayesian modeling and learning about statistics through sports. I find that passion fascinating about Jim, and I suspect that’s one of the main reasons for his prolific career — really, the list of his writings and teachings is impressive; just go take a look at the show notes.

Now an Emeritus Professor of Bowling Green, Jim is retired, but still an active tennis player and writer on sports analytics — his blog, “Exploring Baseball with R”, is nearing 400 posts!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony and Joshua Meehl.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

Links from the show:

Jim’s website: https://bayesball.github.io/
Jim’s baseball blog: https://baseballwithr.wordpress.com/
Jim on GitHub: https://github.com/bayesball
Jim on Twitter: https://twitter.com/albertbayes
Jim on Linkedin: https://www.linkedin.com/in/jim-albert-22846b41/
Jim’s baseball research: https://bayesball.github.io/BLOG/
Probability and Bayesian Modeling book: https://monika76five.github.io/ProbBayes/
Curve Ball — Baseball, Statistics, and the Role of Chance in the Game: https://bayesball.github.io/curveball/curveball.htm
Visualizing Baseball: https://bayesball.github.io/VB/
Analyzing Baseball Data with R: https://www.amazon.com/gp/product/0815353510?pf_rd_p=c2945051-950f-485c-b4df-15aac5223b10&pf_rd_r=SFAV7QEGY9A2EDADZTJ5
Teaching Statistics Using Baseball: https://bayesball.github.io/TSUB2/
Ordinal Data Modeling: https://link.springer.com/book/10.1007/b98832?changeHeader
Workshop Statistics (an intro stats course taught from a Bayesian point of view): https://bayesball.github.io/nsf_web/main.htm
LBS #76, The Past, Present & Future of Stan, with Bob Carpenter: https://learnbayesstats.com/episode/76-past-present-future-of-stan-bob-carpenter/
MCMC Interactive Gallery: https://chi-feng.github.io/mcmc-demo/app.html?algorithm=HamiltonianMC&target=banana

Abstract

written by Christoph Bamberg

In this episode, Jim Albert, a legend of sports analytics and Emeritus Professor at Bowling Green State University, is our guest.

We talk about a range of topics, including his early interest in math and sports, challenges in analysing sports data and his experience teaching statistics.

We trace the history of baseball analytics back to the 1960s and discuss how new, advanced ways of collecting data change what can be modelled.

There are also statistical approaches to American football, soccer and basketball games. Jim explains why these team sports are more difficult to model than baseball.

The conversation then turns to Jim’s substantial experience teaching statistics and the challenges he sees in that. Jim worked on several books on sports analytics and has many blog posts on this topic.

We also touch upon the challenges of prior elicitation, a topic that has come up frequently in recent podcasts, how different stakeholders such as coaches and managers think differently about the sport and how to extract priors from their information.

For more, tune in to episode 85 with Jim Albert.

Chapters

[00:00:00] Episode Begins

[00:04:04] How did you get into the world of statistics?

[00:11:17] Baseball is more advanced on the analytics path compared to other sports

[00:17:02] How is the data collected?

[00:24:43] Why is sports analytics important and is it turning humans into robots?

[00:32:51] Loss in translation problem between modellers and domain experts…?

[00:41:43] Active learning and learning through workshops

[00:51:08] Principles before methods

[00:52:30] Your favorite sports analytics model

[01:02:07] If you had unlimited time and resources which problem would you try to solve?

Transcript

Please note that this transcript is generated automatically and may contain errors. Feel free to reach out if you are willing to correct them.

Alex:

Jim Albert, welcome to Learning Bayesian Statistics.

Jim Albert:

Thank you. Thanks for having me.

Alex:

Yeah, I mean, thank you for taking the time. I'm actually very grateful to have you on the show and very grateful to Bob Carpenter, who actually put us in contact. So thanks a lot, Bob, if you are listening to this episode. Bob was on the show a few episodes ago. I will, of course, put his episode in the show notes of this one. So yeah, thanks for taking the time, Jim. And I'm super excited to talk about sports analytics today. It's going to be very fun. So, yeah, so let's dive in. But as usual, we'll start with your origin story. Because, yeah, you've done so many things, but I'm curious to know about the beginnings. Like, how did you come to the world of statistics and sports analytics? And how sinuous of a path was it?

Jim Albert:

Okay, of course, growing up, I was always interested in sports. And I was

Alex:

Mm-hmm.

Jim Albert:

also, I liked math. So I was interested in statistics early on and I played baseball and I've been lucky to play tennis my whole life. And so tennis was a part of my family. My dad was a very good player. So that was sort of nice. And so I played simulation games in my basement, games like Strat-O-Matic Baseball. And I wasn't a real good baseball player, but I was a baseball fan. And, uh, so I was good in math and I didn't quite know where that would lead me in terms of

Alex:

Mm-hmm.

Jim Albert:

a profession. But I went to Bucknell University in Pennsylvania and, uh, I was a math major. And then that's where I was exposed to statistics. We had a few stat professors that were very influential to me. And so I wanted to apply math somehow. And I'm not sure I really understood what statistics was about, but I was fortunate to go to Purdue.

Alex:

Mm-hmm.

Jim Albert:

I thought I was gonna do operations research, which I didn't quite. I thought that was a nice blend of math and probability and different things, but it turned out that I was just sort of put into the PhD program in statistics. So that's where I was. And the professor across the hall from me was Jim Berger. And... So

Alex:

Mm-hmm.

Jim Albert:

I got to know him pretty early. I mean, in fact, he only had come to Purdue a couple of years before me. So he was very young. And I took my first Bayesian course with him. It was a course in decision theory using Tom Ferguson's book. And so when I was looking for an advisor, he was a natural person to ask because I knew him sort of informally and... He was very easy to work with, more of a helpful friend than really

Alex:

Mm-hmm.

Jim Albert:

like a tyrant. He was very easy to talk to and available. So I was very lucky. But back in those days, he had done a lot of work in simultaneous estimation, variations of Stein estimation. So he basically said, well, why don't you work on something which is away from the... the normal mean case, once you start looking at things like, you know, Poisson means or other types of parameters. And we were looking for more practical kinds of procedures. And so right away, he said, why don't you look at these Efron and Morris papers? And so I was reading those and those were very influential. Now, I should mention this is a notable time to talk about that because Carl Morris, a great statistician, just passed away this week. And so, and one thing that was notable about Carl and Brad Efron is they both liked sports. And so they incorporated

Alex:

Mm-hmm.

Jim Albert:

sports examples into their research. And so they have a famous paper where they look at a collection of batting averages. And they talk about shrinkage to an average. And also Carl had written a paper in the early 80s that was called, like, parametric empirical Bayes estimation, for JASA. And one of the examples was batting averages for Ty Cobb. And one of the questions he was asking was, is Ty Cobb really a .400 hitter? In other words, was his true batting ability over 0.4? So we're not talking about his performance, we're talking about his ability, which is measured by a probability of getting a base

Alex:

Yeah.

Jim Albert:

hit. So that was a big part of what... And also, I think Carl was also a tennis player. And he wrote a paper back in the early days of tennis analytics about the most important point in tennis. So we had a lot of things in common, you know, and I think, so I was lucky to be exposed to that. And so my thesis was on estimating Poisson means from

Alex:

Mm-hmm.

Jim Albert:

like an empirical Bayes perspective. This is before MCMC and simulation. We didn't have... able to, we couldn't do the computations for these posteriors, but the ideas were there. And so then I got my first job in the math stat department at Bowling Green State University. We moved to Ohio and my whole academic career I spent at Bowling Green. So that's sort of

Alex:

Yeah.

Jim Albert:

a quick description of

Alex:

Yeah.

Jim Albert:

my background.
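
Editor's note: the distinction Jim draws above between a hitter's performance (the observed batting average) and his ability (the underlying probability of a hit) can be illustrated with a small beta-binomial sketch. This is not the Morris analysis; the season line (210 hits in 500 at-bats) and both priors are hypothetical values chosen only for illustration, and the posterior probability is approximated by simulation with the Python standard library.

```python
import random

def prob_ability_exceeds(hits, at_bats, threshold, prior_a, prior_b,
                         draws=100_000, seed=0):
    """Approximate P(true hit probability > threshold | data).

    Assumes a Beta(prior_a, prior_b) prior on the hit probability and a
    binomial likelihood, so the posterior is Beta(prior_a + hits,
    prior_b + outs). The probability is estimated by Monte Carlo.
    """
    rng = random.Random(seed)
    post_a = prior_a + hits
    post_b = prior_b + (at_bats - hits)
    above = sum(rng.betavariate(post_a, post_b) > threshold
                for _ in range(draws))
    return above / draws

# Hypothetical season: a .420 observed batting average.
hits, at_bats = 210, 500

# Flat prior: the data speak almost entirely for themselves.
p_flat = prob_ability_exceeds(hits, at_bats, 0.400, prior_a=1, prior_b=1)

# Assumed informative prior centered at .300 (worth about 300 at-bats).
p_informed = prob_ability_exceeds(hits, at_bats, 0.400, prior_a=90, prior_b=210)

print(f"P(ability > .400), flat prior:        {p_flat:.2f}")
print(f"P(ability > .400), informative prior: {p_informed:.2f}")
```

Under these assumptions, a .420 season leaves real doubt about whether the true ability is above .400, and the answer depends heavily on how much prior information about typical hitters is brought in.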

Alex:

Yeah, thanks a lot. Yeah, that's so interesting that, basically, sports and math was really something you were into since you were a young child. I find that very interesting. Like for me, my personal path, for instance, was extremely sinuous, so that's very interesting to me. But that being said, if you were really into tennis, how come you've done most of your analysis on baseball?

Jim Albert:

Well, baseball is remarkable for all the data collected. And I think baseball was sort of the, to me it's the most statistical sport in the sense that there's more data collected by it. So baseball started professionally like about 1800s, 1870s or so. And right away they were computing

Alex:

Mm-hmm.

Jim Albert:

things like batting average and runs and singles, doubles, home runs. And even now, I think the amount of data collected is remarkable. And so I think that's sort of driven maybe the interest in the sport, that there is so much statistical information. And in the old days, it was put into big books, big thick books.

Alex:

Yeah.

Jim Albert:

And

Alex:

Yeah.

Jim Albert:

now there have been efforts like, for example, make play-by-play data accessible. And so

Alex:

Mm-hmm.

Jim Albert:

literally, you can look at plays from old seasons, every single play. And now we have what's called Statcast data, where there's data on every single pitch in terms of the speed and the break and the type. And then we also know when a ball is hit, we know the launch angle and we know the exit velocity. There's even more detailed information now about where people are on the field. We can talk about player speeds, their range and fielding. I mean, it's all available now. And I think baseball was probably the first sport to really seriously use that data to answer meaningful questions.

Alex:

Yeah, yeah, I mean... in Back to the Future, the book one of the characters has is about baseball history, right? Where he manages to make a fortune because he goes back to the past and gets that book. I think it's about baseball already,

Jim Albert:

All right.

Alex:

even though, like, the movie's already quite old. So yeah, I can see what you mean. So okay, I understand, and that actually goes into something I wanted to talk about with you, because, yeah, I have the feeling that... So I have the prior... We're on a Bayesian show. So I have the prior that, yeah, baseball is more advanced on that analytics path compared to other sports, and then also the US in itself is maybe more advanced than other countries, but we'll get back to that. But yeah, basically, you started already doing that, but can you give us a very brief history of baseball analytics in the US and then also tell us how advanced baseball is on that path today compared to other sports?

Jim Albert:

Okay, so basically back in the early 60s, people were starting to do meaningful work with information about baseball. There was a person named George Lindsey, and his dad would actually record play-by-play information while he was watching the game. And so

Alex:

Mm-hmm.

Jim Albert:

there's this idea called run expectancies, which is like the potential for scoring runs, given a certain

Alex:

Mm-hmm.

Jim Albert:

number of outs and people on base. That work was published in the early 60s. But things were pretty quiet. And then probably the most famous person who got things going was Bill James. He didn't have a formal math background, but he started writing about baseball in the early 80s, or maybe the late 70s. And there's this book called The Baseball Abstract, where he basically would do things like a statistician: he would ask reasonable questions and try to use data to answer them. And he was also

Alex:

Mm-hmm.

Jim Albert:

a very good writer. So he had made a tremendous impact. And so the whole field is called sabermetrics. And partly it's called sabermetrics because back in the 70s, a new society, a new organization was organized called SABR, which is the Society for American Baseball Research. And that's still very active today. And so when that started, then sabermetrics was sort of the people in that organization who focused on doing statistical work with baseball. And so it's really since Bill James, and then the book Moneyball came out about 20 years ago, more than 20 years ago. And essentially the whole idea of Moneyball was you're a general manager of a team, in this case Oakland, and they were trying to efficiently use resources. And so they weren't a big market team, so they didn't have the money to spend on players. And so instead they tried to find players with talents that were undervalued. And one undervalued talent was the ability to get on base. And so they got people who maybe were not famous, but they were successful in getting on base. And so basically the book, and eventually the movie, sort of made the whole idea very popular. And Oakland at the time was one of the relatively few teams that really spent a lot of effort on sabermetrics. Now, every team in baseball, all 30 teams, have significant analytics groups, some more than others. But it's perceived to be a big deal, especially in terms of scouting players, predicting future success. I mean, that's really a big enterprise. And I used to tell students, students would say they wanna get a job in baseball and I would say, that's nice, but there's not much available. Now the opportunities are really remarkable. I mean, if you're really enthusiastic and you've done some project or, you know, you've written a blog post or something, you know, you can get a job as an intern. I mean, the opportunities are there.
Now other sports like football, American football and basketball and soccer, they're coming, they're behind baseball, but they're catching up. But right now, I mean, there's nothing like the amount of effort in baseball. I mean, I know friends who work for football teams, but American football, it's more like it's a very small group of people that do that. Although American

Alex:

Mmm.

Jim Albert:

football is the most popular sport, the use of analytics is still a bit limited. The opportunities

Alex:

Mm-hmm.

Jim Albert:

are there and I think it will continue to advance. I think ice hockey is exciting because the pace of the game is so fast that you need to collect data to really understand what's happening on the ice. Otherwise you miss things. Baseball is a slower game, a little more discrete than say football or basketball or soccer. It's probably in some sense easier to analyze. But the technology is here. We can now track balls, like in soccer, we can track where the ball is on the field. And in basketball, we can track the locations of the players and the positioning. So now it gets to be much more of a spatial type of analysis.
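
Editor's note: the run expectancy idea mentioned in this answer (the average number of runs scored in the rest of an inning, given the outs and the runners on base) can be sketched from play-by-play records as follows. The record layout and the two toy innings are hypothetical and invented for illustration; real play-by-play files have their own formats.

```python
from collections import defaultdict

def run_expectancy(innings):
    """Average runs scored from each (outs, bases) state to the end of the inning.

    `innings` is a list of half-innings. Each half-inning is a list of plays,
    and each play is a tuple (outs_before, bases_before, runs_on_play).
    `bases_before` is a string such as "1__" (runner on first only).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for plays in innings:
        # Runs still to come, counted from the current play onward.
        runs_remaining = sum(runs for _, _, runs in plays)
        for outs, bases, runs_on_play in plays:
            state = (outs, bases)
            totals[state] += runs_remaining
            counts[state] += 1
            runs_remaining -= runs_on_play
    return {state: totals[state] / counts[state] for state in totals}

# Two hypothetical half-innings (toy data, not real games).
innings = [
    # single, out, two-run home run, out, out
    [(0, "___", 0), (0, "1__", 0), (1, "1__", 2), (1, "___", 0), (2, "___", 0)],
    # three quick outs
    [(0, "___", 0), (1, "___", 0), (2, "___", 0)],
]

table = run_expectancy(innings)
for state in sorted(table):
    print(state, round(table[state], 2))
```

With many seasons of data instead of two toy innings, the same averaging gives a table of expected runs for each of the 24 combinations of outs and base occupancy.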

Alex:

Yeah, I see. Yeah, and you have all the history and basically the first-mover advantage in baseball, where it's also much more ingrained into the way of thinking, into probably the way of managing the teams and the data science, I mean, the data science teams, I don't know if that's the name, but you get what I mean. The modeling teams maybe have more access to the sports

Jim Albert:

Right.

Alex:

teams, stuff like that. I don't know, maybe you have some insights?

Jim Albert:

But yeah, so I think the amount of data that teams have is really

Alex:

Hm hm.

Jim Albert:

remarkable. Literally every

Alex:

Hm.

Jim Albert:

second of the game, there's movement of players in the field and that's all available to the teams. So I think

Alex:

Yeah,

Jim Albert:

it's really,

Alex:

incredible.

Jim Albert:

I think they can, for example, one thing, one aspect of baseball that was difficult to measure was fielding. Because in the old days, they would just measure whether you... If you made a play, whether you made it successfully or if you didn't make a play, it was called an error. But what if

Alex:

Mm-hmm.

Jim Albert:

you didn't reach the ball? Well, that was never measured, see, because there was no way of measuring it. Now, we can talk about the probability of balls like 10 yards from you. We can talk about the probability you should get it. So we can talk about your speed of getting there. We can talk about your range. We can talk about catch probabilities, that's all. So really we're getting a much better understanding about fielding. It's not a secret anymore. Well, for many years, it was sort of a, the data we had was very incomplete.

Alex:

Yeah, and how do they get that data? Is that with cameras? Or is that

Jim Albert:

Yeah,

Alex:

sensors on the players?

Jim Albert:

I think the Hawkeye system, which is used for tennis, for marking locations

Alex:

Right.

Jim Albert:

of balls, that's being used

Alex:

Yeah.

Jim Albert:

in all sports now. So I think that

Alex:

Mmm.

Jim Albert:

technology is used. And so when a ball is hit, we know exactly where

Alex:

Mm-hmm.

Jim Albert:

it lands in the field. All the players are tracked, so we know where the players are moving.

Alex:

Mm-hmm.

Jim Albert:

So we can, yeah, so that's all measurable nowadays.

Alex:

Yeah, yeah, it's incredible. Basically, we're starting to get, like, data is oil, is the oil of the modeling.

Jim Albert:

Right.

Alex:

So basically, we're starting to get the oil, and now we can work on actually building the engines and the cars. Yeah,

Jim Albert:

Right.

Alex:

that's so cool.

Jim Albert:

So I think

Alex:

I love that.

Jim Albert:

all these teams

Alex:

And,

Jim Albert:

must have tremendous

Alex:

yeah.

Jim Albert:

databases. And so you need people that can

Alex:

Yeah.

Jim Albert:

build these databases. You want to extract information fast. But obviously, you also need modelers. I think Bayesians are even asked for, sometimes the job description will require you to have some experience in Bayesian modeling. So that's roughly an important aspect of that.

Alex:

Yeah, for sure. And I mean, as an open source enthusiast, I'm wondering if they have some shared databases because I'm guessing some of the data could be shared across everybody and some of the data should probably remain a secret, but I'm guessing that nothing's shared across teams, right?

Jim Albert:

Well, a component of Major League Baseball is called Advanced Media. And currently the web presence of that is called Baseball Savant. And that's got

Alex:

Mm-hmm.

Jim Albert:

the high level. Now, some of that data is publicly available. So, for example, every day I download the data for the previous games. So I, for example, I focus a lot on home runs. And for every home run hit, I know the launch angle, I know the exit velocity, and I probably have the distance. So that data is available to anybody who wants to get that. Now the extra data like the locations of fielders, that's probably again available through that same organization, and that's supplied to the teams. So the teams probably have pretty much the same data. Yeah.

Alex:

Oh, okay. Yeah, that's interesting. Yeah, because, I mean, the modeler in me is already thinking about hierarchical models and stuff like that.

Jim Albert:

Right,

Alex:

And

Jim Albert:

I mean to me

Alex:

basically,

Jim Albert:

that's

Alex:

yeah.

Jim Albert:

hierarchical modeling is extremely natural in that

Alex:

Yeah.

Jim Albert:

setting because you want to, you're often looking at a collection of different hitters or fielders or teams

Alex:

Yeah,

Jim Albert:

and that's so when you

Alex:

exactly.

Jim Albert:

have data in groups like that, that lends

Alex:

Yeah.

Jim Albert:

itself well to hierarchical modeling. Yeah.
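
Editor's note: the point about grouped data can be illustrated with a deliberately simplified partial-pooling sketch. A full hierarchical model would estimate the amount of pooling from the data; here the pooling strength `prior_weight` is a fixed, assumed value, and the three hitters and their counts are hypothetical.

```python
def partial_pool(records, prior_weight):
    """Shrink each hitter's raw average toward the group average.

    `records` maps a name to (hits, at_bats). `prior_weight` acts like a
    number of pseudo at-bats at the group average, so hitters with few
    at-bats are pulled more strongly toward the group. In a full
    hierarchical model this weight would be estimated, not fixed.
    """
    total_hits = sum(hits for hits, _ in records.values())
    total_ab = sum(ab for _, ab in records.values())
    group_avg = total_hits / total_ab
    estimates = {
        name: (hits + prior_weight * group_avg) / (ab + prior_weight)
        for name, (hits, ab) in records.items()
    }
    return group_avg, estimates

# Hypothetical hitters: (hits, at-bats).
records = {"Hitter A": (12, 30), "Hitter B": (150, 500), "Hitter C": (5, 25)}
group_avg, estimates = partial_pool(records, prior_weight=100)

for name, (hits, ab) in records.items():
    print(f"{name}: raw {hits / ab:.3f} -> pooled {estimates[name]:.3f}")
```

Hitter A's .400 over 30 at-bats moves most of the way back toward the group average, while Hitter B's estimate barely changes after 500 at-bats, which is the "borrowing strength" behavior that makes hierarchical models a natural fit for collections of players or teams.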

Alex:

Yeah, yeah, like, I mean, just thinking about it like team, team hierarchies and also position hierarchies, like

Jim Albert:

Right,

Alex:

it would

Jim Albert:

Right, right, right.

Alex:

the two main ones that I would think of right now. Yeah. Um, and is that like, so basically baseball has always been on the forefront on that, on that thing, as you were saying on these analytics. Um, What is the current frontier? Like what are the current frontiers of baseball analytics right now? Like the biggest answers people are looking for right now in the baseball analytics world.

Jim Albert:

Well, I think one thing we don't quite understand is maybe how teams work together, because

Alex:

Mm-hmm.

Jim Albert:

a team is a collective. Like defense, it's a collective enterprise, it's not just one player. And so like two key players are the second baseman and the shortstop, they play in the middle of the field. And the question is, how do they work together? I don't think we have a good sense of how to measure that, you know, more like a synergy or something between those two players. You know,

Alex:

Mm-hmm.

Jim Albert:

we don't know much about that. You hear a lot about team chemistry or the ability to work together as a team. I don't think we have a good sense of how that's measured. You know, the coaching, what's the value added of coaching? Or is a baseball manager really an important position? You know, you hear about different styles of management. Some people like to be more like the players and

Alex:

Mm-hmm.

Jim Albert:

talk at their level. Other managers like to be more like dictators where they're making, calling the shots. I mean,

Alex:

Mm-hmm.

Jim Albert:

what's the, often teams will vacillate between the two kinds of managers. I'm not sure we really understand, getting the value of that thing. So I think we're still, you know, those are things are harder to understand, you know, because those things are very important for teams winning or losing.

Alex:

Yeah, yeah, yeah, that makes sense. Any stat? So these are the main questions right now for baseball. And thank you for painting that whole landscape for us. Right now I'd like to zoom out a bit, because you've been talking about sports in the US. I'm European, so of course I want to compare, and I'm really into sports. I love it. Something I can notice, though, just from my standpoint, which is way less specialized than yours, is that it feels like Europe is quite behind compared to the US on that front, especially for football, or soccer. So I don't know, my feeling is that England, the Premier League for instance, is way more advanced than, for instance, the French league or the continental leagues. So I'd like to get your expertise on that: how is Europe faring in all this in comparison to the US? What's the state of the data science market

Jim Albert:

Yeah,

Alex:

on this?

Jim Albert:

I think football or soccer is the big frontier now in sports analytics,

Alex:

Mm-hmm.

Jim Albert:

because I think there's

Alex:

Yeah.

Jim Albert:

more interest internationally in that sport than any other sports. And I think, but obviously it's harder to measure because it's not about scoring goals, it's about movement of players on the field, about making plays. And I think that's all spatial, that data is being recorded now. We're starting to... get a handle on that. But how do you measure that? Or how do you measure a player's performance given that spatial data? That to me is the really, to me the exciting aspect of soccer, of how to work with that. I think we're starting, because it's not about, honestly, it's not about scoring goals, because goals are relatively unlikely events, right? They're relatively rare events. And so if you just focus on goal scoring, You know, you're a bit limited in terms of what you can do. But if you talk about plays or about advantage, spatial advantages on the field, right? And that's what, as a team, you wanna have those spatial advantages. Well, what does that mean? How do you measure that? You know, what are the key players on a team that will give you that spatial advantage, right? So. But again, it's not like baseball, which is a discrete process. It's rather a continuous spatial process, right? So, but I think that's, but the point is the data is there. So the data is available. And so I think the teams are just starting to build up analytics groups to work with that. So I think modeling is a big part of that, especially Bayesian modeling, where you want to sort of borrow strength from different, different uh teams or time frames or you know

Alex:

Yeah, yeah, I mean, we'll get to that modeling part in a few minutes, for sure. But okay, yeah, I get what you mean. And that's definitely also the sense I get. One of the interesting things, I think, is how much more conservatism there seems to be in the soccer world in continental Europe. I'm playing dumb for now, because I like to entertain the different arguments. So basically, one of the main critiques you will get about any kind of modeling for soccer in Western Europe will be related to the kind of critiques you hear about automation. Basically, you're trying to get the humans out of the loop, it makes the game less interesting, it turns human players into robots,

Jim Albert:

Right.

Alex:

so these people will basically say that and question the importance of sports analytics in that way. So can you talk a bit about that and tell us why sports analytics is important, and whether or not it is turning humans into robots?

Jim Albert:

Oh, that's a good question. I mean, I think. You know, I think in baseball, for example, there's a lot of people who think we've gone too far, where,

Alex:

Mm-hmm.

Jim Albert:

you know, managers will make, like, for example, one big decision a manager's got to make in baseball is, how long do you keep the pitcher in the game? Right?

Alex:

Mm-hmm.

Jim Albert:

And,

Alex:

Yeah.

Jim Albert:

and so the pitcher will throw pitches and the feeling is once they've hit a certain limit, then their effectiveness goes down. Or maybe when they face the other team's lineup for like a third time, then suddenly there's like a drop off. And so managers often will make a decision solely on the analytics. So that this pitcher has thrown 100 pitches, therefore he's coming out. Or this pitcher is coming out because he's been, he's gonna face a batter for the third time and we don't want that. Yeah. So the critics would say they're just acting like robots. They're

Alex:

Mm-hmm.

Jim Albert:

just like machines, they're just reacting to data. Well, rather the reality is that every pitcher is different, every game is different, and you have to make subjective judgments on what to

Alex:

Mm-hmm.

Jim Albert:

do. Soccer is the same way. I think analytics are gonna help you learn about important aspects of winning, you know, and I think you need to know that because otherwise you may make some obvious mistakes in putting in players. But you still react to the situation and improvise,

Alex:

Yeah.

Jim Albert:

and that's always going to be part of sports. So I think we need some basic intelligence to understand. So for example, how do you value, like a soccer player, how

Alex:

Mm-hmm.

Jim Albert:

do you evaluate how good is he? Especially when he's not a scorer, right?

Alex:

Yeah.

Jim Albert:

He's the defenseman or a midfielder. Well, you're not going to measure him by the number of goals he scores, right? But you need some measure and it isn't necessarily raw speed, right? It's their ability to react in certain situations or their ability to make plays or give their team an advantage. And I think you need some sort of analytical tools to understand that. Because I don't think our ability to just look at somebody is that great in terms of evaluating their performance.

Alex:

Yeah, I mean... I completely agree with that. I mean, you have a lot of examples of, at least, French football clubs which are still, you know, recruiting players based on that instinctive way of doing things, basically. And that's why I'm a, of course biased, but fervent enthusiast for more modeling in these kinds of cases, because there is still a long way to go before the players and the managers become robots. For now, this is way more done by just instinct and gut feeling. And there are a lot of clubs where it just doesn't work at all, because there are so many variables that you have to take into account, and a lot of those biases are basically not taken into account when you recruit a player, and that can just mess up a whole recruitment project.

Jim Albert:

In American football, what's interesting is that we just recently had our draft, big NFL draft for the draft players.

Alex:

Oh yeah.

Jim Albert:

And before

Alex:

Mm-hmm.

Jim Albert:

they have the draft, they have what's called a combine, where they put

Alex:

Mm-hmm.

Jim Albert:

all these college players through all these different tests, like running tests or strength tests. And so everyone is measured in all these different attributes. And

Alex:

Mmm.

Jim Albert:

the question is, does that define a football player? No,

Alex:

Mm-hmm.

Jim Albert:

because even though you might have great speed or great jumping ability or something, it doesn't mean that you're gonna play well. Because, you know, another thing we don't understand well in sports is how important the mental composition of the player is, the maturity or something like that. And I'm not sure we can measure that. I mean, we know that's important, but how do we measure that? So, you know, a lot of times teams will worry about what these players do when they aren't playing football or another sport. Sometimes they get into trouble or they have some

Alex:

Mm-hmm.

Jim Albert:

problems. But those things are all important because that defines the individual. How you quantify those kinds of issues is sort of interesting.

Alex:

Yeah, yeah, exactly. And so I'm curious about what you think, but I think that's also where and why Bayesian statistics can fare extremely well here, because it's not a black-box approach where I just put in the data and fit and predict. It's something that takes in the scouts' domain knowledge, which is hard-earned knowledge from years of experience, but balances it with data and with a way of writing down your priors and your biases that you don't have when you're just making gut decisions. And so that's why I think Bayesian statistics can bring a lot to the table when it comes to sports analytics.

Jim Albert:

Yeah, I mean, I think the prior information is available, but the challenge is how do you quantify that into priors?

Alex:

Mm-hmm.

Jim Albert:

And

Alex:

Oh,

Jim Albert:

decisions

Alex:

okay.

Jim Albert:

on modeling, for example, it's not

Alex:

Uh-huh.

Jim Albert:

often clear what type of model you choose. So you often consider different alternatives, and there are a lot of issues involved. Computation, for example, could be a big issue. Maybe

Alex:

Mm-hmm.

Jim Albert:

the models you wanna use are just not computationally tractable, so you have to do something else. Yeah.

Alex:

Hmm. So do you have an example in mind of something like that, where actually putting down the priors in a model has proven complicated?

Jim Albert:

Well, I'm thinking about, well, I think it's, well, that's a good question. I mean, it's easy to talk about, like spatial problems, for example. I mean,

Alex:

Mm-hmm.

Jim Albert:

how do you, we're talking about spatial parameters. That's probably a harder thing to talk about a prior on, right? Or even when, even to be non-informative or weakly informative, how do you construct priors with that kind of weak information? I mean,

Alex:

Mm-hmm.

Jim Albert:

we're

Alex:

Yeah.

Jim Albert:

exposed to, like, the typical Bayesian course, which will focus on exponential family models where you have single parameters that describe things. That's pretty simplistic compared to the problems we talk about in sports. Because again, in a spatial problem, maybe you don't want to use a parametric density, for example, to describe things. We're talking about locations of players. That's more of a density estimate, right, of something. And so this is a little more non-parametric. It's not a simple thing like a normal or binomial model.

Alex:

And do you think it's something that's possible to deal with on the software side? So for instance, when we develop the tools, when we develop PyMC, or the Stan devs when they develop Stan, or the ArviZ devs when they are developing new plots and stuff like that. Do you think that's doable? Is that something we could deal with on that end? Or is it more of a lost-in-translation problem with the domain experts, where it's hard to cross the bridge between you talking about the distributions and that domain expert not really knowing what you're talking about?

Jim Albert:

Yeah, I think what's challenging in sports is that often the people doing the analytics are not the ones making the decisions. And so you need a dialogue, you know, between those two groups. We really need to have conversations where the people who are really making the decisions are expressing their beliefs, and you have to somehow quantify that into some sort of model, you know. That's sort of a challenge. To me, once you've got a statistical problem defined, it's relatively easy to get a reasonable answer. But the

Alex:

Mm-hmm.

Jim Albert:

challenge is to think of a reasonable statistical problem. And there you have to work with the people who are working with the team, who, you know, have the issues, and then get them to express their beliefs. Or scouts, you know, they're talking about players' performances as well. They're not using the same language as we're using, but somehow, as you

Alex:

Mm-hmm.

Jim Albert:

said, you've got to build that bridge so you can communicate.

Alex:

Hmm. Yeah, okay. So it's more something that goes in, like during the Bayesian workflow, the modeling workflow, where it's on the prior elicitation part more than just a technical issue, let's say that's

Jim Albert:

Right.

Alex:

something that would be made easier with a software tool.

Jim Albert:

Yeah, I think it's really like consulting work, where you work with somebody from a different field and

Alex:

Mm-hmm.

Jim Albert:

probably the most important meeting is the first one where you get a

Alex:

Mm-hmm.

Jim Albert:

sense of what the problem is and

Alex:

Mm-hmm.

Jim Albert:

really understand the issues. And this is way before you talk about any statistical approach. You're just trying to get a sense of, you know, what they're interested in, and often they have a hard time expressing what the problem is. So the challenge for the statistician is to try to put that in terms that make sense to him and also to the person looking for the help. Yeah.

Alex:

Mm-hmm. Yeah, that's super interesting. And that's definitely something also, like in any consulting project we have, for instance, with PyMC Labs, our consulting firm, a lot of the work is focused on that, making sure we're all setting priors in a good way and in a way that's understandable to people,

Jim Albert:

Right.

Alex:

especially business people. So yeah, in a way, business people don't really care about, "Oh, I'm using the gamma distribution for that parameter." They don't really care.

Jim Albert:

No, no, they don't think in those terms, right? But they have beliefs, and they can express those.

Alex:

Yeah, exactly. And if anything, sometimes it can even backfire, because some people are really intimidated by math and stats, and when you start shouting distributions and names of distributions and Greek letters, people can shut off completely because they are too intimidated. So

Jim Albert:

Right, right, right.

Alex:

definitely something to be careful about. In that respect, for instance, an addition we made to the PyMC package based on those types of interactions with clients is the new find_constrained_prior function. It really came from work with clients where we were basically trying to elicit priors, and people would tell us, "Well, I think that parameter should be positive, and most of the time it's gonna be between 0.1 and 0.4." And so you, as the modeler, are like, okay, so that means I need a gamma distribution with about, I don't know, 95% of the mass of the distribution between 0.1 and 0.4. How do I get that distribution? The business person doesn't care about that, but at least when she gives you that information, well, you have your prior. And then you can use PyMC to get the actual gamma distribution that you need with that find_constrained_prior function. So actually, yeah, we're already talking about models, so that's cool, because I wanted to switch a bit towards that. But before that, I'm actually curious if you remember how you first got introduced to Bayesian methods and why they stuck

Jim Albert:

Well,

Alex:

with you I think
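The elicitation step Alex describes (a positive parameter with roughly 95% of its mass between 0.1 and 0.4) can be sketched without any external library. The block below is only an illustration of the idea, not PyMC's implementation: the function names, the equal-tails convention (2.5% of the mass below the lower bound and 2.5% above the upper bound), and the search range for the shape parameter are all assumptions made for this sketch.

```python
import math


def gamma_cdf(x, alpha, rate=1.0):
    """CDF of a Gamma(alpha, rate) at x, via the power series of the
    lower incomplete gamma function (adequate for moderate alpha)."""
    z = rate * x
    if z <= 0.0:
        return 0.0
    term = 1.0 / alpha
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (alpha + n)
        total += term
    return total * math.exp(alpha * math.log(z) - z - math.lgamma(alpha))


def gamma_quantile(p, alpha):
    """Quantile of Gamma(alpha, rate=1), found by bisection on the CDF."""
    lo, hi = 0.0, alpha + 20.0 * math.sqrt(alpha) + 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def constrained_gamma_prior(lower, upper, mass=0.95):
    """Hypothetical helper: find (alpha, rate) so that `mass` of the gamma
    sits in [lower, upper], with equal tails on both sides (an assumption)."""
    tail = (1.0 - mass) / 2.0
    target_ratio = upper / lower
    lo, hi = 0.5, 200.0  # assumed search range for the shape parameter
    for _ in range(60):
        alpha = math.sqrt(lo * hi)
        ratio = gamma_quantile(1.0 - tail, alpha) / gamma_quantile(tail, alpha)
        if ratio > target_ratio:
            lo = alpha  # too spread out: a larger shape concentrates the prior
        else:
            hi = alpha
    alpha = math.sqrt(lo * hi)
    rate = gamma_quantile(tail, alpha) / lower  # rescale so the lower quantile lands on `lower`
    return alpha, rate


alpha, rate = constrained_gamma_prior(0.1, 0.4)
inside = gamma_cdf(0.4, alpha, rate) - gamma_cdf(0.1, alpha, rate)
print(f"alpha={alpha:.2f}, rate={rate:.2f}, mass in [0.1, 0.4]={inside:.3f}")
```

The point of the design is that the domain expert only supplies an interval and a rough level of certainty, and the distribution's parameters are then solved for numerically rather than guessed.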

Jim Albert:

I was very lucky that I got introduced to Bayes through Jim Berger. And

Alex:

Mm-hmm.

Jim Albert:

I think at that point there was no real Bayesian text available. I mean, Jim Berger was

Alex:

Mm-hmm.

Jim Albert:

working on a book, his decision theory book. So really I was learning it more from a decision-theoretic perspective, putting

Alex:

Mm-hmm.

Jim Albert:

priors on things. To me it always seemed very natural. And I could see, I mean, there were obviously some issues with p-values, for example. And when you started looking carefully at p-values and contrasted them with Bayesian measures of evidence, you got very different answers. And it made you ask, what's really going on? And so I think I got exposed to that kind of idea, and to the idea that the likelihood principle was so important. So if you collect data, say you have binary data, and let's say you run 20 trials and you get 12 successes. Instead,

Alex:

Mm-hmm.

Jim Albert:

you might continue the experiment until you have 12 successes. Well, does that matter? Well, not if you believe that the likelihood principle is dominant. But if you're a frequentist, it matters entirely, because you have two different distributions. You've got a negative binomial and a binomial, which are different distributions. So you get different p-values and different measures of evidence. And so I started thinking about that kind of stuff pretty early in my graduate career. So when I went to Bowling Green, teaching a Bayesian course was sort of a natural thing to do because I had that background.
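The stopping-rule point can be checked numerically. The sketch below takes the 12-successes-in-20-trials example and shows that the binomial and negative binomial likelihoods differ only by a constant, so a posterior computed on a grid with a flat prior (an assumption of this sketch) is identical under both designs. For the tail-area comparison it uses a different inverse-sampling design, stopping at the 8th failure, which is an assumption chosen for this sketch and not the design described in the conversation, because that is a case where the one-sided tail areas under p = 0.5 visibly differ.

```python
from math import comb


def binomial_likelihood(p, successes=12, trials=20):
    # Fixed number of trials, random number of successes.
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


def negative_binomial_likelihood(p, successes=12, trials=20):
    # Sample until the 12th success, which happens to arrive on trial 20.
    return comb(trials - 1, successes - 1) * p**successes * (1 - p) ** (trials - successes)


def grid_posterior(likelihood, grid):
    # Flat prior over the grid: the posterior is the normalized likelihood.
    weights = [likelihood(p) for p in grid]
    total = sum(weights)
    return [w / total for w in weights]


grid = [i / 100 for i in range(1, 100)]
post_binomial = grid_posterior(binomial_likelihood, grid)
post_negbin = grid_posterior(negative_binomial_likelihood, grid)
max_gap = max(abs(a - b) for a, b in zip(post_binomial, post_negbin))

# One-sided tail areas under p = 0.5 for the same data (12 successes, 8 failures).
# Fixed n = 20: probability of 12 or more successes.
p_fixed_n = sum(comb(20, k) for k in range(12, 21)) / 2**20
# Stop at the 8th failure (assumed design): probability that 20 or more trials
# are needed, i.e. at most 7 failures in the first 19 trials.
p_stop_at_8th_failure = sum(comb(19, k) for k in range(0, 8)) / 2**19

print(f"largest posterior difference: {max_gap:.2e}")
print(f"tail area, fixed n: {p_fixed_n:.4f}; inverse sampling: {p_stop_at_8th_failure:.4f}")
```

The posterior depends on the data only through p^12 (1 - p)^8, while the tail areas depend on which outcomes could have occurred under each design.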

Alex:

Mm-hmm.

Jim Albert:

So right away I was teaching a master's level course.

Alex:

Mm-hmm.

Jim Albert:

But then at the same time, we were teaching a lot of intro stat in the department. And, let's say, I didn't think that the approach we were using in intro stat was really working. I mean, students were learning recipes and didn't really understand what they were doing. And they didn't understand the frequentist interpretation of confidence, for example, the idea that you

Alex:

Yeah.

Jim Albert:

don't have confidence in a certain method, rather you only have confidence in repeated sampling, which is very, very counterintuitive

Alex:

Mm-hmm.

Jim Albert:

because,

Alex:

Yeah.

Jim Albert:

yeah, right, so I think right away, I mean, I was aware there were some efforts to teach intro Bayes at that level, and I said, well, we really should try that at the intro level, because I thought inference was really a lost cause from a frequentist perspective at that level.
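The repeated-sampling reading of confidence mentioned above can be made concrete with a small simulation. This is a minimal sketch under assumed settings (normal data with a known standard deviation, a true mean of 0.3, samples of size 25): the 95% refers to the fraction of intervals, over many repeated samples, that cover the true mean, not to any single computed interval.

```python
import math
import random


def interval_coverage(true_mean=0.3, sigma=1.0, n=25, reps=2000, seed=0):
    """Fraction of 95% z-intervals that contain the true mean over repeated samples."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / reps


print(f"share of intervals covering the true mean: {interval_coverage():.3f}")
```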

Alex:

Mm-hmm.

Jim Albert:

These are students who just have an algebra, college algebra background, you know. So that inspired me to look at that seriously. And the person who really influenced me was Don Berry, because he had a book that was exactly on that, basically intro stats with discrete models. You know, it didn't get much attention at the time, but I thought that was a really good book. The only problem with the book is that it had very little computational support. Even with discrete priors, there was no software connected with the book. And so that inspired me to write some Minitab macros to try to supplement that. So my first book was actually a collection of Minitab macros for doing Bayesian calculations. That was my early, early book. And it wasn't successful, in the sense that Minitab changed their language, and so what I had written wasn't gonna work with the new macros. So, you know, there was a problem with that. But that really was the foundation for what I developed later. At that time I was using MATLAB for computation. So I developed a

Alex:

Mm-hmm.

Jim Albert:

collection of MATLAB functions for implementing a lot of these calculations, Bayesian calculations. And then MATLAB eventually became R. When R became available, I just took those MATLAB routines and translated them to R. And so I wrote a book then called Bayesian Computation with R. And those were really the routines I was using for my graduate course. So,

Alex:

I see.

Jim Albert:

but again, before I wrote that computation book, I wrote an intro Bayes book. And really I wanted to do it from two different perspectives. First, I wanted to do Bayesian ideas instead of frequentist. And second, I liked the idea of active learning, where the students are learning through activities in a sort of workshop style. And,

Alex:

Mm-hmm.

Jim Albert:

And so Alan Rossman and Beth Chance had written some books like that, that were pretty popular. So what I did essentially was work with Alan Rossman and use his data part, and then add probability and Bayesian components to that book. So that became my intro Bayesian book. And we used it at Bowling Green for a while. The book was fine. The students could do little Bayesian projects. But the problem is that we were trying to do that together with the traditional course: some sections would do the Bayesian thing, some sections would do frequentist, and having the two styles of classes was hard to work with because we were working with like 30 sections of classes. So eventually we went back to the traditional course just because of management issues. It was not because the Bayesian approach wasn't working. You needed somebody in charge who had more of a Bayesian background, so it was harder for a beginning graduate student to teach that course. That's why we went back to a more traditional thing. But the approach worked fine. And I think I was encouraged by that. Yeah.

Alex:

Hmm.

Jim Albert:

So,

Alex:

I see,

Jim Albert:

uh.

Alex:

yeah. And the more intuitive interpretation, basically, of

Jim Albert:

Right,

Alex:

the uncertainty.

Jim Albert:

so the focus in that intro book was using discrete priors. So for example, for a proportion, you make a list of values that you think are appropriate, you assign weights to those values, and that becomes your prior. And then you just have simple routines that, once you observe some data, will convert those to posterior probabilities. The nice thing about that approach is that it's easy to do inference because it's just based on a table of probabilities. Your posterior estimate is just, like, a posterior mode. You collect values which have a certain probability content, and that's your credible interval. It was very easy to do inference.
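The discrete-prior routine Jim describes fits in a few lines. The sketch below is an illustration under assumptions, not the code from his book: the candidate values 0.1 to 0.9, the equal prior weights, the data (12 successes in 20 trials) and the 90% probability content are all chosen here for the example.

```python
from math import comb


def discrete_posterior(values, weights, successes, trials):
    """Turn prior weights on candidate proportions into posterior probabilities."""
    likelihood = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in values
    ]
    unnormalized = [w * like for w, like in zip(weights, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def credible_set(values, probs, content=0.9):
    """Collect the most probable values until their total probability reaches `content`."""
    ranked = sorted(zip(probs, values), reverse=True)
    chosen, accumulated = [], 0.0
    for prob, value in ranked:
        chosen.append(value)
        accumulated += prob
        if accumulated >= content:
            break
    return sorted(chosen), accumulated


values = [i / 10 for i in range(1, 10)]       # candidate proportions
weights = [1.0] * len(values)                 # equal prior weights (assumed)
posterior = discrete_posterior(values, weights, successes=12, trials=20)

mode = values[posterior.index(max(posterior))]
interval, content = credible_set(values, posterior, content=0.9)

for p, prob in zip(values, posterior):
    print(f"p = {p:.1f}   posterior = {prob:.3f}")
print("posterior mode:", mode)
print("credible set:", interval, "with probability", round(content, 3))
```

Everything a student needs for inference (the estimate, the interval, the probability of a hypothesis such as p > 0.5) is read directly off the posterior table.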

Alex:

Yeah,

Jim Albert:

Yeah.

Alex:

yeah, yeah. Nice. Awesome. Yeah, damn. Super interesting. And actually, I'd like to talk a bit more about your educational part, because yeah, you've written a lot of books. You've taught a lot. So, um. Yeah, let's talk a bit about that and then afterwards if we have some time, I'd like to dive a bit more into the specifics, maybe of a sports and analytics model, but to de-zoom a bit because we've been a bit technical also. So yeah, as I was saying, you're also a professor and you're very passionate about stats education. So I'm curious, what are the most important skills that you try to instill in your students?

Jim Albert:

Okay, that's a good question. I think it is a challenge to be a Bayesian because students typically, especially at the graduate level, have already had some training in frequentist statistics. So to me, when you're a Bayesian, it sort of turns things around a bit. It's a different perspective. And so I think you need to contrast Bayesian answers with frequentist answers. And it's not like frequentist answers are bad, but it's a different way of thinking about things, a different way of thinking about conclusions. I mean, obviously one of the big advantages is that you can bring prior information to the table. And I think you have to give them examples where prior information matters. And I think

Alex:

Mm-hmm.

Jim Albert:

from the viewpoint of combining information, multi-level models are very, very attractive. And I think those are naturally done from a Bayesian perspective because they essentially are Bayesian models. And I think also the students have to become computationally savvy in some sense. I mean, they have to be familiar with using simulation, with writing some code to do their work. We're not really using canned programs to get answers. Rather, we're using algorithms, right? And it's sort of helpful to understand how the algorithms work. So I like to introduce MCMC through Metropolis and Gibbs because those are relatively easy to understand, you know, and also easy to program. Stan is a bit more complicated, although it has some similarities with Metropolis algorithms. I think it's a bit more sophisticated. And I think I'm a little resistant to using Stan at first. I think you need to have some experience with other algorithms first, because Stan can become too much of a black box if you aren't familiar with some of the algorithms

Alex:

Mm-hmm.

Jim Albert:

involved.
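To show how little code a first MCMC algorithm needs, here is a minimal random-walk Metropolis sketch. The target is an assumption made for the example: the posterior of a proportion after 12 successes in 20 trials with a flat prior, so the exact answer (a Beta(13, 9) distribution with mean 13/22) is available for comparison. The step size, number of draws and seed are likewise arbitrary choices.

```python
import math
import random


def log_posterior(p, successes=12, trials=20):
    """Log posterior of a proportion under a flat prior (up to a constant)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def metropolis(n_draws=20000, step=0.2, start=0.5, seed=42):
    """Random-walk Metropolis: propose a nearby value, accept it with
    probability min(1, posterior ratio), otherwise repeat the current value."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current)
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws.append(current)
    return draws, accepted / n_draws


draws, acceptance_rate = metropolis()
posterior_mean = sum(draws) / len(draws)
print(f"simulated mean {posterior_mean:.3f} vs exact {13 / 22:.3f}")
print(f"acceptance rate {acceptance_rate:.2f}")
```

Changing `step` and watching the acceptance rate and the trace of `draws` gives a feel for tuning, which helps when reading the diagnostics of more sophisticated samplers.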

Alex:

Yeah, I see. Yeah, you mean if you're not familiar with the basics of MCMC... Yeah, I can see what you mean. I mean, you could say the same for PyMC, for instance.

Jim Albert:

Right.

Alex:

Where, if you don't know anything about the algorithm that's running in the background... You will just ignore the divergences or the

Jim Albert:

Right.

Alex:

target accept warnings and just use the model as it is, even though you shouldn't.

Jim Albert:

Right, right, right, right. But yeah, just to explain what a divergence means, right, you need to understand the algorithm before that makes any sense, right? Yeah, yeah. So, yeah, I think that's it.

Alex:

Yeah, yeah, no, for sure. You need to give an intuition. And there is this amazing small web app that's basically a video demonstration of how MCMC works. I always forget the name of the person who did that, I'm very sorry, but that will be in the show notes. So if you folks want to look at that, I really encourage it, because you can slow down or increase the speed of the sampling, and you basically see the MCMC algorithm going through the posterior space. You will see how each sample is accepted or rejected, and you can try different MCMC algorithms. So, like, the classic Gibbs one, for instance, very basic, and then you can move up to the very computationally efficient HMC algorithms that Stan and PyMC are using today. And that way you can also see what a divergence is. Especially if you're asking for a posterior distribution that has a funnel shape or a multimodal shape, you will see that the algorithm, if it's not tuned well enough, can have divergences, and that's how it can give

Jim Albert:

Right,

Alex:

you

Jim Albert:

right,

Alex:

intuitions

Jim Albert:

right.

Alex:

about what that means and why the priors could also be at fault if you get divergences, or lots of other things. So yeah, I'll put that into the show notes.

Jim Albert:

Okay, that's good.

Alex:

And yeah, it's like maybe more of a very broad question. I'm very aware of that, but I'm really curious if you could change one thing in the way stats are taught right now at the level that you were teaching, what would it be?

Jim Albert:

Well, I always thought that the intro level, I mean, I really think that courses that try to be very, very conceptual

Alex:

Mm-hmm.

Jim Albert:

were more successful. Like for example, David Moore, a Purdue professor, wrote a series

Alex:

Mm-hmm.

Jim Albert:

of books. And one of the first books he wrote was called Concepts and Controversies. And rather

Alex:

Mm-hmm.

Jim Albert:

it wasn't a book about techniques. It was more on ideas and more about illustrations of statistics in society. And that to me is really the way an intro course should be, because otherwise, if all you're teaching is a few recipes, students will learn the recipes and five minutes later they'll forget everything. That's why I actually wrote a book called Teaching Statistics Using Baseball, and the reason I wrote that is because I said, what if we spent the whole course just talking about baseball? I would focus on players and talk about issues and questions. And the statistical material was always in the background, but I would use methods when they were appropriate. The students loved the course. And one student even said, this is the most useful course I've taken. Well, that's silly, because the course was not useful. But they probably meant they understood the context. And so maybe you'll forget the method, but you might remember comparing Babe Ruth with Mickey Mantle or that type of thing, or

Alex:

Mm-hmm.

Jim Albert:

talking about clutch performance and ability, you might remember that. So

Alex:

Mm-hmm.

Jim Albert:

I think that's what you want. You want to leave the student with these experiences that they will remember. And I think a lot of times our basic stat course is just glorified mathematics, where you're just learning techniques. And that's not at all what statistics is about. It's not about the techniques. It's about the way you think about data. And I think that's the challenge. And unfortunately, for a beginning graduate student, that's not an easy teaching assignment. It's much easier to teach algebra or to teach methods. It's much harder to bring in things from the news, for example, and talk about the application of statistics there. So, as I said, that's my wish, that the course would change that way, but to actually implement that for a large number of sections is a challenge.

Alex:

Yeah, I really agree with you in that sense. It's something I see and something we really try to teach a lot. Anytime we do teaching with PyMC Labs, in the corporate workshops, or in the Intuitive Bayes online courses that I do, we're really always trying to teach the principles before the methods. Because in a way it's the principles that are gonna save you and that you need to understand. Once you understand the core principles, then the methods are quite natural, because, as you were saying, if you are able to think in a generative manner about your data and then about the model that you wanna have, then the method is kind of an accessory that just comes in afterwards, to address the problem at hand in a principled way. But very often what we see, especially from people who have had a lot of statistics, but from the classic framework, is that toolbox mentality, basically, where it's like: okay, what's the problem? What's the method? This problem has this method. But then when the problem is really original, which is often the case, and there is no method, it's a problem. You have no tool in the toolbox

Jim Albert:

Right,

Alex:

and you

Jim Albert:

right,

Alex:

don't

Jim Albert:

right.

Alex:

know how to develop the tool.

Jim Albert:

Right, right. Yeah. Yeah, I think

Alex:

Yeah, it's

Jim Albert:

even

Alex:

like...

Jim Albert:

the applied sciences often have a single course where they talk about statistical methods and they're gonna focus on the tools they use. And unfortunately, that gives the student a very limited viewpoint of what statistics is all about. Right? And again, when

Alex:

Mm-hmm.

Jim Albert:

they face a new problem,

Alex:

True.

Jim Albert:

they have no idea what to do because they're just familiar with certain tools. Right.

Alex:

Yeah, yeah, yeah, exactly. And so do you have a few minutes more to talk about a model, or are

Jim Albert:

Oh sure,

Alex:

you short on

Jim Albert:

sure,

Alex:

time? Yeah?

Jim Albert:

sure.

Alex:

Cool. So I'm actually curious if you have a favorite sports analytics model, a Bayesian model, that you'd like to share with the listeners. Mainly, yeah, the problem at hand and how you came up with the modeling idea and solution.

Jim Albert:

Well, one general thing of interest in baseball, for example, there is a lot of interest on how players perform in different situations. Okay, so like, for example,

Alex:

Mm-hmm.

Jim Albert:

players generally perform better during home games versus away games. They tend to do better if they're facing a pitcher of the opposite arm, which means that you're right-handed and the pitcher is left-handed. There's also a lot of interest in how the player does when it's an important situation during the game. That's called clutch play. So there's a lot of interest in that kind of situational stats. And so that discussion leads to basically a collection of models that can describe that. One model would say that the situation has no effect. Maybe you see some variation, but really, like for example home versus away, generally there's no difference in home games versus away games. Another model would say that, yeah, there's an effect,

Alex:

Mm-hmm.

Jim Albert:

but it's the same across all players. Okay. And the most interesting model would say there's an effect, like a home effect, but some players take advantage of being at home more than other players. Now trying to understand why that's the case is interesting. So those are all Bayesian models, and what you find is that, you know, although people like to talk about clutch ability, what you see is clutch performances. Players do have those, but there's no special ability to do better in clutch situations. It's sort of a controversial conclusion, because people like to think that players are clutch or that players will do well in stressful situations. But

Alex:

Mm-hmm.

Jim Albert:

really through modeling you learn that the variation you see in data is just like what you would predict with a model that would assume like a constant effect due to the situation. So and that's always

Alex:

Interesting.

Jim Albert:

I mean, those types of examples are always worthwhile to do because people believe that those situational effects are real. You know,

Alex:

Yeah, like

Jim Albert:

often,

Alex:

the hot

Jim Albert:

yeah,

Alex:

hand in basketball,

Jim Albert:

right. I mean,

Alex:

for instance.
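Jim's point that the spread seen in situational splits is what a constant-effect model would predict can be illustrated by simulation. This is a deliberately simplified sketch, not his model: every simulated player is given the same hitting probability and the same small clutch effect (all numbers below are assumptions), so any player who appears "clutch" in the output is a product of chance alone.

```python
import math
import random
import statistics


def simulate_split_differences(n_players=300, n_at_bats=150,
                               base=0.260, effect=0.010, seed=1):
    """Observed (clutch minus other) batting-average gaps when every player
    shares the same true effect, so no one has special clutch ability."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_players):
        clutch_hits = sum(rng.random() < base + effect for _ in range(n_at_bats))
        other_hits = sum(rng.random() < base for _ in range(n_at_bats))
        gaps.append((clutch_hits - other_hits) / n_at_bats)
    return gaps


gaps = simulate_split_differences()
observed_sd = statistics.stdev(gaps)
# Spread expected from binomial sampling noise alone.
p1, p0, n = 0.270, 0.260, 150
noise_sd = math.sqrt(p1 * (1 - p1) / n + p0 * (1 - p0) / n)

print(f"average gap: {statistics.mean(gaps):.3f}")
print(f"spread of gaps: {observed_sd:.3f} (sampling noise alone: {noise_sd:.3f})")
print(f"largest 'clutch' gap: {max(gaps):.3f}")
```

A model with player-specific effects is only needed if the real spread of gaps is clearly larger than this noise-only benchmark.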

Jim Albert:

certain players, like one person in baseball, his name is David Ortiz, he's thought to be a clutch player. Reggie Jackson was called Mr. October because he performed well at the end of the season. You know, I agree that he did do some nice things in October, but there was no general tendency of them doing better during the more important situations. So, I mean, those models I've done for a while, they're worth revisiting because the conclusions are sort of at odds with what the baseball people believe. And so I think you have to learn more about what the really important variables influencing performance are. In fact, recently I worked on a problem. An interesting story in baseball is home run hitting. And

Alex:

Mm-hmm.

Jim Albert:

in 2019, there was a record number of home runs hit. The question was, why? Okay, were the players getting stronger? Were they hitting the balls harder at the right launch angles, or was it an issue of the baseball construction? And it turns out that the way the baseball is made has a tremendous effect on the number of home runs. And one thing that you can measure is called the drag coefficient, which describes how the ball moves through the air. And when there is less

Alex:

Mm-hmm.

Jim Albert:

drag, the ball will carry further. So it's been demonstrated that the baseball has changed in recent seasons. Although it's made in the same plant in Costa Rica, somehow the characteristics of the baseball have changed. And that's had a tremendous effect on homerun hitting. Another thing that's been changing is that pitchers will put a, will make the ball sticky by

Alex:

Mm-hmm.

Jim Albert:

either rosin or saliva or some substance. And if they can make the ball sticky, they can throw the ball with more spin. And when they throw it with more spin, that leads to more favorable outcomes. Okay. And so again, for the baseball world this is a problem, because it's like cheating. You're trying to do something to the baseball which will make it advantageous for the pitcher. And so maybe the solution is to use a ball that's already sticky. Then they won't be able to gain that advantage. Everyone will have the same thing. And that's what they're using now in Japan, and they might be using that in baseball. So it's sort of fascinating that the characteristics of the baseball can have a tremendous impact on the performance of players.

Alex:

Hmm.

Jim Albert:

But then

Alex:

Yeah.

Jim Albert:

you use models, Bayesian models, to try to understand those effects.

Alex:

Yeah, yeah, that's super interesting. And what would the model look like very briefly at a high level for

Jim Albert:

Okay, so

Alex:

that

Jim Albert:

basically,

Alex:

kind of problem?

Jim Albert:

okay, so what you do, okay, so a ball is hit in the air, a ball is hit, and there's

Alex:

Mm-hmm.

Jim Albert:

a probability of a home run. And it's a function of the launch angle and the exit velocity, but it

Alex:

Mm-hmm.

Jim Albert:

varies in a smooth way. So a general way of doing that is by a generalized additive model. So you're saying the probability of the home run, on the logit scale, is a smooth function of the launch angle and exit velocity. Okay.

Alex:

Mm-hmm.

Jim Albert:

And so based on that model, that sort of describes, you know, how the baseball works in a certain year. And then you can predict from that model and look at the current year and see what the predictions are for home runs for the current year. And then you get a sense: if the home runs are lower than you would predict, that means the baseball is not the same as the one they used that previous year.

Alex:

Mm-hmm.

Jim Albert:

So you're really using a Bayesian prediction exercise to get a handle on how the baseball's characteristics change from year to year. And again, every year, the official baseball organization doesn't really tell you what's going on. So you have to learn what's happening through the collected data. Yeah, which is interesting.

Alex:

Yeah, and that makes the space even more interesting then, because

Jim Albert:

Yeah, this

Alex:

data

Jim Albert:

space, this

Alex:

is...

Jim Albert:

has been an interesting year because baseball is sort of a long game. And so they've made changes to the game to make it go faster. So now

Alex:

Mm-hmm.

Jim Albert:

pitchers

Alex:

Mm-hmm.

Jim Albert:

have to, there's a pitch clock, pitchers have to throw the ball within a certain time. And the bases are bigger, so it's easier to steal bases. And

Alex:

Yeah,

Jim Albert:

the

Alex:

okay.

Jim Albert:

infielders are not allowed to be in the same locations. Before, they would have all the infielders on one side of second base. That's not allowed now. And all these changes are impacting the game. But we don't understand exactly how until we look at the data. Yeah, so it's an interesting season.

Alex:

Yeah, yeah, for sure. Yeah, super interesting. Thanks a lot for walking us through that, Jim. Before calling it a show and asking you the last two questions, I'd like to zoom out again and ask you more generally: what does the future of sports analytics look like to you, and what would you like to see and not see?

Jim Albert:

Well, I think it's going to continue to grow. I mean, I think it's probably going to grow more in other sports. In baseball right now, I think they have pretty large groups working for the teams, but that hasn't happened for other sports. So I think what's happening in baseball is going to happen in other sports. And what's interesting is they're looking at a lot of the same issues. Like, for example, looking at the importance of situations, looking at the impact of the ball, understanding things like that. So ideas from baseball are going to be helpful for understanding other sports. I'm going to be going to a seminar, it's called Saber Seminar, in August. And

Alex:

Mm-hmm.

Jim Albert:

this is an opportunity for students to make presentations about baseball projects. And most of the Major League Baseball teams are there at this meeting. So it's a wonderful opportunity for these students to showcase their talents

Alex:

Yeah, it's cool.

Jim Albert:

and get interviews for things like internships and jobs. And so I think that's the model, and I think that's going to happen in other sports. And again, I think soccer is going to become the next big frontier, and it will continue to grow in terms of analytics.

Alex:

Yeah, I mean, I agree. Soccer is the sport I know best, and I can see so many opportunities here, and I hope we'll be able to help them, because I love sport and I know Bayesian stats can bring a lot of value there. We'll see if Western European soccer clubs finally start seriously getting into that. I think they will, because the Premier League has already started doing that, and that will give them a structural competitive advantage at some point, and so

Jim Albert:

And the technology

Alex:

I can't have buy.

Jim Albert:

is becoming more prevalent too. I think the technology you see

Alex:

Yeah.

Jim Albert:

at the higher-level leagues will eventually become available to lower-level leagues. Yeah.

Alex:

Yeah, yeah, not official. Okay, well, that was awesome. I've already taken a lot of your time, so let's call it a show. But, of course, before letting you go, I'm gonna ask you the last two questions I ask every guest at the end of the show. So, if you had unlimited time and resources, which problem would you try to solve?

Jim Albert:

Well, I've always been fascinated by learning about team performance, or learning what is special about how the players work together. And that's sort of a challenging problem because I don't think there's a clear methodology for doing that. I think we need to look a little more,

Alex:

Mm-hmm.

Jim Albert:

you know, and I think baseball is probably a little more of an individual sport. I think sports like basketball and American football are

Alex:

Mm-hmm.

Jim Albert:

more team-oriented because the players have to work together, and trying to understand that synergy between players is, to me, a fascinating topic. I think we're going to learn more about that, but I think we have to get beyond our basic models to do that. So I think it's challenging to develop the methodology, but that's the kind of thing I've been thinking about.

Alex:

Yeah, yeah. I'm not surprised. You seem very passionate. So I'm not surprised that the answer is still sports analytics oriented.

Jim Albert:

Oh, right, right. Well, I think that's the one thing that's nice about sports: in baseball, I understand the sport really well, and so it gives you a certain intuition. So I think when you get statistical conclusions, they have to somehow agree with what you think. And so sometimes you can discard what you get because it doesn't make sense.

Alex:

Mm-hmm.

Jim Albert:

You don't have a good explanation for it. Yeah.

Alex:

Yeah. And so second question, if you could have dinner with any great scientific mind, dead, alive or fictional, who would it be?

Jim Albert:

Well, one person who was very remarkable as a statistician, because he was very prominent in methodology but also very interested in education and in sports, was Fred Mosteller, because he

Alex:

Mm-hmm.

Jim Albert:

was a remarkable statistician. He had such a wide range of interests. And I think that's sort of a great model, you know, for someone working in statistics, because I think John Tukey said that you can work in other people's playgrounds, and I think Mosteller was wonderful in that way. And he had some very influential sports papers. He looked at the World Series, for example, and

Alex:

Mm-hmm.

Jim Albert:

I think he looked at golf, for example. I think that he'd be a fascinating person to have dinner with.

Alex:

Hmm, yeah, well, it sounds like an interesting dinner for sure. Awesome. Well, thanks a lot, Jim. As usual, I'll put resources and a link to your website in the show notes for those who want to dig deeper, and there will be a lot of things in the show notes because Jim has written an incredible amount of content, so feel free to check this out, folks. And yeah, thanks a lot for taking the time, Jim. I learned a lot, and I'm always really fascinated by sports analytics. So that was really a treat, and a pleasure to have you on the show.

Jim Albert:

Great, thanks for having me.

Alex:

Thank you, mate. Thank you.
