Name: Bayesian Football Analytics in Europe, with Max Göbel
Uploaded: 2023-09-20T11:00:42Z
Description: Explore Bayesian football analytics for European soccer: modeling goal-scoring rates, multilevel player skill models, and why analytics lags in Europe

Listen on your favorite platform:

Apple Podcasts

Spotify

Youtube

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

https://www.pymc-labs.io/

My Intuitive Bayes Online Courses: https://www.intuitivebayes.com/

1:1 Mentorship with me: https://topmate.io/alex_andorra

As you may know, I’m kind of a nerd. And I also love football: I've been a PSG fan since I was 5 years old, so I’ve lived it all with this club. And yet, I’ve never done a European-centered football analytics episode because, well, the US is much more advanced when it comes to sports analytics.

But today, I’m happy to say this day has come: a sports analytics episode where we can actually talk about European football. And that is thanks to Maximilian Göbel.

Max is a post-doctoral researcher in Economics and Finance at Bocconi University in Milan. Before that, he did his PhD in Economics at the Lisbon School of Economics and Management.

Max is a very passionate football fan and played himself for almost 25 years in his local football club. Unfortunately, he had to give it up when starting his PhD — don’t worry, he still goes to the gym, or goes running and sometimes cycling.

Max is also a great cook, inspired by all kinds of Italian food, and an avid podcast listener — from financial news, to health and fitness content, and even a mysterious and entertaining Bayesian podcast…

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau and Luis Fonseca.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Links from the show:

Max’s website: https://www.maximiliangoebel.com/home

Max on GitHub: https://github.com/maxi-tb22

Max on LinkedIn: https://www.linkedin.com/in/maximilian-g%C3%B6bel-188b0413a/

Max’s Soccer Analytics page: https://www.maximiliangoebel.com/soccer-analytics

Soccer Factor Model on GitHub: https://github.com/maxi-tb22/SFM

Max’s webinar on his Soccer Factor Model: https://www.youtube.com/watch?v=2dGrN8JGd_w

Timestamps

00:00:00 Episode starts

00:00:45 Alex introduces Max

00:10:41 What tools do you use to work ...

00:17:43 Can you define econometrics for us and tell us what it brings to economics?

00:28:38 Let's talk about the Soccer Factor Model

00:41:36 How did you adapt that factor model for soccer?

00:56:02 Common misconceptions against using statistical models to evaluate performance ..

01:03:15 Getting it per player ...

01:05:14 Subjective judgement

01:08:06 What is your main pain point right now ...?

01:12:07 If you had unlimited time and resources, which problem would you try?

01:13:39 If you could have dinner with any great scientific mind, dead or alive or fictional who would it be?

Max's paper using Bayesian inference:

VARCTIC - A Bayesian Vector Autoregression for the Arctic: “Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis”: https://journals.ametsoc.org/view/journals/clim/34/13/JCLI-D-20-0324.1.xml

Forecasting Arctic Sea Ice:

Daily predictions of Arctic Sea Ice Extent: https://chairemacro.esg.uqam.ca/arctic-sea-ice-forecasting/?lang=en

Sea Ice Outlook (SIO) Forecasting competition: https://www.arcus.org/sipn/sea-ice-outlook

Some of Max’s coauthors:

Philippe Goulet Coulombe (UQAM): https://philippegouletcoulombe.com/

Francis X. Diebold (UPenn): https://www.sas.upenn.edu/~fdiebold/

Transcript

Maximilian Göbel, welcome to Learning Bayesian Statistics.

Thanks Alex.

Oh, yeah.

Thank you for, for taking the time.

I'm really excited about this episode.

I'm really having a variety of podcast episodes these days.

Episode 89 is going to come out in a few days.

You'll see it's about sports also, but it's about the science of exercise and nutrition.

And so today we're going to talk a lot about sports also, but more about football or soccer as it's known in the US.

So that's going to be a fun one.

And I'm really happy to have you on the show because you are German.

So if I remember correctly, Germany is in Europe.

And so this will be the first Europe-centered soccer analytics episode, which is cool.

Yeah, it's one of the things I'm saying we should do more here in Europe.

But before that, as usual, we'll start with your origin story.

Max, how did you come to the world of econometrics and machine learning?

Because it's actually what you're doing most of the time, if I understood correctly.

Yeah, yeah, you're right, Alex.

Well, actually, it's been well, if I say it's quite a journey, it sounds dramatic.

But that's, that's not the case.

But it took me quite a while, let's say.

Yeah, that's maybe the better framing.

I started out in my PhD, basically, the first year is, you know, there's just some coursework.

But I went into the PhD without really having something that I really wanted to work on in particular.

So I took the first year to see which courses I like, which not.

And at my university, there was not really much to choose from.

I mean, we had macroeconomics, microeconomics, and econometrics, the usual stuff.

But yeah, really nothing resonated with me so much, I have to say.

And then I thought I would do some macro, macroeconomics.

I think many, many people, or most PhD students, really want to do something in that field.

So did I.

But yeah, I really never got familiar with that stuff so much.

I never really liked it.

But in the second year, then there was a course of computational economics.

And I liked that quite a lot.

And it was also, let's say a tough schedule.

I had to prepare a proposal within a week and I didn't have any idea about computational economics.

But that really got me into looking into that stuff very deeply, or deeper, let's say.

And so, yeah, basically what I was working on there was some clustering, some unsupervised learning basically, but it wasn't really fancy machine learning back then.

The project was related to clustering, to community structure in the S&P 500, basically.

And...

Yeah, but I really thought, oh, this network analysis, this community structure detection, that's really cool.

I want to work on that.

And yeah, so I thought this would be basically the outline for the rest of my PhD.

And how did I get into econometrics and machine learning then?

Because what I was doing back then wasn't really machine learning.

So how did I get there then?

It wasn't until the third year, basically, when I luckily got invited to the University of Pennsylvania as a visiting student.

I got invited by Francis Diebold, and I'll be forever grateful to him for inviting me there.

And he had a research group on econometrics.

And at that time, the topic was about climate.

And again, I thought, well, I don't care about the topic, actually.

I just want to learn whatever comes to me.

And so, yeah, I took that opportunity.

He introduced me to his research group.

And they were working on climate forecasting, climate econometrics.

And that's how I basically got really introduced to econometrics.

Because before I went to the University of Pennsylvania, I thought like, yeah, I basically know what's going on.

And I have this and this project.

And that's cool.

But when I really arrived there, I really got to know what a PhD in economics is really about.

And yeah, that was pretty insightful, I would say.

And that's how I got introduced, basically, through this research group, through projects that we were working on.

And then there was one guy, he was Frank's RA.

And yeah, he was working on machine learning, in particular.

And basically, a couple of weeks in, he came to me and asked me, well, Max, you want to get me that and that data?

And we can work on a project.

That started off a long, well, a couple of years now of co-authorship with him, with Philippe Goulet Coulombe, who is now a professor at UQAM, the University of Quebec at Montreal.

And he's working a lot on machine learning.

And he basically introduced me to that sphere.

And so in the end, it was the third year of my PhD that I got introduced into econometrics and machine learning.

And yeah, quite late, as I would say.

Yeah.

Better late than never maybe.

I mean, better late than never.

Right?

So it's cool.

And you seem to enjoy that.

So that's super fun.

And so today, what are we doing?

Basically, how would you define the work you're doing nowadays and the topics you are particularly interested in?

Yeah, well, that's a good question.

And every time I got asked that question, I always had a difficult time actually saying, because I was doing something here, something there.

In between, I also thought I would like to get back to macroeconomics, actually, but after spending a couple of months on something there that didn't really work out, I completely ditched it, at least for the meantime.

So what I'm working on now is basically machine learning and macroeconomic forecasting, let's say.

I have a project on recession forecasting in the United States, which is probably a hot topic currently.

Everyone is awaiting it, but it doesn't really seem to occur.

So you have to wait a couple of months more.

And then the other stuff is basically related to climate, a lot of climate forecasting, especially about Arctic sea ice: how Arctic sea ice is projected to evolve in the future, not only in the near future, but also in the, let's say, longer run.

So, when Arctic sea ice might potentially disappear. There are a couple of projects on that, still related to that climate econometrics group.

And then the other stuff is basically, yeah, machine learning.

And I got really interested in finance, asset pricing, what you can do in predicting stock returns, using machine learning tools there.

That's super fascinating.

And yeah, just I mean, I have to say that I'm not a specialist in machine learning or so.

I'm just super interested and fascinated by the tools and the problems that come with them.

So yeah, they are powerful, but applying them to finance and economics also comes with some drawbacks.

So yeah, you have to work around that.

And it makes it super interesting.

Yeah, yeah, yeah.

Yeah, for sure.

And I mean, it's probably by being really interested in a topic that you end up being a specialist in it.

So it's like you don't really start by being a specialist and then get interested in the subject.

The causality goes the other way around.

So that's good.

Trying a lot of things is how you end up finding what you're really passionate about.

Yeah, awesome.

And I'm curious actually, in the research realm of economics, which tools do you use, which machine learning tools, to work on these models?

I'm guessing a lot of open source packages, I'm hoping.

Because I remember I knew the econometrics and economics field in Europe a bit a few years ago, and they were using Stata all over the place.

So I'm curious if that changed and how that changed.

Oh yeah, that's a funny question.

Because Stata, yeah, I mean some people love Stata.

I'm actually at the complete other end of the distribution.

So I always try to avoid it as much as I can.

I don't know, I never really liked it.

So what I'm using is basically R and Python.

Okay.

I also worked a bit in MATLAB.

I like MATLAB actually a lot.

But yeah, now I'm mostly working in R and Python.

And it really depends.

Sometimes I prefer R.

Sometimes I prefer Python.

For machine learning, I'm mostly using Python.

Well, let's say for machine learning, I'm actually using R when it comes to random forests or gradient boosted trees or something like that, or just plain lasso or ridge.

When it comes to deep learning, then I'm using Python.

So TensorFlow, now I'm trying to switch to PyTorch, actually.

And yeah, so those are basically the packages that I'm using.

Yeah.

Yeah, interesting.

And how do you choose the tool, the particular tool you're using for a particular project?

Yeah, that's a good question.

I think that's mostly an art rather than a science, I would say.

And it's up to your preference.

But not all tools work in every context, right?

So in economics, that's really the problem, especially in, I would say, macroeconomic forecasting, where you have time series of, let's say, up to about 700 observations on a monthly basis for the United States, maybe.

And then you have a feature set of, let's say, 100 features. When you include lags and all that, you can pump it up maybe to 1,000 or something.

But for machine learning or for deep learning, this is still rather a small data set, I would say.

So that's ridiculous, actually.

But still, that's then the challenge, right?

To tune them, to train them so that they don't overfit.

And that's really the interesting part for me, I think.

And yeah.

In other contexts, other tools might work much more conveniently, let's say, or are much easier to apply.

So take lasso, for example: when you have a lot of features and you just don't know which features are important, I like lasso in that regard because it basically selects the features for you.

Or, in an asset pricing context, we have returns with a lot of noise in them; the signal-to-noise ratio is very, very low.

You really don't know which features are important.

So ridge is maybe the better option, because lasso would basically set almost everything to zero.
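The contrast between lasso and ridge described here can be sketched with a toy example. This is a minimal illustration, assuming an orthonormal design (X'X = I), where lasso (minimizing 0.5·||y − Xb||² + λ·||b||₁) reduces to soft-thresholding the least squares coefficients and ridge (minimizing 0.5·||y − Xb||² + 0.5·λ·||b||²) reduces to dividing them by 1 + λ. The coefficient values and the penalty are made up; nothing here comes from Max's own projects.

```python
# Toy comparison of lasso and ridge shrinkage under an orthonormal design.
# Assumption: X'X = I, so both penalized estimators have closed forms
# in terms of the ordinary least squares (OLS) coefficients.

def lasso_orthonormal(ols_coefs, lam):
    """Soft-threshold each OLS coefficient: small ones become exactly zero."""
    shrunk = []
    for b in ols_coefs:
        magnitude = max(abs(b) - lam, 0.0)
        shrunk.append(magnitude if b >= 0 else -magnitude)
    return shrunk

def ridge_orthonormal(ols_coefs, lam):
    """Scale every OLS coefficient toward zero; none becomes exactly zero."""
    return [b / (1.0 + lam) for b in ols_coefs]

# Hypothetical OLS coefficients: many weak signals, as with noisy returns.
ols = [0.30, -0.20, 0.15, -0.10, 0.05]
lam = 0.25  # hypothetical penalty strength

lasso = lasso_orthonormal(ols, lam)
ridge = ridge_orthonormal(ols, lam)

print("lasso:", lasso)
print("ridge:", ridge)
print("nonzero lasso coefficients:", sum(1 for b in lasso if b != 0.0))
print("nonzero ridge coefficients:", sum(1 for b in ridge if b != 0.0))
```

With weak signals like these, the lasso keeps only the largest coefficient, while ridge keeps all five in shrunken form, which is the behavior being described for low signal-to-noise data.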

Yeah, so it really depends.

You really have to make it dependent on the context that you're working in.

And yeah, but that's also interesting to see which models prefer or work well on which data sets and which contexts.

And yeah, I'm still learning in that regard.

And that's super interesting.

Yeah, yeah, yeah.

No, for sure.

And I find it super interesting also to see this ability of open source tools to be adopted more and more in research, which, of course, I'm extremely biased about, but I welcome.

But also mainly because I do think that open data and open source are a natural consequence, but also a cause, I would say, of more open science, which I definitely welcome and think should be way more the case. More and more you see papers with accompanying GitHub repositories and even accompanying open source packages in Python or in R, which is definitely something new.

And that's super cool that the research realm is catching up on that.

Because I remember a few years ago, with the first open science or open data papers, it was like, oh yeah, the data is available, by the way, at the end of the paper.

And then you had to basically beg the corresponding author about three times a week for four months to get some of the data, and that was not really open, basically. So yeah, that's a really cool development that I really love.

I have to say.

No, absolutely.

And this is also, I think that's a very good point.

For example, my co-authors and I are really pushing for that, to make the code available on the website as well, so that people can cross-check.

And that's very good.

And yeah, I like that also myself.

When I read papers and I want to replicate something and the authors are making the code available, basically, you can check if your own code is correct.

That's super helpful.

You learn a lot by that.

And yeah, really, really.

Especially when, for example, using boosted trees or so.

I mean, it's XGBoost, and it's super convenient to use.

And for sure, there's some tuning that you have to do yourself.

But still, the package is there, basically.

And it's super convenient to use.

You don't have to code the whole forest, basically, yourself.

So yeah, for sure.

That's amazing.

yeah.

No, clearly.

Yeah, that's super nice and well done and like picking up all those different tools and different languages.

That's super cool.

And I don't know how it has changed, but I do remember that a few years ago, doing open source development wasn't really incentivized for doctoral or post-doctoral candidates, so maybe that changed, and that's for the better.

But if it didn't, the fact that you're doing it is even more commendable, I would say, because that's a bit adjacent to your projects.

So yeah, well done on doing that and taking the time to do it.

That's very cool for sure.

Um, so now I'd like to talk a bit about, yeah.

So you said you're doing econometrics, but, um, can you define econometrics for us and, and tell us what it brings to economics basically.

Yeah, sure.

So there's a lot of weight on me now to give the textbook definition of econometrics.

No, I mean, it's basically, or now I'm butchering the whole definition probably.

But it's applying statistical tools to an economic context, trying to use statistical tools to basically verify some economic theory or to understand some relationships between economic variables.

So I think it's a, yeah, I think that that's basically it.

It's kind of a fancier term for what it actually is, applying statistical tools for understanding economic relationships.

That's basically it.

I mean, it's essential.

I mean, for sure there are economists who only work on theory, but for empirical work, for policy analysis, you need to analyze the data in the end.

And basically, that's what I'm doing.

I don't really do theory stuff, but for me, it's just all empirical.

And yeah, so definitely, it's very useful in the end, especially for policymaking at central banks and everywhere, and also for industry, be it the banking industry or just the real economy, for analyzing demand and all that.

And do you...

So I'm curious how you got introduced to Bayesian methods, actually, and why they stuck with you, because from what I remember, from the world of econometrics, Bayes was not used a lot in this field.

So I'm actually curious why you are using it.

Yeah.

Well, I have to admit, so I already said that it was in the third year that I got introduced to econometrics.

And that was this project when Philippe, Frank's RA, basically came to me and asked me to gather some data on climate variables because we wanted to run a vector autoregression of the Arctic.

What we basically did is we gathered data, time series on certain climate variables, which we thought would proxy for the Arctic ecosystem, basically.

And then we wanted to use a vector autoregression to analyze certain amplification mechanisms, if there is a shock to CO2, for example, and also to be able to produce long-run forecasting projections.

So, when Arctic sea ice might potentially disappear in the future.

Yeah.

And so the data is highly non-stationary.

And when you work with VARs, most economists really work with Bayesian methods there.

And as I said, the data was highly non-stationary.

So Bayesian statistics, or the Bayesian framework, gives you some leeway there, grants you some freedom there.

So that was big.

Yeah, that was why Philippe told me, okay, look at Bayesian VARs, look at the Bayesian way.

And that's how I actually got introduced to that.

And there was at the time, I really didn't have any exposure.

So there was a package in MATLAB for doing Bayesian inference, basically, with VARs.

And that was super helpful.

That helped me a lot.

That was a great source of education, really.

And the more I learned about it, the more it resonated with me, this concept of quantifying uncertainty.

I think, especially in economics, it is essential to really get an idea of what the uncertainty is.

A point estimate is always nice, but you want to have the uncertainty around it.

And that's also what Frank Diebold always told us.

Yeah, you want to have a measure of uncertainty.

And definitely, that's true.

And you get it in the Bayesian framework.

It's just so intuitive to think about it.

And yeah, I like that a lot.

And unfortunately, I don't really work so much or haven't worked in so many projects with Bayesian methods lately, or not as much as I would like to.

But yeah, it has resonated with me ever since.

And yeah.

Still, I wanted to learn more, and that's basically how I got into looking at PyMC, because I wanted to learn with Python, and I thought an application with Bayesian methods, the Bayesian framework, would be cool to learn. That's how I got into PyMC3, or PyMC, basically.

So, yeah.

Yeah, yeah, yeah.

Nice.

That's interesting.

So yeah, basically, it's the uncertainty quantification that was really important to you.

Yeah.

I mean, that does make sense, right?

Because, yeah, that's really one of the parts where Bayes does shine a lot.

And also, especially for the Arctic sea ice project that you are talking about, it's not like it's a reproducible experiment.

It's really hard in these cases to think from a frequentist framework of repeatable experiments.

You cannot have multiple Earths on which you can run RCTs where you melt the ice caps or not, and where they melt naturally or due to human intervention.

It just doesn't work in that case.

So yeah, I'm not surprised that it would be a project where Bayes fits way more naturally.

Yeah, no, that's for sure.

I mean, for example, these climate models from these climate institutions, these are huge models.

And big models, to train them or to run these models, it takes a lot of time.

And they are very sophisticated.

So really, really sophisticated.

But they are basically deterministic models.

And they give you a point estimate in the end.

But our interest was basically really to see, well, we get a point estimate, but we also want to see the uncertainty around it, especially when you project the path of Arctic sea ice.

How likely is it that we see Arctic sea ice disappearing not at our point estimate in the 2060s or 70s, but beforehand?

How large is the uncertainty?

Maybe our model is really not good, and the uncertainty is so much all over the place that it's more or less useless.

But yeah, in that project it was actually interesting to see that the credible region was basically spanning 20 to 25 years around the point estimate.

So that was very interesting.

And it gave us a quantification of the uncertainty.

Yeah, that was really, really interesting.
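The idea of putting a credible interval around a projected ice-free date can be sketched as follows. This is a deliberately simplified toy, not the VARCTIC model: it assumes a linear decline whose yearly rate has a normal posterior, and every number (starting year, extent, posterior mean and standard deviation) is hypothetical.

```python
import random

def ice_free_year_draws(start_year, extent, slope_mean, slope_sd, n_draws, seed):
    """Draw yearly decline rates from an assumed normal posterior and
    convert each draw into the year when the extent would reach zero."""
    rng = random.Random(seed)
    years = []
    for _ in range(n_draws):
        decline = rng.gauss(slope_mean, slope_sd)
        if decline <= 0:
            continue  # extent never reaches zero under this draw
        years.append(start_year + extent / decline)
    return sorted(years)

def quantile(sorted_values, q):
    """Simple empirical quantile on an already sorted list."""
    index = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]

# Hypothetical inputs: extent in million km^2, decline in million km^2 per year.
draws = ice_free_year_draws(2020, 4.5, 0.08, 0.01, 10_000, seed=1)
lower, median, upper = (quantile(draws, q) for q in (0.05, 0.5, 0.95))

print(f"median ice-free year: {median:.0f}")
print(f"90% credible interval: {lower:.0f} to {upper:.0f}")
```

Even a modest uncertainty in the decline rate spreads the ice-free date over a couple of decades, which is why reporting the interval matters more than the single point estimate.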

Yeah, yeah, yeah.

Nice.

Yeah, I love that.

And that's really interesting for me, to talk with someone who recently got into the Bayesian framework and to understand how and why you got into it. I would have a lot of other questions on that, but I want to talk about football, or soccer, so let's switch to that, and then if we have time at the end of the episode, I'll come back with my nerdy educational questions.

So yeah, basically you have an area, a hobby of yours, where you do apply and actually need Bayesian stats, and that's soccer analytics.

First, I read your website a bit and I saw you have been a passionate football fan since you were a child, and you mentioned a bunch of European championships.

Not the French one though.

I was absolutely outraged.

What happened?

Like, don't you get the French games in Germany?

Oh yeah, well that's another issue.

So when I was younger, really, it was only the Bundesliga, and sometimes, when you were lucky, you got the highlights of the French Premier League and the Serie A, but you had to be really lucky. It was not always available, and I didn't know the websites where you could watch it, basically.

So that was another issue.

But yeah, the French, well, the French league, I was never really a fan of.

I'm sorry, Alex.

But yeah, that's just even though one of my favorite players was Joao Gopic.

So Olympic Rio.

Yeah, yeah, yeah.

So yeah.

Yeah.

Um, yeah, no offense taken.

I think the French league is pretty boring.

And, yeah, as long as PSG is dominating like that, I mean, that's good for me, because I've been a PSG fan since I was like five years old, but yeah, it's not a very interesting league.

And the level is kind of going down by the years.

So hopefully we'll get some investors in other clubs, which would make for good competition for Paris, but until now it's really bad.

And it's actually bad for Paris because the competition inside the country is really bad.

So then when they get on the European stage, they are not really used to the intensity and to having so much adversity, in a way.

So, yeah, it's too easy for them, let's say.

So basically, but I didn't get you on the show to trash the French league.

I want to talk about the Soccer Factor Model that you recently worked on.

And I found it super interesting because that's the main question I always have in soccer analytics.

The nerd in me is always very careful about the hot takes that you see commentators have about players, where it's like, yeah, but how do you separate a player's skills and abilities from his team's strength?

And that, to me, is extremely important because right now in Europe, most of the clubs mainly invest in players on gut feeling, basically.

And the thing is, when you do that and you're not able to separate inherent player abilities from team strength, then you get kind of an aura effect from the beginning of your career that can follow you even though you're not that good of a player, even though you are not making that much of a difference.

And it's hard to contradict because you don't really have a scientific method for disproving what's going on, for showing that it's not really your inherent abilities but mainly the people you're surrounded with.

And I think it's absolutely important to do that, and it should lead to a really revolutionized way of transferring players, signing them and so on.

So, that was basically the background for people who are not interested in football.

Even if the field doesn't interest you, I think the method and the goal of the model are actually extremely important, because you can also think about that in finance, for instance. I know a lot more work has been done in finance on that, because the monetary incentives are much stronger: you know if you make money or not.

But I know there is a lot of literature, right, on basically passive investment versus active investment.

And how do you actually prove that an active investment is better than a passive one, and that it's actually due to the skills of the person who invested in the market instead of just random market fluctuation?

So you can see that in a lot of contexts.

Basically, information is sparse, is hard to decipher, and so you need a model to make sense of it.

So you can see that, I would say, in football, in a lot of sports, in finance, in medicine also, right, where you can have a lot of this celebrity effect, basically.

I think in a lot of contexts where the celebrity effect is important, it can be broken down by that scientific way of estimating it.

So politics, of course, movies.

I think it's basically a theme that's running in a lot of fields where the celebrity effect is extremely big.

So yeah, that was a very long introduction.

But to say that, I think it's very useful.

So you can react to what I said, and also afterwards, can you tell us what a factor model is?

You called it the Soccer Factor Model, but can you tell us before that what a factor model is?

Yeah.

No, Alex, I mean, you laid it out perfectly.

I couldn't have said it any more accurately, I would say, really on point, as far as I can see.

So a factor model, what it actually is: I would define a factor as some proxy for exposure to a certain, in finance, to a certain risk, basically.

Also, when you look at economics or macroeconomics, it's often related to the context where you have a huge set of features and you reduce it to a couple of underlying factors, or a single factor only.

It's a kind of feature reduction, a dimensionality reduction technique like PCA, principal component analysis.

But in finance, it's really a proxy for a certain systematic risk exposure that the cross-section of stock returns, all stock returns, are exposed to.

This is basically a factor.

And the asset pricing literature has identified several of these common risk exposures across the whole universe of stocks, basically.

But as you already said, you can also use it for quantifying the ability, for example, of a portfolio manager: whether he has some skill in the game, whether he really has superior selection potential, rather than just following along these common risk exposures.

And that's also what this Soccer Factor Model is inspired by: to identify certain features that all players are exposed to because of the differences in the teams.

And then, when you account for these systematic differences across teams that influence the observed performance of a player, you can basically extract the skill, the inherent ability, of each player.
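The logic of separating skill from team exposure can be sketched with a generic single-factor regression, the same shape used to estimate a portfolio manager's "alpha" in finance. This is only an illustration under stated assumptions, not the specification of the Soccer Factor Model: the team-strength factor, the two players and all numbers are hypothetical, and the data are generated without noise so the recovered values are easy to check.

```python
# Generic single-factor regression, analogous to estimating "alpha" in finance.
# This is NOT the Soccer Factor Model specification; the data are made up.

def ols_alpha_beta(factor, outcome):
    """Fit outcome = alpha + beta * factor by ordinary least squares."""
    n = len(factor)
    mean_f = sum(factor) / n
    mean_y = sum(outcome) / n
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(factor, outcome))
    var = sum((f - mean_f) ** 2 for f in factor)
    beta = cov / var
    alpha = mean_y - beta * mean_f
    return alpha, beta

# Hypothetical team-strength factor per game (e.g. chances the team creates).
strong_team = [2.0, 2.5, 3.0, 2.5, 2.0]
weak_team = [0.5, 1.0, 1.5, 1.0, 0.5]

# Noise-free synthetic output per game: skill (alpha) plus team exposure (beta).
player_a = [0.1 + 0.3 * f for f in strong_team]  # low skill, strong team
player_b = [0.4 + 0.3 * f for f in weak_team]    # high skill, weak team

raw_a = sum(player_a) / len(player_a)
raw_b = sum(player_b) / len(player_b)
alpha_a, beta_a = ols_alpha_beta(strong_team, player_a)
alpha_b, beta_b = ols_alpha_beta(weak_team, player_b)

print(f"raw average: A={raw_a:.2f}, B={raw_b:.2f}")
print(f"estimated skill (alpha): A={alpha_a:.2f}, B={alpha_b:.2f}")
```

Player A looks better on raw output because of the stronger team, but once the team factor is accounted for, player B has the higher intercept, which is the "inherent ability" part in this toy setup.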

Yeah, yeah, for sure.

Yeah, for sure.

Because in the example of football, you'd say it's easier to be the number nine.

So, how do you say that position in English, like, playing up front?

The number nine is the guy who's supposed to score the goals.

The native English speakers can tell me what the name is; in French, that would be attaquant.

It's easier to be the number nine of PSG than the number nine of a very small team in France, because the whole, the rest of the team is stronger.

The manager is supposed to be stronger and so on.

So, yeah, you're like, yeah, but maybe if you took the number nine of the small team and you put him in Paris, maybe he would perform as well as the current number nine does.

So how do you tell the difference?

So that's what we're going to talk about.

Before that, I'm curious, from a structural standpoint, these kinds of factor models, how do they work?

How much time do you need to really start to decipher the difference between inherent skill and exogenous factors, basically team strength?

And that question is basically: how much data do you need from past years to start having an idea? How data-hungry are those models?

Yeah, so that's definitely a good question, a good point.

So in the model that I'm proposing, I basically need a lead time into the season to really account for certain differences.

So a couple of games would already need to be played to really account for differences in teams.

Because before the first game, basically, everything, or based on the data that I had, everyone would have been the same.

But it depends really on the data.

If you have data that allows you to account for differences across teams, like budget.

Mm-hmm.

Yeah.

And for overall data, I would say like more data is always better.

If you have only a few observations, I think the Bayesian framework is then tailor made for that as well.

Like it's yeah, it grants you some leeway there.

But I would say really, it's the more data you have, the better.

But yeah.

But you could already, OK, so you could already start having that idea with just a few games.

Then you get the idea of the strength of the team.

And then you can start deciphering the strengths of the player.

OK.

Yeah.

But so far I have always used a certain number of, let's say, burn-in games to really account for that.

Yeah.

And I mean, it's not that superficial, right?

Because you can think like right now it's August, it's the beginning of the leagues for the European teams.

August is a weird moment where the teams are still warming up basically.

Um, and they are not really, they are clearly not at peak performance.

Usually they try to peak around spring for the Northern hemisphere.

So around March; from February to May, basically, they are trying to reach their peak.

So they are still warming up.

They can still trade players until the end of August.

So you could really say that the games they are playing in August, even though they are official games, are still warm-up games and don't really mean a lot from a long-term performance perspective.

So that's an interesting moment to start warming up the model, I'd say.

And something I have in mind, maybe for future iterations of the model, is what you could put in the priors.

We're going to talk about the structure of the model right after that, but something I'm thinking about is that you could put in the prior the information that you have about the strength of the team. You have the budget, which is a good proxy for potential future performance.

But also just past performance.

If you know that Paris has been the champion for nine years out of 10, well, you have a really good prior about the strength of the team.

So you can probably also add that into the model and in that way reduce the warming-up period of the model.
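To make the point about priors concrete, here is a deliberately simplified sketch. It is not the logistic model from the notebook: it treats a team's strength as a win probability with a conjugate Beta prior, and the prior values (`Beta(1, 1)` as "no information", `Beta(24, 8)` as "strong past seasons") are hypothetical numbers chosen for illustration.

```python
def beta_posterior(prior_a, prior_b, wins, games):
    """Conjugate Beta-Binomial update; returns the posterior mean and sd."""
    a = prior_a + wins
    b = prior_b + (games - wins)
    mean = a / (a + b)
    sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd


# Only three games played so far this season (hypothetical record).
wins, games = 2, 3

# Flat prior: before the first game, every team looks the same.
flat_mean, flat_sd = beta_posterior(1, 1, wins, games)

# Informative prior built from past seasons (hypothetical values).
informed_mean, informed_sd = beta_posterior(24, 8, wins, games)
```

With the same three games, the informed posterior is much narrower than the flat one, which is the sense in which past performance or budget could shorten the warm-up period.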

Yeah, no, absolutely.

Or how Paris has performed against Lyon, let's say, in the past.

So the direct comparison between those teams, basically, when they faced each other in past years.

That would also feed in there.

Yeah, so absolutely.

There's a lot of potential.

And my model is, when you're basically suggesting this stuff, my model just appears very rudimentary.

But it could definitely be extended in that regard.

Yeah, I mean, that's the fun thing about modeling, right?

It's like you have to start somewhere that's good enough, and then you have a lot of ideas to extend it.

And it's a never-ending endeavor.

Like, with each model, if you want to work on it your whole life, if you're interested enough, you definitely can do that.

I know the models that I often revisit are the ones for predicting French presidential elections.

The one I had when I started doing that in 2017, compared to the one I had for 2022, it's just embarrassing.

But in a way, it's good that the work you're doing right now is the best you've ever done.

And in a few years, when you look at the work you're doing right now, it should be the worst you've ever done, because that means you've progressed a lot in the meantime.

So I think it's a good mindset.

So how did you adapt that factor model for soccer?

Like how, what does the model structure look like basically for listeners to have an idea?

And for those watching on YouTube, you can actually share your screen.

So if you want to share anything at some point, feel free to do it.

Otherwise, the audio format is here for you because it's a podcast.

So it's an audio first content.

Perfect.

Yeah.

So yeah, maybe if I get it on the screen, I'll do that.

But for now, maybe the structure, I think, is pretty simple.

And as you laid it out already very, very accurately, it's basically trying to come up with some features, do some feature engineering that basically accounts for differences

across teams.

And, well, when you look at a certain player, let's say Cristiano Ronaldo, you really want to account for the difference between his current team and the team that he's facing at that exact instant.

And you want to create some features that can proxy for these differences across teams.

And that's basically the heart of the model.

And this is basically inspired by these asset pricing factors that try to account for differences across assets, across stocks, across firms, basically.

And the modeling part itself is really nothing sophisticated.

You can include a kind of hierarchical structure; you don't need to, but it can definitely help.

But it's really the feature engineering that is at the heart of it.

And then PyMC comes in very conveniently and basically does the dirty work for you.

Mm-hmm.

And so what's the, so then that's cool.

If it's a simple structure, yeah, can you talk about what was your likelihood and then what kind of distribution you put on the parameters and things like that?

I think it would be a fun thing to talk about for the listeners.

Sure, sure.

Then maybe I just get the workbook loaded.

So maybe I can share my screen and couple of...

you should be able to.

Let me see.

So in terms of a likelihood, basically, or what the model structure is, so I have to proxy, I need some observed measurement of a player's performance.

Not a skill, I mean, that is something that is underlying, that is latent, that we want to identify.

But we need some observed measure of player performance.

What I used is scoring goals.

Did players score a goal in a certain game or not?

So basically, 0, 1, basically binomial distributed, and basically, the logistic regression it is.

You want to identify the probability of a player's scoring.

And so now I have it.

I guess I have it here.

you may have to authorize Google Chrome to share.

Oh yeah.

unfortunately takes a bit of time.

Um,

Sorry, I guess I'll be here in a second.

all good.

Yep.

It's all good.

You can do that and come back.

I don't know what's going to happen for the recording, but I already did that.

After all, it's no problem.

Sorry, I didn't.

I mean, it's the first time I do it.

So I didn't know it either.

Ah, okay, here it is.

Wait.

Is it Joe?

Ah.

No.

think you need to give permission.

And open your computer system settings and click privacy and security.

Well, maybe.

Apparently, if you open your system settings, and then you go to privacy and security, and you click screen recording, and allow your browser to share your screen.

I think you need to allow Google Chrome to share your screen.

mm-hmm yeah I was there but ah yeah okay now maybe

it's no chip.

Okay.

Sorry for that.

So let's see.

No?

That's what I wanna do with that guess.

Sorry.

because you have to get out to quit Google Chrome and then come back.

Are you on Mac?

Yeah, yeah, exactly.

so you probably need to close Google Chrome and then come back.

But you can do that.

And then you come back to the same link I sent you.

And then it should work.

Maybe I'll have to do another recording, but that's OK.

I can edit that afterwards.

It's easy.

So I'll wait for you here.

Yeah.

Okay.

I'm back Alex.

Sorry.

Sorry Alex, I cannot hear you currently.

Yes, that's normal.

I was muted.

So cool.

I didn't even have to start a new recording.

You can just join the room again.

Cool.

First time it happened, so I didn't know what would happen.

So cool.

Perfect.

So does it work now?

Let's try.

No, no, no.

I'll give it a last try and otherwise I just.

Yeah, otherwise it's okay, but...

Yeah, Google Chrome, it's there.

It should work.

I allowed it.

So I don't know, Google Chrome, it's fine.

It can access.

but

I'm checking that it could be on my end maybe, so...

screen the window

also it's all good, so...

And I'm sorry, no, unfortunately it doesn't work.

Anyways, that's OK.

So, well, then let's continue without the screen sharing.

You can just talk through it.

It's no problem.

I've done it.

We've done it for a lot of podcast episodes.

OK.

Yeah, so the structure basically is relatively simple.

You need some idea of what the performance of the player is.

And you have to have a proxy for that.

And well, you need this performance to be observed, obviously.

And the proxy that I choose for a player's performance is whether he scores a goal or not, so 0 or 1 in a certain game.

Our y, our target, is Bernoulli distributed.

And it's basically a logistic regression that we are running.

Because what we want to identify is really the skill and the ability, latent variable hidden in our observed performance measure, basically.

And so the model is pretty simple.

You need the prior.

You have basically a bunch of coefficients.

That is, you have the alpha, the skill, the ability that you're interested in.

And then you have the loadings, the coefficients on all the factors that are in your model.

So you basically have to impose priors for all the coefficients.

And then you have to define the likelihood, which is Bernoulli distributed.

And yeah, that's basically the model.

It's on the workbook.

And people can go through it.

There's also a redacted version, basically, where people, if they fancy it, can try to work with their own priors and all that, try to do it themselves first, and then check the unredacted version.

So, if they want to play with that a bit.

Yeah, that's basically it.

So it's nothing really crazy.

It's the four lines of code, the basic model, basically.

And yeah, when you look at multiple players, so you can do that for a single player only, but you can also do that for sure for multiple players.

The key reason is that, basically, each player should be exposed to these factors with the same loading.

So you can impose a hierarchical structure on the ability and skill of each player.

You should definitely do that, but you can impose the hierarchical structure by player or also by season.

So the ability of the player may evolve over seasons or across seasons basically.

That's, I think, something worth looking into or worthwhile doing.

And then basically you have the loadings on the factors, and they should account for the team effort.

You want to account for that and get it out of the way, so that in the end you're basically left with this latent factor, the alpha, the inherent skill and ability of the player.
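What a hierarchical structure on the player skill buys you can be illustrated with the textbook normal-normal shrinkage formula. This is a simplified stand-in for the full hierarchical model: it assumes the league-wide mean and spread are already known, whereas a real hierarchical model estimates them jointly with the player effects. All numbers and the name `partial_pool` are hypothetical.

```python
def partial_pool(raw, se, group_mean, group_sd):
    """Precision-weighted average of a player's own estimate and the group mean."""
    w_player = 1.0 / se ** 2        # how much the player's own data says
    w_group = 1.0 / group_sd ** 2   # how much the population says
    return (w_player * raw + w_group * group_mean) / (w_player + w_group)


# Hypothetical league-wide skill distribution on the log-odds scale.
group_mean, group_sd = -1.5, 0.5

# Same raw skill estimate, but very different amounts of data behind it.
many_games = partial_pool(raw=-0.5, se=0.2, group_mean=group_mean, group_sd=group_sd)
few_games = partial_pool(raw=-0.5, se=1.0, group_mean=group_mean, group_sd=group_sd)
```

A player with only a handful of games is pulled strongly toward the league average, while a player with a long record keeps an estimate close to his own data. The same logic would apply if the pooling were done across seasons for one player.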

Yeah, yeah, yeah.

OK.

Yeah, that makes sense.

And I mean, for sure, I will put all of these in your episode's show notes.

And actually, I think I can share my screen.

I don't know why I didn't think about that before.

And here is the notebook, right?

Am I on the right notebook?

Yeah, perfect.

So.

yeah.

So there are a couple of notebooks there.

So in the PyMCon folder, that's the one with the redacted version and the unredacted version. And the version that we're currently looking at, that's the initial one, with all its typos in there.

Ah ok, so it's not the right one.

Then, should look at another one.

one, so it's perfect.

The other one is just a bit smaller and more concise, I would say.

Ah, here.

Unredacted.

Perfect.

Yeah, I have it here.

So yeah, for those of you watching on YouTube, I'm sharing it right now.

And so basically, this is the part of the model where you're talking about the likelihood, where it's goal scored or not scored.

And then you have here the probability, and in it this alpha that you talked about, right?

That is the inherent skill of the player, which enters the probability.

And you have the Xs and the beta.

So the Xs, are they the factors, or are the betas the factors?

So the Xs are the factors.

These are the differences across the teams or between the teams.

And this is what you want to basically account for and to clean the observed performance measure from.

Yeah.

Yeah, yeah.

Oh, yeah, OK.

Yeah, for sure.

And then the beta is the slope, basically, on the factors.
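Written out, the structure described in this exchange looks roughly as follows. The indexing is an assumption made for readability (i for player, g for game, k for factor); the shared loadings follow the earlier remark that every player is exposed to the factors with the same loading.

```latex
y_{i,g} \sim \mathrm{Bernoulli}(p_{i,g}), \qquad
\operatorname{logit}(p_{i,g}) = \alpha_i + \sum_{k=1}^{K} \beta_k \, x_{i,g,k}
```

Here y is 1 if the player scored in that game, the x terms are the team-difference factors, the betas are the slopes on those factors, and alpha is the latent skill that remains once the factors are accounted for.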

Yeah, yeah, yeah.

Yeah, yeah, it's a fun model.

So of course, it's hard to do it justice on the podcast.

But I encourage you to go and watch that part on YouTube.

I'm sharing it right now.

And also, you can just take a look at the notebook from Max, which I put in the show notes, where you have all the details.

So it's pretty fun to look at.

And also, as you were saying, the model is pretty small.

So the amazing thing that I find, if we now go look at the PyMC implementation, a bit further down, is that basically the model is quite easy to code, right?

And in a way, that's just a few lines of code, so basically four lines of code, as you were saying, and you're done.

So that's the beauty of the probabilistic programming framework, right?

It's a really useful model.

But if you want to get to a first good enough version that already gives you interesting insights,

you don't have to reinvent everything.

And you don't have to go with the hardest version from the start, where you have a hierarchical time series model where everything is varying and pooling information.

Sure, that's cool.

But don't start with that.

It's like if you're starting to train, don't start with 100 push-ups.

Start with five first, then do a few sets of them, and build your way up to 100.

So that's the critical thing I find with the Bayesian framework coupled with the power of probabilistic programming languages: you can get to a first good-enough version in a few lines of code, and then sample from it.

Because here you have it on the screen.

You have the likelihood, you have a line for the deterministic, which is the logistic regression line, and then you have your intercept and your coefficients on the factors.

And basically that's it.

That's really amazing.

Absolutely.

No, that's, I think, the beauty of PyMC: it allows you to describe or build your model in a pretty intuitive way.

And you can even have it printed out to see if everything is as you would have expected.

And yeah, then PyMC does the dirty work, the sampling and all that, for you.

And yeah, but it already gives you an intuitive idea of how the modeling works.

And yeah, that's absolutely super cool.

really fun.

Well done on that.

And so I'm curious, what are your, do you have any ideas?

Do you want to keep working on this model?

Do you have any ideas on where to take it from what it is right now?

Um.

Yeah, that's a good question, actually.

So definitely the model can be improved.

And definitely, it's all depending on the features that you have and the data that you have.

And I think the clubs, they have so much more interesting data than I have.

And they could build many, many more interesting factors accounting for differences across teams.

So yeah, I really don't know because I tried to reach out to a couple of clubs, let's say.

But I don't know, there was nothing really coming back.

So yeah, apparently, perhaps they're not interested in that or maybe they have their own models already or something.

So I really don't know.

I'd be excited to work on that.

But as you said, it's rather a side project that I did once upon a time.

And yeah, it's not really related to economics or finance.

That's why I'm currently working absolutely on other stuff.

But yeah, I would love to work on that in that regard.

But yeah, it seems not so many teams are picking up on that, at least among those that I reached out to.

And that seems to be the case for European clubs.

Because in some of your past episodes, I heard people saying that in the United States it's pretty different.

And, yeah, apparently a lot of clubs there are already trying to implement that, to really try to understand the inherent latent skill of players, not necessarily in soccer, but in baseball or in other disciplines.

Yeah, yeah, yeah.

So this is sad, but I'm kind of reassured to hear you say that because I do think it's a huge area of improvement that there is in Europe.

And clubs just don't seem to be very interested.

The thing I know is that a few English clubs are using data pretty heavily, like Liverpool, Manchester City, clubs like that, but that's still kind of the exception.

I know Toulouse now in France, which is a small club, and that makes sense.

If you're a small club, you have less money, so you have much more competitive pressure to find good players that you are not overpaying for, which is basically where science can help you.

You don't want to pay for just a name.

You want to pay for someone who has a name because he's got talent, not just because he's got a name.

So it's like, to me, everybody should do that.

And I just don't understand why they don't.

Because it's just like, that's also the beauty of sport, right, you don't care about the name, you care about what someone can do and if they have talent or not.

Like, you should not care at all about the name, about the color of the skin, about nothing else, but what they can do on the field.

And...

Yeah, to me, if I had a club, that would be one of my first priorities.

How do we make sure we optimize the way we are signing the players because it costs a lot of money.

So.

I think one club that also does a lot of that data work is in Denmark, FC Midtjylland or something.

I think I got the name completely wrong.

But I heard once upon a time that they're really investing a lot in data science and trying to sign players according to data, or at least incorporate data a lot in their daily training exercises and all that.

So yeah, they are maybe one of the cutting-edge clubs in Europe as well.

Small club, but yeah.

I think they won the Danish Championship a couple of years ago.

Yeah, not surprised.

I mean, something I see a lot, at least in France, and I've seen that a lot also in electoral forecasting, is basically this idea that if you start doing that, you're basically becoming kind of inhuman and you're turning players into robots.

Basically, that's really an interesting thing to me because one of the sports that really uses data heavily is cycling.

A lot of the teams are using data now.

Here, again, thanks a lot to the British, who often in Europe are the first ones to take up the data wave.

And so I know, for instance, Bradley Wiggins, I think he's won the Tour de France.

I don't remember how many times, but a lot of times.

And basically, the whole team was using data to optimize the performances of the team.

And that was when the British started being like, okay, we need to get back on our cycling game.

They started using data extremely well, and, well, they did.

And thanks to this, basically a lot of the teams started to do that as well.

And the Tour de France is extremely optimized on that.

But it's funny because when you hear the media coverage of that, at least in France, it's a bad thing, because it's like players are becoming robots and they cannot eat what they want at the time they want.

And it just takes the magic out of the Tour de France. I strongly disagree with that, of course, because if the performances get better, in a clean way, of course, well, then that's just better for everybody, because the show is going to get better.

And also, we're talking about the Tour de France, about professional athletes.

The goal is not to do that recreationally.

They do that for a living.

So it's important for their own income, basically.

But also they do that because they want to be the best.

That is, they are not doing that because, well, they just want to cycle on the weekends, right?

They cycle for a living.

So yeah, sure.

If you're an amateur cyclist, then okay.

You don't need the same structure as a professional cyclist.

But even then, if you want to improve your performances as an amateur cyclist, you're going to need to optimize some of the things.

And if you really care about it, you're going to need to optimize your nutrition, for instance, and maybe when you take your meals, and so on.

But if you're a professional, the slightest change can mean you're going to perform one second better or two seconds better, which can make you win the Tour de France or not.

So I don't understand this argument in these contexts where you're trying to optimize performance.

For me, it's like not something that should count here.

They are not doing that for pleasure only.

I absolutely agree.

Absolutely agree.

It should be incorporated much more, especially for the clubs.

In the end, I think it will pay off as you lay it out.

You don't want to pick a lemon; you'd rather avoid it.

Yeah.

No, I mean, I have to say it's like, it's an interesting topic for me because I'm trying to crack that nut and I cannot crack it for now.

Like, understand why basically the clubs in Europe are not really interested in that.

Because I don't really care about the journalists' side and so on.

I'm like, once the clubs start picking that up, then everybody will have to.

But what I'm trying to understand is why the clubs don't do that, because it's just leaving gains on the table.

And I'm just super curious about why they would do that from a sociological standpoint, honestly.

Because I've seen a lot of clubs using, they have data science teams, but they use it for marketing.

That's such a shame.

And I don't know why.

So if anybody knows, please get in touch.

If anybody is working in a club, please get in touch with Max or me, because I want to know about it.

We don't even need to work together.

I would be happy to help you out with a model, but for now, I just want to know why and what are the internal factors, because definitely there is something going on, but I don't

know what it is, and I'm just curious about it.

So yeah, to try and make it a bit more constructive, do you have any idea on how we personally in the data world could change the status quo in that regard?

And not only for sports; that's also true for a lot of domains where a more robust application of the scientific method would be useful.

But it's hard to get it done.

Do you have any ideas personally on how that status quo could be changed?

Yeah, I think it's really hard to say.

It depends on the willingness to adopt these, to be open to these methods, I would say.

And the players play an important part, or I think the crucial part, because if the players are not willing to adopt these additional insights, I would say, it's just not possible.

But for sure, I mean, as you say, it's management, it's internal things that are going on there, politics potentially, but I really don't know.

How can someone resolve that?

I don't know.

The way I always see it: for sure, you shouldn't base all your decisions on this model, or on a single model, but it can help stimulate your decision process, and I think it's a useful addition.

And for sure, there might be an upfront cost, basically, to get the data, to implement the model, to hire people to produce that, but in the end it may actually pay off economically, because it may save you from picking a lemon and overpaying massively.

So yeah, I see it really as a worthwhile investment.

I think US sports have demonstrated that.

Yeah, yeah.

I mean, just look at the US, just look at all the other fields, especially marketing, for instance, which has already started to adopt data analysis and modeling aggressively. And we do that a lot at Labs, basically making them save a lot of money, and not only save money, but make more money.

So like, it's just, yeah, like, I don't think this is a question, but yeah.

I mean, something you can do.

I would think if you're interested in it and have the time, something maybe that could work is if you could make some predictions with your model, basically.

And I would think that to get it per player, you would probably need some hierarchical structure in there to get better predictions.

But once you get there, you have something like a web page with the predictions of the model per player, saying basically: this player is overvalued and this player is undervalued, based on the results of the model.

And then basically see what that gives you during the season, because at the beginning of the season you can say that a player is undervalued.

He's going to perform better than what the market currently thinks.

And then people see that it's true.

Well, that's a clear sign that basically these kinds of methods and models are working, and so that could spark some interest.

Um, because definitely demonstrating what a model is for.

Because I'm my hinge, hinge hunch.

I think it's hunch.

My hunch is that basically the decision makers in the clubs are not data people; they don't really know what data is about, and they don't even know what a model is and what it can give you.

But if you are able to demonstrate what a model can give you, because they don't care about the model, the priors, the parameters, stuff like that, they just care about the

results of the model.

So if you can demonstrate the results of the model, and even better, what the model can say about recruiting that player or not recruiting that player, that would maybe have a better impact, or at least I would say it increases the probability that the impact these methods can have gets noticed.

Oh, absolutely.

That's absolutely the case.

For sure, it depends on having the real-time data, basically getting the real-time data.

That's an upfront cost that you would have to pay.

No, but that's actually the intent, really.

This is the intent to run that model for multiple players as part of the workbook, for example, to lay it out and to compare which players perform well or not.

And you see it, for example, with Cristiano Ronaldo: when he won the player of the year award in 2008, he was basically in the middle of the pack in that season.

So there were other players actually outperforming him in that very season, for example Dimitar Berbatov.

He was playing for Tottenham and, later on in the year, was signed by Manchester United.

So you see that.

And for sure, there's a lot of subjective judgment coming in from when you observe it and you see the model telling you something completely different.

But this is stimulating, and it should potentially update your priors, your subjective priors.

Yeah.

And forces you to lay out your priors clearly and on paper.

So it's actually very important.

Yeah.

So I would say definitely something like that.

And if you have the predictions for the biggest possible number of players on a webpage, and you are basically betting based on the model, saying that this player is going to overperform with respect to the market, or underperform with respect to the market.

That's an interesting thing.

And also, as you were saying, for the individual awards, where the name counts a lot, where you can see someone like Messi, who is, yeah, sure, an incredible player.

But the number of times he's got the golden...

How is it called in English?

Ballon d'or?

Golden ball, I don't know.

You could argue that in some of these seasons where he did get the award, maybe there were other players who were actually outperforming him, but they don't have the name recognition, so they are not scrutinized as much.

They don't have the confirmation bias going in their favor, where it's like everybody's looking at Messi because they already know he's extremely good, so they just look at confirming the fact that he's incredible, which he is, but maybe not all the time, so as to get so many awards.

So yeah, like that.

To me, that would be a really good way of demonstrating the utility of these methods.

Basically, making it really concrete for the decision maker.

Thank you.

So before we close up the show, I'd like to get back a bit to your personal experience with Bayes.

And I'm curious, what was your main pain point on this project, the Soccer Factor Model, and just in general, when you're using the Bayesian workflow, what is your main pain point right now?

Yeah, so in that project, I really have to admit that maybe I was lucky.

But there wasn't really a huge pain point.

I mean, it's not something publishable for a paper or so.

It's just basically sketching the idea behind the model and showing the outline of the model, what it can give you.

It went pretty well. I don't remember any really big problems.

So then when I looked at the model evaluation, everything looked fine.

I mean, for example, one way to evaluate how well the model works in this logistic regression is to look at the area under the curve, which is a popular metric.

And it was in a reasonable ballpark.

And that was fine for me, so the results are kind of reliable.

So that was not much of a pain point.

And that was also nice for me to see that, yeah, it's a simple model and it works also pretty simply.

And yeah, that was a project that I was pleased to see that there were not many obstacles that I had to overcome.
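For reference, the area under the ROC curve mentioned here can be computed directly from its probabilistic definition: the chance that a randomly chosen game with a goal gets a higher predicted probability than a randomly chosen game without one. The sketch below is a plain-Python illustration with made-up labels and scores; in practice one would more likely call a library routine.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical outcomes (1 = scored) and predicted scoring probabilities.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to a model that ranks games no better than chance, and 1.0 to a perfect ranking, so "a reasonable ballpark" means comfortably above 0.5.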

Nice.

Yeah, that's good to hear.

And so in general, in the Bayesian workflow, can you identify something in your own learning that is hard for you to learn right now, or that was hard to learn, and for which you would have liked an easier way to learn it?

I mean, I have to say that, for example, with all the different samplers that are out there, that's not my major field.

I would like to learn much, much more about the inner workings of all these samplers.

I mean, I coded maybe one of the simpler ones myself, maybe once or so, but then I really resort to open source packages for that.

But to really understand what's going on, I think, yeah, looking deeper into that, that's definitely something I would like to do and would need to do.

But yeah, I think that's basically the math of it.

I think it's the most fascinating stuff and how it really works and how it's then implemented in code.

I think that's the most fascinating stuff.

But yeah, the beauty of PyMC then is if you really are interested in the outcome and want a fast outcome, yeah, it's pretty intuitive.
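
To give a sense of what "one of the simpler" samplers can look like, here is a minimal random-walk Metropolis sketch in plain Python. It is an illustration only: the function names, the standard normal target, the step size, and the seed are all assumptions made for this example, and it is not the sampler Max wrote nor what PyMC uses by default.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis for a one-dimensional target.

    log_density: function returning the log of an unnormalized density.
    step: standard deviation of the Gaussian proposal (assumed value).
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding Gaussian noise to the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # The current state is recorded whether or not the move was accepted.
        samples.append(x)
    return samples


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


draws = metropolis(log_std_normal, start=0.0, n_samples=20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean = {mean:.2f}, variance = {var:.2f}")
```

The draws are correlated, so the estimates converge more slowly than with independent samples. That inefficiency is one reason packages rely on more elaborate samplers.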

Yeah.

Nice.

OK.

Well, it's good to hear.

Yeah, and I'm asking that from a developer perspective and also teacher perspective.

That's always interesting for me to get a peek in the learning experience of the people.

Cool.

So before we close up the show, is there a topic I didn't ask you about and that you'd like to mention?

Well, actually, my career hasn't progressed so much so far.

So I think we covered everything there.

So, oh yeah, that's pretty interesting.

And yeah, you covered actually everything.

Awesome.

Yeah, we did record for a long time, so that's a price.

Yeah, and I'm happy I got to ask you the main things I wanted to ask you, so that's super cool.

And in a reasonable amount of time, which I'm sure the listeners will appreciate, because the last two episodes were the two longest of the whole podcast.

So it's good to get back to reasonable amounts of time for people, I guess.

And yeah, so before letting you go, I'm gonna ask you the last two questions I ask every guest at the end of the show.

So Max, if you had unlimited time and resources, which problem would you try to solve?

Yeah, so I think one of the most popular answers is climate change.

And definitely, it's probably the most present problem, especially here in Milan currently.

You really feel it.

But throughout the time I've been working on a bit of climate econometrics, let's say, forecasting Arctic sea ice, I saw what people are really doing in climate, and there are fascinating, very, very intelligent people out there.

So I think throwing money at me would be wasted in that regard.

I mean, what I'd rather be interested in is, yeah, maybe implementing that in sports analytics, right, to allow teams to have access to data, and to kind of create that level playing field across players.

And then really, yeah, it's an investment: people, especially in investing and in banking and finance, spend a lot of time crunching numbers, so why not do that in sports as well if you have the data available?

So yeah, I'd be very, very interested in working on that.

That's for sure.

Yeah, I love it.

Me too, for sure.

That's a good one.

And if you could have dinner with any great scientific mind, dead, alive or fictional, who would it be?

Yeah, well, that's a pretty tough question, I have to say.

So no, really, it's, yeah, there's so many amazing people out there.

And when you read papers, it's really incredible what people are doing.

And so yeah, there are so many people I'd like to talk to.

Well, one for sure is Frank Diebold, the guy who basically invited me to the University of Pennsylvania, because that was a defining point in my PhD, absolutely.

But then if I could pick one, since professors say you should expand your network, basically, it would be Ben Bernanke.

He was former president of the Federal Reserve.

He received the Nobel Prize in economics.

Well, people say there's no Nobel Prize in economics, but yeah, the Riksbank prize, last year, for his work on banks and financial crises.

Yeah, that would be super interesting to talk to him.

He served his country basically.

Then he was assistant professor.

So how he managed all that.

And yeah, that would be super interesting to talk to him.

Phenomenal scholar.

And I like reading his papers.

So yeah, I think that would be super cool.

Nice, yeah.

Love it.

Very nerdy answer.

Awesome.

Well, thanks a lot, Max.

That was really interesting.

You allowed me to rant about some of my pet peeves about data analytics and soccer.

And I hope people learned a bit more.

And of course, if they are curious, as usual, I will put resources and a link to your website in the show notes for those who want to dig deeper.

Thank you again Max for taking the time and being on this show.

Thanks Alex.

It was a pleasure.
