One of the greatest features of this podcast, and my work in general, is that I keep getting surprised. Along the way, I keep learning, and I meet fascinating people, like Tarmo Jüristo.
Tarmo is hard to describe. These days, he’s heading an NGO called Salk, in the Baltic state called Estonia. Among other things, they are studying and forecasting elections, which is how we met and ended up collaborating with PyMC Labs, our Bayesian consultancy.
But Tarmo is much more than that. Born in 1971 in what was still the Soviet Union, he graduated in finance from Tartu University. He worked in finance and investment banking until the 2009 crisis, when he quit and started a doctorate in… cultural studies. He then went on to write for theater and TV, teaching literature, anthropology and philosophy. An avid world traveler, he also teaches kendo and Brazilian jiu-jitsu.
As you’ll hear in the episode, after lots of adventures, he established Salk, and they just used a Bayesian hierarchical model with post-stratification to forecast the results of the 2023 Estonian parliamentary elections and target the campaign efforts to specific demographics.
Oh, and one last thing: Tarmo is a fan of the show. I told you he was a great guy 😉
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh and Grant Pezzolesi.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉
Links from the show:
- Tarmo on GitHub: https://github.com/tarmojuristo
- Tarmo on Linkedin: https://www.linkedin.com/in/tarmo-j%C3%BCristo-7018bb7/
- Tarmo on Twitter: https://twitter.com/tarmojuristo
- Salk website: https://salk.ee/
- Hierarchical Bayesian Modeling of Survey Data with Post-stratification: https://www.youtube.com/watch?v=efID35XUQ3I
In episode 83 of the podcast, Tarmo Jüristo is our guest. He recently received media attention for his electoral forecasting in the Estonian election and for his potentially positive role in helping liberal parties gain more votes than expected.
Tarmo explains to us how he used Bayesian models with his NGO SALK to forecast the election and how he leveraged these models to unify the different liberal parties that participated in the election. So, we get a firsthand view of how to use Bayesian modelling smartly.
Furthermore, we talk about when to use Bayesian models, difficulties in modelling survey data and how post-stratification can help.
He also explains how he, with the help of PyMC Labs, added Gaussian Processes to his models to better model the time-series structure of their survey data.
We close this episode by discussing the responsibility that comes with modelling data in politics.
Please note that the transcript was generated automatically and may therefore contain errors. Feel free to reach out if you’re willing to correct them.
Welcome to Learning Bayesian Statistics.
Nice to be here.
Yeah, thanks for taking the time, and bear with my Estonian pronunciation of your name.
I'm kind of used to this. So you're actually doing pretty good.
Okay, cool. Yeah, I think you told me once how to pronounce your name, so I tried to remember. We're going to talk a lot about that in the episode, but basically, we know each other from a project we've worked on. You contracted us at PyMC Labs, and you had a very interesting electoral polling project, where we ended up doing very fun things with hierarchical multinomial models: multilevel regression with post-stratification, because you have very good polling data and census data in Estonia, which was awesome for me to work on. The nerdy me thanks you a lot for this opportunity. And then, on your own afterwards, you folks added some Gaussian processes to all of that. So this is already the summary for the listeners: we're going to talk about a lot of fun stuff. We're going to go a bit more into the details, because you've actually already presented at a PyMC Labs meetup, one of the meetups that we regularly do, and I'll link to that in the show notes of the episode. Over there, we talked a bit more about the whole context and how we worked together. So what we'll do today is go more into the weeds of the model, but also, very interestingly, into how you used the model, because I found that super interesting: the model was actually super helpful and very critical in how you ended up setting up campaigning for the Estonian elections this year, in 2023. So we'll talk about all that. But first, as usual, can you tell us about your personal background? Because you have a very interesting one, you've done a lot of things in your life. So yeah, can you tell us how you actually ended up in that nerdy world of stats and political science?
I guess in a way I have been in different kinds of nerdy worlds throughout my life. I was born in Estonia in 1971. Back then, it was the Soviet Union. And from my school days I was a math nerd, so I was really into mathematics. This is something which I found really interesting and fascinating. I was into board games, and throughout my life into martial arts, and really, whatever I get fascinated about, I tend to dig into in a deep way. And I guess this is how I now also feel about Bayesian statistics, and modeling, forecasting, all these things. But the educational background is that I graduated from Tartu University in economics, and finance more specifically. That was already about 30 years ago. So back then I had a few semesters of linear algebra and maths and statistics and everything like this. But as I recently found out, things have changed a bit over the last 30 years, so I've had to pretty much relearn all of that. But yeah, just to skip a whole bunch of things: at the end of 2020, I was the head of a think tank called Praxis. And at the same time, we had a government where our far-right party was one of the coalition partners, and they were successful, through different kinds of happenings, in forcing through a marriage equality referendum, which was a bit of an issue, because Estonia, being a former Eastern Bloc country, is not quite as progressive in terms of civil liberties, and gay rights in particular, as Western Europe. And the short time we had before the referendum meant that it was very, very hard to put together a successful campaign. However, we organized, we tried to get into it, and then, again, to cut out bits and pieces, long story short, the referendum never happened. It got called off in the 12th hour, but we were already there.
And we were organized, and we figured that we would continue, and that's how we got into this organization that we're running now. And this is, in the end, how we basically met with you in Tallinn.
Yeah, thanks for that short summary. I know the story is longer. And I also want to say that you also did some screenwriting, so you've had a very, very diversified career. That's really interesting. We talked about that when I was in Tallinn, and it was super interesting to hear. And so, to dive a bit more into Bayes: do you remember when you first got introduced to Bayesian statistics?
Now, this is a little more difficult thing to pin down precisely. In abstract terms, I think the first time I really took significant notice of Bayesian statistics was probably about 10 to 15 years ago, when there was a brief period of my life when I was playing poker quite seriously. And that was a very interesting time in poker as well. The way the game was approached was changing very rapidly, from the old days of, you know, smoky cardrooms back in Las Vegas to internet poker, where lots of people were starting to use statistical tools to analyze the game and to figure out their own leaks in the game. And there was lots of interesting theory being produced in a pretty short span of time. So people started thinking about poker hands not in terms of the particular hand you have, but rather in terms of ranges of hands, or, as you would say in statistics, the distribution of hands rather than the point value. And so you would start balancing your distributions against those of your opponents, and think in terms of the expected value not of your single hand, but rather of a distribution of hands that you would play in any particular spot. And then you would have things like game theory optimal play and all these. And at the time when I was playing, of course, these things helped you to play well and make money, or not lose quite as much money as you otherwise would. But this was something that I found really fascinating also in abstract terms, the way it changes your thinking. And then there was one really seminal book that came out many years ago, by Bill Chen and Jerrod Ankenman, I think that was his name, which was called The Mathematics of Poker. And this was very Bayesian, so they were explicitly Bayesian in their approach.
And I guess this must have been the first time when I actually really got thinking about statistics in specifically Bayesian terms, because, as I was saying, back when I was in university 30 years ago, as far as I can remember, everything we were taught in statistics was frequentist. There was no Bayesian stats.
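To make the "ranges rather than single hands" idea concrete, here is a minimal sketch of an expected-value calculation over an opponent's range. The hand categories, probabilities, equities, and bet sizes are all hypothetical numbers chosen for illustration; they do not come from the episode or from the book mentioned above.

```python
# Hypothetical opponent range: hand category -> (probability, our equity against it).
# All numbers are made up for illustration.
opponent_range = {
    "strong": (0.2, 0.15),
    "medium": (0.5, 0.55),
    "bluff": (0.3, 0.90),
}


def call_ev(hand_range, pot, call_cost):
    """Expected value of calling, averaged over the opponent's whole range."""
    ev = 0.0
    for prob, equity in hand_range.values():
        # We win the pot with probability `equity`, otherwise we lose the call.
        ev += prob * (equity * pot - (1 - equity) * call_cost)
    return ev


# Thinking in ranges: average over everything the opponent could hold.
ev_vs_range = call_ev(opponent_range, pot=100, call_cost=50)

# Thinking in point values: assume the opponent holds exactly one strong hand.
ev_vs_single_hand = call_ev({"strong": (1.0, 0.15)}, pot=100, call_cost=50)

print(f"EV against the range: {ev_vs_range:.2f}")
print(f"EV against a single assumed hand: {ev_vs_single_hand:.2f}")
```

With these made-up numbers, the call is profitable against the range even though it would be a clear loss against the single hand a point-value thinker might fear.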
Yes. So, I mean, I'm not surprised that diving into poker helped you discover Bayes. Did that help you?
Well, yeah, obviously it does. I mean, of course, people are imperfect in terms of the application of the statistical principles, and people are really bad randomizers and all of that. But I think the most valuable thing from that period of my life was not really just related to poker, but the way this kind of approach changed my thinking outside of poker: the way you think about randomness, the way you think about chance, the way you think about, in many ways, life in general. So rather than thinking about the things that happened in terms of point estimates or point values, you think of them as ranges of different things that could have happened. And that will limber up the way you look at a wide range of things, not just playing cards.
Yeah. Yeah, I completely agree. The cool thing about that framework is that it's a tool you can use in any endeavor where you need to think, which is a lot of cases.
By the way, I even use the Bayesian stats kind of thinking when I'm doing martial arts nowadays. This is Brazilian jiu-jitsu, or submission grappling. I teach it as well, and I don't necessarily teach it this way to beginners, but in more advanced groups you can think of a fight as a stochastic process. You don't know what your opponent is going to do, but you have a range of things they can do, and some of the things are more likely or less likely. And so you have your priors, you have an idea of the things that could happen, and then you incorporate the information that you gain over the encounter to narrow these things down. And in a way, what you try to do when you're fighting your opponent, or when you're playing poker or chess or whatever board game, is narrow your opponent's options while keeping yours as flexible as possible. I could go on about this for a long, long time. But this is just an example that you can use the Bayesian approach to randomness, or to unknown things, in a much broader way than people would probably usually recognize.
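The "priors, then narrow them with information from the encounter" loop described above is a discrete Bayesian update. A minimal sketch, assuming three hypothetical opponent moves and a made-up observed cue; none of these names or numbers come from the episode.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over moves: prior times likelihood, renormalized."""
    unnormalized = {move: prior[move] * likelihood[move] for move in prior}
    total = sum(unnormalized.values())
    return {move: weight / total for move, weight in unnormalized.items()}


# Hypothetical prior beliefs about the opponent's next move.
prior = {"takedown": 0.5, "guard_pull": 0.3, "throw": 0.2}

# Hypothetical P(cue | move) for an observed cue, e.g. the opponent lowering their level.
likelihood = {"takedown": 0.8, "guard_pull": 0.1, "throw": 0.3}

posterior = bayes_update(prior, likelihood)
for move, prob in posterior.items():
    print(f"{move}: prior {prior[move]:.2f} -> posterior {prob:.2f}")
```

Each new cue during the encounter can be folded in the same way, with the previous posterior serving as the next prior.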
Yeah, for sure. And there's also the fact that perception and priors are ingrained in the brain without us even noticing consciously. This is actually something that I find fascinating, and I've started doing a bit of reading on it. The result is at least episode 77, with Pascal Wallisch, where we talk about that, and especially how to explain these different perceptions of colors, especially with the dress, you know, from 2016, that some people see as gold and white even though it's black and blue, and why some people actually see it like that. It's a fascinating episode, and Pascal did a lot of research on that. And also episode 81, which is with Alan Stocker, where we talk about the perception of visual speed and how it's actually also tied to priors and prior experiences, and things like that, without us even noticing that computation going on, because otherwise we'd be riddled with decision uncertainty and paralysis. So it's definitely super interesting, and extremely useful and relevant in everyday life. And actually, now, can you tell us what you do? How would you define the work you're doing nowadays, and the topics that you are particularly interested in?
Well, as I was saying in the introduction, we set up the organization, the foundation that we have, for a particular reason: fighting the referendum. And as we were really short of time, we figured that we had to move really quickly, try to get an idea of what the public opinion was on this issue, and then see what we could do to tilt it towards the outcome that we were looking for. Now, this is a kind of general problem, of course, pertaining not only to referendums but also to elections: the outcome of a referendum or an election is actually a function of a number of different variables. The most important ones, in the case of an election, would be people's party preference, and the other would be turnout. Because, you know, whether you agree with something or disagree with something is of no consequence if you do not show up and cast your vote. And so this is something that we got into first. We figured that since there were less than six months to the planned date of the referendum, there was not going to be a whole lot of time to change people's opinions. Because, based on all the available literature, you can change people's minds, but it's hard to do that overnight. It takes time. However, it is oftentimes much easier to change people's behavior. So you can try to motivate people to get out and cast their vote, or you can try to give them good enough reasons to stay at home and not show up at the polls. So this was something that we were first trying to look into. But just as I said before, the referendum never materialized. But we had already gotten started with this idea, and then we said, okay, the problem is still there.
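The point that an outcome is a function of both preference and turnout can be shown with simple arithmetic. This is a minimal sketch with two hypothetical, equally sized voter groups; the sizes, turnout rates, and support levels are invented for illustration and are not Salk's data.

```python
def yes_share(groups):
    """Share of cast votes in favor, given each group's size, turnout, and support."""
    yes_votes = sum(g["size"] * g["turnout"] * g["support"] for g in groups)
    all_votes = sum(g["size"] * g["turnout"] for g in groups)
    return yes_votes / all_votes


# Hypothetical electorate: opinions are split so that the result is a tie.
baseline = [
    {"size": 500_000, "turnout": 0.5, "support": 0.6},
    {"size": 500_000, "turnout": 0.5, "support": 0.4},
]

# Same opinions, but the first group is mobilized to turn out more.
mobilized = [
    {"size": 500_000, "turnout": 0.6, "support": 0.6},
    {"size": 500_000, "turnout": 0.5, "support": 0.4},
]

share_baseline = yes_share(baseline)
share_mobilized = yes_share(mobilized)
print(f"Baseline share:  {share_baseline:.3f}")
print(f"Mobilized share: {share_mobilized:.3f}")
```

Nobody's opinion changes between the two scenarios; only the behavior of one group does, and that alone moves the result.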
So we still had the far-right party in the government. And even though that government fell apart, we still have them in the parliament, and they could easily be in the government. And if you look at what's been going on in Europe, then this is a very general thing: the far right is gaining strength throughout the continent. Now, however, when we started our monthly survey streams, we started getting the data in and building the time series, and pretty much immediately we noticed something which is actually quite obvious, of course, once you see it. And this is something that is true throughout all of Europe. Contrary to the UK and US, where you have basically two-party systems, most of Europe has multi-party systems, where you have coalition governments, and this presents you with a particular kind of problem. It is also there in the case of the UK and US, but in a slightly different and perhaps a little less pronounced form. What I'm getting at is that if you look at the setup of the political landscape in all of these Western European countries, you can easily see that the liberal side of politics is highly fragmented, and has been this way for a long time. However, the far right in most European countries tends to be pretty unified, or, you know, they are not split. Oftentimes, it's just because it's just one party, like the Northern League in Italy, or the True Finns in Finland, or the Sweden Democrats in Sweden, or whatever. But it also carries over to the side of the voters. What I have in mind is that if you look at the way the voters' opinions cluster, then the far right tends to have no competition for their core voters, while the liberal parties tend to share their core voters, not by party affiliation, in the way people like to express how they tend to vote.
But if you look at people's political opinions or preferences on issues like immigration, or women's rights, or the environment, the climate crisis, or the like, then the liberal cluster is split between a number of different parties. And this is something that got me, and got us, the team that we have, thinking that what we're facing here is actually a pretty basic coordination problem. The liberal parties compete with each other, but the far right doesn't really; the far right only competes with liberal parties. And in a way this is inevitable in politics, because in parliamentary politics, politicians see politics as a zero-sum game. And they actually have pretty good justification for this, because the number of seats in the parliament, whichever country it is, is a set number, so it's not flexible. And this means that any seat that someone else takes is a seat that you do not get, and this disincentivizes cooperation, or even coordination, between the parties. And this is a handicap for liberal parties going into elections and running their campaigns. This can take a number of different guises. There are different types of toy games or situations: you can have things like the battle of the sexes, or tragedy of the commons type of things, where sometimes people would like to coordinate for a certain kind of result, such as a possibility for a coalition, but their own selfish interests drive them against this optimal result, so they arrive at a suboptimal result. And so this was the issue that slowly emerged, or actually pretty quickly emerged, when we started looking at the data. And then we started trying to figure out what we could do about it. And the rest is history, like they say. So this got us embarked on a very, very interesting process, during which our roads also crossed.
Yeah, very interesting story. And I think it's a pretty good segue to start talking about what you recently did and how the model was used. So for the listeners, can you tell us what happened: which kind of elections just happened in Estonia, and how did your work with the NGO Salk feed into that? And then we'll get into the Bayesian model part of the project.
Well, we had regular parliamentary elections scheduled for 2023. This is something that we set our sights on already in 2021. Actually, we had another election in 2021, the local municipal elections. So this was where we were testing some of the ideas and some of the approaches, and figured out what we would need to do differently, or additionally, on top of what we were doing. And then we pretty much set our sights on the 2023 election, which took place a little less than a month ago now, on the 5th of March. And just to spoil the surprise for anyone who wants to Google what happened in Estonia: the liberal side of politics won in pretty much a landslide. It was expected to be a tightly contested election between the liberal side and the conservative one, which included the far-right party that I mentioned before, but in the end, the liberal side won 60 seats of our 101-seat parliament. So it's a very comfortable majority government. They were supposed to step into office today. They will probably do that tomorrow, because the far-right party was setting up filibusters and delaying things. It's all done now. And so yes, this was something where we leveraged the data, on the one hand, to the extent that we could. But, tying back to what I was saying just before, and this is something we might want to revisit in greater detail at some point, I just want to outline it here and say that although the data and the model were really instrumental, really important in getting the result that we did, in a way it was a little bit of a smokescreen. We used the data and we used the model to give a common reference point to the liberal parties. And this in itself will help people, or will actually push people, to cooperate.
So if they see the same data, if they see the same picture, and they have basic alignment about the facts of the situation, then the coordination just happens inevitably. You do not need to coordinate people, you don't need to tell them where to go and what to do; they will start doing that by themselves if the information, the data, is coordinated. And looking at the end result, we have been doing a bit of interviewing, trying to assess our impact. It is hard to say, of course, because much of it is not direct; it's through different vectors and different venues. But what we're really happy about is that the end result is exactly what we were aiming for. I can't tell how much of it is because of us, but it was exactly what we were aiming for, meaning that all the liberal parties, all three liberal parties, punched well above their weight. They outperformed the polling expectations, they outperformed the expectations of experts, and all the parties on the other side underperformed the expectations. And this is what gave the 60 seats. If only the Reform Party, which won the election and got 37 seats, had performed well while cannibalizing the vote share of the other liberal parties, then we wouldn't have had the landslide the way we did. What was really good about it is that there was a solid, strong performance throughout the liberal front.
Yeah, definitely, that's something we're going to get back to when we talk more about the usage of the model. I find that super interesting, this idea that just having a reliable and trustworthy outside source of data and modeling helps you solve the prisoner's dilemma, basically, that you were talking about a few minutes ago. Instead of fighting over whether there is a problem, parties can coalesce and say, okay, there is a problem, let's agree on how to solve it, which is of course way more efficient. They collaborate on the solution instead of fighting in the first place over whether there is a problem or not. So definitely, let's go back to that. But first, let's look at the model. And even before that: you're saying that there was quite a substantial polling error. Basically, the polls ended up being statistically biased towards the right parties, which means that the left parties were underestimated. So I'm actually wondering: what was the magnitude of that error, and how did the model help cope with that? Was it an error that the model had actually anticipated, in the sense that this kind of polling error was already taken into account in the uncertainties it was calculating, so that the fact that you had a Bayesian model with uncertainties made your predictions way more robust than just taking an average of polls?
Yeah. Now, this is of course a huge subject, and we could easily talk for an hour about the nuances here. But let's put a finger on a few of the more important things. First of all, it was a really strange situation in Estonia in the last weeks leading up to the election. Because Estonia is a small country, we do not have a huge number of pollsters covering the elections like you would have in the United States, where there are literally dozens of them running different surveys all the time. I don't know how many there are in France, for instance.
It depends on the elections, but between [inaudible], not the US.
Yeah, in Estonia it's pretty much three, so three different pollsters, and they are all international ones with local representations. And what happened was that their numbers were diverging widely in the last two weeks going into the election. And as far as I can tell, nobody has really, to this moment, figured out what exactly went wrong there. Of course, every pollster is sticking to their guns and saying that they did everything right. But what was the most volatile was actually exactly the support, or the implied support, of the far-right party. We had a situation where the same pollster reported more than a 10 percentage point difference between two subsequent weeks, and we had different pollsters reporting more than a 10 percentage point difference in the party's support within the same week. And the support was, depending on whom you were believing, either in the neighborhood of 15% or 25%. So it's a huge swing. And I am personally very suspicious of the data quality there. You know, we all work with survey data, and mistakes happen. And even if mistakes do not happen, you can literally have an outlier survey. You can have a bad survey, as you call it. You try your best, you have your survey cells which you try to fill, you try to keep the sample representative and all this, but sometimes you just end up in a long tail of the distribution and you get skewed results. So I don't know which was the case there, but these numbers literally did not make sense going into the election. So this is the backdrop for this.
However, even when you take a longer view going into the election, the model that we will be talking about, the MRP model that Alex and the team at PyMC Labs helped to develop for us, was, I can tell in retrospect, all the time giving better predictions, in hindsight, than the polls were showing. Our model constantly predicted that the party that actually won the election would get between 33 and 35 or 36 seats. It ended up winning 37, which was a great surprise for everyone. However, the polls were averaging around 29 to 31 seats. So it's a substantial difference. However, it is hard to be sure. It might be that something changed people's opinions in the tail end of the campaign, and there were a few things that were breaking at that time. And also, I have a suspicion that this whole confusion about the ratings also helped us to mobilize the voters, to drive them out and say that it's a really tightly contested election, although we pretty much knew that it wasn't. But you can still make that point and drive people out to vote. And this now leads me to a second important point, which is that the model that we talked about really only tried to figure out the latent support of the parties in the population. We did not get into turnout modeling. And this was a huge factor in this election, because in 2023 in Estonia, about 10% more people showed up to vote than in the last election. So again, that's a substantial difference. We still do not have the full breakdown of the statistics on the people who did vote, but it does seem that the activity spike, or the additional turnout, was definitely not uniformly distributed. It was benefiting the liberal parties by a lot.
And that's, again, something that we did not even... well, I shouldn't say we didn't try. We tried to model it, but we failed, because we have very low confidence in this modeling, because, unfortunately, elections are pretty rare events. You cannot observe them every weekend and then draw your inferences. And because they happen every four years, the context also tends to be very different between different elections, and you'd be really hard pressed to find priors from four years ago that you can rely on.
Yeah, that's interesting. So, I mean, here, from the modeler's standpoint, if you want to convince people of the importance of a model, you seem to have had the perfect circumstance, which I've been dreaming of in France for a long time, to convince at least French journalists that just making an average of polls is not the best, and that it's different to make a model. But yeah, basically, the model ended up being way closer to the election result than the conventional wisdom and the polls. That really helps drive the point home that you need some serious modeling, because these are extremely complicated events to forecast, and just your intuition is usually going to fall short, even though you can be an extremely smart person...
But, you know, as a proper Bayesian, you would also obviously recognize that we might have just been lucky that the results went this way.
Yep. Yeah, I'm just talking more from the marketing standpoint here, even the political standpoint. But yeah, for sure, this is just the first election, and so I'm really looking forward to the next elections where you're going to try that type of model. And for sure, if you try to go into other countries and do the same, that will increase your sample size of elections, even though these will be different countries. And the first thing I would do as the modeler here is try to understand whether there is a good reason why the model actually differed from the polls and the conventional wisdom. To me, that would be the most interesting, because maybe the model was just lucky because it was biased in some ways. You know, like in the bias-variance tradeoff: it was more biased than variable, and so in that case it was lucky, but maybe the next time it won't be. You have this pendulum basically all the time, and trying to place the slider between overfitting and underfitting, especially when you don't have a large sample size, as you were saying, is extremely important. But yeah, I'm quite happy to hear about all these success stories based on Bayesian and data science modeling. That's absolutely awesome. And as you were saying, we could continue talking about that, but I think now is a good point to actually dive deeper into the model, because that will help listeners understand what the model was doing, and also why it could have been more efficient than the rest of the methods. And I mean, I do have a bias: I worked on the model. And I do think that these kinds of methods are actually better at trading off between overfitting and underfitting, and so in the long term, this kind of method will usually give you better predictions than other methods that are either too biased or too variable.
But basically, these are my priors and my biases both. So yeah, can you tell us a bit about the structure of the model, first of all the Bayesian structure, and then we will talk a bit more about how we make that even better with MRP.
So even before we dive deep into the model itself, I would like to set one thing straight and say that this was definitely the case with us, but I think it's also something which would apply in a more universal, broader way. We did not use the model, and we didn't even suggest using it, I actually suggested against using it, for predicting the elections. So this is something which, again, would take a lot more looking into. But just to give an example, if it was a really tightly contested election, so it was basically a coin flip, and you build a model that gives you a correct prediction of who wins, then I would say that if it really is a coin-flip type of situation, then your model's prediction is pretty useless. Because if the mean of the distribution is right in the center of the outcomes, then if you were right, it was just luck. So predicting coin flips is not something that you should use a Bayesian model for. However, what you can use a Bayesian model for is determining whether the situation you're facing is indeed a coin flip, or whether it's a lopsided situation. And this is something where the Bayesian model can give you lots of really good input, and this is where we get into the important thing that you were also referring to before: what the model can give you that just averaging the survey results wouldn't. Because if you average the survey results, you end up with a point estimate, and this can be either right or wrong. But that's not in itself a hugely useful piece of information. However, if you also get the uncertainty estimate with this, then you can make a much more informed choice.
Whether this is the right place where you actually want to send your resources, whether this is a hill to die on, or whether this is something that you should just leave aside because there's not a snowball's chance in hell of getting a mandate from that district. So this is one of the important things to keep in mind. And now, the other thing we tried to do with the model, and that's
new here, but the model has access to previous elections. The difference with just averaging is that you train the model on previous elections. So if you structure it in a way where the model can actually learn from history, that's something you can't do with just a simple average. That is
true. That is also absolutely true. And now, the other thing that the model lets you do: well, we were using our monthly survey waves, where the sample size was 1,200 observations. So this is not a huge thing, but for Estonia this is a pretty standard one. So across the whole country, it gives you a pretty good idea of where things are. But as soon as you start zooming in on specific districts or on specific socio-demographic groups, the data very quickly gets very noisy. There are simply not enough observations, and your monthly observations start to fluctuate by a lot. So it is something that you basically cannot just rely on. And the example that I also used in the meetup that Alex was referring to before is that if you take the second largest city in Estonia, which is Tartu, and then you take the male citizens, and then you take the main ethnic minority, which is Russian-speaking people, then you are left with a sample size of four out of the initial 1,200. And this is obviously something that you cannot work with. But the point is that with a model you can; with survey data you cannot. With survey data you act like a Martian who's been stranded on planet Earth, put down for the first time in the city of Tartu to interview four Russian-speaking males, and asked to make sense out of the situation. But we can rely on the signal that we can pick up from the whole sample. So we have 1,200 observations, 20% of which are Russian speakers, about 8% of them live in the city of Tartu, and so you can borrow the signal from other parts of the population, if you have the right idea of what drives people's political preferences and opinions. And now this is where we get to the model. So this is how we set it up, because we had a bit of a hypothesis, but we had a pretty good idea.
What are the main determinants of people's political preferences, not just the party preferences, but also the underlying political ideas that cause people to prefer one party over the other? And in Estonia's case, we found that there are four main things that determine these opinions. Those would be age, gender, education, and ethnicity. These are the four main ones, and there are of course important interaction effects between those as well. But if you know these things about a person, then you can pretty easily construct likely distributions of their political opinions. And of course, each and every person's concrete opinions can vary within these distributions, but we're not concerned with predicting the preferences of a single person. We're concerned with making inferences across bigger distributions.
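The idea of "borrowing signal" for a tiny cell, such as the four Russian-speaking men in Tartu, can be illustrated with a very small sketch. This is not the model discussed in the episode: it is a plain Beta-Binomial shrinkage toward a national share, and every number in it (`national_share`, `prior_strength`, the cell counts) is a made-up assumption. A real multilevel regression would learn the amount of pooling from the data and would pool through age, gender, education and ethnicity effects rather than a single fixed prior.

```python
def partial_pool(successes, n, prior_mean, prior_strength):
    """Posterior mean of a support share under a Beta prior.

    The prior behaves like `prior_strength` extra respondents whose
    average support equals `prior_mean`.
    """
    return (successes + prior_mean * prior_strength) / (n + prior_strength)


# Hypothetical numbers, for illustration only.
national_share = 0.30  # assumed support for a party in the full sample
prior_strength = 20    # assumed weight of the prior, in pseudo-respondents

# A tiny cell: 4 respondents, 3 of whom support the party.
small_raw = 3 / 4
small_pooled = partial_pool(3, 4, national_share, prior_strength)

# A large cell: 400 respondents, 150 of whom support the party.
large_raw = 150 / 400
large_pooled = partial_pool(150, 400, national_share, prior_strength)

print(f"small cell: raw {small_raw:.3f} -> pooled {small_pooled:.3f}")
print(f"large cell: raw {large_raw:.3f} -> pooled {large_pooled:.3f}")
```

The tiny cell is pulled strongly toward the rest of the sample, while the large cell barely moves, which is the behavior described above: noisy groups lean on the signal from everyone else.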
Yeah. Yeah. So that's where the basic structure comes into place, right? Do you want to talk a bit about that, or should I give the rundown of how that kind of modeling could work?
Well, I can try and give, from my side, the overview of the structure. So, basically, obviously, we did not invent the wheel. This is a type of model structure which has been used for quite a number of years. It's called MRP, multilevel regression with post-stratification. Let's take them one letter at a time. The multilevel part basically just refers to the hierarchical structure of the model. Let's leave it aside for a moment and get back to it. So the R is regression. And this refers to the point that I was making before, that we would have a model that learns the relationships between these four factors that I mentioned before: age, gender, education and ethnicity. The model learns how these things affect, or tilt, or somehow influence political preferences and opinions. And then it also looks at how these different factors interact in these influences. And then it has a pretty good idea of what each and every one of these four components does. So they are kind of like levers that you can slide to one side or the other. And it's a very multi-dimensional data space that this thing unfolds in, but basically, this is how it works. And then you put the multilevel and the regression parts together, and multilevel is the thing I was referring to before with this Martian example: you can borrow the signal of gender, ethnicity, education, whatever else, from the other groups as well. So we had what you were referring to yourself, Alex, as a Russian-doll type of structure. It's a nested structure, where we had small geographic units, kind of local districts, that were grouped into electoral districts, and those were then grouped into the whole population.
And so the model keeps them separate, but allows you to learn across these geographical divisions. And then post-stratification is the final bit, which I think was also pretty important for the end results being the way they were. We were kind of lucky that Estonia had its full census about a year ago, so half a year before we started working with the model. So we had fresh, high-quality census data that we could just get from the statistics office for, like, 20 bucks, and then have the model debias the inputs from the survey and scale them to the population. And this does a number of things. So first of all, yes, it debiases the estimates, but it also allows you to simulate the population. And then, instead of working with, like the example I was giving before, Tartu and four Russian-speaking males, you can simulate the actual number of Russian-speaking males in Tartu. And then dice and slice them whichever way you like, group them, isolate a narrower category of age groups within this broader demographic, and make inferences about this. And you get the inferences together with your uncertainty estimates, which is again hugely important and useful. So this is the overview of how the MRP model is set up. But I guess we can get into a lot more detail there. Also, like you were saying before, we added the GP part. So this is something which was a crucial component.
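The post-stratification step described above boils down to re-weighting per-cell estimates by census counts instead of by how many people in each cell happened to answer the survey. The sketch below uses three invented age cells with invented estimates and counts; none of these figures come from the Estonian census or from the model discussed in the episode. In a Bayesian workflow the same weighted sum would be computed for every posterior draw, which is what carries the uncertainty through to the population estimate.

```python
# (cell label, model estimate of support, census count, survey respondents)
# All values are hypothetical and chosen so the survey over-represents
# older respondents.
cells = [
    ("18-34", 0.20, 250_000, 100),
    ("35-64", 0.30, 500_000, 500),
    ("65+",   0.50, 250_000, 600),
]

census_total = sum(pop for _, _, pop, _ in cells)
survey_total = sum(n for _, _, _, n in cells)

# Post-stratified estimate: weight each cell by its share of the population.
poststratified = sum(est * pop for _, est, pop, _ in cells) / census_total

# Naive estimate: weight each cell by its share of the survey sample.
survey_weighted = sum(est * n for _, est, _, n in cells) / survey_total

print(f"survey-weighted support: {survey_weighted:.3f}")
print(f"post-stratified support: {poststratified:.3f}")
```

Because the hypothetical sample contains too many older respondents, the survey-weighted figure overstates support, and the census weights pull it back down.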
Yeah. Yeah. So we'll talk about the GPs in a minute. And then we'll dive into how you concretely used the model during the campaign, because it will also help people understand how powerful these kinds of models can be. And so, to summarize what you just said about the model structure: what you observe are polls, raw polls that you run with your partners in the NGO. And you get those raw polls, which is extremely valuable, because for me that was the first time I got the opportunity to work on really raw polls like that. It was not polls reported by online newspapers or so, which is what I usually use for French electoral forecasting. Here you get access to the raw polls. And so you use that in your regression part, where the model basically is doing a multinomial choice estimation. And based on that, well, Bayes' formula comes in. We observe polls, and the model says: well, based on the data I've observed, and on the priors, and on the structure of the model, which is reflecting your domain knowledge, I think that the true latent popularity of the parties in the population is this, and then you get a distribution for each party. But as you're saying, we're observing polls, and even though we're also doing a regression using the socio-demographic factors that you talked about, and we trained the model on previous elections, this is still a biased sample of the population, because it's a poll. And so afterwards comes the post-stratification part that you talked about, which was invented by Andrew Gelman and other very smart people. And basically, this is kind of a magic thing that is so easy, in a way, to do in the Bayesian framework, right? Now that we've fit the model, you tell the model: well, now imagine that we observed these data, which are extremely reliable data, because they are census data.
And in Estonia, you have incredibly detailed census data, which you can then use to tell the model: okay, based on these data that we trust, now make the predictions from what we learned previously from the polling data. And now you have the debiased estimates that you're talking about, and you are able to make predictions even for very small subgroups of the population, like, as you were saying, Russian-speaking males in Tartu, maybe low-education Russian-speaking males. And you can slice your population into whatever strata you want. But since you have your census data, you're able to actually make predictions which also make sense to you as the domain experts, which was the amazing thing. And the uncertainty is actually workable, right? It's not an uncertainty where it's like, oh yeah, they think the Estonian government should invest more in education, with a probability of 10 to 70%. It was actually very actionable. And to me, that was magic. Like, I know how this works mathematically, but doing it and then seeing it in action, and how that debiases your estimates and allows you to make predictions on very small datasets, it feels like magic. That was incredible. And to all that structure, you then added Gaussian processes. So yeah, now I'll give you the floor again, if you want to add anything to that, and then just talk a bit more about the Gaussian processes before we go into the practical use of the model.
I just want to underline what I said before about the uncertainty estimates being important. In some cases, if you slice the data long enough, then eventually you get to the point where the uncertainty is going to explode. This is just the nature of how statistics works. But even in these cases, it can be immensely useful. Because when we were showing the results to our clients, the parties that we worked with, we were drawing their attention to this and saying: don't just look at the distribution means, don't disregard the long tails of the distributions. So if the distribution is really wide, then you shouldn't be using a stopwatch or a ruler to figure out which option is better; in this case, you would say they are roughly equal. But in some cases, you can figure out that even though the tails are long, the overlap between the distributions is pretty small. Then you say that there's actually a significant difference between these two options, and I can say that with pretty high confidence, even though the uncertainty can be very high. And I absolutely agree with you, as you said before, that it looks and feels like magic. So it takes a while to get used to, especially if you come from working with just raw survey data and then running cross-tabs, and then figuring out that I think I have a signal, but I have no idea how sure I can be about that. So this is a very different world. And now about the GP, so the Gaussian process part. That was a really important addition. We did discuss it with you first, Alex, but we figured that we'd leave it out of the main version of the model that you shipped. But the reason why it's important is that if you show a model, let's say, two years' worth of data that's been collected at monthly intervals.
So if you just feed it to the model, then the model really has no way of knowing that what it is seeing is actually a time series. So it will take the whole variance over the two-year period as a concurrent, or simultaneous, variance within the same moment. And as everyone knows, party popularity can have wild swings, especially at the time of COVID pandemics and everything like this. It's been a roller coaster. And so there's a huge variance in the data. However, you can tell the model that this is a segmented time series. So it's not just one moment that we're talking about. The Gaussian process allows the model to keep the time periods separate, the same way the hierarchical structure helps it keep the geographic units separate, and still learn from this and not mix everything up. Keep the important distinctions but pick up the useful signal, and that is what the Gaussian process gave us. So once we added this, the uncertainty came down. If we wanted to, we could even make predictions into the future, although then, as you well know, the uncertainty is going to explode very quickly. But it lets the model learn much more precisely.
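The point about time can be made concrete with a toy example. The sketch below simulates 24 monthly readings for one hypothetical party whose support drifts from 5% to 25%, then compares the spread you get when all months are pooled with the spread left over around a Gaussian process trend. Everything here is an assumption made for illustration: the simulated data, the RBF kernel, and the hand-picked `amplitude`, `lengthscale` and `noise_sd`. The model discussed in the episode estimates such quantities inside a full Bayesian model rather than fixing them.

```python
import math
import random
import statistics

random.seed(0)

months = list(range(24))
noise_sd = 0.01  # assumed month-to-month polling noise
# Hypothetical party whose true support drifts from 5% to 25% in two years.
truth = [0.05 + 0.20 * t / 23 for t in months]
polls = [p + random.gauss(0, noise_sd) for p in truth]


def rbf(t1, t2, amplitude=0.1, lengthscale=6.0):
    """Squared-exponential kernel: nearby months get similar support."""
    return amplitude ** 2 * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)


def cholesky(a):
    """Lower-triangular factor L with L @ L.T == a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (L @ L.T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


n = len(months)
mean = sum(polls) / n
k = [[rbf(s, t) for t in months] for s in months]
k_noisy = [[k[i][j] + (noise_sd ** 2 if i == j else 0.0) for j in range(n)]
           for i in range(n)]

# GP posterior mean at the observed months: mean + K (K + sigma^2 I)^-1 (y - mean)
weights = solve_cholesky(cholesky(k_noisy), [y - mean for y in polls])
trend = [mean + sum(k[i][j] * weights[j] for j in range(n)) for i in range(n)]

pooled_sd = statistics.pstdev(polls)  # spread if time is ignored
residual_sd = math.sqrt(sum((y - m) ** 2 for y, m in zip(polls, trend)) / n)

print(f"spread ignoring time:    {pooled_sd:.3f}")
print(f"spread around GP trend:  {residual_sd:.3f}")
```

Ignoring time, the two-year drift is mistaken for noise, so the apparent spread is several times larger than the spread that remains once the trend is modeled.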
Yeah, exactly. And I mean, I do the same thing for French elections, and it's definitely something you want, because if you give the whole time series to the model all at once, it will be both overconfident and estimate a larger variance than needed, which is a weird combination. Because the model is like: wait, it's weird that a party can go from 5% popularity to 25%, which is a five-fold increase or decrease, at the same time. That's because the model doesn't know about time series. It's not conscious of time. And at the same time, the model has a huge load of data. If you give it, I don't know, five elections, that's actually a lot of polls. So then the model will think: well, surely I have a lot of data, I shouldn't be very uncertain. So it will be very certain that the variance is very high. Exactly what you said. Still, if you're conscious of that, then you can still work with a model which doesn't have time series. It can already be a good model, but then definitely the next installment in your modeling workflow should be: okay, how do we make the model time-conscious? Because it needs to know that, yeah, one party can go from 5% to 25%, but it usually happens over years and not during one election campaign. So that's where all the work you did on the Gaussian processes, I guess, was very useful. Okay. I think now listeners have a very good background for everything you did, and hopefully the more astute technical listeners will feel fulfilled by the previous segment. Now, before we close up the show, because it's already late for you and I don't want to take too much of your time, can you dive a bit into how you used the model, and how you used it to focus the campaigning efforts, and the kind of insights that you got practically from it?
I guess the most interesting and important contribution we could offer the parties was something where we used the model that we were just describing, the MRP model, as a platform for working with a different kind of data. The MRP model was initially built to predict, or discover, or infer the latent support for parties in the population. But what we ended up doing was that we ran a different kind of survey, which was set up as a MaxDiff, or best-worst scaling. So just very briefly, we had 18 different policy questions that we showed people. We had, again, 1,200 respondents, and every one of them saw, 10 times, a random sample of five out of the 18. And every time, they had to indicate the one that is for them most important for making their decision, and the one that is least important for making their decision. And this gives you a whole bunch of data points. So this is 1,200 people giving you 10 screens each, so that's already 12,000 data points, and in each of those, there are five options. So this is a lot of data that the model can dig into. And now what it discovers is not just the latent support for certain policy proposals; it also gives you heterogeneous effects over the different socio-demographic groups. And it works out each and every person's sort of latent, how to say that, order of importance of these topics. And now this is combined with the same Bayesian model that we used as a platform. So we built a couple of things on top of it. We still use the GP. We had a slightly different regression part, where it learns people's latent preferences and then allows us to post-stratify those across the whole population. And the reason why this was really important is that, as I mentioned before, it lets you dig into the heterogeneous effects.
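The survey design described above (18 policy items, 1,200 respondents, 10 screens each, 5 items per screen, one "most important" and one "least important" pick per screen) can be simulated in a few lines to check the data volume and to show the simplest possible summary, a best-minus-worst counting score. This is only a sketch under invented assumptions: the `importance` values are made up, respondents are simulated with Gaussian noise, and the counting score is a crude stand-in for the hierarchical model with post-stratification that the episode describes.

```python
import random

random.seed(1)

N_ITEMS, N_RESPONDENTS, N_SCREENS, ITEMS_PER_SCREEN = 18, 1200, 10, 5

# Hypothetical latent importance of each policy item (higher = matters more).
importance = [0.2 * i for i in range(N_ITEMS)]

shown = [0] * N_ITEMS  # how often each item appeared on a screen
best = [0] * N_ITEMS   # how often it was picked as most important
worst = [0] * N_ITEMS  # how often it was picked as least important
n_screens = 0

for _respondent in range(N_RESPONDENTS):
    for _screen in range(N_SCREENS):
        items = random.sample(range(N_ITEMS), ITEMS_PER_SCREEN)
        # Each simulated respondent perceives importance with some noise.
        perceived = {i: importance[i] + random.gauss(0, 1) for i in items}
        best[max(perceived, key=perceived.get)] += 1
        worst[min(perceived, key=perceived.get)] += 1
        for i in items:
            shown[i] += 1
        n_screens += 1

# Best-minus-worst counting score, between -1 and 1 for each item.
scores = [(b - w) / s for b, w, s in zip(best, worst, shown)]

print(f"screens answered: {n_screens}")
print(f"item exposures:   {sum(shown)}")
print(f"score of item 0:  {scores[0]:.2f}, score of item 17: {scores[17]:.2f}")
```

The counts confirm the arithmetic in the conversation: 12,000 screens and 60,000 item exposures. Splitting these same counts by demographic group would quickly run into tiny cells, which is where the model-based approach comes in.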
So you can be very precise, and figure out that in this part of the country, in general, this topic is important, but not in this group. So let's say you may want to talk about education, but not to lower-educated people. Or, for instance, one of the topics that was very strongly stratified, and this is true elsewhere in the world as well, is attitudes towards nuclear energy. This is where there's a big gender gap. Men tend to view it much more favorably than women, and also tend to think it's much more important than women do. And this was really interesting, because we had a Europe-wide energy crisis last winter. So this was an important topic to figure out, and we could tell that if you want to talk about this, talk to men, not women. And we could say: don't go and talk about this if people's political preferences are leaning this way, because then they are probably not receptive to this idea. And we could do that on all 18 different topics. We could give very precise ideas of which topics to stress and which topics to avoid, because they would generate a very strong response from the wrong kind of people, from the campaigning point of view. And that is something that the parties were later telling us was hugely useful for them for calibrating their campaign messages and for figuring out where to go with them and what to do with them.
Yeah, it's so interesting. So basically, the insights you get from the post-stratified estimates afterwards really inform which kind of demographics you should focus on, depending on the issue you're interested in talking about.
Or what are the issues you would want to avoid. And this is equally important in elections, that you do not raise them. For instance, in the US it is a very well-known thing that if the salience of the immigration issue starts trending, then this is beneficial for Republicans, because the median voter tends to think that Republicans have more convincing answers to immigration if it's framed as a problem. And so Democrats should really avoid touching the immigration issue. And that was the same thing in Estonia, by the way. Because of the war in Ukraine, there are lots of Ukrainian refugees in the country, so we could very confidently tell parties that it's fine to talk about providing military help to Ukraine, it's fine to talk about assisting and helping, but you do not want to debate openly or pay a lot of attention to the issue of Ukrainian refugees, especially outside of big cities, because this was a contentious issue for many people in the smaller parts of Estonia. And so it was better to avoid, and they did, and it seems to have worked out really well.
And basically, this is due to the fact that here, people are not really receptive to anything that could change their view, right? It's basically: just avoid that topic, because you're not going to be able to change their view. For now, the views are way too entrenched in their identity. And so it's basically a waste of your political capital to try and do that. Come up with something else, or try to tackle the issue from another standpoint, from another way of coming at the issue, instead of coming right at it and just talking about refugees, for instance.
Yeah, I could bring other examples of this as well. But this is something which is immensely important in campaigning, and we were using it quite extensively. You know, coming back to the start of our discussion, when you mentioned the nerdy world of Bayesian statistics and modeling and everything like this, I have been thinking back to my childhood, when one of the nerdy things I was doing was reading lots of science fiction. I was really a big fan of all the classical science fiction. And I don't know if you have read this very famous series, the Foundation series by Isaac Asimov. But I'm sure some of the listeners have
said, Yes, I've heard of it. I've just not tried it.
Yeah, the basic premise of the book is that there's this one man called Hari Seldon, who comes up with this whole new discipline called psychohistory, which allows you to predict the future of societies by looking at the interactions of the people. And this is something that has come back to me every now and then. Of course, the Foundation series very fundamentally misrepresents the nature of stochastic processes and the randomness of all this. But in a way, the ethos of this book, or what Hari Seldon is trying to do, is not unlike Bayesian modeling in a political context. You're trying to gain insight into how people would act in a certain situation, and this is like an anthill. It's impossible to predict the trajectory of a single ant, but the totality of the anthill follows a small number of very basic, fundamental heuristics. And this allows you to predict the behavior of the entire anthill with surprisingly high precision. And this is what's so fascinating about seeing this thing unfold.
Right? Yeah, for sure. So what's the name of the series? I'll put that in the show notes, because that sounds like a good read.
Yeah, I think there are four or five books in that series, but it's the Foundation series.
Oh, yeah. Okay. Alright. So, I will put that into the show notes. Isaac Asimov, Foundation, there's even a Wikipedia page. Perfect. That sounds like fun. Probably going to read that. Right. So, one of the main questions I would have on that kind of optimization of what parties can and cannot say is: yeah, that's good, but then what do you do if you really have to talk about refugees? If you really want to talk about refugees, and you think it's a problem that you cannot say, well, I think that Estonia should take on more Ukrainian refugees, what if you want to do that? And isn't that also the role of politics, to bring up some hot topics for debate? And if we're not able to debate these kinds of very hot topics, doesn't that mean that in the end the incentives of our democratic institutions maybe need to be updated in a way? So yeah, that's the main question I would get based on these.
Well, now we're getting into a politics podcast, away from the statistics podcast, and I would be happy to have that discussion as well. But hopefully,
we're almost at the end of the discussion. So you know, like it's, it's gonna have a natural ending point.
I think the important thing to notice here is that you are free to talk about whatever you like, but this kind of model just gives you an honest estimate of what the likely outcome, or the expected cost, of that could be, and so you can make an informed decision. If you think that this is an important thing to bring forth and discuss, then by all means, go and do that. This is politics. However, the statistics part cannot tell you what your values should be. Statistics cannot tell you if you should be in favor of accepting Ukrainian refugees or draw the line somewhere. This is a different thing. This is politics; this is where people have to figure things out and arrive at some kind of a consensus or some kind of a working arrangement in the end. So this is not something that statistics can provide you. Statistics can tell you what is likely to happen if you go down this way rather than the other.
Yeah, yeah. So to make it clear, the models here reveal the problems, but are not the problems themselves. That's something that I often have to remind people of. It's like the famous thing stand-up comedians often say: the joke about the horrible thing is not the horrible thing itself. And yeah, the model is not the problem itself. It just reveals the problems that we may have, and then we may need collectively, and want collectively, to do something about that. But at least the modeling can tell you: here, there is kind of a problem. You could get an optimized solution, but maybe that's a local optimum, and you might want to find another optimum which is more global.
And this is an important thing to underline at the end of the episode: I don't think that politics should or could be modeled statistically from start to end. I think it would be a terrible idea. And in that sense, if you read Asimov's Foundation, there are these darker tones there as well, which would make you think about the downsides of such things. But that being said, statistics and modeling, and Bayesian modeling, can be an immensely useful tool also for doing the right thing. So I just want it to be clear that what we have spoken about today, in terms of modeling people's preferences and modeling election outcomes and all of that, is just a technical way of figuring things out. If you look at the election as a kind of a game and say that you want to maximize your results, you want to optimize, find this local maximum, then this is what you should do. But you should always keep in mind that there's a broader world beyond the box that you're
working with. With the current rules of the game, here is how to optimize your game. But that doesn't mean you shouldn't change the world. Exactly. Okay, cool. So let's maybe close the show. I added the Foundation series to the show notes. And also, we referenced a lot of concepts that we didn't really explain in this episode. That's kind of normal, because we already had episodes about all those topics. I put them in the show notes. You'll find episodes about hierarchical models, Gaussian processes, nonparametric models, which include Gaussian processes, and also MRP and missing data. So you will find all these episodes in the show notes if you want to dig deeper, which I recommend, because these are very interesting topics. And so maybe, before the final two questions I ask every guest at the end of the show, a quick question and probably a quick answer. I'm just curious, what is the thing that surprised you the most in this whole project, in this whole endeavor that we talked about?
I don't know if it's a short answer. I should think about the short answer; the longer one would be that I was pretty much constantly surprised by how much there is to learn and how many fun things you can do with it. You mentioned, for instance, nonparametric models, which is something that we were considering at one point, and we'll probably go down that route and try. So it's all tinkering. It's finding the different bits and pieces and then trying, and most of the time you fail in one form or another. But, you know, sometimes you strike out, and those are the great moments. So when you run your MCMC sampler and the trace plots come out perfect, and then you suddenly see something that you didn't see before, it's a wonderful feeling, and I'm sure you know it well too.
Yeah, yeah, for sure. I understand. That could be an answer I could have given. Okay, so let's close up the show now by asking you the last two questions. So first one: if you had unlimited time and resources, which problem would you try to solve?
Well, right now, given my current knowledge and leanings, I would probably dedicate basically all my time and resources to trying to figure out how to avoid the climate problems, because I think this is really a fundamental thing we're facing in the world. And this is something that we're also thinking of actually doing in Estonia with our models and with our other capacities. So I guess that would be the answer.
Yeah, you're definitely in good company with this answer; it's a popular one. And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?
Oh, scientific mind. I don't know. I think I would actually go back quite a long time into the past. I think Aristotle would be a great guy to talk to, to get really to the wellsprings of the Western scientific tradition. So that would be a good choice.
That sounds like fun, and it will probably be in Greece, which is a good choice. Awesome. Well, Tarmo, let's do a fake goodbye here, and then we'll stop the recordings and I will tell you what to do. Okay. Well, Tarmo, thank you very much. That was a fascinating conversation, I think. I hope we struck a good balance between going very detailed and nerdy and giving people background about European politics, especially Eastern Europe and Estonia. And I hope people now know more about this country and the wonderful Bayesian statistics that you folks are making as well. So I will, as usual, put everything in the show notes, with links to your websites and things like that, for people who want to dig deeper. Thank you again, Tarmo, for taking the time and being on the show.
And I also want to thank you, because I'm just thinking that life takes strange turns. When I started listening to your podcast two years ago, or around that time, I would have never guessed that I'd end up being hosted by you, or being a guest on one of the episodes. So it's been great working with you, great knowing you, and thanks for everything.
Yeah, you bet. Thank you very much. Appreciate it. Appreciate your loyalty to the show. And for sure, when I started the show, almost four years ago, I never thought that I would be a full-time Bayesian modeler, because I was actually starting to learn Bayesian statistics, and that's why I started the show. So yeah, for sure, life is always full of surprises. Well, on that note, thank you very much, Tarmo, and see you very soon in Estonia. Okay, see you, bye. Okay, so you can stop with SCP.
So hit the spacebar.
Yep. Yep. And then you go to File, Export, Export as WAV.
Export as WAV format.
Yeah. And make sure it's 24 bits in the format, 24-bit PCM, and then you can save it wherever you want, and you keep the default metadata. Yeah, keep the default. Exporting. Yep. Then it's going to be a big file. But whenever you've got time, you can send that to me with Google Drive or Dropbox or whatever you prefer, WeTransfer, and then we'll send that to editing.
It's 651 megabytes. I'll drop you a link.
Yeah, and other than that, I didn't check my Discord yet, so I don't know if you already answered or not. You did? Okay, so I will add whatever you sent me to the show notes. If you want to add anything to the show notes, I shared the episode Google Doc with you. You will see that I already put a lot into the show notes. Maybe you sent me your bio already, so I'll just add it to the Google Doc. But anything you want to add, you can add there until we release the episode. I'm trying to think if I'm forgetting something, but I think we're good. No, yeah, just send me the file and off you go.
Okay. Anyway, thanks again for having me. And let's see each other when you're over in Estonia.
Yeah, for sure, I will. I will let you know when I'm in Estonia, and thanks again for taking the time and saying that. Now it's time to go back to your newborn child.
Yeah, I had a phone call from home, so I need to hurry. Okay. Take care. Bye.