Learning Bayesian Statistics

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

I’m guessing you already tried to communicate the results of a statistical model to non-stats people. It’s hard, right? I’ll be honest: sometimes I even prefer taking notes during meetings to doing that… But shhh, that’s our secret.

But all of this was before. Before I talked with Jessica Hullman. Jessica is the Ginny Rometty associate professor of computer science at Northwestern University.

Her work revolves around how to design interfaces to help people draw inductive inferences from data. Her research has explored how to best align data-driven interfaces and representations of uncertainty with human reasoning capabilities, which is what we’ll mainly talk about in this episode.

Jessica also tries to understand the role of interactive analysis across different stages of a statistical workflow, and how to evaluate data visualization interfaces.

Her work has received multiple best paper and honorable mention awards, and she frequently speaks and blogs on topics related to visualization and reasoning about uncertainty. As usual, you’ll find the links in the show notes.

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, Adam Bartonicek, William Benton, James Ahloy, Robin Taylor, Thomas Wiecki, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Elea McDonnell Feit, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Joshua Duncan, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, David Haas, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox and Trey Causey.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

General links from the show:

Some of Jessica’s research that she mentioned:

Behavioral economics paper Jessica mentioned:

More on David Blackwell:


by Christoph Bamberg

Professor Jessica Hullman from Northwestern University is an expert in designing visualisations that help people learn from data and not fall prey to biases.

She focuses on the proper communication of uncertainty, both theoretically and empirically.

She addresses questions like “Can a Bayesian model of reasoning explain apparently biased reasoning?”, “What kind of visualisation guides readers best to a valid inference?”, “How can biased reasoning be so prevalent – are there scenarios where not following the canonical reasoning steps is optimal?”.

In this episode we talk about her experimental studies on communication of uncertainty through visualisation, in what scenarios it may not be optimal to focus too much on uncertainty and how we can design models of reasoning that can explain actual behaviour and not discard it as biased. 


[00:00:00] I'm guessing you already tried to communicate the results of a statistical model to non-stats people. It's hard, right? I'll be honest. Sometimes I even prefer taking notes during meetings to doing that, but that's our secret. But all this was before. Before I talked with Jessica Hullman. Jessica is the Ginny Rometty Associate Professor of Computer Science at Northwestern University.

Her work revolves around how to design interfaces to help people draw inductive inferences from data. Her research has explored how to best align data-driven interfaces and representations of uncertainty with human reasoning capabilities, which incidentally is what we'll mainly talk about in this episode.

Jessica also tries to understand the role of interactive analysis across different stages of a statistical workflow and how to evaluate data visualization interfaces. Her work has received multiple best paper and [00:01:00] honorable mention awards, and she frequently speaks and blogs on topics related to visualization and reasoning about uncertainty.

As usual, don't worry, you'll find all the links in the show notes. This is Learning Bayesian Statistics, episode 73, recorded September 20th, 2022.

Welcome to Learning Bayesian Statistics, a fortnightly podcast on Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the podcast, learnbayesstats.com is Laplace to be: show notes, becoming a corporate sponsor, supporting LBS on Patreon, unlocking Bayesian merch.

Everything is in there. That's learnbayesstats.com. If, with all that info, a Bayesian model is still resisting you, or if you find my voice especially smooth and want me to come and teach Bayesian [00:02:00] stats in your company, then reach out at alex.andorra@pymc-labs.io or book a call with me at learnbayesstats.com.

Thanks a lot, folks. And best Bayesian wishes to you all. Let me show you how to be a good Bayesian and change your predictions after taking information in. And if you're thinking I'll be less than amazing, let's adjust those expectations. What's a Bayesian? It's someone who cares about evidence and doesn't jump to assumptions based on intuitions and prejudice.

A Bayesian makes predictions on the best available info and adjusts the probability 'cause every belief is provisional. And when I kick the flow, mostly I'm watching eyes widen. Maybe 'cause my likeness lowers expectations of tight rhyming. How would I know unless I'm rhyming in front of a bunch of blind men, dropping placebo-controlled science like I'm Richard Feynman.

Hello my dear Bayesians, two extra things for you today. First, a big thank you to Trey Causey, who just joined the [00:03:00] LBS Patrons in the Full Posterior tier. I'm sending you all my good Bayesian vibes, dear Trey. Second, the holidays are around the corner and I wanted a great gift for you. So if you are a podcast patron, you get a 10% discount on my Bayes introductory course that I did with my dear friends Ravin Kumar and Thomas Wiecki.

No time limits, no questions asked. Just go to the LBS Slack or Patreon page and you'll find the link in there. Thanks again for your support, folks. Really, you are the ones who make all of this possible. So thanks so much. Okay, on to the show now.

Jessica Hullman, welcome to Learning Bayesian Statistics. Thanks for having me. I'm glad to be here. Yes, I am very glad to have you here. I know you are very busy, so I particularly appreciate it, even more so because, as with all my California-based [00:04:00] guests, I make you wake up at a very ungodly hour in the morning.

So double thank you. And if I could, I would offer you some coffee right now. I have some, so thank you. I did get up early enough to make coffee, so thanks. That's good. So actually, let's start with something that should be easy for you, because I wanna start with your origin story, basically. I'm curious, how did you come to the stats and data world, and how sinuous of a path was it?

I would say very sinuous. So yeah, as an undergrad I liked science a lot. I guess all throughout, you know, being younger, I liked science, but also humanities and especially writing. And so in undergrad I sort of went back and forth. I was a hard science major for a while, but I also really liked philosophy and literary theory.

And I really loved writing, creatively, analytically, et cetera. And so [00:05:00] I took a bunch of courses kind of all over the university, hard sciences, humanities. Ultimately, my undergrad degree ended up being in comparative studies in religion. So basically religious theory, so very, very random, but I could talk to you about Buddhist philosophy if you wanted.

But then after undergrad, I wanted to do the writing thing more. So I got an MFA in experimental writing, but it was during that time, when I was like 23-ish, that I realized I was much more analytical than I had thought. As I was doing this MFA in writing, I was less and less wanting to engage with creative stuff.

Everything seemed so subjective. I missed having courses like stats and just being exposed to more of the science and math stuff, like I had been as an undergrad. And I happened to have a good friend while I was doing this MFA who was a pure mathematician and had gotten into finance. And so he introduced me to the idea of natural language processing.

'Cause at the time he thought that [00:06:00] places like the big banks would start doing NLP on news feeds and it would revolutionize trading strategies. So he was all excited about NLP. And so I started looking into it, and it was kind of a mix of language, which I've always liked, like words and text, but also stats.

And so I got interested in natural language processing originally. That made me go back to Michigan, where I was originally from, to get a master's in information science, wanting to do NLP. But during the time I was getting the master's, I took a visualization class, or maybe several, like an HCI visualization course, and somehow found myself gravitating more toward visualization.

I think visualization had that data representation side. I've always liked thinking about representations in the abstract, like what's a good representation of information. And I think part of why I went toward visualization and away from NLP was also that in visualization, [00:07:00] the role of cognition, like how people think, is something you have to deal with more.

You can be doing engineering-style work or design-style work, making interactive systems, but I think you really have to think about how people are processing information. And yeah, I've always just naturally liked thinking about cognition. I never really took many psych classes or anything, but just thinking about thinking was always fun to me.

So yeah, right after that master's I decided to get a PhD in information science, but really focused on visualization. So not a very direct path whatsoever. But I'm glad, I mean, I don't regret it. It was fun. Yeah, I can see this is quite random, but in the end you managed to blend those two interests, if I understood correctly.

Yeah. I think in a way I've always liked theory, sort of philosophical theory and then theories of cognition, and so as long as I can find the [00:08:00] interesting theoretical bits about representation of information, et cetera, I'm happy. So yeah, it worked out.

Yeah. But it makes me curious: you work a lot with visualization. Do you read a lot of comics? No, I actually don't. I guess a lot of people who do visualization research maybe do gravitate towards comics. People cite Scott McCloud's Understanding Comics book a lot.

There are people working on data comics. I would say among visualization people, I'm not very design-oriented. I'm not really drawn to visual art as much, or comics, which is maybe weird among these people, but I'm terrible at graphic design, et cetera. I mean, I like the idea of comics, but I can never really get too far into 'em.

I see. By the way, what is data comics? So Benjamin Bach and some people at Microsoft Research, like Nathalie Henry Riche, started doing work on what people would basically call storytelling [00:09:00] with data. You have some data set and some narrative, I guess, that you want to convey.

So how do you generate a comic strip that represents that? I'm not sure if anybody's done automatically generated data comics, but I know there were people studying the design space, because there were some existing artists doing data comics out in the world, and so they were looking at how those artists use the comic style to talk about data.

So there are people doing it. That does sound like fun. Yeah. And I was thinking that was data on comic books, so it's like nerd squared. No, but that comes up sometimes when I'm teaching vis classes. When students have to make an interactive visualization, a lot of them wanna do characters from comic books.

Any sort of avatar stuff, I don't know. I'm sure there is data on comic books and someone's visualizing it, probably in some grad class [00:10:00] on visualization. Yeah, that's a good nerd inception for sure. So actually, how would you define the work that you are doing nowadays, and also the topics that you are particularly interested in?

I would say now I'm kind of in a stage where my interests are shifting a bit, or at least the way in which I'm coming at questions is. A lot of my work has always been about visualization, and specifically visualization of uncertainty. So how do we help people make better decisions by representing uncertainty in better ways?

And I've come up with some techniques, mainly frequency-based representations. I think you had Matt Kay on, so I'm sure you already heard some of it, but it was something I started on as a grad student: things like probabilistic animation, where I wanna visualize a distribution, but instead of showing you something like a density plot, or a mean and an interval, I'm gonna show you random draws from the distribution as an animation over time.
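To make probabilistic animation concrete (an approach often called hypothetical outcome plots), here is a minimal Python sketch, not code from Jessica's work. It assumes the quantity of interest follows a normal distribution with a known mean and standard deviation, and the function name `hop_frames` is hypothetical. Each animation frame would show one random draw, so viewers read the uncertainty from how much the frames jump around.

```python
import random
import statistics


def hop_frames(mu, sigma, n_frames, seed=0):
    """Return one random draw per animation frame (hypothetical helper).

    Assumes a normal distribution; in a real interface each value would be
    rendered as its own frame (e.g. a single mark) instead of a density curve
    or a mean plus interval.
    """
    rng = random.Random(seed)  # seeded so the animation is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n_frames)]


frames = hop_frames(mu=10.0, sigma=2.0, n_frames=500)

# Across many frames, the share of frames above a threshold approximates the
# probability of exceeding it (about 0.16 for one sigma above the mean).
share_above_12 = sum(x > 12 for x in frames) / len(frames)
print(f"frames: {len(frames)}, mean of frames: {statistics.mean(frames):.2f}, "
      f"share above 12: {share_above_12:.2f}")
```

The viewer never sees a summary statistic here; any probability they infer comes from counting how often outcomes occur across frames, which is the frequency-based idea Jessica describes.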

So a lot of my work has always been about how we make [00:11:00] better visualization techniques for uncertainty, and a lot of the ways that I've evaluated them have been decision-theoretic style experiments. I've worked on a lot of other topics, but if you're asking how I've seen my work, it's often been as this person who studies visualization of uncertainty.

I would say in the last year or two, I've started getting more excited about this intersection between visualization, which is this really empirical engineering field, and theory, including statistical learning theory and Bayesian statistical theory, but also theory in the style that economists or theoretical computer scientists do it. So theory in the mathematical framework sense, not philosophy or whatever. And I see a lot of opportunity for people doing visualization research to better formalize things when we're developing visualizations or interactive tools for exploratory analysis.

What is the objective? What do we think people are doing? What do we think is a good visual [00:12:00] analysis session? When we see a difference between two visualizations in an experiment, how do we know that it actually indicates an important difference in the world?

And not just some... I'm not saying p-hacking, but there are a lot of studies that'll show you, oh, this visualization's better than this one, where they come up with some kind of contrived task and create a set of stimuli where you end up seeing that one visualization is better. But are the visualizations and data sets they tested really representative of all of the times when you might care about that visualization in the world?

So I stopped trusting some of the empirical research and started thinking, well, I really wanna have a better sense of what the objective is when we're doing an experiment on visualizations, especially with uncertainty.

Can we say, even without doing the experiment, by using theoretical frameworks, how large we might expect the maximum benefit from a better visualization to be? I can give [00:13:00] you an example. It's quite different from most visualization research, so I feel like I have to preface it all by saying this is sort of weird.

This is not representative visualization research. But I have a project now, for instance, with one of my colleagues, Jason Hartline, and a few students at Northwestern, where we're starting from a hunch that has been bugging me for a while: that often how you visualize the data isn't the most critical thing.

Often it's what data you're looking at. If you're trying to think about people making better judgments or coming to better conclusions as they analyze data, I think visualization is often not the most important piece. But like I said, in visualization research you see all these studies where the studies will say this visualization of uncertainty, like showing uncertainty as probabilistic animation, to take an example from my work, is better than this other thing, because we can construct a situation where, for this type of judgment, this visualization appears better.

But I don't always trust those results, because I think you're often cherry-picking the comparisons you're doing in an experiment to show what you want in terms of a [00:14:00] visualization being better. And so in this project, we're basically taking this idea of a rational Bayesian agent, which I think we don't make enough use of in visualization.

And saying, let's say we have two visualizations that differ in how much information they provide to this agent, the user, or whatever. And we want to know, if we were to do a study between these two visualizations where someone has to make some sort of decision, can we use this idea of a rational Bayesian agent, and how much better a rational Bayesian agent could possibly do with the more informative visualization versus the less informative one, as a pre-experiment analysis to understand what the possible benefit of one visualization over another even is?

I think one of the reasons this becomes important is that if you're studying visualizations of uncertainty, there's always all of this noise, or variance, just 'cause you're sampling to generate the stimuli, et [00:15:00] cetera. And so people can easily not realize that they're doing an experiment where, even in the best possible case, you couldn't really do that much better with this visualization.

And so if we do see a big difference between the two visualizations, it's probably an artifact of a small sample size, if that makes sense. We're doing these little empirical experiments and sometimes seeing big effects between visualizations, but they're either cherry-picked or they're overestimates.

And so in this framework, we're taking the idea of the rational Bayesian agent making a decision using one visualization versus another, and the maximum difference you could see if the rational Bayesian agent were responding under an optimal scoring rule. So we're using this idea of scoring rules, which are used in econ, in stats, and in lots of other places, to think about, theoretically, what's the best this visualization could help us do?
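The rational-agent benchmark can be illustrated with a toy calculation. This is a minimal Python sketch under loudly simplified assumptions, not the framework from the project Jessica describes: the world has a binary state, each visualization is modeled as a signal that reports the true state with some accuracy (a hypothetical stand-in for how informative the display is), and the agent is scored with the quadratic (Brier-style) score `1 - (report - state)**2`, a proper scoring rule under which reporting one's true posterior maximizes expected score. The gap between the ideal agent's expected scores under the two signals is one way to gauge how much room a more informative display has to help, even for a perfect decision-maker.

```python
def p_signal(prior, accuracy, signal):
    """P(signal) for a binary state with P(state=1) = prior (hypothetical signal model)."""
    if signal == 1:
        return prior * accuracy + (1 - prior) * (1 - accuracy)
    return prior * (1 - accuracy) + (1 - prior) * accuracy


def posterior(prior, accuracy, signal):
    """Bayes' rule: P(state=1 | signal)."""
    like_1 = accuracy if signal == 1 else 1 - accuracy  # P(signal | state=1)
    return prior * like_1 / p_signal(prior, accuracy, signal)


def expected_score(prior, accuracy):
    """Expected quadratic score of a rational Bayesian agent who reports its posterior.

    If the report r equals the true posterior, E[1 - (r - state)^2] = 1 - r * (1 - r).
    """
    total = 0.0
    for signal in (0, 1):
        r = posterior(prior, accuracy, signal)
        total += p_signal(prior, accuracy, signal) * (1 - r * (1 - r))
    return total


prior = 0.5
no_info = expected_score(prior, 0.5)     # a display that carries no information
weak_vis = expected_score(prior, 0.7)    # hypothetical less informative display
strong_vis = expected_score(prior, 0.9)  # hypothetical more informative display
print(f"{no_info:.2f} {weak_vis:.2f} {strong_vis:.2f}")  # -> 0.75 0.79 0.91
print(f"max benefit of the stronger display: {strong_vis - weak_vis:.2f}")  # -> 0.12
```

In this toy setup, even an ideal agent gains only 0.12 points of expected score from the stronger display, so an experiment that reported a much larger gap would be a warning sign of noise or cherry-picked stimuli.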

And so, you know, this is just one example, but I like this idea of thinking in a more formal way about what [00:16:00] visualizations do. One thing I like about it is that we have to decide what the objective of a visualization is. In visualization research we often start out really empirically: let's just come up with some task, put two visualizations (or n visualizations) in front of people, and compare how they do.

So what I like about taking a more theoretical approach is that if you're a theorist in computer science, you have to really think hard about what the objective is, what the payoff function is, and what the optimal solution is. I mean, in a lot of ways we're never going to be able to say what the optimal visual analysis session is.

If somebody sits down to visually analyze data, what is the optimal thing they could do? But I think we could go a lot further in trying to structure that. So yeah, I have other work that also tries, rather than taking an empirical approach with user studies, to use, for instance, machine learning: if we have some visualization design and we wanna know how well it captures [00:17:00] the signal in data, can we see how well a machine learning model could differentiate signal from noise using that visualization?

So it's a similar question: what's possible? How separable is this visualization from that one? Or how powerful is this visualization? But trying to define that in a mathematical framework. So yeah, like I said, a lot of my current interests are still early work for me, but I feel my work going in this direction.

And I have other interests in visual data analysis, like exploratory analysis, which I've done some work on before. Like I said, what do we think a good exploratory visual analysis looks like? I think there's interesting theory, like statistical learning theory again, where you could try to model what someone is doing as they look through a bunch of plots trying to find the interesting patterns, and what would be the optimal way for them to do that.

So I think some stuff in online learning, online machine learning algorithms, et cetera, [00:18:00] could also be brought to bear. But I haven't started really doing that seriously yet. But yeah, all of my interests are pushing in this weird visualization-plus-theory direction.

So yeah. So many things, I have too many questions already. We'll actually dive into some of these topics, because I have some other questions to ask you compared to what I... Yeah, I'm sure I'll end up talking more about this stuff as we go. So yeah, because indeed I had Matt Kay on the show, I'll try to do a good job as a host and not ask you the same questions.

Before that though, I'm curious if you remember how you first got introduced to Bayesian methods, and also how often you end up using them today. So I guess I read Andrew Gelman's blog all throughout grad school. As a grad student I was starting to read about things like the difference between Bayesian statistical modeling and frequentist statistical modeling.

But I would say I didn't really seriously get into using [00:19:00] Bayesian methods to analyze data from an experiment until I was in my first year as faculty, when I just decided it was worth the plunge. Actually, I was working with Matt Kay, who was a PhD student, and we did an uncertainty visualization paper, 'cause I had just finished this other work and I was excited about these frequency-based representations.

He was already using Bayesian stats; I think he was starting to see it as the default. And so it was something I'd been interested in, and suddenly I had a collaborator who was also doing that. So I read Richard McElreath's book, went through all the chapters, and tried to do the exercises in all of them.

I went to, I think, a week-long Bayesian stats seminar by John Kruschke around the same time. And so I kind of just tried to jump into it. And then I had PhD students by that time, in my first or second year as faculty. Some of them had an applied stats background, a couple of them a vision science background, [00:20:00] and they were also interested in getting into this for analyzing data from things like experiments.

So it helped a lot to have people to play around with this stuff with and try to figure it out. So yeah, that was when I got into it. I still push my students to take Bayesian stats classes, and they've probably all read McElreath at this point.

Even the brand-new student who just started, I've told him to read that book. So we try to default to Bayesian stats in my lab. I would say maybe I'm slightly different from Matt in that a lot of my interest in Bayesian stats is more in the theory. A lot of my research has been about using Bayesian updating, or the idea of the rational Bayesian agent, as a way to get more insight into what we're actually trying to design for when we design interactive visualization systems.

So some of the stuff I was just talking about is very Bayesian-inspired, but I'm not necessarily using Bayesian stats to analyze data. I'm [00:21:00] trying to use the idea of Bayesian updating and Bayesian statistical theory to create a way of thinking about what the goal of a visualization is.

And in some of my work, I ended up, I don't know, in my third or fourth year as faculty maybe, starting to get into Bayesian models of cognition, which go something like this: I'm gonna show people visualized data, and I'm going to either endow them with prior beliefs or try to elicit their prior beliefs.

Say I want them to estimate some parameter, like, what do you think the rate of dementia among the elderly in the US is? So I'm either gonna elicit a prior distribution or endow one, and then I'm gonna show them some data, and then I'm gonna try to elicit their posterior beliefs.

And I wanna compare how well people update their beliefs, or how their belief updating compares to that of the rational Bayesian agent. And so we started doing work like that. Cognitive scientists would call it Bayesian [00:22:00] models of cognition, but we can then use how much people deviate from rational Bayesian updating as a way to understand what people's biases are, in a more well-defined way.
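The comparison against a rational Bayesian agent can be sketched for a proportion such as a disease rate. This minimal Python sketch assumes (our assumption, not necessarily the model from Jessica's studies) that beliefs about the rate are summarized by a Beta distribution and the data are k cases out of n people, so the normative posterior follows from the conjugate Beta-Binomial update. The prior, the data, the participant's answer, and the `update_ratio` measure are all hypothetical illustrations.

```python
def normative_posterior(prior_a, prior_b, cases, n):
    """Conjugate update: Beta(a, b) prior plus `cases` out of `n` gives Beta(a + cases, b + n - cases)."""
    return prior_a + cases, prior_b + (n - cases)


def beta_mean(a, b):
    return a / (a + b)


# Hypothetical elicited prior: Beta(2, 18), i.e. a prior mean rate of 10%.
prior_a, prior_b = 2, 18
# Hypothetical data shown to the participant: 30 cases among 100 people.
post_a, post_b = normative_posterior(prior_a, prior_b, cases=30, n=100)

prior_mean = beta_mean(prior_a, prior_b)    # 0.10
normative_mean = beta_mean(post_a, post_b)  # 32 / 120, about 0.267
elicited_mean = 0.15                        # hypothetical participant answer

# Share of the normative belief shift the participant actually made
# (1.0 = fully Bayesian, below 1 = under-updating, above 1 = over-updating).
update_ratio = (elicited_mean - prior_mean) / (normative_mean - prior_mean)
print(f"normative posterior mean: {normative_mean:.3f}, update ratio: {update_ratio:.2f}")
```

Because the normative answer is computed from the participant's own prior, a deviation here can be read as a bias relative to a well-defined benchmark rather than relative to an experimenter's intuition.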

Sometimes people study decision biases in visualization research in a way where it's hard to pinpoint what exactly you mean by bias. I think a lot of the original Kahneman and Tversky stuff defines bias within a very contrived setting. Like the Linda problem: you use this little vignette and you say that people are biased if they don't take into account the prior probability of firefighting or whatever in the population. But there are ways in which you can explain that as also being kind of rational.

And so I like Bayesian models of cognition because, at least within the model you're working with, things tend to be a little more well defined. So when people don't look Bayesian, like you show them visualizations and they don't appear to update in a Bayesian way, you can learn a lot by asking, okay, how could [00:23:00] we adjust this model to try to account for some of what we are seeing? How do we explain this bias in the context of a Bayesian framework? And I've found that really useful for getting insight into things like biases in how people use visualizations that I wasn't aware of.

I knew people were biased, but I couldn't really give you a good formal description of it, and doing these kinds of studies has helped me get a deeper sense of what we're dealing with when we're designing. So for instance, one example, and this is not something I discovered, since there have been a few papers on this in behavioral econ as well: we always think people are bad with sample size, or that people don't understand variance.

And we often think of belief in the law of small numbers: you show people some small sample and they will over-update their beliefs. But what we have found in some of these Bayesian cognition experiments, which appears to be corroborated in some of the behavioral econ literature, is that what appears to be much [00:24:00] worse is people under-updating their beliefs when they have a large sample.

So you do see this bias where they overestimate from small samples, but you see a much more extreme bias where, no matter how much data you show people, they are never certain about the parameter estimate. This has been called non-belief in the law of large numbers, and I think it's one of the strongest biases we've seen in these styles of experiments.

And then we can use this deviation from Bayesian updating as a goal to keep in mind when we're designing visualizations. We can ask: if we change the way we show uncertainty, can we get people closer to treating the data as being as informative as it really is?

One thing we did that was kind of interesting: we had a simple Bayesian model where people were estimating proportions, like rates of a disease. So first you get their prior: what do I think the rate of [00:25:00] dementia among the elderly is in the United States?

And then we show them some data, and it could be a small data set, like here's a survey of people in assisted living centers with dementia, or a very large sample of elderly people in the US and how many had dementia. Then we get their posterior beliefs. Using the update they made, and given the prior they gave us, we can then calculate what we called the perceived sample size.

In other words, if they were a Bayesian, what sample size did they perceive the data set they were shown to have? And what we found is that you can show people a huge dataset, like 750,000 elderly people, and they will update as though you showed them a survey of 600 people.
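One way to compute a perceived sample size, sketched here in Python under our own assumptions rather than as the exact procedure from the study: summarize each elicited belief by its mean and standard deviation, convert it to a Beta distribution by the method of moments, and read off the concentration a + b, which behaves like a count of observations. Under a conjugate Beta-Binomial update, the posterior concentration equals the prior concentration plus n, so the difference between the two elicited concentrations is the sample size the participant acted as if they had seen. The elicited values below are made up for illustration.

```python
def beta_concentration(mean, sd):
    """Method-of-moments a + b for a Beta distribution with the given mean and sd."""
    var = sd ** 2
    if not 0 < mean < 1 or var >= mean * (1 - mean):
        raise ValueError("no Beta distribution has this mean and standard deviation")
    return mean * (1 - mean) / var - 1


def perceived_sample_size(prior_mean, prior_sd, post_mean, post_sd):
    """Sample size a Bayesian would need to move from the elicited prior to the elicited posterior."""
    return beta_concentration(post_mean, post_sd) - beta_concentration(prior_mean, prior_sd)


# Made-up elicited beliefs about a disease rate (not data from the study).
n_perceived = perceived_sample_size(prior_mean=0.10, prior_sd=0.06,
                                    post_mean=0.12, post_sd=0.02)
print(f"perceived sample size: {n_perceived:.0f}")  # -> 239
```

This sketch uses only how much the participant's uncertainty shrank; a fuller model would also check whether the posterior mean moved toward the observed rate. Comparing the perceived value with the actual n shown gives the kind of gap Jessica describes.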

And then you can use this perceived sample size as a metric as you're trying to design better uncertainty visualizations, and track how far you can push it up, et cetera. So like I said, it's a different way of trying to [00:26:00] capture what one of these biases actually means. But there are also interesting questions, like, well, maybe people never believe huge samples because they never fully trust the people who are preparing the data.

I think there's always some distrust as well when you're showing people data in data communication scenarios; we never fully trust the polling agency or whoever it is. So we can start to think about why it is that people never quite believe the data. So there's a lot of Bayes seeping into my work in those kinds of ways.

Yes. I was actually thinking about that insensitivity to sample size the other day, because, since I'm a nerd, I cannot go to a restaurant without checking the ratings. And each time I ask someone about the rating, I always follow up with: but how many ratings?

And it's funny, because people usually don't pay attention to that at all. Very often someone will tell me, oh, let's go to [00:27:00] that restaurant, it's rated 4.9. And my first question is: yeah, but how many ratings are there? It's funny to me because for me it's a reflex, but most of the time people don't think about it.

Often they will say, ah, yeah, right, that many ratings. But sometimes they say, yeah, but we don't care, as if it's not important. I find that super interesting: most of the time people know it's something they should pay attention to, but they also don't really care that they're not paying attention to it.
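The "how many ratings?" reflex can be made explicit with a shrinkage estimate, sometimes called a Bayesian average. This sketch assumes a prior mean of 4.0 stars worth 20 virtual ratings; those values and the function name are illustrative assumptions, not any review site's method.

```python
def shrunk_rating(mean_rating: float, n_ratings: int,
                  prior_mean: float = 4.0, prior_count: float = 20.0) -> float:
    """Pull an average star rating toward a prior mean (a 'Bayesian average').

    prior_count behaves like that many virtual ratings at prior_mean, so a
    restaurant with few ratings is pulled in hard and a popular one barely moves.
    Both defaults are illustrative assumptions, not values any platform uses.
    """
    return (prior_count * prior_mean + n_ratings * mean_rating) / (prior_count + n_ratings)


few = shrunk_rating(4.9, 10)      # (80 + 49) / 30 = 4.3
many = shrunk_rating(4.5, 1000)   # (80 + 4500) / 1020, about 4.49
print(f"4.9 stars from 10 ratings   -> {few:.2f}")
print(f"4.5 stars from 1000 ratings -> {many:.2f}")
```

Under these assumptions the 4.9-star place with 10 ratings ranks below the 4.5-star place with 1,000, which is exactly the information the follow-up question is after.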

Yeah. This makes me think of multiple things. I think of this as relying on unstandardized effect size. You can measure effect size in different ways, but say you have two distributions: the unstandardized effect size is just the mean of one versus the mean of the other. A standardized effect size, like Cohen's d, which I don't necessarily like, tries to capture the difference in means while also accounting for the [00:28:00] variability in each group.

And one of the things we actually see a lot in some of the previous experiments I've done is that you show people visualizations of distributions.

If they can pick out where the mean or the central tendency is, they're highly likely to ignore the uncertainty, take that difference in means, and map it onto whatever response scale they need to answer on. So say we ask them: if you were to go to this restaurant with the higher rating rather than this other restaurant, what's the probability that it would actually be better?

Or: what's the probability that this restaurant is actually better than that one? People will answer those questions as if they're only looking at the unstandardized effect. At the same time, as you were talking, I was thinking that there are ways in which you could look at that kind of behavior as maybe rational.
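To make the distinction concrete, here is a sketch that computes the unstandardized difference, Cohen's d, and the probability that one draw from A beats one draw from B, assuming two independent normal distributions with equal group sizes. The quality-score numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def effect_sizes(mu_a: float, sd_a: float, mu_b: float, sd_b: float):
    """Three ways to summarize 'how much better is A than B' for two normal groups."""
    diff = mu_a - mu_b                                # unstandardized: just the gap in means
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2.0)  # equal-size groups assumed
    cohens_d = diff / pooled_sd                       # standardized by the spread
    # P(one draw from A beats one independent draw from B)
    p_a_beats_b = normal_cdf(diff / math.sqrt(sd_a**2 + sd_b**2))
    return diff, cohens_d, p_a_beats_b


# Same gap in means, very different spreads (hypothetical quality scores).
tight = effect_sizes(4.4, 0.2, 4.0, 0.2)
wide = effect_sizes(4.4, 1.0, 4.0, 1.0)
for label, (diff, d, p) in [("tight", tight), ("wide", wide)]:
    print(f"{label}: diff={diff:.2f}  d={d:.2f}  P(A>B)={p:.2f}")
```

Both scenarios have the same 0.4 gap in means, which is all an unstandardized reading sees, yet the probability that A is actually better differs a lot.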

You could probably figure out a model under which it is rational. Say my utility is based on whether I get to experience the best possible restaurant. If you give me [00:29:00] two restaurants, and one has 4.5 stars with a thousand reviews and the other has 4.9 stars with, I don't know how many you need, 10 reviews, then since there's more variance in the smaller sample, the upper bound on quality is potentially higher if I go with the smaller-sample restaurant. Does that make sense? It reminds me of the multi-armed bandit problems people think about: I should go with the option that my uncertainty suggests could have the higher value.
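The bandit intuition can be sketched with UCB1, one standard bandit algorithm, swapped in here for illustration; the conversation doesn't name a specific algorithm. Treating each review as a "good visit" success, and the counts below, are hypothetical.

```python
import math


def ucb1_index(successes: int, pulls: int, total_pulls: int) -> float:
    """UCB1 score: observed success rate plus an exploration bonus that
    shrinks the more often an option has been tried."""
    return successes / pulls + math.sqrt(2.0 * math.log(total_pulls) / pulls)


# Hypothetical counts: a 'success' is a visit the reviewer rated as good.
restaurants = {
    "few_reviews": (8, 10),       # 80% good, but only 10 visits
    "many_reviews": (850, 1000),  # 85% good over 1000 visits
}
total = sum(pulls for _, pulls in restaurants.values())
scores = {name: ucb1_index(s, n, total) for name, (s, n) in restaurants.items()}
choice = max(scores, key=scores.get)
print(scores, "->", choice)
```

Even with a lower observed rate, the rarely reviewed restaurant wins because its exploration bonus is large; with more visits that bonus shrinks and the choice settles on whichever option is actually better.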

So it's weird, but yeah, I totally agree. I think about this kind of stuff a lot too; Yelp reviews have always thrown me off. How do people actually use these things? Yeah, exactly. It's basically a high-risk, high-reward situation, and maybe it's related to the bias we have for the lottery, where the probability of winning is basically zero.

But people still play, because the potential reward is huge and the risk is low. Well, that depends on how many [00:30:00] times a day you play the lottery, but put that aside. Here it's kind of the same: maybe that restaurant really is amazing, so I really want to try it, even though only 30 people have rated it, you know?

Or you could have another effect, which is: yeah, but I'm not like the people who rate restaurants, I'm different, I don't have the same tastes, so I don't actually care about the reviews. Yeah, that's hard. I can't explain that with any sort of statistics. Yeah. I have a friend who always tells me that, and I never know how to answer except: I don't think you're that special.

Yeah, no, there's interesting stuff there, like the whole wisdom-of-the-crowd thing and combining advice from people. There's so much fascinating research on that: in some conditions, asking a bunch of people leads to a biased estimate, but if you ask them in the right ways and incentivize them in the right ways, you can get much better estimates.
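A toy simulation of one way a crowd can go wrong: averaging many guesses washes out individual errors, but not an error everyone shares. The shared-bias setup and all numbers are assumptions for illustration; incentive schemes are not modeled.

```python
import random
import statistics


def crowd_estimate(truth: float, n_people: int, noise_sd: float,
                   shared_bias: float = 0.0, seed: int = 0) -> float:
    """Average of n_people guesses. Each guess = truth + a bias everyone shares
    + that person's own independent error."""
    rng = random.Random(seed)
    guesses = [truth + shared_bias + rng.gauss(0.0, noise_sd) for _ in range(n_people)]
    return statistics.fmean(guesses)


for n in (1, 10, 1000):
    independent = crowd_estimate(100.0, n, noise_sd=20.0)
    biased = crowd_estimate(100.0, n, noise_sd=20.0, shared_bias=10.0)
    print(f"n={n:5d}  independent errors: {independent:6.1f}   shared bias: {biased:6.1f}")
```

As the crowd grows, the first estimate closes in on the truth of 100 while the second closes in on 110: more people only help with the part of the error that is independent across them.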

I find [00:31:00] all of that combining-expert-judgments stuff fascinating. In this case we're not talking about experts, but how you combine information from multiple forecasters is, I think, a super interesting question. At least with restaurant reviews, people aren't incentivized to be dishonest, for the most part.

Unless maybe you know the restaurant owner. Some people might be; some of them are. Yeah, but for the most part, and that's also the point, if the sample size is bigger, these effects are drowned out, right? Whereas if you have 30 reviews, well, the first thing I would ask my friends and family is: go there and give it five stars.

Right? I think I could get 30 reviews pretty easily. Anyway, now it's making me hungry; it's almost dinner time here. So you talked about some of those biases, and I found that super interesting because you also related them to Kahneman and Tversky's research, which is something I always nerd out about.

So that's super interesting to me. And I'm [00:32:00] actually curious about those biases you talked about when people look at or deal with data, which is the main focus of your research: are they less pronounced in people who work with data every day, or are we all subject to them, no matter how experienced we are?

Yeah, I don't know. I've been asked that question before, and I think it's hard to answer definitively. You could do a bunch of studies with different populations and try to capture differences in these biases, and I'm sure you'd find some differences and some similarities. This idea of conservatism in belief updating is, I think, supposed to be fairly universal, but I'm sure some people are better at it.

I would imagine you or I place a little more weight on the data as the sample size gets bigger, maybe more so than people who don't think about stats at all. So, yeah, I [00:33:00] can't answer that question so much, or I don't really want to answer it, because I would have to go out and do a bunch of studies, and there's no end to the studies I could do.

And I'd probably find differences and similarities. I would think about it more as: what is a meaningful way of even measuring bias that would allow us to make clear comparisons across these groups? Because the kinds of analyses you do as an expert are often much more complex than what a layperson does.

If I'm thinking about experts, say data scientists doing visual analysis of data, I care about the biases insofar as they affect the outcome of their exploratory analysis session. So when I think about analysts versus laypeople, it's a very different question that I want to ask in the first place.

I do think having a good definition of bias within some framework, maybe a Bayesian modeling framework, can help. But [00:34:00] given my interest in theory lately, a lot of the questions I get asked feel like empirical questions that I can't really answer without doing a lot of studies, and I don't really trust studies that much, so I can't really say.

Yeah, actually, talking about studies, I'm wondering if you have a favorite paper, study, experiment, or theoretical work that is particularly dear to your heart on those topics and that you'd like to share with listeners? Sure. Since we've been talking about this non-belief in the law of large numbers idea, there's a model that tries to capture the mechanism by which it arises, where people discount the data more and more as the sample size gets larger.

There's a paper by some behavioral economists where they coin this term, non-belief in the law of large numbers, and propose a model. A normal Bayesian agent who is, say, trying to [00:35:00] estimate some rate knows that as the sample size gets bigger, the sample proportion is going to converge to the true rate. They propose that people act as if they know there's some true rate, and they're going to see a sample whose proportion they can use to estimate what that true rate is.

But they act as though there is a true rate, then a rate is sampled from a distribution around that true rate, and then the actual data are sampled from that sampled rate. So they're acting as though there's an extra layer of uncertainty, which means you're never going to see their estimated rate converge after seeing data.

They basically never believe that the sample proportion converges to the true rate. There are some behavioral econ papers I like because they propose a mechanism for what's going on, and that one comes to mind. Then, given [00:36:00] decision data from some experiment, you can try to figure out the variance of the distribution around the true rate from which people think the rate you're sampling from is drawn.
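A sketch of the two-stage belief described above, assuming the "extra layer" is a Beta distribution centered on the true rate with a hypothetical concentration `psi` (the paper's exact specification may differ). Under that assumption the sample proportion follows a beta-binomial distribution, whose spread levels off instead of shrinking to zero.

```python
import math
import random
from typing import Optional


def sd_of_sample_proportion(theta: float, n: int, psi: Optional[float] = None) -> float:
    """Spread of the sample proportion in n draws when the true rate is theta.

    psi=None: standard binomial model; the spread shrinks like 1/sqrt(n).
    psi=float: NBLLN-style belief in which each sample's rate is itself drawn
    from Beta(psi*theta, psi*(1-theta)); the spread never falls below
    sqrt(theta*(1-theta)/(psi+1)), however large n gets.
    """
    base = theta * (1.0 - theta)
    if psi is None:
        return math.sqrt(base / n)
    return math.sqrt(base / (psi + 1.0) * (1.0 + psi / n))


def draw_sample_proportion(theta: float, n: int, psi: float, rng: random.Random) -> float:
    """Simulate the two-stage story: draw a rate around theta, then n Bernoulli trials from it."""
    local_rate = rng.betavariate(psi * theta, psi * (1.0 - theta))
    hits = sum(rng.random() < local_rate for _ in range(n))
    return hits / n


for n in (10, 1_000, 750_000):
    print(f"n={n:>7}  binomial sd={sd_of_sample_proportion(0.3, n):.4f}  "
          f"NBLLN sd={sd_of_sample_proportion(0.3, n, psi=100):.4f}")
```

At n = 1 both models agree; as n grows, the binomial spread keeps shrinking while the NBLLN-style spread stalls near sqrt(theta*(1-theta)/(psi+1)). A parameter like `psi` is the kind of quantity one would try to estimate from experimental decision data.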

So, yeah, there are a lot of great papers at the intersection of cognitive science, economic theory, and behavioral economics. It's all very core to uncertainty and to trying to estimate the impacts of uncertainty, both in reasoning under uncertainty and when we fit a model.

I guess they're all about behavioral data: we have some model and we're trying to describe behavioral responses with it. There's another fascinating paper that tries to capture how much inherent uncertainty there is in some of the tasks we ask people to do in these experiments.

So how well could our model possibly do at explaining some of these phenomena? Because people always have an element of randomness as well. And there's some interesting work [00:37:00] that comes to mind that's about trying to... sorry, maybe I'm not even going to get into that paper.

I think it's just going to be too complicated to explain. Okay. I do find these kinds of topics and papers at the intersection of visualization and behavioral research really fascinating. As you were saying at the beginning, thinking about the way we think is something I enjoy a lot too.

But in the end, I'm curious whether we can train ourselves to become better at interpreting data. In a way, I'm wondering if that's what you're trying to do, because you identify the biases and pitfalls we fall into and then try to set up tools to circumvent them. Is the goal to train ourselves to get better intuitively, or is it just to have better tools that take those biases into account?

Yeah, I guess I approach it [00:38:00] more from this perspective: if we think about visualizations as abstractions of some information, I'm interested in finding the most robust abstractions, so that even if people don't think very hard about what they're doing, their inferences will be more robust.

So I prefer to think about it in that way. A very different view would be: okay, I'm a visualization researcher, and I'm going to design different training exercises that walk you through how to think about uncertainty. I find that more boring from a scientific perspective.

I'm sure you could do that. But I think about it more as wanting to find the abstractions that are robust. What's interesting, though, and we've seen this at times in some of the decision-theoretic style experiments we've done, is what happens when you use a visualization that makes uncertainty feel a little more concrete, things like probabilistic animation, where you're actually seeing draws [00:39:00] from a distribution rather than a summary of a distribution like an interval. In a few studies we've used within-subject designs, so you use one type of visualization and then another type.

And we do sometimes see a learning effect: if you use the better visualization of uncertainty first, you do better with the inferior ones. For example, we did a study where we showed people the effect of sampling error on the jobs report.

The jobs report in the US comes out every month, I think, and it's basically the monthly number of jobs added to the economy. The New York Times had done an article at one point showing that even if there's no growth in the job market over the year, because of sampling error you can still see what looks like a pattern: it looks like there's growth, or it looks like there's a decline, et cetera.

So they made these animated visualizations to show people that, under sampling error, even if you have [00:40:00] no growth, you could see reports that look like any of these samples. And what we did, a few years ago now, was an experiment where we showed people hypothetical jobs reports.

Each was just a set of 12 monthly estimates of jobs added to the economy. Then we showed them two possible models that might have generated that data. One was no growth in the job market, so totally flat. The other was a steady growth trend, a very simple linear model.

Then we varied how we showed them those two models. Either they saw a bar chart with no growth or a bar chart with steady growth, with error bars on each bar to designate the sampling error. Or we showed them each model as a probabilistic animation: you see the no-growth model, but as an animation showing all the things you could see under no growth, and the same for steady growth.

They had to look at the hypothetical data sample and say: [00:41:00] which model is more likely to have generated this? And, like I said, we tested either bar charts with error bars or what we call hypothetical outcome plots, these probabilistic animations where you see a sample on each frame.

We found that people who used the probabilistic animation first ended up doing better at telling which model created the data when they later used error bars. So I think if you give people good metaphors for uncertainty, it can potentially transfer to other tasks.
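A sketch of the inference participants were asked to make: compare how likely a 12-month report is under a flat model and a linear-growth model, assuming independent Gaussian sampling error. The 150k flat level, the 100k-plus-10k-per-month trend, and the 50k error are hypothetical numbers, not the values from the study or the New York Times piece; `hop_frames` shows the kind of draws a hypothetical outcome plot would cycle through.

```python
import math
import random

MONTHS = 12
SAMPLING_SD = 50.0  # hypothetical sampling error, in thousands of jobs


def no_growth(month: int) -> float:
    """Flat model: the same true number of jobs (thousands) added every month."""
    return 150.0


def steady_growth(month: int) -> float:
    """Linear model: jobs added rise by a fixed amount each month."""
    return 100.0 + 10.0 * month


def log_likelihood(report, model, sd=SAMPLING_SD):
    """Gaussian log-likelihood of the monthly estimates under a model."""
    return sum(-0.5 * ((y - model(m)) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
               for m, y in enumerate(report))


def prob_steady_growth(report):
    """Posterior probability of the growth model, with 50/50 prior odds."""
    log_ratio = log_likelihood(report, no_growth) - log_likelihood(report, steady_growth)
    if log_ratio >= 0:  # numerically stable logistic
        e = math.exp(-log_ratio)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(log_ratio))


def hop_frames(model, n_frames, rng):
    """Each frame is one plausible report under the model: the kind of draw a
    hypothetical outcome plot animates through instead of drawing error bars."""
    return [[model(m) + rng.gauss(0.0, SAMPLING_SD) for m in range(MONTHS)]
            for _ in range(n_frames)]


rng = random.Random(42)
report = hop_frames(no_growth, 1, rng)[0]  # one simulated report with no real growth
print([round(v) for v in report])
print(f"P(steady growth | report) = {prob_steady_growth(report):.2f}")
```

Running the last lines a few times with different seeds shows how often a report generated with no growth can still look like growth, which is the point the animations were making.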

But there's a lot of brittleness in these results sometimes too, so I don't want to overstate anything. All I'll say is that we've seen learning effects a few times. But, like I said, I prefer to think about it more as: what's the representation that's robust, even if I don't have time to train you?

Yeah, I see. So something like automated assistance that corrects for those biases. Yeah, I think there's a lot of interesting stuff there too. If we think about these Bayesian models of bias, there are all sorts of things we're not even trying yet. If people over-update [00:42:00] from small samples and under-update from large samples, you can imagine that if I want to show you some big estimate, like an estimate of a rate, maybe I should be chunking the data up.

Say there's a poll on who's going to win the next presidential election. If it's a really big poll, why don't I chunk the data up so that I'm adjusting for your bias? You'll over-update on each smaller piece of data, but if I pick the right number of chunks of this data set to show you, I'll get you to be perfectly Bayesian.
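A back-of-the-envelope version of the chunking idea, under loudly hypothetical assumptions: perception follows `c * n**alpha` (over-weighting small samples and under-weighting big ones), each chunk is perceived independently, and perceived sizes simply add. None of these functions or parameter values come from the research discussed.

```python
def perceived_n(n: float, c: float = 10.0, alpha: float = 0.5) -> float:
    """Hypothetical perception curve c * n**alpha: it over-weights small
    samples and under-weights large ones when c > 1 and 0 < alpha < 1."""
    return c * n ** alpha


def crossover_n(c: float = 10.0, alpha: float = 0.5) -> float:
    """Sample size where perception is exact, i.e. perceived_n(n) == n."""
    return c ** (1.0 / (1.0 - alpha))


def chunks_needed(total_n: float, c: float = 10.0, alpha: float = 0.5) -> float:
    """Number k of equal chunks with k * perceived_n(total_n / k) == total_n,
    assuming each chunk is perceived on its own and the perceived sizes add up.
    Solving k * c * (N/k)**alpha = N gives k = N / crossover_n, so every
    chunk ends up exactly at the crossover size."""
    return total_n / crossover_n(c, alpha)


N = 10_000
k = chunks_needed(N)
print(f"shown whole: perceived as {perceived_n(N):.0f} of {N}")
print(f"shown as {k:.0f} chunks of {N / k:.0f}: perceived as {k * perceived_n(N / k):.0f}")
```

In practice k would be rounded to an integer, and whether perceived sizes really add across chunks is itself an empirical question.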

So I'm more interested in taking these biases and figuring out how we can engineer representations that correct for them without people having to think so hard about what's going on. But that's very hard.

We do see some of these representations fail sometimes, because people just get confused, or they're so unaware of how to use something like probabilistic animation that they end up just ignoring the uncertainty. In some studies, we show people two [00:43:00] distributions as an animation of random draws from each one, and we ask them things like: what's the probability that the value from distribution A is greater than the value from distribution B?

Say A has the higher mean. Rather than watching the animation and counting how often A is greater than B, which gives you a very easy way to estimate that probability, people instead try to estimate the mean of A from the moving visualization, then the mean of B, and then they just think about how far apart the means are.

Which is exactly what you don't want to do; you've just made things so much harder for yourself. It's hard in general. You can come up with visualizations that tend to be more robust in some situations, but it's always hard. And this is part of why I've lost a little faith in empirical experiments.
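The counting strategy that people tend to skip can be sketched directly: treat each animation frame as one independent draw from A and one from B, and tally how often A wins. The normal distributions and their parameters are assumptions; the closed form is included only as a check.

```python
import math
import random


def prob_a_beats_b_by_counting(mu_a, sd_a, mu_b, sd_b, n_frames=20_000, seed=0):
    """Mimic watching a hypothetical outcome plot: each frame shows one draw
    from A and one from B; just count the frames where A comes out on top."""
    rng = random.Random(seed)
    a_wins = 0
    for _ in range(n_frames):
        if rng.gauss(mu_a, sd_a) > rng.gauss(mu_b, sd_b):
            a_wins += 1
    return a_wins / n_frames


def prob_a_beats_b_exact(mu_a, sd_a, mu_b, sd_b):
    """Closed form for independent normals, used here only as a check."""
    z = (mu_a - mu_b) / math.sqrt(sd_a**2 + sd_b**2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


counted = prob_a_beats_b_by_counting(1.0, 1.0, 0.0, 1.0)
exact = prob_a_beats_b_exact(1.0, 1.0, 0.0, 1.0)
print(f"counting frames: {counted:.3f}   closed form: {exact:.3f}")
```

Counting needs no estimate of either mean or of the spread; the tally of frames already is the answer to the question asked.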

There's just so much variation in the world: sometimes people learn things, sometimes they don't, and there's still a lot to figure out. Yeah, I see. To me, in the end, it also seems like [00:44:00] trying to be as automated as possible, basically having assistance, is a better route, because otherwise everybody has to train almost as a statistician and be aware of the biases.

Yeah, I'm sure they go hand in hand as well. The more you build these things into tools, or, for me, the more I think about how we can improve little bits of a visual analysis tool, the more people pick up on these ways of thinking about things. Yeah. I would hope so.

You would hope. But yeah, it's always a balance, I guess. I'm really curious: are there any studies around... I mean, I don't even know how you would do that, but something I'd be curious about is this. Since the advent of FiveThirtyEight, there's way more talk about probabilistic thinking in electoral coverage, at least in the US. However, I'm [00:45:00] curious how much better people understand probabilities and uncertainty, and how that propagates into the popular vote, then the electoral vote and the Electoral College. And I'm wondering if, in a sense, it just makes people who are already data-savvy more data-savvy.

That's good, that's really good, but the people who are outside of that bubble don't really enter it.

Yeah, I think if you look at how people have visualized forecasts, and specifically how they've communicated uncertainty in forecasts, even over the last few years in the US, FiveThirtyEight has drastically upped its game in trying to make people aware not just of quantified uncertainty, but of unquantified or epistemic uncertainty. Things like Fivey Fox are constantly reminding you that these estimates might not be right. And recently they did some new stuff where they basically give you what-if scenarios: in previous elections there was a huge amount of [00:46:00] polling error.

We know we underestimated the polling error then, so: here's the recent survey data for the upcoming election, but now let's simulate what would happen if the polling error were as bad as it was then. How would we have to adjust these estimates? So I think there's a lot more forthright communication of uncertainty.

At the same time, I totally agree. There's some proportion of the population, and it's probably very big, who go to these forecasts for answers. They want to know the probability that their candidate will win. When people just want an answer, it's extremely hard, and I've seen this in my research, to stop them from rounding up: even if you give them a visualization that tries to force them not to, they will still find ways to do it.
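A deliberately simple sketch of the what-if polling-error scenarios for a single race, assuming the true margin is normally distributed around the poll margin. This is not FiveThirtyEight's model, which is far more elaborate; the 3-point margin, error sizes, and 2.5-point shift are hypothetical.

```python
import math


def win_probability(poll_margin: float, error_sd: float, systematic_shift: float = 0.0) -> float:
    """P(true margin > 0) when the true margin ~ Normal(poll_margin + systematic_shift, error_sd).

    Margins are in percentage points; a negative systematic_shift models polls
    that overstate the leading candidate by that much.
    """
    z = (poll_margin + systematic_shift) / error_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


baseline = win_probability(3.0, 3.0)                        # typical polling error
noisier = win_probability(3.0, 5.0)                         # "as bad as last time": wider error
shifted = win_probability(3.0, 5.0, systematic_shift=-2.5)  # wider and biased toward the leader
print(f"baseline={baseline:.2f}  noisier={noisier:.2f}  shifted={shifted:.2f}")
```

The same 3-point poll lead maps to noticeably different win probabilities depending on how much polling error one is willing to assume, which is exactly what a "round up to certain" reading throws away.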

So honestly, in some of my research lately, after spending so long trying to figure this out, I've realized that it's often hard to change these heuristics; they're just built in. And what I've started asking as a researcher is: if these things are so ingrained in people, [00:47:00] there have got to be ways in which they're optimal in some sense.

One of the things that actually surprises me is that we see this tendency to ignore uncertainty in so many settings and just use the means, the unstandardized effect size. There have got to be ways in which that's actually robust that we're just not seeing, because those of us coming from statistics say, oh, you always have to take into account the variance, et cetera.

And I think some of these heuristics actually work well enough. If you have noisy decision settings, there's not going to be a big enough difference, in many cases, between fully taking into account all uncertainty and acting Bayesian, versus being this sort of empirical person who says: I'm just going to use the means and ignore the uncertainty.
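A small simulation to probe the claim that ignoring uncertainty often costs little, under assumed conditions: two options with Normal(0, 1) true values, each observed through the mean of `n_a` or `n_b` noisy draws. It compares picking the higher raw mean with picking the higher posterior mean; all settings are hypothetical.

```python
import math
import random


def simulate_regret(n_trials=50_000, n_a=2, n_b=50, noise_sd=1.0, prior_sd=1.0, seed=0):
    """Average regret of two choice rules between options A and B.

    True values are drawn from Normal(0, prior_sd^2); we observe the mean of
    n_a (resp. n_b) noisy draws. 'Means' picks the higher observed mean and
    ignores sample size; 'Bayes' picks the higher posterior mean, which shrinks
    each observed mean toward 0 by its own precision weight.
    """
    rng = random.Random(seed)

    def shrink(n):
        data_precision = n / noise_sd**2
        return data_precision / (data_precision + 1.0 / prior_sd**2)

    k_a, k_b = shrink(n_a), shrink(n_b)
    regret_means = regret_bayes = 0.0
    for _ in range(n_trials):
        theta_a, theta_b = rng.gauss(0.0, prior_sd), rng.gauss(0.0, prior_sd)
        xbar_a = rng.gauss(theta_a, noise_sd / math.sqrt(n_a))
        xbar_b = rng.gauss(theta_b, noise_sd / math.sqrt(n_b))
        best = max(theta_a, theta_b)
        regret_means += best - (theta_a if xbar_a > xbar_b else theta_b)
        regret_bayes += best - (theta_a if k_a * xbar_a > k_b * xbar_b else theta_b)
    return regret_means / n_trials, regret_bayes / n_trials


for n_a, n_b in [(2, 50), (20, 50), (50, 50)]:
    means, bayes = simulate_regret(n_a=n_a, n_b=n_b)
    print(f"n_a={n_a:2d} n_b={n_b}: regret using means={means:.3f}, Bayes={bayes:.3f}")
```

The Bayesian rule should never do worse on average, but how much it gains depends on how unequal the sample sizes are; with equal sample sizes the two rules make identical choices, so ignoring uncertainty costs nothing in that case.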

So some of my interest in theory is that I think if we apply learning theory, there might be ways to better understand why some of this behavior actually works out for people in the end. Because if we really needed uncertainty all the [00:48:00] time, we probably wouldn't be operating the way we do.

So anyway, I think there are a lot of deep questions there. Now that I'm more senior as a researcher, I can question everything, and I'm learning a lot by thinking in this more adversarial way. Yeah, no, for sure. And this is a bit of my pessimism here, but I kind of feel that making progress in those areas requires people to learn new things, and learning new things is hard. It's uncomfortable. And our species just tries to be as comfortable as possible, as fast as possible.

So you mean learning new things about what? What do you mean? Basically learning about probability and uncertainty and things like that, and being aware that what you see and your intuition are not perfect.

And taking that into account [00:49:00] requires quite a lot of curiosity and wanting to learn. Totally. Yeah. That's what I mean: from a theoretical perspective, there's some payoff function under which it's simply not worth it to invest the time. So we should potentially be thinking about it more like that, to better understand what's going on and why it is sufficient to ignore uncertainty in some settings.

Not in all settings, but I think there are a lot of settings where we can ignore uncertainty and it's actually not that big a deal. Yeah. And we do that all the time, which is a weird thing for me to say as an uncertainty person. No, for sure. And we do that all the time, as you were saying, maybe also for survival reasons: most of the time it's better not to care about uncertainty and just care about the worst-case scenario.

Yeah, actually, all this makes me think of some really interesting work in mathematical psychology on the idea that it can be optimal to make a decision based on just a single sample. There's an interesting paper called [00:50:00] "One and Done" by some cognitive scientists where they formulate a framework in which you can understand that.
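A sketch in the spirit of the "One and Done" analysis, not a reproduction of it: a decision is made by majority vote over k samples from one's posterior, the posterior probability p that an option is right is assumed uniform on (0, 1), and each sample costs time on top of a fixed action time. Function names and time costs are hypothetical.

```python
from functools import lru_cache
from math import comb


def accuracy_with_k_samples(p: float, k: int) -> float:
    """Expected accuracy of deciding by majority vote over k posterior samples,
    when the posterior gives probability p that option A is the right one.
    Each sample 'votes' A with probability p; even-k ties are broken by a coin flip."""
    p_choose_a = 0.0
    for j in range(k + 1):  # j = number of samples voting for A
        prob_j = comb(k, j) * p**j * (1.0 - p) ** (k - j)
        if 2 * j > k:
            p_choose_a += prob_j
        elif 2 * j == k:
            p_choose_a += 0.5 * prob_j
    return p_choose_a * p + (1.0 - p_choose_a) * (1.0 - p)


@lru_cache(maxsize=None)
def mean_accuracy(k: int, grid: int = 1000) -> float:
    """Average accuracy over p ~ Uniform(0, 1), via the midpoint rule."""
    return sum(accuracy_with_k_samples((i + 0.5) / grid, k) for i in range(grid)) / grid


def best_k(action_time: float, sample_time: float = 1.0, max_k: int = 30) -> int:
    """k that maximizes expected correct decisions per unit time, if each
    decision costs a fixed action_time plus sample_time per sample."""
    return max(range(1, max_k + 1),
               key=lambda k: mean_accuracy(k) / (action_time + k * sample_time))


print({k: round(mean_accuracy(k), 3) for k in (1, 2, 3, 10)})
for t in (0.0, 10.0, 100.0):
    print(f"action_time={t:>5}: best k = {best_k(t)}")
```

Under these assumptions one sample gives about 2/3 accuracy against 3/4 for the optimal decision, and when sampling is slow relative to acting, a single sample maximizes correct decisions per unit time.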

Not even waiting to see more samples, just deciding based on a single sample, is actually the utility-optimal thing to do. So it's all in how you look at it; I think that's the moral of the story. Yeah, yeah. Time is running by and we're going to have to close up, and I still have so many questions, but I'm actually wondering: we talked about all those biases and so on, but if you could re-engineer the way Homo sapiens processes data and visualizations, would you change something? And if yes, what?

Yeah, I think that's a hard question, and maybe I can sidestep it by saying that one of the reasons I went into computer science is that I just want to think about abstractions and not have to re-engineer people.

Even more so, though, I don't think I could propose how to re-engineer people without fully understanding [00:51:00] people and how they process information. And I don't think we really do fully understand that, at least not when it comes to how people do good exploratory analysis or good statistical inference.

How you connect what's happening in the eye to everything else is, I think, still very hard. So, no, I'm not going to suggest that I know how to change it. I would much rather think about how, given how people appear to be acting, I can come up with better tools for them, better representations, et cetera.

So, yeah. Would you re-engineer them? I'm curious. No, I absolutely don't know. It was kind of a fun way of asking whether you think there are more optimal ways of doing things in our current environment when it comes to visualization and data.

That was the question. Yeah, I probably should say that, yes, I want people to account more for uncertainty. But, like I said, I've seen the [00:52:00] strength of these kinds of heuristics so much at this point that my take instead is that something optimal is going on here.

There's some reason why, as much as we want to say that people should just account for sample size, et cetera, I don't think it's that simple. So, no, I wouldn't try to re-engineer people. I would first try to see if I can explain it in other ways, which is what I'm trying to do.

So you're uncertain about the importance of uncertainty. That's interesting. Exactly. Yeah. I think it's a natural response, though, to studying uncertainty for so long: you're going to start doubting everything. Yeah, exactly. And in the end that's where I'm at: oh my God, what have I done?

Yeah, exactly. Okay. Just one more question before the last two, but I think it's a quicker one. I'm curious whether, for you personally, thinking about all those studies in your research [00:53:00] changed anything when you're dealing with data, maybe your habits or how you consume data, things like that. Yeah, I guess I've already talked a lot about how it's made me doubt, in a more theoretical way, how much we need uncertainty. But in terms of the kind of stuff I'm doing on a daily basis, how has it affected things?

I think switching to Bayesian methods alone can make you report uncertainty slightly differently, emphasizing it more in your results. And of course, after studying uncertainty visualization for about 10 years now, when I write the results section of a paper or help my students write theirs, I'm much more aware of how little I can actually say on the basis of some data we have, or some user study we did of a tool we built. So it definitely affects things in that respect. We didn't really get into it, but, like I said, with Bayesian methods, I [00:54:00] think the way you interact with uncertainty changes: when you fit a model and want to interpret it, it's so much more natural to get answers to the questions you want, or to query your fitted model in ways that make it easy to show uncertainty flexibly.

So getting into all that has shaped how I think about writing a good results section for an empirical paper. And, like I said, we already talked about it, but I now doubt the value of uncertainty in many situations. So it's obviously affected me in many profound ways.

Yeah. Okay. I think it's time for the last questions, unless... do you have any project for the coming months that you particularly care about or are excited about and want to share with people before we close up the show? I have a lot of projects, but any of them would probably take me a while to really explain.

So yeah, no, I mean, I talked a little bit about my interest in the more theoretical perspective and some of this pre-experiment [00:55:00] analysis you can do. So I'll leave it at that. Okay, perfect. So, as usual, the last two questions I ask every guest at the end of the show. First, if you had unlimited time and resources, which problem would you try to solve?

Well, that one, yeah. I guess following up on a lot of what I've already said today: why is unstandardized effect size the thing that people turn to so often? Why, despite all of our years of arguing for better communication of variation and uncertainty, does it still seem to work in many situations to just look at a difference in means?

I think there's something really interesting there, and again, I think thinking about it as a theorist could be useful, to try to understand some possible ways in which this might be optimal, or in which you don't incur much regret when you ignore uncertainty.

So yeah, I'll probably think about that question anyway, but if I had unlimited resources, I would definitely spend them partly on that question. I am not surprised. [00:56:00] Yeah. And so, final question: if you could have dinner with any great scientific mind, dead, alive, or fictional? Oh, I didn't even notice.

Fictional, that would've been interesting, to create a fictional person to have dinner with. Yeah, I mean, there are so many brilliant people. I guess one problem with brilliant people is that sometimes they're very full of themselves. So I feel like I have to be a little careful about just choosing some brilliant scientific mind, because you could invite some famous statistician from long ago to dinner and find out they're a total asshole.

So I'm going to pick someone I've recently learned more about, whose research has always fascinated me, but who also seems like he was a really inspiring person who went through a lot of hard things himself. And that's David Blackwell. I don't know if you're familiar with him, but he was a statistician, kind of a probability theorist, and he's now dead.

But he did a lot of really interesting work on [00:57:00] information and game theory as well. Some of the work I talked about earlier relates to this: trying to rank different visualizations by how informative they are about a distribution. What can you say about the superiority of one versus the other?

He did this interesting work on what's called Blackwell ordering, which basically says that if you have noisy channels, so representations of some experiment, you can order them and understand, through theory, the ways in which a more informative structure is superior to another.

And I think of visualizations in this way. His language was: if you have two noisy channels, which you can think of as two visualizations or two representations, and you can represent one of them as a post-processing step applied to the other, then you can call the second one a garbling of the first.

And if you have a garbling, then you can do all this theory to understand the ways in which you can never do better with this less [00:58:00] informative visualization. He wasn't talking about visualizations at all, but I find his work really, really interesting. And he also did some work on combining expert opinions.

Just a lot of really interesting stuff that gets into calibration and game theory, where he was looking at these problems through game theory, and it turns out they were also critical, fundamental issues in statistical learning. So he's a fascinating person, and if your listeners don't know who he is, I recommend checking out some of his work.
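The garbling idea Jessica describes can be made concrete with a small numerical sketch. It assumes each "experiment" (or visualization) is a row-stochastic matrix `P`, where `P[s, y]` is the probability of seeing signal `y` when the world is in state `s`, and that a garbling is `Q = P @ G` for some row-stochastic post-processing matrix `G`. Blackwell's result says that no decision maker, whatever their prior-weighted payoffs, can do better with `Q` than with `P`. The helper names, the two-state example, and the random payoffs below are hypothetical choices for illustration, and the randomized check illustrates the theorem rather than proving it.

```python
import numpy as np


def decision_value(prior, experiment, utility):
    """Expected utility of the best decision rule under an experiment.

    prior:      shape (n_states,), prior probability of each state
    experiment: shape (n_states, n_signals), row-stochastic; experiment[s, y] = Pr(y | s)
    utility:    shape (n_actions, n_states); utility[a, s] = payoff of action a in state s
    """
    # Joint probability of each (state, signal) pair.
    joint = prior[:, None] * experiment
    # Entry [a, y] equals Pr(y) * E[payoff of action a | signal y].
    action_values = utility @ joint
    # Pick the best action for each signal, then sum over signals.
    return float(action_values.max(axis=0).sum())


def garbling_never_helps(P, G, prior, n_trials=1000, n_actions=3, seed=0):
    """Randomized check (not a proof) that the garbling Q = P @ G never beats P."""
    Q = P @ G  # post-process P's signals through the noisy channel G
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        # Hypothetical decision problem: random payoffs for each action and state.
        utility = rng.normal(size=(n_actions, len(prior)))
        if decision_value(prior, P, utility) < decision_value(prior, Q, utility) - 1e-12:
            return False
    return True


# Illustrative two-state example; the numbers are arbitrary.
prior = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # a fairly informative "visualization"
G = np.array([[0.7, 0.3],
              [0.4, 0.6]])  # noisy post-processing of its signals

print(garbling_never_helps(P, G, prior))        # True
print(decision_value(prior, P, np.eye(2)))      # guessing the state with P
print(decision_value(prior, P @ G, np.eye(2)))  # guessing the state with the garbling
```

With the state-guessing payoff `np.eye(2)`, this sketch gives an expected accuracy of 0.85 with `P` and 0.605 with the garbled `P @ G`, so the extra post-processing costs this decision maker accuracy, in line with the ordering Blackwell described.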

And like I said, he sounds like a really nice guy who would be inspiring to be around for dinner. Well, love that choice. Yeah, if you have any links about him and his work, you can put them in the show notes, for sure. Sure. Yeah, actually, there was recently a workshop at... I'm at a theory institute in Berkeley right now, and there was a whole day devoted to him, so I'm happy to link that in case people want to learn more about some of these cool contributions.

Oh yeah, perfect. Yeah, I'm pretty sure listeners are going to thank you. The show notes are usually a popular resource, so yeah, for sure. [00:59:00] Awesome. Okay, well, Jessica, it's time to call it a show. Thanks a lot for being so generous with your time and for diving so deep into your explanations. I really learned so much, and I still had so many questions for you, but that's my job: to always be frustrated at the end of the show.

So I guess I did a good job today. So, as usual, I put sources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Jessica, for taking the time and being on this show. Yeah, thank you so much for having me. I've enjoyed talking, and thanks for making my rambling seem appreciated.

It was. Thank you. Okay. Yeah, thanks. Bye. This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and subscribe to the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you [01:00:00] reach true Bayesian state of mind.

That's learnbayesstats.com. Our theme music is « Good Bayesian » by Baba Brinkman (feat. MC Lars and Mega Ran). Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats.

Thanks so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information in. And if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.
