#152 A Bayesian decision theory workflow, with Daniel Saunders
• Join this channel to get access to perks:
https://www.patreon.com/c/learnbayesstats
• Proudly sponsored by PyMC Labs: https://www.pymc-labs.com/contact
• Intro to Bayes Course (first 2 lessons free): https://topmate.io/alex_andorra/503302
• Advanced Regression Course (first 2 lessons free): https://topmate.io/alex_andorra/1011122
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Chapters:
00:00 The Importance of Decision-Making in Data Science
06:41 From Philosophy to Bayesian Statistics
14:57 The Role of Soft Skills in Data Science
18:19 Understanding Decision Theory Workflows
22:43 Shifting Focus from Accuracy to Business Value
26:23 Leveraging PyTensor for Optimization
34:27 Applying Optimal Decision-Making in Industry
40:06 Understanding Utility Functions in Regulation
41:35 Introduction to a Bayesian Decision Theory Workflow
42:33 Exploring Price Elasticity and Demand
45:54 Optimizing Profit through Bayesian Models
51:12 Risk Aversion and Utility Functions
57:18 Advanced Risk Management Techniques
01:01:08 Practical Applications of Bayesian Decision-Making
01:06:54 Future Directions in Bayesian Inference
01:10:16 The Quest for Better Inference Algorithms
01:15:01 Dinner with a Polymath: Herbert Simon
Thank you to my Patrons (https://learnbayesstats.com/#patrons) for making this episode possible!
Links from the show:
https://www.fieldofplay.co.uk/
Hello my dear Bayesians!
In this episode, we're talking about the part of the workflow everyone forgets: actually making decisions.
We spend so much time staring at trace plots and checking our priors, but in the end, the goal isn't just to have a posterior or a beautiful model; it's to use it.
My guest today, Daniel Saunders, is a senior data scientist at PyMC Labs with a PhD in philosophy.
And he spent a lot of time thinking about how to turn inference into action.
So in this episode, Daniel is going to live demo his new Bayesian decision theory workflow.
We'll see how to use PyTensor, which is PyMC's backend, to optimize for profit and utility instead of just accuracy, and why shifting the focus to business value is the ultimate soft skill for data scientists.
This is Learning Bayesian Statistics, episode 152, recorded February 24, 2026.
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference: the methods, the projects, and the people who make it possible.
I'm your host, Alex Andorra.
You can follow me on Twitter at alex_andorra, like the country.
For any info about the show, learnbayesstats.com is Laplace to be.
Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there.
That's learnbayesstats.com.
If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra.
See you around, folks, and best Bayesian wishes to you all.
Daniel Saunders, welcome to Learning Bayesian Statistics.
Thank you for having me.
Yeah, thank you for being on the show.
I'm trying to... I'm thinking... I think you're the first guest that I already knew in real life before coming on the show.
Usually it goes the other way around.
I get to know people online through the podcast, and then somehow, because one of us is traveling, I meet them in real life.
But no, for you, it was the other way around.
So yeah, well done.
You're the first one.
Good to be the first of something.
Yeah, we met at StanCon a couple of years ago, I think, over in Oxford.
Yeah, 2024.
Yeah, that's a nice conference.
I had a lot of fun there.
Yeah.
Yeah, that was great.
That was great.
Plus it was in Oxford, so...
That was super fun, you know, like classic British atmosphere.
Very intellectual, of course.
So I mean, I don't know about you, but that's my element.
I prefer these atmospheres, you know, like brick buildings, and you go to pubs and you get a beer.
And the bar is like 400 years old, which is a nice touch often.
Yeah.
No, incredible.
Like, I mean, I was very surprised by Oxford prices when I came there; I was definitely not expecting those prices, especially for hotels.
But I have to say, I understand why it's so expensive.
It's a great place to be.
Well, they must be used to having the universities comp hotel stays for all their guests.
So paying your own money is a little bit of an anomaly for them.
Yeah.
Yeah, that's for sure.
But that was super fun.
Like I really enjoyed it.
I think it was the first time in my life where I actually kind of felt like a celebrity, because when I arrived in Oxford, the day before the conference, nobody knew me, in theory. I was walking down the street in Oxford, just, you know, soaking up the atmosphere.
And then somebody stops me and is like, hey, sorry, aren't you Alex Andorra?
I'm like, uh well, it depends.
What do you want?
And it turns out he was a fan of the podcast.
And so he was like, well, I love the show.
It's great that you're coming here, and so on.
So yeah, that was the first time it happened to me.
That was pretty fun.
So, you know, everybody can be a celebrity, you just need to find the smallest niche possible, and then boom, that's perfect.
It might be the only place in the world where you get people stopping Bayesians on the street and asking them to confirm their identity.
Maybe in Edinburgh too, isn't that where David Hume is?
And probably also where Bayes is buried in London, the cemetery.
I went there last time I was in London, to Bayes's tombstone.
That was amazing.
But yeah, I have to say it's good; it's the only place, you know. Otherwise, I understand why it could become a hassle.
You know, you're minding your own business, walking in the street, and then you have to talk to people.
I can see how it can be tiring when it's all year long.
Yeah.
I think celebrities have a very difficult life in that regard.
Yeah.
I'm not sure it's a great trade-off, you know.
I'd love to do some A/B testing.
Honestly, if I could do that, you know, and test that for myself, I'd do it out of curiosity, but I think anonymity is a great deal that's often underrated.
That being said, we're not here to talk about me.
We're here to talk about you, actually.
And you have a great background.
I think it's awesome that you're on the show, because you have a PhD in philosophy.
So you're definitely the first guest to have that and such a focus on philosophy, and you have a focus on logic and decision theory.
So that's going to be super fun to talk about.
So yeah, my first question would be, how did you get to Bayesian stats from philosophy?
How did that happen?
Yeah, the transition was gradual, but when I was in graduate school for philosophy during the pandemic, I had a lot of free time on my hands, as many people did.
And I also had a lot of time to think about my future in the philosophy field.
And I started to appreciate that I didn't just want to spend this time acquiring knowledge.
It'd be great to acquire a few skills along the way.
And incidentally, there's a region of philosophy that's interested in mathematical modeling.
They either try to think about the philosophical foundations of it.
Models are kind of strange objects where you want to simplify reality, so they're a little bit fictionalized, but they're also supposed to tell you truths about the world.
So there's a big philosophy puzzle about how that works.
And then people are also interested in using models to understand how do you make the right decisions?
How do societies develop social norms and moral norms?
So I decided to head down that direction, as I thought it would give me some skills.
It's also very interesting.
And one of the main stops on that tour was this area called cultural evolution, um where they're interested in exactly that question about how do you develop moral norms in a
society, what sustains certain moral norms, and what causes norms to change over time.
And it turns out that Richard McElreath is one of the major figures in that field.
uh That's where he got his graduate training.
And he has this characteristically beautiful textbook about evolutionary game theory and social learning.
And so I read that book, loved it, and then was like, hey, what's this Statistical Rethinking project he's working on?
So I got curious enough about that, and I wanted to learn about it too.
And at that point, that was kind of like the middle of my PhD.
I was like, I've found the thing, the thing to really sink my teeth into.
So that book was a pivotal period for me.
And then I just tried to incorporate Bayesian statistics and modeling into everything I did throughout the rest of my PhD.
So I ended up writing a dissertation that explored a few different things, but one of the main themes was the history of cultural evolution as a field and how they think about statistical models.
And in preparation for that project, I was learning a lot of statistical modeling myself.
So yeah, that's the transition.
That's why they're so tightly intertwined in my eyes.
Well, he seems to have that effect on people. That particular book, everyone cites it and says, this was a turning point for me.
Well, that's very awesome.
And Richard, if you're listening, well, thank you so much.
It's amazing to have such an impact.
Actually, talking about turning points: was there a specific aha moment during your academic career where you realized that Bayesian stats was the right framework for the philosophical problems you were tackling?
Yeah, I consider myself kind of a non-ideological Bayesian, in that I didn't really fall in love with it because of its philosophical foundations.
There's this whole, you know, hundred-year-long project of showing that Bayesian rationality is the correct way to make decisions.
And you might think that Bayesian modeling falls out of that kind of motivation, that it's about the optimal way to update beliefs and the optimal way to make decisions.
Um, but for me personally, I didn't find that as persuasive.
What I found really gripping was this whole idea that, um, Bayesian inference allows you to focus on the data generating process.
Um, what you'd need to do to build a Bayesian model is just think really hard about what kinds of causal processes contribute to the phenomenon that you're interested in.
Um, and then it kind of takes care of the rest.
The inference process is not entirely trivial.
It can still be challenging, but it's, um, greatly simplified.
Whereas with frequentism, there's a whole kind of zoo of different estimators and properties of estimators that one has to keep track of.
I feel like that often adds friction to the process of building scientific models.
Once you strip that away, statistics is really fun and intuitive.
And it is connected to things you already care about in science, like developing theoretical models about how processes work.
So the kind of connection to my academic work was that cultural evolution theorists will build these simple models of how human societies evolve over time.
And that's kind of one flavor of modeling, the theoretical modeling.
And then there's statistical modeling, which, in a kind of estimator language, can often feel disconnected from that.
But one of the things I found disturbing or unusual about scientific practice was that these two types of models were often spoken about and developed in isolation from each other.
But I just was like, we're trying to represent one reality.
So there can't be that deep of a separation between the two.
And I think that's the philosophical justification for Bayes that persuaded me.
Yeah.
Okay.
That makes sense.
And yeah, I really love that idea.
With Bayes, you just have one estimator: the posterior.
And then you're done.
You don't have to learn about this whole zoo of different tests, basically trying to see into which box your use case falls.
Not to mention the cases where your use case doesn't really fit into any of the pre-existing boxes.
And then good luck creating a new box in this zoo.
So yeah, to me this is one of the most interesting parts: you just have one estimator, the posterior, and then do whatever you want with it.
That's so powerful.
Almost as simple as it could be.
Yeah.
Yeah.
Doesn't mean easy, but as simple as it could be.
There's actually something I want to ask you about, because I think it matters in our field in general: what are called soft skills, which I refer to as interpersonal skills, which I think is a better way of framing them, because, you know, soft skills are usually a bit underrated.
And coming from philosophy, this is actually something I was thinking someone from that field probably has a better sensitivity to.
So do you think there's at least one soft skill, let's say from philosophy, that you find most undervalued in the data science industry today?
I do tend to think that um philosophers are well equipped to think about model evaluation.
in a way that the disciplinary training of other fields can discourage you from thinking about very clearly.
In classical machine learning, for example, there can be a kind of clarity to model evaluation that is artificial for other fields.
They'll try to figure out which model reduces the KL divergence the most, or which model will have the best out-of-sample predictive performance.
And because it's so clearly measurable, you can just run a sequence of models and rank them on this one metric.
But I think philosophers are a lot happier with ambiguity, and they're a lot happier with trying to justify the model not from its performance but from its assumptions.
So I think the kind of very parametric, very causal approach to modeling that's often popular with Bayesians is something that philosophers are well suited to do.
And I also think this applies in industry. So now I work at a consulting company that does Bayesian models, mostly for private industry.
I feel like one of the ways a project can go most deeply awry, the way you can waste a lot of really good technical effort, is when senior stakeholders have a totally different expectation about how to assess a model at the end.
There's often this question of like, should we trust this model?
Why should we trust this model?
And just in the short course of my experience in the sector, I've noticed that these conversations, if they start off on the wrong foot, if they start off as, let's find a metric that will tell us what the correct model is, then you can trap yourself.
You can trap both the scientists, who are trying to optimize for that metric, and the stakeholders, who are disappointed when that metric either doesn't do everything that was promised, or their scientists aren't able to hit some target they've decided on.
That whole conversation feels like it got started in the wrong place.
And maybe a little bit less technicality and a little more humanistic approaches to modeling can communicate the kind of uncertainty that we're dealing with when we're building these causal models.
Longtime listeners of the podcast know all about the Bayesian workflow; I think it was even episode six of the podcast, with Michael Betancourt, where we talked about the principled Bayesian workflow.
And you work a lot on decision theory workflows.
How does this workflow extend or change that traditional Bayesian workflow cycle?
Yeah, I think they are typically fairly separate.
And I think the separateness is something that is a virtue of the way I've been thinking about decision theory workflows lately.
The kind of classical Bayesian workflow is that iterative process of refining a model, finding out more about your data generating process and incorporating that back in, and having a variety of prior checks and posterior predictive checks to kind of keep you on track.
The decision theory workflow stuff I've been writing about lately comes out of this observation that in large teams, especially in industrial or business contexts, there's a lot of knowledge you have to have about what the processes are that drive the business, but also what kinds of metrics they care about improving.
What are their constraints, and how can they improve them?
What are the different levers they can pull to improve sales, or decrease wait times, or whatever it is they're trying to solve?
And it can often be helpful to have separate people keeping track of all of that knowledge, because they are conceptually disentangled, you know, in the kind of classical Bayesian decision theory.
The idea is you have a credence function, or a probability function, that assigns probabilities to all the different states of the world.
And then you have a utility function that says, if any one of those states were realized, how good or bad would it be for me, relative to all the other things that could materialize.
So in the foundations, these two things are separate.
And if someone gives me a really good posterior, I can optimize using that.
But frankly, if someone gives me a bad posterior, I can still optimize using that.
And I don't need to know exactly how the posterior was generated to develop a good tool for creating inferences on the basis of it.
Whatever your beliefs are, we can tell you the best decision given those beliefs.
So the decision theory workflow is a kind of set of tools for what to do after someone's developed a good model or even while they're developing the model.
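For listeners who want this in symbols: the classical rule Daniel is describing is to pick the action that maximizes expected utility under the posterior, which in practice you approximate by averaging over posterior draws (standard textbook notation, not pulled from the blog post):

```latex
a^* = \arg\max_a \; \mathbb{E}_{\theta \sim p(\theta \mid y)}\big[\, U(a, \theta) \,\big]
    \approx \arg\max_a \; \frac{1}{S} \sum_{s=1}^{S} U\big(a, \theta^{(s)}\big)
```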
Yeah.
Yeah.
Okay.
I like that you make them, you know, separate but complementary, because I do agree they don't have completely the same goal and the same product in the end.
So sometimes you just need a modeling workflow, and a lot of the time you also need the decision theory workflow on top of that.
I'm not aware of cases where you just need the decision theory workflow and not the modeling one, but that probably exists somewhere.
I don't think I've ever encountered that.
One thing, though, that I think is very important to have in mind and to tell the listeners: I think a lot of the time the need for the decision theory workflow on top of the modeling one is underestimated.
People don't really think they need it, or they haven't even considered it; it's an unknown unknown, in a way, for them.
And I mean, in industry, what I've seen a lot is that often you optimize for metrics like RMSE or MAPE. But in your decision theory framework, how do you shift the conversation from "how accurate is the model" to "how much value does this decision create"?
Yeah, I think that um orienting the model evaluation process around decision making can help um dispel some of these misconceptions about what makes for a good model.
um So at the beginning of this conversation, I alluded to that there's this kind of project in philosophy of science that's interested in this puzzle about models.
Models simplify reality.
Sometimes they introduce these artificial parameters that don't tie to anything in the real world.
Sometimes they introduce explicitly false assumptions because the false assumption simplifies the math in some way that's really productive.
And you might be a little bit confused about how this practice works, because if science is supposed to give us better and more accurate pictures of reality, then what are all these simplifications and idealizations doing in modeling?
One of the prominent answers to this puzzle is to say that every model has a purpose and you only need to figure out whether it's good enough at its particular purpose.
So in industry and science, it seems like there are often three purposes: forecasting, understanding, and decision-making.
And a model doesn't need to be good at all three.
In fact, it can often be to its advantage to pick one to prioritize.
You know, very-difficult-to-explain nonparametric models can be great at forecasting but terrible at informing understanding.
If the end goal of the model is to tell you whom to draft in an upcoming sports draft, or how much advertising money to put into particular channels, or which discounts you should run on the product, then you only really need to make sure it's making those decisions as well as it can.
And this lets you be a little bit strategic about how you approach model evaluation, because instead of saying R squared must be above 0.9 before we'll accept the model, you can know that adding additional features to try and bump up the R squared doesn't influence the parameters that affect your decision-making in any meaningful way.
And you can get to a point where you can quantify the cost or value of making your model more complex.
You could say, if I added all these features to the model, if we ran the model for three hours instead of 20 minutes, it would save you 20 bucks.
And I think that's a really powerful way of anchoring these more abstract conversations about model evaluation, and something that businesses can respond to.
Yeah, that makes a lot of sense.
And I really think this way of thinking usually makes much more intuitive sense to business stakeholders, because you're relating it to actual business metrics instead of just statistical metrics.
And this is usually much more valuable and concrete for non-technical stakeholders.
Also, it's a much more direct thermometer of what a model is actually doing and whether it's lifting the bottom line or not.
And in the end, that is really what you care about.
And on that note, you know, for the listeners who see PyMC or Stan or NumPyro, or any probabilistic programming language, as just a way to sample from the posterior: what is the advantage of using PyTensor, which is PyMC's backend?
This is what you're using in your blog post; we'll get to that very soon, but basically you have that new blog post about a decision theory workflow out.
That's what we're talking about today.
And of course, in that blog post, you're touching on PyTensor and symbolic computation.
So can you tell us what the advantage is of using that for the optimization side of things?
And as a follow-up question, because I think this is a good place to ask it: why can't we just, you know, run scipy.optimize in a loop over the posterior samples?
Yeah, yeah, two good questions.
So, PyTensor is a kind of unusual backend for a machine learning library these days.
It has its history in a project that started around 2012 and was officially sunset in the late 2010s.
But PyMC decided to revive it because we saw a particular kind of value in the package that wasn't being captured by things like JAX or TensorFlow.
And the value is that it gives you a representation of a sequence of computations in terms of a compute graph.
And when I first started hearing people talk about this, I found it too abstract to really grasp.
But if you take a simple computation like 5 times 10 plus 3: in elementary school, you learn an order of operations that says you have to do the multiplication first and then the addition.
So you can think about that sequence of operations as a tree, where the leaves of the tree are the three numbers: 5, 10, and 3.
And then the tree combines 5 and 10, and then it combines again the product of those two with 3.
So you get one final answer, which is the root node, and then a kind of branching tree that goes up to the atomic parts, the leaves.
PyTensor lets you look at this tree, and it lets you operate on this tree.
So you can do things like: I want to replace 3 with a different number; I want to replace the addition operation with a subtraction operation.
Or I want to extend the tree: I can take the product of some sequence of computations and then glue it into a different tree, and then continue working on that.
So that's the big idea behind PyTensor.
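To make that concrete, here is a minimal PyTensor sketch of the 5 times 10 plus 3 tree and the kind of graph surgery Daniel describes; this is my own toy example, and `graph_replace` is the utility recent PyTensor versions expose for this kind of substitution:

```python
import pytensor
import pytensor.tensor as pt
from pytensor.graph.replace import graph_replace  # recent PyTensor versions

# Build the 5 * 10 + 3 tree symbolically: the leaves are a, b, c;
# the root is the addition, and one branch is the multiplication node.
a, b, c = pt.scalars("a", "b", "c")
expr = a * b + c

pytensor.dprint(expr)  # inspect the tree: an add node whose inputs are mul(a, b) and c

# Graph surgery: swap the leaf `c` for a new sub-expression d**2
# without rebuilding the rest of the tree.
d = pt.scalar("d")
[expr2] = graph_replace([expr], {c: d**2})

f = pytensor.function([a, b, d], expr2)
print(f(5.0, 10.0, 3.0))  # 5 * 10 + 3**2 = 59.0
```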
The reason why this is so powerful for optimization is that it lets you maintain a division of labor between what your modelers are doing and what your decision makers, the people who work on algorithmic decision making, are doing.
And we're going to fit some function that links the price data to the sales target.
um The person working on the optimization doesn't need to know terribly much about what the sequence of operations is that connects A to B, price to sales.
They can say, I want to start the sequence at price.
I want to get to the sales.
And then after that, I would like to add on a bunch of extra stuff to this tree, stuff that's
specific to my needs, you know, maybe I need to multiply sales by the unit profit in order to get the objective function I want.
The modeler doesn't care about the unit profit, but the optimizer person does.
So that's one benefit: the modeler can work in isolation, and the optimizer person can work in isolation.
And then they can quickly glue their parts together by representing the model as a graph and operating on it.
The second benefit is that you can turn any section, or the entire tree, into an evaluable, callable function.
And many optimization libraries, SciPy in particular, have been built around being able to optimize arbitrary callable functions.
So you extract a compiled, efficient, nice function out of a PyMC model and you hand it over to SciPy, and you say: here are my inputs, here's my output, find the best combination of inputs.
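As a toy illustration of that handoff (not Daniel's code), you can compile one PyTensor function that returns both the objective and its gradient, and let scipy.optimize.minimize drive it; jac=True tells SciPy the callable returns a (value, gradient) pair:

```python
import numpy as np
import pytensor
import pytensor.tensor as pt
from scipy.optimize import minimize

x = pt.scalar("x")
objective = (x - 3.0) ** 2  # stand-in objective for illustration
value_and_grad = pytensor.function([x], [objective, pt.grad(objective, x)])

def fun(x_arr):
    # SciPy passes a 1-D array; our compiled function takes a scalar.
    val, grad = value_and_grad(x_arr[0])
    return float(val), np.atleast_1d(grad)

res = minimize(fun, x0=[0.0], jac=True)
print(res.x)  # -> [3.]
```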
So that, I think, answers your first question.
But you also asked about why not run the full posterior through a sequence of SciPy applications.
Yes.
So you could.
You could take each sample and say, conditional on this one sample, what is the best price for my product?
The funny thing about that, though, is you would end up with 2,000 different best prices, and you would need another operation to decide between those different prices.
The natural thing to do there is to take the average of the different recommended prices.
What's nice about PyTensor is that you can put that averaging process inside the graph.
So you can vectorize, which is an efficient way of doing several computations in parallel, over your full posterior, and then average the result.
So what you get will be either an expected profit or maybe an expected utility.
And you ask SciPy: what is the best expected utility or expected profit I can get, conditional on this full posterior?
Yeah.
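Here's a hedged sketch of that in-graph averaging: one profit curve per posterior draw, vectorized by broadcasting, then a mean that collapses the 2,000 answers into one scalar objective. The variable names and the $0.30 unit cost are my assumptions, not necessarily the blog post's:

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

baseline_draws = pt.vector("baseline_draws")      # posterior draws, flattened
elasticity_draws = pt.vector("elasticity_draws")
price = pt.scalar("price")
unit_cost = 0.30                                  # assumed unit cost

sales = baseline_draws * price**elasticity_draws  # broadcasts over all draws
profit = (price - unit_cost) * sales
expected_profit = profit.mean()                   # Monte Carlo expectation, in-graph

f = pytensor.function(
    [price, baseline_draws, elasticity_draws],
    [expected_profit, pt.grad(expected_profit, price)],
)

# Stand-ins for draws you would pull from idata.posterior and flatten:
rng = np.random.default_rng(0)
b = rng.normal(99.9, 0.5, size=2_000)
e = rng.normal(-1.1, 0.05, size=2_000)
print(f(3.3, b, e))
```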
I can second that.
I've done both.
First, I was doing it artisanally with SciPy, and then smart people like Ricardo Vieira, one of the main maintainers and developers, and Jesse Grabowski were like, you should probably try to do that with PyTensor.
It's going to be much more principled, let's say, and also much faster, because this is vectorized, as you were saying, whereas if you do that with the SciPy for loop, it's going to take a long time.
This can be a very time-costly process.
I've done that a lot, and I'm very happy that you can do that with PyTensor now.
So I definitely recommend people try that out; you have the PyTensor documentation, but also now Daniel's blog post, which is really amazing and shows you how to do that.
So I definitely recommend checking that out.
Something I was also wondering, actually, Daniel, is how this framework of optimal decision making applies in a concrete environment.
You know, let's say an industrial environment; how does that apply to what listeners are doing in their jobs day to day?
um Yeah.
So it comes up anytime you need to make decisions on the basis of Bayesian models.
The kind of popular topics to make optimal decisions for are things like the price or discount of a product, or how much to spend on various revenue generating activities.
But it also comes up in government context too.
So you need to figure out how much electricity you're going to generate for your province or state or country.
um And it's expensive to generate too much electricity, but it's quite damaging to generate too little.
So you need to figure out some kind of way of balancing those considerations.
And you're also dealing with uncertainty and how much electrical demand there would be.
So this is another kind of context; the range of applications is huge for this kind of optimal decision-making framework.
A kind of nice feature of this PyTensor approach in an industrial context is kind of what you were saying a second ago, about how we've tried it the hard way and now we've tried it the easy way, in this particular project I've been working on.
In my job, we have a very large company who sells products in many markets around the world.
And we're building, by hand, a Bayesian model to understand each market.
And when we first started doing this project, we would take the important functions and rewrite them in NumPy, try to extract the posterior, stuff it into NumPy, and pass that to SciPy to get our decision.
And we realized that every single time anyone made a change to any market about how they want to model some process, we would have to rewrite the optimizer code.
So in terms of the industrial scalability of optimization, I think being able to abstract over computational graphs is super handy, because there's a lot less code to track.
Instead, there's this portable object: you can know more about it if you need to know more about it, but you can also know less about it if you would rather be agnostic to the particularities of the function that you're dealing with.
Yeah.
So good to know I wasn't the only one to do the artisanal SciPy for loop.
I feel better.
Something I've also seen in my experience working in industry is that it's often easier to explain a prediction to business stakeholders than a utility function, I can tell you that.
So, I have an idea of how you might answer this, but I'm very curious to hear your articulated answer: how do we sell the concept of optimal decision theory to non-technical partners who may not care at all about the philosophical and mathematical underpinnings of why that's the optimal decision to make?
Yeah.
Both problems are similar: you're often trying to sell a particular modeling project to non-technical stakeholders, but you're also trying to elicit their understanding of that problem.
So in building Bayesian models, that might be eliciting priors from stakeholders, but it also might be eliciting functional forms that connect inputs to outputs.
And then the utility function is also something that needs to be elicited.
You need to know what levers they can pull, whether there are constraints on those levers, sort of boundary conditions on how much they can put into this lever versus another one.
And then all kinds of particularities about their business: maybe they have a contract that says they have to buy so much from this particular vendor, so that part is locked in, but this other part is flexible, and they're trying to decide which contract to pick up.
But fortunately, I think the decision-making stuff can often be easier to elicit, because that's the thing that is most front and center in non-technical decision-makers' minds.
It is, you know, frankly: is this going to make more money?
um There's a very simple utility function that just says $1 equals one utility, and let's uh maximize expected profit.
So much like with Bayesian models, you can start simple and then add complexity as you learn; you get some feedback that this decision isn't actually possible for us, or maybe they actually care a lot about making at least this much money in the next fiscal quarter, because maybe they need that much money to start some new project they're excited about.
So you can slowly get to a richer and more robust utility function through cycles of feedback.
In some contexts, the utility function is extremely well defined, and you just need to find the right people to talk to.
So in government regulation and regulated industries, the utility function will often be sort of implicitly written into regulations.
Like, the power grid has certain responsibilities to the public, and those responsibilities imply certain utility functions, or certain kinds of risk tolerance that they're allowed to take.
Similarly, the finance department at a large corporation will know exactly how much each additional dollar is worth to them.
So they might have some diminishing marginal utility function that says the next dollar is worth a little bit less than the previous dollar, and we know exactly how much less it's worth.
And if you can get something like that, you get a really powerful and realistic utility framework.
Yeah, I love that.
And basically, yeah, making sure to relate that to dollars is usually a good way to get people interested in it, and to make them understand what this is really all about: that this is not some, you know, research thing that is cool but not actionable.
The universal language.
Yeah.
Yeah, exactly.
Yeah.
So let's go into your new tutorial now.
I think that was a great introduction.
As I was saying, the tutorial is called A Bayesian Decision Theory Workflow.
Of course, folks, you have the link in the show notes.
I think Daniel is even going to show us some live stuff today.
So if you're listening, this is the part where you might want to switch to YouTube to see the video.
Before we dive into the code, though, what is the scenario you've prepared for us today, Daniel?
And why is it a good representation of the struggles that Bayesian practitioners face?
Yeah, I wanted to pick something very simple, easy to explain, but that has all of the essential ingredients of what you would face in a more complex situation.
So we just have some consumer good, and we're trying to figure out what's the price elasticity of that good, and in turn figure out what's the optimal price.
Yeah.
Okay.
A price elasticity is usually something like, um if I cut my price in half, I double my sales or maybe I get 75 % more sales.
um That's what that metric usually represents.
But yeah, that's the setup to the problem.
Okay.
Yeah.
Perfect.
Simple and to the point with some dollar meaning.
So that's perfect.
So now let's start into these.
What do you want to show us right now?
So the data is just synthetically generated, but you might imagine a company that tried varying its price over time.
Maybe they have run their product for a hundred days, and they changed their price once a week.
So this is a really good situation for us: we get a lot of information about how demand responds to changes in price.
And the particular function we're going to use to represent sales as a response to price is just some baseline level of sales times a multiplier.
And the multiplier is the current price raised to some power.
This power is called the elasticity.
So we can imagine, you know, maybe they make a hundred sales a day in the absence of price-related penalties.
Maybe that's the intrinsic demand for the good: there are a hundred people out there who want it.
But then if the price goes up or goes down, you can move the demand in one direction or the other.
To set up the problem, we're just going to pick some arbitrary true parameters, intervene on the model to generate sales data that obeys those true parameters, and then kind of relearn the things we already know: relearn the true baseline sales and relearn the true elasticity.
The reason we do this is because we want to capture a little bit of uncertainty.
If we just knew the right answer, there wouldn't really be any Bayesian part to it anymore; we could just plug that number in and optimize from there.
But in reality, we won't know exactly what these parameters are, so we want to look at the problem as if there's a little bit of uncertainty.
So you can see that we recovered the true parameters very, very well.
The true baseline sales is 100, and we get 99.9 with a tiny little standard deviation around it.
And the true price elasticity is negative 1.1, with a little bit of variation around it.
So we're in pretty ideal conditions, but not perfect information.
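For listeners following along without the notebook, here's a minimal PyMC sketch in the spirit of this setup; the priors, likelihood, and data generation details are my guesses, not necessarily what Daniel's tutorial uses:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)

# ~100 days of sales, price changed weekly; true baseline = 100, elasticity = -1.1.
price_data = np.repeat(rng.uniform(1.0, 8.0, size=15), 7)[:100]
sales_data = rng.poisson(100 * price_data**-1.1)

with pm.Model() as model:
    baseline = pm.HalfNormal("baseline_sales", sigma=200)
    elasticity = pm.Normal("elasticity", mu=-1.0, sigma=1.0)
    mu = baseline * price_data**elasticity  # multiplicative demand curve
    pm.Poisson("sales", mu=mu, observed=sales_data)
    idata = pm.sample()

# The posterior summary should land near the true values (100 and -1.1).
```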
And if you're building a Bayesian model, classically, this is where you would end: you would have a summary of your posterior, and that would be the kind of answer you get.
So this next section is where the PyTensor and optimization process begins.
So to get to something we can optimize, we need to take a few steps.
We need to convert sales into a metric we might care more about.
If you want to maximize sales, there's a trivial way to do that: you set the price to zero, and your sales will go to infinity on this function.
So we don't want to maximize sales per se; we want to maximize profits.
To do that, we'll need something like a unit profit, and we multiply the unit profit by the sales.
And what's nice about the unit profit is that as price increases, your unit profit goes up, and as it decreases, your unit profit goes down.
So it removes that trivial solution to the problem.
And then we can see an initial view of our objective function.
If we knew exactly what the parameters were, if we just plugged in those numbers I told you earlier, we would get a surface that looks like this.
And you can just grid search on that surface to find out that the absolute best price for maximizing profit is $3.30.
So we can use that as a kind of guidepost to see how we're doing.
But when we actually optimize with uncertainty, we're not going to find this answer; we're going to find some kind of approximation to that answer.
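The known-parameter surface is easy to reproduce with a grid search. One caveat: with baseline 100 and elasticity -1.1, a unit cost of $0.30 makes the optimum land at the $3.30 quoted here; that cost value is my back-calculation, not a number stated in the episode:

```python
import numpy as np

prices = np.linspace(0.5, 10.0, 2_000)
profit = (prices - 0.30) * 100 * prices**-1.1  # unit profit times demand curve
print(f"best price: ${prices[np.argmax(profit)]:.2f}")  # ~ $3.30
```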
The next piece of business is that we want to take all of the posterior samples and plug them into our graph.
Initially, we just have these little random variables in the leaves of our graph; we want to strip those out and replace them with posterior samples.
So this is one of the values of operating on graphs, and that's what this function does for us.
And then, if we have our 2,000 or 4,000 samples, whatever it is, we need a scalar goal: a single number that we're trying to maximize.
So we can take the mean of these samples.
And then there's a little bit of PyTensor magic here, which is a bit tricky to explain.
But the nice bit is that at the end, we can pull out a gradient function and an objective function, and these are things we can actually use to optimize with.
This problem is extremely simple; it only has a single variable that you're optimizing, so we don't really need all of the power of SciPy to do something interesting.
If we just grid search over the function, we can get a different surface: what does the model think the objective surface is, given the uncertainties that it has, and what does it think the optimal price is?
So you can see that even though we recovered the true parameters quite clearly, it still recommends a price of around $4, which is quite a bit different from the true optimum.
So a small amount of uncertainty in the elasticity results in a big change in the recommended price.
Yeah, this is super cool.
I think this is also super clear.
If you're watching the video right now, I'm guessing this is very clear.
And if you're just listening, it's well done; this is a good level of explanation.
So now we can show you what it would look like to put this all into SciPy.
And remarkably, it's very compact.
You just pass in the callable function, the initial location, and the bounds.
And you let SciPy know that the gradient of this function is available inside the big objective function.
When we compiled it above, we said one input is the optimizable price, and our two outputs are the profit and the gradient of profit.
So this function has two things inside of it, and SciPy can use both parts quite happily.
I'm rather happy with how compact this representation, this way of coding it up, is.
And you can see that SciPy arrives at the same conclusion we did.
The best price, conditional on the uncertainty, is about four bucks.
So that is the workflow, the first pass of it.
We could talk about it a little bit more: we can talk about playing around with your utility function, or representing different kinds of risks or risk aversion in the function.
Um, but that's the heart.
Yes, so, yeah, thanks a lot, Daniel.
I think this was very clear, and I appreciate you going through that live.
This is very great.
I think all the threads you just mentioned are worth talking about, or even illustrating, if you have something for that.
Maybe we can start with the different levels of risk aversion and how that translates into the cost function, if that works for you; the three threads you've mentioned are things I've seen concretely when working on optimization problems.
Yeah.
So classically, Bayesian decision theory says you have your belief function and your utility function; you take them, you glue them together, and you take the expectation over that.
So you say, what is the maximum expected utility I can achieve?
And I think that's fine for decision makers who are risk neutral.
If you think losses and wins are equally valuable, or you're comfortable with uncertainty so long as on average things will work out, then that's the right decision-making framework for you.
And there's a kind of robust philosophical dispute about whether risk neutrality is the uniquely best position.
But whatever the answer to that question is, people in practice are risk-averse, and businesses might have reasons to be risk-averse that might be beyond your control.
So you can build in all kinds of constraints and special modifications to your function.
And I think this illustrates that nice division of labor: optimal decision making is a complex process, and you need to cognitively separate it from the modeling part to be able to keep track of everything.
So there's a whole family of different ways of handling risk.
If we take a look at what the objective function looks like from the point of view of our uncertainty, you realize that at any given price, there's a wide range of possible sales levels you could achieve.
For those listening, visualize a nice inverted-U function that has a big cone of uncertainty around it.
You can see that at really small prices, the uncertainty shrinks, because you're moving towards that more degenerate case where, you know, your sales are going to move to infinity.
People are going to be buying your product like hotcakes, because it's getting closer and closer to being free, and there's less complexity in the behavior.
Whereas on the other side, as you move to higher and higher prices, there's a huge difference between whether one person buys it or two people buy it.
And as a result, you get a much wider level of uncertainty in the kind of profit you might expect to achieve.
So the different prices are asymmetrical with respect to uncertainty, and you might want to be on the safe side for some reason.
One way to be on the safe side is to have a utility function that diminishes at higher and higher profits.
This is a longstanding idea in the social sciences, that people who are really rich aren't that much happier than people who are a little bit rich.
Yeah.
Once you're past the half-million threshold, there's not that much more to be gained in happiness.
And that's the idea with these exponential utility functions.
This is a very popular technique from economics to model diminishing returns in utility for money.
So there's a whole family of them, but all of them have the same behavior: they saturate as you get more and more money.
So instead of taking your profit and calling it the same thing as utility, you can add an extra layer between your profit and your utility: some function that modifies it.
And if you pass in this exponential utility, if you insert it into the computational graph, you get a different utility function.
And it gives you a kind of adjustable parameter for risk aversion.
The different functional shapes of the exponential utility correspond to different levels of risk aversion.
And you can see that when you have no risk aversion, it still recommends a price of about $4.
But as you slowly increase this risk aversion, the recommended price drops, because all of this stuff in the tail of the posterior that has really high profits but very high uncertainty counts less and less for you.
So the function isn't attracted to anything with a big tail; it tends to prioritize small-tailed regions.
So yeah, this is one common and powerful approach to representing risk aversion, but we could also explore a couple more if we're feeling in the mood for it.
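A common concrete choice here is the exponential (CARA) utility, sketched below in NumPy over per-draw profits; the parameter `a` is the risk-aversion knob, and `a` near zero recovers risk-neutral profit. This is my illustration, not necessarily the exact form in the tutorial:

```python
import numpy as np

def exponential_utility(profit_draws, a):
    # Saturating utility: each extra dollar of profit is worth a little less.
    return (1.0 - np.exp(-a * profit_draws)) / a

rng = np.random.default_rng(0)
profit_draws = rng.normal(100, 30, size=2_000)  # stand-in posterior profits at one price

for a in (1e-6, 0.01, 0.05):
    # As a grows, high-profit tails contribute less to expected utility.
    print(a, exponential_utility(profit_draws, a).mean())
```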
Yeah, so let's do that.
This is awesome; it's extremely practical.
I love that.
I've also been using these kinds of plots a lot when working on optimization problems like that.
And yeah, I think this is extremely valuable, because the concepts of cost function, or reward function, or objective function are a bit complicated to understand at first.
So I think these kinds of plots really help cement the understanding.
Yeah, exactly.
So this first example is just a way of modifying how money maps to utility.
And there's a certain sense in which that doesn't represent a real risk aversion, because it's not the uncertainty that you're worried about: you tend to stay away from high-risk gambles as a product of your diminishing utility for money.
But you could try to build direct risk aversion into your model as well.
So there's this popular technique in financial economics, or mathematical finance, called the mean-variance function.
And it's really nice: you take the average of your utility, and then you subtract some adjustable parameter times the variance.
Intuitively, it's just a penalty for large-variance solutions.
So if you look at the graph that results from that: with zero risk aversion, you get the same price you got before, four bucks.
But as you slide up your risk aversion, you very quickly realize that the recommended price stays very far away from any uncertainty in the posterior; it penalizes all of that uncertainty and moves you towards very cheap products that have less uncertainty in their section of the posterior.
So mean-variance is nice, and it's super easy to implement.
It's just an extra line: where you were saying, take the average, you now say, take the average minus a penalty for variance.
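In code, that really is a one-line change to the objective; a sketch, where `risk_aversion` is an assumed name for the adjustable weight:

```python
import numpy as np

rng = np.random.default_rng(0)
profit_draws = rng.normal(100, 30, size=2_000)  # stand-in posterior profits at one price

risk_aversion = 0.01  # assumed penalty weight
objective = profit_draws.mean() - risk_aversion * profit_draws.var()
print(objective)  # maximize this over price instead of the plain mean
```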
And then there's this other idea called conditional value at risk.
The idea here is that you want to protect yourself in worst-case scenarios.
You say: in the worst 5% of my posterior, what is the best thing I can do?
And the way it works is it basically takes a quantile of your posterior.
It says, give me the lower 5% bound, or the lower 10% bound, and then just treat those posterior samples as something to maximize expected utility over.
So you get a similar kind of behavior.
When you use the lower 100% bound, it's the same as what you had before.
But as you change what bound you're interested in, such that maybe you're looking at the lower 20%, you'll realize that it's just slicing off this section of the posterior and then taking the expected utility over that.
And as a result, it recommends a cheaper product as well.
If you use the 20% lower bound, it says, make your product $2.70.
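And a small sketch of conditional value at risk on posterior profit draws: keep only the worst `alpha` fraction of outcomes and average them, then pick the price that maximizes that quantity (my illustration of the idea, not the tutorial's code):

```python
import numpy as np

def cvar(profit_draws, alpha=0.05):
    # Average of the worst alpha-fraction of posterior outcomes.
    cutoff = np.quantile(profit_draws, alpha)
    return profit_draws[profit_draws <= cutoff].mean()

rng = np.random.default_rng(0)
profit_draws = rng.normal(100, 30, size=2_000)  # stand-in posterior profits at one price
print(cvar(profit_draws, alpha=0.20))
```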
Yeah.
And what's awesome is that, first, it's extremely practical and actionable.
And second, it's also great because you can check your model's results against extremely common business metrics that make sense in your case.
So if the model is saying something counterintuitive, you can decipher much more easily whether it's actually a problem with the model, or it's actually something that was undervalued by us before and that the model helped uncover.
And this is extremely valuable for those kinds of cases.
Yeah, exactly.
There's a common problem where you have been doing things one way for a long time, you build a Bayesian model to tell you how to do things, you look at what it tells you to do, and you're like, that's crazy; it's so different from what we were doing before.
And it's hard to tell who's wrong in that situation.
Maybe the folk wisdom had some wisdom in it.
And these sorts of tricks won't guarantee that you'll end up with what you had before, and they won't necessarily bring you closer.
But they can protect you against quirks of the Bayesian model, where maybe the result is driven largely by some very high-risk, very uncertain, but potentially profitable channel.
And maybe you don't want to put all of your money on that channel, because maybe you did model it wrong, and it's worth slowing down before you change the business plan that much.
Completely.
Yeah, that's really awesome.
Again, I love that you packaged all of that into that tutorial with so many great plots.
Anything else you wanted to show us here before you stop sharing your screen, or something you want to expand on?
That's my story about the value of PyTensor.
It's a great package; here are things you can do with it.
Yeah, yeah, yeah.
Any other topic you want to talk about when it comes to, let's say... not link functions, that's for regressions, but when it comes to cost functions and optimization: threads that we didn't pull yet, or, you know, concrete advice that you give people when they are working on optimization, something you usually do that's very helpful?
I don't think I have too much to say in general; it's a huge topic.
And I think that's partly why the probabilistic programming community has focused on how you build a good model, and then there's a whole other community that is interested in how you make optimal decisions.
These have often stayed a little bit far apart, because they are both really big topics and require a lot of knowledge.
The kinds of things to Google, if you want to know more about this, are linear programming, stochastic programming, nonlinear programming.
These are the kinds of words that people who research this stuff traffic in.
And you'll learn about the appropriate numerical algorithms for certain situations, and the way of translating domain knowledge and constraints into programs that you can solve.
Yeah.
Yeah.
I mean, it's also because, well, you need a good model first to be able to optimize.
So I think this is also where the bias of the community comes from: we had to focus a lot on, okay, how do we even get a good model to begin with?
And then we can focus on: okay, what do we do with all these posterior samples?
It's great to have them, but how can we optimally use them?
Now we're getting there, but that meant we had to develop backends that were able to deal with these things in an efficient, vectorized way, with efficient computation; even better if you have it in your model graph.
So that's what PyTensor gives you.
But yeah, these things take time to develop, and this is open source, so it can take even more time, because people have day jobs.
And with PyTensor, well, most of the people there do that for free in their free time.
And again, huge kudos to Ricardo Vieira and Jesse Grabowski, who've been blasting through these issues and PRs almost single-handedly, and in an incredibly fast and smart fashion.
So yeah, I recommend their episodes.
Of course, they've been on the show before.
I'll put them in the show notes.
But I think that was the end of the live coding part, right, Daniel?
So I think we're done for that topic.
Or do you want to add one last thought?
Nope.
All good there.
Beautiful.
Yeah, and thank you again for doing that.
I think it's extremely valuable and great that we have that out there.
And actually, we're almost out of time.
I've been having you here for quite a while, and I don't want to abuse your time, but I'm also curious.
I know you're someone who likes to learn a lot.
So what's next for you?
What are you excited about learning next?
Yeah, I'm really interested in getting deeper into this area; it's such a big, exciting field, and I've only glimpsed through the door.
If we developed a set of tools that makes it even easier to take Bayesian models and convert them into optimizable programs, I think that's well worth our investment.
And in actual large-scale industrial projects, this approach simplifies things, but there's still so much complexity and challenge around it.
So that's one place I would like to really deepen my interests.
I've also started to get really drawn to, as I was telling my fiancee the other day, bread-and-butter kinds of industries. My interest lately is in things like electrical grids, and prices, and getting things from one place to another; the kind of data science that makes the basic engine of society operate.
That's what I've been excited about lately.
So I've been poking at finance books and supply chain management topics.
But yeah, I'm pretty open-minded.
I guess the other thing I've been excited about lately is prior elicitation.
I had this observation, or had it pointed out to me by someone, that if you know someone's utility function, you can figure out what they believe; and if you know what they believe, you can often figure out their utility function.
And I think that, as a community, industry-scale prior elicitation is not something we've been so good at, but I think there's opportunity to improve there.
So the kind of nuts-and-bolts stuff of Bayesian decision-making, and the nuts and bolts of industries, is where my attention is heading.
Yeah.
Yeah.
I really like that.
And I mean, prior elicitation, for sure, that is extremely important, and dear to my heart.
The PreliZ package, of course, spearheaded by, among others, Osvaldo Martin, who's been on the show also; I'll link to his episode about PreliZ.
I really recommend this package; I use it all the time when I'm working on Bayesian models.
It makes your job much, much easier as a modeler when it comes to choosing priors.
So definitely give this episode a listen and then try PreliZ.
On that note, Daniel, anything you want to add before I ask you the last two questions?
No, let's go for it, Alex.
Okay.
So um as you know, as usual, I have to ask you the last two questions I ask everybody at the end of the show.
So first one.
If you had unlimited time and resources, which problem would you try to solve?
The people working on new inference algorithms, I feel very jealous of them every day, because, you know, probabilistic programming and Bayesian methods have this kind of tagline: if you just think about the data generating process and you know how to convert that into a piece of math, then Bayes takes care of the rest.
You know, we both said this earlier on the podcast, and we also said that in practice it's more complicated, but in concept, it's that simple.
And the practical part is that getting valid approximations of your posterior is still a very difficult thing to do, especially as models get much bigger.
All of this business about reparameterizing models, or tuning models for effective MCMC sampling: it does feel like I would like to see all of that disappear one day.
That the inference algorithms get good enough that they don't really care what mathematically equivalent representation of your model you decide to use; they just do the right thing.
Because anytime you're trying to model something and your problem becomes thinking about numerical methods and getting into the nuts and bolts of that, something has gone wrong.
You really aren't supposed to be at the numerical level most of the time in your work if you're doing science or industry.
So some folks are working on MCMC algorithms that don't need reparameterization.
Either they use a neural network to reparameterize the problem for you; this is a friend of the show Adrian's big project.
But then Bob Carpenter and my friend Elliot Carson have been working on this alternative version of HMC that can shrink the step size of your algorithm as it hits more difficult parts of the posterior, and then re-lengthen the step size when you get back to easy parts of the posterior.
So these kinds of tools seem really promising; the sooner we can get them into routine practice, the better, I'd say.
Yeah, I couldn't agree more.
Honestly, I feel the same; this will greatly help Bayesian modeling adoption once we figure this out, you know, how to basically make sampling as smooth as possible for users.
So yeah, I really can't wait for that.
I have great hopes for these neural-network-based automatic reparametrizations, because neural networks are extremely flexible and powerful.
So I know there's already a flavor of that with Nutpie: if you're using the normalizing flow tuning routine, you have access to that in Nutpie, so in PyMC, through Nutpie, which is Adrian's brainchild, an implementation basically of the NUTS sampler in Rust.
And yeah, I know Adrian is doing a lot of work on that, very actively, actually also with Bob Carpenter, whom you mentioned.
I'm guessing we'll have them back on the show at some point because I know they are working on some very interesting things.
So once this is public, very probably, we'll get them on the show again.
In the meantime, I've had both on the show already, and I've added the episodes to the show notes.
So folks, if you want background on what these guys are doing, definitely give those a listen.
This is for sure frontier science, but hopefully this will be on all our machines in a few years, which will be extremely, extremely valuable, as Daniel was saying.
And second question, Daniel: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?
I've always had a big soft spot for Herbert Simon.
Do you know this character?
Yeah.
He's kind of...
He's really fun as a character in the history of science, because he was really pivotal in the early stages of AI, really pivotal in decision theory and economics, but also philosophy.
He had some of the early work on causal graphs, some of the earliest approaches to that field.
So he's really a polymathic sort of character.
He was in touch with his time, but in touch with all of the topics that we still find really exciting and interesting.
And this is a little detail I've always found so fun about him: he didn't finish his undergraduate calculus class.
He went to a few classes and was like, I don't know if this is for me.
So he dropped out, and he did political science.
So as a person who had that kind of humanities background and then pivoted later on, and I think you have a similar trajectory, Alex, from political science to Bayesian stats, he's kind of a fun character.
He was able to work with what he had and accomplish a lot.
And I think it's nice to know that you don't have to have spent decades under a stack of linear algebra textbooks to make valuable contributions to what we do.
So yeah, Herbert Simon would be a great dinner guest.
Yeah, beautiful.
Love that choice, Daniel.
You're the first to make it; definitely a great one.
And yeah, it's great that useful contributions can come from so many different backgrounds.
It makes all of this much more interesting, and all these dinners and beers that we share together much more interesting too, because then everybody comes with a different perspective.
And that's also much less boring, which is great.
Fantastic.
Well, Daniel, I think we can call it a show.
It was really fantastic to have you on, and it was extremely practical, so I love that.
I know my listeners do too.
Please refer to the show notes, folks; they are plentiful for this episode.
If you want to dig deeper, thank you again, Daniel, for taking the time and being on this show.
Yeah.
Thank you so much.
I've had a really great time and I think this podcast has been such a valuable contribution to our community.
So thanks for your work on that.
Thank you, Daniel.
Really, really appreciate it, and welcome back anytime; even before your kind words about the show, to make that clear, this is not some kind of quid pro quo.
You're already welcome back to the show, for sure.
Awesome.
Well, thank you and well, see you very soon on the show.
This has been another episode of Learning Bayesian Statistics.
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayestats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind.
That's learnbayestats.com.
Our theme music is Good Bayesian by Baba Brinkman, featuring MC Lars and Mega Ran.
Check out his awesome work at bababrinkman.com.
I'm your host, Alex Andorra.
You can follow me on Twitter at alex_andorra, like the country.
You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats.
Thank you so much for listening and for your support.
You're truly a good Bayesian.
Let me show you how to be a good Bayesian.
Change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations.
Change calculations after taking fresh data in; those predictions that your brain is making, let's get them on a solid foundation.