From Posterior to Profit: How Bayesian Decision Theory Turns Models Into Real Business Value

Learning Bayesian Statistics, Episode 152 | Host: Alex Andorra | Guest: Daniel Saunders

Key Takeaways

  • Bayesian modeling and decision theory are complementary but separate workflows --- you need both, and keeping them distinct actually makes each one stronger.
  • Optimizing for accuracy metrics like RMSE or MAPE misses the point. The real question is: how much business value does this decision create?
  • PyTensor's computational graph lets modelers and decision-makers work independently and then snap their work together --- no rewriting optimizer code every time a model changes.
  • Risk aversion is not one-size-fits-all. Exponential utility functions, mean-variance penalties, and Conditional Value at Risk (CVaR) each give you a different, tunable lever for how conservative your recommendations should be.
  • A small amount of uncertainty in your elasticity estimate can produce a surprisingly large shift in your recommended price --- which is exactly why the Bayesian approach to optimization matters.
  • Philosophy trains you to justify a model from its assumptions, not just its performance metrics --- a skill that is deeply underrated in the data science industry.
  • Prior elicitation and utility elicitation are mirror images of each other. If you know someone's utility function, you can infer their beliefs, and vice versa.

Who Is Daniel Saunders, and Why Should Bayesian Practitioners Listen to Him?

Daniel Saunders is a senior data scientist at PyMC Labs with a PhD in philosophy, specializing in logic and decision theory. Unlike most guests who arrive at Bayesian statistics through a mathematics or computer science path, Daniel came through philosophy --- specifically through the field of cultural evolution and the study of how moral norms develop and change in societies.

His journey to Bayesian modeling began during the pandemic when he picked up Richard McElreath's work on evolutionary game theory and social learning. That led him to Statistical Rethinking, and from there, he found a way to weave Bayesian inference throughout the rest of his PhD. Today, at PyMC Labs, he works on building Bayesian models for private industry clients and has been developing a decision theory workflow that bridges the gap between inference and action.

He is also, as host Alex Andorra noted with some delight, the first guest Alex had already met in real life before appearing on the show --- having crossed paths at StanCon 2024 in Oxford.

How Did a Philosophy PhD End Up Doing Bayesian Statistics?

Daniel's path was gradual but logical. During his graduate studies, he was drawn to a corner of philosophy genuinely interested in mathematical modeling --- not just as an abstract puzzle, but as a tool for understanding how societies develop norms and how decisions should be made.

One of the central questions in that field --- how do you build a model that simplifies reality while still telling you something true about it? --- turned out to be the same question sitting at the heart of Bayesian inference. Richard McElreath, who is a major figure in cultural evolution research, wrote a textbook on evolutionary game theory that Daniel loved. That book pointed him toward Statistical Rethinking, and the rest followed naturally.

The philosophical connection that really clicked for Daniel was this: Bayesian inference forces you to think carefully about the data-generating process. You have to articulate what causal mechanisms you believe are at work. That is exactly what theoretical modelers in cultural evolution do --- they build simple models of how human societies change over time. Bayesian statistics, Daniel realized, was a way to unite the theoretical model and the statistical model, which in his experience had been developed in frustrating isolation from each other.

What Is the Most Undervalued Soft Skill in Data Science, According to a Philosopher?

Daniel's answer is the ability to evaluate a model from its assumptions rather than its performance metrics alone.

In classical machine learning culture, there is a comforting clarity to model evaluation: find the model that minimizes KL divergence, or maximizes out-of-sample predictive performance, and you are done. You can rank models on a single number. Philosophers, Daniel argues, are far more comfortable sitting with ambiguity. They are trained to ask whether the assumptions baked into a model are defensible --- not just whether the model's output scores well on a leaderboard.

This matters enormously in industry. Daniel has seen technically excellent projects go sideways because senior stakeholders and the data science team had completely different expectations for how the model would be evaluated at the end. When those conversations start from the wrong place ("let's find a metric that tells us which model is correct"), everyone can end up trapped: the scientists optimize for a number that may not mean what it appears to mean, and the stakeholders are left disappointed when reality does not cooperate.

A more humanistic, assumption-first framing of model evaluation communicates uncertainty more honestly and opens up a richer conversation about what the model is actually trying to do.

What Is a Bayesian Decision Theory Workflow, and How Is It Different From a Standard Bayesian Workflow?

The standard Bayesian workflow is an iterative loop: build a model, run prior predictive checks, fit the model, run posterior predictive checks, diagnose sampling behavior, refine, repeat. The goal is a well-calibrated posterior that faithfully represents what the data are telling you about your generative process.

A Bayesian decision theory workflow picks up where that ends --- or runs in parallel with it. Its job is to take whatever posterior you have and answer a different question: given these beliefs, what is the best action to take?

The philosophical foundation for separating the two comes from classical Bayesian decision theory itself. You have two distinct objects: a credence function (your probability distribution over states of the world) and a utility function (how good or bad each of those states would be for you). These are conceptually independent. Given a good posterior, you can optimize. Given a not-so-good posterior, you can still optimize. The decision-making apparatus does not need to know how the posterior was generated.

In large teams, particularly in industrial or business settings, this separation becomes a practical virtue. Modelers can focus on the data-generating process. Decision analysts can focus on the utility function, the constraints, and the levers that the business can actually pull. The two groups can work independently and then integrate their work cleanly.

Why Should You Stop Optimizing for RMSE and Start Optimizing for Business Value?

Every model has a purpose. The philosophy of science has a useful answer to the puzzle of how simplified, idealized models can still be good science: a model only needs to be good enough for its particular purpose.

In industry and science, Daniel identifies three main purposes a model can serve: forecasting, understanding, and decision-making. A model rarely needs to be good at all three. In fact, trying to be good at all three often means being mediocre at each.

If the end goal is to recommend a price, decide how much advertising spend to allocate, or figure out which discounts to run, then the model only needs to make those decisions as well as it can. This reframing has a powerful practical consequence: instead of demanding that R-squared be above 0.9 before a model is accepted, you can ask whether adding more features or running the model for an extra three hours of compute time actually changes the decision in a meaningful way. If the extra complexity would only earn you another twenty dollars, that is your answer. Anchoring model evaluation to decisions rather than statistical metrics gives stakeholders something they can respond to and reason about.

What Is PyTensor, and Why Does It Matter for Bayesian Optimization?

PyTensor is PyMC's computational backend. It represents a sequence of computations as a directed graph --- sometimes called a compute graph. Think of a simple arithmetic expression like 5 × 10 + 3. Elementary school teaches you to do the multiplication first, then the addition. You can draw that as a tree: the leaves are the numbers, and the tree compresses them step by step until you reach one root node --- the final answer.

PyTensor lets you look at that tree and operate on it. You can replace a number, swap an operation, extend the graph by gluing new computations onto it, or extract any section of it as a callable function that optimization libraries like SciPy can work with directly.
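The tree-and-substitution idea can be sketched in plain Python. This is a deliberately tiny stand-in for the concept, not PyTensor's actual API: an expression is just a data structure, so a program can evaluate it, swap out a leaf, or graft new operations onto the root.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: float

@dataclass
class Op:
    fn: str          # "+" or "*"
    left: "Node"
    right: "Node"

Node = Union[Num, Op]

def evaluate(node: Node) -> float:
    """Walk the tree from the leaves up to the root."""
    if isinstance(node, Num):
        return node.value
    l, r = evaluate(node.left), evaluate(node.right)
    return l + r if node.fn == "+" else l * r

def substitute(node: Node, old: Node, new: Node) -> Node:
    """Rebuild the tree with one subtree replaced: the graph
    analogue of swapping a number or an operation."""
    if node is old:
        return new
    if isinstance(node, Op):
        return Op(node.fn,
                  substitute(node.left, old, new),
                  substitute(node.right, old, new))
    return node

five = Num(5.0)
expr = Op("+", Op("*", five, Num(10.0)), Num(3.0))  # 5 * 10 + 3
print(evaluate(expr))                                # 53.0
print(evaluate(substitute(expr, five, Num(7.0))))    # 73.0
```

PyTensor does the same kind of thing on symbolic tensors, with the crucial extras that it can differentiate through the graph and compile any subgraph into a fast callable.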

For Bayesian optimization, this matters in two specific ways.

First, it enforces a clean division of labor. The modeler builds the part of the graph that connects price to sales. The optimization engineer builds the part that converts sales to profit and applies any business-specific constraints. Neither needs to know the details of the other's work. They meet at a shared node in the graph.

Second, it makes vectorized optimization over the full posterior straightforward. Instead of looping over 2,000 posterior samples with a for-loop, calling SciPy separately on each, and then averaging 2,000 "best prices" --- a process that is both slow and conceptually messy --- you can put the averaging step inside the graph. You vectorize across the full posterior, compute expected profit in one pass, and hand a single callable function to SciPy. The result is the price that maximizes expected profit given all of your uncertainty at once.
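The structural difference between the two routes can be sketched without PyTensor at all. The model, numbers, and grid search below are illustrative stand-ins (the episode uses SciPy's optimizer on a compiled graph), but they show why "average the best prices" and "find the best price of the average profit" are different computations:

```python
import random

random.seed(1)

# Hypothetical posterior draws for (baseline, elasticity); the model is
# sales = baseline * price ** elasticity, profit = (price - cost) * sales.
COST = 1.0
posterior = [(100.0 + random.gauss(0, 10), -2.2 + random.gauss(0, 0.3))
             for _ in range(500)]
prices = [1.5 + 0.02 * i for i in range(201)]  # grid from $1.50 to $5.50

def profit(price, baseline, elasticity):
    return (price - COST) * baseline * price ** elasticity

# "Artisanal" route: optimize each draw separately, then average the answers.
per_draw_best = [max(prices, key=lambda p: profit(p, b, e))
                 for b, e in posterior]
avg_of_argmaxes = sum(per_draw_best) / len(per_draw_best)

# Graph-style route: the averaging lives *inside* the objective,
# so the optimizer runs once over expected profit.
def expected_profit(price):
    return sum(profit(price, b, e) for b, e in posterior) / len(posterior)

argmax_of_avg = max(prices, key=expected_profit)

print(avg_of_argmaxes, argmax_of_avg)  # generally two different prices
```

Only the second quantity is the expected-utility-maximizing price; the first is an average of answers to 500 different questions.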

As Alex noted from his own experience, doing this "artisanally" with a SciPy for-loop works, but it is painful, slow, and brittle. Every time a modeler changes the model, the optimizer code has to be rewritten. PyTensor's graph abstraction eliminates that fragility.

What Does a Concrete Bayesian Optimization Example Actually Look Like?

Daniel walked through a worked example of price elasticity optimization --- deliberately simple, but containing every essential ingredient of a harder problem.

The setup: a company sells a consumer good and has varied its price over roughly 100 days. The data-generating model says that daily sales equal a baseline demand level multiplied by a power function of price, where the exponent is the price elasticity. If elasticity is negative (as it almost always is), raising the price reduces sales.

The goal is not to maximize sales --- that has a trivial and useless solution of setting the price to zero. The goal is to maximize profit, which is sales multiplied by unit profit margin. As price rises, unit profit goes up but sales go down. There is a sweet spot.

After fitting the Bayesian model and recovering the posterior over baseline sales and elasticity, the next step is to extract the PyTensor graph, substitute posterior samples in for the random variables at the leaves, add the profit calculation on top, and average across the posterior to get expected profit as a function of price. That function goes straight into SciPy's optimizer.

The result with true parameters plugged in --- no uncertainty --- is an optimal price of $3.30. The result when optimizing over the uncertain posterior is around $4.00. A small amount of residual uncertainty in the elasticity estimate meaningfully shifts the recommended price. That shift is not a bug; it is the honest answer given what the model actually knows.
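The mechanism behind that shift can be seen in miniature. The numbers below are made up for illustration (not the episode's actual fit), but the structure is the same: because the profit curve is asymmetric in the elasticity, averaging over a symmetric spread of elasticity draws moves the optimum away from the plug-in answer.

```python
# Profit model: profit(price) = (price - cost) * baseline * price ** elasticity
COST = 1.0
BASELINE = 100.0

def profit(price, elasticity):
    return (price - COST) * BASELINE * price ** elasticity

prices = [1.01 + 0.01 * i for i in range(600)]  # grid up to $7.00

# Plug-in route: a single point estimate of the elasticity.
e_hat = -2.0
best_point = max(prices, key=lambda p: profit(p, e_hat))

# Posterior-style route: a symmetric spread of draws around e_hat.
draws = [-2.8, -2.4, -2.0, -1.6, -1.2]
best_expected = max(
    prices, key=lambda p: sum(profit(p, e) for e in draws) / len(draws)
)

# The flat-elasticity draws dominate expected profit at higher prices,
# so the optimum under uncertainty sits above the plug-in optimum.
print(best_point, best_expected)
```

The plug-in optimum lands near $2.00 (the analytic answer cost × e / (e + 1) for e = -2), while the expected-profit optimum lands noticeably higher, the same qualitative shift as the episode's $3.30 versus $4.00.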

How Do You Handle Risk Aversion in a Bayesian Decision Framework?

Standard Bayesian decision theory says to maximize expected utility --- the average outcome weighted by your beliefs. This is perfectly reasonable for a risk-neutral decision maker who is comfortable with uncertainty as long as things work out on average. In practice, many businesses are not risk-neutral, and for good reasons. Daniel presented three increasingly sophisticated approaches.

Exponential utility functions are a classical economics technique for modeling diminishing marginal utility. The intuition is familiar: the difference in happiness between earning zero dollars and one million dollars is much larger than the difference between earning ten million and eleven million. Once you are past a certain threshold, extra money is worth progressively less. Inserting an exponential transformation between profit and utility means the optimizer will naturally avoid high-variance, high-upside bets, because the large wins are discounted. As you increase the risk aversion parameter, the recommended price drops, because the model stops chasing the uncertain high-profit tail of the posterior.

Mean-variance optimization is borrowed from financial economics. You take the expected utility and subtract a penalty term equal to some adjustable coefficient times the variance of outcomes. This is a direct, explicit penalty for uncertainty. With zero risk aversion it reproduces the baseline result of about $4.00. As the coefficient increases, the recommended price moves sharply downward toward the low-uncertainty, low-price end of the posterior. It is easy to implement --- just one extra line --- and easy to explain to stakeholders.

Conditional Value at Risk (CVaR), sometimes called expected shortfall, takes a different angle: instead of penalizing variance across the whole posterior, it asks what the best decision looks like if you restrict attention to the worst X percent of outcomes. You slice off the lower 5 percent or lower 20 percent of the posterior, and then maximize expected utility over just those samples. This is a worst-case-scenario planning tool. At the 20 percent lower bound in Daniel's example, the recommended price drops to around $2.70, reflecting a strategy that protects the business even if demand turns out to be far weaker than expected.
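The three adjustments can be sketched as small transformations of a vector of simulated profits. The numbers and parameter values below are made-up stand-ins, and in the actual workflow these operations live inside the PyTensor graph rather than in plain Python:

```python
import math
import random

random.seed(7)

# Simulated profit outcomes at one candidate price --- a stand-in for
# the posterior-predictive profits computed inside the graph.
profits = [random.gauss(10_000, 4_000) for _ in range(2_000)]

def expected(xs):
    return sum(xs) / len(xs)

# 1) Exponential utility: concave, so large wins are discounted and
#    high-variance bets look worse. `a` tunes the risk aversion.
def exponential_utility(xs, a=1e-4):
    return expected([(1 - math.exp(-a * x)) / a for x in xs])

# 2) Mean-variance: expected value minus an explicit variance penalty.
def mean_variance(xs, lam=1e-4):
    mu = expected(xs)
    var = expected([(x - mu) ** 2 for x in xs])
    return mu - lam * var

# 3) CVaR / expected shortfall: average only the worst alpha fraction.
def cvar(xs, alpha=0.2):
    worst = sorted(xs)[: max(1, int(alpha * len(xs)))]
    return expected(worst)

for score in (expected, exponential_utility, mean_variance, cvar):
    print(score.__name__, round(score(profits), 1))
```

Each function maps the same set of outcomes to a single score; the optimizer then maximizes that score over price, so swapping risk attitudes means swapping one function, not rebuilding the model.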

All three of these techniques can be inserted directly into the PyTensor graph, keeping the modeling and optimization layers cleanly separated.

How Do You Sell Decision Theory to Non-Technical Business Stakeholders?

This is where the philosophical training pays off in a very practical way. The challenge is not just explaining the math --- it is eliciting from stakeholders the information you need to build the utility function in the first place.

Daniel's advice is to start simple. The most basic utility function is one dollar equals one unit of utility. Maximize expected profit. Most stakeholders can immediately engage with that framing because it speaks directly to what they care about. From there, you iterate. Stakeholders will tell you when a recommended decision is not actually feasible --- maybe there is a supplier contract that locks in a minimum order quantity, or a fiscal quarter deadline that makes short-term revenue unusually important right now. Each piece of feedback enriches the utility function.

In regulated industries and government contexts, the utility function is often already implicitly written into regulations and public responsibilities. A power grid has legally defined obligations to the public that translate directly into risk tolerance constraints. A corporate finance department knows exactly how much each additional dollar of profit is worth in terms of diminishing marginal utility. When you can find those people and have those conversations, you get a remarkably precise and realistic utility framework without starting from scratch.

The key insight is that for non-technical decision-makers, the decision-making layer is often easier to elicit than the model itself. They may not know what a prior distribution is, but they absolutely know what they are trying to achieve, what constraints they are operating under, and what keeps them up at night. Framing the conversation around those concerns --- and then translating the results back into dollars --- is the universal language that makes the whole workflow land.

What Should Bayesian Practitioners Learn Next if They Want to Go Deeper on Optimization?

Daniel is direct about this: the probabilistic programming community has historically focused heavily on building good models, and the optimization community has historically focused on decision-making given a model. The two have stayed somewhat apart because both are genuinely deep topics.

For practitioners who want to bridge that gap, the relevant search terms are linear programming, stochastic programming, and nonlinear programming. These are the research traditions that have developed the numerical algorithms, theoretical guarantees, and practical frameworks for translating domain knowledge and constraints into solvable programs.

Within the PyMC ecosystem, the PyTensor documentation is a natural starting point, and Daniel's own tutorial --- A Bayesian Decision Theory Workflow --- is the most direct practical guide available. Ricardo Vieira and Jesse Grabowski, two of the core PyMC maintainers, have also done significant work on making these optimization workflows more accessible, and their contributions are worth following.

What Is Coming Next in Bayesian Inference Itself?

Asked what he would work on with unlimited time and resources, Daniel pointed to inference algorithms that do not require reparameterization.

One of the persistent frustrations of applied Bayesian modeling is that after building a model that correctly reflects the science, you often have to reparameterize it --- changing its mathematical form without changing its meaning --- just to make the sampler behave well. This is a numerical methods problem that has nothing to do with the science. It is time-consuming, requires specialist knowledge, and feels like the wrong level of abstraction for most practitioners to be working at.

Two promising research directions are in progress. One involves using neural networks to learn a reparameterization automatically, removing the burden from the modeler. The other, being developed by Bob Carpenter and Elliot Carson, is a variant of Hamiltonian Monte Carlo that adaptively adjusts its step size as it encounters difficult regions of the posterior and relaxes it again when conditions improve. Both approaches aim at the same goal: letting practitioners describe their generative process clearly and correctly, and then having the inference engine handle the rest without manual tuning.

Conclusion

Episode 152 of Learning Bayesian Statistics is one of those conversations that sits at an unusual and productive intersection --- philosophy, decision theory, and industrial data science, held together by a shared commitment to being honest about uncertainty.

The central message Daniel Saunders brings is both simple and genuinely under-appreciated: a posterior is not the destination; it is the starting line. The whole point of building a careful Bayesian model is eventually to act on it. And acting on it well requires a separate but complementary layer of thinking --- one that is explicit about what you are trying to maximize, what constraints you are operating under, and how risk-averse you need to be.

PyTensor makes that layer technically tractable by treating the model as a graph you can inspect, extend, and hand off to optimization solvers. The decision theory frameworks --- expected utility, exponential utility, mean-variance, CVaR --- give that layer conceptual richness. And the philosophical habit of justifying models from their assumptions rather than their metrics alone gives the whole enterprise intellectual honesty.

For any Bayesian practitioner who has ever fit a beautiful model, looked at the posterior summary, and then wondered "...now what?" --- this episode and Daniel's tutorial are the answer.

Listen to Learning Bayesian Statistics Episode 152 and find all related resources at [learnbayestats.com](https://learnbayestats.com/). Daniel's tutorial, "A Bayesian Decision Theory Workflow," is linked in the show notes.