MCMC re-runs from scratch on every new dataset. Amortized Bayesian inference pays the cost once, then gives you a posterior in milliseconds, forever. Stefan Radev (BayesFlow creator) and I built an Agent Skill that makes it reliable. Come join us: your first trained amortizer is ~3 minutes away.
And how Stefan Radev and I built an Agent Skill to stop your coding agent from getting it silently wrong.
Imagine for a moment that you're running a clinical study. It's the same model for every participant… but you have one million participants, and you need a posterior for each of them.
With MCMC, you're in trouble. Even if a single run takes 30 seconds, you're looking at ~347 days of sampling. Clusters, parallelization, nutpie: sure, they can knock it down to a week or two. But now multiply that by every update to the data, every sensitivity check, and every "what if we got another hundred patients?" question from the stakeholders. That's a long time if you can't take a sabbatical.
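The ~347-day figure is easy to check. Here is a small sketch that assumes one MCMC run per participant at 30 seconds per run. It also assumes perfect scaling across parallel workers, which is optimistic. The worker counts are illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def mcmc_wall_clock_days(n_datasets, seconds_per_run, n_workers=1):
    """Days needed to fit every dataset, assuming perfect parallel scaling."""
    return n_datasets * seconds_per_run / (SECONDS_PER_DAY * n_workers)


for workers in (1, 25, 50):
    days = mcmc_wall_clock_days(1_000_000, 30, n_workers=workers)
    print(f"{workers:>2} worker(s): {days:6.1f} days")
```

This prints about 347.2, 13.9 and 6.9 days. That's where "a week or two" comes from, and it's before any refits.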
This isn't a contrived scenario, by the way. It's the one Marvin Schmitt described on LBS episode 107: a real motivating use case where Bayesian inference as we normally practice it is just… infeasible.
This is where amortized Bayesian inference (ABI) comes in. And it's why Stefan Radev (the creator of BayesFlow) and I built an Agent Skill that helps coding agents save you that year, and do it correctly.
The philosophy behind this skill is the same as the bayesian-workflow and causal-inference skills before it: enforce the workflow, not just the code.
BayesFlow gives you all the building blocks. The skill tells the agent which block to reach for, in what order, with which guardrails, and how to prove to you that the result is trustworthy. That covers architecture choice, simulator sanity checks, diagnostic gates, and report structure.
Amortized inference isn't magic: it's a lot of small decisions where getting one wrong silently breaks everything. The skill's job is to make sure none of them are silent, and that the trustworthy path is the default path.
Train once. Infer a million times. And know that your posterior is worth trusting.
The word "amortized" comes from finance (I know, sexy, right?). When you amortize a big upfront cost, you spread it over many periods, so each transaction carries only a small share of it. Amortized Bayesian inference does the same thing... with posteriors.
The trade is this: instead of running MCMC every time you get new data, you train a neural network once (typically for a few minutes to a few hours) to approximate the posterior for the entire family of possible datasets your model could generate. After that, getting a posterior for any new dataset is just a forward pass through the network. Milliseconds. No warmup. No tuning. No divergences to stare at.
A million participants? That's a million forward passes. Practical.
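The amortization trade-off can be written as a break-even calculation. MCMC pays a fixed cost per dataset. ABI pays one training cost plus a tiny cost per forward pass.

The timings in this sketch are hypothetical placeholders, not measurements:
- 1 hour of training,
- 30 s per MCMC fit,
- 10 ms per forward pass.

```python
import math


def total_cost_mcmc(n_datasets, sec_per_fit):
    # Every dataset pays the full sampling cost.
    return n_datasets * sec_per_fit


def total_cost_amortized(n_datasets, train_sec, sec_per_forward_pass):
    # One upfront training cost, then a cheap forward pass per dataset.
    return train_sec + n_datasets * sec_per_forward_pass


def break_even_datasets(train_sec, sec_per_fit, sec_per_forward_pass):
    # Smallest n with train + n * fwd <= n * fit, i.e. n >= train / (fit - fwd).
    return math.ceil(train_sec / (sec_per_fit - sec_per_forward_pass))


TRAIN, FIT, FWD = 3600.0, 30.0, 0.01  # hypothetical timings in seconds
print(break_even_datasets(TRAIN, FIT, FWD))                          # 121
print(f"{total_cost_amortized(1_000_000, TRAIN, FWD) / 3600:.1f} h")  # 3.8 h
print(f"{total_cost_mcmc(1_000_000, FIT) / 3600:.0f} h")             # 8333 h
```

Under these assumed numbers, amortization pays for itself after about 121 datasets. Everything beyond that is nearly free.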
And there's a second superpower: you don't need a likelihood function. As Jonas Arruda demonstrated beautifully on LBS episode 151, ABI falls under the broader umbrella of simulation-based inference: you hand it a simulator (prior + forward model) and it learns the posterior from simulations alone. Say you have a mechanistic model for epidemics, cosmology, or your sport of choice (yes, soccer, of course), and its likelihood is intractable or prohibitively expensive to compute. You can still do Bayesian inference. It's like being able to sip your favorite espresso while camping deep in the forest: a pretty big deal!
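To make "simulator = prior + forward model" concrete, here is a plain-Python sketch for a Gaussian location-scale model. It is not BayesFlow code. The priors, the variable dataset size, and the dictionary keys are illustrative assumptions. Note that the likelihood is only ever sampled from, never evaluated:

```python
import random


def prior(rng):
    # Hypothetical priors: mu ~ Normal(0, 1), sigma ~ HalfNormal(1).
    return {"mu": rng.gauss(0.0, 1.0), "sigma": abs(rng.gauss(0.0, 1.0))}


def forward_model(params, n_obs, rng):
    # Draw data given parameters; no likelihood density is ever computed.
    return [rng.gauss(params["mu"], params["sigma"]) for _ in range(n_obs)]


def simulate(n_sims, rng, n_min=5, n_max=50):
    # Each simulation = one parameter draw + one synthetic dataset of random size.
    sims = []
    for _ in range(n_sims):
        theta = prior(rng)
        n_obs = rng.randint(n_min, n_max)  # inclusive on both ends
        sims.append({**theta, "y": forward_model(theta, n_obs, rng)})
    return sims


rng = random.Random(42)
bank = simulate(1000, rng)
print(len(bank), sorted(bank[0].keys()))
```

Pairs of (parameters, data) like these are what an amortized estimator is trained on.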
Here's the part that delighted me when Stefan first walked me through it.
With MCMC, simulation-based calibration (SBC) is the gold standard for checking whether your sampler produces calibrated posteriors. The catch is that doing SBC properly means fitting your model to ~1000 simulated datasets. For a moderately complex PyMC model that takes a minute to sample, that's roughly 1000 minutes (close to 17 hours) of sequential sampling just to check calibration. Which is why, let's be honest, most of us skip it.
With an amortized estimator, SBC is cheap. You trained the network on simulations, remember? Running it on 1000 more simulated datasets takes seconds. What was previously a luxury becomes routine, like getting upgraded to first class on all your intercontinental flights.
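Here is what SBC actually computes, as a stdlib-only sketch. For each simulation:
1. Draw a ground-truth parameter from the prior.
2. Simulate a dataset from it.
3. Draw posterior samples.
4. Record the rank of the truth among those samples.

If the posterior is calibrated, the ranks are uniform.

To keep the example self-contained, the exact conjugate Normal-Normal posterior stands in for a trained amortizer. That is an assumption: in practice, step 3 would be your network's forward pass.

```python
import math
import random


def exact_posterior_draws(y, rng, n_draws, prior_sd=1.0, noise_sd=1.0):
    # Stand-in for a trained amortizer: the exact conjugate posterior of
    # theta ~ Normal(0, prior_sd), y_i ~ Normal(theta, noise_sd), noise_sd known.
    precision = 1.0 / prior_sd**2 + len(y) / noise_sd**2
    mean = (sum(y) / noise_sd**2) / precision
    sd = math.sqrt(1.0 / precision)
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def sbc_ranks(n_sims, n_draws, n_obs, rng, prior_sd=1.0, noise_sd=1.0):
    ranks = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, prior_sd)                        # 1. truth from the prior
        y = [rng.gauss(theta, noise_sd) for _ in range(n_obs)]  # 2. simulate a dataset
        draws = exact_posterior_draws(y, rng, n_draws, prior_sd, noise_sd)  # 3. "infer"
        ranks.append(sum(d < theta for d in draws))             # 4. rank in 0..n_draws
    return ranks


rng = random.Random(0)
ranks = sbc_ranks(n_sims=1000, n_draws=99, n_obs=10, rng=rng)

# Calibrated posterior => ranks ~ Uniform{0, ..., 99}; bin them into 10 equal bins.
counts = [0] * 10
for r in ranks:
    counts[r // 10] += 1
print(counts)  # each bin should hold roughly 100 of the 1000 ranks
```

With MCMC, step 3 is the expensive part, repeated 1000 times. With an amortizer, it's a forward pass.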
And that flips the Bayesian workflow on its head. Diagnostics that were previously "we'd love to but can't afford it" become mandatory. You can actually afford to prove your posterior is well-calibrated before trusting it.
Here's the thing: BayesFlow is brilliant, but the surface area is large. There are several summary network options (SetTransformer, TimeSeriesTransformer, FusionTransformer, ConvolutionalNetwork), several inference networks (FlowMatching, DiffusionModel, StableConsistencyModel, coupling flows), three training regimes (online, offline, disk), and an adapter system that needs to route data to the right slots.
Miss one choice, and things fail in the worst possible way: silently. The loss curve goes down. The code runs, yes, but the posterior is wrong.
For example, if you have N exchangeable observations (like a regression dataset, or repeated measurements), you must route them through summary_variables with a SetTransformer. Suppose your agent flattens them into inference_conditions instead. Training converges, inference runs, and the numbers look plausible, yet the posterior is invalid. The network doesn't know it's looking at a set, so it treats the arbitrary order of the observations as if it carried information.
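Here is why the routing matters. A flat, ordered vector changes when you shuffle the observations, and its length depends on N. A permutation-invariant summary does neither.

The toy illustration below is not BayesFlow's SetTransformer. The embedding (x, x²) followed by mean pooling is a hand-picked, DeepSets-style stand-in for a learned set network:

```python
import math
import random


def flat_condition(y):
    # The failure mode: an ordered, fixed-length vector.
    return tuple(y)


def set_summary(y):
    # Toy DeepSets-style summary: embed each observation as (x, x^2), then
    # mean-pool. math.fsum is exactly rounded, so the result does not depend
    # on the order of the observations, and it is defined for any N.
    n = len(y)
    return (math.fsum(y) / n, math.fsum(x * x for x in y) / n)


rng = random.Random(1)
y = [rng.gauss(2.0, 1.0) for _ in range(20)]
y_shuffled = y[:]
rng.shuffle(y_shuffled)

print(flat_condition(y) == flat_condition(y_shuffled))      # False: order changed the input
print(set_summary(y) == set_summary(y_shuffled))            # True: same summary
print(len(set_summary(y[:7])), len(flat_condition(y[:7])))  # 2 7
```

A network fed the flat vector has to learn permutation invariance from data, if it learns it at all. A set-based summary network has that invariance built into its architecture.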
Or, if you try to do image denoising (your inferential target is a 28×28 image), you need a DiffusionModel with a UNet subnet. Use the default setup and your agent will confidently "train a model" that doesn't even have the right tensor shapes to represent the output.
An agent doing this without a skill will very often get the architecture wrong. The defaults look like they apply, and nothing flags the problem until much later, when you ask why the posterior looks strange.
The skill enforces the right choices and calls them out as MUST and NEVER rules.
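As a hypothetical sketch, two of those rules could be encoded as explicit checks, so that a wrong choice is surfaced instead of passing silently. The function, the dataclass, and the string labels are made up for illustration. They are not the skill's actual implementation or BayesFlow's API.

The image check is deliberately crude: any target with two or more dimensions is treated as image-shaped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArchitecturePlan:
    # Hypothetical container; these are labels, not BayesFlow objects.
    summary_network: str | None = None
    inference_network: str | None = None
    rules: list[str] = field(default_factory=list)


def plan_architecture(exchangeable_obs: bool, target_shape: tuple[int, ...]) -> ArchitecturePlan:
    """Hypothetical encoding of two MUST/NEVER rules described in the text."""
    plan = ArchitecturePlan()
    if exchangeable_obs:
        # Sets of observations need a permutation-invariant summary network.
        plan.summary_network = "SetTransformer"
        plan.rules.append("MUST route observations to summary_variables")
        plan.rules.append("NEVER flatten observations into inference_conditions")
    if len(target_shape) >= 2:
        # Crude heuristic: a 2-D+ target (e.g. 28x28) is treated as an image.
        plan.inference_network = "DiffusionModel + UNet subnet"
        plan.rules.append("MUST use a DiffusionModel with a UNet subnet for image targets")
    return plan


print(plan_architecture(exchangeable_obs=True, target_shape=(3,)).rules)
print(plan_architecture(exchangeable_obs=False, target_shape=(28, 28)).inference_network)
```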
ABI is fast at inference time; that's the whole point. But v2 of the skill, which Stefan led, tackles a different kind of speed: how fast can you go from "I have a generative story" to "I have diagnostics I trust"?
He added three things:
This last one is important. Before v2, the agent still had to interpret calibration and contraction numbers from a DataFrame β and interpretation is exactly where agents are weakest. Now it's programmatic.
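What might "programmatic" look like? One hedged sketch is to turn calibration and contraction into a pass/fail gate:
- **Posterior contraction** is 1 minus the ratio of posterior to prior variance.
- **Calibration** is summarized here as the largest gap between the ECDF of normalized SBC ranks and the uniform CDF.

The function names and thresholds below are hypothetical, not the skill's actual gates:

```python
import statistics


def posterior_contraction(posterior_draws, prior_var):
    # 1 - Var[posterior] / Var[prior]: near 1 means the data were very
    # informative; near 0 means the posterior barely moved from the prior.
    return 1.0 - statistics.variance(posterior_draws) / prior_var


def ecdf_max_deviation(ranks, n_draws):
    # Largest gap between the ECDF of normalized SBC ranks and the
    # Uniform(0, 1) CDF (a Kolmogorov-Smirnov-style statistic).
    u = sorted((r + 0.5) / (n_draws + 1) for r in ranks)
    m = len(u)
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(u))


def diagnostic_gate(ranks, n_draws, contractions, max_dev=0.1, min_contraction=0.2):
    # Hypothetical thresholds; in practice they depend on the problem.
    failures = []
    dev = ecdf_max_deviation(ranks, n_draws)
    if dev > max_dev:
        failures.append(f"calibration: ECDF deviation {dev:.3f} > {max_dev}")
    mean_contraction = statistics.fmean(contractions)
    if mean_contraction < min_contraction:
        failures.append(f"contraction: mean {mean_contraction:.2f} < {min_contraction}")
    return {"passed": not failures, "failures": failures}


print(posterior_contraction([1.0, 2.0, 3.0], prior_var=4.0))  # 0.75
print(diagnostic_gate(list(range(100)) * 10, n_draws=99, contractions=[0.8] * 10))
# {'passed': True, 'failures': []}
```

The design point is that the agent no longer has to eyeball a table: a failing check comes back with a named reason.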
We ran 7 eval scenarios covering the terrain: Gaussian location-scale, multi-parameter constrained models, variable-N regression, AR(2) time series, non-identifiable mixtures, offline simulation banks, and Bayesian image denoising (Fashion MNIST).
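As an example of the kind of simulator these scenarios involve, here is a sketch of an AR(2) time-series simulator. The following are assumptions for illustration, not the eval's actual configuration:
- a uniform prior over the stationarity region,
- the noise scale,
- the burn-in length.

```python
import random


def is_stationary(phi1, phi2):
    # AR(2) stationarity triangle: phi1 + phi2 < 1, phi2 - phi1 < 1, |phi2| < 1.
    return (phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)


def sample_stationary_prior(rng):
    # Uniform over the triangle via rejection from its bounding box
    # (the triangle covers half the box, so about half the proposals are kept).
    while True:
        phi1, phi2 = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        if is_stationary(phi1, phi2):
            return phi1, phi2


def simulate_ar2(phi1, phi2, sigma, n_steps, rng, burn_in=200):
    # x_t = phi1 * x_{t-1} + phi2 * x_{t-2} + eps_t, eps_t ~ Normal(0, sigma).
    x_prev2, x_prev1 = 0.0, 0.0
    series = []
    for t in range(burn_in + n_steps):
        x = phi1 * x_prev1 + phi2 * x_prev2 + rng.gauss(0.0, sigma)
        x_prev2, x_prev1 = x_prev1, x
        if t >= burn_in:  # discard the start-up transient
            series.append(x)
    return series


rng = random.Random(3)
phi1, phi2 = sample_stationary_prior(rng)
series = simulate_ar2(phi1, phi2, sigma=1.0, n_steps=100, rng=rng)
print(round(phi1, 3), round(phi2, 3), len(series))
```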
A +15.1 percentage-point lift.
Where does the gap come from? Three recurring failure modes in the without-skill runs:
These aren't subtle failures. They're the difference between "I have a trained network" and "I have inference I can defend."
I want to pause here, because this skill wouldn't exist without Stefan Radev.
Stefan is the creator of BayesFlow, the library the skill is built on, and one of the most generous collaborators I've worked with. When I first reached out about a skill for amortized inference, he didn't just give feedback from the sidelines. He wrote code, opened PRs, and ran pilot studies with his grad students to figure out what made agents stumble. V2's architecture came out of his direct experience watching people and agents use BayesFlow in practice.
This matters to me because open-source Bayesian tooling runs on this kind of collaboration, which isn't always rewarded as much as it should be. So if you try this skill and it works for you, please star BayesFlow and baygent-skills, and tell a friend! This is how the ecosystem gets better.
A few things to have in mind:
Just ask your agent: "Install the amortized-workflow skill: https://github.com/Learning-Bayesian-Statistics/baygent-skills/tree/main/amortized-workflow", and you're good to go. Unlike causal-inference, this skill is standalone: it doesn't depend on bayesian-workflow.
As always, it works with Claude Code, Cursor, Gemini CLI, Kimi Code, and any agent supporting the Agent Skills spec.
You'll also need BayesFlow: pip install "bayesflow>=2.0". JAX is the recommended backend; PyTorch and TensorFlow also work.
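BayesFlow 2 is built on Keras 3. As far as we know, Keras 3 selects its backend from the `KERAS_BACKEND` environment variable when it is first imported. A minimal sketch for choosing JAX therefore sets that variable before importing anything. It assumes you have also installed `jax` itself; the import guard only keeps the snippet runnable where BayesFlow isn't installed:

```python
import os

# Keras 3 picks its backend from KERAS_BACKEND when it is first imported,
# so set the variable before importing bayesflow (or keras).
os.environ["KERAS_BACKEND"] = "jax"

try:
    import bayesflow as bf  # pip install "bayesflow>=2.0" jax
except ImportError:
    bf = None  # BayesFlow (or the JAX backend) is not installed here

print("backend:", os.environ["KERAS_BACKEND"], "| bayesflow available:", bf is not None)
```

Swap `"jax"` for `"torch"` or `"tensorflow"` to use one of the other backends.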
The amortized-workflow skill is open source and part of the baygent-skills collection. If you try it and something doesn't work, open an issue: Stefan and I both read them.
And if you want to go deeper on the ideas behind this, the podcast episodes that got me hooked are:
On that note, PyMCheers, my dear Bayesians!
Alexandre Andorra, with Stefan Radev