
Today's clip is from Episode 158 featuring Stefan Radev. In this conversation, Alex and Stefan explore a genuinely fascinating problem: how do you turn an expert's intuition into a mathematically valid prior distribution - and can AI help automate that process?
Alex explains that prior elicitation is essentially a translation problem. Experts don't walk around thinking in probability distributions - their knowledge lives in intuitions, rules of thumb, and rough ranges. The challenge is converting that into something a Bayesian model can actually use.
The traditional approach? Ask an expert for quantiles or a mean, then parameterize your prior with hyperparameters and simulate until the model-implied quantities match what the expert described. If your pipeline is differentiable end-to-end, you use gradient descent. If not, you fall back to something like Bayesian optimization. Either way, you're iterating toward a prior that genuinely reflects expert knowledge - not just a convenient assumption.
But the really exciting part is what came next. In a follow-up paper, they pushed this further: instead of optimizing within a fixed parametric family (say, a Gaussian), they replaced the prior entirely with a normalizing flow - a flexible generative network - and ran the same procedure. No assumed distribution family. Just let the data and the expert's knowledge shape the prior from scratch.
The catch? More flexibility means more non-identifiability and stability headaches. But the direction is clear: a fully automated, end-to-end pipeline for building priors from non-probabilistic expert knowledge. And in 2026, that pipeline could theoretically be driven by an agent.
Get the full discussion here
Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work
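To make the "traditional approach" from this clip concrete, here is a minimal sketch of quantile matching by gradient descent. It assumes a Gaussian prior family and uses closed-form Gaussian quantiles instead of simulation; the function name, the learning rate, and the expert's quantiles are all made up for illustration and are not taken from the episode or the papers it mentions.

```python
import math
from statistics import NormalDist


def fit_normal_prior(expert_quantiles, lr=0.05, steps=5000):
    """Fit (mu, sigma) of a Normal prior so its quantiles match the expert's.

    expert_quantiles: dict mapping probability level -> value stated by the expert.
    The loss is the mean squared gap between prior quantiles and expert quantiles.
    """
    # Standard-normal quantile z_p for each probability level p.
    z = {p: NormalDist().inv_cdf(p) for p in expert_quantiles}
    n = len(expert_quantiles)
    mu, log_sigma = 0.0, 0.0  # optimize log(sigma) so sigma stays positive
    for _ in range(steps):
        sigma = math.exp(log_sigma)
        grad_mu = 0.0
        grad_log_sigma = 0.0
        for p, target in expert_quantiles.items():
            resid = mu + sigma * z[p] - target  # prior quantile minus expert quantile
            grad_mu += 2.0 * resid / n
            grad_log_sigma += 2.0 * resid * z[p] * sigma / n  # chain rule through exp
        mu -= lr * grad_mu
        log_sigma -= lr * grad_log_sigma
    return mu, math.exp(log_sigma)


# Hypothetical expert statement: "5% chance below 2, median 5, 95% chance below 8".
expert = {0.05: 2.0, 0.5: 5.0, 0.95: 8.0}
mu, sigma = fit_normal_prior(expert)
print(f"elicited prior: Normal(mu={mu:.3f}, sigma={sigma:.3f})")
```

When the model-implied quantities have no closed form, the same loop would compare simulated summaries to the expert's statements, which is where the differentiable-simulator or Bayesian-optimization choice described above comes in.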
Jun 2
4 min

Takeaways:
Q: Why are prior predictive checks so underused in practice, and how do simulations help?
A: They're underused because researchers don't always think to run them before seeing data -- but also because doing them rigorously (in the style Michael Betancourt advocates, with prior push-forward checks on interpretable summaries) takes effort. Simulations make it cheap to generate thousands of “what-if world” datasets from your model and check whether they look plausible, catching bad priors before you ever touch real data.
Q: How can generative AI help with prior elicitation?
A: Rather than forcing a domain expert to choose a distributional family and parameterize it, you can use a generative model to translate their qualitative knowledge directly into a prior. The expert describes what realistic data should look like; the generative model produces synthetic datasets matching that description; those datasets are used to fit a prior distribution. It removes the assumption that experts can think in terms of parameters and replaces it with the more natural question: does this look like your data?
Q: What would a foundation model for Bayesian inference actually look like?
A: Stefan's bet is that it won't be a fine-tuned general LLM. The right analogy is chess: you don't fine-tune GPT to play chess, you teach it when to call Stockfish. For Bayesian inference, you'd want a semantic layer (an LLM that understands the analysis goal) calling specialized numerical engines (MCMC samplers, amortized inference networks) that do the actual computation. Agent skills are already a step in this direction; the longer-term vision is engines that have been trained from scratch to generalize across large families of models and priors.
Full takeaways here.
Chapters:
00:00 How does amortized inference fit into modern Bayesian workflows?
06:01 What role do simulations play across the full Bayesian workflow?
12:12 How do you elicit priors from a domain expert who doesn't think in distributions?
19:01 What would a foundation model for Bayesian inference actually look like?
35:32 What is self-consistency in amortized inference and why does it matter?
39:22 How does semi-supervised learning improve simulation-based inference?
43:16 Why is sensitivity analysis so important yet so underused in Bayesian practice?
47:40 What is multiverse analysis and how does it change how we report Bayesian results?
51:32 How does amortized inference make sensitivity and multiverse analysis affordable?
01:02:47 How do amortized inference and classical MCMC complement each other?
01:10:08 What are the next major directions for BayesFlow and amortized inference research?
Thank you to my Patrons for making this episode possible!
Links from the show here.
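The prior predictive check from the first takeaway above can be sketched in a few lines. The model (adult heights in cm), both candidate priors, and the plausibility bounds are hypothetical choices made only for this illustration; the point is the pattern of simulating "what-if world" datasets and checking an interpretable summary before touching real data.

```python
import random


def fraction_implausible(mu_prior_sd, n_datasets=2000, n_obs=20, seed=0):
    """Share of prior-predictive datasets containing an impossible adult height.

    Assumed toy model: height ~ Normal(mu, sigma), with
    mu ~ Normal(170, mu_prior_sd) and sigma ~ HalfNormal(10).
    """
    rng = random.Random(seed)
    implausible = 0
    for _ in range(n_datasets):
        mu = rng.gauss(170.0, mu_prior_sd)   # draw the mean from its prior
        sigma = abs(rng.gauss(0.0, 10.0))    # half-normal draw for the spread
        data = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        # Interpretable summary: does any simulated adult fall outside 50-250 cm?
        if min(data) < 50.0 or max(data) > 250.0:
            implausible += 1
    return implausible / n_datasets


vague = fraction_implausible(mu_prior_sd=100.0)       # "uninformative" prior
informative = fraction_implausible(mu_prior_sd=10.0)  # weakly informative prior
print(f"vague prior: {vague:.1%} of simulated datasets are implausible")
print(f"informative prior: {informative:.1%} of simulated datasets are implausible")
```

A prior that routinely generates impossible datasets is flagged here without using a single real observation.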
May 21
1 hr 18 min

Today's clip is from Episode 157 featuring Stefan Radev. In this conversation, Alex and Stefan dig into one of the hardest open problems in simulation-based inference: hierarchical models.
The core idea: when you move from flat to hierarchical models, you're no longer estimating one set of parameters. You have local parameters that vary by location (or subject, or city) and global parameters that capture what's shared across all of them. And you don't just want each separately; you want the full joint posterior, because that's where the Bayesian magic of shrinkage actually lives.
Stefan builds the problem from the ground up. Start with the simplest hierarchical case: a two-level model. He uses electoral forecasting in France as the example: cities nested inside departments nested inside the whole country.
Now your simulator has to cover all three levels. If that simulator is slow (think: brain emulators, minutes per sample), scaling to hundreds of groups becomes completely intractable. Memory issues, specialized network requirements, the works.
The key insight: this problem has structure you can exploit. The joint posterior factorizes in a particularly nice way: each local parameter depends on its own local data and on the global parameters. That means instead of cramming everything into one giant high-dimensional vector and hoping a neural network figures it out, you can decompose the problem. Estimate local parameters conditioned on local data and the globals. Use composition.
The takeaway: hierarchical models aren't just "harder flat models" - they have a geometry that demands a different architecture. Respecting that structure is what makes amortized inference scale.
Get the full discussion here
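The factorization mentioned in this clip can be written out for the two-level case. The notation is an assumption introduced here, not the episode's: $\tau$ denotes the global parameters, $\theta_j$ the local parameters of group $j$, and $y_j$ that group's data, with groups conditionally independent given $\tau$.

```latex
\[
p(\theta_{1:J}, \tau \mid y_{1:J})
  \;=\; p(\tau \mid y_{1:J}) \, \prod_{j=1}^{J} p(\theta_j \mid y_j, \tau)
\]
```

Read right to left, this is the decomposition described above: one estimator for the globals given all data, and one shared estimator for each group's local parameters given only that group's data and the globals.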
May 13
3 min

Takeaways:
Q: What is simulation-based inference and what does "sim-to-real" mean?
A: Simulation-based inference (SBI) uses a mechanistic simulator as an epistemic tool: you train a neural network on a large number of labeled simulations and then deploy it on real, unlabeled data. The "sim-to-real" framing captures the key asymmetry -- your network never sees real data during training, only simulations, but it generalizes to real observations at inference time. This is the opposite of the more common "synthetic-for-ML" approach, where fake data is used purely to augment real training data.
Q: What is the amortized inference agent skill and what does it do?
A: It's an open-source AI agent skill, co-developed by Stefan and Alexandre, that teaches an AI coding agent to run a complete, state-of-the-art amortized inference workflow. Because amortized inference is recent enough that it's underrepresented in LLM training data, vanilla agents tend to get it wrong. The skill injects the right methodology: it guides the agent to set up the simulator, choose the right network architecture, run a pilot, train with appropriate diagnostics, and produce an actionable report -- without the user needing to know the details.
Q: What is calibration coverage and why should you never skip it?
A: Calibration coverage tells you whether your posterior uncertainty is honest -- whether your credible intervals actually contain the true parameter at the right frequency. A model can show poor parameter recovery yet still be well-calibrated (because it's falling back on the prior), or it can appear to recover parameters while being poorly calibrated. Running calibration diagnostics both in-sample and out-of-sample is especially revealing for hierarchical models, which often appear to underfit in-sample but generalize much better out-of-sample thanks to shrinkage.
Full takeaways here
Chapters:
00:00:00 How does amortized inference fit into the Bayesian workflow?
00:12:03 What does "sim-to-real" mean in simulation-based inference?
00:15:57 Why is amortized inference particularly suited to psychology and neuroscience?
00:21:51 What is the amortized inference agent skill?
00:39:00 What is calibration coverage and how do you interpret it?
00:41:50 How do you decide what to do next after your first training run?
00:44:53 How do actionable insights make Bayesian workflows more usable?
00:49:08 What are the unique challenges of hierarchical models in amortized inference?
01:00:51 What is the current state of BayesFlow's support for hierarchical models?
01:05:00 What are the main failure modes of amortized inference and how do you handle model misspecification?
Thank you to my Patrons for making this episode possible!
Links from the show
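The calibration coverage idea from the takeaways above can be checked by simulation. This is a minimal sketch on an assumed toy model (a conjugate Normal model whose exact posterior is known), not the BayesFlow diagnostic itself; `sd_scale` is a hypothetical knob added here to mimic an overconfident approximate posterior.

```python
import math
import random
from statistics import NormalDist


def coverage(n_sims=4000, n_obs=5, level=0.9, sd_scale=1.0, seed=1):
    """Fraction of simulations whose central credible interval contains the truth.

    Toy model: theta ~ Normal(0, 1), y_i ~ Normal(theta, 1), so the exact
    posterior is Normal(sum(y) / (n + 1), 1 / sqrt(n + 1)).
    sd_scale < 1 shrinks the posterior sd to imitate overconfidence.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # half-width in posterior sds
    hits = 0
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)                       # ground truth from the prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # simulated dataset
        post_mean = sum(y) / (n_obs + 1)
        post_sd = sd_scale / math.sqrt(n_obs + 1)
        if abs(theta - post_mean) <= z * post_sd:
            hits += 1
    return hits / n_sims


honest = coverage()                     # exact posterior: should sit near 0.90
overconfident = coverage(sd_scale=0.5)  # intervals too narrow: well below 0.90
print(f"nominal 90% interval, exact posterior: {honest:.3f}")
print(f"nominal 90% interval, overconfident posterior: {overconfident:.3f}")
```

The same loop works for any approximate posterior you can sample from: replace the closed-form interval with empirical quantiles of the posterior draws.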
May 6
1 hr 18 min

Today's clip is from Episode 156 featuring Adam Foster. In this conversation, Adam explains Expected Information Gain (EIG) - the scoring function at the heart of optimal Bayesian experimental design.
The core idea: when designing an experiment, you need a way to compare possible designs and pick the best one. EIG is that score - it tells you how much information you expect to gain about your model parameters from a given design. The higher the EIG, the better the design.
Adam builds intuition for EIG from two directions that sound completely different but lead to the same place. First, the Bayesian angle: simulate datasets from your prior predictive distribution, run inference on each, measure how much uncertainty dropped, and average across datasets. Second, a classic puzzle - the 12 prisoners balance scale problem - where the best weighing strategy turns out to be the one that makes all three outcomes (tip left, tip right, balance) equally likely. This maximizes outcome entropy, which is exactly what EIG does: it steers you toward designs where every possible result narrows down your hypotheses as fast as possible.
The takeaway: good experimental design isn't about intuition or convention - it's about making your data work as hard as possible, and EIG gives you a rigorous way to do that.
Get the full discussion here
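The balance-scale intuition from this clip can be verified by brute force. The sketch below assumes the standard version of the puzzle (12 items, exactly one odd item that is either heavier or lighter, noise-free weighings) and scores the first weighing only; the function names are invented for this example.

```python
import math
from collections import Counter


def outcome(hypothesis, k):
    """Result of weighing items 0..k-1 (left pan) against items k..2k-1 (right pan)."""
    item, kind = hypothesis          # kind = +1 if the odd item is heavy, -1 if light
    if item < k:
        side = 1                     # odd item sits on the left pan
    elif item < 2 * k:
        side = -1                    # odd item sits on the right pan
    else:
        return "balance"             # odd item was left off the scale
    return "left" if side * kind > 0 else "right"  # which pan goes down


def expected_information_gain(k, n_items=12):
    """EIG in bits: prior entropy minus expected posterior entropy (uniform prior)."""
    hypotheses = [(i, kind) for i in range(n_items) for kind in (1, -1)]
    counts = Counter(outcome(h, k) for h in hypotheses)
    total = len(hypotheses)
    prior_entropy = math.log2(total)
    # After seeing an outcome, the posterior is uniform over the consistent hypotheses.
    expected_posterior_entropy = sum(
        (c / total) * math.log2(c) for c in counts.values()
    )
    return prior_entropy - expected_posterior_entropy


eig = {k: expected_information_gain(k) for k in range(1, 7)}
best_k = max(eig, key=eig.get)
for k, value in eig.items():
    print(f"{k} vs {k}: EIG = {value:.3f} bits")
print(f"best first weighing: {best_k} vs {best_k}")
```

Because the weighings are noise-free, the EIG here equals the entropy of the outcome distribution, which is why the design with three equally likely outcomes wins.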
May 1
5 min

Takeaways:
Q: What is Bayesian experimental design and what problem does it solve?
A: It's the practice of using a Bayesian model to decide how to collect data before you collect it. Most statistical thinking starts with a fixed dataset. Bayesian experimental design sits upstream -- you have control over experimental parameters (which questions to ask, which reagents to mix, which conditions to test) and you want to choose them optimally. The Bayesian angle is to ask: what new data would most reduce my current uncertainty?
Q: When should you actually use Bayesian experimental design?
A: When two conditions hold: you have active control over how data is collected (not just passive observation), and you have a Bayesian model whose prior predictive distribution gives a reasonable picture of what typical data might look like. It's especially valuable when data collection is expensive or irreversible -- when the "committal step" of running an experiment has real cost, it's worth doing the analysis first.
Q: What is expected information gain (EIG) and why is it central to Bayesian experimental design?
A: EIG is the score you assign to a candidate experimental design -- the amount of information you expect to gain about your model parameters by running an experiment with that design. You compute it by simulating datasets from your prior predictive, doing Bayesian inference on each, and averaging how much the uncertainty decreased. What's remarkable is that you can derive the same quantity from two completely different starting points -- reducing parameter uncertainty, or maximizing outcome uncertainty while correcting for noise -- and arrive at the same formula. That convergence is why EIG keeps being re-discovered independently across fields.
Full takeaways here
Chapters:
00:00:00 What is Bayesian experimental design and why does it matter?
00:06:02 What problem does Bayesian experimental design actually solve?
00:08:54 When should practitioners use Bayesian experimental design?
00:12:00 Is Bayesian experimental design changing how scientists work in practice?
00:15:04 What are the limitations of Bayesian experimental design?
00:17:55 What is expected information gain (EIG) and how does it work?
00:21:05 How do you compute expected information gain in practice?
00:23:48 What is active learning and how does it connect to Bayesian experimental design?
00:41:02 What is active learning by disagreement?
00:48:57 What is deep adaptive design and when should you use it?
00:56:02 How is Bayesian experimental design applied in protein dynamics and quantum chemistry?
01:01:58 What does a practical Bayesian experimental design workflow look like?
Thank you to my Patrons for making this episode possible!
Links from the show
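The "two starting points, one formula" remark in the EIG takeaway above corresponds to the two standard ways of writing a mutual information. The notation is an assumption introduced here: $d$ is the design, $\theta$ the parameters, $y$ the outcome, and $H[\cdot]$ denotes entropy.

```latex
\[
\mathrm{EIG}(d)
  \;=\; \mathbb{E}_{p(y \mid d)}\!\big[\, H[p(\theta)] - H[p(\theta \mid y, d)] \,\big]
  \;=\; H[p(y \mid d)] \;-\; \mathbb{E}_{p(\theta)}\!\big[\, H[p(y \mid \theta, d)] \,\big]
\]
```

The first form is the expected reduction in parameter uncertainty; the second is outcome uncertainty minus the part attributable to observation noise. Both equal the mutual information between $\theta$ and $y$ under design $d$.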
Apr 25
1 hr 16 min

Today's clip is from Episode 152 of the podcast, featuring Daniel Saunders. In this conversation, Daniel explores how Bayesian decision theory handles real-world risk aversion beyond the textbook maximum expected utility framework.
The key insight: classical Bayesian decision theory assumes risk neutrality, but in practice, people and businesses are risk-averse. Using a pricing optimization example, Daniel shows how uncertainty varies dramatically across price points: lower prices have predictable demand, while higher prices create wide uncertainty in profits. This asymmetry matters when you want safer decisions.
Daniel introduces exponential utility functions, a technique from economics that models diminishing returns on money. By adjusting a risk-aversion parameter, you can see how increasing risk aversion shifts optimal decisions away from high-uncertainty, high-profit scenarios toward more predictable outcomes.
The broader lesson: optimal decision-making requires separating the modeling process from the decision process, allowing you to build in constraints and risk adjustments that pure expected utility maximization would miss.
Get the full discussion here
Support & Resources
→ Support the show on Patreon: https://www.patreon.com/c/learnbayesstats
→ Bayesian Modeling Course (first 2 lessons free): https://topmate.io/alex_andorra/1011122
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
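To illustrate how a risk-aversion parameter can flip the optimal price, here is a minimal sketch using the exponential utility U(x) = (1 - exp(-a x)) / a. The two profit distributions are invented stand-ins for posterior predictive draws, not numbers from Daniel's example. Ranking options by expected exponential utility is equivalent to ranking them by the certainty equivalent computed below.

```python
import math
import random


def simulate_profits(mean, sd, n=20000, seed=0):
    """Stand-in for posterior predictive profit draws at one price point."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


def certainty_equivalent(profits, risk_aversion):
    """Guaranteed profit judged as good as the gamble under exponential utility.

    risk_aversion = 0 is the risk-neutral limit, i.e. the plain expected profit.
    """
    if risk_aversion == 0:
        return sum(profits) / len(profits)
    mean_exp = sum(math.exp(-risk_aversion * x) for x in profits) / len(profits)
    return -math.log(mean_exp) / risk_aversion


# Hypothetical price points: modest but predictable vs. lucrative but uncertain.
profits = {
    "low price": simulate_profits(mean=100.0, sd=10.0, seed=1),
    "high price": simulate_profits(mean=120.0, sd=80.0, seed=2),
}


def best_price(risk_aversion):
    return max(profits, key=lambda name: certainty_equivalent(profits[name], risk_aversion))


for a in (0.0, 0.001, 0.01):
    print(f"risk aversion {a}: choose the {best_price(a)}")
```

The model producing the profit draws never changes; only the decision rule applied to them does, which is the separation of modeling from decision-making described above.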
Apr 16
5 min

Takeaways:
Q: Why is bridging deep learning and probabilistic programming so important?
A: Deep learning is extraordinarily good at fitting complex functions, but it throws away uncertainty. Probabilistic programming keeps uncertainty explicit throughout. Combining the two, as in inference compilation, lets you get the expressiveness of neural networks while still doing proper Bayesian inference.
Q: What is inference compilation and how does it relate to amortized inference?
A: Amortized inference is the general idea of training a model upfront so you don't have to run expensive inference from scratch every single time. Inference compilation is a specific form of amortized inference where a neural network is trained to propose good posterior samples for a given probabilistic program, essentially learning to do inference rather than computing it fresh each query.
Q: What is PyProb and what problems does it solve?
A: PyProb is a probabilistic programming library designed specifically to support amortized inference workflows. It lets you write probabilistic models in Python and then train inference networks on top of them, making methods like inference compilation practical for real-world simulators and scientific models.
Full takeaways here.
Chapters:
00:00:00 Introduction to Bayesian Inference and Its Barriers
00:03:51 Andreas Munch's Journey into Statistics
00:10:09 Bridging the Gap: Bayesian Inference in Real-World Applications
00:15:56 Deep Learning Meets Probabilistic Programming
00:22:05 Understanding Inference Compilation and Amortized Inference
00:28:14 Exploring PyProb: A Tool for Amortized Inference
00:33:55 Probabilistic Surrogate Networks and Their Applications
00:38:10 Building Surrogate Models for Probabilistic Programming
00:45:44 The Challenge of Bayesian Inference in Enterprises
00:52:57 Communicating Uncertainty to Stakeholders
01:01:09 Democratizing Bayesian Inference with Evara
01:06:27 Insurance Pricing and Latent Variables
01:16:41 Modeling Uncertainty in Predictions
01:20:29 Dynamic Inference and Decision-Making
01:23:17 Updating Models with Actual Data
01:26:11 The Future of Bayesian Sampling in Excel
01:31:54 Navigating Business Challenges and Growth
01:36:40 Exploring Language Models and Their Applications
01:38:35 The Quest for Better Inference Algorithms
01:41:01 Dinner with Great Minds: A Thought Experiment
Thank you to my Patrons for making this episode possible!
Links from the show here.
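The "train upfront, reuse forever" idea behind amortized inference in the takeaways above can be shown on a toy problem. This sketch is deliberately tiny and is not PyProb's API or inference compilation proper: it assumes a conjugate Normal model, uses a linear map fitted by least squares in place of a neural network, and learns only the posterior mean rather than a full posterior or proposal.

```python
import random


def simulate(rng, n_obs=5):
    """One labeled simulation: draw theta from the prior, then data given theta."""
    theta = rng.gauss(0.0, 1.0)
    ybar = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
    return theta, ybar


def train_amortized_estimator(n_sims=20000, n_obs=5, seed=0):
    """Fit theta ~ intercept + slope * ybar by least squares on simulations only."""
    rng = random.Random(seed)
    pairs = [simulate(rng, n_obs) for _ in range(n_sims)]
    mean_theta = sum(t for t, _ in pairs) / n_sims
    mean_ybar = sum(y for _, y in pairs) / n_sims
    cov = sum((y - mean_ybar) * (t - mean_theta) for t, y in pairs)
    var = sum((y - mean_ybar) ** 2 for _, y in pairs)
    slope = cov / var
    intercept = mean_theta - slope * mean_ybar
    return intercept, slope


# Upfront cost: paid once, using simulated data only.
intercept, slope = train_amortized_estimator()

# Inference on any new dataset is now a single function evaluation.
new_ybar = 1.2
estimate = intercept + slope * new_ybar
exact = 5 * new_ybar / (5 + 1)  # analytic posterior mean for this toy model
print(f"amortized estimate: {estimate:.3f}, exact posterior mean: {exact:.3f}")
```

For this model the learned slope should approach n / (n + 1), the analytic shrinkage factor, which gives a simple check that the simulation-trained estimator has learned the right mapping.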
Apr 8
1 hr 54 min

Today's clip is from Episode 154 of the podcast, with Thomas Pinder.
In this conversation, Thomas Pinder explains how Bayesian methods naturally lend themselves to causal modeling, and why that matters for real-world business decisions. The key insight is that causal questions in industry are rarely black and white: instead of a single treatment effect, you get a full posterior distribution, credible intervals, and the ability to communicate the probability that an effect is positive, which is far more useful to stakeholders than a p-value.
Thomas then dives into Bayesian Synthetic Control, a reframing of the classic synthetic control method from a constrained optimization problem into a Bayesian regression problem. Rather than optimizing weights on a simplex, you place a Dirichlet prior on the regression coefficients, which turns out to be not just mathematically elegant but practically richer: you can express prior beliefs about how many control units are informative and set the concentration parameter accordingly, or place a gamma hyperprior on that parameter and let the data decide. The result is a more flexible, less fragile counterfactual, implemented cleanly in PyMC or NumPyro.
Get the full discussion here
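To see how the Dirichlet concentration parameter encodes a belief about how many control units matter, the sketch below samples weight vectors from a symmetric Dirichlet prior and summarizes each by its effective number of units, 1 / sum(w^2). It illustrates the prior only, is not a full synthetic control model in PyMC or NumPyro, and the number of units and concentration values are hypothetical.

```python
import random


def sample_dirichlet(rng, concentration, n_units):
    """Draw simplex weights from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(n_units)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_effective_units(concentration, n_units=10, n_draws=2000, seed=0):
    """Average of 1 / sum(w^2): near 1 if one unit dominates, n_units if uniform."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        weights = sample_dirichlet(rng, concentration, n_units)
        total += 1.0 / sum(w * w for w in weights)
    return total / n_draws


sparse = mean_effective_units(concentration=0.1)  # belief: few donors matter
dense = mean_effective_units(concentration=10.0)  # belief: most donors matter
print(f"concentration 0.1: about {sparse:.1f} effective control units out of 10")
print(f"concentration 10: about {dense:.1f} effective control units out of 10")
```

Small concentration values put most prior mass on sparse weight vectors, while large values favor spreading weight across many control units; a hyperprior on the concentration lets the data choose between these regimes.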
Apr 2
5 min

Takeaways:
Q: Why was GPJax created and how does it benefit researchers?
A: GPJax was developed to provide a high-performance, flexible framework for Gaussian processes (GPs) within the JAX ecosystem. It allows researchers to move beyond black-box implementations and easily experiment with custom kernels and model structures while leveraging JAX’s automatic differentiation and GPU acceleration.
Q: What are the primary advantages of using Gaussian processes for data modeling?
A: Gaussian processes are highly effective at modeling complex, nonlinear relationships in data. Unlike many machine learning methods that only provide a point estimate, GPs offer built-in uncertainty quantification, which is essential for understanding the reliability of predictions in research and industry.
Q: How does the GPJax and NumPyro integration enhance probabilistic modeling?
A: The integration allows users to treat GPJax models as components within a larger NumPyro probabilistic program. This combination enables the use of advanced sampling techniques like NUTS (No-U-Turn Sampler), making it easier to build and fit complex hierarchical models that include Gaussian processes.
Q: What are the main challenges when applying Gaussian processes to high-dimensional data?
A: High-dimensional data significantly complicates GP modeling due to the curse of dimensionality and the cubic scaling of computational costs. In high dimensions, defining meaningful distance metrics for kernels becomes harder, often requiring specialized techniques like sparse GPs or dimensionality reduction to remain tractable.
Full takeaways here!
Chapters:
11:40 What is GPJax and how does it simplify Gaussian Process modeling?
15:48 How are Bayesian methods used for experimentation and causal inference in industry?
18:40 How do you implement Bayesian Synthetic Control?
32:17 What is Bayesian Synthetic Difference-in-Differences?
39:44 What are the research applications and supported methods for the GPJax library?
45:47 What are the primary software and computational bottlenecks when scaling Gaussian Processes?
49:02 What are the real-world industrial applications of Gaussian Process models?
54:36 How is Bayesian modeling applied to soccer and sports analytics?
58:43 What is the future development roadmap for the GPJax ecosystem?
01:05:37 What is Impulso and how does it integrate into a Bayesian modeling workflow?
01:13:42 How do you balance Bayesian computational overhead with industrial latency requirements?
01:20:26 Why is there optimism that scalable Bayesian methods for causal inference are now within reach?
Thank you to my Patrons for making this episode possible!
Links from the show here!
Mar 25
1 hr 26 min
