Bayesian Superiority Estimation with R2D2 Priors: A Practical Guide for Protein Screening
Alternatively titled: How to Make Your Data Work Harder for You!
Hello fellow datanistas!
I've just published a new blog post that tackles one of the most frustrating challenges in experimental science: how do we know if our measurements are meaningful, and which candidates are truly superior?
Here it is: Bayesian Superiority Estimation with R2D2 Priors: A Practical Guide for Protein Screening
If you've ever worked in a lab or analyzed experimental data, you know the struggle:
Your measurements vary wildly between experiments
You can't test everything in one batch
Simple ranking by mean values feels wrong, but you're not sure why
Statistical significance (p-values) doesn't tell you what you really want to know
These challenges aren't unique to protein screening—they appear in drug discovery, materials science, A/B testing, and anywhere we need to compare multiple candidates under noisy conditions.
In this post, I demonstrate two powerful Bayesian techniques that transform how we analyze screening data:
R2D2 priors for variance decomposition: Understand exactly how much of your signal comes from the biological effect versus experimental artifacts. This isn't just statistical hand-waving; it gives you concrete guidance on how to improve your experimental protocols (see the sketch after this list).
Bayesian superiority calculation: Instead of reducing rich data to point estimates, directly calculate the probability that one candidate outperforms another, properly accounting for all uncertainty.
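To make the variance-decomposition idea concrete, here is a minimal sketch in PyMC. It builds an R2D2-style prior from standard primitives: a Beta prior on R², converted to a total signal variance and split across components with a Dirichlet. The simulated data, variable names (protein_effect, batch_effect, phi), and model structure are illustrative assumptions of mine, not the exact model from the post.

```python
import numpy as np
import pymc as pm

# Hypothetical screening data: n_proteins measured across n_batches,
# with a few replicate readouts each.
rng = np.random.default_rng(42)
n_proteins, n_batches, n_reps = 8, 4, 3
protein_idx = np.repeat(np.arange(n_proteins), n_batches * n_reps)
batch_idx = np.tile(np.repeat(np.arange(n_batches), n_reps), n_proteins)
true_protein = rng.normal(0, 1.0, n_proteins)
true_batch = rng.normal(0, 0.5, n_batches)
y = true_protein[protein_idx] + true_batch[batch_idx] + rng.normal(0, 0.3, len(protein_idx))

with pm.Model() as model:
    # Residual (measurement) noise scale.
    sigma = pm.HalfNormal("sigma", 1.0)

    # R2D2-style prior: Beta prior on R^2, converted into a total signal
    # variance, then partitioned across components with a Dirichlet.
    r2 = pm.Beta("r2", alpha=2, beta=2)
    tau2 = pm.Deterministic("tau2", sigma**2 * r2 / (1 - r2))
    phi = pm.Dirichlet("phi", a=np.ones(2))  # phi[0]: protein, phi[1]: batch

    protein_sd = pm.Deterministic("protein_sd", pm.math.sqrt(phi[0] * tau2))
    batch_sd = pm.Deterministic("batch_sd", pm.math.sqrt(phi[1] * tau2))

    protein_effect = pm.Normal("protein_effect", 0, protein_sd, shape=n_proteins)
    batch_effect = pm.Normal("batch_effect", 0, batch_sd, shape=n_batches)

    mu = protein_effect[protein_idx] + batch_effect[batch_idx]
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y)

    idata = pm.sample(1000, tune=1000, target_accept=0.9, random_seed=42)

# The posterior on phi tells you what fraction of the explained variance
# comes from real protein differences versus batch-to-batch artifacts.
```

If phi puts most of its mass on the batch component, that is a direct signal to tighten batch-to-batch consistency before screening more candidates.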
The beauty of this approach is that it gives you answers in the form you actually want: "There's an 85% chance that protein A is superior to protein B" rather than the cryptic "p < 0.05."
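That "85% chance" number falls straight out of the posterior samples. Continuing the hypothetical sketch above (reusing its idata and protein_effect names, which are my assumptions rather than the post's code), the superiority calculation is just counting draws:

```python
import numpy as np

# Flatten chains and draws into a single sample dimension:
# effects has shape (n_proteins, n_samples).
effects = idata.posterior["protein_effect"].stack(sample=("chain", "draw")).values

# P(protein A is superior to protein B) = fraction of posterior samples
# in which A's effect exceeds B's.
a, b = 0, 1  # hypothetical indices for "protein A" and "protein B"
p_superior = (effects[a] > effects[b]).mean()
print(f"P(protein {a} > protein {b}) = {p_superior:.2f}")

# The same idea scales to a full pairwise superiority matrix.
n_proteins = effects.shape[0]
superiority = np.zeros((n_proteins, n_proteins))
for i in range(n_proteins):
    for j in range(n_proteins):
        superiority[i, j] = (effects[i] > effects[j]).mean()
```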
Good statistical practice is under-rated and under-taught among both machine learning practitioners and laboratory scientists. Yet it underpins our ability to build high-performing models and design experiments that yield interpretable, actionable results.
As I note in the post: "Without good statistical practice underlying the data generating process—and by that I mean good experimental design and explicit quantification of uncertainty—all ML models become equal: equally bad."
This post is for you if:
You work with experimental data and need to rank candidates
You're tired of p-values and want more intuitive measures of certainty
You want to understand how much of your signal is real versus experimental noise
You're interested in practical applications of Bayesian methods
The post includes complete code examples using PyMC, clear visualizations, and practical interpretations that you can apply to your own work.
Happy coding,
Eric
P.S. The techniques I describe aren't fancy statistical magic—they're just logic and math, applied to data. Isn't that what statistics was supposed to be all along?