Jan 11, 2026 · 4 min read

Bayesian vs. Frequentist: Two Ways of Thinking About Uncertainty

Two dominant frameworks of statistical inference—Frequentist and Bayesian—start from the same data but differ in what probability means and how uncertainty is quantified.



By Amandeep Singh

Research portfolio on Bayesian statistics, macroeconomic tail risk, actuarial systems, and essays on India, law, history, and political structure.


At A Glance

  • Format: Technical explainer
  • Sections: 8
  • Read time: 4 min
  • Published: Jan 11, 2026

Topics

Bayesian Methods · Frequentist Methods · Statistics · Inference

Statistical inference is how we learn from data: we estimate unknown quantities, quantify uncertainty, and make decisions under noise. Two paradigms dominate modern inference—Frequentist and Bayesian—and while they often start from the same likelihood and the same dataset, they differ in what probability means, what is considered random, and how conclusions should be interpreted.

This article explains the core ideas behind both paradigms, how each approaches estimation and testing, and what practical trade-offs matter when choosing between them.


1. The Frequentist Approach

The Frequentist view interprets probability as a long-run frequency: if you repeated the same experiment under identical conditions many times, probability is the limiting proportion of times an event occurs.

$$P(A)=\lim_{n\to\infty}\frac{n_A}{n}.$$

Data are random; parameters are fixed

In frequentist inference, the observed dataset is treated as a random outcome of a sampling process, while parameters are treated as fixed but unknown constants. A standard modeling assumption is that data are i.i.d.:

$$X_i \sim f(x\mid \theta), \quad i=1,\dots,n.$$

Here, the randomness is entirely in the sample. You could have drawn a different sample from the same population, and frequentist methods evaluate performance by imagining repeated sampling.

A central result that supports many frequentist procedures is the Central Limit Theorem (CLT), which motivates approximate normality of common estimators (like the sample mean) for large samples:

$$\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i \;\approx\; \mathcal{N}\!\left(\mu,\frac{\sigma^2}{n}\right) \quad \text{as } n\to\infty.$$
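
To see the CLT in action, here is a minimal numpy simulation; the skewed exponential population, the sample sizes, and the replication count are illustrative assumptions rather than anything from this article. The spread of the sample means should shrink like $\sigma/\sqrt{n}$ even though the population is far from normal.

```python
# A minimal CLT simulation: sample means of a skewed Exp(1) population.
# The population, sample sizes, and replication count are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_reps = 100_000  # Exp(1) has mean 1 and variance 1

for n in (2, 10, 100):
    # n_reps independent samples of size n, reduced to their sample means
    means = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)
    # CLT: sd of the sample mean should be close to sigma/sqrt(n) = 1/sqrt(n)
    print(f"n={n:>3}: mean={means.mean():.3f}, sd={means.std():.3f}, "
          f"sigma/sqrt(n)={1/np.sqrt(n):.3f}")
```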

2. Frequentist Estimation

An estimator is a function of the data:

$$\hat{\theta}=g(X_1,\dots,X_n).$$

A classic criterion is unbiasedness:

$$\mathbb{E}[\hat{\theta}]=\theta.$$

Frequentist estimators are assessed via their sampling distributions. Key metrics include variance and mean squared error (MSE).

Variance (the spread of the estimator around its own expectation):

$$\mathrm{Var}(\hat{\theta})=\mathbb{E}\!\left[\left(\hat{\theta}-\mathbb{E}[\hat{\theta}]\right)^2\right].$$

MSE, the expected squared error $\mathbb{E}[(\hat{\theta}-\theta)^2]$, decomposes as:

$$\mathrm{MSE}(\hat{\theta})=\mathbb{E}\!\left[(\hat{\theta}-\theta)^2\right]=\mathrm{Var}(\hat{\theta})+\left(\mathbb{E}[\hat{\theta}]-\theta\right)^2.$$

This decomposition makes the familiar bias–variance trade-off explicit.
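
A quick Monte Carlo check makes the decomposition concrete. The sketch below compares the unbiased divide-by-$(n-1)$ variance estimator with the biased divide-by-$n$ (MLE) version; the normal population, sample size, and replication count are assumptions chosen for illustration. The biased estimator typically comes out with the lower MSE at small $n$: the trade-off in action.

```python
# Monte Carlo check of MSE = Var + Bias^2 for two variance estimators.
# Normal population, n = 10, and replication count are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
true_var, n, n_reps = 4.0, 10, 200_000
samples = rng.normal(0.0, np.sqrt(true_var), size=(n_reps, n))

for ddof, label in ((1, "unbiased (n-1)"), (0, "MLE (n)")):
    est = samples.var(axis=1, ddof=ddof)      # one estimate per replication
    bias_sq = (est.mean() - true_var) ** 2
    var = est.var()
    mse = np.mean((est - true_var) ** 2)
    print(f"{label:>15}: bias^2={bias_sq:.4f}, var={var:.4f}, "
          f"mse={mse:.4f}, var+bias^2={var + bias_sq:.4f}")
```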


3. Frequentist Methodology

(a) Hypothesis testing

You specify a null hypothesis $H_0$ and an alternative $H_a$, compute a test statistic, and quantify evidence via a p-value:

$$p\text{-value}=\Pr\!\left(T(X)\ge T_{\mathrm{obs}} \mid H_0 \text{ is true}\right).$$
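
As a concrete instance, here is a short sketch using scipy's one-sample t-test; the simulated dataset (true mean 0.5) and the null value $\mu_0 = 0$ are illustrative assumptions.

```python
# One-sample t-test: H0: mu = 0 against a two-sided alternative.
# The simulated data (true mean 0.5, n = 30) are an illustrative assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=0.5, scale=1.0, size=30)

t_obs, p_value = stats.ttest_1samp(x, popmean=0.0)
# A small p-value says data this extreme would be rare in repeated sampling
# under H0; it is not the probability that H0 is true.
print(f"t = {t_obs:.3f}, p-value = {p_value:.4f}")
```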

(b) Confidence intervals

A typical $100(1-\alpha)\%$ confidence interval is:

$$\hat{\theta}\pm z_{\alpha/2}\cdot \mathrm{SE}(\hat{\theta}).$$

For the mean with known variance:

$$\bar{X}\pm z_{\alpha/2}\cdot \frac{\sigma}{\sqrt{n}}.$$

Interpretation is crucial: the probability statement is about the procedure over repeated samples, not about $\theta$ being random.
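
A short simulation makes that repeated-sampling reading tangible: build the known-variance z-interval over many replications and count how often it covers the fixed $\mu$. All constants below are illustrative assumptions.

```python
# Coverage check for the known-sigma z-interval: about 95% of intervals
# built over repeated samples should contain the fixed, non-random mu.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n, n_reps = 10.0, 2.0, 25, 10_000
z = stats.norm.ppf(0.975)                 # z_{alpha/2} for alpha = 0.05

samples = rng.normal(mu, sigma, size=(n_reps, n))
xbar = samples.mean(axis=1)
half_width = z * sigma / np.sqrt(n)
covered = (xbar - half_width <= mu) & (mu <= xbar + half_width)
print(f"empirical coverage: {covered.mean():.3f} (nominal 0.95)")
```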

(c) Maximum Likelihood Estimation (MLE)

MLE selects parameters that maximize the likelihood of the observed data:

$$\hat{\theta}_{\mathrm{MLE}}=\arg\max_{\theta} L(\theta\mid X), \qquad L(\theta\mid X)=\prod_{i=1}^n f(X_i\mid \theta).$$
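
In textbook models the MLE has a closed form, but numerically maximizing the log-likelihood is the general recipe. Below is a sketch for an assumed Exponential(rate) model with simulated data, minimizing the negative log-likelihood and checking against the closed-form answer $1/\bar{X}$.

```python
# Numerical MLE for an assumed Exponential(rate) model: minimize the negative
# log-likelihood. For this model the closed form is rate_hat = 1 / mean(x).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=200)   # simulated data, true rate = 0.5

def neg_log_lik(rate):
    # log L(rate | x) = n * log(rate) - rate * sum(x)
    return -(len(x) * np.log(rate) - rate * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(f"numerical MLE: {res.x:.4f}, closed form: {1 / x.mean():.4f}")
```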

4. The Bayesian Approach

Bayesian inference interprets probability as a degree of belief (uncertainty) that can be updated when new data arrives.

Bayes’ theorem:

$$P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}.$$

In parameter inference, Bayesians treat the parameter as a random variable with a prior distribution:

$$\theta \sim p(\theta).$$

After observing data $X$, beliefs are updated to the posterior:

$$p(\theta\mid X)=\frac{p(X\mid \theta)\,p(\theta)}{p(X)} \;\propto\; p(X\mid \theta)\,p(\theta).$$
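
With a conjugate prior the update is available in closed form. A minimal sketch, assuming a Beta(2, 2) prior on a coin's heads probability and 7 heads in 10 flips (both illustrative choices): the data counts are simply added to the prior parameters.

```python
# Conjugate update: Beta prior + Binomial likelihood => Beta posterior.
# The Beta(2, 2) prior and the 7-heads-in-10-flips data are illustrative.
a0, b0 = 2, 2          # prior Beta(a0, b0) on the heads probability theta
heads, flips = 7, 10   # observed data

a_post = a0 + heads             # posterior: Beta(a0 + heads, b0 + tails)
b_post = b0 + (flips - heads)
print(f"posterior: Beta({a_post}, {b_post}), "
      f"posterior mean = {a_post / (a_post + b_post):.3f}")
```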

5. Bayesian Estimation and Uncertainty

A common Bayesian point estimate is the posterior mean:

$$\mathbb{E}[\theta\mid X]=\int \theta\, p(\theta\mid X)\, d\theta.$$

Another is the MAP (maximum a posteriori) estimate:

$$\hat{\theta}_{\mathrm{MAP}}=\arg\max_{\theta} p(\theta\mid X).$$

A $100(1-\alpha)\%$ credible interval satisfies:

$$\Pr(\theta_1 \le \theta \le \theta_2 \mid X)=1-\alpha.$$

Unlike confidence intervals, credible intervals are direct probability statements about $\theta$ conditioned on the observed data.
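
Continuing the illustrative Beta-Binomial example above (posterior Beta(9, 5)), the posterior mean, the MAP estimate, and an equal-tailed 95% credible interval can all be read directly off the posterior:

```python
# Posterior summaries for the Beta(9, 5) posterior from the sketch above.
from scipy import stats

a_post, b_post = 9, 5
post = stats.beta(a_post, b_post)

mean = post.mean()                               # E[theta | X]
map_est = (a_post - 1) / (a_post + b_post - 2)   # Beta mode, valid for a, b > 1
lo, hi = post.ppf([0.025, 0.975])                # equal-tailed 95% interval
print(f"mean={mean:.3f}, MAP={map_est:.3f}, "
      f"95% credible interval=({lo:.3f}, {hi:.3f})")
```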


6. Bayesian Testing: Bayes Factors

Bayesian hypothesis testing often compares models via the Bayes factor:

$$B=\frac{p(X\mid H_1)}{p(X\mid H_0)}.$$

Posterior odds relate to prior odds by:

$$\frac{p(H_1\mid X)}{p(H_0\mid X)} = B \times \frac{p(H_1)}{p(H_0)}.$$

This makes explicit how evidence and prior belief jointly drive conclusions.
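
For intuition, here is a toy Bayes factor for a coin, comparing $H_0{:}\ \theta=0.5$ against $H_1{:}\ \theta\sim\mathrm{Beta}(1,1)$; the 7-heads-in-10-flips data reuse the illustrative example above. The binomial coefficients cancel in the ratio, so only the marginal likelihood kernels are needed.

```python
# Toy Bayes factor: H0: theta = 0.5 (fair coin) vs H1: theta ~ Beta(1, 1).
# Data (7 heads in 10 flips) are illustrative; binomial coefficients cancel.
import numpy as np
from scipy.special import betaln

heads, flips = 7, 10

# Under H1 the marginal likelihood kernel is the Beta integral
# int theta^heads (1 - theta)^(flips - heads) dtheta = B(heads+1, tails+1).
log_m1 = betaln(heads + 1, flips - heads + 1)
log_m0 = flips * np.log(0.5)                  # 0.5^flips under the fair coin
bf_10 = np.exp(log_m1 - log_m0)
print(f"Bayes factor B(H1 : H0) = {bf_10:.3f}")  # values < 1 favor H0
```

With these numbers the factor comes out slightly below 1, so the data mildly favor the fair-coin hypothesis even though 7 of 10 flips were heads.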


7. Practical Differences That Matter

  • What probability means

    • Frequentist: long-run frequency under repetition
    • Bayesian: quantified belief updated with data
  • What’s random

    • Frequentist: the data (sampling); parameters are fixed
    • Bayesian: parameters are random (via a prior); data update the posterior
  • Intervals

    • Frequentist: confidence intervals (coverage under repeated sampling)
    • Bayesian: credible intervals (probability statements about $\theta$ given $X$)
  • Computation

    • Frequentist: often analytic/asymptotic (MLE, CLT)
    • Bayesian: often computational for complex models (MCMC, etc.)

Conclusion

Neither framework is “universally better.” Frequentist methods offer strong long-run guarantees and are often straightforward to compute and communicate. Bayesian methods provide a coherent way to incorporate prior knowledge and produce direct probability statements about unknowns—especially powerful in hierarchical models and sequential learning.

In practice, the best approach is pragmatic: match the framework to your question, your assumptions, and what you need to report (coverage guarantees vs. posterior probabilities), and validate conclusions with sensitivity checks.
