ST308 Bayesian Inference

Class 2


Philipp Sterzinger

p.sterzinger@lse.ac.uk

07 February 2025

Recap

Decision Theory

Statistical decision problem

Model indexed by \(\theta\) with conditional density \(f(x \mid \theta)\) and data \(x \in \mathcal{X}\)

  • Parameter space \(\Theta\)
  • Set of possible actions \(\mathcal{A}\)
  • Loss function: \(L(a, \theta): \mathcal{A} \times \Theta \to \Re\)

Decision rule

\(\delta(x) : \mathcal{X} \to \mathcal{A}\)

Frequentist, Posterior, Bayes risk

Given a statistical decision problem, we can associate a decision rule \(\delta(x)\) with a risk

Frequentist risk

\[ R(\delta(x),\theta) = \textrm{E}_{X \mid \theta} \left(L(\delta(x), \theta) \right) = \int_{\mathcal{X}} L(\delta(x), \theta) f(x \mid \theta) dx \equiv g(\theta) \]

Posterior risk

\[ \rho(\delta(x), \pi(\theta)) = \textrm{E}_{\theta \mid x} \left(L(\delta(x), \theta) \right) = \int_{\Theta} L(\delta(x), \theta) \pi(\theta \mid x) d\theta \equiv h(x) \]

Bayes risk

\[ r(\delta(x), \pi(\theta)) = \textrm{E}_{\theta} \left(R(\delta(x), \theta) \right) = \int_{\Theta} R(\delta(x), \theta) \pi(\theta) d\theta \equiv c \in \Re \]

Point estimator

Given model \(f(x \mid \theta)\), and data \(x\), find a best guess \(\hat{\theta}\)

  • Action: Report point estimate \(\theta \in \Theta\), i.e. \(\mathcal{A} = \Theta\)
  • Loss function: squared loss, ML, …
  • Decision rule: Estimator \(\hat{\theta}: x \mapsto \theta\), e.g. sample mean

Exercises

Question 1

Consider the vaccination example in the lecture slides.

  1. Assume that a person has tested positive for immunity. Which of the decision rules has the lowest posterior risk?
  2. Repeat the above for the case that the person was tested negative.
  3. Combine the two above cases and choose the optimal decision rule. Compare with the Bayes risk outcome.

Question 1

Likelihood

\[ \begin{array}{l|cc} f(x \mid \theta) & x_1\textrm{: positive} & x_2\textrm{: negative} \\ \hline \theta_1\textrm{: immune} & 0.65 & 0.35 \\ \theta_2\textrm{: susceptible} & 0.25 & 0.75 \end{array} \]

Loss function

\[ \begin{array}{l|cc} L(a, \theta) & a_1\textrm{: vaccinate} & a_2\textrm{: do not vaccinate} \\ \hline \theta_1\textrm{: immune} & 8 & 0 \\ \theta_2\textrm{: susceptible} & 0 & 20 \end{array} \]

Decision rules

  • \(\delta_1(x) = a_1\) (Vaccinate everybody)
  • \(\delta_2(x_i) = a_i\) (Vaccinate positives)
  • \(\delta_3(x_i) = a_{-i}\) (Vaccinate negatives)
  • \(\delta_4(x) = a_2\) (Vaccinate nobody)

Question 1.a

Prior

\[ \pi(\theta) = \begin{cases} 0.6 & \textrm{if } \theta = \theta_1, \\ 0.4 & \textrm{else.} \end{cases} \]

Posterior

Since \(x = x_1\), use Bayes’ Theorem and consider the following two cases:

  1. \(\theta = \theta_1\), \(x = x_1\)
  2. \(\theta = \theta_2\), \(x = x_1\)

Question 1.a

\[ \begin{aligned} \pi(\theta_1 \mid x_1) &= \Pr \left(\theta_1 \mid x_1 \right) \\ &= \frac{\Pr(x_1 \mid \theta_1) \Pr(\theta = \theta_1)}{\Pr(x = x_1)} \\ &= \frac{\Pr(x_1 \mid \theta_1) \Pr(\theta = \theta_1)}{\sum_{i = 1}^2\Pr(x = x_1 \mid \theta_i) \Pr(\theta = \theta_i) } \\ &= \frac{0.65 \times 0.6}{0.65 \times 0.6 + 0.25 \times 0.4} \\ &= \frac{39}{49} \,, \end{aligned} \]

and \[\pi(\theta_2 \mid x_1) = 1 - \pi(\theta_1 \mid x_1) = \frac{10}{49}\,.\]
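The posterior calculation can be reproduced numerically. The following is a minimal R sketch using the likelihood table and prior from the slides; the object names (`lik`, `prior`, `post_pos`) are chosen here for illustration only.

```r
# Likelihood f(x | theta): rows are states of nature, columns are test results.
lik <- matrix(c(0.65, 0.35,
                0.25, 0.75),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("immune", "susceptible"),
                              c("positive", "negative")))
prior <- c(immune = 0.6, susceptible = 0.4)

# Bayes' theorem for a positive test: prior times likelihood, renormalised.
joint_pos <- lik[, "positive"] * prior
post_pos <- joint_pos / sum(joint_pos)
post_pos
```

The two entries of `post_pos` should agree with \(39/49\) and \(10/49\).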

Question 1.a

\(\delta_1(x)\)

\[ \begin{aligned} \rho(\delta_1(x_1), \pi(\theta)) &= \textrm{E}_{\theta \mid x = x_1} \left(L(\delta_1(x), \theta) \right) \\ &= L(a_1, \theta_1) \pi(\theta_1 \mid x_1) + L(a_1, \theta_2) \pi(\theta_2 \mid x_1) \\ &= 8 \frac{39}{49} + 0 \frac{10}{49} \\ &= \frac{312}{49} \,. \end{aligned} \]

\(\delta_2(x)\)

\(\delta_2(x_1) = a_1 = \delta_1(x_1)\) so \(\rho(\delta_2(x_1), \pi(\theta)) = 312 / 49\).

Question 1.a

\(\delta_3(x)\)

\[ \begin{aligned} \rho(\delta_3(x_1), \pi(\theta)) &= \textrm{E}_{\theta \mid x = x_1} \left(L(\delta_3(x), \theta) \right) \\ &= L(a_2, \theta_1) \pi(\theta_1 \mid x_1) + L(a_2, \theta_2) \pi(\theta_2 \mid x_1) \\ &= 0 \frac{39}{49} + 20 \frac{10}{49} \\ &= \frac{200}{49} \,. \end{aligned} \]

\(\delta_4(x)\)

\(\delta_4(x_1) = a_2 = \delta_3(x_1)\) so \(\rho(\delta_4(x_1), \pi(\theta)) = 200 / 49\).




\(\longrightarrow \delta_3, \delta_4\) minimise posterior risk given \(x_1\)

Question 1.b

Posterior

\[ \begin{aligned} \pi(\theta_1 \mid x_2) &= \Pr \left(\theta_1 \mid x_2 \right) \\ &= \frac{\Pr(x_2 \mid \theta_1) \Pr(\theta = \theta_1)}{\Pr(x = x_2)} \\ &= \frac{\Pr(x_2 \mid \theta_1) \Pr(\theta = \theta_1)}{\sum_{i = 1}^2\Pr(x = x_2 \mid \theta_i) \Pr(\theta = \theta_i) } \\ &= \frac{0.35 \times 0.6}{0.35 \times 0.6 + 0.75 \times 0.4} \\ &= \frac{21}{51} \,, \end{aligned} \]

and \[\pi(\theta_2 \mid x_2) = 1 - \pi(\theta_1 \mid x_2) = \frac{30}{51}\,.\]

Question 1.b

\(\delta_1(x)\)

\[ \begin{aligned} \rho(\delta_1(x_2), \pi(\theta)) &= \textrm{E}_{\theta \mid x = x_2} \left(L(\delta_1(x), \theta) \right) \\ &= L(a_1, \theta_1) \pi(\theta_1 \mid x_2) + L(a_1, \theta_2) \pi(\theta_2 \mid x_2) \\ &= 8 \frac{21}{51} + 0 \frac{30}{51} \\ &= \frac{168}{51} \,. \end{aligned} \]

\(\delta_2(x)\)

\[ \begin{aligned} \rho(\delta_2(x_2), \pi(\theta)) &= \textrm{E}_{\theta \mid x = x_2} \left(L(\delta_2(x), \theta) \right) \\ &= L(a_2, \theta_1) \pi(\theta_1 \mid x_2) + L(a_2, \theta_2) \pi(\theta_2 \mid x_2) \\ &= 0 \frac{21}{51} + 20 \frac{30}{51} \\ &= \frac{600}{51} \,. \end{aligned} \]

Question 1.b

\(\delta_3(x)\)

\(\delta_3(x_2) = a_1 = \delta_1(x_2)\) so \(\rho(\delta_3(x_2), \pi(\theta)) = 168 / 51\).

\(\delta_4(x)\)

\(\delta_4(x_2) = a_2 = \delta_2(x_2)\) so \(\rho(\delta_4(x_2), \pi(\theta)) = 600 / 51\).




\(\longrightarrow \delta_1, \delta_3\) minimise posterior risk given \(x_2\)

Question 1.c

Posterior risk

\[ \begin{array}{l|cccc} \rho(\delta(x), \pi(\theta)) & \delta_1 & \delta_2 & \delta_3 & \delta_4 \\ \hline x_1\textrm{: positive} & \frac{312}{49} & \frac{312}{49} & \frac{200}{49} & \frac{200}{49} \\ x_2\textrm{: negative} & \frac{168}{51} & \frac{600}{51} & \frac{168}{51} & \frac{600}{51} \end{array} \]
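The full table of posterior risks can be computed in one pass. This is a sketch under the assumption that each decision rule is encoded as a vector of action indices, one per test outcome; that encoding and all object names are illustrative choices, not part of the lecture material.

```r
# Rows: theta_1 (immune), theta_2 (susceptible).
lik <- matrix(c(0.65, 0.35, 0.25, 0.75), nrow = 2, byrow = TRUE)  # columns x_1, x_2
loss <- matrix(c(8, 0, 0, 20), nrow = 2, byrow = TRUE)            # columns a_1, a_2
prior <- c(0.6, 0.4)

# Each rule lists the action index chosen for x_1 and for x_2.
rules <- list(delta1 = c(1, 1), delta2 = c(1, 2),
              delta3 = c(2, 1), delta4 = c(2, 2))

# Posterior pi(theta | x): one column per test outcome.
joint <- lik * prior                    # prior[i] multiplies row i
marg <- colSums(joint)                  # marginal probabilities of x_1, x_2
post <- sweep(joint, 2, marg, "/")

# Posterior risk: expected loss of the chosen action under pi(theta | x_j).
post_risk <- sapply(rules, function(rule) {
  sapply(1:2, function(j) sum(loss[, rule[j]] * post[, j]))
})
rownames(post_risk) <- c("x1", "x2")
post_risk
```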

Question 1.c

Bayes risk

\[ \begin{aligned} r(\delta(x), \pi(\theta)) &= \textrm{E}_{\theta} \left(R(\delta(x), \theta) \right) \\ &= \textrm{E}_{\theta} \left(\textrm{E}_{X \mid \theta} \left[L(\delta(x), \theta) \right] \right) \\ &= \textrm{E}_{\theta, X} \left(L(\delta(x), \theta) \right) \\ &= \textrm{E}_{X} \left(\textrm{E}_{\theta \mid x} \left[L(\delta(x), \theta) \right] \right) \\ &= \textrm{E}_{X} \left(\rho(\delta(x), \pi(\theta)) \right) \\ &= \int_{\mathcal{X}} \rho(\delta(x), \pi(\theta)) m(x) dx \,. \end{aligned} \]

Hence, if there exists a decision rule \(\delta_i(x)\) such that \[ \rho(\delta_i(x), \pi(\theta)) \leq \rho(\delta_{-i}(x), \pi(\theta)) \,, \] for all \(x \in \mathcal{X}\) and every competing rule \(\delta_{-i}\), then also \[ r(\delta_i(x), \pi(\theta)) \leq r(\delta_{-i}(x), \pi(\theta)) \,. \]

Question 1.c

Marginal distribution of \(X\)

\[ \begin{aligned} \Pr \left( x = x_1 \right) &= \sum_{i = 1}^2 \Pr \left( x = x_1 \cap \theta = \theta_i \right) \\ &= \sum_{i = 1}^2 \Pr \left( x = x_1 \mid \theta_i \right) \Pr(\theta = \theta_i) \\ &= 0.65 \times 0.6 + 0.25 \times 0.4 \\ &= 0.49 \end{aligned} \]

\[ \begin{aligned} \Pr \left( x = x_2 \right) &= 1 - \Pr \left( x = x_1 \right) \\ &= 0.51 \end{aligned} \]

Question 1.c

Bayes risk \(\delta_1(x)\)

\[ r(\delta_1(x), \pi(\theta)) = 0.49 \times \frac{312}{49} + 0.51 \frac{168}{51} = 4.8 \]

Bayes risk \(\delta_2(x)\)

\[ r(\delta_2(x), \pi(\theta)) = 0.49 \times \frac{312}{49} + 0.51 \frac{600}{51} = 9.12 \]

Question 1.c

Bayes risk \(\delta_3(x)\)

\[ r(\delta_3(x), \pi(\theta)) = 0.49 \times \frac{200}{49} + 0.51 \frac{168}{51} = 3.68 \]

Bayes risk \(\delta_4(x)\)

\[ r(\delta_4(x), \pi(\theta)) = 0.49 \times \frac{200}{49} + 0.51 \frac{600}{51} = 8 \]

Question 1.c

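\[ \begin{array}{l|cccc} & \delta_1 & \delta_2 & \delta_3 & \delta_4 \\ \hline r(\delta(x), \pi(\theta)) & 4.8 & 9.12 & 3.68 & 8 \end{array} \]

\(\longrightarrow \delta_3\) minimises the Bayes risk, in agreement with the posterior risks: \(\delta_3\) attains the minimum posterior risk for both \(x_1\) and \(x_2\).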
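The Bayes risks are the posterior risks weighted by the marginal probabilities of the test outcomes. A short R sketch of this arithmetic, with the posterior risks entered directly from the table of posterior risks (object names are illustrative):

```r
# Posterior risks from the slides: rows are test outcomes, columns are rules.
post_risk <- rbind(x1 = c(312, 312, 200, 200) / 49,
                   x2 = c(168, 600, 168, 600) / 51)
colnames(post_risk) <- paste0("delta", 1:4)

# Marginal probabilities of a positive and a negative test.
marg <- c(x1 = 0.49, x2 = 0.51)

# Bayes risk: average the posterior risk over the marginal distribution of x.
bayes_risk <- colSums(marg * post_risk)   # marg[j] multiplies row j
bayes_risk
names(which.min(bayes_risk))              # rule with the smallest Bayes risk
```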

Question 2

Consider the quadratic error, absolute error and \(0-1\) loss functions. Find the Bayes estimator for \(\theta\) in the case of

  1. A random sample \(x = (x_1, \ldots, x_n)\) from a \(\mathrm{N}(\theta, 1)\). Assign a \(\mathrm{N}(\mu, \tau^2)\) prior to \(\theta\).
  2. A single observation \(x\) from a \(\textrm{Binom}(n, \theta)\). Assign a \(\textrm{Beta}(\alpha, \beta)\) prior to \(\theta\).

Question 2

  • Quadratic loss \((a - \theta)^2\): the Bayes estimator is the posterior mean \(\textrm{E}_{\theta \mid x}[\theta]\)
  • Absolute loss \(|a - \theta|\): the Bayes estimator is the posterior median, i.e. \(a\) such that \(F_{\theta \mid x}(a) = \frac{1}{2}\)
  • \(0-1\) loss \(\mathbb{1}\{|a - \theta| > \epsilon \}\): the Bayes estimator is the posterior mode \(\arg \max_{a} f_{\theta \mid x}(a)\) (in the limit \(\epsilon \to 0\))
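The correspondence between loss function and Bayes estimator can be checked numerically by minimising the posterior expected loss directly. The sketch below assumes a hypothetical \(\textrm{Beta}(3, 5)\) posterior purely for illustration; the helper `post_exp_loss` is not from the lecture.

```r
# Hypothetical posterior for illustration: theta | x ~ Beta(a, b).
a <- 3
b <- 5

# Posterior expected loss of reporting `act`. The integral is split at `act`
# so that the kink of the absolute loss sits on an integration endpoint.
post_exp_loss <- function(act, loss) {
  integrand <- function(theta) loss(act, theta) * dbeta(theta, a, b)
  integrate(integrand, 0, act)$value + integrate(integrand, act, 1)$value
}

quad_loss <- function(act, theta) (act - theta)^2
abs_loss <- function(act, theta) abs(act - theta)

# Minimise the posterior expected loss over the action space [0, 1].
est_quad <- optimize(post_exp_loss, interval = c(0, 1),
                     loss = quad_loss, tol = 1e-6)$minimum
est_abs <- optimize(post_exp_loss, interval = c(0, 1),
                    loss = abs_loss, tol = 1e-6)$minimum

# Compare with the posterior mean and the posterior median.
rbind(quadratic = c(numerical = est_quad, closed_form = a / (a + b)),
      absolute = c(numerical = est_abs, closed_form = qbeta(0.5, a, b)))
```

Up to optimiser tolerance, the two columns should agree in each row.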

Question 2.a

From the previous class, we know that the posterior in this case is

\[ \mathrm{N} \left(\frac{\frac{1}{n}\mu + \tau^2 \bar{x}}{\tau^2 + \frac{1}{n}}, \frac{\tau^2 \frac{1}{n}}{\tau^2 + \frac{1}{n}} \right) \]

Code
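# Standard normal density: by symmetry, mean, median and mode all sit at 0
plot(dnorm, from = -3, to = 3)

# Shade the region to the left of 0 (half of the probability mass lies below 0)
xs <- seq(from = -3, to = 0, length.out = 100)
ds <- dnorm(xs)

polygon_x <- c(min(xs), xs, max(xs))
polygon_y <- c(0, ds, 0)  # close the polygon along the horizontal axis
polygon(polygon_x, polygon_y, col = rgb(0, 0, 1, 0.25))
abline(v = 0, col = "black", lty = 2, lwd = 2)  # common location of mean, median, mode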

Mean, median, and mode of a normal distribution are all the same quantity:

\[ \frac{\frac{1}{n}\mu + \tau^2 \bar{x}}{\tau^2 + \frac{1}{n}} \]
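As a sanity check, the posterior mean and variance can be evaluated for simulated data and compared with the equivalent precision-weighted form. The data, seed and prior values below are hypothetical and chosen only for illustration.

```r
set.seed(1)
x <- rnorm(20, mean = 1, sd = 1)   # hypothetical sample from N(theta, 1)
mu <- 0                            # hypothetical prior mean
tau2 <- 4                          # hypothetical prior variance

n <- length(x)
xbar <- mean(x)

# Posterior mean and variance in the form used on the slide.
post_mean <- (mu / n + tau2 * xbar) / (tau2 + 1 / n)
post_var <- (tau2 / n) / (tau2 + 1 / n)

# Equivalent precision-weighted form: precisions add, means are weighted.
post_prec <- 1 / tau2 + n
alt_mean <- (mu / tau2 + n * xbar) / post_prec

# The posterior median from the normal quantile function equals the mean.
post_median <- qnorm(0.5, mean = post_mean, sd = sqrt(post_var))

c(post_mean = post_mean, alt_mean = alt_mean, post_median = post_median)
```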

Question 2.b

Likelihood

\[ f(x \mid \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x} \]

Prior

\[ \pi(\theta) = \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} B(\alpha, \beta)^{-1}, \quad B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)} \]

Question 2.b

Posterior

\[ \begin{aligned} \pi(\theta \mid x) &\propto f(x \mid \theta) \pi(\theta) \\ &= \binom{n}{x} \theta^{x} (1 - \theta)^{n - x} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} B(\alpha, \beta)^{-1} \\ & \propto \theta^{x} (1 - \theta)^{n - x} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \\ &= \theta^{x + \alpha - 1} (1 - \theta)^{n - x + \beta - 1} \\ &\sim \textrm{Beta}(x + \alpha, n - x + \beta) \,. \end{aligned} \]

Question 2.b

Mean of Beta\((a, b)\)

\[ \begin{aligned} B(a, b)^{-1} \int_0^1 \theta \theta^{a - 1} (1 - \theta)^{b - 1} d \theta &= B(a, b)^{-1} \int_0^1 \theta^{a } (1 - \theta)^{b - 1} d \theta \\ &= \frac{B(a + 1, b)}{B(a, b)} \\ &= \frac{\Gamma(a + 1) \Gamma(b)}{\Gamma(a + b + 1)} \frac{\Gamma(a + b)}{\Gamma(a) \Gamma(b)} \\ &= \frac{\Gamma(a + 1)}{\Gamma(a )} \frac{\Gamma(a + b)}{\Gamma(a + b + 1)} \,, \end{aligned} \] for \[ \Gamma(z) = \int_0^\infty t^{z - 1} \exp\{- t\} dt \]

Question 2.b

Mean of Beta\((a, b)\)

Using integration by parts, we get that

\[ \begin{aligned} \Gamma(z + 1) &= \int_0^\infty t^{z } \exp\{- t\} dt \\ &= \left[-t^z \exp\{-t\} \right]^{\infty}_0 + \int_0^\infty z t^{z-1} \exp\{-t\} dt\\ &= [-0 - 0] + z \int_0^\infty t^{z - 1} \exp\{- t\} dt \\ &= z \Gamma(z) \end{aligned} \]

Hence

\[ \begin{aligned} \frac{\Gamma(a + 1)}{\Gamma(a )} \frac{\Gamma(a + b)}{\Gamma(a + b + 1)} &= \frac{a}{a + b} \end{aligned} \]

Question 2.b

Mean of Beta\((a, b)\)

\[ \textrm{E}_{\theta \mid x}[\theta] = \frac{x + \alpha}{n + \alpha + \beta} \,. \]
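The closed-form posterior mean can be compared with a direct numerical integration of \(\theta \, \pi(\theta \mid x)\). The values of `n`, `x`, `alpha` and `beta` below are hypothetical.

```r
# Hypothetical data and prior parameters for illustration.
n <- 10
x <- 3
alpha <- 2
beta <- 2.5

# Posterior is Beta(a, b) with a = x + alpha and b = n - x + beta.
a <- x + alpha
b <- n - x + beta

closed_form <- (x + alpha) / (n + alpha + beta)

# Numerical posterior mean: integrate theta times the posterior density.
numeric_mean <- integrate(function(theta) theta * dbeta(theta, a, b),
                          lower = 0, upper = 1, rel.tol = 1e-8)$value

c(closed_form = closed_form, numeric_mean = numeric_mean)
```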

Question 2.b

Mode of Beta\((a, b)\)

Maximise the log of the density (up to an additive constant) \[ (a - 1) \log(\theta) + (b - 1) \log(1 - \theta) \] \[ \begin{aligned} \frac{\partial}{ \partial \theta} \left\{ (a - 1) \log(\theta) + (b - 1) \log(1 - \theta) \right\} & = \frac{a -1}{\theta} - \frac{b - 1}{1 - \theta} \\ &= 0 \\ &\iff \theta = \frac{a - 1}{a + b - 2} \end{aligned} \]

Second derivative \[ \frac{\partial^2}{ \partial \theta^2} \left\{ (a - 1) \log(\theta) + (b - 1) \log(1 - \theta) \right\} = -\frac{a -1}{\theta^2} - \frac{b - 1}{(1 - \theta)^2} < 0 \] for \(a > 1, b > 1\), in which case \(\theta = \frac{a - 1}{a + b - 2}\) is the global maximum. For the posterior \(\textrm{Beta}(x + \alpha, n - x + \beta)\) this gives the mode \(\frac{x + \alpha - 1}{n + \alpha + \beta - 2}\).

Question 2.b

Mode of Beta\((a, b)\)

  • If \(a = b = 1\), any \(\theta \in [0,1]\) is a mode.
  • If \(a = 1\), \(b > 1\), posterior \(\propto (1-\theta)^{b - 1}\) which is maximised at \(\theta = 0\).
  • If \(a > 1\), \(b = 1\), posterior \(\propto \theta^{a - 1}\) which is maximised at \(\theta = 1\).
  • If either \(a < 1\) or \(b < 1\), no mode exists as the pdf diverges as \(\theta \to 0\) (when \(a < 1\)) or \(\theta \to 1\) (when \(b < 1\)).
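The case distinction can be collected in a small helper. This is a sketch only: the function name `beta_mode` is hypothetical, and it uses the interior stationary point \(\theta = (a - 1)/(a + b - 2)\) of the log-density for \(a > 1\), \(b > 1\). The interior case is cross-checked against a numerical maximisation of the density.

```r
# Mode of a Beta(a, b) density, following the case distinction in the list.
beta_mode <- function(a, b) {
  if (a > 1 && b > 1) return((a - 1) / (a + b - 2))  # interior maximum
  if (a == 1 && b > 1) return(0)                     # density decreasing in theta
  if (a > 1 && b == 1) return(1)                     # density increasing in theta
  NA_real_  # a = b = 1 (every point is a mode) or density unbounded
}

# Hypothetical parameters for illustration.
a <- 5
b <- 9.5

# Numerical check: maximise the density over (0, 1).
numeric_mode <- optimize(dbeta, interval = c(0, 1),
                         shape1 = a, shape2 = b, maximum = TRUE)$maximum

c(closed_form = beta_mode(a, b), numeric_mode = numeric_mode)
```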

Question 2.b

Median of Beta\((a, b)\)

There is no closed-form solution, so the median has to be found numerically; in R, use the uniroot function.

Code
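alpha <- 2
beta <- 2.5

# The median solves pbeta(x, alpha, beta) = 0.5, i.e. it is a root of this function
median_diff <- function(x, alpha, beta){
  0.5 - pbeta(x, alpha, beta)
}

# Search for the root on [0, 1]; alpha and beta are passed on to median_diff
med <- uniroot(median_diff, c(0,1), alpha = alpha, beta = beta)$root

# Check: the cdf evaluated at the root should be close to 0.5
pr <- pbeta(med, alpha, beta)

cbind(med, pr)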
           med        pr
[1,] 0.4355571 0.5000045
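As an alternative to root finding, base R's quantile function `qbeta` returns the median directly as the 0.5 quantile. The sketch below compares it with a `uniroot` search run at a tighter tolerance than the default; the tolerance value is an illustrative choice.

```r
alpha <- 2
beta <- 2.5

# Median as the 0.5 quantile of the Beta(alpha, beta) distribution.
med_quantile <- qbeta(0.5, alpha, beta)

# Root-finding version with a tighter tolerance than uniroot's default.
med_root <- uniroot(function(x) pbeta(x, alpha, beta) - 0.5,
                    interval = c(0, 1), tol = 1e-10)$root

c(med_quantile = med_quantile, med_root = med_root)
```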

Question 3

Show that the Bayes risk \(r(\delta(x), \pi(\theta))\) can be written as the posterior risk averaged over \(x\).

\[ \begin{aligned} r(\delta(x), \pi(\theta)) &= \textrm{E}_{\theta} \left[R(\delta(x), \theta) \right] \\ &= \int_{\Theta} R(\delta(x), \theta) \pi(\theta) d \theta \\ &= \int_{\Theta} \left( \int_{\mathcal{X}} L(\delta(x), \theta) f(x \mid \theta) dx \right) \pi(\theta) d \theta \\ &= \int_{\Theta} \int_{\mathcal{X}} L(\delta(x), \theta) f(x \mid \theta) \pi(\theta) dx d \theta \\ &= \int_{\Theta} \int_{\mathcal{X}} L(\delta(x), \theta) \pi(\theta \mid x) m(x) dx d \theta \\ &= \int_{\mathcal{X}} \int_{\Theta} L(\delta(x), \theta) \pi(\theta \mid x) m(x) d \theta dx \\ &= \int_{\mathcal{X}} \left( \int_{\Theta} L(\delta(x), \theta) \pi(\theta \mid x) d \theta \right) m(x) dx = \textrm{E}_X \left[ \rho(\delta(X), \pi(\theta)) \right] \,. \end{aligned} \]

The step from \(f(x \mid \theta) \pi(\theta)\) to \(\pi(\theta \mid x) m(x)\) is Bayes' theorem, and exchanging the order of integration is justified by the Fubini-Tonelli theorem (for example when the loss is non-negative).
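The identity can be verified on the discrete vaccination example from Question 1, where the integrals become sums. The sketch computes the Bayes risk once from the frequentist risk and once from the posterior risk; encoding each rule as a vector of action indices is an illustrative choice.

```r
# Rows: theta_1 (immune), theta_2 (susceptible).
lik <- matrix(c(0.65, 0.35, 0.25, 0.75), nrow = 2, byrow = TRUE)  # columns x_1, x_2
loss <- matrix(c(8, 0, 0, 20), nrow = 2, byrow = TRUE)            # columns a_1, a_2
prior <- c(0.6, 0.4)
rules <- list(delta1 = c(1, 1), delta2 = c(1, 2),
              delta3 = c(2, 1), delta4 = c(2, 2))

# Route 1: frequentist risk R(delta, theta_i), then average over the prior.
freq_risk <- sapply(rules, function(rule) {
  sapply(1:2, function(i) sum(loss[i, rule] * lik[i, ]))
})
via_freq <- colSums(prior * freq_risk)

# Route 2: posterior risk rho(delta(x_j)), then average over the marginal of x.
joint <- lik * prior
marg <- colSums(joint)
post <- sweep(joint, 2, marg, "/")
post_risk <- sapply(rules, function(rule) {
  sapply(1:2, function(j) sum(loss[, rule[j]] * post[, j]))
})
via_post <- colSums(marg * post_risk)

rbind(via_freq, via_post)
```

Both rows should coincide for every rule.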

Question 4

Let \(x = (x_1, \ldots , x_n)\) be a random sample from a Pois\((\lambda)\) distribution. Assign a Gamma\((\alpha, \beta)\) prior to \(\lambda\). Consider the LINEX (LINear-EXponential) loss function \[ L(a, \lambda) = \exp\{k(a - \lambda)\} - k(a - \lambda) - 1 \,, \] where \(k\) is a known positive constant. Find the Bayes estimator of \(\lambda\).

From the lecture (week 1, slides 34-35), we know that the posterior in this case is a \(\textrm{Gamma}(\alpha + \sum_{i = 1}^n x_i, n + \beta)\) distribution.

Bayes estimator

\[ \begin{aligned} \hat{\lambda} &= \arg \min_{a} \, \rho(a, \pi(\lambda)) \\ &= \arg \min_{a} \, \int_{0}^\infty L(a, \lambda) \pi(\lambda \mid x) d \lambda \\ &= \arg \min_{a} \, \int_{0}^\infty \left(\exp\{k(a - \lambda)\} - k(a - \lambda) - 1 \right) \pi(\lambda \mid x) d \lambda \end{aligned} \]

Question 4

\[ \begin{aligned} \frac{\partial}{\partial a} \int_{0}^\infty L(a, \lambda) \pi(\lambda \mid x) d \lambda &= \int_{0}^\infty \frac{\partial}{\partial a} \left\{ \left(\exp\{k(a - \lambda)\} - k(a - \lambda) - 1 \right) \pi(\lambda \mid x) \right\} d \lambda \\ &= \int_{0}^\infty k \left( \exp\{k(a - \lambda)\} - 1 \right) \pi(\lambda \mid x) d \lambda \\ &= k \left(\exp\{k a\}\int_{0}^\infty \exp\{-k \lambda\} \pi(\lambda \mid x) d \lambda - 1 \right) \\ &= k \left(\exp\{k a\} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) - 1 \right) \,. \end{aligned} \]

FOC

\[ \begin{aligned} & \quad \quad \, \, \, \, k \left(\exp\{k a\} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) - 1 \right) &&= 0 \\ &\iff \exp\{k a\} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) &&= 1 \\ &\iff \exp\{-k a\} &&= \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) \\ &\iff -ka &&= \log \left( \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) \right) \\ &\iff a &&= - \frac{\log \left( \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) \right)}{k} \end{aligned} \]

The second derivative with respect to \(a\), \(k^2 \exp\{k a\} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\})\), is positive, so this stationary point is the minimiser.

Question 4

\(\textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\})\)

For a \(\textrm{Gamma}(A, B)\) distribution

\[ \begin{aligned} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) &= \int_0^\infty \exp\{- k \lambda\} \frac{B^A}{\Gamma(A)} \lambda^{A-1} \exp\{-B \lambda\} d \lambda \\ &= \frac{B^A}{\Gamma(A)} \int_0^\infty \exp\{- (k + B) \lambda\} \lambda^{A-1} d \lambda \\ &= \frac{B^A}{(B + k)^A} \int_0^\infty \exp\{- (k + B) \lambda\} \frac{(B + k)^A}{\Gamma(A)} \lambda^{A-1} d \lambda \\ &= \left(\frac{B}{B + k} \right)^A \,, \end{aligned} \] since the integrand in the third line is the \(\textrm{Gamma}(A, B + k)\) density, which integrates to one.
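The closed form \(\left(B / (B + k)\right)^A\) can be checked against numerical integration. The values of `A`, `B` and `k` below are hypothetical, and the Gamma distribution is taken in the shape-rate parametrisation used in the derivation.

```r
# Hypothetical parameter values for illustration.
A <- 7      # shape
B <- 4      # rate
k <- 0.5    # LINEX constant

closed_form <- (B / (B + k))^A

# Numerical expectation of exp(-k * lambda) under Gamma(shape = A, rate = B).
numeric_value <- integrate(
  function(lam) exp(-k * lam) * dgamma(lam, shape = A, rate = B),
  lower = 0, upper = Inf, rel.tol = 1e-8
)$value

c(closed_form = closed_form, numeric_value = numeric_value)
```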

Question 4

Bayes estimator

For \(A = \alpha + \sum_{i = 1}^n x_i\), \(B = \beta + n\), \[ \begin{aligned} \hat{\lambda} &= - \frac{\log \left( \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) \right)}{k} \\ &= \frac{A \log \left( \frac{B + k}{B}\right)}{k} \\ &= \frac{A (\log (B + k) - \log(B))}{k} \\ &= \frac{(\alpha + \sum_{i = 1}^n x_i) (\log (n + \beta + k) - \log(n + \beta))}{k} \,. \end{aligned} \]
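The estimator is straightforward to evaluate, and it can be cross-checked by minimising the posterior expected LINEX loss numerically. In the sketch below the function name `linex_bayes`, the data and the prior parameters are hypothetical.

```r
# Closed-form Bayes estimator under LINEX loss for the Poisson-Gamma model.
linex_bayes <- function(x, alpha, beta, k) {
  A <- alpha + sum(x)        # posterior shape
  B <- beta + length(x)      # posterior rate
  A * (log(B + k) - log(B)) / k
}

# Hypothetical counts and prior parameters for illustration.
x <- c(2, 0, 3, 1, 4)
alpha <- 2
beta <- 1
k <- 0.5

est <- linex_bayes(x, alpha, beta, k)

# Posterior expected LINEX loss of reporting `a`, by numerical integration.
A <- alpha + sum(x)
B <- beta + length(x)
post_exp_loss <- function(a) {
  integrate(function(lam) {
    (exp(k * (a - lam)) - k * (a - lam) - 1) * dgamma(lam, shape = A, rate = B)
  }, lower = 0, upper = Inf, rel.tol = 1e-8)$value
}
numeric_est <- optimize(post_exp_loss, interval = c(0, 10), tol = 1e-8)$minimum

c(closed_form = est, numeric = numeric_est, posterior_mean = A / B)
```

With \(k > 0\) the LINEX loss penalises overestimation more heavily than underestimation, so by Jensen's inequality the estimator does not exceed the posterior mean \(A / B\).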