p.sterzinger@lse.ac.uk
07 February 2025
Model indexed by \(\theta\) with conditional density \(f(x \mid \theta)\) and data \(x \in \mathcal{X}\)
\(\delta(x) : \mathcal{X} \to \mathcal{A}\)
Given a statistical decision problem, we can associate a decision rule \(\delta(x)\) with three risks: the frequentist risk \(R\), the posterior risk \(\rho\), and the Bayes risk \(r\).
\[ R(\delta(x),\theta) = \textrm{E}_{X \mid \theta} \left(L(\delta(x), \theta) \right) = \int_{\mathcal{X}} L(\delta(x), \theta) f(x \mid \theta) dx \equiv g(\theta) \]
\[ \rho(\delta(x), \pi(\theta)) = \textrm{E}_{\theta \mid x} \left(L(\delta(x), \theta) \right) = \int_{\Theta} L(\delta(x), \theta) \pi(\theta \mid x) d\theta \equiv h(x) \]
\[ r(\delta(x), \pi(\theta)) = \textrm{E}_{\theta} \left(R(\delta(x), \theta) \right) = \int_{\Theta} R(\delta(x), \theta) \pi(\theta) d\theta \equiv c \in \Re \]
Given model \(f(x \mid \theta)\), and data \(x\), find a best guess \(\hat{\theta}\)
Consider the vaccination example in the lecture slides.
- Assume that a person has tested positive for immunity. Which of the decision rules have the lowest posterior risk?
- Repeat the above for the case that the person has tested negative.
- Combine the two cases above and choose the optimal decision rule. Compare with the rule that minimises the Bayes risk.
| \(f(x \mid \theta)\) | \(x_1\): positive | \(x_2\): negative |
|---|---|---|
| \(\theta_1\): immune | \(0.65\) | \(0.35\) |
| \(\theta_2\): susceptible | \(0.25\) | \(0.75\) |
| \(L(a, \theta)\) | \(a_1\): vaccinate | \(a_2\): do not vaccinate |
|---|---|---|
| \(\theta_1\): immune | \(8\) | \(0\) |
| \(\theta_2\): susceptible | \(0\) | \(20\) |
\[ \pi(\theta) = \begin{cases} 0.6 & \textrm{if } \theta = \theta_1, \\ 0.4 & \textrm{else.} \end{cases} \]
Since \(x = x_1\), use Bayes’ Theorem and consider the following two cases:
\[ \begin{aligned} \pi(\theta_1 \mid x_1) &= \Pr \left(\theta_1 \mid x_1 \right) \\ &= \frac{\Pr(x_1 \mid \theta_1) \Pr(\theta = \theta_1)}{\Pr(x = x_1)} \\ &= \frac{\Pr(x_1 \mid \theta_1) \Pr(\theta = \theta_1)}{\sum_{i = 1}^2\Pr(x = x_1 \mid \theta_i) \Pr(\theta = \theta_i) } \\ &= \frac{0.65 \times 0.6}{0.65 \times 0.6 + 0.25 \times 0.4} \\ &= \frac{39}{49} \,, \end{aligned} \]
and \[\pi(\theta_2 \mid x_1) = 1 - \pi(\theta_1 \mid x_1) = \frac{10}{49}\,.\]
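The Bayes' theorem step above can be checked numerically. The following is a minimal Python sketch, not part of the lecture material: the names `likelihood`, `prior` and `posterior` are illustrative choices, and exact fractions are used so that the result can be compared directly with \(39/49\) and \(10/49\).

```python
from fractions import Fraction

# Likelihood f(x | theta) and prior pi(theta), copied from the tables above.
likelihood = {
    "immune": {"positive": Fraction(65, 100), "negative": Fraction(35, 100)},
    "susceptible": {"positive": Fraction(25, 100), "negative": Fraction(75, 100)},
}
prior = {"immune": Fraction(6, 10), "susceptible": Fraction(4, 10)}


def posterior(x):
    """Return pi(theta | x) for every theta via Bayes' theorem."""
    joint = {theta: likelihood[theta][x] * prior[theta] for theta in prior}
    marginal = sum(joint.values())  # m(x) = sum over theta of f(x | theta) pi(theta)
    return {theta: p / marginal for theta, p in joint.items()}


print(posterior("positive"))
```

Calling `posterior("negative")` gives the corresponding posterior for a negative test.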
\[ \begin{aligned} \rho(\delta_1(x_1), \pi(\theta)) &= \textrm{E}_{\theta \mid x = x_1} \left(L(\delta_1(x), \theta) \right) \\ &= L(a_1, \theta_1) \pi(\theta_1 \mid x_1) + L(a_1, \theta_2) \pi(\theta_2 \mid x_1) \\ &= 8 \frac{39}{49} + 0 \frac{10}{49} \\ &= \frac{312}{49} \,. \end{aligned} \]
\(\delta_2(x_1) = a_1 = \delta_1(x_1)\) so \(\rho(\delta_2(x_1), \pi(\theta)) = 312 / 49\).
\[ \begin{aligned} \rho(\delta_3(x_1), \pi(\theta)) &= \textrm{E}_{\theta \mid x = x_1} \left(L(\delta_3(x), \theta) \right) \\ &= L(a_2, \theta_1) \pi(\theta_1 \mid x_1) + L(a_2, \theta_2) \pi(\theta_2 \mid x_1) \\ &= 0 \frac{39}{49} + 20 \frac{10}{49} \\ &= \frac{200}{49} \,. \end{aligned} \]
\(\delta_4(x_1) = a_2 = \delta_3(x_1)\) so \(\rho(\delta_4(x_1), \pi(\theta)) = 200 / 49\).
\(\longrightarrow \delta_3, \delta_4\) minimise posterior risk given \(x_1\)
\[ \begin{aligned} \pi(\theta_1 \mid x_2) &= \Pr \left(\theta_1 \mid x_2 \right) \\ &= \frac{\Pr(x_2 \mid \theta_1) \Pr(\theta = \theta_1)}{\Pr(x = x_2)} \\ &= \frac{\Pr(x_2 \mid \theta_1) \Pr(\theta = \theta_1)}{\sum_{i = 1}^2\Pr(x = x_2 \mid \theta_i) \Pr(\theta = \theta_i) } \\ &= \frac{0.35 \times 0.6}{0.35 \times 0.6 + 0.75 \times 0.4} \\ &= \frac{21}{51} \,, \end{aligned} \]
and \[\pi(\theta_2 \mid x_2) = 1 - \pi(\theta_1 \mid x_2) = \frac{30}{51}\,.\]
\[ \begin{aligned} \rho(\delta_1(x_2), \pi(\theta)) &= \textrm{E}_{\theta \mid x = x_2} \left(L(\delta_1(x), \theta) \right) \\ &= L(a_1, \theta_1) \pi(\theta_1 \mid x_2) + L(a_1, \theta_2) \pi(\theta_2 \mid x_2) \\ &= 8 \frac{21}{51} + 0 \frac{30}{51} \\ &= \frac{168}{51} \,. \end{aligned} \]
\[ \begin{aligned} \rho(\delta_2(x_2), \pi(\theta)) &= \textrm{E}_{\theta \mid x = x_2} \left(L(\delta_2(x), \theta) \right) \\ &= L(a_2, \theta_1) \pi(\theta_1 \mid x_2) + L(a_2, \theta_2) \pi(\theta_2 \mid x_2) \\ &= 0 \frac{21}{51} + 20 \frac{30}{51} \\ &= \frac{600}{51} \,. \end{aligned} \]
\(\delta_3(x_2) = a_1 = \delta_1(x_2)\) so \(\rho(\delta_3(x_2), \pi(\theta)) = 168 / 51\).
\(\delta_4(x_2) = a_2 = \delta_2(x_2)\) so \(\rho(\delta_4(x_2), \pi(\theta)) = 600 / 51\).
\(\longrightarrow \delta_1, \delta_3\) minimise posterior risk given \(x_2\)
| \(\rho(\delta(x), \pi(\theta))\) | \(\delta_1\) | \(\delta_2\) | \(\delta_3\) | \(\delta_4\) |
|---|---|---|---|---|
| \(x_1\): positive | \(\frac{312}{49}\) | \(\frac{312}{49}\) | \(\frac{200}{49}\) | \(\frac{200}{49}\) |
| \(x_2\): negative | \(\frac{168}{51}\) | \(\frac{600}{51}\) | \(\frac{168}{51}\) | \(\frac{600}{51}\) |
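The posterior risk table can be reproduced with a short script. This is a sketch under stated assumptions: the four decision rules are encoded as they are used in the worked solutions above (\(\delta_1\) always vaccinates, \(\delta_4\) never does, \(\delta_2\) and \(\delta_3\) follow or reverse the test result), and all Python names are hypothetical.

```python
from fractions import Fraction

# Posterior probabilities pi(theta | x) from above, ordered (immune, susceptible).
post = {
    "positive": (Fraction(39, 49), Fraction(10, 49)),
    "negative": (Fraction(21, 51), Fraction(30, 51)),
}
# Loss L(a, theta) from the loss table, ordered (immune, susceptible).
loss = {"vaccinate": (8, 0), "no_vaccinate": (0, 20)}
# Decision rules as used in the solutions above: rule -> action for each x.
rules = {
    "delta_1": {"positive": "vaccinate", "negative": "vaccinate"},
    "delta_2": {"positive": "vaccinate", "negative": "no_vaccinate"},
    "delta_3": {"positive": "no_vaccinate", "negative": "vaccinate"},
    "delta_4": {"positive": "no_vaccinate", "negative": "no_vaccinate"},
}


def posterior_risk(rule, x):
    """Posterior expected loss of the action that `rule` takes at x."""
    action = rules[rule][x]
    return sum(l * p for l, p in zip(loss[action], post[x]))


# For each test result, list the rules with the smallest posterior risk.
for x in post:
    risks = {rule: posterior_risk(rule, x) for rule in rules}
    best = min(risks.values())
    print(x, [rule for rule, r in risks.items() if r == best])
```

Only the action taken at the observed \(x\) matters for the posterior risk, which is why rules that agree at \(x\) share the same entry in the table.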
\[ \begin{aligned} r(\delta(x), \pi(\theta)) &= \textrm{E}_{\theta} \left(R(\delta(x), \theta) \right) \\ &= \textrm{E}_{\theta} \left(\textrm{E}_{X \mid \theta} \left[L(\delta(x), \theta) \right] \right) \\ &= \textrm{E}_{\theta, X} \left(L(\delta(x), \theta) \right) \\ &= \textrm{E}_{X} \left(\textrm{E}_{\theta \mid x} \left[L(\delta(x), \theta) \right] \right) \\ &= \textrm{E}_{X} \left(\rho(\delta(x), \pi(\theta)) \right) \\ &= \int_{\mathcal{X}} \rho(\delta(x), \pi(\theta)) m(x) dx \,. \end{aligned} \]
Hence, if there exists a decision rule \(\delta_i(x)\) such that \[ \rho(\delta_i(x), \pi(\theta)) \leq \rho(\delta_{-i}(x), \pi(\theta)) \,, \] for all \(x \in \mathcal{X}\) and every other rule \(\delta_{-i}\), then also \[ r(\delta_i(x), \pi(\theta)) \leq r(\delta_{-i}(x), \pi(\theta)) \,. \]
\[ \begin{aligned} \Pr \left( x = x_1 \right) &= \sum_{i = 1}^2 \Pr \left( x = x_1 \cap \theta = \theta_i \right) \\ &= \sum_{i = 1}^2 \Pr \left( x = x_1 \mid \theta_i \right) \Pr(\theta = \theta_i) \\ &= 0.65 \times 0.6 + 0.25 \times 0.4 \\ &= 0.49 \end{aligned} \]
\[ \begin{aligned} \Pr \left( x = x_2 \right) &= 1 - \Pr \left( x = x_1 \right) \\ &= 0.51 \end{aligned} \]
\[ r(\delta_1(x), \pi(\theta)) = 0.49 \times \frac{312}{49} + 0.51 \times \frac{168}{51} = 4.8 \]
\[ r(\delta_2(x), \pi(\theta)) = 0.49 \times \frac{312}{49} + 0.51 \times \frac{600}{51} = 9.12 \]
\[ r(\delta_3(x), \pi(\theta)) = 0.49 \times \frac{200}{49} + 0.51 \times \frac{168}{51} = 3.68 \]
\[ r(\delta_4(x), \pi(\theta)) = 0.49 \times \frac{200}{49} + 0.51 \times \frac{600}{51} = 8 \]
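| | \(\delta_1\) | \(\delta_2\) | \(\delta_3\) | \(\delta_4\) |
|---|---|---|---|---|
| \(r(\delta(x), \pi(\theta))\) | \(4.8\) | \(9.12\) | \(3.68\) | \(8\) |

\(\longrightarrow \delta_3\) minimises the Bayes risk; it is also the only rule that minimises the posterior risk for both \(x_1\) and \(x_2\).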
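As a cross-check, the Bayes risk can also be computed along the other route, by averaging the frequentist risk \(R(\delta(x), \theta)\) over the prior. The sketch below assumes the same encoding of the four rules as in the worked solutions; the variable names are hypothetical.

```python
from fractions import Fraction as F

prior = {"immune": F(6, 10), "susceptible": F(4, 10)}
lik = {  # f(x | theta)
    "immune": {"positive": F(65, 100), "negative": F(35, 100)},
    "susceptible": {"positive": F(25, 100), "negative": F(75, 100)},
}
loss = {  # L(a, theta)
    "vaccinate": {"immune": 8, "susceptible": 0},
    "no_vaccinate": {"immune": 0, "susceptible": 20},
}
rules = {  # rule -> action taken for each test result
    "delta_1": {"positive": "vaccinate", "negative": "vaccinate"},
    "delta_2": {"positive": "vaccinate", "negative": "no_vaccinate"},
    "delta_3": {"positive": "no_vaccinate", "negative": "vaccinate"},
    "delta_4": {"positive": "no_vaccinate", "negative": "no_vaccinate"},
}


def bayes_risk(rule):
    """Average the frequentist risk R(delta, theta) over the prior pi(theta)."""
    total = F(0)
    for theta, p_theta in prior.items():
        # R(delta, theta) = sum over x of L(delta(x), theta) f(x | theta)
        risk = sum(loss[rule[x]][theta] * lik[theta][x] for x in rule)
        total += risk * p_theta
    return total


for name, rule in rules.items():
    print(name, float(bayes_risk(rule)))
```

Both routes should give the same four numbers, since the Bayes risk is the expectation of the loss under the joint distribution of \((\theta, X)\).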
Consider the quadratic error, absolute error and \(0-1\) loss functions. Find the Bayes estimator for \(\theta\) in each of the following cases:
- A random sample \(x = (x_1, \ldots, x_n)\) from a \(\mathrm{N}(\theta, 1)\). Assign a \(\mathrm{N}(\mu, \tau^2)\) prior to \(\theta\).
- A single observation \(x\) from a \(\textrm{Binom}(n, \theta)\). Assign a \(\textrm{Beta}(\alpha, \beta)\) prior to \(\theta\).
| Loss | Quadratic: \((a - \theta)^2\) | Absolute: \(\lvert a - \theta \rvert\) | \(0-1\): \(\mathbb{1}\{\lvert a - \theta \rvert > \epsilon \}\) |
|---|---|---|---|
| Bayes estimator | Posterior mean: \(\textrm{E}_{\theta \mid x}[\theta]\) | Posterior median: \(a: F_{\theta \mid x}(a) = \frac{1}{2}\) | Posterior mode: \(\arg \max_{a} f_{\theta \mid x}(a)\) |
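As a brief sketch of why the quadratic-loss entry of the table holds (assuming the posterior has a finite second moment), differentiate the posterior risk with respect to \(a\):

\[ \frac{\partial}{\partial a} \textrm{E}_{\theta \mid x} \left[ (a - \theta)^2 \right] = 2 \left( a - \textrm{E}_{\theta \mid x}[\theta] \right) = 0 \iff a = \textrm{E}_{\theta \mid x}[\theta] \,. \]

The second derivative is \(2 > 0\), so the posterior mean is the unique minimiser.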
From the previous class, we know that the posterior in this case is
\[ \mathrm{N} \left(\frac{\frac{1}{n}\mu + \tau^2 \bar{x}}{\tau^2 + \frac{1}{n}}, \frac{\tau^2 \frac{1}{n}}{\tau^2 + \frac{1}{n}} \right) \]
The mean, median, and mode of a normal distribution coincide, so all three loss functions give the same Bayes estimator:
\[ \frac{\frac{1}{n}\mu + \tau^2 \bar{x}}{\tau^2 + \frac{1}{n}} \]
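The estimator above is straightforward to evaluate. The following Python sketch assumes unit observation variance, as in the exercise; the function name and the example data are hypothetical and serve only as an illustration.

```python
from statistics import fmean


def normal_bayes_estimate(xs, mu, tau2):
    """Posterior mean (= median = mode) of theta for N(theta, 1) data
    and a N(mu, tau2) prior on theta."""
    n = len(xs)
    xbar = fmean(xs)
    return (mu / n + tau2 * xbar) / (tau2 + 1 / n)


# Hypothetical sample and prior parameters, chosen only for illustration.
print(normal_bayes_estimate([1.2, 0.7, 1.9, 1.4], mu=0.0, tau2=2.0))
```

The estimate is a weighted average of the prior mean \(\mu\) and the sample mean \(\bar{x}\), so it always lies between the two.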
\[ f(x \mid \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x} \]
\[ \pi(\theta) = \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} B(\alpha, \beta)^{-1}, \quad B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)} \]
\[ \begin{aligned} \pi(\theta \mid x) &\propto f(x \mid \theta) \pi(\theta) \\ &= \binom{n}{x} \theta^{x} (1 - \theta)^{n - x} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} B(\alpha, \beta)^{-1} \\ & \propto \theta^{x} (1 - \theta)^{n - x} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \\ &= \theta^{x + \alpha - 1} (1 - \theta)^{n - x + \beta - 1} \\ &\sim \textrm{Beta}(x + \alpha, n - x + \beta) \,. \end{aligned} \]
\[ \begin{aligned} B(a, b)^{-1} \int_0^1 \theta \theta^{a - 1} (1 - \theta)^{b - 1} d \theta &= B(a, b)^{-1} \int_0^1 \theta^{a } (1 - \theta)^{b - 1} d \theta \\ &= \frac{B(a + 1, b)}{B(a, b)} \\ &= \frac{\Gamma(a + 1) \Gamma(b)}{\Gamma(a + b + 1)} \frac{\Gamma(a + b)}{\Gamma(a) \Gamma(b)} \\ &= \frac{\Gamma(a + 1)}{\Gamma(a )} \frac{\Gamma(a + b)}{\Gamma(a + b + 1)} \,, \end{aligned} \] for \[ \Gamma(z) = \int_0^\infty t^{z - 1} \exp\{- t\} dt \]
Using integration by parts, we get that
\[ \begin{aligned} \Gamma(z + 1) &= \int_0^\infty t^{z } \exp\{- t\} dt \\ &= \left[-t^z \exp\{-t\} \right]^{\infty}_0 + \int_0^\infty z t^{z-1} \exp\{-t\} dt\\ &= [-0 - 0] + z \int_0^\infty t^{z - 1} \exp\{- t\} dt \\ &= z \Gamma(z) \end{aligned} \]
Hence
\[ \begin{aligned} \frac{\Gamma(a + 1)}{\Gamma(a )} \frac{\Gamma(a + b)}{\Gamma(a + b + 1)} &= \frac{a}{a + b} \end{aligned} \]
\[ \textrm{E}_{\theta \mid x}[\theta] = \frac{x + \alpha}{n + \alpha + \beta} \,. \]
Maximise the log of the \(\textrm{Beta}(a, b)\) density, which up to an additive constant is \[ (a - 1) \log(\theta) + (b - 1) \log(1 - \theta) \,. \] \[ \begin{aligned} \frac{\partial}{ \partial \theta} \left\{ (a - 1) \log(\theta) + (b - 1) \log(1 - \theta) \right\} & = \frac{a -1}{\theta} - \frac{b - 1}{1 - \theta} \\ &= 0 \\ &\iff \theta = \frac{a - 1}{a + b - 2} \end{aligned} \]
Second derivative \[ \begin{aligned} \frac{\partial^2}{ \partial \theta^2} \left\{ (a - 1) \log(\theta) + (b - 1) \log(1 - \theta) \right\} & = -\frac{a -1}{\theta^2} - \frac{b - 1}{(1 - \theta)^2} < 0 \end{aligned} \] for \(a > 1, b > 1\), in which case \(\theta = \frac{a - 1}{a + b - 2}\) is the global maximum. With \(a = x + \alpha\) and \(b = n - x + \beta\), the posterior mode is \[ \frac{x + \alpha - 1}{n + \alpha + \beta - 2} \,. \]
The posterior median has no closed form in general. Find it numerically as the root of \(F_{\theta \mid x}(a) - \frac{1}{2}\), for example with the uniroot function in R.
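The three Bayes estimates for the Beta posterior can be compared numerically. The notes suggest R's uniroot; the sketch below swaps in plain Python with a hand-written bisection and a midpoint-rule CDF, so it is only an illustration of the root-finding idea, not the recommended tool. The data and prior values are hypothetical, and the integration is only adequate for \(a, b \geq 1\).

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density at t in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def beta_cdf(q, a, b, steps=2000):
    """Beta(a, b) CDF at q via the midpoint rule (adequate for a, b >= 1)."""
    h = q / steps
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps))


def beta_median(a, b, tol=1e-8):
    """Solve F(m) = 1/2 by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical values: x = 7 successes in n = 10 trials, Beta(2, 2) prior.
x, n, alpha, beta = 7, 10, 2.0, 2.0
a, b = x + alpha, n - x + beta  # posterior is Beta(a, b)
print("mean:  ", a / (a + b))            # quadratic loss
print("median:", beta_median(a, b))      # absolute loss
print("mode:  ", (a - 1) / (a + b - 2))  # 0-1 loss, valid for a, b > 1
```

For this right-leaning posterior the median falls between the mean and the mode, so the three loss functions give slightly different estimates.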
Show that the Bayes risk \(r(\delta(x), \pi(\theta))\) can be written as the posterior risk averaged over \(x\).
\[ \begin{aligned} r(\delta(x), \pi(\theta)) &= \textrm{E}_{\theta} \left[R(\delta(x), \theta) \right] \\ &= \int_{\Theta} R(\delta(x), \theta) \pi(\theta) d \theta \\ &= \int_{\Theta} \left( \int_{\mathcal{X}} L(\delta(x), \theta) f(x \mid \theta) dx \right) \pi(\theta) d \theta \\ &= \int_{\Theta} \int_{\mathcal{X}} L(\delta(x), \theta) f(x \mid \theta) \pi(\theta) dx d \theta \\ &= \int_{\Theta} \int_{\mathcal{X}} L(\delta(x), \theta) \pi(\theta \mid x) m(x) dx d \theta \\ &= \int_{\mathcal{X}} \int_{\Theta} L(\delta(x), \theta) \pi(\theta \mid x) m(x) d \theta dx \\ &= \int_{\mathcal{X}} \left( \int_{\Theta} L(\delta(x), \theta) \pi(\theta \mid x) d \theta \right) m(x) dx = \textrm{E}_X \left[ \rho(\delta(X), \pi(\theta)) \right] \,. \end{aligned} \]
Let \(x = (x_1, \ldots , x_n)\) be a random sample from a Pois\((\lambda)\) distribution. Assign a Gamma\((\alpha, \beta)\) prior to \(\lambda\). Consider the LINEX (LINear-EXponential) loss function \[ L(a, \lambda) = \exp\{k(a - \lambda)\} - k(a - \lambda) - 1 \,, \] where \(k\) is a known positive constant. Find the Bayes estimator of \(\lambda\).
From lecture (week 1, slides 34-35), we know that the posterior in this case is a \(\textrm{Gamma}(\alpha + \sum_{i = 1}^n x_i, n + \beta)\) distribution.
\[ \begin{aligned} \hat{\lambda} &= \arg \min_{a} \, \rho(a, \pi(\lambda)) \\ &= \arg \min_{a} \, \int_{0}^\infty L(a, \lambda) \pi(\lambda \mid x) d \lambda \\ &= \arg \min_{a} \, \int_{0}^\infty \left(\exp\{k(a - \lambda)\} - k(a - \lambda) - 1 \right) \pi(\lambda \mid x) d \lambda \end{aligned} \]
\[ \begin{aligned} \frac{\partial}{\partial a} \int_{0}^\infty L(a, \lambda) \pi(\lambda \mid x) d \lambda &= \int_{0}^\infty \frac{\partial}{\partial a} \left\{ \left(\exp\{k(a - \lambda)\} - k(a - \lambda) - 1 \right) \pi(\lambda \mid x) \right\} d \lambda \\ &= \int_{0}^\infty k \left( \exp\{k(a - \lambda)\} - 1 \right) \pi(\lambda \mid x) d \lambda \\ &= k \left(\exp\{k a\}\int_{0}^\infty \exp\{-k \lambda\} \pi(\lambda \mid x) d \lambda - 1 \right) \\ &= k \left(\exp\{k a\} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) - 1 \right) \,. \end{aligned} \]
\[ \begin{aligned} & \quad \quad \, \, \, \, k \left(\exp\{k a\} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) - 1 \right) &&= 0 \\ &\iff \exp\{k a\} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) &&= 1 \\ &\iff \exp\{-k a\} &&= \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) \\ &\iff -ka &&= \log \left( \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) \right) \\ &\iff a &&= - \frac{\log \left( \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) \right)}{k} \\ \end{aligned} \]
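One step the derivation leaves implicit is that this stationary point is a minimum. A short check, assuming differentiation under the integral sign is again valid:

\[ \frac{\partial^2}{\partial a^2} \int_{0}^\infty L(a, \lambda) \pi(\lambda \mid x) d \lambda = k^2 \exp\{k a\} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) > 0 \quad \textrm{for all } a \,, \]

so the posterior risk is strictly convex in \(a\) and the solution of the first-order condition is its unique global minimiser.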
For a \(\textrm{Gamma}(A, B)\) posterior with shape \(A\) and rate \(B\),
\[ \begin{aligned} \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) &= \int_0^\infty \exp\{- k \lambda\} \frac{B^A}{\Gamma(A)} \lambda^{A-1} \exp\{-B \lambda\} d \lambda \\ &= \frac{B^A}{\Gamma(A)} \int_0^\infty \exp\{- (k + B) \lambda\} \lambda^{A-1} d \lambda \\ &= \frac{B^A}{(B + k)^A} \int_0^\infty \exp\{- (k + B) \lambda\} \frac{(B + k)^A}{\Gamma(A)} \lambda^{A-1} d \lambda \\ &= \left(\frac{B}{B + k} \right)^A \\ \end{aligned} \]
For \(A = \alpha + \sum_{i = 1}^n x_i\), \(B = \beta + n\), the Bayes estimator \(\hat{\lambda}\) is \[ \begin{aligned} a &= - \frac{\log \left( \textrm{E}_{\lambda \mid x}(\exp\{-k \lambda\}) \right)}{k} \\ &= \frac{A \log \left( \frac{B + k}{B}\right)}{k} \\ &= \frac{A (\log (B + k) - \log(B))}{k} \\ &= \frac{(\alpha + \sum_{i = 1}^n x_i) (\log (n + \beta + k) - \log(n + \beta))}{k} \,. \end{aligned} \]
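The closed form can be checked against the posterior risk directly. This is a minimal Python sketch, assuming the rate parameterisation of the Gamma distribution used above; the function names, the counts and the hyperparameters are hypothetical.

```python
import math


def linex_estimate(xs, alpha, beta, k):
    """Bayes estimate of lambda under LINEX loss for Poisson data
    with a Gamma(alpha, beta) prior (beta is the rate)."""
    A = alpha + sum(xs)  # posterior shape
    B = beta + len(xs)   # posterior rate
    return A * (math.log(B + k) - math.log(B)) / k


def posterior_linex_risk(a, A, B, k):
    """Posterior expected LINEX loss of action a under a Gamma(A, B) posterior,
    using E[exp(-k lambda)] = (B / (B + k))^A and E[lambda] = A / B."""
    return math.exp(k * a) * (B / (B + k)) ** A - k * (a - A / B) - 1


# Hypothetical counts and hyperparameters, for illustration only.
xs, alpha, beta, k = [3, 1, 4, 2, 2], 2.0, 1.0, 0.5
A, B = alpha + sum(xs), beta + len(xs)
a_hat = linex_estimate(xs, alpha, beta, k)
print("LINEX estimate:", a_hat)
print("posterior mean:", A / B)
```

For \(k > 0\) the loss penalises overestimation more heavily than underestimation, so the LINEX estimate lies below the posterior mean \(A / B\).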
Philipp Sterzinger - ST308 Assignment 2