p.sterzinger@lse.ac.uk
21 February 2025
Go to Moodle, navigate to ST308 and download the computer class R notebook
Open it in RStudio and try to knit it
You might have to download rmarkdown
The following commands can be used to generate from various distributions
Uniform: runif(n, min=0, max=1)
Normal: rnorm(n, mean = 0, sd = 1)
Beta: rbeta(n, shape1, shape2)
Gamma / chi-square / Exponential: rgamma(n, shape, rate = 1) Note: scale = 1/rate
Binomial/Bernoulli: rbinom(n, size, prob)
Negative Binomial: rnbinom(n, size, prob)
Poisson: rpois(n,lambda)
Probability density/mass functions, Cumulative density functions and quantile functions are also available. Check the result of
\[ \begin{aligned} \int_{\mathcal{X}} g(x) f_X(x) dx &= \textrm{E}_X [g(x)] \\ & \approx \frac{1}{n} \sum_{i = 1}^n g(X_i)\,, \end{aligned} \]
where \(X_i\) are random variables with pdf \(f_X(x)\).
[1] 0.28757752 0.78830514 0.40897692 0.88301740 0.94046728 0.04555650
[7] 0.52810549 0.89241904 0.55143501 0.45661474 0.95683335 0.45333416
[13] 0.67757064 0.57263340 0.10292468 0.89982497 0.24608773 0.04205953
[19] 0.32792072 0.95450365
set.seed(123)
N <- 10^4
us <- runif(N)
probs <- c(0.025, 0.5, 0.975)
mean_row <- c(mean(us), 0.5)
var_row <- c(var(us), 1 / 12)
percs <- cbind(quantile(us, probs), probs)
df <- data.frame(rbind(mean_row, var_row, percs))
rownames(df) <- c("mean", "var", "2.5%", "50%", "97.5%")
colnames(df) <- c("MC", "true")
df MC true
mean 0.49754937 0.50000000
var 0.08219329 0.08333333
2.5% 0.02445883 0.02500000
50% 0.49456763 0.50000000
97.5% 0.97323162 0.97500000
Repeat with a different distribution.
set.seed(123)
N <- 10^4
xs <- rnorm(N)
probs <- c(0.025, 0.5, 0.975)
mean_row <- c(mean(xs), 0)
var_row <- c(var(xs), 1)
percs <- cbind(quantile(xs, probs), qnorm(probs))
df <- data.frame(rbind(mean_row, var_row, percs))
rownames(df) <- c("mean", "var", "2.5%", "50%", "97.5%")
colnames(df) <- c("MC", "true")
df MC true
mean -0.002371702 0.000000
var 0.997275128 1.000000
2.5% -1.977271262 -1.959964
50% -0.011089219 0.000000
97.5% 1.951131474 1.959964
set.seed(123)
n <- 10000
y <- rnorm(n)
d <- density(y)
plot(d, main="Kernel Density of y")
polygon(d, col=rgb(1, 0, 0, 0.25), border="blue", lwd = 1)
x <- seq(-3, 3, length=1000)
lines(x, dnorm(x), col='purple', lwd = 2)
legend("topright", c("Kernel density", "N(0,1) pdf"), col=c("blue", "purple"), lwd=c(1, 2)) Repeat with a different distribution
Suppose that we observe a single observation \(y=98\) from the \(N(\theta,80)\) distribution. Find the MLE and provide a \(95\%\) confidence interval for \(\theta\).
The ML estimator maximises \[ f(\theta \mid y) = \frac{1}{\sqrt{2 \pi}} \exp \left\{ - \frac{(y - \theta)^2}{2 (80)} \right\} \,, \] which is simply \[ \hat{\theta} = y \sim \mathrm{N}(\theta, 80) \,. \]
\[ Z = \sqrt{n} \frac{\hat{\theta} - \theta}{\sigma} \sim \mathrm{N}(0,1) \]
For two sided CI with confidence level \(\alpha\): \[ \begin{aligned} \Pr \left(| Z | > z \right) &= 1 - \Pr \left(-z \leq Z \leq z \right) \\ &= 1 - [\Psi(z) - \Psi(-z)] \\ &= [1 - \Psi(z)] - \Psi(-z) \\ &= 2 \Psi(-z) \\ &= 2 [1 - \Psi(z)] \\ &= \alpha \\ & \iff z_{\alpha / 2} = \Psi^{-1}\left( 1 - \frac{\alpha}{2}\right) \end{aligned} \]
\[ \alpha = \Pr\left( \left| \frac{\hat{\theta} - \theta}{\sigma} \right| > z_{\alpha / 2} \right) = \Pr \left(\hat{\theta} - z_{\alpha / 2} \sigma < \theta < \hat{\theta} + z_{\alpha / 2}\sigma \right) \]
\[ \theta \sim \mathrm{N}(110, 120) \]
If prior is \(\mathrm{N}(\mu, \tau^2)\) and data \(y_1,\ldots, y_n\) is normal with mean \(\theta\), variance \(\sigma\), then
\[ \theta \mid y \sim \mathrm{N}\left( \frac{\sigma^2 \mu + n \tau^2 \bar{y}}{n \tau^2 + \sigma^2}, \frac{\sigma^2 \tau^2}{n \tau^2 + \sigma^2} \right) \,. \]
In our case
\[ \frac{\sigma^2 \mu + n \tau^2 \bar{y}}{n \tau^2 + \sigma^2} = \frac{80 \times 110 + 120 \times 98}{120 + 80} = 102.8 \]
\[ \frac{\sigma^2 \tau^2}{n \tau^2 + \sigma^2} = \frac{80 \times 120}{120 + 80} = 48 \]
\[ \theta \mid y \sim \mathrm{N} \left(102.8, 48 \right) \]
Construct a symmetric \(95\%\) credible interval around the posterior mean
CI_l CI_u
[1,] 89.22097 116.379
Monte carlo:
Suppose that the posterior of a parameter \(\theta\) is the Beta(\(5,7\)) distribution.
Provide a Bayes estimator and a \(95\%\) credible interval for \(\theta\).
Bayes estimator minimises the posterior risk
\[ \rho(\delta, \pi(\theta)) = \int_{\Theta} \mathrm{L}(\delta, \theta) \pi(\theta \mid x) d \theta \,. \]
For quadratic loss, we know the Bayes estimator is equal to the posterior mean
\[ \begin{aligned} % \frac{\partial}{\partial \delta} \rho(\delta, \pi(\theta)) &= \int_{\Theta} \frac{\partial}{\partial \theta} \mathrm{L}(\delta, \theta) \pi(\theta \mid x) d \theta &= \int_{\Theta} \frac{\partial}{\partial \theta} (\delta - \theta)^2 \pi(\theta \mid x) d \theta \\ &= 2 \int_{\Theta} (\delta - \theta) \pi(\theta \mid x) d \theta \\ &= 0 \iff \delta = \int_{\Theta} \theta \pi(\theta \mid x) d \theta \\ \end{aligned} \]
For \(X\sim \textrm{Beta}(\alpha, \beta)\), \(\textrm{E}(X) = \alpha / (\alpha + \beta)\).
post_mean post_median CI_l CI_u
0.4166667 0.4118904 0.1674881 0.6920953
beta_dens <- function(x){dbeta(x, alpha, beta)}
plot(beta_dens, from = 0, to = 1, main = "95% credible interval")
x_vals <- seq(0, 1, length.out = 1000)
beta_ci <- function(x) {
ifelse(x >= ci[1] & x <= ci[2], dbeta(x, alpha, beta), 0)
}
y_vals <- beta_ci(x_vals)
polygon(c(x_vals, rev(x_vals)), c(y_vals, rep(0, length(y_vals))),
col = rgb(1, 0, 0, 0.25), border = NA)
abline(v = out[1]) post_mean post_median CI_l CI_u
truth 0.4166667 0.4118904 0.1674881 0.6920953
MC 0.4168747 0.4121561 0.1659781 0.6947470
Philipp Sterzinger - ST308 Week 4