ST308 Bayesian Inference

Class 4


Philipp Sterzinger

p.sterzinger@lse.ac.uk

21 February 2025

Downloading the Rmarkdown document

Go to Moodle, navigate to ST308 and download the computer class R notebook

Open it in RStudio and try to knit it

You might have to download rmarkdown

install.packages("rmarkdown")

Generating random numbers from distributions

Generating random numbers from distributions

The following commands can be used to generate from various distributions

  • Uniform: runif(n, min=0, max=1)

  • Normal: rnorm(n, mean = 0, sd = 1)

  • Beta: rbeta(n, shape1, shape2)

  • Gamma / chi-square / Exponential: rgamma(n, shape, rate = 1) Note: scale = 1/rate

  • Binomial/Bernoulli: rbinom(n, size, prob)

  • Negative Binomial: rnbinom(n, size, prob)

  • Poisson: rpois(n,lambda)

Generating random numbers from distributions

Probability density/mass functions, Cumulative density functions and quantile functions are also available. Check the result of

dnorm(0) 
dnorm(0)*sqrt(2*pi)
qnorm(0.975) 
pnorm(1.96)

Monte Carlo calculations in the case of Uniform(0,1)

\[ \begin{aligned} \int_{\mathcal{X}} g(x) f_X(x) dx &= \textrm{E}_X [g(x)] \\ & \approx \frac{1}{n} \sum_{i = 1}^n g(X_i)\,, \end{aligned} \]

where \(X_i\) are random variables with pdf \(f_X(x)\).

Monte Carlo calculations in the case of Uniform(0,1)

  1. Generate \(20\) points from the Uniform\((0,1)\) distribution.
Code
set.seed(123) 

us <- runif(20) 

us
 [1] 0.28757752 0.78830514 0.40897692 0.88301740 0.94046728 0.04555650
 [7] 0.52810549 0.89241904 0.55143501 0.45661474 0.95683335 0.45333416
[13] 0.67757064 0.57263340 0.10292468 0.89982497 0.24608773 0.04205953
[19] 0.32792072 0.95450365
  1. Calculate the sample mean, median, variance and provide a \(95\%\) interval by computing the \(2.5\%\) and \(97.5\%\) percentiles.
Code
u_mean <- mean(us) 
u_var <- var(us) 
u_perc <- quantile(us, probs = c(0.025, 0.5, 0.975))

df <- data.frame(c(u_mean, u_var, u_perc))
rownames(df) <- c("mean", "var", "2.5%", "50%", "97.5%")
colnames(df) <- ""
df
                
mean  0.55080839
var   0.09826408
2.5%  0.04372059
50%   0.53977025
97.5% 0.95572674

Monte Carlo calculations in the case of Uniform(0,1)

  1. Repeat with \(10000\) points and compare the results with the true values of the calculated quantities.
Code
set.seed(123) 

N <- 10^4 
us <- runif(N) 

probs <- c(0.025, 0.5, 0.975)

mean_row <- c(mean(us), 0.5) 
var_row <- c(var(us), 1 / 12) 
percs <- cbind(quantile(us, probs), probs) 

df <- data.frame(rbind(mean_row, var_row, percs))
rownames(df) <- c("mean", "var", "2.5%", "50%", "97.5%")
colnames(df) <- c("MC", "true")

df 
              MC       true
mean  0.49754937 0.50000000
var   0.08219329 0.08333333
2.5%  0.02445883 0.02500000
50%   0.49456763 0.50000000
97.5% 0.97323162 0.97500000

Activity 1

Repeat with a different distribution.

Activity 1

Code
set.seed(123) 
N <- 10^4
xs <- rnorm(N) 

probs <- c(0.025, 0.5, 0.975)

mean_row <- c(mean(xs), 0) 
var_row <- c(var(xs), 1) 
percs <- cbind(quantile(xs, probs), qnorm(probs)) 

df <- data.frame(rbind(mean_row, var_row, percs))
rownames(df) <- c("mean", "var", "2.5%", "50%", "97.5%")
colnames(df) <- c("MC", "true")

df 
                MC      true
mean  -0.002371702  0.000000
var    0.997275128  1.000000
2.5%  -1.977271262 -1.959964
50%   -0.011089219  0.000000
97.5%  1.951131474  1.959964

Density estimation

Histogram

Code
set.seed(123) 
xs <- rnorm(100) 
hist(xs, freq = FALSE)  

Code
set.seed(123) 
xs <- rnorm(10000) 
hist(xs, freq = FALSE, breaks = 100, col = 2)   

Kernel Density

Code
set.seed(123)
n <- 10000
y <- rnorm(n)
d <- density(y)
plot(d, main="Kernel Density of y")

polygon(d, col=rgb(1, 0, 0, 0.25), border="blue", lwd = 1)

x <- seq(-3, 3, length=1000)
lines(x, dnorm(x), col='purple', lwd = 2)

legend("topright", c("Kernel density", "N(0,1) pdf"), col=c("blue", "purple"), lwd=c(1, 2)) 

Activity 2

Repeat with a different distribution

Activity 2

Code
set.seed(123) 
xs <- rchisq(10000, 3) 
hist(xs, freq = FALSE, breaks = 100, col = 2)   

Code
d <- density(xs)
plot(d, main="Kernel Density of xs")

polygon(d, col=rgb(1, 0, 0, 0.25), border="blue", lwd = 1)

x <- seq(0, 20, length=1000)
lines(x, dchisq(x, 3), col='purple', lwd = 2)

legend("topright", c("Kernel density", "Chisq_3 pdf"), col=c("blue", "purple"), lwd=c(1, 2)) 

Point Estimation / Confidence Intervals

Frequentist Estimation and Confidence Intervals

Suppose that we observe a single observation \(y=98\) from the \(N(\theta,80)\) distribution. Find the MLE and provide a \(95\%\) confidence interval for \(\theta\).

The ML estimator maximises \[ f(\theta \mid y) = \frac{1}{\sqrt{2 \pi}} \exp \left\{ - \frac{(y - \theta)^2}{2 (80)} \right\} \,, \] which is simply \[ \hat{\theta} = y \sim \mathrm{N}(\theta, 80) \,. \]

Frequentist Estimation and Confidence Intervals

\[ Z = \sqrt{n} \frac{\hat{\theta} - \theta}{\sigma} \sim \mathrm{N}(0,1) \]

For two sided CI with confidence level \(\alpha\): \[ \begin{aligned} \Pr \left(| Z | > z \right) &= 1 - \Pr \left(-z \leq Z \leq z \right) \\ &= 1 - [\Psi(z) - \Psi(-z)] \\ &= [1 - \Psi(z)] - \Psi(-z) \\ &= 2 \Psi(-z) \\ &= 2 [1 - \Psi(z)] \\ &= \alpha \\ & \iff z_{\alpha / 2} = \Psi^{-1}\left( 1 - \frac{\alpha}{2}\right) \end{aligned} \]

\[ \alpha = \Pr\left( \left| \frac{\hat{\theta} - \theta}{\sigma} \right| > z_{\alpha / 2} \right) = \Pr \left(\hat{\theta} - z_{\alpha / 2} \sigma < \theta < \hat{\theta} + z_{\alpha / 2}\sigma \right) \]

Frequentist Estimation and Confidence Intervals

Code
alpha <- 0.05
z_a <- qnorm(1 - alpha / 2) 

hat_theta <- 98 
sigma <- sqrt(80)


CI_l <- hat_theta - sigma * z_a 
CI_u <- hat_theta + sigma * z_a 

cbind(CI_l, CI_u) 
         CI_l     CI_u
[1,] 80.46955 115.5305

Bayesian Estimation and Credible Intervals

Prior

\[ \theta \sim \mathrm{N}(110, 120) \]

Posterior

If prior is \(\mathrm{N}(\mu, \tau^2)\) and data \(y_1,\ldots, y_n\) is normal with mean \(\theta\), variance \(\sigma\), then

\[ \theta \mid y \sim \mathrm{N}\left( \frac{\sigma^2 \mu + n \tau^2 \bar{y}}{n \tau^2 + \sigma^2}, \frac{\sigma^2 \tau^2}{n \tau^2 + \sigma^2} \right) \,. \]

In our case

\[ \frac{\sigma^2 \mu + n \tau^2 \bar{y}}{n \tau^2 + \sigma^2} = \frac{80 \times 110 + 120 \times 98}{120 + 80} = 102.8 \]

\[ \frac{\sigma^2 \tau^2}{n \tau^2 + \sigma^2} = \frac{80 \times 120}{120 + 80} = 48 \]

Bayesian Estimation and Credible Intervals

Posterior

\[ \theta \mid y \sim \mathrm{N} \left(102.8, 48 \right) \]

Construct a symmetric \(95\%\) credible interval around the posterior mean

Code
alpha <- 0.05
z_a <- qnorm(1 - alpha / 2) 

post_mean <- 102.8 
post_var <- 48 



CI_l <- post_mean - sqrt(post_var) * z_a 
CI_u <- post_mean + sqrt(post_var) * z_a 

cbind(CI_l, CI_u) 
         CI_l    CI_u
[1,] 89.22097 116.379
Code
# qnorm(c(0.025,0.975), post_mean, sqrt(post_var))

Monte carlo:

Code
set.seed(123) 
N <- 10^5

ys <- rnorm(N, post_mean, sqrt(post_var))
quantile(ys, probs = c(0.025, 0.975))
     2.5%     97.5% 
 89.20886 116.35465 

Activity 3

Suppose that the posterior of a parameter \(\theta\) is the Beta(\(5,7\)) distribution.

Provide a Bayes estimator and a \(95\%\) credible interval for \(\theta\).

Activity 3

Bayes estimator

Bayes estimator minimises the posterior risk

\[ \rho(\delta, \pi(\theta)) = \int_{\Theta} \mathrm{L}(\delta, \theta) \pi(\theta \mid x) d \theta \,. \]

Loss function

For quadratic loss, we know the Bayes estimator is equal to the posterior mean

\[ \begin{aligned} % \frac{\partial}{\partial \delta} \rho(\delta, \pi(\theta)) &= \int_{\Theta} \frac{\partial}{\partial \theta} \mathrm{L}(\delta, \theta) \pi(\theta \mid x) d \theta &= \int_{\Theta} \frac{\partial}{\partial \theta} (\delta - \theta)^2 \pi(\theta \mid x) d \theta \\ &= 2 \int_{\Theta} (\delta - \theta) \pi(\theta \mid x) d \theta \\ &= 0 \iff \delta = \int_{\Theta} \theta \pi(\theta \mid x) d \theta \\ \end{aligned} \]

Activity 3

Posterior mean

For \(X\sim \textrm{Beta}(\alpha, \beta)\), \(\textrm{E}(X) = \alpha / (\alpha + \beta)\).

Code
alpha <- 5 
beta <- 7

post_mean <- alpha / (alpha + beta) 
post_med <- qbeta(0.5, alpha, beta) 
ci <- qbeta(c(0.025, 0.975), alpha, beta) 

out <- c(post_mean, post_med, ci)  
names(out) <- c("post_mean", "post_median", "CI_l", "CI_u") 
out 
  post_mean post_median        CI_l        CI_u 
  0.4166667   0.4118904   0.1674881   0.6920953 
Code
beta_dens <- function(x){dbeta(x, alpha, beta)} 
plot(beta_dens, from = 0, to = 1, main = "95% credible interval") 

x_vals <- seq(0, 1, length.out = 1000)

beta_ci <- function(x) {
    ifelse(x >= ci[1] & x <= ci[2], dbeta(x, alpha, beta), 0)
}

y_vals <- beta_ci(x_vals)

polygon(c(x_vals, rev(x_vals)), c(y_vals, rep(0, length(y_vals))), 
        col = rgb(1, 0, 0, 0.25), border = NA)

abline(v = out[1]) 

Activity 3

Monte Carlo

Code
N <- 100000
x <- rbeta(N, 5, 7)
out2 <- c(mean(x), median(x), quantile(x,probs=c(0.025,0.975)))

comp <- rbind(out, out2) 
rownames(comp) <- c("truth", "MC") 

comp
      post_mean post_median      CI_l      CI_u
truth 0.4166667   0.4118904 0.1674881 0.6920953
MC    0.4168747   0.4121561 0.1659781 0.6947470