ST303 Stochastic Simumlation

Class 9


Philipp Sterzinger

p.sterzinger@lse.ac.uk


Office hours: Thursday, COL 8.05



04 December 2025

Recap

Importance sampling

Want to estimate

\[ \textrm{E}_{f_X}(h(X)) = \int_{\mathcal{X}} h(x) f_X(x)dx \,, \]

but \(f_X\) may be difficult or inefficient to sample from.

Code
library(ggplot2)
library(dplyr)
library(gridExtra)

alpha <- 4.8
scale_orig <- 1
a <- 9.6

t <- 0.5
scale_tilt <- 1 / (1 - t)  # = 2

df <- data.frame(x = seq(0, 30, length.out = 2000)) %>%
  mutate(
    f  = dgamma(x, shape = alpha, scale = scale_orig),
    g  = dgamma(x, shape = alpha, scale = scale_tilt),
    h  = ifelse(x > a, sqrt(x - a), 0),
    logw = dgamma(x, shape = alpha, scale = scale_orig, log = TRUE) -
      dgamma(x, shape = alpha, scale = scale_tilt, log = TRUE),
    w   = exp(logw),
    h_w = h * w
  ) %>%
  mutate(
    h_scaled   = h   / max(h,   na.rm = TRUE) * max(f),
    h_w_scaled = h_w / max(h_w, na.rm = TRUE) * max(g)
  )

col_density   <- "#08519c"  # blue
col_integrand <- "#de2d26"  # red

p_naive <- ggplot(df, aes(x)) +
  geom_line(aes(y = f),       colour = col_density,   linewidth = 0.9) +
  geom_line(aes(y = h_scaled), colour = col_integrand, linewidth = 0.9, linetype = "dashed") +
  geom_vline(xintercept = a, linetype = "dotted") +
  labs(title = "Naive MC") +
  theme_minimal() +
  theme(
    axis.title       = element_blank(),
    legend.position  = "none"
  )

p_is <- ggplot(df, aes(x)) +
  geom_line(aes(y = g),         colour = col_density,   linewidth = 0.9) +
  geom_line(aes(y = h_w_scaled), colour = col_integrand, linewidth = 0.9, linetype = "dashed") +
  geom_vline(xintercept = a, linetype = "dotted") +
  labs(title = "Importance Sampling") +
  theme_minimal() +
  theme(
    axis.title       = element_blank(),
    legend.position  = "none"
  )

grid.arrange(p_naive, p_is, ncol = 1)

Importance sampling

Change of measure

Choose a nonegative function \(l\) that is nonzero almost everywhere and

\[\begin{multline} \textrm{E}_{f_X}(h(X)) = \int_{\mathcal{X}} h(x) f_X(x)dx = \int_{\mathcal{X}} \frac{h(x) }{l(x)} f_X(x)l(x) dx \\ = \int f_X(x)l(x) dx \int \frac{h(x) }{l(x)} \frac{f_X(x)l(x)}{\int f_X(x)l(x) dx} dx = \textrm{E}_{f_X}(l(X)) \textrm{E}_{g_{X^*}}\left(\frac{h(X^*)}{l(X^*)} \right)\,, \end{multline}\]

for \[ g_{X^*}(x) = \frac{f_X(x) l(x)}{\int f_X(x) l(x) dx} = \frac{f_X(x) l(x)}{\textrm{E}_{f_X}(l(X))}\,. \]

Importance sampling

Variance reduction

\[ \textrm{Var}_{g_{X^*}}\left(\textrm{E}_{f_X}(l(X)) \textrm{E}_{g_{X^*}}\left(\frac{h(X^*)}{l(X^*)} \right) \right) \ll \textrm{Var}_{f_X}(h(X)) \,. \]

Optimal, but unattainable:

\[ h(x) = |g(x)| f_X(x) \,, \]

would give zero variance.

Importance sampling

Esscher transform

A particular \(l\) is \(\exp\{tx\}\) for which

\[ g_{X^*} = \frac{\exp\{tx\}f_X(x)}{\int_{-\infty}^\infty f_X(x) \exp\{tx\} dx} = \frac{\exp\{tx\}f_X(x)}{M_X(t)} \,. \]

Choosing \(t\)

Make integrand

\[ h(x) \exp\{-tx\} M_X(t) \]

closely follow the contours of the distribution of \(X^*\) so that chosen sampling distribution is closer to the integrand’s shape, concentrating sampling effort on areas with significant contributions.

Questions

Question 1

Consider a random variable \(X\) with a gamma distribution with parameters \(4.8\) and \(1\) (\(4.8\) is the index parameter and \(1\) is the scale parameter).

  1. We are interested in approximating \(\Pr(X > 9.6)\). Find that with a straight forward Monte Carlo method.
  2. Repeat by using importance sampling with the Esscher transform. Choose the tilting parameter so that the mean of the tilted distribution is \(9.6.\)
  3. Compare your results with the true value you can get from 1−pgamma(9.6, 4.8, 1).
  4. Using the same tilting parameter estimate \(\textrm{E}[\sqrt{X − 9.6} \mid X > 9.6]\).

Question 1.a

We are interested in approximating \(\Pr(X > 9.6)\). Find that with a straight forward Monte Carlo method.

Code
N <- 10^3 
set.seed(123) 

X <- rgamma(N, 4.8, 1) 

est <- mean(X > 9.6) 

true <- 1 - pgamma(9.6, 4.8 , 1)

cat("Estimate: ", est, ", Truth: ", true, ", Error: ", abs(est - true))
Estimate:  0.029 , Truth:  0.03148364 , Error:  0.00248364

Question 1.b

Repeat by using importance sampling with the Esscher transform. Choose the tilting parameter so that the mean of the tilted distribution is \(9.6.\)

\[ \begin{aligned} \Pr \left(X > 9.6 \right) &= \textrm{E}_{f_X} \left( \mathbb{1} \{X > 9.6\} \right) \\ &= \int \mathbb{1} \{x > 9.6\} f_X(x) dx \\ &= \int \mathbb{1} \{x > 9.6\} f_X(x) \frac{\exp\{tx\}}{M_X(t)} \frac{M_X(t)}{\exp\{tx\}}dx \\ &= \int \mathbb{1} \{x > 9.6\} \exp\{-tx\} M_X(t) g_{X^*}(x) dx \\ &= M_X(t) \textrm{E}_{g^*} \left( \mathbb{1} \{X^* > 9.6\} \exp\{-tX^*\} \right) \\ & \approx M_X(t) \frac{1}{n} \sum_{i = 1}^n \mathbb{1} \{X_i^* > 9.6\} \exp\{-tX_i^*\} \,, \end{aligned} \]

for \(g_X = \frac{f_X(x) \exp\{tx\}}{M_X(t)}\)

Question 1.b

Choose the tilting parameter so that the mean of the tilted distribution is \(9.6.\)

  • pdf of Gamma distribution with shape \(\alpha\) and scale \(\theta\): \(f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} \exp\{-x \lambda\}\)

  • MGF of Gamma distribution with shape \(\alpha\) and scale \(\theta\): \(M_X(t) = (1 - t / \lambda)^{-\alpha}\)

  • Mean of Gamma distribution with shape \(\alpha\) and scale \(\theta\): \(\alpha / \lambda\)

\[\begin{align} g_{X^*}(x) &= f_X(x) \frac{\exp\{tx\}}{M_X(t)} \\ &= \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} \exp\{-x \lambda\} \exp\{tx\} \left(1 - \frac{t}{\lambda}\right)^{\alpha}\\ &= \frac{1}{\Gamma(\alpha)} x^{\alpha - 1} \left\{\lambda \left(1 - \frac{t}{\lambda} \right) \right\}^\alpha \exp\{-x \lambda + tx\} \\ &= \frac{1}{\Gamma(\alpha)} \left(\lambda - t \right)^{\alpha} x^{\alpha - 1} \exp\{-x (\lambda - t) \} \end{align}\]

which is the pdf of a gamma distribution with shape \(\alpha\) and scale \(\lambda - t\)

Question 1.b

Hence

\[ \begin{aligned} \textrm{E}_{g_{X^*}} \left( X^* \right) &= \frac{\alpha}{ \lambda - t} \\ &= \frac{4.8}{1 - t} \\ &= 9.6 \\ &\iff t = \frac{1}{2} \\ \end{aligned} \]

and \(X^*\) follows a Gamma distribution with shape \(4.8\) and scale \(1 / 2\).

Code
N <- 10^3
set.seed(123) 

X_s <- rgamma(N, 4.8, 1 / 2) 

integrand <- (X_s > 9.6) * exp(-0.5 * X_s) * 2^(4.8)

est <- mean(integrand) 

cat("Estimate: ", est, ", Truth: ", true, ", Error: ", abs(est - true))
Estimate:  0.03081588 , Truth:  0.03148364 , Error:  0.0006677608

Question 1.c

Compare your results with the true value you can get from 1−pgamma(9.6, 4.8, 1).

Code
truth <- 1 - pgamma(9.6, 4.8, 1) 

mean_row <- c(mean(X > 9.6), mean(integrand), truth) 
var_row <- c(var(X > 9.6), var(integrand), 0) / N  

df <- data.frame(rbind(mean_row, var_row)) 
colnames(df) <- c("MC", "IS", "truth")
rownames(df) <- c("mean", "var") 

print(df) 
               MC           IS      truth
mean 2.900000e-02 3.081588e-02 0.03148364
var  2.818719e-05 3.223869e-06 0.00000000

Question 1.d

Using the same tilting parameter estimate \(\textrm{E}[\sqrt{X − 9.6} \mid X > 9.6]\).

\[ \begin{aligned} \textrm{E}[\sqrt{X − 9.6} \mid X > 9.6] = \frac{ \textrm{E}[\sqrt{X − 9.6} \mathbb{1}\{X > 9.6\}]}{\Pr(X > 9.6)} \end{aligned} \]

Code
N <- 10^5 
set.seed(123) 

X_s <- rgamma(N, 4.8, 0.5) 

integrand_1 <-  sqrt((X_s > 9.6) * (X_s - 9.6)) * exp(-0.5 * X_s) * 2^(4.8)
integrand_2 <- (X_s > 9.6) * exp(-0.5 * X_s) * 2^(4.8)

mean(integrand_1) / mean(integrand_2)
[1] 1.07377

Question 2

The loss to an insurer over a limited time can be described by \[ X = \sum_{i = 1}^N Y_i \,, \] where \(N\) is a Poisson random variable with mean \(10\) and the \(Y_i\)’s are exponentially distributed with parameter \(1\). A stop-loss reinsurance is arranged with a deductible of \(17\). We are interested in expected loss to the reinsurer \(\textrm{E}[(X − 17)_+]\).

Estimate it with a straightforward Monte Carlo method as well as using importance sampling with the Esscher transform. Comment your results.

Question 2

Using importance sampling with Esscher transform, we have

\[ \textrm{E}_{f_X}[h(X)] = M_X(t) \textrm{E}_{g_{X^*}} [h(X^*) \exp \{-t X^*\}] \]

So need

  1. MGF of \(X\): \(M_X(t)\)
  2. MGF of \(X^*\): \(M_{X^*}(t)\)
  3. Choose \(t\)

Question 2

\(M_X(t)\)

\[ M_X(t) = \textrm{E}_{f_X}[\exp\{t X\}] = \textrm{E}_{f_X} \left[\exp\left\{t \sum_{i = 1}^N Y_i\right\} \right] = \textrm{E}_{f_N}\left( \textrm{E}_{f_Y}\left[\exp\left\{t \sum_{i = 1}^N Y_i \right\} {\Bigg \vert} N \right] \right) \]

Using independence of the \(Y_i\)’s, we have

\[ \begin{aligned} \textrm{E}_{f_Y}\left[\exp\left\{t \sum_{i = 1}^N Y_i \right\} {\Bigg \vert} N \right] &= \textrm{E}_{f_Y}\left[\prod_{i = 1}^N \exp\left\{t Y_i \right\} {\Bigg \vert} N \right] \\ &= \prod_{i = 1}^N \textrm{E}_{f_Y}\left[\exp\left\{t Y_i \right\} { \vert} N \right] \\ &= M_Y(t)^N \\ &= \left(\frac{1}{1 - t} \right)^N \\ &= \exp \left\{N \log \left(\frac{1}{1 - t} \right)\right\} \end{aligned} \]

Question 2

\(M_X(t)\)

Hence for \(\tilde{t} = \log \left(\frac{1}{1 - t} \right)\)

\[ \begin{aligned} \textrm{E}_{f_N}\left( \exp \left\{N \log \left(\frac{1}{1 - t} \right)\right\} \right) &= \textrm{E}_{f_N}\left( \exp \left\{N \tilde{t} \right\} \right) \\ &= M_N(\tilde{t}) \\ &= \exp \{10 (\exp\{\tilde{t}\} - 1) \} \\ &= \exp \left\{10\left( \frac{1}{1 - t} - 1 \right) \right\} \\ &= M_X(t) \end{aligned} \]

Question 2

\(M_{X^*}(r)\)

\[ \begin{aligned} M_{X^*}(r) &= \textrm{E}_{g_{X^*}} \left[\exp\{r X^*\} \right] \\ &= \int \exp\{rx \} g_{X^*}(x) dx \\ &= \int \exp\{rx \} \frac{\exp\{t x\} f_X(x)}{M_X(t)} dx \\ &= \frac{1}{M_X(t)} \int \exp\{(r + t) x \} f_X(x) dx \\ &= \frac{1}{M_X(t)} \textrm{E}_{f_X} [\exp\{(t + r) X\}] \\ &= \frac{M_X(r + t)}{M_X(t)} \end{aligned} \]

Question 2

\(M_{X^*}(r)\)

\[ \begin{aligned} \frac{M_X(r + t)}{M_X(t)} &= \exp\left\{10 \left(\frac{1}{1 - t - r} - 1 - \left(\frac{1}{1 - t} - 1\right) \right) \right\} \\ &= \exp\left\{10 \left(\frac{1}{1 - t - r} - \frac{1}{1 - t} \right) \right\} \\ &= \exp\left\{10 \left(\frac{1 - t - (1 - t - r)}{(1 - t - r)(1 - t)} \right) \right\} \\ &= \exp\left\{10 \left(\frac{r}{(1 - t - r)(1 - t)} \right) \right\} \\ &= \exp\left\{\frac{10}{1 - t} \left(\frac{r}{1 - t - r} \right) \right\} \\ &= \exp\left\{\frac{10}{1 - t} \left(\frac{1 - t}{1 - t - r} - 1 \right) \right\} \\ \end{aligned} \]

Question 2

\(M_{X^*}(r)\)

Hence \[ X^* = \sum_{i = 1}^{N^*} Y_i^* \] where \(N^* \sim \textrm{Pois}(10 / (1 - t))\) and \(Y_i \sim \textrm{Exp}(1 - t)\).

Question 2

Choosing \(t\)

Interested in \(\textrm{E}[(X − 17)_+]\), so choose \(t\) such that \(\textrm{E}(X^*) = 17\).

\[ \begin{aligned} \frac{\partial M_{X^*}(r)}{\partial r} &= \frac{\partial}{\partial r} \exp\left\{\frac{10}{1 - t} \left(\frac{1 - t}{1 - t - r} - 1 \right) \right\} \\ &= \exp\left\{\frac{10}{1 - t} \left(\frac{1 - t}{1 - t - r} - 1 \right) \right\} \left(\frac{10}{1 - t} \frac{1 - t}{(1 - t - r)^2} \right)\\ &= \exp\left\{\frac{10}{1 - t} \left(\frac{1 - t}{1 - t - r} - 1 \right) \right\} \left(\frac{10}{(1 - t - r)^2} \right) \end{aligned} \]

and thus

\[ \textrm{E}(X^*) = \frac{\partial M_{X^*}(r) }{\partial r} {\huge \vert}_{r = 0} = \frac{10}{(1 - t)^2} \]

which equals \(17\) for

\[ t = 1 - \sqrt{\frac{10}{17}} \approx 0.233035 \]

Question 2

Hence we arrive at the MC estimator

\[ \begin{aligned} \textrm{E}_{f_X}[(X - 17)_+] &= M_X(t) \textrm{E}_{g_{X^*}} [(X^* - 17)_+ \exp \{-t X^*\}] \\ &\approx \exp \left\{10\left( \frac{1}{1 - t} - 1 \right) \right\} \frac{1}{n} \sum_{i = 1}^n (X^*_i - 17)_+ \exp \{-t X^*_i\} \end{aligned} \] and \(t = 1 - \sqrt{10 / 17}\).

Question 2

Code
N <- 10^5 

# MC estimator 
MC_estimator<- function(seed){
    set.seed(seed) 
    N <- rpois(1, 10) 
    Y <- rexp(N, 1) 
    X <- sum(Y) 
    max(X - 17, 0)
}

## IS estimator 
IS_estimator <- function(seed, t = 1 - sqrt(10 / 17)){
    set.seed(seed) 
    N <- rpois(1, 10 / (1 - t)) 
    Y_s <- rexp(N, 1 - t) 
    X_s <- sum(Y_s)  
    exp(10 * t / (1 - t)) * max(X_s - 17, 0) * exp(-t * X_s)  
}

MC_estimates <- vapply(1:N, MC_estimator, 0.0) 
IS_estimates <- vapply(1:N, IS_estimator, 0.0) 

means <- c(mean(MC_estimates), mean(IS_estimates)) 
vars <- c(var(MC_estimates), var(IS_estimates)) / N 

df <- data.frame(rbind(means, vars)) 
colnames(df) <- c("MC", "IS") 
rownames(df) <- c("mean", "var") 

print(df) 
               MC           IS
mean 2.087753e-01 2.028024e-01
var  1.062372e-05 6.193147e-07

Question 3

Generalizing Question 2, let \[ X = \sum_{i = 1}^N Y_i \,, \] where \(N\) is a Poisson random variable with mean \(\lambda\) and the \(Y_i\)’s have moment generating function \(M_Y(r)\).

  1. Show that the tilted distribution can be described by \[ X_* = \sum_{i = 1}^{N^*} Y_i^* \,, \] where \(N^*\) is a Poisson random variable with mean \(\lambda M_Y (t)\) and \(Y_i^*\)’s have moment generating function \(M_Y(r + t)/M_Y(t)\).

  2. Show also that \(\Pr(X > b)\) can be estimated by \[ \Pr(X > b) = M_X(t) \textrm{E}_{g_{X^*}}(\exp\{-tX^*\} \mathbb{1}\{X^* > b\}) \]

  3. Moreover, show that a good choice for \(t\) if \(\Pr(X > b)\) can be estimated is the solution of the equation \(\lambda M_{Y}'(t) = b\).

Question 3.a

As in Question 2, by conditioning on \(N\) and using that \(Y_i\)’s are i.i.d., we get that

\[ \textrm{E}_{f_Y}\left[\exp\left\{t \sum_{i = 1}^N Y_i \right\} {\Bigg \vert} N \right] = \exp \left\{N \log \left(M_Y(t) \right)\right\} \] and again substituting \(\tilde{t} = \log \left(M_Y(t) \right)\) and using that \(M_N(t) = \exp\{ \lambda (\exp\{t\} - 1)\}\), we get \[ \begin{aligned} M_X(t) &= \textrm{E}_{f_X}(\exp\{tX\}) \\ &= \textrm{E}_{f_N}\left(\textrm{E}_{f_Y}\left[ \exp\{t \sum_{i = 1}^N Y_i\} {\huge \vert} N \right]\right) \\ &= \textrm{E}_{f_N}\left(\exp \left\{N \log \left(M_Y(t) \right)\right\} \right) \\ &= \exp\{ \lambda (M_Y(t) - 1) \} \end{aligned} \]

Question 3.a

We have already established in Question 2 that \(M_{X^*}(r) =M_X(r + t) / M_X(t)\) so that

\[ \begin{aligned} M_{X^*}(r) &= \frac{M_X(r + t)}{M_X(t)} \\ &= \exp\{ \lambda \left(M_Y(t + r) - M_Y(t)\right) \} \\ &= \exp\left\{ \lambda M_Y(t)\left(\frac{M_Y(t + r)}{M_Y(t)} - 1\right) \right\} \end{aligned} \]

and hence

\[ X^* = \sum_{i = 1}^{N^*} Y_i^* \,, \]

where \(N^* \sim \textrm{Pois}(\lambda M_Y(t))\) and the \(Y_i^*\)’s have MGF \({M_Y(t + r)} / {M_Y(t)}\).

Question 3.b

Change of measure

\[ \begin{aligned} \Pr(X > b) &= \textrm{E}_{f_X}(\mathbb{1}\{X > b\}) \\ &= \int \mathbb{1}\{x > b\} f_X(x) dx \\ &= \int \mathbb{1}\{x > b\} \exp\{-tx\} \exp\{tx\} f_X(x) \frac{\int f_X(x) \exp\{tx\} dx}{\int f_X(x) \exp\{tx\} dx} dx \\ &= \int \mathbb{1}\{x > b\} \exp\{-tx\} g_{X^*}(x) M_X(t) dx \\ &= M_X(t) \textrm{E}_{g_{X^*}}(\mathbb{1}\{X^* > b\} \exp\{-tX^*\}) \end{aligned} \]

Question 3.c

Choose \(t\) such that \(\textrm{E}_{g_{X^*}} (X^*) = b\) using the MGF

\[ \begin{aligned} \textrm{E}_{g_{X^*}} (X^*) &= \frac{\partial M_{X^*}(r)}{\partial r} {\huge \vert}_{r = 0} \\ &= \frac{\partial \exp\{ \lambda \left(M_Y(t + r) - M_Y(t)\right) \}}{\partial r} {\huge \vert}_{r = 0} \\ &= \left( \exp\{ \lambda \left(M_Y(t + r) - M_Y(t)\right) \} \lambda M_Y'(t + r)\right) {\huge \vert}_{r = 0} \\ &= \lambda M_Y'(t) \end{aligned} \]

Question 4

Apply the results of Question 3 to the following problem. The number of trades a day trader performs on a single day is Poisson with mean \(16\). Each trade results to a profit or loss that is normally distributed with mean \(0\) and variance \(1\). Approximate the probability that at the end of a given day, her profit is more than \(8\).

Tilted distribution

  • \(N^* \sim \textrm{Pois}(\lambda M_Y(t))\)
  • \(Y_i^*\)’s have MGF \({M_Y(t + r)} / {M_Y(t)}\)
  • \(M_X(t) = \exp\{\lambda (M_Y(t) - 1)\}\) MGF of normal distribution: \(M_Y(t) = \exp\{\mu t (\sigma^2 t^2) / 2 \}\)

Thus

  • \(N^* \sim \textrm{Pois}(16 \exp\{t^2 / 2\})\)
  • \(Y_i^* \sim \mathrm{N}(t, 1)\) as \(({M_Y(t + r)} / {M_Y(t)}) = \exp\{((t + r)^2 - t^2) / 2 \} = \exp\{tr + r^2 / 2 \}\)
  • \(M_X(t) = \exp\{16 (\exp\{t^2 / 2\} - 1)\}\)

Question 4

Choosing t

MGF of standard normal distribution: \(M_Y(t) = \exp\{ t^2 / 2 \}\)

Hence need to solve \[ 16 t \exp\{t^2 /2\} = 8 \] numerically

Code
t_root <- function(t){t * exp(t^2 / 2) - 0.5} 

t <- uniroot(t_root, c(-10, 10))$root

t 
[1] 0.4515319

Question 4

Estimating \(\Pr(X > 8)\)

Code
N <- 10^5 

IS_estimator <- function(seed, t){
    set.seed(seed) 
    N <- rpois(1, 16 * exp(t^2 / 2)) 
    Y_s <- rnorm(N, mean = t, sd = 1) 
    X_s <- sum(Y_s) 

   exp(16 * (exp(t^2 / 2) - 1)) * (X_s > 8) * exp(-t * X_s)  
}

IS_estimates <- vapply(1:N, IS_estimator, 0.0, t = t)  

IS_mean <- mean(IS_estimates) 
IS_var <- var(IS_estimates) / N 

cbind(IS_mean, IS_var) 
        IS_mean       IS_var
[1,] 0.02329405 1.426862e-08

Question 5

Apply the method of section 5.4.4 of notes to the case where the Poisson mean is \(3\) and the probability of interest is \(\Pr(X \geq 7)\).

  • Poisson pdf: \(\Pr(X = k) = \frac{\lambda^k \exp\{-\lambda\}}{k!}\)
  • Geometric pdf: \(\Pr(Y = k) = \theta (1-\theta)^{k}\)

\[ \begin{aligned} \Pr(X \geq a) &= \sum_{i = a}^\infty \frac{\lambda^k \exp\{-\lambda\}}{k!} \\ &= \sum_{i = 0}^\infty \frac{\lambda^{k + a} \exp\{-\lambda\}}{(k + a)!} \\ &= \sum_{i = 0}^\infty \frac{\lambda^{k + a} \exp\{-\lambda\}}{(k + a)!} \frac{1}{ \theta (1-\theta)^{k}} \theta (1-\theta)^{k }\\ &= \textrm{E}_{f_Y} \left( \frac{\lambda^{Y + a} \exp\{-\lambda\}}{ (Y + a)!\theta (1-\theta)^{Y}} \right) \\ & \approx \frac{1}{n} \sum_{i = 1}^n \frac{\lambda^{Y_i + a} \exp\{-\lambda\}}{(Y_i + a)!\theta (1-\theta)^{Y_i}} \end{aligned} \]

Question 5

Code
N <- 10^5 
set.seed(123) 

integrand <- function(Y, lambda, theta, a){
    lambda^(Y + a) * exp(-lambda) / (factorial(Y + a) * theta * (1 - theta)^Y) 
}

theta <- 0.6
lambda <- 3 
a <- 7 

Y <- rgeom(N, theta) 

prob_est <- mean(integrand(Y, lambda, theta, a)) 
prob_truth <- 1 - ppois(a - 1, lambda) 

cbind(prob_est, prob_truth)
       prob_est prob_truth
[1,] 0.03348215 0.03350854