Assume we have a random variable \(X\) with pdf \(f_X\) such that
\[
f_X(x) = \sum_{i = 1}^N \alpha_i f_i(x)
\] where \(f_i\) are known pdf functions (that we can sample from), and \(\alpha_i \in [0,1]\), \(\sum_{i = 1}^N \alpha_i = 1\), \(N \in \mathbb{N} \cup \infty\).
Sampling algorithm
Draw mixture component \(m\) according to weights \(\alpha_i\)
pdf \(f_X\) of a random variable \(X\) that is easy to evaluate but difficult to sample from
another pdf \(g_Y\) of a random variable \(Y\) with same support as \(X\), that we can sample from easily and for which \(\sup_x \, \frac{f_X(x)}{g_Y(x)} \leq c\), \(c \geq 1\)
Want: A sampling routine for \(X\)
Algorithm
Draw \(Y\) according to \(g_Y\)
Draw \(U \sim \mathcal{U}({0,1})\)
Check if \(U \leq \frac{f_X(Y)}{cg_Y(Y)}\)
3.a. If yes, \(X = Y\)
3.b. Otherwise, go to 1.
Rejection sampling
Claim: \(X\) from the rejection sampling scheme has distribution according to \(f_X\).
\[
\begin{aligned}
\Pr \left(X \leq z \right) &= \Pr \left(Y \leq z {\huge \mid} U \leq \frac{f_X(Y)}{c g_Y(Y)} \right)\\
% &= \frac{\Pr \left(Y \leq z \cap \textrm{accept} \right) }{\Pr(\textrm{accept})} \\
&= \frac{\Pr \left(Y \leq z \cap U \leq \frac{f_X(Y)}{g_Y(Y)}\right) }{\Pr\left(U \leq \frac{f_X(Y)}{c g_Y(Y)}\right)} \\
&= \frac{\int_{-\infty}^z \int_{0}^{\frac{f_X(y)}{c g_Y(y)}} f_{Y, U}(y,u) du dy }{\int_{-\infty}^{\infty} \int_{0}^{\frac{f_X(y)}{c g_Y(y)}} f_{Y, U}(y,u) du dy} \\
&= \frac{\int_{-\infty}^z \int_{0}^{\frac{f_X(y)}{c g_Y(y)}} g_Y(y) du dy }{\int_{-\infty}^{\infty} \int_{0}^{\frac{f_X(y)}{c g_Y(y)}} g_Y(y) du dy} && {\Huge|} f_{Y,U}(y,u) = g_Y(y) \mathbb{1}\{ u \in (0,1) \} \\
&= \int_{-\infty}^{z} f_X(y) dy = F_X(z) && {\Huge|} \int_0^{\frac{f_X(y)}{c g_Y(y)}} g_Y(y) du = \frac{f_X(y)}{c}
\end{aligned}
\]
Rejection sampling
Code
library(ggplot2) library(patchwork) xs <-seq(-3, 3, length.out =1000)p <-function(x){exp(-2*abs(x))} q <-function(x){3.5*dcauchy(x)} set.seed(123) q_sample <-rcauchy(length(xs)) us <-q(q_sample) *runif(length(xs)) threshold <-p(q_sample) /q(q_sample)x_point <-0.5y_point <-q(x_point) mid_point <-p(x_point)df <-data.frame(x = xs, y =c(p(xs), q(xs)), group =rep(c(1,2), each =length(xs)), u = us, q = q_sample, fill =factor(us <p(q_sample))) rej_plot <-ggplot(df, aes(x = x, y = y, group = group)) +geom_point(aes(x = q, y = u, fill = fill), shape =21, color ="black", stroke =NA, size =1.5, alpha = .3) +scale_color_manual(name ="Distribution",values =c("1"="steelblue", "2"="grey"),labels =c("target", "proposal") ) +scale_fill_manual(name ="Proposal sample",values =c("TRUE"="steelblue", "FALSE"="tomato"), labels =c("TRUE"="accept", "FALSE"="reject") ) +geom_segment(data =data.frame(x = x_point, y = mid_point),aes(x = x, xend = x, y =0, yend = y, group =1), color ="steelblue", lty =1) +geom_segment(data =data.frame(x = x_point, y = mid_point, y_end = y_point),aes(x = x, xend = x, y = y, yend = y_end, group =1), color ="tomato", lty =1) +geom_line(aes(colour =factor(group))) +theme_minimal() +coord_cartesian(xlim =c(-3, 3)) +theme(plot.margin =margin(t =0, b =0), axis.title.x =element_blank(), axis.text.x =element_blank(), axis.title.y =element_blank(), axis.ticks.x =element_blank()) rej_sample <- q_sample[us < threshold] bottom_hist <-ggplot(data.frame(x = rej_sample), aes(x = x)) +geom_histogram(aes(y =after_stat(density)), bins =30, color =NA, fill ="steelblue", alpha = .3) +geom_line(data =data.frame(x = xs, y =p(xs)), aes(x = x, y = y), color ="steelblue", alpha = .3) +scale_x_continuous(limits =c(-3, 3)) +scale_y_reverse(limits =c(1, 0), breaks =rev(c(0, .3, .6, .9))) +theme_minimal() +theme(plot.margin =margin(t =0, b =0), axis.title.x =element_blank(), axis.title.y =element_blank(), axis.ticks.y =element_blank()) rej_plot / bottom_hist +plot_layout(guides ="collect")
Expected acceptance rate
The expected acceptance rate of the rejection sampler is \(\Pr(\text{accept}) = n/c\).
Explain how you would generate random variables with density \[
f(x) = (1 - \alpha) + \alpha \frac{2}{(1 + x)^2}, \quad x \in (0, 1)
\] If you want to write the code, set \(\alpha = 0.6\).
We see that the random variable \(X\) is a discrete mixture of a uniform random variable \(Y\) and another random variable \(Z\) with density \(2 / (1 + x^2)\).
Sampling scheme
Draw Bernoulli random variable \(I\) with \(\Pr(I = 1) = \alpha\)
alpha <-0.6N <-10^5inv_cdf <-function(x){ x / (2- x)}set.seed(123) Is <-rbinom(N, 1, alpha) Xs <-runif(N)Xs[Is ==1] <-inv_cdf(Xs[Is ==1])xs <-seq(from =0, to =1, length.out =1000)pdf_fun <-function(x){ (1- alpha) + alpha *2/ (1+ x)^2}hist(Xs, freq =FALSE) lines(xs, pdf_fun(xs), col ="blue", lwd =2)
Question 2
Generate a sample from the density \[
f(x) = \sum_{i = 1}^5 \alpha_i \frac{i^3 x^2 e^{-ix}}{2}
\] for \(\alpha = (0.1, 0.2, 0.2, 0.4, 0.1)^\top\). Use the sample function for the weights. Plot a histogram and calculate the mean (with a \(95\%\) confidence interval) comparing it with the actual mean of the distribution.
Note that the densities of the mixture components are from a gamma distribution with shape \(\alpha = 3\), and rate \(\beta = i\), i.e.
Let \(U \sim \Gamma(3,1)\), \(V \sim \Gamma(2, 1)\) independent of each other. For \(X = \frac{U}{U + V}\), give the qqplot of a sample \(X\) of size \(1000\) against \(Y \sim B(3,2)\). What do you see?
Code
N <-1000set.seed(123) U <-rgamma(N, shape =3, rate =1)V <-rgamma(N, shape =2, rate =1)X <- U / (U + V) Y <-rbeta(N, 3, 2) par(mfrow =c(1, 2))qqplot(Y, X, xlab ="B(3, 2)", ylab ="X=U/(U+V)", col ="red")abline(0, 1, col ="blue")plot(density(X), lwd =2, col ="blue", xlab ="x",main ="Probability Density Function")lines(density(Y), lwd =2, col ="red")
Question 4
Generate a sample from the density \[
f(x) = \frac{1}{6} + \frac{x}{2} + x^2 + x^3, \quad 0 < x < 1 \,.
\] Plot a histogram and calculate the mean (with a \(95\%\) confidence interval) comparing it with the actual mean of the distribution.
Using acceptance-rejection methods to generate random variable \(X\) with density \[
(1 − x)(1 + 3x), \quad 0 < x < 1 \,.
\] You should try with the following envelopes:
\(1\)
\(2(1 − x)\)
\(2(1+3x) / 5\)
Which one of the three methods is best?
Question 5
Calculating upper bounds on \(f(x) / g(x)\)
\(\sup_x (1 - x) (1 + 3 x) = 4 / 3\)
\(\sup_x (1 + 3x) / 2 = 2\)
\(\sup_x 5(1 - x) / 2 = 5 / 2\)
Question 5
Code
c_1 <-4/3c_2 <-2c_3 <-5/2pdf <-function(x){(1- x) * (1+3* x)} r_1 <-function(x){rep(c_1, length(x))} r_2 <-function(x){c_2 * (1+3* x) /2} r_3 <-function(x){c_3 * (1- x) *5/2}xs <-seq(from =0, to =1, length.out =1000)plot(xs, pdf(xs), type ="l", xlim =c(0, 1), ylim =c(0, 2.5), main ="Acceptance region", ylab ="", xlab ="", col ="blue")polygon(c(xs, rev(xs)),c(pdf(xs), rep(0, length(xs))),col =adjustcolor("lightblue", alpha.f =0.5),border =NA)lines(xs, pdf(xs), col ="blue", lwd =2)curve(r_1, from =0, to =1, col ="red", add =TRUE, lwd =2)curve(r_2, from =0, to =1, col ="purple", add =TRUE, lwd =2)curve(r_3, from =0, to =1, col ="green", add =TRUE, lwd =2)legends <-c("Target",expression((1- x) * (1+3* x)),expression(frac(1+3* x, 2)),expression(frac(5* (1- x), 2)))par(xpd =NA)legend("topright",#inset = c(-0.1, 0),legend = legends,lwd =rep(2, 4),col =c("blue", "red", "purple", "green"),cex =0.5,bty ="n")