```{r}
set.seed(5172013)
```
Day 7 Lab: Monte Carlo Integration
Monte Carlo integration
Example 1: integral of \(e^{x^4 - x^2}\) over [0,1]
Define an integrand
```{r}
integrand_function <- function(x){
return(exp(x^4 - x^2))
}
```
Plot the integrand on the interval [0,1]
```{r}
x_values <- c(0:100) / 100
plot(x_values, integrand_function(x_values), type = "l", lwd = 2, xlab = "x", ylab = "y", cex.lab = 1.3, col = "red")
text(0.6, 0.95, expression(y == e^{x^4 - x^2}), cex = 2)
```
See ?plotmath
for more info on plotting math in R
Deterministic numerical integration from 0 to 1
Notice that we are passing function integrand_function
as an argument Also the integral can be seen as \(E(e^{U^4 - U^2})\), where \(U \sim Uniform(0, 1)\)
```{r}
det_int <- integrate(integrand_function, lower = 0, upper = 1)
```
The function integrate()
returns an object, not a value to find out what is in the object, type names(det_int)
```{r}
names(det_int)
det_int$value
det_int$message
```
[1] "value" "abs.error" "subdivisions" "message" "call"
[1] 0.8785237
[1] "OK"
Monte Carlo integration
By default runif()
defines uniform distribution on [0,1]
```{r}
mcarlo_iterations <- 1000000
unif_sample <- runif(mcarlo_iterations)
```
artihmetic average
```{r}
(mcarlo_int = mean(integrand_function(unif_sample)))
```
[1] 0.8785647
Monte Carlo error
```{r}
(mcarlo_error = sd(integrand_function(unif_sample))/sqrt(mcarlo_iterations))
```
[1] 7.702924e-05
Monte Carlo 95% confidence interval
```{r}
(c(mcarlo_int - 1.96*mcarlo_error, mcarlo_int + 1.96*mcarlo_error))
```
[1] 0.8784137 0.8787157
Compare to the deterministic numerical integration
```{r}
(det_int)
```
0.8785237 with absolute error < 1e-14
Example 2: second moment of the Beta distribution \(X \sim Beta(\alpha, \beta)\)
```{r}
alpha <- 2
beta <- 2
```
- theory tells us that \(E(X^2) = \frac{\alpha(\alpha + 1)}{(\alpha + \beta + 1)(\alpha + \beta)}\)
```{r}
(beta_2nd_moment_theory <- alpha * (alpha + 1) / ((alpha + beta + 1) * (alpha + beta)))
```
[1] 0.3
- deterministic integration
```{r}
beta_integrand <- function(x){
return(x^2 * dbeta(x, shape1 = alpha, shape2 = beta))
}
(beta_2nd_moment_det <- integrate(beta_integrand, lower = 0, upper = 1))
```
0.3 with absolute error < 3.3e-15
- Monte Carlo integration
```{r}
beta_mcarlo_iterations <- 1000
beta_sample <- rbeta(beta_mcarlo_iterations, shape1=alpha, shape2=beta)
(beta_2nd_moment_mc <- mean(beta_sample^2))
```
[1] 0.2937768
Monte Carlo errors
```{r}
(beta_mcarlo_error <- sd(beta_sample^2) / sqrt(beta_mcarlo_iterations))
```
[1] 0.00715907
Monte Carlo 95% confidence intervals
```{r}
c(
beta_2nd_moment_mc - 1.96 * beta_mcarlo_error,
beta_2nd_moment_mc + 1.96 * beta_mcarlo_error
)
```
[1] 0.2797450 0.3078086
Compare the three answers
```{r}
beta_2nd_moment_theory
beta_2nd_moment_det
beta_2nd_moment_mc
c(
beta_2nd_moment_mc - 1.96 * beta_mcarlo_error,
beta_2nd_moment_mc + 1.96 * beta_mcarlo_error
)
```
[1] 0.3
0.3 with absolute error < 3.3e-15
[1] 0.2937768
[1] 0.2797450 0.3078086
Exercise 1
Find the 4th moment of a beta distribution with parameters alpha = 2, beta = 5. Use:
deterministic
Monte Carlo integration
Compute Monte Carlo error and confidence intervals of your estimate(s) from part b.
Exercise 1 solution
```{r}
alpha <- 2
beta <- 5
```
- deterministic integration
```{r}
beta_integrand <- function(x){
return(x^4 * dbeta(x, shape1 = alpha, shape2 = beta))
}
(beta_4th_moment_det <- integrate(beta_integrand, lower = 0, upper = 1))
```
0.02380952 with absolute error < 2.6e-16
- Monte Carlo integration
```{r}
beta_mcarlo_iterations <- 1000
beta_sample <- rbeta(beta_mcarlo_iterations, shape1 = alpha, shape2 = beta)
(beta_4th_moment_mc <- mean(beta_sample^4))
```
[1] 0.0227895
- Monte Carlo error and 95% confidence intervals
```{r}
(beta_mcarlo_error <- sd(beta_sample^4) / sqrt(beta_mcarlo_iterations))
c(
beta_4th_moment_mc - 1.96 * beta_mcarlo_error,
beta_4th_moment_mc + 1.96 * beta_mcarlo_error
)
```
[1] 0.00152763
[1] 0.01979535 0.02578366
Compare the two answers
```{r}
beta_4th_moment_det
beta_4th_moment_mc
c(
beta_4th_moment_mc - 1.96 * beta_mcarlo_error,
beta_4th_moment_mc + 1.96 * beta_mcarlo_error
)
```
0.02380952 with absolute error < 2.6e-16
[1] 0.0227895
[1] 0.01979535 0.02578366
Exercise 2
Integral of \(\exp(-x^5)\) on \([0, \infty)\).
Use:
deterministic integration
Monte Carlo integration to approximate it Hint: Uniform sampling won’t work, because uniform distribution is not defined on \([0, \infty]\). For importance sampling to work, the support must be the same. Use exponential distribution samples instead – think about how to write the desired integral in terms of expectation
compute Monte Carlo error and confidence intervals of your estimate(s) from part b
compare Monte Carlo error of the procedure using two different rates of the exponential distribution
Exercise 2 Solutions
Define function
```{r}
integrand_function2 <- function(x){
return(exp(-x^5))
}
```
```{r}
x_values <- seq(0,20,0.1)
plot(x_values, integrand_function2(x_values), type = "l", lwd = 2, xlab = "x", ylab = "y", cex.lab = 1.3, col = "red")
```
Deterministic integration
```{r}
(det_int2 <- integrate(integrand_function2, lower = 0, upper = Inf))
```
0.9181687 with absolute error < 1.3e-07
Monte Carlo
```{r}
mcarlo_iterations2 <- 1000000
exp_rate <- 0.6
exp_sample <- rexp(mcarlo_iterations2, rate = exp_rate)
#arithmetic average
mc_samples <- integrand_function2(exp_sample) / dexp(exp_sample, rate = exp_rate)
mcarlo_int2 <- mean(mc_samples)
```
Monte Carlo error
```{r}
mcarlo_error2 = sd(mc_samples) / sqrt(mcarlo_iterations2)
print(mcarlo_error2)
```
[1] 0.0009452832
Monte Carlo 95% confidence interval
```{r}
c(mcarlo_int2 - 1.96 * mcarlo_error2, mcarlo_int2 + 1.96 * mcarlo_error2)
```
[1] 0.9178095 0.9215150
Compare to the deterministic numerical integration
```{r}
det_int2
```
0.9181687 with absolute error < 1.3e-07
Different rate
```{r}
mcarlo_iterations2 <- 1000000
exp_rate <- 10
exp_sample <- rexp(mcarlo_iterations2, rate = exp_rate)
#arithmetic average
(mcarlo_int3 <- mean(integrand_function2(exp_sample) / dexp(exp_sample, rate = exp_rate)))
(mcarlo_error3 = sd(integrand_function2(exp_sample) / dexp(exp_sample, rate = exp_rate)) / sqrt(mcarlo_iterations2))
mcarlo_error3
```
[1] 0.9107092
[1] 0.01080825
[1] 0.01080825
Importance sampling
Let Z have a standard normal distribution, \(Z \sim N(0,1)\). We want to compute \(P(X \geq 4.5)\). We could do this via a Naive Monte Carlo simulation. (How?)
```{r}
hist(rnorm(10000,0,1), main = "10,000 Random samples from a Standard Normal")
```
We generate a bunch of samples of Z and count how many satisfy \(Z \geq 4\). The problem is that there won’t be very many (perhaps none), unless my number of samples is very large.
Q: What is the error of the MC estimate for \(p = P(Z \geq 4)\)?
Define a threshold value and number of Monte Carlo samples
```{r}
my_const <- 4.5
sim_size <- 1000000
```
To find the true probability of interest:
```{r}
(true_prob <- pnorm(my_const, lower.tail = FALSE))
```
[1] 3.397673e-06
Naive Monte Carlo estimate:
```{r}
naive_samples <- ifelse(rnorm(sim_size) > my_const, yes = 1, no = 0)
(naive_est <- mean(naive_samples))
```
[1] 5e-06
Instead, we can compute this via importance sampling.
We can take the sampling distribution to be a shifted exponential:
\[ q(x) = \begin{cases} & e^{-(x-4.5)} \quad \text{when } x \geq 4.5, \\ & 0 \quad \text{ o.w. } \end{cases} \] (This means that if \(Y \sim Exp(1)\), \(Y+4 \sim q(x)\))
So the probability that we want to compute, i.e. \(p = P(X \geq 4)\)
\[ p = \int 1_{x \geq 4} \frac{p(x)}{q(x)}q(x)dx \]
Importance sampling estimate:
```{r}
shift_exp <- rexp(sim_size, rate = 1) + my_const
# weights = dnorm(x)/q(x)
is_samples <- dnorm(shift_exp) / exp(-(shift_exp - my_const))
(is_est <- mean(is_samples))
```
[1] 3.40056e-06
```{r}
print(true_prob)
print(naive_est)
print(is_est)
```
[1] 3.397673e-06
[1] 5e-06
[1] 3.40056e-06
Computing Monte Carlo variances
```{r}
(naive_var_mc <- var(naive_samples) / sim_size)
(is_var_mc <- var(is_samples) / sim_size)
```
[1] 4.99998e-12
[1] 1.949783e-17
Computing Monte Carlo confidence intervals
```{r}
# Naive (not very useful, the variance estimate might be degenerate):
c(naive_est - 1.96 * sqrt(naive_var_mc), naive_est + 1.96 * sqrt(naive_var_mc))
# Importance Sampling:
c(is_est - 1.96 * sqrt(is_var_mc), is_est + 1.96 * sqrt(is_var_mc))
```
[1] 6.173155e-07 9.382684e-06
[1] 3.391905e-06 3.409215e-06
Check your understanding
- Try repeating these sames steps with
my_const = 0.9
. Can you explain why the Naive MC performs better in this case?
We will get more samples from the normal distribution in that area.
- Why do you think a shifted exponential is a good proposal distribution in this case?
It generates samples that are in the area of the original distribution that we want to approximate.
- Let’s say that you want to use the same shifted exponential (with shift constant = 4), but you want to find \(P(Z>8)\). How would you have to modify the Importance Sampling sample to target the correct distribution?
We need to only consider the values where \(X>8\) for the I.S. sample (assing 0 to the others).