Day 7 Lab: Monte Carlo Integration

Monte Carlo integration

```{r}
set.seed(5172013)
```

Example 1: integral of \(e^{x^4 - x^2}\) over [0,1]

Define an integrand

```{r}
integrand_function <- function(x){
  return(exp(x^4 - x^2))
}
```

Plot the integrand on the interval [0,1]

```{r}
x_values <- c(0:100) / 100

plot(x_values, integrand_function(x_values), type = "l", lwd = 2, xlab = "x", ylab = "y", cex.lab = 1.3, col = "red")
text(0.6, 0.95, expression(y == e^{x^4 - x^2}), cex = 2)
```

See ?plotmath for more info on plotting math in R

Deterministic numerical integration from 0 to 1

Notice that we are passing function integrand_function as an argument Also the integral can be seen as \(E(e^{U^4 - U^2})\), where \(U \sim Uniform(0, 1)\)

```{r}
det_int <- integrate(integrand_function, lower = 0, upper = 1)
```

The function integrate() returns an object, not a value to find out what is in the object, type names(det_int)

```{r}
names(det_int)

det_int$value
det_int$message
```

[1] "value"        "abs.error"    "subdivisions" "message"      "call"        
[1] 0.8785237
[1] "OK"

Monte Carlo integration

By default runif() defines uniform distribution on [0,1]

```{r}
mcarlo_iterations <- 1000000

unif_sample <- runif(mcarlo_iterations)
```

artihmetic average

```{r}
(mcarlo_int = mean(integrand_function(unif_sample)))
```

[1] 0.8785647

Monte Carlo error

```{r}
(mcarlo_error = sd(integrand_function(unif_sample))/sqrt(mcarlo_iterations))
```

[1] 7.702924e-05

Monte Carlo 95% confidence interval

```{r}
(c(mcarlo_int - 1.96*mcarlo_error, mcarlo_int + 1.96*mcarlo_error))
```

[1] 0.8784137 0.8787157

Compare to the deterministic numerical integration

```{r}
(det_int)
```

0.8785237 with absolute error < 1e-14

Example 2: second moment of the Beta distribution \(X \sim Beta(\alpha, \beta)\)

```{r}
alpha <- 2
beta <- 2
```

theory tells us that \(E(X^2) = \frac{\alpha(\alpha + 1)}{(\alpha + \beta + 1)(\alpha + \beta)}\)

```{r}
(beta_2nd_moment_theory <- alpha * (alpha + 1) / ((alpha + beta + 1) * (alpha + beta)))
```

[1] 0.3

deterministic integration

```{r}
beta_integrand <- function(x){
  return(x^2 * dbeta(x, shape1 = alpha, shape2 = beta))
}

(beta_2nd_moment_det <- integrate(beta_integrand, lower = 0, upper = 1))
```

0.3 with absolute error < 3.3e-15

Monte Carlo integration

```{r}
beta_mcarlo_iterations <- 1000

beta_sample <- rbeta(beta_mcarlo_iterations, shape1=alpha, shape2=beta)

(beta_2nd_moment_mc <- mean(beta_sample^2))
```

[1] 0.2937768

Monte Carlo errors

```{r}
(beta_mcarlo_error <- sd(beta_sample^2) / sqrt(beta_mcarlo_iterations))
```

[1] 0.00715907

Monte Carlo 95% confidence intervals

```{r}
c(
  beta_2nd_moment_mc - 1.96 * beta_mcarlo_error, 
  beta_2nd_moment_mc + 1.96 * beta_mcarlo_error
)
```

[1] 0.2797450 0.3078086

Compare the three answers

```{r}
beta_2nd_moment_theory
beta_2nd_moment_det
beta_2nd_moment_mc

c(
  beta_2nd_moment_mc - 1.96 * beta_mcarlo_error, 
  beta_2nd_moment_mc + 1.96 * beta_mcarlo_error
)
```

[1] 0.3
0.3 with absolute error < 3.3e-15
[1] 0.2937768
[1] 0.2797450 0.3078086

Exercise 1

Find the 4th moment of a beta distribution with parameters alpha = 2, beta = 5. Use:

deterministic
Monte Carlo integration
Compute Monte Carlo error and confidence intervals of your estimate(s) from part b.

Exercise 1 solution

```{r}
alpha <- 2
beta <- 5
```

deterministic integration

```{r}
beta_integrand <- function(x){
  return(x^4 * dbeta(x, shape1 = alpha, shape2 = beta))
}

(beta_4th_moment_det <- integrate(beta_integrand, lower = 0, upper = 1))
```

0.02380952 with absolute error < 2.6e-16

Monte Carlo integration

```{r}
beta_mcarlo_iterations <- 1000

beta_sample <- rbeta(beta_mcarlo_iterations, shape1 = alpha, shape2 = beta)

(beta_4th_moment_mc <- mean(beta_sample^4))
```

[1] 0.0227895

Monte Carlo error and 95% confidence intervals

```{r}
(beta_mcarlo_error <- sd(beta_sample^4) / sqrt(beta_mcarlo_iterations))

c(
  beta_4th_moment_mc - 1.96 * beta_mcarlo_error, 
  beta_4th_moment_mc + 1.96 * beta_mcarlo_error
)
```

[1] 0.00152763
[1] 0.01979535 0.02578366

Compare the two answers

```{r}
beta_4th_moment_det
beta_4th_moment_mc

c(
  beta_4th_moment_mc - 1.96 * beta_mcarlo_error, 
  beta_4th_moment_mc + 1.96 * beta_mcarlo_error
)
```

0.02380952 with absolute error < 2.6e-16
[1] 0.0227895
[1] 0.01979535 0.02578366

Exercise 2

Integral of \(\exp(-x^5)\) on \([0, \infty)\).

Use:

deterministic integration
Monte Carlo integration to approximate it Hint: Uniform sampling won’t work, because uniform distribution is not defined on \([0, \infty]\). For importance sampling to work, the support must be the same. Use exponential distribution samples instead – think about how to write the desired integral in terms of expectation
compute Monte Carlo error and confidence intervals of your estimate(s) from part b
compare Monte Carlo error of the procedure using two different rates of the exponential distribution

Exercise 2 Solutions

Define function

```{r}
integrand_function2 <- function(x){
  return(exp(-x^5))
}
```

```{r}
x_values <- seq(0,20,0.1)

plot(x_values, integrand_function2(x_values), type = "l", lwd = 2, xlab = "x", ylab = "y", cex.lab = 1.3, col = "red")
```

Deterministic integration

```{r}
(det_int2 <- integrate(integrand_function2, lower = 0, upper = Inf))
```

0.9181687 with absolute error < 1.3e-07

Monte Carlo

```{r}
mcarlo_iterations2 <- 1000000
exp_rate <- 0.6
exp_sample <- rexp(mcarlo_iterations2, rate = exp_rate)

#arithmetic average

mc_samples <- integrand_function2(exp_sample) / dexp(exp_sample, rate = exp_rate)
mcarlo_int2 <- mean(mc_samples)
```

Monte Carlo error

```{r}
mcarlo_error2 = sd(mc_samples) / sqrt(mcarlo_iterations2)
print(mcarlo_error2)
```

[1] 0.0009452832

Monte Carlo 95% confidence interval

```{r}
c(mcarlo_int2 - 1.96 * mcarlo_error2, mcarlo_int2 + 1.96 * mcarlo_error2)
```

[1] 0.9178095 0.9215150

Compare to the deterministic numerical integration

```{r}
det_int2
```

0.9181687 with absolute error < 1.3e-07

Different rate

```{r}
mcarlo_iterations2 <- 1000000
exp_rate <- 10
exp_sample <- rexp(mcarlo_iterations2, rate = exp_rate)

#arithmetic average
(mcarlo_int3 <- mean(integrand_function2(exp_sample) / dexp(exp_sample, rate = exp_rate)))
(mcarlo_error3 = sd(integrand_function2(exp_sample) / dexp(exp_sample, rate = exp_rate)) / sqrt(mcarlo_iterations2))

mcarlo_error3
```

[1] 0.9107092
[1] 0.01080825
[1] 0.01080825

Importance sampling

Let Z have a standard normal distribution, \(Z \sim N(0,1)\). We want to compute \(P(X \geq 4.5)\). We could do this via a Naive Monte Carlo simulation. (How?)

```{r}
hist(rnorm(10000,0,1), main = "10,000 Random samples from a Standard Normal")
```

We generate a bunch of samples of Z and count how many satisfy \(Z \geq 4\). The problem is that there won’t be very many (perhaps none), unless my number of samples is very large.

Q: What is the error of the MC estimate for \(p = P(Z \geq 4)\)?

Define a threshold value and number of Monte Carlo samples

```{r}
my_const <- 4.5
sim_size <- 1000000
```

To find the true probability of interest:

```{r}
(true_prob <- pnorm(my_const, lower.tail = FALSE))
```

[1] 3.397673e-06

Naive Monte Carlo estimate:

```{r}
naive_samples <- ifelse(rnorm(sim_size) > my_const, yes = 1, no = 0)
(naive_est <- mean(naive_samples))
```

[1] 5e-06

Instead, we can compute this via importance sampling.

We can take the sampling distribution to be a shifted exponential:

\[ q(x) = \begin{cases} & e^{-(x-4.5)} \quad \text{when } x \geq 4.5, \\ & 0 \quad \text{ o.w. } \end{cases} \] (This means that if \(Y \sim Exp(1)\), \(Y+4 \sim q(x)\))

So the probability that we want to compute, i.e. \(p = P(X \geq 4)\)

\[ p = \int 1_{x \geq 4} \frac{p(x)}{q(x)}q(x)dx \]

Importance sampling estimate:

```{r}
shift_exp <- rexp(sim_size, rate = 1) + my_const
# weights = dnorm(x)/q(x)
is_samples <- dnorm(shift_exp) / exp(-(shift_exp - my_const))
(is_est <- mean(is_samples))
```

[1] 3.40056e-06

```{r}
print(true_prob)
print(naive_est)
print(is_est)
```

[1] 3.397673e-06
[1] 5e-06
[1] 3.40056e-06

Computing Monte Carlo variances

```{r}
(naive_var_mc <- var(naive_samples) / sim_size)
(is_var_mc <- var(is_samples) / sim_size)
```

[1] 4.99998e-12
[1] 1.949783e-17

Computing Monte Carlo confidence intervals

```{r}
# Naive (not very useful, the variance estimate might be degenerate):
c(naive_est - 1.96 * sqrt(naive_var_mc), naive_est + 1.96 * sqrt(naive_var_mc))

# Importance Sampling:
c(is_est - 1.96 * sqrt(is_var_mc), is_est + 1.96 * sqrt(is_var_mc))
```

[1] 6.173155e-07 9.382684e-06
[1] 3.391905e-06 3.409215e-06

Check your understanding

Try repeating these sames steps with my_const = 0.9. Can you explain why the Naive MC performs better in this case?

We will get more samples from the normal distribution in that area.

Why do you think a shifted exponential is a good proposal distribution in this case?

It generates samples that are in the area of the original distribution that we want to approximate.

Let’s say that you want to use the same shifted exponential (with shift constant = 4), but you want to find \(P(Z>8)\). How would you have to modify the Importance Sampling sample to target the correct distribution?

We need to only consider the values where \(X>8\) for the I.S. sample (assing 0 to the others).