Bayesian GLM

Based on Dr. Mine Dogucu’s Notes

Babak Shahbaba

The notes for this lecture are derived from the Bayes Rules! book
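A minimal setup sketch (assumed, not shown in the original notes): the bikes data and the plotting helpers used below ship with the bayesrules package.

# Load the packages used in this section; bikes comes with bayesrules
library(bayesrules)   # bikes data, plot_normal(), plot_gamma()
library(rstanarm)     # stan_glm()
library(bayesplot)    # neff_ratio(), rhat(), mcmc_trace(), ...
library(tidyverse)

data(bikes)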

glimpse(bikes)
Rows: 500
Columns: 13
$ date        <date> 2011-01-01, 2011-01-03, 2011-01-04, 2011-01-05, 2011-01-0…
$ season      <fct> winter, winter, winter, winter, winter, winter, winter, wi…
$ year        <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
$ month       <fct> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan…
$ day_of_week <fct> Sat, Mon, Tue, Wed, Fri, Sat, Mon, Tue, Wed, Thu, Fri, Sat…
$ weekend     <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALS…
$ holiday     <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, yes, n…
$ temp_actual <dbl> 57.39952, 46.49166, 46.76000, 48.74943, 46.50332, 44.17700…
$ temp_feel   <dbl> 64.72625, 49.04645, 51.09098, 52.63430, 50.79551, 46.60286…
$ humidity    <dbl> 80.5833, 43.7273, 59.0435, 43.6957, 49.8696, 53.5833, 48.2…
$ windspeed   <dbl> 10.749882, 16.636703, 10.739832, 12.522300, 11.304642, 17.…
$ weather_cat <fct> categ2, categ1, categ1, categ1, categ2, categ2, categ1, ca…
$ rides       <int> 654, 1229, 1454, 1518, 1362, 891, 1280, 1220, 1137, 1368, …

Rides

\(Y_i | \mu, \sigma \stackrel{ind}{\sim} N(\mu, \sigma^2)\)
\(\mu \sim N(\theta, \tau^2)\)
\(\sigma \sim \text{some prior model}\)

Regression Model

\(Y_i\): the number of rides on day \(i\)
\(X_i\): the temperature (in degrees Fahrenheit) on day \(i\)

\(\mu_i = \beta_0 + \beta_1X_i\)

\(\beta_0:\) the typical ridership on days when the temperature is 0 degrees (\(X_i = 0\)). Since 0°F lies far outside the range of observed temperatures, the intercept is not meaningfully interpretable on its own here.

\(\beta_1:\) the typical change in ridership for every one degree increase in temperature. For example, if \(\beta_1 = 82\), we would expect roughly 820 additional rides on a day that is 10 degrees warmer.

Prior Models

\(\text{likelihood model:} \; \; \; Y_i | \beta_0, \beta_1, \sigma \;\;\;\stackrel{ind}{\sim} N\left(\mu_i, \sigma^2\right)\text{ with } \mu_i = \beta_0 + \beta_1X_i\)

\(\text{prior models:}\)

\(\beta_0\sim N(m_0, s_0^2 )\)
\(\beta_1\sim N(m_1, s_1^2 )\)
\(\sigma \sim \text{Exp}(l)\)

Recall:

\(\text{Exp}(l) = \text{Gamma}(1, l)\)
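This identity is easy to verify numerically; the exponential density with rate \(l\) coincides with the Gamma(1, \(l\)) density:

# dexp(x, rate = l) and dgamma(x, shape = 1, rate = l) agree everywhere
l <- 0.0008
x <- seq(0, 5000, length.out = 5)
all.equal(dexp(x, rate = l), dgamma(x, shape = 1, rate = l))
#> [1] TRUE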

# Prior for the centered intercept, beta_0c ~ N(5000, 1000^2)
plot_normal(mean = 5000, sd = 1000) + 
  labs(x = "beta_0c", y = "pdf")

# Prior for the slope, beta_1 ~ N(100, 40^2)
plot_normal(mean = 100, sd = 40) + 
  labs(x = "beta_1", y = "pdf")

# Prior for sigma ~ Exp(0.0008), i.e., Gamma(1, 0.0008)
plot_gamma(shape = 1, rate = 0.0008) + 
  labs(x = "sigma", y = "pdf")

\[\begin{split} Y_i | \beta_0, \beta_1, \sigma & \stackrel{ind}{\sim} N\left(\mu_i, \sigma^2\right) \;\; \text{ with } \;\; \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0c} & \sim N\left(5000, 1000^2 \right) \\ \beta_1 & \sim N\left(100, 40^2 \right) \\ \sigma & \sim \text{Exp}(0.0008) .\\ \end{split}\]
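Before sampling, it can help to check whether these priors produce plausible trends. A rough prior-simulation sketch (not part of the original notes): draw 50 \((\beta_{0c}, \beta_1)\) pairs from the priors and overlay the implied lines on the data.

# Draw 50 trend lines from the priors; beta_0c is the intercept when
# temp_feel is mean-centered, so un-center it before plotting
set.seed(84735)
prior_lines <- tibble(
  beta_0c = rnorm(50, mean = 5000, sd = 1000),
  beta_1  = rnorm(50, mean = 100, sd = 40))

ggplot(bikes, aes(x = temp_feel, y = rides)) +
  geom_point(size = 0.5) +
  geom_abline(data = prior_lines, alpha = 0.3,
              aes(intercept = beta_0c - beta_1 * mean(bikes$temp_feel),
                  slope = beta_1))

rstanarm can also simulate from the prior directly by adding prior_PD = TRUE to the stan_glm() call below.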

Posterior sampling via rstanarm

bike_model <- stan_glm(rides ~ temp_feel, data = bikes,
                       family = gaussian,
                       prior_intercept = normal(5000, 1000),
                       prior = normal(100, 40), 
                       prior_aux = exponential(0.0008),
                       chains = 4, iter = 5000*2, seed = 84735,
                       refresh = FALSE) 

Setting refresh = FALSE suppresses the printing of chain progress during sampling, which is especially useful in R Markdown documents.

# Effective sample size ratio and Rhat
neff_ratio(bike_model)
(Intercept)   temp_feel       sigma 
    1.03285     1.03505     0.96585 
rhat(bike_model)
(Intercept)   temp_feel       sigma 
  0.9998873   0.9999032   0.9999642 

The effective sample size ratios are slightly above 1 and the R-hat values are very close to 1, indicating that the chains are stable, mixing quickly, and behaving much like an independent sample.
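The same diagnostics can be inspected graphically; a quick sketch using the corresponding bayesplot helpers:

# Visual checks: R-hat values and effective-sample-size ratios
mcmc_rhat(rhat(bike_model))
mcmc_neff(neff_ratio(bike_model))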

mcmc_trace(bike_model, size = 0.1)

mcmc_dens_overlay(bike_model)

# STEP 1: DEFINE the model
stan_bike_model <- "
  data {
    int<lower = 0> n;
    vector[n] Y;
    vector[n] X;
  }
  parameters {
    real beta0;
    real beta1;
    real<lower = 0> sigma;
  }
  model {
    Y ~ normal(beta0 + beta1 * X, sigma);
    // beta0 here is the raw (uncentered) intercept, so its prior center
    // differs from the N(5000, 1000^2) prior on the centered intercept:
    // roughly 5000 - 100 * mean(X), which is about -2000 for these data
    beta0 ~ normal(-2000, 1000);
    beta1 ~ normal(100, 40);
    sigma ~ exponential(0.0008);
  }
"

# STEP 2: SIMULATE the posterior (stan() is from the rstan package)
library(rstan)
stan_bike_sim <- 
  stan(model_code = stan_bike_model, 
       data = list(n = nrow(bikes), Y = bikes$rides, X = bikes$temp_feel), 
       chains = 4, iter = 5000*2, seed = 84735, refresh = FALSE)
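The resulting stanfit object can be summarized directly, and its posterior summaries should closely match the rstanarm fit above:

# Posterior summaries from the raw rstan fit
print(stan_bike_sim, pars = c("beta0", "beta1", "sigma"),
      probs = c(0.1, 0.5, 0.9))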

broom.mixed::tidy(bike_model, effects = c("fixed", "aux"),
     conf.int = TRUE, conf.level = 0.80)
# A tibble: 4 × 5
  term        estimate std.error conf.low conf.high
  <chr>          <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)  -2195.     354.    -2646.    -1736. 
2 temp_feel       82.2      5.07     75.7      88.7
3 sigma         1282.      40.6    1232.     1336. 
4 mean_PPD      3487.      81.5    3382.     3593. 

Referring to the tidy() summary, the posterior median relationship is

\[\begin{equation} -2195.31 + 82.22 X \end{equation}\]

# Store the 4 chains for each parameter in 1 data frame
bike_model_df <- as.data.frame(bike_model)
# Check it out
nrow(bike_model_df)
[1] 20000
head(bike_model_df, 3)
  (Intercept) temp_feel    sigma
1   -2125.091  80.92653 1230.511
2   -2087.977  80.60855 1200.141
3   -2265.712  83.64980 1365.039
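With all 20,000 draws in one data frame, the tidy() estimates above can be reproduced (approximately) by hand; a sketch:

# Posterior medians and 80% credible intervals, computed directly
bike_model_df %>% 
  pivot_longer(everything(), names_to = "term") %>% 
  group_by(term) %>% 
  summarize(estimate  = median(value),
            conf.low  = quantile(value, 0.10),
            conf.high = quantile(value, 0.90))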

# 50 simulated model lines
# (in newer tidybayes, add_fitted_draws(n = ...) is superseded by
#  add_epred_draws(ndraws = ...), which returns an .epred column)
bikes %>%
  tidybayes::add_fitted_draws(bike_model, n = 50) %>%
  ggplot(aes(x = temp_feel, y = rides)) +
    geom_line(aes(y = .value, group = .draw), alpha = 0.15) + 
    geom_point(data = bikes, size = 0.05)

# Tabulate the beta_1 values that exceed 0 (tabyl() is from janitor)
library(janitor)
bike_model_df %>% 
  mutate(exceeds_0 = temp_feel > 0) %>% 
  tabyl(exceeds_0)
 exceeds_0     n percent
      TRUE 20000       1
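Equivalently, the posterior probability that \(\beta_1 > 0\) can be approximated in one line by the share of posterior draws above 0; here every one of the 20,000 draws is positive, so the approximation is 1.

# Approximate P(beta_1 > 0 | data)
mean(bike_model_df$temp_feel > 0)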

Binary outcome

\[Y = \begin{cases} \text{Yes rain} \\ \text{No rain} \end{cases}\]

\(X_1 = \text{today's humidity at 9am (percent)}\)

library(bayesrules)
library(rstanarm)
library(bayesplot)
library(tidyverse)

data(weather_perth)
weather <- weather_perth %>%
  select(day_of_year, raintomorrow, humidity9am,
         humidity3pm, raintoday)

glimpse(weather)
Rows: 1,000
Columns: 5
$ day_of_year  <dbl> 45, 11, 261, 347, 357, 254, 364, 293, 304, 231, 313, 208,…
$ raintomorrow <fct> No, No, Yes, No, No, No, No, No, Yes, Yes, No, No, No, No…
$ humidity9am  <int> 55, 43, 62, 53, 65, 84, 48, 51, 47, 90, 53, 87, 83, 96, 8…
$ humidity3pm  <int> 44, 16, 85, 46, 51, 59, 43, 34, 47, 57, 42, 46, 55, 56, 5…
$ raintoday    <fct> No, No, No, No, No, Yes, No, No, Yes, Yes, No, No, Yes, Y…

Bayesian logistic regression model

\(\text{likelihood model:} \; \; \; Y_i | \beta_0, \beta_1 \;\;\;\stackrel{ind}{\sim} \text{Bern}(\pi_i)\)

\(\log(\frac{\pi_i}{1-\pi_i}) = \beta_0 + \beta_1X_{i1}\)

\(\text{prior models:}\)

\(\beta_0\sim N(m_0, s_0^2 )\)
\(\beta_1\sim N(m_1, s_1^2 )\)

# Convert raintomorrow to 1s and 0s
weather <- weather %>% 
  mutate(raintomorrow = as.numeric(raintomorrow == "Yes"))

# Simulate the model
rain_model_1 <- stan_glm(raintomorrow ~ humidity9am,
                         data = weather,
                         family = binomial,
                         chains = 4, 
                         iter = 5000, 
                         warmup = 1000,
                         seed = 84735)

prior_summary(rain_model_1)
Priors for model 'rain_model_1' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 0.14)
------
See help('prior_summary.stanreg') for more details
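Note the "Adjusted prior" line: since we specified no priors, rstanarm used its weakly informative defaults and autoscaled the coefficient prior to the scale of the predictor. As a rough check (assuming the usual autoscaling rule of dividing the default scale by the predictor's standard deviation):

# The adjusted scale should be roughly the default scale of 2.5
# divided by the standard deviation of humidity9am
2.5 / sd(weather$humidity9am)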

mcmc_dens_overlay(rain_model_1)

rain_model_1_df <- as.data.frame(rain_model_1)
head(rain_model_1_df, 3)
  (Intercept) humidity9am
1   -4.260952  0.04413347
2   -4.125510  0.04307958
3   -4.476874  0.04587014

\[\pi = \frac{e^{\beta_0 + \beta_1 X_1}}{1 + e^{\beta_0 + \beta_1 X_1}}\]
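For example, plugging the first posterior draw above into this formula for a morning with 90% humidity gives

\[\pi \approx \frac{e^{-4.26 + 0.0441 \cdot 90}}{1 + e^{-4.26 + 0.0441 \cdot 90}} = \frac{e^{-0.29}}{1 + e^{-0.29}} \approx 0.43,\]

i.e., that single parameter set implies roughly a 43% chance of rain tomorrow.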

# The first 50 posterior parameter sets
first_50 <- head(rain_model_1_df, 50)

# Function that calculates the model trend on the probability scale
prob_trend <- function(beta0, beta1, x){
  exp(beta0 + beta1 * x) / (1 + exp(beta0 + beta1 * x))
}

# Plot the first 50 posterior model trends
ggplot(weather, aes(x = humidity9am, y = raintomorrow)) + 
  mapply(function(beta0, beta1) {
    stat_function(fun = prob_trend,
      args = list(beta0 = beta0, beta1 = beta1), size = 0.1)
    },
    beta0 = first_50$`(Intercept)`, beta1 = first_50$humidity9am
  ) + 
  labs(y = "probability of rain")
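Finally, the fitted model can generate posterior predictions of the binary outcome itself; a short sketch using rstanarm's posterior_predict() for a hypothetical 90%-humidity morning:

# Posterior predictive draws (0/1) of raintomorrow when humidity9am = 90
set.seed(84735)
rain_pred <- posterior_predict(rain_model_1,
                               newdata = data.frame(humidity9am = 90))

# Share of draws predicting rain: the posterior predictive chance of rain
colMeans(rain_pred)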