This memo demonstrates the feasibility of using a model of treatment effect heterogeneity in one experiment to predict treatment effect heterogeneity in a second experiment. I use a Get-Out-The-Vote (GOTV) experiment from 2007 (reported in Gerber, Green, and Larimer (2010)) to predict treatment effects in a second GOTV experiment from 2009 (reported in Sinclair, McConnell, and Green (2012)). I show that subjects in the 2009 experiment who were predicted on the basis of the 2007 results to have stronger treatment effects did in fact exhibit stronger responses to treatment.

Challenges facing this exercise

  1. Finding common covariates. In order to apply a model developed on one dataset to a second, the model’s inputs must be present in both. A related challenge is that a covariate can change its meaning across datasets. For example, the two experiments record birth information differently, so instead of using the raw fields I construct the same “age at the time of treatment” covariate in each.

  2. Developing a model of treatment effect heterogeneity. The simplest such model would interact treatment with every covariate, allowing the effect of treatment to vary linearly with each one (sketched just below). I instead use Bayesian Additive Regression Trees (BART), which allow for more flexible interactions. BART has been advocated by some social scientists (Hill 2011; Green and Kern 2012) as a method for exploring heterogeneous effects in a principled fashion. One major advantage of BART over many other machine learning methods is that it is not overly sensitive to tuning parameters and works “out of the box.”
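For reference, here is a minimal sketch of that linear-interaction baseline, written with the recoded covariate names constructed in the Procedure section below (illustrative only; it is not part of the analysis):
# Fully interacted linear baseline: the effect of treatment ("self") is allowed
# to shift linearly with each covariate
linear_fit <- lm(voted_dv ~ self * (female + age_at_treatment +
                                      two_person_household + vote_propensity),
                 data = ETOV2007_subset)
summary(linear_fit)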

Procedure

  1. Load data and packages
load("ETOV2007_subset.rdata") # 2007 Experiment
load("SMG2009_subset.rdata")  # 2009 Experiment

library(dplyr) # for data manipulation tools like transmute() and the %>% pipe
library(dbarts) # for actual BART estimation
library(ggplot2) # for nice looking graphs
library(stargazer) # for nice looking tables
  2. Clean both datasets, taking care to give the covariates the same names in both (a quick comparability check follows the recoding code).
# Recode 2007
ETOV2007_subset <- ETOV2007_subset %>%
  transmute(
    self = as.numeric(treatmen %in% c("shown 05 vote","shown 06 vote")), # received the "Self"-style mailer
    female = as.numeric(mf == "F"),
    age_at_treatment = as.numeric(as.Date("2007-11-01") - as.Date(paste0("19", yob1), "%Y%m%d")) / 365.25, # age in years at the November 2007 treatment
    two_person_household = as.numeric(twoperso == 1),
    vote_propensity = (g2004 + p2004 + g2006 + p2006)/4, # share of the four 2004-2006 elections in which the subject voted
    voted_dv = og2007 # outcome: turnout in the 2007 election
  )

# Recode 2009
SMG2009_subset <- SMG2009_subset %>%
  transmute(
    self = Tind, # treatment indicator: received the "Self"-style mailer
    female = as.numeric(Sex == "F"),
    age_at_treatment = as.numeric(as.Date("2009-04-01") - as.Date(birthdate)) / 365.25, # age in years at the April 2009 treatment
    two_person_household = as.numeric(hhsize == 2),
    vote_propensity = (voted_e2004g + voted_e2004p + voted_e2006g + voted_e2006p)/4, # share of the four 2004-2006 elections in which the subject voted
    voted_dv = voted # outcome: turnout in the 2009 election
  )
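As an optional sanity check (not part of the original procedure), we can confirm that the recoded covariates carry the same names and comparable ranges in the two datasets:
common_vars <- c("self", "female", "age_at_treatment",
                 "two_person_household", "vote_propensity", "voted_dv")
# Compare the marginal distributions of the common variables across experiments
summary(ETOV2007_subset[, common_vars])
summary(SMG2009_subset[, common_vars])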
  3. Run BART. Now that we have comparable datasets, we fit the model on the 2007 dataset (the “training sample”) and generate predictions for the 2009 dataset (the “test sample”). The BART model uses the following 5 predictors: treatment, gender, age, household size, and vote propensity. The treatment (the “Self” mailer) is common to both experiments. Age is the voter’s age at the time of treatment. Household size is an indicator for whether there are 2 voters in the household. Vote propensity is the share of the four elections held in 2004 and 2006 (primaries and generals) in which the voter participated, so it ranges from 0 to 1. To recover each subject’s predicted treatment effect, the 2009 data are stacked on themselves in the test set: the first copy has treatment set to 1 and the second has treatment set to 0, so BART returns both counterfactual predictions for every subject.
# Training covariates: everything except the outcome
training_set <- select(ETOV2007_subset, -voted_dv)

n_test <- nrow(SMG2009_subset)
# Stack the 2009 data on themselves so each subject appears twice
test_set <- SMG2009_subset[rep(1:n_test, 2),]
test_set <- select(test_set, -voted_dv)
# For the first n rows, we get the "treated" prediction
test_set[1:n_test, "self"] <-1
# for the second n rows, we get the "control" prediction
test_set[(n_test+1):(2*n_test), "self"] <-0

# Run BART estimation
# With the full training set, this may take a while
out_bart <- bart(x.train = training_set, 
                 y.train = ETOV2007_subset$voted_dv, 
                 x.test = test_set)
  4. Summarize predictions. For a binary outcome, BART produces predictions on the probit scale, so we transform them back into probabilities with pnorm().
# Add predictions to the 2009 dataset
SMG2009_withpredictions <- within(SMG2009_subset, {
  ate_hats <- apply(pnorm(out_bart$yhat.test[,1:n_test]) - 
                      pnorm(out_bart$yhat.test[,((n_test+1):(2*n_test))]),
                    2, mean)
  ate_hats_centered <- ate_hats - mean(ate_hats)
})
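Before relating the predictions to the 2009 outcomes, it may be worth glancing at their distribution (an optional check, not part of the original analysis):
# Distribution of the predicted individual-level treatment effects
summary(SMG2009_withpredictions$ate_hats)
SMG2009_withpredictions %>%
  ggplot(aes(x = ate_hats)) +
  geom_histogram() + theme_bw() +
  xlab("Predicted Effect") + ylab("Count")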

First, let’s run a linear regression of the 2009 outcome on treatment interacted with the (centered) predictions. The regression shows that subjects who were predicted to have larger treatment effects do in fact show a stronger response to treatment.

fit <- lm(voted_dv ~ self * ate_hats_centered, data=SMG2009_withpredictions)
stargazer(fit, style="apsr", omit.stat = c("f", "adj.rsq", "ser"),
          title="2009 Heterogeneity predicted by 2007 Heterogeneity",header = FALSE)
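As a complementary check (a rough sketch, not part of the original analysis), we can split the 2009 sample into terciles of the predicted effect and compute the treatment-control difference in turnout within each tercile; if the predictions carry signal, the gap should be largest in the top tercile:
# Treatment-control difference in turnout within terciles of the predicted effect
SMG2009_withpredictions %>%
  mutate(pred_tercile = ntile(ate_hats, 3)) %>% # 1 = smallest predicted effects
  group_by(pred_tercile) %>%
  summarise(diff_in_means = mean(voted_dv[self == 1]) - mean(voted_dv[self == 0]))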

Next, let’s plot the (smoothed) treatment and control responses as a function of the predicted effects. The 2007 model predicted small effects for both high- and low-propensity voters and larger effects for medium-propensity voters. Consistent with those predictions, the treatment and control curves separate most clearly on the right-hand side of the graph, where the predicted effects are largest.

SMG2009_withpredictions %>% 
  ggplot(aes(x = ate_hats, y = voted_dv, group = self, color = factor(self))) + 
  stat_smooth() + theme_bw() +
  ylab("Proportion Voting") +
  xlab("Predicted Effects") +
  labs(color = "Treated")

Summary

The heterogeneous effects of a treatment can be difficult to model, which is why a flexible algorithm like BART is so useful. This exercise shows that heterogeneous treatment effects can be predicted out of sample, even though they are often complex functions of the covariates.

References

Gerber, Alan S., Donald P. Green, and Christopher W. Larimer. 2010. “An Experiment Testing the Relative Effectiveness of Encouraging Voter Participation by Inducing Feelings of Pride or Shame.” Political Behavior 32 (3): 409–22. doi:10.1007/s11109-010-9110-4.

Green, Donald P., and Holger L. Kern. 2012. “Modeling Heterogeneous Treatment Effects in Survey Experiments with Bayesian Additive Regression Trees.” Public Opinion Quarterly 76 (3): 491–511. doi:10.1093/poq/nfs036.

Hill, Jennifer L. 2011. “Bayesian Nonparametric Modeling for Causal Inference.” Journal of Computational and Graphical Statistics 20 (1): 217–40. doi:10.1198/jcgs.2010.08162.

Sinclair, Betsy, Margaret McConnell, and Donald P. Green. 2012. “Detecting Spillover Effects: Design and Analysis of Multilevel Experiments.” American Journal of Political Science 56 (4): 1055–69. doi:10.1111/j.1540-5907.2012.00592.x.