Angrist, Imbens, and Rubin (1996) demonstrate that one can consistently estimate the local average treatment effect (LATE), also called the complier average treatment effect (CATE), in randomized controlled trials with non-compliance. Compliers are units that comply with the treatment assignment, i.e., they participate if assigned to the treatment but not otherwise. In contrast, always-takers are units that always participate, regardless of their assignment, and never-takers are units that never participate, regardless of their assignment. Finally, defiers do the opposite of what they are assigned to do, i.e., they participate if assigned to the control group and do not participate if assigned to the treatment group. Angrist, Imbens, and Rubin (1996) highlight that randomization alone is not sufficient to consistently estimate the LATE: one also needs to assume, among other things, that there are no defiers.
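To make the identification argument concrete, the following base-R sketch builds a hypothetical population of compliers, always-takers and never-takers (no defiers) and shows that the ratio of the two intention-to-treat effects recovers the effect among compliers. All shares and outcome probabilities in `types` are made-up illustration values, not JTPA estimates, and the outcome depends on assignment only through participation (exclusion restriction).

```r
# Hypothetical population of principal strata (illustration values only).
types <- data.frame(
  type = c("complier", "always_taker", "never_taker"),
  n    = c(600, 100, 300),
  D0   = c(0, 1, 0),          # participation if assigned to control
  D1   = c(1, 1, 0),          # participation if assigned to treatment
  Y0   = c(0.40, 0.50, 0.30), # P(Y = 1) without participation
  Y1   = c(0.50, 0.60, 0.30)  # P(Y = 1) with participation
)
share <- types$n / sum(types$n)

# Expected outcome of each type given its realized participation d
y_if <- function(d) ifelse(d == 1, types$Y1, types$Y0)

# Intention-to-treat effects of assignment on participation and on the outcome
itt_d <- sum(share * (types$D1 - types$D0))
itt_y <- sum(share * (y_if(types$D1) - y_if(types$D0)))

# Wald ratio: equals the complier effect Y1 - Y0 = 0.10 in this toy setup
wald <- itt_y / itt_d
wald
```

Only compliers change their participation with the assignment, so they are the only type that contributes to either intention-to-treat effect.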

To better understand the subpopulation one is making inferences about, i.e., the compliers, Marbach and Hangartner (2020) suggest characterizing the compliers and non-compliers in terms of their covariates when analyzing experiments with non-compliance (or when using instruments in observational studies). They provide an easy-to-use estimator for the case of a fully randomized treatment assignment (see ivdesc). Using CausalQueries, we replicate their approach with data from the JTPA study, which evaluated the effectiveness of an employment and training program in the United States during the 1980s (Bloom et al. 1997). In this field experiment, subjects were randomly assigned to an employment training program. However, there was non-compliance: some subjects assigned to training decided not to participate, while a small number managed to participate despite being assigned to the control group.

For this vignette, we focus on characterizing the proportion of men among the compliers, always-takers and never-takers. We convert the outcome (earnings) into a binary indicator of whether a subject earns above the median, and rename the relevant variables to ease the specification of the model:

  • Z: treatment assignment,
  • D: participation,
  • Y: outcome,
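  • X: covariate (sex).

library(CausalQueries)
library(dplyr)   # select(), rename(), mutate(), summarize()
library(haven)   # read_dta()

set.seed(42)
# Load the JTPA data, keep the four variables, and dichotomize earnings at the median
df <- read_dta("http://fmwww.bc.edu/repec/bocode/j/jtpa.dta") |>
  select(training, assignmt, earnings, sex) |>
  rename(Z = assignmt, D = training, Y = earnings, X = sex) |>
  mutate(Y = as.numeric(Y > median(Y)))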

We define a classical instrumental variable model with an exclusion restriction and the restriction that there are no defiers. While the exclusion restriction follows from the experimental design, the restriction of no defiers remains an (untestable) assumption. We allow sex (X) to confound treatment and outcome, and we allow for additional unobserved confounding between the treatment and the outcome.

model <- make_model(
    "Z -> D -> Y; 
     D <- X -> Y; 
     D <-> Y") |> 
  set_restrictions("(D[Z=1] < D[Z=0])")  # remove defiers, i.e. types with D[Z=1] < D[Z=0]

Relying on the default priors, we update the model with the experimental data:

model <- model |> update_model(df, refresh = 0, iter=4000)

To estimate the effect of participation among compliers on the probability of above-median earnings, we use the query_model function:

model |> 
  query_model(list(
      LATE="Y[D=1]-Y[D=0] :|: D[Z=1] > D[Z=0]"), 
      using = c("posteriors"))
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |label |query         |given           |using      | mean|    sd| cred.low| cred.high|
#> |:-----|:-------------|:---------------|:----------|----:|-----:|--------:|---------:|
#> |LATE  |Y[D=1]-Y[D=0] |D[Z=1] > D[Z=0] |posteriors | 0.04| 0.015|     0.01|      0.07|

Next, we estimate the share of compliers (co_p), always-takers (at_p) and never-takers (nt_p):

model |> 
  query_model(list(
      co_p="D[Z=1] > D[Z=0]", 
      at_p="D[Z=1]==1 & D[Z=0]==1", 
      nt_p="D[Z=1]==0 & D[Z=0]==0"), 
      using = c("posteriors")) 
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |label |query                 |using      |  mean|    sd| cred.low| cred.high|
#> |:-----|:---------------------|:----------|-----:|-----:|--------:|---------:|
#> |co_p  |D[Z=1] > D[Z=0]       |posteriors | 0.626| 0.006|    0.614|     0.637|
#> |at_p  |D[Z=1]==1 & D[Z=0]==1 |posteriors | 0.016| 0.002|    0.012|     0.020|
#> |nt_p  |D[Z=1]==0 & D[Z=0]==0 |posteriors | 0.359| 0.006|    0.348|     0.370|
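Under random assignment and no defiers, these shares can also be read off the observed data directly, which offers a quick plausibility check on the posterior means. The sketch below is a simple moment calculation, not part of CausalQueries; it uses a hypothetical data frame `toy` with made-up counts, and replacing `toy` with `df` would apply the same calculation to the JTPA data.

```r
# Hypothetical assignment (Z) and participation (D) data with made-up counts:
# Z = 1: 65 participate, 35 do not; Z = 0: 2 participate, 98 do not.
toy <- data.frame(
  Z = rep(c(1, 1, 0, 0), times = c(65, 35, 2, 98)),
  D = rep(c(1, 0, 1, 0), times = c(65, 35, 2, 98))
)

# Without defiers, participants in the control group must be always-takers ...
at_p <- mean(toy$D[toy$Z == 0] == 1)
# ... and non-participants in the treatment group must be never-takers.
nt_p <- mean(toy$D[toy$Z == 1] == 0)
# Compliers make up the remainder.
co_p <- 1 - at_p - nt_p

round(c(co_p = co_p, at_p = at_p, nt_p = nt_p), 3)
```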

Observing that most non-compliers are never-takers, we next estimate the share of men among the compliers (co_mu), always-takers (at_mu) and never-takers (nt_mu):

model |> 
  query_model(list(
      co_mu="X==1 :|: D[Z=1] > D[Z=0]", 
      at_mu="X==1 :|: D[Z=1]==1 & D[Z=0]==1",
      nt_mu="X==1 :|: D[Z=1]==0 & D[Z=0]==0"), 
      using = c("posteriors")) 
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |label |query |given                 |using      |  mean|    sd| cred.low| cred.high|
#> |:-----|:-----|:---------------------|:----------|-----:|-----:|--------:|---------:|
#> |co_mu |X==1  |D[Z=1] > D[Z=0]       |posteriors | 0.444| 0.007|    0.431|     0.457|
#> |at_mu |X==1  |D[Z=1]==1 & D[Z=0]==1 |posteriors | 0.364| 0.060|    0.252|     0.485|
#> |nt_mu |X==1  |D[Z=1]==0 & D[Z=0]==0 |posteriors | 0.479| 0.009|    0.461|     0.496|
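The same quantities can be approximated without a model by a moment-based calculation in the spirit of Marbach and Hangartner (2020): the always-taker and never-taker means are observed directly in two cells of the data, and the complier mean follows from the decomposition of the overall covariate mean into the three strata. The sketch below is an illustration under random assignment and no defiers, using a hypothetical data frame `toy` with made-up counts; it is not the ivdesc implementation.

```r
# Hypothetical cell counts for assignment Z, participation D and covariate X
cells <- data.frame(
  Z = c(1, 1, 1, 1, 0, 0, 0, 0),
  D = c(1, 1, 0, 0, 1, 1, 0, 0),
  X = c(1, 0, 1, 0, 1, 0, 1, 0),
  n = c(36, 29, 14, 21, 1, 1, 49, 49)
)
# Expand the counts into one row per unit
toy <- cells[rep(seq_len(nrow(cells)), times = cells$n), c("Z", "D", "X")]

# Strata shares (no defiers, random assignment)
pi_at <- mean(toy$D[toy$Z == 0] == 1)
pi_nt <- mean(toy$D[toy$Z == 1] == 0)
pi_co <- 1 - pi_at - pi_nt

# Covariate means that are directly observed
mu_at  <- mean(toy$X[toy$Z == 0 & toy$D == 1])  # always-takers
mu_nt  <- mean(toy$X[toy$Z == 1 & toy$D == 0])  # never-takers
mu_all <- mean(toy$X)                           # whole sample

# Solve mu_all = pi_co * mu_co + pi_at * mu_at + pi_nt * mu_nt for mu_co
mu_co <- (mu_all - pi_at * mu_at - pi_nt * mu_nt) / pi_co

round(c(co_mu = mu_co, at_mu = mu_at, nt_mu = mu_nt), 3)
```

Because the always-taker cell is small, its mean is estimated imprecisely, which mirrors the wide credible interval for at_mu in the table above.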

The results show that the share of men is larger among the never-takers than among the compliers (0.479 versus 0.444). To obtain the posterior probability that the share of men among the compliers is smaller than among the never-takers, we draw from the posterior distribution via query_distribution and then compute the share of draws for which the complier mean is smaller than the never-taker mean:

model |> 
  query_distribution(list(
      co_mu="X==1 :|: D[Z=1] > D[Z=0]", 
      nt_mu="X==1 :|: D[Z=1]==0 & D[Z=0]==0"), 
      using = c("posteriors")) |> 
    summarize(
        pp_co_nt=mean(co_mu < nt_mu))
#>   pp_co_nt
#> 1   0.9975

With CausalQueries it is easy to move beyond the profiling of compliers and non-compliers in fully randomized experiments as in Marbach and Hangartner (2020). For example, one can extend the model to allow for conditional random assignment (by adding a causal arrow from X to Z in the model definition) or explore how violations of the monotonicity assumption affect the results (by removing set_restrictions()).

References

Angrist, J. D., G. W. Imbens, and D. B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91.434:444–455.

Bloom, H. S., L. L. Orr, S. H. Bell, G. Cave, F. Doolittle, W. Lin, and J. M. Bos. 1997. “The Benefits and Costs of JTPA Title II-A Programs: Key Findings from the National Job Training Partnership Act Study.” The Journal of Human Resources 32.3:549–576.

Marbach, M., and D. Hangartner. 2020. “Profiling Compliers and Noncompliers for Instrumental-Variable Analysis.” Political Analysis 28.3:435–444.