Profiling compliers and non-compliers
Characterizing latent groups in an encouragement design
Moritz Marbach
Angrist, Imbens, and Rubin (1996) demonstrate that one can consistently estimate the local average treatment effect (LATE, also called the complier average treatment effect, CATE) in randomized controlled trials with non-compliance. Compliers are units that comply with the treatment assignment, i.e., they participate if assigned to the treatment group but not otherwise. In contrast, always-takers are units that always participate, regardless of their assignment, and never-takers are units that never participate, regardless of their assignment. Finally, defiers do the opposite of what they are assigned to do, i.e., they participate if assigned to the control group and do not participate if assigned to the treatment group. Angrist, Imbens, and Rubin (1996) highlight that randomization alone is not sufficient to consistently estimate the LATE. One must also assume that there are no defiers (monotonicity), along with further assumptions such as the exclusion restriction.
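The base-R sketch below makes the four latent groups and the AIR result concrete. It classifies units from their two potential treatment statuses, D[Z=1] and D[Z=0]. It then applies the Wald (instrumental-variable) estimator to simulated data without defiers. That estimator is the ratio of the intention-to-treat effects on the outcome and on participation. The helper names (`principal_stratum`, `wald_late`) and all simulation values are hypothetical and only illustrate the idea. The sketch does not reproduce the CausalQueries approach used later in this vignette.

```r
# Hypothetical helper: principal stratum from the potential treatment
# statuses d1 = D[Z=1] and d0 = D[Z=0] (both coded 0/1).
principal_stratum <- function(d1, d0) {
  ifelse(d1 == 1 & d0 == 0, "complier",
    ifelse(d1 == 1 & d0 == 1, "always-taker",
      ifelse(d1 == 0 & d0 == 0, "never-taker", "defier")))
}

# Wald (IV) estimator: ITT effect on Y divided by ITT effect on D.
wald_late <- function(y, d, z) {
  itt_y <- mean(y[z == 1]) - mean(y[z == 0])
  itt_d <- mean(d[z == 1]) - mean(d[z == 0])
  itt_y / itt_d
}

# Simulated population (illustrative values only): 60% compliers,
# 10% always-takers, 30% never-takers, and no defiers.
set.seed(1)
n <- 1e5
latent <- sample(c("co", "at", "nt"), n, replace = TRUE, prob = c(0.6, 0.1, 0.3))
d1 <- as.numeric(latent != "nt")  # D[Z=1]
d0 <- as.numeric(latent == "at")  # D[Z=0]
stratum <- principal_stratum(d1, d0)

z <- rbinom(n, 1, 0.5)            # randomized assignment
d <- ifelse(z == 1, d1, d0)       # realized participation

# Effects differ by stratum (2 for compliers, 5 for always-takers), and
# always-takers also have a higher baseline outcome.
tau <- ifelse(stratum == "complier", 2, ifelse(stratum == "always-taker", 5, 0))
y <- 1 + 3 * (stratum == "always-taker") + tau * d + rnorm(n)

table(stratum)
mean(y[d == 1]) - mean(y[d == 0])  # naive comparison by participation: confounded
wald_late(y, d, z)                 # close to the complier effect of 2
```

The Wald ratio recovers the complier effect because always-takers and never-takers have the same participation in both arms. Their contributions to the intention-to-treat effect on Y therefore cancel in expectation. That cancellation breaks down if defiers are present.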
To better understand the subpopulation one is making inferences about, i.e., the compliers, Marbach and Hangartner (2020) suggest characterizing compliers and non-compliers in terms of their covariates. This applies when analyzing experiments with non-compliance, or when using instruments in observational studies. They provide an easy-to-use estimator for the case of a fully randomized treatment assignment (see ivdesc). Using CausalQueries, we replicate their approach with data from the JTPA study (Bloom et al. 1997). That study evaluated the effectiveness of an employment and training program in the United States during the 1980s. In this field experiment, subjects were randomly assigned to an employment training program. However, there is non-compliance: some subjects assigned to training decided not to participate, while a small number managed to participate despite being assigned to the control group.
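The moment logic that this kind of complier profiling builds on can be sketched in base R. Assume Z is fully randomized and there are no defiers. Then units with Z=0 and D=1 must be always-takers, and units with Z=1 and D=0 must be never-takers. Their covariate means can therefore be read off directly. The complier mean then follows from the overall mean of X. This is a minimal sketch, not the ivdesc implementation. The function `profile_strata` and the simulated data are hypothetical. The output names mirror the `co_p`/`co_mu` labels used later in this vignette.

```r
# Moment-based profiling of compliers, always-takers, and never-takers.
# Assumes: Z fully randomized, no defiers, and that both always-takers
# (Z=0, D=1) and never-takers (Z=1, D=0) occur in the data.
profile_strata <- function(x, d, z) {
  p_at <- mean(d[z == 0])            # P(D=1 | Z=0): share of always-takers
  p_nt <- mean(1 - d[z == 1])        # P(D=0 | Z=1): share of never-takers
  p_co <- 1 - p_at - p_nt            # remaining share: compliers
  mu_at <- mean(x[z == 0 & d == 1])  # only always-takers have Z=0, D=1
  mu_nt <- mean(x[z == 1 & d == 0])  # only never-takers have Z=1, D=0
  # E[X] = p_co * mu_co + p_at * mu_at + p_nt * mu_nt, solved for mu_co
  mu_co <- (mean(x) - p_at * mu_at - p_nt * mu_nt) / p_co
  c(co_p = p_co, at_p = p_at, nt_p = p_nt,
    co_mu = mu_co, at_mu = mu_at, nt_mu = mu_nt)
}

# Simulated example (illustrative numbers, not JTPA data): men (x = 1)
# are more likely to be never-takers than women.
set.seed(2)
n <- 1e5
x <- rbinom(n, 1, 0.5)
p_co_x <- ifelse(x == 1, 0.5, 0.7)   # complier probability by sex
u <- runif(n)
latent <- ifelse(u < p_co_x, "co", ifelse(u < p_co_x + 0.1, "at", "nt"))
d1 <- as.numeric(latent != "nt")     # D[Z=1]
d0 <- as.numeric(latent == "at")     # D[Z=0]
z <- rbinom(n, 1, 0.5)
d <- ifelse(z == 1, d1, d0)

round(profile_strata(x, d, z), 3)
# true values: co_mu = 0.25 / 0.6 (about 0.417), nt_mu = 0.2 / 0.3 (about 0.667)
```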
For this vignette, we focus on characterizing the proportion of men among the compliers, always-takers, and never-takers. We convert the outcome (earnings) into a binary indicator of whether a subject earns above the median. We also rename the relevant variables to ease the specification of the model:
- Z: treatment assignment,
- D: participation,
- Y: outcome,
- X: covariate (sex).
```r
library(haven)          # read_dta()
library(dplyr)          # select(), rename(), mutate(), summarize()
library(CausalQueries)  # make_model(), update_model(), query_model(), ...
set.seed(42)
df <- read_dta("http://fmwww.bc.edu/repec/bocode/j/jtpa.dta") |>
  select(training, assignmt, earnings, sex) |>
  rename(Z = assignmt, D = training, Y = earnings, X = sex) |>
  # binary outcome: 1 if earnings are above the median, 0 otherwise
  mutate(Y = as.numeric(Y > median(Y)))
```

We define a classical instrumental variable model with two restrictions. The exclusion restriction means there is no arrow from Z to Y. The second restriction is that there are no defiers. The exclusion restriction is motivated by the experimental design, but the absence of defiers remains an (untestable) assumption. We allow sex (X) to confound treatment and outcome, and we allow for additional unobserved confounding between treatment and outcome.
```r
model <- make_model(
    "Z -> D -> Y;
     D <- X -> Y;
     D <-> Y") |>
  # rule out defiers, i.e., types for which D[Z=1] < D[Z=0]
  set_restrictions("(D[Z=1] < D[Z=0])")
```

Relying on the default priors, we update the model with the experimental data:
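```r
# refresh = 0 suppresses sampler progress output; iter = 4000 sets the
# number of iterations per chain (both are passed on to the Stan sampler)
model <- model |> update_model(df, refresh = 0, iter = 4000)
```

To estimate the effect of participation on the probability of above-median earnings among compliers, we use the query_model function: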
```r
model |>
  query_model(list(
      LATE="Y[D=1]-Y[D=0] :|: D[Z=1] > D[Z=0]"),
      using = c("posteriors"))
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |label |query         |given           |using      | mean|    sd| cred.low| cred.high|
#> |:-----|:-------------|:---------------|:----------|----:|-----:|--------:|---------:|
#> |LATE  |Y[D=1]-Y[D=0] |D[Z=1] > D[Z=0] |posteriors | 0.04| 0.015|     0.01|      0.07|
```

Next, we estimate the share of compliers (co_p), always-takers (at_p), and never-takers (nt_p):
```r
model |>
  query_model(list(
      co_p="D[Z=1] > D[Z=0]",
      at_p="D[Z=1]==1 & D[Z=0]==1",
      nt_p="D[Z=1]==0 & D[Z=0]==0"),
      using = c("posteriors"))
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |label |query                 |using      |  mean|    sd| cred.low| cred.high|
#> |:-----|:---------------------|:----------|-----:|-----:|--------:|---------:|
#> |co_p  |D[Z=1] > D[Z=0]       |posteriors | 0.626| 0.006|    0.614|     0.637|
#> |at_p  |D[Z=1]==1 & D[Z=0]==1 |posteriors | 0.016| 0.002|    0.012|     0.020|
#> |nt_p  |D[Z=1]==0 & D[Z=0]==0 |posteriors | 0.359| 0.006|    0.348|     0.370|
```

Most non-compliers are never-takers. We next estimate the share of men among the compliers (co_mu), always-takers (at_mu), and never-takers (nt_mu):
```r
model |>
  query_model(list(
      co_mu="X==1 :|: D[Z=1] > D[Z=0]",
      at_mu="X==1 :|: D[Z=1]==1 & D[Z=0]==1",
      nt_mu="X==1 :|: D[Z=1]==0 & D[Z=0]==0"),
      using = c("posteriors"))
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |label |query |given                 |using      |  mean|    sd| cred.low| cred.high|
#> |:-----|:-----|:---------------------|:----------|-----:|-----:|--------:|---------:|
#> |co_mu |X==1  |D[Z=1] > D[Z=0]       |posteriors | 0.444| 0.007|    0.431|     0.457|
#> |at_mu |X==1  |D[Z=1]==1 & D[Z=0]==1 |posteriors | 0.364| 0.060|    0.252|     0.485|
#> |nt_mu |X==1  |D[Z=1]==0 & D[Z=0]==0 |posteriors | 0.479| 0.009|    0.461|     0.496|
```

The results show that the share of men is larger among the never-takers than among the compliers. We can also obtain the posterior probability that the share of men among compliers is smaller than among never-takers. To do so, we draw from the posterior distribution via query_distribution. We then compute the share of draws for which the complier mean is smaller than the never-taker mean:
```r
model |>
  query_distribution(list(
      co_mu="X==1 :|: D[Z=1] > D[Z=0]",
      nt_mu="X==1 :|: D[Z=1]==0 & D[Z=0]==0"),
      using = c("posteriors")) |>
    summarize(
        pp_co_nt=mean(co_mu < nt_mu))
#>   pp_co_nt
#> 1   0.9975
```

With CausalQueries, it is easy to move beyond the profiling of compliers and non-compliers in fully randomized experiments as in Marbach and Hangartner (2020). For example, one can extend the model to allow for conditional random assignment by adding a causal arrow from X to Z in the model definition. One can also explore how violations of the monotonicity assumption affect the results by removing set_restrictions().
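The two extensions mentioned above can be written as model definitions using the same `make_model()` and `set_restrictions()` calls as in this vignette. The object names below are hypothetical. This is only a sketch of the model specifications. Each model would still need to be updated with the data and queried as above.

```r
library(CausalQueries)

# Hypothetical extension 1: conditional random assignment, where the
# covariate X (sex) may also affect the assignment Z.
model_conditional <- make_model(
    "Z -> D -> Y;
     D <- X -> Y;
     X -> Z;
     D <-> Y") |>
  set_restrictions("(D[Z=1] < D[Z=0])")

# Hypothetical extension 2: no monotonicity restriction, so defiers
# (D[Z=1] < D[Z=0]) are not ruled out a priori.
model_no_monotonicity <- make_model(
    "Z -> D -> Y;
     D <- X -> Y;
     D <-> Y")
```

After updating with update_model(), the queries from above can be reused unchanged. In the unrestricted model, the query "D[Z=1] < D[Z=0]" would estimate the share of defiers.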
References
Angrist, J. D., G. W. Imbens, and D. B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91 (434): 444–455.
Bloom, H. S., L. L. Orr, S. H. Bell, G. Cave, F. Doolittle, W. Lin, and J. M. Bos. 1997. “The Benefits and Costs of JTPA Title II-A Programs: Key Findings from the National Job Training Partnership Act Study.” The Journal of Human Resources 32 (3): 549–576.
Marbach, M., and D. Hangartner. 2020. “Profiling Compliers and Noncompliers for Instrumental-Variable Analysis.” Political Analysis 28 (3): 435–444.