library(CausalQueries)
options(mc.cores = parallel::detectCores())
set.seed(1)

Here we show properties of some canonical causal models, exploring in particular estimates of non-identified queries. The examples include cases where inferences are reliable but also where identification failures result in unreliable posteriors.

Simple experiment

model <- make_model("X -> Y")
model |> plot()

This model could be justified by a randomized controlled trial. With a lot of data you can get tight estimates for the effect of X on Y but not for whether a given outcome on Y is due to X. That is, the “effects of causes” estimand is identified, but the “causes of effects” estimand is not.

In the illustration below we generate data from a parameterized model and then try to recover the average treatment effect (ATE) and the “probability of causation” (POC) for X=1, Y=1 cases. The ATE has a tight credibility interval; the POC does not, although it is still considerably tighter than the prior.

model <- model |> 
  set_parameters(nodal_type = c("10", "01"), parameters = c(.1, .6)) 

data <- model |> 
  make_data(n = 5000)

model |>
  update_model(data, refresh = 0, iter = 10000) |>
  
  query_model(queries = 
                list(ATE = "Y[X=1] - Y[X=0]",
                     POC = "Y[X=1] - Y[X=0] :|: X==1 & Y==1"),
              using = c("parameters", "priors", "posteriors")) |>
  
  plot() 
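
As a check on these numbers (an aside, not in the original vignette): assuming the two unset nodal types ("00" and "11") split the remaining probability mass evenly (0.15 each, as proportional renormalization of the uniform defaults implies), the true values can be computed by hand:

# assumed shares of the four nodal types for Y
lambda <- c(`00` = .15, `10` = .10, `01` = .60, `11` = .15)

lambda[["01"]] - lambda[["10"]]                     # true ATE: 0.5
lambda[["01"]] / (lambda[["01"]] + lambda[["11"]])  # true POC: 0.8

The wide POC posterior is no accident: classical bounds on probabilities of causation imply that even infinite data would pin the POC down only to the interval [2/3, 1] here.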

Confounded

model <- make_model("X -> Y; X <-> Y")

model |> plot(x_coord = 0:1, y_coord = 1:0)

This is the appropriate model if X is not randomized and it is possible that unknown factors affect both the assignment of X and the outcome Y.

In the illustration below we use data drawn from a model in which X is in fact as-if randomized (though we do not know this) and there is a true positive treatment effect. We see that we have lost identification of the ATE and that our uncertainty about the POC is much greater.

model <- model |> 
  set_parameters(nodal_type = "01", parameters = .7) 

data <- model |> 
  make_data(n = 5000)

model |>
  update_model(data, refresh = 0, iter = 10000) |>
  
  query_model(queries = 
                list(ATE = "Y[X=1] - Y[X=0]",
                     POC = "Y[X=1] - Y[X=0] :|: X==1 & Y==1"),
              using = c("parameters", "priors", "posteriors")) |>
  
  plot() 
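
One way to see what has been lost (a hedged aside, not in the original vignette): without randomization, the data alone restrict the ATE only to Manski-style no-assumptions bounds, an interval that always has width 1:

p1  <- mean(data$X == 1)          # share with X = 1
py1 <- mean(data$Y[data$X == 1])  # P(Y = 1 | X = 1)
py0 <- mean(data$Y[data$X == 0])  # P(Y = 1 | X = 0)

# worst-case bounds on E[Y(1)] - E[Y(0)]
c(lower = py1 * p1 - (py0 * (1 - p1) + p1),
  upper = (py1 * p1 + (1 - p1)) - py0 * (1 - p1))

Any tightening beyond these bounds in the posterior reflects what the priors over nodal types contribute.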

Chain model

model <- make_model("Z -> X -> Y")
model |> plot() 

This is a chain model. It is hard to justify from experimentation, since randomization of Z does not guarantee that third features do not influence both X and Y, or that Z operates on Y only through X.

Even so, it is a good model for illustrating the limits of learning about effects from observation of the values of mediators.

Below we imagine that data is produced by a model in which Z has a 0.8 average effect on X and X has a 0.8 average effect on Y. We see that positive evidence on the causal chain (on X) has a modest effect on our belief that Z=1 caused Y=1. Negative evidence has a much stronger effect, albeit with considerable posterior uncertainty.

model <- model |> 
  set_parameters(param_names = c("X.10", "X.01", "Y.10", "Y.01"), 
                 parameters = c(0.05, .85, .05, .85)) 

data <- model |> 
  make_data(n = 5000)

model |>
  update_model(data, refresh = 0, iter = 10000) |>
  
  query_model(query = list("Y[Z=1] - Y[Z=0]"),
              given = c("", "Z==1 & Y==1", "Z==1 & Y==1 & X==0", "Z==1 & Y==1 & X==1"),
              using = c("parameters", "posteriors")) |>
  
  plot() 
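
As a quick check on the parameterization (an aside, not in the original vignette): because nodal types for X and Y are drawn independently in this model, the true ATE of Z on Y is the product of the two stage effects:

(.85 - .05) * (.85 - .05)  # implied true ATE of Z on Y: 0.64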

IV model with exclusion restriction

# Model with exclusion restriction and monotonicity
model <- make_model("Z -> X -> Y; X <-> Y") |>
  set_restrictions(decreasing('Z', 'X'))  # monotonicity restriction


# Adding parameters for data generation
model <- model |>
  set_parameters(param_names = c("Y.10_X.01", "Y.01_X.01"), 
                 parameters = c(0, .75))     # complier effect

model |> plot(x_coord = c(0,1,0), y_coord = 2:0)
IV model with exclusion restriction satisfied

This is the classic “instrumental variables” model. This model is sometimes justified by randomization of Z under the assumption that Z operates on Y only through X (the exclusion restriction). Researchers also often assume that Z has a monotonic effect on X, which we have also imposed here.

Below we draw data from this model, update, and query, focusing our attention on the effects of X on Y for the population and also specifically for units for whom X responds positively to Z (“compliers”). In addition we ask about the probability of causation, both for units with X=1, Y=1 (PoC) and for compliers with X=1, Y=1 (PoCC). In the model that generated these data an effect is present for compliers only, though researchers do not know (or assume) this.


data <- make_data(model, 4000)

model |>
  
  update_model(data, refresh = 1000, iter = 10000) |>
  
  query_model(
    query = list(
      ATE = "Y[X=1] - Y[X=0]",
      LATE= "Y[X=1] - Y[X=0] :|: X[Z=1] > X[Z=0]", # ATE for compliers
      PoC = "Y[X=1] - Y[X=0] :|: X==1 & Y==1",    # ATE for units with X=Y-1
      PoCC = "Y[X=1] - Y[X=0] :|: X==1 & Y==1 & (X[Z=1] > X[Z=0])"),    # ATE for units with X=Y-1
    using = c("posteriors", "parameters"),
    cred = 99) |>
  
  plot()
IV inferences I

Note the relatively tight posterior for the complier average effect (LATE) and the wide posteriors for the average effect and for the probability of causation.
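
The tight LATE posterior is in line with classical instrumental variables theory: under exclusion and monotonicity the complier effect is identified by the Wald ratio. As a benchmark (an aside, not in the original vignette), we can compute it directly from the simulated data:

# Wald / IV estimate: (E[Y|Z=1] - E[Y|Z=0]) / (E[X|Z=1] - E[X|Z=0])
with(data, (mean(Y[Z == 1]) - mean(Y[Z == 0])) /
           (mean(X[Z == 1]) - mean(X[Z == 0])))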

IV model without exclusion restriction

Here is a model in which neither the exclusion restriction nor monotonicity holds.

model <- make_model("Z -> X -> Y <- Z; X <-> Y") 
model |> plot(x_coord = c(0,1,0), y_coord = 2:0)
Exclusion restriction not satisfied

Note that, for comparability, we use the same data as before, generated by a model in which the exclusion restriction in fact holds (though researchers do not know this).


model |>
  
  update_model(data, refresh = 1000, iter = 20000) |>
  
  query_model(
    query = list(
      ATE = "Y[X=1] - Y[X=0]",
      LATE= "Y[X=1] - Y[X=0] :|: X[Z=1] > X[Z=0]", # ATE for compliers
      PoC = "Y[X=1] - Y[X=0] :|: X==1 & Y==1",    # ATE for units with X=Y-1
      PoCC = "Y[X=1] - Y[X=0] :|: X==1 & Y==1 & (X[Z=1] > X[Z=0])"), # ATE for compliers with X=Y-1
    using = c("posteriors"),
    cred = 99
) |>
  
  plot()
IV inferences II

We updated here using a relatively large number of iterations to ensure convergence. Note that the failure of the two key assumptions is reflected in larger credibility intervals and in underestimates for all quantities.

Mediation model with sequential ignorability

model <- make_model("Z -> X -> Y <- Z")

model |> plot(x_coord = c(0,1,0), y_coord = 2:0)

This is a typical mediation-type problem, in which you might want to understand the effects of Z on Y that operate directly and those that operate via X.

We have assumed here that there are no third features that cause both X and Y. This is a strong assumption that is a key part of “sequential ignorability” (see Forastiere et al. (2018) for an extensive treatment of the relationship between sequential ignorability and the “strong principal ignorability” condition we impose here).

Here one might ask about different types of direct or indirect effects of Z on Y, as well as the average effects of Z on X and Y and of X on Y.

In this example the data is drawn from a world in which the most common type has Y=1 if and only if both Z=1 and X=1, but in which Z exerts a negative effect on X; there are both positive direct effects and negative indirect effects.


model <-  
  make_model("Z -> X -> Y <- Z") |> 
  # X: rule out "always 0" types; half of units have X decreasing in Z
  set_parameters(nodal_type = c("00", "10"), parameters = c(0, .5)) |> 
  # Y: half of units have Y = 1 if and only if both X = 1 and Z = 1
  set_parameters(nodal_type = "0001", parameters = .5) 

data <- model |> make_data(n = 2000)

queries <- list(
  `ATE Z -> X` = "X[Z=1] - X[Z=0]",
  `ATE Z -> Y` = "Y[Z=1] - Y[Z=0]",
  `ATE X -> Y` = "Y[X=1] - Y[X=0]",
  `Direct (Z=1)` = "Y[Z = 1, X = X[Z=1]] - Y[Z = 0, X = X[Z=1]]",
  `Direct (Z=0)` = "Y[Z = 1, X = X[Z=0]] - Y[Z = 0, X = X[Z=0]]",
  `Indirect (Z=1)` = "Y[Z = 1, X = X[Z=1]] - Y[Z = 1, X = X[Z=0]]",
  `Indirect (Z=0)` = "Y[Z = 0, X = X[Z=1]] - Y[Z = 0, X = X[Z=0]]"
)
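
These queries are related by a standard decomposition (noted here for orientation): the total effect telescopes into a direct and an indirect component in two ways, since the middle term cancels:

Y[Z=1] - Y[Z=0] = Direct (Z=0) + Indirect (Z=1) = Direct (Z=1) + Indirect (Z=0)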

model |>
  update_model(data, refresh = 0, iter = 5000)  |>
  
  query_model(
    query = queries,
    cred = 99,
    using = c("parameters", "posteriors"),
    expand_grid = TRUE) |>
  
  plot() 
Mediation model

We estimate all quantities very well.

Mediation model without sequential ignorability

We now allow that there may be third features that cause both X and Y. Thus we do not assume “sequential ignorability.” This model might be justified by random assignment of Z.

In this example the data is drawn in the same way as before, which means that in the data generating model the potential outcomes for Y are independent of those for X, though the researcher does not know this. The true (unknown) values of the queries are also the same as before.

model <- make_model("Z -> X -> Y <- Z; X <-> Y")  |>
  set_parameters(nodal_type = c("00", "10"), parameters = c(0, .5)) |> 
  set_parameters(nodal_type = "0001", parameters = .5)  
  
model |> plot(x_coord = c(0,1,0), y_coord = 2:0)
Mediation model without sequential ignorability

  
model |>
  
  update_model(data, iter = 10000) |>
  
  query_model(
    query = queries,
    cred = 99,
    using = c("parameters", "posteriors"),
    expand_grid = TRUE) |>
  
  plot()   

We see that we do not do nearly so well. To ensure stable estimates we ran a large number of iterations. For the non-identified quantities our credibility intervals are not tight (which is as it should be!), and in one case the true value lies outside of them (which is not as it should be). This highlights the extreme difficulty of the problem. Nevertheless, the gains relative to the priors are considerable.

The napkin model

model <- make_model("W->Z->X->Y; W <-> X; W <-> Y")
plot(model)

The “napkin” model (see Pearl’s Book of Why) involves two types of confounding. We parameterize a version in which (a) the average causal effect of X on Y is negative, arising in cases when W=0, (b) there is positive confounding arising from the fact that X is liable to be 1 regardless of Z when W=1, and (c) Y is liable to be 1 regardless of X when W=1.

model <- model |>  
  set_parameters(param_name = "Y.10_W.0", parameters = .6) |>  # (a) negative X -> Y effect when W = 0
  set_parameters(param_name = "Y.11_W.1", parameters = .9) |>  # (c) Y = 1 regardless of X when W = 1
  set_parameters(param_name = "X.11_W.1", parameters = .9)     # (b) X = 1 regardless of Z when W = 1

Naive regression performs very badly here.

data <- make_data(model, n = 10000, using = "parameters")

estimatr::lm_robust(Y ~ X, data = data) |> 
  texreg::htmlreg(caption = "Naive estimation")
Naive estimation

                       Model 1
(Intercept)     0.75* [ 0.73;  0.76]
X              -0.04* [-0.06; -0.03]
R2              0.00
Adj. R2         0.00
Num. obs.   10000
RMSE            0.45

* 0 outside the confidence interval.

In contrast, the updated causal model yields good estimates for the ATE, though again we have somewhat more uncertainty for the probability of causation.



model |>
  update_model(data, refresh = 0, iter = 6000) |>
  query_model(
    list(
      ATE = "Y[X=1] - Y[X=0]",
      PoC = "Y[X=1] - Y[X=0] :|: X==1 & Y==1"),
    using = c("posteriors", "priors", "parameters")) |>
  plot()
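
For reference (an aside, not part of the original vignette): the napkin model is in fact identified. The Book of Why gives a plug-in formula that, for any fixed z, recovers P(Y = 1 | do(X = x)) from observed frequencies. A minimal base-R sketch, assuming the simulated data has columns W, Z, X, Y:

# P(Y = 1 | do(X = x)) =
#   sum_w P(X = x, Y = 1 | Z = z, W = w) P(w) /
#   sum_w P(X = x | Z = z, W = w) P(w),   for any fixed z
napkin_effect <- function(data, x, z = 1) {
  pw  <- prop.table(table(data$W))
  num <- sum(sapply(names(pw), function(w) {
    d <- data[data$Z == z & data$W == as.numeric(w), ]
    mean(d$X == x & d$Y == 1) * pw[[w]]
  }))
  den <- sum(sapply(names(pw), function(w) {
    d <- data[data$Z == z & data$W == as.numeric(w), ]
    mean(d$X == x) * pw[[w]]
  }))
  num / den
}

napkin_effect(data, 1) - napkin_effect(data, 0)  # plug-in ATE estimate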

M-bias

In the model below, also studied in Pearl (Causality), the effect of X on Y is identified without controlling for Z. Controlling for Z, however, opens a backdoor path between X and Y, which can introduce bias in naive estimation.

Strikingly, Z may be prior to X and yet controlling for it can still introduce bias. Here is the model:

model <- make_model("Z <- U1 -> Y; Z <- U2 -> X; X -> Y")

Here is a parameterization in which there is no true effect of X on Y, yet there is a strong negative correlation between X and Y once you control for Z.

model <- model |> 
  set_parameters(param_name = "Y.0101", parameters = .8) |>  # Y depends on U1 only (no effect of X)
  set_parameters(param_name = "Z.0001", parameters = .8) |>  # Z = 1 only when U1 = 1 and U2 = 1
  set_parameters(param_name = "X.01", parameters = .8)       # X increasing in U2


plot(model)

With this model, naive regression performs poorly when you control for Z.

data <- model |> make_data(n = 5000)


list(
  `no controls` = estimatr::lm_robust(Y ~ X, data = data),
  `with controls` = estimatr::lm_robust(Y ~ X + Z, data = data)
) |>
  
  texreg::htmlreg(caption = "Estimation with and without controls")
Estimation with and without controls

               no controls            with controls
(Intercept)    0.51* [ 0.49;  0.53]    0.44* [ 0.42;  0.47]
X              0.00  [-0.02;  0.03]   -0.12* [-0.15; -0.09]
Z                                      0.41* [ 0.38;  0.44]
R2             0.00                    0.13
Adj. R2       -0.00                    0.13
Num. obs.   5000                    5000
RMSE           0.50                    0.47

* Null hypothesis value outside the confidence interval.

Taking account of Z does no harm (and could help) in the Bayesian setting.


list(
  `Using Z` = model |> update_model(data, iter = 20000),
  `Ignoring Z` = model |> update_model(data |> dplyr::select(-Z), iter = 20000)) |>
  
  query_model(
    list(
      "Y[X=1] - Y[X=0]",
      "Y[X=1] - Y[X=0] :|: X==1 & Y==1"),
    using = c("parameters", "priors",  "posteriors")) |>
  
  plot()

References

Forastiere, Laura, Alessandra Mattei, and Peng Ding. 2018. “Principal Ignorability in Mediation Analysis: Through and Beyond Sequential Ignorability.” Biometrika 105 (4): 979–986.

Pearl, Judea. 2012. Causality. Cambridge University Press.

Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. New York: Basic Books.