Causal models for qualitative and mixed methods inference

Lecture 1: Causality

Macartan Humphreys and Alan Jacobs

1 Motivation

(Alan)

1.1 Motivation 1: beyond random assignment

1.1.1 “Causal inference”

  • The term has become synonymous with randomization, or design-based inference

  • Massive growth in last 15 years in design-based approaches in political science and economics

    • Randomization by the researcher: standard experiments

    • Randomization by nature: pure natural experiments

    • “As if” randomization, e.g.:

      • Regression-discontinuity designs

1.1.2 The benefit of strong design: assumption-free inference

  • Broad, deep skepticism about observational, assumption- (or model-) based inference

    • Gerber, Green, and Kaplan (2004): “…observational findings are accorded zero weight…” when the magnitude of bias is unknown
  • Randomized assignment allows for “assumption-free” causal inference

    • To estimate the ATE, just difference mean outcomes between treatment groups

    • Liberated from models

1.1.3 But sometimes randomization isn’t enough

  • To be clear: we think developments in design-based inference are enormously important

    • If you can randomize to get at your estimand, go for it
  • But the “causal inference revolution” has some limits

1.1.4 Limits of randomized designs

  • Feasibility: Randomization is impossible or unavailable for a large set of research and policy questions.
    • Some causal variables are hard to randomize:
      • Institutions
      • Conflict
      • Economic development
      • Party systems

1.1.5 And non-experimental causal inference always requires models

Simple example:

  • Assume no randomization

  • We want to know if \(X\) causes \(Y\)

  • We want to use \(X,Y\) correlations as evidence

    • But this evidence depends on beliefs about how the world works

      • Specifically, we have to believe there is no \(Z\) such that: \(X \leftarrow Z \rightarrow Y\)
  • Or maybe we do think there is a confounding \(Z\)

    • So then we control for \(Z\)
    • But this strategy depends on a different model of the world
      • A model in which THIS isn’t happening: \(X \rightarrow Z \rightarrow Y\)
  • In sum: beliefs about the world will always play a central role in non-randomized inference

1.1.6 Even with randomization, so many questions we can’t answer without models

Randomization can give us an \(ATE\), but can’t address many other causal estimands of interest

  • Distribution of effects
    • Say we know an aid treatment had an ATE of \(+20\%\) on village income
    • For what proportion of villages did it have a positive effect?
    • What proportion did it hurt?
  • How effects occurred (mechanisms)
    • Did aid boost income by enabling capital investments?
    • By boosting schooling?
    • By mitigating conflict?

1.2 Even with randomization, so many questions we can’t answer without models

  • Whether and where effects will travel
    • We see effects in Northeastern Kenya
    • Should we expect those effects to operate in Somalia? In Bangladesh?
  • Case-level effects
    • We get a population-level estimate of \(+20\%\) causal effect
    • But did aid cause income growth in this specific village that received aid and experienced income growth?
  • We can’t answer any of these questions without models!

1.3 Motivation 2: analytically explicit qualitative inference

1.4 Qualitative inference: ambiguity at the heart of process tracing

  • We have a general intuition about how process tracing works

    • I have a theory that \(Inequality \rightarrow MassMobilization \rightarrow Democratization\)
  • So I go into a case with high Inequality and I look to see if there was Mass Mobilization

  • If I see Mass Mobilization, I take this as evidence that Inequality caused Democratization in this case

1.5 Ambiguity at the heart of process tracing

  • But what warrants this conclusion?

    • Why isn’t this just another correlation?

    • Why is seeing Mass Mobilization more indicative of a causal effect than seeing no Mass Mobilization?

  • Is seeing Mobilization evidence that Inequality caused Democratization?

    • Maybe: if I believe Inequality causes Mobilization, which causes Democratization
    • But what if I believe: Inequality weakens Mobilization?
    • Then seeing Mobilization would be evidence against Inequality’s having had an effect

1.6 Ambiguity at the heart of process tracing

In sum, standard process tracing approaches leave us with:

  • Lack of clarity about what kind of evidence is informative
  • Lack of clarity about how to justify inferences from this evidence

1.6.1 Models \(\Rightarrow\) more transparent, evaluable qualitative inference

  • In qualitative inference too, what counts as evidence for a finding depends on how we think the world works

  • By making our beliefs about the world explicit and reasoning systematically from them, we can do much better than if we keep our underlying models implicit

    • Logical consistency
    • Transparency
    • Openness to evaluation

1.7 The challenge of multi-method research

1.7.1 How can we integrate qualitative and quantitative data?

  • Data of different forms
    • Quant: a little information on lots of cases; focused on \(X\)’s and \(Y\)’s
    • Qual: lots of information on one/few cases; focused on process and context
  • We want two forms of integration
    • Combination: Arrive at inferences that build on all information
    • Joint gains: Allow one form of evidence to inform inferences from other forms of evidence

1.8 In sum

1.8.1 Key benefits of doing inference with causal models

Using causal models can allow us to

  • Answer a vast range of causal questions, with experimental and observational data

  • Draw inferences from evidence in ways logically consistent with our prior beliefs/information

  • Show, and allow others to see, precisely how our inferences hinge on our model

  • Readily integrate quantitative and qualitative evidence into a single set of findings

  • [Though not in this course] Make research design choices in a way systematically informed by what we already know

  • \(\Rightarrow\) implications for large-n, small-n, and mixed-method research

A cost: inferences are model-dependent

2 Causality

(Macartan)

2.1 Potential outcomes and the counterfactual approach

Causation as difference making

2.1.1 Motivation

The intervention-based motivation for understanding causal effects:

  • We want to know if a particular intervention (like aid) caused a particular outcome (like reduced corruption).
  • We need to know:
    1. What happened?
    2. What would the outcome have been if there were no intervention?
  • The problem:
    1. … this is hard
    2. … this is impossible

The problem in 2 is that you need to know what would have happened if things had been different. You need information on a counterfactual.

2.1.2 Potential Outcomes

  • For each unit, we assume that there are two post-treatment outcomes: \(Y_i(1)\) and \(Y_i(0)\).
    • \(Y(1)\) is the outcome that would obtain if the unit received the treatment.
    • \(Y(0)\) is the outcome that would obtain if it did not.
  • The causal effect of Treatment (relative to Control) is: \(\tau_i = Y_i(1) - Y_i(0)\)
  • Note:
    • The causal effect is defined at the individual level.
    • There is no “data generating process” or functional form.
    • The causal effect is defined relative to something else, so a counterfactual must be conceivable (did Germany cause the Second World War?).
    • Are there any substantive assumptions made here so far?
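
To make these definitions concrete, here is a minimal R sketch with made-up potential outcomes for six units (all values are purely illustrative):

Y1 <- c(1, 1, 0, 1, 0, 1)   # Y_i(1): outcomes if treated (hypothetical)
Y0 <- c(0, 1, 0, 0, 1, 0)   # Y_i(0): outcomes if untreated (hypothetical)

tau <- Y1 - Y0              # unit-level causal effects: 1 0 0 1 -1 1
mean(tau)                   # average effect: 0.33

# In real data we observe only one potential outcome per unit,
# so the tau vector itself is never observable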

2.1.3 Potential Outcomes

Idea: A causal claim is (in part) a claim about something that did not happen. This makes it metaphysical.

2.1.4 Potential Outcomes

Now that we have a concept of causal effects available, let’s answer two questions:

  • TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?

2.1.5 Potential Outcomes

Now that we have a concept of causal effects available, let’s answer two questions:

  • TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?

  • A boulder is flying down a mountain. You duck. This saves your life.

  • So the boulder caused the ducking and the ducking caused you to survive.

  • So: did the boulder cause you to survive?

2.1.6 Causal claims: Contribution or attribution?

The counterfactual model is about contribution and attribution in a very specific sense.

  • Focus is on non-rival contributions
  • Focus is on conditional attribution. Not: “what caused \(Y\)?” or “What is the cause of \(Y\)?”, but “did \(X\) cause \(Y\) given all other factors were what they were?”

2.1.7 Causal claims: Contribution or attribution?

Consider an outcome \(Y\) that might depend on two causes \(X_1\) and \(X_2\):

\[Y(0,0) = 0, \qquad Y(1,0) = 0, \qquad Y(0,1) = 0, \qquad Y(1,1) = 1\]

What caused \(Y\)? Which cause was most important?
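
A one-line R sketch of this structure (the outcome is the logical AND of the two causes):

Y <- function(x1, x2) x1 * x2   # Y = 1 only when both causes are present

Y(1, 1) - Y(0, 1)   # 1: given X2 = 1, changing X1 changes Y
Y(1, 1) - Y(1, 0)   # 1: given X1 = 1, changing X2 changes Y

Each cause has a full counterfactual effect given the other: the potential outcomes alone give no basis for singling out one of them as “the” or the more important cause.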

2.1.8 Causal claims: No causation without manipulation

  • Some seemingly causal claims are not admissible.
  • To get the definition off the ground, manipulation must be imaginable (whether practical or not)
  • This renders thinking about effects of race and gender difficult
  • What does it mean to say that Aunt Pat voted for Brexit because she is old?

2.1.9 Causal claims: No causation without manipulation

  • Some seemingly causal claims are not admissible.

  • To get the definition off the ground, manipulation must be imaginable (whether practical or not)

  • This renders thinking about effects of race and gender difficult

  • Compare: What does it mean to say that Southern counties voted for Brexit because they have many old people?

  • What does it mean to say that Trump won because he is a white man? What exactly do we have to imagine…

3 Causal models

(Alan)

3.1 A causal model

  • A representation of beliefs about causal dependencies in a given domain

    • How do we believe variables relating to democratization (economic conditions, external threat, inequality, regime type, etc.) relate to one another causally?

    • How do we believe variables relating to the choice of an electoral system (party system, relative party strength, normative beliefs, electoral rules) are related to one another causally?

  • Comprised of:

    1. A structure – indicating qualitatively which nodes react to which others; we will represent these graphically

    2. Functional relations between nodes indicating quantitatively how one responds to another

    3. Probability distributions (priors) over exogenous conditions (which we deal with later)

3.2 Causal structure

Causal structures—the first component of a causal model—can be represented with Directed Acyclic Graphs (DAGs)

Let’s walk through the parts of a DAG

3.3 The nodes

The nodes are variables that can take on different values for each unit as a function of the values of other nodes that precede them in the causal ordering:

Suppose we believe that inequality (\(I\)) can cause democratization (\(D\)) through mass mobilization (\(M\)).
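
This structure can be declared and drawn with the CausalQueries package, introduced in Section 5; a minimal sketch:

library(CausalQueries)

make_model("I -> M -> D") |>
  plot()

The resulting graph places \(I\), \(M\), and \(D\) according to their causal ordering.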

3.4 \(\theta\) terms: exogenous variation

  • A \(\theta\) term for each variable (Pearl uses a \(U\) here)
  • Represent exogenous variation affecting the system
  • Think: unexplained prior conditions, features of context, random disturbances
  • For us: these will capture the way a node responds to its antecedents

3.5 DAG grammar

  • Node: a variable with a specified range

  • Edge: an arrow connecting two nodes

    • Implies possible causal dependency

    • Only direct dependencies are connected

    • No arrow \(\Rightarrow\) no causal dependency given other nodes

3.6 DAG grammar: prohibitions

  • No causal cycles: “acyclic”

3.7 DAG grammar

Not represented on the DAG:

  • Ranges of variables

  • Functional relationships

    • E.g., does \(I\) have a positive or negative effect on \(M\)?
  • Beliefs about the distributions of \(\theta\) terms

3.8 Different causal structures

  • Here we allow for a direct effect of \(I\) on \(D\)
    • \(\Rightarrow\) there could be effects that do not go through mobilization
  • \(M\) mediates and moderates \(I\)’s effect

3.9 Different causal structures

  • A second causal variable
    • The DAG alone does not tell us whether or how \(I\), \(M\), and \(P\) interact

3.10 Assumptions implied by a DAG

  • Exogenous nodes are assigned independently
    • The graph above implies \(I\) and \(P\) are generated independently
    • No confounding between them
    • Similarly, implies no confounding between \(I\) and \(D\)

3.11 Assumptions implied by a DAG

3.12 Representing unobservable confounding

  • We can allow for unobservable confounding with double-headed arrows
  • Here, no unobservable confounding
  • But note: \(I\) is an observable confounder!

3.13 Representing unobservable confounding

  • Here, we allow the possibility of confounding between \(I\) and \(D\) by unobservable factors
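
In CausalQueries syntax (see Section 5), this possibility would be declared with a double-headed arrow; a sketch:

make_model("I -> M -> D; I <-> D") |>
  plot()   # the I <-> D confound is drawn as a dashed curve (see Section 6)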

3.14 Assumptions implied by a DAG

  • Absent arrows mean NO direct effect
    • Here, we assume any effect of \(I\) must run through \(M\)
    • A strong assumption: that we know the mechanism

3.15 Assumptions implied by a DAG

  • This graph makes a weaker assumption
    • Allows for effects of \(I\) on \(D\) that do NOT run through \(M\)

3.16 DAG lingo: all in the family

We use familial language to talk about relations among nodes

  • Parents: \(M\) is a parent of \(D\)

  • Children: \(I\) has two children, \(M\) and \(D\)

  • Ancestors: \(\theta^I\) is an ancestor of \(M\)

3.17 A model is a summary of possible relations between nodes

It is not a specific “argument” for how things work

A model based on our read of the inequality and democratization literature (Acemoglu/Robinson, Boix, Haggard/Kaufman, Ansell/Samuels):

3.18 Example: Pierson on welfare-state cuts

No argument here – just potentially causal connections

4 Functional relations and causal types

Moving from DAGs to causal models

4.1 Functional relations

A specific causal process through which things might work in a case could look like:

socioeconomic pressures put issues on the agenda; being on the agenda and having an opportunity to avoid blame were individually necessary and jointly sufficient for retrenchment

Note:

  • Now we are talking about functional relations between nodes
  • The \(\theta\)’s can capture these relations by indicating how a node responds to its “parents”
  • Process tracing is: figuring out what the \(\theta\)’s are for a case.

We are now squarely beyond DAG territory and in the world of causal models

4.2 Causal types as functional relations

  • We will call the causal processes operating between nodes in a given case the case’s causal type
  • If you know a case’s causal type, you know all possible causal relations within the case
  • A case’s causal type is represented by its \(\theta\) terms

4.3 Causal types

  • For a node with no parents: \(\theta\) simply indicates the node’s value. e.g. 0 or 1.

  • For a node with one parent: \(\theta\) indicates how the node would respond to each value of its parent. For instance:

    • \(Y=0\) regardless of \(X\): we will call this “\(\theta^Y =\) ‘00’”
    • \(Y=1\) if and only if \(X=1\): we will call this “\(\theta^Y =\) ‘01’”
    • \(Y=1\) if and only if \(X=0\): we will call this “\(\theta^Y =\) ‘10’”
    • \(Y=1\) regardless of \(X\): we will call this “\(\theta^Y =\) ‘11’”
  • For a binary node with \(k\) binary parents: \(\theta\) can take on up to \(2^{\left(2^k\right)}\) values! This grows quickly.

  • You refine a model by specifying which types are possible or which types are likely

  • Probabilistic process tracing can be thought of as figuring out which types are likely among all those that are possible
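
Two quick checks in R: the count of possible types as \(k\) grows, and the four one-parent types for \(Y\) as labeled by CausalQueries (using interpret_type, introduced in the next section):

sapply(0:3, function(k) 2^(2^k))   # number of types for k = 0, 1, 2, 3 parents
# 2 4 16 256

make_model("X -> Y") |>
  interpret_type(nodes = "Y")      # lists the four theta^Y types above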

5 Making and plotting models in CausalQueries

(Macartan)

5.1 Making models

5.1.1 Causal statements

The key input to model creation is the causal statement:

  • X -> Y
  • A -> B -> C <- A <-> C
  • A -> B; B <- C

Note:

  • Arrows can point in either direction
  • Double-headed arrows indicate confounding
  • Parts of a statement can be separated using ;
  • Avoid spaces and special characters in names (., -, _, |, etc.)
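
For instance, a statement written with a backwards arrow should normalize to the same set of connections (a sketch; compare the translated output on the next slide):

make_model("A -> B; B <- C") |>
  inspect("statement")
# expected to display the normalized statement: A -> B; C -> B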

5.1.2 Causal statements

CausalQueries will always translate causal statements into a simple collection of connections:

make_model("A -> B -> C <- A <-> C") |>
  inspect("statement")

Causal statement: 
A -> B; A -> C; A <-> C; B -> C

Note:

  • Arrows can go both directions
  • Double headed arrows means confounding
  • Parts of a statement can be separated using ;
  • See “cheat sheet” for more guidance

5.1.3 Inspect

Once a model is made, a causal structure, causal types, parameters, priors, and so on are generated by default. All elements can be examined using inspect:

make_model("X -> Y") |>
  inspect("parameters_df")

parameters_df
Mapping of model parameters to nodal types: 

  param_names: name of parameter
  node:        name of endogeneous node associated
               with the parameter
  gen:         partial causal ordering of the
               parameter's node
  param_set:   parameter groupings forming a simplex
  given:       if model has confounding gives
               conditioning nodal type
  param_value: parameter values
  priors:      hyperparameters of the prior
               Dirichlet distribution 

  param_names node gen param_set nodal_type given param_value priors
1         X.0    X   1         X          0              0.50      1
2         X.1    X   1         X          1              0.50      1
3        Y.00    Y   2         Y         00              0.25      1
4        Y.10    Y   2         Y         10              0.25      1
5        Y.01    Y   2         Y         01              0.25      1
6        Y.11    Y   2         Y         11              0.25      1
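
Other model elements can be examined the same way; for example (the first two options appear elsewhere in these slides, and we believe "causal_types" is also available):

model <- make_model("X -> Y")

model |> inspect("nodes")          # node names: X, Y
model |> inspect("statement")      # the causal statement: X -> Y
model |> inspect("causal_types")   # combinations of nodal types across nodes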

5.1.4 Interpret

All the nodal types get made, but they can still be hard to interpret. You can get help using summary or interpret_type:

make_model("A -> Y <- B") |>
  interpret_type(nodes = "Y")
$Y
  index          interpretation
1  *---  Y = * if A = 0 & B = 0
2  -*--  Y = * if A = 1 & B = 0
3  --*-  Y = * if A = 0 & B = 1
4  ---*  Y = * if A = 1 & B = 1

5.1.5 Refine

There are various ways to refine a model:

  • Set restrictions: set_restrictions removes particular “nodal types”
  • Set parameters: set_parameters provides particular parameter values for a model
  • Set priors: set_priors provides priors over model parameters

There are many ways to use these, so consult the help files, e.g., ?set_restrictions

5.1.6 Example 1

Say I have a model of the form A -> B -> C. I want to impose monotonicity (no negative effects), but I also want to assume that the A -> B relation is very strong:

Using natural language statements:

model <- 
  make_model("A -> B-> C") |>
  set_restrictions(statement = decreasing("A", "B")) |>
  set_restrictions(statement = decreasing("B", "C")) |>
  set_parameters(statement = increasing("A", "B"), parameters = .8)

model |>  inspect("parameters_df")

parameters_df
Mapping of model parameters to nodal types: 

  param_names: name of parameter
  node:        name of endogeneous node associated
               with the parameter
  gen:         partial causal ordering of the
               parameter's node
  param_set:   parameter groupings forming a simplex
  given:       if model has confounding gives
               conditioning nodal type
  param_value: parameter values
  priors:      hyperparameters of the prior
               Dirichlet distribution 

   param_names node gen param_set nodal_type given param_value priors
1          A.0    A   1         A          0         0.5000000      1
2          A.1    A   1         A          1         0.5000000      1
3         B.00    B   2         B         00         0.1000000      1
5         B.01    B   2         B         01         0.8000000      1
6         B.11    B   2         B         11         0.1000000      1
7         C.00    C   3         C         00         0.3333333      1
9         C.01    C   3         C         01         0.3333333      1
10        C.11    C   3         C         11         0.3333333      1

5.1.7 Example 1

Say I have a model of the form A -> B -> C. I want to impose monotonicity (no negative effects), but I also want to assume that the A -> B relation is very strong:

Using nodal types:

model <- 
  make_model("A -> B-> C") |>
  set_restrictions(param_names  = 'B.10') |>
  set_restrictions(param_names  = 'C.10') |>
  set_parameters(param_names = 'B.01', parameters = .8)

model |>  inspect("parameters_df")

parameters_df
Mapping of model parameters to nodal types: 

  param_names: name of parameter
  node:        name of endogeneous node associated
               with the parameter
  gen:         partial causal ordering of the
               parameter's node
  param_set:   parameter groupings forming a simplex
  given:       if model has confounding gives
               conditioning nodal type
  param_value: parameter values
  priors:      hyperparameters of the prior
               Dirichlet distribution 

   param_names node gen param_set nodal_type given param_value priors
1          A.0    A   1         A          0         0.5000000      1
2          A.1    A   1         A          1         0.5000000      1
3         B.00    B   2         B         00         0.1000000      1
5         B.01    B   2         B         01         0.8000000      1
6         B.11    B   2         B         11         0.1000000      1
7         C.00    C   3         C         00         0.3333333      1
9         C.01    C   3         C         01         0.3333333      1
10        C.11    C   3         C         11         0.3333333      1

5.2 Plotting

Plotting functionality makes use of the Sugiyama layout from igraph, which places nodes to reflect their position in the causal ordering.

The plot method calls plot_model and passes provided arguments to it.

5.2.1 A basic plot:

model <- make_model("X -> Y")

model |> plot_model()

Simple model

6 Extra slides

6.0.1 ggplot layers

The plot that is produced is a ggplot object, and additional layers can be added in the usual way.

model |>
  plot_model()  +
  annotate("text", x = c(1, -1) , y = c(1.5, 1.5), label = c("Some text", "Some more text")) +
  coord_flip()

Adding additional ggplot layers

6.0.2 Adding labels

Provide labels in the same order as model nodes.

model <- make_model("A -> B -> C <- A")


# Check node ordering
inspect(model, "nodes")

Nodes: 
A, B, C
# Provide labels
model |>
   plot_model(
     labels = c("This is A", "Here is B", "And C"),
     nodecol = "white", textcol = "black")

Adding labels

6.0.3 Controlling positions

You can manually set positions using the x_coord and y_coord arguments.

model |>
  plot(x_coord = 0:2,  y_coord = c(0, 2, 1))

Specifying coordinates

6.0.4 Controlling color

You can manually control node color and text color for all nodes together or separately.

model |>
  plot(x_coord = 0:2,  y_coord = c(0, 2, 1),
       nodecol = c("blue", "orange", "red"),
       textcol = c("white", "red", "blue"))

Controlling colors

6.0.5 Models with unobserved confounding

Unobserved confounding is represented using dashed curves.

make_model('X -> K -> Y <- X; X <-> Y; K <-> Y') |>
  plot()

Plot showing confounding