Lecture 1: Causality
(Alan)
For many, “causal inference” has become synonymous with randomization, or design-based inference
Massive growth in last 15 years in design-based approaches in political science and economics
Randomization by the researcher: standard experiments
Randomization by nature: pure natural experiments
“As if” randomization
Broad, deep skepticism about observational, assumption- (or model-) based inference
Randomized assignment allows for “assumption-free” causal inference
To get the ATE, just difference mean outcomes between treatment groups (see the sketch after this list)
Liberated from models
To be clear: we think developments in design-based inference are enormously important
But the “causal inference revolution” has some limits
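A minimal sketch in R of the difference-in-means logic, with simulated data (the true effect of 0.5 is hypothetical):

set.seed(1)
n <- 1000
D <- rbinom(n, 1, 0.5)             # randomized treatment assignment
Y <- 0.5 * D + rnorm(n)            # outcome with a true ATE of 0.5
mean(Y[D == 1]) - mean(Y[D == 0])  # difference in means estimates the ATE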
Simple example:
Assume no randomization
We want to know if \(X\) causes \(Y\)
We want to use \(X,Y\) correlations as evidence
But what this evidence is worth depends on beliefs about how the world works
Perhaps we believe there is no confounder, in which case the correlation is informative; or maybe we do think there is a confounding \(Z\), in which case the same correlation tells us much less
In sum: beliefs about the world will always play a central role in non-randomized inference
Randomization can give us an \(ATE\), but can’t address many other causal estimands of interest
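One example of an estimand beyond the ATE is the probability of causation. A minimal sketch in CausalQueries (assuming the queries/given/using interface of recent package versions; the flat priors are the package default):

library(CausalQueries)

# Probability that X caused Y, among cases in which X = 1 and Y = 1,
# evaluated under the model's priors
make_model("X -> Y") |>
  query_model(
    queries = "Y[X=1] > Y[X=0]",
    given = "X == 1 & Y == 1",
    using = "priors"
  )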
We have a general intuition about how process tracing works
So I go into a case with high Inequality and I look to see if there was Mass Mobilization
If I see Mass Mobilization, I take this as evidence that Inequality caused Democratization in this case
But what warrants this conclusion?
Why isn’t this just another correlation?
Why is seeing Mass Mobilization more indicative of a causal effect than seeing no Mass Mobilization?
Is seeing Mobilization evidence that Inequality caused Democratization?
In sum, standard process tracing approaches leave these questions without clear answers
In qualitative inference too, what counts as evidence for a finding depends on how we think the world works
By making our beliefs about the world explicit and reasoning systematically from them, we can do much better than if we keep our underlying models implicit
Using causal models can allow us to
Answer a vast range of causal questions, with experimental and observational data
Draw inferences from evidence in ways logically consistent with our prior beliefs/information
Show, and allow others to see, precisely how our inferences hinge on our model
Readily integrate quantitative and qualitative evidence into a single set of findings
[Though not in this course] Make research design choices in a way systematically informed by what we already know
\(\Rightarrow\) implications for large-n, small-n, and mixed-method research
A cost: inferences are model-dependent
(Macartan)
Causation as difference making
The intervention-based motivation for understanding causal effects: we want to know how outcomes would change if we intervened to change things
The problem is that you need to know what would have happened if things were different. You need information on a counterfactual.
Idea: A causal claim is (in part) a claim about something that did not happen. This makes it metaphysical.
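In potential-outcomes notation, for a binary cause the unit-level effect is

\[\tau_i = Y_i(1) - Y_i(0)\]

and because only one of \(Y_i(1)\) and \(Y_i(0)\) is ever observed for any unit, the other term is exactly the counterfactual information referred to above.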
Now that we have a concept of causal effects available, let’s answer two questions:
TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?
A boulder is flying down a mountain. You duck. This saves your life.
So the boulder caused the ducking and the ducking caused you to survive.
So: did the boulder cause you to survive?
The counterfactual model is about contribution and attribution in a very specific sense.
Consider an outcome \(Y\) that might depend on two causes \(X_1\) and \(X_2\):
\[Y(0,0) = 0 \qquad Y(1,0) = 0 \qquad Y(0,1) = 0 \qquad Y(1,1) = 1\]
What caused \(Y\)? Which cause was most important?
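A minimal base-R sketch of these potential outcomes makes the difference-making logic concrete (the function y is illustrative):

y <- function(x1, x2) as.integer(x1 == 1 & x2 == 1)  # Y = 1 only if both causes are present
y(1, 0) - y(0, 0)  # 0: X1 makes no difference when X2 = 0
y(1, 1) - y(0, 1)  # 1: X1 makes a difference when X2 = 1
y(1, 1) - y(1, 0)  # 1: likewise for X2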
Some seemingly causal claims are not admissible.
To get the definition off the ground, manipulation must be imaginable (whether practical or not)
This renders thinking about effects of race and gender difficult
Compare: What does it mean to say that Southern counties voted for Brexit because they have many old people?
What does it mean to say that Trump won because he is a white man? What exactly do we have to imagine…
(Alan)
A representation of beliefs about causal dependencies in a given domain
How do we believe variables relating to democratization (economic conditions, external threat, inequality, regime type, etc.) relate to one another causally?
How do we believe variables relating to the choice of an electoral system (party system, relative party strength, normative beliefs, electoral rules) are related to one another causally?
Composed of:
A structure, indicating qualitatively which nodes react to which others; we will represent these graphically
Functional relations between nodes, indicating quantitatively how one responds to another
Probability distributions (priors) over exogenous conditions (which we deal with later)
Causal structures—the first component of a causal model—can be represented with Directed Acyclic Graphs (DAGs)
Let’s walk through the parts of a DAG
The nodes are variables that can take on different values for each unit as a function of the values of other nodes that precede it in the causal ordering:
Suppose we believe that inequality (\(I\)) can cause democratization (\(D\)) through mass mobilization (\(M\)).
Node: a variable with a specified range
Edge: an arrow connecting two nodes
Implies possible causal dependency
Only direct dependencies are connected
No arrow \(\Rightarrow\) no causal dependency given other nodes
Not represented on the DAG:
Ranges of variables
Functional relationships
Beliefs about the distributions of \(\theta\) terms
We use familial language to talk about relations among nodes
Parents: \(M\) is a parent of \(D\)
Children: \(I\) has two children, \(M\) and \(D\)
Ancestors: \(\theta_I\) is an ancestor of \(M\)
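A sketch of this structure in CausalQueries (the statement encodes I as a parent of both M and D):

library(CausalQueries)
model <- make_model("I -> M -> D; I -> D")
plot(model)  # draws the DAG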
It is not a specific “argument” for how things work
A model based on our read of the inequality and democratization literature (Acemoglu/Robinson, Boix, Haggard/Kaufman, Ansell/Samuels):
No argument here – just potentially causal connections
Moving from DAGs to causal models
A specific causal process through which things might work in a case could look like:
socioeconomic pressures put issues on the agenda; being on the agenda and having an opportunity to avoid blame were individually necessary and jointly sufficient for retrenchment
Note:
We are now squarely beyond DAG territory and in the world of causal models
For a node with no parents: \(\theta\) simply indicates the node’s value. e.g. 0 or 1.
For a node with one parent: \(\theta\) indicates how the node would respond to each value of its parent. For instance, with a binary parent there are four types: the node is 0 regardless (00), moves opposite to the parent (10), tracks the parent (01), or is 1 regardless (11).
For a node with \(k\) parents: \(\theta\) can take on up to \(2^{\left(2^k\right)}\) values! This grows quickly.
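A quick base-R check of that growth, plus a look at the generated types (treating "nodal_types" as an inspectable element, like "parameters_df" below, is an assumption):

2^(2^(0:3))  # 2, 4, 16, 256 nodal types for 0, 1, 2, 3 (binary) parents

library(CausalQueries)
make_model("X1 -> Y <- X2") |> inspect("nodal_types")  # the 16 types for a two-parent Y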
You refine a model by specifying which types are possible or which types are likely
Probabilistic process tracing can be thought of as figuring out which types are likely among all those that are possible
CausalQueries
(Macartan)
The key input to model creation is the causal statement:
"X -> Y"
"A -> B -> C <- A <-> C"
"A -> B; B <- C"

Note: node names should not include special characters (;, ., -, _, |, etc.)

CausalQueries will always translate causal statements into a simple collection of connections:
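For example (a sketch; treating "statement" as an inspectable element is an assumption):

make_model("A -> B -> C <- A <-> C") |> inspect("statement")

which yields the canonical form shown next: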
Causal statement:
A -> B; A -> C; A <-> C; B -> C
Once a model is made, there is a default generation of causal structure, causal types, parameters, priors, and so on. All elements can be examined using inspect:
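The output below corresponds to a minimal X -> Y model, i.e. a call like:

make_model("X -> Y") |> inspect("parameters_df")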
parameters_df
Mapping of model parameters to nodal types:
param_names: name of parameter
node: name of endogeneous node associated
with the parameter
gen: partial causal ordering of the
parameter's node
param_set: parameter groupings forming a simplex
given: if model has confounding gives
conditioning nodal type
param_value: parameter values
priors: hyperparameters of the prior
Dirichlet distribution
param_names node gen param_set nodal_type given param_value priors
1 X.0 X 1 X 0 0.50 1
2 X.1 X 1 X 1 0.50 1
3 Y.00 Y 2 Y 00 0.25 1
4 Y.10 Y 2 Y 10 0.25 1
5 Y.01 Y 2 Y 01 0.25 1
6 Y.11 Y 2 Y 11 0.25 1
All the nodal types get made, but they can still be hard to interpret. You can get help using summary or interpret_type:
There are various ways to refine a model:
set_restrictions removes particular “nodal types”
set_parameters provides particular parameter values for a model
set_priors provides priors over model parameters
There are many ways to use these, so consult the help, e.g. ?set_restrictions
Say I have a model of the form A -> B -> C. I want to impose monotonicity (no negative effects), but I also want to assume that the A -> B relation is very strong:
Using natural language statements:
model <-
  make_model("A -> B -> C") |>
  set_restrictions(statement = decreasing("A", "B")) |>   # rule out negative A -> B effects
  set_restrictions(statement = decreasing("B", "C")) |>   # rule out negative B -> C effects
  set_parameters(statement = increasing("A", "B"), parameters = .8)  # strong positive A -> B

model |> inspect("parameters_df")
parameters_df
Mapping of model parameters to nodal types:
param_names: name of parameter
node: name of endogeneous node associated
with the parameter
gen: partial causal ordering of the
parameter's node
param_set: parameter groupings forming a simplex
given: if model has confounding gives
conditioning nodal type
param_value: parameter values
priors: hyperparameters of the prior
Dirichlet distribution
param_names node gen param_set nodal_type given param_value priors
1 A.0 A 1 A 0 0.5000000 1
2 A.1 A 1 A 1 0.5000000 1
3 B.00 B 2 B 00 0.1000000 1
5 B.01 B 2 B 01 0.8000000 1
6 B.11 B 2 B 11 0.1000000 1
7 C.00 C 3 C 00 0.3333333 1
9 C.01 C 3 C 01 0.3333333 1
10 C.11 C 3 C 11 0.3333333 1
The same refinement for the A -> B -> C model: impose monotonicity (no negative effects) and assume the A -> B relation is very strong:
Using nodal types:
model <-
  make_model("A -> B -> C") |>
  set_restrictions(param_names = 'B.10') |>   # drop the negative-effect type for B
  set_restrictions(param_names = 'C.10') |>   # drop the negative-effect type for C
  set_parameters(param_names = 'B.01', parameters = .8)  # strong positive A -> B

model |> inspect("parameters_df")
parameters_df
Mapping of model parameters to nodal types:
param_names: name of parameter
node: name of endogeneous node associated
with the parameter
gen: partial causal ordering of the
parameter's node
param_set: parameter groupings forming a simplex
given: if model has confounding gives
conditioning nodal type
param_value: parameter values
priors: hyperparameters of the prior
Dirichlet distribution
param_names node gen param_set nodal_type given param_value priors
1 A.0 A 1 A 0 0.5000000 1
2 A.1 A 1 A 1 0.5000000 1
3 B.00 B 2 B 00 0.1000000 1
5 B.01 B 2 B 01 0.8000000 1
6 B.11 B 2 B 11 0.1000000 1
7 C.00 C 3 C 00 0.3333333 1
9 C.01 C 3 C 01 0.3333333 1
10 C.11 C 3 C 11 0.3333333 1
Plotting functionality uses the Sugiyama layout from igraph, which positions nodes to reflect their place in the causal ordering.
The plot method calls plot_model and passes provided arguments to it.
Simple model
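A sketch of the default plot (the model statement is illustrative):

library(CausalQueries)
make_model("X -> M -> Y") |> plot()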
The plot that is produced is a ggplot object, and additional layers can be added in the usual way.
Adding additional ggplot layers
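For example (the title text is illustrative):

library(ggplot2)
p <- make_model("X -> M -> Y") |> plot()
p + labs(title = "X affects Y through M")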
Provide labels in the same order as model nodes.
Nodes: A, B, C
Adding labels
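A sketch, assuming labels is among the arguments passed through to plot_model:

make_model("A -> B -> C") |>
  plot(labels = c("cause", "mediator", "outcome"))  # one label per node, in node order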
You can manually set positions using the x_coord and y_coord arguments.
Specifying coordinates
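A sketch (the coordinate values are illustrative):

make_model("A -> B -> C") |>
  plot(x_coord = 1:3, y_coord = c(1, 1, 1))  # place the three nodes on one horizontal line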
You can manually control node color and text color for all nodes together or separately.
Controlling colors
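A sketch; the argument names nodecol and textcol are an assumption about plot_model's interface:

make_model("A -> B -> C") |>
  plot(nodecol = "lightgrey", textcol = "black")  # one color for all nodes, one for all labels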
Unobserved confounding is represented using dashed curves.
Plot showing confounding
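A sketch: a bidirected edge in the statement introduces unobserved confounding, which appears as a dashed curve:

make_model("X -> Y; X <-> Y") |> plot()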