Causal models for qualitative and mixed methods inference

Lecture 2: Queries and inferences

Macartan Humphreys and Alan Jacobs

1 Queries

(Alan)

1.1 Estimands (causal questions)

Estimands are the things you want to know.

A few common causal estimands:

  • Case-level causal effects
    • e.g., Did/does \(X\) have a positive effect on \(Y\) in this case?
    • e.g., Did \(X=1\) cause \(Y=1\) in this case with \(X=1, Y=1\)? (“causes of effects”)
  • Causal pathways
    • e.g., Does \(X\) affect \(Y\) through \(M\)?
      • For this case?
      • For the population as a whole?
  • Average causal effects for a population
    • e.g., What is the average effect of \(X\) on \(Y\)? (“effects of causes”)

1.2 Case-level causal effects

  • In qualitative research, we often see an outcome and want to know what caused it
    • Why does \(Y=1\) in this case?
    • Why did Argentina democratize in the 1980s?
  • And we often have suspected causes:
    • Did \(X=1\) cause \(Y=1\) in this case?
    • Did \(W=1\) cause \(Y=1\) in this case?
    • Did military defeat cause Argentina’s democratization?
  • “Did \(X=1\) cause \(Y=1\) in this case?” means (in a binary framework)….
    • If \(X\) had been \(0\) in this case, would \(Y\) have been \(0\)?
  • “Did military defeat cause Argentina’s democratization?” means….
    • If military defeat had not happened, would Argentina’s democratization have failed to happen?

1.3 The probability of a case-level causal effect

  • Our estimates of case-level causal effects have uncertainty

    • Our inferences thus take the form of probabilities – as degrees of belief
  • We call this the probability of causation

  • We always condition case-level queries on what we already know about the case

    • Given that \(X=1\) and \(Y=1\) in this case, what’s the probability that \(X=1\) caused \(Y=1\)?
    • Given that Argentina experienced military defeat and democratized, what’s the probability that the former caused the latter?

1.4 Case-level causal effects and types

  • “Did \(X=1\) cause \(Y=1\) in this case?” means (in a binary framework)….
    • If \(X\) had been \(0\) in this case, would \(Y\) have been \(0\)?
  • This is the same as asking: is the case a \(\theta_{01}\) type?
    • This is a type in which
      • \(Y=1\) if \(X=1\)
      • \(Y=0\) if \(X=0\)
  • More precisely we ask, what’s the probability that this case is a \(\theta_{01}\) type?
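
As a preview of the syntax introduced in Section 2, this query can be posed directly in CausalQueries. A minimal sketch, assuming a two-node model with the package's default flat priors:

library(CausalQueries)

# Probability that this is a theta^Y_01 case, given that we observe X = 1 and Y = 1
make_model("X -> Y") |>
  query_model("Y[X=1] > Y[X=0]", given = "X==1 & Y==1")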

1.5 Causal queries on a DAG

Causal queries can be expressed as questions about nodes on a causal graph

1.6 Causal queries on a DAG: case-level causal effect

  • Does \(X\) have a positive effect on \(Y\) in this case?
    • This is a question about the value of \(\theta^Y\)
  • Does \(\theta^Y = \theta^Y_{01}\)?

1.7 Pathway questions

  • How did \(X\) cause \(Y\) in this case?

    • What is the pathway or mechanism?
  • We might think there’s a possible pathway through \(M\), but did that pathway actually operate in this case?

  • There could be other possibilities

  • Did military defeat cause democratization in Argentina through creating division within the army?

    • Or did military defeat cause democratization in Argentina through generating popular protest?
  • Also a question about types

1.8 Causal queries on a DAG: pathway questions

  • Does \(X\) have a positive effect on \(Y\) through \(M\) in this case?
  • A question about \(\theta^M\) and \(\theta^Y\) in this case

1.9 Causal queries on a DAG: pathway questions

  • A question about \(\theta^M\)
    • Does \(M\) respond to \(X\)?
    • E.g., is \(\theta^M = \theta_{01}^M\)?
  • Questions about \(\theta^Y\)
    • When \(M\) changes the way it does when \(X\) goes from 0 to 1, does \(Y\) go from 0 to 1, even when \(X\) is held constant?
    • Does \(Y\) increase if \(X\) increases but \(M\) is held constant?
  • We work through this in the “Causal Questions” chapter, but the larger point is:
    • We can define this causal question (like any causal question) as a question about combinations of nodal types (i.e., causal types)
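
A sketch of how a pathway query maps onto such type combinations, using get_query_types() (covered in Section 2) on an assumed chain model:

make_model("X -> M -> Y") |>
  get_query_types("(M[X=1] > M[X=0]) & (Y[M=1] > Y[M=0])")

# Returns the causal types in which X raises M and M raises Y:
# a positive pathway through M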

1.10 Causal queries with conditioning

  • We can ask: what is the effect of \(X\) on \(Y\) given \(M=1\)?
  • Importantly, different from: what is the effect of \(X\) on \(Y\) if \(M\) is set to \(1\)?

1.11 Types of conditioning

We distinguish between:

  • the effect of \(X\) on \(Y\) when \(M\) “happens to” take on a given value
  • the effect of \(X\) on \(Y\) when \(M\) “is set to” or “fixed at” a given value

Say we think that school attendance (\(X\)) increases income (\(Y\)) via education (\(M\)).

  • then we will be more convinced that \(X=1\) caused \(Y=1\) if we observe that \(M=1\), BUT
  • we would not expect \(X\) to cause \(Y\) if we were to fix \(M\) to \(1\) – that is, if we were to set the education level to some value ourselves

2 Queries in CausalQueries

(Macartan)

2.1 Defining queries in CausalQueries

  • We use [condition] to describe conditions that are set
  • We use “given = condition” arguments to describe conditions that happen to hold

Examples:

  • \(Y[X=1]\) is the value of \(Y\) when \(X\) is set to 1. (do\((X=1)\), or \(X \leftarrow 1\))
  • \(Y[X=1] - Y[X=0]\) is the effect of \(X\) on \(Y\)
  • \(Y[X=1, M = 1] - Y[X=0, M=1]\) is the effect of \(X\) on \(Y\) when \(M\) is set to 1
  • \(Y[X=1] - Y[X=0]\) given \(M==1\) is the effect of \(X\) on \(Y\) when \(M\) happens to be 1

2.2 Conditioning in CausalQueries

You can pose multiple queries at once, and can evaluate queries using parameters, priors, or posteriors.

What are each of these?

model <- make_model("X -> M -> Y")

model |> query_model("Y[X=1] - Y[X=0]")

model |> query_model("Y[X=1, M=1] - Y[X=0, M=1]")

model |> query_model("Y[X=1] - Y[X=0]", given = "M==1")

model |> query_model("Y[X=1] - Y[X=0]", using = "posteriors")

2.3 From statements to types

In general, even complex queries can be written as sets of causal types. CausalQueries translates from causal statements to causal types.

make_model("X->Y") |>
  get_query_types("Y[X=1] == 1")

Causal types satisfying query's condition(s)  

 query =  Y[X=1]==1 

X0.Y01  X1.Y01
X0.Y11  X1.Y11


 Number of causal types that meet condition(s) =  4
 Total number of causal types in model =  8

2.4 From statements to types

In the output above, \(X0.Y01\), for instance, denotes the causal type combining \(\theta^X = \theta^X_0\) with \(\theta^Y = \theta^Y_{01}\). A second example asks for the types under which \(X\) has no effect on \(Y\):

make_model("X->Y") |>
  get_query_types("Y[X=1] == Y[X=0]")

Causal types satisfying query's condition(s)  

 query =  Y[X=1]==Y[X=0] 

X0.Y00  X1.Y00
X0.Y11  X1.Y11


 Number of causal types that meet condition(s) =  4
 Total number of causal types in model =  8

2.5 Lotsa queries (see handout)

| Model | Query | Given | Interpretation | Types |
|:------|:------|:------|:---------------|:------|
| X -> Y | Y[X=1] > Y[X=0] | - | Probability that X has a positive effect on Y | X0.Y01, X1.Y01 |
| X -> Y | Y[X=1] < Y[X=0] | X == 1 | Probability that X has a negative effect on Y among those for whom X=1 | X1.Y10 |
| X -> Y | Y[X=1] > Y[X=0] | X==1 & Y==1 | Probability that Y=1 is due to X=1 (attribution) | X1.Y01 |
| X -> Y <- W | Y[X=1] > Y[X=0] | W == 1 | Probability that X has a positive effect on Y for a case in which W = 1 (where W is possibly defined post treatment) | W1.X0.Y0001, W1.X1.Y0001, W1.X0.Y1001, W1.X1.Y1001, W1.X0.Y0011, W1.X1.Y0011, W1.X0.Y1011, W1.X1.Y1011 |
| X -> Y <- W | Y[X=1, W = 1] > Y[X=0, W = 1] | W==0 | Probability that X would have a positive effect on Y if W were set to 1, for cases in which in fact W=0 | W0.X0.Y0001, W0.X1.Y0001, W0.X0.Y1001, W0.X1.Y1001, W0.X0.Y0011, W0.X1.Y0011, W0.X0.Y1011, W0.X1.Y1011 |
| X -> Y <- W | Y[X=1] > Y[X=0] | Y[W=1] > Y[W=0] | Probability that X has a positive effect on Y for a case in which W has a positive effect on Y | W0.X0.Y0110, W1.X1.Y0001, W1.X1.Y1001, W0.X0.Y0111 |
| X -> Y <- W | (Y[X=1, W = 1] > Y[X=0, W = 1]) > (Y[X=1, W = 0] > Y[X=0, W = 0]) | W==1 & X==1 | Probability of a positive interaction between W and X for Y: the probability that the effect of X on Y is stronger when W is larger | W1.X1.Y0001, W1.X1.Y1001, W1.X1.Y1011 |
| X -> M -> Y <- X | Y[X = 1, M = M[X=1]] > Y[X = 0, M = M[X=1]] | X==1 & M==1 & Y==1 | Probability that X would have a positive effect on Y if M were held at the level it would take if X were 1, for units for which in fact M==1 | X1.M01.Y0001, X1.M11.Y0001, X1.M01.Y1001, X1.M11.Y1001, X1.M01.Y0101, X1.M11.Y0101, X1.M01.Y1101, X1.M11.Y1101 |
| X -> M -> Y <- X | (Y[M = 1] > Y[M = 0]) & (M[X = 1] > M[X = 0]) | Y[X=1] > Y[X=0] & M==1 | Probability that X causes M and M causes Y among units for which M = 1 and X causes Y | X1.M01.Y0001, X1.M01.Y0011 |

3 Bayesian updating

(Macartan)

Updating on causal quantities

3.1 Outline

  1. Bayesian reasoning
  2. Bayesian calculations by hand

3.2 Bayesian reasoning

  • Bayesian methods are just sets of procedures to figure out how to update beliefs in light of new information.

  • We begin with a prior belief about the probability that a hypothesis is true.

  • New data then allow us to form a posterior belief about the probability of the hypothesis.

3.2.1 Bayes Rule

Bayesian inference takes into account:

  • the consistency of the evidence with a hypothesis
  • the uniqueness of the evidence to that hypothesis
  • background knowledge about the problem.

3.2.2 Illustration 1

I draw a card from a deck and ask: what are the chances it is the Jack of Spades?

  • Just 1 in 52.

Now I tell you that the card is indeed a spade. What would you guess?

  • 1 in 13

What if I told you it was a heart?

  • No chance it is the Jack of Spades

What if I said it was a face card and a spade?

  • 1 in 3.

3.2.3 Illustration 1

These answers are applications of Bayes’ rule.

In each case the answer is derived by:

  • assessing what is possible…
  • given the new information, and then
  • assessing how likely the outcome of interest is among the states that are possible.

In each case, you calculate:

\[\text{Prob Jack of Spades | Info} = \frac{\text{Is Jack of Spades Consistent w/ Info?}}{\text{How many cards are consistent w/ Info?}} \]
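
These answers can be checked by brute-force enumeration. A minimal sketch in R:

# Enumerate the deck and apply the formula: count consistent cards
deck <- expand.grid(rank = c("A", 2:10, "J", "Q", "K"),
                    suit = c("spades", "hearts", "diamonds", "clubs"))
is_js <- deck$rank == "J" & deck$suit == "spades"

mean(is_js)                        # 1/52: no information
mean(is_js[deck$suit == "spades"]) # 1/13: the card is a spade
mean(is_js[deck$suit == "hearts"]) # 0: the card is a heart
mean(is_js[deck$rank %in% c("J", "Q", "K") & deck$suit == "spades"]) # 1/3: face card and spade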

3.2.4 Illustration 2 Interpreting Your Test Results

You take a test to see whether you suffer from a disease that affects 1 in 100 people. The test is good in the following sense:

  • if you have the disease, then with a 99% probability it will say you have the disease
  • if you do not have it, then with a 99% probability, it will say that you do not have it

The test result says that you have the disease. What are the chances you have it?

3.2.5 Illustration 2 Interpreting Your Test Results

  • It is not 99%. 99% is the probability of the result given the disease, but we want the probability of the disease given the result.

  • The right answer is 50%, which you can think of as the share of people who have the disease among all those who test positive.

  • For example, if there were 10,000 people, then 100 would have the disease and 99 of these would test positive. But 9,900 would not have the disease and 99 of these would also test positive. So the people with the disease who test positive are half of the total number testing positive.

3.2.6 Illustration 2. A picture

What’s the probability of being a circle given you are black?

3.2.7 Illustration 2. More formally.

As an equation this might be written:

\[\text{Prob You have the Disease | Pos} = \frac{\text{How many have the disease and test pos?}}{\text{How many people test pos?}}\]

\[\frac{0.01 \times 10000 \times 0.99}{0.01 \times 10000 \times 0.99 + 0.99 \times 10000 \times 0.01} =\frac12 \]
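
The same arithmetic in R, with the numbers given in the text:

prior <- 0.01  # prevalence: 1 in 100
sens  <- 0.99  # P(test positive | disease)
spec  <- 0.99  # P(test negative | no disease)

p_pos <- sens * prior + (1 - spec) * (1 - prior)  # overall rate of positive tests
sens * prior / p_pos                              # P(disease | positive) = 0.5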

3.2.8 Two Child Problem

Consider last an old puzzle described in Gardner (1961).

  • Mr Smith has two children, \(A\) and \(B\).
  • At least one of them is a boy.
  • What are the chances they are both boys?

To be explicit about the puzzle, we will assume that the information that one child is a boy is given as a truthful answer to the question “is at least one of the children a boy?”

We also assume that each child is a boy with 50% probability.

3.2.9 Two Child Problem

As an equation:

\[\text{Prob both boys | Not both girls} = \frac{\text{Prob both boys}}{\text{Prob not both girls}} = \frac{\text{1 in 4}}{\text{3 in 4}} = \frac{1}{3}\]
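
A brute-force check, enumerating the four equally likely sibling pairs:

children <- expand.grid(A = c("boy", "girl"), B = c("boy", "girl"))
both_boys        <- children$A == "boy" & children$B == "boy"
at_least_one_boy <- children$A == "boy" | children$B == "boy"
mean(both_boys) / mean(at_least_one_boy)  # (1/4) / (3/4) = 1/3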

3.2.10 Bayes Rule

Formally, all of these equations are applications of Bayes’ rule, which is a simple and powerful formula for deriving updated beliefs from new data.

The formula is:

\[ \Pr(H|\mathcal{D})= \frac{\Pr(\mathcal{D}, H)}{\Pr(\mathcal{D})} \]

Equivalently:

\[ \Pr(H|\mathcal{D})= \frac{\Pr(\mathcal{D}|H)\Pr(H)}{\Pr(\mathcal{D})} \]

or:

\[ \Pr(H|\mathcal{D})= \frac{\Pr(\mathcal{D}|H)\Pr(H)}{\sum_{H'}\Pr(\mathcal{D}|H')\Pr(H')} \]
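
The last version of the formula is easy to implement for any discrete set of hypotheses. A minimal sketch (the function name is ours, for illustration):

# Posterior over hypotheses H' given a prior and the likelihoods Pr(D | H')
bayes_update <- function(prior, likelihood) likelihood * prior / sum(likelihood * prior)

# The disease test again, with hypotheses (disease, no disease):
bayes_update(prior = c(0.01, 0.99), likelihood = c(0.99, 0.01))  # 0.5 0.5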

3.2.11 Causal Problem

What’s the probability that \(Y=1\) is due to \(X=1\)?

\[\begin{eqnarray*} \Pr(\theta^Y = \theta^Y_{01} | X=1, Y=1)&=&\frac{\Pr(X=1, Y=1|\theta^Y_{01})\Pr(\theta^Y_{01})}{\Pr(X=1, Y=1)}\\ &=&\frac{\Pr(X=1)\Pr(\theta^Y_{01})}{\Pr(X=1)\Pr(\theta^Y_{01})+ \Pr(X=1)\Pr(\theta^Y_{11})} \end{eqnarray*}\]

So:

\[ \Pr(\theta^Y_{01} | X=1, Y=1)=\frac{\Pr(\theta^Y_{01})}{\Pr(\theta^Y_{01})+ \Pr(\theta^Y_{11})} \]
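
For instance, instantiating this formula with flat priors over the four nodal types gives a probability of causation of one half:

# Flat priors over theta^Y in {00, 10, 01, 11}
p <- c(Y00 = 0.25, Y10 = 0.25, Y01 = 0.25, Y11 = 0.25)
unname(p["Y01"] / (p["Y01"] + p["Y11"]))  # 0.5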

3.2.12 Causal Problem

More generally

\[\Pr(Q|D) = \sum_{\theta \in Q} \Pr(\theta | D)=\sum_{\theta \in Q}\frac{\Pr(D,\theta)}{\Pr(D)}=\frac{\sum_{\theta \in Q}\Pr(D,\theta)}{\Pr(D)}=\frac{\sum_{\theta \in Q \cap D}\Pr(\theta)}{\sum_{\theta\in D}\Pr(\theta)}\]

so:

\[ \frac{\text{Probability of types consistent with the data and the query}}{\text{Probability of types consistent with the data}} \]

3.3 Continuous distributions

For continuous distributions and parameter vector \(\theta\):

\[p(\theta|\mathcal{D})=\frac{p(\mathcal{D}|\theta)p(\theta)}{\int p(\mathcal{D}|\theta')p(\theta')d\theta'}\]

3.3.1 Illustration of continuous distributions

Consider the share of people in a population that voted, or the share of people for whom there is a positive treatment effect.

  • This is a quantity between 0 and 1.

  • It can be represented by a continuous distribution: the beta distribution, with parameters that reflect how much data you have observed.

3.3.2 Beta

  • The Beta distribution is a distribution over the interval \([0,1]\) that is governed by two parameters, \(\alpha\) and \(\beta\).
  • In the case in which both \(\alpha\) and \(\beta\) are 1, the distribution is uniform – all values are seen as equally likely.
  • As \(\alpha\) rises, larger outcomes are seen as more likely
  • As \(\beta\) rises, lower outcomes are seen as more likely.
  • If both rise proportionately the expected outcome does not change but the distribution becomes tighter.

An attractive feature is that if one has a prior Beta(\(\alpha\), \(\beta\)) over the probability of some event, and then one observes a positive case, the Bayesian posterior distribution is also a Beta, with parameters \(\alpha+1, \beta\). Thus if people start with uniform priors and build up knowledge on seeing outcomes, their posterior beliefs should be Beta.
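
A quick sketch of this conjugate updating in R (the data here are made up for illustration):

# Uniform Beta(1, 1) prior, then observe 3 positive and 1 negative case
alpha <- 1; beta_par <- 1
pos <- 3; neg <- 1

# Posterior is Beta(alpha + pos, beta_par + neg) = Beta(4, 2)
curve(dbeta(x, alpha + pos, beta_par + neg), from = 0, to = 1,
      xlab = "share", ylab = "posterior density")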

3.3.3 Beta

Here is a set of such distributions.

Figure: Beta distributions for a range of \(\alpha\) and \(\beta\) values.

4 Conditional independence

(Alan)

Figuring out when A tells you something about B

4.1 Key insight linking Causal Models to Bayes

The key insight that connects a causal model to Bayesian updating is this:

  • a causal model can tell you how to update on one feature given information on another

4.2 Conditional independence and graph structure

  • What DAGs do is tell you when one variable is independent of another variable given some third variable.
  • Intuitively:
    • what variables “shield off” the influence of one variable on another
    • e.g. If inequality causes revolution via discontent, then inequality and revolution should be related to each other overall, but not related to each other among highly contented cases or among discontented cases
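
A quick simulation of the inequality example (the probabilities are illustrative assumptions, not estimates):

# Chain: inequality -> discontent -> revolution
n <- 1e5
inequality <- rbinom(n, 1, 0.5)
discontent <- rbinom(n, 1, ifelse(inequality == 1, 0.8, 0.2))
revolution <- rbinom(n, 1, ifelse(discontent == 1, 0.7, 0.1))

cor(inequality, revolution)                                    # clearly positive overall
cor(inequality[discontent == 1], revolution[discontent == 1])  # roughly zero given discontent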

4.2.1 Conditional independence

Variable sets \(A\) and \(B\) are conditionally independent, given \(C\), if for all \(a\), \(b\), \(c\):

\[\Pr(A = a | C = c) = \Pr(A = a | B = b, C = c)\]

Informally: given \(C\), knowing \(B\) tells you nothing more about \(A\).

4.2.2 Three elemental structures for thinking about conditional independence

These are chains (\(\bullet\rightarrow \bullet\rightarrow\bullet\)), forks (\(\bullet\leftarrow \bullet\rightarrow\bullet\)), and inverted forks or colliders (\(\bullet\rightarrow \bullet\leftarrow\bullet\)).

4.2.3 Conditional independence from graphs

\(A\) and \(B\) are conditionally independent, given \(C\), if on every path between \(A\) and \(B\):

  • there is some chain (\(\bullet\rightarrow \bullet\rightarrow\bullet\) or \(\bullet\leftarrow \bullet\leftarrow\bullet\)) or fork (\(\bullet\leftarrow \bullet\rightarrow\bullet\)) with the central element in \(C\) (i.e., all dependencies are blocked by \(C\)),

or

  • there is an inverted fork (\(\bullet\rightarrow \bullet\leftarrow\bullet\)) with the central element (and its descendants) not in \(C\) (i.e., no dependencies are opened by \(C\))

Notes:

  • In this case we say that \(A\) and \(B\) are d-separated by \(C\).
  • \(A\), \(B\), and \(C\) can all be sets
  • Note that a path can involve arrows pointing in any direction: \(\bullet\rightarrow \bullet\rightarrow \bullet\leftarrow \bullet\rightarrow\bullet\)

4.2.4 Test yourself

Are \(A\) and \(D\) independent:

  • if you do not condition on anything?
  • if you condition on B?
  • if you condition on C?
  • if you condition on B and C?
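
The slide's DAG is not reproduced here, so as a stand-in consider the hypothetical graph \(A \rightarrow B \rightarrow D\) with a collider \(A \rightarrow C \leftarrow D\). The dagitty package (separate from CausalQueries) can check d-separation mechanically:

library(dagitty)

# Hypothetical DAG: a chain A -> B -> D plus a collider A -> C <- D
g <- dagitty("dag { A -> B ; B -> D ; A -> C ; D -> C }")

dseparated(g, "A", "D", list())       # FALSE: the chain A -> B -> D is open
dseparated(g, "A", "D", "B")          # TRUE: B blocks the chain; collider C stays closed
dseparated(g, "A", "D", "C")          # FALSE: the chain remains open (and C is a conditioned collider)
dseparated(g, "A", "D", c("B", "C"))  # FALSE: B blocks the chain but conditioning on C opens a path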

4.3 Theorem

Key insight for process tracing:

  • Informativeness of a node hinges on conditional independence
  • If \(K\) is not conditionally independent of \(\theta\) then \(K\) is (possibly) informative for \(\theta\).
  • If \(K\) is conditionally independent of \(\theta\) then \(K\) is not informative for \(\theta\).

4.3.1 Next steps

When we turn to process tracing, we will unpack the usefulness of this result.

5 Updating in CausalQueries

(Macartan)

CausalQueries brings these elements together

5.1 Big picture

CausalQueries brings these elements together by allowing users to:

  1. Make model: specify a DAG; CausalQueries figures out all causal types and places probabilities on them (parameters)
  2. Query model: CausalQueries figures out which types correspond to a given causal query, given case data, and computes the probability of these (given some parameters)
  3. Update model: CausalQueries figures out which parameters are more likely given data, allowing you to query using these updated parameters

5.2 Illustration of 1 and 2

When we pose queries given different data we are in fact using Bayesian updating.

Below we update our beliefs about a single case given different data we might see. This, essentially, is what process tracing is (in our view).

make_model('X -> M -> Y') |>
  set_restrictions(decreasing("X", "M"))|>
  set_restrictions(decreasing("M", "Y"))|>
  query_model(
    query = "Y[X=1] != Y[X=0]",
    given = c("All",
              "X==1 & Y ==1", 
              "M==0", 
              "M==1",
              "X==1 & Y ==1 & M==0",
              "X==1 & Y ==1 & M==1")) 

Causal queries generated by query_model (all at population level)

|label                                      |query            |given               |using      |  mean|
|:------------------------------------------|:----------------|:-------------------|:----------|-----:|
|Y[X=1] != Y[X=0]                           |Y[X=1] != Y[X=0] |-                   |parameters | 0.111|
|Y[X=1] != Y[X=0] given X==1 & Y ==1        |Y[X=1] != Y[X=0] |X==1 & Y ==1        |parameters | 0.200|
|Y[X=1] != Y[X=0] given M==0                |Y[X=1] != Y[X=0] |M==0                |parameters | 0.111|
|Y[X=1] != Y[X=0] given M==1                |Y[X=1] != Y[X=0] |M==1                |parameters | 0.111|
|Y[X=1] != Y[X=0] given X==1 & Y ==1 & M==0 |Y[X=1] != Y[X=0] |X==1 & Y ==1 & M==0 |parameters | 0.000|
|Y[X=1] != Y[X=0] given X==1 & Y ==1 & M==1 |Y[X=1] != Y[X=0] |X==1 & Y ==1 & M==1 |parameters | 0.250|

5.3 Illustration of 3

We will spend more time on 3 later but in brief:

  • update_model() lets you update over population parameters, not just case types
model <- make_model("X->Y") 
data  <- data.frame(X = c(0,0,1,1), Y = c(0,0,1,1))

model <- update_model(model, data)

model |> query_model("Y[X=1] - Y[X=0]", using = "posteriors")

Causal queries generated by query_model (all at population level)

|label           |using      |  mean|    sd| cred.low| cred.high|
|:---------------|:----------|-----:|-----:|--------:|---------:|
|Y[X=1] - Y[X=0] |posteriors | 0.349| 0.259|   -0.168|     0.817|
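
If you want more than summary statistics, query_distribution() returns the full set of posterior draws for a query. A sketch, assuming the updated model from above and query_distribution() as provided by CausalQueries:

# Data frame of posterior draws for the average treatment effect
draws <- model |> query_distribution("Y[X=1] - Y[X=0]", using = "posteriors")
hist(draws[[1]], xlab = "Y[X=1] - Y[X=0]", main = "Posterior draws for the ATE")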