Lecture 2: Queries and inferences
(Alan)
Estimands are the things you want to know.
A few common causal estimands:
Our estimates of case-level causal effects have uncertainty
We call this probability of causation
We always condition case-level queries on what we already know about the case
Causal queries can be expressed as nodes on a causal graph
How did \(X\) cause \(Y\) in this case?
We might think there’s a possible pathway through \(M\), but did that pathway actually operate in this case?
There could be other possibilities
Did military defeat cause democratization in Argentina through creating division within the army?
Also a question about types
We distinguish between:
Say we think that school attendance (X) increases income (Y) via education (M).
CausalQueries
(Macartan)
In CausalQueries you use:

- `[condition]` inside a query to describe conditions that are *set*
- `given = condition` arguments to describe conditions that *happen to hold*

Examples:

- `Y[X=1]` is the value \(Y\) takes when \(X\) is set to 1 (do\((X=1)\), or \(X \leftarrow 1\))
- the effect of \(X\) on \(Y\) given \(M==1\) is the effect of \(X\) on \(Y\) among cases where \(M\) happens to be 1

You can pose multiple queries at once, and ask for queries based on parameters, priors, or posteriors.
What are each of these?
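As a minimal sketch of this syntax (the `X -> M -> Y` model and the specific query here are only illustrative), the call below asks for the effect of \(X\) on \(Y\) among cases where \(M\) happens to equal 1, evaluated using the model's parameters:

```r
# Minimal sketch of CausalQueries query syntax (illustrative model and query)
library(CausalQueries)

model <- make_model("X -> M -> Y")

query_model(model,
            query = "Y[X=1] - Y[X=0]",  # X is *set* to 1 and to 0 inside the query
            given = "M==1",             # restrict to cases where M *happens* to be 1
            using = "parameters")
```

Swapping `using = "parameters"` for `"priors"` or `"posteriors"` evaluates the same query under prior or posterior parameter draws.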
In general even complex queries can be written as sets of causal types. CausalQueries translates from causal statements to causal types.
| Model | Query | Given | Interpretation | Types |
|---|---|---|---|---|
| X -> Y | Y[X=1] > Y[X=0] | - | Probability that X has a positive effect on Y | X0.Y01, X1.Y01 |
| X -> Y | Y[X=1] < Y[X=0] | X == 1 | Probability that X has a negative effect on Y among those for whom X=1 | X1.Y10 |
| X -> Y | Y[X=1] > Y[X=0] | X==1 & Y==1 | Probability that Y=1 is due to X=1 (Attribution) | X1.Y01 |
| X -> Y <- W | Y[X=1] > Y[X=0] | W == 1 | Probability that X has a positive effect on Y for a case in which W = 1 (where W is possibly defined post treatment) | W1.X0.Y0001, W1.X1.Y0001, W1.X0.Y1001, W1.X1.Y1001, W1.X0.Y0011, W1.X1.Y0011, W1.X0.Y1011, W1.X1.Y1011 |
| X -> Y <- W | Y[X=1, W = 1] > Y[X=0, W = 1] | W==0 | Probability that X has a positive effect on Y if W were set to 1 for cases for which in fact W=0 | W0.X0.Y0001, W0.X1.Y0001, W0.X0.Y1001, W0.X1.Y1001, W0.X0.Y0011, W0.X1.Y0011, W0.X0.Y1011, W0.X1.Y1011 |
| X -> Y <- W | Y[X=1] > Y[X=0] | Y[W=1] > Y[W=0] | Probability that X has a positive effect on Y for a case in which W has a positive effect on Y | W0.X0.Y0110, W1.X1.Y0001, W1.X1.Y1001, W0.X0.Y0111 |
| X -> Y <- W | (Y[X=1, W = 1] > Y[X=0, W = 1]) > (Y[X=1, W = 0] > Y[X=0, W = 0]) | W==1 & X==1 | Probability of a positive interaction between W and X for Y; the probability that the effect of X on Y is stronger when W is larger | W1.X1.Y0001, W1.X1.Y1001, W1.X1.Y1011 |
| X -> M -> Y <- X | Y[X = 1, M = M[X=1]] > Y[X = 0, M = M[X=1]] | X==1 & M==1 & Y==1 | The probability that X would have a positive effect on Y if M were controlled at the level it would take if X were 1, for units for which in fact M==1 | X1.M01.Y0001, X1.M11.Y0001, X1.M01.Y1001, X1.M11.Y1001, X1.M01.Y0101, X1.M11.Y0101, X1.M01.Y1101, X1.M11.Y1101 |
| X -> M -> Y <- X | (Y[M = 1] > Y[M = 0]) & (M[X = 1] > M[X = 0]) | Y[X=1] > Y[X=0] & M==1 | The probability that X causes M and M causes Y among units for which M = 1 and X causes Y | X1.M01.Y0001, X1.M01.Y0011 |
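To see this mapping in code, the helper below (`map_query_to_causal_type()`, assumed to be available in your version of CausalQueries) returns the causal types consistent with a query; this sketch reproduces the first row of the table:

```r
# Sketch: list the causal types behind a query
# (map_query_to_causal_type() is assumed available in your CausalQueries version)
library(CausalQueries)

model <- make_model("X -> Y")

# Types for which X has a positive effect on Y: X0.Y01 and X1.Y01
map_query_to_causal_type(model, "Y[X=1] > Y[X=0]")
```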
(Macartan)
Updating on causal quantities
Bayesian methods are just sets of procedures to figure out how to update beliefs in light of new information.
We begin with a prior belief about the probability that a hypothesis is true.
New data then allow us to form a posterior belief about the probability of the hypothesis.
Bayesian inference takes into account:
I draw a card from a deck and ask: what are the chances it is the Jack of Spades?
Now I tell you that the card is indeed a spade. What would you guess?
What if I told you it was a heart?
What if I said it was a face card and a spade?
These answers are applications of Bayes’ rule.
In each case, the answer is derived by calculating:
\[\text{Prob Jack of Spades | Info} = \frac{\text{Is Jack of Spades Consistent w/ Info?}}{\text{How many cards are consistent w/ Info?}} \]
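A quick way to verify these answers is to enumerate the deck and count the consistent cards; the following is just a sketch of that counting exercise:

```r
# Sketch: counting cards consistent with each piece of information
cards <- expand.grid(rank = c("A", 2:10, "J", "Q", "K"),
                     suit = c("spades", "hearts", "diamonds", "clubs"))

# Probability the card is the Jack of Spades, among cards consistent with the info
p_jack_of_spades <- function(consistent) {
  mean(cards$rank[consistent] == "J" & cards$suit[consistent] == "spades")
}

p_jack_of_spades(rep(TRUE, nrow(cards)))             # no information: 1/52
p_jack_of_spades(cards$suit == "spades")             # it is a spade: 1/13
p_jack_of_spades(cards$suit == "hearts")             # it is a heart: 0
p_jack_of_spades(cards$rank %in% c("J", "Q", "K") &
                   cards$suit == "spades")           # face card and spade: 1/3
```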
You take a test to see whether you suffer from a disease that affects 1 in 100 people. The test is good in the following sense: if you have the disease, the test reports it 99% of the time; if you do not have it, the test falsely reports it only 1% of the time.
The test result says that you have the disease. What are the chances you have it?
It is not 99%. 99% is the probability of the result given the disease, but we want the probability of the disease given the result.
The right answer is 50%, which you can think of as the share of people who have the disease among all those who test positive. For example, if there were 10,000 people, then 100 would have the disease and 99 of these would test positive; 9,900 would not have the disease and 99 of these would test positive. So people with the disease who test positive make up half of all those testing positive.
What’s the probability of being a circle given you are black?
As an equation this might be written:
\[\text{Prob You have the Disease | Pos} = \frac{\text{How many have the disease and test pos?}}{\text{How many people test pos?}}\]
\[\frac{0.01 \times 10000 \times 0.99}{0.01 \times 10000 \times 0.99 + 0.99 \times 10000 \times 0.01} =\frac12 \]
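The same calculation in code, using the numbers above (a 1-in-100 disease, a 99% detection rate, and a 1% false positive rate):

```r
# Disease-test example: Pr(disease | positive test)
prevalence        <- 0.01  # 1 in 100 people have the disease
p_pos_given_dis   <- 0.99  # test detects the disease 99% of the time
p_pos_given_nodis <- 0.01  # test falsely reports the disease 1% of the time

(p_pos_given_dis * prevalence) /
  (p_pos_given_dis * prevalence + p_pos_given_nodis * (1 - prevalence))  # 0.5
```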
Consider last an old puzzle described in Gardner (1961): a family has two children, and you learn that at least one of them is a boy. What is the probability that both are boys?
To be explicit about the puzzle, we will assume that the information that one child is a boy is given as a truthful answer to the question “is at least one of the children a boy?”
We also assume that each child is a boy with 50% probability.
As an equation:
\[\text{Prob both boys | Not both girls} = \frac{\text{Prob both boys}}{\text{Prob not both girls}} = \frac{\text{1 in 4}}{\text{3 in 4}} = \frac{1}{3}\]
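The puzzle can also be checked by enumerating the four equally likely two-child families:

```r
# Sketch: enumerate two-child families and condition on "not both girls"
kids <- expand.grid(first = c("boy", "girl"), second = c("boy", "girl"))

not_both_girls <- !(kids$first == "girl" & kids$second == "girl")
both_boys      <- kids$first == "boy" & kids$second == "boy"

mean(both_boys[not_both_girls])  # 1/3
```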
Formally, all of these equations are applications of Bayes’ rule, which is a simple and powerful formula for deriving updated beliefs from new data.
The formula is:
\[ \Pr(H|\mathcal{D})= \frac{\Pr(\mathcal{D}, H)}{\Pr(\mathcal{D})} \]
Equivalently:
\[ \Pr(H|\mathcal{D})= \frac{\Pr(\mathcal{D}|H)\Pr(H)}{\Pr(\mathcal{D})} \]
or:
\[ \Pr(H|\mathcal{D})= \frac{\Pr(\mathcal{D}|H)\Pr(H)}{\sum_{H'}\Pr(\mathcal{D}|H')\Pr(H')} \]
What’s the probability that \(Y=1\) is due to \(X=1\)?
\[\begin{eqnarray*} \Pr(\theta^Y = \theta^Y_{01} | X=1, Y=1)&=&\frac{\Pr(X=1, Y=1|\theta^Y_{01})\Pr(\theta^Y_{01})}{\Pr(X=1, Y=1)}\\ &=&\frac{\Pr(X=1)\Pr(\theta^Y_{01})}{\Pr(X=1)\Pr(\theta^Y_{01})+ \Pr(X=1)\Pr(\theta^Y_{11})} \end{eqnarray*}\]
So:
\[ \Pr(\theta^Y_{01} | X=1, Y=1)=\frac{\Pr(\theta^Y_{01})}{\Pr(\theta^Y_{01})+ \Pr(\theta^Y_{11})} \]
More generally
\[\Pr(Q|D) = \sum_{\theta \in Q} \Pr(\theta | D)=\sum_{\theta \in Q}\frac{\Pr(D,\theta)}{\Pr(D)}=\frac{\sum_{\theta \in Q}\Pr(D,\theta)}{\Pr(D)}=\frac{\sum_{\theta \in Q \cap D}\Pr(\theta)}{\sum_{\theta\in D}\Pr(\theta)}\]
so:
\[ \frac{\text{Probability of types consistent with the data and the query}}{\text{Probability of types consistent with the data}} \]
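As a sketch, the same attribution probability can be computed with CausalQueries. With the default (flat) parameters of an `X -> Y` model, \(\Pr(\theta^Y_{01}) = \Pr(\theta^Y_{11}) = 0.25\), so the formula above gives 0.5:

```r
# Probability of causation under default (flat) parameters: should be 0.5
library(CausalQueries)

make_model("X -> Y") |>
  query_model(
    query = "Y[X=1] > Y[X=0]",
    given = "X==1 & Y==1",
    using = "parameters")
```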
For continuous distributions and parameter vector \(\theta\):
\[p(\theta|\mathcal{D})=\frac{p(\mathcal{D}|\theta)p(\theta)}{\int_{\theta'}p(\mathcal{D}|\theta')p(\theta')d\theta'}\]
Consider the share of people in a population who voted, or the share of people for whom there is a positive treatment effect.
This is a quantity between 0 and 1.
It can be represented by a continuous distribution: the Beta distribution, with parameters that reflect how much data you have observed.
An attractive feature is that if one has a prior Beta(\(\alpha\), \(\beta\)) over the probability of some event, and one then observes a positive case, the Bayesian posterior distribution is also a Beta, with parameters \(\alpha+1, \beta\). Thus if people start with uniform priors and build up knowledge by observing outcomes, their posterior beliefs should be Beta distributed.
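A small sketch of this updating rule: starting from a uniform Beta(1, 1) prior and observing one positive case yields a Beta(2, 1) posterior.

```r
# Sketch: Beta updating after one positive observation
alpha <- 1; beta <- 1                                  # uniform prior: Beta(1, 1)

curve(dbeta(x, alpha, beta), from = 0, to = 1, ylim = c(0, 2.5),
      xlab = "share", ylab = "density")                # prior
curve(dbeta(x, alpha + 1, beta), add = TRUE, lty = 2)  # posterior: Beta(2, 1)
```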
Here is a set of such distributions.
Beta distributions
(Alan)
Figuring out when A tells you something about B
The key insight that connects a causal model to Bayesian updating is this:
Variable sets \(A\) and \(B\) are conditionally independent given \(C\) if, for all \(a\), \(b\), \(c\):
\[\Pr(A = a | C = c) = \Pr(A = a | B = b, C = c)\]
Informally: given \(C\), knowing \(B\) tells you nothing more about \(A\).
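As a sketch, this can be checked numerically in a chain model `X -> M -> Y` (the chain is just an illustration), where \(Y\) is independent of \(X\) given \(M\): once we condition on \(M\), also conditioning on \(X\) should leave the probability that \(Y = 1\) unchanged.

```r
# Sketch: conditional independence in a chain model X -> M -> Y
library(CausalQueries)

make_model("X -> M -> Y") |>
  query_model(
    query = "Y==1",
    given = c("M==1", "M==1 & X==0", "M==1 & X==1"),
    using = "parameters")
# The three conditional probabilities coincide: knowing X adds nothing once M is known
```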
Three elemental relations of conditional independence.
\(A\) and \(B\) are conditionally independent given \(C\) if, on every path between \(A\) and \(B\), either:

- the path passes through some non-collider that is in \(C\),

or

- the path passes through some collider such that neither it nor any of its descendants is in \(C\).
Notes:
Are \(A\) and \(D\) unconditionally independent?
Key insight for process tracing:
When we turn to process tracing we will unpack the usefulness of this result
CausalQueries
(Macartan)
CausalQueries brings these elements together
CausalQueries brings these elements together by allowing users to:
1. CausalQueries figures out all causal types and places probabilities on these (parameters)
2. CausalQueries figures out which types correspond to a given causal query given case data, and figures out the probability of these (given some parameters)
3. CausalQueries figures out which parameters are more likely given data, allowing you to query using these parameters

When we pose queries given different data we are in fact using Bayesian updating.
Below we update our beliefs about a single case given different data we might observe. This, essentially, is what process tracing is (in our view).
library(CausalQueries)

make_model('X -> M -> Y') |>
  # rule out negative effects of X on M and of M on Y (monotonicity restrictions)
  set_restrictions(decreasing("X", "M")) |>
  set_restrictions(decreasing("M", "Y")) |>
  # probability that X matters for Y, under different case-level observations
  query_model(
    query = "Y[X=1] != Y[X=0]",
    given = c("All",
              "X==1 & Y ==1",
              "M==0",
              "M==1",
              "X==1 & Y ==1 & M==0",
              "X==1 & Y ==1 & M==1"))
Causal queries generated by query_model (all at population level)
|label |query |given |using | mean|
|:------------------------------------------|:----------------|:-------------------|:----------|-----:|
|Y[X=1] != Y[X=0] |Y[X=1] != Y[X=0] |- |parameters | 0.111|
|Y[X=1] != Y[X=0] given X==1 & Y ==1 |Y[X=1] != Y[X=0] |X==1 & Y ==1 |parameters | 0.200|
|Y[X=1] != Y[X=0] given M==0 |Y[X=1] != Y[X=0] |M==0 |parameters | 0.111|
|Y[X=1] != Y[X=0] given M==1 |Y[X=1] != Y[X=0] |M==1 |parameters | 0.111|
|Y[X=1] != Y[X=0] given X==1 & Y ==1 & M==0 |Y[X=1] != Y[X=0] |X==1 & Y ==1 & M==0 |parameters | 0.000|
|Y[X=1] != Y[X=0] given X==1 & Y ==1 & M==1 |Y[X=1] != Y[X=0] |X==1 & Y ==1 & M==1 |parameters | 0.250|
We will spend more time on 3 later, but in brief:
update_model() lets you update beliefs over population parameters, not just over case-level types.
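A sketch of that workflow (the `data` object here is hypothetical and stands in for whatever case data you have); the table below shows the kind of output such a call produces:

```r
# Sketch: update population-level parameters, then query the posterior
# (`data` is a hypothetical data frame of observed X, M, Y values)
library(CausalQueries)

model <- make_model("X -> M -> Y") |>
  update_model(data)                 # Bayesian updating of model parameters

query_model(model,
            query = "Y[X=1] - Y[X=0]",
            using = "posteriors")    # average treatment effect under the posterior
```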
Causal queries generated by query_model (all at population level)
|label |using | mean| sd| cred.low| cred.high|
|:---------------|:----------|-----:|-----:|--------:|---------:|
|Y[X=1] - Y[X=0] |posteriors | 0.349| 0.259| -0.168| 0.817|