Causal models for qualitative inference and mixed methods

Authors

Macartan Humphreys and Alan Jacobs

Lecture 1 + Exercise 1

After the first session you should have an understanding of:

  • the limits of design-based inference
  • the limits of non-explicit qualitative inference
  • what a potential outcome is
  • what a causal effect is (under the potential outcomes model)
  • what a causal model is
  • what a DAG is
  • what an arrow or its absence means in a DAG
  • what a “causal type” is
  • how to construct a model in CausalQueries
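To preview the potential outcomes ideas above, here is a minimal base-R sketch (illustrative only; the type labels follow the book’s adverse/beneficial/chronic/destined terminology for a binary X and Y):

```r
# For a binary cause X and outcome Y, a unit's "causal type" is its pair of
# potential outcomes: Y(0) is what Y would be if X were 0; Y(1) if X were 1.
types <- data.frame(
  type = c("adverse", "beneficial", "chronic", "destined"),
  Y0   = c(1, 0, 0, 1),
  Y1   = c(0, 1, 0, 1)
)

# The unit-level causal effect of X on Y is the difference Y(1) - Y(0)
types$effect <- types$Y1 - types$Y0
types$effect  # -1 1 0 0: X makes a difference only for adverse and beneficial units
```

The inferential problem, of course, is that for any real case we observe at most one of the two potential outcomes.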

Lecture 2 + Exercise 2

After the second session you should have an understanding of:

  • the difference between case-level and population-level queries
  • causal queries as questions about a case’s causal type (i.e., about the \(\theta\)s)
  • how to write a query in CausalQueries
  • what Bayes’ rule is and how it works for discrete and continuous queries
  • what conditional independence is and how to read it from a causal graph
  • how conditional independence relates to the informativeness of clues for answering queries: we hope you will begin to see queries and clues as nodes on a DAG and to recognize when a clue is informative for a query
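As a concrete preview of the discrete case, here is Bayes’ rule in base R, with made-up numbers for a case-level hypothesis and a clue:

```r
# Hypothesis H: X caused Y in this case. Prior belief:
prior <- 0.5

# Probability of observing a clue K if H is true versus if it is false
# (a clue is informative only insofar as these two probabilities differ):
p_clue_given_H    <- 0.8
p_clue_given_notH <- 0.2

# Bayes' rule: P(H | K) = P(K | H) * P(H) / P(K)
p_clue    <- p_clue_given_H * prior + p_clue_given_notH * (1 - prior)
posterior <- p_clue_given_H * prior / p_clue

posterior  # 0.8: observing the clue shifts belief in H from 0.5 to 0.8
```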

Lecture 3 + Exercise 3

After the third session you should have an understanding of:

  • when process data are potentially informative about causal queries
  • why clues are not just mediators: they can occupy different positions on a DAG
  • why a model has to have substantive assumptions—beyond the structural assumptions—to allow informative inferences from process data
  • a procedure for drawing qualitative inferences
  • the benefits of an explicit strategy for process tracing
  • how to draw qualitative inferences in CausalQueries
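To give a flavor of the logic, here is a deliberately simplified base-R sketch (not the CausalQueries machinery): a chain X -> M -> Y in which, by assumption, no link has a negative effect and all type combinations are equally likely.

```r
# Each node's "nodal type" says how it responds to its parent:
#   "stuck0": always 0; "stuck1": always 1; "responsive": copies its parent.
# (The no-negative-effects restriction and uniform prior are illustrative
# assumptions, not defaults of any package.)
node_value <- function(type, parent) {
  switch(type, stuck0 = 0, stuck1 = 1, responsive = parent)
}

nodal_types <- c("stuck0", "stuck1", "responsive")
grid <- expand.grid(M_type = nodal_types, Y_type = nodal_types,
                    stringsAsFactors = FALSE)

# Values implied for a case in which X = 1
grid$M <- sapply(grid$M_type, node_value, parent = 1)
grid$Y <- mapply(node_value, grid$Y_type, grid$M)

# X causes Y in this case only if both links are responsive
grid$X_causes_Y <- grid$M_type == "responsive" & grid$Y_type == "responsive"

# Inference for a case observed to have X = 1 and Y = 1:
consistent <- grid[grid$Y == 1, ]
mean(consistent$X_causes_Y)   # 0.2: belief before seeing the clue

# Process tracing: observe the mediator clue M = 1 and condition on it
with_clue <- consistent[consistent$M == 1, ]
mean(with_clue$X_causes_Y)    # 0.25: the clue raises belief that X caused Y
```

Note how the update depends entirely on the substantive restrictions: with no assumptions about the types, observing M would do nothing.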

Lecture 4 + Exercise 4

After the fourth session you should have an understanding of:

  • what some key population queries are
  • what the parameters of a mixed methods model are (the \(\lambda\)s)
  • Bayesian approaches to updating on population queries
  • what mixtures of quantitative and qualitative data look like
  • why, fundamentally, updating procedures are the same whether you update on treatment-outcome data, process data, or mixed data
  • how qualitative (process) data can help quantitative inferences
  • how quantitative data can help qualitative inferences
  • what data gathering you might use in practice in a mixed methods evaluation
  • how to do all this in CausalQueries
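As a stripped-down illustration of Bayesian updating on a population query (this is not how CausalQueries implements it, and the two-type assumption is ours for simplicity):

```r
# Illustrative assumption: every unit is either "beneficial" (Y = X) or
# "chronic" (Y = 0), and lambda is the population share of beneficial units,
# which here is also the average treatment effect.
# Data: among n treated units (X = 1) we observe k with Y = 1.
n <- 20
k <- 14

# Grid approximation to Bayes' rule with a uniform prior on lambda
lambda     <- seq(0, 1, by = 0.001)
prior      <- rep(1, length(lambda))
likelihood <- dbinom(k, size = n, prob = lambda)
posterior  <- prior * likelihood
posterior  <- posterior / sum(posterior)

# Posterior mean; with a flat prior this is approximately (k + 1) / (n + 2)
sum(lambda * posterior)  # ~0.68
```

The same mechanics apply whether the likelihood comes from treatment-outcome data, process data, or both: only the mapping from parameters to observable data patterns changes.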

What we have not covered but you can read more about in the book:

  • how to assess how dependent your inferences are on your model
  • how to assess whether your model is doing more harm than good!

Advance watching and reading

The causal models approach we’ll be teaching is fairly complex, and you will get much more out of the course if you prepare in advance by engaging with our book, Integrated Inferences.

First, watch these four introductory videos to get the general idea - they will make the reading a bit easier: https://integrated-inferences.github.io/videos.html

Then, read the following chapters from the open access pre-print: https://integrated-inferences.github.io/book/

Chapters 2, 4, 5, 7, 9

The text is fairly dense, so don’t let yourself get bogged down if there are things you don’t understand. Just keep going. There will be plenty of time for questions and further explanation in class, but it’s critical that you come in with this overview.

Software installation

In class, we will be doing hands-on exercises with the software package through which the approach can be implemented, CausalQueries. You will thus need to have all necessary software installed before the start of class on the first day.

  • Make sure you have an up-to-date installation of R.
  • Make sure you have an up-to-date installation of RStudio.
  • Install CausalQueries from CRAN, e.g. via install.packages("CausalQueries") in RStudio.
  • To check that the package is working, try to make and update a model like this: model <- make_model("X -> Y") |> update_model()

Identify data (Optional)

During the class we will practice forming causal models and drawing inferences from them. We will work with simulated examples and some real examples that we used in our book. However, if you have an application in mind, we encourage you to identify data, even if imperfect, on which you can try all of this out. Good data for this would have the following features:

  • A key outcome variable you particularly care about
  • One or two possible causes that you care about
  • Data on at least some of the cases that capture either relevant features of context or aspects of the processes connecting causes and outcomes
  • All variables can be plausibly dichotomized (measured as 0 or 1)
  • Keep things simple: use a setting where each observation is independent, so avoid data with complex hierarchical, clustering, or time-dependence features.
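For instance, a continuous outcome can often be dichotomized at a substantively meaningful cutoff (the variable name and threshold here are hypothetical):

```r
# Hypothetical continuous outcome, e.g. growth rates for six cases
growth <- c(-1.2, 0.4, 2.5, 3.1, -0.3, 1.8)

# Dichotomize: 1 if growth is positive, 0 otherwise
Y <- as.integer(growth > 0)
Y  # 0 1 1 1 0 1
```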