18 Glossary

Term	(Typical) symbol	Meaning
Ambiguities matrix	\(A\)	A matrix of 0s and 1s that maps from causal types (rows) to data types (columns). We call it an ambiguities matrix because the mapping from causal types to data types is many-to-one: each causal type produces exactly one data type, but a data type can be produced by many causal types.
Causal function	\(f_Y(X, \theta_Y)\)	A function that maps from the possible values of the parents of a node to the possible values of the node. A change in the value of an argument is interpreted as a controlled change. Thus, \(f_Y(X=1, \theta_Y) - f_Y(X=0, \theta_Y)\) can be interpreted as the change in \(Y\) as \(X\)’s value is manipulated from 0 to 1. See Remark 2.1.
Causal model	M, M’	A triple containing: (1) a partially ordered set of (endogenous and exogenous) nodes, (2) a set of functions, one for each endogenous variable, specifying how it responds to the values of earlier variables in the ordering, and (3) a probability distribution over exogenous variables. Note that (1) and (2) together define a “structural causal model,” whereas (1), (2), and (3) describe a “probabilistic causal model,” which we refer to simply as a causal model. See Definition 2.1.
Causal type	\(\theta\)	A causal type is a concatenation of nodal types, one for each node. The causal type of a unit fully describes what values that unit takes on at all nodes and also how that unit would respond to all interventions. Example: \((\theta^X_0, \theta^Y_{01})\) is a causal type that has \(X=0\) and \(Y=0\) but would have \(Y=1\) if \(X\) were set to 1. Types like this are written in code in `CausalQueries` as `X0.Y01`.
Clue	\(K\)	A variable or collection of variables whose values are potentially informative for some query.
Conditional independence	\({\displaystyle (A\perp \!\!\!\perp B\mid C)}\)	Two (sets of) variables (\(A\) and \(B\)) are conditionally independent given some third (set of) variables (\(C\)) if, for all \(a\), \({\displaystyle \Pr(A=a\mid B,C)=\Pr(A=a\mid C)}\). See Definition 2.2.
Credibility interval		A set of possible values within which we believe a parameter lies with some specified probability. In tables we often use `cred low` and `cred high` to indicate the lower and upper bounds of a 95% credibility interval.
DAG		Directed acyclic graph. A graphical representation of a structural causal model, indicating nodes, parent-child relations, and relations of conditional independence.
Data strategy	\(S\)	A plan indicating for how many units data of different types will be gathered. A data strategy might indicate what new data will be gathered at one point as a function of what has already been seen at earlier points.
Dirichlet priors	alpha, \(\alpha\)	Strictly positive numbers used to characterize a prior distribution over a simplex. The implied mean is the normalized vector \(\mu= \alpha/\sum_j\alpha_j\), and the variance of component \(k\) is \(\mu_k(1-\mu_k)/(1+\sum_j\alpha_j)\). See Section 5.1.4.
Data type or event type		A possible set of values on all nodes (including, possibly, NAs). Example: `X0Y1` \(= (X=0, Y = 1)\).
Endogenous node	\(X\), \(Y\)	A node that is a function of other nodes (whether these are just exogenous nodes, or a mix of endogenous and exogenous nodes). All substantive nodes in a model are typically endogenous in that they, minimally, have an exogenous (\(\theta^j\)) node pointing into them.
Event probability	\(w\)	The probability of a data type or event type arising. Example: \(w_{01}=\Pr(X=0, Y=1)\).
Exogenous node	\(\theta^X\), \(\theta^Y\)	A node that is not a function of other nodes in a model. Exogenous nodes are often not represented on causal graphs, but in general there is implicitly one exogenous node for each endogenous node. In this book’s use of causal models, exogenous nodes typically represent nodal types.
Flat priors		We say priors are flat when they place equal weight on all possibilities. For instance, we refer to a Dirichlet as describing flat priors when \(\alpha\) is a vector of 1s.
Mediator	\(M\)	A mediator is a variable (node) that lies along the causal pathway from one variable to another and through which a causal effect may pass. For instance, in an \(X \rightarrow M \rightarrow Y\) model, \(M\) is a potential mediator for the effect of \(X\) on \(Y\).
Moderator	\(K, M, W\)	A moderator is a variable that alters the effect of one variable on another. For instance, in an \(X \rightarrow Y \leftarrow K\) model, \(K\) is a potential moderator: it may alter the effect of \(X\) on \(Y\).
Multinomial distribution		A probability distribution over the counts of outcomes falling into each of a fixed set of categories, given a number of independent draws and the probability of each category.
Nodal type	\(\theta^X\)	The way that a node responds to the values of its parents. Example: \(\theta^Y_{10}\), sometimes written `Y10`, is a nodal type for which \(Y\) takes the value 1 if \(X=0\) and 0 if \(X=1\).
Parent (child)	\(pa()\)	\(X\) is a parent of \(Y\) if a change in \(X\) possibly induces a change in \(Y\) even when all other nodes in the graph are fixed. \(Y\) is a child of \(X\) if a change in \(X\) sometimes induces a change in \(Y\) even when all other nodes are fixed. On the graph, an arrow from \(X\) to \(Y\) indicates that \(X\) is a parent of \(Y\) and that \(Y\) is a child of \(X\).
Parameter	\(\lambda\)	An unknown quantity of interest. In many applications in the book, \(\lambda^V_x\) denotes the share of units that have nodal type \(x\) on node \(V\). In models with unobserved confounding, parameters are often thought of as the conditional probabilities of nodal types. Example: \(\lambda^Y_{01\mid\theta^M=\theta^M_{01}} = \Pr(\theta^Y = \theta^Y_{01}\mid\theta^M=\theta^M_{01})\).
Parameter matrix	\(P\)	A matrix of 0s and 1s that maps from parameters (rows) to causal types (columns).
Posterior	\(p(\lambda\mid d)\)	A probability distribution over a set of parameter values after observing data.
Potential outcomes	\(Y_i(0), Y_i(1)\)	The values that a unit would take on under a specified set of conditions—for instance, if \(X\) were set to 0 or \(X\) were set to 1. See Remark 2.1.
Prior	\(p(\lambda)\)	A probability distribution over a set of parameter values before observing data.
Query	\(Q\), \(q\)	A question asked of a model, either about the values of nodes or about the values that nodes would take under specified operations. We use lowercase \(q\) to represent the answer to the query (the estimand), which is the realization of \(Q\). Simple queries, such as the probability that \(X\) has a positive effect on \(Y\), ask about the probability of some set of causal types. Complex queries, such as the average treatment effect, ask for summaries of simple queries: in a binary setup, the average treatment effect is the share of units with a positive effect less the share of units with a negative effect.
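How nodal types, causal types, data types, and the ambiguities matrix fit together can be sketched for the simplest binary \(X \rightarrow Y\) model. The Python below is an illustrative sketch only, not the `CausalQueries` implementation: the helper names (`causal_types`, `data_type`, `ambiguities_matrix`) and the column order of the data types are our own assumptions. A \(Y\) nodal-type label such as `01` lists \(Y\)'s value when \(X=0\) followed by its value when \(X=1\), matching \(\theta^Y_{01}\) above.

```python
from itertools import product

# Illustrative sketch; helper names are hypothetical, not the CausalQueries API.
# Nodal types for X, which has no parents: the label is simply X's value.
X_TYPES = ["0", "1"]
# Nodal types for Y with one binary parent X: the character at position x
# is Y's value when X = x (so "01" means Y = 0 if X = 0 and Y = 1 if X = 1).
Y_TYPES = ["00", "10", "01", "11"]
# Data types in an assumed column order (not necessarily the package's order).
DATA_TYPES = ["X0Y0", "X1Y0", "X0Y1", "X1Y1"]


def causal_types():
    """All causal types, labeled in the X0.Y01 style."""
    return [f"X{tx}.Y{ty}" for tx, ty in product(X_TYPES, Y_TYPES)]


def data_type(ctype):
    """Return the data type (observed X and Y values) a causal type produces."""
    x_label, y_label = ctype.split(".")
    x = int(x_label[1:])          # X's value is its nodal type
    y = int(y_label[1:][x])       # Y's nodal type evaluated at the realized X
    return f"X{x}Y{y}"


def ambiguities_matrix():
    """0/1 matrix: rows are causal types, columns are data types."""
    return [[1 if data_type(c) == d else 0 for d in DATA_TYPES]
            for c in causal_types()]


if __name__ == "__main__":
    print("causal type", DATA_TYPES)
    for c, row in zip(causal_types(), ambiguities_matrix()):
        print(c, row)
```

Each row of the resulting matrix contains exactly one 1, because a causal type yields a single data type, while each column contains two 1s (for example, `X0Y0` arises from both `X0.Y00` and `X0.Y01`). That many-to-one pattern is what the name "ambiguities matrix" refers to.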
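The Dirichlet mean and variance formulas in the glossary can be checked numerically. In this Python sketch the function names are hypothetical; the sampler uses the standard construction of a Dirichlet draw as independent Gamma(\(\alpha_j\), 1) draws normalized to sum to one, via the standard library's `random.gammavariate`.

```python
import random

# Illustrative sketch; function names are hypothetical.


def dirichlet_moments(alpha):
    """Mean and per-component variance of a Dirichlet(alpha) distribution."""
    if any(a <= 0 for a in alpha):
        raise ValueError("Dirichlet parameters must be strictly positive")
    total = sum(alpha)
    mean = [a / total for a in alpha]
    var = [m * (1 - m) / (1 + total) for m in mean]
    return mean, var


def sample_dirichlet(alpha, rng):
    """One Dirichlet draw: normalize independent Gamma(alpha_j, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


if __name__ == "__main__":
    # Flat priors (a vector of 1s) over the four nodal types of Y.
    print(dirichlet_moments([1, 1, 1, 1]))
    # Monte Carlo check of the mean for a non-flat prior.
    rng = random.Random(1)
    draws = [sample_dirichlet([2, 3, 5], rng) for _ in range(20000)]
    print([sum(d[j] for d in draws) / len(draws) for j in range(3)])
```

With flat priors over four nodal types, each component has mean \(1/4\) and variance \(0.25 \times 0.75 / 5 = 0.0375\); with two categories, flat priors reduce to the uniform distribution, whose variance is \(1/12\).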
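The links among parameters, event probabilities, a complex query, and the multinomial distribution can be sketched for an \(X \rightarrow Y\) model. This Python sketch assumes, for illustration, that \(\theta^X\) and \(\theta^Y\) are independent (no unobserved confounding), so the probability of a causal type is the product of the \(\lambda\) values of its nodal types. The function names and the numbers are hypothetical.

```python
from itertools import product
from math import factorial, prod

# Illustrative sketch; function names and values are hypothetical.


def event_probabilities(lam_x, lam_y):
    """w for each data type: sum probabilities of the causal types producing it."""
    w = {}
    for tx, ty in product(lam_x, lam_y):
        x = int(tx)              # X's value is its nodal type
        y = int(ty[x])           # Y's nodal type evaluated at the realized X
        key = f"X{x}Y{y}"
        # Independence assumption: Pr(causal type) = lambda^X * lambda^Y.
        w[key] = w.get(key, 0.0) + lam_x[tx] * lam_y[ty]
    return w


def average_treatment_effect(lam_y):
    """Share with a positive effect (01) minus share with a negative effect (10)."""
    return lam_y["01"] - lam_y["10"]


def multinomial_pmf(counts, probs):
    """Probability of the given category counts in sum(counts) independent draws."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(k) for k in counts)
    return coef * prod(p ** k for p, k in zip(probs, counts))


if __name__ == "__main__":
    lam_x = {"0": 0.5, "1": 0.5}
    lam_y = {"00": 0.1, "10": 0.2, "01": 0.4, "11": 0.3}
    w = event_probabilities(lam_x, lam_y)
    print(w)
    print(average_treatment_effect(lam_y))
    order = ["X0Y0", "X1Y0", "X0Y1", "X1Y1"]
    print(multinomial_pmf([3, 2, 2, 3], [w[k] for k in order]))
```

With these illustrative numbers, \(w\) is 0.25 for `X0Y0` and `X0Y1`, 0.15 for `X1Y0`, and 0.35 for `X1Y1`. The average treatment effect is \(0.4 - 0.2 = 0.2\), which here equals \(\Pr(Y=1\mid X=1) - \Pr(Y=1\mid X=0) = 0.7 - 0.5\) because \(X\) is assumed independent of \(\theta^Y\).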
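The conditional-independence definition can be checked mechanically on a finite joint distribution. This hypothetical Python helper tests whether \(\Pr(A=a\mid B=b,C=c)=\Pr(A=a\mid C=c)\) for every \(a\) and every \((b, c)\) with positive probability. The numbers are made up and chosen so that \(C\) is a common cause of \(A\) and \(B\) (a fork \(A \leftarrow C \rightarrow B\)).

```python
from itertools import product

# Illustrative sketch; helper names and probabilities are hypothetical.


def is_cond_independent(joint, tol=1e-9):
    """Check A independent of B given C for a pmf {(a, b, c): probability}."""
    a_vals = {k[0] for k in joint}
    b_vals = {k[1] for k in joint}
    c_vals = {k[2] for k in joint}

    def prob(pred):
        return sum(p for k, p in joint.items() if pred(k))

    for a, b, c in product(a_vals, b_vals, c_vals):
        p_bc = prob(lambda k: k[1] == b and k[2] == c)
        if p_bc == 0:
            continue  # conditioning event has probability zero
        p_c = prob(lambda k: k[2] == c)
        lhs = joint.get((a, b, c), 0.0) / p_bc                # Pr(A=a | B=b, C=c)
        rhs = prob(lambda k: k[0] == a and k[2] == c) / p_c   # Pr(A=a | C=c)
        if abs(lhs - rhs) > tol:
            return False
    return True


def fork_joint():
    """Joint pmf for A <- C -> B with made-up conditional probabilities."""
    p_c = {0: 0.4, 1: 0.6}
    p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
    p_b_given_c = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}
    return {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
            for a, b, c in product((0, 1), repeat=3)}


if __name__ == "__main__":
    print(is_cond_independent(fork_joint()))
```

The check passes for the fork, since \(C\) accounts for all dependence between \(A\) and \(B\); a joint distribution in which \(B\) simply copies \(A\) would fail it.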