Various helpers to simulate data and to convert data between compact and long forms.
`collapse_data` can be used to convert long form data to compact form data.
`expand_data` can be used to convert compact form data (one row per data type) to long form data (one row per observation).
`make_data` generates a dataset with one row per observation.
`make_events` generates a dataset with one row for each data type. It draws full data only; to generate various types of incomplete data, see `make_data`.
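To make the two forms concrete, here is a minimal base-R sketch (not the package's implementation) for a two-node model with nodes X and Y and no missing values. It collapses long data into event counts and then expands the counts back into rows. The event labels and their order are chosen to mirror the `collapse_data` output in the examples; the variable names (`long`, `event_levels`, `compact`, `expanded`) are illustrative only, and the sketch omits the `strategy` column and the partial events (such as `Y0`, `Y1`) that `collapse_data` produces for rows with missing values.

```r
# Hypothetical long-form data: one row per observation
long <- data.frame(X = c(0, 1, 1, 0), Y = c(0, 0, 0, 1))

# Label each row with its event, e.g. "X1Y0"; fixing the factor levels
# keeps events with a zero count and sets their order
event_levels <- c("X0Y0", "X1Y0", "X0Y1", "X1Y1")
ev <- factor(paste0("X", long$X, "Y", long$Y), levels = event_levels)

# Collapse: one row per event, with the number of observations of that event
compact <- as.data.frame(table(event = ev), stringsAsFactors = FALSE)
names(compact) <- c("event", "count")
compact

# Expand: repeat each event label `count` times, then read the values back
rows <- rep(compact$event, compact$count)
expanded <- data.frame(
  X = as.integer(substr(rows, 2, 2)),
  Y = as.integer(substr(rows, 4, 4))
)
expanded
```

After expansion, rows are ordered by event rather than by their original position; here the two orders happen to coincide.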
Usage
collapse_data(
data,
model,
drop_NA = TRUE,
drop_family = FALSE,
summary = FALSE
)
expand_data(data_events = NULL, model)
make_data(
model,
n = NULL,
parameters = NULL,
param_type = NULL,
nodes = NULL,
n_steps = NULL,
probs = NULL,
subsets = TRUE,
complete_data = NULL,
given = NULL,
verbose = FALSE,
...
)
make_events(
model,
n = 1,
w = NULL,
P = NULL,
A = NULL,
parameters = NULL,
param_type = NULL,
include_strategy = FALSE,
...
)
Arguments
- data
A `data.frame`. Data on nodes that can take three values: 0, 1, and NA, in long form as generated by `make_data`.
- model
A `causal_model`. A model object generated by `make_model`.
- drop_NA
Logical. Whether to exclude strategy families that contain no observed data. Exceptionally, if no data is provided, minimal data on the first node is returned. Defaults to `TRUE`.
- drop_family
Logical. Whether to remove the column `strategy` from the output. Defaults to `FALSE`.
- summary
Logical. Whether to return a summary of the data (see Value). Defaults to `FALSE`.
- data_events
A 'compact' `data.frame` with one row per data type. Must be compatible with the nodes in `model`. The default columns are `event`, `strategy` and `count`.
- n
An integer. Number of observations.
- parameters
A vector of real numbers in [0,1]. Values of parameters to specify (optional). By default, parameters are drawn from the parameters data frame. See `inspect(model, "parameters_df")`.
- param_type
A character. String specifying the type of parameters to make: 'flat', 'prior_mean', 'posterior_mean', 'prior_draw', 'posterior_draw', or 'define'. With `param_type` set to `define`, use arguments to be passed to `make_priors`; otherwise `flat` sets equal probabilities on each nodal type in each parameter set, and `prior_mean`, `prior_draw`, `posterior_mean` and `posterior_draw` take parameters as the means of, or as draws from, the prior or posterior.
- nodes
A `list`. Which nodes to observe at each step. If NULL, all nodes are observed.
- n_steps
A `list`. Number of observations to be observed at each step.
- probs
A `list`. Observation probabilities at each step.
- subsets
A `list`. Strata within which observations are to be observed at each step. `TRUE` for all; otherwise an expression that evaluates to a logical condition.
- complete_data
A `data.frame`. Dataset with complete observations. Optional.
- given
A string specifying known values on nodes, e.g. "X==1 & Y==1"
- verbose
Logical. If TRUE prints step schedule.
- ...
Arguments to be passed to `make_priors` if `param_type == "define"`.
- w
A numeric matrix. A `n_parameters x 1` matrix of event probabilities with named rows.
- P
A `data.frame`. Parameter matrix. Not required but may be provided to avoid repeated computation for simulations. See `inspect(model, "parameter_matrix")`.
- A
A `data.frame`. Ambiguities matrix. Not required but may be provided to avoid repeated computation for simulations. See `inspect(model, "ambiguities_matrix")`.
- include_strategy
Logical. Whether to include a 'strategy' vector. Defaults to `FALSE`. The strategy vector does not vary with full data but is expected by some functions.
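The roles of `n` and `w` in `make_events` can be pictured as a single multinomial draw: `n` observations are allocated across event types with probabilities `w`. The base-R sketch below assumes exactly that and uses `stats::rmultinom`; a named vector stands in for the one-column matrix `w`, the probability values are made up for illustration, and this is not the package's code.

```r
# Hypothetical event probabilities for a model X -> Y (values are made up)
w <- c(X0Y0 = 0.3, X1Y0 = 0.2, X0Y1 = 0.2, X1Y1 = 0.3)
n <- 10

set.seed(1)
# One multinomial draw allocates the n observations across event types
counts <- rmultinom(1, size = n, prob = w)

# Compact form: one row per event type with its simulated count
events <- data.frame(
  event = names(w),
  count = as.vector(counts),
  stringsAsFactors = FALSE
)
events
```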
Value
`collapse_data` returns the data events in compact form (a `data.frame`, as in the examples). If `summary = TRUE`, `collapse_data` instead returns a list containing the following components:
- data_events
A compact data.frame of event types and strategies.
- observed_events
A vector of character strings specifying the events observed in the data.
- unobserved_events
A vector of character strings specifying the events not observed in the data.
`expand_data` returns a data.frame with one row per data observation.
`make_data` returns a data.frame with simulated data.
`make_events` returns a data.frame of events.
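Judging from the `summary = TRUE` example, `observed_events` lists the events with a positive count and `unobserved_events` those with a zero count. A base-R sketch of that split, using the compact data printed in that example (this reading is inferred from the example output, not taken from the package source):

```r
# Compact data as printed in the collapse_data(model, summary = TRUE) example
data_events <- data.frame(
  event    = c("X0Y0", "X1Y0", "X0Y1", "X1Y1", "Y0", "Y1"),
  strategy = c("XY", "XY", "XY", "XY", "Y", "Y"),
  count    = c(1, 1, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Split event labels by whether they occur at least once
observed_events   <- data_events$event[data_events$count > 0]
unobserved_events <- data_events$event[data_events$count == 0]
observed_events
unobserved_events
```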
Details
Note that, by default, `make_data` does not take into account whether a node has already been observed when determining whether to select a case for observation. One can, however, specifically request observation of nodes that have not previously been observed.
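A base-R sketch of a two-step observation schedule, mirroring the "Data Strategies" example below: X and Y are observed for every case, then M is observed with probability 0.5 among cases with X == 1 & Y == 0, where the condition is evaluated on the data observed so far. The full data here are arbitrary coin flips rather than draws from a causal model, and the step logic is an assumption about how such a schedule works, not the package's implementation.

```r
set.seed(2)
n <- 8
# Hypothetical full data: independent coin flips, not draws from a causal model
full <- data.frame(
  X = rbinom(n, 1, 0.5),
  M = rbinom(n, 1, 0.5),
  Y = rbinom(n, 1, 0.5)
)

# Start with nothing observed
observed <- full
observed[] <- NA

# Step 1: observe X and Y for every case (subset TRUE, probability 1)
observed[, c("X", "Y")] <- full[, c("X", "Y")]

# Step 2: observe M with probability 0.5 among cases meeting the condition,
# evaluated on the data observed so far (NA results count as not selected)
cond <- "X == 1 & Y == 0"
in_subset <- eval(parse(text = cond), observed)
in_subset[is.na(in_subset)] <- FALSE
pick <- in_subset & runif(n) < 0.5
observed$M[pick] <- full$M[pick]
observed
```

As described above, step 2 does not check whether M was already observed; appending `& is.na(M)` to the condition string would restrict selection to cases where M is still missing, in the same spirit as the `subsets = list(TRUE, "is.na(X)")` example.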
See also
Other data_generation: `get_all_data_types()`, `make_data_single()`, `observe_data()`
Examples
# \donttest{
model <- make_model('X -> Y')
df <- data.frame(X = c(0,1,NA), Y = c(0,0,1))
df |> collapse_data(model)
#> event strategy count
#> 1 X0Y0 XY 1
#> 2 X1Y0 XY 1
#> 3 X0Y1 XY 0
#> 4 X1Y1 XY 0
#> 5 Y0 Y 0
#> 6 Y1 Y 1
# Illustrating options
df |> collapse_data(model, drop_NA = FALSE)
#> event strategy count
#> 1 X0Y0 XY 1
#> 2 X1Y0 XY 1
#> 3 X0Y1 XY 0
#> 4 X1Y1 XY 0
#> 5 Y0 Y 0
#> 6 Y1 Y 1
#> 7 X0 X 0
#> 8 X1 X 0
df |> collapse_data(model, drop_family = TRUE)
#> event count
#> 1 X0Y0 1
#> 2 X1Y0 1
#> 3 X0Y1 0
#> 4 X1Y1 0
#> 5 Y0 0
#> 6 Y1 1
df |> collapse_data(model, summary = TRUE)
#> $data_events
#> event strategy count
#> 1 X0Y0 XY 1
#> 2 X1Y0 XY 1
#> 3 X0Y1 XY 0
#> 4 X1Y1 XY 0
#> 5 Y0 Y 0
#> 6 Y1 Y 1
#>
#> $observed_events
#> [1] "X0Y0" "X1Y0" "Y1"
#>
#> $unobserved_events
#> [1] "X0Y1" "X1Y1" "Y0"
#>
# Appropriate behavior given restricted models
model <- make_model('X -> Y') |>
set_restrictions('X[]==1')
df <- make_data(model, n = 10)
df[1,1] <- ''
df |> collapse_data(model)
#> event strategy count
#> 1 X0Y0 XY 2
#> 2 X0Y1 XY 7
#> 3 Y0 Y 1
#> 4 Y1 Y 0
df <- data.frame(X = 0:1)
df |> collapse_data(model)
#> X1 data is inconsistent with model and ignored
#> event strategy count
#> 1 X0 X 1
# }
# \donttest{
model <- make_model('X->M->Y')
make_events(model, n = 5) |>
expand_data(model)
#> X M Y
#> 1 0 0 0
#> 2 0 0 1
#> 3 1 0 0
#> 4 1 0 1
#> 5 1 1 1
make_events(model, n = 0) |>
expand_data(model)
#> X M Y
#> 1 NA NA NA
# }
# Simple draws
model <- make_model("X -> M -> Y")
make_data(model)
#> X M Y
#> 1 0 1 0
make_data(model, n = 3, nodes = c("X","Y"))
#> X M Y
#> 1 1 NA 1
#> 2 1 NA 1
#> 3 1 NA 1
make_data(model, n = 3, param_type = "prior_draw")
#> X M Y
#> 1 0 0 0
#> 2 0 0 1
#> 3 1 1 0
make_data(model, n = 10, param_type = "define", parameters = 0:9)
#> X M Y
#> 1 1 0 0
#> 2 1 0 0
#> 3 1 0 1
#> 4 1 0 1
#> 5 1 1 0
#> 6 1 1 0
#> 7 1 1 1
#> 8 1 1 1
#> 9 1 1 1
#> 10 1 1 1
# Data Strategies
# A strategy in which X, Y are observed for sure and M is observed
# with 50% probability for X=1, Y=0 cases
model <- make_model("X -> M -> Y")
make_data(
model,
n = 8,
nodes = list(c("X", "Y"), "M"),
probs = list(1, .5),
subsets = list(TRUE, "X==1 & Y==0"))
#> X M Y
#> 1 0 NA 0
#> 2 0 NA 1
#> 3 1 0 0
#> 4 1 0 0
#> 5 1 NA 0
#> 6 1 NA 1
#> 7 1 NA 0
#> 8 1 NA 1
# n not provided but inferred from largest n_step (not from sum of n_steps)
make_data(
model,
nodes = list(c("X", "Y"), "M"),
n_steps = list(5, 2))
#> X M Y
#> 1 0 0 1
#> 2 0 1 1
#> 3 0 NA 1
#> 4 1 NA 0
#> 5 1 NA 0
# Wide then deep
make_data(
model,
n = 8,
nodes = list(c("X", "Y"), "M"),
subsets = list(TRUE, "!is.na(X) & !is.na(Y)"),
n_steps = list(6, 2))
#> X M Y
#> 1 0 0 0
#> 2 NA NA NA
#> 3 0 NA 1
#> 4 0 NA 1
#> 5 1 NA 0
#> 6 1 0 0
#> 7 1 NA 1
#> 8 NA NA NA
make_data(
model,
n = 8,
nodes = list(c("X", "Y"), c("X", "M")),
subsets = list(TRUE, "is.na(X)"),
n_steps = list(3, 2))
#> X M Y
#> 1 0 NA 0
#> 2 NA NA NA
#> 3 0 NA 1
#> 4 NA NA NA
#> 5 1 0 NA
#> 6 1 0 NA
#> 7 1 NA 0
#> 8 NA NA NA
# Example with probabilities at each step
make_data(
model,
n = 8,
nodes = list(c("X", "Y"), c("X", "M")),
subsets = list(TRUE, "is.na(X)"),
probs = list(.5, .2))
#> X M Y
#> 1 0 0 NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 1 NA 1
#> 6 1 NA 1
#> 7 1 NA 0
#> 8 1 NA 0
# Example with given data
make_data(model, given = "X==1 & Y==1", n = 5)
#> X M Y
#> 1 1 0 1
#> 2 1 0 1
#> 3 1 0 1
#> 4 1 1 1
#> 5 1 1 1
# \donttest{
model <- make_model('X -> Y')
make_events(model = model)
#> event count
#> 1 X0Y0 0
#> 2 X1Y0 1
#> 3 X0Y1 0
#> 4 X1Y1 0
make_events(model = model, param_type = 'prior_draw')
#> event count
#> 1 X0Y0 0
#> 2 X1Y0 0
#> 3 X0Y1 0
#> 4 X1Y1 1
make_events(model = model, include_strategy = TRUE)
#> event strategy count
#> 1 X0Y0 XY 1
#> 2 X1Y0 XY 0
#> 3 X0Y1 XY 0
#> 4 X1Y1 XY 0
# }