
Helpers to simulate data and to convert data between compact and long forms.

collapse_data converts long form data (one row per observation) to compact form data (one row per data type).

expand_data can be used to convert compact form data (one row per data type) to long form data (one row per observation).

make_data generates a dataset with one row per observation.

make_events generates a dataset with one row for each data type. It draws full data only; to generate various types of incomplete data, see make_data.
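To make the two forms concrete, here is a minimal base-R sketch of the long-to-compact conversion and its inverse. collapse_sketch and expand_sketch are hypothetical helpers written for illustration; they are not the CausalQueries implementations of collapse_data or expand_data. Unlike collapse_data, the sketch lists only events that actually occur (no zero-count rows), and it assumes node names contain no digits.

```r
# Hypothetical helpers sketching the long <-> compact conversion;
# they are NOT the CausalQueries implementations of collapse_data()
# or expand_data(). Assumes node names contain no digits.

collapse_sketch <- function(df) {
  # Drop rows in which no node is observed
  df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]
  nodes <- names(df)
  # Strategy: concatenated names of the observed nodes, e.g. "XY" or "Y"
  strategy <- apply(!is.na(df), 1, function(obs) paste(nodes[obs], collapse = ""))
  # Event: observed nodes with their values, e.g. "X0Y1" or "Y1"
  event <- apply(df, 1, function(v) {
    obs <- !is.na(v)
    paste0(nodes[obs], v[obs], collapse = "")
  })
  # One row per (event, strategy) pair, counting how often it occurs
  aggregate(count ~ event + strategy,
            data = data.frame(event, strategy, count = 1), FUN = sum)
}

expand_sketch <- function(compact, nodes) {
  pieces <- lapply(seq_len(nrow(compact)), function(i) {
    ev <- as.character(compact$event[i])
    row <- setNames(rep(NA_integer_, length(nodes)), nodes)
    # Split an event such as "X0Y1" into the tokens "X0" and "Y1"
    tokens <- regmatches(ev, gregexpr("[^0-9]+[0-9]+", ev))[[1]]
    # Fill in the observed nodes; unobserved nodes stay NA
    row[sub("[0-9]+$", "", tokens)] <- as.integer(sub("^[^0-9]+", "", tokens))
    # Repeat this data type once per observation (count may be 0)
    as.data.frame(as.list(row))[rep(1, compact$count[i]), , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Usage: X is missing in the third observation
df <- data.frame(X = c(0, 1, NA, 1), Y = c(0, 0, 1, 0))
compact <- collapse_sketch(df)
long <- expand_sketch(compact, names(df))
```

The round trip recovers the same observations, although not necessarily in the original row order, since the compact form does not record it.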

Usage

collapse_data(
  data,
  model,
  drop_NA = TRUE,
  drop_family = FALSE,
  summary = FALSE
)

expand_data(data_events = NULL, model)

make_data(
  model,
  n = NULL,
  parameters = NULL,
  param_type = NULL,
  nodes = NULL,
  n_steps = NULL,
  probs = NULL,
  subsets = TRUE,
  complete_data = NULL,
  given = NULL,
  verbose = FALSE,
  ...
)

make_events(
  model,
  n = 1,
  w = NULL,
  P = NULL,
  A = NULL,
  parameters = NULL,
  param_type = NULL,
  include_strategy = FALSE,
  ...
)

Arguments

data

A data.frame. Data on nodes that can take three values: 0, 1, and NA. In long form, as generated by make_data.

model

A causal_model. A model object generated by make_model.

drop_NA

Logical. Whether to exclude strategy families that contain no observed data. As an exception, if no data are provided, minimal data on the first node are returned. Defaults to `TRUE`.

drop_family

Logical. Whether to remove the strategy column from the output. Defaults to `FALSE`.

summary

Logical. Whether to return a summary of the data; see Value for the components returned. Defaults to `FALSE`.

data_events

A 'compact' data.frame with one row per data type. Must be compatible with nodes in model. The default columns are event, strategy and count.

n

An integer. Number of observations.

parameters

A vector of real numbers in [0,1]. Values of parameters to specify (optional). By default, parameters are drawn from the parameters data frame. See inspect(model, "parameters_df").

param_type

A character string specifying the type of parameters to make: 'flat', 'prior_mean', 'posterior_mean', 'prior_draw', 'posterior_draw', or 'define'. With param_type set to 'define', arguments passed via ... go to make_priors. Otherwise, 'flat' sets equal probabilities on each nodal type in each parameter set; 'prior_mean' and 'posterior_mean' take parameters as the means of the prior or posterior; and 'prior_draw' and 'posterior_draw' take parameters as draws from the prior or posterior.

nodes

A list. Which nodes are to be observed at each step. If NULL, all nodes are observed.

n_steps

A list. Number of units to be observed at each step.

probs

A list. Observation probabilities at each step.

subsets

A list. Strata within which units are to be observed at each step: TRUE for all units, otherwise a string containing an expression that evaluates to a logical condition, e.g. "X==1 & Y==0".

complete_data

A data.frame. Dataset with complete observations. Optional.

given

A string specifying known values on nodes, e.g. "X==1 & Y==1".

verbose

Logical. If TRUE, prints the step schedule.

...

Arguments to be passed to make_priors if param_type == "define".

w

A numeric matrix. A `n_parameters x 1` matrix of event probabilities with named rows.

P

A data.frame. Parameter matrix. Not required but may be provided to avoid repeated computation for simulations. See inspect(model, "parameter_matrix").

A

A data.frame. Ambiguities matrix. Not required but may be provided to avoid repeated computation for simulations. See inspect(model, "ambiguities_matrix").

include_strategy

Logical. Whether to include a 'strategy' vector. Defaults to FALSE. The strategy vector does not vary with full data but is expected by some functions.
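With param_type = 'flat', each nodal type gets equal probability within its parameter set. The base-R sketch below applies that rule to a small hypothetical parameters table for a model like X -> Y. The column names param_set, nodal_type, and param_value and the nodal-type labels are illustrative assumptions, not read from a real model object; for an actual model, inspect(model, "parameters_df") shows the real table.

```r
# Hypothetical parameters table for a model like X -> Y: one row per
# nodal type, grouped into parameter sets. Column names and nodal-type
# labels are illustrative assumptions, not read from a real model.
params <- data.frame(
  param_set  = c("X", "X", "Y", "Y", "Y", "Y"),
  nodal_type = c("0", "1", "00", "10", "01", "11")
)

# 'flat': every nodal type gets equal probability within its parameter
# set, so the values in each set sum to 1
params$param_value <- ave(rep(1, nrow(params)), params$param_set,
                          FUN = function(x) x / length(x))
params
```

Here the two X types each get 0.5 and the four Y types each get 0.25.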

Value

collapse_data: A vector of data events.

If summary = TRUE, `collapse_data` returns a list containing the following components:

data_events

A compact data.frame of event types and strategies.

observed_events

A vector of character strings specifying the events observed in the data

unobserved_events

A vector of character strings specifying the events not observed in the data

expand_data: A data.frame with one row per observation.

make_data: A data.frame with simulated data.

make_events: A data.frame of events.
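make_events returns one row per data type with a count, and the counts sum to n. One natural way to produce such counts, assuming event probabilities are already available (as in the w argument), is a single multinomial draw. The sketch below illustrates that idea with made-up probabilities; it is not necessarily how make_events computes its output.

```r
# Hypothetical event probabilities for the four full-data types of X -> Y
w <- c(X0Y0 = 0.3, X1Y0 = 0.2, X0Y1 = 0.2, X1Y1 = 0.3)
n <- 10

set.seed(1)
# A single multinomial draw spreads the n observations over the event types
counts <- rmultinom(1, size = n, prob = w)[, 1]
events <- data.frame(event = names(w), count = unname(counts))
events
```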

Details

By default, make_data does not take into account whether a node has already been observed when deciding whether to select a unit for observation at a given step. You can, however, explicitly request observation of nodes that have not previously been observed, for example with a subsets condition such as "is.na(X)".
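The step-wise selection described above can be made concrete with a small base-R sketch. observe_steps is a hypothetical helper, not part of CausalQueries. It assumes that each step's subsets condition is evaluated on the data observed so far (consistent with the "is.na(X)" examples) and that eligible units are selected independently with the step's probability. Because selection ignores prior observation, a condition such as "is.na(M)" is what restricts a step to units whose M is still unobserved.

```r
# Hypothetical sketch of a multi-step observation strategy applied to
# complete data; it is NOT the CausalQueries implementation of make_data().
observe_steps <- function(complete, nodes, probs, subsets) {
  # Start with every value unobserved
  observed <- as.data.frame(lapply(complete, function(col) rep(NA_real_, length(col))))
  for (s in seq_along(nodes)) {
    # Eligible units: the step's condition, evaluated on what has been
    # observed so far (NA results count as not eligible)
    if (isTRUE(subsets[[s]])) {
      eligible <- rep(TRUE, nrow(complete))
    } else {
      keep <- eval(parse(text = subsets[[s]]), envir = observed)
      eligible <- !is.na(keep) & keep
    }
    # Each eligible unit is selected independently with probability probs[[s]];
    # whether its nodes were already observed plays no role in selection
    picked <- eligible & (runif(nrow(complete)) < probs[[s]])
    if (any(picked)) {
      observed[picked, nodes[[s]]] <- complete[picked, nodes[[s]]]
    }
  }
  observed
}

# Usage: observe X and Y for everyone, then M with probability 0.5
# among units with X == 1 and Y == 0
complete <- data.frame(X = c(0, 1, 1, 0), M = c(1, 0, 1, 0), Y = c(0, 0, 1, 1))
set.seed(2)
observe_steps(complete,
              nodes = list(c("X", "Y"), "M"),
              probs = list(1, 0.5),
              subsets = list(TRUE, "X==1 & Y==0"))
```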

See also

Other data_generation: get_all_data_types(), make_data_single(), observe_data()


Examples

# \donttest{

model <- make_model('X -> Y')

df <- data.frame(X = c(0,1,NA), Y = c(0,0,1))

df |> collapse_data(model)
#>   event strategy count
#> 1  X0Y0       XY     1
#> 2  X1Y0       XY     1
#> 3  X0Y1       XY     0
#> 4  X1Y1       XY     0
#> 5    Y0        Y     0
#> 6    Y1        Y     1

# Illustrating options

df |> collapse_data(model, drop_NA = FALSE)
#>   event strategy count
#> 1  X0Y0       XY     1
#> 2  X1Y0       XY     1
#> 3  X0Y1       XY     0
#> 4  X1Y1       XY     0
#> 5    Y0        Y     0
#> 6    Y1        Y     1
#> 7    X0        X     0
#> 8    X1        X     0

df |> collapse_data(model, drop_family = TRUE)
#>   event count
#> 1  X0Y0     1
#> 2  X1Y0     1
#> 3  X0Y1     0
#> 4  X1Y1     0
#> 5    Y0     0
#> 6    Y1     1

df |> collapse_data(model, summary = TRUE)
#> $data_events
#>   event strategy count
#> 1  X0Y0       XY     1
#> 2  X1Y0       XY     1
#> 3  X0Y1       XY     0
#> 4  X1Y1       XY     0
#> 5    Y0        Y     0
#> 6    Y1        Y     1
#> 
#> $observed_events
#> [1] "X0Y0" "X1Y0" "Y1"  
#> 
#> $unobserved_events
#> [1] "X0Y1" "X1Y1" "Y0"  
#> 

# Appropriate behavior given restricted models

model <- make_model('X -> Y') |>
  set_restrictions('X[]==1')
df <- make_data(model, n = 10)
df[1,1] <- ''
df |> collapse_data(model)
#>   event strategy count
#> 1  X0Y0       XY     2
#> 2  X0Y1       XY     7
#> 3    Y0        Y     1
#> 4    Y1        Y     0

df <- data.frame(X = 0:1)
df |> collapse_data(model)
#> X1 data is inconsistent with model and ignored
#>   event strategy count
#> 1    X0        X     1

# }

# \donttest{
model <- make_model('X->M->Y')
make_events(model, n = 5) |>
  expand_data(model)
#>   X M Y
#> 1 0 0 0
#> 2 0 0 1
#> 3 1 0 0
#> 4 1 0 1
#> 5 1 1 1
make_events(model, n = 0) |>
  expand_data(model)
#>    X  M  Y
#> 1 NA NA NA
# }


# Simple draws
model <- make_model("X -> M -> Y")
make_data(model)
#>   X M Y
#> 1 0 1 0
make_data(model, n = 3, nodes = c("X","Y"))
#>   X  M Y
#> 1 1 NA 1
#> 2 1 NA 1
#> 3 1 NA 1
make_data(model, n = 3, param_type = "prior_draw")
#>   X M Y
#> 1 0 0 0
#> 2 0 0 1
#> 3 1 1 0
make_data(model, n = 10, param_type = "define", parameters =  0:9)
#>    X M Y
#> 1  1 0 0
#> 2  1 0 0
#> 3  1 0 1
#> 4  1 0 1
#> 5  1 1 0
#> 6  1 1 0
#> 7  1 1 1
#> 8  1 1 1
#> 9  1 1 1
#> 10 1 1 1

# Data Strategies
# A strategy in which X, Y are observed for sure and M is observed
# with 50% probability for X=1, Y=0 cases

model <- make_model("X -> M -> Y")
make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), "M"),
  probs = list(1, .5),
  subsets = list(TRUE, "X==1 & Y==0"))
#>   X  M Y
#> 1 0 NA 0
#> 2 0 NA 1
#> 3 1  0 0
#> 4 1  0 0
#> 5 1 NA 0
#> 6 1 NA 1
#> 7 1 NA 0
#> 8 1 NA 1

# n not provided but inferred from largest n_step (not from sum of n_steps)
make_data(
  model,
  nodes = list(c("X", "Y"), "M"),
  n_steps = list(5, 2))
#>   X  M Y
#> 1 0  0 1
#> 2 0  1 1
#> 3 0 NA 1
#> 4 1 NA 0
#> 5 1 NA 0

# Wide then deep
make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), "M"),
  subsets = list(TRUE, "!is.na(X) & !is.na(Y)"),
  n_steps = list(6, 2))
#>    X  M  Y
#> 1  0  0  0
#> 2 NA NA NA
#> 3  0 NA  1
#> 4  0 NA  1
#> 5  1 NA  0
#> 6  1  0  0
#> 7  1 NA  1
#> 8 NA NA NA


make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), c("X", "M")),
  subsets = list(TRUE, "is.na(X)"),
  n_steps = list(3, 2))
#>    X  M  Y
#> 1  0 NA  0
#> 2 NA NA NA
#> 3  0 NA  1
#> 4 NA NA NA
#> 5  1  0 NA
#> 6  1  0 NA
#> 7  1 NA  0
#> 8 NA NA NA

# Example with probabilities at each step

make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), c("X", "M")),
  subsets = list(TRUE, "is.na(X)"),
  probs = list(.5, .2))
#>    X  M  Y
#> 1  0  0 NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5  1 NA  1
#> 6  1 NA  1
#> 7  1 NA  0
#> 8  1 NA  0

# Example with given data
make_data(model, given = "X==1 & Y==1", n = 5)
#>   X M Y
#> 1 1 0 1
#> 2 1 0 1
#> 3 1 0 1
#> 4 1 1 1
#> 5 1 1 1
# \donttest{
model <- make_model('X -> Y')
make_events(model = model)
#>   event count
#> 1  X0Y0     0
#> 2  X1Y0     1
#> 3  X0Y1     0
#> 4  X1Y1     0
make_events(model = model, param_type = 'prior_draw')
#>   event count
#> 1  X0Y0     0
#> 2  X1Y0     0
#> 3  X0Y1     0
#> 4  X1Y1     1
make_events(model = model, include_strategy = TRUE)
#>   event strategy count
#> 1  X0Y0       XY     1
#> 2  X1Y0       XY     0
#> 3  X0Y1       XY     0
#> 4  X1Y1       XY     0
# }