Performs the Causal Rule Ensemble on a data set with a response variable, a treatment variable, and various features.

Usage

cre(y, z, X, method_params = NULL, hyper_params = NULL, ite = NULL)

Arguments

y

An observed response vector.

z

A treatment vector.

X

A covariate matrix (or a data frame). Should be provided as numerical values.

method_params

The list of parameters to define the models used, including:

  • Parameters for Honest Splitting

    • ratio_dis: The ratio of data delegated to rules discovery (default: 0.5).

  • Parameters for Discovery and Inference

    • ite_method: The method for ITE (pseudo-outcome) estimation (default: "aipw", options: "aipw" for Augmented Inverse Probability Weighting, "cf" for Causal Forest, "bart" for Causal Bayesian Additive Regression Trees, "slearner" for S-Learner, "tlearner" for T-Learner, "xlearner" for X-Learner, "tpoisson" for T-Learner with Poisson regression).

    • learner_ps: The model for the propensity score estimation (default: "SL.xgboost", options: any SuperLearner prediction model, e.g., "SL.lm", "SL.svm"; used only for the "aipw", "bart", and "cf" ITE estimators).

    • learner_y: The model for the outcome estimation (default: "SL.xgboost", options: any SuperLearner prediction model, e.g., "SL.lm", "SL.svm"; used only for the "aipw", "slearner", "tlearner", and "xlearner" ITE estimators).
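To make the role of ratio_dis concrete, the following is a minimal base-R sketch of an honest split. It is an illustration only, not the package's internal code: the names `idx_dis` and `idx_inf` and the use of `round()` are assumptions made for this example.

```r
# Illustrative honest split (assumed logic, not the package's internal code).
set.seed(1)
n <- 10
ratio_dis <- 0.5  # share of rows reserved for rule discovery

# Draw the discovery rows at random; the remaining rows form the inference sample.
idx_dis <- sample(seq_len(n), size = round(n * ratio_dis))
idx_inf <- setdiff(seq_len(n), idx_dis)

length(idx_dis)                      # 5 rows for discovery
length(idx_inf)                      # 5 rows for inference
length(intersect(idx_dis, idx_inf))  # 0: the two samples do not overlap
```

Keeping the two samples disjoint means the rules are discovered on one part of the data and their effects are estimated on the other.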

hyper_params

The list of hyper parameters to fine-tune the method, including:

  • General hyper parameters

    • intervention_vars: Array of names of the intervention-able covariates used for rule generation. An empty or NULL array means that all covariates are considered intervention-able (default: NULL).

    • ntrees: The number of decision trees for random forest (default: 20).

    • node_size: Minimum size of the trees' terminal nodes (default: 20).

    • max_rules: Maximum number of generated candidate rules (default: 50).

    • max_depth: Maximum rule length (default: 3).

    • t_decay: The decay threshold for rules pruning. Higher values result in more aggressive pruning (default: 0.025).

    • t_ext: The threshold to truncate too generic or too specific (extreme) rules (default: 0.01, range: [0, 0.5)).

    • t_corr: The threshold to define correlated rules (default: 1, range: [0,+inf)).

    • stability_selection: The stability selection method used to select the rules: "vanilla" for stability selection, "error_control" for stability selection with error control, and "no" for no stability selection (default: "vanilla").

    • B: Number of bootstrap samples for stability selection in rules selection and uncertainty quantification in estimation (default: 20).

    • subsample: Subsampling ratio of each bootstrap sample for stability selection in rules selection and uncertainty quantification in estimation (default: 0.5).

  • Method specific hyper parameters

    • offset: Name of the covariate to use as offset (e.g., "x1") for T-Poisson ITE estimation. Use NULL if offset is not used (default: NULL).

    • cutoff: Threshold (percentage) defining the minimum cutoff value for the stability scores for Stability Selection (default: 0.9).

    • pfer: Upper bound for the per-family error rate (tolerated amount of falsely selected rules) for Error Control Stability Selection (default: 1).
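As a rough illustration of how the t_ext and cutoff thresholds act, the sketch below applies them to small made-up inputs. Everything here is an assumption for illustration: `rules_matrix`, `stability_scores`, and the exact inequalities are hypothetical and may differ from the package's internal filtering.

```r
# Hypothetical rule-activation matrix: rows are units, columns are rules,
# and an entry of 1 means the rule applies to that unit.
rules_matrix <- cbind(
  rule_a = rep(c(1, 0), times = c(50, 50)),  # covers 50% of units
  rule_b = rep(c(1, 0), times = c(1, 99)),   # covers 1%: too specific
  rule_c = rep(c(1, 0), times = c(100, 0))   # covers 100%: too generic
)
t_ext <- 0.025

# Keep only rules whose support is away from both extremes
# (strict inequalities are an assumption of this sketch).
support <- colMeans(rules_matrix)
keep <- support > t_ext & support < 1 - t_ext
names(keep)[keep]  # "rule_a"

# Hypothetical stability scores: the fraction of the B subsamples
# in which each rule was selected.
stability_scores <- c(rule_a = 0.95, rule_d = 0.40)
cutoff <- 0.9
names(stability_scores)[stability_scores >= cutoff]  # "rule_a"
```

Raising t_ext discards more rules at both ends of the support range, while raising cutoff keeps only the rules that are selected most consistently across subsamples.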

ite

The estimated ITE vector. If given, the ITE estimation steps in both Discovery and Inference are skipped (default: NULL).
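A precomputed ITE vector can come from any estimator. The sketch below builds one with a simple T-learner based on `lm()` and passes it through the `ite` argument. The simulated data, the variable names (`fit_treated`, `fit_control`, `ite_hat`), and the choice of linear models are assumptions made for this example; the call to `cre()` is guarded so that the sketch also runs when the CRE package is not installed.

```r
# Simulated data for illustration only.
set.seed(42)
n <- 300
X <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.5))
z <- rbinom(n, 1, 0.5)
y <- 1 + 2 * z * X$x1 + rnorm(n)  # treatment effect of 2 when x1 == 1

# T-learner sketch: fit one outcome model per treatment arm, then take the
# difference of the two predictions as the ITE estimate for every unit.
df <- cbind(y = y, X)
fit_treated <- lm(y ~ ., data = df[z == 1, ])
fit_control <- lm(y ~ ., data = df[z == 0, ])
ite_hat <- as.numeric(predict(fit_treated, newdata = X) -
                        predict(fit_control, newdata = X))

# Supply the estimates through `ite`; other arguments keep their defaults.
if (requireNamespace("CRE", quietly = TRUE)) {
  cre_results <- CRE::cre(y, z, X, ite = ite_hat)
}
```

Because `ite` is supplied, the ite_method, learner_ps, and learner_y settings should play no role in this call.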

Value

An S3 object composed of:

M

the number of Decision Rules extracted at each step,

CATE

the data.frame of Conditional Average Treatment Effect decomposition estimates with corresponding uncertainty quantification,

method_params

the list of method parameters,

hyper_params

the list of hyper parameters,

rules

the list of rules (implicit form) decomposing the CATE.

Note

  • If intervention_vars is provided, the individual treatment effect is still estimated using all covariates; the restriction applies only to rule generation.

Examples


# \donttest{
set.seed(123)
dataset <- generate_cre_dataset(n = 400,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "SL.xgboost",
                      learner_y = "SL.xgboost")

hyper_params <- list(intervention_vars = NULL,
                     offset = NULL,
                     ntrees = 20,
                     node_size = 20,
                     max_rules = 50,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     stability_selection = "vanilla",
                     cutoff = 0.6,
                     pfer = 1,
                     B = 20,
                     subsample = 0.5)

cre_results <- cre(y, z, X, method_params, hyper_params)
#> Loading required package: nnls
#> Registered S3 method overwritten by 'randomForest':
#>   method      from
#>   plot.margin RRF 
# }