Performs the Causal Rule Ensemble on a data set with a response variable, a treatment variable, and various features.
Arguments
- y
An observed response vector.
- z
A treatment vector.
- X
A covariate matrix (or a data frame). Should be provided as numerical values.
- method_params
The list of parameters to define the models used, including:
Parameters for Honest Splitting
ratio_dis: The ratio of data delegated to rules discovery (default: 0.5).
Parameters for Discovery and Inference
ite_method: The method for ITE (pseudo-outcome) estimation (default: "aipw"; options: "aipw" for Augmented Inverse Probability Weighting, "cf" for Causal Forest, "bart" for Causal Bayesian Additive Regression Trees, "slearner" for S-Learner, "tlearner" for T-Learner, "xlearner" for X-Learner, "tpoisson" for T-Learner with Poisson regression).
learner_ps: The model for the propensity score estimation (default: "SL.xgboost"; options: any SuperLearner prediction model, e.g., "SL.lm", "SL.svm"; used only for the "aipw", "bart", and "cf" ITE estimators).
learner_y: The model for the outcome estimation (default: "SL.xgboost"; options: any SuperLearner prediction model, e.g., "SL.lm", "SL.svm"; used only for the "aipw", "slearner", "tlearner", and "xlearner" ITE estimators).
- hyper_params
The list of hyper parameters to fine-tune the method, including:
General hyper parameters
intervention_vars: Array of names of the intervention-able covariates used for rules generation. An empty or NULL array means that all covariates are considered intervention-able (default: NULL).
ntrees: The number of decision trees for random forest (default: 20).
node_size: Minimum size of the trees' terminal nodes (default: 20).
max_rules: Maximum number of generated candidate rules (default: 50).
max_depth: Maximum rule length (default: 3).
t_decay: The decay threshold for rules pruning. Higher values result in more aggressive pruning (default: 0.025).
t_ext: The threshold to truncate too generic or too specific (extreme) rules (default: 0.01, range: [0, 0.5)).
t_corr: The threshold to define correlated rules (default: 1, range: [0, +inf)).
stability_selection: The stability selection method used for selecting the rules: "vanilla" for stability selection, "error_control" for stability selection with error control, and "no" for no stability selection (default: "vanilla").
B: Number of bootstrap samples for stability selection in rules selection and uncertainty quantification in estimation (default: 20).
subsample: Bootstrap subsample ratio for stability selection in rules selection and uncertainty quantification in estimation (default: 0.5).
Method specific hyper parameters
offset: Name of the covariate to use as offset (e.g., "x1") for T-Poisson ITE estimation. Use NULL if offset is not used (default: NULL).
cutoff: Threshold (percentage) defining the minimum cutoff value for the stability scores for Stability Selection (default: 0.9).
pfer: Upper bound for the per-family error rate (tolerated amount of falsely selected rules) for Error Control Stability Selection (default: 1).
- ite
The estimated ITE vector. If given, both the ITE estimation steps in Discovery and Inference are skipped (default: NULL).
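To make ratio_dis and the "aipw" option more concrete, the following is a minimal base R sketch of honest splitting followed by AIPW pseudo-outcome estimation. It is not the package's implementation: the simulated data, the helper name aipw_ite, and the use of glm/lm as simple stand-ins for the SuperLearner models named by learner_ps and learner_y are all assumptions made for illustration.

```r
set.seed(42)

# Simulated data (hypothetical): binary treatment, continuous outcome
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
z <- rbinom(n, 1, plogis(0.3 * X$x1))
y <- 1 + X$x1 + 2 * z * X$x2 + rnorm(n)  # true effect: 2 if x2 == 1, else 0

# Honest splitting: a fraction ratio_dis of the rows is delegated to
# rules discovery, the remaining rows are kept for inference
ratio_dis <- 0.5
idx_dis <- sample(n, size = round(ratio_dis * n))
idx_inf <- setdiff(seq_len(n), idx_dis)

# AIPW pseudo-outcome (hypothetical helper); glm and lm stand in for
# the learner_ps and learner_y models
aipw_ite <- function(y, z, X) {
  # Propensity score e(x) = P(z = 1 | x)
  ps_fit <- glm(z ~ ., data = cbind(X, z = z), family = binomial())
  e_hat <- predict(ps_fit, newdata = X, type = "response")

  # Outcome models fitted separately on treated and control units
  dat_y <- cbind(X, y = y)
  mu1 <- predict(lm(y ~ ., data = dat_y[z == 1, ]), newdata = X)
  mu0 <- predict(lm(y ~ ., data = dat_y[z == 0, ]), newdata = X)

  # Regression difference plus inverse-probability-weighted residuals
  (mu1 - mu0) + z * (y - mu1) / e_hat - (1 - z) * (y - mu0) / (1 - e_hat)
}

# One pseudo-outcome vector per subsample
ite_dis <- aipw_ite(y[idx_dis], z[idx_dis], X[idx_dis, ])
ite_inf <- aipw_ite(y[idx_inf], z[idx_inf], X[idx_inf, ])
mean(ite_inf)  # rough estimate of the average treatment effect
```

A pseudo-outcome vector of this kind, computed on all rows by any estimator of your choice, is the sort of input the ite argument is meant to receive.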
Value
An S3 object composed of:
- M
the number of Decision Rules extracted at each step,
- CATE
the data.frame of Conditional Average Treatment Effect decomposition estimates with corresponding uncertainty quantification,
- method_params
the list of method parameters,
- hyper_params
the list of hyper parameters,
- rules
the list of rules (implicit form) decomposing the CATE.
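As an illustration of how decision rules relate to the data, and of what the t_ext hyper parameter controls, the sketch below evaluates a few rules on a toy covariate table and drops the extreme ones. The rule strings, the toy data, and the exact filtering criterion (keeping rules whose support lies between t_ext and 1 - t_ext, boundaries included) are assumptions for illustration and may differ from what the package does internally.

```r
# Hypothetical covariates and candidate rules, written as logical expressions
X <- data.frame(
  x1 = c(0, 1, 1, 0, 1, 0, 1, 1, 0, 1),
  x2 = c(1, 1, 0, 0, 1, 1, 0, 1, 0, 1),
  x3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)
rules <- c("x1>0.5", "x1>0.5 & x2<=0.5", "x3>0.5", "x1>1.5")

# Evaluate each rule on X: one 0/1 column per rule, one row per unit
rules_matrix <- sapply(rules, function(r) {
  as.integer(eval(parse(text = r), envir = X))
})

# Support: fraction of units satisfying each rule
support <- colMeans(rules_matrix)

# Drop rules that are too specific (support near 0) or too generic
# (support near 1)
t_ext <- 0.025
keep <- support >= t_ext & support <= 1 - t_ext
rules[keep]
# "x1>0.5" "x1>0.5 & x2<=0.5"
```

Here "x3>0.5" holds for every unit and "x1>1.5" for none, so neither can separate subgroups and both are removed.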
Note
If intervention_vars are provided, the individual treatment effect will still be computed using all covariates.
Examples
library(CRE)

set.seed(123)

# Simulate a data set with 2 true decision rules and no confounding
dataset <- generate_cre_dataset(n = 400,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

# Models used for honest splitting, discovery, and inference
method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "SL.xgboost",
                      learner_y = "SL.xgboost")

# Hyper parameters for rules generation, filtering, and selection
hyper_params <- list(intervention_vars = NULL,
                     offset = NULL,
                     ntrees = 20,
                     node_size = 20,
                     max_rules = 50,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     stability_selection = "vanilla",
                     cutoff = 0.6,
                     pfer = 1,
                     B = 20,
                     subsample = 0.5)

cre_results <- cre(y, z, X, method_params, hyper_params)
#> Loading required package: nnls
#> Registered S3 method overwritten by 'randomForest':
#>   method      from
#>   plot.margin RRF