Installation
Installing from CRAN:

install.packages("CRE")

Installing the latest development version:

library(devtools)
install_github("NSAPH-Software/CRE", ref = "develop")

Import:

library(CRE)
Arguments
Data (required)
y The observed response/outcome vector
(binary or continuous).
z The treatment/exposure/policy vector
(binary).
X The covariate matrix (binary or
continuous).
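As a minimal sketch of the expected input shapes (illustrative values only; the variable names and distributions are assumptions, not package requirements):

```r
# Illustrative input shapes: n observations, p covariates
n <- 1000
p <- 10
y <- rnorm(n)                                  # continuous outcome vector
z <- rbinom(n, 1, 0.5)                         # binary treatment vector
X <- matrix(rbinom(n * p, 1, 0.5), nrow = n)   # n x p binary covariate matrix
colnames(X) <- paste0("x", seq_len(p))
```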
Parameters (optional)
method_parameters The list of
parameters to define the models used, including:
- ratio_dis The ratio of data delegated to
the discovery sub-sample (default: 0.5).
- ite_method The method to estimate the
individual treatment effect (default: “aipw”) [1].
- learner_ps The (SuperLearner)
model for the propensity score estimation (default: “SL.xgboost”, used
only for “aipw”,“bart”,“cf” ITE estimators).
- learner_y The (SuperLearner)
model for the outcome estimation (default: “SL.xgboost”, used only for
“aipw”,“slearner”,“tlearner” and “xlearner” ITE estimators).
hyper_params The list of hyperparameters to fine-tune the method, including:
- intervention_vars Variables eligible for intervention, used for rules generation (default: NULL).
- ntrees The number of decision trees for
random forest (default: 20).
- node_size Minimum size of the trees’
terminal nodes (default: 20).
- max_rules Maximum number of candidate
decision rules (default: 50).
- max_depth Maximum rules length (default:
3).
- t_decay The decay threshold for rules
pruning (default: 0.025).
- t_ext The threshold to define too
generic or too specific (extreme) rules (default: 0.01).
- t_corr The threshold to define
correlated rules (default: 1).
- stability_selection Method for stability selection of the rules: "vanilla" for stability selection, "error_control" for stability selection with error control, and "no" for no stability selection (default: "vanilla").
- B Number of bootstrap samples for stability selection in rules selection and uncertainty quantification in estimation (default: 20).
- subsample Bootstrap subsample ratio for stability selection in rules selection and uncertainty quantification in estimation (default: 0.5).
- offset Name of the covariate to use as offset (e.g., "x1") for T-Poisson ITE estimation. NULL if not used (default: NULL).
- cutoff Threshold defining the minimum
cutoff value for the stability scores in Stability Selection (default:
0.9).
- pfer Upper bound for the per-family
error rate (tolerated amount of falsely selected rules) in Error Control
Stability Selection (default: 1).
Additional Estimates (optional)
ite The estimated ITE vector. If given,
both the ITE estimation steps in Discovery and Inference are skipped
(default: NULL).
Notes
Options for the ITE estimation
[1] Options for the ITE estimation are as
follows:
- S-Learner (slearner)
- T-Learner (tlearner)
- T-Poisson (tpoisson)
- X-Learner (xlearner)
- Augmented Inverse Probability Weighting (aipw)
- Causal Forests (cf)
- Causal Bayesian Additive Regression Trees (bart)
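A different estimator can be selected through the method parameters passed to cre(). The sketch below assumes unspecified entries fall back to their package defaults:

```r
# Sketch: estimate the ITE with Causal Forests instead of the default AIPW
method_params <- list(ite_method = "cf",
                      learner_ps = "SL.xgboost")
cre_results <- cre(y, z, X, method_params)
```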
If other estimates of the ITE are provided via the ite argument, the ITE estimation steps in both discovery and inference are skipped and those estimates are used instead. The ITE estimator also requires an outcome learner and/or a propensity score learner from the SuperLearner
package (e.g., "SL.lm", "SL.svm"). Both of these models are simple classifiers/regressors. By default, the XGBoost algorithm is used for both steps.
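As a hedged sketch, other SuperLearner wrappers can be substituted for the XGBoost defaults (the choice of "SL.lm" and "SL.svm" here is purely illustrative):

```r
# Sketch: SVM propensity score learner and linear-model outcome learner
method_params <- list(ite_method = "aipw",
                      learner_ps = "SL.svm",
                      learner_y  = "SL.lm")
cre_results <- cre(y, z, X, method_params)
```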
Customized wrapper for SuperLearner
One can create a customized wrapper for SuperLearner internal packages. The following is an example of providing the number of cores (e.g., 12) for the xgboost package in a shared memory system.
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}

Then use "m_xgboost" instead of "SL.xgboost".
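The customized wrapper is then passed by name wherever a learner is expected; a sketch (assuming unspecified method parameters keep their defaults):

```r
# Custom 12-thread xgboost wrapper, used for both learners
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}
method_params <- list(learner_ps = "m_xgboost",
                      learner_y  = "m_xgboost")
cre_results <- cre(y, z, X, method_params)
```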
Examples
Example 1 (default parameters)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
rho = 0,
n_rules = 2,
p = 10,
effect_size = 2,
binary_covariates = TRUE,
binary_outcome = FALSE,
confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]
cre_results <- cre(y, z, X)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)

Example 2 (personalized ITE estimation)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
rho = 0,
n_rules = 2,
p = 10,
effect_size = 2,
binary_covariates = TRUE,
binary_outcome = FALSE,
confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]
ite_pred <- ... # personalized ite estimation
cre_results <- cre(y, z, X, ite = ite_pred)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)

Example 3 (setting parameters)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
rho = 0,
n_rules = 2,
p = 10,
effect_size = 2,
binary_covariates = TRUE,
binary_outcome = FALSE,
confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]
method_params <- list(ratio_dis = 0.5,
ite_method = "aipw",
learner_ps = "SL.xgboost",
learner_y = "SL.xgboost")
hyper_params <- list(intervention_vars = c("x1","x2","x3","x4"),
offset = NULL,
ntrees = 20,
node_size = 20,
max_rules = 50,
max_depth = 3,
t_decay = 0.025,
t_ext = 0.025,
t_corr = 1,
stability_selection = "vanilla",
cutoff = 0.8,
pfer = 1,
B = 10,
subsample = 0.5)
cre_results <- cre(y, z, X, method_params, hyper_params)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)

More synthetic data sets can be generated using
generate_cre_dataset().
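For example, a variant with a binary outcome and correlated covariates (the specific parameter values below are illustrative):

```r
# Sketch: binary outcome, mildly correlated covariates
dataset_b <- generate_cre_dataset(n = 500,
                                  rho = 0.2,
                                  n_rules = 2,
                                  p = 10,
                                  effect_size = 2,
                                  binary_covariates = TRUE,
                                  binary_outcome = TRUE,
                                  confounding = "no")
```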