Installation
Installing from CRAN.
install.packages("CRE")
Installing the latest development version from GitHub (requires the devtools package).
library(devtools)
install_github("NSAPH-Software/CRE", ref = "develop")
Import.
library(CRE)
Arguments
Data (required)
y
The observed response/outcome vector
(binary or continuous).
z
The treatment/exposure/policy vector
(binary).
X
The covariate matrix (binary or
continuous).
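To make the expected shapes concrete, here is a minimal base R sketch that builds toy inputs by hand rather than with the package's data generator. The assumptions are that y and z are plain numeric vectors with one entry per unit, z is coded 0/1, and X is a data frame with named columns (x1, x2, ...). The examples later in this README refer to covariates by names such as “x1”, which is why the columns are named.

```r
# Toy inputs illustrating the expected shapes (base R only; this is
# NOT generate_cre_dataset(), just a hand-made stand-in).
set.seed(1)
n <- 500
p <- 5
# Binary covariates with names x1, ..., xp so they can be referenced by name
X <- as.data.frame(matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n))
names(X) <- paste0("x", seq_len(p))
# Binary treatment indicator (0 = control, 1 = treated)
z <- rbinom(n, size = 1, prob = 0.5)
# Continuous outcome with an effect of 2 for treated units having x1 == 1
y <- rnorm(n) + X$x1 + 2 * z * X$x1
# Sanity checks one might run before calling cre(y, z, X)
stopifnot(length(y) == nrow(X),
          length(z) == nrow(X),
          all(z %in% c(0, 1)))
```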
Parameters (not required)
method_params
The list of parameters defining the models used, including:
- ratio_dis
The ratio of data delegated to
the discovery sub-sample (default: 0.5).
- ite_method
The method to estimate the
individual treatment effect (default: “aipw”) [1].
- learner_ps
The SuperLearner model for the propensity score estimation (default: “SL.xgboost”; used only by the “aipw”, “bart” and “cf” ITE estimators).
- learner_y
The SuperLearner model for the outcome estimation (default: “SL.xgboost”; used only by the “aipw”, “slearner”, “tlearner” and “xlearner” ITE estimators).
hyper_params
The list of hyperparameters to fine-tune the method, including:
- intervention_vars
Intervenable variables (i.e., variables that can be acted upon) used for rule generation (default: NULL).
- ntrees
The number of decision trees for
random forest (default: 20).
- node_size
Minimum size of the trees’
terminal nodes (default: 20).
- max_rules
Maximum number of candidate
decision rules (default: 50).
- max_depth
Maximum rule length (default: 3).
- t_decay
The decay threshold for rule pruning (default: 0.025).
- t_ext
The threshold to define too
generic or too specific (extreme) rules (default: 0.01).
- t_corr
The threshold to define
correlated rules (default: 1).
- stability_selection
Method of stability selection used to select the rules: “vanilla” for stability selection, “error_control” for stability selection with error control, and “no” for no stability selection (default: “vanilla”).
- B
Number of bootstrap samples for stability selection in rule selection and for uncertainty quantification in estimation (default: 20).
- subsample
Bootstrap subsample ratio for stability selection in rule selection and for uncertainty quantification in estimation (default: 0.5).
- offset
Name of the covariate to use as offset (e.g., “x1”) for T-Poisson ITE estimation; NULL if not used (default: NULL).
- cutoff
Threshold defining the minimum
cutoff value for the stability scores in Stability Selection (default:
0.9).
- pfer
Upper bound for the per-family
error rate (tolerated amount of falsely selected rules) in Error Control
Stability Selection (default: 1).
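The roles of B, subsample, cutoff and pfer can be illustrated with a minimal sketch of vanilla stability selection in base R. The sketch draws B subsamples, keeps the q best-scoring rules in each one, and reports each rule's selection frequency. A rule is retained when its frequency reaches cutoff. The selector is swapped in for illustration: it uses a simple marginal-correlation ranking, not the penalized regression CRE uses internally. The helper names stability_scores and q_from_pfer, the rule matrix R, and the per-subsample budget q are all hypothetical. q_from_pfer inverts the classical Meinshausen and Bühlmann bound E(V) <= q^2 / ((2 * cutoff - 1) * p) on the expected number of false selections V. Whether CRE's error-control mode uses this exact variant of the bound is an assumption.

```r
# Vanilla stability selection, sketched. Assumption: a marginal-correlation
# selector stands in for the penalized regression CRE uses internally.
stability_scores <- function(R, tau, B = 20, subsample = 0.5, q = 5) {
  n <- nrow(R)
  counts <- numeric(ncol(R))
  for (b in seq_len(B)) {
    # Draw a subsample without replacement
    idx <- sample.int(n, size = floor(subsample * n))
    # Score each candidate rule on this subsample
    score <- abs(cor(R[idx, , drop = FALSE], tau[idx]))
    # Keep the q top-scoring rules in this subsample
    top <- order(score, decreasing = TRUE)[seq_len(q)]
    counts[top] <- counts[top] + 1
  }
  # Selection frequency of each rule across the B subsamples
  setNames(counts / B, colnames(R))
}

# Largest q allowed by E(V) <= q^2 / ((2 * cutoff - 1) * p) for a given pfer
q_from_pfer <- function(pfer, cutoff, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1)
  floor(sqrt(pfer * (2 * cutoff - 1) * p))
}

# Hypothetical example: 50 binary rules, only rule1 and rule2 drive the ITE
set.seed(42)
n <- 500
p <- 50
R <- matrix(rbinom(n * p, 1, 0.3), nrow = n,
            dimnames = list(NULL, paste0("rule", seq_len(p))))
tau <- 2 * R[, "rule1"] - 2 * R[, "rule2"] + rnorm(n)
q <- q_from_pfer(pfer = 1, cutoff = 0.9, p = p)  # 6
pi_hat <- stability_scores(R, tau, B = 20, subsample = 0.5, q = q)
selected <- names(pi_hat)[pi_hat >= 0.9]
```

Lowering pfer or raising cutoff shrinks q, so fewer rules can enter each subsample and the selection becomes more conservative.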
Additional Estimates (not required)
ite
The estimated ITE vector. If given, both the ITE estimation steps in Discovery and Inference are skipped (default: NULL).
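The Discovery and Inference steps run on two disjoint parts of the data, and ratio_dis sets the share assigned to discovery. The following base R sketch shows such an honest split. The helper honest_split is hypothetical, and the assumption that cre() splits by simple random sampling without stratification is not confirmed here.

```r
# Hypothetical helper: split unit indices into disjoint discovery and
# inference sub-samples, with a ratio_dis share going to discovery.
honest_split <- function(n, ratio_dis = 0.5) {
  stopifnot(ratio_dis > 0, ratio_dis < 1)
  discovery <- sort(sample.int(n, size = floor(ratio_dis * n)))
  inference <- setdiff(seq_len(n), discovery)
  list(discovery = discovery, inference = inference)
}

set.seed(9687)
idx <- honest_split(n = 1000, ratio_dis = 0.5)
length(idx$discovery)  # 500
length(idx$inference)  # 500
```

Keeping the two index sets disjoint means the rules found in discovery are not evaluated on the same units that produced them.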
Notes
Options for the ITE estimation
[1] Options for the ITE estimation are as follows:
- S-Learner (slearner)
- T-Learner (tlearner)
- T-Poisson (tpoisson)
- X-Learner (xlearner)
- Augmented Inverse Probability Weighting (aipw)
- Causal Forests (cf)
- Causal Bayesian Additive Regression Trees (bart)
If ITE estimates are supplied through the additional ite argument, both ITE estimation steps (in discovery and in inference) are skipped and the supplied values are used instead. Otherwise, the ITE estimator also requires an outcome learner and/or a propensity score learner from the SuperLearner package (e.g., “SL.lm”, “SL.svm”). Both of these models are standard classifiers/regressors. By default, the XGBoost algorithm (“SL.xgboost”) is used for both steps.
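To show how an outcome learner and a propensity score learner combine in the default “aipw” option, here is a minimal base R sketch of the AIPW (doubly robust) pseudo-outcome for each unit. The learners are swapped in for illustration: glm() and lm() replace the SuperLearner learners (learner_ps, learner_y). The helper name aipw_ite is hypothetical. The sketch shows the standard AIPW formula and is not necessarily CRE's exact implementation (for example, any cross-fitting it may use).

```r
# Minimal AIPW ITE sketch. Assumption: glm() and lm() stand in for the
# SuperLearner learners (learner_ps, learner_y) that CRE would use.
aipw_ite <- function(y, z, X) {
  df <- data.frame(X)
  # Propensity score e(x) = P(z = 1 | x) from a logistic regression
  ps_fit <- glm(z ~ ., data = cbind(df, z = z), family = binomial())
  e <- fitted(ps_fit)
  # Outcome models fitted separately on treated and control units
  fit1 <- lm(y ~ ., data = cbind(df, y = y)[z == 1, ])
  fit0 <- lm(y ~ ., data = cbind(df, y = y)[z == 0, ])
  mu1 <- predict(fit1, newdata = df)
  mu0 <- predict(fit0, newdata = df)
  # AIPW pseudo-outcome: outcome-model contrast plus weighted residual corrections
  unname(mu1 - mu0 + z * (y - mu1) / e - (1 - z) * (y - mu0) / (1 - e))
}

# Toy data with confounding through x1 and a constant effect of 2
set.seed(123)
n <- 2000
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
z <- rbinom(n, 1, plogis(0.5 * X$x1))
y <- 1 + X$x1 + X$x2 + 2 * z + rnorm(n)
tau_hat <- aipw_ite(y, z, X)
mean(tau_hat)  # should be close to the true effect of 2
```

Individual pseudo-outcomes are noisy, but their average consistently estimates the average treatment effect if either the propensity model or the outcome models are correctly specified.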
Customized wrapper for SuperLearner
One can create a customized wrapper for the learners that SuperLearner calls internally. The following example sets the number of cores (e.g., 12) used by the xgboost package on a shared-memory system.
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}
Then pass “m_xgboost” instead of “SL.xgboost” (e.g., as learner_ps or learner_y in method_params).
Examples
Example 1 (default parameters)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]
cre_results <- cre(y, z, X)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
Example 2 (personalized ite estimation)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]
ite_pred <- ...  # personalized ITE estimates: one numeric value per unit
cre_results <- cre(y, z, X, ite = ite_pred)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
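As one illustration of how the personalized ite vector could be produced, the sketch below fits an S-learner with treatment-by-covariate interactions using base R lm(). The estimator choice and the helper name slearner_ite are hypothetical; any method that returns one numeric ITE per unit could be used instead. Toy data are generated by hand so the block runs without the package. The cre() call is left as a comment and mirrors Example 2.

```r
# Hypothetical personalized ITE estimator: an S-learner, i.e. a single
# linear model with treatment-by-covariate interactions.
slearner_ite <- function(y, z, X) {
  df <- data.frame(X)
  covs <- colnames(df)
  # y ~ z + x1 + ... + xp + z:x1 + ... + z:xp
  f <- reformulate(c("z", covs, paste0("z:", covs)), response = "y")
  fit <- lm(f, data = cbind(df, y = y, z = z))
  # Predict every unit under treatment (z = 1) and under control (z = 0)
  mu1 <- predict(fit, newdata = cbind(df, z = 1))
  mu0 <- predict(fit, newdata = cbind(df, z = 0))
  unname(mu1 - mu0)
}

# Toy data: the true ITE is 2 when x1 == 1 and 0 otherwise
set.seed(9687)
n <- 1000
X <- data.frame(x1 = rbinom(n, 1, 0.5),
                x2 = rbinom(n, 1, 0.5),
                x3 = rbinom(n, 1, 0.5))
z <- rbinom(n, 1, 0.5)
y <- X$x2 + 2 * z * X$x1 + rnorm(n)
ite_pred <- slearner_ite(y, z, X)
# The supplied ITE vector should be numeric, complete, one value per unit
stopifnot(is.numeric(ite_pred), length(ite_pred) == length(y), !anyNA(ite_pred))
# cre_results <- cre(y, z, X, ite = ite_pred)
```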
Example 3 (setting parameters)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]
method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "SL.xgboost",
                      learner_y = "SL.xgboost")
hyper_params <- list(intervention_vars = c("x1", "x2", "x3", "x4"),
                     offset = NULL,
                     ntrees = 20,
                     node_size = 20,
                     max_rules = 50,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     stability_selection = "vanilla",
                     cutoff = 0.8,
                     pfer = 1,
                     B = 10,
                     subsample = 0.5)
cre_results <- cre(y, z, X, method_params, hyper_params)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
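When only a few settings need to change, one option is to start from the documented defaults and override selectively with base R modifyList(). The sketch below copies the default values from the Arguments section. The object names default_hyper_params and hp are hypothetical. This README does not state whether cre() fills unspecified entries on its own, so passing the full list is the conservative choice.

```r
# Defaults of hyper_params, copied from the Arguments section
default_hyper_params <- list(intervention_vars = NULL,
                             offset = NULL,
                             ntrees = 20,
                             node_size = 20,
                             max_rules = 50,
                             max_depth = 3,
                             t_decay = 0.025,
                             t_ext = 0.01,
                             t_corr = 1,
                             stability_selection = "vanilla",
                             cutoff = 0.9,
                             pfer = 1,
                             B = 20,
                             subsample = 0.5)
# Override a few entries; all others keep their default values
hp <- modifyList(default_hyper_params,
                 list(max_depth = 2,
                      stability_selection = "error_control",
                      pfer = 0.5,
                      B = 50))
# cre_results <- cre(y, z, X, method_params, hp)
```

Note that modifyList() drops any entry whose override value is NULL, so a NULL default such as offset is best left untouched rather than overridden with NULL.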
More synthetic data sets can be generated using generate_cre_dataset().