Installation
```r
library("devtools")
install_github("NSAPH-Software/CausalGPS", ref = "master")
library("CausalGPS")
```
Usage
Input parameters:
Y
A vector of observed outcome variable.
w
A vector of observed continuous exposure variable.
c
A data.frame or matrix of observed covariates variable.
ci_appr
The causal inference approach. Possible values are:
 "matching": Matching by GPS
 "weighting": Weighting by GPS
gps_density
Model density type used for estimating the GPS value: normal (default) or kernel.
use_cov_transform
If TRUE, the function uses transformers to meet the covariate balance.
transformers
A list of transformers. Each transformer should be a unary function. You can pass the name of a customized function in quotes. Available transformers:
 pow2: to the power of 2
 pow3: to the power of 3
bin_seq
Sequence of w (treatment) values used to generate the pseudo population. If NULL is passed, the default value is used, which is seq(min(w)+delta_n/2, max(w), by=delta_n).
exposure_trim_qtls
A numerical vector of two. Represents the trim quantile levels for the exposure value. Both numbers should be in the range [0,1] and in increasing order (default: c(0.01,0.99)).
gps_trim_qtls
A numerical vector of two. Represents the trim quantile levels for the GPS value. Both numbers should be in the range [0,1] and in increasing order (default: c(0.0, 1.0)).
params
A list of parameters used internally. Unrelated parameters are ignored.
sl_lib
A vector of prediction algorithms.
nthread
An integer value that represents the number of threads to be used by internal packages.
...
Additional arguments passed to different models.
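The transformer, bin_seq, and trimming descriptions above can be illustrated in base R. This is a minimal sketch, not the package's implementation: it applies pow2, pow3, and the base function abs by name (as the transformers list allows), computes the documented default bin_seq, and trims the exposure at exposure_trim_qtls. The exposure values, covariate values, and delta_n = 1 are made up for illustration.

```r
# Hypothetical sketch, not CausalGPS internals: it only illustrates what the
# transformers, the default bin_seq, and exposure_trim_qtls describe.
set.seed(1)
w <- rnorm(100, mean = 10, sd = 2)  # made-up exposure values
delta_n <- 1                        # caliper value, as in the matching example

# Transformers are unary functions referenced by name.
pow2 <- function(x) x^2
pow3 <- function(x) x^3
cf <- c(-2, 0.5, 3)                 # made-up covariate values
transformed <- sapply(c("pow2", "pow3", "abs"),
                      function(fn) do.call(fn, list(cf)))
# One column per transformer: pow2 -> 4, 0.25, 9; pow3 -> -8, 0.125, 27

# Default bin_seq used when bin_seq = NULL.
bin_seq <- seq(min(w) + delta_n / 2, max(w), by = delta_n)

# Keep exposure values between the 1% and 99% quantiles.
exposure_trim_qtls <- c(0.01, 0.99)
limits <- quantile(w, exposure_trim_qtls)
w_trimmed <- w[w >= limits[1] & w <= limits[2]]
```

With 100 distinct exposure values, these quantile limits drop only the single smallest and single largest value.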
Additional parameters
Causal Inference Approach (ci_appr)
 if ci_appr = "matching":

dist_measure: Distance measuring function. Available options:
 l1: Manhattan distance matching

delta_n: caliper parameter.

scale: a specified scale parameter to control the relative weight that is attributed to the distance measures of the exposure versus the GPS.

covar_bl_method: covariate balance method. Available options:
 "absolute"

covar_bl_trs: covariate balance threshold.

covar_bl_trs_type: covariate balance type (mean, median, maximal).

max_attempt: maximum number of attempts to satisfy covariate balance.
See create_matching() for more details about the parameters and default values.
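The matching parameters above can be made concrete with a small base R sketch. It is not the package's algorithm and rests on stated assumptions: exposure and GPS values are already on comparable scales, scale weights the GPS term (and 1 - scale the exposure term), candidates must lie within delta_n / 2 of the target exposure, and the "absolute" balance measure is taken to be the absolute correlation between exposure and each covariate. The helpers l1_match and covariate_balance are hypothetical.

```r
# Hypothetical sketch, not CausalGPS internals. Assumptions: exposure and GPS
# values are on comparable scales, `scale` weights the GPS term, and
# candidates must lie within delta_n / 2 of the target exposure.
l1_match <- function(w_obs, gps_obs, w0, gps0, delta_n, scale) {
  in_caliper <- which(abs(w_obs - w0) <= delta_n / 2)
  if (length(in_caliper) == 0) return(NA_integer_)
  # Manhattan (l1) distance, weighted between the GPS and exposure terms
  d <- scale * abs(gps_obs[in_caliper] - gps0) +
    (1 - scale) * abs(w_obs[in_caliper] - w0)
  in_caliper[which.min(d)]  # index of the closest observed unit
}

# "absolute" balance, assumed here to be the absolute correlation between the
# exposure and each covariate, summarized according to covar_bl_trs_type.
covariate_balance <- function(w, covariates,
                              type = c("mean", "median", "maximal")) {
  type <- match.arg(type)
  abs_cor <- abs(sapply(covariates, function(x) cor(w, x)))
  switch(type,
         mean = mean(abs_cor),
         median = median(abs_cor),
         maximal = max(abs_cor))
}

# Usage with made-up numbers
l1_match(w_obs = c(1.0, 1.4, 2.0, 3.0), gps_obs = c(0.2, 0.9, 0.5, 0.1),
         w0 = 1.5, gps0 = 0.8, delta_n = 1, scale = 0.5)  # returns 2
covariate_balance(1:10, data.frame(a = 1:10, b = c(1:5, 5:1))) < 0.1  # FALSE
```

In this toy balance check, one covariate is perfectly correlated with the exposure and the other is uncorrelated, so the mean absolute correlation is 0.5 and a threshold of 0.1 is not met.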

 if ci_appr = "weighting":

covar_bl_method: Covariate balance method.

covar_bl_trs: Covariate balance threshold.

max_attempt: Maximum number of attempts to satisfy covariate balance.
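For the weighting approach, one common construction is the stabilized inverse-GPS weight: the marginal density of the exposure divided by the GPS. The sketch below uses that construction with a normal marginal and a weighted absolute correlation as the balance check. It is an assumption for illustration, not necessarily how CausalGPS computes its weights; the helpers stabilized_weights and weighted_abs_cor and the simulated data are hypothetical.

```r
# Hypothetical sketch, not CausalGPS internals: stabilized GPS weights
# (normal marginal density of the exposure divided by the GPS) and a
# weighted absolute correlation used to check covariate balance.
stabilized_weights <- function(w, gps) {
  marginal <- dnorm(w, mean = mean(w), sd = sd(w))
  marginal / gps
}

weighted_abs_cor <- function(w, x, wt) {
  # cov.wt normalizes the weights internally
  abs(cov.wt(cbind(w, x), wt = wt, cor = TRUE)$cor[1, 2])
}

# Usage with made-up data
set.seed(7)
x <- rnorm(200)
w <- 0.5 * x + rnorm(200)
gps <- dnorm(w, mean = 0.5 * x, sd = 1)  # pretend GPS from a known model
wt <- stabilized_weights(w, gps)
weighted_abs_cor(w, x, wt)  # compare against covar_bl_trs, e.g. 0.1
```

With equal weights, weighted_abs_cor reduces to the ordinary absolute Pearson correlation.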

Generating Pseudo Population
```r
set.seed(422)
n <- 10000
mydata <- generate_syn_data(sample_size = n)

# Add two categorical covariates (year and region) to the synthetic data
year <- sample(x = c("2001", "2002", "2003", "2004", "2005"), size = n,
               replace = TRUE)
region <- sample(x = c("North", "South", "East", "West"), size = n,
                 replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)

pseudo_pop <- generate_pseudo_pop(
  mydata[, c("id", "Y")],                        # outcome
  mydata[, c("id", "w")],                        # exposure
  mydata[, c("id", "cf1", "cf2", "cf3", "cf4",
             "cf5", "cf6", "year", "region")],   # covariates
  ci_appr = "matching",
  gps_density = "kernel",
  use_cov_transform = TRUE,
  transformers = list("pow2", "pow3", "abs", "scale"),
  exposure_trim_qtls = c(0.01, 0.99),
  sl_lib = c("m_xgboost"),
  covar_bl_method = "absolute",
  covar_bl_trs = 0.1,
  covar_bl_trs_type = "mean",
  max_attempt = 4,
  dist_measure = "l1",
  delta_n = 1,
  scale = 0.5,
  nthread = 1)

plot(pseudo_pop)
```
Setting dist_measure = "l1" selects the Manhattan distance matching approach. For the prediction model we use the SuperLearner package, which supports different machine learning methods and packages. params is a list of hyperparameters that users can pass to the third-party libraries in the SuperLearner package. All hyperparameters go into the params list, and prefixes are used to distinguish the parameters of different libraries. The following table shows the external package names, their equivalent names to be used in sl_lib, the prefixes to be used for their hyperparameters in the params list, and the available hyperparameters.
| Package name | sl_lib name | prefix | available hyperparameters |
|---|---|---|---|
| XGBoost | m_xgboost | xgb_ | nrounds, eta, max_depth, min_child_weight |
| ranger | m_ranger | rgr_ | num.trees, write.forest, replace, verbose, family |
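The prefix convention in the table can be illustrated with a small helper. This is a hypothetical sketch, not a CausalGPS function: it splits a flat params list into per-library lists using the xgb_ and rgr_ prefixes, strips the prefix, and drops names with no recognized prefix, mirroring the statement that unrelated parameters are ignored.

```r
# Hypothetical helper (not part of CausalGPS): split a flat params list into
# per-library hyperparameter lists using the documented prefixes. Names with
# no recognized prefix are dropped.
split_params_by_prefix <- function(params,
                                   prefixes = c(m_xgboost = "xgb_",
                                                m_ranger = "rgr_")) {
  lapply(prefixes, function(p) {
    hits <- params[startsWith(names(params), p)]
    names(hits) <- substring(names(hits), nchar(p) + 1)  # remove the prefix
    hits
  })
}

# Usage with made-up values
params <- list(xgb_nrounds = c(10, 20), xgb_eta = 0.3,
               rgr_num.trees = 500, unrelated = 1)
split_params <- split_params_by_prefix(params)
# split_params$m_xgboost: nrounds, eta; split_params$m_ranger: num.trees
```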
nthread is the number of available threads (cores). XGBoost needs OpenMP installed on the system to parallelize the processing.
Estimating GPS
```r
# w: observed exposure; c: observed covariates (data.frame or matrix)
data_with_gps <- estimate_gps(w,
                              c,
                              params = list(xgb_max_depth = c(3, 4, 5),
                                            xgb_nrounds = c(10, 20, 30, 40)),
                              nthread = 1,
                              sl_lib = c("m_xgboost"))
```
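To show what a GPS value is under gps_density = "normal", the sketch below models the exposure as normal given the covariates and evaluates that density at each observed exposure. It is a simplified assumption, not the package's estimator: ordinary least squares (lm) stands in for the SuperLearner prediction model, and gps_normal_sketch is a hypothetical name.

```r
# Hypothetical sketch, not the package's estimator: a normal-density GPS in
# which ordinary least squares stands in for the SuperLearner model.
gps_normal_sketch <- function(w, covariates) {
  fit <- lm(w ~ ., data = data.frame(w = w, covariates))
  mu <- fitted(fit)               # predicted exposure E[w | c]
  sigma <- sd(residuals(fit))     # residual spread
  dnorm(w, mean = mu, sd = sigma) # GPS: density of the observed w given c
}

# Usage with made-up data
set.seed(2)
covars <- data.frame(cf1 = rnorm(200), cf2 = rnorm(200))
w <- 1 + 2 * covars$cf1 + rnorm(200)
gps <- gps_normal_sketch(w, covars)
```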
Estimating Exposure-Response Function
```r
# Usage (function signature)
estimate_npmetric_erf(matched_Y,
                      matched_w,
                      matched_counter = NULL,
                      bw_seq = seq(0.2, 2, 0.2),
                      w_vals,
                      nthread)
```
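The arguments of estimate_npmetric_erf suggest a kernel smoother on the matched data with a bandwidth chosen from bw_seq and evaluated at w_vals. The sketch below swaps in a Nadaraya-Watson (Gaussian kernel) smoother, treats matched_counter as frequency weights, and picks the bandwidth by weighted leave-one-out cross-validation. These are all assumptions for illustration, not the package's estimator; nw_smooth and erf_sketch are hypothetical names, and nthread is omitted.

```r
# Hypothetical sketch, not the package's estimator: a Nadaraya-Watson
# (Gaussian kernel) smoother of Y on w over the matched data, with
# matched_counter used as frequency weights and the bandwidth chosen from
# bw_seq by weighted leave-one-out cross-validation.
nw_smooth <- function(x0, x, y, wt, bw) {
  sapply(x0, function(v) {
    k <- wt * dnorm((x - v) / bw)
    sum(k * y) / sum(k)
  })
}

erf_sketch <- function(matched_Y, matched_w, matched_counter = NULL,
                       bw_seq = seq(0.2, 2, 0.2), w_vals) {
  if (is.null(matched_counter)) matched_counter <- rep(1, length(matched_Y))
  n <- length(matched_Y)
  cv_error <- sapply(bw_seq, function(bw) {
    loo_pred <- sapply(seq_len(n), function(i) {
      nw_smooth(matched_w[i], matched_w[-i], matched_Y[-i],
                matched_counter[-i], bw)
    })
    sum(matched_counter * (matched_Y - loo_pred)^2)
  })
  best_bw <- bw_seq[which.min(cv_error)]
  list(bw = best_bw,
       erf = nw_smooth(w_vals, matched_w, matched_Y, matched_counter, best_bw))
}

# Usage with made-up matched data
set.seed(3)
mw <- runif(50, 0, 5)
my <- 2 + mw + rnorm(50, sd = 0.1)
res <- erf_sketch(my, mw, w_vals = c(1, 2, 3, 4))
```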
Generating Synthetic Data
```r
syn_data <- generate_syn_data(sample_size = 1000,
                              outcome_sd = 10,
                              gps_spec = 1,
                              cova_spec = 1)
```
Logging
The CausalGPS package logs its internal activities to the CausalGPS.log file. The file is located in the source file location, and new entries are appended to it. Users can change the logging file name (and path) and the logging threshold. The logging mechanism supports different thresholds (see the logger package).
The two most important thresholds are the INFO and DEBUG levels. The former, which is the default level, logs general information about the process. The latter, if activated, logs more detailed information that can be used for debugging.
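The effect of the INFO and DEBUG thresholds can be seen with the logger package directly. The sketch below writes to a temporary file instead of CausalGPS.log and uses logger's global settings. It illustrates the threshold behavior only and is not how CausalGPS itself exposes its logging configuration.

```r
library(logger)

# Send log lines to a temporary file (stand-in for CausalGPS.log)
log_file <- tempfile(fileext = ".log")
log_appender(appender_file(log_file))

log_threshold(INFO)   # default level: general information only
log_info("kept: INFO message at INFO threshold")
log_debug("dropped: DEBUG message at INFO threshold")

log_threshold(DEBUG)  # more detailed output for debugging
log_debug("kept: DEBUG message at DEBUG threshold")

lines <- readLines(log_file)

# Restore logger's defaults
log_threshold(INFO)
log_appender(appender_console)
```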