R/generate_synthetic_data_covs.R
generate_syn_data_covs.Rd
Generates a synthetic data set based on different GPS models and covariates.
generate_syn_data_covs(sample_size = 10000, gps_spec = 1)
sample_size: Desired size of the generated synthetic data set.
gps_spec: A flag that determines the level of complexity in the generated synthetic data. Available options:
gps_spec == 1: In this scenario, there's no confounding, meaning the
treatment variable is independent of the covariates. The mu (mean of the
truncated normal distribution from which treatment values are drawn) is
set to 3.
gps_spec == 2: In this scenario, confounding is present. The
treatment is not independent of the covariates; it is influenced by the
variables cf[, 1], cf[, 2], cf[, 3], cf[, 4], cf5, and cf6.
These variables enter the computation of mu, which is
then used to generate the treatment variable.
gps_spec == 3: Similar to gps_spec == 2, but it introduces additional
complexity. Not only are the variables cf[, 1], cf[, 2], cf[, 3],
cf[, 4], cf5, and cf6 affecting the treatment, but the effect
modifiers em1 and em2 are also included in the mu calculation.
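The three options can be sketched in base R as follows. This is an illustration only, not the package's implementation: the covariate distributions, every coefficient, the standard deviation, the truncation bounds, and the helper names rtruncnorm_sketch and sketch_mu are assumptions made up for this example. Only the baseline mu of 3 and the lists of variables entering mu come from the descriptions above.

```r
# Draw n values from a normal(mean, sd) truncated to [lower, upper]
# by inverse-CDF sampling. Hypothetical helper, base R only.
rtruncnorm_sketch <- function(n, mean, sd, lower, upper) {
  p_lo <- pnorm(lower, mean = mean, sd = sd)
  p_hi <- pnorm(upper, mean = mean, sd = sd)
  u <- runif(n, min = p_lo, max = p_hi)
  x <- qnorm(u, mean = mean, sd = sd)
  # Clamp to guard against floating-point rounding at the bounds.
  pmin(pmax(x, lower), upper)
}

set.seed(1)
n <- 1000

# Covariate names follow the documentation; their distributions are assumed.
cf  <- matrix(rnorm(n * 4), ncol = 4)
cf5 <- sample(-2:2, n, replace = TRUE)
cf6 <- runif(n, min = -3, max = 3)
em1 <- rbinom(n, 1, 0.5)
em2 <- rbinom(n, 1, 0.5)

# Assumed linear contribution of the confounders (illustrative coefficients).
confounding <- 0.2 * cf[, 1] + 0.2 * cf[, 2] - 0.2 * cf[, 3] +
  0.1 * cf[, 4] + 0.1 * cf5 + 0.1 * cf6

# mu for each gps_spec value, mirroring the three scenarios described above.
sketch_mu <- function(gps_spec) {
  switch(as.character(gps_spec),
    "1" = rep(3, n),                                # no confounding
    "2" = 3 + confounding,                          # confounders only
    "3" = 3 + confounding + 0.3 * em1 - 0.3 * em2,  # plus effect modifiers
    stop("gps_spec must be 1, 2, or 3")
  )
}

# One treatment vector per scenario; sd and bounds are assumed values.
treat <- lapply(1:3, function(s) {
  rtruncnorm_sketch(n, mean = sketch_mu(s), sd = 1, lower = 0, upper = 6)
})
```

With gps_spec == 1 every unit shares the same mu, so the treatment carries no information about the covariates; in the other two scenarios mu varies from unit to unit, which is what induces the dependence.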
A data.frame containing the synthetic data set, including the covariates and the treatment.
data <- generate_syn_data_covs(sample_size = 500, gps_spec = 1)