R/generate_synthetic_data_covs.R
generate_syn_data_covs.Rd
Generates a synthetic data set based on different GPS models and covariates.
generate_syn_data_covs(sample_size = 10000, gps_spec = 1)
sample_size: Desired size of the generated synthetic data set.
gps_spec: A flag that determines the level of complexity in the generated synthetic data. Available options:
gps_spec == 1: In this scenario there is no confounding, meaning the
treatment variable is independent of the covariates. mu (the mean of the
truncated normal distribution from which treatment values are drawn) is
set to 3.
gps_spec == 2: In this scenario confounding is included. The
treatment is not independent of the covariates; it is influenced by the
variables cf[, 1], cf[, 2], cf[, 3], cf[, 4], cf5, and cf6.
These variables enter the computation of mu, which is
then used to generate the treatment variable.
gps_spec == 3: Similar to gps_spec == 2, but it introduces additional
complexity. Not only do the variables cf[, 1], cf[, 2], cf[, 3],
cf[, 4], cf5, and cf6 affect the treatment, but the effect
modifiers em1 and em2 are also included in the mu calculation.
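The three scenarios can be illustrated with a minimal base-R sketch. It is not the function's actual implementation: the covariate distributions, the coefficients, the standard deviation, the truncation bounds, and the helper names rtruncnorm_sketch and mu_for_spec are all assumptions chosen for illustration. Only the overall structure follows the description above: mu is fixed at 3 for gps_spec == 1, depends on the covariates for gps_spec == 2, and additionally depends on em1 and em2 for gps_spec == 3.

```r
# Illustrative sketch only. Coefficients, sd, truncation bounds, and the
# covariate distributions are assumptions, not the package's actual values.
set.seed(42)
n <- 500

# Hypothetical covariates: cf is a 4-column matrix; cf5 and cf6 are vectors.
cf  <- matrix(rnorm(n * 4), ncol = 4)
cf5 <- sample(-2:2, n, replace = TRUE)
cf6 <- runif(n, min = -3, max = 3)

# Hypothetical binary effect modifiers.
em1 <- rbinom(n, size = 1, prob = 0.5)
em2 <- rbinom(n, size = 1, prob = 0.5)

# Draw from a normal distribution truncated to [lower, upper]
# using inverse-CDF sampling (vectorised over mu).
rtruncnorm_sketch <- function(mu, sd = 1, lower = 0, upper = 20) {
  p_lo <- pnorm(lower, mean = mu, sd = sd)
  p_hi <- pnorm(upper, mean = mu, sd = sd)
  u <- runif(length(mu))
  qnorm(p_lo + u * (p_hi - p_lo), mean = mu, sd = sd)
}

# Mean of the treatment distribution under each scenario.
mu_for_spec <- function(gps_spec) {
  confounded <- 3 + 0.5 * cf[, 1] - 0.3 * cf[, 2] + 0.2 * cf[, 3] +
    0.4 * cf[, 4] + 0.1 * cf5 + 0.1 * cf6
  switch(as.character(gps_spec),
    "1" = rep(3, n),                           # no confounding: constant mu
    "2" = confounded,                          # covariates shift mu
    "3" = confounded + 0.5 * em1 - 0.5 * em2,  # effect modifiers also shift mu
    stop("gps_spec must be 1, 2, or 3")
  )
}

# One treatment vector per scenario.
treat <- lapply(1:3, function(s) rtruncnorm_sketch(mu_for_spec(s)))

# Correlation between the treatment and the first covariate in each scenario.
cor_cf1 <- sapply(treat, function(w) cor(w, cf[, 1]))
print(round(cor_cf1, 2))
```

Under these assumed coefficients, the correlation for the first scenario should be close to zero, while the second and third should show a clear positive association, which is what confounding of the treatment by cf[, 1] looks like in the data.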
A data.frame containing the synthetic data set, including covariates and treatment.
data <- generate_syn_data_covs(sample_size = 500, gps_spec = 1)