A dataset containing exposure, confounders, and outcome for causal inference studies. The dataset is hosted on Harvard Dataverse (doi:10.7910/DVN/L7YF2G). It was produced from five different sources, which are described in the sections below. See https://github.com/NSAPH-Projects/synthetic_data/ for the data processing pipelines.

Exposure Data

The exposure parameter is PM2.5. Di et al. (2019) provide daily and annual PM2.5 estimates at 1 km × 1 km grid cells across the contiguous United States. The data can be downloaded from Di et al. (2021). Features in this category start with the qd_ prefix.

Census Data

The main source for the census data is the United States Census Bureau, which runs numerous studies and surveys at different geographical resolutions. We use the 2010 American Community Survey 5-year estimates (acs5) at the county level. Features in this category start with the cs_ prefix.

CDC Data

The Centers for Disease Control and Prevention (CDC) provides the Behavioral Risk Factor Surveillance System (Centers for Disease Control and Prevention (2021)), which is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors.

GridMET Data

The Climatology Lab at the University of California, Merced, provides the GridMET data (Abatzoglou (2013)). The data set contains daily surface meteorological data covering the contiguous United States.

CMS Data

The Centers for Medicare and Medicaid Services (CMS) provides synthetic data at the county level for 2008-2010 (Centers for Medicare & Medicaid Services (2021)).

The definition of each variable is provided below. All data are for 2010, aggregated to the county level, and cover the contiguous United States.
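The prefix convention makes it possible to group columns by source. The sketch below is a minimal illustration on a toy data frame, not on the real dataset: the two rows and all values are invented, and only a handful of the documented column names are included. The qd_ and cs_ prefixes are stated in the text above; the cdc_, gmet_, and cms_ prefixes are inferred from the variable names listed under Format. The names `toy`, `prefixes`, and `columns_by_source` are hypothetical.

```r
# Toy stand-in for synthetic_us_2010 (assumption: all values are invented).
toy <- data.frame(
  qd_mean_pm25        = c(8.1, 11.4),
  cs_poverty          = c(0.09, 0.12),
  cs_total_population = c(52000, 310000),
  cdc_mean_bmi        = c(27.3, 28.1),
  gmet_mean_tmmn      = c(279.6, 284.2),
  cms_mortality_pct   = c(0.04, 0.05),
  FIPS                = c("00001", "00002"),  # placeholder IDs, not real counties
  stringsAsFactors    = FALSE
)

# Source prefixes used in the column names.
prefixes <- c(exposure = "qd_", census = "cs_", cdc = "cdc_",
              gridmet = "gmet_", cms = "cms_")

# For each prefix, collect the column names that start with it.
columns_by_source <- lapply(prefixes, function(p) {
  names(toy)[startsWith(names(toy), p)]
})

# Census columns present in the toy data frame.
columns_by_source$census
```

With the real data loaded, replacing `toy` with `synthetic_us_2010` should give the full column groups, assuming the column names match those listed under Format.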

Usage

data(synthetic_us_2010)

Format

A data frame with 3109 rows and 46 variables:

qd_mean_pm25

Mean PM2.5 (µg/m³)

cs_poverty

The proportion of the population below the poverty level among 65 years and over.

cs_hispanic

The proportion of Hispanic or Latino population among 65 years and over.

cs_black

The proportion of Black or African American population among 65 years and over.

cs_white

The proportion of White population among 65 years and over.

cs_native

The proportion of American Indian or Alaska Native population among 65 years and over.

cs_asian

The proportion of Asian population among 65 years and over.

cs_other

The proportion of the population of other races among 65 years and over.

cs_ed_below_highschool

The proportion of the population with below high school level education among 65 years and over.

cs_household_income

Median household income in the past 12 months (in 2010 inflation-adjusted dollars) where the householder is 65 years and over.

cs_median_house_value

Median house value (USD)

cs_total_population

Total Population

cs_area

Area of each county (square miles)

cs_population_density

Population per square mile.

cdc_mean_bmi

Mean body mass index (BMI).

cdc_pct_cusmoker

The proportion of current smokers.

cdc_pct_sdsmoker

The proportion of some days smokers.

cdc_pct_fmsmoker

The proportion of former smokers.

cdc_pct_nvsmoker

The proportion of never smokers.

cdc_pct_nnsmoker

The proportion of respondents whose smoking status is not known.

gmet_mean_tmmn

Annual mean of daily minimum temperature (K)

gmet_mean_summer_tmmn

The mean of daily minimum temperature during summer (K)

gmet_mean_winter_tmmn

The mean of daily minimum temperature during winter (K)

gmet_mean_tmmx

Annual mean of daily maximum temperature (K)

gmet_mean_summer_tmmx

The mean of daily maximum temperature during summer (K)

gmet_mean_winter_tmmx

The mean of daily maximum temperature during winter (K)

gmet_mean_rmn

Annual mean of daily minimum relative humidity (%)

gmet_mean_summer_rmn

The mean of daily minimum relative humidity during summer (%)

gmet_mean_winter_rmn

The mean of daily minimum relative humidity during winter (%)

gmet_mean_rmx

Annual mean of daily maximum relative humidity (%)

gmet_mean_summer_rmx

The mean of daily maximum relative humidity during summer (%)

gmet_mean_winter_rmx

The mean of daily maximum relative humidity during winter (%)

gmet_mean_sph

Annual mean of daily mean specific humidity (kg/kg)

gmet_mean_summer_sph

The mean of daily mean specific humidity during summer (kg/kg)

gmet_mean_winter_sph

The mean of daily mean specific humidity during winter (kg/kg)

cms_mortality_pct

The proportion of deceased patients.

cms_white_pct

The proportion of White patients.

cms_black_pct

The proportion of Black patients.

cms_hispanic_pct

The proportion of Hispanic patients.

cms_others_pct

The proportion of Other patients.

cms_female_pct

The proportion of Female patients.

region

The region that the county is located in.

  NORTHEAST=c("NY","MA","PA","RI","NH","ME","VT","CT","NJ")
  SOUTH=c("DC","VA","NC","WV","KY","SC","GA","FL","AL","TN","MS","AR","MD","DE","OK","TX","LA")
  MIDWEST=c("OH","IN","MI","IA","MO","WI","MN","SD","ND","IL","KS","NE")
  WEST=c("MT","CO","WY","ID","UT","NV","CA","OR","WA","AZ","NM")

FIPS

Federal Information Processing Standards, a unique ID for each county.

NAME

County, State name.

STATE

State abbreviation.

STATE_CODE

State numerical code.
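The region definitions listed for the `region` variable can be turned into a lookup from state abbreviation to region. The following is a minimal sketch: the state lists are copied from the definition of `region`, while `regions`, `state_to_region`, and `region_of` are hypothetical names introduced here. Whether the `region` column stores exactly these labels is an assumption.

```r
# Region definitions copied from the description of the `region` variable.
regions <- list(
  NORTHEAST = c("NY","MA","PA","RI","NH","ME","VT","CT","NJ"),
  SOUTH     = c("DC","VA","NC","WV","KY","SC","GA","FL","AL","TN","MS",
                "AR","MD","DE","OK","TX","LA"),
  MIDWEST   = c("OH","IN","MI","IA","MO","WI","MN","SD","ND","IL","KS","NE"),
  WEST      = c("MT","CO","WY","ID","UT","NV","CA","OR","WA","AZ","NM")
)

# Invert the list into a named vector: state abbreviation -> region name.
state_to_region <- setNames(
  rep(names(regions), lengths(regions)),
  unlist(regions, use.names = FALSE)
)

# Hypothetical helper: region label for a vector of state abbreviations.
region_of <- function(state) unname(state_to_region[state])

region_of(c("MA", "TX", "IL", "CA"))

# The four lists cover the 48 contiguous states plus DC.
length(state_to_region)
```

Applied to the `STATE` column, a lookup like this could be used to cross-check the `region` column, for example with `table(region_of(synthetic_us_2010$STATE), synthetic_us_2010$region)`, assuming `STATE` holds two-letter abbreviations as described.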

References

Abatzoglou, John T. 2013. “Development of Gridded Surface Meteorological Data for Ecological Applications and Modelling.” International Journal of Climatology 33 (1): 121–31. doi:10.1002/joc.3413.

Centers for Disease Control and Prevention. 2021. “Behavioral Risk Factor Surveillance System.” https://www.cdc.gov/brfss/annual_data/annual_2010.htm/.

Centers for Medicare & Medicaid Services. 2021. “CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF).” https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-claims-synthetic-public-use-files/cms-2008-2010-data-entrepreneurs-synthetic-public-use-file-de-synpuf.

Di, Qian, Heresh Amini, Liuhua Shi, Itai Kloog, Rachel Silvern, James Kelly, M Benjamin Sabath, et al. 2019. “An Ensemble-Based Model of PM2.5 Concentration Across the Contiguous United States with High Spatiotemporal Resolution.” Environment International 130: 104909. doi:10.1016/j.envint.2019.104909.

Di, Qian, Yaguang Wei, Alexandra Shtein, Carolynne Hultquist, Xiaoshi Xing, Heresh Amini, Liuhua Shi, et al. 2021. “Daily and Annual PM2.5 Concentrations for the Contiguous United States, 1-Km Grids, V1 (2000 - 2016).” NASA Socioeconomic Data and Applications Center (SEDAC). doi:10.7927/0rvr-4538.