Public data set for air pollution and health studies, case study: 2010 county-level data set for the contiguous United States
A dataset containing exposure, confounders, and outcome for causal inference studies. The dataset is hosted on Harvard Dataverse (doi:10.7910/DVN/L7YF2G). It was produced from five different sources, each described below. Please see https://github.com/NSAPH-Projects/synthetic_data/ for the data processing pipelines.
Exposure Data
The exposure parameter is PM2.5. Di et al. (2019) provided daily and annual PM2.5 estimates at 1 km × 1 km grid cells across the contiguous United States. The data can be downloaded from Di et al. (2021). Features in this category start with the qd_ prefix.
Census Data
The main reference for the census data is the United States Census Bureau. There are numerous studies and surveys for different geographical resolutions. We use the 2010 American Community Survey at the county level (acs5). Features in this category start with the cs_ prefix.
CDC Data
The Centers for Disease Control and Prevention (CDC) provides the Behavioral Risk Factor Surveillance System (Centers for Disease Control and Prevention (2021)), the nation’s premier system of health-related telephone surveys, which collect state data about U.S. residents regarding their health-related risk behaviors.
GridMET Data
The Climatology Lab at the University of California, Merced, provides the GridMET data (Abatzoglou (2013)), a daily surface meteorological data set covering the contiguous United States.
CMS Data
The Centers for Medicare and Medicaid Services (CMS) provides synthetic data at the county level for 2008-2010 (Centers for Medicare & Medicaid Services (2021)).
The definition of each variable is provided below. All data are for 2010, aggregated to the county level, and cover the contiguous United States.
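Because each source is identified by a column-name prefix, the variables can be grouped by source programmatically. The following is a minimal sketch on a toy data frame with made-up values. The `toy` object, the `known_prefixes` vector, and the "other" label are illustrative assumptions, not part of the data set. The cdc_, gmet_, and cms_ prefixes are taken from the variable names listed in the Format section.

```r
# Toy stand-in for the real data frame: same naming scheme, made-up values.
toy <- data.frame(
  qd_mean_pm25        = c(10.4, 8.1),
  cs_poverty          = c(0.12, 0.09),
  cs_total_population = c(54000, 12000),
  cdc_mean_bmi        = c(28.3, 27.1),
  gmet_mean_tmmn      = c(284.9, 280.2),
  cms_mortality_pct   = c(0.05, 0.04),
  region              = c("SOUTH", "WEST"),
  FIPS                = c("01001", "06001"),
  STATE_CODE          = c("01", "06"),
  stringsAsFactors    = FALSE
)

# Source prefixes used by the variable names (hypothetical helper vector).
known_prefixes <- c("qd", "cs", "cdc", "gmet", "cms")

# Take the text before the first underscore as the candidate prefix.
col_names <- names(toy)
prefix <- sub("_.*$", "", col_names)

# Identifier columns (region, FIPS, STATE_CODE, ...) match no source prefix.
prefix[!prefix %in% known_prefixes] <- "other"

# Named list: one character vector of column names per source.
columns_by_source <- split(col_names, prefix)
print(columns_by_source)
```

Selecting, say, only the census covariates is then `toy[, columns_by_source$cs]`.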
Usage
data(synthetic_us_2010)
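As a quick sanity check after loading, the dimensions can be compared with the ones stated in the Format section. This is only a sketch: the page does not name the package that ships the data set, so the package name `CausalGPS` below is an assumption and should be changed if the data lives elsewhere.

```r
# Assumption: the data set ships with the CausalGPS package. The page does
# not name the package, so change `pkg` if it lives elsewhere.
pkg <- "CausalGPS"

# Dimensions stated in the Format section: 3109 rows, 46 variables.
expected_dim <- c(3109L, 46L)

if (requireNamespace(pkg, quietly = TRUE)) {
  # Load the data set into the current environment.
  data("synthetic_us_2010", package = pkg)
  if (exists("synthetic_us_2010")) {
    print(dim(synthetic_us_2010))
    # TRUE when the loaded object matches the documented shape.
    print(identical(dim(synthetic_us_2010), expected_dim))
  }
} else {
  message("Package '", pkg, "' is not installed; skipping the check.")
}
```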
Format
A data frame with 3109 rows and 46 variables:
- qd_mean_pm25
Mean PM2.5 (µg/m³)
- cs_poverty
The proportion of the population below the poverty level among 65 years and over.
- cs_hispanic
The proportion of Hispanic or Latino population among 65 years and over.
- cs_black
The proportion of Black or African American population among 65 years and over.
- cs_white
The proportion of White population among 65 years and over.
- cs_native
The proportion of American Indian or Alaska native population among 65 years and over.
- cs_asian
The proportion of Asian population among 65 years and over.
- cs_other
The proportion of other races population among 65 years and over.
- cs_ed_below_highschool
The proportion of the population with below high school level education among 65 years and over.
- cs_household_income
Median household income in the past 12 months (in 2010 inflation-adjusted dollars) where the householder is 65 years and over.
- cs_median_house_value
Median house value (USD)
- cs_total_population
Total Population
- cs_area
Area of each county (square miles)
- cs_population_density
Population per square mile.
- cdc_mean_bmi
Mean Body Mass Index (BMI).
- cdc_pct_cusmoker
The proportion of current smokers.
- cdc_pct_sdsmoker
The proportion of some-days smokers.
- cdc_pct_fmsmoker
The proportion of former smokers.
- cdc_pct_nvsmoker
The proportion of never smokers.
- cdc_pct_nnsmoker
The proportion of respondents with unknown smoking status.
- gmet_mean_tmmn
Annual mean of daily minimum temperature (K)
- gmet_mean_summer_tmmn
The mean of daily minimum temperature during summer (K)
- gmet_mean_winter_tmmn
The mean of daily minimum temperature during winter (K)
- gmet_mean_tmmx
Annual mean of daily maximum temperature (K)
- gmet_mean_summer_tmmx
The mean of daily maximum temperature during summer (K)
- gmet_mean_winter_tmmx
The mean of daily maximum temperature during winter (K)
- gmet_mean_rmn
Annual mean of daily minimum relative humidity (%)
- gmet_mean_summer_rmn
The mean of daily minimum relative humidity during summer (%)
- gmet_mean_winter_rmn
The mean of daily minimum relative humidity during winter (%)
- gmet_mean_rmx
Annual mean of daily maximum relative humidity (%)
- gmet_mean_summer_rmx
The mean of daily maximum relative humidity during summer (%)
- gmet_mean_winter_rmx
The mean of daily maximum relative humidity during winter (%)
- gmet_mean_sph
Annual mean of daily mean specific humidity (kg/kg)
- gmet_mean_summer_sph
The mean of daily mean specific humidity during summer (kg/kg)
- gmet_mean_winter_sph
The mean of daily mean specific humidity during winter (kg/kg)
- cms_mortality_pct
The proportion of deceased patients.
- cms_white_pct
The proportion of White patients.
- cms_black_pct
The proportion of Black patients.
- cms_hispanic_pct
The proportion of Hispanic patients.
- cms_others_pct
The proportion of Other patients.
- cms_female_pct
The proportion of Female patients.
- region
The region that the county is located in.
NORTHEAST = c("NY","MA","PA","RI","NH","ME","VT","CT","NJ")
SOUTH = c("DC","VA","NC","WV","KY","SC","GA","FL","AL","TN","MS","AR","MD","DE","OK","TX","LA")
MIDWEST = c("OH","IN","MI","IA","MO","WI","MN","SD","ND","IL","KS","NE")
WEST = c("MT","CO","WY","ID","UT","NV","CA","OR","WA","AZ","NM")
- FIPS
Federal Information Processing Standards (FIPS) code, a unique ID for each county.
- NAME
County, State name.
- STATE
State abbreviation.
- STATE_CODE
State numerical code.
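The region variable is determined by the state abbreviation. As a sketch of how that assignment could be reproduced or checked against the STATE column, the four state lists can be inverted into a lookup vector. The names `region_states`, `state_to_region`, and `lookup_region` are hypothetical helpers introduced here for illustration. This is not necessarily how the original pipeline computes the variable.

```r
# State lists per region, as given in the description of `region`.
region_states <- list(
  NORTHEAST = c("NY", "MA", "PA", "RI", "NH", "ME", "VT", "CT", "NJ"),
  SOUTH     = c("DC", "VA", "NC", "WV", "KY", "SC", "GA", "FL", "AL",
                "TN", "MS", "AR", "MD", "DE", "OK", "TX", "LA"),
  MIDWEST   = c("OH", "IN", "MI", "IA", "MO", "WI", "MN", "SD", "ND",
                "IL", "KS", "NE"),
  WEST      = c("MT", "CO", "WY", "ID", "UT", "NV", "CA", "OR", "WA",
                "AZ", "NM")
)

# Invert the list into a named character vector: state abbreviation -> region.
state_to_region <- setNames(
  rep(names(region_states), lengths(region_states)),
  unlist(region_states, use.names = FALSE)
)

# Look up the region for a vector of state abbreviations.
# States outside the contiguous United States return NA.
lookup_region <- function(state) {
  unname(state_to_region[state])
}

print(lookup_region(c("CA", "TX", "OH", "NY")))
```

The four lists together cover the 48 contiguous states plus DC, so `length(state_to_region)` is 49 and any abbreviation outside that set maps to `NA`.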
References
Abatzoglou, John T. 2013. “Development of Gridded Surface Meteorological Data for Ecological Applications and Modelling.” International Journal of Climatology 33 (1): 121–31. doi:10.1002/joc.3413.
Centers for Disease Control and Prevention. 2021. “Behavioral Risk Factor Surveillance System.” https://www.cdc.gov/brfss/annual_data/annual_2010.htm/.
Centers for Medicare & Medicaid Services. 2021. “CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF).” https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-claims-synthetic-public-use-files/cms-2008-2010-data-entrepreneurs-synthetic-public-use-file-de-synpuf.
Di, Qian, Heresh Amini, Liuhua Shi, Itai Kloog, Rachel Silvern, James Kelly, M Benjamin Sabath, et al. 2019. “An Ensemble-Based Model of PM2.5 Concentration Across the Contiguous United States with High Spatiotemporal Resolution.” Environment International 130: 104909. doi:10.1016/j.envint.2019.104909.
Di, Qian, Yaguang Wei, Alexandra Shtein, Carolynne Hultquist, Xiaoshi Xing, Heresh Amini, Liuhua Shi, et al. 2021. “Daily and Annual PM2.5 Concentrations for the Contiguous United States, 1-Km Grids, V1 (2000 - 2016).” NASA Socioeconomic Data and Applications Center (SEDAC). doi:10.7927/0rvr-4538.