Data Resource Hub
Below are freely available public datasets, categorized by topic. If you have any questions or need assistance navigating the data resources, please reach out to us at ceph@northwestern.edu.
National
Dryad
Dryad is an open data publishing platform and community committed to the open availability and re-use of all research data. Thousands of datasets are available for download across a range of topics.
ICPSR
The ICPSR contains thousands of ready-to-download datasets and individual variables on a variety of topics. Survey and longitudinal data are available for download through ICPSR.
IPUMS Health Surveys
IPUMS Health Surveys (NHIS) provides US data from 1963 to the present. IPUMS Health Surveys are best suited to cross-sectional analyses with individual-level health events as the outcome. Add any number of variables “to cart” and IPUMS will generate a .csv dataset to download. Data can be readily downloaded from R using the IPUMS R Interface.
IPUMS USA
IPUMS USA contains census microdata from 1790 to 2010 and American Community Survey (ACS) data from 2000 to the present. IPUMS USA data are useful for cross-sectional analyses at the household or individual level. Add any number of variables “to cart” and IPUMS will generate a .csv dataset to download. Data can be readily downloaded from R using the IPUMS R Interface.
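As a minimal sketch of an individual-level, cross-sectional estimate from a downloaded extract, the Python below computes a weighted share from IPUMS USA records. It assumes the extract includes the person weight PERWT and treats the category coded 2 in SEX as the category of interest; the three sample rows are hypothetical, so confirm variable names and codes against the codebook that comes with your extract. It gives a point estimate only; standard errors that reflect the survey design need dedicated survey software.

```python
import csv
import io


def weighted_share(rows, var, target, weight="PERWT"):
    """Weighted share of records where `var` equals `target`.

    Assumes each row has a numeric person weight in the `weight` column
    (PERWT in IPUMS USA extracts; confirm against your codebook).
    """
    total = 0.0
    matched = 0.0
    for row in rows:
        w = float(row[weight])
        total += w
        if row[var] == target:
            matched += w
    if total == 0:
        raise ValueError("weights sum to zero")
    return matched / total


# Hypothetical three-person extract; a real file would be opened with
# open(path, newline="") and passed to csv.DictReader the same way.
sample = """YEAR,PERWT,SEX
2022,100,1
2022,300,2
2022,100,2
"""
rows = csv.DictReader(io.StringIO(sample))
print(weighted_share(rows, "SEX", "2"))  # 0.8
```

Weighting matters here because each census or ACS record stands for a different number of people; an unweighted share of the same three rows would be 2/3 rather than 0.8.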
Behavioral Risk Factor Surveillance System (BRFSS)
The Behavioral Risk Factor Surveillance System (BRFSS) is a telephone survey that state health departments conduct monthly. Self-reported survey data and documentation are available for cross-sectional analysis. Topics include health behaviors, chronic conditions, and use of preventive services.
The National Health and Nutrition Examination Survey (NHANES)
The National Health and Nutrition Examination Survey (NHANES) collects nationally representative data about the health of adults and children in the United States. Survey data are available in two-year cycles from 1999 to 2023, while data collected before 1999 are grouped into datasets covering longer periods. Continuous NHANES variables can be searched and downloaded.
National Health Interview Survey (NHIS)
National Health Interview Survey (NHIS) data are available annually for adults and children. Other available data include parent-child pair data and imputed income data. While these surveys are usually best suited to cross-sectional analyses, some years have adult longitudinal data available (e.g., the 2020 adult longitudinal data).
International
IPUMS International
IPUMS International contains census microdata for 104 countries, 565 censuses and surveys, and over 1 billion person records. Add any number of variables “to cart” and IPUMS will generate a .csv dataset to download. Data can be readily downloaded from R using the IPUMS R Interface.
Institute for Health Metrics and Evaluation (IHME)
The Global Health Data Exchange (GHDx), which IHME describes as the world’s most comprehensive catalog of surveys, censuses, vital statistics, and other health-related data, lets users filter by topic and country. Survey and/or longitudinal data are available for download upon registration.
National
National Health and Aging Trends Study (NHATS)
NHATS is a longitudinal, nationally representative study of Medicare beneficiaries aged 65 and older. Public data files are available after logging in.
Health and Retirement Study
The Health and Retirement Study (HRS) collects longitudinal data on aging among US adults over age 50. HRS data are publicly available to registered users and include biennial, off-year, and cross-year data.
Gateway to Global Aging Data
The USC Gateway to Global Aging Data team has created a harmonized dataset using the following aging studies: HRS (USA), MHAS (Mexico), ELSA (England), SHARE (Europe and Israel), CRELES (Costa Rica), KLoSA (South Korea), JSTAR (Japan), TILDA (Ireland), CHARLS (China), and MARS (Malaysia). Interested users need to apply for Enclave access by creating an account and filling out an application. Individual datasets for several of these studies are listed on this page.
International
Mexican Health and Aging Study (MHAS)
Country: Mexico
The MHAS offers publicly available aging data to registered users. The study has collected five waves of data (2001, 2003, 2012, 2015, and 2018) on adults over the age of 50 living in Mexico.
English Longitudinal Study of Ageing (ELSA)
Country: England
The English Longitudinal Study of Ageing (ELSA) offers data from 1998 to 2023 (Waves 0-10) for download by registered users.
Survey of Health, Ageing, and Retirement in Europe (SHARE)
Countries: Europe and Israel
Downloadable survey data is available from the Survey of Health, Ageing, and Retirement in Europe (SHARE) after completion of registration.
Costa Rican Study on Longevity and Healthy Aging (CRELES)
Country: Costa Rica
The Costa Rican Study on Longevity and Healthy Aging (CRELES) contains aging data for two cohorts. Data are available for download after submitting the data usage agreement.
Korean Longitudinal Study of Ageing (KLoSA)
Country: South Korea
The Korean Longitudinal Study of Ageing (KLoSA) started national surveys of adults 45 years and older in 2006. Basic survey data and special topic survey data are available for download after logging in.
China Health and Retirement Longitudinal Study (CHARLS)
Country: China
Data on demographics, health status, work and retirement, income, and more are available for download from the China Health and Retirement Longitudinal Study (CHARLS) after registration. Individual ‘Wave’ datasets are available as well as harmonized, longitudinal datasets.
Longitudinal Aging Study in India (LASI)
Country: India
The Longitudinal Aging Study in India (LASI) is a longitudinal study of individuals aged 45 and older in India. LASI collects information conceptually comparable to that gathered by the Health and Retirement Study (HRS) in the United States and its sister surveys in Asia, Europe, Mexico, and elsewhere, in part to allow cross-country comparisons using these data.
National
The Youth Risk Behavior Surveillance System (YRBSS)
Since 1990, the Youth Risk Behavior Surveillance System has surveyed youth in the US. Survey topics include demographics, youth health behaviors and conditions, substance use behaviors, and student experiences. Data from each biennial survey are available for download.
KIDS COUNT
KIDS COUNT is a nationwide network that tracks the well-being of children at the state level. Data are available for download by topic and/or location. Topics include demographics, economic well-being, education, health indicators and insurance, safety and risky behaviors, and more.
Add Health
The National Longitudinal Study of Adolescent to Adult Health (Add Health) offers public-use datasets with survey data that follow adolescents from grades 7-12 into adulthood. The public-use datasets contain all of the survey data from the In-Home Interviews, but only for a subset of the full Add Health sample. Longitudinal data (Waves I-IV) are available for public use, as is the Parent Study, which covers a sample of Add Health members who were originally interviewed at Wave I (1994-1995) and are now parents.
Child Opportunity Index
Diversity Data Kids is best known for its Child Opportunity Index (COI). The COI is a composite index of children's neighborhood opportunity that contains data for every neighborhood (by census tract) in the United States from 2012 through 2021. It comprises 44 indicators in three domains (education, health and environment, and social and economic) and 14 subdomains.
Environmental Influences on Child Health Outcomes (ECHO)
The Environmental Influences on Child Health Outcomes (ECHO) program focuses on five pediatric outcome areas: obesity; neurodevelopment; upper and lower airways; pre-, peri-, and postnatal outcomes; and positive health. The ECHO-wide Cohort Study incorporates longitudinal data that have been compiled and harmonized across 69 pediatric cohorts (including 30,000 pregnancies and 50,000 children). ECHO datasets are available for download after signing up.
Local
Chicago Metropolitan Environment, Sustainability, and Climate Datasets
The Chicago Metropolitan Agency for Planning (CMAP) has several datasets related to environment, sustainability, and climate in the Chicagoland area available for use.
National
The Harvard Dataverse
The Harvard Dataverse contains datasets available for download on a variety of climate-related topics such as air emissions, environmental justice mapping, air quality index, and wildfires.
EPA Datasets
The EPA has over 6,000 Open Data resources available for download. Dataset topics include air quality, walkability indices, safe drinking water, greenhouse gas emissions, and more. No log-in or registration necessary. Download datasets by clicking directly on the icon of the format you prefer.
Environmental Public Health Tracking Network
The National Environmental Public Health Tracking Network brings together health data and environmental data from national, state, and city sources. Available data include exposure and health outcome data (e.g., asthma, cancer, birth defects). Datasets can be created by choosing variables in the Data Explorer, or existing datasets can be downloaded directly.
USDA Climate Datasets
The USDA Geospatial Data Gateway contains ready-to-download monthly and annual average precipitation and temperature datasets at State and National levels that can be downloaded without placing an order.
Local
City of Chicago
The City of Chicago provides datasets on neighborhood, city, ward, and ZIP Code boundaries that can be downloaded and used freely. Shapefiles can also be viewed or used outside a web browser, but this requires compression software to extract the files and GIS software to open them. No metadata is available.
Chicago Health Atlas
The Chicago Department of Public Health and the PHAME Center at UIC have created a health atlas for the City of Chicago. Data on hundreds of indicators are available to download at the ZIP Code, census tract, community, or zone level.
Cook County Health Atlas
The Cook County Health Atlas is an information-sharing platform co-designed by the Cook County Department of Public Health (CCDPH) and multiple stakeholders. Data on a variety of indicators and health outcomes are available to download at the municipal, ZIP Code, or district level.
*Coming soon from the Kershaw Social Environment and Health Lab: R Code for Geocoding, housed at NU.
National
National Historical Geographic Information System (NHGIS)
The National Historical Geographic Information System (NHGIS) provides free online access to summary statistics and GIS files for U.S. censuses and other nationwide surveys from 1790 through the present. The NHGIS provides easy access to all levels of U.S. census geography, including states, counties, tracts, and blocks. Data can be readily downloaded from R using the ipumsr interface.
Geodata from Social Explorer
Social Explorer has an extensive geodata library that contains downloadable census, district, public use, and tract data. Geodata are easy to download with the Professional Plan provided by Northwestern.
TIGER/Line Files
The US Census Bureau provides TIGER/Line shapefiles and other census mapping files that can be downloaded and merged with existing data. While the core TIGER/Line Files and Shapefiles do not include demographic data, they do contain geographic entity codes (GEOIDs) that can be linked to the Census Bureau’s demographic data.
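Linking TIGER/Line geography to a demographic table only works if the geographic entity codes match exactly. For census tracts, the GEOID is the 2-digit state FIPS code, the 3-digit county FIPS code, and the 6-digit tract code concatenated into 11 characters. The Python sketch below rebuilds that code from separate columns and joins on it. It works on plain dictionaries, so reading the shapefile's attribute table (for example with a GIS library) is left out; the field name median_income and all values shown are hypothetical.

```python
def tract_geoid(state_fp, county_fp, tract_ce):
    """Build the 11-character census tract GEOID: 2-digit state FIPS,
    3-digit county FIPS, and 6-digit tract code, each zero-padded."""
    return f"{int(state_fp):02d}{int(county_fp):03d}{int(tract_ce):06d}"


def join_by_geoid(features, demographics, key="GEOID"):
    """Left join: keep every geographic record and attach the fields of
    the demographic record sharing its GEOID, if there is one."""
    lookup = {row[key]: row for row in demographics}
    return [{**feature, **lookup.get(feature[key], {})} for feature in features]


# Hypothetical records: tract attributes as they might be read from a
# TIGER/Line tract file, plus a demographic table (median_income is a
# hypothetical field) that stores the state, county, and tract codes separately.
features = [{"GEOID": "17031010100", "ALAND": 380000}]
demographics = [{"GEOID": tract_geoid("17", "31", "10100"), "median_income": 52000}]
print(join_by_geoid(features, demographics))
# [{'GEOID': '17031010100', 'ALAND': 380000, 'median_income': 52000}]
```

A left join keeps tracts that have no demographic match, which makes gaps visible on a map instead of silently dropping those areas.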
Local
Chicago Health Atlas
The Chicago Department of Public Health and the PHAME Center at UIC offer a health atlas for the City of Chicago. Data on hundreds of indicators, from physical environment to clinical care, are available to download at the ZIP Code, census tract, community, or zone level.
Cook County Health Atlas
The Cook County Health Atlas is an information-sharing platform co-designed by the Cook County Department of Public Health (CCDPH) and multiple stakeholders. Data on hundreds of indicators, from healthcare access to income and public assistance, are available to download at the municipal, ZIP Code, or district level.
National
Social Determinants of Health Database
The Agency for Healthcare Research and Quality (AHRQ) has a Social Determinants of Health (SDOH) Database with variable files that correspond to five key SDOH domains: social context (e.g., age, race/ethnicity, veteran status), economic context (e.g., income, unemployment rate), education, physical infrastructure (e.g., housing, crime, transportation), and healthcare context (e.g., health insurance). The files can be linked to other data by geography (county, ZIP Code, and census tract). The database includes data files and codebooks by year at three levels of geography, as well as a documentation file.
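Linking the SDOH files to other data at the county level usually means matching 5-digit county FIPS codes, and codes that pass through a spreadsheet often lose their leading zero (Alabama's Autauga County, 01001, becomes 1001). Below is a minimal Python sketch that normalizes the codes before an inner join. It assumes the county file's key column is named COUNTYFIPS (confirm the name in the codebook for the year you download) and that your own data use a hypothetical fips column; pct_uninsured and the values shown are hypothetical too.

```python
def normalize_county_fips(value):
    """Return a 5-digit county FIPS string, restoring leading zeros that
    spreadsheets often drop (e.g. 1001 -> '01001')."""
    text = str(value).strip()
    if text.endswith(".0"):  # codes read in as floats, e.g. 1001.0
        text = text[:-2]
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def link_on_county(sdoh_rows, study_rows, sdoh_key="COUNTYFIPS", study_key="fips"):
    """Inner join of study records to SDOH county records on county FIPS.

    sdoh_key is an assumed column name; study_key is hypothetical.
    """
    sdoh_by_fips = {normalize_county_fips(r[sdoh_key]): r for r in sdoh_rows}
    linked = []
    for row in study_rows:
        match = sdoh_by_fips.get(normalize_county_fips(row[study_key]))
        if match is not None:
            linked.append({**row, **match})
    return linked


# Hypothetical values: one SDOH county record and two study records,
# the second stored as a float with its leading zero lost.
sdoh = [{"COUNTYFIPS": "17031", "pct_uninsured": 9.1}]
study = [{"fips": 17031, "cases": 12}, {"fips": 1001.0, "cases": 3}]
print(link_on_county(sdoh, study))
```

Only the Cook County record (17031) is linked here; 1001.0 normalizes to 01001, which has no SDOH row in this toy input. Raising an error on malformed codes, rather than silently skipping them, makes data-entry problems easier to spot.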
OpenICPSR
openICPSR is a self-publishing repository for social, behavioral, and health sciences research data. A service of the Inter-university Consortium for Political and Social Research (ICPSR), openICPSR is particularly well-suited for the deposit of replication data sets for researchers who need to publish their raw data associated with a journal article so that other researchers can replicate their findings. ICPSR does not improve or alter datasets deposited in openICPSR in any way. Data are preserved as-is and distributed in the same condition and format submitted by the depositor.
Social and Economic Context
IPUMS USA
IPUMS USA contains decades of national survey data on individual and household income, government assistance programs, demographics, education, veteran status, and more. Add any number of variables “to cart” and IPUMS will generate a .csv dataset to download. Data can be readily downloaded from R using the IPUMS R Interface.
The Atlas of Rural and Small-Town America
The Atlas of Rural and Small-Town America provides statistics by broad categories of socioeconomic factors. Data are grouped by topic and reported in separate tabs within the spreadsheet: People, Jobs, Income, Veterans, and County Classifications. Each tab includes the county FIPS code as the first column.
Physical Infrastructure
American Housing Survey (AHS)
The American Housing Survey (AHS) has national public use files from 1973 to 2023. In the AHS microdata, the basic unit is an individual housing unit. Each record shows most of the information associated with a specific housing unit or individual, except for data items that could be used to personally identify that housing unit or individual. Datasets can be downloaded as SAS or CSV files without registration.
Multidimensional Deprivation Index (MDI)
The Multidimensional Deprivation Index (MDI) is a measure of deprivation in six dimensions estimated from the American Community Survey: standard of living, health, education, economic security, housing quality, and neighborhood. State MDI rates from 2010-2019 and 2021-2023 and County MDI rates from 2010-2019 are available for free and easy download.
Social Deprivation Index
The Social Deprivation Index (SDI) is a composite measure of seven demographic characteristics collected in the American Community Survey (ACS): percent living in poverty, percent with less than 12 years of education, percent single-parent households, percent living in rented housing units, percent living in overcrowded housing units, percent of households without a car, and percent of unemployed adults under 65 years of age. The SDI is available at the county, census tract, aggregated ZIP Code Tabulation Area (ZCTA), and Primary Care Service Area (PCSA, v3.1) levels.
Racial Segregation and Neighborhoods
Staff and faculty at Northwestern have published an openICPSR project that measured residential and racial segregation using G* statistics.
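The project's exact specification is not described here. Assuming "G* statistics" refers to the Getis-Ord Gi* local statistic, the Python sketch below computes a Gi* z-score for each area from a list of values (for example, a group's population share by tract) and a spatial weights matrix in which each area also counts as its own neighbor. Large positive scores mark clusters of high values and large negative scores mark clusters of low values. The five-tract adjacency and the values are hypothetical.

```python
import math


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-scores (a sketch; needs at least two areas).

    values:  one number per area.
    weights: n x n matrix; weights[i][j] is the spatial weight between
             areas i and j, with weights[i][i] nonzero because Gi*
             includes the area itself.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation, as used in the Gi* formula.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if s == 0:
        raise ValueError("values have zero variance")
    scores = []
    for row in weights:
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        numerator = sum(w * x for w, x in zip(row, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical example: five tracts in a row, each adjacent to itself and
# its immediate neighbors (binary weights); high values sit at one end.
values = [1, 1, 1, 5, 5]
n = len(values)
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]
print([round(z, 2) for z in getis_ord_gi_star(values, weights)])
# [-1.33, -2.0, -0.33, 1.33, 2.0]
```

The choice of weights (contiguity, distance bands, or k nearest neighbors) drives the results as much as the data do, so it is worth checking which scheme the published project used before comparing scores.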
Food Access Research Atlas
The USDA has created an atlas of the US by census tract to identify low food access areas. 2019 datasets are available for download by tract.
US Crime Data
Social Explorer has county-level crime data from the FBI’s Uniform Crime Reporting (UCR) Program from 2010 to the present. Northwestern provides access to Social Explorer’s Professional Plan for data access.
Healthcare Context
IPUMS Health Surveys
IPUMS Health Surveys (NHIS) provides US data from 1963 to the present. IPUMS NHIS has rich cross-sectional, person-level data on healthcare access, health insurance, and income/government assistance. Add any number of variables “to cart” and IPUMS will generate a .csv dataset to download. Data can be readily downloaded from R using the IPUMS R Interface.
Healthcare Cost and Utilization Project (HCUP)
The Healthcare Cost and Utilization Project (HCUP) includes the largest collection of longitudinal hospital care data in the United States. The HCUP database has information on inpatient stays, emergency department visits, and ambulatory care.
Dartmouth Dataverse
The Dartmouth Dataverse has clinical practice, healthcare accessibility/quality, and Medicare datasets available for download after accepting the terms of use.
National
American Time Use Survey (ATUS)
The American Time Use Survey (ATUS) provides nationally representative estimates of how, where, and with whom Americans spend their time. Individual and multi-year datasets include information on time spent on child and elderly care, volunteering, and working from home. Instructions for downloading the microdata files can be found after clicking the data file link.