High-throughput Phenotyping on EHRs Using Multi-Tensor Factorization
Various efforts have been undertaken to transform diverse and voluminous electronic health record (EHR) data into concise and meaningful concepts, or phenotypes. To date, these efforts have been ad hoc and labor-intensive, resulting in specific phenotypes for specific environments. There is an urgent need for scalable phenotyping methods, but several major challenges must first be addressed: (a) patient representation; (b) high-throughput phenotype generation from EHRs; (c) expert-guided phenotype refinement; and (d) phenotype adaptation across institutions.
The goal of this project is to address these challenges by developing a general computational framework for transforming EHR data into meaningful phenotypes with only modest levels of expert guidance. The research team, composed of biomedical informaticists, computer scientists, and clinical experts from Northwestern University, Georgia Institute of Technology, and Vanderbilt University, plans to represent and analyze EHR data as inter-connected high-order relations (i.e., tensors), for example tuples of patient-medication-diagnosis, patient-lab, and patient-symptom.
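To make the tensor representation concrete, here is a minimal sketch, assuming EHR events have already been extracted as plain tuples. Each distinct label in a field gets an index in its mode, and repeated co-occurrences are counted, giving a sparse count tensor. The function name `build_count_tensor` and the toy records are hypothetical illustrations, not part of the project's software.

```python
from collections import Counter


def build_count_tensor(records, order):
    """Turn EHR tuples such as (patient, medication, diagnosis) into a sparse
    count tensor: per-mode label lists plus a Counter keyed by index tuples."""
    labels = tuple([] for _ in range(order))   # position -> label, per mode
    index = tuple({} for _ in range(order))    # label -> position, per mode
    counts = Counter()
    for record in records:
        if len(record) != order:
            raise ValueError(f"expected {order} fields, got {record!r}")
        position = []
        for mode, label in enumerate(record):
            if label not in index[mode]:
                index[mode][label] = len(labels[mode])
                labels[mode].append(label)
            position.append(index[mode][label])
        counts[tuple(position)] += 1   # repeated co-occurrences accumulate
    return labels, counts


# Hypothetical toy records; real EHR extracts would be far larger and sparser.
records = [
    ("p1", "metformin", "type 2 diabetes"),
    ("p1", "metformin", "type 2 diabetes"),
    ("p1", "lisinopril", "hypertension"),
    ("p2", "lisinopril", "hypertension"),
]
labels, counts = build_count_tensor(records, order=3)
print(labels)        # (['p1', 'p2'], ['metformin', 'lisinopril'], ['type 2 diabetes', 'hypertension'])
print(dict(counts))  # {(0, 0, 0): 2, (0, 1, 1): 1, (1, 1, 1): 1}
```

The same helper with `order=2` would build a patient-lab matrix. Storing only nonzero cells in a `Counter` keeps memory proportional to the number of observed events rather than to the full patients × medications × diagnoses grid.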
The proposed analytic framework generalizes several existing machine learning and data mining methodologies, including dimensionality reduction, topic modeling, and co-clustering. The accompanying suite of algorithms and methods will automate high-throughput phenotype generation, refinement, adaptation, and application across a broad range of health informatics settings and multiple institutions.
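One way to see how tensor factorization yields phenotypes is a generic nonnegative CP (PARAFAC) decomposition, sketched below in plain Python. This is a textbook technique chosen for illustration under stated assumptions (least-squares loss, Lee-Seung style multiplicative updates, a tiny invented tensor), not the project's actual algorithm. Each rank-one component pairs a patient-membership vector with medication and diagnosis weights, which can be read as a candidate phenotype. With only two modes the same updates reduce to Lee and Seung's least-squares NMF, a standard dimensionality-reduction method. The names `nonneg_cp`, `reconstruct`, and `squared_error` are hypothetical.

```python
import random


def gram(F):
    """Return the R x R Gram matrix F^T F of a factor stored as a list of rows."""
    R = len(F[0])
    return [[sum(row[s] * row[r] for row in F) for r in range(R)] for s in range(R)]


def reconstruct(factors, idx):
    """Value of the rank-R CP model sum_r prod_n F_n[idx[n]][r] at one index tuple."""
    total = 0.0
    for r in range(len(factors[0][0])):
        term = 1.0
        for n, F in enumerate(factors):
            term *= F[idx[n]][r]
        total += term
    return total


def squared_error(entries, shape, factors):
    """Squared Frobenius error over every cell (absent cells count as zero)."""
    cells = [()]
    for n in shape:
        cells = [c + (i,) for c in cells for i in range(n)]
    return sum((entries.get(c, 0.0) - reconstruct(factors, c)) ** 2 for c in cells)


def nonneg_cp(entries, shape, rank, iters=500, seed=0, eps=1e-12):
    """Nonnegative CP decomposition with multiplicative updates.

    entries: dict mapping index tuples to nonnegative counts (sparse tensor).
    shape:   size of each mode, e.g. (patients, medications, diagnoses).
    Returns one nonnegative factor matrix (list of rows) per mode.
    """
    rng = random.Random(seed)
    factors = [[[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
               for n in shape]
    for _ in range(iters):
        for m in range(len(shape)):
            # Numerator: X_(m) times the Khatri-Rao product of the other factors,
            # accumulated only over the stored (nonzero) entries.
            num = [[0.0] * rank for _ in range(shape[m])]
            for idx, val in entries.items():
                for r in range(rank):
                    prod = val
                    for n, F in enumerate(factors):
                        if n != m:
                            prod *= F[idx[n]][r]
                    num[idx[m]][r] += prod
            # Denominator: F_m times the elementwise product of the other Gram matrices.
            H = [[1.0] * rank for _ in range(rank)]
            for n, F in enumerate(factors):
                if n != m:
                    G = gram(F)
                    H = [[H[s][r] * G[s][r] for r in range(rank)] for s in range(rank)]
            F = factors[m]
            for i in range(shape[m]):
                # Compute the whole row's denominator before updating that row.
                den = [sum(F[i][s] * H[s][r] for s in range(rank)) for r in range(rank)]
                for r in range(rank):
                    F[i][r] *= num[i][r] / (den[r] + eps)
    return factors


# Invented toy tensor: patients 0-1 share (medication 0, diagnosis 0);
# patients 2-3 share (medication 1, diagnosis 1).
X = {(0, 0, 0): 1.0, (1, 0, 0): 1.0, (2, 1, 1): 1.0, (3, 1, 1): 1.0}
shape = (4, 2, 2)
start = nonneg_cp(X, shape, rank=2, iters=0)   # same seed, so this is the initial point
patients, meds, dxs = nonneg_cp(X, shape, rank=2, iters=500)
print(squared_error(X, shape, start), squared_error(X, shape, [patients, meds, dxs]))
for r in range(2):
    print("component", r, "medication weights", [round(row[r], 2) for row in meds])
```

Multiplicative updates keep every factor entry nonnegative, which is what makes each component interpretable as an additive grouping of patients, drugs, and diagnoses. Production systems typically use faster solvers and task-specific losses (e.g., Poisson for counts).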
CHIP Director Abel Kho is a co-principal investigator on this National Science Foundation-funded project, together with colleagues from the Georgia Institute of Technology and Vanderbilt University.
Collaborators:
Georgia Institute of Technology
Vanderbilt University