INFORMATICS TOOLS FOR PHARMACOGENOMIC DISCOVERY USING PRACTICE-BASED DATA
Rapid growth in the clinical implementation of large electronic medical records (EMRs) has led to an unprecedented expansion in the availability of dense longitudinal datasets for observational research. More recently, substantial efforts have linked EMR databases with archived biological material to accelerate research in personalized medicine. EMR-linked DNA biobanks have identified common and rare genetic variants that contribute to risk of disease. An appealing vision, which has not been extensively explored, is to use EMR-linked biobanks for pharmacogenomic studies, which identify associations between genetic variation and drug efficacy and toxicity. The longitudinal nature of the data contained within EMRs makes them ideal for quantifying drug outcome (both efficacy and toxicity). Efforts are already underway to link these EMRs across institutions and to standardize the definition of phenotypes for large-scale studies of treatment outcome, specifically within the context of routine clinical care. Despite their success, EMR-based pharmacogenomic studies are often hampered by their data-intensive nature: it is time-consuming and costly to extract and integrate data from multiple heterogeneous EMR databases for large-scale pharmacogenomic studies. Informatics for Integrating Biology and the Bedside (i2b2) is a National Center for Biomedical Computing based at Partners Healthcare System. i2b2 has developed a scalable informatics framework to enable clinical researchers to repurpose existing EMR data for clinical and genomic discovery.
In this study, we will collaborate with i2b2 to extend its informatics framework to the pharmacogenomics domain through the following specific aims: 1) Develop new methods to extract and model drug exposure and outcome information from EMRs and integrate them with the i2b2 NLP components; 2) Build ontology tools to normalize and integrate pharmacogenomic data across different sites; 3) Conduct known and novel pharmacogenomic studies to evaluate and refine the tools developed in Aims 1 and 2; and 4) Disseminate the developed informatics tools among pharmacogenomic researchers.
BIG DATA COURSEWORK FOR COMPUTATIONAL MEDICINE
As the era of “Big Data” is dawning on biomedical research, multiple types of biomedical data, including phenotypic, molecular (including -omics), clinical, imaging, behavioral, and environmental data, are being generated on an unprecedented scale with high volume, variety and velocity. These datasets are increasingly large and complex, challenging our current abilities for data representation, integration and analysis for improving outcomes and reducing healthcare costs. It is well recognized that the greatest challenge to leveraging the significant potential of Big Data is in educating and recruiting future computational and data scientists who have the background, training and experience to capitalize on fundamental opportunities in biomedical sciences. This demands interdisciplinary education and hands-on practicum training on understanding the application, analysis, limitations, and value of Big Data. To bridge this knowledge gap for the U.S. biomedical workforce, we propose to develop a research educational program, Big Data Coursework for Computational Medicine (BDC4CM), that will instruct students, fellows and scientists in the use of specific new methods and tools for Big Data by providing tailored, in-depth instruction, hands-on laboratory modules, and case studies on Big Data access, integration, processing and analysis. Offered by highly interdisciplinary and experienced faculty from Mayo Clinic and the University of Minnesota, this program will provide a short-term training opportunity on Big Data methods and approaches for: 1) data and knowledge representation standards; 2) information extraction and natural language processing; 3) visualization analytics; 4) data mining and predictive modeling; 5) privacy and ethics; and 6) applications in comparative effectiveness research and population health research and improvement.
Our primary educational goal is to prepare the next generation of innovators and visionaries in the emerging, multidimensional field of Big Data Science in healthcare, as well as to develop a future workforce that fulfills industry needs and increases U.S. competitiveness in healthcare technologies and applications.
NATIONAL INFRASTRUCTURE FOR STANDARDIZED AND PORTABLE EHR PHENOTYPING ALGORITHMS
With the rapidly growing adoption of patient electronic health record (EHR) systems due to Meaningful Use, and linkage of EHRs to research biorepositories, evaluating the suitability of EHR data for clinical and translational research is becoming ever more important, with ramifications for genomic and observational research, clinical trials, and comparative effectiveness studies. A key component for identifying patient cohorts in the EHR is to define inclusion and exclusion criteria that algorithmically select sets of patients based on stored clinical data. This process, commonly referred to as “EHR-driven phenotyping,” is time-consuming and tedious due to the lack of a widely accepted and standards-based formal information model for defining phenotyping algorithms. To address this overall challenge, the proposed project will design, build and promote an open-access community infrastructure for standards-based development and sharing of phenotyping algorithms, as well as provide tools and resources for investigators, researchers and their informatics support staff to implement and execute the algorithms on native EHR data.
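As a rough illustration of what "inclusion and exclusion criteria that algorithmically select sets of patients" can mean in practice, the following minimal Python sketch represents a phenotyping algorithm as two lists of predicates over patient records. Everything in it is an assumption made for illustration: the `Patient` and `PhenotypeAlgorithm` classes, the toy records, and the simplified type 2 diabetes rule (ICD-10 codes beginning with E11, excluding E10) are hypothetical and are not the project's information model or a validated phenotype definition.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Patient:
    """Hypothetical, highly simplified patient record."""
    patient_id: str
    age: int
    diagnosis_codes: Set[str] = field(default_factory=set)


# A criterion is any function that inspects a patient and returns True/False.
Criterion = Callable[[Patient], bool]


def has_code_prefix(prefix: str) -> Criterion:
    """Build a criterion that is true if any diagnosis code starts with prefix."""
    return lambda p: any(code.startswith(prefix) for code in p.diagnosis_codes)


@dataclass
class PhenotypeAlgorithm:
    """A patient matches if all inclusion criteria hold and no exclusion criterion holds."""
    name: str
    inclusion: List[Criterion]
    exclusion: List[Criterion]

    def matches(self, patient: Patient) -> bool:
        included = all(criterion(patient) for criterion in self.inclusion)
        excluded = any(criterion(patient) for criterion in self.exclusion)
        return included and not excluded

    def select(self, patients: List[Patient]) -> List[str]:
        return [p.patient_id for p in patients if self.matches(p)]


# Illustrative rule only: adults with a type 2 diabetes code and no type 1 code.
t2dm = PhenotypeAlgorithm(
    name="toy-type-2-diabetes",
    inclusion=[has_code_prefix("E11"), lambda p: p.age >= 18],
    exclusion=[has_code_prefix("E10")],
)

# Made-up records used only to exercise the rule.
patients = [
    Patient("P1", 54, {"E11.9"}),
    Patient("P2", 40, {"E11.9", "E10.9"}),
    Patient("P3", 60, {"I10"}),
    Patient("P4", 15, {"E11.9"}),
]

cohort = t2dm.select(patients)
print(cohort)
```

Keeping the criteria as data (lists of predicates) rather than hard-coded query logic is one way a definition could be shared and re-executed against different EHR sources, which is the portability concern described above.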
MODELING SOCIAL BEHAVIOR FOR HEALTHCARE UTILIZATION IN DEPRESSION
Depression is highly prevalent, both in the US and worldwide. Among US adults, the estimated 12-month and lifetime prevalence rates are 8.3% and 19.2%, respectively. The World Health Organization ranks major depressive disorder (MDD) as the third-highest cause of disease burden worldwide, and the highest cause of disease burden in the developed world. However, despite its prevalence and burden, depression remains significantly under-recognized and under-treated in all practice settings, including managed care, where fewer than one third of adults with depression obtain appropriate professional treatment. Denial of illness and stigma are two primary barriers to proper identification and treatment of depression. Many individuals with depression are ashamed to seek out a mental health professional and consider depression a sign of personal weakness. In particular, “self-stigma” has been associated with negative effects on adherence to psychiatric services, hope, and quality of life, and also poses a barrier to social integration. Further, since self-stigma can exist without actual stigma from the public, and is more hidden and internal, it seems to be the worst form of stigma against people with depression and can directly affect patients’ overall well-being. Studies suggest that early recognition and treatment of depressive behavior and symptoms can improve social function, increase productivity, and decrease absenteeism in the workplace. However, recognition of depression, particularly in early stages, is still challenging. To address this problem, in this proposal we plan to develop effective methods for detection of depressive behavior, not only at an individual level, but also at a community level. The latter is highly pertinent because depression is significantly influenced by variations in social determinants and socio-ecological factors.
In particular, we will leverage robust, longitudinal electronic health record (EHR) systems at Mayo Clinic and private insurance (UnitedHealthCare/Optum Labs) reimbursement and claims data, along with online social media data from Twitter and PatientsLikeMe and geo-coded neighborhood and environmental data, to develop a “big data” platform for identifying combinations of online socio-behavioral factors and neighborhood environmental conditions. The platform is intended to enable innovative ways to detect depressive behavior within communities and to identify patterns and changes in health care utilization for depression across different communities and geographies within the U.S.
The New York City Clinical Data Research Network (NYC-CDRN) has been established to improve and streamline research in an effort to advance patient-centered research. The NYC-CDRN uses a large volume of robust, high-quality patient data and support services from a unique collaboration of more than 20 partners. These organizations include Columbia University, Montefiore Medical Center, Mount Sinai Health System, New York University Langone Medical Center, NewYork-Presbyterian, Weill Cornell Medical College, and the Clinical Directors Network. The NYC-CDRN is funded by the Patient-Centered Outcomes Research Institute (PCORI).