Research Overview

Dr. Cong Liu’s research sits at the intersection of biomedical informatics, artificial intelligence, and rare disease diagnosis. He is the Principal Investigator of the NIH-funded RESCUE project (Rare Disease Detection and Escalation Support via a Learning Health System), focused on developing intelligent systems that integrate structured and unstructured health data to support diagnostic decision-making. He has developed tools for phenotype-driven gene ranking, HPO-based NLP, and pediatric molecular test recommendation. His recent work leverages large language models for phenotype normalization, concept extraction, and gene prioritization, advancing natural language understanding in clinical genomics.

Currently, Dr. Liu is building and validating multi-agent systems that interact with EHRs, insurance systems, laboratories, and clinical teams to streamline tasks such as triage, test ordering, diagnosis, and patient communication. These systems are designed to align with existing clinical roles, SOPs, and organizational structures, and are being evaluated in real-world settings. His work integrates FHIR standards, decision support algorithms, and human-AI collaboration to enhance diagnostic efficiency, equity, and scalability. Dr. Liu is also an active contributor to national consortia including eMERGE and OHDSI, helping develop interoperable frameworks for integrating polygenic risk scores and real-world evidence into clinical care.

Publications

  1. Phenotype driven molecular genetic test recommendation for diagnosing pediatric rare disorders. NPJ Digit Med. 2024 Nov 21; 7(1):333.
  2. Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease. Am J Hum Genet. 2024 Oct 03; 111(10):2190-2202.
  3. Fine-tuning large language models for rare disease concept normalization. J Am Med Inform Assoc. 2024 Sep 01; 31(9):2076-2083.
  4. Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns (N Y). 2024 Jan 12; 5(1):100887.
  5. Deep learning for rare disease: A scoping review. J Biomed Inform. 2022 Nov; 135:104227.
  6. OARD: Open annotations for rare diseases and their phenotypes based on real-world data. Am J Hum Genet. 2022 Sep 01; 109(9):1591-1604.
  7. Evaluation of Criteria2Query: Towards Augmented Intelligence for Cohort Identification. Stud Health Technol Inform. 2022 Jun 06; 290:297-300.
  8. Risk Factors Associated With SARS-CoV-2 Breakthrough Infections in Fully mRNA-Vaccinated Individuals: Retrospective Analysis. JMIR Public Health Surveill. 2022 May 24; 8(5):e35311.
  9. Generalizability of Polygenic Risk Scores for Breast Cancer Among Women With European, African, and Latinx Ancestry. JAMA Netw Open. 2021 Aug 02; 4(8):e2119084.
  10. Comparative effectiveness of medical concept embedding for feature engineering in phenotyping. JAMIA Open. 2021 Apr; 4(2):ooab028.
  11. Comparison of Clinical Characteristics Between Clinical Trial Participants and Nonparticipants Using Electronic Health Record Data. JAMA Netw Open. 2021 Apr 01; 4(4):e214732.
  12. Phen2Gene: rapid phenotype-driven gene prioritization for rare diseases. NAR Genom Bioinform. 2020 Jun; 2(2):lqaa032.
  13. DQueST: dynamic questionnaire for search of clinical trials. J Am Med Inform Assoc. 2019 Nov 01; 26(11):1333-1343.
  14. Ensembles of natural language processing systems for portable phenotyping solutions. J Biomed Inform. 2019 Dec; 100:103318.
  15. Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network. J Biomed Inform. 2019 Nov; 99:103293.
  16. Doc2Hpo: a web application for efficient and accurate HPO concept curation. Nucleic Acids Res. 2019 Jul 02; 47(W1):W566-W570.
