
Timothy Miller, PhD

Natural Language Processing Laboratory
Computational Health Informatics Program (CHIP)
Medicine Research
Hospital Title:
Associate Scientific Researcher
Academic Title:
Instructor in Pediatrics, Harvard Medical School
Research Focus Area:
Natural Language Processing Applied to Text in Clinical Narratives

Research Overview

Timothy Miller's work in clinical natural language processing (NLP) has covered a broad array of applications, from phenotyping applications that enable clinical research, developed as part of the i2b2 center for biomedical computing, to semantic processing of clinical texts, to core contributions to NLP and machine learning. A major thread that ties this work together is an interest in the value of syntax. He has been responsible for syntactic contributions in temporal relation extraction (Lin et al., 2014; Miller et al., 2013; Miller et al., in preparation), UMLS relation extraction (Dligach et al., 2013), coreference resolution (Miller et al., 2012; Zheng et al., 2012), and negation detection (Miller et al., in preparation). His work also includes code contributions to the open-source projects Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) and ClearTK. In cTAKES he developed a constituency parser module and contributed syntactic features to all the relation extraction modules. In ClearTK he contributed Java tree kernel code (part of the version 2.0 release) that dramatically improves tree kernel learning and enables new kernel development. This code was the backbone for a new kernel, the Descending Path Kernel, described in Lin et al. (2014).
Despite these advances, he is struck by the diversity of clinical sub-domains and how it affects performance. He has been involved with several clinical language annotation projects and has been fortunate to be able to use the resulting syntactic and semantic annotations. However, the difficulty of distributing clinical data and the differences between domains limit the applicability of methods developed on only one corpus. Timothy saw firsthand evidence of this while working on different coreference corpora (ODIE and the i2b2 Challenge), where performance degraded substantially between corpora. As a result, he has become interested in approaches that make use of unsupervised structure learning and world knowledge extraction.


Publications powered by Harvard Catalyst Profiles
  1. Miller T, Dligach D, Bethard S, Lin C, Savova G. Towards generalizable entity-centric clinical coreference resolution. J Biomed Inform. 2017 May; 69:251-258.
  2. Lin C, Dligach D, Miller TA, Bethard S, Savova GK. Multilayered temporal modeling for the clinical domain. J Am Med Inform Assoc. 2016 Mar; 23(2):387-95.
  3. Lin C, Karlson EW, Dligach D, Ramirez MP, Miller TA, Mo H, Braggs NS, Cagan A, Gainer V, Denny JC, Savova GK. Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record. J Am Med Inform Assoc. 2015 Apr; 22(e1):e151-61.
  4. Pfiffner PB, Oh J, Miller TA, Mandl KD. ClinicalTrials.gov as a data source for semi-automated point-of-care trial eligibility screening. PLoS One. 2014; 9(10):e111055.
  5. Lin C, Karlson EW, Canhao H, Miller TA, Dligach D, Chen PJ, Perez RN, Shen Y, Weinblatt ME, Shadick NA, Plenge RM, Savova GK. Automatic prediction of rheumatoid arthritis disease activity from the electronic medical records. PLoS One. 2013; 8(8):e69932.
  6. Zheng J, Chapman WW, Miller TA, Lin C, Crowley RS, Savova GK. A system for coreference resolution for the clinical narrative. J Am Med Inform Assoc. 2012 Jul-Aug; 19(4):660-7.

Related Laboratory

Natural Language Processing Laboratory

Our mission is to develop Natural Language Processing (NLP) technologies and apply them to the electronic medical record. These technologies address core NLP tasks such as relation extraction, coreference resolution, and parsing, and make use of statistical machine learning methods.
