Title | Location bias of identifiers in clinical narratives. |
Publication Type | Journal Article |
Year of Publication | 2013 |
Authors | Hanauer, DA, Mei, Q, Malin, B, Zheng, K |
Journal | AMIA Annu Symp Proc |
Volume | 2013 |
Pagination | 560-9 |
Date Published | 2013 |
ISSN | 1942-597X |
Keywords | Computer Security; Confidentiality; Electronic Health Records; Health Insurance Portability and Accountability Act; Humans; Medical Records Systems, Computerized; Narration; United States |
Abstract | Scrubbing identifying information from narrative clinical documents is a critical first step to preparing the data for secondary use purposes, such as translational research. Evidence suggests that the differential distribution of protected health information (PHI) in clinical documents could be used as additional features to improve the performance of automated de-identification algorithms or toolkits. However, there has been little investigation into the extent to which such phenomena transpire in practice. To empirically assess this issue, we identified the location of PHI in 140,000 clinical notes from an electronic health record system and characterized the distribution as a function of location in a document. In addition, we calculated the 'word proximity' of nearby PHI elements to determine their co-occurrence rates. The PHI elements were found to have non-random distribution patterns. Location within a document and proximity between PHI elements might therefore be used to help de-identification systems better label PHI. |
Alternate Journal | AMIA Annu Symp Proc |
PubMed ID | 24551358 |
PubMed Central ID | PMC3900199 |
Grant List | 1R01LM011366 / LM / NLM NIH HHS / United States; 1U01HG006385 / HG / NHGRI NIH HHS / United States; UL1TR000433 / TR / NCATS NIH HHS / United States |
Research reported in this publication was supported by the National Cancer Institute of the
National Institutes of Health under Award Number P30CA046592. The content is solely the responsibility
of the authors and does not necessarily represent the official views of the
National Institutes of Health.