Highlights from “Mining Electronic Health Records for Insights” Webinar

On October 15, 2015, I, Todor Primov, a healthcare expert at Ontotext, presented Mining Electronic Health Records for Insights: Beyond Ontology Based Text Mining. This webinar highlighted some of the challenges in text mining clinical patient data and the solutions Ontotext provides to overcome them, including:

    • Ontology-based Information Extraction
    • Application of flexible gazetteers
    • Negation detection
    • Temporality identification
    • Discovery of post-coordination patterns
    • Generation of Linked Data

The presentation also addressed many of the issues raised in our earlier blog post Overcoming the Next Hurdle in the Digital Healthcare Revolution: EHR Semantic Interoperability.

Q & A from the webinar

During the webinar, Todor covered some of the challenges of applying NLP to clinical patient data and the solutions Ontotext provides to overcome them.

Some really interesting questions were raised by the audience:

Q: Pre-coordinated vs. post-coordinated vocabularies. Why are pre-coordinated vocabularies still used? Are there any advantages of pre-coordinated compared to post-coordinated vocabularies?

A: There are many pre-coordinated ontologies that are used primarily for medical coding purposes, such as ICD-9-CM, ICD-10-CM and ICPC. In many use cases a particular medical observation must be identified and referred to unambiguously. For that purpose a fully qualified concept is needed, and the pre-coordinated ontologies are a good reference source. In contrast, with post-coordinated ontologies we can model complex medical findings using relations between the “seed concept” and additional qualifiers or other classes of instances.

However, the post-coordination approach requires referencing a finding not to a single concept but to a relation between concepts. Some ontologies, such as SNOMED CT, benefit from both approaches. Which approach to apply is always a trade-off, usually determined by the particular use case.
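The contrast between the two approaches can be sketched in a few lines of Python. This is purely illustrative: the dictionary keys, the rendering format and the attribute names are simplified stand-ins, not actual SNOMED CT compositional grammar or a verified ICD code list.

```python
# Pre-coordinated: one opaque code per fully qualified finding (ICD-style).
# Illustrative entry only - verify codes against the official classification.
pre_coordinated = {
    "acute appendicitis with generalized peritonitis": "K35.2",
}

# Post-coordinated: a "seed concept" refined by attribute-value qualifiers
# (SNOMED CT-style composition, simplified here to a plain dict).
post_coordinated = {
    "seed": "Appendicitis (disorder)",
    "qualifiers": {
        "clinical course": "Acute",
        "associated with": "Generalized peritonitis",
    },
}

def render_expression(expr):
    """Serialize a post-coordinated expression as 'seed : attr = value, ...'."""
    quals = ", ".join(f"{k} = {v}" for k, v in expr["qualifiers"].items())
    return f'{expr["seed"]} : {quals}'

print(render_expression(post_coordinated))
```

The pre-coordinated lookup gives an unambiguous single identifier, while the post-coordinated structure can compose findings that no single code covers, which is exactly the trade-off described above.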

Q: How can we stop the explosion of possible mappings when using flexible gazetteers? How many mappings are acceptable before they lose meaning for practitioners or domain experts?

A: To enrich our dictionaries, we use a predefined sequence of routines. Each routine performs a specific task, and they follow an exact order: applying particular ignore rules, then rewrite rules, then synonym/term-inversion enrichment. The output of one routine serves as the input for the next step in the workflow. Within each routine, the rules are applied just once, so the routines are not applied iteratively and there is no risk of an “explosion”. Even so, applying each set of rules once results in a significant increase in the number of literals compared to the initial set. It is always good practice to validate the newly generated terms against a large corpus of domain-specific documents (such as medical journal articles or anonymized EHRs) to confirm that they are actually used by medical professionals. The generated dictionary is used by both standard and so-called flexible gazetteers. A flexible gazetteer can identify any term from the dictionary even if its tokens are separated by an additional token in the real text.
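The flexible-matching behaviour can be sketched as follows. This is a minimal illustration of the idea, not the Ontotext/GATE implementation: the function, its name and the `max_gap` parameter are all hypothetical.

```python
def flexible_match(term_tokens, text_tokens, max_gap=1):
    """Return True if term_tokens occur in order within text_tokens,
    allowing at most max_gap intervening tokens between consecutive ones."""
    for start in range(len(text_tokens)):
        if text_tokens[start] != term_tokens[0]:
            continue
        pos, ok = start, True
        for tok in term_tokens[1:]:
            # look for the next term token within the allowed gap window
            window = text_tokens[pos + 1 : pos + 2 + max_gap]
            if tok not in window:
                ok = False
                break
            pos = pos + 1 + window.index(tok)
        if ok:
            return True
    return False

text = "patient shows ventricular systolic dysfunction on echo".split()
# "ventricular dysfunction" matches despite the intervening "systolic"
print(flexible_match("ventricular dysfunction".split(), text))  # True
print(flexible_match("renal dysfunction".split(), text))        # False
```

A standard gazetteer would miss the first example because the dictionary term is interrupted in the running text; that interruption is precisely what the flexible variant tolerates.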

Q: Are you able to normalize all of the qualifiers to concepts from an ontology?

A: When we use post-coordination patterns to identify and fully specify a concept in the text, we use qualifiers that are already defined in an ontology. However, we have identified many cases in which we detect a qualifier in the noun phrase but cannot normalize it to a valid concept from an ontology. This requires modeling the extracted data in RDF in a way that also stores the text/tokens that could not be grounded to an ontology concept. It also requires implementing new approaches for exploring the data extracted from text.
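One way to picture this mixed grounded/ungrounded modeling is a small sketch that emits N-Triples by hand. The namespaces, predicate names and annotation ID below are hypothetical, not Ontotext's actual schema; the point is only that a grounded concept becomes a URI while an unnormalizable qualifier is kept as a plain-text literal.

```python
EX = "http://example.org/ehr/"      # hypothetical namespace
SNOMED = "http://snomed.info/id/"   # SNOMED CT URI base

annotation = EX + "annotation/42"   # hypothetical annotation ID
triples = [
    # grounded seed concept: stored as a URI reference
    (annotation, EX + "seedConcept", SNOMED + "44054006"),
    # qualifier that could not be grounded: kept as a text literal
    (annotation, EX + "qualifierText", '"poorly controlled"'),
]

def ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines;
    objects already wrapped in quotes are literals, the rest are URIs."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(ntriples(triples))
```

Keeping the raw text alongside the grounded URIs is what makes it possible to later revisit ungrounded qualifiers, for example when the ontology gains a matching concept.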

Q: How do you model relations between extracted entities?

A: When the extraction rules are defined to extract different concept classes and the relations between them, we model the semantics of the relation using special predicates. This is the case when we extract drug dosage information: we identify a drug concept, a disease concept and the relation that the disease is an indication for the drug – in this example we model the relation as drug “hasIndication” disease. Other, more trivial relations in the knowledge base are modeled using the SKOS schema – related, closeMatch or exactMatch – based on the type of relation and the mechanism used to define the mapping.
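The two kinds of relation modeling described above can be sketched as triples. The `EX` namespace and the specific drug/disease URIs are hypothetical stand-ins; the SKOS predicate URIs, however, are the standard ones from the SKOS vocabulary.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"  # standard SKOS namespace
EX = "http://example.org/kb/"                  # hypothetical KB namespace

triples = [
    # domain-specific predicate carrying the extracted semantics:
    # drug "hasIndication" disease
    (EX + "drug/metformin", EX + "hasIndication", EX + "disease/type2-diabetes"),
    # more trivial cross-vocabulary links use SKOS mapping properties
    (EX + "disease/type2-diabetes", SKOS + "exactMatch",
     "http://snomed.info/id/44054006"),
    (EX + "disease/type2-diabetes", SKOS + "related",
     EX + "disease/metabolic-syndrome"),
]

# serialize each triple as an N-Triples statement
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

The design choice mirrors the answer: a dedicated predicate like `hasIndication` preserves domain meaning that generic SKOS mapping properties cannot express, while `exactMatch`/`related` suffice for alignment links between vocabularies.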

The slides from this presentation are available on SlideShare and a recording of the presentation is available on demand by clicking below.

View The Webinar Recording

Todor Primov

Solution Architect LS & HC at Ontotext