Highlights from “Mining Electronic Health Records for Insights” Webinar

On October 15, 2015, Todor Primov, a healthcare expert at Ontotext, presented Mining Electronic Health Records for Insights: Beyond Ontology Based Text Mining. This webinar highlighted some of the challenges in text mining clinical patient data and the solutions Ontotext provides to overcome them, including:

    • Ontology-based Information Extraction
    • Application of flexible gazetteers
    • Negations detection
    • Temporality identification
    • Discovery of post-coordination patterns
    • Generation of Linked Data

The presentation also addressed many of the issues raised in our earlier blog post Overcoming the Next Hurdle in the Digital Healthcare Revolution: EHR Semantic Interoperability.

Q & A from the webinar

During the webinar, Todor covered some of the challenges of applying NLP to clinical patient data and the solutions Ontotext provides to overcome them.

Some really interesting questions were raised by the audience:

Q: Pre-coordinated vs. post-coordinated vocabularies. Why are pre-coordinated vocabularies still used? Are there any advantages of pre-coordinated compared to post-coordinated vocabularies?

A: There are many pre-coordinated ontologies that are used primarily for medical coding, such as ICD-9-CM, ICD-10-CM and ICPC. In many use cases a particular medical observation must be identified and referenced unambiguously; for that purpose a fully qualified concept is needed, and the pre-coordinated ontologies are a good reference source. By contrast, with post-coordinated ontologies we can model complex medical findings using relations between a “seed concept” and additional qualifiers or other classes of instances.

However, the post-coordination approach requires referencing a finding not as a single concept but as a relation between concepts. Some ontologies, like SNOMED CT, benefit from both approaches. Which approach to apply is always a trade-off, usually determined by the particular use case.
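The contrast between the two approaches can be sketched as follows. This is an illustrative data-structure sketch, not an Ontotext API; the SNOMED CT and ICD-10-CM codes shown are examples and the qualifier labels are assumptions:

```python
# Pre-coordinated: one fully qualified concept carries the whole meaning in a single code.
# (ICD-10-CM code shown for illustration.)
pre_coordinated = {"code": "S72.001", "system": "ICD-10-CM",
                   "label": "Fracture of unspecified part of neck of right femur"}

# Post-coordinated: a "seed concept" refined with qualifier relations.
# Tuples are (attribute code, attribute label, value code, value label); codes illustrative.
post_coordinated = {
    "seed": {"code": "71620000", "system": "SNOMED CT", "label": "Fracture of femur"},
    "qualifiers": [
        ("363698007", "Finding site", "29627003", "Structure of neck of femur"),
        ("272741003", "Laterality", "24028007", "Right"),
    ],
}

def render(expr):
    """Render a post-coordinated expression in a SNOMED CT compositional-grammar style."""
    refinements = ",".join(f"{attr}={val}" for attr, _, val, _ in expr["qualifiers"])
    return f'{expr["seed"]["code"]}:{refinements}'

print(render(post_coordinated))
# → 71620000:363698007=29627003,272741003=24028007
```

The pre-coordinated record is a single unambiguous reference, which is exactly why coding systems keep using it; the post-coordinated expression is compositional and can express findings the code list never anticipated.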

Q: How can we stop the explosion of possible mappings when using flexible gazetteers? How many mappings are acceptable before they lose meaning for practitioners or domain experts?

A: To enrich our dictionaries, we use a predefined sequence of routines. Each routine performs a specific task, and they run in a fixed order: first particular ignore rules are applied, then rewrite rules, then synonym/term-inversion enrichment. The output of each routine serves as the input for the next step in the workflow. Within each routine the rules are applied only once, so the routines are not applied iteratively and there is no risk of an “explosion”. Even so, applying each set of rules a single time results in a significant increase in the number of literals compared to the initial set. It is always good practice to validate the newly generated terms against a large corpus of domain-specific documents (such as medical journal articles or anonymized EHRs) to confirm that they are actually used by medical professionals. The generated dictionary is used by both standard and so-called flexible gazetteers. A flexible gazetteer can identify a term from the dictionary even if its tokens are separated by an additional token in the real text.
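The flexible-matching behaviour described above can be sketched in a few lines. This is a minimal illustration of the idea, not Ontotext’s or GATE’s actual gazetteer implementation; the function name and gap limit are assumptions:

```python
def flexible_match(term, text, max_gap=1):
    """Find dictionary term `term` in `text`, allowing up to `max_gap`
    interleaved tokens between consecutive term tokens.
    Returns a (start, end) token span, or None if not found."""
    term_toks = term.lower().split()
    toks = [t.lower() for t in text.split()]
    for start in range(len(toks)):
        if toks[start] != term_toks[0]:
            continue
        pos, matched, gap = start + 1, 1, 0
        while matched < len(term_toks) and pos < len(toks):
            if toks[pos] == term_toks[matched]:
                matched += 1
                gap = 0          # reset gap after each matched token
            else:
                gap += 1         # tolerate an interleaved token
                if gap > max_gap:
                    break
            pos += 1
        if matched == len(term_toks):
            return (start, pos)
    return None

# "severe" splits the dictionary term, but the flexible match still finds it:
print(flexible_match("diabetes mellitus", "patient has diabetes severe mellitus type 2"))
# → (2, 5)
```

Keeping `max_gap` small is one concrete way to bound the match space, which is the same concern the question raises about mapping explosion.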

Q: Are you able to normalize all of the qualifiers to concepts from an ontology?

A: When we use post-coordination patterns to identify and fully specify a concept in the text, we use qualifiers that are already defined in an ontology. However, we have identified many cases in which we find a qualifier in the noun phrase but cannot normalize it to a valid concept from an ontology. This requires modeling the extracted data in RDF in a way that also stores the text/tokens that could not be grounded to an ontology concept. It also requires new approaches for exploring the data extracted from text.
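One way such a mixed finding might be serialized is sketched below, with grounded qualifiers pointing to ontology concepts and ungrounded ones kept as raw text literals. The `ex:` namespace, predicate names (`seedConcept`, `hasQualifier`, `ungroundedText`) and SNOMED codes are illustrative assumptions, not Ontotext’s actual schema:

```python
finding = {
    "uri": "ex:finding-42",
    "seed": "snomed:271737000",          # grounded seed concept (illustrative code)
    "qualifiers": [
        {"concept": "snomed:24484000"},  # grounded qualifier, e.g. "severe"
        {"text": "longstanding"},        # no matching concept: keep the raw token
    ],
}

def to_triples(f):
    """Emit simple (subject, predicate, object) triples for an extracted finding."""
    triples = [(f["uri"], "rdf:type", "ex:ClinicalFinding"),
               (f["uri"], "ex:seedConcept", f["seed"])]
    for i, q in enumerate(f["qualifiers"]):
        qnode = f'{f["uri"]}-q{i}'
        triples.append((f["uri"], "ex:hasQualifier", qnode))
        if "concept" in q:
            triples.append((qnode, "ex:qualifierConcept", q["concept"]))
        else:
            # the ungrounded qualifier survives as a plain literal
            triples.append((qnode, "ex:ungroundedText", f'"{q["text"]}"'))
    return triples

for s, p, o in to_triples(finding):
    print(s, p, o, ".")
```

Because the ungrounded text is stored alongside the grounded concepts, downstream queries can still surface it, which is why exploration of such data needs approaches beyond pure concept-level querying.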

Q: How do you model relations between extracted entities?

A: When the extraction rules are defined to extract different concept classes and the relations between them, we model the semantics of each relation with special predicates. This is the case when we extract drug dosage information, where we identify a drug concept, a disease concept and the relation that the disease is an indication for the drug – in this example we model the relation as drug “hasIndication” disease. Other, more trivial relations in the knowledge base are modelled using the SKOS schema – related, closeMatch or exactMatch – depending on the type of relation and the mechanism used to define the mapping.
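The two kinds of relation modeling described above can be sketched side by side. The `hasIndication` predicate follows the example in the answer; the `ex:` namespace, the specific URIs and the SNOMED code are illustrative assumptions:

```python
triples = [
    # Extracted domain relation: the disease is an indication for the drug.
    ("ex:drug/metformin", "ex:hasIndication", "ex:disease/type2-diabetes"),
    # Mapping-level relations reuse the SKOS vocabulary.
    ("ex:disease/type2-diabetes", "skos:exactMatch", "snomed:44054006"),
    ("ex:disease/type2-diabetes", "skos:related", "ex:disease/metabolic-syndrome"),
]

# Print in an N-Triples-like form.
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

The design distinction is that `hasIndication` carries domain semantics defined by the extraction rules, while the SKOS predicates (`skos:related`, `skos:closeMatch`, `skos:exactMatch`) only state how strongly two vocabulary entries correspond.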

The slides from this presentation are available on SlideShare and a recording of the presentation is available on demand by clicking below.

View The Webinar Recording

Related Posts

  • One Step Closer to Intertwingularity: Semantic Metadata

    Metadata fundamentally alters the way we think and make use of information to create and transfer knowledge.

    Semantic metadata even more so. It allows us to add as much granularity of detail to an existing object, interlink it to an endless number of other objects and make it easy to search, access and use.

  • GraphDB Cloud: Reliable, Scalable and Ready-to-Use DBaaS

    GraphDB Cloud – an easy way to get started with a semantic database like our signature GraphDB product – is a ready-to-use starting point for managing your data and turning information into insights. The automated tasks in GraphDB Cloud save your organization the time and effort of installing and managing hardware and software, as well as the cost of buying it.

  • Exceptional User Experiences with Meaningful Content NOW

    Content enrichment and semantic web technologies are key to efficient content management. Learn why and see these technologies in action.
