What is Semantic Annotation?

Semantic annotation is the process of attaching additional information to concepts (e.g. people, things, places, organizations) in a text or any other content. Unlike classic text annotations, which are written for the reader’s reference, semantic annotations are meant to be read and used by machines.

When a document (or another piece of content, e.g. a video) is semantically annotated, it becomes a source of information that computers can easily interpret, combine and reuse.

If you are looking to provide high-quality content at low cost, you should read our white paper on Dynamic Semantic Publishing.

Create Smart Content with Machine-Processable Marginalia

Think of semantic annotations as a sort of highly structured digital marginalia, usually invisible in the human-readable part of the content. Written in a machine-interpretable formal data language, these notes let computers perform operations such as classifying, linking, inferencing, searching and filtering.

For instance, to semantically annotate chosen concepts in the sentence “Aristotle, the author of Politics, established the Lyceum” means to identify Aristotle as a person and Politics as a written work of political philosophy, and then to index, classify and interlink the identified concepts in a semantic graph database. In this case, Aristotle can be linked to his date of birth, his teachers and his works, while Politics can be linked to its subject, its date of creation, etc. Given the semantic metadata about the above sentence and its links to other (external or internal) formal knowledge, algorithms will be able to automatically:

  • Find out who tutored Alexander the Great.
  • Answer which of Plato’s pupils established the Lyceum.
  • Retrieve a list of political thinkers who lived between 380 and 310 BC.
  • Render a page about Greek philosophers and include Aristotle.
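
As a rough illustration of how such questions can be answered, the sketch below stores a few facts as subject-predicate-object triples and queries them with plain set operations. The predicate names (authorOf, studentOf, tutored and so on) are hypothetical and chosen only for this example. The first facts come from the sentence above, while the last two stand in for linked external knowledge; a real system would use a semantic graph database and a proper query language.

```python
# Hypothetical triples (subject, predicate, object); all names are illustrative.
TRIPLES = {
    # Facts taken from the annotated sentence.
    ("Aristotle", "type", "Person"),
    ("Aristotle", "authorOf", "Politics"),
    ("Aristotle", "established", "Lyceum"),
    ("Politics", "type", "WrittenWork"),
    ("Politics", "subject", "Political philosophy"),
    # Facts assumed to come from linked external knowledge.
    ("Aristotle", "studentOf", "Plato"),
    ("Aristotle", "tutored", "Alexander the Great"),
}


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject that has the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Who tutored Alexander the Great?
tutors = subjects("tutored", "Alexander the Great")

# Which of Plato's pupils established the Lyceum?
founders = subjects("studentOf", "Plato") & subjects("established", "Lyceum")

print(tutors, founders)
```

Both queries only succeed because the sentence-level facts and the external facts share the same identifier for Aristotle, which is exactly what the linking step provides.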

How Does Semantic Annotation Work?

Semantic annotation enriches content with machine-processable information by linking background information to extracted concepts. These concepts, found in a document or another piece of content, are unambiguously defined and related to each other within and outside the content. This turns the content into a more manageable data source.

A typical process of semantic enrichment (yet another term for semantic annotation) includes:

Text Identification

Text is extracted from non-textual sources such as PDF files, videos, documents, voice recordings, etc.

Text Analysis

Algorithms split sentences and identify concepts, such as people, things, places, events, numbers.
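
A minimal sketch of this step, assuming a naive punctuation-based sentence splitter and a small hand-made lookup table (the hypothetical GAZETTEER below), could look as follows. Production systems typically rely on trained language models rather than fixed lists.

```python
import re

# Hypothetical gazetteer: surface form -> coarse concept type.
GAZETTEER = {
    "Aristotle": "Person",
    "Politics": "WrittenWork",
    "Lyceum": "Organization",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_mentions(sentence):
    """Return (surface form, type, start offset) for each gazetteer hit."""
    mentions = []
    for name, kind in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
            mentions.append((name, kind, match.start()))
    # Order mentions by where they appear in the sentence.
    return sorted(mentions, key=lambda m: m[2])


text = ("Aristotle, the author of Politics, established the Lyceum. "
        "He was born in Stagira.")
for sentence in split_sentences(text):
    print(sentence, find_mentions(sentence))
```

The second sentence yields no mentions because "Stagira" is not in the lookup table, which shows the main limitation of a fixed list.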

Concept Extraction

All recognized concepts are classified, that is, they are defined as people, organizations, numbers, etc. Next, they are disambiguated, that is, they are unambiguously identified according to a domain-specific knowledge base. For example, Rome is classified as a city and further disambiguated as Rome, Italy, not Rome, Iowa.

This is the most important stage of semantic annotation. It closely resembles Named Entity Recognition but goes further: it not only recognizes text chunks but also turns them into machine-processable and understandable pieces of data by linking them to broader sets of already existing data.
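
The Rome example can be sketched with a toy disambiguator that picks the knowledge-base candidate whose context words overlap most with the surrounding sentence. The knowledge-base IDs and context words below are hypothetical, and word overlap is only one simple heuristic; it is not presented here as the method any particular product uses.

```python
# Hypothetical knowledge-base entries; IDs and context words are made up.
KNOWLEDGE_BASE = {
    "Rome": [
        {"id": "kb:Rome_Italy", "type": "City",
         "context": {"italy", "colosseum", "tiber"}},
        {"id": "kb:Rome_Iowa", "type": "City",
         "context": {"iowa"}},
    ],
}


def disambiguate(mention, sentence):
    """Pick the candidate whose context words overlap most with the sentence."""
    words = {w.strip(".,;:!?").lower() for w in sentence.split()}
    candidates = KNOWLEDGE_BASE.get(mention, [])
    if not candidates:
        return None  # unknown mention: nothing to link to
    # On a tie, max() keeps the first candidate listed in the knowledge base.
    return max(candidates, key=lambda c: len(c["context"] & words))


print(disambiguate("Rome", "The Colosseum in Rome stands near the Tiber."))
print(disambiguate("Rome", "Rome is a small city in Iowa."))
```

Both candidates share the type "City", so classification alone cannot separate them; only the link to a specific knowledge-base entry does.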

Relationship Extraction

The relationships between the extracted concepts are identified and interlinked with related external or internal domain knowledge.
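
One simple way to picture relationship extraction is pattern matching over the text between recognized concepts. The sketch below uses two hand-written regular expressions and hypothetical relation names (authorOf, established) that only fit the Aristotle sentence used earlier. Real extractors are usually statistical or rule systems with far broader coverage.

```python
import re

# Hypothetical trigger patterns mapped to relation names.
PATTERNS = [
    (re.compile(r"(?P<subj>[A-Z]\w+), the author of (?P<obj>[A-Z]\w+)"),
     "authorOf"),
    (re.compile(r"(?P<subj>[A-Z]\w+),.*?, established the (?P<obj>[A-Z]\w+)"),
     "established"),
]


def extract_relations(sentence):
    """Return (subject, relation, object) triples found by the patterns."""
    relations = []
    for pattern, relation in PATTERNS:
        for match in pattern.finditer(sentence):
            relations.append((match.group("subj"), relation, match.group("obj")))
    return relations


sentence = "Aristotle, the author of Politics, established the Lyceum"
print(extract_relations(sentence))
```

The resulting triples have the same subject-predicate-object shape that a graph database stores, so they can be linked directly to existing domain knowledge.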

Indexing and Storing in a Semantic Graph Database

All mentions of people, things, numbers, etc. that have been recognized and enriched with machine-readable data, together with the relationships between them, are indexed and stored in a semantic graph database for further reference and use.
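
To make the indexing idea concrete, here is a minimal in-memory stand-in for a graph store. The class TinyTripleStore and its method names are hypothetical and exist only for this sketch. It keeps two indexes so that facts can be looked up both by subject and by object; an actual semantic graph database adds persistence, inference and a query language on top of this basic idea.

```python
from collections import defaultdict


class TinyTripleStore:
    """In-memory stand-in for a semantic graph database (illustrative only)."""

    def __init__(self):
        self.by_subject = defaultdict(set)  # subject -> {(predicate, object)}
        self.by_object = defaultdict(set)   # object -> {(subject, predicate)}

    def add(self, subject, predicate, obj):
        """Store one fact and update both indexes."""
        self.by_subject[subject].add((predicate, obj))
        self.by_object[obj].add((subject, predicate))

    def about(self, subject):
        """Everything stated about a subject, in sorted order."""
        return sorted(self.by_subject.get(subject, set()))

    def pointing_to(self, obj):
        """Every (subject, predicate) pair that points at an object."""
        return sorted(self.by_object.get(obj, set()))


store = TinyTripleStore()
store.add("Aristotle", "authorOf", "Politics")
store.add("Aristotle", "established", "Lyceum")

print(store.about("Aristotle"))
print(store.pointing_to("Lyceum"))
```

Indexing by object as well as by subject is what makes reverse questions, such as "who established the Lyceum?", as cheap as forward ones.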

Where is Semantic Annotation Used?

What semantic annotation brings to the table is smart data: content carrying highly structured, informative notes for machines to refer to. Solutions that include semantic annotation are widely used for risk analysis, content recommendation, content discovery, checking regulatory compliance and more.

Semantically Annotated Content Opens Up Cost-Effective Opportunities:

Semantic Annotation Makes it Easy to:

  • Find relevant information among heaps of documents with the help of machines doing the legwork
  • Extract knowledge from disparate sources
  • Provide personalized content, based on machine-understandable context
  • Automatically interconnect content