Semantic annotation is the process of attaching additional information to concepts (e.g. people, things, places, organizations) in a given text or any other content. Unlike classic text annotations, which are written for the reader’s reference, semantic annotations are meant to be read by machines.
When a document (or another piece of content, e.g. a video) is semantically annotated, it becomes a source of information that computers can easily interpret, combine and reuse.
Think of semantic annotations as a sort of highly structured digital marginalia, usually invisible in the human-readable part of the content. Written in the machine-interpretable formal language of data, these notes let computers perform operations such as classifying, linking, inferencing, searching and filtering.
For instance, to semantically annotate chosen concepts in the sentence “Aristotle, the author of Politics, established the Lyceum” means to identify Aristotle as a person and Politics as a written work of political philosophy, and to further index, classify and interlink the identified concepts in a semantic graph database. In this case, Aristotle can be linked to his date of birth, his teachers and his works, and Politics can be linked to its subject, its date of creation etc. Given the semantic metadata about the above sentence and its links to other (external or internal) formal knowledge, algorithms will be able to automatically retrieve, combine and reason over these facts.
Semantic annotation enriches content with machine-processable information by linking background information to extracted concepts. These concepts, found in a document or another piece of content, are unambiguously defined and related to each other within and outside the content. This turns the content into a more manageable data source.
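To make the Aristotle example more concrete, here is a minimal sketch of how the annotated sentence might look as subject-predicate-object statements. It uses plain Python tuples rather than a real graph database, and every identifier in the "ex:" namespace (including the helper names match and established_by_student_of and the type ex:School) is a hypothetical name chosen for illustration, not part of any actual knowledge base or product.

```python
# A toy "graph" of subject-predicate-object triples.
# All "ex:" identifiers are hypothetical and chosen only for illustration.
TRIPLES = {
    # Statements derived from the sentence itself.
    ("ex:Aristotle", "rdf:type", "ex:Person"),
    ("ex:Politics", "rdf:type", "ex:WrittenWork"),
    ("ex:Lyceum", "rdf:type", "ex:School"),
    ("ex:Aristotle", "ex:authorOf", "ex:Politics"),
    ("ex:Aristotle", "ex:established", "ex:Lyceum"),
    ("ex:Politics", "ex:subject", "ex:PoliticalPhilosophy"),
    # Background knowledge linked in from outside the sentence.
    ("ex:Aristotle", "ex:studentOf", "ex:Plato"),
}


def match(triples, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def established_by_student_of(triples, teacher):
    """Combine two kinds of links: who studied under `teacher`, and what they established."""
    students = {s for s, _, _ in match(triples, p="ex:studentOf", o=teacher)}
    return sorted(
        o for s, p, o in triples
        if p == "ex:established" and s in students
    )
```

With these statements in place, a call such as established_by_student_of(TRIPLES, "ex:Plato") combines a fact from the sentence with linked background knowledge, which is the kind of operation the annotations are meant to enable. A production system would typically store such statements in a graph database and query them with a dedicated query language instead.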
A typical process of semantic enrichment (yet another term for semantic annotation) includes:
Text is extracted from non-textual sources such as PDF files, videos, documents, voice recordings etc.
Algorithms split sentences and identify concepts, such as people, things, places, events, numbers.
All recognized concepts are classified, that is, they are defined as people, organizations, numbers etc. Next, they are disambiguated, that is, they are unambiguously defined according to a domain-specific knowledge base. For example, Rome is classified as a city and further disambiguated as Rome, Italy, not Rome, Iowa.
This is the most important stage of semantic annotation. It closely resembles Named Entity Recognition but differs in that it not only recognizes text chunks but also turns them into machine-processable, understandable pieces of data by linking them to broader sets of already existing data.
The relationships between the extracted concepts are identified and interlinked with related external or internal domain knowledge.
All recognized mentions of people, things, numbers etc., now enriched with machine-readable data, and the relationships between them are indexed and stored in a semantic graph database for further reference and use.
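The steps above can be sketched in a few lines of code. The following is a deliberately simplified illustration, not how any particular product works: it assumes a tiny hand-written knowledge base (KNOWLEDGE_BASE, with made-up "ex:" identifiers and context words), recognizes concepts by exact name lookup, and disambiguates by counting how many context words of each candidate appear in the sentence. Real systems rely on far richer linguistic analysis and much larger knowledge bases.

```python
import re
from collections import defaultdict

# Toy knowledge base. Every identifier and context word is invented for illustration.
KNOWLEDGE_BASE = {
    "Rome": [
        {"id": "ex:Rome_Italy", "type": "City", "context": {"italy", "colosseum", "tiber"}},
        {"id": "ex:Rome_Iowa", "type": "City", "context": {"iowa", "county", "midwest"}},
    ],
    "Aristotle": [
        {"id": "ex:Aristotle", "type": "Person", "context": {"philosopher", "lyceum"}},
    ],
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def annotate(text):
    """Recognize, classify and disambiguate known concepts, sentence by sentence."""
    annotations = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for surface, candidates in KNOWLEDGE_BASE.items():
            if re.search(rf"\b{re.escape(surface)}\b", sentence):
                # Pick the candidate sharing the most context words with the
                # sentence; on a tie, the first listed candidate wins.
                best = max(candidates, key=lambda c: len(c["context"] & words))
                annotations.append({
                    "mention": surface,
                    "type": best["type"],
                    "id": best["id"],
                    "sentence": sentence,
                })
    return annotations


def build_index(annotations):
    """Index sentences by concept identifier, standing in for the storage step."""
    index = defaultdict(list)
    for a in annotations:
        index[a["id"]].append(a["sentence"])
    return dict(index)
```

For example, annotating the text "We visited Rome and saw the Colosseum. Rome is a small town in Iowa." resolves the first mention to ex:Rome_Italy and the second to ex:Rome_Iowa, and build_index then makes each sentence retrievable by the concept it mentions. Relationship extraction is left out of this sketch to keep it short.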
What semantic annotation brings to the table is smart data: highly structured, informative notes for machines to refer to. Solutions that include semantic annotation are widely used for risk analysis, content recommendation, content discovery, checking regulatory compliance and more.