Text, Data and the Roman Roads: Semantic Enrichment

What do you think is the common thread between the great Roman Empire and your great scientific research, journalistic report or financial analysis?

In a word, it is interconnectedness.

In a sentence, it is the connections between objects that form a rich system of intelligent pathways throughout your content management system and across the web.


The interconnectedness of Roman Roads

Among the things that made the Roman Empire great were its roads. The communication channels linking Rome to its colonies fostered further expansion, making exchange thrive throughout the entire empire. With building military roads in mind, what Romans actually created was a vital infrastructure that facilitated the movement of goods, people and ideas.

It is that same facilitating of communication and information exchange within the infrastructure of an organization’s content assets that can make knowledge discovery flourish.

Harnessing the power of text through the endless possibilities of metadata

When everything is interlinked, the interconnected parts are more easily remixed, put together in new ways, recombined, repurposed, you name it.

And semantic enrichment is exactly that: enriching textual content with additional, well-defined information that computers can process. Words become more than words: they turn into easy-to-search, easy-to-use, machine-readable pieces of data.

Setting Knowledge Free with Semantic Enrichment

The ultimate goal of semantic enrichment is to set knowledge free from the limits of the non-machine-readable texts it has been locked into.

By telling a computer how data items are related and how these relations can be evaluated automatically, the processing of complex filter and search operations becomes possible. And this is vital for efficient content management and knowledge discovery.
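To make this more tangible, here is a minimal sketch in Python of what "telling a computer how data items are related" could look like. The facts, the predicate names (type, connects) and the helper functions match and roads_between are all made up for illustration; a real system would typically keep such statements in a triplestore and query them with a dedicated query language.

```python
# Illustrative facts stored as (subject, predicate, object) triples.
# The vocabulary ("type", "connects") is an assumption made for this sketch.
TRIPLES = {
    ("Via Appia", "type", "RomanRoad"),
    ("Via Appia", "connects", "Rome"),
    ("Via Appia", "connects", "Brindisi"),
    ("Via Aurelia", "type", "RomanRoad"),
    ("Via Aurelia", "connects", "Rome"),
    ("Via Aurelia", "connects", "Pisa"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return, sorted, the triples that agree with every field that is not None."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def roads_between(triples, place_a, place_b):
    """Find roads that connect both places by combining three simple filters."""
    via_a = {s for s, _, _ in match(triples, predicate="connects", obj=place_a)}
    via_b = {s for s, _, _ in match(triples, predicate="connects", obj=place_b)}
    roads = {s for s, _, _ in match(triples, predicate="type", obj="RomanRoad")}
    return sorted(via_a & via_b & roads)


if __name__ == "__main__":
    print(roads_between(TRIPLES, "Rome", "Brindisi"))
```

Because every relation is explicit, a question such as "which roads link Rome and Brindisi?" becomes a combination of simple filters rather than a full-text search.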

How Can Semantic Enrichment Help Your Business?

Trying to make good use and sense of their content assets, today enterprises realize that they are virtually sitting on more text data than they have ever experienced before.

Jarred McGinnis: Semantic technology: is it the next big thing or just another buzzword?

As a structural solution to the growing amount of data organizations face, the last decade has seen the emergence of “intelligent content”. The data pieces created by semantic enrichment are the building blocks of this type of content. These building blocks allow content to travel across multiple channels, platforms and systems. They help you connect the dots, inform your insights, guide your research and, last but not least, uncover hidden relationships.


Depending on what you need and expect from your organization’s content assets, semantic enrichment can solve some of the most common problems, allowing for:

  • effective search in databases and archives
  • integration of heterogeneous sources of data
  • faster information retrieval
  • more accurate research
  • automated relationship discovery and mapping
  • neatly stored (regularly updated) domain knowledge
  • automatic aggregation, repurposing and reuse of content

Semantically annotated texts not only make for better search and presentation of content, but also are the foundation of future, innovative ways of managing dormant content assets.

How Exactly Does Semantic Enrichment Work?

Creative chaos might work for you but not for anyone else trying to make sense of your content.

Take that to another level and add a machine to the equation. Why a machine? To mention just one reason, because we would definitely make good use of automated assistants in coping with the growing amount of content – both within our organization and outside it.

So, to turn our ambiguous, difficult-to-interpret, often messy and heterogeneous content into a neat set of machine-processable data pieces, semantic enrichment (also known as semantic annotation) comes into play.

What semantic enrichment does is add machine-readable meaning to specific chunks of text, thus taking them to the next level, where what they contain can be boiled down to reusable information, opening the door for better search and presentation in an organization’s content or web-wide.
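As a rough illustration of "adding machine-readable meaning to specific chunks of text", the Python sketch below attaches a type and an identifier to the spans of text it recognizes. The vocabulary, the ex: identifiers and the annotate function are hypothetical and exist only for this example; production tools use far richer vocabularies and models.

```python
import re

# Hypothetical mini-vocabulary: surface form -> (entity type, made-up identifier).
VOCABULARY = {
    "Rome": ("City", "ex:Rome"),
    "Roman Empire": ("State", "ex:RomanEmpire"),
}


def annotate(text, vocabulary):
    """Return one annotation per match, with character offsets, type and identifier."""
    annotations = []
    for surface, (entity_type, identifier) in vocabulary.items():
        # Word boundaries keep "Rome" from matching inside longer words.
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            annotations.append({
                "start": m.start(),
                "end": m.end(),
                "text": surface,
                "type": entity_type,
                "id": identifier,
            })
    return sorted(annotations, key=lambda a: a["start"])


text = "The roads linked Rome to the colonies of the Roman Empire."
annotations = annotate(text, VOCABULARY)

if __name__ == "__main__":
    for a in annotations:
        print(a)
```

The original text stays untouched; the annotations sit alongside it and point back into it by offset, which is what lets the same chunk of text be searched, linked and reused.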

The Step-by-step Process of Semantic Enrichment

In order for a text to become a neat set of data pieces, it is put through a number of text-enrichment steps.

  1. Text is extracted from articles, documents or any form of unstructured data.

  2. After sentences are split, the important concepts and entities (i.e. the proper nouns) are identified through dictionary word lists.

  3. Machine learning algorithms classify and disambiguate the identified entities.

  4. Relationships between the entities are also identified.

  5. Additionally, the facts and the original reference to the articles are indexed and stored with corresponding classifications and relationships in a triplestore.


That is how, in five steps, information is given well-defined meaning and becomes ready to join other data pieces, enabling new combinations, insights and cross-references.
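The five steps can be sketched as a toy pipeline in Python. This is a simplification under loud assumptions, not how any particular product works: the gazetteer and cue words are invented, a keyword-overlap rule stands in for the machine learning step, co-occurrence in a sentence stands in for relationship extraction, and a plain Python set stands in for the triplestore.

```python
import re

# Hypothetical gazetteer: surface form -> candidate entity types.
GAZETTEER = {
    "Rome": ["City"],
    "Via Appia": ["Road"],
    "Mercury": ["Planet", "Deity"],  # ambiguous on purpose
}
# Toy context cues standing in for a trained disambiguation model.
CUES = {"Planet": {"orbit", "sun"}, "Deity": {"god", "temple"}}


def split_sentences(text):
    """Step 2a: naive split after sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entities(sentence):
    """Step 2b: dictionary lookup of known names in one sentence."""
    return [name for name in GAZETTEER
            if re.search(r"\b" + re.escape(name) + r"\b", sentence)]


def classify(name, sentence):
    """Step 3 (stand-in): pick the candidate type with the most cue words nearby."""
    candidates = GAZETTEER[name]
    words = set(re.findall(r"\w+", sentence.lower()))
    # On a tie, max() keeps the first candidate listed in the gazetteer.
    return max(candidates, key=lambda c: len(CUES.get(c, set()) & words))


def enrich(doc_id, text):
    """Steps 1 to 5: turn already extracted text into a set of triples."""
    triples = set()
    for sentence in split_sentences(text):
        entities = find_entities(sentence)
        for name in entities:
            triples.add((name, "type", classify(name, sentence)))
            triples.add((name, "mentionedIn", doc_id))  # reference to the source
        # Step 4 (stand-in): relate entities that share a sentence.
        for i, a in enumerate(entities):
            for b in entities[i + 1:]:
                triples.add((a, "cooccursWith", b))
    return triples


store = enrich(
    "doc-1",
    "The Via Appia started in Rome. A temple of the god Mercury stood nearby.",
)

if __name__ == "__main__":
    for triple in sorted(store):
        print(triple)
```

Even in this toy form, the output is no longer a string of characters but a set of statements that can be filtered, joined with statements from other documents and traced back to their source.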

Collectively, this powerful blend of relationships, classifications, explicit & inferred facts and unstructured data allow organizations to understand and interrogate their content and data at a much finer grain of detail.

Excerpt from: The truth about triplestores

It is through this extraction of structured, processable data from free-flowing textual content that we can help computers help us. And it is through semantic annotation that we can “package” our content in a way that lets it travel easily across platforms and devices.

To get back to the Roman infrastructure reference, with semantic enrichment we have the unique opportunity to interconnect objects and thus facilitate information exchange and knowledge discovery. Just like Roman roads did.

What other exciting applications of semantic annotation can you think of? Share your thoughts in the comments. We would love to hear from you and further discuss how semantic enrichment changes the way we create and consume content.

Teodora Petkova

Teodora is a philologist fascinated by the metamorphoses of text on the Web. Curious about our networked lives, she explores how the Semantic Web vision unfolds, transforming the possibilities of the written word.
