A Web of People and Machines: W3C Semantic Web Standards

It’s 1989. We are at CERN, the European Organization for Nuclear Research. Physicists and engineers from all over the world have gathered to seek answers about particle physics, bringing a variety of computers, file formats, software and procedures to the site. The inventor of the World Wide Web, Sir Tim Berners-Lee, is also there on a brief software consulting job, working on a way to end the mess caused by the incompatibility of formats, networks and systems. He is thinking about how to implement his long-held idea of a software program (a project initially named Enquire, which became the predecessor to the World Wide Web) and design a system that will ease the interlinking and organizing of information at CERN by finding a way for computers to communicate indirectly over a network.

Weaving the Web

As Tim Berners-Lee will later write in his book “Weaving the Web”:

Suppose all the information stored on computers everywhere were linked, I thought. Suppose I could program my computer to create a space in which anything could be linked to anything. All the bits of information in every computer at CERN, and on the planet, would be available to me and to anyone else. There would be a single, global information space.

It was that same desire to link anything to anything that grew into a proposal submitted to CERN in 1990, “Information Management: A Proposal”, which became the conceptual basis for the World Wide Web.


A World Wide Web later

Fast forward a quarter of a century. The World Wide Web is already a powerful means for collaboration, connecting people from all over the world and letting everyone publish, share, access, use and reuse documents and files of all conceivable formats. Cooperation between people is made easy, and so is the transfer of all kinds of content. What Sir Tim Berners-Lee envisioned has come true. Partly, though.

Now that cooperation between people has become effortless in many ways, it is communication between computer systems capable of understanding the mountains of data put on the Web that will truly unfold the potential of the Web.

Where data are concerned, however, the same daunting task of managing, sharing, reusing and automatically processing information still lies before us. Only this time it is about bringing collaboration to the next level and linking anything to anything at the data level. For that to happen, data on the Web must be put in a form that machines can understand and process, not locked into siloed, proprietary data formats that impede knowledge storage, access and retrieval.

But what is the road to integration and interoperability of data? What will make for a common framework that facilitates the sharing and reuse of data by computers, the way the common language of HTML allowed them to share and represent hypertext?


A Web of People and Machines: The Semantic Web Vision

Only when we have this extra level of semantics will we be able to use computer power to help us exploit the information to a greater extent than our own reading.
Tim Berners-Lee, plenary talk at WWW Geneva ’94

The inventor of the Web saw it not only as a Web of People, the ultimate goal of which is to “support and improve our weblike existence in the world”, but also as a Web of Machines, where human collaboration is extended through computers. What Tim Berners-Lee envisioned was a global information space in which computers become capable of carrying out sophisticated tasks by analyzing content, links and transactions between people and machines. This space he called the Semantic Web.

A layer in the fabric of the Web as we know it, only woven of machine-readable data, the Semantic Web is to become a highly interconnected network where the huge amount of heterogeneous data is given well-defined meaning. Ultimately, just as we have a web of documents, we will have a web of data, processed on our behalf by autonomous agents aware of the context and the meaning of data pieces and able to interpret the relationships among them.

In order for that vision of the Semantic Web to be fully realized, there need to be formal standards for representing and interpreting data.

Building Bridges Through Global Agreement: W3C and the Semantic Web Standards

The data web, not unlike the document web, relies on standards. The W3C (World Wide Web Consortium) is the international community of developers, researchers, organizations and users where the Web standards that make the World Wide Web work are developed. It is important to mention that the standardization of Web technologies is based on community consensus: standards are agreed upon through discussion, close collaboration and general agreement between W3C Members, the W3C Team and working groups of experts.

In addition to specifications for the “Web of documents”, W3C is dedicated to the development of an ecosystem of standards to support a “Web of data”, i.e. the Semantic Web stack. At the end of 2013, the W3C Semantic Web Activity (launched in 2001 “to lead the use of the Web as an exchange medium for data as well as documents”) became part of an initiative with a broader scope, namely the W3C Data Activity.


All We Need is Linked Data

Central to the concept of the Semantic Web is Linked Data. In order for the Semantic Web to function, that is for applications and tools to be able to manage and process data on our behalf, it is important that data pieces are available in a standard format.

Just as there is a lingua franca for representing documents on the Web, the Hypertext Markup Language (HTML), there is a common format for representing and sharing data: the Resource Description Framework (RDF). A standard model for data interchange on the Web, RDF is among the main building blocks of the Semantic Web stack, together with other Semantic Web technologies such as OWL, SKOS and SPARQL.

URI, RDF, SPARQL at a glance

Semantic Web technologies have immense potential to address the need for data that are connected, discoverable and understandable by both humans and machines, and so to empower Linked Data. And Linked Data, together with the precious Linked Open Data (LOD) cloud, a vast subject in its own right, is what makes for more effective discovery, automation, integration and reuse of information.

In the paragraphs below you will find a short introduction to three of the essential technologies that are part of the Semantic Web architecture: URI, RDF and SPARQL.

If you want to learn more about these technologies and their enterprise applications, you are more than welcome at Ontotext’s Developer’s Hub, where you will also find a resource covering some basic Semantic Web concepts.

Naming things (URI)

URI stands for Uniform Resource Identifier, and it is used to address everything: from documents and digital content available on the Web to real objects and abstract concepts. Think of it as naming. In order to describe anything or refer to anything, you need to name it. So, on the Semantic Web, things are named with a URI. Also, on the Semantic Web anyone can name anything, just like in real life, and one thing can have different names (URIs) that different people use to refer to it. URIs are the building elements of RDF.
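To make this concrete, here is a minimal sketch using the Python rdflib library (our choice of toolkit for the examples in this article; any RDF library would do). It shows two independently minted URIs, one from DBpedia and one from Wikidata, naming the same thing, the writer Douglas Adams, and uses owl:sameAs to state that both names refer to one resource:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Two independently minted URIs that name the same real-world thing:
# the writer Douglas Adams, as identified by DBpedia and by Wikidata.
dbpedia_adams = URIRef("http://dbpedia.org/resource/Douglas_Adams")
wikidata_adams = URIRef("http://www.wikidata.org/entity/Q42")

# On the Semantic Web anyone can name anything, so one thing may carry
# several names. owl:sameAs declares that both URIs refer to the same
# resource.
g = Graph()
g.add((dbpedia_adams, OWL.sameAs, wikidata_adams))

print(g.serialize(format="turtle"))
```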

Making statements, forming sentences (RDF)

RDF stands for Resource Description Framework. It is used for describing resources on the Web. Once you have a URI, you can use RDF to say things about things, that is, to create statements. Think of this as building sentences. An RDF statement consists of a subject, a predicate and an object, the same way many of our sentences do. For example, the sentence “This article is about Semantic Web standards” can be stored as an RDF statement expressing a relationship between the subject of the sentence (this article) and the object (Semantic Web standards). “Is about” is the predicate, indicating the type of relationship that exists between the subject and the object. Each part of an RDF statement can enter multiple relationships with parts of other statements. Thus data can be stored as these triples and easily interchanged.
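The example sentence above translates directly into such a triple. Below is a minimal sketch, again with rdflib, in which the http://example.com/ namespace and the isAbout and author predicates are hypothetical placeholders invented for illustration:

```python
from rdflib import Graph, Literal, Namespace

# A hypothetical namespace for our own resources (placeholder URIs).
EX = Namespace("http://example.com/")

g = Graph()
g.bind("ex", EX)

# "This article is about Semantic Web standards" as a triple:
# subject (this article), predicate ("is about"), object (the topic).
g.add((EX.thisArticle, EX.isAbout, EX.SemanticWebStandards))

# The same subject can enter further relationships, growing a graph.
g.add((EX.thisArticle, EX.author, Literal("Teodora Petkova")))

print(g.serialize(format="turtle"))
```

Serialized as Turtle, the first statement reads ex:thisArticle ex:isAbout ex:SemanticWebStandards, one line per sentence-like fact.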

Querying Data and Discovering Relationships (SPARQL)

SPARQL is short for SPARQL Protocol and RDF Query Language, and it is used for querying, retrieving and manipulating data stored in RDF format. Think of it as a query language that allows users to search the Web of Data (or any RDF database) and discover relationships. It is a powerful language that goes way beyond keyword search.
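As a small illustration, here is how such a query could look when run with rdflib’s in-memory SPARQL engine; the graph and the hypothetical ex:isAbout predicate are carried over from the sketch above:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.com/")

g = Graph()
g.add((EX.thisArticle, EX.isAbout, EX.SemanticWebStandards))
g.add((EX.anotherArticle, EX.isAbout, EX.LinkedData))

# Find every resource and the topic it is about. This matches an
# explicit graph pattern over triples, not keywords in text.
query = """
    PREFIX ex: <http://example.com/>
    SELECT ?article ?topic
    WHERE { ?article ex:isAbout ?topic . }
"""

for row in g.query(query):
    print(row.article, "is about", row.topic)
```

Because the query matches a graph pattern, every result is a pair of resources that actually stand in the ex:isAbout relationship, something a keyword search cannot guarantee.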

Using the Semantic Web Standards to Model Data

The most important technical challenge today in managing big data is variety (heterogeneity of data and diversity of data sources). The only effective way to handle heterogeneity is a semantic approach: develop some form of vocabulary, knowledge base or ontology, and use semantic information extraction and annotate heterogeneous data to improve interoperability and integration.

Amit Sheth, Kno.e.sis

Web vs. Semantic Web

The technology standards of the Semantic Web enable more and more enterprises, application builders and information retrieval systems to handle data in a cost-effective and agile manner. Companies in the fields of media and publishing, financial services, life sciences and healthcare, where effective data management is vital, have been among the early adopters of these technologies.

In today’s data-driven world, flexibility and interoperability in data modeling are critical for everyone who wants to stay in business. This is why semantic technologies are finding their way into a broader range of industries. The robustness of these standards, powering enterprise solutions for easy and quick retrieval of data, actionable knowledge management and business intelligence, is what makes the number of organizations turning to semantics grow.

Using Semantic Web Standards for modeling data might still seem a considerable investment of time and money, yet these are a small price to pay for a powerful technology for representing relationships, in all their richness and diversity, within any domain of knowledge.

Teodora Petkova

Teodora is a philologist fascinated by the metamorphoses of text on the Web. Curious about our networked lives, she explores how the Semantic Web vision unfolds, transforming the possibilities of the written word.

Related Posts

  • Linked Open Data for Cultural Heritage and Digital Humanities

    The Galleries, Libraries, Archives and Museums (GLAM) sector deals with complex and varied data. Integrating that data, especially across institutions, has always been a challenge. On the other hand, the value of linked data is especially high in this sector, since culture by its very nature is cross-border and interlinked.

  • The Web as a CMS: How BBC joined Linked Open Data

    Editorial teams want to create faultless content, and it is hard for them to imagine that quality coming from anyone else but their own team. The dilemma these days is how to maintain that high quality in an era of shrinking editorial budgets and ever-increasing amounts of data. Early on, the BBC decided not to mint its own IDs but to utilise existing URIs for musical artists from MusicBrainz, a freely available database. The BBC went further and made the strategic decision to also use its resources to help improve the MusicBrainz database.

  • Linked Data Solutions in Healthcare

    Linked data solutions in healthcare break down the barriers of information silos and give an all-round, comprehensive view of patient and research data. They help organizations enrich unstructured patient data with terminology in order to identify entities such as generic and branded drugs, recommended and prescribed dosages, or adverse event reactions. Linked data also makes text mining easier, as it eliminates ambiguity with the help of those vocabularies, referred to as ontologies in the semantic technology industry.
