It’s 1989. We are at CERN, the European Organization for Nuclear Research. Physicists and engineers from all over the world have gathered to seek answers about particle physics, bringing a variety of computers, file formats, software and procedures to the site. Among them is Sir Tim Berners-Lee, the future inventor of the World Wide Web, who first came to CERN for a brief software consultancy and is now working on a way to end the mess caused by the incompatibility of formats, networks and systems. He is thinking about how to build on his earlier software project Enquire (a predecessor to the World Wide Web) and design a system that will ease the interlinking and organizing of information at CERN by finding a way for computers to communicate over a network.
As Tim Berners-Lee would later write in his book “Weaving the Web”:
Suppose all the information stored on computers everywhere were linked, I thought. Suppose I could program my computer to create a space in which anything could be linked to anything. All the bits of information in every computer at CERN, and on the planet, would be available to me and to anyone else. There would be a single, global information space.
It was that same desire to link anything to anything that grew into Information Management: A Proposal, submitted to CERN in 1989, the conceptual basis that gave rise to the World Wide Web.
Fast-forward a quarter of a century. The World Wide Web is already a powerful means for collaboration, connecting people from all over the world and letting everyone publish, share, access, use and reuse documents and files of all conceivable formats. Cooperation between people is made easy, and so is the transfer of all kinds of content. What Sir Tim Berners-Lee envisioned has come true. Only partly, though.
Now that cooperation between people has become effortless in many ways, it is communication between computer systems capable of understanding the mountains of data on the Web that will truly unfold the potential of the Web.
Where data are concerned, however, the same daunting tasks of managing, sharing, reusing and automatically processing information still lie before us. Only this time the goal is to bring collaboration to the next level and link anything to anything at the level of data. For that to happen, data on the Web must be put in a form that machines can understand and process, not locked into siloed, proprietary data formats that impede knowledge storage, access and retrieval.
But what is the road to integration and interoperability of data? What will make for a common framework that facilitates the sharing and reuse of data by computers, the way the common language of HTML allowed them to share and represent hypertext?
Only when we have this extra level of semantics will we be able to use computer power to help us exploit the information to a greater extent than our own reading.
(Tim Berners-Lee, plenary talk at the First International WWW Conference, Geneva, 1994)
The inventor of the Web saw it not only as a Web of People, whose ultimate goal is to “support and improve our weblike existence in the world”, but also as a Web of Machines, where human collaboration is extended through computers. What Tim Berners-Lee envisioned was a global information space in which computers become capable of carrying out sophisticated tasks by analyzing content, links and transactions between people and machines. This space he called the Semantic Web.
A layer in the fabric of the Web as we know it, only woven of machine-readable data, the Semantic Web is to become a highly interconnected network in which huge amounts of heterogeneous data are given well-defined meaning. Ultimately, just as we have a web of documents, we will have a web of data, processed on our behalf by autonomous agents that are aware of the context and meaning of data pieces and able to interpret the relationships among them.
In order for that vision of the Semantic Web to be fully realized, there need to be formal standards for representing and interpreting data.
The data web, not unlike the document web, involves standards. The W3C (World Wide Web Consortium) is the international community representing developers, researchers, organizations and users, where the Web standards that make the World Wide Web work are developed. It is important to mention that the standardization of Web technologies is based on community consensus: standards are agreed upon through discussions, close collaboration and general agreement between W3C Members, the W3C Team and working groups of experts.
In addition to specifications for the “Web of documents”, W3C is dedicated to developing an ecosystem of standards to support a “Web of data”, i.e. the Semantic Web stack. At the end of 2013, the W3C Semantic Web Activity (launched in 2001 “to lead the use of the Web as an exchange medium for data as well as documents”) became part of an initiative with a broader scope, namely the W3C Data Activity.
Central to the concept of the Semantic Web is Linked Data. In order for the Semantic Web to function, that is for applications and tools to be able to manage and process data on our behalf, it is important that data pieces are available in a standard format.
Just as there is a lingua franca for representing documents on the Web, the Hypertext Markup Language (HTML), there is also a common format for representing and sharing data: the Resource Description Framework (RDF). A standard model for data interchange on the Web, RDF is among the main building blocks of the Semantic Web stack, together with other Semantic Web technologies such as OWL, SKOS and SPARQL.
Semantic Web technologies have immense potential to address the need for data that are connected, discoverable and understandable by both humans and machines, and to empower Linked Data. And Linked Data, together with Linked Open Data (LOD), a vast subject in its own right, is what makes for more effective discovery, automation, integration and reuse of information.
In the paragraphs below you will find a short introduction to three of the essential technologies that are part of the Semantic Web architecture: URI, RDF and SPARQL.
If you want to learn more about these technologies and their enterprise applications, you are more than welcome at Ontotext’s Developer’s Hub, which also offers a resource covering some basic Semantic Web concepts.
URI stands for Uniform Resource Identifier and is used to address everything, from documents and digital content available on the Web to real objects and abstract concepts. Think of it as naming: in order to describe anything or refer to anything, you need to name it. So, on the Semantic Web, things are named with a URI. And just as in real life, anyone can name anything, so one thing can have several names (URIs) that different people use to refer to it. URIs are the building blocks of RDF.
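To make the idea of naming concrete, here is a minimal Python sketch (the URI below is a made-up example, not a real identifier) showing that a URI is simply a structured, globally unique name:

```python
from urllib.parse import urlparse

# A hypothetical URI naming an article as a resource on the Web.
article_uri = "https://example.org/articles/semantic-web-standards"

# A URI has structure: a scheme, an authority and a path, which together
# form one globally unique name for the thing being described.
parts = urlparse(article_uri)
print(parts.scheme)   # https
print(parts.netloc)   # example.org
print(parts.path)     # /articles/semantic-web-standards
```

Nothing has to live at that address for the name to work; a URI used as an identifier is a name first and a location only second.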
RDF stands for Resource Description Framework. It is used for describing resources on the Web. Once you have a URI, you can use RDF to say things about things, that is, to create statements. Think of this as building sentences. An RDF statement consists of a subject, a predicate and an object, just as many natural-language sentences do. For example, the sentence “This article is about Semantic Web standards” can be stored as an RDF statement containing a relationship between the subject of the sentence (this article) and the object (Semantic Web standards). “Is about” is the predicate, indicating the type of relationship that exists between the subject and the object. Any part of one RDF statement can enter into multiple relationships with parts of other statements. Thus data can be stored as triples and easily interchanged.
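As a rough illustration, and without using a real RDF library, the statement above can be modeled in plain Python as a (subject, predicate, object) triple of invented URIs:

```python
# Hypothetical URIs naming the subject, predicate and object of the
# statement "This article is about Semantic Web standards".
ARTICLE = "https://example.org/articles/this-article"
IS_ABOUT = "https://example.org/vocab/isAbout"
TOPIC = "https://example.org/topics/SemanticWebStandards"

# An RDF statement is a (subject, predicate, object) triple.
statement = (ARTICLE, IS_ABOUT, TOPIC)

# A graph is simply a set of such triples; the same URI can appear in
# several statements, which is what links the statements together.
graph = {
    statement,
    (ARTICLE, "https://example.org/vocab/title", "Semantic Web standards, explained"),
}

subject, predicate, obj = statement
print(subject, predicate, obj)
```

Because every part of a triple is itself a name (or a literal value), two graphs built independently can be merged by simply taking the union of their triples.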
SPARQL is short for SPARQL Protocol and RDF Query Language and is a language for querying, retrieving and manipulating data stored in RDF format. Think of it as a query language that allows users to search the Web of Data (or any RDF database) and discover relationships. It is a powerful language that goes way beyond keyword search.
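To get a feel for what a SPARQL query does, here is a small Python sketch (the ex: names are invented for illustration) that stores statements as triples and matches a pattern over them, much as a SPARQL SELECT with a variable would:

```python
# A tiny in-memory "graph": a set of (subject, predicate, object) triples.
triples = {
    ("ex:thisArticle", "ex:isAbout", "ex:SemanticWebStandards"),
    ("ex:thisArticle", "ex:mentions", "ex:TimBernersLee"),
    ("ex:otherArticle", "ex:isAbout", "ex:LinkedData"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None plays the role of a SPARQL variable."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Roughly analogous to: SELECT ?topic WHERE { ex:thisArticle ex:isAbout ?topic }
for _, _, topic in match(triples, s="ex:thisArticle", p="ex:isAbout"):
    print(topic)  # ex:SemanticWebStandards
```

Real SPARQL goes much further (joins across many patterns, filters, aggregation, updates), but the core idea is the same: describe the shape of the triples you want and let the engine find every match.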
The most important technical challenge in managing big data today is variety: the heterogeneity of data and the diversity of data sources. The only effective way to handle heterogeneity is a semantic approach: develop some form of vocabulary, knowledge base or ontology, and use semantic information extraction to annotate heterogeneous data, thereby improving interoperability and integration.
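A minimal sketch of that semantic approach, assuming two hypothetical sources that describe people with different field names, might map both onto one shared vocabulary:

```python
# Two hypothetical sources describing the same kind of record
# with incompatible field names.
source_a = {"firstName": "Ada", "surname": "Lovelace"}
source_b = {"given": "Ada", "family": "Lovelace"}

# A small, made-up vocabulary mapping source-specific fields
# onto shared terms.
VOCAB = {
    "firstName": "givenName", "given": "givenName",
    "surname": "familyName", "family": "familyName",
}

def normalize(record):
    """Annotate a record with shared vocabulary terms, keeping unknown fields as-is."""
    return {VOCAB.get(field, field): value for field, value in record.items()}

# Once normalized, the two heterogeneous records become directly comparable.
print(normalize(source_a) == normalize(source_b))  # True
```

In practice the vocabulary would be an ontology with formally defined terms and the mapping would be done by extraction and annotation tools, but the principle is the same: agreement on meaning is what makes integration possible.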
The technology standards of the Semantic Web enable more and more enterprises, application builders and information retrieval systems to handle data in a cost-effective and agile manner. Companies in fields where effective data management is vital, such as media and publishing, financial services, life sciences and healthcare, have been among the early adopters of these technologies.
In today’s data-driven world, flexibility and interoperability in data modeling are critical for everyone who wants to stay in business. This is why semantic technologies are finding their way into a broader range of industries. The robustness of these standards, which underpin enterprise solutions for easy and quick data retrieval, actionable knowledge management and business intelligence, is what makes the number of organizations turning to semantics grow.
Using Semantic Web standards for modeling data may still seem a considerable investment of time and money, yet it is a small price to pay for a powerful technology for representing relationships, in all their richness and diversity, within any domain of knowledge.