The Web as a CMS: How the BBC joined Linked Open Data

I was looking at the slides from a recent talk by Paul Rissen, Senior Data Architect at the BBC, about the history of Linked Data usage at the organisation. One of his slides, number 20 to be exact, reminded me of how quietly revolutionary the work at the BBC has been. The slide was titled ‘The Web as a Content Management System’.

The Web as a CMS: BBC editorial staff are contributing to MusicBrainz and Wikipedia instead of internal systems

First Successes

Early on, the BBC decided not to mint its own IDs but to reuse existing URIs for musical artists from MusicBrainz, a freely available database. For the uninitiated, a URI (Uniform Resource Identifier) is a way for a computer to identify a thing, and it is one of the basic concepts of the Linked Data paradigm.

This instantly gave them a database of 50 million artists, albums and songs, which saved the BBC a huge amount of time and expense. Each MusicBrainz entry also links to yet another data source, DBpedia, which holds the text description of the artist from Wikipedia.

That’s the ‘magic’ of linked data. By magic, I mean simply how a graph (of data) works. Everything is connected. Follow the links, gather the content.
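
To make ‘follow the links, gather the content’ concrete, here is a minimal Python sketch. It assumes a tiny hand-written graph of triples: the example.org URIs, the predicate names (sameAs, name, abstract) and the gather function are all made up for illustration and are not real MusicBrainz or DBpedia identifiers or APIs. Starting from one URI, it follows every linking predicate and collects the descriptive values it finds along the way.

```python
# Hypothetical triples standing in for records held by three different sources.
# All URIs and values are illustrative placeholders, not real identifiers.
TRIPLES = [
    ("http://example.org/bbc/artist/1", "sameAs",
     "http://example.org/musicbrainz/artist/1"),
    ("http://example.org/musicbrainz/artist/1", "name", "Example Artist"),
    ("http://example.org/musicbrainz/artist/1", "sameAs",
     "http://example.org/dbpedia/Example_Artist"),
    ("http://example.org/dbpedia/Example_Artist", "abstract",
     "Example Artist is a band used only for illustration."),
]

# Predicates treated as links to another description of the same thing (assumed).
LINK_PREDICATES = {"sameAs"}


def gather(start_uri, triples):
    """Follow link predicates from start_uri and collect the values found."""
    content = {}
    seen = set()
    queue = [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue  # guard against cycles in the graph
        seen.add(uri)
        for subject, predicate, obj in triples:
            if subject != uri:
                continue
            if predicate in LINK_PREDICATES:
                queue.append(obj)  # a link: visit the linked resource later
            else:
                content.setdefault(predicate, obj)  # content: keep first value
    return content


profile = gather("http://example.org/bbc/artist/1", TRIPLES)
print(profile)
```

In a real system the triples would be fetched from the remote sources rather than held in a list in memory, but the traversal is the same idea.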

Fast, Cheap and Out of Control

I can imagine what the conversations with the heads of editorial were like when the techies suggested using ‘wild’ data, more often called open data. I’ve been involved in similar conversations. There is always a fear of losing control.

Editorial wants to create faultless content, and it is hard for them to imagine that quality coming from anyone but their own team. The dilemma these days is how to maintain that high quality in an era of shrinking editorial budgets and ever-increasing amounts of data.

See what Jem Rayfield, Senior Technical Architect at the BBC at that time, had to say about the complexity of the data the BBC Olympics 2012 site had to manage.

The ability to automatically and reliably make use of information on the web FOR FREE must have convinced the skeptics on the editorial side of the BBC. To give you an idea of just how much information is out there: DBpedia has data for 4.58 million things (e.g. people, places, music, films, video games, organisations and species). Wikidata, another general information data source, has 26 million similar kinds of ‘items’.

The Quiet Revolution of Linked Open Data

Using Linked Open Data at all would have been one battle to fight. The BBC went further and made the strategic decision to also use its resources to help improve the MusicBrainz database. When errors were found, the BBC fixed the mistake in the external data source and not within the walled garden of the BBC’s ICT infrastructure, where only the BBC could benefit from the organisation’s editorial expertise. Of course, the BBC’s charter requires the organisation to provide ‘benefit’ to the public, and contributing to the free and open MusicBrainz database fits nicely with that public service remit.

Regardless of its public service remit, this is a strategically smart approach and one of those quietly revolutionary ideas behind ‘the Web as a CMS’. The BBC’s contributions add value to a resource, the MusicBrainz database. That added value, in turn, makes the resource more attractive to others, who will use it and further improve the data. This virtuous cycle is how Wikipedia became a ubiquitous part of our lives online. The BBC is one of the main beneficiaries of its own altruism.

Today the list of MusicBrainz contributors includes names like Spotify and Universal Music, who inject Linked Open Data into their knowledge management infrastructure to enhance the effectiveness of their catalogues’ metadata.

Ten Years On

Ten years after the BBC started down the Linked Data path, the idea still makes some editors, and even IT directors, worried.

The lack of control is still a concern. Each time an organisation looks at using open data, the same conversation has to be had. What about mistakes or deliberate errors introduced into the data sources? What matters is being able to trace the provenance of the error. Every organisation already has a means of tracing the source of an error, and that doesn’t change when you are using the web as your CMS. It’s just that you have a few thousand extra pairs of eyes on the content, and they are likely to catch and fix the error before your relatively small team does.

The choice to use open data is not an all-or-nothing proposition. Use what you need, ignore the rest. Of course, you can create a guarantee for the data, set up a vetting process, track deltas, etc. You can even pull the data into the walled garden of your organisation and never share or play nicely with the rest of the community. Just as there is a concern about the data coming in, people worry that they will lose control of the data going out. Rest assured, you can still make the business decision on what internal data you want to share and what you feel commands a premium.
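
As a rough illustration of what ‘track deltas’ and a vetting step might look like, here is a minimal Python sketch. It assumes you keep periodic snapshots of the external data as sets of (subject, predicate, value) triples. The example.org URI, the sample values, the watched predicate and both function names are hypothetical and chosen only for this example. The sketch compares two snapshots and queues for editorial review any change that touches a field you care about.

```python
def diff_snapshots(old, new):
    """Return the triples added and removed between two snapshots (sets)."""
    added = new - old
    removed = old - new
    return added, removed


def flag_for_review(added, removed, watched_predicates):
    """Pick out the changed triples whose predicate editors want to vet."""
    changed = added | removed
    return sorted(t for t in changed if t[1] in watched_predicates)


# Two hypothetical snapshots of the same external record, a day apart.
yesterday = {
    ("http://example.org/artist/1", "name", "Example Artist"),
    ("http://example.org/artist/1", "country", "GB"),
}
today = {
    ("http://example.org/artist/1", "name", "Exampel Artist"),  # suspicious edit
    ("http://example.org/artist/1", "country", "GB"),
    ("http://example.org/artist/1", "genre", "rock"),  # harmless addition
}

added, removed = diff_snapshots(yesterday, today)
review_queue = flag_for_review(added, removed, watched_predicates={"name"})

for triple in review_queue:
    print(triple)
```

Here the new genre passes straight through, while the changed name shows up in the review queue as an old value and a new value, so an editor can decide which is right and, in the spirit of the BBC’s approach, fix it at the source.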

It’s Getting Better All the Time

Those arguments were as true ten years ago as they are today, but back then it was hard to convince organisations of the advantages of open data. That was ten years ago. The use of open data is commonplace now. There are now numerous examples proving the real value that open data provides. We have moved from the bold experiments of the BBC to ‘ignore at your peril’.

The scale of content and data that an organisation must make sense of has long since gone beyond what any one organisation can handle. The data problems that giants like Google, Twitter and Facebook were dealing with ten years ago are now the problems that all organisations are dealing with. As a result, it is increasingly likely that an organisation cannot afford to manage its data and content without making use of the data that exists openly and freely on the web. The simple but radical idea of ‘The Web as a CMS’ is increasingly the norm.



Jarred McGinnis

Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.