The Web as a CMS: How BBC joined Linked Open Data

I was looking at the slides from a recent talk by Paul Rissen, Senior Data Architect at the BBC, about the history of Linked Data usage at the organisation. One of his slides, number 20 to be exact, reminded me of how quietly revolutionary the work at the BBC has been. The slide was titled ‘The Web as a Content Management System’.

The Web as a CMS: BBC editorial staff are contributing to MusicBrainz and Wikipedia instead of internal systems

First Successes

Early on, the BBC decided not to mint its own IDs but to reuse existing URIs for musical artists from MusicBrainz, a freely available database. For the uninitiated, a URI (Uniform Resource Identifier) is a way for a computer to identify a thing, and it is one of the basic concepts in the Linked Data paradigm.

This instantly gave them a database of 50 million artists, albums and songs, which saved the BBC a huge amount of time and expense. Each MusicBrainz entry also links to yet another data source, DBpedia, which holds the text description of the artist from Wikipedia.

That’s the ‘magic’ of linked data. By magic, I mean exactly how a graph (of data) works. Everything is connected. Follow the links, gather the content.
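
To make ‘follow the links, gather the content’ concrete, here is a minimal sketch of a graph held as plain triples. It is not how the BBC implemented it. Every URI containing EXAMPLE is a made-up placeholder, and the use of owl:sameAs as the link between the MusicBrainz entry and the DBpedia entry is an assumption for illustration.

```python
# Toy graph of (subject, predicate, object) triples.
# The predicate URIs are real vocabulary terms; the artist URIs containing
# "EXAMPLE" are hypothetical placeholders, not real records.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DBO_ABSTRACT = "http://dbpedia.org/ontology/abstract"

MB_ARTIST = "https://musicbrainz.org/artist/EXAMPLE-ID"
DBP_ARTIST = "http://dbpedia.org/resource/EXAMPLE_Artist"

TRIPLES = [
    (MB_ARTIST, RDFS_LABEL, "Example Artist"),
    (MB_ARTIST, OWL_SAME_AS, DBP_ARTIST),  # assumed linking predicate
    (DBP_ARTIST, DBO_ABSTRACT, "Example Artist is a placeholder band."),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object attached to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def gather(uri, triples=TRIPLES):
    """Follow sameAs links from `uri`, collecting labels and descriptions."""
    seen, queue = set(), [uri]
    labels, abstracts = [], []
    while queue:
        current = queue.pop()
        if current in seen:
            continue  # guard against cycles in the graph
        seen.add(current)
        labels += objects(current, RDFS_LABEL, triples)
        abstracts += objects(current, DBO_ABSTRACT, triples)
        queue += objects(current, OWL_SAME_AS, triples)
    return {"labels": labels, "abstracts": abstracts, "visited": sorted(seen)}


profile = gather(MB_ARTIST)
print(profile["labels"], profile["abstracts"])
```

Starting from a single identifier, the description arrives from a second source without anyone having copied it into the first. That is all the ‘magic’ amounts to.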

Fast, Cheap and Out of Control

I can imagine what the conversations with the heads of editorial were like when the techies suggested using ‘wild’ data, more often called open data. I’ve been involved in similar conversations. There is always a fear of losing control.

Editorial wants to create faultless content, and it is hard for them to imagine that quality coming from anyone but their own team. The dilemma these days is how to maintain that high quality in an era of shrinking editorial budgets and ever-increasing amounts of data.

See what Jem Rayfield, Senior Technical Architect at the BBC at that time, had to say about the complexity of the data the BBC Olympics 2012 site had to manage.

The ability to automatically and reliably make use of information on the web FOR FREE must have convinced the skeptics on the editorial side of the BBC. To give you an idea of just how much information is out there: DBpedia has data for 4.58 million things (e.g. people, places, music, film, video games, organisations and species). Wikidata, another general information data source, has 26 million similar kinds of ‘items’.

The Quiet Revolution of Linked Open Data

Adopting Linked Open Data would have been one battle to fight. The BBC went further and made the strategic decision to also use its resources to help improve the MusicBrainz database. When errors were found, the BBC fixed the mistake in the external data source and not within the walled garden of the BBC’s ICT infrastructure, where only the BBC could benefit from the organisation’s editorial expertise. Of course, the BBC’s charter requires the organisation to provide ‘benefit’ to the public, and contributing to the free and open MusicBrainz database fits nicely with that public service remit.

Regardless of its public service remit, this is a strategically smart approach and one of those quietly revolutionary ideas behind ‘the Web as a CMS’. The BBC’s contributions add value to a resource, the MusicBrainz database. That added value, in turn, makes the resource more attractive to others, who will use it and further improve the data. This virtuous cycle is how Wikipedia became a ubiquitous part of our lives online. The BBC is one of the main beneficiaries of its own altruism.

Today the list of MusicBrainz contributors includes names like last.fm, Spotify and Universal Music, who inject Linked Open Data into their knowledge management infrastructure to enhance the effectiveness of their catalogues’ metadata.

Ten Years On

Ten years after the BBC started down the Linked Data path, the idea still makes some editors, and even IT directors, worried.

The lack of control is still a concern. Each time an organisation looks at using open data, the same conversation has to be had. What about mistakes or deliberate errors introduced into the data sources? What matters is being able to trace the provenance of the error. Every organisation will have a means to trace the source of an error, and that doesn’t change when you are using the web as your CMS. It’s just that you have a few thousand extra pairs of eyes on the content, and they are more likely to catch the error and fix it before your relatively small team does.

The choice to use open data is not an all-or-nothing proposition. Use what you need, ignore the rest. Of course, you can create a guarantee for the data, create a vetting process, track deltas, etc. You can even pull the data into the walled garden of your organisation and never share and play nice with the rest of the community. Just as there is a concern about the data coming in, people worry that they will lose control of the data going out. Rest assured, you can still make the business decision on what internal data you want to share and what you feel commands a premium.
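
As a rough illustration of what ‘track deltas’ and ‘create a vetting process’ could look like, the sketch below compares two snapshots of an external record and separates changes that are accepted automatically from those held for editorial review. The record fields, the values and the idea of a ‘trusted fields’ list are all hypothetical, chosen only to show the shape of the approach.

```python
def diff_record(old, new):
    """Return {field: (old_value, new_value)} for every field that changed."""
    fields = set(old) | set(new)
    return {
        f: (old.get(f), new.get(f))
        for f in sorted(fields)
        if old.get(f) != new.get(f)
    }


def vet(delta, trusted_fields):
    """Split a delta into auto-accepted changes and ones needing review."""
    accepted = {f: v for f, v in delta.items() if f in trusted_fields}
    review = {f: v for f, v in delta.items() if f not in trusted_fields}
    return accepted, review


# Two hypothetical snapshots of the same external record.
yesterday = {"name": "Example Artist", "country": "GB", "members": 4}
today = {"name": "Example Artist", "country": "US", "members": 4, "genre": "pop"}

delta = diff_record(yesterday, today)
# Assumption: the organisation is happy to take genre changes unreviewed.
accepted, review = vet(delta, trusted_fields={"genre"})
print("accepted:", accepted)
print("needs review:", review)
```

How strict the split is remains a business decision. An organisation could trust nothing and review every change, or trust most fields and only inspect the ones that matter editorially.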

It’s Getting Better All the Time

Those arguments were as true ten years ago as they are today, but back then it was hard to convince organisations of the advantages of open data. The use of open data is commonplace now, and there are numerous examples proving the real value it provides. We have moved from the bold experiments of the BBC to ‘ignore at your peril’.

The scale of content and data that an organisation must make sense of has long since gone beyond what one organisation can handle alone. The data problems that giants like Google, Twitter and Facebook were dealing with ten years ago are now the problems that all organisations are dealing with. As a result, more and more organisations can’t afford to manage data and content without making use of the data that exists openly and freely on the web. The simple but radical idea of ‘The Web as a CMS’ is increasingly the norm.

Jarred McGinnis

Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.