Sir Arthur Conan Doyle was The Daily Mail’s star reporter at the marathon race in the 1908 London Olympics. His reports of the dramatic competition contributed to creating the marathon myth and making Italian long-distance runner Dorando Pietri a legend. Pietri collapsed just a few feet before the finish line and was assisted through it by officials, which later led to his disqualification.
Conan Doyle – who wrote “the Italian’s great performance can never be effaced from our records of sport, be the decision of the judges what it may” – led a campaign for the Italian runner and raised £309, which Pietri donated to charity.
In today’s sports reporting, journalists are something like Conan Doyle’s most famous fictional detective, Sherlock Holmes.
Sports journalists have to deal with a lot of facts and statistics in order to support their articles with information and increase trust, to investigate the connection between money, sports and sponsorship, and to pursue transparency and accountability from governments and authorities.
Still, what is a great detective without their signature hat, pipe and magnifying glass? In sports reporting, the magnifying glass is the statistics that form the backbone of a match report, an exclusive interview or an insightful original story behind the numbers.
Data is the primary source, and the easier it is to obtain, the easier the writer’s job becomes. Open Data is freely accessible for anyone to use and share, so obtaining it does not require freedom of information requests. Open datasets of statistics per sport or athlete give sports writers the basics of their stories, or add another layer or angle to their articles. Yet huge raw datasets from various sources may turn out to be even harder to analyze than it is to sit down with Leo Messi for an exclusive interview.
So our sports journalist Sherlock may need a little help from a semantic technology ‘magnifying glass’ to see the whole content organized and interlinked. A semantic graph database, for example, helps enterprises store, manage, reuse and repurpose content. Semantics, when used in a graph database, creates one rich knowledge base linked with open datasets such as DBpedia and GeoNames.
The ability to show links between entities and more importantly, the capacity to infer new links out of existing facts, is what distinguishes the semantic graph database, also known as an RDF triplestore, from the relational database or any spreadsheet-like set of top goal scorers or league standings.
The BBC Sport website is the poster boy for semantic news publishing technology. The BBC first started using a Dynamic Semantic Publishing platform for the 2010 World Cup and scaled it up to a Linked Data platform for its online content.
Semantic technology stores the statement ‘Wayne Rooney plays for England’, for example, in the form of a subject-predicate (verb)-object statement, also called a triple. ‘Wayne Rooney’ is the subject, ‘plays for’ is the predicate, and ‘England’ is the object, with the predicate showing the relationship between the subject and object.
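To make the idea of a triple concrete, here is a minimal sketch in plain Python. It is not how an RDF triplestore is actually implemented; the `Triple` type and the `match` helper are hypothetical names introduced only for illustration, and a real system would use URIs rather than plain strings.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement (illustrative only)."""
    subject: str
    predicate: str
    obj: str


# The example statement from the text, stored as a single triple.
store = {Triple("Wayne Rooney", "plays for", "England")}


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None):
    """Return all triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Who plays for England?
players = [t.subject for t in match(store, predicate="plays for", obj="England")]
```

Leaving a field as `None` acts as a wildcard, which is roughly how a pattern query against a set of triples behaves.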
From this, the semantic technology infers that Rooney currently plays in Group B of UEFA Euro 2016, England’s group in the tournament, which generates more content to be further reused, repurposed or repackaged.
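The inference step described above can be sketched as a single hand-written rule: if a player plays for a team, and that team competes in a group, then the player plays in that group. This is only an illustration in plain Python; the predicate names "competes in" and "plays in" and the function `infer_group_membership` are assumptions made for the example, and a real triplestore would apply rule sets defined by an ontology instead.

```python
# Known facts as (subject, predicate, object) tuples.
# The predicate labels are illustrative assumptions, not a real vocabulary.
facts = {
    ("Wayne Rooney", "plays for", "England"),
    ("England", "competes in", "Group B of UEFA Euro 2016"),
}


def infer_group_membership(known):
    """Derive (player, 'plays in', group) from 'plays for' + 'competes in'."""
    inferred = set()
    for player, pred1, team in known:
        if pred1 != "plays for":
            continue
        for team2, pred2, group in known:
            if pred2 == "competes in" and team2 == team:
                inferred.add((player, "plays in", group))
    # Report only statements that were not already stated explicitly.
    return inferred - known


new_facts = infer_group_membership(facts)
```

Here `new_facts` holds one statement that was never entered by hand: that Wayne Rooney plays in Group B of UEFA Euro 2016.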
Open data surely contributes to content creation. It can provide the raw numbers for reporters to back their investigative stories with. With technology that makes it easy to store, reuse and manage data, sports journalists can focus their time and effort on writing about what the data actually reveals, and still have enough time for exclusive reporting, investigative analyses or interviews.
Furthermore, open data from smart cities or from government agencies helps both journalists and fans with information on venues, stadium capacity, security, routes to the venues and traffic. For example, Rio de Janeiro and research groups have created transport mobility apps for the upcoming 2016 Summer Olympics, which Rio is hosting this August. The apps, based on open data, will offer travel options to make it easier for spectators to reach competition venues.
The UK government’s data.gov.uk portal features datasets such as Sport Pitches Playing Fields and Statistics on football banning orders. The publisher of the latter set, the Home Office, has included in the data the number of arrests and banning orders issued during the season, shown by club and by offence.
Governments in the UK and the US have already opened many public datasets, including data on government spending.
At the same time, sport and its ties to global business, sponsorship and the bidding to host Olympic Games or World Cups have become front-page news, well worth investigative reporting and not only by sports writers. Unfortunately, data on these matters is neither open nor freely accessible, and may not become so any time soon.
In the huge bribery scandal at FIFA last year, the US indicted 14 FIFA officials over “rampant, systemic, and deep-rooted” corruption, especially in accepting bribes to ensure that certain nations would host World Cups. This is why our sports and investigative Sherlock Holmes should push for more transparency and accountability via data openness.
In June 2015, Jack Hardinges from the Open Data Institute (ODI) wrote: “Adopting an open data policy could act as a turning point for FIFA. It could be the way to restore faith and trust in it as the global face of football amongst fans, sponsors and wider global community.”
It’s not only FIFA that needs to open up its data, though. The use of banned performance-enhancing substances, as well as the TV and sponsorship deals in all sports, is also worth reporting on and investigating. Open data, if and when it becomes available, would do a huge favor not only to journalism but also to society as a whole.
Until open data becomes available and of actual value, media organizations, financial institutions and big data analytics enthusiasts have taken to crunching data from various sources to predict the outcomes of various tournaments.
Yahoo experts favor Germany to win the ongoing Euro 2016 tournament based on data from Yahoo Sport and millions of Tumblr posts. Goldman Sachs sees hosts France winning the tournament based on the historical performance of each team. The Financial Times’s John Burn-Murdoch backs Spain to win, on the basis of the players’ Champions League appearances and market values. As the tournament unfolds we’ll see who, if any of those, gets it right.
Meanwhile, our sports reporter Sherlock is looking through a semantic technology magnifying glass to unlock the value of open data, linked data and big data in sports reporting, in order to engage a wider audience and seek truth and transparency. In a football analogy, the late Johan Cruyff put it like this: “You play football with your head, and your legs are there to help you.”