Ontotext released Linked Leaks, a linked data portal that allows the Panama Papers database to be explored in combination with additional knowledge from other datasets such as DBpedia and GeoNames. Linked Leaks provides unique facilities for querying and analysing the recently published Panama Papers leak database and aims to showcase the role of Linked Data in investigative data journalism. The Linked Leaks dataset is published as linked open data and thereby becomes part of the Web of data. It is also available for download as a dump of 20 million RDF statements, so that others can build applications on top of it.
On the 9th of May, the ICIJ released the Panama Papers database with information about more than 200,000 offshore entities that are part of the Panama Papers investigation and more than 100,000 additional companies that were part of the 2013 ICIJ Offshore Leaks investigation. It includes “information about companies, trusts, foundations and funds incorporated in 21 tax havens, from Hong Kong to Nevada in the United States. It links to people in more than 200 countries and territories.”
Linked Leaks publishes the Panama Papers data as a knowledge graph, according to the Linked Data principles, and extends it with links to additional data from DBpedia (a structured version of Wikipedia) and GeoNames (a database with all geographic features on the Earth). This allows users to enter the URL identifier of an entity or a person (for example JENTAL LIMITED) in a web browser and see all the information available about this entity in the database. Third-party applications can retrieve the relevant information about a resource by making an HTTP GET request to its URL. On top of the graph structure of the ICIJ database, Ontotext has provided an additional classification of the relationship types, e.g., beneficiary and shareholder are mapped as variants of an ownership relation. Ontotext’s GraphDB semantic database engine interprets this additional information about the data schema and infers new relationships. In this way, about 2 million new facts are inferred in the initial version of Linked Leaks.
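To make the inference step concrete, here is a minimal sketch in plain Python, assuming the "variants of an ownership relation" mapping is expressed as sub-property links (as with rdfs:subPropertyOf). It is not GraphDB's reasoner; the property names `beneficiaryOf`, `shareholderOf` and `ownerOf` and the resource identifiers are hypothetical placeholders, not the real Linked Leaks vocabulary.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical schema: each specific relationship type -> its more general parent.
# Assumes a single parent per property and an acyclic hierarchy.
SUB_PROPERTY_OF: Dict[str, str] = {
    "beneficiaryOf": "ownerOf",
    "shareholderOf": "ownerOf",
}

def super_properties(prop: str) -> Set[str]:
    """Return every ancestor of prop in the sub-property hierarchy."""
    result: Set[str] = set()
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        result.add(prop)
    return result

def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply the rule (s p o), (p subPropertyOf q) => (s q o); return only the new triples."""
    inferred: Set[Triple] = set()
    for s, p, o in triples:
        for q in super_properties(p):
            candidate = (s, q, o)
            if candidate not in triples:
                inferred.add(candidate)
    return inferred

# Hypothetical toy data: two different relationship types pointing at the same company.
data = {
    ("person:alice", "shareholderOf", "company:jental_limited"),
    ("person:bob", "beneficiaryOf", "company:jental_limited"),
}
print(sorted(infer(data)))
```

Both toy statements yield an inferred `ownerOf` fact, so a single query for owners finds shareholders and beneficiaries alike; this is the kind of entailment behind the roughly 2 million inferred facts mentioned above.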
The Linked Leaks data allows for all sorts of discovery and analytics queries such as:
- Companies that have more than one shareholder in common with another company;
- Companies related to a given shareholder (be it a person or an organization), including control relationships;
- Companies that control other companies in the same country through a company in an offshore zone;
- The most popular offshore jurisdictions;
- and more.
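In Linked Leaks such questions would be asked as queries against the GraphDB store. The sketch below only mirrors the logic of two of them in plain Python over a hypothetical in-memory graph: company pairs with more than one shareholder in common, and a ranking of jurisdictions by number of companies. All company and jurisdiction names are invented toy data.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: (shareholder, company) edges and company -> jurisdiction.
shareholdings: List[Tuple[str, str]] = [
    ("alice", "Company A"), ("bob", "Company A"), ("carol", "Company A"),
    ("alice", "Company B"), ("bob", "Company B"),
    ("carol", "Company C"),
]
jurisdiction: Dict[str, str] = {
    "Company A": "Tax Haven X",
    "Company B": "Tax Haven X",
    "Company C": "Tax Haven Y",
}

def shared_shareholders(edges: List[Tuple[str, str]], min_shared: int = 2) -> Dict[Tuple[str, str], Set[str]]:
    """Return company pairs that have at least min_shared shareholders in common."""
    holders: Dict[str, Set[str]] = defaultdict(set)
    for person, company in edges:
        holders[company].add(person)
    pairs: Dict[Tuple[str, str], Set[str]] = {}
    for c1, c2 in combinations(sorted(holders), 2):
        common = holders[c1] & holders[c2]
        if len(common) >= min_shared:
            pairs[(c1, c2)] = common
    return pairs

def top_jurisdictions(company_jurisdiction: Dict[str, str], n: int = 3) -> List[Tuple[str, int]]:
    """Rank jurisdictions by the number of companies incorporated there."""
    return Counter(company_jurisdiction.values()).most_common(n)

print(shared_shareholders(shareholdings))
print(top_jurisdictions(jurisdiction))
```

Comparing every pair of companies is quadratic, which is fine for a toy graph; a graph database answers the same question by joining on shared shareholder nodes instead.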
Technically speaking, the Linked Leaks project contributes to the data world in three significant ways:
- Adding rich semantics to the initial graph model in a way that is compliant with the W3C standards, and publishing data and the resulting schema model in RDF.
- Linking the Panama Papers dataset to other existing Linked Open Datasets.
- Providing examples and guidance about how to use this data for rich investigative findings.
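As a small illustration of what publishing a schema model in RDF can look like, the sketch below serializes the ownership hierarchy as N-Triples, one of the W3C RDF serializations. The `rdfs:subPropertyOf` IRI is the standard RDF Schema one; the `http://example.org/leaks/ontology#` namespace and property names are hypothetical stand-ins for the real Linked Leaks vocabulary.

```python
from typing import List, Tuple

# Standard RDF Schema IRI for the sub-property relation.
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
# Hypothetical namespace; not the real Linked Leaks URIs.
EX = "http://example.org/leaks/ontology#"

schema: List[Tuple[str, str, str]] = [
    (EX + "beneficiaryOf", RDFS_SUBPROPERTY_OF, EX + "ownerOf"),
    (EX + "shareholderOf", RDFS_SUBPROPERTY_OF, EX + "ownerOf"),
]

def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """Serialize IRI-only triples as N-Triples lines: <s> <p> <o> .
    Literals and blank nodes are not handled in this sketch."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

print(to_ntriples(schema))
```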
In the Linked Leaks data, there are links from Leaks country identifiers to the DBpedia identifiers for the same countries. As plenty of datasets are already mapped to DBpedia, this effectively links the Linked Leaks data to many other datasets. E.g., countries from the Linked Leaks dataset are mapped (through owl:sameAs statements) to the identifiers for the same countries in DBpedia and, through it, also to GeoNames. In this way, one can ask for the preferred offshore service providers or jurisdictions used by owners in Eastern Europe, because the information about which countries belong to a specific geographic region is available in GeoNames.
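The reasoning above relies on owl:sameAs being symmetric and transitive, so a Leaks country, its DBpedia resource and its GeoNames resource all denote one thing. The sketch below groups aliases with union-find and then counts service providers for owners in a region. Every identifier, the region table and the owner records are hypothetical placeholders (real DBpedia and GeoNames URIs look different); it only illustrates the chain of links, not the actual Linked Leaks queries.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical owl:sameAs links: Leaks country -> DBpedia -> GeoNames.
same_as: List[Tuple[str, str]] = [
    ("leaks:country/UA", "dbpedia:Ukraine"),
    ("dbpedia:Ukraine", "geonames:UA"),
    ("leaks:country/PA", "dbpedia:Panama"),
    ("dbpedia:Panama", "geonames:PA"),
]
# Hypothetical region membership, standing in for knowledge taken from GeoNames.
region_of: Dict[str, str] = {"geonames:UA": "Eastern Europe", "geonames:PA": "Central America"}
# Hypothetical owner records: (owner, Leaks country ID, service provider used).
owners: List[Tuple[str, str, str]] = [
    ("owner1", "leaks:country/UA", "Provider P"),
    ("owner2", "leaks:country/UA", "Provider P"),
    ("owner3", "leaks:country/UA", "Provider Q"),
    ("owner4", "leaks:country/PA", "Provider Q"),
]

def equivalence_classes(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Union-find over sameAs pairs; map every identifier to the set of all its aliases."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    classes: Dict[str, Set[str]] = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return {x: classes[find(x)] for x in parent}

def countries_in_region(region: str, aliases: Dict[str, Set[str]]) -> List[str]:
    """Leaks country IDs whose alias set contains an identifier located in the region."""
    return sorted(
        x for x in aliases
        if x.startswith("leaks:") and any(region_of.get(y) == region for y in aliases[x])
    )

def providers_for_region(region: str, aliases: Dict[str, Set[str]], records: List[Tuple[str, str, str]]) -> List[Tuple[str, int]]:
    """Rank service providers used by owners whose country lies in the region."""
    countries = set(countries_in_region(region, aliases))
    return Counter(p for _, c, p in records if c in countries).most_common()

aliases = equivalence_classes(same_as)
print(countries_in_region("Eastern Europe", aliases))
print(providers_for_region("Eastern Europe", aliases, owners))
```

Note that the Leaks records never mention regions: the region filter only works because of the sameAs chain to the GeoNames-side identifiers.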
The result is a dataset that is semantically represented in a hierarchy of relationships and connections, ready for download and loaded into GraphDB, a NoSQL semantic graph database.
This is a work in progress! We continuously enrich the dataset by adding new relations that allow better interpretation and analysis of the raw data. We also add more mappings to other datasets and provide new sample queries. We plan to map this data to the Financial Industry Business Ontology (FIBO), so that one can query and analyse the data using its semantics.