FactForge – Open Data and News about People, Organizations and Locations

FactForge.net is a hub of Linked Open Data (LOD) and news articles about people, organizations and locations. It includes more than 1 billion facts from popular datasets such as DBpedia, Geonames, Wordnet, the Panama Papers, etc., and ontologies such as the Financial Industry Business Ontology (FIBO). It also includes a live stream of news articles and metadata linking news to entities and concepts: about 2000 articles/day tagged by Ontotext’s Publishing platform.

FactForge - linked open data about people, locations and organisations, and news articles

Explore FactForge

FactForge.net is a public service that offers free access to data represented as an RDF graph. Applications can access the repository via a SPARQL end-point, and people can explore and query the data via the GraphDB Workbench. FactForge features some sample queries that demonstrate its unique capabilities for media monitoring of related entities and analysis of industry trends and company control patterns. The FactForge data resemble and extend BBC’s Dynamic Semantic Publishing use case.

FactForge can be used as a convenient RDF repository, tuned for efficient querying of several central LOD datasets. Some aspects of these datasets have been cleaned up and complemented to allow for more efficient usage, for example, the industry classification of companies and the organization control relationships in DBpedia. This is illustrated by query F08: Most popular companies per industry, including children, where one can change dbr:Automotive to dbr:Entertainment or any other sector.

FactForge is unique in offering reasoning with big open data. Users can choose whether they want their queries to “see” only the explicit statements or also all the implicit facts, inferred when interpreting the ontologies and the datasets with respect to the semantics of OWL 2 RL. The service implements the semantics of the owl:sameAs mappings, which is only possible at this scale because of GraphDB’s inference optimizations. In this way, for instance, someone can query facts from Geonames, using the DBpedia identifiers of the locations, as demonstrated in F02: Big Cities in Eastern Europe.
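The choice between explicit and inferred statements can be sketched as a small query builder. This is an illustration under stated assumptions, not one of FactForge's own sample queries: it assumes GraphDB's `http://www.ontotext.com/explicit` pseudo-graph is honored by the service, it uses the Geonames ontology property `gn:population`, and the function name `population_query` is hypothetical. What each variant actually returns on the live service has not been verified here.

```python
# Assumption: GraphDB treats this IRI as a pseudo-graph that restricts
# a query to explicit (asserted) statements only.
EXPLICIT_GRAPH = "http://www.ontotext.com/explicit"


def population_query(local_name, explicit_only=False):
    """Build a SPARQL query asking for a Geonames fact via a DBpedia identifier.

    `local_name` is the part after http://dbpedia.org/resource/ and must be
    a simple name such as "Sofia" (no characters that need escaping).
    """
    from_clause = f"FROM <{EXPLICIT_GRAPH}>\n" if explicit_only else ""
    return (
        "PREFIX dbr: <http://dbpedia.org/resource/>\n"
        "PREFIX gn: <http://www.geonames.org/ontology#>\n"
        "SELECT ?population\n"
        f"{from_clause}"
        "WHERE {\n"
        f"  dbr:{local_name} gn:population ?population .\n"
        "}"
    )


if __name__ == "__main__":
    # Default variant: the repository's inference (including owl:sameAs) applies.
    print(population_query("Sofia"))
    print()
    # Variant scoped to explicit statements only.
    print(population_query("Sofia", explicit_only=True))
```

The resulting text can be pasted into the SPARQL editor of the Workbench or sent by any SPARQL client.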

The new generation of FactForge was released in December 2016. While it shares a lot with the first generation of FactForge service, there are also major differences, as described below. For convenience, Ontotext also maintains the old service.

Access, Exploration and Querying

FactForge represents a large-scale public demonstrator of many of GraphDB’s advanced features: reasoning, geo-spatial indexing, RDFRank, full-text search connectors and owl:sameAs optimization. The service is available at http://factforge.net.

FactForge benefits from the GraphDB Workbench with its URI auto-suggest, available for resource exploration and in the SPARQL editor; it is activated with Ctrl-Space or Cmd-Space. Its Class hierarchy diagram is indispensable when exploring a repository with over 1400 classes, while the Class relationships diagram makes it easier to understand the major patterns of relationships.

Applications can access FactForge via the SPARQL Protocol at http://factforge.net/repositories/ff-news. This is also the service address to be used for federated SPARQL queries. The SPARQL Protocol (popularly known as a “SPARQL end-point”) allows applications to remotely query and update an RDF repository over HTTP. It represents a REST-style application programming interface (API).
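A minimal sketch of such a remote query, using only the Python standard library. It follows the SPARQL 1.1 Protocol convention of sending the query as a URL-encoded `query` parameter in a POST request and asking for JSON results. The helper names `build_request` and `run_select` are hypothetical, and whether the public service is reachable or applies limits is not something this sketch assumes.

```python
import json
import urllib.parse
import urllib.request

# Service address given in the text above.
ENDPOINT = "http://factforge.net/repositories/ff-news"


def build_request(query, endpoint=ENDPOINT):
    """Prepare a SPARQL Protocol POST request without sending it."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def run_select(query, endpoint=ENDPOINT, timeout=30):
    """Send a SELECT query and return the list of result bindings."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


if __name__ == "__main__":
    # Requires network access; prints a few arbitrary triples.
    for row in run_select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"):
        print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```

For a federated query issued from another SPARQL 1.1 engine, the same address would go inside a `SERVICE <http://factforge.net/repositories/ff-news> { ... }` clause.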

Datasets and Ontologies

FactForge loads several LOD datasets in a single GraphDB repository. As described in the next section, cleanup and other corrections are applied to some of these datasets and ontologies.

Here follows a list of the datasets and ontologies included in FactForge:

  • DBpedia: the structured version of the Wikipedia encyclopedia. Only the English version of DBpedia is loaded. The RDF dump used is from November 2015. The DBpedia ontology version loaded is from April 2016 (dbpedia_2016-04.nt).
  • Geonames: a worldwide geographical database, which “contains over 10 million geographical names and consists of over 9 million unique features whereof 2.8 million populated places”. The loaded RDF dump is from October 2015, Release 1.1.13.
  • Wordnet: the most popular semantic dictionary for English. Words “are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations”. It contains 117 thousand synsets. The RDF dump of Wordnet 3.1 is loaded.
  • WorldFacts: a dataset about countries, languages, currencies and other related information. It is developed by the DBpedia Association and includes information derived from LEXVO, the CIA World Factbook and other datasets.
  • Linked Leaks: the LOD version of the Panama Papers database released by the International Consortium of Investigative Journalists (ICIJ) in May 2016. It includes “about 200,000 offshore entities that are part of the Panama Papers investigation and about more than 100,000 additional companies that were part of the 2013 ICIJ Offshore Leaks investigation”. The Linked Leaks data is published at http://data.ontotext.com.
  • GLEI: Global Legal Entity Identifier profiles of about 211 000 organizations, derived from the GMEI Utility data dump from April 2016. “The Global Markets Entity Identifier (GMEI) utility is DTCC’s legal entity identifier solution offered in collaboration with SWIFT. The GMEI utility is a pre-Local Operating Unit of the Global Legal Entity Identifier System (GLEIS)”.
  • NOW News: article texts and metadata for a stream of general news. The metadata includes annotations that link mentions of entities (e.g., people or organizations) and concepts (e.g., “chocolate” or “recession”) in the news to the corresponding DBpedia and Wikidata concepts. More information is provided in the corresponding section below.

FactForge uses the Financial Industry Business Ontology (FIBO) as an upper-level ontology. Various aspects of the schemata of the different datasets are mapped to the corresponding FIBO classes and relationships. In this way, one can query across different datasets using FIBO. The following two modules of FIBO are loaded into FactForge:

  • Foundations, version 14-11-30 (November 2014);
  • Business Entities, version 15-02-23 (February 2015).

Note: For the datasets that are updated on a regular basis, FactForge will soon provide a periodic synchronization with their most recent versions.

What’s New?

The second generation of FactForge was released in December 2016. While it shares a lot with the earlier FactForge service, there are also major differences. It includes a slightly different collection of datasets and, most importantly, it includes a live stream of news articles and metadata that links news to the rest of the knowledge graph.

As a courtesy to a range of third-party services that depend on it and some university courses that use it as a platform for exercises, Ontotext will maintain the old service at http://old.factforge.net until 31 January 2017. The rest of this section describes the major differences between the first and the second generation of FactForge.

To start with, the intention of the first generation service was mostly to demonstrate how some of the central LOD datasets could be queried efficiently via the PROTON upper-level ontology. This pre-2016 FactForge was packed with a set of sample queries, which showed the beauty of inference across several datasets. However, most of those queries were more in the spirit of the questions in the “Who Wants to Be a Millionaire?” show. They were only good for satisfying intellectual curiosity and for leading to serendipitous discoveries.

The new FactForge aims to demonstrate how a knowledge graph compiled from open data and a news metadata feed can serve specific information needs related to people, organizations and locations. The use cases that have steered the development of the service are related to media monitoring of related entities and analysis of industry trends and company control patterns.

Credits

There are several features of the GraphDB semantic database engine without which FactForge would be impossible or at least much less useful and performant. Most notable here is GraphDB’s capability to perform efficient reasoning and query evaluation with large-scale knowledge graphs. Given the size and the diversity of data, making sense of query results would often be troublesome without GraphDB’s RDFRank that provides a way to measure the importance of a node within the graph. The same rank also allows helpful auto-suggestion across millions of entities.
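As a rough illustration of how a rank can make large result sets manageable, the sketch below builds a query that orders the instances of a class by their RDFRank. It assumes GraphDB's `rank:hasRDFRank` predicate in the `http://www.ontotext.com/owlim/RDFRank#` namespace is exposed by the repository; the function name `top_ranked_query` and the choice of `dbo:Company` as an example class are illustrative, not taken from FactForge's sample queries.

```python
# Assumption: GraphDB's RDFRank plug-in exposes ranks through this namespace.
RANK_NS = "http://www.ontotext.com/owlim/RDFRank#"


def top_ranked_query(class_iri, limit=10):
    """Build a SPARQL query listing the highest-ranked instances of a class."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        f"PREFIX rank: <{RANK_NS}>\n"
        "SELECT ?entity ?rank\n"
        "WHERE {\n"
        f"  ?entity a <{class_iri}> ;\n"
        "          rank:hasRDFRank ?rank .\n"
        "}\n"
        "ORDER BY DESC(?rank)\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Example class: companies in the DBpedia ontology.
    print(top_ranked_query("http://dbpedia.org/ontology/Company", limit=5))
```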

The biggest difference between FactForge and other LOD services is that it is not static. It is live! No one steps in the same FactForge twice. It is constantly being updated with news articles and metadata that links the news to the knowledge graph. FactForge is fed with news metadata from the NOW semantic news demonstration portal. This is only possible because of Ontotext’s Dynamic Semantic Publishing platform, which is amazingly accurate in unsupervised recognition and disambiguation of Wikipedia and Wikidata entities in text.

The technology that made the new generation of FactForge possible is partially funded by the Seventh Framework Programme collaborative research project MULTISENSOR.
