Knowledge Path Series: 6. owl:sameAs

Disambiguate at Warp Speed

Did you know that GraphDB’s inferencing engine supports owl:sameAs functionality allowing for the identification of two or more URIs that represent the same entity even when they reside in different documents?

Setup

The OWL sameAs optimization is “on” by default. To disable it, check the “Disable OWL sameAs optimization” box on the “Configuring New Repository” screen in GraphDB™ Workbench.

[Screenshot: the “Disable OWL sameAs optimization” checkbox in GraphDB™ Workbench]

Functionality

owl:sameAs declares that two different URIs denote one and the same resource or object in the world. Most often it is used to align different identifiers of the same real-world entity used in different data sources.

For example, let’s assert the following four statements, which use two different URIs for Sofia (the capital of Bulgaria) and three different URIs for Bulgaria.

dbpedia:Sofia owl:sameAs geonames:727011
geonames:727011 geo-ont:parentFeature geonames:732800
dbpedia:Bulgaria owl:sameAs geonames:732800
dbpedia:Bulgaria owl:sameAs opencyc-en:Bulgaria

The standard semantics of owl:sameAs dictate the following:

  • It is a transitive and symmetric relationship
  • Statements using one URI should be inferred to appear with all equivalent URIs in the same position

Thus, the 4 statements in the example above lead to 10 “inferred” statements, as follows:

geonames:727011 owl:sameAs dbpedia:Sofia
geonames:732800 owl:sameAs dbpedia:Bulgaria
geonames:732800 owl:sameAs opencyc-en:Bulgaria
opencyc-en:Bulgaria owl:sameAs dbpedia:Bulgaria
opencyc-en:Bulgaria owl:sameAs geonames:732800
dbpedia:Sofia geo-ont:parentFeature geonames:732800
dbpedia:Sofia geo-ont:parentFeature opencyc-en:Bulgaria
dbpedia:Sofia geo-ont:parentFeature dbpedia:Bulgaria
geonames:727011 geo-ont:parentFeature opencyc-en:Bulgaria
geonames:727011 geo-ont:parentFeature dbpedia:Bulgaria
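
The count of 10 can be checked with a small script. The following is a minimal Python sketch of the standard semantics described above, not GraphDB's actual inference engine: it builds the owl:sameAs equivalence classes, then substitutes every equivalent URI in every position. The function names are hypothetical, and reflexive statements (a URI being the same as itself) are left out, matching the listing above.

```python
from itertools import product

SAME_AS = "owl:sameAs"

# The four asserted statements from the example.
asserted = {
    ("dbpedia:Sofia", SAME_AS, "geonames:727011"),
    ("geonames:727011", "geo-ont:parentFeature", "geonames:732800"),
    ("dbpedia:Bulgaria", SAME_AS, "geonames:732800"),
    ("dbpedia:Bulgaria", SAME_AS, "opencyc-en:Bulgaria"),
}


def equivalence_classes(triples):
    """Map each URI to the set of URIs it is sameAs-equivalent to."""
    classes = {}
    for s, p, o in triples:
        if p != SAME_AS:
            continue
        # Merge the classes of both ends (symmetry + transitivity).
        merged = classes.get(s, {s}) | classes.get(o, {o})
        for uri in merged:
            classes[uri] = merged
    return classes


def closure(triples):
    """Return all statements implied by the owl:sameAs semantics."""
    classes = equivalence_classes(triples)
    result = set()
    # owl:sameAs holds between every pair of distinct class members.
    for members in classes.values():
        for a, b in product(members, repeat=2):
            if a != b:
                result.add((a, SAME_AS, b))
    # Every other statement is repeated for all equivalent URIs in each position.
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        for replica in product(classes.get(s, {s}), classes.get(p, {p}), classes.get(o, {o})):
            result.add(replica)
    return result


inferred = closure(asserted) - asserted
for statement in sorted(inferred):
    print(*statement)
print(len(inferred))  # 10
```

Five of the inferred statements are owl:sameAs links and five are geo-ont:parentFeature statements, which matches the list above.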

Optimization

GraphDB™ features an optimization that allows it to use a single master node in its indices to represent a class of sameAs-equivalent URIs, which avoids inflating the indices with multiple equivalent statements as shown above.
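
One way to picture the master-node idea is a union-find structure that maps every URI to a single representative of its equivalence class. The sketch below is only an illustration under that assumption; the class name SameAsIndex and its methods are hypothetical, and GraphDB's internal data structures are not described in this article.

```python
class SameAsIndex:
    """Toy union-find: each URI points towards the master node of its class."""

    def __init__(self):
        self.master = {}

    def find(self, uri):
        """Return the master node for a URI, registering unseen URIs."""
        self.master.setdefault(uri, uri)
        while self.master[uri] != uri:
            # Path halving keeps later lookups short.
            self.master[uri] = self.master[self.master[uri]]
            uri = self.master[uri]
        return uri

    def union(self, a, b):
        """Record that a and b are owl:sameAs each other."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.master[root_b] = root_a


index = SameAsIndex()
index.union("dbpedia:Sofia", "geonames:727011")
index.union("dbpedia:Bulgaria", "geonames:732800")
index.union("dbpedia:Bulgaria", "opencyc-en:Bulgaria")

# The parentFeature statement is stored once, keyed by master nodes,
# instead of once per combination of equivalent URIs.
stored = {
    (index.find("geonames:727011"), "geo-ont:parentFeature", index.find("geonames:732800"))
}
print(len(stored))  # 1
```

In this model all three Bulgaria URIs resolve to the same master node, so any statement about any of them lands on the same index entry.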

For example, if a statement has 5 sameAs equivalents for its subject, 2 for its predicate and 3 for its object, then that statement would have 5 × 2 × 3 = 30 replicas in the indices (after forward-chaining from inferencing) if no optimization is utilized.
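
The figure of 30 is simply the product of the sizes of the three equivalence classes (5 × 2 × 3). A short Python sketch makes this concrete; the ex: URIs are placeholders invented for the illustration.

```python
from itertools import product

# Hypothetical equivalence classes for one statement's three positions.
first_position = [f"ex:a{i}" for i in range(5)]   # 5 equivalent URIs
predicates = [f"ex:p{i}" for i in range(2)]       # 2 equivalent URIs
last_position = [f"ex:b{i}" for i in range(3)]    # 3 equivalent URIs

# Without the optimization, every combination is materialized in the indices.
replicas = set(product(first_position, predicates, last_position))
print(len(replicas))  # 30
```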

At query time, owl:sameAs can also lead to a proliferation of bindings, producing an expanded result set whose rows differ only in which URI from the same equivalence class they refer to.

Examples

GraphDB™ is optimized to allow the expansion over equivalent URIs to be switched off on a per-query basis by using the following “pseudo-graph” in the query:

FROM <http://ontotext.com/disable-sameAs>

With this clause, one can eliminate the multiple “equivalent” statements from the results, as follows:

PREFIX geo-ont: <http://www.geonames.org/ontology#>
SELECT *
FROM <http://ontotext.com/disable-sameAs>
WHERE { ?s geo-ont:parentFeature ?b . }

This returns a single binding, which corresponds to the originally asserted statement that Bulgaria is a parent feature of Sofia:

s=geonames:727011 b=geonames:732800
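
To see what the pseudo-graph changes, the following Python sketch models the two retrieval modes for the example data. It is a simplified illustration, not GraphDB code: the function parent_feature_bindings and its expand_same_as flag are hypothetical stand-ins for running the query without and with FROM <http://ontotext.com/disable-sameAs>.

```python
from itertools import product

# Equivalence classes from the example, keyed by the URI used in the stored statement.
classes = {
    "geonames:727011": ["geonames:727011", "dbpedia:Sofia"],
    "geonames:732800": ["geonames:732800", "dbpedia:Bulgaria", "opencyc-en:Bulgaria"],
}

# The single asserted parentFeature statement.
stored = [("geonames:727011", "geo-ont:parentFeature", "geonames:732800")]


def parent_feature_bindings(expand_same_as=True):
    """Return (s, b) rows for the pattern: ?s geo-ont:parentFeature ?b."""
    rows = []
    for s, p, o in stored:
        if p != "geo-ont:parentFeature":
            continue
        if expand_same_as:
            # Default behaviour: enumerate every equivalent URI in both positions.
            rows.extend(product(classes.get(s, [s]), classes.get(o, [o])))
        else:
            # sameAs expansion disabled: only the statement as asserted.
            rows.append((s, o))
    return rows


print(len(parent_feature_bindings()))                # 6
print(parent_feature_bindings(expand_same_as=False))  # [('geonames:727011', 'geonames:732800')]
```

With expansion on, the six rows are the two Sofia URIs combined with the three Bulgaria URIs; with expansion off, only the asserted pair remains.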

To read about and test a real-life implementation of owl:sameAs in a 15-billion-statement environment, visit FactForge, an open-access semantic search engine in the life sciences domain, which showcases GraphDB’s power to locate resources in the LOD (linked open data) cloud.

Enabling inferencing on such a large scale was accomplished by the use of a custom rule-set. More information and examples can be found mid-way down the following page:

http://ontotext.com/factforge-links/#Inference

The FactForge endpoint shows several examples of owl:sameAs as saved queries above the query box:

http://factforge.net/sparql

Conclusion

In summary, the owl:sameAs optimization ensures that:

  • All inferences that follow from the application of the standard owl:sameAs semantics are inferred with the optimization on
  • One can determine the “original” version of the statement, i.e. which URIs were used when the statement was asserted
  • One can still get all the variations of all statements, if desired
  • Standard semantics are simulated on retrieval so that the owl:sameAs optimization is a transparent implementation detail
  • Enumeration of equivalent URIs can be disabled for a query in order to reduce the number of query results

Without this optimization, reasoning with linked data becomes inefficient and the query results become overly inflated.
