Knowledge Path Series: 2. SPARQL 1.1

Full Support for the Specification

You may not know that Ontotext GraphDB™ offers full support for the SPARQL 1.1 specification, including key areas such as:

SPARQL 1.1 Protocol for RDF defines the means for transmitting SPARQL queries to a SPARQL query processing service, and returning the query results to the entity that requested them.
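
As a minimal sketch of what such a request looks like, the following Python snippet builds (but does not send) a "query via GET" request using only the standard library. The endpoint URL and the helper name build_query_request are placeholders chosen for illustration. The 'query' parameter and the application/sparql-results+json media type are taken from the W3C specifications.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(endpoint: str, sparql: str) -> Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request without sending it."""
    # The query text travels URL-encoded in the 'query' parameter.
    url = endpoint + "?" + urlencode({"query": sparql})
    # Ask for SELECT results in the SPARQL JSON results format.
    headers = {"Accept": "application/sparql-results+json"}
    return Request(url, headers=headers, method="GET")


# Hypothetical endpoint URL, used only for illustration.
req = build_query_request("http://localhost:8080/sparql",
                          "SELECT * { ?s ?p ?o } LIMIT 10")
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would actually send the request. That step is left out here so that the sketch does not depend on a running server.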

SPARQL 1.1 Query provides more powerful query constructions compared to SPARQL 1.0. It adds:

  • Aggregates
  • Subqueries
  • Negation
  • Expressions in the SELECT clause
  • Property Paths
  • Assignment
  • An expanded set of functions and operators

SPARQL 1.1 Update provides a means to change the state of the database using a query-like syntax. SPARQL Update has similarities to SQL INSERT INTO, UPDATE WHERE and DELETE FROM behavior. Full details are provided on the W3C SPARQL Update working group page, but here is a brief summary of the various types of modification operations on the RDF triples:

  • INSERT DATA {…} – inserts RDF statements;
  • DELETE DATA {…} – removes RDF statements;
  • DELETE {…} INSERT {…} WHERE {…} – for more complex modifications;
  • LOAD (SILENT) from_iri – loads an RDF document identified by from_iri;
  • LOAD (SILENT) from_iri INTO GRAPH to_iri – loads an RDF document into the local graph called to_iri;
  • CLEAR (SILENT) GRAPH iri – removes all triples from the graph identified by iri;
  • CLEAR (SILENT) DEFAULT – removes all triples from the default graph;
  • CLEAR (SILENT) NAMED – removes all triples from all named graphs;
  • CLEAR (SILENT) ALL – removes all triples from all graphs.

The following operations are used to manage graphs:

  • CREATE – creates a new graph in stores that support empty graphs;
  • DROP – removes a graph and all of its contents;
  • COPY – modifies a graph to contain a copy of another;
  • MOVE – moves all of the data from one graph into another;
  • ADD – reproduces all data from one graph into another.
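
To illustrate how these operations reach the server, here is a small Python sketch that packages an update string as a SPARQL 1.1 Protocol "update via POST directly" request. The endpoint URL, the ex: namespace and the helper name build_update_request are assumptions made for the example; nothing is sent over the network.

```python
from urllib.request import Request


def build_update_request(endpoint: str, update: str) -> Request:
    """Build a SPARQL 1.1 Protocol 'update via POST directly' request without sending it."""
    return Request(
        endpoint,
        data=update.encode("utf-8"),  # the update string itself is the request body
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Several operations can be sent in one request, separated by ';'.
# The ex: namespace and graph names are made up for this example.
update = """
PREFIX ex: <http://example.org/>
INSERT DATA { GRAPH ex:g1 { ex:s ex:p ex:o } } ;
COPY ex:g1 TO ex:g2 ;
DROP GRAPH ex:g1
"""

# Hypothetical update endpoint; check your server's documentation for the real one.
req = build_update_request(
    "http://localhost:8080/openrdf-sesame/repositories/my_concepts/statements",
    update,
)
print(req.get_method(), req.full_url)
```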

SPARQL 1.1 Federation provides extensions to the query syntax for executing distributed queries over any number of SPARQL endpoints. This feature is very powerful, and allows integration of RDF data from different sources using a single query.

For example, to discover DBpedia resources about people who have the same names as those stored in a local repository:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?dbpedia_id
WHERE {
    ?person a foaf:Person ;
            foaf:name ?name .
    SERVICE <http://dbpedia.org/sparql> {
        ?dbpedia_id a dbpedia-owl:Person ;
                    foaf:name ?name .
    }
}

The above query matches the first part against the local repository and, for each person it finds, checks the DBpedia SPARQL endpoint to see whether a person with the same name exists. If so, it returns that person's DBpedia id.

Since Sesame repositories are also SPARQL endpoints, it is possible to use the federation mechanism to do distributed querying over several repositories on a local server. For example, imagine two repositories – one repository (called ‘my_concepts’) with triples about concepts and a separate repository (called ‘my_labels’), which contains all the label information. To retrieve the corresponding label for each concept the following query can be executed on the ‘my_concepts’ repository:

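PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>    # placeholder namespace for this example
SELECT ?id ?label
WHERE {
    ?id a ex:Concept .
    SERVICE <http://localhost:8080/openrdf-sesame/repositories/my_labels> {
        ?id rdfs:label ?label .
    }
}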

Federation must be used with caution: first, to avoid excessive querying of remote (public) SPARQL endpoints, and second, because it can lead to inefficient query patterns. The following example finds resources in the second SPARQL endpoint whose rdfs:label has the same string value as the English rdfs:label of <http://dbpedia.org/resource/Vaccination> in the first SPARQL endpoint:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?endpoint2_id {
    SERVICE <http://faraway_endpoint.org/sparql>
    {
        ?endpoint1_id rdfs:label ?l1 .
        FILTER( lang(?l1) = "en" )
    }
    SERVICE <http://remote_endpoint.com/sparql>
    {
        ?endpoint2_id rdfs:label ?l2 .
        FILTER( str(?l2) = str(?l1) )
    }
}
VALUES ?endpoint1_id
{ <http://dbpedia.org/resource/Vaccination> }

However, such a query is very inefficient, because no intermediate bindings are passed between endpoints. Instead, both sub-queries execute independently, requiring the second sub-query to return all X rdfs:label Y statements that it stores. These are then joined locally to the (likely much smaller) results of the first sub-query.
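
The cost difference can be made concrete with a toy model in Python. This is only a sketch of the idea described above, not of how GraphDB or Sesame evaluates SERVICE clauses: the two "endpoints" are plain in-memory lists, all data is made up, and "transferred" simply counts the rows that would have to cross the network.

```python
def independent_join(endpoint1_rows, endpoint2_rows):
    """Each sub-query runs on its own; the join happens locally afterwards."""
    transferred = len(endpoint1_rows) + len(endpoint2_rows)  # every row is shipped
    wanted = {label for _, label in endpoint1_rows}
    matches = [rid for rid, label in endpoint2_rows if label in wanted]
    return matches, transferred


def bound_join(endpoint1_rows, endpoint2_rows):
    """Labels found at the first endpoint are passed on to restrict the second."""
    wanted = {label for _, label in endpoint1_rows}
    # The filtering is imagined to happen remotely, so only matches are shipped.
    shipped = [(rid, label) for rid, label in endpoint2_rows if label in wanted]
    transferred = len(endpoint1_rows) + len(shipped)
    return [rid for rid, _ in shipped], transferred


# Made-up data: one label at the first endpoint, 10,000 labels at the second.
e1 = [("dbpedia:Vaccination", "Vaccination")]
e2 = [(f"ex:r{i}", f"label {i}") for i in range(9999)] + [("ex:vacc", "Vaccination")]

print(independent_join(e1, e2))  # (['ex:vacc'], 10001)
print(bound_join(e1, e2))        # (['ex:vacc'], 2)
```

Both strategies return the same answer, but the independent evaluation moves every label stored at the second endpoint, which is the behavior the paragraph above warns about.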

SPARQL 1.1 Graph Store HTTP Protocol provides a means for updating and fetching RDF graph content from a Graph Store over HTTP in the REST style. The URL patterns for this new functionality are provided at:

  • <SESAME_URL>/repositories/<repo_id>/rdf-graphs/service (for indirectly referenced named graphs)
  • <SESAME_URL>/repositories/<repo_id>/rdf-graphs/<NAME> (for directly referenced named graphs).

The methods supported by these resources and their effects are:

  • GET fetches statements in the named graph from the repository in the requested format.
  • PUT updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.
  • DELETE deletes all data in the specified named graph in the repository.
  • POST updates data in the named graph in the repository by adding the supplied data to any existing data in the named graph. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.

Request headers:

  • ‘Accept’: Relevant values for GET requests are the MIME types of supported RDF formats.
  • ‘Content-Type’: Must specify the encoding of any request data that is sent to a server. Relevant values are the MIME types of supported RDF formats.

For requests on indirectly referenced named graphs, the following parameters are supported:

  • ‘graph’ (optional): specifies the URI of the named graph to be accessed.
  • ‘default’ (optional): specifies that the default graph is to be accessed. This parameter is expected to be present but have no value.

Each request on an indirectly referenced graph needs to specify precisely one of the above parameters.
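
The rules above for indirectly referenced graphs can be sketched in Python as follows. The helper name graph_store_request, the repository name, the graph URI and the choice of text/turtle as the RDF format are assumptions made for illustration; the request is built but not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def graph_store_request(sesame_url: str, repo_id: str, method: str,
                        graph: Optional[str] = None, default: bool = False,
                        data: Optional[bytes] = None,
                        rdf_format: str = "text/turtle") -> Request:
    """Build a request on an indirectly referenced graph without sending it."""
    # Exactly one of the 'graph' and 'default' parameters must be given.
    if (graph is not None) == default:
        raise ValueError("specify exactly one of 'graph' or 'default'")
    base = f"{sesame_url}/repositories/{repo_id}/rdf-graphs/service"
    # 'default' is sent without a value; 'graph' carries the URL-encoded graph URI.
    url = base + ("?default" if default else "?" + urlencode({"graph": graph}))
    headers = {}
    if method == "GET":
        headers["Accept"] = rdf_format        # format wanted in the response
    elif data is not None:
        headers["Content-Type"] = rdf_format  # format of the RDF document sent
    return Request(url, data=data, headers=headers, method=method)


# Replace the contents of one named graph (all names here are hypothetical).
req = graph_store_request(
    "http://localhost:8080/openrdf-sesame", "my_concepts", "PUT",
    graph="http://example.org/g1",
    data=b"<http://example.org/s> <http://example.org/p> <http://example.org/o> .",
)
print(req.get_method(), req.full_url)
```

Using POST instead of PUT with the same arguments would add the supplied statements to the graph rather than replace its contents.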

Named Graphs

An RDF database can store collections of RDF statements (triples) in separate graphs identified (named) by a URI. A group of statements with a unique name is called a named graph. An RDF database has one more graph that does not have a name and this is called the ‘default graph’.

The SPARQL query syntax provides a means to execute queries across default and named graphs using FROM and FROM NAMED clauses. These clauses are used to build an ‘RDF Dataset’, which identifies what statements the SPARQL query processor will use to answer a query. The dataset contains a default graph and named graphs and is constructed as follows:

  • FROM uri – brings the statements from the database’s graph identified by ‘uri’ into the dataset’s default graph, i.e. the statements ‘lose’ their graph name.
  • FROM NAMED uri – brings the statements from the database’s graph identified by ‘uri’ into the dataset as a named graph, i.e. the statements keep their graph name.

If either FROM or FROM NAMED is used, then the database’s default graph is no longer used as input for processing the query. In effect, the combination of FROM and FROM NAMED clauses exactly defines the dataset. This is somewhat bothersome, as it precludes the possibility, for instance, of executing a query over just one named graph and the default graph. However, there is a programmatic way to get around this limitation that is described below.

The default SPARQL dataset

The SPARQL specification does not define what happens when no FROM or FROM NAMED clauses are present in a query, i.e. it does not define how a SPARQL processor should behave when no dataset is defined. In this situation, implementations are free to construct the default dataset in any way they please.

GraphDB constructs the default dataset as follows:

  • The dataset’s default graph contains the merge of the database’s default graph AND all the database’s named graphs.
  • The dataset contains all named graphs from the database.

This means that if a statement ex:x ex:y ex:z exists in the database in the graph ex:g, then the following query patterns will behave as follows:

Query                                   Bindings
SELECT * { ?s ?p ?o }                   ?s=ex:x ?p=ex:y ?o=ex:z
SELECT * { GRAPH ?g { ?s ?p ?o } }      ?s=ex:x ?p=ex:y ?o=ex:z ?g=ex:g

In other words, the triple ex:x ex:y ex:z will appear to be in both the default graph and the named graph ex:g at the same time.
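
The dataset rules described so far, both for FROM / FROM NAMED and for the default dataset, can be modelled with a few lines of Python. This is a toy model of the behavior described in the text, not GraphDB's implementation: the store is a plain list of (subject, predicate, object, graph) tuples, with None standing for the database's default graph, and the function name build_dataset is made up.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, Optional[str]]  # graph is None for the default graph


def build_dataset(quads: Iterable[Quad], from_graphs=(), from_named=()):
    """Return (default_graph, named_graphs) of the dataset a query would see."""
    default: Set[Triple] = set()
    named: Dict[str, Set[Triple]] = {}
    explicit = bool(from_graphs or from_named)
    for s, p, o, g in quads:
        if explicit:
            if g in from_graphs:   # FROM: the statement loses its graph name
                default.add((s, p, o))
            if g in from_named:    # FROM NAMED: the statement keeps its graph name
                named.setdefault(g, set()).add((s, p, o))
        else:
            # No FROM / FROM NAMED: merge everything into the default graph
            # and also expose every named graph under its own name.
            default.add((s, p, o))
            if g is not None:
                named.setdefault(g, set()).add((s, p, o))
    return default, named


store = [("ex:x", "ex:y", "ex:z", "ex:g"), ("ex:a", "ex:b", "ex:c", None)]

default, named = build_dataset(store)
print(("ex:x", "ex:y", "ex:z") in default)  # True: visible in the default graph
print(sorted(named))                        # ['ex:g']: and in its named graph

default, named = build_dataset(store, from_named=["ex:g"])
print(default)                              # set(): the database's default graph is not used
```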

There are a few reasons for this behavior including:

  1. It provides an easy way to execute a triple pattern query over all stored RDF statements.
  2. It allows all named graph names to be discovered, i.e. with this query: SELECT ?g { GRAPH ?g { ?s ?p ?o } }.