SPARQL, short for “SPARQL Protocol and RDF Query Language”, enables users to query information from databases or any data source that can be mapped to RDF.
SPARQL vs SQL
A SPARQL query can also be executed on any database that can be viewed as RDF via middleware. For example, a relational database can be queried with SPARQL through Relational Database to RDF (RDB2RDF) mapping software.
Combined with its support for computation, filtering, aggregation and subqueries, this makes SPARQL a powerful query language.
In contrast to SQL, SPARQL queries are not constrained to work within one database: Federated queries can access multiple data stores (endpoints). Consequently, SPARQL overcomes the constraints posed by local search.
The power of SPARQL, together with the flexibility of RDF, can lead to lower development costs, because merging results from multiple data sources becomes easier.
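As a rough illustration of the federated idea, the following Python sketch joins results from two in-memory stores. It is only an analogy for a SERVICE clause in a federated query, not a real network client: the store names `hr` and `registry`, the `ex:` identifiers and the helper functions are all hypothetical.

```python
# Two hypothetical data stores standing in for separate SPARQL endpoints.
ENDPOINTS = {
    "hr": {("ex:alice", "ex:worksFor", "ex:acme")},
    "registry": {("ex:acme", "ex:locatedIn", "ex:Sofia")},
}

def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):            # '?' marks a variable
            if result.setdefault(p, t) != t:
                return None              # variable already bound to another value
        elif p != t:
            return None                  # constant term does not match
    return result

def service(endpoint, pattern, bindings=({},)):
    """Evaluate one triple pattern at one store, joined with earlier bindings."""
    graph = ENDPOINTS[endpoint]
    return [r for b in bindings for t in sorted(graph)
            if (r := extend(b, pattern, t)) is not None]

# Ask the first store who works where, then ask the second where that is.
employers = service("hr", ("?person", "ex:worksFor", "?org"))
located = service("registry", ("?org", "ex:locatedIn", "?city"), employers)
```

The shared variable `?org` is what links the two sources, so no manual merging of result sets is needed.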
These design choices, which enable queries over distributed sources and non-uniform data, are not accidental. SPARQL is designed to enable Linked Data for the Semantic Web. Its goal is to help people enrich their data by linking it to other global semantic resources, thus sharing, merging, and reusing data in a more meaningful way.
SPARQL from within
SPARQL sees your data as a directed, labeled graph that is internally expressed as triples, each consisting of a subject, a predicate and an object.
Correspondingly, a SPARQL query consists of a set of triple patterns in which each element (the subject, predicate and object) can be a variable (wildcard). Solutions to the variables are then found by matching the patterns in the query to triples in the dataset.
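The matching step described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how a real SPARQL engine is implemented: the `ex:` identifiers, the sample triples and the function names are hypothetical, and variables are simply strings starting with `?`.

```python
# Hypothetical data: each triple is (subject, predicate, object).
GRAPH = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:age", "42"),
    ("ex:bob", "ex:knows", "ex:alice"),
}

def is_var(term):
    """Terms starting with '?' are treated as variables, as in SPARQL syntax."""
    return term.startswith("?")

def match(pattern, triple):
    """Return the variable bindings that make `pattern` equal `triple`, or None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None   # same variable would need two different values
        elif p != t:
            return None       # constant term does not match
    return binding

def solve(pattern, graph):
    """All solutions of a single triple pattern against the graph."""
    return [b for t in graph if (b := match(pattern, t)) is not None]

# Variables in predicate and object position: everything known about ex:alice.
about_alice = solve(("ex:alice", "?p", "?o"), GRAPH)
# Variable in subject position only: who knows ex:alice?
knows_alice = solve(("?who", "ex:knows", "ex:alice"), GRAPH)
```

Each solution is a dictionary that maps variable names to the values found in a matching triple.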
SPARQL has four types of queries. It can be used to:
- ASK whether there is at least one match of the query pattern in the RDF graph data;
- SELECT all or some of those matches in tabular form (including aggregation, sampling and pagination through OFFSET and LIMIT);
- CONSTRUCT an RDF graph by substituting the variables in those matches in a set of triple templates; or
- DESCRIBE the matches found by constructing a relevant RDF graph.
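To make the difference between the four query forms concrete, here is a minimal Python sketch that layers them on a toy single-pattern matcher. All function names and the `ex:` data are hypothetical. The `describe` function shows only one possible reading, since what a DESCRIBE query returns is left to the implementation.

```python
GRAPH = {
    ("ex:John", "ex:hasSon", "ex:Bob"),
    ("ex:John", "ex:hasSon", "ex:Michael"),
    ("ex:Bob", "ex:age", "30"),
}

def match(pattern, triple):
    """Bindings that make `pattern` equal `triple`, or None if there are none."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding

def solutions(pattern, graph):
    # Sorting only makes the order of results reproducible.
    return [b for t in sorted(graph) if (b := match(pattern, t)) is not None]

def ask(pattern, graph):
    """ASK: is there at least one match?"""
    return bool(solutions(pattern, graph))

def select(variables, pattern, graph, offset=0, limit=None):
    """SELECT: matches as table rows, with optional OFFSET/LIMIT pagination."""
    rows = [tuple(b[v] for v in variables) for b in solutions(pattern, graph)]
    end = None if limit is None else offset + limit
    return rows[offset:end]

def construct(template, pattern, graph):
    """CONSTRUCT: fill a triple template with the values of each match."""
    return {tuple(b.get(term, term) for term in template)
            for b in solutions(pattern, graph)}

def describe(variable, pattern, graph):
    """DESCRIBE (one simple reading): all triples about the matched resources."""
    resources = {b[variable] for b in solutions(pattern, graph)}
    return {t for t in graph if t[0] in resources}

SONS = ("ex:John", "ex:hasSon", "?son")
has_sons = ask(SONS, GRAPH)
second_son = select(["?son"], SONS, GRAPH, offset=1, limit=1)
fathers = construct(("?son", "ex:hasFather", "ex:John"), SONS, GRAPH)
about_sons = describe("?son", SONS, GRAPH)
```

The same pattern drives all four forms; only the shape of the answer changes (a boolean, a table, or a new graph).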
The top semantic graph databases that support SPARQL, including GraphDB Free, feature intuitive SPARQL editors with autocomplete, explorer and more that guide data scientists through their path of building powerful SPARQL queries.
The Power of SPARQL in an Example
The biggest strength of SPARQL is navigating relations in RDF graph data through graph pattern matching, where simple patterns can be combined into more complex ones that explore more elaborate relations in the data.
Such relations can be explored by using basic patterns, pattern joins, unions, by adding optional patterns that may extend the information about the found solutions, etc. Furthermore, property paths allow sequential composition (sequencing), parallel composition (alternatives), iterations (Kleene star), inversion, etc.
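The property path operators mentioned above can be approximated with plain set operations. The following Python sketch is illustrative only: the function names and the `ex:` triples are hypothetical, and the comments indicate the SPARQL path syntax each helper loosely corresponds to.

```python
EDGES = {
    ("ex:John", "ex:hasSon", "ex:Bob"),
    ("ex:John", "ex:hasDaughter", "ex:Anna"),
    ("ex:Anna", "ex:hasSon", "ex:Timmy"),
}

def step(nodes, predicate, graph):
    """Follow one `predicate` edge forward from every node in `nodes`."""
    return {o for s, p, o in graph if p == predicate and s in nodes}

def inverse(nodes, predicate, graph):
    """Follow `predicate` edges backwards (path syntax: ^p)."""
    return {s for s, p, o in graph if p == predicate and o in nodes}

def alternative(nodes, predicates, graph):
    """Follow any one of several predicates (path syntax: p1|p2)."""
    return set().union(*(step(nodes, p, graph) for p in predicates))

def star(nodes, predicates, graph):
    """Zero or more steps (path syntax: (p1|p2)*), computed as a fixpoint."""
    reached = set(nodes)
    frontier = set(nodes)
    while frontier:
        frontier = alternative(frontier, predicates, graph) - reached
        reached |= frontier
    return reached

CHILD = ["ex:hasSon", "ex:hasDaughter"]
# Sequence (path syntax: (hasSon|hasDaughter)/hasSon): sons of John's children.
grandsons = step(alternative({"ex:John"}, CHILD, EDGES), "ex:hasSon", EDGES)
# Kleene star: John together with all of his descendants.
john_and_descendants = star({"ex:John"}, CHILD, EDGES)
# Inversion: whose son is Timmy?
parents_of_timmy = inverse({"ex:Timmy"}, "ex:hasSon", EDGES)
```

Sequencing is just function composition here: the output set of one step becomes the input set of the next.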
A basic graph pattern consists of a triple in which each element (the subject, predicate and object) can be a variable (wildcard).
For example, the pattern ‘John’ (a subject)->‘has son’ (a predicate)->X (a wildcard object) will have as a solution each triple in the RDF graph that matches the subject, matches the predicate, and has any object.
So if John has two sons – Bob and Michael, the triples ‘John’->‘has son’->‘Bob’ and ‘John’->‘has son’->‘Michael’ will be the solutions to the SPARQL query.
A SPARQL query can also express a union of alternative graph patterns. Any solution to at least one of the patterns is a solution of the union.
For example, the union of the patterns ‘John’->‘has son’->X and ‘John’->‘has daughter’->X will have as solutions all of John’s sons and all of John’s daughters.
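A small Python sketch of this union, using the family from the example. The matcher and the function names are hypothetical helpers written for illustration, and the variable X is written as `?x`.

```python
GRAPH = {
    ("John", "has son", "Bob"),
    ("John", "has son", "Michael"),
    ("John", "has daughter", "Anna"),
}

def match(pattern, triple):
    """Bindings that make `pattern` equal `triple`, or None if there are none."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):             # '?' marks a variable
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding

def solve(pattern, graph):
    return [b for t in sorted(graph) if (b := match(pattern, t)) is not None]

def union(*solution_lists):
    """A solution of any alternative is a solution of the union."""
    merged = []
    for found in solution_lists:
        merged.extend(found)
    return merged

sons = solve(("John", "has son", "?x"), GRAPH)
daughters = solve(("John", "has daughter", "?x"), GRAPH)
children = sorted(b["?x"] for b in union(sons, daughters))
```

Here `children` contains Anna, Bob and Michael, whereas each pattern on its own finds only the sons or only the daughters.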
A group graph pattern is a join of two (or more) basic graph patterns. Unlike the union, it requires that both (or all) patterns are matched. So a join of ‘John’->‘has son’->Y and Y->‘has son’->Z will have as matching solutions the sons of John’s sons.
The sons of John’s daughters, however, will not be returned because the first basic pattern in the query, namely ‘John’->‘has son’->Y, will not be matched by a triple in the data such as ‘John’->‘has daughter’->‘Anna’.
So even if the data contains ‘Anna’->‘has son’->‘Timmy’, Timmy will not show up as a solution of the above join. Luckily, an alternative graph pattern and a group graph pattern can easily be combined. A union of ‘John’->‘has son’->Y and ‘John’->‘has daughter’->Y, joined with Y->‘has son’->Z, will find all of John’s grandsons.
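The difference between the plain join and the join over a union can be sketched in Python as follows. The helpers are hypothetical, and the triple ‘Bob’->‘has son’->‘Sam’ is an assumption added here so that the sons-only join returns something; it is not part of the example above.

```python
GRAPH = {
    ("John", "has son", "Bob"),
    ("John", "has son", "Michael"),
    ("John", "has daughter", "Anna"),
    ("Anna", "has son", "Timmy"),
    ("Bob", "has son", "Sam"),   # assumed extra triple, for illustration only
}

def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None      # shared variable disagrees with earlier binding
        elif p != t:
            return None
    return result

def solve(pattern, graph, bindings=({},)):
    """Join: extend every incoming binding with every triple matching `pattern`."""
    return [r for b in bindings for t in sorted(graph)
            if (r := extend(b, pattern, t)) is not None]

sons = solve(("John", "has son", "?y"), GRAPH)
daughters = solve(("John", "has daughter", "?y"), GRAPH)

# Join of 'John has son ?y' with '?y has son ?z': sons of John's sons only.
via_sons = sorted(b["?z"] for b in solve(("?y", "has son", "?z"), GRAPH, sons))

# Union of both child patterns, then the same join: all of John's grandsons.
grandsons = sorted(
    b["?z"] for b in solve(("?y", "has son", "?z"), GRAPH, sons + daughters)
)
```

The shared variable `?y` is what turns two independent patterns into a join: a triple only counts if it agrees with the value `?y` already has.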
Extensions of SPARQL
SPARQL is not just a query language, but a comprehensive set of specifications. SPARQL Update includes operations to delete data, insert data and manipulate graphs. The SPARQL Protocol defines how to access SPARQL endpoints and how results are returned, and the language can be further extended to leverage the uniqueness of various data types.
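As a sketch of what accessing an endpoint looks like over HTTP, the Python below builds (but does not send) a query request and an update request using only the standard library. The endpoint URLs are placeholders and the function names are hypothetical. The `query` and `update` parameter names and the media types follow the SPARQL 1.1 Protocol, but check your triplestore's documentation for its actual endpoint paths.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint URLs; substitute those of your own triplestore.
QUERY_ENDPOINT = "http://example.org/sparql"
UPDATE_ENDPOINT = "http://example.org/sparql/update"

def query_request(endpoint, query):
    """Build a SPARQL query as an HTTP GET with a `query` URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

def update_request(endpoint, update):
    """Build a SPARQL Update as a form-encoded HTTP POST with an `update` field."""
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

select_req = query_request(QUERY_ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
insert_req = update_request(
    UPDATE_ENDPOINT,
    "INSERT DATA { <http://example.org/John> "
    "<http://example.org/hasSon> <http://example.org/Bob> }",
)
# Sending is left out on purpose, e.g. urllib.request.urlopen(select_req).
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers and body before anything touches a live endpoint.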
Standardized extensions include GeoSPARQL for querying geospatial data. Custom extensions supported by GraphDB include full-text search, queries against external full-text and faceting engines (Lucene, Solr, Elasticsearch), RDFRank for ordering, SPARQL MM for multimedia and others.
Why use SPARQL?
The wide variety of graph patterns that can be matched through SPARQL queries reflects the wide variety in the data that SPARQL was designed for – the data of the Semantic Web.
Whether it is by including optional values so that solutions are not rejected because some part of the pattern doesn’t match or by combining graph patterns so that one of several alternatives may match, SPARQL can be used efficiently and effectively to extract the necessary information hidden in non-uniform data stored in various formats and sources.
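The effect of optional values can be sketched as a left join in Python. This is a toy model with hypothetical helpers and hypothetical `ex:` data, in which one person has an email address and the other does not.

```python
GRAPH = {
    ("ex:Bob", "ex:name", "Bob"),
    ("ex:Bob", "ex:email", "bob@example.org"),
    ("ex:Anna", "ex:name", "Anna"),   # no email recorded for Anna
}

def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def solve(pattern, graph, bindings=({},)):
    """Join every incoming binding with every triple matching `pattern`."""
    return [r for b in bindings for t in sorted(graph)
            if (r := extend(b, pattern, t)) is not None]

def optional(bindings, pattern, graph):
    """Left join: keep a binding unchanged when the optional part finds nothing."""
    result = []
    for b in bindings:
        extended = solve(pattern, graph, [b])
        result.extend(extended or [b])
    return result

people = solve(("?p", "ex:name", "?name"), GRAPH)
required = solve(("?p", "ex:email", "?email"), GRAPH, people)   # plain join
with_email = optional(people, ("?p", "ex:email", "?email"), GRAPH)
emails = {b["?name"]: b.get("?email") for b in with_email}
```

With the plain join Anna is rejected because she has no email; with the optional pattern she stays in the results and `?email` is simply left unbound.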
As the inventor of the World Wide Web, creator and advocate of the Semantic Web and W3C Director, Sir Tim Berners-Lee, puts it:
“Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL. SPARQL makes it possible to query information from databases and other diverse sources in the wild, across the Web.”