Knowledge Path Series: 4. Query Optimization

Execute your Queries at Maximum Efficiency

Did you know that GraphDB™ provides advanced, intelligent Query Optimization? When a SPARQL query is submitted, GraphDB™ uses a statistical approach to reorder and reprioritize the sub-selects within the query before evaluating it. This keeps query evaluation fast even against very large volumes of data.

Setup

To enable Query Optimization, check the Query Optimization box under “Configuring New Repository” in GraphDB™ Workbench, as shown below.
[Image: Query1]

Explain Plan

GraphDB™ has a feature called “Explain Plan,” which explains how GraphDB™ executes SPARQL queries and reports the collection sizes of unique subjects, predicates, and objects. It helps GraphDB™ users improve query plans, leading to better execution performance.

Examples – Based on LDBC SPB

All examples below refer to the LDBC Semantic Publishing Benchmark (SPB) query mix (http://ldbc.eu). This benchmark was chosen because its domain is easy to understand:

  • articles (instances of CreativeWork)
  • topics
  • mentions (entities found in the content of the articles, such as people, organizations, locations)

Disk-based AVL Trees

The GraphDB storage component uses disk-based AVL trees to keep triples in sorted order. GraphDB keeps two such trees as indices, each containing all statements, sorted by POS or by PSO. They can return the triples for a fixed predicate, with the subject/object either bound or unbound.
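To make the two orderings concrete, here is a minimal in-memory sketch. It is not GraphDB's implementation: plain sorted Python lists stand in for the disk-based AVL trees, and the triples, the `POS`/`PSO` variables, and the `prefix_scan` helper are all hypothetical names chosen for illustration.

```python
from bisect import bisect_left

# Toy statements as (subject, predicate, object). The data is made up.
TRIPLES = [
    ("ex:Article", "rdf:type", "rdfs:Class"),
    ("ex:Topic", "rdf:type", "rdfs:Class"),
    ("ex:Article", "rdfs:label", "Article"),
    ("ex:Topic", "rdfs:label", "Topic"),
    ("ex:a1", "rdf:type", "ex:Article"),
]

# Two orderings of the same statements. Sorted lists stand in for the trees.
POS = sorted((p, o, s) for s, p, o in TRIPLES)
PSO = sorted((p, s, o) for s, p, o in TRIPLES)


def prefix_scan(index, *prefix):
    """Return every entry of a sorted index that starts with `prefix`."""
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first matching entry (if there is one).
    start = bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break  # entries are sorted, so nothing further can match
        matches.append(entry)
    return matches


# Fixed predicate, bound object: answered from the POS ordering.
classes = prefix_scan(POS, "rdf:type", "rdfs:Class")
# Fixed predicate, bound subject: answered from the PSO ordering.
topic_labels = prefix_scan(PSO, "rdfs:label", "ex:Topic")
# Fixed predicate only, subject and object both unbound.
all_types = prefix_scan(POS, "rdf:type")
```

In both orderings the predicate comes first, which is why every lookup in the sketch fixes the predicate and then optionally narrows by object (POS) or subject (PSO).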

Join strategy

GraphDB uses the indexed nested loops (INL) join strategy. For example, when the following query is executed:

SELECT * where {
  ?x rdf:type rdfs:Class.
  ?x rdfs:label ?label.
}

  • The optimizer decides on the order of the joins;
  • The first join (let’s say the optimizer chooses “?x rdf:type rdfs:Class”) is translated into a query to the POS index, which returns all triples with P=rdf:type, O=rdfs:Class;
  • Then, in a loop over this collection, ?x is bound to an item X, and a new query is sent to the PSO index, where P=rdfs:label and S=X.
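The steps above can be sketched in a few lines of Python. This is an illustration of the indexed nested loops idea only, under the assumption that dictionary lookups are an acceptable stand-in for index queries. The sample triples and the names `pos`, `pso`, and `classes_with_labels` are hypothetical.

```python
from collections import defaultdict

# Toy statements as (subject, predicate, object). The data is made up.
TRIPLES = [
    ("ex:Article", "rdf:type", "rdfs:Class"),
    ("ex:Topic", "rdf:type", "rdfs:Class"),
    ("ex:Article", "rdfs:label", "Article"),
    ("ex:Topic", "rdfs:label", "Topic"),
    ("ex:a1", "rdf:type", "ex:Article"),
]

# Dictionaries stand in for the POS and PSO indices.
pos = defaultdict(list)  # (predicate, object) -> subjects
pso = defaultdict(list)  # (predicate, subject) -> objects
for s, p, o in TRIPLES:
    pos[(p, o)].append(s)
    pso[(p, s)].append(o)


def classes_with_labels():
    """Join `?x rdf:type rdfs:Class` with `?x rdfs:label ?label`."""
    results = []
    # Outer side: a single index query with P=rdf:type, O=rdfs:Class.
    for x in pos.get(("rdf:type", "rdfs:Class"), []):
        # Inner side: one index query per binding of ?x,
        # with P=rdfs:label and S=x.
        for label in pso.get(("rdfs:label", x), []):
            results.append({"x": x, "label": label})
    return results


rows = classes_with_labels()
```

The cost of this strategy is driven by the size of the outer collection, since the inner index is queried once per outer binding. That is why the join order picked by the optimizer matters.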

Aggregation

Aggregation is done in a single pass over the result set and uses HashMaps to calculate the aggregate values. The aggregation overhead is relatively small compared to the fetch time and is done in linear time over the collection size.
A typical aggregation query is Q7:

# Retrieve the N most popular topics that creative works mention. 
# Further limiting the results above some primary content limit.
# reasoning : owl:ObjectProperty
SELECT ?mentions ((COUNT(*)) as ?count) 
WHERE {
 ?creativeWork cwork:mentions ?mentions .
 {
 SELECT ?creativeWork (count(*) as ?pcCount) {
 ?creativeWork bbc:primaryContentOf ?pc .
 }
 GROUP BY (?creativeWork)
 }
} 
GROUP BY ?mentions
ORDER BY DESC(?count)
LIMIT 10

The “execution time” is ~500ms, the fetch time is ~2700ms, and the aggregation time is <100ms.
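The single-pass, hash-based aggregation described above can be sketched as follows. This is a simplified model rather than GraphDB code: a Python dict plays the role of the HashMap, and the `top_mentions` function and the sample rows are hypothetical.

```python
def top_mentions(rows, limit=10):
    """Count rows per ?mentions value in one pass, then rank the groups."""
    counts = {}
    # Single pass over the fetched rows: one hash lookup and update per row,
    # so the aggregation is linear in the size of the collection.
    for row in rows:
        key = row["mentions"]
        counts[key] = counts.get(key, 0) + 1
    # ORDER BY DESC(?count) and LIMIT run on the (much smaller) set of groups.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up join results standing in for the rows fetched by the query.
sample_rows = [
    {"creativeWork": "cw1", "mentions": "topic:A"},
    {"creativeWork": "cw2", "mentions": "topic:A"},
    {"creativeWork": "cw3", "mentions": "topic:A"},
    {"creativeWork": "cw1", "mentions": "topic:B"},
    {"creativeWork": "cw2", "mentions": "topic:B"},
    {"creativeWork": "cw3", "mentions": "topic:C"},
]
top_two = top_mentions(sample_rows, limit=2)
```

Because each row is touched exactly once, the aggregation step stays small next to the time spent fetching the rows, which is consistent with the timings reported above.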

Usage

To obtain the explain plan for a query, add the following system FROM clause to it:

from <http://ontotext.com/explain>

or, using a prefix:

PREFIX onto:<http://ontotext.com/>
 . . .
 from onto:explain

The server returns an iterator with the explain plan result instead of the query result.
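As a small illustration of where the clause goes, the sketch below assembles the text of an explain query for the join example used earlier. It only builds a string and does not contact a server. The `with_explain` helper is a hypothetical name, and placing the FROM clause between the SELECT clause and the WHERE clause follows standard SPARQL syntax.

```python
# System graph IRI quoted in the text above.
EXPLAIN_GRAPH = "http://ontotext.com/explain"

# Standard W3C namespaces for the rdf: and rdfs: prefixes.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>"
)


def with_explain(prefixes, select_clause, where_body):
    """Build query text that asks for the explain plan (hypothetical helper)."""
    return (
        f"{prefixes}\n"
        f"{select_clause}\n"
        f"FROM <{EXPLAIN_GRAPH}>\n"
        f"WHERE {{\n{where_body}\n}}"
    )


explain_query = with_explain(
    PREFIXES,
    "SELECT *",
    "  ?x rdf:type rdfs:Class .\n  ?x rdfs:label ?label .",
)
print(explain_query)
```

The resulting text can be submitted like any other query, for example through GraphDB™ Workbench.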

For example plans and more information on GraphDB’s query optimization techniques, please visit:

http://graphdb.ontotext.com/display/GraphDB6/GraphDB-SE+Explain+Plan#GraphDB-SEExplainPlan-Exampleplans
