GraphDB Performance Benchmark Results

Adequate benchmarking of semantic graph databases is a complex exercise involving many factors. Ontotext participates in LDBC (the Linked Data Benchmark Council), an initiative that aims to establish industry cooperation between vendors of RDF and graph database technologies in developing, endorsing, and publishing reliable and insightful performance benchmark results.

The benchmark results presented here aim to provide sufficient information on how GraphDB ™ performs on important tasks (such as loading, inference and querying) with variations in size and nature of the data, inference, query types and other relevant factors. Results obtained with older versions of GraphDB are retained where no results for newer versions are available yet.

GraphDB ™ 7.2 Performance Benchmark Results

The GraphDB performance benchmark results below were obtained with GraphDB Standard Edition 7.2.0. Unless another version is named explicitly, the speed-up columns compare against version 7.1.

Task | Hardware (1) | Data size (2) (explicit triples) | Load time (sec.) | Loading speed (st./sec.) | Query performance | Query perf. measure | Load time speed up (3) | Query time speed up (3) | Comment
LDBC SPB 256M | Marconi | 231,000,000 | 17,341 | 13,321 | 63 | read queries per second | 58% | 15% | Load time includes forward-chaining and materialization. Version 7.2 is the first one that allows parallel bulk loading for this dataset. Parallel loading in v.7.2 is 58% faster than serial loading in v.7.1, and 40% faster than serial loading in v.7.2. The LDBC SPB driver is configured to use 8 clients to perform read queries, while in parallel 4 clients perform updates. GraphDB 7.1 achieves 55 read queries/sec in the same configuration.
 | | | | | 10 | updates per second | | -3% | The higher number of read queries processed results in a slight slowdown of the updates. GraphDB 7.1 achieves 10.6 updates/sec in the same configuration.
Wordnet 3.1 load | Leibniz | 5,557,709 | 76 | 73,189 | | | 125% | | Fairly expressive reasoning is performed through forward-chaining, which adds 11,526,185 new statements. Speed comparison is for bulk load speed, parallel loading in v.7.2 vs. serial loading in version 6.2. The speed up between parallel and serial mode in v.7.2 is 101%. A benchmark of BlazeGraph v. 2.1.1 (RDFS+, Fast Load mode) shows that it needs 1,542 seconds to load Wordnet with inferences on the same hardware. This renders it about 19 times slower than GraphDB.
FactForge | Turing | 712,679,341 | 17,293 | 41,212 | | | 123% | | FactForge 2016 includes DBPedia (the English version), Geonames (with owl:sameAs links to DBPedia), NOW metadata for 340,000 news articles and a few smaller datasets and ontologies. Speed comparison is for bulk load speed, parallel vs. serial loading in v.7.2.

Notes

    • The hardware configurations are as follows. Leibniz is a dual-CPU server with Xeon E5-2690 CPUs, 256 GB of RAM and an SSD storage array; overall assembly cost below $10,000. The configuration of Turing is very similar to Leibniz. Marconi is a single-CPU server with a Xeon E5-1650 CPU, 64 GB of RAM and a pair of SSD drives; overall assembly cost around $4,000.
    • In the Data size column we refer to the number of explicit statements in the repository after the initial data load. We exclude inferred statements, because they are only relevant for engines based on forward-chaining. Some tests insert additional statements if update queries are part of the query mixes; these additional statements are ignored above.
    • In summary, the new implementation of the parallel mode in the LoadRDF bulk loading tool of GraphDB 7.2 makes loading about twice as fast as anything achievable with previous versions. There is also an observable speed up in read query throughput (about 15%). This is a result of the new Global Cache implemented in GraphDB 7.2, which allows for more efficient handling of multiple read queries in parallel.
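
The derived figures reported for GraphDB 7.2 can be re-checked with simple arithmetic. The following is a minimal Python sketch, assuming that loading speed is the number of explicit statements divided by the load time and that a speed-up percentage is the relative gain over the baseline. Both definitions and the helper names `loading_speed` and `speed_up_percent` are illustrative assumptions, not part of any GraphDB tool.

```python
def loading_speed(explicit_statements: int, load_time_sec: float) -> float:
    """Statements loaded per second (assumed definition: size / time)."""
    return explicit_statements / load_time_sec


def speed_up_percent(new_rate: float, old_rate: float) -> float:
    """Relative gain of new_rate over old_rate in percent (assumed definition)."""
    return (new_rate / old_rate - 1.0) * 100.0


# Figures copied from the GraphDB 7.2 results above.
spb_256m = loading_speed(231_000_000, 17_341)   # LDBC SPB 256M on Marconi
factforge = loading_speed(712_679_341, 17_293)  # FactForge on Turing

# Read throughput: 63 queries/sec in v.7.2 vs. 55 queries/sec in v.7.1.
read_gain = speed_up_percent(63, 55)

# Wordnet load with inference: 1,542 sec (BlazeGraph 2.1.1) vs. 76 sec (GraphDB 7.2),
# expressed as the extra time relative to GraphDB's load time.
extra_time_ratio = (1_542 - 76) / 76

print(f"LDBC SPB 256M loading speed: {spb_256m:,.0f} st./sec.")
print(f"FactForge loading speed:     {factforge:,.0f} st./sec.")
print(f"Read query gain over v.7.1:  {read_gain:.1f}%")
print(f"BlazeGraph extra load time:  {extra_time_ratio:.1f}x")
```

Under these assumptions the two loading speeds round to the reported 13,321 and 41,212 st./sec., the read query gain rounds to 15%, and the BlazeGraph load takes a little over 19 times GraphDB's load time in addition, which is one way to arrive at "about 19 times slower".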

GraphDB ™ 6.1 Performance Benchmark Results

The benchmark results presented here aim to provide sufficient information on how GraphDB ™ performs important tasks (such as loading, inference and querying) with variations in size and nature of the data, inference, query types and other relevant factors. They also show the speed improvement of GraphDB ™ 6.1 over OWLIM 5.4.

Task | Hardware (1) | Data size (2) (explicit triples) | Load time (sec.) | Loading speed (st./sec.) | Query performance | Query perf. measure | Load time speed up (3) | Query time speed up (3) | Comment
UNIPROT Aug’14 load | Rolle | 12,896,017,962 | 57,240 | 225,297 | | | 353% | | Loaded in a bit less than 16 h. If data size is judged by the number of triples in the input files (which is 17 billion), the loading speed is 295,000 st./sec.
DBPedia 2014 load, English version | Leibniz | 566,076,449 | 3,147 | 179,905 | | | | | Loaded in 1 hour and 10 minutes from Turtle files
BSBM 100M Explore | Leibniz | 99,892,000 | 536 | 186,366 | 10,041 | QMPH | 241% | 67% | Query performance measured with 16 clients. Results in Query Mixes Per Hour (QMPH)
BSBM 100M Explore & Update | Leibniz | | | | 10,086 | QMPH | | 18% |
BSBM 1B Explore | Leibniz | 998,782,000 | 5,581 | 178,961 | 1,083 | QMPH | 239% | 13% |
BSBM 1B Explore & Update | Leibniz | | | | 1,278 | QMPH | | 10% |
LDBC SPB 50M | Newton | 50,124,572 | 2,045 | 24,511 | 40 | read queries per second | 10% | 38% | Load time includes forward-chaining and materialization. 10 clients perform read queries, while in parallel 2 clients perform updates
 | | | | | 31 | updates per second | | 244% |
LDBC SPB 50M | AWS c3.4xlarge | 50,124,572 | | | 31 | read queries per second | | 19% | Load time includes forward-chaining and materialization. 14 clients perform read queries, while in parallel 2 clients perform updates
 | | | | | 17 | updates per second | | 113% |
LDBC SPB 1B | Newton | 1,002,491,440 | 41,400 | 24,215 | 11 | read queries per second | 526% | -2% |
 | | | | | 10 | updates per second | | 1415% |
Wordnet load | Leibniz | 2,724,000 | 576 | 4,729 | | | | | Quite expressive reasoning is performed through forward-chaining

Notes

    • The hardware configurations are as follows. Leibniz is a dual-CPU server with Xeon E5-2690 CPUs, 256 GB of RAM and an SSD storage array; overall assembly cost below $10,000. Rolle is the same as Leibniz, but with 512 GB of RAM. Newton is very similar to Leibniz. AWS c3.4xlarge is an Amazon cloud instance type with 16 vCPUs, 55 ECU, 30 GB of RAM and SSD storage.
    • In the Data size column we refer to the number of explicit statements in the repository after the initial data load. We exclude inferred statements, because they are only relevant for engines based on forward-chaining. Some tests insert additional statements if update queries are part of the query mixes; these additional statements are ignored above. Some datasets include a substantial number of duplicate statements in the data dumps. For instance, the raw files of UNIPROT contain 17B statements, but only 12B of those are unique.
    • Load and query performance of GraphDB ™ is compared to OWLIM SE 5.4, running in the same environment. Loading in GraphDB ™ is performed using the new Load Tool.
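
The difference between counting unique statements and counting raw input statements can be illustrated with the UNIPROT figures. The following is a minimal Python sketch, assuming the raw input size is exactly the rounded 17 billion quoted above; the variable names are illustrative only.

```python
# Figures copied from the UNIPROT row and the notes above.
UNIQUE_STATEMENTS = 12_896_017_962  # explicit statements in the repository
RAW_STATEMENTS = 17_000_000_000     # statements in the input files (rounded, assumed exact here)
LOAD_TIME_SEC = 57_240              # reported load time

load_time_hours = LOAD_TIME_SEC / 3600

# Loading speed judged by what ends up in the repository.
speed_unique = UNIQUE_STATEMENTS / LOAD_TIME_SEC

# Loading speed judged by what the loader had to parse.
speed_raw = RAW_STATEMENTS / LOAD_TIME_SEC

# Share of the raw input that consists of duplicate statements.
duplicate_share = 1 - UNIQUE_STATEMENTS / RAW_STATEMENTS

print(f"Load time:              {load_time_hours:.1f} h")
print(f"Speed (unique):         {speed_unique:,.0f} st./sec.")
print(f"Speed (raw input):      {speed_raw:,.0f} st./sec.")
print(f"Duplicates in raw data: {duplicate_share:.0%}")
```

With the rounded input size the raw-input speed comes out slightly below 300,000 st./sec., in line with the roughly 295,000 st./sec. quoted in the table, and about a quarter of the raw statements are duplicates.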

Results Analysis

GraphDB ™ can load datasets of more than 10 billion statements on a single commodity database server at speeds exceeding 200,000 statements per second. In specific loading scenarios GraphDB ™ managed to load datasets at the scale of billions of triples at speeds of around 500,000 statements per second.

The loading speed of GraphDB ™ does not degrade as the volume of the data grows – for both BSBM and LDBC, the loading speeds for the 50 to 100 million statement datasets were practically the same as for the 1 billion statement datasets.
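
How closely the loading speeds match can be checked against the GraphDB 6.1 results table. The following is a minimal Python sketch that computes the relative change in loading speed between the small and the large dataset of each benchmark; the helper `relative_change_percent` is an illustrative name, not part of any benchmark tool.

```python
# Loading speeds (st./sec.) copied from the GraphDB 6.1 results table.
loading_speeds = {
    "BSBM": {"small": 186_366, "large": 178_961},     # 100M vs. 1B
    "LDBC SPB": {"small": 24_511, "large": 24_215},   # 50M vs. 1B
}


def relative_change_percent(small: float, large: float) -> float:
    """Change in loading speed from the small to the large dataset, in percent."""
    return (large - small) / small * 100.0


changes = {
    name: relative_change_percent(speeds["small"], speeds["large"])
    for name, speeds in loading_speeds.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}% loading speed on the 1B dataset")
```

For these figures the 1 billion statement datasets load about 4% (BSBM) and about 1% (LDBC SPB) slower than the smaller ones, so the loading speed stays the same to within a few percent.
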
Under the LDBC Semantic Publishing Benchmark (SPB) 50-million dataset, GraphDB ™ Standard Edition can execute 30 read queries per second, while handling more than 20 updates each second in a consistent and transactionally safe manner.

This is also the case on the Amazon AWS instance with 30 GB of RAM. LDBC SPB is a benchmark derived from BBC’s Dynamic Semantic Publishing projects. This benchmark simulates a load similar to the one experienced by GraphDB ™ serving web page generation for the BBC Sport website. Read query performance can be scaled up linearly through the cluster architecture of GraphDB ™ Enterprise.

GraphDB’s Loading Tool is much faster than any loading mechanism in OWLIM 5.4. For big datasets the speed up can be more than 5 times.

GraphDB ™ is faster on update queries – the increase in speed varies between 2 times (on SPB 50M) and 15 times (on SPB 1B).
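
The "times faster" figures above can be related to the percentages in the speed-up columns. The following is a minimal Python sketch, assuming that a speed-up of p% means the new rate is (1 + p/100) times the OWLIM 5.4 rate. This reading is an assumption, chosen because it reproduces the range of 2 to 15 times quoted for updates; `speed_up_factor` is an illustrative helper name.

```python
def speed_up_factor(speed_up_percent: float) -> float:
    """Convert a speed-up percentage into a multiplicative factor (assumed: 100% = 2x)."""
    return 1.0 + speed_up_percent / 100.0


# Update speed-up percentages copied from the GraphDB 6.1 results table.
update_speed_ups = {
    "LDBC SPB 50M (AWS c3.4xlarge)": 113,
    "LDBC SPB 50M (Newton)": 244,
    "LDBC SPB 1B (Newton)": 1415,
}

# Load time speed-up percentages copied from the same table.
load_speed_ups = {
    "UNIPROT": 353,
    "BSBM 100M": 241,
    "BSBM 1B": 239,
    "LDBC SPB 1B": 526,
}

for name, percent in {**update_speed_ups, **load_speed_ups}.items():
    print(f"{name}: {percent}% -> {speed_up_factor(percent):.2f}x")
```

Under this assumption the update speed-ups correspond to factors of roughly 2.1 to 15.2, and the largest load speed-up (LDBC SPB 1B) corresponds to a factor of about 6.3, consistent with "more than 5 times".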
