GraphDB Performance Benchmark Results

Our engineering team constantly works to achieve the best possible data loading and query answering speed. This page presents several popular benchmarks and explains how to interpret them in the context of common RDF use cases.

Adequate benchmarking and result interpretation are complex tasks that require a deep understanding of database index structures and internals. If your use case is not listed below, do not hesitate to contact us at We will do our best to publish it here.

LDBC Semantic Publishing Benchmark 2.0

LDBC (Linked Data Benchmark Council) is an industry association that aims to create TPC-like benchmarks for RDF and graph databases. It was founded by a consortium of RDF and graph database vendors (Ontotext, OpenLink, Neo Technologies, Oracle, IBM, SAP, SYSTAP).

The Semantic Publishing Benchmark (SPB) simulates the database load commonly faced by media or publishing organizations. The generated dataset is based on the BBC’s Dynamic Semantic Publishing and contains both reference data and content meta-data: for example, abstract information resources labeled “creative works” are annotated with master data organized in taxonomies. All queries follow the classical editorial process: updates (adding new meta-data or updating the reference knowledge) and aggregation queries (retrieving content according to various criteria).

Data loading

This section shows how quickly GraphDB performs an initial data load. The SPB-256 dataset represents the size of an average production database used by a media or scientific publisher. It contains around 256M explicit statements; together with the implicit statements generated by the reasoner, the total number of indexed triples is around 404M with the RDFS-Plus-optimized ruleset and 771M with OWL2-RL:

Table 1: Loading time (in minutes) of the LDBC SPB-256 reference dataset with different numbers of CPU cores

Edition | Ruleset | Explicit statements | Total indexed | Loading time (min) per core-count setting
8.1 EE/SE | OWL2-RL | 255,057,222 | 771,007,768 | 554 / 519 / 462 / 429 / 424
8.1 EE/SE | RDFS-Plus-optimized | 255,057,222 | 404,404,552 | 179 / 171 / 167 / 166 / 165
8.1 Free | RDFS-Plus-optimized | 255,057,222 | 404,404,552 | 211 / 211 / 211 / 211 / 211
Loading with the most expressive ruleset, OWL2-RL, is the slowest, but it also benefits the most from additional CPU cores: its loading time drops from 554 to 424 minutes, compared with 179 to 165 minutes for RDFS-Plus-optimized. The default GraphDB ruleset, RDFS-Plus, is much faster.
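The scaling claim can be checked with a little arithmetic on Table 1. This sketch assumes, as the text implies, that the five timing columns run from the fewest to the most cores; the exact core counts are not given, so it only compares the first and last settings.

```python
# Loading times in minutes, copied from Table 1.
# Assumption: columns are ordered from the fewest to the most CPU cores.
LOAD_MINUTES = {
    "OWL2-RL (EE/SE)": [554, 519, 462, 429, 424],
    "RDFS-Plus-optimized (EE/SE)": [179, 171, 167, 166, 165],
    "RDFS-Plus-optimized (Free)": [211, 211, 211, 211, 211],
}


def core_scaling(times: list[int]) -> tuple[float, float]:
    """Return (speedup, percent of time saved) from the first to the last setting."""
    first, last = times[0], times[-1]
    speedup = first / last
    saved_pct = 100.0 * (first - last) / first
    return speedup, saved_pct


if __name__ == "__main__":
    for name, times in LOAD_MINUTES.items():
        speedup, saved = core_scaling(times)
        print(f"{name}: {speedup:.2f}x faster, {saved:.1f}% less time")
```

OWL2-RL saves roughly 23% of its loading time across the settings, while RDFS-Plus-optimized saves under 8%, which is what "boosted to the highest degree" means in numbers.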

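GraphDB 8.1 EE and SE load faster than 8.1 Free (165 to 179 minutes versus 211 minutes) because they load data asynchronously with two threads; the Free edition's loading time stays at 211 minutes regardless of the number of cores. GraphDB 8.1 is also much faster than 8.0.6 thanks to its new indexes.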
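Converting the Table 1 times into an average loading rate makes the edition gap easier to read. The sketch below uses only the explicit statement count and the RDFS-Plus-optimized times from the table; `load_rate` is a hypothetical helper, not part of any GraphDB tool.

```python
EXPLICIT_STATEMENTS = 255_057_222  # from Table 1


def load_rate(statements: int, minutes: float) -> float:
    """Average number of statements loaded per second over the whole load."""
    return statements / (minutes * 60)


if __name__ == "__main__":
    ee_se = load_rate(EXPLICIT_STATEMENTS, 165)  # fastest EE/SE time, RDFS-Plus-optimized
    free = load_rate(EXPLICIT_STATEMENTS, 211)   # Free edition, same ruleset
    print(f"EE/SE: {ee_se:,.0f} statements/s")
    print(f"Free:  {free:,.0f} statements/s")
    print(f"EE/SE is {ee_se / free:.2f}x the Free rate")
```

At the fastest setting this works out to roughly 25,800 versus 20,100 explicit statements per second, so EE/SE loads about 28% faster.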

Production load

This test reproduces the typical editorial workflow: generating new content and meta-data, updating the taxonomies in use, and retrieving information for end users of the service. The runs compare database performance for different numbers of concurrent read and write clients.
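To make the "query mixes per second versus concurrent clients" metric concrete, here is a toy driver. It is not the LDBC SPB test driver: `read_mix` and `write_mix` are hypothetical stand-ins for one complete read or write query mix against the database, and the driver simply runs the requested number of reading and writing agents for a fixed time and reports completed mixes per second for each kind.

```python
import threading
import time
from typing import Callable


def run_mixed_load(
    readers: int,
    writers: int,
    duration_s: float,
    read_mix: Callable[[], None],
    write_mix: Callable[[], None],
) -> tuple[float, float]:
    """Run reading and writing agents concurrently; return (reads/s, writes/s)."""
    counts = {"read": 0, "write": 0}
    lock = threading.Lock()
    start = time.monotonic()
    deadline = start + duration_s

    def agent(kind: str, mix: Callable[[], None]) -> None:
        # Each agent keeps issuing complete query mixes until the deadline.
        while time.monotonic() < deadline:
            mix()
            with lock:
                counts[kind] += 1

    threads = [threading.Thread(target=agent, args=("read", read_mix)) for _ in range(readers)]
    threads += [threading.Thread(target=agent, args=("write", write_mix)) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Guard against a zero interval when no agents were started.
    elapsed = max(time.monotonic() - start, 1e-9)
    return counts["read"] / elapsed, counts["write"] / elapsed


if __name__ == "__main__":
    # Hypothetical stubs that just sleep in place of real SPARQL query mixes.
    reads, writes = run_mixed_load(8, 4, 1.0, lambda: time.sleep(0.01), lambda: time.sleep(0.02))
    print(f"{reads:.1f} read mixes/s, {writes:.1f} write mixes/s")
```

A run with zero reading agents (like Run #1) reports 0 reads/s, and a run with zero writing agents (like Run #3) reports 0 writes/s, which is how the zero cells in Table 2 arise.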

Table 2: Number of executed query mixes per second (higher is better) versus the number of concurrent clients

Agents per run: Run #1: 0 reading, 4 writing; Run #2: 12 reading, 4 writing; Run #3: 16 reading, 0 writing; Run #4: 16 reading, 4 writing; Run #5: 8 reading, 4 writing.

Instance | Price (USD/h) | Disk | #1 reads/s | #1 writes/s | #2 reads/s | #2 writes/s | #3 reads/s | #3 writes/s | #4 reads/s | #4 writes/s | #5 reads/s | #5 writes/s
c4.4xlarge | 0.796 | iop (5K IOPS) | 0 | 12.3 | 19.3 | 3.8 | 33.3 | 0 | 19.9 | 2.8 | 18.6 | 5.7
c3.4xlarge | 0.84 | local SSD | 0 | 9.5 | 43.8 | 4.9 | 69.9 | 0 | 49. | | |
i3.4xlarge | 1.248 | local NVMe | 0 | 17.2 | 71.5 | 9.6 | 90.9 | 0 | 78.8 | 7.3 | 59.4 | 11.1
marconi * | - | local SSD | 0 | 22.6 | 77.5 | 9 | 94.9 | 0 | 80.5 | 6.6 | 65.8 | 13

Notes: All runs use the same configuration, limited to a 20 GB heap. The AWS prices are for on-demand instances in the US East region (Q1 2017) and do not include EBS volume charges, which are substantial only for iop volumes.
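Because Table 2 lists both throughput and hourly price, the AWS instances can also be compared on price-performance. This sketch uses the Run #3 read throughput (16 reading agents, 0 writing agents) and, like the table, ignores EBS charges, which would make the iop-backed c4.4xlarge look even worse. The helper names are illustrative only.

```python
# (instance, on-demand USD per hour, read mixes/s in Run #3), from Table 2.
RUN3_READS = [
    ("c4.4xlarge", 0.796, 33.3),
    ("c3.4xlarge", 0.84, 69.9),
    ("i3.4xlarge", 1.248, 90.9),
]


def mixes_per_dollar(mixes_per_second: float, usd_per_hour: float) -> float:
    """Query mixes completed per dollar of instance time (EBS charges excluded)."""
    return mixes_per_second * 3600 / usd_per_hour


def rank_by_value(rows: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """Sort instances from the most to the least query mixes per dollar."""
    return sorted(rows, key=lambda r: mixes_per_dollar(r[2], r[1]), reverse=True)


if __name__ == "__main__":
    for name, price, reads in rank_by_value(RUN3_READS):
        print(f"{name}: {mixes_per_dollar(reads, price):,.0f} read mixes per USD")
```

On these numbers the i3.4xlarge has the highest absolute read throughput, while the cheaper c3.4xlarge completes the most read mixes per dollar.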

Here is an alternative hardware configuration with 32 CPU cores covering extreme scenarios:

Table 3: Number of executed query mixes per second (higher is better) versus the number of concurrent clients on a Leibniz-class server**

Reading agents | Writing agents | Reads/s | Writes/s

Berlin SPARQL Benchmark (BSBM)

BSBM is a popular benchmark that, like LDBC SPB, combines read queries with frequent updates. It covers a more generic use case, broadly defined as e-commerce, and describes the relations between products and producers, products and offers, offers and vendors, and products and reviews.
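The four relations can be pictured as a small object model. This is a simplified sketch, not the actual BSBM RDF vocabulary: the class names follow the text, while fields such as `price` and `rating` are hypothetical attributes added for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Producer:
    name: str


@dataclass(frozen=True)
class Product:
    label: str
    producer: Producer  # product -> producer


@dataclass(frozen=True)
class Vendor:
    name: str


@dataclass(frozen=True)
class Offer:
    product: Product  # offer -> product
    vendor: Vendor    # offer -> vendor
    price: float      # hypothetical attribute


@dataclass(frozen=True)
class Review:
    product: Product  # review -> product
    rating: int       # hypothetical attribute


def offers_for(product: Product, offers: list[Offer]) -> list[Offer]:
    """Follow the offer -> product relation backwards."""
    return [o for o in offers if o.product == product]


def average_rating(product: Product, reviews: list[Review]) -> Optional[float]:
    """Average rating over all reviews of a product, or None if unreviewed."""
    ratings = [r.rating for r in reviews if r.product == product]
    return sum(ratings) / len(ratings) if ratings else None
```

In the benchmark these links are RDF triples, so a typical "explore" query walks the same paths with SPARQL graph patterns.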

The tests closely represent the types of queries commonly used in enterprise data integration projects. There are two runs. The “explore” run generates requests such as “find products for a given set of generic features”, “retrieve basic information about a product for display purposes” and “get recent reviews”. The “explore and update” run mixes all read queries with information updates.
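The difference between the two runs can be sketched as two ways of building a query mix. The read descriptions come from the text above; the update descriptions and the one-update-per-two-reads ratio are hypothetical and do not reproduce the real BSBM mix definitions.

```python
# Read query descriptions from the text; update descriptions are hypothetical.
EXPLORE_QUERIES = [
    "find products for a given set of generic features",
    "retrieve basic information about a product for display purposes",
    "get recent reviews",
]
UPDATE_QUERIES = [
    "insert a new offer",        # hypothetical update
    "delete an outdated offer",  # hypothetical update
]


def build_mix(
    reads: list[str], updates: list[str] = (), reads_per_update: int = 2
) -> list[tuple[str, str]]:
    """Return one query mix as (kind, description) pairs.

    With no updates this is an "explore" mix. Otherwise one update is
    interleaved after every `reads_per_update` reads (a hypothetical ratio),
    and any updates left over are appended at the end.
    """
    mix: list[tuple[str, str]] = []
    pending = iter(updates)
    for i, query in enumerate(reads, start=1):
        mix.append(("read", query))
        if i % reads_per_update == 0:
            update = next(pending, None)
            if update is not None:
                mix.append(("update", update))
    mix.extend(("update", u) for u in pending)
    return mix


explore = build_mix(EXPLORE_QUERIES)
explore_and_update = build_mix(EXPLORE_QUERIES, UPDATE_QUERIES)
```

Both mixes contain every read query; the second one simply interleaves writes, which is why its throughput is typically lower on the same hardware.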

Table 4: Executed query mixes per hour (QMpH) on a Marconi-class server* for different numbers of concurrent clients

Threads | explore (QMpH) | explore and update (QMpH)
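BSBM reports query mixes per hour, whereas Table 2 reports query mixes per second. The conversion is a factor of 3,600, as sketched below. Note that an SPB query mix and a BSBM query mix contain different queries, so converting the units does not make the two benchmarks' numbers comparable.

```python
SECONDS_PER_HOUR = 3600


def qmph(completed_mixes: int, elapsed_seconds: float) -> float:
    """Query mixes per hour measured over a run of the given length."""
    return completed_mixes * SECONDS_PER_HOUR / elapsed_seconds


def per_second_to_per_hour(mixes_per_second: float) -> float:
    """Convert a per-second rate (as in Table 2) to a per-hour rate."""
    return mixes_per_second * SECONDS_PER_HOUR
```

For example, 120 mixes completed in a 30-minute run correspond to 240 QMpH.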

* The Marconi server class is a single-CPU server with a Xeon E5-1650 CPU, 64 GB of RAM and a pair of SSD drives; the overall assembly cost is around $4,000.

** The Leibniz server class is a dual-CPU server with Xeon E5-2690 CPUs, 256 GB of RAM and an SSD storage array; the overall assembly cost is below $10,000.
