Choosing the Right Graph Database for Your Project
Back in February we had a webinar on the topic of “Choosing the right graph database for your project”. The purpose of the webinar was to:
- Outline typical advantages of graph databases (and semantic graph databases in particular) over alternative data management approaches
- Outline the most common use cases where graph databases are utilised
- Provide details on the various features and editions of Ontotext GraphDB – one of the leading semantic graph databases
- Provide a roadmap for choosing the right GraphDB edition – from free, to single node deployment, to high-availability cluster or database-as-a-service in the Cloud – based on the project’s phase and needs
This post summarises the webinar and answers the questions we were unable to get to because of time restrictions.
The video recording and the slides from the webinar are also available. Due to the technical glitches we experienced in the beginning, the voice recording and slide sharing are missing for the first few minutes of the webinar recording.
Graph databases provide an advantage for managing data in use cases like:
- Dealing with interlinked, hierarchical or highly connected datasets (social graphs, product catalogs, classifications and taxonomies, etc.)
- Integration of heterogeneous data sources, where changes of the data model are required as new data sources are added, or as the application evolves. The “schema-late” approach of graph databases (and other NoSQL databases) is well suited for such cases and significantly reduces the overhead of data model refactoring
- Relationship-centric cases, where exploring the connections between the nodes of the graph supports data discovery and analytics. This is different from the “entity-centric” data modelling, where it is easy to obtain information about a particular entity, but exploring its relationships with other entities requires costly “joins”
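As a rough illustration of the relationship-centric case, the sketch below explores connections by walking the graph instead of joining tables. The `follows` data and the `within_hops` helper are invented for this example and are not part of any graph database API.

```python
from collections import deque

# A tiny, made-up social graph: who follows whom.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the start node is not one of its own connections
    return seen

reachable = within_hops(follows, "alice", 2)
print(reachable)
```

Each extra hop is one more step along stored edges, whereas in an entity-centric model it would typically mean one more join.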
There are generally two classes of graph databases, which share a lot of common features:
- Property graph databases, where the graph is modelled as nodes and edges, which may have various properties (key/value pairs) attached to them. Cypher is the most popular query language for property graph databases.
- Semantic graph databases (also called triplestores, or RDF databases), where the graph is modelled as triples: each triple is composed of either two nodes and an edge connecting them, or a node, an edge and a simple datatype value. SPARQL is the standard query language for semantic graph databases.
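To make the difference between the two models more concrete, here is a minimal sketch in plain Python. The names, the `http://example.org/` identifiers and the `objects` helper are assumptions made up for illustration, and no real database or query language is involved.

```python
# Property graph: nodes and edges carry key/value properties.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "acme": {"label": "Organisation", "name": "ACME"},
}
edges = [
    {"from": "alice", "to": "acme", "type": "WORKS_FOR", "since": 2014},
]

# Semantic graph: everything is a (subject, predicate, object) triple.
EX = "http://example.org/"  # hypothetical namespace
triples = {
    (EX + "alice", EX + "worksFor", EX + "acme"),  # node, edge, node
    (EX + "alice", EX + "name", "Alice"),          # node, edge, datatype value
    (EX + "acme", EX + "name", "ACME"),
}

def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

employers = objects(triples, EX + "alice", EX + "worksFor")
print(employers)
```

In the property graph the `since` value sits directly on the edge, while in the triple model every statement, including a name, is a triple of its own.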
Semantic graph databases have additional capabilities, which provide advantages over property graph databases in some use cases:
- The option to use rich, semantic data models (also called ontologies) to describe the properties of the entities and their relationships
- Ability to easily map between different data models, so that new data sources can be integrated at a lower cost
- Global identifiers (URIs) for all entities and relationships, which further lower the cost of integrating data sources and make data publishing and consumption easier (quite important for cases with Open / Linked Data sets)
- Ability to infer additional data and enrich the graph, based on pre-defined rules. A semantic graph database (triplestore) may also be described as a combination of a database management system and a rule engine, which is able to infer new data (relationships between nodes) and enrich the graph.
- Compliance with standards, which significantly reduces the risk of vendor lock-in when working with semantic graph databases and tools
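The rule-based inference mentioned above can be sketched as a tiny forward-chaining loop. This is only an illustration under simplifying assumptions: the two rules, the short predicate names (`subClassOf`, `type`) and the `infer` function are invented here and do not describe how any particular triplestore implements reasoning.

```python
def infer(triples):
    """Apply two illustrative rules repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if o != s2:
                    continue  # the two triples do not chain
                # Rule 1: subClassOf is transitive.
                if p == "subClassOf" and p2 == "subClassOf":
                    new.add((s, "subClassOf", o2))
                # Rule 2: an instance of a class is also an instance of its superclasses.
                if p == "type" and p2 == "subClassOf":
                    new.add((s, "type", o2))
        if new <= inferred:
            return inferred  # fixpoint reached
        inferred |= new

explicit = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
}
enriched = infer(explicit)
print(sorted(enriched - explicit))
```

The three explicit statements are enough for the rules to add that `rex` is also a `Mammal` and an `Animal`, which is the kind of graph enrichment described in the list.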
Graph databases are well suited for a variety of use cases, such as:
- Various types of network analysis (for social graphs, influencer identification, fraud detection, risk analysis, etc.), where insight can be derived based on the relationships between entities such as people, organisations and events
- Recommendation engines, where the relationships between entities (like people, products, or content pages) provide insight on additional entities of interest
- Master Data Management, where organizational hierarchies or product catalogs as well as the relationships between the various entities need to be efficiently modelled, and easy to adapt to changing requirements
- Content enrichment and metadata driven publishing, where unstructured data (content) needs to be enriched with tags, categories and entities of interest, so that the content becomes more discoverable and easy to re-purpose for different access channels
- Knowledge Graphs and data sharing, where interlinked datasets are shared with 3rd party applications and provide the means for powerful, semantics-driven search and discovery
- Information discovery and semantic search, where complex questions can be answered and patterns detected, as opposed to traditional and inefficient keyword based search and discovery
GraphDB by Ontotext
Ontotext provides GraphDB, one of the leading semantic graph databases (triplestore), which has been successfully applied in mission critical use cases in various industries: media & publishing, healthcare & life sciences, digital libraries and cultural heritage.
GraphDB provides various advantages for working with large scale knowledge graphs:
- High scalability, with deployments of tens of billions of triples (nodes and edges of the graph)
- Various inference profiles, including custom rules, which are able to derive new information and enrich the semantic graph with new relationships between nodes
- Full compliance with standards
- Extensions for geo-spatial querying, RDF Rank for graph analytics, experimental Blueprints/Gremlin support, as well as a plugin architecture for 3rd party extensions
- Connectors to full-text search engines like Solr and Elasticsearch, which provide the means for high-performance hybrid queries mixing graph pattern matching with full-text or faceted queries
- High-availability cluster, for improved resilience and performance of the database in mission critical scenarios
GraphDB comes in various editions and deployment options, so that customers can choose the set of capabilities best suited for their prototype or use case.
The Standard edition provides all the capabilities and performance of GraphDB for single-node deployments. It is available for on-premise deployments (Java based, deploy anywhere, also via Maven artefacts and Docker containers), as well as instantly available on the AWS cloud via the AWS Marketplace
The Free edition of GraphDB provides – as the name implies – a fully featured and free semantic graph database. It has the same features as the Standard edition, with the limitation that only 2 database queries can be executed concurrently at any point in time. GraphDB Free is ideal for experimentation, prototyping or for production deployments where the query load is not too high. It is also available for on-premise deployments or on the AWS cloud via the AWS Marketplace.
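To get a feel for what a limit of 2 concurrent queries means for an application, the following sketch simulates it with a semaphore. It is purely a client-side simulation: `run_query`, the sleep that stands in for query evaluation and the `stats` bookkeeping are assumptions for illustration and say nothing about how GraphDB enforces the limit internally.

```python
import threading
import time

MAX_CONCURRENT_QUERIES = 2  # the limit stated for the Free edition
query_slots = threading.BoundedSemaphore(MAX_CONCURRENT_QUERIES)
lock = threading.Lock()
stats = {"running": 0, "peak": 0}

def run_query(query_id):
    """Simulate one query; it waits while two queries are already running."""
    with query_slots:
        with lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.01)  # stand-in for real query evaluation
        with lock:
            stats["running"] -= 1

threads = [threading.Thread(target=run_query, args=(i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats["peak"])
```

With six simulated clients, no more than two queries are ever in flight, and the rest simply wait their turn, which is why the edition suits workloads where the query load is modest.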
The Enterprise edition of GraphDB provides a high availability (HA) cluster for mission critical deployments. It employs a standard master/worker architecture where queries are load-balanced among the worker nodes and all updates are multiplexed to all nodes. The cluster supports various topologies, including multi-master and multi-datacenter deployments.
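The master/worker behaviour described above can be pictured with a toy model: updates go to every worker, while each query is served by a single worker. The `Master` and `Worker` classes are hypothetical, and round-robin is just one assumed load-balancing strategy; the actual cluster protocol is not covered here.

```python
from itertools import cycle

class Worker:
    """A toy worker node holding a full copy of the data."""
    def __init__(self, name):
        self.name = name
        self.triples = set()

    def update(self, triple):
        self.triples.add(triple)

    def query(self, predicate):
        return sorted(t for t in self.triples if t[1] == predicate)

class Master:
    """A toy master node that routes updates and queries."""
    def __init__(self, workers):
        self.workers = workers
        self._next_worker = cycle(workers)  # assumed round-robin balancing

    def update(self, triple):
        # Updates are multiplexed to all worker nodes.
        for worker in self.workers:
            worker.update(triple)

    def query(self, predicate):
        # Each query is handled by one worker only.
        worker = next(self._next_worker)
        return worker.name, worker.query(predicate)

workers = [Worker("w1"), Worker("w2")]
master = Master(workers)
master.update(("alice", "knows", "bob"))
served_by = [master.query("knows")[0] for _ in range(4)]
print(served_by)
```

Because every worker holds the same data, any of them can answer a query, which is what allows query load to be spread and a failed node to be tolerated.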
The Database-as-a-Service version of GraphDB is available via the Self-Service Semantic Suite (S4) platform and it provides a fully-managed database in the cloud, where new databases can be instantly deployed and there’s zero administration involved. All DBA tasks such as maintenance, upgrades and backups are taken care of on behalf of developers, so that they can focus on faster prototyping and experimentation. The Database-as-a-Service provides various options based on the size of the semantic graph: from 10 million triples (free!), to 50 million, 250 million and up to 1 billion triples.
Choosing a Database for Your Project
Semantic graph databases provide a good data management solution for a variety of use cases. Ontotext GraphDB is among the leading enterprise semantic graph databases, with advantages such as:
- High Availability (HA) cluster
- Performance and scalability
- Various advanced features and extensions
- Variety of deployment options, from single node on-premise, to instant deployments on the AWS cloud, high-availability cluster for mission critical deployments, as well as a fully-managed database-as-a-service for faster experimentation and prototyping
- Developed by an established vendor and proven in high-profile use cases in various industries
There are various phases that a project undergoes when adopting a new technology such as a semantic graph database: from learning, to initial prototyping and experimentation, a full-featured pilot, up to a production deployment. Different priorities are most important in each phase: cost, ease of deployment, performance, high availability.
- During the learning phase key priorities are usually the cost (free) and instant and easy set up. The Database-as-a-Service (free up to 10M triples) provides an instantly available database in the Cloud, where developers don’t need to spend time on DBA tasks. The Free edition provides a good option for the learning phase for developers who prefer a local deployment (the Free edition is also available via the AWS Marketplace)
- During the prototyping phase the cost (free) and ease of deployment and maintenance are still among the leading priorities, but the data volume (size of the semantic graph) starts to grow. The Free edition is a good solution, since it has no limit on the size of the graphs it can manage. Additionally, the Database-as-a-Service edition provides low-cost options for bigger graphs (50/250 million triples) which may be preferred during the prototyping phase.
- When a fully-featured pilot is being developed, performance and scalability usually become key priorities. The Standard edition provides an optimal solution, since it is able to manage large volumes of data and concurrent queries. In some cases the Free edition or the Database-as-a-Service edition (250 million / 1 billion triples) may be preferred even for a full pilot.
- In mission critical production deployments, the key factors are performance and scalability, as well as high availability and resilience. The Enterprise (HA) edition of GraphDB provides the optimal solution, with various topologies that can ensure improved query performance and cluster resilience. In some production scenarios a single-node database may be sufficient, in which case the Standard edition is an option as well.
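The roadmap above can be condensed into a small lookup, sketched here in Python. The `ROADMAP` table and the `suggest_editions` function are hypothetical helpers that merely restate the guidance of this post; the triple-count cut-off uses the largest Database-as-a-Service option mentioned earlier, and real projects will weigh more factors than these.

```python
# Candidate editions per project phase, in the order discussed above.
ROADMAP = {
    "learning": ["Database-as-a-Service", "Free"],
    "prototyping": ["Free", "Database-as-a-Service"],
    "pilot": ["Standard", "Free", "Database-as-a-Service"],
    "production": ["Enterprise", "Standard"],
}
DBAAS_MAX_TRIPLES = 1_000_000_000  # largest DBaaS option named in the post

def suggest_editions(phase, triples=0, needs_high_availability=False):
    """Return candidate GraphDB editions for a phase (illustrative only)."""
    if needs_high_availability:
        return ["Enterprise"]  # the HA cluster is the Enterprise edition
    candidates = list(ROADMAP[phase])
    if triples > DBAAS_MAX_TRIPLES:
        # The graph is larger than the biggest managed option.
        candidates.remove("Database-as-a-Service") if "Database-as-a-Service" in candidates else None
    return candidates

print(suggest_editions("learning"))
print(suggest_editions("pilot", triples=5_000_000_000))
print(suggest_editions("production", needs_high_availability=True))
```

Encoding the guidance as data keeps it easy to revisit as a project moves from one phase to the next.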
Semantic graph databases provide the optimal data management solution for various use cases such as network analysis, highly interlinked data, master data management and heterogeneous data integration. Ontotext GraphDB is among the leading enterprise semantic graph databases with high scalability, advanced features and extensions and a variety of deployment options (from single node on-premise, to instant deployments on the AWS cloud, high-availability cluster for mission critical deployments, as well as a fully-managed database-as-a-service for faster experimentation and prototyping).
If you haven’t already done so, check out GraphDB and choose the edition most suited for your smart data prototypes!
Marin Dimitrov
CTO at Ontotext
As the technological captain of Ontotext, he is leading the company on the right tech route and reserving our spot on the map of the world. His sharp mind can explain complex things in a simple way, making him an invaluable resource in semantics. Marin is a frequent speaker at semantic technology conferences, open data meetups and various technology-related events.