Why are graph databases hot? Because they tell a story…

Graph databases like GraphDB™ are popular for a variety of reasons. They make it easy to import data without creating complex schemas. They store relationships extracted from unstructured data. You can combine Linked Open Data with your own data and extend your knowledge about entities such as people, places, organizations and events.

Download a free whitepaper: “The Truth About Triplestores: The Top 8 Things You Need to Know When Considering a Triplestore”

In turn, the types of queries you can perform and the intelligence returned expand. There are dozens of reasons why organizations are adopting this exciting new form of database.

One of the most important aspects of graph databases has to do with how the data is stored – in the form of relationships. These relationships tell you something about the entity. For example, “John works at Banking Corp” or “Sally lives in Nottingham”.

As you create more and more semantic links (known as triples – the atomic unit of intelligence inside a graph database), you uncover more meaning through the connections across those triples. This newfound intelligence can be used to identify unknown or non-obvious relationships and linkages between facts.

Two of the most important attributes of graph databases are inference and semantic data integration. The former allows you to create new facts from existing facts. The latter allows you to integrate many forms of data while maintaining connections back to the original sources. Keeping all of your data in sync and materializing new facts using inference are two important aspects of graph databases and semantic technology.

What is Inference?

It’s the ability to materialize new semantic facts from existing facts. For example, if we know that Fido is a dog and we know that a dog is a mammal, then we can infer Fido is a mammal.
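The Fido example can be sketched in a few lines of plain Python. This is a minimal illustration of the RDFS-style rule "if x is of type C and C is a subclass of D, then x is of type D" over hand-written triples; it is not GraphDB's actual rule language or storage model.

```python
# Illustrative triples; in a real triplestore these would be RDF statements.
facts = {
    ("Fido", "type", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
}

def infer_types(triples):
    """Apply (x type C) + (C subClassOf D) => (x type D) to a fixed point."""
    inferred = set(triples)
    changed = True
    while changed:  # iterate so chains of subClassOf also resolve
        changed = False
        new = {
            (x, "type", d)
            for (x, p1, c) in inferred if p1 == "type"
            for (c2, p2, d) in inferred if p2 == "subClassOf" and c2 == c
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

print(("Fido", "type", "Mammal") in infer_types(facts))  # True
```

A production triplestore materializes such facts at load time or answers them at query time; the fixed-point loop above is the simplest way to show the idea.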

[Diagram: GraphDB use case – Big Bucks Coffee]

How can inference help your business? Let’s use the graph database example above. A business person analyzing entities such as companies may need to know relationships that exist between different companies. Some of them may not be obvious.

In the example above, we know that Big Bucks Coffee controls a company called Global Investment, Inc. We also know that Global Investment controls a chain of coffee shops called “My Local Cafe”. As the diagram shows, data about My Local Cafe was also extracted through a text mining pipeline from a news article on the Cafe and stored inside the graph database. Because the “controls” relationship is transitive, we can infer (red dashed lines) that Big Bucks Coffee controls My Local Cafe.

Why Semantic Data Integration?

What we also see are other facts about the world that have been integrated. These facts may come from Linked Open Data. For example, we know that Big Bucks Coffee is in Seattle and Seattle is a sub region of Washington State. We know Global Investment is in West Bay and West Bay is a sub region of the Cayman Islands. And we know that the Cayman Islands are classified as an offshore zone for investment purposes.

Most importantly, using inference rules that take into account the locations of entities and the relationships between them, we can infer that there is a suspicious relationship between Big Bucks Coffee and My Local Cafe. Without connected facts and inference, you simply could not determine that all of these relationships actually exist.
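A rule like the one described can be sketched as a join over the integrated facts. The "suspicious" criterion below (control exercised through an intermediary located in an offshore zone) is an illustrative assumption matching the narrative, not an actual GraphDB rule.

```python
# Integrated facts from the example (locations from Linked Open Data).
located_in = {"Big Bucks Coffee": "Seattle", "Global Investment": "West Bay"}
sub_region_of = {"Seattle": "Washington State", "West Bay": "Cayman Islands"}
offshore_zones = {"Cayman Islands"}
controls = {
    ("Big Bucks Coffee", "Global Investment"),
    ("Global Investment", "My Local Cafe"),
}

def region_of(entity):
    """Resolve an entity's city to its region, if known."""
    return sub_region_of.get(located_in.get(entity))

def suspicious_links(control_pairs):
    """Flag (a, c) when a controls c via an intermediary in an offshore zone."""
    return {
        (a, c)
        for (a, b) in control_pairs
        for (b2, c) in control_pairs
        if b == b2 and region_of(b) in offshore_zones
    }

print(suspicious_links(controls))  # flags Big Bucks Coffee -> My Local Cafe
```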

Integration of Text and Data

Semantic data integration, when done correctly, can maintain real-time feeds from text mining pipelines into your graph database. One of the biggest challenges organizations face is extracting meaning from unstructured data. Therefore, including text mining in your semantic stack is essential if you want to analyze free-flowing text, create triples on the fly and store them inside graph databases.
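The "triples on the fly" step can be illustrated with a toy extraction pass. Real pipelines use named-entity recognition and relation extraction; the regex and sentence patterns below are purely illustrative, reusing the example sentences from earlier in the post.

```python
import re

# Toy relation patterns; a real text mining pipeline would use NER models.
PATTERN = re.compile(r"(\w[\w ]*?) (works at|lives in|controls) (\w[\w ]*)")

def extract_triples(text):
    """Turn simple 'subject predicate object' sentences into triples."""
    return [(s.strip(), p, o.strip()) for s, p, o in PATTERN.findall(text)]

text = "John works at Banking Corp. Sally lives in Nottingham"
print(extract_triples(text))
# [('John', 'works at', 'Banking Corp'), ('Sally', 'lives in', 'Nottingham')]
```

Each extracted tuple would then be converted to an RDF statement and streamed into the triplestore.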

Closely aligned with text mining is something called disambiguation or identity resolution. As you analyze text, identify entities and classify them, you will inevitably uncover names that refer to the same entity. For example, Robert Smith, RJ Smith, Bob James Smith and Bobby Smith may actually be referring to the same person. The process to disambiguate entities is covered in more detail in another post.
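The Robert Smith example can be sketched with a simple normalization heuristic. The nickname table and the "first initial plus surname" key below are illustrative assumptions, not a production identity-resolution method (which would weigh many more signals).

```python
# Illustrative nickname table; real systems use richer dictionaries and context.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rj": "robert"}

def normalize(name):
    """Reduce a name variant to a crude matching key."""
    parts = name.lower().replace(".", "").split()
    first, last = parts[0], parts[-1]
    canonical_first = NICKNAMES.get(first, first)
    return (canonical_first[0], last)  # first initial + surname

names = ["Robert Smith", "RJ Smith", "Bob James Smith", "Bobby Smith"]
print({normalize(n) for n in names})  # all four collapse to one key
```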

Optimizing the storage of facts that refer to the same entity is an important aspect of a graph database, enabling fast queries and inference.


Graph databases hold the keys to unlocking hidden meaning in your data. Because GraphDB™ is a special type of graph database, it provides you with extremely powerful qualities that other graph databases do not have. It can load, query and infer new facts simultaneously and at high speed. It has direct connections to text mining pipelines, allowing you to extract meaning from your unstructured data and create new facts in real time. It ensures that the semantic triples in GraphDB™ are kept in sync with changes to your content stores. It allows you to develop hybrid queries that combine semantic facts with full-text search within unstructured data.

Graph databases allow you to tell a story. They allow you to connect the dots. When you use this powerful type of database, true meaning is one query away.

Milena Yankova


Director Global Marketing at Ontotext
A bright lady with a PhD in Computer Science, Milena started her path in the role of a developer, passed through project management and quickly moved to product management. For her, a constant source of miracles is how technology supports and alters our behaviour, engagement and social connections.
