What is Semantic Data Integration?

Semantic Data Integration allows users to quickly design data processing jobs involving GraphDB™ and GATE (General Architecture for Text Engineering).  Users interested in “RDF-izing” their data can export the jobs as executable processes or REST services.  Part of this process involves identity resolution where users can predefine matching criteria.  The Identity Resolution Framework directly supports accessing semantic repositories through SPARQL.

The Ontotext Workbench provides users with a web interface and API to facilitate RDF database management, administration, and application development tasks. GraphDB™ Connectors include a set of adapters and configuration interfaces allowing users to connect GraphDB™ to external persistence engines.  Learn about these tools and the Web Mining Framework below.


Creating a 360 Degree View with Semantic Data Integration

Data integration is paramount in a world where complete visibility, accurate analysis and data complexity dominate the landscape. Today, organizations are searching for solutions that allow them to manage all of their data: structured, semi-structured and unstructured. Whether your graph database operates standalone or is integrated into a larger database ecosystem, you need a complete set of tools to ensure you have a synchronized 360-degree view of your data. The ability to easily create documents from files, create and export annotations, load RDF statements into GraphDB™ and merge two or more GraphDB™ databases is essential to world-class semantic solutions. Our Semantic Integration Suite makes integrating your data much easier.

Semantic Integration Tools

Our team of experts has hundreds of years of experience working with text mining and RDF data integration tools. Our customers use these tools and our services to guide them through a semantic data integration lifecycle: loading documents, processing annotations, creating RDF statements, loading those statements into semantic repositories and merging two or more repositories when needed. Users can quickly design data processing jobs for both GraphDB™ and GATE. They can export the jobs as executable processes or REST services and apply them to integrate massive amounts of data. Ontotext Semantic Data Integration allows you to rapidly RDF-ize your data.
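To make "RDF-izing" concrete, here is a minimal sketch in plain Python that turns tabular records into N-Triples statements ready to load into a semantic repository. It is an illustration only, not the product's own job designer: the `http://example.org/` namespace, the `rdfize` helper and the sample record are all assumptions made for this example.

```python
from urllib.parse import quote

# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape characters that are not allowed raw inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rdfize(record, id_field="id"):
    """Turn one flat record into a list of N-Triples lines (plain string literals)."""
    subject = "<{}resource/{}>".format(BASE, quote(str(record[id_field]), safe=""))
    triples = []
    for key, value in record.items():
        if key == id_field or value is None:
            continue  # the id becomes the subject; empty cells produce no statement
        predicate = "<{}schema/{}>".format(BASE, quote(key, safe=""))
        triples.append('{} {} "{}" .'.format(subject, predicate, escape_literal(str(value))))
    return triples


rows = [{"id": "42", "name": 'Acme "Widgets" Ltd', "city": "Sofia"}]
ntriples = [line for row in rows for line in rdfize(row)]
for line in ntriples:
    print(line)
```

Each column becomes a predicate and each cell a literal. A real job would typically map columns to terms from an existing ontology rather than minting predicates from column names.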

RDF-ize your own data with the latest GraphDB Free 7.2. Import your data and run queries super fast.

Identity Resolution Framework

In many cases, two or more RDF statements refer to the same entity. This is determined through disambiguation analysis in the text mining process. Knowing that these seemingly different entities are really the same allows users to later search for and locate all of the references efficiently, so search results and analysis are more accurate. The Identity Resolution Framework uses domain-specific, predefined matching criteria expressed in a human-friendly way based on predicate logic. Ontologies are used to represent the knowledge in GraphDB™. Direct access is provided through SPARQL.
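The idea of matching criteria built from predicates can be sketched as follows. This is a simplified stand-in, not the framework's actual rule language: the predicates `same_name` and `same_country`, the record layout and the `ex:` identifiers are hypothetical. A rule is treated as a conjunction of predicates over a pair of entities, and every pair that satisfies the rule yields an `owl:sameAs` link.

```python
from itertools import combinations


def normalize(name):
    """Lower-case, drop simple punctuation and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


# Each predicate takes two entity descriptions and returns True or False.
def same_name(a, b):
    return normalize(a["name"]) == normalize(b["name"])


def same_country(a, b):
    return a["country"] == b["country"]


# A rule is a conjunction: all predicates must hold for a pair to match.
RULE = (same_name, same_country)


def resolve(entities, rule=RULE):
    """Compare every pair of entities and emit owl:sameAs links for matches."""
    links = []
    for a, b in combinations(entities, 2):
        if all(predicate(a, b) for predicate in rule):
            links.append((a["uri"], "owl:sameAs", b["uri"]))
    return links


entities = [
    {"uri": "ex:org1", "name": "Acme Corp.", "country": "BG"},
    {"uri": "ex:org2", "name": "ACME  corp", "country": "BG"},
    {"uri": "ex:org3", "name": "Acme Corp", "country": "US"},
]
links = resolve(entities)
print(links)
```

Comparing all pairs is quadratic in the number of entities, so this only suits small inputs; larger data sets usually need some form of candidate pre-selection before the rule is evaluated.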

Use Cases for Identity Resolution

  • Data Consolidation – In identity resolution, users typically want to discover references to the same object that exist in different data sources. In essence, they want to pair these objects. This technique has two major benefits: the identities are resolved, and redundancy in the incoming data is consolidated in the graph database, where it can later be used in analysis. In other words, the resolution has far-reaching effects beyond semantic data integration.
  • Cross Document Co-Reference – Our approach allows organizations to identify variations of the same objects across different formats: textual documents, web pages, database records, ontologies and more. We create a single data view where different facts are interlinked and redundancy is removed. This allows users to easily query and use large data sets in a variety of ways. In essence, we consolidate objects, link records and allow for cross-document co-reference resolution, a very powerful capability widely used in natural language processing, ontology population and the semantic web.
  • Efficient Extraction & Aggregation  – Organizations interested in consolidating information from many systems and data sources can resolve repetitive information.  Identities can be resolved across different ontologies. Information extraction can be done efficiently from different sources.  Deciding which data is “new” and which has already been extracted needs to be carefully managed if the resulting applications are to be successful.  Users interested in aggregating details about the resolved identities can do so. This very same approach can also be applied to different objects where you want to pair together two objects that work together like a nut and bolt.
  • Industry Applications – In financial services, banks and brokerage organizations are very interested in identity resolution in support of fraud detection and anti-money laundering analysis. Media and publishing companies need to search historical archives to quickly identify when two or more references to an entity are the same. Customer service departments in the eCommerce or retail space consolidate data from various systems and need to resolve identities in the process. eDiscovery applications can link together documents where the same person is referenced in different ways. The use cases for this type of semantic integration are endless.
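The consolidation step described in the use cases above can be sketched with a small union-find structure: once pairs of identifiers are known to denote the same entity, they are grouped into clusters and the facts attached to each member are merged under one canonical identifier. The function names, the `ex:` identifiers and the sample facts are assumptions for illustration only.

```python
def cluster(pairs):
    """Group identifiers connected by 'same entity' pairs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The lexicographically smaller identifier becomes the canonical one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return groups


def consolidate(facts, pairs):
    """Re-attach every fact to its canonical identifier and drop duplicates."""
    canonical = {}
    for root, members in cluster(pairs).items():
        for member in members:
            canonical[member] = root
    merged = {}
    for subject, prop, value in facts:
        merged.setdefault(canonical.get(subject, subject), set()).add((prop, value))
    return merged


pairs = [("ex:a", "ex:b"), ("ex:b", "ex:c")]
facts = [
    ("ex:a", "name", "J. Smith"),
    ("ex:b", "name", "John Smith"),
    ("ex:c", "name", "John Smith"),
    ("ex:c", "email", "js@example.org"),
    ("ex:d", "name", "Jane Doe"),
]
merged = consolidate(facts, pairs)
for subject in sorted(merged):
    print(subject, sorted(merged[subject]))
```

Because matches are transitive here, `ex:a` and `ex:c` end up in the same cluster even though they were never compared directly, and the repeated "John Smith" fact is stored only once.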

Web Mining Framework

Many businesses want to load graph databases with information collected from the web. This could be competitive intelligence, target names, facts about places, or any other fact that you want to use in analysis and search. The Web Mining Framework is a comprehensive, efficient web intelligence and web search platform. It provides the capability to crawl, fetch, parse, extract and store heterogeneous documents from the web, transforming them into a well-structured data set. The resulting data can be used to enrich your current graph database and to power search applications.
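The parse and extract stages of such a pipeline can be sketched with Python's standard library alone. This is not the Web Mining Framework's API; the `PageExtractor` class, the `extract` helper and the sample page are hypothetical, and fetching is left out so the example runs offline. It turns one HTML document into a small structured record with a title and absolute outgoing links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects the page title and absolute link targets from one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html, base_url):
    """Parse already-fetched HTML into a structured record."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"url": base_url, "title": parser.title.strip(), "links": parser.links}


sample = (
    "<html><head><title> Acme News </title></head><body>"
    "<a href='/about'>About</a> <a href='http://example.com/x'>X</a>"
    "</body></html>"
)
record = extract(sample, "http://example.org/news/")
print(record)
```

The extracted links would feed the next crawl round, while the record itself could be converted to RDF statements and loaded into the graph database.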

GraphDB™ Connectors

GraphDB™ Connectors are a suite of adapters and configuration interfaces allowing users to connect the semantic repository to various external persistence engines. For example, you can connect external search engines like Lucene, Solr and Elasticsearch for faster co-occurrence, faceted search and navigation. Users can obtain updates from big data stores and write to external file systems for backup or data replication. Today, we support connectors to Solr, Lucene and Elasticsearch, and plans are well underway to extend this library.

Technology partners interested in offering a full suite of semantic technology should contact us. The connectors dramatically extend our platform, allowing organizations to integrate other data sources and processes. They work with our GraphDB™ notification technology and the plug-in API.

GraphDB™ Workbench

GraphDB™ Workbench is a web interface and API to facilitate RDF database management, administration and application development tasks. With a single click, users can start to define everything through this interface. The Workbench allows for easy configuration and operation of RDF databases. We support a Sesame API, the Linked Data Publishing platform from W3C, the ability to create, reconfigure and delete repositories, security management, user setup, write permissions, creating and modifying linked data sources and more. Contact us for a demo of GraphDB™ Workbench and find out how easy it is to set up and manage your RDF repositories.

Ontotext Professional Services

Our professional services staff has helped hundreds of customers apply this complete set of semantic integration tools.  Many organizations contract with us to build, deploy and maintain GraphDB™ repositories that are populated and updated using document management, annotation and text mining tools.  They consider us part of their extended team.  To learn more about our services, contact us today.
