Ontotext To Help Datathon Teams Showcase the Power of Linked Open Data


Ontotext will challenge teams of data enthusiasts to convert data from the Bulgarian Commercial Register into a Linked Open Data (LOD) format in order to demonstrate how semantic graph databases can reveal relationships and uncover hidden facts from denormalized data, for example:

  • Identify and rank the biggest groups of related companies in Bulgaria or in a specific region;
  • Board-walk: analyze networks of influence through directors that co-participate in boards of multiple companies.
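
As a rough illustration of the two example questions above, the following minimal Python sketch groups companies that are connected through shared directors and lists the directors who sit on more than one board. The company names, director names and function names are all hypothetical and are not taken from the register. In the challenge itself, queries like these would more likely be run against the graph database than in plain Python.

```python
from collections import defaultdict

# Hypothetical (company, director) pairs; not real register data.
BOARD_SEATS = [
    ("Alpha OOD", "Ivan Petrov"),
    ("Beta EOOD", "Ivan Petrov"),
    ("Beta EOOD", "Maria Ilieva"),
    ("Gamma AD", "Maria Ilieva"),
    ("Delta OOD", "Georgi Dimov"),
]


def related_groups(seats):
    """Group companies linked through shared directors, largest group first."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each board seat is an edge between a company node and a director node.
    for company, director in seats:
        union(("company", company), ("director", director))

    groups = defaultdict(set)
    for kind, name in list(parent):
        if kind == "company":
            groups[find((kind, name))].add(name)
    # Rank by group size (descending), then alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


def multi_board_directors(seats):
    """Return directors who sit on more than one board, with their companies."""
    boards = defaultdict(set)
    for company, director in seats:
        boards[director].add(company)
    return {d: sorted(c) for d, c in boards.items() if len(c) > 1}


GROUPS = related_groups(BOARD_SEATS)
INTERLOCKS = multi_board_directors(BOARD_SEATS)
```

With this sample data, `GROUPS` puts the three companies linked through the two shared directors into one group and leaves the fourth on its own, while `INTERLOCKS` contains only the directors with seats on several boards.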

Ontotext will take part in Datathon Bulgaria, the first practical data challenge in Central and Eastern Europe, with the challenge ‘Hacking the Bulgarian Commercial Register’. Ontotext will provide the teams with a subset of the Bulgarian Commercial Register covering 2008 to 2017 and will mentor them through the steps of converting the data into LOD using a simple RDF model and linking it to other open datasets.

The Bulgarian Commercial Register is administered by the Bulgarian Registry Agency and has been available online since 2008. The register contains information about all companies and legal entities in Bulgaria, including addresses, owners and managers. It is an information resource with great social value that strives to support businesses and limit corruption.

For the Datathon challenge, Ontotext, which has been pioneering the use of LOD for years, will partner with OpenCorporates – the largest open database of companies and company data in the world, with about 120M companies from 100+ countries. OpenCorporates is already Ontotext’s partner in the H2020 project euBusinessGraph that aims to create a platform for integrating, harmonizing and publishing data about European companies.

OpenCorporates’ primary goal is “to make information on companies more usable and more widely available for the public benefit, particularly to tackle the use of companies for criminal or anti-social purposes, for example corruption, money laundering and organised crime.” This is a very important task given the ever-increasing role that companies play in today’s society, with networks of legal entities spread across borders.

Ontotext’s challenge at the Datathon will show how a big set of highly complex data such as the Bulgarian Commercial Register – currently organized as a set of daily updates in XML files – can be aggregated and converted into an LOD-suitable format that is accessible, open (based on open standards and recommendations by W3C) and interconnected (showing the relationships between companies, managers, locations, regulatory and court filings).
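
To make the conversion step more concrete, here is a minimal sketch of how one XML update might be turned into RDF triples in N-Triples syntax, using only the Python standard library. The XML element names, the `uic` attribute, the `http://example.org/` namespace and the property URIs are assumptions made for illustration. The real register schema and the RDF model used in the challenge are not described here and will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical daily-update fragment; the real register schema differs.
SAMPLE_XML = """
<Update date="2017-03-01">
  <Company uic="123456789">
    <Name>Alpha OOD</Name>
    <Seat>Sofia</Seat>
    <Manager>Ivan Petrov</Manager>
  </Company>
</Update>
"""

BASE = "http://example.org/"  # placeholder namespace, not an official one


def literal(text):
    """Quote a string as a plain N-Triples literal with basic escaping."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(xml_text):
    """Emit one triple per non-empty child element of each <Company>."""
    root = ET.fromstring(xml_text)
    lines = []
    for company in root.iter("Company"):
        # The company identifier becomes part of the subject URI.
        subject = f"<{BASE}company/{company.get('uic')}>"
        for child in company:
            if child.text and child.text.strip():
                predicate = f"<{BASE}prop/{child.tag.lower()}>"
                value = literal(child.text.strip())
                lines.append(f"{subject} {predicate} {value} .")
    return lines


TRIPLES = to_ntriples(SAMPLE_XML)
```

A fuller model would probably represent managers and locations as resources with their own URIs rather than as string literals, since that is what allows them to be linked to other datasets.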

The resulting dataset will make it easy to link the data with additional open data sources such as GeoNames (geographic features worldwide), DBpedia (a structured version of Wikipedia), Wikidata, OpenCorporates and many others. Publishing the Bulgarian Commercial Register as LOD has the potential to make the data more transparent and informative for businesses, as well as easy and efficient for researchers and reporters to query, thus enhancing availability and helping fight corruption.

“When you connect data together, you get power.”

The teams ‘hacking’ the Commercial Register will be mentored by Dimitar Manov and Plamen Tarkalanov from Ontotext and Alex Angelov from OpenCorporates. One week before the event, Ontotext will supply the teams with free training videos adapted from the one-day “Semantic technology Proof-of-Concept” live training. A tips-and-tricks briefing and a hands-on session will be held before the hacking starts on site.

The Datathon will take place from March 24 to 26 and will award the teams that come up with the most precise, creative and elegant solutions to the data problems.

Get your free fully managed semantic graph database GraphDB on the Ontotext Cloud and a free local copy to start representing denormalized data in a Linked Open Data knowledge graph and begin exploring all the facets of your data.

Every participant will receive vouchers for free use of the Standard tier of GraphDB on the cloud, valid for three months after the event.
