Ontotext talks about Open Data at Data Summit Brussels

A few weeks ago I gave a presentation titled “Enabling low-cost Open Data Publication and Reuse” at the Data Summit Brussels. The presentation was based on the ongoing work in one of the EC-funded research projects that Ontotext participates in: DaPaaS, which aims to develop a platform for Open Data publishing and access.

In recent years, Open Data initiatives have been growing at a rapid pace worldwide. More and more data (mostly from government organizations) has been made available for open access. Organizations such as the Open Data Institute have made it their mission to educate government organizations and SMEs on how to publish, utilize and monetize Open Data. Open Data has significant potential to improve the transparency and quality of public services, as well as to optimize costs and drive innovation in various industry sectors. In the McKinsey report titled “Open data: Unlocking innovation and performance with liquid information”, the analysts wrote:

“Our research suggests that seven sectors alone could generate more than $3 trillion a year in additional value as a result of open data, which is already giving rise to hundreds of entrepreneurial businesses and helping established companies to segment markets, define new products and services, and improve the efficiency and effectiveness of operations”

At the same time, there are some challenges to the wider adoption of Open Data, and the quality of data is a significant one. Many organizations are obligated to open up their data, but lack the expertise, tooling, resources or sustainability plans needed to make this data useful, to improve its quality and to maintain it with regular updates and enhancements. Most of the Open Data available these days is really just plain CSV files of questionable quality, which are difficult to access and use. While some organizations will point to the hundreds of thousands of open datasets available as proof of the success of the Open Data movement, very few have taken a more critical look at the quality, usage statistics, and value that most of these datasets provide. Adoption is further limited because many organizations lack the expertise, resources and commitment to make data available as live data services and APIs that are easily accessible to third-party applications.

DaPaaS is an EC-funded research project that aims to make it easier to publish and reuse Open Data. The partners in the project are Ontotext (Bulgaria), SINTEF (Norway), Swirrl (UK), Open Data Institute (UK), and Sirma Mobile (Bulgaria), as well as an associated partner from South Korea: Saltlux.

The DaPaaS project has chosen the Linked Data paradigm as a way to publish and consume Open Data, so that the data can be better described, interlinked and queried in ways that are not possible with the traditional approaches of CSV files or very simple Web APIs providing access to Open Data. The key building blocks of the DaPaaS platform include:

  • Grafter, an open source suite of tools and a DSL for data cleaning and transformation. Grafter can easily transform from one tabular format to another, or from a tabular format to RDF. It is designed for stream-like processing, so that even very large datasets can be processed efficiently. A key feature of Grafter is that the data transformation and cleanup workflows can be easily packaged as REST services, and that the transformations are repeatable and reusable over the same dataset in the long term.
  • Grafterizer, another open source tool providing an IDE for the Grafter suite, so that developers can easily create data transformation and cleanup workflows.
  • A scalable RDF database-as-a-service (DBaaS), based on Ontotext’s enterprise-grade GraphDB, which makes it possible to instantly deploy a large number of live data services (RDF databases and SPARQL endpoints) over the open datasets that were cleaned up and RDF-ized with Grafter and Grafterizer.
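
To make the tabular-to-RDF step more concrete, here is a minimal Python sketch of the general idea. It does not use Grafter and is not its API; the `http://example.org/` namespace, the function names, the URI scheme and the sample rows are all assumptions made up for illustration. It reads CSV rows one at a time and emits one N-Triples statement per non-empty cell, treating every value as a plain string literal.

```python
import csv
import io
from typing import Iterable, Iterator
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires inside a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(lines: Iterable[str], id_column: str) -> Iterator[str]:
    """Lazily convert CSV lines into N-Triples statements, one row at a time."""
    reader = csv.DictReader(lines)
    for row in reader:
        # The subject URI is built from the identifier column of the row.
        subject = f"<{BASE}resource/{quote(row[id_column].strip(), safe='')}>"
        for column, value in row.items():
            # Skip the id column, surplus cells without a header, and empty cells.
            if column is None or column == id_column or not value or not value.strip():
                continue
            predicate = f"<{BASE}property/{quote(column.strip(), safe='')}>"
            yield f'{subject} {predicate} "{escape_literal(value.strip())}" .'


# Made-up sample data; the second row has an empty "population" cell.
sample = io.StringIO("id,name,population\n1,Sofia,1236000\n2,Oslo,\n")
triples = list(rows_to_ntriples(sample, "id"))
for triple in triples:
    print(triple)
```

Because `rows_to_ntriples` is a generator, rows are converted one at a time and the input never has to fit in memory, which mirrors the stream-like processing described for Grafter above. A real pipeline would also map columns to established vocabularies and typed literals rather than generic string properties.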


The DaPaaS Open Data platform will soon be open to the general public. More information is available via Twitter and email. For details on my presentation, see the slide deck from my talk on SlideShare.

Marin Dimitrov

CTO at Ontotext
As the technological captain of Ontotext, Marin is leading the company on the right tech route and reserving our spot on the map of the world. His sharp mind can explain complex things in a simple way, making him an invaluable resource in semantics. Marin is a frequent speaker at semantic technology conferences, open data meetups and other technology-related events.
