Ontotext talks about Open Data at Data Summit Brussels

A few weeks ago I gave a presentation titled “Enabling low-cost Open Data Publication and Reuse” at the Data Summit Brussels. The presentation was based on ongoing work in DaPaaS, one of the EC-funded research projects that Ontotext participates in, which aims to develop a platform for publishing and accessing Open Data.

In recent years, Open Data initiatives have grown rapidly worldwide. More and more data, mostly from government organizations, has been made openly accessible. Organizations such as the Open Data Institute have made it their mission to educate government organizations and SMEs on how to publish, use and monetize Open Data. Open Data has significant potential to improve the transparency and quality of public services, as well as to optimize costs and foster innovation in various industry sectors. In the McKinsey report “Open data: Unlocking innovation and performance with liquid information”, analysts wrote:

“Our research suggests that seven sectors alone could generate more than $3 trillion a year in additional value as a result of open data, which is already giving rise to hundreds of entrepreneurial businesses and helping established companies to segment markets, define new products and services, and improve the efficiency and effectiveness of operations”

At the same time, several challenges stand in the way of wider adoption of Open Data, and data quality is a significant one. Many organizations are obliged to open up their data but lack the expertise, tooling, resources or sustainability plans needed to make this data useful, improve its quality and maintain it with regular updates and enhancements. Most of the Open Data available today is just plain CSV files of questionable quality, which are difficult to access and use. While some organizations cite the hundreds of thousands of available open datasets as proof of the Open Data movement's success, very few have taken a critical look at the quality, usage statistics and value that most of these datasets provide. Adoption is further limited by many organizations' lack of expertise, resources and commitment to make data available as live data services and APIs that third-party applications can easily access.

DaPaaS is an EC-funded research project that aims to make it easier to publish and reuse Open Data. The project partners are Ontotext (Bulgaria), SINTEF (Norway), Swirrl (UK), the Open Data Institute (UK) and Sirma Mobile (Bulgaria), as well as an associated partner from South Korea, Saltlux.

The DaPaaS project has chosen the Linked Data paradigm for publishing and consuming Open Data, so that the data can be described, interlinked and queried in ways that are not possible with the traditional approaches of CSV files or very simple Web APIs. The key building blocks of the DaPaaS platform include:

  • Grafter, an open source suite of tools and a DSL for data cleaning and transformation. Grafter can easily transform data from one tabular format to another, or from a tabular format to RDF. It is designed for stream-like processing, so even very large datasets can be processed efficiently. A key feature of Grafter is that data transformation and cleanup workflows can easily be packaged as REST services, and that the transformations are repeatable and can be reused on the same dataset over the long term.
  • Grafterizer, another open source tool, which provides an IDE for the Grafter suite so that developers can easily create data transformation and cleanup workflows.
  • A scalable RDF database-as-a-service (DBaaS), based on Ontotext’s enterprise-grade GraphDB, which makes it possible to instantly deploy a large number of live data services (RDF databases and SPARQL endpoints) over open datasets that have been cleaned up and RDF-ized with Grafter and Grafterizer.
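
Grafter pipelines are written in Clojure, so the following is not Grafter's API. It is a minimal plain-Python sketch of the same idea, under stated assumptions: CSV rows are read one at a time from a stream, cleaned (headers and values trimmed, empty cells dropped) and mapped to RDF triples in N-Triples syntax. The `example.org` namespaces, the `id` column, the naive "one column, one predicate" mapping and all function names are hypothetical choices made only for illustration.

```python
import csv
import io
import re
from typing import Iterable, Iterator
from urllib.parse import quote

# Hypothetical namespaces, used only for this illustration.
BASE = "http://example.org/dataset/"
VOCAB = "http://example.org/def/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
# Lexical form of xsd:decimal: optional sign, digits, optional fraction.
DECIMAL_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def clean_row(row: dict) -> dict:
    """Cleanup step: trim headers and values, drop empty or malformed cells."""
    return {
        k.strip(): v.strip()
        for k, v in row.items()
        if isinstance(k, str) and isinstance(v, str) and v.strip()
    }


def row_to_triples(row_id: str, row: dict) -> Iterator[str]:
    """Map one cleaned row to N-Triples lines, one per remaining column."""
    subject = f"<{BASE}row/{quote(row_id)}>"
    for column, value in row.items():
        predicate = f"<{VOCAB}{quote(column.lower().replace(' ', '_'))}>"
        if DECIMAL_RE.fullmatch(value):
            obj = f'"{value}"^^<{XSD_DECIMAL}>'  # numeric cells become typed literals
        else:
            obj = f'"{escape_literal(value)}"'   # everything else is a plain string
        yield f"{subject} {predicate} {obj} ."


def csv_to_ntriples(lines: Iterable[str], id_column: str) -> Iterator[str]:
    """Stream CSV lines to N-Triples without loading the whole file."""
    for row in csv.DictReader(lines):
        cleaned = clean_row(row)
        row_id = cleaned.pop(id_column, None)
        if row_id is None:
            continue  # skip rows that have no identifier
        yield from row_to_triples(row_id, cleaned)


if __name__ == "__main__":
    sample = io.StringIO("id, name ,budget\n1, Sofia ,1200.50\n2,Oslo,\n")
    for triple in csv_to_ntriples(sample, "id"):
        print(triple)
```

Run on the small sample, this prints three triples: a name and a typed budget for row 1, and only a name for row 2, because its empty budget cell is dropped during cleanup. Since `csv_to_ntriples` is a generator, only one row is held in memory at a time, which is what makes stream-like processing of large files feasible. The resulting N-Triples could then be loaded into an RDF store such as GraphDB and queried through a SPARQL endpoint.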


The DaPaaS Open Data platform will soon be open to the general public. More information is available via Twitter and email. For details on my presentation, view the slide deck from my talk on SlideShare.

Marin Dimitrov

CTO at Ontotext
As the technological captain of Ontotext, he leads the company on the right tech route and reserves our spot on the map of the world. His sharp mind can explain complex things in a simple way, making him an invaluable resource in semantics. Marin is a frequent speaker at semantic technology conferences, open data meetups and other technology-related events.