Open Data Sources for Empowering Smart Analytics


Finding Open Data sources is a walk in the park: a simple search leads to hundreds of pages of datasets. Governments, NGOs and other organizations keep aggregating and publishing Open Data, and more and more businesses and developers use data analytics to gain insights, predict trends and make data-driven decisions.


Why Open Data?

In recent years Open Data has opened the door to easier and more efficient ways of finding and analyzing huge datasets. Yet the true value of Open Data sources lies in finding ways and tools to explore, analyze and reuse all that content so as to minimize effort and maximize returns. Once an organization has decided which open datasets it will use, semantic technology can help it enrich and classify entities with Linked Open Data, identify relationships between entities and concepts, and disambiguate one entity from another. Linking Open Data is increasingly becoming a means for organizations to stay ahead of competitors.

Still, the first step in analyzing Open Data is to choose the most reliable dataset sources available.

Government Open Data Sources

Our search for Open Data sources begins with government data. The US Government’s Open Data portal has 195,000-plus Federal and local datasets on topics ranging from agriculture and education to finance and climate.
However, datasets sometimes need more visuals and a more user-friendly experience to become easier to search and more time- and cost-efficient to use.
In early April 2016 the MIT Media Lab, in cooperation with Deloitte and Datawheel, launched the Data USA website, a “visualization engine of public US Government data that tells stories about America”, as the developers put it. Users can search within any of four categories – locations, industries, occupations or education – from the search box. For example, type Boston, Pittsburgh or Spokane in the search field and the website shows an aerial photo of the city with its population, median household income and median age as the main statistics. Links below lead to five further categories for each city: economy, demographics, education, housing & living, and health & safety. The website also features profiles on cross-cutting topics, such as ‘Most Common Universities for Computer Science’ or ‘Gender Pay Gap in Connecticut’.

In Europe, we find the UK government’s website, which aggregates Open Data from government bodies and agencies on topics such as environment, society, towns, and business & economy.

Europe-wide, we have the European Data Portal developed by the European Commission with the support of a Capgemini-led consortium, including INTRASOFT International, Fraunhofer Fokus, con terra, Sogeti, the Open Data Institute, Time.Lex and the University of Southampton. This source has nearly 430,000 datasets tagged under various categories, including environment, economy & finance, and education, culture & sport.

Open Data from Global Organizations

At the supranational level, the World Health Organization and UNICEF provide open datasets with statistics on hunger, diseases, deaths, and children’s and women’s health. The World Bank provides free and open access to data about development in countries around the globe, featuring economy & growth, health, education, environment and climate change. So does the OECD data portal.
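For developers, portals like these are often reachable programmatically as well as through the browser. The snippet below is a minimal sketch of that idea for the World Bank's public REST API. It assumes the v2 endpoint layout, uses the indicator code SP.POP.TOTL (total population) purely as an illustration, and expects the JSON reply to be a two-element list of paging metadata followed by the records. The function names are hypothetical helpers, not part of any official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the World Bank's public v2 REST API.
BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(country, indicator, start_year, end_year):
    """Build the request URL for one indicator, one country and a year range."""
    params = {"format": "json", "date": f"{start_year}:{end_year}"}
    # Keep ':' unescaped so the year range stays readable in the URL.
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{urlencode(params, safe=':')}"


def parse_indicator_response(payload):
    """Turn the assumed [metadata, records] reply into a {year: value} dict.

    Records with no reported value are skipped; an error reply (a list with
    a single message object) yields an empty dict.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return {}
    return {
        int(record["date"]): record["value"]
        for record in payload[1]
        if record.get("value") is not None
    }


def fetch_indicator(country, indicator, start_year, end_year):
    """Download and parse one indicator series (requires network access)."""
    url = build_indicator_url(country, indicator, start_year, end_year)
    with urlopen(url, timeout=30) as response:
        return parse_indicator_response(json.load(response))
```

Calling fetch_indicator("US", "SP.POP.TOTL", 2010, 2015) would then return a dictionary keyed by year, ready to be charted or joined with other sources. Keeping URL building and parsing separate from the download makes both easy to check without network access.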

If you are not sure where to begin, OpenDataSoft has compiled a list of more than 2,500 Open Data portals by country.

Googling Open Data Sources

Still, browsing all these portals one at a time is sometimes tedious and always time- and resource-consuming. Google Public Data – though not as comprehensive as the separate statistics websites – has aggregated some of the most popular and reliable official sources and key economic and health indicators from across the world. For example, a quick search for wages in the US in the ‘Metrics’ menu shows the ‘Compensation of Employees’ report by the U.S. Bureau of Economic Analysis, with charts and comparisons by region or by industry.

Google Trends gives info on the search habits, traffic and interest over time on searches, with historical data dating back to 2004. It also contains infographics on the search interest in global trending topics such as the US Presidential Elections, the Panama Papers, the Brussels attacks or the Zika virus.

Crowdsourcing for Open Data

Users and developers are not only browsing for data; they are also actively contributing to open databases, with DBpedia and GeoNames being the most notable examples.
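Crowdsourced Linked Open Data of this kind can be queried with SPARQL. The following is a minimal sketch that looks up resources labelled "Boston" in DBpedia together with their population. It assumes DBpedia's public endpoint at https://dbpedia.org/sparql, the DBpedia ontology property dbo:populationTotal, and that the endpoint accepts a format parameter for choosing JSON output. The helper names are hypothetical, and the result layout follows the W3C SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public SPARQL endpoint of DBpedia.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: resources with the English label "Boston" and a population.
CITY_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?city ?population WHERE {
  ?city rdfs:label "Boston"@en ;
        dbo:populationTotal ?population .
}
LIMIT 5
"""


def build_sparql_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


def flatten_bindings(results):
    """Reduce SPARQL JSON results to a list of {variable: value} dicts."""
    rows = []
    for binding in results.get("results", {}).get("bindings", []):
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def run_query(query):
    """Send the query to the endpoint and return flat rows (requires network access)."""
    with urlopen(build_sparql_url(query), timeout=30) as response:
        return flatten_bindings(json.load(response))
```

Calling run_query(CITY_QUERY) would return one row per matching resource, each with the resource URI and its population as strings. Because every result is identified by a URI rather than a bare name, rows like these can be linked to other datasets that reference the same entity.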

Corporate Data as Open Data Sources

Businesses and organizations may be increasingly using Open Data sources to support decisions, but they are reluctant to publish their proprietary data beyond statutory filings. If you want some basic company information, browsing the websites of the US Securities and Exchange Commission (SEC) or the UK’s Financial Conduct Authority (FCA), to name just two, is a dull and often unproductive task. OpenCorporates, a company based at the Open Data Institute (ODI), holds basic data on almost 100,000,000 companies around the world. OpenCorporates has also designed visualizations using several sources: filings to the SEC, banking data held by the National Information Center of the Federal Reserve System in the US, and information about individual shareholders published by the official New Zealand corporate registry. The visualizations show all of the thousands of subsidiaries, in all countries, of BP, Bank of America, Citigroup, Goldman Sachs, Morgan Stanley, JP Morgan and Wells Fargo.

Another organization, Berlin-based OpenOil, has collected more than 1 million corporate filings related to the oil, gas and mining industries. It has indexed the full text of contracts, company disclosures, news articles and government reports, which allows users to simultaneously check documents from different sources.

Gaining Insights from Open Data

The number of open datasets is only set to grow, and so is the need for organizations to have tools that rapidly analyze data in order to gain the upper hand in a fiercely competitive environment. Linked Open Data and semantic technology help organizations boost data analytics by building ranking reports, viewing implicitly linked topics, drawing trend lines, and extending analytics with additional data sources.

Apart from generating economic and social value, Open Data creates new business models and opportunities. More and more organizations are embracing smart analytics, and will continue to do so, to create additional value for their stakeholders, users and customers.

Milena Yankova


Director Global Marketing at Ontotext
A bright lady with a PhD in Computer Science, Milena started out as a developer, passed through project management and quickly moved on to product management. For her, a constant source of miracles is how technology supports and alters our behaviour, engagement and social connections.
  • Fariz Darari

    Wikidata is also a nice example of crowdsourced, open data source. The license is even CC0 🙂
