Semantic Web Mining for Recipes
Edamam is a US start-up that worked with Ontotext to develop an extensive knowledge base about food and cooking. Today, the Edamam platform provides an authoritative, semantically rich source of cooking information, along with end-user interfaces for access to millions of recipes.
Like other forms of unstructured content, recipes and information about food are best discovered when semantic technology is part of the ingredients. Edamam’s goal was to create a comprehensive food knowledge base that would become an authoritative source of cooking information. They wanted to present this information in an attractive interface that lets users search by keywords and food classifications. With millions of recipes in play, users also needed faceted search to refine the results.
To provide the most comprehensive information, Edamam faced the challenge of transforming all of the organic and implied knowledge about food into structured data. With data scattered across the web, Edamam first needed web crawling and mining technology to identify and extract the recipes. Then they needed to semantically analyze the data by identifying entities, parsing the data, extracting the core information and classifying the results. Since many recipes are published more than once, they needed to detect when two or more recipes were the same and eliminate the duplicates. Finally, Edamam had to make all of this information accessible and update it on a regular basis.
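The duplicate-elimination step can be illustrated with a minimal sketch. It assumes that two recipes count as the same when their ingredient lists share most of their words, measured with Jaccard similarity. The source does not say how Edamam detects duplicates, so the function names, the record layout and the 0.8 threshold are all hypothetical.

```python
from __future__ import annotations

import re


def normalize(text: str) -> frozenset[str]:
    """Lower-case the text and return its set of alphabetic tokens."""
    return frozenset(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(recipes: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep the first recipe of every group of near-duplicates.

    The threshold is an assumed value chosen for illustration only.
    """
    kept: list[tuple[frozenset[str], dict]] = []
    for recipe in recipes:
        tokens = normalize(" ".join(recipe["ingredients"]))
        # Keep the recipe only if it is dissimilar to everything kept so far.
        if all(jaccard(tokens, seen) < threshold for seen, _ in kept):
            kept.append((tokens, recipe))
    return [recipe for _, recipe in kept]


recipes = [
    {"title": "Pancakes", "ingredients": ["2 cups flour", "1 egg", "milk"]},
    {"title": "Easy Pancakes", "ingredients": ["Milk", "1 egg", "2 cups flour"]},
    {"title": "Omelette", "ingredients": ["3 eggs", "butter", "salt"]},
]
unique = deduplicate(recipes)
```

This pairwise comparison is quadratic in the number of recipes, so a collection of millions would need an indexing or hashing scheme in front of it.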
Edamam used a blend of Ontotext technology focused on web mining, text analysis, ontologies, GraphDB and search to solve the problem. The Ontotext Web Mining Framework crawled sites and extracted recipes. Edamam adapted the crawlers to cover more and more sites over time. Once the data was identified, extracted and classified, a link to the original site and full credit were provided.
Over time the Edamam food ontology (used to classify everything) grew to include recipes, ingredients, nutrition information, measures, allergies, and more. Based on the semantic facts stored in GraphDB, Edamam applied inferencing to derive more data, including cooking time, dietary restrictions (e.g. Vegan, Vegetarian, Kosher), recipe classifications, recipe complexity, nutrition information per serving and the degree to which the recipe contributes to a balanced diet. Over 30 different classes of information and detailed attributes about the recipe are part of the Edamam knowledge base.
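As an illustration of deriving dietary labels from stored facts, here is a minimal rule-based sketch. In the actual solution this kind of derivation is done by inferencing over the ontology in GraphDB. The ingredient categories, the exclusion rules and the function name below are hypothetical stand-ins, not Edamam's ontology.

```python
# Hypothetical ingredient-to-category facts (stand-in for ontology classes).
INGREDIENT_CATEGORY = {
    "chicken": "meat",
    "tuna": "fish",
    "egg": "egg",
    "milk": "dairy",
    "onion": "vegetable",
    "rice": "grain",
    "olive oil": "oil",
}

# Hypothetical rules: a diet label applies when no excluded category occurs.
EXCLUDED_CATEGORIES = {
    "Vegetarian": {"meat", "fish"},
    "Vegan": {"meat", "fish", "egg", "dairy"},
}


def infer_diets(ingredients: list) -> set:
    """Return the diet labels that can be derived for a list of ingredients.

    Unknown ingredients block every label, so nothing is inferred from
    missing knowledge.
    """
    if any(name not in INGREDIENT_CATEGORY for name in ingredients):
        return set()
    categories = {INGREDIENT_CATEGORY[name] for name in ingredients}
    return {
        diet
        for diet, banned in EXCLUDED_CATEGORIES.items()
        if not (categories & banned)
    }


labels = infer_diets(["onion", "rice", "olive oil"])
```

Returning no labels for an unknown ingredient is a deliberately conservative choice: a wrong "Vegan" tag is worse than a missing one.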
The solution factored in many domain-specific facts and “pragmatics” that allowed data to be transformed semantically. For example, conversion from a measure (e.g. a cup) to the weight of the product depends on the state of the ingredient: minced onions weigh more than chopped onions. Certain measures depend on the ingredients themselves: “a pouch of dry onion soup” has a different weight than a “pouch of flavor fresh tuna.” In addition, Edamam was able to map imprecise phrases such as “to taste”, “dash of” and “top it up” to default measures.
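The state-dependent conversion described above can be sketched as a lookup keyed by ingredient, state and unit. The gram values and default quantities below are made-up placeholders chosen only so that minced onion comes out heavier than chopped onion. They are not Edamam's data, and the function names are hypothetical.

```python
# Placeholder weights in grams per unit, keyed by (ingredient, state, unit).
GRAMS_PER_UNIT = {
    ("onion", "chopped", "cup"): 160.0,
    ("onion", "minced", "cup"): 180.0,
}

# Placeholder defaults for imprecise phrases: phrase -> (quantity, unit).
DEFAULT_MEASURES = {
    "to taste": (0.25, "teaspoon"),
    "dash of": (0.125, "teaspoon"),
}


def to_grams(quantity: float, unit: str, ingredient: str, state: str) -> float:
    """Convert a volume measure to grams for an ingredient in a given state."""
    key = (ingredient, state, unit)
    if key not in GRAMS_PER_UNIT:
        raise KeyError(f"no conversion known for {key}")
    return quantity * GRAMS_PER_UNIT[key]


def resolve_phrase(phrase: str) -> tuple:
    """Map an imprecise phrase to its default (quantity, unit) pair."""
    return DEFAULT_MEASURES[phrase.strip().lower()]


minced = to_grams(2, "cup", "onion", "minced")
chopped = to_grams(2, "cup", "onion", "chopped")
```

Keying the table on the ingredient's state as well as its name is what lets the same unit resolve to different weights.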
This comprehensive knowledge base of food and ingredients was enriched and transformed. It now makes up the Edamam database, which is stored in GraphDB and updated continuously.
Here is a small example of specific measures and their relationships to one another that are part of the Edamam solution. It’s this type of knowledge base and classification system that drives search across millions of recipes. Users see highly relevant results that have been semantically enriched with information relevant to each individual search.
This information can be searched instantly, and new facts can be inferred in real time. GraphDB exposes a SPARQL endpoint and integrates full-text search based on Lucene.
Enriched Data Mined Using Text Analysis from Ontotext
After extracting the parts of a recipe, Edamam used text analysis and semantic annotation techniques provided by the Ontotext Semantic Platform to map ingredients, cooking techniques and tools to industry databases. These include the US Department of Agriculture’s Standard Reference, which lists some 9000 ingredients with full nutrition information for over 140 nutrients. A very comprehensive food description thesaurus was also integrated. These data sets allow Edamam to compute precise nutritional information and to filter by various dietary restrictions. The Edamam database is also mapped to available Linked Open Data, such as DBpedia and Freebase.
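Once each ingredient is mapped to a reference entry and its quantity is known in grams, nutrition per serving is a weighted sum. The sketch below assumes a per-100 g nutrient table. The numbers are round placeholders rather than values from the USDA Standard Reference, and the names are hypothetical.

```python
# Placeholder nutrient values per 100 g (NOT taken from any real database).
NUTRIENTS_PER_100G = {
    "rice": {"energy_kcal": 130.0, "protein_g": 3.0},
    "chicken": {"energy_kcal": 160.0, "protein_g": 30.0},
}


def nutrition_per_serving(ingredient_grams: dict, servings: int) -> dict:
    """Sum nutrients over all ingredients, then divide by the serving count."""
    if servings <= 0:
        raise ValueError("servings must be positive")
    totals = {}
    for name, grams in ingredient_grams.items():
        for nutrient, per_100g in NUTRIENTS_PER_100G[name].items():
            # Scale the per-100 g value by the actual weight used.
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100.0
    return {nutrient: value / servings for nutrient, value in totals.items()}


per_serving = nutrition_per_serving({"rice": 200, "chicken": 100}, servings=2)
```

The same loop extends to any number of nutrients, since each mapped ingredient simply contributes one row of the table.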
Edamam in Action
This solution is more than a knowledge base of recipes. The Edamam vision includes using this platform to build recipe and healthy-eating applications, shopping applications, cooking robots and smart fridges. The initial release of the project includes two consumer applications.
The smartphone application for iPhone and Android was developed by Ontotext’s sibling company Sirma Mobile. The first screen below shows a recipe view, with the user further refining (restricting) the result set by selecting the computed criterion “Balanced Diet”. The second screen shows detailed nutrition information:
The recipe detail screen shows instructions, ingredient list, dietary classifications, total energy, a bar of the fundamental nutrients, and detailed nutrition information:
The user interface provides efficient full-text search, ranking by various criteria, and filtering by dietary restrictions and other recipe classifications.
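The combination of text search and facet filtering can be sketched in memory as follows. This is only an illustration: plain substring matching stands in for the Lucene full-text search, and the record layout and function names are hypothetical.

```python
from collections import Counter


def filter_recipes(recipes: list, query: str = "", required_diets=()) -> list:
    """Return recipes whose title contains the query and that carry every
    required diet label."""
    needle = query.lower()
    required = set(required_diets)
    return [
        recipe
        for recipe in recipes
        if needle in recipe["title"].lower() and required <= recipe["diets"]
    ]


def facet_counts(recipes: list) -> Counter:
    """Count how many recipes carry each diet label, for display as facets."""
    return Counter(diet for recipe in recipes for diet in recipe["diets"])


catalog = [
    {"title": "Lentil Soup", "diets": {"Vegan", "Vegetarian"}},
    {"title": "Cheese Soup", "diets": {"Vegetarian"}},
    {"title": "Chicken Soup", "diets": set()},
]
hits = filter_recipes(catalog, query="soup", required_diets={"Vegetarian"})
facets = facet_counts(hits)
```

Computing the facet counts on the filtered hits rather than on the whole catalog is what lets each refinement show how many results the next restriction would leave.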