
Computational Gastronomy

Monsoon 2020 – Week 03

• RECIPE DATA
We shall now investigate the relationship among cuisines, recipes, ingredients and
ingredient categories.

A ‘cuisine’ comprises a set of recipes assessed to be of similar geo-cultural origin.

An ‘ingredient category’ represents a set of ingredients assessed to be of similar nature.

The following figure demonstrates the relationship among these entities.

• RECIPE DATA STRUCTURE


A recipe could, therefore, most conveniently be represented as follows:
Recipe-ID: A delimited list of Ingredient-IDs
Here, for the moment, we are conveniently ignoring the other relevant details of a recipe,
such as ingredient quantity, preprocessing, order of appearance, cooking process applied
and such.

For example, a cuisine with 100 recipes may be represented as—


Recipe-01 : Ingredient-01, Ingredient-02, Ingredient-15, Ingredient-19, Ingredient-06
Recipe-02 : Ingredient-06, Ingredient-12, Ingredient-08, Ingredient-15
.
.
.
Recipe-100 : Ingredient-06, Ingredient-11, Ingredient-15, Ingredient-13
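
The delimited representation above can be read into a simple mapping from Recipe-ID to its list of Ingredient-IDs. The following is a minimal sketch, assuming a colon separates the Recipe-ID from the list and commas separate the Ingredient-IDs. The function name `parse_recipe_line` and the variable `cuisine` are illustrative, and only the three recipes actually listed above are included.

```python
def parse_recipe_line(line):
    """Split 'Recipe-ID : ing, ing, ...' into (recipe_id, [ingredient_ids])."""
    recipe_id, _, ingredient_part = line.partition(":")
    ingredients = [item.strip() for item in ingredient_part.split(",") if item.strip()]
    return recipe_id.strip(), ingredients


# The three recipes shown explicitly in the text (the elided ones are omitted).
lines = [
    "Recipe-01 : Ingredient-01, Ingredient-02, Ingredient-15, Ingredient-19, Ingredient-06",
    "Recipe-02 : Ingredient-06, Ingredient-12, Ingredient-08, Ingredient-15",
    "Recipe-100 : Ingredient-06, Ingredient-11, Ingredient-15, Ingredient-13",
]

# Map each Recipe-ID to its ordered list of Ingredient-IDs.
cuisine = dict(parse_recipe_line(line) for line in lines)

print(cuisine["Recipe-02"])
```

A dictionary keeps lookups by Recipe-ID cheap. A list of tuples would serve equally well if the order of recipes matters more than lookup.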

A more structured way of representing these recipes is a paired list with one
‘Recipe-ID, Ingredient-ID’ pair per line.

The above set of recipes therefore could be represented as—

Recipe-01 — Ingredient-01
Recipe-01 — Ingredient-02
Recipe-01 — Ingredient-15
Recipe-01 — Ingredient-19
Recipe-01 — Ingredient-06

Recipe-02 — Ingredient-06
Recipe-02 — Ingredient-12
Recipe-02 — Ingredient-08
Recipe-02 — Ingredient-15
.
.
.
Recipe-100 — Ingredient-06
Recipe-100 — Ingredient-11
Recipe-100 — Ingredient-15
Recipe-100 — Ingredient-13
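
Converting between the two representations is mechanical. The sketch below flattens a recipe-to-ingredients mapping into the paired list and regroups the pairs to confirm that nothing is lost. The variable names (`cuisine`, `pairs`, `regrouped`) are illustrative, and only the three recipes listed in the text are included.

```python
from collections import defaultdict

# Recipe-ID -> list of Ingredient-IDs (only the recipes shown in the text).
cuisine = {
    "Recipe-01": ["Ingredient-01", "Ingredient-02", "Ingredient-15",
                  "Ingredient-19", "Ingredient-06"],
    "Recipe-02": ["Ingredient-06", "Ingredient-12", "Ingredient-08", "Ingredient-15"],
    "Recipe-100": ["Ingredient-06", "Ingredient-11", "Ingredient-15", "Ingredient-13"],
}

# Flatten into one (Recipe-ID, Ingredient-ID) pair per row.
pairs = [(recipe_id, ingredient)
         for recipe_id, ingredients in cuisine.items()
         for ingredient in ingredients]

# Regroup the pairs to recover the original mapping.
regrouped = defaultdict(list)
for recipe_id, ingredient in pairs:
    regrouped[recipe_id].append(ingredient)

print(len(pairs))  # 5 + 4 + 4 = 13 pairs
```

The paired form corresponds to a two-column table, which is convenient for grouping and counting by either column.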

• RECIPE DATA STATISTICS


The following are some of the interesting statistics that can be generated using this
representation of the cuisine: recipe size distribution, ingredient popularity statistics,
category composition and frequent itemset mining.
• Recipe Size Distribution:

The size of a recipe (s) is defined as the number of ingredients in it. The recipe size
distribution is observed to follow a normal distribution.
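
As an illustration of the definition, the recipe size distribution can be computed directly from the paired list. This is a minimal sketch: the function names `recipe_sizes` and `size_distribution` are illustrative, the data are the three recipes listed earlier, and the result is expressed as a percentage of recipes, as in the figure below.

```python
from collections import Counter, defaultdict

# (Recipe-ID, Ingredient-ID) pairs for the three recipes listed earlier.
pairs = [
    ("Recipe-01", "Ingredient-01"), ("Recipe-01", "Ingredient-02"),
    ("Recipe-01", "Ingredient-15"), ("Recipe-01", "Ingredient-19"),
    ("Recipe-01", "Ingredient-06"),
    ("Recipe-02", "Ingredient-06"), ("Recipe-02", "Ingredient-12"),
    ("Recipe-02", "Ingredient-08"), ("Recipe-02", "Ingredient-15"),
    ("Recipe-100", "Ingredient-06"), ("Recipe-100", "Ingredient-11"),
    ("Recipe-100", "Ingredient-15"), ("Recipe-100", "Ingredient-13"),
]


def recipe_sizes(pairs):
    """Return {recipe_id: s}, where s counts the distinct ingredients."""
    ingredients_by_recipe = defaultdict(set)
    for recipe_id, ingredient in pairs:
        ingredients_by_recipe[recipe_id].add(ingredient)
    return {rid: len(ings) for rid, ings in ingredients_by_recipe.items()}


def size_distribution(pairs):
    """Return {s: percentage of recipes having size s}, ordered by s."""
    sizes = recipe_sizes(pairs)
    counts = Counter(sizes.values())
    total = len(sizes)
    return {s: 100.0 * n / total for s, n in sorted(counts.items())}


sizes = recipe_sizes(pairs)
distribution = size_distribution(pairs)
print(sizes)
print(distribution)
```

With only three recipes the result is not a meaningful distribution. The same functions applied to a full cuisine would give the kind of curve shown in the figure.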

Recipe size distribution for world cuisines from 22 regions. The number of recipes of a given size
in a cuisine is normalized and shown as ‘Percentage of recipes’. The inset shows the
distribution for all the recipes across the world regions. Adapted from Singh and Bagler,
‘Data-driven investigations of culinary patterns in traditional recipes across the world’, 2018
IEEE 34th International Conference on Data Engineering Workshops, Paris.
Every recipe that is part of a cuisine is a cultural legacy that has been transmitted over
generations. A recipe using a single ingredient (s=1) is not considered meaningful here,
since for the purpose of the present analysis we focus on recipes that have survived due to
the ‘magic’ of ingredient combinations.

Intuitively, smaller recipes are easier to transmit. However, such recipes may also be
considered ‘too simple’ to become an iconic cultural legacy. On the other hand, while complex
recipes that combine a large number of ingredients potently represent cultural nuances,
they would be rarer, both because the subtleties of the cooking protocol are difficult to
pass on and because they are harder to reproduce. With such forces and constraints shaping
recipe sizes, it is understandable that cuisines across the world show a normal
distribution with a typical recipe size of s=10.

• Ingredient Popularity Statistics:


The popularity of an ingredient in a cuisine is defined as the number/fraction of recipes
that it is used in.
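
The definition above translates into a count over recipes followed by a ranking. The sketch below is one way to do this. The function name `ingredient_popularity` is illustrative, ties are broken alphabetically by Ingredient-ID (an arbitrary choice), and the data are the three recipes listed earlier.

```python
from collections import Counter

# Recipe-ID -> list of Ingredient-IDs (only the recipes shown in the text).
cuisine = {
    "Recipe-01": ["Ingredient-01", "Ingredient-02", "Ingredient-15",
                  "Ingredient-19", "Ingredient-06"],
    "Recipe-02": ["Ingredient-06", "Ingredient-12", "Ingredient-08", "Ingredient-15"],
    "Recipe-100": ["Ingredient-06", "Ingredient-11", "Ingredient-15", "Ingredient-13"],
}


def ingredient_popularity(cuisine):
    """Return [(ingredient, recipe_count, fraction_of_recipes)] ranked by count."""
    # set() ensures an ingredient is counted at most once per recipe.
    counts = Counter(ing for ings in cuisine.values() for ing in set(ings))
    n_recipes = len(cuisine)
    # Sort by descending count; ties are broken by Ingredient-ID.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ing, count, count / n_recipes) for ing, count in ordered]


ranked = ingredient_popularity(cuisine)
for rank, (ingredient, count, fraction) in enumerate(ranked, start=1):
    print(rank, ingredient, count, round(fraction, 2))
```

Plotting the fraction against the rank on logarithmic axes gives a frequency-rank plot of the kind described in the figure below.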

Frequency-Rank statistics of cuisines (also known as Ingredient Popularity Statistics) for world
cuisines from 22 regions. The normalized frequency follows a power-law distribution, suggesting
skewed popularity in which a few ingredients are used far more than the rest. The inset shows
the statistics for all the recipes across the world regions. Adapted from Singh and Bagler,
‘Data-driven investigations of culinary patterns in traditional recipes across the world’, 2018
IEEE 34th International Conference on Data Engineering Workshops, Paris.

By design, the Frequency-Rank statistics form a monotonically non-increasing function of
rank, and would be flat only if every ingredient were used equally often. However, it is
interesting that these statistics consistently follow a power law across the world
cuisines. This is indicative of underlying mechanistic forces that shape recipe
composition, leading to a few ingredients dominating a cuisine. While the exact
ingredients that top the popularity chart may vary from cuisine to cuisine, each cuisine
follows the power-law distribution.
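
A power law of the form f(r) ∝ r^(-α) appears as a straight line of slope -α on log-log axes. As a rough diagnostic, and not necessarily the procedure used in the cited study, the exponent can be estimated with an ordinary least-squares fit of log-frequency against log-rank. The function name `loglog_slope` is illustrative, and the frequencies below are synthetic values generated from an exact power law purely to check the arithmetic.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    # Rank 1 is the most frequent ingredient; zero frequencies are dropped.
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic frequencies following f(r) = 1000 / r**1.2 (not real recipe data).
synthetic = [1000.0 / rank ** 1.2 for rank in range(1, 51)]
alpha = -loglog_slope(synthetic)
print(round(alpha, 3))
```

A straight-line fit on log-log axes is only suggestive. Establishing that real data follow a power law requires more careful statistical testing than this sketch provides.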

• Category Composition:
Category composition analysis is a coarse-grained view of recipe structure that probes the
presence/dominance of ingredients from each category.

The heatmap of the Category Composition matrix for world cuisines from 22 regions. The data
is normalized within each cuisine to represent the fraction of all ingredient instances that are
from a given category. Adapted from Singh and Bagler, ‘Data-driven investigations of culinary
patterns in traditional recipes across the world’, 2018 IEEE 34th International Conference on
Data Engineering Workshops, Paris.

The category composition matrix provides insights into the dominant ingredient
categories (spice, dairy, vegetable etc.) that dictate the recipes of a cuisine as well as
similarities/differences across cuisines by way of their ingredient use.
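
A category composition matrix of this kind can be sketched as follows. Each row is a cuisine, each column is a category, and each entry is the fraction of that cuisine's ingredient instances that fall in the category. The two cuisines, the ingredient-to-category mapping `category_of` and the function name `category_composition` are all hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter

# Hypothetical ingredient -> category mapping (not taken from the source).
category_of = {
    "Ingredient-A": "spice",
    "Ingredient-B": "dairy",
    "Ingredient-C": "vegetable",
    "Ingredient-D": "spice",
}

# Two hypothetical cuisines, each a mapping of Recipe-ID -> Ingredient-IDs.
cuisines = {
    "Cuisine-1": {"R1": ["Ingredient-A", "Ingredient-B", "Ingredient-C"],
                  "R2": ["Ingredient-A", "Ingredient-D"]},
    "Cuisine-2": {"R1": ["Ingredient-B", "Ingredient-C"],
                  "R2": ["Ingredient-B", "Ingredient-A"]},
}


def category_composition(cuisine, category_of):
    """Fraction of all ingredient instances in a cuisine from each category."""
    counts = Counter(category_of[ing]
                     for ingredients in cuisine.values()
                     for ing in ingredients)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Build the matrix: one row per cuisine, one column per category.
categories = sorted(set(category_of.values()))
matrix = {}
for name, cuisine in cuisines.items():
    composition = category_composition(cuisine, category_of)
    matrix[name] = [composition.get(category, 0.0) for category in categories]

print(categories)
for name, row in matrix.items():
    print(name, row)
```

Because each row is normalized within its own cuisine, rows sum to 1 and cuisines with different numbers of recipes can be compared directly.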

• Frequent Itemset Mining:


The pattern among the ingredient combinations that dominate recipe composition is
another key feature, one that resonates well with the popular notion of food pairing. Every
cuisine is known to have a culinary intuition about ingredients that go well with each
other. Some ingredient pairs (combinations) are preferred and used more frequently among
the recipes than others.

Frequent itemset mining captures this notion to identify ingredient combinations (pairs,
triads and higher order sets) that are frequently used across the recipes.
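
The idea can be illustrated with a brute-force count of ingredient pairs and triads. This is a deliberately simple sketch rather than an efficient algorithm such as Apriori or FP-growth, which would be preferable for large datasets. The function name `frequent_itemsets`, the support threshold and the size limit are assumptions made for illustration, and the data are the three recipes listed earlier.

```python
from collections import Counter
from itertools import combinations

# Ingredient lists of the three recipes shown in the text.
recipes = [
    ["Ingredient-01", "Ingredient-02", "Ingredient-15", "Ingredient-19", "Ingredient-06"],
    ["Ingredient-06", "Ingredient-12", "Ingredient-08", "Ingredient-15"],
    ["Ingredient-06", "Ingredient-11", "Ingredient-15", "Ingredient-13"],
]


def frequent_itemsets(recipes, min_support, max_size=3):
    """Return {itemset: support} for itemsets of size 2..max_size.

    Support is the fraction of recipes that contain every ingredient
    in the itemset. Itemsets are sorted tuples of Ingredient-IDs.
    """
    n_recipes = len(recipes)
    counts = Counter()
    for ingredients in recipes:
        unique = sorted(set(ingredients))  # sorted so each itemset has one form
        for size in range(2, max_size + 1):
            counts.update(combinations(unique, size))
    return {itemset: count / n_recipes
            for itemset, count in counts.items()
            if count / n_recipes >= min_support}


frequent = frequent_itemsets(recipes, min_support=0.5)
print(frequent)
```

In these three recipes only Ingredient-06 and Ingredient-15 co-occur more than once, so they form the single frequent pair. Enumerating all combinations grows quickly with recipe size, which is why practical tools prune candidates instead of counting everything.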
