DOI: 10.1109/CONECCT50063.2020.9198450


Ingredient/Recipe Algorithm using Web Mining and Web Scraping for Smart Chef


Shilpa Chaudhari, Aparna R., Vinay G Tekkur, Pavan G L., and Shreekanth R Karki
Department of Computer Science and Engineering
M. S. Ramaiah Institute of Technology, Bangalore-560054
Email: shilpasc29@msrit.edu, aparna@msrit.edu, gtvinay058@gmail.com, pavangl21ms@gmail.com, shrikantkarki2@gmail.com

Abstract: Due to busy lifestyles and irregular food habits, many of us have to eat diet food according to ingredient parameters. It is difficult to ask a chef to make a variety of recipes as per the diet plan. Many recipes exist on a number of recipe websites, tempting the chef to prepare them in their kitchen without knowing the diet ingredient proportions and side effects. In addition, listing all recipes containing a specific ingredient is still not an easy job. This paper proposes an algorithm for extracting all the recipes' details using web scraping and for searching recipes containing an intended ingredient using Python and MongoDB. Web scraping retrieves the contents of a web page using the Python Scrapy library and forms a database of them in MongoDB. This database is used for further research on a smart chef application for healthy diet dishes in various varieties. A web and mobile app is developed for smart chef to maintain a healthy life with a variety of dishes.

Keywords: web scraping, web extracting, Smart chef

I. INTRODUCTION
Maintaining a healthy lifestyle is an art managed through cooking. Stepping into the kitchen brings many recipes to one's mind according to the availability of ingredients or according to the diet plan given by the medical practitioner. Chefs search for recipes on the internet to prepare a variety of dishes in their kitchen that are appreciated by customers and family members. During the search they come across various types of recipe-related websites, which include various ingredients and directions for preparation. If they have to follow a diet plan to maintain health, it is difficult for them to search for recipes that include a specific ingredient. A few works exist on extracting information from various websites using web scraping. However, to the best of our knowledge from the literature review, its application to ingredient searching for smart chef has not yet been proposed.
Web scraping techniques extract information from websites automatically by parsing hypertext tags and retrieving the plain-text information embedded in them from large amounts of data on the web. Whether you are a data scientist, an engineer, or anyone else who analyzes large datasets, the ability to scrape data from the web is a useful skill to possess. If you find data on the web and there is no direct way to download it, web scraping using Python can extract the data into a useful form that can be imported. Web scraping techniques are elaborated into four main processes. (1) Creating scraping template: inserting a scraping template by defining HTML documents from the website whose information is collected. (2) Exploring site navigation: making a website navigation exploration system for the websites whose information is collected. (3) Automating navigation and extraction: based on processes 1 and 2, the collection of data and information from the websites is automated. (4) Extracting data and package history: the information acquired from process 3 is saved in tables and a database.
In this paper, an algorithm is proposed to extract ingredient-based recipes or recipes from websites. The proposed ingredient detection algorithm provides information about the ingredients associated with various recipes. Using this information, the chef will get a complete idea of whether or not to use a particular ingredient in the diet plan. The final product is a web application that interacts with the user through four main features: recipe search, ingredient search, recipe nutrients, and master chef. The paper is organized in five sections including this introduction. Section II reviews related work. Section III explains the proposed web scraping algorithm for smart chef. The results obtained are discussed in Section IV. Finally, Section V concludes the paper based on the proposed algorithm and the results obtained.

II. RELATED WORKS
This section discusses information extraction using web mining/scraping in general, as there is no specific work related to web mining for smart chef in the existing literature.
The authors of [1] discuss how to extract information using web mining/scraping, which can be useful for a smart chef application that extracts ingredients from recipes on various websites. Automatic data extraction parses the HTML of web pages with specially coded programs and converts the data into another format. Web scraping analyzes web data and stores it in structured form in a central database or spreadsheet.
A cloud-based scraping architecture for unstructured data acquisition from the web is proposed in [2] using Python libraries such as BeautifulSoup, Scrapy, Selenium, the web driver API, and HTMLParser. Selenium and the web driver API are selected there for automating web page data extraction. A virtual machine instance on the Elastic Compute Cloud (EC2) of Amazon Web Services is used to implement this cloud-based scraping architecture.
A web crawler, or web spider, usually crawls a website starting from the first page and identifies the links in each page. These links are stored in a data structure that is used to open the respective web pages, and the process repeats recursively until all links are crawled. A data extractor then extracts the useful information and converts it into the needed format [3].
The authors of [4] present a web scraping technique for collecting historical tweets within any date range, bypassing Twitter API restrictions. Hypertext tags are used to retrieve plain-text information. The approach scrapes the Twitter Search endpoint and customizes query fields to extend the search capabilities, collecting the tweet history within a specified date range using Scrapy, an open-source framework written in Python for extracting data from websites.
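The crawl loop described for [3] (start at the first page, record the links found on each page, and keep opening them until every link has been visited) can be illustrated with a minimal sketch. The version below is only an assumption-laden illustration, not the crawler of [3] or of this paper: the names `crawl`, `get_links`, and `SITE` are hypothetical, and the "website" is an in-memory dictionary so that the example runs without network access.

```python
from collections import deque


def crawl(start_url, get_links):
    """Visit every page reachable from start_url exactly once, breadth-first."""
    seen = {start_url}          # links already discovered
    queue = deque([start_url])  # links waiting to be opened
    order = []                  # pages in the order they were visited
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory site: each page maps to the links it contains.
SITE = {
    "/": ["/recipes", "/about"],
    "/recipes": ["/recipes/soup", "/recipes/salad", "/"],
    "/recipes/soup": ["/recipes"],
    "/recipes/salad": [],
    "/about": ["/"],
}

visited = crawl("/", lambda url: SITE.get(url, []))
print(visited)
```

The `seen` set is what stops the loop on sites whose pages link back to each other; in a real crawler `get_links` would download and parse each page instead of reading a dictionary.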

978-1-7281-6828-9/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on September 22,2020 at 05:24:10 UTC from IEEE Xplore. Restrictions apply.
Web scraping with Naïve Bayes classification is used for a job search engine in four processes [5]. (1) Creating scraping template: inserting a scraping template by defining HTML documents from the website whose information is collected. (2) Exploring site navigation: making a website navigation exploration system for the websites whose information is collected. (3) Automating navigation and extraction: based on processes 1 and 2, the collection of data and information from the websites is automated. (4) Extracting data and package history: the information acquired from process 3 is saved in tables and a database.
The web scraping technology used in [6] collects real-time weather data from various websites and provides updated weather data online to form a weather dataset.
NewsOne, an aggregation system for news [7], is a platform that uses a web scraping/crawling method to extract content from various news websites. It aggregates the latest news updates from multiple national and international sources and summarizes them in short, crisp words. It provides service-oriented interaction among users across the web.
An adaptive automatic web scraping system that uses a novel classification-based approach extracts information from web pages consisting of repetitive blocks, where each block represents a product-offer object and contains its attributes such as offer-title, offer-description, and offer-expiry [8].
Commonly used structure-based web scraping tools need to be manually reconfigured as soon as the structure of a web page changes. An automatic web scraping system that is adaptive to structural changes is given in [9] for extracting information from web pages consisting of repetitive blocks.
The web scraping framework of [10] offers an easy and feasible approach to parsing and extracting data on a large scale from multiple websites with minimal human intervention, for harvesting learning objects for an eLearning application.
It is concluded that web scraping is a very efficient technique for extracting data from different websites, which can then be used for various purposes such as web mining, data mining, and weather data monitoring. Web scraping is inexpensive, easy to implement, and low-maintenance, and it offers high speed and accuracy, so it can be used in our smart chef application to extract ingredients from various recipe websites.

III. PROPOSED WEB SCRAPING FOR SMART CHEF
This paper deals with web scraping of recipes and ingredients to obtain recipes based on a specific ingredient. This helps the chef to work with an ingredient of interest, to choose recipes based on ingredients, or to maintain a diet plan in terms of the ingredients used in recipes. Initially, only a few websites are considered. From their recipes, three parameters are retrieved: the name of the recipe, the ingredients used in the recipe, and the link to access that recipe. The information obtained is stored in a MongoDB database. Web scraping is used to convert web information into a specific storable format. Using the ingredient information, the chef can prepare the diet plan for customers, or recommend whether or not to use a particular ingredient in a recipe based on the customer's wishes or health conditions.
The functionality of the proposed work includes scraping recipe websites, storing the extracted recipes in a database, retrieving data from the recipe database, and implementing the smart chef application.
Scraping a recipe website refers to extracting recipe-related information, such as the recipe name, the ingredients used, and the URL of the recipe, from recipe websites. Storing the extracted recipes in the MongoDB database stores three fields from the web scraping results, namely recipe name, ingredients, and recipe URL. The stored database is then used by smart chef to retrieve recipe data. The smart chef algorithm takes two data items, namely recipe name and ingredients, as input in the form of a list and provides a list of recipes for the mentioned ingredient.
The main blocks of the algorithm are designed to realize the efficiency and flexibility of the ingredient/recipe collection algorithm for smart chef, as shown in Figure 1. In this paper, PyCharm and MongoDB Compass are used. Python is used with the Django framework, and web scraping or web crawling is performed with the Python package Scrapy. The results are stored using MongoDB.

[Figure 1 flowchart blocks: Start; web scraping of recipe sites; store results in recipe database; ingredient input; recipe/ingredient algorithm; recipe results; success check (Yes/No); End]

Figure 1: Blocks of Web Scraping for Smart chef

Scraping recipe websites: Scraping refers to extracting recipe-related information, such as the recipe name, the ingredients used, and the URL of the recipe, from recipe websites. The recipe website www.foodnetwork.com is scraped in the proposed algorithm. Three fields, namely the name of the recipe, the ingredients used in the recipe, and the link for a complete reference to the recipe, are accessed from this website. The Python package Scrapy, which allows us to convert website content into a specific format, is used to implement such a platform.
Store result into database: The scraped results, in the format of recipe name, ingredients of the recipe, and URL of the recipe, are stored in the MongoDB database automatically for further execution of the algorithm. The database created contains three fields, namely recipe_name, ingredients, and recipe_url.
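As a rough illustration of how a scraped page could be turned into a record with the three fields recipe_name, ingredients, and recipe_url, the sketch below parses a small HTML snippet. It is not the paper's Scrapy spider: it swaps in Python's standard-library `html.parser` so that it runs without extra packages, and the markup, the CSS class names `recipe-title` and `ingredient`, the example URL, and the helper names `RecipeParser` and `scrape_recipe` are all hypothetical.

```python
from html.parser import HTMLParser


class RecipeParser(HTMLParser):
    """Collects a recipe title and its ingredient lines from one HTML page."""

    def __init__(self):
        super().__init__()
        self.recipe_name = ""
        self.ingredients = []
        self._field = None  # field that the current text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        # Assumed markup: <h1 class="recipe-title"> and <li class="ingredient">.
        if tag == "h1" and css_class == "recipe-title":
            self._field = "name"
        elif tag == "li" and css_class == "ingredient":
            self._field = "ingredient"

    def handle_endtag(self, tag):
        if tag in ("h1", "li"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._field == "name":
            self.recipe_name = text
        elif self._field == "ingredient":
            self.ingredients.append(text)


def scrape_recipe(html, url):
    """Return one record shaped like the three database fields."""
    parser = RecipeParser()
    parser.feed(html)
    return {
        "recipe_name": parser.recipe_name,
        "ingredients": parser.ingredients,
        "recipe_url": url,
    }


# Hypothetical page content; a real recipe site will use different markup.
SAMPLE_HTML = """
<html><body>
<h1 class="recipe-title">Tomato Soup</h1>
<ul>
<li class="ingredient">2 cups tomato puree</li>
<li class="ingredient">1 tsp salt</li>
</ul>
</body></html>
"""

record = scrape_recipe(SAMPLE_HTML, "https://example.com/recipes/tomato-soup")
print(record)
```

A dictionary of this shape is what a MongoDB client such as pymongo accepts as a document, for example through `insert_one` on a collection, so storing the record would be a single additional call once a database connection exists.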
Getting data from database: This step gives the recipe/ingredient algorithm access to the database it requires.
The recipe algorithm takes two data sets, namely recipe names and the ingredients used, in the form of a list and provides a list of recipes for the mentioned ingredient. The algorithm supports retrieval of four different kinds of search results from the database in the smart chef application: normal recipe search, recipe search based on a particular ingredient, search based on nutritional details, and smart chef recipes based on a diet plan.
An interactive interface is implemented in the proposed algorithm, in which the user can select an ingredient of his/her choice and obtain a list of recipes along with information about those recipes or the URL of each recipe. The user enters the name of the ingredient of interest, and the results are returned in the form of a list of recipes. The various objects involved in this communication, with their attributes and methods, are given in Figure 2.

Figure 2: Objects involved in Web Scraping for Smart chef

The communication between the various objects involved is shown in Figure 3. The recipe algorithm is given in Algorithm 1, wherein List_of_wanted_keys consists of the names of all recipes in which the searched ingredient exists.

Figure 3: Objects Communication of Web Scraping for Smart chef

Algorithm 1:
Input: Recipe names and ingredients from the database.
Output: List of recipes based on an ingredient to be searched.
Access the database
my_dictionary: Store all recipe names and their ingredients in one dictionary.
ingredient_name: Take ingredient input from user.
list_keys: Store dictionary keys in one list.
for i in list_keys do
    List_of_values = my_dictionary[i]
    for j in List_of_values do
        Nested_list = split j with the default delimiter
        if ingredient_name in Nested_list then
            Append i to the List_of_wanted_keys list
            Break from loop
Print the List_of_wanted_keys to get the output.

IV. DISCUSSION OF RESULT
This section discusses the results obtained after web scraping of the recipe websites and the web app results for ingredient-based recipe listing for smart chef based on a diet plan. Web scraping for the recipe application is developed using HTML parsing in the Python programming language on the Anaconda platform running on Windows OS. The script is created using the Scrapy Python library, and the site used for scraping is www.foodnetwork.com.
The web app home page initially shows the top 30 recipes from various websites. The feature of searching recipes based on ingredients is provided here. The user can search for recipes by several ingredients, entered as a list of comma-separated values, to indicate multiple ingredients in one recipe. Here the comma acts as the delimiter for ingredients.
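The ingredient search of Algorithm 1, extended to the comma-separated input described above, might look like the following minimal Python sketch. It rests on several assumptions that the paper does not state: the function name `find_recipes` and the sample data are hypothetical, matching is case-insensitive, and a recipe is returned only when every queried ingredient appears as a whole word in its ingredient lines.

```python
def find_recipes(recipes, query):
    """Return names of recipes whose ingredient lines mention every queried ingredient.

    recipes: dict mapping recipe name -> list of ingredient strings
    query:   ingredient names separated by commas, e.g. "tomato, salt"
    """
    # The comma is the delimiter between queried ingredients.
    wanted = [term.strip().lower() for term in query.split(",") if term.strip()]
    matches = []
    for name, ingredient_lines in recipes.items():
        # As in Algorithm 1, split each ingredient line on the default
        # delimiter (whitespace) and test membership word by word.
        words = {word.lower() for line in ingredient_lines for word in line.split()}
        # Assumption: all queried ingredients must be present in the recipe.
        if wanted and all(term in words for term in wanted):
            matches.append(name)
    return matches


# Hypothetical sample data in the shape of the recipe database.
recipes = {
    "Tomato Soup": ["2 cups tomato puree", "1 tsp salt"],
    "Garden Salad": ["1 tomato", "1 cucumber", "1 tsp salt"],
    "Fruit Bowl": ["1 apple", "1 banana"],
}

single = find_recipes(recipes, "tomato")
multiple = find_recipes(recipes, "tomato, cucumber")
print(single)
print(multiple)
```

Because matching is done on whole whitespace-separated words, a multi-word ingredient or a word followed by punctuation would not match in this sketch; a production search would need to normalize the ingredient text first.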
Finally, the results are obtained based on ingredients. The search feature, in which the user can enter any number of ingredients separated by commas, is shown in Figure 5. Each recipe item in the results has two buttons: Details, which provides the details of the recipe, and Recipe URL, which takes the user directly to the actual recipe website from which the information was collected.

Figure 5: Web Scraping result

When the user clicks the Details button, the details of the recipe appear, showing the ingredients used for the recipe and two buttons, namely Publisher URL and Recipe URL, as shown in Figure 6. They take the user to the publisher page (the recipe website, e.g. www.allrecipe.com) and the recipe page respectively.

Figure 6: Ingredient based recipe for Smart Chef

The main metric for measuring the performance of any animation is the frame rate, expressed in frames per second (FPS), which is the frequency (rate) at which consecutive images called frames appear on a display. The FPS recorded for recipe/ingredient search using DevTools is given in Figure 7, wherein the Summary tab provides details on recipe requests, application data transfer, and load times. The application spent most of its time in the rendering and idle states. Rendering refers to sharing data among its components, and idle refers to time frames in which the user did not actively participate, indicating successful performance of the application without participation of the user.

Figure 7: DevTools CPU chart for Smart Chef

V. CONCLUSION
Python-based web scraping helps to collect recipe data automatically as per the requirements of the proposed recipe algorithm. The resulting scraped data is analyzed to obtain ingredient-based recipes as per the diet plan specified by the medical practitioner, so that the customer can enjoy a variety of dishes within the diet plan while maintaining his or her health conditions. It can advise the chef in preparing food in a smart way based on patients' health conditions and taste. This can be further aided with machine learning algorithms to automate the scraping of web pages and the search for recipes.

REFERENCES
[1] Malik, S.K. and Rizvi, S.A., 2011, October. Information extraction using web usage mining, web scrapping and semantic annotation. In 2011 International Conference on Computational Intelligence and Communication Networks (pp. 465-469). IEEE.
[2] Chaulagain, R.S., Pandey, S., Basnet, S.R. and Shakya, S., 2017, November. Cloud based web scraping for big data applications. In 2017 IEEE International Conference on Smart Cloud (SmartCloud) (pp. 138-143). IEEE.
[3] Mahto, D.K. and Singh, L., 2016, March. A dive into Web Scraper world. In 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) (pp. 689-693). IEEE.
[4] Hernandez-Suarez, A., Sanchez-Perez, G., Toscano-Medina, K., Martinez-Hernandez, V., Sanchez, V. and Perez-Meana, H., 2018. A web scraping methodology for bypassing twitter API restrictions. arXiv preprint arXiv:1803.09875.
[5] Slamet, C., Andrian, R., Maylawati, D.S.A., Darmalaksana, W. and Ramdhani, M.A., 2018, January. Web scraping and Naïve Bayes classification for job search engine. In IOP Conference Series: Materials Science and Engineering (Vol. 288, No. 1, p. 012038). IOP Publishing.
[6] Kunang, Y.N. and Purnamasari, S.D., 2018, October. Web Scraping Techniques to Collect Weather Data in South Sumatera. In 2018 International Conference on Electrical Engineering and Computer Science (ICECOS) (pp. 385-390). IEEE.
[7] Sundaramoorthy, K., Durga, R. and Nagadarshini, S., 2017, April. Newsone - an aggregation system for news using web scraping method. In 2017 International Conference on Technical Advancements in Computers and Communications (ICTACC) (pp. 136-140). IEEE.
[8] Ujwal, B.V.S., Gaind, B., Kundu, A., Holla, A. and Rungta, M., 2017, December. Classification-Based Adaptive Web Scraper. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 125-132). IEEE.
[9] Dastidar, B.G., Banerjee, D. and Sengupta, S., 2016. An Intelligent Survey of Personalized Information Retrieval using Web Scraper. International Journal of Education and Management Engineering, 6(5), pp.24-31.
[10] Upadhyay, S., Pant, V., Bhasin, S. and Pattanshetti, M.K., 2017, February. Articulating the construction of a web scraper for massive data extraction. In 2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT) (pp. 1-4). IEEE.

