Web Scraping C18


VIGNAN’S INSTITUTE OF MANAGEMENT AND TECHNOLOGY FOR WOMEN
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Mini Project Review
on
To compare the prices of products using “Web Scraping”


BY
K Sathwika (19UP1A05C4)
T Tushara Priya (19UP1A05E8)
U Sahithi (19UP1A05F1)

Under the guidance of


Internal Guide: Mrs P Archana, Asst. Professor, Dept. of CSE, VMTW
Head of the Department: Mrs M Parimala, Associate Professor, Dept. of CSE, VMTW
CONTENT
1. ABSTRACT
2. INTRODUCTION
3. EXISTING SYSTEM
4. DISADVANTAGES OF EXISTING SYSTEM
5. PROPOSED SYSTEM
6. ADVANTAGES OF PROPOSED SYSTEM
7. SYSTEM REQUIREMENTS
8. LITERATURE REVIEW
9. METHODOLOGY
10. UML DIAGRAMS
11. SYSTEM ARCHITECTURE
12. IMPLEMENTATION
13. CODE
14. RESULT
15. CONCLUSION
16. FUTURE SCOPE
17. REFERENCE
ABSTRACT
• Web scraping lets us collect data from web pages across the internet. In this project, the
script searches for a product via its URL and finds the product's price.
• This is particularly useful when we want to monitor the price of a specific item across
multiple e-commerce platforms.
• This project uses two major e-commerce websites to find the price of the product.
• On each execution, the websites are crawled, the product is located, and its price from
every source is obtained and displayed on the console.
• The buyer can then compare the prices and buy from the platform that offers the lowest
price.
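The final comparison step can be sketched in plain Python. This is a minimal sketch, not the project's own code: it assumes the scraper has already returned each site's price as a display string such as "₹1,299", and the helper names `parse_price` and `cheapest` are hypothetical.

```python
import re
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Turn a scraped price string such as '₹1,299.00' into a Decimal.

    Assumes the string holds one number, optionally with thousands
    separators and a currency symbol or prefix.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


def cheapest(prices: dict[str, str]) -> tuple[str, Decimal]:
    """Print every source's price and return the (source, price) pair with the lowest price."""
    parsed = {source: parse_price(raw) for source, raw in prices.items()}
    for source, value in parsed.items():
        print(f"{source:<10} Rs. {value}")  # console output, as described above
    best = min(parsed, key=parsed.get)
    return best, parsed[best]


if __name__ == "__main__":
    # Hypothetical scraped values, for illustration only.
    source, price = cheapest({"Flipkart": "₹1,299", "Amazon": "₹1,249.00"})
    print(f"Lowest price: {source} at Rs. {price}")
```

`Decimal` is used instead of `float` so that prices such as 1249.50 compare and print exactly.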
INTRODUCTION
• Web scraping is a technique for fetching data from websites. Many websites do not let
users save their data for personal use.
• One alternative is to copy and paste the data manually, which is both tedious and
time-consuming. Web scraping automates the extraction of data from websites.
• This is done with software known as web scrapers, which automatically load web pages and
extract data from them based on the user's requirements.
• Scrapers can be custom-built to work for one site or configured to work with any
website.
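The difference between a custom-built and a configurable scraper can be illustrated by keeping the site-specific details in a configuration table and using one generic extraction function. This is a sketch under stated assumptions: the site keys and CSS selectors below are hypothetical placeholders (real class names must be found by inspecting each page, and they change often), and it relies on Beautiful Soup's `select_one` method.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical per-site configuration: these selectors are placeholders,
# not the real class names used by any live website.
SITE_CONFIG = {
    "site_a": {"name": "div.product-title", "price": "div.product-price"},
    "site_b": {"name": "span#title", "price": "span.price-whole"},
}


def extract(html: str, site: str) -> dict[str, Optional[str]]:
    """Extract every configured field for `site` from raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for field, selector in SITE_CONFIG[site].items():
        node = soup.select_one(selector)  # first matching element, or None
        result[field] = node.get_text(strip=True) if node is not None else None
    return result
```

With this design, supporting another website means adding one entry to `SITE_CONFIG` rather than writing a new scraper.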
EXISTING SYSTEM
• In the existing system, web data is extracted manually, a process that has two major
problems.
• First, its costs cannot be controlled efficiently and it cannot scale quickly: data
collection costs increase as more data is collected from each website.
• To extract data manually, businesses need to hire a large number of staff, which
significantly increases labour costs.
• Second, manual extraction is known to be error-prone.
• Further, if a business process is very complex, cleaning up the data can become
expensive and time-consuming.
DISADVANTAGES OF EXISTING SYSTEM
• The existing system does not let us scrape many websites rapidly and simultaneously
without watching and controlling every single request.
• A scraper can be set up once and cover a whole website within an hour or less, whereas
the manual process could take a single person a week.
• It is not easy to implement: the data cannot be collected with a one-time investment.
• Competitor monitoring: it is not easy to monitor competitors in the market and the
business world.
PROPOSED SYSTEM
• To find the right price, you need to understand, and be able to predict, how your
customers react to price changes.
• Web scraping allows you to compare the prices of the products that you want to buy.
• It lets you track how customers react to changes in your competitors' prices, or tweak
your own prices and monitor how that affects sales.
• It can be used to create applications for tools that do not have a public developer
API. Web scraping services provide this essential service at a low cost.
ADVANTAGES OF PROPOSED SYSTEM
1. Time Efficient
Web scraping is time-efficient and needs little maintenance. For example, downloading a
large dataset may take hours, and analysing it manually one row at a time could take an
entire month.
2. Complete Automation
• Automation does not get bored or tired, needs no breaks, never gets distracted, and
follows the given instructions.
• While humans have an advantage in tasks like analysis, running an algorithm across a
large dataset is faster and more effective than having someone manually read through
every document one by one.
3. Cost Efficiency
Web scraping services provide essential services at a competitive cost, because they are
much cheaper than hiring a company to perform the same task.
4. Track Product Performance
By monitoring listings and sales data, web scraping lets you see how well different products
are performing, which makes it much easier to keep track of your business.
5. Data Accuracy
Simple errors in data extraction can lead to major issues, so it is necessary to ensure that
the data is accurate. Because no humans are involved in the process, web scraping is not
only fast but also very accurate.
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS
Processor: 11th Gen Intel(R) Core(TM) i3-1115G4, 3.00 GHz
RAM: 8.00 GB
System type: 64-bit operating system, x64-based processor
SOFTWARE REQUIREMENTS
Operating System: Windows (64-bit)
Platform: Jupyter (Python 3.x with the Selenium, Beautiful Soup and pandas libraries
installed)
Web Browser: Microsoft Edge Version 105.0.1343.50
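Before running the notebook, it can help to confirm that the listed libraries are importable. The snippet below is a small sketch that uses only the standard library's `importlib.util.find_spec`; note that Beautiful Soup is imported as `bs4`, and `requests` is included because the methodology's Step 4 imports it.

```python
import importlib.util
import sys

# Import names of the required libraries (Beautiful Soup installs as "bs4").
REQUIRED = ["selenium", "bs4", "pandas", "requests"]


def check_environment(modules=REQUIRED) -> dict[str, bool]:
    """Map each module name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, ok in check_environment().items():
        print(f"{name:<10} {'OK' if ok else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so the check is fast and has no side effects.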
LITERATURE REVIEW
S.No | Name of paper | Technique/algorithm used | Drawback
1 | Data Analysis by Web Scraping Using Python | Python, Web Scraping (Pandas), Implementing Web Scrape | Time consuming
2 | Web Scraping Using Python | Python, Web Scraping (Beautiful Soup) | Difficult to understand
3 | Web Scraping with Python: Successfully scrape data from any website with the power of Python | Python, Web Scraping (Selenium) | Protection policies
METHODOLOGY
Step - 1: Find the desired URL to scrape
The first step is to find the URL that you want to scrape. Here, we extract product details
from Flipkart and Amazon, whose URLs are https://www.flipkart.com and
https://www.amazon.in.
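Rather than hard-coding a product page, the starting URL can be built from a search term. This sketch assumes the search URL patterns `https://www.flipkart.com/search?q=...` and `https://www.amazon.in/s?k=...`; these are patterns observed in the browser rather than documented APIs, so they should be re-checked before use. `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed search endpoints and query-parameter names; verify them in a browser.
SEARCH_URLS = {
    "flipkart": ("https://www.flipkart.com/search", "q"),
    "amazon": ("https://www.amazon.in/s", "k"),
}


def build_search_url(site: str, product: str) -> str:
    """Return the search-results URL for `product` on `site`."""
    base, param = SEARCH_URLS[site]
    # urlencode escapes the search term (spaces become '+').
    return f"{base}?{urlencode({param: product})}"
```

For example, `build_search_url("flipkart", "iphone 13")` gives `https://www.flipkart.com/search?q=iphone+13`.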

Step - 2: Inspecting the page
It is necessary to inspect the page carefully, because the data is usually nested inside HTML
tags; inspecting the page lets us select the tag that holds the desired data. To inspect the
page, right-click the element and click "Inspect".
Step - 3: Find the data to extract
Extract the product name and price, which are contained in their respective "div" tags.
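For example, if inspection shows the name and price in `div` elements, Beautiful Soup's `find_all` and `find` methods can pull them out by tag and class. The class names `product-card`, `product-name` and `product-price` below are hypothetical; each site uses its own (often auto-generated) names, which is why Step 2 matters.

```python
from bs4 import BeautifulSoup


def extract_products(html: str) -> list[dict[str, str]]:
    """Return the name and price from every (hypothetical) product card in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # Each assumed product card wraps one name div and one price div.
    for card in soup.find_all("div", class_="product-card"):
        name = card.find("div", class_="product-name")
        price = card.find("div", class_="product-price")
        if name is not None and price is not None:  # skip incomplete cards
            products.append({
                "name": name.get_text(strip=True),
                "price": price.get_text(strip=True),
            })
    return products
```

Cards missing either field are skipped rather than raising an error, since listings such as ads or out-of-stock items often lack a price.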
Step - 4: Importing libraries and code execution
Import the libraries (pandas, Beautiful Soup, requests, etc.) and write the code.
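A minimal sketch of the fetch-and-parse part of Step 4, assuming the page is served as static HTML: it downloads the page with `requests` and parses it with Beautiful Soup. The `User-Agent` string and the 10-second timeout are illustrative choices. Pages that build their content with JavaScript would need a browser-driven tool such as Selenium (listed in the requirements) instead, and sites with protection policies may refuse automated requests altogether.

```python
import requests
from bs4 import BeautifulSoup

# Illustrative browser-like header; some sites reject the default client string.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_soup(url: str, session=None, timeout: float = 10.0) -> BeautifulSoup:
    """Download `url` and return its parsed HTML.

    `session` may be a requests.Session (to reuse connections across pages);
    if omitted, the module-level requests.get is used.
    """
    client = session or requests
    response = client.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
    return BeautifulSoup(response.text, "html.parser")
```

In use, `with requests.Session() as s: soup = fetch_soup("https://www.amazon.in", session=s)` reuses one connection for several pages; the resulting `soup` can then be passed to the extraction code from Step 3.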
UML DIAGRAMS
USE CASE DIAGRAM
SEQUENCE DIAGRAM
CLASS DIAGRAM
SYSTEM ARCHITECTURE
System architecture defines the structure of a software system. It is usually presented as a
series of diagrams that illustrate services, components, layers and interactions.
Scheduler-
• A scheduler is a software product that allows an enterprise to schedule and track batch
computer tasks.
• Job schedulers may also manage the job queue for a computer cluster. A scheduler starts
jobs either from a prepared job control language script or through interaction with a human
user, who supplies the required URL.
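As a rough illustration of the scheduler's role, Python's standard `sched` module can queue scrape jobs to run at fixed offsets. This is a sketch: `job` stands in for whatever function fetches and parses a URL, and `schedule_scrapes` is a hypothetical name. A long-running deployment would more likely use an operating-system scheduler such as cron or Windows Task Scheduler.

```python
import sched
import time
from typing import Callable, Iterable


def schedule_scrapes(urls: Iterable[str], job: Callable[[str], None],
                     interval_s: float, runs: int) -> None:
    """Call `job(url)` for every URL, `runs` times, `interval_s` seconds apart.

    Blocks until all scheduled jobs have finished.
    """
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    urls = list(urls)
    for run in range(runs):
        for url in urls:
            # Every URL in one run gets the same delay and priority.
            scheduler.enter(run * interval_s, 1, job, argument=(url,))
    scheduler.run()  # sleeps until each event is due, then calls its job
```

For instance, `schedule_scrapes(["https://www.flipkart.com", "https://www.amazon.in"], print, interval_s=3600, runs=24)` would trigger the job for both sites once an hour for a day. Jobs run one after another, so a slow job delays the ones scheduled after it.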
Multi-thread downloader (download manager)-
• A download manager is a computer program dedicated to downloading stand-alone files from
the internet.
• Here, we create a simple download manager using threads in Python. With multi-threading,
a file can be downloaded as chunks simultaneously by different threads.
To implement this, we create a simple command-line tool that accepts the URL of the file and
then downloads it.
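The chunked, multi-threaded download can be sketched with the standard library: the file is split into byte ranges, each thread requests one range with an HTTP `Range` header, and the parts are joined in order. This assumes the server supports range requests (it answers `206 Partial Content`) and that the total size is already known, for example from a `Content-Length` header; the function names are illustrative.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(total_size: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into at most `parts` inclusive (start, end) ranges."""
    if total_size <= 0 or parts < 1:
        return []
    chunk = -(-total_size // parts)  # ceiling division
    return [(start, min(start + chunk, total_size) - 1)
            for start in range(0, total_size, chunk)]


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Download bytes start..end (inclusive) of `url` with an HTTP Range request."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        if response.status != 206:  # the server ignored the Range header
            raise RuntimeError("server does not support range requests")
        return response.read()


def download(url: str, total_size: int, parts: int = 4, fetch=fetch_range) -> bytes:
    """Fetch all ranges concurrently and join them in their original order."""
    ranges = byte_ranges(total_size, parts)
    with ThreadPoolExecutor(max_workers=max(1, parts)) as pool:
        # map() yields results in input order, whatever order the threads finish in.
        return b"".join(pool.map(lambda r: fetch(url, *r), ranges))
```

Wrapping `download` in a small `argparse` script that takes the URL as an argument would give the command-line tool described above. The `fetch` parameter makes it possible to test the chunking logic without a network connection.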
Queue-
Downloads are put in the download queue and prioritised. From the queue, we get the required
data from the website, which can then be stored in the required format.
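A prioritised download queue maps directly onto Python's thread-safe `queue.PriorityQueue`. In this sketch, lower numbers mean higher priority, and a running counter breaks ties so that equal-priority URLs come out in insertion order. The `DownloadQueue` class and the priority values are illustrative assumptions.

```python
import itertools
import queue


class DownloadQueue:
    """Thread-safe queue that hands out URLs in priority order (lowest number first)."""

    def __init__(self) -> None:
        self._queue: queue.PriorityQueue = queue.PriorityQueue()
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def put(self, url: str, priority: int = 10) -> None:
        self._queue.put((priority, next(self._counter), url))

    def get(self) -> str:
        _priority, _seq, url = self._queue.get()  # blocks until an item is available
        return url

    def empty(self) -> bool:
        return self._queue.empty()
```

The counter also guarantees that two entries never need to compare their URLs, so the ordering depends only on priority and arrival order.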
Storage-
The data obtained from the website is stored in a CSV file or in a database, as the user
requires.
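Both storage options can be handled with Python's standard library, as sketched below: `csv.DictWriter` writes one row per product, and `sqlite3` stands in for the database. The column names `source`, `name` and `price` and the `products` table are assumptions chosen for illustration; since pandas is already a dependency, `DataFrame.to_csv` is an equally valid choice for the CSV case.

```python
import csv
import sqlite3

FIELDS = ["source", "name", "price"]  # assumed column layout


def save_csv(rows: list[dict], path: str) -> None:
    """Write product rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def save_sqlite(rows: list[dict], db_path: str) -> None:
    """Append product rows to a `products` table, creating it if needed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS products "
                "(source TEXT, name TEXT, price REAL)"
            )
            conn.executemany(
                "INSERT INTO products VALUES (:source, :name, :price)", rows
            )
    finally:
        conn.close()
```

Named placeholders (`:source`, `:name`, `:price`) let the same list of dictionaries feed both functions.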
IMPLEMENTATION
• Beautiful Soup is a Python library for parsing HTML and XML pages, which makes it well
suited to web scraping.
• Using a parser, it lets you search, navigate and modify the data. It is versatile and saves
a lot of time.
• In this project, we learn how to scrape data using Beautiful Soup.
• The basic web-crawling code used for the project shows the product data crawled from a
social network site and stored in the database (a CSV file).
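The three operations named above (search, navigate and modify) look like this in Beautiful Soup. The HTML fragment and its class names are made up for illustration.

```python
from bs4 import BeautifulSoup

HTML = """
<ul id="results">
  <li class="item"><a href="/p/1">Phone A</a><span class="price">₹10,999</span>
      <span class="ad">Sponsored</span></li>
  <li class="item"><a href="/p/2">Phone B</a><span class="price">₹9,499</span></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Modify: remove the sponsored labels from the tree before extracting anything.
for ad in soup.select("span.ad"):
    ad.decompose()

rows = []
# Search: find every result item by tag name and class.
for item in soup.find_all("li", class_="item"):
    link = item.find("a")
    # Navigate: the price is the first <span> sibling after the link.
    price = link.find_next_sibling("span")
    rows.append((link.get_text(), link["href"], price.get_text()))

print(rows)
```

Removing unwanted elements first keeps later searches from picking them up by accident.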
CODE
RESULT
The overall results show that the project helps users understand and compare the prices of
products. The web scraper extracted the data and saved it in CSV format.
The script written to extract the data located the product on each of these sources with
ease.
Moreover, the analysis identified the most rated product on the site, based on its review
ratings.
CONCLUSION
The main outcomes of this project were a user-friendly search interface, indexing, query
processing, and an effective data extraction technique based on web structure.
• Web scraping helps us obtain large-scale product data and collect data as required in a
readable format.
Whether in e-commerce or e-marketing, web scraping will be a key to success, as it provides
insight into the target market and helps decision makers.
Web scraping has become a modern necessity for staying competitive in business, helping
organizations use data to track trends and plan for the future.
The data can be used in real time to keep prices in line with rival companies, or to track
the misuse of data and illegal sales.
FUTURE SCOPE
As we go forward, marketing will become an even more competitive exercise. Those who wish to
arrive at a suitable marketing strategy will need deeper insights into the market and will
need to base their marketing decisions on data rather than other factors.
For this reason, the future of marketing is closely linked to comparing the prices of products
aggregated from various media sites, social media platforms, web traffic, etc.
• Sentiment analysis is a popular way for organizations to determine and categorize opinions
about a product, service or idea.
• In future, its role in decision making is set to increase.
• Going forward, it will become an integral part of policy framing and strategic planning in
all fields.
• Going forward, sentiment analysis using web scraping will become a vital driver of policy
and strategy. Companies that invest in web scraping will reap huge dividends in terms of
sentiment analysis and rich insights into customer expectations and overall customer
behaviour.
• Everyone, starting with Google, needs data to process, analyse and streamline information.
• The business world has become more dynamic and responds to change immediately, and at
times frequently.
• Prices keep fluctuating on e-commerce websites, and many businesses keenly watch and
analyse this data to rework their own strategies.
REFERENCE
[1] Renita Crystal Pereira and Vanitha T, “Web Scraping of Social Networks,” International
Journal of Innovative Research in Computer and Communication Engineering, vol. 3,
pp. 237-239, Oct. 7, 2018.
[2] Kaushal Parikh, Dilip Singh, Dinesh Yadav and Mansingh Rathod, “Detection of Web Scraping
Using Machine Learning,” Open Access International Journal of Science and Engineering,
vol. 3, pp. 114-118, 2018.
[3] Anand V. Saurkar, Kedar G. Pathare and Shweta A. Gode, “An Overview on Web Scraping
Techniques and Tools,” International Journal on Future Revolution in Computer Science &
Communication Engineering, vol. 4, pp. 363-367, 2018.
