This document describes a project that uses web scraping to compare product prices across different e-commerce websites. The project scrapes product price data from websites like Flipkart and Amazon and stores it in a CSV file or database. The methodology involves finding the URLs of products, inspecting webpage elements to identify where prices are stored, importing libraries like BeautifulSoup and Pandas for scraping, and using multithreading to download price data efficiently. The results allow buyers to easily compare prices and choose the lowest price. Overall, the web scraping approach automates price comparison and monitoring of multiple websites.
FOR WOMEN DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Mini project Review
On
To compare the prices of products using “Web Scraping”
By K Sathwika (19UP1A05C4), T Tushara Priya (19UP1A05E8), U Sahithi (19UP1A05F1)
Under the guidance of
Internal Guide: Mrs P Archana, Asst Professor, Dept. of CSE, VMTW
Head of the Department: Mrs M Parimala, Associate Professor, Dept. of CSE, VMTW

CONTENT
1. ABSTRACT
2. INTRODUCTION
3. EXISTING SYSTEM
4. DISADVANTAGES OF EXISTING SYSTEM
5. PROPOSED SYSTEM
6. ADVANTAGES OF PROPOSED SYSTEM
7. SYSTEM REQUIREMENTS
8. LITERATURE REVIEW
9. METHODOLOGY
10. UML DIAGRAMS
11. SYSTEM ARCHITECTURE
12. IMPLEMENTATION
13. CODE
14. RESULT
15. CONCLUSION
16. FUTURE SCOPE
17. REFERENCE

ABSTRACT
• Web scraping lets us collect data from web pages across the internet. In this project, the script looks up a product by its URL and finds the price of the product.
• This project is particularly useful when we want to monitor the price of a specific item on multiple eCommerce platforms.
• In this project, we use two major eCommerce websites to find the price of the product.
• On each execution, the websites are crawled, the product is located, and its price from every source is obtained and displayed in the console window.
• The buyer can then see the prices and decide to buy from the platform that offers the lowest price.

INTRODUCTION
• Web scraping is a technique for fetching data from websites. Many websites do not let users save their data for personal use.
• One option is to copy and paste the data manually, which is both tedious and time-consuming. Web scraping automates the process of extracting data from websites.
• This is done with the help of web scraping software known as web scrapers. They automatically load and extract data from websites based on user requirements.
• Scrapers can be custom built to work for one site or configured to work with any website.

EXISTING SYSTEM
• The existing system is manual web data extraction, which has two major problems.
• First, it is not cost-efficient and cannot be scaled up quickly.
Data collection costs increase as more data is collected from each website.
• To conduct manual extraction, businesses need to hire a large number of staff, which increases labour costs significantly.
• Second, manual extraction is known to be error-prone.
• Further, if a business process is very complex, cleaning up the data can become expensive and time-consuming.

DISADVANTAGES OF EXISTING SYSTEM
• The existing system does not let us rapidly scrape many websites at the same time without having to watch and control every single request.
• A scraper, by contrast, can be set up once and will scrape a whole website within an hour or less, work that would take a single person a week to complete manually.
• It is not easy to implement: the data cannot be collected with a one-time investment.
• Competitor monitoring: it is not easy to monitor competitors in the market and the business world.

PROPOSED SYSTEM
• To find the right price, you need to understand and be able to predict how your customers react to price changes.
• Web scraping allows you to compare the prices of the products that you want to buy.
• Track how customers react to changes in your competitors' prices, or tweak your own prices and monitor how that affects sales.
• Create applications for tools that don't have a public developer API. Web scraping services provide an essential service at a low cost.

ADVANTAGES OF PROPOSED SYSTEM
1. Time Efficient
Web scraping is time-efficient and low-maintenance. For example, downloading a large dataset may take hours, and analyzing every single row manually could take an entire month.
2. Complete Automation
• An automated scraper does not get bored or tired, does not require any breaks, and never gets distracted; it simply follows the given instructions.
• While humans have advantages in tasks like analysis, running an algorithm across a large dataset is faster and more effective than having someone manually read through every document one by one.

ADVANTAGES (CONTINUED)
3. Cost Efficiency
Web scraping services provide essential services at a competitive cost, because scraping is much cheaper than hiring a company to perform the same task.
4. Track Product Performance
Monitoring listings and sales data lets you see how well different products are performing. Keeping track of your business has never been easier.
5. Data Accuracy
Simple errors in data extraction may lead to major issues, so it is necessary to ensure that the data is accurate. Because no humans are involved in the process, web scraping is not only fast but also very accurate.

SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS
• Processor: 11th Gen Intel(R) Core(TM) i3-1115G4, 3.00 GHz
• RAM: 8.00 GB
• System type: 64-bit operating system, x64-based processor
SOFTWARE REQUIREMENTS
• Operating System: Windows 64-bit OS
• Platform: Jupyter (Python 3.x with Selenium, Beautiful Soup, pandas libraries installed)
• Web Browser: Microsoft Edge Version 105.0.1343.50

LITERATURE REVIEW
S.NO | Name of paper | Technique/algorithm used | Drawback
1 | Data Analysis by Web Scraping Using Python | Python, Web Scraping (Pandas), Implementing Web Scrape | Time consuming
2 | Web Scraping Using Python | Python, Web Scraping (Beautiful Soup) | Difficult to understand
3 | Web Scraping with Python: Successfully scrape data from any website with the power of Python | Python, Web Scraping (Selenium) | Protection policies

METHODOLOGY
Step - 1: Find the desired URL to scrape
The first step is to find the URL that you want to scrape. Here we are extracting product details from Flipkart and Amazon. The URLs of these sites are https://www.flipkart.com and https://www.amazon.in
Step - 2: Inspecting the page
It is necessary to inspect the page carefully because the data is usually contained within tags, so we need to inspect the page to select the desired tag. To inspect the page, right-click on the element and click "inspect".
Step - 3: Find the data for extracting
Extract the price and the name, each of which is contained in a "div" tag.
Step - 4: Importing libraries and code execution
Import the libraries (pandas, Beautiful Soup, requests, etc.) and write the code.

[Figures: use case diagram, sequence diagram, class diagram]

SYSTEM ARCHITECTURE
System architecture defines the structure of a software system. It is usually presented as a series of diagrams that illustrate services, components, layers and interactions.
Scheduler- A scheduler is a software product that allows an enterprise to schedule and track computer batch tasks. Job schedulers may also manage the job queue for a computer cluster. A scheduler starts a job either by processing a prepared job control language script or through communication with a human user; here it takes the required URL as input.
Multi-thread downloader (download manager)- A download manager is a computer program dedicated to the task of downloading stand-alone files from the internet. Here, we create a simple download manager with the help of threads in Python. Using multithreading, a file can be downloaded in chunks simultaneously from different threads. To implement this, we create a simple command-line tool which accepts the URL of the file and then downloads it.
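The multi-threaded download step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it runs one worker thread per product page rather than splitting a single file into chunks, and it uses only the Python standard library. The names `fetch_page` and `download_all`, the `User-Agent` header and the default of four workers are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def fetch_page(url, timeout=10):
    """Download one page and return its HTML as text."""
    # Assumption: a browser-like User-Agent; some sites reject the default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def download_all(urls, fetch=fetch_page, max_workers=4):
    """Fetch every URL on a pool of worker threads.

    Returns a dict mapping each URL to its HTML, or to None if the fetch failed.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            # One failing site should not stop the other downloads.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        pages = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, pages))
```

For instance, `download_all([flipkart_url, amazon_url])` (both variable names are placeholders for real product page URLs) would return the two pages keyed by URL, ready for the extraction step. Passing a different `fetch` function makes it possible to try the threading logic without network access.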
Queue- Downloads are put into the download queue and prioritised. From this we get the required data from the website, which can be stored in the required format.
Storage- The data obtained from the website is stored as a CSV file or in a database, as per the requirements of the user.

IMPLEMENTATION
• Beautiful Soup is a Python web scraping library that allows us to parse and scrape HTML and XML pages.
• You can search, navigate, and modify the parsed data. It is versatile and saves a lot of time.
• In this project we learn how to scrape data using Beautiful Soup.
• The basic web crawling code used for the project shows the product data being crawled and stored in the database (CSV file).

CODE
[Code screenshots]

RESULT
The overall results of the project turned out to be helpful for understanding the prices of the products. The web scraper extracted the data and saved it in CSV format. The script written to extract the data was able to find the price from each of these sources with great ease. Moreover, the analysis identified the most highly rated product on the site, based on its reviews.

CONCLUSION
The main outcomes of this project were a user-friendly search interface, indexing, query processing, and an effective data extraction technique based on web structure. Web scraping helps us obtain large-scale product data and gather it in a readable format as required. Whether in e-commerce or e-marketing, web scraping will be key to success, as it provides insight into the target market and helps decision makers. Web scraping has become a modern necessity for staying competitive in business, helping organizations use data to track trends and strategize for the future. The data could be used in real time to keep pricing in line with rival companies, or to track the misuse of data and illegal sales.
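To make the extraction, storage and comparison steps described in this deck concrete, here is a small sketch under stated assumptions. It uses the standard library's `html.parser` in place of Beautiful Soup so that it runs without extra installs; the class name passed to `extract_price`, the helper names and the two-column CSV layout are all hypothetical and would have to be adapted to each site's real markup.

```python
import csv
import re
from html.parser import HTMLParser


class PriceDivParser(HTMLParser):
    """Collect the text of every <div> whose class attribute contains a given name."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching <div>
        self.matches = []   # text found inside each matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested <div> inside a match
        elif self.css_class in classes:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def parse_price(text):
    """Turn a price string such as 'Rs. 1,299.50' into a float, or None if no number."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def extract_price(html, css_class):
    """Return the first price found in a <div> with the given (hypothetical) class."""
    parser = PriceDivParser(css_class)
    parser.feed(html)
    for text in parser.matches:
        price = parse_price(text)
        if price is not None:
            return price
    return None


def save_and_compare(prices, handle):
    """Write {site: price} rows as CSV to an open text file; return the cheapest pair."""
    writer = csv.writer(handle)
    writer.writerow(["site", "price"])
    for site, price in prices.items():
        writer.writerow([site, price])
    return min(prices.items(), key=lambda item: item[1])
```

In use, each downloaded page would be passed to `extract_price` with the class name found while inspecting that site, the results collected into a dictionary keyed by site name, and `save_and_compare` called with a file opened via `open(path, "w", newline="")` to store the rows and report the platform with the lowest price.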
FUTURE SCOPE
As we go forward, marketing will become an even more competitive exercise. Those who wish to arrive at a suitable marketing strategy will need to derive deeper insights about the market and base their marketing decisions on data rather than on other aspects. For this reason, the future of marketing is closely linked with the comparison of product prices aggregated from various media sites, social media platforms, web traffic, etc.
Sentiment analysis is a popular way for organizations to determine and categorize opinions about a product, service or idea. In future, its role in decision making is set to increase, and it will become an integral part of policy framing and strategic planning in all fields.
Going forward, sentiment analysis using web scraping will become a vital driver of policy and strategy. Companies that invest in web scraping will reap huge dividends in terms of sentiment analysis and rich insights into customer expectations and overall customer behaviour. Starting with Google, everyone needs data to process, analyse and streamline information. The world of business has become more dynamic and responds to change immediately and, at times, frequently. Prices keep fluctuating on e-commerce websites, and a number of businesses are keenly watching and analysing this data to rework their own strategies.

REFERENCE
[1] Renita Crystal Pereira and Vanitha T., “Web Scraping of Social Networks,” International Journal of Innovative Research in Computer and Communication Engineering, vol. 3, pp. 237-239, Oct. 7, 2018.
[2] Kaushal Parikh, Dilip Singh, Dinesh Yadav and Mansingh Rathod, “Detection of Web Scraping Using Machine Learning,” Open Access International Journal of Science and Engineering, vol. 3, pp. 114-118, 2018.
[3] Anand V. Saurkar, Kedar G. Pathare and Shweta A. Gode, “An Overview on Web Scraping Techniques and Tools,” International Journal on Future Revolution in Computer Science & Communication Engineering, vol. 4, pp. 363-367, 2018.