
WEB SCRAPING
What is SCRAPING
Converting unstructured documents into structured information.
What is WEB SCRAPING
Web scraping is a technique to fetch data and information from websites.
Everything you see on a webpage can be scraped.
What is WEB SCRAPING
Web scraping is a technique for extracting large amounts of data from
websites and saving it to a local file on your computer.

The data can then be used for many purposes, such as displaying it on your own
website or application, or performing data analysis.
WEB SCRAPING
There are two main ways to extract data from a website:
Use the website's API, if one exists. For example, Facebook has the Facebook
Graph API, which allows retrieval of data posted on Facebook.

Access the HTML of the webpage and extract the useful information from it.
This technique is called web scraping, web harvesting, or web data extraction.
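
To make the contrast between the two approaches concrete, here is a minimal sketch using only the Python standard library. The product data, the JSON payload, the HTML snippet, and the class names `name` and `price` are all hypothetical and chosen for illustration. The API path receives data that is already structured, while the HTML path has to recover the same structure from markup.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: the data is already structured.
API_BODY = '{"name": "Widget", "price": 9.99}'

# Hypothetical HTML for the same product: the structure must be recovered.
PAGE = (
    '<div class="product">'
    '<h2 class="name">Widget</h2>'
    '<span class="price">9.99</span>'
    '</div>'
)


class ProductParser(HTMLParser):
    """Collect the text of elements whose class is 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class of the element we are currently inside

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("name", "price"):
            self._current = cls

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = data.strip()
            self._current = None


def from_api(body):
    """Way 1: parse the structured payload an API would return."""
    return json.loads(body)


def from_html(page):
    """Way 2: scrape the same fields out of the page's HTML."""
    parser = ProductParser()
    parser.feed(page)
    parser.close()
    return {"name": parser.fields["name"], "price": float(parser.fields["price"])}
```

Both functions produce the same dictionary here, but the HTML version depends on the page's markup and will break if the site changes its class names, which is one reason an API is usually preferable when it exists.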
WORKFLOW
Essential parts of web scraping

Web Scraping follows this workflow:


Get the web page, using an HTTP library
-----> (Requests)
Parse the HTML document, using a parsing library
-----> (BeautifulSoup and lxml)
Store the results in a database, CSV file, text file, etc.
-----> (pandas)
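
The three steps above can be sketched in Python. So that the sketch runs without extra installs, it swaps in standard-library stand-ins: `urllib.request` in place of Requests, `html.parser` in place of BeautifulSoup/lxml, and `csv` in place of pandas. The function names, the choice to collect links, the User-Agent string, and the example URL are illustrative assumptions, not part of the slides.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Step 1: get the page (the slides name Requests for this step)."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Step 2: parse the HTML and collect text and href of every <a href>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"text": text, "href": self._href})
            self._href = None


def parse_links(html):
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def to_csv(rows):
    """Step 3: store the results, here as CSV text ready to write to a file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["text", "href"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Example usage (hypothetical URL, needs network access):
# html = fetch_html("https://example.com")
# print(to_csv(parse_links(html)))
```

Keeping the three steps in separate functions means the parsing and storing logic can be checked against a saved HTML string without touching the network.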
Need for WEB SCRAPING
Copying data by hand does not scale to a thousand web pages or more.
No API is provided, or the API allows only a limited number of requests.
Online scraping tools offer little customization.
You learn something new and reduce manual effort.
No API-imposed rate limits (although a site may still throttle or block heavy traffic).
Web pages contain a wealth of information (in text form), designed mostly
for human consumption.
Usage of WEB SCRAPING in real life

Extract product information
Extract job postings and internships
Extract offers and discounts from deal-of-the-day websites
Prepare a data set for your ML model
Extract data to build a search engine
Build an e-commerce price comparison tool
