DESIGN AND IMPLEMENTATION OF

“WEB SCRAPING”

A PowerPoint Presentation


Of CS 705 (Major Project)

SUBMITTED TO:
Mr. Mayank Kumar Sharma

SUBMITTED BY:
Abhishek Saxena (0837cs071002)
Aditya Gupta (0837cs071003)
Apoorv Naik (0837cs071011)
Gaurav Agrawal (0837cs071019)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SCHOOL OF ENGINEERING
SANGHVI INSTITUTE OF MANAGEMENT AND SCIENCE
PIGDAMBAR, RAU
CONTENT

• ABSTRACT
• INTRODUCTION
• WEB SCRAPING
• WHAT IS A CRAWLER?
• ARCHITECTURE OF CRAWLER
• PROBLEM DOMAIN
• SOLUTION DOMAIN
• HOW DOES WEB SCRAPING WORK?
ABSTRACT
• We have developed a software utility for extracting data
from websites.
• The project is a desktop application that extracts web
pages in a specified format. The user only has to type the
URL of the website and the keyword to be searched on
that website.
• The software displays the desired output in the
specified format when the user clicks the result
button.
INTRODUCTION

Web scraping (also called Web harvesting or Web
data extraction) is a software technique for
automatically extracting information from websites.
It is closely related to Web indexing, in which a bot
indexes Web content; Web indexing is the technique
adopted by most search engines.
WEB SCRAPING

Web scraping is based on four parts:

• Web crawler
• Database
• Search algorithm
• Search system that binds all of the above together
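
The four parts above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual code: the in-memory PAGES dictionary stands in for downloaded pages, SQLite is an assumed choice of database, and the function names (crawl, store, search, run) are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical in-memory "website" standing in for real HTTP downloads.
PAGES = {
    "http://example.test/": "<html><body><p>Welcome to the scraping demo.</p></body></html>",
    "http://example.test/about": "<html><body><p>About this project.</p></body></html>",
}


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def crawl(pages):
    """Part 1, web crawler: yield (url, text) for every page."""
    for url, html in pages.items():
        parser = TextExtractor()
        parser.feed(html)
        yield url, " ".join(parser.chunks)


def store(conn, records):
    """Part 2, database: save the extracted text of each page."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?)", records)
    conn.commit()


def search(conn, keyword):
    """Part 3, search algorithm: case-insensitive substring match."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE lower(body) LIKE ? ORDER BY url",
        (f"%{keyword.lower()}%",),
    )
    return [url for (url,) in rows]


def run(pages, keyword):
    """Part 4, search system: binds crawler, database and search together."""
    conn = sqlite3.connect(":memory:")
    store(conn, crawl(pages))
    return search(conn, keyword)
```

Calling run(PAGES, "project") returns the URLs of the stored pages whose text contains the keyword.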


WHAT IS A CRAWLER?

A Web crawler is a computer program that
browses the World Wide Web (WWW) in a
methodical, automated manner. This process is
called Web crawling or spidering.
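
One common way to browse "methodically" is a breadth-first traversal of links, sketched below. This is an assumption made for illustration; the slides do not say which traversal order the project uses. The fetch parameter, the max_pages limit and the SITE dictionary are hypothetical stand-ins so that the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns URLs in visit order.

    fetch(url) must return the page HTML, or None if the page is unavailable.
    """
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # frontier of URLs waiting to be fetched
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page site used in place of real downloads.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a> <a href="/missing">gone</a>',
}

order = crawl("http://example.test/", SITE.get)
```

For a real site, fetch could wrap urllib.request.urlopen; the seen set is what keeps the crawler from looping when pages link back to each other.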
ARCHITECTURE OF CRAWLER
PROBLEM DOMAIN

Extracting website data manually is inefficient,
both for collecting and for archiving the data of
interest. A human operator cannot reliably crawl
a large number of websites, or even a single large
one, and find all the relevant data: time and
effort are limited, and to err is human. Web
scraping (Web harvesting) was devised to
overcome these human limitations and to
increase the efficiency of the process.
SOLUTION DOMAIN
• Web scraping applies raw computing power to the task
and removes most human constraints, such as limited
time, physical and mental stress, and limited workload
capacity, none of which affect a computer.
• With Web Scraper, data can be extracted from multiple
pages quickly, thanks to multithreaded crawling that
runs up to 20 download threads simultaneously. A single
click starts the extraction, so no time is spent on
browsing and tedious cut-and-paste operations.
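
Multithreaded downloading with a cap of 20 threads can be sketched with Python's standard thread pool. This is a minimal sketch under stated assumptions, not the tool's implementation: download_all and fake_fetch are hypothetical names, and fake_fetch simulates network latency with a short sleep instead of making real requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 20  # the limit quoted on the slide


def download_all(urls, fetch, max_threads=MAX_THREADS):
    """Fetch every URL using at most max_threads worker threads.

    Returns a dict mapping each URL to whatever fetch(url) returned.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map() returns results in the same order as the input URLs.
        return dict(zip(urls, pool.map(fetch, urls)))


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP download."""
    time.sleep(0.01)  # simulated network delay
    return f"<html>content of {url}</html>"


urls = [f"http://example.test/page{i}" for i in range(40)]
pages = download_all(urls, fake_fetch)
```

Threads suit this job because downloading is mostly waiting on the network, so while one thread waits for a response the others can make progress.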
HOW DOES THE SOFTWARE WORK?

The user types the URL of the website from
which Web Scraper will start crawling, then
specifies the crawling rules and the keyword to
be searched on the website. Once an extraction
project is set up, it can be executed with one
mouse click.
The software displays the desired output in the
specified format when the user clicks the result
button.
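
The workflow above (start URL, crawling rules, keyword, formatted output) could be modelled roughly as follows. Everything specific here is an assumption for illustration: the slides do not say what the crawling rules or the output format are, so this sketch uses a depth limit and a same-host restriction as example rules and CSV as the example format. The names ExtractionProject and execute are hypothetical.

```python
import csv
import io
import re
from dataclasses import dataclass
from urllib.parse import urljoin, urlparse


@dataclass
class ExtractionProject:
    start_url: str
    keyword: str
    max_depth: int = 1           # assumed crawling rule: how many links deep to go
    same_host_only: bool = True  # assumed crawling rule: stay on the start site


HREF = re.compile(r'href="([^"]+)"')  # simplistic link finder, enough for a sketch
TAG = re.compile(r"<[^>]+>")          # strips tags to leave the page text


def execute(project, fetch):
    """Run the project; return CSV text with one row per page containing the keyword."""
    host = urlparse(project.start_url).netloc
    frontier = [(project.start_url, 0)]
    seen = {project.start_url}
    rows = []
    while frontier:
        url, depth = frontier.pop(0)
        html = fetch(url)
        if html is None:
            continue
        text = TAG.sub(" ", html)
        count = text.lower().count(project.keyword.lower())
        if count:
            rows.append((url, count))
        if depth >= project.max_depth:
            continue  # depth rule: do not follow links any further
        for href in HREF.findall(html):
            link = urljoin(url, href)
            if project.same_host_only and urlparse(link).netloc != host:
                continue  # host rule: skip external sites
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "keyword_count"])
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical site used in place of real downloads.
SITE = {
    "http://example.test/": '<p>Laptop deals</p><a href="/p1">one</a><a href="http://other.test/">ext</a>',
    "http://example.test/p1": '<p>Laptop and laptop bag</p><a href="/p2">two</a>',
    "http://example.test/p2": "<p>laptop stand</p>",
    "http://other.test/": "<p>laptop</p>",
}

project = ExtractionProject(start_url="http://example.test/", keyword="laptop")
report = execute(project, SITE.get)
```

Keeping the project settings in one object mirrors the "set it up once, run it with one click" idea: the same execute call can be repeated whenever fresh results are wanted.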