Web Scraping

What is Web Scraping?

Some websites contain a very large amount of invaluable data: stock prices, product
details, sports stats, you name it. If you wanted to access this information, you would either
have to use whatever format the website uses or copy and paste the information manually into a
new document. This can be pretty tedious when you want to extract a lot of information from a
website. Here is where web scraping can help.
Web scraping simply refers to the extraction of data from a website. The information is
collected and then exported into a format that is more useful to the user. For example, you
can use web scraping to export a list of product names and prices from Amazon into an Excel
spreadsheet.
Overview of how web scraping works.

Websites come in many shapes and forms and, as a result, web scrapers can vary in
functionality and features. So how do web scrapers work and tackle complex sites? First, a web
scraper will be given one or more URLs to load before scraping. The scraper then loads the
entire HTML code for the page in question. More advanced scrapers will render the entire
website including CSS and JavaScript elements. Then the scraper will either extract all the data
on the page or specific data selected by the user before the project is run. Ideally the user will
go through the process of selecting the specific data they want from the page. For example,
you might want to scrape an Amazon product page for prices and models but are not
necessarily interested in product reviews. Lastly, the web scraper will output the data that has
been collected into a format that is more useful to the user. Most web scrapers will output data
to a CSV or Excel spreadsheet, while more advanced scrapers will support other formats such
as JSON which can then be used for an API. Web scrapers can also come in many different
forms with very different features on a case-by-case basis. For example, web scrapers can
come as a browser extension or more powerful desktop application that is downloaded to your
computer. Web scrapers can also scrape sites locally from your computer, using your computer's
resources and your internet connection, or work in the cloud without using your computer's
resources or your internet connection.
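To make these steps more concrete, here is a minimal sketch in Python of the same flow: take the HTML of a page, extract only the selected data, and output it as CSV and JSON. It is only an illustration, not how any particular scraping tool works internally. The sample HTML, the class names "product", "name" and "price", and the helper names are all made up for this example, and the page is an inline string rather than a real URL.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would first load this from a URL.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Laptop</span><span class="price">799.00</span></li>
  <li class="product"><span class="name">Mouse</span><span class="price">19.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per product
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.rows.append({})  # start a new product record
        elif tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the selected spans.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape(html):
    """Extract the selected data from the page as a list of dicts."""
    parser = ProductParser()
    parser.feed(html)
    return parser.rows


def to_csv(rows):
    """Output the collected rows as CSV text, which Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape(SAMPLE_HTML)
csv_text = to_csv(rows)
json_text = json.dumps(rows)  # the same data as JSON
print(csv_text)
```

Note that this sketch only reads the raw HTML. It does not render CSS or JavaScript the way the more advanced scrapers described above do.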
By this point you can probably think of several different ways in which web scrapers can
be used. A few uses include scraping stock prices to make better investing decisions, scraping
data from Yellow Pages to generate leads, scraping data from a store locator to create a list of
business locations, scraping product data from sites like Amazon or eBay for competitor
analysis, or scraping sports stats for betting or fantasy leagues.
So now that you know the basics of web scraping, you're probably wondering which web
scraper is best for you. The obvious answer is that it depends! The more you know about your
web scraping needs, the easier it is to know which web scraper is best for you.

What is Web Scraping?


Web scraping is, mainly, the process of extracting data from a website. This could be
done manually, where a person copies and pastes information into a document. However, this
process could be quite tedious, especially if you are dealing with a large amount of data. This
is why numerous web scraping websites and applications provide programs that reduce the
workload.

What are the different web scraping applications?


Each web scraper comes with different features. Some web scrapers are offered as a
browser extension and others can be downloaded to your computer. Some scrapers extract data
locally from your computer and others work in the cloud. The web scraper that is best suited
for you really depends on what your needs are.

What are several instances where web scraping is useful?


Web scraping can be very useful in any instance where you need to collect data from a
certain website. Some examples include:
1. Extracting product lists, prices and descriptions from online selling websites.
2. Extracting food menu items, prices, ingredient lists and the like from online menus.
3. Collecting contact information from contact list sources.
Beyond these, the sky’s the limit to what web scraping can do.

For the purpose of this project, I will be using Parsehub as my web scraping tool to extract
data, as it is free and convenient to use. I will therefore create a web scraping manual
specifically for Parsehub as a guide to web scraping.
1. SETTING UP YOUR PARSEHUB APPLICATION
a. Visit https://www.parsehub.com
b. Click the ‘Download Parsehub for free’ button

c. Open the downloaded Parsehub setup file

d. Click yes to agree to changes

e. Click ‘next’ to set up the application
f. Select ‘standard’ for the setup preference

g. The setup will then start automatically

2. CREATING YOUR PARSEHUB ACCOUNT


a. After downloading, set up an account by pressing ‘sign up’
b. Create an account by providing the requested details

3. WEB SCRAPING
a. After setting up your account, the interface will look like this and you may now
start creating a project. A tutorial will automatically pop up to guide you through.
b. In creating a project, start by clicking on the

c. In the top left portion, you will be asked to paste the URL of the website you
wish to extract data from. In this case, we will be using a Parsehub sample in which
we extract data from a movie website. Paste the link into the blank space
provided. After pasting, click the green button ‘start project on this url’
d. Selecting elements is the basis of Parsehub. To pick the items you want to
interact with, click the blue button (selection) on the top left corner. Rename
‘selection 1’ to ‘movies’
e. Pick the movie titles you want to interact with. In this case, click ‘The Shawshank
Redemption’

f. Once you click the item, elements in the current selection will be highlighted
green. You will also be able to see yellow boxes. These are suggestions from
Parsehub which you might also want to select. To add another item, just click on
the yellow boxes.
g. After selecting the items you want, confirm your selection. You can confirm it
by checking the top left blue button, where you will be able to see how many
items you have selected. Remember that all items that you have picked are now
boxed in green.

h. Parsehub features a number of commands. Here are some of the commands it
features.
i. In Parsehub, your selection needs to have commands. To add commands, go
back to the top left blue button and click the ‘+’ sign

j. One of the most important commands in Parsehub is the ‘relative select’ button.
This allows you to add additional information about the selection. To add the
relative select button, click the plus sign, then click ‘relative select’
k. A new blue button will pop up. Change the name of the button to the classification
you want to select. In this example, rename the box ‘IMAX’
l. Next, click on the movie title, then the IMAX showtime. You will be able to
see an arrow. Do this for all the items you want to select.

m. Now, when selecting, we have to give Parsehub a pattern to follow. In this
portion, click the ‘character’ then the corresponding actor.

n. After selecting, we are now ready to extract data. To get your data, you have to
run your project. You can do this by clicking the gear icon on the top left corner of
the interface.
o. Then click ‘get data.’

p. A new pop up will then appear. This will be the Run page. Start running your
project by clicking the green button ‘Run’
q. Another pop up will then appear. Since the program still has to process the data,
you will have to wait a few minutes for it to finish.

r. Once your data is ready, the box will turn green, and you will be able to
download your results in different formats. Since we want the data in Excel
format, we will be selecting the CSV/Excel button.
s. Here is the extracted data. Every item selected in the earlier process is now
organized in the Excel file.
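
If you are curious what the selection and relative select steps do conceptually, here is a rough sketch in Python. It is not Parsehub's actual implementation, only an illustration of the idea: first select every element that follows the same pattern (the movies), then, for each one, look up the related value (the IMAX showtime) relative to that element. The page markup, the class names, the showtimes and the extra movie titles are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing page (kept well-formed so ElementTree can parse it).
SAMPLE_PAGE = """
<div>
  <div class="movie"><h2>The Shawshank Redemption</h2><span class="imax">7:00 PM</span></div>
  <div class="movie"><h2>The Godfather</h2><span class="imax">9:30 PM</span></div>
  <div class="movie"><h2>Casablanca</h2></div>
</div>
"""

root = ET.fromstring(SAMPLE_PAGE.strip())

results = []
# "Selection": every element that matches the same pattern as the first movie.
for movie in root.findall(".//div[@class='movie']"):
    title = movie.find("h2").text
    # "Relative select": search for the showtime inside this movie's element only.
    imax = movie.find("span[@class='imax']")
    results.append({
        "movies": title,
        "IMAX": imax.text if imax is not None else None,  # not every movie has one
    })

for row in results:
    print(row)
```

Because the showtime is looked up relative to each movie, a movie without an IMAX showtime simply gets an empty value instead of picking up another movie's showtime.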

And that is how you use Parsehub as a web scraping tool to extract data.
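
As an optional follow-up, the downloaded CSV can also be read by a short Python script instead of being opened in Excel. The sketch below is only an assumption about what such a file might look like: the column names "movies_name" and "movies_IMAX" and the sample rows are hypothetical, so check the header row of your own export before relying on them.

```python
import csv
import io

# Stand-in for the downloaded file. The column names and rows are assumed for
# illustration and are not taken from an actual Parsehub export.
EXPORTED_CSV = """movies_name,movies_IMAX
The Shawshank Redemption,7:00 PM
The Godfather,9:30 PM
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = load_rows(EXPORTED_CSV)
titles = [row["movies_name"] for row in rows]
print(titles)
```

To read a real file, you could pass `open("results.csv", newline="", encoding="utf-8")` to `csv.DictReader` instead of the `io.StringIO` wrapper, where "results.csv" is whatever name you saved the download under.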
