
Web Crawler

A web crawler, or spider, is a type of bot typically operated by search engines such as Google and Bing. Its purpose is to index the content of websites across the Internet so that those websites can appear in search engine results.

Example:
import urllib.request
import urllib.error

def get_page(url):
    # Fetch the page source; return an empty string on failure.
    try:
        return urllib.request.urlopen(url).read().decode('utf-8')
    except urllib.error.URLError:
        return ""

def get_next_target(s):
    # Find the next '<a href=' tag; return the link target and
    # the position just past it, or (None, 0) if there are no more links.
    start_link = s.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = s.find('"', start_link)
    end_quote = s.find('"', start_quote + 1)
    url = s[start_quote + 1 : end_quote]
    return url, end_quote

def print_all_links(page):
    # Print every link target found in the page source.
    while True:
        url, endpos = get_next_target(page)
        if url:
            print(url)
            page = page[endpos:]
        else:
            break

print_all_links(get_page("https://www.facebook.com/"))
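Printing links is only the first step: a crawler also follows them to reach new pages. The sketch below (not part of the original example) collects all link targets from a page and does a breadth-first crawl. To keep it self-contained and offline, it crawls a tiny hypothetical in-memory "web" via a user-supplied fetch function; in practice you would pass a real fetcher such as get_page above.

```python
def get_next_target(s):
    # Locate the next '<a href=' tag; return (url, end position).
    start_link = s.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = s.find('"', start_link)
    end_quote = s.find('"', start_quote + 1)
    return s[start_quote + 1 : end_quote], end_quote

def get_all_links(page):
    # Collect every link target on the page into a list.
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url is None:
            break
        links.append(url)
        page = page[endpos:]
    return links

def crawl_web(seed, fetch):
    # Breadth-first crawl from the seed URL.
    # `fetch` maps a URL to its page source.
    to_crawl = [seed]
    crawled = set()
    while to_crawl:
        url = to_crawl.pop(0)
        if url in crawled:
            continue
        crawled.add(url)
        to_crawl.extend(get_all_links(fetch(url)))
    return crawled

# A tiny in-memory "web" for illustration (hypothetical pages):
site = {
    "a": '<a href="b">B</a> <a href="c">C</a>',
    "b": '<a href="a">A</a>',
    "c": "",
}
print(sorted(crawl_web("a", lambda u: site.get(u, ""))))  # prints ['a', 'b', 'c']
```

Tracking the crawled set prevents the crawler from looping forever on pages that link back to each other, such as "a" and "b" here.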
