
Web Scraping using Regular Expressions
Regex Parsing

• from bs4 import BeautifulSoup

  Beautiful Soup is a library that makes it easy to scrape information from
  web pages.
  Link: https://www.webscrapingapi.com/parse-html-like-a-pro-scraping-with-python-and-regex
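As a minimal sketch (assuming the bs4 package is installed), Beautiful Soup can parse an HTML string and pull out elements through the parse tree, without writing any regex:

```python
from bs4 import BeautifulSoup

# A small inline HTML document used only for illustration
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# The parse tree exposes tags as attributes of the soup object
print(soup.title.string)  # Demo
print(soup.p.text)        # Hello
```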
import requests

• The requests module allows you to send HTTP requests
  using Python.
• Make a request to a web page, and print the response
  text:
• import requests

  x = requests.get('https://www.irctc.co.in/nget/train-search')
  print(x.text)
• url = "https://akshardham.com/"
• page = requests.get(url)
• page.content

• # parse the data
• soup = BeautifulSoup(page.content, "html.parser")
• print(soup.prettify())

• import re
• re.findall(r'<title>(.*?)</title>', page.text)
Extract text that is before or after specific keywords.

text                    regex               capture group   result
price: $14.99 inc.VAT   price:\s+([^\s]+)   1               $14.99
4.2 out of 5 stars      ([^\s]+) out of     1               4.2
date: 2014-08-20        \d+-\d+-\d+         0               2014-08-20
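The three rows above can be checked directly with re.search; the "capture group" column is the number passed to .group() (group 0 is always the whole match):

```python
import re

# price: capture group 1 holds the token after "price:"
m = re.search(r'price:\s+([^\s]+)', 'price: $14.99 inc.VAT')
print(m.group(1))  # $14.99

# rating: capture group 1 holds the number before "out of"
m = re.search(r'([^\s]+) out of', '4.2 out of 5 stars')
print(m.group(1))  # 4.2

# date: group 0 is the whole match, so no capture group is needed
m = re.search(r'\d+-\d+-\d+', 'date: 2014-08-20')
print(m.group(0))  # 2014-08-20
```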


• import re

• # Example HTML content


• html_content = '''
• <html>
• <head>
• <title>Web Scraping Example</title>
• </head>
• <body>
• <h1>Web Scraping with Regular Expressions</h1>
• <ul>
• <li><a href="https://example.com/page1">Page 1</a></li>
• <li><a href="https://example.com/page2">Page 2</a></li>
• <li><a href="https://example.com/page3">Page 3</a></li>
• </ul>
• </body>
• </html>
• '''

• # Regular expression pattern to find links
• link_pattern = re.compile(r'<a\s+href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)

• # Find all links using the regular expression pattern


• links = link_pattern.findall(html_content)

• # Print the extracted links


• print("Extracted Links:")
• for link in links:
•     print(link)
Output:
• Extracted Links:
• https://example.com/page1
• https://example.com/page2
• https://example.com/page3
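For comparison (a sketch using the BeautifulSoup import shown earlier), the same links can be collected by walking the <a> tags instead of pattern-matching the raw HTML, which is more robust when attribute order or quoting varies:

```python
from bs4 import BeautifulSoup

html_content = '''
<html>
<body>
<ul>
<li><a href="https://example.com/page1">Page 1</a></li>
<li><a href="https://example.com/page2">Page 2</a></li>
<li><a href="https://example.com/page3">Page 3</a></li>
</ul>
</body>
</html>
'''

soup = BeautifulSoup(html_content, "html.parser")

# find_all collects every <a> tag; the href attribute holds the link
links = [a["href"] for a in soup.find_all("a", href=True)]
for link in links:
    print(link)
```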
