Welcome to Scribd!

21 Project 2 Parsing Data From A HTML File With Python and REGEX

Uploaded by

0% found this document useful (0 votes)

11 views3 pages

This document discusses extracting tabular data from an HTML file using regular expressions (REGEX) in Python. It provides an example HTML file with country names and languages in a table. The goal is to extract the country and language values between the first and sixth indices. The code opens the HTML file, finds the values using a REGEX pattern, and prints the expected output of a list of tuples with the country and language for each row.

Original Description:

Original Title

21 Project 2 Parsing Data From a HTML File With Python and REGEX

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

11 views3 pages

21 Project 2 Parsing Data From A HTML File With Python and REGEX

Uploaded by

ArvindSharma

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 3

Search inside document

Project 2: Parsing data from a HTML file with Python

and REGEX

In this project we use REGEX to nd some values from a HTML le

WE'LL COVER THE FOLLOWING

• Solution

In this project, we want to extract tabular information from a HTML file (see
below). Our goal is to extract information available between and except the
first numerical index ( 1..6 ).

Consider the data.html file below:

<html>
<head>
<style>
table, th, td {
border: 1px solid black;
border-collapse: collapse;
}
th, td {
padding: 5px;
}
th {
text-align: left;
}
</style>
</head>
<body>
<table style="width:100%">
<tr align="center"><td>1</td> <td>England</td> <td>English</td></tr>
<tr align="center"><td>2</td> <td>Japan</td> <td>Japanese</td></tr>
<tr align="center"><td>3</td> <td>China</td> <td>Chinese</td></tr>
<tr align="center"><td>4</td> <td>Middle-east</td> <td>Arabic</td></tr>
<tr align="center"><td>5</td> <td>India</td> <td>Hindi</td></tr>
<tr align="center"><td>6</td> <td>Thailand</td> <td>Thai</td></tr>
</table>

</body>
</html>

If we load the HTML file onto a browser it should look like below:

A HTML file with table

Solution #
In this code, we first extract HTML data ( data.html ) and then find and extract
the values from the HTML code.

main.py

data.html

import re

with open('data.html', 'r') as myfile:

data=myfile.read().replace('\n', '')

result=re.findall(r'<td>\w+</td>\s<td>(\w+)</td>\s<td>(\w+)</td>',data)
print(result)

Expected output :

[('England', 'English'),
('Japan', 'Japanese'),
('China', 'Chinese'),
('India', 'Hindi'),
('Thailand', 'Thai')]

Unit4_CompletePPT_2024[1]
Document62 pages
Unit4_CompletePPT_2024[1]
ashayamal2003
No ratings yet
Basic (HTML, CSS, JavaScript)
Document30 pages
Basic (HTML, CSS, JavaScript)
Kyaw Zin Nyein
No ratings yet
HTML and CSS
Document77 pages
HTML and CSS
Laxmi Burri
No ratings yet
Practicals Web Technology
Document16 pages
Practicals Web Technology
AnshumanSrivastav
No ratings yet
Practical WebTech
Document39 pages
Practical WebTech
prasankumarmishra21
No ratings yet
Lab 1 - Basic HTML
Document9 pages
Lab 1 - Basic HTML
Hadeer Ehab
No ratings yet
Web Module 2
Document133 pages
Web Module 2
D28shravan v suvarna
No ratings yet
Ict Indivudual
Document6 pages
Ict Indivudual
Jonathan Tariku
No ratings yet
Practical 1 Introduction To HTML. Create A Basic HTML File
Document11 pages
Practical 1 Introduction To HTML. Create A Basic HTML File
Anil Suthar
No ratings yet
Web Desighning Lab Manual
Document75 pages
Web Desighning Lab Manual
Open World
No ratings yet
Tables in HTML
Document12 pages
Tables in HTML
Wanjiru Ian Sampaiyo Maloi
No ratings yet
1.HTML Program To Print Hello
Document37 pages
1.HTML Program To Print Hello
mayankdevsingh4
No ratings yet
Web Technologies - by G. Sreenivasulu Unit - I: Handout - 1
Document13 pages
Web Technologies - by G. Sreenivasulu Unit - I: Handout - 1
Rao Rao
No ratings yet
HTML Quick Reference
Document6 pages
HTML Quick Reference
Abdo Shair
No ratings yet
Group 1 Presenting
Document46 pages
Group 1 Presenting
jhonloyd.estrera
No ratings yet
Web Tech Ex 1-1
Document11 pages
Web Tech Ex 1-1
dummy.athik
No ratings yet
2.more On HTML
Document9 pages
2.more On HTML
Sourabh
No ratings yet
HTML 180911083335
Document14 pages
HTML 180911083335
muna cliff
No ratings yet
HTML Tables and Forms & Advanced CSS
Document39 pages
HTML Tables and Forms & Advanced CSS
Peaceful World
No ratings yet
Htmlbasicsforabeginner 180109035906
Document37 pages
Htmlbasicsforabeginner 180109035906
manu
No ratings yet
Basics of HTML
Document7 pages
Basics of HTML
shubhamchavhan0143
No ratings yet
Programming Table Attributes in HTML
Document31 pages
Programming Table Attributes in HTML
Boy Bawang
No ratings yet
Lab 05
Document8 pages
Lab 05
taimur baig
No ratings yet
An Introduction To HTML
Document118 pages
An Introduction To HTML
arjitcse
No ratings yet
Output
Document26 pages
Output
Patel Bansari
No ratings yet
Subject Code 308
Document41 pages
Subject Code 308
Abhishek
0% (1)
HTML
Document50 pages
HTML
harithakongi
No ratings yet
HTML
Document68 pages
HTML
Nushail Ziyadh
No ratings yet
Tables HTML
Document8 pages
Tables HTML
armygamer0007
No ratings yet
Unit 2 HTML
Document48 pages
Unit 2 HTML
Ashish Bhoye
No ratings yet
Subject Code 308
Document59 pages
Subject Code 308
Abhishek
No ratings yet
Web Tech Presentation
Document48 pages
Web Tech Presentation
Simbarashe Hlanya
No ratings yet
Practical 1
Document45 pages
Practical 1
lofofop718
No ratings yet
Web Page Development Using HTML
Document126 pages
Web Page Development Using HTML
Hussien Mekonnen
No ratings yet
Write A HTML Page To Print Hello World.: Code
Document22 pages
Write A HTML Page To Print Hello World.: Code
Krunal Shah
No ratings yet
Wta Svit Notes Mod 2
Document40 pages
Wta Svit Notes Mod 2
Vinayaka Gombi
No ratings yet
Unit Ii Web Designing
Document55 pages
Unit Ii Web Designing
rathna
No ratings yet
HTML Slide
Document14 pages
HTML Slide
Kim Sor
No ratings yet
Module 2 18CS63
Document40 pages
Module 2 18CS63
abhishek
No ratings yet
HTML
Document20 pages
HTML
a balu
No ratings yet
Web-Module II - Chapter1
Document16 pages
Web-Module II - Chapter1
Samba Shiva Reddy.g Shiva
No ratings yet
HTML
Document68 pages
HTML
Kumar Gaurav
No ratings yet
309P WP Record PDF
Document68 pages
309P WP Record PDF
Muni Gani
No ratings yet
Web Module II
Document40 pages
Web Module II
Tony
No ratings yet
316 WT Week1
Document7 pages
316 WT Week1
kondapuramprashanthreddy
No ratings yet
HTML-Tables, Forms and Frames
Document13 pages
HTML-Tables, Forms and Frames
xilice9481
No ratings yet
HTML
Document28 pages
HTML
Matteo Do
No ratings yet
S
Document41 pages
S
YUNG CHEE CHIN
No ratings yet
8 HTML Tables
Document9 pages
8 HTML Tables
debasishroutray358
No ratings yet
HTML Cheat Sheet
Document7 pages
HTML Cheat Sheet
sam deb
No ratings yet
HTML - Tables and Forms
Document33 pages
HTML - Tables and Forms
Ankit Kumar
No ratings yet
WD PPT HTML Unit-2
Document38 pages
WD PPT HTML Unit-2
kb181203
No ratings yet
html
Document45 pages
html
dheepikaraytp551
No ratings yet
Introduction To HTML: Created By: Blesilda B. Vocal
Document20 pages
Introduction To HTML: Created By: Blesilda B. Vocal
Maan Caboles
No ratings yet
WEB T 001
Document19 pages
WEB T 001
akshat.2226cs1031
No ratings yet
Smart Technologies: Ammber Nosheen
Document71 pages
Smart Technologies: Ammber Nosheen
Syed Arslan
No ratings yet
Notes - More On HTML
Document8 pages
Notes - More On HTML
asd
No ratings yet
02 Laboratory Exercise 2 - ARG
Document2 pages
02 Laboratory Exercise 2 - ARG
merryjoybarba05
No ratings yet
Hahha
Document2 pages
Hahha
Elyza Marione Gamboa
No ratings yet
Learn HTML and CSS In 24 Hours and Learn It Right | HTML and CSS For Beginners with Hands-on Exercises
From Everand
Learn HTML and CSS In 24 Hours and Learn It Right | HTML and CSS For Beginners with Hands-on Exercises
Ibnul Jaif Farabi
No ratings yet
9 Python Regex Group Functions
Document2 pages
9 Python Regex Group Functions
ArvindSharma
No ratings yet
16 Basics
Document2 pages
16 Basics
ArvindSharma
No ratings yet
18 Python Lookbehind
Document2 pages
18 Python Lookbehind
ArvindSharma
No ratings yet
8 Python Regex Match Vs Search Functions
Document2 pages
8 Python Regex Match Vs Search Functions
ArvindSharma
No ratings yet
7 Python Regex Search Function Coding Exercise
Document2 pages
7 Python Regex Search Function Coding Exercise
ArvindSharma
No ratings yet
3 Quiz 1 REGEX Patterns
Document4 pages
3 Quiz 1 REGEX Patterns
ArvindSharma
No ratings yet
4 Python Regex Match Function
Document4 pages
4 Python Regex Match Function
ArvindSharma
No ratings yet