
Regular expressions are a powerful and (mostly) standardized way of searching, replacing, and parsing text with complex patterns of characters.
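
The sketch below shows those three jobs side by side using re.search, re.sub and re.findall from Python's standard re module. The sample sentence and the phone-number-like pattern are assumptions made up for illustration. They do not come from the original notes.

```python
import re

text = "Call 555-1234 or 555-9876 before 5pm."  # hypothetical sample text
pattern = r"[0-9]{3}-[0-9]{4}"                  # three digits, a dash, four digits

# Searching: is there a match anywhere in the text?
found = re.search(pattern, text) is not None

# Replacing: substitute every match with a mask
masked = re.sub(pattern, "XXX-XXXX", text)

# Parsing: collect every matching substring into a list
numbers = re.findall(pattern, text)

print(found)    # True
print(masked)   # Call XXX-XXXX or XXX-XXXX before 5pm.
print(numbers)  # ['555-1234', '555-9876']
```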

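from __future__ import print_function  # only needed on Python 2
import re

# Read every line of the file into a list (the file is closed afterwards)
with open("textfile", "r") as f:
    lines = f.readlines()

# Print every line that contains "dream"
for line in lines:
    if re.search(r"dream", line):
        print(line, end="")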

# re.IGNORECASE makes the search case-insensitive, so "Dream" also matches
for line in lines:
    if re.search(r"dream", line, re.IGNORECASE):
        print(line, end="")
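
To see the effect of the flag without a file, this minimal sketch runs the same search over a few hypothetical in-memory lines that stand in for the contents of "textfile".

```python
import re

# Hypothetical sample lines standing in for the contents of "textfile"
lines = ["I have a dream\n", "Dreams come true\n", "Nothing here\n"]

# Case-sensitive: only the lowercase "dream" matches
exact = [line for line in lines if re.search(r"dream", line)]

# Case-insensitive: "Dreams" matches as well
anycase = [line for line in lines if re.search(r"dream", line, re.IGNORECASE)]

print(len(exact))    # 1
print(len(anycase))  # 2
```
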
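
# \s matches one whitespace character and \w one word character, so this
# finds words that start with "the", such as " there" or " these"
for line in lines:
    if re.search(r"\sthe\w", line):
        print(line, end="")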
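
The following sketch checks which of several made-up strings contain a match for \sthe\w. The samples are hypothetical. Note that the matched text includes the leading whitespace character and only one character after "the". A standalone " the " does not match, because a space is not a word character.

```python
import re

samples = ["over there", "and then", "the end", "in the box", "bathe"]

# Keep only the strings with whitespace + "the" + a word character
hits = [s for s in samples if re.search(r"\sthe\w", s)]
print(hits)  # ['over there', 'and then']

# The match covers exactly the characters the pattern describes
first_match = re.search(r"\sthe\w", "over there").group(0)
print(repr(first_match))  # ' ther'
```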

You can control which characters are matched by listing them inside square brackets:

• [abc] Match a, b or c
• [a-z] Match any lowercase letter from a to z
• [A-Z] Match any uppercase letter from A to Z
• [a-zA-Z] Match any letter, lowercase or uppercase
• [0-9] Match any digit
• [02468] Match any even digit
• [^0-9] Match any character that is NOT a digit (a ^ at the start of the brackets means NOT)

# Match "th" followed by a, i or y, e.g. "that", "this" or "thy"
for line in lines:
    if re.search(r"th[aiy]", line):
        print(line, end="")

You can also use repetition in your matching.

• * Match 0 or more times, e.g. \w* means match 0 or more word characters
• + Match 1 or more times, e.g. \w+ means match 1 or more word characters
• ? Match 0 or 1 times, e.g. \w? means match 0 or 1 word character
• {n} Match exactly n times, e.g. \w{3} means match exactly 3 word characters
• {n,} Match at least n times, e.g. \w{5,} means match at least 5 word characters
• {m,n} Match between m and n times, e.g. \w{5,7} means match 5-7 word characters
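
To isolate each quantifier, this sketch uses re.fullmatch, which succeeds only if the whole string matches. The helper name whole and the test strings are hypothetical. The last two lines show why the \w{10,12} loop can report lines with longer words: re.search only needs a matching run somewhere in the string.

```python
import re

def whole(pattern, s):
    """Hypothetical helper: True if the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

print(whole(r"\w*", ""))            # True:  zero word characters is allowed
print(whole(r"\w+", ""))            # False: at least one is required
print(whole(r"\w?", "ab"))          # False: at most one is allowed
print(whole(r"\w{3}", "abc"))       # True:  exactly three
print(whole(r"\w{5,}", "abcd"))     # False: fewer than five
print(whole(r"\w{5,7}", "abcdef"))  # True:  six is within 5-7

# "internationally" has 15 letters: it contains a 10-12 letter run,
# so re.search finds one, but the whole word is too long for fullmatch
print(re.search(r"\w{10,12}", "internationally") is not None)  # True
print(whole(r"\w{10,12}", "internationally"))                  # False
```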

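# Match 10 to 12 word characters in a row. Because re.search looks
# anywhere in the line, lines containing longer words also match.
for line in lines:
    if re.search(r"\w{10,12}", line):
        print(line, end="")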

m = re.search(r"the\s(\w+)", line)
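This matches "the", followed by one whitespace character, followed by one or more word characters. The parentheses capture those word characters as a group, which is available as m.group(1).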

m.group(0) returns the entire matched substring, including "the" and the whitespace after it.
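
Here is a minimal sketch of group(0) versus group(1) on a hypothetical sample line. It also shows a side effect of the pattern. Nothing is required before "the", so the pattern can match the end of a longer word such as "breathe". The later loop avoids this by putting \s on both sides of "the".

```python
import re

line = "We went to the market and then the park.\n"  # hypothetical sample line

m = re.search(r"the\s(\w+)", line)
if m:                    # re.search returns None when nothing matches
    print(m.group(0))    # the market
    print(m.group(1))    # market

# No boundary before "the", so the end of "breathe" also matches
m2 = re.search(r"the\s(\w+)", "breathe in slowly")
print(m2.group(0))       # the in
```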

Pattern extraction
If we add extra groups, they are available as m.group(2), m.group(3), etc. For example, try typing:

m = re.search(r"to (\w+), or not (\w+) (\w+)", line, re.IGNORECASE)


print(m.group(0))

to get "To be, or not to be" (when line contains that phrase). Now look at the individual groups, e.g. type

print(m.group(1))

to get "be", then type

print(m.group(2))

to get "to", then finally type

print(m.group(3))

to get "be" again. Each (\w+) matches one or more word characters, and the parentheses around it capture that match as one numbered group.
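
If you want every group at once instead of calling m.group(n) repeatedly, Python match objects also provide m.groups(), which returns all captured groups as a tuple. Passing several numbers to m.group() returns only those groups. The sample line here is an assumption standing in for the line used above.

```python
import re

line = "To be, or not to be, that is the question.\n"  # assumed sample line

m = re.search(r"to (\w+), or not (\w+) (\w+)", line, re.IGNORECASE)
if m:
    print(m.group(0))     # To be, or not to be
    print(m.groups())     # ('be', 'to', 'be')
    print(m.group(1, 3))  # ('be', 'be')
```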

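# For each line containing " the <word>" (in any case), print the line
# and then the word captured by the group
for line in lines:
    m = re.search(r"\sthe\s(\w+)", line, re.IGNORECASE)
    if m:
        print(line, end="")
        print(m.group(1))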
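
The loop above reports only the first match on each line. As one possible extension, re.findall collects every match, and when the pattern has exactly one group it returns only the captured text. The sample line is hypothetical. The second call swaps in \b (a word boundary, not covered in these notes). It shows that \s cannot match at the very start of a line, so a line-initial "The" is missed.

```python
import re

line = "The cat sat on the mat by the door.\n"  # hypothetical sample line

# One group in the pattern, so findall returns the captured words
after_space = re.findall(r"\sthe\s(\w+)", line, re.IGNORECASE)
print(after_space)     # ['mat', 'door']

# \b also matches at the start of the line, so "The cat" is found too
after_boundary = re.findall(r"\bthe\s(\w+)", line, re.IGNORECASE)
print(after_boundary)  # ['cat', 'mat', 'door']
```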
