Regular Expres
You can control which characters are matched by listing them inside square brackets:
[abc] Match a, b or c
[a-z] Match any lowercase letter from a to z
[A-Z] Match any uppercase letter from A to Z
[a-zA-Z] Match any character from a to z and A to Z (any letter)
[0-9] Match any digit
[02468] Match any even digit
[^0-9] Match any character that is NOT a digit (^ at the start of the brackets means NOT)
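The character sets above can be tried out with a short sketch like the following. The sample strings are made up for illustration and are not from the original text; re.findall is used only because it makes each single-character match easy to see.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "Room 204, Floor 3b"

# [0-9] matches any single digit
digits = re.findall(r"[0-9]", text)

# [02468] matches only the even digits
even_digits = re.findall(r"[02468]", text)

# [^0-9] matches any single character that is NOT a digit
non_digits = re.findall(r"[^0-9]", "a1b2")

# [a-zA-Z] matches any single letter from a to z or A to Z
letters = re.findall(r"[a-zA-Z]", "a1b2")

print(digits)       # ['2', '0', '4', '3']
print(even_digits)  # ['2', '0', '4']
print(non_digits)   # ['a', 'b']
print(letters)      # ['a', 'b']
```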
* Match 0 or more times, e.g. \w* means match 0 or more word characters
+ Match 1 or more times, e.g. \w+ means match 1 or more word characters
? Match 0 or 1 times, e.g. \w? means match 0 or 1 word characters
{n} Match exactly n times, e.g. \w{3} means match exactly 3 word characters
{n,} Match at least n times, e.g. \w{5,} means match at least 5 word characters
{m,n} Match between m and n times, e.g. \w{5,7} means match 5-7 word characters
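As a rough illustration of the repetition rules above, the sketch below tests a few made-up words against each pattern. It assumes re.fullmatch (so the whole word must fit the pattern), and the helper name full_matches is hypothetical.

```python
import re

# Hypothetical test words, including the empty string
words = ["", "a", "cat", "hello", "regular", "expression"]

def full_matches(pattern):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

zero_or_more = full_matches(r"\w*")      # every word, including ""
one_or_more = full_matches(r"\w+")       # every word except ""
zero_or_one = full_matches(r"\w?")       # "" and "a"
exactly_three = full_matches(r"\w{3}")   # "cat"
at_least_five = full_matches(r"\w{5,}")  # "hello", "regular", "expression"
five_to_seven = full_matches(r"\w{5,7}") # "hello", "regular"

print(exactly_three)
print(at_least_five)
print(five_to_seven)
```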
m = re.search(r"the\s(\w+)", line)
This matches "the" followed by a single whitespace character (\s), followed by 1 or more word characters, which the parentheses capture as a group.
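A minimal runnable version of this search might look as follows. The contents of line are an assumption, since the original does not show where line comes from.

```python
import re

# Assumed input; the original text does not show the contents of `line`
line = "that is the question"

m = re.search(r"the\s(\w+)", line)

# re.search returns None when nothing matches, so check before using m
if m:
    whole_match = m.group(0)  # everything the pattern matched
    next_word = m.group(1)    # only the part captured by (\w+)
    print(whole_match)        # the question
    print(next_word)          # question
else:
    print("no match")
```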
Pattern extraction
If we had added extra groups, these would be available as m.group(2), m.group(3), etc., e.g.
try typing:
to get "To be, or not to be". Now look at the individual groups, e.g. type:
print(m.group(1))
print(m.group(2))
print(m.group(3))
(\w+) Match one or more word characters and capture them as a single group
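To see several groups at once, here is a small sketch with three capturing groups. Both the line and the pattern are assumptions made for illustration; they are not the pattern used in the original example.

```python
import re

# Assumed input line and pattern, for illustration only
line = "To be, or not to be: that is the question"

# Three (\w+) groups: two words, a comma, then a third word
m = re.search(r"(\w+)\s(\w+),\s(\w+)", line)

if m:
    print(m.group(0))  # To be, or   (the whole match)
    print(m.group(1))  # To
    print(m.group(2))  # be
    print(m.group(3))  # or
    print(m.groups())  # ('To', 'be', 'or')
```

Groups are numbered from 1 in the order their opening parentheses appear, and m.group(0) is always the whole match.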