Regular Expressions
• DFA
Find the longest match
Deterministic, fast, state machine must be built
• NFA
Leftmost match, with alternatives tried in order:
the ordering of the RE changes the text matched.
• POSIX NFA
“longest of the leftmost”
for multiple matches starting at the same (leftmost)
position, return the one matching the most text.
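The ordering point can be illustrated with a short sketch in Python, whose `re` module is a backtracking engine that tries alternatives in order. The sample text and variable names are made up for the example.

```python
import re

text = "ab"

# Ordered alternation: the engine takes the first alternative that succeeds.
first_wins = re.match(r"a|ab", text).group()   # "a" is tried first and succeeds
reordered = re.match(r"ab|a", text).group()    # "ab" is tried first and succeeds

print(first_wins, reordered)
```

A DFA or POSIX NFA engine would report "ab" for both orderings, since both must return the longest match starting at the leftmost position.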
Regular Expressions
Lecture 14
EECS 498 Winter 2000
Examples
M = (Q, Σ, δ, q0, F) where Q = {q0, q1, q2, q3}, Σ = { 0, 1 }, F = { q0 }
State diagram (start state q0), written as a transition table:
State   on 0   on 1
q0      q1     q2
q1      q0     q3
q2      q3     q0
q3      q2     q1
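A minimal sketch of running this machine, assuming the diagram's arrows are read as: 0 moves between q0 and q1 and between q2 and q3, and 1 moves between q0 and q2 and between q1 and q3. Under that reading the machine accepts strings with an even number of 0s and an even number of 1s. The names `DELTA` and `accepts` are hypothetical.

```python
# Transition table as read from the state diagram (an assumption of this sketch).
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): "q0", ("q1", "1"): "q3",
    ("q2", "0"): "q3", ("q2", "1"): "q0",
    ("q3", "0"): "q2", ("q3", "1"): "q1",
}
START = "q0"
FINAL = {"q0"}

def accepts(s):
    """Run the DFA over s; exactly one transition is taken per input symbol."""
    state = START
    for ch in s:
        state = DELTA[(state, ch)]
    return state in FINAL

print(accepts("1001"), accepts("10"))
```

Each input symbol costs one table lookup, which is why a DFA is fast once the state machine has been built.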
An FSA to accept decimal strings:
[State diagram: states S, A, B and H, with transitions labeled ‘-’, ‘.’ and 0...9]
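The exact arrows of the diagram are not reproduced here, so the following is only a sketch of a comparable machine under assumed rules: an optional leading ‘-’, digits, an optional ‘.’ that must be followed by at least one digit. The state names and the function `is_decimal` are hypothetical and do not correspond to S, A, B, H.

```python
DIGITS = "0123456789"

# (state, input kind) -> next state; any missing entry means "reject".
TABLE = {
    ("start", "-"): "sign", ("start", "digit"): "int", ("start", "."): "dot",
    ("sign", "digit"): "int", ("sign", "."): "dot",
    ("int", "digit"): "int", ("int", "."): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}

def is_decimal(s):
    """Return True if s drives the machine from 'start' to an accepting state."""
    state = "start"
    for ch in s:
        kind = "digit" if ch in DIGITS else ch
        state = TABLE.get((state, kind))
        if state is None:          # no transition defined: reject immediately
            return False
    return state in ACCEPTING

print(is_decimal("-12.5"), is_decimal("12."))
```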
The DFA engine serves as the basis for some language parsing
engines ...
Structuring Elements for REs
Meta Chars         Meaning
. (period)         Match any single character
* (Kleene star)    Match zero or more occurrences of the preceding RE
[]                 Match any character within brackets
                     0-9 matches digits
                     a-z matches lowercase characters
                     “-” in first position matches “-”
                     “^” in first position inverts set
^                  Matches beginning of line
$                  Matches end of line
{a,b}              Match count of preceding pattern (from ‘a’ to ‘b’ times). ‘b’ optional
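Construction of NFA from RE
[State diagrams: each fragment runs from a start state S to a final state F, joined by λ-transitions]
Single expression E: S --E--> F
Alternation (E1 | E2): S --λ--> E1 --λ--> F and S --λ--> E2 --λ--> F
Closure (E*): λ-transitions around E let it be skipped or repeated between S and F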
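( + | - | λ ) d+ ≡ ( + | - | λ ) d d*
Step 1: S --(+|-|λ)--> A --d d*--> F
Step 2: S --+, - or λ--> A --d d*--> F
Step 3: S --+, - or λ--> A --d--> B --d*--> F
Step 4: S --+, - or λ--> A --d--> B --λ--> C --λ--> F, with a d loop on C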
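The construction can be sketched in code. The block below is a minimal illustration rather than the lecture's own algorithm: it builds fragments with λ-transitions for a symbol, alternation, concatenation and closure, then simulates the resulting NFA by tracking the set of reachable states. All names (`frag`, `alt`, `cat`, `star`, `accepts`, `EDGES`) are hypothetical, and the state numbering differs from the S, A, B, C, F labels of the diagrams.

```python
from itertools import count

new_state = count().__next__
EDGES = []   # (source, label, target); label None is a λ-transition

def frag(label):
    """Fragment S --label--> F; label is a string of allowed characters or None for λ."""
    s, f = new_state(), new_state()
    EDGES.append((s, label, f))
    return s, f

def alt(*frags):
    """Alternation: new S and F joined to every alternative by λ-transitions."""
    s, f = new_state(), new_state()
    for fs, ff in frags:
        EDGES.append((s, None, fs))
        EDGES.append((ff, None, f))
    return s, f

def cat(a, b):
    """Concatenation: λ from the final state of a to the start state of b."""
    EDGES.append((a[1], None, b[0]))
    return a[0], b[1]

def star(a):
    """Closure: λ-transitions allow a to be skipped or repeated."""
    s, f = new_state(), new_state()
    EDGES.extend([(s, None, a[0]), (a[1], None, f), (s, None, f), (a[1], None, a[0])])
    return s, f

def closure(states):
    """All states reachable from `states` using only λ-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in EDGES:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    start, final = nfa
    current = closure({start})
    for ch in text:
        moved = {dst for src, label, dst in EDGES
                 if src in current and label is not None and ch in label}
        current = closure(moved)
    return final in current

D = "0123456789"
# ( + | - | λ ) d d*
signed = cat(cat(alt(frag("+"), frag("-"), frag(None)), frag(D)), star(frag(D)))
print(accepts(signed, "-42"), accepts(signed, "+"))
```

Simulating with a set of states avoids backtracking altogether; a traditional NFA engine instead follows one path at a time and backs up on failure.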
Perl Operators & Operands
• $_, $ARG
default input and pattern-matching string
• $`, $PREMATCH
string preceding last successful match
• $&, $MATCH
string matched by last successful pattern match
• $', $POSTMATCH
string following last successful pattern match
• $+, $LAST_PAREN_MATCH
text matched by the last bracket (parenthesized group) of the last search
Any program that uses the match variables $`, $& or $' (or calls
functions that do) forces the match engine to make copies of the text
string used for matching. This can be VERY expensive for large strings.
$a = "hello world";
$a =~ /^he/     # true
$a !~ /^help/   # also true
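What the prematch, match and postmatch strings contain can be illustrated with a short Python sketch. Python has no such global variables, so the three pieces are sliced out of a match object instead; the pattern and variable names are made up for the example.

```python
import re

text = "hello world"
m = re.search(r"o w", text)

prematch = text[:m.start()]    # corresponds to the string preceding the match
matched = m.group(0)           # corresponds to the string matched
postmatch = text[m.end():]     # corresponds to the string following the match

print(repr(prematch), repr(matched), repr(postmatch))
```

Slicing on demand like this copies only the pieces actually requested.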
Regex Operators and their Precedence
Parentheses    (PATTERN)  (?:PATTERN)
Multipliers    ?  +  *  {m,n}  ??  +?  *?  {m,n}?
Alternation    |
Pattern Matching
• match: m/PATTERN/
• substitution: s/PATTERN/REPLACE/mods
• split: split PATTERN
Modifier Meaning
Traditional NFA
Matching Strategy
We’re given a string and a regular expression (RE). The
processing “engine” will establish if the string is a member of the
language defined by the RE.
Consider /x*y*/
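One behavior this pattern shows (whether it is the point the lecture went on to make is an assumption) is that both parts may match zero characters, so the pattern succeeds with an empty match at the first position it is tried. A sketch in Python, with made-up sample strings:

```python
import re

pattern = re.compile(r"x*y*")

m1 = pattern.search("xxyyz")   # matches "xxyy" starting at position 0
m2 = pattern.search("axxyy")   # matches the empty string at position 0, before "a"

print(m1.span(), m2.span())
```

In the second case the leftmost rule wins: the empty match at position 0 is reported even though a longer match, "xxyy", starts at position 1.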
\B             non-word boundary (inverse of \b)
\A             beginning of string
               matches only once, even if /m is specified
\Z             end of string
               matches only once, even if /m is specified
(?=PATTERN)    Lookahead assertion
               Matches if PATTERN follows
(?!PATTERN)    Negative lookahead assertion
               Matches if PATTERN does NOT follow
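The lookahead forms and \B can be tried in Python, which supports the same syntax for these three constructs. The sample strings are made up for the example.

```python
import re

# (?=PATTERN): "foo" only when "bar" follows; "bar" itself is not consumed.
pos = re.search(r"foo(?=bar)", "foobar")

# (?!PATTERN): "foo" only when "bar" does NOT follow.
neg = re.search(r"foo(?!bar)", "foobar foobaz")

# \B: "cat" only where it does not start at a word boundary.
inside = re.search(r"\Bcat", "concat")
at_edge = re.search(r"\bcat", "concat")

print(pos.span(), neg.span(), inside.span(), at_edge)
```

Because an assertion consumes no text, the first match ends right after "foo" and leaves "bar" available to whatever follows in the pattern.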
e) Quantified Atom
The atom is matched some number of times
Maximal (greedy)    Minimal (lazy)    Expected Range
*                   *?                {0,}
+                   +?                {1,}
?                   ??                {0,1}
The maximal (or greedy) form can cause LOTS of backtracking.
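The difference between the two forms shows up in what gets matched. In the Python sketch below (sample text made up), the greedy `.+` first consumes the rest of the line and then backs up until a ">" can match, while the lazy `.+?` extends one character at a time and stops at the first ">".

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy = re.search(r"<.+>", text).group()    # runs to the last ">"
lazy = re.search(r"<.+?>", text).group()     # stops at the first ">"

print(greedy)
print(lazy)
```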
\a,\n    alarm (bell), newline
\d,\D    digit, non-digit
\w,\W    word ([a-zA-Z_0-9]), non-word
\s,\S    whitespace, non-whitespace
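A quick sketch of the shorthand classes in Python, where they behave the same way on ASCII input such as the made-up sample below:

```python
import re

sample = "id_42 = 3.14\n"

digits = re.findall(r"\d+", sample)       # runs of digits
words = re.findall(r"\w+", sample)        # runs of word characters (letters, digits, _)
non_space = re.findall(r"\S+", sample)    # runs of non-whitespace
spaces = re.findall(r"\s", sample)        # individual whitespace characters

print(digits, words, non_space, spaces)
```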
RE Extensions for Perl
• Comment: (?#text)
• Grouping without backreference (no capture): (?:PATTERN)
• Embedded modifier: (?i)
$pattern = "testString";
if (/$pattern/i)
same as
$pattern = "(?i)testString";
if (/$pattern/)
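The same equivalence can be sketched in Python, where `re.IGNORECASE` plays the role of the /i modifier and `(?i)` at the start of the pattern is the embedded form. The subject string is made up for the example.

```python
import re

subject = "This is a TESTSTRING"

with_flag = re.search("testString", subject, re.IGNORECASE)   # modifier given outside the pattern
inline = re.search("(?i)testString", subject)                 # modifier embedded in the pattern

print(with_flag.group(), inline.group())
```

The embedded form is useful when the pattern arrives as data, for example from a configuration file, and the caller has no way to pass a modifier separately.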
POSIX NFA
Must examine ALL cases to find the longest leftmost match
(not just the first match)
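The cost of examining all cases can be seen in a brute-force sketch. The hypothetical function `leftmost_longest` below is not how a POSIX engine is implemented; it simply emulates the leftmost-longest rule on top of Python's backtracking `re` by trying every end position, longest first, at each start position.

```python
import re

def leftmost_longest(pattern, text):
    """At the first start position where any match exists, return the longest match."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate first so the first success is the longest.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None

first = re.search(r"a|ab", "xab").group()      # first alternative that succeeds
longest = leftmost_longest(r"a|ab", "xab")     # longest match at the leftmost position

print(first, longest)
```

The traditional engine stops at "a" because that alternative succeeds first, while the leftmost-longest rule yields "ab" regardless of how the alternatives are ordered.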