Perl Regular Expressions by Example
No, it doesn't. The binding operator =~ with the match operator m// does a pattern search on $mystring and returns true if the pattern is found. The pattern is whatever is between the m/ and the trailing /. (Note: there is no such thing as a ~= operator, and using it will give a compile error.)
Does the string contain the word "World", ignoring case?
if ($mystring =~ m/World/i) { print "Yes"; }
Yes, it does. The pattern modifier i immediately after the trailing / makes the match case-insensitive.
I want "Hello world!" to be changed to "Hello mom!" instead.
$mystring =~ s/world/mom/;
print $mystring;
www.somacon.com/p127.php 1/6
4/10/13
Prints "Hello mom!". The substitution operator s/// replaces the text matched by the pattern between s/ and the middle / with the replacement text between the middle / and the last /. In this case, "world" is replaced with the word "mom".
Now change "Hello mom!" to say "Goodbye mom!".
$mystring =~ s/hello/Goodbye/;
print $mystring;
This does not substitute, and prints "Hello mom!" as before. By default, the search is case-sensitive, so "hello" does not match "Hello". As before, use the pattern modifier i immediately after the trailing / to make the search case-insensitive.
Okay, ignoring case, change "Hello mom!" to say "Goodbye mom!".
$mystring =~ s/hello/Goodbye/i;
print $mystring;
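Collected into one script, the substitution steps above might look like the following. It is a sketch that assumes $mystring starts out as "Hello world!", the text the questions refer to; the variables $count, $miss and $hit are added only to show that s/// returns the number of substitutions it made (a false value when nothing matched).

```perl
use strict;
use warnings;

# Assumed starting text, matching the questions above.
my $mystring = "Hello world!";

# s/// edits $mystring in place and returns how many substitutions it made.
my $count = ($mystring =~ s/world/mom/);
print "$mystring\n";    # Hello mom!

# Case-sensitive by default: "hello" does not match "Hello", so nothing changes.
my $miss = ($mystring =~ s/hello/Goodbye/);
print "$mystring\n";    # Hello mom!

# The /i modifier makes the pattern case-insensitive.
my $hit = ($mystring =~ s/hello/Goodbye/i);
print "$mystring\n";    # Goodbye mom!
```

Because s/// edits the variable it is bound to, keep a copy first if you still need the original text.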
Prints "Yes". The pattern \d matches any single digit. In this case, the search finishes as soon as it reads the "2". Searching always goes from left to right.
Huh? Why doesn't "\d" match the exact characters '\' and 'd'? This is because Perl reuses ordinary letters to stand for things with special meaning, like digits. To tell a special symbol apart from a literal character, the character is immediately preceded by a backslash. Therefore, whenever you read '\' followed by a character, treat the two together as one symbol. For example, '\d' means a digit, '\w' means an alphanumeric character or '_', '\/' means a forward slash, and '\\' means a single literal backslash. Preceding a character with a '\' is called escaping, and the '\' together with its character is called an escape sequence.
Okay, how do I return the first matching digit from my string?
Prints "The first digit is 2." To mark part of a pattern for extraction, place parentheses around it. If the pattern matches, the text matched inside the parentheses is stored in the Perl special variable $1. If there are multiple parenthesized groups, their matches are stored in $1, $2, $3, and so on, numbered by their opening parentheses from left to right.
Huh? Why don't '(' and ')' match the parenthesis characters literally? This is because the designers of regular expressions felt that some constructs are so common that they should be written with unescaped characters. Besides parentheses, a number of other characters have special meanings when unescaped; these are called metacharacters. To match a parenthesis or another metacharacter literally, you have to escape it, like '\(' and '\)'. They designed it for their convenience, not to make it easy to learn.
Okay, how do I extract a complete number, like the year?
$mystring = "[2004/04/13] The date of this article.";
if ($mystring =~ m/(\d+)/) { print "The first number is $1."; }
Prints "The first number is 2004." First, when one says "complete number", one really means a group of one or more digits. The pattern quantifier + matches one or more of whatever immediately precedes it, in this case the \d. The match keeps consuming digits until it reaches the "/", so the search finishes after reading "2004".
How do I print all the numbers from the string?
$mystring = "[2004/04/13] The date of this article.";
while ($mystring =~ m/(\d+)/g) { print "Found number $1. "; }
Prints "Found number 2004. Found number 04. Found number 13. ". This introduces another pattern modifier, g, which tells Perl to do a global search on the string. In the condition of a while loop, each m//g match picks up where the previous one left off, so the loop visits every number from left to right and stops when no more matches are found.
How do I get all the numbers from the string into an array instead?
$mystring = "[2004/04/13] The date of this article.";
@myarray = ($mystring =~ m/(\d+)/g);
print join(",", @myarray);
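Prints "2004,04,13". This does the same thing as before, except that the values returned by the pattern search are assigned to @myarray. When m//g is used in list context like this, it returns all of the captured strings at once instead of one match per call.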
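The while loop and the array assignment differ only in context. In scalar context, each m//g match resumes at the position where the previous one ended, which pos() reports; in list context, every capture is returned at once. The sketch below shows both on the same date string, plus three capture groups filling $1, $2 and $3 (using the \/ escape for the slashes). The @found, $year, $month and $day variables are illustrative additions, not part of the original examples.

```perl
use strict;
use warnings;

my $mystring = "[2004/04/13] The date of this article.";

# Scalar context: each m//g match continues from where the last one ended.
my @found;
while ($mystring =~ m/(\d+)/g) {
    # pos() gives the offset just past the current match.
    push @found, "$1 ends at " . pos($mystring);
}
print join("; ", @found), "\n";

# List context: all captured numbers are returned in one go.
my @myarray = ($mystring =~ m/(\d+)/g);
print join(",", @myarray), "\n";    # 2004,04,13

# Three capture groups fill $1, $2 and $3 from left to right.
my ($year, $month, $day);
if ($mystring =~ m/(\d+)\/(\d+)\/(\d+)/) {
    ($year, $month, $day) = ($1, $2, $3);
}
print "Year $year, month $month, day $day\n";
```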
Prints " text always precedes the end of the ". The pattern .* combines two metacharacters that together match everything between "start" and "end". Specifically, the metacharacter . matches any character except a newline, and the pattern quantifier * matches zero or more of the preceding symbol.
That isn't exactly what I expected. How do I extract everything between "start" and the first "end" encountered?
$mystring = "The start text always precedes the end of the end text.";
if ($mystring =~ m/start(.*?)end/) { print $1; }
Prints " text always precedes the ". By default, the quantifiers are greedy. This means that when you write .*, Perl first matches every character (except newline) all the way to the end of the string, and then works backward (backtracks) until it finds "end", which is why the first version stopped at the last "end". To make a pattern quantifier miserly (non-greedy), follow it with the quantifier limiter ?. This tells Perl to match as few as possible of the preceding symbol before continuing to the next part of the pattern.
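Running both patterns side by side makes the difference visible. This sketch reuses the sentence from the example above; the second half adds a hypothetical two-line string to show that . stops at a newline unless the /s modifier is given.

```perl
use strict;
use warnings;

my $mystring = "The start text always precedes the end of the end text.";

# Greedy: .* runs to the end of the string, then backtracks to the LAST "end".
my ($greedy) = $mystring =~ m/start(.*)end/;

# Non-greedy: .*? takes as little as possible and stops at the FIRST "end".
my ($lazy) = $mystring =~ m/start(.*?)end/;

print "[$greedy]\n";    # [ text always precedes the end of the ]
print "[$lazy]\n";      # [ text always precedes the ]

# "." does not match a newline unless the /s modifier is given.
my $two_lines = "start one\ntwo end";
my $no_s   = $two_lines =~ m/start(.*)end/  ? "matched" : "no match";
my $with_s = $two_lines =~ m/start(.*)end/s ? "matched" : "no match";
print "$no_s, $with_s\n";    # no match, matched
```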
Conclusion
Regular expressions in Perl are very powerful, and there are many ways to do the same thing. I hope you find this page useful for getting started with regular expressions. Hopefully, you can now read the specifications and get more out of them.
Assertions
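Assertions have zero width: they match a position in the string rather than consuming characters.
^ - Matches at the beginning of the string (or of any line, with the /m modifier)
$ - Matches at the end of the string, or before a newline at the end (or at the end of any line, with /m)
\B - Matches everywhere except at a word boundary, that is, wherever \b does not match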
\b - Matches at a word boundary: between a word character and a non-word character, or between a word character and the start or end of the string
\A - Matches only at the beginning of a string
\Z - Matches only at the end of a string, or before a newline at the end
\z - Matches only at the end of a string
\G - Matches where the previous m//g left off
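A short script can exercise several of these assertions. The strings and variable names are invented for illustration; the comments give the value each variable should end up holding. Counting matches uses the common idiom of assigning a list-context m//g to an empty list and taking that assignment in scalar context.

```perl
use strict;
use warnings;

my $text = "cat concatenate cat\n";

# \b: word boundary, so only the two standalone words "cat" are counted.
my $whole_words = () = $text =~ m/\bcat\b/g;    # 2

# \B: "cat" not preceded by a word boundary (the one inside "concatenate").
my $inside = () = $text =~ m/\Bcat/g;           # 1

# \A anchors at the very start of the string.
my $starts = $text =~ m/\Acat/ ? 1 : 0;         # 1

# \Z also matches just before a trailing newline; \z does not.
my $ends_Z = $text =~ m/cat\Z/ ? 1 : 0;         # 1
my $ends_z = $text =~ m/cat\z/ ? 1 : 0;         # 0

# ^ matches only at the start of the string unless /m is used.
my $lines   = "first\nsecond\n";
my $caret   = $lines =~ m/^second/  ? 1 : 0;    # 0
my $caret_m = $lines =~ m/^second/m ? 1 : 0;    # 1

# \G: each match must start exactly where the previous m//g left off,
# so the loop stops at the first non-digit.
my $digits = "12a3";
my @leading;
push @leading, $1 while $digits =~ m/\G(\d)/g;  # (1, 2)

# Without \G, m//g simply skips ahead to the next digit.
my @all = $digits =~ m/(\d)/g;                  # (1, 2, 3)

print "@leading | @all\n";
```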