
Your Oracle RX -

Regular Expressions in an Oracle World

Gravenstein, Rebholtz
Definition
• Regular expressions provide a concise and flexible means of identifying strings of text of interest, such as particular characters, words, or patterns of characters
Regular Expressions
• Used to search strings to find matching patterns
• Match patterns range from fairly simple to extremely difficult
• It’s much easier to understand your own expressions!
• The search for a match starts at the beginning of the string and stops when the first match is found
Found In
• Text editors
• Unix utilities
  • ed (text editor)
  • grep
• Programming languages
  • Perl
  • Tcl
• And since Oracle 10g
  • SQL
  • Application Express
Patterns
• Generally, characters and patterns represent themselves
• Special characters
  .    Matches any character
  \    Escapes special characters
  \n   New line
  \r   Carriage return
  \t   Tab
  \s   Space
Sample Patterns
• Pattern '800' matches 800
• Pattern 'ORA' matches ORA
• Pattern '.....-....' matches
  • 44313-2323
  • A4313-d3r3
  • 444-313-23234
Simple Repetition
• Quantifiers
  ?   Makes the previous item optional (0 or 1 times)
  +   Repeats the previous item 1 or more times
  *   Repeats the previous item 0 or more times
Quiz
• Given a 9-digit zip code, will the pattern match?
  '.....-....'   Yes
  '.?-.+'        Yes
  '.*-.+'        Yes
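A quick way to check the quiz answers is to run the three patterns against a sample value. This is a minimal sketch in Python's `re` module, used here as a stand-in for Oracle; the zip code is an assumed sample, not one given on the quiz slide.

```python
import re

zip_code = "44313-2323"  # assumed sample value

quiz_patterns = [r".....-....", r".?-.+", r".*-.+"]

# For each pattern, record the text that the first match covers.
quiz_hits = {p: re.search(p, zip_code).group(0) for p in quiz_patterns}

for pattern, hit in quiz_hits.items():
    print(f"{pattern!r} matched {hit!r}")
```

All three match, but `.?-.+` only needs at most one character before the dash, so it covers less of the string than the other two.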
Quantifiers
• {count} defines an exact repetition count of the prior object
• A{5} matches AAAAA, and also the first five characters of AAAAAAAA
• Zip code could be defined as .{5}-.{4}
More on Quantifiers …
• {m,n} defines a repetition range for the prior object
  • m minimum number of matches
  • n maximum number of matches
  • A{1,5} matches A and AAA, and the first five characters of AAAAAA
• {m,} defines a repetition count of m or more
Sample Patterns
Does the pattern 216-.{3,}-.{4,4} match?

216-588-5023         Yes
216-5888888-5023     Yes
011-216-588-5023     Yes: skip the 011 and you get the match
011-216--58-5-23     Yes: . matches anything
216-5888-888-5023    Yes: {3,} eats all characters up to the -5023
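The counted quantifiers can be exercised with the same five inputs. The following is a sketch in Python's `re` module, assumed here to behave like Oracle for this pattern.

```python
import re

# Literal 216-, then three or more of anything, a dash, then exactly four of anything.
PHONE = re.compile(r"216-.{3,}-.{4,4}")

candidates = [
    "216-588-5023",
    "216-5888888-5023",
    "011-216-588-5023",
    "011-216--58-5-23",
    "216-5888-888-5023",
]

phone_results = {c: PHONE.search(c) is not None for c in candidates}

# {3,} is greedy: it runs to the last dash that still leaves four characters.
greedy_hit = PHONE.search("216-5888-888-5023").group(0)
skipped_prefix_hit = PHONE.search("011-216-588-5023").group(0)

print(phone_results)
print(greedy_hit, skipped_prefix_hit)
```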
Anchors
^   Start of line
$   End of line

Does ^.....-....$ match?

44313-1234     Yes
441313-1234    No
4431--abcd     Yes
44313-1234c    No
a44313-1234    No
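To see the effect of the anchors, the same five strings can be tested in a few lines. This sketch uses Python's `re` module as a stand-in for Oracle.

```python
import re

# ^ and $ force the pattern to cover the whole string, not just part of it.
ANCHORED = re.compile(r"^.....-....$")

candidates = [
    "44313-1234",
    "441313-1234",
    "4431--abcd",
    "44313-1234c",
    "a44313-1234",
]

anchored_results = {c: ANCHORED.search(c) is not None for c in candidates}

for text, ok in anchored_results.items():
    print(f"{text:<12} {'Yes' if ok else 'No'}")
```

Note that 4431--abcd still passes: the dot accepts the first dash as the fifth character and the letters as the last four.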
Alternate and Grouping
[char] character list
| alternation (boolean or operator)
() group subexpression
Character Expression
[abc] defines a list of characters; it matches a single a, b, or c

Our zip code match can be expressed as

[0123456789]{5}-[0123456789]{4}

[0123456789]   Any number
{5}            5 of the prior pattern
-              A dash
[0123456789]   Any number
{4}            4 of the prior pattern
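More on alternation
[9]|[1]   9 or 1

What is this string matching?

([8]{1})|([9]{1})[1234567890]{2}-[1234567890]{3}-[1234567890]{4}

([8]{1})   Group 1
|          Or the groups
([9]{1})   Group 2
Order makes no difference

Intended matches: 900-555-5125 or 823-123-4567

Note: | has the lowest precedence, so as written the first branch is the 8 on its own. Wrapping the two groups in an outer group, (([8]{1})|([9]{1})), applies the rest of the pattern to both branches.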
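How far the alternation reaches is easy to check by looking at the text each match covers. The sketch below uses Python's `re` module as a stand-in for Oracle; the `REGROUPED` variant is an illustrative alternative, not a pattern from the slide.

```python
import re

TAIL = r"[1234567890]{2}-[1234567890]{3}-[1234567890]{4}"

# Pattern as it appears on the slide.
AS_WRITTEN = re.compile(r"([8]{1})|([9]{1})" + TAIL)

# Hypothetical variant with an outer group around the alternation.
REGROUPED = re.compile(r"(([8]{1})|([9]{1}))" + TAIL)

written_8 = AS_WRITTEN.search("823-123-4567").group(0)
written_9 = AS_WRITTEN.search("900-555-5125").group(0)
regrouped_8 = REGROUPED.search("823-123-4567").group(0)
lone_eight_written = AS_WRITTEN.search("8") is not None
lone_eight_regrouped = REGROUPED.search("8") is not None

print(written_8, written_9, regrouped_8)
```

Both numbers are reported as matching either way, but the as-written pattern is satisfied by the single character 8.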
Ranges
[a-z] matches any letter from a to z
[0-9] matches any digit from 0 to 9
Order is important: a range must start with the lower value and go to the higher
[a-zA-Z] matches any letter, lower or upper case

Does [a0-3a-zA-Z] match?

1A      Yes
45a     Yes
?*34    Yes
9       No
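The range quiz can be replayed to see which character triggers each match. This is a small sketch in Python's `re` module, assumed to treat this character list the same way Oracle does.

```python
import re

# One character that is a, a digit 0-3, or any letter.
RANGE = re.compile(r"[a0-3a-zA-Z]")

inputs = ["1A", "45a", "?*34", "9"]

# Record the first matching character, or None when nothing in the string qualifies.
first_hit = {}
for text in inputs:
    m = RANGE.search(text)
    first_hit[text] = m.group(0) if m else None

print(first_hit)
```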
Predefined Character Classes
(restricted to character lists)

[:alnum:]   alphanumeric characters
[:alpha:]   alphabetic characters
[:blank:]   blank space characters
[:cntrl:]   control characters (non-printing)
[:digit:]   any numeric
[:lower:]   lower case alphabetic characters
[:print:]   printable characters
[:punct:]   punctuation characters
[:space:]   space and non-printing (newline...)
[:upper:]   upper case alphabetic characters
New Zip Code
Does [[:digit:]]{5}-[[:digit:]]{4} match?

44313-1234     Yes
441313-1234    Yes
4431--abcd     No
44313-1234c    Yes
a44313-1234    Yes
Groups
What is this?

(^[[:digit:]]{5})(-[[:digit:]]{4})?$

Note:
[^[:digit:]]
This is a special case of ^, as it is in a character list
When ^ is the first character inside a character list, it negates the list instead of anchoring to the start of the string
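The two roles of ^ can be seen side by side in a short test. This sketch uses Python's `re` module, which does not support POSIX classes such as [[:digit:]], so [0-9] is swapped in as an assumed equivalent.

```python
import re

# Five digits, optionally followed by a dash and four more digits.
# [0-9] stands in for [[:digit:]], which Python's re module lacks.
ZIP = re.compile(r"(^[0-9]{5})(-[0-9]{4})?$")

# Inside a character list, a leading ^ negates: any character that is NOT a digit.
NON_DIGIT = re.compile(r"[^0-9]")

zip_inputs = ["44313", "44313-1234", "44313-12", "a44313"]
zip_results = {s: ZIP.search(s) is not None for s in zip_inputs}

first_non_digit = NON_DIGIT.search("44313-1234").group(0)

print(zip_results, first_non_digit)
```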
“Greediness”
Regular expression quantifiers are greedy: they match the maximum set
[a-z]+ will match the entire string abcedef

[a-z]+? (add ? after the quantifier) gives the lazy, or minimum, match

It will match just the letter a in 123abce
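Greedy and lazy matching can be compared directly. The sketch below uses Python's `re` module, assumed to share the greedy and lazy quantifier semantics described above.

```python
import re

# Greedy: + takes as many letters as it can.
greedy_all = re.search(r"[a-z]+", "abcedef").group(0)
greedy_tail = re.search(r"[a-z]+", "123abce").group(0)

# Lazy: +? stops as soon as the minimum (one letter) is satisfied.
lazy_first = re.search(r"[a-z]+?", "123abce").group(0)

print(greedy_all, greedy_tail, lazy_first)
```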


“Eagerness”
Greedy and eager are not the same

Oracle’s regular expression parser is eager.


What that means is that the parser stops looking for a match once one is found in an alternation group.

e.g. given the pattern 'regex|regex not' and the text 'regex not', an eager parser returns 'regex'
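Eager alternation is simple to reproduce. This sketch uses Python's `re` module, which also takes the first alternative that succeeds; treating it as equivalent to Oracle's parser here is an assumption.

```python
import re

text = "regex not"

# The first alternative succeeds, so the longer second one is never tried.
eager_hit = re.search(r"regex|regex not", text).group(0)

# Listing the longer alternative first changes the result.
reordered_hit = re.search(r"regex not|regex", text).group(0)

print(repr(eager_hit), repr(reordered_hit))
```

When one alternative is a prefix of another, putting the longer one first is the usual workaround.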


What are these?
^[[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4,4}$

Matches SSN 302-77-1234

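[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?

Matches floating-point number +23.344E+12
(the decimal point is escaped as \. so that it matches only a literal period; an unescaped . matches any character)

SELECT REGEXP_REPLACE(
  '23.2323e+12',
  '[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?',
  'match')
FROM dual;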
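The floating-point pattern can be compared with the period escaped (\.) and unescaped (.). This is a sketch in Python's `re` module, used as a stand-in for Oracle; the input 23x2323 is an invented counterexample.

```python
import re

FP_ESCAPED = re.compile(r"[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?")
FP_UNESCAPED = re.compile(r"[-+]?([0-9]+.?[0-9]*|.[0-9]+)([eE][-+]?[0-9]+)?")

# fullmatch() requires the pattern to cover the entire string.
good_number = FP_ESCAPED.fullmatch("+23.344E+12") is not None
junk_escaped = FP_ESCAPED.fullmatch("23x2323") is not None
junk_unescaped = FP_UNESCAPED.fullmatch("23x2323") is not None

print(good_number, junk_escaped, junk_unescaped)
```

With the period unescaped, the stray x is accepted where the decimal point should be.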
Amtrust Example
What’s the functional difference between

[ORA|SP2|EXP|IMP|KUP|DBV|LCD|QSM|RMAN|LRM|
LFI|PLS|AMD|TNS|NNC|NNO|NNL|NPL|NNF|NMP|
NCR|NZE|MOD|O2F|O2I|O2U|PCB|PCF|PCC|SQL|
EPC]-[0-9][0-9][0-9][0-9]|Sql\*Loader-[0-9][0-9]|
SQL\*Loader-[0-9][0-9]
(matched 10-MAR-2008; note the \ that escapes the *)

and

(ORA|SP2|EXP|IMP|KUP|DBV|LCD|QSM|RMAN|LRM|
LFI|PLS|AMD|TNS|NNC|NNO|NNL|NPL|NNF|NMP|
NCR|NZE|MOD|O2F|O2I|O2U|PCB|PCF|PCC|SQL|
EPC)-[0-9][0-9][0-9][0-9]|Sql\*Loader-[0-9][0-9]|
SQL\*Loader-[0-9][0-9]
(does not match 10-MAR-2008)
The second pattern uses a group instead of a character list
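The difference between the two bracket styles shows up clearly on the date string. The sketch below uses Python's `re` module as a stand-in for Oracle, and the prefix list is shortened to four entries for illustration.

```python
import re

# Shortened, illustrative subset of the prefixes on the slide.
# Square brackets: ONE character drawn from the set O, R, A, |, S, P, 2, ...
CHAR_LIST = re.compile(r"[ORA|SP2|EXP|IMP]-[0-9][0-9][0-9][0-9]")

# Round brackets: one of the whole words ORA, SP2, EXP, IMP.
GROUP = re.compile(r"(ORA|SP2|EXP|IMP)-[0-9][0-9][0-9][0-9]")

date_text = "10-MAR-2008"

char_list_hit = CHAR_LIST.search(date_text).group(0)
group_hit_on_date = GROUP.search(date_text)
group_hit_on_error = GROUP.search("ORA-1555").group(0)

print(char_list_hit, group_hit_on_date, group_hit_on_error)
```

The character list accepts the single R in MAR, which is why the date is wrongly flagged as an error code.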
Groups
Specified with ( )
Oracle supports up to 9
\1 is defined by the first open (
\2 is defined by the second open (
\3 is defined by the third open (
And so on through to \9

Once defined, they can be used later in a pattern as a back reference
Back References
Round brackets create groups, which can be referenced later in the same pattern or in a replacement pattern.

HTML tag:
<([A-Z][A-Z0-9]*)\b*([^>])*>.*</\1>

([A-Z][A-Z0-9]*)   Must start with a capital letter, possibly followed by additional letters
\b*([^>])*         A blank followed by zero or more characters other than >; this optional group can occur 0 or more times; these are attributes
\1                 Opening tag again; must match exactly the text in the first parens
Back References
Given pattern (([0-9]){3}-([0-9]){3}-([0-9]{4}))

And string “My number, 216-588-5023, is working”

What are:
\1 = 216-588-5023
\2 = 6
\3 = 8
\4 = 5023
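Group numbering and in-pattern back references can both be checked with a short script. This sketch uses Python's `re` module as a stand-in for Oracle; `TAG` is a simplified, hypothetical variant of the HTML tag pattern, not the one from the slide.

```python
import re

text = "My number, 216-588-5023, is working"

# Groups are numbered by the position of their opening parenthesis.
m = re.search(r"(([0-9]){3}-([0-9]){3}-([0-9]{4}))", text)
groups = [m.group(i) for i in range(1, 5)]

# Simplified tag pattern (assumption): \1 must repeat the opening tag name.
TAG = re.compile(r"<([A-Z][A-Z0-9]*)[^>]*>.*</\1>")
balanced = TAG.search("<B class=x>bold</B>") is not None
mismatched = TAG.search("<B>bold</I>") is not None

print(groups, balanced, mismatched)
```

A group that sits under a quantifier, such as ([0-9]){3}, keeps only the last character it matched, which is why \2 is 6 and \3 is 8.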
Oracle Reg Expression Functions
10g provides these regular expression analogs for
existing string functions

REGEXP_LIKE
REGEXP_INSTR
REGEXP_SUBSTR
REGEXP_REPLACE

Source text can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB or NCLOB
Oracle 10g Regular Expressions
REGEXP_LIKE returns TRUE if a match is found, otherwise FALSE

REGEXP_INSTR returns the character position of the first match, otherwise 0

REGEXP_SUBSTR returns the first matched string, or null if no matches are found

REGEXP_REPLACE replaces all matched strings, and returns the original string if no matches are found
Function Match Parameters
i   case-insensitive matching
c   case-sensitive matching, the default
n   allows the . (period) to match the newline character
m   treats the source as multiple lines, so that the ^ and $ anchors work on each line
x   ignores white space characters in the pattern
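Each match parameter has a rough counterpart among Python's `re` flags. The pairing in this sketch is an analogy assumed for illustration, not a statement about how Oracle implements the parameters.

```python
import re

# 'i' vs 'c': case-insensitive vs the case-sensitive default.
insensitive = re.search(r"ora", "ORA-1555", re.IGNORECASE) is not None
sensitive = re.search(r"ora", "ORA-1555") is not None

# 'n': let the period match a newline (DOTALL in Python).
dot_with_n = re.search(r"a.b", "a\nb", re.DOTALL) is not None
dot_default = re.search(r"a.b", "a\nb") is not None

# 'm': ^ and $ apply to every line (MULTILINE in Python).
per_line = re.findall(r"^\w+$", "one\ntwo", re.MULTILINE)
whole_string = re.findall(r"^\w+$", "one\ntwo")

# 'x': white space in the pattern is ignored (VERBOSE in Python).
spaced_pattern = re.search(r"[0-9]{5} - [0-9]{4}", "44313-1234", re.VERBOSE) is not None

print(insensitive, sensitive, dot_with_n, dot_default, per_line, whole_string, spaced_pattern)
```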
REGEXP_LIKE
REGEXP_LIKE(
source,
pattern, -- Match pattern
match_parameters -- Function parameters
) RETURN BOOLEAN
REGEXP_LIKE
u31125@DEDW> SELECT 'Yes this is a match'
  FROM dual
  WHERE regexp_like('44313-2345',
                    '[[:digit:]]{5}-[[:digit:]]{4}'
                   );

'YESTHISISAMATCH'
-------------------
Yes this is a match
REGEXP_INSTR
REGEXP_INSTR(
  source,
  pattern,
  start_position,   -- start searching from here
  occurrence,       -- which occurrence should be returned
  return_position,  -- 0 = position of the start of the occurrence
                    -- 1 = position of the character after the occurrence
  match_parameters
) RETURN NUMBER
REGEXP_INSTR
U31125@DEDW> SELECT
  REGEXP_INSTR(
    'The quick red fox jumped over the lazy brown dog.',
    'quick',
    1,     -- start searching from here
    1,     -- which occurrence should be returned
    0,     -- 0 = start of occurrence
           -- 1 = character after the occurrence
    NULL   -- match_parameters
  ) match_pos
FROM dual;

MATCH_POS
----------
5

1 row selected.
REGEXP_INSTR
U31125@DEDW> SELECT
  REGEXP_INSTR(
    'The quick red fox jumped over the lazy brown dog.',
    'quick',
    1,     -- start searching from here
    1,     -- which occurrence should be returned
    1,     -- 0 = start of occurrence
           -- 1 = character after the occurrence
    NULL   -- match_parameters
  ) match_pos
FROM dual;

MATCH_POS
----------
10

1 row selected.
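The two results, 5 and 10, follow from 1-based counting. The sketch below imitates that arithmetic in Python; `regexp_instr` is a hypothetical helper written for this illustration and covers only the return_position argument.

```python
import re

def regexp_instr(source, pattern, return_position=0):
    """Hypothetical helper: 1-based position of the first match, or 0 if none.

    return_position=0 gives the first character of the match,
    return_position=1 gives the character just after the match.
    """
    m = re.search(pattern, source)
    if m is None:
        return 0
    zero_based = m.end() if return_position == 1 else m.start()
    return zero_based + 1

sentence = "The quick red fox jumped over the lazy brown dog."

start_pos = regexp_instr(sentence, "quick", 0)
after_pos = regexp_instr(sentence, "quick", 1)
missing = regexp_instr(sentence, "slow", 0)

print(start_pos, after_pos, missing)
```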
REGEXP_REPLACE
REGEXP_REPLACE(
source,
pattern,
rep_string, -- replaces matched text
position, -- search start position
occurrence, -- match occurrence
-- 0 replaces all matches
match_parameters
) RETURN VARCHAR2
REGEXP_REPLACE
 Compress two or more spaces
SELECT
  REGEXP_REPLACE(
    '500   Oracle     Parkway,    Redwood  Shores, CA',
    '( ){2,}',
    ' '
  ) "REGEXP_REPLACE"
FROM DUAL;

REGEXP_REPLACE
------------------------------------------
500 Oracle Parkway, Redwood Shores, CA
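The space-compression idea carries over to other regex engines. Here is a sketch with Python's `re.sub`; the runs of spaces in the input are assumed for illustration.

```python
import re

# Input with runs of two or more spaces (spacing assumed for the example).
address = "500   Oracle     Parkway,    Redwood  Shores, CA"

# Replace every run of two or more spaces with a single space.
compressed = re.sub(r"( ){2,}", " ", address)

print(compressed)
```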
REGEXP_REPLACE
Provide better error message

SELECT REGEXP_REPLACE (
  'Error on employee <s1> whose name is <s2>',
  '(.*)<s1>(.*)<s2>',
  '\1A31124\2RGravenstein'
) AS "REGEXP_REPLACE"
FROM DUAL;

REGEXP_REPLACE
--------------------------------------------------
Error on employee A31124 whose name is RGravenstein

1 row selected.
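Back references in the replacement string work the same way in most engines. This sketch uses Python's `re.sub`, where the unambiguous \g<1> form is used in place of \1; that substitution is a Python detail, not Oracle syntax.

```python
import re

template = "Error on employee <s1> whose name is <s2>"

# Group 1 captures the text before <s1>, group 2 the text between <s1> and <s2>.
message = re.sub(
    r"(.*)<s1>(.*)<s2>",
    r"\g<1>A31124\g<2>RGravenstein",
    template,
)

print(message)
```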
REGEXP_SUBSTR
REGEXP_SUBSTR(
source,
pattern,
position, -- search start position
occurrence, -- which occurrence should
-- be found and sub-stringed
match_parameter
) RETURN VARCHAR2
Parsing Example
set serveroutput on
DECLARE
  x VARCHAR2(20);
  y VARCHAR2(20);
  c VARCHAR2(40) := '1:3,4:6,8:10,3:4,7:6,11:12';
BEGIN
  -- First run of non-colon characters found
  x := REGEXP_SUBSTR(c,'[^:]+', 1, 1);
  -- Second run of non-comma characters: starts after the first
  -- non-comma run and continues through to the second comma
  y := REGEXP_SUBSTR(c,'[^,]+', 1, 2);

  dbms_output.put_line('<'||x||'-'||y||'>');
END;
/
<1-4:6>
PL/SQL procedure successfully completed.
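The occurrence argument is what makes this parsing trick work. The sketch below imitates it in Python; `regexp_substr` is a hypothetical helper written for this illustration and ignores the position and match-parameter arguments.

```python
import re

def regexp_substr(source, pattern, occurrence=1):
    """Hypothetical helper: the n-th non-overlapping match, or None."""
    matches = [m.group(0) for m in re.finditer(pattern, source)]
    if occurrence > len(matches):
        return None
    return matches[occurrence - 1]

c = "1:3,4:6,8:10,3:4,7:6,11:12"

x = regexp_substr(c, r"[^:]+", 1)  # first run of non-colon characters
y = regexp_substr(c, r"[^,]+", 2)  # second run of non-comma characters

line = f"<{x}-{y}>"
print(line)
```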
Where to use Regular Expression
• Validation of (lots of examples can be found on the internet)
  • E-mail addresses
  • Credit card numbers
  • SSN
  • …
• Complicated string parsing
  • Where the standard Oracle functions LIKE, INSTR, REPLACE and SUBSTR can’t do the job without a lot of work
Performance Considerations
• Regular expressions are more compute-intensive than their non-regular-expression equivalents.
• Most database processes are I/O bound, so some additional CPU load is normally not an issue
References

http://www.regular-expressions.info/reference.html (the best reference)

http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt1.html

http://www.psoug.org/reference/regexp.html

http://www.dba-oracle.com/t_regular_expressions.htm

http://rootshell.be/~yong321/computer/OracleRegExp.html

http://www.databasejournal.com/features/oracle/article.php/3501826

John Garmany, “Being Regular with Regular Expressions”, Collaborate08
