Your Oracle R - : Regular Expressions in An Oracle World
Gravenstein, Rebholtz
Definition
Regular expressions provide a concise and
flexible means for identifying strings of text of
interest, such as particular characters, words,
or patterns of characters.
Regular Expressions
Used to search strings to find matching
patterns
Match patterns range from fairly simple to
extremely difficult
It’s much easier to understand your own
expressions!
Search for match starts at beginning of string
and stops when first match is found
Found In
Text Editors
Unix Utilities
ed (text editor)
Grep
Programming Languages
Perl
Tcl
And since Oracle 10g
SQL
Application Express
Patterns
Generally characters and patterns represent
themselves
Special Characters
'.....-....'
Yes
'.?-.+'
Yes
'.*-.+'
Quantifiers
{count} defines an exact repetition count of
the prior object
216-588-5023 Yes
216-5888888-5023 Yes
011-216-588-5023 Yes: skip the 011 and you get the match
216-5888-888-5023 Yes: {3,} eats all characters up to the -5023
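The pattern used on the original slide did not survive, so the query below is only a sketch. It assumes a pattern of three digits, a dash, three or more digits, a dash, and four digits, and runs it against the four sample numbers to show {count} and {min,} in action.

```sql
-- Assumed pattern (not taken from the original slide):
-- 3 digits, a dash, 3 or more digits, a dash, then 4 digits.
SELECT phone,
       CASE
         WHEN REGEXP_LIKE(phone, '[0-9]{3}-[0-9]{3,}-[0-9]{4}') THEN 'Yes'
         ELSE 'No'
       END AS is_match
FROM (
  SELECT '216-588-5023'      AS phone FROM dual UNION ALL
  SELECT '216-5888888-5023'  FROM dual UNION ALL
  SELECT '011-216-588-5023'  FROM dual UNION ALL
  SELECT '216-5888-888-5023' FROM dual
);
```

Because the pattern is not anchored, it can match anywhere inside the string, which is why a leading 011- does not prevent a match.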
Anchors
^ Start of line
$ End of line
Does ^.....-....$ match?
44313-1234 Yes
441313-1234 No
4431--abcd Yes
44313-1234c No
a44313-1234 No
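The anchor test can be checked directly with REGEXP_LIKE. This is a minimal sketch; the inline list of test values is just a convenient way to feed the five strings to the pattern.

```sql
-- Each dot matches any single character, so the pattern requires exactly
-- 5 characters, a dash, and 4 characters, with nothing before or after.
SELECT code,
       CASE
         WHEN REGEXP_LIKE(code, '^.....-....$') THEN 'Yes'
         ELSE 'No'
       END AS is_match
FROM (
  SELECT '44313-1234'  AS code FROM dual UNION ALL
  SELECT '441313-1234' FROM dual UNION ALL
  SELECT '4431--abcd'  FROM dual UNION ALL
  SELECT '44313-1234c' FROM dual UNION ALL
  SELECT 'a44313-1234' FROM dual
);
```

Note that 4431--abcd matches: the dot accepts any character, so the first dash counts as the fifth character and the letters satisfy the last four dots.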
Alternation and Grouping
[char] character list
| alternation (boolean or operator)
() group subexpression
Character Expression
[abc] defines a list of characters that can be
matched to a single a or b or c
[0123456789]{5}-[0123456789]{4}
[0123456789] Any number
{5} 5 of the prior pattern
- A dash
[0123456789] Any number
{4} 4 of the prior pattern
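More on alternation
[9]|[1] 9 or 1
([8]{1})|([9]{1})[1234567890]{2}-[1234567890]{3}-[1234567890]{4}
The | is the "or" between the groups; the pattern matches
900-555-5125
or
823-123-4567
Note: | has the lowest precedence, so as written the second number matches on its leading 8 alone. Enclose the alternation in its own group, (([8]{1})|([9]{1})), to require the full number in both cases.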
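Alternation binds more loosely than anything else in the pattern, so grouping changes what is matched. The sketch below compares the slide's pattern with a variant in which the alternation is wrapped in its own group; the [0-9] ranges are shorthand for the slide's [1234567890] lists.

```sql
SELECT REGEXP_SUBSTR('823-123-4567',
                     '([8]{1})|([9]{1})[0-9]{2}-[0-9]{3}-[0-9]{4}') AS ungrouped,
       REGEXP_SUBSTR('823-123-4567',
                     '([8]|[9])[0-9]{2}-[0-9]{3}-[0-9]{4}')         AS grouped
FROM dual;
-- ungrouped: 8              (the left alternative is just the single digit 8)
-- grouped:   823-123-4567   (8 or 9, then the rest of the number)
```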
Ranges
[a-z] matches any letter from a to z
[0-9] matches any digit from 0 to 9
Order is important – must start with lower
and go to higher
[a-zA-Z] matches any letter, lower or upper case
(^[[:digit:]]{5})(-[[:digit:]]{4})?$
Note:
[^[:digit:]]
This is a special case of ^ because it is inside a character
list
When ^ is the first character of a character list, it negates
the list instead of anchoring to the start of the string
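The two uses of ^ can be seen side by side. In this sketch the sample ZIP values are made up for illustration: the first expression uses ^ as an anchor in the ZIP pattern above, and the second uses [^[:digit:]] as a negated list to strip everything that is not a digit.

```sql
SELECT zip,
       CASE
         WHEN REGEXP_LIKE(zip, '(^[[:digit:]]{5})(-[[:digit:]]{4})?$') THEN 'Yes'
         ELSE 'No'
       END                                     AS valid_zip,   -- ^ anchors to the start
       REGEXP_REPLACE(zip, '[^[:digit:]]', '') AS digits_only  -- ^ negates the list
FROM (
  SELECT '44313'      AS zip FROM dual UNION ALL  -- illustrative values
  SELECT '44313-1234' FROM dual UNION ALL
  SELECT '44313-12'   FROM dual
);
```

The first two values pass the check; 44313-12 fails because the optional group needs exactly four digits after the dash.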
“Greediness”
Regular expression operators are greedy: they match
as much of the string as they can
[a-z]+ will match the entire string
abcedef
[a-z]+? adds ? after the quantifier to get the lazy,
or minimum, match
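A quick way to see the difference is to pull out the matched text with REGEXP_SUBSTR. This sketch assumes a release that accepts lazy quantifiers (to my knowledge, 10g Release 2 or later).

```sql
SELECT REGEXP_SUBSTR('abcedef', '[a-z]+')  AS greedy,  -- takes every letter it can
       REGEXP_SUBSTR('abcedef', '[a-z]+?') AS lazy     -- stops at the minimum, one letter
FROM dual;
```

The greedy form should return the whole string abcedef, while the lazy form should return only the first letter.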
[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?
A blank followed by zero or more characters other than >
Back References
Given pattern (([0-9]){3}-([0-9]){3}-([0-9]{4}))
applied to 216-588-5023, what are:
\1 = 216-588-5023
\2 = 6 (a repeated group keeps its last repetition)
\3 = 8
\4 = 5023
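Back references are most useful in a replacement string. The sketch below uses a simplified variant of the pattern above, with one group per part of the phone number, to reformat the same value.

```sql
-- \1, \2 and \3 refer to the three parenthesized groups, counted left to right.
SELECT REGEXP_REPLACE('216-588-5023',
                      '([0-9]{3})-([0-9]{3})-([0-9]{4})',
                      '(\1) \2-\3') AS formatted
FROM dual;
-- formatted: (216) 588-5023
```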
Oracle Reg Expression Functions
10g provides these regular expression analogs for
existing string functions
REGEXP_LIKE
REGEXP_INSTR
REGEXP_SUBSTR
REGEXP_REPLACE
'YESTHISISAMATCH'
-------------------
Yes this is a match
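REGEXP_LIKE is a condition rather than a value-returning function, so it normally appears in a WHERE clause. The query that produced the output above was not preserved; the following is only a sketch of the kind of query that would return it, using an assumed phone number pattern.

```sql
-- The row is returned only when the source string matches the pattern.
SELECT 'Yes this is a match'
FROM dual
WHERE REGEXP_LIKE('216-588-5023', '^[0-9]{3}-[0-9]{3}-[0-9]{4}$');
```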
REGEXP_INSTR
REGEXP_INSTR(
  source,
  pattern,
  start_position,   -- start searching from here
  occurrence,       -- which occurrence should be returned
  return_position,  -- 0 position where the occurrence starts
                    -- 1 position just after the occurrence ends
  match_parameters
) RETURN NUMBER
REGEXP_INSTR
SELECT
  REGEXP_INSTR(
    'The quick red fox jumped over the lazy brown dog.',
    'quick',
    1,    -- start searching from here
    1,    -- which occurrence should be returned
    0,    -- 0 position where the occurrence starts
          -- 1 position just after the occurrence ends
    NULL  -- match_parameters
  ) match_pos
FROM dual;
MATCH_POS
----------
5
1 row selected.
REGEXP_INSTR
SELECT
  REGEXP_INSTR(
    'The quick red fox jumped over the lazy brown dog.',
    'quick',
    1,    -- start searching from here
    1,    -- which occurrence should be returned
    1,    -- 0 position where the occurrence starts
          -- 1 position just after the occurrence ends
    NULL  -- match_parameters
  ) match_pos
FROM dual;
MATCH_POS
----------
10
1 row selected.
REGEXP_REPLACE
REGEXP_REPLACE(
source,
pattern,
rep_string, -- replaces matched text
position, -- search start position
occurrence, -- match occurrence
-- 0 replaces all matches
match_parameters
) RETURN VARCHAR2
REGEXP_REPLACE
Compress two or more spaces
SELECT
REGEXP_REPLACE(
'500   Oracle     Parkway,    Redwood  Shores, CA',
'( ){2,}',
' '
) "REGEXP_REPLACE"
FROM DUAL;
REGEXP_REPLACE
------------------------------------------
500 Oracle Parkway, Redwood Shores, CA
REGEXP_REPLACE
Provide better error message
SELECT REGEXP_REPLACE (
'Error on employee <s1> whose name is <s2>',
'(.*)<s1>(.*)<s2>',
'\1A31124\2RGravenstein'
) AS "REGEXP_REPLACE"
FROM DUAL;
REGEXP_REPLACE
--------------------------------------------------
Error on employee A31124 whose name is RGravenstein
1 row selected.
REGEXP_SUBSTR
REGEXP_SUBSTR(
source,
pattern,
position, -- search start position
occurrence, -- which occurrence should
-- be found and sub-stringed
match_parameter
) RETURN VARCHAR2
Parsing Example
set serveroutput on
DECLARE
  x VARCHAR2(20);
  y VARCHAR2(20);
  c VARCHAR2(40) := '1:3,4:6,8:10,3:4,7:6,11:12';
BEGIN
  -- First run of non-colon characters found
  x := REGEXP_SUBSTR(c,'[^:]+', 1, 1);
  -- Second run of non-comma characters: starts after the
  -- first comma and runs through to the second comma
  y := REGEXP_SUBSTR(c,'[^,]+', 1, 2);
  dbms_output.put_line('<'||x||'-'||y||'>');
END;
/
<1-4:6>
PL/SQL procedure successfully completed.
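The occurrence argument can also be driven by a row generator to split the whole list at once. This is a sketch of a commonly used idiom, not something from the original slides; it reuses the same sample string and the same '[^,]+' pattern.

```sql
-- LEVEL counts 1, 2, 3, ... and is passed as the occurrence argument;
-- the hierarchy stops when there is no further comma-separated item.
SELECT LEVEL AS pair_no,
       REGEXP_SUBSTR('1:3,4:6,8:10,3:4,7:6,11:12', '[^,]+', 1, LEVEL) AS pair
FROM dual
CONNECT BY REGEXP_SUBSTR('1:3,4:6,8:10,3:4,7:6,11:12', '[^,]+', 1, LEVEL) IS NOT NULL;
```

Each of the six pairs comes back as its own row, which is often easier to work with than calling REGEXP_SUBSTR once per position.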
Where to use Regular Expression
Validation of
e-mail addresses
Credit card numbers
SSN
…
Lots of examples can be found on the internet
Complicated string parsing
Where the standard Oracle functions LIKE,
INSTR, REPLACE and SUBSTR can’t do the
job without a lot of work
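One way to apply validation is to put REGEXP_LIKE in a check constraint so the database enforces the format. The table, column, and constraint names below are hypothetical, and the pattern checks only the ddd-dd-dddd shape of an SSN, not whether the number is actually valid.

```sql
-- Hypothetical table and names, for illustration only.
CREATE TABLE employee_contact (
  employee_id NUMBER PRIMARY KEY,
  ssn         VARCHAR2(11),
  CONSTRAINT ssn_format_chk
    CHECK (REGEXP_LIKE(ssn, '^[0-9]{3}-[0-9]{2}-[0-9]{4}$'))
);
```

With the anchors in place, an insert of 123-45-6789 would be accepted and 123456789 would be rejected by the constraint.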
Performance Considerations
Regular expressions are more compute
intensive than their non-regular-expression
equivalents.
Most database processes are I/O bound, so
some additional CPU load is
normally not an issue
References
http://www.regular-expressions.info/reference.html (the best reference)
http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt1.html
http://www.psoug.org/reference/regexp.html
http://www.dba-oracle.com/t_regular_expressions.htm
http://rootshell.be/~yong321/computer/OracleRegExp.html
http://www.databasejournal.com/features/oracle/article.php/3501826