REGEX Extended
1. The 12 punctuation characters that make regular expressions work their magic are $ ( ) * + . ? [ \ ^ { |
2. Notably absent from the list are ], - and }. The first two become metacharacters only after an unescaped [, and the } only after an unescaped {.
3. If you want your regex to match them literally, you need to escape them by placing a backslash in front of them.
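The escaping rule above can be sketched in JavaScript. The helper name escapeRegex is hypothetical (it does not come from these notes); it simply places a backslash in front of every character that is, or can become, a metacharacter.

```javascript
// Hypothetical helper: backslash-escape every character that could act as a
// regex metacharacter, so the resulting pattern matches the text literally.
function escapeRegex(text) {
  // The 12 metacharacters, plus ] } and -, which are special only in context.
  return text.replace(/[$()*+.?[\\\]^{|}-]/g, "\\$&");
}

const literalText = "1+1=2 (cost: $5.00)";
const literalPattern = new RegExp(escapeRegex(literalText));

console.log(escapeRegex(literalText));
console.log(literalPattern.test("Answer: 1+1=2 (cost: $5.00)"));
```

Escaping ], } and - as well is harmless here and keeps the helper safe if its output is later placed inside a character class.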
Token   Meaning           Flavors
\a      bell              .NET
\e      escape            .NET
\f      form feed         .NET, JScript
\n      new line          .NET, JScript
\r      carriage return   .NET, JScript
\t      horizontal tab    .NET, JScript
\v      vertical tab      .NET, JScript
C# - @"[$""'\n\d/\\]"
- To include a double quote in a verbatim string, double it up.
Note: @"\n" is always the regex token \n, which matches a newline; verbatim strings do not support \n at the string level.
JavaScript - /[$"'\n\d\/\\]/
- Simply place your regular expression between two forward slashes.
- If any forward slashes occur within the regular expression itself, escape those with a backslash.
Note: a regular expression constructed with RegexOptions.Compiled can run up to 10 times faster than one constructed without this option (it compiles the regular expression down to CIL).
JavaScript:
var myregexp = /regex pattern/;          // literal regular expression
var myregexp = new RegExp(userinput);    // regular expression built from a string
Shorthands
Six regex tokens that consist of a backslash and a letter form shorthand character classes. Each lowercase shorthand has an associated uppercase shorthand with the opposite meaning.
Token   Matches                                                                  Opposite
\d      a single digit                                                           \D ([^\d])
\w      a single word character                                                  \W
\s      any whitespace character (this includes spaces, tabs, and line breaks)   \S
Note: In JavaScript, \w is always identical to [a-zA-Z0-9_]. In .NET, it also includes letters and digits from all other scripts (Cyrillic, Thai, etc.).
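A quick check of the JavaScript side of this note: the sketch below compares \w against [a-zA-Z0-9_] on a handful of sample characters. The sample list is an arbitrary illustration, not part of the original notes.

```javascript
// Compare \w with the explicit ASCII class on a few sample characters.
const wordChar = /^\w$/;
const asciiClass = /^[a-zA-Z0-9_]$/;
const sampleChars = ["a", "Z", "7", "_", "é", "д", "-"];

// true for a character when both patterns agree on it
const agreement = sampleChars.map(function (ch) {
  return wordChar.test(ch) === asciiClass.test(ch);
});

console.log(agreement.every(Boolean));
console.log(wordChar.test("д"));
```

The accented and Cyrillic letters are rejected by both patterns in JavaScript, whereas .NET's default \w would accept them.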
.NET[1]: the dot, with the "dot matches line breaks" option turned on
JScript[2]: [\s\S]
[1] You can also place a mode modifier at the start of the regular expression: (?s) is the mode modifier for "dot matches line breaks" mode in .NET.
[2] An alternative solution is needed for JavaScript engines that predate the ES2018 /s flag, since they don't have a "dot matches line breaks" option ([\d\D] and [\w\W] have the same effect).
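The [\s\S] workaround can be seen in a small JavaScript sketch. The subject string is made up for illustration.

```javascript
// A subject containing a line break between the opening and closing tags.
const multiLineSubject = "<p>first line\nsecond line</p>";

// The plain dot stops at the line break, so no match is found.
const dotResult = multiLineSubject.match(/<p>.*<\/p>/);

// [\s\S] matches any character, including line breaks.
const anyCharResult = multiLineSubject.match(/<p>[\s\S]*<\/p>/);

console.log(dotResult);
console.log(anyCharResult[0] === multiLineSubject);
```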
Token: \A
Matches: at the very start of the subject text, before the first character (to test whether the subject text begins with the text you want to match)
Flavor: .NET
Note: the A must be uppercase

Token: ^
Note: equivalent to \A, as long as you do not turn on the "^ and $ match at line breaks" option; otherwise it will match at the very start of each line

Token: \Z, \z
Matches: at the very end of the subject text, after the last character (to test whether the subject text ends with the text you want to match)
Flavor: .NET
Note: \Z and \z differ only when the last character in your subject text is a line break. In that case, \Z can match at the very end of the subject text, after the final line break, as well as immediately before that line break, whereas \z matches only at the very end.

Token: $
Note: equivalent to \Z, as long as you do not turn on the "^ and $ match at line breaks" option; otherwise it will match at the end of each line
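JavaScript has no \A, \Z, or \z, so ^ and $ are the only way to show the anchor behavior there. The sketch below, using a made-up two-line subject, shows how the "^ and $ match at line breaks" option (/m) changes what the anchors match.

```javascript
const twoLines = "first line\nsecond line";

// Without /m, ^ matches only at the very start of the subject text.
const startDefault = twoLines.match(/^\w+/g);
// With /m, ^ also matches at the start of each line.
const startMultiline = twoLines.match(/^\w+/gm);

// Without /m, $ matches only at the very end of the subject text.
const endDefault = twoLines.match(/\w+$/g);
// With /m, $ also matches at the end of each line.
const endMultiline = twoLines.match(/\w+$/gm);

console.log(startDefault, startMultiline);
console.log(endDefault, endMultiline);
```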
ExplicitCapture
Compiled
Singleline: Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). (Dot matches line breaks)
IgnorePatternWhitespace: Eliminates unescaped white space from the pattern and enables comments marked with #. (Free-spacing)
RightToLeft: Specifies that the search will be from right to left instead of from left to right.
ECMAScript: Enables ECMAScript-compliant behavior for the expression. This value can be used only in conjunction with the IgnoreCase, Multiline, and Compiled values. The use of this value with any other values results in an exception. (JavaScript flavor) The most important effect is that with this option, \w and \d are restricted to ASCII characters, as they are in JavaScript.
CultureInvariant: Specifies that cultural differences in language are ignored.
JavaScript
var myregexp = /regex pattern/im;
Regex Options
1. Free-spacing: Not supported by JavaScript.
2. Case insensitive: /i
3. Dot matches line breaks: Not supported by JavaScript before ES2018, which added the /s flag.
4. Caret and dollar match at line breaks: /m
5. Additional language-specific options: apply a regular expression repeatedly to the same string: /g
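The effect of the /i, /m, and /g flags can be sketched as follows. The log text is invented for the example.

```javascript
const logText = "Error: disk full\nerror: retry\nok";

// /g and /m only: case-sensitive, so just the lowercase line matches.
const caseSensitive = logText.match(/^error/gm);

// Adding /i makes the match case insensitive.
const caseInsensitive = logText.match(/^error/gim);

// Without /g, only the first match is returned, along with its position.
const firstOnly = logText.match(/^error/im);

console.log(caseSensitive, caseInsensitive, firstOnly.index);
```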
C#:
bool foundMatch = Regex.IsMatch(subjectString, "regex pattern");
Note: @"\Aregex pattern\Z" - the regex must match the subject string entirely
JavaScript:
if (/regex pattern/.test(subjectString)) {
    // Successful match
} else {
    // Match attempt failed
}
Note: /^regex pattern$/.test(subjectString) - the regex must match the subject string entirely
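The difference between a partial match and a match of the entire subject string can be sketched like this. The function name isWholeNumber is a hypothetical example, not something defined in these notes.

```javascript
// Hypothetical helper: true only if the whole string consists of digits.
function isWholeNumber(subjectString) {
  return /^\d+$/.test(subjectString);
}

// Without anchors, the regex only needs to match somewhere in the string.
const partialMatch = /\d+/.test("123abc");

console.log(partialMatch);
console.log(isWholeNumber("123abc"));
console.log(isWholeNumber("12345"));
```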
Note:
1. regexObj.Match("123456", 3, 2) tries to find a match in "45"
2. regexObj.Match(subjectString).Index - position of the match in the subject string
3. regexObj.Match(subjectString).Length - length of the match
JavaScript:
// Retrieve the matched text
var result = subject.match(/\d+/);
if (result) {
    result = result[0];
} else {
    result = '';
}

// Retrieve the position and length of the match
var matchstart = -1;
var matchlength = -1;
var match = /\d+/.exec(subject);
if (match) {
    matchstart = match.index;
    matchlength = match[0].length;
}
\p{Sc} - Unicode category (currency symbols)
\p{IsGreekExtended} - Unicode block - .NET
\P{M}\p{M}* - Unicode grapheme - .NET
(?:regex) (noncapturing group)
Benefits:
- You can add them to an existing regex without upsetting the references to numbered capturing groups.
- Performance: a capturing group adds unnecessary overhead that you can eliminate by using a noncapturing group.
Note: parts of the match can be named: \b(?<year>\d\d\d\d)-(?<month>\d\d)-
JavaScript:
var result = "";
var match = /http:\/\/([a-z0-9.-]+)/.exec(subject);
if (match) {
    result = match[1];
} else {
    result = '';
}
JavaScript:
var list = subject.match(/\d+/g);
Note:
- The /g flag tells the match() function to iterate over all matches in the string and put them into an array.
- When the regex has the /g flag, string.match() does not provide any further details about the individual matches, such as their positions or capturing groups.
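To make the trade-off concrete, the sketch below runs the same pattern with and without /g on a made-up subject string and shows which details each call returns.

```javascript
const settings = "width=100, height=250";

// With /g: an array holding only the text of each overall match.
const allMatches = settings.match(/(\w+)=(\d+)/g);

// Without /g: only the first match, but with capturing groups and position.
const firstMatch = settings.match(/(\w+)=(\d+)/);

console.log(allMatches);
console.log(firstMatch[1], firstMatch[2], firstMatch.index);
```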
C#:
Match matchResult = Regex.Match(subjectString, @"\d+");
while (matchResult.Success) {
    // Here you can process the match stored in matchResult
    matchResult = matchResult.NextMatch();
}
JavaScript:
var regex = /\d+/g;
var match = null;
while (match = regex.exec(subject)) {
    // Don't let browsers such as Firefox get stuck in an infinite loop
    if (match.index == regex.lastIndex) regex.lastIndex++;
    // Here you can process the match stored in the match variable
}
Note: exec() sets lastIndex to the first character after the match. If the match is zero characters long, the next match attempt begins at the position of the match just found, resulting in an infinite loop unless lastIndex is advanced manually.
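The zero-length case described in the note can be reproduced with a regex such as \d*, which is able to match nothing at all. The function name collectMatches and the sample subject are hypothetical; the regex passed in is assumed to have the /g flag.

```javascript
// Collect every match of a /g regex that may match zero characters.
function collectMatches(regex, subject) {
  const found = [];
  let match;
  while ((match = regex.exec(subject)) !== null) {
    // A zero-length match leaves lastIndex at the match position,
    // so step past it manually to avoid an infinite loop.
    if (match.index === regex.lastIndex) regex.lastIndex++;
    found.push({ index: match.index, text: match[0] });
  }
  return found;
}

const digitRuns = collectMatches(/\d*/g, "a12b");
console.log(digitRuns.map(function (m) {
  return m.index + ":" + JSON.stringify(m.text);
}).join(" "));
```

Removing the lastIndex adjustment would make the loop find the same empty match at position 0 forever.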
\b\d{100}\b - A decimal number with 100 digits
\b[a-f0-9]{1,8}\b - A 32-bit hexadecimal number
\b[a-f0-9]{1,8}h?\b - A 32-bit hexadecimal number with an optional h suffix
\d*\.\d+(e\d+)? - A floating-point number with an optional integer part, a mandatory fractional part, and an optional exponent
<p>.*</p> (greedy) vs <p>.*?</p> (lazy)
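The difference between the two patterns shows up as soon as the subject contains more than one paragraph. A minimal sketch with an invented subject string:

```javascript
const twoParagraphs = "<p>one</p><p>two</p>";

// Greedy: .* grabs as much as possible, so the match runs to the last </p>.
const greedyMatch = twoParagraphs.match(/<p>.*<\/p>/)[0];

// Lazy: .*? grabs as little as possible, so the match stops at the first </p>.
const lazyMatch = twoParagraphs.match(/<p>.*?<\/p>/)[0];

console.log(greedyMatch);
console.log(lazyMatch);
```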
JavaScript:
result = subject.replace(/before/g, "after");
Note: if you want to replace all regex matches in the string, set the /g flag when creating your regular expression object; if you don't use the /g flag, only the first match will be replaced.
C#:
string resultString = Regex.Replace(subjectString, @"(\w+)=(\w+)", "$2=$1");
or
Regex regexObj = new Regex(@"(\w+)=(\w+)");
string resultString = regexObj.Replace(subjectString, "$2=$1");
With named groups:
Regex regexObj = new Regex(@"(?<left>\w+)=(?<right>\w+)");
string resultString = regexObj.Replace(subjectString, "${right}=${left}");
JavaScript:
result = subject.replace(/(\w+)=(\w+)/g, "$2=$1");
C#:
Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Replace(subjectString, new MatchEvaluator(ComputeReplacement));

public String ComputeReplacement(Match matchResult) {
    // Double the matched number
    int t = int.Parse(matchResult.Value) * 2;
    return t.ToString();
}
JavaScript:
var result = subject.replace(/\d+/g, function(match) {
    return match * 2;
});
Note: the replacement function may accept one or more parameters. The first parameter will be set to the text matched by the regular expression. If the regular expression has capturing groups, the second parameter will hold the text matched by the first capturing group, the third parameter gives you the text of the second capturing group, and so on.
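As a sketch of a replacement function that uses the extra parameters, the example below doubles only the numeric value in each name=value pair. The parameter names name and value and the input string are illustrative choices.

```javascript
const pairs = "a=1, b=22";

// match: whole match, name: first capturing group, value: second capturing group
const doubledPairs = pairs.replace(/(\w+)=(\d+)/g, function (match, name, value) {
  return name + "=" + (Number(value) * 2);
});

console.log(doubledPairs);
```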
Split a string
C#:
string[] splitArray = Regex.Split(subjectString, "<[^<>]*>");
JavaScript:
var list = [];
var regex = /<[^<>]*>/g;
var match = null;
var lastIndex = 0;
while (match = regex.exec(subject)) {
    // Don't let browsers such as Firefox get stuck in an infinite loop
    if (match.index == regex.lastIndex) regex.lastIndex++;
    // Add the text before the match
    list.push(subject.substring(lastIndex, match.index));
    lastIndex = match.index + match[0].length;
}
// Add the remainder after the last match
list.push(subject.substring(lastIndex));
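The loop above can be wrapped in a function and compared with the built-in String.prototype.split. This is a sketch under the assumption that the regex passed in has the /g flag; the name splitByRegex and the sample string are hypothetical.

```javascript
// Split subject on every match of a /g regex, keeping the text between matches.
function splitByRegex(subject, regex) {
  const parts = [];
  let lastIndex = 0;
  let match;
  regex.lastIndex = 0; // start from the beginning of the subject
  while ((match = regex.exec(subject)) !== null) {
    // Step past zero-length matches to avoid an infinite loop.
    if (match.index === regex.lastIndex) regex.lastIndex++;
    // Add the text before the match.
    parts.push(subject.substring(lastIndex, match.index));
    lastIndex = match.index + match[0].length;
  }
  // Add the remainder after the last match.
  parts.push(subject.substring(lastIndex));
  return parts;
}

const markup = "I<b>like</b>bold";
const manualParts = splitByRegex(markup, /<[^<>]*>/g);
const builtInParts = markup.split(/<[^<>]*>/);

console.log(manualParts);
console.log(builtInParts);
```

For this pattern, current JavaScript engines give the same array from the built-in split, so the manual loop mainly matters when you need extra control over each match.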
C#:
string[] lines = Regex.Split(subjectString, "\r?\n");
Regex regexObj = new Regex("regex pattern");
for (int i = 0; i < lines.Length; i++) {
    if (regexObj.IsMatch(lines[i])) {
        // The regex matches lines[i]
    } else {
        // The regex does not match lines[i]
    }
}
JavaScript:
var lines = subject.split(/\r?\n/);
var regexp = /regex pattern/;
for (var i = 0; i < lines.length; i++) {
    if (lines[i].match(regexp)) {
        // The regex matches lines[i]
    } else {
        // The regex does not match lines[i]
    }
}
Validating URL
^((https?|ftp)://|(www|ftp)\.)[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$
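A possible way to use this pattern from JavaScript is sketched below. Two things are assumptions on top of the notes: the forward slashes are escaped so the pattern fits in a regex literal, and the /i flag is added so that uppercase URLs are accepted. The function name looksLikeUrl is hypothetical.

```javascript
// The URL pattern from above, written as a JavaScript literal.
// Assumption: case-insensitive matching (/i) is wanted.
const urlPattern =
  /^((https?|ftp):\/\/|(www|ftp)\.)[a-z0-9-]+(\.[a-z0-9-]+)+([\/?].*)?$/i;

// Hypothetical helper: true if the whole text looks like a URL.
function looksLikeUrl(text) {
  return urlPattern.test(text);
}

console.log(looksLikeUrl("http://www.example.com/path?x=1"));
console.log(looksLikeUrl("www.example.com"));
console.log(looksLikeUrl("example.com"));
```

Because the pattern requires either a scheme or a leading www. or ftp., a bare domain such as example.com is rejected, as is a host without any dot.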