A Regular Expression, or RegEx, is a sequence of special characters used to define a textual pattern match.  RegEx processors are now included in numerous programming languages, including Java, JavaScript, PHP, Microsoft .NET, Perl, Python and VBScript.  PowerShell leverages the RegEx capabilities of Microsoft's .NET framework, which, in turn, are based on those provided by Perl 5.

The test harness below can be used to test Regular Expressions during development.  Simply enter the text to be searched in the Text to Search box, enter the RegEx being tested in the RegEx Pattern box, and click Test.  The results, if any, are then displayed in the RegEx Output window as members of the built-in $matches hash table.

 

A comprehensive breakdown of RegEx syntax is beyond the scope of this primer (Tony Stubblebine's book, cited above, contains over 100 pages!).   However, the list below provides a very brief overview of the key (and most commonly used) RegEx constructs:

Construct Description
^ Anchor left.
$ Anchor right.
. Any character, except new line / line feed (see \n, below).
* Previous character zero or more times (greedy).
*? Previous character zero or more times (lazy).
+ Previous character one or more times (greedy).
+? Previous character one or more times (lazy).
? Previous character zero or one times (greedy).
?? Previous character zero or one times (lazy).
\ Escape (e.g.: \\ for literal back-slash, \[ for literal left-square bracket, etc.).
\d Shorthand for digit.
\D Shorthand for non-digit.
\w Shorthand for word character; equivalent to [a-zA-Z_0-9] for ASCII text (by default, .NET also matches Unicode letters and digits).
\W Shorthand for non-word character.
\t Shorthand for tab (ASCII 9).
\f Shorthand for form feed (ASCII 12).
\n Shorthand for new line / line feed (ASCII 10).
\r Shorthand for carriage return (ASCII 13).
\s Shorthand for white space; broadly equivalent to [ \t\r\n\f] (.NET also includes the vertical tab and Unicode space separators).
\S Shorthand for non-white space.
\xNN Hexadecimal ASCII value (e.g.: \x41 = decimal 65 = upper-case A).
{n} Previous character n times.
{n,} Previous character n, or more, times.
{a,b} Previous character between a and b times.
[<set>] Character set, e.g.:

[a-z0-9]
: Lower a to z or 0 to 9.
[abc0123]
: Lower a, b or c or 0 thru 3.
[A-Za-z_]
: Upper or lower A to Z, or underscore.
[A-Z_\-]
: Capital A to Z, underscore or hyphen.
[^A]
: Not capital A.
[^aeiou]
: Not a vowel.
[ -~]
: All printable ASCII characters (space, ASCII 32, through tilde, ASCII 126).
() Capturing or grouping, e.g.:

Test(ing)?
: Matches Test or Testing, with ing being captured.
Test(?:ing)?
: Matches Test or Testing, but with no capture (?:).
set colour=(\w+)
: Captures the defined colour.
pan(?:acea|dora|oramic)
: Matches panacea, pandora or panoramic, with no capture.
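The difference between the greedy and lazy quantifiers listed above is easiest to see side by side.  The following is a minimal sketch; the sample text and variable names are purely illustrative:

```powershell
# Illustrative sample text (an assumption, not taken from the primer).
$sample = '<b>bold</b> and <i>italic</i>'

# Greedy: .+ consumes as much as possible, then backs off to the LAST '>'.
$null   = $sample -match '<.+>'
$greedy = $matches[0]

# Lazy: .+? consumes as little as possible, stopping at the FIRST '>'.
$null = $sample -match '<.+?>'
$lazy = $matches[0]

$greedy   # <b>bold</b> and <i>italic</i>
$lazy     # <b>
```

Entry 0 of the $matches hash table always holds the complete text matched by the pattern.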

Captured values are placed in the built-in $matches hash table.  Capture groups can also be assigned labels to simplify subsequent processing.  The labels can be defined using one of two methods:

set colour=(?'colour'\w+)
: The chosen colour can be retrieved by referencing $matches.colour.
set colour=(?<colour>\w+)
: Same as above.
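As a short sketch of how the captured values might be read back, the following compares a numbered capture with a labelled one.  The input line and variable names are illustrative assumptions:

```powershell
# Illustrative input line (an assumption, not taken from the primer).
$line = 'set colour=blue'

# Unlabelled group: the capture is stored under its number (1).
$null     = $line -match 'set colour=(\w+)'
$byNumber = $matches[1]

# Labelled group: the capture is stored under its label.
$null    = $line -match 'set colour=(?<colour>\w+)'
$byLabel = $matches.colour

$byNumber   # blue
$byLabel    # blue
```

In both cases, $matches[0] contains the whole matched text (set colour=blue).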


Regular Expressions also support a number of modifiers that can be used to control the way that the RegEx engine operates.  For example, by default, the any character match operator (.) does not include new line characters in its match set.  This can cause issues when dealing with text that spans multiple lines.   To overcome this, the Single-Line modifier can be specified to make the . match operator include new line characters (causing the text being analysed to be treated as a single line).

To demonstrate this, the following example shows a multi-lined text item being tested using two -match operations, firstly without the Single-Line modifier, and then with:

    PS C:\Users\JohnDoe> $text = "This text`nspans multiple`nlines.";
    PS C:\Users\JohnDoe> $text;
    This text
    spans multiple
    lines.
    PS C:\Users\JohnDoe> $text -match 'This text.*multiple.*lines'
    False
    PS C:\Users\JohnDoe> $text -match '(?s)This text.*multiple.*lines'
    True
    PS C:\Users\JohnDoe>

Modifiers can be applied to an entire RegEx pattern, or to a particular group.   The -match and -cmatch operators support the following modifiers:

Modifier Description
i Causes the RegEx comparison to be case-insensitive (although the -match operator is already case-insensitive).  This can still be useful for partial case-insensitive matches, where the i modifier is applied to a single group, e.g.:

    'Rod, Jane and Freddy' -cmatch 'Rod, jane and Freddy'       # Does not match.
    'Rod, Jane and Freddy' -cmatch 'Rod, (?i:jane) and Freddy'  # Matches.
m Multi-line mode.  Causes the ^ (caret) and $ to match the beginning and end of lines within a multi-line string, rather than the beginning and end of the entire string.  It should be noted, however, that RegEx deems a line delimiter to be a new line character, as opposed to Microsoft Windows' carriage-return and new line combination (escaped as `r`n in PowerShell).
n Explicit capture.  Groups that don't have a label won't capture.  This can be useful for decluttering your RegEx.  For example, the following RegEx captures the noun cat ($matches.subject), but not the verb sat:

    'The cat sat on the mat.' -match '(?n)^The (?<subject>\w+) (rolled|sat|slept) on the mat\.$'
s Single-line mode.  Causes the . operator to include new line characters in its match set, thus making multi-lined text appear as a single line.  It is for this reason that this modifier is often confused with the multi-line modifier, described above.
x Ignore pattern white space.  Causes any un-escaped white space in the RegEx pattern to be ignored and also treats the # symbol as the start of a comment.  If the pattern needs to match white space, it must be explicitly escaped, e.g.: \s or an escaped space (a back-slash followed by a space).  For example, the following RegEx does not match:

'Hello World' -match '(?x)Hello World'

However, the following RegEx patterns are successful:

    'Hello World' -match '(?x)Hello\ World'
    'Lorem Ipsum' -match '(?x)Lorem\ Ipsum #This is a comment.'
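The multi-line modifier, and its handling of Windows line endings, can be sketched as follows.  The sample strings and variable names are illustrative assumptions, and the \r?$ pattern is just one possible way of tolerating a carriage return before the new line:

```powershell
# Illustrative text using new line (`n) delimiters only.
$unixText = "alpha`nbeta`ngamma"

$withoutM = $unixText -match '^beta$'      # ^ and $ anchor to the whole string.
$withM    = $unixText -match '(?m)^beta$'  # ^ and $ anchor to each line.

# The same text using Windows carriage-return / new line (`r`n) delimiters.
$windowsText = "alpha`r`nbeta`r`ngamma"

# Fails: the carriage return sits between 'beta' and the new line.
$crlfStrict = $windowsText -match '(?m)^beta$'

# Succeeds: an optional carriage return is allowed before the end of line.
$crlfTolerant = $windowsText -match '(?m)^beta\r?$'

$withoutM       # False
$withM          # True
$crlfStrict     # False
$crlfTolerant   # True
```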

PowerShell's native RegEx operators, -match and -cmatch, have two limitations that you may, one day, encounter.  These are discussed below.

The -match and -cmatch operators both return their results using the $matches hash table.  This is fine until you need to perform two or more concurrent RegEx matches, for example:

    [String] $Local:strValueA = ' test.';
    [String] $Local:strValueB = '2017 = ABC';

    if ( ($strValueA -match '^\s+(\w+)\.$') -and
         ($strValueB -match '^(\d{1,4})\s+=\s+(\w+)$') )
    {
        #
        # Do something.
        #
        Write-Host -Object 'Both RegEx patterns match.';
    } #if

In this simple example, both RegEx pattern matches are successful, but the $matches hash table only contains the captured values from the second match operation, that is, 2017 and ABC.
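The overwriting described above can be observed directly by inspecting $matches after each operation.  This is a minimal sketch using the same two values; the variable names are illustrative:

```powershell
# First match: $matches holds the capture from the first pattern.
$null       = ' test.' -match '^\s+(\w+)\.$'
$afterFirst = $matches[1]

# Second match: $matches is replaced by the results of the second pattern.
$null         = '2017 = ABC' -match '^(\d{1,4})\s+=\s+(\w+)$'
$afterSecond1 = $matches[1]
$afterSecond2 = $matches[2]

$afterFirst     # test
$afterSecond1   # 2017
$afterSecond2   # ABC
```

Once the second operation has succeeded, the value test is no longer available from $matches.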

In order to secure the captured values from two or more concurrent RegEx pattern matches, we have to use the .NET Framework's System.Text.RegularExpressions.Regex class, more specifically, its Match method, for example:

    [System.Text.RegularExpressions.Match] $Local:objMatchesA = $null;
    [System.Text.RegularExpressions.Match] $Local:objMatchesB = $null;
    [String] $Local:strValueA = ' test.';
    [String] $Local:strValueB = '2017 = ABC';

    # Match() always returns a Match object, even when the pattern does not
    # match, so the Success property must be tested.
    if ( ($objMatchesA = [RegEx]::Match( $strValueA, '^\s+(\w+)\.$')).Success -and
         ($objMatchesB = [RegEx]::Match( $strValueB, '^(\d{1,4})\s+=\s+(\w+)$')).Success )
    {
        #
        # Do something.
        #
        Write-Host -Object ( 'Both RegEx patterns match; captured values are "{0}", "{1}" and "{2}".' -f
            $objMatchesA.Groups[1].Value,
            $objMatchesB.Groups[1].Value,
            $objMatchesB.Groups[2].Value );
    } #if

As already seen above, PowerShell's -match and -cmatch operators support the i, m, n, s and x modifiers.   However, the .NET Framework's System.Text.RegularExpressions.Regex class supports four additional modifiers, which have no inline equivalent and are specified using the System.Text.RegularExpressions.RegexOptions enumeration:

Modifier Description
Compiled
CultureInvariant
ECMAScript
RightToLeft
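As a brief sketch of how such options might be supplied, the following passes RegexOptions values as the third argument of the Match method, using RightToLeft as the example.  The sample text and variable names are illustrative assumptions:

```powershell
# Illustrative sample text (an assumption, not taken from the primer).
$sample = 'one two ONE'

# Several options can be combined by casting a comma-separated list.
$options = [System.Text.RegularExpressions.RegexOptions] 'IgnoreCase, RightToLeft'

# Default direction: the search starts at the left of the text.
$leftToRight = [RegEx]::Match( $sample, 'one',
    [System.Text.RegularExpressions.RegexOptions]::IgnoreCase )

# RightToLeft: the search starts at the right of the text.
$rightToLeft = [RegEx]::Match( $sample, 'one', $options )

$leftToRight.Value   # one
$leftToRight.Index   # 0
$rightToLeft.Value   # ONE
$rightToLeft.Index   # 8
```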