PowerShell: Using Regular Expressions

This is not an introduction to or a tutorial on what Regular Expressions (RegEx) are or can do, but a guide on how to use them with PowerShell!


Brief summary

[regex] $Filter = "([iI])([dD])(\d{5})"

$Data | select-string -pattern $Filter # There are
$Data -match $Filter                   #     multiple ways on
[regex]::Match($Data, $Filter)         #         how to use RegEx in PS

More details on how to use one of these lines can be seen in the Examples section below.
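
As a quick orientation, the three variants return different kinds of objects. The following is only a minimal sketch; the sample value in $Data is made up for illustration:

```powershell
# Hypothetical sample input; any string containing an ID would do.
$Data = 'User id01234 logged on'
[regex] $Filter = '([iI])([dD])(\d{5})'

# 1. Select-String returns MatchInfo objects; the hits are in .Matches
$fromSelectString = ($Data | Select-String -Pattern $Filter).Matches[0].Value

# 2. -match returns $true/$false for a single string and fills $Matches
$isMatch = $Data -match $Filter
$fromOperator = $Matches[0]

# 3. [regex]::Match() returns a System.Text.RegularExpressions.Match object
$fromDotNet = [regex]::Match($Data, $Filter).Value

$fromSelectString, $fromOperator, $fromDotNet   # each one is 'id01234'
```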


Examples and snippets

Extract a text string

Matches “GroupName-RW” from $inputstring:

$inputstring = "C:\Folder1\Foler2\AnotherFolder3\;DOMAIN\GroupName-RW;Modify, Synchronize;Allow;inherited"
$pattern = ".*DOMAIN\\(.*?);"
[regex]::Match($inputstring, $pattern).Groups[1].Value

Get the five digits of an ID like ‘id01234’

… and use the extracted number to build another ID like “A-01234-Z”:

[regex] $Filter = "^([iI])([dD])(\d{5})$"
$user.extensionAttribute2 | select-string -pattern $Filter | % { $_.matches.groups[3].Value } | % { "A-" + $_ + "-Z"}
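
If only the rebuilt ID is needed, the -replace operator may be a shorter route. This is a sketch under the assumption that the input is a plain string; $id is a hypothetical stand-in for $user.extensionAttribute2:

```powershell
# Hypothetical sample value standing in for $user.extensionAttribute2
$id = 'id01234'

# $1 refers to the first (and only) capture group; the single quotes keep
# PowerShell from expanding $1 as a variable. -replace ignores case by default.
$newId = $id -replace '^id(\d{5})$', 'A-$1-Z'
$newId   # A-01234-Z
```

Keep in mind that -replace returns the input unchanged when the pattern does not match, so a non-conforming ID would pass through silently.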

Find a tag and its value in a text

The regular expression below finds variants of the following syntax (and more) in the example text block:

tag: 12
tAg =ab22
Tag - ab22_d-z

We want to find “Foo : Value” in the text:

$tag = "foo"
[regex] $re = "($tag)(\s*)([\-:=])(\s*)([a-zA-Z0-9_-]*)"

$text = @"
dfdfd fdsfdfdf
blah Foo : 1xa-dv dfdf
fdfdf fsdfsdf

bvbbd yuyuuuyu
qewe foo = 2yb_dv kkxx
lklkl nbbnbnn
"@

$text | select-string -pattern $re | % {$_.matches.groups}
$text | select-string -pattern $re | % {$_.matches.groups[5].Value} # -> "1xa-dv"

$text | select-string -pattern $re -AllMatches | % {$_.matches.groups} 
$text | select-string -pattern $re -AllMatches | % {$_.matches[0].groups[5].Value} # -> "1xa-dv"
$text | select-string -pattern $re -AllMatches | % {$_.matches[1].groups[5].Value} # -> "2yb_dv"

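Note: Select-String (like the -match operator when it is given a pattern string) is case-insensitive by default; use -CaseSensitive (or -cmatch) if the case has to match exactly. The .NET methods such as [regex]::Match() are case-sensitive by default; there, the inline modifier (?i)...(?-i) switches case-insensitivity on and off inside a RegEx. This works in PowerShell as well, see “Applying modifiers” below.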
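
A short sketch of how case sensitivity behaves across the different entry points; the sample text is taken from the example above, the variable names are only for illustration:

```powershell
$text = 'blah Foo : 1xa-dv'

# Select-String and -match ignore case unless told otherwise
$a = [bool]($text | Select-String -Pattern 'foo')                 # True
$b = [bool]($text | Select-String -Pattern 'foo' -CaseSensitive)  # False
$c = $text -match  'foo'                                          # True
$d = $text -cmatch 'foo'                                          # False

# The static .NET methods are case-sensitive by default ...
$e = [regex]::IsMatch($text, 'foo')                               # False
# ... unless the inline modifier switches that off
$f = [regex]::IsMatch($text, '(?i)foo')                           # True
```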

URLs

There are some very elaborate and long REs out there to match URLs in a text string; with a bit of trial and error, I trimmed it down for my most common use case:

[regex] $re = "(?i)(\bhttps?:\/\/[a-zA-Z0-9/+=\-?_.#%;]*)"
# Needs to be more specific: \S* also matches too much else (all the Markdown
# stuff and other text), which then cannot be trimmed, since it's not a single character.
# For example:
#   ... visit **[Title](http://www.example.net/foo.html)** blah blah...
#   ... visit <http://www.example.net/foo.html>, blah blah...

"Long string with many URLs like http://www.example.net/foo.html in it..." |
    Select-String -Pattern $re -AllMatches |
         % { ($_.Matches.Value).Trim(')>]"') }       # Display & trim (.Matches.Value, because .groups would list every URL twice)

That gets you URLs that begin with http or https, in upper- or lowercase, but it’s not perfect yet:

Although I tried a lot, I couldn’t find that one (simple) RegEx that would be flexible enough to match all legal URL characters but still ignore the closing characters (and/or space) that I don’t want. Some variants worked on one half of my test strings, others on the other half, but no single version fit them all…

So, plan B: Get what you can via RegEx and then trim any trailing character from the matched string that you don’t want (e.g. closing parenthesis, brackets, etc., like ] ' ) } > ") by hand in the script.
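
Plan B could look roughly like the sketch below. The sample text and the exact set of trimmed characters are assumptions chosen for illustration; adjust them to your own input:

```powershell
# Hypothetical sample text with a Markdown link and an angle-bracket link
$text = 'visit **[Title](http://www.example.net/foo.html)** or <https://example.net/a?b=1>, bye'

# Step 1: grab everything non-blank after the scheme (deliberately too much)
# Step 2: cut off the unwanted trailing characters by hand
$urls = [regex]::Matches($text, '(?i)\bhttps?://\S+') |
    ForEach-Object { $_.Value.TrimEnd([char[]]')>]}*,."''') }

$urls
# http://www.example.net/foo.html
# https://example.net/a?b=1
```

The price of this approach is that a URL which legitimately ends in one of the trimmed characters (a closing parenthesis, for instance) gets shortened as well.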

Split a string at the first occurrence of a number

("ABC123" -split '(?=\d)')[0]
ABC

By the way: This doesn’t work the other way around (so, not for “123ABC”)…
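
One possible way to cover both directions is to hand -split a maximum number of substrings, so that only the first boundary is used, and to look ahead for a letter instead of a digit in the reverse case. A minimal sketch (the variable names are only for illustration):

```powershell
# Limit the result to 2 parts, so only the first digit boundary splits
$head, $tail = 'ABC123' -split '(?=\d)', 2          # 'ABC' and '123'

# Reverse case: look ahead for the first letter instead of the first digit
$num, $rest = '123ABC' -split '(?=[A-Za-z])', 2     # '123' and 'ABC'
```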

Ignore comments in a text file

A sample input (i.e. “file.txt”):

# Comment A
# -----------------
Value 1       # Comment B
Longer Text   # Another comment
Value 3

And here’s how to get just the values from the file (line by line), without the comments:

$Delimiter = '#'

$x = Get-Content -Path '.\file.txt' |
    Select-String -Pattern "^([^$Delimiter]*)(.*)$" | # Two groups: The actual text in group 1 and the comment in group 2.
        % { ($_.matches.groups[1].Value).Trim() } |   # Group 0 is the whole match, but we're only interested in group 1 here;
            ? { ![string]::IsNullOrEmpty($_)}         # we also trim leading & trailing spaces and ignore empty/blank lines.
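
The same result can probably be reached with the -replace operator, which simply deletes the delimiter and everything after it. A sketch, with an in-memory array as a hypothetical stand-in for the file content:

```powershell
# Hypothetical stand-in for: Get-Content -Path '.\file.txt'
$lines = @(
    '# Comment A'
    'Value 1       # Comment B'
    'Value 3'
)

$values = $lines |
    ForEach-Object { ($_ -replace '#.*$', '').Trim() } |   # drop '#' and everything after it
    Where-Object { $_ }                                    # skip lines that are now empty

$values   # 'Value 1' and 'Value 3'
```

Like the Select-String version, this cuts off values that contain a literal # themselves.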

Applying modifiers

See Specifying modes inside the Regular Expression for more details.

For example, (?i) makes a regex case insensitive:

Param ($input_string)

[regex] $regex1 = "^([0-9]+)\s*((?i)b|kb|mb|gb|tb)$" # (?i) to accept kb, KB, Kb, etc.
[regex] $regex2 = "^([0-9]+)\s*((?i)[kmgt]?b)$"      # Shorter.

$result = $regex1.Match($input_string)   # Not named $matches: that is the automatic variable filled by -match

$size = $result.Groups[1].Value
$unit = $result.Groups[2].Value

"Size: $size; Unit: $unit"

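The script above prints empty values when the input does not match at all. A possible variant checks the Success property first and passes the IgnoreCase option to the constructor instead of using the inline modifier. This is only a sketch; the sample inputs are made up:

```powershell
# Same pattern as $regex2, but with the option set on the Regex object
$regex = [regex]::new('^([0-9]+)\s*([kmgt]?b)$', 'IgnoreCase')

$report = foreach ($s in '10 KB', '3gb', '7 apples') {
    $m = $regex.Match($s)
    if ($m.Success) {
        "Size: $($m.Groups[1].Value); Unit: $($m.Groups[2].Value)"
    }
    else {
        "No match: $s"
    }
}

$report
# Size: 10; Unit: KB
# Size: 3; Unit: gb
# No match: 7 apples
```
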
Greedy and Lazy Quantifiers

Add a ? to a quantifier to change it from “greedy” to “lazy”:

Greedy quantifier   Lazy quantifier   Description
*                   *?                Zero or more times
+                   +?                One or more times
?                   ??                Zero or one time
{n}                 {n}?              Exactly n times
{n,}                {n,}?             n or more times
{n,m}               {n,m}?            Between n and m times

> "sascha" | select-string -pattern "s.*a"  | % { "Greedy: $($_.matches.value)" }
> "100101" | select-string -pattern "1.*"   | % { "Greedy: $($_.matches.value)" }
> "sascha" | select-string -pattern "s.*?a" | % { "Lazy  : $($_.matches.value)" }
> "100101" | select-string -pattern "1.*?"  | % { "Lazy  : $($_.matches.value)" }

Greedy: sascha
Greedy: 100101
Lazy  : sa
Lazy  : 1

Operator -match

The operators -match and -notmatch use regular expressions to search for a pattern in the left-hand side values:
(See also About Comparison Operators: -match and -notmatch at microsoft.com)

String

> "This is a string" -match "This"
True

> "This is a string" -match "^This$"
False

> "This is a string" -match "^This([a-zA-Z\s]*)string$"
True

If the result is $true, then the automatic variable $Matches will be set with the result:

> $Matches

Name                           Value
----                           -----
1                               is a
0                              This is a string

> $Matches[1]
 is a

Array of strings

If -match is used with a collection (e.g. an array of strings), it does not return a boolean value, but acts as a filter and returns the elements that match:
(And the automatic variable $Matches will not be set/overwritten!)

> @("First item in an array of strings", "Second item in an array of strings") -match "^([a-zA-Z\s]*)(array)([a-zA-Z\s]*)$"
First item in an array of strings
Second item in an array of strings
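
Since $Matches is not filled in this case, capture groups have to be collected per element. A minimal sketch with made-up sample strings:

```powershell
# Hypothetical sample data
$items = 'id00001 joined', 'nothing here', 'ID00002 left'

# On a collection, -match works as a filter and returns the matching elements
$hits = $items -match 'id\d{5}'          # 'id00001 joined' and 'ID00002 left'

# To get at the capture groups, test every element on its own
$ids = $items | ForEach-Object {
    if ($_ -match '(id\d{5})') { $Matches[1] }
}

$ids                                      # 'id00001' and 'ID00002'
```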

Multi-Line Regular Expressions

At work, I had the need to find certain values in a file that were only detectable if one considered also the values of preceding lines.
So I thought: This is a job for multi-line regular expressions! Which I had never used before… 😓

The problem

This is an excerpt from a text file that I got which had several thousand events from the Windows Security log. The file was named a *.csv, and it did in fact contain some lines that were comma/semicolon separated; but beneath those lines, it had a body like this (the output you’d get from something like Get-EventLog):

Subject:
Security ID:S-1-2-34-5678901234-5678901234-5678901234-56789
Account Name:id00001
Account Domain:EXAMPLE
Logon ID:0x77A284BCD
 
Member:
Security ID:S-1-4-32-1234567890-1234567890-1234567890-12345
Account Name:CN=S-1-4-32-1234567890-1234567890-1234567890-12345,CN=ForeignSecurityPrincipals,DC=example,DC=net
 
Group:
Security ID:S-1-8-64-0987654321-0987654321-0987654321-098765
Group Name:Some GroupName
Group Domain:EXAMPLE
 
Additional Information:
Privileges:-"

Multiply that a few thousand times, and you get the idea.

This file (which was generated by a different team) contained the result of filtering for a 4733 event of a given AD group, in a specified time frame and AD domain; and we needed to get the users that got kicked out, and by whom.

The members were the easy part (I can’t actually remember anymore why that was the case, maybe because they were all listed with their Distinguished Name), but the users from the Subject section (i.e. the user and logon session that performed the action) were trickier: Just looking for “Account Name:” would also match the account names of the group members that were removed from the group.

So, my idea: To do a regular expression pattern that would match over multiple lines – how hard could it be…?

(Hm, thinking about it again now: I could also have filtered for the matches of “Account Name:” that were not followed by a CN= maybe… Anyway, at least this way I learned how to use multi-line searches with regular expressions…)
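
That alternative idea could be sketched with a negative lookahead. It was not part of the original solution and is untested against the real log file; the shortened sample below is made up:

```powershell
# Hypothetical, shortened sample in the style of the excerpt above
$sample = @'
Subject:
Account Name:id00001
Member:
Account Name:CN=S-1-4-32-1234567890,CN=ForeignSecurityPrincipals,DC=example,DC=net
'@

# (?!CN=) rejects every "Account Name:" that is directly followed by CN=;
# \r? keeps a Windows line ending out of the captured value.
$names = [regex]::Matches($sample, '(?mi)^Account Name:(?!CN=)(.*?)\r?$') |
    ForEach-Object { $_.Groups[1].Value }

$names   # id00001
```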

Multi-line RegEx pattern [1]

First off, I created a regular expression that spans multiple lines and against which I shall compare the input text:

[regex] $pattern = '(?msi)^Subject:.*?Security ID:.*?Account Name:(.*?)$'

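The (?msi) at the beginning is a mode modifier that switches on three options:

  - m (multi-line): ^ and $ match at the start and end of every line, not only of the whole string.
  - s (single-line): the dot . also matches line breaks, so .* can span several lines.
  - i (ignore case): the match is case-insensitive.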

In order to match the smallest possible expression, the question mark is added to the wildcard: .*?

The third wildcard is put into parentheses (), so that its match gets its own capture group, which can then be accessed directly later on.

Using it

Next up, processing the input text file and finding all matches for the pattern:

$FileContent = Get-Content -Raw -Path "InputFile.txt"                               # (1.)
$AllMatches = Select-String -InputObject $FileContent -Pattern $pattern -AllMatches # (2.)
$ExpandedMatches = $AllMatches | Select-Object -ExpandProperty Matches              # (3.)
ForEach ($i in $ExpandedMatches) { $i.Groups[1].Value }                             # (4.)
  1. Use Get-Content with the -Raw argument to read the text file as one continuous string; otherwise the content is passed on line by line, and the multi-line RegEx can never match.
  2. Then collect all parts of that string that match the RegEx pattern.
  3. The Matches property is then expanded to get at the individual match objects.
  4. And finally, get the actual value that I’m interested in: In this example, it’s the first (and only) captured group, with index 1 (because index 0 returns the whole matched pattern, starting with Subject:…).

Of course, this can also be written more densely as one compact pipeline expression, without all those intermediate variables:

Get-Content -Raw -Path "InputFile.txt" |
    Select-String -Pattern $pattern -AllMatches |
            Select-Object -ExpandProperty Matches |
                % { $_.Groups[1].Value }
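
One caveat with files that use Windows line endings: in multi-line mode, $ matches just before the line feed, so the captured value can end in a carriage return. The sketch below trims the value to be safe; the two-record sample with explicit CRLF line endings is made up for illustration:

```powershell
# Hypothetical sample with two records and explicit CRLF line endings
$raw = "Subject:`r`nSecurity ID:S-1-2-34`r`nAccount Name:id00001`r`n" +
       "Member:`r`nAccount Name:CN=x`r`n" +
       "Subject:`r`nSecurity ID:S-1-2-35`r`nAccount Name:id00002`r`n"

[regex] $pattern = '(?msi)^Subject:.*?Security ID:.*?Account Name:(.*?)$'

# Trim() removes the trailing carriage return that (.*?) picks up before the line feed
$accounts = $pattern.Matches($raw) | ForEach-Object { $_.Groups[1].Value.Trim() }

$accounts   # id00001 and id00002
```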

  1. Thanks to the article Multi-line Regular Expression Replace in Powershell for explaining the concepts!