PowerShell: Multi-Line Regular Expressions

At work, I needed to find certain values in a file that could only be identified by also looking at the lines preceding them.
So I thought: This is a job for multi-line regular expressions! Which I had never used before… 😓

The problem

This is an excerpt from a text file I received that contained several thousand events from the Windows Security log. The file was named *.csv, and it did in fact contain some lines that were comma/semicolon separated; but beneath those lines, it had a body like this (the output you’d get from something like Get-EventLog):

Subject:
Security ID:S-1-2-34-5678901234-5678901234-5678901234-56789
Account Name:id00001
Account Domain:EXAMPLE
Logon ID:0x77A284BCD
 
Member:
Security ID:S-1-4-32-1234567890-1234567890-1234567890-12345
Account Name:CN=S-1-4-32-1234567890-1234567890-1234567890-12345,CN=ForeignSecurityPrincipals,DC=example,DC=net
 
Group:
Security ID:S-1-8-64-0987654321-0987654321-0987654321-098765
Group Name:Some GroupName
Group Domain:EXAMPLE
 
Additional Information:
Privileges:-"

Multiply that a few thousand times, and you get the idea.

This file (which was generated by a different team) contained the result of filtering for a 4733 event of a given AD group, in a specified time frame and AD domain; and we needed to get the users that got kicked out, and by whom.

The members were the easy part (I can’t actually remember anymore why that was the case, maybe because they were all listed with their Distinguished Name), but the users from the Subject section (i.e. the user and logon session that performed the action) were trickier: Just looking for “Account Name:” would also match the account names of the group members that were removed from the group.

So, my idea: use a regular expression pattern that matches across multiple lines – how hard could it be…?

(Hm, thinking about it again now: I could also have filtered for the matches of “Account Name:” that were not followed by a CN=, maybe… Anyway, at least this way I learned how to use multi-line searches with regular expressions…)
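
That alternative could be sketched roughly as follows. It relies on a negative lookahead, (?!CN=), and on the assumption that every removed member really is listed with a Distinguished Name starting with CN=. The sample text and the variable names are made up for illustration.

```powershell
# Hypothetical sample: one Subject block and one Member block, joined with line feeds.
$sample = @(
    'Subject:'
    'Security ID:S-1-2-34-5678901234-5678901234-5678901234-56789'
    'Account Name:id00001'
    'Account Domain:EXAMPLE'
    'Member:'
    'Security ID:S-1-4-32-1234567890-1234567890-1234567890-12345'
    'Account Name:CN=S-1-4-32-1234567890-1234567890-1234567890-12345,CN=ForeignSecurityPrincipals,DC=example,DC=net'
) -join "`n"

# (?m) lets ^ match at the start of every line; (?!CN=) rejects the member entries.
# [^\r\n]* keeps the capture on one line and leaves out any trailing carriage return.
[regex] $lookaheadPattern = '(?mi)^Account Name:(?!CN=)([^\r\n]*)'

$subjectAccounts = $lookaheadPattern.Matches($sample) |
    ForEach-Object { $_.Groups[1].Value }

$subjectAccounts   # id00001
```

No multi-line matching is needed here, because each line is judged on its own content.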

Multi-line RegEx pattern [1]

First off, I created a regular expression that spans multiple lines, against which the input text is then matched:

[regex] $pattern = '(?msi)^Subject:.*?Security ID:.*?Account Name:(.*?)$'

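The (?msi) at the beginning is a set of mode modifiers:
  - m (multi-line): ^ and $ match at the start and end of every line, not just at the start and end of the whole input.
  - s (single-line): the dot . also matches line breaks, so .* can span several lines.
  - i (ignore case): the match is case-insensitive.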

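To match the shortest possible text, a question mark is added to the wildcard, which makes it lazy: .*? Without it, the greedy .* would stretch as far as possible, potentially to the last occurrence in the whole file.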

The third wildcard is put in parentheses (), so that its match forms a separate capture group, which can then be accessed directly later on.
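
To see what the pattern captures, here is a small sketch that runs it against a shortened, hypothetical copy of the excerpt. The sample is deliberately joined with CRLF, as is typical for a file written on Windows, because that exposes a detail worth knowing: in .NET, $ in multi-line mode matches just before the line feed, so the captured group keeps a trailing carriage return. The variant with \r? is one possible way around that, not something taken from the original file.

```powershell
# Hypothetical sample joined with CRLF line endings.
$sample = @(
    'Subject:'
    'Security ID:S-1-2-34-5678901234-5678901234-5678901234-56789'
    'Account Name:id00001'
    'Account Domain:EXAMPLE'
    'Member:'
    'Security ID:S-1-4-32-1234567890-1234567890-1234567890-12345'
    'Account Name:CN=S-1-4-32-1234567890-1234567890-1234567890-12345,CN=ForeignSecurityPrincipals,DC=example,DC=net'
) -join "`r`n"

[regex] $pattern = '(?msi)^Subject:.*?Security ID:.*?Account Name:(.*?)$'
$firstMatch = $pattern.Match($sample)

$firstMatch.Groups[0].Value          # whole match, from "Subject:" to the end of the first "Account Name:" line
$firstMatch.Groups[1].Value.Length   # 8: "id00001" plus the carriage return

# Variant that consumes an optional carriage return outside of the capture group.
[regex] $crlfSafePattern = '(?msi)^Subject:.*?Security ID:.*?Account Name:(.*?)\r?$'
$cleanValue = $crlfSafePattern.Match($sample).Groups[1].Value
$cleanValue                          # id00001
```

Calling .TrimEnd("`r") or .Trim() on the captured value would be an equally simple fix.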

Using it

Next up, processing the input text file and finding all matches for the pattern:

$FileContent = Get-Content -Raw -Path "InputFile.txt"                               # (1.)
$AllMatches = Select-String -InputObject $FileContent -Pattern $pattern -AllMatches # (2.)
$ExpandedMatches = $AllMatches | Select-Object -ExpandProperty Matches              # (3.)
ForEach ($i in $ExpandedMatches) { $i.Groups[1].Value }                             # (4.)
  1. Use Get-Content with the -Raw parameter to read the text file as one continuous string. Otherwise Get-Content returns one string per line, Select-String tests each line on its own, and the multi-line RegEx can never match.
  2. Then collect every match of the RegEx pattern; -AllMatches is needed because, without it, Select-String only reports the first match in the string.
  3. Select-String returns a MatchInfo object; expanding its Matches property yields the individual match objects.
  4. And finally, getting the actual value that I’m interested in: In this example, it’s the first (and only) captured group, with index 1 (because index 0 returns the whole matched pattern, starting with Subject:…).
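
The effect of -Raw from step 1 can be checked with a small experiment. This is only a sketch: it writes a hypothetical, shortened event to a temporary file and then reads it back both ways.

```powershell
# Write a hypothetical sample event to a temporary file, one array element per line.
$tempFile = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tempFile -Value @(
    'Subject:'
    'Security ID:S-1-2-34-5678901234-5678901234-5678901234-56789'
    'Account Name:id00001'
    'Account Domain:EXAMPLE'
    'Member:'
    'Security ID:S-1-4-32-1234567890-1234567890-1234567890-12345'
    'Account Name:CN=S-1-4-32-1234567890-1234567890-1234567890-12345,CN=ForeignSecurityPrincipals,DC=example,DC=net'
)

[regex] $pattern = '(?msi)^Subject:.*?Security ID:.*?Account Name:(.*?)$'

# Without -Raw: every line is a separate input string, so the pattern never matches.
$lineByLine = @(Get-Content -Path $tempFile |
    Select-String -Pattern $pattern -AllMatches)

# With -Raw: the whole file is a single input string.
$asOneString = Get-Content -Raw -Path $tempFile |
    Select-String -Pattern $pattern -AllMatches

$lineByLine.Count            # 0
$asOneString.Matches.Count   # 1

Remove-Item -Path $tempFile
```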

Of course, this can also be written more compactly as a single pipeline expression, without all those intermediate variables:

Get-Content -Raw -Path "InputFile.txt" |
    Select-String -Pattern $pattern -AllMatches |
    Select-Object -ExpandProperty Matches |
    ForEach-Object { $_.Groups[1].Value }
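
Since the goal was to find out who was removed and by whom, the same technique could in principle capture both values in one pass. The following is a sketch only, with assumptions: the group names Actor and Member, the variable names, and the second sample event are invented, and it presumes that every event contains a Subject section followed by a Member section. If an event lacked the Member section, the lazy wildcard would run on into the next event.

```powershell
# Two hypothetical, shortened events joined with line feeds.
$sample = @(
    'Subject:'
    'Security ID:S-1-2-34-5678901234-5678901234-5678901234-56789'
    'Account Name:id00001'
    'Member:'
    'Security ID:S-1-4-32-1234567890-1234567890-1234567890-12345'
    'Account Name:CN=S-1-4-32-1234567890-1234567890-1234567890-12345,CN=ForeignSecurityPrincipals,DC=example,DC=net'
    'Subject:'
    'Security ID:S-1-2-34-5678901234-5678901234-5678901234-56790'
    'Account Name:id00002'
    'Member:'
    'Security ID:S-1-4-32-1234567890-1234567890-1234567890-12346'
    'Account Name:CN=S-1-4-32-1234567890-1234567890-1234567890-12346,CN=ForeignSecurityPrincipals,DC=example,DC=net'
) -join "`n"

# Named groups (?<Name>...) make the two captured values easier to tell apart.
[regex] $pairPattern = '(?msi)^Subject:.*?^Account Name:(?<Actor>[^\r\n]*).*?^Member:.*?^Account Name:(?<Member>[^\r\n]*)'

$removals = $pairPattern.Matches($sample) | ForEach-Object {
    [pscustomobject]@{
        RemovedBy = $_.Groups['Actor'].Value
        Member    = $_.Groups['Member'].Value
    }
}

$removals | Format-List
```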

  1. Thanks to the article Multi-line Regular Expression Replace in Powershell for explaining the concepts!