Using PowerShell, I know how to search a file for a complicated string using a regex and replace each match with some fixed value, as in the following snippet:

Get-ChildItem  "*.txt" |
Foreach-Object {
    $c = ($_ | Get-Content)
    $c = $c -replace $regexA,'NewText'
    [IO.File]::WriteAllText($_.FullName, ($c -join "`r`n"))
}

Now I'm trying to figure out how to replace a subsection of each match of a regex. Can this be done in one smooth step like above? Or do you have to extract each match of the larger regex, search and replace within it, and then somehow stick that result back into the original text?

To clarify with an example, suppose that in the following test text I want to find only the 14xx-numbered instances, like "TEST=*1404", and replace the 14xx with 16xx:

A 2180 1830 12 0 3 3 TEST=C1404
A 900 1830 12 0 3 3 TEST=R1413
A 400 1830 12 0 3 3 TEST=R1411
A 1090 1970 12 0 3 3 TEST=U1400
A 1090 1970 12 0 3 3 TEST=CSA1400
A 1090 1970 12 0 3 3 TEST=CSA1414
A 1090 1970 12 0 3 3 TEST=CSA140
A 1090 1970 12 0 3 3 TEST=CSA14001
A 1090 1970 12 0 3 3 TEST=CSA17001

I.e. I'd like the resulting text to be as follows, where you'll note that only the first 6 lines should change:

A 2180 1830 12 0 3 3 TEST=C1604
A 900 1830 12 0 3 3 TEST=R1613
A 400 1830 12 0 3 3 TEST=R1611
A 1090 1970 12 0 3 3 TEST=U1600
A 1090 1970 12 0 3 3 TEST=CSA1600
A 1090 1970 12 0 3 3 TEST=CSA1614 <- Second instance of '14' shouldn't change
A 1090 1970 12 0 3 3 TEST=CSA140 <- Shorter numbers shouldn't change
A 1090 1970 12 0 3 3 TEST=CSA14001 <- Longer numbers shouldn't change
A 1090 1970 12 0 3 3 TEST=CSA17001

The following regex seems to do the job of finding the larger strings where I need to make replacements, but I don't know what functionality in PowerShell (replace?) to use to replace just a substring of each match. Also, feel free to suggest a better regex if that would help.

$regexA = "\bTEST=\b[A-Za-z]+14\d\d\r"

I'd rather not have to hard-code an exhaustive list of the stuff that can come between the '=' and the numbers, like 'R', 'C', "CSA", etc.

I've been working on something for an hour or so where I get all the matches for the regex, search within them to replace 14 with 16, then run replace on the original text with the old and new values, e.g. replace($myText,"TEST=CSA1400","TEST=CSA1600"). This is not covering the special cases very well, and it feels like I'm heading down the rabbit-hole.

3 Answers

You need to group the sub-expressions you want to preserve (i.e. put them between parentheses) and then reference the groups via the variables $1 and $2 in the replacement string. Try something like this:

$regexA = '( TEST=[A-Za-z]+)14(\d\d)$'

Get-ChildItem '*.txt' | ForEach-Object {
    $c = (Get-Content $_.FullName) -replace $regexA, '${1}16$2' -join "`r`n"
    [IO.File]::WriteAllText($_.FullName, $c)
}
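To see what the two capture groups do without touching any files, the same pattern can be run against a few of the question's sample lines held in memory. This is only a minimal sketch: the `$sample` and `$result` variable names are illustrative and not part of the answer.

```powershell
# A few sample lines from the question, held in an array instead of a file.
$sample = @(
    'A 2180 1830 12 0 3 3 TEST=C1404'
    'A 1090 1970 12 0 3 3 TEST=CSA1414'
    'A 1090 1970 12 0 3 3 TEST=CSA140'
    'A 1090 1970 12 0 3 3 TEST=CSA14001'
)

# Group 1 keeps ' TEST=' plus the letters; group 2 keeps the last two digits.
$regexA = '( TEST=[A-Za-z]+)14(\d\d)$'

# ${1} is written with braces so that the '16' that follows it is not read
# as part of the group number (as it would be in '$116').
$result = $sample -replace $regexA, '${1}16$2'

$result
```

Applied to an array, -replace processes each element separately, so `$` anchors at the end of every line, which is why the shorter and longer numbers are left alone.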
  • +1 I was scratching my head over escaping that replacement argument.
    – mjolinor
    Commented Nov 12, 2013 at 3:52
  • In your method, is there any danger to adding a -raw to the Get-Content and removing the join from the replacement part?
    – SSilk
    Commented Nov 12, 2013 at 14:54
  • @SSilk That should work, too, but you need to replace the $ in the regular expression with something like (?=\r|$), a lookahead, so that the carriage return is matched but not removed. Commented Nov 12, 2013 at 15:19
  • Another follow-up question: what if the value I'm sticking in the middle is itself another variable, external to the regex, rather than a fixed value. E.g. if the 16 in your sample code were some variable $number, what do I need to do to get it recognized as such? When I try that with your code, it literally prints the variable name with $. Thanks.
    – SSilk
    Commented Nov 12, 2013 at 19:28
  • @SSilk If you want to use regular variables in the replacement string you need to use double quotes instead of single quotes, and escape the variables referencing the groups from the regular expression: "`${1}$number`$2" (the braces keep a numeric $number from being read as part of the group number). Commented Nov 12, 2013 at 19:42
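The two follow-ups in the comments (an external variable in the replacement, and reading the file with -Raw) can be tried on in-memory strings. What follows is a sketch under stated assumptions: `$number`, `$replacement`, `$line` and `$raw` are hypothetical names, and the -Raw pattern uses `(?m)` with a lookahead rather than the exact expression from the comment.

```powershell
$number = 16   # hypothetical value supplied from outside the regex

# Double quotes let PowerShell expand $number. The backtick before each group
# reference keeps its dollar sign literal, so the regex engine receives ${1}16$2.
$replacement = "`${1}$number`$2"

$line = 'A 900 1830 12 0 3 3 TEST=R1413'
$fixedLine = $line -replace '( TEST=[A-Za-z]+)14(\d\d)$', $replacement

# -Raw style input: one string containing CRLF line breaks.
# (?m) makes $ match at every line end; (?=\r?$) checks for the line end
# without consuming the carriage return, so line endings are preserved.
$raw = "A 1 TEST=C1404`r`nA 2 TEST=CSA14001`r`nA 3 TEST=U1400"
$fixedRaw = $raw -replace '(?m)( TEST=[A-Za-z]+)14(\d\d)(?=\r?$)', $replacement

$fixedLine
$fixedRaw
```

Without the braces, a numeric value such as 16 would turn the replacement into `$116$2`, which the regex engine reads as a reference to a group 116 that does not exist.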

Here's an example using a scriptblock delegate (sometimes called an evaluator):

$regex = [regex]'( TEST=\D+)14(\d{2})\s*$'
$evaluator = { '{0}16{1}' -f $args[0].Groups[1..2] }
filter set-number { $regex.Replace($_, $evaluator) }

foreach ($file in Get-ChildItem  "*.txt")
 {
   ($file | get-content) | set-number | Set-Content $file.FullName
 }

It's arguably more complex than the -replace operator, but it lets you use PowerShell operators to construct the replacement text, so you can do anything you can put in a script block.
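As an illustration of that flexibility, an evaluator can compute the new number instead of pasting in a fixed '16'. This is a hypothetical variation, not the answer's code: the `$addOffset` name and the offset of 200 are assumptions made for the example.

```powershell
# Group 1 is the ' TEST=' prefix plus letters; group 2 is the whole 14xx number.
$regex = [regex]'( TEST=[A-Za-z]+)(14\d\d)$'

# Hypothetical evaluator: add 200 to the matched number (14xx becomes 16xx).
# The explicit cast makes the script block usable as a .NET MatchEvaluator.
[System.Text.RegularExpressions.MatchEvaluator]$addOffset = {
    param($match)
    $prefix  = $match.Groups[1].Value
    $shifted = [int]$match.Groups[2].Value + 200
    "$prefix$shifted"
}

$lines = @(
    'A 400 1830 12 0 3 3 TEST=R1411'
    'A 1090 1970 12 0 3 3 TEST=CSA17001'
)

# Regex.Replace calls the evaluator once per match and inserts its return value.
$result = foreach ($line in $lines) { $regex.Replace($line, $addOffset) }

$result
```

Because the evaluator receives the full Match object, any value derived from the groups (arithmetic, lookups, formatting) can go into the replacement.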

Try this:

Get-ChildItem  "*.txt" |
Foreach-Object {
  $c = $_ | Get-Content | Foreach {$_ -replace '(?<=TEST=\D+)14(?=\d{2}(\D+|$))','16'}
  $c | Out-File $_.FullName -Enc Ascii
}
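The pattern in this answer uses lookarounds: the lookbehind `(?<=TEST=\D+)` and the lookahead `(?=\d{2}(\D+|$))` only check the text around the 14, so the match itself is just the two characters being replaced and no group references are needed. A quick check on some of the question's sample lines might look like this (a sketch; the `$pattern`, `$sample` and `$result` names are illustrative):

```powershell
# Only the '14' itself is matched; the context is asserted but not consumed.
$pattern = '(?<=TEST=\D+)14(?=\d{2}(\D+|$))'

$sample = @(
    'A 1090 1970 12 0 3 3 TEST=U1400'
    'A 1090 1970 12 0 3 3 TEST=CSA1414'
    'A 1090 1970 12 0 3 3 TEST=CSA140'
    'A 1090 1970 12 0 3 3 TEST=CSA14001'
)

# The replacement is a plain '16' because nothing else was consumed.
$result = $sample -replace $pattern, '16'

$result
```

The second '14' in CSA1414 is left alone because the lookbehind requires non-digit characters directly before the match.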
  • $f = $_.FullName; (Get-Content $f) -replace ... | Out-File $f ... is probably a more elegant approach. Commented Nov 12, 2013 at 11:47
