What good does a steppable pipeline do?

Question

In the company where I am quite new, they have a wrapper for almost every native PowerShell cmdlet, mainly to add more logging and error handling. I am trying to push back on this and also to point to the built-in PowerShell feature for creating a proxy command, like:

```
$GCI = Get-Command Get-ChildItem
[System.Management.Automation.ProxyCommand]::Create($GCI)
```
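For reference, the string that `ProxyCommand.Create()` returns is the source code of an advanced function. A minimal sketch of turning it into a callable wrapper might look like this; the function name `Get-ChildItemProxy` is hypothetical, and any added logging or error handling would be edited into `$body` (or a saved copy of it) before the function is defined:

```powershell
# Generate the source code of a proxy function for Get-ChildItem
$cmd  = Get-Command -Name Get-ChildItem -CommandType Cmdlet
$body = [System.Management.Automation.ProxyCommand]::Create($cmd)

# The generated code delegates to the original cmdlet via a steppable pipeline
$body -match 'GetSteppablePipeline'   # True

# Define a function (hypothetical name) whose body is the generated code
Set-Item -Path Function:\Get-ChildItemProxy -Value ([scriptblock]::Create($body))

# The proxy accepts the same parameters as Get-ChildItem
Get-ChildItemProxy -Path $PSHOME | Select-Object -First 3
```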

But I am lacking some knowledge here.
What is the difference (if any) between using a steppable pipeline and using native PowerShell pipeline syntax?
In other words, in the process block, what is the difference between:

```
$steppablePipeline.Process($_)
```

and using the native PowerShell syntax:

```
$_ | Microsoft.PowerShell.Management\Get-ChildItem  # in this example
```

I am aware that I am asking for rather general information, but there appears to be hardly any documentation on, e.g., the ScriptBlock.GetSteppablePipeline method.
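The difference can be made observable with a small experiment. The sketch below assumes a hypothetical helper, `Trace-Lifecycle`, that merely reports which of its blocks run, plus two hypothetical wrappers: `Invoke-Nested` uses `$_ | ...`-style syntax (a nested pipeline per input object), while `Invoke-Stepped` uses a steppable pipeline:

```powershell
# Hypothetical helper that reports which of its blocks run
function Trace-Lifecycle {
  param([Parameter(ValueFromPipeline)] $InputObject)
  begin   { 'begin' }
  process { "process $InputObject" }
  end     { 'end' }
}

# Variant 1: a new, nested pipeline for every input object
function Invoke-Nested {
  [CmdletBinding()]
  param([Parameter(ValueFromPipeline)] $InputObject)
  process { $InputObject | Trace-Lifecycle }
}

# Variant 2: one steppable pipeline whose life cycle follows this function's
function Invoke-Stepped {
  [CmdletBinding()]
  param([Parameter(ValueFromPipeline)] $InputObject)
  begin {
    $steppablePipeline = { Trace-Lifecycle }.GetSteppablePipeline($MyInvocation.CommandOrigin)
    $steppablePipeline.Begin($PSCmdlet)
  }
  process { $steppablePipeline.Process($InputObject) }
  end     { $steppablePipeline.End() }
}

1, 2 | Invoke-Nested    # begin, process 1, end, begin, process 2, end
1, 2 | Invoke-Stepped   # begin, process 1, process 2, end
```

With the nested pipeline, the wrapped command's begin and end blocks run once per input object; with the steppable pipeline, they run once overall.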

Well, $someStuff |A-Command will automatically call Begin() once, then Process() for each input item, then End() - with SteppablePipeline you get direct control over this flow. If you don't need that then obviously you don't need one — Mathias R. Jessen, Commented Jul 20, 2022 at 14:12
Thanks @Mathias, that makes sense. I guess that is actually the answer, as simple as it is 😊. — iRon, Commented Jul 20, 2022 at 14:24
@Mathias would be nice to have the comment as answer to the question :) — Santiago Squarzon, Commented Jul 20, 2022 at 14:38
@SantiagoSquarzon I agree, but it's gonna be a few hours before I have the time to write up a proper answer ^_^ — Mathias R. Jessen, Commented Jul 20, 2022 at 14:41
When you do $_ | SomeCommand inside your command, you create a nested pipeline. With a steppable pipeline, you can actually chain SomeCommand into the pipeline that your command is part of. This can be a performance improvement (e.g. when SomeCommand does expensive begin and end processing). In some cases you can provide correct results only by using a steppable pipeline (e.g. when SomeCommand is one of the Format-* cmdlets, which need to see the entire input). — zett42, Commented Jul 20, 2022 at 16:57
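The correctness point about commands that need to see the entire input can be illustrated with Sort-Object, an aggregating cmdlet in the same category as the Format-* cmdlets. This is a minimal sketch; the wrapper names `Invoke-NestedSort` and `Invoke-SteppedSort` are hypothetical:

```powershell
# Nested pipeline per object: each Sort-Object call sees a single object
function Invoke-NestedSort {
  [CmdletBinding()]
  param([Parameter(ValueFromPipeline)] $InputObject)
  process { $InputObject | Sort-Object }
}

# Steppable pipeline: one Sort-Object instance receives all input objects
function Invoke-SteppedSort {
  [CmdletBinding()]
  param([Parameter(ValueFromPipeline)] $InputObject)
  begin {
    $steppablePipeline = { Sort-Object }.GetSteppablePipeline($MyInvocation.CommandOrigin)
    $steppablePipeline.Begin($PSCmdlet)
  }
  process { $steppablePipeline.Process($InputObject) }
  end     { $steppablePipeline.End() }
}

3, 1, 2 | Invoke-NestedSort    # 3, 1, 2 (input order; nothing was sorted)
3, 1, 2 | Invoke-SteppedSort   # 1, 2, 3
```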

mklement0 · Accepted Answer · 2022-07-22 15:12:50Z

This venerable blog post from 2009, which introduced proxy functions (wrapper functions), explains that steppable pipelines are required to implement them; the following quote suggests (but doesn't explicitly state) that they may have been created for that very purpose:

In particular, what you want to have happen is to be able to control the execution of the calling command – to control when it’s BEGINPROCESS(), PROCESSRECORD(), ENDPROCESS(), etc methods are called

Simply put, proxy functions, via steppable pipelines, allow you to implement a cmdlet (advanced function) by delegating most of the implementation to another cmdlet in a memory-efficient, streaming manner.

Specifically, a steppable pipeline allows you to delegate the implementation of your proxy function to a script block whose life cycle is kept in sync with the proxy function itself, in terms of initialization (begin block), per-object pipeline input processing (process block), and termination (end block). This means that a single instance of the wrapped cmdlet is, in effect, directly connected to the same pipeline as the proxy function itself.

Conversely, this means: you don't strictly need a proxy function to write a wrapper function in the following scenarios:

- If your wrapper function doesn't need to support pipeline input.
- If you don't mind collecting all pipeline input first, before passing it all to the wrapped cmdlet at once in your wrapper function's end block, which means that you're forgoing streaming processing.
  - While you may also get streaming processing if you call the wrapped cmdlet for each input object in your process block, doing so:
    - is inefficient (a full invocation of the wrapped cmdlet in every iteration, in a nested pipeline)
    - doesn't work for cmdlets that need to operate on all input as a whole, such as Format-* cmdlets or aggregating cmdlets such as Sort-Object and Group-Object
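For the first scenario (no pipeline input), a wrapper can simply call the cmdlet directly. A minimal sketch that adds logging and error handling around Get-Item; the name `Get-ItemLogged` is hypothetical, and Write-Verbose merely stands in for whatever logging mechanism is actually used:

```powershell
# Hypothetical wrapper without pipeline support: a plain call suffices
function Get-ItemLogged {
  [CmdletBinding()]
  param([Parameter(Mandatory)] [string] $Path)
  Write-Verbose "Get-ItemLogged: $Path"
  try {
    # Module-qualified name avoids recursing into a same-named wrapper
    Microsoft.PowerShell.Management\Get-Item -LiteralPath $Path -ErrorAction Stop
  }
  catch {
    Write-Verbose "Get-ItemLogged failed: $_"
    # Re-report the error as a terminating error of this function
    $PSCmdlet.ThrowTerminatingError($_)
  }
}

Get-ItemLogged -Path $PSHOME -Verbose
```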

The following are three different implementations of a wrapper function around Select-String, which reports only the matching part of each matching line, as a string, to illustrate the tradeoffs:

- Select-MatchProxy is a proper proxy function, i.e. it calls Select-String via a steppable pipeline, which amounts to streaming processing that involves only a single instance of Select-String.
  - It is based on a stripped-down version of the scaffolding code that [System.Management.Automation.ProxyCommand]::Create((Get-Command 'Select-String')) generates.
  - GitHub issue #10863 discusses potential improvements to the code that [System.Management.Automation.ProxyCommand]::Create() generates.
- Select-MatchSimple invokes Select-String anew in every invocation of its process block, which also amounts to streaming processing, but performs poorly; as noted above, this implementation approach isn't always feasible, depending on what cmdlet is being wrapped.
- Select-MatchCollect collects all pipeline input up front and then passes it to Select-String in the end block, which forgoes streaming processing and is memory-intensive; however, in terms of runtime it actually performs slightly better than the proxy function.

```
function Select-MatchProxy {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  begin {
    # Create a single instance of the wrapped pipeline, tied to this function's pipeline.
    $steppablePipeline = {
      Select-String -Pattern $Pattern | ForEach-Object { $_.Matches.Value }
    }.GetSteppablePipeline($myInvocation.CommandOrigin)
    $steppablePipeline.Begin($PSCmdlet)
  }
  process {
    # Feed each input object to the wrapped pipeline.
    $steppablePipeline.Process($InputObject)
  }
  end {
    # Let the wrapped pipeline finish (runs its end blocks).
    $steppablePipeline.End()
  }
}
```

```
function Select-MatchSimple {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  process {
    # A full, nested Select-String invocation for every input object.
    Select-String -InputObject $InputObject -Pattern $Pattern |
      ForEach-Object {
        $_.Matches.Value
      }
  }
}
```

```
function Select-MatchCollect {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  begin {
    $l = [System.Collections.Generic.List[object]]::new()
  }
  process {
    # Collect all input first (no streaming).
    $l.Add($InputObject)
  }
  end {
    # Pass all collected input to a single Select-String call.
    $l | Select-String -Pattern $Pattern | ForEach-Object { $_.Matches.Value }
  }
}
```
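Since the question's motivation is wrappers that add logging and error handling, here is a hedged sketch of how such concerns could be layered around a steppable pipeline, using the same scaffolding as Select-MatchProxy. Assumptions: `Measure-Logged` is a hypothetical name, only -Property and -Sum of Measure-Object are exposed, Write-Verbose stands in for real logging, and the function is meant to receive pipeline input. Forwarding the remaining bound parameters with `@PSBoundParameters` mirrors what the ProxyCommand-generated code does:

```powershell
# Hypothetical wrapper that adds logging around Measure-Object
function Measure-Logged {
  [CmdletBinding()]
  param(
    [Parameter(ValueFromPipeline)] $InputObject,
    [string[]] $Property,
    [switch] $Sum
  )
  begin {
    Write-Verbose 'Measure-Logged: begin'
    # Pipeline input is forwarded via Process(), so don't splat it
    $null = $PSBoundParameters.Remove('InputObject')
    $wrapped = { Microsoft.PowerShell.Utility\Measure-Object @PSBoundParameters }
    $steppablePipeline = $wrapped.GetSteppablePipeline($MyInvocation.CommandOrigin)
    $steppablePipeline.Begin($PSCmdlet)
  }
  process {
    Write-Verbose "Measure-Logged: processing $InputObject"
    try { $steppablePipeline.Process($InputObject) }
    catch { Write-Verbose "Measure-Logged: error: $_"; throw }
  }
  end {
    $steppablePipeline.End()
    Write-Verbose 'Measure-Logged: end'
  }
}

1..4 | Measure-Logged -Sum -Verbose   # a single result object: Count 4, Sum 10
```

Because one Measure-Object instance sees all input, the aggregate is correct, which a per-object nested pipeline could not achieve.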

To compare runtimes, you can use the following code:

```
# Sample input array of 100,000 strings.
$array = ('foo', 'bar') * 50000
# Time 15 runs of each function, and report the average.
Time-Command { $array | Select-MatchProxy   'o+' },
             { $array | Select-MatchSimple  'o+' },
             { $array | Select-MatchCollect 'o+' }
```

Sample timings from an M1 Mac running macOS 12.4 and PowerShell Core 7.3.0-preview.6, which give a sense of relative performance:

```
Factor Secs (15-run avg.) Command                           TimeSpan
------ ------------------ -------                           --------
1.00   0.916              $array | Select-MatchCollect 'o+' 00:00:00.9162298
1.12   1.025              $array | Select-MatchProxy   'o+' 00:00:01.0254835
5.38   4.930              $array | Select-MatchSimple  'o+' 00:00:04.9298495
```

The above uses the Time-Command function from this Gist.

Assuming you have looked at the linked Gist's source code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:
```
irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
```
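If installing the Gist isn't an option, a rougher comparison can be made with the built-in Measure-Command. The helper below (hypothetical name `Measure-Average`) is a minimal sketch that simply averages Measure-Command results over a number of runs; it is not a reimplementation of Time-Command:

```powershell
# Hypothetical helper: average runtime of each script block over $Count runs
function Measure-Average {
  param(
    [Parameter(Mandatory)] [scriptblock[]] $ScriptBlock,
    [int] $Count = 15
  )
  foreach ($sb in $ScriptBlock) {
    $total = [timespan]::Zero
    for ($i = 0; $i -lt $Count; $i++) {
      # Discard the output so that only execution time is measured
      $total += Measure-Command { $null = & $sb }
    }
    [pscustomobject] @{
      Seconds = [math]::Round($total.TotalSeconds / $Count, 3)
      Command = $sb.ToString().Trim()
    }
  }
}

# Usage, assuming $array and the three functions above are defined:
# Measure-Average { $array | Select-MatchProxy 'o+' },
#                 { $array | Select-MatchCollect 'o+' } | Sort-Object Seconds
```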

In some examples I see $myInvocation.CommandOrigin being passed to GetSteppablePipeline(). Can you explain when this is necessary? It seems to be related to runspaces, but I couldn't find any explanation how its behaviour would change according to the argument. — zett42, Commented Jul 22, 2022 at 7:41
@zett42, good question; that is actually also how the [System.Management.Automation.ProxyCommand]::Create($Cmd) method does it. — iRon, Commented Jul 22, 2022 at 8:30
@iRon, good point: I've updated the answer to include this argument, and, separately, I've linked to a GitHub issue discussing improvements to the generated code. — mklement0, Commented Jul 22, 2022 at 13:34
@zett42, good point; please see my previous comment. I personally don't know why it's necessary and what it does, and the docs don't help. — mklement0, Commented Jul 22, 2022 at 13:35
Stack Overflow is really a place where I have learned a lot, not just from asking questions but also from answering them. I started using Stack Overflow about a decade ago, mainly for PowerShell issues. As with this What good does a Steppable Pipeline question, I had only a vague suspicion of what the answer could be, but with the full explanation I was able to answer somebody else's question and contribute to the PowerShell Community blog: Mastering the (steppable) pipeline. — iRon, Commented Jan 31, 2023 at 13:11
