\$\begingroup\$

I have a situation where I need to compare two strings with each other in a foreach loop that potentially runs over millions of rows of data. The two strings will always contain between 1 and 12 different parameters, but these come in as a single string joined with commas. The strings don't necessarily come sorted either, so they could look like:

"Parameter1, Parameter2, Parameter3, Parameter4"
"Parameter2, Parameter1, Parameter4, Parameter3"
"Parameter5, Parameter1, Parameter6, Parameter2"
etc.

I need to compare two of these and check whether they both contain the same parameters. My current approach is to split the strings by comma into arrays, sort the arrays, re-join them into strings, and then compare the strings, like:

$Array1 = $String1 -split ", " | Sort-Object
$CompareString1 = $Array1 -join ", "
$Array2 = $String2 -split ", " | Sort-Object
$CompareString2 = $Array2 -join ", "
if ($CompareString1 -eq $CompareString2) {
    # do stuff
}

However, this got me thinking that I could instead compare the arrays as is:

$Array1 = $String1 -split ", " | Sort-Object
$Array2 = $String2 -split ", " | Sort-Object
if ($Array1 -eq $Array2) {
    # do stuff
}

But then that got me thinking that comparing arrays as is would probably (?) be resource-intensive, so maybe I should instead use PowerShell's native Compare-Object:

$Array1 = $String1 -split ", " | Sort-Object
$Array2 = $String2 -split ", " | Sort-Object
if ([string]::IsNullOrEmpty((Compare-Object -ReferenceObject $Array1 -DifferenceObject $Array2))) {
    # do stuff
}

But that feels like I'm overcomplicating things and potentially adding overhead to my code.

Is any of the above methods optimal for comparing the two strings, or are there other, better alternatives?

EDIT: Comparing arrays directly does not seem to be feasible in PowerShell, so it comes down to either Compare-Object or rebuilding the strings. In some tests the difference between the two looks negligible, but a friend suggested first comparing the lengths of the strings: if the lengths differ, the strings cannot contain the same parameters, so I can skip the validation and move on to the next item. That seems to save about 25% of the time. As in:

if ($ParameterString1.Length -ne $ParameterString2.Length) {
    continue
}
else {
    # continue with the actual comparison
}

Not sure if there are more optimisations possible here though...
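
For illustration, here is a minimal sketch of how the length check and the sorted-string comparison could fit together in one loop. The `$Rows` sample data, the `String1`/`String2` property names, and `$MatchingRows` are made up for the example, and the length check assumes both strings always use the same ", " separator:

```powershell
# Hypothetical sample data; in the real script each pair would come from a row of input.
$Rows = @(
    [pscustomobject]@{ String1 = 'Parameter1, Parameter2, Parameter3'; String2 = 'Parameter3, Parameter1, Parameter2' }
    [pscustomobject]@{ String1 = 'Parameter1, Parameter2'; String2 = 'Parameter1, Parameter2, Parameter3' }
    [pscustomobject]@{ String1 = 'Parameter1, Parameter2'; String2 = 'Parameter5, Parameter6' }
)

# Collect the rows whose two strings hold the same parameters.
$MatchingRows = @(foreach ($Row in $Rows) {
    # Cheap pre-check: strings of different length cannot hold the same parameters.
    if ($Row.String1.Length -ne $Row.String2.Length) {
        continue
    }

    # Same length, so fall back to the split-sort-join comparison.
    $Sorted1 = @($Row.String1 -split ', ' | Sort-Object) -join ', '
    $Sorted2 = @($Row.String2 -split ', ' | Sort-Object) -join ', '

    if ($Sorted1 -eq $Sorted2) {
        $Row   # emit the matching row
    }
})
```

With this sample data only the first row ends up in `$MatchingRows`: the second is skipped by the length check, and the third has equal lengths but different parameters.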

\$\endgroup\$

1 Answer

\$\begingroup\$

I honestly like your second variation best. Only after encountering a real, measurable performance problem would I consider other solutions. With a little playing around, you can split, sort, and join the strings on one line, then do a normal string comparison:

$Array1 = @($String1 -split ", " | Sort-Object) -join ", "
$Array2 = @($String2 -split ", " | Sort-Object) -join ", "

if ($Array1 -eq $Array2) {
    # do stuff
}
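
If the comparison is needed in more than one place, the same split-sort-join logic could be wrapped in a small helper. This is only a sketch: the function name `Test-SameParameters` and its parameter names are made up for the example, and it assumes the separator is always ", ":

```powershell
# Hypothetical helper: returns $true when both strings contain the same parameters,
# regardless of the order in which they appear.
function Test-SameParameters {
    param(
        [Parameter(Mandatory)] [string] $First,
        [Parameter(Mandatory)] [string] $Second
    )

    # Normalise each string by splitting, sorting, and re-joining it.
    $SortedFirst  = @($First  -split ', ' | Sort-Object) -join ', '
    $SortedSecond = @($Second -split ', ' | Sort-Object) -join ', '

    return $SortedFirst -eq $SortedSecond
}

$SameSet      = Test-SameParameters -First 'Parameter1, Parameter2' -Second 'Parameter2, Parameter1'   # True
$DifferentSet = Test-SameParameters -First 'Parameter1, Parameter2' -Second 'Parameter5, Parameter1'   # False
```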

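Note that -eq is already case-insensitive for strings in PowerShell. You can use -ieq to make that explicit, or -ceq if you need a case-sensitive match: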

if ($Array1 -ieq $Array2) {
    # do stuff
}
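
The three equality operators can be compared on a pair of values that differ only in letter case. The values and result variable names below are made up for the example:

```powershell
# Hypothetical values that differ only in letter case.
$SortedString1 = 'parameter1, parameter2'
$SortedString2 = 'Parameter1, Parameter2'

$DefaultResult       = $SortedString1 -eq  $SortedString2   # True: -eq ignores case for strings
$ExplicitInsensitive = $SortedString1 -ieq $SortedString2   # True: same behaviour, stated explicitly
$CaseSensitiveResult = $SortedString1 -ceq $SortedString2   # False: -ceq respects case
```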

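I'm not sure what your preference is for using PowerShell command aliases, but using sort instead of Sort-Object could further reduce the clutter while giving you the same behavior on Windows (on Linux and macOS, PowerShell does not define the sort alias, so sort would invoke the native utility instead):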

$Array1 = @($String1 -split ", " | sort) -join ", "
$Array2 = @($String2 -split ", " | sort) -join ", "

Since the end result of the split-sort-join operation is a string, I would recommend coming up with better names for the results than $Array1 and $Array2. Maybe $SortedString1 and $SortedString2? Even that is pretty meaningless, but it is difficult to recommend variable names without seeing the rest of your script or the context in which it is being used.

\$\endgroup\$
