
Given XML such as

<?xml version="1.0"?>
<Products>
    <Product_Group id="Robot">
        <Product id="RSAPRO2017">
            <DefaultShortcut>Autodesk Robot Structural Analysis Professional 2017.lnk</DefaultShortcut>
            <ProgramFolder>C:\Program Files\Autodesk\Autodesk Robot Structural Analysis Professional 2017</ProgramFolder>
            <UserAppDataRoaming>C:\Users\$(userName)\AppData\Roaming\Autodesk\Autodesk Robot Structural Analysis Professional 2017</UserAppDataRoaming>
        </Product>
    </Product_Group>
</Products>

And a PowerShell variable $node, obtained by using SelectSingleNode to select the Product, is there a way to (1) get just the first line of the node, so <Product id="RSAPRO2017">, and (2) get the line number in the XML, so in this example the line number would be 4?

I can show an approximation for #1 with Write-Host "<$($node.name)>" but that doesn't provide the full line.

The goal here is to validate some complex XML and provide an error log with details that allow a person to revise their XML quickly and easily. So, for example, since the right node name is Product I might have an error log of

Invalid node (Productt) at line 4
    <Productt id="RSAPRO2017">

I can get the OuterXml of the entire node, but some nodes have 40-50 lines, so that's no help. I could extract the first line with a regex to get everything between the first < > pair, but I wonder if there is a built-in PowerShell/XPath approach, rather than rolling my own? And getting the line number seems... iffy in general.

  • In your example of <Productt id="RSAPRO2017">, is the end tag </Product> or </Productt>? That is, are you looking to report XML that is invalid (say, according to an actual schema or one you hard-code) or XML that is not well-formed? I'm not sure the latter is possible with a .NET BCL reader (or, if anything, you wouldn't need to detect the error because the reader would throw an exception). Also, the explicit IXmlLineInfo interface implementation in System.Xml.Linq might be an alternative. Commented Dec 8, 2019 at 22:26

2 Answers

1

As @BACON commented, an XML reader in PowerShell will throw an exception if the XML is not well-formed. Casting to the [xml] type accelerator (System.Xml.XmlDocument) shows this:

[xml]$malformed = @'
<?xml version="1.0"?>
<Products>
    <Product_Group id="Robot">
        <Productt id="RSAPRO2017">
            <DefaultShortcut>Autodesk Robot Structural Analysis Professional 2017.lnk</DefaultShortcut>
            <ProgramFolder>C:\Program Files\Autodesk\Autodesk Robot Structural Analysis Professional 2017</ProgramFolder>
            <UserAppDataRoaming>C:\Users\$(userName)\AppData\Roaming\Autodesk\Autodesk Robot Structural Analysis Professional 2017</UserAppDataRoaming>
        </Product>
    </Product_Group>
</Products>
'@

Yields output which includes:

Error: "The 'Productt' start tag on line 4 position 10 does not match the end tag of 'Product'.

If you want to be a bit smarter about what you do with the error information, wrap the whole thing in try {} catch {} and extract data from the error record. Inside catch {} the current error is available as $_; it is also the most recent entry in the automatic $Error collection, at position zero ($Error[0]).

try {} attempts to do something, and catch {} runs only if that attempt throws a terminating error (a failed [xml] cast is one). You can also add finally {}, which runs after try and catch whether or not an error occurred.

try {
[xml]@'
<?xml version="1.0"?>
<Products>
    <Product_Group id="Robot">
        <Productt id="RSAPRO2017">
            <DefaultShortcut>Autodesk Robot Structural Analysis Professional 2017.lnk</DefaultShortcut>
            <ProgramFolder>C:\Program Files\Autodesk\Autodesk Robot Structural Analysis Professional 2017</ProgramFolder>
            <UserAppDataRoaming>C:\Users\$(userName)\AppData\Roaming\Autodesk\Autodesk Robot Structural Analysis Professional 2017</UserAppDataRoaming>
        </Product>
    </Product_Group>
</Products>

'@
}
catch {
    # Inside catch, $_ is the error that was just caught (the same record as $Error[0])
    $Error[0].Exception.Message
}
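
When the goal is an error log rather than the raw message, the line and position can be read from the underlying System.Xml.XmlException instead of being picked out of the message text. The following is only a minimal sketch and not part of the original answer: it assumes the XML is already in a string, uses XmlDocument.LoadXml() in place of the [xml] cast, and the names $xmlText and $parseError are purely illustrative.

```powershell
# Illustrative input: well-formedness error (start tag and end tag differ)
$xmlText = @'
<?xml version="1.0"?>
<Products>
    <Productt id="RSAPRO2017">
    </Product>
</Products>
'@

$parseError = $null
try {
    $document = New-Object -TypeName 'System.Xml.XmlDocument'
    $document.LoadXml($xmlText)
}
catch {
    # PowerShell wraps .NET exceptions, so walk the InnerException chain
    # until an XmlException (which carries line information) is found
    $exception = $_.Exception
    while ($null -ne $exception -and $exception -isnot [System.Xml.XmlException]) {
        $exception = $exception.InnerException
    }
    $parseError = $exception
}
finally {
    # Runs whether or not parsing succeeded
    Write-Verbose 'Finished parsing attempt.'
}

if ($null -ne $parseError) {
    Write-Warning "XML is not well-formed (line $($parseError.LineNumber), position $($parseError.LinePosition)): $($parseError.Message)"
}
```

Note that the reported line is where the parser gave up, which for a mismatched tag pair is typically the end tag rather than the start tag.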
  • All, to clarify: I mean to test well-formed XML that contains information that is invalid only by my own naming convention. So in my (imperfect) example the invalid 'Productt' node would appear in both the opening and closing tags, making well-formed XML that is nevertheless invalid for my purposes, since the only valid child node of a 'Product_Group' node is a 'Product' node. 'Products' would also be invalid, and might actually be a more realistic scenario if, for example, someone copied and pasted the root node to start their product node.
    – Gordon
    Commented Dec 9, 2019 at 10:52
0

Though this really seems like a job for validating against a schema file (in which case see How do I use PowerShell to Validate XML files against an XSD?), if you want to perform hard-coded validation you can use the XDocument class to get line information.

However you obtain an XDocument instance ([System.Xml.Linq.XDocument]::Load() or [System.Xml.Linq.XDocument]::Parse()), you'll need to pass a [System.Xml.Linq.LoadOptions] value specifying that you want line information to be tracked. Line information is made available via the IXmlLineInfo interface which is explicitly implemented by the XObject class and its descendants, though in PowerShell you can access its members automagically.
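
As a quick illustration of the Parse() variant, here is a small sketch that reads XML from a string and lists every element with its line information. The sample XML and the variable names are assumptions made for this example, and the Add-Type line is included on the assumption that System.Xml.Linq may not be loaded by default in Windows PowerShell.

```powershell
# Assumption: may be required in Windows PowerShell; harmless if already loaded
Add-Type -AssemblyName System.Xml.Linq

# Illustrative XML, not the file from the question
$xmlText = @'
<Products>
    <Product_Group id="Robot">
        <Product id="A" />
    </Product_Group>
</Products>
'@

# SetLineInfo must be requested at parse time for line information to be tracked
$document = [System.Xml.Linq.XDocument]::Parse(
    $xmlText,
    [System.Xml.Linq.LoadOptions]::SetLineInfo
)

# Descendants() on the document yields the root element and everything below it
$lineReport = foreach ($element in $document.Descendants())
{
    '{0}: line {1}, position {2}' -f $element.Name.LocalName, $element.LineNumber, $element.LinePosition
}
$lineReport
```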

Given SO59240313.xml...

<?xml version="1.0"?>
<Products>
    <Product_Group id="Robot">
        <Product id="RSAPRO2017">
            <DefaultShortcut>Autodesk Robot Structural Analysis Professional 2017.lnk</DefaultShortcut>
            <ProgramFolder>C:\Program Files\Autodesk\Autodesk Robot Structural Analysis Professional 2017</ProgramFolder>
            <UserAppDataRoaming>C:\Users\$(userName)\AppData\Roaming\Autodesk\Autodesk Robot Structural Analysis Professional 2017</UserAppDataRoaming>
        </Product>
        <Productt id="SomeId">
            <DefaultShortcut>Some shortcut</DefaultShortcut>
            <ProgramFolder>Some program path</ProgramFolder>
            <UserAppDataRoaming>Some roaming path</UserAppDataRoaming>
        </Productt>
    </Product_Group>
</Products>

...this code...

function ReportUnexpectedElement([System.Xml.Linq.XElement] $element)
{
    Write-Warning "Unexpected element ""$($element.Name)"" found at position $($element.LinePosition) of line $($element.LineNumber)."
}

$document = [System.Xml.Linq.XDocument]::Load(
    "$PWD\SO59240313.xml",
    [System.Xml.Linq.LoadOptions]::SetLineInfo
)

if ($document.Root.Name -ne 'Products')
{
    ReportUnexpectedElement $document.Root
}
else
{
    $productGroupElement = $document.Root.Element('Product_Group')
    if ($null -eq $productGroupElement)
    {
        Write-Warning 'Product_Group element not found.'
    }
    else
    {
        foreach ($productGroupChildElement in $productGroupElement.Elements())
        {
            if ($productGroupChildElement.Name -ne 'Product')
            {
                ReportUnexpectedElement $productGroupChildElement
            }
        }
    }
}

...will print this output...

WARNING: Unexpected element "Productt" found at position 4 of line 9.

I'm not seeing a readily-accessible way to retrieve an element's complete opening tag, but you could reconstruct it yourself by modifying the ReportUnexpectedElement function like this...

function ReportUnexpectedElement([System.Xml.Linq.XElement] $element)
{
    $elementBuilder = New-Object -TypeName 'System.Text.StringBuilder'

    # Most StringBuilder methods return a reference to that same instance,
    # so remember to suppress them from the function output
    $elementBuilder.Append('<').Append($element.Name) `
        | Out-Null
    foreach ($attribute in $element.Attributes())
    {
        $elementBuilder.AppendFormat(' {0}="{1}"', $attribute.Name, $attribute.Value) `
            | Out-Null
    }
    $elementBuilder.Append('>') `
        | Out-Null

    Write-Warning "Unexpected element ""$($elementBuilder.ToString())"" found at position $($element.LinePosition) of line $($element.LineNumber)."
}

...which, given the same XML file, will print this output...

WARNING: Unexpected element "<Productt id="SomeId">" found at position 4 of line 9.
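
One limitation of building the tag by hand is that attribute values are written as-is, so a value containing a quote or an ampersand would not be escaped. A possible alternative, sketched below, is to copy the element's name and attributes into a childless XElement and let its ToString() do the serialization. Get-OpeningTag is a hypothetical helper name invented for this example, and the trailing -replace simply turns the self-closing form into an opening tag.

```powershell
# Assumption: may be required in Windows PowerShell; harmless if already loaded
Add-Type -AssemblyName System.Xml.Linq

# Hypothetical helper: returns only the opening tag of an element
function Get-OpeningTag([System.Xml.Linq.XElement] $element)
{
    # Childless copy with the same name
    $copy = New-Object -TypeName 'System.Xml.Linq.XElement' -ArgumentList $element.Name
    foreach ($attribute in $element.Attributes())
    {
        $copy.SetAttributeValue($attribute.Name, $attribute.Value)
    }

    # An empty element serializes as <Name attr="value" />;
    # rewrite the self-closing ending as a plain opening tag
    $copy.ToString() -replace '\s*/>$', '>'
}

# Usage with an illustrative element
$sample = [System.Xml.Linq.XElement]::Parse('<Productt id="SomeId"><DefaultShortcut>Some shortcut</DefaultShortcut></Productt>')
$openingTag = Get-OpeningTag $sample
$openingTag
```

The result could then be used in the Write-Warning message in place of $elementBuilder.ToString().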

  • Hmmmmm, I have been using XPath for ages, but it seems I had better refactor for Linq now, while I am refactoring everything else and it will be easy. @BACON, do you have a sense of the performance implications? Is moving to Linq going to be meaningfully faster or slower than XPath? XPath takes a few seconds to load some larger XML file sets now, which is a slight annoyance since it happens at launch when I am most aware of performance, but all things considered it's minor.
    – Gordon
    Commented Dec 10, 2019 at 9:29
  • Also, with regards to XSD, I have looked into that, but I am perhaps using XML in a less than ideal way that seems to make XSD harder to use. For example, I also define "tasks" in the XML, & one task, a Copy, can have a one to one <Source> to <Destination> situation, or a one to many, or a many to one. And maybe a many to many as long as there are equal Source and Destination nodes and they are alternating. Validating that with code is fussy, but possible, but validating that with XSD seems to be either impossible, or requires a deep understanding & hand coding of XSD. Deeper than I have.
    – Gordon
    Commented Dec 10, 2019 at 9:36
  • I can't speak to any performance differences between the two, but it's been asked before. There are extension methods that allow for using XPath with XNode instances, so you could, say, retrieve all <Product /> elements with [System.Xml.XPath.Extensions]::XPathSelectElements($document, '//Product'), although such a simple query could also be done with simply $document.Descendants('Product'). Take note, however, of the first paragraph of that class's Remarks section. Commented Dec 10, 2019 at 22:04
  • Interesting. I took a look at what it would take to implement LINQ, and it seems... messy. And probably hard for me to maintain alone, since it will be so different from all the other code around it. All that JUST for line numbers, especially when my ultimate goal is to have a GUI and no longer need to put customers into the XML anyway, seems like not an ideal approach. But it also looks like LINQ is SOOOO much more than just another way to mess about with XML, so I should probably play with it a bit as a learning process, even if I don't refactor now.
    – Gordon
    Commented Dec 11, 2019 at 15:57
  • XML is just one small corner of LINQ. You'll see the most benefit in languages (C#/VB.NET) that support extension methods and its query syntax; neither of those applies to PowerShell, though extension methods can be invoked directly through the classes that define them. For some of the common LINQ methods (Where(), Select(), etc.) there already exist analogous PowerShell cmdlets. As for what was asked here, as far as I can see XDocument - not XmlDocument nor Select-Xml - is the only built-in way to parse XML with line information. Commented Dec 11, 2019 at 23:48
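
To make the XPath point from the comments concrete, the sketch below runs an XPath query against an XDocument and still reads line numbers from the results, so existing XPath expressions would not necessarily have to be rewritten. The sample XML and variable names are assumptions for illustration only, and the Add-Type line is included on the assumption that the assembly may not be loaded by default in Windows PowerShell.

```powershell
# Assumption: may be required in Windows PowerShell; harmless if already loaded
Add-Type -AssemblyName System.Xml.Linq

# Illustrative XML, not the file from the question
$xmlText = @'
<Products>
    <Product_Group id="Robot">
        <Product id="A" />
        <Product id="B" />
    </Product_Group>
</Products>
'@

$document = [System.Xml.Linq.XDocument]::Parse(
    $xmlText,
    [System.Xml.Linq.LoadOptions]::SetLineInfo
)

# XPath query, invoked through the static class because PowerShell
# has no extension-method syntax
$viaXPath = @([System.Xml.XPath.Extensions]::XPathSelectElements($document, '//Product'))

# The same query expressed with the LINQ to XML API
$viaLinq = @($document.Descendants('Product'))

foreach ($product in $viaXPath)
{
    'Product {0} is on line {1}' -f $product.Attribute('id').Value, $product.LineNumber
}
```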
