I am trying to read an XML document whose schema puts a whiteSpace replace restriction on one of the elements, but when I access that element with PowerShell, all the original whitespace is still there.

From my research, a whiteSpace replace restriction should tell the XML reader to replace each newline and tab with a space, and collapse should additionally reduce runs of spaces to a single space and trim the ends (w3schools).
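To make the expected behavior concrete, here is a minimal sketch of what the two whiteSpace values mean according to the XML Schema definition, written with plain string operations. The variable names and the sample string are made up for illustration; this is not what .NET does internally.

```powershell
# Illustrative input: a tab, a newline and two spaces between the words.
$raw = "a`tb`n  c"

# whiteSpace="replace": every tab, line feed and carriage return becomes one space.
# Runs of spaces are NOT merged.
$replaced = $raw -replace '[\t\n\r]', ' '

# whiteSpace="collapse": do the replace step, then merge runs of spaces
# into one space and strip leading and trailing spaces.
$collapsed = ($replaced -replace ' +', ' ').Trim(' ')

$replaced    # a b   c
$collapsed   # a b c
```

The single-line result I am after below therefore corresponds to collapse rather than replace.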

For replication I have:

test.xml

<DOC xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="test.xsd">
    <Description>
        Here I have a long description
        that takes multiple lines, but I'd
        like it formatted nicely in this document -
        even though it should be all one line when
        parsed.
    </Description>
</DOC>

test.xsd

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="DOC">
    <xs:complexType>
        <xs:all>
            <xs:element name="Description" minOccurs="0">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:whiteSpace value="replace" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:all>
    </xs:complexType>
</xs:element>

</xs:schema>

And then in PowerShell v5.0 I try the following commands:

$document = New-Object System.Xml.XmlDocument

# Validate against the schema named by xsi:noNamespaceSchemaLocation in test.xml
$readersettings = New-Object -TypeName System.Xml.XmlReaderSettings
$readersettings.ValidationType = [System.Xml.ValidationType]::Schema
$readersettings.ValidationFlags = [System.Xml.Schema.XmlSchemaValidationFlags]::ProcessInlineSchema -bor [System.Xml.Schema.XmlSchemaValidationFlags]::ProcessSchemaLocation
$docPath = (Get-Item 'test.xml').FullName
# Load the document through the validating reader, then release the file
$reader = [System.Xml.XmlReader]::Create($docPath, $readersettings)
$document.Load($reader)
$reader.Close()

Write-Output $document.DOC.Description

Which writes out the following:

      Here I have a long description
      that takes multiple lines, but I'd
      like it formatted nicely in this document -
      even though it should be all one line when
      parsed.

What I want is for this to return

Here I have a long description that takes multiple lines, but I'd like it formatted nicely in this document - even though it should be all one line when parsed.

I also tried:

  • setting the XML reader's IgnoreWhitespace property to true
  • setting xs:whiteSpace value="collapse"
  • making Description of type xs:normalizedString
  • making Description of type xs:token

How can I tell the .NET XML reader that the whitespace in this element should be replaced with single spaces?

Update

Although MSDN lists xs:token and xs:normalizedString in the System.Xml.Schema.XmlTypeCode enumeration, .NET does not appear to follow the standard by replacing or collapsing the whitespace characters.

This is not very satisfactory, but since I know which element I want collapsed, I can use PowerShell's -replace operator to collapse the whitespace and then call Trim() to remove any leftover whitespace at the edges:

PS C:\>($document.DOC.Description -replace '(\s)+',' ').Trim()
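A minimal sketch of the same workaround applied once, right after loading, so that later reads see the collapsed text. `Convert-XmlWhitespace` is a hypothetical helper name, not a .NET or PowerShell built-in, and the inline sample document is made up. It still runs after the load rather than during it, and it assumes the selected elements contain only text, because assigning InnerText replaces any child elements.

```powershell
# Hypothetical helper: collapse whitespace in the elements matched by the given XPath expressions.
function Convert-XmlWhitespace {
    param(
        [Parameter(Mandatory)] [System.Xml.XmlDocument] $Document,
        [Parameter(Mandatory)] [string[]] $XPath
    )
    foreach ($path in $XPath) {
        # @() copies the matches into an array before the document is modified.
        foreach ($node in @($Document.SelectNodes($path))) {
            # Merge runs of whitespace into one space, then trim both ends.
            $node.InnerText = ($node.InnerText -replace '\s+', ' ').Trim()
        }
    }
}

# Usage with a small inline document instead of test.xml.
$document = New-Object System.Xml.XmlDocument
$document.LoadXml("<DOC><Description>`n    line one`n    line two`n</Description></DOC>")
Convert-XmlWhitespace -Document $document -XPath '/DOC/Description'
$document.DOC.Description   # line one line two
```

The list of XPath expressions has to be maintained by hand, so it duplicates what the schema already says.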

Is there a way to do this during $document.Load(), for example by extending the .NET class, so that the whitespace is collapsed when the XML is loaded rather than only when I access the value and replace it by hand?

  • This appears to be a solved issue. Have you seen this question and the associated answers?
    – VertigoRay
    Commented Jul 31, 2017 at 16:37
  • Yes, that is where I figured out how to validate against a schema. I'm not asking how to validate against a schema; I'm asking how to use the whitespace functionality of XSD. The associated answers do not discuss whitespace at all. The only functional differences between my example PowerShell code and the accepted answer are: 1. I'm not supplying a separate XML schema, I'm just using the inline defined schema. 2. I'm not reporting validation warnings. 3. I'm using the XML document's .Load method because I want to load the XML and access values, not just validate it. Commented Jul 31, 2017 at 17:08
  • Yeah, it looked like you were past that, but I wanted to be sure. This isn't the first time that I've seen a .NET method not follow published standards. You may need to write some code to catch these anomalies and force the results to match the standard. I've definitely had to do that before, and it's frustrating.
    – VertigoRay
    Commented Jul 31, 2017 at 17:55
