I am trying to read an XML document defined by an XML schema that puts a whiteSpace replace restriction on one of its elements, but when I access that element from PowerShell, all the whitespace is still there.
From my research (w3schools), a whiteSpace value of replace should make the XML processor replace every tab, line feed, and carriage return with a space; collapse goes further, reducing runs of spaces to a single space and trimming the ends.
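To check that I understand the facet, here is a rough sketch that mimics the three whiteSpace values with plain regex replacements. This is only my approximation of the rules in the XML Schema spec, not something the .NET validator runs, and the function name Convert-XsdWhiteSpace is made up for illustration.

```powershell
# Hypothetical helper: approximates the XSD whiteSpace facet with regexes.
function Convert-XsdWhiteSpace {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$Text,
        [ValidateSet('preserve', 'replace', 'collapse')][string]$Mode = 'collapse'
    )
    switch ($Mode) {
        # preserve: the value is left untouched
        'preserve' { return $Text }
        # replace: each tab, line feed, and carriage return becomes one space
        'replace'  { return $Text -replace '[\t\n\r]', ' ' }
        # collapse: replace first, then squeeze runs of spaces and trim the ends
        'collapse' { return (($Text -replace '[\t\n\r]', ' ') -replace ' +', ' ').Trim(' ') }
    }
}

$sample = "`n  Here I have`n`ta long description  `n"
Convert-XsdWhiteSpace -Text $sample -Mode replace    # same length as the input, only spaces left
Convert-XsdWhiteSpace -Text $sample -Mode collapse   # → Here I have a long description
```

If this reading is right, replace on its own would still leave several spaces in a row wherever a line is indented, so collapse is the value that matches the single-line result I am after.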
For replication I have:
test.xml
<DOC xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="test.xsd">
<Description>
Here I have a long description
that takes multiple lines, but I'd
like it formatted nicely in this document -
even though it should be all one line when
parsed.
</Description>
</DOC>
test.xsd
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="DOC">
    <xs:complexType>
      <xs:all>
        <xs:element name="Description" minOccurs="0">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:whiteSpace value="replace" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
And then in PowerShell v5.0 I try the following commands:
$document = New-Object System.Xml.XmlDocument
$readersettings = New-Object -TypeName System.Xml.XmlReaderSettings
$readersettings.ValidationType = [System.Xml.ValidationType]::Schema
$readersettings.ValidationFlags = [System.Xml.Schema.XmlSchemaValidationFlags]::ProcessInlineSchema -bor [System.Xml.Schema.XmlSchemaValidationFlags]::ProcessSchemaLocation
$docPath = (Get-Item 'test.xml').FullName
$reader = [System.Xml.XmlReader]::Create($docPath, $readersettings)
$document.Load($reader)
$reader.Close()
Write-Output $document.DOC.Description
Which writes out the following:
Here I have a long description
that takes multiple lines, but I'd
like it formatted nicely in this document -
even though it should be all one line when
parsed.
What I want is for this to return:
Here I have a long description that takes multiple lines, but I'd like it formatted nicely in this document - even though it should be all one line when parsed.
I also tried:
- setting the XML reader's IgnoreWhitespace property to true
- setting xs:whiteSpace value="collapse"
- making Description of type xs:normalizedString
- making Description of type xs:token
How can I tell the .NET XML reader that this element should have its whitespace replaced with a single space?
Update
Despite MSDN listing the types xs:token and xs:normalizedString in its System.Xml.Schema.XmlTypeCode enumeration, .NET does not appear to conform to the standard by actually replacing or collapsing the whitespace characters.
This is not very satisfactory, but since I know which element I want collapsed, I can use PowerShell's -replace operator to collapse the whitespace and then Trim() to clean up any leftover whitespace at the edges:
PS C:\>($document.DOC.Description -replace '(\s)+',' ').Trim()
Is there any other way to do this during $document.Load(), effectively extending the .NET class so that my whitespace is collapsed when the XML is loaded, and not only when I access the element and deliberately replace it?
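The closest I have come so far is wrapping the load and the cleanup in one function, sketched below. It still runs after Load() and does not change what the reader itself does, so it is a stopgap and not the hook I am asking about. Import-XmlCollapsed and its parameters are names I invented, and the sketch assumes the matched elements hold text only.

```powershell
# Hypothetical wrapper: loads the document, then collapses whitespace in the
# text of every element matched by $XPath. Names here are illustrative only.
function Import-XmlCollapsed {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$XPath
    )
    $doc = New-Object System.Xml.XmlDocument
    $doc.Load((Resolve-Path -LiteralPath $Path).ProviderPath)
    foreach ($node in $doc.SelectNodes($XPath)) {
        # Assumes text-only elements: assigning InnerText drops any child markup.
        $node.InnerText = ($node.InnerText -replace '\s+', ' ').Trim()
    }
    # The unary comma keeps the document from being unrolled on output.
    return , $doc
}

# Small stand-in for test.xml, written to the temp folder so the sketch runs on its own.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'whitespace-sample.xml'
Set-Content -LiteralPath $sample -Value "<DOC><Description>`n  Here I have`n  a long description`n</Description></DOC>"

$document = Import-XmlCollapsed -Path $sample -XPath '/DOC/Description'
$document.DOC.Description   # → Here I have a long description
```

Passing the XPath in keeps the element list outside the function, but it also means I am repeating by hand what the schema already declares, which is exactly what I would like to avoid.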