In the following I create an XML with several namespaces, one element with several text nodes, and various nestings, repetitions, name clashes and attributes. This should cover most real-world scenarios.
Hint: It's easy to wrap this as an inline TVF and call it as a one-liner, passing the XML as a parameter.
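The wrapping from the hint could look roughly like this. This is a minimal sketch: the function name `dbo.XmlNamespacePrefixes` is made up, and to keep it short the body is only the first CTE (`AllNamespaces`) of the query below. If you paste the full query instead, keep in mind that an inline TVF must not end with `ORDER BY` (unless `TOP`/`OFFSET` is used), so return `SortString` as a column and sort in the calling query. `GO` is the batch separator of SSMS/sqlcmd.

```sql
-- Hypothetical name; the body is only the AllNamespaces CTE of the full query.
CREATE FUNCTION dbo.XmlNamespacePrefixes(@xml XML)
RETURNS TABLE
AS
RETURN
WITH AllNamespaces AS
(
    SELECT CONCAT('ns',ROW_NUMBER() OVER(ORDER BY B.namespaceUri)) AS Prefix
          ,B.namespaceUri
    FROM @xml.nodes('//*') A(nd)
    CROSS APPLY(VALUES(A.nd.value('namespace-uri(.)','nvarchar(max)'))) B(namespaceUri)
    WHERE LEN(B.namespaceUri)>0
    GROUP BY B.namespaceUri
)
-- No ORDER BY here: an inline TVF rejects it without TOP/OFFSET.
SELECT Prefix, namespaceUri
FROM AllNamespaces;
GO

-- The one-liner call; the sorting is done by the caller.
DECLARE @doc XML = N'<root xmlns="defaultNs" xmlns:ns1="dummy1"><ns1:a/></root>';
SELECT Prefix, namespaceUri
FROM dbo.XmlNamespacePrefixes(@doc)
ORDER BY Prefix;
```

The prefixes are numbered by the sorted URI, so `defaultNs` gets `ns1` and `dummy1` gets `ns2`, just like in the full result below.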
DECLARE @xml XML=
N'<root xmlns="defaultNs" xmlns:ns1="dummy1" xmlns:other="SomeOther">
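<!-- this element contains several attributes in various namespaces
Hint: An attribute without a prefix lives in no namespace at all (a default namespace never applies to attributes); the query reports its element's namespace for it, but builds its XPath without a prefix -->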
<ns1:level1 test1="test1" ns1:test2="test2" other:test3="test3">
<other:InnerElement>Some inner element</other:InnerElement>
<!-- this element contains several text nodes -->
<multiText>text1<someInner>blah</someInner>text2<someInner/>text3</multiText>
<!-- repeating elements some of them with attributes -->
<repeating>rep 1</repeating>
<repeating r2="r2">rep 2</repeating>
<!-- one with the same name, but living in another namespace -->
<other:repeating r4="r4">rep 4</other:repeating>
<!-- some deeper nesting -->
<level2>
<level3/>
<level3>
<content>Content in second level3 element</content>
</level3>
</level2>
<!-- and one more of the repeating, but listed in a lower position -->
<repeating oneMore="oneMore">one more</repeating>
</ns1:level1>
</root>';
--the query
WITH AllNamespaces As
(
SELECT CONCAT('ns',ROW_NUMBER() OVER(ORDER BY (B.namespaceUri))) Prefix
,B.namespaceUri
FROM @xml.nodes('//*') A(nd)
CROSS APPLY(VALUES(A.nd.value('namespace-uri(.)','nvarchar(max)')))B(namespaceUri)
WHERE LEN(B.namespaceUri)>0
GROUP BY B.namespaceUri
)
,recCte AS
(
SELECT 1 AS RecursionLevel
,1 AS NodeType
,ROW_NUMBER() OVER(ORDER BY A.nd) AS ElementPosition
,CAST(REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0') AS VARCHAR(900)) COLLATE DATABASE_DEFAULT AS SortString
,ns.Prefix AS CurrentPrefix
,ns.namespaceUri AS CurrentUri
,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS FullName
,CAST(CONCAT('/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS NVARCHAR(MAX)) COLLATE DATABASE_DEFAULT AS XPath
,A.nd.query('.') CurrentFragment
,A.nd.query('./*') NextFragment
FROM @xml.nodes('/*') A(nd)
LEFT JOIN AllNamespaces ns ON ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)')
UNION ALL
SELECT r.RecursionLevel+1
,1
,ROW_NUMBER() OVER(ORDER BY A.nd)
,CAST(CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0')) AS VARCHAR(900)) COLLATE DATABASE_DEFAULT
,ns.Prefix
,ns.namespaceUri
,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']')
,CONCAT(r.XPath,'/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']')
,A.nd.query('.') CurrentFragment
,A.nd.query('./*') NextFragment
FROM recCte r
CROSS APPLY NextFragment.nodes('*') A(nd)
OUTER APPLY(SELECT Prefix,namespaceUri FROM AllNamespaces ns WHERE ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)')) ns
)
,WithValues AS
(
SELECT r.RecursionLevel
,CASE WHEN LEN(B.NodeValue)>0 THEN 3 ELSE r.NodeType END AS NodeType
,r.ElementPosition
,CASE WHEN LEN(B.NodeValue)>0 THEN CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(PARTITION BY r.Xpath ORDER BY A.txt),5),' ','0')) ELSE r.SortString END AS SortString
,r.CurrentPrefix
,r.CurrentUri
,CASE WHEN LEN(B.NodeValue)>0 THEN 'text()' ELSE r.FullName END AS FullName
,r.XPath AS OrigXPath
,CASE WHEN LEN(B.NodeValue)>0 THEN CONCAT(r.XPath,'/text()[',ROW_NUMBER() OVER(PARTITION BY r.Xpath ORDER BY A.txt),']') ELSE r.XPath END AS XPath
,CASE WHEN LEN(B.NodeValue)>0 THEN B.NodeValue ELSE NULL END AS NodeValue
,r.CurrentFragment
,r.NextFragment
FROM recCte r
OUTER APPLY r.CurrentFragment.nodes('*/text()') A(txt)
OUTER APPLY (SELECT A.txt.value('.','nvarchar(max)')) B(NodeValue)
)
,WithAttributes AS
(
SELECT RecursionLevel
,NodeType
,ElementPosition
,SortString
,CurrentPrefix
,CurrentUri
,FullName
,XPath
,NodeValue
,CurrentFragment
,NextFragment
FROM WithValues
UNION ALL
SELECT wv.RecursionLevel
,2
,wv.ElementPosition
,wv.SortString
,CASE WHEN ns.Prefix IS NOT NULL THEN ns.Prefix ELSE wv.CurrentPrefix END AS CurrentPrefix
,CASE WHEN ns.namespaceUri IS NOT NULL THEN ns.namespaceUri ELSE wv.CurrentUri END AS CurrentUri
,CONCAT('@',ns.Prefix+':',B.AttrName) AS FullName
,CONCAT(wv.OrigXPath,'/@',ns.Prefix+':',B.AttrName) AS XPath
,A.attr.value('.','nvarchar(max)') AS NodeValue
,wv.CurrentFragment
,wv.NextFragment
FROM WithValues wv
CROSS APPLY wv.CurrentFragment.nodes('*/@*') A(attr)
CROSS APPLY (SELECT A.attr.value('local-name(.)','nvarchar(max)') AS AttrName
,A.attr.value('.','nvarchar(max)') AS AttrValue
,A.attr.value('namespace-uri(.)','nvarchar(max)') AS namespaceUri) B
OUTER APPLY(SELECT Prefix,namespaceUri FROM AllNamespaces ns WHERE ns.namespaceUri=B.namespaceUri) ns
)
SELECT NodeType
,CurrentPrefix
,CurrentUri
,FullName
,XPath
,NodeValue
FROM WithAttributes
WHERE NodeValue IS NOT NULL
ORDER BY SortString;
--The result
/*
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| NodeType | CurrentPrefix | CurrentUri | FullName | XPath | NodeValue |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns2 | dummy1 | @test1 | /ns1:root[1]/ns2:level1[1]/@test1 | test1 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns2 | dummy1 | @ns2:test2 | /ns1:root[1]/ns2:level1[1]/@ns2:test2 | test2 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns3 | SomeOther | @ns3:test3 | /ns1:root[1]/ns2:level1[1]/@ns3:test3 | test3 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns3 | SomeOther | text() | /ns1:root[1]/ns2:level1[1]/ns3:InnerElement[1]/text()[1] | Some inner element |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[1] | text1 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/ns1:someInner[1]/text()[1] | blah |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[2] | text2 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[3] | text3 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:repeating[1]/text()[1] | rep 1 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:repeating[2]/text()[1] | rep 2 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns1 | defaultNs | @r2 | /ns1:root[1]/ns2:level1[1]/ns1:repeating[2]/@r2 | r2 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns3 | SomeOther | @r4 | /ns1:root[1]/ns2:level1[1]/ns3:repeating[1]/@r4 | r4 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns3 | SomeOther | text() | /ns1:root[1]/ns2:level1[1]/ns3:repeating[1]/text()[1] | rep 4 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:level2[1]/ns1:level3[2]/ns1:content[1]/text()[1] | Content in second level3 element |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns1 | defaultNs | @oneMore | /ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/@oneMore | oneMore |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/text()[1] | one more |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
*/
--Just to show that the generated XPaths return the expected values (attention: we must declare our own prefixes, even for the default namespace):
WITH XMLNAMESPACES( 'defaultNs' AS ns1
,'dummy1' AS ns2
,'SomeOther' AS ns3)
SELECT @xml.value('/ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/ns1:someInner[1]/text()[1]','nvarchar(max)') Is_blah
,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:level2[1]/ns1:level3[2]/ns1:content[1]/text()[1]','nvarchar(max)') Is_Content_in_second_level3_element
,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/@oneMore','nvarchar(max)') Is_attribute_oneMore
,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[3]','nvarchar(max)') Is_3rd_text_in_multiText;
The idea in short:
- You can define the namespace prefixes yourself. There is no XQuery function available in T-SQL to find the prefix actually used in the document, so we just use our own prefixes. What matters is the underlying URI.
- The first CTE creates a set of all occurring URIs and returns each of them together with a prefix.
- The recursive CTE traverses deeper and deeper into the XML. This continues as long as `APPLY` with `.nodes()` returns nested nodes.
- One CTE adds the `text()` nodes, if there are any.
- One CTE adds the attributes, if there are any.
- Both the full name and the full XPath are built by concatenation.
- The `NodeType` helps to distinguish between elements (=1), attributes (=2) and `text()` nodes (=3).
- The CASTs and COLLATEs help to avoid data type mismatches (recursive CTEs are very picky about this).
- The concatenated SortString is needed to return the rows in the order of the XML.
- You might use `SELECT *` to see all returned columns.
- You might run the query without `WHERE NodeValue IS NOT NULL` to see more of the empty structure.
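Why we can use our own prefixes: `namespace-uri()` and `local-name()` only see the URI and the name, not the prefix written in the document. In this small sketch, two made-up documents use different prefixes for the same (made-up) URI `urn:sample` and return identical values:

```sql
-- Two documents that differ only in the prefix chosen for the same URI
DECLARE @a XML = N'<p:item xmlns:p="urn:sample"/>';
DECLARE @b XML = N'<q:item xmlns:q="urn:sample"/>';

SELECT @a.value('namespace-uri(/*[1])','nvarchar(max)') AS UriA   -- same URI...
      ,@b.value('namespace-uri(/*[1])','nvarchar(max)') AS UriB   -- ...for both
      ,@a.value('local-name(/*[1])','nvarchar(max)')    AS NameA  -- same local name...
      ,@b.value('local-name(/*[1])','nvarchar(max)')    AS NameB; -- ...for both
```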
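The recursion on its own, stripped of namespaces, positions and values, might look like the following sketch (the tiny XML is made up). Each level passes its child elements as an XML fragment to the next level, and the recursion stops as soon as `.nodes('*')` returns no more rows. The rows are kept in a table variable only so they can be inspected afterwards.

```sql
DECLARE @x XML = N'<a><b><c/></b><b/></a>';
DECLARE @result TABLE(Lvl INT, Path NVARCHAR(MAX));

WITH Walk AS
(
    -- Anchor: the root element
    SELECT 1 AS Lvl
          ,CAST(CONCAT('/',A.nd.value('local-name(.)','nvarchar(max)')) AS NVARCHAR(MAX)) COLLATE DATABASE_DEFAULT AS Path
          ,A.nd.query('./*') AS Children
    FROM @x.nodes('/*') A(nd)
    UNION ALL
    -- Recursive part: one row per child element of the previous level;
    -- same CAST and COLLATE as in the anchor to avoid a type mismatch
    SELECT w.Lvl+1
          ,CAST(CONCAT(w.Path,'/',A.nd.value('local-name(.)','nvarchar(max)')) AS NVARCHAR(MAX)) COLLATE DATABASE_DEFAULT
          ,A.nd.query('./*')
    FROM Walk w
    CROSS APPLY w.Children.nodes('*') A(nd)
)
INSERT INTO @result(Lvl, Path)
SELECT Lvl, Path FROM Walk;

SELECT Lvl, Path FROM @result ORDER BY Path;
```

Without the `[n]` positions both `b` elements get the same path `/a/b`, which is why the full query adds a `ROW_NUMBER()` per name.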
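How the `text()` nodes of an element with mixed content are picked up can be tried in isolation with the `multiText` element from above. `text()` returns only the direct text children, so `blah` (which belongs to `someInner`) is not part of this result, and the position in `text()[n]` counts only text nodes:

```sql
DECLARE @m XML = N'<multiText>text1<someInner>blah</someInner>text2<someInner/>text3</multiText>';

-- One row per direct text() child: text1, text2, text3
SELECT A.txt.value('.','nvarchar(max)') AS NodeValue
FROM @m.nodes('/multiText/text()') A(txt);

-- Positional access, as used in the generated XPaths: returns text2
SELECT @m.value('(/multiText/text())[2]','nvarchar(max)') AS SecondText;
```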
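The attribute part boils down to `.nodes('*/@*')` plus `local-name()` and `namespace-uri()`. In this sketch (with a made-up element) you can also see that an unprefixed attribute gets an empty namespace URI even though its element lives in a default namespace. That is why the XPath is built without a prefix for such attributes. The `xmlns` declarations are not returned as attributes.

```sql
DECLARE @e XML = N'<el xmlns="defaultNs" xmlns:o="SomeOther" plain="1" o:qualified="2"/>';

SELECT A.attr.value('local-name(.)','nvarchar(max)')    AS AttrName
      ,A.attr.value('namespace-uri(.)','nvarchar(max)') AS AttrUri   -- empty for "plain", SomeOther for "qualified"
      ,A.attr.value('.','nvarchar(max)')                AS AttrValue
FROM @e.nodes('/*/@*') A(attr);
```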
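Why the positions in SortString are zero-padded: without padding, the 10th child of the first element (`'1'+'10'` = `'110'`) would sort before its 2nd child (`'1'+'2'` = `'12'`). This sketch uses the same `REPLACE(STR(n,5),' ','0')` trick as the query, with hand-made rows whose descriptions are made up:

```sql
DECLARE @pad TABLE(Node VARCHAR(20), SortString VARCHAR(900));

-- Each level contributes a fixed-width, 5-digit position: STR(2,5) = '    2' -> '00002'
INSERT INTO @pad(Node, SortString)
VALUES ('2nd element',       REPLACE(STR(2,5),' ','0'))
      ,('1st element',       REPLACE(STR(1,5),' ','0'))
      ,('10th child of 1st', CONCAT(REPLACE(STR(1,5),' ','0'),REPLACE(STR(10,5),' ','0')))
      ,('2nd child of 1st',  CONCAT(REPLACE(STR(1,5),' ','0'),REPLACE(STR(2,5),' ','0')));

-- Plain string order: 1st element, its 2nd child, its 10th child, 2nd element
SELECT Node, SortString FROM @pad ORDER BY SortString;
```

The width of 5 allows up to 99,999 siblings per level (beyond that `STR()` returns asterisks), and `VARCHAR(900)` leaves room for 180 levels.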