Slow XML import with SQL Server

Question

I have a 1 GB XML file. I use the following code to load the data into SQL Server.

DECLARE @xmlvar XML
SELECT @xmlvar = BulkColumn
FROM OPENROWSET(BULK 'C:\Data\demo.xml', SINGLE_BLOB) x;

WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
                            'ux:no:ehe:v5:move' AS ns4,
                            'ux:no:ehe:v5:cat:fill' as ns3,
                            'ux:no:ehe:v5:centre' as ns2)
SELECT

        zs.value(N'(../@versionCode)', 'VARCHAR(100)') as versionCode,
        zs.value(N'(@Start)', 'VARCHAR(50)') as Start_date,
        zs.value(N'(@End)', 'VARCHAR(50)') as End_date

into testtbl

FROM @xmlvar.nodes('/ns4:Dataview1/ns4:Content/ns4:gen') A(zs);

The query has now been running for more than 2 hours and has not finished. I tested it with a smaller version of the XML file, and that works. Any tips on improving the loading speed?

Thank you.

Update: sample of the XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns4:Dataview1 xmlns="ux:no::ehe:v5:actual:aver" xmlns:ns4="ux:no:ehe:v5:move">
    <ns4:Content versionCode="16000">
        <ns4:gen start="1961-07-01" end="1961-07-01">            
        </ns4:gen>
        <ns4:gen start="2017-09-19">            
        </ns4:gen>
        <ns4:gen start="1961-07-02" end="2016-09-30">           
        </ns4:gen>
        <ns4:gen start="2016-10-01" end="2017-09-18">            
        </ns4:gen>      
    </ns4:Content>
  </ns4:Dataview1>

Maybe try loading the file directly into a staging table first, then selecting your XML transformations from the staging table.
– Stu
Commented Jul 23, 2022 at 20:27
Try running from the command-line utility sqlcmd.exe, which comes with SQL Server. See: learn.microsoft.com/en-us/sql/tools/…
– jdweng
Commented Jul 23, 2022 at 21:16
I think you are going to need a different tool for this. Perhaps use C#/PowerShell with XmlTextReader to break down the XML, and maybe SqlBulkCopy to stream the data into SQL Server.
– Charlieface
Commented Jul 24, 2022 at 2:07
There are scenarios where OPENXML is faster, but XmlReader + SqlBulkCopy will always be the fastest. XmlReader can be daunting. The key is to target just the elements with XmlReader, then use .ReadSubtree and pass that to XElement.Load learn.microsoft.com/en-us/dotnet/api/…
– David Browne - Microsoft
Commented Jul 24, 2022 at 20:46
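
The streaming idea in the comments above can be sketched outside the database. The commenters name C#/PowerShell with XmlReader and SqlBulkCopy; the following is only a rough Python stand-in that uses the standard library's xml.etree.ElementTree.iterparse, not their exact method. The function name iter_gen_rows and the constants are hypothetical, the namespace URI and attribute names are taken from the sample file in the question, and the bulk-load step into SQL Server is deliberately left out.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI bound to the ns4 prefix in the question's sample file.
NS4 = "ux:no:ehe:v5:move"
CONTENT_TAG = f"{{{NS4}}}Content"
GEN_TAG = f"{{{NS4}}}gen"


def iter_gen_rows(source):
    """Yield (versionCode, start, end) for each ns4:gen element.

    `source` is a file path or a binary file object (hypothetical helper).
    The document is read incrementally instead of being parsed as one tree.
    """
    content = None
    version_code = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == CONTENT_TAG:
            # Attributes are available as soon as the start tag is seen,
            # so the parent's versionCode is captured once, up front.
            content = elem
            version_code = elem.get("versionCode")
        elif event == "end" and elem.tag == GEN_TAG:
            row = (version_code, elem.get("start"), elem.get("end"))
            if content is not None:
                # Drop already-processed children so memory stays bounded.
                content.clear()
            yield row


if __name__ == "__main__":
    sample = (
        b'<ns4:Dataview1 xmlns="ux:no::ehe:v5:actual:aver" '
        b'xmlns:ns4="ux:no:ehe:v5:move">'
        b'<ns4:Content versionCode="16000">'
        b'<ns4:gen start="1961-07-01" end="1961-07-01"></ns4:gen>'
        b'<ns4:gen start="2017-09-19"></ns4:gen>'
        b"</ns4:Content></ns4:Dataview1>"
    )
    for row in iter_gen_rows(io.BytesIO(sample)):
        print(row)
```

Each yielded tuple corresponds to one row of the target table; a missing end attribute comes back as None. Remembering versionCode when the parent element opens avoids looking back up the tree for every ns4:gen element.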

Yitzhak Khabinsky · Accepted Answer · 2022-07-25 16:03:23Z

1

(1) As @Stu already pointed out, loading the XML file into a single-row table first will speed up the loading process significantly.

(2) It is not a good idea to traverse up the XML tree in XPath expressions, like here:

c.value('../@versionCode', 'VARCHAR(100)') as versionCode

The XML structure was not shared in the original question, so it was impossible to suggest anything concrete at first.

The second CROSS APPLY simulates the one-to-many relationship in the XML hierarchy.

Check it out below.

SQL

CREATE TABLE tbl (
    ID INT IDENTITY(1, 1) PRIMARY KEY,
    XmlColumn XML
);

INSERT INTO tbl(XmlColumn)
SELECT * FROM OPENROWSET(BULK N'C:\Data\demo.xml', SINGLE_BLOB) AS x;

WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
                            'ux:no:ehe:v5:move' AS ns4,
                            'ux:no:ehe:v5:cat:fill' as ns3,
                            'ux:no:ehe:v5:centre' as ns2)
SELECT c.value('@versionCode', 'VARCHAR(100)') as versionCode,
    x.value('@start', 'DATE') as Start_date,
    x.value('@end', 'DATE') as End_date
INTO dbo.testtbl
FROM tbl
    CROSS APPLY XmlColumn.nodes('/ns4:Dataview1/ns4:Content') AS t1(c)
    CROSS APPLY t1.c.nodes('ns4:gen') AS t2(x);

edited Jul 25, 2022 at 16:03

answered Jul 24, 2022 at 2:06

Yitzhak Khabinsky


Hi, I have applied your suggestion. The query is still running after 2 hours. The initial query ran all night and did not finish, so I cancelled it.
– Zayfaya83
Commented Jul 24, 2022 at 8:28
Are you getting just 3 attributes, or is your real-life query more complicated?
– Yitzhak Khabinsky
Commented Jul 24, 2022 at 13:08
Only 3 attributes for testing; it has still been running for more than 11 hours.
– Zayfaya83
Commented Jul 24, 2022 at 20:02
Please try to use a real table instead of a table variable. Also, what about (2)?
– Yitzhak Khabinsky
Commented Jul 24, 2022 at 20:23
If I remove this line: "c.value('../@versionCode', 'VARCHAR(100)') as versionCode", the query finishes in 10 seconds. This line is the version identifier.
– Zayfaya83
Commented Jul 25, 2022 at 11:53


Shahriar Khazaei · Accepted Answer · 2022-07-23 19:40:08Z

0

In my opinion, it's better to use an SSIS package for importing XML files. It has a component named "XML Source" for loading an XML file.

There is a useful article at: https://www.sqlshack.com/import-xml-documents-into-sql-server-tables-using-ssis-packages/

answered Jul 23, 2022 at 19:40

Shahriar Khazaei



Normally I import XML with SSIS, but this one is too complex. Therefore I am trying to do the import with a query.
– Zayfaya83
Commented Jul 23, 2022 at 20:10
Do you have any evidence that this runs any faster? That seems to be the problem to be solved.
– Nick.Mc
Commented Jul 24, 2022 at 21:45
