0

I have an XML file with a size of 1 GB. I use the following code to load the data into SQL Server.

DECLARE @xmlvar XML
SELECT @xmlvar = BulkColumn
FROM OPENROWSET(BULK 'C:\Data\demo.xml', SINGLE_BLOB) x;

WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
                            'ux:no:ehe:v5:move' AS ns4,
                            'ux:no:ehe:v5:cat:fill' as ns3,
                            'ux:no:ehe:v5:centre' as ns2)
SELECT

        zs.value(N'(../@versionCode)', 'VARCHAR(100)') as versionCode,
        zs.value(N'(@start)', 'VARCHAR(50)') as Start_date,
        zs.value(N'(@end)', 'VARCHAR(50)') as End_date

into testtbl

FROM @xmlvar.nodes('/ns4:Dataview1/ns4:Content/ns4:gen') A(zs);

The query has now been running for more than 2 hours and has not finished. I have tested the same query with a smaller version of the XML file, and that works. Any tips on improving the loading speed?

Thank you.

Update XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns4:Dataview1 xmlns="ux:no::ehe:v5:actual:aver" xmlns:ns4="ux:no:ehe:v5:move">
    <ns4:Content versionCode="16000">
        <ns4:gen start="1961-07-01" end="1961-07-01">            
        </ns4:gen>
        <ns4:gen start="2017-09-19">            
        </ns4:gen>
        <ns4:gen start="1961-07-02" end="2016-09-30">           
        </ns4:gen>
        <ns4:gen start="2016-10-01" end="2017-09-18">            
        </ns4:gen>      
    </ns4:Content>
  </ns4:Dataview1>
  • Maybe try loading the file into a staging table first, and then run your XML transformations as a SELECT against the staging table.
    – Stu
    Commented Jul 23, 2022 at 20:27
  • Try running it from the command-line utility sqlcmd.exe, which comes with SQL Server. See: learn.microsoft.com/en-us/sql/tools/…
    – jdweng
    Commented Jul 23, 2022 at 21:16
  • 1
    I think you are going to need a different tool for this. Perhaps use C#/Powershell with XmlTextReader to break down the XML, and maybe SqlBulkCopy to stream the data into SQL Server. Commented Jul 24, 2022 at 2:07
  • There are scenarios where OPENXML is faster, but XmlReader + SqlBulkCopy will always be the fastest. XmlReader can be daunting. The key is to target just the elements you need with XmlReader, then call .ReadSubtree() and pass the result to XElement.Load learn.microsoft.com/en-us/dotnet/api/… Commented Jul 24, 2022 at 20:46
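
For reference, the OPENXML route mentioned in the comment above could look roughly like the sketch below. It is only an illustration on a small inline sample taken from the question, not a measured fix: the variable names (@xmlvar, @hdoc) and the `root` wrapper element used to declare the ns4 prefix are assumptions, and OPENXML builds an in-memory document, so a 1 GB file may put real pressure on server memory.

```sql
-- Small inline sample based on the XML shown in the question.
DECLARE @xmlvar XML = N'<ns4:Dataview1 xmlns="ux:no::ehe:v5:actual:aver" xmlns:ns4="ux:no:ehe:v5:move">
    <ns4:Content versionCode="16000">
        <ns4:gen start="1961-07-01" end="1961-07-01"/>
        <ns4:gen start="2017-09-19"/>
    </ns4:Content>
</ns4:Dataview1>';

DECLARE @hdoc INT;

-- The third argument declares the namespace prefix used in the row pattern.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @xmlvar,
     N'<root xmlns:ns4="ux:no:ehe:v5:move"/>';

-- Flag 1 = attribute-centric mapping; column patterns are relative to ns4:gen.
SELECT versionCode, Start_date, End_date
FROM OPENXML(@hdoc, N'/ns4:Dataview1/ns4:Content/ns4:gen', 1)
WITH (
    versionCode VARCHAR(100) '../@versionCode',
    Start_date  VARCHAR(50)  '@start',
    End_date    VARCHAR(50)  '@end'
);

-- Always release the document handle, or the memory stays allocated.
EXEC sp_xml_removedocument @hdoc;
```

Whether this beats the .nodes() approach on the real file would have to be measured.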

2 Answers 2

1

(1) As @Stu already pointed out, loading the XML file into a single-row table first will speed up the loading process significantly.

(2) It is not a good idea to traverse up the XML hierarchy in XPath expressions, like here:

c.value('../@versionCode', 'VARCHAR(100)') as versionCode

But the XML structure was not shared in the question, so it is impossible to suggest anything concrete.

The 2nd CROSS APPLY simulates the one-to-many relationship in the XML hierarchy.

Check it out below.

SQL

CREATE TABLE tbl (
    ID INT IDENTITY(1, 1) PRIMARY KEY,
    XmlColumn XML
);

INSERT INTO tbl(XmlColumn)
SELECT * FROM OPENROWSET(BULK N'C:\Data\demo.xml', SINGLE_BLOB) AS x;

WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
                            'ux:no:ehe:v5:move' AS ns4,
                            'ux:no:ehe:v5:cat:fill' as ns3,
                            'ux:no:ehe:v5:centre' as ns2)
SELECT c.value('@versionCode', 'VARCHAR(100)') as versionCode,
    x.value('@start', 'DATE') as Start_date,
    x.value('@end', 'DATE') as End_date
INTO dbo.testtbl
FROM tbl
    CROSS APPLY XmlColumn.nodes('/ns4:Dataview1/ns4:Content') AS t1(c)
    CROSS APPLY t1.c.nodes('ns4:gen') AS t2(x);
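
A further variation on point (2), as a minimal sketch: if the file contains exactly one ns4:Content element (an assumption based on the sample XML in the question, not something the question confirms), versionCode can be read once into a variable so that no per-row lookup on the parent is needed at all. The variable names @xmlvar and @versionCode and the inline sample are illustrative only.

```sql
-- Small inline sample based on the XML shown in the question.
DECLARE @xmlvar XML = N'<ns4:Dataview1 xmlns="ux:no::ehe:v5:actual:aver" xmlns:ns4="ux:no:ehe:v5:move">
    <ns4:Content versionCode="16000">
        <ns4:gen start="1961-07-01" end="1961-07-01"/>
        <ns4:gen start="2017-09-19"/>
    </ns4:Content>
</ns4:Dataview1>';

-- Assumption: a single ns4:Content element, so versionCode is one scalar value.
DECLARE @versionCode VARCHAR(100);

WITH XMLNAMESPACES('ux:no:ehe:v5:move' AS ns4)
SELECT @versionCode =
       @xmlvar.value('(/ns4:Dataview1/ns4:Content/@versionCode)[1]', 'VARCHAR(100)');

-- Shred only the ns4:gen rows; the version comes from the variable.
WITH XMLNAMESPACES('ux:no:ehe:v5:move' AS ns4)
SELECT @versionCode              AS versionCode,
       x.value('@start', 'DATE') AS Start_date,
       x.value('@end', 'DATE')   AS End_date
FROM @xmlvar.nodes('/ns4:Dataview1/ns4:Content/ns4:gen') AS t(x);
```

If the real file has several ns4:Content elements with different version codes, this shortcut does not apply and the two-level CROSS APPLY above is the safer shape.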
  • Hi, I have applied your suggestion. The query is still running after 2 hours. The initial query ran all night and did not finish, so I cancelled it.
    – Zayfaya83
    Commented Jul 24, 2022 at 8:28
  • Are you getting just 3 attributes, or is your real-life query more complicated? Commented Jul 24, 2022 at 13:08
  • Only 3 attributes for testing; it has still been running for more than 11 hours.
    – Zayfaya83
    Commented Jul 24, 2022 at 20:02
  • Please try to use a real table instead of table variable. Also, what about (2) ? Commented Jul 24, 2022 at 20:23
  • If I remove this line: "c.value('../@versionCode', 'VARCHAR(100)') as versionCode", the query finishes in 10 seconds. This line is the version identifier.
    – Zayfaya83
    Commented Jul 25, 2022 at 11:53
0

In my opinion, it's better to use an SSIS package for importing XML files. It has a component named "XML Source" for loading an XML file.

There is a useful article at: https://www.sqlshack.com/import-xml-documents-into-sql-server-tables-using-ssis-packages/

  • 1
    Normally I import XML with SSIS, but this one is too complex. Therefore I am trying to do the import with a query.
    – Zayfaya83
    Commented Jul 23, 2022 at 20:10
  • Do you have any evidence that this runs any faster? That seems to be the problem to be solved.
    – Nick.Mc
    Commented Jul 24, 2022 at 21:45
