We have a table with an XML column holding quite a bit of data. This worked fine in our dev environments, but as the table grew (close to 10,000 rows) we started seeing performance issues.

A plain SELECT * alone takes 12 seconds...

Any suggestions to remedy this?

Thanks in advance.

  • Do you have the requisite indexes on the table? I suspect the row size is just too big and it simply takes a long time to return the data...
    – M.R.
    Commented Sep 6, 2011 at 19:44
  • How are you doing the SELECT *?
    – Tom H
    Commented Sep 6, 2011 at 19:48

3 Answers

You could check out several things, at least if the performance hit comes mostly from querying and selecting data from the XML column:

  • you can put an XML index on your XML column - this can help if you need to grab lots of data from within the XML column. One word of caution: XML indexes use a lot of disk space. In our case, a 1.5 GB database rocketed up to 11 GB on disk, so use with caution!

  • you can "surface" certain elements from within your XML onto the "parent" table as computed, persisted columns, and then use those ordinary (indexable) columns to find the rows you need more quickly. This needs a user-defined function to wrap the XML extraction, but it's really quite a nice technique if you have this need.
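Both bullets above can be sketched in T-SQL roughly as follows. This is a minimal sketch under stated assumptions, not the answerer's own code: the table dbo.Documents, its columns, the /order/customer/name path, and every index and function name are hypothetical placeholders. Two constraints shape it: a primary XML index requires a clustered primary key on the table, and xml data type methods can't be used directly in a computed column definition, which is why the extraction is wrapped in a schema-bound scalar function.

```sql
-- Hypothetical schema: a clustered primary key on Id plus an xml column.
CREATE TABLE dbo.Documents
(
    Id      int IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED,
    XmlData xml NOT NULL
);
GO

-- Option 1: XML indexes. The primary XML index needs the clustered PK above.
CREATE PRIMARY XML INDEX PXML_Documents_XmlData
    ON dbo.Documents (XmlData);
GO

-- A secondary PATH index builds on the primary one and targets
-- path-based exist()/value() lookups. Both add to the disk footprint.
CREATE XML INDEX SXML_Documents_XmlData_Path
    ON dbo.Documents (XmlData)
    USING XML INDEX PXML_Documents_XmlData
    FOR PATH;
GO

-- Option 2: surface one element as a persisted computed column.
-- The function wraps the xml method call (not allowed directly in a
-- computed column) and is schema-bound so the column can be persisted.
CREATE FUNCTION dbo.fn_GetCustomerName (@doc xml)
RETURNS nvarchar(100)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @doc.value('(/order/customer/name/text())[1]', 'nvarchar(100)');
END;
GO

ALTER TABLE dbo.Documents
    ADD CustomerName AS dbo.fn_GetCustomerName(XmlData) PERSISTED;
GO

-- The surfaced value can now be indexed like any ordinary column.
CREATE NONCLUSTERED INDEX IX_Documents_CustomerName
    ON dbo.Documents (CustomerName);
GO
```

Given the disk growth described above, it is probably worth checking the database size before and after creating the XML indexes.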

Also: never do a SELECT * anyway, and if you don't need the XML column, don't select it. It will be quite verbose and use quite a bit of memory.

Just to add a bit to what marc_s said: I would also recommend an index -- 10k records is not very much. But make sure that you are adding an index on the correct thing -- usually the best places to put indexes are on columns that are used for JOIN conditions, WHERE clauses, or ORDER BY clauses. If your query is not using the XML itself for these cases, you may be better served by creating an index on a different column (for example if you are doing a lookup on an ID which is in a non-XML column, you might see more benefit by creating the index on the ID).

If actually extracting the XML data is slow, you could consider making a covering index: an index on the ID that uses the INCLUDE keyword to carry a computed column which extracts the value from the XML column (INCLUDE takes columns, not arbitrary expressions). This made a huge difference for me on one of my projects, but as always make sure to test the performance.
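A minimal sketch of that covering-index idea, assuming a hypothetical dbo.Orders table in which OrderNumber is the non-XML lookup column and the value of interest is a status attribute at /order/@status. All table, column, function, and index names are placeholders. Because INCLUDE only accepts columns, the extracted value is first surfaced as a persisted computed column through a schema-bound function.

```sql
-- Hypothetical table: OrderNumber is the lookup key, OrderXml the payload.
CREATE TABLE dbo.Orders
(
    OrderId     int IDENTITY(1,1) NOT NULL
                CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    OrderNumber varchar(20) NOT NULL,
    OrderXml    xml NOT NULL
);
GO

-- Schema-bound wrapper so the extracted value can live in a computed column.
CREATE FUNCTION dbo.fn_GetOrderStatus (@doc xml)
RETURNS varchar(20)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @doc.value('(/order/@status)[1]', 'varchar(20)');
END;
GO

ALTER TABLE dbo.Orders
    ADD OrderStatus AS dbo.fn_GetOrderStatus(OrderXml) PERSISTED;
GO

-- Key on the lookup column; the extracted value is stored as a non-key column.
CREATE NONCLUSTERED INDEX IX_Orders_OrderNumber
    ON dbo.Orders (OrderNumber)
    INCLUDE (OrderStatus);
GO

-- This lookup can typically be answered from IX_Orders_OrderNumber alone,
-- without reading or parsing OrderXml (check the actual execution plan).
SELECT OrderNumber, OrderStatus
FROM   dbo.Orders
WHERE  OrderNumber = 'SO-1001';
```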

Of course, if your queries are actually doing JOIN/WHERE/ORDER BY on the XML data then you should probably do what marc_s recommends and create the index on the XML column.

  • Thanks for both replies; this gave me some ideas on where we need to improve our app. However, our scenario isn't that complicated at the moment: we don't do much querying on the XML in SQL. The use case behind this issue is that we pull the entire column and application code converts it to an object that the rest of the app can use, but just pulling a few thousand of these at a time takes a while. Given that we pull the entire column, would the index still help?
    – AVP06
    Commented Sep 8, 2011 at 2:25
  • Also would it make any difference if we only pulled partial XML nodes instead of an entire column?
    – AVP06
    Commented Sep 8, 2011 at 2:26
  • In general, the less data that is transferred from the database to the client, the better your performance will be. That's why you should avoid "SELECT *" and instead select only the necessary columns. Similarly, if you can reduce the number of rows returned (by filtering the records somehow), that would also improve performance. If an index is useful for your filtering, then yes, the index would speed things up. And yes, it's possible that pulling a subset of the XML would help, too. Hard to say for sure without testing.
    – JohnD
    Commented Sep 8, 2011 at 2:47
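To make the suggestions in these comments concrete, here is a small sketch that runs on its own, using a table variable with hypothetical sample data (the @Orders name, its columns, and the XML shape are all made up for illustration). It contrasts naming only the needed columns with returning just a fragment of each document via the xml query() method. Whether the fragment version is actually faster is not guaranteed, since the server then has to do the XQuery work, so measure both against your real data.

```sql
-- Hypothetical sample data so the sketch is self-contained.
DECLARE @Orders TABLE
(
    OrderId     int         NOT NULL PRIMARY KEY,
    OrderNumber varchar(20) NOT NULL,
    OrderXml    xml         NOT NULL
);

INSERT INTO @Orders (OrderId, OrderNumber, OrderXml)
VALUES (1, 'SO-1001', N'<order status="open"><customer><name>Contoso</name></customer><lines><line sku="A1" qty="2"/></lines></order>'),
       (2, 'SO-1002', N'<order status="shipped"><customer><name>Fabrikam</name></customer><lines><line sku="B7" qty="1"/></lines></order>');

-- 1. Name only the columns you need (no SELECT *) and filter rows on the server.
SELECT OrderId, OrderXml
FROM   @Orders
WHERE  OrderNumber = 'SO-1001';

-- 2. Return only part of each document: query() yields an xml fragment
--    containing just the matching nodes instead of the whole document.
SELECT OrderId,
       OrderXml.query('/order/lines') AS OrderLines
FROM   @Orders
WHERE  OrderNumber = 'SO-1001';
```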

If you query records and filter on data inside an XML column (with no XML index), you're asking SQL Server to examine all of the XML content to find the results.

To speed things up, combine XML data type filters with full-text search expressions. The full-text search narrows down the results (depending on how specific you are) before the XML is parsed and searched. This can save a lot of CPU and I/O. Here's an example:

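-- "Table" is a reserved word in T-SQL, so the name must be bracketed.
-- CONTAINS requires a full-text index on XmlColumn.
SELECT * 
FROM   [Table] 
WHERE  CONTAINS(XmlColumn,'value') 
AND    XmlColumn.exist('/element/element/text()[contains(.,"value")]') = 1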
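CONTAINS only works once the column has a full-text index. A minimal setup sketch follows, assuming the table from the example above; the catalog name XmlSearchCatalog and the key index name PK_Table are hypothetical placeholders. KEY INDEX must name a unique, single-column, non-nullable index on the table, typically its primary key.

```sql
-- Hypothetical catalog name; one catalog can hold several full-text indexes.
CREATE FULLTEXT CATALOG XmlSearchCatalog;
GO

-- Full-text indexing an xml column indexes element content and ignores
-- the markup itself. PK_Table is a placeholder for the table's unique key index.
CREATE FULLTEXT INDEX ON [Table] (XmlColumn)
    KEY INDEX PK_Table
    ON XmlSearchCatalog;
GO
```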

This approach is documented by Microsoft, and you can compare before and after by running your queries with statistics on. Here's how you turn statistics on:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
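A before/after comparison could look something like this sketch, which reuses the placeholder [Table], XmlColumn, and XPath from the example above and assumes the full-text index exists. COUNT(*) is used only so no column names have to be invented; compare the logical reads and CPU time reported for each statement.

```sql
-- Per-statement I/O and timing output (Messages tab in SSMS).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Before: only the XML predicate, so the XML of every row is likely examined.
SELECT COUNT(*)
FROM   [Table]
WHERE  XmlColumn.exist('/element/element/text()[contains(.,"value")]') = 1;

-- After: the full-text predicate can narrow the candidate rows first.
SELECT COUNT(*)
FROM   [Table]
WHERE  CONTAINS(XmlColumn, 'value')
AND    XmlColumn.exist('/element/element/text()[contains(.,"value")]') = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The two counts are not guaranteed to match: CONTAINS matches words (or prefix terms), while the XQuery contains() function matches any substring, so the full-text filter can drop rows where "value" only appears inside a longer word.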
