0

I have ingested XML file in Postgresql DB and I trying to perform XPATH operation on the field with date comparison but it is not working. From Postgresql official site I found out that Postgresql uses Xpath1 and Xpath1 does not support date comparison. I tried changing my Xpath query to use string based comparison but still it is not matching.

Input Xpath:

//val:key[@modifiedDate>"2024-06-06"]

Since Direct date comparison is not available in Xpath1 I tried String based comparison.

Converted Xpath:

//val:key[substring(@modifiedDate, 1, 10) > "2024-06-06"]

But this XPath is also not matching even though there is a matching Xpath. How should I frame my Xpath to match the field with date compaission.

The sample structure of XML ingested in table.

<root xmlns:val="http://some.uri">
    <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key>
    <val:key modifiedDate="2024-06-05T08:21:00">Key2</val:key>
    <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>
    <val:key modifiedDate="2024-06-06T09:00:00">Key4</val:key>
</root>

The expected result is we should be able to match this XML becuase it xpaths <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key> and <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>, as their modifiedDate attributes start with dates later than "2024-06-06".

SQL Query for reproducing the problem:

CREATE TABLE xml_data (
    id SERIAL PRIMARY KEY,
    document XML
);

INSERT INTO xml_data (document)
VALUES ('<root xmlns:val="http://some.uri">
            <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key>
            <val:key modifiedDate="2024-06-05T08:21:00">Key2</val:key>
            <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>
            <val:key modifiedDate="2024-06-06T09:00:00">Key4</val:key>
        </root>');

Query to filter data:

SELECT 
    id, document
FROM 
    xml_data
WHERE 
    xpath_exists('//val:key[substring(@modifiedDate, 1, 10) > "2024-06-06"]', document);
7
  • 1
    Add dummy data and expected results so the community could help.
    – Pepe N O
    Commented Jul 4 at 14:54
  • 1
    While asking a question, you need to provide a minimal reproducible example: (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your RDBMS version. Commented Jul 4 at 15:06
  • Changed the question @YitzhakKhabinsky Commented Jul 4 at 16:41
  • 1
    There is one fundamental problem with the XML sample provided: it doesn't have XML namespace declaration. It makes XML as not well-formed, and unusable. Please fix it. Commented Jul 4 at 17:09
  • 1
    Add <root xmlns:val="http://some.uri"> to define that undefined namespace "val"
    – Pepe N O
    Commented Jul 4 at 17:30

2 Answers 2

1

As Pepe N O points out, using the > operator in your XPath expression will convert the date operands from strings into numbers, but since both operands contain non-numeric characters, the conversion will produce NaN in both cases.

However, you could modify your XPath expression to use the XPath translate function to convert your dates into purely numerical values.

For example the expression translate('2024-06-07T12:34:56', '-T:', '') equals "20240607123456".

1
  • 1
    Thanks I was able to solve it using xpath //val:key[number(translate(substring(@modifiedDate, 1, 10), "-", "")) < number("20240606")] Commented Jul 8 at 17:11
1

Considering that

XPath 1.0 has no ordering comparison operator for strings. Both "cat" < "dog" and "cat" > "dog" are false, because each is a numeric comparison of two NaNs.

You could access the values on your XML document and cast to date or timestamp and then make the comparison, this way (fixing a missing namespace xmlns:val="http://some.uri" to make it valid):

select key, modified_date from
xmltable(xmlnamespaces('http://some.uri' as val),
'//val:key'
passing (select document from xml_data) 
columns key varchar path 'text()',  
modified_date timestamp path '@modifiedDate')
where modified_date>'2024-06-06';

Resulting

key modified_date
Key1 2024-06-07 12:34:56.000
Key3 2024-06-10 14:22:33.000
Key4 2024-06-06 09:00:00.000

Fiddle to test

1
  • Thanks I was able to solve it using xpath //val:key[number(translate(substring(@modifiedDate, 1, 10), "-", "")) < number("20240606")] Commented Jul 8 at 17:10

Not the answer you're looking for? Browse other questions tagged or ask your own question.