
I just pulled out a piece of code which I wrote a few months ago. The code fetches an XML document from a web server and parses it using JAXB. The last time I tried, it worked flawlessly; now I am getting an exception:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)

Searching for the message points to problems with the XML header, namely the <!DOCTYPE ...> declaration. One answer I found explains that the message is misleading: in the case it describes, the systemId was missing altogether, even though the error only complains about missing whitespace in front of it.
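What the message literally means can be checked offline by feeding the parser a DOCTYPE that has a public identifier and no system identifier. This is a minimal sketch using the JDK's built-in DocumentBuilderFactory; the class name `DoctypeDemo` and the HTML 2.0 DOCTYPE string are illustrative inputs chosen for the test, not data taken from the service in question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DoctypeDemo {
    /** Parses xml and returns "parsed OK", or the position and message of the fatal parse error. */
    static String tryParse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default handler, which would also print "[Fatal Error] ..." to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return "parsed OK";
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // PUBLIC identifier directly followed by '>': allowed in HTML, but XML requires a system identifier.
        System.out.println(tryParse("<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html></html>"));
        // The same markup without any DOCTYPE is well-formed XML.
        System.out.println(tryParse("<html></html>"));
    }
}
```

In XML, a `PUBLIC` external ID must be followed by whitespace and a system literal, so the parser stops exactly where it expects that whitespace, which is why the message mentions whitespace rather than a missing systemId.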

However, if I retrieve the XML document with a web browser, it doesn't even contain a <!DOCTYPE ...> declaration.

Parsing an XML document I retrieved a few months back works without issues.

If I diff the document I retrieved today against the one from a few months back, the two are identical up to the start of the root element.

Answer


Capturing the HTTP traffic finally provided the answer (unencrypted connections come in handy at times): apparently the service switched from HTTP to HTTPS in the last few months, with the URLs otherwise unchanged.

Requests to the old URL are answered with 301 Moved Permanently, with the new URL in the Location header.

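java.net.URL.openStream() goes through HttpURLConnection for http:// URLs, which normally follows redirects, but not when the redirect switches protocols, as it does here from HTTP to HTTPS. The stream therefore contains the body of the 301 response (an HTML page) instead of the XML document, and parsing that produces the error message.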
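The behaviour can be reproduced without the real service. This sketch is a stand-in built on assumptions: it starts a local com.sun.net.httpserver.HttpServer (from the JDK's jdk.httpserver module) that answers every request with 301 and a Location pointing at the same path under https://. The HTML body, including its DOCTYPE line, is a hypothetical example of a redirect page, and `RedirectDemo` and `/feed.xml` are made-up names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class RedirectDemo {
    // Hypothetical redirect page; the body the real server sends is not known.
    static final String REDIRECT_BODY =
            "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n"
            + "<html><head><title>301 Moved Permanently</title></head>"
            + "<body><h1>Moved Permanently</h1></body></html>\n";

    /** Serves a 301 to https:// locally, then reads the old http:// URL with openStream(). */
    static String fetchViaOpenStream() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        int port = server.getAddress().getPort();
        server.createContext("/", exchange -> {
            byte[] body = REDIRECT_BODY.getBytes(StandardCharsets.UTF_8);
            // Same path, new scheme: mimics "switched to HTTPS, URLs otherwise unchanged".
            exchange.getResponseHeaders().set("Location",
                    "https://127.0.0.1:" + port + exchange.getRequestURI());
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(301, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try (InputStream in = URI.create("http://127.0.0.1:" + port + "/feed.xml").toURL().openStream()) {
            // The http -> https redirect is not followed, so this is the 301 body, not XML.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(fetchViaOpenStream());
    }
}
```

Handing that string to DocumentBuilder.parse fails on the DOCTYPE line, because the public identifier is not followed by a system identifier.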

Lesson learned for today: "White spaces are required between publicId and systemId" is really just a cryptic way of saying: something's wrong with the XML data you supplied, but we didn't bother to dig any deeper.
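The direct fix is to change the URL to https://. To also survive future moves, one option (a sketch, not the original code) is java.net.http.HttpClient (Java 11+) with Redirect.NORMAL, which follows redirects including HTTP to HTTPS (though never HTTPS to HTTP). It also checks the status code, so a non-200 response surfaces as an HTTP error instead of a parser message. The class name, the local server, the /old and /new paths and the sample feed are hypothetical; because HTTPS is awkward to set up locally, the demo only redirects over plain HTTP.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class FollowRedirectsFetch {
    /** Fetches uri, following redirects, and parses the body as XML only if the final status is 200. */
    static Document fetchXml(URI uri) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                // NORMAL: follow redirects, except from https to http.
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<byte[]> response = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            // Report the HTTP problem instead of letting the XML parser choke on an error page.
            throw new IOException("HTTP " + response.statusCode() + " from " + response.uri());
        }
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(response.body()));
    }

    /** Local demo: /old answers 301 pointing at /new, which serves a small XML document. */
    static String demo() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        server.createContext("/old", exchange -> {
            exchange.getResponseHeaders().set("Location", base + "/new");
            exchange.sendResponseHeaders(301, -1); // -1: no response body
            exchange.close();
        });
        server.createContext("/new", exchange -> {
            byte[] body = "<?xml version=\"1.0\"?><feed><item>ok</item></feed>"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/xml");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            return fetchXml(URI.create(base + "/old")).getDocumentElement().getTagName();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // root element of the document served at /new
    }
}
```

The resulting Document (or the raw bytes) could then be handed to a JAXB Unmarshaller; the status check is what turns a redirect or error page into a clear HTTP-level failure.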
