18

I've always found validation against a schema to be an invaluable ward against thinkos and would like to incorporate validation checks as part of a project where I frequently need to hand-write XML files a few hundred lines in length. My text editor has a fairly nice CLI integration feature, so I'm looking for a command-line validator.

When I didn't find any clear winners via Google, I poked around here and found a similar question, but none of the tools suggested there quite fit my needs:

  • libxml (via cygwin) — does not report line numbers; I have no idea where my errors are!
  • msxml — cannot be run from the command line?
  • xerces-c — seems to require a copy of Visual C?
  • xerces2-j — cannot be run from the command line?
  • xmlstarlet — insufficient XSD support*

(*The schema I'm validating against uses substitution groups — inappropriately, but it's external to the project, so I can't change it — which causes xmlstarlet to choke even on valid files.)

Normally, this is the point in solving a problem at which I'd give up on looking for an existing solution and reach for the Python-hammer, but Python's XML support is notoriously… well… actually, let's just leave it at "notorious".

So I'm back to looking for a pre-existing tool. My requirements are pretty simple:

  • runs on Win32 (Windows XP SP3, specifically)
  • command-line; my editor can work with just about any combination of stdin/-out/-err, arguments, temp files, etc.
  • reasonably complete XSD support (particularly namespaces and substitution groups)
  • reports the line number where the error occurred!

Does such a tool exist? I'd prefer not to have to install Visual Studio and friends (too bloated, IMO), but I do already have both Cygwin and Python installed.

6 Answers

13

Your first option, xmllint (libxml2), does give line numbers for errors in the xml (and also in the xsd). You probably just need a later version. I just confirmed both using my copy, which is:

>  xmllint --version
xmllint: using libxml version 20627

Example output:

invalidXml.xml:4: element c: Schemas validity error : Element 'c': This element is not expected. Expected is ( b ).
invalidXml.xml fails to validate

For this input file (invalidXml.xml):

<?xml version="1.0"?>
<invalidXmlEg>
  <a/>
<!--  <b></b> -->
  <c/>
</invalidXmlEg>

Where the xsd is:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="invalidXmlEg">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string" />
        <xs:element name="b" type="xs:string" />
        <xs:element name="c" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
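Since the goal is editor integration, here is a minimal sketch of pulling the file name and line number out of an error line shaped like the sample output above. The names `ValidationError` and `parse_error` are made up for illustration, and the pattern assumes the `file:line: ... validity error ...` layout shown in this answer; other xmllint versions may format messages differently.

```python
import re
from typing import NamedTuple, Optional


class ValidationError(NamedTuple):
    """One parsed error line (hypothetical helper type)."""
    path: str
    line: int
    message: str


# Assumed layout, taken from the sample output above:
#   invalidXml.xml:4: element c: Schemas validity error : Element 'c': ...
# The non-greedy path group also tolerates drive letters such as C:\dir\file.xml.
_ERROR_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): (?P<message>.*validity error.*)$"
)


def parse_error(text: str) -> Optional[ValidationError]:
    """Return the parsed error, or None if the line is not an error line."""
    match = _ERROR_RE.match(text.strip())
    if match is None:
        return None
    return ValidationError(match["path"], int(match["line"]), match["message"])


if __name__ == "__main__":
    sample = (
        "invalidXml.xml:4: element c: Schemas validity error : "
        "Element 'c': This element is not expected. Expected is ( b )."
    )
    print(parse_error(sample))
```

Summary lines such as "invalidXml.xml fails to validate" carry no line number, so `parse_error` returns `None` for them and an editor hook can simply skip them.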

NOTE: I have noticed that xmllint will accept element names that it shouldn't (e.g. "<invalidXml.xsd>"), but this doesn't seem to affect your task.

EDIT adding the "compiled with" part of the version:

 compiled with: Threads Tree Output Push Reader
 Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy
 C14N Catalog XPath XPointer XInclude Iconv ISO8859X
 Unicode Regexps Automata Expr Schemas Schematron
 Modules Debug Zlib 
3
  • Interesting! The version I'm using is 20703, which produces "Element 'c': This element is not expected. Expected is ( b )." (nearly identical, but lacks the line number). I'll have to see if I can dig up an older version.
    – Ben Blank
    Commented Jul 24, 2009 at 16:02
  • Seems a backward step... I wonder if they added an option, for whether line numbers are included or not? Might be worth checking the docs (maybe --verbose). Or... maybe it's to do with what has been compiled in? I didn't include that, but I'll add it (also I'm running on linux, which shouldn't make any difference): it's compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib
    – 13ren
    Commented Jul 25, 2009 at 5:08
  • Looks like the version of libxml in Cygwin is compiled with the same options, and there doesn't seem to be any --verbose-like option. I ended up grabbing the Win32 binaries for 2.6.27 from the official site and it works just fine. A backwards step, indeed. :-)
    – Ben Blank
    Commented Jul 26, 2009 at 16:07
4

As 13ren stated above, libxml's xmllint does report line numbers; perhaps you have a version issue. You might find it useful to grab native (non-Cygwin) versions of the libxml/libxslt tools from http://www.zlatkovic.com/libxml.en.html

You might also want to take a look at msv from Sun. It isn't a full implementation of XSD, but it might do the job (I generally use it for RelaxNG validation).

2
  • The linked Windows binaries are unfortunately completely outdated and don't seem to work properly on modern Windows versions (at least, I couldn't get xmllint to produce useful output on Windows 10). Do you know of up-to-date binaries somewhere? Otherwise I guess it's back to building it oneself, which I've done; it's simple enough with CMake.
    – codeling
    Commented Feb 10, 2023 at 14:34
  • 1
    This answer might be useful
    – Nic Gibson
    Commented Feb 10, 2023 at 16:19
2

I would suggest Windows PowerShell with the PowerShell Community Extensions (PSCX). PSCX has the Test-Xml cmdlet, which has the following Get-Help detailed description:

Tests for well formedness and optionally validates against XML Schema. It doesn't handle specifying the targetName space. To see validation error messages, specify the -Verbose flag.

I do not know if it reports the errors with line numbers, but 3 out of 4 isn't bad.

1

You might try one of the Visual Studio 2008 Express editions. There's much better XML support now, including validation, of course, but also XML Intellisense, XML snippets, and an XML Schema view.

2
  • I doubt that devenv can validate XML files on the commandline.
    – Joey
    Commented Jul 23, 2009 at 19:03
  • I didn't suggest it could. I'm suggesting the UI experience may be sufficiently good to change how you work with XML files.
    Commented Jul 23, 2009 at 19:42
1

Unable to comment, but the latest version of xmllint (20708) Windows port from Igor Zlatkovic is giving line numbers as well.
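
For anyone wiring xmllint into an editor from a script, the following is a rough sketch rather than a tested recipe. It assumes an `xmllint` executable is on the PATH, uses xmllint's `--noout` and `--schema` options, and assumes diagnostics arrive on stderr with a non-zero exit code on failure. The helper names `build_command` and `validate` and the file name `schema.xsd` are invented for illustration.

```python
import subprocess
from typing import List, Tuple


def build_command(schema_path: str, xml_path: str, exe: str = "xmllint") -> List[str]:
    """Build the argument list for a schema validation run."""
    # --noout suppresses the echo of the parsed document;
    # --schema names the XSD to validate against.
    return [exe, "--noout", "--schema", schema_path, xml_path]


def validate(schema_path: str, xml_path: str, exe: str = "xmllint") -> Tuple[bool, str]:
    """Run the validator; return (passed, diagnostics text).

    Assumption: exit code 0 means the file validated, and the error
    lines are written to stderr.
    """
    result = subprocess.run(
        build_command(schema_path, xml_path, exe),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    # Show the command that would be run; call validate() to execute it.
    print(" ".join(build_command("schema.xsd", "invalidXml.xml")))
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain spaces.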

0

Xerces-J comes with a sample application, jaxp.SourceValidator. You can feed it your XML file and it will validate it.

As for Xerces-C, I haven't used it myself, but I know it does not require all of Visual C++; all it needs is the runtime files, which can be downloaded separately from Microsoft. There also seems to be a sample application that does what you need: see StdInParse.
