IVOA Vocabulary Validator
-------------------------

This checks a SKOS vocabulary against the prescriptions described in
the (proposed) Recommendation at
http://www.ivoa.net/Documents/latest/Vocabularies.html.

It is based on Simon Jupp's SKOS API, available at , and included
within lib/.

To build, simply use:

    % ant

To build the distribution jar file (placed in directory dist/):

    % ant dist-plain

To use the validator, give a URL for the vocabulary as an argument to
the VocabularyValidator main function:

    % java -jar dist/VocabularyValidator-x.x.jar .../foo.rdf

Alternatively, you can invoke it from Java code if that's useful.

To generate the javadocs, use 'ant javadoc'.

You may decide (for one reason or another) to ignore some warnings.
Specify these with the -w option; use -W to list the labels of the
warnings that can be ignored.

To run the self-tests, copy build.properties.template to
build.properties, and fill in the location of a junit.jar. Then run
the tests with:

    % ant test

To build the compressed jar file, copy build.properties.template to
build.properties, and fill in the location of the ProGuard home (see
http://proguard.sourceforge.net). Then:

    % ant dist

If the version number and release date need to be updated, do so in
build.xml.

What does this validator check?
-------------------------------

The validator aims to check as many of the document's normative
remarks as possible. These primarily appear in section 3,
[#publishing].

* Section 3.1.1 [#req-derefns]. Checks the 303-dance.
* Section 3.1.2 [#req-availability]. Can't be tested.
* Section 3.1.3 [#req-distformat]. Tested as part of the Section 3.1.1
  tests.
* Section 3.1.4 [#versioning]. Can't be tested, realistically.
* Section 3.1.5 [#req-labels]. Tested. See the notes below about
  extended requirements: this checks that all the labels have a
  language tag, that at least one of them is @en, and that there is a
  prefLabel@en.
* Section 3.1.6 [#req-sourcefiles]. Negative requirement, not tested.
* Section 3.2 [#practices]. Checks practices-id (concept regexp),
  practices-lang (require language tag), practices-relations
  (reciprocated relationships) and practices-singlescheme (single
  ConceptScheme). Practices #practices-readable,
  #practices-labelnumber, #practices-mappings and #practices-existing
  cannot be checked mechanically. Practices #practices-conceptmd and
  #practices-topconcepts could be checked mechanically, but as the
  spec notes, these practices could quite reasonably be violated, and
  we can't distinguish a reasonable violation from a mistake.

Some problems found
-------------------

Working through the document with the validator in mind, I found
several problems with it.

* Some (sub)sections with requirements didn't have IDs; these have
  been added.
* Section 3.1.1 (#req-derefns) says that dereferencing the namespace
  SHOULD provide RDF, but 3.1.3 (#req-distformat) says it MUST. We
  should go with the latter.
* MIME types for Turtle. Only application/rdf+xml is registered, but
  http://www.w3.org/TeamSubmission/turtle/ anticipates text/turtle,
  and says that application/x-turtle should be accepted
  pre-registration.
* Section 3.1.5 (#req-labels) doesn't require that all labels have a
  language tag; it should, and it should also require that at least
  one of them be @en.
* ...but that conflicts with Section 3.2 item 5, which only says that
  they SHOULD have a language tag. They now MUST have a language tag.
* I had not previously required that there be only one ConceptScheme
  in a vocabulary. This is now a MUST.
* Emphasise that people should use the DC Terms namespace, rather than
  the older DC Elements namespace, and that dct:creator is an object
  property.
* There seems no good reason to forbid [0-9] as characters at the
  start of a concept name, so I've relaxed that restriction.

Release history
---------------

0.1, 2009-06-22   Initial release.
0.2, 2010-02-16   Improved error feedback; minor interface change to
                  the Logger class.
0.3, 2012-04-18   Added -w and -W options, to ignore warnings.
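The 303-dance checked for Section 3.1.1, together with the Turtle MIME
types discussed under "Some problems found", can be sketched in Java
as follows. This is a hedged illustration, not the validator's own
code: the DerefCheck class and its check method are hypothetical
names, sending "Accept: application/rdf+xml" and the set of accepted
content types are assumptions, and it uses the JDK's
java.net.http.HttpClient (Java 11 or later) rather than whatever HTTP
code the validator actually uses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the "303-dance": dereferencing a vocabulary namespace
// should answer 303 See Other, and the redirect target should serve RDF.
final class DerefCheck {
    private DerefCheck() {}

    // Assumed acceptable types: RDF/XML, plus Turtle under its anticipated
    // (text/turtle) and pre-registration (application/x-turtle) names.
    static final Set<String> RDF_TYPES =
            Set.of("application/rdf+xml", "text/turtle", "application/x-turtle");

    /** Returns the problems found; an empty list means the dance succeeded. */
    static List<String> check(String namespace) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NEVER) // inspect the 303 ourselves
                .build();
        List<String> problems = new ArrayList<>();
        URI nsUri = URI.create(namespace);

        HttpResponse<Void> first = client.send(request(nsUri), HttpResponse.BodyHandlers.discarding());
        if (first.statusCode() != 303) {
            problems.add("expected 303 See Other from " + nsUri + ", got " + first.statusCode());
            return problems;
        }
        Optional<String> location = first.headers().firstValue("Location");
        if (location.isEmpty()) {
            problems.add("303 response has no Location header");
            return problems;
        }

        // Location may be relative, so resolve it against the namespace URI.
        URI target = nsUri.resolve(location.get());
        HttpResponse<Void> second = client.send(request(target), HttpResponse.BodyHandlers.discarding());
        if (second.statusCode() != 200) {
            problems.add("redirect target " + target + " returned " + second.statusCode());
        }
        // Drop parameters such as "; charset=utf-8" before comparing types.
        String contentType = second.headers().firstValue("Content-Type").orElse("");
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (!RDF_TYPES.contains(mime)) {
            problems.add("redirect target served '" + contentType + "', not an RDF type");
        }
        return problems;
    }

    private static HttpRequest request(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }
}
```

Turning off automatic redirects matters here: a client that follows
redirects itself would hide the 303 status, so the check could not
tell a 303 from a 302 or from a direct 200.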
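The extended label requirements described for Section 3.1.5 (every
label has a language tag, at least one label is @en, and there is a
prefLabel@en) reduce to a small amount of logic once the labels have
been pulled out of the RDF. The sketch below assumes exactly that: the
Label record and the LabelCheck.problems method are hypothetical,
untagged labels are represented by an empty language string, and only
the tag "en" (compared case-insensitively, since language tags are
case-insensitive) counts as English.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical representation of one label: property is "prefLabel",
// "altLabel" or "hiddenLabel"; lang is "" when the literal has no language tag.
record Label(String property, String value, String lang) {}

final class LabelCheck {
    private LabelCheck() {}

    /** Returns the label problems for one concept; an empty list means none. */
    static List<String> problems(String concept, List<Label> labels) {
        List<String> out = new ArrayList<>();
        boolean anyEnglish = false;
        boolean englishPrefLabel = false;
        for (Label label : labels) {
            if (label.lang().isEmpty()) {
                out.add(concept + ": label \"" + label.value() + "\" has no language tag");
                continue;
            }
            // Language tags are case-insensitive, so "EN" also counts as English.
            if ("en".equalsIgnoreCase(label.lang())) {
                anyEnglish = true;
                if ("prefLabel".equals(label.property())) {
                    englishPrefLabel = true;
                }
            }
        }
        if (!anyEnglish) {
            out.add(concept + ": no label is tagged @en");
        }
        if (!englishPrefLabel) {
            out.add(concept + ": no prefLabel@en");
        }
        return out;
    }
}
```

For example, a concept with only a prefLabel@de and an untagged
altLabel yields three problems: the missing tag, no @en label, and no
prefLabel@en.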
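Two of the mechanically checkable Section 3.2 practices,
practices-relations (reciprocated relationships) and
practices-singlescheme (a single ConceptScheme), can be sketched over
a list of already-parsed triples. The Triple record, the
PracticeChecks class and the use of abbreviated names such as
"skos:broader" in place of full URIs are assumptions made for
illustration; the sketch only covers the SKOS inverse pair
skos:broader/skos:narrower and the symmetric skos:related.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical subject-predicate-object triple, e.g. ("ex:a", "skos:broader", "ex:b").
record Triple(String s, String p, String o) {}

final class PracticeChecks {
    private PracticeChecks() {}

    // skos:broader and skos:narrower are inverses; skos:related is symmetric.
    private static final Map<String, String> INVERSE = Map.of(
            "skos:broader", "skos:narrower",
            "skos:narrower", "skos:broader",
            "skos:related", "skos:related");

    /** Reports each relationship whose reciprocal triple is missing. */
    static List<String> unreciprocated(List<Triple> triples) {
        Set<Triple> present = new HashSet<>(triples); // records compare by value
        List<String> out = new ArrayList<>();
        for (Triple t : triples) {
            String inverse = INVERSE.get(t.p());
            if (inverse != null && !present.contains(new Triple(t.o(), inverse, t.s()))) {
                out.add(t.s() + " " + t.p() + " " + t.o() + " is not reciprocated by "
                        + t.o() + " " + inverse + " " + t.s());
            }
        }
        return out;
    }

    /** Counts distinct subjects typed as skos:ConceptScheme; exactly one is expected. */
    static int conceptSchemeCount(List<Triple> triples) {
        Set<String> schemes = new HashSet<>();
        for (Triple t : triples) {
            if (t.p().equals("rdf:type") && t.o().equals("skos:ConceptScheme")) {
                schemes.add(t.s());
            }
        }
        return schemes.size();
    }
}
```

Looking up each expected reciprocal in a hash set keeps the check
roughly linear in the number of triples, rather than rescanning the
whole list for every relationship.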