Annotation of /trunk/projects/semantics/Vocabularies/Vocabularies.tex

Revision 5922
Thu Jan 14 08:12:17 2021 UTC (3 months, 3 weeks ago) by msdemlei
File MIME type: application/x-tex
File size: 80899 byte(s)
Vocabularies: Misc updates prior to PR

* enabling hyphens within prefixed vocterms
* various editorial updates

1 msdemlei 5459 \documentclass[11pt,a4paper]{ivoa}
2     \input tthdefs
4 msdemlei 5612 \usepackage{todonotes}
5 msdemlei 5824 \lstloadlanguages{XML,python}
6 msdemlei 5486 \lstset{flexiblecolumns=true,tagstyle=\ttfamily, showstringspaces=False,
7     basicstyle=\footnotesize}
9 msdemlei 5567 \definecolor{termcolor}{rgb}{0.6,0.1,0.1}
10 msdemlei 5922
11     \iftth
12     \def\vocterm#1{\emph{\color{termcolor}#1}}
14     \else
15     \def\vocterm{\startvocterm\realvocterm}
16     \def\realvocterm#1{\emph{\color{termcolor}#1}\endvocterm}
17     \begingroup
18     \gdef\breakablecolon{:\hskip0pt}
19     \catcode`\:=\active
20     \gdef\startvocterm{\begingroup
21     \catcode`\:=\active\let:=\breakablecolon}
22     \gdef\endvocterm{\endgroup}
23     \endgroup
24     \fi
27 msdemlei 5704 \newcommand{\vepitem}[1]{\emph{#1}}
28 msdemlei 5474
29 msdemlei 5459 \title{Vocabularies in the VO}
31     % see ivoatexDoc for what group names to use here
32     \ivoagroup{Semantics}
34     \author[https://wiki.ivoa.net/twiki/bin/view/IVOA/MarkusDemleitner]{Markus
35     Demleitner}
36 msdemlei 5776 \author[https://wiki.ivoa.net/twiki/bin/view/IVOA/NormanGray]{Norman
37     Gray}
38 msdemlei 5800 \author[https://wiki.ivoa.net/twiki/bin/view/IVOA/MarkTaylor]{Mark
39     Taylor}
40 msdemlei 5459
41     \editor{Markus Demleitner}
43 msdemlei 5922 \previousversion[https://ivoa.net/documents/Vocabularies/20200612/]
44     {WD-20200612}
45     \previousversion[https://ivoa.net/documents/Vocabularies/20200326/]
46     {WD-20200326}
47 msdemlei 5755 \previousversion[http://ivoa.net/documents/Vocabularies/20190905/]
48     {WD-20190905}
49 msdemlei 5459
51     \begin{document}
52     \begin{abstract}
53 msdemlei 5470 In this document, we discuss practices related to the use of RDF-based
54 msdemlei 5911 consensus vocabularies in the Virtual Observatory, that is the creation,
55     publication, maintenance, and consumption of
56     hierarchical word lists agreed upon within the IVOA.
57 msdemlei 5610 To cover the wide range of use cases envisioned, we define three flavours
58     of such vocabularies: SKOS for informal knowledge organisation on the
59     one hand, and strict hierarchies of classes and properties on the other.
60 msdemlei 5758 While the framework rests on the solid foundations of W3C RDF,
61     provisions are made to facilitate using IVOA vocabularies without
62     specific RDF tooling.
63 msdemlei 5551 Non-normative appendices detail the current vocabulary-related tooling.
64 msdemlei 5459 \end{abstract}
67     \section*{Acknowledgments}
69     While this is a complete rewrite of the specification of how vocabularies
70     are treated in the VO, we gratefully acknowledge the groundbreaking work
71     of the authors of version 1 of Vocabularies in the VO, S\'ebastien
72     Derriere, Alasdair Gray, Norman Gray, Frederic Hessmann, Tony Linde,
73     Andrea Preite Martinez, Rob Seaman, and Brian Thomas.
75     In particular, the vocabulary for datalink semantics done by Norman Gray
76 msdemlei 5547 was formative for many aspects of what is specified here.
77 msdemlei 5459
78     \section*{Conformance-related definitions}
80     The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
81     ``OPTIONAL'' (in upper or lower case) used in this document are to be
82     interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
84     The \emph{Virtual Observatory (VO)} is a
85     general term for a collection of federated resources that can be used
86     to conduct astronomical research, education, and outreach.
87     The \href{http://www.ivoa.net}{International
88     Virtual Observatory Alliance (IVOA)} is a global
89     collaboration of separately funded projects to develop standards and
90     infrastructure that enable VO applications.
92     \section{Introduction}
94 msdemlei 5553 The W3C's Resource Description Framework RDF \citep{note:rdfprimer} is a powerful
95 msdemlei 5470 and very generic means to represent, transmit, and reason on highly
96     structured, ``semantic'' information. With both its power and
97     generality, however, comes a high complexity for consumers of this
98     information if no further conventions are in force. Also, the generic
99     W3C standards understandably do not cover how semantic resources (e.g.,
100     vocabularies or ontologies) are to be managed, let alone developed
101     within organisations like the IVOA.
102 msdemlei 5459
103 msdemlei 5911 While for many applications even within the VO, the significant
104     complexity and the lack of defined management processes are acceptable,
105     for several other use cases -- in particular those given in
106     sect.~\ref{sect:usecases} --, having extra conventions greatly
107     helps implementability and interoperability.
109     Based on requirements derived from these use cases
110 msdemlei 5567 (sect.~\ref{sect:requirements}), this standard will therefore define
111     conventions for
112 msdemlei 5485 vocabularies based on either SKOS or RDFS in
113 msdemlei 5553 sect.~\ref{sect:voccontent}. Where these vocabularies -- and hence, in
114 msdemlei 5758 particular, the permanent URIs of their RDF resources (``terms'')
115     -- are managed by the
116 msdemlei 5567 IVOA, they need to be reviewed and consensus be found. A process to
117     ensure this is described in
118     sect.~\ref{sect:management}. In order
119 msdemlei 5485 to provide certain guarantees to clients, sect.~\ref{sect:deployment}
120     defines minimal standards for how IVOA-managed vocabularies must be made
121 msdemlei 5911 available. In order to help adopters who are merely looking for simple
122     vocabulary-related recipes, sect.~\ref{sect:withoutrdf} discusses how IVOA
123 msdemlei 5758 vocabularies can be used without knowledge of RDF.
124 msdemlei 5470
125 msdemlei 5551 The non-normative appendices~\ref{app:tools} and \ref{app:curtech}
126     describe the tooling
127 msdemlei 5553 currently used or recommended for building and managing vocabularies in the
128 msdemlei 5485 IVOA.
129 msdemlei 5470
131 msdemlei 5459 \subsection{Role within the VO Architecture}
133     \begin{figure}
134     \centering
136     \includegraphics[width=0.9\textwidth]{role_diagram.pdf}
137     \caption{Architecture diagram for this document}
138     \label{fig:archdiag}
139     \end{figure}
141 msdemlei 5758 Fig.~\ref{fig:archdiag} shows the role the Vocabularies in VO standard
142 msdemlei 5922 plays within the IVOA architecture \citep{2010ivoa.rept.1123A}.
143 msdemlei 5459
144 msdemlei 5911 This standard defines a set of conventions and procedures on
145     top of several W3C standards that can be adopted by other VO standards
146     that require interoperable, consensus vocabularies, such as:
147 msdemlei 5459
148 msdemlei 5470 \begin{bigdescription}
149     \item[Datalink \citep{2015ivoa.spec.0617D}] Datalink includes a
150     vocabulary letting clients work out the kind of artefact a row pertains
151     to.
153     \item[VOResource \citep{2018ivoa.spec.0625P}] VOResource 1.1 comes with
154     several (rather flat) vocabularies enumerating, for instance, the types
155     of relationships between VO resources, their intended audiences, or
156     classes of actions performed on them.
158     \item[VOEvent \citep{2006ivoa.spec.1101S}] VOEvent defines \emph{Why}
159     and \emph{What} elements which, while not formally required to be drawn
160     from a specific vocabulary in version 1.11, certainly become much more
161     useful if they are.
163 msdemlei 5752 \item[VOTable \citep{2019ivoa.spec.1021O}] VOTable, in its version 1.4,
164 msdemlei 5470 introduces vocabularies for time scales and reference positions.
167     \item[UCDs \citep{2007ivoa.spec.0402M}] UCDs are related to vocabularies in
168     that they provide machine-readable semantics. Because the terms listed
169     in the document can be combined and have an underlying grammar, however,
170     they go beyond standard RDF.
171     \end{bigdescription}
173 msdemlei 5911 Other VO standards can do with fewer normative constraints; using W3C
174     standards without the extra requirements laid down here is explicitly
175     encouraged where the use cases do not require the extra management and
176     definition effort, or where perhaps more complex structures (e.g., full
177     ontologies) must be employed. An example for a direct use of SKOS
178     without adoption of the present document is the Simulation Data Model
179     SimDM \citep{2012ivoa.spec.0503L}, where the values of several fields
180     are required to be \vocterm{skos:narrower} than certain top-level
181     concepts, but no further restrictions on the vocabularies need to be
182     imposed.
184 msdemlei 5485 \subsection{Relationship to Vocabularies in the VO Version 1}
186     Published in 2009, version 1.19 of the IVOA Recommendation on
187 msdemlei 5567 Vocabularies in the VO had an outlook fairly different from the present
188 msdemlei 5485 document: The big use case was VOEvent's Why and What, and so its focus
189 msdemlei 5612 was on large, general-purpose vocabularies, of which several existed even
190 msdemlei 5551 back then, while an overhaul of a thesaurus of general astronomical
191 msdemlei 5485 terms approved by the IAU in 1993 was underway as part of IVOA's
192 msdemlei 5567 activities. Mapping between vocabularies maintained by different VO
193     and non-VO parties seemed to be the way to ensure interoperability and
194 msdemlei 5485 therefore played a large role in the document. Also, the use cases
195     called for ``soft'' relations, which is why the standard confined itself
196     to SKOS as the vocabulary formalism.
198     Since then, ``the'' large astronomy thesaurus is being maintained
199     outside of the IVOA (the UAT\footnote{\url{http://astrothesaurus.org}}),
200 msdemlei 5551 and there is hope that its takeup will be sufficient to make mapping
201     between it and, say, legacy journal keyword systems an exercise general
202     clients will not have to perform.
203 msdemlei 5485
204     Instead, in 2010, a fairly formal vocabulary of what
205 msdemlei 5551 should be properties (in the RDF sense) rather than \vocterm{skos:Concept}-s
206 msdemlei 5485 was required during the development of the datalink standard. The
207     vocabulary was (and still is) small in comparison to, say, the UAT. In
208     contrast to the expectations of Vocabularies~1, the plan had been that
209     most data providers would work with this small vocabulary, and terms
210     from external vocabularies would only be used as temporary stand-ins
211     until the consensus vocabulary was updated. Of course, this required a
212     process for managing such vocabularies. The lack of such a process
213 msdemlei 5553 became even more noticeable when VOResource 1.1 and VOTable 1.4
214 msdemlei 5758 introduced vocabularies of their own similar in size and scope to the
215     datalink vocabulary.
216 msdemlei 5485
217     On the other hand, we are not aware of a single attempt to map
218     between different vocabularies in a VO context, and the SKOS versions of
219     some vocabularies that Vocabularies 1 declared as normative in its
220 msdemlei 5567 section~4 were largely unused and have been unmaintained for a while now.
221 msdemlei 5485
222     Since large parts of the original specification turned out to be
223 msdemlei 5553 irrelevant or unsustainable as the VO ecosystem evolved,
224     while some core requirements found later
225 msdemlei 5551 were not addressed, it was decided to prepare a new major version of the
226     Vocabularies in the VO standard.
227 msdemlei 5485
228 msdemlei 5754 \subsection{Reading Guide}
229 msdemlei 5485
230 msdemlei 5754 We hope that software authors or annotators just wanting to consume IVOA
231 msdemlei 5758 vocabularies or use them to annotate documents will be able to
232 msdemlei 5754 do so after reading just section~\ref{sect:withoutrdf}. In particular, no
233     deeper understanding of RDF should be necessary.
235 msdemlei 5758 Persons intending to participate in vocabulary evolution should skim
236 msdemlei 5754 sect.~\ref{sect:voccontent}, in particular the subsection on the kind of
237     vocabulary they want to modify, and must study
238     sect.~\ref{sect:management}.
240 msdemlei 5553 Readers unfamiliar with RDF should read \citet{local:normanspaper} before
241 msdemlei 5754 reading anything outside of section~\ref{sect:withoutrdf}.
242     In particular, we assume familiarity with all RDF
243 msdemlei 5485 terminology discussed there. Concepts not covered by Gray's
244 msdemlei 5567 essay will be informally introduced here. Of course, the
245     underlying W3C standards are normative where applicable.
246 msdemlei 5485
247 msdemlei 5754
249     \subsection{Terminology, Conventions, Typography}
251 msdemlei 5758 When we speak of \emph{term} here, that means a \vocterm{skos:Concept}
252 msdemlei 5595 in SKOS vocabularies, an \vocterm{rdfs:Class} in RDF class vocabularies,
253 msdemlei 5758 or an \vocterm{rdf:Property} in RDF property vocabularies. We also use
254     \emph{term} for ``the string after the hash character in
255     the RDF resource URI'', i.e., the machine-readable string typically used
256 mbt 5798 in annotation. It is rarely necessary to distinguish between the two
257 msdemlei 5758 meanings.
258 msdemlei 5485
259     We refer to classes and properties by CURIEs. The prefixes in this
260     document correspond to the following URIs:
262 msdemlei 5551 \begin{compactitem}
263 msdemlei 5530 \item dc -- \url{http://purl.org/dc/terms/}
264     \item rdf -- \url{http://www.w3.org/1999/02/22-rdf-syntax-ns#}
265 msdemlei 5485 \item rdfs -- \url{http://www.w3.org/2000/01/rdf-schema#}
266 msdemlei 5530 \item owl -- \url{http://www.w3.org/2002/07/owl#}
267 msdemlei 5485 \item skos -- \url{http://www.w3.org/2004/02/skos/core#}
268 msdemlei 5553 \item ivoasem -- \url{http://www.ivoa.net/rdf/ivoasem#}
269 msdemlei 5551 \end{compactitem}
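
As an illustration of how these prefixes are meant to be read, the
following minimal sketch expands a CURIE into a full URI. The prefix
table is copied from the list above; the function name is our own
choice for this illustration and not part of any specification.

\begin{lstlisting}[language=python]
# Prefix table as given in the list above.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ivoasem": "http://www.ivoa.net/rdf/ivoasem#",
}


def expand_curie(curie):
    """Return the full URI for a CURIE such as 'skos:Concept'."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError("unknown or missing prefix in %r" % curie)
    return PREFIXES[prefix] + local


print(expand_curie("rdfs:Class"))
\end{lstlisting}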
270 msdemlei 5485
271 msdemlei 5598 Vocabulary terms are written in italics (e.g., \vocterm{rdfs:Class})
272     and, where supported, in a reddish hue. As common in IVOA
273     specifications, XML element and attribute names are written in
274     typewriter italic (e.g., \xmlel{img}).
275 msdemlei 5485
276 msdemlei 5758 \section{Derivation of Requirements (Non-Normative)}
277 msdemlei 5485
278 msdemlei 5474 \subsection{Use Cases}
279 msdemlei 5485 \label{sect:usecases}
280 msdemlei 5470
281 msdemlei 5474 The normative content of this document is guided by a set of
282 mbt 5650 requirements derived from the following use cases.
283 msdemlei 5474
284     \subsubsection{Controlled Vocabulary in VOResource}
285     \label{uc:simplevoc}
287 msdemlei 5758 In VOResource, in certain use cases clients have to find services that
288     publish a given data collection. This is effected by linking the resource
289     records for service and data with a
290     DataCite-compatible \vocterm{isServedBy} relationship.
291 msdemlei 5474 Its concrete literal needs to be reliably defined in order to let
292 msdemlei 5553 clients find such relationships by a simple string comparison in RegTAP
293     queries.
294 msdemlei 5474
295     A related use case is that validators can flag errors (or at least
296 msdemlei 5567 warnings) when resource records use terms that are not part of some
297     controlled vocabulary (e.g., content levels or types of events in a
298 msdemlei 5612 resource's history). Very typically, such out-of-vocabulary terms
299     indicate small oversights on the part of the resource record author that
300     will lead to hard-to-debug problems in data discovery.
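
The validator use case could be sketched roughly as follows. This is
only an illustration under simplifying assumptions: the vocabulary is
given as a plain list of term strings, the two terms shown are a
stand-in rather than the actual VOResource relationship vocabulary, and
the function name is hypothetical.

\begin{lstlisting}[language=python]
def find_unknown_terms(used_terms, vocabulary):
    """Return the terms in used_terms that are not in vocabulary,
    preserving their order of appearance."""
    known = set(vocabulary)
    return [term for term in used_terms if term not in known]


# Illustrative stand-in for a controlled vocabulary.
VOCABULARY = ["isServedBy", "isServiceFor"]

# The second term has a lower-case "b" and is therefore flagged;
# comparison is by exact string match.
for term in find_unknown_terms(["isServedBy", "isServedby"], VOCABULARY):
    print("Warning: term %r is not in the vocabulary" % term)
\end{lstlisting}

Because the comparison is a plain string match, the sketch catches
exactly the kind of small oversight described above, such as a wrongly
capitalised term.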
301 msdemlei 5474
302     \subsubsection{Controlled Vocabularies in VOTable}
303     \label{uc:votvoc}
305 msdemlei 5758 VOTable 1.4 constrains two attributes of the TIMESYS elements
306     -- reference positions and time
307     scales -- using vocabularies.
308 msdemlei 5752 While with time scales the situation is not fundamentally
309 msdemlei 5474 different from the VOResource case discussed in
310 msdemlei 5758 use case~\ref{uc:simplevoc} -- a simple enumeration of agreed-upon strings
311 msdemlei 5474 is enough to uniquely determine what operations need to be performed to
312     combine times given in different time scales --, the situation for
313     reference positions is probably different. There, even if a client does
314     not exactly know the location of, say, the Hubble Space Telescope at any
315     given time, several important use cases can already be satisfied if a
316     client knows that it is in low Earth orbit (e.g., assuming a reference
317     position Geocenter and adjusting the systematic error estimates). For
318     this, a client needs information of the type ``\vocterm{HST}
319 msdemlei 5752 \vocterm{is-close-to} \vocterm{GEOCENTER\/}'' (or similar).
320 msdemlei 5474
321 msdemlei 5567 There is also another difference between this and at least the
322     VOResource relationship vocabulary from use case~\ref{uc:simplevoc}
323     in that the latter is property-like, as
324 msdemlei 5551 in ``Resource-1 \vocterm{isServedBy} Resource-2\/''. In contrast with
325 msdemlei 5752 this, a time scale would be used like ``Time-coordinate
326     \vocterm{is-given-in}
327     \vocterm{TT\/}''. In RDFS terminology, they are therefore better modelled
328 msdemlei 5474 as classes rather than properties.
330     \subsubsection{Datalink Link Selection}
331     \label{uc:links}
333 msdemlei 5612 In Datalink, clients receive a set of links
334 msdemlei 5758 to pieces of information (e.g., previews, additional metadata,
335     progenitors, or
336     derived data) and need to present to the user only those items
337 msdemlei 5474 relevant to the task at hand. For instance, in a discovery phase, only
338     previews should be offered, while scientific exploitation would call for
339 msdemlei 5758 cutout services, alternate formats, or derived data. For debugging,
340 msdemlei 5474 progenitors should be made accessible, and so on.
342     Operators of datalink services, on the other hand, want to be precise in
343     their annotation of datasets. For instance, they may want to discern
344 msdemlei 5612 among progenitors the raw image, a dark frame, and a flat field. In all
345 msdemlei 5758 these cases, clients should still be able to work out that such
346 msdemlei 5474 artefacts are progenitors.
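
A client-side check of this kind might look like the following sketch.
The term identifiers and the table of wider terms are hypothetical and
only mirror the example in the text; we also assume that each term has
at most one wider term.

\begin{lstlisting}[language=python]
# Hypothetical terms; each maps to its single wider term, or to None.
WIDER = {
    "raw-image": "progenitor",
    "dark-frame": "progenitor",
    "flat-field": "progenitor",
    "progenitor": None,
    "preview": None,
}


def is_a(term, ancestor, wider=WIDER):
    """Return True if term is ancestor or has it among its wider terms."""
    seen = set()
    # The seen set guards against accidental cycles in the table.
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)
        term = wider.get(term)
    return False


links = ["preview", "dark-frame", "raw-image"]
print([t for t in links if is_a(t, "progenitor")])
\end{lstlisting}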
348 msdemlei 5567 \subsubsection{VOEvent Filtering, Query Expansion}
349 msdemlei 5474 \label{uc:filtering}
351     In VOEvent, an event stream can contain a classification of what the
352 msdemlei 5551 observers believe was observed, for instance ``supernova Ia explosion''.
353 msdemlei 5474 While an event stream from one project might provide a classification on
354     that level for some event, it might not (yet) be able to do that in
355 msdemlei 5758 another event, and a different event stream might not be able to
356 msdemlei 5474 distinguish between different sorts of supernovae at all.
358     In this situation, an event broker looking for supernovae of type Ia
359     will filter out anything not related to supernovae; however, since for
360     one reason or another a Ia supernova might only be tagged as supernova,
361     it will want to widen its filter somewhat, where some backend process
362     might prioritise events classified as Ia upstream over those only tagged
363     as a generic supernova, and those, again, over those tagged explicitly
364     as some different type of supernova.
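
The prioritisation just described can be sketched as a small ranking
function. All term identifiers below are invented for the illustration,
and we assume for simplicity that each term has at most one wider term,
which a thesaurus-like vocabulary does not guarantee.

\begin{lstlisting}[language=python]
# Invented terms; each maps to its single wider term, or to None.
WIDER = {
    "supernova-ia": "supernova",
    "supernova-ii": "supernova",
    "supernova": None,
    "gamma-ray-burst": None,
}


def ancestors(term, wider):
    """Return the chain of wider terms above term, nearest first."""
    chain = []
    term = wider.get(term)
    while term is not None and term not in chain:
        chain.append(term)
        term = wider.get(term)
    return chain


def priority(tag, wanted, wider=WIDER):
    """Rank an event tag against the wanted term (lower is better).

    0: tagged exactly as wanted; 1: tagged with a wider term;
    2: tagged with a different term below a shared wider term;
    None: unrelated, to be filtered out.
    """
    if tag == wanted:
        return 0
    wanted_up = ancestors(wanted, wider)
    if tag in wanted_up:
        return 1
    if any(t in wanted_up for t in ancestors(tag, wider)):
        return 2
    return None


for tag in ["supernova-ii", "gamma-ray-burst", "supernova", "supernova-ia"]:
    print(tag, priority(tag, "supernova-ia"))
\end{lstlisting}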
366     Similar use cases exist, for instance, in the discovery of simulations
367     and possibly for subjects of VO resources.
370     \subsubsection{Vocabulary Updates in VOResource}
371     \label{uc:deprecation}
373     In VOResource 1.0, relationship types like \vocterm{served-by} or
374     \vocterm{service-for} were defined. Later, DataCite defined equivalent
375 msdemlei 5551 terms \vocterm{IsServedBy} and \vocterm{IsServiceFor}. Arguably, the VO should,
376 msdemlei 5474 as far as sensible, take up standards in the wider data management
377     community, and so VOResource 1.1 adopts the DataCite terms. In a minor
378     version, it cannot forbid the old terms. It can, however, say not only
379 msdemlei 5824 ``\vocterm{served-by\/} is the same as \vocterm{isServedBy\/}'' but also
380 msdemlei 5567 ``Use the latter term in preference to the former''. If this information is
381 msdemlei 5474 available machine-readably, validators can warn against the use of
382 msdemlei 5553 deprecated terms and user interfaces can transparently replace
383     deprecated terms with current ones. This latter use case is
384 msdemlei 5752 already specified in RegTAP 1.1 \citep{2019ivoa.spec.1011D}.
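
To illustrate what machine-readable deprecation information would
enable, here is a minimal sketch of such a replacement step. The
mapping contains only the pair quoted above; the table layout and the
function name are assumptions made for this illustration and do not
reflect how RegTAP or any validator implements this.

\begin{lstlisting}[language=python]
# Deprecated term -> preferred replacement (pair taken from the text).
DEPRECATED = {"served-by": "isServedBy"}


def modernise(term, deprecated=DEPRECATED):
    """Return (current_term, warning); warning is None if term is current."""
    if term in deprecated:
        replacement = deprecated[term]
        return replacement, "%s is deprecated, use %s" % (term, replacement)
    return term, None


print(modernise("served-by"))
print(modernise("isServedBy"))
\end{lstlisting}

A validator would report the warning, while a user interface would
silently continue with the replacement term.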
385 msdemlei 5474
386 msdemlei 5597 Another use case in the context of VOResource and vocabulary updating
387 msdemlei 5612 is the definition of content levels. In VOResource 1.0, a list of
388 msdemlei 5597 terms was adopted that was far too fine-grained in the area of public
389     outreach, distinguishing, for instance, ``Middle School'' from
390     ``Secondary Education''; while this granularity was useful for the
391     original realm of the list of terms, in the VO it resulted in extremely
392 msdemlei 5612 inhomogeneous annotation. Obviously, persons employed in research
393 msdemlei 5597 institutions can hardly be expected to assess needs and capabilities of
394     middle school versus elementary school educators. Eventually, for
395     VOResource 1.1 a three-term list was drawn up and is now actually used.
396     To avoid a repetition of such an experience, we want to enable small
397     initial vocabularies easily extendable as new terms are actually needed
398     and the use of the existing terms is well understood.
401 msdemlei 5911 \subsubsection{Vocabularies in VO-DML}
403     The modelling language VO-DML \citep{2018ivoa.spec.0910L} lets model
404     designers constrain attribute values through external resources defined
405     through a vocabulary URI and possibly a top concept. The standard
406     mentions both SKOS -- inspired by version 1 of this document -- and RDFS
407     as possible technologies for such constraints.
409     Depending on the nature of the attributes constrained, modellers might
410     foresee the need for having these vocabularies managed by the IVOA. Of
411     course, that is up to the modeller: There are certainly many cases in
412     which there is no need for the overhead this specification brings with
413     it, be it because vocabularies are externally defined or because the
414     concrete application profits from less-constrained vocabularies.
416 msdemlei 5474 \subsubsection{Discovering Meanings}
417     \label{uc:discovering}
419 msdemlei 5612 Software developers or researchers want to work out
420 msdemlei 5485 what some term mentioned ``means'' (where we are agnostic as to what
421     ``means'' should mean here). If the term URI alone is insufficient,
422     they can simply paste the resource URI of the term into a web browser
423 msdemlei 5551 and read (at least) its description and perhaps find out even more using
424     relationships between terms.
425 msdemlei 5474
426 msdemlei 5552 \subsubsection{Simple Review Process}
427     \label{uc:simplereview}
428 msdemlei 5485
429 msdemlei 5552 As vocabularies evolve, new terms are being added to
430     vocabularies. To facilitate their review and enable rapid uptake
431     of the proposed terms, it is desirable that new terms and even
432     new vocabularies are immediately visible to users and tools.
433     Note that since terms under review might be modified or removed later,
434     this use case is somewhat in conflict with the basic requirement
435     of stable vocabularies (i.e., a document valid once will not
436     become invalid later because of changes in vocabularies).
438 msdemlei 5912 \subsubsection{Understanding Vocabulary Evolution}
439     \label{uc:understanding}
441     When the question comes up what, say, \vocterm{calibration} actually means
442     in the datalink core vocabulary, and the (legacy) description is not
443     sufficiently clear, people can go back to the discussions that led up
444     to the addition of that term. This will also help clarify existing
445     usage that might have begun at the time of the initial definition.
447 msdemlei 5612 \subsubsection{Offline operation}
448     \label{uc:offline}
450     A system doing, say, coordinate transformations runs without an internet
451 msdemlei 5758 connection but still needs to use semantic resources on frames and
452     reference positions (e.g., figure out that a given space probe is in L1
453 msdemlei 5612 and use that as reference position). To do that, it wants to use a
454     previously downloaded copy of the vocabulary.
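
A sketch of such offline use follows. It assumes, purely for
illustration, that the cached copy is a JSON file whose top-level
object carries the vocabulary URI under the key \texttt{uri} and the
terms under the key \texttt{terms}; this layout, the file name, and the
function name are hypothetical.

\begin{lstlisting}[language=python]
import json


def load_cached_vocabulary(path):
    """Read a locally cached vocabulary file; no network access needed.

    Assumes a JSON object with the (hypothetical) keys "uri" and "terms".
    """
    with open(path, encoding="utf-8") as f:
        voc = json.load(f)
    return voc["uri"], voc["terms"]


if __name__ == "__main__":
    # Write a tiny example file, then read it back as a client would.
    example = {
        "uri": "http://example.org/rdf/refposition",
        "terms": {"L1": {"description": "An example reference position."}},
    }
    with open("refposition-cache.json", "w", encoding="utf-8") as f:
        json.dump(example, f)
    uri, terms = load_cached_vocabulary("refposition-cache.json")
    print(uri, sorted(terms))
\end{lstlisting}

Keeping the vocabulary URI inside the file itself means the application
does not have to remember where the copy was downloaded from.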
456 msdemlei 5721 \subsubsection{UAT in VOResource}
457     \label{uc:uat}
458 msdemlei 5612
459 msdemlei 5721 VOResource 1.1, in the description of the \xmlel{subject} element, says
460 mbt 5798 that its content ``should be drawn from the Unified Astronomy Thesaurus''
461 msdemlei 5721 (here: UAT). This is intended to later facilitate interactive topic
462     navigation within the Registry or semantic expansion of Registry queries
463     (``include narrower terms'').
466 msdemlei 5474 \subsection{Requirements}
467 msdemlei 5485 \label{sect:requirements}
468 msdemlei 5474
469     \subsubsection{Lists of Terms}
470     \label{req:lists}
472 msdemlei 5567 We need to be able to represent simple lists of terms even for the most
473 msdemlei 5486 basic use case~\ref{uc:simplevoc}. As per
474 msdemlei 5553 use case~\ref{uc:votvoc}, we will have to represent instances of both
475     \vocterm{rdf:Property} and \vocterm{rdfs:Class} (though not necessarily
476 msdemlei 5914 in one vocabulary). In order to not break existing practices (e.g.,
477     use cases \ref{uc:simplevoc}, \ref{uc:votvoc}, \ref{uc:links}), the
478     machine-readable terms must be allowed to follow existing patterns of
479     essentially human-readable identifiers (against external best practices
480     of using non-informative URI forms). In general, in essentially all use
481     cases discussed, making the machine-readable terms discernible by a
482     human is an advantage.
483 msdemlei 5474
484     \subsubsection{Hierarchies of Terms}
485     \label{req:hierarchy}
487 msdemlei 5553 Both use case~\ref{uc:links} and use case~\ref{uc:filtering} require a hierarchy
488     of terms, where clients can find wider and potentially narrower terms
489     relative to an original one. There is a difference,
490 msdemlei 5474 however: in the datalink use-case, strict \vocterm{is-a} relationships
491     are what clients need (e.g., ``give me all kinds of previews''). In the
492     VOEvent case, however, a somewhat softer sort of hierarchy is required.
493     For instance, a filter for accretion disks might very well expand to
494 mbt 5798 match both quasars and cataclysmic variables. Hence, we want to
495 msdemlei 5474 be able to represent strict class hierarchies as well as thesaurus-like
496     soft knowledge structures.
498 msdemlei 5600 \subsubsection{Tree-like Hierarchies}
499 msdemlei 5599 \label{req:tree}
501     Where we expect some sort of semi-formal inference to take place on the
502     vocabularies, the hierarchy should be a tree in order to facilitate
503     traversal and controlled query expansion. In other words, outside of
504     SKOS we do not support multiple inheritance. Use cases requiring
505 msdemlei 5758 something equivalent would have to resort to supporting multiple terms
506     on the annotation level.
507 msdemlei 5599
508 msdemlei 5474 \subsubsection{Consensus Vocabularies}
509     \label{req:consensus}
511     Essentially all of our use cases will be much easier to implement if
512     clients can work through simple string comparisons. Therefore,
513 mbt 5650 wherever feasible IVOA standards should build on IVOA-sanctioned,
514 msdemlei 5474 consensus vocabularies.
516     \subsubsection{Deprecating Terms}
517     \label{req:deprecating}
519     While we believe at this point that terms once approved by the IVOA
520     should never disappear -- for instance, because validators might
521     otherwise flag previously valid instance documents as invalid --, use
522 msdemlei 5551 case~\ref{uc:deprecation} shows that some way of declaring
523     deprecations must be foreseen.
524 msdemlei 5474
525 msdemlei 5486 \subsubsection{Public Availability of Machine-Readable Vocabularies}
526     \label{req:machine}
527 msdemlei 5474
528 msdemlei 5486 In particular in use cases~\ref{uc:links} and \ref{uc:filtering},
529 msdemlei 5474 clients can flexibly incorporate vocabulary updates without code
530     changes, perhaps even without re-deployment, if vocabularies are
531 msdemlei 5485 available at constant, public URIs, where clients can retrieve them in
532     formats reasonably easy to parse.
533 msdemlei 5474
534 msdemlei 5485 Use case~\ref{uc:discovering} implies that at least one representation
535 msdemlei 5612 of the vocabulary should be human-readable.
536 msdemlei 5474
537 msdemlei 5485 \subsubsection{Minimal Term Metadata}
538 msdemlei 5486 \label{req:mtm}
539 msdemlei 5474
540 msdemlei 5485 To support use case~\ref{uc:discovering}, all terms in IVOA vocabularies
541 msdemlei 5619 MUST come with a non-trivial description.
542 msdemlei 5474
543 msdemlei 5486 \subsubsection{Simple Cases do not Require RDF Tooling}
544 msdemlei 5752 \label{req:nordf}
545 msdemlei 5486
546     (Not derived from any specific use case). Since libraries implementing
547     (some subset of) RDF tend to be rather massive and thus appear
548     disproportionate when all a client wants is an up-to-date list of terms
549 msdemlei 5752 with their descriptions, at least the basic use cases must not require
550     specific RDF tooling. Indeed, simple uses should not require an
551     understanding of RDF in the first place.
552 msdemlei 5486
553 msdemlei 5752
554 msdemlei 5552 \subsubsection{Vocabulary Evolution}
555     \label{req:evolution}
556 msdemlei 5486
557 msdemlei 5553 Most use cases make it desirable that terms can be added to existing
558 msdemlei 5552 vocabularies; this is very clear for the reference positions in
559     use case~\ref{uc:votvoc}, where new instruments would imply new
560 msdemlei 5612 terms. The history of content level annotation in VOResource mentioned
561     in use case~\ref{uc:deprecation} illustrates the desirability of a
562     simple process that invites standard authors to start with minimal
563     vocabularies, relying on later extensions.
564 msdemlei 5552
565 msdemlei 5912 \subsubsection{Traceable Provenance}
566     \label{req:traceable}
568     To satisfy use case~\ref{uc:understanding}, the considerations that led
569     to the adoption or modification of a term must be documented publicly
570     in sufficient detail. It is clearly an advantage if a brief, accessible
571     summary of these considerations can easily be found without, say,
572     resorting to version control logs.
574 msdemlei 5552 \subsubsection{Preliminary Vocabularies and Terms}
575     \label{req:preliminary}
577 msdemlei 5553 In use case~\ref{uc:simplereview}, it is desirable to admit
578 msdemlei 5552 ``preliminary'' vocabularies and terms. For these, both humans
579     and machines must be able to discern a temporary status, and
580     their use implies that the general rule ``once valid, always
581     valid'' does not apply. Validators and similar software could
582 msdemlei 5553 then add notices to that effect in their outputs.
583 msdemlei 5552
584 msdemlei 5612 \subsubsection{Vocabulary Files are Usable Stand-Alone}
585     \label{req:standalone}
587     Vocabulary files need to be cacheable without applications having to
588     manage extra metadata (e.g., the URL from which the file was obtained)
589 msdemlei 5758 in order to easily satisfy use case~\ref{uc:offline} (or other scenarios
590     in which vocabulary content cannot be retrieved from the IVOA
591 msdemlei 5612 site for each session).
593 msdemlei 5757 \subsubsection{Externally Curated Vocabularies and VO Tooling}
594 msdemlei 5752 \label{req:external}
595 msdemlei 5721
Regrettably, VOResource does not explain what use case~\ref{uc:uat} would
look like in actual documents, and the example given in the document
598     clearly does not use UAT concepts.
600     The first difficulty in a straightforward uptake is that UAT URIs look
601     like \url{http://astrothesaurus.org/uat/1774}. Given that, should
602     publishers have such URIs in \xmlel{subject}? Or should they rather use
603     just the last URI segment for conciseness? Or perhaps the preferred
604     labels, in keeping with the style of existing subject content and its
605     use by clients (which typically look for natural language in subject),
606     even though the labels are not considered stable?
608     Regardless of how VOResource clarifies this matter, UAT artefacts (e.g.,
SKOS files) do not match some of our other requirements. In particular,
610 msdemlei 5721 the human-readable URIs from \ref{req:lists}, the specific way we
611 msdemlei 5752 satisfy \ref{req:machine}, and the non-RDF requirement \ref{req:nordf} are
612 msdemlei 5721 not immediately satisfied by the UAT as distributed at the time of
613     writing.
615     For simple, uniform use of such externally curated vocabularies, it
616     should be possible to have some sort of endorsement process and then
617     distribute the vocabularies in a form compliant with this specification.
618     This will entail IVOA-specific concept URIs, and we must be able to
619     express that these resources have the same meaning as the ones
620     externally maintained.
623 msdemlei 5485 \subsection{Non-Requirement}
624 msdemlei 5474
625 msdemlei 5485 This specification is not called ``Semantics in the VO'' or the like
626     because we do \emph{not} intend to prescribe ways to turn any VO
627 msdemlei 5612 artefact into RDF triples. Indeed, for many existing vocabularies, it
628 msdemlei 5485 is left open what exactly the domain or range of properties might be or
629     what subject and predicate the classes or concepts should be used with.
631     This is partly because this would substantially complicate the
632 msdemlei 5612 generation of vocabularies -- which would quickly turn into proper
633     ontologies --, partly because the information encoded by
634 msdemlei 5485 the triples has traditionally been expressed using techniques developed
635     by the Data Models working group.
637 msdemlei 5551 In particular with a view to later use in linked data scenarios,
vocabulary authors should nevertheless take care that, given appropriate
639 msdemlei 5485 properties or annotation tools, the vocabularies \emph{could} be used in
640     meaningful RDF triples.
642 msdemlei 5758 Conversely, this specification is written with future ``deeper''
643 msdemlei 5612 semantics in the VO in mind; tools restricting their operations to the ones
644 msdemlei 5599 discussed here should not break when future specifications enrich
645     existing vocabularies towards full ontologies.
648 msdemlei 5754 \section{Using IVOA Vocabularies without RDF Tooling}
649     \label{sect:withoutrdf}
651 msdemlei 5758 RDF is a
652 mbt 5798 powerful system for expressing a wide range of semantics and enriching
653 msdemlei 5754 various documents with semantic information in a globally distributed
654     fashion. Due to its generality, handling its artefacts is relatively
655 msdemlei 5758 involved and in general requires special tooling, non-negligible
656 msdemlei 5754 investment in understanding RDF, and non-trivial management of URIs and
657     prefix mappings.
659 msdemlei 5757 To lower the bar for an adoption of IVOA vocabularies
660     [requirement~\ref{req:nordf}], they are given in
661 msdemlei 5754 two formats usable without RDF tooling or, indeed, deeper knowledge of
662     RDF. This section discusses these.
664     \subsection{Choosing Terms From IVOA Vocabularies}
666     Resource annotators can usually treat IVOA Vocabularies as simple lists
667 msdemlei 5824 of (case-sensitive) strings with human-readable labels and definitions.
668     These lists can be inspected with a simple web browser.
669 msdemlei 5754
670     Each IVOA vocabulary has an associated URI starting with
671     \url{http://www.ivoa.net/rdf}. Dereferencing that URI yields a list of
672 msdemlei 5824 the vocabularies approved or under review.
674     An individual vocabulary has a
675 msdemlei 5758 URI like \url{http://www.ivoa.net/rdf/refposition}. Dereferencing this URI
676 msdemlei 5800 with a web browser (or, indeed, any user agent indicating it prefers
677     text/html media) redirects to a tabular representation of the vocabulary,
678     giving \emph{terms} -- i.e., the strings actually used in annotation --,
679     \emph{labels} -- i.e., strings that should be presented to humans instead of
680     the slightly formalised terms --, and \emph{descriptions}, which should
681 mbt 5798 be sufficiently precise to allow someone with a certain amount
682 msdemlei 5754 of domain expertise to decide whether a certain ``thing'' is or is not
683 msdemlei 5824 covered by the term (or more precisely, the underlying concept).
684 msdemlei 5754
685     Some terms may be marked as deprecated, in which case they should no
686     longer be used in new annotations. In most cases, deprecated terms will
687 mbt 5798 come with information about what to use instead.
688 msdemlei 5754
689     Some terms may be marked as preliminary. Such terms might disappear
690     without further notice. Casual users should avoid the use of such
691     terms; if they find they want to use them, the semantics working group
692     requests notification over its mailing list, since such use is clearly
693     relevant to the term's adoption process.
695 msdemlei 5824 Once a term is located within the HTML page, annotators can usually
696     directly use it in instance documents. For instance, continuing the
697     refposition example, the string \texttt{BARYCENTER} found in the
698     vocabulary is directly used in VOTable's TIMESYS element.
699 msdemlei 5754
700 msdemlei 5824 Some applications (Datalink being the prime example) instead use URIs
701     relative to the vocabulary URI. In practical terms, this just means
702     that a hash sign is prepended to the term (e.g., \texttt{\#progenitor}).
704     This latter practice builds on the property of IVOA vocabularies that if
705     one adds the term as fragment to the vocabulary URI (e.g.,
\url{http://www.ivoa.net/rdf/refposition#BARYCENTER}), that URI is the full,
707     RDF-compliant resource identifier of the concept. When used in
708     HTML-aware user agents (such as a web browser), dereferencing this URI
709     (i.e., opening it) will give the table of terms with the chosen term
710     highlighted. How exactly this is represented depends on the user agent.
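
The relation between terms, vocabulary-relative references, and full
concept URIs described above can be sketched in a few lines of Python.
This is an illustration only; the function names
\lstinline{concept_uri} and \lstinline{term_from_reference} are made up
for this example and are not part of any IVOA library.

\begin{lstlisting}[language=python]
def concept_uri(vocabulary_uri, term):
    """Return the full concept URI for a term of a vocabulary."""
    return vocabulary_uri + "#" + term


def term_from_reference(reference, vocabulary_uri):
    """Return the bare term for a reference given either as a
    vocabulary-relative URI ("#progenitor") or as a full concept URI.

    Raises ValueError for references into other vocabularies.
    """
    if reference.startswith("#"):
        return reference[1:]
    prefix = vocabulary_uri + "#"
    if reference.startswith(prefix):
        return reference[len(prefix):]
    raise ValueError(
        "Not a term of {}: {}".format(vocabulary_uri, reference))
\end{lstlisting}

With these definitions,
\lstinline{concept_uri("http://www.ivoa.net/rdf/refposition", "BARYCENTER")}
yields the concept URI discussed above.
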
713     \subsection{Semantic Operations Without RDF Tooling}
714 msdemlei 5754 \label{sect:desise}
716     Many VO components need a machine-readable representation of the
717 msdemlei 5758 entire vocabulary, for instance in order to
718     (cf.~sect.~\ref{sect:usecases}):
719 msdemlei 5754
720     \begin{compactitem}
721 msdemlei 5758 \item display labels and descriptions for terms to users,
722     \item perform query expansion or similar exploitation of hierarchical
723     relationships, or
724     \item validate annotated instances for the use of correct and current
725     terms.
726 msdemlei 5754 \end{compactitem}
728     To let VO programs perform such tasks with minimal technical overhead,
729     in addition to the RDF artefacts described in
730     sect.~\ref{sect:deployment}, IVOA vocabularies are also available in an
731     ad-hoc format called desise (``dead simple semantics''). Clients can
732     obtain vocabularies in desise by retrieving the vocabulary URI with the
733 msdemlei 5824 HTTP accept header set to \texttt{application/x-desise+json}.
734 msdemlei 5754
735 msdemlei 5826 What is returned is a JSON-encoded \citep{std:JSON} mapping (``object''
736     in JSON terms)
737 msdemlei 5824 containing the following keys (all mandatory):
738 msdemlei 5754
739     \begin{description}
740     \item[uri] The vocabulary URI. All terms occurring in desise documents
741     can be turned into full, RDF-compliant resource URIs by prefixing them
742     with this URI and a hash character.
743     \item[flavour] The flavour of the vocabulary (can generally be ignored;
744 msdemlei 5758 see sect.~\ref{sect:voccontent}).
745 msdemlei 5787
746 msdemlei 5826 \item[terms] A JSON object mapping the (machine-readable) terms to a
747     JSON object giving the term's properties as described below.
748     The keys in \textit{terms} are the strings used in
749 msdemlei 5824 machine-readable data.
750     \end{description}
752 msdemlei 5826 The JSON objects present as values in the terms object can have the
753 msdemlei 5824 following keys:
755 msdemlei 5787 \begin{description}
756 msdemlei 5824 \item[label] (mandatory)
757     A human-readable label for display purposes; clients should
758 msdemlei 5787 always try to display this rather than the raw term.
759 msdemlei 5824
760     \item[description] (mandatory) A human-readable definition of the underlying
761 msdemlei 5787 concept.
763 msdemlei 5824 \item[deprecated] present and mapped to a reserved value if the term is
764     deprecated and should no longer be used; validators will warn against
765     its use.
767     \item[preliminary] present and mapped to a reserved value if the term
768     is preliminary, meaning that in contrast to the other, ``eternal'' terms
769     it can disappear again; validators should qualify a validation as
770     preliminary if a document uses such a term.
772 msdemlei 5826 \item[wider] (mandatory) A JSON array
773     of ``wider'' terms. Most IVOA vocabularies are
tree-like, and for them, there is at most one term in here, namely
the parent node, which is the hypernym of the current term.
776     In SKOS-flavoured vocabularies, multiple terms can be here, and the
777     meaning of ``wider'' is a bit less clear-cut. The \textit{wider} list
778     is empty for top-level terms.
780 msdemlei 5826 \item[narrower] (mandatory) A JSON array
781     of ``narrower'' terms. In SKOS-flavoured
782 msdemlei 5824 vocabularies, that is just a list of all terms that list the current
783     term as wider. Otherwise, the vocabularies are tree-like and
784     \textit{narrower} is a list of all terms on the term's branch and below
785     it in the tree (it is the ``transitive closure of the inverse of
wider''). This is more easily understood from an example, which we
give below in the discussion of use case~\ref{uc:links}.
788 msdemlei 5754 \end{description}
790 msdemlei 5826 Note that, while \textit{wider} and \textit{narrower} are mandatory
791     keys, their values can of course be empty lists.
See appendix~\ref{app:desiseexample} for an example of a vocabulary
794     represented in desise.
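
As a sketch of how a client might check that a parsed desise document
has the keys listed above, consider the following function. It only
looks at the presence of the mandatory keys; the name
\lstinline{find_structure_problems} and the wording of the messages are
made up for this example.

\begin{lstlisting}[language=python]
MANDATORY_VOCABULARY_KEYS = {"uri", "flavour", "terms"}
MANDATORY_TERM_KEYS = {"label", "description", "wider", "narrower"}


def find_structure_problems(voc):
    """Return a list of human-readable problems found in a parsed
    desise document; an empty list means no problems were found."""
    problems = []
    for key in sorted(MANDATORY_VOCABULARY_KEYS - set(voc)):
        problems.append("missing vocabulary key: " + key)
    for term, properties in voc.get("terms", {}).items():
        for key in sorted(MANDATORY_TERM_KEYS - set(properties)):
            problems.append(
                "term {} lacks key: {}".format(term, key))
    return problems
\end{lstlisting}
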
795 msdemlei 5754
796 msdemlei 5824 For illustration, here are recipes to solve the various use cases in
797 msdemlei 5913 Python:
798 msdemlei 5754
799 msdemlei 5913 \paragraph{Load a vocabulary} Using the popular requests module:\\
\begin{lstlisting}
import requests
voc = requests.get(
  "http://www.ivoa.net/rdf/uat",
  headers={"accept": "application/x-desise+json"}
).json()
\end{lstlisting}
808     Note, however, that non-trivial clients should cache files retrieved in
809     this way for a reasonable time span; IVOA vocabularies typically do not
810     change on time scales of months.
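
A minimal sketch of such a cache follows. Since desise documents carry
their vocabulary URI in the \textit{uri} key, the cache needs no
metadata beyond the file itself. The function name
\lstinline{load_vocabulary}, the \lstinline{fetch} callable, and the
30-day expiry are assumptions of this example, not recommendations of
this specification.

\begin{lstlisting}[language=python]
import json
import os
import time

MAX_AGE = 30 * 24 * 3600  # seconds; an arbitrary choice for this sketch


def load_vocabulary(cache_path, fetch, max_age=MAX_AGE):
    """Return a parsed desise document, preferring a local copy.

    fetch is a zero-argument callable returning the parsed document,
    for instance a function wrapping the requests call shown above.
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        age = None  # no cached copy yet

    if age is not None and age < max_age:
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)

    try:
        voc = fetch()
    except Exception:
        # fall back to a stale copy if there is one
        if age is None:
            raise
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)

    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(voc, f)
    return voc
\end{lstlisting}
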
812 msdemlei 5824 \paragraph{See if a term is in the vocabulary} (\ref{uc:simplevoc},
813     \ref{uc:votvoc})\\ \lstinline{term in voc["terms"]}
815     \paragraph{See if a term is deprecated} (\ref{uc:deprecation})\\
816     \lstinline{"deprecated" in voc["terms"][term]}
818     \paragraph{Find a human-readable label for a term}
819     (\ref{uc:discovering})\\
820     \lstinline{voc["terms"][term]["label"]}
822     \paragraph{Find a human-readable description for a term}
823     (\ref{uc:discovering})\\
824     \lstinline{voc["terms"][term]["description"]}
826     \paragraph{Find out if a term is preliminary} (\ref{uc:simplereview})\\
827     \lstinline{"preliminary" in voc["terms"][term]}
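
\paragraph{Classify a term for validation} (\ref{uc:deprecation},
\ref{uc:simplereview}) The membership tests above can be combined into
a single check, as a validator might do. This is only a sketch; the
function name \lstinline{term_status} and the status strings are made
up for this example.
\begin{lstlisting}[language=python]
def term_status(voc, term):
    """Return "unknown", "deprecated", "preliminary", or "ok" for a
    term, given a parsed desise document."""
    properties = voc["terms"].get(term)
    if properties is None:
        return "unknown"
    if "deprecated" in properties:
        return "deprecated"
    if "preliminary" in properties:
        return "preliminary"
    return "ok"
\end{lstlisting}
A validator could then report an error for unknown terms, warn about
deprecated ones, and qualify its result as preliminary when it
encounters preliminary terms.
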
\paragraph{Query expansion: select branch} (in \ref{uc:links}, select all
progenitors, including flat fields, dark frames, etc.)
\begin{lstlisting}[language=python]
base_term = "progenitor"
expanded_terms = set(
  [base_term]
  +voc["terms"][base_term]["narrower"])
# Datalink semantics values are vocabulary-relative URIs ("#dark"),
# so strip the leading hash sign before comparing
is_match = datalink_row["semantics"][1:] in expanded_terms
\end{lstlisting}
839     \paragraph{SKOS-type query expansion by neighbouring terms}
840     (\ref{uc:filtering})
\begin{lstlisting}[language=python]
assert voc["flavour"]=="SKOS"
expanded_terms = set(
  [base_term]
  +voc["terms"][base_term]["narrower"]
  +voc["terms"][base_term]["wider"])
is_match = keyword_found in expanded_terms
\end{lstlisting}
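
\paragraph{Find the ancestors of a term} In tree-like vocabularies,
\textit{wider} contains at most one term, so the chain of parents up to
the top level can be found by following it repeatedly. The following is
a sketch under that assumption; the function name
\lstinline{get_ancestors} is made up for this example, and the function
refuses to work on vocabularies in which a term has several wider terms.
\begin{lstlisting}[language=python]
def get_ancestors(voc, term):
    """Return the chain of wider terms from term's parent up to the
    top level of a tree-like vocabulary."""
    ancestors = []
    wider = voc["terms"][term]["wider"]
    while wider:
        if len(wider) > 1:
            raise ValueError("Not a tree-like vocabulary")
        parent = wider[0]
        if parent == term or parent in ancestors:
            raise ValueError("Cycle in wider relationships")
        ancestors.append(parent)
        wider = voc["terms"][parent]["wider"]
    return ancestors
\end{lstlisting}
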
851 msdemlei 5485 \section{Vocabulary Content}
852     \label{sect:voccontent}
854 msdemlei 5619 IVOA vocabularies MUST be based on W3C's Resource Description Framework.
855 msdemlei 5485 Details on required serialisations are given in
856     sect.~\ref{sect:deployment}. This section deals with what kinds of
857     statements users of IVOA vocabularies SHOULD evaluate to ensure
858     interoperability. Statements of other types are legal in IVOA
859     vocabularies but are not expected to be interpreted interoperably.
860     Clients MAY ignore them.
862 msdemlei 5530 In IVOA vocabularies, the concept URI MUST begin with
863     \url{http://www.ivoa.net/rdf}\footnote{In retrospect, the unnecessary
864     ``www'' in this URI is somewhat regrettable, but existing vocabularies
865 msdemlei 5553 have used URIs including it, and it seems a small price to pay for
866 msdemlei 5551 having uniform URIs}. It is recommended to not introduce
867 msdemlei 5824 additional hierarchy levels, i.e., vocabulary URIs SHOULD be direct children
868 msdemlei 5551 of \texttt{rdf}\footnote{Some existing vocabularies do not follow this
869 msdemlei 5758 rule; since vocabulary URI changes will break certain usage scenarios,
870 msdemlei 5612 their URIs are still retained.}.
871 msdemlei 5551
872     Since all vocabularies specified here are
873 msdemlei 5758 single-file, the full term (i.e., RDF resource)
874     URI is formed by appending a hash sign
875 msdemlei 5530 and a fragment identifier. In IVOA vocabularies, this fragment
876     identifier MUST consist of ASCII letters, numbers, underscores and
877     dashes exclusively [for requirement~\ref{req:machine}].
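
The character restriction is simple to check mechanically. The
following sketch reads ``numbers'' as the ASCII digits and rejects
empty fragment identifiers; both are assumptions of this example, as is
the function name \lstinline{is_valid_term}.

\begin{lstlisting}[language=python]
import re

# ASCII letters, digits, underscores and dashes only
TERM_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_term(fragment):
    """Return True if fragment only uses characters admitted for
    fragment identifiers in IVOA vocabularies."""
    return TERM_PATTERN.fullmatch(fragment) is not None
\end{lstlisting}
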
879 msdemlei 5619 The fragment identifiers in the vocabulary URIs SHOULD be
880 msdemlei 5567 human-readable, usually by suitably contracting the
881 msdemlei 5530 preferred label. In the IVOA, we do \emph{not} use natural
882     language-neutral concept identifiers but instead expect that domain
883     experts will already have an impression of a term's meaning from looking
884     at its URI.
886 msdemlei 5612 In this specification, we distinguish three different ``flavours'' of
887     vocabularies. Each covers a particular domain of problems and is
888     therefore subject to different requirements.
889 msdemlei 5599 Although the requirements are largely non-contradicting, each vocabulary must
890 msdemlei 5551 be clearly identified as \emph{either} giving SKOS concepts, RDFS
891 msdemlei 5553 classes or RDF properties so clients know how to extract word lists and
892 msdemlei 5752 hierarchies; see sect.~\ref{sect:genprop}
893 msdemlei 5619 for details.
894 msdemlei 5485
895 msdemlei 5530
896     \subsection{SKOS Vocabularies}
897 msdemlei 5486 \label{sect:skosvoc}
899 msdemlei 5758 SKOS vocabularies should be used where terms are organised
in informal (i.e., not necessarily strict is-a)
901 msdemlei 5530 hierarchies. The classic use case here is query expansion, where, for
902     instance, a search for ``AGN'' might be expanded to include matches for
903     ``accretion disk'' (under certain circumstances).
905 msdemlei 5612 The terms in SKOS vocabularies have the RDF type \vocterm{skos:Concept}.
907 msdemlei 5530 \subsubsection{Properties in SKOS Vocabularies}
908     \label{sect:skosvoc-prop}
910 msdemlei 5486 IVOA SKOS vocabularies use the following properties:
912     \begin{itemize}
913 msdemlei 5595 \item \vocterm{skos:broader} -- interpreted in the standard SKOS sense.
914     The reverse property, \vocterm{skos:narrower}, MAY be given, but clients
MUST NOT depend on its presence [this satisfies
916     requirement~\ref{req:hierarchy}].
918     \item \vocterm{skos:prefLabel} -- all concepts MUST have an
919 msdemlei 5612 English-language preferred label, which is an RDF plain literal [by
920 msdemlei 5553 requirement~\ref{req:mtm}]. No RDF language label is allowed on the
921     literal, and only one preferred label is permitted
922 msdemlei 5752 [these help requirement~\ref{req:nordf}].
923 msdemlei 5486
924     \item \vocterm{skos:definition} -- all concepts MUST have a non-trivial
925     English-language definition. It is obviously impossible to define
926     ``non-trivial'' in a rigorous way; a suggested criterion is that a
927     domain expert would, given the definition, presumably arrive at a
928 msdemlei 5661 similar preferred label, and recursive definitions (i.e., those using
929     the label itself) should be avoided whenever possible. Definitions in
930     non-English languages are not permitted, and only one definition is
931     permitted [again, this helps requirement~\ref{req:mtm}].
932 msdemlei 5486
933 msdemlei 5757 \item \vocterm{skos:exactMatch} -- for externally managed vocabularies
934     the IVOA has endorsed (see sect.~\ref{sect:externally-managed}), this
935     property links the IVOA term (subject) to the external RDF resource
936     (object).
\item General properties discussed in sect.~\ref{sect:genprop} [this is
939     for requirements~\ref{req:deprecating} and
940 msdemlei 5595 \ref{req:preliminary}]. The \vocterm{ivoasem:vocflavour} of these
941     vocabularies is \verb|SKOS|.
942 msdemlei 5486 \end{itemize}
944     This specification does not include requirements on the use or the
945 msdemlei 5757 interpretation of \vocterm{skos:related},
946 msdemlei 5486 \vocterm{skos:closeMatch}, \vocterm{skos:broadMatch},
947     \vocterm{skos:narrowMatch}, \vocterm{skos:ConceptScheme},
\vocterm{skos:inScheme}, \vocterm{skos:hasTopConcept},
949 msdemlei 5551 \vocterm{skos:altLabel}, and \vocterm{skos:hiddenLabel}. If use cases
950 mbt 5650 are found that require those, this specification will be amended. Until
951 msdemlei 5619 then, vocabulary authors SHOULD NOT use them in order to avoid creating
952 msdemlei 5551 practices that might conflict with later usage patterns.
953 msdemlei 5486
954     This specification does not include requirements on the use or the
955     interpretation of the transitive SKOS properties
956 msdemlei 5551 (\vocterm{skos:broaderTransitive}, \vocterm{skos:narrowerTransitive}).
957 msdemlei 5486 At this point, we believe that applications requiring this type of
958 msdemlei 5551 reasoning-friendly semantics should preferably use RDF class
959 msdemlei 5486 vocabularies.
961     \subsubsection{Example (non-normative)}
963 msdemlei 5758 Here is a term from a SKOS vocabulary conforming to this specification
964 msdemlei 5568 in RDF/XML serialisation:
965 msdemlei 5486
\begin{lstlisting}[language=XML]
<skos:Concept rdf:about="http://ivoa.net/rdf/AstronomicalObjects#AGN">
  <skos:prefLabel>AGN</skos:prefLabel>
  <skos:definition>A compact object in the center of a galaxy showing
    unusual emission ("active galactic nucleus").</skos:definition>
  <skos:broader rdf:resource
    ="http://ivoa.net/rdf/theory/AstronomicalObjects#OpticalSource"/>
  <skos:broader rdf:resource
    ="http://ivoa.net/rdf/theory/AstronomicalObjects#CompoundObject"/>
</skos:Concept>
\end{lstlisting}
978 msdemlei 5553 \subsection{RDF Properties Vocabularies}
979 msdemlei 5530 \label{sect:refpropvoc}
981 msdemlei 5553 RDF properties vocabularies should be used when the terms in the
982     vocabulary are mainly used to state
983     relationships between entities that can sensibly be imagined as
984     resources in the RDF sense. Such terms would naturally be used as
985 msdemlei 5530 predicates in RDF triples. Obvious examples might be something
986 msdemlei 5758 like is-progenitor-for in a provenance chain or, indeed, the special
987     properties for IVOA vocabularies introduced in sect.~\ref{sect:genprop}.
988 msdemlei 5530
989 msdemlei 5758
990 msdemlei 5612 The terms in RDF Properties vocabularies have the RDF type
991     \vocterm{rdf:Property}.
993 msdemlei 5553 \subsubsection{Properties in RDF Properties Vocabularies}
994 msdemlei 5597 \label{sect:propvoc-prop}
995 msdemlei 5530
996 msdemlei 5553 IVOA RDF properties vocabularies use the following properties (where
997 msdemlei 5551 not specified, the requirements considered essentially match those in
998     sect.~\ref{sect:skosvoc-prop}):
999 msdemlei 5530
1000     \begin{itemize}
1001     \item \vocterm{rdfs:label} -- all terms MUST have an English-language
1002 msdemlei 5758 label, and clients should prefer it over the fragment in the
1003     term URI for presentation purposes. Only
1004 msdemlei 5530 one such label is permitted.
1006     \item \vocterm{rdfs:comment} -- all concepts MUST have a non-trivial
1007     English-language comment serving as a human-oriented definition of the
1008     term. The considerations for \vocterm{skos:definition} in
1009 msdemlei 5661 sect.~\ref{sect:skosvoc-prop} apply. As for those, only one
1010     \vocterm{rdfs:comment} per term is allowed.
1011 msdemlei 5530
1012     \item \vocterm{rdfs:subPropertyOf} -- interpreted as in RDFS to induce
1013 msdemlei 5619 the hierarchy of terms; a term MUST NOT appear as subject of more than
1014 msdemlei 5599 one \vocterm{rdfs:subPropertyOf} triple (i.e., the hierarchy is a tree).
1015 msdemlei 5530
1016 msdemlei 5595 \item General properties discussed in sect.~\ref{sect:genprop}.
1017     The \vocterm{ivoasem:vocflavour} of these vocabularies is
1018     \verb|RDF Property|.
1020 msdemlei 5547 \end{itemize}
1021 msdemlei 5530
1022     \subsubsection{Example (non-normative)}
1023 msdemlei 5551 \label{sect:rdfpxex}
1024 msdemlei 5530
\begin{lstlisting}[language=XML]
<rdf:Property rdf:about
  ="http://www.ivoa.net/rdf/datalink/core#preview-image">
  <rdfs:comment>preview of the data as a 2-dimensional
    image</rdfs:comment>
  <rdfs:label>Image preview</rdfs:label>
  <rdfs:subPropertyOf rdf:resource
    ="http://www.ivoa.net/rdf/datalink/core#preview"/>
</rdf:Property>
\end{lstlisting}
1037 msdemlei 5551 \subsection{RDF Class Vocabularies}
1038 msdemlei 5530
1039 msdemlei 5567 RDF class vocabularies should be used when the terms in the vocabulary
1040 msdemlei 5530 are reasonably class-like, i.e., would usually be either subjects or
1041     objects in RDF triples. As opposed to SKOS vocabularies, the hierarchy
1042 msdemlei 5612 implied is strict in the sense of \vocterm{rdfs:subClassOf}
-- roughly, that statements true for a wider term must be true
for a more specialised term, too. This lets clients confidently perform
1045 msdemlei 5530 inferences.
1047     For instance, coordinates in the FK4 reference frame are equatorial, and
1048     thus even a client unfamiliar with the FK4 frame as such can confidently
1049     infer that the coordinates are right ascension and declination, and that
1050     right ascensions increase eastwards. Reasoning of this type is
1051     impossible within a SKOS vocabulary.
1053 msdemlei 5612 The terms in RDF Class vocabularies have the RDF type
1054     \vocterm{rdfs:Class}.
1056 msdemlei 5551 \subsubsection{Properties in RDF Class Vocabularies}
1057 msdemlei 5597 \label{sect:classvoc-prop}
1058 msdemlei 5530
1059 msdemlei 5551 IVOA RDF class vocabularies use the following properties:
1060 msdemlei 5530
1061     \begin{itemize}
1062     \item \vocterm{rdfs:label} -- all terms MUST have an English-language
1063 msdemlei 5551 label, and clients should prefer it over the term (the fragment of the
1064     term URI) for presentation purposes. Only
1065 msdemlei 5530 one such label is permitted.
1067     \item \vocterm{rdfs:comment} -- all concepts MUST have a non-trivial
1068     English-language comment serving as a human-oriented definition of the
1069     term. The considerations for \vocterm{skos:definition} in
1070 msdemlei 5661 sect.~\ref{sect:skosvoc-prop} apply. As for those, only one
1071     \vocterm{rdfs:comment} per term is allowed.
1072 msdemlei 5530
1073     \item \vocterm{rdfs:subClassOf} -- interpreted as in RDFS to induce
1074 msdemlei 5619 the hierarchy of terms; a term MUST NOT appear as subject of more than
1075 msdemlei 5599 one \vocterm{rdfs:subClassOf} triple (i.e., the hierarchy is a tree).
1076 msdemlei 5530
\item General properties discussed in sect.~\ref{sect:genprop}.
1078 msdemlei 5595 The \vocterm{ivoasem:vocflavour} of these vocabularies is
1079     \verb|RDF Class|.
1080 msdemlei 5530 \end{itemize}
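
The tree constraint on \vocterm{rdfs:subClassOf} can be checked
without RDF tooling once the triples are available in some form. The
sketch below assumes they are given as distinct (subject, predicate,
object) tuples of URI strings; that representation and the function
name \lstinline{find_multiple_parents} are assumptions of this example.

\begin{lstlisting}[language=python]
from collections import Counter

SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def find_multiple_parents(triples, predicate=SUBCLASS_OF):
    """Return the sorted subjects occurring in more than one triple
    with the given predicate."""
    counts = Counter(s for s, p, o in triples if p == predicate)
    return sorted(s for s, n in counts.items() if n > 1)
\end{lstlisting}

An empty result means that no term has more than one parent. Passing
the URI of \vocterm{rdfs:subPropertyOf} as \lstinline{predicate} gives
the analogous check for RDF properties vocabularies.
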
1082     \subsubsection{Example (non-normative)}
1084 msdemlei 5553 Here is a term from an RDF class vocabulary conforming to this
1085 msdemlei 5568 specification in RDF/XML serialisation:
1086 msdemlei 5530
\begin{lstlisting}[language=XML]
<rdfs:Class rdf:about="http://www.ivoa.net/rdf/refframe#FK5">
  <rdfs:comment>
    Positions based on the 5th Fundamental Katalog. If no equinox is
    [...]
  </rdfs:comment>
  <rdfs:label>FK5</rdfs:label>
  <rdfs:subClassOf rdf:resource
    ="http://www.ivoa.net/rdf/refframe#EQUATORIAL"/>
</rdfs:Class>
\end{lstlisting}
1099 msdemlei 5551 \subsection{General Properties}
1100     \label{sect:genprop}
1102 msdemlei 5553 To cover requirements~\ref{req:deprecating} and
1103 msdemlei 5597 \ref{req:preliminary} and to facilitate the handling of vocabularies not
1104     directly retrieved via HTTP (which means that the application may not
1105 msdemlei 5612 know the vocabulary URI a priori; cf.~requirement~\ref{req:standalone}),
1106     the Semantics WG defines some
1107 msdemlei 5597 properties of its own in the vocabulary
1108     \url{http://www.ivoa.net/rdf/ivoasem}. The following properties may be
1109 msdemlei 5612 used in all three vocabulary flavours:
1110 msdemlei 5551
1111     \begin{itemize}
1112 msdemlei 5612 \item \vocterm{dc:created} -- IVOA vocabularies MUST include exactly one
1113 msdemlei 5595 triple with the vocabulary as subject and a predicate
1114     \vocterm{dc:created}. The object is the datestamp of the vocabulary in
1115     YYYY-MM-DD format. Clients may only use this for debugging and similar
1116     purposes.
1118 msdemlei 5612 \item \vocterm{ivoasem:vocflavour} -- IVOA vocabularies MUST include
1119 msdemlei 5595 exactly one triple with the vocabulary as subject and a string literal
1120 msdemlei 5597 specifying the kind of vocabulary as per this specification. The
1121     ``General properties'' bullet points of sects.~\ref{sect:skosvoc-prop}
1122     (\verb|SKOS|), \ref{sect:propvoc-prop} (\verb|RDF Property|), and
1123 msdemlei 5612 \ref{sect:classvoc-prop} (\verb|RDF Class|) define what strings may occur
1124 msdemlei 5597 here.
1125 msdemlei 5595
1126 msdemlei 5552 \item \vocterm{ivoasem:preliminary} -- this property indicates
1127     that a term is preliminary and might disappear from the
1128 msdemlei 5597 vocabulary without warning. The object of triples using it
1129     is a blank node. Validators need not warn against the use
1130 msdemlei 5619 of preliminary terms, but as they encounter them, they SHOULD
1131 msdemlei 5552 qualify their validation to the effect that it is temporary.
1133     \item \vocterm{ivoasem:deprecated} -- this property indicates
1134 msdemlei 5597 that a term is deprecated. The object of triples using it
1135 msdemlei 5619 is a blank node. Validators SHOULD issue warnings if such terms
1136     are encountered.
1137 msdemlei 5552
1138     \item \vocterm{ivoasem:useInstead} -- for a deprecated term, the
1139 msdemlei 5758 objects of RDF triples using this property indicate
1140     which terms should be
1141 msdemlei 5619 used instead of the deprecated one.
1142 msdemlei 5552
1143 msdemlei 5551 \end{itemize}
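
As an illustration of the constraints on the two vocabulary-level
properties, the following sketch checks the literals of
\vocterm{dc:created} and \vocterm{ivoasem:vocflavour} once they have
been extracted from a vocabulary by whatever means. The function names
are made up for this example.

\begin{lstlisting}[language=python]
import datetime
import re

VOCFLAVOURS = {"SKOS", "RDF Property", "RDF Class"}


def is_datestamp(literal):
    """Return True if literal is a valid date in YYYY-MM-DD format."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", literal) is None:
        return False
    try:
        datetime.date(*(int(part) for part in literal.split("-")))
    except ValueError:
        return False
    return True


def check_vocabulary_metadata(created, vocflavour):
    """Return a list of problems with the two literals; an empty
    list means both look fine."""
    problems = []
    if not is_datestamp(created):
        problems.append("bad dc:created: " + created)
    if vocflavour not in VOCFLAVOURS:
        problems.append("unknown ivoasem:vocflavour: " + vocflavour)
    return problems
\end{lstlisting}
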
1145 msdemlei 5612 \subsubsection{Example (non-normative)}
1146 msdemlei 5567
1147 msdemlei 5597 The following snippets show RDF/XML triples using the common terms,
1148     taken from the existing relationship\_type vocabulary; the notation
1149     \verb|__| as a blank node is an implementation detail and must not be
1150 msdemlei 5612 relied upon. In general, where ivoasem properties take blank nodes as
1151     objects, clients should normally just ignore the objects.
1152 msdemlei 5567
\begin{lstlisting}[language=XML]
<rdf:Description rdf:about
  ="http://www.ivoa.net/rdf/voresource/relationship_type">
  <dc:created>2016-08-17</dc:created>
</rdf:Description>
<rdf:Description rdf:about
  ="http://www.ivoa.net/rdf/voresource/relationship_type">
  <ivoasem:vocflavour>RDF Property</ivoasem:vocflavour>
</rdf:Description>
<rdf:Description rdf:about
  ="http://www.ivoa.net/rdf/voresource/relationship_type#IsPartOf">
  <ivoasem:preliminary rdf:resource=
    "http://www.ivoa.net/rdf/voresource/relationship_type#__"/>
</rdf:Description>
<rdf:Description rdf:about
  ="http://www.ivoa.net/rdf/voresource/relationship_type#derived-from">
  <ivoasem:deprecated rdf:resource
    ="http://www.ivoa.net/rdf/voresource/relationship_type#__"/>
  <ivoasem:useInstead rdf:resource
    ="http://www.ivoa.net/rdf/voresource/relationship_type#IsDerivedFrom"/>
</rdf:Description>
\end{lstlisting}
1176 msdemlei 5553
1177 msdemlei 5485 \section{Vocabulary Management}
1178 msdemlei 5551 \label{sect:management}
1179 msdemlei 5485
1180 msdemlei 5912 This section discusses the processes through which new vocabularies can be
defined and how vocabulary updates are performed in a way
that ensures community participation and at least a minimal level of
consensus; the procedures here primarily address requirements
1184     \ref{req:consensus}, \ref{req:evolution} and \ref{req:traceable}.
1186 msdemlei 5758 In the following, the phrase ``chair of the Semantics WG'' is understood
1187 msdemlei 5552 to mean ``chair or vice-chair of the Semantics WG''; in the unlikely
1188 msdemlei 5553 situation that chair and vice-chair dissent, the resolution of the
1189     problem is up to the TCG chair.
1190 msdemlei 5547
1191 msdemlei 5567
1192 msdemlei 5552 \subsection{New Vocabularies}
1193 msdemlei 5757 \label{sect:new-vocabularies}
1194 msdemlei 5547
1195 msdemlei 5552 New vocabularies in the VO should be introduced with a document going
1196     through the normal IVOA approval process, i.e., intended to become a
1197     recommendation or an endorsed note with RFC as described in the IVOA
1198     Document Standards \citep{2017ivoa.spec.0517G}.
1199 msdemlei 5547
At the discretion of the chair of the Semantics WG, the vocabulary is
1201     uploaded to the vocabulary repository when a document reaches the state
1202     of a Working Draft. At the latest, the vocabulary is uploaded when the
1203 msdemlei 5553 document becomes a Proposed Recommendation or a Proposed Endorsed Note
1204     in order to support a thorough review and reference implementations.
1205 msdemlei 5552
1206     The entire vocabulary is marked human-readably as preliminary in the
1207     vocabulary index (cf.~sect.~\ref{sect:deployment}). All terms in the
1208     vocabulary are marked as preliminary using the
1209     \vocterm{ivoasem:preliminary} property (cf.~sect.~\ref{sect:genprop}) in
1210     order to satisfy requirement~\ref{req:preliminary}.
1212 mbt 5650 The entire new vocabulary gets approved as the document introducing it
1213 msdemlei 5552 reaches the status of a Recommendation or an Endorsed Note. From then
1214 msdemlei 5758 on, it is managed by the Semantics WG using the process defined in
1215 msdemlei 5552 the next section.
1217 msdemlei 5567 Once approved (i.e., no longer marked as preliminary),
1218     terms in IVOA vocabularies cannot be removed. They can,
1219 msdemlei 5619 however, be marked as deprecated.
1220 msdemlei 5567
1221 msdemlei 5552 \subsection{Updating Vocabularies}
1222 msdemlei 5757 \label{sect:updating-vocabularies}
1223 msdemlei 5552
1224 msdemlei 5567 IVOA vocabularies can be extended as domain requirements develop
1225     [requirement~\ref{req:evolution}]. Clients
1226     should therefore be designed such that they gracefully deal with terms
1227     that have not been part of the vocabulary at build time, typically by
1228     exploiting information in the vocabulary, perhaps by falling back to
1229     wider, known terms, or by presenting their users labels and descriptions
1230     for terms not explicitly handled.
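
The fallback to wider, known terms can be sketched as follows. This is a
minimal illustration and not part of the specification: it assumes the
vocabulary is available as a dictionary mapping each term to a record with
a \verb|wider| list, as in the desise format of sect.~\ref{sect:desise};
the function name \verb|find_known_term| and the sample terms are
hypothetical.

\begin{lstlisting}[language=python]
from collections import deque

def find_known_term(term, terms, known):
    """Return term if the client knows it, else the nearest wider term
    the client knows, else None (breadth-first search over "wider")."""
    queue = deque([term])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in known:
            return current
        # Unknown terms simply contribute no wider terms.
        queue.extend(terms.get(current, {}).get("wider", []))
    return None

# Hypothetical fragment; ICRS2 was added after the client was built.
terms = {
    "EQUATORIAL": {"label": "Equatorial", "wider": []},
    "ICRS": {"label": "ICRS", "wider": ["EQUATORIAL"]},
    "ICRS2": {"label": "ICRS 2", "wider": ["EQUATORIAL"]},
}
known = {"EQUATORIAL", "ICRS"}
fallback = find_known_term("ICRS2", terms, known)
\end{lstlisting}

If no known wider term exists, a client would typically fall back to
showing the term's label and description to the user.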
1233 msdemlei 5552 \subsubsection{Vocabulary Enhancement Proposals}
1235 msdemlei 5553 To add one or more terms to a vocabulary, to introduce deprecations or
1236     to change term labels, descriptions, or relationships,
1237 msdemlei 5552 an interested party -- not necessarily affiliated with the Working Group
1238     that has originally introduced the vocabulary -- prepares a Vocabulary
1239 msdemlei 5547 Enhancement Proposal (VEP). In the interest of thorough review and
1240     topical discussion, a single VEP should only cover directly related
1241     terms. For instance, in a vocabulary of reference frames, it would be
1242     reasonable to add old-style and new-style galactic frames in one
1243     VEP, but not, say, azimuthal and supergalactic coordinates. The
1244     arguments for both terms in the former pair are rather
1245     analogous\footnote{This does not rule out that, in the example, one
1246     might argue that old-style galactic coordinates are so ancient that
1247     perhaps they should not be supported in the VO at all; the chair of the
1248     Semantics WG might then decree that the VEP still needs to be split.}.
1249     In the latter case, two very different rationales would have
1250     to be put forward, which is a clear sign that two VEPs are in order.
1252 msdemlei 5551 \begin{figure}
1253     \begin{verbatim}
1254     Vocabulary: http://www.ivoa.net/rdf/datalink/core
1255     Author: msdemlei@ari.uni-heidelberg.de
1256     Date: 2019-07-19
1258 msdemlei 5874 Term: IsPreviousVersionOf
1259 msdemlei 5551 Action: Addition
1260 msdemlei 5567 Label: Newer Version
1261     Description: This dataset in a previous edition, e.g., processed
1262 msdemlei 5551 with an older pipeline, as part of an older data release.
Relationships: rdfs:subPropertyOf(this)
1264 msdemlei 5704 Used-in: http://example.org/datalink?ID=doc-v1
1265 msdemlei 5551
1266 msdemlei 5874 Term: IsNewVersionOf
1267 msdemlei 5551 Action: Addition
1268 msdemlei 5567 Label: Previous Version
1269     Description: This dataset in a newer edition, e.g., processed
1270 msdemlei 5551 with a newer pipeline, as part of a newer data release.
Relationships: rdfs:subPropertyOf(this)
1272 msdemlei 5704 Used-in: http://example.org/datalink?ID=doc-v2
1273 msdemlei 5551
1274     Rationale:
1276     The terms are mainly intended for projects with data releases.
1277 msdemlei 5661 IsPreviousVersionOf allows services to mark up links to (typically
1278 msdemlei 5551 datalink documents for) later version(s) of this data set. It
1279     allows a client to alert users that a newer, probably improved,
1280     rendition of the current dataset is available and should
1281     presumably be used instead of what they are looking at. The
1282     inverse relationship, IsNewVersionOf, is useful if projects want
1283     to keep previous versions of the dataset findable without having
1284     them show up in the default queries.
1286     The terms are taken from the relationship types of DataCite.
1287     \end{verbatim}
1289     \caption{A sample VEP.}
1290     \label{fig:vepsample}
1291     \end{figure}
1293 msdemlei 5547 A VEP is a semistructured text file containing the following items:
1295     \begin{itemize}
\item \vepitem{Vocabulary:} The URI of the vocabulary.
\item \vepitem{Author:} Contact information for the author(s) of
the VEP.
\item \vepitem{Date:} The date on which the VEP was posted.
\item \vepitem{Term:} The identifier of the term to be added, modified,
or deprecated.
\item \vepitem{Action:} One of \textit{Addition}, \textit{Deprecation}, or
\textit{Modification}.
\item \vepitem{Label:} The English-language, human-readable label of the term.
\item \vepitem{Description:} The description that will come with the term.
\item \vepitem{Relationships:} If applicable, relationships the new
term will have to existing terms, using the properties defined in
the present document.
\item \vepitem{Used-in:} At least one URI of a document using the
proposed term.
\item \vepitem{Rationale:} A discussion of use cases, the role of the term in
the vocabulary, and the like. In particular, the item(s) in
\vepitem{Used-in} should be commented on.
1314 msdemlei 5547 \end{itemize}
The items \vepitem{Term}, \vepitem{Action}, \vepitem{Label},
\vepitem{Description}, \vepitem{Used-in},
and \vepitem{Relationships} may be repeated if
multiple terms are affected by a VEP. In \textit{Addition} VEPs, all items
except \vepitem{Relationships} are mandatory.
1321 msdemlei 5547
1322 msdemlei 5704 When \vepitem{Action} is \textit{Deprecation}, \vepitem{Label},
1323     \vepitem{Description}, and \vepitem{Relationships} are optional but can be
1324 msdemlei 5619 given if useful for understanding the VEP. The rationale MUST discuss
1325 msdemlei 5612 the reasons for a deprecation. Usually, one or more replacement
1326 msdemlei 5553 term(s) will be proposed within the same VEP.
1327 msdemlei 5547
1328 msdemlei 5704 When \vepitem{Action} is \textit{Modification}, \vepitem{Label},
1329     \vepitem{Description}, and \vepitem{Relationships} give the proposed new
1330 msdemlei 5547 values of the term. The term itself cannot be modified. The rationale
1331 msdemlei 5553 will usually detail the changes proposed while mentioning the previous
1332     values.
1333 msdemlei 5547
1334     We do not expect the VEPs to be evaluated by machines. Therefore, we
1335     define no grammar for the markup of sections, section headers, and their
1336     content. It is still recommended that authors follow the formatting of
1337     the example in Fig.~\ref{fig:vepsample}.
1339 msdemlei 5552 \subsubsection{Publishing a VEP}
1340 msdemlei 5547
To publish a VEP, its author sends it to the chair of the Semantics WG,
preferably by e-mail. The chair of the Semantics WG will perform a
formal validation, in particular as regards the presence of all required
items and syntactically valid relationships. No assessment of the
contents is done at this stage.
1346 msdemlei 5547
Formally valid VEPs then receive a running number. The first VEP was
VEP-0001, the second VEP-0002, and so on. The chair of the Semantics WG
then adds the new VEP to the public index of VEPs as
``Current'' (see Appendix~\ref{app:curtech} for the technical details).
This index has a link to each VEP's text (in general, a location in a
version control system).
1353 msdemlei 5705
1354 msdemlei 5547 Once the VEP is uploaded, it is announced to the IVOA Semantics Working
1355     Group and all other IVOA Working Groups concerned (again, the technical
1356 msdemlei 5551 details are found in Appendix~\ref{app:curtech}). The chair of the
1357 msdemlei 5547 Semantics WG can extend the distribution as they see fit. The
1358     announcement in particular contains a copy of the VEP in question.
1360     As soon as possible after the upload, the chair of the Semantics WG adds
1361 mbt 5798 any term(s) proposed to the vocabulary as a preliminary term using the
1362 msdemlei 5758 \vocterm{ivoasem:preliminary} property. This means that the terms can
1363 msdemlei 5612 immediately be used without raising warnings or errors, but in contrast
1364     to approved terms, they may disappear again. Deprecation or
1365     modification VEPs have no immediate effect.
1366 msdemlei 5547
1367 msdemlei 5552 \subsubsection{Approval Process}
1368 msdemlei 5550 \label{sect:approval}
1369 msdemlei 5547
1370     Discussion of a VEP takes place in the WGs' discussion forums (again,
1371 msdemlei 5551 see Appendix~\ref{app:curtech}). The chair of the Semantics WG will
1372 msdemlei 5704 summarise the discussion in the VEP in a \textit{Discussion} section.
1373 msdemlei 5547
1374 msdemlei 5704 During the process, all parts of the VEP may be changed except the
1375     term(s) proposed.
1377 msdemlei 5547 Once the chair of the Semantics WG sees a sufficient consensus reached,
1378     they announce the VEP in the TCG. If, at the next meeting of the TCG,
1379     no Working Group objects to the VEP, it is accepted and the marker that
1380 msdemlei 5704 a term is preliminary is removed from the relationships of any terms
1381     added by the VEP. In the case of deprecation or modification VEPs, the
1382 msdemlei 5612 requested actions are taken at this point.
1383 msdemlei 5547
If, on the other hand, discussion of an addition request results in the
realisation that the terms proposed need to be changed, the VEP in question
must be withdrawn and its effects on the vocabulary undone. Zero or
more new VEPs are then posted containing proposals for terms for which
consensus appears feasible. The withdrawn VEP receives a
\vepitem{Superceded-by} item referencing any new VEPs; any new VEPs have
a \vepitem{Supercedes} item referencing the original VEP.
1391 msdemlei 5547
1392 msdemlei 5756 \subsubsection{Guidelines for Creating Concepts (non-normative)}
1394 msdemlei 5758 When introducing terms, it is useful to consider a very simple
1395 msdemlei 5756 semantic model, where the world is a set of (tangible or non-tangible)
1396     ``things'' in the sense of naive set theory.
1398     A vocabulary has a scope, which is a subset of the world; this could be
1399     ``reference systems'' or ``astronomical object types'' or even something
1400     as concrete as ``observatories''.
1402 msdemlei 5824 In this picture, a term denotes a certain subset of a vocabulary's
1403     scope. This set is called the term's (or, where an additional level
1404     between the concrete letters making up the term as defined by this
1405     document and the set is useful, the concept's) ``extension''.
Now, in an ideal vocabulary the extensions of its
top-level terms are disjoint (meaning: each thing in scope of the vocabulary
belongs to not more than one top-level term's extension) and the terms cover the
entire scope (meaning: for each thing in the scope, there is at least
one term's extension that contains that thing). In other words, the
extensions of the top-level terms partition the vocabulary's scope.
1414 msdemlei 5758 Where vocabularies are hierarchical, analogous considerations would
1415     apply for the extensions of a general term and its more specialised
1416     terms.
1418     When natural language and the real world are involved,
1419     this ideal generally is unreachable.
1420     But when proposing a term and its definition, authors should try to
1421 msdemlei 5756 make sure that
1423     \begin{compactenum}
\item their new term has a useful extension (i.e., consumers actually
want to know whether a thing is or is not inside it),
\item the extension is reasonably disjoint from those of existing terms, or is a
proper superset (in which case the other terms are narrower), or is a proper
subset (in which case they are wider) of other terms' extensions.
1429     \end{compactenum}
1431     Put another way: When designing terms, it is as important to say what is
1432 mbt 5798 not covered as to clearly say what is.
1433 msdemlei 5756
1434     This is a major reason why it is important to give clear definitions
1435     whenever these definitions are not uniquely given by the domain. For
1436     instance, while an object type vocabulary probably does not need to be
1437     very diligent in defining $\delta$~Cephei stars because the extension of
that term is uncontroversial to first order\footnote{Although it might
seem desirable to clarify whether, say, W~Virginis stars are or are not
excluded.}, a term like ``dataset'' should come with a precise
1441     definition, ideally containing a reference to a longer explanation.
1442 msdemlei 5757
1443     \subsection{Externally Managed Vocabularies}
1444     \label{sect:externally-managed}
1446     The IVOA is not the only body developing vocabularies, and of course VO
1447     components are free to use other, non-IVOA vocabularies whenever
1448     convenient or even required for interoperability beyond the IVOA.
1450     Sometimes, however, it is advantageous to subject an external vocabulary
1451     to the requirements set forth by this specification. The motivating use
1452     case here is \ref{uc:uat}, the Unified Astronomy Thesaurus. As derived
1453     in requirement~\ref{req:external}, multiple considerations make a
1454     ``mirror'' of the vocabulary in the IVOA RDF repository highly
1455     desirable. Regrettably, since RDF resources (i.e., what we call terms
1456     here) are identified by their full URIs, this will create new RDF
1457     resources, and hence care must be taken that RDF tools can work out the
1458     identity of the mirrored IVOA terms and the original RDF resources.
1460     Also, the processes from sects.~\ref{sect:new-vocabularies}
1461     and~\ref{sect:updating-vocabularies} obviously cannot apply to such
1462     vocabularies, which have their own management procedures.
1464     To address these issues, the following rules apply:
1466     When a vocabulary managed by an IVOA-external body needs to be made
1467     available in the form prescribed by this specification, a proposal for
1468     doing this needs to pass the endorsed notes process of the IVOA as laid
1469     out in the IVOA Document Standards \citep{2017ivoa.spec.0517G}. As it
1470     concerns external relationships of the IVOA, it additionally needs
endorsement by the IVOA Executive Committee to become effective.
1473     This proposal has to specify:
1474     \begin{itemize}
1475     \item The basic metadata for the vocabulary on the IVOA side.
1476     \item The rules for mapping the external RDF resource URIs to IVOA term
1477     URIs, together with a plan for how this mapping is kept stable.
1478     \item If during the mapping of the vocabulary, external RDF triples are
1479     discarded (which likely is necessary to ensure adherence to our
1480     constraints), what triples are discarded.
1481     \item A description of and reference to software that performs this
1482     mapping.
1483     \item A description of the external management process.
1484     \end{itemize}
1486     The proposing party has to provide software to automatically translate
1487     resources from the external format to a suitable input for the IVOA
1488     vocabulary tooling.
1490     Each term in the IVOA vocabulary mirror MUST declare its identity to
1491 msdemlei 5758 the original, external RDF resource. At this point, this is only
1492 msdemlei 5757 defined for SKOS-flavoured vocabularies, where the IVOA term must be the
1493     subject of exactly one triple with the \vocterm{skos:exactMatch}
1494     property. The object of that triple is the URI of the external RDF
1495     resource.
1497     For other flavours, no such mechanism is defined in this version of the
1498     specification, which means that for now, externally managed vocabularies
1499     must use the SKOS flavour.
1501     Once an external vocabulary is endorsed by both the TCG and the
1502     Executive Committee, the chair of the Semantics working group has the
1503     responsibility to keep the IVOA mirror of the vocabulary synchronised,
1504     ideally by using a monitored, automatised process like a post-commit
1505     action on an external version control system.
1508 msdemlei 5755 \section{Publishing Vocabularies}
1509 msdemlei 5485 \label{sect:deployment}
1511 msdemlei 5552 This section is an adaptation of \citet{note:cooluris} and is
1512     intended to satisfy requirements~\ref{req:machine}
1513 msdemlei 5755 and~\ref{req:mtm}. It also briefly discusses how IVOA vocabularies
1514     should be referenced.
1515 msdemlei 5549
1516 msdemlei 5755 \subsection{Deploying Vocabularies}
1518 msdemlei 5548 All IVOA-approved vocabularies are accessible as children of
1519 msdemlei 5612 \url{http://www.ivoa.net/rdf}. Dereferencing that URI will lead to an
1520 msdemlei 5551 index of current approved and proposed vocabularies.
1521 msdemlei 5548 Vocabularies still under review are clearly marked as such.
When dereferencing a vocabulary URI, clients will receive an HTTP 303
(See Other) code, with the \texttt{Location} header set to the latest
version of the vocabulary. The version is written as the date of the
last update in the format YYYY-MM-DD. Depending on the value of the
request's \texttt{Accept} header, the redirect will end up at
1529     \begin{itemize}
\item an HTML rendition of the vocabulary by default. The HTML element
corresponding to a term has the term (i.e., the fragment identifier in the
term's URI) as its HTML id; hence a URI
\verb|<vocabulary URI>#<term>| will immediately focus the term's HTML
rendition in common
user agents [requirement~\ref{req:mtm}].
1536 msdemlei 5548
1537 msdemlei 5553 \item a Turtle rendition of the vocabulary if the accept header
1538 msdemlei 5548 indicates that \verb|text/turtle| documents are preferred.
1540 msdemlei 5752 \item an RDF/XML rendition of the vocabulary
1541 msdemlei 5612 if the accept header indicates that
1542 msdemlei 5752 \verb|application/rdf+xml| documents are preferred.
1543 msdemlei 5755
1544     \item an ad-hoc JSON rendition of the vocabulary as specified in
1545     sect.~\ref{sect:desise} if the accept header indicates that
1546 msdemlei 5824 \verb|application/x-desise+json| documents are preferred.
1547 msdemlei 5548 \end{itemize}
1549     Individual vocabularies may be available in additional formats.
1550     Content negotiation might then consider additional media types.
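
A client selects a rendition through the \texttt{Accept} request header.
The following sketch, which uses only Python's standard library, builds
such a request without sending it; the function name
\verb|vocabulary_request| is hypothetical. When the request is passed to
\verb|urllib.request.urlopen|, the 303 redirect is followed automatically.

\begin{lstlisting}[language=python]
import urllib.request

def vocabulary_request(vocab_uri,
        media_type="application/x-desise+json"):
    """Build a GET request asking for one rendition of a vocabulary."""
    return urllib.request.Request(
        vocab_uri, headers={"Accept": media_type})

req = vocabulary_request(
    "http://www.ivoa.net/rdf/voresource/relationship_type")
# To actually retrieve the vocabulary (needs network access):
# with urllib.request.urlopen(req) as response:
#     body = response.read()
\end{lstlisting}

Passing \verb|text/turtle| or \verb|application/rdf+xml| as the media type
requests the Turtle or RDF/XML rendition instead.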
1552     Clients may record the full versioned URI of the vocabulary used for
1553 msdemlei 5619 debug or provenance purposes. These URIs, however, MUST NOT be used
1554 msdemlei 5548 externally. In particular, a URI like
1555 msdemlei 5549 \url{http://www.ivoa.net/rdf/example/2019-07-14/example.html#term} has no
1556 msdemlei 5548 RDF meaning by this standard and must never be used in publicly visible
1557     RDF triples. Always use URIs of the form
1558 msdemlei 5549 \url{http://www.ivoa.net/rdf/example#term}.
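
A client that constructs term URIs could guard against accidentally
publishing versioned URIs along the following lines. This is only a sketch:
the helper \verb|term_uri| is hypothetical, and the check assumes that
versioned URIs can be recognised by a path segment in the YYYY-MM-DD format
described above.

\begin{lstlisting}[language=python]
import re

# A path segment of the form /YYYY-MM-DD/ marks a versioned URI.
_VERSION_SEGMENT = re.compile(r"/\d{4}-\d{2}-\d{2}(/|$)")

def term_uri(vocab_uri, term):
    """Return the unversioned URI of a term in a vocabulary."""
    if _VERSION_SEGMENT.search(vocab_uri):
        raise ValueError("versioned vocabulary URI: " + vocab_uri)
    return vocab_uri + "#" + term

uri = term_uri("http://www.ivoa.net/rdf/example", "term")
\end{lstlisting}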
1559 msdemlei 5548
1560 msdemlei 5755 \subsection{Referencing Vocabularies}
Since IVOA vocabularies, at least after some time, generally are a
collective effort with a continuous evolution, it is inappropriate to
cite them in the conventional author-year-title format.
1565 msdemlei 5755
1566     However, the vocabulary URI is intended to be stable and uniquely
1567     identifies the vocabulary as such. Hence, this URI is what should
1568     normally be cited. The standard style would be along the lines of
1569     \begin{lstlisting}[language={}]
1570     Terms in this field must be taken from the IVOA vocabulary
1571     \url{http://www.ivoa.net/rdf/voresource/content_level}.
1572     \end{lstlisting}
1573     or, in formats where footnotes are appropriate and inline URIs should be
1574     avoided for typographical reasons
1575     \begin{lstlisting}[language={}]
1576     Terms in this field must be taken from the IVOA vocabulary
1577     \emph{Content levels for VO resources}\footnote{
1578     \url{http://www.ivoa.net/rdf/voresource/content_level}}.
1579     \end{lstlisting}
1580     -- the footnote anchor should be the vocabulary name as given in the
1581     IVOA vocabulary repository\footnote{\url{http://www.ivoa.net/rdf}}.
Except in the rare cases in which version-sharp references are actually
necessary (for instance, descriptions of errors), it is inappropriate to
reference URLs with dates (e.g.,
\url{http://ivoa.net/rdf/voresource/content_level/2016-08-17/}). URIs
to actual resources (e.g., the XML or Turtle renditions) must never be
used to reference vocabularies.
1590     We do not see a relevant use case for having IVOA vocabularies formally
1591 msdemlei 5758 cited in reference sections of scholarly works: such references will not
1592 msdemlei 5755 aid in finding them, and there is no credible benefit in tracking their
1593     usage from citation in literature.
1596 msdemlei 5459 \appendix
1597 msdemlei 5549 \section{The 2019 IVOA Vocabulary Toolset (non-normative)}
1598 msdemlei 5485 \label{app:tools}
1600 msdemlei 5549 This appendix describes the recommended toolset for authoring IVOA
1601     vocabularies as of 2019. Vocabulary authors may decide to use other
1602     tools but should consider that that may incur additional work for the
1603 msdemlei 5553 chair of the Semantics WG in later maintenance.
1604 msdemlei 5549
1605     This appendix is non-normative. It will serve as documentation of the
1606     toolset and will occasionally be updated as the tooling evolves;
1607     vocabulary authors are still advised to inspect documentation within the
1608 msdemlei 5550 tools. Even major changes here will not lead to a new major version of
1609     the standard.
1610 msdemlei 5549
1611 msdemlei 5550
1612 msdemlei 5549 \subsection{Input Format}
1614 msdemlei 5553 In the current tooling, RDF class and property
1615     vocabularies are authored in simple CSV files
1616 msdemlei 5549 with five columns. These columns are:
1618     \begin{description}
1619     \item[term]
1620 msdemlei 5551 This is the actual, machine-readable vocabulary term. Only use
1621 msdemlei 5549 letters, digits, underscores, and dashes here. As specified in
1622 msdemlei 5619 sect.~\ref{sect:voccontent}, these identifiers should be
1623 msdemlei 5549 human-readable, even though they are not directly intended for human
consumption (clients will use the label). In the interest of
reasonably compact URIs, we advise keeping the length of the
terms below, say, 30 characters.
1627 msdemlei 5549 \item[level]
1628     This is used for simple input of wider/narrower relationships.
It is 1 for ``root'' terms. Terms with a level of 2 that follow a
root term become its children, i.e., the tooling will add the
appropriate wider relationship between the level 2 and the level 1
term. You can nest, i.e., have
terms of level 3 below terms of level 2. Note that this means the
order of rows must be preserved in the CSV files: Do \emph{not} sort
vocabulary CSVs.
1636 msdemlei 5549 \item[label]
1637     This is a short, human-readable label for the term. In the VO, this
1638 msdemlei 5758 is generally derived fairly directly from the content of the first
1639     column, usually by
1640 msdemlei 5549 inserting blanks at the right places and fixing capitalisation.
1641     \item[description]
1642     This is a longer explanation of what the term means. We do not
1643     support any markup here, not even paragraphs, so there is probably a
1644 msdemlei 5553 limit to how much can be communicated.
1645 msdemlei 5549 \item[more\_relations]
1646 msdemlei 5758 This column can be used to declare non-hierarchical relationships
1647 msdemlei 5549 and contains whitespace-separated declarations. Each declaration has
1648     the form property[(term)]. Omitting the term is allowed for certain
1649     properties; in RDF, this corresponds to a blank node. See below for
1650 msdemlei 5612 the common properties supported here. Plain terms are resolved
1651 msdemlei 5549 within the vocabulary, but CURIEs with known prefixes or full URIs are
1652     admitted, too.
1653     \end{description}
Non-ASCII characters are allowed in label and description; files must be
encoded in UTF-8. The column separator currently is required to be a
semicolon in order to save on escaping within descriptions (which very
commonly contain commas). Fields that contain semicolons are escaped
with double quotes; embedded double quotes are doubled.
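
As an illustration of these conventions, the following sketch reads such a
file with Python's standard \verb|csv| module and derives each term's
wider term from the level column. It is not the actual IVOA tooling: the
function name \verb|read_terms| and the sample rows are hypothetical, and
the sketch assumes that the file has neither header nor comment lines and
that levels never increase by more than one from one row to the next.

\begin{lstlisting}[language=python]
import csv
import io

def read_terms(csv_file):
    """Yield (term, wider, label, description, relations) per row;
    wider is the enclosing term one level up, None for root terms."""
    ancestors = []  # ancestors[i] is the latest term at level i+1
    reader = csv.reader(csv_file, delimiter=";", quotechar='"')
    for term, level, label, description, more_relations in reader:
        # Drop everything at this level or deeper.
        del ancestors[int(level) - 1:]
        wider = ancestors[-1] if ancestors else None
        ancestors.append(term)
        # more_relations holds whitespace-separated declarations.
        yield term, wider, label, description, more_relations.split()

sample = io.StringIO(
    'EQUATORIAL;1;Equatorial;Umbrella term for equatorial frames.;\n'
    'ICRS;2;ICRS;As defined by 1998AJ....116..516M.;\n'
    'BD;2;Bonner Durchmusterung System;"The system implied by BD; CD";'
    'ivoasem:preliminary\n')
rows = list(read_terms(sample))
\end{lstlisting}

Because the hierarchy is encoded in the order of rows, the sketch keeps a
stack of ancestors rather than looking terms up by name; this is also why
sorting the file would corrupt the vocabulary.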
1660 msdemlei 5549
1661 msdemlei 5776 The following properties are supported in the more\_relations
1662 msdemlei 5549 column:
1664     \begin{itemize}
1665 msdemlei 5553 \item \vocterm{ivoasem:deprecated} -- see sect.~\ref{sect:genprop}.
1666     \item \vocterm{ivoasem:useInstead} -- see sect.~\ref{sect:genprop}.
1667     \item \vocterm{ivoasem:preliminary} -- see sect.~\ref{sect:genprop}.
1668 msdemlei 5549 \end{itemize}
1670     \subsection{Vocabulary Metadata}
1671     \label{sect:vocmeta}
Global vocabulary metadata is kept in an INI-style format. The following
keys are understood:
1676     \begin{description}
1677     \item[timestamp]
1678     A manually maintained date of the last modification. This is
1679     essentially a version marker and should be changed only in preparation
1680 msdemlei 5612 for a release. It is recommended to set it to the intended release
1681 msdemlei 5549 date during development and not change it for every edit.
1682     \item[title]
1683     A human-readable short phrase saying what the vocabulary describes.
1684 msdemlei 5800 \item[flavour]
1685 msdemlei 5612 One of \textit{RDF Class}, \textit{RDF Property}, or \textit{SKOS}
1686     (where SKOS currently expects RDF/XML serialised SKOS rather than CSV).
1687 msdemlei 5549 \item[description]
1688     A longer text (about a paragraph) stating what the vocabulary should
1689 msdemlei 5567 be used for. No markup is supported here.
1690 msdemlei 5549 \item[authors]
1691 msdemlei 5612 Persons involved with the creation of the vocabulary. These are \emph{not}
1692 msdemlei 5549 the persons to ask for maintenance; all requests for changes should be
1693     directed to the Semantics working group first.
1694     \item[filename]
1695 msdemlei 5612 The tooling expects the input at
1696 msdemlei 5758 \verb|<vocabulary name>/terms.csv|. If it is kept elsewhere, give
1697 msdemlei 5551 the source file name here. This is to support legacy
1698 msdemlei 5612 vocabularies with nonstandard names and native SKOS input.
1699 msdemlei 5549 \item[draft]
1700     While a vocabulary is still being reviewed in its entirety, add a key
1701     draft set to \texttt{True}. This will add language to the effect that
1702     terms may still vanish from the vocabulary and mark all terms as
1703     preliminary. Once the vocabulary is approved, this key is deleted.
1704 msdemlei 5789 \item[licenseuri]
1705     IVOA-managed vocabularies are always made available under CC-0 and
1706     hence do not use this key. External vocabularies as per
1707 msdemlei 5805 sect.~\ref{sect:externally-managed} may be subject to actual licences,
1708     in which case this field holds a URI containing the licence's
1709 msdemlei 5789 conditions.
\item[licensehtml]
1711     This is arbitrary HTML expressing whatever licence terms may be
1712 msdemlei 5911 attached to an external vocabulary. Again, do not use for IVOA
1713 msdemlei 5813 vocabularies.
1714 msdemlei 5549 \end{description}
1716     Currently, the global metadata is maintained in a file
1717 msdemlei 5758 \verb|vocabs.conf| in the root of the vocabulary source repository, with one
1718 msdemlei 5553 section per vocabulary. The section name is the vocabulary name.
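
To illustrate, the following sketch parses a made-up section with Python's
standard \verb|configparser| module. The section name, all values, and the
use of colons as key-value separators are assumptions for the purpose of
the example; consult the actual \verb|vocabs.conf| for the syntax the
tooling expects.

\begin{lstlisting}[language=python]
import configparser

# Hypothetical section; all values are made up for illustration.
SAMPLE = """
[example]
timestamp: 2021-01-14
title: Example reference frames
flavour: RDF Class
description: A contrived vocabulary of reference frames.
authors: Doe, J.
draft: True
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
for name in parser.sections():
    meta = parser[name]
    # Without a filename key, input is <vocabulary name>/terms.csv.
    source = meta.get("filename", name + "/terms.csv")
    is_draft = meta.getboolean("draft", fallback=False)
\end{lstlisting}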
1719 msdemlei 5549
1720 msdemlei 5758 \subsection{Vocabulary Source Repository}
1721 msdemlei 5549
1722     Vocabulary authors are encouraged to maintain their vocabularies in the
1723     shared version control system of the IVOA. At the time of writing, this
1724     is a subversion repository at
1725 msdemlei 5620 \url{https://volute.g-vo.org/svn/trunk/projects/semantics/voc-source}.
1726 msdemlei 5549
1727     Authors of new vocabularies should create a child directory and place
1728     their terms.csv file in there. They should then edit \verb|vocabs.conf|
1729     and add a section named after their directory with the content discussed
1730     in sect.~\ref{sect:vocmeta}.
1732 msdemlei 5610
1733 msdemlei 5550 \section{Current Network Resources (non-normative)}
1734 msdemlei 5551 \label{app:curtech}
1735 msdemlei 5550
1736     This appendix details network resources used in vocabulary management.
1737     It is non-normative and will occasionally be updated as the IVOA's
1738     infrastructure evolves. Even major changes here will not lead to a new
1739     major version of the standard.
1741     The list of vocabulary enhancement proposals is maintained in the IVOA's
1742     wiki at
1743     \url{https://wiki.ivoa.net/twiki/bin/view/IVOA/WebHome?topic=VEPs}.
1744 msdemlei 5553 Approved VEPs will be moved to an archive page linked there.
1745 msdemlei 5550 VEPs may be added as attachments to this page, but authors are
1746     encouraged to maintain them in version controlled repositories instead.
1747     The recommended place to do that is
1748     \url{https://volute.g-vo.org/svn/trunk/projects/semantics/veps}.
1750     The discussion of VEPs (see sect.~\ref{sect:approval}) is to take place
1751     on the appropriate mailing list(s). See
1752 msdemlei 5553 \url{http://ivoa.net/members/index.html} for a directory of IVOA mailing
1753 msdemlei 5550 lists and their addresses.
1755 msdemlei 5754 \section{An Example of a Vocabulary in Desise (non-normative)}
1756     \label{app:desiseexample}
1758 mbt 5798 The following example shows what a vocabulary in desise looks like. The
1759 msdemlei 5754 content is, superficial similarities to real vocabularies
1760     notwithstanding, contrived.
1762     \begin{lstlisting}[language=python]
1763     {
1764     "uri": "http://www.ivoa.net/rdf/example",
1765     "flavour": "RDF Class",
1766     "terms": {
1767 msdemlei 5788 "EQUATORIAL": {
1768     "label": "Equatorial",
1769 msdemlei 5824 "description": "Umbrella term for all sorts of equatorial frames.",
1770 msdemlei 5828 "narrower": ["ICRS", "ICRS2", "BD", "B1875.0"], "wider": []
1771 msdemlei 5788 },
1772     "ICRS": {
1773     "label": "ICRS",
1774 msdemlei 5824 "description": "As defined by 1998AJ....116..516M.",
1775 msdemlei 5828 "wider": ["EQUATORIAL"], "narrower": []
1776 msdemlei 5788 },
1777     "B1875.0": {
1778     "label": "Bonner Durchmusterung System",
1779 msdemlei 5824 "description": "Deprecated term for the reference system implied by BD/CD",
1780 msdemlei 5828 "deprecated": "",
1781     "wider": ["EQUATORIAL"], "narrower": []
1782 msdemlei 5788 },
1783     "BD": {
1784     "label": "Bonner Durchmusterung System",
1785     "description": "The reference system implied by BD/CD",
1786 msdemlei 5828 "wider": ["EQUATORIAL"], "narrower": []
1787 msdemlei 5788 },
1788     "ICRS2": {
1789     "label": "ICRS 2",
1790 msdemlei 5824 "description": "The reference system defined by 2027A&A..1234...12B",
1791     "preliminary": "",
1792 msdemlei 5828 "wider": ["EQUATORIAL"], "narrower": []
1793 msdemlei 5788 }
1794 msdemlei 5754 }
1795     }
1796     \end{lstlisting}
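The following is a minimal sketch of how a client might consume such a
document using only the Python standard library. It is an illustration
under stated assumptions, not part of this specification: the names
\verb|VOC|, \verb|is_deprecated|, and \verb|all_wider| are hypothetical,
the embedded vocabulary is a cut-down copy of the contrived example
above, and the sketch assumes, as the example suggests, that the mere
presence of the \verb|deprecated| key marks a term as deprecated. The
traversal does not rely on whether the \verb|wider| lists are already
transitively closed.

\begin{lstlisting}[language=python]
import json

# Cut-down copy of the contrived example vocabulary (illustration only).
VOC = json.loads("""
{
  "uri": "http://www.ivoa.net/rdf/example",
  "flavour": "RDF Class",
  "terms": {
    "EQUATORIAL": {
      "label": "Equatorial",
      "description": "Umbrella term for all sorts of equatorial frames.",
      "narrower": ["ICRS", "B1875.0"], "wider": []
    },
    "ICRS": {
      "label": "ICRS",
      "description": "As defined by 1998AJ....116..516M.",
      "wider": ["EQUATORIAL"], "narrower": []
    },
    "B1875.0": {
      "label": "Bonner Durchmusterung System",
      "description": "Deprecated term for the system implied by BD/CD",
      "deprecated": "",
      "wider": ["EQUATORIAL"], "narrower": []
    }
  }
}
""")


def is_deprecated(voc, term):
    """Return True if term carries a deprecated key (assumption: the
    presence of the key, not its value, is what counts)."""
    return "deprecated" in voc["terms"][term]


def all_wider(voc, term):
    """Return the set of all terms reachable from term via wider links."""
    seen = set()
    todo = list(voc["terms"][term]["wider"])
    while todo:
        cur = todo.pop()
        if cur not in seen:
            seen.add(cur)
            todo.extend(voc["terms"][cur]["wider"])
    return seen


if __name__ == "__main__":
    for name in sorted(VOC["terms"]):
        print(name, sorted(all_wider(VOC, name)), is_deprecated(VOC, name))
\end{lstlisting}

A client that has retrieved a real vocabulary would replace the embedded
string with the document it obtained; the two helper functions only rely
on the \verb|terms|, \verb|wider|, and \verb|deprecated| keys shown in
the example.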
1798 msdemlei 5459 \section{Changes from Previous Versions}
1800 msdemlei 5922 \subsection{Changes from WD-2020-06-12}
1802     \begin{itemize}
1803     \item No changes to normative material.
1804     \item Adding a use case on vocabulary evolution and on VO-DML.
1805     \item Various editorial changes.
1806     \end{itemize}
1808 msdemlei 5789 \subsection{Changes from WD-2020-03-26}
1810     \begin{itemize}
1811     \item Desise term values are now dicts with label and description to
1812 msdemlei 5824 make it a bit more self-explanatory; this let us pull in preliminary,
1813     deprecated, and wider as well.
1814 msdemlei 5828 \item Desise now also contains narrower, the inverse of wider; the
1815 msdemlei 5824 meanings of both differ considerably between SKOS and the other flavours.
1816     \item The main media type for Desise is now application/x-desise+json rather
1817     than text/json because there is no text/json, and you can't have
1818     content media type parameters on either.
1819 msdemlei 5813 \item Mentioning licenseuri and licensehtml in the non-normative part on
1820 msdemlei 5828 managing vocabulary metadata. Also stating there that IVOA-managed
1821 msdemlei 5789 vocabularies are CC-0.
1822     \end{itemize}
1825 msdemlei 5661 \subsection{Changes from WD-2019-09-05}
1826 msdemlei 5459
1827 msdemlei 5600 \begin{itemize}
1828 msdemlei 5755 \item We no longer recommend that non-RDF clients use RDF/XML. We have
1829 msdemlei 5752 therefore removed the ``usage with plain XML tooling'' sections. We
1830     have also removed the description of the revovo python module from the
1831     toolset appendix.
1833 msdemlei 5755 \item Instead, we now have the custom ``desise'' format described in a
1834     new section that doubles as a very quick introduction for adopters not
1835     interested in RDF.
1837 msdemlei 5752 \item Adding a use case and requirement for the UAT (and, perhaps,
1838 msdemlei 5758 similar externally curated vocabularies). Adding a section on how
1839     such vocabularies may be integrated into the IVOA RDF repository.
1840 msdemlei 5752
1841 msdemlei 5704 \item Now additionally requiring a \emph{Used-in} item in VEPs, implying
1842     that only terms that are already applied may be proposed.
1844     \item Adding \emph{Supercedes} and \emph{Superceded-by} items,
1845     formalising the previous language on ``splitting'' VEPs a bit.
1847 msdemlei 5755 \item Adding advice on referencing vocabularies.
1849 msdemlei 5754 \item We now demand a formal validation of VEPs by the semantics chair.
1850 msdemlei 5705 The responsibility for ``uploading'' the VEP, i.e., adding it to the VEP
1851     index, is now assigned to them.
1852 msdemlei 5756
1853     \item Adding a soapbox section with advice on what to do when proposing
1854 msdemlei 5758 new terms and introducing a naive semantics model.
1855 msdemlei 5600 \end{itemize}
1857 msdemlei 5553 \bibliography{local.bib,ivoatex/ivoabib,ivoatex/docrepo}
1858 msdemlei 5459
1860     \end{document}
