Contents of /trunk/projects/semantics/Vocabularies/Vocabularies.tex

Revision 5922 - (show annotations)
Thu Jan 14 08:12:17 2021 UTC (3 months ago) by msdemlei
File MIME type: application/x-tex
File size: 80899 byte(s)
Vocabularies: Misc updates prior to PR

* enabling hyphens within prefixed vocterms
* various editorial updates

1 \documentclass[11pt,a4paper]{ivoa}
2 \input tthdefs
3
4 \usepackage{todonotes}
5 \lstloadlanguages{XML,python}
6 \lstset{flexiblecolumns=true,tagstyle=\ttfamily, showstringspaces=False,
7 basicstyle=\footnotesize}
8
9 \definecolor{termcolor}{rgb}{0.6,0.1,0.1}
10
11 \iftth
12 \def\vocterm#1{\emph{\color{termcolor}#1}}
13
14 \else
15 \def\vocterm{\startvocterm\realvocterm}
16 \def\realvocterm#1{\emph{\color{termcolor}#1}\endvocterm}
17 \begingroup
18 \gdef\breakablecolon{:\hskip0pt}
19 \catcode`\:=\active
20 \gdef\startvocterm{\begingroup
21 \catcode`\:=\active\let:=\breakablecolon}
22 \gdef\endvocterm{\endgroup}
23 \endgroup
24 \fi
25
26
27 \newcommand{\vepitem}[1]{\emph{#1}}
28
29 \title{Vocabularies in the VO}
30
31 % see ivoatexDoc for what group names to use here
32 \ivoagroup{Semantics}
33
34 \author[https://wiki.ivoa.net/twiki/bin/view/IVOA/MarkusDemleitner]{Markus
35 Demleitner}
36 \author[https://wiki.ivoa.net/twiki/bin/view/IVOA/NormanGray]{Norman
37 Gray}
38 \author[https://wiki.ivoa.net/twiki/bin/view/IVOA/MarkTaylor]{Mark
39 Taylor}
40
41 \editor{Markus Demleitner}
42
43 \previousversion[https://ivoa.net/documents/Vocabularies/20200612/]
44 {WD-20200612}
45 \previousversion[https://ivoa.net/documents/Vocabularies/20200326/]
46 {WD-20200326}
47 \previousversion[http://ivoa.net/documents/Vocabularies/20190905/]
48 {WD-20190905}
49
50
51 \begin{document}
52 \begin{abstract}
53 In this document, we discuss practices related to the use of RDF-based
54 consensus vocabularies in the Virtual Observatory, that is, the creation,
55 publication, maintenance, and consumption of
56 hierarchical word lists agreed upon within the IVOA.
57 To cover the wide range of use cases envisioned, we define three flavours
58 of such vocabularies: SKOS for informal knowledge organisation on the
59 one hand, and strict hierarchies of classes and properties on the other.
60 While the framework rests on the solid foundations of W3C RDF,
61 provisions are made to facilitate using IVOA vocabularies without
62 specific RDF tooling.
63 Non-normative appendices detail the current vocabulary-related tooling.
64 \end{abstract}
65
66
67 \section*{Acknowledgments}
68
69 While this is a complete rewrite of the specification of how vocabularies
70 are treated in the VO, we gratefully acknowledge the groundbreaking work
71 of the authors of version 1 of Vocabularies in the VO, S\'ebastien
72 Derriere, Alasdair Gray, Norman Gray, Frederic Hessmann, Tony Linde,
73 Andrea Preite Martinez, Rob Seaman, and Brian Thomas.
74
75 In particular, the vocabulary for datalink semantics done by Norman Gray
76 was formative for many aspects of what is specified here.
77
78 \section*{Conformance-related definitions}
79
80 The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
81 ``OPTIONAL'' (in upper or lower case) used in this document are to be
82 interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
83
84 The \emph{Virtual Observatory (VO)} is a
85 general term for a collection of federated resources that can be used
86 to conduct astronomical research, education, and outreach.
87 The \href{http://www.ivoa.net}{International
88 Virtual Observatory Alliance (IVOA)} is a global
89 collaboration of separately funded projects to develop standards and
90 infrastructure that enable VO applications.
91
92 \section{Introduction}
93
94 The W3C's Resource Description Framework RDF \citep{note:rdfprimer} is a powerful
95 and very generic means to represent, transmit, and reason on highly
96 structured, ``semantic'' information. With both its power and
97 generality, however, comes a high complexity for consumers of this
98 information if no further conventions are in force. Also, the generic
99 W3C standards understandably do not cover how semantic resources (e.g.,
100 vocabularies or ontologies) are to be managed, let alone developed
101 within organisations like the IVOA.
102
103 While for many applications even within the VO, the significant
104 complexity and the lack of defined management processes are acceptable,
105 for several other use cases -- in particular those given in
106 sect.~\ref{sect:usecases} --, having extra conventions greatly
107 helps implementability and interoperability.
108
109 Based on requirements derived from these use cases
110 (sect.~\ref{sect:requirements}), this standard will therefore define
111 conventions for
112 vocabularies based on either SKOS or RDFS in
113 sect.~\ref{sect:voccontent}. Where these vocabularies -- and hence, in
114 particular, the permanent URIs of their RDF resources (``terms'')
115 -- are managed by the
116 IVOA, they need to be reviewed, and consensus needs to be reached. A process to
117 ensure this is described in
118 sect.~\ref{sect:management}. In order
119 to provide certain guarantees to clients, sect.~\ref{sect:deployment}
120 defines minimal standards for how IVOA-managed vocabularies must be made
121 available. To help adopters merely looking for simple
122 vocabulary-related recipes, sect.~\ref{sect:withoutrdf} discusses how IVOA
123 vocabularies can be used without knowledge of RDF.
124
125 The non-normative appendices~\ref{app:tools} and \ref{app:curtech}
126 describe the tooling
127 currently used or recommended for building and managing vocabularies in the
128 IVOA.
129
130
131 \subsection{Role within the VO Architecture}
132
133 \begin{figure}
134 \centering
135
136 \includegraphics[width=0.9\textwidth]{role_diagram.pdf}
137 \caption{Architecture diagram for this document}
138 \label{fig:archdiag}
139 \end{figure}
140
141 Fig.~\ref{fig:archdiag} shows the role the Vocabularies in VO standard
142 plays within the IVOA architecture \citep{2010ivoa.rept.1123A}.
143
144 This standard defines a set of conventions and procedures on
145 top of several W3C standards that can be adopted by other VO standards
146 that require interoperable, consensus vocabularies, such as:
147
148 \begin{bigdescription}
149 \item[Datalink \citep{2015ivoa.spec.0617D}] Datalink includes a
150 vocabulary letting clients work out the kind of artefact a row pertains
151 to.
152
153 \item[VOResource \citep{2018ivoa.spec.0625P}] VOResource 1.1 comes with
154 several (rather flat) vocabularies enumerating, for instance, the types
155 of relationships between VO resources, their intended audiences, or
156 classes of actions performed on them.
157
158 \item[VOEvent \citep{2006ivoa.spec.1101S}] VOEvent defines \emph{Why}
159 and \emph{What} elements which, while not formally required to be drawn
160 from a specific vocabulary in version 1.11, certainly become much more
161 useful if they are.
162
163 \item[VOTable \citep{2019ivoa.spec.1021O}] VOTable, in its version 1.4,
164 introduces vocabularies for time scales and reference positions.
165
166
167 \item[UCDs \citep{2007ivoa.spec.0402M}] UCDs are related to vocabularies in
168 that they provide machine-readable semantics. Because the terms listed
169 in the document can be combined and have an underlying grammar, however,
170 they go beyond standard RDF.
171 \end{bigdescription}
172
173 Other VO standards can do with fewer normative constraints; using W3C
174 standards without the extra requirements laid down here is explicitly
175 encouraged where the use cases do not require the extra management and
176 definition effort, or where perhaps more complex structures (e.g., full
177 ontologies) must be employed. An example of a direct use of SKOS
178 without adoption of the present document is the Simulation Data Model
179 SimDM \citep{2012ivoa.spec.0503L}, where the values of several fields
180 are required to be \vocterm{skos:narrower} than certain top-level
181 concepts, but no further restrictions on the vocabularies need to be
182 imposed.
183
184 \subsection{Relationship to Vocabularies in the VO Version 1}
185
186 Published in 2009, version 1.19 of the IVOA Recommendation on
187 Vocabularies in the VO had an outlook fairly different from the present
188 document: The big use case was VOEvent's Why and What, and so its focus
189 was on large, general-purpose vocabularies, of which several existed even
190 back then, while an overhaul of a thesaurus of general astronomical
191 terms approved by the IAU in 1993 was underway as part of IVOA's
192 activities. Mapping between vocabularies maintained by different VO
193 and non-VO parties seemed to be the way to ensure interoperability and
194 therefore played a large role in the document. Also, the use cases
195 called for ``soft'' relations, which is why the standard confined itself
196 to SKOS as the vocabulary formalism.
197
198 Since then, ``the'' large astronomy thesaurus is being maintained
199 outside of the IVOA (the UAT\footnote{\url{http://astrothesaurus.org}}),
200 and there is hope that its takeup will be sufficient to make mapping
201 between it and, say, legacy journal keyword systems an exercise general
202 clients will not have to perform.
203
204 Instead, in 2010, a fairly formal vocabulary of what
205 should be properties (in the RDF sense) rather than \vocterm{skos:Concept}-s
206 was required during the development of the datalink standard. The
207 vocabulary was (and still is) small in comparison to, say, the UAT. In
208 contrast to the expectations of Vocabularies~1, the plan had been that
209 most data providers would work with this small vocabulary, and terms
210 from external vocabularies would only be used as temporary stand-ins
211 until the consensus vocabulary was updated. Of course, this required a
212 process for managing such vocabularies. The lack of such a process
213 became even more noticeable when VOResource 1.1 and VOTable 1.4
214 introduced vocabularies of their own similar in size and scope to the
215 datalink vocabulary.
216
217 On the other hand, we are not aware of a single attempt to map
218 between different vocabularies in a VO context, and the SKOS versions of
219 some vocabularies that Vocabularies 1 declared as normative in its
220 section~4 were largely unused and have been unmaintained for a while now.
221
222 Since large parts of the original specification turned out to be
223 irrelevant or unsustainable as the VO ecosystem evolved,
224 while some core requirements found later
225 were not addressed, it was decided to prepare a new major version of the
226 Vocabularies in the VO standard.
227
228 \subsection{Reading Guide}
229
230 We hope that software authors or annotators just wanting to consume IVOA
231 vocabularies or use them to annotate documents will be able to
232 do so after reading just section~\ref{sect:withoutrdf}. In particular, no
233 deeper understanding of RDF should be necessary.
234
235 Persons intending to participate in vocabulary evolution should skim
236 sect.~\ref{sect:voccontent}, in particular the subsection on the kind of
237 vocabulary they want to modify, and must study
238 sect.~\ref{sect:management}.
239
240 Readers unfamiliar with RDF should read \citet{local:normanspaper} before
241 reading anything outside of section~\ref{sect:withoutrdf}.
242 In particular, we assume familiarity with all RDF
243 terminology discussed there. Concepts not covered by Gray's
244 essay will be informally introduced here. Of course, the
245 underlying W3C standards are normative where applicable.
246
247
248
249 \subsection{Terminology, Conventions, Typography}
250
251 When we speak of a \emph{term} here, that means either a \vocterm{skos:Concept}
252 in SKOS vocabularies, an \vocterm{rdfs:Class} in RDF class vocabularies,
253 or an \vocterm{rdf:Property} in RDF property vocabularies. We also use
254 \emph{term} for ``the string after the hash character in
255 the RDF resource URI'', i.e., the machine-readable string typically used
256 in annotation. It is rarely necessary to distinguish between the two
257 meanings.
258
259 We refer to classes and properties by CURIEs. The prefixes in this
260 document correspond to the following URIs:
261
262 \begin{compactitem}
263 \item dc -- \url{http://purl.org/dc/terms/}
264 \item rdf -- \url{http://www.w3.org/1999/02/22-rdf-syntax-ns#}
265 \item rdfs -- \url{http://www.w3.org/2000/01/rdf-schema#}
266 \item owl -- \url{http://www.w3.org/2002/07/owl#}
267 \item skos -- \url{http://www.w3.org/2004/02/skos/core#}
268 \item ivoasem -- \url{http://www.ivoa.net/rdf/ivoasem#}
269 \end{compactitem}
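
As an illustration of how these prefixes are used, the following minimal
Python sketch expands a CURIE into a full URI. The prefix map is copied
from the list above; the function name \texttt{expand\_curie} is a
hypothetical helper introduced here and is not part of any IVOA tool.

```python
# Prefix-to-namespace map, copied from the list of prefixes in this section.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ivoasem": "http://www.ivoa.net/rdf/ivoasem#",
}


def expand_curie(curie):
    """Return the full URI for a CURIE such as 'skos:Concept'.

    Hypothetical helper for illustration only.
    """
    # Split at the first colon: the part before it is the prefix,
    # the part after it is appended to the namespace URI.
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local
```

For instance, \texttt{expand\_curie("rdfs:Class")} concatenates the rdfs
namespace URI with the local name \texttt{Class}.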
270
271 Vocabulary terms are written in italics (e.g., \vocterm{rdfs:Class})
272 and, where supported, in a reddish hue. As common in IVOA
273 specifications, XML element and attribute names are written in
274 typewriter italic (e.g., \xmlel{img}).
275
276 \section{Derivation of Requirements (Non-Normative)}
277
278 \subsection{Use Cases}
279 \label{sect:usecases}
280
281 The normative content of this document is guided by a set of
282 requirements derived from the following use cases.
283
284 \subsubsection{Controlled Vocabulary in VOResource}
285 \label{uc:simplevoc}
286
287 In VOResource, in certain use cases clients have to find services that
288 publish a given data collection. This is effected by linking the resource
289 records for service and data with a
290 DataCite-compatible \vocterm{isServedBy} relationship.
291 Its concrete literal needs to be reliably defined in order to let
292 clients find such relationships by a simple string comparison in RegTAP
293 queries.
294
295 A related use case is that validators can flag errors (or at least
296 warnings) when resource records use terms that are not part of some
297 controlled vocabulary (e.g., content levels or types of events in a
298 resource's history). Very typically, such out-of-vocabulary terms
299 indicate small oversights on the part of the resource record author that
300 will lead to hard-to-debug problems in data discovery.
301
302 \subsubsection{Controlled Vocabularies in VOTable}
303 \label{uc:votvoc}
304
305 VOTable 1.4 constrains two attributes of the TIMESYS elements
306 -- reference positions and time
307 scales -- using vocabularies.
308 While with time scales the situation is not fundamentally
309 different from the VOResource case discussed in
310 use case~\ref{uc:simplevoc} -- a simple enumeration of agreed-upon strings
311 is enough to uniquely determine what operations need to be performed to
312 combine times given in different time scales --, the situation for
313 reference positions is probably different. There, even if a client does
314 not exactly know the location of, say, the Hubble Space Telescope at any
315 given time, several important use cases can already be satisfied if a
316 client knows that it is in low Earth orbit (e.g., assuming a reference
317 position Geocenter and adjusting the systematic error estimates). For
318 this, a client needs information of the type ``\vocterm{HST}
319 \vocterm{is-close-to} \vocterm{GEOCENTER\/}'' (or similar).
320
321 There is also another difference between this and at least the
322 VOResource relationship vocabulary from use case~\ref{uc:simplevoc}
323 in that the latter is property-like, as
324 in ``Resource-1 \vocterm{isServedBy} Resource-2\/''. In contrast with
325 this, a time scale would be used like ``Time-coordinate
326 \vocterm{is-given-in}
327 \vocterm{TT\/}''. In RDFS terminology, time scales are therefore better modelled
328 as classes rather than properties.
329
330 \subsubsection{Datalink Link Selection}
331 \label{uc:links}
332
333 In Datalink, clients receive a set of links
334 to pieces of information (e.g., previews, additional metadata,
335 progenitors, or
336 derived data) and need to present to the user only those items
337 relevant to the task at hand. For instance, in a discovery phase, only
338 previews should be offered, while scientific exploitation would call for
339 cutout services, alternate formats, or derived data. For debugging,
340 progenitors should be made accessible, and so on.
341
342 Operators of datalink services, on the other hand, want to be precise in
343 their annotation of datasets. For instance, they may want to discern
344 among progenitors the raw image, a dark frame, and a flat field. In all
345 these cases, clients should still be able to work out that such
346 artefacts are progenitors.
347
348 \subsubsection{VOEvent Filtering, Query Expansion}
349 \label{uc:filtering}
350
351 In VOEvent, an event stream can contain a classification of what the
352 observers believe was observed, for instance ``supernova Ia explosion''.
353 While an event stream from one project might provide a classification on
354 that level for some event, it might not (yet) be able to do that in
355 another event, and a different event stream might not be able to
356 distinguish between different sorts of supernovae at all.
357
358 In this situation, an event broker looking for supernovae of type Ia
359 will filter out anything not related to supernovae; however, since for
360 one reason or another a Ia supernova might only be tagged as supernova,
361 it will want to widen its filter somewhat, where some backend process
362 might prioritise events classified as Ia upstream over those only tagged
363 as a generic supernova, and those, again, over those tagged explicitly
364 as some different type of supernova.
365
366 Similar use cases exist, for instance, in the discovery of simulations
367 and possibly for subjects of VO resources.
368
369
370 \subsubsection{Vocabulary Updates in VOResource}
371 \label{uc:deprecation}
372
373 In VOResource 1.0, relationship types like \vocterm{served-by} or
374 \vocterm{service-for} were defined. Later, DataCite defined equivalent
375 terms \vocterm{IsServedBy} and \vocterm{IsServiceFor}. Arguably, the VO should,
376 as far as sensible, take up standards in the wider data management
377 community, and so VOResource 1.1 adopts the DataCite terms. In a minor
378 version, it cannot forbid the old terms. It can, however, say not only
379 ``\vocterm{served-by\/} is the same as \vocterm{isServedBy\/}'' but also
380 ``Use the latter term in preference to the former''. If this information is
381 available machine-readably, validators can warn against the use of
382 deprecated terms and user interfaces can transparently replace
383 deprecated terms with current ones. This latter use case is
384 already specified in RegTAP 1.1 \citep{2019ivoa.spec.1011D}.
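
What a validator or user interface might do with machine-readable
deprecation information can be sketched as follows. This is only an
illustration: the mapping \texttt{REPLACEMENTS} is a hypothetical,
hand-written stand-in for information that would in practice be read
from the vocabulary, and \texttt{modernise} is a name made up for this
example.

```python
import warnings

# Hypothetical excerpt: deprecated term -> term to use in its place.
# A real client would obtain this from the published vocabulary.
REPLACEMENTS = {
    "served-by": "isServedBy",
}


def modernise(term):
    """Return the current form of term, warning if term is deprecated."""
    replacement = REPLACEMENTS.get(term)
    if replacement is None:
        # Not known to be deprecated: pass the term through unchanged.
        return term
    warnings.warn(
        f"{term!r} is deprecated; use {replacement!r}", DeprecationWarning
    )
    return replacement
```

A validator would only report the warning, while a user interface could
silently use the returned replacement.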
385
386 Another use case in the context of VOResource and vocabulary updating
387 is the definition of content levels. In VOResource 1.0, a list of
388 terms was adopted that was far too fine-grained in the area of public
389 outreach, distinguishing, for instance, ``Middle School'' from
390 ``Secondary Education''; while this granularity was useful for the
391 original realm of the list of terms, in the VO it resulted in extremely
392 inhomogeneous annotation. Obviously, persons employed in research
393 institutions can hardly be expected to assess needs and capabilities of
394 middle school versus elementary school educators. Eventually, for
395 VOResource 1.1 a three-term list was drawn up and is now actually used.
396 To avoid a repetition of such an experience, we want to enable small
397 initial vocabularies easily extendable as new terms are actually needed
398 and the use of the existing terms is well understood.
399
400
401 \subsubsection{Vocabularies in VO-DML}
402
403 The modelling language VO-DML \citep{2018ivoa.spec.0910L} lets model
404 designers constrain attribute values through external resources defined
405 through a vocabulary URI and possibly a top concept. The standard
406 mentions both SKOS -- inspired by version 1 of this document -- and RDFS
407 as possible technologies for such constraints.
408
409 Depending on the nature of the attributes constrained, modellers might
410 foresee the need for having these vocabularies managed by the IVOA. Of
411 course, that is up to the modeller: There are certainly many cases in
412 which there is no need for the overhead this specification brings with
413 it, be it because vocabularies are externally defined or because the
414 concrete application profits from less-constrained vocabularies.
415
416 \subsubsection{Discovering Meanings}
417 \label{uc:discovering}
418
419 Software developers or researchers want to work out
420 what some term mentioned ``means'' (where we are agnostic as to what
421 ``means'' should mean here). If the term URI alone is insufficient,
422 they can simply paste the resource URI of the term into a web browser
423 and read (at least) its description and perhaps find out even more using
424 relationships between terms.
425
426 \subsubsection{Simple Review Process}
427 \label{uc:simplereview}
428
429 As vocabularies evolve, new terms are being added to
430 vocabularies. To facilitate their review and enable rapid uptake
431 of the proposed terms, it is desirable that new terms and even
432 new vocabularies are immediately visible to users and tools.
433 Note that since terms under review might be modified or removed later,
434 this use case is somewhat in conflict with the basic requirement
435 of stable vocabularies (i.e., a document valid once will not
436 become invalid later because of changes in vocabularies).
437
438 \subsubsection{Understanding Vocabulary Evolution}
439 \label{uc:understanding}
440
441 When the question comes up what, say, \vocterm{calibration} actually means
442 in the datalink core vocabulary, and the (legacy) description is not
443 sufficiently clear, people can go back to the discussions that led up
444 to the addition of that term. This will also help clarify existing
445 usage that might have begun at the time of the initial definition.
446
447 \subsubsection{Offline operation}
448 \label{uc:offline}
449
450 A system doing, say, coordinate transformations runs without an internet
451 connection but still needs to use semantic resources on frames and
452 reference positions (e.g., figure out that a given space probe is in L1
453 and use that as reference position). To do that, it wants to use a
454 previously downloaded copy of the vocabulary.
455
456 \subsubsection{UAT in VOResource}
457 \label{uc:uat}
458
459 VOResource 1.1, in the description of the \xmlel{subject} element, says
460 that its content ``should be drawn from the Unified Astronomy Thesaurus''
461 (here: UAT). This is intended to later facilitate interactive topic
462 navigation within the Registry or semantic expansion of Registry queries
463 (``include narrower terms'').
464
465
466 \subsection{Requirements}
467 \label{sect:requirements}
468
469 \subsubsection{Lists of Terms}
470 \label{req:lists}
471
472 We need to be able to represent simple lists of terms even for the most
473 basic use case~\ref{uc:simplevoc}. As per
474 use case~\ref{uc:votvoc}, we will have to represent instances of both
475 \vocterm{rdf:Property} and \vocterm{rdfs:Class} (though not necessarily
476 in one vocabulary). In order to not break existing practices (e.g.,
477 use cases \ref{uc:simplevoc}, \ref{uc:votvoc}, \ref{uc:links}), the
478 machine-readable terms must be allowed to follow existing patterns of
479 essentially human-readable identifiers (against external best practices
480 of using non-informative URI forms). In general, in essentially all use
481 cases discussed, making the machine-readable terms discernable by a
482 human is an advantage.
483
484 \subsubsection{Hierarchies of Terms}
485 \label{req:hierarchy}
486
487 Both use case~\ref{uc:links} and use case~\ref{uc:filtering} require a hierarchy
488 of terms, where clients can find wider and potentially narrower terms
489 relative to an original one. There is a difference,
490 however: in the datalink use-case, strict \vocterm{is-a} relationships
491 are what clients need (e.g., ``give me all kinds of previews''). In the
492 VOEvent case, however, a somewhat softer sort of hierarchy is required.
493 For instance, a filter for accretion disks might very well expand to
494 match both quasars and cataclysmic variables. Hence, we want to
495 be able to represent strict class hierarchies as well as thesaurus-like
496 soft knowledge structures.
497
498 \subsubsection{Tree-like Hierarchies}
499 \label{req:tree}
500
501 Where we expect some sort of semi-formal inference to take place on the
502 vocabularies, the hierarchy should be a tree in order to facilitate
503 traversal and controlled query expansion. In other words, outside of
504 SKOS we do not support multiple inheritance. Use cases requiring
505 something equivalent would have to resort to supporting multiple terms
506 on the annotation level.
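
To illustrate why a tree keeps traversal and query expansion simple,
consider the following minimal Python sketch. The term names and the
\texttt{WIDER} mapping are hypothetical examples loosely modelled on the
datalink use case and do not reproduce any actual IVOA vocabulary.

```python
# Hypothetical excerpt: each term maps to its single wider term
# (None for a root). Having at most one wider term per term is what
# makes the hierarchy a tree.
WIDER = {
    "progenitor": None,
    "raw-image": "progenitor",
    "dark-frame": "progenitor",
    "flat-field": "progenitor",
    "preview": None,
}


def wider_chain(term):
    """Return term followed by all of its wider terms, nearest first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = WIDER[term]
    return chain


def is_a(term, ancestor):
    """Return True if term equals ancestor or is narrower than it."""
    return ancestor in wider_chain(term)


def expand(term):
    """Return term and all terms narrower than it, sorted by name."""
    return sorted(t for t in WIDER if is_a(t, term))
```

Because every term has at most one wider term, walking upwards is a
simple loop that cannot branch, and a query for a term can be expanded
to all narrower terms without ambiguity.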
507
508 \subsubsection{Consensus Vocabularies}
509 \label{req:consensus}
510
511 Essentially all of our use cases will be much easier to implement if
512 clients can work through simple string comparisons. Therefore,
513 wherever feasible IVOA standards should build on IVOA-sanctioned,
514 consensus vocabularies.
515
516 \subsubsection{Deprecating Terms}
517 \label{req:deprecating}
518
519 While we believe at this point that terms once approved by the IVOA
520 should never disappear -- for instance, because validators might
521 otherwise flag previously valid instance documents as invalid --, use
522 case~\ref{uc:deprecation} shows that some way of declaring
523 deprecations must be foreseen.
524
525 \subsubsection{Public Availability of Machine-Readable Vocabularies}
526 \label{req:machine}
527
528 In particular in use cases~\ref{uc:links} and \ref{uc:filtering},
529 clients can flexibly incorporate vocabulary updates without code
530 changes, perhaps even without re-deployment, if vocabularies are
531 available at constant, public URIs, where clients can retrieve them in
532 formats reasonably easy to parse.
533
534 Use case~\ref{uc:discovering} implies that at least one representation
535 of the vocabulary should be human-readable.
536
537 \subsubsection{Minimal Term Metadata}
538 \label{req:mtm}
539
540 To support use case~\ref{uc:discovering}, all terms in IVOA vocabularies
541 MUST come with a non-trivial description.
542
543 \subsubsection{Simple Cases do not Require RDF Tooling}
544 \label{req:nordf}
545
546 (Not derived from any specific use case). Since libraries implementing
547 (some subset of) RDF tend to be rather massive and thus appear
548 disproportionate when all a client wants is an up-to-date list of terms
549 with their descriptions, at least the basic use cases must not require
550 specific RDF tooling. Indeed, simple uses should not require an
551 understanding of RDF in the first place.
552
553
554 \subsubsection{Vocabulary Evolution}
555 \label{req:evolution}
556
557 Most use cases make it desirable that terms can be added to existing
558 vocabularies; this is very clear for the reference positions in
559 use case~\ref{uc:votvoc}, where new instruments would imply new
560 terms. The history of content level annotation in VOResource mentioned
561 in use case~\ref{uc:deprecation} illustrates the desirability of a
562 simple process that invites standard authors to start with minimal
563 vocabularies, relying on later extensions.
564
565 \subsubsection{Traceable Provenance}
566 \label{req:traceable}
567
568 To satisfy use case~\ref{uc:understanding}, the considerations that led
569 to the adoption or modification of a term must be documented publicly
570 in sufficient detail. It is clearly an advantage if a brief, accessible
571 summary of these considerations can easily be found without, say,
572 resorting to version control logs.
573
574 \subsubsection{Preliminary Vocabularies and Terms}
575 \label{req:preliminary}
576
577 In use case~\ref{uc:simplereview}, it is desirable to admit
578 ``preliminary'' vocabularies and terms. For these, both humans
579 and machines must be able to discern a temporary status, and
580 their use implies that the general rule ``once valid, always
581 valid'' does not apply. Validators and similar software could
582 then add notices to that effect in their outputs.
583
584 \subsubsection{Vocabulary Files are Usable Stand-Alone}
585 \label{req:standalone}
586
587 Vocabulary files need to be cacheable without applications having to
588 manage extra metadata (e.g., the URL from which the file was obtained)
589 in order to easily satisfy use case~\ref{uc:offline} (or other scenarios
590 in which vocabulary content cannot be retrieved from the IVOA
591 site for each session).
592
593 \subsubsection{Externally Curated Vocabularies and VO Tooling}
594 \label{req:external}
595
596 Regrettably, VOResource does not explain what use case~\ref{uc:uat} would
597 look like in actual documents, and the example given in the document
598 clearly does not use UAT concepts.
599
600 The first difficulty in a straightforward uptake is that UAT URIs look
601 like \url{http://astrothesaurus.org/uat/1774}. Given that, should
602 publishers have such URIs in \xmlel{subject}? Or should they rather use
603 just the last URI segment for conciseness? Or perhaps the preferred
604 labels, in keeping with the style of existing subject content and its
605 use by clients (which typically look for natural language in subject),
606 even though the labels are not considered stable?
607
608 Regardless of how VOResource clarifies this matter, UAT artefacts (e.g.,
609 SKOS files) do not match some of our other requirements. In particular,
610 the human-readable URIs from \ref{req:lists}, the specific way we
611 satisfy \ref{req:machine}, and the non-RDF requirement \ref{req:nordf} are
612 not immediately satisfied by the UAT as distributed at the time of
613 writing.
614
615 For simple, uniform use of such externally curated vocabularies, it
616 should be possible to have some sort of endorsement process and then
617 distribute the vocabularies in a form compliant with this specification.
618 This will entail IVOA-specific concept URIs, and we must be able to
619 express that these resources have the same meaning as the ones
620 externally maintained.
621
622
623 \subsection{Non-Requirement}
624
625 This specification is not called ``Semantics in the VO'' or the like
626 because we do \emph{not} intend to prescribe ways to turn any VO
627 artefact into RDF triples. Indeed, for many existing vocabularies, it
628 is left open what exactly the domain or range of properties might be or
629 what subject and predicate the classes or concepts should be used with.
630
631 This is partly because this would substantially complicate the
632 generation of vocabularies -- which would quickly turn into proper
633 ontologies --, partly because the information encoded by
634 the triples has traditionally been expressed using techniques developed
635 by the Data Models working group.
636
637 In particular with a view to later use in linked data scenarios,
638 vocabulary authors should nevertheless take care that, given appropriate
639 properties or annotation tools, the vocabularies \emph{could} be used in
640 meaningful RDF triples.
641
642 Conversely, this specification is written with future ``deeper''
643 semantics in the VO in mind; tools restricting their operations to the ones
644 discussed here should not break when future specifications enrich
645 existing vocabularies towards full ontologies.
646
647
648 \section{Using IVOA Vocabularies without RDF Tooling}
649 \label{sect:withoutrdf}
650
651 RDF is a
652 powerful system for expressing a wide range of semantics and enriching
653 various documents with semantic information in a globally distributed
654 fashion. Due to its generality, handling its artefacts is relatively
655 involved and in general requires special tooling, non-negligible
656 investment in understanding RDF, and non-trivial management of URIs and
657 prefix mappings.
658
659 To lower the bar for an adoption of IVOA vocabularies
660 [requirement~\ref{req:nordf}], they are given in
661 two formats usable without RDF tooling or, indeed, deeper knowledge of
662 RDF. This section discusses these.
663
664 \subsection{Choosing Terms From IVOA Vocabularies}
665
666 Resource annotators can usually treat IVOA Vocabularies as simple lists
667 of (case-sensitive) strings with human-readable labels and definitions.
668 These lists can be inspected with a simple web browser.
669
670 Each IVOA vocabulary has an associated URI starting with
671 \url{http://www.ivoa.net/rdf}. Dereferencing that URI yields a list of
672 the vocabularies approved or under review.
673
674 An individual vocabulary has a
675 URI like \url{http://www.ivoa.net/rdf/refposition}. Dereferencing this URI
676 with a web browser (or, indeed, any user agent indicating it prefers
677 text/html media) redirects to a tabular representation of the vocabulary,
678 giving \emph{terms} -- i.e., the strings actually used in annotation --,
679 \emph{labels} -- i.e., strings that should be presented to humans instead of
680 the slightly formalised terms --, and \emph{descriptions}, which should
681 be sufficiently precise to allow someone with a certain amount
682 of domain expertise to decide whether a certain ``thing'' is or is not
683 covered by the term (or more precisely, the underlying concept).
684
685 Some terms may be marked as deprecated, in which case they should no
686 longer be used in new annotations. In most cases, deprecated terms will
687 come with information about what to use instead.
688
689 Some terms may be marked as preliminary. Such terms might disappear
690 without further notice. Casual users should avoid the use of such
691 terms; if they find they want to use them, the semantics working group
692 requests notification over its mailing list, since such use is clearly
693 relevant to the term's adoption process.
694
695 Once a term is located within the HTML page, annotators can usually
696 directly use it in instance documents. For instance, continuing the
697 refposition example, the string \texttt{BARYCENTER} found in the
698 vocabulary is directly used in VOTable's TIMESYS element.
699
700 Some applications (Datalink being the prime example) instead use URIs
701 relative to the vocabulary URI. In practical terms, this just means
702 that a hash sign is prepended to the term (e.g., \texttt{\#progenitor}).
703
704 This latter practice builds on the property of IVOA vocabularies that if
705 one adds the term as fragment to the vocabulary URI (e.g.,
706 \url{http://www.ivoa.net/rdf/refposition#BARYCENTER}), that URI is the full,
707 RDF-compliant resource identifier of the concept. When used in
708 HTML-aware user agents (such as a web browser), dereferencing this URI
709 (i.e., opening it) will give the table of terms with the chosen term
710 highlighted. How exactly this is represented depends on the user agent.
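
As a minimal sketch of the two forms just described, the following
Python functions build the full concept URI and the hash-prefixed
relative form from a term. The function names are illustrative choices
made here and are not part of any IVOA library.

\begin{lstlisting}[language=python]
def concept_uri(vocabulary_uri, term):
    """Return the full, RDF-compliant URI of a term's concept."""
    return vocabulary_uri + "#" + term


def relative_ref(term):
    """Return the hash-prefixed form used, e.g., by Datalink."""
    return "#" + term


print(concept_uri("http://www.ivoa.net/rdf/refposition", "BARYCENTER"))
print(relative_ref("progenitor"))
\end{lstlisting}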
711
712
713 \subsection{Semantic Operations Without RDF Tooling}
714 \label{sect:desise}
715
716 Many VO components need a machine-readable representation of the
717 entire vocabulary, for instance in order to
718 (cf.~sect.~\ref{sect:usecases}):
719
720 \begin{compactitem}
721 \item display labels and descriptions for terms to users,
722 \item perform query expansion or similar exploitation of hierarchical
723 relationships, or
724 \item validate annotated instances for the use of correct and current
725 terms.
726 \end{compactitem}
727
728 To let VO programs perform such tasks with minimal technical overhead,
729 in addition to the RDF artefacts described in
730 sect.~\ref{sect:deployment}, IVOA vocabularies are also available in an
731 ad-hoc format called desise (``dead simple semantics''). Clients can
732 obtain vocabularies in desise by retrieving the vocabulary URI with the
733 HTTP accept header set to \texttt{application/x-desise+json}.
734
735 What is returned is a JSON-encoded \citep{std:JSON} mapping (``object''
736 in JSON terms)
737 containing the following keys (all mandatory):
738
739 \begin{description}
740 \item[uri] The vocabulary URI. All terms occurring in desise documents
741 can be turned into full, RDF-compliant resource URIs by prefixing them
742 with this URI and a hash character.
743 \item[flavour] The flavour of the vocabulary (can generally be ignored;
744 see sect.~\ref{sect:voccontent}).
745
746 \item[terms] A JSON object mapping the (machine-readable) terms to a
747 JSON object giving the term's properties as described below.
748 The keys in \textit{terms} are the strings used in
749 machine-readable data.
750 \end{description}
751
752 The JSON objects present as values in the terms object can have the
753 following keys:
754
755 \begin{description}
756 \item[label] (mandatory)
757 A human-readable label for display purposes; clients should
758 always try to display this rather than the raw term.
759
760 \item[description] (mandatory) A human-readable definition of the underlying
761 concept.
762
763 \item[deprecated] present and mapped to a reserved value if the term is
764 deprecated and should no longer be used; validators will warn against
765 its use.
766
767 \item[preliminary] present and mapped to a reserved value if the term
768 is preliminary, meaning that in contrast to the other, ``eternal'' terms
769 it can disappear again; validators should qualify a validation as
770 preliminary if a document uses such a term.
771
772 \item[wider] (mandatory) A JSON array
773 of ``wider'' terms. Most IVOA vocabularies are
774 tree-like, and for them, there is only up to one term in here, which
775 would be the parent node, which is the hypernym of the current term.
776 In SKOS-flavoured vocabularies, multiple terms can be here, and the
777 meaning of ``wider'' is a bit less clear-cut. The \textit{wider} list
778 is empty for top-level terms.
779
780 \item[narrower] (mandatory) A JSON array
781 of ``narrower'' terms. In SKOS-flavoured
782 vocabularies, that is just a list of all terms that list the current
783 term as wider. Otherwise, the vocabularies are tree-like and
784 \textit{narrower} is a list of all terms on the term's branch and below
785 it in the tree (it is the ``transitive closure of the inverse of
786 wider''). This is much more easily understood in an example, which we
787 give in the discussion on addressing use case~\ref{uc:links} below.
788 \end{description}
789
790 Note that, while \textit{wider} and \textit{narrower} are mandatory
791 keys, their values can of course be empty lists.
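
To illustrate how \textit{narrower} relates to \textit{wider} in
tree-like vocabularies, here is a sketch that derives the
\textit{narrower} lists from the \textit{wider} lists alone. The toy
hierarchy is made up for this illustration: only the relation between
\texttt{FK4}, \texttt{FK5}, and \texttt{EQUATORIAL} mirrors the
refframe vocabulary, while the top-level term \texttt{COORDINATES} is
hypothetical. Nothing here implies that desise producers actually work
this way.

\begin{lstlisting}[language=python]
def narrower_closure(wider_of):
    """Map each term to all terms below it in the tree.

    wider_of maps each term to the list of its wider terms.
    """
    narrower = {term: [] for term in wider_of}
    for term in wider_of:
        # walk up the tree, registering term with every ancestor
        pending = list(wider_of[term])
        seen = set()
        while pending:
            ancestor = pending.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            narrower[ancestor].append(term)
            pending.extend(wider_of[ancestor])
    return narrower


wider_of = {
    "COORDINATES": [],            # hypothetical top-level term
    "EQUATORIAL": ["COORDINATES"],
    "FK4": ["EQUATORIAL"],
    "FK5": ["EQUATORIAL"],
}
narrower = narrower_closure(wider_of)
print(sorted(narrower["COORDINATES"]))
\end{lstlisting}

In this toy tree, \texttt{COORDINATES} ends up with all three other
terms as \textit{narrower}, not just its direct child.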
792
793 See appendix~\ref{app:desiseexample} for an example of a vocabulary
794 represented in desise.
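
The key lists above translate into a simple structural check. The
following sketch reports mandatory keys missing from a desise mapping;
the function name, the message texts, and the toy document (including
its URI) are invented for this illustration.

\begin{lstlisting}[language=python]
TOP_LEVEL_KEYS = ("uri", "flavour", "terms")
TERM_KEYS = ("label", "description", "wider", "narrower")


def desise_problems(voc):
    """Return messages on mandatory keys missing from voc."""
    problems = []
    for key in TOP_LEVEL_KEYS:
        if key not in voc:
            problems.append("missing top-level key " + key)
    for term, props in voc.get("terms", {}).items():
        for key in TERM_KEYS:
            if key not in props:
                problems.append("term %s lacks %s" % (term, key))
    return problems


# hand-made toy document; the URI and descriptions are invented
toy = {
    "uri": "http://www.ivoa.net/rdf/toy",
    "flavour": "RDF Class",
    "terms": {
        "EQUATORIAL": {
            "label": "Equatorial",
            "description": "Some equatorial frame.",
            "wider": [],
            "narrower": ["FK5"]},
        "FK5": {
            "label": "FK5",
            "wider": ["EQUATORIAL"],
            "narrower": []}}}
print(desise_problems(toy))
\end{lstlisting}

For the toy document, the check reports the missing description of
\texttt{FK5}.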
795
796 For illustration, here are recipes to solve the various use cases in
797 Python:
798
799 \paragraph{Load a vocabulary} Using the popular requests module:\\
800 \begin{lstlisting}[language=python]
801 import requests
802 voc = requests.get(
803 "http://www.ivoa.net/rdf/uat",
804 headers={"accept": "application/x-desise+json"}
805 ).json()
806 \end{lstlisting}
807
808 Note, however, that non-trivial clients should cache files retrieved in
809 this way for a reasonable time span; IVOA vocabularies typically do not
810 change on time scales of months.
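
One possible way to implement such caching is sketched below. The
function name, the default maximum age of 30 days, and the convention
of passing in the download step as a callable are assumptions made for
this example; the specification does not prescribe any caching scheme.

\begin{lstlisting}[language=python]
import json
import os
import time


def load_cached(cache_path, fetch, max_age=30 * 24 * 3600):
    """Return the vocabulary cached in cache_path if it is younger
    than max_age seconds; otherwise call fetch(), cache and return
    its result.

    fetch is a zero-argument callable returning the desise mapping.
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path, encoding="utf-8") as f:
                return json.load(f)
    except OSError:
        pass  # no readable cache file: fall through to fetching
    voc = fetch()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(voc, f)
    return voc
\end{lstlisting}

Here, \lstinline{fetch} would typically wrap the
\lstinline{requests.get} call from the recipe above.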
811
812 \paragraph{See if a term is in the vocabulary} (\ref{uc:simplevoc},
813 \ref{uc:votvoc})\\ \lstinline{term in voc["terms"]}
814
815 \paragraph{See if a term is deprecated} (\ref{uc:deprecation})\\
816 \lstinline{"deprecated" in voc["terms"][term]}
817
818 \paragraph{Find a human-readable label for a term}
819 (\ref{uc:discovering})\\
820 \lstinline{voc["terms"][term]["label"]}
821
822 \paragraph{Find a human-readable description for a term}
823 (\ref{uc:discovering})\\
824 \lstinline{voc["terms"][term]["description"]}
825
826 \paragraph{Find out if a term is preliminary} (\ref{uc:simplereview})\\
827 \lstinline{"preliminary" in voc["terms"][term]}
828
829 \paragraph{Query expansion: select branch} (in \ref{uc:links}, select all
830 progenitors, including flat fields, dark frames, etc.)
831 \begin{lstlisting}[language=python]
832 base_term = "progenitor"
833 expanded_terms = set(
834 [base_term]
835 +voc["terms"][base_term]["narrower"])
836 is_match = datalink_row["semantics"][1:] in expanded_terms
837 \end{lstlisting}
838
839 \paragraph{SKOS-type query expansion by neighbouring terms}
840 (\ref{uc:filtering})
841 \begin{lstlisting}[language=python]
842 assert voc["flavour"]=="SKOS"
843 expanded_terms = set(
844 [base_term]
845 +voc["terms"][base_term]["narrower"]
846 +voc["terms"][base_term]["wider"])
847 is_match = keyword_found in expanded_terms
848 \end{lstlisting}
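
Combining the recipes above, a validator might classify the use of a
term roughly as follows. This is a sketch only: the function name and
the status strings are invented here, and the treatment of deprecated
and preliminary terms follows the descriptions of the
\textit{deprecated} and \textit{preliminary} keys given above.

\begin{lstlisting}[language=python]
def validate_term(voc, term):
    """Return a (status, message) pair for a term used in a document.

    status is one of "error", "warning", "preliminary", or "ok".
    """
    props = voc["terms"].get(term)
    if props is None:
        return "error", "term is not in the vocabulary"
    if "deprecated" in props:
        return "warning", "term is deprecated"
    if "preliminary" in props:
        # the document is only valid as long as the term survives
        return "preliminary", "term may still disappear"
    return "ok", ""
\end{lstlisting}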
849
850
851 \section{Vocabulary Content}
852 \label{sect:voccontent}
853
854 IVOA vocabularies MUST be based on W3C's Resource Description Framework.
855 Details on required serialisations are given in
856 sect.~\ref{sect:deployment}. This section deals with what kinds of
857 statements users of IVOA vocabularies SHOULD evaluate to ensure
858 interoperability. Statements of other types are legal in IVOA
859 vocabularies but are not expected to be interpreted interoperably.
860 Clients MAY ignore them.
861
862 In IVOA vocabularies, the concept URI MUST begin with
863 \url{http://www.ivoa.net/rdf}\footnote{In retrospect, the unnecessary
864 ``www'' in this URI is somewhat regrettable, but existing vocabularies
865 have used URIs including it, and it seems a small price to pay for
866 having uniform URIs}. It is recommended to not introduce
867 additional hierarchy levels, i.e., vocabulary URIs SHOULD be direct children
868 of \texttt{rdf}\footnote{Some existing vocabularies do not follow this
869 rule; since vocabulary URI changes will break certain usage scenarios,
870 their URIs are still retained.}.
871
872 Since all vocabularies specified here are
873 single-file, the full term (i.e., RDF resource)
874 URI is formed by appending a hash sign
875 and a fragment identifier. In IVOA vocabularies, this fragment
876 identifier MUST consist of ASCII letters, numbers, underscores and
877 dashes exclusively [for requirement~\ref{req:machine}].
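
The character restriction can be checked with a short regular
expression, as in the following sketch. The function name is
illustrative, and rejecting the empty string is an assumption made
here; the text above only constrains the characters.

\begin{lstlisting}[language=python]
import re

# ASCII letters, digits, underscores and dashes only
TERM_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_term(fragment):
    """Return True if fragment only uses the permitted characters."""
    return TERM_PATTERN.fullmatch(fragment) is not None


print(is_valid_term("preview-image"))
print(is_valid_term("accretion disk"))
\end{lstlisting}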
878
879 The fragment identifiers in the vocabulary URIs SHOULD be
880 human-readable, usually by suitably contracting the
881 preferred label. In the IVOA, we do \emph{not} use natural
882 language-neutral concept identifiers but instead expect that domain
883 experts will already have an impression of a term's meaning from looking
884 at its URI.
885
886 In this specification, we distinguish three different ``flavours'' of
887 vocabularies. Each covers a particular domain of problems and is
888 therefore subject to different requirements.
889 Although the requirements are largely non-contradicting, each vocabulary must
890 be clearly identified as \emph{either} giving SKOS concepts, RDFS
891 classes or RDF properties so clients know how to extract word lists and
892 hierarchies; see sect.~\ref{sect:genprop}
893 for details.
894
895
896 \subsection{SKOS Vocabularies}
897 \label{sect:skosvoc}
898
899 SKOS vocabularies should be used where terms are organised
900 in informal (i.e., not necessarily strict is-a)
901 hierarchies. The classic use case here is query expansion, where, for
902 instance, a search for ``AGN'' might be expanded to include matches for
903 ``accretion disk'' (under certain circumstances).
904
905 The terms in SKOS vocabularies have the RDF type \vocterm{skos:Concept}.
906
907 \subsubsection{Properties in SKOS Vocabularies}
908 \label{sect:skosvoc-prop}
909
910 IVOA SKOS vocabularies use the following properties:
911
912 \begin{itemize}
913 \item \vocterm{skos:broader} -- interpreted in the standard SKOS sense.
914 The reverse property, \vocterm{skos:narrower}, MAY be given, but clients
915 MUST NOT depend on its presence [this satisfies
916 requirement~\ref{req:hierarchy}].
917
918 \item \vocterm{skos:prefLabel} -- all concepts MUST have an
919 English-language preferred label, which is an RDF plain literal [by
920 requirement~\ref{req:mtm}]. No RDF language label is allowed on the
921 literal, and only one preferred label is permitted
922 [these help requirement~\ref{req:nordf}].
923
924 \item \vocterm{skos:definition} -- all concepts MUST have a non-trivial
925 English-language definition. It is obviously impossible to define
926 ``non-trivial'' in a rigorous way; a suggested criterion is that a
927 domain expert would, given the definition, presumably arrive at a
928 similar preferred label, and recursive definitions (i.e., those using
929 the label itself) should be avoided whenever possible. Definitions in
930 non-English languages are not permitted, and only one definition is
931 permitted [again, this helps requirement~\ref{req:mtm}].
932
933 \item \vocterm{skos:exactMatch} -- for externally managed vocabularies
934 the IVOA has endorsed (see sect.~\ref{sect:externally-managed}), this
935 property links the IVOA term (subject) to the external RDF resource
936 (object).
937
938 \item General properties discussed in sect.~\ref{sect:genprop} [this is
939 for requirements~\ref{req:deprecating} and
940 \ref{req:preliminary}]. The \vocterm{ivoasem:vocflavour} of these
941 vocabularies is \verb|SKOS|.
942 \end{itemize}
943
944 This specification does not include requirements on the use or the
945 interpretation of \vocterm{skos:related},
946 \vocterm{skos:closeMatch}, \vocterm{skos:broadMatch},
947 \vocterm{skos:narrowMatch}, \vocterm{skos:ConceptScheme},
948 \vocterm{skos:inScheme}, \vocterm{skos:hasTopConcept},
949 \vocterm{skos:altLabel}, and \vocterm{skos:hiddenLabel}. If use cases
950 are found that require those, this specification will be amended. Until
951 then, vocabulary authors SHOULD NOT use them in order to avoid creating
952 practices that might conflict with later usage patterns.
953
954 This specification does not include requirements on the use or the
955 interpretation of the transitive SKOS properties
956 (\vocterm{skos:broaderTransitive}, \vocterm{skos:narrowerTransitive}).
957 At this point, we believe that applications requiring this type of
958 reasoning-friendly semantics should preferably use RDF class
959 vocabularies.
960
961 \subsubsection{Example (non-normative)}
962
963 Here is a term from a SKOS vocabulary conforming to this specification
964 in RDF/XML serialisation:
965
966 \begin{lstlisting}[language=XML]
967 <skos:Concept rdf:about="http://www.ivoa.net/rdf/AstronomicalObjects#AGN">
968 <skos:prefLabel>AGN</skos:prefLabel>
969 <skos:definition>A compact object in the center of a galaxy showing
970 unusual emission ("active galactic nucleus").</skos:definition>
971 <skos:broader rdf:resource
972 ="http://www.ivoa.net/rdf/theory/AstronomicalObjects#OpticalSource"/>
973 <skos:broader rdf:resource
974 ="http://www.ivoa.net/rdf/theory/AstronomicalObjects#CompoundObject"/>
975 </skos:Concept>
976 \end{lstlisting}
977
978 \subsection{RDF Properties Vocabularies}
979 \label{sect:refpropvoc}
980
981 RDF properties vocabularies should be used when the terms in the
982 vocabulary are mainly used to state
983 relationships between entities that can sensibly be imagined as
984 resources in the RDF sense. Such terms would naturally be used as
985 predicates in RDF triples. Obvious examples might be something
986 like is-progenitor-for in a provenance chain or, indeed, the special
987 properties for IVOA vocabularies introduced in sect.~\ref{sect:genprop}.
988
989
990 The terms in RDF Properties vocabularies have the RDF type
991 \vocterm{rdf:Property}.
992
993 \subsubsection{Properties in RDF Properties Vocabularies}
994 \label{sect:propvoc-prop}
995
996 IVOA RDF properties vocabularies use the following properties (where
997 not specified, the requirements considered essentially match those in
998 sect.~\ref{sect:skosvoc-prop}):
999
1000 \begin{itemize}
1001 \item \vocterm{rdfs:label} -- all terms MUST have an English-language
1002 label, and clients should prefer it over the fragment in the
1003 term URI for presentation purposes. Only
1004 one such label is permitted.
1005
1006 \item \vocterm{rdfs:comment} -- all concepts MUST have a non-trivial
1007 English-language comment serving as a human-oriented definition of the
1008 term. The considerations for \vocterm{skos:definition} in
1009 sect.~\ref{sect:skosvoc-prop} apply. As for those, only one
1010 \vocterm{rdfs:comment} per term is allowed.
1011
1012 \item \vocterm{rdfs:subPropertyOf} -- interpreted as in RDFS to induce
1013 the hierarchy of terms; a term MUST NOT appear as subject of more than
1014 one \vocterm{rdfs:subPropertyOf} triple (i.e., the hierarchy is a tree).
1015
1016 \item General properties discussed in sect.~\ref{sect:genprop}.
1017 The \vocterm{ivoasem:vocflavour} of these vocabularies is
1018 \verb|RDF Property|.
1019
1020 \end{itemize}
1021
1022 \subsubsection{Example (non-normative)}
1023 \label{sect:rdfpxex}
1024
1025 \begin{lstlisting}[language=XML]
1026 <rdf:Property rdf:about
1027 ="http://www.ivoa.net/rdf/datalink/core#preview-image">
1028 <rdfs:comment>preview of the data as a 2-dimensional
1029 image</rdfs:comment>
1030 <rdfs:label>Image preview</rdfs:label>
1031 <rdfs:subPropertyOf rdf:resource
1032 ="http://www.ivoa.net/rdf/datalink/core#preview"/>
1033 </rdf:Property>
1034 \end{lstlisting}
1035
1036
1037 \subsection{RDF Class Vocabularies}
1038
1039 RDF class vocabularies should be used when the terms in the vocabulary
1040 are reasonably class-like, i.e., would usually be either subjects or
1041 objects in RDF triples. As opposed to SKOS vocabularies, the hierarchy
1042 implied is strict in the sense of \vocterm{rdfs:subClassOf}
1043 -- roughly, that statements true for a wider term must be true
1044 for a more specialised term, too. This lets clients confidently perform
1045 inferences.
1046
1047 For instance, coordinates in the FK4 reference frame are equatorial, and
1048 thus even a client unfamiliar with the FK4 frame as such can confidently
1049 infer that the coordinates are right ascension and declination, and that
1050 right ascensions increase eastwards. Reasoning of this type is
1051 impossible within a SKOS vocabulary.
1052
1053 The terms in RDF Class vocabularies have the RDF type
1054 \vocterm{rdfs:Class}.
1055
1056 \subsubsection{Properties in RDF Class Vocabularies}
1057 \label{sect:classvoc-prop}
1058
1059 IVOA RDF class vocabularies use the following properties:
1060
1061 \begin{itemize}
1062 \item \vocterm{rdfs:label} -- all terms MUST have an English-language
1063 label, and clients should prefer it over the term (the fragment of the
1064 term URI) for presentation purposes. Only
1065 one such label is permitted.
1066
1067 \item \vocterm{rdfs:comment} -- all concepts MUST have a non-trivial
1068 English-language comment serving as a human-oriented definition of the
1069 term. The considerations for \vocterm{skos:definition} in
1070 sect.~\ref{sect:skosvoc-prop} apply. As for those, only one
1071 \vocterm{rdfs:comment} per term is allowed.
1072
1073 \item \vocterm{rdfs:subClassOf} -- interpreted as in RDFS to induce
1074 the hierarchy of terms; a term MUST NOT appear as subject of more than
1075 one \vocterm{rdfs:subClassOf} triple (i.e., the hierarchy is a tree).
1076
1077 \item General properties discussed in sect.~\ref{sect:genprop}.
1078 The \vocterm{ivoasem:vocflavour} of these vocabularies is
1079 \verb|RDF Class|.
1080 \end{itemize}
1081
1082 \subsubsection{Example (non-normative)}
1083
1084 Here is a term from an RDF class vocabulary conforming to this
1085 specification in RDF/XML serialisation:
1086
1087 \begin{lstlisting}[language=XML]
1088 <rdfs:Class rdf:about="http://www.ivoa.net/rdf/refframe#FK5">
1089 <rdfs:comment>
1090 Positions based on the 5th Fundamental Katalog. If no equinox is
1091 [...]
1092 </rdfs:comment>
1093 <rdfs:label>FK5</rdfs:label>
1094 <rdfs:subClassOf rdf:resource
1095 ="http://www.ivoa.net/rdf/refframe#EQUATORIAL"/>
1096 </rdfs:Class>
1097 \end{lstlisting}
1098
1099 \subsection{General Properties}
1100 \label{sect:genprop}
1101
1102 To cover requirements~\ref{req:deprecating} and
1103 \ref{req:preliminary} and to facilitate the handling of vocabularies not
1104 directly retrieved via HTTP (which means that the application may not
1105 know the vocabulary URI a priori; cf.~requirement~\ref{req:standalone}),
1106 the Semantics WG defines some
1107 properties of its own in the vocabulary
1108 \url{http://www.ivoa.net/rdf/ivoasem}. The following properties may be
1109 used in all three vocabulary flavours:
1110
1111 \begin{itemize}
1112 \item \vocterm{dc:created} -- IVOA vocabularies MUST include exactly one
1113 triple with the vocabulary as subject and a predicate
1114 \vocterm{dc:created}. The object is the datestamp of the vocabulary in
1115 YYYY-MM-DD format. Clients may only use this for debugging and similar
1116 purposes.
1117
1118 \item \vocterm{ivoasem:vocflavour} -- IVOA vocabularies MUST include
1119 exactly one triple with the vocabulary as subject, this property as predicate,
1120 and, as object, a string literal specifying the kind of vocabulary as per this specification. The
1121 ``General properties'' bullet points of sects.~\ref{sect:skosvoc-prop}
1122 (\verb|SKOS|), \ref{sect:propvoc-prop} (\verb|RDF Property|), and
1123 \ref{sect:classvoc-prop} (\verb|RDF Class|) define what strings may occur
1124 here.
1125
1126 \item \vocterm{ivoasem:preliminary} -- this property indicates
1127 that a term is preliminary and might disappear from the
1128 vocabulary without warning. The object of triples using it
1129 is a blank node. Validators need not warn against the use
1130 of preliminary terms, but as they encounter them, they SHOULD
1131 qualify their validation to the effect that it is temporary.
1132
1133 \item \vocterm{ivoasem:deprecated} -- this property indicates
1134 that a term is deprecated. The object of triples using it
1135 is a blank node. Validators SHOULD issue warnings if such terms
1136 are encountered.
1137
1138 \item \vocterm{ivoasem:useInstead} -- for a deprecated term, the
1139 objects of RDF triples using this property indicate
1140 which terms should be
1141 used instead of the deprecated one.
1142
1143 \end{itemize}
1144
1145 \subsubsection{Example (non-normative)}
1146
1147 The following snippets show RDF/XML triples using the common terms,
1148 taken from the existing relationship\_type vocabulary; the notation
1149 \verb|__| as a blank node is an implementation detail and must not be
1150 relied upon. In general, where ivoasem properties take blank nodes as
1151 objects, clients should normally just ignore the objects.
1152
1153 \begin{lstlisting}[language=XML]
1154 <rdf:Description rdf:about
1155 ="http://www.ivoa.net/rdf/voresource/relationship_type">
1156 <dc:created>2016-08-17</dc:created>
1157 </rdf:Description>
1158 <rdf:Description rdf:about
1159 ="http://www.ivoa.net/rdf/voresource/relationship_type">
1160 <ivoasem:vocflavour>RDF Property</ivoasem:vocflavour>
1161 </rdf:Description>
1162 <rdf:Description rdf:about
1163 ="http://www.ivoa.net/rdf/voresource/relationship_type#IsPartOf">
1164 <ivoasem:preliminary rdf:resource=
1165 "http://www.ivoa.net/rdf/voresource/relationship_type#__"/>
1166 </rdf:Description>
1167 <rdf:Description rdf:about
1168 ="http://www.ivoa.net/rdf/voresource/relationship_type#derived-from">
1169 <ivoasem:deprecated rdf:resource
1170 ="http://www.ivoa.net/rdf/voresource/relationship_type#__"/>
1171 <ivoasem:useInstead rdf:resource
1172 ="http://www.ivoa.net/rdf/voresource/relationship_type#IsDerivedFrom"/>
1173 </rdf:Description>
1174 \end{lstlisting}
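
For RDF/XML written exactly in the style of this example, deprecated
terms and their replacements can be extracted even with a plain XML
parser, as the following sketch shows. It makes several assumptions:
the ivoasem namespace URI is taken to be the ivoasem vocabulary URI
plus a hash sign, the function name is invented, and, since RDF/XML
admits many equivalent serialisations, the code only recognises the
\texttt{rdf:Description} form used above. Robust clients should use a
proper RDF library or the desise format instead.

\begin{lstlisting}[language=python]
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# assumption: vocabulary URI of ivoasem plus a hash sign
IVOASEM_NS = "http://www.ivoa.net/rdf/ivoasem#"


def deprecated_terms(rdfxml):
    """Map URIs of deprecated terms to lists of replacement URIs."""
    root = ET.fromstring(rdfxml)
    result = {}
    for desc in root.iter("{%s}Description" % RDF_NS):
        # the blank node object is ignored; only presence matters
        if desc.find("{%s}deprecated" % IVOASEM_NS) is None:
            continue
        about = desc.get("{%s}about" % RDF_NS)
        result[about] = [
            el.get("{%s}resource" % RDF_NS)
            for el in desc.findall("{%s}useInstead" % IVOASEM_NS)]
    return result


VOC = "http://www.ivoa.net/rdf/voresource/relationship_type"
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ivoasem="http://www.ivoa.net/rdf/ivoasem#">
  <rdf:Description rdf:about="{voc}#derived-from">
    <ivoasem:deprecated rdf:resource="{voc}#__"/>
    <ivoasem:useInstead rdf:resource="{voc}#IsDerivedFrom"/>
  </rdf:Description>
</rdf:RDF>""".format(voc=VOC)
print(deprecated_terms(SAMPLE))
\end{lstlisting}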
1175
1176
1177 \section{Vocabulary Management}
1178 \label{sect:management}
1179
1180 This section discusses the processes through which new vocabularies can be
1181 defined and how vocabulary updates are performed in a way
1182 that ensures community participation and at least a minimal level of
1183 consensus; procedures here primarily address requirements
1184 \ref{req:consensus}, \ref{req:evolution} and \ref{req:traceable}.
1185
1186 In the following, the phrase ``chair of the Semantics WG'' is understood
1187 to mean ``chair or vice-chair of the Semantics WG''; in the unlikely
1188 situation that chair and vice-chair dissent, the resolution of the
1189 problem is up to the TCG chair.
1190
1191
1192 \subsection{New Vocabularies}
1193 \label{sect:new-vocabularies}
1194
1195 New vocabularies in the VO should be introduced with a document going
1196 through the normal IVOA approval process, i.e., intended to become a
1197 recommendation or an endorsed note with RFC as described in the IVOA
1198 Document Standards \citep{2017ivoa.spec.0517G}.
1199
1200 At the discretion of the chair of the Semantics WG, the vocabulary is
1201 uploaded to the vocabulary repository when a document reaches the state
1202 of a Working Draft. At the latest, the vocabulary is uploaded when the
1203 document becomes a Proposed Recommendation or a Proposed Endorsed Note
1204 in order to support a thorough review and reference implementations.
1205
1206 The entire vocabulary is marked human-readably as preliminary in the
1207 vocabulary index (cf.~sect.~\ref{sect:deployment}). All terms in the
1208 vocabulary are marked as preliminary using the
1209 \vocterm{ivoasem:preliminary} property (cf.~sect.~\ref{sect:genprop}) in
1210 order to satisfy requirement~\ref{req:preliminary}.
1211
1212 The entire new vocabulary gets approved as the document introducing it
1213 reaches the status of a Recommendation or an Endorsed Note. From then
1214 on, it is managed by the Semantics WG using the process defined in
1215 the next section.
1216
1217 Once approved (i.e., no longer marked as preliminary),
1218 terms in IVOA vocabularies cannot be removed. They can,
1219 however, be marked as deprecated.
1220
1221 \subsection{Updating Vocabularies}
1222 \label{sect:updating-vocabularies}
1223
1224 IVOA vocabularies can be extended as domain requirements develop
1225 [requirement~\ref{req:evolution}]. Clients
1226 should therefore be designed such that they gracefully deal with terms
1227 that have not been part of the vocabulary at build time, typically by
1228 exploiting information in the vocabulary, perhaps by falling back to
1229 wider, known terms, or by presenting their users labels and descriptions
1230 for terms not explicitly handled.
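
Falling back to wider, known terms could look like the following
sketch, which works on a vocabulary in desise format
(sect.~\ref{sect:desise}). The function name and the breadth-first
search order are choices made for this illustration; the toy
vocabulary only contains the two Datalink terms also used in
sect.~\ref{sect:rdfpxex}, and its labels and descriptions are
abridged.

\begin{lstlisting}[language=python]
def known_fallback(voc, term, known_terms):
    """Return term if the client handles it, else the nearest wider
    term the client handles, else None."""
    seen = set()
    queue = [term]
    while queue:
        current = queue.pop(0)
        if current in known_terms:
            return current
        if current in seen or current not in voc["terms"]:
            continue
        seen.add(current)
        queue.extend(voc["terms"][current]["wider"])
    return None


voc = {"terms": {
    "preview": {
        "label": "Preview", "description": "(abridged)",
        "wider": [], "narrower": ["preview-image"]},
    "preview-image": {
        "label": "Image preview", "description": "(abridged)",
        "wider": ["preview"], "narrower": []}}}

# a client built before preview-image was added
print(known_fallback(voc, "preview-image", {"preview"}))
\end{lstlisting}

If no known term is found, the client can still present the label and
description of the unknown term to its user.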
1231
1232
1233 \subsubsection{Vocabulary Enhancement Proposals}
1234
1235 To add one or more terms to a vocabulary, to introduce deprecations or
1236 to change term labels, descriptions, or relationships,
1237 an interested party -- not necessarily affiliated with the Working Group
1238 that has originally introduced the vocabulary -- prepares a Vocabulary
1239 Enhancement Proposal (VEP). In the interest of thorough review and
1240 topical discussion, a single VEP should only cover directly related
1241 terms. For instance, in a vocabulary of reference frames, it would be
1242 reasonable to add old-style and new-style galactic frames in one
1243 VEP, but not, say, azimuthal and supergalactic coordinates. The
1244 arguments for both terms in the former pair are rather
1245 analogous\footnote{This does not rule out that, in the example, one
1246 might argue that old-style galactic coordinates are so ancient that
1247 perhaps they should not be supported in the VO at all; the chair of the
1248 Semantics WG might then decree that the VEP still needs to be split.}.
1249 In the latter case, two very different rationales would have
1250 to be put forward, which is a clear sign that two VEPs are in order.
1251
1252 \begin{figure}
1253 \begin{verbatim}
1254 Vocabulary: http://www.ivoa.net/rdf/datalink/core
1255 Author: msdemlei@ari.uni-heidelberg.de
1256 Date: 2019-07-19
1257
1258 Term: IsPreviousVersionOf
1259 Action: Addition
1260 Label: Newer Version
1261 Description: This dataset in a previous edition, e.g., processed
1262 with an older pipeline, as part of an older data release.
1263 Relationships: rdfs:subPropertyOf(this)
1264 Used-in: http://example.org/datalink?ID=doc-v1
1265
1266 Term: IsNewVersionOf
1267 Action: Addition
1268 Label: Previous Version
1269 Description: This dataset in a newer edition, e.g., processed
1270 with a newer pipeline, as part of a newer data release.
1271 Relationships: rdfs:subPropertyOf(this)
1272 Used-in: http://example.org/datalink?ID=doc-v2
1273
1274 Rationale:
1275
1276 The terms are mainly intended for projects with data releases.
1277 IsPreviousVersionOf allows services to mark up links to (typically
1278 datalink documents for) later version(s) of this data set. It
1279 allows a client to alert users that a newer, probably improved,
1280 rendition of the current dataset is available and should
1281 presumably be used instead of what they are looking at. The
1282 inverse relationship, IsNewVersionOf, is useful if projects want
1283 to keep previous versions of the dataset findable without having
1284 them show up in the default queries.
1285
1286 The terms are taken from the relationship types of DataCite.
1287 \end{verbatim}
1288
1289 \caption{A sample VEP.}
1290 \label{fig:vepsample}
1291 \end{figure}
1292
1293 A VEP is a semistructured text file containing the following items:
1294
1295 \begin{itemize}
1296 \item \vepitem{Vocabulary:} The URI of the vocabulary
1297 \item \vepitem{Author:} Contact information for the author(s) of
1298 the VEP.
1299 \item \vepitem{Date:} The date on which the VEP was posted.
1300 \item \vepitem{Term:} The identifier of the term to be added, modified,
1301 or deprecated.
1302 \item \vepitem{Action:} one of \textit{Addition}, \textit{Deprecation}, or
1303 \textit{Modification}.
1304 \item \vepitem{Label:} The English-language, human-readable label of the term.
1305 \item \vepitem{Description:} The description that will come with the term.
1306 \item \vepitem{Relationships:} If applicable, relationships the new
1307 term will have to existing terms, using the properties defined in
1308 the present document.
1309 \item \vepitem{Used-in:} At least one URI of a document using the
1310 proposed term.
1311 \item \vepitem{Rationale:} A discussion of use cases, the role of the term in
1312 the vocabulary, and the like. In particular, the item(s) in \vepitem{Used-in}
1313 should be commented on.
1314 \end{itemize}
1315
1316 The items \vepitem{Term}, \vepitem{Action}, \vepitem{Label},
1317 \vepitem{Description}, \vepitem{Used-in},
1318 and \vepitem{Relationships} may be repeated if
1319 multiple terms are affected by a VEP. In \textit{Addition} VEPs, all items
1320 except \vepitem{Relationships} are mandatory.
1321
1322 When \vepitem{Action} is \textit{Deprecation}, \vepitem{Label},
1323 \vepitem{Description}, and \vepitem{Relationships} are optional but can be
1324 given if useful for understanding the VEP. The rationale MUST discuss
1325 the reasons for a deprecation. Usually, one or more replacement
1326 term(s) will be proposed within the same VEP.
1327
1328 When \vepitem{Action} is \textit{Modification}, \vepitem{Label},
1329 \vepitem{Description}, and \vepitem{Relationships} give the proposed new
1330 values of the term. The term itself cannot be modified. The rationale
1331 will usually detail the changes proposed while mentioning the previous
1332 values.
1333
1334 We do not expect the VEPs to be evaluated by machines. Therefore, we
1335 define no grammar for the markup of sections, section headers, and their
1336 content. It is still recommended that authors follow the formatting of
1337 the example in Fig.~\ref{fig:vepsample}.
1338
1339 \subsubsection{Publishing a VEP}
1340
1341 To publish a VEP, it is sent to the chair of the Semantics WG,
1342 preferably by e-mail. The chair of the Semantics WG will perform a
1343 formal validation, in particular as regards the presence of all required
1344 items and syntactically valid relationships. No assessment of the
1345 contents is done at this stage.
1346
1347 Formally valid VEPs then receive a running number. The first VEP was
1348 VEP-0001, the second VEP-0002, and so on. The chair of the Semantics WG
1349 then adds the new VEP to the public index of VEPs as
1350 ``Current'' (see Appendix~\ref{app:curtech} for the technical details).
1351 This index has a link to each VEP's text (in general, a location in a
1352 version control system).
1353
1354 Once the VEP is uploaded, it is announced to the IVOA Semantics Working
1355 Group and all other IVOA Working Groups concerned (again, the technical
1356 details are found in Appendix~\ref{app:curtech}). The chair of the
1357 Semantics WG can extend the distribution as they see fit. The
1358 announcement in particular contains a copy of the VEP in question.
1359
1360 As soon as possible after the upload, the chair of the Semantics WG adds
1361 any term(s) proposed to the vocabulary as a preliminary term using the
1362 \vocterm{ivoasem:preliminary} property. This means that the terms can
1363 immediately be used without raising warnings or errors, but in contrast
1364 to approved terms, they may disappear again. Deprecation or
1365 modification VEPs have no immediate effect.
1366
1367 \subsubsection{Approval Process}
1368 \label{sect:approval}
1369
1370 Discussion of a VEP takes place in the WGs' discussion forums (again,
1371 see Appendix~\ref{app:curtech}). The chair of the Semantics WG will
1372 summarise the discussion in the VEP in a \textit{Discussion} section.
1373
1374 During the process, all parts of the VEP may be changed except the
1375 term(s) proposed.
1376
1377 Once the chair of the Semantics WG sees a sufficient consensus reached,
1378 they announce the VEP in the TCG. If, at the next meeting of the TCG,
1379 no Working Group objects to the VEP, it is accepted and the marker that
1380 a term is preliminary is removed from the relationships of any terms
1381 added by the VEP. In the case of deprecation or modification VEPs, the
1382 requested actions are taken at this point.
1383
1384 If, on the other hand, discussion of an addition request results in the
1385 realisation that terms proposed need to be changed, the VEP in question
1386 must be withdrawn and its effects on the vocabulary undone; zero or
1387 more new VEPs are then posted containing proposals for terms for which
1388 consensus appears feasible. The withdrawn VEP receives a
1389 \vepitem{Superceded-by} item referencing any new VEPs; any new VEPs have
1390 a \vepitem{Supercedes} item referencing the original VEP.
1391
1392 \subsubsection{Guidelines for Creating Concepts (non-normative)}
1393
1394 When introducing terms, it is useful to consider a very simple
1395 semantic model, where the world is a set of (tangible or non-tangible)
1396 ``things'' in the sense of naive set theory.
1397
1398 A vocabulary has a scope, which is a subset of the world; this could be
1399 ``reference systems'' or ``astronomical object types'' or even something
1400 as concrete as ``observatories''.
1401
1402 In this picture, a term denotes a certain subset of a vocabulary's
1403 scope. This set is called the term's (or, where an additional level
1404 between the concrete letters making up the term as defined by this
1405 document and the set is useful, the concept's) ``extension''.
1406
1407 Now, in an ideal vocabulary the extensions of its
1408 top-level terms are disjoint (meaning: each thing in scope of the vocabulary
1409 belongs to not more than one top-level term's extension) and the terms cover the
1410 entire scope (meaning: for each thing in the scope, there is at least
1411 one term's extension that contains that thing): The top-level terms are
1412 equivalence classes over the vocabulary's scope.
1413
1414 Where vocabularies are hierarchical, analogous considerations would
1415 apply for the extensions of a general term and its more specialised
1416 terms.
1417
1418 When natural language and the real world are involved,
1419 this ideal generally is unreachable.
1420 But when proposing a term and its definition, authors should try to
1421 make sure that
1422
1423 \begin{compactenum}
1424 \item their new term has a useful extension (i.e., consumers actually
1425 want to know whether a thing is or is not inside it)
1426 \item the extension is reasonably disjoint from existing terms, or is a
1427 proper superset (in which case the other terms are narrower), or is a proper
1428 subset (in which case they are wider) of other terms' extensions.
1429 \end{compactenum}
1430
1431 Put another way: When designing terms, it is as important to say what is
1432 not covered as to clearly say what is.
1433
1434 This is a major reason why it is important to give clear definitions
1435 whenever these definitions are not uniquely given by the domain. For
1436 instance, while an object type vocabulary probably does not need to be
1437 very diligent in defining $\delta$~Cephei stars because the extension of
1438 that term is uncontroversial to first order\footnote{Although it might
1439 seem desirable to clarify whether, say, W~Virginis stars are or are not
1440 excluded.}, a term like ``dataset'' should come with a precise
1441 definition, ideally containing a reference to a longer explanation.
1442
1443 \subsection{Externally Managed Vocabularies}
1444 \label{sect:externally-managed}
1445
1446 The IVOA is not the only body developing vocabularies, and of course VO
1447 components are free to use other, non-IVOA vocabularies whenever
1448 convenient or even required for interoperability beyond the IVOA.
1449
1450 Sometimes, however, it is advantageous to subject an external vocabulary
1451 to the requirements set forth by this specification. The motivating use
1452 case here is \ref{uc:uat}, the Unified Astronomy Thesaurus. As derived
1453 in requirement~\ref{req:external}, multiple considerations make a
1454 ``mirror'' of the vocabulary in the IVOA RDF repository highly
1455 desirable. Regrettably, since RDF resources (i.e., what we call terms
1456 here) are identified by their full URIs, this will create new RDF
1457 resources, and hence care must be taken that RDF tools can work out the
1458 identity of the mirrored IVOA terms and the original RDF resources.
1459
1460 Also, the processes from sects.~\ref{sect:new-vocabularies}
1461 and~\ref{sect:updating-vocabularies} obviously cannot apply to such
1462 vocabularies, which have their own management procedures.
1463
1464 To address these issues, the following rules apply:
1465
1466 When a vocabulary managed by an IVOA-external body needs to be made
1467 available in the form prescribed by this specification, a proposal for
1468 doing this needs to pass the endorsed notes process of the IVOA as laid
1469 out in the IVOA Document Standards \citep{2017ivoa.spec.0517G}. As it
1470 concerns external relationships of the IVOA, it additionally needs
1471 endorsement by the IVOA Executive Committee to become effective.
1472
1473 This proposal has to specify:
1474 \begin{itemize}
1475 \item The basic metadata for the vocabulary on the IVOA side.
1476 \item The rules for mapping the external RDF resource URIs to IVOA term
1477 URIs, together with a plan for how this mapping is kept stable.
1478 \item If during the mapping of the vocabulary, external RDF triples are
1479 discarded (which likely is necessary to ensure adherence to our
1480 constraints), what triples are discarded.
1481 \item A description of and reference to software that performs this
1482 mapping.
1483 \item A description of the external management process.
1484 \end{itemize}
1485
1486 The proposing party has to provide software to automatically translate
1487 resources from the external format to a suitable input for the IVOA
1488 vocabulary tooling.
1489
1490 Each term in the IVOA vocabulary mirror MUST declare its identity to
1491 the original, external RDF resource. At this point, this is only
1492 defined for SKOS-flavoured vocabularies, where the IVOA term must be the
1493 subject of exactly one triple with the \vocterm{skos:exactMatch}
1494 property. The object of that triple is the URI of the external RDF
1495 resource.
1496
1497 For other flavours, no such mechanism is defined in this version of the
1498 specification, which means that for now, externally managed vocabularies
1499 must use the SKOS flavour.
1500
1501 Once an external vocabulary is endorsed by both the TCG and the
1502 Executive Committee, the chair of the Semantics working group has the
1503 responsibility to keep the IVOA mirror of the vocabulary synchronised,
1504 ideally by using a monitored, automatised process like a post-commit
1505 action on an external version control system.
1506
1507
1508 \section{Publishing Vocabularies}
1509 \label{sect:deployment}
1510
1511 This section is an adaptation of \citet{note:cooluris} and is
1512 intended to satisfy requirements~\ref{req:machine}
1513 and~\ref{req:mtm}. It also briefly discusses how IVOA vocabularies
1514 should be referenced.
1515
1516 \subsection{Deploying Vocabularies}
1517
1518 All IVOA-approved vocabularies are accessible as children of
1519 \url{http://www.ivoa.net/rdf}. Dereferencing that URI will lead to an
1520 index of current approved and proposed vocabularies.
1521 Vocabularies still under review are clearly marked as such.
1522
1523 When dereferencing a vocabulary URI, clients will receive an HTTP 303
1524 (See Other) code, with the \texttt{Location} header set to the last
1525 version of the vocabulary. The version is written as the date of the
1526 last update in the format YYYY-MM-DD. Depending on the value of the
1527 request's accept header, the redirect will end up at
1528
1529 \begin{itemize}
1530 \item an HTML rendition of the vocabulary by default. The HTML element
1531 corresponding to a term has the term (i.e., the fragment identifier in the
1532 term's URI) as its HTML id; hence a URI
1533 \verb|<vocabulary URI>#<term>| will immediately focus the term's HTML
1534 rendition in common
1535 user agents [requirement~\ref{req:mtm}].
1536
1537 \item a Turtle rendition of the vocabulary if the accept header
1538 indicates that \verb|text/turtle| documents are preferred.
1539
1540 \item an RDF/XML rendition of the vocabulary
1541 if the accept header indicates that
1542 \verb|application/rdf+xml| documents are preferred.
1543
1544 \item an ad-hoc JSON rendition of the vocabulary as specified in
1545 sect.~\ref{sect:desise} if the accept header indicates that
1546 \verb|application/x-desise+json| documents are preferred.
1547 \end{itemize}
1548
1549 Individual vocabularies may be available in additional formats.
1550 Content negotiation might then consider additional media types.
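
As a minimal sketch of the content negotiation described above, a Python
client might select a rendition through the \texttt{Accept} header. The
names \texttt{build\_vocabulary\_request} and \texttt{fetch\_vocabulary}
are made up for this illustration; we assume the standard library's
\texttt{urllib} is used, which follows the 303 redirect on its own.

```python
import urllib.request

# Media types named in this section; the HTML rendition is the default
# and therefore needs no explicit Accept header.
MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "desise": "application/x-desise+json",
}


def build_vocabulary_request(vocab_uri, rendition="desise"):
    """Return a request for vocab_uri asking for the given rendition.

    This is a hypothetical helper for illustration, not part of any
    IVOA tooling.
    """
    return urllib.request.Request(
        vocab_uri, headers={"Accept": MEDIA_TYPES[rendition]})


def fetch_vocabulary(vocab_uri, rendition="desise"):
    """Dereference vocab_uri; return (final URL, raw content).

    urllib follows the 303 (See Other) redirect transparently, so the
    final URL is the versioned location of the rendition.
    """
    request = build_vocabulary_request(vocab_uri, rendition)
    with urllib.request.urlopen(request) as response:
        return response.geturl(), response.read()
```

A call like \verb|fetch_vocabulary("http://www.ivoa.net/rdf/example", "turtle")|
would return the versioned URL together with the Turtle document; only the
unversioned vocabulary URI should be passed on to other parties.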
1551
1552 Clients may record the full versioned URI of the vocabulary used for
1553 debug or provenance purposes. These URIs, however, MUST NOT be used
1554 externally. In particular, a URI like
1555 \url{http://www.ivoa.net/rdf/example/2019-07-14/example.html#term} has no
1556 RDF meaning by this standard and must never be used in publicly visible
1557 RDF triples. Always use URIs of the form
1558 \url{http://www.ivoa.net/rdf/example#term}.
1559
1560 \subsection{Referencing Vocabularies}
1561
1562 Since IVOA vocabularies, at least after some time, generally are a
1563 collective effort with a continuous evolution, it is inappropriate to
1564 cite them in the conventional author-year-title format.
1565
1566 However, the vocabulary URI is intended to be stable and uniquely
1567 identifies the vocabulary as such. Hence, this URI is what should
1568 normally be cited. The standard style would be along the lines of
1569 \begin{lstlisting}[language={}]
1570 Terms in this field must be taken from the IVOA vocabulary
1571 \url{http://www.ivoa.net/rdf/voresource/content_level}.
1572 \end{lstlisting}
1573 or, in formats where footnotes are appropriate and inline URIs should be
1574 avoided for typographical reasons
1575 \begin{lstlisting}[language={}]
1576 Terms in this field must be taken from the IVOA vocabulary
1577 \emph{Content levels for VO resources}\footnote{
1578 \url{http://www.ivoa.net/rdf/voresource/content_level}}.
1579 \end{lstlisting}
1580 -- the footnote anchor should be the vocabulary name as given in the
1581 IVOA vocabulary repository\footnote{\url{http://www.ivoa.net/rdf}}.
1582
1583 Except in the rare cases in which version-sharp references are actually
1584 necessary (for instance, descriptions of errors), it is inappropriate to
1585 reference URLs with dates (e.g.,
1586 \url{http://ivoa.net/rdf/voresource/content_level/2016-08-17/}). URIs
1587 to actual resources (e.g., the XML or Turtle renditions) must never be
1588 used to reference vocabularies.
1589
1590 We do not see a relevant use case for having IVOA vocabularies formally
1591 cited in reference sections of scholarly works: such references will not
1592 aid in finding them, and there is no credible benefit in tracking their
1593 usage from citation in literature.
1594
1595
1596 \appendix
1597 \section{The 2019 IVOA Vocabulary Toolset (non-normative)}
1598 \label{app:tools}
1599
1600 This appendix describes the recommended toolset for authoring IVOA
1601 vocabularies as of 2019. Vocabulary authors may decide to use other
1602 tools but should consider that doing so may incur additional work for the
1603 chair of the Semantics WG in later maintenance.
1604
1605 This appendix is non-normative. It will serve as documentation of the
1606 toolset and will occasionally be updated as the tooling evolves;
1607 vocabulary authors are still advised to inspect documentation within the
1608 tools. Even major changes here will not lead to a new major version of
1609 the standard.
1610
1611
1612 \subsection{Input Format}
1613
1614 In the current tooling, RDF class and property
1615 vocabularies are authored in simple CSV files
1616 with five columns. These columns are:
1617
1618 \begin{description}
1619 \item[term]
1620 This is the actual, machine-readable vocabulary term. Only use
1621 letters, digits, underscores, and dashes here. As specified in
1622 sect.~\ref{sect:voccontent}, these identifiers should be
1623 human-readable, even though they are not directly intended for human
1624 consumption (clients will use the label). In the interest of
1625 reasonably compact URIs we advise to keep the length of the
1626 terms below, say, 30 characters.
1627 \item[level]
1628 This is used for simple input of wider/narrower relationships.
1629 It is 1 for ``root'' terms. Terms with a level of 2 that follow a
1630 root term become its children, i.e., the tooling will add the
1631 appropriate wider relationship between the level 2 and the level 1
1632 term. You can nest, i.e., have
1633 terms of level 3 below terms of level 2. Note that this means the
1634 order of rows must be preserved in the CSV files: Do \emph{not} sort
1635 vocabulary CSVs.
1636 \item[label]
1637 This is a short, human-readable label for the term. In the VO, this
1638 is generally derived fairly directly from the content of the first
1639 column, usually by
1640 inserting blanks at the right places and fixing capitalisation.
1641 \item[description]
1642 This is a longer explanation of what the term means. We do not
1643 support any markup here, not even paragraphs, so there is probably a
1644 limit to how much can be communicated.
1645 \item[more\_relations]
1646 This column can be used to declare non-hierarchical relationships
1647 and contains whitespace-separated declarations. Each declaration has
1648 the form property[(term)]. Omitting the term is allowed for certain
1649 properties; in RDF, this corresponds to a blank node. See below for
1650 the common properties supported here. Plain terms are resolved
1651 within the vocabulary, but CURIEs with known prefixes or full URIs are
1652 admitted, too.
1653 \end{description}
1654
1655 Non-ASCII characters are allowed in label and description; files must be
1656 encoded in UTF-8. The column separator currently is required to be a
1657 semicolon in order to save on escaping in descriptions (which very
1658 commonly contain commas). Fields that contain semicolons are escaped
1659 with double quotes; embedded double quotes are doubled.
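
To illustrate how the level column translates into wider relationships,
the following Python sketch reads such a file with the standard
\texttt{csv} module. It is not the actual IVOA tooling: the function name
\texttt{parse\_terms}, the sample terms, and the assumption that the file
has no header row are ours.

```python
import csv
import io


def parse_terms(csv_text):
    """Parse semicolon-separated term rows into a dict keyed by term.

    Row order matters: a term of level n becomes a child of the most
    recent preceding term of level n-1.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    terms = {}
    ancestors = []  # ancestors[i] is the latest term seen at level i+1
    for row in reader:
        if not row:
            continue  # skip blank lines
        term, level, label, description, more_relations = row
        level = int(level)
        if level > len(ancestors) + 1:
            raise ValueError(f"{term}: level {level} skips a level")
        # Forget all ancestors at this level or deeper.
        del ancestors[level - 1:]
        terms[term] = {
            "label": label,
            "description": description,
            "wider": ancestors[-1:],  # empty for root terms
            "more_relations": more_relations.split(),
        }
        ancestors.append(term)
    return terms


# Contrived sample input; the second row shows the quoting rules.
SAMPLE = '''\
EQUATORIAL;1;Equatorial;Umbrella term for equatorial frames.;
ICRS;2;ICRS;"As defined by 1998AJ....116..516M; contrived ""example"".";
OLD_FRAME;2;Old frame;A contrived deprecated term.;ivoasem:deprecated ivoasem:useInstead(ICRS)
GALACTIC;1;Galactic;A contrived second root term.;
'''
```

Sorting the rows of \texttt{SAMPLE} would attach the level 2 terms to the
wrong parent, which is why vocabulary CSVs must not be sorted.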
1660
1661 The following properties are supported in the more\_relations
1662 column:
1663
1664 \begin{itemize}
1665 \item \vocterm{ivoasem:deprecated} -- see sect.~\ref{sect:genprop}.
1666 \item \vocterm{ivoasem:useInstead} -- see sect.~\ref{sect:genprop}.
1667 \item \vocterm{ivoasem:preliminary} -- see sect.~\ref{sect:genprop}.
1668 \end{itemize}
1669
1670 \subsection{Vocabulary Metadata}
1671 \label{sect:vocmeta}
1672
1673 Global vocabulary metadata is kept in an INI-style format. The following
1674 keys are understood:
1675
1676 \begin{description}
1677 \item[timestamp]
1678 A manually maintained date of the last modification. This is
1679 essentially a version marker and should be changed only in preparation
1680 for a release. It is recommended to set it to the intended release
1681 date during development and not change it for every edit.
1682 \item[title]
1683 A human-readable short phrase saying what the vocabulary describes.
1684 \item[flavour]
1685 One of \textit{RDF Class}, \textit{RDF Property}, or \textit{SKOS}
1686 (where SKOS currently expects RDF/XML serialised SKOS rather than CSV).
1687 \item[description]
1688 A longer text (about a paragraph) stating what the vocabulary should
1689 be used for. No markup is supported here.
1690 \item[authors]
1691 Persons involved with the creation of the vocabulary. These are \emph{not}
1692 the persons to ask for maintenance; all requests for changes should be
1693 directed to the Semantics working group first.
1694 \item[filename]
1695 The tooling expects the input at
1696 \verb|<vocabulary name>/terms.csv|. If it is kept elsewhere, give
1697 the source file name here. This is to support legacy
1698 vocabularies with nonstandard names and native SKOS input.
1699 \item[draft]
1700 While a vocabulary is still being reviewed in its entirety, add a key
1701 draft set to \texttt{True}. This will add language to the effect that
1702 terms may still vanish from the vocabulary and mark all terms as
1703 preliminary. Once the vocabulary is approved, this key is deleted.
1704 \item[licenseuri]
1705 IVOA-managed vocabularies are always made available under CC-0 and
1706 hence do not use this key. External vocabularies as per
1707 sect.~\ref{sect:externally-managed} may be subject to actual licences,
1708 in which case this field holds a URI containing the licence's
1709 conditions.
1710 \item[licensehtml]
1711 This is arbitrary HTML expressing whatever licence terms may be
1712 attached to an external vocabulary. Again, do not use for IVOA
1713 vocabularies.
1714 \end{description}
1715
1716 Currently, the global metadata is maintained in a file
1717 \verb|vocabs.conf| in the root of the vocabulary source repository, with one
1718 section per vocabulary. The section name is the vocabulary name.
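
A section of \verb|vocabs.conf| could be read as in the following Python
sketch. The section content is contrived, and the assumption that
Python's \texttt{configparser} accepts the file as-is is ours; the
authoritative parsing rules are those of the tooling itself.

```python
import configparser

# Contrived example of a vocabs.conf section; the section name is the
# vocabulary name.
SAMPLE_CONF = """\
[example]
timestamp = 2019-07-14
title = Example reference frames
flavour = RDF Class
description = A contrived vocabulary used for illustration only.
authors = Doe, J.
draft = True
"""


def read_vocabulary_metadata(conf_text):
    """Return {vocabulary name: {key: value}} for all sections."""
    # Interpolation is switched off so that percent signs in
    # descriptions are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {name: dict(parser[name]) for name in parser.sections()}
```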
1719
1720 \subsection{Vocabulary Source Repository}
1721
1722 Vocabulary authors are encouraged to maintain their vocabularies in the
1723 shared version control system of the IVOA. At the time of writing, this
1724 is a subversion repository at
1725 \url{https://volute.g-vo.org/svn/trunk/projects/semantics/voc-source}.
1726
1727 Authors of new vocabularies should create a child directory and place
1728 their terms.csv file in there. They should then edit \verb|vocabs.conf|
1729 and add a section named after their directory with the content discussed
1730 in sect.~\ref{sect:vocmeta}.
1731
1732
1733 \section{Current Network Resources (non-normative)}
1734 \label{app:curtech}
1735
1736 This appendix details network resources used in vocabulary management.
1737 It is non-normative and will occasionally be updated as the IVOA's
1738 infrastructure evolves. Even major changes here will not lead to a new
1739 major version of the standard.
1740
1741 The list of vocabulary enhancement proposals is maintained in the IVOA's
1742 wiki at
1743 \url{https://wiki.ivoa.net/twiki/bin/view/IVOA/WebHome?topic=VEPs}.
1744 Approved VEPs will be moved to an archive page linked there.
1745 VEPs may be added as attachments to this page, but authors are
1746 encouraged to maintain them in version controlled repositories instead.
1747 The recommended place to do that is
1748 \url{https://volute.g-vo.org/svn/trunk/projects/semantics/veps}.
1749
1750 The discussion of VEPs (see sect.~\ref{sect:approval}) is to take place
1751 on the appropriate mailing list(s). See
1752 \url{http://ivoa.net/members/index.html} for a directory of IVOA mailing
1753 lists and their addresses.
1754
1755 \section{An Example for a Vocabulary in Desise (non-normative)}
1756 \label{app:desiseexample}
1757
1758 The following example shows what a vocabulary in desise looks like. The
1759 content is, superficial similarities to real vocabularies
1760 notwithstanding, contrived.
1761
1762 \begin{lstlisting}[language=python]
1763 {
1764 "uri": "http://www.ivoa.net/rdf/example",
1765 "flavour": "RDF Class",
1766 "terms": {
1767 "EQUATORIAL": {
1768 "label": "Equatorial",
1769 "description": "Umbrella term for all sorts of equatorial frames.",
1770 "narrower": ["ICRS", "ICRS2", "BD", "B1875.0"], "wider": []
1771 },
1772 "ICRS": {
1773 "label": "ICRS",
1774 "description": "As defined by 1998AJ....116..516M.",
1775 "wider": ["EQUATORIAL"], "narrower": []
1776 },
1777 "B1875.0": {
1778 "label": "Bonner Durchmusterung System",
1779 "description": "Deprecated term for the reference system implied by BD/CD",
1780 "deprecated": "",
1781 "wider": ["EQUATORIAL"], "narrower": []
1782 },
1783 "BD": {
1784 "label": "Bonner Durchmusterung System",
1785 "description": "The reference system implied by BD/CD",
1786 "wider": ["EQUATORIAL"], "narrower": []
1787 },
1788 "ICRS2": {
1789 "label": "ICRS 2",
1790 "description": "The reference system defined by 2027A&A..1234...12B",
1791 "preliminary": "",
1792 "wider": ["EQUATORIAL"], "narrower": []
1793 }
1794 }
1795 }
1796 \end{lstlisting}
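
As a sketch of how a non-RDF client might consume such a document, the
following Python function sorts the terms by status. The function name
\texttt{terms\_by\_status} and the sample are made up for this
illustration; we assume, as the example above suggests, that the mere
presence of a \texttt{deprecated} or \texttt{preliminary} key marks the
term accordingly. The normative definition is in
sect.~\ref{sect:desise}.

```python
import json


def terms_by_status(desise_text):
    """Group the terms of a desise document by approval status."""
    vocab = json.loads(desise_text)
    status = {"approved": [], "preliminary": [], "deprecated": []}
    for term, meta in vocab["terms"].items():
        if "deprecated" in meta:
            status["deprecated"].append(term)
        elif "preliminary" in meta:
            status["preliminary"].append(term)
        else:
            status["approved"].append(term)
    return status


# Contrived, abridged sample in the spirit of the example above.
SAMPLE_DESISE = """
{
  "uri": "http://www.ivoa.net/rdf/example",
  "flavour": "RDF Class",
  "terms": {
    "EQUATORIAL": {"label": "Equatorial", "description": "Umbrella term.",
                   "wider": [], "narrower": ["ICRS", "ICRS2", "B1875.0"]},
    "ICRS": {"label": "ICRS", "description": "Contrived.",
             "wider": ["EQUATORIAL"], "narrower": []},
    "B1875.0": {"label": "Old system", "description": "Contrived.",
                "deprecated": "", "wider": ["EQUATORIAL"], "narrower": []},
    "ICRS2": {"label": "ICRS 2", "description": "Contrived.",
              "preliminary": "", "wider": ["EQUATORIAL"], "narrower": []}
  }
}
"""
```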
1797
1798 \section{Changes from Previous Versions}
1799
1800 \subsection{Changes from WD-2020-06-12}
1801
1802 \begin{itemize}
1803 \item No changes to normative material.
1804 \item Adding a use case on vocabulary evolution and on VO-DML.
1805 \item Various editorial changes.
1806 \end{itemize}
1807
1808 \subsection{Changes from WD-2020-03-26}
1809
1810 \begin{itemize}
1811 \item Desise term values are now dicts with label and description to
1812 make it a bit more self-explanatory; this let us pull in preliminary,
1813 deprecated, and wider as well.
1814 \item Desise now contains an inversion of wider, narrower, with meanings
1815 quite different between SKOS and the other flavours.
1816 \item The main media type for Desise is now application/x-desise+json rather
1817 than text/json because there is no text/json, and you can't have
1818 content media type parameters on either.
1819 \item Mentioning licenseuri and licensehtml in the non-normative part on
1820 managing vocabulary metadata. Also stating there that IVOA-managed
1821 vocabularies are CC-0.
1822 \end{itemize}
1823
1824
1825 \subsection{Changes from WD-2019-09-05}
1826
1827 \begin{itemize}
1828 \item We no longer recommend that non-RDF clients use RDF/XML. We have
1829 therefore removed the ``usage with plain XML tooling'' sections. We
1830 have also removed the description of the revovo python module from the
1831 toolset appendix.
1832
1833 \item Instead, we now have the custom ``desise'' format described in a
1834 new section that doubles as a very quick introduction for adopters not
1835 interested in RDF.
1836
1837 \item Adding a use case and requirement for the UAT (and, perhaps,
1838 similar externally curated vocabularies). Adding a section on how
1839 such vocabularies may be integrated into the IVOA RDF repository.
1840
1841 \item Now requiring a \emph{Used-in} item in addition VEPs, implying
1842 that only terms that are already applied may be proposed.
1843
1844 \item Adding \emph{Supercedes} and \emph{Superceded-by} items,
1845 formalising the previous language on ``splitting'' VEPs a bit.
1846
1847 \item Adding advice on referencing vocabularies.
1848
1849 \item We now demand a formal validation of VEPs by the semantics chair.
1850 The responsibility for ``uploading'' the VEP, i.e., adding it to the VEP
1851 index, is now assigned to them.
1852
1853 \item Adding a soapbox section with advice on what to do when proposing
1854 new terms and introducing a naive semantics model.
1855 \end{itemize}
1856
1857 \bibliography{local.bib,ivoatex/ivoabib,ivoatex/docrepo}
1858
1859
1860 \end{document}
