% Source: /trunk/projects/semantics/Vocabularies/Vocabularies.tex, revision 5952
% Tue May 11 12:26:25 2021 UTC, by msdemlei: Changes after first DAL review.
\documentclass[11pt,a4paper]{ivoa}
\input tthdefs

\usepackage{todonotes}
\lstloadlanguages{XML,python}
\lstset{flexiblecolumns=true,tagstyle=\ttfamily, showstringspaces=False,
basicstyle=\footnotesize}

\definecolor{termcolor}{rgb}{0.6,0.1,0.1}

\iftth
\def\vocterm#1{\emph{\color{termcolor}#1}}

\else
\def\vocterm{\startvocterm\realvocterm}
\def\realvocterm#1{\emph{\color{termcolor}#1}\endvocterm}
\begingroup
\gdef\breakablecolon{:\hskip0pt}
\catcode`\:=\active
\gdef\startvocterm{\begingroup
\catcode`\:=\active\let:=\breakablecolon}
\gdef\endvocterm{\endgroup}
\endgroup
\fi


\newcommand{\vepitem}[1]{\emph{#1}}

\title{Vocabularies in the VO}

% see ivoatexDoc for what group names to use here
\ivoagroup{Semantics}

\author[https://wiki.ivoa.net/twiki/bin/view/IVOA/MarkusDemleitner]{Markus
Demleitner}
\author[https://wiki.ivoa.net/twiki/bin/view/IVOA/NormanGray]{Norman
Gray}
\author[https://wiki.ivoa.net/twiki/bin/view/IVOA/MarkTaylor]{Mark
Taylor}

\editor{Markus Demleitner}

\previousversion[https://ivoa.net/documents/Vocabularies/20200612/]
{WD-20200612}
\previousversion[https://ivoa.net/documents/Vocabularies/20200326/]
{WD-20200326}
\previousversion[http://ivoa.net/documents/Vocabularies/20190905/]
{WD-20190905}


\begin{document}
\begin{abstract}
In this document, we discuss practices related to the use of RDF-based
consensus vocabularies in the Virtual Observatory, that is the creation,
publication, maintenance, and consumption of
hierarchical word lists agreed upon within the IVOA.
To cover the wide range of use cases envisioned, we define different
vocabulary types for informal knowledge organisation on the
one hand, and strict hierarchies of classes and properties on the other.
While the framework rests on the solid foundations of W3C RDF,
provisions are made to facilitate using IVOA vocabularies without
specific RDF tooling.
Non-normative appendices detail the current vocabulary-related tooling.
\end{abstract}


\section*{Acknowledgments}

While this is a complete rewrite of the specification of how vocabularies
are treated in the VO, we gratefully acknowledge the groundbreaking work
of the authors of version 1 of Vocabularies in the VO, S\'ebastien
Derriere, Alasdair Gray, Norman Gray, Frederic Hessmann, Tony Linde,
Andrea Preite Martinez, Rob Seaman, and Brian Thomas.

In particular, the vocabulary for datalink semantics done by Norman Gray
was formative for many aspects of what is specified here.

\section*{Conformance-related definitions}

The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.

The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{http://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.

\section{Introduction}

The W3C's Resource Description Framework RDF \citep{note:rdfprimer} is a powerful
and very generic means to represent, transmit, and reason on highly
structured, ``semantic'' information. With both its power and
generality, however, comes a high complexity for consumers of this
information if no further conventions are in force. Also, the generic
W3C standards understandably do not cover how semantic resources (e.g.,
vocabularies or ontologies) are to be managed, let alone developed
within organisations like the IVOA.

While for many applications even within the VO, the significant
complexity and the lack of defined management processes are acceptable,
for several other use cases -- in particular those given in
sect.~\ref{sect:usecases} --, having extra conventions greatly
helps implementability and interoperability.

Based on requirements derived from these use cases
(sect.~\ref{sect:requirements}), this standard will therefore define
conventions for vocabularies based on either SKOS \citep{std:skos} or
RDFS \citep{std:rdfs} in
sect.~\ref{sect:voccontent}. Where these vocabularies -- and hence, in
particular, the permanent URIs of their RDF resources (``terms'')
-- are managed by the
IVOA, they need to be reviewed and consensus must be reached. A process to
ensure this is described in
sect.~\ref{sect:management}. In order
to provide certain guarantees to clients, sect.~\ref{sect:deployment}
defines minimal standards for how IVOA-managed vocabularies must be made
available. To help adopters who are merely looking for simple
vocabulary-related recipes, sect.~\ref{sect:withoutrdf} discusses how IVOA
vocabularies can be used without knowledge of RDF.

The non-normative appendices~\ref{app:tools} and \ref{app:curtech}
describe the tooling
currently used or recommended for building and managing vocabularies in the
IVOA.


\subsection{Role within the VO Architecture}

\begin{figure}
\centering

\includegraphics[width=0.9\textwidth]{role_diagram.pdf}
\caption{Architecture diagram for this document}
\label{fig:archdiag}
\end{figure}

Fig.~\ref{fig:archdiag} shows the role the Vocabularies in the VO standard
plays within the IVOA architecture \citep{2010ivoa.rept.1123A}.

This standard defines a set of conventions and procedures on
top of several W3C standards that can be adopted by other VO standards
that require interoperable, consensus vocabularies, such as:

\begin{bigdescription}
\item[Datalink \citep{2015ivoa.spec.0617D}] Datalink includes a
vocabulary letting clients work out the kind of artefact a row pertains
to.

\item[VOResource \citep{2018ivoa.spec.0625P}] VOResource 1.1 comes with
several (rather flat) vocabularies enumerating, for instance, the types
of relationships between VO resources, their intended audiences, or
classes of actions performed on them.

\item[VOEvent \citep{2006ivoa.spec.1101S}] VOEvent defines \emph{Why}
and \emph{What} elements which, while not formally required to be drawn
from a specific vocabulary in version 1.11, certainly become much more
useful if they are.

\item[VOTable \citep{2019ivoa.spec.1021O}] VOTable, in its version 1.4,
introduces vocabularies for time scales and reference positions.


\item[UCDs \citep{2007ivoa.spec.0402M}] UCDs are related to vocabularies in
that they provide machine-readable semantics. Because the terms listed
in the document can be combined and have an underlying grammar, however,
they go beyond standard RDF. Hence, no attempt is being made to
integrate them into the framework proposed here at this time. The
UCD atoms might be organised in an RDF vocabulary, though, and doing so
might be considered in the future.
\end{bigdescription}

Other VO standards can do with fewer normative constraints; using W3C
standards without the extra requirements laid down here is explicitly
encouraged where the use cases do not require the extra management and
definition effort, or where perhaps more complex structures (e.g., full
ontologies) must be employed. An example of a direct use of SKOS
without adoption of the present document is the Simulation Data Model
SimDM \citep{2012ivoa.spec.0503L}, where several fields constrain their
values to be \vocterm{skos:narrower} than certain top-level concepts.

\subsection{Relationship to Vocabularies in the VO Version 1}
\label{sect:version1rel}

Published in 2009, version 1.19 of the IVOA Recommendation on
Vocabularies in the VO had an outlook fairly different from the present
document: the big use case was VOEvent's Why and What, and so its focus
was on large, general-purpose vocabularies, of which several existed even
back then, while an overhaul of a thesaurus of general astronomical
terms approved by the IAU in 1993 was underway as part of IVOA's
activities. Mapping between vocabularies maintained by different VO
and non-VO parties seemed to be the way to ensure interoperability and
therefore played a large role in the document. Also, the use cases
called for ``soft'' relations, which is why the standard confined itself
to SKOS as the vocabulary formalism.

Since then, ``the'' large astronomy thesaurus has been maintained
outside of the IVOA (the UAT\footnote{\url{http://astrothesaurus.org}}),
and there is hope that its takeup will be sufficient to make mapping
between it and, say, legacy journal keyword systems an exercise general
clients will not have to perform.

Instead, in 2010, a fairly formal vocabulary of what
should be properties (in the RDF sense) rather than \vocterm{skos:Concept}-s
was required during the development of the datalink standard. The
vocabulary was (and still is) small in comparison to, say, the UAT. In
contrast to the expectations of Vocabularies~1, the plan had been that
most data providers would work with this small vocabulary, and terms
from external vocabularies would only be used as temporary stand-ins
until the consensus vocabulary was updated. Of course, this required a
process for managing such vocabularies. The lack of such a process
became even more noticeable when VOResource 1.1 and VOTable 1.4
introduced vocabularies of their own similar in size and scope to the
datalink vocabulary.

On the other hand, we are not aware of a single attempt to map
between different vocabularies in a VO context, and the SKOS versions of
some vocabularies that Vocabularies 1 declared as normative in its
section~4 were largely unused and have been unmaintained for a while now.

Since large parts of the original specification turned out to be
irrelevant or unsustainable as the VO ecosystem evolved,
while some core requirements found later
were not addressed, it was decided to prepare a new major version of the
Vocabularies in the VO standard.

\subsection{Reading Guide}

We hope that software authors or annotators just wanting to consume IVOA
vocabularies or use them to annotate documents will be able to
do so after reading just section~\ref{sect:withoutrdf}. In particular, no
deeper understanding of RDF should be necessary.

Persons intending to participate in vocabulary evolution should skim
sect.~\ref{sect:voccontent}, in particular the subsection on the kind of
vocabulary they want to modify, and must study
sect.~\ref{sect:management}.

Readers unfamiliar with RDF should read \citet{local:normanspaper} before
reading anything outside of section~\ref{sect:withoutrdf}.
In particular, we assume familiarity with all RDF
terminology discussed there. Concepts not covered by Gray's
essay will be informally introduced here. Of course, the
underlying W3C standards are normative where applicable.



\subsection{Terminology, Conventions, Typography}

When we speak of \emph{term} here, that either means a \vocterm{skos:Concept}
in SKOS vocabularies, an \vocterm{rdfs:Class} in RDF class vocabularies,
or an \vocterm{rdf:Property} in RDF property vocabularies. We also use
\emph{term} for ``the string after the hash character in
the RDF resource URI'', i.e., the machine-readable string typically used
in annotation. It is rarely necessary to distinguish between the two
meanings.

We refer to classes and properties by CURIEs \citep{std:curie}, i.e.,
URIs shortened by replacing long strings with compact prefixes and a
colon. The prefixes in this
document correspond to the following base URIs:

\begin{compactitem}
\item dc -- \url{http://purl.org/dc/terms/}
\item rdf -- \url{http://www.w3.org/1999/02/22-rdf-syntax-ns#}
\item rdfs -- \url{http://www.w3.org/2000/01/rdf-schema#}
\item owl -- \url{http://www.w3.org/2002/07/owl#}
\item skos -- \url{http://www.w3.org/2004/02/skos/core#}
\item ivoasem -- \url{http://www.ivoa.net/rdf/ivoasem#}
\end{compactitem}
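
As an illustration only, the following Python sketch expands a CURIE with
the prefix table above and extracts the term, i.e., the string after the
hash character, from a resource URI. The function names
\texttt{expand\_curie} and \texttt{term\_of} are made up for this example
and do not belong to any IVOA library.

\begin{lstlisting}[language=python]
# Prefix table copied from the list above.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ivoasem": "http://www.ivoa.net/rdf/ivoasem#",
}


def expand_curie(curie):
    """Return the full URI for a CURIE such as skos:Concept."""
    prefix, _, reference = curie.partition(":")
    if prefix not in PREFIXES:
        raise ValueError("unknown prefix: " + prefix)
    return PREFIXES[prefix] + reference


def term_of(uri):
    """Return the string after the (last) hash character of a URI."""
    return uri.rpartition("#")[2]
\end{lstlisting}

With these definitions, \texttt{expand\_curie("skos:Concept")} yields
\url{http://www.w3.org/2004/02/skos/core#Concept}, and passing that URI
to \texttt{term\_of} gives back \texttt{Concept}.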

Vocabulary terms are written in italics (e.g., \vocterm{rdfs:Class})
and, where supported, in a reddish hue. As common in IVOA
specifications, XML element and attribute names are written in
typewriter italic (e.g., \xmlel{img}).

\section{Derivation of Requirements (Non-Normative)}

\subsection{Use Cases}
\label{sect:usecases}

The normative content of this document is guided by a set of
requirements derived from the following use cases.

\subsubsection{Controlled Vocabulary in VOResource}
\label{uc:simplevoc}

In VOResource, in certain use cases clients have to find services that
publish a given data collection. This is effected by linking the resource
records for service and data with a
DataCite-compatible \vocterm{isServedBy} relationship.
Its concrete literal needs to be reliably defined in order to let
clients find such relationships by a simple string comparison in RegTAP
queries.

A related use case is that validators can flag errors (or at least
warnings) when resource records use terms that are not part of some
controlled vocabulary (e.g., content levels or types of events in a
resource's history). Very typically, such out-of-vocabulary terms
indicate small oversights on the part of the resource record author that
will lead to hard-to-debug problems in data discovery.
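
A validator check of this kind can be sketched in a few lines of Python.
This is a minimal illustration, not an existing validator: the function
name \texttt{unknown\_terms} is hypothetical, and the vocabulary shown is
a one-term excerpt rather than the full list of relationship types.

\begin{lstlisting}[language=python]
def unknown_terms(used_terms, vocabulary):
    """Return, sorted, the used terms missing from the vocabulary.

    The comparison is a plain, case-sensitive string comparison.
    """
    return sorted(set(used_terms) - set(vocabulary))


# Hypothetical one-term excerpt of a relationship type vocabulary.
RELATIONSHIP_TYPES = {"isServedBy"}

# The second entry differs from the controlled term only in case,
# a typical small oversight of a resource record author.
flagged = unknown_terms(["isServedBy", "isservedby"], RELATIONSHIP_TYPES)
\end{lstlisting}

Here, \texttt{flagged} contains only the miscased \texttt{isservedby}.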

\subsubsection{Controlled Vocabularies in VOTable}
\label{uc:votvoc}

VOTable 1.4 constrains two attributes of the TIMESYS elements
-- reference positions and time
scales -- using vocabularies.
While with time scales the situation is not fundamentally
different from the VOResource case discussed in
use case~\ref{uc:simplevoc} -- a simple enumeration of agreed-upon strings
is enough to uniquely determine what operations need to be performed to
combine times given in different time scales --, the situation for
reference positions is probably different. There, even if a client does
not exactly know the location of, say, the Hubble Space Telescope at any
given time, several important use cases can already be satisfied if a
client knows that it is in low Earth orbit (e.g., assuming a reference
position Geocenter and adjusting the systematic error estimates). For
this, a client needs information of the type ``\vocterm{HST}
\vocterm{is-close-to} \vocterm{GEOCENTER\/}'' (or similar).

There is also another difference between this and at least the
VOResource relationship vocabulary from use case~\ref{uc:simplevoc}
in that the latter is property-like, as
in ``Resource-1 \vocterm{isServedBy} Resource-2\/''. In contrast with
this, a time scale would be used like ``Time-coordinate
\vocterm{is-given-in} \vocterm{TT\/}''. In RDFS terminology, time scales
are therefore better modelled as classes rather than properties.

\subsubsection{Datalink Link Selection}
\label{uc:links}

In Datalink, clients receive a set of links
to pieces of information (e.g., previews, additional metadata,
progenitors, or
derived data) and need to present to the user only those items
relevant to the task at hand. For instance, in a discovery phase, only
previews should be offered, while scientific exploitation would call for
cutout services, alternate formats, or derived data. For debugging,
progenitors should be made accessible, and so on.

Operators of datalink services, on the other hand, want to be precise in
their annotation of datasets. For instance, they may want to discern
among progenitors: the raw image, a dark frame, and a flat field. In all
these cases, clients should still be able to work out that such
artefacts are progenitors.
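
To make this concrete, the following Python sketch shows how a client
might decide whether a specific term is a kind of progenitor by walking up
a hierarchy. It is a sketch under stated assumptions: the term strings and
the \texttt{WIDER} mapping are invented for illustration and are not taken
from the actual datalink vocabulary.

\begin{lstlisting}[language=python]
# Hypothetical fragment of a term hierarchy: every term maps to its
# single wider term, or to None if it has none.
WIDER = {
    "progenitor": None,
    "raw-image": "progenitor",
    "dark-frame": "progenitor",
    "flat-field": "progenitor",
    "preview": None,
}


def is_kind_of(term, ancestor, wider=WIDER):
    """Return True if term is ancestor or has it among its wider terms."""
    while term is not None:
        if term == ancestor:
            return True
        term = wider.get(term)
    return False
\end{lstlisting}

A client offering progenitors for debugging would then keep every link for
which \texttt{is\_kind\_of(term, "progenitor")} is true, regardless of how
precisely the service operator annotated it.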

\subsubsection{VOEvent Filtering, Query Expansion}
\label{uc:filtering}

In VOEvent, an event stream can contain a classification of what the
observers believe was observed, for instance ``supernova Ia explosion''.
While an event stream from one project might provide a classification on
that level for some event, it might not (yet) be able to do that in
another event, and a different event stream might not be able to
distinguish between different sorts of supernovae at all.

In this situation, an event broker looking for supernovae of type Ia
will filter out anything not related to supernovae; however, since for
one reason or another a Ia supernova might only be tagged as supernova,
it will want to widen its filter somewhat, where some backend process
might prioritise events classified as Ia upstream over those only tagged
as a generic supernova, and those, again, over those tagged explicitly
as some different type of supernova.
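
The prioritisation just described could look roughly like the following
Python sketch. It is deliberately simplistic and only considers one level
of hierarchy; the term strings, the \texttt{WIDER} mapping, and the
function names are assumptions made for this example, not part of VOEvent
or of any IVOA vocabulary.

\begin{lstlisting}[language=python]
# Hypothetical classification terms with their wider terms.
WIDER = {
    "supernova": None,
    "supernova-ia": "supernova",
    "supernova-ii": "supernova",
    "nova": None,
}


def priority(tag, wanted, wider=WIDER):
    """Rank a tag against the wanted term; smaller is better.

    0: tagged exactly as wanted,
    1: tagged with the wider term of wanted,
    2: tagged with another term sharing that wider term,
    None: unrelated, to be filtered out.
    """
    parent = wider.get(wanted)
    if tag == wanted:
        return 0
    if parent is not None and tag == parent:
        return 1
    if parent is not None and wider.get(tag) == parent:
        return 2
    return None


def rank(events, wanted):
    """Return the names of related events, best match first.

    events is a sequence of (name, tag) pairs.
    """
    scored = [(priority(tag, wanted), name) for name, tag in events]
    return [name for _, name in sorted(s for s in scored if s[0] is not None)]
\end{lstlisting}

For a broker interested in \texttt{supernova-ia}, an event tagged
\texttt{nova} is dropped, while events tagged \texttt{supernova-ia},
\texttt{supernova}, and \texttt{supernova-ii} are kept in that order.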

Similar use cases exist, for instance, in the discovery of simulations
and possibly for subjects of VO resources.


\subsubsection{Vocabulary Updates in VOResource}
\label{uc:deprecation}

In VOResource 1.0 \citep{2008ivoa.spec.0222P}, relationship types
like \vocterm{served-by} or
\vocterm{service-for} were defined. Later, DataCite defined equivalent
terms \vocterm{IsServedBy} and \vocterm{IsServiceFor}. Arguably, the VO should,
as far as sensible, take up standards in the wider data management
community, and so VOResource 1.1 adopts the DataCite terms. In a minor
version, it cannot forbid the old terms. It can, however, say not only
``\vocterm{served-by\/} is the same as \vocterm{isServedBy\/}'' but also
``Use the latter term in preference to the former''. If this information is
available machine-readably, validators can warn against the use of
deprecated terms and user interfaces can transparently replace
deprecated terms with current ones. This latter use case is
already specified in RegTAP 1.1 \citep{2019ivoa.spec.1011D}.
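
Assuming the ``use instead'' information has already been loaded into a
simple mapping, both behaviours fit into one small Python function. The
mapping below contains only the pair quoted in the text above; the names
\texttt{USE\_INSTEAD} and \texttt{modernise} are hypothetical.

\begin{lstlisting}[language=python]
# Deprecated term -> preferred replacement (only the pair quoted above).
USE_INSTEAD = {"served-by": "isServedBy"}


def modernise(term, use_instead=USE_INSTEAD):
    """Return (current_term, warning); warning is None if term is current."""
    if term in use_instead:
        replacement = use_instead[term]
        warning = "'%s' is deprecated; use '%s'" % (term, replacement)
        return replacement, warning
    return term, None
\end{lstlisting}

A validator would report the warning, whereas a user interface would
silently continue with the replacement term.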

Another use case in the context of VOResource and vocabulary updating
is the definition of content levels. In VOResource 1.0, a list of
terms was adopted that was far too fine-grained in the area of public
outreach, distinguishing, for instance, ``Middle School'' from
``Secondary Education''. While this granularity was useful for the
original realm of the list of terms, in the VO it resulted in extremely
inhomogeneous annotation. Obviously, persons employed in research
institutions can hardly be expected to assess needs and capabilities of
middle school versus elementary school educators. Eventually, for
VOResource 1.1 a three-term list was drawn up and is now actually used.
To avoid a repetition of such an experience, we want to enable small
initial vocabularies easily extendable as new terms are actually needed
and the use of the existing terms is well understood.


\subsubsection{Vocabularies in VO-DML}

The modelling language VO-DML \citep{2018ivoa.spec.0910L} lets model
designers constrain attribute values through external resources defined
through a vocabulary URI and possibly a top concept. The standard
mentions both SKOS -- inspired by version 1 of this document -- and RDFS
as possible technologies for such constraints.

Depending on the nature of the attributes constrained, modellers might
foresee the need for having these vocabularies managed by the IVOA. Of
course, that is up to the modeller: There are certainly many cases in
which there is no need for the overhead this specification brings with
it, be it because vocabularies are externally defined or because the
concrete application profits from less-constrained vocabularies.

\subsubsection{Discovering Meanings}
\label{uc:discovering}

Software developers or researchers want to work out
what some term mentioned ``means'' (where we are agnostic as to what
``means'' should mean here). If the term URI alone is insufficient,
they can simply paste the resource URI of the term into a web browser
and read (at least) its description and perhaps find out even more using
relationships between terms.

\subsubsection{Simple Review Process}
\label{uc:simplereview}

As vocabularies evolve, new terms are being added to
vocabularies. To facilitate their review and enable rapid uptake
of the proposed terms, it is desirable that new terms and even
new vocabularies are immediately visible to users and tools.
Note that since terms under review might be modified or removed later,
this use case is somewhat in conflict with the basic requirement
of stable vocabularies (i.e., a document valid once will not
become invalid later because of changes in vocabularies).

\subsubsection{Understanding Vocabulary Evolution}
\label{uc:understanding}

When a question comes up, such as what \vocterm{calibration} actually means
in the datalink core vocabulary, and the (legacy) description is not
sufficiently clear, people can go back to the discussions that led up
to the addition of that term. This will also help clarify existing
usage that might have begun at the time of the initial definition.

\subsubsection{Offline operation}
\label{uc:offline}

A system doing, say, coordinate transformations might run without an internet
connection but still needs to use semantic resources on frames and
reference positions (e.g., figure out that a given space probe is in L1
and use that as reference position). To do that, it wants to use a
previously downloaded copy of the vocabulary.

\subsubsection{UAT in VOResource}
\label{uc:uat}

VOResource 1.1, in the description of the \xmlel{subject} element, says
that its content ``should be drawn from the Unified Astronomy Thesaurus''
(here: UAT). This is intended to later facilitate interactive topic
navigation within the Registry or semantic expansion of Registry queries
(``include narrower terms'').


\subsection{Requirements}
\label{sect:requirements}

\subsubsection{Lists of Terms}
\label{req:lists}

We need to be able to represent simple lists of terms even for the most
basic use case~\ref{uc:simplevoc}. As per
use case~\ref{uc:votvoc}, we will have to represent instances of both
\vocterm{rdf:Property} and \vocterm{rdfs:Class} (though not necessarily
in one vocabulary). In order to not break existing practices (e.g.,
use cases \ref{uc:simplevoc}, \ref{uc:votvoc}, \ref{uc:links}), the
machine-readable terms must be allowed to follow existing patterns of
essentially human-readable identifiers (against external best practices
of using non-informative URI forms). In general, in essentially all use
cases discussed, making the machine-readable terms discernible by a
human is an advantage.

\subsubsection{Hierarchies of Terms}
\label{req:hierarchy}

Both use case~\ref{uc:links} and use case~\ref{uc:filtering} require a hierarchy
of terms, where clients can find wider and potentially narrower terms
relative to an original one. There is a difference,
however: in the datalink use-case, strict \vocterm{is-a} relationships
are what clients need (e.g., ``give me all kinds of previews''). In the
VOEvent case, however, a somewhat softer sort of hierarchy is required.
For instance, a filter for accretion disks might very well expand to
match both quasars and cataclysmic variables. Hence, we want to
be able to represent strict class hierarchies as well as thesaurus-like
soft knowledge structures.

\subsubsection{Tree-like Hierarchies}
\label{req:tree}

Where we expect some sort of semi-formal inference to take place on the
vocabularies, the hierarchy should be a tree in order to facilitate
traversal and controlled query expansion. In other words, outside of
SKOS we do not support multiple inheritance. Use cases requiring
something equivalent would have to resort to supporting multiple terms
on the annotation level.

\subsubsection{Consensus Vocabularies}
\label{req:consensus}

Essentially all of our use cases will be much easier to implement if
clients can work through simple string comparisons. Therefore,
wherever feasible IVOA standards should build on IVOA-sanctioned,
consensus vocabularies.

\subsubsection{Deprecating Terms}
\label{req:deprecating}

While we believe at this point that terms once approved by the IVOA
should never disappear -- for instance, because validators might
otherwise flag previously valid instance documents as invalid --, use
case~\ref{uc:deprecation} shows that some way of declaring
deprecations must be foreseen.

\subsubsection{Public Availability of Machine-Readable Vocabularies}
\label{req:machine}

In particular in use cases~\ref{uc:links} and \ref{uc:filtering},
clients can flexibly incorporate vocabulary updates without code
changes, perhaps even without re-deployment, if vocabularies are
available at constant, public URIs, where clients can retrieve them in
formats reasonably easy to parse.

Use case~\ref{uc:discovering} implies that at least one representation
of the vocabulary should be human-readable.

\subsubsection{Minimal Term Metadata}
\label{req:mtm}

To support use case~\ref{uc:discovering}, all terms in IVOA vocabularies
MUST come with a non-trivial description.

\subsubsection{Simple Cases do not Require RDF Tooling}
\label{req:nordf}

(Not derived from any specific use case.) Since libraries implementing
(some subset of) RDF tend to be rather massive and thus appear
disproportionate when all a client wants is an up-to-date list of terms
with their descriptions, at least the basic use cases must not require
specific RDF tooling. Indeed, simple uses should not require an
understanding of RDF in the first place.


\subsubsection{Vocabulary Evolution}
\label{req:evolution}

Most use cases make it desirable that terms can be added to existing
vocabularies; this is very clear for the reference positions in
use case~\ref{uc:votvoc}, where new instruments would imply new
terms. The history of content level annotation in VOResource mentioned
in use case~\ref{uc:deprecation} illustrates the desirability of a
simple process that invites standard authors to start with minimal
vocabularies, relying on later extensions.

\subsubsection{Traceable Provenance}
\label{req:traceable}

To satisfy use case~\ref{uc:understanding}, the considerations that led
to the adoption or modification of a term must be documented publicly
in sufficient detail. It is clearly an advantage if a brief, accessible
summary of these considerations can easily be found without, say,
resorting to version control logs.

\subsubsection{Preliminary Vocabularies and Terms}
\label{req:preliminary}

In use case~\ref{uc:simplereview}, it is desirable to admit
``preliminary'' vocabularies and terms. For these, both humans
and machines must be able to discern a temporary status, and
their use implies that the general rule ``once valid, always
valid'' does not apply. Validators and similar software could
then add notices to that effect in their outputs.

\subsubsection{Vocabulary Files are Usable Stand-Alone}
\label{req:standalone}

Vocabulary files need to be cacheable without applications having to
manage extra metadata (e.g., the URL from which the file was obtained)
in order to easily satisfy use case~\ref{uc:offline} (or other scenarios
in which vocabulary content cannot be retrieved from the IVOA
site for each session).

\subsubsection{Externally Curated Vocabularies and VO Tooling}
\label{req:external}

Regrettably, VOResource does not explain what use case~\ref{uc:uat} would
look like in actual documents, and the example given in the document
clearly does not use UAT concepts.

The first difficulty in a straightforward uptake is that UAT URIs look
like \url{http://astrothesaurus.org/uat/1774}. Given that, should
publishers have such URIs in \xmlel{subject}? Or should they rather use
just the last URI segment for conciseness? Or perhaps the preferred
labels, in keeping with the style of existing subject content and its
use by clients (which typically look for natural language in subject),
even though the labels are not considered stable?

Regardless of how VOResource clarifies this matter, UAT artefacts (e.g.,
SKOS files) do not match some of our other requirements. In particular,
the human-readable URIs from \ref{req:lists}, the specific way we
satisfy \ref{req:machine}, and the non-RDF requirement \ref{req:nordf} are
not immediately satisfied by the UAT as distributed at the time of
writing.

For simple, uniform use of such externally curated vocabularies, it
should be possible to have some sort of endorsement process and then
distribute the vocabularies in a form compliant with this specification.
This will entail IVOA-specific concept URIs, and we must be able to
express that these resources have the same meaning as the ones
externally maintained.


\subsection{Non-Requirement}

This specification is not called ``Semantics in the VO'' or the like
because we do \emph{not} intend to prescribe ways to turn any VO
artefact into RDF triples. Indeed, for many existing vocabularies, it
is left open what exactly the domain or range of properties might be or
what subject and predicate the classes or concepts should be used with.

This is partly because this would substantially complicate the
generation of vocabularies -- which would quickly turn into proper
ontologies --, partly because the information encoded by
the triples has traditionally been expressed using techniques developed
by the Data Models working group.

In particular with a view to later use in linked data scenarios,
vocabulary authors should nevertheless take care that, given appropriate
properties or annotation tools, the vocabularies \emph{could} be used in
meaningful RDF triples.

Conversely, this specification is written with future ``deeper''
semantics in the VO in mind; tools restricting their operations to the ones
discussed here should not break when future specifications enrich
existing vocabularies towards full ontologies.


\section{Using IVOA Vocabularies without RDF Tooling}
\label{sect:withoutrdf}

RDF is a
powerful system for expressing a wide range of semantics and enriching
various documents with semantic information in a globally distributed
fashion. Due to its generality, handling its artefacts is relatively
involved and in general requires special tooling, non-negligible
investment in understanding RDF, and non-trivial management of URIs and
prefix mappings.

To lower the bar for an adoption of IVOA vocabularies
[requirement~\ref{req:nordf}], they are given in
two formats usable without RDF tooling or, indeed, deeper knowledge of
RDF. This section discusses these.

\subsection{Choosing Terms From IVOA Vocabularies (non-normative)}

Resource annotators can usually treat IVOA Vocabularies as simple lists
of (case-sensitive) strings with human-readable labels and definitions.
These lists can be inspected with a simple web browser.

Each IVOA vocabulary has an associated URI starting with
\url{http://www.ivoa.net/rdf}. Dereferencing that URI yields a list of
the vocabularies approved or under review.

An individual vocabulary has a
URI like \url{http://www.ivoa.net/rdf/refposition}. Dereferencing this URI
with a web browser (or, indeed, any user agent indicating it prefers
text/html media) redirects to a tabular representation of the vocabulary,
giving:
\begin{itemize}
\item \emph{terms} -- i.e., the strings actually used in annotation,
\item \emph{labels} -- i.e., strings that should be presented to humans instead of
the slightly formalised terms, and
\item \emph{descriptions}, which should
be sufficiently precise to allow someone with a certain amount
of domain expertise to decide whether a certain ``thing'' is or is not
covered by the term (or more precisely, the underlying concept).
\end{itemize}
692
693 Some terms may be marked as deprecated, in which case they should no
694 longer be used in new annotations. In most cases, deprecated terms will
695 come with information about what to use instead.
696
697 Some terms may be marked as preliminary. Such terms might disappear
698 without further notice. Casual users should avoid the use of such
699 terms; if they find they want to use them, the semantics working group
700 requests notification over its mailing list, since such use is clearly
701 relevant to the term's adoption process.
702
703 Once a term is located within the HTML page, annotators can usually
704 directly use it in instance documents. For instance, continuing the
705 refposition example, the string \texttt{BARYCENTER} found in the
706 vocabulary is directly used in VOTable's TIMESYS element.
707
708 Some applications (Datalink being the prime example) instead use URIs
709 relative to the vocabulary URI. In practical terms, this just means
710 that a hash sign is prepended to the term (e.g., \texttt{\#progenitor}).
711
712 This latter practice builds on the property of IVOA vocabularies that if
713 one adds the term as fragment to the vocabulary URI (e.g.,
\url{http://www.ivoa.net/rdf/refposition#BARYCENTER}), that URI is the full,
715 RDF-compliant resource identifier of the concept. When used in
716 HTML-aware user agents (such as a web browser), dereferencing this URI
717 (i.e., opening it) will give the table of terms with the chosen term
718 highlighted. How exactly this is represented depends on the user agent.
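
As a minimal sketch of the two forms of term reference just described
(the function names \texttt{concept\_uri} and \texttt{relative\_reference}
are ours, chosen for illustration; they are not part of any IVOA
library), both can be derived from the vocabulary URI and the term:

```python
def concept_uri(vocabulary_uri, term):
    """Return the full concept URI: vocabulary URI, hash sign, term."""
    return f"{vocabulary_uri}#{term}"


def relative_reference(term):
    """Return the Datalink-style form: the term with a hash sign prepended."""
    return "#" + term


refposition = "http://www.ivoa.net/rdf/refposition"
print(concept_uri(refposition, "BARYCENTER"))
print(relative_reference("progenitor"))
```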
719
720
721 \subsection{Semantic Operations Without RDF Tooling}
722 \label{sect:desise}
723
724 Many VO components need a machine-readable representation of the
725 entire vocabulary, for instance in order to
726 (cf.~sect.~\ref{sect:usecases}):
727
728 \begin{compactitem}
729 \item display labels and descriptions for terms to users,
730 \item perform query expansion or similar exploitation of hierarchical
731 relationships, or
732 \item validate annotated instances for the use of correct and current
733 terms.
734 \end{compactitem}
735
736 \subsubsection{Vocabularies in desise}
737
738 To let VO programs perform such tasks with minimal technical overhead,
739 in addition to the RDF artefacts described in
740 sect.~\ref{sect:deployment}, IVOA vocabularies are also available in an
741 ad-hoc format defined here for VO-internal use, nicknamed ``desise''
742 (``dead simple semantics''). Clients can retrieve vocabularies in
743 desise by requesting the vocabulary URI with the HTTP accept header set
744 to \texttt{application/x-desise+json}.
745
746 What is returned is a JSON-encoded \citep{std:JSON} mapping (``object''
747 in JSON terms)
748 containing the following keys (all mandatory):
749
750 \begin{description}
751 \item[uri] The vocabulary URI. All terms occurring in desise documents
752 can be turned into full, RDF-compliant resource URIs by prefixing them
753 with this URI and a hash character.
754 \item[flavour] The flavour of the vocabulary (can generally be ignored;
755 see sect.~\ref{sect:voccontent}).
756
757 \item[terms] A JSON object mapping the (machine-readable) terms to a
758 JSON object giving the term's properties as described below.
759 The keys in \textit{terms} are the strings used in
760 machine-readable data.
761 \end{description}
762
763 The JSON objects present as values in the terms object can have the
764 following keys:
765
766 \begin{description}
767 \item[label] (mandatory)
768 A human-readable label for display purposes; clients should
769 always try to display this rather than the raw term.
770
771 \item[description] (mandatory) A human-readable definition of the underlying
772 concept.
773
774 \item[deprecated] present and mapped to a reserved value if the term is
775 deprecated and should no longer be used; validators will warn against
776 its use.
777
778 \item[preliminary] present and mapped to a reserved value if the term
779 is preliminary, meaning that in contrast to the other, ``eternal'' terms
780 it can disappear again; validators should qualify a validation as
781 preliminary if a document uses such a term.
782
783 \item[wider] (mandatory) A JSON array
784 of ``wider'' terms. Most IVOA vocabularies are
785 tree-like, and for them, there is only up to one term in here, which
would be the parent node, which is the hypernym of the current term.
787 In SKOS-flavoured vocabularies, multiple terms can be here, and the
788 meaning of ``wider'' is a bit less clear-cut. The \textit{wider} list
789 is empty for top-level terms.
790
791 \item[narrower] (mandatory) A JSON array
792 of ``narrower'' terms. In SKOS-flavoured
793 vocabularies, that is just a list of all terms that list the current
794 term as wider. Otherwise, the vocabularies are tree-like and
795 \textit{narrower} is a list of all terms on the term's branch and below
796 it in the tree (it is the ``transitive closure of the inverse of
797 wider''). This is much more easily understood in an example, which we
798 give below in the discussion on addressing use case~\ref{uc:links}.
799 \end{description}
800
801 Note that, while \textit{wider} and \textit{narrower} are mandatory
802 keys, their values can of course be empty lists.
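
The key requirements above lend themselves to a simple structural
check. The following is a minimal sketch and not part of this
specification; the function name \texttt{check\_desise} and the wording
of its messages are assumptions made for illustration. It only tests
for the presence of the mandatory keys and does not inspect their
values.

```python
TOP_LEVEL_KEYS = {"uri", "flavour", "terms"}
MANDATORY_TERM_KEYS = {"label", "description", "wider", "narrower"}


def check_desise(voc):
    """Return a list of problems found in a parsed desise mapping.

    An empty list means all mandatory keys are present.
    """
    problems = []
    for key in sorted(TOP_LEVEL_KEYS - set(voc)):
        problems.append(f"missing top-level key: {key}")
    for term, props in voc.get("terms", {}).items():
        for key in sorted(MANDATORY_TERM_KEYS - set(props)):
            problems.append(f"term {term}: missing key {key}")
    return problems
```

A client could run such a check once after loading a vocabulary and
refuse to work with documents for which the returned list is non-empty.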
803
See appendix~\ref{app:desiseexample} for an example of a vocabulary
805 represented in desise.
806
807 \subsubsection{Working with desise (non-normative)}
808
809 For illustration, here are recipes showing how to address
810 the various use cases in Python:
811
812 \paragraph{Load a vocabulary} Using the popular requests module:\\
813 \begin{lstlisting}
814 import requests
815 voc = requests.get(
816 "http://www.ivoa.net/rdf/uat",
817 headers={"accept": "application/x-desise+json"}
818 ).json()
819 \end{lstlisting}
820
821 Note, however, that non-trivial clients should cache files retrieved in
822 this way for a reasonable time span; IVOA vocabularies typically do not
823 change on time scales of months.
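
One possible way to implement such caching is sketched here. The
names \texttt{fetch\_desise} and \texttt{load\_cached}, the
\texttt{cache\_path} parameter, and the cache lifetime of 30 days are
all assumptions made for illustration. The sketch uses the standard
library's \texttt{urllib} rather than requests so that it has no
external dependencies.

```python
import json
import os
import time
import urllib.request

MAX_AGE = 30 * 24 * 3600  # assumed cache lifetime in seconds (30 days)


def fetch_desise(voc_uri):
    """Retrieve a vocabulary in desise from the network."""
    request = urllib.request.Request(
        voc_uri, headers={"Accept": "application/x-desise+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def load_cached(voc_uri, cache_path, fetch=fetch_desise, max_age=MAX_AGE):
    """Return the vocabulary, re-fetching only if the cached copy is too old."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        age = None  # no cached copy yet
    if age is not None and age < max_age:
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    voc = fetch(voc_uri)
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(voc, f)
    return voc
```

Passing the retrieval function as the \texttt{fetch} argument keeps the
caching logic independent of the HTTP library in use.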
824
825 \paragraph{See if a term is in the vocabulary} (\ref{uc:simplevoc},
826 \ref{uc:votvoc})\\ \lstinline{term in voc["terms"]}
827
828 \paragraph{See if a term is deprecated} (\ref{uc:deprecation})\\
829 \lstinline{"deprecated" in voc["terms"][term]}
830
831 \paragraph{Find a human-readable label for a term}
832 (\ref{uc:discovering})\\
833 \lstinline{voc["terms"][term]["label"]}
834
835 \paragraph{Find a human-readable description for a term}
836 (\ref{uc:discovering})\\
837 \lstinline{voc["terms"][term]["description"]}
838
839 \paragraph{Find out if a term is preliminary} (\ref{uc:simplereview})\\
840 \lstinline{"preliminary" in voc["terms"][term]}
841
842 \paragraph{Query expansion: select branch} (in \ref{uc:links}, select all
progenitors, including flat fields, dark frames, etc.)
844 \begin{lstlisting}[language=python]
845 base_term = "progenitor"
846 expanded_terms = set(
847 [base_term]
848 +voc["terms"][base_term]["narrower"])
849 is_match = datalink_row["semantics"][1:] in expanded_terms
850 \end{lstlisting}
851
852 \paragraph{SKOS-type query expansion by neighbouring terms}
853 (\ref{uc:filtering})
854 \begin{lstlisting}[language=python]
855 assert voc["flavour"]=="SKOS"
856 expanded_terms = set(
857 [base_term]
858 +voc["terms"][base_term]["narrower"]
859 +voc["terms"][base_term]["wider"])
860 is_match = keyword_found in expanded_terms
861 \end{lstlisting}
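
The membership, deprecation, and preliminary tests above can be
combined into a single classification step, for instance in a
validator. This is a sketch only: the function name
\texttt{check\_term}, its return values, and the toy vocabulary are
hypothetical. Like the recipes above, it only tests whether the
\textit{deprecated} and \textit{preliminary} keys are present; the
empty strings in the toy data are placeholders, not the reserved value.

```python
def check_term(voc, term):
    """Classify a term as "unknown", "deprecated", "preliminary" or "ok"."""
    if term.startswith("#"):
        term = term[1:]  # accept Datalink-style "#term" references
    props = voc["terms"].get(term)
    if props is None:
        return "unknown"
    if "deprecated" in props:
        return "deprecated"
    if "preliminary" in props:
        return "preliminary"
    return "ok"


# Hypothetical, cut-down vocabulary for illustration only.
toy = {
    "uri": "http://www.ivoa.net/rdf/example",
    "flavour": "RDF Property",
    "terms": {
        "current": {"label": "Current", "description": "An approved term.",
                    "wider": [], "narrower": []},
        "old": {"label": "Old", "description": "A deprecated term.",
                "wider": [], "narrower": [], "deprecated": ""},
        "new": {"label": "New", "description": "A preliminary term.",
                "wider": [], "narrower": [], "preliminary": ""},
    },
}

for candidate in ["current", "#old", "new", "missing"]:
    print(candidate, check_term(toy, candidate))
```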
862
863
864 \section{Vocabulary Content}
865 \label{sect:voccontent}
866
867 IVOA vocabularies MUST be based on W3C's Resource Description Framework.
868 Details on required serialisations are given in
869 sect.~\ref{sect:deployment}. This section deals with what kinds of
870 statements users of IVOA vocabularies SHOULD evaluate to ensure
871 interoperability. Statements of other types are legal in IVOA
872 vocabularies but are not expected to be interpreted interoperably.
873 Clients MAY ignore them.
874
875 In IVOA vocabularies, the concept URI MUST begin with
876 \url{http://www.ivoa.net/rdf}\footnote{In retrospect, the unnecessary
877 ``www'' in this URI is somewhat regrettable, but existing vocabularies
878 have used URIs including it, and it seems a small price to pay for
879 having uniform URIs.}. It is recommended to not introduce
880 additional hierarchy levels, i.e., vocabulary URIs SHOULD be direct children
881 of \texttt{rdf}\footnote{Some existing vocabularies do not follow this
882 rule; since vocabulary URI changes will break certain usage scenarios,
883 their URIs are still retained.}.
884
885 Since all vocabularies specified here are
886 single-file, the full term (i.e., RDF resource)
887 URI is formed by appending a hash sign
888 and a fragment identifier. In IVOA vocabularies, this fragment
889 identifier MUST consist of ASCII letters, numbers, underscores and
890 dashes exclusively [for requirement~\ref{req:machine}].
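
The constraints on concept URIs stated so far can be checked
mechanically. The following sketch is illustrative only; the function
names are ours, and requiring a slash after \texttt{rdf} in the prefix
is an assumption about how ``begin with'' is to be read.

```python
import re

# ASCII letters, digits, underscores and dashes only; at least one character.
FRAGMENT_RE = re.compile(r"[A-Za-z0-9_-]+")
URI_PREFIX = "http://www.ivoa.net/rdf/"  # assumed reading of the prefix rule


def is_valid_fragment(fragment):
    """Return True if fragment only uses the permitted characters."""
    return FRAGMENT_RE.fullmatch(fragment) is not None


def is_valid_concept_uri(uri):
    """Return True if uri has the required prefix and a valid fragment."""
    voc_uri, sep, fragment = uri.partition("#")
    return (voc_uri.startswith(URI_PREFIX)
            and sep == "#"
            and is_valid_fragment(fragment))
```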
891
892 The fragment identifiers in the vocabulary URIs SHOULD be
893 human-readable, usually by suitably contracting the
894 preferred label. In the IVOA, we do \emph{not} use natural
895 language-neutral concept identifiers but instead expect that domain
896 experts will already have an impression of a term's meaning from looking
897 at its URI.
898
899 Examples of URIs in the recommended form include:
900
901 \begin{itemize}
902 \item \url{http://www.ivoa.net/rdf/ivoasem#preliminary} for a
903 preliminary term by this specification.
\item \url{http://www.ivoa.net/rdf/timescale#TT} for the Terrestrial Time
905 time scale.
906 \item \url{http://www.ivoa.net/rdf/uat#active-galactic-nuclei} for the
907 concept ``Active Galactic Nuclei''.
908 \end{itemize}
909
910 In this specification, we distinguish three different ``flavours'' of
911 vocabularies. Each covers a particular domain of problems and is
912 therefore subject to different requirements.
913 Although the requirements are largely non-contradicting, each vocabulary must
914 be clearly identified as \emph{either} giving SKOS concepts, RDFS
915 classes or RDF properties so clients know how to extract word lists and
916 hierarchies; see sect.~\ref{sect:genprop}
917 for details.
918
919
920 \subsection{SKOS Vocabularies}
921 \label{sect:skosvoc}
922
923 SKOS vocabularies should be used where terms are organised
in informal (i.e., not necessarily strict is-a)
925 hierarchies. The classic use case here is query expansion, where, for
926 instance, a search for ``AGN'' might be expanded to include matches for
927 ``accretion disk'' (under certain circumstances).
928
929 The terms in SKOS vocabularies have the RDF type \vocterm{skos:Concept}.
930
931 \subsubsection{Properties in SKOS Vocabularies}
932 \label{sect:skosvoc-prop}
933
934 IVOA SKOS vocabularies use the following properties:
935
936 \begin{itemize}
937 \item \vocterm{skos:broader} -- interpreted in the standard SKOS sense.
The reverse property, \vocterm{skos:narrower}, MAY be given, but clients
MUST NOT depend on its presence [this satisfies
requirement~\ref{req:hierarchy}].
941
942 \item \vocterm{skos:prefLabel} -- all concepts MUST have an
943 English-language preferred label, which is an RDF plain literal [by
944 requirement~\ref{req:mtm}]. No RDF language label is allowed on the
945 literal, and only one preferred label is permitted
946 [these help requirement~\ref{req:nordf}].
947
948 \item \vocterm{skos:definition} -- all concepts MUST have a non-trivial
949 English-language definition. It is obviously impossible to define
950 ``non-trivial'' in a rigorous way; a suggested criterion is that a
951 domain expert would, given the definition, presumably arrive at a
952 similar preferred label, and recursive definitions (i.e., those using
953 the label itself) should be avoided whenever possible. Definitions in
954 non-English languages are not permitted, and only one definition is
955 permitted [again, this helps requirement~\ref{req:mtm}].
956
957 \item \vocterm{skos:exactMatch} -- for externally managed vocabularies
958 the IVOA has endorsed (see sect.~\ref{sect:externally-managed}), this
959 property links the IVOA term (subject) to the external RDF resource
960 (object) [mostly for requirement~\ref{req:external}].
961
\item General properties discussed in sect.~\ref{sect:genprop} [this is
963 for requirements~\ref{req:deprecating} and
964 \ref{req:preliminary}]. The \vocterm{ivoasem:vocflavour} of these
965 vocabularies is \verb|SKOS|.
966 \end{itemize}
967
968 This specification does not include requirements on the use or the
969 interpretation of \vocterm{skos:related},
970 \vocterm{skos:closeMatch}, \vocterm{skos:broadMatch},
971 \vocterm{skos:narrowMatch}, \vocterm{skos:ConceptScheme},
\vocterm{skos:inScheme}, \vocterm{skos:hasTopConcept},
973 \vocterm{skos:altLabel}, and \vocterm{skos:hiddenLabel}. If use cases
974 are found that require those, this specification will be amended. Until
975 then, vocabulary authors SHOULD NOT use them in order to avoid creating
976 practices that might conflict with later usage patterns.
977
978 This specification does not include requirements on the use or the
979 interpretation of the transitive SKOS properties
980 (\vocterm{skos:broaderTransitive}, \vocterm{skos:narrowerTransitive}).
981 At this point, we believe that applications requiring this type of
982 reasoning-friendly semantics should preferably use RDF class
983 vocabularies.
984
985 \subsubsection{Example (non-normative)}
986
987 Here is a term from a SKOS vocabulary conforming to this specification
988 in RDF/XML serialisation:
989
990 \begin{lstlisting}[language=XML]
<skos:Concept rdf:about="http://www.ivoa.net/rdf/AstronomicalObjects#AGN">
  <skos:prefLabel>AGN</skos:prefLabel>
  <skos:definition>A compact object in the center of a galaxy showing
  unusual emission ("active galactic nucleus").</skos:definition>
  <skos:broader rdf:resource
    ="http://www.ivoa.net/rdf/theory/AstronomicalObjects#OpticalSource"/>
  <skos:broader rdf:resource
    ="http://www.ivoa.net/rdf/theory/AstronomicalObjects#CompoundObject"/>
999 </skos:Concept>
1000 \end{lstlisting}
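
For clients that do want to read the RDF/XML directly, the following
sketch extracts labels, definitions, and wider terms using only
Python's standard library. It assumes the concepts are wrapped in an
\texttt{rdf:RDF} element declaring the usual namespaces and that the
document has the plain shape of the example; it is not a general RDF
parser. The function name \texttt{read\_skos\_concepts} and the
\texttt{example} vocabulary URI in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical sample document for illustration only.
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
 <skos:Concept rdf:about="http://www.ivoa.net/rdf/example#AGN">
  <skos:prefLabel>AGN</skos:prefLabel>
  <skos:definition>An active galactic nucleus.</skos:definition>
  <skos:broader rdf:resource="http://www.ivoa.net/rdf/example#OpticalSource"/>
  <skos:broader rdf:resource="http://www.ivoa.net/rdf/example#CompoundObject"/>
 </skos:Concept>
</rdf:RDF>"""


def read_skos_concepts(rdfxml):
    """Map each term to its label, description and list of wider terms."""
    concepts = {}
    for concept in ET.fromstring(rdfxml).iter("{%s}Concept" % NS["skos"]):
        # The term is the fragment of the concept URI.
        term = concept.get(RDF_ABOUT).partition("#")[2]
        concepts[term] = {
            "label": concept.findtext("skos:prefLabel", namespaces=NS),
            "description": concept.findtext("skos:definition", namespaces=NS),
            "wider": [
                broader.get(RDF_RESOURCE).partition("#")[2]
                for broader in concept.findall("skos:broader", NS)],
        }
    return concepts


print(read_skos_concepts(SAMPLE))
```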
1001
1002 \subsection{RDF Properties Vocabularies}
1003 \label{sect:refpropvoc}
1004
1005 RDF properties vocabularies should be used when the terms in the
1006 vocabulary are mainly used to state
1007 relationships between entities that can sensibly be imagined as
1008 resources in the RDF sense. Such terms would naturally be used as
1009 predicates in RDF triples. Obvious examples might be something
1010 like is-progenitor-for in a provenance chain or, indeed, the special
1011 properties for IVOA vocabularies introduced in sect.~\ref{sect:genprop}.
1012
1013
1014 The terms in RDF Properties vocabularies have the RDF type
1015 \vocterm{rdf:Property}.
1016
1017 \subsubsection{Properties in RDF Properties Vocabularies}
1018 \label{sect:propvoc-prop}
1019
1020 IVOA RDF properties vocabularies use the following properties (where
1021 not specified, the requirements considered essentially match those in
1022 sect.~\ref{sect:skosvoc-prop}):
1023
1024 \begin{itemize}
1025 \item \vocterm{rdfs:label} -- all terms MUST have an English-language
1026 label, and clients should prefer it over the fragment in the
1027 term URI for presentation purposes. Only
1028 one such label is permitted.
1029
1030 \item \vocterm{rdfs:comment} -- all concepts MUST have a non-trivial
1031 English-language comment serving as a human-oriented definition of the
1032 term. The considerations for \vocterm{skos:definition} in
1033 sect.~\ref{sect:skosvoc-prop} apply. As for those, only one
1034 \vocterm{rdfs:comment} per term is allowed.
1035
1036 \item \vocterm{rdfs:subPropertyOf} -- interpreted as in RDFS to induce
1037 the hierarchy of terms; a term MUST NOT appear as subject of more than
1038 one \vocterm{rdfs:subPropertyOf} triple (i.e., the hierarchy is a tree).
1039
1040 \item General properties discussed in sect.~\ref{sect:genprop}.
1041 The \vocterm{ivoasem:vocflavour} of these vocabularies is
1042 \verb|RDF Property|.
1043
1044 \end{itemize}
1045
1046 \subsubsection{Example (non-normative)}
1047 \label{sect:rdfpxex}
1048
1049 \begin{lstlisting}[language=XML]
1050 <rdf:Property rdf:about
1051 ="http://www.ivoa.net/rdf/datalink/core#preview-image">
1052 <rdfs:comment>preview of the data as a 2-dimensional
1053 image</rdfs:comment>
1054 <rdfs:label>Image preview</rdfs:label>
1055 <rdfs:subPropertyOf rdf:resource
1056 ="http://www.ivoa.net/rdf/datalink/core#preview"/>
1057 </rdf:Property>
1058 \end{lstlisting}
1059
1060
1061 \subsection{RDF Class Vocabularies}
1062
1063 RDF class vocabularies should be used when the terms in the vocabulary
1064 are reasonably class-like, i.e., would usually be either subjects or
1065 objects in RDF triples. As opposed to SKOS vocabularies, the hierarchy
1066 implied is strict in the sense of \vocterm{rdfs:subClassOf}
1067 (roughly: statements that are true for a wider term must be true
1068 for a more specialised term, too). This lets clients confidently perform
1069 inferences.
1070
1071 For instance, coordinates in the FK4 reference frame are equatorial, and
1072 thus even a client unfamiliar with the FK4 frame as such can confidently
1073 infer that the coordinates are right ascension and declination, and that
1074 right ascensions increase eastwards. Reasoning of this type is
1075 impossible within a SKOS vocabulary.
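
In terms of the desise representation from
sect.~\ref{sect:desise}, an inference of this kind amounts to following
the \textit{wider} links of a term. The sketch below illustrates this
with a hypothetical, cut-down vocabulary; the function names and the
toy data are assumptions and do not reproduce the actual refframe
vocabulary.

```python
def ancestors(voc, term):
    """Return the set of all terms reachable from term via "wider" links."""
    seen = set()
    todo = list(voc["terms"][term]["wider"])
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.extend(voc["terms"][current]["wider"])
    return seen


def is_a(voc, term, candidate):
    """Return True if term is candidate or a specialisation of it."""
    return term == candidate or candidate in ancestors(voc, term)


# Hypothetical toy vocabulary for illustration only.
toy_frames = {
    "uri": "http://www.ivoa.net/rdf/example",
    "flavour": "RDF Class",
    "terms": {
        "EQUATORIAL": {"label": "Equatorial", "description": "...",
                       "wider": [], "narrower": ["FK4", "FK5"]},
        "FK4": {"label": "FK4", "description": "...",
                "wider": ["EQUATORIAL"], "narrower": []},
        "FK5": {"label": "FK5", "description": "...",
                "wider": ["EQUATORIAL"], "narrower": []},
        "GALACTIC": {"label": "Galactic", "description": "...",
                     "wider": [], "narrower": []},
    },
}

print(is_a(toy_frames, "FK4", "EQUATORIAL"))
print(is_a(toy_frames, "GALACTIC", "EQUATORIAL"))
```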
1076
1077 The terms in RDF Class vocabularies have the RDF type
1078 \vocterm{rdfs:Class}.
1079
1080 \subsubsection{Properties in RDF Class Vocabularies}
1081 \label{sect:classvoc-prop}
1082
1083 IVOA RDF class vocabularies use the following properties:
1084
1085 \begin{itemize}
1086 \item \vocterm{rdfs:label} -- all terms MUST have an English-language
1087 label, and clients should prefer it over the term (the fragment of the
1088 term URI) for presentation purposes. Only
1089 one such label is permitted.
1090
1091 \item \vocterm{rdfs:comment} -- all concepts MUST have a non-trivial
1092 English-language comment serving as a human-oriented definition of the
1093 term. The considerations for \vocterm{skos:definition} in
1094 sect.~\ref{sect:skosvoc-prop} apply. As for those, only one
1095 \vocterm{rdfs:comment} per term is allowed.
1096
1097 \item \vocterm{rdfs:subClassOf} -- interpreted as in RDFS to induce
1098 the hierarchy of terms; a term MUST NOT appear as subject of more than
1099 one \vocterm{rdfs:subClassOf} triple (i.e., the hierarchy is a tree).
1100
\item General properties discussed in sect.~\ref{sect:genprop}.
1102 The \vocterm{ivoasem:vocflavour} of these vocabularies is
1103 \verb|RDF Class|.
1104 \end{itemize}
1105
1106 \subsubsection{Example (non-normative)}
1107
1108 Here is a term from an RDF class vocabulary conforming to this
1109 specification in RDF/XML serialisation:
1110
1111 \begin{lstlisting}[language=XML]
1112 <rdfs:Class rdf:about="http://www.ivoa.net/rdf/refframe#FK5">
1113 <rdfs:comment>
1114 Positions based on the 5th Fundamental Katalog. If no equinox is
1115 [...]
1116 </rdfs:comment>
1117 <rdfs:label>FK5</rdfs:label>
1118 <rdfs:subClassOf rdf:resource
1119 ="http://www.ivoa.net/rdf/refframe#EQUATORIAL"/>
1120 </rdfs:Class>
1121 \end{lstlisting}
1122
1123 \subsection{General Properties}
1124 \label{sect:genprop}
1125
1126 To cover requirements~\ref{req:deprecating} and
1127 \ref{req:preliminary} and to facilitate the handling of vocabularies not
1128 directly retrieved via HTTP (which means that the application may not
1129 know the vocabulary URI a priori; cf.~requirement~\ref{req:standalone}),
1130 the Semantics WG defines some
1131 properties of its own in the vocabulary
1132 \url{http://www.ivoa.net/rdf/ivoasem}. The following properties may be
1133 used in all three vocabulary flavours:
1134
1135 \begin{itemize}
1136 \item \vocterm{dc:created} -- IVOA vocabularies MUST include exactly one
1137 triple with the vocabulary as subject and a predicate
1138 \vocterm{dc:created}. The object is the datestamp of the vocabulary in
1139 YYYY-MM-DD format. Clients may only use this for debugging and similar
1140 purposes.
1141
1142 \item \vocterm{ivoasem:vocflavour} -- IVOA vocabularies MUST include
1143 exactly one triple with the vocabulary as subject and a string literal
1144 specifying the kind of vocabulary as per this specification. The
1145 ``General properties'' bullet points of sects.~\ref{sect:skosvoc-prop}
1146 (\verb|SKOS|), \ref{sect:propvoc-prop} (\verb|RDF Property|), and
1147 \ref{sect:classvoc-prop} (\verb|RDF Class|) define what strings may occur
1148 here.
1149
1150 \item \vocterm{ivoasem:preliminary} -- this property indicates
1151 that a term is preliminary and might disappear from the
1152 vocabulary without warning. The object of triples using it
1153 is a blank node. Validators need not warn against the use
1154 of preliminary terms, but as they encounter them, they SHOULD
1155 qualify their validation to the effect that it is temporary.
1156
1157 \item \vocterm{ivoasem:deprecated} -- this property indicates
1158 that a term is deprecated. The object of triples using it
1159 is a blank node. Validators SHOULD issue warnings if such terms
1160 are encountered.
1161
1162 \item \vocterm{ivoasem:useInstead} -- for a deprecated term, the
1163 objects of RDF triples using this property indicate
1164 which terms should be
1165 used instead of the deprecated one. This property MUST NOT be used with
1166 non-deprecated subjects.
1167
1168 \end{itemize}
1169
1170 \subsubsection{Example (non-normative)}
1171
1172 The following snippets show RDF/XML triples using the common terms,
1173 taken from the existing relationship\_type vocabulary; the notation
1174 \verb|__| as a blank node is an implementation detail and must not be
1175 relied upon. In general, where ivoasem properties take blank nodes as
1176 objects, clients should normally just ignore the objects.
1177
1178 \begin{lstlisting}[language=XML]
1179 <rdf:Description rdf:about
1180 ="http://www.ivoa.net/rdf/voresource/relationship_type">
1181 <dc:created>2016-08-17</dc:created>
1182 </rdf:Description>
1183 <rdf:Description rdf:about
1184 ="http://www.ivoa.net/rdf/voresource/relationship_type">
1185 <ivoasem:vocflavour>RDF Property</ivoasem:vocflavour>
1186 </rdf:Description>
1187 <rdf:Description rdf:about
1188 ="http://www.ivoa.net/rdf/voresource/relationship_type#IsPartOf">
1189 <ivoasem:preliminary rdf:resource=
1190 "http://www.ivoa.net/rdf/voresource/relationship_type#__"/>
1191 </rdf:Description>
1192 <rdf:Description rdf:about
1193 ="http://www.ivoa.net/rdf/voresource/relationship_type#derived-from">
1194 <ivoasem:deprecated rdf:resource
1195 ="http://www.ivoa.net/rdf/voresource/relationship_type#__"/>
1196 <ivoasem:useInstead rdf:resource
1197 ="http://www.ivoa.net/rdf/voresource/relationship_type#IsDerivedFrom"/>
1198 </rdf:Description>
1199 \end{lstlisting}
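
As an illustration of how a client might evaluate these properties
without RDF tooling, the following sketch collects the replacements
recommended for deprecated terms from RDF/XML shaped like the example
above. It assumes that the \texttt{ivoasem} prefix is bound to
\url{http://www.ivoa.net/rdf/ivoasem#} and that the deprecation and its
replacements are given in the same \texttt{rdf:Description} element.
The function name \texttt{deprecation\_map} is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
IVOASEM = "http://www.ivoa.net/rdf/ivoasem#"  # assumed prefix binding
VOC = "http://www.ivoa.net/rdf/voresource/relationship_type"

SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ivoasem="{IVOASEM}">
 <rdf:Description rdf:about="{VOC}#IsPartOf">
  <ivoasem:preliminary rdf:resource="{VOC}#__"/>
 </rdf:Description>
 <rdf:Description rdf:about="{VOC}#derived-from">
  <ivoasem:deprecated rdf:resource="{VOC}#__"/>
  <ivoasem:useInstead rdf:resource="{VOC}#IsDerivedFrom"/>
 </rdf:Description>
</rdf:RDF>"""


def deprecation_map(rdfxml):
    """Map each deprecated term URI to the list of URIs to use instead."""
    result = {}
    for desc in ET.fromstring(rdfxml).iter("{%s}Description" % RDF):
        # The blank node object of ivoasem:deprecated is ignored.
        if desc.find("{%s}deprecated" % IVOASEM) is None:
            continue
        subject = desc.get("{%s}about" % RDF)
        result.setdefault(subject, []).extend(
            el.get("{%s}resource" % RDF)
            for el in desc.findall("{%s}useInstead" % IVOASEM))
    return result


print(deprecation_map(SAMPLE_RDF))
```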
1200
1201
1202 \section{Vocabulary Management}
1203 \label{sect:management}
1204
1205 This section discusses the processes through which new vocabularies can be
defined and how vocabulary updates are performed in a way
1207 that ensures community participation and at least a minimal level of
1208 consensus. Procedures here primarily address requirements
1209 \ref{req:consensus}, \ref{req:evolution} and \ref{req:traceable}.
1210
1211 In the following, the phrase ``chair of the Semantics WG'' is understood
1212 to mean ``chair or vice-chair of the Semantics WG''; in the unlikely
1213 situation that chair and vice-chair dissent, the resolution of the
1214 problem is up to the TCG chair.
1215
1216
1217 \subsection{New Vocabularies}
1218 \label{sect:new-vocabularies}
1219
1220 New vocabularies in the VO should be introduced with a document going
1221 through the normal IVOA approval process, i.e., intended to become a
1222 recommendation or an endorsed note, with RFC as described in the IVOA
1223 Document Standards \citep{2017ivoa.spec.0517G}.
1224
1225 At the discretion of the chair of the Semantics WG, the vocabulary is
1226 uploaded to the vocabulary repository when a document reaches the state
1227 of a Working Draft. At the latest, the vocabulary is uploaded when the
1228 document becomes a Proposed Recommendation or a Proposed Endorsed Note
1229 in order to support a thorough review and reference implementations.
1230
1231 The entire vocabulary is marked human-readably as preliminary in the
1232 vocabulary index (cf.~sect.~\ref{sect:deployment}). All terms in the
1233 vocabulary are marked as preliminary using the
1234 \vocterm{ivoasem:preliminary} property (cf.~sect.~\ref{sect:genprop}) in
1235 order to satisfy requirement~\ref{req:preliminary}.
1236
1237 The entire new vocabulary gets approved as the document introducing it
1238 reaches the status of Recommendation or Endorsed Note. At that point,
all its terms lose their preliminary status. From then
1240 on, it is managed by the Semantics WG using the process defined in
1241 the next section.
1242
1243 Once approved (i.e., no longer marked as preliminary),
1244 terms in IVOA vocabularies cannot be removed. They can,
1245 however, be marked as deprecated.
1246
1247 \subsection{Updating Vocabularies}
1248 \label{sect:updating-vocabularies}
1249
1250 IVOA vocabularies can be extended as domain requirements develop
1251 [requirement~\ref{req:evolution}]. Clients
1252 should therefore be designed such that they gracefully deal with terms
1253 that have not been part of the vocabulary at build time, typically by
1254 exploiting information in the vocabulary, perhaps by falling back to
1255 wider, known terms, or by presenting their users labels and descriptions
1256 for terms not explicitly handled.
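
A sketch of the fallback to wider, known terms, based on the desise
representation from sect.~\ref{sect:desise}, might look as follows.
The function name \texttt{nearest\_known}, the set of terms the client
is assumed to handle, and the toy vocabulary are all hypothetical.

```python
def nearest_known(voc, term, known_terms):
    """Return term or its closest wider term in known_terms, else None."""
    todo = [term]
    seen = set()
    while todo:
        current = todo.pop(0)  # breadth first: closest ancestors come first
        if current in known_terms:
            return current
        if current in seen or current not in voc["terms"]:
            continue
        seen.add(current)
        todo.extend(voc["terms"][current]["wider"])
    return None


# Hypothetical toy vocabulary for illustration only.
toy_links = {
    "uri": "http://www.ivoa.net/rdf/example",
    "flavour": "RDF Property",
    "terms": {
        "preview": {"label": "Preview", "description": "...",
                    "wider": [], "narrower": ["preview-image"]},
        "preview-image": {"label": "Image preview", "description": "...",
                          "wider": ["preview"], "narrower": []},
        "calibration": {"label": "Calibration", "description": "...",
                        "wider": [], "narrower": []},
    },
}

# A client built before "preview-image" existed still knows "preview".
print(nearest_known(toy_links, "preview-image", {"preview"}))
```

Where no known wider term exists, a client can still present the label
and description of the unknown term to its users.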
1257
1258
1259 \subsubsection{Vocabulary Enhancement Proposals}
1260
1261 To add one or more terms to a vocabulary, to introduce deprecations or
1262 to change term labels, descriptions, or relationships,
1263 an interested party -- not necessarily affiliated with the Working Group
1264 that has originally introduced the vocabulary -- prepares a Vocabulary
1265 Enhancement Proposal (VEP). In the interest of thorough review and
1266 topical discussion, a single VEP should only cover directly related
1267 terms. For instance, in a vocabulary of reference frames, it would be
1268 reasonable to add old-style and new-style galactic frames in one
1269 VEP, but not, say, azimuthal and supergalactic coordinates. The
1270 arguments for both terms in the former pair are rather
1271 analogous\footnote{This does not rule out that, in the example, one
1272 might argue that old-style galactic coordinates are so ancient that
1273 perhaps they should not be supported in the VO at all; the chair of the
1274 Semantics WG might then decree that the VEP still needs to be split.}.
1275 In the latter case, two very different rationales would have
1276 to be put forward, which is a clear sign that two VEPs are in order.
1277
1278 \begin{figure}
1279 \begin{verbatim}
1280 Vocabulary: http://www.ivoa.net/rdf/datalink/core
1281 Author: msdemlei@ari.uni-heidelberg.de
1282 Date: 2019-07-19
1283
1284 Term: IsPreviousVersionOf
1285 Action: Addition
1286 Label: Newer Version
1287 Description: This dataset in a previous edition, e.g., processed
1288 with an older pipeline, as part of an older data release.
Relationships: rdfs:subPropertyOf(this)
1290 Used-in: http://example.org/datalink?ID=doc-v1
1291
1292 Term: IsNewVersionOf
1293 Action: Addition
1294 Label: Previous Version
1295 Description: This dataset in a newer edition, e.g., processed
1296 with a newer pipeline, as part of a newer data release.
Relationships: rdfs:subPropertyOf(this)
1298 Used-in: http://example.org/datalink?ID=doc-v2
1299
1300 Rationale:
1301
1302 The terms are mainly intended for projects with data releases.
1303 IsPreviousVersionOf allows services to mark up links to (typically
1304 datalink documents for) later version(s) of this data set. It
1305 allows a client to alert users that a newer, probably improved,
1306 rendition of the current dataset is available and should
1307 presumably be used instead of what they are looking at. The
1308 inverse relationship, IsNewVersionOf, is useful if projects want
1309 to keep previous versions of the dataset findable without having
1310 them show up in the default queries.
1311
1312 The terms are taken from the relationship types of DataCite.
1313 \end{verbatim}
1314
1315 \caption{A sample VEP.}
1316 \label{fig:vepsample}
1317 \end{figure}
1318
1319 A VEP is a semistructured text file containing the following items:
1320
1321 \begin{itemize}
\item \vepitem{Vocabulary:} The URI of the vocabulary.
\item \vepitem{Author:} Contact information for the author(s) of
the VEP.
\item \vepitem{Date:} The date on which the VEP was posted.
\item \vepitem{Term:} The identifier of the term to be added, modified,
or deprecated.
\item \vepitem{Action:} One of \textit{Addition}, \textit{Deprecation}, or
\textit{Modification}.
\item \vepitem{Label:} The English-language, human-readable label of the term.
\item \vepitem{Description:} The description that will come with the term.
\item \vepitem{Relationships:} If applicable, relationships the new
term will have to existing terms, using the properties defined in
the present document.
\item \vepitem{Used-in:} At least one URI of a document using the
proposed term.
\item \vepitem{Rationale:} A discussion of use cases, the role of the term in
the vocabulary, and the like. In particular, the item(s) in
\vepitem{Used-in} should be commented on.
1340 \end{itemize}
1341
1342 The items \vepitem{Term}, \vepitem{Action}, \vepitem{Label},
1343 \vepitem{Description}, \vepitem{Used-in},
and \vepitem{Relationships} may be repeated if
1345 multiple terms are affected by a VEP. In \textit{Addition} VEPs, all items
1346 except \vepitem{Relationships} are mandatory.
1347
1348 When \vepitem{Action} is \textit{Deprecation}, \vepitem{Label},
1349 \vepitem{Description}, and \vepitem{Relationships} are optional but can be
1350 given if useful for understanding the VEP. The rationale MUST discuss
1351 the reasons for a deprecation. Usually, one or more replacement
1352 term(s) will be proposed within the same VEP.
1353
1354 When \vepitem{Action} is \textit{Modification}, \vepitem{Label},
1355 \vepitem{Description}, and \vepitem{Relationships} give the proposed new
1356 values of the term. The term itself cannot be modified. The rationale
1357 will usually detail the changes proposed while mentioning the previous
1358 values.
1359
1360 We do not expect the VEPs to be evaluated by machines. Therefore, we
1361 define no grammar for the markup of sections, section headers, and their
1362 content. It is still recommended that authors follow the formatting of
1363 the example in Fig.~\ref{fig:vepsample}.
1364
1365 \subsubsection{Publishing a VEP}
1366
1367 To publish a VEP, its author sends it to the chair of the Semantics WG,
1368 preferably by e-mail. The chair of the Semantics WG will perform a
1369 formal validation, in particular as regards the presence of all required
1370 items and syntactically valid relationships. No assessment of the
1371 contents is done at this stage.
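
Although VEPs are not meant to be evaluated by machines, the check for
the presence of required items can be supported by a small script. The
following is a minimal sketch only. It assumes that items are written as
``Name: value'' at the start of a line, which is a convention rather than
a grammar defined by this document; the function names and the sample
text are made up for this illustration.

```python
import re

# Items an Addition VEP must contain according to the text above
# (everything except Relationships).
REQUIRED_FOR_ADDITION = [
    "Vocabulary", "Author", "Date", "Term", "Action",
    "Label", "Description", "Used-in", "Rationale"]


def find_items(vep_text):
    """Return the lower-cased item names found at the start of lines.

    Assumption: items look like "Name: value"; this layout is a
    convention, not something the standard prescribes.
    """
    found = set()
    for line in vep_text.splitlines():
        match = re.match(r"([A-Za-z][A-Za-z-]*)\s*:", line)
        if match:
            found.add(match.group(1).lower())
    return found


def missing_items(vep_text, required=REQUIRED_FOR_ADDITION):
    """Return the required items that do not occur in vep_text."""
    found = find_items(vep_text)
    return [name for name in required if name.lower() not in found]


# A contrived, incomplete Addition VEP
sample = """Vocabulary: http://www.ivoa.net/rdf/example
Author: A. Person
Date: 2021-05-11
Term: new_term
Action: Addition
Label: New Term
Description: A contrived term.
Rationale: Illustration only.
"""
print(missing_items(sample))
```

Such a check only covers the formal side; whether the relationships
given are syntactically valid still needs to be inspected separately.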
1372
1373 Formally valid VEPs then receive a running number. The first VEP was
1374 VEP-0001, the second VEP-0002, and so on. The chair of the Semantics WG
1375 then adds the new VEP to the public index of VEPs as
1376 ``Current'' (see Appendix~\ref{app:curtech} for the technical details).
1377 This index has a link to each VEP's text (in general, a location in a
1378 version control system).
1379
1380 Once the VEP is uploaded, it is announced to the IVOA Semantics Working
1381 Group and all other IVOA Working Groups concerned (again, the technical
1382 details are found in Appendix~\ref{app:curtech}). The chair of the
1383 Semantics WG can extend the distribution as they see fit. The
1384 announcement in particular contains a copy of the VEP in question.
1385
1386 As soon as possible after the upload, the chair of the Semantics WG adds
1387 any term(s) proposed to the vocabulary as a preliminary term using the
1388 \vocterm{ivoasem:preliminary} property. This means that the terms can
1389 immediately be used without raising warnings or errors, but in contrast
1390 to approved terms, they may disappear again. Deprecation or
1391 modification VEPs have no immediate effect.
1392
1393 \subsubsection{Approval Process}
1394 \label{sect:approval}
1395
1396 Discussion of a VEP takes place in the WGs' discussion forums (again,
1397 see Appendix~\ref{app:curtech}). The chair of the Semantics WG will
1398 summarise the discussion in the VEP in a \textit{Discussion} section.
1399
1400 During the process, all parts of the VEP may be changed except the
1401 term(s) proposed.
1402
1403 Once the chair of the Semantics WG sees a sufficient consensus reached,
1404 they announce the VEP in the TCG. If, at the next meeting of the TCG,
1405 no Working Group objects to the VEP, it is accepted and the marker that
1406 a term is preliminary is removed from the relationships of any terms
1407 added by the VEP. In the case of deprecation or modification VEPs, the
1408 requested actions are taken at this point.
1409
1410 If, on the other hand, discussion of an addition request results in the
1411 realisation that terms proposed need to be changed, the VEP in question
1412 must be withdrawn and its effects on the vocabulary undone; zero or
1413 more new VEPs are then posted containing proposals for terms for which
1414 consensus appears feasible. The withdrawn VEP receives a
1415 \vepitem{Superceded-by} item referencing any new VEPs; each new VEP has
1416 a \vepitem{Supercedes} item referencing the original VEP.
1417
1418 \subsubsection{Guidelines for Creating Concepts (non-normative)}
1419
1420 When introducing terms, it is useful to consider a very simple
1421 semantic model, where the world is a set of (tangible or non-tangible)
1422 ``things'' in the sense of naive set theory.
1423
1424 A vocabulary has a scope, which is a subset of the world; this could be
1425 ``reference systems'' or ``astronomical object types'' or even something
1426 as concrete as ``observatories''.
1427
1428 In this picture, a term denotes a certain subset of a vocabulary's
1429 scope. This set is called the term's (or, where an additional level
1430 between the concrete letters making up the term as defined by this
1431 document and the set is useful, the concept's) ``extension''.
1432
1433 Now, in an ideal vocabulary the extensions of its
1434 top-level terms are disjoint (meaning: each thing in scope of the vocabulary
1435 belongs to not more than one top-level term's extension) and the terms cover the
1436 entire scope (meaning: for each thing in the scope, there is at least
1437 one term's extension that contains that thing). In other words, the extensions
1438 of the top-level terms partition the vocabulary's scope into equivalence classes.
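
The two conditions of the ideal case (pairwise non-overlapping
extensions and full coverage of the scope) can be made concrete with
ordinary sets. The following sketch is purely illustrative; the scope,
the term names, and the function names are contrived and not taken from
any IVOA vocabulary.

```python
from itertools import combinations


def is_disjoint(extensions):
    """True if no thing belongs to more than one term's extension."""
    return all(
        not (a & b) for a, b in combinations(extensions.values(), 2))


def covers(extensions, scope):
    """True if every thing in scope is in at least one extension."""
    return set().union(*extensions.values()) >= scope


# Contrived scope and top-level terms
scope = {"ICRS", "FK5", "FK4", "galactic", "supergalactic"}
extensions = {
    "equatorial": {"ICRS", "FK5", "FK4"},
    "galactic_like": {"galactic", "supergalactic"},
}
print(is_disjoint(extensions), covers(extensions, scope))
```

When both functions return true, the extensions partition the scope.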
1439
1440 Where vocabularies are hierarchical, analogous considerations would
1441 apply for the extensions of a general term and its more specialised
1442 terms.
1443
1444 When natural language and the real world are involved,
1445 this ideal generally is unreachable.
1446 But when proposing a term and its definition, authors should try to
1447 make sure that
1448
1449 \begin{compactenum}
1450 \item their new term has a useful extension (i.e., consumers actually
1451 want to know whether a thing is or is not inside it)
1452 \item the extension is reasonably disjoint from existing terms, or is a
1453 proper superset (in which case the other terms are narrower), or is a proper
1454 subset (in which case they are wider) of other terms' extensions.
1455 \end{compactenum}
1456
1457 Put another way: When designing terms, it is as important to say what is
1458 not covered as to clearly say what is.
1459
1460 This is a major reason why it is important to give clear definitions
1461 whenever these definitions are not uniquely given by the domain. For
1462 instance, while an object type vocabulary probably does not need to be
1463 very diligent in defining $\delta$~Cephei stars because the extension of
1464 that term is uncontroversial to first order\footnote{Although it might
1465 seem desirable to clarify whether, say, W~Virginis stars are or are not
1466 excluded.}, a term like ``dataset'' should come with a precise
1467 definition, ideally containing a reference to a longer explanation.
1468
1469 \subsection{Externally Managed Vocabularies}
1470 \label{sect:externally-managed}
1471
1472 The IVOA is not the only body developing vocabularies, and of course VO
1473 components are free to use other, non-IVOA vocabularies whenever
1474 convenient or even required for interoperability beyond the IVOA.
1475
1476 Sometimes, however, it is advantageous to subject an external vocabulary
1477 to the requirements set forth by this specification. The motivating use
1478 case here is \ref{uc:uat}, the Unified Astronomy Thesaurus. As derived
1479 in requirement~\ref{req:external}, multiple considerations make a
1480 ``mirror'' of the vocabulary in the IVOA RDF repository highly
1481 desirable. Regrettably, since RDF resources (i.e., what we call terms
1482 here) are identified by their full URIs, this will create new RDF
1483 resources, and hence care must be taken that RDF tools can work out the
1484 identity of the mirrored IVOA terms and the original RDF resources.
1485
1486 Also, the processes from sects.~\ref{sect:new-vocabularies}
1487 and~\ref{sect:updating-vocabularies} obviously cannot apply to such
1488 vocabularies, which have their own management procedures.
1489
1490 To address these issues, the following rules apply:
1491
1492 When a vocabulary managed by an IVOA-external body needs to be made
1493 available in the form prescribed by this specification, a proposal for
1494 doing this needs to pass the endorsed notes process of the IVOA as laid
1495 out in the IVOA Document Standards \citep{2017ivoa.spec.0517G}. As it
1496 concerns external relationships of the IVOA, it additionally needs
1497 endorsement by the IVOA Executive Committee to become effective.
1498
1499 This proposal has to specify:
1500 \begin{itemize}
1501 \item The basic metadata for the vocabulary on the IVOA side.
1502 \item The rules for mapping the external RDF resource URIs to IVOA term
1503 URIs, together with a plan for how this mapping is kept stable.
1504 \item If during the mapping of the vocabulary, external RDF triples are
1505 discarded (which likely is necessary to ensure adherence to our
1506 constraints), what triples are discarded.
1507 \item A description of and reference to software that performs this
1508 mapping.
1509 \item A description of the external management process.
1510 \end{itemize}
1511
1512 The proposing party has to provide software to automatically translate
1513 resources from the external format to a suitable input for the IVOA
1514 vocabulary tooling.
1515
1516 Each term in the IVOA vocabulary mirror MUST declare its identity to
1517 the original, external RDF resource. At this point, this is only
1518 defined for SKOS-flavoured vocabularies, where the IVOA term must be the
1519 subject of exactly one triple with the \vocterm{skos:exactMatch}
1520 property. The object of that triple is the URI of the external RDF
1521 resource.
1522
1523 For other flavours, no such mechanism is defined in this version of the
1524 specification, which means that for now, externally managed vocabularies
1525 must use the SKOS flavour.
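
As an illustration of the identity declaration, the following sketch
writes the required triple in N-Triples syntax and counts
\vocterm{skos:exactMatch} triples for a mirrored term. Both URIs used
here are hypothetical placeholders, and the helper functions are not part
of any IVOA tooling; real mirrors would follow the mapping rules laid
down in their endorsed note.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def exact_match_triple(ivoa_term_uri, external_uri):
    """Return an N-Triples line declaring the identity of the two."""
    return "<{}> <{}> <{}> .".format(
        ivoa_term_uri, SKOS_EXACT_MATCH, external_uri)


def count_exact_matches(triples, ivoa_term_uri):
    """Count exactMatch triples that have ivoa_term_uri as subject.

    triples is an iterable of (subject, predicate, object) URI strings.
    """
    return sum(
        1 for subject, predicate, _ in triples
        if subject == ivoa_term_uri and predicate == SKOS_EXACT_MATCH)


# Hypothetical URIs, for illustration only
ivoa_term = "http://www.ivoa.net/rdf/example#some-term"
external = "http://vocabs.example.org/thesaurus/1234"

print(exact_match_triple(ivoa_term, external))
triples = [(ivoa_term, SKOS_EXACT_MATCH, external)]
print(count_exact_matches(triples, ivoa_term) == 1)
```

A mirror satisfies the rule above if the count is exactly one for every
term.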
1526
1527 Once an external vocabulary is endorsed by both the TCG and the
1528 Executive Committee, the chair of the Semantics working group has the
1529 responsibility to keep the IVOA mirror of the vocabulary synchronised,
1530 ideally by using a monitored, automatised process like a post-commit
1531 action on an external version control system.
1532
1533
1534 \section{Publishing Vocabularies}
1535 \label{sect:deployment}
1536
1537 This section is an adaptation of \citet{note:cooluris} and is
1538 intended to satisfy requirements~\ref{req:machine}
1539 and~\ref{req:mtm}. It also briefly discusses how IVOA vocabularies
1540 should be referenced.
1541
1542 \subsection{Deploying Vocabularies}
1543
1544 All IVOA-approved vocabularies are accessible as children of
1545 \url{http://www.ivoa.net/rdf}. Dereferencing that URI will lead to an
1546 index of current approved and proposed vocabularies.
1547 Vocabularies still under review are clearly marked as such.
1548
1549 When dereferencing a vocabulary URI, clients will receive an HTTP 303
1550 (See Other) code, with the \texttt{Location} header set to the latest
1551 version of the vocabulary. The version is written as the date of the
1552 last update in the format YYYY-MM-DD. Depending on the value of the
1553 request's accept header, the redirect will end up at
1554
1555 \begin{itemize}
1556 \item an HTML rendition of the vocabulary by default. The HTML element
1557 corresponding to a term has the term (i.e., the fragment identifier in the
1558 term's URI) as its HTML id; hence a URI
1559 \verb|<vocabulary URI>#<term>| will immediately focus the term's HTML
1560 rendition in common
1561 user agents [requirement~\ref{req:mtm}].
1562
1563 \item a Turtle rendition of the vocabulary if the accept header
1564 indicates that \verb|text/turtle| documents are preferred.
1565
1566 \item an RDF/XML rendition of the vocabulary
1567 if the accept header indicates that
1568 \verb|application/rdf+xml| documents are preferred.
1569
1570 \item an ad-hoc JSON rendition of the vocabulary as specified in
1571 sect.~\ref{sect:desise} if the accept header indicates that
1572 \verb|application/x-desise+json| documents are preferred.
1573 \end{itemize}
1574
1575 Individual vocabularies may be available in additional formats.
1576 Content negotiation might then consider additional media types.
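
From a client's perspective, selecting a rendition amounts to setting
the accept header on a request to the vocabulary URI. The following
sketch uses Python's standard library; the function name and the short
rendition keys are our own, and the example vocabulary URI is the
contrived one used elsewhere in this document. The request is only
constructed here; the commented lines indicate how it could be sent.

```python
from urllib.request import Request

# Media types listed in the text above
MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "desise": "application/x-desise+json",
}


def vocabulary_request(vocabulary_uri, rendition="desise"):
    """Build a request asking for a given rendition of a vocabulary."""
    return Request(
        vocabulary_uri, headers={"Accept": MEDIA_TYPES[rendition]})


req = vocabulary_request("http://www.ivoa.net/rdf/example", "turtle")
print(req.get_header("Accept"))

# To actually retrieve the document (needs network access); urlopen
# follows the 303 redirect to the versioned resource on its own:
#
# from urllib.request import urlopen
# with urlopen(req) as response:
#     data = response.read()
```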
1577
1578 Clients may record the full versioned URI of the vocabulary used for
1579 debug or provenance purposes. These URIs, however, MUST NOT be used
1580 externally. In particular, a URI like
1581 \url{http://www.ivoa.net/rdf/example/2019-07-14/example.html#term} has no
1582 RDF meaning by this standard and must never be used in publicly visible
1583 RDF triples. Always use URIs of the form
1584 \url{http://www.ivoa.net/rdf/example#term}.
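
A client could guard against accidentally publishing versioned URIs
with a check along the following lines. This is a sketch under the
assumption that a version always shows up as a YYYY-MM-DD path segment,
as described above; the function names are ours.

```python
import re

# A path segment that looks like a version date, e.g. /2019-07-14/
VERSION_SEGMENT = re.compile(r"/\d{4}-\d{2}-\d{2}(/|$)")


def term_uri(vocabulary_uri, term):
    """Return the term URI of the form <vocabulary URI>#<term>."""
    return "{}#{}".format(vocabulary_uri.rstrip("/"), term)


def is_versioned(uri):
    """True if the URI (ignoring its fragment) has a date segment."""
    return VERSION_SEGMENT.search(uri.split("#")[0]) is not None


print(term_uri("http://www.ivoa.net/rdf/example", "term"))
print(is_versioned(
    "http://www.ivoa.net/rdf/example/2019-07-14/example.html#term"))
```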
1585
1586 \subsection{Referencing Vocabularies}
1587
1588 Since IVOA vocabularies, at least after some time, generally are a
1589 collective effort with a continuous evolution, it is inappropriate to
1590 cite them in the conventional author-year-title format.
1591
1592 However, the vocabulary URI is intended to be stable and uniquely
1593 identifies the vocabulary as such. Hence, this URI is what should
1594 normally be cited. The standard style would be along the lines of
1595 \begin{lstlisting}[language={}]
1596 Terms in this field must be taken from the IVOA vocabulary
1597 \url{http://www.ivoa.net/rdf/voresource/content_level}.
1598 \end{lstlisting}
1599 or, in formats where footnotes are appropriate and inline URIs should be
1600 avoided for typographical reasons
1601 \begin{lstlisting}[language={}]
1602 Terms in this field must be taken from the IVOA vocabulary
1603 \emph{Content levels for VO resources}\footnote{
1604 \url{http://www.ivoa.net/rdf/voresource/content_level}}.
1605 \end{lstlisting}
1606 -- the footnote anchor should be the vocabulary name as given in the
1607 IVOA vocabulary repository\footnote{\url{http://www.ivoa.net/rdf}}.
1608
1609 Except in the rare cases in which version-sharp references are actually
1610 necessary (for instance, descriptions of errors), it is inappropriate to
1611 reference URLs with dates (e.g.,
1612 \url{http://ivoa.net/rdf/voresource/content_level/2016-08-17/}). URIs
1613 to actual resources (e.g., the XML or Turtle renditions) must never be
1614 used to reference vocabularies.
1615
1616 We do not see a relevant use case for having IVOA vocabularies formally
1617 cited in reference sections of scholarly works: such references will not
1618 aid in finding them, and there is no credible benefit in tracking their
1619 usage from citation in literature.
1620
1621
1622 \appendix
1623 \section{The 2019 IVOA Vocabulary Toolset (non-normative)}
1624 \label{app:tools}
1625
1626 This appendix describes the recommended toolset for authoring IVOA
1627 vocabularies as of 2019. Vocabulary authors may decide to use other
1628 tools but should consider that this may incur additional work for the
1629 chair of the Semantics WG in later maintenance.
1630
1631 This appendix is non-normative. It will serve as documentation of the
1632 toolset and will occasionally be updated as the tooling evolves;
1633 vocabulary authors are still advised to inspect documentation within the
1634 tools. Even major changes here will not lead to a new major version of
1635 the standard.
1636
1637
1638 \subsection{Input Format}
1639
1640 In the current tooling, RDF class and property
1641 vocabularies are authored in simple CSV files
1642 with five columns. These columns are:
1643
1644 \begin{description}
1645 \item[term]
1646 This is the actual, machine-readable vocabulary term. Only use
1647 letters, digits, underscores, and dashes here. As specified in
1648 sect.~\ref{sect:voccontent}, these identifiers should be
1649 human-readable, even though they are not directly intended for human
1650 consumption (clients will use the label). In the interest of
1651 reasonably compact URIs we advise keeping the length of the
1652 terms below, say, 30 characters.
1653 \item[level]
1654 This is used for simple input of wider/narrower relationships.
1655 It is 1 for ``root'' terms. Terms with a level of 2 that follow a
1656 root term become its children, i.e., the tooling will add the
1657 appropriate wider relationship between the level 2 and the level 1
1658 term. You can nest, i.e., have
1659 terms of level 3 below terms of level 2. Note that this means the
1660 order of rows must be preserved in the CSV files: Do \emph{not} sort
1661 vocabulary CSVs.
1662 \item[label]
1663 This is a short, human-readable label for the term. In the VO, this
1664 is generally derived fairly directly from the content of the first
1665 column, usually by
1666 inserting blanks at the right places and fixing capitalisation.
1667 \item[description]
1668 This is a longer explanation of what the term means. We do not
1669 support any markup here, not even paragraphs, so there is probably a
1670 limit to how much can be communicated.
1671 \item[more\_relations]
1672 This column can be used to declare non-hierarchical relationships
1673 and contains whitespace-separated declarations. Each declaration has
1674 the form property[(term)]. Omitting the term is allowed for certain
1675 properties; in RDF, this corresponds to a blank node. See below for
1676 the common properties supported here. Plain terms are resolved
1677 within the vocabulary, but CURIEs with known prefixes or full URIs are
1678 admitted, too.
1679 \end{description}
1680
1681 Non-ASCII characters are allowed in label and description; files must be
1682 encoded in UTF-8. The column separator currently is required to be a
1683 semicolon in order to save on escaping in descriptions (which very
1684 commonly contain commas). Fields that contain semicolons are enclosed
1685 in double quotes; embedded double quotes are doubled.
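
To illustrate how the level column translates into wider
relationships, the following sketch reads such a file with Python's
\verb|csv| module. It is not the actual IVOA tooling: the function name
is ours, the sample rows are contrived, and we assume here that the file
has no header row. Real files would be opened with
\verb|encoding="utf-8"| and \verb|newline=""|.

```python
import csv
import io

COLUMNS = ["term", "level", "label", "description", "more_relations"]


def read_terms(csv_text):
    """Parse term rows and derive each term's wider term.

    The wider term of a row at level n is the closest preceding row at
    level n-1; this is why row order must be preserved.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=";", quotechar='"')
    terms = {}
    ancestors = []  # ancestors[i] is the latest term seen at level i+1
    for row in reader:
        if not row:
            continue  # skip empty lines
        record = dict(zip(COLUMNS, row))
        level = int(record["level"])
        del ancestors[level - 1:]  # forget terms at this level or deeper
        record["wider"] = ancestors[-1] if ancestors else None
        ancestors.append(record["term"])
        terms[record["term"]] = record
    return terms


# Contrived sample; note the quoted field containing a semicolon
sample = """EQUATORIAL;1;Equatorial;Umbrella term for equatorial frames.;
ICRS;2;ICRS;"The ICRS; see the literature.";
GALACTIC;1;Galactic;Galactic frames.;
"""
terms = read_terms(sample)
print(terms["ICRS"]["wider"], terms["GALACTIC"]["wider"])
```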
1686
1687 The following properties are supported in the more\_relations
1688 column:
1689
1690 \begin{itemize}
1691 \item \vocterm{ivoasem:deprecated} -- see sect.~\ref{sect:genprop}.
1692 \item \vocterm{ivoasem:useInstead} -- see sect.~\ref{sect:genprop}.
1693 \item \vocterm{ivoasem:preliminary} -- see sect.~\ref{sect:genprop}.
1694 \end{itemize}
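
The declarations in the more\_relations column can be taken apart with
a simple regular expression. The following sketch assumes the form
property[(term)] described above and nothing more; the function name is
ours, and the actual tooling may be stricter or more lenient.

```python
import re

# property, optionally followed by a target in parentheses
DECLARATION = re.compile(
    r"^(?P<property>[^\s()]+)(?:\((?P<target>[^()]*)\))?$")


def parse_more_relations(field):
    """Return a list of (property, target) pairs.

    target is None when the declaration has no parenthesised term,
    which corresponds to a blank node in RDF.
    """
    relations = []
    for declaration in field.split():
        match = DECLARATION.match(declaration)
        if match is None:
            raise ValueError("Malformed declaration: " + declaration)
        relations.append((match.group("property"), match.group("target")))
    return relations


print(parse_more_relations("ivoasem:deprecated ivoasem:useInstead(BD)"))
```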
1695
1696 \subsection{Vocabulary Metadata}
1697 \label{sect:vocmeta}
1698
1699 Global vocabulary metadata is kept in an INI-style format. The following
1700 keys are understood:
1701
1702 \begin{description}
1703 \item[timestamp]
1704 A manually maintained date of the last modification. This is
1705 essentially a version marker and should be changed only in preparation
1706 for a release. It is recommended to set it to the intended release
1707 date during development and not change it for every edit.
1708 \item[title]
1709 A human-readable short phrase saying what the vocabulary describes.
1710 \item[flavour]
1711 One of \textit{RDF Class}, \textit{RDF Property}, or \textit{SKOS}
1712 (where SKOS currently expects RDF/XML serialised SKOS rather than CSV).
1713 \item[description]
1714 A longer text (about a paragraph) stating what the vocabulary should
1715 be used for. No markup is supported here.
1716 \item[authors]
1717 Persons involved with the creation of the vocabulary. These are \emph{not}
1718 the persons to ask for maintenance; all requests for changes should be
1719 directed to the Semantics working group first.
1720 \item[filename]
1721 The tooling expects the input at
1722 \verb|<vocabulary name>/terms.csv|. If it is kept elsewhere, give
1723 the source file name here. This is to support legacy
1724 vocabularies with nonstandard names and native SKOS input.
1725 \item[draft]
1726 While a vocabulary is still being reviewed in its entirety, add a key
1727 draft set to \texttt{True}. This will add language to the effect that
1728 terms may still vanish from the vocabulary and mark all terms as
1729 preliminary. Once the vocabulary is approved, this key is deleted.
1730 \item[licenseuri]
1731 IVOA-managed vocabularies are always made available under CC-0 and
1732 hence do not use this key. External vocabularies as per
1733 sect.~\ref{sect:externally-managed} may be subject to actual licences,
1734 in which case this field holds a URI containing the licence's
1735 conditions.
1736 \item[licensehtml]
1737 This is arbitrary HTML expressing whatever licence terms may be
1738 attached to an external vocabulary. Again, do not use for IVOA
1739 vocabularies.
1740 \end{description}
1741
1742 Currently, the global metadata is maintained in a file
1743 \verb|vocabs.conf| in the root of the vocabulary source repository, with one
1744 section per vocabulary. The section name is the vocabulary name.
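
For orientation, a section for a vocabulary might be read as in the
following sketch. The section content is contrived, and we assume here
that the file can be processed by Python's \verb|configparser| with
colon-separated keys and values; consult the tooling itself for the
syntax it actually accepts.

```python
import configparser

# Contrived section; key names are those listed above
SAMPLE_CONF = """
[example]
timestamp: 2021-05-11
title: Example reference frames
flavour: RDF Class
description: A contrived vocabulary used for illustration only.
authors: A. Person; B. Person
draft: True
"""


def read_vocabulary_metadata(conf_text):
    """Return a dict mapping vocabulary names to their metadata."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {name: dict(parser[name]) for name in parser.sections()}


meta = read_vocabulary_metadata(SAMPLE_CONF)
print(meta["example"]["flavour"])
```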
1745
1746 \subsection{Vocabulary Source Repository}
1747
1748 Vocabulary authors are encouraged to maintain their vocabularies in the
1749 shared version control system of the IVOA. At the time of writing, this
1750 is a subversion repository at
1751 \url{https://volute.g-vo.org/svn/trunk/projects/semantics/voc-source}.
1752
1753 Authors of new vocabularies should create a child directory and place
1754 their terms.csv file in there. They should then edit \verb|vocabs.conf|
1755 and add a section named after their directory with the content discussed
1756 in sect.~\ref{sect:vocmeta}.
1757
1758
1759 \section{Current Network Resources (non-normative)}
1760 \label{app:curtech}
1761
1762 This appendix details network resources used in vocabulary management.
1763 It is non-normative and will occasionally be updated as the IVOA's
1764 infrastructure evolves. Even major changes here will not lead to a new
1765 major version of the standard.
1766
1767 The list of vocabulary enhancement proposals is maintained in the IVOA's
1768 wiki at
1769 \url{https://wiki.ivoa.net/twiki/bin/view/IVOA/VEPs}.
1770 Approved VEPs will be moved to an archive page linked there.
1771 VEPs may be added as attachments to this page, but authors are
1772 encouraged to maintain them in version controlled repositories instead.
1773 The recommended place to do that is
1774 \url{https://volute.g-vo.org/svn/trunk/projects/semantics/veps}.
1775
1776 The discussion of VEPs (see sect.~\ref{sect:approval}) is to take place
1777 on the appropriate mailing list(s). See
1778 \url{http://ivoa.net/members/index.html} for a directory of IVOA mailing
1779 lists and their addresses.
1780
1781 \section{An Example for a Vocabulary in Desise (non-normative)}
1782 \label{app:desiseexample}
1783
1784 The following example shows what a vocabulary in desise looks like. The
1785 content is, superficial similarities to real vocabularies
1786 notwithstanding, contrived.
1787
1788 \begin{lstlisting}[language=python]
1789 {
1790 "uri": "http://www.ivoa.net/rdf/example",
1791 "flavour": "RDF Class",
1792 "terms": {
1793 "EQUATORIAL": {
1794 "label": "Equatorial",
1795 "description": "Umbrella term for all sorts of equatorial frames.",
1796 "narrower": ["ICRS", "ICRS2", "BD", "B1875"], "wider": []
1797 },
1798 "ICRS": {
1799 "label": "ICRS",
1800 "description": "As defined by 1998AJ....116..516M.",
1801 "wider": ["EQUATORIAL"], "narrower": []
1802 },
1803 "B1875": {
1804 "label": "Bonner Durchmusterung System",
1805 "description": "Deprecated term for the reference system implied by BD/CD",
1806 "deprecated": "",
1807 "wider": ["EQUATORIAL"], "narrower": []
1808 },
1809 "BD": {
1810 "label": "Bonner Durchmusterung System",
1811 "description": "The reference system implied by BD/CD",
1812 "wider": ["EQUATORIAL"], "narrower": []
1813 },
1814 "ICRS2": {
1815 "label": "ICRS 2",
1816 "description": "The reference system defined by 2027A&A..1234...12B",
1817 "preliminary": "",
1818 "wider": ["EQUATORIAL"], "narrower": []
1819 }
1820 }
1821 }
1822 \end{lstlisting}
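
A client without RDF tooling can work with such a document using any
JSON parser. The following sketch uses a shortened, equally contrived
desise document; it relies only on the keys visible in the example
above, and the helper functions are ours rather than part of any
library.

```python
import json

# Shortened, contrived desise document
DESISE = """
{
  "uri": "http://www.ivoa.net/rdf/example",
  "flavour": "RDF Class",
  "terms": {
    "EQUATORIAL": {
      "label": "Equatorial",
      "description": "Umbrella term for equatorial frames.",
      "wider": [], "narrower": ["ICRS", "B1875"]
    },
    "ICRS": {
      "label": "ICRS",
      "description": "A contrived child term.",
      "wider": ["EQUATORIAL"], "narrower": []
    },
    "B1875": {
      "label": "Bonner Durchmusterung System",
      "description": "A contrived deprecated term.",
      "deprecated": "",
      "wider": ["EQUATORIAL"], "narrower": []
    }
  }
}
"""


def term_status(vocabulary, term):
    """Return 'deprecated', 'preliminary', or 'approved' for a term."""
    meta = vocabulary["terms"][term]
    if "deprecated" in meta:
        return "deprecated"
    if "preliminary" in meta:
        return "preliminary"
    return "approved"


def term_uri(vocabulary, term):
    """Return the full URI of a term in the vocabulary."""
    return vocabulary["uri"] + "#" + term


vocabulary = json.loads(DESISE)
for name in sorted(vocabulary["terms"]):
    print(term_uri(vocabulary, name), term_status(vocabulary, name))
```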
1823
1824 \section{Changes from Previous Versions}
1825
1826 \subsection{Changes from WD-2020-06-12}
1827
1828 \begin{itemize}
1829 \item No changes to normative material.
1830 \item Adding a use case on vocabulary evolution and on VO-DML.
1831 \item Various editorial changes.
1832 \end{itemize}
1833
1834 \subsection{Changes from WD-2020-03-26}
1835
1836 \begin{itemize}
1837 \item Desise term values are now dicts with label and description to
1838 make it a bit more self-explanatory; this let us pull in preliminary,
1839 deprecated, and wider as well.
1840 \item Desise now contains an inversion of wider, narrower, with meanings
1841 quite different between SKOS and the other flavours.
1842 \item The main media type for Desise is now application/x-desise+json rather
1843 than text/json because there is no text/json, and you can't have
1844 content media type parameters on either.
1845 \item Mentioning licenseuri and licensehtml in the non-normative part on
1846 managing vocabulary metadata. Also stating there that IVOA-managed
1847 vocabularies are CC-0.
1848 \end{itemize}
1849
1850
1851 \subsection{Changes from WD-2019-09-05}
1852
1853 \begin{itemize}
1854 \item We no longer recommend that non-RDF clients use RDF/XML. We have
1855 therefore removed the ``usage with plain XML tooling'' sections. We
1856 have also removed the description of the revovo python module from the
1857 toolset appendix.
1858
1859 \item Instead, we now have the custom ``desise'' format described in a
1860 new section that doubles as a very quick introduction for adopters not
1861 interested in RDF.
1862
1863 \item Adding a use case and requirement for the UAT (and, perhaps,
1864 similar externally curated vocabularies). Adding a section on how
1865 such vocabularies may be integrated into the IVOA RDF repository.
1866
1867 \item Now requiring a \emph{Used-in} item in addition VEPs, implying
1868 that only terms that are already applied may be proposed.
1869
1870 \item Adding \emph{Supercedes} and \emph{Superceded-by} items,
1871 formalising the previous language on ``splitting'' VEPs a bit.
1872
1873 \item Adding advice on referencing vocabularies.
1874
1875 \item We now demand a formal validation of VEPs by the semantics chair.
1876 The responsibility for ``uploading'' the VEP, i.e., adding it to the VEP
1877 index, is now assigned to them.
1878
1879 \item Adding a soapbox section with advice on what to do when proposing
1880 new terms and introducing a naive semantics model.
1881 \end{itemize}
1882
1883 \subsection{Changes from REC-1.19}
1884
1885 The present document is a full re-write of Version 1 of Vocabularies in
1886 the VO. See sect.~\ref{sect:version1rel} for details.
1887
1888 \bibliography{local.bib,ivoatex/ivoabib,ivoatex/docrepo}
1889
1890
1891 \end{document}
