# Annotation of /trunk/projects/semantics/Vocabularies/Vocabularies.tex

Revision 5922
Thu Jan 14 08:12:17 2021 UTC by msdemlei
File MIME type: application/x-tex
File size: 80899 byte(s)
Vocabularies: Misc updates prior to PR

* enabling hyphens within prefixed vocterms
* various editorial updates


\documentclass[11pt,a4paper]{ivoa}
\input tthdefs

\usepackage{todonotes}
\lstloadlanguages{XML,python}
\lstset{flexiblecolumns=true,tagstyle=\ttfamily, showstringspaces=False,
  basicstyle=\footnotesize}

\definecolor{termcolor}{rgb}{0.6,0.1,0.1}

\iftth
\def\vocterm#1{\emph{\color{termcolor}#1}}

\else
\def\vocterm{\startvocterm\realvocterm}
\def\realvocterm#1{\emph{\color{termcolor}#1}\endvocterm}
\begingroup
  \gdef\breakablecolon{:\hskip0pt}
  \catcode`\:=\active
  \gdef\startvocterm{\begingroup
    \catcode`\:=\active\let:=\breakablecolon}
  \gdef\endvocterm{\endgroup}
\endgroup
\fi


\newcommand{\vepitem}[1]{\emph{#1}}

\title{Vocabularies in the VO}

% see ivoatexDoc for what group names to use here
\ivoagroup{Semantics}

\author{Markus Demleitner}
\author{Norman Gray}
\author{Mark Taylor}

\editor{Markus Demleitner}

\previousversion{WD-20200612}
\previousversion{WD-20200326}
\previousversion{WD-20190905}


\begin{document}
\begin{abstract}
In this document, we discuss practices related to the use of RDF-based
consensus vocabularies in the Virtual Observatory, that is, the creation,
publication, maintenance, and consumption of
hierarchical word lists agreed upon within the IVOA.
To cover the wide range of use cases envisioned, we define three flavours
of such vocabularies: SKOS for informal knowledge organisation on the
one hand, and strict hierarchies of classes and properties on the other.
While the framework rests on the solid foundations of W3C RDF,
provisions are made to facilitate using IVOA vocabularies without
specific RDF tooling.
Non-normative appendices detail the current vocabulary-related tooling.
\end{abstract}


\section*{Acknowledgments}

While this is a complete rewrite of the specification of how vocabularies
are treated in the VO, we gratefully acknowledge the groundbreaking work
of the authors of version 1 of Vocabularies in the VO, S\'ebastien
Derriere, Alasdair Gray, Norman Gray, Frederic Hessmann, Tony Linde,
Andrea Preite Martinez, Rob Seaman, and Brian Thomas.

In particular, the vocabulary for datalink semantics done by Norman Gray
was formative for many aspects of what is specified here.

\section*{Conformance-related definitions}

The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.

The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{http://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.

\section{Introduction}

The W3C's Resource Description Framework RDF \citep{note:rdfprimer} is a powerful
and very generic means to represent, transmit, and reason on highly
structured, ``semantic'' information.  With both its power and
generality, however, comes a high complexity for consumers of this
information if no further conventions are in force.  Also, the generic
W3C standards understandably do not cover how semantic resources (e.g.,
vocabularies or ontologies) are to be managed, let alone developed
within organisations like the IVOA.

While for many applications, even within the VO, the significant
complexity and the lack of defined management processes are acceptable,
for several other use cases -- in particular those given in
sect.~\ref{sect:usecases} --, having extra conventions greatly
helps implementability and interoperability.

Based on requirements derived from these use cases
(sect.~\ref{sect:requirements}), this standard will therefore define
conventions for
vocabularies based on either SKOS or RDFS in
sect.~\ref{sect:voccontent}.  Where these vocabularies -- and hence, in
particular, the permanent URIs of their RDF resources (``terms'')
-- are managed by the
IVOA, they need to be reviewed, and consensus needs to be found.  A process to
ensure this is described in
sect.~\ref{sect:management}.  In order
to provide certain guarantees to clients, sect.~\ref{sect:deployment}
defines minimal standards for how IVOA-managed vocabularies must be made
available.  To help adopters merely looking for simple
vocabulary-related recipes, sect.~\ref{sect:withoutrdf} discusses how IVOA
vocabularies can be used without knowledge of RDF.

The non-normative appendices~\ref{app:tools} and \ref{app:curtech}
describe the tooling
currently used or recommended for building and managing vocabularies in the
IVOA.


\subsection{Role within the VO Architecture}

\begin{figure}
\centering

\includegraphics[width=0.9\textwidth]{role_diagram.pdf}
\caption{Architecture diagram for this document}
\label{fig:archdiag}
\end{figure}

Fig.~\ref{fig:archdiag} shows the role the Vocabularies in VO standard
plays within the IVOA architecture \citep{2010ivoa.rept.1123A}.

This standard defines a set of conventions and procedures on
top of several W3C standards that can be adopted by other VO standards
that require interoperable, consensus vocabularies, such as:

\begin{bigdescription}
\item[Datalink \citep{2015ivoa.spec.0617D}] Datalink includes a
vocabulary letting clients work out the kind of artefact a row pertains
to.

\item[VOResource \citep{2018ivoa.spec.0625P}] VOResource 1.1 comes with
several (rather flat) vocabularies enumerating, for instance, the types
of relationships between VO resources, their intended audiences, or
classes of actions performed on them.

\item[VOEvent \citep{2006ivoa.spec.1101S}] VOEvent defines \emph{Why}
and \emph{What} elements which, while not formally required to be drawn
from a specific vocabulary in version 1.11, certainly become much more
useful if they are.

\item[VOTable \citep{2019ivoa.spec.1021O}] VOTable, in its version 1.4,
introduces vocabularies for time scales and reference positions.


\item[UCDs \citep{2007ivoa.spec.0402M}] UCDs are related to vocabularies in
that they provide machine-readable semantics.  Because the terms listed
in the document can be combined and have an underlying grammar, however,
they go beyond standard RDF.
\end{bigdescription}

Other VO standards can do with fewer normative constraints; using W3C
standards without the extra requirements laid down here is explicitly
encouraged where the use cases do not require the extra management and
definition effort, or where perhaps more complex structures (e.g., full
ontologies) must be employed.  An example of a direct use of SKOS
without adoption of the present document is the Simulation Data Model
SimDM \citep{2012ivoa.spec.0503L}, where the values of several fields
are required to be \vocterm{skos:narrower} than certain top-level
concepts, but no further restrictions on the vocabularies need to be
imposed.

\subsection{Relationship to Vocabularies in the VO Version 1}

Published in 2009, version 1.19 of the IVOA Recommendation on
Vocabularies in the VO had an outlook fairly different from the present
document: The big use case was VOEvent's Why and What, and so its focus
was on large, general-purpose vocabularies, of which several existed even
back then, while an overhaul of a thesaurus of general astronomical
terms approved by the IAU in 1993 was underway as part of IVOA's
activities.  Mapping between vocabularies maintained by different VO
and non-VO parties seemed to be the way to ensure interoperability and
therefore played a large role in the document.  Also, the use cases
called for ``soft'' relations, which is why the standard confined itself
to SKOS as the vocabulary formalism.

Since then, ``the'' large astronomy thesaurus has been maintained
outside of the IVOA (the UAT\footnote{\url{http://astrothesaurus.org}}),
and there is hope that its takeup will be sufficient to make mapping
between it and, say, legacy journal keyword systems an exercise general
clients will not have to perform.

Instead, in 2010, a fairly formal vocabulary of what
should be properties (in the RDF sense) rather than \vocterm{skos:Concept}-s
was required during the development of the datalink standard.  The
vocabulary was (and still is) small in comparison to, say, the UAT.  In
contrast to the expectations of Vocabularies~1, the plan had been that
most data providers would work with this small vocabulary, and terms
from external vocabularies would only be used as temporary stand-ins
until the consensus vocabulary was updated.  Of course, this required a
process for managing such vocabularies.  The lack of such a process
became even more noticeable when VOResource 1.1 and VOTable 1.4
introduced vocabularies of their own similar in size and scope to the
datalink vocabulary.

On the other hand, we are not aware of a single attempt to map
between different vocabularies in a VO context, and the SKOS versions of
some vocabularies that Vocabularies 1 declared as normative in its
section~4 were largely unused and have been unmaintained for a while now.

Since large parts of the original specification turned out to be
irrelevant or unsustainable as the VO ecosystem evolved,
while some core requirements found later
were not addressed, it was decided to prepare a new major version of the
Vocabularies in the VO standard.

\subsection{Reading Guide}

We hope that software authors or annotators just wanting to consume IVOA
vocabularies or use them to annotate documents will be able to
do so after reading just section~\ref{sect:withoutrdf}.  In particular, no
deeper understanding of RDF should be necessary.

Persons intending to participate in vocabulary evolution should skim
sect.~\ref{sect:voccontent}, in particular the subsection on the kind of
vocabulary they want to modify, and must study
sect.~\ref{sect:management}.

Readers unfamiliar with RDF should read \citet{local:normanspaper} before
reading anything outside of section~\ref{sect:withoutrdf}.
In particular, we assume familiarity with all RDF
terminology discussed there.  Concepts not covered by Gray's
essay will be informally introduced here.  Of course, the
underlying W3C standards are normative where applicable.



\subsection{Terminology, Conventions, Typography}

When we speak of a \emph{term} here, that means a \vocterm{skos:Concept}
in SKOS vocabularies, an \vocterm{rdfs:Class} in RDF class vocabularies,
or an \vocterm{rdf:Property} in RDF property vocabularies.  We also use
\emph{term} for the string after the hash character in
the ``RDF resource URI'', i.e., the machine-readable string typically used
in annotation.  It is rarely necessary to distinguish between the two
meanings.

We refer to classes and properties by CURIEs.
The prefixes in this
document correspond to the following URIs:

\begin{compactitem}
\item dc -- \url{http://purl.org/dc/terms/}
\item rdf -- \url{http://www.w3.org/1999/02/22-rdf-syntax-ns#}
\item rdfs -- \url{http://www.w3.org/2000/01/rdf-schema#}
\item owl -- \url{http://www.w3.org/2002/07/owl#}
\item skos -- \url{http://www.w3.org/2004/02/skos/core#}
\item ivoasem -- \url{http://www.ivoa.net/rdf/ivoasem#}
\end{compactitem}

Vocabulary terms are written in italics (e.g., \vocterm{rdfs:Class})
and, where supported, in a reddish hue.  As common in IVOA
specifications, XML element and attribute names are written in
typewriter italic (e.g., \xmlel{img}).

\section{Derivation of Requirements (Non-Normative)}

\subsection{Use Cases}
\label{sect:usecases}

The normative content of this document is guided by a set of
requirements derived from the following use cases.

\subsubsection{Controlled Vocabulary in VOResource}
\label{uc:simplevoc}

In VOResource, in certain use cases clients have to find services that
publish a given data collection.  This is effected by linking the resource
records for service and data with a
DataCite-compatible \vocterm{isServedBy} relationship.
Its concrete literal needs to be reliably defined in order to let
clients find such relationships by a simple string comparison in RegTAP
queries.

A related use case is that validators can flag errors (or at least
warnings) when resource records use terms that are not part of some
controlled vocabulary (e.g., content levels or types of events in a
resource's history).  Very typically, such out-of-vocabulary terms
indicate small oversights on the part of the resource record author that
will lead to hard-to-debug problems in data discovery.

\subsubsection{Controlled Vocabularies in VOTable}
\label{uc:votvoc}

VOTable 1.4 constrains two attributes of the TIMESYS elements
-- reference positions and time
scales -- using vocabularies.
While with time scales the situation is not fundamentally
different from the VOResource case discussed in
use case~\ref{uc:simplevoc} -- a simple enumeration of agreed-upon strings
is enough to uniquely determine what operations need to be performed to
combine times given in different time scales --, the situation for
reference positions is probably different.  There, even if a client does
not exactly know the location of, say, the Hubble Space Telescope at any
given time, several important use cases can already be satisfied if a
client knows that it is in low Earth orbit (e.g., assuming a reference
position Geocenter and adjusting the systematic error estimates).  For
this, a client needs information of the type ``\vocterm{HST}
\vocterm{is-close-to} \vocterm{GEOCENTER\/}'' (or similar).

There is also another difference between this and at least the
VOResource relationship vocabulary from use case~\ref{uc:simplevoc}
in that the latter is property-like, as
in ``Resource-1 \vocterm{isServedBy} Resource-2\/''.
In contrast with
this, a time scale would be used like ``Time-coordinate
\vocterm{is-given-in}
\vocterm{TT\/}''.  In RDFS terminology, they are therefore better modelled
as classes rather than properties.

\subsubsection{Datalink Link Selection}
\label{uc:links}

In Datalink, clients receive a set of links
to pieces of information (e.g., previews, additional metadata,
progenitors, or
derived data) and need to present to the user only those items
relevant to the task at hand.  For instance, in a discovery phase, only
previews should be offered, while scientific exploitation would call for
cutout services, alternate formats, or derived data.  For debugging,
progenitors should be made accessible, and so on.

Operators of datalink services, on the other hand, want to be precise in
their annotation of datasets.  For instance, they may want to discern
among progenitors the raw image, a dark frame, and a flat field.  In all
these cases, clients should still be able to work out that such
artefacts are progenitors.

\subsubsection{VOEvent Filtering, Query Expansion}
\label{uc:filtering}

In VOEvent, an event stream can contain a classification of what the
observers believe was observed, for instance ``supernova Ia explosion''.
While an event stream from one project might provide a classification on
that level for some event, it might not (yet) be able to do that for
another event, and a different event stream might not be able to
distinguish between different sorts of supernovae at all.

In this situation, an event broker looking for supernovae of type Ia
will filter out anything not related to supernovae; however, since for
one reason or another a Ia supernova might only be tagged as supernova,
it will want to widen its filter somewhat, where some backend process
might prioritise events classified as Ia upstream over those only tagged
as a generic supernova, and those, again, over those tagged explicitly
as some different type of supernova.

Similar use cases exist, for instance, in the discovery of simulations
and possibly for subjects of VO resources.


\subsubsection{Vocabulary Updates in VOResource}
\label{uc:deprecation}

In VOResource 1.0, relationship types like \vocterm{served-by} or
\vocterm{service-for} were defined.  Later, DataCite defined equivalent
terms \vocterm{IsServedBy} and \vocterm{IsServiceFor}.  Arguably, the VO should,
as far as sensible, take up standards in the wider data management
community, and so VOResource 1.1 adopts the DataCite terms.  In a minor
version, it cannot forbid the old terms.  It can, however, say not only
``\vocterm{served-by\/} is the same as \vocterm{isServedBy\/}'' but also
``Use the latter term in preference to the former''.  If this information is
available machine-readably, validators can warn against the use of
deprecated terms and user interfaces can transparently replace
deprecated terms with current ones.  This latter use case is
already specified in RegTAP 1.1 \citep{2019ivoa.spec.1011D}.

Another use case in the context of VOResource and vocabulary updating
is the definition of content levels.
In VOResource 1.0, a list of
terms was adopted that was far too fine-grained in the area of public
outreach, distinguishing, for instance, ``Middle School'' from
``Secondary Education''; while this granularity was useful for the
original realm of the list of terms, in the VO it resulted in extremely
inhomogeneous annotation.  Obviously, persons employed in research
institutions can hardly be expected to assess needs and capabilities of
middle school versus elementary school educators.  Eventually, for
VOResource 1.1 a three-term list was drawn up and is now actually used.
To avoid a repetition of such an experience, we want to enable small
initial vocabularies that are easily extendable as new terms are actually needed
and the use of the existing terms is well understood.


\subsubsection{Vocabularies in VO-DML}

The modelling language VO-DML \citep{2018ivoa.spec.0910L} lets model
designers constrain attribute values through external resources defined
through a vocabulary URI and possibly a top concept.  The standard
mentions both SKOS -- inspired by version 1 of this document -- and RDFS
as possible technologies for such constraints.

Depending on the nature of the attributes constrained, modellers might
foresee the need for having these vocabularies managed by the IVOA.  Of
course, that is up to the modeller: There are certainly many cases in
which there is no need for the overhead this specification brings with
it, be it because vocabularies are externally defined or because the
concrete application profits from less-constrained vocabularies.

\subsubsection{Discovering Meanings}
\label{uc:discovering}

Software developers or researchers want to work out
what some term mentioned ``means'' (where we are agnostic as to what
``means'' should mean here).
If the term URI alone is insufficient,
they can simply paste the resource URI of the term into a web browser
and read (at least) its description and perhaps find out even more using
relationships between terms.

\subsubsection{Simple Review Process}
\label{uc:simplereview}

As vocabularies evolve, new terms are being added to
vocabularies.  To facilitate their review and enable rapid uptake
of the proposed terms, it is desirable that new terms and even
new vocabularies are immediately visible to users and tools.
Note that since terms under review might be modified or removed later,
this use case is somewhat in conflict with the basic requirement
of stable vocabularies (i.e., a document valid once will not
become invalid later because of changes in vocabularies).

\subsubsection{Understanding Vocabulary Evolution}
\label{uc:understanding}

When a question comes up what, say, \vocterm{calibration} actually means
in the datalink core vocabulary, and the (legacy) description is not
sufficiently clear, people can go back to the discussions that led up
to the addition of that term.  This will also help clarify existing
usage that might have begun at the time of the initial definition.

\subsubsection{Offline operation}
\label{uc:offline}

A system doing, say, coordinate transformations runs without an internet
connection but still needs to use semantic resources on frames and
reference positions (e.g., figure out that a given space probe is in L1
and use that as reference position).  To do that, it wants to use a
previously downloaded copy of the vocabulary.

\subsubsection{UAT in VOResource}
\label{uc:uat}

VOResource 1.1, in the description of the \xmlel{subject} element, says
that its content should be drawn from the ``Unified Astronomy Thesaurus''
(here: UAT).  This is intended to later facilitate interactive topic
navigation within the Registry or semantic expansion of Registry queries
(``include narrower terms'').


\subsection{Requirements}
\label{sect:requirements}

\subsubsection{Lists of Terms}
\label{req:lists}

We need to be able to represent simple lists of terms even for the most
basic use case~\ref{uc:simplevoc}.  As per
use case~\ref{uc:votvoc}, we will have to represent instances of both
\vocterm{rdf:Property} and \vocterm{rdfs:Class} (though not necessarily
in one vocabulary).  In order to not break existing practices (e.g.,
use cases \ref{uc:simplevoc}, \ref{uc:votvoc}, \ref{uc:links}), the
machine-readable terms must be allowed to follow existing patterns of
essentially human-readable identifiers (against external best practices
of using non-informative URI forms).  In general, in essentially all use
cases discussed, making the machine-readable terms discernible by a
human is an advantage.

\subsubsection{Hierarchies of Terms}
\label{req:hierarchy}

Both use case~\ref{uc:links} and use case~\ref{uc:filtering} require a hierarchy
of terms, where clients can find wider and potentially narrower terms
relative to an original one.  There is a difference,
however: in the datalink use case, strict \vocterm{is-a} relationships
are what clients need (e.g., ``give me all kinds of previews'').  In the
VOEvent case, however, a somewhat softer sort of hierarchy is required.
For instance, a filter for accretion disks might very well expand to
match both quasars and cataclysmic variables.  Hence, we want to
be able to represent strict class hierarchies as well as thesaurus-like
soft knowledge structures.

\subsubsection{Tree-like Hierarchies}
\label{req:tree}

Where we expect some sort of semi-formal inference to take place on the
vocabularies, the hierarchy should be a tree in order to facilitate
traversal and controlled query expansion.  In other words, outside of
SKOS we do not support multiple inheritance.  Use cases requiring
something equivalent would have to resort to supporting multiple terms
on the annotation level.

\subsubsection{Consensus Vocabularies}
\label{req:consensus}

Essentially all of our use cases will be much easier to implement if
clients can work through simple string comparisons.  Therefore,
wherever feasible IVOA standards should build on IVOA-sanctioned,
consensus vocabularies.

\subsubsection{Deprecating Terms}
\label{req:deprecating}

While we believe at this point that terms once approved by the IVOA
should never disappear -- for instance, because validators might
otherwise flag previously valid instance documents as invalid --, use
case~\ref{uc:deprecation} shows that some way of declaring
deprecations must be foreseen.
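
As a purely illustrative, non-normative sketch of what a validator could
do with machine-readable deprecation information, consider the following
Python fragment.  The table \texttt{DEPRECATED}, the set
\texttt{KNOWN\_TERMS}, and the function \texttt{check\_term} are
hypothetical names introduced only for this example; the two term pairs are
the ones named in use case~\ref{uc:deprecation}, and hard-coding them
stands in for reading them from a published vocabulary.

```python
# Hypothetical lookup table (an assumption for illustration): a real
# client would derive it from the published vocabulary, not hard-code it.
DEPRECATED = {
    "served-by": "IsServedBy",
    "service-for": "IsServiceFor",
}
# All terms this toy vocabulary accepts, current and deprecated.
KNOWN_TERMS = {"IsServedBy", "IsServiceFor"} | set(DEPRECATED)


def check_term(term):
    """Return (term_to_use, message) for a relationship type.

    message is None for a current term, a warning for a deprecated
    term, and an error text for a term not in the vocabulary.
    Comparison is a plain, case-sensitive string comparison.
    """
    if term not in KNOWN_TERMS:
        return term, "error: '%s' is not in the vocabulary" % term
    if term in DEPRECATED:
        replacement = DEPRECATED[term]
        return replacement, "warning: '%s' is deprecated, use '%s'" % (
            term, replacement)
    return term, None
```

A deprecated term therefore remains valid, in line with the requirement
above, but the caller learns which term to prefer.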

\subsubsection{Public Availability of Machine-Readable Vocabularies}
\label{req:machine}

In particular in use cases~\ref{uc:links} and \ref{uc:filtering},
clients can flexibly incorporate vocabulary updates without code
changes, perhaps even without re-deployment, if vocabularies are
available at constant, public URIs, where clients can retrieve them in
formats reasonably easy to parse.

Use case~\ref{uc:discovering} implies that at least one representation
of the vocabulary should be human-readable.

\subsubsection{Minimal Term Metadata}
\label{req:mtm}

To support use case~\ref{uc:discovering}, all terms in IVOA vocabularies
MUST come with a non-trivial description.

\subsubsection{Simple Cases do not Require RDF Tooling}
\label{req:nordf}

(Not derived from any specific use case).  Since libraries implementing
(some subset of) RDF tend to be rather massive and thus appear
disproportionate when all a client wants is an up-to-date list of terms
with their descriptions, at least the basic use cases must not require
specific RDF tooling.  Indeed, simple uses should not require an
understanding of RDF in the first place.


\subsubsection{Vocabulary Evolution}
\label{req:evolution}

Most use cases make it desirable that terms can be added to existing
vocabularies; this is very clear for the reference positions in
use case~\ref{uc:votvoc}, where new instruments would imply new
terms.
The history of content level annotation in VOResource mentioned
in use case~\ref{uc:deprecation} illustrates the desirability of a
simple process that invites standard authors to start with minimal
vocabularies, relying on later extensions.

\subsubsection{Traceable Provenance}
\label{req:traceable}

To satisfy use case~\ref{uc:understanding}, the considerations that led
to the adoption or modification of a term must be documented publicly
in sufficient detail.  It is clearly an advantage if a brief, accessible
summary of these considerations can easily be found without, say,
resorting to version control logs.

\subsubsection{Preliminary Vocabularies and Terms}
\label{req:preliminary}

In use case~\ref{uc:simplereview}, it is desirable to admit
``preliminary'' vocabularies and terms.  For these, both humans
and machines must be able to discern a temporary status, and
their use implies that the general rule ``once valid, always
valid'' does not apply.  Validators and similar software could
then add notices to that effect in their outputs.

\subsubsection{Vocabulary Files are Usable Stand-Alone}
\label{req:standalone}

Vocabulary files need to be cacheable without applications having to
manage extra metadata (e.g., the URL from which the file was obtained)
in order to easily satisfy use case~\ref{uc:offline} (or other scenarios
in which vocabulary content cannot be retrieved from the IVOA
site for each session).

\subsubsection{Externally Curated Vocabularies and VO Tooling}
\label{req:external}

Regrettably, VOResource does not explain what use case~\ref{uc:uat} would
look like in actual documents, and the example given in the document
clearly does not use UAT concepts.

The first difficulty in a straightforward uptake is that UAT URIs look
like \url{http://astrothesaurus.org/uat/1774}.  Given that, should
publishers have such URIs in \xmlel{subject}?  Or should they rather use
just the last URI segment for conciseness?  Or perhaps the preferred
labels, in keeping with the style of existing subject content and its
use by clients (which typically look for natural language in subject),
even though the labels are not considered stable?

Regardless of how VOResource clarifies this matter, UAT artefacts (e.g.,
SKOS files) do not match some of our other requirements.  In particular,
the human-readable URIs from \ref{req:lists}, the specific way we
satisfy \ref{req:machine}, and the non-RDF requirement \ref{req:nordf} are
not immediately satisfied by the UAT as distributed at the time of
writing.

For simple, uniform use of such externally curated vocabularies, it
should be possible to have some sort of endorsement process and then
distribute the vocabularies in a form compliant with this specification.
This will entail IVOA-specific concept URIs, and we must be able to
express that these resources have the same meaning as the ones
externally maintained.


\subsection{Non-Requirement}

This specification is not called ``Semantics in the VO'' or the like
because we do \emph{not} intend to prescribe ways to turn any VO
artefact into RDF triples.
Indeed, for many existing vocabularies, it
is left open what exactly the domain or range of properties might be or
what subject and predicate the classes or concepts should be used with.

This is partly because doing so would substantially complicate the
generation of vocabularies -- which would quickly turn into proper
ontologies --, partly because the information encoded by
the triples has traditionally been expressed using techniques developed
by the Data Models working group.

In particular with a view to later use in linked data scenarios,
vocabulary authors should nevertheless take care that, given appropriate
properties or annotation tools, the vocabularies \emph{could} be used in
meaningful RDF triples.

Conversely, this specification is written with future ``deeper''
semantics in the VO in mind; tools restricting their operations to the ones
discussed here should not break when future specifications enrich
existing vocabularies towards full ontologies.


\section{Using IVOA Vocabularies without RDF Tooling}
\label{sect:withoutrdf}

RDF is a
powerful system for expressing a wide range of semantics and enriching
various documents with semantic information in a globally distributed
fashion. Due to its generality, handling its artefacts is relatively
involved and in general requires special tooling, non-negligible
investment in understanding RDF, and non-trivial management of URIs and
prefix mappings.

To lower the bar for an adoption of IVOA vocabularies
[requirement~\ref{req:nordf}], they are given in
two formats usable without RDF tooling or, indeed, deeper knowledge of
RDF. This section discusses these.

\subsection{Choosing Terms From IVOA Vocabularies}

Resource annotators can usually treat IVOA Vocabularies as simple lists
of (case-sensitive) strings with human-readable labels and definitions.
These lists can be inspected with a simple web browser.

Each IVOA vocabulary has an associated URI starting with
\url{http://www.ivoa.net/rdf}. Dereferencing that URI yields a list of
the vocabularies approved or under review.

An individual vocabulary has a
URI like \url{http://www.ivoa.net/rdf/refposition}. Dereferencing this URI
with a web browser (or, indeed, any user agent indicating it prefers
text/html media) redirects to a tabular representation of the vocabulary,
giving \emph{terms} -- i.e., the strings actually used in annotation --,
\emph{labels} -- i.e., strings that should be presented to humans instead of
the slightly formalised terms --, and \emph{descriptions}, which should
be sufficiently precise to allow someone with a certain amount
of domain expertise to decide whether a certain ``thing'' is or is not
covered by the term (or more precisely, the underlying concept).

Some terms may be marked as deprecated, in which case they should no
longer be used in new annotations. In most cases, deprecated terms will
come with information about what to use instead.

Some terms may be marked as preliminary. Such terms might disappear
without further notice. Casual users should avoid the use of such
terms; if they find they want to use them, the semantics working group
requests notification over its mailing list, since such use is clearly
relevant to the term's adoption process.

Once a term is located within the HTML page, annotators can usually
directly use it in instance documents.
For instance, continuing the
refposition example, the string \texttt{BARYCENTER} found in the
vocabulary is directly used in VOTable's TIMESYS element.

Some applications (Datalink being the prime example) instead use URIs
relative to the vocabulary URI. In practical terms, this just means
that a hash sign is prepended to the term (e.g., \texttt{\#progenitor}).

This latter practice builds on the property of IVOA vocabularies that if
one adds the term as fragment to the vocabulary URI (e.g.,
\url{http://www.ivoa.net/rdf/refposition#BARYCENTER}), that URI is the full,
RDF-compliant resource identifier of the concept. When used in
HTML-aware user agents (such as a web browser), dereferencing this URI
(i.e., opening it) will give the table of terms with the chosen term
highlighted. How exactly this is represented depends on the user agent.


\subsection{Semantic Operations Without RDF Tooling}
\label{sect:desise}

Many VO components need a machine-readable representation of the
entire vocabulary, for instance in order to
(cf.~sect.~\ref{sect:usecases}):

\begin{compactitem}
\item display labels and descriptions for terms to users,
\item perform query expansion or similar exploitation of hierarchical
relationships, or
\item validate annotated instances for the use of correct and current
terms.
\end{compactitem}

To let VO programs perform such tasks with minimal technical overhead,
in addition to the RDF artefacts described in
sect.~\ref{sect:deployment}, IVOA vocabularies are also available in an
ad-hoc format called desise (``dead simple semantics''). Clients can
obtain vocabularies in desise by retrieving the vocabulary URI with the
HTTP accept header set to \texttt{application/x-desise+json}.

What is returned is a JSON-encoded \citep{std:JSON} mapping (``object''
in JSON terms)
containing the following keys (all mandatory):

\begin{description}
\item[uri] The vocabulary URI. All terms occurring in desise documents
can be turned into full, RDF-compliant resource URIs by prefixing them
with this URI and a hash character.
\item[flavour] The flavour of the vocabulary (can generally be ignored;
see sect.~\ref{sect:voccontent}).

\item[terms] A JSON object mapping the (machine-readable) terms to a
JSON object giving the term's properties as described below.
The keys in \textit{terms} are the strings used in
machine-readable data.
\end{description}

The JSON objects present as values in the terms object can have the
following keys:

\begin{description}
\item[label] (mandatory)
A human-readable label for display purposes; clients should
always try to display this rather than the raw term.

\item[description] (mandatory) A human-readable definition of the underlying
concept.

\item[deprecated] present and mapped to a reserved value if the term is
deprecated and should no longer be used; validators will warn against
its use.

\item[preliminary] present and mapped to a reserved value if the term
is preliminary, meaning that in contrast to the other, ``eternal'' terms
it can disappear again; validators should qualify a validation as
preliminary if a document uses such a term.

\item[wider] (mandatory) A JSON array
of ``wider'' terms.
Most IVOA vocabularies are
tree-like, and for them, there is at most one term in here, which
is the parent node, i.e., the hypernym of the current term.
In SKOS-flavoured vocabularies, multiple terms can be here, and the
meaning of ``wider'' is a bit less clear-cut. The \textit{wider} list
is empty for top-level terms.

\item[narrower] (mandatory) A JSON array
of ``narrower'' terms. In SKOS-flavoured
vocabularies, that is just a list of all terms that list the current
term as wider. Otherwise, the vocabularies are tree-like and
\textit{narrower} is a list of all terms on the term's branch and below
it in the tree (it is the transitive closure of the inverse of
``wider''). This is much more easily understood in an example, which we
give below in the discussion on addressing use case~\ref{uc:links}.
\end{description}

Note that, while \textit{wider} and \textit{narrower} are mandatory
keys, their values can of course be empty lists.

See appendix~\ref{app:desiseexample} for an example of a vocabulary
represented in desise.

For illustration, here are recipes to solve the various use cases in
Python:

\paragraph{Load a vocabulary} Using the popular requests module:\\
\begin{lstlisting}
import requests
# ask for the desise rendering through content negotiation
voc = requests.get(
    "http://www.ivoa.net/rdf/uat",
    headers={"accept": "application/x-desise+json"}
).json()
\end{lstlisting}

Note, however, that non-trivial clients should cache files retrieved in
this way for a reasonable time span; IVOA vocabularies typically do not
change on time scales of months.
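
The caching advice above can be sketched as follows. This is only an
illustration under stated assumptions: the function name
\texttt{load\_cached\_vocabulary}, the naming of the cache files, and the
default lifetime of 30 days are inventions of this example and are not
defined by this specification. The argument \texttt{fetch} stands for any
callable that returns the parsed desise mapping for a vocabulary URI, for
instance a thin wrapper around the requests call shown above.

```python
import json
import os
import re
import time


def load_cached_vocabulary(voc_uri, cache_dir, fetch, max_age=30 * 86400):
    """Return the desise mapping for voc_uri, using a local cache.

    fetch is a callable taking the vocabulary URI and returning the
    parsed desise mapping; it is only called when no cached copy exists
    or the cached copy is older than max_age seconds.  All names and
    the 30-day default are assumptions made for this sketch.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Derive a file name from the URI; desise documents carry their
    # own "uri" key, so no further metadata has to be stored.
    file_name = re.sub(r"[^A-Za-z0-9]+", "_", voc_uri) + ".desise"
    cache_path = os.path.join(cache_dir, file_name)

    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        age = None  # nothing cached yet

    if age is None or age > max_age:
        voc = fetch(voc_uri)
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(voc, f)
        return voc

    with open(cache_path, encoding="utf-8") as f:
        return json.load(f)
```

Because the cached file is a complete desise document including its
\textit{uri} key, it remains usable stand-alone in the sense of
requirement~\ref{req:standalone}.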

\paragraph{See if a term is in the vocabulary} (\ref{uc:simplevoc},
\ref{uc:votvoc})\\ \lstinline{term in voc["terms"]}

\paragraph{See if a term is deprecated} (\ref{uc:deprecation})\\
\lstinline{"deprecated" in voc["terms"][term]}

\paragraph{Find a human-readable label for a term}
(\ref{uc:discovering})\\
\lstinline{voc["terms"][term]["label"]}

\paragraph{Find a human-readable description for a term}
(\ref{uc:discovering})\\
\lstinline{voc["terms"][term]["description"]}

\paragraph{Find out if a term is preliminary} (\ref{uc:simplereview})\\
\lstinline{"preliminary" in voc["terms"][term]}

\paragraph{Query expansion: select branch} (in \ref{uc:links}, select all
progenitors, including flat fields, dark frames, etc.)
\begin{lstlisting}[language=python]
base_term = "progenitor"
expanded_terms = set(
    [base_term]
    +voc["terms"][base_term]["narrower"])
# [1:] strips the leading hash sign from the datalink semantics value
is_match = datalink_row["semantics"][1:] in expanded_terms
\end{lstlisting}

\paragraph{SKOS-type query expansion by neighbouring terms}
(\ref{uc:filtering})
\begin{lstlisting}[language=python]
assert voc["flavour"]=="SKOS"
expanded_terms = set(
    [base_term]
    +voc["terms"][base_term]["narrower"]
    +voc["terms"][base_term]["wider"])
is_match = keyword_found in expanded_terms
\end{lstlisting}


\section{Vocabulary Content}
\label{sect:voccontent}

IVOA vocabularies MUST be based on W3C's Resource Description Framework.
Details on required serialisations are given in
sect.~\ref{sect:deployment}. This section deals with what kinds of
statements users of IVOA vocabularies SHOULD evaluate to ensure
interoperability. Statements of other types are legal in IVOA
vocabularies but are not expected to be interpreted interoperably.
Clients MAY ignore them.

In IVOA vocabularies, the concept URI MUST begin with
\url{http://www.ivoa.net/rdf}\footnote{In retrospect, the unnecessary
``www'' in this URI is somewhat regrettable, but existing vocabularies
have used URIs including it, and it seems a small price to pay for
having uniform URIs.}. It is recommended to not introduce
additional hierarchy levels, i.e., vocabulary URIs SHOULD be direct children
of \texttt{rdf}\footnote{Some existing vocabularies do not follow this
rule; since vocabulary URI changes will break certain usage scenarios,
their URIs are still retained.}.

Since all vocabularies specified here are
single-file, the full term (i.e., RDF resource)
URI is formed by appending a hash sign
and a fragment identifier. In IVOA vocabularies, this fragment
identifier MUST consist of ASCII letters, numbers, underscores and
dashes exclusively [for requirement~\ref{req:machine}].

The fragment identifiers in the vocabulary URIs SHOULD be
human-readable; they are usually formed by suitably contracting the
preferred label. In the IVOA, we do \emph{not} use natural
language-neutral concept identifiers but instead expect that domain
experts will already have an impression of a term's meaning from looking
at its URI.

In this specification, we distinguish three different ``flavours'' of
vocabularies. Each covers a particular domain of problems and is
therefore subject to different requirements.
Although the requirements are largely non-contradicting, each vocabulary must
be clearly identified as \emph{either} giving SKOS concepts, RDFS
classes or RDF properties so clients know how to extract word lists and
hierarchies; see sect.~\ref{sect:genprop}
for details.


\subsection{SKOS Vocabularies}
\label{sect:skosvoc}

SKOS vocabularies should be used where terms are organised
in informal (i.e., not necessarily strict is-a)
hierarchies. The classic use case here is query expansion, where, for
instance, a search for ``AGN'' might be expanded to include matches for
``accretion disk'' (under certain circumstances).

The terms in SKOS vocabularies have the RDF type \vocterm{skos:Concept}.

\subsubsection{Properties in SKOS Vocabularies}
\label{sect:skosvoc-prop}

IVOA SKOS vocabularies use the following properties:

\begin{itemize}
\item \vocterm{skos:broader} -- interpreted in the standard SKOS sense.
The reverse property, \vocterm{skos:narrower}, MAY be given, but clients
MUST NOT depend on its presence [this satisfies
requirement~\ref{req:hierarchy}].

\item \vocterm{skos:prefLabel} -- all concepts MUST have an
English-language preferred label, which is an RDF plain literal [by
requirement~\ref{req:mtm}]. No RDF language label is allowed on the
literal, and only one preferred label is permitted
[these help requirement~\ref{req:nordf}].

\item \vocterm{skos:definition} -- all concepts MUST have a non-trivial
English-language definition.
It is obviously impossible to define
``non-trivial'' in a rigorous way; a suggested criterion is that a
domain expert would, given the definition, presumably arrive at a
similar preferred label, and recursive definitions (i.e., those using
the label itself) should be avoided whenever possible. Definitions in
non-English languages are not permitted, and only one definition is
permitted [again, this helps requirement~\ref{req:mtm}].

\item \vocterm{skos:exactMatch} -- for externally managed vocabularies
the IVOA has endorsed (see sect.~\ref{sect:externally-managed}), this
property links the IVOA term (subject) to the external RDF resource
(object).

\item General properties discussed in sect.~\ref{sect:genprop} [this is
for requirements~\ref{req:deprecating} and
\ref{req:preliminary}]. The \vocterm{ivoasem:vocflavour} of these
vocabularies is \verb|SKOS|.
\end{itemize}

This specification does not include requirements on the use or the
interpretation of \vocterm{skos:related},
\vocterm{skos:closeMatch}, \vocterm{skos:broadMatch},
\vocterm{skos:narrowMatch}, \vocterm{skos:ConceptScheme},
\vocterm{skos:inScheme}, \vocterm{skos:hasTopConcept},
\vocterm{skos:altLabel}, and \vocterm{skos:hiddenLabel}. If use cases
are found that require those, this specification will be amended. Until
then, vocabulary authors SHOULD NOT use them in order to avoid creating
practices that might conflict with later usage patterns.

This specification does not include requirements on the use or the
interpretation of the transitive SKOS properties
(\vocterm{skos:broaderTransitive}, \vocterm{skos:narrowerTransitive}).
At this point, we believe that applications requiring this type of
reasoning-friendly semantics should preferably use RDF class
vocabularies.

\subsubsection{Example (non-normative)}

Here is a term from a SKOS vocabulary conforming to this specification
in RDF/XML serialisation:

\begin{lstlisting}[language=XML]
AGN
A compact object in the center of a galaxy showing
unusual emission ("active galactic nucleus").
\end{lstlisting}

\subsection{RDF Properties Vocabularies}
\label{sect:refpropvoc}

RDF properties vocabularies should be used when the terms in the
vocabulary are mainly used to state
relationships between entities that can sensibly be imagined as
resources in the RDF sense. Such terms would naturally be used as
predicates in RDF triples. Obvious examples might be something
like is-progenitor-for in a provenance chain or, indeed, the special
properties for IVOA vocabularies introduced in sect.~\ref{sect:genprop}.


The terms in RDF Properties vocabularies have the RDF type
\vocterm{rdf:Property}.

\subsubsection{Properties in RDF Properties Vocabularies}
\label{sect:propvoc-prop}

IVOA RDF properties vocabularies use the following properties (where
not specified, the requirements considered essentially match those in
sect.~\ref{sect:skosvoc-prop}):

\begin{itemize}
\item \vocterm{rdfs:label} -- all terms MUST have an English-language
label, and clients should prefer it over the fragment in the
term URI for presentation purposes.
Only
one such label is permitted.

\item \vocterm{rdfs:comment} -- all concepts MUST have a non-trivial
English-language comment serving as a human-oriented definition of the
term. The considerations for \vocterm{skos:definition} in
sect.~\ref{sect:skosvoc-prop} apply. As for those, only one
\vocterm{rdfs:comment} per term is allowed.

\item \vocterm{rdfs:subPropertyOf} -- interpreted as in RDFS to induce
the hierarchy of terms; a term MUST NOT appear as subject of more than
one \vocterm{rdfs:subPropertyOf} triple (i.e., the hierarchy is a tree).

\item General properties discussed in sect.~\ref{sect:genprop}.
The \vocterm{ivoasem:vocflavour} of these vocabularies is
\verb|RDF Property|.

\end{itemize}

\subsubsection{Example (non-normative)}
\label{sect:rdfpxex}

\begin{lstlisting}[language=XML]
preview of the data as a 2-dimensional
image
Image preview
\end{lstlisting}


\subsection{RDF Class Vocabularies}

RDF class vocabularies should be used when the terms in the vocabulary
are reasonably class-like, i.e., would usually be either subjects or
objects in RDF triples. As opposed to SKOS vocabularies, the hierarchy
implied is strict in the sense of \vocterm{rdfs:subClassOf}
-- roughly, that statements true for a wider term must be true
for a more specialised term, too. This lets clients confidently perform
inferences.
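
As a minimal sketch of such an inference, assume a desise \textit{terms}
mapping as described in sect.~\ref{sect:desise}. In a tree-like
vocabulary, a client can then decide whether a term is a specialisation
of another one by following \textit{wider} upwards. The function name
\texttt{is\_a} and the toy hierarchy below are hypothetical; they are
made up for this illustration and are not taken from any published IVOA
vocabulary.

```python
def is_a(terms, term, ancestor):
    """Return True if term is ancestor or one of its specialisations.

    terms is the "terms" mapping of a desise document for a tree-like
    (RDF Class or RDF Property) vocabulary, where "wider" holds at most
    one parent.
    """
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)  # guards against cycles in malformed input
        wider = terms[term]["wider"]
        term = wider[0] if wider else None
    return False


# Invented toy hierarchy, for illustration only.
toy_terms = {
    "equatorial": {"wider": []},
    "FK4": {"wider": ["equatorial"]},
    "galactic": {"wider": []},
}

fk4_is_equatorial = is_a(toy_terms, "FK4", "equatorial")
galactic_is_equatorial = is_a(toy_terms, "galactic", "equatorial")
```

Such a check is only safe to act on in the strict flavours; in a
SKOS-flavoured vocabulary, \textit{wider} does not guarantee an is-a
relationship.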

For instance, coordinates in the FK4 reference frame are equatorial, and
thus even a client unfamiliar with the FK4 frame as such can confidently
infer that the coordinates are right ascension and declination, and that
right ascensions increase eastwards. Reasoning of this type is
impossible within a SKOS vocabulary.

The terms in RDF Class vocabularies have the RDF type
\vocterm{rdfs:Class}.

\subsubsection{Properties in RDF Class Vocabularies}
\label{sect:classvoc-prop}

IVOA RDF class vocabularies use the following properties:

\begin{itemize}
\item \vocterm{rdfs:label} -- all terms MUST have an English-language
label, and clients should prefer it over the term (the fragment of the
term URI) for presentation purposes. Only
one such label is permitted.

\item \vocterm{rdfs:comment} -- all concepts MUST have a non-trivial
English-language comment serving as a human-oriented definition of the
term. The considerations for \vocterm{skos:definition} in
sect.~\ref{sect:skosvoc-prop} apply. As for those, only one
\vocterm{rdfs:comment} per term is allowed.

\item \vocterm{rdfs:subClassOf} -- interpreted as in RDFS to induce
the hierarchy of terms; a term MUST NOT appear as subject of more than
one \vocterm{rdfs:subClassOf} triple (i.e., the hierarchy is a tree).

\item General properties discussed in sect.~\ref{sect:genprop}.
The \vocterm{ivoasem:vocflavour} of these vocabularies is
\verb|RDF Class|.
\end{itemize}

\subsubsection{Example (non-normative)}

Here is a term from an RDF class vocabulary conforming to this
specification in RDF/XML serialisation:

\begin{lstlisting}[language=XML]
Positions based on the 5th Fundamental Katalog. If no equinox is
[...]
FK5
\end{lstlisting}

\subsection{General Properties}
\label{sect:genprop}

To cover requirements~\ref{req:deprecating} and
\ref{req:preliminary} and to facilitate the handling of vocabularies not
directly retrieved via HTTP (which means that the application may not
know the vocabulary URI a priori; cf.~requirement~\ref{req:standalone}),
the Semantics WG defines some
properties of its own in the vocabulary
\url{http://www.ivoa.net/rdf/ivoasem}. The following properties may be
used in all three vocabulary flavours:

\begin{itemize}
\item \vocterm{dc:created} -- IVOA vocabularies MUST include exactly one
triple with the vocabulary as subject and a predicate
\vocterm{dc:created}. The object is the datestamp of the vocabulary in
YYYY-MM-DD format. Clients should use this only for debugging and similar
purposes.

\item \vocterm{ivoasem:vocflavour} -- IVOA vocabularies MUST include
exactly one triple with the vocabulary as subject and a string literal
specifying the kind of vocabulary as per this specification. The
``General properties'' bullet points of sects.~\ref{sect:skosvoc-prop}
(\verb|SKOS|), \ref{sect:propvoc-prop} (\verb|RDF Property|), and
\ref{sect:classvoc-prop} (\verb|RDF Class|) define what strings may occur
here.

\item \vocterm{ivoasem:preliminary} -- this property indicates
that a term is preliminary and might disappear from the
vocabulary without warning. The object of triples using it
is a blank node. Validators need not warn against the use
of preliminary terms, but as they encounter them, they SHOULD
qualify their validation to the effect that it is temporary.

\item \vocterm{ivoasem:deprecated} -- this property indicates
that a term is deprecated. The object of triples using it
is a blank node. Validators SHOULD issue warnings if such terms
are encountered.

\item \vocterm{ivoasem:useInstead} -- for a deprecated term, the
objects of RDF triples using this property indicate
which terms should be
used instead of the deprecated one.

\end{itemize}

\subsubsection{Example (non-normative)}

The following snippets show RDF/XML triples using the common terms,
taken from the existing relationship\_type vocabulary; the notation
\verb|__| as a blank node is an implementation detail and must not be
relied upon. In general, where ivoasem properties take blank nodes as
objects, clients should normally just ignore the objects.

\begin{lstlisting}[language=XML]
2016-08-17
RDF Property
\end{lstlisting}


\section{Vocabulary Management}
\label{sect:management}

This section discusses the processes through which new vocabularies can be
defined and how vocabulary updates are performed in a way
that ensures community participation and at least a minimal level of
consensus; the procedures here primarily address requirements
\ref{req:consensus}, \ref{req:evolution} and \ref{req:traceable}.

In the following, the phrase ``chair of the Semantics WG'' is understood
to mean ``chair or vice-chair of the Semantics WG''; in the unlikely
situation that chair and vice-chair disagree, the resolution of the
problem is up to the TCG chair.


\subsection{New Vocabularies}
\label{sect:new-vocabularies}

New vocabularies in the VO should be introduced with a document going
through the normal IVOA approval process, i.e., intended to become a
recommendation or an endorsed note with RFC as described in the IVOA
Document Standards \citep{2017ivoa.spec.0517G}.

At the discretion of the chair of the Semantics WG, the vocabulary is
uploaded to the vocabulary repository when a document reaches the state
of a Working Draft. At the latest, the vocabulary is uploaded when the
document becomes a Proposed Recommendation or a Proposed Endorsed Note
in order to support a thorough review and reference implementations.

The entire vocabulary is marked human-readably as preliminary in the
vocabulary index (cf.~sect.~\ref{sect:deployment}). All terms in the
vocabulary are marked as preliminary using the
\vocterm{ivoasem:preliminary} property (cf.~sect.~\ref{sect:genprop}) in
order to satisfy requirement~\ref{req:preliminary}.

The entire new vocabulary is approved when the document introducing it
reaches the status of a Recommendation or an Endorsed Note. From then
on, it is managed by the Semantics WG using the process defined in
the next section.

Once approved (i.e., no longer marked as preliminary),
terms in IVOA vocabularies cannot be removed. They can,
however, be marked as deprecated.

\subsection{Updating Vocabularies}
\label{sect:updating-vocabularies}

IVOA vocabularies can be extended as domain requirements develop
[requirement~\ref{req:evolution}]. Clients
should therefore be designed such that they gracefully deal with terms
that have not been part of the vocabulary at build time, typically by
exploiting information in the vocabulary, perhaps by falling back to
wider, known terms, or by presenting their users labels and descriptions
for terms not explicitly handled.


\subsubsection{Vocabulary Enhancement Proposals}

To add one or more terms to a vocabulary, to introduce deprecations or
to change term labels, descriptions, or relationships,
an interested party -- not necessarily affiliated with the Working Group
that originally introduced the vocabulary -- prepares a Vocabulary
Enhancement Proposal (VEP).
In the interest of thorough review and
topical discussion, a single VEP should only cover directly related
terms. For instance, in a vocabulary of reference frames, it would be
reasonable to add old-style and new-style galactic frames in one
VEP, but not, say, azimuthal and supergalactic coordinates. The
arguments for both terms in the former pair are rather
analogous\footnote{This does not rule out that, in the example, one
might argue that old-style galactic coordinates are so ancient that
perhaps they should not be supported in the VO at all; the chair of the
Semantics WG might then decree that the VEP still needs to be split.}.
In the latter case, two very different rationales would have
to be put forward, which is a clear sign that two VEPs are in order.

\begin{figure}
\begin{verbatim}
Vocabulary: http://www.ivoa.net/rdf/datalink/core
Author: msdemlei@ari.uni-heidelberg.de
Date: 2019-07-19

Term: IsPreviousVersionOf
Action: Addition
Label: Newer Version
Description: This dataset in a previous edition, e.g., processed
with an older pipeline, as part of an older data release.
Relationships: rdfs:subPropertyOf(this)
Used-in: http://example.org/datalink?ID=doc-v1

Term: IsNewVersionOf
Action: Addition
Label: Previous Version
Description: This dataset in a newer edition, e.g., processed
with a newer pipeline, as part of a newer data release.
Relationships: rdfs:subPropertyOf(this)
Used-in: http://example.org/datalink?ID=doc-v2

Rationale:

The terms are mainly intended for projects with data releases.
IsPreviousVersionOf allows services to mark up links to (typically
datalink documents for) later version(s) of this data set. It
allows a client to alert users that a newer, probably improved,
rendition of the current dataset is available and should
presumably be used instead of what they are looking at. The
inverse relationship, IsNewVersionOf, is useful if projects want
to keep previous versions of the dataset findable without having
them show up in the default queries.

The terms are taken from the relationship types of DataCite.
\end{verbatim}

\caption{A sample VEP.}
\label{fig:vepsample}
\end{figure}

A VEP is a semistructured text file containing the following items:

\begin{itemize}
\item \vepitem{Vocabulary:} The URI of the vocabulary.
\item \vepitem{Author:} Contact information for the author(s) of
the VEP.
\item \vepitem{Date:} The date on which the VEP was posted.
\item \vepitem{Term:} The identifier of the term to be added, modified,
or deprecated.
\item \vepitem{Action:} One of \textit{Addition}, \textit{Deprecation}, or
\textit{Modification}.
\item \vepitem{Label:} The English-language, human-readable label of the term.
\item \vepitem{Description:} The description that will come with the term.
\item \vepitem{Relationships:} If applicable, relationships the new
term will have to existing terms, using the properties defined in
the present document.
\item \vepitem{Used-in:} At least one URI of a document using the
proposed term.
\item \vepitem{Rationale:} A discussion of use cases, the role of the term in
the vocabulary, and the like.
In particular, the item(s) in \vepitem{Used-in}
should be commented on.
\end{itemize}

The items \vepitem{Term}, \vepitem{Action}, \vepitem{Label},
\vepitem{Description}, \vepitem{Used-in},
and \vepitem{Relationships} may be repeated if
multiple terms are affected by a VEP.  In \textit{Addition} VEPs, all items
except \vepitem{Relationships} are mandatory.

When \vepitem{Action} is \textit{Deprecation}, \vepitem{Label},
\vepitem{Description}, and \vepitem{Relationships} are optional but can be
given if useful for understanding the VEP.  The rationale MUST discuss
the reasons for a deprecation.  Usually, one or more replacement
term(s) will be proposed within the same VEP.

When \vepitem{Action} is \textit{Modification}, \vepitem{Label},
\vepitem{Description}, and \vepitem{Relationships} give the proposed new
values of the term.  The term itself cannot be modified.  The rationale
will usually detail the changes proposed while mentioning the previous
values.

We do not expect the VEPs to be evaluated by machines.  Therefore, we
define no grammar for the markup of sections, section headers, and their
content.  It is still recommended that authors follow the formatting of
the example in Fig.~\ref{fig:vepsample}.

\subsubsection{Publishing a VEP}

To publish a VEP, its author sends it to the chair of the Semantics WG,
preferably by e-mail.  The chair of the Semantics WG will perform a
formal validation, in particular as regards the presence of all required
items and syntactically valid relationships.  No assessment of the
contents is done at this stage.
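
As a non-normative illustration, the presence check that is part of this
formal validation could be sketched in Python as follows.  The sketch
assumes that a VEP follows the layout of Fig.~\ref{fig:vepsample}, with
each item starting a line as \verb|Name: value|; this document defines no
such grammar, and the function name \verb|missing_items| is hypothetical.
The syntax of relationships is not checked.

\begin{lstlisting}[language=python]
import re

# Assumed line syntax "Name: value"; the standard defines no grammar.
ITEM = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.*)$")
GLOBAL_ITEMS = ("vocabulary", "author", "date", "rationale")
PER_TERM_ITEMS = ("action", "label", "description",
    "relationships", "used-in")
# Items every term of an Addition VEP must have besides Action.
ADDITION_ITEMS = ("label", "description", "used-in")


def missing_items(vep_text):
    """Return complaints about missing items (empty if none)."""
    header, terms = {}, []
    for line in vep_text.splitlines():
        match = ITEM.match(line.strip())
        if match is None:
            continue  # continuation line or free text
        key, value = match.group(1).lower(), match.group(2)
        if key == "term":
            terms.append({"term": value})  # a new term block starts
        elif key in PER_TERM_ITEMS and terms:
            terms[-1].setdefault(key, value)
        else:
            header.setdefault(key, value)

    problems = [f"missing {name}"
        for name in GLOBAL_ITEMS if name not in header]
    if not terms:
        problems.append("missing term")
    for term in terms:
        if "action" not in term:
            problems.append(f"{term['term']}: missing action")
        elif term["action"].lower() == "addition":
            problems.extend(f"{term['term']}: missing {name}"
                for name in ADDITION_ITEMS if name not in term)
    return problems
\end{lstlisting}

For a VEP laid out like the sample, \verb|missing_items| should return an
empty list; removing, say, the \vepitem{Used-in} line of one of the terms
yields a single complaint naming that term.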

VEPs that are formally valid then receive a running number.  The first VEP was
VEP-0001, the second VEP-0002, and so on.  The chair of the Semantics WG
then adds the new VEP to the public index of VEPs as
``Current'' (see Appendix~\ref{app:curtech} for the technical details).
This index has a link to each VEP's text (in general, a location in a
version control system).

Once the VEP is uploaded, it is announced to the IVOA Semantics Working
Group and all other IVOA Working Groups concerned (again, the technical
details are found in Appendix~\ref{app:curtech}).  The chair of the
Semantics WG can extend the distribution as they see fit.  The
announcement in particular contains a copy of the VEP in question.

As soon as possible after the upload, the chair of the Semantics WG adds
any term(s) proposed to the vocabulary as preliminary terms using the
\vocterm{ivoasem:preliminary} property.  This means that the terms can
immediately be used without raising warnings or errors, but in contrast
to approved terms, they may disappear again.  Deprecation or
modification VEPs have no immediate effect.

\subsubsection{Approval Process}
\label{sect:approval}

Discussion of a VEP takes place in the WGs' discussion forums (again,
see Appendix~\ref{app:curtech}).  The chair of the Semantics WG will
summarise the discussion in the VEP in a \textit{Discussion} section.

During the process, all parts of the VEP may be changed except the
term(s) proposed.

Once the chair of the Semantics WG sees that sufficient consensus has
been reached, they announce the VEP in the TCG.
If, at the next meeting of the TCG,
no Working Group objects to the VEP, it is accepted and the marker that
a term is preliminary is removed from the relationships of any terms
added by the VEP.  In the case of deprecation or modification VEPs, the
requested actions are taken at this point.

If, on the other hand, discussion of an addition request results in the
realisation that terms proposed need to be changed, the VEP in question
must be withdrawn and its effects on the vocabulary undone; zero or
more new VEPs are then posted containing proposals for terms for which
consensus appears feasible.  The withdrawn VEP receives a
\vepitem{Superceded-by} item referencing any new VEPs; any new VEPs have
a \vepitem{Supercedes} item referencing the original VEP.

\subsubsection{Guidelines for Creating Concepts (non-normative)}

When introducing terms, it is useful to consider a very simple
semantic model, where the world is a set of (tangible or non-tangible)
``things'' in the sense of naive set theory.

A vocabulary has a scope, which is a subset of the world; this could be
``reference systems'' or ``astronomical object types'' or even something
as concrete as ``observatories''.

In this picture, a term denotes a certain subset of a vocabulary's
scope.  This set is called the term's (or, where an additional level
between the concrete letters making up the term as defined by this
document and the set is useful, the concept's) ``extension''.

Now, in an ideal vocabulary the extensions of its
top-level terms are disjoint (meaning: each thing in scope of the vocabulary
belongs to not more than one top-level term's extension) and the terms cover the
entire scope (meaning: for each thing in the scope, there is at least
one term's extension that contains that thing): The extensions of the
top-level terms then partition the vocabulary's scope.

Where vocabularies are hierarchical, analogous considerations would
apply for the extensions of a general term and its more specialised
terms.

When natural language and the real world are involved,
this ideal generally is unreachable.
But when proposing a term and its definition, authors should try to
make sure that

\begin{compactenum}
\item their new term has a useful extension (i.e., consumers actually
want to know whether a thing is or is not inside it)
\item the extension is reasonably disjoint from those of existing terms, or is a
true superset (in which case the other terms are narrower), or is a true
subset (in which case they are wider) of other terms' extensions.
\end{compactenum}

Put another way: When designing terms, it is as important to say what is
not covered as to clearly say what is.

This is a major reason why it is important to give clear definitions
whenever these definitions are not uniquely given by the domain.
For
instance, while an object type vocabulary probably does not need to be
very diligent in defining $\delta$~Cephei stars because the extension of
that term is uncontroversial to first order\footnote{Although it might
seem desirable to clarify whether, say, W~Virginis stars are or are not
excluded.}, a term like ``dataset'' should come with a precise
definition, ideally containing a reference to a longer explanation.

\subsection{Externally Managed Vocabularies}
\label{sect:externally-managed}

The IVOA is not the only body developing vocabularies, and of course VO
components are free to use other, non-IVOA vocabularies whenever
convenient or even required for interoperability beyond the IVOA.

Sometimes, however, it is advantageous to subject an external vocabulary
to the requirements set forth by this specification.  The motivating use
case here is~\ref{uc:uat}, the Unified Astronomy Thesaurus.  As derived
in requirement~\ref{req:external}, multiple considerations make a
``mirror'' of the vocabulary in the IVOA RDF repository highly
desirable.  Regrettably, since RDF resources (i.e., what we call terms
here) are identified by their full URIs, this will create new RDF
resources, and hence care must be taken that RDF tools can work out the
identity of the mirrored IVOA terms and the original RDF resources.

Also, the processes from sects.~\ref{sect:new-vocabularies}
and~\ref{sect:updating-vocabularies} obviously cannot apply to such
vocabularies, which have their own management procedures.

To address these issues, the following rules apply:

When a vocabulary managed by an IVOA-external body needs to be made
available in the form prescribed by this specification, a proposal for
doing this needs to pass the endorsed notes process of the IVOA as laid
out in the IVOA Document Standards \citep{2017ivoa.spec.0517G}.  As it
concerns external relationships of the IVOA, it additionally needs
endorsement by the IVOA Executive Committee to become effective.

This proposal has to specify:
\begin{itemize}
\item The basic metadata for the vocabulary on the IVOA side.
\item The rules for mapping the external RDF resource URIs to IVOA term
URIs, together with a plan for how this mapping is kept stable.
\item If during the mapping of the vocabulary, external RDF triples are
discarded (which likely is necessary to ensure adherence to our
constraints), which triples are discarded.
\item A description of and reference to software that performs this
mapping.
\item A description of the external management process.
\end{itemize}

The proposing party has to provide software to automatically translate
resources from the external format to a suitable input for the IVOA
vocabulary tooling.

Each term in the IVOA vocabulary mirror MUST declare its identity to
the original, external RDF resource.  At this point, this is only
defined for SKOS-flavoured vocabularies, where the IVOA term must be the
subject of exactly one triple with the \vocterm{skos:exactMatch}
property.  The object of that triple is the URI of the external RDF
resource.

For other flavours, no such mechanism is defined in this version of the
specification, which means that for now, externally managed vocabularies
must use the SKOS flavour.
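
The identity rule for SKOS-flavoured mirrors lends itself to a
mechanical check.  The following non-normative Python sketch assumes
that the triples of the mirror are already available as (subject,
property, object) tuples of full URIs; the function name and the example
URIs are hypothetical, and a real implementation would probably obtain
the triples from an RDF library instead.

\begin{lstlisting}[language=python]
from collections import Counter

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def terms_without_unique_match(term_uris, triples):
    """Return, sorted, the terms that are not the subject of
    exactly one skos:exactMatch triple."""
    counts = Counter(
        subj for subj, prop, _ in triples
        if prop == SKOS_EXACT_MATCH)
    return sorted(uri for uri in term_uris if counts[uri] != 1)


# Hypothetical mirror: one mapped term, one term lacking a mapping.
MIRROR = "http://www.ivoa.net/rdf/example#"
triples = [
    (MIRROR + "galaxies", SKOS_EXACT_MATCH,
        "http://example.org/external/573"),
]
print(terms_without_unique_match(
    [MIRROR + "galaxies", MIRROR + "stars"], triples))
\end{lstlisting}

In this example, only the hypothetical term \verb|stars| is reported,
since it has no \vocterm{skos:exactMatch} triple; a term with two such
triples would be reported as well.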

Once an external vocabulary is endorsed by both the TCG and the
Executive Committee, the chair of the Semantics working group has the
responsibility to keep the IVOA mirror of the vocabulary synchronised,
ideally by using a monitored, automated process like a post-commit
action on an external version control system.


\section{Publishing Vocabularies}
\label{sect:deployment}

This section is an adaptation of \citet{note:cooluris} and is
intended to satisfy requirements~\ref{req:machine}
and~\ref{req:mtm}.  It also briefly discusses how IVOA vocabularies
should be referenced.

\subsection{Deploying Vocabularies}

All IVOA-approved vocabularies are accessible as children of
\url{http://www.ivoa.net/rdf}.  Dereferencing that URI will lead to an
index of currently approved and proposed vocabularies.
Vocabularies still under review are clearly marked as such.

When dereferencing a vocabulary URI, clients will receive an HTTP 303
(See Other) code, with the \texttt{Location} header set to the latest
version of the vocabulary.  The version is written as the date of the
last update in the format YYYY-MM-DD.  Depending on the value of the
request's accept header, the redirect will end up at

\begin{itemize}
\item an HTML rendition of the vocabulary by default.  The HTML element
corresponding to a term has the term (i.e., the fragment identifier in the
term's URI) as its HTML id; hence a URI of the form
\verb|<vocabulary URI>#<term>| will immediately focus the term's HTML
rendition in common
user agents [requirement~\ref{req:mtm}].

\item a Turtle rendition of the vocabulary if the accept header
indicates that \verb|text/turtle| documents are preferred.

\item an RDF/XML rendition of the vocabulary
if the accept header indicates that
\verb|application/rdf+xml| documents are preferred.

\item an ad-hoc JSON rendition of the vocabulary as specified in
sect.~\ref{sect:desise} if the accept header indicates that
\verb|application/x-desise+json| documents are preferred.
\end{itemize}

Individual vocabularies may be available in additional formats.
Content negotiation might then consider additional media types.

Clients may record the full versioned URI of the vocabulary used for
debugging or provenance purposes.  These URIs, however, MUST NOT be used
externally.  In particular, a URI like
\url{http://www.ivoa.net/rdf/example/2019-07-14/example.html#term} has no
RDF meaning by this standard and must never be used in publicly visible
RDF triples.  Always use URIs of the form
\url{http://www.ivoa.net/rdf/example#term}.

\subsection{Referencing Vocabularies}

Since IVOA vocabularies, at least after some time, generally are a
collective effort with a continuous evolution, it is inappropriate to
cite them in the conventional author-year-title format.

However, the vocabulary URI is intended to be stable and uniquely
identifies the vocabulary as such.  Hence, this URI is what should
normally be cited.  The standard style would be along the lines of
\begin{lstlisting}[language={}]
Terms in this field must be taken from the IVOA vocabulary
\url{http://www.ivoa.net/rdf/voresource/content_level}.
\end{lstlisting}
or, in formats where footnotes are appropriate and inline URIs should be
avoided for typographical reasons
\begin{lstlisting}[language={}]
Terms in this field must be taken from the IVOA vocabulary
\emph{Content levels for VO resources}\footnote{
\url{http://www.ivoa.net/rdf/voresource/content_level}}.
\end{lstlisting}
In the second form, the footnote anchor should be the vocabulary name as
given in the
IVOA vocabulary repository\footnote{\url{http://www.ivoa.net/rdf}}.

Except in the rare cases in which references to an exact version are actually
necessary (for instance, descriptions of errors), it is inappropriate to
reference URLs with dates (e.g.,
\url{http://ivoa.net/rdf/voresource/content_level/2016-08-17/}).  URIs
to actual resources (e.g., the XML or Turtle renditions) must never be
used to reference vocabularies.

We do not see a relevant use case for having IVOA vocabularies formally
cited in reference sections of scholarly works: such references will not
aid in finding them, and there is no credible benefit in tracking their
usage from citation in literature.


\appendix
\section{The 2019 IVOA Vocabulary Toolset (non-normative)}
\label{app:tools}

This appendix describes the recommended toolset for authoring IVOA
vocabularies as of 2019.  Vocabulary authors may decide to use other
tools but should consider that doing so may incur additional work for the
chair of the Semantics WG in later maintenance.

This appendix is non-normative.  It will serve as documentation of the
toolset and will occasionally be updated as the tooling evolves;
vocabulary authors are still advised to inspect documentation within the
tools.
Even major changes here will not lead to a new major version of
the standard.


\subsection{Input Format}

In the current tooling, RDF class and property
vocabularies are authored in simple CSV files
with five columns.  These columns are:

\begin{description}
\item[term]
This is the actual, machine-readable vocabulary term.  Only use
letters, digits, underscores, and dashes here.  As specified in
sect.~\ref{sect:voccontent}, these identifiers should be
human-readable, even though they are not directly intended for human
consumption (clients will use the label).  In the interest of
reasonably compact URIs we advise keeping the length of the
terms below, say, 30 characters.
\item[level]
This is used for simple input of wider/narrower relationships.
It is 1 for ``root'' terms.  Terms with a level of 2 that follow a
root term become its children, i.e., the tooling will add the
appropriate wider relationship between the level 2 and the level 1
term.  You can nest, i.e., have
terms of level 3 below terms of level 2.  Note that this means the
order of rows must be preserved in the CSV files: Do \emph{not} sort
vocabulary CSVs.
\item[label]
This is a short, human-readable label for the term.  In the VO, this
is generally derived fairly directly from the content of the first
column, usually by
inserting blanks at the right places and fixing capitalisation.
\item[description]
This is a longer explanation of what the term means.  We do not
support any markup here, not even paragraphs, so there is probably a
limit to how much can be communicated.
\item[more\_relations]
This column can be used to declare non-hierarchical relationships
and contains whitespace-separated declarations.  Each declaration has
the form \verb|property[(term)]|.  Omitting the term is allowed for certain
properties; in RDF, this corresponds to a blank node.  See below for
the common properties supported here.  Plain terms are resolved
within the vocabulary, but CURIEs with known prefixes or full URIs are
admitted, too.
\end{description}

Non-ASCII characters are allowed in label and description; files must be
encoded in UTF-8.  The column separator currently is required to be a
semicolon in order to save on escaping within descriptions (which very
commonly contain commas).  Fields that contain semicolons are escaped
with double quotes; embedded double quotes are doubled.

The following properties are supported in the more\_relations
column:

\begin{itemize}
\item \vocterm{ivoasem:deprecated} -- see sect.~\ref{sect:genprop}.
\item \vocterm{ivoasem:useInstead} -- see sect.~\ref{sect:genprop}.
\item \vocterm{ivoasem:preliminary} -- see sect.~\ref{sect:genprop}.
\end{itemize}

\subsection{Vocabulary Metadata}
\label{sect:vocmeta}

Global vocabulary metadata is kept in an INI-style format.  The following
keys are understood:

\begin{description}
\item[timestamp]
A manually maintained date of the last modification.  This is
essentially a version marker and should be changed only in preparation
for a release.  It is recommended to set it to the intended release
date during development and not change it for every edit.
\item[title]
A human-readable short phrase saying what the vocabulary describes.
\item[flavour]
One of \textit{RDF Class}, \textit{RDF Property}, or \textit{SKOS}
(where SKOS currently expects RDF/XML serialised SKOS rather than CSV).
\item[description]
A longer text (about a paragraph) stating what the vocabulary should
be used for.  No markup is supported here.
\item[authors]
Persons involved with the creation of the vocabulary.  These are \emph{not}
the persons to ask for maintenance; all requests for changes should be
directed to the Semantics working group first.
\item[filename]
The tooling expects the input at
\verb|<vocabulary name>/terms.csv|.  If it is kept elsewhere, give
the source file name here.  This is to support legacy
vocabularies with nonstandard names and native SKOS input.
\item[draft]
While a vocabulary is still being reviewed in its entirety, add a key
draft set to \texttt{True}.  This will add language to the effect that
terms may still vanish from the vocabulary and mark all terms as
preliminary.  Once the vocabulary is approved, this key is deleted.
\item[licenseuri]
IVOA-managed vocabularies are always made available under CC-0 and
hence do not use this key.  External vocabularies as per
sect.~\ref{sect:externally-managed} may be subject to actual licences,
in which case this field holds a URI of a document stating the licence's
conditions.
\item[licensehtml]
This is arbitrary HTML expressing whatever licence terms may be
attached to an external vocabulary.  Again, do not use this key for IVOA
vocabularies.
\end{description}

Currently, the global metadata is maintained in a file
\verb|vocabs.conf| in the root of the vocabulary source repository, with one
section per vocabulary.  The section name is the vocabulary name.

\subsection{Vocabulary Source Repository}

Vocabulary authors are encouraged to maintain their vocabularies in the
shared version control system of the IVOA.  At the time of writing, this
is a Subversion repository at
\url{https://volute.g-vo.org/svn/trunk/projects/semantics/voc-source}.

Authors of new vocabularies should create a child directory and place
their \verb|terms.csv| file in there.  They should then edit \verb|vocabs.conf|
and add a section named after their directory with the content discussed
in sect.~\ref{sect:vocmeta}.


\section{Current Network Resources (non-normative)}
\label{app:curtech}

This appendix details network resources used in vocabulary management.
It is non-normative and will occasionally be updated as the IVOA's
infrastructure evolves.  Even major changes here will not lead to a new
major version of the standard.

The list of vocabulary enhancement proposals is maintained in the IVOA's
wiki at
\url{https://wiki.ivoa.net/twiki/bin/view/IVOA/WebHome?topic=VEPs}.
Approved VEPs will be moved to an archive page linked there.
VEPs may be added as attachments to this page, but authors are
encouraged to maintain them in version controlled repositories instead.
The recommended place to do that is
\url{https://volute.g-vo.org/svn/trunk/projects/semantics/veps}.

The discussion of VEPs (see sect.~\ref{sect:approval}) is to take place
on the appropriate mailing list(s).
See
\url{http://ivoa.net/members/index.html} for a directory of IVOA mailing
lists and their addresses.

\section{An Example of a Vocabulary in Desise (non-normative)}
\label{app:desiseexample}

The following example shows what a vocabulary in desise looks like.  The
content is, superficial similarities to real vocabularies
notwithstanding, contrived.

\begin{lstlisting}[language=python]
{
  "uri": "http://www.ivoa.net/rdf/example",
  "flavour": "RDF Class",
  "terms": {
    "EQUATORIAL": {
      "label": "Equatorial",
      "description": "Umbrella term for all sorts of equatorial frames.",
      "narrower": ["ICRS", "ICRS2", "BD", "B1875.0"], "wider": []
    },
    "ICRS": {
      "label": "ICRS",
      "description": "As defined by 1998AJ....116..516M.",
      "wider": ["EQUATORIAL"], "narrower": []
    },
    "B1875.0": {
      "label": "Bonner Durchmusterung System",
      "description": "Deprecated term for the reference system implied by BD/CD",
      "deprecated": "",
      "wider": ["EQUATORIAL"], "narrower": []
    },
    "BD": {
      "label": "Bonner Durchmusterung System",
      "description": "The reference system implied by BD/CD",
      "wider": ["EQUATORIAL"], "narrower": []
    },
    "ICRS2": {
      "label": "ICRS 2",
      "description": "The reference system defined by 2027A&A..1234...12B",
      "preliminary": "",
      "wider": ["EQUATORIAL"], "narrower": []
    }
  }
}
\end{lstlisting}

\section{Changes from Previous Versions}

\subsection{Changes from WD-2020-06-12}

\begin{itemize}
\item No changes to normative material.
\item Adding use cases on vocabulary evolution and on VO-DML.
\item Various editorial changes.
\end{itemize}

\subsection{Changes from WD-2020-03-26}

\begin{itemize}
\item Desise term values are now dicts with label and description to
make it a bit more self-explanatory; this let us pull in preliminary,
deprecated, and wider as well.
\item Desise now contains an inversion of wider, narrower, with meanings
quite different between SKOS and the other flavours.
\item The main media type for Desise is now application/x-desise+json rather
than text/json because there is no text/json, and you can't have
content media type parameters on either.
\item Mentioning licenseuri and licensehtml in the non-normative part on
managing vocabulary metadata.  Also stating there that IVOA-managed
vocabularies are CC-0.
\end{itemize}


\subsection{Changes from WD-2019-09-05}

\begin{itemize}
\item We no longer recommend that non-RDF clients use RDF/XML.  We have
therefore removed the ``usage with plain XML tooling'' sections.  We
have also removed the description of the revovo python module from the
toolset appendix.

\item Instead, we now have the custom ``desise'' format described in a
new section that doubles as a very quick introduction for adopters not
interested in RDF.

\item Adding a use case and requirement for the UAT (and, perhaps,
similar externally curated vocabularies).  Adding a section on how
such vocabularies may be integrated into the IVOA RDF repository.

\item Now requiring a \emph{Used-in} item in addition VEPs, implying
that only terms that are already applied may be proposed.

\item Adding \emph{Supercedes} and \emph{Superceded-by} items,
formalising the previous language on ``splitting'' VEPs a bit.

\item Adding advice on referencing vocabularies.

\item We now demand a formal validation of VEPs by the semantics chair.
The responsibility for ``uploading'' the VEP, i.e., adding it to the VEP
index, is now assigned to them.

\item Adding a soapbox section with advice on what to do when proposing
new terms and introducing a naive semantics model.
\end{itemize}

\bibliography{local.bib,ivoatex/ivoabib,ivoatex/docrepo}


\end{document}
