Contents of /trunk/projects/vocabularies/doc/vocabularies.xml

Revision 26
Wed Jan 9 13:54:36 2008 UTC (13 years, 8 months ago) by norman.x.gray
File MIME type: text/xml
File size: 35328 byte(s)
Minor wording changes and typos
Fix internal crossrefs in generated XHTML
Added reference to DC standard

1 <?xml version="1.0" encoding="utf-8"?>
2 <!-- Based on template at
3 http://www.ivoa.net/Documents/templates/ivoa-tmpl.html -->
4 <html xmlns="http://www.w3.org/1999/xhtml"
5 xmlns:dc="http://purl.org/dc/elements/1.1/"
6 xmlns:dcterms="http://purl.org/dc/terms/"
7 xml:lang="en" lang="en">
9 <head>
10 <title>Vocabularies in the Virtual Observatory</title>
11 <link rev="made" href="http://nxg.me.uk/norman/#norman" title="Norman Gray"/>
12 <meta name="author" content="Norman Gray"/>
13 <meta name="DC.subject" content="IVOA, Virtual Observatory, Vocabulary"/>
14 <meta name="rcsdate" content="$Date$"/>
15 <link href="http://www.ivoa.net/misc/ivoa_wd.css" rel="stylesheet" type="text/css"/>
16 <!-- style: make the ToC a little more compact, and without bullets -->
17 <style type="text/css">
18 div.toc ul { list-style: none; padding-left: 1em; }
19 div.toc li { padding-top: 0ex; padding-bottom: 0ex; }
20 li { padding-top: 1ex; padding-bottom: 1ex; }
21 span.userinput { font-weight: bold; }
22 span.url { font-family: monospace; }
23 q { color: #666; }
24 q:before { content: "“"; }
25 q:after { content: "”"; }
26 .todo { background: #ff7; }
27 </style>
28 </head>
30 <body>
31 <div class="head">
32 <table>
33 <tr><td><a href="http://www.ivoa.net/"><img alt="IVOA logo" src="http://ivoa.net/icons/ivoa_logo_small.jpg" border="0"/></a></td></tr>
34 </table>
36 <h1>Vocabularies in the Virtual Observatory, v@VERSION@</h1>
37 <h2>IVOA Working Draft, @RELEASEDATE@ [DRAFT $Revision$]</h2>
38 <!-- $Revision$ $Date$ -->
40 <dl>
41 <dt>Working Group</dt>
42 <dd><em><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaSemantics">Semantics</a></em></dd>
44 <dt>This version</dt>
45 <dd>@BASEURI@</dd> <!-- XXX adjust current/latest URI from Makefile -->
47 <dt>Latest version</dt>
48 <dd>@BASEURI@</dd>
50 <dt>Editors</dt>
51 <dd>TBD</dd>
53 <dt>Authors</dt>
54 <dd>
55 <!-- The following are the folk that I definitely know have contributed
56 text or code to this document: add others as appropriate -->
57 <span property="dc:creator">Alasdair J G Gray</span>,
58 <span property="dc:creator">Norman Gray</span>,
59 <span property="dc:creator">Frederic V Hessman</span> and
60 <span property="dc:creator">Andrea Preite Martinez</span>
61 </dd>
62 </dl>
63 <hr/>
64 </div>
66 <div class="section-nonum" id="abstract">
67 <p class="title">Abstract</p>
69 <div class="abstract">
70 <p>As the astronomical information processed within the <em>Virtual Observatory
71 </em> becomes more complex, there is an increasing need for a more
72 formal means of identifying quantities, concepts, and processes not
73 confined to things easily placed in a FITS image, or expressed in a
74 catalogue or a table. We propose that the IVOA adopt a standard
75 format for vocabularies based on the W3C's <em>Resource Description
76 Framework</em> (RDF) and <em>Simple Knowledge Organization System</em>
77 (SKOS). By adopting a standard and simple format, the IVOA will
78 permit different groups to create and maintain their own specialized
79 vocabularies while letting the rest of the astronomical community
80 access, use, and combine them. The use of current, open standards
81 ensures that VO applications will be able to tap into resources of the
82 growing semantic web. Several examples of useful astronomical
83 vocabularies are provided, including work on a common IVOA thesaurus
84 intended to provide a semantic common base for VO applications.</p>
85 </div>
87 </div>
89 <div class="section-nonum" id="status">
90 <p class="title">Status of this document</p>
92 <p>This is an IVOA Working Draft. The first release of this document was
93 <span property="dc:date">@RELEASEDATE@</span>.</p>
95 <p>This document is an IVOA Working Draft for review by IVOA members
96 and other interested parties. It is a draft document and may be
97 updated, replaced, or obsoleted by other documents at any time. It is
98 inappropriate to use IVOA Working Drafts as reference materials or to
99 cite them as other than <q>work in progress</q>.</p>
101 <p>A list of current IVOA Recommendations and other technical
102 documents can be found at
103 <a href="http://www.ivoa.net/Documents/"><code>http://www.ivoa.net/Documents/</code></a>.</p>
105 <h3>Acknowledgments</h3>
107 <p>We would like to thank the members of the IVOA semantic working
108 group for many interesting ideas and fruitful discussions.</p>
109 </div>
111 <h2><a id="contents" name="contents">Table of Contents</a></h2>
112 <?toc?>
114 <hr/>
116 <div class="section" id="introduction">
117 <p class="title">Introduction</p>
119 <div class="section">
120 <p class="title">Vocabularies in astronomy</p>
122 <p>Astronomical information of relevance to the Virtual Observatory
123 (VO) is not confined to quantities easily expressed in a catalogue or
124 a table.
125 Fairly simple things such as position on the sky, brightness in some
126 units, times measured in some frame, redshifts, classifications or
127 other similar quantities are easily manipulated and stored in VOTables
128 and can currently be identified using IVOA Unified Content Descriptors
129 (UCDs) <span class="cite">std:ucd</span>.
130 However, astrophysical concepts and quantities use a wide variety of
131 names, identifications, classifications and associations, most of
132 which cannot be described or labelled via UCDs.</p>
134 <p>There are a number of basic forms of organised semantic knowledge
135 of potential use to the VO, ranging from informal <q>folksonomies</q>
136 (where users are free to choose their own labels) at one extreme, to
137 formally structured <q>vocabularies</q> (where the label is drawn from
138 a predefined set of definitions which can include relationships between
139 labels) and <q>ontologies</q> (where the domain is captured in a data
140 model) at the other.
141 More formal definitions are presented later in this document.
142 </p>
144 <!-- <span
145 class='todo' >I think this list covers definitions covered more
146 naturally in the text below it - omissable?[NG]</span></p>
147 <ul>
148 <li>A <em>controlled vocabulary</em> is a standardized list of
149 words or other tokens with accepted meanings (for example <q>M31</q>,
150 <q>spiral galaxy</q>, <q>star</q>, <q>gas</q>, <q>dust</q>,
151 <q>cloud</q>, <q>black hole</q>, <q>Dark Matter</q>,
152 <q>halo</q>). See the fuller discussion in <span class='xref'
153 >vocab</span>.</li>
155 <li>A <em>taxonomy</em> is a controlled vocabulary encompassing all of
156 the members of a semantic group (for example there are <q>spiral</q>,
157 <q>elliptical</q>, <q>lenticular</q>, and <q>irregular</q> galaxies).</li>
159 <li>A <em>thesaurus</em> is a controlled vocabulary with some linking
160 between tokens so that simple hierarchical structures and equivalences
161 can be identified (for example <q>M31</q> is a narrower term for a <q>spiral
162 galaxy</q> which, in turn, is a narrower term for a
163 <q>galaxy</q>).</li>
165 <li>At the most formal end of this spectrum, an <em>ontology</em> is,
166 in the now-standard description
167 ultimately attributable to <span class='cite' >gruber93</span>, <q>a
168 formal specification of a shared conceptualisation</q>, that is, a set
169 of classes and properties which articulate a model of the world (see
170 also <span class='cite' >baader04</span>). It can range from an
171 elaborate set of definitions and restrictions, to a lightweight model
172 which is barely more than a set of subclass relationships. For
173 example, one might define a set of astronomical concepts and their
174 relations with each other, and say that <q>M31</q> is a
175 member of the class <q>Spiral Galaxy</q>, the latter consisting of
176 <q>Stars</q>, <q>Gas and Dust Clouds</q>, a <q>Central Black Hole</q>,
177 and a <q>Dark Matter Halo</q>.</li>
178 </ul>
180 <p>The term <q>folksonomy</q> has emerged in the last few years, to
181 describe what would in other circumstances be described as an
182 uncontrolled keyword list. The new term, and the substantial recent
183 interest in it, is a consequence of the realisation that even such a
184 simple mechanism can in certain circumstances (well-known examples are
185 the Flickr and del.icio.us social services) add substantial value to
186 a set of resources.</p>
187 -->
189 <p>
190 An astronomical ontology is necessary if we are to have a computer
191 (appear to) `understand' something of the domain.
192 There has been some progress towards creating an ontology of
193 astronomical object types <span
194 class="cite">std:ivoa-astro-onto</span> to meet this need.
195 However, there are distinct use cases for letting human users find
196 resources of interest through search and navigation of the information space.
197 The most appropriate technology to meet these use cases derives from
198 the Information Science community, that of <em>controlled
199 vocabularies, taxonomies and thesauri</em>.
200 In the present document, we do not distinguish between controlled
201 vocabularies, taxonomies and thesauri, and use the term
202 <em>vocabulary</em> to represent all three.
203 </p>
205 <p>One of the best examples of the need for a simple vocabulary within
206 the VO is VOEvent <span class="cite">std:voevent</span>, the VO
207 standard for handling astronomical events: if someone broadcasts, or
208 `publishes', the occurrence of an event, the implication is that
209 someone else is going to want to respond to it, but no institution is
210 interested in all possible events, so some standardised information
211 about what the event `is about' is necessary, in a form which
212 ensures that the parties can communicate effectively. If a `burst' is
213 announced, is it a Gamma Ray Burst due to the collapse of a star in a
214 distant galaxy, a solar flare, or the brightening of a stellar or AGN
215 accretion disk? If a publisher doesn't use the label one might have
216 expected, how is one to guess what other equivalent labels might have
217 been used?</p>
219 <p>There have been a number of attempts to create astronomical
220 vocabularies.</p>
221 <ul>
223 <li>The <em>Second Reference Dictionary of the Nomenclature of
224 Celestial Objects</em> <span class="cite">lortet94</span>, <span
225 class="cite">lortet94a</span> contains 500 paper pages of astronomical
226 nomenclature.</li>
228 <li>For decades professional journals have used a set of reasonably
229 compatible keywords to help classify the content of whole articles.
230 These keywords have been analysed by Preite Martinez &amp; Lesteven
231 <span class="cite">preitemartinez07</span>, from which they derived a set
232 of common keywords constituting one of the potential bases for a
233 fuller VO vocabulary. The same authors also attempted to derive a set
234 of common concepts by analyzing the contents of abstracts in journal
235 articles, which should comprise a list of tokens/concepts more
236 up-to-date than the old list of journal keywords. A similar but
237 less formal attempt was made by Hessman for the VOEvent working group,
238 resulting in a similar list <span class="todo">Find Hessman05
239 reference, and check differences from the A&amp;A list</span>.</li>
241 <li>Astronomical databases generally use simple sets of keywords –
242 sometimes hierarchically organized – to aid the users in the querying
243 of the databases. Two examples from totally different contexts are the
244 list of object types used in the <a
245 href="http://simbad.u-strasbg.fr">Simbad</a> database and the search
246 keywords used in the educational Hands-On Universe image database
247 portal.</li>
249 <li>The Astronomical Outreach Imagery (AOI) working group has created
250 a simple taxonomy to help classify images used for education
251 or public relations <span class="cite">std:aoim</span>.</li>
252 <!--
253 <li>The Hands-On Universe project (see <span class='url'
254 >http://sunra.lbl.gov/telescope2/index.html</span>) has maintained a
255 public database of images for use by the general public since the
256 1990s. The images are very heterogeneous, since they are gathered from
257 a variety of professional, semi-professional, amateur, and school
258 observatories, so a simple taxonomy is used to facilitate browsing
259 by the users of the database.</li>
261 <li>Remote Telescope Markup Language <span
262 class="cite">std:rtml</span>, a document definition for the transfer
263 of observing requests that has been adopted by the Heterogeneous
264 Telescope Network (HTN) Consortium and is indirectly supported by the
265 VOEvent protocol, currently contains several telescope and
266 observation-related taxonomies of terms (e.g. for devices, filters,
267 objects).<span class='todo'>Confirm status: does this need to be
268 converted to SKOS? [AG]. No: RTML will use IVOAT! [FVH] So delete
269 this item? [NG]</span></li>
270 -->
271 <li>In 1993, Shobbrook and Shobbrook published an Astronomy Thesaurus
272 endorsed by the IAU <span class='cite' >shobbrook92</span>. This
273 collection of nearly 3000 terms, in five languages, is a valuable
274 resource, but has seen little use in recent years. Its very size,
275 which gives it expressive power, is a disadvantage to the extent that
276 it is therefore hard to use.</li>
278 <li>The Unified Content Descriptors <span class='cite' >std:ucd</span>
279 (UCD) constitute the main controlled vocabulary of the IVOA and
280 contain some taxonomic information. However, UCD suffers from two
281 major problems which make it difficult to use beyond the present
282 applications of labeling VOTables: firstly, there is no standard means of
283 identifying and processing the contents of the text-based reference
284 document; and secondly, the content cannot be openly extended beyond that set
285 by a formal IVOA committee without going through a laborious and
286 time-consuming negotiation process of extending the primary vocabulary
287 itself.</li>
289 </ul>
290 </div>
292 <div class="section">
293 <p class="title">Formalising and managing multiple vocabularies</p>
295 <p>We find ourselves in the situation where there are multiple
296 vocabularies in use, describing a broad range of resources of interest
297 to professional and amateur astronomers, and members of the public.
298 These different vocabularies use different terms and different
299 relationships to support the different constituencies they cater for.
300 For example, <q>delta Sct</q> and <q>RR Lyr</q> are terms one would
301 find in a vocabulary aimed at professional astronomers, associated
302 with the notion of <q>variable star</q>; however one would
303 <em>not</em> find such technical terms in a vocabulary intended to
304 support outreach activities.</p>
306 <p>One approach to this problem is to create a single consensus
307 vocabulary, which draws terms from the various existing vocabularies
308 to create a new vocabulary which is able to express anything its users
309 might desire. The problem with this is that such an effort would be
310 very expensive, both for those creating it, in time and effort,
311 and for its potential users, who would have to learn
312 to navigate around it and recognise the new terms, and who would have to be
313 supported in using the new terms correctly (or, more often,
314 incorrectly).</p>
316 <p>The alternative approach to the problem is to evade it, and this is
317 the approach taken in this document. Rather than deprecating the
318 existence of multiple overlapping vocabularies, we embrace it,
319 formalise all of them, and formally declare the relationships between
320 them. This means that:</p>
321 <ul>
322 <li>The various vocabularies are allowed to evolve separately, on
323 their own timescales, managed by the IVOA, by individual working
324 groups within the IVOA, or by third parties;</li>
326 <li>Specialized vocabularies can be developed and maintained by the
327 community with the most knowledge about a specific topic, ensuring
328 that the vocabulary will have the right breadth, depth, and
329 accuracy;</li>
331 <li>Users can choose the vocabulary or combination of vocabularies most
332 appropriate to their situation, either when annotating resources, or
333 when querying them; and</li>
335 <li>We can retain the previous investments made in vocabularies by
336 users and resource owners.</li>
338 </ul>
340 <p>The purpose of this proposal is to establish a common format for
341 the grass-roots creation, publishing, use, and manipulation of
342 astronomical vocabularies within the Virtual Observatory, based upon
343 the W3C's SKOS standard. We include as appendices to this proposal
344 formalised versions of a number of existing vocabularies, encoded as
345 SKOS vocabularies <span class="cite">std:skoscore</span>.</p>
347 </div>
349 </div>
351 <div class='section'>
352 <p class='title'>SKOS-based vocabularies</p>
354 <div class="section" id='vocab'>
355 <p class="title">Selection of the vocabulary format</p>
357 <p>After extensive online and face-to-face discussions, the authors have
358 brokered a consensus within the IVOA community that
359 formalised vocabularies should be published at least in SKOS (Simple Knowledge
360 Organization System) format, a W3C draft standard application of RDF to the
361 field of knowledge organisation <span
362 class="cite">std:skoscore</span>. SKOS draws on long experience
363 within the Library and Information Science community, to address a
364 well-defined set of problems to do with the indexing and retrieval of
365 information and resources; as such, it is a close match to the problem
366 this working group is addressing.</p>
368 <p>ISO 5964 <span class='cite' >std:iso5964</span> defines a number of
369 the relevant terms (ISO 5964:1985=BS 6723:1985; see also <span
370 class='cite' >std:bs8723-1</span> and <span class='cite'
371 >std:z39.19</span>), and some of the (lightweight) theoretical
372 background. The only technical distinction relevant to this document
373 is that between `vocabulary' and `thesaurus': BS-8723-1 defines a
374 thesaurus as a</p>
375 <blockquote>
376 Controlled vocabulary in which concepts are represented by preferred
377 terms, formally organized so that paradigmatic relationships between
378 the concepts are made explicit, and the preferred terms are
379 accompanied by lead-in entries for synonyms or quasi-synonyms. NOTE:
380 The purpose of a thesaurus is to guide both the indexer and the
381 searcher to select the same preferred term or combination of preferred
382 terms to represent a given subject. (BS-8723-1, sect. 2.39)
383 </blockquote>
384 <p>with a similar definition in ISO-5964 sect. 3.16. The paradigmatic
385 relationships in question are those relating a term to a <q>broader</q>,
386 <q>narrower</q> or more generically <q>related</q> term, with an operational
387 definition of <q>broader term</q> which is such that a resource retrieved
388 by a given term will also be retrieved by that term's <q>broader term</q>.
389 This is not a subsumption relationship, as there is no implication
390 that the concept referred to by a narrower term is of the same
391 <em>type</em> as a broader term.</p>
393 <p>Thus <strong>a vocabulary (SKOS or otherwise) is not an
394 ontology</strong>. It has lighter and looser semantics than an
395 ontology, and is specialised for the restricted case of resource
396 retrieval. Those interested in ontological analyses can easily
397 transfer the vocabulary relationship information from SKOS to a formal
398 ontological format such as OWL <span class='cite' >std:owl</span>.</p>
400 <!--
401 <p><span class='todo' >What is to be the format of the `master' files?
402 SKOS or mildly-formatted plain text?[NG] By definition, this will be
403 left up to the publishers! All we need to see is SKOS. [FVH] There's
404 more than one notation for SKOS (RDF/XML and Turtle/N3): do we need to
405 mandate one over others (FVH says yes, RDF/XML; NG says no). Open
406 issue.</span></p>
407 -->
408 </div>
410 <div class='section'>
411 <p class='title'>Content and format of a SKOS vocabulary</p>
413 <p>A published vocabulary in SKOS format consists of a set of
414 <q>concepts</q> – the examples below are shown in the Turtle notation
415 for RDF <span class='cite' >std:turtle</span> (this is similar to the
416 more informal N3 notation).
417 Each concept should contain the following elements:</p>
418 <ul>
419 <li>A single URI representing the concept, mainly for use by
420 computers but preferably human-readable, e.g. an entry for <q>spiral
421 galaxy</q> might look like: <code>&lt;#spiralGalaxy&gt; a
422 skos:Concept</code>.
423 <!-- <code>&lt;skos:Concept rdf:about="#spiralGalaxy"&gt;</code>-->
424 </li>
426 <li>A preferred label in each supported language for the vocabulary for
427 use by humans, e.g. <code>skos:prefLabel "spiral galaxy"@en,
428 "Spiralgalaxie"@de</code>.
429 <!-- <code>&lt;skos:prefLabel&gt;spiral galaxy&lt;/skos:prefLabel&gt;</code>-->
430 </li>
432 <li>Optional alternative labels which applications may encounter or which are in
433 common use, whether simple synonyms or commonly-used aliases,
434 e.g. <q>GRB</q> for <q>gamma-ray burst</q>: <code>skos:altLabel
435 "GRB"@en</code> <!--<code>&lt;skos:altLabel
436 lang="de"&gt;Spiralgalaxie&lt;/skos:altLabel&gt;</code> --></li>
438 <li>Optional hidden labels which capture common misspellings for
439 either the preferred or alternate labels, e.g. <q>spiral glaxy</q> for
440 <q>spiral galaxy</q>: <code>skos:hiddenLabel "spiral
441 glaxy"@en</code>.</li>
443 <li>A definition for the concept, where one exists in the original
444 vocabulary, to clarify the meaning of the term,
445 e.g. <code>skos:definition "A galaxy having a spiral
446 structure."@en</code>.</li>
448 <li>A scope note to further clarify a definition, or the usage of the
449 concept, e.g. <code>skos:scopeNote "Spiral galaxies fall into one of
450 three categories: Sa, Sb, and Sc"@en</code>.</li>
452 <li>Optionally, a concept may be involved in any number of relationships
453 to other concepts. The types of relationships are:
454 <ul>
455 <li>Narrower or more specific concepts, e.g. a link to the concept
456 representing a <q>barred spiral galaxy</q>: <code>skos:narrower
457 &lt;#barredSpiralGalaxy&gt;</code>.
458 <!--<code>&lt;skos:narrower rdf:resource="#barredSpiralGalaxy&gt;</code>-->
459 </li>
460 <li>Broader or more general concepts, e.g. a link to the token
461 representing galaxies in general: <code>skos:broader
462 &lt;#galaxy&gt;</code>.
463 <!--<code>&lt;skos:broader rdf:resource="#galaxy&gt;</code>-->
464 </li>
465 <li>Related concepts, e.g. a link to the token representing spiral
466 arms of galaxies: <code>skos:related &lt;#spiralArm&gt;</code>
467 <!--<code>&lt;skos:related rdf:resource="#spiralArm"&gt;</code>-->
468 (note this relationship does not say that spiral galaxies have spiral
469 arms – that would be ontological information of a higher order which
470 is beyond the requirements for information stored in a vocabulary).</li>
471 </ul>
472 </li>
473 </ul>
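<p>Taken together, the elements listed above might be combined into a
single Turtle entry along the following lines. This is a sketch for
illustration rather than an extract from a published vocabulary: it
reuses the example values given in the list, assumes that the
<code>skos:</code> prefix is bound to the SKOS Core namespace, and
assumes that the concepts <code>&lt;#galaxy&gt;</code>,
<code>&lt;#barredSpiralGalaxy&gt;</code> and
<code>&lt;#spiralArm&gt;</code> are defined elsewhere in the same
document.</p>
<pre>
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .

# One concept, identified by a URI relative to the vocabulary document
&lt;#spiralGalaxy&gt; a skos:Concept ;
    # one preferred label per supported language
    skos:prefLabel "spiral galaxy"@en, "Spiralgalaxie"@de ;
    # hidden label: a common misspelling, intended for matching only
    skos:hiddenLabel "spiral glaxy"@en ;
    skos:definition "A galaxy having a spiral structure."@en ;
    # links to other concepts in the same vocabulary
    skos:broader &lt;#galaxy&gt; ;
    skos:narrower &lt;#barredSpiralGalaxy&gt; ;
    skos:related &lt;#spiralArm&gt; .
</pre>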
475 <p>In addition to the information about a single concept, a vocabulary
476 can contain information to help users navigate its structure and
477 contents:</p>
478 <ul>
479 <li>The <q>top concepts</q> of the vocabulary, i.e. those that occur
480 at the top of the vocabulary hierarchy defined by the broader/narrower
481 relationships, can be explicitly stated to make it easier to navigate
482 the vocabulary.</li>
484 <li>Concepts that form a natural group can be defined as being members
485 of a <q>collection</q>.</li>
487 <li>Versioning information can be added using change notes.</li>
489 <li>Additional metadata about the vocabulary, e.g. the publisher, may
490 be documented using the Dublin Core metadata set <span class='cite'
491 >std:dublincore</span>.</li>
492 </ul>
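<p>As an illustration, this vocabulary-level information might be
expressed in Turtle roughly as follows. The title, publisher, change
note text and the collection <code>&lt;#spiralTypes&gt;</code> are
invented for this sketch and do not come from any published
vocabulary; the empty URI <code>&lt;&gt;</code> is assumed to refer to
the vocabulary document itself.</p>
<pre>
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .
@prefix dc:   &lt;http://purl.org/dc/elements/1.1/&gt; .

# The vocabulary as a whole, with Dublin Core metadata (values are hypothetical)
&lt;&gt; a skos:ConceptScheme ;
    dc:title "Example galaxy vocabulary"@en ;
    dc:publisher "Example Observatory" ;
    # explicitly state the top of the broader/narrower hierarchy
    skos:hasTopConcept &lt;#galaxy&gt; .

&lt;#galaxy&gt; a skos:Concept ;
    skos:inScheme &lt;&gt; ;
    skos:prefLabel "galaxy"@en ;
    # versioning information recorded as a change note
    skos:changeNote "Added in the first release."@en .

# A natural grouping of concepts, independent of the hierarchy
&lt;#spiralTypes&gt; a skos:Collection ;
    skos:prefLabel "spiral galaxy types"@en ;
    skos:member &lt;#spiralGalaxy&gt;, &lt;#barredSpiralGalaxy&gt; .
</pre>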
493 </div>
496 <div class='section'>
497 <p class='title'>Relationships Between Vocabularies</p>
499 <span class='todo'>[TODO] AG to write a draft for this section</span>
500 <ul>
501 <li>equivalences in external vocabularies, e.g. to the IAU thesaurus
502 <code>skos:related iau93:SPIRALGALAXY</code>
503 <!--<code>&lt;skos:related rdf:resource="IAU93:SPIRALGALAXY"&gt;</code>-->
504 (note the use of an external namespace <code>iau93</code> which must
505 be defined within the document)
506 </li>
507 </ul>
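<p>A minimal sketch of how the external namespace might be declared in
Turtle is given below. The namespace URI shown for <code>iau93</code>
is a placeholder invented for this example, not the real location of
the IAU thesaurus. Note that in Turtle a prefixed name such as
<code>iau93:SPIRALGALAXY</code> is written without angle brackets,
which are reserved for full or relative URIs.</p>
<pre>
@prefix skos:  &lt;http://www.w3.org/2004/02/skos/core#&gt; .
# Placeholder namespace URI (an assumption made for this example)
@prefix iau93: &lt;http://example.org/iau93#&gt; .

# Link a local concept to a concept in the external vocabulary
&lt;#spiralGalaxy&gt; skos:related iau93:SPIRALGALAXY .
</pre>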
509 </div>
511 <div class='section'>
512 <p class='title'>Suggested good practices</p>
514 <p>As long as the vocabularies conform to the SKOS standard and are
515 published in a machine-processable RDF format, there is nothing
516 keeping a VO application from using the vocabulary to support the
517 human user and to enable new connections between different sources of
518 information.
519 However, we have identified a set of
520 <q>best practice rules</q> which, if followed, will make the creation,
521 management, and use of the vocabularies within the VO simpler and more
522 effective:</p>
524 <ol>
525 <li>The SKOS documents defining the vocabulary should be published at
526 a long-term accessible URI and should be mirrored at a central IVOA
527 vocabulary repository.
528 Each version of the vocabulary should be indicated within the name
529 (e.g. "MyFavoriteVocabulary-v3.14") and previous versions should
530 continue to be available even after having been subsumed by newer
531 versions. Published vocabulary updates should be infrequent and
532 individual changes should be documented, e.g. by
533 <code>&lt;skos:changeNote&gt;</code>. The vocabulary namespace should
534 be the same as the location of the vocabulary.</li>
536 <li>Concept identifiers should consist only of the letters a-z, A-Z,
537 and numbers 0-9, i.e. no spaces, no exotic letters (e.g. umlauts), and
538 no characters which would make a token inexpressible as part of a URI;
539 since tokens are for use by computers only, this is not a big
540 restriction - the exotic letters can be used within the labels and
541 documentation if appropriate.</li>
543 <li>Token names should be kept in human-readable form, directly
544 reflect the implied meaning, and not be semi-random identifiers only
545 (e.g. <q>spiralGalaxy</q>, not "t1234567"); tokens should preferably
546 be created via a direct conversion from the preferred label via
547 removal/translation of non-token characters (see above) and
548 sub-token separation via capitalization of the first sub-token
549 character (e.g. the label "My favorite idea-label #42" is converted
550 into "MyFavoriteIdeaLabel42"). <span class='todo'>Open
551 issue</span></li>
553 <li>Labels should be in the form of the source vocabulary. When
554 developing a new vocabulary the singular form is preferred,
555 e.g. <q>spiral galaxy</q>, not "spiral galaxies". <span
556 class='todo'>Open issue</span></li>
558 <li>Each concept should have a definition
559 (<code>skos:definition</code>) that constitutes a short description of
560 the concept which could be adopted by an application using the
561 vocabulary. The use of additional documentation in standard SKOS or
562 Dublin Core format (see above) is encouraged. <span class='todo'>Note
563 distinction between description and SKOS scope-note</span></li>
565 <li>The language localization should be declared where appropriate,
566 e.g. preferred labels, alternate labels, definitions, etc.</li>
568 <li>Relationships (<q>broader</q>, <q>narrower</q>, <q>related</q>)
569 between concepts are encouraged, but not required; if used, they
570 should be complete (e.g. all <q>broader</q> links have corresponding
571 <q>narrower</q> links in the referenced entries and <q>related</q>
572 entries link each other).</li>
574 <li><q>TopConcept</q> entries (see above) should be declared and
575 normally consist of those concepts that do not have any <q>broader</q>
576 relationships (i.e. not at a subordinate position in the
577 hierarchy).</li>
579 <li>Publishers are encouraged to publish <q>mappings</q> between their
580 vocabularies and other commonly used vocabularies. These should be
581 external to the defining vocabulary document so that the vocabulary
582 can be used independently of the publisher's mappings.</li>
583 </ol>
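<p>The completeness rule for relationships can be illustrated with a
short Turtle sketch, using concept names taken from the examples
earlier in this section and assuming that all three concepts live in
the same vocabulary document. Each <q>broader</q> link is matched by a
<q>narrower</q> link in the opposite direction, and the
<q>related</q> link is stated from both ends.</p>
<pre>
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .

# broader/narrower stated in both directions
&lt;#galaxy&gt;       skos:narrower &lt;#spiralGalaxy&gt; .
&lt;#spiralGalaxy&gt; skos:broader  &lt;#galaxy&gt; .

# related stated from both ends
&lt;#spiralGalaxy&gt; skos:related &lt;#spiralArm&gt; .
&lt;#spiralArm&gt;    skos:related &lt;#spiralGalaxy&gt; .
</pre>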
585 <p>These suggestions are by no means trivial – there was considerable
586 discussion within the semantic working group on many of these topics,
587 particularly about token formats (some wanted lower-case only), and
588 singular versus plural forms of the labels (different traditions exist
589 within the international library science community). Obviously, no
590 publisher of an astronomical vocabulary has to adopt these rules, but
591 the adoption of these rules will make it easier to use the vocabulary
592 in external generic VO applications. However, VO applications should
593 be developed to accept any vocabulary that complies with the latest
594 SKOS standard <span class="cite">std:skoscore</span>.</p>
595 </div>
597 </div> <!-- section SKOS-based vocabularies -->
607 <div class="section">
608 <p class="title">Example vocabularies</p>
610 <p>The intent of having the IVOA adopt SKOS as the preferred format for
611 astronomical vocabularies is to encourage the creation and management
612 of diverse vocabularies by competent astronomical groups, so that
613 users of the VO and related resources can benefit directly and
614 dynamically without the intervention of the IAU or IVOA. However, we
615 felt it important to provide several examples of vocabularies in the
616 SKOS format as part of the proposal, to illustrate their simplicity
617 and power, and to provide an immediate vocabulary basis for VO
618 applications.</p>
620 <p>We provide a set of SKOS files representing the vocabularies which
621 have been developed, and mappings between them. These can be
622 downloaded at the URL</p>
623 <blockquote>
624 <span class='url'>@BASEURI@/@DISTNAME@.tar.gz</span>
625 </blockquote>
627 <p><span class='todo' >[To be expanded:] there are no mappings at the
628 moment. Also, the vocabularies are all in a single language, though
629 translations of the IAU93 thesaurus are available.</span></p>
631 <div class='section'>
632 <p class='title'>A Constellation Name Vocabulary (normative)</p>
634 <p>This vocabulary is presented as a simple example of an astronomical vocabulary for a very particular purpose, e.g. handling constellation information like that commonly encountered in variable star research. For example, <q>SS Cygni</q> is a cataclysmic variable located in the constellation <q>Cygnus</q>. The name of the star uses the genitive form <q>Cygni</q>, but the alternate label <q>SS Cyg</q> uses the standard abbreviation <q>Cyg</q>. Given the constellation vocabulary, all of these forms are recorded together in a computer-manipulatable format. <span class='todo'>`Incorrect' forms should probably be represented in SKOS `hidden labels'</span></p>
636 <p>The &lt;skos:ConceptScheme&gt; contains a single &lt;skos:TopConcept&gt;, <q>constellation</q>:</p>
637 <pre>
638 &lt;skos:Concept rdf:about="#constellation"&gt;
639 &lt;skos:inScheme rdf:resource=""/&gt;
640 &lt;skos:prefLabel&gt;constellation&lt;/skos:prefLabel&gt;
641 &lt;skos:definition&gt;IAU-sanctioned constellation names&lt;/skos:definition&gt;
642 &lt;skos:narrower rdf:resource="#Andromeda"/&gt;
643 ...
644 &lt;skos:narrower rdf:resource="#Vulpecula"/&gt;
645 &lt;/skos:Concept&gt;
646 </pre>
647 <p><span class='todo' >Alternate Turtle form, for illustration, with
648 the SKOS namespace being the default...</span></p>
649 <pre>
650 &lt;#constellation&gt; a :Concept;
651 :inScheme &lt;&gt;;
652 :prefLabel "constellation";
653 :definition "IAU-sanctioned constellation names";
654 :narrower &lt;#Andromeda&gt;;
655 ...
656 :narrower &lt;#Vulpecula&gt;.
657 </pre>
658 <p>and the entry for <q>Cygnus</q> is</p>
659 <pre>
660 &lt;skos:Concept rdf:about="#Cygnus"&gt;
661 &lt;skos:inScheme rdf:resource=""/&gt;
662 &lt;skos:prefLabel&gt;Cygnus&lt;/skos:prefLabel&gt;
663 &lt;skos:definition&gt;Cygnus&lt;/skos:definition&gt;
664 &lt;skos:altLabel&gt;Cygni&lt;/skos:altLabel&gt;
665 &lt;skos:altLabel&gt;Cyg&lt;/skos:altLabel&gt;
666 &lt;skos:broader rdf:resource="#constellation"/&gt;
667 &lt;skos:scopeNote&gt;Cygnus is nominative form; The alternative labels are the genitive and short forms&lt;/skos:scopeNote&gt;
668 &lt;/skos:Concept&gt;
669 </pre>
670 <p>Note that SKOS alone does not distinguish between
671 genitive forms and abbreviations, but the use of alternate labels
672 is adequate for processing by VO applications, where
673 the difference between <q>SS Cygni</q>, <q>SS Cyg</q>, and the incorrect form
674 <q>SS Cygnus</q> is probably irrelevant.</p>
675 </div>
677 <div class='section'>
678 <p class='title'>The 1993 IAU Thesaurus (normative)</p>
680 <p>The IAU Thesaurus consists of concepts with mostly capitalized
681 labels and a rich set of thesaurus relationships (<q>BT</q> for
682 <q>broader term</q>, <q>NT</q> for <q>narrower term</q>, and <q>RT</q> for
683 <q>related term</q>). The thesaurus also contains <q>U</q> (for
684 <q>use</q>) and <q>UF</q> (<q>use for</q>) relationships. In a SKOS
685 model of a vocabulary these are captured as alternative labels. A
686 separate document contains translations of the vocabulary terms in
687 five languages: English, French, German, Italian, and
688 Spanish. Enumerable concepts are plural (e.g. <q>SPIRAL
689 GALAXIES</q>) and non-enumerable concepts are singular
690 (e.g. <q>STABILITY</q>). Finally, there are some usage hints like
691 <q>combine with other</q>.</p>
693 <p>In converting the IAU Thesaurus to SKOS, we have been as faithful
694 as possible to the original format of the thesaurus. Thus, preferred
695 labels have been kept in their uppercase format.</p>
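<p>As a sketch of this mapping, a thesaurus entry might be rendered in
SKOS as shown below. The fragment identifiers and the particular
related and alternative terms are invented for illustration and are
not copied from the thesaurus or from the distributed SKOS file.</p>
<pre>
&lt;!-- Illustrative only: identifiers and relationships are assumptions --&gt;
&lt;skos:Concept rdf:about="#SPIRALGALAXIES"&gt;
&lt;skos:inScheme rdf:resource=""/&gt;
&lt;!-- preferred label kept in the thesaurus' uppercase form --&gt;
&lt;skos:prefLabel&gt;SPIRAL GALAXIES&lt;/skos:prefLabel&gt;
&lt;!-- a "use for" entry becomes an alternative label --&gt;
&lt;skos:altLabel&gt;SPIRAL NEBULAE&lt;/skos:altLabel&gt;
&lt;!-- broader and related entries become SKOS semantic relations --&gt;
&lt;skos:broader rdf:resource="#GALAXIES"/&gt;
&lt;skos:related rdf:resource="#SPIRALARMS"/&gt;
&lt;/skos:Concept&gt;
</pre>
<p>Narrower entries would be expressed in the same way, using
&lt;skos:narrower&gt;.</p>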
697 </div>
699 <div class='section'>
700 <p class='title'>The Astronomy &amp; Astrophysics Keyword List (normative)</p>
702 <p><span class='todo'>[TODO] AG to write a short description here</span></p>
703 </div>
705 <div class='section'>
706 <p class='title'>The AOIM Taxonomy (normative)</p>
708 <p><span class='todo'>[TODO] AG to write a short description here</span></p>
710 </div>
712 <div class='section'>
713 <p class='title'>The UCD1+ Vocabulary (non-normative)</p>
715 <p>The UCD standard is an officially sanctioned and managed vocabulary
716 of the IVOA. The normative document is a simple text file containing
717 entries consisting of tokens (e.g. <code>em.IR</code>), a short
718 description, and usage information (<q>syntax codes</q> which permit
719 UCD tokens to be concatenated). The form of the tokens implies a
720 natural hierarchy: <code>em.IR.8-15um</code> is obviously a narrower
721 term than <code>em.IR</code>, which in turn is narrower than
722 <code>em</code>.</p>
724 <p>Given the structure of the UCD1+ vocabulary, the natural
725 translation to SKOS consists of preferred labels equal to the original
726 tokens (the UCD1 words include dashes and periods), vocabulary tokens
727 created using the "5th Commandment" (e.g. "emIR815Um" for
728 <code>em.IR.8-15um</code>), direct use of the definitions, and the syntax codes
729 placed in usage documentation: <code>&lt;skos:scopeNote&gt;UCD syntax code: P&lt;/skos:scopeNote&gt;</code>.</p>
732 <p>Note that the SKOS document containing the UCD1+ vocabulary does
733 NOT constitute the official version: the normative document is still
734 the text list. However, in the long term, the IVOA may decide to make
735 the SKOS version normative, since the SKOS version contains all of the
736 information contained in the original text document but has the
737 advantage of being in a standard format easily read and used by any
738 application on the semantic web.</p>
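<p>As an illustration of the translation to SKOS described above, the
entry for <code>em.IR.8-15um</code> might look like the following
sketch. The token <code>emIR</code> for the broader concept, the
wording of the definition and the syntax code shown are assumptions
made for this example; the text list remains the authority for the
actual values.</p>
<pre>
&lt;!-- Illustrative only: definition text, syntax code and #emIR are assumptions --&gt;
&lt;skos:Concept rdf:about="#emIR815Um"&gt;
&lt;skos:inScheme rdf:resource=""/&gt;
&lt;!-- preferred label is the original UCD token, punctuation included --&gt;
&lt;skos:prefLabel&gt;em.IR.8-15um&lt;/skos:prefLabel&gt;
&lt;skos:definition&gt;Infrared between 8 and 15 micron&lt;/skos:definition&gt;
&lt;!-- the hierarchy implied by the token becomes an explicit relation --&gt;
&lt;skos:broader rdf:resource="#emIR"/&gt;
&lt;skos:scopeNote&gt;UCD syntax code: Q&lt;/skos:scopeNote&gt;
&lt;/skos:Concept&gt;
</pre>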
740 </div>
742 <div class='section'>
743 <p class='title'>The proposed IVOA Thesaurus</p>
745 <p>Although the adoption of SKOS will make it easy to
746 publish and access different astronomical vocabularies,
747 there is no vocabulary which makes it easy to jump-start the
748 use of vocabularies in generic astrophysical VO applications: each of
749 the previously developed vocabularies has its own limits and
750 biases. For example, the IAU Thesaurus provides a large number of
751 entries, copious relationships, and translations to four other languages,
752 but there are no definitions, many concepts are now only useful for
753 historical purposes (e.g. many photographic or historical instrument
754 entries), some of the relationships are false or outdated, and many
755 important or newer concepts and their common abbreviations are
756 missing.</p>
758 <p>Despite its faults, the IAU Thesaurus constitutes a very extensive
759 vocabulary which could easily serve as the basis vocabulary once
760 we have removed its most egregious faults and extended it to cover the
761 most obvious semantic holes. To this end, a heavily revised IAU
762 thesaurus is in preparation for use within the IVOA and other
763 astronomical contexts. The goal is to provide a general vocabulary
764 foundation to which other, more specialized, vocabularies can be added
765 as needed, and to provide a good <q>lingua franca</q> for the creation of
766 vocabulary mappings.</p>
767 </div>
768 </div> <!-- End: Example vocabularies -->
771 <div class="appendices">
773 <div class="section-nonum" id="bibliography">
774 <p class="title">Bibliography</p>
775 <?bibliography rm-refs ?>
776 </div>
778 <p style="text-align: right; font-size: x-small; color: #888;">
779 $Revision$ $Date$
780 </p>
782 </div>
784 </body>
785 </html>

