Contents of /trunk/projects/vocabularies/doc/vocabularies.xml

Revision 27
Wed Jan 9 14:18:43 2008 UTC (13 years, 9 months ago) by alasdair.gray
File MIME type: text/xml
File size: 37128 byte(s)
Added text for section on Mappings between vocabularies.
1 <?xml version="1.0" encoding="utf-8"?>
2 <!-- Based on template at
3 http://www.ivoa.net/Documents/templates/ivoa-tmpl.html -->
4 <html xmlns="http://www.w3.org/1999/xhtml"
5 xmlns:dc="http://purl.org/dc/elements/1.1/"
6 xmlns:dcterms="http://purl.org/dc/terms/"
7 xml:lang="en" lang="en">
9 <head>
10 <title>Vocabularies in the Virtual Observatory</title>
11 <link rev="made" href="http://nxg.me.uk/norman/#norman" title="Norman Gray"/>
12 <meta name="author" content="Norman Gray"/>
13 <meta name="DC.subject" content="IVOA, Virtual Observatory, Vocabulary"/>
14 <meta name="rcsdate" content="$Date$"/>
15 <link href="http://www.ivoa.net/misc/ivoa_wd.css" rel="stylesheet" type="text/css"/>
16 <!-- style: make the ToC a little more compact, and without bullets -->
17 <style type="text/css">
18 div.toc ul { list-style: none; padding-left: 1em; }
19 div.toc li { padding-top: 0ex; padding-bottom: 0ex; }
20 li { padding-top: 1ex; padding-bottom: 1ex; }
21 span.userinput { font-weight: bold; }
22 span.url { font-family: monospace; }
23 q { color: #666; }
24 q:before { content: "“"; }
25 q:after { content: "”"; }
26 .todo { background: #ff7; }
27 </style>
28 </head>
30 <body>
31 <div class="head">
32 <table>
33 <tr><td><a href="http://www.ivoa.net/"><img alt="IVOA logo" src="http://ivoa.net/icons/ivoa_logo_small.jpg" border="0"/></a></td></tr>
34 </table>
36 <h1>Vocabularies in the Virtual Observatory, v@VERSION@</h1>
37 <h2>IVOA Working Draft, @RELEASEDATE@ [DRAFT $Revision$]</h2>
38 <!-- $Revision$ $Date$ -->
40 <dl>
41 <dt>Working Group</dt>
42 <dd><em><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaSemantics">Semantics</a></em></dd>
44 <dt>This version</dt>
45 <dd>@BASEURI@</dd> <!-- XXX adjust current/latest URI from Makefile -->
47 <dt>Latest version</dt>
48 <dd>@BASEURI@</dd>
50 <dt>Editors</dt>
51 <dd>TBD</dd>
53 <dt>Authors</dt>
54 <dd>
55 <!-- The following are the folk that I definitely know have contributed
56 text or code to this document: add others as appropriate -->
57 <span property="dc:creator">Alasdair J G Gray</span>,
58 <span property="dc:creator">Norman Gray</span>,
59 <span property="dc:creator">Frederic V Hessman</span> and
60 <span property="dc:creator">Andrea Preite Martinez</span>
61 </dd>
62 </dl>
63 <hr/>
64 </div>
66 <div class="section-nonum" id="abstract">
67 <p class="title">Abstract</p>
69 <div class="abstract">
70 <p>As the astronomical information processed within the <em>Virtual Observatory
71 </em> becomes more complex, there is an increasing need for a more
72 formal means of identifying quantities, concepts, and processes not
73 confined to things easily placed in a FITS image, or expressed in a
74 catalogue or a table. We propose that the IVOA adopt a standard
75 format for vocabularies based on the W3C's <em>Resource Description
76 Framework</em> (RDF) and <em>Simple Knowledge Organization System</em>
77 (SKOS). By adopting a standard and simple format, the IVOA will
78 permit different groups to create and maintain their own specialized
79 vocabularies while letting the rest of the astronomical community
80 access, use, and combine them. The use of current, open standards
81 ensures that VO applications will be able to tap into resources of the
82 growing semantic web. Several examples of useful astronomical
83 vocabularies are provided, including work on a common IVOA thesaurus
84 intended to provide a semantic common base for VO applications.</p>
85 </div>
87 </div>
89 <div class="section-nonum" id="status">
90 <p class="title">Status of this document</p>
92 <p>This is an IVOA Working Draft. The first release of this document was
93 <span property="dc:date">@RELEASEDATE@</span>.</p>
95 <p>This document is an IVOA Working Draft for review by IVOA members
96 and other interested parties. It is a draft document and may be
97 updated, replaced, or obsoleted by other documents at any time. It is
98 inappropriate to use IVOA Working Drafts as reference materials or to
99 cite them as other than <q>work in progress</q>.</p>
101 <p>A list of current IVOA Recommendations and other technical
102 documents can be found at
103 <a href="http://www.ivoa.net/Documents/"><code>http://www.ivoa.net/Documents/</code></a>.</p>
105 <h3>Acknowledgments</h3>
107 <p>We would like to thank the members of the IVOA semantic working
108 group for many interesting ideas and fruitful discussions.</p>
109 </div>
111 <h2><a id="contents" name="contents">Table of Contents</a></h2>
112 <?toc?>
114 <hr/>
116 <div class="section" id="introduction">
117 <p class="title">Introduction</p>
119 <div class="section">
120 <p class="title">Vocabularies in astronomy</p>
122 <p>Astronomical information of relevance to the Virtual Observatory
123 (VO) is not confined to quantities easily expressed in a catalogue or
124 a table.
125 Fairly simple things such as position on the sky, brightness in some
126 units, times measured in some frame, redshifts, classifications or
127 other similar quantities are easily manipulated and stored in VOTables
128 and can currently be identified using IVOA Unified Content Descriptors
129 (UCDs) <span class="cite">std:ucd</span>.
130 However, astrophysical concepts and quantities use a wide variety of
131 names, identifications, classifications and associations, most of
132 which cannot be described or labelled via UCDs.</p>
134 <p>There are a number of basic forms of organised semantic knowledge
135 of potential use to the VO, ranging from informal <q>folksonomies</q>
136 (where users are free to choose their own labels) at one extreme, to
137 formally structured <q>vocabularies</q> (where the label is drawn from
138 a predefined set of definitions which can include relationships between
139 labels) and <q>ontologies</q> (where the domain is captured in a data
140 model) at the other.
141 More formal definitions are presented later in this document.
142 </p>
144 <!-- <span
145 class='todo' >I think this list covers definitions covered more
146 naturally in the text below it - omissable?[NG]</span></p>
147 <ul>
148 <li>A <em>controlled vocabulary</em> is a standardized list of
149 words or other tokens with accepted meanings (for example <q>M31</q>,
150 <q>spiral galaxy</q>, <q>star</q>, <q>gas</q>, <q>dust</q>,
151 <q>cloud</q>, <q>black hole</q>, <q>Dark Matter</q>,
152 <q>halo</q>). See the fuller discussion in <span class='xref'
153 >vocab</span>.</li>
155 <li>A <em>taxonomy</em> is a controlled vocabulary encompassing all of
156 the members of a semantic group (for example there are <q>spiral</q>,
157 <q>elliptical</q>, <q>lenticular</q>, and <q>irregular</q> galaxies).</li>
159 <li>A <em>thesaurus</em> is a controlled vocabulary with some linking
160 between tokens so that simple hierarchical structures and equivalences
161 can be identified (for example <q>M31</q> is a narrower term for a <q>spiral
162 galaxy</q> which, in turn, is a narrower term for a
163 <q>galaxy</q>).</li>
165 <li>At the most formal end of this spectrum, an <em>ontology</em> is,
166 in the now-standard description
167 ultimately attributable to <span class='cite' >gruber93</span>, <q>a
168 formal specification of a shared conceptualisation</q>, that is, a set
169 of classes and properties which articulate a model of the world (see
170 also <span class='cite' >baader04</span>). It can range from an
171 elaborate set of definitions and restrictions, to a lightweight model
172 which is barely more than a set of subclass relationships. For
173 example, one might define a set of astronomical concepts and their
174 relations with each other, and say that <q>M31</q> is a
175 member of the class <q>Spiral Galaxy</q>, the latter consisting of
176 <q>Stars</q>, <q>Gas and Dust Clouds</q>, a <q>Central Black Hole</q>,
177 and a <q>Dark Matter Halo</q>.</li>
178 </ul>
180 <p>The term <q>folksonomy</q> has emerged in the last few years, to
181 describe what would in other circumstances be described as an
182 uncontrolled keyword list. The new term, and the substantial recent
183 interest in it, is a consequence of the realisation that even such a
184 simple mechanism can in certain circumstances (well-known examples are
185 the Flickr and del.icio.us social services) add substantial value to
186 a set of resources.</p>
187 -->
189 <p>
190 An astronomical ontology is necessary if we are to have a computer
191 (appear to) `understand' something of the domain.
192 There has been some progress towards creating an ontology of
193 astronomical object types <span
194 class="cite">std:ivoa-astro-onto</span> to meet this need.
195 However there are distinct use cases for letting human users find
196 resources of interest through search and navigation of the information space.
197 The most appropriate technology to meet these use cases derives from
198 the Information Science community, that of <em>controlled
199 vocabularies, taxonomies and thesauri</em>.
200 In the present document, we do not distinguish between controlled
201 vocabularies, taxonomies and thesauri, and use the term
202 <em>vocabulary</em> to represent all three.
203 </p>
205 <p>One of the best examples of the need for a simple vocabulary within
206 the VO is VOEvent <span class="cite">std:voevent</span>, the VO
207 standard for handling astronomical events: if someone broadcasts, or
208 `publishes', the occurrence of an event, the implication is that
209 someone else is going to want to respond to it, but no institution is
210 interested in all possible events, so some standardised information
211 about what the event `is about' is necessary, in a form which
212 ensures that the parties can communicate effectively. If a `burst' is
213 announced, is it a Gamma Ray Burst due to the collapse of a star in a
214 distant galaxy, a solar flare, or the brightening of a stellar or AGN
215 accretion disk? If a publisher doesn't use the label one might have
216 expected, how is one to guess what other equivalent labels might have
217 been used?</p>
219 <p>There have been a number of attempts to create astronomical
220 vocabularies.</p>
221 <ul>
223 <li>The <em>Second Reference Dictionary of the Nomenclature of
224 Celestial Objects</em> <span class="cite">lortet94</span>, <span
225 class="cite">lortet94a</span> contains 500 paper pages of astronomical
226 nomenclature</li>
228 <li>For decades professional journals have used a set of reasonably
229 compatible keywords to help classify the content of whole articles.
230 These keywords have been analysed by Preite Martinez &amp; Lesteven
231 <span class="cite">preitemartinez07</span>, from which they derived a set
232 of common keywords constituting one of the potential bases for a
233 fuller VO vocabulary. The same authors also attempted to derive a set
234 of common concepts by analyzing the contents of abstracts in journal
235 articles, which should comprise a list of tokens/concepts more
236 up-to-date than the old list of journal keywords. A similar but
237 less formal attempt was made by Hessman for the VOEvent working group,
238 resulting in a similar list <span class="todo">Find Hessman05
239 reference, and check differences from the A&amp;A list</span>.</li>
241 <li>Astronomical databases generally use simple sets of keywords –
242 sometimes hierarchically organized – to aid the users in the querying
243 of the databases. Two examples from totally different contexts are the
244 list of object types used in the <a
245 href="http://simbad.u-strasbg.fr">Simbad</a> database and the search
246 keywords used in the educational Hands-On Universe image database
247 portal.</li>
249 <li>The Astronomical Outreach Imagery (AOI) working group has created
250 a simple taxonomy for helping to classify images used for educational
251 or public relations purposes <span class="cite">std:aoim</span>.</li>
252 <!--
253 <li>The Hands-On Universe project (see <span class='url'
254 >http://sunra.lbl.gov/telescope2/index.html</span>) has maintained a
255 public database of images for use by the general public since the
256 1990s. The images are very heterogeneous, since they are gathered from
257 a variety of professional, semi-professional, amateur, and school
258 observatories, so a simple taxonomy is used to facilitate browsing
259 by the users of the database.</li>
261 <li>Remote Telescope Markup Language <span
262 class="cite">std:rtml</span>, a document definition for the transfer
263 of observing requests that has been adopted by the Heterogeneous
264 Telescope Network (HTN) Consortium and is indirectly supported by the
265 VOEvent protocol, currently contains several telescope and
266 observation-related taxonomies of terms (e.g. for devices, filters,
267 objects).<span class='todo'>Confirm status: does this need to be
268 converted to SKOS? [AG]. No: RTML will use IVOAT! [FVH] So delete
269 this item? [NG]</span></li>
270 -->
271 <li>In 1993, Shobbrook and Shobbrook published an Astronomy Thesaurus
272 endorsed by the IAU <span class='cite' >shobbrook92</span>. This
273 collection of nearly 3000 terms, in five languages, is a valuable
274 resource, but has seen little use in recent years. Its very size,
275 which gives it expressive power, is a disadvantage to the extent that
276 it is therefore hard to use.</li>
278 <li>The Unified Content Descriptors <span class='cite' >std:ucd</span>
279 (UCD) constitute the main controlled vocabulary of the IVOA and
280 contain some taxonomic information. However, UCD suffers from two
281 major problems which make it difficult to use beyond the present
282 applications of labeling VOTables: firstly, there is no standard means of
283 identifying and processing the contents of the text-based reference
284 document; and secondly, the content cannot be openly extended beyond that set
285 by a formal IVOA committee without going through a laborious and
286 time-consuming negotiation process of extending the primary vocabulary
287 itself.</li>
289 </ul>
290 </div>
292 <div class="section">
293 <p class="title">Formalising and managing multiple vocabularies</p>
295 <p>We find ourselves in the situation where there are multiple
296 vocabularies in use, describing a broad range of resources of interest
297 to professional and amateur astronomers, and members of the public.
298 These different vocabularies use different terms and different
299 relationships to support the different constituencies they cater for.
300 For example, <q>delta Sct</q> and <q>RR Lyr</q> are terms one would
301 find in a vocabulary aimed at professional astronomers, associated
302 with the notion of <q>variable star</q>; however one would
303 <em>not</em> find such technical terms in a vocabulary intended to
304 support outreach activities.</p>
306 <p>One approach to this problem is to create a single consensus
307 vocabulary, which draws terms from the various existing vocabularies
308 to create a new vocabulary which is able to express anything its users
309 might desire. The problem with this is that such an effort would be
310 very expensive, both in time and effort for those creating it and for
311 its potential users, who would have to learn to navigate around it
312 and recognise the new terms, and who would have to be
313 supported in using the new terms correctly (or, more often,
314 incorrectly).</p>
316 <p>The alternative approach to the problem is to evade it, and this is
317 the approach taken in this document. Rather than deprecating the
318 existence of multiple overlapping vocabularies, we embrace it,
319 formalise all of them, and formally declare the relationships between
320 them. This means that:</p>
321 <ul>
322 <li>The various vocabularies are allowed to evolve separately, on
323 their own timescales, managed either by the IVOA, individual working
324 groups within the IVOA, or by third parties;</li>
326 <li>Specialized vocabularies can be developed and maintained by the
327 community with the most knowledge about a specific topic, ensuring
328 that the vocabulary will have the right breadth, depth, and
329 accuracy;</li>
331 <li>Users can choose the vocabulary or combination of vocabularies most
332 appropriate to their situation, either when annotating resources, or
333 when querying them; and</li>
335 <li>We can retain the previous investments made in vocabularies by
336 users and resource owners.</li>
338 </ul>
340 <p>The purpose of this proposal is to establish a common format for
341 the grass-roots creation, publishing, use, and manipulation of
342 astronomical vocabularies within the Virtual Observatory, based upon
343 the W3C's SKOS standard. We include as appendices to this proposal
344 formalised versions of a number of existing vocabularies, encoded as
345 SKOS vocabularies <span class="cite">std:skoscore</span>.</p>
347 </div>
349 </div>
351 <div class='section'>
352 <p class='title'>SKOS-based vocabularies</p>
354 <div class="section" id='vocab'>
355 <p class="title">Selection of the vocabulary format</p>
357 <p>After extensive online and face-to-face discussions, the authors have
358 brokered a consensus within the IVOA community that
359 formalised vocabularies should be published at least in SKOS (Simple Knowledge
360 Organization System) format, a W3C draft standard application of RDF to the
361 field of knowledge organisation <span
362 class="cite">std:skoscore</span>. SKOS draws on long experience
363 within the Library and Information Science community, to address a
364 well-defined set of problems to do with the indexing and retrieval of
365 information and resources; as such, it is a close match to the problem
366 this working group is addressing.</p>
368 <p>ISO 5964 <span class='cite' >std:iso5964</span> defines a number of
369 the relevant terms (ISO 5964:1985=BS 6723:1985; see also <span
370 class='cite' >std:bs8723-1</span> and <span class='cite'
371 >std:z39.19</span>), and some of the (lightweight) theoretical
372 background. The only technical distinction relevant to this document
373 is that between `vocabulary' and `thesaurus': BS-8723-1 defines a
374 thesaurus as a</p>
375 <blockquote>
376 Controlled vocabulary in which concepts are represented by preferred
377 terms, formally organized so that paradigmatic relationships between
378 the concepts are made explicit, and the preferred terms are
379 accompanied by lead-in entries for synonyms or quasi-synonyms. NOTE:
380 The purpose of a thesaurus is to guide both the indexer and the
381 searcher to select the same preferred term or combination of preferred
382 terms to represent a given subject. (BS-8723-1, sect. 2.39)
383 </blockquote>
384 <p>with a similar definition in ISO-5964 sect. 3.16. The paradigmatic
385 relationships in question are those relating a term to a <q>broader</q>,
386 <q>narrower</q> or more generically <q>related</q> term, with an operational
387 definition of <q>broader term</q> which is such that a resource retrieved
388 by a given term will also be retrieved by that term's <q>broader term</q>.
389 This is not a subsumption relationship, as there is no implication
390 that the concept referred to by a narrower term is of the same
391 <em>type</em> as a broader term.</p>
393 <p>Thus <strong>a vocabulary (SKOS or otherwise) is not an
394 ontology</strong>. It has lighter and looser semantics than an
395 ontology, and is specialised for the restricted case of resource
396 retrieval. Those interested in ontological analyses can easily
397 transfer the vocabulary relationship information from SKOS to a formal
398 ontological format such as OWL <span class='cite' >std:owl</span>.</p>
400 <!--
401 <p><span class='todo' >What is to be the format of the `master' files?
402 SKOS or mildly-formatted plain text?[NG] By definition, this will be
403 left up to the publishers! All we need to see is SKOS. [FVH] There's
404 more than one notation for SKOS (RDF/XML and Turtle/N3): do we need to
405 mandate one over others (FVH says yes, RDF/XML; NG says no). Open
406 issue.</span></p>
407 -->
408 </div>
410 <div class='section'>
411 <p class='title'>Content and format of a SKOS vocabulary</p>
413 <p>A published vocabulary in SKOS format consists of a set of
414 <q>concepts</q> – the examples below are shown in the Turtle notation
415 for RDF <span class='cite' >std:turtle</span> (this is similar to the
416 more informal N3 notation).
417 Each concept should contain the following elements:</p>
418 <ul>
419 <li>A single URI representing the concept, mainly for use by
420 computers but preferably human-readable, e.g. an entry for <q>spiral
421 galaxy</q> might look like: <code>&lt;#spiralGalaxy&gt; a
422 skos:Concept</code>.
423 <!-- <code>&lt;skos:Concept rdf:about="#spiralGalaxy"&gt;</code>-->
424 </li>
426 <li>A preferred label, for use by humans, in each language supported by the
427 vocabulary, e.g. <code>skos:prefLabel "spiral galaxy"@en,
428 "Spiralgalaxie"@de</code>.
429 <!-- <code>&lt;skos:prefLabel&gt;spiral galaxy&lt;/skos:prefLabel&gt;</code>-->
430 </li>
432 <li>Optional alternative labels which applications may encounter or which are in
433 common use, whether simple synonyms or commonly-used aliases,
434 e.g. <q>GRB</q> for "gamma-ray burst": <code>skos:altLabel
435 "GRB"@en</code> <!--<code>&lt;skos:altLabel
436 lang="de"&gt;Spiralgalaxie&lt;/skos:altLabel&gt;</code> --></li>
438 <li>Optional hidden labels which capture common misspellings for
439 either the preferred or alternate labels, e.g. <q>spiral glaxy</q> for
440 <q>spiral galaxy</q>: <code>skos:hiddenLabel "spiral
441 glaxy"@en</code>.</li>
443 <li>A definition for the concept, where one exists in the original
444 vocabulary, to clarify the meaning of the term,
445 e.g. <code>skos:definition "A galaxy having a spiral
446 structure."@en</code>.</li>
448 <li>A scope note to further clarify a definition, or the usage of the
449 concept, e.g. <code>skos:scopeNote "Spiral galaxies fall into one of
450 three categories: Sa, Sb, and Sc"@en</code>.</li>
452 <li>Optionally, a concept may be involved in any number of relationships
453 to other concepts. The types of relationships are:
454 <ul>
455 <li>Narrower or more specific concepts, e.g. a link to the concept
456 representing a <q>barred spiral galaxy</q>: <code>skos:narrower
457 &lt;#barredSpiralGalaxy&gt;</code>.
458 <!--<code>&lt;skos:narrower rdf:resource="#barredSpiralGalaxy&gt;</code>-->
459 </li>
460 <li>Broader or more general concepts, e.g. a link to the token
461 representing galaxies in general: <code>skos:broader
462 &lt;#galaxy&gt;</code>.
463 <!--<code>&lt;skos:broader rdf:resource="#galaxy&gt;</code>-->
464 </li>
465 <li>Related concepts, e.g. a link to the token representing spiral
466 arms of galaxies: <code>skos:related &lt;#spiralArm&gt;</code>
467 <!--<code>&lt;skos:related rdf:resource="#spiralArm"&gt;</code>-->
468 (note this relationship does not say that spiral galaxies have spiral
469 arms – that would be ontological information of a higher order which
470 is beyond the requirements for information stored in a vocabulary).</li>
471 </ul>
472 </li>
473 </ul>
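<p>Taken together, the elements listed above might be combined into a
single concept entry along the following lines. This is only an
illustrative sketch in Turtle: the prefix declaration is assumed to
bind <code>skos</code> to the SKOS Core namespace, the identifiers are the
hypothetical ones used in the list above, and the entry is not taken
from any published vocabulary.</p>
<pre>
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .

# Illustrative entry only; identifiers are relative to the vocabulary document.
&lt;#spiralGalaxy&gt; a skos:Concept ;
    skos:prefLabel "spiral galaxy"@en, "Spiralgalaxie"@de ;
    skos:hiddenLabel "spiral glaxy"@en ;
    skos:definition "A galaxy having a spiral structure."@en ;
    skos:broader &lt;#galaxy&gt; ;
    skos:narrower &lt;#barredSpiralGalaxy&gt; ;
    skos:related &lt;#spiralArm&gt; .
</pre>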
475 <p>In addition to the information about a single concept, a vocabulary
476 can contain information to help users navigate its structure and
477 contents:</p>
478 <ul>
479 <li>The <q>top concepts</q> of the vocabulary, i.e. those that occur
480 at the top of the vocabulary hierarchy defined by the broader/narrower
481 relationships, can be explicitly stated to make it easier to navigate
482 the vocabulary.</li>
484 <li>Concepts that form a natural group can be defined as being members
485 of a <q>collection</q>.</li>
487 <li>Versioning information can be added using change notes.</li>
489 <li>Additional metadata about the vocabulary, e.g. the publisher, may
490 be documented using the Dublin Core metadata set <span class='cite'
491 >std:dublincore</span>.</li>
492 </ul>
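<p>As a hedged illustration of this vocabulary-level information, a
concept scheme might be described roughly as follows. The title,
publisher, change note and the <code>#galaxyTypes</code> and
<code>#ellipticalGalaxy</code> identifiers are invented for the example;
only the SKOS and Dublin Core property names are taken from the
respective standards.</p>
<pre>
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .
@prefix dc:   &lt;http://purl.org/dc/elements/1.1/&gt; .

# The vocabulary itself, with Dublin Core metadata and its top concept.
&lt;&gt; a skos:ConceptScheme ;
    dc:title "An example vocabulary"@en ;          # hypothetical title
    dc:publisher "An example publisher" ;          # hypothetical publisher
    skos:hasTopConcept &lt;#galaxy&gt; .

# A top concept, carrying a change note as versioning information.
&lt;#galaxy&gt; a skos:Concept ;
    skos:inScheme &lt;&gt; ;
    skos:prefLabel "galaxy"@en ;
    skos:changeNote "Definition revised in this release."@en .   # hypothetical note

# A collection grouping concepts that naturally belong together.
&lt;#galaxyTypes&gt; a skos:Collection ;
    skos:member &lt;#spiralGalaxy&gt;, &lt;#ellipticalGalaxy&gt; .
</pre>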
493 </div>
496 <div class='section'>
497 <p class='title'>Relationships Between Vocabularies</p>
499 <p>
500 There already exist several vocabularies in the domain of astronomy.
501 Instead of attempting to replace all these existing vocabularies,
502 which have been developed to serve different aims and user groups,
503 we embrace them.
504 This requires a mechanism to relate the concepts in the different
505 vocabularies.
506 The W3C are in the process of developing a standard for relating the
507 concepts in different SKOS vocabularies <span
508 class='cite'>std:skosMapping</span> and when completed this should be
509 reviewed for use by the IVOA.
510 </p>
512 <p>
513 Four types of relationship are sufficient to capture the relationships
514 between concepts in vocabularies and are similar to those defined for
515 relationships between concepts within a single vocabulary.
516 The relationships are as follows.
517 <span class='todo'>[TODO] Add specifics to the examples.</span>
518 </p>
519 <ul>
521 <li>
522 Equivalence between concepts, i.e. the concepts in the different
523 vocabularies refer to the same real world entity.
524 This is captured with the following RDF statement
525 <code>iau93:#SPIRALGALAXY map:exactMatch ivoat:#spiralGalaxy</code>
526 which states that the spiral galaxy concept in the IAU thesaurus is the
527 same as the spiral galaxy concept in the IVOAT.
528 (Note the use of the external namespaces <code>iau93</code> and
529 <code>ivoat</code> which must be defined within the document.)
530 </li>
532 <li>
533 Broader concept, i.e. there is not an equivalent concept but there is
534 a more general one.
535 This is captured with the RDF statement <code>iau93:#XXX
536 map:broadMatch ivoat:#YYY</code> which states that the IVOAT concept
537 YYY is more general than the IAU93 concept XXX.
538 </li>
540 <li>
541 Narrower concept, i.e. there is not an equivalent concept but there is
542 a more specific one.
543 This is captured with the RDF statement <code>iau93:#XXX
544 map:narrowMatch ivoat:#YYY</code> which states that the IVOAT concept
545 YYY is more specific than the IAU93 concept XXX.
546 </li>
548 <li>
549 Related concept, i.e. there is some form of relationship.
550 This is captured with the RDF statement <code>iau93:#XXX
551 map:relatedMatch ivoat:#YYY</code> which states that the IAU93 concept
552 XXX has an association with the IVOAT concept YYY.
553 </li>
555 </ul>
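<p>A separate mapping document expressing these statements might look
roughly like the following Turtle sketch. The namespace URIs for
<code>iau93</code> and <code>ivoat</code> are placeholders, the
<code>map</code> namespace is assumed to be that of the W3C SKOS mapping
draft, and each vocabulary prefix is assumed to be bound to a namespace
ending in <code>#</code>, so the <code>#</code> is not repeated in the
prefixed names. <code>XXX</code> and <code>YYY</code> remain the
unspecified placeholders used in the list above.</p>
<pre>
@prefix map:   &lt;http://www.w3.org/2004/02/skos/mapping#&gt; .      # assumed mapping namespace
@prefix iau93: &lt;http://example.org/vocabularies/iau93#&gt; .       # placeholder URI
@prefix ivoat: &lt;http://example.org/vocabularies/ivoat#&gt; .       # placeholder URI

# Equivalence: both concepts refer to the same real world entity.
iau93:SPIRALGALAXY map:exactMatch ivoat:spiralGalaxy .

# Placeholder statements for the remaining three relationship types.
iau93:XXX map:broadMatch   ivoat:YYY .   # YYY is more general than XXX
iau93:XXX map:narrowMatch  ivoat:YYY .   # YYY is more specific than XXX
iau93:XXX map:relatedMatch ivoat:YYY .   # XXX is associated with YYY
</pre>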
557 <p>
558 <span class='todo'>[TODO:] Enter text regarding the resolution of
559 issue 7.</span>
560 </p>
562 </div>
564 <div class='section'>
565 <p class='title'>Suggested good practices</p>
567 <p>As long as the vocabularies conform to the SKOS standard and are
568 published in a machine-processable RDF format, there is nothing
569 keeping a VO application from using the vocabulary to support the
570 human user and to enable new connections between different sources of
571 information.
572 However, we have identified a set of
573 <q>best practice rules</q> which, if followed, will make the creation,
574 management, and use of the vocabularies within the VO simpler and more
575 effective:</p>
577 <ol>
578 <li>The SKOS documents defining the vocabulary should be published at
579 a long-term accessible URI and should be mirrored at a central IVOA
580 vocabulary repository.
581 Each version of the vocabulary should be indicated within the name
582 (e.g. "MyFavoriteVocabulary-v3.14") and previous versions should
583 continue to be available even after having been subsumed by newer
584 versions. Published vocabulary updates should be infrequent and
585 individual changes should be documented, e.g. by
586 <code>&lt;skos:changeNote&gt;</code>. The vocabulary namespace should
587 be the same as the location of the vocabulary.</li>
589 <li>Concept identifiers should consist only of the letters a-z, A-Z,
590 and numbers 0-9, i.e. no spaces, no exotic letters (e.g. umlauts), and
591 no characters which would make a token inexpressible as part of a URI;
592 since tokens are for use by computers only, this is not a big
593 restriction - the exotic letters can be used within the labels and
594 documentation if appropriate.</li>
596 <li>Token names should be kept in human-readable form, directly
597 reflect the implied meaning, and not be semi-random identifiers only
598 (e.g. <q>spiralGalaxy</q>, not "t1234567"); tokens should preferably
599 be created via a direct conversion from the preferred label via
600 removal/translation of non-token characters (see above) and
601 sub-token separation via capitalization of the first sub-token
602 character (e.g. the label "My favorite idea-label #42" is converted
603 into "MyFavoriteIdeaLabel42"). <span class='todo'>Open
604 issue</span></li>
606 <li>Labels should be in the form of the source vocabulary. When
607 developing a new vocabulary the singular form is preferred,
608 e.g. <q>spiral galaxy</q>, not "spiral galaxies". <span
609 class='todo'>Open issue</span></li>
611 <li>Each concept should have a definition
612 (<code>skos:definition</code>) that constitutes a short description of
613 the concept which could be adopted by an application using the
614 vocabulary. The use of additional documentation in standard SKOS or
615 Dublin Core format (see above) is encouraged. <span class='todo'>Note
616 distinction between description and SKOS scope-note</span></li>
618 <li>The language localization should be declared where appropriate,
619 e.g. preferred labels, alternate labels, definitions, etc.</li>
621 <li>Relationships (<q>broader</q>, <q>narrower</q>, <q>related</q>)
622 between concepts are encouraged, but not required; if used, they
623 should be complete (e.g. all <q>broader</q> links have corresponding
624 <q>narrower</q> links in the referenced entries and <q>related</q>
625 entries link each other).</li>
627 <li><q>TopConcept</q> entries (see above) should be declared and
628 normally consist of those concepts that do not have any <q>broader</q>
629 relationships (i.e. not at a sub-ordinate position in the
630 hierarchy).</li>
632 <li>Publishers are encouraged to publish <q>mappings</q> between their
633 vocabularies and other commonly used vocabularies. These should be
634 external to the defining vocabulary document so that the vocabulary
635 can be used independently of the publisher's mappings.</li>
636 </ol>
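<p>To illustrate what <q>complete</q> relationships and documented
changes could look like in practice, the following Turtle sketch states
each <q>broader</q> link together with its reciprocal <q>narrower</q>
link, and each <q>related</q> link in both directions. The identifiers
reuse the hypothetical spiral galaxy example from earlier in this
document, and the wording of the change note is invented.</p>
<pre>
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .

&lt;#spiralGalaxy&gt; a skos:Concept ;
    skos:prefLabel "spiral galaxy"@en ;
    skos:broader &lt;#galaxy&gt; ;
    skos:related &lt;#spiralArm&gt; ;
    skos:changeNote "Added link to spiral arm."@en .   # hypothetical change note

# Reciprocal of the skos:broader link above.
&lt;#galaxy&gt; a skos:Concept ;
    skos:prefLabel "galaxy"@en ;
    skos:narrower &lt;#spiralGalaxy&gt; .

# Reciprocal of the skos:related link above.
&lt;#spiralArm&gt; a skos:Concept ;
    skos:prefLabel "spiral arm"@en ;
    skos:related &lt;#spiralGalaxy&gt; .
</pre>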
638 <p>These suggestions are by no means trivial – there was considerable
639 discussion within the semantic working group on many of these topics,
640 particularly about token formats (some wanted lower-case only), and
641 singular versus plural forms of the labels (different traditions exist
642 within the international library science community). Obviously, no
643 publisher of an astronomical vocabulary has to adopt these rules, but
644 the adoption of these rules will make it easier to use the vocabulary
645 in external generic VO applications. However, VO applications should
646 be developed to accept any vocabulary that complies with the latest
647 SKOS standard <span class="cite">std:skoscore</span>.</p>
648 </div>
650 </div>
653 <div class="section">
654 <p class="title">Example vocabularies</p>
656 <p>The intent of having the IVOA adopt SKOS as the preferred format for
657 astronomical vocabularies is to encourage the creation and management
658 of diverse vocabularies by competent astronomical groups, so that
659 users of the VO and related resources can benefit directly and
660 dynamically without the intervention of the IAU or IVOA. However, we
661 felt it important to provide several examples of vocabularies in the
662 SKOS format as part of the proposal, to illustrate their simplicity
663 and power, and to provide an immediate vocabulary basis for VO
664 applications.</p>
666 <p>We provide a set of SKOS files representing the vocabularies which
667 have been developed, and mappings between them. These can be
668 downloaded at the URL</p>
669 <blockquote>
670 <span class='url'>@BASEURI@/@DISTNAME@.tar.gz</span>
671 </blockquote>
673 <p><span class='todo' >[To be expanded:] there are no mappings at the
674 moment. Also, the vocabularies are all in a single language, though
675 translations of the IAU93 thesaurus are available.</span></p>
677 <div class='section'>
678 <p class='title'>A Constellation Name Vocabulary (normative)</p>
680 <p>This vocabulary is presented as a simple example of an astronomical vocabulary with a very specific purpose: handling constellation information of the kind commonly encountered in variable star research. For example, <q>SS Cygni</q> is a cataclysmic variable located in the constellation <q>Cygnus</q>. The name of the star uses the genitive form <q>Cygni</q>, but the alternate label <q>SS Cyg</q> uses the standard abbreviation <q>Cyg</q>. In the constellation vocabulary, all of these forms are recorded together in a machine-readable format. <span class='todo'><q>Incorrect</q> forms should probably be represented as SKOS <q>hidden labels</q>.</span></p>
682 <p>The &lt;skos:ConceptScheme&gt; contains a single &lt;skos:TopConcept&gt;, <q>constellation</q>:</p>
683 <pre>
684 &lt;skos:Concept rdf:about="#constellation"&gt;
685 &lt;skos:inScheme rdf:resource=""/&gt;
686 &lt;skos:prefLabel&gt;constellation&lt;/skos:prefLabel&gt;
687 &lt;skos:definition&gt;IAU-sanctioned constellation names&lt;/skos:definition&gt;
688 &lt;skos:narrower rdf:resource="#Andromeda"/&gt;
689 ...
690 &lt;skos:narrower rdf:resource="#Vulpecula"/&gt;
691 &lt;/skos:Concept&gt;
692 </pre>
693 <p><span class='todo' >Alternate Turtle form, for illustration, with
694 the SKOS namespace being the default...</span></p>
695 <pre>
696 &lt;#constellation&gt; a :Concept;
697 :inScheme &lt;&gt;;
698 :prefLabel "constellation";
699 :definition "IAU-sanctioned constellation names";
700 :narrower &lt;#Andromeda&gt;;
701 ...
702 :narrower &lt;#Vulpecula&gt;.
703 </pre>
704 <p>and the entry for <q>Cygnus</q> is</p>
705 <pre>
706 &lt;skos:Concept rdf:about="#Cygnus"&gt;
707 &lt;skos:inScheme rdf:resource=""/&gt;
708 &lt;skos:prefLabel&gt;Cygnus&lt;/skos:prefLabel&gt;
709 &lt;skos:definition&gt;Cygnus&lt;/skos:definition&gt;
710 &lt;skos:altLabel&gt;Cygni&lt;/skos:altLabel&gt;
711 &lt;skos:altLabel&gt;Cyg&lt;/skos:altLabel&gt;
712 &lt;skos:broader rdf:resource="#constellation"/&gt;
713 &lt;skos:scopeNote&gt;Cygnus is nominative form; The alternative labels are the genitive and short forms&lt;/skos:scopeNote&gt;
714 &lt;/skos:Concept&gt;
715 </pre>
716 <p>Note that SKOS alone does not distinguish
717 genitive forms from abbreviations, but alternate labels
718 are more than adequate for processing by VO applications, where
719 the difference between <q>SS Cygni</q>, <q>SS Cyg</q>, and the incorrect form
720 <q>SS Cygnus</q> is probably irrelevant.</p>
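<p>As a rough illustration of how an application might use these
labels, the Python sketch below resolves the last word of a star name
to a constellation concept. The dictionary layout and the names
<code>CONCEPTS</code>, <code>build_label_index</code> and
<code>constellation_of</code> are assumptions made for this example; a
real application would read the labels from the SKOS file. The labels
given for Andromeda are supplied here for illustration only.</p>
<pre>
```python
# Illustrative in-memory stand-in for part of the constellation vocabulary.
# The layout is an assumption for this sketch, not something SKOS defines.
CONCEPTS = {
    "#Cygnus": {"prefLabel": "Cygnus", "altLabels": ["Cygni", "Cyg"]},
    "#Andromeda": {"prefLabel": "Andromeda", "altLabels": ["Andromedae", "And"]},
}


def build_label_index(concepts):
    """Map every preferred and alternative label (case-folded) to its concept."""
    index = {}
    for concept_id, entry in concepts.items():
        for label in [entry["prefLabel"]] + entry["altLabels"]:
            index[label.casefold()] = concept_id
    return index


def constellation_of(star_name, index):
    """Return the concept matching the last word of a star name, or None."""
    words = star_name.split()
    if not words:
        return None
    return index.get(words[-1].casefold())


INDEX = build_label_index(CONCEPTS)

for name in ["SS Cygni", "SS Cyg", "SS Cygnus"]:
    print(name, constellation_of(name, INDEX))
```
</pre>
<p>All three spellings resolve to <code>#Cygnus</code>, because the
lookup treats the preferred label and the alternate labels alike.</p>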
721 </div>
723 <div class='section'>
724 <p class='title'>The 1993 IAU Thesaurus (normative)</p>
726 <p>The IAU Thesaurus consists of concepts with mostly capitalized
727 labels and a rich set of thesaurus relationships (<q>BT</q> for
728 <q>broader term</q>, <q>NT</q> for <q>narrower term</q>, and <q>RT</q> for
729 <q>related term</q>). The thesaurus also contains <q>U</q> (for
730 <q>use</q>) and <q>UF</q> (<q>use for</q>) relationships. In a SKOS
731 model of a vocabulary these are captured as alternative labels. A
732 separate document contains translations of the vocabulary terms in
733 five languages: English, French, German, Italian, and
734 Spanish. Enumerable concepts are plural (e.g. <q>SPIRAL
735 GALAXIES</q>) and non-enumerable concepts are singular
736 (e.g. <q>STABILITY</q>). Finally, there are some usage hints like
737 <q>combine with other</q>.</p>
739 <p>In converting the IAU Thesaurus to SKOS, we have been as faithful
740 as possible to the original format of the thesaurus. Thus, preferred
741 labels have been kept in their uppercase format.</p>
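<p>The treatment of the <q>U</q> and <q>UF</q> relationships can be
sketched in a few lines of Python. This is not the conversion script
used for the distributed files; the function name
<code>labels_from_use_relations</code>, the triple layout and the term
<q>EXAMPLE SYNONYM</q> are invented for illustration. Labels are left
in upper case, as in the thesaurus.</p>
<pre>
```python
def labels_from_use_relations(entries):
    """Collect preferred and alternative labels from U and UF relationships.

    entries is a list of (term, code, other_term) triples.
    """
    concepts = {}
    for term, code, other in entries:
        if code == "UF":
            # term is preferred; other is a non-preferred synonym
            preferred, alternative = term, other
        elif code == "U":
            # term is non-preferred; other is the preferred term
            preferred, alternative = other, term
        else:
            continue  # other relationship codes are ignored in this sketch
        concept = concepts.setdefault(
            preferred, {"prefLabel": preferred, "altLabel": []}
        )
        if alternative not in concept["altLabel"]:
            concept["altLabel"].append(alternative)
    return concepts


# Invented sample: EXAMPLE SYNONYM is a placeholder, not a thesaurus entry.
ENTRIES = [
    ("SPIRAL GALAXIES", "UF", "EXAMPLE SYNONYM"),
    ("EXAMPLE SYNONYM", "U", "SPIRAL GALAXIES"),
]
RESULT = labels_from_use_relations(ENTRIES)
print(RESULT)
```
</pre>
<p>Both sample triples describe the same pairing from opposite
directions, so the result holds one concept with one alternative
label.</p>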
743 </div>
745 <div class='section'>
746 <p class='title'>The Astronomy &amp; Astrophysics Keyword List (normative)</p>
748 <p><span class='todo'>[TODO] AG to write a short description here</span></p>
749 </div>
751 <div class='section'>
752 <p class='title'>The AOIM Taxonomy (normative)</p>
754 <p><span class='todo'>[TODO] AG to write a short description here</span></p>
756 </div>
758 <div class='section'>
759 <p class='title'>The UCD1+ Vocabulary (non-normative)</p>
761 <p>The UCD standard is an officially sanctioned and managed vocabulary
762 of the IVOA. The normative document is a simple text file containing
763 entries consisting of tokens (e.g. <code>em.IR</code>), a short
764 description, and usage information (<q>syntax codes</q> which permit
765 UCD tokens to be concatenated). The form of the tokens implies a
766 natural hierarchy: <code>em.IR.8-15um</code> is obviously a narrower
767 term than <code>em.IR</code>, which in turn is narrower than
768 <code>em</code>.</p>
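<p>The hierarchy implied by the dotted tokens can be recovered with
simple string handling. The Python sketch below assumes a single UCD
word (not a concatenation of several words); the function names
<code>broader_ucd</code> and <code>ancestors</code> are hypothetical.</p>
<pre>
```python
def broader_ucd(token):
    """Return the parent of a dotted UCD word, or None for a top-level word."""
    head, dot, _tail = token.rpartition(".")
    return head if dot else None


def ancestors(token):
    """Return all broader words, nearest first."""
    chain = []
    parent = broader_ucd(token)
    while parent is not None:
        chain.append(parent)
        parent = broader_ucd(parent)
    return chain


print(ancestors("em.IR.8-15um"))
```
</pre>
<p>For <code>em.IR.8-15um</code> the chain of broader words is
<code>em.IR</code> followed by <code>em</code>, matching the hierarchy
described above.</p>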
770 <p>Given the structure of the UCD1+ vocabulary, the natural
771 translation to SKOS consists of preferred labels equal to the original
772 tokens (the UCD1+ words include dashes and periods), vocabulary tokens
773 created using the <q>5th Commandment</q> (e.g. <code>emIR815Um</code> for
774 <code>em.IR.8-15um</code>), direct use of the definitions, and the syntax codes
775 placed in usage documentation: <code>&lt;skos:scopeNote&gt;UCD syntax code: P&lt;/skos:scopeNote&gt;</code>.</p>
778 <p>Note that the SKOS document containing the UCD1+ vocabulary does
779 NOT constitute the official version: the normative document is still
780 the text list. However, in the long term, the IVOA may decide to make
781 the SKOS version normative, since the SKOS version contains all of the
782 information contained in the original text document but has the
783 advantage of being in a standard format easily read and used by any
784 application on the semantic web.</p>
786 </div>
788 <div class='section'>
789 <p class='title'>The proposed IVOA Thesaurus</p>
791 <p>Although the adoption of SKOS will make it easy to
792 publish and access different astronomical vocabularies,
793 no existing vocabulary makes it easy to jump-start the
794 use of vocabularies in generic astrophysical VO applications: each of
795 the previously developed vocabularies has its own limits and
796 biases. For example, the IAU Thesaurus provides a large number of
797 entries, copious relationships, and translations to four other languages,
798 but there are no definitions, many concepts are now only useful for
799 historical purposes (e.g. many photographic or historical instrument
800 entries), some of the relationships are false or outdated, and many
801 important or newer concepts and their common abbreviations are
802 missing.</p>
804 <p>Despite its faults, the IAU Thesaurus constitutes a very extensive
805 vocabulary which could easily serve as the base vocabulary once
806 we have removed its most egregious faults and extended it to cover the
807 most obvious semantic holes. To this end, a heavily revised IAU
808 thesaurus is in preparation for use within the IVOA and other
809 astronomical contexts. The goal is to provide a general vocabulary
810 foundation to which other, more specialized, vocabularies can be added
811 as needed, and to provide a good <q>lingua franca</q> for the creation of
812 vocabulary mappings.</p>
813 </div>
814 </div> <!-- End: Example vocabularies -->
817 <div class="appendices">
819 <div class="section-nonum" id="bibliography">
820 <p class="title">Bibliography</p>
821 <?bibliography rm-refs ?>
822 </div>
824 <p style="text-align: right; font-size: x-small; color: #888;">
825 $Revision$ $Date$
826 </p>
828 </div>
830 </body>
831 </html>

