
Contents of /trunk/projects/vocabularies/doc/vocabularies.xml



Revision 40 - (show annotations)
Thu Jan 24 13:23:49 2008 UTC (12 years, 10 months ago) by norman.x.gray
File MIME type: text/xml
File size: 45079 byte(s)
Add an initial set of use cases
Adjust the Makefile.in (echo \b has problems on POSIX (which OS X 10.5
  is compatible with; switch to printf)

1 <?xml version="1.0" encoding="utf-8"?>
2 <!-- Based on template at
3 http://www.ivoa.net/Documents/templates/ivoa-tmpl.html -->
4 <html xmlns="http://www.w3.org/1999/xhtml"
5 xmlns:dc="http://purl.org/dc/elements/1.1/"
6 xmlns:dcterms="http://purl.org/dc/terms/"
7 xml:lang="en" lang="en">
8
9 <head>
10 <title>Vocabularies in the Virtual Observatory</title>
11 <link rev="made" href="http://nxg.me.uk/norman/#norman" title="Norman Gray"/>
12 <meta name="author" content="Norman Gray"/>
13 <meta name="DC.subject" content="IVOA, Virtual Observatory, Vocabulary"/>
14 <meta name="rcsdate" content="$Date$"/>
15 <link href="http://www.ivoa.net/misc/ivoa_wd.css" rel="stylesheet" type="text/css"/>
16 <!-- style: make the ToC a little more compact, and without bullets -->
17 <style type="text/css">
18 div.toc ul { list-style: none; padding-left: 1em; }
19 div.toc li { padding-top: 0ex; padding-bottom: 0ex; }
20 li { padding-top: 1ex; padding-bottom: 1ex; }
21 td { vertical-align: top; }
22 span.userinput { font-weight: bold; }
23 span.url { font-family: monospace; }
24 q { color: #666; }
25 q:before { content: "“"; }
26 q:after { content: "”"; }
27 .todo { background: #ff7; }
28 </style>
29 </head>
30
31 <body>
32 <div class="head">
33 <table>
34 <tr><td><a href="http://www.ivoa.net/"><img alt="IVOA logo" src="http://ivoa.net/icons/ivoa_logo_small.jpg" border="0"/></a></td></tr>
35 </table>
36
37 <h1>Vocabularies in the Virtual Observatory, v@VERSION@</h1>
38 <h2>IVOA Working Draft, @RELEASEDATE@ [DRAFT $Revision$]</h2>
39 <!-- $Revision$ $Date$ -->
40
41 <dl>
42 <dt>Working Group</dt>
43 <dd><em><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaSemantics">Semantics</a></em></dd>
44
45 <dt>This version</dt>
46 <dd><span class='url' >http://www.ivoa.net/twiki/bin/view/IVOA/IvoaSemantics</span><br/>@BASEURI@</dd> <!-- XXX adjust current/latest URI from Makefile -->
47
48 <dt>Latest version</dt>
49 <dd>@BASEURI@</dd>
50
51 <dt>Editors</dt>
52 <dd>TBD</dd>
53
54 <dt>Authors</dt>
55 <dd>
56 <!-- The following are the folk that I definitely know have contributed
57 text or code to this document: add others as appropriate -->
58 <span property="dc:creator">Alasdair J G Gray</span>,
59 <span property="dc:creator">Norman Gray</span>,
60 <span property="dc:creator">Frederic V Hessman</span> and
61 <span property="dc:creator">Andrea Preite Martinez</span>
62 </dd>
63 </dl>
64 <hr/>
65 </div>
66
67 <div class="section-nonum" id="abstract">
68 <p class="title">Abstract</p>
69
70 <div class="abstract">
71 <p>As the astronomical information processed within the <em>Virtual Observatory
72 </em> becomes more complex, there is an increasing need for a more
73 formal means of identifying quantities, concepts, and processes not
74 confined to things easily placed in a FITS image, or expressed in a
75 catalogue or a table. We propose that the IVOA adopt a standard
76 format for vocabularies based on the W3C's <em>Resource Description
77 Framework</em> (RDF) and <em>Simple Knowledge Organization System</em>
78 (SKOS). By adopting a standard and simple format, the IVOA will
79 permit different groups to create and maintain their own specialized
80 vocabularies while letting the rest of the astronomical community
81 access, use, and combine them. The use of current, open standards
82 ensures that VO applications will be able to tap into resources of the
83 growing semantic web. Several examples of useful astronomical
84 vocabularies are provided, including work on a common IVOA thesaurus
85 intended to provide a semantic common base for VO applications.</p>
86 </div>
87
88 </div>
89
90 <div class="section-nonum" id="status">
91 <p class="title">Status of this document</p>
92
93 <p>This is an IVOA Working Draft. The first release of this document was
94 <span property="dc:date">@RELEASEDATE@</span>.</p>
95
96 <p>This document is an IVOA Working Draft for review by IVOA members
97 and other interested parties. It is a draft document and may be
98 updated, replaced, or obsoleted by other documents at any time. It is
99 inappropriate to use IVOA Working Drafts as reference materials or to
100 cite them as other than <q>work in progress</q>.</p>
101
102 <p>A list of current IVOA Recommendations and other technical
103 documents can be found at
104 <a href="http://www.ivoa.net/Documents/"><code>http://www.ivoa.net/Documents/</code></a>.</p>
105
106 <h3>Acknowledgments</h3>
107
108 <p>We would like to thank the members of the IVOA semantic working
109 group for many interesting ideas and fruitful discussions.</p>
110 </div>
111
112 <h2><a id="contents" name="contents">Table of Contents</a></h2>
113 <?toc?>
114
115 <hr/>
116
117 <div class="section" id="introduction">
118 <p class="title">Introduction</p>
119
120 <div class="section">
121 <p class="title">Vocabularies in astronomy</p>
122
123 <p>Astronomical information of relevance to the Virtual Observatory
124 (VO) is not confined to quantities easily expressed in a catalogue or
125 a table.
126 Fairly simple things such as position on the sky, brightness in some
127 units, times measured in some frame, redshifts, classifications or
128 other similar quantities are easily manipulated and stored in VOTables
129 and can currently be identified using IVOA Unified Content Descriptors
130 (UCDs) <span class="cite">std:ucd</span>.
131 However, astrophysical concepts and quantities use a wide variety of
132 names, identifications, classifications and associations, most of
133 which cannot be described or labelled via UCDs.</p>
134
135 <p>There are a number of basic forms of organised semantic knowledge
136 of potential use to the VO, ranging from informal <q>folksonomies</q>
137 (where users are free to choose their own labels) at one extreme, to
138 formally structured <q>vocabularies</q> (where the label is drawn from
139 a predefined set of definitions which can include relationships between
140 labels) and <q>ontologies</q> (where the domain is captured in a data
141 model) at the other.
142 More formal definitions are presented later in this document.
143 </p>
144
145 <!-- <span
146 class='todo' >I think this list covers definitions covered more
147 naturally in the text below it - omissable?[NG]</span></p>
148 <ul>
149 <li>A <em>controlled vocabulary</em> is a standardized list of
150 words or other tokens with accepted meanings (for example <q>M31</q>,
151 <q>spiral galaxy</q>, <q>star</q>, <q>gas</q>, <q>dust</q>,
152 <q>cloud</q>, <q>black hole</q>, <q>Dark Matter</q>,
153 <q>halo</q>). See the fuller discussion in <span class='xref'
154 >vocab</span>.</li>
155
156 <li>A <em>taxonomy</em> is a controlled vocabulary encompassing all of
157 the members of a semantic group (for example there are <q>spiral</q>,
158 <q>elliptical</q>, <q>lenticular</q>, and <q>irregular</q> galaxies).</li>
159
160 <li>A <em>thesaurus</em> is a controlled vocabulary with some linking
161 between tokens so that simple hierarchical structures and equivalences
162 can be identified (for example <q>M31</q> is a narrower term for a <q>spiral
163 galaxy</q> which, in turn, is a narrower term for a
164 <q>galaxy</q>).</li>
165
166 <li>At the most formal end of this spectrum, an <em>ontology</em> is,
167 in the now-standard description
168 ultimately attributable to <span class='cite' >gruber93</span>, <q>a
169 formal specification of a shared conceptualisation</q>, that is, a set
170 of classes and properties which articulate a model of the world (see
171 also <span class='cite' >baader04</span>). It can range from an
172 elaborate set of definitions and restrictions, to a lightweight model
173 which is barely more than a set of subclass relationships. For
174 example, one might define a set of astronomical concepts and their
175 relations with each other, and say that <q>M31</q> is a
176 member of the class <q>Spiral Galaxy</q>, the latter consisting of
177 <q>Stars</q>, <q>Gas and Dust Clouds</q>, a <q>Central Black Hole</q>,
178 and a <q>Dark Matter Halo</q>.</li>
179 </ul>
180
181 <p>The term <q>folksonomy</q> has emerged in the last few years, to
182 describe what would in other circumstances be described as an
183 uncontrolled keyword list. The new term, and the substantial recent
184 interest in it, is a consequence of the realisation that even such a
185 simple mechanism can in certain circumstances (well-known examples are
186 the Flickr and del.icio.us social services) add substantial value to
187 a set of resources.</p>
188 -->
189
190 <p>
191 An astronomical ontology is necessary if we are to have a computer
192 (appear to) `understand' something of the domain.
193 There has been some progress towards creating an ontology of
194 astronomical object types <span
195 class="cite">std:ivoa-astro-onto</span> to meet this need.
196 However there are distinct use cases for letting human users find
197 resources of interest through search and navigation of the information space.
198 The most appropriate technology to meet these use cases derives from
199 the Information Science community, that of <em>controlled
200 vocabularies, taxonomies and thesauri</em>.
201 In the present document, we do not distinguish between controlled
202 vocabularies, taxonomies and thesauri, and use the term
203 <em>vocabulary</em> to represent all three.
204 </p>
205
206 <p>One of the best examples of the need for a simple vocabulary within
207 the VO is VOEvent <span class="cite">std:voevent</span>, the VO
208 standard for handling astronomical events. This standard requires
209 some formalised indication of what a published event is `about'. See
210 <span class='xref' >usecases</span> for further discussion.</p>
211
212 <p>There have been a number of astronomical vocabularies created, each
213 with its own intended purpose. Some examples are detailed below. </p>
214
215 <ul>
216
217 <li>The <em>Second Reference Dictionary of the Nomenclature of
218 Celestial Objects</em> <span class="cite">lortet94</span>, <span
219 class="cite">lortet94a</span> contains 500 paper pages of astronomical
220 nomenclature</li>
221
222 <li>For decades professional journals have used a set of reasonably
223 compatible keywords to help classify the content of whole articles.
224 These keywords have been analysed by Preite Martinez &amp; Lesteven
225 <span class="cite">preitemartinez07</span>, from which they derived a
226 set of common keywords constituting one of the potential bases for a
227 fuller VO vocabulary. The same authors also attempted to derive a set
228 of common concepts by analyzing the contents of abstracts in journal
229 articles, which should comprise a list of tokens/concepts more
230 up-to-date than the old list of journal keywords. A similar but less
231 formal attempt was made by Hessman <span class='cite'>hessman05</span>
232 for the VOEvent working group, resulting in a similar list <span
233 class="todo">[TODO] Check differences from the A&amp;A
234 list</span>.</li>
235
236 <li>Astronomical databases generally use simple sets of keywords
237 – sometimes hierarchically organized – to aid
238 the users in the querying of the databases. Two examples from totally
239 different contexts are the list of object types used in the <a
240 href="http://simbad.u-strasbg.fr">Simbad</a> database and the search
241 keywords used in the educational Hands-On Universe image database
242 portal.</li>
243
244 <li>The Astronomical Outreach Imagery (AOI) working group has created
245 a simple taxonomy for helping to classify images used for educational
246 or public relations purposes <span class="cite">std:aoim</span>. See section
247 <span class='xref'>vocab-aoim</span>.</li>
248 <!--
249 <li>The Hands-On Universe project (see <span class='url'
250 >http://sunra.lbl.gov/telescope2/index.html</span>) has maintained a
251 public database of images for use by the general public since the
252 1990s. The images are very heterogeneous, since they are gathered from
253 a variety of professional, semi-professional, amateur, and school
254 observatories, so a simple taxonomy is used to facilitate browsing
255 by the users of the database.</li>
256
257 <li>Remote Telescope Markup Language <span
258 class="cite">std:rtml</span>, a document definition for the transfer
259 of observing requests that has been adopted by the Heterogeneous
260 Telescope Network (HTN) Consortium and is indirectly supported by the
261 VOEvent protocol, currently contains several telescope and
262 observation-related taxonomies of terms (e.g. for devices, filters,
263 objects).<span class='todo'>Confirm status: does this need to be
264 converted to SKOS? [AG]. No: RTML will use IVOAT! [FVH] So delete
265 this item? [NG]</span></li>
266 -->
267 <li>In 1993, Shobbrook and Shobbrook published an Astronomy Thesaurus
268 endorsed by the IAU <span class='cite' >shobbrook92</span>. This
269 collection of nearly 3000 terms, in five languages, is a valuable
270 resource, but has seen little use in recent years. Its very size,
271 which gives it expressive power, is a disadvantage to the extent that
272 it is therefore hard to use. See section <span class='xref'>vocab-iau93</span>.</li>
273
274 <li>The Unified Content Descriptors <span class='cite' >std:ucd</span>
275 (UCD) constitute the main controlled vocabulary of the IVOA and
276 contain some taxonomic information. However, UCD suffers from two
277 major problems which make it difficult to use beyond the present
278 applications of labeling VOTables: firstly, there is no standard means of
279 identifying and processing the contents of the text-based reference
280 document; and secondly, the content cannot be openly extended beyond that set
281 by a formal IVOA committee without going through a laborious and
282 time-consuming negotiation process of extending the primary vocabulary
283 itself. See section <span class='xref'>vocab-ucd1</span>.</li>
284
285 </ul>
286 </div>
287
288 <div class='section' id='usecases'>
289 <p class='title'>Use-cases, and the motivation for formalised vocabularies</p>
290
291 <p>The most immediate high-level motivation for this work is the
292 requirement of the VOEvent standard <span class='cite'
293 >std:voevent</span> for a controlled vocabulary usable in the
294 VOEvent's <code>&lt;what/&gt;</code> element, which describes what
295 sort of object the VOEvent packet is describing, in some broadly
296 intelligible way. This addresses the problem that if a `burst' is
297 announced, it might be a Gamma Ray Burst due to the collapse of a star
298 in a distant galaxy, a solar flare, or the brightening of a stellar or
299 AGN accretion disk, and if a publisher doesn't use the label a user
300 might have expected, they will find it hard to guess what other
301 equivalent labels might have been used. A free-text label can help
302 here (which brings us into the area modishly known as `folksonomies'),
303 but the astronomical community, with its systematising instincts, and
304 aware of the benefits of standardisation, can do better.</p>
305
306 <p>Specific use-cases include the following.</p>
307 <ul>
308 <li>A user wishes to process all events concerning supernovae, which
309 means that an event concerning a supernova1a must be understood to be
310 relevant. [This supports a system working autonomously, filtering
311 incoming information]</li>
312
313 <li>A user is searching an archive of VOEvents for microlensing
314 events, and retrieves a large number of them; the search interface may
315 then prompt her to narrow her search using one of a set of terms
316 including, say, binary lens events. [This supports so-called `semantic
317 search', providing semantic support to an interface which is in turn
318 supporting a user]</li>
319
320 <li>A user wishes to search for resources based on the
321 journal-supported keywords in a paper; they might do this either by
322 hand, or have this done on their behalf by a tool which can extract
323 the keywords from a PDF. The keywords are in the A&amp;A vocabulary,
324 and mappings have been defined between this vocabulary and others,
325 which means that the query keywords are translated automatically
326 into those appropriate for a search of an outreach image database
327 (everyone likes pretty pictures), the VO Registry, a set of SIMBAD
328 object types, and one or more concepts in more formal ontologies. The
329 search interface is then able to support the user browsing up and down
330 the AOIM vocabulary, and a specialised SIMBAD tool is able to take
331 over the search, now it has an appropriate starting place. [This
332 supports interoperability, building on the investments which
333 institutions and users have made in existing vocabularies]</li>
334
335 </ul>
336
337 </div>
338
339 <div class="section">
340 <p class="title">Formalising and managing multiple vocabularies</p>
341
342 <p>We find ourselves in the situation where there are multiple
343 vocabularies in use, describing a broad range of resources of interest
344 to professional and amateur astronomers, and members of the public.
345 These different vocabularies use different terms and different
346 relationships to support the different constituencies they cater for.
347 For example, <q>delta Sct</q> and <q>RR Lyr</q> are terms one would
348 find in a vocabulary aimed at professional astronomers, associated
349 with the notion of <q>variable star</q>; however one would
350 <em>not</em> find such technical terms in a vocabulary intended to
351 support outreach activities.</p>
352
353 <p>One approach to this problem is to create a single consensus
354 vocabulary, which draws terms from the various existing vocabularies
355 to create a new vocabulary which is able to express anything its users
356 might desire. The problem with this is that such an effort would be
357 very expensive: both in terms of time and effort on the part of those
358 creating it, and to the potential users, who have to learn
359 to navigate around it, recognise the new terms, and who have to be
360 supported in using the new terms correctly (or, more often,
361 incorrectly).</p>
362
363 <p>The alternative approach to the problem is to evade it, and this is
364 the approach taken in this document. Rather than deprecating the
365 existence of multiple overlapping vocabularies, we embrace it,
366 formalise all of them, and formally declare the relationships between
367 them. This means that:</p>
368 <ul>
369 <li>The various vocabularies are allowed to evolve separately, on
370 their own timescales, managed either by the IVOA, individual working
371 groups within the IVOA, or by third parties;</li>
372
373 <li>Specialized vocabularies can be developed and maintained by the
374 community with the most knowledge about a specific topic, ensuring
375 that the vocabulary will have the right breadth, depth, and
376 accuracy;</li>
377
378 <li>Users can choose the vocabulary or combination of vocabularies most
379 appropriate to their situation, either when annotating resources, or
380 when querying them; and</li>
381
382 <li>We can retain the previous investments made in vocabularies by
383 users and resource owners.</li>
384
385 </ul>
386
387 <p>The purpose of this proposal is to establish a common format for
388 the grass-roots creation, publishing, use, and manipulation of
389 astronomical vocabularies within the Virtual Observatory, based upon
390 the W3C's SKOS standard. We include as appendices to this proposal
391 formalised versions of a number of existing vocabularies, encoded as
392 SKOS vocabularies <span class="cite">std:skoscore</span>.</p>
393
394 </div>
395
396 </div>
397
398 <div class='section'>
399 <p class='title'>SKOS-based vocabularies</p>
400
401 <div class="section" id='vocab'>
402 <p class="title">Selection of the vocabulary format</p>
403
404 <p>After extensive online and face-to-face discussions, the authors have
405 brokered a consensus within the IVOA community that
406 formalised vocabularies should be published at least in SKOS (Simple Knowledge
407 Organization System) format, a W3C draft standard application of RDF to the
408 field of knowledge organisation <span
409 class="cite">std:skoscore</span>. SKOS draws on long experience
410 within the Library and Information Science community, to address a
411 well-defined set of problems to do with the indexing and retrieval of
412 information and resources; as such, it is a close match to the problem
413 this working group is addressing.</p>
414
415 <p>ISO 5964 <span class='cite' >std:iso5964</span> defines a number of
416 the relevant terms (ISO 5964:1985=BS 6723:1985; see also <span
417 class='cite' >std:bs8723-1</span> and <span class='cite'
418 >std:z39.19</span>), and some of the (lightweight) theoretical
419 background. The only technical distinction relevant to this document
420 is that between `vocabulary' and `thesaurus': BS-8723-1 defines a
421 thesaurus as a</p>
422 <blockquote>
423 Controlled vocabulary in which concepts are represented by preferred
424 terms, formally organized so that paradigmatic relationships between
425 the concepts are made explicit, and the preferred terms are
426 accompanied by lead-in entries for synonyms or quasi-synonyms. NOTE:
427 The purpose of a thesaurus is to guide both the indexer and the
428 searcher to select the same preferred term or combination of preferred
429 terms to represent a given subject. (BS-8723-1, sect. 2.39)
430 </blockquote>
431 <p>with a similar definition in ISO-5964 sect. 3.16. The paradigmatic
432 relationships in question are those relating a term to a <q>broader</q>,
433 <q>narrower</q> or more generically <q>related</q> term, with an operational
434 definition of <q>broader term</q> which is such that a resource retrieved
435 by a given term will also be retrieved by that term's <q>broader term</q>.
436 This is not a subsumption relationship, as there is no implication
437 that the concept referred to by a narrower term is of the same
438 <em>type</em> as a broader term.</p>
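<p>The retrieval-oriented reading of <q>broader</q> can be illustrated
with a small Turtle fragment. This is only a sketch: the concept
identifiers are hypothetical and are not drawn from any published
vocabulary. <q>M31</q> is an individual galaxy rather than a kind of
galaxy, yet it can still be given <q>spiral galaxy</q> as a broader
term, because the relationship only promises that a search on the
broader term will also find resources indexed under the narrower
one.</p>
<pre>
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .

# Hypothetical identifiers, for illustration only.
&lt;#M31&gt; a skos:Concept ;
    skos:prefLabel "M31"@en ;
    # Retrieval hint only: this does not assert that M31 is a
    # subclass or an instance of anything.
    skos:broader &lt;#spiralGalaxy&gt; .

&lt;#spiralGalaxy&gt; a skos:Concept ;
    skos:prefLabel "spiral galaxy"@en ;
    skos:broader &lt;#galaxy&gt; .
</pre>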
439
440 <p>Thus <strong>a vocabulary (SKOS or otherwise) is not an
441 ontology</strong>. It has lighter and looser semantics than an
442 ontology, and is specialised for the restricted case of resource
443 retrieval. Those interested in ontological analyses can easily
444 transfer the vocabulary relationship information from SKOS to a formal
445 ontological format such as OWL <span class='cite' >std:owl</span>.</p>
446
447 <!--
448 <p><span class='todo' >What is to be the format of the `master' files?
449 SKOS or mildly-formatted plain text?[NG] By definition, this will be
450 left up to the publishers! All we need to see is SKOS. [FVH] There's
451 more than one notation for SKOS (RDF/XML and Turtle/N3): do we need to
452 mandate one over others (FVH says yes, RDF/XML; NG says no). Open
453 issue.</span></p>
454 -->
455 </div>
456
457 <div class='section'>
458 <p class='title'>Content and format of a SKOS vocabulary</p>
459
460 <p>A published vocabulary in SKOS format consists of a set of
461 <q>concepts</q> – an example concept capturing the
462 vocabulary information about spiral galaxies is provided in Figure
463 XXX, with the RDF shown in both the XML and the Turtle notation <span
464 class='cite' >std:turtle</span> (Turtle is similar to the more
465 informal N3 notation). The elements of a concept are detailed
466 below.</p>
467
468 <center>
469 <table>
470 <tr>
471 <th bgcolor="#eecccc">XML Syntax</th>
472 <th width="10"/>
473 <th bgcolor="#cceecc">Turtle Syntax</th>
474 </tr>
475 <tr><td/></tr>
476 <tr>
477 <td bgcolor="#eecccc">
478 <pre>
479 &lt;skos:Concept rdf:about="#spiralGalaxy"&gt;
480 &lt;skos:prefLabel xml:lang="en"&gt;
481 spiral galaxy
482 &lt;/skos:prefLabel&gt;
483 &lt;skos:prefLabel xml:lang="de"&gt;
484 Spiralgalaxie
485 &lt;/skos:prefLabel&gt;
486 &lt;skos:altLabel xml:lang="en"&gt;
487 spiral nebula
488 &lt;/skos:altLabel&gt;
489 &lt;skos:hiddenLabel xml:lang="en"&gt;
490 spiral glaxy
491 &lt;/skos:hiddenLabel&gt;
492 &lt;skos:definition xml:lang="en"&gt;
493 A galaxy having a spiral structure.
494 &lt;/skos:definition&gt;
495 &lt;skos:scopeNote xml:lang="en"&gt;
496 Spiral galaxies fall into one of
497 three categories: Sa, Sc, and Sd.
498 &lt;/skos:scopeNote&gt;
499 &lt;skos:narrower
500 rdf:resource="#barredSpiralGalaxy"/&gt;
501 &lt;skos:broader
502 rdf:resource="#galaxy"/&gt;
503 &lt;skos:related
504 rdf:resource="#spiralArm"/&gt;
505 &lt;/skos:Concept&gt;
506 </pre>
507 </td>
508 <td/>
509 <td bgcolor="#cceecc">
510 <pre>
511 &lt;#spiralGalaxy&gt; a skos:Concept;
512 skos:prefLabel
513 "spiral galaxy"@en,
514 "Spiralgalaxie"@de;
515 skos:altLabel "spiral nebula"@en;
516 skos:hiddenLabel "spiral glaxy"@en;
517 skos:definition """A galaxy having a
518 spiral structure."""@en;
519 skos:scopeNote """Spiral galaxies fall
520 into one of three categories:
521 Sa, Sc, and Sd"""@en;
522 skos:narrower &lt;#barredSpiralGalaxy&gt;;
523 skos:broader &lt;#galaxy&gt;;
524 skos:related &lt;#spiralArm&gt; .
525 </pre>
526 </td>
527 </tr>
528 </table>
529 </center>
530
531 <ul>
532
533 <li>A single URI representing the concept, mainly for use by computers
534 but preferably human-readable.
535 <!--
536 <code>&lt;#spiralGalaxy&gt; a skos:Concept</code>.
537 <code>&lt;skos:Concept rdf:about="#spiralGalaxy"&gt;</code>
538 -->
539 </li>
540
541 <li>A single preferred label in each supported language of the
542 vocabulary for use by humans.
543 <!--
544 <code>skos:prefLabel "spiral galaxy"@en, "Spiralgalaxie"@de</code>.
545 <code>&lt;skos:prefLabel&gt;spiral galaxy&lt;/skos:prefLabel&gt;</code>
546 -->
547 </li>
548
549 <li>Optional alternative labels which applications may encounter or which are in
550 common use, whether simple synonyms or commonly-used aliases,
551 e.g. <q>GRB</q> for <q>gamma-ray burst</q>, or <q>spiral nebula</q> for
552 spiral galaxies.
553 <!--
554 <code>skos:altLabel "GRB"@en</code>
555 <code>&lt;skos:altLabel lang="de"&gt;Spiralgalaxie&lt;/skos:altLabel&gt;</code>
556 -->
557 </li>
558
559 <li>Optional hidden labels which capture common misspellings for
560 either the preferred or alternate labels, e.g. <q>spiral glaxy</q> for
561 <q>spiral galaxy</q>.
562 <!--
563 <code>skos:hiddenLabel "spiral glaxy"@en</code>
564 <code>&lt;skos:hiddenLabel lang="en"&gt;spiral glaxy&lt;/skos:hiddenLabel&gt;</code>
565 -->
566 </li>
567
568 <li>A definition for the concept, where one exists in the original
569 vocabulary, to clarify the meaning of the term.
570 <!--
571 <code>skos:definition "A galaxy having a spiral structure."@en</code>
572 <code>&lt;skos:definition lang="en"&gt;<br/>A galaxy having a spiral structure.<br/>&lt;/skos:definition&gt;</code>
573 -->
574 </li>
575
576 <li>A scope note to further clarify a definition, or the usage of the
577 concept.
578 <!--
579 <code>skos:scopeNote "Spiral galaxies fall into one of three categories: Sa, Sc, and Sd"@en</code>
580 <code>&lt;skos:scopeNote lang="en"&gt;<br/>Spiral galaxies fall into one of three categories: Sa, Sc, and Sd.<br/>&lt;/skos:scopeNote&gt;</code>
581 -->
582 </li>
583
584 <li>Optionally, a concept may be involved in any number of relationships
585 to other concepts. The types of relationships are
586 <ul>
587 <li>Narrower or more specific concepts, e.g. a link to the concept
588 representing a <q>barred spiral galaxy</q>.
589 <!--
590 <code>skos:narrower &lt;#barredSpiralGalaxy&gt;</code>.
591 <code>&lt;skos:narrower rdf:resource="#barredSpiralGalaxy"&gt;</code>
592 -->
593 </li>
594 <li>Broader or more general concepts, e.g. a link to the token
595 representing galaxies in general.
596 <!--
597 <code>skos:broader &lt;#galaxy&gt;</code>.
598 <code>&lt;skos:broader rdf:resource="#galaxy"&gt;</code>
599 -->
600 </li>
601 <li>Related concepts, e.g. a link to the token representing spiral
602 arms of galaxies
603 <!--
604 <code>skos:related &lt;#spiralArm&gt;</code>
605 <code>&lt;skos:related rdf:resource="#spiralArm"&gt;</code>
606 -->
607 <br/>
608 (note this relationship does not say that spiral galaxies have spiral
609 arms – that would be ontological information of a higher order which
610 is beyond the requirements for information stored in a vocabulary).</li>
611 </ul>
612 </li>
613 </ul>
614
615 <p>In addition to the information about a single concept, a vocabulary
616 can contain information to help users navigate its structure and
617 contents:</p>
618 <ul>
619 <li>The <q>top concepts</q> of the vocabulary, i.e. those that occur
620 at the top of the vocabulary hierarchy defined by the broader/narrower
621 relationships, can be explicitly stated to make it easier to navigate
622 the vocabulary.</li>
623
624 <li>Concepts that form a natural group can be defined as being members
625 of a <q>collection</q>.</li>
626
627 <li>Versioning information can be added using change notes.</li>
628
629 <li>Additional metadata about the vocabulary, e.g. the publisher, may
630 be documented using the Dublin Core metadata set <span class='cite'
631 >std:dublincore</span>.</li>
632 </ul>
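<p>As a sketch of how this vocabulary-level information might be
written down, the Turtle fragment below declares a concept scheme with
one top concept, a collection, a change note and some Dublin Core
metadata. The SKOS and Dublin Core namespaces are the standard ones;
the concept identifiers, the collection name, and the title, publisher
and note texts are hypothetical placeholders, not part of any published
vocabulary.</p>
<pre>
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .
@prefix dc:   &lt;http://purl.org/dc/elements/1.1/&gt; .

# The empty relative URI refers to the vocabulary document itself.
&lt;&gt; a skos:ConceptScheme ;
    dc:title "Example vocabulary"@en ;        # placeholder text
    dc:publisher "Example publisher" ;        # placeholder text
    skos:hasTopConcept &lt;#galaxy&gt; .

# A top concept: nothing in the scheme is broader than it.
&lt;#galaxy&gt; a skos:Concept ;
    skos:prefLabel "galaxy"@en ;
    skos:inScheme &lt;&gt; ;
    skos:narrower &lt;#spiralGalaxy&gt; .

&lt;#spiralGalaxy&gt; a skos:Concept ;
    skos:prefLabel "spiral galaxy"@en ;
    skos:inScheme &lt;&gt; ;
    skos:broader &lt;#galaxy&gt; ;
    # Versioning information recorded as a change note.
    skos:changeNote "Added the alternative label 'spiral nebula'."@en .

# A collection groups concepts without placing them in the hierarchy.
&lt;#galaxiesByMorphology&gt; a skos:Collection ;
    skos:prefLabel "galaxies by morphology"@en ;
    skos:member &lt;#spiralGalaxy&gt;, &lt;#barredSpiralGalaxy&gt; .
</pre>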
633 </div>
634
635
636 <div class='section'>
637 <p class='title'>Relationships Between Vocabularies</p>
638
639 <p>
640 There already exist several vocabularies in the domain of astronomy.
641 Instead of attempting to replace all these existing vocabularies,
642 which have been developed to serve different aims and user groups,
643 we embrace them.
644 This requires a mechanism to relate the concepts in the different
645 vocabularies.
646 The W3C are in the process of developing a standard for relating the
647 concepts in different SKOS vocabularies <span
648 class='cite'>std:skosMapping</span> and when completed this should be
649 reviewed for use by the IVOA.
650 </p>
651
652 <p>
653 Four types of relationship are sufficient to capture the relationships
654 between concepts in vocabularies and are similar to those defined for
655 relationships between concepts within a single vocabulary.
656 The relationships are as follows.
657 <span class='todo'>[TODO] Add specifics to the examples.</span>
658 </p>
659 <ul>
660
661 <li>
662 Equivalence between concepts, i.e. the concepts in the different
663 vocabularies refer to the same real world entity.
664 This is captured with the following RDF statement
665 <code>iau93:#SPIRALGALAXY map:exactMatch ivoat:#spiralGalaxy</code>
666 which states that the spiral galaxy concept in the IAU thesaurus is the
667 same as the spiral galaxy concept in the IVOAT.
668 (Note the use of the external namespaces <code>iau93</code> and
669 <code>ivoat</code> which must be defined within the document.)
670 </li>
671
672 <li>
673 Broader concept, i.e. there is not an equivalent concept but there is
674 a more general one.
675 This is captured with the RDF statement <code>iau93:#XXX
676 map:broadMatch ivoat:#YYY</code> which states that the IVOAT concept
677 YYY is more general than the IAU93 concept XXX.
678 </li>
679
680 <li>
681 Narrower concept, i.e. there is not an equivalent concept but there is
682 a more specific one.
683 This is captured with the RDF statement <code>iau93:#XXX
684 map:narrowMatch ivoat:#YYY</code> which states that the IVOAT concept
685 YYY is more specific than the IAU93 concept XXX.
686 </li>
687
688 <li>
689 Related concept, i.e. there is some form of relationship.
690 This is captured with the RDF statement <code>iau93:#XXX
691 map:relatedMatch ivoat:#YYY</code> which states that the IAU93 concept
692 XXX has an association with the IVOAT concept YYY.
693 </li>
694
695 </ul>
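<p>The four mapping statements can be collected into a single Turtle
document along the following lines. This is a sketch under stated
assumptions: all three namespace URIs are placeholders (including the
one bound to <code>map</code>, which should be replaced by the
namespace of whichever SKOS mapping vocabulary is adopted), and every
concept name other than the spiral galaxy pair is hypothetical. Note
that a Turtle prefixed name cannot itself contain <q>#</q>, so the hash
is placed at the end of each namespace URI rather than in the local
name.</p>
<pre>
# Placeholder namespace URIs, for illustration only.
@prefix iau93: &lt;http://example.org/vocabularies/iau93#&gt; .
@prefix ivoat: &lt;http://example.org/vocabularies/ivoat#&gt; .
@prefix map:   &lt;http://example.org/skos-mapping#&gt; .

# Equivalence: both concepts refer to the same thing.
iau93:SPIRALGALAXY map:exactMatch ivoat:spiralGalaxy .

# Broader: the IVOAT concept is more general than the IAU93 one
# (hypothetical IAU93 concept name).
iau93:BARREDSPIRALGALAXY map:broadMatch ivoat:spiralGalaxy .

# Narrower: the IVOAT concept is more specific than the IAU93 one
# (hypothetical IAU93 concept name).
iau93:GALAXY map:narrowMatch ivoat:spiralGalaxy .

# Related: some association exists, with no hierarchy implied
# (hypothetical IAU93 concept name).
iau93:SPIRALARM map:relatedMatch ivoat:spiralGalaxy .
</pre>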
696
697 <p>
698 <span class='todo'>[TODO:] Enter text regarding the resolution of <a
699 href="http://code.google.com/p/volute/issues/detail?id=7">Issue
700 7</a>.</span>
701 </p>
702
703 </div>
704
705 <div class='section' id='practices'>
706 <p class='title'>Suggested good practices</p>
707
708 <p>As long as the vocabularies conform to the SKOS standard and are
709 published in a standardised machine processable RDF format, there is
710 nothing keeping a VO application from using the vocabulary to support
711 the human user and to enable new connections between different sources
712 of information. However, we have identified a set of
713 <em>best practice guidelines</em> which, if followed, will make the creation,
714 management, and use of the vocabularies within the VO simpler and more
715 effective: <span class='todo' >Several of the guidelines below are
716 marked as <q>open issues</q>; this does not imply that the rest are
717 necessarily finalised</span></p>
718
719 <ol>
720 <li>The SKOS documents defining the vocabulary should be published at
721 a long-term accessible URI and should be mirrored at a central IVOA
722 vocabulary repository.
723 The version of the vocabulary should be indicated within its name
724 (e.g. "MyFavoriteVocabulary-v3.14") and previous versions should
725 continue to be available even after having been superseded by newer
726 versions. Published vocabulary updates should be infrequent and
727 individual changes should be documented, e.g. by
728 <code>&lt;skos:changeNote&gt;</code>. The vocabulary namespace should
729 be the same as the location of the vocabulary.</li>
730
731 <li>Concept identifiers should consist only of the letters a-z, A-Z,
732 and digits 0-9, i.e. no spaces, no exotic letters (e.g. umlauts), and
733 no characters which would make a token inexpressible as part of a URI.
734 Since tokens are for use by computers only, this is not a big
735 restriction: the exotic letters can still be used within the labels and
736 documentation if appropriate.</li>
737
738 <li>Token names should be kept in human-readable form and should directly
739 reflect the implied meaning, rather than being semi-random identifiers
740 (e.g. <q>spiralGalaxy</q>, not "t1234567"); tokens should preferably
741 be created by a direct conversion from the preferred label, via
742 removal/translation of non-token characters (see above) and
743 sub-token separation via capitalization of the first sub-token
744 character (e.g. the label "My favorite idea-label #42" is converted
745 into "MyFavoriteIdeaLabel42"). <span class='todo'><a
746 href="http://code.google.com/p/volute/issues/detail?id=2">Open
747 issue</a></span></li>
748
749 <li>Labels should be in the form of the source vocabulary. When
750 developing a new vocabulary the singular form is preferred,
751 e.g. <q>spiral galaxy</q>, not "spiral galaxies". <span
752 class='todo'><a
753 href="http://code.google.com/p/volute/issues/detail?id=1">Open
754 issue</a></span></li>
755
756 <li>Each concept should have a definition
757 (<code>skos:definition</code>) that constitutes a short description of
758 the concept which could be adopted by an application using the
759 vocabulary. The use of additional documentation in standard SKOS or
760 Dublin Core format (see above) is encouraged. <span class='todo'>Note
761 distinction between description and SKOS scope-note</span></li>
762
763 <li>The language localization should be declared where appropriate,
764 e.g. for preferred labels, alternate labels, definitions, etc.</li>
765
766 <li>Relationships (<q>broader</q>, <q>narrower</q>, <q>related</q>)
767 between concepts are encouraged, but not required; if used, they
768 should be complete (e.g. all <q>broader</q> links have corresponding
769 <q>narrower</q> links in the referenced entries and <q>related</q>
770 entries link to each other).</li>
771
772 <li><q>TopConcept</q> entries (see above) should be declared and
773 normally consist of those concepts that do not have any <q>broader</q>
774 relationships (i.e. not at a subordinate position in the
775 hierarchy).</li>
776
777 <li>Publishers are encouraged to publish <q>mappings</q> between their
778 vocabularies and other commonly used vocabularies. These should be
779 external to the defining vocabulary document so that the vocabulary
780 can be used independently of the publisher's mappings.
781 <span class='todo' ><a href='http://code.google.com/p/volute/issues/detail?id=8' >Open issue</a></span>.</li>
782 </ol>
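<p>The token-construction rule in the third guideline can be sketched in a few lines of Python. This is only an illustration of one possible reading of the rule, not part of the proposal: the function name <code>label_to_token</code> is hypothetical, and the treatment of accented letters (reduce to the base letter, drop anything else outside ASCII) is an assumption, since the guideline does not say how such characters should be translated.</p>

```python
import re
import unicodedata


def label_to_token(label):
    """Turn a preferred label into a token made only of a-z, A-Z and 0-9."""
    # Assumption: accented letters are reduced to their base letter
    # (e.g. u-umlaut becomes u); anything else outside ASCII is dropped.
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Every run of non-token characters separates two sub-tokens.
    sub_tokens = [s for s in re.split(r"[^A-Za-z0-9]+", ascii_only) if s]
    # Capitalise only the first character of each sub-token and leave
    # the rest untouched, so an existing "IAU" is not turned into "Iau".
    return "".join(s[0].upper() + s[1:] for s in sub_tokens)


print(label_to_token("My favorite idea-label #42"))
```

<p>Under this reading the label "spiral galaxy" would become "SpiralGalaxy", with a leading capital, rather than the <q>spiralGalaxy</q> form quoted in the guideline; the guideline is still marked as an open issue.</p>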
783
784 <p>These suggestions are by no means trivial – there was
785 considerable discussion within the semantic working group on many of
786 these topics, particularly about token formats (some wanted lower-case
787 only), and singular versus plural forms of the labels (different
788 traditions exist within the international library science
789 community). Obviously, no publisher of an astronomical vocabulary has
790 to adopt these rules, but adopting them will make it
791 easier to use the vocabulary in external generic VO
792 applications. However, VO applications should be developed to accept
793 any vocabulary that complies with the latest SKOS standard <span
794 class="cite">std:skoscore</span>.</p> </div>
795
796 </div>
797
798
799 <div class="section">
800 <p class="title">Example vocabularies</p>
801
802 <p>The intent of having the IVOA adopt SKOS as the preferred format for
803 astronomical vocabularies is to encourage the creation and management
804 of diverse vocabularies by competent astronomical groups, so that
805 users of the VO and related resources can benefit directly and
806 dynamically without the intervention of the IAU or IVOA. However, we
807 felt it important to provide several examples of vocabularies in the
808 SKOS format as part of the proposal, to illustrate their simplicity
809 and power, and to provide an immediate vocabulary basis for VO
810 applications.</p>
811
812 <p>We provide a set of SKOS files representing the vocabularies which
813 have been developed, and mappings between them. These can be
814 downloaded at the URL</p>
815 <blockquote>
816 <span class='url'>@BASEURI@/@DISTNAME@.tar.gz</span>
817 </blockquote>
818 <p><span class='todo'>This URL isn't live yet, but will become so when
819 the WD is mature enough to be distributed from the ivoa.net site –
820 that is, when it becomes a formal IVOA Working Draft and enters the
821 formal document process. While in this current draft stage, see
822 <span class='url' >http://www.ivoa.net/twiki/bin/view/IVOA/IvoaSemantics</span></span>.</p>
823
824 <p><span class='todo' >[To be expanded:] there are no mappings at the
825 moment. Also, the vocabularies are all in a single language, though
826 translations of the IAU93 thesaurus are available. See also
827 <a href='http://code.google.com/p/volute/issues/detail?id=8' >issue 8</a></span></p>
828
829 <div class='section' id='vocab-constellation'>
830 <p class='title'>A Constellation Name Vocabulary (normative)</p>
831
832 <p>This vocabulary is presented as a simple example of an astronomical vocabulary for a very particular purpose, e.g. handling constellation information like that commonly encountered in variable star research. For example, <q>SS Cygni</q> is a cataclysmic variable located in the constellation <q>Cygnus</q>. The name of the star uses the genitive form <q>Cygni</q>, but the alternate label <q>SS Cyg</q> uses the standard abbreviation <q>Cyg</q>. Given the constellation vocabulary, all of these forms are recorded together in a computer-manipulatable format. <span class='todo'>`Incorrect' forms should probably be represented in SKOS `hidden labels'</span></p>
833
834 <p>The &lt;skos:ConceptScheme&gt; contains a single &lt;skos:TopConcept&gt;, <q>constellation</q></p>
835 <br/><br/><center>
836 <table>
837 <tr><th bgcolor="#eecccc">XML Syntax</th>
838 <th width="10"/><th bgcolor="#cceecc">Turtle Syntax</th></tr>
839 <tr><td/></tr>
840 <tr>
841 <td bgcolor="#eecccc">
842 <pre>
843 &lt;skos:Concept rdf:about="#constellation"&gt;
844 &lt;skos:inScheme rdf:resource=""/&gt;
845 &lt;skos:prefLabel&gt;
846 constellation
847 &lt;/skos:prefLabel&gt;
848 &lt;skos:definition&gt;
849 IAU-sanctioned constellation names
850 &lt;/skos:definition&gt;
851 &lt;skos:narrower rdf:resource="#Andromeda"/&gt;
852 ...
853 &lt;skos:narrower rdf:resource="#Vulpecula"/&gt;
854 &lt;/skos:Concept&gt;
855 </pre>
856 </td>
857 <td/>
858 <td bgcolor="#cceecc">
859 <pre>
860 &lt;#constellation&gt; a :Concept;
861 :inScheme &lt;&gt;;
862 :prefLabel "constellation";
863 :definition "IAU-sanctioned constellation names";
864 :narrower &lt;#Andromeda&gt;;
865 ...
866 :narrower &lt;#Vulpecula&gt;.
867 </pre>
868 </td></tr>
869 </table></center>
870 <p>and the entry for <q>Cygnus</q> is</p>
871 <center><table><tr>
872 <td bgcolor="#eecccc">
873 <pre>
874 &lt;skos:Concept rdf:about="#Cygnus"&gt;
875 &lt;skos:inScheme rdf:resource=""/&gt;
876 &lt;skos:prefLabel&gt;Cygnus&lt;/skos:prefLabel&gt;
877 &lt;skos:definition&gt;Cygnus&lt;/skos:definition&gt;
878 &lt;skos:altLabel&gt;Cygni&lt;/skos:altLabel&gt;
879 &lt;skos:altLabel&gt;Cyg&lt;/skos:altLabel&gt;
880 &lt;skos:broader rdf:resource="#constellation"/&gt;
881 &lt;skos:scopeNote&gt;
882 Cygnus is nominative form; the alternative
883 labels are the genitive and short forms
884 &lt;/skos:scopeNote&gt;
885 &lt;/skos:Concept&gt;
886 </pre>
887 </td>
888 <td width="10"/>
889 <td bgcolor="#cceecc">
890 <pre>
891 &lt;#Cygnus&gt; a :Concept;
892 :inScheme &lt;&gt;;
893 :prefLabel "Cygnus";
894 :definition "Cygnus";
895 :altLabel "Cygni";
896 :altLabel "Cyg";
897 :broader &lt;#constellation&gt;;
898 :scopeNote """Cygnus is nominative form;
899 the alternative labels are the genitive and
900 short forms""" .
901 </pre>
902 </td>
903 </tr></table></center>
904
905 <p>Note that SKOS alone does not distinguish between
906 genitive forms and abbreviations, but the use of alternate labels
907 is more than adequate for processing by VO applications where
908 the difference between <q>SS Cygni</q>, <q>SS Cyg</q>, and the incorrect form
909 <q>SS Cygnus</q> is probably irrelevant.</p>
910 </div>
911
912 <div class='section' id='vocab-aa'>
913 <p class='title'>The Astronomy &amp; Astrophysics Keyword List (normative)</p>
914
915 <p>
916 This vocabulary is a set of keywords made available on a web page by
917 the publisher of the journal.
918 The intended usage of the vocabulary is to tag articles with
919 descriptive keywords to aid searching for articles on a particular
920 topic.
921 </p>
922
923 <p>
924 The keywords are organised into categories which have been modelled as
925 hierarchical relationships.
926 Additionally, some of the keywords are grouped into collections, a grouping which
927 has been mirrored in the SKOS version.
928 The vocabulary contains no definitions, alternative labels, scope
929 notes, or related links, as these are not provided in the original
930 keyword list.
931 </p>
932
933 </div>
934
935 <div class='section' id='vocab-aoim'>
936 <p class='title'>The AOIM Taxonomy (normative)</p>
937
938 <p>
939 This vocabulary is published by the IVOA to allow images to be tagged
940 with keywords that are relevant for the public.
941 It consists of a set of keywords organised into an enumerated
942 hierarchical structure.
943 Each term consists of a taxonomic number and a label.
944 There are no alternative labels, definitions, scope notes, or cross
945 references.
946 </p>
947
948 <p>When converting the AOIM into SKOS, it was decided to model the
949 taxonomic number as an alternative label.
950 Since some terms appear more than once, the token for a term consists of
951 the full hierarchical location of the term.
952 Thus, it is possible to distinguish between</p>
953 <pre>
954 Planet -> Feature -> Surface -> Canyon
955 </pre>
956 <p>and</p>
957 <pre>
958 Planet -> Satellite -> Feature -> Surface -> Canyon
959 </pre>
960 <p>which have the tokens <code>PlanetFeatureSurfaceCanyon</code> and
961 <code>PlanetSatelliteFeatureSurfaceCanyon</code> respectively.
962 </p>
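<p>A minimal sketch of this token construction, assuming the hierarchy is written with <code>-></code> separators as in the two examples above; the function name <code>path_to_token</code> is hypothetical and is not taken from the conversion actually used:</p>

```python
def path_to_token(path):
    """Join every level of a hierarchical path into a single token."""
    # Assumption: levels are separated by "->" and each level is
    # already a single word, as in the two Canyon examples.
    levels = [level.strip() for level in path.split("->")]
    return "".join(levels)


print(path_to_token("Planet -> Feature -> Surface -> Canyon"))
print(path_to_token("Planet -> Satellite -> Feature -> Surface -> Canyon"))
```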
963
964 </div>
965
966 <div class='section' id='vocab-ucd1'>
967 <p class='title'>The UCD1+ Vocabulary (non-normative)</p>
968
969 <p>The UCD standard is an officially sanctioned and managed vocabulary
970 of the IVOA. The normative document is a simple text file containing
971 entries consisting of tokens (e.g. <code>em.IR</code>), a short
972 description, and usage information (<q>syntax codes</q> which permit
973 UCD tokens to be concatenated). The form of the tokens implies a
974 natural hierarchy: <code>em.IR.8-15um</code> is obviously a narrower
975 term than <code>em.IR</code>, which in turn is narrower than
976 <code>em</code>.</p>
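<p>Because the hierarchy is implied by the dots in a token, the broader term can be derived mechanically. The following is a sketch only: the function name <code>ucd_broader</code> is hypothetical, and it assumes its argument is a single UCD word rather than a concatenation of several words.</p>

```python
def ucd_broader(word):
    """Return the implied broader UCD word, or None at the top level."""
    # Assumption: `word` is a single UCD word such as "em.IR.8-15um",
    # not several words concatenated with syntax codes.
    parent, dot, _ = word.rpartition(".")
    return parent if dot else None


print(ucd_broader("em.IR.8-15um"))
print(ucd_broader("em.IR"))
print(ucd_broader("em"))
```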
977
978 <p>Given the structure of the UCD1+ vocabulary, the natural
979 translation to SKOS consists of preferred labels equal to the original
980 tokens (the UCD1 words include dashes and periods), vocabulary tokens
981 created using guidelines in section <span class='xref'
982 >practices</span> (e.g., "emIR815Um" for
983 <code>em.IR.8-15um</code>), direct use of the definitions, and the syntax codes
984 placed in usage documentation: <code>&lt;skos:scopeNote&gt;UCD syntax code: P&lt;/skos:scopeNote&gt;</code>
985 <span class='todo'>NOTE: THIS IS THE FORMAT I USED IN MY VERSION - MAY NOT BE THE SAME AS NORMAN'S [FVH]</span></p>
986
987 <p>Note that the SKOS document containing the UCD1+ vocabulary does
988 NOT constitute the official version: the normative document is still
989 the text list. However, in the long term, the IVOA may decide to make
990 the SKOS version normative, since the SKOS version contains all of the
991 information contained in the original text document but has the
992 advantage of being in a standard format easily read and used by any
993 application on the semantic web.</p>
994
995 </div>
996
997 <div class='section' id='vocab-iau93'>
998 <p class='title'>The 1993 IAU Thesaurus (normative)</p>
999
1000 <p>The IAU Thesaurus consists of concepts with mostly capitalized
1001 labels and a rich set of thesaurus relationships (<q>BF</q> for
1002 <q>broader form</q>, <q>NF</q> for <q>narrower form</q>, and <q>RF</q> for
1003 <q>related form</q>). The thesaurus also contains <q>U</q> (for
1004 <q>use</q>) and <q>UF</q> (<q>use for</q>) relationships. In a SKOS
1005 model of a vocabulary these are captured as alternative labels. A
1006 separate document contains translations of the vocabulary terms in
1007 five languages: English, French, German, Italian, and
1008 Spanish. Enumerable concepts are plural (e.g. <q>SPIRAL
1009 GALAXIES</q>) and non-enumerable concepts are singular
1010 (e.g. <q>STABILITY</q>). Finally, there are some usage hints like
1011 <q>combine with other</q>.</p>
1012
1013 <p>In converting the IAU Thesaurus to SKOS, we have been as faithful
1014 as possible to the original format of the thesaurus. Thus, preferred
1015 labels have been kept in their uppercase format.</p>
1016
1017 <p>The IAU Thesaurus has been unmaintained since its initial production in
1018 1993; it is therefore significantly out of date in places. This
1019 vocabulary is published for the sake of completeness, and to make the
1020 link between the evolving vocabulary work and any uses of the 1993
1021 vocabulary which come to light. We do not expect to make any future
1022 maintenance changes to this vocabulary, and would expect the IVOAT
1023 vocabulary, based on this one, to be used instead (see section <span class='xref'>vocab-ivoat</span>).</p>
1024
1025 </div>
1026
1027 <div class='section' id='vocab-ivoat'>
1028 <p class='title'>Towards an IVOA Thesaurus</p>
1029
1030 <p>While it is true that the adoption of SKOS will make it easy to
1031 publish and access different astronomical vocabularies, the fact is
1032 that there is no vocabulary which makes it easy to jump-start the
1033 use of vocabularies in generic astrophysical VO applications: each of
1034 the previously developed vocabularies has its own limits and
1035 biases. For example, the IAU Thesaurus provides a large number of
1036 entries, copious relationships, and translations to four other languages,
1037 but there are no definitions, many concepts are now only useful for
1038 historical purposes (e.g. many photographic or historical instrument
1039 entries), some of the relationships are false or outdated, and many
1040 important or newer concepts and their common abbreviations are
1041 missing.</p>
1042
1043 <p>Despite its faults, the IAU Thesaurus constitutes a very extensive
1044 vocabulary which could easily serve as the base vocabulary once
1045 we have removed its most egregious faults and extended it to cover the
1046 most obvious semantic holes. To this end, a heavily revised IAU
1047 thesaurus is in preparation for use within the IVOA and other
1048 astronomical contexts. The goal is to provide a general vocabulary
1049 foundation to which other, more specialized, vocabularies can be added
1050 as needed, and to provide a good <q>lingua franca</q> for the creation of
1051 vocabulary mappings.</p>
1052 </div>
1053 </div> <!-- End: Example vocabularies -->
1054
1055
1056 <div class="appendices">
1057
1058 <div class="section-nonum" id="bibliography">
1059 <p class="title">Bibliography</p>
1060 <?bibliography rm-refs ?>
1061 </div>
1062
1063 <p style="text-align: right; font-size: x-small; color: #888;">
1064 $Revision$ $Date$
1065 </p>
1066
1067 </div>
1068
1069 </body>
1070 </html>
