<?xml version="1.0" encoding="utf-8"?>
<!-- Based on template at
http://www.ivoa.net/Documents/templates/ivoa-tmpl.html -->
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xml:lang="en" lang="en">

<head>
<title>Vocabularies in the Virtual Observatory</title>
<link rev="made" href="http://nxg.me.uk/norman/#norman" title="Norman Gray"/>
<meta name="author" content="Norman Gray"/>
<meta name="DC.subject" content="IVOA, Virtual Observatory, Vocabulary"/>
<meta name="rcsdate" content="$Date$"/>
<link href="http://www.ivoa.net/misc/ivoa_wd.css" rel="stylesheet" type="text/css"/>
<!-- style: make the ToC a little more compact, and without bullets -->
<style type="text/css">
div.toc ul { list-style: none; padding-left: 1em; }
span.userinput { font-weight: bold; }
span.url { font-family: monospace; }
q { color: #666; }
q:before { content: "“"; }
q:after { content: "”"; }
.todo { background: #ff7; }
</style>
</head>

<body>
<div class="head">
<table>
<tr><td><a href="http://www.ivoa.net/"><img alt="IVOA logo" src="http://ivoa.net/icons/ivoa_logo_small.jpg" border="0"/></a></td></tr>
</table>

<h1>Vocabularies in the Virtual Observatory, v@VERSION@</h1>
<h2>IVOA Working Draft, @RELEASEDATE@</h2>
<!-- $Revision$ $Date$ -->

<dl>
<dt>Working Group</dt>
<dd><em><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaSemantics">Semantics</a></em></dd>

<dt>This version</dt>
<dd>@BASEURI@</dd> <!-- XXX adjust current/latest URI from Makefile -->

<dt>Latest version</dt>
<dd>@BASEURI@</dd>

<dt>Editors</dt>
<dd>TBD</dd>

<dt>Authors</dt>
<dd>
<!-- The following are the folk that I'm aware have contributed text or code to this document: add others as appropriate -->
<span property="dc:creator">Alasdair J G Gray</span>,
<span property="dc:creator">Norman Gray</span>,
<span property="dc:creator">Frederick V Hessman</span> and
<span property="dc:creator">Andrea Preite Martinez</span>
</dd>
</dl>
<hr/>
</div>

<div class="section-nonum" id="abstract">
<p class="title">Abstract</p>

<div class="abstract">
<p>Use SKOS.</p>
</div>

</div>

<div class="section-nonum" id="status">
<p class="title">Status of this document</p>

<p>This is an IVOA Working Draft. The first release of this document was
<span property="dc:date">@RELEASEDATE@</span>.</p>

<p>This document is an IVOA Working Draft for review by IVOA members
and other interested parties. It is a draft document and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use IVOA Working Drafts as reference materials or to
cite them as other than <q>work in progress</q>.</p>

<p>A list of current IVOA Recommendations and other technical
documents can be found at
<a href="http://www.ivoa.net/Documents/"><code>http://www.ivoa.net/Documents/</code></a>.</p>

<h3>Acknowledgments</h3>

<p>None so far.</p>

</div>

<h2><a id="contents" name="contents">Table of Contents</a></h2>
<?toc?>

<hr/>

<div class="section" id="introduction">
<p class="title">Introduction</p>

<div class="section">
<p class="title">Vocabularies in astronomy</p>

<p>Astronomical information of relevance to the Virtual Observatory
(VO) is not confined to quantities easily expressed in a catalogue or
a table. Fairly simple things like position on the sky, brightness
in some units, times measured in some frame, redshifts, classifications
or other similar quantities are easily manipulated and stored in
VOTables and can now be identified using IVOA UCDs <span class="cite">std:ucd</span>. However, astrophysical concepts and
quantities consist of a wide variety of names, identifications,
classifications and associations, most of which cannot be described or
labelled via UCDs.</p>

<p>Although there has been some progress towards creating an ontology of
astronomical object types <span class="cite">std:ivoa-astro-onto</span> (an ontology is a systematic formal
description of a set of concepts and their relations with each other),
such a formal approach may not be necessary, and may be
counterproductive <span class="todo">[AG: Not sure counterproductive is the right argument here. Ontologies do not meet all of the navigation and retrieval use cases.]</span>. An ontology is necessary if we are to have a
computer (appear to) `understand' something of a domain, but in the
present case we are more concerned with the related but distinct
problem of letting human users find resources of interest, and so the
most appropriate technology derives from the Information Science
community: that of <em>controlled vocabularies, taxonomies and
thesauri</em>.</p>

<p>One of the best examples of the need for a simple vocabulary within
the VO is VOEvent <span class="cite">std:voevent</span>, the VO
standard for handling astronomical events. If someone broadcasts, or
`publishes', the occurrence of an event, the implication is that
someone else is going to want to respond to it. But no institution is
interested in all possible events, so some standardised information
about what the event `is about' is necessary, in a form which
ensures that the parties can communicate effectively. If a `burst' is
announced, is it a Gamma Ray Burst due to the collapse of a star in a
distant galaxy, a solar flare, or the brightening of a stellar or AGN
accretion disk? If a publisher doesn't use the label one might have
expected, how is one to guess what other equivalent labels might have
been used?</p>

<p>There have been a number of attempts to create astronomical
vocabularies (in the present note we will not need to distinguish
vocabularies, taxonomies and thesauri, and will use the term
`vocabulary' for all three cases).</p>
<ul>
<li>The <em>Second Reference Dictionary of the Nomenclature of
Celestial Objects</em> <span class="cite">lortet94</span>, <span class="cite">lortet94a</span> contains 500 paper pages of
astronomical nomenclature.</li>

<li>For decades professional journals have used a set of reasonably
compatible keywords to help classify the content of whole articles.
These keywords have been analysed by Preite Martinez &amp; Lesteven
<span class="cite">preitemartinez07</span>, who derived from them a set
of common keywords constituting one of the potential bases for a
fuller VO vocabulary. The same authors also attempted to derive a set
of common concepts by analyzing the contents of abstracts in journal
articles; the resulting list should contain more up-to-date
tokens/concepts than the old list of journal keywords. A similar but
less formal attempt was made by Hessman for the VOEvent working group,
resulting in a similar list <span class="todo">Find Hessman05
reference, and check differences from the A&amp;A list</span>.</li>

<li>Astronomical databases generally use simple sets of keywords –
sometimes hierarchically organized – to help users query
the databases. Two examples from totally different contexts are the
list of object types used in the <a href="http://simbad.u-strasbg.fr">Simbad</a> database and the search keywords used in the educational
Hands-On Universe image database portal.</li>

<li>The Astronomical Outreach Imagery (AOI) working group has created a simple
taxonomy to help classify images used for education or public
relations. See <span class='url'>http://ivoa.net/Documents/latest/AOIMetadata.html</span>.</li>

<li>The Hands-On Universe project (see <span class='url'
>http://sunra.lbl.gov/telescope2/index.html</span>) has maintained a
public database of images for use by the general public since the
1990s. The images are very heterogeneous, since they are gathered from
a variety of professional, semi-professional, amateur, and school
observatories, so a simple taxonomy is used to help users browse
the database.</li>

<li>Remote Telescope Markup Language <span
class="cite">std:rtml</span>, a document definition for the transfer
of observing requests that has been adopted by the Heterogeneous
Telescope Network (HTN) Consortium and is indirectly supported by the
VOEvent protocol, currently contains several telescope and
observation-related taxonomies of terms (e.g. for devices, filters,
objects).<span class='todo'>Confirm status: does this need to be
converted to SKOS? [AG]. Possibly: chase with Rick? [NG]</span></li>

<li>In 1993, Shobbrook and Shobbrook published an Astronomy Thesaurus
endorsed by the IAU (see <span class='url'
>http://www.aao.gov.au/lib/thesaurus.html</span> <span class='todo'
>What's the correct citation for this?</span>). This collection of
just short of 3000 terms, in four languages, is a valuable resource,
but has unfortunately been little used in recent years. Its very
size, which gives it expressive power, is also a disadvantage, to the
extent that it makes the thesaurus hard to use.</li>

</ul>
</div>

<div class="section">
<p class="title">Formalising and managing multiple vocabularies</p>

<p>We find ourselves in the situation where there are multiple
vocabularies in use, describing a broad range of resources of interest
to professional and amateur astronomers, and members of the public.
These different vocabularies use different terms and different
relationships to support the different constituencies they cater for.
For example, `delta Sct' and `RR Lyr' are terms one would hope to find
in a vocabulary aimed at professional astronomers, associated with the
notion of `variable star'; one would hope <em>not</em> to find such
technical terms in a vocabulary intended to support outreach
activities.</p>

<p>One approach to this problem is to create a single consensus
vocabulary, which draws terms from the various existing vocabularies
to create a new vocabulary which is able to express anything its users
might desire. The problem with this is that such an effort would be
very expensive, both in time and effort for those creating it, and
for its potential users, who would have to learn
to navigate around it, recognise the new terms, and be
supported in using the new terms correctly (or, more often,
incorrectly).</p>

<p>The alternative approach to the problem is to evade it, and this is
the approach taken in this Draft. Rather than deprecating the
existence of multiple overlapping vocabularies, we embrace it,
formalise all of them, and formally declare the relationships between
them. This means that:</p>
<ul>
<li>The various vocabularies can evolve separately, on their own
timescales, managed by the IVOA or by third parties;</li>
<li>Users can use the vocabulary most appropriate to their situation,
either when annotating resources, or when querying them;</li>
<li>We retain the investments made in vocabularies by users and
resource owners.</li>
</ul>
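
<p>As an illustration only, the following Python sketch shows how
declared relationships between two vocabularies might let a query
phrased in one vocabulary find resources annotated in another. The
mapping table, the relation names `exact' and `narrower', and the
resource names are all hypothetical: they are not taken from any of the
vocabularies discussed here, nor from the SKOS specification.</p>

```python
# Hypothetical mappings from an outreach vocabulary to a professional one.
# Each entry is (outreach term, relation, professional term); the relation
# names are illustrative assumptions, not SKOS property names.
MAPPINGS = [
    ("variable star", "exact", "variable star"),
    ("variable star", "narrower", "delta Sct"),
    ("variable star", "narrower", "RR Lyr"),
]

# Hypothetical resources, each annotated with one professional term.
ANNOTATIONS = {
    "lightcurve-1": "RR Lyr",
    "survey-2": "delta Sct",
    "atlas-3": "galaxy",
}


def translate(term, relations=("exact", "narrower")):
    """Return the professional terms that an outreach term maps to."""
    return sorted(
        target
        for source, relation, target in MAPPINGS
        if source == term and relation in relations
    )


def search(term):
    """Find resources whose professional annotation matches a mapped term."""
    targets = set(translate(term))
    return sorted(name for name, tag in ANNOTATIONS.items() if tag in targets)
```

<p>In this sketch an outreach user searching for `variable star' would
reach resources annotated as `delta Sct' or `RR Lyr' without needing to
know those terms, while each vocabulary remains free to evolve
separately.</p>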

<p>To this end we present in this Draft formalised versions of a
number of existing vocabularies, encoded as SKOS vocabularies <span class="cite">std:skoscore</span>.</p>

</div>

</div>


<div class="section">
<p class="title">Formalising the Vocabularies</p>

<p>After a number of online and face-to-face discussions, the authors
brokered a consensus within the IVOA community that the published formats of
formalised vocabularies should include at least SKOS (Simple Knowledge
Organization System), a W3C draft standard application of RDF to the
field of knowledge organisation <span
class="cite">std:skoscore</span>. SKOS draws on long experience
within the Library and Information Science community, to address a
well-defined set of problems to do with the indexing and retrieval of
information and resources; as such, it is a close match to the problem
this working group is addressing.</p>

<p>ISO 5964 <span class='cite' >std:iso5964</span> defines a number of
the relevant terms (ISO 5964:1985 = BS 6723:1985; see also <span
class='cite' >std:bs8723-1</span> and <span class='cite'
>std:z39.19</span>), and some of the (lightweight) theoretical
background. The only technical distinction relevant to this document
is that between `vocabulary' and `thesaurus': BS 8723-1 defines a
thesaurus as a</p>
<blockquote>
controlled vocabulary in which concepts are represented by preferred
terms, formally organized so that paradigmatic relationships between
the concepts are made explicit, and the preferred terms are
accompanied by lead-in entries for synonyms or quasi-synonyms. NOTE:
The purpose of a thesaurus is to guide both the indexer and the
searcher to select the same preferred term or combination of preferred
terms to represent a given subject. (BS 8723-1, sect. 2.39)
</blockquote>
<p>with a similar definition in ISO 5964 sect. 3.16. The paradigmatic
relationships in question are those relating a term to a `broader',
`narrower' or more generically `related' term, with `broader term'
defined operationally: a resource retrieved
by a given term will also be retrieved by that term's `broader term'.
This is not a subsumption relationship, as there is no implication
that the concept referred to by a narrower term is of the same
<em>type</em> as a broader term.</p>
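
<p>The operational definition of `broader term' can be sketched in a few
lines of Python. This is a minimal illustration under stated
assumptions, not part of any vocabulary in this Draft: the toy
thesaurus, the resource names, the placing of `star' above `variable
star', and the decision to follow the chain of broader terms
transitively are all hypothetical choices made for the example.</p>

```python
# Hypothetical toy thesaurus: each term maps to its broader term, or None.
BROADER = {
    "delta Sct": "variable star",
    "RR Lyr": "variable star",
    "variable star": "star",
    "star": None,
}

# Hypothetical resources, each indexed with a single term.
RESOURCES = {
    "catalogue-A": "delta Sct",
    "catalogue-B": "RR Lyr",
    "image-C": "star",
}


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    parent = BROADER.get(term)
    while parent is not None:
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain


def retrieve(query):
    """Return resources indexed with `query` or with any narrower term."""
    return sorted(
        name
        for name, term in RESOURCES.items()
        if term == query or query in ancestors(term)
    )
```

<p>Here a search on `variable star' returns the resources indexed under
`delta Sct' and `RR Lyr' as well, which is all that the
broader/narrower relationship promises; nothing in the table asserts
that a delta Sct object <em>is a</em> variable star in the ontological
sense.</p>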

<p>Thus a vocabulary (SKOS or otherwise) is not an ontology. It has
lighter and looser semantics than an ontology, and is specialised for
the restricted case of resource retrieval.</p>

<p><span class='todo' >What is to be the format of the `master' files?
SKOS or mildly-formatted plain text?</span></p>

<div class="section">
<p class="title">SKOS files (normative)</p>

<p>We provide a set of SKOS files representing the vocabularies which
have been developed.</p>

<p>To come: one SKOS file per vocabulary, defining the list of
concepts; at least one file per vocabulary, giving mappings to other
vocabularies; possibly translations. See Makefile in ../Vocabularies,
which produces a tarball, at present without mappings or translations.</p>

</div>
</div>

<div class="appendices">

<div class="section-nonum" id="bibliography">
<p class="title">Bibliography</p>
<?bibliography rm-refs ?>
</div>

<p style="text-align: right; font-size: x-small; color: #888;">
$Revision$ $Date$
</p>

</div>

</body>
</html>