Contents of /trunk/projects/registry/Identifiers/Identifiers.tex


Revision 3106 - (show annotations)
Mon Oct 12 12:13:34 2015 UTC (6 years ago) by msdemlei
File MIME type: application/x-tex
File size: 45619 byte(s)
Identifiers: Fixed markup bugs uncovered by HTML translation.


1 \documentclass{ivoa}
2 \input tthdefs
3
4 \SVN$Rev$
5 \SVN$Date$
6 \SVN$URL$
7
8 \newcommand{\abnfterm}[1]{%
9 \ensuremath{\,\hbox{\texttt{\char'042\relax #1\char'042}}\,}}
10 \newcommand{\abnfrepeat}[1]{\,*#1}
11 \newcommand{\abnfoptional}[1]{[#1]}
12 \newcommand{\abnfor}{\ensuremath{\,\,/\,\,}}
13 \newcommand{\abnfnt}[1]{\ensuremath{\langle\textit{#1\/}\rangle\,}}
14 \newcommand{\abnfto}{=}
15 \newcommand{\mytilde}{\char'176}
16
17 \iftth
18 \renewcommand{\abnfterm}[1]{%
19 \special{html:<tt>"}#1\special{html:"</tt>}}
20 \renewcommand{\abnfnt}[1]{\special{html:<i>&lt;}#1\special{html:&gt;</i>}}
21 \renewcommand{\mytilde}{\special{html:~}}
22 \fi
23
24 \hyphenation{Stan-dards-Reg-Ext}
25 \hyphenation{Obs-Core}
26
27 \title{IVOA Identifiers}
28
29 \ivoagroup{Resource Registry}
30
31 \author[http://www.ivoa.net/twiki/bin/view/IVOA/MarkusDemleitner]{Markus Demleitner}
32 \author[http://www.ivoa.net/twiki/bin/view/IVOA/RayPlante]{Raymond Plante}
33 \author[http://www.ivoa.net/twiki/bin/view/IVOA/TonyLinde]{Tony Linde}
34 \author[http://www.ivoa.net/twiki/bin/view/IVOA/RoyWilliams]{Roy Williams}
35 \author[http://www.ivoa.net/twiki/bin/view/IVOA/KeithNoddle]{Keith Noddle}
36 \author{and the IVOA Registry Working Group}
37
38 \editor{Markus Demleitner}
39
40 \previousversion[http://www.ivoa.net/Documents/REC/Identifiers/Identifiers-20070302.html]{PR-20060822}
41 \previousversion[http://www.ivoa.net/Documents/PR/Identifiers/Identifiers-20050302.html]{PR-20050302}
42 \previousversion[http://www.ivoa.net/Documents/PR/Identifiers/Identifiers-20040621.html]{PR-20040621}
43 \previousversion[http://www.ivoa.net/Documents/WD/Identifiers/Identifiers-20040209.html]{WD-20040209}
44 \previousversion[http://www.ivoa.net/Documents/PR/Identifiers/Identifiers-20031031.html]{WD-20031031}
45 \previousversion[http://www.ivoa.net/Documents/WD/Identifiers/Identifiers-20030930.html]{WD-20030930}
46 \previousversion[http://www.ivoa.net/Documents/WD/Identifiers/Identifiers-20030830.html]{WD-20030830}
47
48 \begin{document}
49 \begin{abstract}
50 An IVOA Identifier is a globally unique name for a resource
51 within the Virtual Observatory. This
52 name can be used to retrieve a unique description of the resource
53 from an IVOA-compliant registry or to identify an entity like a dataset
54 or a protocol without dereferencing the identifier.
55 This document describes the syntax
56 for IVOA Identifiers as well as how they are created.
57 The syntax has been defined to encourage global uniqueness naturally
58 and to maximize the freedom of resource providers to control the
59 character content of an identifier.
60 \end{abstract}
61
62
63 \section*{Acknowledgments}
64
65 This document builds on the concept of a Uniform Resource Identifier
66 as described in RFC 3986 \citep{std:RFC3986} and its predecessors.
67
68 This document has been developed with support from the
69 \href{http://www.nsf.gov}{National Science Foundation}'s
70 Information Technology Research Program under Cooperative Agreement
71 AST0122449 with The Johns Hopkins University, from the
72 \href{http://www.pparc.ac.uk}{UK Particle Physics and Astronomy
73 Research Council (PPARC)}, from the
74 \href{http://fp6.cordis.lu/fp6/home.cfm}{European Commission's Sixth
75 Framework Program} via the \href{http://www.astro-opticon.org/} {Optical
76 Infrared Coordination Network (OPTICON)}, and from the German
77 Astrophysical Virtual Observatory GAVO, BMBF grant 05A14VHA.
78
79
80 \section*{Conformance-related definitions}
81
82 The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
83 ``OPTIONAL'' (in upper or lower case) used in this document are to be
84 interpreted as described in RFC 2119 \citep{std:RFC2119}.
85
86 \section*{Usage of ABNF}
87
88 This specification uses ABNF \citep{std:RFC2234} to specify grammar
89 rules. The rules from RFC 3986 are assumed throughout. Where both this
90 specification and RFC 3986 define a nonterminal, the rule in this
91 specification overrides the corresponding rule from RFC 3986.
92
93 For explicitness, we write ABNF nonterminals in angle brackets
94 (\abnfnt{like this}) throughout.
95
96 \section{Introduction}
97
98 Virtual Observatory applications frequently need to
99 unambiguously refer to some resource or concept
100 which is described elsewhere. It is therefore necessary to
101 define global, potentially
102 dereferenceable identifiers. In the VO, these are called
103 IVOA identifiers (IVOIDs).
104 An unambiguous reference within the entire Virtual Observatory
105 requires that the identifier is globally unique. Ensuring
106 this uniqueness inevitably requires oversight by a moderating
107 authority; however, a flexible framework can minimize the opportunity
108 for duplicated identifiers.
109
110 Many data providers in the VO were creating
111 and using identifiers long before this specification was developed.
112 Their choices of identifiers were made
113 presumably to best fit the needs of the data.
114 In order to minimize the cost of adoption of the IVOA identifier framework
115 the design specified here maximizes the
116 control providers have, thus allowing the reuse of the identifiers
117 data providers already have in place
118 as well as the creation of
119 new identifiers that are consistent with their overall
120 organization.
121
122 Identifiers are crucial to the operation of registries that aid users in
123 discovering data and services \citep{std:RI1}. In general, a registry stores
124 descriptions of data and services in a searchable form, and it
125 distinguishes them by the unique identifier defined here. The identifier
126 serves as the primary key for the VO Registry and thus allows
127 dereferencing identifiers to metadata about a resource (the
128 resource record).
129
130 IVOA identifiers with query or fragment parts can furthermore
131 reference essentially arbitrary
132 entities like datasets or protocols, based on this primary mechanism of
133 dereferencing.
134
135 We recognize that resources do
136 not always remain in the control of a single organization forever.
137 This
138 necessitates a form of referencing that is
139 location-independent -- or more precisely, organization-independent.
140 Apart from enabling seamless transfers of data curation, an
141 attractive use case for such identifiers is
142 when several copies of a dataset exist at several locations around
143 the VO and one could refer to all of them collectively, deferring the choice
144 of a particular instance until it is actually needed.
145 Such references thus serve as
146 \emph{persistent} pointers to data that can be flexibly resolved.
147 This is very important to journal publishers
148 that wish to refer to data in publications (whose useful life might be
149 measured in decades) without worry that the references will become
150 obsolete.
151
152 This specification, in contrast, defines
153 \emph{organization-dependent identifiers}.
154 Persistent, organization- and location-independent identifiers are
155 \emph{not} (directly) defined here.
156
157 Referencing resources is
158 addressed by the IETF standard for URIs, RFC 3986 \citep{std:RFC3986}.
159 Thus, the framework proposed
160 in this document builds directly on this standard. Essentially, this
161 standard sets the parameters left open for application use
162 by RFC 3986.
163
164 \subsection{Definitions}
165
166 A \emph{Uniform Resource Identifier}
167 (URI) is defined by RFC 3986 as ``a compact sequence of
168 characters that identifies an abstract or physical resource'' which
169 complies with the syntax specification of that document
170 \citep{std:RFC3986}. It can point to an
171 actual retrievable resource, but there is no requirement for it to be
172 dereferenceable at all, let alone by a stock web browser.
173
174 An \emph{IVOA identifier}, or IVOID, is a
175 special sort of URI complying with all parts of this specification.
176 Historically, these have also been known as \emph{IVOA Resource Names}
177 (IVORNs), in parallel to
178 the \emph{Uniform
179 Resource Names} (URNs) that formulated extra requirements on persistency
180 and location-independence. As plain IVOIDs do not fulfill those
181 requirements and the term URN has been deprecated by
182 RFC 3986, we now deprecate the term IVORN, too.
183
184 A full IVOID can thus be split into a \emph{Registry part} (scheme,
185 authority, and path) and a possibly empty
186 \emph{local part} consisting of query and fragment component, again
187 using RFC 3986 nomenclature. An IVOID with an empty local part is also
188 known as a \emph{Registry reference}.
189
190 In VO practice, the term \emph{resource} is somewhat ambiguous.
191 The IVOA Recommendation on
192 Resource Metadata \citep{std:RM}, from here on referred to as RM,
193 defines it as ``a VO element that can be described in terms of who
194 curates or maintains it and which can be given a name and a unique
195 identifier.'' It then goes on to define the relevant pieces of
196 metadata, which later provided the foundations of the data model behind
197 the IVOA Registry.
198
199 This might lead to the expectation that there is a 1:1 relationship
200 between Registry records, ``VO resources'', and IVOA identifiers, and
201 version 1 of this document essentially implied as much. In this
202 version, we only require Registry references to resolve in the
203 Registry.
204
205 IVOIDs having a nonempty local part do not dereference to
206 Registry records. Since we want to
207 maintain the notion that a resource is whatever a URI points to,
208 ``resource'' as used here does \emph{not} correspond to the usage of the term in
209 VOResource \citep{std:VOR}. To maintain the distinction, we call
210 resources in the sense of VOResource Registry records,
211 which now form a subset of the resources that
212 can be referenced by IVOIDs.
213
214 We refer to organizations and providers in the sense that they
215 are defined in RM:
216
217 \begin{quotation}
218 An \emph{organization} is a specific
219 type of resource that brings people together to pursue
220 participation in VO applications. Organizations can be hierarchical
221 and range greatly in size and scope. At a high level, it could be a
222 university, observatory, or government agency. At a finer level, it
223 could be a specific scientific project, space mission, or individual
224 researcher. A \emph{provider} is an
225 \emph{organization} that makes data and/or services
226 available to users over the network.
227 \end{quotation}
228
229 Definitions of other types of resources, including data collection
230 and service, are also provided in RM, and
231 are assumed by this document.
232
233
234 \subsection{Selected Requirements}
235
236 This proposal is the result of various requirement studies for VO
237 identifiers and registries in general (e.g.
238 NVO ID
239 requirements\footnote{
240 \url{http://web.archive.org/web/20070226120639/http://nvo.ncsa.uiuc.edu/~rplante/VO/metadata/oidreq2.txt}}).
241 This section highlights a few of the important
242 ones that guided the design of the ID framework.
243
244 \begin{enumerate}
245 \item A single framework should be used to identify anything a VO
246 application can refer to, including organizations, projects
247 (mission/telescope), data collections, and services.
248
249 \item It should be easy to compare two instances of an identifier to
250 determine if they refer to the same object.
251
252 \item It should be possible to use an identifier to access a unique
253 description of the resource it identifies.
254
255 \item The framework should maximize the freedom of data providers to
256 choose identifiers for resources and collections under their
257 control.
258 \end{enumerate}
259
260 \subsection{Rationale for Version 2}
261
262 A need for revising the IVOA Identifiers specification has been apparent
263 ever since \citet{note:uriforms} pointed out that common practices
264 regarding dataset identifiers were not in line with URI semantics.
265 Also, with the publication of StandardsRegExt
266 \citep{std:STDREGEXT}, it became advisable to regulate the ways
267 standards are referenced in the VO in ways compatible with the spirit of
268 that standard.
269
270 As the Registry Working Group set about revising the Identifiers
271 recommendation, it was decided to drop the XML representation for
272 IVOIDs since it complicated the text but had never actually been found
273 useful. Even in XML serializations only the URI form of IVOA
274 identifiers had been used. Dropping the XML form nevertheless
275 constitutes an incompatible change, which necessitates an increase in the
276 major version number.
277
278 Despite the new major version, consensus was that current usage of
279 IVOIDs should not be impacted and existing practices sanctioned as far
280 as possible. Apart from deprecating the use of fragment identifiers to
281 distinguish datasets and restricting authorities to only use
282 \abnfnt{unreserved} characters (which does not impact existing authority
283 identifiers), this specification therefore refrains from modifying
284 version 1 regulations even where they were found somewhat burdensome
285 (e.g., as regards case insensitivity in resource keys).
286
287 The opportunity of a revision was also used to organize the
288 specification content in parallel to RFC 3986; for instance, the notion
289 of stop characters from version 1 -- necessitated by the non-URI XML
290 representation -- has no counterpart in non-IVOID URIs and is now
291 encompassed naturally by the usual rules for parsing URIs.
292
293 Closely following RFC 3986 also allows rigorous definitions for the
294 interpretation of local parts. In addition to what version 1
295 specified, we now allow
296 percent-encoded characters there, and we comment on techniques to
297 resolve IVOIDs with such local parts. This is finally used to define
298 the standard and the dataset identifiers that started the revision
299 process.
300
301 \subsection{IVOA Identifiers within the VO Architecture}
302
303 \begin{figure}[ht]
304 \centering
305 \includegraphics[width=0.9\textwidth]{archdiag.png}
306 \caption{Architecture diagram for the IVOA Resource Identifier
307 specification}
308 \label{fig:archdiag}
309 \end{figure}
310
311 Fig.~\ref{fig:archdiag} shows the role this document plays within the
312 IVOA architecture \citep{note:VOARCH}. As identifiers are the primary
313 keys into the Registry, essentially all standards regulating the
314 Registry depend on this specification. The data access protocols are
315 mainly impacted through the use of dataset identifiers (e.g., in SSAP
316 \citep{std:SSAP} and Obscore \citep{std:OBSCORE}), which are also
317 IVOIDs. For the same reason, VOEvent is impacted.
318
319 The core of this standard has no dependencies on other VO standards.
320 The section on identifiers for standards depends on
321 StandardsRegExt \citep{std:STDREGEXT}.
322
323 \section{Specification}
324
325 After a brief, informal specification that should be enough for
326 non-demanding applications, this section gives, for each relevant part
327 of RFC 3986, additional requirements for IVOA identifiers. The
328 normative content should be read together with RFC 3986.
329
330 \subsection{Overview (non-normative)}
331
332 IVOA identifiers (IVOIDs for short) are RFC 3986-compliant URIs with a
333 scheme of \texttt{ivo}. Thus, their generic form is
334
335 $$\underbrace{\texttt{ivo:}
336 \texttt{//}
337 \abnfnt{authority}
338 \abnfnt{path}}_{\mbox{Registry part}}
339 \underbrace{\texttt{?}\abnfnt{query}
340 \texttt{\#}\abnfnt{fragment}}_{\mbox{local part}},
341 $$
342 where \abnfnt{path} is either empty or starts with a slash.
343
344 IVOIDs consisting of only scheme and authority are known as authority
345 identifiers and play a special role in creating other IVOIDs (see
346 sec.~\ref{sect:creating}). IVOIDs without a local part
347 must resolve to a Registry record within the IVOA Registry.
348 Likewise, for all IVOIDs, the IVOID resulting from stripping the local
349 part (the Registry part) must resolve within the IVOA Registry. It is
350 called a \emph{Registry reference}.
351
352 The RFC 3986 \abnfnt{path} element is called \emph{resource
353 key} in IVOIDs.
354
355 Authority ids must consist of letters, numbers, dashes and dots
356 exclusively. Resource keys must not contain URI reserved characters
357 (essentially, anything except alphanumeric characters, dashes, dots,
358 underscores, and tildes is forbidden) except where an IVOA standard
359 defines how they are to be treated.
360
361 The Registry references are,
362 as a whole, compared case-insensitively, and must be treated
363 case-insensitively throughout to maintain backwards compatibility with
364 version 1 of this specification. When comparing full IVOIDs, the local
365 part must be split off and compared preserving case, while the registry
366 part must be compared case-insensitively.
367
368 To make IVOIDs useful where these complex rules are hard to implement
369 (e.g., database columns), handling applications SHOULD NOT change the
370 case of any part of IVOIDs when these might have a local part.
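
The comparison rule above can be sketched in Python as follows. This is
an illustration only, not part of the specification; the function names
\texttt{split\_ivoid} and \texttt{ivoids\_equal} are hypothetical, and the
sketch assumes that its inputs are already syntactically valid IVOIDs.

```python
def split_ivoid(ivoid):
    """Split an IVOID into (registry_part, local_part).

    The local part starts at the first "?" or "#", whichever comes
    first, and includes that delimiter; it is empty if neither occurs.
    """
    cut = len(ivoid)
    for delimiter in "?#":
        pos = ivoid.find(delimiter)
        if pos != -1:
            cut = min(cut, pos)
    return ivoid[:cut], ivoid[cut:]


def ivoids_equal(a, b):
    """Compare two IVOIDs: Registry part case-insensitively,
    local part preserving case."""
    reg_a, local_a = split_ivoid(a)
    reg_b, local_b = split_ivoid(b)
    return reg_a.lower() == reg_b.lower() and local_a == local_b
```

Lowercasing is safe for the Registry part because it can only contain
ASCII characters; the local part is deliberately left untouched.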
371
372 Examples for IVOIDs:
373
374 \begin{itemize}
375 \item \nolinkurl{ivo://ivoa.net} -- an IVOID without a resource key,
376 i.e., an authority; dereferencing in the Registry must yield a
377 \xmlel{vr:Authority}-typed record.
378
379 \item \nolinkurl{ivo://ivoa.net/std/Identifiers} -- an IVOID with a
380 resource key. Dereferencing this in the Registry must yield a resource
381 record. As long as there is no local part, an IVOID only differing
382 in case, e.g.,
383 \nolinkurl{ivo://IVOA.NET/std/identifiers}, is in every respect equivalent to
384 it.
385
386 \item \nolinkurl{ivo://example.org/~?path/to/\%C3\%89CLAIRE} -- an IVOID
387 without guarantees as to whether it resolves or what it resolves to. The
388 Registry reference \nolinkurl{ivo://example.org/~} must resolve to a valid Registry
389 record, though.
390
391 \item \nolinkurl{ivo://example.org/svc?voc.xml#Term} -- an IVOID
392 conceptually referencing some item within
393 \nolinkurl{ivo://example.org/svc?voc.xml}. If that latter IVOID can be
394 dereferenced, there should be an entity within the resource retrieved
395 that is itself identified by \texttt{Term}. The classic example would
396 be an element with an \xmlel{id} of \texttt{Term} within an XML
397 document.
398 \end{itemize}
399
400 The remainder of this section contains a formalization of these points.
401
402 \subsection{Characters}
403
404 \label{sect:chars}
405
406 This specification poses no additional global constraints on the
407 character content of IVOIDs over what Section~2 of RFC 3986 specifies.
408 Special restrictions on the authority part and the resource key are
409 given below. In particular, the \abnfnt{gen-delims} have, where
410 applicable, the standard URI interpretation. As IVOIDs have no use for
411 IPv6 addresses or user components, square brackets and the commercial at
412 sign MUST NOT occur literally in IVOIDs anywhere.
413
414 The \abnfnt{sub-delims} MUST NOT be part of the resource key unless
415 another IVOA specification defines their use. Their use in local parts
416 is not restricted by this specification, nor is any semantics defined
417 for them. Other IVOA specifications may furnish them with semantics.
418
419 In IVOIDs, characters from \abnfnt{unreserved} MUST NOT be
420 percent-encoded.
421
422 Percent-encoded characters are allowed in local parts (but neither in
423 authority nor the resource key). When
424 specifications or applications require text to be percent-encoded within
425 an IVOID, the text MUST be encoded in UTF-8.
426
427
428 \subsection{Syntax Components}
429
430 \subsubsection{Scheme}
431
432 The \abnfnt{scheme} part of IVOIDs is \texttt{ivo}. Note that, by RFC
433 3986, scheme identifiers are case-insensitive.
434
435 A URI that uses this scheme (an IVOID) signals that:
436
437 \begin{itemize}
438
439 \item the registry part of the IVOID
440 and the resource it refers to have been
441 registered in the VO Registry
442 \item the URI complies with the additional restrictions laid down in
443 this document
444 \end{itemize}
445
446 The ivo scheme does not imply a transport protocol by which the resource
447 may be accessed. Agents, in general, should not depend on implicit
448 mappings between IVOIDs and URIs in other schemes like \texttt{http}
449 when dereferencing them. The only defined way to dereference IVOIDs is
450 described in sect.~\ref{sect:dereferencing}. Resource publishers,
451 however, may support additional mappings between identifiers and other
452 URIs (such as http URLs) that they manage; in this case, agents should
453 only assume the mapping applies within the domain of the publisher.
454
455
456
457 \subsubsection{Authority}
458
459 \begin{admonition}{Note}
460 While the syntax for the authority identifiers
461 allows it to look just like a DNS hostname, current convention
462 discourages this practice to avoid the suggestion that an IVOA
463 Identifier can be resolved like a common http URL.
464 As of this writing, the
465 convention of the US Virtual Astronomical Observatory (VAO)
466 is hierarchical naming that
467 combines the publishing organization name with the project or
468 archive (e.g. ``adil.ncsa'') while leaving out fields like
469 ``.edu''
470 or ``.org''. In the AstroGrid
471 project, the convention is to use a DNS name in reverse order
472 (e.g. ``org.astrogrid.www''); this practice has the advantage of
473 reducing the probability that two organizations will want to
474 use the same authority identifier.
475 \end{admonition}
476
477
478 A \emph{naming authority} is an
479 organization (usually a data
480 provider) that has been granted the right by
481 the IVOA to create IVOA-compliant identifiers for resources it
482 registers. See sect.~\ref{sect:creating} for
483 details on how this right is granted. The naming authority creates
484 IVOIDs with empty local parts within the scope of one or more
485 authority identifiers.
486
487 The \emph{authority} component of an IVOID is severely restricted over
488 RFC 3986 as follows:
489
490 \begin{itemize}
491 \item it MUST be at least three characters long
492 \item it MUST begin with an alpha-numeric character
493 \item it MUST NOT contain percent-encoded characters
494 \item it MUST NOT contain characters outside of \abnfnt{unreserved},
495 with the tilde strongly discouraged
496 \item there are no \abnfnt{userinfo} or \abnfnt{port} components
497 \end{itemize}
498
499
500 In ABNF, using the symbols from RFC 3986, an authority identifier
501 in IVOIDs thus has the form:
502
503 \begin{eqnarray*}
504 \abnfnt{authority} &\abnfto& \abnfnt{alphanum} \abnfnt{unreserved}
505 \abnfnt{unreserved} \abnfrepeat{\abnfnt{unreserved}}
506 \end{eqnarray*}
507
508 A naming authority is allowed to control multiple
509 authority identifiers to organize related resources into different
510 namespaces. For example, an organization may
511 choose to control two authority identifiers, one for research-related
512 resources and one for education/outreach resources, even though they
513 are all maintained by the same organization and perhaps made available
514 through the same machine.
515
516
517 \paragraph{Examples for valid authorities}
518
519 \begin{compactenum}[(1)]
520 \item \texttt{nasa.heasarc}
521 \item \texttt{n\_1a.alph-0.02}
522 \item \texttt{123} (authorities can start with a number)
523 \end{compactenum}
524
525 \paragraph{Examples for invalid authorities}
526
527 \begin{compactenum}[(1)]
528 \item \texttt{a2} (less than three characters)
529 \item \texttt{\_temporary.id} (authorities must begin with an alphanumeric
530 character, which the underscore is not)
531 \item \texttt{DAT\%41} (percent-encoded characters are not allowed, even if they
532 work out to be unreserved characters)
533 \item \texttt{de!uni-hd!physics\#ari} (not entirely consisting of unreserved
534 characters)
535 \end{compactenum}
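
As a non-normative illustration, the authority rule above can be
expressed as a regular expression. The following Python sketch (the name
\texttt{is\_valid\_authority} is hypothetical) assumes that
\abnfnt{alphanum} means an ASCII letter or digit and uses the
\abnfnt{unreserved} set of RFC 3986.

```python
import re

# <alphanum> followed by at least two <unreserved> characters.
# RFC 3986 unreserved: ASCII letters, digits, "-", ".", "_", "~".
AUTHORITY_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._~-]{2,}")


def is_valid_authority(authority):
    """Return True if authority matches the IVOID authority grammar."""
    return AUTHORITY_RE.fullmatch(authority) is not None
```

The sketch accepts the tilde because the grammar allows it, although its
use in authorities is strongly discouraged.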
536
537
538 \subsubsection{Resource Key}
539
540 \label{sect:reskey}
541
542 RFC 3986's \abnfnt{path} part of an IVOID is called a \emph{resource key}.
543 It is a
544 name for a resource that is unique within the namespace of an
545 authority identifier. The naming authority creates keys for its namespaces
546 and has complete control of their forms beyond the syntax constraints
547 specified here.
548
549 On top of the definitions in RFC 3986 for paths, section 3.3, resource keys in
550 IVOIDs are further constrained in that
551
552 \begin{itemize}
553 \item \abnfnt{segment} MUST NOT contain percent-encoded characters
554 \item \abnfnt{segment} MUST NOT contain colons or commercial at signs
555 \item Only \abnfnt{path-abempty} expansions are allowed
556 \end{itemize}
557
558 In ABNF, using or overriding the symbols of RFC 3986, this means:
559
560 \begin{eqnarray*}
561 \abnfnt{path} &\abnfto &\abnfnt{path-abempty}\\
562 \abnfnt{segment} &\abnfto & \abnfrepeat{\abnfnt{ivo-segment-char}}\\
563 \abnfnt{ivo-segment-char}& \abnfto& \abnfnt{unreserved} \abnfor
564 \abnfnt{sub-delims}
565 \end{eqnarray*}
566
567 Naming authorities MUST NOT create path
568 segments matching either ``.'' or ``..''; empty
569 segments, resulting in two or more consecutive slashes or a trailing
570 slash, are also forbidden. In particular, as
571 described in sect.~\ref{sect:comparing},
572 such segments would not have the
573 special meaning they have in traditional file system pathnames; that
574 is, a resource key cannot be transformed by removing any kinds of
575 segments and still reference the same resource.
576
577 Note that, as discussed in sect.~\ref{sect:chars}, characters from
578 \abnfnt{sub-delims} MUST NOT be used in resource keys unless their
579 semantics is defined in an IVOA specification. As percent-encoded
580 characters are not allowed in resource keys, these characters MUST NOT
581 occur in generic Registry references at all.
582
583 The naming authority is free to create a
584 resource key that suggests something about the resource it refers to.
585 Any meaning that is suggested by the resource key is intended only for
586 human consumption. The character content of a resource key is not
587 semantically machine-interpretable within the context of the IVOA as
588 defined by this document.
589
590 The presence of a resource key is optional. An identifier that
591 contains only an authority identifier refers to the authority
592 itself and MUST resolve to a \xmlel{vr:Authority}-typed resource record
593 \citep{std:VOR} in the IVOA Registry.
594
595 VO applications MUST be case-insensitive when processing
596 resource keys. In presentation,
597 the preferred use of case is set by the rendering of the key by the
598 naming authority when the IVOID is registered. This may contain
599 capital letters to improve readability.
600
601 \paragraph{Examples for valid resource keys}
602
603 \begin{compactenum}[(1)]
604 \item \texttt{""} (i.e., the empty string; zero repetitions of (\abnfterm/ \abnfnt{segment}) are
605 legal)
606 \item \texttt{/reskey}
607 \item \texttt{/\mytilde{}user/STScI\_1/1a-7z.u} (unreserved characters are
608 allowed, and arbitrarily many segments are allowed)
609 \end{compactenum}
610
611 \paragraph{Examples of invalid resource keys}
612
613 \begin{compactenum}[(1)]
614 \item \texttt{/} (empty \abnfnt{segment}s are forbidden)
615 \item \texttt{reskey} (nonempty resource keys must always start with a
616 slash)
617 \item \texttt{/data/} (empty \abnfnt{segment}s are forbidden)
618 \item \texttt{/data//other} (empty \abnfnt{segment}s are forbidden)
619 \item \texttt{/data/c/../d} (\abnfnt{segment}s that indicate tree traversal in
620 other URI schemes are forbidden)
621 \item \texttt{/data!g-vo.org} (although this might become legal when some
622 IVOA standard gives the bang -- which is from \abnfnt{sub-delims} -- an
623 extra meaning)
624 \item \texttt{/user/M\%fcller} (percent-encoding is forbidden in resource
625 keys; even if it were allowed, the code point 0xfc is not in
626 \abnfnt{ivo-segment-char}; and even if it were, the sequence would still not be
627 valid UTF-8)
628 \end{compactenum}
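
The constraints on resource keys can be checked mechanically. The
following Python sketch is illustrative only; the function name
\texttt{is\_valid\_resource\_key} and the flag
\texttt{allow\_sub\_delims} are hypothetical. By default it rejects
characters from \abnfnt{sub-delims}, on the assumption that no IVOA
specification defining their use applies to the key being checked.

```python
import re

# Hyphens are placed last in each class so they are taken literally.
UNRESERVED_CLASS = r"[A-Za-z0-9._~-]"
WITH_SUB_DELIMS_CLASS = r"[A-Za-z0-9._~!$&'()*+,;=-]"


def is_valid_resource_key(key, allow_sub_delims=False):
    """Check a resource key: zero or more "/segment" groups, no empty
    segments, no "." or ".." segments, no percent-encoding."""
    char_class = WITH_SUB_DELIMS_CLASS if allow_sub_delims else UNRESERVED_CLASS
    # Each segment must be nonempty, which also rules out a trailing
    # slash and consecutive slashes.
    if re.fullmatch("(?:/%s+)*" % char_class, key) is None:
        return False
    # Dot segments are syntactically plain unreserved characters, so
    # they need a separate check.
    return not any(segment in (".", "..") for segment in key.split("/"))
```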
629
630
631 \subsubsection{Query}
632 \label{sect:querypart}
633
634 This specification does not pose constraints on \abnfnt{query} beyond
635 the definitions in RFC 3986. It also does not define any semantics.
636
637 Creators of IVOIDs are encouraged to adhere to URI semantics, i.e.,
638 IVOIDs with different query parts should refer to different resources.
639
640 To allow some resilience towards clients erroneously case folding the
641 query part, operators SHOULD NOT define IVOIDs referring to different
642 resources differing only by case in the query part.
643
644 Still, operators are not required to perform case folding on query
645 parts. Therefore, applications MUST NOT change the case of characters
646 in query parts.
647
648 \paragraph{Examples for valid query parts}
649
650 \begin{compactenum}[(1)]
651 \item \texttt{par1=val1\&par2=val2} (the classic use for query parts in
652 HTTP URLs as, e.g., generated by browser forms)
653 \item \texttt{//..//!:??} (but sub-delims, slashes and question marks
654 are allowed here, as are strings looking like forbidden segments in
655 resource keys)
656 \item \texttt{\%C2\%B5\%20Her} (percent-encoding special characters is
657 legal, but outside of ASCII one has to use UTF-8; this example works out
658 to be ``$\mu$ Her'')
659 \item \texttt{\%23\%5B\%5D} (while the generic delimiters \#, [,
660 and ] are not allowed in query parts literally, they can be included
661 in percent-encoded forms)
662 \end{compactenum}
663
664 \paragraph{Examples for invalid query parts}
665
666 \begin{compactenum}[(1)]
667 \item \texttt{:\#[] bad} (most generic delimiters are not allowed
668 literally in query parts, nor is the blank)
669 \item \texttt{\%B5\%20Her} (sequences of percent-encoded characters must
670 be valid utf-8 after decoding)
671 \end{compactenum}
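
A validity check for query parts along the lines of these examples might
look as follows in Python. This is a sketch under stated assumptions,
not a reference implementation: the name \texttt{is\_valid\_query} is
hypothetical, the character set is the \abnfnt{query} production of RFC
3986 minus the commercial at sign (which this specification forbids
literally), and the rule that \abnfnt{unreserved} characters must not be
percent-encoded is not checked.

```python
import re
from urllib.parse import unquote_to_bytes

# unreserved / sub-delims / ":" / "/" / "?" or a percent-encoded octet.
QUERY_RE = re.compile(r"(?:[A-Za-z0-9._~!$&'()*+,;=:/?-]|%[0-9A-Fa-f]{2})*")


def is_valid_query(query):
    """Check the characters of a query part and require that
    percent-encoded sequences decode to valid UTF-8."""
    if QUERY_RE.fullmatch(query) is None:
        return False
    try:
        unquote_to_bytes(query).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```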
672
673
674 \subsubsection{Fragment}
675
676 This specification does not pose constraints on \abnfnt{fragment}
677 beyond the definitions in RFC 3986.
678
679 Creators of IVOIDs are encouraged to adhere to URI semantics, i.e.,
680 fragment identifiers should be used to distinguish between different
681 entities within the same parent resource as discussed in
682 \citet{note:uriforms}. The details of this process depend on the type
683 of document being retrieved. See sects.~\ref{sect:dereferencing} and
684 \ref{sect:standards} for details.
685
686 Applications MUST NOT change the case of characters in fragments.
687
688 For examples for valid and invalid fragments, see the examples for query
689 parts in sect.~\ref{sect:querypart}.
690
691 \subsection{Usage}
692
693 IVOIDs are used to identify resources in the general sense, i.e., they
694 might refer to datasets, abstract concepts, etc.; their Registry
695 parts, on the other
696 hand, MUST always be dereferenceable, i.e., resolve in the VO Registry.
697
698 No hierarchy is implied in any of the components. Therefore, there are
699 no relative URIs for IVOA Identifiers. In effect, this specification
700 overrides the rule in section~4.1 of RFC 3986 to become
701
702 $$
703 \abnfnt{URI-reference} \abnfto \abnfnt{URI}.
704 $$
705
706 \subsection{Reference Resolution}
707 \label{sect:dereferencing}
708
709 Registry references
710 can always be resolved to a Registry record by querying a
711 searchable registry, for instance, using RegTAP \citep{std:RegTAP}.
712 Clients will usually have some Registry endpoint URLs built in; more
713 are discoverable as described in \citet{std:RI1}. In a full registry
714 with an OAI-PMH interface, the OAI-PMH \emph{GetRecord} operation
715 provides another means for obtaining the Registry record referenced by
716 an IVOID.
717
718 If an IVOID's Registry part does not resolve in the Registry,
719 clients SHOULD assume it
720 is obsolete and that any IVOID built with it does not reference an
721 existing resource or entity either.
722
723 When dereferencing IVOIDs with query parts, applications should first
724 dereference the reference part to a registry record. From that, a service
725 should be identified that can dereference the full IVOID. Concrete
726 procedures may be given in IVOA specifications introducing certain
727 resource types. One example for this is sect.~\ref{sect:dids}.
728
729 There is no mechanism that would allow applications to tell from
730 an IVOID's form whether or not it can be dereferenced in any special
731 way. Any such information has to be obtained from the context the IVOID
732 is found in.
733
734 For resolving IVOIDs with fragment identifiers, applications would again
735 resolve the Registry part in the Registry. In the presence of a query
736 component, it would be dereferenced as just discussed to obtain a basic
737 document, otherwise the basic document is the Registry record itself.
738 The entity referred to is then extracted from the basic document by
739 means specific to the document type; one example of such a prescription
740 is given in sect.~\ref{sect:standards}.
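
As a first step of the resolution procedures described above, an application has to separate the Registry part from the query part and the fragment. A minimal Python sketch of that split follows; the names \texttt{IvoidParts} and \texttt{split\_ivoid} are assumptions made for this illustration, and the input is assumed to be a syntactically valid IVOID.

```python
from typing import NamedTuple, Optional


class IvoidParts(NamedTuple):
    """Components of an IVOID as needed for dereferencing (illustrative)."""
    registry_part: str
    query: Optional[str]
    fragment: Optional[str]


def split_ivoid(ivoid: str) -> IvoidParts:
    """Split an IVOID into Registry part, query part, and fragment.

    As in RFC 3986, the fragment starts after the first "#", and the
    query part starts after the first "?" preceding the fragment.
    Absent components are reported as None.
    """
    rest, hash_sep, fragment = ivoid.partition("#")
    registry_part, query_sep, query = rest.partition("?")
    return IvoidParts(
        registry_part,
        query if query_sep else None,
        fragment if hash_sep else None,
    )
```

The Registry part returned here is what would be looked up in the Registry; the query part and the fragment are then interpreted relative to the document obtained that way.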
741
742 As there are no relative IVOIDs, most of RFC 3986's section~5 does not
743 apply here.
744
745 \subsection{Normalization and Comparison}
746 \label{sect:comparing}
747
748 An important use of identifiers is comparing two instances to
749 determine if they refer to the same resource. This will most commonly
750 occur when using an identifier to look up the associated resource
751 description in a registry.
752
753 IVOID comparison is according to RFC 3986, section 6.2.2, with the
754 following additional regulations:
755
756 \begin{itemize}
757 \item As no hierarchy is implied in any IVOID part, no path segment
758 normalization is ever performed on IVOIDs.
759 \item As IVOIDs must not percent-encode characters that do not need to
760 be encoded, no percent-encoding normalization is ever performed on
761 IVOIDs.
762 \item In addition to scheme and authority as in RFC 3986, in IVOIDs the
763 resource key is also compared case-insensitively. This means that
764 Registry references can be case-folded for processing.
765 \end{itemize}
766
767 Note that neither query parts nor fragment identifiers may be compared
768 case-insensitively or normalized in any other way; allowing this would
769 severely impact their usefulness, as they, in general, refer to
770 case-sensitive entities like XML ids or file system paths.
771
772 No further normalizations are performed in IVOID comparison, i.e.,
773 sections 6.2.3 and 6.2.4 of RFC 3986 do not apply.
774
775 For instance, given the IVOID
776 $$\mbox{\nolinkurl{ivo://example.com/res/key1?par=U\%20Pic\#Part1},}$$ the
777 IVOID
778 $$\mbox{\nolinkurl{IVO://EXAMPLE.COM/RES/KEY1?par=U\%20Pic\#Part1}}
779 $$ must compare equal, while the following IVOIDs must compare
780 non-equal:
781
782 \begin{itemize}
783 \item \nolinkurl{ivo://example.com/res/key1?par=u\%20Pic\#part1}
784 (query part and fragment are compared case-sensitively)
785 \item \nolinkurl{ivo://example.com/./res/key1?par=U\%20Pic\#Part1}
786 (no path normalization takes place, even if that were a legal IVOID)
787 \item \nolinkurl{ivo://example.com/res/key1?par=U\%20Pic} (fragment
788 identifiers may not be stripped off for comparison)
789 \item \nolinkurl{ivo://example.com/res/key1?par=U\%20Pic\&\#Part1}
790 (query parts are not parsed, and their interpretation as key/value pairs
791 is up to data providers)
792 \item \nolinkurl{ivo://example.com/res/\%6Bey1?par=U\%20Pic\#Part1}
793 (no normalization of percent encoding takes place)
794 \end{itemize}
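
The comparison rules and the examples above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a normative algorithm: the function names \texttt{normalize\_ivoid} and \texttt{ivoids\_equal} are made up for this example, and the inputs are assumed to be syntactically valid IVOIDs, so that the Registry part contains no percent-encoded characters.

```python
def normalize_ivoid(ivoid: str) -> str:
    """Return a form of ivoid suitable for plain string comparison.

    Only the Registry part (scheme, authority, resource key) is
    case-folded. Query part and fragment are left untouched, and no
    path or percent-encoding normalization is performed.
    """
    # The Registry part ends at the first "?" or "#", whichever is first.
    cut = len(ivoid)
    for stop in "?#":
        pos = ivoid.find(stop)
        if pos != -1:
            cut = min(cut, pos)
    return ivoid[:cut].lower() + ivoid[cut:]


def ivoids_equal(a: str, b: str) -> bool:
    """Compare two IVOIDs after case-folding their Registry parts."""
    return normalize_ivoid(a) == normalize_ivoid(b)
```

Applied to the examples above, this sketch reports the upper-case variant as equal to the reference IVOID and all five items in the list as non-equal.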
795
796 In general, the string-based comparison of identifiers
797 cannot determine definitively if two identifiers refer to different
798 resources. While it is not intended that a Registry record is
799 registered multiple times with different identifiers, it is not
800 disallowed by this specification. In particular, it is possible that
801 two resources with different identifiers may be mirrors of each other;
802 such a relationship can only be determined by examining the metadata
803 contained in the descriptions associated with each identifier.
804
805
806 This concludes the additional constraints and regulations for IVOIDs
807 over RFC 3986 compliant URIs. The remainder of this document
808 standardizes certain aspects not in the scope of RFC 3986.
809
810 \section{Creating Identifiers}
811 \label{sect:creating}
812
813 An important aim of the process for creating identifiers is to ensure
814 uniqueness. In the context of IVOA
815 identifiers, ``unique'' means that a given identifier MUST NOT refer
816 to two different resources at any instant. Furthermore, the
817 identifier SHOULD refer to at most one resource over all time; that
818 is, IVOIDs should not be reused for unrelated resources. Note that a
819 resource may potentially be dynamic (such as 'weather at telescope' or
820 'current version of the standard') -- here, there is a conceptually unique
821 resource, even though the content of it may change in time.
822
823 Another aim of the identifier creation process is to trace the
824 delegation of authority over the identifier.
825 In practice, a Registry reference is created by
826 an organization when registering a resource.
827 Thus, only recognized naming authorities (or
828 persons representing such organizations) may create Registry references.
829
830 The details of the service used to claim a
831 naming authority are described in the IVOA Registry
832 Interfaces standard \citep{std:RI2}.
833
834 Once an organization is recognized as a naming authority, it is free
835 to register any number of resources with identifiers having an
836 authority identifier that it controls. No
837 organization may create an identifier with an
838 authority identifier it does not control. The naming
839 authority has full control over the creation of a
840 resource key as long as it conforms to the syntax
841 and uniqueness constraints described in this specification.
842
843 Likewise, once a Registry reference is established, any number of IVOIDs may be
844 built using it (e.g., when publishing new datasets). In this case, the
845 VO Registry is not involved; IVOID creation happens under the exclusive
846 control of the owner of the service or data collection the Registry
847 reference refers to.
848
849
850
851 \section{Special Identifier Types}
852 \label{sect:specials}
853
854 This section discusses some special classes of IVOIDs that reference
855 something other than Registry records and for which identifier forms for
856 one reason or other must or should be uniform across the different other
857 standards that define the resources referenced.
858
859 \subsection{Dataset Identifiers}
860 \label{sect:dids}
861
862 DAL standards like Obscore \citep{std:OBSCORE}, SSAP
863 \citep{std:SSAP}, or Datalink \citep{std:Datalink} need to reference
864 datasets. The SSAP standard defines these as ``an individual data object
865 usually including associated metadata.'' In astronomy, single images or
866 spectra are datasets, but tables or more complex data products might, at
867 the publisher's discretion, also be referenced as a single dataset.
868
869 A reference to a dataset is called a dataset identifier (DID), more
870 specifically publisher DID if the DID was assigned by the dataset's
871 publisher, and creator DID if the DID was assigned by the dataset's
872 author. Various standards mandate that DIDs must be IVOIDs.
873
874 Historically, DIDs were customarily formed by adding fragment
875 identifiers to Registry references, a practice recommended in
876 SSAP in versions up to 1.1.
877 This definition was criticized in
878 \citet{note:uriforms} as a potential interoperability issue.
879
880 Therefore, this specification deprecates the regulation from SSAP 1.1.
881 Instead, DIDs in the VO now MUST use the query part to distinguish
882 datasets within one VO resource. In short, the separator between
883 Registry reference and local part now must be the question mark rather than the
884 octothorpe. A welcome side effect is that the fragment identifier can
885 now be used to reference sub-entities within the datasets.
886
887 An example for a dataset id (that should actually resolve according to
888 the scheme laid out below) is $$
889 \mbox{\nolinkurl{ivo://org.gavo.dc/\~?flashheros/data/ca92/f0065.mt}.}$$
890
891 Existing DIDs in services implementing SSAP up to 1.1 and Obscore 1.0
892 are not affected by these requirements and may be used until the
893 respective services are updated to newer standards.
894
895 Note that by this specification publishers have no obligation to ensure
896 continued access to datasets identified with PubDIDs. They are \emph{not}
897 by themselves
898 persistent identifiers with guarantees on resolvability. Their main
899 function is to provide globally unique identifiers for use in, e.g.,
900 federating responses from different services.
901
902 Publishers are, however, encouraged to declare at least one capability
903 of a protocol dealing with
904 PubDIDs\footnote{At the time of this writing, Datalink, Obscore, and
905 SSA are IVOA recommended protocols allowing queries involving PubDIDs.
906 SIA \citep{std:SIAP} will, according to current
907 proposed recommendations, have an analogous facility in version
908 2.0.} in the resource record referenced by the Registry part of
909 a PubDID (i.e., the URI in front of the first question mark). In that
910 way, clients can attempt to retrieve data based on
911 stand-alone PubDIDs by querying the
912 Registry for the ``embedding'' resource and seeing if it supports any
913 protocol they implement.
914
915 The definition of a proper resolver or resolution strategy is beyond the
916 scope of this standard. Although services prototyping such functionality have
917 been written\footnote{e.g., GAVO's global PubDID resolver at
918 \url{http://dc.g-vo.org/glopidir}.}, we
919 maintain additional efforts are required outside of Registry to build a
920 reliable infrastructure on top of PubDIDs.
921
922 \subsection{Standard Identifiers}
923 \label{sect:standards}
924
925 In many VO standards, it is important to express adherence to a
926 set of constraints.
927 Common examples include the declaration of the protocol --
928 and the version of the protocol -- that an endpoint implements in
929 VOResource's \xmlel{capability} element or a data model represented
930 with a TAP service in TAPRegExt. The resource record such identifiers
931 reference is defined by StandardsRegExt \citep{std:STDREGEXT}. As such
932 records typically describe multiple versions of a standard, and a single
933 standard may contain definitions of multiple different capabilities that
934 need to be discerned, the simple Registry reference of the standard record usually is
935 not enough.
936
937 Therefore, StandardsRegExt records should define one
938 \xmlel{key} element for each such referenceable
939 entity. The \xmlel{name} child of this key, denoting both the kind of
940 capability and the major and minor version, is then what is referenced
941 by the identifier as defined by StandardsRegExt, such that the complete
942 element will typically have the form
943 $$
944 \abnfnt{standard-ref}\abnfterm{\#}\abnfnt{key-name}\abnfterm{-}\abnfnt{version}
945 $$
946
947 For instance, the standard exampleProto might define both a
948 data model \texttt{model} and a query capability \texttt{query}. In
949 its version 1.0, there would be two standard keys \texttt{model-1.0} and
950 \texttt{query-1.0}. In a \xmlel{capability} element in another
951 resource's Registry record, support of the query capability would then
952 be declared with the IVOID
953 \texttt{ivo://ivoa.net/std/exampleProto\#query-1.0}, whereas a TAP
954 service exposing the model would contain a \xmlel{dataModel} element
955 with an \xmlel{ivo-id} attribute of
956 \texttt{ivo://ivoa.net/std/exampleProto\#model-1.0}.
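
A client that wants to inspect such identifiers locally could take them apart as in the following Python sketch. It assumes the typical form given above (standard reference, \#, key name, hyphen, version); since that form is only typical, identifiers not following it are rejected rather than guessed at. The function name \texttt{parse\_standard\_id} is hypothetical.

```python
from typing import Tuple


def parse_standard_id(ivoid: str) -> Tuple[str, str, str]:
    """Split a standard identifier into (standard_ref, key_name, version).

    Assumes the typical form <standard-ref>#<key-name>-<version>;
    raises ValueError for identifiers that do not follow it.
    """
    standard_ref, hash_sep, key = ivoid.partition("#")
    if not hash_sep:
        raise ValueError("no standard key in %r" % ivoid)
    # The version is taken to follow the last hyphen of the key.
    key_name, dash, version = key.rpartition("-")
    if not dash or not key_name or not version:
        raise ValueError("standard key %r has no version tag" % key)
    return standard_ref, key_name, version
```

For the exampleProto identifiers above, this yields the key names \texttt{query} and \texttt{model}, each with version \texttt{1.0}.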
957
958 As the exampleProto develops, new standard keys like
959 \texttt{query-1.1} or \texttt{query-2.0} are added. Note that while ideally,
960 the version tags in the keys will correspond to the version of the
961 document that defines them, this is not a requirement. Indeed, if the
962 underlying model has no incompatible changes, even exampleProto 2.0
963 might specify that its data model would remain
964 \texttt{ivo://ivoa.net/std/exampleProto\#model-1.0}. This allows clients
965 to easily discover all services they can operate.
966
967 Registry interfaces will typically offer some pattern matching
968 capability for comparing such identifiers.
969 Clients should use that feature
970 to ignore minor versions if appropriate -- by the IVOA's versioning
971 rules \citep{std:docSTD},
972 a generic client for version 1 of a protocol should be able to
973 operate all version 1 services, regardless of their minor versions, and
974 clients implementing multiple versions of a standard can entirely ignore
975 the version tag. For instance, with RegTAP \citep{std:RegTAP}
976 an exampleProto 1.0 client would look for capabilities for which
977 $$
978 \texttt{standard\_id LIKE 'ivo://ivoa.net/std/exampleProto\#query-1.\%'}
979 $$
980 holds, whereas a client that speaks both versions 1 and 2 of the
981 protocol would look for capabilities with
982 $$
983 \texttt{standard\_id LIKE 'ivo://ivoa.net/std/exampleProto\#query-\%'}.
984 $$
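
The two patterns above differ only in whether the major version is fixed. A client might build them with a small helper like the following Python sketch; the function name \texttt{standard\_id\_pattern} and its parameters are assumptions for this illustration, and the helper does not escape the wildcard characters that a standard reference or key name might itself contain.

```python
from typing import Optional


def standard_id_pattern(standard_ref: str, key_name: str,
                        major: Optional[int] = None) -> str:
    """Build a LIKE pattern matching standard identifiers for one key.

    With major given, any minor version of that major version matches;
    without it, all versions of the key match. Hypothetical helper:
    "%" and "_" inside the arguments are not escaped.
    """
    base = f"{standard_ref}#{key_name}-"
    if major is None:
        return base + "%"
    return f"{base}{major}.%"
```

The resulting string would then be used as the right-hand operand of the \texttt{LIKE} comparison on \texttt{standard\_id}.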
985
986 \appendix
987
988 \section{Changes from Previous Versions}
989
990 \subsection{Changes from PR-2015-07-09}
991
992 \begin{itemize}
993 \item Now deprecating the term IVORN, as historical usage has been too
994 inconsistent. Instead, there is now the ``Registry part'' of an IVOID,
995 and an IVOID that only has a registry part is called a Registry
996 reference.
997 \item More examples
998 \item No longer suggesting a concrete algorithm for PubDID resolution;
999 instead, clear encouragement to PubDID minters to point to appropriate
1000 services from the Registry part of a PubDID.
1001 \item Editorial changes
1002 \end{itemize}
1003
1004 \subsection{Changes from 1.13}
1005
1006 \begin{itemize}
1007 \item Removed the (unused) XML representation of Identifiers.
1008 \item Rewrote the section on URI forms to more closely correspond to
1009 the organization of RFC 3986.
1010 \item Case-insensitive handling of IVORNs is now a MUST.
1011 \item Now allowing percent-encoded items outside of the authority and
1012 resource key.
1013 \item Added rules for forming URI-compliant dataset identifiers
1014 \item Added rules for forming StandardsRegExt-compliant standard
1015 identifiers.
1016 \item Empty path segments, as well as those consisting exclusively of
1017 dots, are now forbidden rather than just discouraged.
1018 \item Dropped the recommendation to present authority identifiers in
1019 lower case.
1020 \item Generally moved to IVOID as the abbreviation for IVOA identifier,
1021 defined IVORN to be the part of an IVOID without a local part.
1022 \item Removed some obsolete introductory material that has been
1023 superseded by other standards.
1024 \item Migrated to ivoatex source
1025 \end{itemize}
1026
1027 \subsection{Changes from v1.10}
1028
1029 \begin{itemize}
1030 \item Moved ``!'' from the discouraged list of
1031 characters to the reserved list,
1032 thereby disallowing its inclusion in IVOA identifiers.
1033 \item Clarified the list of characters disallowed in an authority ID by:
1034 \begin{itemize}
1035 \item explicitly disallowing URI-escaped sequences.
1036 \item listing as reserved characters only those characters
1037 that are allowed by the URI spec but disallowed by this
1038 one.
1039 \item Listed in a tip box the characters that are disallowed
1040 by the URI spec.
1041 \end{itemize}
1042 As before, the definition of the resource key
1043 refers to the same list of
1044 reserved characters as those disallowed.
1045 \item Fixed numerous links and references.
1046 \end{itemize}
1047
1048
1049 \subsection{Changes from v1.0}
1050
1051 \begin{itemize}
1052 \item The prohibition of using ``+'' and ``='' within
1053 Identifier components has been dropped.
1054 \item Recommendations for authority ID strings
1055 have been updated to match current practice in AstroGrid and the
1056 NVO.
1057 \item In the example schema in App. A, the namespace was altered to conform
1058 with IVOA conventions. A correction was also made to the
1059 allowed pattern for AuthorityIDType to properly comply with the XML
1060 specification defined in section 3.2.1.
1061 \item various clarifications based on reviewer comments
1062 \end{itemize}
1063
1064 \subsection{Changes from v0.1}
1065
1066 \begin{itemize}
1067 \item Resource key is now required except when referring to a naming
1068 authority itself.
1069 \item support for DNS-like authority IDs clarified.
1070 \item added role of \# and ? as ``stop'' characters in URI form.
1071 \item dropped non-binding Appendix B: Recommended Mechanism for
1072 becoming a Naming authority.
1073 \end{itemize}
1074
1075
1076 \bibliography{ivoatex/ivoabib}
1077
1078
1079 \end{document}
