International Virtual Observatory Alliance
The fragment identifier in a URI has a specific semantics attached to it. IVOA specifications should therefore not use it as a simple indicator of hierarchy or containment.
This is an author's draft. It has no IVOA standing as such, but will be submitted as a Note to the IVOA documents series once it has received some feedback.
A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.
The author is most grateful for comments and criticisms received from Guy Rixon, Mark Taylor, Markus Demleitner, and Dick Shaw.
URIs are defined in IETF RFC 3986 std:rfc3986. In its full generality, the syntax of URIs is quite complicated, but most of the URIs we commonly see use only a subset of the possible features, namely a scheme (which is usually http or sometimes, in VO contexts, ivo), a host prefixed by a pair of slashes //, a path with elements separated by single slashes /, and a possible fragment, separated from the rest of the URI by a hash or number sign, #. The point of this present note is to stress that the fragment is importantly distinct from the other parts of the URI: it is not sent over the network to a remote server when the URI is retrieved or dereferenced.
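As a minimal illustration of this decomposition, the following sketch uses Python's standard urllib.parse module (our choice of tool for illustration, not something this Note prescribes) to split one http URL and one ivo URI into the parts named above. The ivo URI is the pattern used later in this Note for VOEvent identifiers.

```python
from urllib.parse import urlsplit

# Split two URIs into scheme, host, path and fragment.
web = urlsplit("http://www.ivoa.net/Documents/#notes")
ivorn = urlsplit("ivo://example.org/streamid#local_ID")

for parts in (web, ivorn):
    print("scheme:  ", parts.scheme)
    print("host:    ", parts.netloc)
    print("path:    ", parts.path)
    print("fragment:", parts.fragment)
    print()
```

The parser treats the fragment identically for both schemes, which is consistent with the point made in this Note that fragment handling does not depend on the scheme.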
When looking at a webpage in a web browser – for example the URL http://www.ivoa.net/Documents/#notes – the browser retrieves the path /Documents/ from the server at www.ivoa.net and, once it has retrieved the HTML page that comes back, it searches within the page for the anchor labelled with notes. Crucially, this search happens entirely on the client side, and it or its analogue happens during the processing of any URI – it is not specific to HTTP or to HTML pages. It also therefore applies to IVORN URIs (starting ivo:) std:ivo and VOSpace URIs (starting vos:) std:vospace.
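The browser behaviour just described can be sketched as follows. This is an illustration under stated assumptions, not a description of any real browser: the PAGE string is a hypothetical stand-in for the HTML that the server returns, and fetch is a hypothetical stand-in for the network request.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorFinder(HTMLParser):
    """Record whether the page contains an element labelled with the anchor."""

    def __init__(self, anchor):
        super().__init__()
        self.anchor = anchor
        self.found = False

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("id", "name") and value == self.anchor:
                self.found = True


# Hypothetical page body, standing in for what the server sends back.
PAGE = '<html><body><h2 id="notes">Notes</h2><p>...</p></body></html>'


def fetch(url):
    """Hypothetical stand-in for a network fetch: it never sees a fragment."""
    assert "#" not in url
    return PAGE


# The fragment is split off before anything is requested ...
url_to_fetch, fragment = urldefrag("http://www.ivoa.net/Documents/#notes")
# ... and is looked up in the retrieved page, entirely on the client side.
finder = AnchorFinder(fragment)
finder.feed(fetch(url_to_fetch))
print(url_to_fetch, fragment, finder.found)
```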
In brief: The fragment identifier in a URI (RFC 3986, std:rfc3986) has a specific semantics attached to it. IVOA specifications should therefore not use it as a simple indicator of hierarchy or containment. Or, put another way: punctu–ation,isn#t ju`st !dec$ora/tion.
This document is not intended to be a comprehensive survey of recommended and deprecated URL patterns. We note, however, that quite a lot of the suggestions in the famous Cool URIs don't change document are as valid now as they were in 1998.
Several IVOA standards define URI patterns for the objects they describe – the VOEvent and VOSpace standards are examples. In this context, it is natural to use the URI fragment as a way of referring to a resource which is conceptually contained within another, by analogy with the way that the fragments in HTML pages are conceptually within the page. Unfortunately, the fixed and invariable meaning attached to URI fragments means that the applications which process such URIs may be required by the (IETF RFC) standard to process them in ways which the IVOA standards did not intend. If applications, guided by an IVOA standard, do not process URIs in a conformant way, then we are concerned that those applications will risk being frustrated by conformant library APIs, by caches, and by future developments in URI standards themselves.
The rest of this section is a detailed discussion of the problem, with a rather legalistic tone, in terms which presume some acquaintance with the details of the URI specification std:rfc3986.
The fundamental problem with URI formats such as scheme:foo#local_ID is that the specification for URIs std:rfc3986 requires that the fragment (the #local_ID) is removed prior to any dereference – "the fragment identifier is separated from the rest of the URI prior to a dereference" (this and other quotations here are from section 3.5 of the URI RFC). Other language in this section makes it clear that the fragment has a special, and secondary, status ("[t]he fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information") and that this is independent of the scheme: "[f]ragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications".
Further, "the fragment identifier is not used in the scheme-specific processing of a URI". This means that in order to conform to the URI specification, the processing of the ivo: URI scheme must ignore the fragment. This means that whenever an IVORN ivo://foo/bar#baz is processed (or in general used in any way other than as a name in the ivo://foo/bar namespace), that processing must be done on the IVORN ivo://foo/bar alone, and the presence of the #baz fragment taken account of only after retrieval is complete.
Another way of phrasing this is that there is no guarantee that a server will see the fragment in any URI, since any of possibly multiple intermediaries between the client and the server will be licensed to remove it (nor, incidentally, is there any guarantee that a server will not see the fragment).
The intention of the URI specification is that such a URI is conceptually handled by the client stripping the fragment, processing the resulting cropped URI, and then resolving the fragment, in some scheme-specific way, on the client.
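The three steps just described (strip the fragment, process the cropped URI, resolve the fragment on the client) can be sketched as below. Everything here apart from the URI ivo://foo/bar#baz is an assumption made for illustration: SERVER_RESOURCES is a hypothetical stand-in for a remote service, and we assume, purely for the sketch, that the retrieved representation is a mapping from fragment names to secondary resources.

```python
from urllib.parse import urldefrag

# Hypothetical remote store, keyed by fragment-free URI.
SERVER_RESOURCES = {
    "ivo://foo/bar": {"baz": "secondary resource baz", "qux": "secondary resource qux"},
}

# Every URI that the hypothetical server is actually asked for.
requests_seen = []


def retrieve(primary_uri):
    """Scheme-specific retrieval; it is only ever handed the cropped URI."""
    requests_seen.append(primary_uri)
    return SERVER_RESOURCES[primary_uri]


def dereference(uri):
    primary_uri, fragment = urldefrag(uri)   # 1. strip the fragment
    representation = retrieve(primary_uri)   # 2. process the cropped URI
    if not fragment:
        return representation
    return representation[fragment]          # 3. resolve the fragment locally


result = dereference("ivo://foo/bar#baz")
print(result)
print(requests_seen)
```

In this sketch the server is asked only for ivo://foo/bar, and the selection of baz happens after retrieval is complete.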
In the VOEvent spec, however, .../streamid and .../streamid#local_ID are conceived as completely independent resources, contrary to the prescriptions in the URI RFC.
See section affected for a note on affected IVOA Standards.
This is not merely a theoretical problem, for three reasons.
One can imagine a URI API which allows for scheme-specific handlers (e.g. for vos: or ivo:), in the way that the java.net.URL class does. Such a handler class's API could potentially be constructed in such a way that the handler code couldn't get access to the fragment part of the parsed URI. This would completely destroy the functionality of a custom handler for ivo: URLs which included significant fragments. And this would not be a bug in the API.
The java.net.URLStreamHandler abstract class is not in fact constructed in this way, but this is no guarantee that a different class, in this or a different language, won't act in the same inconvenient fashion.
When a cache is asked for scheme:path#fragment, it should simply return the content of scheme:path since, according to the URI spec, and for any scheme, these are equivalent in this context. Indeed, any ivo: cache is required to behave like this (RFC section 6.1: "When URIs are compared to select (or avoid) a network action, such as retrieval of a representation, fragment components (if any) should be excluded from the comparison."). That is, if a user-agent were to ask a proxy or cache for ivo://auth/obj#frag, it should receive the contents of ivo://auth/obj. This also is not a bug in the cache.
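A cache that follows this rule can be sketched in a few lines. The class name FragmentBlindCache and the origin function are hypothetical, introduced only for this illustration. The point is that the cache key excludes the fragment, so a request for ivo://auth/obj#frag is indistinguishable from one for ivo://auth/obj.

```python
from urllib.parse import urldefrag


class FragmentBlindCache:
    """A cache whose keys exclude the fragment, as RFC 3986 section 6.1 describes."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def get(self, uri):
        key = urldefrag(uri).url          # the comparison ignores any fragment
        if key not in self._store:
            self._store[key] = self._fetch(key)
        return self._store[key]


# Hypothetical origin server: records each URI it is asked for.
calls = []


def origin(uri):
    calls.append(uri)
    return f"contents of {uri}"


cache = FragmentBlindCache(origin)
with_fragment = cache.get("ivo://auth/obj#frag")
without_fragment = cache.get("ivo://auth/obj")
print(with_fragment == without_fragment, calls)
```

A scheme that expects ivo://auth/obj#frag to be a separately retrievable resource would get the wrong answer from such a cache, even though the cache is behaving correctly.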
Superficially, it seems that these two problems can be evaded: don't use scheme-specific handlers, and don't use proxies or caches; or more generally, avoid tools which conform to the demands of the URI specification. Depending on the local network environment, however, user-agents may be obliged to use caches; this is unlikely in (current) practice, in the case of non-HTTP URIs, but this may not be avoidable in future for the following reason.
The third point is the longest-term point, and may not be so easily worked around.
At some point – perhaps in a decade, perhaps longer – there will be a replacement standard for addressing things on the web (or whatever replaces it). As the web's core addressing technology, URIs are so important that there will certainly be a mechanism for mapping URIs to the new standard, supported by gateways or proxies of some type. At this point, using a URI proxy will not be optional, if the IVOA is to remain reasonably consistent with the rest of the world.
Whatever technology finally replaces URIs as an addressing mechanism will have a lot of work invested in it, to make sure the two are compatible. The gateways implementing this mapping cannot be guaranteed to be friendly to URI schemes which depend on behaviour which the URI specification declares must not happen.
We do not wish to suggest that fragments should be avoided in general; there are plenty of cases where they are perfectly appropriate. In the best-known use, to provide a direct link to elements within an HTML page, fragments are useful and unexceptionable; and when a fragment is used to create a name for something, as is done within the Standards Registry Extension, or in many Semantic Web use-cases, that is a useful and increasingly common technique which provides natural namespacing.
The Standards Registry Extension specification std:stdregext uses URIs as names: for example ivo://ivoa.net/std/QueryProtocol#case-insensitive. Here, there’s no suggestion that the #case-insensitive thing is a differently-retrieved resource – it is simply a name, and the non-fragment part of the URI is merely acting as a type of namespace. This goes with the grain of the URI definition.
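To illustrate the use of a fragment-bearing URI purely as a name, here is a minimal sketch. The advertised set and the supports function are hypothetical, invented for this illustration. The URI is compared as an opaque string and nothing is ever retrieved, so the fragment-stripping rules for dereferencing never come into play.

```python
from urllib.parse import urldefrag

CASE_INSENSITIVE = "ivo://ivoa.net/std/QueryProtocol#case-insensitive"

# Hypothetical set of feature names that some service advertises.
advertised = {CASE_INSENSITIVE}


def supports(feature_name):
    """Pure name comparison: no network access, no dereference."""
    return feature_name in advertised


# The non-fragment part acts as a namespace for the local name.
namespace, local_name = urldefrag(CASE_INSENSITIVE)
print(supports(CASE_INSENSITIVE), namespace, local_name)
```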
At the risk of belabouring the point, the difference between this and the VOEvent case is that in the VOEvent case there is the clear implication that a VOEvent identifier stream#event is not merely a name for an event, but is expected to be retrievable directly, in contrast to being accessed by downloading the entire stream, and searching locally for the secondary resource #event. There is a similar situation, mutatis mutandis, when the VOSpace specification talks of accessing nodes.
VOEvent identifiers have the form ivo://example.org/streamid#local_ID (see section 2.2 of std:voevent). The URI RFC requires that this is resolved by retrieving the resource ivo://example.org/streamid and finding #local_ID within it, but the VOEvent specification indicates that the resources ivo://example.org/streamid and ivo://example.org/streamid#local_ID might be retrieved independently.
The text of the VOSpace specification std:vospace principally illustrates URI fragments being used as property names; this is unproblematic for the reasons discussed below (Sect.namesok). However, the specification also describes URIs in a vos: scheme (implicitly and explicitly including fragments) as names for VOSpace nodes, and describes these being retrieved to obtain the node contents. Depending on how this retrieval is done, this dereferencing procedure might be adversely affected by the issues described in this Note.
Other IVOA specifications which discuss URIs with fragments may need to be examined, to discover whether they are also unwittingly depending on unsupported behaviour.
This Note makes the following recommendations: