International Virtual Observatory Alliance

URI fragments in IVOA specifications
Version 0.1

Filled in automatically

Working Group
Semantics???
This version:
filled in automatically
Latest version:
http://www.astro.gla.ac.uk/users/norman/temp/uri-fragments.html
Previous version(s):
0.2: http://www.astro.gla.ac.uk/users/norman/temp/uri-fragments/20120404/
Author(s):
Norman Gray

Abstract

The fragment identifier in a URI has a specific semantics attached to it. IVOA specifications should therefore not use it as a simple indicator of hierarchy or containment.

Status of This Document

This is an author's draft. It has no IVOA standing as such, but will be submitted as a Note to the IVOA documents series once it has received some feedback.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgements

The author is most grateful for comments and criticisms received from Guy Rixon, Mark Taylor, Markus Demleitner, and Dick Shaw.

Introduction

URIs are defined in IETF RFC 3986 std:rfc3986. In its full generality, the syntax of URIs is quite complicated, but most of the URIs we commonly see use only a subset of the possible features, namely a scheme (which is usually http or sometimes, in VO contexts, ivo), a host prefixed by a pair of slashes //, a path with elements separated by single slashes /, and a possible fragment, separated from the rest of the URI by a hash or number sign #. The point of this note is to stress that the fragment is importantly distinct from the other parts of the URI: it is not sent over the network to a remote server when the URI is retrieved or dereferenced.
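As a minimal illustration of these components, the following sketch uses Python's standard urllib.parse module to split a URI into the parts named above. The choice of Python and of this particular URL is an assumption made for illustration; nothing here is specific to any IVOA library.

```python
from urllib.parse import urlsplit

# An illustrative URL with a scheme, host, path and fragment.
parts = urlsplit("http://www.ivoa.net/Documents/#notes")

print(parts.scheme)    # the scheme, before the first colon
print(parts.netloc)    # the host, after the pair of slashes
print(parts.path)      # the slash-separated path
print(parts.fragment)  # the fragment, after the hash sign
```

The fragment is reported as a separate component, distinct from the path.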

When a web browser is given a URL such as http://www.ivoa.net/Documents/#notes, it retrieves the path /Documents/ from the server at www.ivoa.net and, once it has retrieved the HTML page that comes back, it searches within the page for the anchor labelled notes. Crucially, this search happens entirely on the client side, and it or its analogue happens during the processing of any URI – it is not specific to HTTP or to HTML pages. It therefore also applies to IVORN URIs (starting ivo:) std:ivo and VOSpace URIs (starting vos:) std:voevent.
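The same behaviour can be seen in a general-purpose HTTP client library. The sketch below, which assumes Python's standard urllib.request module, builds a request object for the URL above without opening any network connection, and inspects the host and the selector (the path which would be sent to the server).

```python
from urllib.request import Request

# Constructing a Request does not contact the server.
request = Request("http://www.ivoa.net/Documents/#notes")

# The host the client would connect to, and the path it would ask for.
print(request.host)
print(request.selector)
```

The selector contains no trace of the fragment: the library has already set it aside, as the URI specification requires.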

In brief: The fragment identifier in a URI (RFC 3986, std:rfc3986) has a specific semantics attached to it. IVOA specifications should therefore not use it as a simple indicator of hierarchy or containment. Or, put another way: punctu–ation,isn#t ju`st !dec$ora/tion.

@@@ It might be useful/interesting/valuable to include in this note a discussion of other recommended and deprecated URL patterns. Quite a lot of the suggestions in the famous Cool URIs don't change document are as valid now as they were in 1998.

The problem with fragments

Several IVOA standards define URI patterns for the objects they describe – the VOEvent and VOSpace standards are examples. In this context, it is natural to use the URI fragment as a way of referring to a resource which is conceptually contained within another, by analogy with the way that the fragments in HTML pages are conceptually within the page. Unfortunately, the fixed and invariable meaning attached to URI fragments means that the applications which process such URIs may be required by the (IETF RFC) standard to process them in ways which are unintended by the IVOA standards. If applications deliberately process them in a non-conformant way, then we are concerned that those applications will risk being frustrated by conformant library APIs, by caches, and by future developments in URI standards themselves.

The rest of this section is a detailed discussion of the problem, with a rather legalistic tone, in terms which presume some acquaintance with the details of the URI specification std:rfc3986.

The fundamental problem with URI formats such as scheme:foo#local_ID is that the specification for URIs std:rfc3986 requires that the fragment (the #local_ID) is removed prior to any dereference: "the fragment identifier is separated from the rest of the URI prior to a dereference" (this and other quotations here are from section 3.5 of the URI RFC). Other language in this section makes it clear that the fragment has a special, and secondary, status ("[t]he fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information") and that this cannot be redefined by scheme-specific specifications: "[f]ragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications".

Further, "the fragment identifier is not used in the scheme-specific processing of a URI". In order to conform to the URI specification, therefore, the processing of the ivo: URI scheme must ignore the fragment. Whenever an IVORN ivo://foo/bar#baz is processed (or in general used in any way other than as a name in the ivo://foo/bar namespace), that processing must be done on the IVORN ivo://foo/bar alone, and the #baz fragment taken into account only after retrieval is complete.

Another way of phrasing this is that there is no guarantee that a server will see the fragment in any URI, since any of possibly multiple intermediaries between the client and the server will be licensed to remove it (nor, incidentally, is there any guarantee that a server will not see the fragment).

The intention of the URI specification is that such a URI is conceptually handled by the client stripping the fragment, processing the resulting cropped URI, and then resolving the fragment on the client, in a way which depends on the representation retrieved rather than on the URI scheme.
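This three-step model can be sketched as follows. The sketch is illustrative only: the FAKE_SERVER dictionary and the retrieve and dereference functions are hypothetical stand-ins for a real retrieval mechanism, and the representation is assumed to be a simple mapping from names to parts.

```python
from urllib.parse import urldefrag

# Hypothetical stand-in for a remote server: maps a fragment-free URI
# to a representation, here a dict of named parts.
FAKE_SERVER = {
    "ivo://foo/bar": {"baz": "the part named baz", "qux": "another part"},
}

def retrieve(uri):
    """Pretend to dereference a URI; the 'server' never sees a fragment."""
    if "#" in uri:
        raise ValueError("fragment must be stripped before retrieval")
    return FAKE_SERVER[uri]

def dereference(uri):
    """Strip the fragment, retrieve the primary resource, resolve the fragment locally."""
    primary, fragment = urldefrag(uri)   # step 1: strip the fragment
    representation = retrieve(primary)   # step 2: process the cropped URI
    if not fragment:
        return representation
    return representation[fragment]      # step 3: resolve on the client

result = dereference("ivo://foo/bar#baz")
print(result)
```

In this sketch the server-side lookup is keyed only on ivo://foo/bar; the baz fragment plays a role only once the representation is in the client's hands.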

In the VOEvent spec, however, .../streamid and .../streamid#local_ID are conceived as completely independent resources, contrary to the prescriptions in the URI RFC.
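The consequence can be illustrated with a short sketch. The IVORNs below are hypothetical, chosen only to show the shape of the problem: a conformant client reduces all of them to the same primary resource before retrieval, so they cannot be fetched as independent resources.

```python
from urllib.parse import urldefrag

# Hypothetical IVORNs, differing only in their fragments.
uris = [
    "ivo://example.org/stream",
    "ivo://example.org/stream#event1",
    "ivo://example.org/stream#event2",
]

# What a conformant client would actually dereference for each of them.
primary_resources = {urldefrag(uri).url for uri in uris}
print(primary_resources)
```

All three URIs collapse to a single retrievable resource.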

See section affected for a note on affected IVOA Standards.

This is not merely a theoretical problem, for three reasons.

Issue 1: scheme handlers may not report the fragment

One can imagine a URI API which allows for scheme-specific handlers (e.g. for vos: or ivo:), in the way that the java.net.URL class does. Such a handler class's API could potentially be constructed in such a way that the handler code couldn't get access to the fragment part of the parsed URI. This would completely destroy the functionality of a custom handler for ivo: URIs which included significant fragments. And this would not be a bug in the API.

The java.net.URLStreamHandler abstract class is not in fact constructed in this way, but this is no guarantee that a different class, in this or a different language, won't act in the same inconvenient fashion.
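To make the concern concrete, here is a minimal sketch of such an inconvenient but conformant API. It is entirely hypothetical: the HANDLERS registry and the register, open_uri and ivo_handler functions are invented for illustration and are not modelled on any real library.

```python
from urllib.parse import urlsplit

# Hypothetical registry of scheme handlers.
HANDLERS = {}

def register(scheme, handler):
    """Associate a handler function with a URI scheme."""
    HANDLERS[scheme] = handler

def open_uri(uri):
    """Dispatch to a scheme handler, passing only the authority and path."""
    parts = urlsplit(uri)
    handler = HANDLERS[parts.scheme]
    # parts.fragment is deliberately not passed on: the URI specification
    # reserves it for the client to resolve after retrieval.
    return handler(parts.netloc, parts.path)

def ivo_handler(authority, path):
    # This handler has no way to learn which fragment, if any, was present.
    return f"resolving {authority}{path}"

register("ivo", ivo_handler)
with_fragment = open_uri("ivo://foo/bar#baz")
without_fragment = open_uri("ivo://foo/bar")
print(with_fragment)
print(without_fragment)
```

The handler receives identical arguments in both cases, so any behaviour which depends on the fragment is impossible to implement within it.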

Issue 3: URIs won't last forever

The third point is the longest-term point, and may not be so easily worked around.

At some point – perhaps in a decade, perhaps longer – there will be a replacement standard for addressing things on the web (or whatever replaces it). As the web's core addressing technology, URIs are so important that there will certainly be a mechanism for mapping URIs to the new standard, supported by gateways or proxies of some type. At this point, using an HTTP proxy will not be optional, if the IVOA is to remain reasonably consistent with the rest of the world.

Whatever technology finally replaces URIs as an addressing mechanism will have a lot of work invested in it, to make sure that the old and new mechanisms are compatible. The gateways implementing this mapping cannot be guaranteed to be friendly to URI schemes which depend on behaviour which the URI specification declares must not happen.

Recommendations

This Note makes the following recommendations:

  1. IVOA protocols should not use URI fragments other than in a context in which (a) the fragment is being used as a name for an object which is not expected to be retrieved, or (b) there is an implication that the object so named will be retrieved in the way which is implied by the URI model.
  2. If a resource named by a standard-specified URI will ever be retrieved, then to avoid doubt the standard should explicitly note that the fragment processing is expected to be performed by the client.

References

