<?xml version="1.0"?>
<!-- $Id:$
Note that this file should be xhtml with div to mark sections - see README for more information
Paul Harrison -->
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "ivoadoc/xmlcatalog/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>URI fragments in IVOA specifications</title>
<meta name="DC.Title" content="URI fragments in IVOA specifications" />
<meta name="DC.author" content="Norman Gray, norman@astro.gla.ac.uk" />
<meta name="DC.maintainedBy" content="Norman Gray, norman@astro.gla.ac.uk" />
<link href="http://www.ivoa.net/misc/ivoa_a.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" href="http://www.ivoa.net/misc/ivoa_note.css" type="text/css" />
<!-- Add other styling information here (but this element, if present, mustn't be empty)
<style type="text/css"></style>
-->
<link href="XMLPrint.css" rel="stylesheet" type="text/css" />
<link href="ivoa-extras.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div class="head">
<div id="titlehead" style="position:relative;height:170px;width: 500px">
<div id="logo" style="position:absolute;width:300px;height:169px;left: 50px;top: 0px;">
<img src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" alt="IVOA logo"/></div>
<div id="logo-title"
style="position: absolute; width: 200px; height: 115px; left: 320px; top: 5px; font-size: 14pt; color: #005A9C; font-style: italic;">
<p style='position: absolute; left: 0px; top: 0px;'><span style='font-weight: bold;'>I</span> nternational</p>
<p style='position: absolute; left: 15pt; top: 25pt;'><span style='font-weight: bold;'>V</span> irtual</p>
<p style='position: absolute; left: 15pt; top: 50pt;'><span style='font-weight: bold;'>O</span> bservatory</p>
<p style='position: absolute; left: 0px; top: 75pt;'><span style='font-weight: bold;'>A</span> lliance</p>
</div>
</div>
<h1>URI fragments in IVOA specifications<br/>
Version <span class="docversion">0.1</span></h1>
<h2 class="subtitle">Filled in automatically</h2>
<dl>
<dt>Working Group</dt>
<dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/Semantics">Semantics</a></dd>
<dt><b>This version:</b></dt>
<dd><a href="" class="currentlink">filled in automatically</a></dd>
<dt><b>Latest version:</b></dt>
<dd><a href='' class='latestlink'>filled in automatically</a></dd>
<dt><b>Previous version(s):</b></dt>
<dd>None</dd>
<dt><b>Author(s):</b></dt>
<dd>Norman Gray</dd>
</dl>

<h2>Abstract</h2>
<p>The <q>fragment</q> identifier in a URI has a specific semantics
attached to it. IVOA specifications should therefore <em>not</em> use
it as a simple indicator of hierarchy or <q>containment</q>.</p>
<h2> Status of This Document</h2>
<p >This is an author's draft.
It has no IVOA standing as such, but will be submitted as a Note to
the IVOA documents series once it has received some feedback.</p>
<p id="statusdecl">(updated automatically)</p>
<p> <em >A list of </em><span style="background: transparent"><a href="http://www.ivoa.net/Documents/"><i>current
IVOA Recommendations and other technical documents</i></a></span><em > can be found at http://www.ivoa.net/Documents/.</em></p>
<h2 class="prologue-heading-western" >Acknowledgements</h2>
<p>The author is most grateful for comments and criticisms received from Guy Rixon, Mark Taylor, Markus Demleitner, and Dick Shaw.</p>
</div>
<h2>Contents</h2>
<div>
<?toc ?>
</div>
<div class="body">
<div class="section">
<h1><a id="intro"></a>Introduction</h1>

<p>URIs are defined in IETF RFC 3986 <cite>std:rfc3986</cite>. In its
full generality, the syntax of URIs is quite complicated, but most of
the URIs we commonly see use only a subset of the possible features, namely a
<q>scheme</q> (which is usually <code>http</code> or sometimes, in VO contexts,
<code>ivo</code>), a <q>host</q> prefixed by a pair of slashes
<code>//</code>, a <q>path</q> with elements separated by single
slashes <code>/</code>, and a possible fragment, separated from the
rest of the URI by a hash or number sign, <code>#</code>. The point of
this note is to stress that the fragment is importantly
distinct from the other parts of the URI: it is not sent over the
network to a remote server when the URI is retrieved or
dereferenced.</p>
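<p>As a minimal illustration of this point, the following sketch uses
Python's standard <code>urllib.parse</code> module to split an example
URL into the components just listed. The helper
<code>request_target</code> is a hypothetical name introduced here for
illustration only; it shows what an HTTP client would place in its
request line, which does not include the fragment.</p>

```python
from urllib.parse import urlsplit


def request_target(uri):
    """Return the path (plus any query) that an HTTP client would send.

    Hypothetical helper, for illustration only: the fragment is
    deliberately absent, since it forms no part of the request.
    """
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


parts = urlsplit("http://www.ivoa.net/Documents/#notes")
print(parts.scheme)    # http
print(parts.netloc)    # www.ivoa.net
print(parts.path)      # /Documents/
print(parts.fragment)  # notes

# The fragment was parsed by the client, but is not part of the request.
print(request_target("http://www.ivoa.net/Documents/#notes"))  # /Documents/
```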

<p>When a web browser is given a URL – for example
<code>http://www.ivoa.net/Documents/#notes</code> – it
retrieves the path <code>/Documents/</code> from the server at
<code>www.ivoa.net</code> and, once it has retrieved the HTML page that
comes back, it <em>searches within the page</em> for the anchor
labelled with <code>notes</code>. Crucially, this search happens
entirely on the client side, and it or its analogue happens during the
processing of <em>any</em> URI – it is not specific to HTTP or to HTML
pages. It therefore also applies to IVORN URIs (starting
<code>ivo:</code>) <cite>std:ivo</cite> and VOSpace URIs (starting
<code>vos:</code>) <cite>std:vospace</cite>.</p>

<p>In brief: The <q>fragment</q> identifier in a URI (RFC 3986,
<cite>std:rfc3986</cite>) has a specific semantics
attached to it. IVOA specifications should therefore <em>not</em> use
it as a simple indicator of hierarchy or <q>containment</q>. Or, put
another way: punctu–ation,isn#t ju`st !dec$ora/tion.</p>

<p>This document is not intended to be a comprehensive survey of
recommended and deprecated URL patterns. We note, however, that quite
a lot of the suggestions in the famous <a
href='http://www.w3.org/Provider/Style/URI'>Cool URIs don't change</a>
document are as valid now as they were in 1998.</p>

</div>

<div class='section'>
<h1>The problem with fragments</h1>

<p>Several IVOA standards define URI patterns for the objects they
describe – the VOEvent and VOSpace standards are examples. In this
context, it is natural to use the URI fragment as a way of referring
to a resource which is conceptually <em>contained within</em> another,
by analogy with the way that the fragments in HTML pages are
conceptually within the page. Unfortunately, the fixed and invariable
meaning attached to URI fragments means that applications which process such
URIs may be required by the (IETF RFC) standard to process them in
ways which the IVOA standards did not intend. If applications,
guided by an IVOA standard, do not process URIs in a conformant way,
then those applications risk being frustrated by
conformant library APIs, by caches, and by future developments in URI
standards themselves.</p>

<p>The rest of this section is a detailed discussion of the problem,
with a rather legalistic tone, in terms which presume some
acquaintance with the details of the URI specification
<cite>std:rfc3986</cite>.</p>

<p>The fundamental problem with URI formats such as
<code>scheme:foo#local_ID</code> is that the specification for URIs
<cite>std:rfc3986</cite> requires that the fragment (the
<code>#local_ID</code>) be removed prior to any dereference –
<q>the fragment identifier is separated from the rest of the URI prior
to a dereference</q> (this and other quotations here are from section 3.5 of
the URI RFC). Other language in that section makes it clear
that the fragment has a special, and secondary, status (<q>[t]he
fragment identifier component of a URI allows indirect identification
of a secondary resource by reference to a primary resource and
additional identifying information</q>) and that this
is independent of the scheme: <q>[f]ragment
identifier semantics are independent of the URI scheme and thus cannot
be redefined by scheme specifications</q>.</p>

<p>Further, <q>the fragment identifier is not used in the
scheme-specific processing of a URI</q>. In order to
conform to the URI specification, therefore, the processing of the
<code>ivo:</code> URI scheme must ignore the fragment. That is,
whenever an IVORN <code>ivo://foo/bar#baz</code> is <q>processed</q>
(or in general used in any way other than as a name in the
<code>ivo://foo/bar</code> namespace), that processing must be done on
the IVORN <code>ivo://foo/bar</code> alone, with the
<code>#baz</code> fragment taken into account only after retrieval is complete.</p>

<p>Another way of phrasing this is that there is no guarantee that a
server will <q>see</q> the fragment in any URI, since any of the possibly
multiple intermediaries between the client and the server is
licensed to remove it (nor, incidentally, is there any guarantee that
a server will <em>not</em> see the fragment).</p>

<p>The intention of the URI specification is that such a URI is
conceptually handled by the client stripping the fragment, processing
the resulting cropped URI, and then resolving the fragment within
whatever was retrieved, <em>on the client</em>.</p>
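<p>The three conceptual steps just described can be sketched as
follows. This is an illustration under stated assumptions, not a
prescribed implementation: <code>dereference</code>,
<code>toy_fetch</code> and <code>toy_find</code> are hypothetical
names, and a dictionary stands in for the network. Only
<code>urldefrag</code> is a real function, from Python's standard
<code>urllib.parse</code> module.</p>

```python
from urllib.parse import urldefrag


def dereference(uri, fetch, find_fragment):
    """Dereference `uri` in the way the URI model implies.

    `fetch` and `find_fragment` are hypothetical callables supplied by
    the caller: `fetch` retrieves a primary resource, and
    `find_fragment` locates a secondary resource inside it.
    """
    primary, fragment = urldefrag(uri)   # 1. strip the fragment
    representation = fetch(primary)      # 2. retrieve the primary resource
    if not fragment:
        return representation
    return find_fragment(representation, fragment)  # 3. resolve locally


# A toy stand-in for the network: one primary resource with two parts.
requested = []
store = {"ivo://foo/bar": {"baz": "first part", "qux": "second part"}}


def toy_fetch(primary_uri):
    requested.append(primary_uri)  # record what the "server" was asked for
    return store[primary_uri]


def toy_find(representation, fragment):
    return representation[fragment]


result = dereference("ivo://foo/bar#baz", toy_fetch, toy_find)
print(result)     # first part
print(requested)  # ['ivo://foo/bar']
```

<p>In this sketch the <q>server</q> is only ever asked for
<code>ivo://foo/bar</code>; the fragment plays its part after the
retrieval has finished.</p>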

<p>In the VOEvent specification, however, <code>.../streamid</code> and
<code>.../streamid#local_ID</code> are conceived as completely
independent resources, contrary to the prescriptions in the URI
RFC.</p>

<p>See section <span class='xref'>affected</span> for a note on
affected IVOA Standards.</p>

<p>This is not merely a theoretical problem, for three reasons.</p>

<div class='section'>
<h1>Issue 1: scheme handlers may not report the fragment</h1>
<p>One can imagine a URI API which allows for scheme-specific
handlers (e.g. for <code>vos:</code> or <code>ivo:</code>), in the way
that the <code>java.net.URL</code> class does. Such a handler
class's API could potentially be constructed in such a way that the
handler code couldn't get access to the fragment part of the parsed
URI. This would completely destroy the functionality of a custom
handler for <code>ivo:</code> URIs which included significant fragments. And
<em>this would not be a bug</em> in the API.</p>

<p>The <code>java.net.URLStreamHandler</code> abstract class is not in fact
constructed in this way, but this is no guarantee that a different
class, in this or a different language, won't act in this
inconvenient fashion.</p>
</div>

<div class='section'>
<h1><a id='i2'/>Issue 2: servers (including caches) may equate URIs with and without fragments</h1>
<p>When a cache is asked for
<code>scheme:path#fragment</code>, it should simply return the content of
<code>scheme:path</code> since, according to the URI spec, and for
<em>any</em> scheme, these are equivalent in this context. Indeed,
any <code>ivo:</code> cache is <em>required</em> to behave like
this (RFC section 6.1: <q>When URIs are compared to select (or avoid)
a network action, such as retrieval of a representation, fragment
components (if any) should be excluded from the comparison.</q>).
That is, if a user-agent were to ask a proxy or cache for
<code>ivo://auth/obj#frag</code>, it should receive the contents of
<code>ivo://auth/obj</code>.</p>
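<p>A minimal sketch of a cache which behaves in this way is shown
below. The class <code>FragmentBlindCache</code> and the function
<code>toy_fetch</code> are hypothetical, introduced only to illustrate
the comparison rule quoted above; real caches are of course more
elaborate.</p>

```python
from urllib.parse import urldefrag


class FragmentBlindCache:
    """Toy cache which excludes fragments when comparing URIs."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical callable retrieving a URI
        self._entries = {}

    def get(self, uri):
        key = urldefrag(uri).url  # the URI with any fragment removed
        if key not in self._entries:
            self._entries[key] = self._fetch(key)
        return self._entries[key]


fetched = []


def toy_fetch(uri):
    fetched.append(uri)  # record each retrieval which reaches the origin
    return "contents of " + uri


cache = FragmentBlindCache(toy_fetch)
with_fragment = cache.get("ivo://auth/obj#frag")
without_fragment = cache.get("ivo://auth/obj")
print(with_fragment == without_fragment)  # True
print(fetched)                            # ['ivo://auth/obj']
```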

<p>This, too, is <em>not a bug</em> in the cache.</p>

<p>Superficially, it seems that these two problems can be evaded:
don't use scheme-specific handlers, and don't use proxies or caches;
or more generally, avoid tools which conform to the demands of the URI
specification. Depending on the local network environment, however,
user-agents may be obliged to use caches. For non-HTTP URIs this is
unlikely in (current) practice, but it may become unavoidable in
future, for the following reason.</p>
</div>

<div class='section'>
<h1>Issue 3: URIs won't last forever</h1>
<p>The third issue is the longest-term one, and may not
be so easily worked around.</p>

<p>At some point – perhaps in
a decade, perhaps longer – there will be a replacement standard
for addressing things on the web (or whatever replaces it).
As the web's core addressing technology, URIs are so important that
there will certainly be a mechanism for mapping URIs
to the new standard, supported by gateways or proxies of some type.
At this point, using a URI proxy will not be optional, if the IVOA
is to remain reasonably consistent with the rest of the world.</p>

<p>Whatever technology finally replaces URIs as an addressing mechanism
will have a lot of work invested in it, to make sure the two are compatible.
The gateways implementing this mapping cannot be guaranteed to be friendly to URI schemes
which depend on behaviour which the URI specification declares must
not happen.</p>
</div>

<div class='section'>
<h1><a id='namesok'/>Non-problem: URIs as names</h1>

<p>We do not wish to suggest that fragments should be avoided in
general; there are plenty of cases where they are perfectly
appropriate. In the best-known use, to provide a direct link to
elements within an HTML page, fragments are useful and
unexceptionable; and when a fragment is used to create a <em>name</em>
for something, as is done within the Standards Registry Extension, or
in many Semantic Web use-cases, that is a useful and increasingly
common technique which provides natural namespacing.</p>

<p>The Standards Registry Extension specification
<cite>std:stdregext</cite> uses URIs as names: for example
<code>ivo://ivoa.net/std/QueryProtocol#case-insensitive</code>. Here,
there’s no suggestion that the <code>#case-insensitive</code> <q>thing</q>
is a differently-retrieved resource – it is simply a
<em>name</em>, and the non-fragment part of the URI is merely acting
as a type of namespace. This goes <em>with</em> the grain of the URI
definition.</p>
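<p>To illustrate the contrast, the sketch below treats such a URI
purely as a name: it is split into a namespace and a local name, and
compared with other names as a plain string, with nothing being
retrieved. The function <code>split_name</code> and the set
<code>declared</code> are hypothetical, introduced only for this
illustration.</p>

```python
from urllib.parse import urldefrag


def split_name(name_uri):
    """Split a URI used as a name into (namespace, local name).

    Hypothetical helper: no retrieval is attempted at any point.
    """
    namespace, local_name = urldefrag(name_uri)
    return namespace, local_name


name = "ivo://ivoa.net/std/QueryProtocol#case-insensitive"
namespace, local_name = split_name(name)
print(namespace)   # ivo://ivoa.net/std/QueryProtocol
print(local_name)  # case-insensitive

# Names are compared as plain strings; the network is never involved.
declared = {"ivo://ivoa.net/std/QueryProtocol#case-insensitive"}
print(name in declared)  # True
```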

<p>At the risk of belabouring the point, the difference between this and the VOEvent case is
that in the VOEvent case there is the clear implication that a VOEvent identifier
<code>stream#event</code> is not merely a name for an event, but is
expected to be retrievable directly, rather than being accessed
by downloading the entire stream and <q>searching</q> locally for the
secondary resource <code>#event</code>. There is a similar situation,
mutatis mutandis, when the VOSpace specification talks of accessing nodes.</p>
</div>

<div class='section'>
<h1><a id='affected'/>Affected IVOA standards</h1>

<p>VOEvent identifiers have the form
<code>ivo://example.org/streamid#local_ID</code> (see section 2.2 of
<cite>std:voevent</cite>). The URI RFC requires that this be resolved
by retrieving the resource <code>ivo://example.org/streamid</code> and
finding <code>#local_ID</code> within it, but the VOEvent specification
indicates that the resources <code>ivo://example.org/streamid</code>
and <code>ivo://example.org/streamid#local_ID</code> might be
retrieved independently.</p>

<p>The text of the VOSpace specification <cite>std:vospace</cite>
principally illustrates URI fragments being used as property names;
this is unproblematic for the reasons discussed above (Sect. <span class='xref'>namesok</span>). However, the
specification also describes URIs in a <code>vos:</code> scheme
(implicitly and explicitly including fragments) as names for VOSpace
nodes, and describes these being retrieved to obtain the node
contents. Depending on how this retrieval is done, this dereferencing
procedure might be adversely affected by the issues described in this Note.</p>

<p>Other IVOA specifications which discuss URIs with fragments may
need to be examined, to discover whether they are also unwittingly
depending on unsupported behaviour.</p>

</div>
</div>

<div class='section'>
<h1>Recommendations</h1>
<p>This Note makes the following recommendations:</p>
<ol>
<li>IVOA protocols should not use URI fragments
other than in a context in which (a) the fragment is being used as
a name for an object which is not expected to be retrieved, or (b)
there is an implication that the object so named will be retrieved
in the way which is implied by the URI model.</li>
<li>If a resource named by a standard-specified URI will ever be
retrieved, then to avoid doubt the standard should explicitly note
that the fragment processing is expected to be performed by the client.</li>
</ol>
</div>

</div><!--body-->

<div class="section-nonum">
<h1 ><a name="References" id="References"></a>References</h1>
<?bibliography note ?>
</div>
<hr/>
<p style="text-align: right; font-size: x-small; color: #888;">
<a href='http://code.google.com/p/volute' >Volute</a>
$Revision$ $Date$
</p>

</body>
</html>