Contents of /trunk/projects/note-urifragments/uri-fragments.html

Revision 1775 - (show annotations)
Fri May 25 17:32:48 2012 UTC (8 years, 11 months ago) by norman.x.gray@gmail.com
File MIME type: text/html
File size: 16252 byte(s)
Tidyups for v1.0, to be posted on ivoa.net
<?xml version="1.0"?>
<!-- $Id:$
Note that this file should be xhtml with div to mark sections - see README for more information
Paul Harrison -->
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "ivoadoc/xmlcatalog/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>URI fragments in IVOA specifications</title>
<meta name="DC.Title" content="URI fragments in IVOA specifications" />
<meta name="DC.author" content="Norman Gray, norman@astro.gla.ac.uk" />
<meta name="DC.maintainedBy" content="Norman Gray, norman@astro.gla.ac.uk" />
<link href="http://www.ivoa.net/misc/ivoa_a.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" href="http://www.ivoa.net/misc/ivoa_note.css" type="text/css" />
<!-- Add other styling information here (but this element, if present, mustn't be empty)
<style type="text/css"></style>
-->
<link href="XMLPrint.css" rel="stylesheet" type="text/css" />
<link href="ivoa-extras.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div class="head">
<div id="titlehead" style="position:relative;height:170px;width: 500px">
<div id="logo" style="position:absolute;width:300px;height:169px;left: 50px;top: 0px;">
<img src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" alt="IVOA logo"/></div>
<div id="logo-title"
style="position: absolute; width: 200px; height: 115px; left: 320px; top: 5px; font-size: 14pt; color: #005A9C; font-style: italic;">
<p style='position: absolute; left: 0px; top: 0px;'><span style='font-weight: bold;'>I</span> nternational</p>
<p style='position: absolute; left: 15pt; top: 25pt;'><span style='font-weight: bold;'>V</span> irtual</p>
<p style='position: absolute; left: 15pt; top: 50pt;'><span style='font-weight: bold;'>O</span> bservatory</p>
<p style='position: absolute; left: 0px; top: 75pt;'><span style='font-weight: bold;'>A</span> lliance</p>
</div>
</div>
<h1>URI fragments in IVOA specifications<br/>
Version <span class="docversion">0.1</span></h1>
<h2 class="subtitle">Filled in automatically</h2>
<dl>
<dt>Working Group</dt>
<dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/Semantics">Semantics</a></dd>
<dt><b>This version:</b></dt>
<dd><a href="" class="currentlink">filled in automatically</a></dd>
<dt><b>Latest version:</b></dt>
<dd><a href='' class='latestlink'>filled in automatically</a></dd>
<dt><b>Previous version(s):</b></dt>
<dd>None</dd>
<dt><b>Author(s):</b></dt>
<dd>Norman Gray</dd>
</dl>
<h2>Abstract</h2>
<p>The <q>fragment</q> identifier in a URI has a specific semantics
attached to it. IVOA specifications should therefore <em>not</em> use
it as a simple indicator of hierarchy or <q>containment</q>.</p>
<h2> Status of This Document</h2>
<p >This is an author's draft.
It has no IVOA standing as such, but will be submitted as a Note to
the IVOA documents series once it has received some feedback.</p>
<p id="statusdecl">(updated automatically)</p>
<p> <em >A list of </em><span style="background: transparent"><a href="http://www.ivoa.net/Documents/"><i>current
IVOA Recommendations and other technical documents</i></a></span><em > can be found at http://www.ivoa.net/Documents/.</em></p>
<h2 class="prologue-heading-western" >Acknowledgements</h2>
<p>The author is most grateful for comments and criticisms received from Guy Rixon, Mark Taylor, Markus Demleitner, and Dick Shaw.</p>
</div>
<h2>Contents</h2>
<div>
<?toc ?>
</div>
<div class="body">
<div class="section">
<h1><a id="intro"></a>Introduction</h1>
<p>URIs are defined in IETF RFC 3986 <cite>std:rfc3986</cite>. In its
full generality, the syntax of URIs is quite complicated, but most of
the URIs we commonly see use only a subset of the possible features, namely a
<q>scheme</q> (which is usually <code>http</code> or sometimes, in VO contexts,
<code>ivo</code>), a <q>host</q> prefixed by a pair of slashes
<code>//</code>, a <q>path</q> with elements separated by single
slashes <code>/</code>, and a possible fragment, separated from the
rest of the URI by a hash or number sign, <code>#</code>. The point of
this present note is to stress that the fragment is importantly
distinct from the other parts of the URI: it is not sent over the
network to a remote server, when the URI is retrieved or
dereferenced.</p>
<p>When looking at a webpage in a web browser – for example the URL
<code>http://www.ivoa.net/Documents/#notes</code> – the browser
retrieves the path <code>/Documents/</code> from the server at
<code>www.ivoa.net</code> and, once it has retrieved the HTML page that
comes back, it <em>searches within the page</em> for the anchor
labelled with <code>notes</code>. Crucially, this search happens
entirely on the client side, and it or its analogue happens during the
processing of <em>any</em> URI – it is not specific to HTTP or to HTML
pages. It also therefore applies to IVORN URIs (starting
<code>ivo:</code>) <cite>std:ivo</cite> and VOSpace URIs (starting
<code>vos:</code>) <cite>std:vospace</cite>.</p>
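<p>As a minimal illustration of the split described above, the following
Python sketch uses the standard library's <code>urllib.parse</code> module
to separate the example URL into its components. The variable names are
chosen for this illustration only; the sketch shows which part a client
would send to the server and which part it would keep for itself, and is
not a description of how any particular browser is implemented.</p>

```python
from urllib.parse import urlsplit, urldefrag

uri = "http://www.ivoa.net/Documents/#notes"

# Split the URI into the components named in the text.
parts = urlsplit(uri)
components = {
    "scheme": parts.scheme,
    "host": parts.netloc,
    "path": parts.path,
    "fragment": parts.fragment,
}

# The part used for retrieval is everything except the fragment;
# the fragment stays with the client for the later search in the page.
request_uri, fragment = urldefrag(uri)

print(components)
print("sent to server:", request_uri)
print("kept on client:", fragment)
```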
<p>In brief: The <q>fragment</q> identifier in a URI (RFC 3986,
<cite>std:rfc3986</cite>) has a specific semantics
attached to it. IVOA specifications should therefore <em>not</em> use
it as a simple indicator of hierarchy or <q>containment</q>. Or, put
another way: punctu–ation,isn#t ju`st !dec$ora/tion.</p>
<p>This document is not intended to be a comprehensive survey of
recommended and deprecated URL patterns. We note, however, that quite
a lot of the suggestions in the famous <a
href='http://www.w3.org/Provider/Style/URI'>Cool URIs don't change</a>
document are as valid now as they were in 1998.</p>
</div>
<div class='section'>
<h1>The problem with fragments</h1>
<p>Several IVOA standards define URI patterns for the objects they
describe – the VOEvent and VOSpace standards are examples. In this
context, it is natural to use the URI fragment as a way of referring
to a resource which is conceptually <em>contained within</em> another,
by analogy with the way that the fragments in HTML pages are
conceptually within the page. Unfortunately, the fixed and invariable
meaning attached to URI fragments means that the applications which process such
URIs may be required by the (IETF RFC) standard to process them in
ways which may be unintended by the IVOA standards. If applications,
guided by an IVOA standard, do not process URIs in a conformant way,
then we are concerned that those applications will risk being frustrated by
conformant library APIs, by caches, and by future developments in URI
standards themselves.</p>
<p>The rest of this section is a detailed discussion of the problem,
with a rather legalistic tone, in terms which presume some
acquaintance with the details of the URI specification
<cite>std:rfc3986</cite>.</p>
<p>The fundamental problem with URI formats such as
<code>scheme:foo#local_ID</code> is that the specification for URIs
<cite>std:rfc3986</cite> requires that the fragment (the
<code>#local_ID</code>) is removed prior to any dereference &ndash;
<q>the fragment identifier is separated from the rest of the URI prior
to a dereference</q> (this and other quotations here are from section 3.5 of
the URI RFC). Other language in this section makes it clear
that the fragment has a special, and secondary, status (<q>[t]he
fragment identifier component of a URI allows indirect identification
of a secondary resource by reference to a primary resource and
additional identifying information</q>) and that this
is independent of the scheme: <q>[f]ragment
identifier semantics are independent of the URI scheme and thus cannot
be redefined by scheme specifications</q>.</p>
<p>Further, <q>the fragment identifier is not used in the
scheme-specific processing of a URI</q>. This means that in order to
conform to the URI specification, the processing of the
<code>ivo:</code> URI scheme must ignore the fragment. Consequently,
whenever an IVORN <code>ivo://foo/bar#baz</code> is <q>processed</q>
(or in general used in any way other than as a name in the
<code>ivo://foo/bar</code> namespace), that processing must be done on
the IVORN <code>ivo://foo/bar</code> alone, and the presence of the
<code>#baz</code> fragment taken account of only after retrieval is complete.</p>
<p>Another way of phrasing this is that there is no guarantee that a
server will <q>see</q> the fragment in any URI, since any of possibly
multiple intermediaries between the client and the server will be
licensed to remove it (nor, incidentally, is there any guarantee that
a server will <em>not</em> see the fragment).</p>
<p>The intention of the URI specification is that such a URI is
conceptually handled by the client stripping the fragment, processing
the resulting cropped URI, and then resolving the fragment, in some
scheme-specific way, <em>on the client</em>.</p>
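<p>The three conceptual steps (strip the fragment, process the cropped
URI, resolve the fragment on the client) can be sketched in Python as
follows. The function <code>dereference</code>, the in-memory
<code>STORE</code> and the two callbacks are hypothetical stand-ins
invented for this illustration; they do not correspond to any real IVOA
service or library.</p>

```python
from urllib.parse import urldefrag

def dereference(uri, fetch, resolve_fragment):
    """Dereference a URI in the order the URI model implies."""
    # Step 1: separate the fragment from the rest of the URI.
    base, fragment = urldefrag(uri)
    # Step 2: retrieve the primary resource using the cropped URI only.
    representation = fetch(base)
    # Step 3: resolve the fragment locally, within what was retrieved.
    if not fragment:
        return representation
    return resolve_fragment(representation, fragment)

# Hypothetical stand-in for a remote service: it only knows primary resources.
STORE = {"ivo://foo/bar": {"baz": "secondary resource", "qux": "another one"}}
seen_by_server = []

def fetch(base_uri):
    seen_by_server.append(base_uri)
    return STORE[base_uri]

def resolve_fragment(representation, fragment):
    return representation[fragment]

result = dereference("ivo://foo/bar#baz", fetch, resolve_fragment)
print(result)
print(seen_by_server)
```

<p>In this sketch the stand-in server is only ever asked for
<code>ivo://foo/bar</code>; the <code>baz</code> part is looked up in the
retrieved representation by the client.</p>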
<p>In the VOEvent spec, however, <code>.../streamid</code> and
<code>.../streamid#local_ID</code> are conceived as completely
independent resources, contrary to the prescriptions in the URI
RFC.</p>
<p>See section <span class='xref'>affected</span> for a note on
affected IVOA Standards.</p>
<p>This is not merely a theoretical problem, for three reasons.</p>
<div class='section'>
<h1>Issue 1: scheme handlers may not report the fragment</h1>
<p>One can imagine a URI API which allows for scheme-specific
handlers (e.g. for <code>vos:</code> or <code>ivo:</code>), in the way
that the <code>java.net.URL</code> class does. Such a handler
class's API could potentially be constructed in such a way that the
handler code couldn't get access to the fragment part of the parsed
URI. This would completely destroy the functionality of a custom
handler for <code>ivo:</code> URIs which included significant fragments. And
<em>this would not be a bug</em> in the API.</p>
<p>The <code>java.net.URLStreamHandler</code> abstract class is not in fact
constructed in this way, but this is no guarantee that a different
class, in this or a different language, won't act in this
inconvenient fashion.</p>
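<p>To make the hypothetical concrete, here is a small Python sketch of a
dispatcher that hands each scheme handler only the fragment-free URI. The
names <code>open_uri</code> and <code>ivo_handler</code> are invented for
this illustration and do not describe any existing API; the point is only
that such a design would be conformant, and that a handler written
against it could never see <code>#baz</code>.</p>

```python
from urllib.parse import urlsplit, urldefrag

def open_uri(uri, handlers):
    """Dispatch to a scheme handler, passing it only the fragment-free URI."""
    base, _fragment = urldefrag(uri)
    handler = handlers[urlsplit(base).scheme]
    # The fragment is deliberately not passed on: scheme-specific
    # processing is not supposed to depend on it.
    return handler(base)

received = []

def ivo_handler(base_uri):
    # A hypothetical handler for the ivo: scheme; it records what it was given.
    received.append(base_uri)
    return "representation of " + base_uri

result = open_uri("ivo://foo/bar#baz", {"ivo": ivo_handler})
print(received)
```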
</div>
<div class='section'>
<h1><a id='i2'/>Issue 2: servers (including caches) may equate URIs with and without fragments</h1>
<p>When a cache is asked for
<code>scheme:path#fragment</code>, it should simply return the content of
<code>scheme:path</code> since, according to the URI spec, and for
<em>any</em> scheme, these are equivalent in this context. Indeed,
any <code>ivo:</code> cache is <em>required</em> to behave like
this (RFC section 6.1: <q>When URIs are compared to select (or avoid)
a network action, such as retrieval of a representation, fragment
components (if any) should be excluded from the comparison.</q>).
That is, if a user-agent were to ask a proxy or cache for
<code>ivo://auth/obj#frag</code>, it should receive the contents of
<code>ivo://auth/obj</code>.</p>
<p>This also is <em>not a bug</em> in the cache.</p>
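<p>A minimal Python sketch of a cache that behaves in this way is shown
below. The class <code>FragmentBlindCache</code> and the
<code>origin_fetch</code> function are hypothetical names used only for
this illustration; real caches are considerably more elaborate, but the
comparison rule is the same: the fragment is excluded from the key.</p>

```python
from urllib.parse import urldefrag

class FragmentBlindCache:
    """A toy cache whose keys exclude the fragment component."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._entries = {}

    def get(self, uri):
        # Fragments are excluded when comparing URIs to select a retrieval.
        key, _fragment = urldefrag(uri)
        if key not in self._entries:
            self._entries[key] = self._fetch(key)
        return self._entries[key]

origin_requests = []

def origin_fetch(uri):
    # Hypothetical stand-in for the origin server.
    origin_requests.append(uri)
    return "contents of " + uri

cache = FragmentBlindCache(origin_fetch)
with_fragment = cache.get("ivo://auth/obj#frag")
without_fragment = cache.get("ivo://auth/obj")
print(with_fragment == without_fragment)
print(origin_requests)
```

<p>Both requests are answered from the same entry, and the stand-in
origin server is contacted once, for <code>ivo://auth/obj</code>.</p>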
<p>Superficially, it seems that these two problems can be evaded:
don't use scheme-specific handlers, and don't use proxies or caches;
or more generally, avoid tools which conform to the demands of the URI
specification. Depending on the local network environment, however,
user-agents may be obliged to use caches; this is unlikely in
(current) practice, in the case of non-HTTP URIs, but this may not be
avoidable in future for the following reason.</p>
</div>
<div class='section'>
<h1>Issue 3: URIs won't last forever</h1>
<p>The third point is the longest-term point, and may not
be so easily worked around.</p>
<p>At some point &ndash; perhaps in
a decade, perhaps longer &ndash; there will be a replacement standard
for addressing things on the web (or whatever replaces it).
As the web's core addressing technology, URIs are so important that
there will certainly be a mechanism for mapping URIs
to the new standard, supported by gateways or proxies of some type.
At this point, using a URI proxy will not be optional, if the IVOA
is to remain reasonably consistent with the rest of the world.</p>
<p>Whatever technology finally replaces URIs as an addressing mechanism
will have a lot of work invested in it, to make sure the two are compatible.
The gateways implementing this mapping cannot be guaranteed to be friendly to URI schemes
which depend on behaviour which the URI specification declares must
not happen.</p>
</div>
<div class='section'>
<h1><a id='namesok'/>Non-problem: URIs as names</h1>
<p>We do not wish to suggest that fragments should be avoided in
general; there are plenty of cases where they are perfectly
appropriate. In the best-known use, to provide a direct link to
elements within an HTML page, fragments are useful and
unexceptionable; and when a fragment is used to create a <em>name</em>
for something, as is used within the Standards Registry Extension, or
in many Semantic Web use-cases, that is a useful and increasingly
common technique which provides natural namespacing.</p>
<p>The Standards Registry Extension specification
<cite>std:stdregext</cite> uses URIs as names: for example
<code>ivo://ivoa.net/std/QueryProtocol#case-insensitive</code>. Here,
there’s no suggestion that the <code>#case-insensitive</code> <q>thing</q>
is a differently-retrieved resource &ndash; it is simply a
<em>name</em>, and the non-fragment part of the URI is merely acting
as a type of namespace. This goes <em>with</em> the grain of the URI
definition.</p>
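<p>The use of such a URI purely as a name can be sketched in Python as
follows. The helper <code>split_name</code> and the set
<code>SUPPORTED</code> are hypothetical and exist only for this
illustration; what matters is that the URI is split and compared as a
string, and that nothing is retrieved at any point.</p>

```python
from urllib.parse import urldefrag

def split_name(uri):
    """Split a URI used as a name into (namespace, local name)."""
    namespace, local_name = urldefrag(uri)
    return namespace, local_name

name = "ivo://ivoa.net/std/QueryProtocol#case-insensitive"
namespace, local_name = split_name(name)

# Names are compared as plain strings; no dereference takes place.
SUPPORTED = {"ivo://ivoa.net/std/QueryProtocol#case-insensitive"}
is_supported = name in SUPPORTED

print(namespace)
print(local_name)
print(is_supported)
```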
<p>At the risk of belabouring the point, the difference between this and the VOEvent case is
that in the VOEvent case there is the clear implication that a VOEvent identifier
<code>stream#event</code> is not merely a name for an event, but is
expected to be retrievable directly, in contrast to being accessed
by downloading the entire stream, and <q>searching</q> locally for the
secondary resource <code>#event</code>. There is a similar situation,
mutatis mutandis, when the VOSpace specification talks of accessing nodes.</p>
</div>
<div class='section'>
<h1><a id='affected'/>Affected IVOA standards</h1>
<p>VOEvent identifiers have the form
<code>ivo://example.org/streamid#local_ID</code> (see section 2.2 of
<cite>std:voevent</cite>). The URI RFC requires that this is resolved
by retrieving the resource <code>ivo://example.org/streamid</code> and
finding <code>#local_ID</code> within it, but the VOEvent specification
indicates that the resources <code>ivo://example.org/streamid</code>
and <code>ivo://example.org/streamid#local_ID</code> might be
retrieved independently.</p>
<p>The text of the VOSpace specification <cite>std:vospace</cite>
principally illustrates URI fragments being used as property names;
this is unproblematic for the reasons discussed above (Sect. <span class='xref'>namesok</span>). However, the
specification also describes URIs in a <code>vos:</code> scheme
(implicitly and explicitly including fragments) as names for VOSpace
nodes, and describes these being retrieved to obtain the node
contents. Depending on how this retrieval is done, this dereferencing
procedure might be adversely affected by the issues described in this Note.</p>
<p>Other IVOA specifications which discuss URIs with fragments may
need to be examined, to discover whether they are also unwittingly
depending on unsupported behaviour.</p>
</div>
</div>
<div class='section'>
<h1>Recommendations</h1>
<p>This Note makes the following recommendations:</p>
<ol>
<li>IVOA protocols should not use URI fragments
other than in a context in which (a) the fragment is being used as
a name for an object which is not expected to be retrieved, or (b)
there is an implication that the object so named will be retrieved
in the way which is implied by the URI model.</li>
<li>If a resource named by a standard-specified URI will ever be
retrieved, then to avoid doubt the standard should explicitly note
that the fragment processing is expected to be performed by the client.</li>
</ol>
</div>
</div><!--body-->
<div class="section-nonum">
<h1 ><a name="References" id="References"></a>References</h1>
<?bibliography note ?>
</div>
<hr/>
<p style="text-align: right; font-size: x-small; color: #888;">
<a href='http://code.google.com/p/volute' >Volute</a>
$Revision$ $Date$
</p>
</body>
</html>

