Contents of /trunk/projects/note-urifragments/uri-fragments.html

Revision 1743
Wed May 16 23:00:58 2012 UTC by norman.x.gray
File MIME type: text/html
File size: 16871 byte(s)
Update note text, to update acknowledgements and add some mild extra clarification
<?xml version="1.0"?>
<!-- $Id:$
Note that this file should be xhtml with div to mark sections - see README for more information
Paul Harrison -->
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "ivoadoc/xmlcatalog/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>URI fragments in IVOA specifications</title>
<meta name="DC.Title" content="URI fragments in IVOA specifications" />
<meta name="DC.author" content="Norman Gray, norman@astro.gla.ac.uk" />
<meta name="DC.maintainedBy" content="Norman Gray, norman@astro.gla.ac.uk" />
<link href="http://www.ivoa.net/misc/ivoa_a.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" href="http://www.ivoa.net/misc/ivoa_note.css" type="text/css" />
<!-- Add other styling information here (but this element, if present, mustn't be empty)
<style type="text/css"></style>
-->
<link href="XMLPrint.css" rel="stylesheet" type="text/css" />
<link href="ivoa-extras.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div class="head">
<div id="titlehead" style="position:relative;height:170px;width: 500px">
<div id="logo" style="position:absolute;width:300px;height:169px;left: 50px;top: 0px;">
<img src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" alt="IVOA logo"/></div>
<div id="logo-title"
style="position: absolute; width: 200px; height: 115px; left: 320px; top: 5px; font-size: 14pt; color: #005A9C; font-style: italic;">
<p style='position: absolute; left: 0px; top: 0px;'><span style='font-weight: bold;'>I</span> nternational</p>
<p style='position: absolute; left: 15pt; top: 25pt;'><span style='font-weight: bold;'>V</span> irtual</p>
<p style='position: absolute; left: 15pt; top: 50pt;'><span style='font-weight: bold;'>O</span> bservatory</p>
<p style='position: absolute; left: 0px; top: 75pt;'><span style='font-weight: bold;'>A</span> lliance</p>
</div>
</div>
<h1>URI fragments in IVOA specifications<br/>
Version <span class="docversion">0.1</span></h1>
<h2 class="subtitle">Filled in automatically</h2>
<dl>
<dt>Working Group</dt>
<dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/Semantics">Semantics</a>???</dd>
<dt><b>This version:</b></dt>
<dd><a href="" class="currentlink">filled in automatically</a></dd>
<dt><b>Latest version:</b></dt>
<dd><a href='http://www.astro.gla.ac.uk/users/norman/temp/uri-fragments.html'
>http://www.astro.gla.ac.uk/users/norman/temp/uri-fragments.html</a></dd>
<dt><b>Previous version(s):</b></dt>
<dd>0.2: <a href='http://www.astro.gla.ac.uk/users/norman/temp/uri-fragments/20120404/'
>http://www.astro.gla.ac.uk/users/norman/temp/uri-fragments/20120404/</a></dd>
<dt><b>Author(s):</b></dt>
<dd>Norman Gray</dd>
</dl>

<h2>Abstract</h2>
<p>The <q>fragment</q> identifier in a URI has a specific semantics
attached to it. IVOA specifications should therefore <em>not</em> use
it as a simple indicator of hierarchy or <q>containment</q>.</p>
<h2> Status of This Document</h2>
<p >This is an author's draft.
It has no IVOA standing as such, but will be submitted as a Note to
the IVOA documents series once it has received some feedback.</p>
<p id="statusdecl">(updated automatically)</p>
<p> <em >A list of </em><span style="background: transparent"><a href="http://www.ivoa.net/Documents/"><i>current
IVOA Recommendations and other technical documents</i></a></span><em > can be found at http://www.ivoa.net/Documents/.</em></p>
<h2 class="prologue-heading-western" >Acknowledgements</h2>
<p>The author is most grateful for comments and criticisms received from Guy Rixon, Mark Taylor, Markus Demleitner, and Dick Shaw.</p>
</div>
<h2>Contents</h2>
<div>
<?toc ?>
</div>
<div class="body">
<div class="section">
<h1><a id="intro"></a>Introduction</h1>

<p>URIs are defined in IETF RFC 3986 <cite>std:rfc3986</cite>. In its
full generality, the syntax of URIs is quite complicated, but most of
the URIs we commonly see use only a subset of the possible features, namely a
<q>scheme</q> (which is usually <code>http</code> or sometimes, in VO contexts,
<code>ivo</code>), a <q>host</q> prefixed by a pair of slashes
<code>//</code>, a <q>path</q> with elements separated by single
slashes <code>/</code>, and a possible fragment, separated from the
rest of the URI by a hash or number sign <code>#</code>. The point of
this note is to stress that the fragment is importantly
distinct from the other parts of the URI: it is not sent over the
network to a remote server when the URI is retrieved or
dereferenced.</p>
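<p>As a minimal sketch, Python's standard <code>urllib.parse.urlsplit</code> function separates a URI into these components. The IVORN used here is a hypothetical one following the VOEvent pattern discussed below, and the parser is a generic RFC 3986-style one, not an IVOA-specific tool.</p>

```python
from urllib.parse import urlsplit

# Hypothetical IVORN following the VOEvent-style pattern.
ivorn = "ivo://example.org/streamid#local_ID"

parts = urlsplit(ivorn)
print(parts.scheme)    # ivo
print(parts.netloc)    # example.org   (the "host", after the //)
print(parts.path)      # /streamid
print(parts.fragment)  # local_ID      (everything after the #)
```

<p>The fragment is parsed out as a separate component, whatever the scheme.</p>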

<p>When a web browser displays a page such as
<code>http://www.ivoa.net/Documents/#notes</code>, it
retrieves the path <code>/Documents/</code> from the server at
<code>www.ivoa.net</code> and, once it has retrieved the HTML page that
comes back, it <em>searches within the page</em> for the anchor
labelled <code>notes</code>. Crucially, this search happens
entirely on the client side, and it, or its analogue, happens during the
processing of <em>any</em> URI: it is not specific to HTTP or to HTML
pages. It therefore also applies to IVORN URIs (starting
<code>ivo:</code>) <cite>std:ivo</cite> and VOSpace URIs (starting
<code>vos:</code>) <cite>std:vospace</cite>.</p>
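<p>The first half of this process can be seen in Python's standard <code>urllib.request.Request</code> class, sketched below on the assumption of a direct (non-proxied) HTTP request. Constructing the request only parses the URL, and its <code>selector</code>, which is what would be placed in the HTTP request line, does not include the fragment; finding <code>notes</code> in the returned page is left to the client.</p>

```python
from urllib.request import Request

# Constructing a Request only parses the URL; no network I/O happens here.
req = Request("http://www.ivoa.net/Documents/#notes")

print(req.host)      # www.ivoa.net
print(req.selector)  # /Documents/
print(req.full_url)  # http://www.ivoa.net/Documents/#notes
```

<p>The client keeps the full URL, fragment included, but only the host and selector would be used to contact the server.</p>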

<p>In brief: The <q>fragment</q> identifier in a URI (RFC 3986,
<cite>std:rfc3986</cite>) has a specific semantics
attached to it. IVOA specifications should therefore <em>not</em> use
it as a simple indicator of hierarchy or <q>containment</q>. Or, put
another way: punctu–ation,isn#t ju`st !dec$ora/tion.</p>

<p class='issue'><strong>@@@</strong> It might be useful/interesting/valuable to
include in this note a discussion of other recommended and deprecated
URL patterns. Quite a lot of the suggestions in the famous <a
href='http://www.w3.org/Provider/Style/URI'>Cool URIs don't change</a>
document are as valid now as they were in 1998.</p>

<!--
<p>The following discussion is framed in terms of version 2.0 of the VOEvent
specification <cite>std:voevent</cite>, because that is where the
problem first became evident, and where it can be clearly
described. The problem is not, however, restricted to that
specification, and any similar use of the URI fragment would cause
similar problems in principle.</p>
-->

</div>

<div class='section'>
<h1>The problem with fragments</h1>

<p>Several IVOA standards define URI patterns for the objects they
describe; the VOEvent and VOSpace standards are examples. In this
context, it is natural to use the URI fragment as a way of referring
to a resource which is conceptually <em>contained within</em> another,
by analogy with the way that the fragments in HTML pages are
conceptually within the page. Unfortunately, the fixed and invariable
meaning attached to URI fragments means that applications which process such
URIs may be required by the IETF standard to process them in
ways which the IVOA standards did not intend. If applications
deliberately process them in a non-conformant way, then we are
concerned that those applications will risk being frustrated by
conformant library APIs, by caches, and by future developments in the URI
standards themselves.</p>

<p>The rest of this section is a detailed discussion of the problem,
with a rather legalistic tone, in terms which presume some
acquaintance with the details of the URI specification
<cite>std:rfc3986</cite>.</p>

<p>The fundamental problem with URI formats such as
<code>scheme:foo#local_ID</code> is that the specification for URIs
<cite>std:rfc3986</cite> requires that the fragment (the
<code>#local_ID</code>) is removed prior to any dereference &ndash;
<q>the fragment identifier is separated from the rest of the URI prior
to a dereference</q> (this and other quotations here are from section 3.5 of
the URI RFC). Other language in this section makes it clear
that the fragment has a special, and secondary, status (<q>[t]he
fragment identifier component of a URI allows indirect identification
of a secondary resource by reference to a primary resource and
additional identifying information</q>) and that this
cannot be redefined by scheme-specific specifications: <q>[f]ragment
identifier semantics are independent of the URI scheme and thus cannot
be redefined by scheme specifications</q>.</p>
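<p>Because fragment semantics are independent of the scheme, a generic parser splits off the fragment identically for every scheme. The following minimal Python sketch uses the standard <code>urllib.parse.urldefrag</code> function; the <code>ivo:</code> and <code>vos:</code> URIs in it are hypothetical examples.</p>

```python
from urllib.parse import urldefrag

# Example URIs in three schemes (the ivo: and vos: ones are hypothetical).
examples = [
    "http://www.ivoa.net/Documents/#notes",
    "ivo://example.org/streamid#local_ID",
    "vos://example.org!vospace/mydir/file#frag",
]

for uri in examples:
    primary, fragment = urldefrag(uri)
    # `primary` is what would be dereferenced; `fragment` stays with the client.
    print(primary, "|", fragment)
# http://www.ivoa.net/Documents/ | notes
# ivo://example.org/streamid | local_ID
# vos://example.org!vospace/mydir/file | frag
```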

<p>Further, <q>the fragment identifier is not used in the
scheme-specific processing of a URI</q>. This means that in order to
conform to the URI specification, the processing of the
<code>ivo:</code> URI scheme must ignore the fragment. Consequently,
whenever an IVORN <code>ivo://foo/bar#baz</code> is <q>processed</q>
(or in general used in any way other than as a name in the
<code>ivo://foo/bar</code> namespace), that processing must be done on
the IVORN <code>ivo://foo/bar</code> alone, and the
<code>#baz</code> fragment taken into account only after retrieval is complete.</p>

<p>Another way of phrasing this is that there is no guarantee that a
server will <q>see</q> the fragment in any URI, since any of possibly
multiple intermediaries between the client and the server will be
licensed to remove it (nor, incidentally, is there any guarantee that
a server will <em>not</em> see the fragment).</p>

<p>The intention of the URI specification is that such a URI is
conceptually handled by the client stripping the fragment, processing
the resulting cropped URI, and then resolving the fragment, in a way
determined by the media type of the retrieved representation, <em>on the client</em>.</p>
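<p>The following minimal Python sketch models that sequence. The functions <code>dereference</code>, <code>toy_fetch</code> and <code>toy_resolve</code> are hypothetical names invented for illustration, and the <q>retrieved representation</q> is simply a dictionary standing in for a real document; the point is that the retrieval step only ever sees the cropped URI.</p>

```python
from urllib.parse import urldefrag


def dereference(uri, fetch, resolve_fragment):
    """Sketch of the RFC 3986 model: strip the fragment, retrieve, resolve locally.

    `fetch` and `resolve_fragment` are hypothetical caller-supplied functions,
    standing in for scheme-specific retrieval and for fragment resolution
    (which RFC 3986 ties to the media type of the retrieved representation).
    """
    primary, fragment = urldefrag(uri)
    representation = fetch(primary)  # retrieval only ever sees the cropped URI
    if not fragment:
        return representation
    return resolve_fragment(representation, fragment)  # done on the client


# Toy stand-ins: a "server" that logs what it is asked for, and a
# representation that is just a dictionary of named parts.
requests_seen = []


def toy_fetch(uri):
    requests_seen.append(uri)
    return {"local_ID": "event data", "other_ID": "more data"}


def toy_resolve(representation, fragment):
    return representation[fragment]


result = dereference("ivo://example.org/streamid#local_ID", toy_fetch, toy_resolve)
print(result)         # event data
print(requests_seen)  # ['ivo://example.org/streamid']
```

<p>Passing the retrieval and resolution steps in as functions keeps the sketch independent of any particular scheme or media type.</p>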

<p>In the VOEvent spec, however, <code>.../streamid</code> and
<code>.../streamid#local_ID</code> are conceived as completely
independent resources, contrary to the prescriptions in the URI
RFC.</p>

<p>See section <span class='xref'>affected</span> for a note on
affected IVOA standards.</p>

<p>This is not merely a theoretical problem, for three reasons.</p>

<div class='section'>
<h1>Issue 1: scheme handlers may not report the fragment</h1>
<p>One can imagine a URI API which allows for scheme-specific
handlers (e.g. for <code>vos:</code> or <code>ivo:</code>), in the way
that the <code>java.net.URL</code> class does. Such a handler
class's API could potentially be constructed in such a way that the
handler code couldn't get access to the fragment part of the parsed
URI. This would make it impossible for a custom handler for
<code>ivo:</code> URIs to support identifiers whose fragments are significant. And
<em>this would not be a bug</em> in the API.</p>

<p>The <code>java.net.URLStreamHandler</code> abstract class is not in fact
constructed in this way, but this is no guarantee that a different
class, in this or a different language, won't act in the same
inconvenient fashion.</p>
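<p>A minimal Python sketch of such a hypothetical API follows. It is not modelled on any real library: <code>register_handler</code>, <code>open_uri</code> and <code>ivo_handler</code> are invented names. The registry follows section 3.5 of the RFC by passing each handler only the URI with its fragment removed, so two IVORNs differing only in their fragment reach the handler as the same string.</p>

```python
from urllib.parse import urldefrag

# Hypothetical registry mapping a URI scheme to a handler function.
_handlers = {}


def register_handler(scheme, handler):
    _handlers[scheme.lower()] = handler


def open_uri(uri):
    # Conformant behaviour: the fragment is split off before
    # scheme-specific processing, and is never passed to the handler.
    primary, _fragment = urldefrag(uri)
    scheme = primary.split(":", 1)[0].lower()
    return _handlers[scheme](primary)


def ivo_handler(uri):
    # A hypothetical ivo: handler. If it needed the fragment to pick one
    # event out of a stream, it could not: the fragment never arrives.
    return "handler was given: " + uri


register_handler("ivo", ivo_handler)
print(open_uri("ivo://example.org/streamid#local_ID"))
# handler was given: ivo://example.org/streamid
```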
</div>

<div class='section'>
<h1><a id='i2'/>Issue 2: servers (including caches) may equate URIs with and without fragments</h1>
<p>When a cache is asked for
<code>scheme:path#fragment</code>, it should simply return the content of
<code>scheme:path</code> since, according to the URI spec, and for
<em>any</em> scheme, these are equivalent in this context. Indeed,
any <code>ivo:</code> cache is <em>required</em> to behave like
this (RFC section 6.1: <q>When URIs are compared to select (or avoid)
a network action, such as retrieval of a representation, fragment
components (if any) should be excluded from the comparison.</q>).
That is, if a user-agent were to ask a proxy or cache for
<code>ivo://auth/obj#frag</code>, it should receive the contents of
<code>ivo://auth/obj</code>.</p>
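<p>A toy cache behaving as section 6.1 of the RFC describes can be sketched in Python as below. The class <code>FragmentBlindCache</code> and the <code>upstream</code> function are hypothetical, and a real cache would also deal with expiry and validation; the sketch shows only that the cache key is the URI with its fragment removed.</p>

```python
from urllib.parse import urldefrag


class FragmentBlindCache:
    """Toy cache whose keys exclude the fragment (cf. RFC 3986, section 6.1)."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical upstream retrieval function
        self._store = {}

    def get(self, uri):
        key, _fragment = urldefrag(uri)
        if key not in self._store:
            self._store[key] = self._fetch(key)
        return self._store[key]


upstream_calls = []


def upstream(uri):
    # Records each request that actually reaches the "server".
    upstream_calls.append(uri)
    return "contents of " + uri


cache = FragmentBlindCache(upstream)
print(cache.get("ivo://auth/obj"))       # contents of ivo://auth/obj
print(cache.get("ivo://auth/obj#frag"))  # contents of ivo://auth/obj
print(upstream_calls)                    # ['ivo://auth/obj']
```

<p>The second request is answered from the cache, and the upstream server never learns that a fragment was asked for.</p>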

<p>This also is <em>not a bug</em> in the cache.</p>

<p>Superficially, it seems that these two problems can be evaded:
don't use scheme-specific handlers, and don't use proxies or caches;
or more generally, avoid tools which conform to the demands of the URI
specification. Depending on the local network environment, however,
user-agents may be obliged to use caches. For non-HTTP URIs this is
unlikely in current practice, but it may become unavoidable in future,
for the following reason.</p>
</div>

<div class='section'>
<h1>Issue 3: URIs won't last forever</h1>
<p>The third issue is the longest-term one, and may not
be so easily worked around.</p>

<p>At some point &ndash; perhaps in
a decade, perhaps longer &ndash; there will be a replacement standard
for addressing things on the web (or whatever replaces it).
As the web's core addressing technology, URIs are so important that
there will certainly be a mechanism for mapping URIs
to the new standard, supported by gateways or proxies of some type.
At that point, using such a proxy will not be optional, if the IVOA
is to remain reasonably consistent with the rest of the world.</p>

<p>Whatever technology finally replaces URIs as an addressing mechanism
will have a lot of work invested in it, to make sure the two are compatible.
The gateways implementing this mapping cannot be guaranteed to be friendly to URI schemes
which depend on behaviour which the URI specification declares must
not happen.</p>
</div>

<div class='section'>
<h1><a id='namesok'/>Non-problem: URIs as names</h1>

<p>We do not wish to suggest that fragments should be avoided in
general; there are plenty of cases where they are perfectly
appropriate. In the best-known use, to provide a direct link to
elements within an HTML page, fragments are useful and
unexceptionable; and when a fragment is used to create a <em>name</em>
for something – as in the Standards Registry Extension, or
in many Semantic Web use-cases – that is a useful and increasingly
common technique which provides natural namespacing.</p>

<p>The Standards Registry Extension specification
<cite>std:stdregext</cite> uses URIs as names: for example
<code>ivo://ivoa.net/std/QueryProtocol#case-insensitive</code>. Here,
there’s no suggestion that the <code>#case-insensitive</code> <q>thing</q>
is a differently-retrieved resource &ndash; it is simply a
<em>name</em>, and the non-fragment part of the URI is merely acting
as a type of namespace. This goes <em>with</em> the grain of the URI
definition.</p>
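<p>Used this way, the URI is only ever compared as a whole string and is never dereferenced. A minimal Python sketch follows; the set <code>declared_features</code> and the second name in it are hypothetical, while <code>urldefrag</code> is from Python's standard library and is used here only to show the namespace/local-name reading of such a URI.</p>

```python
from urllib.parse import urldefrag

CASE_INSENSITIVE = "ivo://ivoa.net/std/QueryProtocol#case-insensitive"

# Hypothetical set of feature names declared by some service.
declared_features = {
    "ivo://ivoa.net/std/QueryProtocol#case-insensitive",
    "ivo://ivoa.net/std/QueryProtocol#hypothetical-feature",
}

# Used as a name, the whole URI (fragment included) is compared as a string;
# nothing is retrieved.
print(CASE_INSENSITIVE in declared_features)  # True

# The non-fragment part merely acts as a namespace for the local name.
namespace, local_name = urldefrag(CASE_INSENSITIVE)
print(namespace)   # ivo://ivoa.net/std/QueryProtocol
print(local_name)  # case-insensitive
```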

<p>At the risk of belabouring the point, the difference between this and the VOEvent case is
that in the VOEvent case there is the clear implication that a VOEvent identifier
<code>stream#event</code> is not merely a name for an event, but is
expected to be retrievable directly, rather than by
downloading the entire stream and <q>searching</q> locally for the
secondary resource <code>#event</code>. There is a similar situation,
mutatis mutandis, when the VOSpace specification talks of accessing nodes.</p>
</div>

<div class='section'>
<h1><a id='affected'/>Affected IVOA standards</h1>

<p>VOEvent identifiers have the form
<code>ivo://example.org/streamid#local_ID</code> (see section 2.2 of
<cite>std:voevent</cite>). The URI RFC requires that this is resolved
by retrieving the resource <code>ivo://example.org/streamid</code> and
finding <code>#local_ID</code> within it, but the VOEvent specification
indicates that the resources <code>ivo://example.org/streamid</code>
and <code>ivo://example.org/streamid#local_ID</code> might be
retrieved independently.</p>

<p>The text of the VOSpace specification <cite>std:vospace</cite>
principally illustrates URI fragments being used as property names;
this is unproblematic for the reasons discussed above (Sect. <span class='xref'>namesok</span>). However, the
specification also describes URIs in a <code>vos:</code> scheme
(implicitly and explicitly including fragments) as names for VOSpace
nodes, and describes these being retrieved to obtain the node
contents. This dereferencing procedure would be adversely affected by
the issues described above.</p>

<p>Other IVOA specifications which discuss URIs with fragments may
need to be examined, to discover whether they are also unwittingly
depending on unsupported behaviour.</p>

</div>
</div>

<div class='section'>
<h1>Recommendations</h1>
<p>This Note makes the following recommendations:</p>
<ol>
<li>IVOA protocols should not use URI fragments
other than in a context in which (a) the fragment is being used as
a name for an object which is not expected to be retrieved, or (b)
the object so named is expected to be retrieved in the way
implied by the URI model, that is, by retrieving the primary resource
and resolving the fragment on the client.</li>
<li>If a resource named by a standard-specified URI will ever be
retrieved, then to avoid doubt the standard should explicitly note
that the fragment processing is expected to be performed by the client.</li>
</ol>
</div>

</div><!--body-->

<div class="section-nonum">
<h1 ><a name="References" id="References"></a>References</h1>
<?bibliography note ?>
</div>
<hr/>
<p style="text-align: right; font-size: x-small; color: #888;">
<a href='http://code.google.com/p/volute' >Volute</a>
$Revision$ $Date$
</p>

</body>
</html>

