
Contents of /trunk/projects/dal/TAPNotes/TAPNotes.html



Revision 2383 - (show annotations)
Fri Dec 6 13:51:28 2013 UTC (7 years, 10 months ago) by volute@g-vo.org
File MIME type: text/html
File size: 45524 byte(s)
Tap notes: Added Mark and Paul to the author list.


1 <?xml version="1.0"?>
2 <!-- $Id:$
3 Note that this file should be xhtml with div to mark sections - see README for more information
4 Paul Harrison -->
5 <!DOCTYPE html
6 PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "ivoadoc/xmlcatalog/xhtml1-transitional.dtd">
7 <html xmlns="http://www.w3.org/1999/xhtml">
8 <head>
9 <title>TAP Implementation Notes</title>
10 <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
11 <meta name="Title" content="TAP Implementation Notes" />
12 <meta name="author" content="Markus Demleitner, msdemlei@ari.uni-heidelberg.de" />
13 <meta name="maintainedBy" content="Markus Demleitner, msdemlei@ari.uni-heidelberg.de" />
14 <link href="http://www.ivoa.net/misc/ivoa_a.css" rel="stylesheet" type="text/css" />
15 <!-- Add other styling information here (but this element, if present, mustn't be empty)
16 <style type="text/css"></style>
17 -->
18 <link href="./ivoadoc/XMLPrint.css" rel="stylesheet" type="text/css" />
19 <link href="./ivoadoc/ivoa-extras.css" rel="stylesheet" type="text/css" />
20 <style type="text/css" xml:space="preserve">
21
22 .tbw {
23 background: yellow;
24 }
25
26 p.tbw:before {
27 content: 'TO BE WRITTEN: ';
28 }
29
30
31 </style>
32 </head>
33 <body>
34 <div class="head">
35 <div id="titlehead" style="position:relative;height:170px;width: 500px">
36 <div id="logo" style="position:absolute;width:300px;height:169px;left: 50px;top: 0px;">
37 <img src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" alt="IVOA logo"/></div>
38 <div id="logo-title"
39 style="position: absolute; width: 200px; height: 115px; left: 320px; top: 5px; font-size: 14pt; color: #005A9C; font-style: italic;">
40 <p style='position: absolute; left: 0px; top: 0px;'><span style='font-weight: bold;'>I</span> nternational</p>
41 <p style='position: absolute; left: 15pt; top: 25pt;'><span style='font-weight: bold;'>V</span> irtual</p>
42 <p style='position: absolute; left: 15pt; top: 50pt;'><span style='font-weight: bold;'>O</span> bservatory</p>
43 <p style='position: absolute; left: 0px; top: 75pt;'><span style='font-weight: bold;'>A</span> lliance</p>
44 </div>
45 </div>
46 <h1>TAP Implementation Notes<br/>
47 Version <span class="docversion">0.1</span></h1>
48 <h2 class="subtitle">Filled in automatically</h2>
49
50 <dl>
51 <dt>Working Group</dt>
52 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaGridAndWebServices">http://www.ivoa.net/twiki/bin/view/IVOA/IvoaGridAndWebServices</a></dd>
53 <dt><b>This version:</b></dt>
54 <dd><a href="" class="currentlink">filled in automatically</a></dd>
55 <dt><b>Latest version:</b></dt>
56 <dd> not issued outside DAL WG</dd>
57 <dt><b>Previous version(s):</b></dt>
58 <dd>None</dd>
59 <dt>Authors:</dt><dd>
60 <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/MarkusDemleitner"
61 >Markus Demleitner</a><br clear="none"/><br/>
62 <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/PaulHarrison">Paul Harrison</a><br/>
63 <a
64 href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/MarkTaylor">Mark Taylor</a><br/>
65 </dd>
66 </dl>
67
68 <h2>Abstract</h2>
69 <p>This IVOA Note discusses several clarifications to the TAP protocol
70 stack, i.e., to the ADQL dialect, the UWS job system, the VOSI metadata
71 interfaces, and TAP itself.
72 It also proposes a number of enhancements that might be incorporated
73 in the next versions of the respective standards. The authors hope
74 that the proposed text changes and additions can mature while in the
75 relatively fluid note state to achieve a rapid and easy standards
76 process later on.</p>
77 <p>Further contributions to this text are most welcome.</p>
78
79 <h2> Status of This Document</h2>
80 <p >This is an IVOA note published within the IVOA DAL working group.
81 The first release of this document was on 2013-12-31.</p>
82 <p id="statusdecl">(updated automatically)</p>
83 <p> <em >A list of </em><span style="background: transparent"><a href="http://www.ivoa.net/Documents/"><i>current
84 IVOA Recommendations and other technical documents</i></a></span><em > can be found at http://www.ivoa.net/Documents/.</em></p>
85
86 <h2>Acknowledgements</h2>
87
88 <p>Several sections of this document are based on the <a
89 href="http://wiki.ivoa.net/twiki/bin/view/IVOA/TAPImplementationNotes">TAPImplementationNotes</a>
90 page on the IVOA wiki <cite>IVOAWIKI</cite>. Several persons
91 contributed to its content, including Mark Taylor, Paul Harrison,
92 Pierre LeSidaner, Tom McGlynn, and Markus Demleitner.</p>
93
94 </div> <!-- header -->
95
96 <h2>Contents</h2>
97 <div>
98 <?toc ?>
99 </div>
100
101 <div class="body">
102 <div class="section">
103 <h1><a id="introduction"></a>Introduction</h1>
104 <p>The protocol stack for exchanging database queries and their results
105 within the Virtual Observatory context is, by 2013, implemented in
106 several software packages, both on the server and on the client
107 side.</p>
108
109 <p>Several implementors found that the respective standards leave some
110 questions open. The first purpose of this document is to collect these
111 questions and give answers reflecting a broad consensus on the part of
112 the implementors. The points raised in these clarifications, errata and
113 recommendations should be addressed in future revisions of the standard
114 texts. It is the intent of this document to serve as an evolving
115 reference for implementors that should eventually reflect the updates
116 to the actual standards.</p>
117
118 <p>With the experience gathered from roll-out and use of the protocols,
119 several additions to (or deletions from) the standards appeared
120 beneficial. This document collects such proposals for changes to the
121 content of the standards. Some of these changes have been written such
122 that neither servers nor clients break and thus are candidates for minor
123 updates to the standards, whereas the adoption of others might require
124 new major releases. Again, the authors plan to evolve this document to
125 have the note reflect the eventual plans for updates to the
126 standards.</p>
127
128 </div> <!-- section introduction -->
129
130 <div class="section">
131 <h1><a id="adql">ADQL</a></h1>
132 <div class="section">
133 <h2><a id="adql-clar">ADQL: Clarifications, Errata, and Recommendations</a></h2>
134
135 <div class="section">
136 <h3><a id="ac-sep">The Separator Nonterminal</a></h3>
137 <p>The grammar given in appendix A of <cite>std:ADQL</cite> gives a
138 nonterminal <em>separator</em>, expanding to either a comment or
139 whitespace. This nonterminal, however, is only referenced within the
140 rule for <em>character_string_literal</em>. It is uncontentious that the
141 intent is to allow comments and whitespace wherever SQL1992 allows them.
142 With the nonterminal in the grammar, however, the ADQL standard says
143 differently, and there should be a clarification.</p>
144
145 <p>One option for such a clarification is to amend section 2.1 of
146 <cite>std:ADQL</cite> with a subsection 2.1.4, "Tokens and literals",
147 containing text like the following (taken essentially from
148 <cite>std:SQL1992</cite>):</p>
149
150 <blockquote>
151 <p>
152 Any <em>token</em> may be followed by a <em>separator</em>. A
153 <em>nondelimiter token</em> shall be followed by a <em>delimiter
154 token</em> or a <em>separator</em>.
155 </p>
156 </blockquote>
157
158 <p>Since the full rules for the separator are somewhat more complex
159 in <cite>std:ADQL</cite>, an attractive alternative could be
160 to omit the <em>separator</em> nonterminal from the grammar and to just
161 note:</p>
162
163 <blockquote>
164 <p>Whitespace and comments can occur wherever they can occur in
165 <cite>std:SQL1992</cite>.</p>
166 </blockquote>
167
168 </div> <!-- subsubseciton ac-sep -->
169
170 <div class="section">
171 <h3><a id="ac-typesystem">Type System</a></h3>
172 <p>The ADQL specification does not explicitly talk about types. Some
173 intentions regarding types can be taken from the grammar (e.g., the lack
174 of a boolean type), but it is clear that for a predictable behaviour
175 across individual ADQL implementations, ADQL should talk about
176 types. The TAP specification has already covered most of the ground
177 here, with a table on PDF page 19 in version 1.0. The following
178 proposal mainly builds on this.</p>
179
180 <p>To introduce a notion of types, section 2 of the ADQL
181 recommendation should be amended with a subsection 2.6, "ADQL Type
182 System", as follows:</p>
183
184 <blockquote>
185 <p>ADQL defines no data definition language (DDL). It is assumed that
186 table definition and data ingestion are performed in the backend
187 database's native language and type system.</p>
188
189 <p>However, column metadata needs to give column types in order to allow
190 the construction of queries that are both syntactically and semantically
191 correct. Examples of such metadata include VODataService's
192 <code>vs:TAPType</code> <cite>std:VODS11</cite> or TAP's TAP_SCHEMA.
193 Services SHOULD, if at all possible, try to express their column
194 metadata in these terms even if the underlying database employs
195 different types. Services SHOULD also use the following mapping when
196 interfacing to user data, either by serializing result sets into
197 VOTables or by ingesting user-provided VOTables into ADQL-visible
198 tables. Where non-ADQL types are employed
199 in the backend, implementors SHOULD make sure that all operations that are
200 possible with the recommended ADQL type are also possible with the type
201 used in the backend engine. For instance, the ADQL string concatenation
202 operator || should be applicable to all columns resulting from VOTable
203 char-typed columns.</p>
204
205 <table border="1">
206 <tr><th colspan="3">VOTable</th><th>ADQL</th></tr>
207 <tr><th>datatype</th><th>arraysize</th><th>xtype</th>
208 <th>type</th></tr>
209 <tr><td>boolean</td><td>1</td><td></td><td>implementation defined</td></tr>
210 <tr><td>short</td><td>1</td><td></td><td>SMALLINT</td></tr>
211 <tr><td>int</td><td>1</td><td></td><td>INTEGER</td></tr>
212 <tr><td>long</td><td>1</td><td></td><td>BIGINT</td></tr>
213 <tr><td>float</td><td>1</td><td></td><td>REAL</td></tr>
214 <tr><td>double</td><td>1</td><td></td><td>DOUBLE</td></tr>
215 <tr><td>(numeric)</td><td>&gt; 1</td><td></td><td>
216 implementation defined</td></tr>
217 <tr><td>char</td><td>1</td><td></td><td>CHAR(1)</td></tr>
218 <tr><td>char</td><td>n*</td><td></td><td>VARCHAR(n)</td></tr>
219 <tr><td>char</td><td>n</td><td></td><td>CHAR(n)</td></tr>
220 <tr><td>unsignedByte</td><td>n*</td><td></td><td>VARBINARY(n)</td></tr>
221 <tr><td>unsignedByte</td><td>n</td><td></td><td>BINARY(n)</td></tr>
222
223 <tr><td>unsignedByte</td><td>n, *,
224 n*</td><td>adql:BLOB</td><td>BLOB</td></tr>
225 <tr><td>char</td><td>n, *, n*</td><td>adql:CLOB</td><td>CLOB</td></tr>
226 <tr><td>char</td><td>n, *,
227 n*</td><td>adql:TIMESTAMP</td><td>TIMESTAMP</td></tr>
228 <tr><td>char</td><td>n, *, n*</td><td>adql:POINT</td><td>POINT</td></tr>
229 <tr><td>char</td><td>n, *,
230 n*</td><td>adql:REGION</td><td>REGION</td></tr>
231 </table>
232
233 <p>"Implementation defined" in the above table means that an
234 implementation is free to reject attempts to (de-) serialize values in
235 these types. They are to be considered unsupported by ADQL, and the
236 language provides no means to manipulate "native" representations of
237 them.</p>
238
239 <p>References to REGION-typed columns must be valid wherever the
240 ADQL <em>region</em> nonterminal is allowed. References to POINT-typed
241 columns must be valid wherever the ADQL <em>point</em> nonterminal is
242 allowed.</p>
243
244 </blockquote>
245
246 </div> <!-- subsubsection ac-typesystem -->
247
248
249 <div class="section">
250 <h3><a id="ac-emptycoosys">Empty Coordinate Systems</a></h3>
251
252 <p>The legal values and the semantics of the first arguments to the
253 geometry constructors (POINT, BOX, CIRCLE, POLYGON) have been left
254 largely open by the ADQL standard. The TAP standard clarified those
255 somewhat to the effect that the prescriptions became implementable. On
256 the other hand, the only thing clients can reasonably expect according
257 to TAP (on a recommendation base) from a server is one of four reference
258 frames. Compared to the implementation effort and the potential for
259 user confusion, the additional expressiveness gained by keeping the
260 first argument seems minute. Even allowing more expressive system
261 strings will not help the feature much, since non-trivial
262 transformations (e.g., between reference positions) will need more data
263 than merely the celestial coordinates available to the geometry
264 constructors.</p>
265
266 <p>We therefore propose to deprecate the first argument in a point
267 release of ADQL. In the next major release, the first argument as
268 defined in ADQL2 should be declared as ignored. The standard should
269 require constructors both with and without the current first argument,
270 though, in order to ensure backward compatibility for ADQL2 queries.</p>
271
272 <p>To implement the first step, we propose replacing the second
273 paragraph on PDF page 10 of <cite>std:ADQL</cite> (starting with "For all
274 these functions...") with:</p>
275
276 <blockquote>
277 <p>For historical reasons, the geometry constructors (BOX, CIRCLE, POINT,
278 POLYGON) require a string-valued first argument. It was intended to
279 carry information on a reference system or other coordinate system
280 metadata. In this version, we recommend ignoring this first argument,
281 and clients are advised to pass an empty string here. Future versions
282 of this specification will make this first, string-valued parameter
283 optional for the listed functions.</p>
284 </blockquote>
285
286 <p>In consequence, the COORDSYS function would be taken out of the
287 enumeration on PDF page 9, and its description on PDF page 11 would be
288 removed, too. All examples would use an empty string rather than "ICRS
289 GEOCENTER" -- which is not contained in the TAP clarification anyway --
290 as in the current text.</p>
291
292 <p>A library of standard generalized user defined functions (see section
293 <span class="xref">af-genudf</span>) could provide for simple conversion
294 between reference frames as well as more demanding transformations,
295 e.g., between epochs or reference positions. This, however, depends on
296 allowing geometry-valued user defined functions and is outside of the
297 scope of a clarification. See also section <span
298 class="xref">af-genudf</span>.</p>
299
300 </div> <!-- subsubsection adql-emptycoosys -->
301 </div> <!-- subsection adql-clar -->
302
303 <div class="section">
304 <h2><a id="adql-features">ADQL: Proposed New Features</a></h2>
305
306 <div class="section">
307 <h3><a id="af-simplecrossmatch">Simple Crossmatch Function</a></h3>
308
309 <p>Since a simple positional crossmatch is such a common operation, we
310 should define a function <code>CROSSMATCH(ra1, dec1, ra2, dec2, radius) ->
311 INTEGER</code> returning 1 if
312 (ra1, dec1) and (ra2, dec2) are within radius degrees of each other.
313 This allows more compact expressions than the conventional
314 CONTAINS(POINT, CIRCLE) construct, and ADQL to SQL translators can more
315 easily exploit special constructs for fast crossmatching that may be
316 built into the backend databases.</p>
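<p>The intended semantics of the proposed function can be sketched in a
few lines of Python. This is only an illustration: the name
<code>crossmatch</code> mirrors the proposal, while the use of the
haversine formula and the inclusive comparison at exactly
<code>radius</code> are assumptions of this sketch that the text above
does not fix.</p>

```python
import math


def crossmatch(ra1, dec1, ra2, dec2, radius):
    """Return 1 if the two positions lie within radius of each other, else 0.

    All arguments are in degrees.  The haversine formula is used because
    it stays numerically stable for small separations; that choice is an
    assumption of this sketch, not part of the proposal.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    half_ddec = (dec2 - dec1) / 2.0
    half_dra = (ra2 - ra1) / 2.0
    h = (math.sin(half_ddec) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(half_dra) ** 2)
    # Clamp against rounding errors before taking the square root.
    separation = 2.0 * math.asin(math.sqrt(min(1.0, max(0.0, h))))
    # Inclusive at the boundary (an assumption of this sketch).
    return 1 if radius >= math.degrees(separation) else 0
```

<p>A translator would not evaluate such a function row by row; it would
rather map the call onto whatever indexed crossmatch construct the
backend offers.</p>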
317 </div> <!-- subsubsection af-simplecrossmatch -->
318
319 <div class="section">
320 <h3><a id="af-intersects">No Type-based Decay of INTERSECTS</a></h3>
321
322 <p>Section 2.4.11 of
323 <cite>std:ADQL</cite> stipulates that a call to INTERSECTS should decay
324 to a CONTAINS when one argument is a POINT. This rule is a major
325 implementation liability for simple translators, since it is the only
326 place in the ADQL specification that actually requires a type calculus.
327 For a feature that does not actually add functionality, this seems a
328 high price to pay.</p>
329
330 <p>We therefore recommend to strike the text from "Note that if one of
331 the arguments" through "equivalent to INTERSECTS(b,a)" and add at the
332 end of 2.4.11:</p>
333
334 <blockquote>
335 <p>The arguments to INTERSECTS SHOULD be geometric expressions
336 evaluating to either BOX, CIRCLE, POLYGON, or REGION. Previous versions
337 of this specification allow POINTs as well and require servers to
338 interpret the expression as a CONTAINS with the POINT moved into the
339 first position. Servers SHOULD still implement that behaviour, but
340 clients SHOULD NOT expect it. It will be dropped in the next major
341 version of this specification.</p>
342 </blockquote>
343 </div> <!-- subsubsection af-intersects -->
344
345 <div class="section">
346 <h3><a id="af-genudf">Generalized User Defined Functions</a></h3>
347
348 <p>Currently, user defined functions may only return numbers or strings
349 (in terms of the grammar, only <em>numeric_value_function</em> and
350 <em>string_value_function</em> can expand to
351 <em>user_defined_function</em>). Many interesting functions (e.g.,
352 coordinate transforms, applying proper motions) are extremely
353 inconvenient to define with such a restriction. Therefore, we propose
354 to add <code>| &lt;user_defined_function&gt;</code> to the right hand
355 side of the <em>geometry_value_function</em> rule.</p>
356
357 <p>With this, we could define some standard functions for manipulating
358 geometries; these should be defined in the standard, but they could
359 remain optional. Clients can determine their availability using
360 <cite>std:TAPREGEXT</cite>.</p>
361
362 <p>A future version of this note will propose a library of such
363 functions, including proper motion, precession, and system
364 transformation.</p>
365
366 </div> <!-- subsubsection af-genudf -->
367
368 <div class="section">
369 <h3><a id="af-casefolding">Case-Insensitive String Comparisons</a></h3>
370
371 <p>ADQL currently has no facility reliably allowing case-insensitive
372 string comparisons. This is particularly regrettable since UCDs and at least
373 the majority of the defined utypes are to be compared
374 case-insensitively.</p>
375
376 <p>Thus, we propose the addition of a string function <code>LOWER</code>
377 and the case-insensitive variant of <code>LIKE</code>,
378 <code>ILIKE</code>. Since case folding is a nontrivial operation in a
379 multi-encoding world, ADQL would only require standard behaviour for the
380 ASCII characters (which would suffice for UCDs and utypes) and only
381 recommend following algorithm R2 in section 3.13, "Default Case Algorithms" of
382 <cite>std:UNICODE</cite> outside of ASCII.</p>
383
384 <p>The grammar changes are trivial.</p>
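<p>As an illustration of the ASCII-only behaviour that would be
required, the following Python sketch folds just the 26 ASCII capital
letters and leaves every other character untouched. The helper names
<code>ascii_lower</code> and <code>ascii_iequal</code> are hypothetical
and exist only for this example.</p>

```python
import string

# Translation table that lowercases only the 26 ASCII capital letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(value):
    """Lowercase ASCII letters only; non-ASCII characters pass through."""
    return value.translate(_ASCII_LOWER)


def ascii_iequal(left, right):
    """Compare two strings, ignoring case for ASCII letters only."""
    return ascii_lower(left) == ascii_lower(right)
```

<p>This minimal behaviour is enough for comparing UCDs and utypes;
anything beyond ASCII would fall under the recommendation rather than
the requirement.</p>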
385
386
387 </div> <!-- subsubsection af-casefolding -->
388
389 <div class="section">
390 <h3><a id="af-setops">Set Operators</a></h3>
391
392 <p>ADQL 2.0 does not support any of the SQL <code>UNION</code>,
393 <code>EXCEPT</code> and <code>INTERSECT</code> operators. Since
394 at least set union and intersection are basic operations of relational algebra
395 and combining data from several tables is an operation of significant
396 practical use, this is a serious deficit. Also, there is probably no
397 backend SQL system that does not support these operations.</p>
398
399 <p>Thus, to add minimal support of set operations to ADQL, ADQL systems
400 will mainly need to update their grammars. The following rules, adapted
401 from <cite>std:SQL1992</cite>, will suffice (the
402 <em>query_expression</em> rule replaces the one given in the current
403 grammar, all others are new rules):</p>
404
405 <pre><![CDATA[
406 <query_expression> ::=
407 <non_join_query_expression>
408 | <joined_table>
409
410 <non_join_query_expression> ::=
411 <non_join_query_term>
412 | <query_expression> UNION [ ALL ] <query_term>
413 | <query_expression> EXCEPT [ ALL ] <query_term>
414
415 <query_term> ::=
416 <non_join_query_term>
417 | <joined_table>
418
419 <non_join_query_term> ::=
420 <non_join_query_primary>
421 | <query_term> INTERSECT [ ALL ] <query_primary>
422
423 <query_primary> ::=
424 <non_join_query_primary>
425 | <joined_table>
426
427 <non_join_query_primary> ::=
428 <query_specification>
429 | <left_paren> <non_join_query_expression> <right_paren>
430
431 ]]></pre>
432
433 <p>This leaves out the <code>CORRESPONDING</code> specifications of
434 SQL92, and it
435 still does not include <code>VALUES</code> and explicit table
436 specifications (which would enter through
437 <em>non_join_query_primary</em>) in ADQL. None of these seem
438 indispensable, although one could probably make a case for
439 <code>VALUES</code>.</p>
440
441
442 </div> <!-- subsubsection af-union -->
443
444
445 <div class="section">
446 <h3><a id="af-booleans">Adding a Boolean Type</a></h3>
447
448 <p>Having a boolean type in ADQL could make some expressions nicer
449 (e.g., it could eliminate the comparison against 1 for the geometry
450 predicate functions). However, adding boolean functions and allowing
451 references to boolean columns complicates catching syntax errors
452 significantly, since expressions like <code>WHERE colref</code> would
453 then parse and would only raise an error when it turns out that
454 colref does not refer to a boolean column. Simple ADQL translators
455 may not be able to verify this.</p>
456
457 <p>We therefore propose to add a boolean type to the ADQL type system
458 (see section <span class="xref">ac-typesystem</span>) without any
459 grammatical support for it. However, the standard prose should be
460 amended to contain:</p>
461
462 <blockquote>
463 <p>If the backend database contains columns of type boolean, a
464 comparison of those against the literal strings <code>True</code> and
465 <code>False</code> must be true and false when the column is true and
466 false, respectively. The comparison to other literals is undefined by
467 this specification. Clients should note that the strings have to be
468 entered exactly as given here, without changing case, adding whitespace,
469 or any other modification.
470 </p>
471 </blockquote>
472
473 <p>If this change is adopted, the type system table given in section
474 <span class="xref">ac-typesystem</span> should be updated; luckily, the
475 VODataService specification underlying VOSI already allows BOOLEAN as a
476 TAPType. In the table row for VOTable boolean,
477 "implementation defined" should be replaced with "BOOLEAN".</p>
478
479
480 </div> <!-- subsubsection af-booleans -->
481
482
483 <div class="section">
484 <h3><a id="af-unitcast">Casting to Unit</a></h3>
485
486 <p>ADQL translators can typically introspect the tables they operate on,
487 and thus can typically infer the (physical) unit of a column. Manually
488 converting units (as in <code>col_in_deg*3600</code>) is error-prone, and
489 expressions like that make it almost impossible to infer the unit of the
490 result.</p>
491
492 <p>This problem is addressed by the introduction of a function
493 <code>IN_UNIT(expr, &lt;character_string_literal&gt;)</code>; the
494 second argument has to be a literal in order to make sure that an ADQL
495 translator has access to its value; this value must be in the format
496 defined by
497 <cite>std:VOUNIT</cite>. The intended functionality is that the
498 translator replaces the function call with a new expression that
499 is <code>expr</code> given in the unit defined by
500 the second argument if the translator can figure out <code>expr</code>'s
501 unit, and it knows how to convert values in one unit into another.
502 In every other case, the query must be rejected as erroneous.</p>
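<p>The macro character of <code>IN_UNIT</code> can be sketched as a
rewrite performed at translation time. In the following Python sketch,
the function <code>expand_in_unit</code>, the exception
<code>UnitError</code> and the tiny factor table are all hypothetical; a
real translator would parse the unit strings according to
<cite>std:VOUNIT</cite> instead of looking them up in a fixed table.</p>

```python
# Scale factors relative to an arbitrary base unit per quantity.  This
# table is a hypothetical stand-in for a real VOUnit parser.
_UNIT_SCALE = {
    "deg": ("angle", 3600.0),
    "arcmin": ("angle", 60.0),
    "arcsec": ("angle", 1.0),
    "h": ("time", 3600.0),
    "s": ("time", 1.0),
}


class UnitError(ValueError):
    """Signals that the query has to be rejected as erroneous."""


def expand_in_unit(expr, expr_unit, target_unit):
    """Return a replacement expression giving expr in target_unit.

    expr is the already translated expression text, expr_unit the unit
    the translator inferred for it (None if it could not infer one).
    """
    if expr_unit not in _UNIT_SCALE or target_unit not in _UNIT_SCALE:
        raise UnitError("cannot determine or parse unit")
    kind_from, scale_from = _UNIT_SCALE[expr_unit]
    kind_to, scale_to = _UNIT_SCALE[target_unit]
    if kind_from != kind_to:
        raise UnitError("units are not convertible")
    return "(({}) * {!r})".format(expr, scale_from / scale_to)
```

<p>With this table, <code>expand_in_unit("col_in_deg", "deg",
"arcsec")</code> yields <code>((col_in_deg) * 3600.0)</code>, so the
conversion factor never has to appear in the user's query.</p>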
503
504 </div> <!-- subsubsection af-unitcast -->
505
506 <div class="section">
507 <h3><a id="af-ucdcol">Column References with UCD Patterns</a></h3>
508 <p>In the same spirit of a function that really is a macro evaluated by
509 an ADQL translator, we suggest a new function
510 <code>UCDCOL(&lt;character_string_literal&gt;)</code>. The
511 <em>character_string_literal</em> in this case specifies a POSIX
512 shell pattern (i.e., users write * for a sequence of 0 or more arbitrary
513 chars, ? for exactly one arbitrary char, [] for a character range, and
514 the backslash is the escape character)
515 for a UCD. The translator replaces the entire
516 function call with a reference to the first column whose UCD matches this pattern.
517 If no such column exists, the query must be rejected as erroneous.</p>
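<p>A translator could resolve such a call roughly as in the following
Python sketch. The names <code>expand_ucdcol</code> and
<code>NoSuchColumn</code> are hypothetical, and so is the representation
of the table metadata as an ordered list of (name, UCD) pairs. Python's
<code>fnmatch</code> module covers <code>*</code>, <code>?</code> and
character ranges but does not treat the backslash as an escape
character, so it only approximates the pattern language described
above. Matching is done case-insensitively here, on the assumption that
UCDs are to be compared that way.</p>

```python
import fnmatch


class NoSuchColumn(ValueError):
    """Signals that the query has to be rejected as erroneous."""


def expand_ucdcol(pattern, columns):
    """Return the name of the first column whose UCD matches pattern.

    columns is an ordered sequence of (column_name, ucd) pairs; the
    order decides which column counts as the first match.
    """
    folded = pattern.lower()
    for name, ucd in columns:
        if fnmatch.fnmatchcase(ucd.lower(), folded):
            return name
    raise NoSuchColumn("no column with a UCD matching " + repr(pattern))
```

<p>Since the result depends on column order, a specification of this
function would have to say which order is meant.</p>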
518
519 </div> <!-- subsubsection af-ucdcol -->
520 </div> <!-- subsection adql-features -->
521 </div> <!-- section adql -->
522
523
524 <div class="section">
525 <h1><a id="uws">UWS</a></h1>
526 <div class="section">
527 <h2><a id="uws-clar">UWS: Clarifications, Errata, and Recommendations</a></h2>
528
529 <div class="section">
530 <h3><a id="uc-initpost">Updating Parameters</a></h3>
531
532 <p>Section 2.1.11 of
533 <cite>std:UWS</cite> states that a "particular implementation of UWS may
534 choose to allow the parameters to be updated after the initial job
535 creation step, before the Phase is set to the executing state" and
536 subsequently allows POSTing to jobs/job-id, jobs/job-id/parameters and
537 PUTting to jobs/job-id/parameters/parameter-name.</p>
538
539 <p>It turned out that the concrete semantics of this cavalier approach
540 quickly become difficult. We therefore propose to amend the language
541 on changing parameters post-creation by:</p>
542
543 <blockquote>
544 <p>
545 In most cases, the values of the parameters are all established during
546 the initial POST that creates the job. However, a particular
547 implementation of UWS may choose to allow the parameters to be updated
548 after the initial job creation step, before the Phase is set to the
549 executing state. It should, however, not offer the ability to create new
550 parameters nor delete existing parameters.
551 The next major version of this specification will remove the ability
552 to set an individual parameter.</p>
553
554 <p>From the client perspective, there is only one guaranteed way to set
555 a parameter that all UWS services must implement: In the initial POST
556 that creates the job.</p>
557 </blockquote>
558 </div> <!-- subsubsection uc-initpost -->
559
560 <div class="section">
561 <h3><a id="uc-failedjobcreation">Behaviour for Failed Job Creation</a></h3>
562 <p>In Section 2.2.3.1 of
563 <cite>std:UWS</cite> a UWS is required to return a "code 303 'See
564 other'" "unless the service rejects the request". It is not specified
565 what should happen when the service rejects the request.</p>
566
567 <p>We propose to add, at an appropriate position, the following
568 text:</p>
569
570 <blockquote>
571 <p>If the execution of a UWS request fails, the service has to
572 generate an appropriate error message with codes in the 400 (client
573 error) or 500 (server error) ranges according to
574 <cite>std:HTTP</cite>. If the erroneous request is recoverable (e.g.,
575 a request for a transition to an impossible state), the job does not
576 go into the ERROR state because of a failed request.</p>
577
578 <p>The payload of such an error message SHOULD be a VOTable formatted
579 as a <cite>std:DALI</cite>-compliant error message, accompanied by one
580 of the legal VOTable MIME types. Clients should be prepared for other
581 documents coming back as payloads of such request responses. As such
582 events can be assumed major server failures, it is recommended to
583 abandon a job that had a non-VOTable response to any UWS request.</p>
584 </blockquote>
585
586 </div> <!-- subsubsection uc-failedjobcreation -->
587 </div> <!-- subsection uws-clar -->
588
589 <div class="section">
590 <h2><a id="uws-features">UWS: Proposed New Features</a></h2>
591 <div class="section">
592 <h3><a id="uf-quoteformat">Format of Quote</a></h3>
593 <p>Section 2.2.1 of
594 <cite>std:UWS</cite> states that the jobs/job-id/quote resource
595 represents quote as a number of seconds, while the schema represents
596 quote as an xs:dateTime.</p>
597
598 <p>This is an unnecessary inconsistency. Since schema changes are
599 probably more expensive, we propose to solve it by requiring the
600 representation in the resource to be in ISO 8601 YYYY-mm-ddThh:mm:ss
601 form. While doing this, we should also clarify the format for the value
602 of destruction, which currently just defers to
603 <cite>std:iso8601</cite>; it should be made clear that the particular
604 format just given is to be used.</p>
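<p>For illustration, a service could produce values in this form as in
the following Python sketch. The helper names
<code>format_uws_time</code> and <code>quote_from_seconds</code> are
hypothetical, and converting timezone-aware values to UTC before
formatting is an assumption of this sketch; the text above only fixes
the layout of the string.</p>

```python
from datetime import datetime, timedelta, timezone


def format_uws_time(moment):
    """Render a datetime as YYYY-mm-ddThh:mm:ss, dropping fractions.

    Timezone-aware values are converted to UTC first (an assumption of
    this sketch); naive values are formatted as they are.
    """
    if moment.tzinfo is not None:
        moment = moment.astimezone(timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def quote_from_seconds(now, seconds):
    """Turn a predicted run time in seconds into a quote timestamp."""
    return format_uws_time(now + timedelta(seconds=seconds))
```

<p>The second helper shows how an implementation that so far served the
quote as a number of seconds could move to the timestamp form.</p>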
605
606 </div> <!-- subsubsection uf-quoteformat -->
607
608 </div> <!-- subsection uws-features -->
609 </div> <!-- section uws -->
610
611
612 <div class="section">
613 <h1><a id="tap">TAP</a></h1>
614 <div class="section">
615 <h2><a id="tap-clar">TAP: Clarifications, Errata, and Recommendations</a></h2>
616
617
618 <div class="section">
619 <h3><a id="tc-uploadsyntax">Names of Uploaded Tables</a></h3>
620 <p>Section 2.5 of
621 <cite>std:TAP</cite> requires the name of the uploaded tables to be a
622 "legal ADQL table name with no catalog or schema (e.g. an unqualified
623 table name)". This language probably allows delimited identifiers,
624 as the ADQL <em>table_name</em> can expand to one. This, however, was
625 clearly not the intention of the text, as the use of delimited identifiers
626 is not (fully) supported by the syntax of the UPLOAD parameter. To
627 resolve these difficulties, we propose to
628 replace the parenthesis starting with "e.g." with:</p>
629
630 <blockquote>
631 <p>i.e., a string following the <em>regular_identifier</em> production
632 of
633 <cite>std:ADQL</cite>.</p>
634 </blockquote>
635
636 <p>This could, in theory, invalidate existing clients that might want to
637 use delimited identifiers in uploads. Due to the difficulties with the
638 UPLOAD parameter syntax, however, that would not really be supported in
639 version 1, either. Thus, we claim that this language can enter in a
640 minor version.</p>
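<p>A service or client could check upload table names against the
proposed rule as in the following Python sketch. The regular expression
is our reading of the <em>regular_identifier</em> production (a Latin
letter followed by letters, digits, or underscores) and should be
verified against the grammar in <cite>std:ADQL</cite>; the function name
<code>is_valid_upload_name</code> is hypothetical, and the sketch does
not check for collisions with ADQL reserved words.</p>

```python
import re

# Assumed shape of an ADQL regular_identifier: a Latin letter followed
# by any number of letters, digits, or underscores.
_REGULAR_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_upload_name(name):
    """Return True if name looks like an unqualified regular identifier."""
    return _REGULAR_IDENTIFIER.fullmatch(name) is not None
```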
641
642 </div> <!-- subsubseciton tc-uploadsyntax -->
643
644 <div class="section">
645 <h3><a id="tc-multiupload">Multiple UPLOAD Posts</a></h3>
646 <p>Since UWS allows posting parameters after job creation, Section 2.5.1
647 of
648 <cite>std:TAP</cite> needs to specify what happens when the UPLOAD
649 parameter is posted into a job that already has one or more uploads. We
650 propose to add at the end of the section:</p>
651
652 <blockquote>
653 <p>UPLOADs are accumulating, i.e., each UPLOAD parameter given will
654 create one or more tables in TAP_UPLOAD. When the table names from two
655 or more upload items agree after case folding, the service behaviour is
656 unspecified. Clients thus cannot reliably overwrite uploaded tables; to
657 correct errors, they have to tear down the existing job and create a new
658 one.</p>
659 </blockquote>
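<p>Since the behaviour for colliding names would be unspecified, a
client may want to detect such collisions before submitting a job. The
following Python sketch groups table names that agree after case
folding; the function name <code>find_colliding_uploads</code> is
hypothetical, and plain lowercasing is used as the case folding on the
assumption that upload names are ASCII regular identifiers.</p>

```python
from collections import defaultdict


def find_colliding_uploads(names):
    """Return groups of table names that agree after case folding.

    names is the sequence of table names from all UPLOAD items of a
    job, in the order in which they were given.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

<p>An empty result means that no two upload items compete for the same
table in TAP_UPLOAD.</p>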
660 </div> <!-- subsubseciton tc-multiupload -->
661
662
663 <div class="section">
664 <h3><a id="tc-dbregion">Database Column Types</a></h3>
665 <p>Section 2.5 of
666 <cite>std:TAP</cite> gives "database column types" for all kinds of
667 VOTable objects. Given the lack of an ADQL type system, this must
668 clearly be taken with a grain of salt; the types given in this column at
669 least cannot be taken as conformance criteria. We propose to add the
670 following language before section 2.5.1:</p>
671
672 <blockquote>
673 <p>Note that the last column of Table (x) is not normative.
674 Implementations SHOULD try to make sure that the actual types chosen are
675 at least signature-compatible with the recommended types (i.e., integers
676 should remain integers, floating-point values floating-point values,
677 etc.), such that clients can reliably write queries against uploaded
678 tables.</p>
679 <p>For columns with xtype <code>adql:REGION</code>, this is particularly
680 critical, since databases typically use different types to represent
681 various STC-S objects. Clients are advised to assume that such columns
682 will be approximated with polygons in the actual database table.</p>
683 </blockquote>
684 </div> <!-- subsubsection tc-dbregion -->
685
686 <div class="section">
687 <h3><a id="tc-size">The size Column in TAP_SCHEMA</a></h3>
688 <p>The table TAP_SCHEMA.columns as specified in section 2.6.3 of
689 <cite>std:TAP</cite> has a column named size. This is unfortunate since
690 SIZE is an ADQL reserved word, and thus must be quoted in queries.</p>
691
692 <p>We therefore propose to append the following language to section
693 2.6.3:</p>
694
695 <blockquote>
696 <p>To use <code>size</code> in a query, it must be put in double quotes
697 since it collides with an ADQL reserved word. Since delimited
698 identifiers are case-sensitive, for the size column both clients and
699 servers MUST always (in particular, in the DDL for TAP_SCHEMA) use lower
700 case exclusively.</p>
701 <p>In the next major version of TAP, this column will be called
702 <code>arraysize</code>.</p>
703 </blockquote>
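
<p>To make the quoting rule concrete, here is a minimal sketch of how a
client might build a query against TAP_SCHEMA.columns. The function name
is hypothetical; the column names other than <code>size</code> are the
ones we understand TAP_SCHEMA.columns to carry.</p>

<pre><![CDATA[
```python
def columns_query(table_name):
    """Return an ADQL query listing the columns of one table.

    "size" is written as a lower-case delimited identifier because SIZE
    is an ADQL reserved word and delimited identifiers are case-sensitive.
    """
    # Double single quotes so the name is a valid ADQL string literal.
    escaped = table_name.replace("'", "''")
    return ('SELECT column_name, datatype, "size" '
            "FROM TAP_SCHEMA.columns "
            "WHERE table_name = '%s'" % escaped)


print(columns_query("ppmxl.main"))
```
]]></pre>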
704
705 </div> <!-- subsubsection tc-size -->
706
707
708 <div class="section">
709 <h3><a id="tc-errordoc">Use of VOTable</a></h3>
710 </div> <!-- subsubsection tc-errordoc -->
711 <p>To allow the text to be consistent with the rules for VOTable error
712 documents, we propose the following changes in Section 2.9 of
713 <cite>std:TAP</cite>:</p>
714
715
716 <table>
717 <tr><th>Current</th><th>New</th></tr>
718 <tr>
719 <td>
720 The VOTable must contain a RESOURCE element identified with the
721 attribute type='results', containing a single TABLE element with the
722 results of the query.</td>
723 <td>
724 The VOTable must contain a RESOURCE element identified with the
725 attribute type='results', containing exactly one TABLE element with the
726 results of the query if the job execution was successful or no TABLE
727 element if the job execution failed to produce a result.</td>
728 </tr>
729 <tr>
730 <td>The RESOURCE element must contain, before the TABLE element, an INFO
731 element with attribute name = "QUERY_STATUS". The value attribute must contain one of the following values:</td>
732 <td>The RESOURCE element must contain an INFO
733 element with attribute <code>name="QUERY_STATUS"</code> indicating the
734 success of the operation. For RESOURCE elements
735 that contain a TABLE element, this INFO element must appear lexically
736 before the TABLE. The following values are defined for this INFO
737 element's value attribute:</td>
738 </tr>
739 </table>
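
<p>The proposed rules can be checked mechanically. The sketch below is
one possible reading of them, not a reference validator: the function
names are ours, the sample documents are made up, and the status values
"OK" and "ERROR" are taken from our understanding of
<cite>std:TAP</cite>.</p>

<pre><![CDATA[
```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so any VOTable version matches."""
    return tag.rsplit("}", 1)[-1]


def check_results_resource(votable_text):
    """Return (QUERY_STATUS value, has_table) for the results RESOURCE."""
    root = ET.fromstring(votable_text)
    for res in root.iter():
        if local_name(res.tag) != "RESOURCE" or res.get("type") != "results":
            continue
        kinds = [local_name(child.tag) for child in res]
        # Position of the first QUERY_STATUS INFO among the children.
        status_pos = next(
            (i for i, child in enumerate(res)
             if local_name(child.tag) == "INFO"
             and child.get("name") == "QUERY_STATUS"), None)
        if status_pos is None:
            raise ValueError("no QUERY_STATUS INFO in results RESOURCE")
        if kinds.count("TABLE") > 1:
            raise ValueError("more than one TABLE in results RESOURCE")
        if "TABLE" in kinds and status_pos > kinds.index("TABLE"):
            raise ValueError("QUERY_STATUS INFO must precede the TABLE")
        return res[status_pos].get("value"), "TABLE" in kinds
    raise ValueError("no RESOURCE with type='results'")


# Made-up minimal documents for illustration only.
OK_DOC = ('<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.2">'
          '<RESOURCE type="results">'
          '<INFO name="QUERY_STATUS" value="OK"/>'
          '<TABLE/></RESOURCE></VOTABLE>')
ERROR_DOC = ('<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.2">'
             '<RESOURCE type="results">'
             '<INFO name="QUERY_STATUS" value="ERROR">Syntax error</INFO>'
             '</RESOURCE></VOTABLE>')

print(check_results_resource(OK_DOC))
print(check_results_resource(ERROR_DOC))
```
]]></pre>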
740
741
742
743
744
745 </div> <!-- subsection tap-clar -->
746
747
748 <div class="section">
749 <h2><a id="tap-features">TAP: New Features</a></h2>
750
751 <div class="section">
752 <h3><a id="tf-examples">An examples Endpoint</a></h3>
753 <p>Feedback from TAP users indicates that providing query examples is
754 considered most helpful, which is probably not surprising since to
755 effectively use a TAP service, a user has to combine knowledge of a
756 fairly complex query language with server-specific metadata like table
757 schemata and local extensions as well as domain knowledge. A head start
758 as provided by examples doing something related to what the users
759 actually want is therefore most welcome.</p>
760
761 <p>TAP services are usually accessed through specialized clients.
762 Therefore, a simple link "for examples see here" will in general not
763 work for them. In principle, one could simply communicate an example
764 URL to a client and let the user browse it. Allowing a certain amount
765 of structuring within the document at this URL, however, lets clients
766 do some useful in-application presentation of the examples.</p>
767
768 <p><cite>std:DALI</cite> defines a simple system to communicate examples
769 to humans and machine clients alike, based on RDFa. This section
770 specifies how the generic DALI specification is to be applied to
771 TAP.</p>
772
773 <div class="section">
774 <h4><a id="tf-ex-endpoint">The Endpoint</a></h4>
775 <p>A TAP server exposes the example queries in an <code>examples</code>
776 endpoint
777 residing next to <code>sync</code>, <code>async</code>,
778 and the VOSI endpoints. A GET from
779 this endpoint MUST yield a document with a MIME type of either
780 <code>application/xhtml+xml</code> or <code>text/html</code>. A service
781 that does not provide examples MUST return a 404 HTTP status on
782 accessing this resource.</p>
783
784 <p>If present, the endpoint must be represented in a capability in the
785 TAP service's registry record. The capability's standardID is, as
786 defined by DALI, <code>ivo://ivoa.net/std/DALI#examples</code>. A
787 capability element could hence look like this:</p>
788
789 <pre>
790 <![CDATA[
791 <capability standardID="ivo://ivoa.net/std/DALI#examples">
792 <interface xsi:type="vr:WebBrowser">
793 <accessURL use="full">http://localhost:8080/tap/examples</accessURL>
794 </interface>
795 </capability>
796 ]]>
797 </pre>
798
799 </div> <!-- subsubsection tf-ex-endpoint -->
800
801 <div class="section">
802 <h4><a id="tf-ex-content">Document Content</a></h4>
803
804 <p>The document at <code>examples</code> MUST follow the rules laid out
805 for DALI-examples in <cite>std:DALI</cite>; in particular, it must be
806 valid XML, viewable with "common web browsers".</p>
807
808 <p>TAP defines two additional properties within the
809 <code>ivo://ivoa.net/std/DALI-examples</code> (note that at the time of
810 writing, the DALI PR has "DALI#examples" here, which we have corrected)
811 vocabulary:</p>
812
813 <ul>
814 <li><code>query</code> --
815 each example MUST have a unique child
816 element with simple text content having a <code>property</code>
817 attribute valued <code>query</code>. It contains the query itself,
818 preferably with extra whitespace for easy human consumption and editing.
819 This will usually be an HTML <code>pre</code> element.</li>
820
821 <li><code>table</code> -- examples MAY also have descendants with
822 <code>property</code> attributes having the value
823 <code>table</code>. These must have pure text content and
824 contain fully qualified table names to which the query is somehow
825 "pertaining". Suitable HTML elements holding these include
826 <code>span</code>, or <code>a</code> (which would
827 allow linking to further information on the table).</li>
828
829 </ul>
830 <p>An example of a document served from the examples endpoint is given
831 in Appendix <span class="xref">appA</span>.</p>
832 </div> <!-- subsubsubsection tf-ex-content -->
833
834 <div class="section">
835 <h4><a id="tf-ex-use">Intended Use</a></h4>
836
837 <p>In the simplest case, TAP clients can provide links to the current
838 server's examples endpoint. A more advanced client would offer a
839 control for selecting among the example titles, with the
840 option of entering the sample query into the query field of the user
841 interface. The documentation for the query would be accessed by opening
842 a web browser using the base example URL and the example's fragment
843 identifier.</p>
844
845 <p>Advanced clients could render the HTML div elements themselves, and they
846 could provide a means to discover example queries involving particular
847 tables in their table metadata browser based on
848 <code>property=table</code> markup.</p>
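
<p>A client-side index of examples by table could be built roughly as
follows. This is a sketch under assumptions: the sample markup is made
up, the <code>typeof="example"</code> and <code>property="name"</code>
attributes reflect our reading of <cite>std:DALI</cite>, and the
function names are hypothetical.</p>

<pre><![CDATA[
```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up examples document; real ones are served from the endpoint.
SAMPLE = """<div xmlns="http://www.w3.org/1999/xhtml"
     vocab="ivo://ivoa.net/std/DALI-examples">
  <div typeof="example" id="bright" resource="#bright">
    <h2 property="name">Bright stars</h2>
    <p>Uses <span property="table">ppmxl.main</span>.</p>
    <pre property="query">SELECT TOP 10 * FROM ppmxl.main</pre>
  </div>
</div>"""


def text_of(element):
    """Concatenate all text below an element, trimmed."""
    return "".join(element.itertext()).strip()


def examples_by_table(xhtml_text):
    """Map table names to (example id, query) pairs."""
    index = defaultdict(list)
    root = ET.fromstring(xhtml_text)
    for example in root.iter():
        if example.get("typeof") != "example":
            continue
        queries = [text_of(el) for el in example.iter()
                   if el.get("property") == "query"]
        if len(queries) != 1:
            continue  # skip examples violating the one-query rule
        for el in example.iter():
            if el.get("property") == "table":
                index[text_of(el)].append((example.get("id"), queries[0]))
    return dict(index)


print(examples_by_table(SAMPLE))
```
]]></pre>

<p>The example id doubles as the fragment identifier, so the
human-readable documentation for an entry would be found at the examples
URL followed by <code>#</code> and that id.</p>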
849 </div> <!-- subsubsubsection tf-ex-use -->
850
851
852 <div class="section">
853 <h4><a id="tf-ex-validation">Validation</a></h4>
854
855 <p>Appendix <span class="xref">appB</span> gives an XSLT 1.0 stylesheet
856 that extracts the machine readable information from compliant documents
857 and emits the results in text format.</p>
858
859 <p>The style sheet checks for proper vocabulary declaration. If you
860 have no element declaring the vocabulary, the output will be empty.</p>
861
862 <p>Service operators should also use RDFa validation tools, e.g., the W3C RDFa
863 validator <cite>RDFaVal</cite>, to make sure their document is usable
864 from RDF tools.</p>
865 </div> <!-- subsubsubsection tf-ex-validation -->
866
867 </div> <!-- subsubsection tf-examples -->
868
869 <div class="section">
870 <h3><a id="tf-plan">A plan Endpoint</a></h3>
871
872 <p class="tbw">CDS have a debug endpoint with additional information;
873 join their concepts with this.</p>
874
875 <p>As already noted in <cite>std:TAP</cite>, it is notoriously
876 difficult to predict the runtime of SQL queries. For nontrivial
877 queries, even experts may have a hard time figuring out performance
878 bottlenecks. Therefore, most database systems provide some mechanism to
879 obtain a query plan, that is, to inspect what elementary operations will
880 be performed for a given query.</p>
881
882 <p>Since TAP queries are typically formulated by persons not intimately
883 familiar with the database queried, the need for a mechanism allowing
884 insights into the database engine's reasoning is
885 even more pronounced. On the other hand, different database systems
886 give their plans in completely different formats and even schemata. In
887 addition, as the Postgres Documentation
888 says: "Plan-reading is an art that deserves an extensive tutorial"
889 (<cite>doc:Postgres92</cite>, Sect. 14.1).</p>
890
891 <p>Thus, specifying a fixed format for query plans that would be both
892 expressive enough and sufficiently generic to be easily adaptable to
893 various backend database engines is probably impossible. To still allow
894 users to inspect actual query plans, we propose the following language
895 be added at the end of section 2.2.2 of <cite>std:TAP</cite>:</p>
896
897 <blockquote>
898 <p>In addition to the UWS resources, a TAP server SHOULD support a
899 child <code>plan</code> for each job resource. If retrieving this
900 resource is successful (i.e., results in a 200 HTTP response after
901 possible redirects and authentication), it MUST be a preformatted
902 document with MIME type <code>text/plain</code>. Within it, the actual
903 query as executed by the database engine MUST come first.</p>
904
905 <p>After at least one blank line, a rendering of the query plan follows.
906 Note that the query as executed may contain blank lines, which means
907 that machine clients cannot use the blank line to separate query and
908 plan. In general, clients SHOULD display the plan without any
909 reformatting in a fixed-width font.</p>
910
911 <p>Since it is hard to define a generic and sufficiently
912 expressive format for query plans and the authors want to avoid
913 excessive implementation cost for this feature, this specification does
914 not give a format for the query plan. Implementors are advised to keep
915 as much of the "native" plan format of their database engine as
916 possible.</p>
917
918 <p>After the plan, the service is free to give additional debugging
919 information. The intended audience for this information is again
920 humans, so even in cases where proprietary clients actually parse out
921 information from that area, such information should still be
922 decipherable by knowledgeable humans.</p>
923
924 <p>If the creation of the query plan fails, the service MUST reply with
925 a 400 (if the failure appears to be due to syntax errors in the query,
926 the query plan not being available in this UWS phase, or
927 similar problems) or 500 HTTP status code. Errors in plan generation do
928 <em>not</em> change the phase of the job. Clients may thus use the plan
929 endpoint to check the syntax of a query on services supporting it.</p>
930
931 <p>Services that cannot or choose not to support the retrieval of query
932 plans MUST respond with a 404 HTTP code to requests for
933 <code>plan</code> children of job resources.</p>
934
935 <p>Except for 404 responses, all documents delivered from the plan
936 endpoint MUST have the MIME type text/plain. They should contain ASCII
937 exclusively, but clients SHOULD assume UTF-8 encoding if no
938 character set is declared by HTTP means.</p>
939 </blockquote>
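
<p>On the client side, the status codes and the character set rule above
might be handled as in the following sketch. It works on an
already-received response (status, Content-Type header value, body
bytes); the function names and the returned labels are ours and not part
of the proposed text.</p>

<pre><![CDATA[
```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset given in a Content-Type value, else utf-8."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or "utf-8"


def interpret_plan_response(status, content_type, body):
    """Classify a response from a job's plan child resource.

    Returns a (label, text) pair; text is None when the service does
    not support plans.
    """
    if status == 404:
        return "unsupported", None
    text = body.decode(declared_charset(content_type), errors="replace")
    if status == 200:
        return "plan", text          # query first, then the plan
    if status == 400:
        return "rejected", text      # e.g. syntax error or wrong phase
    if status == 500:
        return "server-error", text
    return "unexpected", text


print(interpret_plan_response(200, "text/plain", b"SELECT 1\n\nResult"))
```
]]></pre>

<p>Since plan generation errors do not change the job phase, a client
could call such a function repeatedly while the user edits the
query.</p>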
940
941 </div> <!-- subsubsection tf-plan -->
942
943
944
945 <div class="section">
946 <h3><a id="tf-scaletable">Scalable tables Endpoint</a></h3>
947 <p>For archives serving hundreds or thousands of tables, the
948 tables endpoint on TAP services as defined by <cite>std:VOSI</cite>
949 will have to return documents of several dozen megabytes. This results
950 in nontrivial transfer times for data that in all likelihood is
951 uninteresting to users, who typically write queries against only a
952 few of those tables.</p>
953
954 <p>To mitigate this problem, we propose to define that
955 <code>vs:Table</code> typed elements in responses from VOSI table
956 endpoints that have no <code>column</code> children are to be regarded
957 as stubs by clients. A client SHOULD give the user the possibility to
958 request "full" information on such a stubbed table. This full
959 information is available from a child resource of tables named like the
960 table, in exactly the capitalization given in the <code>name</code>
961 child of the table stub; it would come as the full table element.</p>
962
963 <p>As an example, a service might return the following from its tables
964 endpoint:</p>
965
966 <pre><![CDATA[
967 <tableset xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1"
968 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
969 xsi:type="vs:TableSet">
970 <schema>
971 <name>ppmxl</name>
972 <table>
973 <name>ppmxl.main</name>
974 </table>
975 </schema>
976 </tableset>]]>
977 </pre>
978
979 <p>A client could then retrieve the URL
980 <code>.../tables/ppmxl.main</code> and would receive something like
981 this:</p>
982
983 <pre><![CDATA[
984 <table xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1"
985 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
986 xsi:type="vs:Table">
987 <name>ppmxl.main</name>
988 <description> PPMXL is a catalog of positions, proper motions...
989 </description>
990 <column>
991 <name>ipix</name>
992 <description>Identifier (Q3C ipix of the USNO-B 1.0 object)</description>
993 ...
994 </table>]]>
995 </pre>
996
997 <p>More formally, we propose to replace the last paragraph of section
998 3.4, "Table metadata", of <cite>std:VOSI</cite>, Version 1.0, with the
999 following text:</p>
1000
1001 <blockquote>
1002 <p>In the REST binding, the registered URL retrieves an XML document
1003 containing this element. However, services exposing a large number of
1004 tables may only write table stubs into the document retrieved from this
1005 web resource. Table stubs are table elements containing no column
1006 children. While the XSD requires a name child to be present, the
1007 services may or may not include any of the remaining table metadata.</p>
1008
1009 <p>Still in the REST binding, the server that has produced such a columnless
1010 table element should provide a child of the tables resource named like the
1011 content of the table's <code>name</code> child element, with any leading
1012 or trailing whitespace removed. If a request for this resource is
1013 successful, the response must be an XML document containing
1014 a single element of the type
1015 <em>{http://www.ivoa.net/xml/VODataService/v1.1}Table</em> with all metadata
1016 available for the table.</p>
1017 </blockquote>
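
<p>A client could find the stubs and the URLs of their full descriptions
along the following lines. This is a sketch only: it assumes the
unqualified element names used in the example above, a hypothetical
tables URL, and percent-encoding of the table name, none of which the
proposed text prescribes.</p>

<pre><![CDATA[
```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# The stubbed tableset from the example above.
TABLESET = """<tableset xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="vs:TableSet">
  <schema>
    <name>ppmxl</name>
    <table>
      <name>ppmxl.main</name>
    </table>
  </schema>
</tableset>"""


def stub_urls(tables_url, tableset_text):
    """Map the name of each table stub to the URL of its full metadata.

    A stub is a table element without column children.
    """
    root = ET.fromstring(tableset_text)
    urls = {}
    for table in root.iter("table"):
        if table.find("column") is None:
            # Leading and trailing whitespace is removed; case is kept.
            name = (table.findtext("name") or "").strip()
            urls[name] = tables_url.rstrip("/") + "/" + quote(name, safe="")
    return urls


print(stub_urls("http://example.org/tap/tables", TABLESET))
```
]]></pre>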
1018
1019 </div> <!-- subsubsection tf-scaletable -->
1020
1021 <div>
1022 <h3><a id="tf-noasync">Making the async Endpoint Optional</a></h3>
1023
1024 <p>Some existing TAP-like services have data that is small and simple enough
1025 that synchronous queries are likely to be sufficient. They therefore
1026 chose not to implement the async endpoint, which makes these services
1027 technically non-TAP. Given the implementation overhead of a UWS for
1028 something that is not really required by the services in question, the
1029 choice seems reasonable, though, and the services are "mostly
1030 interoperable" with existing clients in that there are usually ways to
1031 operate the services from the clients.</p>
1032
1033 <p>Therefore, we propose to make the async endpoint optional and add
1034 language that requires clients to offer ways to fall back to synchronous
1035 operation for services that do not support async.</p>
1036 </div>
1037
1038 </div> <!-- subsection tap-features -->
1039 </div> <!-- section tap -->
1040
1041
1042 <div class="appendices">
1043 <div class="section">
1044 <h2><a id="appA">Appendix A. An Example for an /examples
1045 Document</a></h2>
1046 <?incxml href="../examples_ex.html"?>
1047 </div> <!-- appA -->
1048
1049 <div class="section">
1050 <h2><a id="appB">Appendix B. An XSLT stylesheet for validating an examples
1051 Document</a></h2>
1052 <?incxml href="../tapexex.xslt"?>
1053 </div> <!-- appB -->
1054
1055 </div> <!-- appendices -->
1056
1057 <div class="section-nonum">
1058 <h1 ><a id="references"></a>References</h1>
1059 <?bibliography ivoadoc/refs ?>
1060
1061 <?includebibliography?>
1062 </div> <!-- references -->
1063
1064 </div> <!-- body -->
1065 </body>
1066 </html>
1067
1068 <!-- vim: tw=72
1069 -->
