International Virtual Observatory Alliance

TAP Implementation Notes
Version 1.0

IVOA Note 13 December 2013

Working Group
http://www.ivoa.net/twiki/bin/view/IVOA/IvoaGridAndWebServices
This version:
http://www.ivoa.net/Documents/TAPNotes-20131213
Latest version:
not issued outside DAL WG
Previous version(s):
None
Authors:
Markus Demleitner
Paul Harrison
Mark Taylor

Abstract

This IVOA Note discusses several clarifications to the TAP protocol stack, i.e., to the ADQL dialect, the UWS job system, the VOSI metadata interfaces, and TAP itself. It also proposes a number of enhancements that might be incorporated in the next versions of the respective standards. The authors hope that the proposed text changes and additions can mature while in the relatively fluid note state to achieve a rapid and easy standards process later on.

Further contributions to this text are most welcome.

Status of This Document

This is an IVOA note published within the IVOA DAL working group. The first release of this document was on 2013-12-13.

This is an IVOA Note expressing suggestions from and opinions of the authors. It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory. It should not be referenced or otherwise interpreted as a standard specification.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgements

Several sections of this document are based on the TAPImplementationNotes page on the IVOA wiki [IVOAWIKI]. Several persons contributed to its content, including Mark Taylor, Paul Harrison, Pierre LeSidaner, Tom McGlynn, and Markus Demleitner.

1. Introduction

The protocol stack for exchanging database queries and their results within the Virtual Observatory context is, by 2013, implemented in several software packages, both on the server and on the client side.

Several implementors found that the respective standards leave some questions open. The first purpose of this document is to collect these questions and give answers reflecting a broad consensus on the part of the implementors. The points raised in these clarifications, errata and recommendations should be addressed in future revisions of the standard texts. It is the intent of this document to serve as an evolving reference for implementors that should eventually reflect the updates to the actual standards.

With the experience gathered from roll-out and use of the protocols, several additions to (or deletions from) the standards appeared beneficial. This document collects such proposals for changes to the content of the standards. Some of these changes have been written such that neither servers nor clients break and thus are candidates for minor updates to the standards, whereas the adoption of others might require new major releases. Again, the authors plan to evolve this document to have the note reflect the eventual plans for updates to the standards.

2. ADQL

2.1. ADQL: Clarifications, Errata, and Recommendations

2.1.1. The Separator Nonterminal

The grammar given in appendix A of [std:ADQL] gives a nonterminal separator, expanding to either a comment or whitespace. This nonterminal, however, is only referenced within the rule for character_string_literal. It is uncontentious that the intent is to allow comments and whitespace wherever SQL1992 allows them. With the nonterminal in the grammar, however, the ADQL standard says differently, and there should be a clarification.

One option for such a clarification is to amend section 2.1 of [std:ADQL] with a subsection 2.1.4, "Tokens and literals", containing text like the following (taken essentially from [std:SQL1992]):

Any token may be followed by a separator. A nondelimiter token shall be followed by a delimiter token or a separator.

Since the full rules for the separator are somewhat more complex in [std:ADQL], an attractive alternative could be to omit the separator nonterminal from the grammar and to just note:

Whitespace and comments can occur wherever they can occur in [std:SQL1992].

2.1.2. Type System

The ADQL specification does not explicitly talk about types. Some intentions regarding types can be taken from the grammar (e.g., the lack of a boolean type), but it is clear that for a predictable behaviour across individual ADQL implementations, ADQL should talk about types. The TAP specification has already covered most of the ground here, with a table on PDF page 19 in version 1.0. The following proposal mainly builds on this.

To introduce a notion of types into section 2 of the ADQL recommendation, it should be amended with a subsection 2.6, "ADQL Type System", as follows:

ADQL defines no data definition language (DDL). It is assumed that table definition and data ingestion are performed in the backend database's native language and type system.

However, column metadata needs to give column types in order to allow the construction of queries that are both syntactically and semantically correct. Examples of such metadata include VODataService's vs:TAPType [std:VODS11] or TAP's TAP_SCHEMA. Services SHOULD, if at all possible, try to express their column metadata in these terms even if the underlying database employs different types. Services SHOULD also use the following mapping when interfacing to user data, either by serializing result sets into VOTables or by ingesting user-provided VOTables into ADQL-visible tables. Where non-ADQL types are employed in the backend, implementors SHOULD make sure that all operations that are possible with the recommended ADQL type are also possible with the type used in the backend engine. For instance, the ADQL string concatenation operator || should be applicable to all columns resulting from VOTable char-typed columns.

VOTable                                      ADQL
datatype      arraysize   xtype              type
boolean       1                              implementation defined
short         1                              SMALLINT
int           1                              INTEGER
long          1                              BIGINT
float         1                              REAL
double        1                              DOUBLE
(numeric)     > 1                            implementation defined
char          1                              CHAR(1)
char          n*                             VARCHAR(n)
char          n                              CHAR(n)
unsignedByte  n*                             VARBINARY(n)
unsignedByte  n                              BINARY(n)
unsignedByte  n, *, n*    adql:BLOB          BLOB
char          n, *, n*    adql:CLOB          CLOB
char          n, *, n*    adql:TIMESTAMP     TIMESTAMP
char          n, *, n*    adql:POINT         POINT
char          n, *, n*    adql:REGION        REGION

"Implementation defined" in the above table means that an implementation is free to reject attempts to (de-) serialize values in these types. They are to be considered unsupported by ADQL, and the language provides no means to manipulate "native" representations of them.

References to REGION-typed columns must be valid wherever the ADQL region nonterminal is allowed. References to POINT-typed columns must be valid wherever the ADQL point nonterminal is allowed.

2.1.3. Empty Coordinate Systems

The legal values and the semantics of the first arguments to the geometry constructors (POINT, BOX, CIRCLE, POLYGON) have been left largely open by the ADQL standard. The TAP standard clarified those somewhat to the effect that the prescriptions became implementable. On the other hand, the only thing clients can reasonably expect according to TAP (on a recommendation basis) from a server is one of four reference frames. Compared to the implementation effort and the potential for user confusion, the additional expressiveness gained by keeping the first argument seems minute. Even allowing more expressive system strings will not help the feature much, since non-trivial transformations (e.g., between reference positions) will need more data than merely the celestial coordinates available to the geometry constructors.

We therefore propose to deprecate the first argument in a point release of ADQL. In the next major release, the first argument as defined in ADQL2 should be declared as ignored. The standard should require constructors both with and without the current first argument, though, in order to ensure backward compatibility for ADQL2 queries.

To implement the first step, we propose replacing the second paragraph on PDF page 10 of [std:ADQL] (starting with "For all these functions...") with:

For historical reasons, the geometry constructors (BOX, CIRCLE, POINT, POLYGON) require a string-valued first argument. It was intended to carry information on a reference system or other coordinate system metadata. In this version, we recommend ignoring this first argument, and clients are advised to pass an empty string here. Future versions of this specification will make this first, string-valued parameter optional for the listed functions.

In consequence, the COORDSYS function would be taken out of the enumeration on PDF page 9, and its description on PDF page 11 would be removed, too. All examples would use an empty string rather than "ICRS GEOCENTER" -- which is not contained in the TAP clarification anyway -- as in the current text.

A library of standard generalized user defined functions (see section 2.2.3) could provide for simple conversion between reference frames as well as more demanding transformations, e.g., between epochs or reference positions. This, however, depends on allowing geometry-valued user defined functions and is outside of the scope of a clarification.

2.2. ADQL: Proposed New Features

2.2.1. Simple Crossmatch Function

Since a simple positional crossmatch is such a common operation, we should define a function CROSSMATCH(ra1, dec1, ra2, dec2, radius) -> INTEGER returning 1 if (ra1, dec1) and (ra2, dec2) are within radius degrees of each other. This allows more compact expressions than the conventional CONTAINS(POINT, CIRCLE) construct, and ADQL to SQL translators can more easily exploit special constructs for fast crossmatching that may be built into the backend databases.
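The intended semantics can be illustrated with a minimal Python sketch. It assumes that all arguments are in degrees, that "within" means a separation less than or equal to the radius (the proposal does not say whether the boundary is included), and it uses the haversine formula, which is one of several ways to compute an angular separation; none of this is prescribed by the proposal.

```python
import math


def crossmatch(ra1, dec1, ra2, dec2, radius):
    """Return 1 if the two positions are within `radius` degrees, else 0.

    All arguments are in degrees.  Including the boundary (<=) is an
    assumption of this sketch.
    """
    phi1, phi2 = math.radians(dec1), math.radians(dec2)
    dlam = math.radians(ra2 - ra1)
    # Haversine formula; well-behaved for the small separations that
    # are typical for crossmatches.
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    separation = 2 * math.asin(min(1.0, math.sqrt(a)))
    return 1 if math.degrees(separation) <= radius else 0
```

In a query, 1=CROSSMATCH(ra1, dec1, ra2, dec2, radius) would take the place of 1=CONTAINS(POINT('', ra1, dec1), CIRCLE('', ra2, dec2, radius)).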

2.2.2. No Type-based Decay of INTERSECTS

Section 2.4.11 of [std:ADQL] stipulates that a call to INTERSECTS should decay to a CONTAINS when one argument is a POINT. This rule is a major implementation liability for simple translators, since it is the only place in the ADQL specification that actually requires a type calculus. For a feature that does not actually add functionality, this seems a high price to pay.

We therefore recommend striking the text from "Note that if one of the arguments" through "equivalent to INTERSECTS(b,a)" and adding at the end of 2.4.11:

The arguments to INTERSECTS SHOULD be geometric expressions evaluating to either BOX, CIRCLE, POLYGON, or REGION. Previous versions of this specification allow POINTs as well and require servers to interpret the expression as a CONTAINS with the POINT moved into the first position. Servers SHOULD still implement that behaviour, but clients SHOULD NOT expect it. It will be dropped in the next major version of this specification.

2.2.3. Generalized User Defined Functions

Currently, user defined functions may only return numbers or strings (in terms of the grammar, only numeric_value_function and string_value_function can expand to user_defined_function). Many interesting functions (e.g., coordinate transforms, applying proper motions) are extremely inconvenient to define with such a restriction. Therefore, we propose to add | <user_defined_function> to the right hand side of the geometry_value_function rule.

With this, we could define some standard functions for manipulating geometries; these should be defined in the standard, but they could remain optional. Clients can determine their availability using [std:TAPREGEXT].

A future version of this note will propose a library of such functions, including proper motion, precession, and system transformation.

2.2.4. Case-Insensitive String Comparisons

ADQL currently has no facility reliably allowing case-insensitive string comparisons. This is particularly regrettable since UCDs and at least the majority of the defined utypes are to be compared case-insensitively.

Thus, we propose the addition of a string function LOWER and the case-insensitive variant of LIKE, ILIKE. Since case folding is a nontrivial operation in a multi-encoding world, ADQL would only require standard behaviour for the ASCII characters (which would suffice for UCDs and utypes) and only recommend following algorithm R2 in section 3.13, "Default Case Algorithms" of [std:UNICODE] outside of ASCII.
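The required ASCII-only behaviour can be sketched as follows. The function names are hypothetical and only illustrate the minimum a conforming LOWER would have to do; characters outside of ASCII are passed through unchanged here, whereas the proposal recommends Unicode default case folding for them.

```python
import string

# Translation table that folds only the 26 ASCII upper-case letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(value):
    """Lower-case ASCII letters only; everything else is left untouched."""
    return value.translate(_ASCII_LOWER)


def ucd_equal(left, right):
    """Compare two UCDs (or utypes) case-insensitively."""
    return ascii_lower(left) == ascii_lower(right)
```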

The grammar changes are trivial.

2.2.5. Set Operators

ADQL 2.0 does not support any of the SQL UNION, EXCEPT and INTERSECT operators. Since at least set union and intersection are basic operations of relational algebra and combining data from several tables is an operation of significant practical use, this is a serious deficit. Also, there is probably no backend SQL system that does not support these operations.

Thus, to add minimal support of set operations to ADQL, ADQL systems will mainly need to update their grammars. The following rules, adapted from [std:SQL1992], will suffice (the query_expression rule replaces the one given in the current grammar, all others are new rules):

         <query_expression> ::=
                <non_join_query_expression>
              | <joined_table>

         <non_join_query_expression> ::=
                <non_join_query_term>
              | <query_expression> UNION  [ ALL ] <query_term>
              | <query_expression> EXCEPT [ ALL ] <query_term>

         <query_term> ::=
                <non_join_query_term>
              | <joined_table>

         <non_join_query_term> ::=
                <non_join_query_primary>
              | <query_term> INTERSECT [ ALL ] <query_primary>

         <query_primary> ::=
                <non_join_query_primary>
              | <joined_table>

         <non_join_query_primary> ::=
                <query_specification>
              | <left_paren> <non_join_query_expression> <right_paren>

This leaves out the CORRESPONDING specifications of SQL92, and it still does not include VALUES and explicit table specifications (which would enter through non_join_query_primary) in ADQL. None of these seem indispensable, although one could probably make a case for VALUES.

2.2.6. Adding a Boolean Type

Having a boolean type in ADQL could make some expressions nicer (e.g., it could eliminate the comparison against 1 for the geometry predicate functions). However, adding boolean functions and allowing references to boolean columns complicates catching syntax errors significantly, since expressions like WHERE colref would then parse and would only raise an error when it turns out that colref does not refer to a boolean column. Simple ADQL translators may not be able to verify this.

We therefore propose to add a boolean type to the ADQL type system (see section 2.1.2) without any grammatical support for it. However, the standard prose should be amended to contain:

If the backend database contains columns of type boolean, a comparison of those against the literal strings True and False must be true and false when the column is true and false, respectively. The comparison to other literals is undefined by this specification. Clients should note that the strings have to be entered exactly as given here, without changing case, adding whitespace, or any other modification.

If this change is adopted, the type system table given in section 2.1.2 should be updated; luckily, the VODataService specification underlying VOSI already allows BOOLEAN as a TAPType. In the table row for VOTable boolean, "implementation defined" should be replaced with "BOOLEAN".

2.2.7. Casting to Unit

ADQL translators can typically introspect the tables they operate on, and thus can typically infer the (physical) unit of a column. Manually converting units (as in col_in_deg*3600) is error-prone, and expressions like that make it almost impossible to infer the unit of the result.

This problem is addressed by the introduction of a function IN_UNIT(expr, <character_string_literal>); the second argument has to be a literal in order to make sure that an ADQL translator has access to its value; this value must be in the format defined by [std:VOUNIT]. The intended functionality is that the translator replaces the function call with a new expression that is expr given in the unit defined by the second argument if the translator can figure out expr's unit, and it knows how to convert values in one unit into another. In every other case, the query must be rejected as erroneous.
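The macro-like expansion described above could look roughly like the following sketch. The lookup table, the function name expand_in_unit and the exception class are hypothetical; a real translator would parse full VOUnit strings rather than use a fixed table of a few units.

```python
from fractions import Fraction

# Hypothetical table: unit -> (dimension, factor to the dimension's base
# unit).  Only a handful of simple units are covered in this sketch.
_TO_BASE = {
    "deg": ("angle", Fraction(1)),
    "arcmin": ("angle", Fraction(1, 60)),
    "arcsec": ("angle", Fraction(1, 3600)),
    "s": ("time", Fraction(1)),
    "min": ("time", Fraction(60)),
    "h": ("time", Fraction(3600)),
}


class UnitError(ValueError):
    """Raised when the query has to be rejected as erroneous."""


def expand_in_unit(expr, expr_unit, target_unit):
    """Return an expression for `expr` converted to `target_unit`.

    `expr_unit` is the unit the translator inferred for `expr`, or None
    if it could not infer one.
    """
    if expr_unit not in _TO_BASE or target_unit not in _TO_BASE:
        raise UnitError("cannot convert %r to %r" % (expr_unit, target_unit))
    src_dim, src_factor = _TO_BASE[expr_unit]
    dst_dim, dst_factor = _TO_BASE[target_unit]
    if src_dim != dst_dim:
        raise UnitError("incompatible units %s and %s" % (expr_unit, target_unit))
    factor = src_factor / dst_factor
    # Emit floating-point literals so the backend does not fall back to
    # integer division.
    return "(({}) * {}.0 / {}.0)".format(expr, factor.numerator, factor.denominator)
```

With this table, IN_UNIT(col_in_deg, 'arcsec') on a column given in degrees would be rewritten to ((col_in_deg) * 3600.0 / 1.0), while a request to convert degrees into seconds raises an error and the query is rejected.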

2.2.8. Column References with UCD Patterns

In the same spirit of a function that really is a macro evaluated by an ADQL translator, we suggest a new function UCDCOL(<character_string_literal>). The character_string_literal in this case specifies a POSIX shell pattern (i.e., users write * for a sequence of 0 or more arbitrary chars, ? for exactly one arbitrary char, [] for a character range, and the backslash is the escape character) for a UCD. The translator replaces the entire function call with the first match of a column matching this pattern. If no such column exists, the query must be rejected as erroneous.
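A translator could resolve UCDCOL along the lines of the following sketch, which uses the fnmatch module from the Python standard library. Several points are assumptions that the proposal leaves open: "first match" is taken to mean the first column in the order of the column list passed in, and matching is done case-insensitively because UCDs compare case-insensitively. Note also that fnmatch does not treat the backslash as an escape character, so that part of the proposed pattern syntax is not covered here.

```python
from fnmatch import fnmatchcase


def resolve_ucdcol(pattern, columns):
    """Return the name of the first column whose UCD matches `pattern`.

    `columns` is an ordered sequence of (column_name, ucd) pairs taken
    from the table metadata; ucd may be None or empty.
    """
    folded_pattern = pattern.lower()
    for name, ucd in columns:
        if ucd and fnmatchcase(ucd.lower(), folded_pattern):
            return name
    # No match: the query must be rejected as erroneous.
    raise ValueError("no column with a UCD matching %r" % pattern)
```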

3. UWS

3.1. UWS: Clarifications, Errata, and Recommendations

3.1.1. Updating Parameters

Section 2.1.11 of [std:UWS] states that a "particular implementation of UWS may choose to allow the parameters to be updated after the initial job creation step, before the Phase is set to the executing state" and successively allows POSTing to jobs/job-id, jobs/job-id/parameters and PUTting to jobs/job-id/parameters/parameter-name.

It turned out that the concrete semantics of this cavalier approach quickly become difficult. We therefore propose to amend the language on changing parameters post-creation by:

In most cases, the values of the parameters are all established during the initial POST that creates the job. However, a particular implementation of UWS may choose to allow the parameters to be updated after the initial job creation step, before the Phase is set to the executing state. It should, however, not offer the ability to create new parameters nor delete existing parameters. The next major version of this specification will remove the ability to set an individual parameter.

From the client perspective, there is only one guaranteed way to set a parameter that all UWS services must implement: In the initial POST that creates the job.

3.1.2. Behaviour for Failed Job Creation

In Section 2.2.3.1 of [std:UWS] a UWS is required to return a "code 303 'See other'" "unless the service rejects the request". It is not specified what should happen when the service rejects the request.

We propose to add, at an appropriate position, the following text:

If the execution of a UWS request fails, the service has to generate an appropriate error message with codes in the 400 (client error) or 500 (server error) ranges according to [std:HTTP]. If the erroneous request is recoverable (e.g., a request for a transition to an impossible state), the job does not go into the ERROR state because of a failed request.

The payload of such an error message SHOULD be a user-presentable error message in plain text, which SHOULD NOT be re-flowed by clients. Clients MUST accept other documents coming back as payloads of such request responses. As such events can be assumed to be major server failures, it is recommended to abandon a job that had a non-text/plain response to any UWS request.

3.2. UWS: Proposed New Features

3.2.1. Format of Quote

Section 2.2.1 of [std:UWS] states that the jobs/job-id/quote resource represents quote as a number of seconds, while the schema represents quote as an xs:dateTime.

This is an unnecessary inconsistency. If no schema change is required by other changes in a UWS revision, we propose to solve it by requiring the representation in the resource to be in [std:DALI] YYYY-mm-ddThh:mm:ss form. While doing this, we should also clarify the format for the value of destruction, which currently just defers to [std:iso8601]; this should now refer to [std:DALI] as ISO 8601 allows many variants that are clearly not intended here.
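As an illustration of why pinning down one form helps implementors, the following sketch parses and writes the single YYYY-mm-ddThh:mm:ss form with the Python standard library. The function names are hypothetical. Other ISO 8601 variants such as basic format or week dates are rejected, although strptime is somewhat lenient about missing leading zeros.

```python
from datetime import datetime

# The one timestamp form referred to above.
DALI_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_dali_timestamp(value):
    """Parse a timestamp; raises ValueError for other ISO 8601 variants."""
    return datetime.strptime(value, DALI_FORMAT)


def format_dali_timestamp(moment):
    """Serialize a datetime in the same form."""
    return moment.strftime(DALI_FORMAT)
```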

If the UWS schema needs changing for other reasons, we suggest to unify the representations to the number of seconds on grounds that it is the more logical specification for the estimated duration of a job.

4. TAP

4.1. TAP: Clarifications, Errata, and Recommendations

4.1.1. Names of Uploaded Tables

Section 2.5 of [std:TAP] requires the name of the uploaded tables to be a "legal ADQL table name with no catalog or schema (e.g. an unqualified table name)". This language probably allows delimited identifiers, as the ADQL table_name can expand to one. This, however, was clearly not the intention of the text, as the use of delimited identifiers is not (fully) supported by the syntax of the UPLOAD parameter. To resolve these difficulties, we propose to replace the parenthesis starting with "e.g." with:

i.e., a string following the regular_identifier production of [std:ADQL].

This could, in theory, invalidate existing clients that might want to use delimited identifiers in uploads. Due to the difficulties with the UPLOAD parameter syntax, however, that would not really be supported in version 1, either. Thus, we claim that this language can enter in a minor version.
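A client or server could check upload table names against this rule with a sketch like the one below. It assumes that regular_identifier amounts to an ASCII letter followed by any number of ASCII letters, digits, or underscores; it does not check whether the name collides with an ADQL reserved word.

```python
import re

# Assumed shape of the ADQL regular_identifier production: a Latin
# letter followed by letters, digits, or underscores.
_REGULAR_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def is_valid_upload_name(name):
    """Return True if `name` looks like an ADQL regular identifier."""
    return _REGULAR_IDENTIFIER.match(name) is not None
```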

4.1.2. Multiple UPLOAD Posts

Since UWS allows posting parameters after job creation, Section 2.5.1 of [std:TAP] needs to specify what happens when the UPLOAD parameter is posted into a job that already has one or more uploads. We propose to add at the end of the section:

UPLOADs are accumulating, i.e., each UPLOAD parameter given will create one or more tables in TAP_UPLOAD. When the table names from two or more upload items agree after case folding, the service behaviour is unspecified. Clients thus cannot reliably overwrite uploaded tables; to correct errors, they have to tear down the existing job and create a new one.
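Since the service behaviour on name clashes is unspecified, clients may want to detect them before posting. The following sketch accumulates several UPLOAD values and refuses names that agree after case folding. It assumes the TAP 1.0 UPLOAD syntax of semicolon-separated name,URI pairs; the function name is hypothetical, and raising an error is a choice of this sketch, not something the proposed text requires.

```python
def accumulate_uploads(upload_values):
    """Collect upload items from several UPLOAD parameter values.

    `upload_values` holds the UPLOAD values in the order they would be
    posted.  Returns a dict mapping the case-folded table name to a
    (name, uri) pair.  Raises ValueError when two names agree after
    case folding.
    """
    tables = {}
    for value in upload_values:
        for item in value.split(";"):
            if not item:
                continue
            # The URI may itself contain commas, so split only once.
            name, uri = item.split(",", 1)
            key = name.lower()
            if key in tables:
                raise ValueError("upload name %r clashes with %r"
                                 % (name, tables[key][0]))
            tables[key] = (name, uri)
    return tables
```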

4.1.3. Database Column Types

Section 2.5 of [std:TAP] gives "database column types" for all kinds of VOTable objects. Given the lack of an ADQL type system, this must clearly be taken with a grain of salt; the types given in this column at least cannot be taken as conformance criteria. We propose to add the following language before section 2.5.1:

Note that the last column of Table (x) is not normative. Implementations SHOULD try to make sure that the actual types chosen are at least signature-compatible with the recommended types (i.e., integers should remain integers, floating-point values floating-point values, etc.), such that clients can reliably write queries against uploaded tables.

For columns with xtype adql:REGION, this is particularly critical, since databases typically use different types to represent various STC-S objects. Clients are advised to assume that such columns will be approximated with polygons in the actual database table.

4.1.4. The size Column in TAP_SCHEMA

The table TAP_SCHEMA.columns as specified in section 2.6.3 of [std:TAP] has a column named size. This is unfortunate since SIZE is an ADQL reserved word, and thus must be quoted in queries.

We therefore propose to append the following language to section 2.6.3:

To use size in a query, it must be put in double quotes since it collides with an ADQL reserved word. Since delimited identifiers are case-sensitive, for the size column both clients and servers MUST always (in particular, in the DDL for TAP_SCHEMA) use lower case exclusively.

In the next major version of TAP, this column will be called arraysize.

To allow the text to be consistent with the rules for VOTable error documents, we propose the following changes in Section 2.9 of [std:TAP]:

Current: The VOTable must contain a RESOURCE element identified with the attribute type='results', containing a single TABLE element with the results of the query.
New: The VOTable must contain a RESOURCE element identified with the attribute type='results', containing exactly one TABLE element with the results of the query if the job execution was successful or no TABLE element if the job execution failed to produce a result.

Current: The RESOURCE element must contain, before the TABLE element, an INFO element with attribute name = "QUERY_STATUS". The value attribute must contain one of the following values:
New: The RESOURCE element must contain an INFO element with attribute name="QUERY_STATUS" indicating the success of the operation. For RESOURCE elements that contain a TABLE element, this INFO element must appear lexically before the TABLE. The following values are defined for this INFO element's value attribute:

4.2. TAP: New Features

4.2.1. An examples Endpoint

Feedback from TAP users indicates that providing query examples is considered most helpful, which is probably not surprising since to effectively use a TAP service, a user has to combine knowledge of a fairly complex query language with server-specific metadata like table schemata and local extensions as well as domain knowledge. A head start as provided by examples doing something related to what the users actually want is therefore most welcome.

TAP services are usually accessed through specialized clients. Therefore, a simple link "for examples see here" will in general not work for them. In principle, one could simply communicate an example URL to a client and let the user browse it. Allowing a certain amount of structuring within the document at this URL, however, lets clients do some useful in-application presentation of the examples.

[std:DALI] defines a simple system to communicate examples to humans and machine clients alike, based on RDFa. This section specifies how the generic DALI specification is to be applied to TAP.

4.2.1.1. The Endpoint

A TAP server exposes the example queries in an examples endpoint residing next to sync, async, and the VOSI endpoints. A GET from this endpoint MUST yield a document with a MIME type of either application/xhtml+xml or text/html. A service that does not provide examples MUST return a 404 HTTP status on accessing this resource.

If present, the endpoint must be represented in a capability in the TAP service's registry record. The capability's standardID is, as defined by DALI, ivo://ivoa.net/std/DALI#examples. A capability element could hence look like this:


   <capability standardID="ivo://ivoa.net/std/DALI#examples">
     <interface xsi:type="vr:WebBrowser">
       <accessURL use="full">http://localhost:8080/tap/examples</accessURL>
     </interface>
   </capability>

4.2.1.2. Document Content

The document at examples MUST follow the rules laid out for DALI-examples in [std:DALI]; in particular, it must be valid XML, viewable with "common web browsers".

TAP defines two additional properties within the ivo://ivoa.net/std/DALI-examples (note that at the time of writing the DALI PR has "DALI#examples" here, which we corrected here) vocabulary:

  • query -- each example MUST have a unique child element with simple text content having a property attribute valued query. It contains the query itself, preferably with extra whitespace for easy human consumption and editing. This will usually be a HTML pre element.
  • table -- examples MAY also have descendants with property attributes having the value table. These must have pure text content and contain fully qualified table names to which the query is somehow "pertaining". Suitable HTML elements holding these include span, or a (which would allow linking to further information on the table).

An example for a document served from the examples endpoint is given in Appendix A.
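To illustrate how a client might consume such a document, here is a minimal sketch that extracts the query and table properties with the XML parser from the Python standard library. The sample document is made up for this illustration and is not the one from Appendix A; in particular, the typeof="example" and property="name" markup, the vocab attribute on body, and the column names in the query are assumptions about the generic DALI conventions and the table's content rather than something defined in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body vocab="ivo://ivoa.net/std/DALI-examples">
<div id="cone" typeof="example">
  <h2 property="name">Cone search</h2>
  <p>Objects near a position in <span property="table">ppmxl.main</span>.</p>
  <pre property="query">SELECT TOP 10 *
FROM ppmxl.main
WHERE 1=CONTAINS(POINT('', raj2000, dej2000), CIRCLE('', 10, 20, 0.1))</pre>
</div>
</body></html>"""


def extract_examples(xhtml_text):
    """Return a list of dicts with id, name, query, and tables per example."""
    root = ET.fromstring(xhtml_text)
    examples = []
    for elem in root.iter():
        if elem.get("typeof") != "example":
            continue
        entry = {"id": elem.get("id"), "name": None, "query": None, "tables": []}
        for child in elem.iter():
            prop = child.get("property")
            text = "".join(child.itertext())
            if prop == "name":
                entry["name"] = text.strip()
            elif prop == "query":
                # Keep the whitespace of the query as authored.
                entry["query"] = text
            elif prop == "table":
                entry["tables"].append(text.strip())
        examples.append(entry)
    return examples
```

The id attribute is kept because it provides the fragment identifier a client would append to the examples URL to show the documentation for one example in a web browser.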

4.2.1.3. Intended Use

In the simplest case, TAP clients can provide links to the current server's example endpoint. A more advanced client would offer an interface element allowing the selection of example titles with the option of entering the sample query into the query field of the user interface. The documentation for the query would be accessed by opening a web browser using the base example URL and the example's fragment identifier.

Advanced clients could render the HTML div elements themselves, and they could provide a means to discover example queries involving particular tables in their table metadata browser based on property=table markup.

4.2.1.4. Validation

Appendix B gives an XSLT 1.0 stylesheet that extracts the machine readable information from compliant documents and emits the results in text format.

The style sheet checks for proper vocabulary declaration. If you have no element declaring the vocabulary, the output will be empty.

Service operators should also use RDFa validation tools, e.g., the W3C RDFa validator [svc:RDFaVal], to make sure their document is usable from RDF tools.

4.2.2. A plan Endpoint

CDS have a debug endpoint with additional information; join their concepts with this.

As already noted in [std:TAP], it is notoriously difficult to predict the runtime of SQL queries. For nontrivial queries, even experts may have a hard time figuring out performance bottlenecks. Therefore, most database systems provide some mechanism to obtain a query plan, that is, to inspect what elementary operations will be performed for a given query.

Since TAP queries are typically formulated by persons not intimately familiar with the database queried, the need for a mechanism allowing insights into the database engine's reasoning is even more pronounced. On the other hand, different database systems give their plans in completely different formats and even schemata. In addition, as the Postgres Documentation says: "Plan-reading is an art that deserves an extensive tutorial" ([doc:Postgres92], Sect. 14.1).

Thus, specifying a fixed format for query plans that would be both expressive enough and sufficiently generic to be easily adaptable to various backend database engines is probably impossible. To still allow users to inspect actual query plans, we propose the following language be added at the end of section 2.2.2 of [std:TAP]:

In addition to the UWS resources, a TAP server SHOULD support a child plan for each job resource. If retrieving this resource is successful (i.e., results in a 200 HTTP response after possible redirects and authentication), it MUST be a preformatted document with MIME type text/plain. Within it, the actual query as executed by the database engine MUST come first.

After at least one blank line, a rendering of the query plan follows. Note that the query as executed may contain blank lines, which means that machine clients cannot use the blank line to separate query and plan. In general, clients SHOULD display the plan without any reformatting in a fixed-width font.

Since it is hard to define a generic and sufficiently expressive format for query plans and the authors want to avoid excessive implementation cost for this feature, this specification does not give a format for the query plan. Implementors are advised to keep as much of the "native" plan format of their database engine as possible.

After the plan, the service is free to give additional debugging information. The intended audience for this information is again humans, so even in cases where proprietary clients actually parse out information from that area, such information should still be decipherable by knowledgeable humans.

If the creation of the query plan fails, the service MUST reply with a 400 (if the failure appears to be due to syntax errors in the query, the query plan not being available in this UWS phase, or similar problems) or 500 HTTP status code. Errors in plan generation do not change the phase of the job. Clients may thus use the plan endpoint to check the syntax of a query on services supporting it.

Services that cannot or choose not to support the retrieval of query plans MUST respond with a 404 HTTP code to requests for plan children of job resources.

Except for 404 responses, all documents delivered from the plan endpoint MUST have the MIME type text/plain. They should contain ASCII exclusively, but clients SHOULD assume UTF-8 encoding if no character set is declared by HTTP means.
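From the client side, the rules above could be handled as in the following sketch. It operates on an already received response (status code, Content-Type header value, and body bytes) and assumes that redirects and authentication have been dealt with by the HTTP library. The function name and the returned labels are hypothetical.

```python
def interpret_plan_response(status, content_type, body):
    """Classify a response from a job's plan child resource.

    Returns a (kind, text) pair, where kind is one of "unsupported",
    "plan", "error", or "unexpected".
    """
    if status == 404:
        # The service does not offer query plans.
        return ("unsupported", None)
    # Assume UTF-8 unless the Content-Type header declares a charset.
    charset = "utf-8"
    for part in (content_type or "").split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    text = body.decode(charset, errors="replace")
    if status == 200:
        # Query first, then the plan; display verbatim in a fixed-width font.
        return ("plan", text)
    if status in (400, 500):
        # Plan generation failed; the job phase is unaffected.
        return ("error", text)
    return ("unexpected", text)
```

The sketch deliberately does not try to split the query from the plan, since a blank line is not a reliable separator.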

4.2.3. Scalable tables Endpoint

For archives serving hundreds or thousands of tables, the tables endpoint on TAP services as defined by [std:VOSI] will have to return documents of several dozen megabytes. This results in nontrivial transfer times for data that in all likelihood is uninteresting to the user that typically will only write queries against fairly few of those tables.

To mitigate this problem, we propose to define that vs:Table typed elements in responses from VOSI table endpoints that have no column children are to be regarded as stubs by clients. A client SHOULD give the user the possibility to request "full" information on such a stubbed table. This full information is available from a child resource of tables named like the table, in exactly the capitalization as given in the name child of the table stub; it would come as the full table element.

As an example, a service might return the following from its tables endpoint:

<tableset xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1" 
		xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
		xsi:type="vs:TableSet">
  <schema>
    <name>ppmxl</name>
    <table>
      <name>ppmxl.main</name>
    </table>
  </schema>
</tableset>

A client could then retrieve the URL .../tables/ppmxl.main and would receive something like this:

<table xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1" 
		xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
		xsi:type="vs:Table">
  <name>ppmxl.main</name>
  <description> PPMXL is a catalog of positions, proper motions...
  </description>
  <column>
    <name>ipix</name>
    <description>Identifier (Q3C ipix of the USNO-B 1.0 object)</description>
	...
</table>
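On the client side, the stub mechanism could be handled along the following lines. This is a sketch under stated assumptions: the function names stub_table_names and full_table_url are hypothetical, the tables URL http://example.org/tap/tables is made up, and percent-encoding the table name when building the child URL is a choice of this sketch rather than something the proposal prescribes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# The stubbed tables document from the example above.
TABLESET = """<tableset xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="vs:TableSet">
  <schema>
    <name>ppmxl</name>
    <table>
      <name>ppmxl.main</name>
    </table>
  </schema>
</tableset>"""


def stub_table_names(tableset_xml):
    """Return the names of all table elements that have no column children."""
    root = ET.fromstring(tableset_xml)
    names = []
    for table in root.iter("table"):
        if table.find("column") is None:  # no columns: treat as a stub
            # Keep the capitalization as given; only trim surrounding whitespace.
            names.append(table.findtext("name", default="").strip())
    return names


def full_table_url(tables_url, table_name):
    """Build the URL of the child resource carrying the full table metadata."""
    # Percent-encoding is an assumption of this sketch.
    return tables_url.rstrip("/") + "/" + quote(table_name, safe="")


for name in stub_table_names(TABLESET):
    print(full_table_url("http://example.org/tap/tables", name))
```

A client would fetch the resulting URL only when the user actually asks for the columns of a stubbed table, which keeps the initial metadata transfer small.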

More formally, we propose to replace the last paragraph of section 3.4, "Table metadata", of [std:VOSI], Version 1.0, with the following text:

In the REST binding, the registered URL retrieves an XML document containing this element. However, services exposing a large number of tables may only write table stubs into the document retrieved from this web resource. Table stubs are table elements containing no column children. While the XSD requires a name child to be present, the services may or may not include any of the remaining table metadata.

Still in the REST binding, the server that has produced such a columnless table element should provide a child of the tables resource named like the content of the table's name child element, with any leading or trailing whitespace removed. If a request for this resource is successful, the document received must be an XML document containing a single element of the type {http://www.ivoa.net/xml/VODataService/v1.1}Table with all metadata available for the table.

Making the async Endpoint Optional

Some existing TAP-like services have data that is small and simple enough that synchronous queries are likely to be sufficient. They therefore chose not to implement the async endpoint, which makes these services technically non-TAP. Given the implementation overhead of a UWS for something that is not really required by the services in question, the choice seems reasonable, though, and the services are "mostly interoperable" with existing clients in that there are usually ways to operate the services from the clients.

Therefore, we propose to make the async endpoint optional and add language that requires clients to offer ways to fall back to synchronous operation for services that do not support async.
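A client-side fallback could be structured as in the following sketch. Everything named here is hypothetical: how a client learns that async is unsupported is not settled by this note, so the sketch simply assumes that the caller-supplied async runner raises AsyncNotSupported in that case (for instance, after inspecting the service's capabilities or receiving an error from the async endpoint).

```python
class AsyncNotSupported(Exception):
    """Hypothetical signal that a service offers no async endpoint."""


def run_with_fallback(query, run_async, run_sync):
    """Try async operation first; fall back to sync if it is unavailable.

    run_async and run_sync are caller-supplied callables that take the
    query string and return the result. Returns a (mode, result) pair so
    the client can tell the user which mode was actually used.
    """
    try:
        return "async", run_async(query)
    except AsyncNotSupported:
        return "sync", run_sync(query)
```

Returning the mode alongside the result lets a user interface indicate that a query ran synchronously, which matters because synchronous queries are typically subject to tighter limits.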

A. An Example for an /examples Document

<html version="XHTML+RDFa 1.1" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>TAP examples example</title>
</head>
<body vocab="ivo://ivoa.net/std/DALI-examples">
<h1>Example Queries for our TAP Service</h1>
<div resource="#katkatbib" typeof="example" id="#katkatbib">
<h2 property="name">katkat bibliography</h2>
<p>
To search for title (or other) words in
<a property="table" href="/tableinfo/katkat.katkat">katkat.katkat</a>
's source field or in some other sort of bibliographic query, use the
<tt class="literal">gavo_hasword</tt>
locally defined function. This basically works a bit like you'd expect from search engines: case-insensitive, and oblivious to any context.
</p>
<p>Try the following query:</p>
<pre property="query">select * from katkat.katkat where gavo_hasword('variable', source) and minEpoch&lt;1900 </pre>
</div>
<div resource="#arigfhmultiflags" typeof="example" id="arigfhmultiflags">
<h2 property="name">arigfh multiflags</h2>
<p>
This query selects reflected observations and their epochs and equinoxes from the identified objects within
<a property="table" href="/tableinfo/arigfh.id">arigfh.id</a>
. This example in particular shows how to decode combined flags (i.e., flags-like numbers in which digits or groups of digits need to be extracted to allow interpretation).
</p>
<p>Try the following query:</p>
<pre property="query"> SELECT decCat, raj2000, dej2000, epDec, eqDec FROM arigfh.id WHERE 4=mod(decflags/10000, 10) </pre>
</div>
</body>
</html>

B. An XSLT Stylesheet for Validating an examples Document

<stylesheet version="1.0" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="http://www.w3.org/1999/XSL/Transform" xmlns:http="http://www.w3.org/1999/xhtml">
<!-- This XSLT stylesheet extracts from a TAP examples endpoint what machine-readable information is in there. See TAP Implementation Notes for details. -->
<!-- This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. For the complete text of the GPL, see http://www.gnu.org/licenses/. -->
<output method="text"/>
<strip-space elements="*"/>
<template match="*[@typeof='example']" mode="invocab">
<text>================================================ Example id=</text>
<value-of select="@id"/>
<text> </text>
<apply-templates mode="invocab"/>
<text>================================================ </text>
</template>
<template match="*[@property='name']" mode="invocab">
<text>Name: </text>
<value-of select="text()"/>
<text> </text>
</template>
<template match="*[@property='table']" mode="invocab">
<text>Pertains to: </text>
<value-of select="text()"/>
<text> </text>
</template>
<template match="*[@property='query']" mode="invocab">
<text>Query: </text>
<value-of select="text()"/>
<text> </text>
</template>
<template match="text()" mode="invocab"/>
<template match="text()"/>
<template match="*[@vocab]">
<choose>
<when test="@vocab='ivo://ivoa.net/std/DALI-examples'">
<apply-templates mode="invocab"/>
</when>
<otherwise>
<message terminate="yes">
<text>Forbidden vocabulary encountered: </text>
<value-of select="@vocab"/>
</message>
</otherwise>
</choose>
</template>
</stylesheet>
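For readers without an XSLT processor at hand, roughly the same extraction can be sketched in Python. This is not an exact port of the stylesheet: the function name extract_examples and the shortened SAMPLE document are made up for illustration, examples are identified through typeof="example" as in the document of Appendix A, and only the last value is kept if a property occurs more than once within an example.

```python
import xml.etree.ElementTree as ET

VOCAB = "ivo://ivoa.net/std/DALI-examples"

# Shortened, hypothetical variant of the document in Appendix A.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body vocab="ivo://ivoa.net/std/DALI-examples">
<div resource="#arigfhmultiflags" typeof="example" id="arigfhmultiflags">
<h2 property="name">arigfh multiflags</h2>
<p>See <a property="table" href="/tableinfo/arigfh.id">arigfh.id</a>.</p>
<pre property="query">SELECT decCat FROM arigfh.id WHERE 4=mod(decflags/10000, 10)</pre>
</div>
</body>
</html>"""


def _walk(element, in_vocab, found):
    """Recurse through the tree, collecting examples inside the vocabulary."""
    vocab = element.get("vocab")
    if vocab is not None:
        if vocab != VOCAB:
            raise ValueError("Forbidden vocabulary encountered: " + vocab)
        in_vocab = True
    if in_vocab and element.get("typeof") == "example":
        example = {"id": element.get("id")}
        for node in element.iter():
            prop = node.get("property")
            if prop in ("name", "table", "query"):
                example[prop] = "".join(node.itertext()).strip()
        found.append(example)
        return
    for child in element:
        _walk(child, in_vocab, found)


def extract_examples(xhtml):
    """Return a list of dicts with id, name, table and query per example."""
    found = []
    _walk(ET.fromstring(xhtml), False, found)
    return found


for example in extract_examples(SAMPLE):
    print(example["name"], "->", example["query"])
```

Since the parser is a plain XML parser, this approach only works for examples documents that are well-formed XHTML.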

References

[std:UNICODE] The Unicode Consortium.
The Unicode standard, version 6.1 core specification, 2012.
[std:TAPREGEXT] Markus Demleitner, Patrick Dowler, Ray Plante, Guy Rixon, and Mark Taylor.
TAPRegExt: a VOResource schema extension for describing TAP services, version 1.0. IVOA Recommendation, August 2012.
[std:VOUNIT] Sebastien Derriere, Norman Gray, Mireille Louys, Jonathan McDowell, Francois Ochsenbein, Pedro Osuna, Bruno Rino, and Jesus Salgado, Sebastien Derriere, editor.
Units in the VO. IVOA Proposed Recommendation, 2012.
[std:DALI] Patrick Dowler, Markus Demleitner, Mark Taylor, and Doug Tody.
Data access layer interface, version 1.0. IVOA Recommendation, November 2013.
[std:TAP] Patrick Dowler, Guy Rixon, and Doug Tody.
Table access protocol version 1.0. IVOA Recommendation, March 2010.
[std:HTTP] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee.
Hypertext transfer protocol – HTTP/1.1. RFC 2616, June 1999.
[std:VOSI] Grid and Web Services Working Group, Matthew Graham and Guy Rixon, editors.
IVOA support interfaces version 1.0, 2011.
[std:UWS] Paul Harrison and Guy Rixon.
Universal worker service pattern, version 1.0. IVOA Recommendation, October 2010.
[IVOAWIKI] http://wiki.ivoa.net.
IVOA wiki. [Online].
[doc:Postgres92] http://www.postgresql.org/docs/9.2/static/index.html.
PostgreSQL 9.2.1 documentation. [Online].
[svc:RDFaVal] http://www.w3.org/2012/pyRdfa/Validator.html.
W3C RDFa validator. [Online].
[std:iso8601] International Organization for Standardization.
Data elements and interchange formats – information interchange – representation of dates and times, 2004.
[std:SQL1992] International Standard Organization.
The database language SQL. Technical Report, Document ISO/IEC9075:1992, 1992.
[std:ADQL] Iñaki Ortiz, Jeff Lusted, Pat Dowler, Alexander Szalay, Yuji Shirasaki, Maria A. Nieto-Santisteba, Masatoshi Ohishi, William O'Mullane, Pedro Osuna, the VOQL-TEG, and the VOQL Working Group, Pedro Osuna and Iñaki Ortiz, editors.
IVOA astronomical data query language. IVOA Recommendation, 2008.
[std:VODS11] Raymond Plante, Aurélien Stébé, Kevin Benson, Patrick Dowler, Matthew Graham, Gretchen Greene, Paul Harrison, Gerard Lemson, Tony Linde, and Guy Rixon.
VODataService: a VOResource schema extension for describing collections and services version 1.1. IVOA Recommendation, December 2010.