% Source: /[volute]/trunk/projects/dal/AccessData/AccessData.tex, revision 3183, Mon Dec 14 14:56:17 2015 UTC, by francois
\documentclass[11pt,a4paper]{ivoa}
\input tthdefs

\usepackage{listings}
\lstloadlanguages{XML,sh}
\lstset{flexiblecolumns=true,tagstyle=\ttfamily,
showstringspaces=false}
\usepackage{todonotes}

\title{IVOA Server-side Operations for Data Access}

\ivoagroup{DAL}

\author{Fran\c cois Bonnarel, Markus Demleitner, Patrick Dowler, Douglas Tody}

\editor{Fran\c cois Bonnarel}

\previousversion{WD-AccessData-1.0-20151021}
\previousversion{WD-AccessData-1.0-20140730}
\previousversion{WD-AccessData-1.0-20140312}


\begin{document}

\begin{abstract}
This document describes the SODA web service capability. SODA is a low-level data access and server-side data processing capability that can act upon data files, performing various kinds of operations: filtering/subsection, transformations, pixel operations, and applying functions to the data.

\end{abstract}

\section*{Acknowledgments}
The authors would like to thank all the participants in DAL-WG
discussions for their ideas, critical reviews, and
contributions to this document.

\section{Introduction}
The SODA web service interface defines a RESTful web service for performing server-side operations and processing on data before transfer.


\subsection{The Role in the IVOA Architecture}

TODO: new diagram from TCG

SODA services conform to the Data Access
Layer Interface \citep{std:DALI} specification, including the
Virtual Observatory Support Interfaces \citep{std:VOSI} resources.

\subsection{Motivating Use Cases}
Below are some of the more common use cases that have motivated the development of the SODA specification. While this list is not complete, it helps to understand the problem area covered by this specification.

\subsubsection{Retrieve Subsection of a Datacube}
\label{sect:use-cube}

Cut out a subsection using coordinate axis values. The input to the cutout operation will include one or more of the following:

\begin{itemize}
\item a region on the sky
\item an energy value or range
\item a time value or range
\item one or more polarization states
\end{itemize}

The region on the sky should be something simple: a circle,
a range of coordinate values, or maybe a polygon.

\subsubsection{Retrieve Subsection of a 2D Image}
This is a special case of the use case in sect.~\ref{sect:use-cube},
where the cutout is only in the spatial axes.

\subsubsection{Retrieve Subsection of a Spectrum}

This is a special case of the use case in sect.~\ref{sect:use-cube},
where the cutout is only in the spectral axis.

\subsubsection{Provide Data in Different Formats}

Examples are images in PNG or JPEG instead of FITS, and spectra in CSV, FITS or VOTable.

\subsubsection{Flatten a Datacube into a 2D Image}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Flatten a Datacube into a 1D Spectrum}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Rebin Data by a Fixed Factor}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Reproject Data onto a Specified Grid}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Compute Aggregate Functions over the Data}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Apply Standard Function to Data Values}

This could be ``denoising'' with standard methods or ``on the fly'' recalibration. This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Apply Arbitrary User-Specified Function to Data Values}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Run Arbitrary User-Supplied Code on the Data}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\section{Resources}

SODA services are implemented as HTTP REST \citep{richardson07} web
services with a \{sync\} resource that conforms to the DALI-sync
resource description and/or an \{async\} resource that conforms to
the DALI-async resource description.

\begin{table}[h]
\begin{tabular}{rrr}
\sptablerule
\textbf{resource type}&\textbf{resource name}&\textbf{required}\cr
\sptablerule
\{sync\}&service specific&\cr
\{async\}&service specific&\cr
{DALI-examples}&/examples&no\cr
{VOSI-availability}&/availability&yes\cr
{VOSI-capabilities}&/capabilities&yes\cr
\sptablerule
\end{tabular}
\caption{Endpoints for SODA services}
\end{table}

A stand-alone SODA service may have one or both of the \{sync\} and \{async\} resources. For either type, it could have multiple resources (e.g. to support alternate authentication schemes). The SODA service may also include other custom or supporting resources.

Either the \{sync\} or \{async\} SODA capability may be included as part of other web services. For example, a single web service could contain the SIA-2.0 \{query\} capability, the DataLink-1.0 \{links\} capability, and the SODA \{sync\} capability. Such a service must also have the VOSI-availability and VOSI-capabilities resources to report on and describe all the implemented capabilities.

\subsection{\{sync\} resource}

The \{sync\} resource is a synchronous web service resource
that conforms to the DALI-sync description. The implementer
is free to name this resource (i.e. set its path) as they
like; the client will find the resource path using the
VOSI-capabilities resource.

The \{sync\} resource performs the data access as specified by
the input parameters and returns the data directly in the
output stream. Synchronous data access is suitable when the
operations can be performed quickly and the data stream can
be set up and written to (by the service) in a short period
of time (e.g. before any timeouts occur).

\subsection{\{async\} resource}

The \{async\} resource is an asynchronous web service resource
that conforms to the DALI-async description. The implementer
is free to name this resource (i.e. set its path) as they
like; the client will find the resource path using the
VOSI-capabilities resource.

The \{async\} resource performs the data access as specified
by the input parameters and either (i) stores the results
for later transfer or (ii) pushes the results to a specified
destination (e.g. to a VOSpace location). Asynchronous data
access usually consumes service resources (which may be
limited) and usually imposes a higher latency before any
results can be seen, because the location of results does
not have to be valid until the data access job is complete.
Asynchronous data access is intended for (but not limited
to) use when the operations take considerable time and
results must be staged (e.g. some multi-pass algorithms or
operations that result in multiple outputs).

\subsection{Examples: DALI-examples}

SODA services should provide a DALI-examples resource
with one example invocation that shows the variety of
operations the service can perform. Example operations using
the \{sync\} resource that output a small data stream are
preferred.

\subsection{Availability: VOSI-availability}

A SODA web service must have a VOSI-availability
resource \citep{std:VOSI} as described in DALI \citep{std:DALI}.

\subsection{Capabilities: VOSI-capabilities}

A web service that includes SODA capabilities must
have a VOSI-capabilities resource \citep{std:VOSI} as described in DALI
\citep{std:DALI}. The standardID for the \{sync\} resource is
$$\hbox{\texttt{ivo://ivoa.net/std/SODA\#sync}.}$$

The standardID for the \{async\} resource is

$$\hbox{\texttt{ivo://ivoa.net/std/SODA\#async}.}$$

All DAL services must implement the \texttt{/capabilities} resource.
The following capabilities document shows the minimal
metadata for a stand-alone SODA service and does not
require a registry extension schema:

\begin{lstlisting}[language=XML]
<?xml version="1.0"?>
<vosi:capabilities
    xmlns:vosi="http://www.ivoa.net/xml/VOSICapabilities/v1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:vod="http://www.ivoa.net/xml/VODataService/v1.1">
  <capability standardID="ivo://ivoa.net/std/VOSI#capabilities">
    <interface xsi:type="vod:ParamHTTP" version="1.0">
      <accessURL use="full">http://example.com/data/capabilities</accessURL>
    </interface>
  </capability>
  <capability standardID="ivo://ivoa.net/std/VOSI#availability">
    <interface xsi:type="vod:ParamHTTP" version="1.0">
      <accessURL use="full">
        http://example.com/data/availability
      </accessURL>
    </interface>
  </capability>
  <capability standardID="ivo://ivoa.net/std/SODA#sync">
    <interface xsi:type="vod:ParamHTTP" role="std" version="1.0">
      <accessURL use="full">
        http://example.com/data/sync
      </accessURL>
    </interface>
    <!-- service details from extension schema could go here -->
  </capability>
  <capability standardID="ivo://ivoa.net/std/SODA#async">
    <interface xsi:type="vod:ParamHTTP" role="std" version="1.0">
      <accessURL use="full">
        http://example.com/data/async
      </accessURL>
    </interface>
    <!-- service details from extension schema could go here -->
  </capability>
</vosi:capabilities>
\end{lstlisting}

Note that the \{sync\} and \{async\} resources do not have to be
named as shown in the accessURL(s) above. Multiple
capability elements for the \{sync\} and the \{async\} resources
may be included; this is typically used if they differ in
protocol (HTTP vs.\ HTTPS) and/or authentication
requirements.
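
Since clients locate the \{sync\} and \{async\} resources through VOSI-capabilities rather than through a fixed path, the lookup can be sketched in Python with the standard \texttt{xml.etree.ElementTree} module. This is a minimal, non-normative illustration: the embedded document is modelled on the example above, the helper name \texttt{find\_access\_urls} is hypothetical, and the sketch assumes, as in VOSI, that only the root element is namespace-qualified.

\begin{lstlisting}[language=Python]
import xml.etree.ElementTree as ET

# Hypothetical capabilities document, modelled on the example above.
CAPABILITIES = """<?xml version="1.0"?>
<vosi:capabilities
    xmlns:vosi="http://www.ivoa.net/xml/VOSICapabilities/v1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:vod="http://www.ivoa.net/xml/VODataService/v1.1">
  <capability standardID="ivo://ivoa.net/std/SODA#sync">
    <interface xsi:type="vod:ParamHTTP" role="std" version="1.0">
      <accessURL use="full">
        http://example.com/data/sync
      </accessURL>
    </interface>
  </capability>
  <capability standardID="ivo://ivoa.net/std/SODA#async">
    <interface xsi:type="vod:ParamHTTP" role="std" version="1.0">
      <accessURL use="full">http://example.com/data/async</accessURL>
    </interface>
  </capability>
</vosi:capabilities>"""


def find_access_urls(capabilities_xml, standard_id):
    """Return the accessURLs of all capabilities with this standardID.

    capability, interface and accessURL are unqualified (no namespace)
    in VOSI capabilities documents, so plain tag names are matched.
    """
    root = ET.fromstring(capabilities_xml)
    urls = []
    for cap in root.iter("capability"):
        if cap.get("standardID") != standard_id:
            continue
        for url in cap.iter("accessURL"):
            # Strip whitespace introduced by pretty-printing.
            urls.append(url.text.strip())
    return urls


print(find_access_urls(CAPABILITIES, "ivo://ivoa.net/std/SODA#sync"))
# ['http://example.com/data/sync']
\end{lstlisting}

Matching on the standardID rather than on the URL path reflects the freedom implementers have in naming these resources; when several accessURLs are returned, a client would choose among them by protocol or authentication requirements.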

\section{Parameters for \{sync\} and \{async\}}

\label{sect:stdpars}

The \{sync\} and \{async\} resources accept the same set of
parameters.

\subsection{Parameter Description: Three-Factor Semantics}
Each SODA input parameter is described by three attributes: its name, its ucd (identifying the queried astronomical quantity), and its unit. With these three factors it is possible in principle to identify the nature and the style of the parameter. This also achieves overall consistency between input parameters and the VOTable PARAM element. The datatype and xtype attributes complete the definition of the value type and format.
The input parameters listed and defined in this section are fully described according to this rule. These parameters should also be described this way in the response to an empty request to the service (see sect.~\ref{sect:serv-desc}). Custom parameters of the service, if any, MUST be described in the same way.


\subsection{Common Parameters}

\subsubsection{ID}

The ID parameter is used to specify the dataset or file to
be accessed. The values for the ID parameter are generally
discovered from data discovery or DataLink requests. The
values must be treated as opaque identifiers that are used
as-is. The DataLink specification \citep{std:DataLink} describes mechanisms
for conveying opaque parameters and values in service
descriptor resources that can be used by clients to set the
ID parameter.

The ID parameter is single-valued in \{sync\} requests, so
\{sync\} SODA requests access a single dataset or file.
Multiple ID parameters may be submitted in \{async\} requests
in order to bundle access to multiple datasets or files in a
single job.


The ucd of ID is \texttt{meta.id}, and its unit is blank.
Its xtype is \texttt{ivoident} and its datatype is \texttt{char}.
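
As an illustration of the single-valued ID in a \{sync\} request, the following Python sketch builds a GET URL with the standard library. The access URL is the example one used in this document, and the identifier \texttt{ivo://example.org/data?obs42} and the helper name \texttt{build\_sync\_url} are invented placeholders; the point is that the identifier is passed through unmodified and only percent-encoded for transport.

\begin{lstlisting}[language=Python]
from urllib.parse import urlencode

# Hypothetical {sync} access URL, e.g. one found through VOSI-capabilities.
SYNC_URL = "http://example.com/data/sync"


def build_sync_url(dataset_id, **filters):
    """Build a {sync} SODA GET URL for exactly one dataset.

    dataset_id is treated as an opaque string: urlencode only
    percent-encodes it, it is never parsed or rewritten. Using a dict
    guarantees a single ID, matching the {sync} rule.
    """
    params = {"ID": dataset_id}
    params.update(filters)  # single-valued filters, e.g. POS="CIRCLE 12 34 0.5"
    return SYNC_URL + "?" + urlencode(params)


url = build_sync_url("ivo://example.org/data?obs42", POS="CIRCLE 12 34 0.5")
print(url)
# http://example.com/data/sync?ID=ivo%3A%2F%2Fexample.org%2Fdata%3Fobs42&POS=CIRCLE+12+34+0.5
\end{lstlisting}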

\subsection{Filtering Parameters}

Filtering parameters are used to extract subsets of larger
datasets or data files. In general, filtering parameters are
single-valued in \{sync\} requests and multi-valued in \{async\}
requests (exceptions noted below). When multiple values of
filtering parameters are used in an \{async\} job, each
combination of values produces zero or one result. For
example, if an \{async\} job included two POS and two BAND
values, there could be as many as four results (or fewer if
some combinations do not produce a result because the filter
does not intersect the bounds of the data).
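
The combination rule above amounts to a Cartesian product over the values of the filtering parameters. The Python sketch below enumerates it for two POS and two BAND values; the values and the helper name \texttt{filter\_combinations} are invented for illustration, and parameters that are exceptions to this rule (noted below) would be left out of the product.

\begin{lstlisting}[language=Python]
from itertools import product


def filter_combinations(*value_lists):
    """Enumerate every combination of multi-valued filter parameters.

    Each tuple is one candidate cutout; it yields at most one result,
    and none if that filter combination misses the data.
    """
    return list(product(*value_lists))


# Invented example values: two POS and two BAND (wavelengths in meters).
pos = ["CIRCLE 12 34 0.5", "CIRCLE 40 -10 0.2"]
band = ["500e-9 550e-9", "650e-9 700e-9"]
combos = filter_combinations(pos, band)
print(len(combos))  # 4: the upper bound on the number of results
\end{lstlisting}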

\subsubsection{POS}

The POS parameter defines the positional region(s) to be
extracted from the data. The value is made up of a shape
keyword followed by coordinate values. The
allowed shapes are:

\begin{table}[h]
\begin{tabular}{rr}
\sptablerule
\textbf{Shape}&\textbf{Coordinate values}\cr
\sptablerule
\texttt{CIRCLE}&\texttt{<longitude> <latitude> <radius>}\cr
\texttt{RANGE}&\texttt{<longitude1> <longitude2> <latitude1> <latitude2>}\cr
\texttt{POLYGON}&\texttt{<longitude1> <latitude1> ... (at least 3 pairs)}\cr
\sptablerule
\end{tabular}
\caption{POS Values in Spherical Coordinates}
\end{table}

Unlimited values are coded as -Inf or +Inf.

A circle at (12,34) with radius 0.5:

\begin{lstlisting}
POS=CIRCLE 12 34 0.5
\end{lstlisting}

A range of [12,14] in longitude and [34,36] in latitude:

\begin{lstlisting}
POS=RANGE 12 14 34 36
\end{lstlisting}

A polygon from (12,34) to (14,34) to (14,36) to (12,36) and
(implicitly) back to (12,34):

\begin{lstlisting}
POS=POLYGON 12 34 14 34 14 36 12 36
\end{lstlisting}

The inside is always assumed to be the smaller of the region
to the left and the region to the right, so only polygons
smaller than half the sphere can be specified.

A band around the equator:

\begin{lstlisting}
POS=RANGE 0 360 -2 2
\end{lstlisting}

The region around the north pole (latitudes from 89 degrees upwards):

\begin{lstlisting}
POS=RANGE 0 360 89 +Inf
\end{lstlisting}

This syntax is in the same style as STC-S, but with no
reference positions, coordinate systems, units, or geometric
operators like union, intersection, not, etc.

All longitude and latitude values (plus the radius of the
CIRCLE) are expressed in degrees in the ICRS. A future
version of this specification may allow the use of other
reference systems (specifically the native system of the
data).

The POS parameter is single-valued for \{sync\} requests and
multi-valued for \{async\} jobs.

The unit of POS is \texttt{deg} and the ucd is \texttt{pos}. The datatype of the POS parameter is \texttt{char}, and the xtype can take one of the three values \texttt{circle}, \texttt{range} and \texttt{polygon}, as defined in DALI.
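
The POS syntax is simple enough to parse in a few lines of Python. The following is a non-normative sketch: the \texttt{parse\_pos} name and the returned tuple layout are assumptions, only the number of values is checked, and no geometric validation (latitude bounds, polygon size) is attempted.

\begin{lstlisting}[language=Python]
# Expected number of numeric values for the fixed-size shapes.
_ARITY = {"CIRCLE": 3, "RANGE": 4}


def parse_pos(value):
    """Split a POS value into (shape, list_of_floats).

    float() already accepts the "+Inf" / "-Inf" spellings used for
    unlimited values. Only the value count is checked.
    """
    shape, *tokens = value.split()
    numbers = [float(tok) for tok in tokens]
    if shape in _ARITY:
        if len(numbers) != _ARITY[shape]:
            raise ValueError(f"{shape} needs {_ARITY[shape]} values")
    elif shape == "POLYGON":
        # At least 3 longitude/latitude pairs, and complete pairs only.
        if len(numbers) < 6 or len(numbers) % 2 != 0:
            raise ValueError("POLYGON needs at least 3 coordinate pairs")
    else:
        raise ValueError(f"unknown POS shape: {shape!r}")
    return shape, numbers


print(parse_pos("RANGE 0 360 89 +Inf"))
# ('RANGE', [0.0, 360.0, 89.0, inf])
\end{lstlisting}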

\subsubsection{BAND}

The BAND parameter defines the energy interval(s) to be
extracted from the data. The value is an open or closed
numeric interval of spectral coordinate values, expressed in the
reference system and units given below. The intervals
always include the bounding values. Unlimited values are coded as +Inf or -Inf.

If there is a single value, the interval is assumed to be
infinitely small (a scalar value).

The closed interval [500,550]:

\begin{lstlisting}
BAND=500 550
\end{lstlisting}

The open interval (-Inf,300]:

\begin{lstlisting}
BAND=-Inf 300
\end{lstlisting}

The open interval [750,+Inf):

\begin{lstlisting}
BAND=750 +Inf
\end{lstlisting}

The scalar value 550, equivalent to [550,550]:

\begin{lstlisting}
BAND=550
\end{lstlisting}

Extracting using a scalar value should normally extract a
single pixel along the energy axis of the data; extracting
using an interval should extract one or more pixels.

All energy values are expressed as barycentric wavelength in
meters. A future version of this specification may allow the
use of other reference systems (specifically the native
system of the data).

The BAND parameter is single-valued for \{sync\} requests and
multi-valued for \{async\} jobs.

The ucd of the BAND parameter is \texttt{em} and the unit is \texttt{m}. Its datatype is \texttt{double} and the xtype is \texttt{interval}, as defined in DALI.
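
BAND values (and TIME values, which share the interval syntax) map naturally onto closed intervals. The Python sketch below parses such a value under the rules above; the helper names \texttt{parse\_interval} and \texttt{contains} are hypothetical, and rejecting a lower bound that exceeds the upper bound is an assumption, not a rule of this specification.

\begin{lstlisting}[language=Python]
def parse_interval(value):
    """Parse "lo hi" or a single scalar into a (lo, hi) tuple of floats.

    Bounds are inclusive; "-Inf"/"+Inf" mark unlimited ends, and a
    single value v is the degenerate interval [v, v].
    """
    parts = [float(tok) for tok in value.split()]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) != 2:
        raise ValueError("expected one or two numbers")
    lo, hi = parts
    if lo > hi:
        # Assumption: reversed bounds are treated as a usage error.
        raise ValueError("lower bound exceeds upper bound")
    return lo, hi


def contains(interval, x):
    """True if x lies in the closed interval."""
    lo, hi = interval
    return lo <= x <= hi


band = parse_interval("-Inf 300")
print(band, contains(band, 300.0))
# (-inf, 300.0) True
\end{lstlisting}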


\subsubsection{TIME}

The TIME parameter defines the time interval(s) to be
extracted from the data. The value is an open or closed
interval of numeric values, interpreted as Modified
Julian Dates (MJD). Unlimited values are coded as +Inf or -Inf.

If there is a single value, the interval is assumed
to be infinitely small (a scalar value).

An open interval starting at MJD 55100.0 and including all later times:

\begin{lstlisting}
TIME=55100.0 +Inf
\end{lstlisting}

A range of MJD values:

\begin{lstlisting}
TIME=55123.456 55123.466
\end{lstlisting}

An instant in time, given as a Modified Julian Date:

\begin{lstlisting}
TIME=55678.123456
\end{lstlisting}

Time values are always UTC.
The TIME parameter is single-valued for \{sync\} requests and
multi-valued for \{async\} jobs.

The ucd of the TIME parameter is \texttt{time} and the unit is \texttt{d} (day). The datatype is \texttt{double} and the xtype is, again, \texttt{interval}, as defined in DALI.
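
Because TIME values are Modified Julian Dates in UTC, clients that work with calendar dates need a conversion. MJD 0 is 1858-11-17 00:00 UTC, so a fixed epoch plus a day offset suffices. The Python sketch below is an approximation: the \texttt{datetime} module does not model leap seconds, so instants on a day containing a leap second may be off by up to about one second. The helper names are hypothetical.

\begin{lstlisting}[language=Python]
from datetime import datetime, timedelta, timezone

# MJD 0.0 is 1858-11-17T00:00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def mjd_to_datetime(mjd):
    """Convert an MJD (float days) to an aware UTC datetime.

    Approximation: every day is taken to be 86400 s (no leap seconds).
    """
    return MJD_EPOCH + timedelta(days=mjd)


def datetime_to_mjd(dt):
    """Inverse of mjd_to_datetime for timezone-aware datetimes."""
    return (dt - MJD_EPOCH) / timedelta(days=1)


print(mjd_to_datetime(55100.0).isoformat())
# 2009-09-26T00:00:00+00:00
\end{lstlisting}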


\subsubsection{POL}

The POL parameter defines the polarization state(s) (Stokes)
to be extracted from the data.

Extract the total intensity:
\begin{lstlisting}
POL=I
\end{lstlisting}
Extract the standard circular polarization:
\begin{lstlisting}
POL=V
\end{lstlisting}
The POL parameter is multi-valued; multiple values can be
included in a single request and all will be extracted.
Extract only the I, Q and U components:
\begin{lstlisting}
POL=I
POL=Q
POL=U
\end{lstlisting}

The POL parameter is multi-valued for both \{sync\} requests and \{async\} jobs.
Unlike general filtering parameters, all values of POL are
combined into a single filter; for example, if the request
includes the three values above, the job would generate one
result with some or all of these polarization states (per
combination of ID and other filtering parameters).

The ucd of the POL parameter is \texttt{pol} and it has no unit. The datatype is \texttt{char} and the xtype is \texttt{stokes}.
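
Multi-valued parameters such as POL are sent by repeating the parameter name. In Python this can be sketched with \texttt{urllib.parse.urlencode} and \texttt{doseq=True}, which expands a list value into repeated key/value pairs; the access URL and the identifier below are invented placeholders.

\begin{lstlisting}[language=Python]
from urllib.parse import urlencode

# Hypothetical {sync} endpoint and dataset identifier.
SYNC_URL = "http://example.com/data/sync"

params = {
    "ID": "ivo://example.org/data?cube7",
    # One list -> three POL=... pairs; the service treats them as one filter.
    "POL": ["I", "Q", "U"],
}
query = urlencode(params, doseq=True)
print(SYNC_URL + "?" + query)
# http://example.com/data/sync?ID=ivo%3A%2F%2Fexample.org%2Fdata%3Fcube7&POL=I&POL=Q&POL=U
\end{lstlisting}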


\section{\{sync\} Responses}

All responses from the \{sync\} resource follow the rules for
DALI-sync resources, except that the \{sync\} response allows
for error messages for individual input identifier values.

\subsection{Successful Requests}

Successfully executed requests should result in a response
with HTTP status code 200 (OK) and a response body in the format
requested by the client or in the default format for the
service.

If the values specified for cutout parameters do not include
any pixels from the target dataset/file, the service must
respond with HTTP status code 204 (No Content) and no
response body.

The service should set the following HTTP headers to the
correct values where possible.

\begin{table}[h]
\begin{tabular}{rr}
\sptablerule
\textbf{Header}&\textbf{Value}\cr
\sptablerule
Content-Type&MIME type of the response\cr
Content-Encoding&encoding/compression of the response (if applicable)\cr
\sptablerule
\end{tabular}

\caption{Recommended HTTP Response Headers}
\end{table}

Since the response is usually dynamically generated, the
Content-Length and Last-Modified headers cannot usually be
set.\todo{If we say that, we should at least mention chunked transfers, or
people might think they have to close the connections. -- Markus}

\subsection{SODA Service Descriptor}
\label{sect:serv-desc}

The DataLink \citep{std:DataLink} specification describes a mechanism for
describing a service within a VOTable resource and
recommends that services describe themselves with a
special resource with \texttt{name="this"}. SODA responses to
empty \{sync\} queries should include a descriptor describing
both standard and custom query parameters (if applicable).
The descriptor for a service with standard parameters (see
sect.~\ref{sect:stdpars}) would be:

\begin{lstlisting}[language=XML]
<RESOURCE type="meta" utype="adhoc:service" name="this">

  <PARAM name="standardID" datatype="char" arraysize="*"
    value="ivo://ivoa.net/std/SODA#sync-1.0" />

  <PARAM name="accessURL" datatype="char" arraysize="*"
    value="http://example.com/SODA/sync" />

  <GROUP name="inputParams">
    <PARAM name="ID" ucd="meta.id" datatype="char" arraysize="*"
      value="" xtype="ivoident" />
    <PARAM name="POS" ucd="pos" unit="deg" datatype="char" arraysize="*"
      value="" xtype="circle" />
    <PARAM name="POS" ucd="pos" unit="deg" datatype="char" arraysize="*"
      value="" xtype="range" />
    <PARAM name="POS" ucd="pos" unit="deg" datatype="char" arraysize="*"
      value="" xtype="polygon" />
    <PARAM name="BAND" ucd="em" unit="m" datatype="double" arraysize="*"
      value="" xtype="interval" />
    <PARAM name="TIME" ucd="time" unit="d" datatype="double" arraysize="*"
      value="" xtype="interval" />
    <PARAM name="POL" ucd="pol" datatype="char" arraysize="*"
      value="" xtype="stokes" />
  </GROUP>
</RESOURCE>
\end{lstlisting}

This VOTable resource should be returned for empty \{sync\}
queries, so that all input parameters are fully
described.
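
A client can read the three-factor description (name, ucd and unit, plus datatype and xtype) of each input parameter from such a descriptor. The Python sketch below does this with \texttt{xml.etree.ElementTree} on a trimmed, namespace-free fragment modelled on the example above; complete VOTable documents normally carry a namespace, which this sketch does not handle, and the helper name \texttt{input\_params} is hypothetical.

\begin{lstlisting}[language=Python]
import xml.etree.ElementTree as ET

# Trimmed, namespace-free descriptor modelled on the example above.
DESCRIPTOR = """<RESOURCE type="meta" utype="adhoc:service" name="this">
  <PARAM name="accessURL" datatype="char" arraysize="*"
    value="http://example.com/SODA/sync" />
  <GROUP name="inputParams">
    <PARAM name="ID" ucd="meta.id" datatype="char" arraysize="*"
      value="" xtype="ivoident" />
    <PARAM name="BAND" ucd="em" unit="m" datatype="double" arraysize="*"
      value="" xtype="interval" />
  </GROUP>
</RESOURCE>"""


def input_params(descriptor_xml):
    """Return (name, ucd, unit, datatype, xtype) for each input PARAM.

    A missing attribute (e.g. no unit) is returned as None.
    """
    resource = ET.fromstring(descriptor_xml)
    group = resource.find("GROUP[@name='inputParams']")
    return [
        (p.get("name"), p.get("ucd"), p.get("unit"),
         p.get("datatype"), p.get("xtype"))
        for p in group.findall("PARAM")
    ]


for row in input_params(DESCRIPTOR):
    print(row)
\end{lstlisting}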

\subsection{Errors}

The error handling specified for DALI-sync resources applies
to service failures. Error documents should be text using the
text/plain content-type, and the text must begin with one of
the following strings:

\begin{table}[h]
\begin{tabular}{rr}
\sptablerule
\textbf{Prefix}&\textbf{Meaning}\cr
\sptablerule
Error&General error (not covered below)\cr
AuthenticationError&Not authenticated\cr
AuthorizationError&Not authorized to access the resource\cr
ServiceUnavailable&Transient error (could succeed with retry)\cr
UsageError&Permanent error (retry pointless)\cr
\sptablerule
\end{tabular}
\caption{Error Message Prefixes}
\end{table}
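
The status codes for \{sync\} responses (200 with data, 204 when no pixels match) and the error prefixes in the table above can be combined into a small client-side classifier. The Python sketch below is one possible reading, not part of this specification: the function name, the returned labels \texttt{data}, \texttt{empty} and \texttt{unknown}, and the retry flag derived from the table are all assumptions.

\begin{lstlisting}[language=Python]
# Error prefixes from the table above, mapped to whether a retry may help.
_PREFIX_RETRYABLE = {
    "AuthenticationError": False,
    "AuthorizationError": False,
    "ServiceUnavailable": True,
    "UsageError": False,
    "Error": False,
}


def classify_sync_response(status, body):
    """Classify a {sync} response; returns a (kind, retryable) tuple.

    body is the decoded text/plain error document; it is only
    inspected for statuses other than 200 and 204.
    """
    if status == 200:
        return "data", False
    if status == 204:
        # No pixels of the dataset matched the cutout parameters.
        return "empty", False
    for prefix, retryable in _PREFIX_RETRYABLE.items():
        if body.startswith(prefix):
            return prefix, retryable
    # Not a conformant error document.
    return "unknown", False


print(classify_sync_response(503, "ServiceUnavailable: try again later"))
# ('ServiceUnavailable', True)
\end{lstlisting}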

\section{\{async\} Responses}

The \{async\} resource conforms to the DALI-async resource
description, which means the job is a UWS job with all the
job control features available. All result files are to be
listed as children of the UWS results resource. The service
provider is free to name each result.

\appendix

\section{Changes from Previous Versions}

\subsection{WD-SODA-1.0-20151120}

Changed the name of the protocol to SODA. Removed SELECT and COORD. The xtype descriptions are now in DALI, and a reference to DALI has been added.

\subsection{WD-AccessData-1.0-20151021}

Added a general introduction on parameter description to
section 3. Modified SELECT and COORD sections in order to
detach them from SimDAL. Added an appendix on xtype description
with BNF syntax.

\subsection{WD-AccessData-1.0-20140730}

\begin{itemize}
\item Removed REQUEST parameter, following the DAL-WG decision not to
include it when there is only one value.

\item Clarified that ID and filtering parameters are single
valued for \{sync\} and multi-valued for \{async\}, with POL
being multi-valued but still being treated as a single
filter.
\end{itemize}

\subsection{WD-AccessData-1.0-20140312}

This is the initial document.

\bibliography{ivoatex/ivoabib}

\end{document}
