# Contents of /trunk/projects/dal/AccessData/AccessData.tex

Revision 3183
Mon Dec 14 14:56:17 2015 UTC by francois
File MIME type: application/x-tex
File size: 21938 byte(s)

\documentclass[11pt,a4paper]{ivoa}
\input tthdefs

\usepackage{listings}
\lstloadlanguages{XML,sh}
\lstset{flexiblecolumns=true,tagstyle=\ttfamily,
showstringspaces=False}
\usepackage{todonotes}

\title{IVOA Server-side Operations for Data Access}

\ivoagroup{DAL}

\author{Fran\c cois Bonnarel, Markus Demleitner, Patrick Dowler, Douglas Tody}

\editor{Fran\c cois Bonnarel}

\previousversion{WD-AccessData-1.0-20151021}
\previousversion{WD-AccessData-1.0-20140730}
\previousversion{WD-AccessData-1.0-20140312}


\begin{document}

\begin{abstract}
This document describes the SODA web service capability. SODA is a low-level data access capability for server-side data processing that can act upon the data files, performing various kinds of operations: filtering/subsection, transformations, pixel operations, and applying functions to the data.

\end{abstract}

\section*{Acknowledgments}
The authors would like to thank all the participants in DAL-WG
discussions for their ideas, critical reviews, and
contributions to this document.

\section{Introduction}
The SODA web service interface defines a RESTful web service for performing server-side operations and processing on data before transfer.


\subsection{The Role in the IVOA Architecture}

TODO: new diagram from TCG

SODA services conform to the Data Access
Layer Interface \citep{std:DALI} specification, including the
Virtual Observatory Support Interfaces \citep{std:VOSI} resources.

\subsection{Motivating Use Cases}
Below are some of the more common use cases that have motivated the development of the SODA specification. While this list is not complete, it helps to understand the problem area covered by this specification.

\subsubsection{Retrieve Subsection of a Datacube}
\label{sect:use-cube}

Cut out a subsection using coordinate axis values.
The input to the cutout operation will include one or more of the following:

\begin{itemize}
\item a region on the sky
\item an energy value or range
\item a time value or range
\item one or more polarization states
\end{itemize}

The region on the sky should be something simple: a circle,
a range of coordinate values, or maybe a polygon.

\subsubsection{Retrieve Subsection of a 2D Image}
This is a special case of section~\ref{sect:use-cube},
where the cutout is only in the spatial axes.

\subsubsection{Retrieve Subsection of a Spectrum}

This is a special case of section~\ref{sect:use-cube},
where the cutout is only in the spectral axis.

\subsubsection{Provide Data in Different Formats}

Examples include images in PNG or JPEG instead of FITS, and spectra in CSV, FITS, or VOTable.

\subsubsection{Flatten a Datacube into a 2D Image}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Flatten a Datacube into a 1D Spectrum}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Rebin Data by a Fixed Factor}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Reproject Data onto a Specified Grid}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Compute Aggregate Functions over the Data}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.


\subsubsection{Apply Standard Function to Data Values}

Examples include ``denoising'' with standard methods or ``on-the-fly'' recalibration. This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Apply Arbitrary User-Specified Function to Data Values}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\subsubsection{Run Arbitrary User-Supplied Code on the Data}

This use case will be developed and supported in the
SODA-1.1 (or later) specification.

\section{Resources}

SODA services are implemented as HTTP REST \citep{richardson07} web
services with \{sync\} and/or \{async\} resources that conform to the
DALI-sync and DALI-async resource descriptions, respectively.

\begin{table}[h]
\begin{tabular}{rrr}
\sptablerule
\textbf{resource type}&\textbf{resource name}&\textbf{required}\cr
\sptablerule
\{sync\}&service specific&\cr
\{async\}&service specific&\cr
{DALI-examples}&/examples&no\cr
{VOSI-availability}&/availability&yes\cr
{VOSI-capabilities}&/capabilities&yes\cr
\sptablerule
\end{tabular}
\caption{Endpoints for SODA services}
\end{table}

A stand-alone SODA service may have one or both of the \{sync\} and \{async\} resources. For either type, it could have multiple resources (e.g. to support alternate authentication schemes). The SODA service may also include other custom or supporting resources.

Either the \{sync\} or \{async\} SODA capability may be included as part of other web services. For example, a single web service could contain the SIA-2.0 \{query\} capability, the DataLink-1.0 \{links\} capability, and the SODA \{sync\} capability. Such a service must also have the VOSI-availability and VOSI-capabilities resources to report on and describe all the implemented capabilities.

\subsection{\{sync\} resource}

The \{sync\} resource is a synchronous web service resource
that conforms to the DALI-sync description. The implementer
is free to name this resource (i.e., to set its path) as
they like; the client will find the resource path using the
VOSI-capabilities resource.
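As an illustration of how a client might locate the \{sync\} resource, the following Python sketch scans a VOSI-capabilities document for the capability whose standardID is \texttt{ivo://ivoa.net/std/SODA\#sync} and collects its access URLs. The sample document and the function names \texttt{find\_access\_urls} and \texttt{local\_name} are hypothetical, and the assumed layout (capability elements carrying a standardID attribute, with accessURL descendants) follows the general VOSI/VOResource pattern rather than any normative text in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal capabilities document; the layout is an assumption.
SAMPLE_CAPABILITIES = """<vosi:capabilities
    xmlns:vosi="http://www.ivoa.net/xml/VOSICapabilities/v1.0">
  <capability standardID="ivo://ivoa.net/std/SODA#sync">
    <interface>
      <accessURL use="full">http://example.com/data/sync</accessURL>
    </interface>
  </capability>
  <capability standardID="ivo://ivoa.net/std/SODA#async">
    <interface>
      <accessURL use="full">http://example.com/data/async</accessURL>
    </interface>
  </capability>
</vosi:capabilities>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_access_urls(capabilities_xml, standard_id):
    """Return every accessURL of the capabilities matching standard_id."""
    root = ET.fromstring(capabilities_xml)
    urls = []
    for cap in root.iter():
        # Only look at capability elements with the requested standardID.
        if local_name(cap.tag) != "capability":
            continue
        if cap.get("standardID") != standard_id:
            continue
        for elem in cap.iter():
            if local_name(elem.tag) == "accessURL" and elem.text:
                urls.append(elem.text.strip())
    return urls


print(find_access_urls(SAMPLE_CAPABILITIES, "ivo://ivoa.net/std/SODA#sync"))
```

Matching on local element names keeps the sketch independent of the namespace prefixes a particular service chooses; a list is returned because a service may declare several \{sync\} capabilities (for example, for http and https).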

The \{sync\} resource performs the data access as specified by
the input parameters and returns the data directly in the
output stream. Synchronous data access is suitable when the
operations can be performed quickly and the data stream can
be set up and written to (by the service) in a short period
of time (e.g. before any timeouts).

\subsection{\{async\} resource}

The \{async\} resource is an asynchronous web service resource
that conforms to the DALI-async description. The implementer
is free to name this resource (i.e., to set its path) as
they like; the client will find the resource path using the
VOSI-capabilities resource.

The \{async\} resource performs the data access as specified
by the input parameters and either (i) stores the results
for later transfer or (ii) pushes the results to a specified
destination (e.g. to a VOSpace location). Asynchronous data
access usually introduces resource constraints on the
service (which may be limited) and usually imposes a higher
latency before any results can be seen, because the location
of results does not have to be valid until the data access
job is complete. Asynchronous data access is intended for
(but not limited to) use when the operations take
considerable time and results must be staged (e.g. some
multi-pass algorithms or operations that result in multiple
outputs).

\subsection{Examples: DALI-examples}

SODA services should provide a DALI-examples resource
with one example invocation that shows the variety of
operations the service can perform. Example operations using
the \{sync\} resource and that output a small data stream are
preferred.

\subsection{Availability: VOSI-availability}

A SODA web service must have a VOSI-availability
resource \citep{std:VOSI} as described in DALI \citep{std:DALI}.
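A client may want to consult the VOSI-availability resource before submitting requests. The Python sketch below assumes that the response contains an \texttt{available} element whose text is \texttt{true} or \texttt{false}, as in VOSI; the sample document and the function name \texttt{is\_available} are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOSI-availability response (structure assumed).
SAMPLE_AVAILABILITY = """<vosi:availability
    xmlns:vosi="http://www.ivoa.net/xml/VOSIAvailability/v1.0">
  <vosi:available>true</vosi:available>
  <vosi:note>service is accepting queries</vosi:note>
</vosi:availability>"""


def is_available(availability_xml):
    """Return True if the document reports the service as available."""
    root = ET.fromstring(availability_xml)
    for elem in root.iter():
        # Compare local names only, so the namespace URI does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "available":
            return (elem.text or "").strip().lower() == "true"
    # No <available> element at all: treat the service as unavailable.
    return False


print(is_available(SAMPLE_AVAILABILITY))
```

Treating a missing element as ``unavailable'' is a conservative design choice of this sketch, not a requirement stated here.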

\subsection{Capabilities: VOSI-capabilities}

A web service that includes SODA capabilities must
have a VOSI-capabilities resource \citep{std:VOSI} as described in DALI
\citep{std:DALI}. The standardID for the \{sync\} resource is
$$\hbox{\texttt{ivo://ivoa.net/std/SODA\#sync}.}$$

The standardID for the \{async\} resource is

$$\hbox{\texttt{ivo://ivoa.net/std/SODA\#async}.}$$

All DAL services must implement the \texttt{/capabilities} resource.
The following capabilities document shows the minimal
metadata for a stand-alone SODA service and does not
require a registry extension schema:

\begin{lstlisting}[language=XML]
http://example.com/data/capabilities
http://example.com/data/availability
http://example.com/data/sync
http://example.com/data/async
\end{lstlisting}

Note that the \{sync\} and \{async\} resources do not have to be
named as shown in the accessURL(s) above. Multiple
capability elements for the \{sync\} and the \{async\} resources
may be included; this is typically used if they differ in
protocol (http vs. https) and/or authentication
requirements.

\section{Parameters for \{sync\} and \{async\}}

\label{sect:stdpars}

The \{sync\} and \{async\} resources accept the same set of
parameters.

\subsection{Parameter Description: Three-Factor Semantics}
Each SODA input parameter follows a generic rule of semantic description by three attributes: name, UCD (identifying the queried astronomical quantity), and unit. With these three factors it is, in principle, possible to identify the nature and the style of the parameters. This also achieves overall consistency between input parameters and the VOTable PARAM element. The datatype and xtype attributes complete the definition of the value type and format.
The input parameters listed and defined in this section are fully described according to this rule. These parameters should also be described this way each time the service is invoked with an empty request (see section~\ref{sect:serv-desc}). Custom parameters of the service, if any, MUST be described in the same way.


\subsection{Common Parameters}

\subsubsection{ID}

The ID parameter is used to specify the dataset or file to
be accessed. The values for the ID parameter are generally
discovered from data discovery or DataLink requests. The
values must be treated as opaque identifiers that are used
as-is. The DataLink specification \citep{std:DataLink} describes mechanisms
for conveying opaque parameters and values in service
descriptor resources that can be used by clients to set the
ID parameter.

The ID parameter is single-valued in \{sync\} requests, so
\{sync\} SODA requests access a single dataset or file.
Multiple ID parameters may be submitted in \{async\} requests
in order to bundle access to multiple datasets or files in a
single job.


The UCD of ID is \texttt{meta.id}, and its unit is blank.
In addition, its xtype is \texttt{ivoident} and its datatype is \texttt{char}.


\subsection{Filtering Parameters}

Filtering parameters are used to extract subsets of larger
datasets or data files. In general, filtering parameters are
single-valued in \{sync\} requests and multi-valued in \{async\}
requests (exceptions noted below). When multiple values of
filtering parameters are used in an \{async\} job, each
combination of values produces zero or one result. For
example, if an \{async\} job included two POS and two BAND
values, there could be as many as four results (or fewer if
some combinations do not produce a result because the filter
does not intersect the bounds of the data).
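The combination rule for \{async\} jobs can be sketched in Python as a Cartesian product over the multi-valued parameters. This is a hypothetical client-side enumeration (the function name \texttt{expand\_async\_filters}, the sample identifier, and the tuple layout are assumptions); it lists candidate results only, since a service produces no result for a combination that does not intersect the data. Following the rule stated for POL, all POL values are folded into every combination as one filter.

```python
from itertools import product


def expand_async_filters(ids, pos_values=(), band_values=(), time_values=(),
                         pol_values=()):
    """Enumerate candidate results of a hypothetical {async} SODA job.

    Every combination of ID, POS, BAND and TIME values yields at most one
    result; all POL values form a single filter attached to each combination.
    """
    # A parameter that was not supplied imposes no constraint (None).
    axes = [
        list(ids),
        list(pos_values) or [None],
        list(band_values) or [None],
        list(time_values) or [None],
    ]
    pol_filter = tuple(pol_values) or None
    return [
        (dataset_id, pos, band, time, pol_filter)
        for dataset_id, pos, band, time in product(*axes)
    ]


# Two POS and two BAND values for one dataset: up to four results.
combos = expand_async_filters(
    ["ivo://example.org/data?obs1"],
    pos_values=["CIRCLE 12 34 0.5", "CIRCLE 40 -10 0.2"],
    band_values=["5.0e-7 5.5e-7", "6.0e-7 6.5e-7"],
    pol_values=["I", "Q", "U"],
)
print(len(combos))  # 4
```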


\subsubsection{POS}

The POS parameter defines the positional region(s) to be
extracted from the data. The value is made up of a shape
keyword followed by coordinate values. The
allowed shapes are:

\begin{table}[h]
\begin{tabular}{rr}
\sptablerule
\textbf{Shape}&\textbf{Coordinate values}\cr
\sptablerule
\texttt{CIRCLE}&\texttt{<longitude> <latitude> <radius>}\cr
\texttt{RANGE}&\texttt{<longitude1> <longitude2> <latitude1> <latitude2>}\cr
\texttt{POLYGON}&\texttt{<longitude1> <latitude1> ... (at least 3 pairs)}\cr
\sptablerule
\end{tabular}
\caption{POS Values in Spherical Coordinates}
\end{table}

Unlimited values are coded as -Inf or +Inf.

A circle at (12,34) with radius 0.5:

\begin{lstlisting}
POS=CIRCLE 12 34 0.5
\end{lstlisting}

A range of [12,14] in longitude and [34,36] in latitude:

\begin{lstlisting}
POS=RANGE 12 14 34 36
\end{lstlisting}

A polygon from (12,34) to (14,34) to (14,36) to (12,36) and
(implicitly) back to (12,34):

\begin{lstlisting}
POS=POLYGON 12 34 14 34 14 36 12 36
\end{lstlisting}

The inside is always assumed to be the smaller of the regions
to the left and to the right of the boundary, so only polygons
smaller than half the sphere can be specified.

A band around the equator:

\begin{lstlisting}
POS=RANGE 0 360 -2 2
\end{lstlisting}

A cap around the north pole (latitudes of 89 degrees and above):

\begin{lstlisting}
POS=RANGE 0 360 89 +Inf
\end{lstlisting}

This syntax is in the same style as STC-S, but with no
reference positions, coordinate systems, units, or geometric
operators like union, intersection, not, etc.

All longitude and latitude values (plus the radius of the
CIRCLE) are expressed in degrees in the ICRS. A future
version of this specification may allow the use of other
reference systems (specifically the native system of the
data).
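A client can check POS values against the table of shapes before sending them. The following Python sketch (the function name \texttt{parse\_pos} is hypothetical) checks only the shape keyword and the number of coordinates; it does not validate coordinate ranges or the half-sphere limit on polygons.

```python
def parse_pos(value):
    """Split a POS value into its shape keyword and numeric coordinates.

    Only the keyword and the number of coordinates are checked; coordinate
    ranges and the half-sphere limit on polygons are not validated.
    """
    tokens = value.split()
    if not tokens:
        raise ValueError("empty POS value")
    shape = tokens[0]
    # float() accepts "+Inf" and "-Inf", the codes for unlimited values.
    coords = [float(t) for t in tokens[1:]]
    if shape == "CIRCLE":
        valid = len(coords) == 3  # longitude, latitude, radius
    elif shape == "RANGE":
        valid = len(coords) == 4  # two longitudes, then two latitudes
    elif shape == "POLYGON":
        valid = len(coords) >= 6 and len(coords) % 2 == 0  # >= 3 pairs
    else:
        raise ValueError(f"unknown POS shape: {shape!r}")
    if not valid:
        raise ValueError(f"{shape} cannot take {len(coords)} coordinates")
    return shape, coords


print(parse_pos("RANGE 0 360 89 +Inf"))  # ('RANGE', [0.0, 360.0, 89.0, inf])
```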

The POS parameter is single-valued for \{sync\} requests and
multi-valued for \{async\} jobs.

The unit of POS is \texttt{deg} and the UCD is \texttt{pos}. The datatype of the POS parameter is \texttt{char}, and the xtype can take one of the three values \texttt{circle}, \texttt{range}, and \texttt{polygon}, as defined in DALI.

\subsubsection{BAND}

The BAND parameter defines the energy interval(s) to be
extracted from the data. The value is an open or closed
numeric interval of spectral values (see below for the
reference system and units). The intervals
always include the bounding values. Unlimited values are coded as +Inf or -Inf.

If only a single value is given, the interval is assumed to be
infinitely small (a scalar value).

The closed interval [500,550]:

\begin{lstlisting}
BAND=500 550
\end{lstlisting}

The open interval (-inf,300]:

\begin{lstlisting}
BAND=-Inf 300
\end{lstlisting}

The open interval [750,inf):

\begin{lstlisting}
BAND=750 +Inf
\end{lstlisting}

The scalar value 550, equivalent to [550,550]:

\begin{lstlisting}
BAND=550
\end{lstlisting}

Extracting using a scalar value should normally extract a
single pixel along the energy axis of the data; extracting
using an interval should extract one or more pixels.

All energy values are expressed as barycentric wavelength in
meters. A future version of this specification may allow the
use of other reference systems (specifically the native
system of the data).

The BAND parameter is single-valued for \{sync\} requests and
multi-valued for \{async\} jobs.

The UCD of the BAND parameter is \texttt{em}, and the unit is \texttt{m}. Its datatype is \texttt{double} and its xtype is \texttt{interval}, as defined in DALI.


\subsubsection{TIME}

The TIME parameter defines the time interval(s) to be
extracted from the data.
The value is an open or closed
interval with numeric values interpreted as Modified
Julian Dates. Unlimited values are coded as +Inf or -Inf.

If only a single value is given, the interval is assumed
to be infinitely small (a scalar value).

An open interval from MJD 55100.0 to all later times:

\begin{lstlisting}
TIME=55100.0 +Inf
\end{lstlisting}

A range of MJD values:

\begin{lstlisting}
TIME=55123.456 55123.466
\end{lstlisting}

An instant in time using Modified Julian Date:

\begin{lstlisting}
TIME=55678.123456
\end{lstlisting}

Time values are always UTC.
The TIME parameter is single-valued for \{sync\} requests and
multi-valued for \{async\} jobs.

The UCD of the TIME parameter is \texttt{time} and the unit is \texttt{day}. The datatype is \texttt{double} and the xtype is, again, \texttt{interval}, as defined in DALI.


\subsubsection{POL}

The POL parameter defines the polarization state(s) (Stokes)
to be extracted from the data.

Extract the total intensity:
\begin{lstlisting}
POL=I
\end{lstlisting}
Extract the circular polarization:
\begin{lstlisting}
POL=V
\end{lstlisting}
The POL parameter is multi-valued; multiple values can be
included in a single request and all will be extracted.
Extract only the I, Q, and U components:
\begin{lstlisting}
POL=I
POL=Q
POL=U
\end{lstlisting}

The POL parameter is multi-valued for both \{sync\} requests and \{async\} jobs.
Unlike general filtering parameters, all values of POL are
combined into a single filter; for example, if the request
includes the three values above, the job would generate one
result with some or all of these polarization states (per
combination of ID and other filtering parameters).
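Putting the filtering parameters together, the following Python sketch builds a \{sync\} GET request URL. The base URL, the dataset identifier, and the helper names \texttt{format\_bound}, \texttt{format\_interval}, and \texttt{build\_sync\_query} are hypothetical. POS is passed as an already formatted string, unlimited BAND and TIME bounds are coded as +Inf or -Inf, and POL is repeated once per value; URL encoding uses Python's standard \texttt{urllib.parse.urlencode}.

```python
import math
from urllib.parse import urlencode


def format_bound(x):
    """Render one bound, coding unlimited values as +Inf or -Inf."""
    if math.isinf(x):
        return "+Inf" if x > 0 else "-Inf"
    return repr(float(x))


def format_interval(low, high=None):
    """Render a BAND or TIME value; a single bound denotes a scalar."""
    if high is None:
        return format_bound(low)
    return f"{format_bound(low)} {format_bound(high)}"


def build_sync_query(base_url, dataset_id, pos=None, band=None, time=None,
                     pols=()):
    """Build a GET URL for a hypothetical SODA {sync} endpoint.

    pos is an already formatted POS string; band and time are tuples of
    one bound (a scalar) or two bounds (an interval).
    """
    params = [("ID", dataset_id)]
    if pos is not None:
        params.append(("POS", pos))
    if band is not None:
        params.append(("BAND", format_interval(*band)))
    if time is not None:
        params.append(("TIME", format_interval(*time)))
    # POL is multi-valued even in {sync}: repeat the parameter per state.
    params.extend(("POL", p) for p in pols)
    return base_url + "?" + urlencode(params)


url = build_sync_query(
    "http://example.com/data/sync",
    "ivo://example.org/data?obs1",
    band=(5e-7, math.inf),
    pols=["I", "Q", "U"],
)
print(url)
```

Because \texttt{urlencode} percent-encodes the plus sign, a value such as \texttt{+Inf} reaches the service intact rather than being decoded as a space.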


The UCD of the POL parameter is \texttt{pol} and it has no unit. The datatype is \texttt{char}, and the xtype is \texttt{stokes}.



\section{\{sync\} Responses}

All responses from the \{sync\} resource follow the rules for
DALI-sync resources, except that the \{sync\} response allows
for error messages for individual input identifier values.

\subsection{Successful Requests}

Successfully executed requests should result in a response
with HTTP status code 200 (OK) and a response in the format
requested by the client or in the default format for the
service.

If the values specified for cutout parameters do not include
any pixels from the target dataset/file, the service must
respond with HTTP status code 204 (No Content) and no
response body.

The service should set the following HTTP headers to the
correct values where possible.

\begin{table}[h]
\begin{tabular}{rr}
\sptablerule
Content-Type&MIME type of the response\cr
Content-Encoding&encoding/compression of the response (if applicable)\cr
\sptablerule
\end{tabular}

\caption{Recommended HTTP Response Headers}
\end{table}

Since the response is usually dynamically generated, the
Content-Length and Last-Modified headers cannot usually be
set.\todo{If we say that, we should at least mention chunked transfers, or
people might think they have to close the connections. -- Markus}

\subsection{SODA Service Descriptor}
\label{sect:serv-desc}

The DataLink \citep{std:DataLink} specification describes a mechanism for
describing a service within a VOTable resource and
recommends that services can describe themselves with a
special resource with \texttt{name="this"}. SODA responses for
empty sync queries should include a descriptor describing
both standard and custom query parameters (if applicable).
The descriptor for a service with standard parameters (see
sect.~\ref{sect:stdpars}) would be:

\begin{lstlisting}[language=XML]
\end{lstlisting}

This VOTable resource should be output for empty sync
queries; thus all inputs and outputs would be fully
described.

\subsection{Errors}

The error handling specified for DALI-sync resources applies
to service failure. Error documents should be text using the
text/plain content-type and the text must begin with one of
the following strings:

\begin{table}[h]
\begin{tabular}{rr}
Error&General error (not covered below)\cr
AuthenticationError&Not authenticated\cr
AuthorizationError&Not authorized to access the resource\cr
ServiceUnavailable&Transient error (could succeed with retry)\cr
UsageError&Permanent error (retry pointless)\cr
\end{tabular}
\caption{Error Message Prefixes}
\end{table}

\section{\{async\} Responses}

The \{async\} resource conforms to the DALI-async resource
description, which means the job is a UWS job with all the
job control features available. All result files are to be
listed as children of the UWS results resource. The service
provider is free to name each result.

\appendix

\section{Changes from Previous Versions}

\subsection{WD-SODA-1.0-20151120}

Changed the name of the protocol to SODA. Removed SELECT and COORD. The xtype descriptions are now in DALI, and a reference to DALI has been added.

\subsection{WD-AccessData-1.0-20151021}

Added a general introduction to parameter descriptions in
section 3. Modified the SELECT and COORD sections in order to
detach them from SimDal. Added an appendix on xtype descriptions
with BNF syntax.

\subsection{WD-AccessData-1.0-20140730}

\begin{itemize}
\item Removed the REQUEST parameter, following the DAL-WG decision not to
include it when there is only one value.

\item Clarified that ID and filtering parameters are single-valued
for \{sync\} and multi-valued for \{async\}, with POL
being multi-valued but still being treated as a single
filter.
\end{itemize}

\subsection{WD-AccessData-1.0-20140312}

This is the initial document.


\bibliography{ivoatex/ivoabib}

\end{document}