IVOA

Simulation Data Access Protocol (SimDAP)
Draft

IVOA Note March 2009

This version:
http://www.ivoa.net/Documents/...
Latest version:
http://www.ivoa.net/Documents/latest/...
Previous versions:
http://www.ivoa.net/Documents/...
http://www.ivoa.net/Documents/...
Interest Group:
http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory
Author(s):
Claudio Gheller
Gerard Lemson
Rick Wagner

Abstract

This specification defines a protocol for retrieving data produced by numerical simulations from a variety of data repositories through a uniform interface. The interface is meant to be reasonably simple for service providers to implement. Data are first located through a search procedure. Once the data of interest are identified, specific quantities can be selected and sub-samples can be extracted and downloaded. Data are returned in a simulation-specific VOTable format, with support for external binary files.

Status of this Document

This is a Note. The first release of this document was 18 May 2008.

This is an IVOA Note expressing suggestions from and opinions of the authors.
It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory. It should not be referenced or otherwise interpreted as a standard specification.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.


Acknowledgments

We thank Ugo Becciani, Laurent Bourgès, Patrizia Manzato, and Hervé Wozniak for discussions and feedback on the topic.

1 Introduction

This specification defines a prototype standard for accessing theoretical data from a variety of astrophysical simulation repositories: the Simulation Data Access Protocol (hereafter SimDAP). In this context, theoretical data is defined as the outcome of different kinds of numerical applications, such as dynamical simulations, semi-analytical models, Monte Carlo simulations, etc.

SimDAP deals with datasets that can always be represented as (large or huge) tables in which rows identify a simulated element (a mesh cell, a particle, a pixel...) and columns represent the associated physical parameters (the 3D spatial coordinates, the velocity, the temperature...). Datasets can represent different timesteps (that is, evolutionary configurations) of the same simulated system. In the rest of the document, we refer to these datasets as snapshots of a numerical application. Snapshots are the data sources. No further assumptions are made about the data. Data are described and can be searched by means of the SimDM theoretical data model (Lemson et al., 2008).

The simplest access mode is the download of a data file in a standard format. In general, however, the data are so large that a direct download is unfeasible. The SimDAP protocol describes a standard interface to access services which allow the user to reduce the data volume that has to move over the network (e.g., by focusing on a suitable subsample of the data), making its download feasible. The protocol also defines the interface to preview services, which allow the user to choose between different datasets and to set the parameters that reduce the data volume.
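
As an illustration of the table view of a snapshot and of what reducing the data volume can mean in practice, the following Python sketch keeps a snapshot as named columns and extracts the rows that fall inside a spatial sub-box. The Snapshot class, the select_subbox function, the column names, and the inclusive box edges are all assumptions made for this example; none of them is defined by SimDAP.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """A snapshot as a table: one row per simulated element, one column per property."""
    columns: dict  # column name -> list of values, all lists of equal length

    def n_rows(self):
        return len(next(iter(self.columns.values()))) if self.columns else 0


def select_subbox(snap, left, right, coords=("x", "y", "z"), keep=None):
    """Return the rows inside [left, right] (edges inclusive, an assumption),
    restricted to the columns named in 'keep' (all columns if None)."""
    names = list(keep) if keep is not None else list(snap.columns)
    rows = [
        i for i in range(snap.n_rows())
        if all(lo <= snap.columns[c][i] <= hi
               for c, lo, hi in zip(coords, left, right))
    ]
    return Snapshot({n: [snap.columns[n][i] for i in rows] for n in names})


# Three hypothetical particles with a position and a temperature.
particles = Snapshot({
    "x": [0.1, 0.6, 0.65],
    "y": [0.2, 0.7, 0.9],
    "z": [0.3, 0.3, 0.3],
    "temperature": [1.0e4, 2.0e6, 5.0e5],
})
sub = select_subbox(particles, (0.5, 0.6, 0.2), (0.7, 0.8, 0.4),
                    keep=["temperature"])
```

Only the second particle lies inside the box, so the reduced table holds a single row and a single column, which is the kind of volume reduction a selection service is meant to perform on the server side.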

In operation, SimDAP represents a negotiation between the client and the data service, which allows the user to preview data and to select and retrieve specific subsets. The retrieval of the complete dataset can be considered as a degenerate selection of all the volume and all the parameters. However, notice that even this case does not lead to a simple download of the original data file since data are delivered in the standard TVO format described in section X.X.

Services supporting SimDAP provide access to both existing datasets and virtual ones (i.e., datasets generated by the service).

Generating virtual datasets, or "data on demand", is not a simple task, for various reasons. First, simulation data adopt specific units and coordinate systems, which depend on the nature of the problem, the characteristics of the algorithms, and their implementation. Furthermore, simulation outputs can be represented by a wide variety of completely different data objects. For example, the output can consist of a set of particles in a given volume, where each particle has its physical position and a set of associated scalar and vector quantities, such as velocity, mass density, temperature, etc. Mesh-based simulations, on the other hand, describe their data as discrete fields defined on a regular or adaptive mesh. The goal of the SimDAP protocol is to provide a uniform description of the selection service that stays simple while covering as many different kinds of simulations and data as possible.

2 Concepts and Terminology

2.1 The Simulation Data Model (SimDM)

TODO: A general description of the SimDM, and how SimDAP uses it to describe the data.

2.2 The Theoretical Data File Format (TDFF)

The Theoretical Data File Format adopts the VOTable standard to describe responses and data coming from a SimDAP service. In all cases, the TDFF table must provide all the information expected for the specific service. In particular, if the TDFF contains the description of a snapshot or, more generally, of an external data file, it must provide all the information required to read, interpret, and use that file. This requires an extension of the basic SimDM schema, as sketched in Appendix B.

TODO: Introduce the TDFF, and the need to extend the SimDM to incorporate a low-level description of the data returned from SimDAP services. Also, look at how the TDFF can be applied to S3, in particular using the TDFF to describe a table of results from an S3 service.

See Appendix B for a more detailed description of the TDFF.

2.3 Virtual Data

One of the primary purposes of SimDAP is to enable access to large simulation datasets by producing derived data "on the fly". In this case, the Experiment corresponds to the operation of the service in producing the results.

TODO: Finish, especially in regards to custom services (e.g., S3), and how the concepts of the data models relate to virtual data.

3 Interface Overview

3.1 Architecture

A SimDAP service provides access to data files directly produced by a numerical simulation or derived from them.

3.2 Service Operations

A SimDAP service implements multiple service operations, each of which performs some well defined function when invoked by a client application. The service operations described here use HTTP GET and POST as the low level communications protocol. The functionality of each operation is defined independently of the low level communications protocol, and semantically equivalent operations could be implemented via other protocols.

SimDAP defines the following standard service operations:

GetCapabilities
Return a standardized XML description of the capabilities of the service instance, describing what the service is capable of doing (VOSI compliant, registry cacheable and searchable).
GetAvailability
Return a standardized XML description of the runtime status of the service, describing the state and availability of the service (VOSI compliant).
ListExperiments
Return a list of the experiments served by this SimDAP instance.
ListSnapshots
List the available snapshots, either all of them or those of one or more experiments.
QueryData
The QueryData operation returns a list of access references to files containing the data for an Experiment. Furthermore, a list of the available properties can be retrieved.
Cutout
The goal of the cutout service is to select and extract a sub-volume of data from a given snapshot.
Preview
Preview operations provide access references to precomputed or rapidly available datasets that are assumed to be easily rendered by the client (e.g., images or small fields).
Custom
A service-defined custom operation.

3.3 Service Profile

The specific SimDAP service request parameters and responses are detailed in Section 4 and summarized in Appendix A. In the current section we merely summarize the elements of the basic service interface.

3.3.1 Request Format

A request URL is formed by concatenating a baseURL with zero or more operation-defined request parameters. The baseURL defines the network address to which request messages are to be sent for a particular operation of a particular service instance on a particular server. Service operations generally share the same baseURL but this is not required.

Example:

      http://example.org/simdap/sync?REQUEST=LISTEXPERIMENTS
    

SimDAP defines two versions of the baseURL, one for synchronous operations and another for asynchronous operations. These are formed by concatenating the service-baseURL with either /sync? or /async?. Hence for synchronous operations we have a full baseURL of

      http://example.org/simdap/sync?
    

and for asynchronous operations the full baseURL is

      http://example.org/simdap/async?
    

In general the service operation is much the same whether it executes synchronously or asynchronously. Minor differences in service operation function or input parameters are pointed out in the description of each individual service operation below.

TODO: Tie this in with UWS for asynchronous operations.

Note that since a URI pathname segment is appended to the service baseURL the service baseURL may not contain any HTTP GET parameters, and must be a fixed URI.
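
The URL construction described in this section can be sketched in Python as follows. This is a minimal illustration, not a reference client: the function name build_request_url and its keyword arguments are assumptions, and the upper-casing of parameter names and of the REQUEST value simply mirrors the example above (both are case-insensitive according to this profile).

```python
from urllib.parse import urlencode


def build_request_url(service_base_url, request, asynchronous=False, **params):
    """Concatenate the service baseURL, /sync? or /async?, and the parameters."""
    if "?" in service_base_url:
        # The service baseURL must be a fixed URI without GET parameters.
        raise ValueError("service baseURL must not contain query parameters")
    mode = "async" if asynchronous else "sync"
    query = {"REQUEST": request.upper()}
    query.update({name.upper(): value for name, value in params.items()})
    return f"{service_base_url.rstrip('/')}/{mode}?{urlencode(query)}"


url = build_request_url("http://example.org/simdap", "ListExperiments")
```

With the example service above, url is http://example.org/simdap/sync?REQUEST=LISTEXPERIMENTS; passing asynchronous=True switches the path segment to /async?.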

3.3.2 Parameters

Parameters may appear in any order. If the same parameter appears multiple times in a request the operation is undefined (if alternate values for a parameter are desired the range-list syntax may be used instead). Parameter names are case-insensitive. Parameter values are case-sensitive unless defined otherwise in the description of an individual parameter.

All service operations define the following standard parameters, which are part of the basic service profile:

REQUEST
The request or operation name (mandatory).
VERSION
The version number of the interface (optional).

The REQUEST parameter specifies the service operation to be executed. VERSION allows a specific version of the interface to be requested. The values of both the REQUEST and VERSION parameters are case-insensitive.
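
On the service side, the rules above could be applied when parsing the query string, as in the following sketch. The function name parse_request is hypothetical. Since the behaviour for a repeated parameter is undefined, rejecting the request is only one possible choice made here for illustration.

```python
from urllib.parse import parse_qsl


def parse_request(query_string):
    """Parse a SimDAP query string into a dict keyed by upper-case names."""
    params = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        key = name.upper()  # parameter names are case-insensitive
        if key in params:
            # Undefined by the profile; this sketch chooses to reject.
            raise ValueError(f"parameter {key} given more than once")
        params[key] = value  # values stay case-sensitive by default
    if "REQUEST" not in params:
        raise ValueError("missing mandatory REQUEST parameter")
    # REQUEST and VERSION values are case-insensitive: normalise them.
    params["REQUEST"] = params["REQUEST"].upper()
    if "VERSION" in params:
        params["VERSION"] = params["VERSION"].upper()
    return params


parsed = parse_request("request=ListSnapshots&Experiment=clrc00")
```

Here parsed maps REQUEST to LISTSNAPSHOTS and EXPERIMENT to clrc00, with the case of the experiment identifier preserved.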

A given service instance may support multiple versions of the SimDAP interface, and by default the service assumes the highest standard version which is implemented (access to any experimental versions supported by a service requires explicit specification of the version by the client). Explicit specification of the interface version assumed by the client is necessary to ensure against a runtime version mismatch, e.g., if the client caches the service endpoint but a newer version of the service is subsequently deployed. If desired the client can omit the VERSION parameter to disable runtime version checking, and default to the highest version standard interface implemented by the service.
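
The version negotiation described above can be sketched as a small selection function. The names select_version and version_key are hypothetical, and the dotted numeric version strings (such as 1.0) are an assumption, since this document does not fix a version syntax.

```python
def version_key(version):
    """Order dotted numeric versions, so that 1.10 sorts after 1.9."""
    return tuple(int(part) for part in version.split("."))


def select_version(requested, standard, experimental=()):
    """Pick the interface version used to serve a request.

    requested    -- value of the VERSION parameter, or None if omitted
    standard     -- standard versions implemented by the service
    experimental -- experimental versions, used only on explicit request
    """
    if requested is None:
        # Default: highest standard version implemented by the service.
        return max(standard, key=version_key)
    if requested in standard or requested in experimental:
        return requested
    raise ValueError(f"unsupported interface version: {requested}")
```

For a service implementing standard versions 1.0 and 1.1 and an experimental 2.0, omitting VERSION selects 1.1, while 2.0 is reached only when the client asks for it explicitly.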

All other request parameters are defined separately for each operation.

3.3.3 Parameter Values

Integer numbers are represented as defined in the specification of integers in XML Schema Datatypes. Real numbers are represented as specified for double precision numbers in XML Schema Datatypes. Sexagesimal formatting is not permitted, either for parameter input or in formal output metadata, other than in ISO 8601 formatted time strings (sexagesimal format is permitted in any informal output intended for a human, e.g., text or HTML formatted tables).

SimDAP defines a special range-list format for specifying numerical ranges or lists of ranges as parameter values. For example, 1E-7/3E-6 specifies a closed range from 1E-7 to 3E-6 inclusive. The syntax supports both open and closed ranges. Ranges or range lists are permitted only when explicitly indicated in the definition of an individual parameter. A variant of the range list is the value of the WHERE parameter, used to specify the query constraint for a ParamQuery operation. For a full description of range list syntax refer to section ?
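
Pending the full syntax definition, the closed-range example above can be parsed with a few lines of Python. The sketch below rests on two assumptions that this document does not confirm: an open range is written by omitting one endpoint (for example 1E-7/ or /3E-6), and the elements of a range list are separated by commas. The function names are hypothetical.

```python
import math


def parse_range(text):
    """Parse 'lo/hi' into a (lo, hi) pair; a bare value gives (v, v).

    Assumption: a missing endpoint denotes an open range.
    """
    if "/" not in text:
        value = float(text)
        return (value, value)
    lo, hi = text.split("/", 1)
    return (float(lo) if lo else -math.inf, float(hi) if hi else math.inf)


def parse_range_list(text):
    """Parse a comma-separated list of ranges (separator is an assumption)."""
    return [parse_range(item) for item in text.split(",")]


closed = parse_range("1E-7/3E-6")
```

The value of closed is the pair (1e-07, 3e-06), matching the closed range from 1E-7 to 3E-6 given in the text.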

Repeated values in an array are specified using a single comma-separated list, in order to preserve the order of the elements when specifying spatial dimensions.

      $/sync?REQUEST=CUTOUT&EXPERIMENT=clrc00&SNAPSHOT=clrc00_0010&LEFTEDGE=0.5,0.6,0.2&RIGHTEDGE=0.7,0.8,0.4
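
A client could produce and read such comma-separated arrays as in the following sketch. The helper names encode_array, decode_array, and cutout_query are assumptions made for the example; the parameter names and values are those of the request above.

```python
from urllib.parse import urlencode


def encode_array(values):
    """Join array elements with commas, preserving their order."""
    return ",".join(repr(float(v)) for v in values)


def decode_array(text):
    """Split a comma-separated parameter value back into floats."""
    return [float(v) for v in text.split(",")]


def cutout_query(experiment, snapshot, left_edge, right_edge):
    """Build the query part of a Cutout request."""
    if len(left_edge) != len(right_edge):
        raise ValueError("LEFTEDGE and RIGHTEDGE must have the same length")
    query = {
        "REQUEST": "CUTOUT",
        "EXPERIMENT": experiment,
        "SNAPSHOT": snapshot,
        "LEFTEDGE": encode_array(left_edge),
        "RIGHTEDGE": encode_array(right_edge),
    }
    # safe="," keeps the commas literal instead of encoding them as %2C.
    return urlencode(query, safe=",")


query = cutout_query("clrc00", "clrc00_0010", [0.5, 0.6, 0.2], [0.7, 0.8, 0.4])
```

The resulting query string is identical to the part after /sync? in the example request, and decode_array recovers the three edge coordinates in their original order.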
    

TODO: Find a reference for the ranges based on the new TAP parameter query (PQL). Describe the array-like requests in more detail. We need this for cutouts and previews, and possibly for passing parameters to a custom or S3 operation. Clients may also use this to request data from several experiments, snapshots or properties.

3.3.4 Use of GET and POST

Describe in more detail. GET is needed for the basic operations, but custom operations may require a POST, if there's some input data. Also, UWS utilizes POST.

3.3.5 Error Response

How to return errors? Should we return standard HTTP errors, like Internal Error, or Bad Request?

3.4 Request Examples

TODO: Add some examples. More importantly, add the examples in the section for each query response.

3.5 Query Response

The basic format of a response from a SimDAP service is a VOTable XML document, containing a nested hierarchy of RESOURCE elements.

   <RESOURCE utype="SimDB.Experiment">
     ...Experiment metadata...
     <RESOURCE utype="SimDB.Snapshot">
       ...Snapshot metadata...
       <RESOURCE utype="TDFF.File">
         ...File metadata, access reference...
         <TABLE utype="TDFF.Array">
           ...Table of arrays metadata...
    

The response to a ListExperiments request is a VOTable containing a series of RESOURCE elements, where each RESOURCE contains the metadata for a single Experiment. Individual attributes of the Experiment (taken from the SimDM), are listed as PARAM or LINK elements in the RESOURCE. Attributes that are collections, ParameterSetting for example, are listed as TABLEs in the RESOURCE.
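
A client might walk the nested RESOURCE hierarchy as in the sketch below, which uses only the Python standard library. The sample document is invented for the example: its PARAM names and values are hypothetical, and the XML namespace that a real VOTable carries is omitted to keep the path expressions short.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free response with one Experiment and one Snapshot.
SAMPLE = """<VOTABLE>
  <RESOURCE utype="SimDB.Experiment">
    <PARAM name="Name" utype="SimDB.Experiment.Name"
           datatype="char" arraysize="*" value="clrc00"/>
    <RESOURCE utype="SimDB.Snapshot">
      <PARAM name="PublisherDID" utype="SimDB.Snapshot.PublisherDID"
             datatype="char" arraysize="*" value="clrc00_0010"/>
    </RESOURCE>
  </RESOURCE>
</VOTABLE>"""


def params_by_utype(resource):
    """Collect the PARAM children of a RESOURCE as a utype -> value dict."""
    return {p.get("utype"): p.get("value") for p in resource.findall("PARAM")}


def list_experiments(votable_xml):
    """Return one dict per Experiment RESOURCE, with its nested Snapshots."""
    root = ET.fromstring(votable_xml)
    experiments = []
    for exp in root.findall("RESOURCE[@utype='SimDB.Experiment']"):
        snapshots = [
            params_by_utype(snap)
            for snap in exp.findall("RESOURCE[@utype='SimDB.Snapshot']")
        ]
        experiments.append({"params": params_by_utype(exp),
                            "snapshots": snapshots})
    return experiments


experiments = list_experiments(SAMPLE)
```

Selecting elements by their utype attribute rather than by position keeps the client independent of the order in which a service lists its PARAM, LINK, and TABLE elements.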

The required and optional attributes are listed in Section A.3.1 of the Appendix. This list has been deliberately kept to a minimum, since not all data providers will have a complete database with all of the classes from the SimDM. Instead, they can use a LINK element for the ReferenceURL attribute to point the client to a richer description of the simulation. Ideally, this would point to an XML instance document describing the Experiment based on the XML Schema from the SimDM. (SimDM or SimDB?)

Similarly, the required attributes are the same for all service operations. It is assumed that a client performing a ListExperiments query is exploring the Experiments, and would like more metadata. However, when performing a QueryData request, the client may already have

TODO: Should the service allow continuation tokens for long responses?

4 SimDAP Service Operations

4.1 GetAvailability

Per VOSI standard.

4.2 GetCapabilities

TODO: Can we use a GetCapabilities request to describe custom services, such as S3? Also, we'll need to define the XML Schema for the resource registration.

4.3 ListExperiments

4.3.1 Input Parameters

None.

4.3.2 Query Response Metadata

TODO: Finish.

4.4 ListSnapshots

4.4.1 Input Parameters

Name        UTYPE                          Required?
EXPERIMENT  SimDB.Experiment.PublisherDID  OPT

4.4.2 Query Response Metadata

4.5 QueryData

The basic parameter is the PublisherDID of the Experiment, but the query may be restricted to one or more Snapshots, and to one or more Properties. If no Snapshots are requested, all of the files available from the Experiment are listed. Likewise, if no Properties are requested, all of the available ones are listed.

4.5.1 Input Parameters

Name        UTYPE                                Required?
EXPERIMENT  SimDB.Experiment.PublisherDID        REQ
SNAPSHOT    SimDB.Snapshot.PublisherDID          OPT
PROPERTY    SimDB.RepresentationObject.Property  OPT

4.5.2 Query Response

4.6 Cutout

This operation applies to a single snapshot. Cutouts over multiple sources, such as several time steps of the same simulation, are not supported by the protocol. Their implementation is left to the client, for example as a sequence of requests with the same sub-box and fields but different datasets.

4.6.1 Input Parameters

Name        UTYPE                                Required?
EXPERIMENT  SimDB.Experiment.PublisherDID        REQ
SNAPSHOT    SimDB.Snapshot.PublisherDID          REQ
PROPERTY    SimDB.RepresentationObject.Property  OPT
LEFTEDGE    TDFF.Array.LeftEdge                  OPT
RIGHTEDGE   TDFF.Array.RightEdge                 OPT
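
Because a Cutout request addresses a single snapshot, a client that needs the same region at several time steps issues one request per snapshot. The sketch below generates such a sequence. The function name cutout_urls and the snapshot identifiers are hypothetical, and passing several properties as one comma-separated PROPERTY value is an assumption that this document does not yet settle.

```python
from urllib.parse import urlencode


def cutout_urls(base_url, experiment, snapshots, left_edge, right_edge,
                properties=()):
    """Return one Cutout URL per snapshot, all with the same sub-box."""
    urls = []
    for snapshot in snapshots:
        query = {
            "REQUEST": "CUTOUT",
            "EXPERIMENT": experiment,
            "SNAPSHOT": snapshot,
            "LEFTEDGE": ",".join(str(v) for v in left_edge),
            "RIGHTEDGE": ",".join(str(v) for v in right_edge),
        }
        if properties:
            # Assumption: multiple properties form a comma-separated list.
            query["PROPERTY"] = ",".join(properties)
        urls.append(base_url + urlencode(query, safe=","))
    return urls


urls = cutout_urls("http://example.org/simdap/sync?", "clrc00",
                   ["clrc00_0010", "clrc00_0020"],
                   [0.5, 0.6, 0.2], [0.7, 0.8, 0.4])
```

Each URL in the list differs only in its SNAPSHOT value, so the responses can be combined by the client into a time sequence of the same sub-volume.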

4.6.2 Query Response

4.7 Preview

The preview can be implemented in different ways, depending on the specific data being served. In all cases, if the service is supported, a getPreview method MUST be implemented. The input of this method is the basic pair EXPERIMENT and SNAPSHOT. The PROPERTY parameter may be used to specify which fields to preview (if supported; otherwise it is ignored). An absent or blank PROPERTY parameter is interpreted as a request to preview all available fields. If PROPERTY requests unavailable quantities, the corresponding request is discarded. If the cutout service is available, the preview service MUST provide the means to select the fields of interest and the cutout region.

4.7.1 Input Parameters

Name        UTYPE                                Required?
EXPERIMENT  SimDB.Experiment.PublisherDID        REQ
SNAPSHOT    SimDB.Snapshot.PublisherDID          OPT
PROPERTY    SimDB.RepresentationObject.Property  OPT
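
The handling of the PROPERTY parameter for previews could look like the following sketch. It reads the rule on unavailable quantities as dropping only the unavailable names; the text could also be read as rejecting the whole request, so this is one interpretation rather than the defined behaviour. The function name and the comma-separated form of PROPERTY are assumptions.

```python
def resolve_preview_properties(requested, available):
    """Decide which fields a Preview request should render.

    requested -- raw PROPERTY value, or None if the parameter is absent
    available -- names of the fields the service can preview
    """
    if requested is None or not requested.strip():
        # Absent or blank PROPERTY: preview all available fields.
        return list(available)
    names = [name.strip() for name in requested.split(",") if name.strip()]
    # Interpretation: requests for unavailable quantities are dropped.
    return [name for name in names if name in available]


fields = resolve_preview_properties("temperature,pressure",
                                    ["density", "temperature"])
```

With density and temperature available, a request for temperature and pressure resolves to temperature alone, while an absent or blank PROPERTY resolves to both available fields.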

4.7.2 Query Response

4.8 Custom

4.8.1 Input Parameters

4.8.2 Query Response


Appendix A: Detailed List of Query Parameters and Response Content

A.1 Custom Services

Custom services must define their own input parameters and responses.

A.2 Input Parameters

Parameter                                        Service Operation
Name        UTYPE                                ListExperiments  ListSnapshots  QueryData  Preview  Cutout
EXPERIMENT  SimDB.Experiment.PublisherDID        N/A              OPT            REQ        REQ      REQ
SNAPSHOT    SimDB.Snapshot.PublisherDID          N/A              N/A            OPT        OPT      REQ
PROPERTY    SimDB.RepresentationObject.Property  N/A              N/A            OPT        OPT      OPT
LEFTEDGE    TDFF.Array.LeftEdge                  N/A              N/A            N/A        N/A      OPT
RIGHTEDGE   TDFF.Array.RightEdge                 N/A              N/A            N/A        N/A      OPT
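
The table above lends itself to a table-driven check of incoming requests, sketched below. The function name validate is hypothetical, the upper-case operation names follow the REQUEST values used in the examples of Section 3, and reporting a parameter marked N/A as a problem (rather than silently ignoring it) is a choice made for this sketch only.

```python
PARAMETERS = ("EXPERIMENT", "SNAPSHOT", "PROPERTY", "LEFTEDGE", "RIGHTEDGE")

# One row per operation, one entry per parameter, transcribed from A.2.
RULES = {
    "LISTEXPERIMENTS": ("N/A", "N/A", "N/A", "N/A", "N/A"),
    "LISTSNAPSHOTS":   ("OPT", "N/A", "N/A", "N/A", "N/A"),
    "QUERYDATA":       ("REQ", "OPT", "OPT", "N/A", "N/A"),
    "PREVIEW":         ("REQ", "OPT", "OPT", "N/A", "N/A"),
    "CUTOUT":          ("REQ", "REQ", "OPT", "OPT", "OPT"),
}


def validate(operation, params):
    """Return a list of problems; an empty list means the request is valid.

    params -- dict of request parameters keyed by upper-case name
    """
    problems = []
    for name, rule in zip(PARAMETERS, RULES[operation]):
        if rule == "REQ" and name not in params:
            problems.append(f"missing required parameter {name}")
        elif rule == "N/A" and name in params:
            problems.append(f"parameter {name} does not apply to {operation}")
    return problems


problems = validate("CUTOUT", {"EXPERIMENT": "clrc00"})
```

For the Cutout request above, which lacks a SNAPSHOT, the returned list contains the single entry "missing required parameter SNAPSHOT".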

A.3 Query Response

Tables are used to represent collections from the data model. In many cases, these tables are optional. In this case, the required fields (columns) of the table only apply if the service chooses to return that table. This way, the client can be assured of a minimal set of metadata if the table is returned.

Resource                      Service Operation
Name        UTYPE             ListExperiments  ListSnapshots  QueryData  Preview  Cutout
EXPERIMENT  SimDB.Experiment  REQ              REQ            REQ        REQ      REQ
SNAPSHOT    SimDB.Snapshot    OPT              REQ            REQ        REQ      REQ
FILE        TDFF.File         OPT              OPT            REQ        REQ      REQ

A.3.1 Experiment Resource Metadata

The Experiment, Simulation, and PostProcessing classes from the SimDB have more attributes than are listed here. In principle, all of these attributes can be returned by a SimDAP service, in addition to appropriately related elements from the other classes, namely Protocol and its subclasses. The attributes and collections given here are the ones most important for describing the data.

UTYPE                                            VOT Element  Required?
SimDB.Experiment.Name                            PARAM        REQ
SimDB.Experiment.Created                         PARAM        OPT
SimDB.Experiment.Description                     PARAM        OPT
SimDB.Experiment.Status                          PARAM        OPT
SimDB.Experiment.Updated                         PARAM        OPT
SimDB.Experiment.ReferenceURL                    LINK         REQ
SimDB.Protocol.Name                              PARAM        REQ
SimDB.Protocol.PublisherDID                      PARAM        REQ
SimDB.Protocol.ReferenceURL                      LINK         REQ
SimDB.Protocol.Version                           PARAM        OPT
SimDB.Experiment.GenericParameterSetting         TABLE        OPT
SimDB.Experiment.NumericParameterSetting         TABLE        OPT
SimDB.Experiment.InputDataset                    TABLE        OPT
SimDB.Experiment.ExperimentRepresentationObject  TABLE        OPT

A.3.1.1 Generic Experiment Parameter Setting Table Columns

Column                                          Required?
SimDB.Protocol.InputParameter.Name              REQ
SimDB.Protocol.InputParameter.Description       OPT
SimDB.Protocol.InputParameter.Datatype          REQ
SimDB.Experiment.GenericParameterSetting.Value  REQ

A.3.1.2 Numeric Experiment Parameter Setting Table Columns

Column                                                Required?
SimDB.Protocol.InputParameter.Name                    REQ
SimDB.Protocol.InputParameter.Description             OPT
SimDB.Protocol.InputParameter.Datatype                REQ
SimDB.Experiment.NumericParameterSetting.Value.Value  REQ
SimDB.Experiment.NumericParameterSetting.Value.Unit   REQ

A.3.1.3 Input Dataset Table Columns

Column                         Required?
SimDB.Experiment.Name          REQ
SimDB.Experiment.PublisherDID  REQ
SimDB.Experiment.ReferenceURL  REQ
SimDB.Snapshot.PublisherDID    REQ

A.3.1.4 Experiment Representation Object Table Columns

Column                                               Required?
SimDB.Protocol.RepresentationObjectType.Name         REQ
SimDB.Protocol.RepresentationObjectType.Description  OPT
SimDB.Protocol.RepresentationObjectType.Label        OPT
SimDB.Protocol.RepresentationObjectType.Type         REQ

A.3.2 Snapshot Resource Metadata

UTYPE VOT Element Required?
SimDB.Experiment.Snapshot.PublisherDID PARAM REQ

A.3.3 File Resource Metadata

UTYPE                           VOT Element  Required?
TDFF.File.PublisherDID          PARAM        REQ
TDFF.File.AccessURL             LINK         REQ
Protocol.FileType.PublisherDID  PARAM        REQ
Protocol.FileType.Mimetype      PARAM        REQ
TDFF.Array                      TABLE        REQ

A.3.3.1 Array Table Columns

Column                                                     Required?
TDFF.Array.Name                                            REQ
SimDB.Protocol.RepresentationObject.Name                   REQ
SimDB.Protocol.RepresentationObject.Description            OPT
SimDB.Protocol.RepresentationObject.PublisherDID           REQ
SimDB.Protocol.RepresentationObject.Property.Name          REQ
SimDB.Protocol.RepresentationObject.Property.Description   OPT
SimDB.Protocol.RepresentationObject.Property.PublisherDID  REQ
TDFF.Array.Datatype                                        REQ

Appendix B: Theoretical Data File Format (TDFF)

B.1 TDFF Class Diagram

[Figure: Theoretical Data File Format class diagram]

B.2 Description of TDFF Elements

Class
UTYPE                       UCD1+  Description
TDFF.FileType               ?      Type of file produced by a software protocol

Attributes
UTYPE                       UCD1+  Datatype  Description
TDFF.FileType.Name                 string    Short name
TDFF.FileType.PublisherDID         string    Publisher assigned identifier of the FileType.
TDFF.FileType.Description          text
TDFF.FileType.Mimetype             string    Content-type

Class
UTYPE                       UCD1+  Description
TDFF.File                   ?      File or table containing one or more arrays

Attributes
UTYPE                       UCD1+  Datatype  Description
TDFF.File.Name                     string    File name
TDFF.File.Type                     string    Reference to the PublisherDID of the FileType.
TDFF.File.PublisherDID             string
TDFF.File.Size                     int       Approximate size in KiB
TDFF.File.AccessURL                string    Resolvable URL for retrieving file

Class
UTYPE                       UCD1+  Description
TDFF.Array                  ?      Sequence of data values in a binary array or table column

Attributes
UTYPE                       UCD1+  Datatype  Description
TDFF.Array.Name                    string    Array or column name
TDFF.Array.Datatype                string    Array datatype
TDFF.Array.Property                string    Reference to the PublisherDID of the Property represented by the Array.
TDFF.Array.Rank                    int       Number of axes in the Array.
TDFF.Array.Dims                    int[]     Array of length Rank indicating the number of elements along each axis of the Array.
TDFF.Array.Offset                  int       Number of bytes in the File before the beginning of the Array.
TDFF.Array.Stride                  int       Number of bytes to skip between each element of the Array.
TDFF.Array.SkipByte                int       Claudio, do we need this if we're providing the offset for each array?
TDFF.Array.Endian                  string    The endian-ness of the Array; possible values are "little" or "big".
TDFF.Array.RowMajor                bool      Whether the Array is in row-major (true) or column-major (false) order.
TDFF.Array.InternalPath            string    Internal path of the Array if it is in a self-describing file format, such as FITS or HDF5.
TDFF.Array.LeftEdge                float[]   Array of length Rank of the minimum spatial extent in each dimension.
TDFF.Array.RightEdge               float[]   Array of length Rank of the maximum spatial extent in each dimension.
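
To show how a client might use the Offset, Stride, Endian, Dims, and RowMajor attributes to read an array out of an external binary file, here is a minimal Python sketch based on the standard struct module. Several points are assumptions: Stride is taken to be the number of bytes skipped between the end of one element and the start of the next, the element type is passed as a struct format character because the mapping from TDFF.Array.Datatype values is not defined here, and the function names are hypothetical.

```python
import struct


def read_array(data, offset, count, stride=0, endian="little", code="f"):
    """Read 'count' elements from a bytes object.

    offset -- bytes before the first element (TDFF.Array.Offset)
    stride -- bytes skipped between consecutive elements (assumed meaning
              of TDFF.Array.Stride); 0 means tightly packed
    endian -- "little" or "big" (TDFF.Array.Endian)
    code   -- struct format character for one element, e.g. "i", "f", "d"
    """
    item = struct.Struct(("<" if endian == "little" else ">") + code)
    step = item.size + stride
    return [item.unpack_from(data, offset + i * step)[0] for i in range(count)]


def flat_index(index, dims, row_major=True):
    """Map an N-dimensional index to a position in the flat element list."""
    pairs = list(zip(index, dims))
    if not row_major:
        pairs.reverse()  # column-major: the first axis varies fastest
    flat = 0
    for i, n in pairs:
        flat = flat * n + i
    return flat


# A 4-byte header followed by three big-endian 32-bit integers.
payload = b"HEAD" + struct.pack(">3i", 10, 20, 30)
values = read_array(payload, offset=4, count=3, endian="big", code="i")
```

In this example values is [10, 20, 30]. For a 3 by 4 array, flat_index((1, 2), (3, 4)) gives 6 in row-major order and 7 in column-major order, which is the position to look up in the list returned by read_array.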
