This specification defines a protocol for retrieving data produced by numerical simulations from a variety of data repositories through a uniform interface. The interface is meant to be reasonably simple for service providers to implement. Data are selected by a search procedure. Once the data of interest have been identified, specific quantities can be selected and sub-samples can be extracted and downloaded. Data are returned in a simulation-specific VOTable format, with support for managing external binary files.
This is a Note. The first release of this document was 18 May 2008.
This is an IVOA Note expressing suggestions from and opinions of the authors.
It is intended to share best practices, possible approaches,
or other perspectives on interoperability with the Virtual
Observatory. It should not be referenced or otherwise
interpreted as a standard specification.
A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.
We thank Ugo Becciani, Laurent Bourgès, Patrizia Manzato and Hervé Wozniak for discussions and feedback on the topic.
This specification defines a prototype standard for accessing theoretical data from a variety of astrophysical simulation repositories: the Simulation Data Access Protocol (hereafter SimDAP). In this context, theoretical data are defined as the outcome of different kinds of numerical applications, such as dynamical simulations, semi-analytical models, Monte Carlo simulations, etc.
SimDAP deals with datasets that can always be represented as (large or huge) tables in which rows identify a simulated element (a mesh cell, a particle, a pixel...) and columns represent the associated physical parameters (the 3D spatial coordinates, the velocity, the temperature...). Datasets can represent different timesteps (that is, evolutionary configurations) of the same simulated system. In the rest of the document, we refer to these datasets as snapshots of a numerical application. Snapshots are the data sources. No further assumptions are made about the data. Data are described and can be searched by means of the SimDM theoretical data model (Lemson et al, 2008).
The simplest access mode is the download of a data file in a standard format. In general, however, the data are so large that direct download is infeasible. The SimDAP protocol describes a standard interface to services that let the user reduce the data volume to be moved over the network (e.g. by focusing on a suitable subsample of the data), making its download possible. The protocol also defines the interface to preview services, which let the user choose between different datasets and set the parameters needed to reduce the data volume.
In operation, SimDAP represents a negotiation between the client and the data service, which allows the user to preview data and to select and retrieve specific subsets. The retrieval of the complete dataset can be considered as a degenerate selection of all the volume and all the parameters. However, notice that even this case does not lead to a simple download of the original data file since data are delivered in the standard TVO format described in section X.X.
Services supporting SimDAP provide access to both existing datasets and virtual ones (i.e., datasets generated by the service).
Generating virtual datasets, or "data on demand", is not a simple task, for various reasons. First, simulation data adopt specific units and coordinate systems, which depend on the nature of the problem, the characteristics of the algorithms and their implementation. Furthermore, simulation outputs can be represented by a wide variety of completely different data objects. For example, the output can consist of a set of particles in a given volume, where each particle has its physical position and a set of associated scalar and vector quantities, such as velocity, mass density, temperature, etc. Mesh-based simulations, on the other hand, describe their data as discrete fields defined on a regular or adaptive mesh. The SimDAP protocol aims to provide a uniform description of the selection service that stays simple while covering as many different kinds of simulations and data as possible.
TODO: A general description of the SimDM, and how SimDAP uses it to describe the data.
The Theoretical Data File Format (TDFF) adopts the VOTable standard to describe responses and data coming from a SimDAP service. In all cases, the TDFF table must provide all the information expected for the specific service. In particular, if the TDFF contains the description of a snapshot or, more generally, of an external data file, it must provide all the information required to read, interpret and use that file. This requires an extension of the basic SimDM schema, as sketched in Appendix B.
TODO: Introduce the TDFF, and the need to extend the SimDM to incorporate a low-level description of the data returned from SimDAP services. Also, look at how the TDFF can be applied to S3, in particular using the TDFF to describe a table of results from an S3 service.
See Appendix B for a more detailed description of the TDFF.
One of the primary purposes of SimDAP is to enable access to large simulation datasets by producing derived data "on the fly". In this case, the Experiment corresponds to the operation of the service in producing the results.
TODO: Finish, especially in regards to custom services (e.g., S3), and how the concepts of the data models relate to virtual data.
A SimDAP service provides access to data files directly produced by a numerical simulation or derived from simulations.
A SimDAP service implements multiple service operations, each of which performs some well defined function when invoked by a client application. The service operations described here use HTTP GET and POST as the low level communications protocol. The functionality of each operation is defined independently of the low level communications protocol, and semantically equivalent operations could be implemented via other protocols.
SimDAP defines the following standard service operations:
Experiment. Furthermore, a list of the available properties can be retrieved.
The specific SimDAP service request parameters and responses are detailed in Section 4 and summarized in Appendix A. The current section only summarizes the elements of the basic service interface.
A request URL is formed by concatenating a baseURL with zero or more operation-defined request parameters. The baseURL defines the network address to which request messages are to be sent for a particular operation of a particular service instance on a particular server. Service operations generally share the same baseURL but this is not required.
Example:
http://example.org/simdap/sync?REQUEST=LISTEXPERIMENTS
SimDAP defines two versions of the baseURL, one for synchronous operations and another for asynchronous operations. These are formed by concatenating the service baseURL with either /sync? or /async?. Hence for synchronous operations the full baseURL is
http://example.org/simdap/sync?
and for asynchronous operations the full baseURL is
http://example.org/simdap/async?
In general the service operation is much the same whether it executes synchronously or asynchronously. Minor differences in service operation function or input parameters are pointed out in the description of each individual service operation below.
TODO: Tie this in with UWS for asynchronous operations.
Note that since a URI pathname segment is appended to the service baseURL the service baseURL may not contain any HTTP GET parameters, and must be a fixed URI.
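As an illustration of the URL rules above, the following minimal Python sketch assembles a request URL from a service baseURL and a set of parameters. The function name `build_request_url` and its arguments are hypothetical and not part of the protocol. Rejecting a service baseURL that carries a query string is one possible way to enforce the fixed-URI constraint.

```python
from urllib.parse import urlencode, urlsplit


def build_request_url(service_base_url, params, asynchronous=False):
    """Return a SimDAP request URL for the given operation parameters."""
    parts = urlsplit(service_base_url)
    # The service baseURL must be a fixed URI without GET parameters,
    # because a path segment (/sync? or /async?) is appended to it.
    if parts.query or parts.fragment:
        raise ValueError("service baseURL must not contain a query string")
    mode = "async" if asynchronous else "sync"
    base_url = service_base_url.rstrip("/") + "/" + mode + "?"
    # safe="," keeps comma-separated array values readable in the URL.
    return base_url + urlencode(params, safe=",")


url = build_request_url("http://example.org/simdap",
                        {"REQUEST": "LISTEXPERIMENTS"})
print(url)
```

Passing `asynchronous=True` yields the `/async?` form of the same request.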
Parameters may appear in any order. If the same parameter appears multiple times in a request the operation is undefined (if alternate values for a parameter are desired the range-list syntax may be used instead). Parameter names are case-insensitive. Parameter values are case-sensitive unless defined otherwise in the description of an individual parameter.
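A service might apply these parameter rules roughly as in the sketch below. The function name `parse_parameters` is hypothetical. Rejecting a repeated parameter is only one possible reading of "undefined" behaviour, chosen here so that the problem is visible to the client. The normalisation of REQUEST and VERSION values relies on those two values being case-insensitive.

```python
from urllib.parse import parse_qsl


def parse_parameters(query_string):
    """Parse a query string into a dict keyed by upper-cased parameter name."""
    params = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        key = name.upper()  # parameter names are case-insensitive
        if key in params:
            # The specification leaves repeated parameters undefined;
            # rejecting them is this sketch's choice, not a requirement.
            raise ValueError("parameter given more than once: " + key)
        params[key] = value  # values stay case-sensitive by default
    # REQUEST and VERSION values are case-insensitive, so normalise them.
    for key in ("REQUEST", "VERSION"):
        if key in params:
            params[key] = params[key].upper()
    return params


print(parse_parameters("request=Cutout&Experiment=clrc00"))
```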
All service operations define the following standard parameters, which are part of the basic service profile:
The REQUEST parameter specifies the service operation to be executed. VERSION allows a specific version of the interface to be requested. The values of both the REQUEST and VERSION parameters are case-insensitive.
A given service instance may support multiple versions of the SimDAP interface, and by default the service assumes the highest standard version which is implemented (access to any experimental versions supported by a service requires explicit specification of the version by the client). Explicit specification of the interface version assumed by the client is necessary to ensure against a runtime version mismatch, e.g., if the client caches the service endpoint but a newer version of the service is subsequently deployed. If desired the client can omit the VERSION parameter to disable runtime version checking, and default to the highest version standard interface implemented by the service.
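The version selection described above could be sketched as follows. The function name `select_version` and the dotted numeric version strings are assumptions made for illustration, since the text does not fix a version string format.

```python
def select_version(requested, standard_versions, experimental_versions=()):
    """Pick the interface version a service should use for one request."""
    def as_tuple(version):
        # Assumes dotted numeric version strings such as "1.0".
        return tuple(int(part) for part in version.split("."))

    if requested is None:
        # No VERSION parameter: default to the highest standard version.
        return max(standard_versions, key=as_tuple)
    # Experimental versions are reachable only when named explicitly.
    if requested in standard_versions or requested in experimental_versions:
        return requested
    raise ValueError("unsupported interface version: " + requested)


print(select_version(None, ["0.9", "1.0", "1.10"], ["2.0"]))
```

Comparing versions as integer tuples rather than strings keeps "1.10" above "1.9".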
All other request parameters are defined separately for each operation.
Integer numbers are represented as defined in the specification of integers in XML Schema Datatypes. Real numbers are represented as specified for double precision numbers in XML Schema Datatypes. Sexagesimal formatting is not permitted, either for parameter input or in formal output metadata, other than in ISO 8601 formatted time strings (sexagesimal format is permitted in any informal output intended for a human, e.g., text or HTML formatted tables).
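A possible way to check numeric parameter values against these rules is sketched below. The regular expressions follow the lexical forms of integer and double in XML Schema Datatypes as we understand them. The function names are hypothetical. A pattern check is used instead of calling `int()` or `float()` directly because Python accepts forms, such as digit groups separated by underscores, that XML Schema does not.

```python
import re

# Lexical forms modelled on XML Schema integer and double.
INTEGER = re.compile(r"[+-]?[0-9]+")
DOUBLE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")
SPECIAL = {"INF": float("inf"), "-INF": float("-inf"), "NaN": float("nan")}


def parse_integer(text):
    """Return an int, or raise ValueError if text is not a valid integer."""
    if not INTEGER.fullmatch(text):
        raise ValueError("not an integer: " + repr(text))
    return int(text)


def parse_real(text):
    """Return a float, or raise ValueError (sexagesimal is rejected too)."""
    if text in SPECIAL:
        return SPECIAL[text]
    if not DOUBLE.fullmatch(text):
        raise ValueError("not a real number: " + repr(text))
    return float(text)


print(parse_integer("-42"), parse_real("1E-7"))
```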
SimDAP defines a special range-list format for specifying numerical ranges or lists of ranges as parameter values. For example, 1E-7/3E-6 specifies a closed range from 1E-7 to 3E-6 inclusive. The syntax supports both open and closed ranges. Ranges or range lists are permitted only when explicitly indicated in the definition of an individual parameter. A variant of the range list is the value of the WHERE parameter, used to specify the query constraint for a ParamQuery operation. For a full description of range list syntax refer to section ?
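The sketch below parses a single range only. The closed form `lo/hi` is taken from the example above. Writing open ranges as `lo/` and `/hi` is an assumption, since the full syntax is defined elsewhere. Lists of ranges are deliberately not handled, because the list separator is not stated here. The function names are hypothetical.

```python
def parse_range(text):
    """Parse 'lo/hi', 'lo/', '/hi' or a single value into (lo, hi).

    None marks an open end; the open-range spelling is an assumption.
    """
    if "/" not in text:
        value = float(text)
        return (value, value)
    low, _, high = text.partition("/")
    if "/" in high or (not low and not high):
        raise ValueError("malformed range: " + repr(text))
    lo = float(low) if low else None
    hi = float(high) if high else None
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("range bounds are reversed: " + repr(text))
    return (lo, hi)


def in_range(value, bounds):
    """Return True if value lies within the (inclusive) bounds."""
    lo, hi = bounds
    return (lo is None or value >= lo) and (hi is None or value <= hi)


bounds = parse_range("1E-7/3E-6")
print(bounds, in_range(3e-6, bounds))
```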
Repeated values in an array are specified using a single comma-separated list, in order to preserve the order of the elements when specifying spatial dimensions.
$/sync?REQUEST=CUTOUT&EXPERIMENT=clrc00&SNAPSHOT=clrc00_0010&LEFTEDGE=0.5,0.6,0.2&RIGHTEDGE=0.7,0.8,0.4
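A service could decode the comma-separated LEFTEDGE and RIGHTEDGE values from the request above along the following lines. The function names are hypothetical. The checks that both edges have the same length and that each left edge does not exceed the right edge are assumptions, not stated requirements.

```python
def parse_array(text):
    """Split a comma-separated parameter value into an ordered list of floats."""
    return [float(item) for item in text.split(",")]


def parse_cutout_box(params):
    """Return (left, right) edge lists from already-parsed request parameters."""
    left = parse_array(params["LEFTEDGE"])
    right = parse_array(params["RIGHTEDGE"])
    if len(left) != len(right):
        raise ValueError("LEFTEDGE and RIGHTEDGE differ in length")
    for lo, hi in zip(left, right):
        if lo > hi:
            raise ValueError("left edge exceeds right edge")
    return left, right


left, right = parse_cutout_box(
    {"LEFTEDGE": "0.5,0.6,0.2", "RIGHTEDGE": "0.7,0.8,0.4"}
)
print(left, right)
```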
TODO: Find a reference for the ranges based on the new TAP parameter query (PQL). Describe the array-like requests in more detail. We need this for cutouts and previews, and possibly for passing parameters to a custom or S3 operation. Clients may also use this to request data from several experiments, snapshots or properties.
Describe in more detail. GET is needed for the basic operations, but custom operations may require a POST, if there's some input data. Also, UWS utilizes POST.
How to return errors? Should we return standard HTTP errors, like Internal Error, or Bad Request?
TODO: Add some examples. More importantly, add the examples in the section for each query response.
The basic format of a response from a SimDAP service is a VOTable XML document, containing a nested hierarchy of RESOURCE elements.
<RESOURCE utype="SimDB.Experiment"> ...Experiment metadata...
    <RESOURCE utype="SimDB.Snapshot"> ...Snapshot metadata...
        <RESOURCE utype="TDFF.File"> ...File metadata, access reference...
            <TABLE utype="TDFF.Array"> ...Table of arrays metadata...
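A client could walk this nested hierarchy as in the sketch below. The sample document is hypothetical and heavily simplified: it carries no XML namespace and only one PARAM. A real VOTable normally declares a namespace, in which case the tag comparison would need to include it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified response; real responses carry more metadata.
SAMPLE = """<VOTABLE>
  <RESOURCE utype="SimDB.Experiment">
    <PARAM name="Name" utype="SimDB.Experiment.Name"
           datatype="char" arraysize="*" value="clrc00"/>
    <RESOURCE utype="SimDB.Snapshot">
      <RESOURCE utype="TDFF.File">
        <TABLE utype="TDFF.Array"/>
      </RESOURCE>
    </RESOURCE>
  </RESOURCE>
</VOTABLE>"""


def outline(element, depth=0):
    """Yield (depth, tag, utype) for each nested RESOURCE or TABLE."""
    for child in element:
        if child.tag in ("RESOURCE", "TABLE"):
            yield (depth, child.tag, child.get("utype"))
            yield from outline(child, depth + 1)


root = ET.fromstring(SAMPLE)
for depth, tag, utype in outline(root):
    print("  " * depth + tag + " " + str(utype))
```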
The response to a ListExperiments request is a VOTable containing a series of RESOURCE elements, where each RESOURCE contains the metadata for a single Experiment. Individual attributes of the Experiment (taken from the SimDM) are listed as PARAM or LINK elements in the RESOURCE. Attributes that are collections, ParameterSetting for example, are listed as TABLEs in the RESOURCE.
The required and optional attributes are listed in Section A.3.1 of the Appendix. This list has been deliberately kept to a minimum, since not all data providers will have a complete database with all of the classes from the SimDM. Instead, they can use a LINK element for the ReferenceURL attribute to point the client to a richer description of the simulation. Ideally, this would point to an XML instance document describing the Experiment based on the XML Schema from the SimDM. (SimDM or SimDB?)
Similarly, the required attributes are the same for all service operations. It is assumed that a client performing a ListExperiments query is exploring the Experiments, and would like more metadata. However, when performing a QueryData request, the client may already have
TODO: Should the service allow continuation tokens for long responses?
Per VOSI standard.
TODO: Can we use a GetCapabilities request to describe custom services, such as S3? Also, we'll need to define the XML Schema for the resource registration.
None.
TODO: Finish.
Name | UTYPE | Required? |
---|---|---|
EXPERIMENT | SimDB.Experiment.PublisherDID | OPT |
The basic parameter is the PublisherDID of the Experiment, but the query may be restricted to one or more Snapshots, and one or more Properties. If no Snapshots are requested, all of the files available from the Experiment are listed. Likewise, if no Properties are requested, all of the available ones are listed.
Name | UTYPE | Required? |
---|---|---|
EXPERIMENT | SimDB.Experiment.PublisherDID | REQ |
SNAPSHOT | SimDB.Snapshot.PublisherDID | OPT |
PROPERTY | SimDB.RepresentationObject.Property | OPT |
This operation refers to a single snapshot. Cutouts spanning multiple sources, such as several time steps of the same simulation, are not supported by the protocol. Their implementation is left to the client, for example as a sequence of requests with the same sub-box and fields but different datasets.
Name | UTYPE | Required? |
---|---|---|
EXPERIMENT | SimDB.Experiment.PublisherDID | REQ |
SNAPSHOT | SimDB.Snapshot.PublisherDID | REQ |
PROPERTY | SimDB.RepresentationObject.Property | OPT |
LEFTEDGE | TDFF.Array.LeftEdge | OPT |
RIGHTEDGE | TDFF.Array.RightEdge | OPT |
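Since multi-snapshot cutouts are left to the client, a client might simply generate one Cutout request per snapshot with the same box, as in this sketch. The function name `cutout_urls` and the second snapshot identifier are hypothetical. The parameter names are those in the table above.

```python
from urllib.parse import urlencode


def cutout_urls(base_url, experiment, snapshots, left_edge, right_edge):
    """Return one Cutout request URL per snapshot, all with the same box."""
    urls = []
    for snapshot in snapshots:
        params = {
            "REQUEST": "CUTOUT",
            "EXPERIMENT": experiment,
            "SNAPSHOT": snapshot,
            "LEFTEDGE": ",".join(str(x) for x in left_edge),
            "RIGHTEDGE": ",".join(str(x) for x in right_edge),
        }
        # safe="," keeps the array values as plain comma-separated lists.
        urls.append(base_url + urlencode(params, safe=","))
    return urls


urls = cutout_urls(
    "http://example.org/simdap/sync?",
    "clrc00",
    ["clrc00_0010", "clrc00_0011"],  # second identifier is made up
    [0.5, 0.6, 0.2],
    [0.7, 0.8, 0.4],
)
for url in urls:
    print(url)
```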
The preview can be implemented in different ways, depending on the specific data involved. In all cases, if the service is supported, a getPreview method MUST be implemented. The input of this method is the basic pair EXPERIMENT and SNAPSHOT. The PROPERTY parameter may be used to specify which fields to preview (if supported; otherwise it is discarded). A missing or blank PROPERTY parameter is interpreted as a request to preview all available fields. If PROPERTY requests unavailable quantities, the corresponding request is discarded. If the cutout service is available, the preview service MUST provide the means to select the fields of interest and the cutout region.
Name | UTYPE | Required? |
---|---|---|
EXPERIMENT | SimDB.Experiment.PublisherDID | REQ |
SNAPSHOT | SimDB.Snapshot.PublisherDID | OPT |
PROPERTY | SimDB.RepresentationObject.Property | OPT |
Custom services must define their own input parameters and responses.
Parameters by service operation:
Parameter name | Parameter UTYPE | ListExperiments | ListSnapshots | QueryData | Preview | Cutout |
---|---|---|---|---|---|---|
EXPERIMENT | SimDB.Experiment.PublisherDID | N/A | OPT | REQ | REQ | REQ |
SNAPSHOT | SimDB.Snapshot.PublisherDID | N/A | N/A | OPT | OPT | REQ |
PROPERTY | SimDB.RepresentationObject.Property | N/A | N/A | OPT | OPT | OPT |
LEFTEDGE | TDFF.Array.LeftEdge | N/A | N/A | N/A | N/A | OPT |
RIGHTEDGE | TDFF.Array.RightEdge | N/A | N/A | N/A | N/A | OPT |
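The parameter table above can be turned into a simple request check, sketched below. The REQ and OPT entries are copied from the table. The upper-case REQUEST values other than LISTEXPERIMENTS and CUTOUT are assumed spellings. Reporting N/A parameters as problems is a design choice of this sketch. A service could equally ignore them.

```python
# REQ/OPT entries per operation; parameters marked N/A are left out.
REQUIREMENTS = {
    "LISTEXPERIMENTS": {},
    "LISTSNAPSHOTS": {"EXPERIMENT": "OPT"},
    "QUERYDATA": {"EXPERIMENT": "REQ", "SNAPSHOT": "OPT", "PROPERTY": "OPT"},
    "PREVIEW": {"EXPERIMENT": "REQ", "SNAPSHOT": "OPT", "PROPERTY": "OPT"},
    "CUTOUT": {"EXPERIMENT": "REQ", "SNAPSHOT": "REQ", "PROPERTY": "OPT",
               "LEFTEDGE": "OPT", "RIGHTEDGE": "OPT"},
}


def check_request(params):
    """Return a list of problems in a parsed request (empty if none)."""
    operation = params.get("REQUEST", "").upper()
    if operation not in REQUIREMENTS:
        return ["unknown or missing REQUEST: " + repr(operation)]
    allowed = REQUIREMENTS[operation]
    problems = []
    for name, level in allowed.items():
        if level == "REQ" and name not in params:
            problems.append("missing required parameter " + name)
    for name in params:
        if name not in allowed and name not in ("REQUEST", "VERSION"):
            problems.append("parameter not applicable: " + name)
    return problems


print(check_request({"REQUEST": "CUTOUT", "EXPERIMENT": "clrc00"}))
```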
Tables are used to represent collections from the data model. In many cases, these tables are optional. In this case, the required fields (columns) of the table only apply if the service chooses to return that table. This way, the client can be assured of a minimal set of metadata if the table is returned.
Resources by service operation:
Resource name | Resource UTYPE | ListExperiments | ListSnapshots | QueryData | Preview | Cutout |
---|---|---|---|---|---|---|
EXPERIMENT | SimDB.Experiment | REQ | REQ | REQ | REQ | REQ |
SNAPSHOT | SimDB.Snapshot | OPT | REQ | REQ | REQ | REQ |
FILE | TDFF.File | OPT | OPT | REQ | REQ | REQ |
The Experiment, Simulation, and PostProcessing classes from the SimDB have more attributes than are listed here. In principle, all of these attributes can be returned by a SimDAP service, in addition to appropriately related elements from the other classes, namely Protocol and its subclasses. The attributes and collections given here are the ones most important for describing the data.
UTYPE | VOT Element | Required? |
---|---|---|
SimDB.Experiment.Name | PARAM | REQ |
SimDB.Experiment.Created | PARAM | OPT |
SimDB.Experiment.Description | PARAM | OPT |
SimDB.Experiment.Status | PARAM | OPT |
SimDB.Experiment.Updated | PARAM | OPT |
SimDB.Experiment.ReferenceURL | LINK | REQ |
SimDB.Protocol.Name | PARAM | REQ |
SimDB.Protocol.PublisherDID | PARAM | REQ |
SimDB.Protocol.ReferenceURL | LINK | REQ |
SimDB.Protocol.Version | PARAM | OPT |
SimDB.Experiment.GenericParameterSetting | TABLE | OPT |
SimDB.Experiment.NumericParameterSetting | TABLE | OPT |
SimDB.Experiment.InputDataset | TABLE | OPT |
SimDB.Experiment.ExperimentRepresentationObject | TABLE | OPT |
Column | Required? |
---|---|
SimDB.Protocol.InputParameter.Name | REQ |
SimDB.Protocol.InputParameter.Description | OPT |
SimDB.Protocol.InputParameter.Datatype | REQ |
SimDB.Experiment.GenericParameterSetting.Value | REQ |
Column | Required? |
---|---|
SimDB.Protocol.InputParameter.Name | REQ |
SimDB.Protocol.InputParameter.Description | OPT |
SimDB.Protocol.InputParameter.Datatype | REQ |
SimDB.Experiment.NumericParameterSetting.Value.Value | REQ |
SimDB.Experiment.NumericParameterSetting.Value.Unit | REQ |
Column | Required? |
---|---|
SimDB.Experiment.Name | REQ |
SimDB.Experiment.PublisherDID | REQ |
SimDB.Experiment.ReferenceURL | REQ |
SimDB.Snapshot.PublisherDID | REQ |
Column | Required? |
---|---|
SimDB.Protocol.RepresentationObjectType.Name | REQ |
SimDB.Protocol.RepresentationObjectType.Description | OPT |
SimDB.Protocol.RepresentationObjectType.Label | OPT |
SimDB.Protocol.RepresentationObjectType.Type | REQ |
UTYPE | VOT Element | Required? |
---|---|---|
SimDB.Experiment.Snapshot.PublisherDID | PARAM | REQ |
UTYPE | VOT Element | Required? |
---|---|---|
TDFF.File.PublisherDID | PARAM | REQ |
TDFF.File.AccessURL | LINK | REQ |
Protocol.FileType.PublisherDID | PARAM | REQ |
Protocol.FileType.Mimetype | PARAM | REQ |
TDFF.Array | TABLE | REQ |
Column | Required? |
---|---|
TDFF.Array.Name | REQ |
SimDB.Protocol.RepresentationObject.Name | REQ |
SimDB.Protocol.RepresentationObject.Description | OPT |
SimDB.Protocol.RepresentationObject.PublisherDID | REQ |
SimDB.Protocol.RepresentationObject.Property.Name | REQ |
SimDB.Protocol.RepresentationObject.Property.Description | OPT |
SimDB.Protocol.RepresentationObject.Property.PublisherDID | REQ |
TDFF.Array.Datatype | REQ |
Class:
UTYPE | UCD1+ | Description |
---|---|---|
TDFF.FileType | ? | Type of file produced by a software protocol |
Attributes:
UTYPE | UCD1+ | Datatype | Description |
---|---|---|---|
TDFF.FileType.Name | | string | Short name |
TDFF.FileType.PublisherDID | | string | Publisher assigned identifier of the FileType. |
TDFF.FileType.Description | | text | |
TDFF.FileType.Mimetype | | string | Content-type |
Class:
UTYPE | UCD1+ | Description |
---|---|---|
TDFF.File | ? | File or table containing one or more arrays |
Attributes:
UTYPE | UCD1+ | Datatype | Description |
---|---|---|---|
TDFF.File.Name | | string | File name |
TDFF.File.Type | | string | Reference to the PublisherDID of the FileType. |
TDFF.File.PublisherDID | | string | |
TDFF.File.Size | | int | Approximate size in KiB |
TDFF.File.AccessURL | | string | Resolvable URL for retrieving file |
Class:
UTYPE | UCD1+ | Description |
---|---|---|
TDFF.Array | ? | Sequence of data values in a binary array or table column |
Attributes:
UTYPE | UCD1+ | Datatype | Description |
---|---|---|---|
TDFF.Array.Name | | string | Array or column name |
TDFF.Array.Datatype | | string | Datatype of the Array elements |
TDFF.Array.Property | | string | Reference to the PublisherDID of the Property represented by the Array. |
TDFF.Array.Rank | | int | Number of axes in the Array. |
TDFF.Array.Dims | | int[] | Array of length Rank indicating the number of elements along each axis of the Array. |
TDFF.Array.Offset | | int | Number of bytes in the File before the beginning of the Array. |
TDFF.Array.Stride | | int | Number of bytes to skip between each element of the Array. |
TDFF.Array.SkipByte | | int | Claudio, do we need this if we're providing the offset for each array? |
TDFF.Array.Endian | | string | The endianness of the Array; possible values are "little" or "big". |
TDFF.Array.RowMajor | | bool | True if the Array is in row-major order, false if it is in column-major order. |
TDFF.Array.InternalPath | | string | Internal path of the Array if it is stored in a self-describing file format, such as FITS or HDF5. |
TDFF.Array.LeftEdge | | float[] | Array of length Rank giving the minimum spatial extent in each dimension. |
TDFF.Array.RightEdge | | float[] | Array of length Rank giving the maximum spatial extent in each dimension. |
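To show how the TDFF.Array attributes might be used together, the sketch below decodes values from a raw binary buffer using Offset, Stride, Endian, Dims and RowMajor. Several points are assumptions: Stride is read as the number of gap bytes between consecutive elements, the element type is passed as a Python `struct` format character because the mapping from TDFF.Array.Datatype strings is not defined here, and the function names are hypothetical.

```python
import struct


def read_array(data, fmt_char, count, offset=0, stride=0, endian="little"):
    """Decode `count` values from a bytes object using TDFF-style metadata."""
    prefix = "<" if endian == "little" else ">"
    item = struct.Struct(prefix + fmt_char)
    # Assumption: Stride counts the bytes skipped between elements.
    step = item.size + stride
    return [item.unpack_from(data, offset + i * step)[0] for i in range(count)]


def element_index(indices, dims, row_major=True):
    """Return the flat position of a multi-dimensional element."""
    axes = range(len(dims)) if row_major else reversed(range(len(dims)))
    flat = 0
    for axis in axes:
        flat = flat * dims[axis] + indices[axis]
    return flat


# A 4-byte header followed by six big-endian 32-bit floats, Dims = [2, 3].
data = b"HEAD" + struct.pack(">6f", 0, 1, 2, 3, 4, 5)
values = read_array(data, "f", 6, offset=4, endian="big")
print(values[element_index((1, 0), (2, 3))])
```

In row-major order the last axis varies fastest, so element (1, 0) of a 2 by 3 array sits at flat position 3. In column-major order the same element sits at position 1.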