Contents of /trunk/projects/theory/snap/SimDAP.html

Revision 415 - (show annotations)
Thu May 8 10:59:35 2008 UTC (12 years, 7 months ago) by claudio.gheller
File MIME type: text/html
File size: 40610 byte(s)
further changes on sections 1-3. Rewritten section 4
1 <?xml version="1.0" encoding="utf-8"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html>
4 <head>
5 <title>Simulation Data Access Protocol - Internal Draft</title>
6 <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
7 <meta name="keywords" content="IVOA, International, Virtual, Observatory, Alliance" />
8 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
9 <meta name="maintainedBy" content="IVOA Document Coordinator, ivoadoc@ivoa.net" />
10 <link rel="stylesheet" href="http://ivoa.net/misc/ivoa_wg.css" type="text/css" />
11 </head>
12
13 <body>
14 <div class="head">
15 <a href="http://www.ivoa.net/"><img alt="IVOA" src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" width="300" height="169"/></a>
16 <h1>Simulation Data Access Protocol (SimDAP)<br/>
17 Draft</h1>
18 <h2>IVOA Note 20 April 2008</h2>
19
20 <dl>
21 <dt>This version:</dt>
22 <dd><a href="http://www.ivoa.net/Documents/...">
23 http://www.ivoa.net/Documents/...</a></dd>
24
25 <dt>Latest version:</dt>
26
27 <dd><a href="http://www.ivoa.net/Documents/latest/...">
28 http://www.ivoa.net/Documents/latest/...</a></dd>
29
30 <dt>Previous versions:</dt>
31 <dd><a href="http://www.ivoa.net/Documents/...">
32 http://www.ivoa.net/Documents/...</a></dd>
33 <dd><a href="http://www.ivoa.net/Documents/...">
34 http://www.ivoa.net/Documents/...</a></dd>
35 </dl>
36
37 <dl>
38 <dt>Interest Group:</dt>
39 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory">http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory</a></dd>
40 <dt>Author(s):</dt>
41 <dd>
42 <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/ClaudioGheller">Claudio Gheller</a><br/>
43 <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/GerardLemson">Gerard Lemson</a><br/>
44 </dd>
45 </dl>
46 <hr/></div>
47
48 <h2><a name="abstract" id="abstract">Abstract</a></h2>
49 <p>This specification defines a protocol for retrieving data produced by numerical simulations from a variety of data repositories through a uniform interface. The interface is meant to be reasonably simple for service providers to implement. Data are selected through a search procedure. Once the data of interest are identified, specific quantities can be selected and sub-samples can be extracted and downloaded. Data are returned in a simulation-specific VOTable format, with support for external binary files.</p>
50
51 <div class="status">
52 <h2><a name="status" id="status">Status of this Document</a></h2>
53 This is a Note. The first release of this document was 18 May 2008.
54 <p></p><br />
55
56 <!-- Choose one of the following (and remove the rest)-->
57 <!--Note-->
58
59 <p>This is an IVOA Note expressing suggestions from and opinions of the authors.<br/>
60 It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory.
61 It should not be referenced or otherwise interpreted as a standard specification.</p>
62
63 A list of <a href="http://www.ivoa.net/Documents/">current IVOA Recommendations and other technical documents</a> can be found at http://www.ivoa.net/Documents/.
64
65 </div><br />
66
67 <h2><a name="acknowledgments" id="acknowledgments">Acknowledgments</a></h2>
68
69
70 <h2><a id="contents" name="contents">Contents</a></h2>
71 <div class="head">
72 <ul class="toc">
73 <li><a href="#abstract">Abstract</a></li>
74 <li><a href="#status">Status</a></li>
75 <li><a href="#acknowledgments">Acknowledgments</a></li>
76 <li><a href="#contents">Contents</a></li>
77 <li><a href="#sec1">1 Introduction</a></li>
78 <li><a href="#sec2">2 Requirements for Compliance</a></li>
79 <li><a href="#sec3">3 Simulation Selection and Units</a></li>
80 <ul class="toc">
81 <li><a href="#sec3_1">3.1 Simulation Data Model</a></li>
82 </ul>
83 <li><a href="#sec4">4 Subset Selection</a></li>
84 <li><a href="#sec5">5 SimDAP Request</a></li>
85 <ul class="toc">
86 <li><a href="#sec5_1">5.1 setSimDAP Input</a></li>
87 <ul class="toc">
88 <li><a href="#sec5_1_1">5.1.1 Region of Interest</a></li>
89 <li><a href="#sec5_1_2">5.1.2 Fields of Interest</a></li>
90 <li><a href="#sec5_1_3">5.1.3 Data Sources</a></li>
91 <li><a href="#sec5_1_4">5.1.4 File Format</a></li>
92 <li><a href="#sec5_1_5">5.1.5 Service Defined Parameters</a></li>
93 </ul>
94 <li><a href="#sec5_2">5.2 setSimDAP Output</a></li>
95 </ul>
96 <li><a href="#sec6">6 Data Staging</a></li>
97 <li><a href="#sec7">7 Data Delivery</a></li>
98 <li><a href="#sec8">8 Service Registration</a></li>
99 <br/>
100 <li><a href="#appA">Appendix A: VOTable Examples</a></li>
101 <ul class="toc">
102 <li><a href="#appA_1">A.1 VOTable for the velocity field of a fluid on a fixed 3D mesh</a></li>
103 <li><a href="#appA_2">A.2 VOTable for the velocity and position fields of particles from an N-Body simulation</a></li>
104 <li><a href="#appA_3">A.3. VOTable for the temperature field of a mesh based quantity and the position of N-Body particles extracted from the same spatial region.</a></li>
105 </ul>
106 <li><a href="#appB">Appendix B: Binary File Formats</a></li>
107 <br/>
108 <li><a href="#references">References</a></li>
109 </ul>
110 </div>
111 <hr/>
112
113 <br/>
114 <h2><a name="sec1">1 Introduction</a></h2>
115 <p>This specification defines a prototype standard for retrieving theoretical data from a variety of astrophysical simulation repositories: the Simulation Data Access Protocol (hereafter SimDAP). In this context <i>Theoretical Data</i> is defined as the outcome of different kinds of numerical applications, such as dynamical simulations, semi-analytical models, Monte Carlo simulations, etc.</p>
116 <p>The standard is intended to define a basic theoretical data service, represented by the selection and retrieval of a set of data, according to a few specific requirements. Such a service is particularly relevant since:</p>
117 <ul>
118 <li>Many (if not most) data processing applications rely on it to access data</li>
119 <li>It strongly reduces the data volume to be moved across the network, making possible downloads that would otherwise be difficult, if not impossible</li>
120 </ul>
121 <p>The datasets we deal with can always be represented as (large or huge) tables in which rows identify a simulated element (a mesh cell, a particle, a pixel...) and columns represent the associated physical parameters (the 3D spatial coordinates, the velocity, the temperature...). Datasets can represent different timesteps (that is, evolutionary configurations) of the same simulated system. In the rest of the document, we refer to such a dataset as a <i>snapshot</i> of a numerical application. Snapshots are our data sources. No further assumptions are made about the data. </p>
122 <p>From the definition of snapshot, we can immediately specify a <i>simple</i> access protocol, which we define as a <i>rectangular cutout</i> of the parameter space. This consists of extracting all the simulated elements for which some parameters have values in a given range. Notice that no assumption is made on the dimensionality of the problem (selection can be done on any number of parameters) or on the nature of the parameters (there are no restrictions on the parameters adopted in the selection operation). However, it is convenient to treat as a favoured case a 3D geometric selection, in which data are extracted according to their position in 3D space. In practice, spatial coordinates are adopted as parameters for the query. This case is particularly simple and intuitive, and it is common to a large number of applications. Therefore, it is developed in detail in the next sections. However, everything we propose can be extended to a generic number of parameters. In the rest of the document we will refer to:</p>
123 <ul>
124 <li>position, as an N-tuple of selection parameters which defines a position in the phase space (e.g. the center of the 3D geometric box)</li>
125 <li>size, as the extension of the selected region (dependent on the position specification) in the N-dimensional phase space (e.g. the sides of the 3D geometric box)</li>
126 </ul>
127 <p><img src="extraction.JPG" width="582" height="246" border="0"/></p>
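<p>The rectangular cutout described above can be sketched in a few lines. This is only an illustration under stated assumptions: the snapshot is a small in-memory table, <code>size</code> is taken to be the full side length of the box along each parameter, and the names <code>snapshot</code> and <code>rectangular_cutout</code> are hypothetical and not part of SimDAP.</p>

```python
# Hypothetical in-memory snapshot: one row per simulated element,
# one column per physical parameter.
snapshot = [
    {"x": 0.10, "y": 0.20, "z": 0.30, "temperature": 1.0e4},
    {"x": 0.45, "y": 0.55, "z": 0.50, "temperature": 2.0e6},
    {"x": 0.52, "y": 0.48, "z": 0.61, "temperature": 5.0e5},
    {"x": 0.90, "y": 0.10, "z": 0.75, "temperature": 3.0e3},
]


def rectangular_cutout(rows, position, size):
    """Return the rows whose selection parameters fall inside the box.

    position maps a parameter name to the box center; size maps the same
    name to the full extent of the box along that parameter (assumption).
    """
    selected = []
    for row in rows:
        inside = all(
            abs(row[name] - center) <= size[name] / 2.0
            for name, center in position.items()
        )
        if inside:
            selected.append(row)
    return selected


# A 3D geometric selection: spatial coordinates are the query parameters.
center = {"x": 0.5, "y": 0.5, "z": 0.5}
extent = {"x": 0.2, "y": 0.2, "z": 0.3}
cutout = rectangular_cutout(snapshot, center, extent)
```

<p>Because the selection loops over whatever keys appear in <code>position</code>, the same function covers a cutout on any number of parameters, not only the three spatial coordinates.</p>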
128 <p>The SimDAP protocol is designed primarily as a &quot;data on demand&quot; service, with the dataset created on the fly by the service given the position and size of the desired output dataset as specified by the client. This is not a simple task for various reasons. First, simulation data adopt specific units and coordinate systems, which depend on the nature of the problem, the characteristics of the algorithms and their implementation. Furthermore, simulation outputs can be represented by a wide variety of completely different data objects. For example, the output can consist of a set of particles in a given volume, where each particle has its physical position and a set of associated scalar and vector quantities, like velocity, mass density, temperature etc. On the other hand, mesh based simulations describe their data as discrete fields defined on a regular or adaptive mesh. The SimDAP protocol has the goal of providing a uniform description of the selection service, trying to keep it simple and, at the same time, to include as many different kinds of simulations and data as possible.</p>
129 <p>In operation, SimDAP represents a negotiation between the client and the data service, which allows the client to select and retrieve a specific subset of the available data. The retrieval of the complete dataset can be considered as a <i>degenerate</i> selection of all the volume and all the parameters. However, notice that even this case does not lead to a simple download of the original data file, since data are delivered in the standard TVO format described in section X.X.</p>
130 <p>
131 In summary, we can identify five main stages for the SimDAP service.</p>
132
133 <dl>
134 <dt>Selection of simulations and data</dt>
135 <dd>(Section 3) According to the results of a simulation discovery procedure (not part of the SimDAP protocol) select potentially interesting simulations.</dd>
136 <dt>Identification of subset of interest</dt>
137 <dd>(Section 4) The user identifies a subset of the full simulation data which is of interest.</dd>
138 <dt>SimDAP request</dt>
139 <dd>(Section 5) Send to the server the selection parameters for the SimDAP operation.</dd>
140 <dt>Data staging and delivery</dt>
141 <dd>(Sections 6 and 7) Metadata are delivered to the client as a VOTable or a more general XML file. Data are staged and delivered (possibly after some time, needed for extraction) via HTTP, FTP etc. as binary files + XML descriptors.
142 Delivery of the VOTable and of the binary data files can take place in two separate stages.</dd>
143 <dt>Service registration</dt>
144 <dd>(Section 8) SimDAP services need to be published in an available registry. Registry queries must be performed according to the SimDB data model.</dd>
145 <dd/>
146 </dl>
147 <h2><br/></h2>
148 <h2><a name="sec2">2 Requirements for Compliance</a></h2>
149 <p>The keywords "MUST", "REQUIRED", "SHOULD", and "MAY" as used in this document are to be interpreted as described in RFC 2119 [34]. An implementation is compliant if it satisfies all the MUST or REQUIRED level requirements for the protocols it implements. An implementation that satisfies all the MUST or REQUIRED level and all the SHOULD level requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST level requirements but not all the SHOULD level requirements for its protocols is said to be "conditionally compliant".</p>
150
151 <p>Compliance with this specification requires that a SimDAP service is maintained with the following characteristics:</p>
152
153 <ol>
154 <li><font color="red">The service MUST support a Simulation Selection service as described in section 3 below. The SimDAP service MUST provide tools to select the datasets and the regions of interest and to proceed with the following steps of the SimDAP procedure.</font></li>
155
156 <li><font color="red">The SimDAP service MUST support a getUnits method (or a getFields method... to be discussed). This method allows clients to get the list of units associated with the available fields.</font></li>
157
158 <li><font color="red">The Sub-Volume Extraction method SHOULD be supported as defined in section 4 below. If supported, a getThumb method MUST be available.
159 This method allows clients to retrieve data from a spatially defined sub-volume of the simulation box. The client determines the rectangular or spherical region within the simulation, the bounds and scale (i.e. units) of which are specified in the simulation metadata, and the service returns the simulation data contained within this region. The service SHOULD use a staging method (section 6) to return the particle file, as extracting a sub-sample of particles or grid points from a larger simulation box is likely to be a time-consuming process and would thus require some kind of caching. </font></li>
160
161 <li><font color="red">The setSimDAP method MUST be supported as defined in section 5 below. This method allows clients to submit a SimDAP operation.</font></li>
162
163 <li><font color="red">The data retrieval (getSimDAP) method MUST be supported as defined in section 7 below.
164 This method allows clients to retrieve single simulation snapshots and cutouts.</font></li>
165
166 <li><font color="red">The SimDAP service MUST be registered by providing the information defined in section 8 below. Registration allows clients to use a central registry service to locate compliant simulation access services and select an optimal subset of services to query, based on the characteristics of each service and the simulation data collections it serves.</font></li>
167
168 <li><font color="red">Job management request methods, getSimDAPInfo, cancelSimDAP, MAY be supported.
169 These methods allow users to inquire about the status of a submitted request and, possibly, to cancel it.</font></li>
170 </ol>
171
172 <br/>
173 <h2><a name="sec3">3 Simulation Selection and Units</a></h2>
174 <p>Search and exploration of available data archives and collections is described in detail in REF. The user selects the datasets of interest. Each dataset is characterized by a set of metadata according to the SimDM. Several of these parameters represent the input data for all the SimDAP actions:</p>
175 <ul>
176 <li>DATASERVICE = unique identifier of the data provider </li>
177 <li>DATASOURCE = unique identifier of the dataset at the data provider</li>
178 </ul>
179 <p>Two other parameters are used at some stages of the SimDAP&nbsp;protocol:</p>
180 <ul>
181 <li>FIELDS = list of the physical quantities selected for the extraction (a subset of the complete list available with the data search)</li>
182 <li>UNITS = list of the units of the selected physical quantities as stored in the data archive</li>
183 </ul>
184 <p>Notice that FIELDS and UNITS must have the same number of entries. For example:</p>
185 <p><font face="Courier New,Courier,Monaco">FIELDS = &quot;xposition,yposition,zposition,velocity,temperature&quot; UNITS = &quot;Mpc,Mpc,Mpc,km sec-1,K&quot;</font></p>
186 <p>UNITS is used to convert between client side and server side units. In the case of dimensionless/normalized quantities, UNITS must specify a conversion factor to a physical unit. For instance, if the coordinates of an N-Body simulation are defined between 0 and 1, UNITS&nbsp;must specify that L(adim)=1 corresponds to L(phys)=100Mpc, where L() represents a length measure. This is specified by the SimDM.</p>
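<p>As a minimal sketch of the two rules above, the following pairs each FIELDS entry with its UNITS entry and applies the dimensionless-to-physical conversion of the example. The function names and the constant <code>BOX_SIDE_MPC</code> are hypothetical, and a simple linear scaling is assumed; how the conversion factor is actually encoded in UNITS is left to the SimDM.</p>

```python
def parse_fields_and_units(fields, units):
    """Pair each FIELDS entry with the UNITS entry at the same position."""
    names = fields.split(",")
    unit_list = units.split(",")
    if len(names) != len(unit_list):
        raise ValueError("FIELDS and UNITS must have the same number of entries")
    return dict(zip(names, unit_list))


units_by_field = parse_fields_and_units(
    "xposition,yposition,zposition,velocity,temperature",
    "Mpc,Mpc,Mpc,km sec-1,K",
)

# Assumption taken from the example in the text:
# L(adim)=1 corresponds to L(phys)=100 Mpc.
BOX_SIDE_MPC = 100.0


def to_physical_length(adim):
    """Convert a dimensionless length in [0, 1] to Mpc (linear scaling)."""
    return adim * BOX_SIDE_MPC


def to_dimensionless_length(phys_mpc):
    """Convert a length in Mpc to the simulation's dimensionless unit."""
    return phys_mpc / BOX_SIDE_MPC
```

<p>A client would use the second conversion to express a region given in Mpc in the units stored by the server before submitting a request.</p>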
187 <p>All the parameters are retrieved as specified in REF. </p>
188 <br/>
189 <h2><a name="sec4">4 Subset Selection</a></h2>
190 <p>The result of a SimDAP request can be the whole snapshot or a part of it. The service MUST provide tools to enable the client to </p>
191 <ul>
192 <li>preview the available datasets</li>
193 <li>select a subset specifying its size and position </li>
194 </ul>
195 <p>Both these actions could require the availability of a &quot;simplified&quot; (but meaningful) version of the data, namely a <i>thumbnail</i>, easy to download and handle. The preview and the thumbnails features depend on the data and their implementation is up to the data provider. E.g. they could be represented by:</p>
196 <ul>
197 <li>projections of the computational box in the three coordinate directions (images),</li>
198 <li>a random or decimated sample of the dataset (in particular for point like data),</li>
199 <li>a reduced resolution realization of the dataset (e.g. averages over neighboring cells of a computational mesh)</li>
200 <li>a &quot;clever&quot; selection of regions according to specific criteria (e.g. &quot;overdense&quot; regions) implemented by proper algorithms</li>
201 </ul>
202 <p>In any case, a getThumb method MUST be implemented. The input of this method is a pair of DATASERVICE, DATASOURCE parameters which identifies the dataset of interest. The output is a VOTable in the TVO standard. As for the results of the SimDAP procedure, thumbnail data are stored in external binary files. However, these files are downloaded immediately (together with the VOTable, as a response to the web method), since their size is small and they may be precalculated.</p>
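<p>Two of the thumbnail strategies listed above, decimation of point-like data and a reduced resolution realization of a mesh, can be sketched as follows. This is an illustration only: the function names are hypothetical, the mesh is one-dimensional for brevity, and the actual thumbnail algorithm is up to the data provider.</p>

```python
def decimated_sample(elements, step):
    """Keep every step-th element of a point-like dataset."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return elements[::step]


def block_average(cells, block):
    """Average consecutive groups of `block` cells of a 1D mesh."""
    if block < 1 or len(cells) % block != 0:
        raise ValueError("block must divide the number of cells")
    return [
        sum(cells[i:i + block]) / block
        for i in range(0, len(cells), block)
    ]


# Ten particle identifiers reduced to three by keeping one in four.
particles = list(range(10))
thumb_particles = decimated_sample(particles, 4)

# A four-cell mesh reduced to two cells by averaging neighbours.
mesh = [1.0, 3.0, 5.0, 7.0]
thumb_mesh = block_average(mesh, 2)
```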
203 <br/>
204 <h2><a name="sec5">5 SimDAP Request</a></h2>
205 <p>The main target of the SimDAP service is the access to the raw data from a simulation, selected by a general Simulation Query, described in section 3. The SimDAP service in general provides the following functionalities:</p>
206 <ol>
207 <li>Extraction of a subset of data properly selected (data cutout)</li>
208 <li>Storage of the associated metadata in a VOTable (see later in this section) delivered to the client</li>
209 <li>Staging of the extracted data and their delivery to the client via http, ftp etc. (see section 6 and 7)</li>
210 </ol>
211 <p>
212 In principle, the extraction phase (1) could be performed using any of the set of N parameters that characterize the simulation. However, for simplicity, in a first stage of development, we will focus on geometric selections, allowing the user to select either a rectangular or a spherical sub-region of the entire computational volume, without having to download the whole dataset. Of course, it is still possible to retrieve the complete dataset. This can be seen as a degenerate cutout request, with a region of interest which covers the entire computational volume. Notice that this action is not just a simple download, since action 2 is still performed.
213 </p>
214 <p>
215 In order to submit the SimDAP request, a setSimDAP() web method (see below) MUST be implemented, with parameters defined as follows:
216 </p>
217 <p>
218 To select the region of interest, only geometric parameters are necessary. For a rectangular region, the user has to specify the center of the box and the length of each of its sides. For a spherical selection, center and radius of the sphere are required. One or more variables of a given snapshot can be selected in the same cutout operation.
219 </p>
220 <p>
221 For regions that intersect the boundary of the simulation box, the service has the option of applying different types of boundary conditions. Possible solutions are truncated boundary conditions (the sub-box is truncated at the box boundaries) and periodic boundary conditions (if applicable). The resulting file will be made available through an access URL, possibly using the snapshot staging method, which notifies the client when the sub-volume extraction has been completed and the resulting particle file is available for retrieval. Staging appears to be a necessary feature because data files keep growing rapidly as more computing power becomes available. Consequently, the processing time needed to extract the requested volumes can be long, possibly longer than a typical working session. Furthermore, unlike observational images and data, simulation data are usually large, and it is not convenient to retrieve them via http with some kind of encoding for the binaries (e.g. base64). Such encoding is extremely expensive, both in CPU (time spent encoding and decoding data) and in size (the encoded file is larger than the original one).
222 </p>
223 <h3><a name="sec5_1">5.1 setSimDAP input</a></h3>
224 <p>In order to submit a SimDAP request, the following parameters must be specified and passed to the server.</p>
225
226 <h4><a name="sec5_1_1">5.1.1 Region of Interest</a></h4>
227 <p>
228 An input Sub-Volume query must consist of an x,y,z position in the box, plus the side lengths (or radius) of the rectangular (spherical) region surrounding this point. These quantities MUST be specified in the units published by the server.
229 </p>
230 <p>
231 The service MUST support the following two parameters:
232 </p>
233 <dl>
234 <dt>POS</dt>
235 <dd>The position of the center of the region of interest, expressed as a set of three coordinates in fractions of the corresponding box side. A comma should delimit the three values; embedded whitespace is not permitted. Example: "POS=0.3,0.25,0.9". A NULL value represents the center of the whole box (0.5,0.5,0.5).</dd>
236 <dt>SIZE</dt>
237 <dd>The length of the sides (or the radius) of the region. The region may be specified using either one or three values. If only one value is given, it represents the radius of a sphere. If three values are provided (all the same for a cubic box), a rectangular subbox is defined. The format of the SIZE parameter is the same as that for POS. Example: "SIZE=0.2,0.5,0.3". A special case is SIZE=NULL, which represents the whole box.</dd>
238 </dl>
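<p>The POS and SIZE rules above can be sketched as a small parser. This is a hedged illustration: the function names are hypothetical, and representing a NULL value either as a missing parameter or as the literal string "NULL" is an assumption, since the text does not fix the encoding.</p>

```python
def _split_values(value, name):
    """Split a comma-delimited value, rejecting embedded whitespace."""
    if any(ch.isspace() for ch in value):
        raise ValueError(name + " must not contain embedded whitespace")
    return [float(part) for part in value.split(",")]


def parse_pos(value):
    """Return the region center as three fractions of the box sides."""
    if value is None or value == "NULL":
        return (0.5, 0.5, 0.5)  # center of the whole box
    parts = _split_values(value, "POS")
    if len(parts) != 3:
        raise ValueError("POS needs exactly three values")
    return tuple(parts)


def parse_size(value):
    """Return (shape, extent) for the region of interest."""
    if value is None or value == "NULL":
        return ("box", None)  # the whole box
    parts = _split_values(value, "SIZE")
    if len(parts) == 1:
        return ("sphere", parts[0])  # a single value is a radius
    if len(parts) == 3:
        return ("rectangle", tuple(parts))  # three side lengths
    raise ValueError("SIZE needs one value (radius) or three (sides)")
```

<p>For example, <code>parse_size("0.2")</code> describes a sphere of radius 0.2, while <code>parse_size("0.2,0.5,0.3")</code> describes a rectangular subbox.</p>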
239
240 <p>
241 In addition, the service MAY support the following parameter specifying the adopted boundary conditions:
242 </p>
243 <dl>
244 <dt>BOUNDARY</dt>
245 <dd>This parameter can have three values, one for each coordinate direction. Possible values are:
246 <dl>
247 <dt>TRUNC</dt>
248 <dd>if the interesting region exceeds the computational box, it is resized at the box boundary</dd>
249 <dt>PERIODIC</dt>
250 <dd>if the interesting region exceeds the computational box, data are selected from the opposite side of the box</dd>
251 </dl>
252 </dd>
253 </dl>
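<p>The effect of the two BOUNDARY values can be illustrated along a single coordinate direction. The sketch below assumes fractional box coordinates in the range 0 to 1, a region center inside the box, and SIZE interpreted as a full side length; the function name <code>apply_boundary</code> is hypothetical.</p>

```python
def apply_boundary(center, size, mode):
    """Return the (low, high) intervals selected inside [0, 1] on one axis."""
    low = center - size / 2.0
    high = center + size / 2.0
    if mode == "TRUNC":
        # The region is resized at the box boundary.
        return [(max(low, 0.0), min(high, 1.0))]
    if mode == "PERIODIC":
        if size >= 1.0:
            return [(0.0, 1.0)]
        if low < 0.0:
            # The overflow is taken from the opposite (upper) side.
            return [(0.0, high), (low + 1.0, 1.0)]
        if high > 1.0:
            # The overflow is taken from the opposite (lower) side.
            return [(low, 1.0), (0.0, high - 1.0)]
        return [(low, high)]
    raise ValueError("unknown BOUNDARY value: " + mode)


# A region centered at 0.875 with side 0.5 exceeds the upper boundary.
truncated = apply_boundary(0.875, 0.5, "TRUNC")
wrapped = apply_boundary(0.875, 0.5, "PERIODIC")
```

<p>Since BOUNDARY can carry one value per coordinate direction, a service would apply this logic independently to each axis.</p>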
254 <p>
255 Registry metadata of the service indicates what kind of boundary conditions are supported.
256 </p>
257
258 <h4><a name="sec5_1_2">5.1.2 Fields of Interest</a></h4>
259 <p>
260 The user can specify the physical quantities of interest, which can be a subset of the available ones.
261 </p>
262 <dl>
263 <dt>FIELDS</dt>
264 <dd>The service MAY support an optional parameter with the name FIELDS, the value of which is a comma separated list of field names corresponding to the data elements the simulation can return. If the parameter is not provided or it is set to NULL, all fields are returned. The field names are published by the server (see…). Example: "FIELDS=Density,Temperature,Velocity_x". </dd>
265 </dl>
266
267 <h4><a name="sec5_1_3">5.1.3 Data Sources</a></h4>
268 <p>
269 Simulation outputs are stored in files. These files can be indicated by a reference name which unambiguously identifies the data source. This link can be provided directly by the client, by registries and/or by the middleware on which a distributed archive is built (e.g. SRB, OGSA-DAI…). The data source can also be a database. However, this does not imply anything about the service interface implementation: the complexity of the database access is hidden behind the setSimDAP operation, and its implementation is up to the service provider.
270 </p>
271 <p>
272 The service id MUST also be specified.
273 </p>
274 <p>A SimDAP operation MUST refer to a single data source. Multi-source cutouts, such as cutouts across various time steps of the same simulation, are not supported by the protocol. Their implementation is up to the client, for example as a sequence of single-source requests with the same subbox and fields. The client must verify that such an operation is possible and/or meaningful.</p>
275 <dl>
276 <dt>DATASERVICE</dt>
277 <dd>Identification of the data service (to be better specified)</dd>
278 <dt>DATASOURCE</dt>
279 <dd>The service MUST support a parameter with the name DATASOURCE, the value of which is a single data source reference. The DATASOURCE parameter MUST be set.</dd>
280 </dl>
281 <p>
282 Examples:
283 </p>
284 <pre>
285 DATASOURCE=/scratch/my_directory/myfile1.bin
286 DATASOURCE=myfile2.ref
287 </pre>
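<p>Putting the input parameters of this section together, a client could assemble a setSimDAP request as sketched below. The text does not fix a transport binding, so encoding the parameters as KEY=value pairs in a query string is an assumption; the function name and the service identifier <code>example-service</code> are hypothetical.</p>

```python
from urllib.parse import urlencode


def build_setsimdap_query(dataservice, datasource, pos=None, size=None,
                          fields=None, boundary=None):
    """Assemble the setSimDAP parameters as a KEY=value query string."""
    # DATASERVICE and DATASOURCE must always be specified.
    params = {"DATASERVICE": dataservice, "DATASOURCE": datasource}
    if pos is not None:
        params["POS"] = ",".join(str(v) for v in pos)
    if size is not None:
        params["SIZE"] = ",".join(str(v) for v in size)
    if fields is not None:
        params["FIELDS"] = ",".join(fields)
    if boundary is not None:
        params["BOUNDARY"] = ",".join(boundary)
    # Keep commas and slashes readable, as in the examples of the text.
    return urlencode(params, safe=",/")


query = build_setsimdap_query(
    "example-service",
    "myfile2.ref",
    pos=(0.3, 0.25, 0.9),
    size=(0.2, 0.5, 0.3),
    fields=["Density", "Temperature", "Velocity_x"],
)
```

<p>Omitted optional parameters are simply left out of the query, which matches the rule that a missing FIELDS parameter returns all fields.</p>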
288
289 <h4><a name="sec5_1_4">5.1.4 File Format</a></h4>
290 <p>The SimDAP service delivers its results as VOTables with associated binary files. The service MAY support a parameter with the name FORMAT to indicate the desired format or formats of the data referenced by the output table. The value is a comma-delimited list where each element can be any recognized MIME-type. Possible formats are:</p>
291 <ul>
292 <li>data/raw_tabular</li>
293 <li>data/raw_sequential</li>
294 <li>data/votable</li>
295 <li>data/hdf5</li>
296 <li>data/fits</li>
297 </ul>
298 <p>
299 to be discussed further.
300 </p>
301 <h4><a name="sec5_1_5">5.1.5 Service Defined Parameters</a></h4>
302 <p>
303 The service MAY support additional service-specific parameters. The names, meanings, and allowed values are defined by the service. The names need not be upper-case; however, they should not match any of the reserved parameter names defined above.
304 </p>
305
306 <h3><a name="sec5_2">5.2 setSimDAP Output</a></h3>
307 <p>The output produced by a SimDAP cutout request is a VOTable, an XML table format, returned with a MIME-type of text/xml, plus an external binary file with the extracted data. The VOTable is characterized by the following items:</p>
308 <ol>
309 <li>The VOTable MUST contain a RESOURCE element, identified with the tag type="results", containing one or more TABLE elements with the metadata results of the setSimDAP operation. The VOTable is permitted to contain additional RESOURCE elements, but the usage of any such elements is not defined here. If multiple resources are present it is recommended that the query results be returned in the first resource element.</li>
310 <li>The VOTable MUST contain a DATASERVICE parameter which identifies the used service. </li>
311 <li>The VOTable MUST contain a REQUEST_ID parameter which uniquely identifies the job request on the service. REQUEST_ID is a 4-byte integer.</li>
312 <li>The VOTable MUST contain a REQUEST_STATUS parameter which can be Ok or Rejected. In the latter case all the other fields of the VOTable are absent.</li>
313 <li>TABLE contains different species extracted from the dataset. Species can differ either by their geometrical representation (e.g. particles, regular meshes, AMR meshes…) or in their "physical meaning" (e.g. star particles vs. dark matter particles). All the FIELDS in a table have the same number of elements, specified by the arraysize parameter. This parameter also sets the geometry of the quantity. E.g. arraysize="N" represents a point like quantity; arraysize="NxMxS" represents a grid based variable. For point like quantities arraysize is NOT mandatory, since often it cannot be calculated on-the-fly. Resulting data FIELDS are stored one after the other in a single binary file, in the same order they appear in the VOTable.</li>
314 <li>Each TABLE MUST contain FIELDs where the UCDs that follow have been set. FIELDS refer to the variables stored in the external binary file. </li>
315 <li>Variables must be scalars. Vectors and more generally, multidimensional quantities, are not supported. This means that each FIELD represents a scalar value. E.g. temperature of each point, x coordinate of a particle.</li>
316 <li>Each FIELD must specify the datatype and the unit of the variable. Furthermore, name, ID, and ucd have to be set. The ucds for simulations are still in progress, therefore we do not go into more detail here.</li>
317 <li>The acref binary data file reference is specified in a DATA section, according to the rules defined in other documents (e.g. SIAP specification)</li>
318 </ol>
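<p>Since the FIELDs are scalars stored one after the other in the external binary file, in the order given by the VOTable, a client can read them back sequentially. The following sketch illustrates this under loud assumptions: values are little-endian 32-bit floats with no header or padding, and the number of elements per FIELD is known. The actual binary layout is the subject of Appendix B, and the function name <code>read_fields</code> is hypothetical.</p>

```python
import io
import struct


def read_fields(stream, field_names, n_elements, fmt="f", byte_order="<"):
    """Read scalar FIELDs stored one after the other from a binary stream.

    Assumes fixed-size values (default: little-endian float32), no header,
    and the same number of elements for every FIELD in the table.
    """
    item_size = struct.calcsize(byte_order + fmt)
    data = {}
    for name in field_names:  # same order as the FIELDs in the VOTable
        raw = stream.read(item_size * n_elements)
        if len(raw) != item_size * n_elements:
            raise ValueError("binary file is shorter than the VOTable describes")
        values = struct.unpack(byte_order + str(n_elements) + fmt, raw)
        data[name] = list(values)
    return data


# Build a small in-memory stand-in for the external binary file.
x = [0.25, 0.5, 0.75]
temperature = [1.0, 2.0, 4.0]
payload = struct.pack("<3f", *x) + struct.pack("<3f", *temperature)

fields = read_fields(io.BytesIO(payload), ["x", "temperature"], 3)
```

<p>In a real client the field names, datatypes and element count would come from the FIELD elements and the arraysize attribute of the returned VOTable rather than being hard-coded.</p>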
319 <p>
320 Other parameters may be supported according to the services offered by the data provider.
321 </p>
322 <br/>
323 <h2><a name="sec6">6 Data Staging</a></h2>
324 <p>
325 By Data Staging we refer to the processing the server performs to retrieve or generate the requested simulation volumes and cache them in online storage for retrieval by a client. Staging is necessary for large archives which must retrieve simulation data from hierarchical storage, or for services which can dynamically extract subvolumes, where it may take a substantial time (e.g. minutes or hours) to retrieve the particles in the relevant region of the simulation box. Issuing a staging request for a set of simulation subvolumes (e.g. for a set of small cubes randomly placed in a simulation box) also permits large servers to optimize subvolume extraction, for example to take advantage of parallelization for large requests.
326 </p>
327 <p>
328 The snapshot staging service is optional for the simulation server. If staging is not implemented, data should be immediately available for retrieval (URL direct to file). The availability of this function is communicated to the registry services.
329 </p>
330 <p>
331 When staging of data is necessary, the technique used is to stage data on the server for later retrieval by the client. The data is only staged for a period of time and is eventually deleted by the service. The getSimDAP method (see section 7) is identical whether or not staging is used. The service can proceed to generate the simulation sub-volume regardless of the state or accessibility of the client.
332 </p>
333 <p>
334 As soon as staged data are available at the given URL, the user can start the download procedure. The user can be informed of the availability of the data following two different approaches:
335 </p>
336 <ol>
337 <li>The client searches for the data on the service (e.g. reload a web/ftp page).</li>
338 <li>The service searches for the client and, if present, sends information to it.</li>
339 </ol>
340 <p>
341 The first approach is simpler. In its most straightforward implementation, it simply consists of having the client reload the data URL to see whether the data are there.
342 </p>
343 <p>
344 In the second approach, the staging mechanism should provide a messaging capability. The service broadcasts messages to subscribing clients whenever a staging (processing) event occurs, such as when the sub-volume extraction has been completed and is available for retrieval. Service generated messages can also be used to pass informational or diagnostic messages to clients as processing proceeds. This type of messaging is asynchronous and one way: the service broadcasts messages to subscribing clients as things happen, whereas clients send requests to the service to invoke web methods. For example, to initiate staging, subscribe to staging-related messages, or abort a staging operation in progress, the client sends a request to (invokes a web method provided by) the service.
345 </p>
346 <p>
347 SimDAP is not just a search-and-download service: it also requires running processes and, possibly, managing them (see later in this section). Therefore authentication of the client should be required. This is strictly required for approach 2, in which the user must be detected and identified by the service. However, authentication should always be necessary for security and privacy reasons: access to the services should be granted only to trusted users with proper privileges (some data could be available only to specific communities, etc.) and extracted data should be accessible only to the user who performed the request.
348 </p>
349 <p>
350 Authentication could be based on a username and password or on more sophisticated methods, such as certificates. This choice is up to the service provider. Authentication allows the user to use the scheduling/batch system implemented by the service provider. This system sets all the policies for access to the resources (request pipeline, multiple requests from the same user, CPU time limits, accounting). These choices are also up to the provider, who is only required to publish all the available features to the registry service.
351 </p>
352 <p>
353 Since the SimDAP request is staged, the provider should support at least two basic operations:
354 </p>
355 <ul>
356 <li>Job monitoring</li>
357 <li>Job cancellation</li>
358 </ul>
359 <p>
360 The specific implementation of the two operations depends on the adopted service technology.
361 </p>
362 <p>
363 Both operations use the SERVICE and REQUEST_ID parameters written in the VOTable. They are invoked through the following web methods:
364 </p>
365 <ul>
366 <li><code>getSimDAPInfo(SERVICE, REQUEST_ID, SimDAPINFO)</code></li>
367 <li><code>cancelSimDAP(SERVICE, REQUEST_ID, SimDAPINFO)</code></li>
368 </ul>
369 <p>The getSimDAPInfo method returns a SimDAPINFO string with the following information: STATUS (Idle, Hold, Cancelled, Running, reJected), SUBMISSION_DATE, and any other fields chosen by the service provider and declared to the registry. The cancelSimDAP method returns a SimDAPINFO string whose value is either &quot;Ok&quot; or &quot;Rejected&quot;. Other services can be implemented and registered by the provider.
370 </p>
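<p>
On the client side, the values returned by the two methods could be interpreted along the following lines. This is a minimal Python sketch under assumptions: the names <code>JobStatus</code>, <code>parse_status</code> and <code>cancel_succeeded</code> are hypothetical, the case-insensitive matching is a convenience chosen here, and the sketch assumes that the STATUS value has already been extracted from the SimDAPINFO string, whose exact layout this note does not define.
</p>

```python
from enum import Enum


class JobStatus(Enum):
    """STATUS values listed for getSimDAPInfo."""
    IDLE = "Idle"
    HOLD = "Hold"
    CANCELLED = "Cancelled"
    RUNNING = "Running"
    REJECTED = "Rejected"


def parse_status(text: str) -> JobStatus:
    """Map a STATUS value to a JobStatus, ignoring case and padding."""
    normalized = text.strip().lower()
    for status in JobStatus:
        if status.value.lower() == normalized:
            return status
    raise ValueError(f"unknown STATUS value: {text!r}")


def cancel_succeeded(info: str) -> bool:
    """Interpret the cancelSimDAP reply: 'Ok' -> True, 'Rejected' -> False."""
    normalized = info.strip().lower()
    if normalized == "ok":
        return True
    if normalized == "rejected":
        return False
    raise ValueError(f"unexpected cancelSimDAP reply: {info!r}")
```

<p>
Raising an error on unknown values, rather than guessing, leaves room for the provider-specific extensions mentioned above.
</p>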
371
372 <br/>
373 <h2><a name="sec7">7 Data Delivery</a></h2>
374 <p>
375 The getSimDAP(acref, SERVICE, STATUS) web method allows a client to retrieve a single raw simulation file given the access reference (acref) in the result VOTable. The file can contain more than one variable and can be in any of the formats defined in Section 5. The files can be downloaded using the http, ftp or GridFTP protocols (or any other suitable protocol). All the metadata about the content and the structure of the data file are stored in the associated VOTable (see Appendix A).</p>
376 <p>
377 The getSimDAP method returns a STATUS string, which can be Ok, Rejected or Deferred (if data are not yet available).
378 XML header files are stored as well and are downloaded together with the binary file using the same getSimDAP method.</p>
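<p>
A client could react to the three STATUS values roughly as follows. This Python sketch is illustrative only: <code>retrieve_when_ready</code> and its parameters are hypothetical, and <code>get_simdap</code> stands for whatever binding of the getSimDAP web method the client actually uses, assumed here to return the STATUS string.
</p>

```python
import time
from typing import Callable


def retrieve_when_ready(get_simdap: Callable[[str, str], str],
                        acref: str,
                        service: str,
                        retry_delay_s: float = 60.0,
                        max_attempts: int = 10,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call get_simdap until STATUS is Ok; return the number of attempts.

    get_simdap(acref, service) is a hypothetical wrapper around the
    getSimDAP web method that returns the STATUS string.
    """
    for attempt in range(1, max_attempts + 1):
        status = get_simdap(acref, service).strip().lower()
        if status == "ok":
            return attempt
        if status == "rejected":
            raise PermissionError(f"request for {acref!r} was rejected")
        if status != "deferred":
            raise ValueError(f"unexpected STATUS value: {status!r}")
        # Deferred: the data are not yet available, so wait and retry.
        if attempt < max_attempts:
            sleep(retry_delay_s)
    raise TimeoutError(f"{acref!r} still deferred after {max_attempts} attempts")
```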
379
380
381 <br/>
382 <h2><a name="sec8">8 Service Registration</a></h2>
383 <p>
384 The following features and methods MUST be published to the registration service.
385 </p>
386
387
388 <h2><a name="appA">Appendix A: VOTable examples</a></h2>
389
390 <h3><a name="appA_1">A.1 VOTable for the velocity field of a fluid on a fixed 3D mesh</a></h3>
391
392 <p>
393 [GL – We still need a proper way I guess of indicating what the spatial dimensions are for a representation like this. FITS has its WCS system for implicitly specifying the spatial coordinates of a multidimensional array. Is something like this in existence for VOTable ? We need to inquire.]
394 </p>
395
396 <pre>
&lt;VOTABLE&gt;
397 &lt;RESOURCE name="myVectorField" type="results" &gt;
398 &lt;DESCRIPTION&gt;Velocity Field from N-Body run&lt;/DESCRIPTION&gt;
399 &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
400
401 &lt;TABLE name="VelocityField" ID="Vel" order="sequential"&gt;
402 &lt;FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x" datatype="float"
403 arraysize="41x41x41" unit="km/s" geometry="mesh" /&gt;
404 &lt;FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y" datatype="float"
405 arraysize="41x41x41" unit="km/s" geometry="mesh" /&gt;
406 &lt;FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z" datatype="float"
407 arraysize="41x41x41" unit="km/s" geometry="mesh" /&gt;
408 &lt;DATA&gt;
409 &lt;BINARY&gt;
410 &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
411 &lt;/BINARY&gt;
412 &lt;/DATA&gt;
413 &lt;/TABLE&gt;
414 &lt;/RESOURCE&gt;
415 &lt;/VOTABLE&gt;
416 </pre>
417
418 <h3><a name="appA_2">A.2 VOTable for the velocity and position fields of particles from an N-Body simulation</a></h3>
419
420 <pre>
&lt;VOTABLE&gt;
421 &lt;RESOURCE name="myParticles" type="results"&gt;
422 &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
423 &lt;TABLE name="Particles" ID="NBody" order="tabular"&gt;
424 &lt;FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
425 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
426 &lt;FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
427 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
428 &lt;FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
429 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
430 &lt;FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x"
431 datatype="float" arraysize="100000" unit="km/s" geometry="particles" /&gt;
432 &lt;FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y"
433 datatype="float" arraysize="100000" unit="km/s" geometry="particles" /&gt;
434 &lt;FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z"
435 datatype="float" arraysize="100000" unit="km/s" geometry="particles" /&gt;
436 &lt;DATA&gt;
437 &lt;BINARY&gt;
438 &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
439 &lt;/BINARY&gt;
440 &lt;/DATA&gt;
441 &lt;/TABLE&gt;
442 &lt;/RESOURCE&gt;
443 &lt;/VOTABLE&gt;
444 </pre>
445
446 <h3><a name="appA_3">A.3 VOTable for the temperature field of a mesh-based quantity and the positions of N-Body particles extracted from the same spatial region</a></h3>
447
448 <pre>
&lt;VOTABLE&gt;
449 &lt;RESOURCE name="myMixedData" type="results"&gt;
450 &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
451 &lt;TABLE name="ParticlesAndMesh" ID="NBody" order="sequential"&gt;
452 &lt;FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
453 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
454 &lt;FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
455 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
456 &lt;FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
457 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
458 &lt;FIELD name="temperature" ID="temp" ucd="phys.temperature;pos.cartesian.x"
459 datatype="float" arraysize="41x41x41" unit="K" geometry="mesh" /&gt;
460 &lt;DATA&gt;
461 &lt;BINARY&gt;
462 &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
463 &lt;/BINARY&gt;
464 &lt;/DATA&gt;
465 &lt;/TABLE&gt;
466 &lt;/RESOURCE&gt;
467 &lt;/VOTABLE&gt;
468 </pre>
469
470 <p>
471 An alternative version, with the particle and mesh data in separate tables:
472 </p>
473
474 <pre>
475 &lt;VOTABLE&gt;
476 &lt;RESOURCE name="myMixedData" type="results"&gt;
477 &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
478 &lt;TABLE name="Particles" ID="NBodyParticles" order="sequential"&gt;
479 &lt;FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
480 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
481 &lt;FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
482 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
483 &lt;FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
484 datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
485 &lt;DATA&gt;
486 &lt;BINARY&gt;
487 &lt;STREAM href="file:///scratch/myhome/test_particles.bin"/&gt;
488 &lt;/BINARY&gt;
489 &lt;/DATA&gt;
490 &lt;/TABLE&gt;
491 &lt;TABLE name="Mesh" ID="NBodyMesh" order="sequential"&gt;
492 &lt;FIELD name="temperature" ID="temp" ucd="phys.temperature;pos.cartesian.x"
493 datatype="float" arraysize="41x41x41" unit="K" geometry="mesh" /&gt;
494 &lt;DATA&gt;
495 &lt;BINARY&gt;
496 &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
497 &lt;/BINARY&gt;
498 &lt;/DATA&gt;
499 &lt;/TABLE&gt;
500 &lt;/RESOURCE&gt;
501 &lt;/VOTABLE&gt;
502 </pre>
503
504 <p>
505 [GL – Do we need an example of an "ordinary" tabular VOTable as well ? Something like
506 </p>
507
508 <pre>
&lt;VOTABLE&gt;
509 &lt;RESOURCE name="myParticles" type="results"&gt;
510 &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
511 &lt;TABLE name="Particles" ID="NBody"&gt;
512 &lt;FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
513 datatype="float" unit="Mpc" /&gt;
514 &lt;FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
515 datatype="float" unit="Mpc" /&gt;
516 &lt;FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
517 datatype="float" unit="Mpc" /&gt;
518 &lt;FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x"
519 datatype="float" unit="km/s" /&gt;
520 &lt;FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y"
521 datatype="float" unit="km/s" /&gt;
522 &lt;FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z"
523 datatype="float" unit="km/s" /&gt;
524 &lt;DATA&gt;
525 &lt;BINARY&gt;
526 &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
527 &lt;/BINARY&gt;
528 &lt;/DATA&gt;
529 &lt;/TABLE&gt;
530 &lt;/RESOURCE&gt;
531 &lt;/VOTABLE&gt;
532 </pre>
533 <p>
534 ]
535 </p>
536 <br/>
537
538 <h2><a name="appB">Appendix B: Binary File Formats</a></h2>
539 <p>
540 To be done.
541 </p>
542 <br/>
543 <h2><a name="references">References</a></h2>
544
545 <p>[1] R. Hanisch, <i>Resource Metadata for the Virtual Observatory</i>
546 <br/><a href="http://www.ivoa.net/Documents/latest/RM.html">http://www.ivoa.net/Documents/latest/RM.html</a>
547 </p>
548 <p>[2] R. Hanisch, M. Dolensky, M. Leoni, <i>Document Standards Management: Guidelines and Procedure</i>
549 <br/><a href="http://www.ivoa.net/Documents/latest/DocStdProc.html">http://www.ivoa.net/Documents/latest/DocStdProc.html</a>
550 </p>
551
552
553 </body></html>
