Revision 461 - Mon May 12 14:07:49 2008 UTC by claudio.gheller: minor changes
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Simulation Data Access Protocol - Internal Draft</title>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<meta name="keywords" content="IVOA, International, Virtual, Observatory, Alliance" />
<meta name="maintainedBy" content="IVOA Document Coordinator, ivoadoc@ivoa.net" />
<link rel="stylesheet" href="http://ivoa.net/misc/ivoa_wg.css" type="text/css" />
</head>

<body>
<div class="head">
<a href="http://www.ivoa.net/"><img alt="IVOA" src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" width="300" height="169"/></a>
<h1>Simulation Data Access Protocol (SimDAP)<br/>
Draft</h1>
<h2>IVOA Note 20 April 2008</h2>

<dl>
<dt>This version:</dt>
<dd><a href="http://www.ivoa.net/Documents/...">
http://www.ivoa.net/Documents/...</a></dd>

<dt>Latest version:</dt>

<dd><a href="http://www.ivoa.net/Documents/latest/...">
http://www.ivoa.net/Documents/latest/...</a></dd>

<dt>Previous versions:</dt>
<dd><a href="http://www.ivoa.net/Documents/...">
http://www.ivoa.net/Documents/...</a></dd>
<dd><a href="http://www.ivoa.net/Documents/...">
http://www.ivoa.net/Documents/...</a></dd>
</dl>

<dl>
<dt>Interest Group:</dt>
<dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory">http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory</a></dd>
<dt>Author(s):</dt>
<dd>
<a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/ClaudioGheller">Claudio Gheller</a><br/>
<a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/GerardLemson">Gerard Lemson</a><br/>
</dd>
</dl>
<hr/></div>

<h2><a name="abstract" id="abstract">Abstract</a></h2>
<p>This specification defines a protocol for retrieving data produced by numerical simulations from a variety of data repositories through a uniform interface. The interface is meant to be reasonably simple for service providers to implement. Data are selected by a suitable search procedure. Once the data of interest are identified, specific quantities can be selected, and sub-samples can be extracted and downloaded. Data are returned in a simulation-specific VOTable format, with support for external binary file management.</p>

<div class="status">
<h2><a name="status" id="status">Status of this Document</a></h2>
This is a Note. The first release of this document was 18 May 2008.
<p></p><br />

<!-- Choose one of the following (and remove the rest)-->
<!--Note-->

<p>This is an IVOA Note expressing suggestions from and opinions of the authors.<br/>
It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory.
It should not be referenced or otherwise interpreted as a standard specification.</p>

A list of <a href="http://www.ivoa.net/Documents/">current IVOA Recommendations and other technical documents</a> can be found at http://www.ivoa.net/Documents/.

</div><br />

<h2><a name="acknowledgments" id="acknowledgments">Acknowledgments</a></h2>


<h2><a id="contents" name="contents">Contents</a></h2>
<div class="head">
<ul class="toc">
<li><a href="#abstract">Abstract</a></li>
<li><a href="#status">Status</a></li>
<li><a href="#acknowledgments">Acknowledgments</a></li>
<li><a href="#contents">Contents</a></li>
<li><a href="#sec1">1 Introduction</a></li>
<li><a href="#sec2">2 Requirements for Compliance</a></li>
<li><a href="#sec3">3 Simulation Selection</a></li>
<li><a href="#sec4">4 Data Preview</a></li>
<li><a href="#sec5">5 SimDAP Request</a>
<ul class="toc">
<li><a href="#sec5_1">5.1 setSimDAP Input</a>
<ul class="toc">
<li><a href="#sec5_1_1">5.1.1 Selection Parameters</a></li>
<li><a href="#sec5_1_2">5.1.2 Geometric Variables</a></li>
<li><a href="#sec5_1_3">5.1.3 Region of Interest</a></li>
<li><a href="#sec5_1_4">5.1.4 Selected Variables</a></li>
<li><a href="#boundaries">5.1.5 Boundaries</a></li>
<li><a href="#sec5_1_5">5.1.6 Service Defined Parameters</a></li>
</ul>
</li>
<li><a href="#sec5_2">5.2 setSimDAP Output</a></li>
</ul>
</li>
<li><a href="#sec6">6 Data Staging</a></li>
<li><a href="#sec7">7 Data Delivery</a></li>
<li><a href="#sec8">8 Service Registration</a></li>
<li><a href="#appA">Appendix A: VOTable Examples</a>
<ul class="toc">
<li><a href="#appA_1">A.1 VOTable for the velocity field of a fluid on a fixed 3D mesh</a></li>
<li><a href="#appA_2">A.2 VOTable for the velocity and position fields of particles from an N-Body simulation</a></li>
<li><a href="#appA_3">A.3 VOTable for the temperature field of a mesh based quantity and the position of N-Body particles extracted from the same spatial region</a></li>
</ul>
</li>
<li><a href="#appB">Appendix B: Binary File Formats</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
<hr/>

<br/>
<h2><a name="sec1">1 Introduction</a></h2>
<p>This specification defines a prototype standard for retrieving theoretical data from a variety of astrophysical simulation repositories: the Simulation Data Access Protocol (hereafter SimDAP). In this context, <i>Theoretical Data</i> are defined as the outcome of different kinds of numerical applications, such as dynamical simulations, semi-analytical models, Monte Carlo simulations, etc.</p>
<p>The standard is closely related to the SimDM theoretical data model (REF XXX), which allows the user to identify the data of interest and the associated source. Such a source (in most cases a data file) can be downloaded. In general, however, the data are so large that downloading them directly is unfeasible. The objective of the SimDAP protocol is to let the user focus on a suitable subsample of the data, (strongly) reducing the volume to be moved across the network and so permitting a download that would otherwise be difficult, if not impossible.</p>


<p>SimDAP will deal with datasets that can always be represented as (large or huge) tables in which rows identify a simulated element (a mesh cell, a particle, a pixel...) and columns represent the associated physical parameters (the 3D spatial coordinates, the velocity, the temperature...). Datasets can represent different timesteps (that is, evolutionary configurations) of the same simulated system. In the rest of the document, we refer to these datasets as <i>snapshots</i> of a numerical application. Snapshots are our data sources. No further assumptions are made about the data.</p>
<p>The SimDAP protocol supports a <i>rectangular cutout</i> of the data in a generic N-dimensional (N-Dim) parameter space. This consists of extracting all the simulated elements for which some parameters have values in a given range. Notice that no assumption is made on the dimensionality of the problem (selection can be done on any number of parameters) or on the nature of the parameters (any parameter can be used in the selection). However, a 3D geometric selection, in which data are extracted according to their position in 3D space (i.e. the spatial coordinates are used as cutout parameters), can be considered a favoured case: it is simple, intuitive and common to a large number of applications. In the rest of the document we will refer to:</p>
<ul>
<li>position, as an N-tuple of selection parameters which defines a position in the phase space (e.g. the center of the 3D geometric box)</li>
<li>size, as the extension of the selected region (dependent on the position specification) in the N-Dim phase space (e.g. the sides of the 3D geometric box)</li>
</ul>
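<p>The position/size convention can be illustrated with a short Python sketch. It assumes that <i>position</i> is the centre of the box and <i>size</i> its full side lengths, and that each simulated element is a row (a plain tuple) of the table described above; the function names <code>box_bounds</code> and <code>rectangular_cutout</code> are hypothetical and not part of SimDAP.</p>

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def box_bounds(position: Sequence[float],
               size: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Convert a box centre (position) and side lengths (size) into lower/upper bounds."""
    if len(position) != len(size):
        raise ValueError("position and size must have the same dimensionality")
    lower = [p - s / 2.0 for p, s in zip(position, size)]
    upper = [p + s / 2.0 for p, s in zip(position, size)]
    return lower, upper


def rectangular_cutout(rows: Sequence[Row], columns: Sequence[int],
                       position: Sequence[float], size: Sequence[float]) -> List[Row]:
    """Keep the rows whose values in the selection columns lie inside the box (inclusive)."""
    if len(columns) != len(position):
        raise ValueError("one selection column is needed per box dimension")
    lower, upper = box_bounds(position, size)
    return [row for row in rows
            if all(lo <= row[c] <= hi for c, lo, hi in zip(columns, lower, upper))]


# Rows are (x, y, z, temperature); the cutout uses the three spatial columns.
particles = [(0.1, 0.1, 0.1, 1.0e4), (0.5, 0.5, 0.5, 2.0e4), (0.9, 0.2, 0.4, 3.0e4)]
print(rectangular_cutout(particles, [0, 1, 2], position=[0.5, 0.5, 0.5], size=[0.5, 0.5, 0.5]))
# prints [(0.5, 0.5, 0.5, 20000.0)]
```

<p>Because the selection columns are passed explicitly, the same function covers the general N-Dim case: selecting on, say, temperature instead of position only changes <code>columns</code>.</p>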
<p><img src="extraction.JPG" width="582" height="246" border="0"/></p>
<p>The SimDAP protocol is designed primarily as a &quot;data on demand&quot; service: datasets are created on the fly by the service, given the position and size of the desired output dataset as specified by the client. This is not a simple task, for several reasons. First, simulation data adopt specific units and coordinate systems, which depend on the nature of the problem, the characteristics of the algorithms and their implementation. Furthermore, simulation outputs can be represented by a wide variety of completely different data objects. For example, the output can consist of a set of particles in a given volume, where each particle has its physical position and a set of associated scalar and vector quantities, such as velocity, mass density, temperature, etc. Mesh-based simulations, on the other hand, describe their data as discrete fields defined on a regular or adaptive mesh. The goal of the SimDAP protocol is to provide a uniform description of the selection service, keeping it simple while covering as many different kinds of simulations and data as possible.</p>
<p>In operation, SimDAP represents a negotiation between the client and the data service, which allows the client to select and retrieve a specific subset of the available data. Retrieving the complete dataset can be considered a <i>degenerate</i> selection of the whole volume and all the parameters. Notice, however, that even this case does not lead to a simple download of the original data file (supported by the SimDM&nbsp;protocol), since data are delivered in the standard TVO format described in section X.X.</p>
<p>
In summary, we can identify five main stages for the SimDAP service.
</p>
<dl>
<dt>Selection of simulations and data</dt>
<dd>(Section 3) Potentially interesting simulations are selected according to the results of a simulation discovery procedure (not part of the SimDAP protocol).</dd>
<dt>Identification of the subset of interest</dt>
<dd>(Section 4) The user identifies a subset of the full simulation data which is of interest.</dd>
<dt>SimDAP request</dt>
<dd>(Section 5) The client sends the selection parameters for the SimDAP operation to the server.</dd>
<dt>Data staging and delivery</dt>
<dd>(Sections 6 and 7) Metadata are delivered to the client as a VOTable or a more general XML file. Data are staged and delivered (possibly after some time, needed for the extraction) via HTTP, FTP, etc. as binary files plus XML descriptors.
The VOTable and the binary data files can be delivered in two separate stages.</dd>
<dt>Service registration</dt>
<dd>(Section 8) SimDAP services need to be published in the available registries. Registry queries must be performed according to the SimDB data model.</dd>
</dl>
<br/>
<h2><a name="sec2">2 Requirements for Compliance</a></h2>
<p>The keywords "MUST", "REQUIRED", "SHOULD", and "MAY" as used in this document are to be interpreted as described in RFC 2119 [34]. An implementation is compliant if it satisfies all the MUST or REQUIRED level requirements for the protocols it implements. An implementation that satisfies all the MUST or REQUIRED level and all the SHOULD level requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST level requirements but not all the SHOULD level requirements for its protocols is said to be "conditionally compliant".</p>

<p>Compliance with this specification requires that a SimDAP service is maintained with the following characteristics:</p>

<ol>
<li><font color="red">The service MUST support a Simulation Selection service as described in section 3 below. The SimDAP service MUST provide tools to select the datasets and the regions of interest and to proceed with the following steps of the SimDAP procedure.</font></li>

<li><font color="red">The SimDAP service MUST support a getUnits method (or a getFields method; to be discussed). This method allows clients to get the list of units associated with the available fields.</font></li>

<li><font color="red">The Sub-Volume Extraction method SHOULD be supported as defined in section 4 below. If it is supported, a getThumb method MUST be available.
This method allows clients to retrieve data from a spatially defined sub-volume of the simulation box. The client determines the rectangular or spherical region within the simulation, the bounds and scale (i.e. units) of which are specified in the simulation metadata, and the service returns the simulation data contained within this region. The service SHOULD use a staging method (section 6) to return the particle file, since extracting a sub-sample of particles or grid points from a larger simulation box is likely to be a time-consuming process and thus requires some kind of caching.</font></li>

<li><font color="red">The setSimDAP method MUST be supported as defined in section 5 below. This method allows clients to submit a SimDAP operation.</font></li>

<li><font color="red">The data retrieval (getSimDAP) method MUST be supported as defined in section 7 below.
This method allows clients to retrieve single simulation snapshots and cutouts.</font></li>

<li><font color="red">The SimDAP service MUST be registered by providing the information defined in section 8 below. Registration allows clients to use a central registry service to locate compliant simulation access services and to select an optimal subset of services to query, based on the characteristics of each service and the simulation data collections it serves.</font></li>

<li><font color="red">Job management request methods, getSimDAPInfo and cancelSimDAP, MAY be supported.
These methods allow users to inquire about the status of a submitted request and, possibly, to cancel it.</font></li>
</ol>
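<p>The methods named in the list above can be summarized as a Python interface sketch. The signatures are assumptions: the draft does not yet fix parameter types or a transport binding, and output arguments such as STATUS and SimDAPINFO are modelled here as return values. The method names keep the protocol's camelCase spelling rather than Python's usual snake_case.</p>

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class SimDAPService(Protocol):
    """Methods that a compliant service MUST offer (signatures are illustrative)."""

    def getUnits(self, dataservice: str, datasource: str) -> List[str]:
        """Units of the available fields (a getFields variant is still under discussion)."""
        ...

    def getThumb(self, dataservice: str, datasource: str) -> str:
        """Thumbnail description as VOTable text (section 4)."""
        ...

    def setSimDAP(self, params: Dict[str, str]) -> str:
        """Submit a SimDAP operation; returns the result VOTable as text (section 5)."""
        ...

    def getSimDAP(self, acref: str, service: str) -> str:
        """Retrieve a staged file by access reference; returns the STATUS string (section 7)."""
        ...


@runtime_checkable
class SimDAPJobControl(Protocol):
    """Optional (MAY) job-management methods (section 6)."""

    def getSimDAPInfo(self, service: str, request_id: int) -> str:
        """Return the SimDAPINFO string for a submitted request."""
        ...

    def cancelSimDAP(self, service: str, request_id: int) -> str:
        """Ask the service to cancel a request; returns "Ok" or "Rejected"."""
        ...
```

<p>Keeping the optional job-management methods in a separate protocol lets a client check at run time, with <code>isinstance</code>, whether a given service object also offers them. Note that <code>runtime_checkable</code> only checks that the methods exist, not their signatures.</p>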

<br/>
<h2><a name="sec3">3 Simulation Selection</a></h2>
<p>Search and exploration of the available data archives and collections are described in detail in REF. The user selects the datasets of interest. Each dataset is characterized by a set of metadata according to the SimDM. Several of these parameters represent the input data for all the SimDAP actions:</p>
<ul>
<li>DATASERVICE = unique identifier of the data provider</li>
<li>DATASOURCE = unique identifier of the dataset at the data provider</li>
<li>SIMULATION = </li>
</ul>
<p>Two other parameters are used at some stages of the SimDAP&nbsp;protocol:</p>
<ul>
<li>FIELDS = list of the physical quantities selected for the extraction (a subset of the complete list available from the data search)</li>
<li>UNITS = list of the units of the selected physical quantities, as stored in the data archive</li>
</ul>
<p>Notice that FIELDS and UNITS must have the same number of entries. For example:</p>
<p><font face="Courier New,Courier,Monaco">FIELDS = &quot;xposition,yposition,zposition,velocity,temperature&quot; UNITS = &quot;Mpc,Mpc,Mpc,km/s,K&quot;</font></p>
<p>UNITS is used to convert between client-side and server-side units. In the case of dimensionless or normalized quantities, UNITS must specify a conversion factor to a physical unit. For instance, if the coordinates of an N-Body simulation are defined between 0 and 1, UNITS&nbsp;must specify that L(adim)=1 corresponds to L(phys)=100&nbsp;Mpc, where L() represents a length measure. This is specified by the SimDM.</p>
<p>All the parameters are retrieved as specified in REF.</p>
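<p>The pairing between FIELDS and UNITS, and the conversion of a normalized coordinate through a factor such as L(adim)=1 corresponding to 100&nbsp;Mpc, can be sketched as follows. The sketch assumes that individual unit strings contain no commas and that the conversion factor has already been read from the SimDM metadata; the helper names are hypothetical.</p>

```python
from typing import Dict


def pair_fields_units(fields: str, units: str) -> Dict[str, str]:
    """Map each FIELDS entry to the UNITS entry in the same position."""
    names = [name.strip() for name in fields.split(",")]
    unit_list = [unit.strip() for unit in units.split(",")]
    if len(names) != len(unit_list):
        raise ValueError(f"FIELDS has {len(names)} entries but UNITS has {len(unit_list)}")
    return dict(zip(names, unit_list))


def to_physical(value: float, physical_per_unit: float) -> float:
    """Dimensionless -> physical, e.g. with L(adim)=1 corresponding to 100 Mpc."""
    return value * physical_per_unit


def to_normalized(value: float, physical_per_unit: float) -> float:
    """Physical -> dimensionless, e.g. to express a client request in box units."""
    return value / physical_per_unit


units = pair_fields_units("xposition,yposition,zposition,velocity,temperature",
                          "Mpc,Mpc,Mpc,km/s,K")
print(units["velocity"])           # prints km/s
print(to_physical(0.25, 100.0))    # prints 25.0, i.e. 25 Mpc
print(to_normalized(40.0, 100.0))  # prints 0.4
```

<p>With a six-entry string such as &quot;Mpc,Mpc,Mpc,km,sec-1,K&quot;, <code>pair_fields_units</code> raises <code>ValueError</code>, because a comma inside a unit is read as a separator; this is why composite units have to be written without commas.</p>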
<br/>
<h2><a name="sec4">4 Data Preview</a></h2>
<p>The result of a SimDAP request can be the whole snapshot or a part of it. The service MUST provide tools that enable the client to:</p>
<ul>
<li>preview the available datasets,</li>
<li>select a subset by specifying its size and position.</li>
</ul>
<p>Both these actions may require a &quot;simplified&quot; (but meaningful) version of the data, namely a <i>thumbnail</i>, that is easy to download and handle. The preview and thumbnail features depend on the data, and their implementation is up to the data provider. For example, they could be represented by:</p>
<ul>
<li>projections of the computational box along the three coordinate directions (images),</li>
<li>a random or decimated sample of the dataset (in particular for point-like data),</li>
<li>a reduced-resolution realization of the dataset (e.g. averages over neighboring cells of a computational mesh),</li>
<li>a &quot;clever&quot; selection of regions according to specific criteria (e.g. &quot;overdense&quot; regions), implemented by suitable algorithms.</li>
</ul>
<p>In any case, a getThumb method MUST be implemented. The input of this method is the pair of parameters DATASERVICE and DATASOURCE, which identify the dataset of interest. The output is a VOTable in the TVO standard. As for the results of the SimDAP procedure, thumbnail data are stored in external binary files. However, these files are downloaded immediately (together with the VOTable, as the response to the web method), since they are small and may be precalculated.</p>
<table border="1" cellpadding="4" cellspacing="5">
<tr>
<td>Method</td>
<td>Input parameters</td>
<td>Output</td>
<td>Result</td>
</tr>
<tr>
<td>getThumb</td>
<td>DATASERVICE<br/>
DATASOURCE</td>
<td>INFO</td>
<td>VOTable</td>
</tr>
</table>
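<p>Two of the thumbnail strategies listed above, a decimated or random sample of point-like data and a reduced-resolution mesh obtained by averaging neighbouring cells, are sketched below in Python. The functions are hypothetical helpers that a provider might use to precompute thumbnails; they are not part of the getThumb interface, whose implementation is left to the data provider.</p>

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def decimate(rows: Sequence[T], stride: int) -> List[T]:
    """Regular decimation: keep every `stride`-th element."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(rows[::stride])


def random_thumbnail(rows: Sequence[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Random sample of at most n elements, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(rows), min(n, len(rows)))


def block_average(grid: List[List[float]], factor: int) -> List[List[float]]:
    """Reduced-resolution 2D mesh: average each factor x factor block of cells."""
    ny, nx = len(grid), len(grid[0])
    if ny % factor or nx % factor:
        raise ValueError("mesh dimensions must be multiples of factor")
    coarse = []
    for j in range(0, ny, factor):
        row = []
        for i in range(0, nx, factor):
            block = [grid[jj][ii] for jj in range(j, j + factor)
                     for ii in range(i, i + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse
```

<p>The mesh example is two-dimensional for brevity; a 3D mesh would be averaged over factor&sup3; cells in the same way.</p>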
<br/>
<h2><a name="sec5">5 SimDAP Request</a></h2>
<p>The main purpose of the SimDAP service is to provide access to the snapshot data selected by a general Simulation Query, described in section 3. In general, the SimDAP service provides the following functionalities:</p>
<ol>
<li>Extraction of a suitably selected subset of the data (data cutout)</li>
<li>Storage of the associated metadata in a VOTable (see later in this section) delivered to the client</li>
<li>Staging of the extracted data and their delivery to the client via HTTP, FTP, etc. (see sections 6 and 7)</li>
</ol>
<p>In principle, the extraction phase (1) could be performed using any of the N parameters that characterize the simulation. However, for simplicity, in a first stage of development we focus on geometric selections, assuming that the user wants to select a rectangular sub-region of the entire computational volume without having to download the whole dataset. Of course, it is still possible to retrieve the complete dataset. This can be seen as a degenerate cutout request, with a region of interest that covers the entire computational volume. Notice that this action is not just a simple download, since action 2 is performed as well.</p>
<p>In order to submit the SimDAP request, a <b>setSimDAP</b> method MUST be implemented, with inputs and outputs defined in the following subsections.</p>
<h3><a name="sec5_1" id="sec5_1">5.1 setSimDAP Input</a></h3>
<h4><a name="sec5_1_1" id="sec5_1_1">5.1.1 Selection parameters</a></h4>
<p>Simulation outputs are stored in files. These files can be indicated by a reference name that unambiguously identifies the data source. This link can be provided directly by the client, by registries and/or by the middleware software on which a distributed archive is built (e.g. SRB, OGSA-DAI). The data source can also be a database. However, this does not imply anything about the service interface implementation: the complexity of the database access is hidden behind the setSimDAP operation, whose implementation is up to the service provider.</p>
<p>The service id MUST also be specified.</p>
<p>A SimDAP operation MUST refer to a single data source. Multi-source cutouts, for example over various time steps of the same simulation, are not supported by the protocol. Their implementation is up to the client, for example as sequences of single-source requests with the same sub-box and fields. The client must verify that such an operation is possible and/or meaningful.</p>
<dl>
<dt>DATASERVICE</dt>
<dd>Identification of the data service (to be better specified)</dd>
<dt>DATASOURCE</dt>
<dd>The service MUST support a parameter named DATASOURCE, the value of which is a single data source reference. The DATASOURCE parameter MUST be set.</dd>
</dl>
<p>Examples:</p>
<pre>
DATASOURCE=/scratch/my_directory/myfile1.bin
DATASOURCE=myfile2.ref
</pre>
<h4><a name="sec5_1_2" id="sec5_1_2">5.1.2 Geometric variables</a></h4>
<p>The GEOM parameter selects the snapshot variables (e.g. the coordinates of the particles in an N-Body simulation) on which the selection is defined. It is specified as a comma-separated list of strings.</p>
<p>Example: GEOM=&quot;xpos,ypos,zpos&quot;</p>
<h4><a name="sec5_1_3" id="sec5_1_3">5.1.3 Region of interest</a></h4>
<p>To select the region of interest, two parameters are necessary for each dimension:</p>
<ul>
<li>Lower Value: MIN</li>
<li>Upper Value: MAX</li>
</ul>
<p>The MIN and MAX parameters are converted client-side to the proper units using the corresponding UNITS&nbsp;parameter. For discrete fields, MIN&nbsp;and MAX are approximated so that they represent the smallest interval containing the requested one. The corrected MIN and MAX values are returned to the user.</p>
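<p>For a discrete field, the widening of MIN and MAX to the smallest interval of whole cells can be sketched as follows for one dimension of a regular mesh with <code>ncells</code> cells covering [x0, x1]. This is a minimal sketch under that regular-mesh assumption (an adaptive mesh would need a different cell lookup); the function name and the round-off tolerance <code>eps</code> are illustrative choices.</p>

```python
import math
from typing import Tuple


def snap_to_cells(lo: float, hi: float, x0: float, x1: float, ncells: int,
                  eps: float = 1e-9) -> Tuple[float, float, int, int]:
    """Widen [lo, hi] to the smallest whole-cell interval of a regular 1D mesh.

    Returns the corrected MIN and MAX plus the index of the first cell and
    the index one past the last cell. `eps` absorbs round-off when a bound
    falls exactly on a cell edge; bounds outside the mesh are clipped.
    """
    if lo > hi:
        raise ValueError("MIN must not exceed MAX")
    scale = ncells / (x1 - x0)
    first = math.floor((lo - x0) * scale + eps)
    stop = math.ceil((hi - x0) * scale - eps)
    first = max(0, min(first, ncells))
    stop = max(0, min(stop, ncells))
    width = (x1 - x0) / ncells
    return x0 + first * width, x0 + stop * width, first, stop


print(snap_to_cells(0.3, 0.5, 0.0, 1.0, 8))  # prints (0.25, 0.5, 2, 4)
```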
<p>MIN&nbsp;and MAX are expressed as N-tuples of comma-separated values (embedded whitespace is not permitted). A NULL value represents the minimum or maximum value of the selection parameter in that dimension.</p>
<p>Example: &quot;MIN=0.3,0.25,0.1&quot;, &quot;MAX=0.5,NULL,0.5&quot;.</p>
<p>The number of values in MIN&nbsp;and MAX and their order MUST&nbsp;be the same as those of the geometric variables.</p>
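<p>The GEOM, MIN and MAX rules above (comma-separated values, no embedded whitespace, NULL as an open bound, equal number of entries in the same order) can be checked and applied with a small Python sketch. It assumes that the quotes shown in the examples are not part of the values and that point-like data are available as dictionaries keyed by variable name; the helper names are hypothetical.</p>

```python
from typing import Dict, List, Optional, Sequence, Tuple

Bound = Optional[float]  # None stands for NULL (open bound)
Region = List[Tuple[str, Bound, Bound]]


def parse_bounds(value: str) -> List[Bound]:
    """Parse a MIN or MAX value such as '0.5,NULL,0.5'."""
    if any(ch.isspace() for ch in value):
        raise ValueError("embedded whitespace is not permitted")
    return [None if item == "NULL" else float(item) for item in value.split(",")]


def parse_region(geom: str, min_value: str, max_value: str) -> Region:
    """Combine GEOM, MIN and MAX, enforcing the same number of entries."""
    names = geom.split(",")
    lows, highs = parse_bounds(min_value), parse_bounds(max_value)
    if not (len(names) == len(lows) == len(highs)):
        raise ValueError("GEOM, MIN and MAX must have the same number of entries")
    return list(zip(names, lows, highs))


def select(rows: Sequence[Dict[str, float]], region: Region) -> List[Dict[str, float]]:
    """Return the rows whose GEOM variables fall inside the region."""
    def inside(row: Dict[str, float]) -> bool:
        for name, lo, hi in region:
            if lo is not None and row[name] < lo:
                return False
            if hi is not None and row[name] > hi:
                return False
        return True
    return [row for row in rows if inside(row)]


region = parse_region("xpos,ypos,zpos", "0.3,0.25,0.1", "0.5,NULL,0.5")
print(region)  # prints [('xpos', 0.3, 0.5), ('ypos', 0.25, None), ('zpos', 0.1, 0.5)]
```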
<h4><a name="sec5_1_4" id="sec5_1_4">5.1.4 Selected variables</a></h4>
<p>The user can specify the physical quantities of interest, which can be a subset of the available ones.</p>
<dl>
<dt>FIELDS</dt>
<dd>The service MAY support an optional parameter named FIELDS, the value of which is a comma-separated list of field names corresponding to the data elements the simulation can return. If the parameter is not provided or is set to NULL, all fields are returned. Field names are defined according to the standards (see X.X).</dd>
</dl>
<p>Example: &quot;FIELDS=Density,Temperature,Velocity_x&quot;</p>
<h4><a name="boundaries" id="boundaries">5.1.5 Boundaries</a></h4>
<p>For regions that intersect the boundary of the simulation box, the service can apply different types of boundary conditions. Possible choices are truncated boundary conditions (the sub-box is truncated at the box boundaries) or periodic boundary conditions (if applicable). The service MAY support the following parameter specifying the adopted boundary conditions:</p>
<dl>
<dt>BOUNDARY</dt>
<dd>This parameter has one value for each selection dimension. Possible values are:
<dl>
<dt>TRUNC</dt>
<dd>if the selected interval exceeds the computational box, it is truncated at the box boundary</dd>
<dt>PERIODIC</dt>
<dd>if the selected interval exceeds the computational box, the data beyond the boundary are taken from the opposite side of the box</dd>
<dt>CUSTOM</dt>
<dd>service dependent</dd>
</dl>
</dd>
</dl>
<p>The registry metadata of the service indicate which kinds of boundary conditions are supported.</p>
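<p>The difference between TRUNC and PERIODIC along a single dimension is sketched below: truncation clips the requested interval to the box, while periodic conditions wrap the overflowing part to the opposite side, which may split one request into two intervals. Since BOUNDARY carries one value per selection dimension, a service would apply such a function dimension by dimension. CUSTOM is service dependent and is therefore rejected by this hypothetical helper.</p>

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def apply_boundary(lo: float, hi: float, box_min: float, box_max: float,
                   mode: str) -> List[Interval]:
    """Map a requested [lo, hi] onto the box [box_min, box_max] along one dimension."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    length = box_max - box_min
    if mode == "TRUNC":
        a, b = max(lo, box_min), min(hi, box_max)
        return [(a, b)] if a <= b else []  # empty if the request misses the box
    if mode == "PERIODIC":
        if hi - lo >= length:
            return [(box_min, box_max)]  # the request covers the whole period
        a = box_min + (lo - box_min) % length  # shift the start into the box
        b = a + (hi - lo)
        if b <= box_max:
            return [(a, b)]
        # Split: the part beyond box_max re-enters from box_min.
        return [(a, box_max), (box_min, b - length)]
    raise ValueError(f"unsupported BOUNDARY value: {mode}")


print(apply_boundary(0.75, 1.25, 0.0, 1.0, "PERIODIC"))  # prints [(0.75, 1.0), (0.0, 0.25)]
```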
<h4><a name="sec5_1_5">5.1.6 Service Defined Parameters</a></h4>
<p>
The service MAY support additional service-specific parameters. The names, meanings, and allowed values are defined by the service. The names need not be upper-case; however, they should not match any of the reserved parameter names defined above.
</p>
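<p>The draft describes setSimDAP as a web method without fixing its transport. Assuming, for illustration only, an HTTP GET binding with one query parameter per SimDAP parameter, a client could assemble a request as in the following Python sketch. The endpoint URL is a placeholder, comma-separated BOUNDARY values are an assumption, and comparing service-defined names against the reserved ones case-insensitively is a conservative choice.</p>

```python
from typing import Dict, Optional, Sequence
from urllib.parse import urlencode

# Parameter names defined in section 5.1; service-defined parameters must avoid them.
RESERVED = {"DATASERVICE", "DATASOURCE", "GEOM", "MIN", "MAX", "FIELDS", "BOUNDARY"}


def _tuple(values: Sequence[Optional[float]]) -> str:
    """Format an N-tuple without whitespace, writing None as NULL."""
    return ",".join("NULL" if v is None else repr(float(v)) for v in values)


def build_setsimdap_url(base_url: str, dataservice: str, datasource: str,
                        geom: Sequence[str], mins: Sequence[Optional[float]],
                        maxs: Sequence[Optional[float]],
                        fields: Optional[Sequence[str]] = None,
                        boundary: Optional[Sequence[str]] = None,
                        extra: Optional[Dict[str, str]] = None) -> str:
    if not (len(geom) == len(mins) == len(maxs)):
        raise ValueError("GEOM, MIN and MAX must have the same number of entries")
    params = {
        "DATASERVICE": dataservice,
        "DATASOURCE": datasource,
        "GEOM": ",".join(geom),
        "MIN": _tuple(mins),
        "MAX": _tuple(maxs),
    }
    if fields:
        params["FIELDS"] = ",".join(fields)
    if boundary:
        if len(boundary) != len(geom):
            raise ValueError("BOUNDARY needs one value per selection dimension")
        params["BOUNDARY"] = ",".join(boundary)  # assumption: comma-separated
    for name, value in (extra or {}).items():
        if name.upper() in RESERVED:
            raise ValueError(f"service-defined parameter {name!r} clashes with a reserved name")
        params[name] = value
    return f"{base_url}?{urlencode(params)}"


url = build_setsimdap_url("https://example.org/simdap/setSimDAP", "myService", "myfile2.ref",
                          ["xpos", "ypos", "zpos"], [0.3, 0.25, 0.1], [0.5, None, 0.5],
                          fields=["Density", "Temperature"],
                          boundary=["PERIODIC", "PERIODIC", "TRUNC"])
print(url)
```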
<h3><a name="sec5_2">5.2 setSimDAP Output</a></h3>
<p>The output produced by a SimDAP cutout request is a VOTable, an XML table format, returned with a MIME type of text/xml, plus an external binary file with the extracted data. The VOTable is characterized by the following items:</p>
<ol>
<li>The VOTable MUST contain a RESOURCE element, identified by the attribute type="results", containing one or more TABLE elements with the metadata results of the setSimDAP operation. The VOTable is permitted to contain additional RESOURCE elements, but the usage of any such elements is not defined here. If multiple resources are present, it is recommended that the query results be returned in the first RESOURCE element.</li>
<li>The VOTable MUST contain a DATASERVICE parameter which identifies the service used.</li>
<li>The VOTable MUST contain a REQUEST_ID parameter which uniquely identifies the job request on the service. REQUEST_ID is a 4-byte integer.</li>
<li>The VOTable MUST contain a REQUEST_STATUS parameter whose value is either Ok or Rejected. In the latter case, none of the other fields of the VOTable are present.</li>
<li>The TABLE elements contain the different species extracted from the dataset. Species can differ either in their geometrical representation (e.g. particles, regular meshes, AMR meshes…) or in their "physical meaning" (e.g. star particles vs. dark matter particles). All the FIELDs in a table have the same number of elements, specified by the arraysize attribute. This attribute also sets the geometry of the quantity: e.g. arraysize="N" represents a point-like quantity, while arraysize="NxMxS" represents a grid-based variable. For point-like quantities arraysize is NOT mandatory, since often it cannot be calculated on the fly. The resulting data FIELDs are stored one after the other in a single binary file, in the same order as they appear in the VOTable.</li>
<li>Each TABLE MUST contain FIELDs for which UCDs have been set. The FIELDs refer to the variables stored in the external binary file.</li>
<li>Variables must be scalars. Vectors and, more generally, multidimensional quantities are not supported. This means that each FIELD represents a scalar value, e.g. the temperature of each point or the x coordinate of a particle.</li>
<li>Each FIELD must specify the datatype and the unit of the variable. Furthermore, name, ID and ucd have to be set. The UCDs for simulations are still being defined, therefore they are not detailed here.</li>
<li>The acref reference to the binary data file is specified in a DATA section, according to the rules defined in other documents (e.g. the SIAP specification).</li>
</ol>
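<p>A client can check the items above with Python's standard <code>xml.etree.ElementTree</code> module, as in the sketch below. It assumes that DATASERVICE, REQUEST_ID and REQUEST_STATUS are PARAM elements (with <code>name</code> and <code>value</code> attributes) placed directly inside the results RESOURCE, and it ignores XML namespaces; this draft fixes neither the placement nor the namespace, and the function name is hypothetical.</p>

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}TABLE' -> 'TABLE'."""
    return tag.rsplit("}", 1)[-1]


def parse_setsimdap_result(votable_xml: str) -> Dict[str, Any]:
    root = ET.fromstring(votable_xml)
    resource = next((el for el in root.iter()
                     if _local(el.tag) == "RESOURCE" and el.get("type") == "results"), None)
    if resource is None:
        raise ValueError('no RESOURCE with type="results"')
    # Assumption: the job PARAMs are direct children of the results RESOURCE.
    params = {p.get("name"): p.get("value") for p in resource if _local(p.tag) == "PARAM"}
    status = params.get("REQUEST_STATUS")
    if status == "Rejected":
        return {"REQUEST_STATUS": status, "tables": []}  # nothing else is present
    if status != "Ok":
        raise ValueError(f"unexpected REQUEST_STATUS: {status!r}")
    request_id = int(params["REQUEST_ID"])
    if not -2**31 <= request_id < 2**31:
        raise ValueError("REQUEST_ID does not fit in a 4-byte integer")
    tables = []
    for table in resource.iter():
        if _local(table.tag) != "TABLE":
            continue
        fields = [{key: field.get(key)
                   for key in ("name", "ID", "datatype", "unit", "ucd", "arraysize")}
                  for field in table if _local(field.tag) == "FIELD"]
        tables.append({"name": table.get("name"), "fields": fields})
    return {"DATASERVICE": params.get("DATASERVICE"), "REQUEST_ID": request_id,
            "REQUEST_STATUS": status, "tables": tables}
```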
<p>
Other parameters may be supported according to the services offered by the data provider.
</p>
<br/>
<h2><a name="sec6">6 Data Staging</a></h2>
<p>
By Data Staging we refer to the processing the server performs to retrieve or generate the requested simulation volumes and cache them in online storage for retrieval by a client. Staging is necessary for large archives which must retrieve simulation data from hierarchical storage, or for services which dynamically extract sub-volumes, where it may take a substantial time (e.g. minutes or hours) to retrieve the particles in the relevant region of the simulation box. Issuing a staging request for a set of simulation sub-volumes (e.g. for a set of small cubes randomly placed in a simulation box) also permits large servers to optimize sub-volume extraction, for example by taking advantage of parallelization for large requests.
</p>
<p>
The snapshot staging service is optional for the simulation server. If staging is not implemented, data should be immediately available for retrieval (URL pointing directly to the file). The availability of this function is communicated to the registry services.
</p>
<p>
When staging of data is necessary, the data are staged on the server for later retrieval by the client. The data are staged only for a limited period of time and are eventually deleted by the service. The getSimDAP method (see section 7) is identical whether or not staging is used. The service can proceed to generate the simulation sub-volume regardless of the state or accessibility of the client.
</p>
<p>
As soon as the staged data are available at the given URL, the user can start the download. The user can be informed of the availability of the data following two different approaches:
</p>
<ol>
<li>The client checks the service for the data (e.g. reloads a web/FTP page).</li>
<li>The service searches for the client and, if it is present, sends information to it.</li>
</ol>
<p>
The first approach is simpler. In its most straightforward implementation, it simply consists of having the client reload the data URL to see whether the data are there.
</p>
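<p>The first approach amounts to a polling loop. The sketch below assumes a caller-supplied function <code>get_status</code> (hypothetical) that invokes getSimDAP and returns its STATUS string (Ok, Rejected or Deferred, see section 7); it waits with exponential back-off while the data are deferred, so that a slow extraction does not flood the service with requests.</p>

```python
import time
from typing import Callable


def wait_for_staged_data(get_status: Callable[[], str], interval: float = 5.0,
                         max_interval: float = 300.0, timeout: float = 3600.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the STATUS is no longer Deferred; return it ("Ok" or "Rejected")."""
    waited = 0.0
    while True:
        status = get_status()
        if status != "Deferred":
            return status
        if waited >= timeout:
            raise TimeoutError(f"data still deferred after {waited:.0f} s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)  # exponential back-off, capped
```

<p>Passing <code>sleep</code> as a parameter keeps the loop testable without real waiting; the default uses <code>time.sleep</code>.</p>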
<p>
In the second approach, the staging mechanism should provide a messaging capability. The service broadcasts messages to subscribing clients whenever a staging (processing) event occurs, such as when the sub-volume extraction has been completed and is available for retrieval. Service generated messages can also be used to pass informational or diagnostic messages to clients as processing proceeds. This type of messaging is asynchronous and one way: the service broadcasts messages to subscribing clients as things happen, whereas clients send requests to the service to invoke web methods. For example, to initiate staging, subscribe to staging-related messages, or abort a staging operation in progress, the client sends a request to (invokes a web method provided by) the service.
</p>
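<p>The one-way messaging of the second approach follows the publish/subscribe pattern. The in-process Python sketch below only illustrates that pattern: a real service would deliver the messages asynchronously over the network. The class name and the event name <code>EXTRACTION_DONE</code> are hypothetical.</p>

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives (request_id, event, message).
Handler = Callable[[int, str, str], None]


class StagingNotifier:
    """Minimal publish/subscribe hub for staging events, keyed by REQUEST_ID."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[int, List[Handler]] = defaultdict(list)

    def subscribe(self, request_id: int, handler: Handler) -> None:
        """Client side: ask to be told about events for one request."""
        self._subscribers[request_id].append(handler)

    def publish(self, request_id: int, event: str, message: str = "") -> int:
        """Service side: broadcast an event; returns how many subscribers were notified."""
        handlers = self._subscribers.get(request_id, [])
        for handler in handlers:
            handler(request_id, event, message)
        return len(handlers)
```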
342 <p>
343 SimDAP is not just a search-and-download service, but it requires also running processes and, possibly, managing them (see later in this section). Therefore the authentication of the client should be required. This is strictly required for approach 2, in which the user must be detected and identified by the service. However, authentication should be always necessary for security and privacy reasons: access to the services should be granted only to "trustable" users with proper privileges (some data could be available only for specific communities etc.) and extracted data should be accessible only to the user who performed the request.
344 </p>
345 <p>
346 Authentication could be on a username-password basis or on some more sophisticated methods, like certificates. This choice is up to the service provider. Authentication allows the user to use the scheduling/batch system which is implemented by the service provider. This system set all the policies of access to the resources (requests pipeline, multiple requests from the same user, CPU time limits, accounting). Obviously, also these choices are up to the provider, who is only required to notify all the available features to the registry service.
347 </p>
<p>
Since SimDAP requests are staged, the provider should support at least two basic operations:
</p>
<ul>
<li>Job monitoring</li>
<li>Job cancellation</li>
</ul>
<p>
The specific implementation of these two operations depends on the adopted service technology.
</p>
<p>
Both operations use the SERVICE and REQUEST_ID parameters reported in the VOTable. They are invoked through the following web methods:
</p>
<ul>
<li><code>getSimDAPInfo(SERVICE, REQUEST_ID, SimDAPINFO)</code></li>
<li><code>cancelSimDAP(SERVICE, REQUEST_ID, SimDAPINFO)</code></li>
</ul>
<p>The getSimDAPInfo method returns a SimDAPINFO string containing the following information: STATUS (Idle, Hold, Cancelled, Running, reJected), SUBMISSION_DATE, and any further fields defined by the service provider and declared to the registry. The cancelSimDAP method returns a SimDAPINFO string whose value is either &quot;Ok&quot; or &quot;Rejected&quot;. Additional methods can be implemented and registered by the provider.
</p>
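<p>
A client-side sketch of job monitoring and cancellation follows. The layout of the SimDAPINFO string is not specified here, so the parser assumes a hypothetical <code>KEY=VALUE;KEY=VALUE</code> form. <code>get_info</code> and <code>cancel</code> are assumed to be zero-argument callables wrapping calls to getSimDAPInfo and cancelSimDAP with SERVICE and REQUEST_ID already bound (for example with <code>functools.partial</code>); the underlying transport is left open. Since the STATUS list has no explicit "completed" value, the sketch simply stops polling once STATUS leaves Idle, Hold and Running.
</p>

```python
import time
from typing import Callable, Dict

# STATUS values listed for getSimDAPInfo that mean "not finished yet".
ACTIVE = {"Idle", "Hold", "Running"}


def parse_simdap_info(info: str) -> Dict[str, str]:
    """Parse a SimDAPINFO string, assuming a hypothetical "KEY=VALUE;KEY=VALUE" layout."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if item.strip():
            key, _, value = item.partition("=")
            fields[key.strip()] = value.strip()
    return fields


def wait_or_cancel(get_info: Callable[[], str], cancel: Callable[[], str],
                   timeout_s: float, poll_s: float = 5.0) -> Dict[str, str]:
    """Poll getSimDAPInfo until STATUS leaves the active states; cancel on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        fields = parse_simdap_info(get_info())
        if fields.get("STATUS") not in ACTIVE:
            return fields
        if time.monotonic() >= deadline:
            if cancel() != "Ok":          # cancelSimDAP answers "Ok" or "Rejected"
                raise RuntimeError("cancelSimDAP request was rejected")
            return parse_simdap_info(get_info())
        time.sleep(poll_s)
```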

<br/>
<h2><a name="sec7">7 Data Delivery</a></h2>
<p>
The getSimDAP(acref, SERVICE, STATUS) web method allows a client to retrieve a single raw simulation file, given its access reference (acref) in the result VOTable. The file can contain more than one variable and can be in any of the formats defined in Section 5. Files can be downloaded using HTTP, FTP, GridFTP or any other suitable protocol. All the metadata describing the content and structure of the data file are stored in the associated VOTable (see Appendix A).</p>
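<p>
Because the acref is simply a URL, a client can fetch the referenced file with a generic URL library, choosing the access protocol from the URL scheme. The sketch below uses Python's standard <code>urllib</code>, which handles <code>http</code>, <code>https</code>, <code>ftp</code> and <code>file</code> URLs; GridFTP (<code>gsiftp://</code>) would need a dedicated client such as <code>globus-url-copy</code> and is deliberately rejected here. The function name <code>fetch_acref</code> and the fallback file name are hypothetical.
</p>

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# Schemes urllib can open out of the box; GridFTP needs a dedicated client.
SUPPORTED = {"http", "https", "ftp", "file"}


def fetch_acref(acref: str, dest_dir: Path) -> Path:
    """Download the file referenced by a VOTable acref into dest_dir."""
    parsed = urllib.parse.urlparse(acref)
    if parsed.scheme not in SUPPORTED:
        raise ValueError(f"unsupported access protocol: {parsed.scheme!r}")
    name = Path(urllib.parse.unquote(parsed.path)).name or "simdap.dat"
    target = Path(dest_dir) / name
    with urllib.request.urlopen(acref) as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)   # stream in chunks: snapshots can be large
    return target
```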
<p>
The getSimDAP method returns a STATUS string whose value is Ok, Rejected or Deferred (the latter if the data are not yet available).
XML header files are stored as well, and they are downloaded together with the binary file through the same getSimDAP method.</p>
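<p>
Since a Deferred answer only means that the data are not ready yet, a client will typically retry later. A minimal sketch with exponential back-off is shown below; <code>get_simdap</code> is a hypothetical zero-argument callable wrapping one getSimDAP call and returning its STATUS string, and the attempt count and delays are arbitrary example values.
</p>

```python
import time
from typing import Callable


def retrieve_with_retry(get_simdap: Callable[[], str], max_attempts: int = 10,
                        first_delay_s: float = 2.0, max_delay_s: float = 300.0) -> str:
    """Call getSimDAP until it stops answering "Deferred"; return the final STATUS."""
    delay = first_delay_s
    for attempt in range(1, max_attempts + 1):
        status = get_simdap()
        if status in ("Ok", "Rejected"):
            return status
        if status != "Deferred":
            raise ValueError(f"unexpected STATUS: {status!r}")
        if attempt < max_attempts:
            time.sleep(delay)
            delay = min(delay * 2, max_delay_s)   # exponential back-off
    return "Deferred"   # data still not available after max_attempts
```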


<br/>
<h2><a name="sec8">8 Service Registration</a></h2>
<p>
The following features and methods MUST be published to the registration service.
</p>


<h2><a name="appA">Appendix A: VOTable examples</a></h2>

<h3><a name="appA_1">A.1 VOTable for the velocity field of a fluid on a fixed 3D mesh</a></h3>

<p>
[GL – I guess we still need a proper way of indicating the spatial dimensions of a representation like this. FITS has its WCS system for implicitly specifying the spatial coordinates of a multidimensional array. Does something similar exist for VOTable? We need to inquire.]
</p>

<pre>
&lt;VOTABLE&gt;
&lt;RESOURCE name="myVectorField" type="results"&gt;
  &lt;DESCRIPTION&gt;Velocity Field from N-Body run&lt;/DESCRIPTION&gt;
  &lt;INFO name="QUERY_STATUS" value="OK"/&gt;

  &lt;TABLE name="VelocityField" ID="Vel" order="sequential"&gt;
    &lt;FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x" datatype="float"
           arraysize="41x41x41" unit="km/s" geometry="mesh" /&gt;
    &lt;FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y" datatype="float"
           arraysize="41x41x41" unit="km/s" geometry="mesh" /&gt;
    &lt;FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z" datatype="float"
           arraysize="41x41x41" unit="km/s" geometry="mesh" /&gt;
    &lt;DATA&gt;
      &lt;BINARY&gt;
        &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
      &lt;/BINARY&gt;
    &lt;/DATA&gt;
  &lt;/TABLE&gt;
&lt;/RESOURCE&gt;
&lt;/VOTABLE&gt;
</pre>
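<p>
The following sketch shows how a client might read the stream described by this VOTable. It rests on several assumptions: that <code>order="sequential"</code> (an attribute proposed here, not part of VOTable) means the three 41x41x41 arrays are stored back to back in FIELD order; that values are big-endian 4-byte floats, as in the VOTable BINARY serialization; and that the first index varies fastest, the FITS-like convention used for multidimensional <code>arraysize</code>. The <code>world_coord</code> helper borrows the FITS linear WCS keywords (CRVAL, CRPIX, CDELT) only to illustrate the kind of spatial metadata the note above asks for; no such keywords are defined for this VOTable.
</p>

```python
import struct
from typing import Dict, List, Sequence


def read_sequential(raw: bytes, names: Sequence[str],
                    shape: Sequence[int]) -> Dict[str, List[float]]:
    """Split a 'sequential' stream into one flat float32 array per FIELD."""
    n = 1
    for dim in shape:
        n *= dim
    fmt = f">{n}f"                      # big-endian 4-byte floats
    size = struct.calcsize(fmt)
    if len(raw) != size * len(names):
        raise ValueError(f"expected {size * len(names)} bytes, got {len(raw)}")
    return {name: list(struct.unpack_from(fmt, raw, k * size))
            for k, name in enumerate(names)}


def flat_index(i: int, j: int, k: int, shape: Sequence[int]) -> int:
    """0-based (i, j, k) -> flat offset, first index varying fastest."""
    nx, ny, _ = shape
    return i + nx * (j + ny * k)


def world_coord(index0: int, crval: float, crpix: float, cdelt: float) -> float:
    """FITS-style linear axis: CRVAL + (pixel - CRPIX) * CDELT, pixel 1-based."""
    return crval + (index0 + 1 - crpix) * cdelt
```

<p>
For the example above, <code>read_sequential(Path("test.bin").read_bytes(), ["vx", "vy", "vz"], (41, 41, 41))</code> would return three lists of 68921 values, and <code>flat_index</code> locates a given cell in each of them. For large meshes, <code>numpy.frombuffer</code> with a <code>"&gt;f4"</code> dtype would avoid building Python lists.
</p>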

<h3><a name="appA_2">A.2 VOTable for the velocity and position fields of particles from an N-Body simulation</a></h3>

<pre>
&lt;VOTABLE&gt;
&lt;RESOURCE name="myParticles" type="results"&gt;
  &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
  &lt;TABLE name="Particles" ID="NBody" order="tabular"&gt;
    &lt;FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x"
           datatype="float" arraysize="100000" unit="km/s" geometry="particles" /&gt;
    &lt;FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y"
           datatype="float" arraysize="100000" unit="km/s" geometry="particles" /&gt;
    &lt;FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z"
           datatype="float" arraysize="100000" unit="km/s" geometry="particles" /&gt;
    &lt;DATA&gt;
      &lt;BINARY&gt;
        &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
      &lt;/BINARY&gt;
    &lt;/DATA&gt;
  &lt;/TABLE&gt;
&lt;/RESOURCE&gt;
&lt;/VOTABLE&gt;
</pre>
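<p>
This example uses <code>order="tabular"</code> instead of <code>"sequential"</code>. The meaning of the attribute is not yet defined in this draft; the sketch below assumes that "tabular" means the stream is written particle by particle, with one big-endian 4-byte float per FIELD (x, y, z, vx, vy, vz for particle 1, then particle 2, and so on), and regroups the values into one column per FIELD. Under the sequential layout the same data would instead be six contiguous arrays. The function name is hypothetical.
</p>

```python
import struct
from typing import Dict, List, Sequence


def read_tabular(raw: bytes, names: Sequence[str]) -> Dict[str, List[float]]:
    """Split an interleaved ('tabular') stream: one float32 per FIELD per particle."""
    record = struct.Struct(">" + "f" * len(names))   # big-endian, like VOTable BINARY
    if len(raw) % record.size:
        raise ValueError("stream length is not a whole number of records")
    columns: Dict[str, List[float]] = {name: [] for name in names}
    for values in record.iter_unpack(raw):
        for name, value in zip(names, values):
            columns[name].append(value)
    return columns
```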

<h3><a name="appA_3">A.3 VOTable for the temperature field of a mesh-based quantity and the positions of N-Body particles extracted from the same spatial region</a></h3>

<pre>
&lt;VOTABLE&gt;
&lt;RESOURCE name="myMixedData" type="results"&gt;
  &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
  &lt;TABLE name="ParticlesAndMesh" ID="NBody" order="sequential"&gt;
    &lt;FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;FIELD name="temperature" ID="temp" ucd="phys.temperature;pos.cartesian.x"
           datatype="float" arraysize="41x41x41" unit="K" geometry="mesh" /&gt;
    &lt;DATA&gt;
      &lt;BINARY&gt;
        &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
      &lt;/BINARY&gt;
    &lt;/DATA&gt;
  &lt;/TABLE&gt;
&lt;/RESOURCE&gt;
&lt;/VOTABLE&gt;
</pre>
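<p>
When particle and mesh fields of different sizes share one sequential stream, a reader first needs the byte offset of each FIELD. The sketch below derives them from the <code>arraysize</code> strings, assuming fixed sizes only (no <code>*</code>), 4-byte values for <code>datatype="float"</code>, and no padding between fields; the function names are hypothetical.
</p>

```python
from math import prod
from typing import Dict, List, Tuple

FLOAT32_BYTES = 4  # VOTable datatype="float" is a 4-byte IEEE float


def element_count(arraysize: str) -> int:
    """'41x41x41' -> 68921, '100000' -> 100000 (fixed sizes only, no '*')."""
    return prod(int(dim) for dim in arraysize.split("x"))


def sequential_layout(fields: List[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map FIELD name -> (byte offset, byte length) for fields stored back to back."""
    layout: Dict[str, Tuple[int, int]] = {}
    offset = 0
    for name, arraysize in fields:
        length = element_count(arraysize) * FLOAT32_BYTES
        layout[name] = (offset, length)
        offset += length
    return layout
```

<p>
For the FIELDs above, the temperature mesh would start right after the three particle arrays, at byte 1200000.
</p>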

<p>
An alternative version, in which the particles and the mesh are stored in two separate tables:
</p>

<pre>
&lt;VOTABLE&gt;
&lt;RESOURCE name="myMixedData" type="results"&gt;
  &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
  &lt;TABLE name="Particles" ID="NBodyParticles" order="sequential"&gt;
    &lt;FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
           datatype="float" arraysize="100000" unit="Mpc" geometry="particles" /&gt;
    &lt;DATA&gt;
      &lt;BINARY&gt;
        &lt;STREAM href="file:///scratch/myhome/test_particles.bin"/&gt;
      &lt;/BINARY&gt;
    &lt;/DATA&gt;
  &lt;/TABLE&gt;
  &lt;TABLE name="Mesh" ID="NBodyMesh" order="sequential"&gt;
    &lt;FIELD name="temperature" ID="temp" ucd="phys.temperature;pos.cartesian.x"
           datatype="float" arraysize="41x41x41" unit="K" geometry="mesh" /&gt;
    &lt;DATA&gt;
      &lt;BINARY&gt;
        &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
      &lt;/BINARY&gt;
    &lt;/DATA&gt;
  &lt;/TABLE&gt;
&lt;/RESOURCE&gt;
&lt;/VOTABLE&gt;
</pre>

<p>
[GL – Do we need an example of an "ordinary" tabular VOTable as well? Something like:
</p>

<pre>
&lt;VOTABLE&gt;
&lt;RESOURCE name="myParticles" type="results"&gt;
  &lt;INFO name="QUERY_STATUS" value="OK"/&gt;
  &lt;TABLE name="Particles" ID="NBody"&gt;
    &lt;FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
           datatype="float" unit="Mpc" /&gt;
    &lt;FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
           datatype="float" unit="Mpc" /&gt;
    &lt;FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
           datatype="float" unit="Mpc" /&gt;
    &lt;FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x"
           datatype="float" unit="km/s" /&gt;
    &lt;FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y"
           datatype="float" unit="km/s" /&gt;
    &lt;FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z"
           datatype="float" unit="km/s" /&gt;
    &lt;DATA&gt;
      &lt;BINARY&gt;
        &lt;STREAM href="file:///scratch/myhome/test.bin"/&gt;
      &lt;/BINARY&gt;
    &lt;/DATA&gt;
  &lt;/TABLE&gt;
&lt;/RESOURCE&gt;
&lt;/VOTABLE&gt;
</pre>
<p>
]
</p>
<br/>

<h2><a name="appB">Appendix B: Binary File Formats</a></h2>
<p>
To be done.
</p>
<br/>
<h2><a name="references">References</a></h2>

<p>[1] R. Hanisch, <i>Resource Metadata for the Virtual Observatory</i>
<br/><a href="http://www.ivoa.net/Documents/latest/RM.html">http://www.ivoa.net/Documents/latest/RM.html</a>
</p>
<p>[2] R. Hanisch, M. Dolensky, M. Leoni, <i>Document Standards Management: Guidelines and Procedure</i>
<br/><a href="http://www.ivoa.net/Documents/latest/DocStdProc.html">http://www.ivoa.net/Documents/latest/DocStdProc.html</a>
</p>


</body></html>
