
Diff of /trunk/projects/theory/snap/SimDAP.html


revision 460 by claudio.gheller, Thu May 8 12:58:44 2008 UTC revision 461 by claudio.gheller, Mon May 12 14:07:49 2008 UTC
# Line 113  Line 113 
113  <br/>  <br/>
114  <h2><a name="sec1">1 Introduction</a></h2>  <h2><a name="sec1">1 Introduction</a></h2>
115                  <p>This specification defines a prototype standard for retrieving theoretical data from a variety of astrophysical simulation repositories: the Simulation Data Access Protocol (hereafter SimDAP). In this context <i>Theoretical Data</i> is defined the outcome of different kinds of numerical applications, like dynamical simulations, semianalytical models, montecarlo simulations etc.</p>                  <p>This specification defines a prototype standard for retrieving theoretical data from a variety of astrophysical simulation repositories: the Simulation Data Access Protocol (hereafter SimDAP). In this context <i>Theoretical Data</i> is defined the outcome of different kinds of numerical applications, like dynamical simulations, semianalytical models, montecarlo simulations etc.</p>
116                  <p>The standard is intended to define a basic theoretical data service, represented by the selection and retrieval of a set of data, according to a few specific requirements. Such service is particularly relevant since:</p>                  <p>The standard is strictly related to the SimDM theoretical data model (REF XXX), which allows the user to identify the data of interest and the associated source. Such source (in most of the cases a data file) can be downloaded. However, in general, data is so large that its direct dowload is unfeasible.  The objective of the SimDAP protocol is to allow the user to focus on a proper subsample of the data, leading to a (strong) reduction of the data volume to be moved across the network, permitting its download, otherwise difficult, if not impossible.</p>
117                  <ul>                  <ul/>
118                          <li>Many (if not most of) data processing applications rely on it to access data</li>                          
119                          <li>It leads to a (strong) reduction of the data volume to be moved across the network, permitting its download, otherwise difficult, if not impossible</li>                  
120                  </ul>                  <p>SimDAP will deal with datasets that can always be represented as (large/huge) tables in which raws identify a simulated element (a mesh cell, a particle, a pixel...) and colums represent the associated physical parameters (the 3D spatial coordinates, the velocity, the temperature...). Datasets can represent different timesteps (so, evolutionary configurations) of the the same simulated system. In the rest of the document, we will refer to the considered datasets as <i>snapshot</i> of a numerical application.  Snapshots are our data sources. No further assumption are made on data. </p>
121                  <p>The datasets we will deal with, can always be represented as (large/huge) tables in which raws identify a simulated element (a mesh cell, a particle, a pixel...) and colums represent the associated physical parameters (the 3D spatial coordinates, the velocity, the temperature...). Datasets can represent different timesteps (so, evolutionary configurations) of the the same simulated system. In the rest of the document, we will refer to the considered datasets as <i>snapshot</i> of a numerical application.  Snapshots are our data sources. No further assumption are made on data. </p>                  <p>The SimDAP protocol support a <i>rectangular cutout</i> of the data in a generic N-Dimensional (N-Dim) parameters space. This consists in extracting all the simulated elements for which some parameters have values in a given range. Notice that no assumption is made on the dimensionality of the problem (selection can be done on any number of parameters) or on the nature of the parameters (no restrictions to the parameters adopted in the selection operation). However, it can be convenient to consider as a favoured case, a 3D geometric selection, in which data are extracted according to its position in the 3D space. This means that spatial coordinates are used as cut-out parameters. This case is particularly simple and intuitive. Furthermore it is common to a large number of applications. In the rest of the document we will refer to:</p>
                 <p>From the definition of snapshot, we can immediately specify a <i>simple</i> access protocol, which we define as a <i>rectangular cutout</i> of the parameters space. This consists in extracting all the simulated elements for which some parameters have values in a given range. Notice that no assumption is made on the dimensionality of the problem (selection can be done on any number of parameters) or on the nature of the parameters (no restrictions to the parameters adopted in the selection operation). However, it can be convenient to consider as a favoured case, a 3D geometric selection, in which data are extracted according to its position in the 3D space. In practice, spatial coordinates are adopted as parameters for the query. This case is particularly simple and intuitive. Furthermore it is common to a large number of applications. Therefore, it will be developed in details in the next sections. However, everything we propose can be extended to a generic number of parameters. In the rest of the document we will refer to:</p>  
122                  <ul>                  <ul>
123                          <li>position, as a N-uple of selection parameters which define a position in the phase space (e.g. the center of a the 3D geometric box)</li>                          <li>position, as a N-uple of selection parameters which define a position in the phase space (e.g. the center of a the 3D geometric box)</li>
124                          <li>size, as the extension of the selected region (dependent on the position specification) in the N-Dim phase space (sides of the 3D deometric box)</li>                          <li>size, as the extension of the selected region (dependent on the position specification) in the N-Dim phase space (sides of the 3D deometric box)</li>
125                  </ul>                  </ul>
126                  <p><img src="extraction.JPG" width="582" height="246" border="0"/></p>                  <p><img src="extraction.JPG" width="582" height="246" border="0"/></p>
127                  <p>The SimDAP protocol is designed primarily as a &quot;data on demand&quot; service, with dataset created on the fly by the service given the position and size of the desired output dataset as specified by the client. This is not a simple task for various reasons. First, simulations data adopts specific units and coordinate systems, which depend on the nature of the problem, the characteristics of the algorithms and their implementation.  Furthermore, simulation outputs can be represented by a wode variety of completely different data objects. For example, the output can consist in a set of particles in a given volume, where each particle has its physical position and a set of associated scalar and vector quantities, like velocity, mass density, temperature etc. On the other hand, mesh based simulations describe their data as discrete fields defined on a regular or adaptive mesh. The SimDAP protocol has the goal of providing a uniform description of the selection service trying keep it simple and, at the same time, to include as many different kind of simulations and data as possible.</p>                  <p>The SimDAP protocol is designed primarily as a &quot;data on demand&quot; service, with dataset created on the fly by the service given the position and size of the desired output dataset as specified by the client. This is not a simple task for various reasons. First, simulations data adopts specific units and coordinate systems, which depend on the nature of the problem, the characteristics of the algorithms and their implementation. Furthermore, simulation outputs can be represented by a wide variety of completely different data objects. For example, the output can consist in a set of particles in a given volume, where each particle has its physical position and a set of associated scalar and vector quantities, like velocity, mass density, temperature etc. 
On the other hand, mesh based simulations describe their data as discrete fields defined on a regular or adaptive mesh. The SimDAP protocol has the goal of providing a uniform description of the selection service trying keep it simple and, at the same time, to include as many different kind of simulations and data as possible.</p>
128                  <p>In operation, SimDAP represents a negotiation between the client and the data service, which allows to select and retrieve a specific subset of available data. The retrieval of the complete dataset can be considered as a <i>degenerate</i> selection of all the volume and all the parameters. However, notice that even this case does not lead to a simple download of the original data file, since data are delivered in the standard TVO format described in section X.X.</p>                  <p>In operation, SimDAP represents a negotiation between the client and the data service, which allows to select and retrieve a specific subset of available data. The retrieval of the complete dataset can be considered as a <i>degenerate</i> selection of all the volume and all the parameters. However, notice that even this case does not lead to a simple download of the original data file (supported by the SimDM&nbsp;protocol), since data are delivered in the standard TVO format described in section X.X.</p>
129                  <p>                  <p>
130  In summary, we can identify five main stages for the SimDAP service.  In summary, we can identify five main stages for the SimDAP service.
131    
# Line 175  Line 174 
174                  <ul>                  <ul>
175                          <li>DATASERVICE = unique identifier of the data provider </li>                          <li>DATASERVICE = unique identifier of the data provider </li>
176                          <li>DATASOURCE = unique identifier of the dataset at the data provider</li>                          <li>DATASOURCE = unique identifier of the dataset at the data provider</li>
177                            <li>SIMULATION = </li>
178                  </ul>                  </ul>
179                  <p>Other two parameters are used at some stages of the SimDAP&nbsp;protocol:</p>                  <p>Other two parameters are used at some stages of the SimDAP&nbsp;protocol:</p>
180                  <ul>                  <ul>
# Line 182  Line 182 
182                          <li>UNITS = list of the units of the selected physical quantities as stored in the data archive</li>                          <li>UNITS = list of the units of the selected physical quantities as stored in the data archive</li>
183                  </ul>                  </ul>
184                  <p>Notice that FIELDS and UNITS must have the same number of entries. For example:</p>                  <p>Notice that FIELDS and UNITS must have the same number of entries. For example:</p>
185                  <p><font face="Courier New,Courier,Monaco">FIELDS = &quot;xposition,yposition,zposition,velocity,temperature&quot; UNITS = &quot;Mpc,Mpc,Mpc,km sec-1,K&quot;</font></p>                  <p><font face="Courier New,Courier,Monaco">FIELDS = &quot;xposition,yposition,zposition,velocity,temperature&quot; UNITS = &quot;Mpc,Mpc,Mpc,km,sec-1,K&quot;</font></p>
186                  <p>UNITS is used to convert client side to server side units. In the case of dimensionless/normalized quantities, UNITS must specify a conversion factor to a physical unit. For instance, if coordinates of an N-Body simulations are defined between 0 and 1, UNITS&nbsp;must specify that L(adim)=1 corresponds to L(phys)=100Mpc, where L() represents a length measure. This is specified by the SimDM.</p>                  <p>UNITS is used to convert client side to server side units. In the case of dimensionless/normalized quantities, UNITS must specify a conversion factor to a physical unit. For instance, if coordinates of an N-Body simulations are defined between 0 and 1, UNITS&nbsp;must specify that L(adim)=1 corresponds to L(phys)=100Mpc, where L() represents a length measure. This is specified by the SimDM.</p>
187                  <p>All the parameters are retrieved as specified in REF. </p>                  <p>All the parameters are retrieved as specified in REF. </p>
188                  <br/>                  <br/>
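The rule in this hunk, that FIELDS and UNITS must have the same number of entries, can be checked mechanically. The following is a minimal sketch, not part of the specification: the function name `pair_fields_units` is hypothetical, and it assumes both parameters are plain comma-separated strings as in the example line. Under that assumption the r460 value ("km sec-1" as one entry) yields five units for five fields, while the r461 value ("km,sec-1") splits into six entries and fails the check.

```python
def pair_fields_units(fields: str, units: str) -> dict[str, str]:
    """Pair each FIELDS entry with its UNITS entry.

    Hypothetical helper: assumes both values are comma-separated lists,
    as in the example shown in the diff. Raises ValueError when the two
    lists differ in length, which the text says must not happen.
    """
    names = [f.strip() for f in fields.split(",")]
    unit_list = [u.strip() for u in units.split(",")]
    if len(names) != len(unit_list):
        raise ValueError(
            f"FIELDS has {len(names)} entries but UNITS has {len(unit_list)}"
        )
    return dict(zip(names, unit_list))


FIELDS = "xposition,yposition,zposition,velocity,temperature"
UNITS_R460 = "Mpc,Mpc,Mpc,km sec-1,K"   # value on the r460 side
UNITS_R461 = "Mpc,Mpc,Mpc,km,sec-1,K"   # value on the r461 side

pairs = pair_fields_units(FIELDS, UNITS_R460)
print(pairs["velocity"])

try:
    pair_fields_units(FIELDS, UNITS_R461)
except ValueError as err:
    print("r461 example rejected:", err)
```

If a comma-separated encoding is kept, compound units such as the velocity unit need a form that contains no comma.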
# Line 228  Line 228 
228                  <p>In order to submit the SimDAP request, a <b>setSimDAP</b> method MUST be implemented, with inputs and outputs defined in the following subsections.</p>                  <p>In order to submit the SimDAP request, a <b>setSimDAP</b> method MUST be implemented, with inputs and outputs defined in the following subsections.</p>
229                  <h3>5.1 INPUT PARAMETERS</h3>                  <h3>5.1 INPUT PARAMETERS</h3>
230                  <h4>5.1.1 Selection parameters</h4>                  <h4>5.1.1 Selection parameters</h4>
231                  <p>Simulations outputs are stored in files. This files can be indicated by a reference name which identify unambiguously the data source. This link can be provided directly by the client, by registries or/and by the middleware software which a distributed archive is built on (e.g. SRB, OGSA-DAI&acirc;&#128;&brvbar;). The data source can be also a database. However, this does not imply anything on the service interface implementation. The complexity of the database access is hidden behind the setSimDAP operation and its implementation. But this is up to the service provider.</p>                  <p>Simulations outputs are stored in files. This files can be indicated by a reference name which identify unambiguously the data source. This link can be provided directly by the client, by registries or/and by the middleware software which a distributed archive is built on (e.g. SRB, OGSA-DAI). The data source can be also a database. However, this does not imply anything on the service interface implementation. The complexity of the database access is hidden behind the setSimDAP operation and its implementation. But this is up to the service provider.</p>
232                  <p>The service id MUST also be specified.</p>                  <p>The service id MUST also be specified.</p>
233                  <p>A SimDAP operation MUST refer to a single data source. Multiple sources cutouts, like for various time steps of the same simulation, cannot be supported by the protocol. Their implementation is up to the client, as, for example, sequences of single source requests with same subbox and fields. The client must verify that such operation is possible and/or meaningful.</p>                  <p>A SimDAP operation MUST refer to a single data source. Multiple sources cutouts, like for various time steps of the same simulation, cannot be supported by the protocol. Their implementation is up to the client, as, for example, sequences of single source requests with same subbox and fields. The client must verify that such operation is possible and/or meaningful.</p>
234                  <dl>                  <dl>
# Line 244  Line 244 
244  </pre>  </pre>
245                  <h4>5.1.2 Geometric variables</h4>                  <h4>5.1.2 Geometric variables</h4>
246                  <p>The GEOM parameters allows to select some of the snapshot variables (e.g. the coordinates of the particles in a N-Body simulation) as those on which selection is defined. It is specified as a comma-seaparted list of strings.</p>                  <p>The GEOM parameters allows to select some of the snapshot variables (e.g. the coordinates of the particles in a N-Body simulation) as those on which selection is defined. It is specified as a comma-seaparted list of strings.</p>
247                  <p>Example: &quot;GEOM=xpos,ypos,zpos&quot;</p>                  <p>Example: GEOM=&quot;xpos,ypos,zpos&quot;</p>
248                  <h4>5.1.3 Region of interest</h4>                  <h4>5.1.3 Region of interest</h4>
249                  <p>To select the region of interest, two parameters are necessary on each dimension:</p>                  <p>To select the region of interest, two parameters are necessary on each dimension:</p>
250                  <ul>                  <ul>
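
The rectangular cutout described in the Introduction hunk, combined with the GEOM parameter from section 5.1.2, can be illustrated with a short sketch. It rests on several assumptions that the diff does not confirm: the snapshot is held in memory as a list of dictionaries (one per simulated element), position is the centre of the box (the text gives this only as an example), size is the full side length along each dimension, and the bounds are inclusive. The function name `rectangular_cutout` is hypothetical.

```python
def rectangular_cutout(rows, geom, position, size):
    """Select the rows that fall inside an N-dimensional box.

    rows     : list of dicts, one per simulated element (assumed layout)
    geom     : comma-separated variable names, e.g. "xpos,ypos,zpos"
    position : box centre, one value per GEOM entry (assumed meaning)
    size     : full side length, one value per GEOM entry (assumed meaning)
    """
    names = [g.strip() for g in geom.split(",")]
    if not (len(names) == len(position) == len(size)):
        raise ValueError("GEOM, position and size must have the same length")

    selected = []
    for row in rows:
        # A row is kept only if every selection variable lies within
        # half a side length of the centre along its own dimension.
        inside = all(
            abs(row[name] - centre) <= side / 2
            for name, centre, side in zip(names, position, size)
        )
        if inside:
            selected.append(row)
    return selected


snapshot = [
    {"xpos": 0.50, "ypos": 0.50, "zpos": 0.50, "temperature": 1.0e4},
    {"xpos": 0.55, "ypos": 0.45, "zpos": 0.50, "temperature": 2.0e4},
    {"xpos": 0.90, "ypos": 0.50, "zpos": 0.50, "temperature": 3.0e4},
]

subset = rectangular_cutout(
    snapshot, "xpos,ypos,zpos", position=(0.5, 0.5, 0.5), size=(0.2, 0.2, 0.2)
)
print(len(subset))
```

Nothing in the loop is specific to three dimensions, which matches the statement that the selection can be made on any number of parameters of any kind.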
