
Diff of /trunk/projects/theory/snap/SimDAP.html


revision 414 by claudio.gheller, Tue Apr 29 14:08:12 2008 UTC revision 415 by claudio.gheller, Thu May 8 10:59:35 2008 UTC
# Line 118  Line 118 
118                          <li>Many (if not most of) data processing applications rely on it to access data</li>                          <li>Many (if not most of) data processing applications rely on it to access data</li>
119                          <li>It leads to a (strong) reduction of the data volume to be moved across the network, permitting its download, otherwise difficult, if not impossible</li>                          <li>It leads to a (strong) reduction of the data volume to be moved across the network, permitting its download, otherwise difficult, if not impossible</li>
120                  </ul>                  </ul>
121                  <p>The datasets we will deal with can always be represented as (large/huge) tables in which rows represent a simulated element (a mesh cell, a particle, a pixel...) and columns represent associated physical parameters (the 3D spatial coordinates, the velocity, the temperature...). Datasets can represent different timesteps (so, evolutionary configurations) of the same simulated system. In the rest of the document, we will refer to the considered datasets as <i>snapshots</i> of a numerical application.  Snapshots are our data sources. No further assumptions are made on the data. </p>                  <p>The datasets we will deal with can always be represented as (large/huge) tables in which rows identify a simulated element (a mesh cell, a particle, a pixel...) and columns represent the associated physical parameters (the 3D spatial coordinates, the velocity, the temperature...). Datasets can represent different timesteps (so, evolutionary configurations) of the same simulated system. In the rest of the document, we will refer to the considered datasets as <i>snapshots</i> of a numerical application.  Snapshots are our data sources. No further assumptions are made on the data. </p>
122                  <p>From the definition of snapshot, we can immediately specify a <i>simple</i> access protocol, which we define as a <i>rectangular cutout</i> of the parameter space. This consists of extracting all the simulated elements for which some parameters have values in a given range. Notice that no assumption is made on the dimensionality of the problem (selection can be done on any number of parameters) or on the nature of the parameters (no restrictions on the parameters adopted in the selection operation). However, it can be convenient to consider as a favoured case a 3D geometric selection, in which data are extracted according to their position in 3D space. In practice, spatial coordinates are adopted as parameters for the query. This case is particularly simple and intuitive. Furthermore, it is common to a large number of applications. Therefore, it will be developed in detail in the next sections. However, everything we propose can be extended to a generic set of N selection parameters.</p>                  <p>From the definition of snapshot, we can immediately specify a <i>simple</i> access protocol, which we define as a <i>rectangular cutout</i> of the parameter space. This consists of extracting all the simulated elements for which some parameters have values in a given range. Notice that no assumption is made on the dimensionality of the problem (selection can be done on any number of parameters) or on the nature of the parameters (no restrictions on the parameters adopted in the selection operation). However, it can be convenient to consider as a favoured case a 3D geometric selection, in which data are extracted according to their position in 3D space. In practice, spatial coordinates are adopted as parameters for the query. This case is particularly simple and intuitive. Furthermore, it is common to a large number of applications. Therefore, it will be developed in detail in the next sections. 
However, everything we propose can be extended to a generic number of parameters. In the rest of the document we will refer to:</p>
123                  <p>The SimDAP protocol is designed primarily as a &quot;data on demand&quot; service, with the dataset created on the fly by the service given the position and size of the desired output dataset as specified by the client. This is not a simple task for various reasons. First, simulations adopt specific units and coordinate systems, which depend on the nature of the problem, the characteristics of the algorithms and their implementation. Furthermore, there is no equivalent to a &quot;position in the sky&quot; as for astronomical images and therefore no absolute common reference frame. Finally, simulation outputs can be represented by a wide variety of completely different data objects. For example, the output can consist of a set of particles in a given volume, where each particle has its physical position and a set of associated scalar and vector quantities, like velocity, mass density, temperature etc. On the other hand, mesh based simulations describe their data as discrete fields defined on a regular or adaptive mesh. The SimDAP protocol has the goal of providing a uniform description of the selection service, trying to keep it simple and, at the same time, to include as many different kinds of simulations and data as possible.</p>                  <ul>
124                            <li>position, as an N-tuple of selection parameters which define a position in the phase space (e.g. the center of the 3D geometric box)</li>
125                            <li>size, as the extension of the selected region (dependent on the position specification) in the N-dimensional phase space (e.g. the sides of the 3D geometric box)</li>
126                    </ul>
127                    <p><img src="extraction.JPG" width="582" height="246" border="0"/></p>
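Illustrative note (not part of either revision): the rectangular cutout defined above, expressed through position and size, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function name rectangular_cutout, the representation of a snapshot as a list of dictionaries, the column names, and the choice of closed intervals centred on the position are all hypothetical and are not prescribed by SimDAP.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]  # one simulated element: column name -> value (assumed layout)


def rectangular_cutout(rows: Sequence[Row],
                       params: Sequence[str],
                       position: Sequence[float],
                       size: Sequence[float]) -> List[Row]:
    """Return the rows whose selected parameters lie inside the box.

    position is taken as the box centre and size as the full side length
    along each selected parameter (an assumption of this sketch).
    """
    if not (len(params) == len(position) == len(size)):
        raise ValueError("params, position and size must have the same length")
    # Closed interval [centre - side/2, centre + side/2] for each parameter.
    bounds = [(c - s / 2.0, c + s / 2.0) for c, s in zip(position, size)]
    return [row for row in rows
            if all(lo <= row[p] <= hi for p, (lo, hi) in zip(params, bounds))]


# Hypothetical three-element snapshot with normalized coordinates.
snapshot = [
    {"x": 0.10, "y": 0.20, "z": 0.30, "temperature": 1.0e4},
    {"x": 0.50, "y": 0.50, "z": 0.50, "temperature": 2.0e6},
    {"x": 0.90, "y": 0.45, "z": 0.55, "temperature": 3.0e5},
]

# 3D geometric selection: box of side 0.2 centred on (0.5, 0.5, 0.5).
subset = rectangular_cutout(snapshot, ("x", "y", "z"),
                            (0.5, 0.5, 0.5), (0.2, 0.2, 0.2))
```

Because the parameter names are passed explicitly, the same function covers the general N-parameter case, for example a one-dimensional selection on temperature only.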
128                    <p>The SimDAP protocol is designed primarily as a &quot;data on demand&quot; service, with the dataset created on the fly by the service given the position and size of the desired output dataset as specified by the client. This is not a simple task for various reasons. First, simulation data adopt specific units and coordinate systems, which depend on the nature of the problem, the characteristics of the algorithms and their implementation.  Furthermore, simulation outputs can be represented by a wide variety of completely different data objects. For example, the output can consist of a set of particles in a given volume, where each particle has its physical position and a set of associated scalar and vector quantities, like velocity, mass density, temperature etc. On the other hand, mesh based simulations describe their data as discrete fields defined on a regular or adaptive mesh. The SimDAP protocol has the goal of providing a uniform description of the selection service, trying to keep it simple and, at the same time, to include as many different kinds of simulations and data as possible.</p>
129                  <p>In operation, SimDAP represents a negotiation between the client and the data service, which allows the client to select and retrieve a specific subset of the available data. The retrieval of the complete dataset can be considered as a <i>degenerate</i> selection of the whole volume and all the parameters. However, notice that even this case does not lead to a simple download of the original data file, since data are delivered in the standard TVO format described in section X.X.</p>                  <p>In operation, SimDAP represents a negotiation between the client and the data service, which allows the client to select and retrieve a specific subset of the available data. The retrieval of the complete dataset can be considered as a <i>degenerate</i> selection of the whole volume and all the parameters. However, notice that even this case does not lead to a simple download of the original data file, since data are delivered in the standard TVO format described in section X.X.</p>
130                  <p>                  <p>
131  In summary, we can identify five main stages for the SimDAP service.  In summary, we can identify five main stages for the SimDAP service.
# Line 166  Line 171 
171    
172  <br/>  <br/>
173  <h2><a name="sec3">3 Simulation Selection and Units</a></h2>  <h2><a name="sec3">3 Simulation Selection and Units</a></h2>
174                  <p>Search and exploration of available data archives and collections is described in detail in REF. The user selects the datasets of interest. Each dataset is characterized by a set of metadata according to the SimDM. Several of these parameters are used as input data for any SimDAP action:</p>                  <p>Search and exploration of available data archives and collections is described in detail in REF. The user selects the datasets of interest. Each dataset is characterized by a set of metadata according to the SimDM. Several of these parameters represent the input data for all the SimDAP actions:</p>
175                  <ul>                  <ul>
176                          <li>DATASERVICE = unique identifier of the data provider </li>                          <li>DATASERVICE = unique identifier of the data provider </li>
177                          <li>DATASOURCE = unique identifier of the dataset at the data provider</li>                          <li>DATASOURCE = unique identifier of the dataset at the data provider</li>
# Line 181  Line 186 
186                  <p>UNITS is used to convert client-side units to server-side units. In the case of dimensionless/normalized quantities, UNITS must specify a conversion factor to a physical unit. For instance, if the coordinates of an N-body simulation are defined between 0 and 1, UNITS&nbsp;must specify that L(adim)=1 corresponds to L(phys)=100Mpc, where L() represents a length measure. This is specified by the SimDM.</p>                  <p>UNITS is used to convert client-side units to server-side units. In the case of dimensionless/normalized quantities, UNITS must specify a conversion factor to a physical unit. For instance, if the coordinates of an N-body simulation are defined between 0 and 1, UNITS&nbsp;must specify that L(adim)=1 corresponds to L(phys)=100Mpc, where L() represents a length measure. This is specified by the SimDM.</p>
187                  <p>All the parameters are retrieved as specified in REF. </p>                  <p>All the parameters are retrieved as specified in REF. </p>
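Illustrative note (not part of either revision): the UNITS conversion described above amounts to a single scaling. The sketch below assumes the example from the text, where one dimensionless length unit corresponds to 100 Mpc; the function name to_server_units and the client values are hypothetical, and how UNITS is actually encoded is left to the SimDM.

```python
def to_server_units(value_phys: float, phys_per_adim: float) -> float:
    """Convert a physical (client-side) value to dimensionless server units."""
    if phys_per_adim <= 0:
        raise ValueError("conversion factor must be positive")
    return value_phys / phys_per_adim


# Assumed content of UNITS for lengths: L(adim) = 1  <->  L(phys) = 100 Mpc.
MPC_PER_UNIT_LENGTH = 100.0

# Hypothetical client request expressed in Mpc.
center_mpc = (50.0, 25.0, 75.0)
side_mpc = 10.0

# The same request in the normalized [0, 1] coordinates used by the server.
center_adim = tuple(to_server_units(c, MPC_PER_UNIT_LENGTH) for c in center_mpc)
side_adim = to_server_units(side_mpc, MPC_PER_UNIT_LENGTH)
```

The converted centre and side can then be used directly as the position and size of a cutout in the server's own coordinates.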
188                  <br/>                  <br/>
189                  <h2><a name="sec4">4 Subset Selection</a></h2>                  <h2><a name="sec4">4 Subset Selection</a></h2>
190  <p>                  <p>The result of a SimDAP request can be the whole snapshot or a part of it. The service MUST provide tools to enable the client to </p>
191  The SimDAP request must be submitted according to the prescription given in section 5. If data cutout is supported, the service MUST provide tools to enable the client to specify the size and position of the subset. Geometrical parameters are particularly tricky to set, since the user has to know where the interesting regions are in advance. Therefore a thumbnail of the data could be necessary to proceed with data discovery. The thumbnail is a representative, but much smaller (with respect to the data size), realization of the whole dataset. It could be:                  <ul>
192  </p>                          <li>preview the available datasets</li>
193  <ul>                          <li>select a subset specifying its size and position </li>
194                    </ul>
195                    <p>Both of these actions could require the availability of a &quot;simplified&quot; (but meaningful) version of the data, namely a <i>thumbnail</i>, which is easy to download and handle. The preview and thumbnail features depend on the data, and their implementation is up to the data provider. For example, they could be represented by:</p>
196                    <ul>
197  <li>projections of the computational box in the three coordinate directions (images),</li>  <li>projections of the computational box in the three coordinate directions (images),</li>
198  <li>a random or decimated sample of the dataset (in particular for point like data),</li>  <li>a random or decimated sample of the dataset (in particular for point like data),</li>
199  <li>a reduced resolution realization of the dataset (e.g. averages over neighboring cells of a computational mesh)</li>  <li>a reduced resolution realization of the dataset (e.g. averages over neighboring cells of a computational mesh)</li>
200  <li>a "clever" selection of regions according to specific criteria (e.g. "overdense" regions) implemented by proper algorithms (which are not subject of this work).</li>  <li>a &quot;clever&quot; selection of regions according to specific criteria (e.g. &quot;overdense&quot; regions) implemented by proper algorithms</li>
201  </ul>  </ul>
202  <p>                  <p>However, a getThumb method MUST be implemented. The input of this method is a pair of DATASERVICE, DATASOURCE parameters, which identifies the dataset of interest. The output is a VOTable in the TVO standard. As for the results of the SimDAP procedure, thumbnail data are stored in external binary files. However, these files are immediately downloaded (together with the VOTable, as a response to the web method), since their size is small and they may be precalculated.</p>
203  The specific details of these services depend on their implementation and they must be published to the registry. However, a minimal set of methods and interfaces can be defined.</p>                  <br/>
 <p>  
 A getThumb web method MUST be implemented. The input of this method is a pair of DATASERVICE, DATASOURCE parameters (see section 5), which identifies the dataset of interest. The output is a VOTable with the same features as that described in section 5.2. As for the results of the SimDAP procedure, thumbnail data are stored in external binary files. However, these files are immediately downloaded (together with the VOTable, as a response to the web method), since their size is small.  
 </p>  
   
 <br/>  
204  <h2><a name="sec5">5 SimDAP Request</a></h2>  <h2><a name="sec5">5 SimDAP Request</a></h2>
205  <p>The main target of the SimDAP service is the access to the raw data from a simulation, selected by a general Simulation Query, described in section 3. The SimDAP service in general provides the following functionalities:</p>  <p>The main target of the SimDAP service is the access to the raw data from a simulation, selected by a general Simulation Query, described in section 3. The SimDAP service in general provides the following functionalities:</p>
206  <ol>  <ol>
