Contents of /trunk/projects/theory/snapdm/doc/note/SimDB-note.html

Revision 342 - (show annotations)
Wed Apr 30 16:28:24 2008 UTC (12 years, 7 months ago) by gerard.lemson
File MIME type: text/html
File size: 43718 byte(s)
depend on style file in css/
1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html>
4 <head>
5 <title>IVOA Working Group - Internal Draft</title>
6 <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
7 <meta name="keywords" content="IVOA, International, Virtual, Observatory, Alliance" />
8 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
9 <meta name="maintainedBy" content="IVOA Document Coordinator, ivoadoc@ivoa.net" />
10 <link rel="stylesheet" href="http://ivoa.net/misc/ivoa_wg.css" type="text/css" />
11 <link rel="stylesheet" href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/css/simdb-note.css" type="text/css" />
12 </head>
13
14 <body>
15 <div class="head">
16 <a href="http://www.ivoa.net/"><img alt="IVOA" src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" width="300" height="169"/></a>
17 <h1>Simulation Database (SimDB)<br/>
18 Version 0.x</h1>
19 <h2>IVOA Theory Interest Group <br />Internal Draft 2008 April 19 </h2>
20
21
22 <dt>This version:</dt>
23 <dd><a href="http://www.ivoa.net/Documents/...">
24 http://www.ivoa.net/Documents/...</a></dd>
25
26 <dt>Latest version:</dt>
27
28 <dd><a href="http://www.ivoa.net/Documents/latest/...">
29 http://www.ivoa.net/Documents/latest/...</a></dd>
30
31 <dt>Previous versions:</dt>
32 <dt>Interest Group:</dt>
33 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory"> http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory</a></dd>
34 <dt>Author(s):</dt>
35 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/GerardLemson">Gerard Lemson</a> (editor)<br /></dd>
36 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/LaurentBourges">Laurent Bourges</a><br /></dd>
37 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/PatriziaManzato">Patrizia Manzato</a><br /></dd>
38 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/RickWagner">Rick Wagner</a><br /></dd>
39 <dd>others?</dd>
40 <hr/></div>
41
42 <h2><a name="abstract" id="abstract">Abstract</a></h2>
43 <p>In this note we propose that the IVOA develop a standard protocol for discovering simulations.
44 We will call this protocol the <i>Simulation Database</i> (SimDB). Implementations of the SimDB will allow users to query for
45 results of simulations in quite some detail and will provide links to services for accessing these
46 simulations. </p>
47 <p>The results presented in this note, which form the core of the proposed standard, are one half of a concerted effort of the Theory Interest Group that originally went by the name
48 <i>Simple Numerical Access Protocol</i> (SNAP), and is now split into two parts. The second part defines protocols
49 for accessing the simulation data products themselves. This part will be written up in a separate Note
50 (Gheller, Wagner et al, in preparation), under the name Simulation Data Access Protocol (SimDAP).
51 </p>
52 <p>The current proposal is built around a UML data model describing simulations, a representation (mapping) of this model as a relational
53 database schema and a mapping to an XML schema.
54 We propose the relational schema to be the outer facade of a SimDB-TAP implementation which is to be queried using
55 <a href="http://www.ivoa.net/internal/IVOA/IvoaVOQL/ADQL-20080415.pdf">ADQL</a>. <em class="todo">@@ TODO update the ADQL link to later versions @@</em>
56 The XML schema provides type definitions from
57 which machine-readable serialisations of the model may be constructed. The schema also defines root elements for documents
58 describing SimDB-resources. The SimDB should return such documents for identified SimDB-Resources upon request, as an
59 alternative to the tabular (VOTable) results of ADQL queries.
60 In case updates are supported by a SimDB implementation, such documents may be sent to it to insert new resources.
61 </p>
62 <p>
63 This Note describes use cases and requirements, the approach we have taken to define a specification,
64 and the current state of the results. We feel that the results are
65 sufficiently far evolved that they can start following the formal IVOA standardisation track.
66 To this end it could be turned over to one of the existing working groups. If that is the decision, we feel
67 that the data modelling WG is closest to its scope, but there exist very strong links to Registry, Semantics, ADQL
68 and DAL as well. One might argue that a targeted WG for this effort alone might be as appropriate.
69 We leave the decision about this to the IVOA exec.
70 </p>
71
72
73
74 <div class="status">
75 <h2><a name="status" id="status">Status of this Document</a></h2>
76 This is a Note. The first release of this document was 2008 April 19.
77 <p></p><br />
78
79 <!-- Choose one of the following (and remove the rest)-->
80 <!--Note-->
81 <p>This is an IVOA Note expressing suggestions from and opinions of the authors.<br/>
82 It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory.
83 It should not be referenced or otherwise interpreted as a standard specification.</p>
84
85
86 A list of <a href="http://www.ivoa.net/Documents/">current IVOA Recommendations and other technical documents</a> can be found at http://www.ivoa.net/Documents/.
87
88 </div><br />
89
90 <h2><a name="acknowledgments" id="acknowledgments">Acknowledgments</a></h2>
91 <p>We thank various persons for useful discussions in the course of this work. First the participants of the
92 <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/CambridgeTheoryWorkshopFeb06">Feb 2006 theory
93 workshop</a> in Cambridge, UK, where this work was started. Second the participants of the
94 <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/GarchingSNAPWorkshop200704">April 2007 SNAP workshop</a> in
95 Garching, Germany, where the design started taking shape. Then we want to thank particularly the following persons
96 for useful discussions and feedback: Jeremy Blaizot, Klaus Dolag, Ray Plante, Volker Springel. We finally want to thank
97 participants in the theory sessions at the interoperability meetings in Victoria, Moscow, Beijing and Cambridge where parts
98 of this work were discussed.
99 </p>
100 <h2><a id="contents" name="contents">Contents</a></h2>
101 <div class="head">
102 <ul class="toc">
103 <li><a href="#abstract">Abstract</a></li>
104 <li><a href="#status">Status</a></li>
105 <li><a href="#acknowledgments">Acknowledgements</a></li>
106 <li><a href="#contents">Contents</a></li>
107 <li><a href="#sec1">1. Executive Summary</a></li>
108
109 <li><a href="#sec2">2. Overview</a></li>
110 <ul class="toc">
111 <li><a href="#sec2_1">2.1 SNAP &rArr; SimDB + SimDAP</a></li>
112 <li><a href="#sec2_2">2.2 Simulation Database: structure and interface</a></li>
113 <li><a href="#sec2_3">2.3 Registration</a></li>
114 <li><a href="#sec2_4">2.4 Technology: UML, XMI, XSLT</a></li>
115 <li><a href="#sec2_5">2.5 Reference implementations</a></li>
116 </ul>
117
118
119 <li><a href="#sec3">3 Usage scenarios</a></li>
120 <ul class="toc">
121 <li><a href="#sec3_1">3.1 "20 questions"</a></li>
122 <li><a href="#sec3_2">3.2 SimDB-standard implementation</a></li>
123 <li><a href="#sec3_3">3.3 Legacy database</a></li>
124 <li><a href="#sec3_4">3.4 Meta data production pipe line</a></li>
125 <li><a href="#sec3_5">3.5 Client tools</a></li>
126 </ul>
127
128 <li><a href="#sec4">4 Analysis model</a></li>
129 <ul class="toc">
130 <li><a href="#sec4_1">4.1 Universe of Discourse</a></li>
131 <li><a href="#sec4_2">4.2 <i>Domain Model for Astronomy</i></a></li>
132 <li><a href="#sec4_3">4.3 SimDB analysis model</a></li>
133 </ul>
134
135 <li><a href="#sec5">5 Logical model</a></li>
136 <ul class="toc">
137 <li><a href="#sec5_1">5.1 Overview</a></li>
138 <li><a href="#sec5_2">5.2 Normalisation</a></li>
139 <li><a href="#sec5_3">5.3 Target</a></li>
140 <li><a href="#sec5_4">5.4 Characterisation</a></li>
141 <li><a href="#sec5_5">5.5 Semantics</a></li>
142 </ul>
143
144 <li><a href="#sec6">6 Physical models</a></li>
145 <ul class="toc">
146 <li><a href="#sec6_1">6.1 RDBM Schema</a></li>
147 <li><a href="#sec6_2">6.2 XML Schema</a></li>
148 <li><a href="#sec6_3">6.3 Identifiers</a></li>
149 <li><a href="#sec6_4">6.4 JAVA/JPA+JAXB (non-normative)</a></li>
150 </ul>
151
152 <li><a href="#sec7">7. Query protocols</a></li>
153 <ul class="toc">
154 <li><a href="#sec7_1">7.1 ADQL</a></li>
155 <li><a href="#sec7_2">7.2 REST</a></li>
156 <li><a href="#sec7_3">7.3 TAP?</a></li>
157 </ul>
158
159 <li><a href="#sec8">8. Next steps</a></li>
160 <ul class="toc">
161 <li><a href="#sec8_1">8.1 Reference implementations</a></li>
162 <ul class="toc">
163 <li><a href="#sec8_1_1">8.1.1 France</a></li>
164 <li><a href="#sec8_1_2">8.1.2 Germany</a></li>
165 <li><a href="#sec8_1_3">8.1.3 Italy</a></li>
166 <li><a href="#sec8_1_4">8.1.4 USA</a></li>
167 </ul>
168 <li><a href="#sec8_2">8.2 SimDAP services</a></li>
169 </ul>
170 <br/>
171 <li><a href="#appA">Appendix A: Data modelling specifics</a></li>
172 <li><a href="#appB">Appendix B: XSLT pipe line</a></li>
173 <li><a href="#glossary">Glossary and Acronyms</a></li>
174
175 <li><a href="#references">References</a></li>
176 </ul>
177 </div>
178 <hr/>
179
180
181 <br/>
182 <h2><a name="sec1">1. Executive summary</a></h2>
183 <em class="todo">@@ TODO Modify this text, which was originally an email to be sent to THEORY, TCG, DM, maybe EXEC @@</em>
184 <p>
185 We propose to derive two WG projects from what was so far the
186 SNAP project of the theory interest group: SimDB and SimDAP.
187 In this note we discuss the first of these, SimDB, in some detail.
188
189 </p>
190 <h3> Simulation Database (SimDB)</h3>
191 <p>We propose to develop a standard specification project, called the "Simulation Database" (SimDB).
192 It is based on the description+discovery part of the old
193 SNAP project. Its normative deliverables are
194 <ul>
195 <li> A logical data model for describing simulations.<br/>
196 Following SNAP we keep concentrating
197 on 3+1D simulations, with which we mean simulations modelling a
198 space-time sub-volume of the universe OF ANY SIZE, so not only large
199 scale structure, galaxy clusters, but everything down to asteroid collisions etc.
200 As the model <i>describes</i> simulations, it may be called a meta-data model.
201 It will be a logical model in the sense of standard data modelling approaches <em class="todo">@@TODO add some references@@</em>,
202 and is based on an analysis, or domain model which is presented but not normative.
203 The logical model is presented in fully detailed and documented UML2, serialised
204 to XMI 2.1, created using the MagicDraw 12.1 Community edition tool.
205 The data model is using a small subset of UML2 and has some UML profile
206 extensions added. Together this can be seen as a domain specific language,
207 and this can be formalised in a UML Profile. We will propose using such a profile
208 to the DM working group as a general approach for DM efforts.
209 </li>
210 <li>A query protocol based on the logical model.
211 <br />We propose this to have at least an ADQL version.
212 To this end we will provide a relational mapping.
213 This physical model is completely derived from the SimDB logical model using rules
214 implemented as a pipe-line of XSLT2 scripts working on the XMI representation of
215 the UML. The scripts will produce relational database DDL scripts defining the
216 database schema. That schema itself is not normative; instead we will define the
217 replies to TAP metadata queries. We provide implementation scenarios in the text below,
218 for the case of someone using the results from this project completely and for the
219 case of someone implementing a SimDB on top of a legacy database.
220 </li>
221 <li> a messaging format for sending instances of the various components
222 in the data model around.
223 <br />This format will be based on a number of XML
224 schema documents (XSDs), one of which contains the root elements defining valid SimDB resources.
225 This requires a mapping from the UML to XSD.
226 This mapping will take the form of one or more XSLT documents.
227 </li>
228 <li> An IVOA working draft document describing these components.
229 <br />This will be based on the current document.</li></ul>
230 </p>
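<p>To illustrate the kind of rule-based derivation of a relational schema meant above, the following is a minimal sketch and not the actual pipeline. The real transformation is a set of XSLT2 scripts working on the XMI; here a short Python function stands in for it. The tiny model document, its element names (<code>objectType</code>, <code>attribute</code>), the added <code>id</code> column and the type mapping are all assumptions made for this example.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from model datatypes to SQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "integer": "INTEGER",
    "real": "DOUBLE PRECISION",
}


def ddl_from_model(xml_text):
    """Emit one CREATE TABLE statement per objectType in the model document."""
    model = ET.fromstring(xml_text)
    statements = []
    for cls in model.findall("objectType"):
        # Assumption: every object type gets a surrogate primary key column.
        columns = ["id INTEGER PRIMARY KEY"]
        for attr in cls.findall("attribute"):
            sql_type = SQL_TYPES[attr.get("datatype")]
            columns.append(f"{attr.get('name')} {sql_type}")
        body = ",\n  ".join(columns)
        statements.append(f"CREATE TABLE {cls.get('name')} (\n  {body}\n);")
    return "\n".join(statements)


if __name__ == "__main__":
    example = (
        '<model><objectType name="Simulation">'
        '<attribute name="name" datatype="string"/>'
        '<attribute name="box_size" datatype="real"/>'
        "</objectType></model>"
    )
    print(ddl_from_model(example))
```

<p>The point of the sketch is only that the DDL is a pure function of the model: changing the model and rerunning the transformation is enough to regenerate the schema.</p>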
231 <p>
232 We introduce some non-normative solutions that can be taken over for generic
233 data models (this is of course also true for the UML/XMI+XSLT approach for the
234 normative standards).
235 <ul>
236 <li> The XSLT scripts we propose above do not work on the XMI itself, but on
237 an intermediate representation of the UML data model. This is an XML dialect
238 based on a schema we define and which captures the UML profile more directly.
239 XMI is very generic and rather cumbersome to work with. The representation of
240 the UML in our intermediate XML form is much more readable and XSLT based on it
241 is much simpler. It also allows easier adaptation to future modifications in UML,
242 or to tools whose XMI representation is different from the standard. We only need
243 to update the XMI->Intermediate XSLT transformation scripts, not the more complex
244 transformations to the other official representations.
245 We will propose a similar approach to the DM WG.
246 </li>
247 <li> We will provide XMI->Java+JPA+JAXB transformation scripts in XSLT (properly, intermediate->Java).
248 These scripts generate Java classes corresponding to the types (Class, DataType, Enumeration)
249 in UML. These classes are annotated with Java Persistence API (JPA)
250 and Java Architecture for XML Binding (JAXB) annotations to assist in the transformation
251 between relational database and XML representations.
252 Similar scripts can be written for C#, which has supported the same kind of annotations as Java 5 for longer.
253 For persistence we will likely use LINQ, which seems similar to JPA.
254 </li>
255 <li>We propose an approach for including application specific and legacy simulation databases
256 in this framework. This approach follows the "global-as-view" approach to information
257 integration (see for example http://www.deg.byu.edu/papers/PODS.integration.pdf;
258 Leonid Kalinichenko from the RVO is an expert in this field).
259 Implementors with an existing relational database schema may be able to define database
260 views which implement the relational representation of the SimDB data model,
261 and in this way provide a simple way to support querying of their database using ADQL.
262 </li></ul></p>
263 <h4>Organisation</h4>
264 <p>
265 The SimDB is ready to be transferred to the DM WG.
266 <br />We propose that Gerard Lemson keeps leading this effort (as main editor), also when it is moved
267 to that WG. The DM WG's chair (Mireille Louys) will be responsible for all WG-chair
268 issues associated with moving a specification through the document process.
269 The people at the bottom will be part of a "tiger team" to push the standard to RFC.
270 We may want to expand this group with an expert from each of the WGs mentioned below.
271 </p>
272 <p>
273 We have been discussing the data model for some time now.
274 Various projects (Italy, USA, France and Germany) have implementations that are similar
275 to the envisioned SimDB. We believe that by autumn 2008 it can go to RFC.
276 Patrizia Manzato and Rick Wagner will have reference implementations based on existing DBs,
277 so will various projects in France (Lyon: Jeremy Blaizot and Laurent Bourges;
278 Galmer database: Igor Chilingarian) and GAVO.
279 </p>
280 <p>
281 Other relevant working groups for this process are Registry, ADQL and Semantics, possibly DAL.
282 Registry because the simulation database is similar to a registry. We can
283 learn from implementations and the registry interface. Also, we (think we) may need an
284 extension to the IVO Identifier in the implementation of references in SimDB.
285 ADQL because we propose it to be the standard (main) query interface to a SimDB implementation.
286 Semantics because our model includes usage of semantic vocabularies, maybe full ontologies.
287 DAL because our proposal for using ADQL in the query phase requires a version of
288 the TAP protocol for defining the interface.
289 We would like to include a person from each of these WGs in the tiger team.
290 Our wishes are: Ray Plante (Registry), ? (ADQL), Norman Gray (Semantics), ? (TAP).
291 Ray and Norm have contributed to early discussions about SNAP.
292 </p>
293 <p>
294 Of these other efforts it seems TAP poses the main risk for the SimDB standard to go to
295 RFC by the Autumn. What may help us is that we do not need all the details of TAP.
296 In particular the information_schema approach allowing users to
297 query for the data model is not required as it is part of SimDB specification.
298 We mainly need a prescription for sending ADQL queries to the SimDB, and what the
299 format of results should be.
300 Since we expect meta-data databases to be relatively small (compared to
301 say an SDSS or Millennium database), we expect few, if any, problems with
302 performance and can stick to synchronous behaviour at first.
303 </p>
304 <p>
305 We may need some explicit registry-interface like features such as returning a
306 complete XML document according to the messaging format of the SimDB data model.
307 Other issues will come up during the next phase of the discussions.
308 </p>
309
310 <h3>Simulation Data Access Protocol (SimDAP)</h3>
311 <p>
312 We propose to rename the second spin-off of the SNAP project to <i>Simulation Data Access Protocol</i> (SimDAP).
313 It deals with accessing the data after discovery by some means,
314 likely through an implementation of a Simulation Database.
315 It should handle special services such as cut-out, projection,
316 extraction (AMR-like cut-outs, produces regular grids), but also staging etc.
317 It should also deal with data formats. Claudio Gheller (Italy) is leading
318 this effort with close help of Rick Wagner (USA).
319 </p>
320 <p>
321 This project needs more fleshing out and will hopefully be ready to be transferred
322 to a WG, likely DAL, by the Autumn interop.
323 </p>
324 <h3>Connections between SimDB and SimDAP</h3>
325 <p>
326 The two projects are connected as follows:
327 The meta-data formats to be included in SimDAP messages are derived from
328 the data model of the SimDB.
329 Vice versa, the SimDB will include a component describing
330 which SimDAP services are applicable/available for a given simulation.
331 </p>
332
333 <!-- ++++++++++++++++++++++++ -->
334 <h2><a name="sec2"/>2 Overview</h2>
335
336 <h3><a name="sec2_1"/>2.1 SNAP &rArr; SimDB + SimDAP</h3>
337 <p>This document presents a model for describing certain types of numerical computer simulations
338 and certain types of simulation post-processing products. The model was originally envisioned to
339 be used in the query part of the <i>Simple Numerical Access Protocol</i> (SNAP),
340 and in discovery of interesting SNAP services in the first place.
341 After investigating the application domain carefully, we have decided to abandon the concept of
342 designing a DAL-like SxAP protocol for simulations. Instead we have split up the effort into
343 two separate efforts that can each be used in their own right, though there is a clear link between them.
344 This document discusses the first of these, which we have named the <i>Simulation Database</i>, and
345 will have the acronym <i>SimDB</i>. The second will be developed further in a separate effort and is
346 called the <i>Simulation Data Access Protocol</i> (SimDAP, "Sim" stands for "Simulation", <i>not</i> "Simple"!).
347 </p>
348 <p>
349 Following SNAP, SimDB only explicitly considers simulations for systems that represent a space-time
350 sub-volume of the universe and (part of) its material contents. Examples of such simulations are
351 cosmological, pure dark matter N-body simulations of the large-scale structure of the universe;
352 adaptive mesh refinement (AMR) simulations following the evolution of a galaxy cluster using full hydrodynamics;
353 a simulation of the evolution of a globular cluster using a combination of tools, together simulating
354 the various types of physics <em class="todo">@@ TODO reference to MODEST-like activities</em>; or
355 simulations calculating the few seconds of a supernova explosion in full 3D.
356 </p>
357 <p>
358 In general these simulations will evolve this system forward
359 in time and are able to produce <i>snapshots</i>, representing the state of the system, a 3D volume of space,
360 at a number of discrete times (though there are alternatives: light cone simulations, individual particle orbits).
361 These direct, raw results of simulations we call Level-0 products, following
362 similar terminology for observations.
363 SimDB also covers Level-1 products, which consist of the results of certain types of post-processing
364 of simulations, namely those products that in some form create an alternative representation of
365 a spatial sub-volume of the universe. For example a density field calculated on a regular grid, derived
366 from an N-body or an AMR simulation; a cluster catalogue derived using some group finder applied
367 to a cosmological simulation; or a synthetic galaxy catalogue derived from the cluster catalogue using
368 halo occupation distribution models (HODs) or semi-analytical models (SAMs).
369 </p>
370 <p>We do not make any restrictions on the type of systems being simulated, or the size of the
371 simulation, or the way the system is represented in the simulation code and results. We also
372 make no restrictions on the type of "observables" produced by the simulations.
373 </p>
374 <p>
375 The SimDAP
376 specification will include protocols for services that process level-0 or level-1 results and produce
377 other level-1 results. The allowed services deal with selecting the results in a
378 sub-volume of the complete result, sampling a regular 3-dimensional grid, etc. SimDAP also allows for
379 services that do not produce SimDB-like, level-0 or 1 products. Examples are projections, 1D or 2D samplings.
380 Custom services will also be allowed, for example calculating statistical properties such as correlation
381 functions or power spectra in cosmological simulations. A more detailed description of SimDAP
382 is outside of the main scope of this note.
383 </p>
384 <h3><a name="sec2_2"/>2.2 Simulation Database: structure, interface and applicable services</h3>
385 <p>
386 SimDB is a specification that defines the interface to a database containing meta data describing
387 simulations. To this end it contains two main parts, one is a model for the meta data, the other
388 a protocol for interacting with the database. The model is the core of the specification.
389 It describes the structure of individual data products in the database. We have chosen UML
390 as modelling language, as prescribed by the data modelling working group in the interoperability meeting
391 in Cambridge, UK, May 2003.
392 </p>
393 <p>
394 The UML model is a logical model (see [..] <em class="todo">@@ TODO add reference @@</em>) and
395 forms the basis for physical representations of the data products in the standard
396 language that the IVOA has chosen for such purposes, XML. We derive an XML schema defining valid
397 XML documents directly from the logical model. The SimDB interface will include functions for inserting
398 SimDB data products using such documents, and for retrieving individual, identified data products.
399 </p>
400 <p>
401 The logical model also forms the basis for a physical representation supporting formulation of queries.
402 For various reasons explained below we have chosen ADQL to be the query language and accordingly we derive
403 from the model a relational schema that defines the tables and columns that can be used in ADQL queries sent
404 to a SimDB implementation. The result of ADQL queries is supposed to be a VOTable, and this will in general
405 not represent a complete SimDB data product. However it can be used to browse the database, finally identifying
406 resources and possibly requesting these from the SimDB as XML documents.
407 </p>
408 <p>
409 We make very limited assumptions on <em>how</em> a data product discovered in a SimDB can actually be accessed.
410 We only assume there is a web-based service available, identified by a base URL and tagged with a service type.
411 The range of service types will be defined by SimDAP, but it will at least include "download" and "custom".
412 The data model contains an explicit element for indicating which services are available for a given data product,
413 and users may, if they wish, retrieve this information through ADQL queries and follow the links directly.
414 SimDB implementations can and likely will eventually provide SimDAP related functionality, but this is not part
415 of this specification.
416 </p>
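<p>As a rough illustration of how little is assumed about access, the record below carries only what the text names: a data product, a service type and a base URL. It is a sketch, not part of the model. The class and function names, the product identifiers and the example.org URLs are invented for this example, and only "download" and "custom" are taken from the text as service types.</p>

```python
from dataclasses import dataclass

# The two service types the text guarantees; SimDAP is expected to define more.
KNOWN_SERVICE_TYPES = {"download", "custom"}


@dataclass(frozen=True)
class ServiceLink:
    """Hypothetical record linking a data product to a web-based service."""
    product_id: str
    service_type: str
    base_url: str


def services_for(links, product_id, service_type=None):
    """Return the links for one product, optionally restricted to one type."""
    return [
        link
        for link in links
        if link.product_id == product_id
        and (service_type is None or link.service_type == service_type)
    ]


if __name__ == "__main__":
    links = [
        ServiceLink("snapshot-42", "download", "http://example.org/download"),
        ServiceLink("snapshot-42", "custom", "http://example.org/powerspectrum"),
        ServiceLink("snapshot-43", "download", "http://example.org/download"),
    ]
    for link in services_for(links, "snapshot-42"):
        print(link.service_type, link.base_url)
```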
417 <h3><a name="sec2_3"/>2.3 Registration</h3>
418 <p>
419 It must be possible to find SimDB instances in an IVOA Resource Registry <em class="todo">@@ TODO add references @@</em>.
420 This implies we need a corresponding resource type, and we have to design its structure.
421 We also assume that one may define resources in the sense of [...]
422 <em class="todo">@@ TODO add reference to Resource data model document @@</em>
423 from within the contents of a SimDB. We take this into account explicitly in the model.
424 The SimDB will have a "getIVOAResource" function, which will execute the appropriate transformation from
425 the internal representation of the SimDB data products to the Resource model's XML representation [...]
426 <em class="todo">@@ TODO link to Resource XML schema document@@</em>.
427 This will likely put more requirements on the Registry model itself, maybe requiring extensions to its schema.
428 Possibly a SimDB itself can be an extension registry. This we think can be postponed to a future version of the
429 specification.
430 </p>
431 <h3><a name="sec2_4"/>2.4 Technology: UML, XMI, XSLT</h3>
432 <p>
433 We
434 </p>
435 <h3><a name="sec2_5"/>2.5 Reference implementations</h3>
436 <!-- ++++++++++++++++++++++++ -->
437
438 <h2><a name="sec3"/>3 Usage scenarios</h2>
439 <em class="todo">@@ TODO needs severe editing @@</em>
440 We have assembled a list of explicit use cases and scenarios from which we derive
441 requirements for the current model and the SNAP protocol.
442 <h4><a name="sec3_1"/>3.1 "20 questions"</h4>
443 <p>
444 SimDB defines a common data model for simulations.
445 Following the good practice for database design initiated in [], we here provide a number of
446 scientific questions one might want to ask such a database. The data model and associated data
447 access protocol need to be sufficiently rich that they can support such questions.
448 </p>
449 <ul>
450 <li> Scientific goal: investigate baryon wiggles in the evolved density field<br/>
451 Query: Return all cosmological, pure dark matter, N-body simulations with WMAP 3 initial
452 conditions and a box size of at least 1000 Mpc comoving, containing snapshots at about
453 10 redshifts between 3 and 0.
454 </li>
455 <li> Scientific goal: investigate whether observed structures in X-ray clusters that seem to
456 indicate turbulence can truly be that.<br/> Query: return all hydro-dynamical simulations of
457 galaxy clusters of mass at least 10<sup>14</sup> M<sub>sun</sub>,
458 that have a model for viscosity included in the simulation.
459 Moreover, return only those simulations that have associated to them an online visualisation
460 service that can produce projected temperature and pressure maps.
461 </li>
462 <li> Scientific goal: interpret the possible histories of an observed galaxy merger to calculate
463 possible star formation episodes and compare these to the observed stellar populations.<br/>
464 Query: Return all simulations of galaxy mergers where the component galaxies have a particular
465 mass ratio and where there are enough snapshots to follow the evolution over a few Gyr.
466 </li>
467
468 <li> Scientific goal: compare the luminosity function of galaxies in the SDSS survey with those
469 in synthetic catalogues.<br/>Query: Select all cosmological simulations that have produced as
470 secondary product synthetic galaxy catalogues on a light-cone and provide those via an SQL (ADQL?)
471 query interface.
472 </li>
473 <li> ...
474 </li>
475 </ul>
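<p>To make the first question above concrete, the sketch below phrases it as an SQL query of the kind ADQL allows and runs it against a throw-away SQLite database. Everything about the schema is an assumption for this example: the table and column names (<code>simulation</code>, <code>snapshot</code>, <code>kind</code>, <code>cosmology</code>, <code>box_size_mpc</code>, <code>redshift</code>) are not those of the SimDB relational schema, and "about 10 redshifts" is read here as "at least 10 snapshots".</p>

```python
import sqlite3

# Hypothetical, heavily simplified tables; the real SimDB relational schema
# is derived from the UML model and will look different.
SCHEMA = """
CREATE TABLE simulation (
    id INTEGER PRIMARY KEY,
    name TEXT,
    kind TEXT,
    cosmology TEXT,
    box_size_mpc REAL
);
CREATE TABLE snapshot (
    id INTEGER PRIMARY KEY,
    simulation_id INTEGER REFERENCES simulation(id),
    redshift REAL
);
"""

# Large dark matter N-body boxes with at least 10 snapshots between z=3 and z=0.
QUERY = """
SELECT s.name, COUNT(sn.id) AS n_snapshots
FROM simulation AS s
JOIN snapshot AS sn ON sn.simulation_id = s.id
WHERE s.kind = 'dark matter N-body'
  AND s.cosmology = 'WMAP3'
  AND s.box_size_mpc >= 1000
  AND sn.redshift BETWEEN 0 AND 3
GROUP BY s.name
HAVING COUNT(sn.id) >= 10
ORDER BY s.name
"""


def build_demo_db():
    """Create an in-memory database with three made-up simulations."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO simulation VALUES (?, ?, ?, ?, ?)",
        [
            (1, "big_box", "dark matter N-body", "WMAP3", 1500.0),
            (2, "small_box", "dark matter N-body", "WMAP3", 100.0),
            (3, "sparse_box", "dark matter N-body", "WMAP3", 2000.0),
        ],
    )
    rows = []
    # big_box and small_box: 12 snapshots at z = 0.0, 0.25, ..., 2.75
    for sim_id in (1, 2):
        rows += [(sim_id, 0.25 * i) for i in range(12)]
    # sparse_box: only 3 snapshots
    rows += [(3, z) for z in (0.0, 1.0, 2.0)]
    con.executemany(
        "INSERT INTO snapshot (simulation_id, redshift) VALUES (?, ?)", rows
    )
    return con


def find_candidates(con):
    """Run the example query and return (name, n_snapshots) rows."""
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    print(find_candidates(build_demo_db()))
```

<p>In this made-up data only the large, densely sampled box qualifies; the small box fails the size cut and the sparse one has too few snapshots.</p>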
476 <p>
477 In the design of the model it is useful to think about the steps a user might go through
478 when querying a database system in various "drilling down" steps. For example the following
479 questions might be asked :
480 </p>
481 <ul>
482 <li>What system/object is being simulated?</li>
483 <li>What physical processes are included?</li>
484 <li>How is the system being represented in the simulation
485 (particles (Lagrangian), (adaptive) mesh (Eulerian), both, other)?</li>
486 <li>Per process:<ul>
487 <li>How are the physical processes implemented?</li>
488 <li>Characterise the numerical approximations (e.g. resolution, softening parameter)</li></ul></li>
489 <li>What observables are available for the system/object, possibly as function of time?
490 As it is a spatial system, at least size, center-of-mass position.</li>
491 <li>What observables are available for the constituents, i.e. what is the schema of the atomic objects?</li>
492 <li>Per snapshot, per atomic object type, per variable:
493 <ul>
494 <li>Characterise the possible values</li>
495 <li>Characterise the result</li></ul></li>
496 <li>Are post-processing results available?</li>
497 <li>Are services/applications available working on the results?</li>
498 <li>Which code ran the simulation?</li>
499 <li>What were values of physical parameters?</li>
500 <li>How were initial conditions created, what parameters?</li>
501 </ul>
502
503
504 <h4><a name="sec3_2"/>3.2 SimDB-standard implementation</h4>
505 We foresee a simple implementation scenario based directly on products developed
506 in the course of the SimDB effort. We believe that from the data model to be developed
507 in this effort we should be able to derive physical representations that
508 can be used directly in implementations. We envision that with only a little custom infrastructure code
509 it should be possible to
510 <ul>
511 <li>fill a relational database with tables and views representing the SimDB data model from
512 DDL scripts generated from the UML</li>
513 <li>create a web-based service that accepts XML documents for inserting new simulation results
514 and translates these, using generated code with JAXB annotations, to in-memory Java objects</li>
515 <li>flush these objects to a relational database using a Java Persistence API (JPA) implementation,
516 structured using the JPA annotations generated on the Java classes.
517 It should be not too hard to support other languages as well if they provide similar simple XML binding and
518 OR-mapping capabilities. Python+Django and C#+LINQ or NHibernate come to mind.<em class="todo">
519 @@ TODO check with people knowing more about these technologies @@</em></li>
520 <li>accept ADQL queries that are translated to the appropriate vendor specific SQL
521 (using modules defined by the ADQL effort?) and return a VOTable</li>
522 <li>accept requests for identified SimDB resources (using an IVO or implementation specific identifier),
523 translate this into a JPA query to retrieve the object from the database, which is translated to
524 the appropriate XML using the JAXB layer and sent back to the user.</li>
525 </ul>
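<p>
The round trip listed above (XML document in, in-memory object, relational row, XML document out) can be
sketched in a few lines. This is only an illustration under stated assumptions: Python's standard library
stands in for the JAXB and JPA layers, and the <code>Simulation</code> class, the <code>simulation</code> table
and all element names are hypothetical, not taken from the SimDB data model.
</p>

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Simulation:
    """Hypothetical in-memory object; not a class of the SimDB data model."""
    identifier: str
    name: str
    code: str


def parse_simulation(xml_text: str) -> Simulation:
    """XML binding step: turn an uploaded document into an object."""
    root = ET.fromstring(xml_text)
    return Simulation(
        identifier=root.get("id"),
        name=root.findtext("name"),
        code=root.findtext("code"),
    )


def store(conn: sqlite3.Connection, sim: Simulation) -> None:
    """Object-relational step: flush the object to a table."""
    conn.execute(
        "INSERT INTO simulation (identifier, name, code) VALUES (?, ?, ?)",
        (sim.identifier, sim.name, sim.code),
    )


def fetch_as_xml(conn: sqlite3.Connection, identifier: str) -> str:
    """Resource request: look the object up and serialise it back to XML."""
    row = conn.execute(
        "SELECT identifier, name, code FROM simulation WHERE identifier = ?",
        (identifier,),
    ).fetchone()
    if row is None:
        raise KeyError(identifier)
    root = ET.Element("simulation", {"id": row[0]})
    ET.SubElement(root, "name").text = row[1]
    ET.SubElement(root, "code").text = row[2]
    return ET.tostring(root, encoding="unicode")


# Hypothetical one-table schema standing in for the generated DDL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulation (identifier TEXT PRIMARY KEY, name TEXT, code TEXT)"
)
uploaded = (
    '<simulation id="sim-1"><name>Test run</name>'
    "<code>ExampleCode</code></simulation>"
)
store(conn, parse_simulation(uploaded))
round_trip = fetch_as_xml(conn, "sim-1")
```

<p>
In the scenario we foresee, the binding and mapping code would be generated from the UML rather than written by hand
as it is here.
</p>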
526
527 <h4><a name="sec3_3"/>3.3 Legacy database</h4>
528 Although by no means as common as similar efforts in the observational domain,
529 databases have been developed containing the meta data of simulations.
530 How could a SimDB be implemented around such a database?
531 Our ideas are inspired by (what we understand of) the "global-as-view" approach to information
532 integration. We assume the implementers have their own way of filling up their database with meta-data
533 describing simulations from their own efforts. The idea is that they write database views to provide
534 a virtual implementation of the SimDB/RDB schema. ADQL queries sent to their service can now still be
535 understood and replied to. The implementers should also be able to write custom code to produce the appropriate
536 XML documents based on a request for an identified resource, possibly by querying these same views.
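<p>
A minimal sketch of the view-based approach, using SQLite from Python's standard library. The legacy table
<code>my_runs</code>, the view <code>simulation</code> and all column names are hypothetical; the real target
would be the tables of the SimDB/RDB schema, which are not yet fixed.
</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Legacy table, laid out the way the local group happens to store things.
    CREATE TABLE my_runs (run_id INTEGER PRIMARY KEY, label TEXT,
                          box_mpc REAL, n_part INTEGER);
    INSERT INTO my_runs VALUES (1, 'lcdm-256', 100.0, 16777216);
    INSERT INTO my_runs VALUES (2, 'lcdm-512', 500.0, 134217728);

    -- View exposing the legacy rows under the (hypothetical) standard schema.
    CREATE VIEW simulation AS
        SELECT 'local:run/' || run_id AS identifier,
               label                  AS name,
               box_mpc                AS box_size,
               n_part                 AS number_of_particles
        FROM my_runs;
    """
)

# A query written against the standard schema only; it never names my_runs.
standard_query = (
    "SELECT identifier, name FROM simulation "
    "WHERE box_size > 200 ORDER BY identifier"
)
rows = conn.execute(standard_query).fetchall()
```

<p>
The legacy tables stay untouched; only the view definitions need to follow changes in the standard schema.
</p>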
537
538 <h4><a name="sec3_4"/>3.4 Meta data production pipe line</h4>
539 The SimDB data model is relatively comprehensive, which reflects itself in XML documents
540 of substantial size and complexity for realistic cases.
541 For a registration scenario, i.e. one where a user is allowed to upload XML documents to a SimDB implementation,
542 one would prefer not to have to produce these documents by hand. By far the preferred manner in our opinion
543 would be for simulation and post-processing pipe-lines to produce compliant documents.
544 We have contacted authors of some of the most popular major simulation codes (Springel; Norman et al; more needed),
545 and they have agreed that this is feasible and are willing to participate in this effort.
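<p>
As an illustration of what such a pipeline step might look like, the following sketch writes a small metadata
document from values the code already holds at run time. The function <code>describe_run</code> and the element
and attribute names are hypothetical and do not follow the SimDB/XML schema.
</p>

```python
import xml.etree.ElementTree as ET


def describe_run(identifier, code_name, parameters):
    """Build a metadata document from values the pipeline already knows."""
    root = ET.Element("simulation", {"id": identifier})
    ET.SubElement(root, "code").text = code_name
    params = ET.SubElement(root, "parameters")
    # Sort the parameters so that repeated runs produce identical documents.
    for name, value in sorted(parameters.items()):
        param = ET.SubElement(params, "parameter", {"name": name})
        param.text = repr(value)
    return ET.ElementTree(root)


doc = describe_run("run-42", "ExampleCode", {"omega_m": 0.3, "box_size": 100.0})
xml_text = ET.tostring(doc.getroot(), encoding="unicode")
```

<p>
A pipeline would typically call such a function once at the end of a run and write the result next to the data products.
</p>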
546
547 <h4><a name="sec3_5"/>3.5 Client tools</h4>
548 One reason to produce a standard which uses ADQL on top of a standard data model is that client tools
549 can be written to query different such holdings. For example we could envision a tool such as VisIVO [..]
550 to offer some user-friendly interface for querying SimDB implementations retrieved from an IVOA Registry.
551 The user need not see any ADQL, which is all generated by VisIVO, but can be shown results and services.
552 In particular if a cut-out service is available, VisIVO could provide an interface for the user to decide
553 on the sub-volume, retrieve and visualise it. The advantage of having a standard data model
554 clearly is that the same ADQL can be sent to all SimDB services.
555 <em class="todo">@@ TODO contact VisIVO people to see whether this could be implemented @@</em>.
556
557 <!-- ++++++++++++++++++++++++ -->
558
559
560 <h2><a name="sec4"/>4 Analysis model</h2>
561 <em class="todo">@@TODO Gerard@@</em>
562 An <i>analysis model</i>, also called domain model, is an abstract, high-level representation of the
563 <i>universe of discourse</i> (UoD), the part of the world that our application deals with.
564 It is a UML model, with emphasis on the concepts and their exact relationships in the UoD, though details
565 such as attributes need not be completely filled in.
566 Importantly, it should not be influenced by application scenarios apart from knowledge of their UoD.
567 Here we describe the UoD and our analysis model. The model is strongly influenced by patterns
568 discovered in earlier work on a
569 <i><a href="http://www.ivoa.net/internal/IVOA/IvoaDataModel/DomainModelv0.9.1.doc">Domain model for Astronomy</a></i>,
570 co-written by one of the authors of the present note. We describe some of its main patterns below as well.
571
572 <h4><a name="sec4.1"/>4.1 Universe of Discourse</h4>
573
574 <h4><a name="sec4.2"/>4.2 Domain Model for Astronomy</h4>
575
576 <h4><a name="sec4.3"/>4.3 SimDB analysis model</h4>
577 <em class="todo">@@TODO create a version and add it to volute@@</em>.
578
579 <!-- ++++++++++++++++++++++++ -->
580
581 <h2><a name="sec5"/>5 Logical Model: SimDB</h2>
582 <p>
583 Here we introduce the core of our proposal, the UML representation of our logical data model
584 for our Simulation Database.
585 <h4><a name="sec5_1"/>5.1 Overview</h4>
586 <p>
587 The logical data model is a fully detailed model of the application domain. It is to form the basis of physical
588 models, representing the model in various computational environments.
589 The logical model is represented as a set of UML diagrams, which we created using MagicDraw Community Edition 12.1 and stored as an
590 XMI file in the GoogleCode
591 SVN repository: <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/SNAP_Simulation_DM.xml">
592 SNAP_Simulation_DM.xml</a> <em class="todo">@@TODO should change all occurrences of names with SNAP to using SimDB@@</em>
593 JPG representations of the model can be found in <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/images/">this</a>
594 directory. <em class="todo">@@TODO find proper representation image of the complete model. Possibly color packages differently.@@</em>
595 </p>
596 <h4><a name="sec5_2"/>5.2 Normalisation</h4>
597 <h4><a name="sec5_3"/>5.3 Target</h4>
598 <h4><a name="sec5_4"/>5.4 Characterisation</h4>
599 <h4><a name="sec5_5"/>5.5 Semantics</h4>
600 <h4><a name="sec5_6"/>5.6 Units</h4>
601
602
603 <h2><a name="sec6"/>6 Physical models</h2>
604 <h3><a name="sec6_1"/>6.1 RDBM Schema</h3>
605
606 <h3><a name="sec6_2"/>6.2 XML Schema</h3>
607
608 <h3><a name="sec6_3"/>6.3 Identifiers</h3>
609 <h3><a name="sec6_4"/>6.4 Java/JPA+JAXB (non normative)</h3>
610
611 <h2><a name="sec7"/>7 Query Protocols</h2>
612 <h3><a name="sec7_1"/>7.1 ADQL</h3>
613 <h3><a name="sec7_2"/>7.2 REST</h3>
614 <p>
615 By REST we mean a protocol whereby data products can be retrieved through
616 HTTP GET requests. Possibly they can also be POSTed or PUT.
617 This needs to be discussed further, but can perhaps be deferred to a future release.
618 A GET will only ever retrieve a complete SimDB resource, serialised to SimDB/XML.
619 </p>
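<p>
A sketch of the GET side only, written as a plain function so that it does not depend on any particular web framework.
The path prefix <code>/simdb/resource/</code>, the in-memory <code>RESOURCES</code> store and the element name are
assumptions made for illustration; the actual URL layout is still to be discussed.
</p>

```python
import xml.etree.ElementTree as ET

# Stand-in for the database: identifier -> already serialised resource.
RESOURCES = {
    "sim-1": ET.tostring(
        ET.Element("simulation", {"id": "sim-1"}), encoding="unicode"
    ),
}


def handle_get(path):
    """Resolve a request path to (status, content type, body)."""
    prefix = "/simdb/resource/"  # hypothetical URL layout
    if not path.startswith(prefix):
        return 404, "text/plain", "unknown path"
    identifier = path[len(prefix):]
    body = RESOURCES.get(identifier)
    if body is None:
        return 404, "text/plain", "no such resource"
    # Always the complete resource; there is no way to ask for a part of it.
    return 200, "application/xml", body


status, content_type, body = handle_get("/simdb/resource/sim-1")
```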
620 <h3><a name="sec7_3"/>7.3 TAP?</h3>
621 Issues:
622 <ul>
623 <li>How does TAP deal with units?</li>
624 <li>In TAP, does a table column containing values always have a single UCD and a single unit?</li>
625 <li>Is TAP suited for this kind of metadata databases?</li>
626 </ul>
627
628 <h2><a name="sec8"/>8 Next Steps</h2>
629 <h3><a name="sec8_1"/>8.1 Reference implementations</h3>
630 <h4><a name="sec8_1_1"/>8.1.1 France</h4>
631 <em class="todo">@@ TODO Laurent @@</em>
632 <h4><a name="sec8_1_2"/>8.1.2 Germany</h4>
633 <em class="todo">@@ TODO Gerard @@</em>
634 <h4><a name="sec8_1_3"/>8.1.3 Italy</h4>
635 <em class="todo">@@ TODO Patrizia @@</em>
636 <h4><a name="sec8_1_4"/>8.1.4 USA</h4>
637 <em class="todo">@@ TODO Rick @@</em>
638
639 <h3><a name="sec8_2"/>8.2 Generating XML from simulation pipe lines</h3>
640
641 <h3><a name="sec8_3"/>8.3 SimDAP services</h3>
642
643 <h2><a name="appA"/>Appendix A: Data modelling specifics</h2>
644 Here we describe various aspects of UML modelling as we applied it to the current
645 problem area.
646 <p>
647 UML allows communities to create a domain specific modelling language through its Profiling capabilities
648 <em class="todo">@@ TODO is this the proper term ?@@</em>.
649
650 We have an initial implementation of a UML profile as created by MagicDraw available under <a href="">this link</a>.
651 Here we list the main elements and give a short motivation for their inclusion in the model.
652 It is our opinion that the DM working group should be ultimately responsible for a profile such as this,
653 defining a domain specific language for all IVOA data modelling efforts.
654 </p>
655 <p>
656 As a first step in our generation pipeline we generate an XML document that represents the data model in a form
657 that is more easily interpreted, both by human readers and by XSLT scripts, than the XMI representation.
658 This document itself is structured according to an XML schema that
659 represents the UML profile rather directly and that we briefly describe here.
660 </p>
661 This schema is located in
662 <a href="http://volute.googlecode.com//svn/trunk/projects/theory/snapdm/input/intermediateModel.xsd">
663 http://volute.googlecode.com//svn/trunk/projects/theory/snapdm/input/intermediateModel.xsd</a>.
664
665
666 We introduce our own XML format, defined by the XML schema in
667 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/res/intermediateModel.xsd">intermediateModel.xsd</a>,
668 for representing the logical model. For the time being we call this the <i>intermediate representation</i>.
669 The first step in the generation pipeline is a translation of the XMI to an XML document following this format.
670 This transformation is implemented in the
671 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/res/xmi2intermediate.xsl">xmi2intermediate.xsl</a>
672 XSLT script. The latest version of the intermediate representation for the SimDB data model can be found in
673 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/output/SNAP_Simulation_DM_INTERMEDIATE.xml">this location</a>.
674 All other generation scripts work on this intermediate representation, not on the XMI document.
675 Variations in tool-generated XMI, or different versions of XMI can now be supported by an appropriately adjusted
676 XSLT script.
677 One reason why this may be useful is that different tools may produce different versions or different
678 dialects of XMI. Another reason for this representation is that XMI is a rather complex representation of a UML
679 model. Since we are using a rather restricted <a href="#profile">profile</a> we do not need this generality, and
680 this allows us to represent the model using XML documents that are much easier to handle with XSLT.
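<p>
To illustrate why a simple intermediate representation is convenient for generators, the following sketch derives DDL
from a toy document. Note the assumptions: the element and attribute names are invented for this example and are not
those defined in intermediateModel.xsd, the type mapping is hypothetical, and Python is used here in place of the
XSLT scripts of the actual pipeline.
</p>

```python
import xml.etree.ElementTree as ET

# Toy intermediate document. Element and attribute names are invented for
# this sketch and are NOT those defined in intermediateModel.xsd.
INTERMEDIATE = """
<model name="example">
  <objectType name="Simulation">
    <attribute name="name" datatype="string"/>
    <attribute name="boxSize" datatype="real"/>
  </objectType>
  <objectType name="Snapshot">
    <attribute name="redshift" datatype="real"/>
  </objectType>
</model>
"""

# Hypothetical mapping from model datatypes to SQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "real": "DOUBLE PRECISION",
    "integer": "INTEGER",
}


def generate_ddl(intermediate_xml: str) -> str:
    """Emit one CREATE TABLE statement per object type in the document."""
    model = ET.fromstring(intermediate_xml)
    statements = []
    for object_type in model.findall("objectType"):
        columns = ["id INTEGER PRIMARY KEY"]
        for attribute in object_type.findall("attribute"):
            sql_type = SQL_TYPES[attribute.get("datatype")]
            columns.append(f"{attribute.get('name')} {sql_type}")
        body = ",\n  ".join(columns)
        statements.append(
            f"CREATE TABLE {object_type.get('name')} (\n  {body}\n);"
        )
    return "\n\n".join(statements)


ddl = generate_ddl(INTERMEDIATE)
```

<p>
Because the generator only ever sees the intermediate format, a change of UML tool or XMI version affects the
first transformation step alone.
</p>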
681
682
683 <p>
684 We illustrate our UML profile using an example data model
685 derived from the SimDB/DM, shown in the following diagram:<br/>
686 <img src="img/example.jpg"/>
687 <br/>
688 We now describe the individual elements.
689 Some of these are standard, some are domain specific extensions following the
690 standard UML profile <i>stereotype</i> extension elements and associated <i>tag definitions</i>.
691
692 <ul>
693 <li>Model<br/>
694 (no visual counterpart)</li>
695 <ul>
696 <li> &lt;&lt;model&gt;&gt; </li>
697 <ul>
698 <li>TagDefinition: author</li>
699 <li>TagDefinition: title</li>
700 </ul>
701 </ul>
702 <li> Package <br/><img src="img/package.jpg" />
703 <ul>
704 <li>package containment</li>
705 <li>package dependency</li>
706 </ul>
707 </li>
708 <li> Class <br/><img src="img/class.jpg" />
709 <ul>
710 <li>isAbstract<br/>
711 </li>
712 </ul>
713 </li>
714 <li> DataType <br/><img src="img/datatype.jpg" /></li>
715 <li> Enumeration <br/><img src="img/enumeration.jpg" /></li>
716 <li> Property: attribute<br/><img src="img/attribute.jpg" /></li>
717 <ul><li>&lt;&lt;attribute&gt;&gt; </li>
718 <ul>
719 <li>TagDefinition: minLength<br/>
720 </li>
721 <li>TagDefinition: maxLength<br/>
722 </li>
723 </ul>
724 <li> &lt;&lt;ontologyterm&gt;&gt; </li>
725 <ul>
726 <li>TagDefinition: ontology<br/>
727 A URL locating a standard (RDF|SKOS|OWL|???) document containing
728 a list of terms from which the value for this attribute may be obtained.
729 It is our opinion that the Semantics working group should be responsible for the
730 definition of relevant ontologies (or semantic vocabularies, or thesauri, or ...)
731 required for a given application domain, though the contents should be decided in
732 cooperation with domain experts.
733 </li>
734 </ul>
735 </ul>
736 <li>Inheritance
737 <br/><img src="img/inheritance.jpg" /></li>
738 <li>Binary association end: collection
739 <br/><img src="img/collection.jpg" /></li>
740 <li>Binary association end: reference
741 <br/><img src="img/reference.jpg" /></li>
742 <li>Binary association end: subsets
743 <br/><img src="img/subsets.jpg" /></li>
744
745 </ul>
746
747 </p>
748
749
750 <h2><a name="appB"/>Appendix B: XSLT pipe line</h2>
751 <em class="todo">@@ TODO Laurent @@</em>
752
753 <h2><a name="glossary"/>Glossary and Acronyms</h2>
754 <dl>
755 <dt><a name="g_SimDB">SimDB</a></dt>
756 <dd></dd>
757 <dt><a name="g_SimDAP"/>SimDAP</dt>
758 <dd></dd>
759 <dt><a name="g_SimDB/DM"/>SimDB/DM</dt>
760 <dd>The logical data model defining the structure of <a href="#g_SimDB">SimDB</a>.</dd>
761 <dt><a name="g_SimDB/RDB"/>SimDB/RDB</dt>
762 <dd>The representation of the SimDB/DM as a relational data base schema.</dd>
763 <dt><a name="g_SimDB/XML"/>SimDB/XML</dt>
764 <dd>The XML representation of the SimDB/DM</dd>
765 <dt><a name="g_SimDB_resource"/>SimDB resource</dt>
766 <dd>A top-level data product stored in a SimDB.
767 A SimDB resource can be described in a SimDB/XML document, but none of its constituents can.</dd>
768 </dl>
769
770 <h2><a name="references">References</a></h2>
771
772 <p><a name="r_UML">[1] ???, <i>UML standard</i>
773 <br/><a href="http://">http://</a>
774 </p>
775 <p><a name="r_XMI">[2] ???, <i>XMI standard</i>
776 <br/><a href="http://">http://</a>
777 </p>
778 <p><a name="r_AnalaysisPatterns">[3] ???, <i>Analysis Patterns</i>
779 <br/><a href="http://">http://</a>
780 </p>
781 <p><a name="r_TheoryinVO">[4] Lemson &amp; Colberg, <i>Theory in the virtual observatory</i>
782 <br/><a href="http://">http://</a>
783 </p>
784
785 <p><a name="r_Characterisation">[5] ???, <i>Characterisation DM</i>
786 <br/><a href="http://">http://</a>
787 </p>
788
789 <p><a name="r_informatonIntegration">[6] <em class="todo">@@ TODO @@</em>references on global-as-view and information integration
790 <br/><a href="http://">http://</a>
791 </p>
792
793 <p><a name="r_visivo">[7] <em class="todo">@@ TODO @@</em>reference to VisIVO
794 <br/><a href="http://">http://</a>
795 </p>
796
797 </body></html>
