Contents of /trunk/projects/theory/snapdm/doc/note/SimDB-note.html


Revision 453
Mon May 12 08:31:24 2008 UTC by gerard.lemson
File MIME type: text/html
File size: 65268 byte(s)
updates ...

1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html>
4 <head>
5 <title>IVOA Working Group - Internal Draft</title>
6 <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
7 <meta name="keywords" content="IVOA, International, Virtual, Observatory, Alliance" />
8 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
9 <meta name="maintainedBy" content="IVOA Document Coordinator, ivoadoc@ivoa.net" />
10 <link rel="stylesheet" href="http://ivoa.net/misc/ivoa_wg.css" type="text/css" />
11 <link rel="stylesheet" href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/css/simdb-note.css" type="text/css" />
12 </head>
13
14 <body>
15 <div class="head">
16 <a href="http://www.ivoa.net/"><img alt="IVOA" src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" width="300" height="169"/></a>
17 <h1>Simulation Database (SimDB)<br/>
18 Version 0.x</h1>
19 <h2>IVOA Theory Interest Group <br />Internal Draft 2008 April 19 </h2>
20
21
22 <dt>This version:</dt>
23 <dd><a href="http://www.ivoa.net/Documents/...">
24 http://www.ivoa.net/Documents/...</a></dd>
25
26 <dt>Latest version:</dt>
27
28 <dd><a href="http://www.ivoa.net/Documents/latest/...">
29 http://www.ivoa.net/Documents/latest/...</a></dd>
30
31 <dt>Previous versions:</dt>
32 <dt>Interest Group:</dt>
33 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory"> http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory</a></dd>
34 <dt>Author(s):</dt>
35 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/GerardLemson">Gerard Lemson</a> (editor)<br /></dd>
36 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/LaurentBourges">Laurent Bourges</a><br /></dd>
37 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/PatriziaManzato">Patrizia Manzato</a><br /></dd>
38 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/RickWagner">Rick Wagner</a><br /></dd>
39 <dd>others?</dd>
40 <hr/></div>
41
42 <h2><a name="abstract" id="abstract">Abstract</a></h2>
43 <p>In this note we propose that the IVOA develop a standard protocol for discovering simulations.
44 We will call this protocol the <i>Simulation Database</i> (SimDB). Implementations of the SimDB will allow users to query for
45 results of simulations in quite some detail and will provide links to services for accessing these
46 simulations. </p>
47 <p>The results presented in this note, which form the core of the proposed standard, are one half of a concerted effort of the Theory Interest Group that originally went by the name
48 <i>Simple Numerical Access Protocol</i> (SNAP), and is now split into two parts. The second part defines protocols
49 for accessing the simulation data products themselves. This part will be written up in a separate Note
50 (Gheller, Wagner et al, in preparation), under the name Simulation Data Access Protocol (SimDAP).
51 </p>
52 <p>The current proposal is built around a UML data model describing simulations, a representation (mapping) of this model as a relational
53 database schema and a mapping to an XML schema.
54 We propose the relational schema to be the outer facade of a SimDB-TAP implementation which is to be queried using
55 <a href="http://www.ivoa.net/internal/IVOA/IvoaVOQL/ADQL-20080415.pdf">ADQL</a>. <em class="todo">@@ TODO update the ADQL link to later versions @@</em>
56 The XML schema provides type definitions from
57 which machine-readable serialisations of the model may be constructed. The schema also defines root elements for documents
58 describing SimDB-resources. The SimDB should return such documents for identified SimDB-Resources upon request, as an
59 alternative to the tabular (VOTable) results of ADQL queries.
60 In case updates are supported by a SimDB implementation, such documents may be sent to it to insert new resources.
61 </p>
62 <p>
63 This Note describes use cases and requirements, the approach we have taken to define a specification
64 that meets them, and the current state of the results. We feel that the results are
65 sufficiently far evolved that they can start following the formal IVOA standardisation track.
66 To this end it could be turned over to one of the existing working groups. If that is the decision, we feel
67 that the data modelling WG is closest to its scope, but there exist very strong links to Registry, Semantics, ADQL
68 and DAL as well. One might argue that a targeted WG for this effort alone might be as appropriate.
69 We leave the decision about this to the IVOA exec.
70 </p>
71
72
73
74 <div class="status">
75 <h2><a name="status" id="status">Status of this Document</a></h2>
76 This is a Note. The first release of this document was 2008 April 19.
77 <p></p><br />
78
79 <!-- Choose one of the following (and remove the rest)-->
80 <!--Note-->
81 <p>This is an IVOA Note expressing suggestions from and opinions of the authors.<br/>
82 It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory.
83 It should not be referenced or otherwise interpreted as a standard specification.</p>
84
85
86 A list of <a href="http://www.ivoa.net/Documents/">current IVOA Recommendations and other technical documents</a> can be found at http://www.ivoa.net/Documents/.
87
88 </div><br />
89
90 <h2><a name="acknowledgments" id="acknowledgments">Acknowledgments</a></h2>
91 <p>We thank various persons for useful discussions in the course of this work. First the participants of the
92 <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/CambridgeTheoryWorkshopFeb06">Feb 2006 theory
93 workshop</a> in Cambridge, UK, where this work was started. Second the participants of the
94 <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/GarchingSNAPWorkshop200704">April 2007 SNAP workshop</a> in
95 Garching, Germany, where the design started taking shape. Then we want to thank particularly the following persons
96 for useful discussions and feedback: Jeremy Blaizot, Klaus Dolag, Ray Plante, Volker Springel. We finally want to thank
97 participants of the theory sessions in the interoperability meetings in Victoria, Moscow, Beijing and Cambridge where parts
98 of this work were discussed.
99 </p>
100 <h2><a id="contents" name="contents">Contents</a></h2>
101 <div class="head">
102 <ul class="toc">
103 <li><a href="#abstract">Abstract</a></li>
104 <li><a href="#status">Status</a></li>
105 <li><a href="#acknowledgments">Acknowledgements</a></li>
106 <li><a href="#contents">Contents</a></li>
107 <li><a href="#sec1">1. Executive Summary</a></li>
108
109 <li><a href="#sec2">2. Overview</a></li>
110 <ul class="toc">
111 <li><a href="#sec2_1">2.1 SNAP &rArr; SimDB + SimDAP</a></li>
112 <li><a href="#sec2_2">2.2 Simulation Database: structure and interface</a></li>
113 <li><a href="#sec2_3">2.3 Registration</a></li>
114 <li><a href="#sec2_4">2.4 Technology: UML, XMI, XSLT</a></li>
115 <li><a href="#sec2_5">2.5 Reference implementations</a></li>
116 </ul>
117
118
119 <li><a href="#sec3">3 Usage scenarios</a></li>
120 <ul class="toc">
121 <li><a href="#sec3_1">3.1 "20 questions"</a></li>
122 <li><a href="#sec3_2">3.2 SimDB-standard implementation</a></li>
123 <li><a href="#sec3_3">3.3 Legacy database</a></li>
124 <li><a href="#sec3_4">3.4 Meta data production pipe line</a></li>
125 <li><a href="#sec3_5">3.5 Client tools</a></li>
126 </ul>
127
128 <li><a href="#sec4">4 Analysis model</a></li>
129 <ul class="toc">
130 <li><a href="#sec4_1">4.1 Universe of Discourse</a></li>
131 <li><a href="#sec4_2">4.2 <i>Domain Model for Astronomy</i></a></li>
132 <li><a href="#sec4_3">4.3 SimDB analysis model</a></li>
133 </ul>
134
135 <li><a href="#sec5">5 Logical model</a></li>
136 <ul class="toc">
137 <li><a href="#sec5_1">5.1 Overview</a></li>
138 <li><a href="#sec5_2">5.2 Normalisation</a></li>
139 <li><a href="#sec5_3">5.3 Model content</a></li>
140 <li><a href="#sec5_3_1">5.3.1 Resource hierarchy</a></li>
141 <li><a href="#sec5_3_2">5.3.2 Object types</a></li>
142 <li><a href="#sec5_3_3">5.3.3 Target</a></li>
143 <li><a href="#sec5_3_4">5.3.4 Characterisation</a></li>
144 <li><a href="#sec5_3_5">5.3.5 Semantics</a></li>
145 <li><a href="#sec5_3_6">5.3.6 Units</a></li>
146 <li><a href="#sec5_3_7">5.3.7 Services</a></li>
147 </ul>
148
149 <li><a href="#sec6">6 Physical models</a></li>
150 <ul class="toc">
151 <li><a href="#sec6_1">6.1 Identifiers and references</a></li>
152 <li><a href="#sec6_2">6.2 RDBM Schema</a></li>
153 <li><a href="#sec6_3">6.3 XML Schema</a></li>
154 <li><a href="#sec6_4">6.4 Identifiers</a></li>
155 <li><a href="#sec6_5">6.5 JAVA/JPA+JAXB (non-normative)</a></li>
156 </ul>
157
158 <li><a href="#sec7">7. Query protocols</a></li>
159 <ul class="toc">
160 <li><a href="#sec7_1">7.1 ADQL</a></li>
161 <li><a href="#sec7_2">7.3 REST</a></li>
162 <li><a href="#sec7_3">7.2 TAP?</a></li>
163 </ul>
164
165 <li><a href="#sec8">8. Next steps</a></li>
166 <ul class="toc">
167 <li><a href="#sec8_1">8.1 Reference implementations</a></li>
168 <ul class="toc">
169 <li><a href="#sec8_1_1">8.1.1 France</a></li>
170 <li><a href="#sec8_1_2">8.1.2 Germany</a></li>
171 <li><a href="#sec8_1_3">8.1.3 Italy</a></li>
172 <li><a href="#sec8_1_4">8.1.4 USA</a></li>
173 </ul>
174 <li><a href="#sec8_2">8.2 SimDAP services</a></li>
175 </ul>
176 <br/>
177 <li><a href="#appA">Appendix A: Data modelling specifics</a></li>
178 <li><a href="#appB">Appendix B: XSLT pipe line</a></li>
179 <li><a href="#glossary">Glossary and Acronyms</a></li>
180
181 <li><a href="#references">References</a></li>
182 </ul>
183 </div>
184 <hr/>
185
186
187 <br/>
188 <h2><a name="sec1">1. Executive summary</a></h2>
189 <em class="todo">@@ TODO Modify this text, which was originally an email to be sent to THEORY, TCG, DM, maybe EXEC @@</em>
190 <p>
191 We propose to derive two WG projects from what was so far the
192 SNAP project of the theory interest group: SimDB and SimDAP.
193 In this note we discuss the first of these, SimDB, in some detail.
194
195 </p>
196 <h3> Simulation Database (SimDB)</h3>
197 <p>We propose to develop a standard specification project, called the "Simulation Database" (SimDB).
198 It is based on the description+discovery part of the old
199 SNAP project. Its normative deliverables are
200 <ul>
201 <li> A logical data model for describing simulations.<br/>
202 Following SNAP we keep concentrating
203 on 3+1D simulations, by which we mean simulations modelling a
204 space-time sub-volume of the universe <em>of any size</em>: not only large
205 scale structure and galaxy clusters, but everything down to asteroid collisions.
206 As the model <i>describes</i> simulations, it may be called a meta-data model.
207 It will be a logical model in the sense of standard data modelling approaches <em class="todo">@@TODO add some references@@</em>,
208 and is based on an analysis (or domain) model, which is presented but is not normative.
209 The logical model is presented in fully detailed and documented UML2, serialised
210 to XMI 2.1, created using the MagicDraw 12.1 Community edition tool.
211 The data model uses a small subset of UML2 and has some UML profile
212 extensions added. Together this can be seen as a domain specific language,
213 and this can be formalised in a UML Profile. We will propose using such a profile
214 to the DM working group as a general approach for DM efforts.
215 </li>
216 <li>A query protocol based on the logical model.
217 <br />We propose this to have at least an ADQL version.
218 To this end we will provide a relational mapping.
219 This physical model is completely derived from the SimDB logical model using rules
220 implemented as a pipe-line of XSLT2 scripts working on the XMI representation of
221 the UML. The scripts will produce relational database DDL scripts defining the
222 database schema. That schema itself is not normative; instead we will define the
223 replies to TAP metadata queries. We provide implementation scenarios in the text below,
224 for the case of someone using the results from this project completely and for the
225 case of someone implementing a SimDB on top of a legacy database.
226 </li>
227 <li> A messaging format for sending instances of the various components
228 in the data model around.
229 <br />This format will be based on a number of XML
230 schema documents (XSDs), one of which contains the root elements defining valid SimDB resources.
231 This requires a mapping from the UML to XSD.
232 This mapping will take the form of one or more XSLT documents.
233 </li>
234 <li> An IVOA working draft document describing these components.
235 <br />This will be based on the current document.</li></ul>
236 </p>
237 <p>
238 We introduce some non-normative solutions that can be taken over for generic
239 data models (this is of course also true for the UML/XMI+XSLT approach for the
240 normative standards).
241 <ul>
242 <li> The XSLT scripts we propose above do not work on the XMI itself, but on
243 an intermediate representation of the UML data model. This is an XML dialect
244 based on a schema we define and which captures the UML profile more directly.
245 XMI is very generic and rather cumbersome to work with. The representation of
246 the UML in our intermediate XML form is much more readable and XSLT based on it
247 is much simpler. It also allows easier adaptation to future modifications in UML,
248 or to tools whose XMI representation is different from the standard. We only need
249 to update the XMI->Intermediate XSLT transformation scripts, not the more complex
250 transformations to the other official representations.
251 We will propose a similar approach to the DM WG.
252 </li>
253 <li> We will provide XMI->Java+JPA+JAXB transformation scripts in XSLT (properly, intermediate->Java).
254 These scripts generate Java classes corresponding to the types (Class, DataType, Enumeration)
255 in UML. These classes are annotated with Java Persistence API (JPA)
256 and Java Architecture for XML Binding (JAXB) annotations to assist in the transformation
257 between relational database and XML representations.
258 Similar scripts can be written for C#, which has supported comparable annotations (attributes) for longer than Java, where they arrived with Java 5.
259 For persistence we will likely use LINQ, which seems similar to JPA.
260 </li>
261 <li>We propose an approach for including application specific and legacy simulation databases
262 in this framework. This approach follows the "global-as-view" approach to information
263 integration (see for example http://www.deg.byu.edu/papers/PODS.integration.pdf;
264 Leonid Kalinichenko from the RVO is an expert in this field).
265 Implementors with an existing relational database schema may be able to define database
266 views which implement the relational representation of the SimDB data model,
267 and in this way provide a simple way to support querying of their database using ADQL.
268 </li></ul></p>
269 <h4>Organisation</h4>
270 <p>
271 The SimDB is ready to be transferred to the DM WG.
272 <br />We propose that Gerard Lemson keeps leading this effort (as main editor), also when it is moved
273 to that WG. The DM WG's chair (Mireille Louys) will be responsible for all WG-chair
274 issues associated with moving a specification through the document process.
275 The people at the bottom will be part of a "tiger team" to push the standard to RFC.
276 We may want to expand this group with an expert from each of the WGs mentioned below.
277 </p>
278 <p>
279 We have been discussing the data model for some time now.
280 Various projects (Italy, USA, France and Germany) have implementations that are similar
281 to the envisioned SimDB. We believe that by autumn 2008 it can go to RFC.
282 Patrizia Manzato and Rick Wagner will have reference implementations based on existing DBs,
283 as will various projects in France (Lyon: Jeremy Blaizot and Laurent Bourges;
284 GalMer database: Igor Chilingarian) and GAVO.
285 </p>
286 <p>
287 Other relevant working groups for this process are Registry, ADQL and Semantics, possibly DAL.
288 Registry because the simulation database is similar to a registry. We can
289 learn from implementations and the registry interface. Also, we (think we) may need an
290 extension to the IVO Identifier in the implementation of references in SimDB.
291 ADQL because we propose it to be the standard (main) query interface to a SimDB implementation.
292 Semantics because our model includes usage of semantic vocabularies, maybe full ontologies.
293 DAL because our proposal for using ADQL in the query phase requires a version of
294 the TAP protocol for defining the interface.
295 We would like to include a person from each of these WGs in the tiger team.
296 Our wishes are: Ray Plante (Registry), ? (ADQL), Norman Gray (Semantics), ? (TAP).
297 Ray and Norm have contributed to early discussions about SNAP.
298 </p>
299 <p>
300 Of these other efforts, TAP seems to pose the main risk to the SimDB standard going to
301 RFC by the autumn. What may help us is that we do not need all the details of TAP.
302 In particular the information_schema approach allowing users to
303 query for the data model is not required as it is part of the SimDB specification.
304 We mainly need a prescription for sending ADQL queries to the SimDB, and what the
305 format of results should be.
306 Since we expect meta-data databases to be relatively small (compared to,
307 say, an SDSS or Millennium database), we expect few, if any, problems with
308 performance and can stick to synchronous behaviour at first.
309 </p>
310 <p>
311 We may need some explicit registry-interface like features such as returning a
312 complete XML document according to the messaging format of the SimDB data model.
313 Other issues will come up during the next phase of the discussions.
314 </p>
315
316 <h3>Simulation Data Access Protocol (SimDAP)</h3>
317 <p>
318 We propose to rename the second spin-off of the SNAP project the <i>Simulation Data Access Protocol</i> (SimDAP).
319 It deals with accessing the data after discovery by some means,
320 likely through an implementation of a Simulation Database.
321 It should handle special services such as cut-out, projection,
322 extraction (AMR-like cut-outs, produces regular grids), but also staging etc.
323 It should also deal with data formats. Claudio Gheller (Italy) is leading
324 this effort with close help from Rick Wagner (USA).
325 </p>
326 <p>
327 This project needs more fleshing out and will hopefully be ready to be transferred
328 to a WG, likely DAL, by the autumn interop.
329 </p>
330 <h3>Connections between SimDB and SimDAP</h3>
331 <p>
332 The two projects are connected as follows:
333 The meta-data formats to be included in SimDAP messages are derived from
334 the data model of the SimDB.
335 Vice versa, the SimDB will include a component describing
336 which SimDAP services are applicable/available for a given simulation.
337 </p>
338
339 <!-- ++++++++++++++++++++++++ -->
340 <h2><a name="sec2"/> 2 Overview</h2>
341
342 <h3><a name="sec2_1"/>2.1 SNAP &rArr; SimDB + SimDAP</h3>
343 <p>This document presents a model for describing certain types of numerical computer simulations
344 and certain types of simulation post-processing products. The model was originally envisioned to
345 be used in the query part of the <i>Simple Numerical Access Protocol</i> (SNAP),
346 and in discovery of interesting SNAP services in the first place.
347 After investigating the application domain carefully, we have decided to abandon the concept of
348 designing a DAL-like SxAP protocol for simulations. Instead we have split up the effort into
349 two separate efforts that can each be used in their own right, though there is a clear link between them.
350 This document discusses the first of these, which we have named the <i>Simulation Database</i>, and
351 which will have the acronym <i>SimDB</i>. The second will be developed further in a separate effort and is
352 called the <i>Simulation Data Access Protocol</i> (SimDAP, "Sim" stands for "Simulation", <i>not</i> "Simple"!).
353 </p>
354 <p>
355 Following SNAP, SimDB only explicitly considers simulations for systems that represent a space-time
356 sub-volume of the universe and (part of) its material contents. Examples of such simulations are
357 cosmological, pure dark matter N-body simulations of the large-scale structure of the universe;
358 adaptive mesh refinement (AMR) simulations following the evolution of a galaxy cluster using full hydrodynamics;
359 a simulation of the evolution of a globular cluster using a combination of tools, together simulating
360 the various types of physics <em class="todo">@@ TODO reference to MODEST-like activities</em>; or
361 simulations calculating the few seconds of a supernova explosion in full 3D.
362 </p>
363 <p>
364 In general these simulations will evolve the system forward
365 in time and are able to produce <i>snapshots</i>, representing the state of the system, a 3D volume of space,
366 at a number of discrete times (though there are alternatives: light cone simulations, individual particle orbits).
367 These direct, raw results of simulations we call Level-0 products, following
368 similar terminology for observations.
369 SimDB also covers Level-1 products, which consist of the results of certain types of post-processing
370 of simulations, namely those products that in some form create an alternative representation of
371 a spatial sub-volume of the universe. Examples are a density field calculated on a regular grid, derived
372 from an N-body or an AMR simulation; a cluster catalogue derived using some group finder applied
373 to a cosmological simulation; or a synthetic galaxy catalogue derived from the cluster catalogue using
374 halo occupation distribution models (HODs) or semi-analytical models (SAMs).
375 </p>
376 <p>
377 We do not make any restrictions on the type of systems being simulated, or the size of the
378 simulation, or the way the system is represented in the simulation code and results. We also
379 make no restrictions on the type of "observables" produced by the simulations.
380 </p>
381 <p>
382 The SimDAP
383 specification will include protocols for services that process level-0 or level-1 results and produce
384 other level-1 results. The allowed services deal with selecting the results in a
385 sub-volume of the complete result, sampling a regular 3-dimensional grid, etc. SimDAP also allows for
386 services that do not produce SimDB-like level-0 or level-1 products. Examples are projections and 1D or 2D samplings.
387 Custom services will also be allowed, for example calculating statistical properties such as correlation
388 functions or power spectra in cosmological simulations. A more detailed description of SimDAP
389 is outside the main scope of this note.
390 </p>
391 <h3><a name="sec2_2"/>2.2 Simulation Database: structure, interface and applicable services</h3>
392 <p>
393 SimDB is a specification that defines the interface to a database containing meta data describing
394 simulations. To this end it contains two main parts: one is a model for the meta data, the other
395 a protocol for interacting with the database. The model is the core of the specification.
396 It describes the structure of individual data products in the database. We have chosen UML
397 as modelling language, as prescribed by the data modelling working group in the interoperability meeting
398 in Cambridge, UK, May 2003.
399 </p>
400 <p>
401 The UML model is a logical model (see [..] <em class="todo">@@ TODO add reference @@</em>) and
402 forms the basis for physical representations of the data products in the standard
403 language that the IVOA has chosen for such purposes, XML. We derive an XML schema defining valid
404 XML documents directly from the logical model. The SimDB interface will include functions for inserting
405 SimDB data products using such documents, and for retrieving individual, identified data products.
406 </p>
407 <p>
408 The logical model also forms the basis for a physical representation supporting formulation of queries.
409 For various reasons explained below we have chosen ADQL to be the query language and accordingly we derive
410 from the model a relational schema that defines the tables and columns that can be used in ADQL queries sent
411 to a SimDB implementation. The result of ADQL queries is supposed to be a VOTable, and this will in general
412 not represent a complete SimDB data product. However it can be used to browse the database, finally identifying
413 resources and possibly requesting these from the SimDB as XML documents.
414 </p>
415 <p>
416 We make very limited assumptions on <em>how</em> a data product discovered in a SimDB can actually be accessed.
417 We only assume there is a web-based service available, identified by a base URL and tagged with a service type.
418 The range of service types will be defined by SimDAP, but it will at least include "download" and "custom".
419 The data model contains an explicit element for indicating which services are available for a given data product,
420 and users may, if they wish, retrieve this information through ADQL queries and follow the links directly.
421 SimDB implementations can and likely will eventually provide SimDAP related functionality, but this is not part
422 of this specification.
423 </p>
424 <h3><a name="sec2_3"/>2.3 Registration</h3>
425 <p>
426 It must be possible to find SimDB instances in an IVOA Resource Registry <em class="todo">@@ TODO add references @@</em>.
427 This implies we need a corresponding resource type, and we have to design its structure.
428 We also assume that one may define resources in the sense of [...]
429 <em class="todo">@@ TODO add reference to Resource data model document @@</em>
430 from within the contents of a SimDB. We take this into account explicitly in the model.
431 The SimDB will have a "getIVOAResource" function, which will execute the appropriate transformation from
432 the internal representation of the SimDB data products to the Resource model's XML representation [...]
433 <em class="todo">@@ TODO link to Resource XML schema document@@</em>.
434 This will likely put more requirements on the Registry model itself, maybe requiring extensions to its schema.
435 Possibly a SimDB itself can be an extension registry. This we think can be postponed to a future version of the
436 specification.
437 </p>
438 <h3><a name="sec2_4"/>2.4 Technology: UML, XMI, XSLT</h3>
439 <p>
440 We
441 </p>
442 <h3><a name="sec2_5"/>2.5 Reference implementations</h3>
443 <!-- ++++++++++++++++++++++++ -->
444
445 <h2><a name="sec3"/>3 Usage scenarios</h2>
446 <em class="todo">@@ TODO needs severe editing @@</em>
447 We have assembled a list of explicit use cases and scenarios from which we derive
448 requirements for the current model and the SNAP protocol.
449 <h4><a name="sec3_1"/>3.1 "20 questions"</h4>
450 <p>
451 SimDB defines a common data model for simulations.
452 Following the good practice for database design initiated in [], we here provide a number of
453 scientific questions one might want to ask such a database. The data model and associated data
454 access protocol need to be sufficiently rich that they can support such questions.
455 </p>
456 <ul>
457 <li> Scientific goal: investigate baryon wiggles in the evolved density field<br/>
458 Query: Return all cosmological, pure dark matter, N-body simulations with WMAP 3 initial
459 conditions and a box size of at least 1000 Mpc comoving, containing snapshots at about
460 10 redshifts between 3 and 0.
461 </li>
462 <li> Scientific goal: investigate whether observed structures in X-ray clusters that seem to
463 indicate turbulence can truly be that.<br/> Query: return all hydro-dynamical simulations of
464 galaxy clusters of mass at least 10<sup>14</sup> M<sub>sun</sub>,
465 that have a model for viscosity included in the simulation.
466 Moreover, return only those simulations that have associated to them an online visualisation
467 service that can produce projected temperature and pressure maps.
468 </li>
469 <li> Scientific goal: interpret the possible histories of an observed galaxy merger to calculate
470 possible star formation episodes and compare these to the observed stellar populations.<br/>
471 Query: Return all simulations of galaxy mergers where the component galaxies have a particular
472 mass ratio and where there are enough snapshots to follow the evolution over a few Gyr.
473 </li>
474
475 <li> Scientific goal: compare the luminosity function of galaxies in the SDSS survey with those
476 in synthetic catalogues.<br/>Query: Select all cosmological simulations that have produced as
477 secondary product synthetic galaxy catalogues on a light-cone and provide those via an SQL (ADQL?)
478 query interface.
479 </li>
480 <li> ...
481 </li>
482 </ul>
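<p>As an illustration only, the first question above could be phrased as a relational query along the following lines.
This is a minimal sketch, not the SimDB schema: the tables <code>simulation</code> and <code>snapshot</code>, all column names and the
vocabulary values are hypothetical, "about 10 redshifts" is read as "at least 10 snapshots", and SQLite (through Python)
stands in for an ADQL service.</p>

```python
import sqlite3

# Hypothetical, heavily simplified tables. The real SimDB relational schema
# is derived from the UML model and will look different.
SCHEMA = """
CREATE TABLE simulation (
    id INTEGER PRIMARY KEY,
    name TEXT,
    target TEXT,          -- e.g. 'cosmological'
    physics TEXT,         -- e.g. 'dark matter only'
    algorithm TEXT,       -- e.g. 'N-body'
    cosmology TEXT,       -- e.g. 'WMAP3'
    box_size_mpc REAL     -- comoving box size in Mpc
);
CREATE TABLE snapshot (
    id INTEGER PRIMARY KEY,
    simulation_id INTEGER REFERENCES simulation(id),
    redshift REAL
);
"""

# "About 10 redshifts between 3 and 0" is interpreted here as
# "at least 10 snapshots with redshift in [0, 3]" (an assumption).
QUERY = """
SELECT s.name, COUNT(p.id) AS n_snapshots
FROM simulation s JOIN snapshot p ON p.simulation_id = s.id
WHERE s.target = 'cosmological'
  AND s.physics = 'dark matter only'
  AND s.algorithm = 'N-body'
  AND s.cosmology = 'WMAP3'
  AND s.box_size_mpc >= 1000
  AND p.redshift BETWEEN 0 AND 3
GROUP BY s.id, s.name
HAVING COUNT(p.id) >= 10
ORDER BY s.name
"""


def find_baryon_wiggle_candidates(conn):
    """Run the query and return (name, number of snapshots) tuples."""
    return conn.execute(QUERY).fetchall()


def demo():
    """Fill an in-memory database with two made-up simulations and query it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO simulation VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "big-box", "cosmological", "dark matter only", "N-body", "WMAP3", 1500.0),
            (2, "small-box", "cosmological", "dark matter only", "N-body", "WMAP3", 100.0),
        ],
    )
    # Twelve snapshots per simulation, at redshifts 0.0, 0.25, ..., 2.75.
    rows = [(sim, 0.25 * i) for sim in (1, 2) for i in range(12)]
    conn.executemany(
        "INSERT INTO snapshot (simulation_id, redshift) VALUES (?, ?)", rows
    )
    return find_baryon_wiggle_candidates(conn)
```

<p>With the made-up rows in <code>demo()</code>, only the simulation with the 1500 Mpc box satisfies the box-size condition.
The point of the sketch is that such a question needs the model to expose the target, the physics, the algorithm,
the initial conditions and the snapshots as queryable items.</p>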
483 <p>
484 In the design of the model it is useful to think about the steps a user might go through
485 when querying a database system in various "drilling down" steps. For example the following
486 questions might be asked:
487 </p>
488 <p>
489 <ul>
490 <li>What system/object is being simulated?</li>
491 <li>What physical processes are included?</li>
492 <li>How is the system being represented in the simulation
493 (particles (Lagrangian), (adaptive) mesh (Eulerian), both, other)?</li>
494 <li>Per process:<ul>
495 <li>How are the physical processes implemented?</li>
496 <li>Characterise the numerical approximations (e.g. resolution, softening parameter)</li></ul></li>
497 <li>What observables are available for the system/object, possibly as function of time?
498 As it is a spatial system, at least size, center-of-mass position.</li>
499 <li>What observables are available for the constituents, i.e. what is the schema of the atomic objects?</li>
500 <li>Per snapshot, per atomic object type, per variable:
501 <ul>
502 <li>Characterise the possible values</li>
503 <li>Characterise the result</li></ul></li>
504 <li>Are post-processing results available?</li>
505 <li>Are services/applications available working on the results?</li>
506 <li>Which code ran the simulation?</li>
507 <li>What were the values of physical parameters?</li>
508 <li>How were initial conditions created, what parameters?</li>
509 </ul>
510 </p>
511
512 <h4><a name="sec3_2"/>3.2 SimDB-standard implementation</h4>
513 We foresee a simple implementation scenario based directly on products developed
514 in the course of the SimDB effort. We believe that from the data model to be developed
515 in this effort we should be able to derive physical representations that
516 can be used directly in implementations. We envision that with only a little custom infrastructure code
517 it should be possible to
518 <ul>
519 <li>fill a relational database with tables and views representing the SimDB data model from
520 DDL scripts generated from the UML</li>
521 <li>create a web-based service that accepts XML documents for inserting new simulation results
522 and translates these, using generated code with JAXB annotations, to in-memory Java objects</li>
523 <li>flush these objects to a relational database using a Java Persistence API (JPA) implementation,
524 structured using the JPA annotations generated on the Java classes.
525 It should not be too hard to support other languages as well if they provide similarly simple XML binding and
526 OR-mapping capabilities. Python+Django and C#+LINQ or NHibernate come to mind.<em class="todo">
527 @@ TODO check with people knowing more about these technologies @@</em></li>
528 <li>accept ADQL queries that are translated to the appropriate vendor specific SQL
529 (using modules defined by the ADQL effort?) and return a VOTable</li>
530 <li>accept requests for identified SimDB resources (using an IVO or implementation specific identifier),
531 translate this into a JPA query to retrieve the object from the database, which is translated to
532 the appropriate XML using the JAXB layer and sent back to the user.</li>
533 </ul>
534
535 <h4><a name="sec3_3"/>3.3 Legacy database</h4>
536 Although by no means as common as similar efforts in the observational domain,
537 databases have been developed containing the meta data of simulations.
538 How could a SimDB be implemented around such a database?
539 Our ideas are inspired by (what we understand from) the "global-as-view" approach to information
540 integration. We assume the implementers have their own way of filling up their database with meta-data
541 describing simulations from their own efforts. The idea is that they write database views to provide
542 a virtual implementation of the SimDB/RDB schema. ADQL queries sent to their service can now still be
543 understood and replied to. The implementers should also be able to write custom code to produce the appropriate
544 XML documents based on a request for an identified resource, possibly by querying these same views.
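<p>As a rough illustration of the view-based approach described above, the following Python sketch uses an in-memory SQLite database.
The table <tt>legacy_runs</tt>, the view <tt>v_Experiment</tt> and all column names are hypothetical and are not taken from the SimDB/RDB schema.
A real service would translate incoming ADQL to vendor specific SQL; here plain SQL stands in for that step.</p>

```python
import sqlite3

# In-memory database standing in for an existing (legacy) metadata store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legacy_runs (
    run_id    INTEGER PRIMARY KEY,
    run_label TEXT,
    code_name TEXT,
    box_mpc   REAL
);
INSERT INTO legacy_runs VALUES (1, 'LCDM-512',  'CodeA', 100.0);
INSERT INTO legacy_runs VALUES (2, 'LCDM-1024', 'CodeB', 500.0);

-- The view exposes the legacy columns under assumed, SimDB-like names.
CREATE VIEW v_Experiment AS
SELECT run_id    AS ID,
       run_label AS name,
       code_name AS protocolName
FROM legacy_runs;
""")


def query_experiments(connection, protocol_name):
    """Query the virtual schema only; the legacy table is never named here."""
    return connection.execute(
        "SELECT ID, name FROM v_Experiment WHERE protocolName = ? ORDER BY ID",
        (protocol_name,),
    ).fetchall()


result = query_experiments(conn, "CodeA")
```

<p>The point of the sketch is that clients only ever see the view, so the legacy layout can stay as it is.</p>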
545
546 <h4><a name="sec3_4"/>3.4 Meta data production pipe line</h4>
547 The SimDB data model is relatively comprehensive, which reflects itself in XML documents
548 of substantial size and complexity for realistic cases.
549 For a registration scenario, i.e. one where a user is allowed to upload XML documents to a SimDB implementation,
550 one would prefer not to have to produce these documents by hand. By far the preferred manner in our opinion
551 would be for simulation and post-processing pipe-lines to produce compliant documents.
552 We have contacted authors of some of the most popular major simulation codes (Springel; Norman et al; more needed),
553 and they have agreed that this is feasible and are willing to participate in this effort.
554
555 <h4><a name="sec3_5"/>3.5 Client tools</h4>
556 One reason to produce a standard which uses ADQL on top of a standard data model is that client tools
557 can be written to query different such holdings. For example we could envision a tool such as VisIVO [..]
558 to offer some user-friendly interface for querying SimDB implementations retrieved from an IVOA Registry.
559 The user need not see any ADQL, which is all generated by VisIVO, but can be shown results and services.
560 In particular if a cut-out service is available, VisIVO could provide an interface for the user to decide
561 on the sub-volume, retrieve and visualise it. The advantage of having a standard data model
562 clearly is that the same ADQL can be sent to all SimDB services.
563 <em class="todo">@@ TODO contact VisIVO people to see whether this could be implemented @@</em>.
564
565 <!-- ++++++++++++++++++++++++ -->
566
567
568 <h2><a name="sec4"/>4 Analysis model</h2>
569 <em class="todo">@@TODO Gerard@@</em>
570 An <i>analysis model</i>, also called domain model, is an abstract, high-level representation of the
571 <i>universe of discourse</i> (UoD), the part of the world that our application deals with.
572 It is a UML model, with emphasis on the concepts and their exact relationships in the UoD, though details
573 such as attributes need not be completely filled in.
574 Importantly, it should not be influenced by application scenarios apart from knowledge of their UoD.
575 Here we describe the UoD and our analysis model. The model is strongly influenced by patterns
576 discovered in earlier work on a
577 <i><a href="http://www.ivoa.net/internal/IVOA/IvoaDataModel/DomainModelv0.9.1.doc">Domain model for Astronomy</a></i>,
578 co-written by one of the authors of the present note. We describe some of its main patterns below as well.
579
580 <h4><a name="sec4.1"/>4.1 Universe of Discourse</h4>
581
582 <h4><a name="sec4.2"/>4.2 Domain Model for Astronomy</h4>
583
584 <h4><a name="sec4.3"/>4.3 SimDB analysis model</h4>
585 <em class="todo">@@TODO create a version and add it to volute@@</em>.
586
587 <!-- ++++++++++++++++++++++++ -->
588
589 <h2><a name="sec5"/>5 Logical Model: SimDB</h2>
590 <p>
591 Here we introduce the core of our proposal, the UML representation of our logical data model
592 for our Simulation Database. The exact representation of this model is an
593 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/SimDB_DM.xml">XMI file</a>,
594 which can be found in the <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm">snapdm section</a>
595 of the <a href="http://volute.googlecode.com/svn/">Volute subversion database</a> on Google code.
596 Other representations can be found in that same hierarchy, in particular check out the
597 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/SimDB_DM.xml">HTML documentation</a> which we generated from the XMI
598 representation with the XSLT pipeline described in <a href="#appB">Appendix B</a>. This generated documentation file contains
599 the explicit description of all of the elements in the model and forms the reference documentation for the model.
600 </p>
601 <h3><a name="sec5_1"/>5.1 Overview</h3>
602 <p>
603 The logical data model is a fully detailed model of the application domain. It is to form the basis of physical
604 models, representing the model in various computational environments.
605 The logical model is represented as a set of UML diagrams, which we created using MagicDraw Community Edition 12.1 and stored as an
606 XMI file in the GoogleCode
607 SVN repository: <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/SNAP_Simulation_DM.xml">
608 SNAP_Simulation_DM.xml</a> <em class="todo">@@TODO should change all occurrences of names with SNAP to using SimDB@@</em>
609 JPG representations of the model can be found in <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/images/">this</a>
610 directory. <em class="todo">@@TODO find proper representation image of the complete model. Possibly color packages differently.@@</em>
611 </p>
612 <h3><a name="sec5_2"/>5.2 Normalisation</h3>
613 <p>
614 </p>
615 <h3><a name="sec5_3"/>5.3 Model contents</h3>
616 <p></p>
617 <h4><a name="sec5_3_1"/>5.3.1 Resource hierarchy</h4>
618 <p>
619 At the root of the SimDB data model is an abstract class called Resource; in the rest
620 of this document we will refer to this as SimDB/Resource.
621 It represents the different types of highest level meta-data objects to be stored in a SimDB.
622 Examples of this are represented as subclasses. First Experiment (SimDB/Experiment), which represents
623 different types of experiments that have been performed (run/executed/...) and have produced the results
624 that SimDB users may be interested in. Examples of SimDB/Experiment-s are first simulations,
625 but also the various post-processing operations transforming simulation results into other products
626 such as halo catalogues, density fields etc.
627 </p>
628 <p>
629 The second major type of SimDB/Resource is the SimDB/Protocol.
630 This concept represents a <i>formally prescribed way of doing an experiment</i>.
631 It is derived from the concept with the same name in the domain model, which itself was inspired
632 by the concept with the same name in Chapter 8.5 in <a href="#r_AnalaysisPatterns">[3]</a>.
633 In the SimDB/DM this concept has concrete representations in the computer programs that are being
634 used to run simulations and post-processing etc. As such it defines the possible input parameters,
635 possible algorithms, and the kind of results that can be produced by the code. Every SimDB/Experiment must
636 indicate which SimDB/Protocol was used and, for example, provide values for the input parameters and indicate
637 which physics was used.
638 </p>
639 <p>
640 The SimDB/Resource concept is clearly similar, but in general <i>not equivalent</i> to the Resource Registry's Resource concept.
641 In data modeling terms, it is not true that a SimDB/Resource <i>is a</i> Registry/Resource.
642 Often the reason is similar to the reasons that a single image is not a Registry/Resource, whereas a SIAP-compatible service is.
643 The granularity of a SimDB will be finer than that of a Registry, and many simulations on their own will be too small to be registered individually.
644 The SimDB itself will have to be registered (see <a href="#">section ???</a> for a further discussion
645 <em class="todo">@@ TODO add proper section and href@@</em>),
646 i.e. a SimDB service <i>is a</i> Registry/Resource. In discussion with Ray Plante (IVOA Interop May 2007, Beijing)
647 on this issue it was proposed that some part of the contents could also be registered in a Registry directly.
648 I.e. we should be able to identify Registry/Resource-s in SimDB. One consideration in making this identification could be, for example,
649 that all data products resulting from a well defined (and published) scientific project could qualify.
650 To represent such a possibility for now we have introduced another subclass of SimDB/Resource: SimDB/Project.
651 This is not much more than an aggregation of experiments, with some additional attributes describing the motivation etc.
652 The metadata of a SimDB/Project is not the same as that of a Registry/Resource, however we propose that we should be able
653 to define a transformation (possibly implemented again in XSLT) to transform a SimDB/Project and produce a Registry/XML representation.
654 Some more thoughts on this subject will be given in <a href="#">section ???</a> <em class="todo">@@ TODO add proper section and href@@</em> mentioned above.
655 </p>
656
657 <h4><a name="sec5_3_2"/>5.3.2 Object types</h4>
658 <p>
659 </p>
660
661 <h4><a name="sec5_3_3"/>5.3.3 Target</h4>
662 <p>
663 The first question most people ask about a simulation is: "what is being simulated?".
664 The answer should correspond to a real (astronomical) object, or collection of objects,
665 or possibly a physical process. For SimDB to answer such questions implies that publishers must be
666 able to describe these concepts in the model.
667 We have introduced the TargetObjectType and TargetProcess classes for this.... <em class="todo">@@ TODO expand @@</em>.
668 </p>
669
670 <h4><a name="sec5_3_4"/>5.3.4 Characterisation</h4>
671 <p>
672 </p>
673
674 <h4><a name="sec5_3_5"/>5.3.5 Semantics</h4>
675 <p>
676 There are many instances in the data model where we need to describe elements of the
677 SimDB/Resource-s explicitly, because we do not have implicit information based on the context.
678 Examples are the various properties of object types, the target objects and processes etc.
679 Apart from a name and a description we then frequently add
680 an attribute which is supposed to "label" the element according to an assumed standard list of terms.
681 We model this using the <tt>&lt;&lt;ontologyterm&gt;&gt;</tt> stereotype. Attributes with this stereotype
682 are assumed to take their values from such a predefined "ontology".
683
684 </p>
685
686 <h4><a name="sec5_3_6"/>5.3.6 Units</h4>
687 <p>
688 The current (May 2008 <em class="todo">@@ TODO update when necessary @@</em>) version of the model
689 allows publishers to specify numerical quantities using a real value and a unit.
690 I.e. we do not prescribe units for particular quantities.
691 Allowing this flexibility in unit assignment does pose a problem for a query interface that allows users to query on
692 characterisation values and other numerical quantities. ADQL does not include units, for example, and a user
693 cannot assume that every publisher will use the same unit for, say, the typical size of a simulation box.
694 This is even worse of course for the characterisation values of properties that have to be defined
695 in the model and can have any kind of assumed unit.
696 </p>
697 <p>
698 We believe we should treat units as a special semantic vocabulary, possibly an ontology.
699 This implies we push its development off to elsewhere for now, and assume we can
700 at some point use a standard list of units in a similar way to the other ontology references.
701 Maybe this could include a link to the physical quantity (etc, see for example the
702 <a href="http://physics.nist.gov/cuu/Units/introduction.html">NIST reference on SI</a>) to which the unit applies.
703 </p>
704 <p>
705 If this kind of link can be made, we could eventually attempt to impose a single unit to correspond to
706 all properties sharing a given <a href="http://physics.nist.gov/cuu/Units/introduction.html">quantity in the general sense</a>.
707 This may lead to very small or very large values, depending on the simulation, but at least allows simpler
708 interfaces.
709 </p>
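<p>To make the trade-off concrete, the following Python sketch shows what a client or service would have to do if publishers keep their own units: normalise every value to a single unit before comparing.
The record layout, the names <tt>runA</tt>, <tt>runB</tt>, <tt>runC</tt> and the choice of metres as the common unit are assumptions made for this illustration only; they are not part of the model.</p>

```python
# Length of one parsec in metres (from the IAU definition of the parsec).
PARSEC_IN_METRES = 3.0856775814913673e16

# Conversion factors to the assumed common unit (metres).
METRES_PER_UNIT = {
    "m": 1.0,
    "pc": PARSEC_IN_METRES,
    "kpc": 1.0e3 * PARSEC_IN_METRES,
    "Mpc": 1.0e6 * PARSEC_IN_METRES,
}


def to_metres(value, unit):
    """Convert a (value, unit) pair to metres; unknown units are rejected."""
    try:
        factor = METRES_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit}") from None
    return value * factor


def larger_than(records, min_size, unit):
    """Return names of records whose size exceeds min_size, whatever their unit."""
    threshold = to_metres(min_size, unit)
    return [r["name"] for r in records
            if to_metres(r["size"], r["unit"]) > threshold]


# Hypothetical box sizes as three publishers might have recorded them.
records = [
    {"name": "runA", "size": 100.0, "unit": "Mpc"},
    {"name": "runB", "size": 50000.0, "unit": "kpc"},
    {"name": "runC", "size": 2.0e6, "unit": "pc"},
]
selected = larger_than(records, 10.0, "Mpc")
```

<p>If a single unit per quantity were imposed, the conversion step would disappear and a plain ADQL comparison on the stored column would suffice.</p>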
710 <em class="todo">@@ THIS ISSUE NEEDS RESOLVING @@</em>
711 <h4><a name="sec5_3_7"/>5.3.7 Services</h4>
712 <p>
713 The goal of the SimDB specification is to define a protocol for querying interesting simulations
714 and related SimDB/Resource-s.
715 Once these have been identified the user should be able to access these simulations.
716 We assume that web services are the means to do so, and allow publishers to indicate such
717 web services as are available for a given Experiment. We assume for now that we know little of the
718 web service beyond some generic types: <i>download, cut-out, extraction, projection, custom</i>.
719 The SimDAP specification is being developed to address those aspects in detail.
720 We assume that there will be a base-URL implementing some standard DAL (VOSI?) like services
721 and leave it up to SimDB-client implementations to interact with these services in standard manners.
722 Only custom services can be directly accessed, and for now many services will necessarily be custom.
723 </p>
724
725 <h2><a name="sec6"/>6 Physical models</h2>
726 <p>
727 Here we describe how we create <i>physical models</i> out of the logical model.
728 A <i>physical model</i> is (see <em class="todo">@@TODO reference to some standard reference on data modelling@@</em>)
729 a representation of the logical model that is adapted to a particular software environment.
730 We present physical representations for the following contexts:
731 <ul>
732 <li>XML: we present an <a href="http://www.w3.org/XML/Schema">XML schema</a> defining valid XML documents</li>
733 <li>Relational databases: we derive a relational database schema for storing instances of the model.</li>
734 <li>Java: we present Java classes representing the
735 </ul>
736 We actually <i>derive</i> these representations from the logical model using transformation rules implemented in XSLT.
737
738 We give pointers to the actual schema documents resulting from an implementation of such rules in XSLT.
739 In a similar manner we define rules and provide XSLT based implementations of these,
740 to derive a relational database schema from the logical model.
741 We propose this model as we want to use <a href="http://www.ivoa.net/Documents/latest/ADQL.html">ADQL</a>
742 in the protocol for querying SimDB-s. To this end we also produce first approximations to a <a href="">TAP
743 </p>
744 <p>
745 Our complete XSLT pipeline is described in more detail in <a href="#appB">Appendix B</a>.
746 It also produces an HTML document of the UML model in standardised form. The HTML document describes all
747 object types, value types and enumerations in detail. It also derives UTYPE-s for all features.
748 Once a more formal UTYPE definition is being worked out in the
749 The XSLT further produces Java classes which, using
750 <a href="http://java.sun.com/javaee/technologies/persistence.jsp">Java Persistence API (JPA)</a> and
751 <a href="http://java.sun.com/developer/technicalArticles/WebServices/jaxb/">Java Architecture for XML Binding (JAXB)</a>
752 annotations, provide a simple means to store the contents of SimDB/XML documents in
753 a SimDB relational database and retrieve them from there again.
754 </p>
755
756 <h3><a name="sec6_1"/>6.1 Identifiers and References</h3>
757
758 We want to be able to identify each instance of each concrete type explicitly in a globally unique way.
759
760 To this end we need to be able to assign identifier on each
761
762
763 <h3><a name="sec6_2"/>6.2 RDBM Schema</h3>
764 The public schema, i.e. the view the outside world has of a SimDB, is a relational schema.
765 This will be formally defined using VOTables containing the appropriate TABLE definitions
766 <ul>
767 <li>object types are mapped to tables, one table per object type</li>
768 <li>Inheritance hierarchies: JOINED strategy as defined in JPA, i.e. each table only has columns for the attributes and references defined on the corresponding type.
769 Also an ID column that is a PK and also a FK to the ID of the base class' table. Possibly a container column (see below)</li>
770 <li>Primary key column: <tt>ID NUMERIC(18)</tt></li>
771 <li>Foreign key to container: <tt>containerId</tt><br/>plus foreign key and index declaration</li>
772 <li>References: &lt;referenceName&gt;Id<br/>plus foreign key and index declaration.</li>
773 <li>Using a topological sort of the object types based on their (extends|container|reference) relations, we generate
774 create table statements and their indexes and foreign keys in blocks, and drop table statements in the opposite order.</li>
775 <li>For each class we create a view named "v_&lt;class name&gt;"<br/>which returns all columns for that class; it uses a join to the base class's view.</li>
776 <li>generate a discriminator column on table for root in inheritance hierarchy, stores name of class (must be unique in inheritance hierarchy!)</li>
777 <li>attributes mapped to single column if their type is simple (i.e. primitive, or enumeration)</li>
778 <li>if attribute's type is dataType mapped to as many columns as the dataType has attributes,
779 with column names the name of the dataType's attributes, prefixed by &lt;attribute-name&gt;_</li>
780 <li>For PK columns we use the
781 </ul>
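<p>The ordering rule in the list above can be sketched in a few lines of Python.
This is only an illustration: our actual generation is done in XSLT, the object types and their dependencies below are made up for the example, and the generated statements are reduced to the primary key column.
The sketch assumes Python 3.9 or later for the standard <tt>graphlib</tt> module.</p>

```python
from graphlib import TopologicalSorter


def creation_order(dependencies):
    """Order tables so that each one comes after every table it depends on.

    `dependencies` maps a table name to the set of tables it refers to
    through an extends, container or reference relation.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical object types and relations, for illustration only.
dependencies = {
    "Resource": set(),
    "Protocol": {"Resource"},                # extends Resource
    "Experiment": {"Resource", "Protocol"},  # extends Resource, references Protocol
    "InputParameter": {"Protocol"},          # contained in Protocol
}

create_order = creation_order(dependencies)

# Create statements follow the sorted order; drops run in the opposite order.
create_sql = [f"CREATE TABLE {name} (ID NUMERIC(18) PRIMARY KEY);"
              for name in create_order]
drop_sql = [f"DROP TABLE {name};" for name in reversed(create_order)]
```

<p>Foreign keys can then be declared block by block without ever referring to a table that does not exist yet.</p>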
782
783 <h3><a name="sec6_3"/>6.3 XML Schema</h3>
784 <p>
785 The DM WG has mandated (IVOA interoperability meeting, Cambridge, UK, May 2003) that one
786 such representation should be an XML schema.
787 We foresee that this representation can be used to communicate instances of SimDB/Resource-s as XML documents.
788 Such communication can be for registering new SimDB/Resources in a SimDB, or
789 used in messages to communicate instances of the SimDB Resource type.
790 Here we shortly describe rules how to derive an XML schema from our logical model and
791 </p>
792
793 <h3><a name="sec6_4"/>6.4 UTYPE-s</h3>
794 <p>
795 It is generally the case that contents of databases may be represented in ways that do not
796 conform to one of the standard serialisations. Nothing prevents services from being developed on
797 top of SimDB that represent SimDB/Resource-s or even fragments of these in another form.
798 The standard example would be to have VOTables storing the results of a generic ADQL query of the SimDB/RDB representation.
799 VOTable first introduced the option to have a UTYPE attribute in FIELD definition tags store
800 a pointer to an element in a data model that the column represents.
801 </p>
802 <p>
803 The <a href="#r_SpectrumDatamodel">Spectrum data model</a> was the first to add explicit
804 UTYPE-s for each of the attributes in its model and the <a href="#r_CharacterisationDM">Characterisation data model</a>
805 has followed that example. As long as the precise usage of UTYPE-s and their relation to the syntax of the underlying data model
806 is not defined, we will follow these examples by assigning UTYPE-s explicitly to all elements in the model.
807 However, we will follow a fixed set of rules to make this assignment and implement these in XSLT.
808 If a similar approach is at some time accepted within the IVOA, possibly in an alternative form, it will be straightforward
809 to adjust our definitions. The important point we want to make is that it is possible to simply define rules that then will
810 automatically produce the UTYPE-s for a given data model, i.e. the only discussion that is required is on the rules for doing so.
811 </p>
812 <p>
813 Our assumption is that the UTYPE should be able to uniquely represent any element in the data model, and in a manner
814 that is also easily interpreted. For now we assume that we need to point to those elements
815 that can be stored in a column in a VOTable, i.e. for now we are looking for "simple" elements.
816 We can use our relational mapping to identify all these features, they are
817 <ul>
818 <li> attributes (paying attention to attributes with non simple data types)</li>
819 <li> references (an identifier
820 identifying the referenced object) and</li>
821 <li>collections (through a pointer to the containing, parent object). </li>
822 </ul>
823 VOTable also allows arrays to be stored in single columns, so a collection can be stored as an array of identifiers of
824 child objects. There are some other features that are not explicitly modelled, but are implied.
825 Examples are the identifier (ID) assigned to all objects and the name of the object type of an object.
826 </p>
827 <p>
828 Of course we could give each of the elements a uniquely generated identifier, but we assume that UTYPE-s should hold
829 semantic information, otherwise we could use the XMI-ids generated by the UML modelling tool.
830 To identify any of these elements uniquely within the context of the IVOA,
831 we then need the following components:
832 <ul>
833 <li>name of element (possibly a path expression for structured attributes leading to a "leaf attribute")</li>
834 <li>name of containing object type</li>
835 <li>a path expression for the package(s) containing the object type</li>
836 <li>unique identifier of the model, possibly its name if that is to be unique in the IVOA DM efforts</li>
837 <li>some indication of the context, unless this can be implicit.</li>
838 </ul>
839 NB this assumes that we do not have a uniqueness rule on the names of object types within a model, something we do actually
840 assume in the mapping of SimDB/RDB above. In that case we could leave out the package path.
841 </p>
842 <p>
843 One could argue that one could also give nice, unique names to each of the elements, but to find out what the actual element is in
844 the model and in other representations one would still need to perform a look-up. Such a unique name would likely include some of
845 the elements above anyhow. So we believe it would be a waste of effort to do so and instead propose a simple convention
846 for deriving the UTYPE-s from the model based on this hierarchy.
847 We have done so using these rules (in BNF-like notation)
848 <dl>
849 <dt>attribute</dt>
850 <dd>
851 <pre>
852 &lt;model-name&gt; ":" &lt;package-name&gt;[ "/" &lt;package-name&gt;]* "/" &lt;objecttype-name&gt; "." &lt;attribute-name&gt; [ "." &lt;attribute-name&gt;]*
853 </pre>
854 </dd>
855 <dt>reference</dt>
856 <dd>
857 <pre>
858 &lt;model-name&gt; ":" &lt;package-name&gt;[ "/" &lt;package-name&gt;]* "/" &lt;objecttype-name&gt; "." &lt;reference-name&gt;
859 </pre>
860 </dd>
861 <dt>collection (as array of pointers to child objects)</dt>
862 <dd>
863 <pre>
864 &lt;model-name&gt; ":" &lt;package-name&gt;[ "/" &lt;package-name&gt;]* "/" &lt;objecttype-name&gt; "." &lt;collection-name&gt;
865 </pre>
866 </dd>
867 <dt>container</dt>
868 <dd>
869 <pre>
870 &lt;model-name&gt; ":" &lt;package-name&gt;[ "/" &lt;package-name&gt;]* "/" &lt;objecttype-name&gt; "." "CONTAINER"
871 </pre>
872 </dd>
873 <dt>ID</dt>
874 <dd>
875 <pre>
876 &lt;model-name&gt; ":" &lt;package-name&gt;[ "/" &lt;package-name&gt;]* "/" &lt;objecttype-name&gt; "." "ID"
877 </pre>
878 </dd>
879 <dt>object type name</dt>
880 <dd>
881 <pre>
882 &lt;model-name&gt; ":" &lt;package-name&gt;[ "/" &lt;package-name&gt;]* "/" &lt;objecttype-name&gt; "." "DTYPE"
883 </pre>
884 </dd>
885
886 </dl>
887 The HTML documentation generated from the logical model contains UTYPE-s for these features, generated according to these rules.
888 It will be obvious how to accommodate changes in the precise UTYPE specification, <em>as long as similar rules are upheld</em>.
889 </p>
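<p>The rules above are simple enough to be sketched directly.
The Python function below is only an illustration of the convention (our own implementation is in XSLT); the function names and the example model, package, object type and attribute names are hypothetical.</p>

```python
def make_utype(model, packages, object_type, *feature_path):
    """Build a UTYPE string following the BNF-like rules given above.

    `packages` is the list of nested package names (at least one) and
    `feature_path` is the attribute, reference or collection name, with
    further names for structured attributes leading to a leaf attribute.
    """
    if not packages:
        raise ValueError("at least one package name is required")
    if not feature_path:
        raise ValueError("at least one feature name is required")
    package_path = "/".join(packages)
    feature = ".".join(feature_path)
    return f"{model}:{package_path}/{object_type}.{feature}"


def id_utype(model, packages, object_type):
    """UTYPE of the implied identifier of an object type."""
    return make_utype(model, packages, object_type, "ID")


# Hypothetical names, chosen only to show the shape of the result.
example = make_utype("SimDB", ["resource", "experiment"],
                     "Simulation", "boxSize", "value")
example_id = id_utype("SimDB", ["resource"], "Resource")
```

<p>With these made-up names, <tt>example</tt> is <tt>SimDB:resource/experiment/Simulation.boxSize.value</tt>.
The implied CONTAINER and DTYPE features follow the same pattern as ID.</p>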
890 <h3><a name="sec6_5"/>6.5 Java/JPA+JAXB (non normative)</h3>
891
892 <h2><a name="sec7"/>7 Query Protocols</h2>
893 <h3><a name="sec7_1"/>7.1 ADQL</h3>
894
895 <h3><a name="sec7_2"/>7.2 REST</h3>
896 <p>
897 Under this heading we mean a protocol whereby data products can be retrieved through
898 HTTP GET requests. Possibly also they can be POST-ed, or PUT.
899 This needs to be discussed further, but maybe can be punted until a future release.
900 The GET will always only be able to get a complete SimDB resource, serialised to SimDB/XML, similar to the Registry.
901 </p>
902 <h3><a name="sec7_3"/>7.3 TAP?</h3>
903 Issues:
904 <ul>
905 <li>How does TAP deal with units?</li>
906 <li>In TAP, does a table column containing values always have a single UCD and a single Unit?</li>
907 <li>Is TAP suited for this kind of metadata databases?</li>
908 </ul>
909
910 <h2><a name="sec8"/>8 Next Steps</h2>
911 <h3><a name="sec8_1"/>8.1 Reference implementations</h3>
912 <h4><a name="sec8_1_1"/>8.1.1 France</h4>
913 <em class="todo">@@ TODO Laurent @@</em>
914 <h4><a name="sec8_1_2"/>8.1.2 Germany</h4>
915 <em class="todo">@@ TODO Gerard @@</em>
916 <h4><a name="sec8_1_3"/>8.1.3 Italy</h4>
917 <em class="todo">@@ TODO Patrizia @@</em>
918 <h4><a name="sec8_1_4"/>8.1.4 USA</h4>
919 <em class="todo">@@ TODO Rick @@</em>
920
921 <h3><a name="sec8_2"/>8.2 Generating SimDB/XML documents from simulation pipe lines</h3>
922
923 <h3><a name="sec8_3"/>8.3 SimDAP services</h3>
924
925 <h2><a name="appA"/>Appendix A: Data modelling specifics</h2>
926 Here we describe various aspects of UML modelling as we applied it to the current
927 problem area.
928 <p>
929 UML allows communities to create a domain specific modelling language through its Profiling capabilities
930 <em class="todo">@@ TODO is this the proper term ?@@</em>.
931
932 We have an initial implementation of a UML profile as created by MagicDraw available under
933 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/IVOA%20UML%20Profile%20v-2.xml">this link</a>.
934 Here we list the main elements and give a short motivation for their inclusion in the model.
935 It is our opinion that the DM working group should be ultimately responsible for a profile such as this,
936 defining a domain specific language for all IVOA data modelling efforts.
937 </p>
938 <p>
939 As a first step in our generation pipeline we generate an XML document that represents the data model in a form
940 that is more easily interpreted, both by human readers and by XSLT scripts, than the XMI representation.
941 This document itself is structured according to an XML schema that
942 represents the UML profile rather directly and that we briefly describe here.
943 </p>
944 This schema is located in
945 <a href="http://volute.googlecode.com//svn/trunk/projects/theory/snapdm/input/intermediateModel.xsd">
946 http://volute.googlecode.com//svn/trunk/projects/theory/snapdm/input/intermediateModel.xsd</a>.
947
948
949 We introduce our own XML format, defined by the XML schema in
950 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/res/intermediateModel.xsd">intermediateModel.xsd</a>,
951 for representing the logical model. For the time being we call this the <i>intermediate representation</i>.
952 The first step in the generation pipeline is a translation of the XMI to an XML document following this format.
953 This transformation is implemented in the
954 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/res/xmi2intermediate.xsl">xmi2intermediate.xsl</a>
955 XSLT script. The latest version of the intermediate representation for the SimDB data model can be found in
956 <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/output/SNAP_Simulation_DM_INTERMEDIATE.xml">this location</a>.
957 All other generation scripts work on this intermediate representation, not on the XMI document.
958 Variations in tool-generated XMI, or different versions of XMI can now be supported by an appropriately adjusted
959 XSLT script.
960 One reason why this may be useful is that different tools may produce different versions or different
961 dialects of XMI. Another reason for this representation is that XMI is a rather complex representation of a UML
962 model. Since we are using a rather restricted <a href="#profile">profile</a> we do not need this generality, and
963 this allows us to represent the model using XML documents that are much easier to handle with XSLT.
964
965
966 <p>
967 We illustrate our UML profile using an example data model
968 derived from the SimDB/DM, shown in the following diagram:<br/>
969 <img src="img/example.jpg"/>
970 <br/>
971 We now describe the individual elements.
972 Some of these are standard, others are domain specific extensions following
973 standard UML profile <i>stereotype</i> extension elements and associated <i>tag definitions</i>.
974
975 <dl>
976 <dt><a name="uml_model"/>Model (no visual counterpart)</dt>
977 <dd>
978 <ul>
979 <li> &lt;&lt;model&gt;&gt; </li>
980 <ul>
981 <li>TagDefinition: author</li>
982 <li>TagDefinition: title</li>
983 </ul>
984 </ul>
985 </dd>
986
987 <dt> Package <br/><img src="img/package.jpg" /></dt>
988 <dd>
989 <ul>
990 <li>package containment</li>
991 <li>package dependency</li>
992 </ul>
993 </dd>
994 <dt> Class <br/><img src="img/class.jpg" /></dt>
995 <dd>
996 <ul>
997 <li>isAbstract<br/>
998 Indicated by <i>italicised</i> name of the object. Implies that no instances can be made of the class,
999 one needs sub classes for that.
1000 </li>
1001 </ul>
1002 </dd>
1003 <dt> DataType <br/><img src="img/datatype.jpg" /></dt>
1004 <dd></dd>
1005 <dt> Enumeration <br/><img src="img/enumeration.jpg" /></dt>
1006 <dd></dd>
1007 <dt> Property: attribute<br/><img src="img/attribute.jpg" /></dt>
1008 <dd>
1009 <ul><li>&lt;&lt;attribute&gt;&gt; </li>
1010 <ul>
1011 <li>TagDefinition: minLength<br/>
1012 </li>
1013 <li>TagDefinition: maxLength<br/>
1014 </li>
1015 </ul>
1016 <li> &lt;&lt;ontologyterm&gt;&gt; <br/>
1017 There are many instances in the data model where we need to describe elements of the
1018 SimDB/Resource-s explicitly, because we do not have implicit information based on the context.
1019 Examples are the various properties of object types, the target objects and processes etc.
1020 Apart from a name and a description we then frequently add
1021 an attribute which is supposed to "label" the element according to an assumed standard list of terms.
1022 We model this using the <tt>&lt;&lt;ontologyterm&gt;&gt;</tt> stereotype. Attributes with this stereotype
1023 are assumed to take their values from such a predefined "ontology".
1024 </li>
1025 <ul>
1026 <li>TagDefinition: ontologyURI<br/>
1027 A URL locating a standard (RDF|SKOS|OWL|???) document containing
1028 a list of terms from which the value for this attribute may be obtained.
1029 It is our opinion that the Semantics working group should be responsible for the
1030 definition of relevant ontologies (or semantic vocabularies, or thesauri, or ...)
1031 required for a given application domain, though the contents should be decided in
1032 cooperation with domain experts.
1033 </li>
1034 </ul>
1035 </ul>
1036 </dd>
1037 <dt>Inheritance
1038 <br/><img src="img/inheritance.jpg" /></dt>
1039 <dd>
1040 Indicates the typical <i>is a</i> relation between the sub-class and its base-class (the one pointed at).
1041 In this profile we do not support multiple inheritance. <em class="todo">@@ TODO explain? @@</em>.
1042 </dd>
1043 <dt>Binary association end: collection
1044 <br/><img src="img/collection.jpg" /></dt>
1045 <dd>
1046 This relation indicates a <i>composition</i> relation between one parent object and zero or more child objects.
1047 The life cycles of the child objects are governed by that of the parent.
1048 </dd>
1049 <dt>Binary association end: reference
1050 <br/><img src="img/reference.jpg" /></dt>
1051 <dd>
1052 This is a relation that indicates a kind of <i>usage</i>, or <i>dependency</i> of one object on another.
1053 It is in general shared, i.e. many objects may reference a single other object. Accordingly the referenced
1054 object is independent of the object that refers to it. In our model the cardinality cannot be &gt; 1.
1055 </dd>
1056 <dt>Binary association end: subsets
1057 <br/><img src="img/subsets.jpg" /></dt>
1058 <dd>
1059 This indicates that a relation overrides a relation defined on a base class.
1060 It does so by specifying that the class at the end point of the relation should be a subclass of the
1061 class at the end point of the original, subsetted relation.
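As a rough illustration of the three association ends described above, the following Python sketch models a collection, a reference and a subsetting relation. All class and attribute names (Target, Child, Parent, SpecialTarget, SpecialParent) are hypothetical and are not part of the SimDB data model; the sketch only mirrors the semantics stated in this appendix.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:
    """Hypothetical independent object that other objects may reference."""
    name: str


@dataclass
class Child:
    """Hypothetical child object; it is only reachable through its parent."""
    label: str


@dataclass
class Parent:
    name: str
    # reference: shared between objects, at most one per referencing object
    target: Optional[Target] = None
    # collection: composition of zero or more children owned by the parent
    children: List[Child] = field(default_factory=list)

    def add_child(self, label: str) -> Child:
        # Children are created through the parent, which governs their life cycle.
        child = Child(label)
        self.children.append(child)
        return child


@dataclass
class SpecialTarget(Target):
    """Hypothetical subclass of Target."""
    detail: str = ""


@dataclass
class SpecialParent(Parent):
    # subsets: the inherited reference must point at a subclass of Target
    def __post_init__(self) -> None:
        if self.target is not None and not isinstance(self.target, SpecialTarget):
            raise TypeError("target must be a SpecialTarget")


# Usage: two parents share one referenced object; children are not shared.
shared = SpecialTarget("shared-target", "some detail")
first = SpecialParent("first", target=shared)
second = SpecialParent("second", target=shared)
first.add_child("child-0")
```

In this sketch the referenced object exists independently of the objects that point to it, whereas a child is only ever created and held by its parent.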
1062 </dd>
1063
1064
1065 </p>
1066
1067
1068 <h2><a name="appB"/>Appendix B: XSLT pipeline</h2>
1069 <em class="todo">@@ TODO Laurent @@</em>
1070
1071 <h2><a name="glossary"/>Glossary and Acronyms</h2>
1072 <dl>
1073 <dt><a name="g_SimDB">SimDB</a></dt>
1074 <dd>Acronym for <i>Simulation Database</i>, the standard that we propose to define in this Note.
1075 Implementations of SimDB offer a query interface for discovering simulations (and related entities)
1076 using ADQL, based on a prescribed (i.e. normative) relational data model, and for describing simulations
1077 via XML documents following a prescribed (i.e. normative) XML schema.</dd>
1078 <dt><a name="g_SimDAP"/>SimDAP</dt>
1079 <dd>Acronym for <i>Simulation Data Access Protocol</i>, a standard related to SimDB,
1080 which will define services for accessing simulations discovered using SimDB.</dd>
1081 <dt><a name="g_SimDB/DM"/>SimDB/DM</dt>
1082 <dd>The logical data model defining the structure of <a href="#g_SimDB">SimDB</a>.</dd>
1083 <dt><a name="g_SimDB/RDB"/>SimDB/RDB</dt>
1084 <dd>The representation of the SimDB/DM as a relational data base schema.
1085 This implies a parti</dd>
1086 <dt><a name="g_SimDB/Views"/>SimDB/Views</dt>
1087 <dd>The representation of the SimDB/DM as a collection of database view definitions. Each view directly represents
1088 a complete DM class as a single relational table. This is in contrast to the underlying SimDB/RDB representation,
1089 where, at least in the JOINED object-relational mapping strategy, a class may be spread over several tables.</dd>
1090 <dt><a name="g_SimDB/XML"/>SimDB/XML</dt>
1091 <dd>The XML representation of the SimDB/DM</dd>
1092 <dt><a name="g_SimDB/Resource"/>SimDB/Resource</dt>
1093 <dd>A top-level data product stored in a SimDB.
1094 A SimDB/Resource can be described in a SimDB/XML document, but none of its constituents can.</dd>
1095 <dt><a name="g_SimDB/TAP"/>SimDB/TAP</dt>
1096 <dd>The TAP(-like) metadata representation of the SimDB/DM.
1097 This is currently (May 2008; <em class="todo">@@ TODO update once the TAP specification is out @@</em>)
1098 a representation of the <a href="#g_SimDB/Views">SimDB/Views</a> as a VOTable document.
1099 </dd>
1100 </dl>
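<p>The difference between SimDB/RDB and SimDB/Views described in the glossary above can be sketched with a small in-memory database. This is only an illustration under stated assumptions: the table, view and column names (resource, simulation, simulation_view, code) are hypothetical and are not taken from the actual SimDB/RDB schema. In the JOINED strategy a subclass row is split over a base table and a subclass table, and a view joins them back into one table per class.</p>

```python
import sqlite3

# Hypothetical two-table JOINED mapping: "simulation" extends "resource".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- base class table
    CREATE TABLE resource (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- subclass table; shares its primary key with the base table
    CREATE TABLE simulation (
        id   INTEGER PRIMARY KEY REFERENCES resource(id),
        code TEXT NOT NULL
    );
    -- view presenting the complete class as one relational table
    CREATE VIEW simulation_view AS
        SELECT r.id, r.name, s.code
        FROM simulation s JOIN resource r ON r.id = s.id;

    INSERT INTO resource   VALUES (1, 'run-a');
    INSERT INTO simulation VALUES (1, 'hypothetical-code');
""")

# A client queries the view and never needs to know about the join.
rows = conn.execute("SELECT id, name, code FROM simulation_view").fetchall()
```

<p>A query against the view returns inherited and subclass-specific columns together, which is the property the SimDB/Views representation relies on.</p>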
1101
1102 <h2><a name="references">References</a></h2>
1103
1104 <p><a name="r_UML">[1]</a> ???, <i>UML standard</i>
1105 <br/><a href="http://">http://</a>
1106 </p>
1107 <p><a name="r_XMI">[2]</a> ???, <i>XMI standard</i>
1108 <br/><a href="http://">http://</a>
1109 </p>
1110 <p><a name="r_AnalaysisPatterns">[3]</a> Martin Fowler, <i>Analysis Patterns</i>, 1997, Addison Wesley.
1111 <br/><a href="http://">http://</a>
1112 </p>
1113 <p><a name="r_TheoryinVO">[4]</a> Lemson &amp; Colberg, <i>Theory in the virtual observatory</i>
1114 <br/><a href="http://">http://</a>
1115 </p>
1116
1117 <p><a name="r_Characterisation">[5]</a> ???, <i>Characterisation DM</i>
1118 <br/><a href="http://">http://</a>
1119 </p>
1120
1121 <p><a name="r_informatonIntegration">[6]</a> <em class="todo">@@ TODO @@</em> references on global-as-view and information integration
1122 <br/><a href="http://">http://</a>
1123 </p>
1124
1125 <p><a name="r_visivo">[7]</a> <em class="todo">@@ TODO @@</em> reference to VisIVO
1126 <br/><a href="http://">http://</a>
1127 </p>
1128
1129 <p><a name="r_SpectrumDatamodel">[8]</a> <em class="todo">@@ TODO @@</em> reference to Spectrum data model
1130 <br/><a href="http://">http://</a>
1131 </p>
1132
1133 </body></html>
