/[volute]/trunk/projects/theory/snapdm/doc/note/SimDB-note.html

Annotation of /trunk/projects/theory/snapdm/doc/note/SimDB-note.html



Revision 434
Fri May 9 16:16:54 2008 UTC by gerard.lemson
File MIME type: text/html
File size: 52882 byte(s)
Further updates
1 gerard.lemson 252 <?xml version="1.0" encoding="iso-8859-1"?>
2     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3     <html>
4     <head>
5     <title>IVOA Working Group - Internal Draft</title>
6     <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
7     <meta name="keywords" content="IVOA, International, Virtual, Observatory, Alliance" />
8     <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
9     <meta name="maintainedBy" content="IVOA Document Coordinator, ivoadoc@ivoa.net" />
10 gerard.lemson 294 <link rel="stylesheet" href="http://ivoa.net/misc/ivoa_wg.css" type="text/css" />
11 gerard.lemson 434 <link rel="stylesheet" href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/css/simdb-note.css" type="text/css" />
12 gerard.lemson 252 </head>
13    
14     <body>
15     <div class="head">
16     <a href="http://www.ivoa.net/"><img alt="IVOA" src="http://www.ivoa.net/pub/images/IVOA_wb_300.jpg" width="300" height="169"/></a>
17 gerard.lemson 256 <h1>Simulation Database (SimDB)<br/>
18 gerard.lemson 252 Version 0.x</h1>
19 gerard.lemson 256 <h2>IVOA Theory Interest Group <br />Internal Draft 2008 April 19 </h2>
20 gerard.lemson 252
21 gerard.lemson 256
22 gerard.lemson 252 <dt>This version:</dt>
23     <dd><a href="http://www.ivoa.net/Documents/...">
24     http://www.ivoa.net/Documents/...</a></dd>
25    
26     <dt>Latest version:</dt>
27    
28     <dd><a href="http://www.ivoa.net/Documents/latest/...">
29     http://www.ivoa.net/Documents/latest/...</a></dd>
30    
31     <dt>Previous versions:</dt>
32 gerard.lemson 256 <dt>Interest Group:</dt>
33 gerard.lemson 252 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory"> http://www.ivoa.net/twiki/bin/view/IVOA/IvoaTheory</a></dd>
34     <dt>Author(s):</dt>
35 gerard.lemson 294 <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/GerardLemson">Gerard Lemson</a> (editor)<br /></dd>
36     <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/LaurentBourges">Laurent Bourges</a><br /></dd>
37     <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/PatriziaManzato">Patrizia Manzato</a><br /></dd>
38     <dd><a href="http://www.ivoa.net/twiki/bin/view/IVOA/RickWagner">Rick Wagner</a><br /></dd>
39     <dd>others?</dd>
40 gerard.lemson 252 <hr/></div>
41    
42     <h2><a name="abstract" id="abstract">Abstract</a></h2>
43     <p>In this note we propose that the IVOA develop a standard protocol for discovering simulations.
44 gerard.lemson 271 We will call this protocol the <i>Simulation Database</i> (SimDB). Implementations of the SimDB will allow users to query for
45 gerard.lemson 252 results of simulations in quite some detail and will provide links to services for accessing these
46     simulations. </p>
47     <p>The results presented in this note, which form the core of the proposed standard, are one half of a concerted effort of the Theory Interest Group that originally went by the name
48     <i>Simple Numerical Access Protocol</i> (SNAP) and has now been split into two parts. The second part defines protocols
49     for accessing the simulation data products themselves. This part will be written up in a separate Note
50     (Gheller, Wagner et al, in preparation), under the name Simulation Data Access Protocol (SimDAP).
51     </p>
52     <p>The current proposal is built around a UML data model describing simulations, a representation (mapping) of this model as a relational
53     database schema and a mapping to an XML schema.
54     We propose the relational schema to be the outer facade of a SimDB-TAP implementation, which is to be queried using
55 gerard.lemson 322 <a href="http://www.ivoa.net/internal/IVOA/IvoaVOQL/ADQL-20080415.pdf">ADQL</a>. <em class="todo">@@ TODO update the ADQL link to later versions @@</em>
56 gerard.lemson 313 The XML schema provides type definitions from
57 gerard.lemson 252 which machine-readable serialisations of the model may be constructed. The schema also defines root elements for documents
58     describing SimDB-resources. The SimDB should return such documents for identified SimDB-Resources upon request, as an
59     alternative to the tabular (VOTable) results of ADQL queries.
60     In case updates are supported by a SimDB implementation, such documents may be sent to it for that purpose.
61     </p>
62     <p>
63     This Note describes use cases and requirements, the approach we have taken to define a specification
64     that meets them, and the current state of the results. We feel that the results are
65     sufficiently far evolved that they can start following the formal IVOA standardisation track.
66     To this end it could be turned over to one of the existing working groups. If that is the decision, we feel
67     that the data modelling WG is closest to its scope, but there exist very strong links to Registry, Semantics, ADQL
68     and DAL as well. One might argue that a targeted WG for this effort alone might be as appropriate.
69     We leave the decision about this to the IVOA exec.
70     </p>
71    
72    
73    
74     <div class="status">
75     <h2><a name="status" id="status">Status of this Document</a></h2>
76     This is a Note. The first release of this document was 2008 April 19.
77     <p></p><br />
78    
79     <!-- Choose one of the following (and remove the rest)-->
80     <!--Note-->
81     <p>This is an IVOA Note expressing suggestions from and opinions of the authors.<br/>
82     It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory.
83     It should not be referenced or otherwise interpreted as a standard specification.</p>
84    
85 gerard.lemson 253
86 gerard.lemson 252 A list of <a href="http://www.ivoa.net/Documents/">current IVOA Recommendations and other technical documents</a> can be found at http://www.ivoa.net/Documents/.
87    
88     </div><br />
89    
90     <h2><a name="acknowledgments" id="acknowledgments">Acknowledgments</a></h2>
91     <p>We thank various persons for useful discussions in the course of this work. First the participants of the
92     <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/CambridgeTheoryWorkshopFeb06">Feb 2006 theory
93     workshop</a> in Cambridge, UK, where this work was started. Second the participants of the
94     <a href="http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/GarchingSNAPWorkshop200704">April 2007 SNAP workshop</a> in
95     Garching, Germany, where the design started taking shape. We particularly want to thank the following persons
96     for useful discussions and feedback: Jeremy Blaizot, Klaus Dolag, Ray Plante, Volker Springel. Finally, we want to thank
97     participants in the theory sessions of the interoperability meetings in Victoria, Moscow, Beijing and Cambridge, where parts
98     of this work were discussed.
99     </p>
100     <h2><a id="contents" name="contents">Contents</a></h2>
101     <div class="head">
102     <ul class="toc">
103     <li><a href="#abstract">Abstract</a></li>
104     <li><a href="#status">Status</a></li>
105 gerard.lemson 271 <li><a href="#acknowledgments">Acknowledgments</a></li>
106 gerard.lemson 252 <li><a href="#contents">Contents</a></li>
107 gerard.lemson 253 <li><a href="#sec1">1. Executive Summary</a></li>
108 gerard.lemson 252
109 gerard.lemson 271 <li><a href="#sec2">2. Overview</a></li>
110     <ul class="toc">
111 gerard.lemson 322 <li><a href="#sec2_1">2.1 SNAP &rArr; SimDB + SimDAP</a></li>
112     <li><a href="#sec2_2">2.2 Simulation Database: structure, interface and applicable services</a></li>
113     <li><a href="#sec2_3">2.3 Registration</a></li>
114     <li><a href="#sec2_4">2.4 Technology: UML, XMI, XSLT</a></li>
115     <li><a href="#sec2_5">2.5 Reference implementations</a></li>
116 gerard.lemson 252 </ul>
117 gerard.lemson 294
118    
119 gerard.lemson 296 <li><a href="#sec3">3 Usage scenarios</a></li>
120 gerard.lemson 271 <ul class="toc">
121 gerard.lemson 296 <li><a href="#sec3_1">3.1 "20 questions"</a></li>
122     <li><a href="#sec3_2">3.2 SimDB-standard implementation</a></li>
123     <li><a href="#sec3_3">3.3 Legacy database</a></li>
124     <li><a href="#sec3_4">3.4 Meta data production pipe line</a></li>
125     <li><a href="#sec3_5">3.5 Client tools</a></li>
126 gerard.lemson 294 </ul>
127    
128     <li><a href="#sec4">4 Analysis model</a></li>
129 gerard.lemson 271 <ul class="toc">
130 gerard.lemson 294 <li><a href="#sec4_1">4.1 Universe of Discourse</a></li>
131     <li><a href="#sec4_2">4.2 <i>Domain Model for Astronomy</i></a></li>
132 gerard.lemson 296 <li><a href="#sec4_3">4.3 SimDB analysis model</a></li>
133 gerard.lemson 271 </ul>
134 gerard.lemson 294
135     <li><a href="#sec5">5 Logical model</a></li>
136 gerard.lemson 271 <ul class="toc">
137 gerard.lemson 296 <li><a href="#sec5_1">5.1 Overview</a></li>
138     <li><a href="#sec5_2">5.2 Normalisation</a></li>
139     <li><a href="#sec5_3">5.3 Target</a></li>
140     <li><a href="#sec5_4">5.4 Characterisation</a></li>
141     <li><a href="#sec5_5">5.5 Semantics</a></li>
142 gerard.lemson 271 </ul>
143 gerard.lemson 294
144     <li><a href="#sec6">6 Physical models</a></li>
145 gerard.lemson 271 <ul class="toc">
146 gerard.lemson 434 <li><a href="#sec6_1">6.1 Identifiers and references</a></li>
147     <li><a href="#sec6_2">6.2 RDBM Schema</a></li>
148     <li><a href="#sec6_3">6.3 XML Schema</a></li>
149     <li><a href="#sec6_4">6.4 Identifiers</a></li>
150     <li><a href="#sec6_5">6.5 JAVA/JPA+JAXB (non-normative)</a></li>
151 gerard.lemson 271 </ul>
152 gerard.lemson 294
153     <li><a href="#sec7">7. Query protocols</a></li>
154     <ul class="toc">
155     <li><a href="#sec7_1">7.1 ADQL</a></li>
156     <li><a href="#sec7_2">7.2 REST</a></li>
157     <li><a href="#sec7_3">7.3 TAP?</a></li>
158 gerard.lemson 271 </ul>
159 gerard.lemson 252
160 gerard.lemson 294 <li><a href="#sec8">8. Next steps</a></li>
161 gerard.lemson 271 <ul class="toc">
162 gerard.lemson 294 <li><a href="#sec8_1">8.1 Reference implementations</a></li>
163 gerard.lemson 271 <ul class="toc">
164 gerard.lemson 294 <li><a href="#sec8_1_1">8.1.1 France</a></li>
165     <li><a href="#sec8_1_2">8.1.2 Germany</a></li>
166     <li><a href="#sec8_1_3">8.1.3 Italy</a></li>
167     <li><a href="#sec8_1_4">8.1.4 USA</a></li>
168 gerard.lemson 271 </ul>
169 gerard.lemson 294 <li><a href="#sec8_2">8.2 SimDAP services</a></li>
170 gerard.lemson 271 </ul>
171 gerard.lemson 252 <br/>
172 gerard.lemson 294 <li><a href="#appA">Appendix A: Data modelling specifics</a></li>
173     <li><a href="#appB">Appendix B: XSLT pipe line</a></li>
174 gerard.lemson 322 <li><a href="#glossary">Glossary and Acronyms</a></li>
175    
176 gerard.lemson 252 <li><a href="#references">References</a></li>
177     </ul>
178     </div>
179     <hr/>
180    
181    
182     <br/>
183 gerard.lemson 296 <h2><a name="sec1">1. Executive summary</a></h2>
184 gerard.lemson 322 <em class="todo">@@ TODO Modify this text, which was originally an email to be sent to THEORY, TCG, DM, maybe EXEC @@</em>
185 gerard.lemson 253 <p>
186     We propose to derive two WG projects from what was so far the
187     SNAP project of the theory interest group: SimDB and SimDAP.
188     In this note we discuss the first of these, SimDB, in some detail.
189    
190     </p>
191     <h3> Simulation Database (SimDB)</h3>
192 gerard.lemson 294 <p>We propose to develop a standard specification, called the "Simulation Database" (SimDB).
193 gerard.lemson 253 It is based on the description+discovery part of the old
194     SNAP project. Its normative deliverables are
195     <ul>
196 gerard.lemson 271 <li> A logical data model for describing simulations.<br/>
197 gerard.lemson 253 Following SNAP, we keep concentrating
198     on 3+1D simulations, by which we mean simulations modelling a
199     space-time sub-volume of the universe of any size: not only large-scale
200     structure and galaxy clusters, but everything down to asteroid collisions.
201     As the model <i>describes</i> simulations, it may be called a meta-data model.
202 gerard.lemson 322 It will be a logical model in the sense of standard data modelling approaches <em class="todo">@@TODO add some references@@</em>,
203 gerard.lemson 253 and is based on an analysis (or domain) model, which is presented but not normative.
204     The logical model is presented in fully detailed and documented UML2, serialised
205     to XMI 2.1, created using the MagicDraw 12.1 Community edition tool.
206     The data model uses a small subset of UML2 and has some UML profile
207     extensions added. Together these can be seen as a domain-specific language,
208     which can be formalised in a UML Profile. We will propose using such a profile
209     to the DM working group as a general approach for DM efforts.
210     </li>
211     <li>A query protocol based on the logical model.
212     <br />We propose this to have at least an ADQL version.
213     To this end we will provide a relational mapping.
214     This physical model is completely derived from the SimDB logical model using rules
215     implemented as a pipe-line of XSLT2 scripts working on the XMI representation of
216     the UML. The scripts will produce relational database DDL scripts defining the
217     database schema. That schema itself is not normative; instead we will define the
218     replies to TAP metadata queries. We provide implementation scenarios in the text below,
219     for the case of someone using the results from this project completely and for the
220     case of someone implementing a SimDB on top of a legacy database.
221     </li>
222     <li> A messaging format for sending instances of the various components
223     in the data model around.
224     <br />This format will be based on a number of XML
225     schema documents (XSDs), one of which contains the root elements defining valid SimDB resources.
226     This requires a mapping from the UML to XSD.
227     This mapping will take the form of one or more XSLT documents.
228     </li>
229     <li> An IVOA working draft document describing these components.
230     <br />This will be based on the current document.</li></ul>
231     </p>
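<p>As a rough illustration of the rule-based derivation of a relational schema described above, the following sketch
generates a CREATE TABLE statement from a minimal description of one class. It is written in Python rather than XSLT
purely for brevity. The type mapping, the <code>id</code> primary key column and the function name <code>class_to_ddl</code>
are assumptions made for this example; they are not the mapping rules of the actual pipeline.</p>
<pre>
```python
# Hypothetical mapping from model-level primitive types to SQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "integer": "INTEGER",
    "real": "DOUBLE PRECISION",
}


def class_to_ddl(name, attributes):
    """Return a CREATE TABLE statement for one class of a logical model.

    `attributes` is a list of (attribute_name, model_type) pairs.
    Assumed rule: one table per class, one column per attribute,
    plus a surrogate `id` primary key.
    """
    columns = ["id INTEGER PRIMARY KEY"]
    columns += [f"{attr} {SQL_TYPES[model_type]}" for attr, model_type in attributes]
    return f"CREATE TABLE {name} (\n  " + ",\n  ".join(columns) + "\n);"


if __name__ == "__main__":
    print(class_to_ddl("Snapshot", [("time", "real"), ("label", "string")]))
```
</pre>
<p>Keeping the rules in one place, as in this toy function, is what allows the schema to be regenerated whenever the logical model changes.</p>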
232     <p>
233 gerard.lemson 271 We introduce some non-normative solutions that can be taken over for generic
234 gerard.lemson 253 data models (this is of course also true for the UML/XMI+XSLT approach for the
235     normative standards).
236     <ul>
237     <li> The XSLT scripts we propose above do not work on the XMI itself, but on
238     an intermediate representation of the UML data model. This is an XML dialect
239     based on a schema we define and which captures the UML profile more directly.
240     XMI is very generic and rather cumbersome to work with. The representation of
241     the UML in our intermediate XML form is much more readable and XSLT based on it
242     is much simpler. It also allows easier adaptation to future modifications in UML,
243     or to tools whose XMI representation is different from the standard: we only need
244     to update the XMI->Intermediate XSLT transformation scripts, not the more complex
245     transformations to the other official representations.
246     We will propose a similar approach to the DM WG.
247     </li>
248     <li> We will provide XMI->Java+JPA+JAXB transformation scripts in XSLT (properly, intermediate->Java).
249     These scripts generate Java classes corresponding to the types (Class, DataType, Enumeration)
250     in UML. These classes are annotated with Java Persistence API (JPA)
251     and Java Architecture for XML Binding (JAXB) annotations to assist in the transformation
252     between relational database and XML representations.
253     Similar scripts can be written for C#, which has supported comparable annotations (attributes) for longer than
254     Java, where they were introduced in Java 5. For persistence we will likely use Linq, which seems similar to JPA.
255     </li>
256     <li>We propose an approach for including application specific and legacy simulation databases
257     in this framework. This approach follows the "global-as-view" approach to information
258     integration (see for example http://www.deg.byu.edu/papers/PODS.integration.pdf;
259     Leonid Kalinichenko from the RVO is an expert in this field).
260     Implementors with an existing relational database schema may be able to define database
261     views which implement the relational representation of the SimDB data model,
262     and in this way provide a simple way to support querying of their database using ADQL.
263     </li></ul></p>
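<p>The view-based ("global-as-view") approach for legacy databases mentioned in the last item can be sketched as follows,
using Python's standard <code>sqlite3</code> module as a stand-in for a real database server. All table, view and column
names (<code>legacy_runs</code>, <code>simulation</code>, <code>box_size</code>, ...) and the two example rows are
hypothetical; the actual names would be prescribed by the SimDB relational schema.</p>
<pre>
```python
import sqlite3

# In-memory database standing in for an existing (legacy) simulation archive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical legacy table with its own naming conventions.
CREATE TABLE legacy_runs (
    run_id    INTEGER PRIMARY KEY,
    run_label TEXT,
    code_name TEXT,
    box_mpc   REAL
);
INSERT INTO legacy_runs VALUES (1, 'run-A', 'example-code', 500.0);
INSERT INTO legacy_runs VALUES (2, 'run-B', 'example-code', 685.0);

-- The view exposes the legacy columns under the names that a (hypothetical)
-- standard schema would prescribe; the legacy table itself is unchanged.
CREATE VIEW simulation AS
SELECT run_id    AS id,
       run_label AS name,
       code_name AS simulator,
       box_mpc   AS box_size
FROM legacy_runs;
""")


def simulations_with_min_box(min_box_size):
    """Query the view only, as a client of the standard schema would."""
    rows = conn.execute(
        "SELECT name FROM simulation WHERE box_size >= ? ORDER BY id",
        (min_box_size,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(simulations_with_min_box(600.0))
```
</pre>
<p>The point of the design is that clients only ever see the view, so the legacy tables can keep their own layout.</p>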
264     <h4>Organisation</h4>
265     <p>
266     The SimDB is ready to be transferred to the DM WG.
267     <br />We propose that Gerard Lemson keeps leading this effort (as main editor), also when it is moved
268     to that WG. The DM WG's chair (Mireille Louys) will be responsible for all WG-chair
269     issues associated with moving a specification through the document process.
270     The people at the bottom will be part of a "tiger team" to push the standard to RFC.
271     We may want to expand this group with an expert from each of the WGs mentioned below.
272     </p>
273     <p>
274     We have been discussing the data model for some time now.
275     Various projects (Italy, USA, France and Germany) have implementations that are similar
276     to the envisioned SimDB. We believe that by autumn 2008 it can go to RFC.
277     Patrizia Manzato and Rick Wagner will have reference implementations based on existing DBs,
278     as will various projects in France (Lyon: Jeremy Blaizot and Laurent Bourges;
279     Galmer database: Igor Chilingarian) and GAVO.
280     </p>
281     <p>
282     Other relevant working groups for this process are Registry, ADQL and Semantics, possibly DAL.
283     Registry because the simulation database is similar to a registry. We can
284     learn from implementations and the registry interface. Also, we (think we) may need an
285     extension to the IVO Identifier in the implementation of references in SimDB.
286     ADQL because we propose it to be the standard (main) query interface to a SimDB implementation.
287     Semantics because our model includes usage of semantic vocabularies, maybe full ontologies.
288     DAL because our proposal for using ADQL in the query phase requires a version of
289     the TAP protocol for defining the interface.
290     We would like to include a person from each of these WGs in the tiger team.
291     Our wishes are: Ray Plante (Registry), ? (ADQL), Norman Gray (Semantics), ? (TAP).
292     Ray and Norm have contributed to early discussions about SNAP.
293     </p>
294     <p>
295     Of these other efforts, TAP seems to pose the main risk to the SimDB standard going to
296     RFC by the autumn. What may help us is that we do not need all the details of TAP.
297     In particular the information_schema approach allowing users to
298     query for the data model is not required as it is part of SimDB specification.
299     We mainly need a prescription for sending ADQL queries to the SimDB, and what the
300     format of results should be.
301     Since we expect meta-data databases to be relatively small (compared to,
302     say, an SDSS or Millennium database), we expect few, if any, problems with
303     performance and can stick to synchronous behaviour at first.
304     </p>
305     <p>
306     We may need some explicit registry-interface like features such as returning a
307     complete XML document according to the messaging format of the SimDB data model.
308     Other issues will come up during the next phase of the discussions.
309     </p>
310    
311 gerard.lemson 294 <h3>Simulation Data Access Protocol (SimDAP)</h3>
312 gerard.lemson 253 <p>
313     We propose to rename the second spin-off of the SNAP project to <i>Simulation Data Access Protocol</i> (SimDAP).
314     It deals with accessing the data after discovery by some means,
315     likely through an implementation of a Simulation Database.
316     It should handle special services such as cut-out, projection and
317     extraction (AMR-like cut-outs, producing regular grids), but also staging etc.
318     It should also deal with data formats. Claudio Gheller (Italy) is leading
319     this effort with close help of Rick Wagner (USA).
320     </p>
321     <p>
322     This project needs more fleshing out and will hopefully be ready to be transferred
323     to a WG, likely DAL, by the Autumn interop.
324     </p>
325 gerard.lemson 294 <h3>Connections between SimDB and SimDAP</h3>
326 gerard.lemson 253 <p>
327     The two projects are connected as follows:
328     The meta-data formats to be included in SimDAP messages are derived from
329     the data model of the SimDB.
330     Vice versa, the SimDB will include a component describing
331     which SimDAP services are applicable/available for a given simulation.
332     </p>
333 gerard.lemson 294
334     <!-- ++++++++++++++++++++++++ -->
335 gerard.lemson 296 <h2><a name="sec2"/>2 Overview</h2>
336 gerard.lemson 294
337 gerard.lemson 322 <h3><a name="sec2_1"/>2.1 SNAP &rArr; SimDB + SimDAP</h3>
338 gerard.lemson 252 <p>This document presents a model for describing certain types of numerical computer simulations
339 gerard.lemson 296 and certain types of simulation post-processing products. The model was originally envisioned to
340     be used in the query part of the <i>Simple Numerical Access Protocol</i> (SNAP),
341     and in discovery of interesting SNAP services in the first place.
342     After investigating the application domain carefully, we have decided to abandon the concept of
343     designing a DAL-like SxAP protocol for simulations. Instead we have split up the effort into
344     two separate efforts that can each be used in their own right, though there is a clear link between them.
345     This document discusses the first of these, which we have named the <i>Simulation Database</i>, and
346     which will have the acronym <i>SimDB</i>. The second will be developed further in a separate effort and is
347     called the <i>Simulation Data Access Protocol</i> (SimDAP, "Sim" stands for "Simulation", <i>not</i> "Simple"!).
348     </p>
349     <p>
350     Following SNAP, SimDB only explicitly considers simulations for systems that represent a space-time
351     sub-volume of the universe and (part of) its material contents. Examples of such simulations are
352     cosmological, pure dark matter N-body simulations of the large-scale structure of the universe;
353     adaptive mesh refinement (AMR) simulations following the evolution of a galaxy cluster using full hydrodynamics;
354     a simulation of the evolution of a globular cluster using a combination of tools, together simulating
355 gerard.lemson 322 the various types of physics <em class="todo">@@ TODO reference to MODEST-like activities</em>; or
356 gerard.lemson 296 simulations calculating the few seconds of a supernova explosion in full 3D.
357     </p>
358     <p>
359     In general these simulations will evolve the system forward
360     in time and are able to produce <i>snapshots</i>, representing the state of the system, a 3D volume of space,
361     at a number of discrete times (though there are alternatives: light cone simulations, individual particle orbits).
362     These direct, raw results of simulations we call Level-0 products, following
363 gerard.lemson 252 similar terminology for observations.
364 gerard.lemson 296 SimDB also covers Level-1 products, which consist of the results of certain types of post-processing
365     of simulations, namely those products that in some form create an alternative representation of
366     a spatial sub-volume of the universe. Examples are a density field calculated on a regular grid,
367     derived from an N-body or an AMR simulation; a cluster catalogue derived using some group finder applied
368     to a cosmological simulation; or a synthetic galaxy catalogue derived from the cluster catalogue using
369     halo occupation distribution models (HODs) or semi-analytical models (SAMs).
370     </p>
371 gerard.lemson 252 <p>We do not make any restrictions on the type of systems being simulated, or the size of the
372     simulation, or the way the system is represented in the simulation code and results. We also
373 gerard.lemson 296 make no restrictions on the type of "observables" produced by the simulations.
374     </p>
375     <p>
376     The SimDAP
377     specification will include protocols for services that process level-0 or level-1 results and produce
378 gerard.lemson 322 other level-1 results. The allowed services deal with selecting the results in a
379     sub-volume of the complete result, sampling a regular 3-dimensional grid, etc. SimDAP also allows for
380     services that do not produce SimDB-like level-0 or level-1 products. Examples are projections and 1D or 2D samplings.
381     Custom services will also be allowed, for example calculating statistical properties such as correlation
382     functions or power spectra in cosmological simulations. A more detailed description of SimDAP
383     is outside the main scope of this note.
384 gerard.lemson 296 </p>
385 gerard.lemson 322 <h3><a name="sec2_2"/>2.2 Simulation Database: structure, interface and applicable services</h3>
386     <p>
387     SimDB is a specification that defines the interface to a database containing meta data describing
388     simulations. To this end it contains two main parts, one is a model for the meta data, the other
389     a protocol for interacting with the database. The model is the core of the specification.
390     It describes the structure of individual data products in the database. We have chosen UML
391     as modelling language, as prescribed by the data modelling working group in the interoperability meeting
392     in Cambridge, UK, May 2003.
393     </p>
394     <p>
395     The UML model is a logical model (see [..] <em class="todo">@@ TODO add reference @@</em>) and
396     forms the basis for physical representations of the data products in the standard
397     language that the IVOA has chosen for such purposes, XML. We derive an XML schema defining valid
398     XML documents directly from the logical model. The SimDB interface will include functions for inserting
399     SimDB data products using such documents, and for retrieving individual, identified data products.
400     </p>
401     <p>
402     The logical model also forms the basis for a physical representation supporting formulation of queries.
403     For various reasons explained below we have chosen ADQL to be the query language and accordingly we derive
404     from the model a relational schema that defines the tables and columns that can be used in ADQL queries sent
405     to a SimDB implementation. The result of ADQL queries is supposed to be a VOTable, and this will in general
406     not represent a complete SimDB data product. However it can be used to browse the database, finally identifying
407     resources and possibly requesting these from the SimDB as XML documents.
408     </p>
409     <p>
410     We make very limited assumptions on <em>how</em> a data product discovered in a SimDB can actually be accessed.
411     We only assume there is a web-based service available, identified by a base URL and tagged with a service type.
412     The range of service types will be defined by SimDAP, but it will at least include "download" and "custom".
413     The data model contains an explicit element for indicating which services are available for a given data product,
414     and users may, if they wish, retrieve this information through ADQL queries and follow the links directly.
415     SimDB implementations can and likely will eventually provide SimDAP related functionality, but this is not part
416     of this specification.
417     </p>
418     <h3><a name="sec2_3"/>2.3 Registration</h3>
419     <p>
420     It must be possible to find SimDB instances in an IVOA Resource Registry <em class="todo">@@TODO add references@@</em>.
421     This implies we need a corresponding resource type, and we have to design its structure.
422     We also assume that one may define resources in the sense of [...]
423     <em class="todo">@@ TODO add reference to Resource data model document @@</em>
424     from within the contents of a SimDB. We take this into account explicitly in the model.
425     The SimDB will have a "getIVOAResource" function, which will execute the appropriate transformation from
426     the internal representation of the SimDB data products to the Resource model's XML representation [...]
427     <em class="todo">@@ TODO link to Resource XML schema document@@</em>.
428     This will likely put more requirements on the Registry model itself, maybe requiring extensions to its schema.
429     Possibly a SimDB itself can be an extension registry. This we think can be postponed to a future version of the
430     specification.
431     </p>
432     <h3><a name="sec2_4"/>2.4 Technology: UML, XMI, XSLT</h3>
433     <p>
434     We
435     </p>
436     <h3><a name="sec2_5"/>2.5 Reference implementations</h3>
437 gerard.lemson 294 <!-- ++++++++++++++++++++++++ -->
438    
439 gerard.lemson 296 <h2><a name="sec3"/>3 Usage scenarios</h2>
440 gerard.lemson 322 <em class="todo">@@ TODO needs severe editing @@</em>
441 gerard.lemson 252 We have assembled a list of explicit use cases and scenarios from which we derive
442     requirements for the current model and the SNAP protocol.
443 gerard.lemson 296 <h4><a name="sec3_1"/>3.1 "20 questions"</h4>
444 gerard.lemson 252 <p>
445 gerard.lemson 296 SimDB defines a common data model for simulations.
446     Following the good practice for database design initiated in [], we here provide a number of
447     scientific questions one might want to ask such a database. The data model and associated data
448     access protocol need to be sufficiently rich that they can support such questions.
449 gerard.lemson 294 </p>
450     <ul>
451 gerard.lemson 252 <li> Scientific goal: investigate baryon wiggles in the evolved density field<br/>
452     Query: Return all cosmological, pure dark matter, N-body simulations with WMAP 3 initial
453     conditions and a box size of at least 1000 Mpc comoving, containing snapshots at about
454     10 redshifts between 3 and 0.
455     </li>
456     <li> Scientific goal: investigate whether observed structures in X-ray clusters that seem to
457     indicate turbulence can truly be that.<br/> Query: return all hydro-dynamical simulations of
458 gerard.lemson 296 galaxy clusters of mass at least 10<sup>14</sup> M<sub>sun</sub>,
459     that have a model for viscosity included in the simulation.
460 gerard.lemson 252 Moreover, return only those simulations that have associated to them an online visualisation
461     service that can produce projected temperature and pressure maps.
462     </li>
463     <li> Scientific goal: interpret the possible histories of an observed galaxy merger to calculate
464     possible star formation episodes and compare these to the observed stellar populations.<br/>
465     Query: Return all simulations of galaxy mergers where the component galaxies have a particular
466     mass ratio and where there are enough snapshots to follow the evolution over a few Gyr.
467     </li>
468    
469     <li> Scientific goal: compare the luminosity function of galaxies in the SDSS survey with those
470     in synthetic catalogues.<br/>Query: Select all cosmological simulations that have produced, as a
471     secondary product, synthetic galaxy catalogues on a light-cone and provide those via an SQL (ADQL?)
472     query interface.
473     </li>
474     <li> ...
475     </li>
476 gerard.lemson 294 </ul>
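<p>
As an illustration only, the first query in the list above could be phrased against a relational
representation roughly as follows. This is a minimal sketch using Python's <tt>sqlite3</tt> module
with an in-memory database. The table and column names (<tt>simulation</tt>, <tt>snapshot</tt>,
<tt>box_size_mpc</tt>, ...) are hypothetical and are not taken from the SimDB data model, and
"about 10 redshifts" is interpreted here as "at least 10 snapshots".
</p>

```python
import sqlite3

# Hypothetical, heavily simplified schema; NOT the SimDB/RDB schema.
SCHEMA = """
CREATE TABLE simulation (
    id INTEGER PRIMARY KEY,
    name TEXT,
    kind TEXT,            -- e.g. 'cosmological'
    physics TEXT,         -- e.g. 'dark matter only'
    solver TEXT,          -- e.g. 'N-body'
    cosmology TEXT,       -- e.g. 'WMAP3'
    box_size_mpc REAL     -- comoving box size in Mpc
);
CREATE TABLE snapshot (
    id INTEGER PRIMARY KEY,
    simulation_id INTEGER REFERENCES simulation(id),
    redshift REAL
);
"""

# "Return all cosmological, pure dark matter, N-body simulations with WMAP 3
# initial conditions, box size >= 1000 Mpc, and about 10 snapshots with
# redshift between 3 and 0" (read here as: at least 10 snapshots).
QUERY = """
SELECT s.name, COUNT(p.id) AS n_snapshots
FROM simulation s
JOIN snapshot p ON p.simulation_id = s.id
WHERE s.kind = 'cosmological'
  AND s.physics = 'dark matter only'
  AND s.solver = 'N-body'
  AND s.cosmology = 'WMAP3'
  AND s.box_size_mpc >= 1000
  AND p.redshift BETWEEN 0 AND 3
GROUP BY s.id, s.name
HAVING COUNT(p.id) >= 10
ORDER BY s.name
"""


def build_demo_db():
    """Create an in-memory database with two made-up simulations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO simulation VALUES "
        "(1, 'big_box', 'cosmological', 'dark matter only', 'N-body', 'WMAP3', 1500.0)"
    )
    conn.execute(
        "INSERT INTO simulation VALUES "
        "(2, 'small_box', 'cosmological', 'dark matter only', 'N-body', 'WMAP3', 100.0)"
    )
    # Ten snapshots per simulation, evenly spaced between redshift 0 and 3.
    for sim_id in (1, 2):
        for i in range(10):
            conn.execute(
                "INSERT INTO snapshot (simulation_id, redshift) VALUES (?, ?)",
                (sim_id, 3.0 * i / 9),
            )
    return conn


def find_baryon_wiggle_candidates(conn):
    """Run the example query and return (name, n_snapshots) rows."""
    return conn.execute(QUERY).fetchall()
```

<p>
In this made-up data set only the simulation with the 1500 Mpc box satisfies the box-size
constraint, so it is the only row returned.
</p>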
477 gerard.lemson 252 <p>
478     In the design of the model it is useful to think about the steps a user might go through
479     when querying a database system in various "drilling down" steps. For example the following
480     questions might be asked:
481 gerard.lemson 294 </p>
482 gerard.lemson 252 <ul>
483     <li>What system/object is being simulated?</li>
484     <li>What physical processes are included?</li>
485     <li>How is the system being represented in the simulation
486     (particles (Lagrangian), (adaptive) mesh (Eulerian), both, other)?</li>
487     <li>Per process:<ul>
488     <li>How are the physical processes implemented?</li>
489     <li>Characterise the numerical approximations (e.g. resolution, softening parameter)</li></ul></li>
490     <li>What observables are available for the system/object, possibly as function of time?
491     As it is a spatial system, at least size, center-of-mass position.</li>
492     <li>What observables are available for the constituents, i.e. what is the schema of the atomic objects?</li>
493     <li>Per snapshot, per atomic object type, per variable:
494     <ul>
495     <li>Characterise the possible values</li>
496     <li>Characterise the result</li></ul></li>
497     <li>Are post-processing results available?</li>
498     <li>Are services/applications available working on the results?</li>
499     <li>Which code ran the simulation?</li>
500     <li>What were the values of the physical parameters?</li>
501     <li>How were the initial conditions created, and with which parameters?</li>
502     </ul>
503 gerard.lemson 296 </p>
504 gerard.lemson 264
505 gerard.lemson 296 <h4><a name="sec3_2"/>3.2 SimDB-standard implementation</h4>
506     We foresee a simple implementation scenario based directly on products developed
507     in the course of the SimDB effort. We believe that from the data model to be developed
508     in this effort we should be able to derive physical representations that
509     can be used directly in implementations. We envision that with only a little custom infrastructure code
510     it should be possible to
511     <ul>
512     <li>fill a relational database with tables and views representing the SimDB data model from
513     DDL scripts generated from the UML</li>
514     <li>create a web-based service that accepts XML documents for inserting new simulation results
515     and translates these, using generated code with JAXB annotations, to in-memory Java objects</li>
516     <li>flush these objects to a relational database using a Java Persistence API (JPA) implementation,
517     structured using the JPA annotations generated on the Java classes.
518     It should not be too hard to support other languages as well if they provide similarly simple XML binding and
519 gerard.lemson 322 OR-mapping capabilities. Python+Django and C#+LINQ or NHibernate come to mind.<em class="todo">
520     @@ TODO check with people knowing more about these technologies @@</em></li>
521 gerard.lemson 296 <li>accept ADQL queries that are translated to the appropriate vendor-specific SQL
522     (using modules defined by the ADQL effort?) and return a VOTable</li>
523     <li>accept requests for identified SimDB resources (using an IVO or implementation-specific identifier),
524     translate this into a JPA query to retrieve the object from the database, which is translated to
525     the appropriate XML using the JAXB layer and sent back to the user.</li>
526     </ul>
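<p>
The ingestion path in the list above (XML document, in-memory object, relational row) can be sketched
in a few lines. The sketch below uses Python's standard library as a stand-in for the JAXB and JPA
layers named in the text; the XML element names, the <tt>Experiment</tt> fields and the
<tt>experiment</tt> table are all hypothetical and chosen only for illustration.
</p>

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical in-memory object; plays the role of a JAXB/JPA-annotated class."""
    publisher_id: str
    name: str
    protocol: str


# Made-up document layout, NOT the SimDB/XML format.
EXAMPLE_XML = (
    '<experiment publisherId="demo:exp-1">'
    "<name>Test run</name>"
    "<protocol>demo-nbody-code</protocol>"
    "</experiment>"
)


def parse_experiment(xml_text):
    """XML binding step: turn an XML document into an Experiment object."""
    root = ET.fromstring(xml_text)
    return Experiment(
        publisher_id=root.get("publisherId"),
        name=root.findtext("name"),
        protocol=root.findtext("protocol"),
    )


def open_demo_db():
    """Create an in-memory database with a single illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment ("
        "id INTEGER PRIMARY KEY, publisher_id TEXT UNIQUE, name TEXT, protocol TEXT)"
    )
    return conn


def store_experiment(conn, exp):
    """Persistence step: write the object to the database and return its row id."""
    cur = conn.execute(
        "INSERT INTO experiment (publisher_id, name, protocol) VALUES (?, ?, ?)",
        (exp.publisher_id, exp.name, exp.protocol),
    )
    conn.commit()
    return cur.lastrowid
```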
527    
528     <h4><a name="sec3_3"/>3.3 Legacy database</h4>
529     Although by no means as common as similar efforts in the observational domain,
530     databases have been developed containing the meta data of simulations.
531     How could a SimDB be implemented around such a database?
532     Our ideas are inspired by (what we understand from) the "global-as-view" approach to information
533     integration. We assume the implementers have their own way of filling up their database with meta-data
534     describing simulations from their own efforts. The idea is that they write database views to provide
535     a virtual implementation of the SimDB/RDB schema. ADQL queries sent to their service can now still be
536     understood and replied to. The users should also be able to write custom code to produce the appropriate
537     XML documents based on a request for an identified resource, possibly by querying these same views.
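<p>
A minimal sketch of the view-based approach, again in Python with an in-memory SQLite database:
a legacy table with its own naming and units is exposed through a view that mimics a standard
schema, so that the same query text works against it. The legacy table <tt>runs</tt>, the view
<tt>v_simulation</tt> and all column names are assumptions made for this example.
</p>

```python
import sqlite3

# Legacy, site-specific table (hypothetical): own column names, box size in kpc.
LEGACY_SCHEMA = """
CREATE TABLE runs (
    run_no INTEGER PRIMARY KEY,
    label TEXT,
    code_name TEXT,
    lbox_kpc REAL
);
"""

# View giving a virtual implementation of an (assumed) standard table layout.
# Renaming and unit conversion happen in the view definition.
STANDARD_VIEW = """
CREATE VIEW v_simulation AS
SELECT run_no            AS id,
       label             AS name,
       code_name         AS protocol,
       lbox_kpc / 1000.0 AS box_size_mpc
FROM runs;
"""


def build_legacy_db():
    """Create the legacy table, the standard-looking view, and one demo row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(LEGACY_SCHEMA)
    conn.executescript(STANDARD_VIEW)
    conn.execute("INSERT INTO runs VALUES (7, 'legacy run', 'demo-code', 500000.0)")
    return conn


def query_standard_schema(conn):
    """A query written purely in terms of the standard (view) schema."""
    return conn.execute(
        "SELECT id, name, box_size_mpc FROM v_simulation WHERE box_size_mpc >= 100"
    ).fetchall()
```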
538    
539     <h4><a name="sec3_4"/>3.4 Meta data production pipe line</h4>
540     The SimDB data model is relatively comprehensive, which reflects itself in XML documents
541     of substantial size and complexity for realistic cases.
542     For a registration scenario, i.e. one where a user is allowed to upload XML documents to a SimDB implementation,
543     one would prefer not to have to produce these documents by hand. By far the preferred manner in our opinion
544     would be for simulation and post-processing pipe-lines to produce compliant documents.
545     We have contacted authors of some of the most popular major simulation codes (Springel; Norman et al; more needed),
546     and they have agreed that this is feasible and are willing to participate in this effort.
547    
548     <h4><a name="sec3_5"/>3.5 Client tools</h4>
549     One reason to produce a standard which uses ADQL on top of a standard data model is that client tools
550     can be written to query different such holdings. For example we could envision a tool such as VisIVO [..]
551     to offer some user-friendly interface for querying SimDB implementations retrieved from an IVOA Registry.
552     The user need not see any ADQL, which is all generated by VisIVO, but can be shown results and services.
553     In particular if a cut-out service is available, VisIVO could provide an interface for the user to decide
554     on the sub-volume, retrieve and visualise it. The advantage of having a standard data model
555     clearly is that the same ADQL can be sent to all SimDB services.
556 gerard.lemson 322 <em class="todo">@@ TODO contact VisIVO people to see whether this could be implemented @@</em>.
557 gerard.lemson 296
558 gerard.lemson 294 <!-- ++++++++++++++++++++++++ -->
559 gerard.lemson 264
560    
561 gerard.lemson 294 <h2><a name="sec4"/>4 Analysis model</h2>
562 gerard.lemson 322 <em class="todo">@@TODO Gerard@@</em>
563 gerard.lemson 296 An <i>analysis model</i>, also called domain model, is an abstract, high-level representation of the
564     <i>universe of discourse</i> (UoD), the part of the world that our application deals with.
565     It is a UML model, with emphasis on the concepts and their exact relationships in the UoD, though details
566     such as attributes need not be completely filled in.
567     Importantly, it should not be influenced by application scenarios apart from knowledge of their UoD.
568     Here we describe the UoD and our analysis model. The model is strongly influenced by patterns
569     discovered in earlier work on a
570     <i><a href="http://www.ivoa.net/internal/IVOA/IvoaDataModel/DomainModelv0.9.1.doc">Domain model for Astronomy</a></i>,
571     co-written by one of the authors of the present note. We describe some of its main patterns below as well.
572 gerard.lemson 264
573 gerard.lemson 294 <h4><a name="sec4.1"/>4.1 Universe of Discourse</h4>
574 gerard.lemson 264
575 gerard.lemson 294 <h4><a name="sec4.2"/>4.2 Domain Model for Astronomy</h4>
576    
577 gerard.lemson 296 <h4><a name="sec4.3"/>4.3 SimDB analysis model</h4>
578 gerard.lemson 322 <em class="todo">@@TODO create a version and add it to volute@@</em>.
579 gerard.lemson 296
580 gerard.lemson 294 <!-- ++++++++++++++++++++++++ -->
581 gerard.lemson 252
582 gerard.lemson 294 <h2><a name="sec5"/>5 Logical Model: SimDB</h2>
583 gerard.lemson 264 <p>
584 gerard.lemson 296 Here we introduce the core of our proposal, the UML representation of our logical data model
585 gerard.lemson 434 for our Simulation Database. The exact representation of this model is an
586     <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/SimDB_DM.xml">XMI file</a>,
587     which can be found in the <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm">snapdm section</a>
588     of the <a href="http://volute.googlecode.com/svn/">Volute subversion database</a> on Google code.
589     Other representations can be found in that same hierarchy, in particular check out the
590     <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/SimDB_DM.xml">HTML documentation</a> which we generated from the XMI
591     representation with the XSLT pipeline described in <a href="#appB">Appendix B</a>. This generated documentation file contains
592     the explicit description of all of the elements in the model and forms the reference documentation for the model.</p>
593     <h3><a name="sec5_1"/>5.1 Overview</h3>
594 gerard.lemson 296 <p>
595 gerard.lemson 294 The logical data model is a fully detailed model of the application domain. It is to form the basis of physical
596     models, representing the model in various computational environments.
597     The logical model is represented as a set of UML diagrams, which we created using MagicDraw Community Edition 12.1 and stored as an
598     XMI file in the GoogleCode
599     SVN repository: <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/SNAP_Simulation_DM.xml">
600 gerard.lemson 322 SNAP_Simulation_DM.xml</a> <em class="todo">@@TODO should change all occurrences of names with SNAP to using SimDB@@</em>
601 gerard.lemson 294 JPG representations of the model can be found in <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/input/images/">this</a>
602 gerard.lemson 322 directory. <em class="todo">@@TODO find proper representation image of the complete model. Possibly color packages differently.@@</em>
603 gerard.lemson 264 </p>
604 gerard.lemson 434 <h3><a name="sec5_2"/>5.2 Normalisation</h3>
605     <h3><a name="sec5_3"/>5.3 Model contents</h3>
606     <h4><a name="sec5_3_1"/>5.3.1 Resource hierarchy</h4>
607     <p>
608     At the root of the SimDB data model is an abstract class called Resource; in the rest
609     of this document we will refer to this as SimDB/Resource.
610     It represents the different types of highest level meta-data objects to be stored in a SimDB.
611     Examples of this are represented as subclasses. First Experiment (SimDB/Experiment), which represents
612     different types of experiments that have been performed (run/executed/...) and have produced the results
613     that SimDB users may be interested in. Examples of SimDB/Experiment-s are first simulations,
614     but also the various post-processing operations transforming simulation results into other products
615     such as halo catalogues, density fields etc.
616     </p>
617     <p>
618     The second major type of SimDB/Resource is the SimDB/Protocol.
619     This concept represents a <i>formally prescribed way of doing an experiment</i>.
620     It is derived from the concept with the same name in the domain model, which itself was inspired
621     by the concept with the same name in Chapter 8.5 in <a href="#r_AnalaysisPatterns">[3]</a>.
622     In the SimDB/DM this concept has concrete representations in the computer programs that are being
623     used to run simulations and post-processing etc. As such it defines the possible input parameters,
624     possible algorithms, and the kind of results that can be produced by the code. Every SimDB/Experiment must
625     indicate which SimDB/Protocol was used and, for example, provide values for the input parameters and indicate
626     which physics was used.
627     </p>
628     <p>
629     The SimDB/Resource concept is clearly similar, but in general <i>not equivalent</i> to the Resource Registry's Resource concept.
630     In data modeling terms, it is not true that a SimDB/Resource <i>is a</i> Registry/Resource.
631     Often the reason is similar to the reasons that a single image is not a Registry/Resource, whereas a SIAP-compatible service is.
632     The granularity of a SimDB will be higher than a Registry and many simulations on their own will be too small.
633     The SimDB itself will have to be registered (see <a href="#">section ???</a> for a further discussion
634     <em class="todo">@@ TODO add proper section and href@@</em>),
635     i.e. a SimDB service <i>is a</i> Registry/Resource. In discussion with Ray Plante (IVOA Interop May 2007, Beijing)
636     on this issue it was proposed that some part of the contents could also be registered in a Registry directly.
637     I.e. we should be able to identify Registry/Resource-s in SimDB. Considerations to decide on how to make this identification would be for example
638     that all data products resulting from a well defined (and published) scientific project could qualify.
639     To represent such a possibility for now we have introduced another subclass of SimDB/Resource: SimDB/Project.
640     This is not much more than an aggregation of experiments, with some additional attributes describing the motivation etc.
641     The metadata of a SimDB/Project is not the same as that of a Registry/Resource, however we propose that we should be able
642     to define a transformation (possibly implemented again in XSLT) to transform a SimDB/Project and produce a Registry/XML representation.
643     Some more thoughts on this subject will be given in <a href="#">section ???</a> <em class="todo">@@ TODO add proper section and href@@</em> mentioned above.
644     </p>
645 gerard.lemson 293
646 gerard.lemson 434 <h4><a name="sec5_3_2"/>5.3.2 Target</h4>
647 gerard.lemson 322
648 gerard.lemson 434 <h4><a name="sec5_3_3"/>5.3.3 Characterisation</h4>
649 gerard.lemson 294
650 gerard.lemson 434 <h4><a name="sec5_3_4"/>5.3.4 Semantics</h4>
651 gerard.lemson 294
652 gerard.lemson 434 <h4><a name="sec5_3_5"/>5.3.5 Units</h4>
653 gerard.lemson 294
654 gerard.lemson 434
655     <h4><a name="sec5_3_6"/>5.3.6 Services</h4>
656     The goal of the SimDB specification is to define a protocol for querying interesting simulations and related SimDB/Resource-s.
657     Once these have been identified the user should be able to access these simulations.
658    
659    
660     <h2><a name="sec6"/>6 Physical models</h2>
661     Here we describe how we create physical models out of the logical model.
662     A <i>physical model</i> is (see <em class="todo">@@TODO reference to some standard reference on data modelling@@</em>)
663     a representation of the logical model that is adapted to a particular software environment.
664     The DM WG has mandated (IVOA interoperability meeting, Cambridge, UK, May 2003) that one
665     such representation should be an XML schema. This is to be used to define the structure of XML documents
666     used in messages to communicate instances of the SimDB Resource type.
667     Together with this we also create a relational database schema.
668     We propose this model as we want to use the ADQL standard under development in the VOQL WG
669     in the protocol for querying SimDB-s.
670    
671    
672     <h3><a name="sec6_1"/>6.1 Identifiers and References</h3>
673    
674     We want to be able to identify each instance of each concrete type explicitly in a globally unique way.
675    
676     To this end we need to be able to assign identifier on each
677    
678    
679     <h3><a name="sec6_2"/>6.2 RDBM Schema</h3>
680     The public schema, i.e. the view the outside world has of a SimDB, is a relational schema.
681     This will be formally defined using VOTables containing the appropriate TABLE definitions
682     <ul>
683     <li>object types are mapped to tables, one table per object type</li>
684     <li>Inheritance hierarchies: JOINED strategy as defined in JPA, i.e. each table only has columns for the attributes and references defined on the corresponding type.
685     Also an ID column that is a PK and also a FK to the ID of the base class' table. Possibly a container column (see below)</li>
686     <li>Primary key column: <tt>ID NUMERIC(18)</tt></li>
687     <li>Foreign key to container: <tt>containerId</tt><br/>plus foreign key and index declaration</li>
688     <li>References: &lt;referenceName&gt;Id<br/>plus foreign key and index declaration.</li>
689     <li>Using topological sort of object types based on (extends|container|reference) relations we generated
690     create table statements and their indexes and foreign keys in blocks. Drop table statements are generated in the opposite order.</li>
691     <li>For each class we create a view named "v_&lt;class name&gt;"<br/>which returns all columns for that class; it uses a join to the base class's view.</li>
692     <li>generate a discriminator column on table for root in inheritance hierarchy, stores name of class (must be unique in inheritance hierarchy!)</li>
693     <li>attributes mapped to single column if their type is simple (i.e. primitive, or enumeration)</li>
694     <li>if attribute's type is dataType mapped to as many columns as the dataType has attributes,
695     with column names the name of the dataType's attributes, prefixed by &lt;attribute-name&gt;_</li>
696     <li>For PK columns we use the
697     </ul>
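<p>
The mapping rules listed above can be made concrete with a small example. The sketch below applies
them by hand to three classes (Resource, Protocol, Experiment) in an in-memory SQLite database:
one table per type, a shared <tt>ID</tt> that is both primary key and foreign key to the base table,
a discriminator column on the root table, a <tt>&lt;referenceName&gt;Id</tt> column with an index,
and one <tt>v_&lt;class name&gt;</tt> view per class. The discriminator column name <tt>DTYPE</tt>
and the attributes <tt>name</tt> and <tt>code</tt> are assumptions for this example, not names
prescribed by the note.
</p>

```python
import sqlite3

DDL = """
-- Root of the inheritance hierarchy: carries the discriminator column.
CREATE TABLE Resource (
    ID NUMERIC(18) PRIMARY KEY,
    DTYPE VARCHAR(32) NOT NULL,   -- name of the concrete class (assumed column name)
    name VARCHAR(128)
);
-- JOINED strategy: subclass tables hold only their own columns;
-- their ID is a primary key and also a foreign key to the base table.
CREATE TABLE Protocol (
    ID NUMERIC(18) PRIMARY KEY REFERENCES Resource(ID),
    code VARCHAR(128)
);
CREATE TABLE Experiment (
    ID NUMERIC(18) PRIMARY KEY REFERENCES Resource(ID),
    protocolId NUMERIC(18) REFERENCES Protocol(ID)   -- reference: <referenceName>Id
);
CREATE INDEX ix_Experiment_protocolId ON Experiment(protocolId);

-- One view per class, each joining to the view of its base class.
CREATE VIEW v_Resource AS
    SELECT ID, DTYPE, name FROM Resource;
CREATE VIEW v_Protocol AS
    SELECT b.ID, b.DTYPE, b.name, t.code
    FROM v_Resource b JOIN Protocol t ON t.ID = b.ID;
CREATE VIEW v_Experiment AS
    SELECT b.ID, b.DTYPE, b.name, t.protocolId
    FROM v_Resource b JOIN Experiment t ON t.ID = b.ID;
"""


def build_db():
    """Create the example schema and insert one Protocol and one Experiment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    # Each object is one row in the root table plus one row in its own table.
    conn.execute("INSERT INTO Resource VALUES (1, 'Protocol', 'demo code')")
    conn.execute("INSERT INTO Protocol VALUES (1, 'nbody-demo')")
    conn.execute("INSERT INTO Resource VALUES (2, 'Experiment', 'demo run')")
    conn.execute("INSERT INTO Experiment VALUES (2, 1)")
    return conn


def all_experiments(conn):
    """Read complete Experiment objects through the per-class view."""
    return conn.execute(
        "SELECT ID, DTYPE, name, protocolId FROM v_Experiment"
    ).fetchall()
```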
698    
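<p>
The ordering of create and drop statements mentioned in the list above amounts to a topological sort
of the object types by their dependencies. The generation described in this note is done in XSLT;
the following is only a sketch of the ordering idea in Python (3.9 or later, for <tt>graphlib</tt>),
with a made-up dependency map in which each type lists the types it extends, is contained in, or
references.
</p>

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: type -> types whose tables must exist first
# (base class, container, or referenced type).
DEPENDS_ON = {
    "Resource": set(),
    "Protocol": {"Resource"},                # extends Resource
    "Experiment": {"Resource", "Protocol"},  # extends Resource, references Protocol
    "Snapshot": {"Experiment"},              # contained in Experiment
}


def ddl_order(depends_on):
    """Return (create_order, drop_order) for the given dependency map.

    Tables are created dependencies-first and dropped in the opposite order.
    TopologicalSorter raises graphlib.CycleError if the dependencies are cyclic.
    """
    create_order = list(TopologicalSorter(depends_on).static_order())
    drop_order = list(reversed(create_order))
    return create_order, drop_order
```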
699     <h3><a name="sec6_3"/>6.3 XML Schema</h3>
700    
701     <h3><a name="sec6_4"/>6.4 UTYPE-s</h3>
702     <p>
703     It is generally the case that contents of databases may be represented in ways that do not
704     conform to one of the standard serialisations. Nothing prevents services from being developed on
705     top of SimDB that represent SimDB/Resource-s or even fragments of these in another form.
706     The standard example would be to have VOTables storing the results of a generic ADQL query of the SimDB/RDB representation.
707     VOTable first introduced the option to have a UTYPE attribute in FIELD definition tags store
708     a pointer to an element in a data model that the column represents.
709     </p>
710     <p>
711     The <a href="#r_SpectrumDatamodel">Spectrum data model</a> was the first to add explicit
712     UTYPE-s for each of the attributes in its model and the <a href="#r_CharacterisationDM">Characterisation data model</a>
713     has followed that example. As long as the precise usage and relation of the syntax to the underlying data model
714     is not defined, we will follow these examples by assigning UTYPE-s explicitly to all elements in the model.
715     However, we will follow a fixed set of rules to make this assignment and implement these in XSLT.
716     If a similar approach is at some time accepted within the IVOA, possibly in an alternative form, it will be straightforward
717     to adjust our definitions.
718     </p>
719     <p>
720     Our assumption is that the UTYPE should be able to uniquely represent any element in the data model, and in a manner
721     that is also easily interpreted. For now the elements that we assume we need to be able to address are those that can be
722     represented by a single value in a column. This requires us to be able to derive UTYPE-s for the following
723     model elements:
724     <ul>
725     <li>Attribute</li>
726     <li>Reference</li>
727     <li>Collection</li>
728     </ul>
729    
730     </p>
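<p>
The note does not spell out the rules used to derive UTYPE-s, so the following is purely a sketch of
what a rule-based assignment could look like. The pattern
<tt>&lt;model&gt;:&lt;package&gt;/&lt;Class&gt;.&lt;element&gt;</tt> and the example element names
are hypothetical; the only point illustrated is that a fixed rule applied to attributes, references
and collections yields a unique string per model element.
</p>

```python
from dataclasses import dataclass

# The three kinds of model element that need a UTYPE according to the text.
ADDRESSABLE_KINDS = {"attribute", "reference", "collection"}


@dataclass(frozen=True)
class ModelElement:
    """Hypothetical description of one element of the logical model."""
    package: str
    owner: str   # class that owns the element
    name: str
    kind: str    # 'attribute', 'reference' or 'collection'


def utype(model, element):
    """Derive a UTYPE string with an ASSUMED rule: model:package/Class.element."""
    if element.kind not in ADDRESSABLE_KINDS:
        raise ValueError(f"no UTYPE rule for element kind {element.kind!r}")
    return f"{model}:{element.package}/{element.owner}.{element.name}"


def assign_utypes(model, elements):
    """Map every element to its UTYPE and check that the result is unique."""
    result = {el: utype(model, el) for el in elements}
    if len(set(result.values())) != len(result):
        raise ValueError("derived UTYPE-s are not unique")
    return result
```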
731    
732    
733     <h3><a name="sec6_5"/>6.5 Java/JPA+JAXB (non normative)</h3>
734    
735 gerard.lemson 294 <h2><a name="sec7"/>7 Query Protocols</h2>
736     <h3><a name="sec7_1"/>7.1 ADQL</h3>
737 gerard.lemson 434
738 gerard.lemson 322 <h3><a name="sec7_2"/>7.2 REST</h3>
739     <p>
740     Under this heading we mean a protocol whereby data products can be retrieved through
741     HTTP GET requests. Possibly also they can be POST-ed, or PUT.
742     This needs to be discussed further, but maybe can be punted until a future release.
743 gerard.lemson 434 The GET will always only be able to get a complete SimDB resource, serialised to SimDB/XML, similar to the Registry.
744 gerard.lemson 322 </p>
745 gerard.lemson 294 <h3><a name="sec7_3"/>7.3 TAP?</h3>
746 gerard.lemson 322 Issues:
747     <ul>
748     <li>How does TAP deal with units?</li>
749     <li>In TAP, does a table column containing values always have a single UCD and a single unit?</li>
750     <li>Is TAP suited for this kind of metadata databases?</li>
751     </ul>
752 gerard.lemson 294
753     <h2><a name="sec8"/>8 Next Steps</h2>
754     <h3><a name="sec8_1"/>8.1 Reference implementations</h3>
755     <h4><a name="sec8_1_1"/>8.1.1 France</h4>
756 gerard.lemson 322 <em class="todo">@@ TODO Laurent @@</em>
757 gerard.lemson 294 <h4><a name="sec8_1_2"/>8.1.2 Germany</h4>
758 gerard.lemson 322 <em class="todo">@@ TODO Gerard @@</em>
759 gerard.lemson 294 <h4><a name="sec8_1_3"/>8.1.3 Italy</h4>
760 gerard.lemson 322 <em class="todo">@@ TODO Patrizia @@</em>
761 gerard.lemson 294 <h4><a name="sec8_1_4"/>8.1.4 USA</h4>
762 gerard.lemson 322 <em class="todo">@@ TODO Rick @@</em>
763 gerard.lemson 294
764 gerard.lemson 434 <h3><a name="sec8_2"/>8.2 Generating XML from simulation pipe lines</h3>
765 gerard.lemson 294
766     <h3><a name="sec8_3"/>8.3 SimDAP services</h3>
767    
768     <h2><a name="appA"/>Appendix A: Data modelling specifics</h2>
769     Here we describe various aspects of UML modelling as we applied it to the current
770     problem area.
771 gerard.lemson 293 <p>
772 gerard.lemson 294 UML allows communities to create a domain-specific modelling language through its Profiling capabilities
773 gerard.lemson 322 <em class="todo">@@ TODO is this the proper term ?@@</em>.
774 gerard.lemson 294
775     We have an initial implementation of a UML profile as created by MagicDraw available under <a href="">this link</a>.
776     Here we list the main elements and give a short motivation for their inclusion in the model.
777     It is our opinion that the DM working group should be ultimately responsible for a profile such as this,
778     defining a domain specific language for all IVOA data modelling efforts.
779     </p>
780     <p>
781     As a first step in our generation pipeline we generate an XML document that represents the data model in a form
782     that is more easily interpreted, both by human readers and by XSLT scripts, than the XMI representation.
783     This document itself is structured according to an XML schema that
784     represents the UML profile rather directly and that we briefly describe here.
785     </p>
786     This schema is located in
787     <a href="http://volute.googlecode.com//svn/trunk/projects/theory/snapdm/input/intermediateModel.xsd">
788     http://volute.googlecode.com//svn/trunk/projects/theory/snapdm/input/intermediateModel.xsd</a>.
789    
790    
791     We introduce our own XML format, defined by the XML schema in
792     <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/res/intermediateModel.xsd">intermediateModel.xsd</a>,
793     for representing the logical model. For the time being we call this the <i>intermediate representation</i>.
794     The first step in the generation pipeline is a translation of the XMI to an XML document following this format.
795     This transformation is implemented in the
796     <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/res/xmi2intermediate.xsl">xmi2intermediate.xsl</a>
797     XSLT script. The latest version of the intermediate representation for the SimDB data model can be found in
798     <a href="http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/output/SNAP_Simulation_DM_INTERMEDIATE.xml">this location</a>.
799     All other generation scripts work on this intermediate representation, not on the XMI document.
800     Variations in tool-generated XMI, or different versions of XMI can now be supported by an appropriately adjusted
801     XSLT script.
802     One reason why this may be useful is that different tools may produce different versions or different
803     dialects of XMI. Another reason for this representation is that XMI is a rather complex representation of a UML
804     model. Since we are using a rather restricted <a href="#profile">profile</a> we do not need this generality, and
805     this allows us to represent the model using XML documents that are much easier to handle with XSLT.
806    
807    
808     <p>
809 gerard.lemson 293 We illustrate our UML profile using an example data model
810     derived from the SimDB/DM, shown in the following diagram:<br/>
811     <img src="img/example.jpg"/>
812     <br/>
813     We now describe the individual elements.
814     Some of these are standard, some of these are domain-specific extensions following
815     standard UML profile <i>stereotype</i> extension elements and associated <i>tag definition</i>.
816    
817     <ul>
818     <li>Model<br/>
819     (no visual counterpart)</li>
820     <ul>
821     <li> &lt;&lt;model&gt;&gt; </li>
822     <ul>
823     <li>TagDefinition: author</li>
824     <li>TagDefinition: title</li>
825     </ul>
826     </ul>
827     <li> Package <br/><img src="img/package.jpg" />
828     <ul>
829     <li>package containment</li>
830     <li>package dependency</li>
831     </ul>
832     </li>
833     <li> Class <br/><img src="img/class.jpg" />
834     <ul>
835     <li>isAbstract<br/>
836     </li>
837     </ul>
838     </li>
839     <li> DataType <br/><img src="img/datatype.jpg" /></li>
840     <li> Enumeration <br/><img src="img/enumeration.jpg" /></li>
841     <li> Property: attribute<br/><img src="img/attribute.jpg" /></li>
842     <ul><li>&lt;&lt;attribute&gt;&gt; </li>
843     <ul>
844     <li>TagDefinition: minLength<br/>
845     </li>
846     <li>TagDefinition: maxLength<br/>
847     </li>
848     </ul>
849     <li> &lt;&lt;ontologyterm&gt;&gt; </li>
850     <ul>
851     <li>TagDefinition: ontology<br/>
852     A URL locating a standard (RDF|SKOS|OWL|???) document containing
853     a list of terms from which the value for this attribute may be obtained.
854     It is our opinion that the Semantics working group should be responsible for the
855     definition of relevant ontologies (or semantic vocabularies, or thesauri, or ...)
856     required for a given application domain, though the contents should be decided in
857     cooperation with domain experts.
858     </li>
859     </ul>
860     </ul>
861     <li>Inheritance
862     <br/><img src="img/inheritance.jpg" /></li>
863     <li>Binary association end: collection
864     <br/><img src="img/collection.jpg" /></li>
865     <li>Binary association end: reference
866     <br/><img src="img/reference.jpg" /></li>
867     <li>Binary association end: subsets
868     <br/><img src="img/subsets.jpg" /></li>
869    
870     </ul>
871    
872     </p>
873    
874    
875 gerard.lemson 294 <h2><a name="appB"/>Appendix B: XSLT pipe line</h2>
876 gerard.lemson 322 <em class="todo">@@ TODO Laurent @@</em>
877 gerard.lemson 293
878 gerard.lemson 322 <h2><a name="glossary"/>Glossary and Acronyms</h2>
879     <dl>
880     <dt><a name="g_SimDB">SimDB</a></dt>
881     <dd></dd>
882     <dt><a name="g_SimDAP"/>SimDAP</dt>
883     <dd></dd>
884     <dt><a name="g_SimDB/DM"/>SimDB/DM</dt>
885     <dd>The logical data model defining the structure of <a href="#g_SimDB">SimDB</a>.</dd>
886     <dt><a name="g_SimDB/RDB"/>SimDB/RDB</dt>
887     <dd>The representation of the SimDB/DM as a relational data base schema.</dd>
888     <dt><a name="g_SimDB/XML"/>SimDB/XML</dt>
889     <dd>The XML representation of the SimDB/DM</dd>
890     <dt><a name="g_SimDB_resource"/>SimDB resource</dt>
891     <dd>A top-level data product stored in a SimDB.
892     A SimDB resource can be described in a SimDB/XML document, but none of its constituents can.</dd>
893     </dl>
894 gerard.lemson 252
895     <h2><a name="references">References</a></h2>
896    
897 gerard.lemson 293 <p><a name="r_UML">[1]</a> ???, <i>UML standard</i>
898 gerard.lemson 252 <br/><a href="http://">http://</a>
899     </p>
900 gerard.lemson 293 <p><a name="r_XMI">[2]</a> ???, <i>XMI standard</i>
901     <br/><a href="http://">http://</a>
902     </p>
903 gerard.lemson 434 <p><a name="r_AnalaysisPatterns">[3]</a> Martin Fowler, <i>Analysis Patterns</i>, 1997, Addison Wesley.
904 gerard.lemson 293 <br/><a href="http://">http://</a>
905     </p>
906     <p><a name="r_TheoryinVO">[4]</a> Lemson &amp; Colberg, <i>Theory in the virtual observatory</i>
907     <br/><a href="http://">http://</a>
908     </p>
909 gerard.lemson 252
910 gerard.lemson 293 <p><a name="r_Characterisation">[5]</a> ???, <i>Characterisation DM</i>
911     <br/><a href="http://">http://</a>
912     </p>
913 gerard.lemson 252
914 gerard.lemson 322 <p><a name="r_informatonIntegration">[6]</a> <em class="todo">@@ TODO @@</em>references on global-as-view and information integration
915 gerard.lemson 293 <br/><a href="http://">http://</a>
916     </p>
917 gerard.lemson 252
918 gerard.lemson 434 <p><a name="r_visivo">[7]</a> <em class="todo">@@ TODO @@</em>reference to VisIVO
919 gerard.lemson 296 <br/><a href="http://">http://</a>
920     </p>
921    
922 gerard.lemson 434 <p><a name="r_SpectrumDatamodel">[8]</a> <em class="todo">@@ TODO @@</em>reference to Spectrum data model
923     <br/><a href="http://">http://</a>
924     </p>
925    
926 gerard.lemson 252 </body></html>
