2008-09-19
----------

A potential problem with the SimDB data model is that, due to its complexity,
the mapping from a given simulation (code) onto the model may not be obvious.

In particular, the "most correct", or formal, mapping may not match
the way one generally thinks about the problem.
Hence third parties may find it hard to locate a given experiment/protocol
when they search according to the expected/natural patterns, which
may not be the ones used when the mapping was performed.

Case in point: the PDR code from LePetit etc.

The "obvious" mapping might be to define a representation object type
for each chemical element that is simulated.
But this is not the way the program works.
There, a "gas element" or "grain particle" is considered, which has
the chemical element it represents as one of its properties.
To find out that a simulation contains Ne IV (say), one needs to go to the data,
or at least to the characterisation (which for discrete values is not obvious).

The reason for this is that the protocol does NOT define a fixed set of possible 
representation objects that contains He, HI, Ne IV etc.
To make it do so requires one to change the protocol for each run.
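
A minimal Java sketch of the situation described above. The class and field
names (GasElement, species, abundance) are hypothetical and not taken from the
SimDB model or from the PDR code. The point is only that the protocol knows a
single generic type, so the species shows up as a property value on the
instances and can only be found by looking at the data:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a representation object as the code sees it:
// one generic type ("gas element"), with the chemical species stored as a property value.
final class GasElement {
    final String species;    // e.g. "He", "H I", "Ne IV"
    final double abundance;  // illustrative property only

    GasElement(String species, double abundance) {
        this.species = species;
        this.abundance = abundance;
    }
}

final class SpeciesLookup {
    // The protocol only declares the type "gas element"; whether a run contains a
    // given species can only be answered by inspecting the instances (the data).
    static boolean containsSpecies(List<GasElement> data, String species) {
        for (GasElement element : data) {
            if (element.species.equals(species)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<GasElement> run = Arrays.asList(
            new GasElement("He", 0.1),
            new GasElement("Ne IV", 1.0e-5));
        System.out.println(containsSpecies(run, "Ne IV"));
    }
}
```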

What we may need is a set of models for physical objects/substances/systems
that can be reused in these cases.

Alternatively, the model could be less normalised, with less emphasis on
fixed, predefined, static protocols: only experiments that explicitly
define their representation objects.

Or we put all of this in TargetObjectType.
The problem remains that there are no semantic vocabularies containing all possible
chemical elements (let alone the individual excitation levels that
the code also handles). For those we really need a model.

What this implies is that, for example, the SimDB expression of a PDR code+experiment
is less expressive than it could be, mainly because we do not have a shared
vocabulary to describe the individual terms.
Such a vocabulary implicitly DOES exist in the "PDR community", hence they could
in principle define more expressive models themselves, which would then not fit
exactly onto SimDB.
Can't help that (compare SIAP metadata << most FITS metadata for the real images).

Q: can/should we use the TargetObjectType for this?
Should that, and its properties, be one of the main entry points for queries,
possibly on properties of these, by name (fuzzy, Google-like)?
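
One possible reading of "fuzzy, Google-like" is a case-insensitive keyword match
over the type name and its property names. The sketch below assumes exactly that
and nothing more; the class name, the map layout and the matching rule are
illustrative assumptions, not part of SimDB:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TargetObjectTypeSearch {
    // Input: target object type name -> names of its properties.
    // Returns the names of all types for which every whitespace-separated search term
    // occurs (case-insensitively) in the type name or in one of its property names.
    // An empty query matches every type.
    static List<String> search(Map<String, List<String>> propertiesByType, String query) {
        String[] terms = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> hits = new ArrayList<String>();
        for (Map.Entry<String, List<String>> entry : propertiesByType.entrySet()) {
            // Build one searchable text per type: its name plus all property names.
            StringBuilder haystack = new StringBuilder(entry.getKey());
            for (String property : entry.getValue()) {
                haystack.append(' ').append(property);
            }
            String text = haystack.toString().toLowerCase(Locale.ROOT);
            boolean allTermsFound = true;
            for (String term : terms) {
                if (!text.contains(term)) {
                    allTermsFound = false;
                    break;
                }
            }
            if (allTermsFound) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

In a real deployment this matching would more likely be pushed into the database;
the in-memory version only illustrates the kind of entry point meant here.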


Views
-----
For usability we may want to write some views on the model that simplify querying.
For example, properties with their typical value obtained from characterisation:

create view TOTProps as 
select pt.name as protocolName
,      ex.id as experimentId
,      tot.name as targetObjectTypeName
,      tot.label as totLabel
,      p.name as property
,      p.ucd as propertyUCD
,      ch.value as characteristicValue
,      ch.type as charType
  from protocol pt
  ,    experiment ex
  ,    targetobjecttype tot
  ,    objectcollection oc
  ,    characterisation ch
  ,    property p
 where oc.objectTypeId = tot.id
   and ch.containerId = oc.id
   and p.id = ch.propertyId
   and ex.id = tot.containerId
   and pt.id = ex.protocolId
order by pt.name, ex.id, tot.name, p.name;


For such custom views we can create custom JSP pages that allow
restrictions on the result columns in a parametrised fashion.
We could then also, for example, turn an ID into a URL to a page that opens the resource.
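
A small sketch of the ID-to-URL idea. The action name "Show.do" and the
parameter names "entity" and "id" are hypothetical; only the ".do" style is
borrowed from the Query.do mentioned in these notes:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ResourceLinks {
    // Builds a link to the page that opens a resource, given the webapp context path,
    // the root entity type and the resource ID from a query result row.
    // "Show.do", "entity" and "id" are assumed names, not existing SimDB endpoints.
    static String resourceUrl(String contextPath, String entityType, long id) {
        try {
            return contextPath + "/Show.do?entity="
                + URLEncoder.encode(entityType, "UTF-8")
                + "&id=" + id;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

A JSP page rendering a custom view could call such a helper for each row and
wrap the ID column in the resulting link.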

Question:
How about creating standardised, simplifying JPA queries as well
and base parametrised query pages on them?




ADQL/TAP page
-------------
A simple page with a textarea to type in the SQL, possibly some demo queries.
Left menu bar with the schema, linking to a page for each table (obtained from generated TAP
metamodel).

It might be nice to have a simple tag that has a dropdown with the root entity 
types, and a text box for the ID, and a button which, when clicked, opens a page for the 
resource. Especially from custom query pages this may be useful, so users can simply
cut-and-paste an ID and go to the resource.

TODO
- create a page with the text area etc (Millennium-like)
- code Query.do
- create a page documenting a table/view (
- load the table/TAP metadata at same time as JAXB metamodel
OR
  use XSLT working on the TAP document to display a table in HTML
OR
  generate the static HTML pages for each table (add to build.xml)
- create a tag as described above and add it to the simple query page.
- find a way to integrate the custom views and related pages described above into this standard
  query page (show custom views in left menu bar (document them!), link to
  parameterised query pages).
  
  
JQL page?
-----------
Can we expand Laurent's JQL component?
Show on each page with a standard view what the JPA query was that produced it
and possibly allow modifications.


Release build target?
---------------------
Would be good to have a "release" target in the build scripts that gathers all
relevant code, generated and source, in a single release directory, which can be
synched with volute. Would contain:
- all generated sources
- all required inputs (DM, ..?)
- all src code, including webapp



Validation and Upload
---------------------
Validation of XML docs:
- schema based (use JAXB validator, gives errors in standard form)
- based on our rules, which are:
 . references must be valid (resolver should find the reference,
 and it should be of the correct type. Does JPA take care of this?
 Btw, resolver will have to check whether ivoID has correct format.)
 . uniqueness constraints must be obeyed.
  (violations are only discovered (supposing we generate uniqueness constraints)
  when a JPA flush fails; what error messages?)
OR 
  could the validation only (i.e. no flush) include uniqueness checking
  by attempting a flush, but always with a rollback at the end?
 . other rules: maxLength etc.
  Q: can they be generated into a pattern?
NB: we need a constraints language
- publisherDID must have valid format and must be unique.
 note, empty string (i.e. publisherDID="") is invalid, NOT equivalent with NULL.
- 
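
The publisherDID rules above could be checked along these lines. This is a
sketch under stated assumptions: the regular expression is a deliberately
simplified approximation of an IVO identifier ("ivo://" + authority + optional
path, query and fragment), not the full IVOA grammar; uniqueness is checked
against an in-memory set rather than the database; and NULL is treated as
"not supplied" and skipped, which the notes do not actually decide.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class PublisherDidValidator {
    // Simplified approximation of an IVO identifier; NOT the complete IVOA grammar.
    private static final Pattern SIMPLE_IVO_ID = Pattern.compile(
        "ivo://[A-Za-z0-9][A-Za-z0-9._~-]{2,}(/[^\\s?#]*)?(\\?[^\\s#]*)?(#\\S*)?");

    // Stand-in for a uniqueness lookup in the database.
    private final Set<String> seen = new HashSet<String>();

    // Returns null when the value is acceptable, otherwise an error message.
    String validate(String publisherDid) {
        if (publisherDid == null) {
            return null; // assumption: NULL means "not supplied" and is not an error
        }
        if (publisherDid.isEmpty()) {
            return "publisherDID must not be the empty string";
        }
        if (!SIMPLE_IVO_ID.matcher(publisherDid).matches()) {
            return "publisherDID is not a valid IVO identifier: " + publisherDid;
        }
        if (!seen.add(publisherDid)) {
            return "publisherDID is not unique: " + publisherDid;
        }
        return null;
    }
}
```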



publisherDID
------------
- Is going to be added to base MetadataObject
- Form should be compatible with other similar IDs, e.g. in SSA/SpectrumDM.
  Which means it must be an IVO Identifier.
- Should its authority ID be assigned by a Registry first?
Do we want to link this to a user? Do we need to check this with a registry?
- MUST BE DISCUSSED (with DAL, Registry)


User Management
---------------
- add user column on root entity classes
- only owner (user) can update a resource, or fragment of a resource
- how to create Party? 
  Should we denormalise Party? Make Contact inherit from Party?  
  Reuse of party feasible?
    . a User will have a Party created for him
    . should userId be an (implicit) column on Resource (each root entity table)
    . but contacts are different from user.
- we need to be able to identify which user is the owner of a resource
- through a userId which lives in context of session, derived from tomcat 
  login, NOT in same (i.e. SimDB) database, in UserDB
- when creating a User, we'll create a Party. Where? In SimDB or in UserDB, or both?
- the Party-s corresponding to Contact-s have to be dealt with.

PROPOSAL: remove Party class, put name on Contact, make mainContact an attribute on
resource, of type string, representing the name of the person.
More similar to Registry's Curation (which uses ResourceName)
We only need email on User.

TODO:
- user registration page
- generate userId on root entity base class
- must find a proper time to set the userId for the first persist.
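
A plain-Java sketch of the ownership rule discussed in this section: the userId
is set exactly once, before the first persist, and only that user may update the
resource afterwards. The class and method names are hypothetical. In JPA the
assignment could perhaps be hooked into a @PrePersist callback on the root
entity base class, but that wiring (and how the session's userId reaches it) is
not shown and remains an open question here.

```java
final class OwnedResource {
    private String ownerId; // null until the resource is first persisted
    private String name;

    OwnedResource(String name) {
        this.name = name;
    }

    // To be called once, just before the first persist, with the userId taken
    // from the login session.
    void assignOwner(String sessionUserId) {
        if (sessionUserId == null || sessionUserId.isEmpty()) {
            throw new IllegalArgumentException("no user in session");
        }
        if (ownerId != null) {
            throw new IllegalStateException("owner already assigned");
        }
        ownerId = sessionUserId;
    }

    // Only the owner may update the resource.
    boolean canUpdate(String sessionUserId) {
        return ownerId != null && ownerId.equals(sessionUserId);
    }

    // Example of an update guarded by the ownership check.
    void rename(String sessionUserId, String newName) {
        if (!canUpdate(sessionUserId)) {
            throw new SecurityException("only the owner may update this resource");
        }
        name = newName;
    }

    String getName() {
        return name;
    }
}
```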