% /trunk/projects/dm/provenance/description/provaccess.tex
% Revision 4205, Wed Aug 2 23:04:40 2017 UTC, by kriebe
% Log message: Add updates on ProvDAL (and TODO-boxes with questions concerning this)

\subsection{Provenance Data Model serialization}\label{sec:serialisations}
There are two possible families of ProvenanceDM metadata serializations; examples of both can be found in the implementation section (Section~\ref{sec:usecases-implementations}) and the links therein.
\begin{itemize}
\item W3C serializations: PROV-N, PROV-JSON, PROV-XML. These are serializations of the W3C provenance data model. They allow additional IVOA or ad hoc attributes to be added to the basic ones in each class, so that the IVOA models can produce W3C-compliant serializations.
% \item Mapping of ProvenanceDM classes onto tables with appropriate relationships. This can allow management by a TAP service (the model mapping is then described with the TAP schema). The serialization will result in a single table according to the query.

%\TODO{TAP SCHEMA of the ProvenanceDM datamodel: Maybe Mathieu can provide us with a copy of the TAP schema he designed ?}

\item Direct VOTable mapping, using an ad hoc mapping based on a transcription of the PROV-N format: this is called PROV-VOTABLE. In the future we could also define a VO-DML \citep{std:VODML} version of the mapping.
%The following is an example of provenance metadata in this PROV-VOTABLE format. Objects become tables, their classes are rendered by a utype. Attributes and relationships become FIELDS or PARAMS. The model attribute names also become VOTABLE utypes.

\end{itemize}

These serializations can be produced with the voprov\footnote{\url{https://github.com/sanguillon/voprov}} Python module; see also Section~\ref{sec:implementation_voprov}.
Here is an example serialization for the process of creating a composite image from three images, in PROV-N format:

\begin{verbatim}

document
  prefix ivo <http://www.ivoa.net/documents/rer/ivo/>
  prefix hips <http://cds.u-strasbg.fr/data/>
  prefix voprov <http://www.ivoa.net/documents/dm/provdm/voprov/>

  entity(ivo://CDS/P/DSS2color#RGB_NGC6946, [voprov:annotation="This is a PNG RGB image built from DSS2 with Aladin for galaxy NGC 6946", voprov:doculink="http://cds.u-strasbg.fr/aladin.gml", voprov:name="RGB DSS2 image for NGC 6946"])
  entity(ivo://CDS/P/DSS2/POSSII#POSSII.J-DSS2.143, [voprov:annotation="This is the DSS2 digitization of the Blue POSSII Schmidt survey around NGC 6946", voprov:doculink="http://cds.u-strasbg.fr/aladin.gm", voprov:name="POSSII Blue Survey DSS2 NGC6946"])
  entity(ivo://CDS/P/DSS2/POSSII#POSSII.F-DSS2.143, [voprov:annotation="This is the DSS2 digitization of the Red POSSII Schmidt survey around NGC 6946", voprov:doculink="http://cds.u-strasbg.fr/aladin.gml", voprov:name="POSSII Red Survey DSS2 NGC6946"])
  entity(ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143, [voprov:annotation="This is the DSS2 digitization of the Infra red POSSII Schmidt survey around NGC 6946", voprov:doculink="http://cds.u-strasbg.fr/aladin.gm", voprov:name="POSSII Infra Red Survey DSS2 NGC6946"])
  activity(hips:AlaRGB1, 2017-04-18T17:28:00, 2017-04-19T17:29:00, [voprov:desc_id="AlaRGB", voprov:desc_type="RGBencoding", voprov:annotation="Aladin RGB image generation for NGC 6946", voprov:desc_name="Aladin RGB image generation algorithm", voprov:name="Aladin RGB 1", voprov:desc_doculink="http://cds.u-strasbg.fr/aladin.gml"])
  used(hips:AlaRGB1, ivo://CDS/P/DSS2/POSSII#POSSII.J-DSS2.143, -)
  used(hips:AlaRGB1, ivo://CDS/P/DSS2/POSSII#POSSII.F-DSS2.143, -)
  used(hips:AlaRGB1, ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143, -)
  wasGeneratedBy(ivo://CDS/P/DSS2color#RGB_NGC6946, hips:AlaRGB1, 2017-05-05T00:00:00)
endDocument

\end{verbatim}

This is the corresponding PROV-JSON serialization:

\begin{verbatim}
{
  "prefix": {
    "ivo": "http://www.ivoa.net/documents/rer/ivo/",
    "voprov": "http://www.ivoa.net/documents/dm/provdm/voprov/",
    "hips": "http://cds.u-strasbg.fr/data/"
  },
  "activity": {
    "hips:AlaRGB1": {
      "voprov:desc_doculink": "http://cds.u-strasbg.fr/aladin.gml",
      "voprov:desc_id": "AlaRGB",
      "prov:startTime": "2017-04-18T17:28:00",
      "voprov:annotation": "Aladin RGB image generation for NGC 6946",
      "voprov:desc_type": "RGBencoding",
      "voprov:desc_name": "Aladin RGB image generation algorithm",
      "prov:endTime": "2017-04-19T17:29:00",
      "voprov:name": "Aladin RGB 1"
    }
  },
  "wasGeneratedBy": {
    "_:id4": {
      "prov:time": "2017-05-05T00:00:00",
      "prov:entity": "ivo://CDS/P/DSS2color#RGB_NGC6946",
      "prov:activity": "hips:AlaRGB1"
    }
  },
  "used": {
    "_:id1": {
      "prov:entity": "ivo://CDS/P/DSS2/POSSII#POSSII.J-DSS2.143",
      "prov:activity": "hips:AlaRGB1"
    },
    "_:id3": {
      "prov:entity": "ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143",
      "prov:activity": "hips:AlaRGB1"
    },
    "_:id2": {
      "prov:entity": "ivo://CDS/P/DSS2/POSSII#POSSII.F-DSS2.143",
      "prov:activity": "hips:AlaRGB1"
    }
  },
  "entity": {
    "ivo://CDS/P/DSS2/POSSII#POSSII.J-DSS2.143": {
      "voprov:name": "POSSII Blue Survey DSS2 NGC6946",
      "voprov:annotation": "This is the DSS2 digitization of the Blue POSSII Schmidt survey around NGC 6946",
      "voprov:doculink": "http://cds.u-strasbg.fr/aladin.gm"
    },
    "ivo://CDS/P/DSS2/POSSII#POSSII.F-DSS2.143": {
      "voprov:name": "POSSII Red Survey DSS2 NGC6946",
      "voprov:annotation": "This is the DSS2 digitization of the Red POSSII Schmidt survey around NGC 6946",
      "voprov:doculink": "http://cds.u-strasbg.fr/aladin.gml"
    },
    "ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143": {
      "voprov:name": "POSSII Infra Red Survey DSS2 NGC6946",
      "voprov:annotation": "This is the DSS2 digitization of the Infra red POSSII Schmidt survey around NGC 6946",
      "voprov:doculink": "http://cds.u-strasbg.fr/aladin.gm"
    },
    "ivo://CDS/P/DSS2color#RGB_NGC6946": {
      "voprov:name": "RGB DSS2 image for NGC 6946",
      "voprov:annotation": "This is a PNG RGB image built from DSS2 with Aladin for galaxy NGC 6946",
      "voprov:doculink": "http://cds.u-strasbg.fr/aladin.gml"
    }
  }
}
\end{verbatim}
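A client can read such a PROV-JSON document with a standard JSON parser. The following Python sketch uses only the standard library and an abridged copy of the example above; the helper name \texttt{inputs\_of} is an illustrative assumption and is not part of voprov or of any standard.

```python
import json

# Abridged copy of the PROV-JSON example (attributes and entities omitted).
PROV_JSON = """
{
  "activity": {"hips:AlaRGB1": {"voprov:name": "Aladin RGB 1"}},
  "used": {
    "_:id1": {"prov:entity": "ivo://CDS/P/DSS2/POSSII#POSSII.J-DSS2.143",
              "prov:activity": "hips:AlaRGB1"},
    "_:id3": {"prov:entity": "ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143",
              "prov:activity": "hips:AlaRGB1"},
    "_:id2": {"prov:entity": "ivo://CDS/P/DSS2/POSSII#POSSII.F-DSS2.143",
              "prov:activity": "hips:AlaRGB1"}
  },
  "wasGeneratedBy": {
    "_:id4": {"prov:time": "2017-05-05T00:00:00",
              "prov:entity": "ivo://CDS/P/DSS2color#RGB_NGC6946",
              "prov:activity": "hips:AlaRGB1"}
  }
}
"""


def inputs_of(doc, entity_id):
    """Return the sorted IDs of the entities used by the activities
    that generated ``entity_id`` (hypothetical helper)."""
    generators = {
        relation["prov:activity"]
        for relation in doc.get("wasGeneratedBy", {}).values()
        if relation["prov:entity"] == entity_id
    }
    return sorted(
        relation["prov:entity"]
        for relation in doc.get("used", {}).values()
        if relation["prov:activity"] in generators
    )


doc = json.loads(PROV_JSON)
progenitors = inputs_of(doc, "ivo://CDS/P/DSS2color#RGB_NGC6946")
for entity_id in progenitors:
    print(entity_id)
```

The sketch follows one generation step and one usage step, i.e. it recovers the three survey images from which the RGB image was built.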

This is the corresponding PROV-VOTABLE serialization:

\begin{verbatim}

<?xml version="1.0" encoding="UTF-8"?>
<VOTABLE version="1.2" xmlns="http://www.ivoa.net/xml/VOTable/v1.2" xmlns:hips="http://cds.u-strasbg.fr/data/" xmlns:ivo="http://www.ivoa.net/documents/rer/ivo/" xmlns:voprov="http://www.ivoa.net/documents/dm/provdm/voprov/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.2 http://www.ivoa.net/xml/VOTable/VOTable-1.2.xsd">
  <RESOURCE type="provenance">
    <DESCRIPTION>Provenance VOTable</DESCRIPTION>
    <TABLE name="Usage" utype="voprov:used">
      <FIELD arraysize="*" datatype="char" name="activity" ucd="meta.id" utype="voprov:Usage.activity"/>
      <FIELD arraysize="*" datatype="char" name="entity" ucd="meta.id" utype="voprov:Usage.entity"/>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>hips:AlaRGB1</TD>
            <TD>ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
    <TABLE name="Generation" utype="voprov:wasGeneratedBy">
      <FIELD arraysize="*" datatype="char" name="entity" ucd="meta.id" utype="voprov:Generation.entity"/>
      <FIELD arraysize="*" datatype="char" name="activity" ucd="meta.id" utype="voprov:Generation.activity"/>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>ivo://CDS/P/DSS2color#RGB_NGC6946</TD>
            <TD>hips:AlaRGB1</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
    <TABLE name="Activity" utype="voprov:Activity">
      <FIELD arraysize="*" datatype="char" name="id" ucd="meta.id" utype="voprov:Activity.id"/>
      <FIELD arraysize="*" datatype="char" name="name" ucd="meta.title" utype="voprov:Activity.name"/>
      <FIELD arraysize="*" datatype="char" name="start" ucd="" utype="voprov:Activity.startTime"/>
      <FIELD arraysize="*" datatype="char" name="stop" ucd="" utype="voprov:Activity.endTime"/>
      <FIELD arraysize="*" datatype="char" name="annotation" ucd="meta.description" utype="voprov:Activity.annotation"/>
      <FIELD arraysize="*" datatype="char" name="desc_id" ucd="" utype="voprov:ActivityDescription.id"/>
      <FIELD arraysize="*" datatype="char" name="desc_name" ucd="" utype="voprov:ActivityDescription.name"/>
      <FIELD arraysize="*" datatype="char" name="desc_type" ucd="meta.code.class" utype="voprov:ActivityDescription.type"/>
      <FIELD arraysize="*" datatype="char" name="desc_doculink" ucd="meta.ref.url" utype="voprov:ActivityDescription.doculink"/>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>hips:AlaRGB1</TD>
            <TD>Aladin RGB 1</TD>
            <TD>2017-04-18 17:28:00</TD>
            <TD>2017-04-19 17:29:00</TD>
            <TD>Aladin RGB image generation for NGC 6946</TD>
            <TD>AlaRGB</TD>
            <TD>Aladin RGB image generation algorithm</TD>
            <TD>RGBencoding</TD>
            <TD>http://cds.u-strasbg.fr/aladin.gml</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
    <TABLE name="Entity" utype="voprov:Entity">
      <FIELD arraysize="*" datatype="char" name="id" ucd="meta.id" utype="voprov:Entity.id"/>
      <FIELD arraysize="*" datatype="char" name="name" ucd="meta.title" utype="voprov:Entity.name"/>
      <FIELD arraysize="*" datatype="char" name="annotation" ucd="meta.description" utype="voprov:Entity.annotation"/>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>ivo://CDS/P/DSS2/POSSII#POSSII.J-DSS2.143</TD>
            <TD>POSSII Blue Survey DSS2 NGC6946</TD>
            <TD>This is the DSS2 digitization of the Blue POSSII Schmidt survey around NGC 6946</TD>
          </TR>
          <TR>
            <TD>ivo://CDS/P/DSS2/POSSII#POSSII.F-DSS2.143</TD>
            <TD>POSSII Red Survey DSS2 NGC6946</TD>
            <TD>This is the DSS2 digitization of the Red POSSII Schmidt survey around NGC 6946</TD>
          </TR>
          <TR>
            <TD>ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143</TD>
            <TD>POSSII Infra Red Survey DSS2 NGC6946</TD>
            <TD>This is the DSS2 digitization of the Infra red POSSII Schmidt survey around NGC 6946</TD>
          </TR>
          <TR>
            <TD>ivo://CDS/P/DSS2color#RGB_NGC6946</TD>
            <TD>RGB DSS2 image for NGC 6946</TD>
            <TD>This is a PNG RGB image built from DSS2 with Aladin for galaxy NGC 6946</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
    <INFO name="QUERY_STATUS" value="OK"/>
  </RESOURCE>
</VOTABLE>

\end{verbatim}
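Since PROV-VOTABLE is plain XML, it can be read without a dedicated VOTable library. The Python sketch below uses the standard library module \texttt{xml.etree.ElementTree} on an abridged copy of the example (XML declaration, utypes and most tables omitted); the function name \texttt{read\_tables} is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Namespace of the VOTable 1.2 elements, as declared in the example.
NS = {"vot": "http://www.ivoa.net/xml/VOTable/v1.2"}

# Abridged copy of the PROV-VOTABLE example.
VOTABLE = """
<VOTABLE version="1.2" xmlns="http://www.ivoa.net/xml/VOTable/v1.2">
  <RESOURCE type="provenance">
    <TABLE name="Usage">
      <FIELD arraysize="*" datatype="char" name="activity"/>
      <FIELD arraysize="*" datatype="char" name="entity"/>
      <DATA><TABLEDATA>
        <TR>
          <TD>hips:AlaRGB1</TD>
          <TD>ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143</TD>
        </TR>
      </TABLEDATA></DATA>
    </TABLE>
    <TABLE name="Generation">
      <FIELD arraysize="*" datatype="char" name="entity"/>
      <FIELD arraysize="*" datatype="char" name="activity"/>
      <DATA><TABLEDATA>
        <TR>
          <TD>ivo://CDS/P/DSS2color#RGB_NGC6946</TD>
          <TD>hips:AlaRGB1</TD>
        </TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_tables(votable_text):
    """Map each TABLE name to its rows; a row is a dict keyed by FIELD name."""
    root = ET.fromstring(votable_text)
    tables = {}
    for table in root.iterfind(".//vot:TABLE", NS):
        names = [field.get("name") for field in table.iterfind("vot:FIELD", NS)]
        rows = []
        for tr in table.iterfind("vot:DATA/vot:TABLEDATA/vot:TR", NS):
            cells = [td.text or "" for td in tr.iterfind("vot:TD", NS)]
            rows.append(dict(zip(names, cells)))
        tables[table.get("name")] = rows
    return tables


tables = read_tables(VOTABLE)
for name, rows in tables.items():
    print(name, rows)
```

Each class or relation of the model becomes one table, so a client can look up relations by table name and attributes by field name.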

Such serializations can be retrieved through access protocols (see Section~\ref{sec:access_protocols}) or integrated directly into dataset headers or ``associated metadata'' in order to provide provenance metadata for these datasets. For example, a provenance extension called ``PROVENANCE'' could be added to a FITS file; it would contain, in one of the serialization formats, the provenance information of the workflow that generated the file.

\TODO{Check that this keyword is not already taken.}

% \subsection{Graphic formats} --> moved to implementation section. But may want to
% include a more general section here, mentioning different ways to serialize


\subsection{Access protocols}
\label{sec:access_protocols}
We envision two possible access protocols:
\begin{itemize}
\item ProvDAL: retrieves provenance information for a given ID of a data entity or activity.
\item ProvTAP: allows detailed queries for provenance information and the discovery of datasets based on, e.g., a code version.
\end{itemize}

\subsubsection{ProvDAL}
ProvDAL is a service whose interface is organized around one main parameter, the ``ID'' of an entity (for example the obs\_publisher\_did of an ObsDataSet) or of an activity. The response is given in one of the following formats: PROV-N, PROV-JSON, PROV-XML, PROV-VOTABLE. Additional parameters can complement ID to refine the query. FORMAT selects the output format. BACKWARD gives the number of relations that shall be tracked in the backward direction, i.e. along the provenance history. Its value is either a non-negative integer or ALL. If this parameter is omitted, the default is ALL, which returns the complete provenance history.
The optional parameter FORWARD defines the number of relations to be tracked in the forward direction; its value is also either a non-negative integer or ALL, but the default is 0. Thus, if neither FORWARD nor BACKWARD is specified, the complete provenance history is returned.


The ID parameter may occur more than once in a request, in order to retrieve the provenance of several datasets at the same time. An example request could look like this:

\begin{verbatim}
{provdal-base-url}?ID=rave:dr4&BACKWARD=1&FORMAT=PROV-JSON
\end{verbatim}
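Such a request URL can be assembled with the Python standard library, as in the sketch below. The base URL \texttt{http://example.org/provdal} and the second identifier \texttt{rave:dr5} are placeholders, and the function \texttt{provdal\_url} is an illustrative assumption, not part of any ProvDAL client.

```python
from urllib.parse import urlencode


def provdal_url(base_url, ids, backward=None, forward=None, fmt=None):
    """Build a ProvDAL request URL; parameters left as None are omitted."""
    # ID may occur several times, so the query is a list of pairs, not a dict.
    params = [("ID", identifier) for identifier in ids]
    if backward is not None:
        params.append(("BACKWARD", str(backward)))
    if forward is not None:
        params.append(("FORWARD", str(forward)))
    if fmt is not None:
        params.append(("FORMAT", fmt))
    # safe=":" keeps the colon of qualified IDs readable in the URL.
    return base_url + "?" + urlencode(params, safe=":")


BASE_URL = "http://example.org/provdal"  # placeholder service address
single = provdal_url(BASE_URL, ["rave:dr4"], backward=1, fmt="PROV-JSON")
several = provdal_url(BASE_URL, ["rave:dr4", "rave:dr5"], backward="ALL")
print(single)
print(several)
```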

Each provenance relation has a direction. BACKWARD follows the relations in this direction, whereas FORWARD follows them in the reverse direction, independent of the relation type. This rule is easy to implement, but it has the side effect, unexpected for a user, that e.g. agent relations are retrieved only with BACKWARD and never with FORWARD. The same holds for membership relations (hadStep, hadMember): the members of a collection or activityFlow are retrieved only in the BACKWARD direction, while the collections or activityFlows that contain a given entity or activity are found only in the FORWARD direction. In order to provide a more user-friendly interface with less surprising behaviour, we define three more request parameters: EXPAND\_AGENT, EXPAND\_COLLECTION and EXPAND\_ACTIVITYFLOW. They take TRUE or FALSE as values. If one of them is set to TRUE, the relations with agents, collections or activityFlows, respectively, are always included, independent of the direction in which the provenance graph is traversed.
\TODO{Draw a provenance graph picture here with different relation types and arrows for direction.}
\TODO{Implementations need to show if this is really the best way.}
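The traversal rule can be sketched in Python as follows. This is only one possible reading of the parameters, not a reference implementation: the graph is the composite image example with shortened IDs, the agent \texttt{ex:operator} is hypothetical, only EXPAND\_AGENT is modelled, and EXPAND\_AGENT=FALSE is assumed to leave the purely direction-based behaviour unchanged (a point that is still open).

```python
from collections import deque

# Each relation is (source, type, target) and points from source to target.
RELATIONS = [
    ("RGB", "wasGeneratedBy", "AlaRGB1"),
    ("AlaRGB1", "used", "J"),
    ("AlaRGB1", "used", "F"),
    ("AlaRGB1", "used", "N"),
    ("AlaRGB1", "wasAssociatedWith", "ex:operator"),  # hypothetical agent
]
AGENT_RELATIONS = {"wasAssociatedWith", "wasAttributedTo"}


def walk(start, depth, reverse):
    """Breadth-first walk over at most ``depth`` relations (None means ALL).

    Relations are followed along their direction, or against it if
    ``reverse`` is true. Returns (relations found, nodes visited)."""
    found, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if depth is not None and distance >= depth:
            continue
        for relation in RELATIONS:
            source, _, target = relation
            here, there = (target, source) if reverse else (source, target)
            if here != node:
                continue
            found.add(relation)
            if there not in seen:
                seen.add(there)
                queue.append((there, distance + 1))
    return found, seen


def retrieve(start, backward=None, forward=0, expand_agent=True):
    """Collect relations for BACKWARD, FORWARD and (assumed) EXPAND_AGENT."""
    back_relations, back_nodes = walk(start, backward, reverse=False)
    fwd_relations, fwd_nodes = walk(start, forward, reverse=True)
    relations = back_relations | fwd_relations
    if expand_agent:
        # Add the agent relations of every node reached in either direction.
        nodes = back_nodes | fwd_nodes
        relations |= {r for r in RELATIONS
                      if r[1] in AGENT_RELATIONS and r[0] in nodes}
    return relations


forward_only = retrieve("J", backward=0, forward=None, expand_agent=False)
forward_expanded = retrieve("J", backward=0, forward=None, expand_agent=True)
print(sorted(forward_only))
print(sorted(forward_expanded))
```

Starting from the input image J and going forward only, the agent relation is missed unless the expansion is switched on, which is the side effect described above.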

A ProvDAL service MUST implement the parameters ID, BACKWARD and FORMAT; the remaining parameters are optional.
If a request contains an optional parameter that the service does not implement, the service should return an error.

Table~\ref{tab:provdal-parameters} summarizes the parameters of such a ProvDAL service interface.

\begin{table}[h]
\small
\begin{tabulary}{1.0\textwidth}{@{}p{0.17\textwidth}Lp{0.2\textwidth}p{0.10\textwidth}p{0.3\textwidth}@{}}
%{llp{0.2\textwidth}p{0.3\textwidth}}
\toprule
\head{Parameter} & \head{Requirement} & \head{Value/options} & \head{Default} & \head{Description}\\
\midrule
ID & required & qualified ID & -- & a valid qualified identifier for an entity or activity (can occur multiple times)\\
BACKWARD & required & 0,1,2,..., ALL & ALL & number of relations to be followed backwards or \texttt{ALL} for everything\\
FORWARD & optional & 0,1,2,..., ALL & 0 & number of relations to be followed forward or \texttt{ALL} for everything\\
FORMAT & required & PROV-N, PROV-JSON, PROV-XML, PROV-VOTABLE & ? & serialization format of the response\\
EXPAND\_ AGENT & optional & TRUE or FALSE & TRUE & include agent relations in any case\\
EXPAND\_ COLLECTION & optional & TRUE or FALSE & TRUE & include relations with collections in any case\\
EXPAND\_ ACTIVITYFLOW & optional & TRUE or FALSE & TRUE & include relations with activityFlows in any case\\
\bottomrule
\end{tabulary}
\caption{ProvDAL request parameters}
\label{tab:provdal-parameters}
\end{table}
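On the service side, the table translates into a small validation step. The Python sketch below is an assumption-laden illustration: the function name \texttt{parse\_request} is hypothetical, a missing FORMAT is returned as \texttt{None} because no default is defined yet, \texttt{None} stands for ALL, and unknown parameters are silently ignored.

```python
from urllib.parse import parse_qsl

FORMATS = {"PROV-N", "PROV-JSON", "PROV-XML", "PROV-VOTABLE"}
BOOLEAN_PARAMS = ("EXPAND_AGENT", "EXPAND_COLLECTION", "EXPAND_ACTIVITYFLOW")
OPTIONAL_PARAMS = frozenset({"FORWARD", *BOOLEAN_PARAMS})


def parse_depth(name, value):
    """Convert a BACKWARD/FORWARD value; None stands for ALL."""
    if value == "ALL":
        return None
    if value.isdigit():
        return int(value)
    raise ValueError(f"{name} must be a non-negative integer or ALL")


def parse_request(query, supported_optional=OPTIONAL_PARAMS):
    """Validate a query string and fill in the defaults of the table."""
    request = {"ID": [], "BACKWARD": None, "FORWARD": 0, "FORMAT": None,
               "EXPAND_AGENT": True, "EXPAND_COLLECTION": True,
               "EXPAND_ACTIVITYFLOW": True}
    for name, value in parse_qsl(query):
        if name in OPTIONAL_PARAMS and name not in supported_optional:
            raise ValueError(f"{name} is not implemented by this service")
        if name == "ID":
            request["ID"].append(value)
        elif name in ("BACKWARD", "FORWARD"):
            request[name] = parse_depth(name, value)
        elif name == "FORMAT":
            if value not in FORMATS:
                raise ValueError(f"unsupported FORMAT {value!r}")
            request[name] = value
        elif name in BOOLEAN_PARAMS:
            if value not in ("TRUE", "FALSE"):
                raise ValueError(f"{name} must be TRUE or FALSE")
            request[name] = value == "TRUE"
        # Any other parameter is ignored (an assumption of this sketch).
    if not request["ID"]:
        raise ValueError("at least one ID is required")
    return request


request = parse_request("ID=rave:dr4&BACKWARD=1&FORMAT=PROV-JSON")
print(request)
```

A service that implements only the mandatory parameters would pass an empty set as \texttt{supported\_optional}, so that a request containing, e.g., FORWARD is rejected with an error.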

\TODO{If EXPAND\_AGENT=TRUE: include all agent relations, but if EXPAND\_AGENT=FALSE, then use default behaviour? Or do not include any of the agent relations? Which one would it be?}

\clearpage
\subsubsection{ProvTAP}
ProvTAP is a TAP service implementing the ProvenanceDM data model. The data model mapping is included in the TAP schema. The mapping of ProvenanceDM classes and attributes onto tables and columns of the schema, with the appropriate relationships, datatypes, units, utypes and UCDs, is done similarly to the PROV-VOTABLE serialization. The response to a query is a single table.
This single table joins information from one or several ``provenance'' tables available in the database.
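How such a join yields a single table can be sketched with an in-memory SQLite database in Python. Everything about the schema is an assumption: the table names (\texttt{activity}, \texttt{usage}, \texttt{generation}) and column names merely imitate the PROV-VOTABLE example, and the query is SQLite SQL rather than ADQL against a real TAP schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical "provenance" tables, modelled on the PROV-VOTABLE example.
con.executescript("""
CREATE TABLE activity (id TEXT PRIMARY KEY, name TEXT, desc_id TEXT);
CREATE TABLE usage (activity TEXT, entity TEXT);
CREATE TABLE generation (entity TEXT, activity TEXT);
""")
con.execute("INSERT INTO activity VALUES (?, ?, ?)",
            ("hips:AlaRGB1", "Aladin RGB 1", "AlaRGB"))
con.executemany("INSERT INTO usage VALUES (?, ?)", [
    ("hips:AlaRGB1", "ivo://CDS/P/DSS2/POSSII#POSSII.J-DSS2.143"),
    ("hips:AlaRGB1", "ivo://CDS/P/DSS2/POSSII#POSSII.F-DSS2.143"),
    ("hips:AlaRGB1", "ivo://CDS/P/DSS2/POSSII#POSSII.N-DSS2.143"),
])
con.execute("INSERT INTO generation VALUES (?, ?)",
            ("ivo://CDS/P/DSS2color#RGB_NGC6946", "hips:AlaRGB1"))

# Discover the products of a given kind of activity together with their
# inputs; the result is one table joined from three provenance tables.
QUERY = """
SELECT g.entity AS product, a.name AS activity_name, u.entity AS input
FROM activity AS a
JOIN generation AS g ON g.activity = a.id
JOIN usage AS u ON u.activity = a.id
WHERE a.desc_id = ?
ORDER BY u.entity
"""
rows = con.execute(QUERY, ("AlaRGB",)).fetchall()
for row in rows:
    print(row)
```

The result has one row per input image, each repeating the product and the activity name, which illustrates how a single response table flattens the provenance graph.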

A special case is considered where ProvenanceDM and ObsCore are both implemented in the same TAP service and queried together. The TAP response then provides an ObsCore table with a ProvenanceDM extension. We can imagine that in the future this could be hard-coded and registered as an ObsTapProv service.


%\TODO{Do we need combined query possibilities, i.e. ask for ObsCore-fields and Provenance fields in one query? Or rather use a 2-step-process, decoupling them from each other?}


%\TODO{Also look at PROV-AQ from the W3C.}
