% updates Mireille 2017 April/May 2nd
%roles for Agents -updates + funder
%
In this section, we describe the currently discussed Provenance Data Model. We
start with a UML class diagram, explain the core elements, and then give
more details for each class and relation in the following sections.

\subsection{Overview: Conceptual UML class diagram and introduction to core classes}
%We give in this section an overview on the main classes. More details about
%each class and their relations will be explained in the following sections.
%Its core elements are colored in blue. These core elements can also be found in the W3C Provenance Data
%Model. The pattern defined by these classes is very general and can be reused everywhere where provenance is needed.

\begin{figure}[h]
\centering
\includegraphics[width=1.0\textwidth]{../datamodel-diagrams/images/domain-classdiagram.pdf}
\caption{Overview of the classes of the Provenance Data Model in a conceptual class diagram. The blue classes are core elements. Several many-to-many relationships have attached association classes (grey), which can contain additional attributes.}
%Objects in the blue box also appear in the W3C Provenance Data Model.
%Green classes are links to the IVOA Dataset Metadata Model.}
\label{fig:classdiagram-conceptional}
\end{figure}

%\label{sec:core}
% Some examples for different use cases are given in Section \ref{sec:usecases-implementations}.
% The elements of a provenance model can be expressed as a directed graph to capture the causal dependencies.

Figure~\ref{fig:classdiagram-conceptional} shows the conceptual UML diagram for the IVOA Provenance Data
Model.
The core elements of the Provenance Data Model are \class{Entity}, \class{Activity} and \class{Agent}.
For these elements we chose the same names as are used in the Provenance Data
Model of the World Wide Web Consortium (W3C, \citealt{std:W3CProvDM}), which defines
a very abstract pattern that can be reused here. The core classes are listed below with
a short description and some examples:

\begin{itemize}
\item \class{Entity:} a thing in a certain state\\
examples: data products like images, catalogs, parameter files, calibration data, instrument characteristics

\item \class{Activity:} an action/process or a series of actions that occurs over a period of time, is performed on or caused by entities, and usually results in new entities\\
examples: data acquisition like observation, simulation; regridding, fusion, calibration steps, reconstruction

\item \class{Agent:} executes/controls an activity, is responsible for an activity or an entity\\
examples: telescope astronomer, pipeline operator, principal investigator, software engineer, project helpdesk

\end{itemize}

\noindent


\begin{figure}[h]
\centering
\includegraphics[scale=0.8]{../datamodel-diagrams/images/classes-core-w3c}
\caption{The main core classes and relations of the Provenance Data Model, which also occur in the W3C model.}
\label{fig:coreclasses}
\end{figure}

These core classes, along with their relations to each other, are shown in Figure~\ref{fig:coreclasses}.
We use the following relation classes to specify the mapping between the three core
classes.
The relation names were again chosen to match the W3C model names:
\begin{itemize}
\item \class{WasGeneratedBy:} a new entity is generated by an activity\\
(entity ``image m31.fits'' wasGeneratedBy activity ``observation'')
\item \class{Used:} an entity is used by an activity\\
(activity ``calibration'' used entities ``calibration data'', ``raw images'')
\item \class{WasAssociatedWith:} agents have responsibility for an activity\\
(agent ``observer Max Smith'' wasAssociatedWith activity ``observation'')
\item \class{WasAttributedTo:} an entity can be attributed to an agent\\
(entity ``image m31.fits'' wasAttributedTo agent ``M31 observation campaign'')
\end{itemize}

Note that the relations appear as extra classes (and thus as boxes in the diagrams, instead of just annotated relations), because they can have additional attributes. When mapping the model to a relational database, these relations would appear as mapping tables.

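
As a minimal sketch, and not part of the model definition, the core classes and relations listed above could be written as plain Python data classes. The attribute \emph{role} on the relation classes and all identifiers are illustrative assumptions; the only point is that each relation is an object of its own and can therefore carry attributes, just like a row in a mapping table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str
    name: str = ""


@dataclass(frozen=True)
class Activity:
    id: str
    name: str = ""


@dataclass(frozen=True)
class Agent:
    id: str
    name: str = ""


# Each relation is a class of its own, so it can hold extra attributes.
# The "role" attribute and its values are hypothetical.
@dataclass(frozen=True)
class WasGeneratedBy:
    entity: Entity
    activity: Activity
    role: Optional[str] = None


@dataclass(frozen=True)
class Used:
    activity: Activity
    entity: Entity
    role: Optional[str] = None


@dataclass(frozen=True)
class WasAssociatedWith:
    activity: Activity
    agent: Agent
    role: Optional[str] = None


# The examples from the list above; the ids are made up.
obs = Activity("act:obs", "observation")
cal = Activity("act:cal", "calibration")
m31 = Entity("ent:m31", "image m31.fits")
raw = Entity("ent:raw", "raw images")
caldata = Entity("ent:caldata", "calibration data")
observer = Agent("ag:max", "observer Max Smith")

relations = [
    WasGeneratedBy(entity=m31, activity=obs),
    Used(activity=cal, entity=caldata, role="calibration input"),
    Used(activity=cal, entity=raw, role="science input"),
    WasAssociatedWith(activity=obs, agent=observer, role="observer"),
]

# Which entities did the calibration activity use?
inputs_of_calibration = [
    r.entity.name for r in relations if isinstance(r, Used) and r.activity == cal
]
print(inputs_of_calibration)
```

In a relational database, the list \texttt{relations} would correspond to one mapping table per relation class.
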
In the domain of astronomy, certain processes and steps are repeated again and
again with different parameters. We therefore separate the descriptions of activities
from the actual processes and introduce an additional \class{ActivityDescription} class (see Figure~\ref{fig:classdiagram-conceptional}).
Likewise, we also apply the same pattern for \class{Entity} and add an \class{EntityDescription}
class.
Defining such descriptions allows them to be reused, which is very useful
when performing a series of tasks of the same type, as is typically done in
astronomy.

A similar normalization of descriptions of the actual processes and datasets
can also be found in the IVOA Simulation Data Model \citep[SimDM, ][]{std:SimDM},
which describes simulation metadata. The SimDM classes \class{Experiment} and \class{Protocol}
correspond to the Provenance terms \class{Activity} and \class{ActivityDescription}.

%The W3C-model has the advantage of being already an approved standard, and it
%contains all the necessary main features needed for a Provenance model for
%Astronomy. However, it is very general, and by adding reusable prototypes,
%templates or descriptions for activities and entities, the model may fit better
%to the astronomy domain.

This separation into two classes may not be needed for each and every project,
and everyone is free to choose which classes make sense for their use case.
When serializing provenance, one can integrate the description side into the
other classes, thus producing a W3C-compliant provenance description. More details about
all these classes and relations are given in the following section.
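
The following sketch illustrates both ideas under stated assumptions: one description is shared by several activity instances, and for serialization its attributes are merged into the instance record. The function \texttt{flatten}, the attribute selection and the flat key names are hypothetical and do not define a serialization format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ActivityDescription:
    id: str
    name: str
    type: str


@dataclass(frozen=True)
class Activity:
    id: str
    startTime: str
    endTime: str
    description: ActivityDescription


def flatten(activity: Activity) -> dict:
    """Merge the description attributes into the activity record.

    The description's own id is dropped; attributes already present on
    the activity are not overwritten.
    """
    record = asdict(activity)          # also converts the nested description
    desc = record.pop("description")
    for key, value in desc.items():
        if key != "id":
            record.setdefault(key, value)
    return record


# One description, reused by two runs of the same kind of task.
stacking = ActivityDescription("desc:stack", "image stacking", "reduction")
run1 = Activity("act:1", "2017-05-02T20:00:00", "2017-05-02T20:10:00", stacking)
run2 = Activity("act:2", "2017-05-03T20:00:00", "2017-05-03T20:12:00", stacking)

flat = flatten(run1)
print(flat)
```

The flattened record contains only instance-level keys, so it no longer depends on a separate description class.
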

%It still remains to be seen if this separation into two classes is necessary,
%useful or just nice to have. Currently, we include the descriptions in our model,
%for normalization purposes.

%But when serialising the provenance one could
%integrate the description side into the other classes, thus producing W3C
%compliant provenance.


\subsection{Model description}

\subsubsection{Class diagram and VO-DML compatibility}
\begin{figure}[h]
\centering
\includegraphics[width=1.0\textwidth]{../datamodel-diagrams/images/classes-overview.pdf}
\caption{More detailed overview of the classes for the Provenance Data Model. Note that this UML class diagram is more compatible with VO-DML.}
\label{fig:classdiagram}
\end{figure}

Figure~\ref{fig:classdiagram} shows the full class diagram, with the association classes for the many-to-many relations modeled more directly as mapping classes. When implementing the model in a relational database, these classes can be represented as individual tables that map the relation. We model one of the associations of a many-to-many relationship as a composition (full diamond) if the mapping class belongs more strongly to one of its linked classes, e.g. the \emph{Used} relations are strongly dependent on the corresponding \emph{Activities}. The documentation of all classes and an automatically generated figure based on the XMI description underlying this UML diagram are available in the Volute repository at \url{https://volute.g-vo.org/svn/trunk/projects/dm/vo-dml/models/provenancedm/ProvenanceDM.html}.

This version of the UML diagram is fully VO-DML compliant, i.e. we used only the restricted subset of UML to model
Provenance and reused the IVOA datatypes.


\subsubsection{Entity and EntityDescription}

Entities in astronomy are usually astronomical or astrophysical datasets in the
form of images, tables, numbers, etc. But they can also be observation or
simulation log files, files containing system information, environment variables, names and versions of packages, ambient conditions or, in a wider sense, also observation proposals, scientific
articles, or manuals and other documents.

An entity is not restricted to being a file.
It can even be just a number in a table, depending on how fine-grained the
provenance description needs to be.

\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{../datamodel-diagrams/images/entity-details.pdf}
\caption{The relation between Entity, EntityDescription and Collection (see Section~\ref{sec:collection}).
Links to the Dataset class from the Dataset Metadata Model are described in Section~\ref{sec:dmlinks}.}
\label{fig:entity-details}
\end{figure}

The VO concept closest to Entity is the notion of ``Dataset'', which could mean a single
table, an image or a collection of them. The Dataset Metadata Model
\citep{std:DatasetDM} specifies an ``IVOA Dataset'' as ``a file or files which
are considered to be a single deliverable''.
Most attributes of the \class{Dataset} class can be mapped
directly to attributes of the \class{Entity} and \class{EntityDescription} classes; see the mapping in Table~\ref{tab:datasetmapping} in Section~\ref{sec:dmlinks}.


\begin{table}[h]

\small
\tymax 0.5\textwidth

\textbf{\normalsize Entity}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}lp{3.5cm}p{2cm}L@{}}
\toprule
\head{Attribute} & \head{W3C ProvDM} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & prov:id & (qualified) string & a unique id for this entity (unique in its realm)\\
name & prov:label & string & a human-readable name for the entity (to be displayed by clients)\\
type & prov:type & string & a provenance type, i.e. one of: prov:collection, prov:bundle, prov:plan, prov:entity; not needed for a simple entity\\
%description\_ref & & foreign key/url & link to \class{EntityDescription}\\
annotation & prov:description & string & text describing the entity in more detail\\
rights & -- & string & access rights for the data, values: public, restricted or internal; can be linked to Curation.Rights from ObsCore/DatasetDM\\
creationTime & -- & datetime & date and time at which the entity was created (e.g. timestamp of a file)\\
\bottomrule
\end{tabulary}
\caption{Attributes of entities. Mandatory attributes are marked in bold.
}\label{tab:entity-attributes}
\end{table}

For entities, we suggest the attributes given in
Table~\ref{tab:entity-attributes}. If an attribute also exists in the W3C
Provenance Data Model, we list its name in the second column.

%We discussed further attributes like \emph{size} and \emph{format}, but we decided to treat an
%entity of the same content but different format (and thus size) as the same entity,
%unless they do not have the same provenance (e.g. when the ``transformation'' activity
%for converting one format into another is included in the provenance description).

%\TODO{format and size may not be needed, if entities with the same content but different format and size are considered as the same entity.}

The difference between entities that are used as input data or output data
becomes clear by specifying the relations between the data and the activities producing or using these data.
More details on this follow in Section~\ref{sec:entity-activity-relations}.

\paragraph{EntityDescription.}
%The Entity class can have an EntityDescription class attached.
The types of entities, or datasets in astronomy, can be predefined using a description class \class{EntityDescription}.
This class is meant to store information about an Entity that is known before the Entity instance is created. For example, if we run an activity to create an RGB image from three grey images, we may have a mandatory format for the input and output images before the execution (JPG, PNG, FITS\dots), but we probably cannot know the final size of the image that will be created. Therefore, ``format'' would be an EntityDescription attribute, while ``size'' would be an attribute of the Entity instance.

%This class thus stores entity-related
Some of the attributes that describe the content of the data could be derived from
the Dataset Metadata Model.

The \class{EntityDescription} does NOT contain any information about the usage
of the data; it tells nothing about them being used as input or output. This is
defined only by the relations (and the relation descriptions) between activities
and entities (see Section~\ref{sec:entity-activity-relations}).

The general attributes of EntityDescription are summarized in
Table~\ref{tab:entitydescription-attributes}.


\begin{table}[h]
\small
\tymax 0.5\textwidth
\textbf{\normalsize EntityDescription}\vspace{0.25em}\\
\begin{tabulary}{\textwidth}{@{}p{2.75cm}p{0cm}p{2cm}L@{}}
\toprule
\head{Attribute} & \head{} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & & (qualified) string & a unique identifier for this description\\
name & & string & a human-readable name for the entity description\\
annotation & & string & a descriptive text for this kind of entity\\
category & & string & specifies if the entity contains information on logging, system (environment), calibration, simulation, observation, configuration, ...\\
doculink & & url & link to more documentation\\
% removed the obscore attributes, since specific for observations only, not applicable to configuration entities etc.
% dataproduct\_ type & & string & from ObsCore data model \citep{std:ObsCore}, if applicable; describes, what kind of product it is (e.g. image, table)\\
% dataproduct\_ subtype & & string & from ObsCore data model, more specific subtype\\
% level & & enum integer & the level of processing or calibration; for ObsCore's calib\_level it is an integer between 0 and 3\\
\bottomrule
\end{tabulary}
\caption{Attributes of \class{EntityDescription}. For simple use cases,
the description class may be ignored and its attributes may be used for
\class{Entity} instead.
%The utypes may vary depending on the data model, e.g. for simulation data they
%would point to utypes of SimDM.
}\label{tab:entitydescription-attributes}
\end{table}

\begin{table}[h]

\small
\tymax 0.5\textwidth

\textbf{\normalsize WasDerivedFrom}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}lp{3cm}L@{}}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
id & string & a unique id for this relation (unique in its realm)\\
\textbf{generatedEntity} & string & foreign key to the entity\\
\textbf{usedEntity} & string & foreign key to the progenitor, from which the generatedEntity was derived\\
activity & string & foreign key to the generation activity\\
generation & string & foreign key to the wasGeneratedBy relation\\
usage & string & foreign key to the used relation\\
\bottomrule
\end{tabulary}
\caption{Attributes of the WasDerivedFrom relation. This is the same as used in W3C's ProvDM. Mandatory attributes are marked in bold.
}\label{tab:wasderivedfrom-attributes}
\end{table}

\paragraph{WasDerivedFrom.}
In Figure~\ref{fig:entity-details} there is one more relation that we have not mentioned yet:
the \class{WasDerivedFrom} relation, borrowed from the W3C model, which links two entities together.
It is used to express that
one entity was derived from another, i.e. it can be used to find one (or more) progenitor(s)
of a dataset without having to look for the activities in between. It can therefore serve as
a shortcut.

The information this relation provides is somewhat redundant, since progenitors of entities
can be found through the links to activities and the corresponding descriptions.
Nevertheless, we include \class{WasDerivedFrom} for those cases where an explicit
link between an entity and its progenitor is useful (e.g. for speeding up searches for
progenitors or if the activity in between is not important).

Note that the \class{WasDerivedFrom} relation
cannot always be inferred automatically by following \class{WasGeneratedBy} and \class{Used} relations alone:
if there is more than one input and more than one output of an activity, it is not clear (without
consulting the activityDescription and the entity roles in the relation descriptions) which entity was derived from which.
Only by specifying the descriptions and roles accordingly, or by adding a \class{WasDerivedFrom} relation,
does this direct derivation become known.


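
The ambiguity described above can be made concrete with a small sketch. It assumes that \class{Used} and \class{WasGeneratedBy} are stored as simple (activity, entity) pairs; the identifiers and the helper function are made up for illustration.

```python
from itertools import product

# Hypothetical records: one activity with two inputs and two outputs.
used = [("a1", "e_in1"), ("a1", "e_in2")]            # (activity, entity)
generated = [("a1", "e_out1"), ("a1", "e_out2")]     # (activity, entity)


def candidate_derivations(used, generated):
    """All (generatedEntity, usedEntity) pairs that share an activity.

    These are only candidates: without roles or descriptions, nothing
    tells us which of them are real derivations.
    """
    return sorted(
        (out, inp)
        for (act_u, inp), (act_g, out) in product(used, generated)
        if act_u == act_g
    )


candidates = candidate_derivations(used, generated)
print(candidates)

# Explicit WasDerivedFrom relations select the pairs that really hold.
was_derived_from = [("e_out1", "e_in1"), ("e_out2", "e_in2")]
```

With two inputs and two outputs there are four candidate pairs, but in this made-up case only two of them are actual derivations. That information has to come from the explicit relation or from the roles.
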
\subsubsection{Collection}\label{sec:collection}
Collections are entities that are grouped together and can be treated as one single entity.
From the provenance point of view, they have to have the \emph{same origin}, i.e., they were
produced by the same activity (which could also be the activity of collecting
data for a publication or similar). The term ``collection'' is
also used in the Dataset Metadata Model for grouping datasets.
% (but with a slightly different meaning).
As an example, a collection
with the name ``RAVE survey'' could consist of a number of database tables and spectra files.

%\TODO{Do we allow empty collections? Or should collections always contain at least 1 member? (otherwise they are just prov:entities?)}

The Entity-Collection relation can be modeled using the \emph{Composite} design pattern:
Collection is a subclass of Entity, but also an aggregation of one to many entities,
which could be collections themselves.
In order to be compliant with VO-DML, we model the membership relation explicitly
by including a \class{HadMember} class in our model, which is connected to the
\emph{Collection} class via a composition. It may contain an additional role attribute.

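
A minimal sketch of this Composite pattern is given below. The method names \texttt{add} and \texttt{leaves}, the ids and the member names are assumptions made for this illustration. The ``at least one member'' multiplicity is not enforced here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Entity:
    id: str
    name: str = ""


@dataclass
class HadMember:
    """Explicit membership relation; may carry a role."""
    entity: Entity
    role: Optional[str] = None


@dataclass
class Collection(Entity):
    """A Collection is an Entity and also aggregates entities."""
    members: List[HadMember] = field(default_factory=list)

    def add(self, entity: Entity, role: Optional[str] = None) -> None:
        self.members.append(HadMember(entity, role))

    def leaves(self) -> Iterator[Entity]:
        """Yield all non-collection entities, descending into sub-collections."""
        for member in self.members:
            if isinstance(member.entity, Collection):
                yield from member.entity.leaves()
            else:
                yield member.entity


# Hypothetical content for the "RAVE survey" example.
spectra = Collection("c2", "spectra files")
spectra.add(Entity("e1", "spectrum 1"))
spectra.add(Entity("e2", "spectrum 2"))

survey = Collection("c1", "RAVE survey")
survey.add(Entity("e3", "catalog table"), role="catalog")
survey.add(spectra)

names = [e.name for e in survey.leaves()]
print(names)
```

Because \texttt{Collection} inherits from \texttt{Entity}, a collection can be used wherever an entity is expected, for example in a \class{Used} or \class{WasGeneratedBy} relation.
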
Collections are also known in the W3C model, in the same sense as used here.
The relation between entity and collection is also called ``HadMember'' in the W3C model.

An additional class \class{CollectionDescription} is only
needed if it has different attributes than
the \class{EntityDescription}. This class should therefore only be introduced if a use case requires it.

\paragraph{Advantages of collections:} Collections can be used to group entities with the same provenance information,
in order to hide complexity where necessary. They can be used for defining
different levels of detail (granularity).

%\TODO{Find a really strong use case for Collections to convince everyone that they are useful/needed.}

\subsubsection{Activity and ActivityDescription}

\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{../datamodel-diagrams/images/activity-details.pdf}
\caption{Details for Activity, ActivityDescription and ActivityFlow (see Section~\ref{sec:activityflow}).
}
\label{fig:activity-details}
\end{figure}

\begin{table}[h]

\small
\tymax 0.5\textwidth

\textbf{\normalsize Activity}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}lp{2.5cm}p{2cm}L@{}}
\toprule
\head{Attribute} & \head{W3C ProvDM} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & prov:id & (qualified) string & a unique id for this activity (unique in its realm)\\
name & prov:label & string & a human-readable name (to be displayed by clients)\\
\textbf{startTime} & prov:startTime & datetime & start of an activity\\
\textbf{endTime} & prov:endTime & datetime & end of an activity\\
annotation & prov:description & string & additional explanations for the specific activity instance\\
%description\_ref & & foreign key/url & link to \class{ActivityDescription}\\
\bottomrule
\end{tabulary}
\caption{Attributes of \class{Activity}, their data types and equivalents in the W3C Provenance
Data Model, if existing. Attributes in bold are \textbf{mandatory}.}
\end{table}


\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize ActivityDescription}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}p{0cm}p{2.5cm}lL@{}}
\toprule
\head{Attribute} & \head{} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & & string & a unique id for this activity description (unique in its realm)\\
name & & string & a human-readable name (to be displayed by clients)\\
type & & string & type of the activity, from a vocabulary or list, e.g. data acquisition (observation or simulation), reduction, calibration, publication\\
subtype & & string & more specific subtype of the activity\\
annotation & & string & additional free text description for the activity\\
%code & & string & the code used for this process\\
%version & & string & a version number for the code\\
doculink & & url & link to further documentation on this process, e.g. a
paper, the source code in a version control system etc.\\
\bottomrule
\end{tabulary}
\caption{Attributes of \class{ActivityDescription}.}
\end{table}

Activities in astronomy include all steps from obtaining data to the reduction of
images and production of new datasets, like image calibration, bias subtraction, image stacking;
light curve generation from a number of observations, radial velocity
determination from spectra, post-processing steps of simulations etc.

\paragraph{ActivityDescription.}
The method underlying an activity can be specified by a corresponding
\class{ActivityDescription} class (previously named \class{Method}, corresponds
to the \class{Protocol} class in SimDM). This could be,
for instance, the name of the code used to perform an activity or a more general
description of the underlying algorithm or process. An activity is then a
concrete case (instance) of using such a method, with a startTime and endTime,
and it refers to a corresponding description for further information.

There MUST be at most one \class{ActivityDescription} (i.e. zero or one) per \class{Activity}. If steps from a
pipeline are to be grouped together, one needs to create a proper
\class{ActivityDescription} that describes all the steps at once. This method can then
be referred to by the pipeline activity.

When serializing the data model, the attributes
of the description class may be assigned to the activity in order to produce
a W3C-compliant serialization (same as with Entity/EntityDescription).

\paragraph{WasInformedBy.}
The individual steps of a pipeline can be chained
together directly, without mentioning the intermediate datasets, using the \class{WasInformedBy} relation.
This relation can be used as a shortcut if the exchanged datasets are not deemed important
enough to be recorded. For grouping activities, see also
Section~\ref{sec:activityflow}.
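
As a hedged illustration of this shortcut, the sketch below collapses recorded \class{Used} and \class{WasGeneratedBy} links into activity-to-activity links. It assumes that each entity has a single generating activity (ActivityFlows are ignored) and that the relations are stored as simple pairs; all names are made up.

```python
# Hypothetical records of a two-step pipeline.
used = [
    ("calibration", "raw image"),          # (activity, entity)
    ("stacking", "calibrated image"),
]
was_generated_by = [
    ("calibrated image", "calibration"),   # (entity, activity)
    ("stacked image", "stacking"),
]


def informed_by(used, was_generated_by):
    """Return (informed activity, informant activity) pairs.

    An activity is taken to be informed by another one if it used an
    entity that the other activity generated.
    """
    generator = dict(was_generated_by)     # entity -> generating activity
    return sorted(
        {
            (activity, generator[entity])
            for activity, entity in used
            if entity in generator and generator[entity] != activity
        }
    )


links = informed_by(used, was_generated_by)
print(links)
```

The intermediate entity ``calibrated image'' no longer appears in the result; only the dependency between the two activities remains.
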
\subsubsection{ActivityFlow}\label{sec:activityflow}
\TODO{Link to D-PROV!}
To facilitate the grouping of activities (and their related entities etc.),
we introduce the class \class{ActivityFlow}.
It can be used for hiding and grouping a part of the workflow/pipeline
or provenance
description, if different levels of granularity are needed. Such pipelines and workflows are very common in astronomical data production and processing. Figure~\ref{fig:provgraph-activityflow}
illustrates an example provenance graph at a detailed level (left side)
and using the ActivityFlow (right side).

\begin{figure}[h]
\centering
\includegraphics[width=1\textwidth]{../datamodel-diagrams/images/provgraph-activityflow}
\caption{An example provenance graph. The detailed version is shown on the left side. It also shows
the shortcut \class{WasInformedBy} to connect two activities, which could be used if the entity e2
were not needed anywhere else.
An ActivityFlow can be used to ``hide'' a part of the provenance graph, as shown on the right side.
Activities are marked by blue rectangles, entities by yellow ellipses.}
\label{fig:provgraph-activityflow}
\end{figure}

We also explored the different ways to describe a set of activities in the W3C
provenance model. This model uses \class{Bundle}, i.e. an entity with type ``Bundle'',
for wrapping a provenance description. Each part of a provenance description can be
put into a bundle, and the bundle can then be reused in other provenance descriptions.
W3C's \class{Plan} is an entity with type ``Plan'' and is used for describing a
set of actions or steps. Both \class{Bundle} and \class{Plan} are entities and
have the attributes and relations of this class (and thus one can define provenance of bundles and plans as well).

However, we would like to consider a set of activities as being an \class{Activity} itself,
with all the relations and properties that an activity has. Therefore, we do not reuse
W3C's classes for describing workflows and plans, but add
the class \class{ActivityFlow} as an activity composed of activities. The composition is represented by
the ``hadStep'' relation, as shown in Figure~\ref{fig:activity-details}.

%while still making it obvious that this
%group contains activities, we introduce the class \class{ActivityFlow}.
%This can be used for describing workflows or pipelines, or for
%
%We also allow ActivityCollections to consist of a whole provenance graph of
%activities and entities being linked together.

%We could introduce an additional abstract class, e.g. \class{AbstractActivity}, with \class{Activity} and
%\class{ActivityFlow} being subclasses to this one. But this adds another layer of complexity
%that we may not want in this data model.

%Since we introduced \class{ActivityFlow} mainly for having different view levels,
%we may want to add an attribute \emph{viewLevel} to descriptions of activityflows.
% But where to set the 0 point for viewLevel???

\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{../datamodel-diagrams/images/entity-activity-relations.pdf}
\hspace{0.15\textwidth}
\includegraphics[scale=0.6]{../datamodel-diagrams/images/entity-activity-relations-nodesc.pdf}
\caption{\class{Entity} and \class{Activity} are linked via the \class{Used} and \class{WasGeneratedBy} relations. In the left image, the \emph{role} played by an entity that was used or generated by an activity is recorded with the corresponding \emph{UsedDescription} and \emph{WasGeneratedByDescription}; see also Section~\ref{sec:entity-roles}. If these description classes are not used, the \emph{role} can be used directly as an attribute within the \emph{Used} and \emph{WasGeneratedBy} classes (right image).}
\label{fig:entity-activity-relations}
\end{figure}


\subsubsection{Entity-Activity relations}\label{sec:entity-activity-relations}

For each data flow it should be possible to clearly identify entities and
activities.
%If the activities shall not be recorded explicitely, one could also
%use the \emph{Derivation}-relation as suggested in the W3C Provenance Data Model
%to link derived entities to their originals.
Each entity is usually the result of an activity, expressed by a link from
the entity to its generating activity using the \class{WasGeneratedBy} relation,
and can be used as input for (many) other activities, expressed by the \class{Used} relation.
Thus, the information on whether data is used as input or was produced as output of
some activity is given by the \emph{relation types} between activities and entities.
%In fact,
%it would be enough to provide this information just for the relations on the description side (right).
% -- Is this true?

483 kriebe 3473 We use two relations, \class{Used} and \class{WasGeneratedBy}, instead of just one
484 kriebe 3721 mapping class with a flag for input/output, because their descriptions and role-attributes
485     can be different.
486     %in order to model the different
487     %multiplicities explicitely: an entity always has only one (or none)
488     %\class{WasGeneratedBy} relation, but may be \class{Used} many times as input for
489     %different activities.
490 kriebe 3447
The \class{WasGeneratedBy} relation can have the optional attribute \emph{time}: this is the time at which
the generation of the entity finished. This generation time corresponds, for example, to \emph{DataID.date} in
the Dataset Metadata DM.
494 kriebe 3841 %It therefore corresponds to the \emph{created}-time used in
495     %the Simulation Data Model (SimDM).
497 kriebe 4104 \paragraph{Compositions and multiplicities}
498 kriebe 4204 In principle, an entity is produced by just one activity.
499 kriebe 4104 However, by introducing the \class{ActivityFlow} class for grouping activities together,
one entity can now have many wasGeneratedBy-links to activities. One of them must
be the actual generation activity; the others can only be activityFlows
containing this generation activity. This restriction of having only one ``true'' generation activity is not explicitly expressed in the current model\footnote{The reason for this is that we want to keep the model simple and avoid introducing even more classes.}.
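
The restriction on wasGeneratedBy-links could be checked by an implementation along the following lines. This is only a minimal sketch, not part of the model: it assumes that \class{ActivityFlow} is implemented as a subclass of \class{Activity} holding a list of members, and the names \texttt{members}, \texttt{contains} and \texttt{check\_generation\_links} are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """A single processing step (hypothetical minimal form)."""
    id: str


@dataclass
class ActivityFlow(Activity):
    """A group of activities; assumed here to be a subclass of Activity."""
    members: list = field(default_factory=list)

    def contains(self, activity):
        # Search the members, descending into nested flows.
        for member in self.members:
            if member is activity:
                return True
            if isinstance(member, ActivityFlow) and member.contains(activity):
                return True
        return False


def check_generation_links(generators):
    """Return the single 'true' generating activity of one entity.

    `generators` lists all activities that the entity is linked to via
    wasGeneratedBy. Raises ValueError if the restriction is violated.
    """
    true_generators = [a for a in generators if not isinstance(a, ActivityFlow)]
    if len(true_generators) != 1:
        raise ValueError("expected exactly one non-flow generating activity")
    actual = true_generators[0]
    for a in generators:
        if isinstance(a, ActivityFlow) and not a.contains(actual):
            raise ValueError(f"flow {a.id} does not contain {actual.id}")
    return actual
```

Such a check would live in the application layer, since the model itself deliberately does not express the restriction.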
503 kriebe 4104
505     The \emph{Used} relation is closely coupled to the \emph{Activity}, so we use a composition here, indicated
506     in Figure~\ref{fig:classdiagram} by a filled diamond:
507     if an activity is deleted, then the corresponding used relations need to be removed as well.
508     The entities that were used still remain, since they may have been used for other activities as well.
509     We need a multiplicity * between \emph{Used} and \emph{Entity}, because an entity can be used more than once
510     (by different activities).
512     Similarly, the \emph{WasGeneratedBy} relation is closely coupled with the \emph{Entity} via a composition,
513     since a wasGeneratedBy relation makes no sense without its entity. So if an entity is deleted,
514     then its wasGeneratedBy relation must be deleted as well. There is a multiplicity * between \emph{Activity}
515     and \emph{WasGeneratedBy}, because an activity can generate many entities.
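
The deletion behaviour implied by the two compositions can be illustrated with a small in-memory store. This is a sketch under stated assumptions: the class \texttt{ProvStore} and its method names are hypothetical, identifiers are plain strings, and relations whose \emph{other} end points to a deleted object are deliberately left untouched, since the text above does not prescribe what should happen to them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Used:
    activity_id: str
    entity_id: str
    role: Optional[str] = None


@dataclass(frozen=True)
class WasGeneratedBy:
    entity_id: str
    activity_id: str
    time: Optional[str] = None  # optional: time at which generation finished


class ProvStore:
    """Hypothetical in-memory container for provenance records."""

    def __init__(self):
        self.entities = set()
        self.activities = set()
        self.used = []
        self.was_generated_by = []

    def delete_activity(self, activity_id):
        self.activities.discard(activity_id)
        # Composition: Used relations are owned by their activity.
        self.used = [u for u in self.used if u.activity_id != activity_id]
        # The entities that were used are kept on purpose.

    def delete_entity(self, entity_id):
        self.entities.discard(entity_id)
        # Composition: a WasGeneratedBy relation is owned by its entity.
        self.was_generated_by = [
            g for g in self.was_generated_by if g.entity_id != entity_id
        ]
```

Deleting an activity thus removes its \emph{Used} relations but leaves the used entities in place, because they may serve as input to other activities.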
518 kriebe 4204 \paragraph{Entity roles}\label{sec:entity-roles}
Each activity requires specific roles for each input or output entity. We therefore
store this information in the description classes, in the role attributes of
\class{UsedDescription} and \class{WasGeneratedByDescription}.
For example, an activity for dark-frame subtraction requires two input images, and it is
essential to know which of the images is the raw image and
which one fulfils the role of dark frame.
525 kriebe 3447
The role is in general NOT an attribute of \class{EntityDescription} or \class{Entity},
since the same entity (e.g. a specific FITS file containing an image) may play
different roles in different activities. Only if the
image can play just one and the same role everywhere is the role an intrinsic
property of the entity, and only then should it be stored in the \class{EntityDescription}.
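
The dark-frame example can be written down as follows. This is an illustrative sketch only: the role is stored directly on the \emph{Used} relation (the variant without description classes), and the names \texttt{REQUIRED\_INPUT\_ROLES}, \texttt{missing\_roles} and \texttt{roles\_of}, as well as the file and activity identifiers used with them, are hypothetical.

```python
from collections import namedtuple

# A Used relation carrying the role directly as an attribute.
Used = namedtuple("Used", ["activity", "entity", "role"])

# Input roles required per type of activity (hypothetical vocabulary).
REQUIRED_INPUT_ROLES = {
    "darkframe_subtraction": {"raw image", "dark frame"},
}


def missing_roles(activity_type, activity_id, used_links):
    """Return the required input roles that no used entity fulfils."""
    present = {u.role for u in used_links if u.activity == activity_id}
    return REQUIRED_INPUT_ROLES[activity_type] - present


def roles_of(entity, used_links):
    """Map each activity that used `entity` to the role it played there."""
    return {u.activity: u.role for u in used_links if u.entity == entity}
```

Because the role is attached to the relation, \texttt{roles\_of} can return different roles for the same file in different activities, which would not be possible if the role were an attribute of the entity.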
532     %Additionally, input (and also output) data can take different roles in an
533     %activity. For example, one file could
534     %be a parameter file, another one is the raw image, and the third one is the
535     %dark field that should be subtracted. Since these roles are very important,
536     %it must be made explicit which data component needs to fulfill which role as
537     %input in or output from an activity.
538     %Each activity requires specific roles for each input or output entity, thus
539     %we store this information on the description side, in the role-attributes for
540     %the \class{UsedDescription} and \class{WasGeneratedByDescription} relation.
542 mnullmei 3490 %In W3C, this is partially solved by adding a derivation relation between the Entities (data). Here, we have a mapping-class between Activity and DataEntities as well as between ActivityDescription and DataDescription. The mapping-class at the description side, i.e. between the ActivityDescription and its DataEntityDescriptions, contains additionally a role for each relation, e.g. parameter, dark frame, raw image, etc. If a dataset is used as input to an activity or if it results from it, will become clear with these roles.
543 kriebe 3447
Some example roles are given in Table~\ref{tab:entity-roles}.
Note that these roles do not have to be unique: many datasets may play the same role for
a process. For example, many image entities may be used as science-ready images for an
image stacking process.
549 kriebe 3447
550 kriebe 3473 \begin{table}[h]
551     \small
552 kriebe 4027 \begin{tabulary}{1.0\textwidth}{@{}lL@{}}
553 kriebe 3473 \toprule
554 kriebe 4027 \head{Role} & \head{Example entities}\\
555 kriebe 3473 \midrule
556 kriebe 4027 configuration & configuration file \\ %& used for entities that contain configuration details for an activity\\
557     auxiliary input & calibration image, dark frame, etc. \\%& \\
558     main input & raw image, science-ready images \\%& used for entities that are the main input for an activity\\
559     main result & image, cube or spectrum \\%& used for entities that are the main result of an activity\\
560     log & logging output file \\%& used for logging output \\
561     red & image used for red channel of a composite activity\\%& used for images that will be used as the red channel of a composite activity\\
562 kriebe 3473 \bottomrule
563     \end{tabulary}
564 kriebe 4027 \caption{Examples for entity roles as attributes in the
565 kriebe 3473 \class{UsedDescription} and \class{WasGeneratedByDescription}.}
566     \label{tab:entity-roles}
567     \end{table}
568 mir.louys 4001 % here we cross some notions encountered in parameter descriptions and Activity descriptions while describing parameters
569 kriebe 3447
570 kriebe 3721 In order to facilitate interoperability, the possible
571 kriebe 3473 entity-roles could be defined and described for each activity by the IVOA community, in a
572 kriebe 3721 vocabulary list or thesaurus.
573     % TODO!!
574 kriebe 3447
576 kriebe 3721 %\TODO{Roles can be used for checking (validation) if processes use the correct type of entities,
577     %e.g. check if entity-type matches used-role!}
579 kriebe 3473 %Without the mapping tables, the relation between \class{Activity}
580     %(\class{ActivityDescription}) and \class{Entity} (\class{EntityDescription})
581     %would be an aggregation relation, or in other words: an association with the
582     %aggregation kind ``shared''. That would be required to ensure that all
583     %entities linked to an activity (either as input or output) will survive if
584     %the activity is destroyed, since they are almost always shared with other
585     %activities.
586     %
587     %By using the mapping tables we make the role of an entity in an activity more
588     %explicit and thus can replace the aggregation by a composition relation to the
589     %\class{Activity}/\class{ActivityDescription} and simple associations to the
590     %individual data components and their descriptions.
591 kriebe 3447
593 kriebe 3473 % The derivation relation together with entities is already enough to produce a
594     % Data flow view, but in astronomy we are probably even more interested in the
595     % Processes (as discussed in our first draft for requirements for provenance).
596 kriebe 3447
597 kriebe 3721 %\TODO{Add an example here! (From discussions in Heidelberg.)}
598 kriebe 3447
600 kriebe 3699
601 kriebe 3734 \subsubsection{Parameters}\label{sec:parameters}
602 kriebe 3699
The concept of activity configuration, generally a set of parameters that can be configured, differs from the concept of provenance information, but the two are tightly connected. We identify three different ways to link configuration information to an activity:
604     \begin{itemize}
605 kriebe 3734 \item Declare a parameter set (or each parameter) as an input entity that is used by the activity. \\
606     This also allows tracking the provenance of the parameter further.
607     \item Define families of activities, each one with fixed attributes.\\
That is, use different subclasses for activities with different fixed attributes.
609 mathieu.servillat 3732 \item Add activity attributes in the form of key-value parameters.
610     \end{itemize}
611 kriebe 3721
To enable the last option, we add a \class{Parameter} class along with a \class{ParameterDescription} for describing additional properties of activities. In this solution, parameters are directly connected to an activity without complex entity-activity relations. Moreover, we can then describe each parameter in the same way as FIELD and PARAM elements are described in VOTable \citep{std:VOTable}.
613 kriebe 3721
614 mathieu.servillat 3732
615     \begin{table}[h]
616     \small
617     \tymax 0.5\textwidth
618     \textbf{\normalsize Parameter}\vspace{0.25em}\\
619     \begin{tabulary}{1.0\textwidth}{@{}p{0cm}p{2.5cm}lL@{}}
620     \toprule
621     \head{Attribute} & \head{} & \head{Data type} & \head{Description}\\
622     \midrule
623     \textbf{id} & & string & parameter unique identifier\\
624 kriebe 4027 %description\_ref & & foreign key/url & link to \emph{ParameterDescription}\\
625     %name & & string & parameter name, if no link to ParameterDescription is given\\
626 mathieu.servillat 3732 \textbf{value} & & (value dependent) & the value of the parameter\\
627     \bottomrule
628     \end{tabulary}
629     \caption{Attributes of \class{Parameter}. Attributes in bold are \textbf{mandatory}.}
630     \end{table}
632     \begin{table}[ht]
633     \small
634     \tymax 0.5\textwidth
635     \textbf{\normalsize ParameterDescription}\vspace{0.25em}\\
636     \begin{tabulary}{1.0\textwidth}{@{}p{0cm}p{2.5cm}lL@{}}
637     \toprule
638     \head{Attribute} & \head{} & \head{Data type} & \head{Description}\\
639     \midrule
640     \textbf{id} & & string & parameter unique identifier\\
641 kriebe 4027 \textbf{name} & & string & parameter name\\
642 mathieu.servillat 4238 annotation & & string & additional free text description\\
643     datatype & & string & datatype \\
644     unit & & string & physical unit \\
645     ucd & & string & Unified Content Descriptor, supplying a standardized classification of the physical quantity\\
646     utype & & string & UType, meant to express the role of the parameter in the context of an external data model \\
647     min & & number & minimum value \\
648     max & & number & maximum value\\
649     options & & list & list of accepted values\\
650 mathieu.servillat 3732 \bottomrule
651     \end{tabulary}
652     \caption{Attributes of \class{ParameterDescription}.}
653     \end{table}
655     For example, observations generally require information on \emph{ambient conditions} as well as
656 mathieu.servillat 4218 \emph{instrument characteristics}. This contextual data associated with an observation is not directly modelled in the ProvenanceDM. However, this information can be stored as different entities. Alternatively, one could list the instrument characteristics as a set of key-value parameters using the \class{Parameter} class, so that this information is structured and stored with the provenance information (and can thus be queried simultaneously). In the case of a processing activity that cleans an image with a sigma-clipping method, the input and output images would be entities and the value of the number of sigma for sigma-clipping could be a parameter instead of an entity. We may also want to define a 3-sigma-clipping activity where this parameter is fixed to 3.
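
The sigma-clipping case can be sketched with the attributes listed in the two tables above. The following is an assumption-laden illustration, not a prescribed serialization: the \texttt{description} link from \class{Parameter} to \class{ParameterDescription} and the function \texttt{is\_valid} are hypothetical, and treating \emph{min}, \emph{max} and \emph{options} as validation constraints is one possible reading of these attributes.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ParameterDescription:
    id: str
    name: str
    annotation: Optional[str] = None
    datatype: Optional[str] = None
    unit: Optional[str] = None
    ucd: Optional[str] = None
    utype: Optional[str] = None
    min: Optional[float] = None
    max: Optional[float] = None
    options: Optional[Sequence[Any]] = None


@dataclass
class Parameter:
    id: str
    value: Any
    # Hypothetical link to the description; not listed in the table above.
    description: Optional[ParameterDescription] = None


def is_valid(param):
    """Check a value against min, max and options, where they are given."""
    d = param.description
    if d is None:
        return True  # nothing to check against
    if d.min is not None and param.value < d.min:
        return False
    if d.max is not None and param.value > d.max:
        return False
    if d.options is not None and param.value not in d.options:
        return False
    return True
```

A free number of sigma would use a description with \emph{min} and \emph{max}, whereas the 3-sigma-clipping activity could restrict \emph{options} to the single value 3.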
657 mathieu.servillat 3732
659     %For example for observations, the \emph{ambient conditions} as well as
660     %\emph{instrument characteristics} need to be stored. But they can both be treated
661     %as additional entities as well.
662     %Our model can then also take into account that a certain observation
663     %method requires special ambient conditions, already defined via the
664     %ActivityDescription (e.g. radio observations rely on different ambient
665     %conditions than observations
666     %of gamma rays), just following our data -- data description scheme.
667     %Ambient conditions are recorded for a certain time (startTime, endTime) and are
668     %usually only valid for a certain time interval. This time interval should be recorded
669     %with a \emph{validity}-attribute for such entities.
670     %
671     %In contrast to ambient conditions, instrument characteristics do (usually) not
672     %change from one observation to the other, so they are static, strictly related to
673     %the instrument.
674     %All the characteristics could be described either as key-value pairs directly with the
675     %observation (as attributes) or just as datasets, using the \class{Entity} class.
676     %One would then
677     %link the instrument characteristics as a type of input (or output?) dataset to a certain
678     %observation activity. Thus we don't need a separate Instrument or Device class.
680 kriebe 3721 %\note{One should also keep in mind that some instrument related parameters can change within time,
681     %e.g. the CCD temperature. The instruments can also change within time because of aging.}
684 mathieu.servillat 3732
685 kriebe 3473 \subsubsection{Agent}\label{sec:w3c-agent}
686 kriebe 3699
687     An \class{Agent} describes someone who is responsible for a certain task or
688 kriebe 3671 entity, e.g. who pressed a button,
689     ran a script, performed the observation or published a dataset.
690     The agent can be a single person, a group of persons (e.g. MUSE WISE Team), a
691 kriebe 4204 project (CTA) or an institute.
692 kriebe 3701 This is also reflected in the IVOA Dataset Metadata Model, where \class{Party}
693 kriebe 4071 represents an agent, and it has two types: \class{Individual} and \class{Organization},
694     which are explained in more detail in Table \ref{tab:agent-types} (also see Section~\ref{sec:dmlinks} for comparison between \class{Agent} and \class{Party}).
695     Both agent types are also used in the W3C Provenance Data Model, though
696     \class{Individual} is called \class{Person} there.
We decided not to include the type \class{SoftwareAgent} from W3C (yet), since it is not required for our current use cases. This may change in the future.
698 kriebe 3447
699 kriebe 3699 \begin{table}[h]
700     \small
701     \tymax 0.5\textwidth
702     \begin{center}
703     \begin{tabulary}{1.0\textwidth}{@{}lllL@{}}
704 kriebe 3721 \multicolumn{4}{c}{\textbf{AgentType}}\\
705 kriebe 3699 \toprule
706 kriebe 4027 \head{Class or type} & \head{W3C ProvDM} & \head{DatasetDM} &\head{Comment} \\
707 kriebe 3699 \midrule
708     Agent & Agent & Party & \\
709     Individual & Person & Individual & a person, specified by name, email, address,
710     (though all these parts may change in time)\\
711     Organization & Organization & Organization & a publishing house, institute or scientific project\\
712 mir.louys 4001
713 kriebe 4027
714 kriebe 3699 \bottomrule
715     \end{tabulary}
716 kriebe 4027 \caption{Agent class and types of agents/subclasses in this data model, compared to W3C ProvDM and DatasetDM.}
717 kriebe 3699 \label{tab:agent-types}
718     \end{center}
719     \end{table}
720 kriebe 3671
721 kriebe 4028 \begin{table}[h]
722     \small
723     \tymax 0.5\textwidth
724     \begin{center}
725     \begin{tabulary}{1.0\textwidth}{@{}llp{2cm}L@{}}
726     \multicolumn{4}{c}{\textbf{Agent}}\\
727     \toprule
728     \head{Attribute} & \head{W3C ProvDM} & \head{Data type} & \head{Description}\\
729     \midrule
730     \textbf{id} & prov:id & (qualified) string & unique identifier for an agent\\
731     \textbf{name} & prov:name & string & a common name for this agent; e.g. first name and last name; project name, agency name...\\
732     type & prov:type & string & type of the agent: either Individual (Person) or Organization\\
733     % insert here the attributes dedicated to contact for a Party in DataSet Metadata DM.
734 kriebe 4071 % \hline
735     % \multicolumn{4}{l}{Additional optional attributes from Dataset.Party subclasses:}\\
736     % \hline
737     % address & & string & Address of the agent both for Individual (Person) and Organization\\
738     % phone & & string & Contact phone number of the agent both for Individual (Person) and Organization\\
739     % email & & string & Contact email of the agent both for Individual (Person) and Organization\\
740 kriebe 4028 \bottomrule
741     \end{tabulary}
742     \caption{Agent attributes}
743     \label{tab:agent-attributes}
744     \end{center}
745     \end{table}
748 kriebe 4071
749     A definition of organizations is given in the
750     IVOA Recommendation on Resource Metadata \citep{std:ResourceMeta}, hereafter
referred to as RM: ``An organisation is [a] specific type of resource that
752     brings people together to pursue participation in VO applications.''
753     It also specifies further that scientific projects can be considered
754     as organisations on a finer level:
755     ``At a high level, an organisation could be a university, observatory, or government
756     agency. At a finer level, it could be a specific scientific project, space mission,
757     or individual researcher. A provider is an organisation that makes data and/or services
758     available to users over the network.''
For each agent a \emph{name} should be specified; a summary of the attributes of \class{Agent} is given in Table~\ref{tab:agent-attributes}.
763     One could also add the optional attributes \emph{address}, \emph{phone} and \emph{email} (compare with subclasses of \emph{Party} in Section~\ref{sec:dmlinks}). However, we skip them here in this main class, since an advanced system may use permanent identifiers (e.g. ORCIDs) to identify agents and retrieve their properties from an external system.
764 kriebe 3699 It would also increase the value of the given
765     information if the (current) affiliation of the agent (and a project leader/group
766     leader) were specified in order to maximize the chance of finding any contact
767     person later on.
768     The contact information is needed in case more information about a certain step in the past of a dataset is required,
769     but also in order
770     to know who was involved and to fulfill our ``Attribution'' requirement
771     (Section~\ref{sec:requirements}), so that proper credits are given to the right
772 kriebe 4071 people/projects.
773 kriebe 3447
775 kriebe 3699
It is desirable to have at least one agent given for each activity (and entity), but this
is not enforced.
778     % , hence the multiplicity between \class{Entity}/\class{Activity} and the relations
779     %to the \class{Agent} starts with 0.
780 kriebe 4071 There can also be more than one agent for each activity/entity with different \emph{roles}
781 kriebe 3699 and one agent can be responsible for more than one activity or entity. This
782 kriebe 3727 many-to-many relationship is made explicit in our model by adding the two
783 kriebe 3699 following relation classes:
785 kriebe 3473 \begin{itemize}
786     \item wasAssociatedWith: relates an \emph{activity} to an agent
787     \item wasAttributedTo: relates an \emph{entity} to an agent
788 kriebe 3447 \end{itemize}
790 kriebe 3699 We adopted here the same naming scheme as was used in W3C ProvDM.
791 kriebe 3473 Note that the attributed-to-agent for a dataset may be different from the
792 kriebe 4027 agent that is associated with the activity that created an entity.
793 kriebe 3473 Someone who is performing a task is not necessarily given full attribution,
especially if they act on behalf of someone else (the project, university, ...).
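
The two relation classes can be sketched as simple records. This is a minimal illustration under assumptions: the attribute names \texttt{activity\_id} and \texttt{entity\_id}, the helper functions and all identifiers and agent names used with them are hypothetical; only \emph{id}, \emph{name}, \emph{type} and the notion of a \emph{role} are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    id: str
    name: str
    type: Optional[str] = None  # "Individual" or "Organization"


@dataclass(frozen=True)
class WasAssociatedWith:
    """Relates an activity to an agent."""
    activity_id: str
    agent: Agent
    role: Optional[str] = None


@dataclass(frozen=True)
class WasAttributedTo:
    """Relates an entity to an agent."""
    entity_id: str
    agent: Agent
    role: Optional[str] = None


def agents_of_activity(activity_id, associations):
    """Return (agent name, role) pairs for one activity."""
    return [(r.agent.name, r.role) for r in associations
            if r.activity_id == activity_id]


def agents_of_entity(entity_id, attributions):
    """Return (agent name, role) pairs for one entity."""
    return [(r.agent.name, r.role) for r in attributions
            if r.entity_id == entity_id]
```

Keeping the two relations separate allows an observer to be associated with the observation activity while the resulting dataset is attributed to the project or institute on whose behalf the observation was made.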
795 kriebe 3447
796 kriebe 4028
797 kriebe 3699 In order to make it clearer what an agent is useful for, we suggest the
798 kriebe 3473 possible roles an agent can have (along with descriptions partially taken from RM)
799 kriebe 4027 in Table~\ref{tab:agent-roles}.
For comparison, SimDM contains the following roles for its contacts:
owner, creator, publisher and contributor. Note that the \emph{Party} classes in the Dataset Metadata Model and SimDM are very similar to the \emph{Agent} class, which is explained in more detail in Section~\ref{sec:dmlinks}.
802 kriebe 3447
804 kriebe 3473 \begin{table}[h]
805     \small
806 kriebe 4032 \tymax 0.5\textwidth
807 kriebe 3473 \begin{center}
808 kriebe 3734 \begin{tabulary}{1.0\textwidth}{@{}lp{3cm}L@{}}
809 kriebe 3473 \multicolumn{3}{c}{\textbf{AgentRoles}}\\
810     \toprule
811 kriebe 4029 \head{role} & \head{type or sub class} & \head{Comment} \\
812 kriebe 3473 \midrule
813 kriebe 4029 author & Individual & someone who wrote an article, software, proposal\\
814     contributor & Individual & someone who contributed to something (but not enough to gain authorship)\\
815     editor & Individual & editor of e.g. an article, before publishing\\
creator & Individual & someone who created a dataset; creators of articles or software are instead called ``author''\\
817     curator & Individual & someone who checked and corrected a dataset before publishing\\
818     publisher & Organization {(maybe also Individual?)}& organization (publishing house, institute) that published something\\
819     observer & Individual & observer at the telescope\\
820     operator & Individual & someone performing a given task \\ % removed executor: ambiguous
821     coordinator/PI & Individual & someone coordinating/leading a project\\ % we should choose one word : PI?
822     funder & Organization & agency or sponsor for a project as in Prov-N\\
823     provider & Organization & ``an organization that makes data and/or services available to users over the network'' (definition from RM)\\
824 kriebe 4027 %(owner) & voprov:Individual or voprov:Organization & Does anyone really own the data?\\
825 kriebe 3473 \bottomrule
826     \end{tabulary}
827 kriebe 3734 \caption{Examples for roles of agents and the typical type of that agent}
828 kriebe 3473 \label{tab:agent-roles}
829     \end{center}
830     \end{table}
831 kriebe 3447
832 kriebe 3734 %\TODO{\textbf{Mireille + Fran\c{c}ois}: Go through these roles, pick only the necessary ones, crosscheck with other data models.}
833 kriebe 3447
834 kriebe 3699 This list is \emph{not} complete. We consider providing a vocabulary list for this
835     in a future version of this model, collected from (future) implementations of this model.
836 kriebe 3447
837 kriebe 3699 %\TODO{Do we have a specific use case for fixing the agent-roles? Is anyone
838     %going to search for specific roles in the Provenance meta-data?
839     %Or shall we leave it open, which roles can be defined and just give examples here?}
840     % ... Yes, just give examples here. Should have a vocabulary list somewhere ...
841 kriebe 3447
842 kriebe 3703 %\subsubsection{Shortcuts: WasDerivedFrom and WasInformedBy}\label{sec:shortcuts}
843     %The classes \class{WasDerivedFrom} and \class{WasInformedBy} can be used as ``shortcuts'' and
844     %are used in the same way as the corresponding W3C classes.
845 kriebe 3671
846 kriebe 3703 %\class{WasDerivedFrom} defines the relation that links two entities together, if one entity was derived
847     %from the other entity. In principle, one can find this information also by tracing the
848     %history of an entity backwards to the generating activity and its input entities.
849     %The descriptions for activity, entity and their relations should provide enough
850     %information to find the progenitor entity from which an entity was derived.
851     %Nevertheless, we include \class{WasDerivedFrom} for those cases where an explicite
852     %link between an entity and its progenitor is useful (e.g. for speeding up searches for
853     %progenitors or if the activity in between is not important).
854 kriebe 3671
855 kriebe 3703 %The class \class{WasInformedBy} links two activities together without defining the
856     %intermediate entities that may have been exchanged. This is useful for e.g. pipelines,
857     %if the intermediate entities don't play a major role or only exist temporarily, so that
858     %their provenance information is not deemed to be important enough to be recorded.
859 kriebe 3671 %``WasInformedBy'' relation (also called ``Communication'' relation, borrowed from W3C's model)
860 kriebe 3734
862     %\subsection{Implementation hints}
