% Source: /trunk/projects/dm/provenance/description/datamodel-description.tex
% Revision 4238, Mon Sep 11 14:33:33 2017 UTC, by mathieu.servillat
% File MIME type: application/x-tex, file size: 48671 bytes
% Log: modify EntityDescription paragraphs, add min/max/option to ParameterDescription table

% updates Mireille 2017 April/May 2nd
%roles for Agents -updates + funder
%
In this section, we describe the currently discussed Provenance Data Model. We
start with a UML class diagram, explain the core elements and then give
more details for each class and relation in the following sections.

\subsection{Overview: Conceptual UML class diagram and introduction to core classes}
%We give in this section an overview on the main classes. More details about
%each class and their relations will be explained in the following sections.
%Its core elements are colored in blue. These core elements can also be found in the W3C Provenance Data
%Model. The pattern defined by these classes is very general and can be reused everywhere where provenance is needed.

\begin{figure}[h]
\centering
\includegraphics[width=1.0\textwidth]{../datamodel-diagrams/images/domain-classdiagram.pdf}
\caption{Overview of the classes for the Provenance Data Model in a conceptual class diagram. The blue classes are core elements. Several many-to-many relationships carry attached association classes (grey), which can contain additional attributes.}
%Objects in the blue box also appear in the W3C Provenance Data Model.
%Green classes are links to the IVOA Dataset Metadata Model.}
\label{fig:classdiagram-conceptional}
\end{figure}


%\label{sec:core}
% Some examples for different use cases are given in Section \ref{sec:usecases-implementations}.
% The elements of a provenance model can be expressed as a directed graph to capture the causal dependencies.

Figure~\ref{fig:classdiagram-conceptional} shows the conceptual UML diagram for an IVOA Provenance Data
Model.
The core elements of the Provenance Data Model are \class{Entity}, \class{Activity} and \class{Agent}.
We chose the same names for these elements as those used in the Provenance Data
Model of the World Wide Web Consortium (W3C, \citealt{std:W3CProvDM}), which defines
a very abstract pattern that can be reused here. Here are the core classes with
a short description and some examples:

\begin{itemize}
\item \class{Entity:} a thing in a certain state\\
examples: data products like images, catalogs, parameter files, calibration data, instrument characteristics

\item \class{Activity:} an action/process or a series of actions, occurs over a period of time, performed on or caused by entities, usually results in new entities\\
examples: data acquisition like observation, simulation; regridding, fusion, calibration steps, reconstruction

\item \class{Agent:} executes/controls an activity, is responsible for an activity or an entity\\
examples: telescope astronomer, pipeline operator, principal investigator, software engineer, project helpdesk

\end{itemize}

\noindent



\begin{figure}[h]
\centering
\includegraphics[scale=0.8]{../datamodel-diagrams/images/classes-core-w3c}
\caption{The main core classes and relations of the Provenance Data Model, which also occur in the W3C model.}
\label{fig:coreclasses}
\end{figure}

These core classes, along with their relations to each other, are shown in Figure~\ref{fig:coreclasses}.
We use the following relation classes to specify the mapping between the three core
classes.
The relation names were again chosen to match the W3C model names:
\begin{itemize}
\item \class{WasGeneratedBy:} a new entity is generated by an activity\\
(entity ``image m31.fits'' wasGeneratedBy activity ``observation'')
\item \class{Used:} an entity is used by an activity\\
(activity ``calibration'' used entities ``calibration data'', ``raw images'')
\item \class{WasAssociatedWith:} agents have responsibility for an activity\\
(agent ``observer Max Smith'' wasAssociatedWith activity ``observation'')
\item \class{WasAttributedTo:} an entity can be attributed to an agent\\
(entity ``image m31.fits'' wasAttributedTo ``M31 observation campaign'')
\end{itemize}

Note that the relations appear as extra classes (and thus as boxes in the diagrams, rather than as annotated relations) because they can have additional attributes. When mapping the model to a relational database, these relations would appear as mapping tables.

In the domain of astronomy, certain processes and steps are repeated again and
again with different parameters. We therefore separate the descriptions of activities
from the actual processes and introduce an additional \class{ActivityDescription} class (see Figure~\ref{fig:classdiagram-conceptional}).
Likewise, we apply the same pattern to \class{Entity} and add an \class{EntityDescription}
class.
Defining such descriptions allows them to be reused, which is very useful
when performing a series of tasks of the same type, as is typically done in
astronomy.

A similar normalization of descriptions of the actual processes and datasets
can also be found in the IVOA Simulation Data Model \citep[SimDM, ][]{std:SimDM},
which describes simulation metadata.
The SimDM classes \class{Experiment} and \class{Protocol}
correspond to the Provenance terms \class{Activity} and \class{ActivityDescription}.

%The W3C-model has the advantage of being already an approved standard, and it
%contains all the necessary main features needed for a Provenance model for
%Astronomy. However, it is very general, and by adding reusable prototypes,
%templates or descriptions for activities and entities, the model may fit better
%to the astronomy domain.

This separation into two classes may not be needed for each and every project,
and everyone is free to choose which classes make sense for their use case.
When serializing provenance, one can integrate the description side into the
other classes, thus producing a W3C compliant provenance description. More details about
all these classes and relations are given in the following section.


%It still remains to be seen if this separation into two classes is necessary,
%useful or just nice to have. Currently, we include the descriptions in our model,
%for normalization purposes.

%But when serialising the provenance one could
%integrate the description side into the other classes, thus producing W3C
%compliant provenance.


\subsection{Model description}

\subsubsection{Class diagram and VO-DML compatibility}
\begin{figure}[h]
\centering
\includegraphics[width=1.0\textwidth]{../datamodel-diagrams/images/classes-overview.pdf}
\caption{More detailed overview of the classes for the Provenance Data Model.
Note that this UML class diagram is more compatible with VO-DML.}
\label{fig:classdiagram}
\end{figure}

Figure~\ref{fig:classdiagram} shows the full class diagram, in which the association classes for the many-to-many relations are modeled more directly as mapping classes. When implementing the model in a relational database, these classes can be represented as individual tables that map the relation. Where a mapping class belongs more strongly to one of its linked classes, we model that association as a composition (full diamond); e.g. the \emph{Used} relations are strongly dependent on the corresponding \emph{Activities}. The documentation of all classes and an automatically generated figure based on the xmi description underlying this UML diagram are available in the Volute repository at \url{https://volute.g-vo.org/svn/trunk/projects/dm/vo-dml/models/provenancedm/ProvenanceDM.html}.

This version of the UML diagram is fully VO-DML compliant, i.e. we used only the restricted subset of UML to model
Provenance and reused the IVOA datatypes.


\subsubsection{Entity and EntityDescription}

Entities in astronomy are usually astronomical or astrophysical datasets in the
form of images, tables, numbers, etc. But they can also be observation or
simulation log files, files containing system information, environment variables, names and versions of packages, ambient conditions or, in a wider sense, also observation proposals, scientific
articles, or manuals and other documents.

An entity is not restricted to being a file.
It can even be just a number in a table, depending on how fine-grained the
provenance description needs to be.

\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{../datamodel-diagrams/images/entity-details.pdf}
\caption{The relation between Entity, EntityDescription and Collection (see Section~\ref{sec:collection}).
Links to the Dataset class from the Dataset Metadata Model are described in Section~\ref{sec:dmlinks}.}
\label{fig:entity-details}
\end{figure}

The VO concept closest to Entity is the notion of ``Dataset'', which could mean a single
table, an image or a collection of them. The Dataset Metadata Model
\citep{std:DatasetDM} specifies an IVOA ``Dataset'' as ``a file or files which
are considered to be a single deliverable''.
Most attributes of the \class{Dataset} class can be mapped
directly to attributes of the \class{Entity} and \class{EntityDescription} classes, see the mapping in Table~\ref{tab:datasetmapping} in Section~\ref{sec:dmlinks}.


\begin{table}[h]

\small
\tymax 0.5\textwidth

\textbf{\normalsize Entity}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}lp{3.5cm}p{2cm}L@{}}
\toprule
\head{Attribute} & \head{W3C ProvDM} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & prov:id & (qualified) string & a unique id for this entity (unique in its realm)\\
name & prov:label & string & a human-readable name for the entity (to be displayed by clients)\\
type & prov:type & string & a provenance type, i.e.
one of: prov:collection, prov:bundle, prov:plan, prov:entity; not needed for a simple entity\\
%description\_ref & & foreign key/url & link to \class{EntityDescription}\\
annotation & prov:description & string & text describing the entity in more detail\\
rights & -- & string & access rights for the data, values: public, restricted or internal; can be linked to Curation.Rights from ObsCore/DatasetDM\\
creationTime & -- & datetime & date and time at which the entity was created (e.g. timestamp of a file)\\
\bottomrule
\end{tabulary}
\caption{Attributes of entities. Mandatory attributes are marked in bold.
}\label{tab:entity-attributes}
\end{table}

For entities, we suggest the attributes given in Table
\ref{tab:entity-attributes}. If the attribute also exists in the W3C
Provenance Data Model, we list its name in the second column.

%We discussed further attributes like \emph{size} and \emph{format}, but we decided to treat an
%entity of the same content but different format (and thus size) as the same entity,
%unless they do not have the same provenance (e.g. when the ``transformation'' activity
%for converting one format into another is included in the provenance description).

%\TODO{format and size may not be needed, if entities with the same content but different format and size are considered as the same entity.}

Whether an entity serves as input data or output data
becomes clear from the relations between the data and the activities producing or using these data.
More details on this follow in Section \ref{sec:entity-activity-relations}.
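
As a rough illustration of Table \ref{tab:entity-attributes}, the following Python sketch represents an entity as a simple record. It is not part of the model: the class layout, the validation of the \emph{rights} values and the example identifier \texttt{ex:m31.fits} are assumptions made here for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Values for "rights" as listed in the Entity attribute table.
ALLOWED_RIGHTS = {"public", "restricted", "internal"}


@dataclass
class Entity:
    """Illustrative record for the Entity attributes (not a normative mapping)."""
    id: str                                  # mandatory, unique in its realm
    name: Optional[str] = None               # human-readable label
    type: Optional[str] = None               # e.g. "prov:collection"; not needed for a simple entity
    annotation: Optional[str] = None         # free text describing the entity
    rights: Optional[str] = None             # public, restricted or internal
    creationTime: Optional[datetime] = None  # e.g. timestamp of a file

    def __post_init__(self):
        # Assumption of this sketch: reject empty ids and unknown rights values.
        if not self.id:
            raise ValueError("Entity.id is mandatory")
        if self.rights is not None and self.rights not in ALLOWED_RIGHTS:
            raise ValueError(f"unknown rights value: {self.rights!r}")


# Hypothetical example entity.
image = Entity(
    id="ex:m31.fits",
    name="image m31.fits",
    rights="public",
    creationTime=datetime(2017, 9, 11, 14, 33, 33),
)
```

Only \emph{id} has to be supplied; all other attributes default to empty, mirroring the mandatory/optional split of the table.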

\paragraph{EntityDescription.}
%The Entity class can have an EntityDescription class attached.
The types of entities, or datasets in astronomy, can be predefined using a description class \class{EntityDescription}.
This class is meant to store information about an Entity that is known before the Entity instance is created. For example, if we run an activity to create an RGB image from three grey images, we may have a mandatory format for the input and output images before the execution (JPG, PNG, FITS\dots), but we probably cannot know the final size of the image that will be created. Therefore, ``format'' would be an EntityDescription attribute, while ``size'' would be an attribute of the Entity instance.

%This class thus stores entity-related
Some of the attributes that describe the content of the data could be derived from
the Dataset Metadata Model.

The \class{EntityDescription} does NOT contain any information about the usage
of the data; it says nothing about whether the data are used as input or output. This is
defined only by the relations (and the relation descriptions) between activities
and entities (see Section \ref{sec:entity-activity-relations}).

The general attributes of \class{EntityDescription} are summarized in Table
\ref{tab:entitydescription-attributes}.


\begin{table}[h]
\small
\tymax 0.5\textwidth
\textbf{\normalsize EntityDescription}\vspace{0.25em}\\
\begin{tabulary}{\textwidth}{@{}p{2.75cm}p{0cm}p{2cm}L@{}}
\toprule
\head{Attribute} & \head{} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & & (qualified) string & a unique identifier for this description\\
name & & string & a human-readable name for the entity description\\
annotation & & string & a descriptive text for this kind of entity\\
category & & string & specifies if the entity contains information on logging, system (environment), calibration, simulation, observation, configuration, ...\\
doculink & & url & link to more documentation\\
% removed the obscore attributes, since specific for observations only, not applicable to configuration entities etc.
% dataproduct\_ type & & string & from ObsCore data model \citep{std:ObsCore}, if applicable; describes, what kind of product it is (e.g. image, table)\\
% dataproduct\_ subtype & & string & from ObsCore data model, more specific subtype\\
% level & & enum integer & the level of processing or calibration; for ObsCore's calib\_level it is an integer between 0 and 3\\
\bottomrule
\end{tabulary}
\caption{Attributes of \class{EntityDescription}. For simple use cases,
the description class may be ignored and its attributes used for
\class{Entity} instead.
%The utypes may vary depending on the data model, e.g. for simulation data they
%would point to utypes of SimDM.
}\label{tab:entitydescription-attributes}
\end{table}


\begin{table}[h]

\small
\tymax 0.5\textwidth

\textbf{\normalsize WasDerivedFrom}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}lp{3cm}L@{}}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
id & string & a unique id for this relation (unique in its realm)\\
\textbf{generatedEntity} & string & foreign key to the entity\\
\textbf{usedEntity} & string & foreign key to the progenitor, from which the generatedEntity was derived\\
activity & string & foreign key to the generation activity\\
generation & string & foreign key to the wasGeneratedBy relation\\
usage & string & foreign key to the used relation\\
\bottomrule
\end{tabulary}
\caption{Attributes of the WasDerivedFrom relation. This is the same as used in W3C's ProvDM. Mandatory attributes are marked in bold.
}\label{tab:wasderivedfrom-attributes}
\end{table}


\paragraph{WasDerivedFrom.}
In Figure~\ref{fig:entity-details} there is one more relation that we have not mentioned yet:
the \class{WasDerivedFrom} relation, borrowed from the W3C model, which links two entities together.
It is used to express that
one entity was derived from another, i.e. it can be used to find one (or more) progenitor(s)
of a dataset without having to look for the activities in between. It can therefore serve as
a shortcut.

The information this relation provides is somewhat redundant, since progenitors of entities
can be found through the links to activities and the corresponding descriptions.
Nevertheless, we include \class{WasDerivedFrom} for those cases where an explicit
link between an entity and its progenitor is useful (e.g.
for speeding up searches for
progenitors or if the activity in between is not important).

Note that the \class{WasDerivedFrom} relation
cannot always be inferred automatically by following \class{WasGeneratedBy} and \class{Used} relations alone:
If there is more than one input and more than one output of an activity, it is not clear (without
consulting the activityDescription and entity roles in the relation-descriptions) which entity was derived from which.
Only by specifying the descriptions and roles accordingly or by adding a \class{WasDerivedFrom} relation
does this direct derivation become known.



\subsubsection{Collection}\label{sec:collection}
Collections are entities that are grouped together and can be treated as one single entity.
From the provenance point of view, they have to have the \emph{same origin}, i.e., they were
produced by the same activity (which could also be the activity of collecting
data for a publication or similar). The term ``collection'' is
also used in the Dataset Metadata Model for grouping datasets.
% (but with a slightly different meaning).
As an example, a collection
with the name `RAVE survey' could consist of a number of database tables and spectra files.

%\TODO{Do we allow empty collections? Or should collections always contain at least 1 member? (otherwise they are just prov:entities?)}

The Entity-Collection relation can be modeled using the \emph{Composite} design pattern:
Collection is a subclass of Entity, but also an aggregation of 1 to many entities,
which could be collections themselves.
In order to be compliant with VO-DML, we model the membership relation explicitly
by including a \class{HadMember} class in our model, which is connected to the
\emph{Collection} class via a composition. It may contain an additional role attribute.

Collections are also known in the W3C model, in the same sense as used here.
The relation between entity and collection is also called ``HadMember'' in the W3C model.

An additional class \class{CollectionDescription} is only
needed if it has different attributes than
the \class{EntityDescription}. This class should therefore only be introduced if a use case requires it.

\paragraph{Advantages of collections:} Collections can be used to group entities with the same provenance information together,
in order to hide complexity where necessary. They can be used for defining
different levels of detail (granularity).

%\TODO{Find a really strong use case for Collections to convince everyone that they are useful/needed.}

\subsubsection{Activity and ActivityDescription}

\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{../datamodel-diagrams/images/activity-details.pdf}
\caption{Details for Activity, ActivityDescription and ActivityFlow (see Section~\ref{sec:activityflow}).
}
\label{fig:activity-details}
\end{figure}

\begin{table}[h]

\small
\tymax 0.5\textwidth

\textbf{\normalsize Activity}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}lp{2.5cm}p{2cm}L@{}}
\toprule
\head{Attribute} & \head{W3C ProvDM} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & prov:id & (qualified) string & a unique id for this activity (unique in its realm)\\
name & prov:label & string & a human-readable name (to be displayed by clients)\\
\textbf{startTime} & prov:startTime & datetime & start of an activity\\
\textbf{endTime} & prov:endTime & datetime & end of an activity\\
annotation & prov:description & string & additional explanations for the specific activity instance\\
%description\_ref & & foreign key/url & link to \class{ActivityDescription}\\
\bottomrule
\end{tabulary}
\caption{Attributes of \class{Activity}, their data types and equivalents in the W3C Provenance
Data Model, if existing. Attributes in bold are \textbf{mandatory}.}
\end{table}


\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize ActivityDescription}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}p{0cm}p{2.5cm}lL@{}}
\toprule
\head{Attribute} & \head{} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & & string & a unique id for this activity description (unique in its realm)\\
name & & string & a human-readable name (to be displayed by clients)\\
type & & string & type of the activity, from a vocabulary or list, e.g.
data acquisition (observation or simulation), reduction, calibration, publication\\
subtype & & string & more specific subtype of the activity\\
annotation & & string & additional free text description for the activity\\
%code & & string & the code used for this process\\
%version & & string & a version number for the code\\
doculink & & url & link to further documentation on this process, e.g. a
paper, the source code in a version control system etc.\\
\bottomrule
\end{tabulary}
\caption{Attributes of \class{ActivityDescription}.}
\end{table}


Activities in astronomy include all steps from obtaining data to the reduction of
images and production of new datasets, like image calibration, bias subtraction, image stacking;
light curve generation from a number of observations, radial velocity
determination from spectra, post-processing steps of simulations etc.

\paragraph{ActivityDescription.}
The method underlying an activity can be specified by a corresponding
\class{ActivityDescription} class (previously named \class{Method}, corresponds
to the \class{Protocol} class in SimDM). This could be,
for instance, the name of the code used to perform an activity or a more general
description of the underlying algorithm or process. An activity is then a
concrete case (instance) of using such a method, with a startTime and endTime,
and it refers to a corresponding description for further information.

There MUST be zero or one \class{ActivityDescription} per \class{Activity}. If steps from a
pipeline shall be grouped together, one needs to create a proper
\class{ActivityDescription} for describing all the steps at once. This method can then
be referred to by the pipeline-activity.
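
The split between \class{Activity} and \class{ActivityDescription} can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: the class layout, the helper \texttt{flatten} and the \texttt{desc\_} key prefix are hypothetical and not defined by the model. The sketch shows the zero-or-one link to a description and one possible way of folding the description attributes into the activity record when the description side is integrated into the other classes for serialization.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class ActivityDescription:
    id: str                           # mandatory
    name: Optional[str] = None
    type: Optional[str] = None        # e.g. "reduction", "calibration"
    subtype: Optional[str] = None
    annotation: Optional[str] = None
    doculink: Optional[str] = None


@dataclass
class Activity:
    id: str                           # mandatory
    startTime: datetime               # mandatory
    endTime: datetime                 # mandatory
    name: Optional[str] = None
    annotation: Optional[str] = None
    description: Optional[ActivityDescription] = None  # zero or one


def flatten(activity: Activity) -> dict:
    """Merge description attributes into one flat activity record.

    The "desc_" prefix is an assumption of this sketch; it only avoids
    clashes between attributes that exist on both classes.
    """
    record = {
        key: value
        for key, value in asdict(activity).items()
        if key != "description" and value is not None
    }
    if activity.description is not None:
        for key, value in asdict(activity.description).items():
            if key != "id" and value is not None:
                record[f"desc_{key}"] = value
    return record


# Hypothetical example: one stacking run that refers to a reusable description.
stacking = ActivityDescription(id="ex:stacking", name="image stacking", type="reduction")
run = Activity(
    id="ex:run1",
    startTime=datetime(2017, 9, 11, 14, 0),
    endTime=datetime(2017, 9, 11, 14, 30),
    description=stacking,
)
flat = flatten(run)
```

Many \class{Activity} instances can point to the same \class{ActivityDescription} object, which is the reuse the description classes are meant to enable.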

When serializing the data model, the attributes
of the description class may be assigned to the activity in order to produce
a W3C compliant serialization (same as with Entity/EntityDescription).


\paragraph{WasInformedBy.}
The individual steps of a pipeline can be chained
together directly, without mentioning the intermediate datasets, using the \class{WasInformedBy} relation.
This relation can be used as a shortcut if the exchanged datasets are deemed not important
enough to be recorded. For grouping activities, also see
Section \ref{sec:activityflow}.


\subsubsection{ActivityFlow}\label{sec:activityflow}
\TODO{Link to D-PROV!}
To facilitate the grouping of activities (and their related entities etc.)
we introduce the class \class{ActivityFlow}.
It can be used for hiding and grouping a part of the workflow/pipeline
or provenance
description, if different levels of granularity are needed. Such pipelines and workflows are very common in astronomical data production and processing. Figure \ref{fig:provgraph-activityflow}
illustrates an example provenance graph at a detailed level (left side)
and using the ActivityFlow (right side).


\begin{figure}[h]
\centering
\includegraphics[width=1\textwidth]{../datamodel-diagrams/images/provgraph-activityflow}
\caption{An example provenance graph. The detailed version is shown on the left side. It also shows
the shortcut \class{WasInformedBy} to connect two activities, which could be used if the entity e2
were not needed anywhere else.
An ActivityFlow can be used to ``hide'' a part of the provenance graph, as shown on the right side.
Activities are marked by blue rectangles, entities by yellow ellipses.}
\label{fig:provgraph-activityflow}
\end{figure}

We also explored the different ways to describe a set of activities in the W3C
provenance model. This model uses \class{Bundle}, i.e. an entity with type ``Bundle'',
for wrapping a provenance description. Each part of a provenance description can be
put into a bundle, and the bundle can then be reused in other provenance descriptions.
W3C's \class{Plan} is an entity with type ``Plan'' and is used for describing a
set of actions or steps. Both \class{Bundle} and \class{Plan} are entities and
have the attributes and relations of this class (and thus one can define provenance of bundles and plans as well).

But we would like to consider a set of activities as being an \class{Activity} itself,
with all the relations and properties that an activity also has. Therefore we do not reuse
W3C's classes for describing workflows and plans, but added
the class \class{ActivityFlow} as an activity composed of activities. The composition is represented by
the ``hadStep'' relation, as shown in Figure~\ref{fig:activity-details}.

%while still making it obvious that this
%group contains activities, we introduce the class \class{ActivityFlow}.
%This can be used for describing workflows or pipelines, or for
%
%We also allow ActivityCollections to consist of a whole provenance graph of
%activities and entities being linked together.


%We could introduce an additional abstract class, e.g. \class{AbstractActivity}, with \class{Activity} and
%\class{ActivityFlow} being subclasses to this one. But this adds another layer of complexity
%that we may not want in this data model.

%Since we introduced \class{ActivityFlow} mainly for having different view levels,
%we may want to add an attribute \emph{viewLevel} to descriptions of activityflows.
% But where to set the 0 point for viewLevel???

\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{../datamodel-diagrams/images/entity-activity-relations.pdf}
\hspace{0.15\textwidth}
\includegraphics[scale=0.6]{../datamodel-diagrams/images/entity-activity-relations-nodesc.pdf}
\caption{\class{Entity} and \class{Activity} are linked via the \class{Used} and \class{WasGeneratedBy} relations. In the left image, the \emph{role} played by an entity that was used or generated by an activity is recorded with the corresponding \emph{UsedDescription} and \emph{WasGeneratedByDescription}, also see Section~\ref{sec:entity-roles}. If these description classes are not used, the \emph{role} can be used directly as an attribute within the \emph{Used} and \emph{WasGeneratedBy} classes (right image).}
\label{fig:entity-activity-relations}
\end{figure}


\subsubsection{Entity-Activity relations}\label{sec:entity-activity-relations}

For each data flow it should be possible to clearly identify entities and
activities.
%If the activities shall not be recorded explicitly, one could also
%use the \emph{Derivation}-relation as suggested in the W3C Provenance Data Model
%to link derived entities to their originals.
Each entity is usually the result of an activity, expressed by a link from
the entity to its generating activity using the \class{WasGeneratedBy} relation,
and can be used as input for (many) other activities, expressed by the \class{Used} relation.
Thus the information on whether data is used as input or was produced as output of
some activity is given by the \emph{relation-types} between activities and entities.
%In fact,
%it would be enough to provide this information just for the relations on the description side (right).
% -- Is this true?

We use two relations, \class{Used} and \class{WasGeneratedBy}, instead of just one
mapping class with a flag for input/output, because their descriptions and role-attributes
can be different.
%in order to model the different
%multiplicities explicitly: an entity always has only one (or none)
%\class{WasGeneratedBy} relation, but may be \class{Used} many times as input for
%different activities.

The \class{WasGeneratedBy} relation can have the optional attribute \emph{time}: this is the time at which
the generation of the entity is finished. This generation time corresponds to e.g. \emph{DataID.date} in
the Dataset Metadata Model.
%It therefore corresponds to the \emph{created}-time used in
%the Simulation Data Model (SimDM).

\paragraph{Compositions and multiplicities}
In principle, an entity is produced by just one activity.
However, by introducing the \class{ActivityFlow} class for grouping activities together,
one entity can now have many wasGeneratedBy-links to activities. One of them must
be the actual generation activity; the other activities can only be activityFlows
containing this generation-activity. This restriction of having only one ``true'' generation activity is not explicitly expressed in the current model\footnote{The reason for this is that we want to keep the model simple and avoid introducing even more classes.}.
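
Since the restriction on generation activities is not expressed in the model itself, an implementation may want to check it separately. The following Python sketch shows one way to do so. It assumes a hypothetical representation in which every \class{ActivityFlow} is a key of a dictionary mapping the flow id to the ids of its steps (the ``hadStep'' relation), and in which flows may be nested; none of these names come from the model.

```python
from typing import Dict, Iterable, Set


def flow_contains(flows: Dict[str, Set[str]], flow_id: str, activity_id: str) -> bool:
    """Return True if activity_id is a direct or nested step of flow_id."""
    stack, seen = [flow_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cyclic hadStep definitions
        seen.add(current)
        for step in flows.get(current, ()):
            if step == activity_id:
                return True
            stack.append(step)
    return False


def has_single_true_generation(generators: Iterable[str],
                               flows: Dict[str, Set[str]]) -> bool:
    """Check the wasGeneratedBy links of ONE entity.

    generators: ids of all activities linked to the entity via wasGeneratedBy.
    Assumption of this sketch: an id is an ActivityFlow iff it is a key of `flows`.
    """
    generators = list(generators)
    plain = [a for a in generators if a not in flows]
    if len(plain) != 1:
        return False  # need exactly one actual generation activity
    actual = plain[0]
    # Every other link must point to a flow that contains the actual activity.
    return all(flow_contains(flows, a, actual) for a in generators if a in flows)


# Hypothetical example: a pipeline flow nested inside a survey flow.
flows = {
    "ex:pipeline": {"ex:calibration", "ex:stacking"},
    "ex:survey": {"ex:pipeline"},
}
ok = has_single_true_generation(["ex:stacking", "ex:pipeline", "ex:survey"], flows)
bad = has_single_true_generation(["ex:stacking", "ex:calibration"], flows)
```

In this example \texttt{ok} is true, because both flows contain the stacking activity, while \texttt{bad} is false, because two plain activities claim to have generated the same entity.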

The \emph{Used} relation is closely coupled to the \emph{Activity}, so we use a composition here, indicated
in Figure~\ref{fig:classdiagram} by a filled diamond:
if an activity is deleted, then the corresponding used relations need to be removed as well.
The entities that were used still remain, since they may have been used for other activities as well.
We need a multiplicity * between \emph{Used} and \emph{Entity}, because an entity can be used more than once
(by different activities).

Similarly, the \emph{WasGeneratedBy} relation is closely coupled with the \emph{Entity} via a composition,
since a wasGeneratedBy relation makes no sense without its entity. So if an entity is deleted,
then its wasGeneratedBy relation must be deleted as well. There is a multiplicity * between \emph{Activity}
and \emph{WasGeneratedBy}, because an activity can generate many entities.


\paragraph{Entity roles}\label{sec:entity-roles}
Each activity requires specific roles for each input or output entity, so
we store this information in the description classes, in the role attributes of
the \class{UsedDescription} and \class{WasGeneratedByDescription} relations.
For example, an activity for dark-frame subtraction requires two input images, and it is
essential to know which of the images is the raw image and
which one fulfils the role of dark frame.

The role is in general \emph{not} an attribute of \class{EntityDescription} or \class{Entity},
since the same entity (e.g. a specific FITS file containing an image) may play
different roles with different activities.
Only if this is not the case, i.e. if the
image can play just one and the same role everywhere, is the role an intrinsic
property of the entity that should be stored in the \class{EntityDescription}.

%Additionally, input (and also output) data can take different roles in an
%activity. For example, one file could
%be a parameter file, another one is the raw image, and the third one is the
%dark field that should be subtracted. Since these roles are very important,
%it must be made explicit which data component needs to fulfill which role as
%input in or output from an activity.
%Each activity requires specific roles for each input or output entity, thus
%we store this information on the description side, in the role-attributes for
%the \class{UsedDescription} and \class{WasGeneratedByDescription} relation.

%In W3C, this is partially solved by adding a derivation relation between the Entities (data). Here, we have a mapping-class between Activity and DataEntities as well as between ActivityDescription and DataDescription. The mapping-class at the description side, i.e. between the ActivityDescription and its DataEntityDescriptions, contains additionally a role for each relation, e.g. parameter, dark frame, raw image, etc. If a dataset is used as input to an activity or if it results from it, will become clear with these roles.


Some example roles are given in Table~\ref{tab:entity-roles}.
Note that these roles do not have to be unique: many datasets may play the same role in
a process. For example, many image entities may be used as science-ready images for an
image stacking process.
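
To illustrate why the role belongs to the relation rather than to the entity, the following Python sketch stores the role directly on each \emph{Used} link (the variant without description classes). It is an illustration under stated assumptions only: the identifiers, file names and the helper \texttt{entities\_with\_role} are hypothetical and not defined by this data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Used:
    activity_id: str
    entity_id: str
    role: str  # role the entity plays for THIS activity only


def entities_with_role(used: List[Used], activity_id: str, role: str) -> List[str]:
    """Return the ids of all entities used by an activity in the given role."""
    return [u.entity_id for u in used
            if u.activity_id == activity_id and u.role == role]


# Hypothetical example links; all names are made up for illustration.
used_links = [
    Used("dark_subtraction", "raw_001.fits", "raw image"),
    Used("dark_subtraction", "dark_007.fits", "dark frame"),
    # Roles need not be unique: two entities share one role here.
    Used("stacking", "sci_001.fits", "science-ready image"),
    Used("stacking", "sci_002.fits", "science-ready image"),
    # The same file plays a different role in another activity.
    Used("quality_check", "dark_007.fits", "main input"),
]

dark_frames = entities_with_role(used_links, "dark_subtraction", "dark frame")
stack_inputs = entities_with_role(used_links, "stacking", "science-ready image")
```

Because \texttt{dark\_007.fits} appears with two different roles, storing a single role on the entity itself would lose information.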

\begin{table}[h]
\small
\begin{tabulary}{1.0\textwidth}{@{}lL@{}}
\toprule
\head{Role} & \head{Example entities}\\
\midrule
configuration & configuration file \\ %& used for entities that contain configuration details for an activity\\
auxiliary input & calibration image, dark frame, etc. \\%& \\
main input & raw image, science-ready images \\%& used for entities that are the main input for an activity\\
main result & image, cube or spectrum \\%& used for entities that are the main result of an activity\\
log & logging output file \\%& used for logging output \\
red & image used for red channel of a composite activity\\%& used for images that will be used as the red channel of a composite activity\\
\bottomrule
\end{tabulary}
\caption{Examples of entity roles as attributes in the
\class{UsedDescription} and \class{WasGeneratedByDescription}.}
\label{tab:entity-roles}
\end{table}
% here we cross some notions encountered in parameter descriptions and Activity descriptions while describing parameters

In order to facilitate interoperability, the possible
entity roles could be defined and described for each activity by the IVOA community, in a
vocabulary list or thesaurus.
% TODO!!


%\TODO{Roles can be used for checking (validation) if processes use the correct type of entities,
%e.g. check if entity-type matches used-role!}

%Without the mapping tables, the relation between \class{Activity}
%(\class{ActivityDescription}) and \class{Entity} (\class{EntityDescription})
%would be an aggregation relation, or in other words: an association with the
%aggregation kind ``shared''.
%That would be required to ensure that all
%entities linked to an activity (either as input or output) will survive if
%the activity is destroyed, since they are almost always shared with other
%activities.
%
%By using the mapping tables we make the role of an entity in an activity more
%explicit and thus can replace the aggregation by a composition relation to the
%\class{Activity}/\class{ActivityDescription} and simple associations to the
%individual data components and their descriptions.


% The derivation relation together with entities is already enough to produce a
% Data flow view, but in astronomy we are probably even more interested in the
% Processes (as discussed in our first draft for requirements for provenance).

%\TODO{Add an example here! (From discussions in Heidelberg.)}



\subsubsection{Parameters}\label{sec:parameters}

The concept of activity configuration, generally a set of parameters that can be configured, differs from the concept of provenance information, but the two are tightly connected. We identify three different ways to link configuration information to an activity:
\begin{itemize}
\item Declare a parameter set (or each parameter) as an input entity that is used by the activity. \\
This also allows tracking the provenance of the parameter further.
\item Define families of activities, each one with fixed attributes.\\
I.e. use different subclasses for activities with different fixed attributes.
\item Add activity attributes in the form of key-value parameters.
\end{itemize}

To enable the last of these options, we add a \class{Parameter} class along with a \class{ParameterDescription} for describing additional properties of activities.
In this solution, parameters are directly connected to an activity without complex Entity-Activity relations. Moreover, we can then describe each parameter in the same way as FIELD and PARAM elements are described in VOTable \citep{std:VOTable}.


\begin{table}[h]
\small
\tymax 0.5\textwidth
\textbf{\normalsize Parameter}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}p{0cm}p{2.5cm}lL@{}}
\toprule
\head{Attribute} & \head{} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & & string & parameter unique identifier\\
%description\_ref & & foreign key/url & link to \emph{ParameterDescription}\\
%name & & string & parameter name, if no link to ParameterDescription is given\\
\textbf{value} & & (value dependent) & the value of the parameter\\
\bottomrule
\end{tabulary}
\caption{Attributes of \class{Parameter}. Attributes in bold are \textbf{mandatory}.}
\end{table}

\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize ParameterDescription}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{@{}p{0cm}p{2.5cm}lL@{}}
\toprule
\head{Attribute} & \head{} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & & string & parameter unique identifier\\
\textbf{name} & & string & parameter name\\
annotation & & string & additional free text description\\
datatype & & string & datatype \\
unit & & string & physical unit \\
ucd & & string & Unified Content Descriptor, supplying a standardized classification of the physical quantity\\
utype & & string & UType, meant to express the role of the parameter in the context of an external data model \\
min & & number & minimum value \\
max & & number & maximum value\\
options & & list & list of accepted values\\
\bottomrule
\end{tabulary}
\caption{Attributes of \class{ParameterDescription}.}
\end{table}

For example, observations generally require information on \emph{ambient conditions} as well as
\emph{instrument characteristics}. This contextual data associated with an observation is not directly modelled in the ProvenanceDM. However, this information can be stored as separate entities. Alternatively, one could list the instrument characteristics as a set of key-value parameters using the \class{Parameter} class, so that this information is structured and stored with the provenance information (and can thus be queried simultaneously). In the case of a processing activity that cleans an image with a sigma-clipping method, the input and output images would be entities, while the number of sigma used for the clipping could be a parameter instead of an entity. We may also want to define a 3-sigma-clipping activity where this parameter is fixed to 3.


%For example for observations, the \emph{ambient conditions} as well as
%\emph{instrument characteristics} need to be stored. But they can both be treated
%as additional entities as well.
%Our model can then also take into account that a certain observation
%method requires special ambient conditions, already defined via the
%ActivityDescription (e.g. radio observations rely on different ambient
%conditions than observations
%of gamma rays), just following our data -- data description scheme.
%Ambient conditions are recorded for a certain time (startTime, endTime) and are
%usually only valid for a certain time interval. This time interval should be recorded
%with a \emph{validity}-attribute for such entities.
%
%In contrast to ambient conditions, instrument characteristics do (usually) not
%change from one observation to the other, so they are static, strictly related to
%the instrument.
%All the characteristics could be described either as key-value pairs directly with the
%observation (as attributes) or just as datasets, using the \class{Entity} class.
%One would then
%link the instrument characteristics as a type of input (or output?) dataset to a certain
%observation activity. Thus we don't need a separate Instrument or Device class.

%\note{One should also keep in mind that some instrument related parameters can change within time,
%e.g. the CCD temperature. The instruments can also change within time because of aging.}



\subsubsection{Agent}\label{sec:w3c-agent}

An \class{Agent} describes someone who is responsible for a certain task or
entity, e.g. someone who pressed a button,
ran a script, performed the observation or published a dataset.
The agent can be a single person, a group of persons (e.g. MUSE WISE Team), a
project (CTA) or an institute.
This is also reflected in the IVOA Dataset Metadata Model, where \class{Party}
represents an agent and has two types, \class{Individual} and \class{Organization},
which are explained in more detail in Table~\ref{tab:agent-types} (see also Section~\ref{sec:dmlinks} for a comparison between \class{Agent} and \class{Party}).
Both agent types are also used in the W3C Provenance Data Model, though
\class{Individual} is called \class{Person} there.
We decided not to include the type \class{SoftwareAgent} from W3C (yet), since it is not required for our current use cases. This may change in the future.

\begin{table}[h]
\small
\tymax 0.5\textwidth
\begin{center}
\begin{tabulary}{1.0\textwidth}{@{}lllL@{}}
\multicolumn{4}{c}{\textbf{AgentType}}\\
\toprule
\head{Class or type} & \head{W3C ProvDM} & \head{DatasetDM} &\head{Comment} \\
\midrule
Agent & Agent & Party & \\
Individual & Person & Individual & a person, specified by name, email, address
(though all of these may change over time)\\
Organization & Organization & Organization & a publishing house, institute or scientific project\\


\bottomrule
\end{tabulary}
\caption{Agent class and types of agents/subclasses in this data model, compared to W3C ProvDM and DatasetDM.}
\label{tab:agent-types}
\end{center}
\end{table}

\begin{table}[h]
\small
\tymax 0.5\textwidth
\begin{center}
\begin{tabulary}{1.0\textwidth}{@{}llp{2cm}L@{}}
\multicolumn{4}{c}{\textbf{Agent}}\\
\toprule
\head{Attribute} & \head{W3C ProvDM} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & prov:id & (qualified) string & unique identifier for an agent\\
\textbf{name} & prov:name & string & a common name for this agent; e.g. first name and last name; project name, agency name...\\
type & prov:type & string & type of the agent: either Individual (Person) or Organization\\
% insert here the attributes dedicated to contact for a Party in DataSet Metadata DM.
% \hline
% \multicolumn{4}{l}{Additional optional attributes from Dataset.Party subclasses:}\\
% \hline
% address & & string & Address of the agent both for Individual (Person) and Organization\\
% phone & & string & Contact phone number of the agent both for Individual (Person) and Organization\\
% email & & string & Contact email of the agent both for Individual (Person) and Organization\\
\bottomrule
\end{tabulary}
\caption{Agent attributes}
\label{tab:agent-attributes}
\end{center}
\end{table}



A definition of organizations is given in the
IVOA Recommendation on Resource Metadata \citep{std:ResourceMeta}, hereafter
referred to as RM: ``An organisation is [a] specific type of resource that
brings people together to pursue participation in VO applications.''
It further specifies that scientific projects can be considered
as organisations on a finer level:
``At a high level, an organisation could be a university, observatory, or government
agency. At a finer level, it could be a specific scientific project, space mission,
or individual researcher. A provider is an organisation that makes data and/or services
available to users over the network.''



For each agent a \emph{name} should be specified; a summary of the attributes for \class{Agent} is given in Table~\ref{tab:agent-attributes}.
One could also add the optional attributes \emph{address}, \emph{phone} and \emph{email} (compare with subclasses of \emph{Party} in Section~\ref{sec:dmlinks}). However, we skip them here in this main class, since an advanced system may use permanent identifiers (e.g. ORCIDs) to identify agents and retrieve their properties from an external system.
It would also increase the value of the given
information if the (current) affiliation of the agent (and a project leader/group
leader) were specified, in order to maximize the chance of finding a contact
person later on.
The contact information is needed in case more information about a certain step in the history of a dataset is required,
but also in order
to know who was involved and to fulfill our ``Attribution'' requirement
(Section~\ref{sec:requirements}), so that proper credit is given to the right
people/projects.



It is desirable to have at least one agent given for each activity (and entity), but this
is not enforced.
% , hence the multiplicity between \class{Entity}/\class{Activity} and the relations
%to the \class{Agent} starts with 0.
There can also be more than one agent for each activity/entity, with different \emph{roles},
and one agent can be responsible for more than one activity or entity. This
many-to-many relationship is made explicit in our model by adding the
following two relation classes:

\begin{itemize}
\item wasAssociatedWith: relates an \emph{activity} to an agent
\item wasAttributedTo: relates an \emph{entity} to an agent
\end{itemize}

Here we adopted the same naming scheme as used in W3C ProvDM.
Note that the agent to which a dataset is attributed may be different from the
agent that is associated with the activity that created the entity.
Someone who is performing a task is not necessarily given full attribution,
especially if they act on behalf of someone else (the project, university, ...).
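
The two relation classes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class layout, the optional \texttt{role} attribute on each relation, the agent names and the helper functions are hypothetical and are not prescribed by this data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Agent:
    id: str
    name: str
    type: Optional[str] = None  # "Individual" or "Organization"


@dataclass(frozen=True)
class WasAssociatedWith:
    """Relates an activity to an agent."""
    activity_id: str
    agent: Agent
    role: Optional[str] = None


@dataclass(frozen=True)
class WasAttributedTo:
    """Relates an entity to an agent."""
    entity_id: str
    agent: Agent
    role: Optional[str] = None


def agents_of_activity(links: List[WasAssociatedWith], activity_id: str) -> List[str]:
    """Names of all agents associated with the given activity."""
    return [l.agent.name for l in links if l.activity_id == activity_id]


def agents_of_entity(links: List[WasAttributedTo], entity_id: str) -> List[str]:
    """Names of all agents the given entity is attributed to."""
    return [l.agent.name for l in links if l.entity_id == entity_id]


# Hypothetical agents and links, made up for illustration.
alice = Agent("agent:alice", "Alice Example", "Individual")
project = Agent("agent:project", "Example Survey Project", "Organization")

associations = [
    WasAssociatedWith("observation_42", alice, role="observer"),
    WasAssociatedWith("observation_42", project, role="coordinator/PI"),
]
attributions = [
    # The dataset is attributed to the project, not to the person
    # who performed the observation on its behalf.
    WasAttributedTo("image_42.fits", project, role="publisher"),
]
```

In this example the observer is associated with the activity but the resulting entity is attributed only to the project, which mirrors the distinction drawn above.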


To clarify what an agent is useful for, Table~\ref{tab:agent-roles} suggests
possible roles an agent can have (along with descriptions partially taken from RM).
For comparison, SimDM contains the following roles for its contacts:
owner, creator, publisher and contributor. Note that the \emph{Party} classes in DatasetDM and SimDM are very similar to the \emph{Agent} class, which is explained in more detail in Section~\ref{sec:dmlinks}.


\begin{table}[h]
\small
\tymax 0.5\textwidth
\begin{center}
\begin{tabulary}{1.0\textwidth}{@{}lp{3cm}L@{}}
\multicolumn{3}{c}{\textbf{AgentRoles}}\\
\toprule
\head{role} & \head{type or sub class} & \head{Comment} \\
\midrule
author & Individual & someone who wrote an article, software, proposal\\
contributor & Individual & someone who contributed to something (but not enough to gain authorship)\\
editor & Individual & editor of e.g. an article, before publishing\\
creator & Individual & someone who created a dataset; creators of articles or software are rather called ``author''\\
curator & Individual & someone who checked and corrected a dataset before publishing\\
publisher & Organization {(maybe also Individual?)}& organization (publishing house, institute) that published something\\
observer & Individual & observer at the telescope\\
operator & Individual & someone performing a given task \\ % removed executor: ambiguous
coordinator/PI & Individual & someone coordinating/leading a project\\ % we should choose one word : PI?
funder & Organization & agency or sponsor for a project as in Prov-N\\
provider & Organization & ``an organization that makes data and/or services available to users over the network'' (definition from RM)\\
%(owner) & voprov:Individual or voprov:Organization & Does anyone really own the data?\\
\bottomrule
\end{tabulary}
\caption{Examples of agent roles and the typical type of agent for each role}
\label{tab:agent-roles}
\end{center}
\end{table}

%\TODO{\textbf{Mireille + Fran\c{c}ois}: Go through these roles, pick only the necessary ones, crosscheck with other data models.}

This list is \emph{not} complete. We consider providing a vocabulary list for these roles
in a future version of this model, collected from (future) implementations of this model.

%\TODO{Do we have a specific use case for fixing the agent-roles? Is anyone
%going to search for specific roles in the Provenance meta-data?
%Or shall we leave it open, which roles can be defined and just give examples here?}
% ... Yes, just give examples here. Should have a vocabulary list somewhere ...

%\subsubsection{Shortcuts: WasDerivedFrom and WasInformedBy}\label{sec:shortcuts}
%The classes \class{WasDerivedFrom} and \class{WasInformedBy} can be used as ``shortcuts'' and
%are used in the same way as the corresponding W3C classes.

%\class{WasDerivedFrom} defines the relation that links two entities together, if one entity was derived
%from the other entity. In principle, one can find this information also by tracing the
%history of an entity backwards to the generating activity and its input entities.
%The descriptions for activity, entity and their relations should provide enough
%information to find the progenitor entity from which an entity was derived.
%Nevertheless, we include \class{WasDerivedFrom} for those cases where an explicit
%link between an entity and its progenitor is useful (e.g. for speeding up searches for
%progenitors or if the activity in between is not important).

%The class \class{WasInformedBy} links two activities together without defining the
%intermediate entities that may have been exchanged. This is useful for e.g. pipelines,
%if the intermediate entities don't play a major role or only exist temporarily, so that
%their provenance information is not deemed to be important enough to be recorded.
%``WasInformedBy'' relation (also called ``Communication'' relation, borrowed from W3C's model)


%\subsection{Implementation hints}