% /trunk/projects/dm/provenance/ProvDM/doc/datamodel-description.tex
% Revision 5691, Fri Nov 15 16:39:50 2019 UTC, by mathieu.servillat:
% update intro phrase on FAIR principles, update agent role description, update figures
\subsection{Overview and class diagram}
\label{sec:overview}


\begin{figure}[hbt]
\centering
\includegraphics[width=1.0\textwidth]{PROV_Fig3.png}
\caption[Overview class diagram of the IVOA Provenance Data Model]{Overview class diagram of the IVOA Provenance Data Model. The core part in yellow is based on W3C PROV definitions where relations are shown in grey. It is extended by a description part (orange), specific types of entities (red) and an \class{ActivityConfiguration} package (green). A full diagram with attributes is shown in Appendix~\ref{sec:fulldiagram}, Figure~\ref{fig:fulldiagram}.}
\label{fig:overview}
\end{figure}

The IVOA Provenance DM is based on the PROV-DM recommendation \citep{std:W3CProvDM} of the World Wide Web Consortium (W3C), which provides the core elements of the model (see Sections~\ref{sec:ent_act} to~\ref{sec:agent+relations}).
In the VO context, the provenance of something is thus a sequence of activities, run by agents, that use and generate entities.

In addition, the model includes description classes (see Section~\ref{sec:descriptions}) that provide information common to several elements; specific types of \class{Entity} classes commonly used in astronomy (see Section~\ref{sec:spec_entities}); and an \class{ActivityConfiguration} package (see Section~\ref{sec:configuration}).

The IVOA Provenance DM is a class data model that follows the VO-DML design rules \citep{2018ivoa.spec.0910L}. It is represented as a UML class diagram: an overview diagram is shown in Figure~\ref{fig:overview}, and a full diagram with attributes is shown in Appendix~\ref{sec:fulldiagram}, Figure~\ref{fig:fulldiagram}.


\subsection{Entity and Activity classes}
\label{sec:ent_act}

The core classes and relations of the IVOA Provenance DM are presented in Figure~\ref{fig:coreclasses}.
Traceability (see goal A in Section~\ref{sec:goals}) is enabled by chaining entities and activities, which are the building blocks of the history graph.


\begin{figure}[ht]
\centering
\includegraphics[width=1.0\textwidth]{PROV_Fig4.png}
\caption[Core classes and relations]{Core classes and relations. Attributes for these classes are detailed in tables found in Sections~\ref{sec:ent_act} to~\ref{sec:agent+relations}.}
\label{fig:coreclasses}
\end{figure}



\subsubsection{Entity and Collection classes}
\label{sec:Entity}

An \textbf{entity} is a physical, digital, conceptual, or other kind of thing with some fixed aspects (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-entity}{\S5.1.1}).

The \class{Entity} class in the model has the attributes given in Table~\ref{tab:entity}.

Entities in astronomy are usually astronomical or astrophysical datasets in the form of images, tables, numbers, etc. But they can also be log files, files containing system information, any input or output value, environment variables, ambient conditions, or, in a wider sense, observation proposals, scientific articles, or manuals and other documents.
Though the focus in this document is on digital entities, entities can also be physical things that may be linked to digital entities, such as tools, instruments, detectors, or photographic plates.


\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize Entity}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{llL}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & string & a unique identifier for this entity\\
name & string & a human-readable name for the entity\\
%a provenance type, i.e.~one of: prov:collection, prov:bundle, prov:plan; or any of the specialized entities defined in Section~\ref{sec:spec_entities} \\
%description\_ref & foreign key/url & link to \class{EntityDescription}\\
location & string & a path or spatial coordinates, e.g., a URL, latitude-longitude coordinates on Earth, the name of a place.\\
%value & prov:value & & provides a value that is a direct representation of the entity \\
generatedAtTime & datetime & date and time at which the entity was created (e.g., timestamp of a file)\\
invalidatedAtTime & datetime & date and time of invalidation of the entity. After that date, the entity is no longer available for any use.\\
%date and time of the destruction or cessation of the entity. The entity is no longer available for use (or further invalidation) after invalidation.\\
comment & string & text containing specific comments on the entity\\
%rights & -- & string & access rights for the entity, values: public, secure or proprietary; see Curation.Rights, RightsType in DatasetDM\\
%\midrule
%$\rightarrow$ description & & link & link to \class{EntityDescription}\\
%$\rightarrow$ wasAttributedTo & prov:wasAttributedTo & link & link to \class{WasAttributedTo} for linking with a responsible \class{Agent}\\
\bottomrule
\end{tabulary}
\caption[Attributes of the \class{Entity} class]{Attributes of the \class{Entity} class. Attributes in \textbf{bold} are mandatory and must not be null.
}\label{tab:entity}
\end{table}
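
As a rough illustration of Table~\ref{tab:entity}, the attributes could be held in a small Python container such as the one below. This is only a sketch under stated assumptions: the dataclass layout, the example identifier and the helper method for checking availability are hypothetical and not part of the model; only the attribute names and the mandatory status of \attribute{id} come from the table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Entity:
    """Illustrative container for the attributes listed in the Entity table."""
    id: str                                        # mandatory, must not be null
    name: Optional[str] = None
    location: Optional[str] = None
    generatedAtTime: Optional[datetime] = None
    invalidatedAtTime: Optional[datetime] = None
    comment: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the only mandatory attribute of the table.
        if not self.id:
            raise ValueError("Entity.id is mandatory and must not be null")

    def is_available(self, at: datetime) -> bool:
        """Hypothetical helper: False once the invalidation time has passed."""
        return self.invalidatedAtTime is None or at <= self.invalidatedAtTime


# Example identifier and timestamp are made up for illustration.
raw = Entity(id="ex:raw_image", name="raw_image.fits",
             generatedAtTime=datetime(2019, 11, 15, 16, 39, 50))
```

All optional attributes default to \texttt{None}, which mirrors the fact that only the identifier must be provided.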


A \textbf{collection} is an entity that provides a structure to some constituents that must themselves be entities (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-collection}{\S5.6.1}). These constituents are said to be members of the collection. They are connected to it in the model by a \class{hadMember} relation.


\subsubsection{Activity class}
\label{sec:activity}

An \textbf{activity} is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-Activity}{\S5.1.2}).

The \class{Activity} class in the model has the attributes given in Table~\ref{tab:activity}.

Activities in astronomy include all steps from obtaining data to the reduction
of images and production of new datasets, such as image calibration, bias
subtraction, image stacking, light curve generation from a number of
observations, radial velocity determination from spectra, post-processing steps
of simulations, etc.


\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize Activity}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{llL}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & string & a unique id for this activity\\
name & string & a human-readable name (to be displayed by clients)\\
startTime & datetime & start of an activity\\
endTime & datetime & end of an activity\\
% startTime and endTime are not strictly required -- and in case of a reproducible activity
% they are meaningless. Therefore, I removed the bf here.
% mireille : I do not agree for this change by Ole : They can be unknown but they must be mandatory in order to allow to query on time for an activity, and to tell the execution order.
% ole so what should I put there if the time is not known? And what is the use case of using the temporal execution order instead of the logical one (following the provenance links used/wasgeneratedby)?
comment & string & text containing specific comments on the activity\\
%status & & string & can be used to describe the terminal status of the activity (e.g., completed, aborted, error...)\\
%votype & string & can be either ``activity'' or ``activityFlow''\\
%, used to differentiate between these two class types, if ``activityFlow'' is not implemented as an extra class (and in W3C compatible serializations)\\
%\midrule
%$\rightarrow$ description & & link & link to \class{ActivityDescription}\\
%$\rightarrow$ wasAssociatedWith & prov:wasAssociatedWith & link & link to \class{WasAssociatedWith} for linking with a responsible \class{Agent}\\
\bottomrule
\end{tabulary}
\caption[Attributes of the \class{Activity} class.]{Attributes of the \class{Activity} class. Attributes in \textbf{bold} are mandatory and must not be null.}\label{tab:activity}
%, references are indicated with an arrow ($\rightarrow$).}
\end{table}


\subsection{Entity-Activity relations}
\label{sec:entity-activity-relations}

Each entity is usually the result of an activity, expressed by a link from the entity to its generating activity, and can be used as input for (many) other activities.
Thus the information on whether data is used as input or was produced as output of some activity is given by the \emph{relations} between activities and entities.
Tracking those relations answers one of the main objectives of the model (see goal A in Section~\ref{sec:goals}).


\subsubsection{Used class}

\textbf{Usage} is the beginning of utilizing an entity by an activity. Before usage, the activity had not begun to utilize this entity and could not have been affected by the entity (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-Usage}{\S5.1.4}).

Usage is implemented in the model by a class \class{Used} that connects \class{Activity} to \class{Entity} and contains the attributes in Table~\ref{tab:used}.

For example, an activity ``calibration'' used entities with the roles ``calibration data'' and ``raw images''.

\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize Used}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{llL}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
%\multicolumn{4}{@{}l}{References}\\
%\midrule
%$\rightarrow$ \textbf{activity} & prov:activity & link & link to an \class{Activity} instance\\
%$\rightarrow$ \textbf{entity} & prov:entity & link & link to an \class{Entity} instance\\
%$\rightarrow$ description & & link & link to the corresponding \class{UsedDescription}, if existing\\
%\midrule
%\textbf{id} & prov:id & string & an identifier for this relation\\
role & string & function of the entity with respect to the activity\\
time & datetime & time at which the usage of an entity started\\
\bottomrule
\end{tabulary}
\caption[Attributes of the \class{Used} relation class]{Attributes of the \class{Used} relation class.}
\label{tab:used}
\end{table}

The \attribute{time} of the usage can be specified, and must be between the \attribute{startTime} and the \attribute{endTime} of the corresponding activity.

The \class{Used} class is closely coupled to the \class{Activity} class by a composition (see Section~\ref{sect:Composition}).
Any given entity can be used by more than one activity.
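
The constraint on the usage time can be checked mechanically. The following Python function is a minimal sketch, assuming that an unknown usage time or an unknown activity bound simply imposes no constraint; the function and parameter names are hypothetical and not defined by the model.

```python
from datetime import datetime
from typing import Optional


def usage_time_is_valid(time: Optional[datetime],
                        start_time: Optional[datetime],
                        end_time: Optional[datetime]) -> bool:
    """Return True if the usage time lies within the known activity bounds."""
    if time is None:
        return True               # the usage time is optional
    if start_time is not None and time < start_time:
        return False              # usage cannot start before the activity
    if end_time is not None and time > end_time:
        return False              # usage cannot start after the activity ended
    return True
```

A provenance store could run such a check when a \class{Used} instance is attached to its activity.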


\subsubsection{WasGeneratedBy class}

\textbf{Generation} is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-Generation}{\S5.1.3}).

Generation is implemented in the model by a class \class{WasGeneratedBy} that connects \class{Entity} to \class{Activity} and contains the attributes in Table~\ref{tab:wasgeneratedby}.

For example, the entity ``raw\_image.fits'' was generated by the activity ``observation'' with the role ``raw image''.

\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize WasGeneratedBy}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{llL}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
%\multicolumn{4}{@{}l}{References}\\
%\midrule
%$\rightarrow$ \textbf{entity} & prov:entity & link & link to an \class{Entity} instance\\
%$\rightarrow$ \textbf{activity} & prov:activity & link & link to an \class{Activity} instance\\
%$\rightarrow$ description & & link & link to the corresponding \class{WasGeneratedByDescription}, if existing\\
%\midrule
%\textbf{id} & prov:id & string & an identifier for this relation\\
role & string & function of the entity with respect to the activity\\
%time & prov:time & datetime & time at which the generation of an entity is finished\\
\bottomrule
\end{tabulary}
\caption[Attributes of the \class{WasGeneratedBy} relation class]{Attributes of the \class{WasGeneratedBy} relation class.}
\label{tab:wasgeneratedby}
\end{table}

As the \class{Entity} class has an attribute \attribute{generatedAtTime}, there is no additional time attribute in this relation.

The \class{WasGeneratedBy} relation is closely coupled with the \class{Entity} class via a composition (see Section~\ref{sect:Composition}).
An entity can be generated by at most one activity, so the multiplicity between \class{Entity} and \class{WasGeneratedBy} is 0 or 1.
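
The multiplicity rule lends itself to a simple guard in an implementation. The class below is a hypothetical Python sketch, not something prescribed by the model: it keeps at most one generation link per entity identifier and refuses a second one.

```python
from typing import Dict, Optional, Tuple


class GenerationRegistry:
    """Hypothetical store holding at most one WasGeneratedBy link per entity."""

    def __init__(self) -> None:
        # entity id -> (activity id, role)
        self._generated_by: Dict[str, Tuple[str, str]] = {}

    def add(self, entity_id: str, activity_id: str, role: str = "") -> None:
        """Record a generation link, rejecting a second one for the same entity."""
        if entity_id in self._generated_by:
            raise ValueError(f"{entity_id} already has a generating activity")
        self._generated_by[entity_id] = (activity_id, role)

    def generating_activity(self, entity_id: str) -> Optional[str]:
        """Return the generating activity id, or None if it is not recorded."""
        link = self._generated_by.get(entity_id)
        return None if link is None else link[0]
```

Entities without a recorded generation remain valid, which corresponds to the lower bound of the multiplicity.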


\subsubsection{Roles in Entity-Activity relations}
\label{sec:roles}

The \attribute{role} of an entity within an activity should be provided.
Roles in \class{Entity}-\class{Activity} relations are free text attributes.

The \attribute{role} cannot be an attribute of the \class{Entity} class, since the same entity (e.g., a specific file containing an image) may play different roles with different activities.

In some cases the role is needed to distinguish two input entities. For example, an activity for dark-frame subtraction requires two input images, and it is essential to know which image is the raw image and which one fulfils the role of dark frame.

Several entities may play the same role for an activity. For example, many image entities may be used as science-ready-images for an image stacking process.
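
Both situations, distinct roles for distinct inputs and one role shared by several inputs, can be pictured by grouping the usage links of an activity by role. The snippet below is a minimal Python sketch; the function name and the entity identifiers are invented for illustration, while the role strings follow the examples in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def entities_by_role(used: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the (entity id, role) pairs of one activity by role."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity_id, role in used:
        grouped[role].append(entity_id)
    return dict(grouped)


# Dark-frame subtraction: the role tells the two input images apart.
dark_subtraction_inputs = [("ex:img_001", "raw image"),
                           ("ex:dark_007", "dark frame")]

# Image stacking: several entities share the same role.
stacking_inputs = [("ex:sci_1", "science-ready-image"),
                   ("ex:sci_2", "science-ready-image")]
```

Because the role is stored on the relation rather than on the entity, the same entity identifier may appear under different roles for different activities.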



\subsubsection{WasDerivedFrom relation}

A \textbf{derivation} is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-Derivation}{\S5.2.1}).

Derivation is a relation \class{wasDerivedFrom} in the model, which connects an instance of \class{Entity} to another instance.

For example, the entity ``calibrated\_image.fits'' was derived from the entity ``raw\_image.fits''.

This relation makes it possible to visualize independently the flow of entities, e.g., a dataflow. It does not need a priori a specific class or table in an implementation, but it provides a way to expose partial information that follows the general chain \class{WasGeneratedBy}-\class{Activity}-\class{Used}, where the activity may be an empty instance because it is unknown or irrelevant.
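
When the full chain is recorded, candidate derivation links can be computed instead of stored. The Python function below is a sketch under a strong assumption, namely that every entity used by an activity contributed to every entity the activity generated; this is not guaranteed in general, so the result should be read as candidates only. The function name and data layout are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def candidate_derivations(was_generated_by: Dict[str, str],
                          used: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return (derived entity, source entity) pairs implied by the chain.

    was_generated_by maps an entity id to its generating activity id;
    used is a sequence of (activity id, entity id) pairs.
    """
    inputs: Dict[str, List[str]] = {}
    for activity_id, entity_id in used:
        inputs.setdefault(activity_id, []).append(entity_id)
    return {(derived, source)
            for derived, activity_id in was_generated_by.items()
            for source in inputs.get(activity_id, [])}
```

With the example above, a ``calibration'' activity that used ``raw\_image.fits'' and generated ``calibrated\_image.fits'' yields the single pair linking the calibrated image to the raw image.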


\subsubsection{WasInformedBy relation}

\textbf{Communication} is the exchange of information (some unspecified entity) by two activities, one activity using some entity generated by the other (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-Communication}{\S5.1.5}).

Communication is a relation \class{wasInformedBy} in the model, which connects an instance of \class{Activity} to another instance.

For example, the activity ``calibration'' was informed by the activity ``pipeline''.

This relation makes it possible to visualize independently the flow of activities as they occurred, which may be the result of the execution of a workflow. It does not need a priori a specific class or table in an implementation, but it provides a way to expose partial information that follows the general chain \class{Used}-\class{Entity}-\class{WasGeneratedBy}, where the entity may be an empty instance because it is unknown or irrelevant.
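
Conversely, when usage and generation links are available, communication links between activities can be derived from them. The following Python function is only an illustrative sketch with hypothetical names: an activity is reported as informed by another one whenever it used an entity that the other activity generated.

```python
from typing import Dict, Iterable, Set, Tuple


def candidate_communications(used: Iterable[Tuple[str, str]],
                             was_generated_by: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Return (informed activity, informing activity) pairs.

    used is a sequence of (activity id, entity id) pairs;
    was_generated_by maps an entity id to its generating activity id.
    """
    pairs: Set[Tuple[str, str]] = set()
    for activity_id, entity_id in used:
        informant = was_generated_by.get(entity_id)
        # Skip entities of unknown origin and self-references.
        if informant is not None and informant != activity_id:
            pairs.add((activity_id, informant))
    return pairs
```

The intermediate entity is dropped from the result, which corresponds to the case where it is unknown or irrelevant.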


\subsection{Agent and relations to Agent}
\label{sec:agent+relations}

Contact information is needed in case more information about a certain activity or entity is required, but also in order to know who was involved and to fulfil the Acknowledgement objective (see goal B in Section~\ref{sec:goals}).


\subsubsection{Agent class}
\label{sec:agent}

An \textbf{agent} is something that bears some form of responsibility for an activity taking place or for the existence of an entity (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-agent}{\S5.3.1}).

The \class{Agent} class in the model has the attributes given in Table~\ref{tab:agent}.

An agent is generally someone who pressed a button, ran a script, performed the observation or published a dataset. The agent can be a single person, a group of persons, a project or an institute (the vocabulary is defined in Table~\ref{tab:agent-types}).

It is recommended to use organizational agents and agents with generic contacts.


\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize Agent}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{llL}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
\textbf{id} & string & unique identifier for an agent\\
\textbf{name} & string & a common name for this agent; e.g., first name and last name; project name, pipeline team, data center.\\
type & AgentType & type of the agent as given in Table~\ref{tab:agent-types}\\
comment & string & text containing specific comments on the agent\\
email & string & contact email of the agent\\
affiliation & string & affiliation of the agent\\
phone & string & phone number\\
address & string & address of the agent\\
url & anyURI & reference URL to the agent\\
% insert here the attributes dedicated to contact for a Party in DataSet Metadata DM.
% \hline
% \multicolumn{4}{l}{Additional optional attributes from Dataset.Party subclasses:}\\
% \hline
% address & & string & Address of the agent both for Individual (Person) and Organization\\
% phone & & string & Contact phone number of the agent both for Individual (Person) and Organization\\
% email & & string & Contact email of the agent both for Individual (Person) and Organization\\
\bottomrule
\end{tabulary}
\caption[Attributes of the \class{Agent} class]{Attributes of the \class{Agent} class. Attributes in \textbf{bold} are mandatory and must not be null.}
\label{tab:agent}
\end{table}

\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize AgentType}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{lp{8cm}}
\toprule
\head{Type} &\head{Description} \\
\midrule
Person & person agents are people\\
Organization & a social or legal institution, e.g., an institute, a consortium, a project\\
SoftwareAgent & running software, e.g., a cron job or a trigger \\
\bottomrule
\end{tabulary}
\caption[Enumeration of Agent types.]{Enumeration of Agent types.}
\label{tab:agent-types}
\end{table}

% 2018-12 commented
%A definition of organizations is given in the
%IVOA Recommendation on Resource Metadata \citep{std:ResourceMeta}, hereafter
%referred to as RM: ``An organization is [a] specific type of resource that
%brings people together to pursue participation in VO applications.''
%It also specifies further that scientific projects can be considered
%as organizations on a finer level:
%``At a high level, an organization could be a university, observatory, or government
%agency. At a finer level, it could be a specific scientific project, space mission,
%or individual researcher. A provider is an organization that makes data and/or services
%available to users over the network.''

For each agent a \attribute{name} must be specified.
Other attributes can help locate or contact the agent (\attribute{email}, \attribute{affiliation}, \attribute{phone}, \attribute{address}).
Not every project will need them; e.g., an advanced system may use permanent identifiers (ORCIDs, identities in federations, etc.) to identify agents and retrieve their properties from an external system instead.

%It is desired to have at least one agent given for each activity.
There can be more than one agent for each activity and one agent can be responsible for more than one activity or entity, using the relations defined in the following sections.


\subsubsection{WasAssociatedWith class}

An activity \textbf{association} is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-Association}{\S5.3.3}).

Association is implemented in the model by a class \class{WasAssociatedWith} that connects \class{Activity} to \class{Agent} and contains the attributes in Table~\ref{tab:wasassociatedwith}.

For example, the agent ``Max Smith'' was associated with the activity ``observation'' with the role ``Observer''.

\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize WasAssociatedWith}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{llL}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
%\multicolumn{4}{@{}l}{References}\\
%\midrule
%$\rightarrow$ \textbf{agent} & prov:agent & link & link to an \class{Agent} instance\\
%$\rightarrow$ \textbf{activity} & prov:activity & link & link to an \class{Activity} instance\\
%$\rightarrow$ description & & link & link to the corresponding \class{WasGeneratedByDescription}, if existing\\
%\midrule
%\textbf{id} & prov:id & string & an identifier for this relation\\
role & string & function of the agent with respect to the activity\\
\bottomrule
\end{tabulary}
\caption[Attributes of the \class{WasAssociatedWith} relation class]{Attributes of the \class{WasAssociatedWith} relation class.}
\label{tab:wasassociatedwith}
\end{table}


\subsubsection{WasAttributedTo class}

\textbf{Attribution} is the ascribing of an entity to an agent. When an entity is attributed to an agent, this entity was generated by some unspecified activity that in turn was associated to the agent. Thus, this relation is generally useful when the activity is not known, or irrelevant (W3C PROV-DM \href{https://www.w3.org/TR/prov-dm/#term-attribution}{\S5.3.2}).
%The agent therefore bears some responsibility for its existence.

%When an entity is attributed to an agent, this entity was generated by some unspecified activity that in turn was associated to the agent. Thus, this relation is generally useful when the activity is not known, or irrelevant.

Attribution is implemented in the model by a class \class{WasAttributedTo} that connects \class{Entity} to \class{Agent} and contains the attributes in Table~\ref{tab:wasattributedto}.

For example, the entity ``science\_image.fits'' was attributed to the agent ``observatory''.


\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize WasAttributedTo}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{llL}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
%\multicolumn{4}{@{}l}{References}\\
%\midrule
%$\rightarrow$ \textbf{agent} & prov:agent & link & link to an \class{Agent} instance\\
%$\rightarrow$ \textbf{entity} & prov:entity & link & link to an \class{Entity} instance\\
%\midrule
%\textbf{id} & prov:id & string & an identifier for this relation\\
role & string & function of the agent with respect to the entity \\
\bottomrule
\end{tabulary}
\caption[Attributes of the \class{WasAttributedTo} relation class]{Attributes of the \class{WasAttributedTo} relation class.}
\label{tab:wasattributedto}
\end{table}


\subsubsection{Agent roles}

Agents may play a specific role with respect to an activity or an entity.
%For example: telescope observer, pipeline operator, principal investigator, software engineer, project helpdesk.
The \attribute{role} attribute should be specified whenever it is known.

Roles in relations to \class{Agent} are free text attributes, but if one of the terms in Table~\ref{tab:agent-roles} applies, it should be used.

% DataCite roles, see https://schema.datacite.org/meta/kernel-4.2/doc/DataCite-MetadataKernel_v4.2.pdf
% ContactPerson DataCollector DataCurator DataManager Distributor Editor HostingInstitution Producer ProjectLeader ProjectManager ProjectMember RegistrationAgency RegistrationAuthority RelatedPerson Researcher ResearchGroup RightsHolder Sponsor Supervisor WorkPackageLeader Other

\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize Agent roles}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{lp{8cm}}
\toprule
\head{Role} & \head{Description} \\
\midrule
Author & the agent was at the origin of a written entity (e.g., article, document, proposal) \\
Contributor & the agent helped in the creation of an entity or execution of an activity \\
Coordinator & the agent was leading the organisation of an activity \\
Creator & the agent created an entity or an activity \\
Curator & the agent was responsible for the legacy aspects of an entity \\
Editor & the agent validated the content of an entity \\
Funder & the agent provided financial support for an activity or the creation of an entity \\
Investigator & the agent was responsible for the scientific goals of an activity \\
Observer & the agent executed an observation activity or was responsible for observing a specific entity \\
Operator & the agent was in charge of performing an activity or using an entity \\
Provider & the agent effectively gave access and delivered the entity \\
Publisher & the agent certified and made an entity available to the public \\
\bottomrule
\end{tabulary}
\caption[Terms applicable as agent roles.]{Terms applicable as agent roles.}
\label{tab:agent-roles}
\end{table}
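
Since agent roles are free text with a recommended vocabulary, a client may wish to map user input onto the terms of Table~\ref{tab:agent-roles} where possible. The Python snippet below is a hedged sketch of one way to do so; the case-insensitive matching and the function name are assumptions made here, not rules stated by the model.

```python
AGENT_ROLES = (
    "Author", "Contributor", "Coordinator", "Creator", "Curator", "Editor",
    "Funder", "Investigator", "Observer", "Operator", "Provider", "Publisher",
)


def normalize_role(role: str) -> str:
    """Return the matching vocabulary term, or the free text unchanged."""
    lookup = {term.lower(): term for term in AGENT_ROLES}
    # Matching ignores case and surrounding whitespace (an assumption).
    return lookup.get(role.strip().lower(), role)
```

Text that matches no term is returned as given, so project-specific roles remain possible.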




\subsection{Description classes}
\label{sec:descriptions}

In the domain of astronomy, certain processes and steps are repeated over and over again, maybe using a different configuration and within a different context.
We therefore separate the descriptions of activities from the actual processes and introduce an \class{ActivityDescription} class (Section~\ref{sec:activity_desc}).
Likewise, we apply the same pattern to \class{Entity} and add an \class{EntityDescription} class (Section~\ref{sec:entity_desc}).

Defining such descriptions allows them to be predefined and reused, which is less redundant when exposing the provenance of a series of tasks of the same type.
Providing detailed descriptions of activities and entities helps assess the quality and reliability of the processes executed (see goal C in Section~\ref{sec:goals}).

Figure~\ref{fig:classdiagram_descriptions} shows the class diagram part focused on the description classes.

\begin{figure}[ht]
\centering
\includegraphics[width=1.0\textwidth]{PROV_Fig5.png}
\caption[Partial class diagram focused on description classes.]{Partial class diagram focused on description classes.}
\label{fig:classdiagram_descriptions}
\end{figure}


\subsubsection{ActivityDescription class}
\label{sec:activity_desc}


\begin{table}[ht]
\small
\tymax 0.5\textwidth
\textbf{\normalsize ActivityDescription}\vspace{0.25em}\\
\begin{tabulary}{1.0\textwidth}{llL}
\toprule
\head{Attribute} & \head{Data type} & \head{Description}\\
\midrule
%\textbf{id} & string & a unique id for this activity description\\
\textbf{name} & string & a human-readable name\\
version & string & a version number, if applicable (e.g., for the code used)\\
description & string & additional free text describing how the activity works internally\\
docurl & anyURI & link to further documentation on this activity, e.g., a paper, the source code in a version control system, etc.\\
type & string & type of the activity\\
subtype & string & more specific subtype of the activity\\
% code & string & the code (software) used for this process, if applicable\\
\bottomrule
\end{tabulary}
\caption[Attributes of the \class{ActivityDescription} class]{Attributes of the \class{ActivityDescription} class. Attributes in \textbf{bold} are mandatory and must not be null.
}\label{tab:activitydescription}
\end{table}


480 mathieu.servillat 5663 The information necessary to describe how an activity works internally is stored in \class{ActivityDescription} objects.
481 mathieu.servillat 5491
482 mathieu.servillat 5616 \class{ActivityDescription} is directly attached to \class{Activity} and can thus be seen as a list of attributes that can be known before an \class{Activity} instance is created.
483 mathieu.servillat 5491
484 mathieu.servillat 5616 There must be zero or one \class{ActivityDescription} instance per activity.
485     If an activity is linked to an \class{ActivityDescription} instance, \class{Used}/\class{WasGeneratedBy}/\class{Entity} objects bound to this activity must refer to the description elements composing the \class{ActivityDescription}.
486 mathieu.servillat 5491
487 mathieu.servillat 5616 %If a \class{Used}/\class{WasGeneratedBy} object is linked to a corresponding description element, then there must exist a link between the related activity and its corresponding activity description.
488    
489    
490 mathieu.servillat 5555 The activity \attribute{type} is a free text attribute, but if one of the terms in Table \ref{tab:activitydescription-types} applies, it should be used.
491 mathieu.servillat 5663 The activity \attribute{subtype} is a free text attribute to be used internally by the project that defined \class{ActivityDescription} instances (e.g., mosaicing, denoising, photometric calibration, cross correlation).
492 mathieu.servillat 5491
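As an illustration only, a project could describe a photometric calibration step with an \class{ActivityDescription} instance such as the one below. All values, including the name and the URL, are hypothetical, and the key/value notation is not a serialization defined by this model; only the attribute names come from Table~\ref{tab:activitydescription}.

\begin{verbatim}
ActivityDescription
  name        = "photometric_calibration"
  version     = "1.2"
  description = "Converts instrumental magnitudes to a standard system"
  docurl      = "https://example.org/pipeline/photometric_calibration"
  type        = "Calibration"
  subtype     = "photometric calibration"
\end{verbatim}

Here \attribute{type} reuses a term from Table~\ref{tab:activitydescription-types}, while \attribute{subtype} is a project-internal label.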
493    
494     \begin{table}[ht]
495     \small
496     \tymax 0.5\textwidth
497     \textbf{\normalsize ActivityDescription types}\vspace{0.25em}\\
498 mathieu.servillat 5555 \begin{tabulary}{1.0\textwidth}{lp{8cm}}
499 mathieu.servillat 5491 \toprule
500 mathieu.servillat 5616 \head{Type} & \head{Description} \\
501 mathieu.servillat 5491 \midrule
502 mathieu.servillat 5616 Observation & active acquisition of information on a phenomenon\\
503     Simulation & generation of data through a computational process\\
504     Reduction & transformation of digital information into a corrected, ordered, and simplified form\\
505     Calibration & transformation and comparison of measurement values with respect to a calibration standard of known accuracy\\
506     Reconstruction & estimation of physical properties using indirect information\\
507     Selection & application of filters or criteria to select partial information\\
508     Analysis & process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making\\
509 mathieu.servillat 5491 \bottomrule
510     \end{tabulary}
511     \caption[Terms applicable as activity types.]{Terms applicable as activity types.}
512 mathieu.servillat 5555 \label{tab:activitydescription-types}
513 mathieu.servillat 5491 \end{table}
514    
515    
516    
517    
518     \subsubsection{EntityDescription class}
519     \label{sec:entity_desc}
520    
521    
522     \begin{table}[ht]
523     \small
524     \tymax 0.5\textwidth
525     \textbf{\normalsize EntityDescription}\vspace{0.25em}\\
526     \begin{tabulary}{\textwidth}{llL}
527     \toprule
528     \head{Attribute} & \head{Data type} & \head{Description}\\
529     \midrule
530     %\textbf{id} & string & a unique identifier for this description\\
531     name & string & a human-readable name for the entity description\\
532     description & string & a descriptive text for this kind of entity\\
533     doculink & anyURI & link to more documentation\\
534     type & string & type of the entity\\
535     % \midrule
536     % \multicolumn{3}{@{}l}{\textbf{Optional attributes:}} \\
537     % content\_type & string & MIME type for the content of the entity\\
538     % format & string & type of container for the entity\\
539     % removed the obscore attributes, since specific for observations only, not applicable to configuration entities etc.
540 mathieu.servillat 5663 % dataproduct\_ type & string & from ObsCore data model \citep{std:ObsCore}, if applicable; describes, what kind of product it is (e.g., image, table)\\
541 mathieu.servillat 5491 % dataproduct\_ subtype & string & from ObsCore data model, more specific subtype\\
542     % level & enum integer & the level of processing or calibration; for ObsCore's calib\_level it is an integer between 0 and 3\\
543     \bottomrule
544     \end{tabulary}
545 mathieu.servillat 5535 \caption[Attributes of the \class{EntityDescription} class]{Attributes of the \class{EntityDescription} class.
546 mathieu.servillat 5491 }\label{tab:entitydescription}
547     \end{table}
548    
549    
550     The \class{EntityDescription} class is meant to store descriptive information for different categories of entities. It contains information that is known before an \class{Entity} instance is created. The \class{EntityDescription} general attributes are summarized in Table~\ref{tab:entitydescription}.
551    
552 mathieu.servillat 5663 For example, a specific category of entities in a project may be defined in detail in a document or on a webpage (e.g., a CTA DL3 file, a CCD device, a photographic plate).
553 mathieu.servillat 5491
554 mathieu.servillat 5663 The entity \attribute{type} is a free text attribute that contains the general category of the entity, e.g., data, a document, a visualization, or a device.
555 mathieu.servillat 5491
556     The \class{EntityDescription} class should not contain information about the usage of the data; in particular, it generally says nothing about whether the data are used as input or generated as output. This kind of information should be provided by the relations (and their descriptions) between activities and entities (see Sections~\ref{sec:entity-activity-relations} and \ref{sec:use_gen_desc}).
557    
558    
559     \subsubsection{UsageDescription and GenerationDescription classes}
560     \label{sec:use_gen_desc}
561    
562     \begin{table}[ht]
563     \small
564     \tymax 0.5\textwidth
565     \textbf{\normalsize UsageDescription}\vspace{0.25em}\\
566     \begin{tabulary}{1.0\textwidth}{llL}
567     \toprule
568     \head{Attribute} & \head{Data type} & \head{Description}\\
569     % \midrule
570     % $\rightarrow$ \textbf{activityDescription} & link & link to \class{ActivityDescription}\\
571     % $\rightarrow$ entityDescription & link & link to \class{EntityDescription}\\
572     \midrule
573     %\textbf{id} & string & identifier\\
574     \textbf{role} & string & function of the entity with respect to the activity \\
575     description & string & a descriptive text for this kind of usage \\
576     type & string & type of relation, see Section~\ref{sec:ugtypes} \\
577 mathieu.servillat 5663 multiplicity & string & Number of expected input entities to be used with the given role. The multiplicity syntax is similar to that of VO-DML (\citealt{2018ivoa.spec.0910L}, \S4.19) in the form ``minOccurs..maxOccurs'' or a single value if minOccurs and maxOccurs are identical, e.g., ``1'' for one item, ``*'' for unbounded or ``3..*'' for unbounded with at least 3 items. \\
578 mathieu.servillat 5491 \bottomrule
579     \end{tabulary}
580 mathieu.servillat 5616 \caption[Attributes of the \class{UsageDescription} class]{Attributes of the \class{UsageDescription} class. Attributes in \textbf{bold} are mandatory and must not be null.}
581 mathieu.servillat 5491 \label{tab:usagedescription}
582     \end{table}
583    
584    
585     \begin{table}[ht]
586     \small
587     \tymax 0.5\textwidth
588     \textbf{\normalsize GenerationDescription}\vspace{0.25em}\\
589     \begin{tabulary}{1.0\textwidth}{llL}
590     \toprule
591     \head{Attribute} & \head{Data type} & \head{Description}\\
592     \midrule
593     %\textbf{id} & string & identifier\\
594     \textbf{role} & string & function of the entity with respect to the activity \\
595     description & string & a descriptive text for this kind of generation \\
596     type & string & type of relation, see Section~\ref{sec:ugtypes} \\
597 mathieu.servillat 5663 multiplicity & string & Number of expected output entities that will be generated with the given role. The multiplicity syntax is similar to that of VO-DML (\citealt{2018ivoa.spec.0910L}, \S4.19) in the form ``minOccurs..maxOccurs'' or a single value if minOccurs and maxOccurs are identical, e.g., ``1'' for one item, ``*'' for unbounded or ``3..*'' for unbounded with at least 3 items. \\
598 mathieu.servillat 5491 % \midrule
599     % $\rightarrow$ \textbf{activityDescription} & link & link to an \class{ActivityDescription}\\
600     % $\rightarrow$ entityDescription & link & link to \class{EntityDescription}\\
601     \bottomrule
602     \end{tabulary}
603 mathieu.servillat 5616 \caption[Attributes of the \class{GenerationDescription} class]{Attributes of the \class{GenerationDescription} class. Attributes in \textbf{bold} are mandatory and must not be null.}
604 mathieu.servillat 5491 \label{tab:wasgeneratedbydescription}
605     \end{table}
606    
607    
608     To describe an activity more precisely, its expected inputs and outputs should be specified.
609    
610 mathieu.servillat 5663 We introduce the \class{UsageDescription} and \class{GenerationDescription} classes, which are meant to store the information about the usage or generation of entities that is known before an activity instance is executed, i.e.~what we expect to store in the \class{Used} and \class{WasGeneratedBy} relations (see Section~\ref{sec:entity-activity-relations}).
611 mathieu.servillat 5491 Instances of \class{Used} (respectively \class{WasGeneratedBy}) may thus point to an instance of \class{UsageDescription} (respectively \class{GenerationDescription}).
612    
613     If a \class{UsageDescription} (respectively \class{GenerationDescription}) instance is defined, the \attribute{role} attribute of the related \class{Used} (respectively \class{WasGeneratedBy}) instances must match the \attribute{role} attribute of this \class{UsageDescription} (respectively \class{GenerationDescription}) instance.
614    
615 mathieu.servillat 5663 A \attribute{multiplicity} attribute should be specified to indicate the number of entities expected to share the same role for a given \class{ActivityDescription} instance, e.g., in the case of the stacking of images, several images are expected with the same input role (\attribute{multiplicity=*}).
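Read as a constraint (an interpretation offered here, not normative text), a declared multiplicity ``$m$..$n$'' bounds the number $N_r$ of \class{Used} (or \class{WasGeneratedBy}) instances that carry role $r$ for one activity:
\[
  m \le N_r \le n .
\]
A single value ``$k$'' corresponds to $m = n = k$, ``$m$..*'' drops the upper bound, and ``*'' is assumed here to mean $m = 0$ with no upper bound, as in common UML usage. With ``3..*'', for instance, $N_r = 2$ would violate the description while $N_r = 3$ or more would satisfy it.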
616 mathieu.servillat 5491
617 mathieu.servillat 5663 When related to the \class{UsageDescription} or \class{GenerationDescription}, the attributes of \class{EntityDescription} (see Section~\ref{sec:entity_desc}) help to describe the category of entities expected as an input or an output in an activity.
618     For example, if the input bias files are expected to be in FITS format, the \class{UsageDescription} object would have a relation to a \class{DatasetDescription} object with \attribute{contentType}=``application/fits''.
619 mathieu.servillat 5491
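As a sketch of how these descriptions fit together, the image stacking case could be described as follows. The role names and the key/value notation are hypothetical and chosen for illustration only; they are not a serialization defined by this model.

\begin{verbatim}
ActivityDescription  name = "stacking"
  UsageDescription
    role         = "image"
    type         = "Main"
    multiplicity = "*"
    -> DatasetDescription  contentType = "application/fits"
  UsageDescription
    role         = "bias"
    type         = "Calibration"
    multiplicity = "1"
    -> DatasetDescription  contentType = "application/fits"
  GenerationDescription
    role         = "stacked_image"
    type         = "Main"
    multiplicity = "1"
\end{verbatim}

Each \class{Used} instance of a stacking activity would then carry the \attribute{role} ``image'' or ``bias'', matching one of the two \class{UsageDescription} instances.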
620    
621     \subsubsection{Types of Usage and Generation}
622     \label{sec:ugtypes}
623    
624     The typing of those relations is particularly needed to enable quality assessment and identification of error sources in the process (see goals C and D in Section \ref{sec:goals}), so as to facilitate the exploration of provenance information.
625    
626 mathieu.servillat 5663 The type of usage or generation is a free text attribute, but if one of the terms in Table \ref{tab:usage-generation-types} applies, it should be used.
627 mathieu.servillat 5491
628     \begin{table}[ht]
629     \small
630     \tymax 0.5\textwidth
631 mathieu.servillat 5555 \begin{tabulary}{1.0\textwidth}{Lp{8cm}}
632 mathieu.servillat 5491 \toprule
633     \head{Type} & \head{Description} \\
634     \midrule
635 mathieu.servillat 5663 Main & main input or output entities of the activity, i.e.~strictly necessary, and the primary objective of the activity\\
636 mathieu.servillat 5616 Calibration & usage of an entity to calibrate another entity\\
637     Preview & generation of a quick representation of an entity\\
638 mathieu.servillat 5491 Setup & usage of an entity as configuration information, see also Section~\ref{sec:configurationpackage}\\
639 mathieu.servillat 5663 Quality & generation of information that helps to assess the quality of the activity results, e.g., errors, warnings, flags, percentage of overexposed pixels\\
640 mathieu.servillat 5616 Log & generation of logging information\\
641     Context & contextual information that influences the activity, but over which there is little or no control at the moment of its execution, e.g., temperature, wind, conditions of observation, execution platform, operating system, instrumental context\\
642 mathieu.servillat 5491 \bottomrule
643     \end{tabulary}
644     \caption[Terms applicable as usage or generation type.]{Terms applicable as usage or generation type.}
645     \label{tab:usage-generation-types}
646     \end{table}
647    
648 mathieu.servillat 5663 The type ``Main'' indicates the main input and output entities of an activity. It should help to provide the minimum relevant data flow to the initial entity or activity, i.e.~to find the most relevant progenitors.
649 mathieu.servillat 5491
650    
651     \subsection{Specific types of Entity classes}
652     \label{sec:spec_entities}
653    
654     \class{Entity} and \class{EntityDescription} classes carry the minimum metadata that can apply to any kind of entity without specifying the nature or the structure of the content of the entity.
655     In some cases, the structure of the content is relevant information to assess the usefulness of the entity, in particular for datasets.
656    
657     In some other cases, the content itself of an entity is relevant information to assess the usefulness of the related entities or activities. Such content must then be exposed as properly described values.
658    
659     In astronomy and the VO, we thus define two main types of entity classes:
660    
661     \begin{itemize}
662 mathieu.servillat 5663 \item \textbf{Dataset}: a dataset is a resource which encodes data in a defined structure. It is generally a file or a set of files which are considered to be a single deliverable. The content may be e.g., a cube, an image, a table, a list.
663     \item \textbf{Value}: a value is an atomic piece of data with a given value type (e.g., a data type such as boolean, integer, real, string).
664 mathieu.servillat 5491 \end{itemize}
665    
666     \begin{figure}[ht]
667     \centering
668 mathieu.servillat 5529 \includegraphics[width=1.0\textwidth]{PROV_Fig6.png}
669 mathieu.servillat 5535 \caption[Partial class diagram focused on specific types of \class{Entity} classes.]{Partial class diagram focused on specific types of \class{Entity} classes.}
670 mathieu.servillat 5491 \label{fig:classdiagram_entityclasses}
671     \end{figure}
672    
673 mathieu.servillat 5625 As shown in Figure~\ref{fig:classdiagram_entityclasses}, the entity description classes for both \class{ValueEntity} and \class{DatasetEntity} are subsetted respectively as \class{ValueDescription} and \class{DatasetDescription}.
674 mathieu.servillat 5491
675 mathieu.servillat 5663 We anticipate that more specific categories of entities can be defined by the projects (for example, a device, a document, a visualization). The \attribute{type} attribute of the \class{EntityDescription} class should be used to distinguish between these categories of entities.
676 mathieu.servillat 5491
677 mathieu.servillat 5535
678 mathieu.servillat 5491 \subsubsection{DatasetEntity and DatasetDescription classes}
679    
680     The handling of datasets is implemented in the model by a \class{DatasetEntity} class. A corresponding \class{DatasetDescription} class contains a \attribute{contentType} attribute that must not be null (see Table~\ref{tab:datasetdescription}).
681    
682 mathieu.servillat 5535 The \attribute{contentType} indicates the MIME-type or format of a dataset, or a more precise structure, following the definition of the attribute \attribute{access\_format} defined in ObsCoreDM (\citet{2017ivoa.spec.0509L}, Section 4.7).
683 mathieu.servillat 5491
684     \begin{table}[ht]
685     \small
686     \tymax 0.5\textwidth
687     \textbf{\normalsize DatasetDescription}\vspace{0.25em}\\
688     %\begin{tabulary}{1.0\textwidth}{@{}p{2.5cm}p{0cm}lL@{}}
689     \begin{tabulary}{1.0\textwidth}{llL}
690     \toprule
691     \head{Attribute} & \head{Data type} & \head{Description}\\
692     \midrule
693 mathieu.servillat 5616 \textbf{contentType} & string & format of the dataset, MIME type when applicable \\
694 mathieu.servillat 5491 \bottomrule
695     \end{tabulary}
696 mathieu.servillat 5616 \caption[Attributes of the \class{DatasetDescription} class]{Attributes of the \class{DatasetDescription} class. The class also inherits the attributes of \class{EntityDescription} listed in Table \ref{tab:entitydescription}. Attributes in \textbf{bold} are mandatory and must not be null.}
697 mathieu.servillat 5491 \label{tab:datasetdescription}
698     \end{table}
699    
700    
701     \subsubsection{ValueEntity and ValueDescription classes}
702    
703     The handling of values is implemented in the model by a \class{ValueEntity} class that contains a \attribute{value} attribute. A corresponding \class{ValueDescription} class contains attributes commonly used in the VO to qualify values. Those attributes are listed in Table~\ref{tab:valuedescription}.
704    
705     \begin{table}[ht]
706     \small
707     \tymax 0.5\textwidth
708     \textbf{\normalsize ValueEntity}\vspace{0.25em}\\
709     %\begin{tabulary}{1.0\textwidth}{@{}p{2.5cm}p{0cm}lL@{}}
710     \begin{tabulary}{1.0\textwidth}{llL}
711     \toprule
712     \head{Attribute} & \head{Data type} & \head{Description}\\
713     \midrule
714     \textbf{value} & string & the value of the entity. If a corresponding \class{ValueDescription}.\attribute{valueType} attribute is set, the value string can be interpreted by this \attribute{valueType}. \\
715     \bottomrule
716     \end{tabulary}
717 mathieu.servillat 5616 \caption[Attributes of the \class{ValueEntity} class]{Attributes of the \class{ValueEntity} class. The class also inherits the attributes of \class{Entity} (see Section~\ref{sec:ent_act}). Attributes in \textbf{bold} are mandatory and must not be null.}
718 mathieu.servillat 5491 \label{tab:valueentity}
719     \end{table}
720    
721     \begin{table}[ht]
722     \small
723     \tymax 0.5\textwidth
724     \textbf{\normalsize ValueDescription}\vspace{0.25em}\\
725     %\begin{tabulary}{1.0\textwidth}{@{}p{2.5cm}p{0cm}lL@{}}
726     \begin{tabulary}{1.0\textwidth}{p{2cm}LL}
727     \toprule
728     \head{Attribute} & \head{Data type} & \head{Description}\\
729     \midrule
730     %\textbf{id} & string & parameter unique identifier\\
731 mathieu.servillat 5663 \textbf{valueType} & VOTableType & combination of \attribute{datatype}, \attribute{arraysize} and \attribute{xtype} following VOTable 1.3 \citep[][\S4.1]{2013ivoa.spec.0920O} \\
732 mathieu.servillat 5616 unit & Unit & VO unit, see Section~\ref{sect:Units} and \citet{2014ivoa.spec.0523D} for recommended unit representation \\
733 mathieu.servillat 5512 ucd & string & Unified Content Descriptor, supplying a standardized classification of the physical quantity, see \citet{2018ivoa.spec.0527M}\\
734 mathieu.servillat 5491 utype & string & Utype, meant to express the role of the value in the context of an external data model, see \citet{note:utypeusage} \\
735     \bottomrule
736     \end{tabulary}
737 mathieu.servillat 5616 \caption[Attributes of the \class{ValueDescription} class]{Attributes of the \class{ValueDescription} class. The class also inherits the attributes of \class{EntityDescription} listed in Table \ref{tab:entitydescription}. Attributes in \textbf{bold} are mandatory and must not be null.}
738 mathieu.servillat 5491 \label{tab:valuedescription}
739     \end{table}
740    
741    
742    
743 mathieu.servillat 5616 \subsection{Activity configuration}
744     \label{sec:configuration}
745 mathieu.servillat 5491
746 mathieu.servillat 5616 Configuring an activity is the way to set parameters so that the activity occurs in the desired conditions.
747 mathieu.servillat 5491
748 mathieu.servillat 5616 In some cases developed in Section~\ref{sec:goals} (goals C and D in particular), configuration information is relevant to assess the quality and reliability of an activity or an entity, and to identify the location of configuration errors in a processing. It also facilitates the re-execution of an activity (reproducibility).
749 mathieu.servillat 5491
750 mathieu.servillat 5663 Configuration information may be carried by entities using the core features, where an entity (e.g., a \class{ValueEntity} or \class{DatasetEntity} instance) is referenced in \class{Used} relations with a given \attribute{role} and \attribute{type}=``Setup''. With this solution, the configuration information is independent from the activity and can be generated and used as any entity.
751 mathieu.servillat 5491
752 mathieu.servillat 5616 The data model also provides a specialized \class{ActivityConfiguration} package to directly attach configuration information to an activity. This package is composed of a \class{WasConfiguredBy} relation connecting \class{Parameter} and \class{ConfigFile} classes with the \class{Activity} class (see Section~\ref{sec:configurationpackage}). With this solution, the configuration information is independent from the entities and seen as part of the activity.
753 mathieu.servillat 5491
754    
755 mathieu.servillat 5616 \begin{figure}[hbt]
756     \centering
757     \includegraphics[width=1.0\textwidth]{PROV_Fig7.png}
758     % Mireille: updated the diagram file for the last version with the proper cardinalities for Parameter and ConfigFile
759     \caption[Partial class diagram focused on the \class{ActivityConfiguration} package.]{Partial class diagram focused on the \class{ActivityConfiguration} package. The \class{Parameter} and \class{ConfigFile} classes provide configuration information for an \class{Activity} instance. The right side of the diagram shows the descriptions, where an \class{ActivityDescription} class is bound with the \class{ParameterDescription} and \class{ConfigFileDescription} classes.}
760     \label{fig:activityconfig}
761     \end{figure}
762 mathieu.servillat 5491
763    
764 mathieu.servillat 5616 \subsubsection{Overview of the ActivityConfiguration package} \label{sec:configurationpackage}
765 mathieu.servillat 5491
766 mathieu.servillat 5616 As shown in Figure~\ref{fig:activityconfig}, the \class{ActivityConfiguration} package contains two classes for the execution side, \class{Parameter} and \class{ConfigFile}, which are connected to an \class{Activity} instance via the \class{WasConfiguredBy} association class.
767 mathieu.servillat 5663 An \class{Activity} may thus be configured by a set of \class{Parameter} instances, by \class{ConfigFile} instances, or by a combination of both.
768 mathieu.servillat 5616
769     The corresponding description classes, \class{ParameterDescription} and \class{ConfigFileDescription}, are both defined in the context of the description of an activity.
770     There can be several instances of a \class{Parameter} (respectively \class{ConfigFile}) that are described by the same instance of \class{ParameterDescription} (respectively \class{ConfigFileDescription}).
771    
772    
773     \subsubsection{Parameter and ParameterDescription classes}
774     \label{sec:parameterandD}
775    
776     \begin{table}[ht]
777     \small
778     \tymax 0.5\textwidth
779     \textbf{\normalsize Parameter}\vspace{0.25em}\\
780     \begin{tabulary}{1.0\textwidth}{llL}
781     \toprule
782     \head{Attribute} & \head{Data type} & \head{Description}\\
783     \midrule
784     %\textbf{id} & string & a unique id\\
785     \textbf{name} & string & name of the parameter \\
786     \textbf{value} & string & the value of the parameter. If a corresponding \class{ParameterDescription}.\attribute{valueType} attribute is set, the value string can be interpreted by this \attribute{valueType}. \\
787     \bottomrule
788     \end{tabulary}
789     \caption[Attributes of the \class{Parameter} class]{Attributes of the \class{Parameter} class. Attributes in \textbf{bold} are mandatory and must not be null.}
790     \label{tab:param}
791     \end{table}
792    
793     \begin{table}[ht]
794     \small
795     \tymax 0.5\textwidth
796     \textbf{\normalsize ParameterDescription}\vspace{0.25em}\\
797     \begin{tabulary}{1.0\textwidth}{lLL}
798     \toprule
799     \head{Attribute} & \head{Data type} & \head{Description}\\
800     \midrule
801     %\textbf{id} & string & unique ParemeterDescription identifier\\
802     \textbf{name} & string & name of the parameter \\
803 mathieu.servillat 5663 \textbf{valueType} & VOTableType & combination of \attribute{datatype}, \attribute{arraysize} and \attribute{xtype} following VOTable 1.3 \citep[][\S4.1]{2013ivoa.spec.0920O} \\
804 mathieu.servillat 5616 description & string & a descriptive text for the parameter \\
805     unit & Unit & VO unit, see Section~\ref{sect:Units} and \citet{2014ivoa.spec.0523D} for recommended unit representation \\
806     ucd & string & Unified Content Descriptor, supplying a standardized classification of the physical quantity, see \citet{2018ivoa.spec.0527M} \\
807     utype & string & Utype, meant to express the role of the parameter in the context of an external data model, see \citet{note:utypeusage} \\
808     %xtype & string & extended datatype as in VOTable 1.2 and above. A list of proposed \\
809     % \midrule
810     % \multicolumn{3}{@{}l}{\textbf{Optional attributes:}} \\
811     min & string & minimum value as a string whose value can be interpreted by the \attribute{valueType} attribute \\
812     max & string & maximum value as a string whose value can be interpreted by the \attribute{valueType} attribute\\
813 mathieu.servillat 5663 options & array of strings & array of possible values\\
814 mathieu.servillat 5616 default & string & the default value of the parameter as a string whose value can be interpreted by the \attribute{valueType} attribute \\
815     \bottomrule
816     \end{tabulary}
817     \caption[Attributes of the \class{ParameterDescription} class]{Attributes of the \class{ParameterDescription} class. Attributes in \textbf{bold} are mandatory and must not be null.}
818     \label{tab:Paramdescription}
819     \end{table}
820    
821     The \class{Parameter} class contains a \attribute{value} and a \attribute{name} attribute that must be set (Table~\ref{tab:param}).
822    
823     The \class{ParameterDescription} class describes the parameter \attribute{value} attribute similarly to the \class{ValueEntity} and \class{ValueDescription} classes. Those attributes are listed in Table~\ref{tab:Paramdescription}.
824    
825     If a \class{ParameterDescription} instance is defined, the \attribute{name} attribute of the related \class{Parameter} instances must match the \attribute{name} attribute of this \class{ParameterDescription} instance.
826    
827     The \class{Parameter} instance may refer to a \class{ValueEntity} instance using a \textit{hadReference} relation which gives the origin of the parameter value.
828    
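For illustration, a \class{ParameterDescription} and a matching \class{Parameter} could look as follows. The values, the bounds and the key/value notation are hypothetical; only the attribute names come from Tables~\ref{tab:param} and~\ref{tab:Paramdescription}.

\begin{verbatim}
ParameterDescription
  name      = "nbofChannels"
  valueType = datatype "int"
  ucd       = "meta.number"
  min       = "1"
  max       = "1024"
  default   = "64"

Parameter
  name  = "nbofChannels"    (matches ParameterDescription.name)
  value = "128"             (string, interpreted as int)
\end{verbatim}

The \attribute{value} is stored as a string and only acquires a numerical meaning through the \attribute{valueType} of the description.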
829    
830     \subsubsection{ConfigFile and ConfigFileDescription classes}
831    
832     \begin{table}[ht]
833     \small
834     \tymax 0.5\textwidth
835     \textbf{\normalsize ConfigFile}\vspace{0.25em}\\
836     \begin{tabulary}{1.0\textwidth}{llL}
837     \toprule
838     \head{Attribute} & \head{Data type} & \head{Description}\\
839     \midrule
840     \textbf{name} & string & a human-readable name for the config file \\
841 mathieu.servillat 5663 \textbf{location} & string & a path to the config file, e.g., a URL \\
842 mathieu.servillat 5616 comment & string & text containing comments on the config file \\
843     \bottomrule
844     \end{tabulary}
845     \caption[Attributes of the \class{ConfigFile} class]{Attributes of the \class{ConfigFile} class. Attributes in \textbf{bold} are mandatory and must not be null.}
846     \label{tab:configfile}
847     \end{table}
848    
849     \begin{table}[ht]
850     \small
851     \tymax 0.5\textwidth
852     \textbf{\normalsize ConfigFileDescription}\vspace{0.25em}\\
853     \begin{tabulary}{1.0\textwidth}{llL}
854     \toprule
855     \head{Attribute} & \head{Data type} & \head{Description}\\
856     \midrule
857     \textbf{name} & string & a human-readable name for the config file \\
858     \textbf{contentType} & string & format of the config file, MIME type when applicable \\
859     description & string & a descriptive text for the config file \\
860     \bottomrule
861     \end{tabulary}
862     \caption[Attributes of the \class{ConfigFileDescription} class]{Attributes of the \class{ConfigFileDescription} class. Attributes in \textbf{bold} are mandatory and must not be null.}
863     \label{tab:configfiledescription}
864     \end{table}
865    
866 mathieu.servillat 5663 The \class{ConfigFile} points to a structured, machine readable file, where parameters for running an activity are stored. It contains a \attribute{location} and a \attribute{name} that must be set, and a \attribute{comment} attribute (Table~\ref{tab:configfile}).
867 mathieu.servillat 5616
868     The \class{ConfigFileDescription} class indicates the format of the config file in a \attribute{contentType} attribute (see Table~\ref{tab:configfiledescription}).
869    
870     If a \class{ConfigFileDescription} instance is defined, the \attribute{name} attribute of the related \class{ConfigFile} instances must match the \attribute{name} attribute of this \class{ConfigFileDescription} instance.
871    
872    
873     \subsubsection{Relations with Activity class}
874    
875     \begin{table}[ht]
876     \small
877     \tymax 0.5\textwidth
878     \textbf{\normalsize WasConfiguredBy}\vspace{0.25em}\\
879     \begin{tabulary}{1.0\textwidth}{llL}
880     \toprule
881     \head{Attribute} & \head{Data type} & \head{Description}\\
882     \midrule
883     %\textbf{id} & string & a unique id\\
884 mathieu.servillat 5663 \textbf{artefactType} & TypeOfConfigArtefact & literal that takes the value ``Parameter'' or ``ConfigFile'' to indicate the type of class pointed to by the \class{WasConfiguredBy} instance. \\
885 mathieu.servillat 5616 \bottomrule
886     \end{tabulary}
887     \caption[Attributes of the \class{WasConfiguredBy} class]{Attributes of the \class{WasConfiguredBy} class. Attributes in \textbf{bold} are mandatory and must not be null.}
888     \label{tab:WasConfiguredBy}
889     \end{table}
890    
891     The relation of \class{Parameter} and \class{ConfigFile} to \class{Activity} is formalized by a \class{WasConfiguredBy} class. Exactly one instance, either a \class{Parameter} instance or a \class{ConfigFile} instance, must be connected to each \class{WasConfiguredBy} instance. The \class{WasConfiguredBy} class contains the attribute \attribute{artefactType} to indicate the type of class pointed to by the \class{WasConfiguredBy} instance (see Table~\ref{tab:WasConfiguredBy}).
892    
893     The life cycle of a \class{Parameter} instance (respectively \class{ConfigFile} instance) is that of the corresponding \class{Activity} instance.
894     The life cycle of a \class{ParameterDescription} instance (respectively \class{ConfigFileDescription} instance) is that of the corresponding \class{ActivityDescription} instance.
895     This means that when an activity is deleted from the provenance repository, its parameters and config files also disappear.
896    
897     Several activities launched with various possible values for a parameter share the same \class{ParameterDescription} instance.
898 mathieu.servillat 5663 For instance, a cube analysis activity with a parameter ``nbofChannels'' will point to the corresponding instance of \class{ParameterDescription} (\attribute{name} = ``nbofChannels'', \attribute{ucd} = ``meta.number'', \attribute{unit} = Null, \attribute{description} = ``Nb of channel used for segmentation'').
899     
900 mathieu.servillat 5616 Similarly, we can foresee a number of different \class{ConfigFile} instances used for various instances of an \class{Activity}, which rely on the same \class{ConfigFileDescription} instance bound to the corresponding \class{ActivityDescription} instance.
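
The two ways of recording configuration discussed in Section~\ref{sec:configuration} can be contrasted on the ``nbofChannels'' example. The sketch below uses an informal arrow notation and a hypothetical value of 64; it is not a serialization defined by this model.

\begin{verbatim}
Core features (configuration carried by an entity):
  Activity --Used(role="nbofChannels", type="Setup")-->
      ValueEntity(value="64")

ActivityConfiguration package (configuration attached to the activity):
  Activity --WasConfiguredBy(artefactType="Parameter")-->
      Parameter(name="nbofChannels", value="64")
\end{verbatim}

In the first form the value exists as an entity in its own right and can have its own provenance; in the second it shares the life cycle of the activity.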
