This document provides a data model describing the structure and content of generic Dataset metadata for the IVOA. This is a high-level model which is to be referenced and extended by other models describing specific types of Datasets and Data products. In this document, we specify the generic Dataset, as well as an ObservationDataset model which covers the class of Datasets which are derived from an Observation. At the time of this writing, there is no formal Observation-Experiment model for the IVOA, so we include a hypothetical Observation-Experiment model to serve as a placeholder. <p>Base Data Types V1.0 (2014)</p> The Observation Experiment model refers to several elements related to an Observation and its configuration. As of the time of this writing, there is no IVOA recommendation for a general Observation data model. The Provenance data model, in progress, will define the pattern for describing the relation between actions and results, and how to record these in datasets. In lieu of these standards, this document defines a straw-man Observation model. Head class for an Observation Experiment. The Observation is modeled as a type of 'Experiment', with some basic structure defined to provide metadata about the observation target and configuration. The product, or 'result' of the Observation is zero or more ObsDataset objects. This pattern is inspired by, and compatible with the Simulation Data Model, where a 'Simulation' can be considered another form of 'Experiment' or perhaps even another form of 'Observation'. Observation configuration metadata, provides information about who, where, and how the observation was conducted. The result of an observation is zero or more Observation Datasets. The target of the observation. The content of this object may vary greatly depending on the goals and nature of the observation. For example, the 'target' could be a galaxy, stellar object, planet, or calibratioin source. As such, we allow the BaseTarget class here, and permit the users to define and use more content rich flavors according to their needs. Identifies any proposal related to the observation. This field may be used to gather all observations and products related to a particular proposal. Internal ID determined by the data provide to uniquely identify the observation within the institution or entity performing the observation. ObsConfig describes all Observation Configuration metadata. We define a small set of configuration elements which are required as Provenance in the observation dataset. Metadata pertaining to the facility performing the observation. Metadata pertaining to the instrument used to create the data. Describes the spectral domain of the observation in a very general sense. The value may be expressed in terms of general spectral bands, or specific bandpass names. If multiple bands are covered, the value may be a comma delimited combination of appropriate bands. If expressed as general bands, the value(s) must be selected from the enumerated set given by the SpectralBand type. There is no controlled vocabulary for specific bandpass names as the list is too long to enumerate. Effort should be made to use highly recognized bandpass names (eg: "U","V","B","R","I", "H-alpha"). This field corresponds to both the Coverage.Spectral and Coverage.Spectral.Bandpass fields of the Resource Metadata document. Describes the original source of the data in a very general fashion. In other words, "What sort of observation originally generated the data?" Suggested values include: + survey: Survey data typically covers some region of observational parameter space with as complete as possible coverage within that region. + pointed: Pointed data of a particular object or field. + theory: Theory data, generated based on a theoretical model. + artificial: Artificial, or simulated data. Similar to 'theory', but not necessarily based on a theoretical model. + custom: Custom data, as part of a specific research project. Abstract base class for the Target object tree. The target object provides identifying metadata related to the subject or goal of the experiment. For an Observational experiment, this would typically be an astronomical object. The BaseTarget class defines high-level identifying information, and must be extended for particular classes of Target which may define additional characteristics. The target name. The primary purpose of this field is to provide the user with a recognizable identity of the particular subject or goal. However, since this may be a query-able field in data discovery protocols, care should be taken to use values which follow conventions for the domain appropriate for the data. For an astronomical object, this may be a name suitable for use within a domain-specific resolution service. Simulated data might also use this sort of name (if simulating a particular object), or a more generic term such as "G2V star". Free form description of target. General purpose Target object. General classification or type of the target. This field supports the discovery of data pertaining to a common class. e.g. "Star", "Galaxy", "AGN". At the time of this writing, there is no IVOA recommended vocabulary for this field. The SIMBAD and NED databases use defined vocabularies for astronomical object classifications which may serve as the basis for such. Extension of BaseTarget specialized for astronomical objects. The AstroTarget defines additional astronomical properties of the target. General classification or type of the target. This field supports the discovery of data pertaining to a common class. e.g. "Star", "Galaxy", "AGN". At the time of this writing, there is no IVOA recommended vocabulary for this field. The SIMBAD and NED databases use defined vocabularies for astronomical object classifications which may serve as the basis for such. Spectral class of the object. As with objectClass, there is no IVOA recommended vocabulary for specifying the spectral class of an object. There is an IVOA Note on the subject entitled "An encoding system to represent stellar spectral classes in archival databases and catalogs", describing an encoding system which has been adopted by the MAST archive. This field gives the canonical redshift of the astronomical object. It is normally used to store the cosmological redshift of extragalactic objects, although it may also be used to store the observed redshift of Galactic sources if that information is felt by the data provider to be useful. Canonical variability amplitude attributed to the target. Facility performing the observation. Identifies the instrument used to create the data. This can be a specific instrument name, general type or something else, such as a program in the case of theoretical data. (RM:Collection.Instrument) Name of the instrument. Metadata related to the proposal or document which spawned the observation. Tag used to uniquely identify a particular proposal within the institution or entity. The Derived (short for Derived Data) object holds derived information obtained by evaluating or analyzing the contents of the dataset. The specific content of this object is strongly dependent on the specific type of dataset, so we provide a generic model which may be specialized in other models to define elements appropriate for that type of dataset. The primary purpose of this object is to provide a common framework in which specific information may be placed to aid in discovery and filtering of datasets in various access protocols. Collection of zero or more DerivedElement objects, each of which provides a specific quantity obtained by analyzing the dataset content. Extends Dataset with metadata relevant to datasets derived from Observations. Reference to a BaseTarget object from Observation. Provides metadata describing the target of the observation. Reference to Proposal object from Observation. This object provides metadata identifying any proposal related to the observation which produced the dataset. Reference to ObsConfig object from Observation. This object provides some high-level metadata related to the observation configuration. Provides a high level summary of certain properties of the dataset. Its primary purpose is to support high level filtering of datasets during data discovery. High level classification for the calibration level of the particular dataset as a whole. The calibration level concept conveys to the user information on how much data reduction/processing has been applied to the data. It is up to the data providers to consider how to map their own internal classification to the scale defined here. Scale: * 0 - Raw instrumental data, in a proprietary or internal data-provider defined format. * 1 - Instrumental data in a standard format (FITS, VOTable, etc ) * 2 - Calibrated, science ready data with the instrument signature removed. * 3 - Enhanced data products like mosaics, resampled or drizzled images, or heavily processed survey fields. Level 3 data products may represent the combination of data from multiple primary observations. Abstract base for defining derived data elements. Typically, models for specific data products would extend this object to define various elements appropriate for that model. For example, the Spectrum model could define signal-to-noise ratio (SNR), or TimeSeries could define period, or variability. We put no restriction on the DerivedElement content since the result could be a simple value or a complex object. However, it is recommended that extensions be simple and compact in keeping with the primary intent of use in data discovery. Simple extension of DerivedElement class which can serve many use cases. Usages of this object in other models to define specific elements should explicitly define the element name, and the process by which the value is determined. Name identifying the derived element. Value of the derived element. Dataset Metadata package This object provides specific information regarding the data model and version thereof that a dataset represents. This information is primarily an aide to (de)serialization of the dataset. As such, it would better be defined by serialization conventions. In lieu of such conventions, we include this object and representative serialization examples for the primary file formats VOTable and FITS. Formatted string providing the name and version of the model document. The format speficication is "[name]-[version].[subversion]". Each derived model should specifically state the appropriate string for this field. Prefix string used in UType string for elements associated with the model. For IVOA standard models, this will be a predetermined string (e.g. 'stc' or 'spec'). Extended, or user defined content can specify a unique prefix for their own content. URL pointer to the XML schema associated with the model. Abstract object for the generic IVOA Dataset. It is intended to be useful for any type of data. Specific dataset models should extend this object, providing detailed definitions and additional content as appropriate for that type of dataset. Provides metadata related to the entity responsible for the curation of the dataset. DataID provides high level identification metadata for the dataset itself, and any associations with various collections. Describes the high level scientific classification of the data content. Values are restricted to the DataProductType enumeration set and convey the general idea of the content and organization of a dataset. Secondary type classification for the dataset. This field is intended to precisely specify the scientific nature of the data product, possibly in terms relevant only to a specific archive or data collection. For example, dataProductType='image' could have associated dataProductSubtype="src.image", "bkg.image", "PixelMask", etc. Values are unrestricted strings. The Data Identification object (DataID) stores the dataset identifiers and other metadata typically assigned by the dataset creator. The Dataset IDs in this object must comply with the syntax for dataset identifiers defined in the "IVOA Identifiers" document, including the use of 'stop' characters to identify specific datasets that are not individually in the registry. e.g., ivo://example.net/aservice?2013/5/2342. Much of the content of this object is assembled from various definitions in the IVAO Resource Metadata document. Here, we provide a brief description of each field for easy reference, along with a notation of its mapping to the Resource Metadata document (RM:field), where the reader may find more detailed information. The institution or entity which created the dataset. Persons or entities who contributed to the generation of the scientific content of the dataset. Users of the dataset should include these in subsequent credits and acknowledgements. (RM:Curation.Contributor) The dataset is associated with zero or more Collections (instrument name, survey name, etc.) . Each instance identifies a tag indicating some degree of compatibility with other data sharing the same Collection properties. A free form string giving a title for the dataset. (RM:Identity.Title) If the dataset is registered with an external 'global index service' such as ADS, the publisher may include that identifier here. This provides a common, persistent identifier for the dataset, and possible access point to follow for information on publications and other related datasets. Note: the same dataset published at more than one location would have different Curation.publisherDID values, but the same DataID.datasetID. eg: "ivo://ADS/Sa.CXO?obsid=1234", "ivo://ADS/sh.hut#ngc4151_141" The dataset identifier assigned by the creator. Here, the authority-id of the identifier must be that of the creator. It is used to identify the original exposure of the dataset in an archive, and will remain static regardless of where the dataset is published. The creator ID will not necessarily change even if the VO object in question is a cutout or is otherwise further processed. Version assigned by the creator, reflecting the production version of the dataset. This value should only be changed by the creator, upon the new release of a dataset. There are no format restrictions or specifications on the versioning scheme. Data processing or creation date (RM:Curation.Date). The dataset creation type describes the nature or genre of the content. (RM:General.Type). Note: This field provides information about the process by which the dataset was created. As the Observation/Experiment model matures, this may evolve into a provenance element on the Experiment type. The Curation object provides metadata assigned by the entity responsible for the support of the dataset content as well as identifying metadata about that entity. It is assembled from definitions provided by the IVOA standard document, "Resource Metadata for the Virtual Observatory; Version 1.12" (Resource Metadata). Here, we provide a brief description of each field for easy reference, along with a notation of its mapping to the Resource Metadata document (RM:field), where the reader may find more detailed information. The entity making the data available. (RM:Curation.Publisher) Contact information of the person/entity responsible for the content of the dataset. We recommend using a generic 'helpdesk' type contact rather than individuals whose information may more easily become obsolete. (RM:Curation.Contact) Zero or more bibliographic or documentation references associated with the dataset. Each instance provides a single forward link to a major publication which references the dataset. (RM:General.Source) IVOA dataset identifier assigned by the publisher to uniquely identify the dataset within its holdings. Typically, the basis of this identifier will be the publisher ID. However, if the publisherchooses to use a 'global index service' such as ADS to obtain persistent identifiers for their datasets, rather than generate their own, that identifier should be used both here and for DataID.datasetID. Note: this model also defines a creator dataset ID (DataID.creatorDID), these will differ if the publishing entity is not the creator of the dataset. Values are to be expressed as dataset identifiers using the syntax described in "IVOA Identifiers" Version of the curated dataset, assigned by the publisher. This is an independent versioning from DataID.version that allows the publisher to track changes to the high level dataset metadata (e.g. curation metadata, identifiers, etc.) without effecting the creator defined dataset version. The value may be based on the DataID.version (e.g. by adding a sub-version extension), or an independent versioning. There are no format restrictions on the value. (RM:Curation.Version) Date the curated dataset was last modified. (RM:Curation.Date) Enumeration identifying the high level classification of a data product. A multidimensional astronomical image of three (3) or more axes. A two (2) dimensional astronomical image. Dataset with spectral coverage with irregular gaps. Dataset where spectral coverage is the primary attribute, in contiguous bins. e.g. a 1D spectrum or a long slit spectrum. Dataset presenting some quantity varying as a function of time. A light curve is a typical example of a timeseries dataset. A spectral energy distribution, an advanced data product often produced by combining data from multiple observations. A visibility (radio) dataset. Typically this is instrumental data, and is often a complex object containing multiple files or other substructures. A visibility dataset may contain data with spatial, spectral, time, and polarization information for each measured visibility. An event counting dataset (e.g. X-ray). An event dataset may contain data with spatial, spectral, and time information for each measured event. A catalog. Enumeration of dataset creation types. Indicates that it is one of a collection of datasets generated in a systematic, homogeneous way and is stored statically (or at least versioned). It will be possible to regenerate this dataset at a later date. The remaining types imply on-the-fly manipulation. Indicates that the dataset was created "on-the-fly", by subsetting, but not by modifying values. May involve excluding data prior to binning into samples, also "on-the-fly" Combines multiple original datasets "on-the-fly" Has been extracted, for example, from a spectral data cube. Has been extracted from a catalog. Enumeration of access rights levels. unrestricted, public access is allowed, without authentication. only proprietary access is allowed with authentication. authenticated, public access is allowed. λ ≥ 10 mm; ν ≤ 30 GHz 0.1 mm ≤ λ ≤ 10 mm; 3000 GHz ≥ ν ≥ 30 GHz 1 μ ≤ λ ≤ 100 μ 0.3 μ ≤ λ ≤ 1 μ 100 Å ≤ λ ≤ 3000 Å; 1.2 eV ≤ E ≤ 120 eV 0.1 Å ≤ λ ≤ 100 Å; 0.12 keV ≤ E ≤ 120 keV E ≥ 120keV Publisher is modeled as a Role played by the organization or entity making the Dataset available. IVOA resource identifier associated with the publisher and registered with an IVOA compliant registry (eg: ivo://mast.stsci.edu). Values are to be expressed using the syntax described in "IVOA Identifiers". Contact information for a person or entity. Contact is modeled as a Role played by a particular person or entity (Party). We subset the type of Party to include only Individuals. This includes both a physical person, or proxy service such as a helpdesk. Creator is modeled as a Role played by the organization or entity which created the Dataset. Contributor is modeled as a Role played by a Party or entity who participated in the generation of the Dataset. Acknowledgment expression for the contributor. Users of the dataset should include these in subsequent credits and acknowledgements. The expression should be formatted as desired by the contributor. Any referenceable publication associated with a Dataset. Reference code of the publication. Values should be expressed as a URL, or bibcode (discernible as a 19 character string beginning with 4 digits). Free text references are allowed, but discouraged. A generic organizational construct which allows Datasets to be associated with each other by a set of Collection properties. Datasets tagged with the same Collection properies can be assumed to have some degree of compatibility. The values are generally defined by the creating entity. Examples: "WFC", "Sloan", "BFS Spectrograph", "MSX Galactic Plane Survey". We include a simple Party model for associating an Entity with a Role that Entity is playing. For example, a particular Individual can be both a Contact and Publisher of a dataset. Abstract head of the set of classes describing various entities. Name of the Party or entity. All entities are assumed to have a name. Abstract class for the entity role. Models should extend this class to define local roles which are played by various entities/parties. Reference to the Party or Entity which is associated with this role. Abstract head of the sub-set of Parties which describe a single individual. Mailing address for the person. The value is expressed as a single string containing all components of the address. Phone number associated with the person. The value is expressed as a string according to the format appropriate for the locale. E-mail address of the Individual Extension of Party for any Organization or Institution. Mailing address. The value is expressed as a single string containing all components of the address. Phone number. The value is expressed as a string according to the format appropriate for the locale. E-mail address of the Organization. URL pointer to a graphical logo associated with the Organization.