International Virtual Observatory Alliance
Registries provide a mechanism with which VO applications can discover and select resources—e.g. data and services—that are relevant for a particular scientific problem. This specification defines two interfaces that support interactions between applications and registries as well as between the registries themselves. It is based on a general, distributed model composed of so-called searchable and publishing registries. The specification has two main components: an interface for searching and an interface for harvesting. Harvesting is supported through the existing Open Archives Initiative Protocol for Metadata Harvesting, whereas searching is performed using the IVOA Table Access Protocol together with a specification of a set of tables comprising a useful subset of the information contained in the registry records. Finally, this specification details the metadata used to describe registries themselves.
This is an IVOA Working Draft for review by IVOA members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Working Drafts as reference materials or to cite them as other than "work in progress".
A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.
Sections 2 and 4 of this document have largely been taken from the previous version of the Registry Interfaces standard [RI1].
This document has been developed in part with support from the German Astronomical Virtual Observatory (BMBF Bewilligungsnummer 05A08VHA).
The words "MUST", "SHOULD", "MAY", "RECOMMENDED", and "OPTIONAL" (in upper or lower case) used in this document are to be interpreted as described in the IETF standard RFC 2119 [RFC 2119].
In the Virtual Observatory (VO), registries provide a means for discovering useful resources, i.e., data and services. This discovery takes place by searching within structured descriptions of resources, the resource records. To make discovery efficient, a registry typically keeps some internal representation of all resource records available within the VO.
However, the resource records themselves and the data providers that maintain them are distributed. Since the VO contains multiple registries, the registry service itself is distributed. Thus, there is a clear need for common mechanisms for registry communication and interaction.
This document describes the standard interfaces that enable interoperable registries. Through these interfaces, registry builders have a common way of sharing resource descriptions with users, applications, and other registries. Client applications can be built according to this specification and be able to discover and retrieve descriptions from any compliant registry.
This specification does not preclude a registry operator from providing additional value-added interfaces and capabilities. In particular, they are free to build interactive, end-user interfaces in any way that best serves their target community. It is a design goal of this specification, though, that different registries operating on the same set of registry records will return identical responses for some defined subset of possible queries.
A registry is first a repository of structured descriptions of resources. In the VO, a resource is defined by the IVOA Recommendation, "Resource Metadata for the Virtual Observatory" [RMI]:
A resource is a general term referring to a VO element that can be described in terms of who curates or maintains it and which can be given a name and a unique identifier. Just about anything can be a resource: it can be an abstract idea, such as sky coverage or an instrumental setup, or it can be fairly concrete, like an organization or a data collection.
Organizations, data collections, and services can be considered as classes of resources. The most important type of resource to applications is a service that actually does something. A registry, then, is "a service for which the response is a structured description of resources" [RMI].
This specification is based on the general IVOA model for registries [Plante2003], which builds on the resource model of [RMI]. In the registry model, the VO environment features different types of registries that serve different functions. The primary distinction is between publishing registries and searchable ones. A secondary distinction is full versus local.
A searchable registry is one that allows users and client applications to search for resource records using selection criteria against the metadata contained in the records. The purpose of this type of registry is to aggregate descriptions of many resources distributed across the network. By providing a single place to locate data and services, applications are saved from having to visit many different sites just to determine which ones are relevant to the scientific problem at hand. A searchable registry gathers its descriptions from across the network through a process called harvesting.
A publishing registry is one that simply exposes its resource descriptions to the VO environment in a way that allows those descriptions to be harvested. The contents of these registries tend to be limited to resources maintained by one or a few providers and thus are local in nature; for example, a data center will run its own publishing registry to expose all the resources it maintains to the VO environment. Since the purpose is simply publishing and not to serve users and applications directly, it is not necessary to support full searching capabilities. This simplifies the requirements for a publishing registry: not only does it not need to support the general search interface, the storage and management of the records can be simpler. While a searchable registry in practice will necessitate the use of a database system, a simple publishing registry may get by storing its records as flat files on disk.
Note that some registries can play both roles; that is, a searchable registry may also publish its own resource descriptions.
A secondary distinction is full versus local. A full registry is one that attempts to contain records of all resources known to the VO. Several such registries exist, run by various VO projects. A local registry, on the other hand, contains only a subset of known resources.
As mentioned above, harvesting is the mechanism by which a registry can collect resource records from other registries. It is used by full registries to aggregate resource records from many publishing registries. It can also be used to synchronize two registries to ensure that they have the same contents. Harvesting, in this specification, is modeled as a pull operation between two registries. The term harvester refers to the registry that wishes to receive records (usually a searchable registry); it sends its request to the harvestee (usually a publishing registry), which responds with the records. Harvesting is a much simpler process than search and retrieval. Consequently, two different protocols are employed for the two types of registry operations.
This specification directly relates to other VO standards in the following ways:
res_detail table using utypes algorithmically generated from the XML schema documents given by these standards. This document should not in general need updates for registry extension updates. Still, in particular with a view to the caveat in VOResource Utypes, we note the version current as of this specification: SimpleDALRegExt 1.0, StandardsRegExt 1.0, TAPRegExt 1.0.
This standard also relates to other IVOA standards:
The harvesting interface allows the retrieval of complete VOResource records from registries supporting harvesting. Publishing registries MUST support the IVOA harvesting interface; searchable registries SHOULD do so.
The IVOA harvesting interface is built on the standard Protocol for Metadata Harvesting developed by the Open Archives Initiative, OAI-PMH [OAI]. Version 2.0 of Registry Interfaces drops support for the SOAP variant of OAI-PMH defined in Version 1.0 of this specification.
For details we refer to [OAI]; the following brief overview of OAI-PMH should be sufficient to understand the protocol's role within the Registry Interface architecture.
The OAI-PMH v2.0 specification defines:
The six standard operations laid down in OAI-PMH are:
The ListRecords and GetRecord operations return the actual resource description records held by the registry. These descriptions are encoded in XML and wrapped in a general-purpose envelope defined by the OAI-PMH XML Schema (with the namespace http://www.openarchives.org/OAI/2.0/).
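As an illustration of the envelope, the following minimal sketch pulls the header information out of a ListRecords response using only the Python standard library. The sample response, its identifiers, and the function name record_headers are made up for this example; the element names and the namespace URI (including its trailing slash) are those of the OAI-PMH 2.0 schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 schema in ElementTree's {uri}name notation.
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Invented, abbreviated response; a real one carries full metadata elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2013-03-01T12:00:00Z</responseDate>
  <request verb="ListRecords">http://registry.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>ivo://example.org/archive</identifier>
        <datestamp>2013-02-11T09:30:00Z</datestamp>
      </header>
      <metadata><!-- resource description goes here --></metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>ivo://example.org/old</identifier>
        <datestamp>2013-02-12T00:00:00Z</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def record_headers(response_text):
    """Yield (identifier, datestamp, deleted) for each record in the envelope."""
    root = ET.fromstring(response_text)
    for header in root.iter(OAI + "header"):
        yield (header.findtext(OAI + "identifier"),
               header.findtext(OAI + "datestamp"),
               header.get("status") == "deleted")


headers = list(record_headers(SAMPLE))
for identifier, datestamp, deleted in headers:
    print(identifier, datestamp, "deleted" if deleted else "active")
```

A harvester would typically use the datestamp of the newest record it has seen as the starting point of its next incremental harvest.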
Through the operations' arguments, OAI-PMH provides a number of useful features:
With the ListMetadataFormats operation, a harvester can request the formats available for encoding returned resource descriptions.
The ListIdentifiers and ListRecords operations both support from and until date arguments, which restrict the response to records changed within the given, possibly half-open, interval.
The ListIdentifiers and ListRecords operations both support a set argument for retrieving resources that are grouped in a particular category. Resource records may belong to multiple sets.
It is important to note that the OAI-PMH interface is not intended to be a general search interface. The filtering capabilities described above are just enough to support intelligent harvesting between registries. Most end-user applications will use the search interface described below.
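The filtering arguments described above translate directly into HTTP GET parameters. The sketch below builds such request URLs; the endpoint http://registry.example.org/oai and the helper name build_request are assumptions made for illustration, while the verb and argument names come from OAI-PMH 2.0 and the ivo_vor prefix and ivo_managed set are the ones defined by this specification.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH 2.0.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def build_request(base_url, verb, **arguments):
    """Return a GET URL for an OAI-PMH request, skipping unset arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            # "from" is a Python keyword, so callers pass it as from_.
            params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint: incremental harvest of records changed since a date.
url = build_request("http://registry.example.org/oai", "ListRecords",
                    metadataPrefix="ivo_vor", set="ivo_managed",
                    from_="2013-01-01")
print(url)
```

Leaving out until yields a half-open interval, which is the usual choice for an incremental harvest.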
In addition to basic OAI-PMH compliance, this specification defines further requirements and recommendations specific to OAI-PMH's use within the VO; these are described in the remaining subsections.
All IVOA registries that support the Harvesting Interface must support two standard metadata formats: the OAI Dublin Core format (mandated by the base OAI-PMH standard) and the IVOA VOResource metadata format [VOR].
The VOResource metadata format has the metadata prefix name ivo_vor, which can be used wherever [OAI] allows a metadata prefix name. The format uses the VOResource core XML Schema with the namespace http://www.ivoa.net/xml/VOResource/v1.0 (recommended namespace prefix vr:) along with any legal extension of this schema to encode the resource descriptions within the OAI-PMH metadata tag from the OAI XML Schema (namespace http://www.openarchives.org/OAI/2.0/, recommended namespace prefix oai:). The format is specifically represented by an element called Resource from the http://www.ivoa.net/xml/RegistryInterface/v1.0 namespace (recommended namespace prefix ri:) as the sole child of the oai:metadata element. The registry interface schema is defined by this standard and is given in Appendix C. The ri:Resource element must include an xsi:type attribute that assigns the element's type to vr:Resource or one of its legal extensions.
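The structural rules for the ivo_vor format can be checked mechanically. The following sketch, which assumes an invented and heavily abbreviated sample record (it is not schema-valid) and a hypothetical helper name resource_type, verifies that oai:metadata has ri:Resource as its sole child and that the xsi:type attribute is present.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
RI = "http://www.ivoa.net/xml/RegistryInterface/v1.0"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Invented, abbreviated record: real records need curation and content too.
SAMPLE = f"""<oai:metadata xmlns:oai="{OAI}" xmlns:ri="{RI}"
    xmlns:vr="http://www.ivoa.net/xml/VOResource/v1.0"
    xmlns:xsi="{XSI}">
  <ri:Resource xsi:type="vr:Organisation" status="active"
      created="2013-01-01T00:00:00" updated="2013-01-01T00:00:00">
    <title>Example Observatory</title>
    <identifier>ivo://example.org/org</identifier>
  </ri:Resource>
</oai:metadata>"""


def resource_type(metadata_xml):
    """Return the xsi:type QName of the ri:Resource inside oai:metadata."""
    metadata = ET.fromstring(metadata_xml)
    children = list(metadata)
    if len(children) != 1 or children[0].tag != f"{{{RI}}}Resource":
        raise ValueError("oai:metadata must contain exactly one ri:Resource")
    xsi_type = children[0].get(f"{{{XSI}}}type")
    if xsi_type is None:
        raise ValueError("ri:Resource lacks the required xsi:type attribute")
    return xsi_type


print(resource_type(SAMPLE))
```

The function returns the QName as written in the document; resolving its prefix to a namespace URI is left out of this sketch.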
If and when the VOResource schema evolves to a new version, this standard must be updated accordingly. Thus, this definition is locked to a particular version of VOResource, so saying that a registry is compliant with vX.X of this document implies a specific version of VOResource.
It is strongly recommended that all QName values of xsi:type attributes within the VOResource record use XML namespace prefixes as recommended in [VOR] or the VOResource extensions. Minor version changes are not in general reflected in the recommended prefixes; for example, both VODataService 1.0 and VODataService 1.1 use vs:. If you must deliver OAI-PMH documents containing resource records written to different versions of a registry extension, override the prefix bindings on the element level if at all possible.
The OAI Dublin Core format, with the metadata prefix of oai_dc, is defined by the OAI-PMH base standard and must be supported by all OAI-PMH compliant registries. TODO: write out that mapping, it should largely be straightforward.
Harvestable registries may support other metadata formats. The ListMetadataFormats operation must list all names for formats supported by the registry; even though they are mandatory, this list must include ivo_vor and oai_dc.
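A harvester can verify this requirement before starting a harvest. The sketch below inspects a ListMetadataFormats response and reports which of the two mandatory prefixes are missing; the sample response and the function name missing_formats are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
REQUIRED_PREFIXES = {"ivo_vor", "oai_dc"}

# Invented response from a registry that forgot to advertise ivo_vor.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2013-03-01T12:00:00Z</responseDate>
  <request verb="ListMetadataFormats">http://registry.example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def missing_formats(response_text):
    """Return the mandatory metadata prefixes absent from the response."""
    root = ET.fromstring(response_text)
    offered = {element.text.strip()
               for element in root.iter(OAI + "metadataPrefix")
               if element.text}
    return REQUIRED_PREFIXES - offered


print(missing_formats(SAMPLE))
```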
In accordance with the OAI-PMH standard, an OAI-PMH XML envelope that contains a resource description must include a globally unique URI that identifies that resource record. This identifier must be the IVOA identifier used to identify the resource being described as given in its vr:identifier child element.
This specification does not follow the recommendation of the OAI-PMH standard with regard to record identifiers. OAI-PMH makes a distinction between the resource record containing resource metadata and the resource itself; thus, it recommends that the identifier in the OAI envelope be different from the resource identifier. In particular, the former is the choice of the publishing registry. This allows one to distinguish resource descriptions of the same resource from different registries, which in principle could be different.
In the VO, because it is intended that resource descriptions of the same resource from different registries should not differ (apart from possible additions of vr:validationLevel elements), there is not a strong need to distinguish between the resource and the resource description. Making the resource and resource record identifiers the same makes it much easier to retrieve the record for a single resource via GetRecord, regardless of which registry is being queried. Otherwise (when the registry chooses the record identifier), a client will not know the record identifier for a particular resource a priori, and so it is left to call ListRecords and search through the metadata of all the records itself to find the one of interest. In contrast, IVOA identifiers are intended to be a cross-application way of referring to a resource, and thus when a client wants only a single specific resource record, it is very likely that it would know the resource identifier when making a call to the GetRecord operation.
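Because the record identifier is the IVOA identifier, the same GetRecord request can be sent to any registry that holds the record. A minimal sketch, assuming two hypothetical harvesting endpoints and an invented identifier:

```python
from urllib.parse import urlencode


def get_record_url(base_url, ivoid):
    """Build a GetRecord request for one resource in the ivo_vor format."""
    params = {"verb": "GetRecord",
              "identifier": ivoid,
              "metadataPrefix": "ivo_vor"}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoints; only the base URL differs between registries.
for endpoint in ("http://a.example.org/oai", "http://b.example.net/oai"):
    print(get_record_url(endpoint, "ivo://example.org/archive"))
```

Note that urlencode percent-encodes the colon and slashes of the identifier, as required for reserved characters in OAI-PMH request arguments.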
This section describes the records that a harvestable IVOA Registry must include among those it emits via the OAI-PMH operations.
The harvestable registry MUST return one record that describes the registry itself as a whole, and the ivo_vor format MUST be supported for this record. This record is included in the Identify operation response. When encoded using the ivo_vor format, the returned ri:Resource element must be of the type vg:Registry from the VORegistry schema (namespace http://www.ivoa.net/xml/VORegistry/v1.0; recommended namespace prefix vg:; see Appendix A). The record MUST include a vg:managedAuthority for every authority identifier that originated at that registry.
Before adding an authority to the list of a registry's managed authorities, the registry operator must verify no other registry claims to manage that authority. In other words: Within the whole VO, the relation mapping registries to authorities must be invertible. This allows determining the originating registry just from the authority part of a record's identifier. This specification does not provide technical safeguards to ensure the invertibility of the managed authority relation.
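Since the specification offers no technical safeguard, an operator or validator may want to check invertibility over the registry records it knows about. The sketch below assumes the managed authorities have already been extracted from the vg:Registry records into a dictionary; the function name, the data, and the case-insensitive comparison are assumptions of this example.

```python
from collections import defaultdict


def conflicting_authorities(managed):
    """Map each authority claimed by several registries to those registries.

    managed maps a registry identifier to the authority identifiers listed
    in its vg:managedAuthority elements.
    """
    claims = defaultdict(set)
    for registry, authorities in managed.items():
        for authority in authorities:
            # Assumption: authority identifiers compare case-insensitively.
            claims[authority.lower()].add(registry)
    return {authority: sorted(registries)
            for authority, registries in claims.items()
            if len(registries) > 1}


# Invented example data with one doubly claimed authority.
managed = {
    "ivo://reg.a/registry": ["reg.a", "shared.org"],
    "ivo://reg.b/registry": ["reg.b", "shared.org"],
}
print(conflicting_authorities(managed))
```

An empty result means the relation is invertible for the records examined; it says nothing about registries the caller has not harvested.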
The harvestable registry must be able to return exactly one record in ivo_vor for each authority identifier listed as a vg:managedAuthority in the vg:Registry record that describes that registry. When encoded in the ivo_vor format, the type of these elements must be vg:Authority.
The Identify operation describes the harvestable registry as a whole. The response from this operation must include all information required by the OAI-PMH standard. In particular, it must include an oai:baseURL element that must refer to the base URL of the harvesting interface endpoint. The Identify response must include an oai:description element containing a single ri:Resource element with an xsi:type attribute that sets the element's type to vg:Registry. The content of this vg:Registry-typed element must be the registry description of the harvestable registry itself.
Sets, as defined in the OAI-PMH standard, are "an optional construct for grouping items for the purpose of selective harvesting" (see [OAI], section 2.6). Harvestable IVOA registries are free to define any number of custom sets for categorizing records. The OAI-PMH standard allows a record to be a member of multiple sets.
This specification defines one reserved set name with a special meaning; future versions of this specification may define additional set names. These reserved set names will all start with the characters ivo_; implementors should not define their own set names that begin with this string. While support for sets is optional for compliance with the OAI-PMH standard, a harvestable registry must support the set with the reserved name ivo_managed to be compliant with this specification.
The ivo_managed set refers to all records that originate from the queried registry. That is, those records that were harvested from other registries are excluded. The IVOA Resource identifiers given in the records must have an authority identifier that matches one of the vg:managedAuthority values in the vg:Registry record for that registry. Full searchable registries may use this set to avoid getting duplicate records when harvesting from many registries.
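The membership rule for ivo_managed reduces to comparing the authority part of a record's identifier with the registry's managed authorities. A minimal sketch, in which the function names and sample identifiers are invented and the case-insensitive comparison is an assumption:

```python
from urllib.parse import urlsplit


def authority_of(ivoid):
    """Return the authority identifier of an ivo:// identifier."""
    parts = urlsplit(ivoid)
    if parts.scheme != "ivo" or not parts.netloc:
        raise ValueError(f"not an IVOA identifier: {ivoid}")
    # Assumption: authority identifiers compare case-insensitively.
    return parts.netloc.lower()


def in_ivo_managed(ivoid, managed_authorities):
    """Tell whether a record belongs to the registry's ivo_managed set."""
    managed = {authority.lower() for authority in managed_authorities}
    return authority_of(ivoid) in managed


print(in_ivo_managed("ivo://example.org/archive", ["example.org"]))
print(in_ivo_managed("ivo://other.net/catalog", ["example.org"]))
```

The same authority_of helper lets a client work out which registry originated a record, given the managed authorities of all registries.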
To be written. One candidate (possibly to be referenced from here) is at https://volute.googlecode.com/svn/trunk/projects/registry/regtap.
This specification defines a VOResource extension schema called VORegistry that can be used to specifically describe a registry and its support for the registry interface described in this document. These descriptions can be stored as resource records in registries. The schema is also used to register a naming authority—a publisher who claims ownership of an authority identifier from which IVOA identifiers may be created. A publishing registry is said to exclusively manage a naming authority on behalf of the owning publisher; this means that only that registry may publish records with IVOA identifiers using that authority identifier. The full VORegistry syntax definition expressed in XML Schema is listed in Appendix TODO.
The VORegistry schema namespace is http://www.ivoa.net/xml/VORegistry/v1.0. As with the core VOResource Schema, the namespace URI has been chosen to allow it to be resolved as a URL to the XML Schema document that defines the VORegistry schema. Applications may assume that the namespace URI is so resolvable. In particular, it is recommended the namespace URI be given as the location for the VORegistry schema within the xsi:schemaLocation attribute. The recommended prefix for this namespace is vg:.
The vg:Authority type extends the core vr:Resource type to specifically describe the ownership of an authority identifier by a publishing organisation.
<xs:complexType name="Authority">
  <xs:complexContent>
    <xs:extension base="vr:Resource">
      <xs:sequence>
        <xs:element name="managingOrg" type="vr:ResourceName"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
vg:Authority Extension Metadata Elements

Element | Definition
---|---
managingOrg |
The IVOA identifier of a vg:Authority record provided via the vr:identifier element must have an empty resource key component as defined in [VOID]. The authority identifier component of the record's identifier is the one that is the subject of the record itself.
The meaning of a vg:Authority record is that the organisation referenced in the vg:managingOrg element has the sole right to create (in collaboration with a publishing registry) and register resource descriptions using the authority identifier given by the vr:identifier element.
Before a publisher can create resource descriptions using a new authority identifier, it must first register its claim to the authority identifier by creating a vg:Authority record. Before the publishing registry commits the record for export, it must first search a full registry to determine if a vg:Authority with this identifier already exists; if it does, the publishing of the new vg:Authority record must fail. When a registry creates a vg:Authority record, it is said that the registry manages the associated authority identifier (on behalf of the owning publisher) because only that registry may create records with identifiers using that authority identifier.
The vg:Registry type extends the core vr:Service type to specifically describe registries that are compliant with this standard.
<xs:complexType name="Registry">
  <xs:complexContent>
    <xs:extension base="vr:Service">
      <xs:sequence>
        <xs:element name="full" type="xs:boolean"/>
        <xs:element name="managedAuthority" type="vr:AuthorityID" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
vg:Registry Extension Metadata Elements

Element | Definition
---|---
full |
managedAuthority |
If the vg:full element is set to true, the registry is obligated to accept all valid resource records it harvests from other registries in accordance with the OAI-PMH specification.
The vg:managedAuthority element applies specifically to registries in their role as publishers of records. When a publishing registry claims to manage an authority identifier, it has created a vg:Authority resource record for that authority identifier.
As a subclass of vr:Service, the vg:Registry type uses vr:capability elements to describe its support for the interfaces described in this specification. In particular, the VORegistry schema defines two extensions of VOResource's vr:Capability type: one to describe the support for the searching interface and one to describe the deprecated version 1.0 harvesting interface. Both extension types extend from an intermediate restriction on vr:Capability called vg:RegCapRestriction to force the value of the standardID attribute to be ivo://ivoa.net/std/Registry:
<xs:complexType name="RegCapRestriction" abstract="true">
  <xs:complexContent>
    <xs:restriction base="vr:Capability">
      <xs:sequence>
        <xs:element name="validationLevel" type="vr:Validation" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="description" type="xs:token" minOccurs="0"/>
        <xs:element name="interface" type="vr:Interface" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="standardID" type="vr:IdentifierURI" use="required" fixed="ivo://ivoa.net/std/Registry"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
vg:RegCapRestriction Metadata Elements

Element | Definition
---|---
validationLevel |
description |
interface |
vg:RegCapRestriction Attributes

Attribute | Definition
---|---
standardID |
As an abstract type, the vg:RegCapRestriction type cannot be used directly within a resource description; one of the non-abstract extensions of this intermediate type must be used instead.
The vr:Capability extension types are used by applying the xsi:type attribute to the vr:capability element (see also [VOR], section 2.2.2). A version 2 registry should, in general, provide a harvesting capability and the three mandatory VOSI capabilities [VOSI].
The search capability (vg:Search) is no longer used; clients will instead search for TAP services supporting the registry data model to locate the search endpoints. It is retained in the schema to avoid a disruptive schema change just to remove an element.
A registry declares itself to be a harvestable registry by including a vr:capability element with an xsi:type attribute set to vg:Harvest.
<xs:complexType name="Harvest">
  <xs:complexContent>
    <xs:extension base="vg:RegCapRestriction">
      <xs:sequence>
        <xs:element name="maxRecords" type="xs:int"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
vg:Harvest Extension Metadata Elements

Element | Definition
---|---
maxRecords |
A vr:capability element of type vg:Harvest must include at least one vr:interface element with an xsi:type attribute set to vg:OAIHTTP and the role attribute set to std. If the vr:capability element is used to simultaneously describe support for other versions of this Registry Interface standard, then the vr:interface element describing support for this version must include the version attribute set to 2.0. The vr:accessURL element must be set to the base URL for the OAI-PMH interface.
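A client that has obtained a registry record can locate the harvesting endpoint by applying these rules to each capability. The sketch below does so for a single, invented capability element; the function name harvest_endpoints is hypothetical, and the code assumes the record uses the recommended vg: prefix in its xsi:type values rather than resolving QName prefixes properly.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Invented capability as it might appear inside a vg:Registry record.
SAMPLE = """<capability
    xmlns:vg="http://www.ivoa.net/xml/VORegistry/v1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="vg:Harvest" standardID="ivo://ivoa.net/std/Registry">
  <interface xsi:type="vg:OAIHTTP" role="std" version="2.0">
    <accessURL>http://registry.example.org/oai</accessURL>
  </interface>
  <maxRecords>500</maxRecords>
</capability>"""


def harvest_endpoints(capability_xml):
    """Return the OAI-PMH base URLs declared by a vg:Harvest capability."""
    capability = ET.fromstring(capability_xml)
    # Assumption: xsi:type values use the recommended vg: prefix.
    if capability.get(XSI_TYPE) != "vg:Harvest":
        return []
    return [interface.findtext("accessURL")
            for interface in capability.findall("interface")
            if interface.get(XSI_TYPE) == "vg:OAIHTTP"
            and interface.get("role") == "std"]


print(harvest_endpoints(SAMPLE))
```

A production client would resolve the prefixes against the in-scope namespace declarations instead of comparing literal strings.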
The vg:OAISOAP extension of vr:WebService was used by Registry Interfaces 1.0 and is no longer part of this specification.
For pre-REC-1.0 changes, see [RI1].