International Virtual Observatory Alliance

IVOA Registry Interfaces
Version: filled in automatically

Working Group:
Registry WG
This version:
filled in automatically
Latest version:
http://www.ivoa.net/Documents/RegistryInterface/
Previous versions:
IVOA Registry Interfaces 1.0, IVOA Recommendation 2009 November 4
Authors:
Markus Demleitner
Paul Harrison
Gretchen Greene
Theresa Dower
and the authors of the Registry Interfaces specification version 1 [RI1].

Abstract

Registries provide a mechanism with which VO applications can discover and select resources—e.g. data and services—that are relevant for a particular scientific problem. This specification defines two interfaces that support interactions between applications and registries as well as between the registries themselves. It is based on a general, distributed model composed of so-called searchable and publishing registries. The specification has two main components: an interface for searching and an interface for harvesting. Harvesting is supported through the existing Open Archives Initiative Protocol for Metadata Harvesting, whereas searching is performed using the IVOA Table Access Protocol together with a specification of a set of tables comprising a useful subset of the information contained in the registry records. Finally, this specification details the metadata used to describe registries themselves.

Status of this Document

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgments

Sections 2 and 4 of this document have largely been taken from the previous version of the Registry Interfaces standard [RI1].

This document has been developed in part with support from the German Astronomical Virtual Observatory (BMBF Bewilligungsnummer 05A08VHA).

Conformance-related definitions

The words "MUST", "SHOULD", "MAY", "RECOMMENDED", and "OPTIONAL" (in upper or lower case) used in this document are to be interpreted as described in the IETF standard RFC 2119 [RFC 2119].


1. Introduction

In the Virtual Observatory (VO), registries provide a means for discovering useful resources, i.e., data and services. This discovery takes place by searching within structured descriptions of resources, the resource records. To make discovery efficient, a registry typically keeps some internal representation of all resource records available within the VO.

However, the resource records themselves and the data providers that maintain them are distributed. Since the VO contains multiple registries, the registry service itself is distributed as well. Thus, there is a clear need for common mechanisms for registry communication and interaction.

This document describes the standard interfaces that enable interoperable registries. Through these interfaces, registry builders have a common way of sharing resource descriptions with users, applications, and other registries. Client applications can be built according to this specification and be able to discover and retrieve descriptions from any compliant registry.

This specification does not preclude a registry operator from providing additional value-added interfaces and capabilities. In particular, they are free to build interactive, end-user interfaces in any way that best serves their target community. It is a design goal of this specification, though, that different registries operating on the same set of registry records will return identical responses for some defined subset of possible queries.

Registry Architecture and Definitions

A registry is first and foremost a repository of structured descriptions of resources. In the VO, a resource is defined by the IVOA Recommendation "Resource Metadata for the Virtual Observatory" [RMI]:

A resource is a general term referring to a VO element that can be described in terms of who curates or maintains it and which can be given a name and a unique identifier. Just about anything can be a resource: it can be an abstract idea, such as sky coverage or an instrumental setup, or it can be fairly concrete, like an organization or a data collection.

Organizations, data collections, and services can be considered as classes of resources. The most important type of resource to applications is a service that actually does something. A registry, then, is "a service for which the response is a structured description of resources" [RMI].

This specification is based on the general IVOA model for registries [Plante2003], which builds on the resource model of [RMI]. In the registry model, the VO environment features different types of registries that serve different functions. The primary distinction is between publishing registries and searchable ones. A secondary distinction is full versus local.

A searchable registry is one that allows users and client applications to search for resource records using selection criteria against the metadata contained in the records. The purpose of this type of registry is to aggregate descriptions of many resources distributed across the network. By providing a single place to locate data and services, applications are saved from having to visit many different sites just to determine which ones are relevant to the scientific problem at hand. A searchable registry gathers its descriptions from across the network through a process called harvesting.

A publishing registry is one that simply exposes its resource descriptions to the VO environment in a way that allows those descriptions to be harvested. The contents of these registries tend to be limited to resources maintained by one or a few providers and thus are local in nature; for example, a data center will run its own publishing registry to expose all the resources it maintains to the VO environment. Since the purpose is simply publishing and not serving users and applications directly, it is not necessary to support full searching capabilities. This simplifies the requirements for a publishing registry: not only does it not need to support the general search interface, but the storage and management of its records can also be simpler. While a searchable registry will in practice require a database system, a simple publishing registry may get by storing its records as flat files on disk.

Note that some registries can play both roles; that is, a searchable registry may also publish its own resource descriptions.

A secondary distinction is full versus local. A full registry is one that attempts to contain records of all resources known to the VO. Several such registries exist, run by various VO projects. A local registry, on the other hand, contains only a subset of known resources.

As mentioned above, harvesting is the mechanism by which a registry can collect resource records from other registries. It is used by full registries to aggregate resource records from many publishing registries. It can also be used to synchronize two registries to ensure that they have the same contents. Harvesting, in this specification, is modeled as a pull operation between two registries. The term harvester refers to the registry that wishes to receive records (usually a searchable registry); it sends its request to the harvestee (usually a publishing registry), which responds with the records. Harvesting is a much simpler process than search and retrieval. Consequently, two different protocols are employed for the two types of registry operations.

The Registry Interface within the VO Architecture


Figure 1: IVOA Architecture diagram with the Registry Interface specification (RI) and the related standards marked up.

This specification directly relates to other VO standards in the following ways:

VOResource, v1.03 [VOR]
VOResource sets the foundation for a formal definition of the data model for resource records via its schema definition. This document refers to concepts laid down there via the utypes given here.
VODataService, v1.1 [VODS]
VODataService extends the VOResource data model with important concepts like tablesets, and it introduces several new resource types (data services, data collections). These concepts are reflected in the database schema defined for searchable registries.
Other Registry Extensions
Registry extensions (including this document) are VO standards defining how special types of resources are described. Most aspects introduced by them are reflected in the res_detail table using utypes algorithmically generated from the XML schema documents given by these standards. This document should therefore not, in general, need to be updated when registry extensions are updated. Still, in particular with a view to the caveat in VOResource Utypes, we note the versions current as of this specification: SimpleDALRegExt 1.0, StandardsRegExt 1.0, TAPRegExt 1.0.
TAP, v1.0 [TAP]
TAP is used as the transport protocol for the queries and results in the interface to searchable registries. It also allows discovering local additions to the registry relations via TAP's metadata publishing mechanisms.
IVOA Identifiers, v1.12 [VOID]
IVOA identifiers play a role similar to primary keys in the VO registry. Also, the notion of an authority as laid down in IVOA Identifiers plays an important role, as publishing registries can be viewed as a realization of a set of authorities.

This standard also relates to other IVOA standards:

utypes
To link columns and tables in the relational resource model to entities defined in VOResource and ancillary specifications, we employ utypes. These utypes are generated by running the XSLT style sheet given in Appendix A on the XML schema documents shipped with the relevant standards.

The IVOA Harvesting Interface

The harvesting interface allows the retrieval of complete VOResource records from registries supporting harvesting. Publishing registries MUST support the IVOA harvesting interface; searchable registries SHOULD do so.

The OAI Protocol for Metadata Harvesting

The IVOA harvesting interface is built on the standard Protocol for Metadata Harvesting developed by the Open Archives Initiative, OAI-PMH [OAI]. Version 2.0 of Registry Interfaces drops support for the SOAP variant of OAI-PMH defined in version 1.0 of this specification.

For details we refer to [OAI]; the following brief overview of OAI-PMH should be sufficient to understand the protocol's role within the Registry Interface architecture.

The OAI-PMH v2.0 specification defines:

  • the meaning and behavior of the six harvesting operations, referred to as verbs,
  • the meaning of the input arguments for each operation, and
  • the XML Schema used to encode response messages.

The six standard operations laid down in OAI-PMH are:

Identify
provides a description of the registry
ListIdentifiers
returns a list of identifiers for the resource records held by the registry, possibly restricted to records changed within a certain time span or to those belonging to a certain set.
ListRecords
returns complete resource records in the registry, possibly restricted to records changed within a certain time span or to those belonging to a certain set.
GetRecord
returns a single resource description matching a given identifier.
ListMetadataFormats
returns a list of supported formats that the registry can use to encode resource descriptions upon a harvester's request.
ListSets
returns a list of set names supported by the registry that harvesters can request in order to get back a subset of the descriptions held by the registry.
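In the HTTP binding of OAI-PMH, each operation is a GET request whose query string carries the verb and its arguments. As a minimal sketch of what harvester requests look like, the following Python function builds request URLs; the base URL and the helper name oai_request_url are hypothetical, while verb, identifier, metadataPrefix, from and set are OAI-PMH argument names.

```python
from urllib.parse import urlencode

# Hypothetical OAI-PMH base URL of a registry (illustrative only).
BASE_URL = "http://registry.example.org/oai"


def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL for a verb and its arguments.

    "from" is a Python keyword, so it is passed as from_ and the trailing
    underscore is stripped from all argument names.
    """
    params = {"verb": verb}
    for name, value in args.items():
        params[name.rstrip("_")] = value
    # urlencode percent-escapes reserved characters such as ':' and '/'.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(oai_request_url(BASE_URL, "Identify"))
    print(oai_request_url(BASE_URL, "GetRecord",
                          identifier="ivo://example.org/my-service",
                          metadataPrefix="ivo_vor"))
    print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="ivo_vor",
                          from_="2013-01-01", set="ivo_managed"))
```

Note that the IVOA identifier passed to GetRecord is percent-encoded in the query string, since it contains characters reserved in URLs.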

The ListRecords and GetRecord operations return the actual resource description records held by the registry. These descriptions are encoded in XML and wrapped in a general-purpose envelope defined by the OAI-PMH XML Schema (with the namespace http://www.openarchives.org/OAI/2.0/).
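As an illustration of this envelope, the following sketch uses Python's xml.etree.ElementTree to list, for each record in a ListRecords response, the identifier and datestamp from the OAI-PMH header, whether the record is marked deleted, and the xsi:type of the wrapped ri:Resource. The sample response is made up and heavily abbreviated (it is not a schema-valid record); the function name summarize_records is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
RI = "http://www.ivoa.net/xml/RegistryInterface/v1.0"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Made-up, abbreviated ListRecords response (not schema-valid).
SAMPLE = """<oai:OAI-PMH xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:ri="http://www.ivoa.net/xml/RegistryInterface/v1.0"
    xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <oai:ListRecords>
    <oai:record>
      <oai:header>
        <oai:identifier>ivo://example.org/cat</oai:identifier>
        <oai:datestamp>2013-05-02T10:00:00Z</oai:datestamp>
      </oai:header>
      <oai:metadata>
        <ri:Resource xsi:type="vs:CatalogService" status="active"/>
      </oai:metadata>
    </oai:record>
    <oai:record>
      <oai:header status="deleted">
        <oai:identifier>ivo://example.org/old</oai:identifier>
        <oai:datestamp>2012-11-30T08:00:00Z</oai:datestamp>
      </oai:header>
    </oai:record>
  </oai:ListRecords>
</oai:OAI-PMH>"""


def summarize_records(xml_text):
    """Return (identifier, datestamp, deleted, xsi_type) for each record."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(f"{{{OAI}}}record"):
        header = record.find(f"{{{OAI}}}header")
        identifier = header.findtext(f"{{{OAI}}}identifier")
        datestamp = header.findtext(f"{{{OAI}}}datestamp")
        deleted = header.get("status") == "deleted"
        # Deleted records carry no oai:metadata, hence no ri:Resource.
        resource = record.find(f"{{{OAI}}}metadata/{{{RI}}}Resource")
        xsi_type = resource.get(f"{{{XSI}}}type") if resource is not None else None
        rows.append((identifier, datestamp, deleted, xsi_type))
    return rows


if __name__ == "__main__":
    for row in summarize_records(SAMPLE):
        print(row)
```

ElementTree does not resolve QNames in attribute values, so xsi:type comes back as the literal string vs:CatalogService; a complete client would have to map the prefix to its namespace itself.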

Through the operations' arguments, OAI-PMH provides a number of useful features:

  • Support for multiple return formats. As suggested by the ListMetadataFormats operation, a harvester can request the formats available for encoding returned resource descriptions.
  • Harvesting by date. The ListIdentifiers and ListRecords operations both support from and until date arguments, which restrict the response to records changed within the given, possibly half-open, interval.
  • Harvesting by category. The ListIdentifiers and ListRecords operations both support a set argument for retrieving resources that are grouped in a particular category. Resource records may belong to multiple sets.
  • Marking records as deleted. Registries may mark records as deleted so that harvesters may remove access to them from their applications. Registries may permanently remove deleted resources that have been marked deleted for more than six months.
  • Support for resumption tokens. If a request results in returning a very large number of records, the registry can choose to split the results over several calls; this is done by passing a resumption token back to the harvester. The harvester uses it to retrieve the next set of matching results.
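The features above combine into a simple incremental harvesting loop. The sketch below, which is one possible design rather than a prescribed one, issues ListRecords requests in the ivo_vor format, optionally restricted by from and set, and follows resumption tokens until the harvestee returns an empty token or none at all. Following OAI-PMH, a resumed request carries only the verb and the resumptionToken argument, and the noRecordsMatch error is treated as an empty result. The function names and the injectable fetch parameter are assumptions made so that the loop can be exercised without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"

Fetcher = Callable[[str, Dict[str, str]], bytes]


def http_fetch(base_url: str, params: Dict[str, str]) -> bytes:
    """Send one OAI-PMH GET request and return the raw response body."""
    with urlopen(base_url + "?" + urlencode(params)) as response:
        return response.read()


def harvest(base_url: str, since: Optional[str] = None,
            set_spec: Optional[str] = None,
            fetch: Fetcher = http_fetch) -> Iterator[ET.Element]:
    """Yield oai:record elements from ListRecords, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "ivo_vor"}
    if since:
        params["from"] = since
    if set_spec:
        params["set"] = set_spec
    while True:
        # Parse bytes so that an encoding declaration in the response is accepted.
        root = ET.fromstring(fetch(base_url, params))
        error = root.find(OAI + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing matched the date/set restriction
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        yield from root.iter(OAI + "record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # last (or only) batch
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A harvester might, for instance, store the date of its last successful harvest and pass it as since on the next run, so that only changed records (including those marked deleted) are transferred.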

It is important to note that the OAI-PMH interface is not intended to be a general search interface. The filtering capabilities described above are just enough to support intelligent harvesting between registries. Most end-user applications will use the search interface described below.

In addition to basic OAI-PMH compliance, this specification defines a set of OAI-PMH-compliant requirements and recommendations specific to the use of OAI-PMH within the VO; these are described in the remaining subsections.

Metadata Formats for Resource Descriptions

All IVOA registries that support the Harvesting Interface must support two standard metadata formats: the OAI Dublin Core format (mandated by the base OAI-PMH standard) and the IVOA VOResource metadata format [VOR].

The VOResource metadata format has the metadata prefix name ivo_vor, which can be used wherever [OAI] allows a metadata prefix name. The format uses the VOResource core XML Schema with the namespace http://www.ivoa.net/xml/VOResource/v1.0 (recommended namespace prefix vr:), along with any legal extension of this schema, to encode the resource descriptions within the OAI-PMH metadata element from the OAI XML Schema (namespace http://www.openarchives.org/OAI/2.0/, recommended namespace prefix oai:). Specifically, the format is represented by an element called Resource from the http://www.ivoa.net/xml/RegistryInterface/v1.0 namespace (recommended namespace prefix ri:), which is the sole child of the oai:metadata element. The registry interface schema is defined by this standard and is given in Appendix C. The ri:Resource element must include an xsi:type attribute that assigns the element's type to vr:Resource or one of its legal extensions.
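To make the nesting concrete, the following sketch builds an oai:metadata element whose sole child is an ri:Resource carrying an xsi:type attribute. It assumes VODataService's vs:CatalogService as the extension type and VOResource's convention of unqualified child elements; the attribute values and the helper name wrap_resource are illustrative, and a real record would contain many more elements (curation, content, capabilities).

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
RI = "http://www.ivoa.net/xml/RegistryInterface/v1.0"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
VS = "http://www.ivoa.net/xml/VODataService/v1.1"

# Serialize with the recommended prefixes instead of ns0, ns1, ...
for prefix, uri in (("oai", OAI), ("ri", RI), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def wrap_resource(identifier: str, title: str,
                  xsi_type: str = "vs:CatalogService") -> ET.Element:
    """Return an oai:metadata element holding a single ri:Resource."""
    metadata = ET.Element(f"{{{OAI}}}metadata")
    resource = ET.SubElement(metadata, f"{{{RI}}}Resource", {
        f"{{{XSI}}}type": xsi_type,
        # vs: occurs only inside an attribute value, which ElementTree does
        # not inspect, so the prefix binding is written out explicitly.
        "xmlns:vs": VS,
        "created": "2013-01-01T00:00:00",   # illustrative values
        "updated": "2013-01-01T00:00:00",
        "status": "active",
    })
    # VOResource child elements are unqualified (no namespace prefix).
    ET.SubElement(resource, "title").text = title
    ET.SubElement(resource, "identifier").text = identifier
    return metadata


if __name__ == "__main__":
    element = wrap_resource("ivo://example.org/cat", "Example Catalogue")
    print(ET.tostring(element, encoding="unicode"))
```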

If and when the VOResource schema evolves to a new version, this standard must be updated accordingly. This definition is thus locked to a particular version of VOResource, so that saying a registry is compliant with vX.X of this document implies a specific version of VOResource.

It is strongly recommended that all QName values of xsi:type attributes within the VOResource record use XML namespace prefixes as recommended in [VOR] or the VOResource extensions. Minor version changes are not in general reflected in the recommended prefixes (e.g., both VODataService 1.0 and VODataService 1.1 use vs:). If you must deliver OAI-PMH documents containing resource records written to different versions of a registry extension, override the prefix bindings on the element level if at all possible.

The OAI Dublin Core format, with the metadata prefix oai_dc, is defined by the OAI-PMH base standard and must be supported by all OAI-PMH compliant registries. TODO: write out that mapping; it should be largely straightforward.

Harvestable registries may support other metadata formats. The ListMetadataFormats response must list the names of all formats supported by the registry; since ivo_vor and oai_dc are mandatory, this list must include both of them.
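A harvester or validator might check this requirement with a few lines of code. The following sketch, with the hypothetical helper missing_formats, parses a ListMetadataFormats response and reports which of the two mandatory metadata prefixes are absent.

```python
import xml.etree.ElementTree as ET
from typing import Set

OAI = "{http://www.openarchives.org/OAI/2.0/}"
REQUIRED_PREFIXES = {"ivo_vor", "oai_dc"}


def missing_formats(xml_text: str) -> Set[str]:
    """Return the mandatory metadata prefixes absent from a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    offered = {
        fmt.findtext(OAI + "metadataPrefix", "").strip()
        for fmt in root.iter(OAI + "metadataFormat")
    }
    return REQUIRED_PREFIXES - offered
```

An empty result means both mandatory formats are offered; additional, registry-specific formats are simply ignored.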

Identifiers in OAI Messages

In accordance with the OAI-PMH standard, an OAI-PMH XML envelope that contains a resource description must include a globally unique URI that identifies that resource record. This identifier must be the IVOA identifier used to identify the resource being described as given in its vr:identifier child element.
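A validator could check this rule by comparing, for every record carrying metadata, the identifier in the OAI-PMH header with the identifier child of ri:Resource (the vr:identifier element, written unqualified as in VOResource instance documents). The sketch below, using the hypothetical function identifier_mismatches, compares the strings exactly and skips records without metadata, such as deleted ones.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"
RI = "{http://www.ivoa.net/xml/RegistryInterface/v1.0}"


def identifier_mismatches(xml_text: str) -> List[Tuple[str, str]]:
    """List (header identifier, resource identifier) pairs that differ."""
    root = ET.fromstring(xml_text)
    problems = []
    for record in root.iter(OAI + "record"):
        header_id = record.findtext(f"{OAI}header/{OAI}identifier", "").strip()
        resource = record.find(f"{OAI}metadata/{RI}Resource")
        if resource is None:  # e.g. deleted records carry no metadata
            continue
        # The resource's own identifier is an unqualified child element.
        resource_id = resource.findtext("identifier", "").strip()
        if header_id != resource_id:
            problems.append((header_id, resource_id))
    return problems
```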

This specification does not follow the recommendation of the OAI-PMH standard with regard to record identifiers. OAI-PMH makes a distinction between the resource record containing resource metadata and the resource itself; thus, it recommends that the identifier in the OAI envelope be different from the resource identifier. In particular, the former is the choice of the publishing registry. This allows one to distinguish resource descriptions of the same resource from different registries, which in principle could be different.

In the VO, because resource descriptions of the same resource from different registries are not intended to differ (apart from possible additions of vr:validationLevel elements), there is no strong need to distinguish between the resource and the resource description. Making the resource and resource record identifiers the same makes it much easier to retrieve the record for a single resource via GetRecord, regardless of which registry is being queried. Otherwise (when the registry chooses the record identifier), a client will not know a priori the record identifier for a particular resource, and so it would have to call ListRecords and search through the metadata of all the records itself to find the one of interest. In contrast, IVOA identifiers are intended to be a cross-application way of referring to a resource, and thus when a client wants only a single specific resource record, it is very likely to know the resource identifier when making a call to the GetRecord operation.

Required Records

This section describes the records that a harvestable IVOA Registry must include among those it emits via the OAI-PMH operations.

The harvestable registry MUST return one record that describes the registry itself as a whole, and the ivo_vor format MUST be supported for this record. This record is included in the Identify operation response. When encoded using the ivo_vor format, the returned ri:Resource element must be of the type vg:Registry from the VORegistry schema (namespace http://www.ivoa.net/xml/VORegistry/v1.0; recommended namespace prefix vg:; see Appendix A). The record MUST include a vg:managedAuthority for every authority identifier that originated at that registry.

Before adding an authority to the list of a registry's managed authorities, the registry operator must verify that no other registry claims to manage that authority. In other words, within the whole VO, the relation mapping registries to authorities must be invertible. This allows determining the originating registry just from the authority part of a record's identifier. This specification does not provide technical safeguards to ensure the invertibility of the managed authority relation.
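Since the specification leaves checking this relation to registry operators, a sketch of such a check may help. Assuming the managed authorities of all registries have been collected into a dictionary (for example from their vg:Registry records), the functions below report authorities claimed by more than one registry and, when there are none, look up the originating registry of an identifier from its authority part. The function names are hypothetical, and authorities are compared case-insensitively, which is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Collection, Dict, List
from urllib.parse import urlsplit


def authority_of(ivoid: str) -> str:
    """Return the authority part of an ivo:// identifier, lower-cased."""
    parts = urlsplit(ivoid)
    if parts.scheme.lower() != "ivo" or not parts.netloc:
        raise ValueError(f"not an IVOA identifier: {ivoid}")
    return parts.netloc.lower()


def conflicting_authorities(
        managed: Dict[str, Collection[str]]) -> Dict[str, List[str]]:
    """Map each authority claimed by more than one registry to its claimants."""
    claimants: Dict[str, List[str]] = defaultdict(list)
    for registry, authorities in managed.items():
        for authority in authorities:
            claimants[authority.lower()].append(registry)
    return {a: regs for a, regs in claimants.items() if len(regs) > 1}


def originating_registry(ivoid: str,
                         managed: Dict[str, Collection[str]]) -> str:
    """Return the registry managing the authority of ivoid.

    Only meaningful if conflicting_authorities(managed) is empty, i.e.
    the registry-to-authority relation is invertible.
    """
    owner = {a.lower(): reg for reg, auths in managed.items() for a in auths}
    return owner[authority_of(ivoid)]
```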

The harvestable registry must be able to return exactly one record in the ivo_vor format for each authority identifier listed as a vg:managedAuthority in the vg:Registry record that describes that registry. When encoded in the ivo_vor format, the type of these elements must be vg:Authority.

The Identify Operation

The Identify operation describes the harvestable registry as a whole. The response from this operation must include all information required by the OAI-PMH standard. In particular, it must include an oai:baseURL element that must refer to the base URL of the harvesting interface endpoint. The Identify response must include an oai:description element containing a single ri:Resource element with an xsi:type attribute that sets the element's type to vg:Registry. The content of this element must be the registry description of the harvestable registry itself.
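On the harvester side, the managed authorities can be read from the Identify response. The sketch below, with the hypothetical function read_identify, extracts the oai:baseURL, the registry's identifier and its managedAuthority values. As a simplification, it recognizes the vg:Registry type by the local part of the xsi:type value only, since ElementTree does not retain the prefix bindings needed to resolve it properly.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"
RI = "{http://www.ivoa.net/xml/RegistryInterface/v1.0}"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def read_identify(xml_text: str) -> Tuple[str, str, List[str]]:
    """Return (baseURL, registry identifier, managed authorities)."""
    root = ET.fromstring(xml_text)
    identify = root.find(OAI + "Identify")
    if identify is None:
        raise ValueError("not an Identify response")
    base_url = identify.findtext(OAI + "baseURL", "").strip()
    for resource in identify.iterfind(f"{OAI}description/{RI}Resource"):
        # Simplification: compare only the local part of the xsi:type QName.
        if resource.get(XSI_TYPE, "").split(":")[-1] == "Registry":
            authorities = [el.text.strip()
                           for el in resource.findall("managedAuthority")
                           if el.text]
            identifier = resource.findtext("identifier", "").strip()
            return base_url, identifier, authorities
    raise ValueError("no vg:Registry record in Identify response")
```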

IVOA Supported Sets

Sets, as defined in the OAI-PMH standard, are "an optional construct for grouping items for the purpose of selective harvesting" (see [OAI], section 2.6). Harvestable IVOA registries are free to define any number of custom sets for categorizing records. The OAI-PMH standard allows a record to be a member of multiple sets.

This specification defines one reserved set name with a special meaning; future versions of this specification may define additional set names. These reserved set names will all start with the characters ivo_; implementors should not define their own set names that begin with this string. While support for sets is optional to be compliant with the OAI-PMH standard, a harvestable registry must support the set with the reserved name ivo_managed to be compliant with this specification.

The ivo_managed set refers to all records that originate from the queried registry; records that were harvested from other registries are excluded. The IVOA identifiers given in these records must have an authority identifier that matches one of the vg:managedAuthority values in the vg:Registry record for that registry. Full searchable registries may use this set to avoid getting duplicate records when harvesting from many registries.
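Deciding set membership and guarding the reserved prefix are both simple string operations. In the sketch below (function names hypothetical), ivo_managed_members keeps the identifiers whose authority part is among the registry's managed authorities, compared case-insensitively as an assumption, and check_custom_set_name rejects locally defined set names starting with ivo_.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

RESERVED_SET_PREFIX = "ivo_"


def check_custom_set_name(name: str) -> None:
    """Reject locally defined set names that use the reserved ivo_ prefix."""
    if name.startswith(RESERVED_SET_PREFIX):
        raise ValueError(f"set names starting with {RESERVED_SET_PREFIX!r} are reserved")


def ivo_managed_members(identifiers: Iterable[str],
                        managed_authorities: Iterable[str]) -> List[str]:
    """Select the identifiers whose authority part the registry manages."""
    managed = {a.lower() for a in managed_authorities}
    # For ivo://authority/key, urlsplit puts the authority into netloc.
    return [i for i in identifiers if urlsplit(i).netloc.lower() in managed]
```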

Searching the Registry

To be written. One candidate (possibly to be referenced from here) is at https://volute.googlecode.com/svn/trunk/projects/registry/regtap.

Registering Registries

This specification defines a VOResource extension schema called VORegistry that can be used to specifically describe a registry and its support for the registry interface described in this document. These descriptions can be stored as resource records in registries. The schema is also used to register a naming authority—a publisher who claims ownership of an authority identifier from which IVOA identifiers may be created. A publishing registry is said to exclusively manage a naming authority on behalf of the owning publisher; this means that only that registry may publish records with IVOA identifiers using that authority identifier. The full VORegistry syntax definition expressed in XML Schema is listed in Appendix TODO.

The Schema Namespace and Location

The VORegistry schema namespace is http://www.ivoa.net/xml/VORegistry/v1.0. As with the core VOResource Schema, the namespace URI has been chosen to allow it to be resolved as a URL to the XML Schema document that defines the VORegistry schema. Applications may assume that the namespace URI is so resolvable. In particular, it is recommended that the namespace URI be given as the location for the VORegistry schema within the xsi:schemaLocation attribute. The recommended prefix for this namespace is vg:.

The Authority Resource Extension and the Publishing Process

The vg:Authority type extends the core vr:Resource type to specifically describe the ownership of an authority identifier by a publishing organisation.

The IVOA identifier of a vg:Authority record provided via the vr:identifier element must have an empty resource key component as defined in [VOID]. The authority identifier component of the record's identifier is the one that is the subject of the record itself.

The meaning of a vg:Authority record is that the organisation referenced in the vg:managingOrg element has the sole right to create (in collaboration with a publishing registry) and register resource descriptions using the authority identifier given by the vr:identifier element.

Before a publisher can create resource descriptions using a new authority identifier, it must first register its claim to the authority identifier by creating a vg:Authority record. Before the publishing registry commits the record for export, it must first search a full registry to determine if a vg:Authority with this identifier already exists; if it does, the publishing of the new vg:Authority record must fail. When a registry creates a vg:Authority record, it is said that the registry manages the associated authority identifier (on behalf of the owning publisher) because only that registry may create records with identifiers using that authority identifier.
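The checks a publishing registry performs before committing a new vg:Authority record can be sketched as follows. The sketch assumes that the authority identifiers already registered in a full registry have been retrieved beforehand (how that search is done is outside its scope); it rejects identifiers with a non-empty resource key and authorities that are already claimed. The function and exception names are hypothetical, and tolerating a lone trailing slash is a choice made here, not a rule taken from [VOID].

```python
from typing import Collection
from urllib.parse import urlsplit


class AuthorityConflict(Exception):
    """Raised when an authority identifier is already registered."""


def check_new_authority(ivoid: str, registered: Collection[str]) -> str:
    """Validate a proposed vg:Authority identifier; return its authority part.

    registered holds the authority identifiers of the vg:Authority records
    found in a full registry.
    """
    parts = urlsplit(ivoid)
    if parts.scheme.lower() != "ivo" or not parts.netloc:
        raise ValueError(f"not an IVOA identifier: {ivoid}")
    # The resource key must be empty; a lone trailing '/' is tolerated here.
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        raise ValueError(f"vg:Authority identifier has a resource key: {ivoid}")
    authority = parts.netloc.lower()
    if authority in {a.lower() for a in registered}:
        # Publishing the new vg:Authority record must fail in this case.
        raise AuthorityConflict(f"authority {authority} is already registered")
    return authority
```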

Describing Registries with the Registry Resource Extension

The vg:Registry type extends the core vr:Service type to specifically describe registries that are compliant with this standard.

If the vg:full element is set to true, the registry is obligated to accept all valid resource records it harvests from other registries in accordance with the OAI-PMH specification.

The vg:managedAuthority element applies specifically to registries in their role as publishers of records. When a publishing registry claims to manage an authority identifier, it has created a vg:Authority resource record for that authority identifier.

As a subclass of vr:Service, the vg:Registry type uses vr:capability elements to describe its support for the interfaces described in this specification. In particular, the VORegistry schema defines two extensions of VOResource's vr:Capability type: one to describe support for the (now deprecated) version 1.0 searching interface and one to describe support for the harvesting interface. Both extension types extend from an intermediate restriction on vr:Capability called vg:RegCapRestriction, which forces the value of the standardID attribute to be ivo://ivoa.net/std/Registry.

As an abstract type, the vg:RegCapRestriction type cannot be used directly on its own within a resource description; one of the non-abstract extensions of this intermediate type must be used instead.

The vr:Capability extension types are used by applying the xsi:type attribute to the vr:capability element (see also [VOR], section 2.2.2). A version 2 registry should, in general, provide a harvesting capability and the three mandatory VOSI capabilities [VOSI].

The Searching Capability

The search capability (vg:Search) is no longer used; clients will instead search for TAP services supporting the registry data model to locate the search endpoints. It is retained in the schema to avoid a disruptive schema change just to remove an element.

The Harvesting Capability

A registry declares itself to be a harvestable registry by including a vr:capability element with an xsi:type attribute set to vg:Harvest.

A vr:capability element of type vg:Harvest must include at least one vr:interface element with an xsi:type attribute set to vg:OAIHTTP and the role attribute set to std. If the vr:capability element is used to simultaneously describe support for other versions of this Registry Interface standard, then the vr:interface element describing support for this version must include the version attribute set to 2.0. The vr:accessURL element must be set to the base URL for the OAI-PMH interface.
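For illustration, the following sketch builds such a capability element with Python's ElementTree. It includes only the items named in this section plus the standardID fixed by vg:RegCapRestriction; other elements the VORegistry schema may require are omitted, the base URL is hypothetical, and the use="base" value on accessURL is an assumption based on VOResource's accessURL conventions. The vg: prefix used in the xsi:type values must be bound in the enclosing record.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)


def harvest_capability(base_url: str) -> ET.Element:
    """Build a capability element declaring an OAI-PMH harvesting interface."""
    # VOResource child elements are unqualified, hence plain tag names.
    capability = ET.Element("capability", {
        f"{{{XSI}}}type": "vg:Harvest",
        "standardID": "ivo://ivoa.net/std/Registry",
    })
    interface = ET.SubElement(capability, "interface", {
        f"{{{XSI}}}type": "vg:OAIHTTP",
        "role": "std",
        "version": "2.0",  # needed when other RI versions are also described
    })
    # Assumed use="base": OAI-PMH arguments are appended to this URL.
    ET.SubElement(interface, "accessURL", {"use": "base"}).text = base_url
    return capability


if __name__ == "__main__":
    cap = harvest_capability("http://registry.example.org/oai")
    print(ET.tostring(cap, encoding="unicode"))
```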

The vg:OAISOAP extension of vr:WebService was used by Registry Interfaces 1.0 and is no longer part of this specification.

Changes from Previous Versions

For pre-REC-1.0 changes, see [RI1].

Changes from Version 1.0

  • Removed the entire section 2, specifically the SOAP-based services based on "ADQL 1.0" and XQuery.
  • Added the section on the TAP-based search and the table structure of the relational registry.
  • Dropped the requirement that registries not deliver any OAI-PMH deleted records when no temporal constraint is given.

References