This notebook shows how a "naive" client can serialize and deserialize instances according to the Mapping Data Model Instances to VOTable Working Draft.
The following examples may not be complete, general, or efficient. Their goal is to show in a practical way how one can use UTYPEs to serialize and deserialize instances of Data Models in VOTable. Actual implementations may vary significantly and depend on the local setup, requirements, and programming language.
We say that a client is naive if:
1. it does not parse the VO-DML description file
2. it assumes a priori knowledge of one or more Data Models
3. it discovers information by looking for a set of predefined UTYPEs in the VOTable
Serializing instances is generally easier than deserializing them. By introducing deserialization first, step by step, serialization patterns should become clear, and serializing instances should then be straightforward.
In this tutorial we will use a very simplistic Data Model for STC:
The above figure represents a UML Class Diagram, i.e. a conceptual representation of the domain under study, in this case a small subset of Space Time Coordinates.
This model defines these vodml-ids:
SkyCoordinate (type)
SkyCoordinate.longitude
SkyCoordinate.latitude
SkyCoordinateFrame (type)
SkyCoordinateFrame.name
SkyCoordinateFrame.equinox
UTYPEs are pointers that refer to vodml-ids with the following syntax:
<<model_id>>:<<vodml_id>>
The model_id for STCX is stcx, so:
stcx:SkyCoordinate
is a UTYPE pointing to the SkyCoordinate type in STCX.
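The UTYPE syntax above can be taken apart with plain string handling. The following is a minimal sketch; the helper name split_utype is hypothetical and is not defined by the Mapping document.

```python
def split_utype(utype):
    """Split a UTYPE into (model prefix, vodml-id)."""
    # Split on the first colon only: the prefix comes before it,
    # the vodml-id after it
    prefix, sep, vodml_id = utype.partition(":")
    if not sep or not prefix or not vodml_id:
        raise ValueError("not a prefixed UTYPE: {!r}".format(utype))
    return prefix, vodml_id


print(split_utype("stcx:SkyCoordinate"))
print(split_utype("stcx:SkyCoordinate.longitude"))
```

A naive client would typically only compare whole UTYPE strings, so a helper like this is mostly useful for checking that an annotation belongs to a model the client knows.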
According to the Mapping Data Model Instances to VOTable document, a VOTable must include a preamble that declares the data models used in the file. This signals to readers that the VOTable falls under the Mapping specification, and allows more advanced clients to get a copy of the standard model description file (VO-DML/XML).
For such clients, the preamble also provides a resolution mechanism for the model prefixes (more information below).
Naive clients, however, assume a priori knowledge of the data model, so they do not parse the VO-DML/XML file, and they can assume globally unique prefixes.
The file positions.xml contains a list of positions, represented in the following UML Object Diagram.
A UML Object Diagram represents specific instances of the model described by the Class Diagram. In this case, the class diagram describes the attributes of generic sky positions, while the object diagram represents some specific values of sky positions.
The following cells will show how to retrieve information about such instances using the Mapping specification.
We will use the lxml Python package to parse the VOTable as XML, and to serialize and deserialize instances. For deserialization we will mostly use XPath strings.
import lxml.etree as ET
pos_vot = ET.parse('positions.xml').getroot()
Some special UTYPEs are used to mark up the VOTable and work as handles for clients.
In particular, vo-dml:Instance.root tags GROUP elements that contain an instance representation according to the Mapping specification.
On the other hand, vo-dml:Instance.type is used as the @utype attribute of PARAMs in GROUPs to store the type of the instance serialized in the GROUP itself.
The mapping is recursive, so an instance inside an instance will be represented by a GROUP nested inside a GROUP.
However, only the first GROUP in the hierarchy (the root) will have the @utype set to vo-dml:Instance.root, while nested GROUPs are identified by a PARAM with @utype="vo-dml:Instance.type".
So, the following command finds all the instance representations that the Data Provider serialized in the file, but other instances may be nested.
pos_vot.findall('.//GROUP[@utype="vo-dml:Instance.root"]')
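To make the root/nested distinction concrete, here is a minimal sketch on a made-up fragment. The fragment is an assumption for illustration only (it is not taken from positions.xml and is not a valid VOTable), and the sketch uses the standard library's xml.etree.ElementTree under the alias StdET, rather than lxml, so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as StdET

# Hypothetical fragment: a root GROUP for a Source with a nested GROUP
# for its position
FRAGMENT = """
<TABLE>
  <GROUP utype="vo-dml:Instance.root">
    <PARAM utype="vo-dml:Instance.type" value="src:Source"/>
    <GROUP utype="src:Source.position">
      <PARAM utype="vo-dml:Instance.type" value="stcx:SkyCoordinate"/>
    </GROUP>
  </GROUP>
</TABLE>
"""

table = StdET.fromstring(FRAGMENT)
# Only the outermost GROUP is tagged as a root
roots = table.findall('.//GROUP[@utype="vo-dml:Instance.root"]')
# Every GROUP, root or nested, declares its type through a PARAM
all_groups = table.findall('.//GROUP')
typed = [g for g in all_groups
         if g.find('PARAM[@utype="vo-dml:Instance.type"]') is not None]
print(len(roots), len(all_groups), len(typed))
```

Searching for the root UTYPE alone therefore misses the nested position, which is why the type-based search is needed to reach instances at any depth.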
The following command shows how to get all instance representations of a specific type, in this case a SkyCoordinate.
The idea is to get all the GROUPs having a PARAM with @utype="vo-dml:Instance.type" and @value="stcx:SkyCoordinate", that is, the UTYPE pointing to the SkyCoordinate class in the VO-DML description of the example model.
positions = pos_vot.xpath(
    './/GROUP[PARAM[@utype="vo-dml:Instance.type" and @value="stcx:SkyCoordinate"]]')
print(len(positions))
Although the file contains four positions, only two GROUPs are found.
In the positions.xml file, there are two GROUPs representing SkyCoordinates, i.e. positions in the sky according to a very simplistic STC model.
One GROUP is an example of a direct serialization, i.e., a GROUP that has no FIELDrefs, but only PARAMs (and the same is true for any nested GROUPs therein).
A direct serialization is completely defined by its GROUP, as all the values are defined for the instance.
The other GROUP, instead, is an example of an indirect serialization, as the root GROUP or one of its nested GROUPs contains FIELDrefs. So, the GROUP represents a kind of template for instances that have values stored in table cells.
In short, one GROUP represents a complete instance, while the other represents a template for positions that are serialized in the table, in different rows. In fact, the table in positions.xml contains three rows, which together with the direct instance account for the four positions.
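The counting argument above (one direct instance plus one instance per table row) can be sketched as follows. The fragment is a made-up stand-in in the spirit of positions.xml, not the actual file, and it is not a valid VOTable; the function name count_instances is hypothetical, and the standard library's xml.etree.ElementTree (alias StdET) is used instead of lxml.

```python
import xml.etree.ElementTree as StdET

# Hypothetical fragment: one direct GROUP (PARAMs only) and one indirect
# GROUP (FIELDrefs) acting as a template over a three-row table
FRAGMENT = """
<RESOURCE>
  <GROUP utype="vo-dml:Instance.root">
    <PARAM utype="vo-dml:Instance.type" value="stcx:SkyCoordinate"/>
    <PARAM utype="stcx:SkyCoordinate.longitude" value="10.0"/>
    <PARAM utype="stcx:SkyCoordinate.latitude" value="-5.0"/>
  </GROUP>
  <TABLE>
    <GROUP utype="vo-dml:Instance.root">
      <PARAM utype="vo-dml:Instance.type" value="stcx:SkyCoordinate"/>
      <FIELDref utype="stcx:SkyCoordinate.longitude" ref="_long"/>
      <FIELDref utype="stcx:SkyCoordinate.latitude" ref="_lat"/>
    </GROUP>
    <FIELD ID="_long"/>
    <FIELD ID="_lat"/>
    <DATA><TABLEDATA>
      <TR><TD>1.0</TD><TD>2.0</TD></TR>
      <TR><TD>3.0</TD><TD>4.0</TD></TR>
      <TR><TD>5.0</TD><TD>6.0</TD></TR>
    </TABLEDATA></DATA>
  </TABLE>
</RESOURCE>
"""


def count_instances(root, group):
    """Number of instances a GROUP stands for: 1 if direct, else table rows."""
    fieldref = group.find(".//FIELDref")
    if fieldref is None:
        return 1  # direct serialization: the GROUP is the instance
    fid = fieldref.get("ref")
    # Indirect serialization: find the TABLE that owns the referenced FIELD
    for table in root.iter("TABLE"):
        if any(f.get("ID") == fid for f in table.findall("FIELD")):
            return len(table.findall("DATA/TABLEDATA/TR"))
    return 0


root = StdET.fromstring(FRAGMENT)
groups = root.findall('.//GROUP[@utype="vo-dml:Instance.root"]')
counts = [count_instances(root, g) for g in groups]
print(counts, sum(counts))
```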
The following code prints the values of longitude and latitude for the direct serialization GROUP, by using the UTYPEs defined in the model for these attributes of the SkyCoordinate class to find the PARAMs holding the actual values.
for position in positions:
    # FIND PARAMs for longitude and latitude, using UTYPEs
    longitude = position.xpath('PARAM[@utype="stcx:SkyCoordinate.longitude"]')
    latitude = position.xpath('PARAM[@utype="stcx:SkyCoordinate.latitude"]')
    # IF ANY PARAMs ARE FOUND for longitude
    if len(longitude):
        # GET THE VALUE
        print("longitude: {}".format(longitude[0].attrib['value']))
    # IF ANY PARAMs ARE FOUND for latitude
    if len(latitude):
        # GET THE VALUE
        print("latitude: {}".format(latitude[0].attrib['value']))
The following code focuses on the indirect serialization to find, by means of UTYPEs, the FIELD ID and index for latitude and longitude.
for position in positions:
    # FIND FIELDrefs for longitude and latitude, using UTYPEs
    longitude = position.xpath('FIELDref[@utype="stcx:SkyCoordinate.longitude"]')
    latitude = position.xpath('FIELDref[@utype="stcx:SkyCoordinate.latitude"]')
    # IF ANY FIELDrefs ARE FOUND for longitude
    if len(longitude):
        # GET THE FIELD ID
        fid = longitude[0].attrib['ref']
        # GET THE FIELD INDEX (ZERO-BASED: NUMBER OF PRECEDING FIELDs)
        idx = pos_vot.xpath("count(.//FIELD[@ID = $fid]/preceding-sibling::FIELD)", fid=fid)
        # PRINT THE RESULTS
        print("Longitude ID:{} Index:{}".format(fid, int(idx)))
    # IF ANY FIELDrefs ARE FOUND for latitude
    if len(latitude):
        fid = latitude[0].attrib['ref']
        idx = pos_vot.xpath("count(.//FIELD[@ID = $fid]/preceding-sibling::FIELD)", fid=fid)
        print("Latitude ID:{} Index:{}".format(fid, int(idx)))
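Counting the preceding FIELD siblings yields a zero-based column index. A minimal sketch of the same lookup without XPath functions is shown below; the fragment and the function name field_index are assumptions for illustration, and the standard library's xml.etree.ElementTree (alias StdET) stands in for lxml.

```python
import xml.etree.ElementTree as StdET

# Hypothetical table header with three columns
TABLE = StdET.fromstring("""
<TABLE>
  <FIELD ID="_name"/>
  <FIELD ID="_long"/>
  <FIELD ID="_lat"/>
</TABLE>
""")


def field_index(table, fid):
    """Zero-based position of the FIELD with the given ID, or None."""
    for index, field in enumerate(table.findall("FIELD")):
        if field.get("ID") == fid:
            return index
    return None


print(field_index(TABLE, "_long"))
print(field_index(TABLE, "_lat"))
```

Note that XPath positional predicates such as TD[n] are one-based, so a zero-based index has to be incremented by one before it is used to select a cell with XPath.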
In the above examples we showed how to use UTYPEs to find VOTable elements that make up an instance of a data model class, namely a sky coordinate with longitude and latitude.
Important points:
- We are using a simplistic example model that defines some IDs for STC concepts.
- The IDs defined in the model are used in the VOTable to annotate GROUPs, FIELDrefs, and PARAMs.
- The IDs defined in the model are prefixed in VOTable by a string (prefix, or namespace) that identifies the model ('stcx').
- We are assuming the mapping strategies defined in the Mapping document.
In other words, we are assuming direct knowledge of this simple model, and such knowledge is represented by nothing more than the UTYPE strings and domain knowledge regarding STC.
To do something more interesting, we can define functions that take UTYPE strings as parameters.
The following function assumes that the concept represented by a UTYPE is a column, and fetches the column values as a Python list.
Note: As acknowledged before, functions of this kind are not supposed to be scalable or efficient, and they may not be complete. For example, the following function definition (and many others below) does not perform any error handling.
def get_column_array(element, utype, type_):
    """
    Given a VOTable element, get the column values for the concept represented by utype,
    casting the elements to type_
    """
    # GET THE FIELDref FOR THE CONCEPT REPRESENTED BY utype
    el = element.xpath('FIELDref[@utype=$utype]', utype=utype)
    # IF ANY SUCH FIELDrefs exist
    if len(el):
        # GET THE FIELD ID
        fid = el[0].attrib['ref']
        # GET THE FIELD INDEX
        idx = element.xpath("count(//FIELD[@ID = $fid]/preceding-sibling::FIELD)", fid=fid)+1
        # GET THE TDs for that column
        tds = element.xpath('//FIELD[@ID = $fid]/following-sibling::DATA/TABLEDATA/TR/TD[$idx]', fid=fid, idx=int(idx))
        # BUILD AND RETURN THE ARRAY OF VALUES
        array = [type_(td.text) for td in tds]
        return array
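As noted, the function above performs no error handling. The following sketch shows one way a column lookup could report problems explicitly. It is an illustration under stated assumptions and not part of the tutorial's code: MappingError and get_column_checked are hypothetical names, the table fragment is made up, and the standard library's xml.etree.ElementTree (alias StdET) is used instead of lxml.

```python
import xml.etree.ElementTree as StdET


class MappingError(Exception):
    """Hypothetical error raised when an annotation cannot be resolved."""


def get_column_checked(table, group, utype, type_):
    """Return the column annotated with utype, or raise MappingError."""
    # Look for the FIELDref that carries the requested UTYPE
    refs = [r for r in group.findall("FIELDref") if r.get("utype") == utype]
    if not refs:
        raise MappingError("no FIELDref with utype {!r}".format(utype))
    fid = refs[0].get("ref")
    # Resolve the referenced FIELD to a zero-based column index
    ids = [f.get("ID") for f in table.findall("FIELD")]
    if fid not in ids:
        raise MappingError("FIELDref points to unknown FIELD {!r}".format(fid))
    idx = ids.index(fid)
    values = []
    for n, row in enumerate(table.findall("DATA/TABLEDATA/TR")):
        cells = row.findall("TD")
        try:
            values.append(type_(cells[idx].text))
        except (IndexError, TypeError, ValueError) as exc:
            raise MappingError("row {}: bad cell for {!r} ({})".format(n, fid, exc))
    return values


# Hypothetical table with a single annotated column
TABLE = StdET.fromstring("""
<TABLE>
  <GROUP utype="vo-dml:Instance.root">
    <PARAM utype="vo-dml:Instance.type" value="stcx:SkyCoordinate"/>
    <FIELDref utype="stcx:SkyCoordinate.longitude" ref="_long"/>
  </GROUP>
  <FIELD ID="_long"/>
  <DATA><TABLEDATA>
    <TR><TD>10.0</TD></TR>
    <TR><TD>20.0</TD></TR>
  </TABLEDATA></DATA>
</TABLE>
""")
group = TABLE.find("GROUP")
print(get_column_checked(TABLE, group, "stcx:SkyCoordinate.longitude", float))
try:
    get_column_checked(TABLE, group, "stcx:SkyCoordinate.latitude", float)
except MappingError as err:
    print("error: {}".format(err))
```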
The following simple helper function checks whether an element has an indirect representation.
def is_indirect(element):
    """
    Return true if the element is an indirect representation
    """
    # LOOK FOR ANY FIELDref INSIDE THE ELEMENT
    el = element.xpath('.//FIELDref')
    # RETURN TRUE IF THERE IS AT LEAST ONE FIELDref (INDIRECT REPRESENTATION)
    return len(el) > 0
The following example shows how the above functions can be used to get the array of values for a concept serialized in a VOTable.
Again, we are only assuming knowledge of the IDs that define the concept in the Data Model.
utype = "stcx:SkyCoordinate.longitude"
for position in positions:
    if is_indirect(position):
        print(get_column_array(position, utype, float))
A few more helper functions are defined below. While they may be interesting as concrete examples of how to do some simple I/O, the main reason for their creation is that they represent an I/O-specific library that implements the mapping patterns defined in the Mapping document.
In other words, they show how it is possible to separate the I/O layer from the business layer. These helper functions are general (although not complete, for the sake of simplicity) in the sense that they do not depend on the specific model.
def get_cell(element, utype, type_, index):
    """Return the value at row index of the column annotated with utype."""
    el = element.xpath('.//FIELDref[@utype=$utype]', utype=utype)
    if len(el):
        fid = el[0].attrib['ref']
        idx = element.xpath("count(//FIELD[@ID = $fid]/preceding-sibling::FIELD)", fid=fid)+1
        tds = element.xpath('//FIELD[@ID = $fid]/following-sibling::DATA/TABLEDATA/TR/TD[$idx]', fid=fid, idx=int(idx))
        return type_(tds[index].text)

def get_nrows(element):
    """Return the number of rows in the table referenced by the element's FIELDrefs."""
    el = element.xpath('.//FIELDref')
    if len(el):
        fid = el[0].attrib['ref']
        nrows = element.xpath('count(//FIELD[@ID = $fid]/following-sibling::DATA/TABLEDATA/TR)', fid=fid)
        return int(nrows)

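def get_param(element, utype, type_):
    """Return the value of the PARAM annotated with utype, cast to type_."""
    el = element.xpath('.//PARAM[@utype=$utype]', utype=utype)
    if len(el):
        # CAST THE VALUE, AS get_cell DOES FOR TABLE CELLS
        return type_(el[0].attrib['value'])
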
def find_type(element, utype):
    """Return all GROUPs below element whose declared instance type is utype."""
    type_utype = "vo-dml:Instance.type"
    return element.xpath('.//GROUP[PARAM[@utype=$type_u and @value=$utype]]',
                         type_u=type_utype,
                         utype=utype)

def get_from_field_or_param(element, utype, type_, row):
    """Read a value from the table cell if available, else from a PARAM."""
    value = None
    if is_indirect(element):
        value = get_cell(element, utype, type_, row)
    if value is None:
        value = get_param(element, utype, type_)
    return value

def get_column_array_from_field(element, utype, type_):
    """Return the column whose FIELD itself (not a FIELDref) carries utype."""
    el = element.xpath(".//FIELD[@utype=$utype]", utype=utype)
    if len(el):
        idx = el[0].xpath("count(preceding-sibling::FIELD)")+1
        tds = el[0].xpath('following-sibling::DATA/TABLEDATA/TR/TD[$idx]', idx=int(idx))
        return [type_(td.text) for td in tds]
The next cell is interesting because it defines a Position class that implements a simple structured object.
This class puts the I/O library defined above to work for deserializing instances of positions from VOTable. In order to do so, the class uses three UTYPEs that point to the STCX data model elements.
class Position(object):
    position_utype = "stcx:SkyCoordinate"
    longitude_utype = "stcx:SkyCoordinate.longitude"
    latitude_utype = "stcx:SkyCoordinate.latitude"

    def __init__(self, longitude, latitude):
        self.longitude = longitude
        self.latitude = latitude

    @staticmethod
    def find(element):
        positions = find_type(element, Position.position_utype)
        return_positions = []
        for position in positions:
            if is_indirect(position):
                nrows = get_nrows(position)
                for row in range(nrows):
                    longitude = get_from_field_or_param(position, Position.longitude_utype, float, row)
                    latitude = get_from_field_or_param(position, Position.latitude_utype, float, row)
                    return_positions.append(Position(longitude, latitude))
            else:
                longitude = get_param(position, Position.longitude_utype, float)
                latitude = get_param(position, Position.latitude_utype, float)
                return_positions.append(Position(longitude, latitude))
        return return_positions

    def __repr__(self):
        return "Position {{longitude: {}, latitude: {}}}".format(self.longitude, self.latitude)

positions = Position.find(pos_vot)
for position in positions:
    print(position)
To sum up:
- We defined a number of helper functions that implement some mapping strategies from the Mapping to VOTable specification.
- We defined a Position class that implements a simplistic STCX model. The implementation uses the helper functions and the vodml-ids defined by the STCX model.
In this simplistic example we can identify a generic I/O library made of the helper functions (let's call it volib), and a model specific library for STCX, that uses the helper functions (let's call it stclib).
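The split between a generic I/O layer and a model-specific layer can be sketched for another type of the example model, SkyCoordinateFrame. This is only an illustration under stated assumptions: the fragment and its values are made up, the names find_groups_of_type, read_param, and Frame are hypothetical, and the standard library's xml.etree.ElementTree (alias StdET) is used instead of lxml.

```python
import xml.etree.ElementTree as StdET

# --- generic I/O layer: knows the mapping patterns, not the model ---
TYPE_UTYPE = "vo-dml:Instance.type"


def find_groups_of_type(element, utype):
    """All GROUPs (at any depth) whose declared instance type is utype."""
    return [g for g in element.iter("GROUP")
            if any(p.get("utype") == TYPE_UTYPE and p.get("value") == utype
                   for p in g.findall("PARAM"))]


def read_param(group, utype, type_):
    """Value of the PARAM annotated with utype, cast to type_, or None."""
    for param in group.findall("PARAM"):
        if param.get("utype") == utype:
            return type_(param.get("value"))
    return None


# --- model layer: knows only the vodml-ids of the model ---
class Frame(object):
    frame_utype = "stcx:SkyCoordinateFrame"
    name_utype = "stcx:SkyCoordinateFrame.name"
    equinox_utype = "stcx:SkyCoordinateFrame.equinox"

    def __init__(self, name, equinox):
        self.name = name
        self.equinox = equinox

    @staticmethod
    def find(element):
        return [Frame(read_param(g, Frame.name_utype, str),
                      read_param(g, Frame.equinox_utype, str))
                for g in find_groups_of_type(element, Frame.frame_utype)]


# Hypothetical direct serialization of a frame (values are made up)
FRAGMENT = """
<RESOURCE>
  <GROUP utype="vo-dml:Instance.root">
    <PARAM utype="vo-dml:Instance.type" value="stcx:SkyCoordinateFrame"/>
    <PARAM utype="stcx:SkyCoordinateFrame.name" value="ICRS"/>
    <PARAM utype="stcx:SkyCoordinateFrame.equinox" value="J2000.0"/>
  </GROUP>
</RESOURCE>
"""
frames = Frame.find(StdET.fromstring(FRAGMENT))
for frame in frames:
    print(frame.name, frame.equinox)
```

The model layer contains no XML handling at all: supporting another type only requires a new class with its own UTYPE constants.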
The STCX Model can be useful if we attach coordinates to something. Let's say that this something is a Source, according to the following (still simple) Source data model.
In particular, the file catalog.xml contains a Catalog with three Sources, each of which has a position, as specified in the object diagram below.
First of all, we need to load the VOTable using lxml:
catalog_vot = ET.parse('catalog.xml').getroot()
The following Source and Catalog classes implement the types defined in the Source data model, just as Position above implemented a class from the STCX model.
We might think of these classes as a srclib library. Sources have positions that are STCX's SkyCoordinates. The Mapping document allows srclib to easily reuse the code in stclib.
class Source(object):
    def __init__(self, name, position):
        self.name = name
        self.position = position

    @staticmethod
    def find(element):
        source_utype = "src:Source"
        name_utype = "src:Source.name"
        position_utype = "src:Source.position"
        sources = find_type(element, source_utype)
        return_sources = []
        for source in sources:
            if is_indirect(source):
                nrows = get_nrows(source)
                for row in range(nrows):
                    name = get_from_field_or_param(source, name_utype, str, row)
                    position = Position.find(source)[row]
                    return_sources.append(Source(name, position))
            else:
                name = get_param(source, name_utype, str)
                position = Position.find(source)[0]
                return_sources.append(Source(name, position))
        return return_sources

    def __repr__(self):
        return "Source {{name: {}, position: {}}}".format(self.name, self.position)

class Catalog(object):
    def __init__(self, name, description, sources):
        self.name = name
        self.description = description
        self.sources = sources

    @staticmethod
    def find(element):
        catalog_utype = "src:Catalog"
        name_utype = "src:Catalog.name"
        description_utype = "src:Catalog.description"
        source_utype = "src:Catalog.source"
        catalogs = find_type(element, catalog_utype)
        return_catalogs = []
        for catalog in catalogs:
            if is_indirect(catalog):
                nrows = get_nrows(catalog)
                for row in range(nrows):
                    name = get_from_field_or_param(catalog, name_utype, str, row)
                    description = get_from_field_or_param(catalog, description_utype, str, row)
                    sources = Source.find(catalog)
                    return_catalogs.append(Catalog(name, description, sources))
            else:
                name = get_param(catalog, name_utype, str)
                description = get_param(catalog, description_utype, str)
                # Source.find TAKES ONLY THE ELEMENT TO SEARCH
                sources = Source.find(catalog)
                return_catalogs.append(Catalog(name, description, sources))
        return return_catalogs

    def __repr__(self):
        ret = "Catalog {{name: {}, description: {}, sources:\n".format(self.name, self.description)
        for source in self.sources:
            ret += '\t\t' + repr(source) + '\n'
        ret += '}'
        return ret
One can access objects at any level of the instance hierarchy. For example, one can get all Sources in the file:
sources = Source.find(catalog_vot)
for source in sources:
    print("{} {} {}".format(source.name, source.position.longitude, source.position.latitude))
Or one can access the main catalog object:
catalog = Catalog.find(catalog_vot)[0]
print(catalog)
We can still access individual positions, by using stclib:
positions = Position.find(catalog_vot)
for position in positions:
    print(position)
The catalog.xml file also contains some old-style STC UTYPEs:
- stc:AstroCoords.Position2D.Value2.C1
- stc:AstroCoords.Position2D.Value2.C2
The Mapping document allows such UTYPEs to live side-by-side with the new-style ones, and the following call shows how one can access the same data using such UTYPEs. Notice however, that the modular implementation we explored in this tutorial is not possible with the old-style UTYPEs.
print(get_column_array_from_field(catalog_vot, "stc:AstroCoords.Position2D.Value2.C1", float))
Serializing instances is much easier than deserializing them, at least for data providers, who do not need to implement the specifications in a complete way. Helper tools, on the other hand, must be smarter, and they can help data providers or users even further.
To keep the code simple and intuitive, the example below does not produce a valid VOTable.
The following function just enables pretty-printing of the XML.
from xml.dom import minidom

def prettify(elem):
    rough_string = ET.tostring(elem)
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=" ")
The following function serializes a catalog instance like the one we deserialized in the first part of the tutorial.
The code should be rather self-explanatory.
def print_catalog(catalog):
    resource = ET.Element("RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    # ROOT GROUP: THE CATALOG INSTANCE
    catalog_repr = ET.SubElement(table, "GROUP",
                                 utype="vo-dml:Instance.root")
    ET.SubElement(catalog_repr, "PARAM",
                  utype="vo-dml:Instance.type",
                  value="src:Catalog")
    ET.SubElement(catalog_repr, "PARAM",
                  utype="src:Catalog.name",
                  value=catalog.name)
    ET.SubElement(catalog_repr, "PARAM",
                  utype="src:Catalog.description",
                  value=catalog.description)
    # NESTED GROUP: TEMPLATE FOR THE SOURCES STORED IN THE TABLE ROWS
    source_repr = ET.SubElement(catalog_repr, "GROUP",
                                utype="src:Catalog.source")
    ET.SubElement(source_repr, "PARAM",
                  utype="vo-dml:Instance.type",
                  value="src:Source")
    ET.SubElement(source_repr, "FIELDref",
                  utype="src:Source.name",
                  ref="_name")
    # NESTED GROUP: TEMPLATE FOR THE POSITION OF EACH SOURCE
    position_repr = ET.SubElement(source_repr, "GROUP",
                                  utype="src:Source.position")
    ET.SubElement(position_repr, "PARAM",
                  utype="vo-dml:Instance.type",
                  value="stcx:SkyCoordinate")
    ET.SubElement(position_repr, "FIELDref",
                  utype="stcx:SkyCoordinate.longitude",
                  ref="_long")
    ET.SubElement(position_repr, "FIELDref",
                  utype="stcx:SkyCoordinate.latitude",
                  ref="_lat")
    # COLUMNS REFERENCED BY THE FIELDrefs ABOVE
    ET.SubElement(table, "FIELD",
                  ID="_name")
    ET.SubElement(table, "FIELD",
                  ID="_long")
    ET.SubElement(table, "FIELD",
                  ID="_lat")
    # ONE ROW PER SOURCE
    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    for source in catalog.sources:
        row = ET.SubElement(tabledata, "TR")
        ET.SubElement(row, "TD").text = source.name
        ET.SubElement(row, "TD").text = str(source.position.longitude)
        ET.SubElement(row, "TD").text = str(source.position.latitude)
    print(prettify(resource))

print_catalog(catalog)
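The function above writes an indirect serialization. For comparison, a direct serialization of a single position, together with a round trip back to Python values, might look like the sketch below. The function names position_to_group and group_to_position are hypothetical, the standard library's xml.etree.ElementTree (alias StdET) is used instead of lxml, and, like the example above, the output omits attributes that a valid VOTable PARAM would require.

```python
import xml.etree.ElementTree as StdET


def position_to_group(longitude, latitude):
    """Build a direct serialization: a root GROUP holding only PARAMs."""
    group = StdET.Element("GROUP", utype="vo-dml:Instance.root")
    StdET.SubElement(group, "PARAM",
                     utype="vo-dml:Instance.type",
                     value="stcx:SkyCoordinate")
    StdET.SubElement(group, "PARAM",
                     utype="stcx:SkyCoordinate.longitude",
                     value=str(longitude))
    StdET.SubElement(group, "PARAM",
                     utype="stcx:SkyCoordinate.latitude",
                     value=str(latitude))
    return group


def group_to_position(group):
    """Read longitude and latitude back from a direct serialization."""
    values = {p.get("utype"): p.get("value") for p in group.findall("PARAM")}
    return (float(values["stcx:SkyCoordinate.longitude"]),
            float(values["stcx:SkyCoordinate.latitude"]))


# Serialize, then parse the XML text again and deserialize
xml_text = StdET.tostring(position_to_group(10.5, -20.25))
restored = group_to_position(StdET.fromstring(xml_text))
print(restored)
```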