This directory contains software to turn the contents of the IVOA
document repository into the tagged format of ADS.

It is probably mainly of interest to the document coordinator.

Most of the information comes from the document repository landing pages
right now. Additionally, there are the following resources:

* arXiv_ids.txt -- an accessURL/arXiv id mapping maintained by the
  document coordinator.
* published_notes.txt -- a list of landing page URLs with notes intended
  for publication; the Exec names the notes to be published.
* (ads) -- via its API, we check which records were already uploaded to ADS
  to avoid inundating them with duplicates.

Dependencies
============

python, beautifulsoup (Debian systems: python-beautifulsoup), requests
(Debian systems: python-requests).

The Editor Hack
===============

The Exec insisted we have to manipulate author lists to recognise that
for IVOA documents, most of the work is done by the editor. Therefore,
the script takes the editor names, removes them from the author list if
necessary, and then prepends them to the rest of the list.
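
The reordering described above can be sketched as follows. This is only
an illustration, not the code in harvest.py: the function name
promote_editors and the sample names are made up, and it assumes that
editors and authors are spelled identically so that plain string
comparison is enough.

```python
def promote_editors(authors, editors):
    """Return the author list with the editors moved to the front.

    Editors that also appear among the authors are dropped from the
    author list first, so nobody is listed twice.  The relative order
    within the editors and within the remaining authors is preserved.

    Assumption: names match exactly (no normalisation is attempted).
    """
    editor_set = set(editors)
    remaining = [name for name in authors if name not in editor_set]
    return list(editors) + remaining


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    authors = ["Smith, J.", "Doe, A.", "Jones, B."]
    editors = ["Doe, A."]
    print(promote_editors(authors, editors))
```

An editor who is not in the author list at all simply ends up first; in
real data, differing spellings of the same name would need extra care.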

Identifiers
===========

This script generates two sorts of identifiers:

(a) bibcodes. The bibcodes we generate use spec as bibstem for
recommendations (which are considered refereed) and rept as bibstem for
notes (which are considered unrefereed). The "volume" is month and day
of publication. Where the same editor uploaded more than one document on
the same month and day, qualifiers are used to disambiguate.

(b) IVOA eprint ids. These are not really used anywhere at the moment
but might become a tool to manage the document collection in the future.
They have the form ivoa:<r|n>.<year>.<month>.<count>, where count starts
from 0 each month and runs separately for each document type; r is for
recommendation, n for note.
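
To make the counting rule for eprint ids concrete, here is a minimal
sketch. It is not taken from harvest.py: the function name
assign_eprint_ids is invented, the two-digit zero padding of the month is
an assumption (the form above does not say how the month is written), and
documents are assumed to be numbered in order of publication date.

```python
import collections
import datetime


def assign_eprint_ids(documents):
    """Return eprint ids for documents, in order of publication date.

    documents is an iterable of (doctype, date) tuples, where doctype
    is "r" (recommendation) or "n" (note) and date is a datetime.date.
    """
    # One counter per (doctype, year, month); each starts at 0.
    counters = collections.Counter()
    ids = []
    # sorted() is stable, so documents with equal dates keep input order.
    for doctype, date in sorted(documents, key=lambda doc: doc[1]):
        key = (doctype, date.year, date.month)
        # Assumption: the month is zero-padded to two digits.
        ids.append("ivoa:{}.{}.{:02d}.{}".format(
            doctype, date.year, date.month, counters[key]))
        counters[key] += 1
    return ids


if __name__ == "__main__":
    # Made-up publication dates, for illustration only.
    docs = [
        ("r", datetime.date(2015, 11, 3)),
        ("n", datetime.date(2015, 11, 5)),
        ("r", datetime.date(2015, 11, 19)),
        ("r", datetime.date(2015, 12, 1)),
    ]
    for eprint_id in assign_eprint_ids(docs):
        print(eprint_id)
```

With these made-up dates, the two November recommendations get counts 0
and 1, while the note and the December recommendation both start again
at 0.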

ADS interface
=============

To avoid uploading records that ADS already has, you should obtain an
ADS API token (see https://github.com/adsabs/adsabs-dev-api). When
generating records for submission, pass in this token through the -a
option.
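
For orientation, a check against the ADS search API could look roughly
like the sketch below. It is an assumption about one way to do this, not
a description of what harvest.py does internally: the function names are
invented, the bibcode in the usage note is a placeholder, and it uses the
standard library rather than requests to stay self-contained. The search
endpoint and the bearer-token header are those documented in
adsabs-dev-api.

```python
import json
import urllib.parse
import urllib.request

ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_bibcode_query(bibcode):
    """Return the search URL asking ADS for records with this bibcode."""
    params = urllib.parse.urlencode({
        "q": 'bibcode:"{}"'.format(bibcode),
        "fl": "bibcode",  # we only need to know whether a record exists
    })
    return ADS_SEARCH_URL + "?" + params


def ads_has_record(bibcode, token):
    """Return True if ADS reports at least one record for bibcode.

    token is the ADS API token; it is sent as a bearer token.
    This function needs network access.
    """
    request = urllib.request.Request(
        build_bibcode_query(bibcode),
        headers={"Authorization": "Bearer " + token})
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["response"]["numFound"] > 0
```

A caller would then skip every record for which, say,
ads_has_record("2015ivoa.spec.0000X", token) returns True; the bibcode
here is a placeholder, not a real record.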

Brief HOWTO
===========

Just run:

  python harvest.py -C -a your-access-token > ads.recs

[recommendation: set the token in your environment and run

  rm -f httpwwwivoanetdocuments.cache
  python harvest.py -C -a $ADS_TOKEN > ads.recs
]

Send ads.recs to ADS (ads@cfa.harvard.edu).


Open issues
===========

Can we sanely extract references? Maybe at least for ivoatex-processed
documents?


2015-11-19 msdemlei@ari.uni-heidelberg.de