Contents of /trunk/projects/WebAssets/tools/docrepoToADS/README

Revision 5328
Thu Mar 14 13:22:27 2019 UTC by msdemlei
docrepoToADS: migrating to requests to mitigate impact of varying encodings.


This directory contains software to turn the contents of the IVOA
document repository into the tagged format of ADS.

It is probably mainly of interest to the document coordinator.

Most of the information currently comes from the document repository
landing pages.  Additionally, there are the following resources:

* arXiv_ids.txt -- an accessURL/arXiv id mapping maintained by the
  document coordinator.
* published_notes.txt -- a list of landing page URLs with notes intended
  for publication; the Exec names the notes to be published.
* (ads) -- via its API, we check what records were already uploaded to ADS
  to avoid inundating them with dupes.


Dependencies
============

python, beautifulsoup (Debian systems: python-beautifulsoup), requests
(Debian systems: python-requests).


The Editor Hack
===============

The Exec insisted that we manipulate author lists to recognise that,
for IVOA documents, most of the work is done by the editor.  Therefore,
the script takes the editor names, removes them from the author list if
necessary, and then prepends them to the rest of the list.

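The reordering described above can be sketched as follows.  This is only
an illustration, not the code in harvest.py: the function name
promote_editors is hypothetical, and names are assumed to match as exact
strings, which real landing pages may not guarantee.

```python
def promote_editors(authors, editors):
    """Return a new author list with the editors moved to the front.

    Editors keep their own relative order, the remaining authors keep
    theirs.  Names are compared as exact strings (an assumption made
    for this sketch).
    """
    editor_set = set(editors)
    # Drop the editors from the author list where they occur ...
    rest = [name for name in authors if name not in editor_set]
    # ... and put them in front of whoever is left.
    return list(editors) + rest


if __name__ == "__main__":
    print(promote_editors(
        ["Author A", "Editor E", "Author B"],
        ["Editor E"]))
```

An editor who is not in the author list at all still ends up first,
which matches the "if necessary" in the description above.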

Identifiers
===========

This script generates two sorts of identifiers:

(a) bibcodes.  The bibcodes we generate use spec as bibstem for
recommendations (which are considered refereed) and rept as bibstem for
notes (which are considered unrefereed).  The "volume" is month and day
of publication.  Where the same editor uploaded more than one document
on the same month and day, qualifiers are used to disambiguate.

(b) IVOA eprint ids.  These are not really used anywhere at the moment
but might become a tool to manage the document collection in the future.
They have the form ivoa:<r|n>.<year>.<month>.<count>, where count starts
from 0 each month and runs separately for each document type; r is for
recommendation, n for note.

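As an illustration of the eprint id scheme in (b), here is a minimal
sketch.  The function name assign_eprint_ids and the input format are
hypothetical, and the zero-padding of the month is a guess; the
description above does not specify field widths.

```python
from collections import defaultdict


def assign_eprint_ids(docs):
    """Assign ivoa:<r|n>.<year>.<month>.<count> ids.

    docs is a sequence of (doc_type, year, month) tuples in
    publication order; doc_type is "r" (recommendation) or "n" (note).
    The two-digit month is an assumption of this sketch.
    """
    # One counter per (type, year, month), each starting at 0.
    counters = defaultdict(int)
    ids = []
    for doc_type, year, month in docs:
        if doc_type not in ("r", "n"):
            raise ValueError("unknown document type: %r" % (doc_type,))
        key = (doc_type, year, month)
        ids.append("ivoa:%s.%d.%02d.%d" % (
            doc_type, year, month, counters[key]))
        counters[key] += 1
    return ids


if __name__ == "__main__":
    for eprint_id in assign_eprint_ids(
            [("r", 2015, 11), ("n", 2015, 11), ("r", 2015, 11)]):
        print(eprint_id)
```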

ADS interface
=============

To avoid uploading records that ADS already has, you should obtain an
ADS API token (see https://github.com/adsabs/adsabs-dev-api).  When
generating records for submission, pass in this token through the -a
option.

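For orientation, a query against the ADS search API with such a token
could be prepared roughly as below.  This is a sketch and not what
harvest.py does internally: it uses only the standard library rather
than requests, the function name build_ads_request is hypothetical, and
the query string in the example is an assumption about how the
generated records appear in ADS.

```python
import urllib.parse
import urllib.request

ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_ads_request(token, query, rows=200):
    """Build (but do not send) a search request returning bibcodes.

    token is the ADS API token; query is an ADS search expression
    supplied by the caller.
    """
    params = urllib.parse.urlencode(
        {"q": query, "fl": "bibcode", "rows": rows})
    # ADS expects the token as a bearer token in the Authorization header.
    return urllib.request.Request(
        ADS_SEARCH_URL + "?" + params,
        headers={"Authorization": "Bearer " + token})


if __name__ == "__main__":
    # The bibstem query is an assumption; adapt it to the records you
    # actually want to check.
    request = build_ads_request("your-access-token", 'bibstem:"ivoa.spec"')
    print(request.full_url)
    # To actually run the query:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```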

Brief HOWTO
===========

Just run:

  python harvest.py -C -a your-access-token > ads.recs

[recommendation: set the token in your environment and run

  rm -f httpwwwivoanetdocuments.cache
  python harvest.py -C -a $ADS_TOKEN > ads.recs
]

Then send ads.recs to ADS (ads@cfa.harvard.edu).


Open issues
===========

Can we sanely extract references?  Maybe at least for ivoatex-processed
documents?


2015-11-19 msdemlei@ari.uni-heidelberg.de