Annotation of /trunk/projects/WebAssets/tools/docrepoToADS/README

Revision 5328
Thu Mar 14 13:22:27 2019 UTC (23 months, 3 weeks ago) by msdemlei
File size: 2507 byte(s)
docrepoToADS: migrating to requests to mitigate impact of varying encodings.


This directory contains software to turn the contents of the IVOA
document repository into the tagged format of ADS.

It is probably mainly of interest to the document coordinator.

Most of the information comes from the document repository landing pages
right now.  Additionally, there are the following resources:

* arXiv_ids.txt -- an accessURL/arXiv id mapping maintained by the
  document coordinator.
* published_notes.txt -- a list of landing page URLs with notes intended
  for publication; the exec names the notes to be published.
* (ads) -- via its API, we check what records were already uploaded to ADS
  to avoid inundating them with dupes.


Dependencies
============

python, beautifulsoup (Debian systems: python-beautifulsoup), requests
(Debian systems: python-requests).


The Editor Hack
===============

The Exec insisted we have to manipulate author lists to recognise that
for IVOA documents, most of the work is done by the editor.  Therefore,
the script takes the editor names, removes them from the author list if
necessary, and then prepends them to the rest of the list.
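
The reordering described above can be sketched roughly as follows.  This
is an illustration only, not the code in harvest.py: the function name
apply_editor_hack is made up, and the sketch assumes that names are plain
strings that can be compared by exact equality.

```python
def apply_editor_hack(authors, editors):
    """Return the author list with the editors moved to the front.

    authors and editors are lists of name strings.  Matching is by
    exact string equality, which is an assumption of this sketch.
    """
    editor_set = set(editors)
    # Drop the editors from wherever they appear among the authors ...
    rest = [name for name in authors if name not in editor_set]
    # ... and put them first, keeping the editors' own order.
    return list(editors) + rest


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    authors = ["Alpha, A.", "Beta, B.", "Gamma, C."]
    print(apply_editor_hack(authors, ["Gamma, C."]))
```

An editor who is not in the author list at all simply ends up in front
of the unchanged author list.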


Identifiers
===========

This script generates two sorts of identifiers:

(a) bibcodes.  The bibcodes we generate use spec as bibstem for
recommendations (which are considered refereed) and rept as bibstem for
notes (which are considered unrefereed).  The "volume" is month and day
of publication.  Where the same editor uploaded more than one document
on the same month and day, qualifiers are used to disambiguate.

(b) IVOA eprint ids.  These are not really used anywhere at the moment
but might become a tool to manage the document collection in the future.
They have the form ivoa:<r|n>.<year>.<month>.<count>, where count starts
from 0 each month and runs separately for each document type; r is for
recommendation, n for note.
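
As an illustration of the counting rule in (b), here is a minimal
sketch.  It is not taken from harvest.py: the function name
assign_eprint_ids is made up, the input is assumed to arrive in the
order in which counts should be handed out, and the numbers are written
without zero padding, which the description above does not specify.

```python
from collections import defaultdict


def assign_eprint_ids(documents):
    """Return eprint ids for (doc_type, year, month) tuples, in order.

    doc_type is "r" (recommendation) or "n" (note).  The count starts
    at 0 for each combination of document type, year, and month.
    """
    counters = defaultdict(int)
    ids = []
    for doc_type, year, month in documents:
        if doc_type not in ("r", "n"):
            raise ValueError("unknown document type: %r" % (doc_type,))
        key = (doc_type, year, month)
        # No zero padding here; that is an assumption of this sketch.
        ids.append("ivoa:%s.%d.%d.%d" % (doc_type, year, month, counters[key]))
        counters[key] += 1
    return ids


if __name__ == "__main__":
    for eprint_id in assign_eprint_ids(
            [("r", 2015, 11), ("r", 2015, 11), ("n", 2015, 11)]):
        print(eprint_id)
```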


ADS interface
=============

To avoid uploading records that ADS already has, you should obtain an
ADS API token (see https://github.com/adsabs/adsabs-dev-api).  When
generating records for submission, pass in this token through the -a
option.
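
For orientation, a duplicate check against the ADS search API could
look roughly like the sketch below.  It is not the code in harvest.py
(which uses requests; this sketch sticks to the standard library), the
function names are made up, and looking records up by bibcode is an
assumption about how one might do the check.

```python
import json
import urllib.parse
import urllib.request

ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_ads_request(token, bibcode):
    """Build (but do not send) a search request for a single bibcode."""
    params = urllib.parse.urlencode({
        "q": 'bibcode:"%s"' % bibcode,
        "fl": "bibcode",
        "rows": 1,
    })
    # ADS expects the API token as a bearer token.
    return urllib.request.Request(
        ADS_SEARCH_URL + "?" + params,
        headers={"Authorization": "Bearer " + token})


def ads_has_record(token, bibcode):
    """Return True if ADS reports a record for bibcode.

    This needs network access and a valid token.
    """
    with urllib.request.urlopen(build_ads_request(token, bibcode)) as resp:
        payload = json.load(resp)
    return payload["response"]["numFound"] > 0
```

Keeping the request construction separate from the network call makes
it possible to inspect the URL and headers without a token at hand.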


Brief HOWTO
===========

Just run:

  python harvest.py -C -a your-access-token > ads.recs

[recommendation: set the token in your environment and run

  rm -f httpwwwivoanetdocuments.cache
  python harvest.py -C -a $ADS_TOKEN > ads.recs
]

Send ads.recs to ADS (ads@cfa.harvard.edu).


Open issues
===========

Can we sanely extract references?  Maybe at least for ivoatex-processed
documents?


2015-11-19 msdemlei@ari.uni-heidelberg.de
