lukasschwab / Arxiv.py

Licence: mit
Python wrapper for the arXiv API

arxiv.py

Supported Python versions: 2.7, 3.6

Python wrapper for the arXiv API.

About arXiv

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

Usage

Installation

$ pip install arxiv

Verify the installation by running the test suite from a checkout of the source repository:

$ python setup.py test

In your Python script, include the line

import arxiv

Query

arxiv.query(
  query="",
  id_list=[],
  max_results=None,
  start=0,
  sort_by="relevance",
  sort_order="descending",
  prune=True,
  iterative=False,
  max_chunk_results=1000
)

| Argument | Type | Default |
|---|---|---|
| query | string | "" |
| id_list | list of strings | [] |
| max_results | int | None |
| start | int | 0 |
| sort_by | string | "relevance" |
| sort_order | string | "descending" |
| prune | boolean | True |
| iterative | boolean | False |
| max_chunk_results | int | 1000 |

  • query: an arXiv query string. The format is documented in the arXiv API User's Manual.

    • Note: multi-field queries must be space-delimited. au:balents_leon AND cat:cond-mat.str-el is valid; au:balents_leon+AND+cat:cond-mat.str-el is not valid.
  • id_list: list of arXiv record IDs (typically of the format "0710.5765v1").

  • max_results: the maximum number of results returned by the query. Note: if this is unset and iterative=False, the call to query can take a long time to resolve.

  • start: the offset of the first returned object from the arXiv query results.

  • sort_by: the arXiv field by which the result should be sorted. The arXiv API accepts "relevance", "lastUpdatedDate", or "submittedDate".

  • sort_order: the sorting order, i.e. "ascending", "descending" or None.

  • prune: when True, received abstract objects will be simplified.

  • iterative: when True, query() returns a generator function; call it to iterate over results lazily, as in the iterator example below. Otherwise, query() iterates internally and returns the full list of results.

  • max_chunk_results: the maximum number of abstracts to be retrieved by a single internal request to the arXiv API.
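
The space-delimited rule for multi-field queries can be sketched as a small helper. `build_query` is a hypothetical function written for this illustration, not part of arxiv.py; it only joins `field:value` terms with a boolean operator and leaves the request itself to `arxiv.query`.

```python
def build_query(terms, operator="AND"):
    """Join (field, value) pairs into a space-delimited arXiv query string.

    Hypothetical helper for illustration; not part of arxiv.py.
    """
    if operator not in ("AND", "OR", "ANDNOT"):
        raise ValueError("unsupported operator: %s" % operator)
    parts = ["%s:%s" % (field, value) for field, value in terms]
    # Separate terms with spaces, not "+", as the note on multi-field queries requires.
    return (" %s " % operator).join(parts)


query_string = build_query([("au", "balents_leon"), ("cat", "cond-mat.str-el")])
print(query_string)
# prints au:balents_leon AND cat:cond-mat.str-el
```

The resulting string could then be passed as the `query` argument, for example `arxiv.query(query=query_string)`.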

Query examples:

import arxiv

# Keyword queries
arxiv.query(query="quantum", max_results=100)

# Multi-field queries
arxiv.query(query="au:balents_leon AND cat:cond-mat.str-el")

# Get single record by ID
arxiv.query(id_list=["1707.08567"])

# Get multiple records by ID
arxiv.query(id_list=["1707.08567", "0710.5765v1"])

# Get an iterator over query results
result = arxiv.query(
  query="quantum",
  max_chunk_results=10,
  max_results=100,
  iterative=True
)

for paper in result():
    print(paper)

For a more detailed description of the interaction between the query and id_list arguments, see the "search_query and id_list logic" section of the arXiv API User's Manual.
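
To make the relationship between start, max_results, and max_chunk_results concrete, here is a sketch of how a result window might be split into per-request chunks. `plan_chunks` is a hypothetical helper written for this illustration; it assumes each internal request fetches at most max_chunk_results records and is not taken from arxiv.py's implementation.

```python
def plan_chunks(start, max_results, max_chunk_results):
    """Return (offset, size) pairs covering max_results records from start.

    Hypothetical illustration; arxiv.py's internals may differ.
    """
    if max_chunk_results <= 0:
        raise ValueError("max_chunk_results must be positive")
    chunks = []
    remaining = max_results
    offset = start
    while remaining > 0:
        # Each request asks for no more than the chunk limit.
        size = min(max_chunk_results, remaining)
        chunks.append((offset, size))
        offset += size
        remaining -= size
    return chunks


# 25 results in chunks of at most 10, starting at offset 0.
print(plan_chunks(start=0, max_results=25, max_chunk_results=10))
# prints [(0, 10), (10, 10), (20, 5)]
```

Under this assumption, a smaller max_chunk_results means more requests but a shorter wait before the first results arrive when iterative=True.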

Download article PDF or source tarfile

arxiv.download(obj, dirpath='./', slugify=slugify, prefer_source_tarfile=False)

| Argument | Type | Default | Required? |
|---|---|---|---|
| obj | dict | N/A | Yes |
| dirpath | string | "./" | No |
| slugify | function | arxiv.slugify | No |
| prefer_source_tarfile | bool | False | No |

  • obj is a result object, one of a list returned by query(). obj must at minimum contain values corresponding to pdf_url and title.

  • dirpath is the relative directory path to which the downloaded PDF will be saved. It defaults to the present working directory.

  • slugify is a function that processes obj into a filename. By default, arxiv.download(obj) prepends the object ID to the object title.

  • If prefer_source_tarfile is True, this function will download the source files for obj, rather than the rendered PDF, in .tar.gz format.

import arxiv

# Query for a paper of interest, then download it.
paper = arxiv.query(id_list=["1707.08567"])[0]
arxiv.download(paper)

# You can skip the query step if you have the paper info.
paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
arxiv.download(paper2)

# Use prefer_source_tarfile to download the gzipped tar file.
arxiv.download(paper, prefer_source_tarfile=True)

# Override the default filename format by defining a slugify function.
arxiv.download(paper, slugify=lambda paper: paper.get('id').split('/')[-1])
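
A custom slugify function receives the result object and returns a string used for the filename. The sketch below assumes obj is a dict with "id" and "title" keys, as in the examples above; `safe_slugify` is a hypothetical name, and the character whitelist is an illustrative choice rather than anything arxiv.py prescribes.

```python
import re


def safe_slugify(obj):
    """Build a filesystem-friendly name from a result dict.

    Hypothetical example; assumes obj has 'id' and 'title' keys.
    """
    # Keep only the last path segment of the ID, e.g. "1707.08567v1".
    short_id = obj.get("id", "").split("/")[-1]
    # Replace each run of characters other than letters, digits, "." or "-" with "_".
    title = re.sub(r"[^A-Za-z0-9.-]+", "_", obj.get("title", "")).strip("_")
    return "%s.%s" % (short_id, title) if short_id else title


paper = {"id": "http://arxiv.org/abs/1707.08567v1", "title": "The Paper Title"}
print(safe_slugify(paper))
# prints 1707.08567v1.The_Paper_Title
```

It could then be passed to the download call as `arxiv.download(paper, slugify=safe_slugify)`.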
