lukasschwab / Arxiv.py

Licence: mit

Python wrapper for the arXiv API

arxiv.py

Python wrapper for the arXiv API.

About arXiv

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

Usage

Installation

$ pip install arxiv

Verify the installation by running the test suite from a checkout of the repository:

$ python setup.py test

In your Python script, include the line

import arxiv

Query

arxiv.query(
  query="",
  id_list=[],
  max_results=None,
  start=0,
  sort_by="relevance",
  sort_order="descending",
  prune=True,
  iterative=False,
  max_chunk_results=1000
)

Argument	Type	Default
`query`	string	`""`
`id_list`	list of strings	`[]`
`max_results`	int	`None`
`start`	int	0
`sort_by`	string	`"relevance"`
`sort_order`	string	`"descending"`
`prune`	boolean	`True`
`iterative`	boolean	`False`
`max_chunk_results`	int	1000

query: an arXiv query string. The format is documented in the arXiv API user manual.
- Note: multi-field queries must be space-delimited. au:balents_leon AND cat:cond-mat.str-el is valid; au:balents_leon+AND+cat:cond-mat.str-el is not valid.
id_list: list of arXiv record IDs (typically of the format "0710.5765v1").
max_results: the maximum number of results returned by the query. Note: if this is unset and iterative=False, the call to query() can take a long time to resolve.
start: the offset of the first returned object from the arXiv query results.
sort_by: the arXiv field by which the result should be sorted.
sort_order: the sorting order, i.e. "ascending", "descending" or None.
prune: when True, received abstract objects will be simplified.
iterative: when True, query() will return an iterator. Otherwise, query() iterates internally and returns the full list of results.
max_chunk_results: the maximum number of abstracts to be retrieved by a single internal request to the arXiv API.
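
The note on space-delimited multi-field queries can be illustrated with a small helper. This is only a sketch: build_query is a hypothetical function written for this example and is not part of arxiv.py. It joins field:value clauses with " AND " so that the resulting string never contains "+" separators.

```python
def build_query(terms):
    """Join (field, value) pairs into one space-delimited arXiv query string.

    Hypothetical helper for illustration; arxiv.py does not provide it.
    """
    parts = []
    for field, value in terms:
        parts.append(f"{field}:{value}")
    # Separate clauses with " AND " (spaces, not "+"), per the note above.
    return " AND ".join(parts)


query_string = build_query([("au", "balents_leon"), ("cat", "cond-mat.str-el")])
print(query_string)  # au:balents_leon AND cat:cond-mat.str-el
```

The resulting string can then be passed as the query argument, e.g. arxiv.query(query=query_string).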

Query examples:

import arxiv

# Keyword queries
arxiv.query(query="quantum", max_results=100)

# Multi-field queries
arxiv.query(query="au:balents_leon AND cat:cond-mat.str-el")

# Get single record by ID
arxiv.query(id_list=["1707.08567"])

# Get multiple records by ID
arxiv.query(id_list=["1707.08567", "0710.5765v1"])

# Get an iterator over query results
result = arxiv.query(
  query="quantum",
  max_chunk_results=10,
  max_results=100,
  iterative=True
)

for paper in result():
   print(paper)
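
To make the interaction between max_results and max_chunk_results concrete, the sketch below lays out the (offset, size) pairs that a chunked download might request. It assumes that each internal request asks for at most max_chunk_results records; chunk_plan is a hypothetical helper, and the actual request logic inside arxiv.py may differ.

```python
def chunk_plan(start, max_results, max_chunk_results):
    """Return (offset, size) pairs covering max_results records from start.

    Hypothetical illustration; not taken from the arxiv.py source.
    """
    plan = []
    remaining = max_results
    offset = start
    while remaining > 0:
        # Never ask for more than one chunk, nor more than what is left.
        size = min(remaining, max_chunk_results)
        plan.append((offset, size))
        offset += size
        remaining -= size
    return plan


plan = chunk_plan(start=0, max_results=100, max_chunk_results=10)
print(len(plan))          # 10
print(plan[0], plan[-1])  # (0, 10) (90, 10)
```

Under this assumption, the iterative example above would issue ten requests of ten records each, and a smaller max_chunk_results trades more requests for smaller responses.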

For a more detailed description of the interaction between the query and id_list arguments, see this section of the arXiv documentation.

Download article PDF or source tarfile

arxiv.download(obj, dirpath='./', slugify=arxiv.slugify, prefer_source_tarfile=False)

Argument	Type	Default	Required?
`obj`	dict	N/A	Yes
`dirpath`	string	`"./"`	No
`slugify`	function	`arxiv.slugify`	No
`prefer_source_tarfile`	bool	`False`	No

obj is a result object, one of the list returned by query(). At minimum, obj must contain values for pdf_url and title.
dirpath is the relative path of the directory in which the downloaded file will be saved. It defaults to the current working directory.
slugify is a function that processes obj into a filename. By default, arxiv.download(obj) prepends the object ID to the object title.
If prefer_source_tarfile is True, this function downloads the source files for obj (rather than the rendered PDF) in .tar.gz format.

import arxiv

# Query for a paper of interest, then download it.
paper = arxiv.query(id_list=["1707.08567"])[0]
arxiv.download(paper)

# You can skip the query step if you have the paper info.
paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
arxiv.download(paper2)

# Use prefer_source_tarfile to download the gzipped tar file.
arxiv.download(paper, prefer_source_tarfile=True)

# Override the default filename format by defining a slugify function.
arxiv.download(paper, slugify=lambda paper: paper.get('id').split('/')[-1])
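
A named function can be used in place of the lambda when the filename needs more processing. The sketch below is one possible slugify function, not the library's default: id_title_slug is a hypothetical name, and the "id" value in the sample dict is an assumed URL-style identifier chosen to match the split('/') call above.

```python
import re


def id_title_slug(obj):
    """Build a filename stem of the form '<id>.<title>' from a result dict.

    Hypothetical example; the default arxiv.slugify may behave differently.
    """
    # Keep only the last path segment of the ID, as the lambda above does.
    short_id = obj.get("id", "").split("/")[-1]
    # Collapse runs of non-word characters into a single underscore.
    title = re.sub(r"[^\w]+", "_", obj.get("title", "")).strip("_")
    # Skip whichever part is empty so the stem has no stray separator.
    return ".".join(part for part in (short_id, title) if part)


# Assumed sample data; only "title" appears in the examples above.
paper_info = {"id": "http://arxiv.org/abs/1707.08567v1", "title": "The Paper Title"}
print(id_title_slug(paper_info))  # 1707.08567v1.The_Paper_Title
```

Such a function would be passed in the same way as the lambda, e.g. arxiv.download(paper, slugify=id_title_slug).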
