
olivettigroup / article-downloader

License: MIT
Uses publisher APIs to programmatically retrieve scientific journal articles for text mining.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to article-downloader

cvpysdk
Developer SDK - Python
Stars: ✭ 50 (-38.27%)
Mutual labels:  api-wrapper
powershellwrapper
This PowerShell module acts as a wrapper for the IT Glue API.
Stars: ✭ 96 (+18.52%)
Mutual labels:  api-wrapper
AniList-Node
A lightweight Node.js wrapper for the AniList API
Stars: ✭ 36 (-55.56%)
Mutual labels:  api-wrapper
hata
Async Discord API wrapper.
Stars: ✭ 156 (+92.59%)
Mutual labels:  api-wrapper
wikipedia-reference-scraper
Wikipedia API wrapper for references
Stars: ✭ 34 (-58.02%)
Mutual labels:  api-wrapper
epicstore api
Epic Games Store Web API Wrapper written in Python
Stars: ✭ 48 (-40.74%)
Mutual labels:  api-wrapper
Notion Api
Unofficial Notion.so API
Stars: ✭ 250 (+208.64%)
Mutual labels:  api-wrapper
cf-mailchimp
ColdFusion wrapper for the MailChimp 3.0 API
Stars: ✭ 17 (-79.01%)
Mutual labels:  api-wrapper
ruby-ambassador
Ambassador API v2 wrapper for Ruby
Stars: ✭ 20 (-75.31%)
Mutual labels:  api-wrapper
libdrizzle-redux
The next generation of Libdrizzle with a simplified API and support for more features of the protocol
Stars: ✭ 14 (-82.72%)
Mutual labels:  api-wrapper
knowledgeworks api
The API utils for querying CN-DBpedia & CN-Probase, the biggest Chinese knowledge bases
Stars: ✭ 24 (-70.37%)
Mutual labels:  api-wrapper
messages
A python package designed to make sending messages easy and efficient!
Stars: ✭ 38 (-53.09%)
Mutual labels:  api-wrapper
Pyrez
(ON REWRITE) An easy to use (a)sync wrapper for Hi-Rez Studios API (Paladins, Realm Royale, and Smite), written in Python. 🐍
Stars: ✭ 23 (-71.6%)
Mutual labels:  api-wrapper
java-binance-api
Java Binance API Client
Stars: ✭ 72 (-11.11%)
Mutual labels:  api-wrapper
pjbank-js-sdk
PJBank SDK para Javascript! ⚡ ⚡ ⚡
Stars: ✭ 24 (-70.37%)
Mutual labels:  api-wrapper
conekta-elixir
Elixir library for Conekta api calls
Stars: ✭ 15 (-81.48%)
Mutual labels:  api-wrapper
chess-web-api
Chess.com public data API wrapper with "isChanged" and priority queue functionality.
Stars: ✭ 83 (+2.47%)
Mutual labels:  api-wrapper
bookops-worldcat
BookOps WorldCat Metadata API wrapper
Stars: ✭ 21 (-74.07%)
Mutual labels:  api-wrapper
valorant.py
Complete Python interface for the Valorant API. Works right out of the box!
Stars: ✭ 84 (+3.7%)
Mutual labels:  api-wrapper
newsapi-php
A PHP client for the News API (https://newsapi.org/docs/get-started)
Stars: ✭ 21 (-74.07%)
Mutual labels:  api-wrapper

article-downloader

Badges: Circle CI | Documentation Status | DOI

Uses publisher-approved APIs to programmatically retrieve large numbers of scientific journal articles for text mining. Exposes a top-level ArticleDownloader class with methods for retrieving lists of DOIs (unique article identifiers) from text search queries, downloading HTML and PDF articles given their DOIs, and programmatically sweeping through search parameters for large-scale downloading.

Important Note: This package is intended only for publisher-approved text-mining activities! The code in this repository provides an interface to existing publisher APIs and web routes; you need your own set of API keys and permissions to download articles from any source that isn't open access.

Full API Documentation

You can read the documentation for this repository here.

Installation

Use pip install articledownloader. If you don't have pip installed, you can also download the ZIP of this repository and manually import the ArticleDownloader class into your own Python code.

Usage

Use the ArticleDownloader class to download articles. You'll need an API key, and please respect each publisher's terms of use.

It's usually best to add your API key to your environment variables with something like export API_KEY=xxxxx.
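A key exported this way can then be read back with Python's os.environ, so it never has to be hard-coded. A minimal sketch; the variable name API_KEY simply matches the export command above:

import os
from articledownloader.articledownloader import ArticleDownloader

# Read the key set via `export API_KEY=xxxxx` from the environment
downloader = ArticleDownloader(els_api_key=os.environ['API_KEY'])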

You can also find DOIs in bulk from a CSV file whose first column contains search queries; each query is used to find matching articles and retrieve their DOIs (see the example below).

Examples

Downloading a single PDF article

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader(els_api_key='your_elsevier_API_key')
my_file = open('my_path/something.pdf', 'wb')  # PDFs are binary data, so open the file in 'wb' mode

downloader.get_pdf_from_doi('my_doi', my_file, 'crossref')
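The same call also works inside a with block, which guarantees the file is closed even if the download fails. A minimal sketch of the example above:

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader(els_api_key='your_elsevier_API_key')

# The context manager closes the file automatically, even on an exception
with open('my_path/something.pdf', 'wb') as my_file:
    downloader.get_pdf_from_doi('my_doi', my_file, 'crossref')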

Downloading a single HTML article

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader(els_api_key='your_elsevier_API_key')
my_file = open('my_path/something.html', 'wb')  # the HTML response is written as raw bytes, so open in binary mode

downloader.get_html_from_doi('my_doi', my_file, 'elsevier')

Getting metadata

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader(els_api_key='your_elsevier_API_key')

# Get up to 500 DOIs for articles published after the year 2000 in a single journal
dois = downloader.get_dois_from_journal_issn('journal_issn', rows=500, pub_after=2000)

# Get the title of a single article (only works with CrossRef for now)
title = downloader.get_title_from_doi('my_doi', 'crossref')

# Get the abstract of a single article (only works with Elsevier for now)
abstract = downloader.get_abstract_from_doi('my_doi', 'elsevier')
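Putting these together: a short sketch that collects the title of every matching article, assuming get_dois_from_journal_issn returns a plain list of DOI strings (as its description above implies):

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader(els_api_key='your_elsevier_API_key')

# Fetch recent DOIs for one journal, then look up each article's title
dois = downloader.get_dois_from_journal_issn('journal_issn', rows=500, pub_after=2000)
titles = [downloader.get_title_from_doi(doi, 'crossref') for doi in dois]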

Using search queries to find DOIs

CSV file:

search query 001,
search query 002,
search query 003,
.
.
.

Python:

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader('your_API_key')

# Load one search query per CSV row
queries = downloader.load_queries_from_csv(open('path_to_csv_file', 'r'))

# Run each query and collect all of the resulting DOIs in one flat list
dois = []
for query in queries:
    dois.extend(downloader.get_dois_from_search(query))

# Download a PDF for each DOI, naming the files 0.pdf, 1.pdf, ...
for i, doi in enumerate(dois):
    my_file = open(str(i) + '.pdf', 'wb')
    downloader.get_pdf_from_doi(doi, my_file, 'crossref')  # or 'elsevier'
    my_file.close()
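For large-scale sweeps, it's good practice (and often required by publisher terms) to pace your requests. A minimal sketch, assuming a one-second pause is acceptable under your publishers' rate limits:

import time
from articledownloader.articledownloader import ArticleDownloader

downloader = ArticleDownloader('your_API_key')
queries = downloader.load_queries_from_csv(open('path_to_csv_file', 'r'))

count = 0
for query in queries:
    for doi in downloader.get_dois_from_search(query):
        # Write each article to a uniquely numbered file across all queries
        with open(str(count) + '.pdf', 'wb') as my_file:
            downloader.get_pdf_from_doi(doi, my_file, 'crossref')
        count += 1
        time.sleep(1)  # crude rate limiting; check each publisher's policy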