
itielshwartz / asyncio-hn

License: MIT
Python (asyncio) wrapper for hackernews api

Programming Languages

python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to asyncio-hn

Hackernews React Graphql
Hacker News clone rewritten with universal JavaScript, using React and GraphQL.
Stars: ✭ 4,242 (+15611.11%)
Mutual labels:  hacker-news, hn
hackernews-button
Privacy-preserving Firefox extension linking to Hacker News discussion; built with Bloom filters and WebAssembly
Stars: ✭ 73 (+170.37%)
Mutual labels:  hacker-news, hn
reading-list
A community-driven, high-quality curated reading list
Stars: ✭ 45 (+66.67%)
Mutual labels:  hacker-news
RARBG-scraper
A RARBG scraper with Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (+40.74%)
Mutual labels:  scraping
socials
👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (+37.04%)
Mutual labels:  scraping
emacs-hnreader
Read Hacker News inside Emacs
Stars: ✭ 34 (+25.93%)
Mutual labels:  hacker-news
html-table-to-json
Generate JSON representations of HTML tables
Stars: ✭ 39 (+44.44%)
Mutual labels:  scraping
tophn
An application to recommend the topmost story of Hacker News from the last 24 hours
Stars: ✭ 31 (+14.81%)
Mutual labels:  hacker-news
shorter.recipes
A website dedicated to making recipes from any website easy to read.
Stars: ✭ 27 (+0%)
Mutual labels:  scraping
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+2533.33%)
Mutual labels:  scraping
4cat
The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.
Stars: ✭ 144 (+433.33%)
Mutual labels:  scraping
scrapers
scrapers for building your own image databases
Stars: ✭ 46 (+70.37%)
Mutual labels:  scraping
NBA-Fantasy-Optimizer
NBA Daily Fantasy Lineup Optimizer for FanDuel Using Python
Stars: ✭ 21 (-22.22%)
Mutual labels:  scraping
ScrapeBot
A Selenium-driven tool for automated website interaction and scraping.
Stars: ✭ 16 (-40.74%)
Mutual labels:  scraping
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+355.56%)
Mutual labels:  scraping
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+96.3%)
Mutual labels:  scraping
gochanges
[ARCHIVED] website changes tracker 🔍
Stars: ✭ 12 (-55.56%)
Mutual labels:  scraping
etf4u
📊 Python tool to scrape real-time information about ETFs from the web and mixing them together by proportionally distributing their assets allocation
Stars: ✭ 29 (+7.41%)
Mutual labels:  scraping
Architeuthis
MITM HTTP(S) proxy with integrated load-balancing, rate-limiting and error handling. Built for automated web scraping.
Stars: ✭ 35 (+29.63%)
Mutual labels:  scraping
ioweb
Web Scraping Framework
Stars: ✭ 31 (+14.81%)
Mutual labels:  scraping

asyncio-hn

Requires Python 3.6+

A simple asyncio wrapper for downloading Hacker News items with speed and ease.

The package supports all endpoints of the official Hacker News API.

Development process: Using asyncio to download Hacker News
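
For context, the official API is a small set of JSON endpoints served from https://hacker-news.firebaseio.com/v0 (topstories, item/<id>, user/<id>, and so on). Below is a minimal raw-aiohttp sketch of what the wrapper automates; the helper names fetch_json and main are illustrative, not part of the package.

import asyncio
import aiohttp

HN_API = "https://hacker-news.firebaseio.com/v0"

async def fetch_json(session, path):
    # Every endpoint returns plain JSON, e.g. /topstories.json or /item/<id>.json
    async with session.get(f"{HN_API}/{path}.json") as resp:
        return await resp.json()

async def main():
    async with aiohttp.ClientSession() as session:
        top_ids = await fetch_json(session, "topstories")      # up to 500 IDs
        story = await fetch_json(session, f"item/{top_ids[0]}")
        author = await fetch_json(session, f"user/{story['by']}")
        print(story["title"], "-", author["karma"], "karma")

loop = asyncio.get_event_loop()
loop.run_until_complete(main())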

Installation

pip install asyncio-hn

Usage

import asyncio
from asyncio_hn import ClientHN

async def main(loop):
    # Initialize the client - an extension of aiohttp.ClientSession
    async with ClientHN(loop=loop) as hn:
        # Up to 500 top stories (IDs only)
        hn_new_stories = await hn.top_stories()
        # Download the data for the top 10 stories
        top_posts = await hn.items(hn_new_stories[:10])
        # Download the user data for each story's author
        users = await hn.users([post.get("by") for post in top_posts])


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))
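
The kids field on each item holds the IDs of its top-level comments, and those are ordinary item IDs, so the same hn.items() call can fetch a story's comment thread as well. A short sketch along the same lines as the example above (the function name comments_example is illustrative, not part of the package):

async def comments_example(loop):
    async with ClientHN(loop=loop) as hn:
        top_ids = await hn.top_stories()
        # Fetch the highest-ranked story, then its top-level comments by ID
        story = (await hn.items(top_ids[:1]))[0]
        comments = await hn.items(story.get("kids", []))
        for comment in comments:
            print(comment.get("by"), "-", (comment.get("text") or "")[:80])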

Advanced usage

With this configuration you can reach 1000+ requests per second.

import asyncio
import aiohttp
from asyncio_hn import ClientHN

N = 1_000_000

async def advance_run(loop):
    # Allow up to 1000 concurrent connections and hand the connector to the client
    conn = aiohttp.TCPConnector(limit=1000, loop=loop)
    async with ClientHN(loop=loop, queue_size=1000, connector=conn, progress_bar=True, debug=True) as hn:
        # Download the last 1,000,000 items
        last_items = await hn.last_n_items(n=N)


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(advance_run(loop))

Output examples:

Items:

items = [{'by': 'amzans', 'descendants': 25, 'id': 13566716,
          'kids': [13567061, 13567631, 13567027, 13567055, 13566798, 13567473], 'score': 171, 'time': 1486210548,
          'title': 'Network programming with Go (2012)', 'type': 'story',
          'url': 'https://jannewmarch.gitbooks.io/network-programming-with-go-golang-/content/'},
         {'by': 'r3bl', 'descendants': 1, 'id': 13567940, 'kids': [13568249], 'score': 24, 'time': 1486230224,
          'title': 'YouTube removes hundreds of the best climate science videos from the Internet',
          'type': 'story',
          'url': 'http://climatestate.com/2017/02/03/youtube-removes-hundreds-of-the-best-climate-science-videos-from-the-internet/'}]

User:

user = {'created': 1470758993, 'id': 'amzans', 'karma': 174,
        'submitted': [13567884, 13566716, 13566699, 13558456, 13539270, 13539151, 13514498, 13418469, 13417725,
                      13416562, 13416097, 13416034, 13415954, 13415894, 13395310, 13394996, 13392554, 12418804,
                      12418361, 12413958, 12411992, 12411732, 12411546, 12262383, 12255593]}
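
The two shapes combine naturally. For illustration, using the top_posts and users lists from the usage example above, the following sketch (not part of the package) prints each story with its score and the author's karma:

karma_by_user = {user["id"]: user["karma"] for user in users}
for post in sorted(top_posts, key=lambda p: p.get("score", 0), reverse=True):
    print(post["score"], post["title"], "by", post["by"],
          f"({karma_by_user.get(post['by'], '?')} karma)")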