vitorfs / Woid

Licence: apache-2.0

Simple news aggregator displaying top stories in real time

Programming Languages

python

Labels

django crawler news

Projects that are alternatives to or similar to Woid

Awesome Python Primer

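An index of high-quality Chinese-language resources for learning Python on your own, including books, documentation, and videos, geared toward crawlers, web development, data analysis, and machine learning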

Stars: ✭ 57 (-72.06%)

Mutual labels: crawler, django

Dbworld Search

🔍 A simple search engine built with the Django framework

Stars: ✭ 39 (-80.88%)

Mutual labels: crawler, django

Ttbot

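A Toutiao (今日头条) bot that supports user login, following and unfollowing, fetching followers, publishing articles, posting to Wukong Q&A, liking, commenting, and collecting many types of news, implemented with the Toutiao web API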

Stars: ✭ 338 (+65.69%)

Mutual labels: news, crawler

Django Newsfeed

A news curator and newsletter subscription package for Django

Stars: ✭ 155 (-24.02%)

Mutual labels: news, django

Hotnewsanalysis

Analysis of trending news topics using text mining techniques

Stars: ✭ 93 (-54.41%)

Mutual labels: news, crawler

Python Testing Crawler

A crawler for automated functional testing of a web application

Stars: ✭ 68 (-66.67%)

Mutual labels: crawler, django

News Please

news-please - an integrated web crawler and information extractor for news that just works.

Stars: ✭ 969 (+375%)

Mutual labels: news, crawler

Taiwan News Crawlers

Scrapy-based crawlers for Taiwanese news

Stars: ✭ 83 (-59.31%)

Mutual labels: news, crawler

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (+5559.31%)

Mutual labels: news, crawler

N2h4

A tool for collecting Naver news

Stars: ✭ 177 (-13.24%)

Mutual labels: news, crawler

Okuna Api

🤖 The Okuna Social Network API

Stars: ✭ 200 (-1.96%)

Mutual labels: django

Videoserver

A video resource backend built with Node.js, Express, and a crawler

Stars: ✭ 200 (-1.96%)

Mutual labels: crawler

Googlescraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.

Stars: ✭ 2,363 (+1058.33%)

Mutual labels: crawler

Django Sendgrid V5

An implementation of Django's EmailBackend compatible with sendgrid-python v5+

Stars: ✭ 202 (-0.98%)

Mutual labels: django

Django Rest Marshmallow

Marshmallow schemas for Django REST framework

Stars: ✭ 198 (-2.94%)

Mutual labels: django

Django Material

Material Design for Django

Stars: ✭ 2,362 (+1057.84%)

Mutual labels: django

Laosj

golang light-weight image crawler

Stars: ✭ 199 (-2.45%)

Mutual labels: crawler

Vecihi

Build Your Own Photo Sharing App in 5 minutes

Stars: ✭ 199 (-2.45%)

Mutual labels: django

My blog

My Django Blog

Stars: ✭ 198 (-2.94%)

Mutual labels: django

Tech Blog

My personal tech blog (Python, Django, Docker, Go, Redis, ElasticSearch, Kafka, Linux)

Stars: ✭ 203 (-0.49%)

Mutual labels: django

Woid

Table of Contents

Running Locally
Supported Services
Crawlers
License

Running Locally

First, clone the repository to your local machine:

git clone https://github.com/vitorfs/woid.git

Install the requirements:

pip install -r requirements/dev.txt

Apply the migrations:

python manage.py migrate

Load the initial data:

python manage.py loaddata services.json

Finally, run the development server:

python manage.py runserver

The site will be available at 127.0.0.1:8000.
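If you prefer to script the steps above, here is a minimal, hypothetical Python helper (not part of Woid) that runs them in order. It assumes the clone lands in a directory named woid and uses the current interpreter (sys.executable) for pip and manage.py, so it should be run from an activated virtualenv. By default it only prints the commands; pass --run to execute them.

```python
import shlex
import subprocess
import sys

# The "Running Locally" steps, in order. pip and manage.py use the current
# interpreter (an assumption: run this from your activated virtualenv).
SETUP_STEPS = [
    ["git", "clone", "https://github.com/vitorfs/woid.git"],
    [sys.executable, "-m", "pip", "install", "-r", "requirements/dev.txt"],
    [sys.executable, "manage.py", "migrate"],
    [sys.executable, "manage.py", "loaddata", "services.json"],
    [sys.executable, "manage.py", "runserver"],
]


def run_setup(workdir="woid", dry_run=True):
    """Run (or, with dry_run=True, only print) each setup step in order.

    The clone runs in the current directory; every later step runs inside
    the cloned checkout, assumed to be named "woid".
    Returns the list of commands as shell-quoted strings.
    """
    executed = []
    for index, step in enumerate(SETUP_STEPS):
        cwd = None if index == 0 else workdir
        command = shlex.join(step)
        executed.append(command)
        if dry_run:
            print(f"[dry run] ({cwd or '.'}) {command}")
        else:
            # check=True stops at the first failing step.
            subprocess.run(step, cwd=cwd, check=True)
    return executed


if __name__ == "__main__":
    run_setup(dry_run="--run" not in sys.argv)
```

Note that the last step starts the development server, which keeps running until you stop it with Ctrl+C.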

Supported Services

Currently, Woid crawls the following services to collect top stories (the identifier used by the crawl command is shown in parentheses):

Hacker News (hn)
Reddit (reddit)
GitHub (github)
The New York Times (nytimes)
Product Hunt (producthunt)

Crawlers

You can run the crawlers manually to collect the top stories using the following command:

python manage.py crawl reddit

You can pass multiple services at once:

python manage.py crawl reddit hn nytimes

Valid values: hn, reddit, github, nytimes, producthunt.
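Woid's crawl command is a Django management command, and Django hands each command an argparse parser through BaseCommand.add_arguments. The actual Woid implementation is not shown here, so the following is only a hypothetical, standalone sketch of how the argument handling described above could work: one or more service names, each checked against the valid values, with duplicates dropped.

```python
import argparse

# Service identifiers accepted by the crawl command, as listed in this README.
VALID_SERVICES = ("hn", "reddit", "github", "nytimes", "producthunt")


def parse_crawl_args(argv):
    """Parse crawl arguments into an ordered list of unique service names.

    argparse checks every value against VALID_SERVICES and exits with
    status 2 (raising SystemExit) on an unknown or missing service.
    """
    parser = argparse.ArgumentParser(prog="manage.py crawl")
    parser.add_argument(
        "services",
        nargs="+",  # at least one service is required
        choices=VALID_SERVICES,
        help="services to crawl",
    )
    args = parser.parse_args(argv)
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return list(dict.fromkeys(args.services))
```

For example, parse_crawl_args(["reddit", "hn", "nytimes"]) returns ["reddit", "hn", "nytimes"], while parse_crawl_args(["twitter"]) prints a usage error and exits.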

The New York Times

To crawl The New York Times, you will need an API key.

You can register an application at developer.nytimes.com.
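To check that your key works, here is a minimal sketch that queries the New York Times Top Stories API (v2), which takes the key as an api-key query parameter. This is not Woid's crawler code, and Woid may use a different NYT endpoint; the NYTIMES_API_KEY environment variable name is a hypothetical choice for this example.

```python
import json
import os
import urllib.parse
import urllib.request

TOP_STORIES_URL = "https://api.nytimes.com/svc/topstories/v2/{section}.json"


def build_top_stories_url(api_key, section="home"):
    """Build the Top Stories request URL for a section, with the key as api-key."""
    query = urllib.parse.urlencode({"api-key": api_key})
    return f"{TOP_STORIES_URL.format(section=section)}?{query}"


def parse_top_stories(payload):
    """Extract (title, url) pairs from a Top Stories JSON response body.

    Items missing a title or URL are skipped.
    """
    data = json.loads(payload)
    return [
        (item["title"], item["url"])
        for item in data.get("results", [])
        if item.get("title") and item.get("url")
    ]


def fetch_top_stories(section="home"):
    """Fetch and parse live top stories (requires network access and a key)."""
    # NYTIMES_API_KEY is a hypothetical variable name, not a Woid setting.
    api_key = os.environ["NYTIMES_API_KEY"]
    url = build_top_stories_url(api_key, section)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_top_stories(response.read().decode("utf-8"))
```

After exporting NYTIMES_API_KEY, calling fetch_top_stories() should return a list of (title, url) pairs if the key is valid; an invalid key makes urlopen raise urllib.error.HTTPError.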

Product Hunt

Product Hunt requires an API key to consume its API.

You can register an application at api.producthunt.com/v1/docs.

Cron Jobs

You can set up cron jobs to execute the crawlers periodically. Here is what my crontab looks like:

*/5 * * * * /home/woid/venv/bin/python /home/woid/woid/manage.py crawl reddit hn producthunt >> /home/woid/logs/cron.log 2>&1
*/30 * * * * /home/woid/venv/bin/python /home/woid/woid/manage.py crawl nytimes github >> /home/woid/logs/cron.log 2>&1
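In the crontab above, the first field is the minute: */5 runs the fast-moving services (Reddit, Hacker News, Product Hunt) every 5 minutes, and */30 runs The New York Times and GitHub every 30 minutes. The following illustrative helper (not part of Woid) expands a minute field so you can see exactly when a job fires. It handles only a subset of cron syntax: *, single values, ranges A-B, steps */N and A-B/N, and comma-separated lists of those.

```python
def expand_minute_field(field):
    """Return the sorted minutes (0-59) matched by a cron minute field.

    Supports '*', single values, ranges 'A-B', steps '*/N' or 'A-B/N',
    and comma-separated lists of those. Other cron syntax raises ValueError.
    """
    minutes = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"invalid step in {part!r}")
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        elif step_text:
            # Forms like '5/10' are not handled by this sketch.
            raise ValueError(f"unsupported step syntax {part!r}")
        else:
            start = end = int(base)
        if not 0 <= start <= end <= 59:
            raise ValueError(f"minute out of range in {part!r}")
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


def runs_per_day(field):
    """Runs per day for a job whose hour, day, month, and weekday fields are '*'."""
    return len(expand_minute_field(field)) * 24
```

For the crontab above, runs_per_day("*/5") gives 288 crawls a day and runs_per_day("*/30") gives 48, which is why the slower-changing sources are scheduled less often.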

License

The source code is released under the Apache 2.0 license.

Stars: ✭ 204
