vitorfs / Woid

Licence: apache-2.0
Simple news aggregator displaying top stories in real time

Programming Languages

python

Projects that are alternatives of or similar to Woid

Awesome Python Primer
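An index of quality Chinese-language resources for self-taught Python beginners, including books, docs, and videos, covering crawlers, web, data analysis, and machine learning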
Stars: ✭ 57 (-72.06%)
Mutual labels:  crawler, django
Dbworld Search
🔍 A simple search engine built on the Django framework
Stars: ✭ 39 (-80.88%)
Mutual labels:  crawler, django
Ttbot
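A Toutiao bot that supports user login, following, unfollowing, fetching followers, posting articles, posting Wukong Q&A, liking, commenting, and collecting various kinds of news, implemented with the Toutiao web API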
Stars: ✭ 338 (+65.69%)
Mutual labels:  news, crawler
Django Newsfeed
A news curator and newsletter subscription package for Django
Stars: ✭ 155 (-24.02%)
Mutual labels:  news, django
Hotnewsanalysis
Analysis of trending news topics using text-mining techniques
Stars: ✭ 93 (-54.41%)
Mutual labels:  news, crawler
Python Testing Crawler
A crawler for automated functional testing of a web application
Stars: ✭ 68 (-66.67%)
Mutual labels:  crawler, django
News Please
news-please - an integrated web crawler and information extractor for news that just works.
Stars: ✭ 969 (+375%)
Mutual labels:  news, crawler
Taiwan News Crawlers
Scrapy-based Crawlers for news of Taiwan
Stars: ✭ 83 (-59.31%)
Mutual labels:  news, crawler
Newspaper
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+5559.31%)
Mutual labels:  news, crawler
N2h4
A tool for collecting Naver News
Stars: ✭ 177 (-13.24%)
Mutual labels:  news, crawler
Okuna Api
🤖 The Okuna Social Network API
Stars: ✭ 200 (-1.96%)
Mutual labels:  django
Videoserver
A video resource backend built with Node.js, Express, and a crawler
Stars: ✭ 200 (-1.96%)
Mutual labels:  crawler
Googlescraper
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Stars: ✭ 2,363 (+1058.33%)
Mutual labels:  crawler
Django Sendgrid V5
An implementation of Django's EmailBackend compatible with sendgrid-python v5+
Stars: ✭ 202 (-0.98%)
Mutual labels:  django
Django Rest Marshmallow
Marshmallow schemas for Django REST framework
Stars: ✭ 198 (-2.94%)
Mutual labels:  django
Django Material
Material Design for Django
Stars: ✭ 2,362 (+1057.84%)
Mutual labels:  django
Laosj
golang light-weight image crawler
Stars: ✭ 199 (-2.45%)
Mutual labels:  crawler
Vecihi
Build Your Own Photo Sharing App in 5 minutes
Stars: ✭ 199 (-2.45%)
Mutual labels:  django
My blog
My Django Blog
Stars: ✭ 198 (-2.94%)
Mutual labels:  django
Tech Blog
My personal tech blog (Python, Django, Docker, Go, Redis, ElasticSearch, Kafka, Linux)
Stars: ✭ 203 (-0.49%)
Mutual labels:  django

Woid

Running Locally

First, clone the repository to your local machine:

git clone https://github.com/vitorfs/woid.git

Install the requirements:

pip install -r requirements/dev.txt

Apply the migrations:

python manage.py migrate

Load the initial data:

python manage.py loaddata services.json

Finally, run the development server:

python manage.py runserver

The site will be available at 127.0.0.1:8000.

Supported Services

Woid currently crawls the following services to collect top stories:

  • Hacker News (hn)
  • Reddit (reddit)
  • GitHub (github)
  • The New York Times (nytimes)
  • Product Hunt (producthunt)

Crawlers

You can run the crawlers manually to collect the top stories using the following command:

python manage.py crawl reddit

You can pass multiple services at once:

python manage.py crawl reddit hn nytimes

Valid values: hn, reddit, github, nytimes, producthunt.
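
The argument handling described above can be sketched with the standard library's argparse, which is also what Django management commands use to declare their arguments. This is a minimal illustration, not Woid's actual command: the names VALID_SERVICES, build_parser, and parse_services are hypothetical, and only the service slugs come from this README.

```python
import argparse

# Service slugs listed in this README.
VALID_SERVICES = ["hn", "reddit", "github", "nytimes", "producthunt"]


def build_parser():
    """Build a parser that accepts one or more service slugs."""
    parser = argparse.ArgumentParser(prog="crawl")
    parser.add_argument(
        "services",
        nargs="+",               # at least one service is required
        choices=VALID_SERVICES,  # anything else is rejected with a usage error
        help="one or more services to crawl",
    )
    return parser


def parse_services(argv):
    """Return the requested slugs in order, dropping repeated ones."""
    args = build_parser().parse_args(argv)
    return list(dict.fromkeys(args.services))


if __name__ == "__main__":
    print(parse_services(["reddit", "hn", "nytimes"]))
```

With this sketch, an unknown slug makes argparse print a usage message and exit, so a typo in a crontab entry would show up in the log rather than being silently ignored.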

The New York Times

To crawl The New York Times you will need an API key.

You can register an application at developer.nytimes.com.

Product Hunt

Product Hunt requires an API key to consume its API.

You can register an application at api.producthunt.com/v1/docs.
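
This README does not say where Woid expects these keys to be stored. One common approach, shown here purely as an assumption, is to keep them out of the repository and read them from environment variables. The helper require_api_key and the variable names NYTIMES_API_KEY and PRODUCT_HUNT_API_KEY are hypothetical.

```python
import os


def require_api_key(name, environ=None):
    """Return the key stored under `name`, or fail with a clear message.

    `name` is a hypothetical environment variable name; `environ` can be
    any mapping, which makes the helper easy to exercise without touching
    the real environment.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Missing API key: set the {name} environment variable"
        )
    return value


if __name__ == "__main__":
    # Hypothetical variable names, not taken from the Woid source.
    for variable in ("NYTIMES_API_KEY", "PRODUCT_HUNT_API_KEY"):
        try:
            require_api_key(variable)
            print(f"{variable} is set")
        except RuntimeError as error:
            print(error)
```

Failing early with the variable name in the message makes a missing key easy to spot in the crawler's log output.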

Cron Jobs

You can set up cron jobs to execute the crawlers periodically. Here is what my crontab looks like:

*/5 * * * * /home/woid/venv/bin/python /home/woid/woid/manage.py crawl reddit hn producthunt >> /home/woid/logs/cron.log 2>&1
*/30 * * * * /home/woid/venv/bin/python /home/woid/woid/manage.py crawl nytimes github >> /home/woid/logs/cron.log 2>&1
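
To make the schedule in the crontab above concrete, the following sketch expands a step value in the minute field into the minutes of the hour at which the job fires. It handles only the "*/N" form used here and is not a general cron parser; the function name minutes_for_step is hypothetical.

```python
def minutes_for_step(field):
    """Expand a '*/N' cron minute field into the minutes it matches."""
    if not field.startswith("*/"):
        raise ValueError("only the '*/N' form is handled in this sketch")
    step = int(field[2:])
    if step <= 0:
        raise ValueError("step must be a positive integer")
    # Cron matches every minute in 0..59 that is a multiple of the step.
    return list(range(0, 60, step))


if __name__ == "__main__":
    print(minutes_for_step("*/5"))   # 12 runs per hour: 0, 5, ..., 55
    print(minutes_for_step("*/30"))  # [0, 30]
```

Both entries fire at minutes 0 and 30, so at those times the two crawl commands run concurrently and append to the same cron.log.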

License

The source code is released under the Apache 2.0 license.
