TaiwanStat / Taiwan News Crawlers

Taiwan-news-crawlers

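🐞 Scrapy-based crawlers for Taiwanese news, covering 10 media outlets:

  1. 蘋果日報 (Apple Daily)
  2. 中國時報 (China Times)
  3. 中央社 (Central News Agency, CNA)
  4. 華視 (Chinese Television System, CTS)
  5. 東森新聞雲 (ETtoday)
  6. 自由時報 (Liberty Times)
  7. 公視 (Public Television Service, PTS)
  8. 三立 (Sanlih E-Television, SETN)
  9. TVBS
  10. UDN (United Daily News)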

Getting Started

$ git clone https://github.com/TaiwanStat/Taiwan-news-crawlers.git
$ cd Taiwan-news-crawlers
$ pip install -r requirements.txt
$ scrapy crawl apple -o apple_news.json

Prerequisites

  • Python 3
  • Scrapy 1.3.0

Usage

scrapy crawl <spider> -o <output_name>

Available spiders

  1. apple
  2. appleRealtime
  3. china
  4. cna
  5. cts
  6. ettoday
  7. liberty
  8. libertyRealtime
  9. pts
  10. setn
  11. tvbs
  12. udn
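
To crawl several outlets in one go, the usage line above can be scripted. The following is a minimal sketch and not part of the repository: the helper names `build_crawl_command` and `crawl_all`, and the `<spider>_news.json` output naming, are assumptions made for illustration. It also assumes that `scrapy` is on the PATH and that the script is run from the project root.

```python
import subprocess

# Spider names as listed in this README.
SPIDERS = (
    "apple", "appleRealtime", "china", "cna", "cts", "ettoday",
    "liberty", "libertyRealtime", "pts", "setn", "tvbs", "udn",
)


def build_crawl_command(spider, output_name):
    """Return the argv list for `scrapy crawl <spider> -o <output_name>`."""
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider!r}")
    return ["scrapy", "crawl", spider, "-o", output_name]


def crawl_all(spiders=SPIDERS):
    """Run each spider in turn, writing <spider>_news.json (hypothetical naming)."""
    for spider in spiders:
        command = build_crawl_command(spider, f"{spider}_news.json")
        # check=True raises CalledProcessError if a crawl exits with an error.
        subprocess.run(command, check=True)
```

Calling `crawl_all()` would run all twelve spiders one after another; passing a shorter tuple such as `("cna", "pts")` restricts the run to those outlets.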

Output

Key       Value
website   the publisher
url       URL of the original article
title     the news title
content   the news content
category  the news category
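
As a rough illustration of consuming the output, the sketch below assumes the file was produced with `-o <name>.json`, which Scrapy's JSON exporter writes as a single JSON array of objects, and that `category` is a plain string. The function names `load_articles` and `count_by_category` are hypothetical and not part of the project.

```python
import json
from collections import Counter

# The five keys documented in the Output table.
EXPECTED_KEYS = {"website", "url", "title", "content", "category"}


def load_articles(path):
    """Load a Scrapy JSON export and keep only items that have every expected key."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    return [item for item in items if EXPECTED_KEYS.issubset(item)]


def count_by_category(articles):
    """Count articles per category (assumes each category is a string)."""
    return Counter(article["category"] for article in articles)
```

For example, `count_by_category(load_articles("apple_news.json")).most_common(5)` would list the five most frequent categories in a crawl.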

License

The MIT License
