TaiwanStat / Taiwan News Crawlers
Licence: MIT
Scrapy-based Crawlers for news of Taiwan
Stars: ✭ 83
Programming Languages
Python
Projects that are alternatives to or similar to Taiwan News Crawlers
Scrapy Redis
Redis-based components for Scrapy.
Stars: ✭ 4,998 (+5921.69%)
Mutual labels: crawler, scrapy
Icrawler
A multi-threaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (+657.83%)
Mutual labels: crawler, scrapy
Tsrtc
Taiwan Stock Exchange real-time crawler.
Stars: ✭ 359 (+332.53%)
Mutual labels: taiwan, crawler
News Please
news-please - an integrated web crawler and information extractor for news that just works.
Stars: ✭ 969 (+1067.47%)
Mutual labels: news, crawler
Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+459.04%)
Mutual labels: crawler, scrapy
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+602.41%)
Mutual labels: crawler, scrapy
Scrapy Azuresearch Crawler Samples
Scrapy as a Web Crawler for Azure Search Samples
Stars: ✭ 20 (-75.9%)
Mutual labels: crawler, scrapy
Terpene Profile Parser For Cannabis Strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
Stars: ✭ 63 (-24.1%)
Mutual labels: crawler, scrapy
Ttbot
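Toutiao (今日头条) bot that supports user login, following, unfollowing, fetching followings and followers, posting articles, posting Wukong Q&A entries, liking, commenting, and collecting various types of news, implemented with the Toutiao web API.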
Stars: ✭ 338 (+307.23%)
Mutual labels: news, crawler
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+5915.66%)
Mutual labels: crawler, scrapy
Woid
Simple news aggregator displaying top stories in real time
Stars: ✭ 204 (+145.78%)
Mutual labels: news, crawler
Crawlab
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Stars: ✭ 8,392 (+10010.84%)
Mutual labels: crawler, scrapy
Taiwan-news-crawlers
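🐞 Scrapy-based crawlers for Taiwanese news, covering 10 media companies:
- Apple Daily (蘋果日報)
- China Times (中國時報)
- Central News Agency (中央社)
- Chinese Television System, CTS (華視)
- ETtoday News Cloud (東森新聞雲)
- Liberty Times (自由時報)
- Public Television Service, PTS (公視)
- Sanlih E-Television, SET (三立)
- TVBS
- UDN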
Getting Started
$ git clone https://github.com/TaiwanStat/Taiwan-news-crawlers.git
$ cd Taiwan-news-crawlers
$ pip install -r requirements.txt
$ scrapy crawl apple -o apple_news.json
Prerequisites
- Python 3
- Scrapy 1.3.0
Usage
scrapy crawl <spider> -o <output_name>
Available spiders
- apple
- appleRealtime
- china
- cna
- cts
- ettoday
- liberty
- libertyRealtime
- pts
- setn
- tvbs
- udn
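To crawl every source in one go, the usage pattern above can be wrapped in a small script. The following is only a sketch under stated assumptions: the spider names are copied from the list above, while `build_command`, `crawl_all`, and the `<spider>_news.json` naming scheme are hypothetical helpers that are not part of the project. It assumes `scrapy` is on the `PATH` and that the script runs from the repository root.

```python
import subprocess

# Spider names copied from the "Available spiders" list.
SPIDERS = [
    "apple", "appleRealtime", "china", "cna", "cts", "ettoday",
    "liberty", "libertyRealtime", "pts", "setn", "tvbs", "udn",
]


def build_command(spider, output_name=None):
    """Return the argv list for `scrapy crawl <spider> -o <output_name>`."""
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider}")
    if output_name is None:
        # Hypothetical naming scheme, modeled on apple_news.json in Getting Started.
        output_name = f"{spider}_news.json"
    return ["scrapy", "crawl", spider, "-o", output_name]


def crawl_all(spiders=SPIDERS):
    """Run each spider in turn; check=True stops at the first failing crawl."""
    for name in spiders:
        subprocess.run(build_command(name), check=True)
```

Calling `crawl_all()` runs the twelve crawls sequentially. Since Scrapy's `-o` option appends to an existing file, it is probably safest to delete old `.json` output before re-running, otherwise the file may no longer be valid JSON.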
Output
| Key | Value |
|---|---|
| website | the publisher |
| url | the URL of the original article |
| title | the news title |
| content | the news content |
| category | the news category |
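The fields in the table above can be consumed with the standard library alone. The sketch below assumes the crawl was written with a `.json` output name, so the file holds a single JSON array of objects, and that `category` is a plain string; neither detail is confirmed by the README. The helper names `load_items`, `count_by_category`, and `missing_fields` are hypothetical.

```python
import json
from collections import Counter

# Field names taken from the Output table.
FIELDS = ("website", "url", "title", "content", "category")


def load_items(path):
    """Load the JSON array of news items written by `scrapy crawl ... -o file.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_by_category(items):
    """Count items per category, skipping items with no category value.

    Assumes each category is a string (hashable).
    """
    return Counter(item["category"] for item in items if item.get("category"))


def missing_fields(item):
    """Return the documented fields that are absent or empty in one item."""
    return [name for name in FIELDS if not item.get(name)]
```

For example, `count_by_category(load_items("apple_news.json")).most_common(5)` would list the five largest categories in a crawl, and `missing_fields` gives a quick way to spot items where a spider failed to extract a field.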
License
The MIT License