lwl5219 / ancient_chinese

Licence: other
Classical Chinese (文言文) dictionary: crawls a Classical Chinese dictionary website and builds a Kindle dictionary from it.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to ancient chinese

scrapy-html-storage
Scrapy downloader middleware that stores response HTMLs to disk.
Stars: ✭ 17 (-64.58%)
Mutual labels:  scrapy
scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
Stars: ✭ 17 (-64.58%)
Mutual labels:  scrapy
scrapy-cloudflare-middleware
A Scrapy middleware to bypass the CloudFlare's anti-bot protection
Stars: ✭ 84 (+75%)
Mutual labels:  scrapy
Inventus
Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers.
Stars: ✭ 80 (+66.67%)
Mutual labels:  scrapy
Raspagem-de-dados-para-iniciantes
Data scraping for beginners using Scrapy and other basic libraries
Stars: ✭ 113 (+135.42%)
Mutual labels:  scrapy
jmdict-kindle
Japanese - English dictionary for Kindle based on the JMdict / EDICT database
Stars: ✭ 151 (+214.58%)
Mutual labels:  kindle
fernando-pessoa
Classifier of Fernando Pessoa's poems according to his heteronyms
Stars: ✭ 31 (-35.42%)
Mutual labels:  scrapy
www job com
Scrapes job postings from Lagou, BOSS Zhipin, Zhaopin, 51job, Ganji Jobs, 58 Jobs, and similar sites
Stars: ✭ 47 (-2.08%)
Mutual labels:  scrapy
easypoi
A simple, free, and efficient tool for collecting and analyzing Baidu Maps POI data.
Stars: ✭ 87 (+81.25%)
Mutual labels:  scrapy
mpspider
Scrapes WeChat official account articles and generates Kindle e-books
Stars: ✭ 51 (+6.25%)
Mutual labels:  kindle
RARBG-scraper
With Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (-20.83%)
Mutual labels:  scrapy
hupu spider
Crawler for Hupu's Buxingjie (步行街) forum
Stars: ✭ 22 (-54.17%)
Mutual labels:  scrapy
small-spider-project
Everyday crawlers
Stars: ✭ 14 (-70.83%)
Mutual labels:  scrapy
scrapy-mysql-pipeline
scrapy mysql pipeline
Stars: ✭ 47 (-2.08%)
Mutual labels:  scrapy
web full stack application
show full stack technology applications : Scrapy + webservice[restful] + websocket + VueJS + MongoDB
Stars: ✭ 16 (-66.67%)
Mutual labels:  scrapy
calibre-kindle-comics
A calibre plugin that converts your comics into a readable format for kindle.
Stars: ✭ 32 (-33.33%)
Mutual labels:  kindle
Scrapy-SearchEngines
Crawlers for the Bing, Google, and Baidu search engines. Python 3.6 and Scrapy
Stars: ✭ 28 (-41.67%)
Mutual labels:  scrapy
kindle-kt3 weatherdisplay battery-optimized
Use your Kindle KT3 (8th generation) as a weather station with the Weather Underground API and a HomeMatic weather sensor.
Stars: ✭ 73 (+52.08%)
Mutual labels:  kindle
ufc fight predictor
UFC bout winner prediction using neural nets.
Stars: ✭ 22 (-54.17%)
Mutual labels:  scrapy
InMangaKindle
Download manga in Spanish in various formats (PNG, PDF, EPUB, MOBI)
Stars: ✭ 43 (-10.42%)
Mutual labels:  kindle

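About

Uses the Scrapy framework to crawl the Classical Chinese dictionary on the 汉语言文学网 (Chinese Language and Literature) website, then builds a Kindle dictionary from the scraped data.

The files listed in the directory structure below are the final outputs of running the crawler and the dictionary build. If you would rather not run the crawl yourself, you can use these files directly:

  • Ancient_Chinese_Dict.mobi: the Kindle dictionary
  • ancient_chinese.mongodb: a MongoDB backup of the crawl results
  • ancient_chinese.json: the crawl results as JSON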
├── ancient_chinese_dict
│   ├── Ancient_Chinese_Dict.mobi
└── dict
    ├── out_file
    │   ├── ancient_chinese.json
    │   └── ancient_chinese.mongodb

Dependencies

  • scrapy
  • kindlegen

kindlegen download: https://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000765211

On macOS, kindlegen can be installed with brew:

brew install kindlegen

For Scrapy installation, see: https://docs.scrapy.org/en/latest/intro/install.html

Usage

Crawling

Change into the dict directory and run:

scrapy crawl guhanyu -o out_file/ancient_chinese.json

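This saves the crawl results as a JSON file. Because the crawl takes a long time, running it in the background with the log redirected to scrapy.log is recommended:

nohup scrapy crawl guhanyu -o out_file/ancient_chinese.json > scrapy.log 2>&1 &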
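
Once the crawl has finished, it can help to sanity-check the output before building the dictionary. The following is a minimal sketch, not part of the project: it relies only on the fact that Scrapy's `-o file.json` feed export writes a single JSON array of item objects. The function name `summarize_items` is hypothetical, and no assumption is made about which fields the guhanyu spider emits; the sketch simply reports whatever keys it finds.

```python
import json
from collections import Counter


def summarize_items(path):
    """Return (item_count, field_counts) for a Scrapy JSON feed export.

    field_counts maps each field name to the number of items containing it,
    which makes items with missing fields easy to spot.
    """
    with open(path, encoding="utf-8") as f:
        items = json.load(f)  # the export is one JSON array of objects

    field_counts = Counter()
    for item in items:
        field_counts.update(item.keys())
    return len(items), dict(field_counts)
```

Calling `summarize_items("out_file/ancient_chinese.json")` from the dict directory returns the number of scraped items and a per-field count. If the crawl was interrupted, the file may be incomplete, in which case `json.load` raises `json.JSONDecodeError`.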

Building the Kindle dictionary

Copy the crawled ancient_chinese.json file into the ancient_chinese_dict directory and run:

sh make_dict.sh

This generates the Kindle dictionary Ancient_Chinese_Dict.mobi.
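
The contents of make_dict.sh are not shown here, so the following is only an illustrative sketch of the general technique, not the project's actual build step. Kindle dictionaries are typically compiled by kindlegen from HTML in which each entry is wrapped in `idx:entry` markup and its headword is declared with `idx:orth`. The item keys `word` and `meaning`, and the function names, are hypothetical; the real field names depend on the spider's output.

```python
from html import escape


def make_entry(headword, definition):
    """Render one dictionary entry as Kindle idx markup (returns a string)."""
    hw = escape(headword)      # escapes &, <, > and quote characters
    body = escape(definition)
    return (
        '<idx:entry name="default" scriptable="yes" spell="yes">\n'
        f'  <idx:orth value="{hw}"><b>{hw}</b></idx:orth>\n'
        f'  <p>{body}</p>\n'
        '</idx:entry>'
    )


def make_entries(items, word_key="word", meaning_key="meaning"):
    """Join the entries for all items, separated by horizontal rules.

    word_key and meaning_key are hypothetical field names.
    """
    return "\n<hr/>\n".join(
        make_entry(item[word_key], item[meaning_key]) for item in items
    )
```

The resulting fragment would still need to be placed inside a full HTML document and referenced from an OPF file before kindlegen can turn it into a .mobi dictionary.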

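Acknowledgements

A Concise Tutorial on Making Your Own Kindle Dictionary (Beginner)

A Concise Tutorial on Making Your Own Kindle Dictionary (Advanced)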
