🤖 Prerendering for JavaScript-powered websites. A great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites built on top of front-end JavaScript frameworks

Stars: ✭ 29 (-34.09%)

Mutual labels: crawler

snapcrawl

Crawl a website and take screenshots

Stars: ✭ 37 (-15.91%)

Mutual labels: crawler

desktop

TurboWarp as a desktop app

Stars: ✭ 69 (+56.82%)

Mutual labels: scratch

videodl

Videodl: A lightweight video downloader written in pure Python.

Stars: ✭ 320 (+627.27%)

Mutual labels: crawler

indieweb-search

Source code for the IndieWeb search engine.

Stars: ✭ 16 (-63.64%)

Mutual labels: crawler

ZhengFang System Spider

🐛 A small crawler that logs in to the ZhengFang academic administration system and scrapes its data

Stars: ✭ 21 (-52.27%)

Mutual labels: crawler

dijnet-bot

All your bills in yet another place :)

Stars: ✭ 17 (-61.36%)

Mutual labels: crawler


Crawler-related knowledge and code

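Reading notes on the book 《自己动手写网络爬虫》 (Write Your Own Web Crawler), with code I typed out myself. It mainly covers the basic implementation of a web crawler, web page deduplication algorithms, page fingerprint algorithms, and text information mining.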

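ConsistentHash: consistent hashing algorithm
HashAlgorithms: a collection of hash algorithms
MurmurHash: the MurmurHash algorithm, a non-cryptographic hash with high performance and a low collision rate
IPSeeker: wraps Tencent's IP database and provides utilities that read the QQwry.dat file to look up a friend's location from an IP address
HITS: implementation of the HITS algorithm
PageRank: implementation of the PageRank algorithm
WebGraph: Web graph modeling
WebGraphMemory: in-memory Web graph
SimpleBloomFilter: a Bloom filter
BDBFrontier: uses Berkeley DB to store the crawler's frontier, the list of URLs waiting to be crawled
Crawler: a crawler that traverses the web breadth-first and downloads pages with HttpClient 4.3
CrawlUrl: an object that wraps a URL to be crawled; its layer variable can be used to limit crawl depth
DownLoadFile: a utility class that downloads page data to local disk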
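
To show how several of the pieces listed above could fit together, here is a minimal sketch of a breadth-first crawl that uses a Bloom filter for URL deduplication and a per-URL layer to cap the depth. It is not the repository's code: the names `BfsCrawlSketch`, `BloomFilter`, `QueuedUrl`, and `crawl` are hypothetical, the hash is a simple seeded multiplicative hash rather than MurmurHash, and an in-memory link map stands in for pages that a real crawler would download.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class BfsCrawlSketch {

    /** Minimal Bloom filter over strings: several seeded hashes share one BitSet. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int size;
        private final int[] seeds = {7, 11, 13, 31, 37, 61};

        BloomFilter(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        // Illustrative seeded string hash (an assumption, not MurmurHash).
        private int index(String value, int seed) {
            int h = 0;
            for (int i = 0; i < value.length(); i++) {
                h = seed * h + value.charAt(i);
            }
            return Math.floorMod(h, size);
        }

        void add(String value) {
            for (int seed : seeds) {
                bits.set(index(value, seed));
            }
        }

        // May report a false positive, never a false negative.
        boolean mightContain(String value) {
            for (int seed : seeds) {
                if (!bits.get(index(value, seed))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** A URL together with its link distance (layer) from the seed. */
    static final class QueuedUrl {
        final String url;
        final int layer;

        QueuedUrl(String url, int layer) {
            this.url = url;
            this.layer = layer;
        }
    }

    /** Visits pages breadth-first from seed, expanding links only below maxLayer. */
    static List<String> crawl(String seed, Map<String, List<String>> links, int maxLayer) {
        BloomFilter seen = new BloomFilter(1 << 16);
        Queue<QueuedUrl> frontier = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();

        seen.add(seed);
        frontier.add(new QueuedUrl(seed, 0));

        while (!frontier.isEmpty()) {
            QueuedUrl current = frontier.remove();
            visited.add(current.url); // a real crawler would download the page here
            if (current.layer >= maxLayer) {
                continue; // depth limit reached: do not follow this page's links
            }
            for (String next : links.getOrDefault(current.url, List.of())) {
                if (!seen.mightContain(next)) {
                    seen.add(next);
                    frontier.add(new QueuedUrl(next, current.layer + 1));
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical link graph standing in for downloaded pages.
        Map<String, List<String>> links = Map.of(
                "/a", List.of("/b", "/c"),
                "/b", List.of("/c", "/d"),
                "/c", List.of("/a", "/e"),
                "/d", List.of("/f"));
        System.out.println(crawl("/a", links, 2)); // prints [/a, /b, /c, /d, /e]
    }
}
```

A Bloom filter keeps memory use fixed no matter how many URLs have been seen, at the cost of occasionally skipping a URL that was never visited (a false positive). For a very large crawl, the in-memory queue could be swapped for a persistent frontier, which is the role the list above gives to Berkeley DB.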


duoan / codes-scratch-crawler


Projects that are alternatives of or similar to codes-scratch-crawler
