dytttf / Antispider

Programming Languages

javascript

184084 projects - #8 most used programming language

python

139335 projects - #7 most used programming language

Labels

crawler

Projects that are alternatives of or similar to Antispider

Work crawler

A downloader for comics and novels, with EPUB output. Supported sites: 腾讯漫画, 大角虫漫画, 有妖气, 知音漫客, 咪咕, SF漫画, 哦漫画, 看漫画, 漫画柜, 汗汗酷漫, 動漫伊甸園, 快看漫画, 微博动漫, 733动漫网, 大古漫画网, 漫画DB, 無限動漫, 動漫狂, 卡推漫画, 动漫之家, 动漫屋, 古风漫画网, 36漫画网, 亲亲漫画网, 乙女漫画, comico, webtoons, 咚漫, ニコニコ静画, ComicWalker, ヤングエースUP, モアイ, pixivコミック, サイコミ; アルファポリス, カクヨム, ハーメルン, 小説家になろう, 起点中文网, 八一中文网, 顶点小说, 落霞小说网, 努努书坊, 笔趣阁.

Stars: ✭ 1,224 (+1136.36%)

Mutual labels: crawler

Ktspeechcrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos

Stars: ✭ 92 (-7.07%)

Mutual labels: crawler

Infinitycrawler

A simple but powerful web crawler library for .NET

Stars: ✭ 97 (-2.02%)

Mutual labels: crawler

Acm Statistics

An online tool (crawler) to analyze users' performance on online judges (coding competition websites). Supported OJs: POJ, HDU, ZOJ, HYSBZ, CodeForces, UVA, ICPC Live Archive, FZU, SPOJ, Timus (URAL), LeetCode_CN, CSU, LibreOJ, 洛谷, 牛客OJ, Lutece (UESTC), AtCoder, AIZU, CodeChef, El Judge, BNUOJ, Codewars, UOJ, NBUT, 51Nod, DMOJ, VJudge

Stars: ✭ 83 (-16.16%)

Mutual labels: crawler

Weibo Album Crawler

A multi-threaded crawler that downloads full-size images from Sina Weibo albums.

Stars: ✭ 83 (-16.16%)

Mutual labels: crawler

Scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!

Stars: ✭ 1,322 (+1235.35%)

Mutual labels: crawler

Swiftlinkpreview

It makes a preview from a URL, grabbing all the information such as title, relevant texts and images.

Stars: ✭ 1,216 (+1128.28%)

Mutual labels: crawler

Gopa Abandoned

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

Stars: ✭ 98 (-1.01%)

Mutual labels: crawler

Proxy Pool

A proxy IP pool service for crawlers; other crawler programs can fetch proxies from it through a REST API.

Stars: ✭ 91 (-8.08%)

Mutual labels: crawler

Scaleable Crawler With Docker Cluster

A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine

Stars: ✭ 96 (-3.03%)

Mutual labels: crawler

Taiwan News Crawlers

Scrapy-based Crawlers for news of Taiwan

Stars: ✭ 83 (-16.16%)

Mutual labels: crawler

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+1158.59%)

Mutual labels: crawler

Gf Secrets

Secret and/or credential patterns used for gf.

Stars: ✭ 96 (-3.03%)

Mutual labels: crawler

Is Google

Verify that a request is from Google crawlers using Google's DNS verification steps

Stars: ✭ 82 (-17.17%)

Mutual labels: crawler

Amazonrobot

A Python crawler for driving traffic to Amazon products

Stars: ✭ 97 (-2.02%)

Mutual labels: crawler

Wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

Stars: ✭ 1,220 (+1132.32%)

Mutual labels: crawler

Hotnewsanalysis

Analyzes trending news topics using text-mining techniques

Stars: ✭ 93 (-6.06%)

Mutual labels: crawler

Douyinsdk

Douyin SDK: data collection and crawling are no longer a dream

Stars: ✭ 99 (+0%)

Mutual labels: crawler

Thesaurusspider

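A Python crawler that downloads the dictionary files of the Sogou, Baidu, and QQ input methods; useful for building vocabularies for different industries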

Stars: ✭ 98 (-1.01%)

Mutual labels: crawler

Lightcrawler

Crawl a website and run it through Google lighthouse

Stars: ✭ 1,339 (+1252.53%)

Mutual labels: crawler

antispider

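A record of the anti-crawler measures I have run into and how I got around them. Discussion is welcome!

Referer validation required

JS redirect: changde.py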

http://bbs.changde.gov.cn/
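
The Referer check noted above can be illustrated with a minimal sketch. It only builds the request and does not send it. The helper name build_request, the User-Agent string, and the choice of the site's own home page as the Referer value are assumptions for illustration; the original changde.py may do this differently.

```python
from urllib.request import Request


def build_request(url, referer):
    """Build a request that carries a Referer header.

    Hypothetical helper: which Referer value the site accepts is an
    assumption here, not something the notes specify.
    """
    headers = {
        "Referer": referer,
        # Assumption: a browser-like User-Agent; the notes do not mention one.
        "User-Agent": "Mozilla/5.0",
    }
    return Request(url, headers=headers)


req = build_request("http://bbs.changde.gov.cn/", "http://bbs.changde.gov.cn/")
print(req.get_header("Referer"))
```

Passing the result to urllib.request.urlopen would send the request with the header attached.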

Encrypted cookie validation on 天眼查 (Tianyancha): test_down_tianyancha.py

A silly CAPTCHA, plus validation fails 99% of the time

http://xygs.gsaic.gov.cn/gsxygs/pub!list.do

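Douban FM and other Douban sites over HTTPS, with loosely checked cookie parameters: test_down_douban.py

A _dsign parameter is appended to the URL after the JS runs: get_dsign.py

The page shows "Security check in progress..." and, after 5 seconds, JS redirects to the normal page

Text replaced by CSS styles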

http://club.autohome.com.cn/bbs/thread-a-100024-62404423-1.html
For the JS code, see autohome.js
For the cracking code, see autohome.py

Limits on request rate and proxy type

https://m.guazi.com/bj/dazhong/
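Keep the request rate below 0.5 requests/s.
If you use a proxy, HTTP URLs need an HTTP proxy and HTTPS URLs need an HTTPS proxy; mixing them up is the same as using no proxy at all.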
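
The two constraints above can be sketched as a small throttle plus a scheme-matched proxy lookup. This is an illustration under assumptions, not the author's code: the names Throttle and proxy_for, the 2.1-second default, and the scheme-keyed proxy mapping are all hypothetical.

```python
import time
from urllib.parse import urlsplit

# Staying below 0.5 requests/s means more than 2 s between requests.
# Assumption: 2.1 s leaves a small safety margin.
MIN_INTERVAL = 2.1


class Throttle:
    """Block so that consecutive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def proxy_for(url, proxies):
    """Return the proxy entry whose key matches the URL scheme.

    Raises instead of silently falling back, because a mismatched
    proxy behaves like no proxy at all.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme not in proxies:
        raise ValueError("no %s proxy configured for %s" % (scheme, url))
    return {scheme: proxies[scheme]}
```

A caller would invoke Throttle.wait() before each request and pass the result of proxy_for(url, proxies) to its HTTP client. The proxies argument is a mapping such as {"http": ..., "https": ...} with placeholder addresses supplied by the caller.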

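Cleverly exploiting platform differences in \r to give crawler developers a headache

On Linux, \r is interpreted as a carriage return. If a page uses \r as its line separator, it displays fine in the browser and on Windows, but when the text is printed on Linux, whatever follows each \r overwrites the characters before it, so the output looks much shorter than what the web page shows. If you are not clear on what \r means, expect to spend a very, very long time debugging.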
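
The effect described above can be reproduced without a terminal. The sketch below approximates how a terminal renders \r, then shows one way to normalize the text before printing. The function names are hypothetical, and the renderer is a simplification that ignores other control characters.

```python
def render_like_terminal(text):
    """Approximate terminal output: \r returns the cursor to column 0,
    so the characters that follow overwrite what was already printed."""
    rendered = []
    for raw_line in text.split("\n"):
        buf = []
        col = 0
        for ch in raw_line:
            if ch == "\r":
                col = 0
                continue
            if col < len(buf):
                buf[col] = ch  # overwrite an existing character
            else:
                buf.append(ch)
            col += 1
        rendered.append("".join(buf))
    return "\n".join(rendered)


def normalize_newlines(text):
    """Turn \r\n and bare \r into \n so every line survives printing."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


scraped = "abcdef\rXY"
print(render_like_terminal(scraped))                      # what a terminal would show
print(render_like_terminal(normalize_newlines(scraped)))  # both lines preserved
```

Normalizing right after extraction, before any logging or printing, keeps the debug output consistent with what the page displays.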

Crawler trick: getting the MP4 download URL for 西瓜视频 (Xigua Video)

https://www.ixigua.com/

Stars: ✭ 99

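No restrictions on second-level directories

The first visit goes through an intermediate JS redirect page, probably to verify the IP

Pages take an extremely long time to load

Discuz forum board API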
