
shiyanhui / Dht

License: MIT
BitTorrent DHT Protocol && DHT Spider.

Programming Language: Go

Projects that are alternatives to or similar to Dht

Dhtspider
Bittorrent dht network spider
Stars: ✭ 302 (-87.72%)
Mutual labels:  spider, dht
dht-spider
A simple BT magnet-link crawler based on the DHT protocol
Stars: ✭ 16 (-99.35%)
Mutual labels:  spider, dht
Btlet
Toolkits that implement part of the BT protocol, such as a DHT spider.
Stars: ✭ 54 (-97.8%)
Mutual labels:  spider, dht
Zsky
DHT magnet-link BT search engine, written in pure Python
Stars: ✭ 256 (-89.59%)
Mutual labels:  spider, dht
Antcolony
A magnet-link crawler implemented in Node.js http://findit.keenwon.com (original domain: http://findit.so)
Stars: ✭ 1,151 (-53.19%)
Mutual labels:  spider, dht
Marmot
💐Marmot | Web Crawler/HTTP protocol Download Package 🐭
Stars: ✭ 186 (-92.44%)
Mutual labels:  spider
Cangibrina
A fast and powerful dashboard (admin) finder
Stars: ✭ 200 (-91.87%)
Mutual labels:  spider
Dhtesp
Optimized DHT library for ESP32/ESP8266 using Arduino framework
Stars: ✭ 184 (-92.52%)
Mutual labels:  dht
Torrent Paradise
Decentralized DHT search site for IPFS
Stars: ✭ 181 (-92.64%)
Mutual labels:  dht
Py Elasticsearch Django
A search engine at the scale of tens of millions of records, developed in Python
Stars: ✭ 207 (-91.58%)
Mutual labels:  spider
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+531.76%)
Mutual labels:  spider
Ok ip proxy pool
🍿 Crawler proxy IP pool (proxy pool) in Python 🍟 A pretty OK IP proxy pool
Stars: ✭ 196 (-92.03%)
Mutual labels:  spider
Videospider
Scrapes information on TV series, movies, anime, actors, and more from Douban, bilibili, and similar sites
Stars: ✭ 186 (-92.44%)
Mutual labels:  spider
Zhihuspider
Multithreaded Zhihu user crawler, based on Python 3
Stars: ✭ 201 (-91.83%)
Mutual labels:  spider
Dht
dht is used by anacrolix/torrent, and is intended for use as a library in other projects both torrent related and otherwise
Stars: ✭ 184 (-92.52%)
Mutual labels:  dht
Wereader
A full-featured WeChat Read crawler, wereader
Stars: ✭ 207 (-91.58%)
Mutual labels:  spider
Lianjia Beike Spider
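Housing-price crawler for Lianjia and Beike. Collects housing-price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities, including Beijing, Shanghai, Guangzhou, and Shenzhen. Stable, reliable, and fast! Supports CSV, MySQL, MongoDB, Excel, and JSON storage; supports Python 2 and 3; displays data in charts; richly commented. Star it to show support. For learning and reference only; do not use it commercially; use at your own risk.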
Stars: ✭ 2,257 (-8.21%)
Mutual labels:  spider
Fooproxy
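A robust, efficient, score-based, targeted IP proxy pool + API service. You can plug in your own collectors to crawl proxy IPs, and it builds a separate database of valid IP proxies for each of your crawler's target websites. Supports MongoDB 4.0 and uses Python 3.7 (Scored IP proxy pool, customise proxy data crawler can be added anytime)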
Stars: ✭ 195 (-92.07%)
Mutual labels:  spider
Jssoup
JavaScript + BeautifulSoup = JSSoup
Stars: ✭ 203 (-91.74%)
Mutual labels:  spider
Zi5book
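Crawls every Kindle e-book on book.zi5.me, organized by author and book title; each book comes in both mobi and epub formats; the whole site is crawled in a distributed fashion.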
Stars: ✭ 191 (-92.23%)
Mutual labels:  spider

See the video on YouTube.

README in Chinese

Introduction

DHT implements the BitTorrent DHT protocol in Go. It currently includes:

  • BEP-3 (part)
  • BEP-5
  • BEP-9
  • BEP-10
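The BitTorrent DHT (BEP-5) is a Kademlia variant: node IDs and infohashes live in the same 160-bit space, and the distance between two IDs is their bitwise XOR read as an unsigned integer. Below is a minimal, self-contained sketch of that metric. `NodeID`, `xorDistance`, and `closer` are hypothetical names for illustration, not part of this library's API.

```go
package main

import (
	"bytes"
	"fmt"
)

// NodeID is a 160-bit DHT identifier (20 bytes), as used by BEP-5.
// Hypothetical type for illustration only.
type NodeID [20]byte

// xorDistance returns the Kademlia distance between a and b:
// the byte-wise XOR, read as a big-endian unsigned integer.
func xorDistance(a, b NodeID) NodeID {
	var d NodeID
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// closer reports whether x is strictly closer to target than y is.
func closer(target, x, y NodeID) bool {
	dx := xorDistance(target, x)
	dy := xorDistance(target, y)
	// Comparing big-endian bytes lexicographically is the same as
	// comparing the 160-bit unsigned integers they encode.
	return bytes.Compare(dx[:], dy[:]) < 0
}

func main() {
	var target, a, b NodeID
	target[19] = 0x01
	a[19] = 0x03 // differs from target only in a low bit
	b[0] = 0x01  // differs in the most significant byte: much farther
	fmt.Println(closer(target, a, b))
}
```

Because the metric is a plain XOR, a node can rank candidates for a `get_peers` lookup with nothing more than a byte comparison.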

It has two modes: standard mode and crawling mode. Standard mode follows the BEPs, so you can use it as a standard DHT server. Crawling mode aims to crawl as much metadata info as possible and does not strictly follow the standard BEPs. With crawling mode, you can build your own BTDigg.
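One trick commonly used by DHT crawlers is sketched below. Whether this library uses it is an assumption; the README does not say so. When answering a query about some target ID, the crawler replies with an ID that copies a long prefix of the target. The querying node then sees the crawler as a close neighbor and is more likely to keep it in its routing table and send it `announce_peer` messages. The helper `neighborID` and the 15-byte prefix are hypothetical choices for illustration.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// neighborID is a hypothetical helper: it builds an ID whose first
// prefixLen bytes come from target and whose remaining bytes come from
// self, so the result is XOR-close to target. prefixLen must be 0..20.
func neighborID(target, self [20]byte, prefixLen int) [20]byte {
	var id [20]byte
	copy(id[:prefixLen], target[:prefixLen])
	copy(id[prefixLen:], self[prefixLen:])
	return id
}

func main() {
	var target, self [20]byte
	for i := range target {
		target[i] = 0xaa
		self[i] = 0x11
	}
	id := neighborID(target, self, 15)
	fmt.Println(hex.EncodeToString(id[:]))
}
```

Keeping a few trailing bytes of the crawler's own ID gives it a stable identity while still appearing close to every target it is asked about.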

bthub.io is a BT search engine based on the crawling mode.

Installation

go get github.com/shiyanhui/dht

Example

Below is a simple spider. More samples can be found here.

package main

import (
    "fmt"

    "github.com/shiyanhui/dht"
)

func main() {
    // Wire downloads metadata info from peers.
    downloader := dht.NewWire(65535)
    go func() {
        // Print each result once a metadata request completes.
        for resp := range downloader.Response() {
            fmt.Println(resp.InfoHash, resp.MetadataInfo)
        }
    }()
    go downloader.Run()

    config := dht.NewCrawlConfig()
    config.OnAnnouncePeer = func(infoHash, ip string, port int) {
        // A peer announced infoHash: request its metadata info.
        downloader.Request([]byte(infoHash), ip, port)
    }
    d := dht.New(config)

    d.Run()
}

Download

You can download a precompiled demo binary here.

Note

  • With the default configuration, crawling mode uses about 300 MB of RAM. Adjust MaxNodes and BlackListMaxSize to suit your needs.
  • It currently cannot run inside a LAN because of NAT.
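The note above ties memory use to settings such as BlackListMaxSize. Below is a sketch of the general idea of a size-capped blacklist. `boundedBlacklist` and its FIFO eviction policy are hypothetical illustrations, not this library's actual data structure.

```go
package main

import "fmt"

// boundedBlacklist is a hypothetical set of blocked "ip:port" keys with
// a hard capacity. When full, the oldest entry is evicted (FIFO), so
// memory stays proportional to max.
type boundedBlacklist struct {
	max   int
	order []string            // insertion order, oldest first
	set   map[string]struct{} // fast membership checks
}

func newBoundedBlacklist(max int) *boundedBlacklist {
	return &boundedBlacklist{max: max, set: make(map[string]struct{})}
}

// Add blocks key, evicting the oldest entry if the list is full.
func (b *boundedBlacklist) Add(key string) {
	if b.max <= 0 {
		return
	}
	if _, ok := b.set[key]; ok {
		return // already blocked
	}
	if len(b.order) >= b.max {
		oldest := b.order[0]
		b.order = b.order[1:]
		delete(b.set, oldest)
	}
	b.order = append(b.order, key)
	b.set[key] = struct{}{}
}

// Contains reports whether key is currently blocked.
func (b *boundedBlacklist) Contains(key string) bool {
	_, ok := b.set[key]
	return ok
}

func main() {
	bl := newBoundedBlacklist(2)
	bl.Add("1.2.3.4:6881")
	bl.Add("5.6.7.8:6881")
	bl.Add("9.9.9.9:6881") // evicts 1.2.3.4:6881
	fmt.Println(bl.Contains("1.2.3.4:6881"), bl.Contains("9.9.9.9:6881"))
}
```

A smaller cap saves RAM but forgets misbehaving peers sooner, which is the trade-off the MaxNodes and BlackListMaxSize settings expose.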

TODO

  • NAT traversal.
  • Implement the full BEP-3.
  • Optimization.

FAQ

Why is it slow compared to other spiders?

There may be several reasons:

  • DHT aims to implement the standard BitTorrent DHT protocol; it was not built specifically for crawling the DHT network.
  • NAT traversal: you may be running the crawler inside a local network.
  • It blocks IPs that look malicious, and a good IP may be misjudged as bad.

License

MIT, read more here
