
544 open-source projects that are alternatives to or similar to Spider

TwEater

A Python Bot for Scraping Conversations from Twitter

Stars: ✭ 16 (-98.32%)

Mutual labels: text-mining, spider

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (-18.13%)

Mutual labels: spider

Douyin

API of DouYin for Humans, Used to Crawl Popular Videos and Music

Stars: ✭ 580 (-39.2%)

Mutual labels: spider

Xsrfprobe

The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.

Stars: ✭ 532 (-44.23%)

Mutual labels: spider

Infospider

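INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together in one place, designed to help users retrieve their own data safely and quickly. The code is open source and the process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments photo album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.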

Stars: ✭ 5,984 (+527.25%)

Mutual labels: spider

Rake Nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Stars: ✭ 793 (-16.88%)

Mutual labels: text-mining

Xxl Crawler

A distributed web crawler framework (XXL-CRAWLER).

Stars: ✭ 561 (-41.19%)

Mutual labels: spider

Text Mining

Text Mining in Python

Stars: ✭ 18 (-98.11%)

Mutual labels: text-mining

Querido Diario

📰 Brazilian government gazettes, accessible to everyone.

Stars: ✭ 681 (-28.62%)

Mutual labels: spider

Listed Company News Crawl And Text Analysis

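Crawls historical news text about listed companies (individual stocks) from Sina Finance, National Business Daily, JRJ, China Securities Network, and Securities Times, performs text analysis and extracts feature sets, trains classifiers such as SVM and random forest, and finally classifies and predicts news crawled in real time.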

Stars: ✭ 494 (-48.22%)

Mutual labels: text-mining

Ldavis

R package for web-based interactive topic model visualization.

Stars: ✭ 466 (-51.15%)

Mutual labels: text-mining

Istock

👉 A Java stock crawler built on Spring Boot (supports A-shares only). If you ❤️ it, please ⭐️ it. An upgraded V2 is under development!

Stars: ✭ 622 (-34.8%)

Mutual labels: spider

Anti Anti Spider

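More and more websites have anti-crawling features: some hide key data in images, others use user-hostile CAPTCHAs. This is a repository of anti-anti-crawling code that aims to improve technique by taking on websites with different defenses (with no malicious intent). (Submissions of hard-to-scrape websites are welcome.) (The project is paused for work reasons.)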

Stars: ✭ 6,907 (+624%)

Mutual labels: spider

Baiduimagespider

A super-lightweight Baidu Images crawler

Stars: ✭ 591 (-38.05%)

Mutual labels: spider

Mailinglistscraper

A python web scraper for public email lists.

Stars: ✭ 19 (-98.01%)

Mutual labels: spider

Spider163

Scrapes popular comments from NetEase Cloud Music

Stars: ✭ 569 (-40.36%)

Mutual labels: spider

Gospider

Gospider - Fast web spider written in Go

Stars: ✭ 785 (-17.71%)

Mutual labels: spider

Web kg

Crawls Chinese-language Baidu Baike pages, extracts triples, and builds a Chinese knowledge graph

Stars: ✭ 549 (-42.45%)

Mutual labels: spider

Douban spider

A simple Douban information crawler 😄

Stars: ✭ 8 (-99.16%)

Mutual labels: spider

Haipproxy

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Stars: ✭ 4,993 (+423.38%)

Mutual labels: spider

Text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

Stars: ✭ 715 (-25.05%)

Mutual labels: text-mining

Anti Webspider

Anti-crawling solutions for the web front end

Stars: ✭ 486 (-49.06%)

Mutual labels: spider

Javlibrary

Javlibrary spider

Stars: ✭ 17 (-98.22%)

Mutual labels: spider

Oneblog

👽 OneBlog, a clean, attractive, powerful, and responsive Java blog

Stars: ✭ 678 (-28.93%)

Mutual labels: spider

Awesome Sentiment Analysis

Repository with everything necessary for sentiment analysis and related areas

Stars: ✭ 459 (-51.89%)

Mutual labels: text-mining

Bdp Dataplatform

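Big data ecosystem solution and data platform: a big data solution covering data platforms, microservices, machine learning, an online store, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.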

Stars: ✭ 456 (-52.2%)

Mutual labels: spider

Icrawler

A multithreaded crawler framework with many built-in image crawlers.

Stars: ✭ 629 (-34.07%)

Mutual labels: spider

Autophrase

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

Stars: ✭ 835 (-12.47%)

Mutual labels: text-mining

Python Spider

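Douban Movie Top 250, scraping JSON data from Douyu, scraping pictures of beautiful women, Taobao, Youyuan; using CrawlSpider to scrape basic profile information of dating users on Hongniang.com, plus distributed crawling of Hongniang.com with storage in Redis; small crawler demos; Selenium; scraping Dmall; developing APIs with Django; scraping Youyuan.com profiles; simulated logins to Zhihu, GitHub, and Tuchong; scraping full-site data from the Dmall store; scraping the article history of WeChat official accounts; scraping articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts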

Stars: ✭ 615 (-35.53%)

Mutual labels: spider

Scrapit

Scraping scripts for various websites.

Stars: ✭ 25 (-97.38%)

Mutual labels: spider

Domain hunter

A Burp Suite extension that automatically finds all subdomains, similar domains, and related domains of an enterprise or organization based on traffic.

Stars: ✭ 594 (-37.74%)

Mutual labels: spider

Torbot

Dark Web OSINT Tool

Stars: ✭ 821 (-13.94%)

Mutual labels: spider

Newcrawler

Free Web Scraping Tool with Java

Stars: ✭ 589 (-38.26%)

Mutual labels: spider

Pholcus

Pholcus is a distributed, high-concurrency crawler written in pure Go

Stars: ✭ 6,990 (+632.7%)

Mutual labels: spider

Netdiscovery

NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.

Stars: ✭ 573 (-39.94%)

Mutual labels: spider

Nlp In Practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Stars: ✭ 790 (-17.19%)

Mutual labels: text-mining

Bigartm

Fast topic modeling platform

Stars: ✭ 563 (-40.99%)

Mutual labels: text-mining

Baiduyunspider

A search engine for Baidu Cloud (Baidu Netdisk), including the crawler and the website

Stars: ✭ 903 (-5.35%)

Mutual labels: spider

91porn php

The simplest 91porn crawler, PHP version

Stars: ✭ 557 (-41.61%)

Mutual labels: spider

Funpyspidersearchengine

Word2vec personalized search (tailored to each user) + Scrapy 2.3.0 (crawling data) + Elasticsearch 7.9.1 (storing data and providing an external RESTful API) + Django 3.1.1 search

Stars: ✭ 782 (-18.03%)

Mutual labels: spider

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (-43.82%)

Mutual labels: spider

Blackwidow

A Python based web application scanner to gather OSINT and fuzz for OWASP vulnerabilities on a target website.

Stars: ✭ 887 (-7.02%)

Mutual labels: spider

Go jobs

An overview of the Golang job market

Stars: ✭ 526 (-44.86%)

Mutual labels: spider

Creeper

🐾 Creeper - The Next Generation Crawler Framework (Go)

Stars: ✭ 762 (-20.13%)

Mutual labels: spider

Nlp Notebooks

A collection of notebooks for Natural Language Processing from NLP Town

Stars: ✭ 513 (-46.23%)

Mutual labels: text-mining

Bagofconcepts

Python implementation of bag-of-concepts

Stars: ✭ 18 (-98.11%)

Mutual labels: text-mining

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+402.41%)

Mutual labels: spider

Bilibili Api

A module for calling the Bilibili API

Stars: ✭ 704 (-26.21%)

Mutual labels: spider

Movieheavens

🎬 A simple movie search tool based on PyQt5

Stars: ✭ 465 (-51.26%)

Mutual labels: spider

Easylogin

A Python 3 package for writing spiders more easily.

Stars: ✭ 26 (-97.27%)

Mutual labels: spider

Qzoneexport

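Qzone export assistant: backs up Qzone status posts, blog entries, private diaries, photo albums, videos, message board, QQ friends, favorites, shares, and recent visitors to files for easy migration and storage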

Stars: ✭ 456 (-52.2%)

Mutual labels: spider

Grab Site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Stars: ✭ 680 (-28.72%)

Mutual labels: spider

Tumblr spider

A multithreaded Tumblr crawler in Python

Stars: ✭ 458 (-51.99%)

Mutual labels: spider

Zhihu Crawler

zhihu-crawler is a high-performance, horizontally scalable, distributed Java crawler project that supports a free HTTP proxy pool

Stars: ✭ 890 (-6.71%)

Mutual labels: spider

Learnpython

Basic Python practice code and various crawler code

Stars: ✭ 451 (-52.73%)

Mutual labels: spider

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (-31.24%)

Mutual labels: spider

Jspider

JSpider publishes the JS decryption method for at least one website every week. Stars are welcome; WeChat for discussion: 13298307816

Stars: ✭ 914 (-4.19%)

Mutual labels: spider

Go Demo

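Go language tutorial with examples, from beginner to advanced, covering standard library usage, design patterns, common interview pitfalls, utility classes, third-party integrations, and more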

Stars: ✭ 881 (-7.65%)

Mutual labels: spider

Go spider

A golang spider