A versatile Ruby web spidering library that can spider a single site, multiple domains, specific links, or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+556%)

Mutual labels: scraper, spider

evine

Interactive CLI Web Crawler

Stars: ✭ 140 (+40%)

Mutual labels: data-mining, scraper

website-to-json

Converts a website to JSON using jQuery selectors

Stars: ✭ 37 (-63%)

Mutual labels: data-mining, scraper

awesome-Python-data-science-books

Probably the best curated list of data science books in Python

Stars: ✭ 331 (+231%)

Mutual labels: data-mining, books

aliexscrape

Gets AliExpress product details as JSON

Stars: ✭ 80 (-20%)

Mutual labels: scraper, spider

Xcrawler

A fast, concise, and powerful PHP crawler framework

Stars: ✭ 344 (+244%)

Mutual labels: scraper, spider

Java Spider

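A hands-on Java crawler framework built on top of the webmagic framework. It can already crawl news content from Tencent, Sohu, Toutiao (a separately integrated feature), and other sources, and it uses Elasticsearch to support automated crawling. It is already running in production.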

Stars: ✭ 276 (+176%)

Mutual labels: scraper, spider

Scrapit

Scraping scripts for various websites.

Stars: ✭ 25 (-75%)

Mutual labels: scraper, spider

Avbook

An AV film management system and crawler for avmoo, javbus, and javlibrary: an online Japanese adult video library and adult video magnet-link database.

Stars: ✭ 8,133 (+8033%)

Mutual labels: scraper, spider

Querylist

🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

Stars: ✭ 2,392 (+2292%)

Mutual labels: scraper, spider

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (+7%)

Mutual labels: scraper, spider

Twitter Get Old Tweets Scraper

A Python 3 data scraper for retrieving old tweets from Twitter.

Stars: ✭ 27 (-73%)

Mutual labels: data-mining, scraper

Awesome Ai Books

Awesome AI-related books and PDFs for learning and downloading, plus some playground models for learning.

Stars: ✭ 855 (+755%)

Mutual labels: data-mining, books

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (+681%)

Mutual labels: scraper, spider

perke

A keyphrase extractor for Persian

Stars: ✭ 60 (-40%)

Mutual labels: data-mining, data-processing

Mailinglistscraper

A Python web scraper for public mailing lists.

Stars: ✭ 19 (-81%)

Mutual labels: scraper, spider

Goribot

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

Stars: ✭ 190 (+90%)

Mutual labels: scraper, spider

TikTokDownloader PyWebIO

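🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous Douyin/TikTok data scraping tool that supports API calls as well as online batch parsing and downloading.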

Stars: ✭ 919 (+819%)

Mutual labels: scraper, spider

Ferret

Declarative web scraping

Stars: ✭ 4,837 (+4737%)

Mutual labels: data-mining, scraper

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (+436%)

Mutual labels: scraper, spider

Instagram-Comments-Scraper

An Instagram comment scraper using Python and Selenium that saves the comments to Excel.

Stars: ✭ 73 (-27%)

Mutual labels: data-mining, scraper

Gosint

OSINT Swiss Army Knife

Stars: ✭ 401 (+301%)

Mutual labels: scraper, spider

OpenScraper

An open-source web app for scraping: towards a public service for web scraping

Stars: ✭ 80 (-20%)

Mutual labels: scraper, spider

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Stars: ✭ 52 (-48%)

Mutual labels: scraper, spider

Spydan

A web spider for shodan.io without using the Developer API.

Stars: ✭ 30 (-70%)

Mutual labels: scraper, spider

robotstxt

robots.txt file parsing and checking for R

Stars: ✭ 65 (-35%)

Mutual labels: scraper, spider

Xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

Stars: ✭ 335 (+235%)

Mutual labels: scraper, data-processing

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+4693%)

Mutual labels: scraper, spider

LeetCode

Currently contains scraped data for around 1,500 problems from the site. More to follow.

Stars: ✭ 45 (-55%)

Mutual labels: data-mining, scraper

unpaprd

An audiobook 🎧 📔 app made using Flutter

Stars: ✭ 73 (-27%)

Mutual labels: books, audiobooks

crawler-chrome-extensions

Chrome extensions commonly used by crawler developers

Stars: ✭ 53 (-47%)

Mutual labels: scraper, spider

sede

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

Stars: ✭ 83 (-17%)

Mutual labels: spider

data-exploration-with-apache-drill

Data Exploration with Apache Drill

Stars: ✭ 25 (-75%)

Mutual labels: data-mining

awesome-programming-books

List of good programming books for beginners and professionals

Stars: ✭ 68 (-32%)

Mutual labels: books

trawler

A scraper for Facebook, Gab, Google, and TikTok

Stars: ✭ 20 (-80%)

Mutual labels: scraper

xforest

A super-fast and scalable Random Forest library based on fast histogram decision tree algorithm and distributed bagging framework. It can be used for binary classification, multi-label classification, and regression tasks. This library provides both Python and command line interface to users.

Stars: ✭ 20 (-80%)

Mutual labels: data-mining

PainlessDocker

Painless Docker book git repository.

Stars: ✭ 17 (-83%)

Mutual labels: books

python web scraping

Web scraping using Python, Requests, and Selenium

Stars: ✭ 40 (-60%)

Mutual labels: scraper

spider-mzitu

Mzitu ("girl pictures")

Stars: ✭ 13 (-87%)

Mutual labels: spider

scrapeer

Essential PHP library that scrapes HTTP(S) and UDP trackers for torrent information.

Stars: ✭ 81 (-19%)

Mutual labels: scraper

hierarchical-clustering

A Python implementation of divisive and hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset and dendrograms were plotted.

Stars: ✭ 62 (-38%)

Mutual labels: data-mining

scikit-cycling

Tools to analyze cycling data

Stars: ✭ 25 (-75%)

Mutual labels: data-mining

Apriori-and-Eclat-Frequent-Itemset-Mining

Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions, implementation in Python.

Stars: ✭ 36 (-64%)

Mutual labels: data-mining

xgboost-smote-detect-fraud

Can we make accurate predictions on skewed data? What sampling techniques can be used? Which models or techniques suit this scenario? Find the answers in this code pattern!

Stars: ✭ 59 (-41%)

Mutual labels: data-mining

pyitau

Unofficial client to access your Itaú bank data

Stars: ✭ 28 (-72%)

Mutual labels: scraper

PTTmineR

Parallel Searching and Crawling Data from PTT 🚀

Stars: ✭ 31 (-69%)

Mutual labels: scraper

sciblox

sciblox - Easier Data Science and Machine Learning