Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-99.87%)

Mutual labels: crawler, scraper

newsemble

API for fetching data from news websites.

Stars: ✭ 42 (-99.64%)

Mutual labels: scraper, news

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-99.58%)

Mutual labels: crawler, crawling

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (-94.32%)

Mutual labels: crawler, scraper

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (-93.24%)

Mutual labels: crawler, scraper

weibo-scraper

Simple Weibo Scraper

Stars: ✭ 50 (-99.57%)

Mutual labels: crawler, scraper

Spidy

The simple, easy to use command line web crawler.

Stars: ✭ 257 (-97.77%)

Mutual labels: crawler, crawling

Mimo-Crawler

A web crawler, written in Node.js, that uses Firefox and JavaScript injection to interact with web pages and crawl their content.

Stars: ✭ 22 (-99.81%)

Mutual labels: scraper, crawling

Hquery.php

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

Stars: ✭ 295 (-97.44%)

Mutual labels: crawler, scraper

Sasila

A flexible, friendly crawler framework.

Stars: ✭ 286 (-97.52%)

Mutual labels: crawler, crawling

Xcrawler

A fast, concise, and powerful PHP crawler framework.

Stars: ✭ 344 (-97.02%)

Mutual labels: crawler, scraper

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (-97.6%)

Mutual labels: crawler, crawling

Avbook

AV film management system; crawler for avmoo, javbus, and javlibrary; online AV film library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

Stars: ✭ 8,133 (-29.55%)

Mutual labels: crawler, scraper

Gosint

OSINT Swiss Army Knife

Stars: ✭ 401 (-96.53%)

Mutual labels: crawler, scraper

Dataflowkit

Extract structured data from web sites. Web sites scraping.

Stars: ✭ 456 (-96.05%)

Mutual labels: scraper, crawling

crawlkit

A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.

Stars: ✭ 23 (-99.8%)

Mutual labels: scraper, crawling

Scrapit

Scraping scripts for various websites.

Stars: ✭ 25 (-99.78%)

Mutual labels: crawler, scraper

Social Scraper

A collection of scripts for crawling data from Vietnamese-language social networks and websites.

Stars: ✭ 47 (-99.59%)

Mutual labels: crawler, scraper

Jd Autobuy

Python crawler for JD.com (Jingdong): automatic login and online flash purchasing of products.

Stars: ✭ 1,174 (-89.83%)

Mutual labels: crawler, scraper

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (-95.36%)

Mutual labels: crawler, scraper

News Please

news-please - an integrated web crawler and information extractor for news that just works.

Stars: ✭ 969 (-91.61%)

Mutual labels: news, crawler

Pypergrabber

Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.

Stars: ✭ 14 (-99.88%)

Mutual labels: crawler, scraper

Arachnid

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-99.41%)

Mutual labels: crawler, crawling

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages.

Stars: ✭ 4,793 (-58.48%)

Mutual labels: crawler, scraper

Taiwan News Crawlers

Scrapy-based Crawlers for news of Taiwan

Stars: ✭ 83 (-99.28%)

Mutual labels: news, crawler

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (-89.21%)

Mutual labels: crawler, scraper

Hotnewsanalysis

Analysis of trending news topics using text-mining techniques.

Stars: ✭ 93 (-99.19%)

Mutual labels: news, crawler

Google Play Scraper

Node.js scraper to get data from Google Play

Stars: ✭ 1,606 (-86.09%)

Mutual labels: crawler, scraper

Wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

Stars: ✭ 1,220 (-89.43%)

Mutual labels: crawler, scraper

Scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!

Stars: ✭ 1,322 (-88.55%)

Mutual labels: crawler, scraper

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-99.13%)

Mutual labels: crawler, crawling

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Stars: ✭ 52 (-99.55%)

Mutual labels: scraper, crawling

TrollHunter

Twitter Troll & Fake News Hunter - Crawls news websites and Twitter to identify fake news.

Stars: ✭ 38 (-99.67%)

Mutual labels: scraper, news

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (-99.07%)

Mutual labels: crawler, scraper

