288 open source projects that are alternatives to or similar to Raspagem-de-dados-para-iniciantes

Lite version of Crawlab, a crawler management platform.

Stars: ✭ 122 (+7.96%)

Distributed web crawler admin platform for managing spiders regardless of language or framework.

Stars: ✭ 8,392 (+7326.55%)

Mutual labels: web-crawler, scrapy

OLX Scraper

📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them to MongoDB (NoSQL).

Stars: ✭ 15 (-86.73%)

Mutual labels: web-crawler, scrapy

Terpene Profile Parser For Cannabis Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Stars: ✭ 63 (-44.25%)

Mutual labels: web-crawler, scrapy

Awesome Web Scraper

A collection of awesome web scrapers and crawlers.

Stars: ✭ 147 (+30.09%)

Mutual labels: web-crawler, scrapy

policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

Stars: ✭ 22 (-80.53%)

Mutual labels: scrapy, spyder

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (-39.82%)

Mutual labels: scrapy, webcrawling

proxi

Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.

Stars: ✭ 32 (-71.68%)

Mutual labels: web-crawler, scrapy

Crawler Commons

A set of reusable Java components that implement functionality common to any web crawler

Stars: ✭ 173 (+53.1%)

Mutual labels: web-crawler

asyncpy

A lightweight asynchronous, coroutine-based web crawler framework built with asyncio and aiohttp.

Stars: ✭ 86 (-23.89%)

Mutual labels: scrapy

Proxy

A simple tool for fetching usable proxies from several websites.

Stars: ✭ 124 (+9.73%)

Mutual labels: web-crawler

Nutch

Apache Nutch is an extensible and scalable web crawler

Stars: ✭ 2,277 (+1915.04%)

Mutual labels: web-crawler

vietnam-ecommerce-crawler

Crawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs

Stars: ✭ 28 (-75.22%)

Mutual labels: scrapy

Abot

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

Stars: ✭ 1,961 (+1635.4%)

Mutual labels: web-crawler

doc crawler.py

Explore a website recursively and download all the wanted documents (PDF, ODT…)

Stars: ✭ 22 (-80.53%)

Mutual labels: web-crawler

scrapy helper

Dynamically configurable crawler.

Stars: ✭ 84 (-25.66%)

Mutual labels: scrapy

Pspider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

Stars: ✭ 1,611 (+1325.66%)

Mutual labels: web-crawler

Infinitycrawler

A simple but powerful web crawler library for .NET

Stars: ✭ 97 (-14.16%)

Mutual labels: web-crawler

Ospider

Open source tool for acquiring and preprocessing vector geographic data (POI/AOI/administrative districts/road networks/land use)

Stars: ✭ 74 (-34.51%)

Mutual labels: web-crawler

Inventus

Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers.

Stars: ✭ 80 (-29.2%)

Mutual labels: scrapy

scrapy-wayback-machine

A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Stars: ✭ 92 (-18.58%)

Mutual labels: scrapy

Scrapy-tripadvisor-reviews

Using scrapy to scrape tripadvisor in order to get users' reviews.

Stars: ✭ 24 (-78.76%)

Mutual labels: scrapy

Abotx

Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.

Stars: ✭ 63 (-44.25%)

Mutual labels: web-crawler

arche

Analyze scraped data

Stars: ✭ 49 (-56.64%)

Mutual labels: scrapy

Market-Trend-Prediction

A project for a course on building knowledge graphs. It leverages historical stock prices and integrates social media listening from customers to predict the market trend of the Dow Jones Industrial Average (DJIA).

Stars: ✭ 57 (-49.56%)

Mutual labels: web-crawler

Dutsso

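A Python module for quickly logging in to the Dalian University of Technology unified identity authentication system (SSO). It makes it easy to implement grade notifications, course grabbing, Yulan card information lookup, personal information queries, and similar features.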

Stars: ✭ 32 (-71.68%)

Mutual labels: web-crawler

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Stars: ✭ 198 (+75.22%)

Mutual labels: web-crawler

scrapy-LBC

LeBonCoin spider built with Scrapy and ElasticSearch

Stars: ✭ 14 (-87.61%)

Mutual labels: scrapy

Zhihu Crawler People

A simple distributed crawler for Zhihu, plus data analysis

Stars: ✭ 182 (+61.06%)

Mutual labels: web-crawler

fernando-pessoa

Classifier of Fernando Pessoa's poems according to his heteronyms

Stars: ✭ 31 (-72.57%)

Mutual labels: scrapy

crawler

A collection of Python crawler projects

Stars: ✭ 29 (-74.34%)

Mutual labels: scrapy

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+480.53%)

Mutual labels: web-crawler

Collector Http

Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.

Stars: ✭ 130 (+15.04%)

Mutual labels: web-crawler

ant

A web crawler for Go

Stars: ✭ 264 (+133.63%)

Mutual labels: web-crawler

fansly

Simply scrape / download all the media from a Fansly account

Stars: ✭ 351 (+210.62%)

Mutual labels: datascraping

pagser

Pagser is a simple, extensible, configurable library that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Golang crawlers

Stars: ✭ 82 (-27.43%)

Mutual labels: scrapy

Spider Flow

A next-generation crawler platform: define crawler flows graphically and build a crawler without writing code.

Stars: ✭ 365 (+223.01%)

Mutual labels: web-crawler

Web-Iota

Iota is a web scraper which can find all of the images and links/suburls on a webpage

Stars: ✭ 60 (-46.9%)

Mutual labels: scrapy

Pulsar

Turn large websites into tables and charts using simple SQL.

Stars: ✭ 100 (-11.5%)

Mutual labels: web-crawler

itemadapter

Common interface for data container classes

Stars: ✭ 47 (-58.41%)

Mutual labels: scrapy

Ultimate Dork

Web Crawler

Stars: ✭ 79 (-30.09%)

Mutual labels: web-crawler

scrapy-rotated-proxy

A Scrapy middleware for using a rotating proxy IP list.

Stars: ✭ 22 (-80.53%)

Mutual labels: scrapy

Cvpr2019

Displays all the 2019 CVPR Accepted Papers in a way that they are easy to parse.

Stars: ✭ 65 (-42.48%)

Mutual labels: web-crawler

scrapy-kafka-redis

Distributed crawling/scraping, Kafka And Redis based components for Scrapy

Stars: ✭ 45 (-60.18%)

Mutual labels: scrapy

Maman

Rust Web Crawler saving pages on Redis

Stars: ✭ 39 (-65.49%)

Mutual labels: web-crawler

ArticleSpider

Crawls Zhihu, Jobbole, and Lagou with Scrapy, and uses Elasticsearch + Django to build a search engine website. See README_zh.md (covers the implementation roadmap, distributed crawling, and coping with anti-crawling strategies).

Stars: ✭ 34 (-69.91%)

Mutual labels: scrapy

Storm Crawler

A scalable, mature and versatile web crawler based on Apache Storm

Stars: ✭ 703 (+522.12%)

Mutual labels: web-crawler

lgcrawl

Python + Scrapy + Splash: crawls job listings across the entire Lagou site

Stars: ✭ 22 (-80.53%)

Mutual labels: scrapy

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+4141.59%)

Mutual labels: web-crawler

scrapy-mysql-pipeline

A MySQL pipeline for Scrapy

Stars: ✭ 47 (-58.41%)

Mutual labels: scrapy

domains

World’s single largest Internet domains dataset

Stars: ✭ 461 (+307.96%)

Mutual labels: scrapy

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+220.35%)

Mutual labels: web-crawler

Ache

ACHE is a web crawler for domain-specific search.

Stars: ✭ 320 (+183.19%)

Mutual labels: web-crawler

Supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

Stars: ✭ 306 (+170.8%)

Mutual labels: web-crawler

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (+8.85%)

Mutual labels: scrapy

estate-crawler

Scraping the real estate agencies for up-to-date house listings as soon as they arrive!

Stars: ✭ 20 (-82.3%)

Mutual labels: scrapy

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn