288 open source projects that are alternatives to or similar to Raspagem-de-dados-para-iniciantes

Crawlab Lite
Lite version of Crawlab, a web crawler management platform.
Stars: ✭ 122 (+7.96%)
Mutual labels:  web-crawler, scrapy
Crawlab
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Stars: ✭ 8,392 (+7326.55%)
Mutual labels:  web-crawler, scrapy
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them to MongoDB (NoSQL).
Stars: ✭ 15 (-86.73%)
Mutual labels:  web-crawler, scrapy
Terpene Profile Parser For Cannabis Strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
Stars: ✭ 63 (-44.25%)
Mutual labels:  web-crawler, scrapy
Awesome Web Scraper
A collection of awesome web scrapers and crawlers.
Stars: ✭ 147 (+30.09%)
Mutual labels:  web-crawler, scrapy
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-80.53%)
Mutual labels:  scrapy, spyder
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-39.82%)
Mutual labels:  scrapy, webcrawling
proxi
Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (-71.68%)
Mutual labels:  web-crawler, scrapy
Crawler Commons
A set of reusable Java components that implement functionality common to any web crawler
Stars: ✭ 173 (+53.1%)
Mutual labels:  web-crawler
asyncpy
A lightweight asynchronous coroutine web crawler framework built with asyncio and aiohttp.
Stars: ✭ 86 (-23.89%)
Mutual labels:  scrapy
Proxy
A simple tool for fetching usable proxies from several websites.
Stars: ✭ 124 (+9.73%)
Mutual labels:  web-crawler
Nutch
Apache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (+1915.04%)
Mutual labels:  web-crawler
vietnam-ecommerce-crawler
Crawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs
Stars: ✭ 28 (-75.22%)
Mutual labels:  scrapy
Abot
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Stars: ✭ 1,961 (+1635.4%)
Mutual labels:  web-crawler
doc crawler.py
Explore a website recursively and download all the wanted documents (PDF, ODT…)
Stars: ✭ 22 (-80.53%)
Mutual labels:  web-crawler
scrapy helper
Dynamically configurable crawler
Stars: ✭ 84 (-25.66%)
Mutual labels:  scrapy
Pspider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Stars: ✭ 1,611 (+1325.66%)
Mutual labels:  web-crawler
Infinitycrawler
A simple but powerful web crawler library for .NET
Stars: ✭ 97 (-14.16%)
Mutual labels:  web-crawler
Ospider
Open-source tool for acquiring and preprocessing vector geographic data (POI/AOI/administrative divisions/road networks/land use)
Stars: ✭ 74 (-34.51%)
Mutual labels:  web-crawler
Inventus
Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers.
Stars: ✭ 80 (-29.2%)
Mutual labels:  scrapy
scrapy-wayback-machine
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (-18.58%)
Mutual labels:  scrapy
Scrapy-tripadvisor-reviews
Using scrapy to scrape tripadvisor in order to get users' reviews.
Stars: ✭ 24 (-78.76%)
Mutual labels:  scrapy
Abotx
Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.
Stars: ✭ 63 (-44.25%)
Mutual labels:  web-crawler
arche
Analyze scraped data
Stars: ✭ 49 (-56.64%)
Mutual labels:  scrapy
Market-Trend-Prediction
A project for a course on building knowledge graphs. It leverages historical stock prices and integrates social media listening from customers to predict market trends on the Dow Jones Industrial Average (DJIA).
Stars: ✭ 57 (-49.56%)
Mutual labels:  web-crawler
Dutsso
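A Python module for quickly logging in to the Dalian University of Technology unified identity authentication system (SSO). It makes it easy to implement grade alerts, course grabbing, Yulan card information lookup, personal information queries, and similar features.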
Stars: ✭ 32 (-71.68%)
Mutual labels:  web-crawler
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+75.22%)
Mutual labels:  web-crawler
scrapy-LBC
LeBonCoin spider with Scrapy and ElasticSearch
Stars: ✭ 14 (-87.61%)
Mutual labels:  scrapy
Zhihu Crawler People
A simple distributed crawler for zhihu && data analysis
Stars: ✭ 182 (+61.06%)
Mutual labels:  web-crawler
fernando-pessoa
Classifier of Fernando Pessoa's poems according to his heteronyms
Stars: ✭ 31 (-72.57%)
Mutual labels:  scrapy
crawler
A collection of Python crawler projects
Stars: ✭ 29 (-74.34%)
Mutual labels:  scrapy
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+480.53%)
Mutual labels:  web-crawler
Collector Http
Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.
Stars: ✭ 130 (+15.04%)
Mutual labels:  web-crawler
ant
A web crawler for Go
Stars: ✭ 264 (+133.63%)
Mutual labels:  web-crawler
fansly
Simply scrape / download all the media from a fansly account
Stars: ✭ 351 (+210.62%)
Mutual labels:  datascraping
pagser
Pagser is a simple, extensible, configurable library that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Golang crawlers
Stars: ✭ 82 (-27.43%)
Mutual labels:  scrapy
Spider Flow
A next-generation crawler platform: define crawler workflows graphically and build crawlers without writing code.
Stars: ✭ 365 (+223.01%)
Mutual labels:  web-crawler
Web-Iota
Iota is a web scraper which can find all of the images and links/suburls on a webpage
Stars: ✭ 60 (-46.9%)
Mutual labels:  scrapy
Pulsar
Turn large websites into tables and charts using simple SQL.
Stars: ✭ 100 (-11.5%)
Mutual labels:  web-crawler
itemadapter
Common interface for data container classes
Stars: ✭ 47 (-58.41%)
Mutual labels:  scrapy
Ultimate Dork
Web Crawler
Stars: ✭ 79 (-30.09%)
Mutual labels:  web-crawler
scrapy-rotated-proxy
A scrapy middleware to use rotated proxy ip list.
Stars: ✭ 22 (-80.53%)
Mutual labels:  scrapy
Cvpr2019
Displays all the 2019 CVPR Accepted Papers in a way that they are easy to parse.
Stars: ✭ 65 (-42.48%)
Mutual labels:  web-crawler
scrapy-kafka-redis
Distributed crawling/scraping, Kafka And Redis based components for Scrapy
Stars: ✭ 45 (-60.18%)
Mutual labels:  scrapy
Maman
Rust Web Crawler saving pages on Redis
Stars: ✭ 39 (-65.49%)
Mutual labels:  web-crawler
ArticleSpider
Crawling zhihu, jobbole, lagou by Scrapy, and using Elasticsearch+Django to build a Search Engine website --- README_zh.md (including: implementation roadmap, distributed-crawler and coping with anti-crawling strategies).
Stars: ✭ 34 (-69.91%)
Mutual labels:  scrapy
Storm Crawler
A scalable, mature and versatile web crawler based on Apache Storm
Stars: ✭ 703 (+522.12%)
Mutual labels:  web-crawler
lgcrawl
Crawls job listings across the entire Lagou site using Python + Scrapy + Splash
Stars: ✭ 22 (-80.53%)
Mutual labels:  scrapy
Awesome Crawler
A collection of awesome web crawlers and spiders in different languages
Stars: ✭ 4,793 (+4141.59%)
Mutual labels:  web-crawler
scrapy-mysql-pipeline
scrapy mysql pipeline
Stars: ✭ 47 (-58.41%)
Mutual labels:  scrapy
domains
World’s single largest Internet domains dataset
Stars: ✭ 461 (+307.96%)
Mutual labels:  scrapy
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+220.35%)
Mutual labels:  web-crawler
Ache
ACHE is a web crawler for domain-specific search.
Stars: ✭ 320 (+183.19%)
Mutual labels:  web-crawler
Supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
Stars: ✭ 306 (+170.8%)
Mutual labels:  web-crawler
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+8.85%)
Mutual labels:  scrapy
estate-crawler
Scraping the real estate agencies for up-to-date house listings as soon as they arrive!
Stars: ✭ 20 (-82.3%)
Mutual labels:  scrapy
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+145.13%)
Mutual labels:  web-crawler
Spidy
The simple, easy to use command line web crawler.
Stars: ✭ 257 (+127.43%)
Mutual labels:  web-crawler
Strong Web Crawler
An advanced web crawler based on C#.NET + PhantomJS + Selenium. It can execute JavaScript code, trigger various events, and manipulate the page's DOM structure.
Stars: ✭ 238 (+110.62%)
Mutual labels:  web-crawler
hupu spider
Crawler for the Hupu Buxingjie (步行街) forum
Stars: ✭ 22 (-80.53%)
Mutual labels:  scrapy