
DwarfThief / Raspagem-de-dados-para-iniciantes

License: GPL-3.0
Data scraping for beginners using Scrapy and other basic libraries

Programming Languages

  • Python: 139,335 projects (#7 most used programming language)
  • Jupyter Notebook: 11,667 projects

Projects that are alternatives of or similar to Raspagem-de-dados-para-iniciantes

OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recently posted ads for the requested product and dumps them to a NoSQL MongoDB database.
Stars: ✭ 15 (-86.73%)
Mutual labels:  web-crawler, scrapy
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-80.53%)
Mutual labels:  scrapy, spyder
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-39.82%)
Mutual labels:  scrapy, webcrawling
proxi
Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (-71.68%)
Mutual labels:  web-crawler, scrapy
Crawlab
Distributed web crawler admin platform for spider management, regardless of language or framework.
Stars: ✭ 8,392 (+7326.55%)
Mutual labels:  web-crawler, scrapy
Crawlab Lite
Lite version of Crawlab, a lightweight crawler management platform.
Stars: ✭ 122 (+7.96%)
Mutual labels:  web-crawler, scrapy
Terpene Profile Parser For Cannabis Strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
Stars: ✭ 63 (-44.25%)
Mutual labels:  web-crawler, scrapy
Awesome Web Scraper
A collection of awesome web scrapers and crawlers.
Stars: ✭ 147 (+30.09%)
Mutual labels:  web-crawler, scrapy
vietnam-ecommerce-crawler
Crawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs
Stars: ✭ 28 (-75.22%)
Mutual labels:  scrapy
doc_crawler.py
Explore a website recursively and download all the wanted documents (PDF, ODT…)
Stars: ✭ 22 (-80.53%)
Mutual labels:  web-crawler
asyncpy
A lightweight asynchronous coroutine web crawler framework built with asyncio and aiohttp.
Stars: ✭ 86 (-23.89%)
Mutual labels:  scrapy
scrapy-LBC
A LeBonCoin spider built with Scrapy and ElasticSearch.
Stars: ✭ 14 (-87.61%)
Mutual labels:  scrapy
fernando-pessoa
A classifier of Fernando Pessoa's poems according to his heteronyms.
Stars: ✭ 31 (-72.57%)
Mutual labels:  scrapy
crawler
A collection of Python crawler projects.
Stars: ✭ 29 (-74.34%)
Mutual labels:  scrapy
ant
A web crawler for Go
Stars: ✭ 264 (+133.63%)
Mutual labels:  web-crawler
Web-Iota
Iota is a web scraper that finds all of the images and links/sub-URLs on a webpage.
Stars: ✭ 60 (-46.9%)
Mutual labels:  scrapy
scrapy helper
Dynamically configurable crawler.
Stars: ✭ 84 (-25.66%)
Mutual labels:  scrapy
scrapy-kafka-redis
Distributed crawling/scraping: Kafka- and Redis-based components for Scrapy.
Stars: ✭ 45 (-60.18%)
Mutual labels:  scrapy
Inventus
Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers.
Stars: ✭ 80 (-29.2%)
Mutual labels:  scrapy
scrapy-wayback-machine
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (-18.58%)
Mutual labels:  scrapy


Data scraping for beginners 📄

This repository was built to help anyone interested in the field of data scraping. The entire repository is in PT-BR (Brazilian Portuguese), but the links/documentation may be in English (please share if you have something translated).

Installation 💾

We use Python version 3.7.

The main libraries we will use here are:

  • requests
  • bs4 (BeautifulSoup)
  • Scrapy

To do this, you just need to install a few libraries; in your terminal, type:

pip install -r requirements.txt
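
To give a concrete idea of how requests and bs4 fit together, here is a minimal sketch of extracting text from a page. The URL (quotes.toscrape.com, a public practice site) and the CSS selector are illustrative assumptions, not necessarily what the tutorial notebooks use:

# Minimal sketch: fetch a page with requests and extract text with BeautifulSoup.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
response.raise_for_status()  # stop here if the request failed

soup = BeautifulSoup(response.text, "html.parser")
# On this practice site, each quote's text lives in a <span class="text"> element.
for quote in soup.select("span.text"):
    print(quote.get_text())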

Recommendations

Use a Python virtual environment so your setup works the same regardless of platform.

  • Creation:
python3 -m venv venv
  • Activation (varies by OS; see the Windows note after this list):
source venv/bin/activate
  • Dependencies:
pip install -r requirements.txt
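
The activation command above applies to Linux/macOS shells. On Windows (assuming the same venv directory name), activation is typically:

venv\Scripts\activate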

Jupyter notebooks

We will use Jupyter notebooks here, so if you are not familiar with the tool, visit the documentation.
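
If you don't have Jupyter installed yet, a common way to install it and start the notebook server (assuming your virtual environment is active) is:

pip install notebook
jupyter notebook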

Tutorial track: 🎓

  1. Learning to extract text from a website
  2. First spider (see the sketch after this list)
  3. Scraping multiple items
  4. Navigating between pages
  5. Collecting more details
  6. Scraping a site with infinite scroll
  7. Running a spider in the cloud
  8. Extracting images
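
As a preview of steps 2 and 4, here is a minimal sketch of a first Scrapy spider that collects quotes and follows pagination. The site (quotes.toscrape.com) and the CSS selectors are illustrative assumptions, not necessarily the ones used in the tutorial notebooks:

# Minimal sketch of a first Scrapy spider (illustrative site and selectors).
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com"]

    def parse(self, response):
        # Yield one item per quote on the current page.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "next page" link, if there is one (pagination).
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

Assuming the code above is saved as quotes_spider.py, it can be run without a full Scrapy project with: scrapy runspider quotes_spider.py -o quotes.json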

Study materials:

Blogs: 💻

Books: 📚

Documentation: 📜

Podcasts: 🎧 🎵

Videos: 📺
