IaroslavR / scrapy-mysql-pipeline

Licence: other

scrapy mysql pipeline

Programming Languages

python

Projects that are alternatives of or similar to scrapy-mysql-pipeline

domains

World’s single largest Internet domains dataset

Stars: ✭ 461 (+880.85%)

Mutual labels: scrapy

asyncpy

A lightweight asynchronous coroutine web crawler framework built with asyncio and aiohttp

Stars: ✭ 86 (+82.98%)

Mutual labels: scrapy

ArticleSpider

Crawls zhihu, jobbole, and lagou with Scrapy and uses Elasticsearch+Django to build a search engine website. See README_zh.md for the implementation roadmap, distributed crawling, and anti-crawling countermeasures.

Stars: ✭ 34 (-27.66%)

Mutual labels: scrapy

lgcrawl

Crawls job listings across the entire Lagou site with python+scrapy+splash

Stars: ✭ 22 (-53.19%)

Mutual labels: scrapy

scrapy helper

Dynamically configurable crawler

Stars: ✭ 84 (+78.72%)

Mutual labels: scrapy

vietnam-ecommerce-crawler

Crawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs

Stars: ✭ 28 (-40.43%)

Mutual labels: scrapy

Awesome crawl

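Crawlers for Tencent News, Zhihu topics, Weibo followers, Tumblr, Douyu bullet comments (danmaku), and Meizitu, plus distributed design and more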

Stars: ✭ 246 (+423.4%)

Mutual labels: scrapy

fernando-pessoa

Classifier of Fernando Pessoa's poems according to his heteronyms

Stars: ✭ 31 (-34.04%)

Mutual labels: scrapy

Web-Iota

Iota is a web scraper which can find all of the images and links/suburls on a webpage

Stars: ✭ 60 (+27.66%)

Mutual labels: scrapy

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (+161.7%)

Mutual labels: scrapy

arche

Analyze scraped data

Stars: ✭ 49 (+4.26%)

Mutual labels: scrapy

scrapy-rotated-proxy

A Scrapy middleware for using a rotating proxy IP list.

Stars: ✭ 22 (-53.19%)

Mutual labels: scrapy

scrapy-LBC

LeBonCoin spider built with Scrapy and ElasticSearch

Stars: ✭ 14 (-70.21%)

Mutual labels: scrapy

pagser

Pagser is a simple, extensible, configurable library that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Golang crawlers

Stars: ✭ 82 (+74.47%)

Mutual labels: scrapy

scrapy-wayback-machine

A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Stars: ✭ 92 (+95.74%)

Mutual labels: scrapy

estate-crawler

Scraping the real estate agencies for up-to-date house listings as soon as they arrive!

Stars: ✭ 20 (-57.45%)

Mutual labels: scrapy

crawler

A collection of Python crawler projects

Stars: ✭ 29 (-38.3%)

Mutual labels: scrapy

scrapy-html-storage

Scrapy downloader middleware that stores response HTMLs to disk.

Stars: ✭ 17 (-63.83%)

Mutual labels: scrapy

itemadapter

Common interface for data container classes

Stars: ✭ 47 (+0%)

Mutual labels: scrapy

Scrape-Finance-Data

My code for scraping financial data in Vietnam

Stars: ✭ 13 (-72.34%)

Mutual labels: scrapy

Pull requests are always welcome

scrapy-mysql-pipeline

Asynchronous MySQL item pipeline for Scrapy

Installation

pip install scrapy-mysql-pipeline

Configuration

Add the pipeline to ITEM_PIPELINES:

ITEM_PIPELINES = {
    'scrapy_mysql_pipeline.MySQLPipeline': 300,
}
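
The integer assigned to each entry controls ordering: Scrapy runs item pipelines from the lowest value to the highest, and values are conventionally kept in the 0-1000 range. A minimal sketch, assuming a hypothetical project-level `myproject.pipelines.CleanItemPipeline` (not part of this package) that should run before items are written to MySQL:

```python
# settings.py (sketch). CleanItemPipeline is a hypothetical pipeline used only
# to illustrate ordering; it is not provided by scrapy-mysql-pipeline.
ITEM_PIPELINES = {
    'myproject.pipelines.CleanItemPipeline': 100,
    'scrapy_mysql_pipeline.MySQLPipeline': 300,
}

# Lower values run first, so sorting by value lists the pipelines
# in the order Scrapy would call them.
execution_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(execution_order)
```

Placing the MySQL pipeline last is a common choice, since it lets any cleaning or validation steps finish before an item is stored.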

Default values:

MYSQL_HOST = 'localhost'
MYSQL_PORT = 3306
MYSQL_USER = None
MYSQL_PASSWORD = ''
MYSQL_DB = None
MYSQL_TABLE = None
MYSQL_UPSERT = False
MYSQL_RETRIES = 3
MYSQL_CLOSE_ON_ERROR = True
MYSQL_CHARSET = 'utf8'

The MYSQL_USER, MYSQL_PASSWORD, MYSQL_DB and MYSQL_TABLE variables must be set in settings.py.
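
As an illustration of the required settings, here is a minimal sketch of the relevant part of settings.py. Every value is a placeholder, and the `SCRAPY_MYSQL_PASSWORD` environment variable name is an assumption made for this example, not something the package defines:

```python
# settings.py (sketch). All values below are placeholders chosen for
# illustration; replace them with your own database details.
import os

MYSQL_USER = 'scrapy'        # hypothetical database user
MYSQL_DB = 'scraped_data'    # hypothetical database name
MYSQL_TABLE = 'items'        # hypothetical target table

# Read the password from the environment rather than hard-coding it.
# SCRAPY_MYSQL_PASSWORD is a made-up variable name for this sketch;
# the fallback matches the package default of an empty string.
MYSQL_PASSWORD = os.environ.get('SCRAPY_MYSQL_PASSWORD', '')
```

Settings left out of the file, such as MYSQL_HOST and MYSQL_PORT, keep the defaults listed above.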
