
nhat2008 / vietnam-ecommerce-crawler

Licence: other
Crawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to vietnam-ecommerce-crawler

City Scrapers
Scrape, standardize and share public meetings from local government websites
Stars: ✭ 220 (+685.71%)
Mutual labels:  scrapy
estate-crawler
Scraping the real estate agencies for up-to-date house listings as soon as they arrive!
Stars: ✭ 20 (-28.57%)
Mutual labels:  scrapy
scrapy-rotated-proxy
A scrapy middleware to use rotated proxy ip list.
Stars: ✭ 22 (-21.43%)
Mutual labels:  scrapy
Spiderkeeper
admin ui for scrapy/open source scrapinghub
Stars: ✭ 2,562 (+9050%)
Mutual labels:  scrapy
Spider job
Crawler for data from job-recruitment websites
Stars: ✭ 234 (+735.71%)
Mutual labels:  scrapy
pagser
Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and struct tags for golang crawler
Stars: ✭ 82 (+192.86%)
Mutual labels:  scrapy
Stealer
Watermark removal tool for videos from Douyin, Kuaishou, Huoshan and Pipixia
Stars: ✭ 217 (+675%)
Mutual labels:  scrapy
asyncpy
A lightweight asynchronous coroutine web crawler framework built with asyncio and aiohttp
Stars: ✭ 86 (+207.14%)
Mutual labels:  scrapy
Awesome crawl
Crawlers for Tencent News, Zhihu topics, Weibo followers, Tumblr, Douyu bullet comments and Meizitu, plus distributed design and more
Stars: ✭ 246 (+778.57%)
Mutual labels:  scrapy
Scrapy-tripadvisor-reviews
Using scrapy to scrape tripadvisor in order to get users' reviews.
Stars: ✭ 24 (-14.29%)
Mutual labels:  scrapy
Scrapy Splash
Scrapy+Splash for JavaScript integration
Stars: ✭ 2,666 (+9421.43%)
Mutual labels:  scrapy
Ecommercecrawlers
Gitee repository: AJay13/ECommerceCrawlers. GitHub repository: DropsDevopsOrg/ECommerceCrawlers. Project showcase platform: http://wechat.doonsec.com
Stars: ✭ 3,073 (+10875%)
Mutual labels:  scrapy
lgcrawl
python+scrapy+splash crawler for job listings across the whole Lagou site
Stars: ✭ 22 (-21.43%)
Mutual labels:  scrapy
Sourcecodeofbook
Companion source code for the book 《Python爬虫开发 从入门到实战》 (Python Crawler Development: From Beginner to Practice).
Stars: ✭ 226 (+707.14%)
Mutual labels:  scrapy
scrapy helper
Dynamically configurable crawler
Stars: ✭ 84 (+200%)
Mutual labels:  scrapy
Ruiji.net
crawler framework, distributed crawler extractor
Stars: ✭ 220 (+685.71%)
Mutual labels:  scrapy
domains
World’s single largest Internet domains dataset
Stars: ✭ 461 (+1546.43%)
Mutual labels:  scrapy
crawler
A collection of Python crawler projects
Stars: ✭ 29 (+3.57%)
Mutual labels:  scrapy
Web-Iota
Iota is a web scraper which can find all of the images and links/suburls on a webpage
Stars: ✭ 60 (+114.29%)
Mutual labels:  scrapy
arche
Analyze scraped data
Stars: ✭ 49 (+75%)
Mutual labels:  scrapy

A project for crawling data from lazada, websosanh, compare.vn, cdiscount and cungmua, with several convenience wrappers:


1. Clean Scrapy project structure with items and pipelines
2. Automatic proxy rotation
3. Simple to run: no need to remember the Scrapy command
4. Flexible config: the crawler extracts data using the patterns in template/product.yml
5. Saves data to a database: MongoDB or Elasticsearch (es)
6. Uses pybloom to check for duplicate crawled data while crawling
7. Stops after a time limit
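
The duplicate check in item 6 can be illustrated with a small sketch. This is not the project's code: it swaps pybloom for a tiny hand-written Bloom filter so it runs with the standard library alone, and the names `SimpleBloomFilter`, `DedupPipeline`, `DuplicateItem` and the `url` item field are all assumptions made for illustration. In a real Scrapy pipeline the exception would typically be `scrapy.exceptions.DropItem`.

```python
import hashlib


class SimpleBloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class DuplicateItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem so the sketch needs no Scrapy."""


class DedupPipeline:
    """Hypothetical pipeline that rejects items whose URL was already seen."""

    def __init__(self):
        self.seen = SimpleBloomFilter()

    def process_item(self, item, spider=None):
        url = item["url"]  # assumed field name, not taken from the project
        if url in self.seen:
            raise DuplicateItem(f"already crawled: {url}")
        self.seen.add(url)
        return item
```

A Bloom filter keeps memory use fixed regardless of how many URLs are crawled, at the cost of occasionally treating a new URL as a duplicate; the filter size and hash count above are arbitrary illustrative values.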

Install the dependencies listed in requirements.txt:

$ pip install -r requirements.txt

Then start the crawler:

$ python app.py
