
cloverzrg / photo-spider-scrapy

Licence: other
10 photo website spiders: Scrapy spider code for 10 foreign stock-photo sites

Programming Languages

python

Projects that are alternatives to or similar to photo-spider-scrapy

163Music
163music spider by scrapy.
Stars: ✭ 60 (+252.94%)
Mutual labels:  spider, scrapy
Scrapy IPProxyPool
Free IP proxy pool; a plugin for the Scrapy crawler framework.
Stars: ✭ 100 (+488.24%)
Mutual labels:  spider, scrapy
Gerapy
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
Stars: ✭ 2,601 (+15200%)
Mutual labels:  spider, scrapy
Scrapydweb
Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.
Stars: ✭ 2,385 (+13929.41%)
Mutual labels:  spider, scrapy
small-spider-project
Everyday crawlers.
Stars: ✭ 14 (-17.65%)
Mutual labels:  spider, scrapy
Goribot
[Crawler/Scraper for Golang]🕷 A lightweight, distributed-friendly Golang crawler framework.
Stars: ✭ 190 (+1017.65%)
Mutual labels:  spider, scrapy
Spider job
Data crawler for job-listing websites.
Stars: ✭ 234 (+1276.47%)
Mutual labels:  spider, scrapy
Python3 Spider
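Python crawler practice: simulated login to major websites, including but not limited to slider CAPTCHAs, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please give it a star ❤️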
Stars: ✭ 2,129 (+12423.53%)
Mutual labels:  spider, scrapy
elves
🎊 Design and implementation of a lightweight crawler framework.
Stars: ✭ 322 (+1794.12%)
Mutual labels:  spider, scrapy
Web-Iota
Iota is a web scraper which can find all of the images and links/suburls on a webpage
Stars: ✭ 60 (+252.94%)
Mutual labels:  spider, scrapy
Marmot
💐Marmot | Web Crawler/HTTP protocol Download Package 🐭
Stars: ✭ 186 (+994.12%)
Mutual labels:  spider, scrapy
NScrapy
NScrapy is a .NET Core cross-platform distributed spider framework that provides an easy way to write your own spiders.
Stars: ✭ 88 (+417.65%)
Mutual labels:  spider, scrapy
Scrapingoutsourcing
ScrapingOutsourcing focuses on sharing crawler code, with the aim of adding one new crawler each week.
Stars: ✭ 164 (+864.71%)
Mutual labels:  spider, scrapy
Py Elasticsearch Django
A search engine at the scale of tens of millions of records, built in Python.
Stars: ✭ 207 (+1117.65%)
Mutual labels:  spider, scrapy
Fp Server
Free proxy server, continuously crawling and providing proxies, based on Tornado and Scrapy. Build your own proxy pool locally.
Stars: ✭ 154 (+805.88%)
Mutual labels:  spider, scrapy
Spiderkeeper
Admin UI for Scrapy / open-source Scrapinghub.
Stars: ✭ 2,562 (+14970.59%)
Mutual labels:  spider, scrapy
Taobaoscrapy
😩 Tool for Taobao/Tmall | A childhood toy that is now outdated.
Stars: ✭ 146 (+758.82%)
Mutual labels:  spider, scrapy
Awesome Web Scraper
A collection of awesome web scrapers and crawlers.
Stars: ✭ 147 (+764.71%)
Mutual labels:  spider, scrapy
scrapy helper
Dynamically configurable crawler.
Stars: ✭ 84 (+394.12%)
Mutual labels:  spider, scrapy
instant-images
Instantly upload photos from Unsplash, Pixabay and Pexels to your website without leaving WordPress.
Stars: ✭ 26 (+52.94%)
Mutual labels:  unsplash, pexels

photo-spider-scrapy

Scrapy spiders for several stock-photo websites.

Passing parameters to a callback in Scrapy

yield Request(url=origin_photo, callback=lambda
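The line above is cut off in this copy, but it passes a lambda as the callback, presumably so that data from the current page travels along with the request. One pitfall with that pattern is Python's late binding of closures: a lambda created inside a loop looks up the loop variable when it is called, not when it is created, and Scrapy usually runs callbacks after the generator that yielded the request has moved on. The sketch below reproduces the effect in plain Python (no Scrapy required; all names are hypothetical) and shows the usual fix of binding the value through a default argument.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]
Callback = Callable[[str], Item]


def callbacks_late_binding(items: List[Item]) -> List[Callback]:
    """Pitfall: every lambda closes over the same loop variable `item`."""
    callbacks: List[Callback] = []
    for item in items:
        # `item` is looked up when the lambda runs, i.e. after the loop ends.
        callbacks.append(lambda response: {**item, "response": response})
    return callbacks


def callbacks_bound(items: List[Item]) -> List[Callback]:
    """Fix: bind the current item through a default argument."""
    callbacks: List[Callback] = []
    for item in items:
        # Default values are evaluated once, when each lambda is created.
        callbacks.append(lambda response, item=item: {**item, "response": response})
    return callbacks


items = [{"photo_url": "a.jpg"}, {"photo_url": "b.jpg"}]
print([cb("resp")["photo_url"] for cb in callbacks_late_binding(items)])  # ['b.jpg', 'b.jpg']
print([cb("resp")["photo_url"] for cb in callbacks_bound(items)])  # ['a.jpg', 'b.jpg']
```

In Scrapy 1.7 and later, Request also accepts a cb_kwargs dict whose entries are passed to the callback as keyword arguments, for example Request(url=origin_photo, callback=self.parse_photo, cb_kwargs={"item": item}) with a hypothetical parse_photo(self, response, item) method. That avoids the lambda altogether, and because the callback is then a spider method, the request can also be serialized when a crawl is paused and resumed with JOBDIR.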

Create a project

scrapy startproject project_name

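Run a spider (from inside the project directory; scrapy crawl takes the spider's name attribute, not the project name)

scrapy crawl spider_name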

Set up the User-Agent

https://blog.jeongen.com/python-scrapy-she-zhi-useragent/
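The linked post is not reproduced here. Scrapy's built-in way to send one fixed User-Agent is the USER_AGENT setting in settings.py. To rotate between several, a small downloader middleware is enough; the sketch below assumes a hypothetical module project_name/middlewares.py and a made-up pool of UA strings. It does not import Scrapy, because Scrapy only needs the class to expose process_request(request, spider), and a middleware without a from_crawler classmethod is instantiated with no arguments, so the defaults below are what it would use.

```python
import random
from typing import Optional, Sequence

# Hypothetical pool of browser User-Agent strings; replace with your own.
USER_AGENTS: Sequence[str] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
)


class RandomUserAgentMiddleware:
    """Downloader middleware that picks a User-Agent for every outgoing request."""

    def __init__(self, user_agents: Sequence[str] = USER_AGENTS,
                 rng: Optional[random.Random] = None) -> None:
        if not user_agents:
            raise ValueError("user_agents must not be empty")
        self.user_agents = list(user_agents)
        # A seedable RNG makes the choice reproducible in tests.
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent the request currently carries.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None lets Scrapy continue handling the request normally.
        return None


# Registration in settings.py (project_name is a placeholder module path):
# DOWNLOADER_MIDDLEWARES = {
#     "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
#     "project_name.middlewares.RandomUserAgentMiddleware": 400,
# }
```

Mapping the built-in UserAgentMiddleware to None disables it, so only the custom middleware decides the header.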

MySQL

/*
Navicat MySQL Data Transfer

Source Server         : localhost_3306
Source Server Version : 50505
Source Host           : localhost:3306
Source Database       : photo

Target Server Type    : MYSQL
Target Server Version : 50505
File Encoding         : 65001

Date: 2017-08-09 17:24:53
*/

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for magdeleine
-- ----------------------------
DROP TABLE IF EXISTS `magdeleine`;
CREATE TABLE `magdeleine` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `page_url` varchar(255) NOT NULL,
  `photo_url` varchar(255) NOT NULL,
  `resolution` char(20) DEFAULT NULL,
  `category` varchar(255) DEFAULT NULL,
  `tags` varchar(255) DEFAULT NULL,
  `chinese_category` varchar(255) DEFAULT NULL,
  `chinese_tags` varchar(255) DEFAULT NULL,
  `created_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1226 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for pexels
-- ----------------------------
DROP TABLE IF EXISTS `pexels`;
CREATE TABLE `pexels` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `url` varchar(500) DEFAULT NULL,
  `size` varchar(255) DEFAULT NULL,
  `resolution` varchar(255) DEFAULT NULL,
  `tags` varchar(255) DEFAULT NULL,
  `type_id` tinyint(1) DEFAULT '0',
  `chinese_tags` varchar(255) DEFAULT NULL,
  `thumb_name` varchar(255) DEFAULT NULL,
  `thumb_name2` varchar(255) DEFAULT NULL,
  `is_posted` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=37971 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for photock
-- ----------------------------
DROP TABLE IF EXISTS `photock`;
CREATE TABLE `photock` (
  `url` varchar(255) NOT NULL,
  `tags` varchar(255) DEFAULT NULL,
  `title` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for stocksnap
-- ----------------------------
DROP TABLE IF EXISTS `stocksnap`;
CREATE TABLE `stocksnap` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `img_id` char(20) DEFAULT NULL,
  `width` int(11) DEFAULT NULL,
  `height` int(11) DEFAULT NULL,
  `chinese_tags` varchar(255) DEFAULT NULL,
  `type_id` tinyint(1) DEFAULT NULL,
  `tags` varchar(255) DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `file_size` char(10) DEFAULT NULL,
  `thumb_name` varchar(255) DEFAULT NULL,
  `thumb_name2` varchar(255) DEFAULT NULL,
  `posted` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`img_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=81955 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for stockvault
-- ----------------------------
DROP TABLE IF EXISTS `stockvault`;
CREATE TABLE `stockvault` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `file_name` varchar(255) DEFAULT NULL,
  `title` varchar(255) DEFAULT NULL,
  `tags` varchar(255) DEFAULT NULL,
  `file_size` varchar(20) DEFAULT NULL,
  `resolution` varchar(255) DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `chinese_tags` varchar(255) DEFAULT NULL,
  `is_posted` tinyint(1) DEFAULT '0',
  `thumb_name` varchar(255) DEFAULT NULL,
  `type_id` tinyint(1) DEFAULT NULL,
  `chinese_title` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=237592 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Function structure for LastIndexOf
-- ----------------------------
DROP FUNCTION IF EXISTS `LastIndexOf`;
DELIMITER ;;
CREATE DEFINER=`root`@`localhost` FUNCTION `LastIndexOf`(`str` varchar(255),`mysubstr` varchar(255)) RETURNS int(11)
BEGIN
	# Returns the 1-based position where the last occurrence of mysubstr starts in str, or 0 if it does not occur.
	DECLARE pos int(11);
	DECLARE re_pos int(11);
	set re_pos = INSTR(REVERSE(str), REVERSE(mysubstr));
	if re_pos = 0 THEN
		RETURN 0;
	end if;
	set pos = CHAR_LENGTH(str) - re_pos - CHAR_LENGTH(mysubstr) + 2;
	RETURN pos;
END
;;
DELIMITER ;
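LastIndexOf works around the fact that INSTR only finds the first match: it reverses both strings, so the first match in the reversed text is the last match in the original. If the reversed substring starts at 1-based position r, and n and m are the lengths of str and mysubstr, the original match ends at n - r + 1 and therefore starts at n - r - m + 2. The Python sketch below (function names are hypothetical) implements that reversal formula next to the idiomatic str.rfind version, so the two can be checked against each other.

```python
def last_index_of(s: str, sub: str) -> int:
    """Python counterpart of LastIndexOf using str.rfind.

    Returns the 1-based character position where the last occurrence of
    `sub` starts in `s`, or 0 if `sub` does not occur.
    """
    # rfind gives a 0-based index, or -1 when there is no match;
    # adding 1 converts to 1-based positions and maps "not found" to 0.
    return s.rfind(sub) + 1


def last_index_of_reversed(s: str, sub: str) -> int:
    """Same result computed the way the SQL body does it."""
    # Like INSTR: 1-based position in the reversed string, 0 when absent.
    r = s[::-1].find(sub[::-1]) + 1
    if r == 0:
        return 0
    return len(s) - r - len(sub) + 2


print(last_index_of("photos/2017/a.jpg", "/"))  # 12
print(last_index_of_reversed("abcabc", "bc"))  # 5
print(last_index_of("abc", "x"))  # 0
```

If the function is only used to cut the file name off the end of a URL (an assumption; the dump does not show its callers), MySQL's built-in SUBSTRING_INDEX(photo_url, '/', -1) returns everything after the last '/' without needing a custom function.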
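The dump defines the tables but not the code that fills them. As a hedged sketch (not this project's pipeline, which is not shown here), the function below builds a parameterized INSERT for the stocksnap table. It uses INSERT IGNORE so that re-crawling a photo whose img_id is already stored is skipped, relying on the table's UNIQUE KEY on img_id. The chosen column subset and the function name are assumptions; %s is the placeholder style used by the PyMySQL and mysqlclient drivers.

```python
from typing import Any, Dict, Sequence, Tuple

# Columns of `stocksnap` that a spider would plausibly fill in (an assumption);
# `id`, `created_at` and `posted` are left to their column defaults.
STOCKSNAP_COLUMNS: Sequence[str] = ("img_id", "width", "height", "tags", "file_size")


def build_stocksnap_insert(item: Dict[str, Any]) -> Tuple[str, Tuple[Any, ...]]:
    """Build a parameterized INSERT IGNORE for one scraped photo.

    Only keys present in `item` are inserted, in STOCKSNAP_COLUMNS order.
    Values are never formatted into the SQL text; they are returned
    separately so the database driver can escape them.
    """
    columns = [c for c in STOCKSNAP_COLUMNS if c in item]
    if "img_id" not in columns:
        raise ValueError("item must contain img_id (the table's unique key)")
    column_sql = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT IGNORE INTO `stocksnap` ({column_sql}) VALUES ({placeholders})"
    return sql, tuple(item[c] for c in columns)


sql, params = build_stocksnap_insert({"img_id": "ABC123", "width": 6000, "height": 4000})
print(sql)
# INSERT IGNORE INTO `stocksnap` (`img_id`, `width`, `height`) VALUES (%s, %s, %s)
print(params)  # ('ABC123', 6000, 4000)
```

Inside a Scrapy item pipeline's process_item(self, item, spider), the pair would be passed to cursor.execute(sql, params) on a DB-API connection, followed by connection.commit(), and the item returned so later pipelines still receive it. If existing rows should be refreshed rather than skipped, INSERT ... ON DUPLICATE KEY UPDATE is the usual alternative to INSERT IGNORE.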