Gerapy / Gerapy

Licence: MIT
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Gerapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js.

Documentation

Documentation is available online at https://docs.gerapy.com/ and https://github.com/Gerapy/Docs.

Support

Gerapy is developed for Python 3.x. Python 2.x may be supported later.

Usage

Install Gerapy with pip:

pip3 install gerapy

After a successful installation, the gerapy command is available. If it is not, check the installation. Then follow the steps below to run the Gerapy server.

First, initialize the workspace:

gerapy init

This creates a folder named gerapy. You can also choose the workspace name yourself:

gerapy init <workspace>

Then cd into this folder and initialize the database:

cd gerapy
gerapy migrate

Next, create a superuser:

gerapy createsuperuser

Then start the server:

gerapy runserver

You can now visit http://localhost:8000 to use Gerapy. The admin management backend is available at http://localhost:8000/admin.
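The setup steps above can also be scripted. The following is a minimal Python sketch, not part of Gerapy itself: the helper names setup_commands and run_setup are hypothetical, and it assumes pip3 and gerapy are on the PATH. It uses only the commands listed above.

```python
import subprocess


def setup_commands(workspace="gerapy"):
    """Return the documented setup steps as (argv, working_directory) pairs.

    A working directory of None means "run in the current directory".
    """
    return [
        (["pip3", "install", "gerapy"], None),
        (["gerapy", "init", workspace], None),
        # The remaining commands must run inside the workspace folder.
        (["gerapy", "migrate"], workspace),
        (["gerapy", "createsuperuser"], workspace),  # expected to prompt for input
        (["gerapy", "runserver"], workspace),        # keeps running until stopped
    ]


def run_setup(workspace="gerapy"):
    """Run each step in order, stopping at the first command that fails."""
    for argv, cwd in setup_commands(workspace):
        print("$", " ".join(argv))
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling run_setup("myworkspace") would install Gerapy, create the workspace, and leave the server running in the foreground. Nothing runs until run_setup is called explicitly.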

To make Gerapy reachable from other machines, run:

gerapy runserver 0.0.0.0:8000

Gerapy then listens on all network interfaces on port 8000.

In Gerapy, you can create a configurable project, configure it, and generate the Scrapy code automatically. This module is still unstable, and we are working to refine it.

You can also drag an existing Scrapy project into the projects folder. After you refresh the web page, the project appears in the Project Index Page as un-configurable, but you can still edit it through the web page.

For deployment, go to the Deploy Page. First build your project and add a client in the Client Index Page. Then you can deploy the project by clicking a button.
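Before adding a client, it can help to confirm that the machine is reachable. The sketch below assumes the client is a Scrapyd instance listening on Scrapyd's default port 6800 and uses Scrapyd's daemonstatus.json endpoint. The function names are hypothetical and the code is independent of Gerapy.

```python
import json
import urllib.request


def daemonstatus_url(host, port=6800):
    """Build the URL of Scrapyd's daemonstatus.json endpoint."""
    return f"http://{host}:{port}/daemonstatus.json"


def is_status_ok(body):
    """Return True if a daemonstatus.json response body reports status 'ok'."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False


def scrapyd_is_up(host, port=6800, timeout=5):
    """Query Scrapyd and report whether it answered with status 'ok'."""
    try:
        with urllib.request.urlopen(daemonstatus_url(host, port), timeout=timeout) as resp:
            return is_status_ok(resp.read())
    except OSError:  # covers URLError, refused connections, and timeouts
        return False
```

For example, scrapyd_is_up("192.168.1.10") returns False when nothing answers at that address, which suggests the client would also be unreachable from Gerapy. The IP address here is only a placeholder.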

After deployment, you can manage jobs in the Monitor Page.
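Job state can also be inspected outside the web interface. This sketch assumes the client is a Scrapyd instance on its default port 6800 and relies on Scrapyd's listjobs.json endpoint, which groups jobs into pending, running, and finished. The helper names and the project name "demo" are hypothetical, and this does not describe how the Monitor Page itself is implemented.

```python
import json
import urllib.parse


def listjobs_url(host, project, port=6800):
    """Build the URL of Scrapyd's listjobs.json endpoint for one project."""
    query = urllib.parse.urlencode({"project": project})
    return f"http://{host}:{port}/listjobs.json?{query}"


def count_jobs(body):
    """Count jobs per state in a listjobs.json response body."""
    data = json.loads(body)
    return {
        state: len(data.get(state, []))
        for state in ("pending", "running", "finished")
    }
```

Fetching listjobs_url("localhost", "demo") with any HTTP client and passing the response body to count_jobs gives a per-state summary for that project.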

Docker

Run this command:

docker run -d -v ~/gerapy:/app/gerapy -p 8000:8000 germey/gerapy

Gerapy then runs on port 8000. You can log in with the temporary admin account (username: admin, password: admin). Change the password afterwards for safety.

Command Usage:

docker run -d -v <workspace>:/app/gerapy -p <public_port>:<container_port> germey/gerapy

Use -v <workspace>:/app/gerapy to mount your Gerapy workspace into the container, and -p <public_port>:<container_port> to set the server port.
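To show how the two placeholders fit into the command, here is a small Python sketch that assembles the argument list. The function name docker_run_args is hypothetical, the defaults mirror the example command above, and POSIX-style paths are assumed.

```python
import os


def docker_run_args(workspace="~/gerapy", public_port=8000, container_port=8000):
    """Build the documented `docker run` command as an argument list."""
    # Expand "~" and make the path absolute so the volume mount is unambiguous.
    host_dir = os.path.abspath(os.path.expanduser(workspace))
    return [
        "docker", "run", "-d",
        "-v", f"{host_dir}:/app/gerapy",
        "-p", f"{public_port}:{container_port}",
        "germey/gerapy",
    ]
```

Passing the result to subprocess.run(docker_run_args("/srv/gerapy", public_port=8080), check=True) would start the container with the workspace at /srv/gerapy and the server published on host port 8080.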

If you run Gerapy with Docker, you can open the Gerapy website, for example at http://localhost:8000, right away. No further initialization is needed.

TodoList

  • Add Visual Configuration of Spider with Previewing Website
  • Add Scrapyd Auth Management
  • Add Gerapy Auth Management
  • Add Timed Task Scheduler
  • Add Visual Configuration of Scrapy
  • Add Intelligent Analysis of Web Page

Communication

If you have any questions or ideas, you can open Issues or Pull Requests. Your suggestions are really important to us. Thanks for your contribution.
