
lorien / Grab

License: MIT
Web Scraping Framework

Programming Languages

Python
139335 projects - #7 most used programming language
HTML
75241 projects
Makefile
30231 projects

Projects that are alternatives to or similar to Grab

Sylar
A high-performance C++ distributed server framework: web server, WebSocket server, and custom TCP server (includes modules for logging, configuration, threads, coroutines, coroutine scheduling, IO coroutine scheduling, hooks, sockets, bytearray serialization, HTTP, TcpServer, WebSocket, HTTPS, and SMTP mail, plus MySQL, SQLite3, ORM, Redis, and Zookeeper support)
Stars: ✭ 895 (-58.31%)
Mutual labels:  framework, network, http-client
Php Curl Class
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Stars: ✭ 2,903 (+35.21%)
Mutual labels:  web-scraping, framework, http-client
Libmtev
Mount Everest Application Framework
Stars: ✭ 104 (-95.16%)
Mutual labels:  framework, network
Tomorrowland
Lightweight Promises for Swift & Obj-C
Stars: ✭ 106 (-95.06%)
Mutual labels:  asynchronous, framework
Qtnetworkng
QtNetwork Next Generation. A coroutine-based network framework for Qt/C++, with a simpler API than boost::asio.
Stars: ✭ 125 (-94.18%)
Mutual labels:  network, http-client
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-96.83%)
Mutual labels:  spider, web-scraping
Rocket.jl
Functional reactive programming extensions library for Julia
Stars: ✭ 69 (-96.79%)
Mutual labels:  asynchronous, framework
Netclient Ios
Versatile HTTP Networking in Swift
Stars: ✭ 117 (-94.55%)
Mutual labels:  asynchronous, framework
Vibe Core
Repository for the next generation of vibe.d's core package.
Stars: ✭ 56 (-97.39%)
Mutual labels:  asynchronous, network
Zhihuquestionsspider
😊😊😊 A spider for Zhihu questions
Stars: ✭ 152 (-92.92%)
Mutual labels:  spider, http-client
Ecs
ECS for Unity with full game state automatic rollbacks
Stars: ✭ 151 (-92.97%)
Mutual labels:  framework, network
Curlsharp
CurlSharp - .Net binding and object-oriented wrapper for libcurl.
Stars: ✭ 153 (-92.87%)
Mutual labels:  network, http-client
Abotx
Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.
Stars: ✭ 63 (-97.07%)
Mutual labels:  spider, framework
Collie
An asynchronous event-driven network framework (a port of Netty) written in D.
Stars: ✭ 60 (-97.21%)
Mutual labels:  asynchronous, network
Ant nest
Simple, clear and fast web crawler framework built on Python 3.6+, powered by asyncio.
Stars: ✭ 90 (-95.81%)
Mutual labels:  spider, framework
Danf
Danf is a Node.js full-stack isomorphic OOP framework that lets you code the same way on both the client and server sides. It helps you build deep architectures and handle asynchronous flows in order to produce scalable, maintainable, testable, and performant applications.
Stars: ✭ 58 (-97.3%)
Mutual labels:  asynchronous, framework
Drone
CLI utility for Drone, an Embedded Operating System.
Stars: ✭ 114 (-94.69%)
Mutual labels:  asynchronous, framework
Libhv
🔥 A C/C++ network library, easier to use than libevent or libuv, for developing TCP/UDP/SSL/HTTP/WebSocket clients and servers.
Stars: ✭ 3,355 (+56.26%)
Mutual labels:  network, http-client
Pnet
High level Java network library
Stars: ✭ 49 (-97.72%)
Mutual labels:  framework, network
Hreq
A type-dependent, high-level HTTP client library inspired by servant-client.
Stars: ✭ 53 (-97.53%)
Mutual labels:  network, http-client

Grab Framework Documentation


Installation

    $ pip install -U grab

See details about installing Grab on different platforms here: http://docs.grablib.org/en/latest/usage/installation.html

Support

Documentation: https://grablab.org/docs/

Russian telegram chat: https://t.me/grablab_ru

English telegram chat: https://t.me/grablab

To report a bug, please use the GitHub issue tracker: https://github.com/lorien/grab/issues

What is Grab?

Grab is a Python web scraping framework. It provides a number of helpful methods to perform network requests, scrape websites, and process the scraped content:

  • Automatic cookies (session) support
  • HTTP and SOCKS proxy with/without authorization
  • Keep-Alive support
  • IDN support
  • Tools to work with web forms
  • Easy multipart file uploading
  • Flexible customization of HTTP requests
  • Automatic charset detection
  • Powerful API to extract data from the DOM tree of HTML documents with XPath queries
  • Asynchronous API to make thousands of simultaneous queries. This part of the library is called Spider; see the list of Spider features below.
  • Python 3 ready
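The XPath-style extraction mentioned above can be illustrated with the standard library alone. This is only a sketch on a small well-formed fragment using xml.etree.ElementTree's limited XPath subset; Grab's own selectors support full XPath on real-world HTML:

```python
# A minimal sketch of XPath-style extraction using only the standard
# library; Grab's doc selectors offer far richer XPath support.
from xml.etree import ElementTree

html = """
<ul id="repos">
  <li><a href="/lorien/grab">grab</a></li>
  <li><a href="/lorien/weblib">weblib</a></li>
</ul>
"""

tree = ElementTree.fromstring(html)
# ElementTree supports a limited XPath subset, e.g. .//a
names = [a.text for a in tree.findall('.//a')]
print(names)  # ['grab', 'weblib']
```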

Spider is a framework for writing website scrapers. Features:

  • Rules and conventions to organize the request/parse logic in separate blocks of code
  • Multiple parallel network requests
  • Automatic processing of network errors (failed tasks go back to the task queue)
  • You can create network requests and parse responses with the Grab API (see above)
  • HTTP proxy support
  • Caching network results in permanent storage
  • Different backends for the task queue (in-memory, Redis, MongoDB)
  • Tools to debug and collect statistics
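The failed-task convention above (a task that hits a network error goes back to the queue) can be sketched in plain Python. This is an illustration of the pattern only, not Spider's actual implementation, and all names here are made up:

```python
# Illustrative sketch of the "failed tasks go back to the queue"
# convention; not Spider's real code.
from collections import deque

MAX_TRIES = 3

def process(task):
    # Stand-in for a network request: fail on the first try of task 'b'.
    if task['name'] == 'b' and task['tries'] == 0:
        raise IOError('network error')
    return 'ok:%s' % task['name']

queue = deque({'name': n, 'tries': 0} for n in 'ab')
results = []
while queue:
    task = queue.popleft()
    try:
        results.append(process(task))
    except IOError:
        task['tries'] += 1
        if task['tries'] < MAX_TRIES:
            queue.append(task)  # requeue the failed task

print(results)  # ['ok:a', 'ok:b']
```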

Grab Example

    import logging

    from grab import Grab

    # Enable verbose logging of network activity
    logging.basicConfig(level=logging.DEBUG)

    g = Grab()

    # Log in to GitHub by filling in and submitting the login form
    g.go('https://github.com/login')
    g.doc.set_input('login', '****')
    g.doc.set_input('password', '****')
    g.doc.submit()

    # Save the response body for debugging
    g.doc.save('/tmp/x.html')

    # Assert that the sign-out button exists, i.e. the login succeeded
    g.doc('//ul[@id="user-links"]//button[contains(@class, "signout")]').assert_exists()

    home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()
    repo_url = home_url + '?tab=repositories'

    g.go(repo_url)

    # Print each repository name with its absolute URL
    for elem in g.doc.select('//h3[@class="repo-list-name"]/a'):
        print('%s: %s' % (elem.text(),
                          g.make_url_absolute(elem.attr('href'))))
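The make_url_absolute() call in the example resolves relative hrefs against the current document URL. The standard library's urljoin performs the same kind of resolution, which this small snippet demonstrates:

```python
# Resolving relative links against a base URL, the same job that
# make_url_absolute() does inside the example above.
from urllib.parse import urljoin

base = 'https://github.com/lorien'
print(urljoin(base, '/lorien/grab'))       # https://github.com/lorien/grab
print(urljoin(base, '?tab=repositories'))  # https://github.com/lorien?tab=repositories
```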

Grab::Spider Example

    import logging

    from grab.spider import Spider, Task

    logging.basicConfig(level=logging.DEBUG)


    class ExampleSpider(Spider):
        def task_generator(self):
            # Seed the task queue with one search task per language
            for lang in 'python', 'ruby', 'perl':
                url = 'https://www.google.com/search?q=%s' % lang
                yield Task('search', url=url, lang=lang)

        def task_search(self, grab, task):
            # Called for each completed 'search' task
            print('%s: %s' % (task.lang,
                              grab.doc('//div[@class="s"]//cite').text()))


    bot = ExampleSpider(thread_number=2)
    bot.run()
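The thread_number argument above controls how many workers drain the task queue in parallel. The underlying pattern can be sketched with the standard library; this is an illustration only, with a stubbed-out request, not Spider's internals:

```python
# Illustrative sketch of a fixed-size worker pool draining a task
# queue, the pattern behind Spider's thread_number option.
import queue
import threading

tasks = queue.Queue()
for lang in ('python', 'ruby', 'perl'):
    tasks.put(lang)

results = []
lock = threading.Lock()

def worker():
    while True:
        try:
            lang = tasks.get_nowait()
        except queue.Empty:
            return
        # Stand-in for the real network request and parsing.
        with lock:
            results.append('searched:%s' % lang)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # ['searched:perl', 'searched:python', 'searched:ruby']
```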

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].