
lorien / Grab

License: MIT
Web Scraping Framework

Programming Languages

Python
139335 projects - #7 most used programming language
HTML
75241 projects
Makefile
30231 projects

Projects that are alternatives to or similar to Grab

Sylar
A high-performance C++ distributed server framework: web server, WebSocket server, and custom TCP server (includes modules for logging, configuration, threads, coroutines, coroutine scheduling, IO coroutine scheduling, hooks, sockets, bytearray serialization, HTTP, TcpServer, WebSocket, HTTPS, and SMTP mail, plus MySQL, SQLite3, ORM, Redis, and Zookeeper support)
Stars: ✭ 895 (-58.31%)
Mutual labels:  framework, network, http-client
Php Curl Class
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Stars: ✭ 2,903 (+35.21%)
Mutual labels:  web-scraping, framework, http-client
Libmtev
Mount Everest Application Framework
Stars: ✭ 104 (-95.16%)
Mutual labels:  framework, network
Tomorrowland
Lightweight Promises for Swift & Obj-C
Stars: ✭ 106 (-95.06%)
Mutual labels:  asynchronous, framework
Qtnetworkng
QtNetwork Next Generation. A coroutine-based network framework for Qt/C++, with a simpler API than boost::asio.
Stars: ✭ 125 (-94.18%)
Mutual labels:  network, http-client
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-96.83%)
Mutual labels:  spider, web-scraping
Rocket.jl
Functional reactive programming extensions library for Julia
Stars: ✭ 69 (-96.79%)
Mutual labels:  asynchronous, framework
Netclient Ios
Versatile HTTP Networking in Swift
Stars: ✭ 117 (-94.55%)
Mutual labels:  asynchronous, framework
Vibe Core
Repository for the next generation of vibe.d's core package.
Stars: ✭ 56 (-97.39%)
Mutual labels:  asynchronous, network
Zhihuquestionsspider
😊😊😊 A spider for Zhihu questions
Stars: ✭ 152 (-92.92%)
Mutual labels:  spider, http-client
Ecs
ECS for Unity with full game state automatic rollbacks
Stars: ✭ 151 (-92.97%)
Mutual labels:  framework, network
Curlsharp
CurlSharp - .Net binding and object-oriented wrapper for libcurl.
Stars: ✭ 153 (-92.87%)
Mutual labels:  network, http-client
Abotx
Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.
Stars: ✭ 63 (-97.07%)
Mutual labels:  spider, framework
Collie
An asynchronous event-driven network framework (a port of Netty) written in D.
Stars: ✭ 60 (-97.21%)
Mutual labels:  asynchronous, network
Ant nest
Simple, clear and fast web crawler framework built on Python 3.6+, powered by asyncio.
Stars: ✭ 90 (-95.81%)
Mutual labels:  spider, framework
Danf
Danf is a Node.js full-stack isomorphic OOP framework that lets you code the same way on both the client and server sides. It helps you build deep architectures and handle asynchronous flows in order to produce scalable, maintainable, testable, and performant applications.
Stars: ✭ 58 (-97.3%)
Mutual labels:  asynchronous, framework
Drone
CLI utility for Drone, an Embedded Operating System.
Stars: ✭ 114 (-94.69%)
Mutual labels:  asynchronous, framework
Libhv
🔥 A C/C++ network library, easier to use than libevent or libuv, for developing TCP/UDP/SSL/HTTP/WebSocket clients and servers.
Stars: ✭ 3,355 (+56.26%)
Mutual labels:  network, http-client
Pnet
High level Java network library
Stars: ✭ 49 (-97.72%)
Mutual labels:  framework, network
Hreq
A type-dependent, high-level HTTP client library inspired by servant-client.
Stars: ✭ 53 (-97.53%)
Mutual labels:  network, http-client

Grab Framework Documentation


Installation

    $ pip install -U grab

See details about installing Grab on different platforms here: http://docs.grablib.org/en/latest/usage/installation.html

Support

Documentation: https://grablab.org/docs/

Russian telegram chat: https://t.me/grablab_ru

English telegram chat: https://t.me/grablab

To report a bug, please use the GitHub issue tracker: https://github.com/lorien/grab/issues

What is Grab?

Grab is a Python web scraping framework. It provides a number of helpful methods to perform network requests, scrape websites, and process the scraped content:

  • Automatic cookies (session) support
  • HTTP and SOCKS proxy with/without authorization
  • Keep-Alive support
  • IDN support
  • Tools to work with web forms
  • Easy multipart file uploading
  • Flexible customization of HTTP requests
  • Automatic charset detection
  • Powerful API to extract data from the DOM tree of HTML documents with XPath queries
  • Asynchronous API to make thousands of simultaneous queries. This part of the library is called Spider; see the list of Spider features below.
  • Python 3 ready
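The XPath-style extraction mentioned above can be illustrated with the standard library alone. This is only a sketch on a small well-formed fragment using xml.etree.ElementTree's limited XPath subset; Grab's own selectors support full XPath on real-world HTML:

```python
# A minimal sketch of XPath-style extraction using only the standard
# library; Grab's doc selectors offer far richer XPath support.
from xml.etree import ElementTree

html = """
<ul id="repos">
  <li><a href="/lorien/grab">grab</a></li>
  <li><a href="/lorien/weblib">weblib</a></li>
</ul>
"""

tree = ElementTree.fromstring(html)
# ElementTree supports a limited XPath subset, e.g. .//a
names = [a.text for a in tree.findall('.//a')]
print(names)  # ['grab', 'weblib']
```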

Spider is a framework for writing website scrapers. Features:

  • Rules and conventions to organize the request/parse logic in separate blocks of code
  • Multiple parallel network requests
  • Automatic processing of network errors (failed tasks go back to the task queue)
  • You can create network requests and parse responses with the Grab API (see above)
  • HTTP proxy support
  • Caching network results in permanent storage
  • Different backends for the task queue (in-memory, Redis, MongoDB)
  • Tools to debug and collect statistics
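The failed-task convention above (a task that hits a network error goes back to the queue) can be sketched in plain Python. This is an illustration of the pattern only, not Spider's actual implementation, and all names here are made up:

```python
# Illustrative sketch of the "failed tasks go back to the queue"
# convention; not Spider's real code.
from collections import deque

MAX_TRIES = 3

def process(task):
    # Stand-in for a network request: fail on the first try of task 'b'.
    if task['name'] == 'b' and task['tries'] == 0:
        raise IOError('network error')
    return 'ok:%s' % task['name']

queue = deque({'name': n, 'tries': 0} for n in 'ab')
results = []
while queue:
    task = queue.popleft()
    try:
        results.append(process(task))
    except IOError:
        task['tries'] += 1
        if task['tries'] < MAX_TRIES:
            queue.append(task)  # requeue the failed task

print(results)  # ['ok:a', 'ok:b']
```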

Grab Example

    import logging

    from grab import Grab

    # Enable verbose logging of network activity
    logging.basicConfig(level=logging.DEBUG)

    g = Grab()

    # Log in to GitHub by filling in and submitting the login form
    g.go('https://github.com/login')
    g.doc.set_input('login', '****')
    g.doc.set_input('password', '****')
    g.doc.submit()

    # Save the response body for debugging
    g.doc.save('/tmp/x.html')

    # Assert that the sign-out button exists, i.e. the login succeeded
    g.doc('//ul[@id="user-links"]//button[contains(@class, "signout")]').assert_exists()

    home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()
    repo_url = home_url + '?tab=repositories'

    g.go(repo_url)

    # Print each repository name with its absolute URL
    for elem in g.doc.select('//h3[@class="repo-list-name"]/a'):
        print('%s: %s' % (elem.text(),
                          g.make_url_absolute(elem.attr('href'))))
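The make_url_absolute() call in the example resolves relative hrefs against the current document URL. The standard library's urljoin performs the same kind of resolution, which this small snippet demonstrates:

```python
# Resolving relative links against a base URL, the same job that
# make_url_absolute() does inside the example above.
from urllib.parse import urljoin

base = 'https://github.com/lorien'
print(urljoin(base, '/lorien/grab'))       # https://github.com/lorien/grab
print(urljoin(base, '?tab=repositories'))  # https://github.com/lorien?tab=repositories
```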

Grab::Spider Example

    import logging

    from grab.spider import Spider, Task

    logging.basicConfig(level=logging.DEBUG)


    class ExampleSpider(Spider):
        def task_generator(self):
            # Seed the task queue with one search task per language
            for lang in 'python', 'ruby', 'perl':
                url = 'https://www.google.com/search?q=%s' % lang
                yield Task('search', url=url, lang=lang)

        def task_search(self, grab, task):
            # Called for each completed 'search' task
            print('%s: %s' % (task.lang,
                              grab.doc('//div[@class="s"]//cite').text()))


    bot = ExampleSpider(thread_number=2)
    bot.run()
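The thread_number argument above controls how many workers drain the task queue in parallel. The underlying pattern can be sketched with the standard library; this is an illustration only, with a stubbed-out request, not Spider's internals:

```python
# Illustrative sketch of a fixed-size worker pool draining a task
# queue, the pattern behind Spider's thread_number option.
import queue
import threading

tasks = queue.Queue()
for lang in ('python', 'ruby', 'perl'):
    tasks.put(lang)

results = []
lock = threading.Lock()

def worker():
    while True:
        try:
            lang = tasks.get_nowait()
        except queue.Empty:
            return
        # Stand-in for the real network request and parsing.
        with lock:
            results.append('searched:%s' % lang)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # ['searched:perl', 'searched:python', 'searched:ruby']
```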

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].