sachaarbonel / scrapy.dart

Licence: MIT license
Scrapy, a fast, high-level web crawling and scraping framework for Dart and Flutter


Projects that are alternatives to or similar to scrapy.dart

aioScrapy
An asynchronous coroutine-based crawler framework built on asyncio and aiohttp. Stars welcome.
Stars: ✭ 34 (-32%)
Mutual labels:  scrapy
InstaBot
Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (-36%)
Mutual labels:  scrapy
163Music
163music spider by scrapy.
Stars: ✭ 60 (+20%)
Mutual labels:  scrapy
scrapy spider
No description or website provided.
Stars: ✭ 58 (+16%)
Mutual labels:  scrapy
NScrapy
NScrapy is a .NET Core cross-platform distributed spider framework that provides an easy way to write your own spider
Stars: ✭ 88 (+76%)
Mutual labels:  scrapy
scrapy-boilerplate
Scrapy project boilerplate done right
Stars: ✭ 30 (-40%)
Mutual labels:  scrapy
ancient chinese
Classical Chinese (wenyanwen) dictionary: crawls a Classical Chinese dictionary website to build a Kindle dictionary.
Stars: ✭ 48 (-4%)
Mutual labels:  scrapy
Scrapy IPProxyPool
Free IP proxy pool. A plugin for the Scrapy crawler framework.
Stars: ✭ 100 (+100%)
Mutual labels:  scrapy
devsearch
A web search engine built with Python which uses TF-IDF and PageRank to sort search results.
Stars: ✭ 52 (+4%)
Mutual labels:  scrapy
python demo
Some simple, fun little Python demos
Stars: ✭ 109 (+118%)
Mutual labels:  scrapy
ScrapyProject
Scrapy project (MySQL + MongoDB, Douban Top 250 movies)
Stars: ✭ 18 (-64%)
Mutual labels:  scrapy
python-crawler
A repository for learning web crawling, suitable for complete beginners and friendly to newcomers
Stars: ✭ 37 (-26%)
Mutual labels:  scrapy
animecenter
The source code for animecenter
Stars: ✭ 16 (-68%)
Mutual labels:  scrapy
project pjx
Building a search engine with a distributed Python crawler
Stars: ✭ 42 (-16%)
Mutual labels:  scrapy
scrapy-cookies
A middleware of cookies persistence for Scrapy
Stars: ✭ 19 (-62%)
Mutual labels:  scrapy
JD Spider
👍 JD.com crawler (heavily commented, very friendly to people just starting out with crawlers)
Stars: ✭ 56 (+12%)
Mutual labels:  scrapy
invana-bot
A Web Crawler that scrapes using YAML and python code.
Stars: ✭ 30 (-40%)
Mutual labels:  scrapy
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into MongoDB (NoSQL).
Stars: ✭ 15 (-70%)
Mutual labels:  scrapy
elves
🎊 Design and implementation of a lightweight crawler framework.
Stars: ✭ 322 (+544%)
Mutual labels:  scrapy
torchestrator
Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (-36%)
Mutual labels:  scrapy

scrapy

Scrapy, a fast, high-level web crawling and scraping framework for Dart and Flutter

Getting started

import 'dart:convert'; // provides json.decode, used by the fromJson factories

import 'package:scrapy/scrapy.dart';
import 'package:html/parser.dart' as html;
import 'package:http/http.dart';

class Quote extends Item {
  String quote;
  Quote({this.quote});
  @override
  String toString() {
    return "Quote : { quote : $quote }";
  }

  @override
  Map<String, dynamic> toJson() => {
        "quote": quote,
      };
  factory Quote.fromJson(String str) => Quote.fromMap(json.decode(str));
  factory Quote.fromMap(Map<String, dynamic> json) => Quote(
        quote: json["quote"],
      );
}

class Quotes extends Items {
  @override
  final List<Quote> items;
  Quotes({
    this.items,
  });

  factory Quotes.fromJson(String str) => Quotes.fromMap(json.decode(str));
  factory Quotes.fromMap(Map<String, dynamic> json) => Quotes(
        items: json["items"] == null
            ? null
            : List<Quote>.from(json["items"].map((x) => Quote.fromMap(x))),
      );
}

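class BlogSpider extends Spider<Quote, Quotes> {
  // Extracts the raw text of every quote on a fetched page.
  Stream<String> parse(Response response) async* {
    final document = html.parse(response.body);
    // Select the text span that is a direct child of each quote block.
    final nodes = document.querySelectorAll("div.quote > span.text");

    for (var node in nodes) {
      yield node.innerHtml;
    }
  }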

  @override
  Stream<String> Transform(Stream<String> stream) async* {
    await for (String parsed in stream) {
      // Drop the first and last character (the quotation marks around each quote).
      yield parsed.substring(1, parsed.length - 1);
    }
  }

  @override
  Stream<Quote> Save(Stream<String> stream) async* {
    await for (String transformed in stream) {
      final quote = Quote(quote: transformed);
      yield quote;
    }
  }
}

main() async {
  final spider = BlogSpider();
  spider.name = "myspider";
  spider.client = Client();
  spider.startUrls = [
    "http://quotes.toscrape.com/page/7/",
    "http://quotes.toscrape.com/page/8/",
    "http://quotes.toscrape.com/page/9/"
  ];

  final stopw = Stopwatch()..start();

  await spider.startRequests();
  await spider.saveResult();
  final elapsed = stopw.elapsed;

  print("the program took $elapsed"); // e.g. the program took 0:00:00.279733
}
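
The spider above moves data through three stages: `parse` yields raw strings, `Transform` cleans them, and `Save` wraps them in `Quote` items. The sketch below mirrors that shape using only Dart's core `Stream` API, which may help when reasoning about each stage in isolation. It does not use the scrapy package: the `Quote` class here is a simplified stand-in, the names `parseStage`, `transformStage`, `saveStage`, and `runPipeline` are hypothetical, the sample strings are made up, and it assumes a null-safe Dart SDK.

```dart
import 'dart:async';

// Simplified stand-in for the Item subclass above (not the package's class).
class Quote {
  final String quote;
  Quote(this.quote);

  @override
  String toString() => 'Quote : { quote : $quote }';
}

// Stage 1: emit raw strings. Stands in for parse(); no HTTP or HTML involved.
Stream<String> parseStage(List<String> rawNodes) async* {
  for (final node in rawNodes) {
    yield node;
  }
}

// Stage 2: drop the first and last character, as Transform does above.
Stream<String> transformStage(Stream<String> stream) async* {
  await for (final parsed in stream) {
    // Skip inputs too short to trim (a guard added here, not in the original).
    if (parsed.length < 2) continue;
    yield parsed.substring(1, parsed.length - 1);
  }
}

// Stage 3: wrap each cleaned string in an item, as Save does above.
Stream<Quote> saveStage(Stream<String> stream) async* {
  await for (final transformed in stream) {
    yield Quote(transformed);
  }
}

// Chains the three stages and collects the resulting items.
Future<List<Quote>> runPipeline(List<String> rawNodes) =>
    saveStage(transformStage(parseStage(rawNodes))).toList();

Future<void> main() async {
  // Made-up inputs wrapped in curly quotation marks.
  final quotes = await runPipeline(['“Be yourself.”', '“Stay curious.”']);
  for (final quote in quotes) {
    print(quote);
  }
}
```

Because each stage is an `async*` generator, an item can flow through all three stages as soon as it is produced, without waiting for the whole page to be processed first.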

Example

Here is a list view example in Flutter showing the quotes we just scraped and saved to disk.

[Screenshot: screencap.png]
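
As a rough sketch of the data side of such a list view, the function below turns saved JSON into the list of strings a widget could display. It assumes the saved data has the shape `{"items": [{"quote": "..."}]}`, which is inferred from the `Quotes.fromMap` factory above. Where the package writes its output is not stated here, so the sketch takes the JSON as a string. The name `quoteTextsFromJson` and the sample data are hypothetical, and a null-safe Dart SDK is assumed.

```dart
import 'dart:convert';

// Decodes JSON of the assumed shape {"items": [{"quote": "..."}]} into
// display strings. Entries without a string "quote" field are skipped.
List<String> quoteTextsFromJson(String source) {
  final decoded = json.decode(source);
  if (decoded is! Map<String, dynamic>) return const [];
  final items = decoded['items'];
  if (items is! List) return const [];
  return [
    for (final item in items)
      if (item is Map && item['quote'] is String) item['quote'] as String,
  ];
}

void main() {
  // Made-up sample data; a real app would read this string from the saved file.
  const saved = '{"items":[{"quote":"Be yourself."},{"quote":null}]}';
  for (final text in quoteTextsFromJson(saved)) {
    print(text);
  }
}
```

In a Flutter app, the returned list could back a `ListView.builder` that renders one `Text` widget per entry.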

Lightweight dependencies:

  • http

TODOs
