SEO python scraper to extract data from major searchengine result pages. Extract data like url, title, snippet, richsnippet and the type from searchresults for given keywords. Detect Ads or make automated screenshots. You can also fetch text content of urls provided in searchresults or by your own. It's usefull for SEO and business related research tasks.

Stars: ✭ 153 (+628.57%)

Mutual labels: scraper, scraping

TradeTheEvent

Implementation of "Trade the Event: Corporate Events Detection for News-Based Event-Driven Trading." In Findings of ACL2021

Stars: ✭ 64 (+204.76%)

Mutual labels: scraper, scraping-websites

crawler-chrome-extensions

爬虫工程师常用的 Chrome 插件 | Chrome extensions used by crawler developer

Stars: ✭ 53 (+152.38%)

Mutual labels: scraper, scraping

stweet

Advanced python library to scrap Twitter (tweets, users) from unofficial API

Stars: ✭ 287 (+1266.67%)

Mutual labels: scraper, scrap

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (+152.38%)

Mutual labels: scraper, scraping

View All Similar Projects ➔

Scrapman

Ski-bi dibby dib yo da dub dub
Yo da dub dub
Ski-bi dibby dib yo da dub dub
Yo da dub dub

I'm the Scrapman!

THE FASTEST SCRAPPER EVER... AND IT SUPPORTS PARALLEL REQUESTS (arguably)

Scrapman is a blazingly fast real (with Javascript executed) HTML scrapper, built from the ground up to support parallel fetches, with this you can get the HTML code for 50+ URLs in seconds (~30 seconds).

On NodeJS you can easily use request to fetch the HTML from a page, but what if the page you are trying to load is NOT a static HTML page, but it has dynamic content added with Javascript? What do you do then? Well, you use The Scrapman.

It uses Electron to dynamically load web pages into several <webview> within a single Chromium instance. This is why it fetches the HTML exactly as you would see it if you inspect the page with DevTools.

This is NOT an browser automation tool (yet), it's a node module that gives you the processed HTML from an URL, it focuses on multiple parallel operations and speed.

USAGE

1.- Install it

npm install scrapman -S

2.- Require it

var scrapman = require("scrapman");

3.- Use it (as many times as you need)

Single URL request

scrapman.load("http://google.com", function(results){
	//results contains the HTML obtained from the url
	console.log(results);
});

Parallel URL requests

//yes, you can use it within a loop.
for(var i=1; i<=50; i++){
    scrapman.load("https://www.website.com/page/" + i, function(results){
        console.log(results);
    });
}

API

- scrapman.load(url, callback)

url

Type: String

The URL from which the HTML code is going to be obtained.

callback(results)

Type: Function

The callback function to be executed when the loading is done. The loaded HTML will be in the results parameter.

- scrapman.configure(config)

config

The configuration object can set the following values

maxConcurrentOperations: Integer - The intensity of processing, how many URLs can be loaded at the same time, default: 50
wait: Integer - The amount of milliseconds to wait before returning the HTML code of a webpage after it has been completely loaded, default: 0

Questions

Feel free to open Issues to ask questions about using this package, PRs are very welcomed and encouraged.

SE HABLA ESPAÑOL

License

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

danielnieto / scrapman

Programming Languages

Labels

Projects that are alternatives of or similar to scrapman

Scrapman

THE FASTEST SCRAPPER EVER... AND IT SUPPORTS PARALLEL REQUESTS (arguably)

USAGE

API

- scrapman.load(url, callback)

url

callback(results)

- scrapman.configure(config)

config

Questions

License

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

danielnieto / scrapman

Programming Languages

Labels

Projects that are alternatives of or similar to scrapman

Scrapman

THE FASTEST SCRAPPER EVER*... AND IT SUPPORTS PARALLEL REQUESTS (*arguably)

USAGE

API

- scrapman.load(url, callback)

url

callback(results)

- scrapman.configure(config)

config

Questions

License

THE FASTEST SCRAPPER EVER... AND IT SUPPORTS PARALLEL REQUESTS (arguably)