jonstuebe / scraper

Licence: MIT license

Node.js based scraper using headless chrome

Programming Languages

javascript

Projects that are alternatives of or similar to scraper

instagram-hashtag-scraper

NodeJS application for scraping recent top posts from Instagram by hashtag without API access.

Stars: ✭ 17 (-62.22%)

Mutual labels: scraper

sotoki

StackExchange websites to ZIM scraper

Stars: ✭ 64 (+42.22%)

Mutual labels: scraper

ha-multiscrape

Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.

Stars: ✭ 103 (+128.89%)

Mutual labels: scraper

jsonHunter

Online web crawler and scraper

Stars: ✭ 86 (+91.11%)

Mutual labels: scraper

subreddit-comments-dl

Download subreddit comments

Stars: ✭ 57 (+26.67%)

Mutual labels: scraper

TikTok

Download public videos on TikTok using Python with Selenium

Stars: ✭ 37 (-17.78%)

Mutual labels: scraper

lux

👾 Fast and simple video download library and CLI tool written in Go

Stars: ✭ 19,266 (+42713.33%)

Mutual labels: scraper

TelegramScraper

Using this tool you can easily add many members from any group to your own group in less than 2 minutes. This tool is for educational purposes only, and using it could get you banned from Telegram, so be careful. Recommended for use only on Termux.

Stars: ✭ 234 (+420%)

Mutual labels: scraper

OnlyFans

Scrape all the media from an OnlyFans account - Updated regularly

Stars: ✭ 573 (+1173.33%)

Mutual labels: scraper

civic-scraper

Tools for downloading agendas, minutes and other documents produced by local government

Stars: ✭ 21 (-53.33%)

Mutual labels: scraper

scrapman

Retrieve real (with JavaScript executed) HTML code from a URL, ultra fast, with support for loading multiple pages in parallel

Stars: ✭ 21 (-53.33%)

Mutual labels: scraper

instagram-get-images

Instagram get images 🌄 (hashtags, account, locations) with puppeteer

Stars: ✭ 69 (+53.33%)

Mutual labels: scraper

unfurl

Extract rich metadata from URLs

Stars: ✭ 41 (-8.89%)

Mutual labels: scraper

rymscraper

Python API to extract data from rateyourmusic.com.

Stars: ✭ 63 (+40%)

Mutual labels: scraper

youtube-playlist

❄️ Extract links, ids, and names from a youtube playlist

Stars: ✭ 73 (+62.22%)

Mutual labels: scraper

flickr scraper

Simple Flickr Image Scraper

Stars: ✭ 148 (+228.89%)

Mutual labels: scraper

robotstxt

robots.txt file parsing and checking for R

Stars: ✭ 65 (+44.44%)

Mutual labels: scraper

Instagram-to-discord

Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!

Stars: ✭ 113 (+151.11%)

Mutual labels: scraper

LeetCode

At present contains scraped data from around 1500 problems present on the site. More to follow....

Stars: ✭ 45 (+0%)

Mutual labels: scraper

CourseCake

By serving course 📚 data that is more "edible" 🍰 for developers, we hope CourseCake offers a smooth approach to build useful tools for students.

Stars: ✭ 21 (-53.33%)

Mutual labels: scraper

Scraper

Node.js based scraper using headless chrome

Installation

$ npm install @jonstuebe/scraper

Features

Scrape top ecommerce sites (Amazon, Walmart, Target)
Return basic product information (title, price, image, description)
Easy to use API to scrape any website

API

Require the package and call it with a URL. The call returns a promise that resolves to the scraped data, so you can either await it or chain .then().

CommonJS (require)

const Scraper = require("@jonstuebe/scraper");

// run inside of an async function
(async () => {
  const data = await Scraper.scrapeAndDetect(
    "http://www.amazon.com/gp/product/B00X4WHP5E/"
  );
  console.log(data);
})();

ES modules (import)

import Scraper from "@jonstuebe/scraper";

// run inside of an async function
(async () => {
  const data = await Scraper("http://www.amazon.com/gp/product/B00X4WHP5E/");
  console.log(data);
})();

with promises

import Scraper from "@jonstuebe/scraper";

Scraper("http://www.amazon.com/gp/product/B00X4WHP5E/").then(data => {
  console.log(data);
});

shared scraper instance

If you run the scraper several times in succession, share one Chromium instance across the sequential or parallel scrapes by passing the browser as the second argument, rather than launching a new instance for every call.

import puppeteer from "puppeteer";
import Scraper from "@jonstuebe/scraper";

// run inside of an async function
(async () => {
  const browser = await puppeteer.launch();
  let products = [
    "https://www.target.com/p/corinna-angle-leg-side-table-wood-threshold-8482/-/A-53496420",
    "https://www.target.com/p/glasgow-metal-end-table-black-project-62-8482/-/A-52343433"
  ];

  let productsData = [];
  for (const product of products) {
    const productData = await Scraper(product, browser);
    productsData.push(productData);
  }

  await browser.close(); // make sure to close the browser, otherwise its processes keep running in the background on your machine

  console.table(productsData);
})();
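
The loop above scrapes the products one after another. For the parallel case, a minimal sketch is shown here. The helper name scrapeAll and the stand-in fakeScrape are hypothetical and not part of @jonstuebe/scraper; the helper only assumes that a scrape call returns a promise, as the examples above show. It uses Promise.allSettled so that one failing URL does not discard the other results.

```javascript
// Hypothetical helper (not part of @jonstuebe/scraper): start one scrape per
// URL at the same time and collect a result or an error message for each.
// `scrape` is any function (url) => Promise<data>.
async function scrapeAll(urls, scrape) {
  const settled = await Promise.allSettled(urls.map(url => scrape(url)));
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? { url: urls[i], data: result.value }
      : { url: urls[i], error: String(result.reason) }
  );
}

// Stand-in scraper so the sketch runs without Chromium.
// With the real package this would be: url => Scraper(url, browser)
const fakeScrape = async url => {
  if (!url.startsWith("https://")) throw new Error("unsupported URL");
  return { title: `Product at ${url}` };
};

scrapeAll(["https://example.com/a", "ftp://example.com/b"], fakeScrape).then(
  results => console.log(results)
);
```

With a real browser, the call would look like scrapeAll(products, url => Scraper(url, browser)), followed by await browser.close() once it resolves. Every URL opens its own page at the same time, so for long lists it is probably safer to process the URLs in small batches.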

emulate devices

If you want to emulate a device, pass in a Puppeteer device as the third argument:

import puppeteer from "puppeteer";
import Scraper from "@jonstuebe/scraper";

// run inside of an async function
(async () => {
  const data = await Scraper(
    "http://www.amazon.com/gp/product/B00X4WHP5E/",
    null,
    puppeteer.devices["iPhone SE"]
  );
  console.log(data);
})();

custom scrapers

const Scraper = require("@jonstuebe/scraper");

(async () => {
  const site = {
    name: "npm",
    hosts: ["www.npmjs.com"],
    scrape: async page => {
      const name = await Scraper.getText("div.content-column > h1 > a", page);
      const version = await Scraper.getText(
        "div.sidebar > ul:nth-child(2) > li:nth-child(2) > strong",
        page
      );
      const author = await Scraper.getText(
        "div.sidebar > ul:nth-child(2) > li.last-publisher > a > span",
        page
      );

      return {
        name,
        version,
        author
      };
    }
  };

  const data = await Scraper.scrape(
    "https://www.npmjs.com/package/lodash",
    site
  );
  console.log(data);
})();
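
The site object above declares a hosts list next to its scrape function. How the library uses that list internally is not documented here; the sketch below shows one plausible way a host-based lookup could work, to illustrate what the hosts field is for. The function findSiteForUrl and the second site entry are hypothetical and are not part of @jonstuebe/scraper.

```javascript
// Hypothetical helper (not the library's code): return the site definition
// whose `hosts` list contains the hostname of the given URL, or null.
function findSiteForUrl(url, sites) {
  const { hostname } = new URL(url); // throws a TypeError on an invalid URL
  return sites.find(site => site.hosts.includes(hostname)) || null;
}

// Site definitions shaped like the custom scraper above (scrape functions omitted).
const sites = [
  { name: "npm", hosts: ["www.npmjs.com"] },
  { name: "example", hosts: ["example.com", "www.example.com"] }
];

const match = findSiteForUrl("https://www.npmjs.com/package/lodash", sites);
console.log(match ? match.name : "no matching site");
```

The comparison is an exact hostname match, so "npmjs.com" and "www.npmjs.com" count as different hosts and both would need to be listed.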

Contributing

If you want to add any sites, or just have an idea or feature, go ahead and fork this repo and send me a pull request. I'll be happy to take a look when I can and get back to you.

Issues

For any and all issues/bugs, please post a description and code sample to reproduce the problem on the issues page.

License

MIT
