jonstuebe / scraper

Licence: MIT license
Node.js based scraper using headless chrome

Programming Languages

javascript

Projects that are alternatives of or similar to scraper

instagram-hashtag-scraper
NodeJS application for scraping recent top posts from Instagram by hashtag without API access.
Stars: ✭ 17 (-62.22%)
Mutual labels:  scraper
sotoki
StackExchange websites to ZIM scraper
Stars: ✭ 64 (+42.22%)
Mutual labels:  scraper
ha-multiscrape
Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
Stars: ✭ 103 (+128.89%)
Mutual labels:  scraper
jsonHunter
Online web scraper
Stars: ✭ 86 (+91.11%)
Mutual labels:  scraper
subreddit-comments-dl
Download subreddit comments
Stars: ✭ 57 (+26.67%)
Mutual labels:  scraper
TikTok
Download public videos on TikTok using Python with Selenium
Stars: ✭ 37 (-17.78%)
Mutual labels:  scraper
lux
👾 Fast and simple video download library and CLI tool written in Go
Stars: ✭ 19,266 (+42713.33%)
Mutual labels:  scraper
TelegramScraper
With this tool you can easily add many members from any group to your own group in less than 2 minutes. This tool is for educational purposes only; you could be banned from Telegram, so be careful. Recommended for use only on Termux.
Stars: ✭ 234 (+420%)
Mutual labels:  scraper
OnlyFans
Scrape all the media from an OnlyFans account - Updated regularly
Stars: ✭ 573 (+1173.33%)
Mutual labels:  scraper
civic-scraper
Tools for downloading agendas, minutes and other documents produced by local government
Stars: ✭ 21 (-53.33%)
Mutual labels:  scraper
scrapman
Retrieve real (with Javascript executed) HTML code from an URL, ultra fast and supports multiple parallel loading of webs
Stars: ✭ 21 (-53.33%)
Mutual labels:  scraper
instagram-get-images
Instagram get images 🌄 (hashtags, account, locations) with puppeteer
Stars: ✭ 69 (+53.33%)
Mutual labels:  scraper
unfurl
Extract rich metadata from URLs
Stars: ✭ 41 (-8.89%)
Mutual labels:  scraper
rymscraper
Python API to extract data from rateyourmusic.com.
Stars: ✭ 63 (+40%)
Mutual labels:  scraper
youtube-playlist
❄️ Extract links, ids, and names from a youtube playlist
Stars: ✭ 73 (+62.22%)
Mutual labels:  scraper
flickr scraper
Simple Flickr Image Scraper
Stars: ✭ 148 (+228.89%)
Mutual labels:  scraper
robotstxt
robots.txt file parsing and checking for R
Stars: ✭ 65 (+44.44%)
Mutual labels:  scraper
Instagram-to-discord
Monitor instagram user account and automatically post new images to discord channel via a webhook. Working 2022!
Stars: ✭ 113 (+151.11%)
Mutual labels:  scraper
LeetCode
At present contains scraped data from around 1500 problems present on the site. More to follow....
Stars: ✭ 45 (+0%)
Mutual labels:  scraper
CourseCake
By serving course 📚 data that is more "edible" 🍰 for developers, we hope CourseCake offers a smooth approach to build useful tools for students.
Stars: ✭ 21 (-53.33%)
Mutual labels:  scraper

Scraper

Node.js based scraper using headless chrome

Installation

$ npm install @jonstuebe/scraper

Features

  • Scrape top ecommerce sites (Amazon, Walmart, Target)
  • Return basic product information (title, price, image, description)
  • Easy to use API to scrape any website

API

Require (or import) the package and call it with a URL. The call returns a promise that resolves with the scraped data, so you can either await it or chain .then().

CommonJS (require)

const Scraper = require("@jonstuebe/scraper");

// run inside of an async function
(async () => {
  const data = await Scraper.scrapeAndDetect(
    "http://www.amazon.com/gp/product/B00X4WHP5E/"
  );
  console.log(data);
})();

ES modules (import)

import Scraper from "@jonstuebe/scraper";

// run inside of an async function
(async () => {
  const data = await Scraper("http://www.amazon.com/gp/product/B00X4WHP5E/");
  console.log(data);
})();

with promises

import Scraper from "@jonstuebe/scraper";

Scraper("http://www.amazon.com/gp/product/B00X4WHP5E/").then(data => {
  console.log(data);
});
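
The promise can also reject, for example when a page fails to load, and the snippet above does not handle that case. A minimal sketch of one way to do so, assuming only that the scraper is a function returning a promise; `scrapeSafely` and the shape of its result are hypothetical names used for illustration, not part of the package:

```javascript
// Hypothetical helper: wraps any promise-returning scrape function so that
// failures come back as values instead of unhandled rejections.
async function scrapeSafely(scrape, url) {
  try {
    const data = await scrape(url);
    return { ok: true, url, data };
  } catch (error) {
    return { ok: false, url, error };
  }
}

// Usage (assumes the package is installed):
// import Scraper from "@jonstuebe/scraper";
// scrapeSafely(Scraper, "http://www.amazon.com/gp/product/B00X4WHP5E/")
//   .then(result => console.log(result.ok ? result.data : result.error.message));
```

Passing the scrape function in as an argument keeps the helper independent of how the package is imported.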

shared scraper instance

If you are going to run the scraper several times in succession, it's recommended to share one Chromium instance across all of the sequential or parallel scrapes instead of launching a new one each time.

import puppeteer from "puppeteer";
import Scraper from "@jonstuebe/scraper";

// run inside of an async function
(async () => {
  const browser = await puppeteer.launch();
  let products = [
    "https://www.target.com/p/corinna-angle-leg-side-table-wood-threshold-8482/-/A-53496420",
    "https://www.target.com/p/glasgow-metal-end-table-black-project-62-8482/-/A-52343433"
  ];

  let productsData = [];
  for (const product of products) {
    const productData = await Scraper(product, browser);
    productsData.push(productData);
  }

  // Make sure to close the browser; otherwise the Chromium instance will
  // continue to run in the background on your machine.
  await browser.close();

  console.table(productsData);
})();
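
The example above scrapes sequentially. For the parallel case, the following is a sketch rather than the package's own approach: `scrapeAllShared` is a hypothetical helper, `launch` is assumed to behave like `puppeteer.launch`, and `scrape` is assumed to behave like `Scraper(url, browser)`. Both are passed in as arguments so the helper has no hard dependency on either package.

```javascript
// Hypothetical helper: scrape every URL concurrently with one shared browser.
async function scrapeAllShared(urls, scrape, launch) {
  const browser = await launch();
  try {
    // Promise.allSettled keeps one failed page from discarding the others.
    const settled = await Promise.allSettled(
      urls.map(url => scrape(url, browser))
    );
    return settled.map((result, i) =>
      result.status === "fulfilled"
        ? { url: urls[i], data: result.value }
        : { url: urls[i], error: result.reason }
    );
  } finally {
    // Always close the browser, even if something above throws.
    await browser.close();
  }
}

// Usage (assumes both packages are installed; run inside an async function):
// import puppeteer from "puppeteer";
// import Scraper from "@jonstuebe/scraper";
// const results = await scrapeAllShared(products, Scraper, () => puppeteer.launch());
```

This starts every scrape at once, so for long URL lists you would probably want to process them in smaller batches to limit how many pages are open at the same time.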

emulate devices

If you want to emulate a device, pass in a puppeteer device as the third argument:

import puppeteer from "puppeteer";
import Scraper from "@jonstuebe/scraper";

// run inside of an async function
(async () => {
  const data = await Scraper(
    "http://www.amazon.com/gp/product/B00X4WHP5E/",
    null,
    puppeteer.devices["iPhone SE"]
  );
  console.log(data);
})();

custom scrapers

To scrape a site that is not supported out of the box, define a site object with a name, a list of hosts, and an async scrape function that receives the page, then pass that object to Scraper.scrape along with the URL:

const Scraper = require("@jonstuebe/scraper");

(async () => {
  const site = {
    name: "npm",
    hosts: ["www.npmjs.com"],
    scrape: async page => {
      const name = await Scraper.getText("div.content-column > h1 > a", page);
      const version = await Scraper.getText(
        "div.sidebar > ul:nth-child(2) > li:nth-child(2) > strong",
        page
      );
      const author = await Scraper.getText(
        "div.sidebar > ul:nth-child(2) > li.last-publisher > a > span",
        page
      );

      return {
        name,
        version,
        author
      };
    }
  };

  const data = await Scraper.scrape(
    "https://www.npmjs.com/package/lodash",
    site
  );
  console.log(data);
})();
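
The `hosts` array suggests that a site definition is chosen by comparing the URL's hostname against it. How the package does this internally is not shown here, so the following is only a sketch of that idea; `findSiteForUrl` is a hypothetical helper and the second site entry is made up for illustration:

```javascript
// Hypothetical helper: return the first site definition whose `hosts` list
// contains the URL's hostname, or undefined when none matches.
function findSiteForUrl(url, sites) {
  const { hostname } = new URL(url); // throws a TypeError on an invalid URL
  return sites.find(site => site.hosts.includes(hostname));
}

const sites = [
  { name: "npm", hosts: ["www.npmjs.com"] },
  { name: "example", hosts: ["example.com", "www.example.com"] } // made-up entry
];

const match = findSiteForUrl("https://www.npmjs.com/package/lodash", sites);
console.log(match ? match.name : "no matching site"); // prints "npm"
```

The match is exact, so "npmjs.com" and "www.npmjs.com" count as different hosts; list every hostname variant you expect to see.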

Contributing

If you want to add any sites, or just have an idea or feature, go ahead and fork this repo and send me a pull request. I'll be happy to take a look when I can and get back to you.

Issues

For any and all issues/bugs, please post a description and code sample to reproduce the problem on the issues page.

License

MIT
