elnaz / scraper

Licence: MIT license
A web scraper starter project

Programming Languages

javascript

Projects that are alternatives to or similar to scraper

arachnod
High performance crawler for Nodejs
Stars: ✭ 17 (-5.56%)
Mutual labels:  scraper, cheerio
Cheerio
Fast, flexible, and lean implementation of core jQuery designed specifically for the server.
Stars: ✭ 24,616 (+136655.56%)
Mutual labels:  scraper, cheerio
website-to-json
Converts website to json using jQuery selectors
Stars: ✭ 37 (+105.56%)
Mutual labels:  scraper, cheerio
oge
Page metadata as a service
Stars: ✭ 22 (+22.22%)
Mutual labels:  scraper
esaj
Scrapers for many e-SAJ systems
Stars: ✭ 35 (+94.44%)
Mutual labels:  scraper
PDAP-Scrapers
Code relating to scraping public police data.
Stars: ✭ 72 (+300%)
Mutual labels:  scraper
impartus-downloader
Download Impartus lectures, convert to mkv for offline viewing.
Stars: ✭ 19 (+5.56%)
Mutual labels:  scraper
go-jd
Automatic login to the JD.com (Jingdong) app and automatic ordering of online products
Stars: ✭ 158 (+777.78%)
Mutual labels:  scraper
InstagramLocationScraper
No description or website provided.
Stars: ✭ 13 (-27.78%)
Mutual labels:  scraper
MangaReaderScraper
Search and download mangas from the command line
Stars: ✭ 23 (+27.78%)
Mutual labels:  scraper
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+188.89%)
Mutual labels:  scraper
diosts
A Go scraper that validates security.txt files and outputs them in the disclose.io JSON format.
Stars: ✭ 18 (+0%)
Mutual labels:  scraper
google-this
🔎 A simple yet powerful module to retrieve organic search results and much more from Google.
Stars: ✭ 88 (+388.89%)
Mutual labels:  scraper
angel.co-companies-list-scraping
No description or website provided.
Stars: ✭ 54 (+200%)
Mutual labels:  scraper
stock-market-scraper
Scrapes historical stock market data from Yahoo Finance (https://finance.yahoo.com/)
Stars: ✭ 110 (+511.11%)
Mutual labels:  scraper
copycat
A PHP Scraping Class
Stars: ✭ 70 (+288.89%)
Mutual labels:  scraper
web-crawljs
web crawler for Nodejs
Stars: ✭ 20 (+11.11%)
Mutual labels:  cheerio
ScrapeM
A monadic web scraping library
Stars: ✭ 17 (-5.56%)
Mutual labels:  scraper
VK-Scraper
Scrapes VK user's photos
Stars: ✭ 42 (+133.33%)
Mutual labels:  scraper
scraper
A simple web scraper built around the JavaFX WebEngine
Stars: ✭ 13 (-27.78%)
Mutual labels:  scraper

Scraper

A starter project for scraping similar data from multiple sources using Node, Cheerio, and Request and saving the result in a MongoDB instance.

Prerequisites

  • Node.js and npm
  • A MongoDB server instance (specify its URL in config/)
  • An empty GitHub repo for your version of the scraper

Install

> git clone https://github.com/elnaz/scraper
> cd scraper
> git remote set-url origin git@github.com:YOUR_USERNAME/YOUR_SCRAPER_PROJECT.git
> git push origin master
> npm i

Usage

> npm start

Note: For legal reasons, the example source, /lib/sources/example.js, is fake, so the project won't work when you first clone it. To add your own sources, see below.

Adding a source

Let's say you need to scrape people from multiple different sources. For each source:

  1. Create a file with the source's name in the /lib/sources/ directory.
  2. In /lib/sources/source-name.js:
       • Define and export a URL constant holding the address of the source's web page.
       • Define and export a parsePeople function that takes a Cheerio selector $, uses it to select the data you want to scrape about each person on the page, and returns an array of parsed person objects.
  3. Require the new source in the SOURCES array of /lib/index.js.
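
As a rough illustration of steps 1 and 2, a source file might look like the sketch below. Only the exported names URL and parsePeople come from the instructions above. The file name, the example.com address, the CSS selectors (.person, .name, .title), and the fields on each person object are all hypothetical, so adjust them to the markup of the page you are scraping.

```javascript
// /lib/sources/people-directory.js (hypothetical file name)

// Address of the page to scrape. example.com is a placeholder, not a real source.
const URL = 'https://example.com/people';

// Takes a Cheerio selector `$` loaded with the page's HTML and returns
// an array of plain person objects.
// The selectors and field names are assumptions for illustration only.
function parsePeople($) {
  return $('.person')
    .map((index, element) => {
      const person = $(element);
      return {
        name: person.find('.name').text().trim(),
        title: person.find('.title').text().trim(),
      };
    })
    .get(); // convert the Cheerio collection into a plain array
}

module.exports = { URL, parsePeople };
```

To try the function on its own with Cheerio installed, you could call parsePeople(cheerio.load(html)) on a saved copy of the page. For step 3, the new module would then be required from the SOURCES array, for instance as require('./sources/people-directory'), though the exact shape of that array depends on what /lib/index.js expects.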