cassidoo / Scrapers

A list of scrapers from around the web.

Use the Table of Contents to find your way around. It covers the entire list, making it easy to navigate to each tool's pros and cons and to the link to its website.

Please contribute by adding links, adding pros/cons, titles, or anything else you think would be helpful! Please help maintain alphabetical order.

Table Of Contents

Apifier

Description: Cloud-based scraper for JavaScript.

Applicable Language(s)

JavaScript

Beautiful Soup

Description: A Python library for navigating and parsing results from the Web. It allows you to search the HTML tree for various tags.

Applicable Language(s)

Python
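
To make "searching the HTML tree to find various tags" concrete, here is a minimal sketch of the idea. It deliberately uses only Python's standard-library `html.parser` so it runs without installing anything; it is not Beautiful Soup's API, and the `TagFinder` class, `find_tags` helper, and sample markup are all hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collect the attributes of every start tag with a given name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []  # one attribute dict per matching tag

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names and passes attrs as (name, value) pairs.
        if tag == self.wanted:
            self.matches.append(dict(attrs))


def find_tags(html, name):
    """Return the attribute dicts of all <name> tags found in html."""
    finder = TagFinder(name)
    finder.feed(html)
    finder.close()
    return finder.matches


# Hypothetical sample page, used only for this illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Scrapers</h1>
  <a href="https://example.com/a">First</a>
  <p>Some text with <a href="https://example.com/b" class="ext">a link</a>.</p>
</body></html>
"""

links = find_tags(SAMPLE_HTML, "a")
hrefs = [attrs.get("href") for attrs in links]
print(hrefs)
```

A dedicated library adds a lot on top of this (tolerant parsing, navigation between parents and siblings, text extraction), which is the main reason to reach for one rather than a hand-rolled parser.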

Cheerio

Description: Fast, flexible & lean implementation of core jQuery designed specifically for the server.

Applicable Language(s)

JavaScript

Clearbit

Description: Service for looking up company and people information.

Applicable Language(s)

Common Crawl

Description: Open dataset of crawled websites.

Applicable Language(s)

Crawly

Description: Automatic service that turns a website into structured data in the form of JSON or CSV.
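
As a rough illustration of what "structured data in the form of JSON or CSV" can look like, the sketch below serializes a few records both ways with Python's standard library. It does not call Crawly or any other service; the `records` list and the `to_json` / `to_csv` helpers are hypothetical and stand in for whatever a scraper actually extracted.

```python
import csv
import io
import json

# Hypothetical records, standing in for data a scraper extracted.
records = [
    {"name": "Beautiful Soup", "language": "Python"},
    {"name": "Cheerio", "language": "JavaScript"},
]


def to_json(rows):
    """Serialize the records as a pretty-printed JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows, fieldnames):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_json(records))
print(to_csv(records, ["name", "language"]))
```

JSON keeps nesting and types, while CSV is flat but opens directly in a spreadsheet, so which one suits you depends on what consumes the data.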

Applicable Language(s)

Dexi.io

Description: Website data extraction using a visual programming language.

Applicable Language(s)

Diffbot

Description: Automated tool for extracting structured information from pages, crawling websites, and turning a website into an API.

Applicable Language(s)

Diggernaut

Description: Cloud-based web scraping platform.

Applicable Language(s)

SML
JavaScript

Pros

Scrapers can be built using a visual tool and a scraping meta-language
Can execute JS snippets inside a scraper
Supports Selenium (optionally) and OCR
Automated data validation and export to any text-based format
Scrapers can be run manually or on a schedule in the cloud, or compiled and run locally
Full automation using the API and integrations with other APIs

Cons

Currently in beta
Doesn't support PDF parsing yet

eLink

Description: Tool to mine LinkedIn profiles based on keywords.

Applicable Language(s)

EliteProxySwitcher

Description: Local software that can download a proxy list and let users choose which one to use.

Applicable Language(s)

Email Hunter

Description: API to find e-mail addresses for a given domain name.

Applicable Language(s)

FiveFilters

Description: Provides various website extraction and transformation tools, such as Full-Text RSS and Term Extraction, as services.

Applicable Language(s)

FMiner

Description: Local software for web scraping using a recording and a visual programming language.

Applicable Language(s)

FullContact

Description: API to retrieve more information on a person.

Applicable Language(s)

Grabby

Description: Service that searches a website for e-mails.

Applicable Language(s)

HrefScrap

Description: A Chrome extension that scrapes all the hrefs from a web page.

Applicable Language(s)

Import.io

Description: Automated tool to extract structured information from websites.

Applicable Language(s)

Kimonolabs

Description: Kimono was acquired by Palantir. This was a cloud-based service for turning websites into structured APIs. Now they offer a desktop-based alternative for continuing to use their tools.

Applicable Language(s)

lxml

Description: lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.

Pros

Incredibly fast (see: Python HTML Parser Performance)

Applicable Language(s)

Python
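
The sketch below shows the kind of tree query this sort of library is used for. To stay runnable without extra installs it uses the standard-library `xml.etree.ElementTree`; lxml's `etree` module largely follows the same API, so swapping the import for `from lxml import etree as ET` should behave the same here, though that swap is an assumption and has not been tested. The sample document and element names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
SAMPLE_XML = """
<catalog>
  <tool lang="Python"><name>lxml</name></tool>
  <tool lang="JavaScript"><name>Cheerio</name></tool>
  <tool lang="Python"><name>Beautiful Soup</name></tool>
</catalog>
"""

root = ET.fromstring(SAMPLE_XML)

# Path expression with an attribute predicate: only tools tagged as Python.
python_tools = [
    tool.findtext("name")
    for tool in root.findall("./tool[@lang='Python']")
]

# iter() walks the whole tree in document order, regardless of depth.
all_names = [name.text for name in root.iter("name")]

print(python_tools)
print(all_names)
```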

Mozenda

Description: Extract structured information from HTML, PDF, Excel, and Word by clicking on document elements.

Applicable Language(s)

Morph.io

Description: Based on ScraperWiki, run scrapers in Python, Ruby, R, Perl or Node.js.

Applicable Language(s)

Node.js
Perl
Python
R
Ruby

Node-Crawler

Description: Web crawler/spider for Node.js with server-side jQuery.

Applicable Language(s)

Node.js

Nutch

Description: Web crawler that can be combined with the Hadoop ecosystem to run in a cluster.

Applicable Language(s)

Outwit Hub

Description: Application that can extract information from a website and turn it into structured data (CSV, Excel, etc.).

Applicable Language(s)

Octoparse

Description: A free web scraping tool for extracting web page data into several structured file formats.

Applicable Language(s)

rvest

Description: R package to scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and is inspired by libraries like Beautiful Soup.

Applicable Language(s)

R

scrape-it

Description: A Node.js scraper for humans.

Applicable Language(s)

JavaScript (Node.js)

Scraper.AI

Description: Scraper.AI is an automated scraping SaaS that makes extracting data from any webpage as simple as clicking and selecting what you want. With a few clicks you can gather thousands of records.

Best of all, changes to the selections are monitored as often as you want. Updates are pushed to a consumable API for you to build on.

Applicable Language(s)

Any, through a JSON API and (optional) webhook

ScraperWiki

Description: Write a scraper in the browser and run on their cloud-based service. This is used by many news organisations.

Applicable Language(s)

ScrapingAnt

Description: ScrapingAnt is a headless Chrome scraping API and free checked-proxy service. ScrapingAnt supports JavaScript rendering, premium rotating proxies, and CAPTCHA avoidance tools. Free plans are available.

Applicable Language(s)

Any, through a JSON API

Scrapinghub

Description: Scraper cloud hosting as a service. Allows developers to deploy their own scrapers on their platform and benefit from their existing infrastructure.

Applicable Language(s)

Screen Scraper

Description: Local tool for scraping websites.

Applicable Language(s)

Toofr

Description: Service for looking up business e-mails.

Applicable Language(s)

UBot Studio

Description: Web automation software using a visual programming language and recorder.

Applicable Language(s)

UiPath

Description: Visual tool for GUI automation by recording.

Applicable Language(s)

Venom

Description: Venom is an open source focused crawler for the Deep Web.

Features

Multi-threaded
Structured crawling
Page Validation
Automatic Retries
Proxy support

Applicable Language(s)

Java

Web Robots

Description: Data as a Service platform for web scraping.

Pros

Scraping dynamic, JavaScript-heavy websites
Login and form fill on websites
Data normalization and validation
Data uploads

Cons

Currently in beta
Possible payment model in the future

Applicable Language(s)

Web Scraper

Description: Extension that downloads websites and turns them into structured data. Data is selected by element or by specialised selectors (e.g., for tables).

Applicable Language(s)

WrapAPI

Description: Turn a website into an API. The structure of the data is defined by clicking elements or regular expressions.
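
For a feel of how a regular expression can define the structure of extracted data, here is a small standard-library sketch. It is not WrapAPI's interface; the page fragment, the `TOOL_PATTERN` pattern, and the `extract_tools` helper are hypothetical. Named groups become the field names of each record.

```python
import re

# Hypothetical page fragment; the markup and field names are made up.
SAMPLE_PAGE = """
<li class="tool">X-Ray - JavaScript</li>
<li class="tool">rvest - R</li>
<li class="note">not a tool</li>
"""

# Each named group becomes a key in the extracted record.
TOOL_PATTERN = re.compile(
    r'<li class="tool">(?P<name>[^<]+?) - (?P<language>[^<]+)</li>'
)


def extract_tools(page):
    """Return one dict per matching list item, keyed by group name."""
    return [match.groupdict() for match in TOOL_PATTERN.finditer(page)]


tools = extract_tools(SAMPLE_PAGE)
print(tools)
```

Regular expressions are brittle against markup changes, so this approach tends to suit small, stable fragments; element-based selection is usually more robust for whole pages.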

Applicable Language(s)

X-Ray

Description: NPM module for scraping structured data via jQuery-like selectors.

Applicable Language(s)

JavaScript (Node.js)
