455 open source projects that are alternatives to or similar to ioweb

Apify SDK

The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+10074.19%)

Mutual labels: scraping, web-scraping, web-crawling

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (+119.35%)

Mutual labels: scraping, webscraping, webcrawling

zcrawl

An open source web crawling platform

Stars: ✭ 21 (-32.26%)

Mutual labels: scraping, web-crawling, webcrawling

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+13051.61%)

Mutual labels: scraping, web-scraping, webscraping

chesf

CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages

Stars: ✭ 18 (-41.94%)

Mutual labels: scraping, webscraping

gotor

This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.

Stars: ✭ 97 (+212.9%)

Mutual labels: webscraping, webcrawling

BookingScraper

🌎 🏨 Scrape Booking.com 🏨 🌎

Stars: ✭ 68 (+119.35%)

Mutual labels: web-scraping, webscraping

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+2193.55%)

Mutual labels: scraping, web-scraping

Humanoid

Node.js package to bypass CloudFlare's anti-bot JavaScript challenges

Stars: ✭ 88 (+183.87%)

Mutual labels: scraping, web-scraping

extractnet

A Dragnet that also extracts author, headline, date, and keywords from context

Stars: ✭ 52 (+67.74%)

Mutual labels: web-scraping, webscraping

Stock-Fundamental-data-scraping-and-analysis

Project on building a web crawler to collect the fundamentals of the stock and review their performance in one go

Stars: ✭ 40 (+29.03%)

Mutual labels: web-scraping, webcrawling

raspagem-de-dados-fatec

📓 Short course on web data scraping with Python, taught at the Technology Week of FATEC Jundiaí

Stars: ✭ 22 (-29.03%)

Mutual labels: scraping, web-scraping

Gazpacho

🥫 The simple, fast, and modern web scraping library

Stars: ✭ 525 (+1593.55%)

Mutual labels: scraping, webscraping

browser-automation-api

Browser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content or do many other things, such as capture a screenshot, generate a PDF, extract content, or execute custom Puppeteer or Playwright functions.

Stars: ✭ 24 (-22.58%)

Mutual labels: scraping, webscraping

OLX Scraper

📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for the requested product and dumps them to MongoDB (NoSQL).

Stars: ✭ 15 (-51.61%)

Mutual labels: web-scraping, web-crawling

R Web Scraping Cheat Sheet

Guide, reference, and cheat sheet on web scraping using rvest, httr, and RSelenium.

Stars: ✭ 207 (+567.74%)

Mutual labels: web-scraping, webscraping

Configs

Public, free-to-use repository of digger configs for scraping and extracting data from various e-commerce websites and online stores

Stars: ✭ 37 (+19.35%)

Mutual labels: scraping, webscraping

newspaperjs

News extraction and scraping. Article Parsing

Stars: ✭ 59 (+90.32%)

Mutual labels: webscraping, webcrawling

selectorlib

A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them

Stars: ✭ 53 (+70.97%)

Mutual labels: scraping, web-scraping

browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

Stars: ✭ 71 (+129.03%)

Mutual labels: scraping, web-scraping

anime-scraper

[partially working] Scrape and add anime episode stream URLs to uGet (Linux) or IDM (Windows) ~ Python3

Stars: ✭ 21 (-32.26%)

Mutual labels: scraping, webscraping

top-github-scraper

Scrape top GitHub repositories and users based on keywords

Stars: ✭ 40 (+29.03%)

Mutual labels: scraping, web-scraping

PythonScrapyBasicSetup

Basic setup with random user agents and IP addresses for Python Scrapy Framework.

Stars: ✭ 57 (+83.87%)

Mutual labels: scraping, web-scraping

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-51.61%)

Mutual labels: scraping, web-scraping

schedule-tweet

Schedules tweets using TweetDeck

Stars: ✭ 14 (-54.84%)

Mutual labels: scraping, webscraping

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (+793.55%)

Mutual labels: scraping, web-scraping

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+3203.23%)

Mutual labels: scraping, webscraping

Detect Cms

PHP Library for detecting CMS

Stars: ✭ 78 (+151.61%)

Mutual labels: scraping, web-scraping

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (+222.58%)

Mutual labels: scraping, webscraping

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (+1396.77%)

Mutual labels: scraping, web-scraping

Instago

Download/access photos, videos, stories, story highlights, postlives, following and followers of Instagram

Stars: ✭ 59 (+90.32%)

Mutual labels: web-scraping, webscraping

Phpscraper

PHP Scraper - a highly opinionated web interface for PHP

Stars: ✭ 148 (+377.42%)

Mutual labels: scraping, web-scraping

Sqrape

Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)

Stars: ✭ 144 (+364.52%)

Mutual labels: scraping, web-scraping

Scrape Linkedin Selenium

`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.

Stars: ✭ 239 (+670.97%)

Mutual labels: scraping, web-scraping

codepen-puppeteer

Use Puppeteer to download pens from Codepen.io as single html pages

Stars: ✭ 22 (-29.03%)

Mutual labels: web-scraping

medium-scrapper

Scrape Medium articles using tags.

Stars: ✭ 34 (+9.68%)

Mutual labels: webscraping

web-poet

Web scraping Page Objects core library

Stars: ✭ 67 (+116.13%)

Mutual labels: web-scraping

crawler-chrome-extensions

Chrome extensions commonly used by crawler developers

Stars: ✭ 53 (+70.97%)

Mutual labels: scraping

Crypto-Webminer

Stars: ✭ 166 (+435.48%)

Mutual labels: webmining

scrapy-wayback-machine

A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Stars: ✭ 92 (+196.77%)

Mutual labels: web-scraping

core

The complete web scraping toolkit for PHP.

Stars: ✭ 1,110 (+3480.65%)

Mutual labels: web-scraping

scrape-github-trending

Tutorial for web scraping / crawling with Node.js.

Stars: ✭ 42 (+35.48%)

Mutual labels: scraping

Architeuthis

MITM HTTP(S) proxy with integrated load-balancing, rate-limiting and error handling. Built for automated web scraping.

Stars: ✭ 35 (+12.9%)

Mutual labels: scraping

google scraper live view

Application for extracting large amounts of data from the Google search results page

Stars: ✭ 17 (-45.16%)

Mutual labels: webscraping

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (+70.97%)

Mutual labels: scraping

4cat

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

Stars: ✭ 144 (+364.52%)

Mutual labels: scraping

info-bot

🤖 A Versatile Telegram Bot

Stars: ✭ 37 (+19.35%)

Mutual labels: scraping

super-anime-downloader

A program which takes an Anime name or URL and downloads the specified range of episodes.

Stars: ✭ 26 (-16.13%)

Mutual labels: webscraping

crawlzone

Crawlzone is a fast asynchronous internet crawling framework for PHP.

Stars: ✭ 70 (+125.81%)

Mutual labels: web-scraping

socials

👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.

Stars: ✭ 37 (+19.35%)

Mutual labels: scraping

Goirate

Pillaging the seven seas for torrents, pieces of eight and other bounty.

Stars: ✭ 20 (-35.48%)

Mutual labels: scraping

fBrowser

Helpful Selenium functions to make web-scraping easier and faster

Stars: ✭ 16 (-48.39%)

Mutual labels: webscraping

2017-summer-workshop

Exercises, data, and more for our 2017 summer workshop (funded by the Estes Fund and in partnership with Project Jupyter and Berkeley's D-Lab)

Stars: ✭ 33 (+6.45%)

Mutual labels: web-scraping

linkedin-scraper

Tool to scrape LinkedIn

Stars: ✭ 74 (+138.71%)

Mutual labels: scraping

chopper

Chopper is a tool to extract elements from HTML by preserving ancestors and CSS rules

Stars: ✭ 22 (-29.03%)

Mutual labels: scraping

scrapers

scrapers for building your own image databases

Stars: ✭ 46 (+48.39%)

Mutual labels: scraping

google-search-results-nodejs

SerpApi client library for Node.js. Previously: Google Search Results Node.js.

Stars: ✭ 46 (+48.39%)

Mutual labels: webscraping

readability-cli

A CLI for Mozilla Readability. Get clean, uncluttered, ready-to-read HTML from any webpage!

Stars: ✭ 41 (+32.26%)

Mutual labels: scraping

shorter.recipes

A website dedicated to making recipes from any website easy to read.

Stars: ✭ 27 (-12.9%)

Mutual labels: scraping

Raspagem-de-dados-para-iniciantes

Data scraping for beginners using Scrapy and other basic libraries

Stars: ✭ 113 (+264.52%)

Mutual labels: webcrawling

1-60 of 455 similar projects
