Top 28 scraping-websites open source projects

A Python module to bypass Cloudflare's anti-bot page.

✭ 2,606

python Makefile cloudflare anti-bot-page protected-page scrape scraping-websites

Declarative web scraping

✭ 4,837

go HTML javascript hacktoberfest cli library chrome tool crawler scraper data-mining scraping crawling query-language scraping-websites cdp hacktoberfest2021

Text-Analysis

Explaining textual analysis tools in Python. Including Preprocessing, Skip Gram (word2vec), and Topic Modelling.

thal

Translation: Puppeteer and Chrome Headless, from getting started to web crawling

✭ 651

javascript scraping-websites headless-chrome puppeteer

metafetch

NodeJS package that fetches a given URL's title, description, images, links etc.

✭ 21

javascript typescript scraper meta-tags scraping-websites

kick-off-web-scraping-python-selenium-beautifulsoup

A tutorial-based introduction to web scraping with Python.

✭ 18

python scraper time csv phantomjs pandas-dataframe selenium beautiful-soup data-extraction beautifulsoup selenium-webdriver bs4 scraping-websites data-extractor urllib tabulate

imdb-scraper

🎬 An attempt at the most complete IMDb API

✭ 24

typescript javascript scraper imdb scraping-websites imdb-webscrapping imdb-api imdb-movies imdb-information imdb-dataset scraping-api

scrapism

a work-in-progress guide to web scraping as an artistic and critical practice

✭ 43

python HTML CSS Makefile tutorials webscraping scraping-websites communism

newspaper3 usage overview

This repository provides usage examples for the Python module Newspaper3k.

✭ 78

python news data-extraction newspaper beautifulsoup nlp-parsing scraping-websites python-requests newspaper3k

document-dl

Command line program to download documents from web portals.

✭ 14

python shell scraper scraping scraping-websites document-dl

OLX Scraper

📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into MongoDB (NoSQL).

✭ 15

python data-science data machine-learning scraper mongodb nosql web-crawler pymongo web-scraper artificial-intelligence web-scraping scrapping scrapy scraping-websites web-crawling olx web-crawler-python nosql-mongodb

torchestrator

Spin up Tor containers and then proxy HTTP requests via these Tor instances

scavenger

Scrape and take screenshots of dynamic and static webpages

✭ 14

javascript electron nodejs dynamic scraping scraping-websites nightmarejs

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

✭ 51

python crawler scraper scraping crawling scraping-websites proxycrawl proxycrawl-api

ebayMarketAnalyzer

Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and export to Excel

✭ 116

python ebay webscraping scraping-websites

Instagram-to-discord

Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!

✭ 113

python instagram scraper monitor discord discordapp scraping discord-bot instagram-scraper instagram-photos scrapper scraping-websites monitors discordbot monitoring-scripts instagram-bot instagram-downloader scraping-python webhook-discord

youtube-audio

Extract videos from YouTube in audio format using web scraping techniques 🎶

✭ 68

ruby shell audio youtube cipher youtube-downloader webscraping scraping-websites

LeetCode

Currently contains scraped data from around 1,500 problems on the site. More to follow.

✭ 45

python SQL java C++ data-mining scraper leetcode dataset scraping-websites leetcode-questions leetcode-practice

costco-scrape

No description or website provided.

✭ 19

python scraping-websites

scrapman

Retrieve the real HTML (with JavaScript executed) from a URL. Very fast, and supports loading multiple pages in parallel

✭ 21

javascript HTML CSS electron scraper scraping scrap javascript-tools scraping-websites

reason-rust-scraper

🦀 Scraping & crawling websites using Rust and ReasonML

✭ 21

reason C++ mysql scraping fullstack reasonml scraping-websites rocket-rs rust-scraping reasonml-rust

big-data-upf

RECSM-UPF Summer School: Social Media and Big Data Research

✭ 21

HTML r social-media facebook twitter big-data rstudio text-analysis social-network-analysis scraping-websites

medium-scrapper

Scrape Medium articles by tag.

✭ 34

python medium webscraping scraping-websites

gochanges

**[ARCHIVED]** website changes tracker 🔍

readability-cli

A CLI for Mozilla Readability. Get clean, uncluttered, ready-to-read HTML from any webpage!

✭ 41

html cli webpage scraping read reader readability scrape cleaner scraping-websites sanitize-html mercury-parser mozilla-readability

pupflare

A webpage proxy that sends requests through Chromium (Puppeteer). It can be used to bypass Cloudflare anti-bot / anti-DDoS pages from any application (such as curl)

✭ 183

javascript Dockerfile docker koa proxy chromium cloudflare anti-bot-page protected-page scrape scraping-websites puppeteer cloudflare-bypass cloudflare-scrape

TradeTheEvent

Implementation of "Trade the Event: Corporate Events Detection for News-Based Event-Driven Trading," in Findings of ACL 2021

✭ 64

python scraper news selenium acl dataset stock-market stock-price-prediction trade event-detection scraping-websites bert stock-prediction stock-analysis stock-trading event-driven-trading corporate-event

ryuanime

A free anime streaming app that uses jkanime content, obtained by scraping the jkanime website.

✭ 20

Vue typescript javascript CSS HTML nodejs vuejs anime desktop-application videos electron-app scraping-websites streaming-video exprees scraping-api jkanime
