563 Open source projects that are alternatives of or similar to PythonScrapyBasicSetup

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (+385.96%)

Mutual labels: scraping, web-scraping

Phpscraper

PHP Scraper - a highly opinionated web-interface for PHP

Stars: ✭ 148 (+159.65%)

Mutual labels: scraping, web-scraping

browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

Stars: ✭ 71 (+24.56%)

Mutual labels: scraping, web-scraping

ioweb

Web Scraping Framework

Stars: ✭ 31 (-45.61%)

Mutual labels: scraping, web-scraping

torchestrator

Spin up Tor containers and then proxy HTTP requests via these Tor instances

Stars: ✭ 32 (-43.86%)

Mutual labels: scraping, tor

IMDB-Scraper

Scrapy project for scraping data from IMDB, with a movie dataset including data on 58,623 movies.

Stars: ✭ 37 (-35.09%)

Mutual labels: web-scraping, scrapy-framework

Scrape Linkedin Selenium

`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.

Stars: ✭ 239 (+319.3%)

Mutual labels: scraping, web-scraping

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-73.68%)

Mutual labels: scraping, web-scraping

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+7052.63%)

Mutual labels: scraping, web-scraping

selectorlib

A library to read a YAML file with XPath or CSS selectors and extract data from HTML pages using them

Stars: ✭ 53 (-7.02%)

Mutual labels: scraping, web-scraping

raspagem-de-dados-fatec

📓 Short course on web data scraping with Python, taught at the Technology Week (Semana de Tecnologia) of FATEC Jundiaí

Stars: ✭ 22 (-61.4%)

Mutual labels: scraping, web-scraping

Katana

A Python tool for Google hacking

Stars: ✭ 355 (+522.81%)

Mutual labels: scraping, tor

Sqrape

Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)

Stars: ✭ 144 (+152.63%)

Mutual labels: scraping, web-scraping

Apify Js

Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+5433.33%)

Mutual labels: scraping, web-scraping

Detect Cms

PHP library for detecting which CMS a website uses

Stars: ✭ 78 (+36.84%)

Mutual labels: scraping, web-scraping

TorScrapper

A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)

Stars: ✭ 24 (-57.89%)

Mutual labels: scraping, tor

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+1147.37%)

Mutual labels: scraping, web-scraping

top-github-scraper

Scrape top GitHub repositories and users based on keywords

Stars: ✭ 40 (-29.82%)

Mutual labels: scraping, web-scraping

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (+714.04%)

Mutual labels: scraping, web-scraping

Humanoid

Node.js package to bypass CloudFlare's anti-bot JavaScript challenges

Stars: ✭ 88 (+54.39%)

Mutual labels: scraping, web-scraping

Googlescraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.

Stars: ✭ 2,363 (+4045.61%)

Mutual labels: scraping

UofT-Timetable-Generator

A web application that generates timetables for university students at the University of Toronto

Stars: ✭ 34 (-40.35%)

Mutual labels: web-scraping

Idt

Image Dataset Tool (idt) is a CLI tool designed to turn the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.

Stars: ✭ 202 (+254.39%)

Mutual labels: scraping

Jsonframe Cheerio

Simple multi-level scraper with JSON input/output for Cheerio

Stars: ✭ 196 (+243.86%)

Mutual labels: scraping

concurrent-web-scraping

Building a Concurrent Web Scraper with Python and Selenium

Stars: ✭ 28 (-50.88%)

Mutual labels: web-scraping

Musoq

Use SQL on various data sources

Stars: ✭ 252 (+342.11%)

Mutual labels: scraping

Anime Dl

Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.

Stars: ✭ 190 (+233.33%)

Mutual labels: scraping

Panther

A browser testing and web crawling library for PHP and Symfony

Stars: ✭ 2,480 (+4250.88%)

Mutual labels: scraping

wayback

⏪ Tools to Work with the Various Internet Archive Wayback Machine APIs

Stars: ✭ 52 (-8.77%)

Mutual labels: web-scraping

Jikan Rest

The REST API for Jikan

Stars: ✭ 200 (+250.88%)

Mutual labels: scraping

Whatsapp-Net

Generate a network graph of connections from your WhatsApp groups data

Stars: ✭ 75 (+31.58%)

Mutual labels: scraping

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Stars: ✭ 198 (+247.37%)

Mutual labels: scraping

List Of User Agents

List of major web + mobile browser user agent strings. +1 Bonus script to scrape :)

Stars: ✭ 247 (+333.33%)

Mutual labels: scraping

Juriscraper

An API to scrape American court websites for metadata.

Stars: ✭ 194 (+240.35%)

Mutual labels: scraping

Pahe.ph-Scraper

Pahe.ph [Pahe.in] Movies Website Scraper

Stars: ✭ 57 (+0%)

Mutual labels: scraping

Linkedin Profile Scraper

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.

Stars: ✭ 171 (+200%)

Mutual labels: scraping

Memorious

Distributed crawling framework for documents and structured data.

Stars: ✭ 248 (+335.09%)

Mutual labels: scraping

Linkedin Learning Downloader

LinkedIn Learning video downloader

Stars: ✭ 171 (+200%)

Mutual labels: scraping

Requests Html

Pythonic HTML Parsing for Humans™

Stars: ✭ 12,268 (+21422.81%)

Mutual labels: scraping

google-scraper

This class can retrieve search results from Google.

Stars: ✭ 33 (-42.11%)

Mutual labels: scraping

Loconotion

📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

Stars: ✭ 237 (+315.79%)

Mutual labels: scraping

Secret Agent

The web browser that's built for scraping.

Stars: ✭ 151 (+164.91%)

Mutual labels: scraping

Xquery

Extract data or evaluate value from HTML/XML documents using XPath

Stars: ✭ 155 (+171.93%)

Mutual labels: scraping

Jsoup Annotations

Jsoup Annotations POJO

Stars: ✭ 242 (+324.56%)

Mutual labels: scraping

Serpscrap

SEO Python scraper to extract data from major search engine result pages. Extract data such as URL, title, snippet, rich snippet and result type from search results for given keywords. Detect ads or take automated screenshots. You can also fetch the text content of URLs found in search results or supplied by yourself. It's useful for SEO and business-related research tasks.

Stars: ✭ 153 (+168.42%)

Mutual labels: scraping

onionfruit

OnionFruit™ Connect - Tor access client with country selection, bridge configuration, pluggable transports and experimental DNS support

Stars: ✭ 150 (+163.16%)

Mutual labels: tor

pickall

.NET agile and extensible web searching API

Stars: ✭ 25 (-56.14%)

Mutual labels: scraping

garlicshare

Private and self-hosted file sharing over the Tor network written in golang

Stars: ✭ 110 (+92.98%)

Mutual labels: tor

Reaper

Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

Stars: ✭ 240 (+321.05%)

Mutual labels: scraping

Shadow Useragent

Pick the most common user-agents on the Internet 👻

Stars: ✭ 147 (+157.89%)

Mutual labels: scraping

Fantasy Basketball

Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for Draft Kings with genetic algorithm. Capstone Project for Machine Learning Engineer Nanodegree by Udacity.

Stars: ✭ 146 (+156.14%)

Mutual labels: scraping

Embed

Get info from any web service or page

Stars: ✭ 1,808 (+3071.93%)

Mutual labels: scraping

compose-scripts-tor

Compose scripts for Tor-based projects

Stars: ✭ 23 (-59.65%)

Mutual labels: tor

Scrapysharp

Revival of https://bitbucket.org/rflechner/scrapysharp

Stars: ✭ 226 (+296.49%)

Mutual labels: scraping

Educative.io Downloader

📖 This tool downloads courses from educative.io for offline use. It uses your login credentials to download the course.

Stars: ✭ 139 (+143.86%)

Mutual labels: scraping

Search Engine Google

🕷 Google client for SERPS

Stars: ✭ 138 (+142.11%)

Mutual labels: scraping

Arachnid

Crawl all unique internal links found on a given website, and extract SEO related information - supports javascript based sites

Stars: ✭ 224 (+292.98%)

Mutual labels: scraping

Udemycoursegrabber

Your will to enroll in Udemy course is here, but the money isn't? Search no more! This python program searches for your desired course in more than [insert big number here] websites, compares the last updated date, and gives you the download link of the latest one back, but you also have the choice to see the other ones as well!

Stars: ✭ 137 (+140.35%)

Mutual labels: scraping

Torchbear

🔥🐻 The Speakeasy Scripting Engine Which Combines Speed, Safety, and Simplicity

Stars: ✭ 128 (+124.56%)

Mutual labels: scraping

github-languages

Tiny little Ruby on Rails website that crawls through your public GitHub repos to find out what your favourite languages are.

Stars: ✭ 23 (-59.65%)

Mutual labels: scraping

1-60 of 563 similar projects