563 open source projects that are alternatives to or similar to PythonScrapyBasicSetup

Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+385.96%)
Mutual labels:  scraping, web-scraping
Phpscraper
PHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (+159.65%)
Mutual labels:  scraping, web-scraping
browser-pool
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+24.56%)
Mutual labels:  scraping, web-scraping
ioweb
Web Scraping Framework
Stars: ✭ 31 (-45.61%)
Mutual labels:  scraping, web-scraping
torchestrator
Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (-43.86%)
Mutual labels:  scraping, tor
IMDB-Scraper
Scrapy project for scraping data from IMDB, with a movie dataset covering 58,623 movies.
Stars: ✭ 37 (-35.09%)
Mutual labels:  web-scraping, scrapy-framework
Scrape Linkedin Selenium
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.
Stars: ✭ 239 (+319.3%)
Mutual labels:  scraping, web-scraping
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-73.68%)
Mutual labels:  scraping, web-scraping
Autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+7052.63%)
Mutual labels:  scraping, web-scraping
selectorlib
A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them
Stars: ✭ 53 (-7.02%)
Mutual labels:  scraping, web-scraping
raspagem-de-dados-fatec
📓 Minicurso de raspagem de dados web com Python ministrado na Semana de Tecnologia da FATEC Jundiaí
Stars: ✭ 22 (-61.4%)
Mutual labels:  scraping, web-scraping
Katana
A Python tool for Google hacking
Stars: ✭ 355 (+522.81%)
Mutual labels:  scraping, tor
Sqrape
Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)
Stars: ✭ 144 (+152.63%)
Mutual labels:  scraping, web-scraping
Apify Js
Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+5433.33%)
Mutual labels:  scraping, web-scraping
Detect Cms
PHP Library for detecting CMS
Stars: ✭ 78 (+36.84%)
Mutual labels:  scraping, web-scraping
TorScrapper
A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)
Stars: ✭ 24 (-57.89%)
Mutual labels:  scraping, tor
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+1147.37%)
Mutual labels:  scraping, web-scraping
top-github-scraper
Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-29.82%)
Mutual labels:  scraping, web-scraping
Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+714.04%)
Mutual labels:  scraping, web-scraping
Humanoid
Node.js package to bypass CloudFlare's anti-bot JavaScript challenges
Stars: ✭ 88 (+54.39%)
Mutual labels:  scraping, web-scraping
Googlescraper
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Stars: ✭ 2,363 (+4045.61%)
Mutual labels:  scraping
UofT-Timetable-Generator
A web application that generates timetables for university students at the University of Toronto
Stars: ✭ 34 (-40.35%)
Mutual labels:  web-scraping
Idt
Image Dataset Tool (idt) is a cli tool designed to make the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.
Stars: ✭ 202 (+254.39%)
Mutual labels:  scraping
Jsonframe Cheerio
simple multi-level scraper json input/output for Cheerio
Stars: ✭ 196 (+243.86%)
Mutual labels:  scraping
concurrent-web-scraping
Building a Concurrent Web Scraper with Python and Selenium
Stars: ✭ 28 (-50.88%)
Mutual labels:  web-scraping
Musoq
Use SQL on various data sources
Stars: ✭ 252 (+342.11%)
Mutual labels:  scraping
Anime Dl
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Stars: ✭ 190 (+233.33%)
Mutual labels:  scraping
Panther
A browser testing and web crawling library for PHP and Symfony
Stars: ✭ 2,480 (+4250.88%)
Mutual labels:  scraping
wayback
⏪ Tools to Work with the Various Internet Archive Wayback Machine APIs
Stars: ✭ 52 (-8.77%)
Mutual labels:  web-scraping
Jikan Rest
The REST API for Jikan
Stars: ✭ 200 (+250.88%)
Mutual labels:  scraping
Whatsapp-Net
Generate a network graph of connections from your WhatsApp groups data
Stars: ✭ 75 (+31.58%)
Mutual labels:  scraping
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+247.37%)
Mutual labels:  scraping
List Of User Agents
List of major web + mobile browser user agent strings. +1 Bonus script to scrape :)
Stars: ✭ 247 (+333.33%)
Mutual labels:  scraping
Juriscraper
An API to scrape American court websites for metadata.
Stars: ✭ 194 (+240.35%)
Mutual labels:  scraping
Pahe.ph-Scraper
Pahe.ph [Pahe.in] Movies Website Scraper
Stars: ✭ 57 (+0%)
Mutual labels:  scraping
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+200%)
Mutual labels:  scraping
Memorious
Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+335.09%)
Mutual labels:  scraping
Linkedin Learning Downloader
Linkedin Learning videos downloader
Stars: ✭ 171 (+200%)
Mutual labels:  scraping
Requests Html
Pythonic HTML Parsing for Humans™
Stars: ✭ 12,268 (+21422.81%)
Mutual labels:  scraping
google-scraper
This class can retrieve search results from Google.
Stars: ✭ 33 (-42.11%)
Mutual labels:  scraping
Loconotion
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Stars: ✭ 237 (+315.79%)
Mutual labels:  scraping
Secret Agent
The web browser that's built for scraping.
Stars: ✭ 151 (+164.91%)
Mutual labels:  scraping
Xquery
Extract data or evaluate value from HTML/XML documents using XPath
Stars: ✭ 155 (+171.93%)
Mutual labels:  scraping
Jsoup Annotations
Jsoup Annotations POJO
Stars: ✭ 242 (+324.56%)
Mutual labels:  scraping
Serpscrap
SEO Python scraper to extract data from major search engine result pages. Extract data like URL, title, snippet, rich snippet and the type from search results for given keywords. Detect ads or make automated screenshots. You can also fetch the text content of URLs provided in search results or supplied yourself. It's useful for SEO and business-related research tasks.
Stars: ✭ 153 (+168.42%)
Mutual labels:  scraping
onionfruit
OnionFruit™ Connect - Tor access client with country selection, bridge configuration, pluggable transports and experimental DNS support
Stars: ✭ 150 (+163.16%)
Mutual labels:  tor
pickall
.NET agile and extensible web searching API
Stars: ✭ 25 (-56.14%)
Mutual labels:  scraping
garlicshare
Private and self-hosted file sharing over the Tor network written in golang
Stars: ✭ 110 (+92.98%)
Mutual labels:  tor
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+321.05%)
Mutual labels:  scraping
Shadow Useragent
Pick the most common user-agents on the Internet 👻
Stars: ✭ 147 (+157.89%)
Mutual labels:  scraping
Fantasy Basketball
Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for Draft Kings with genetic algorithm. Capstone Project for Machine Learning Engineer Nanodegree by Udacity.
Stars: ✭ 146 (+156.14%)
Mutual labels:  scraping
Embed
Get info from any web service or page
Stars: ✭ 1,808 (+3071.93%)
Mutual labels:  scraping
compose-scripts-tor
compose scripts for tor-based projects
Stars: ✭ 23 (-59.65%)
Mutual labels:  tor
Scrapysharp
reborn of https://bitbucket.org/rflechner/scrapysharp
Stars: ✭ 226 (+296.49%)
Mutual labels:  scraping
Educative.io Downloader
📖 This tool downloads courses from educative.io for offline usage. It uses your login credentials to download the course.
Stars: ✭ 139 (+143.86%)
Mutual labels:  scraping
Search Engine Google
🕷 Google client for SERPS
Stars: ✭ 138 (+142.11%)
Mutual labels:  scraping
Arachnid
Crawl all unique internal links found on a given website, and extract SEO related information - supports javascript based sites
Stars: ✭ 224 (+292.98%)
Mutual labels:  scraping
Udemycoursegrabber
Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course on more than [insert big number here] websites, compares the last updated dates, and gives you the download link of the latest one, with the option to see the other ones as well!
Stars: ✭ 137 (+140.35%)
Mutual labels:  scraping
Torchbear
🔥🐻 The Speakeasy Scripting Engine Which Combines Speed, Safety, and Simplicity
Stars: ✭ 128 (+124.56%)
Mutual labels:  scraping
github-languages
Tiny little Ruby on Rails website that crawls through your public GitHub repos to find out what your favourite languages are.
Stars: ✭ 23 (-59.65%)
Mutual labels:  scraping