
little-endian-0x01 / TorScrapper

Licence: other
A scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)

Programming Languages

python

Projects that are alternatives of or similar to TorScrapper

Scraper-Projects
🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (+4.17%)
Mutual labels:  scraper, scraping, beautifulsoup
Katana
A Python Tool For google Hacking
Stars: ✭ 355 (+1379.17%)
Mutual labels:  scraper, scraping, tor
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-37.5%)
Mutual labels:  scraper, scraping
copycat
A PHP Scraping Class
Stars: ✭ 70 (+191.67%)
Mutual labels:  scraper, scraping
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+116.67%)
Mutual labels:  scraper, scraping
ha-multiscrape
Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
Stars: ✭ 103 (+329.17%)
Mutual labels:  scraper, scraping
Instagram-to-discord
Monitor instagram user account and automatically post new images to discord channel via a webhook. Working 2022!
Stars: ✭ 113 (+370.83%)
Mutual labels:  scraper, scraping
angel.co-companies-list-scraping
No description or website provided.
Stars: ✭ 54 (+125%)
Mutual labels:  scraper, scraping
BookingScraper
🌎 🏨 Scrape Booking.com 🏨 🌎
Stars: ✭ 68 (+183.33%)
Mutual labels:  scraper, beautifulsoup
Captcha-Tools
All-in-one Python (And now Go!) module to help solve captchas with Capmonster, 2captcha and Anticaptcha API's!
Stars: ✭ 23 (-4.17%)
Mutual labels:  scraper, scraping
peeling-onions
A repository to store Deep Web (onion domain) crawler, scraper, and NLP tools for Tor network.
Stars: ✭ 18 (-25%)
Mutual labels:  scraper, tor
scrapman
Retrieve real (with Javascript executed) HTML code from an URL, ultra fast and supports multiple parallel loading of webs
Stars: ✭ 21 (-12.5%)
Mutual labels:  scraper, scraping
html-table-extractor
extract data from html table
Stars: ✭ 74 (+208.33%)
Mutual labels:  scraping, beautifulsoup
proxycrawl-python
ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+112.5%)
Mutual labels:  scraper, scraping
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+120.83%)
Mutual labels:  scraper, scraping
torchestrator
Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (+33.33%)
Mutual labels:  scraping, tor
scrapy facebooker
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-8.33%)
Mutual labels:  scraper, scraping
gochanges
**[ARCHIVED]** website changes tracker 🔍
Stars: ✭ 12 (-50%)
Mutual labels:  scraper, scraping
scrapers
scrapers for building your own image databases
Stars: ✭ 46 (+91.67%)
Mutual labels:  scraper, scraping
document-dl
Command line program to download documents from web portals.
Stars: ✭ 14 (-41.67%)
Mutual labels:  scraper, scraping

TorScrapper

A basic scraper written in Python, with BeautifulSoup and Tor support, that can:

  • Scrape onion and normal links.
  • Save the output in HTML format in the Output folder.
  • Filter the HTML output and keep only the useful data (work in progress).
  • Strip out IOCs (indicators of compromise) and other related data (on the to-do list).
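The first two bullets (fetching a page through Tor and saving it as HTML) can be sketched as follows. This is a minimal illustration, not TorScrapper's actual code. It assumes Tor's SOCKS proxy is listening on its default port 9050 and that the requests library is installed with SOCKS support (pip install requests[socks]). The project itself may fetch pages differently. The names fetch_via_tor, output_filename and save_html are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit

# Assumption: Tor's SOCKS proxy runs on its default port, 9050.
# The "socks5h" scheme lets the proxy resolve hostnames, which .onion addresses require.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def output_filename(url: str) -> str:
    """Turn a URL into a safe file name, e.g. 'example.onion_index.html'."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    # Replace anything that is not a safe filename character with '_'.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_") or "page"
    return safe + ".html"


def save_html(url: str, html: str, out_dir: Path = Path("Output")) -> Path:
    """Write the raw HTML of one page into the output folder (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / output_filename(url)
    target.write_text(html, encoding="utf-8")
    return target


def fetch_via_tor(url: str, timeout: float = 60.0) -> str:
    """Download a page through Tor (assumes `pip install requests[socks]`)."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()
    return response.text
```

For example, save_html(url, fetch_via_tor(url)) downloads one page and writes it under Output/. The socks5h scheme is deliberate: it hands DNS resolution to Tor, because .onion hostnames cannot be resolved by ordinary DNS.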

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See Deployment for the steps to run the scraper.

Prerequisites

  • You will need Python 3 to run this project. Open a terminal and run the following commands, or download Python 3 from the official Python website.
[sudo] apt-get install python3 python3-dev
[sudo] pip3 install -r requirements.txt

TL;DR: We recommend installing TorScrapper inside a virtual environment on all platforms.

Python packages can be installed either globally (a.k.a. system-wide) or in user space. We do not recommend installing TorScrapper system-wide.

Instead, we recommend that you install TorScrapper within a so-called “virtual environment” (virtualenv). Virtualenvs let you avoid conflicts with already-installed Python system packages (which could break some of your system tools and scripts) while still installing packages normally with pip (without sudo and the like).

To get started with virtual environments, see the virtualenv installation instructions. Installing virtualenv itself globally actually helps here, and it should be a matter of running:

[sudo] pip install virtualenv
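The README relies on virtualenv. As an alternative, Python 3's standard-library venv module does the same job without an extra install. The sketch below roughly mirrors python3 -m venv venv followed by venv/bin/pip install -r requirements.txt. The helpers create_env and venv_python are hypothetical and not part of TorScrapper. On Debian/Ubuntu, bootstrapping pip into the environment may require the python3-venv package.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Location of the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path, requirements: Path, install: bool = True) -> Path:
    """Create a virtualenv with the stdlib venv module and install the requirements."""
    # with_pip bootstraps pip into the new environment via ensurepip.
    venv.create(env_dir, with_pip=install)
    python = venv_python(env_dir)
    if install and requirements.exists():
        # Running pip through the environment's own interpreter needs no sudo.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

Calling create_env(Path("venv"), Path("requirements.txt")) from the project directory creates ./venv and installs the dependencies into it. You can then start the scraper with venv/bin/python TorScrapper.py instead of the system interpreter.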

Basic setup

Before you run TorScrapper, make sure the following things are done properly:

  • Start the Tor service: sudo service tor start

  • Generate a hashed password for Tor and copy the hash it prints: tor --hash-password "my_password"

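  • Put the password in /Modules/Scrape.py. Note that authenticate() takes the plain-text password, not its hash:
    from stem import Signal
    from stem.control import Controller
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password="my_password")  # plain-text password
        controller.signal(Signal.NEWNYM)  # ask Tor for a new identity (fresh circuits)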

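  • Edit /etc/tor/torrc: uncomment ControlPort 9051, uncomment HashedControlPassword and set its value to the hash generated above, then restart Tor with sudo service tor restart.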

Read more about torrc here: Torrc
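A forgotten uncomment in torrc is an easy mistake, so a small check can help. This sketch assumes torrc's usual layout of one "Keyword value" option per line, with "#" starting a comment. It ignores Tor's quoting and escaping rules. The names active_torrc_options and control_port_ready are hypothetical.

```python
from __future__ import annotations


def active_torrc_options(text: str) -> dict[str, list[str]]:
    """Map each uncommented torrc keyword to the values it is given."""
    options: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue  # skip blank or comment-only lines
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        options.setdefault(parts[0], []).append(value)
    return options


def control_port_ready(text: str, port: str = "9051") -> bool:
    """True if ControlPort is enabled on `port` and a hashed password is set."""
    opts = active_torrc_options(text)
    return port in opts.get("ControlPort", []) and bool(opts.get("HashedControlPassword"))
```

For example, control_port_ready(open("/etc/tor/torrc").read()) should return True once both ControlPort and HashedControlPassword are uncommented.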

Deployment

Follow these steps to get the project running:

  • Enter the project directory.
  • Copy all the onion and normal links you want to scrape into onions.txt.
[nano]/[vim]/[gedit]/[Your choice of editor] onions.txt
  • Run TorScrapper.py using Python3
[sudo] python3 TorScrapper.py
  • Check the scraped output in the Output folder.
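The README only says to copy the links into onions.txt. The sketch below assumes one URL per line and shows how such a file might be read, and how onion links could be told apart from normal ones. The helpers read_links and is_onion are hypothetical, not functions from TorScrapper.py.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlsplit


def read_links(path: Path) -> list[str]:
    """Read one URL per line, skipping blank lines and duplicates (order kept)."""
    seen: set[str] = set()
    links: list[str] = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        url = raw.strip()
        if url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def is_onion(url: str) -> bool:
    """True for Tor hidden-service URLs, i.e. hosts ending in '.onion'."""
    if "://" not in url:
        url = "http://" + url  # assumption: bare hosts are treated as http
    host = urlsplit(url).hostname or ""
    return host.endswith(".onion")
```

Splitting the links this way makes it easy to route only the .onion ones through Tor, although the README states that TorScrapper handles both kinds.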

Built With

  • Python - Python programming language.
  • Tor - If you don't know about Tor then you probably shouldn't be here :)
  • BeautifulSoup - Beautiful Soup is a Python library for pulling data out of HTML and XML files.

Contributing

If you have new ideas worth implementing, open a new issue with the title [FEATURE_REQUEST]. If the idea gets implemented, congratz, you are now a contributor.

Versioning

Version 1.something Mehh...

Authors

  • Shivam Kapoor - An avid learner who likes to know every tiny detail of how real-life systems work, and a real enthusiast of cyber security and the underlying networking concepts.

License

Too lazy to decide on a License. zZzZ
