
rndinfosecguy / Scavenger

Licence: apache-2.0
Crawler (Bot) searching for credential leaks on different paste sites.


Scavenger - OSINT Bot


bot in action




Intro

Just the code of my OSINT bot searching for sensitive data leaks on different paste sites.

Search terms:

  • credentials
  • private RSA keys
  • WordPress configuration files
  • MySQL connect strings
  • onion links
  • links to files hosted inside the onion network (PDF, DOC, DOCX, XLS, XLSX)
  • SQL dumps
  • API keys
  • complete emails
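As a rough illustration of how two of these search terms could be matched, the sketch below uses regular expressions to find onion links and onion-hosted documents in a paste. The pattern and the function names are assumptions made for this example, not the bot's actual detection code.

```python
import re

# Onion hostnames are base32: 16 characters (v2) or 56 characters (v3).
ONION_URL = re.compile(
    r"https?://(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion(?:/[^\s\"'<>]*)?",
    re.IGNORECASE,
)
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx")


def find_onion_links(text):
    """Return all onion URLs found in the paste text."""
    return ONION_URL.findall(text)


def find_onion_documents(text):
    """Return only the onion URLs that point at a document file."""
    return [
        url for url in find_onion_links(text)
        if url.lower().split("?", 1)[0].endswith(DOC_EXTENSIONS)
    ]
```

A pattern this strict will miss links without a scheme (for example a bare hostname), so it trades recall for fewer false positives.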

Keep in mind:

  1. This bot is not beautiful. I wrote it quick and dirty and do not care about code conventions or other shit... I will never care about those things.

  2. The code is not complete so far. Some parts like integrating the credentials in a database are missing in this online repository.

  3. If you want to use this code, feel free to do so. Keep in mind you have to customize things to make it run on your system.

  4. I know that I have some false positives and I know that I miss some credentials. So if you think this is crap...ok. leave now. If you have ideas for a better detection, just let me know!

  5. And again: QUICK AND DIRTY! Do not expect nice code.

Articles About Scavenger

IMPORTANT

For pastebin.com the bot can be run in two major modes:

  • API mode
  • Scraping mode (using TOR)

I highly recommend using the API mode. It is the intended method of scraping pastes from Pastebin.com and it is only fair to use it. All you need is a Pastebin.com PRO account with your public IP whitelisted on their site.

To start the bot in API mode just run the program in the following way:

python3 run.py -0
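For orientation, here is a minimal sketch of what talking to the Pastebin scraping API can look like. The endpoint URL and the `key` and `scrape_url` fields are assumptions based on my understanding of Pastebin's scraping API; check Pastebin's own documentation before relying on them. The function names are hypothetical and do not come from this repository.

```python
import json
import urllib.parse
import urllib.request

# Assumption: listing endpoint of Pastebin's scraping API (PRO account,
# whitelisted IP). Verify against Pastebin's documentation.
SCRAPE_LISTING = "https://scrape.pastebin.com/api_scraping.php"


def listing_url(limit=100):
    """Build the URL that lists the most recent public pastes."""
    return SCRAPE_LISTING + "?" + urllib.parse.urlencode({"limit": limit})


def parse_listing(raw_json):
    """Map each paste key to the URL its raw content can be fetched from."""
    return {entry["key"]: entry["scrape_url"] for entry in json.loads(raw_json)}


def fetch_recent(limit=100, timeout=30):
    """Download the listing; this only works from a whitelisted IP."""
    with urllib.request.urlopen(listing_url(limit), timeout=timeout) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```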

However, it is not always possible to use this intended method. You might be behind NAT and therefore not have a public IP of your own, in which case whitelisting your IP is not reasonable. That is why I also implemented a scraping mode in which fast TOR cycles, combined with reasonable user agents, are used to avoid IP blocking and Cloudflare captchas.

To start the bot in scraping mode run it in the following way:

!!! THE TOR SCRAPING MODE DOES NOT WORK AT THE MOMENT !!!

python3 run.py -1

Important note: you need the TOR service installed on your system and listening on port 9050. Additionally, you need to add the following line to your /etc/tor/torrc file.

MaxCircuitDirtiness 30

This limits how long TOR reuses a circuit for new connections to 30 seconds (the default is 10 minutes), so the exit IP changes frequently.
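To make the setup concrete, the sketch below shows one way to send a request through the local TOR SOCKS proxy on port 9050 with a randomly chosen user agent. It is not the bot's own code: the user agent list and the function names are hypothetical, and it assumes the third-party `requests` package with SOCKS support is installed.

```python
import random

# Hypothetical user agent pool; the bot's real list is not shown in this README.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


def tor_proxies(host="127.0.0.1", port=9050):
    """Proxy settings that route traffic (and DNS lookups) through TOR."""
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def fetch_via_tor(url, timeout=30):
    """Fetch a URL through the local TOR SOCKS proxy with a random user agent."""
    # Third-party dependency; SOCKS support needs `pip install requests[socks]`.
    import requests

    headers = {"User-Agent": random.choice(USER_AGENTS)}
    response = requests.get(
        url, proxies=tor_proxies(), headers=headers, timeout=timeout
    )
    response.raise_for_status()
    return response.text
```

The `socks5h` scheme makes the proxy resolve hostnames too, so DNS lookups do not leak outside TOR.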

To start the module that scrapes random pastes from paste.org, just type in the following command:

python3 run.py -2

Usage

To learn how to use the software you just need to call the run.py script with the -h/--help argument.

python3 run.py -h

Output:


  _________
 /   _____/ ____ _____ ___  __ ____   ____    ____   ___________
 \_____  \_/ ___\\__  \\  \/ // __ \ /    \  / ___\_/ __ \_  __ \
 /        \  \___ / __ \\   /\  ___/|   |  \/ /_/  >  ___/|  | \/
/_______  /\___  >____  /\_/  \___  >___|  /\___  / \___  >__|
        \/     \/     \/          \/     \//_____/      \/

usage: run.py [-h] [-0] [-1] [-2]

Control software for the different modules of this paste crawler.

optional arguments:
  -h, --help            show this help message and exit
  -0, --pastebinCOMapi  Activate Pastebin.com module (using API)
  -1, --pastebinCOMtor  Activate Pastebin.com module (standard scraping using
                        TOR to avoid IP blocking)
  -2, --pasteORG        Activate Paste.org module

So far I implemented modules for the following paste sites:

  • Pastebin.com
  • Paste.org

If you want to monitor specific email addresses, just add them to the file notification_targets.txt, one per line. If the bot identifies one of these addresses on Pastebin, it writes the name of the paste and the corresponding password to the file notification_results.txt.
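The notification logic can be sketched roughly as follows. This is an illustration under assumptions, not the bot's implementation: it assumes leaked lines look like EMAIL:PASSWORD, the tab-separated output format is made up for the example, and all function names are hypothetical.

```python
from pathlib import Path


def load_targets(path="notification_targets.txt"):
    """Read one email address per line, ignoring blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def match_targets(paste_name, paste_text, targets):
    """Return (paste_name, email, password) for each watched address found."""
    hits = []
    for line in paste_text.splitlines():
        # Assumed line format: EMAIL:PASSWORD
        email, sep, password = line.strip().partition(":")
        if sep and email.lower() in targets:
            hits.append((paste_name, email, password))
    return hits


def record_hits(hits, path="notification_results.txt"):
    """Append the hits to the results file, one tab-separated entry per line."""
    with open(path, "a", encoding="utf-8") as handle:
        for paste_name, email, password in hits:
            handle.write(f"{paste_name}\t{email}\t{password}\n")
```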


To start only the Pastebin.com module (the first module I implemented), run:

python3 P_bot.py

Pastes are stored in data/raw_pastes until there are more than 48000 of them. Once that limit is exceeded, they are filtered, zipped and moved to the archive folder. All pastes which contain credentials are stored in data/files_with_passwords.
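A minimal sketch of such a rotation step is shown below. It only covers the zip-and-move part and leaves out the filtering. The archive file name and the function name are assumptions for illustration, not taken from the bot.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def archive_if_full(raw_dir="data/raw_pastes", archive_dir="archive", limit=48000):
    """Zip all raw pastes into the archive folder once more than `limit` exist.

    Returns the path of the new archive, or None if the limit is not exceeded.
    """
    raw = Path(raw_dir)
    pastes = sorted(p for p in raw.iterdir() if p.is_file())
    if len(pastes) <= limit:
        return None

    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one timestamped zip per rotation.
    target = archive / f"pastes_{datetime.now():%Y%m%d_%H%M%S}.zip"
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for paste in pastes:
            bundle.write(paste, arcname=paste.name)
    # Remove the originals only after the archive has been written completely.
    for paste in pastes:
        paste.unlink()
    return target
```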


Keep in mind that at the moment only combinations like USERNAME:PASSWORD and other simple combinations are detected. However, there is a tool to search for proxy logs containing credentials.
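To give an idea of what such a simple detection can look like, here is a sketch that matches lines of the form EMAIL:PASSWORD. The pattern is an assumption for illustration. The bot's real patterns are not shown in this README, and restricting the username to an email address is my simplification to reduce false positives.

```python
import re

# Assumption: a credential line is an email address, a colon, and a password
# of at least four non-whitespace characters, alone on its line.
CREDENTIAL = re.compile(
    r"^\s*([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}):(\S{4,})\s*$",
    re.MULTILINE,
)


def find_credentials(text):
    """Return (username, password) pairs found on their own lines."""
    return CREDENTIAL.findall(text)
```

A pattern like this still produces false positives and misses other layouts, such as credentials separated by a semicolon or a tab.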

You can search for proxy logs (URLs with username and password combinations) using the getProxyLogs.py file:

python3 getProxyLogs.py data/raw_pastes/
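What counts as a URL with credentials is not spelled out here, so the sketch below is only one possible interpretation: it looks for credentials either in the `user:pass@host` part of a URL or in query parameters. The parameter names and function names are hypothetical and not taken from getProxyLogs.py.

```python
import re
from urllib.parse import parse_qs, urlsplit

URL = re.compile(r"https?://[^\s\"'<>]+")
# Hypothetical parameter names; adjust them to what shows up in your logs.
USER_KEYS = {"user", "username", "login", "email"}
PASS_KEYS = {"pass", "password", "passwd", "pwd"}


def credentials_in_url(url):
    """Return (username, password) if the URL carries both, else None."""
    try:
        parts = urlsplit(url)
    except ValueError:  # malformed URL, e.g. broken IPv6 brackets
        return None
    if parts.username and parts.password:  # http://user:pass@host/ form
        return parts.username, parts.password
    query = {key.lower(): values[0] for key, values in parse_qs(parts.query).items()}
    user = next((query[k] for k in USER_KEYS if k in query), None)
    password = next((query[k] for k in PASS_KEYS if k in query), None)
    return (user, password) if user and password else None


def scan_log(text):
    """Yield (url, username, password) for every URL with credentials."""
    for url in URL.findall(text):
        found = credentials_in_url(url)
        if found:
            yield (url, *found)
```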

If you want to search the raw data for specific strings you can do it using searchRaw.py (really slow).

python3 searchRaw.py SEARCHSTRING

To see statistics of the bot just call

python3 status.py 

The file findSensitiveData.py searches a folder (with pastes) for sensitive data like credit cards, RSA keys or mysqli_connect strings. Keep in mind that this script uses grep and is therefore really slow on a large number of paste files. If you want to analyze a large number of pastes, I recommend an ELK stack.

python3 findSensitiveData.py data/raw_pastes/ 
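The kind of checks involved can be sketched in pure Python. Note that this example uses Python's `re` module instead of grep, and the patterns, the Luhn check for card numbers and the function names are assumptions for illustration rather than the script's actual rules.

```python
import re

PATTERNS = {
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
    "mysqli_connect": re.compile(r"mysqli_connect\s*\("),
    # 13 to 19 digits, optionally separated by single spaces or dashes.
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number):
    """Check a digit string against the Luhn checksum used by card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def classify(text):
    """Return the sorted names of all sensitive data types found in the text."""
    found = set()
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if name != "credit_card" or luhn_valid(match.group()):
                found.add(name)
    return sorted(found)
```

The Luhn check discards most random digit runs that merely look like card numbers, which helps with false positives.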

There are two scripts, stalk_user.py and stalk_user_wrapper.py, which can be used to monitor a specific Twitter user. Every tweet the user posts is saved and every URL it contains is downloaded. To start the stalker, just execute the wrapper.

python3 stalk_user_wrapper.py

To Do

I discovered other sites like Pastebin that allow reading the latest pastes and crawling them. I need to integrate them into my bot. If you know of additional sites that are worth a look, just let me know.

Examples:
https://slexy.org/recent
https://ghostbin.co 