ameenmaali / urldedupe

License: MIT
Pass in a list of URLs with query strings, get back a unique list of URLs and query string combinations

Programming Languages

C++

Projects that are alternatives of or similar to urldedupe

tugarecon
Pentest: subdomain enumeration tool for penetration testers.
Stars: ✭ 142 (-31.73%)
Mutual labels:  penetration-testing, infosec, bugbounty
Dirsearch
Web path scanner
Stars: ✭ 7,246 (+3383.65%)
Mutual labels:  penetration-testing, infosec, bugbounty
aquatone
A Tool for Domain Flyovers
Stars: ✭ 43 (-79.33%)
Mutual labels:  penetration-testing, infosec, bugbounty
Resources
A storehouse of resources related to bug bounty hunting collected from different sources. Latest guides, tools, methodology, platform tips, and tricks curated by us.
Stars: ✭ 62 (-70.19%)
Mutual labels:  penetration-testing, infosec, bugbounty
Rengine
reNgine is an automated reconnaissance framework for web applications with a focus on highly configurable streamlined recon process via Engines, recon data correlation and organization, continuous monitoring, backed by a database, and simple yet intuitive User Interface. reNgine makes it easy for penetration testers to gather reconnaissance with…
Stars: ✭ 3,439 (+1553.37%)
Mutual labels:  penetration-testing, infosec, bugbounty
Crithit
Takes a single wordlist item and tests it one by one over a large collection of websites before moving on to the next. Create signatures to cross-check vulnerabilities over multiple hosts.
Stars: ✭ 182 (-12.5%)
Mutual labels:  penetration-testing, infosec, bugbounty
S3Scan
Script to spider a website and find publicly open S3 buckets
Stars: ✭ 21 (-89.9%)
Mutual labels:  penetration-testing, infosec
py-scripts-other
A collection of some of my scripts
Stars: ✭ 79 (-62.02%)
Mutual labels:  infosec, bugbounty
crtfinder
Fast tool to extract all subdomains from crt.sh website. Output will be up to sub.sub.sub.subdomain.com with standard and advanced search techniques
Stars: ✭ 96 (-53.85%)
Mutual labels:  penetration-testing, bugbounty
vaf
Vaf is a cross-platform very advanced and fast web fuzzer written in nim
Stars: ✭ 294 (+41.35%)
Mutual labels:  penetration-testing, bugbounty
Awesome Bbht
A bash script that will automatically install a list of bug hunting tools that I find interesting for recon, exploitation, etc. (minus Burp), for Ubuntu/Debian.
Stars: ✭ 190 (-8.65%)
Mutual labels:  penetration-testing, bugbounty
h1-search
Tool that will request the public disclosures on a specific HackerOne program and show them in a localhost webserver.
Stars: ✭ 58 (-72.12%)
Mutual labels:  infosec, bugbounty
PyParser-CVE
Multi source CVE/exploit parser.
Stars: ✭ 25 (-87.98%)
Mutual labels:  penetration-testing, infosec
boxer
Boxer: A fast directory bruteforce tool written in Python with concurrency.
Stars: ✭ 15 (-92.79%)
Mutual labels:  penetration-testing, bugbounty
Cameradar
Cameradar hacks its way into RTSP video surveillance cameras
Stars: ✭ 2,775 (+1234.13%)
Mutual labels:  penetration-testing, infosec
fuzzmost
all manner of wordlists
Stars: ✭ 23 (-88.94%)
Mutual labels:  infosec, bugbounty
Wstg
The Web Security Testing Guide is a comprehensive Open Source guide to testing the security of web applications and web services.
Stars: ✭ 3,873 (+1762.02%)
Mutual labels:  penetration-testing, bugbounty
KaliIntelligenceSuite
Kali Intelligence Suite (KIS) aids in the fast, autonomous, central, and comprehensive collection of intelligence by executing standard penetration testing tools. The collected data is stored internally in a structured manner to allow fast identification and visualisation of the collected information.
Stars: ✭ 58 (-72.12%)
Mutual labels:  penetration-testing, bugbounty
rejig
Turn your VPS into an attack box
Stars: ✭ 33 (-84.13%)
Mutual labels:  infosec, bugbounty
magicRecon
MagicRecon is a powerful shell script that maximizes the recon and data collection process for a target and finds common vulnerabilities, saving all results in organized directories and in various formats.
Stars: ✭ 478 (+129.81%)
Mutual labels:  infosec, bugbounty

urldedupe

urldedupe is a tool that takes a list of URLs and quickly returns a deduplicated (unique) list of URL and query string combinations. This is useful to ensure your URL list doesn't contain hundreds of entries whose parameters differ only in their query string values. For an example run, take the following URL list passed in:

https://google.com
https://google.com/home?qs=value
https://google.com/home?qs=secondValue
https://google.com/home?qs=newValue&secondQs=anotherValue
https://google.com/home?qs=asd&secondQs=das

Passing it through urldedupe keeps only the unique URL and query string combinations (parameter values are ignored):

$ cat urls.txt | urldedupe
https://google.com
https://google.com/home?qs=value
https://google.com/home?qs=newValue&secondQs=anotherValue
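
Conceptually, this kind of deduplication reduces each URL to a key made of its base URL plus the query parameter names (dropping the values), and keeps only the first URL seen per key. Below is a minimal C++ sketch of that idea; it is illustrative only, not urldedupe's actual implementation, and sorting the parameter names so their order doesn't matter is an assumption on my part:

#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <unordered_set>

// Build a dedup key: everything before '?' plus the sorted query parameter
// names, with the values dropped.
static std::string dedupe_key(const std::string &url) {
    const auto qpos = url.find('?');
    if (qpos == std::string::npos)
        return url;
    std::string key = url.substr(0, qpos);
    std::set<std::string> names;  // sorted, so parameter order is irrelevant
    std::stringstream qs(url.substr(qpos + 1));
    std::string pair;
    while (std::getline(qs, pair, '&'))
        names.insert(pair.substr(0, pair.find('=')));  // keep name, drop value
    for (const auto &name : names)
        key += '&' + name;
    return key;
}

int main() {
    std::unordered_set<std::string> seen;
    std::string url;
    while (std::getline(std::cin, url))
        if (seen.insert(dedupe_key(url)).second)  // true only on first occurrence
            std::cout << url << '\n';
}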

It's also possible to deduplicate similar URLs with the -s|--similar flag, which collapses endpoints such as API endpoints with differing IDs, or assets:

$ cat urls.txt
https://site.com/api/users/123
https://site.com/api/users/222
https://site.com/api/users/412/profile
https://site.com/users/photos/photo.jpg
https://site.com/users/photos/myPhoto.jpg
https://site.com/users/photos/photo.png

Becomes:

$ cat urls.txt | urldedupe -s
https://site.com/api/users/123
https://site.com/api/users/412/profile
https://site.com/users/photos/photo.jpg
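
One plausible way to implement this (a hedged sketch; the real tool may differ) is to normalize each URL before keying on it: purely numeric path segments collapse to an {id} placeholder, and filenames with image/font extensions collapse to an {asset} placeholder, so URLs differing only in an ID or asset name map to the same key:

#include <iostream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

static bool is_number(const std::string &s) {
    return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
}

static bool is_asset(const std::string &s) {
    // Illustrative extension list; the real tool's list may differ.
    static const std::vector<std::string> exts =
        {".jpg", ".jpeg", ".png", ".gif", ".svg", ".woff", ".woff2", ".ttf"};
    for (const auto &ext : exts)
        if (s.size() > ext.size() &&
            s.compare(s.size() - ext.size(), ext.size(), ext) == 0)
            return true;
    return false;
}

// Normalize a URL so "similar" URLs share the same key.
static std::string similar_key(const std::string &url) {
    std::string key, segment;
    std::stringstream path(url);
    while (std::getline(path, segment, '/')) {
        if (is_number(segment))     segment = "{id}";     // /users/123 -> /users/{id}
        else if (is_asset(segment)) segment = "{asset}";  // photo.jpg -> {asset}
        key += segment + '/';
    }
    return key;
}

int main() {
    std::unordered_set<std::string> seen;
    std::string url;
    while (std::getline(std::cin, url))
        if (seen.insert(similar_key(url)).second)
            std::cout << url << '\n';
}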

Why C++? Because it's super fast?!?! No, not really. I'm working on my C++ skills and mostly just wanted to create a real-world C++ project as opposed to educational work.

Installation

Use the binary already compiled within the repository... or, better yet, rather than run a random binary from me (who could be very shady), compile from source:

You'll need cmake installed and a compiler that supports C++17 or higher.

Clone the repository & navigate to it:

git clone https://github.com/ameenmaali/urldedupe.git
cd urldedupe

In the urldedupe directory, run:

cmake CMakeLists.txt

If you don't have cmake installed, install it first. On macOS:

brew install cmake

Run make:

make

The urldedupe binary should now be created in the same directory. For convenient use, move it to a directory on your PATH (such as your bin directory).

Usage

urldedupe takes URLs from stdin, or from a file with the -u flag. You will most likely want them in a file, such as:

$ cat urls.txt
https://google.com/home/?q=2&d=asd
https://my.site/profile?param1=1&param2=2
https://my.site/profile?param3=3

Help

$ ./urldedupe -h
(-h|--help) - Usage/help info for urldedupe
(-u|--urls) - Filename containing urls (use this if you don't pipe urls via stdin)
(-V|--version) - Get current version for urldedupe
(-r|--regex-parse) - This is significantly slower than normal parsing, but may be more thorough or accurate
(-s|--similar) - Remove similar URLs (based on integers and image/font files) - i.e. /api/user/1 & /api/user/2 deduplicated
(-qs|--query-strings-only) - Only include URLs if they have query strings
(-ne|--no-extensions) - Do not include URLs if they have an extension (i.e. .png, .jpg, .woff, .js, .html)
(-m|--mode) - The mode/filters to be enabled (can be 1 or more, comma separated). Default is none, available options are the other flags (--mode "r,s,qs,ne")
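
As a rough illustration of what the -qs and -ne filters decide (a sketch under assumptions, not urldedupe's code; the extension list below is taken from the help text above):

#include <iostream>
#include <string>
#include <vector>

// -qs: keep only URLs that carry a query string.
static bool has_query_string(const std::string &url) {
    return url.find('?') != std::string::npos;
}

// -ne: drop URLs whose path ends in a known file extension.
static bool has_extension(const std::string &url) {
    static const std::vector<std::string> exts =
        {".png", ".jpg", ".woff", ".js", ".html"};
    const std::string path = url.substr(0, url.find('?'));  // strip query string
    for (const auto &ext : exts)
        if (path.size() >= ext.size() &&
            path.compare(path.size() - ext.size(), ext.size(), ext) == 0)
            return true;
    return false;
}

int main() {
    std::string url;
    while (std::getline(std::cin, url))
        if (has_query_string(url) && !has_extension(url))  // -qs and -ne together
            std::cout << url << '\n';
}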

Examples

It's very simple: pass URLs from stdin or with the -u flag:

./urldedupe -u urls.txt

After moving the urldedupe binary to your bin directory, pass in a list from stdin and save the results to a file:

cat urls.txt | urldedupe > deduped_urls.txt

Deduplicate similar URLs with the -s|--similar flag, such as API endpoints with different IDs, or assets:

$ cat urls.txt
https://site.com/api/users/123
https://site.com/api/users/222
https://site.com/api/users/412/profile
https://site.com/users/photos/photo.jpg
https://site.com/users/photos/myPhoto.jpg
https://site.com/users/photos/photo.png

Becomes:

$ cat urls.txt | urldedupe -s
https://site.com/api/users/123
https://site.com/api/users/412/profile
https://site.com/users/photos/photo.jpg

For all the bug bounty hunters: I recommend chaining urldedupe with tools such as waybackurls or gau to get back only unique URLs, as those sources are prone to returning many similar or duplicated URLs:

cat wayback_urls.txt | urldedupe > deduped_urls.txt

For maximum thoroughness (usually not necessary), you can use an RFC-compliant regex for URL parsing, but it is significantly slower on large data sets:

cat urls.txt | urldedupe -r > deduped_urls_regex.txt
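
To see why the regex route costs more, consider what an RFC-style parse looks like. The sketch below uses the well-known URL-splitting pattern from RFC 3986, Appendix B, with std::regex; this is illustrative only (urldedupe's actual regex may differ), and std::regex is notoriously slow compared to plain string scanning:

#include <iostream>
#include <regex>
#include <string>

int main() {
    // URL-splitting pattern from RFC 3986, Appendix B.
    // Capture groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    static const std::regex rfc3986(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");

    std::string url;
    std::smatch m;
    while (std::getline(std::cin, url))
        if (std::regex_match(url, m, rfc3986))
            std::cout << "scheme=" << m[2] << " host=" << m[4]
                      << " path=" << m[5] << " query=" << m[7] << '\n';
}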

Alternatively, use -m|--mode with the flag values you'd like to run with. For example, say you want URLs deduplicated based on similarity, restricted to those with query strings, and excluding those with extensions...

Instead of:

urldedupe -u urls.txt -s -qs -ne

you can do:

urldedupe -u urls.txt -m "s,qs,ne"
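
Internally, expanding the mode string is just splitting on commas and mapping each token to its flag. Here's a hypothetical sketch; the Options struct and its field names are made up for illustration, only the token names come from the help output above:

#include <iostream>
#include <sstream>
#include <string>

// Hypothetical options struct; field names are illustrative.
struct Options {
    bool regex_parse = false;        // r
    bool similar = false;            // s
    bool query_strings_only = false; // qs
    bool no_extensions = false;      // ne
};

static Options parse_mode(const std::string &mode) {
    Options opts;
    std::stringstream ss(mode);
    std::string token;
    while (std::getline(ss, token, ',')) {
        if (token == "r")       opts.regex_parse = true;
        else if (token == "s")  opts.similar = true;
        else if (token == "qs") opts.query_strings_only = true;
        else if (token == "ne") opts.no_extensions = true;
        else std::cerr << "unknown mode: " << token << '\n';
    }
    return opts;
}

int main() {
    const Options opts = parse_mode("s,qs,ne");
    std::cout << opts.similar << opts.query_strings_only
              << opts.no_extensions << '\n';  // prints 111
}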
