
Damian89 / Commoncrawlparser

License: MIT
Simple multi-threaded tool to extract domain-related data from commoncrawl.org

Programming Languages

python
139,335 projects (#7 most used programming language)

Projects that are alternatives of or similar to Commoncrawlparser

Osint tips
OSINT
Stars: ✭ 322 (+1188%)
Mutual labels:  osint, pentesting
Sn0int
Semi-automatic OSINT framework and package manager
Stars: ✭ 814 (+3156%)
Mutual labels:  osint, pentesting
Vault
swiss army knife for hackers
Stars: ✭ 346 (+1284%)
Mutual labels:  osint, pentesting
QuickScan
Port scanning and domain utility.
Stars: ✭ 26 (+4%)
Mutual labels:  osint, pentesting
Linkedin2username
OSINT Tool: Generate username lists for companies on LinkedIn
Stars: ✭ 504 (+1916%)
Mutual labels:  osint, pentesting
Dorknet
Selenium powered Python script to automate searching for vulnerable web apps.
Stars: ✭ 256 (+924%)
Mutual labels:  osint, pentesting
Sifter
Sifter aims to be a fully loaded Op Centre for Pentesters
Stars: ✭ 403 (+1512%)
Mutual labels:  osint, pentesting
Leakscraper
LeakScraper is an efficient set of tools to process and visualize huge text files containing credentials. These tools are designed to help penetration testers and red teamers doing OSINT by gathering credentials belonging to their target.
Stars: ✭ 227 (+808%)
Mutual labels:  osint, pentesting
Goohak
Automatically Launch Google Hacking Queries Against A Target Domain
Stars: ✭ 432 (+1628%)
Mutual labels:  osint, pentesting
Hosthunter
HostHunter is a recon tool for discovering hostnames using OSINT techniques.
Stars: ✭ 427 (+1608%)
Mutual labels:  osint, pentesting
quick-recon.py
Do some quick reconnaissance on a domain-based web-application
Stars: ✭ 13 (-48%)
Mutual labels:  osint, pentesting
Bigbountyrecon
BigBountyRecon tool utilises 58 different techniques using various Google dorks and open source tools to expedite the process of initial reconnaissance on the target organisation.
Stars: ✭ 541 (+2064%)
Mutual labels:  osint, pentesting
Cc.py
Extracting URLs of a specific target based on the results of "commoncrawl.org"
Stars: ✭ 250 (+900%)
Mutual labels:  osint, pentesting
Vajra
Vajra is a highly customizable, target- and scope-based automated web hacking framework that automates boring recon tasks and repeated scans across multiple targets during web application penetration testing.
Stars: ✭ 269 (+976%)
Mutual labels:  osint, pentesting
Rengine
reNgine is an automated reconnaissance framework for web applications, focused on a highly configurable, streamlined recon process via Engines, recon data correlation and organization, continuous monitoring, database backing, and a simple yet intuitive user interface. reNgine makes it easy for penetration testers to gather reconnaissance with…
Stars: ✭ 3,439 (+13656%)
Mutual labels:  osint, pentesting
Aiodnsbrute
Python 3.5+ DNS asynchronous brute force utility
Stars: ✭ 370 (+1380%)
Mutual labels:  osint, pentesting
Intrec Pack
Intelligence and Reconnaissance Package/Bundle installer.
Stars: ✭ 177 (+608%)
Mutual labels:  osint, pentesting
Mosint
An automated e-mail OSINT tool
Stars: ✭ 184 (+636%)
Mutual labels:  osint, pentesting
Metabigor
Intelligence tool but without API key
Stars: ✭ 424 (+1596%)
Mutual labels:  osint, pentesting
Bugcrowd Levelup Subdomain Enumeration
This repository contains all the material from the talk "Esoteric sub-domain enumeration techniques" given at Bugcrowd LevelUp 2017 virtual conference
Stars: ✭ 513 (+1952%)
Mutual labels:  osint, pentesting

cc.py

Simple multi-threaded tool to extract domain-related data from commoncrawl.org

Usage

ccp.py [-h] -d domain -o path [-t THREADS] [-f index1] [-f index2]

required arguments:
  -d, --domain   The domain you want to search for in CC data.
  -o, --outfile  The path and filename where you want the results to be saved to.

optional arguments:
  -h, --help     Show help message and exit
  -f, --filter   Use only indices which contain this string
  -t, --threads  Threads for requests
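
The options above map onto the Common Crawl index API at index.commoncrawl.org: each crawl index (e.g. CC-MAIN-2017-09) can be queried over HTTP for every URL captured under a domain. A minimal sketch of that query loop is shown below; the function names (`build_query_url`, `fetch_index`, `search`) are illustrative stand-ins, not ccp.py's actual internals:

```python
import json
from concurrent.futures import ThreadPoolExecutor

CDX_BASE = "https://index.commoncrawl.org"

def build_query_url(index_name, domain):
    """Build a CDX query URL matching the domain and all of its subdomains."""
    return f"{CDX_BASE}/{index_name}-index?url=*.{domain}/*&output=json"

def fetch_index(index_name, domain):
    """Fetch one crawl index and return the URLs it records for the domain."""
    import requests  # third-party dependency, listed under Dependencies
    resp = requests.get(build_query_url(index_name, domain), timeout=30)
    resp.raise_for_status()
    # With output=json, every response line is a standalone JSON record.
    return [json.loads(line)["url"] for line in resp.text.splitlines() if line]

def search(indices, domain, threads=4):
    """Query several indices concurrently, mirroring the -t option."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        batches = pool.map(lambda name: fetch_index(name, domain), indices)
    return sorted({url for batch in batches for url in batch})
```

The `-f` filter then simply selects which index names get passed into the concurrent search.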

Examples

Search for github.com and save to /home/folder/cc/data.txt

python3 ccp.py -d github.com -o /home/folder/cc/data.txt

Search for github.com in indices which contain "CC-MAIN-2017-09", save to data.txt

python3 ccp.py -d github.com -o ./data.txt -f CC-MAIN-2017-09

Search for github.com in indices which contain "2013" and "2014", save to data.txt

python3 ccp.py -d github.com -o ./data.txt -f 2014 -f 2013

Search for github.com using 10 threads, save to data.txt

python3 ccp.py -d github.com -o ./data.txt -t 10

grep tips

I am no grep expert, but I know how to extract data. If you have better solutions for my existing commands, or additional ideas for what to search for: PR.

  1. Find entries which end with a popular file extension indicating dynamic pages etc.:
grep -i -E '\.(php|asp|dev|jsp|wsdl|xml|cgi|json|html)$' /home/folder/cc/data.txt
  2. Find interesting files like backups, archives, log files...
grep -i -E '\.(zip|rar|tar|bkp|sql|bz2|gz|txt|bak|conf|log|error|debug|yml|lock|template|tpl)$' /home/folder/cc/data.txt
  3. Find entries which contain popular strings like "admin" etc.:
grep -i -E '(admin|account|debug|control|config|upload|system|secret|environment|dashboard)' /home/folder/cc/data.txt
  4. Find files which begin with "." (htaccess, ...):
grep -i -E '\/\.' /home/folder/cc/data.txt
  5. Find obvious backup files:
grep -i -E '(\.bkp|\.bak|backup|\.dump|\.sql)' /home/folder/cc/data.txt
  6. Extract subdomains:
sed -e 's|^[^/]*//||' -e 's|^www\.||' -e 's|/.*$||' /home/folder/cc/data.txt | grep -v ":" | grep -v "@" | grep -v "?" | grep -v "/" | sort -u
  7. Find URLs with parameters in them:
grep -i -E '[?&][^=&]+=' /home/folder/cc/data.txt | sort -u
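
The sed/grep pipeline for subdomain extraction above can be fragile on unusual URLs. A Python equivalent using the standard library's urllib.parse is easier to extend; this is a hypothetical standalone helper, not part of ccp.py:

```python
from urllib.parse import urlparse

def extract_subdomains(urls):
    """Return the unique hostnames found in a list of URLs,
    with a leading 'www.' stripped, sorted alphabetically."""
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname  # drops scheme, port, userinfo, path, query
        if host:
            hosts.add(host[4:] if host.startswith("www.") else host)
    return sorted(hosts)

urls = [
    "https://www.github.com/login",
    "https://gist.github.com/x?y=1",
    "http://api.github.com:8080/v3",
]
print(extract_subdomains(urls))  # ['api.github.com', 'gist.github.com', 'github.com']
```

Point it at the lines of data.txt instead of the sample list to reproduce tip 6.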

Dependencies

  • python3
  • requests
  • argparse (standard library)
  • json (standard library)

Information

This project was initially forked from cc.py, but since I refactored it completely and si9int's version took another direction, I decided to make it a standalone project.
