findcdn

findcdn is a tool that scans a domain and detects which Content Distribution Network (CDN) it uses. findcdn can save results to a file, be used as a module, or just output to stdout.

findcdn helps users accurately determine which CDN a domain is using. The supported CDN domains are listed in the cdn_config.py file in the repository. findcdn can be used as a standalone tool or as an importable module in your project. In both cases, the results can be written to an output file.

Getting Started

findcdn requires Python 3.7+. Python 2 is not supported.

findcdn can be installed as a module using pip and the requirements.txt file in the repository.

Installed as a Module

From a clone of the repository, install with pip:

pip install --requirement requirements.txt

It can then be run directly:

findcdn list github.com

Alternatively, you can install it directly from GitHub into any project:

pip install git+https://github.com/cisagov/findcdn.git

Note: We recommend installing modules in a Python virtual environment to keep your system environment clean. To do so, you will need pyenv and the pyenv-virtualenv plugin before installing the module.

Standalone Usage and Examples

findcdn file <fileIn> [options]
findcdn list  <domain>... [options]
findcdn (-h | --help)

findcdn -h
findcdn file domains.txt -o output_cdn.txt -t 17 -d
findcdn list dhs.gov cisa.gov -o output_cdn.txt -v
findcdn list cisa.gov
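
The file subcommand reads its domains from an input file. The following is a minimal sketch for preparing such a file and assembling the matching command line. It assumes the input file holds one domain per line, which is not stated here; the helper names write_domain_file and build_command are hypothetical and not part of findcdn. The flags used are the ones documented in this README.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_domain_file(domains: Iterable[str], path: Path) -> Path:
    """Write unique, non-empty domains to path, one per line (assumed format)."""
    seen: List[str] = []
    for domain in domains:
        domain = domain.strip().lower()
        if domain and domain not in seen:
            seen.append(domain)
    path.write_text("\n".join(seen) + "\n", encoding="utf-8")
    return path


def build_command(file_in: Path, output: str, threads: int) -> List[str]:
    """Assemble a `findcdn file` invocation as an argument list."""
    return ["findcdn", "file", str(file_in), "-o", output, "-t", str(threads), "-d"]


# Write the file into a scratch directory so nothing in the project is touched.
workdir = Path(tempfile.mkdtemp())
domain_file = write_domain_file(
    ["dhs.gov", "cisa.gov", "DHS.gov", ""], workdir / "domains.txt"
)
command = build_command(domain_file, "output_cdn.txt", 17)
print(" ".join(command))
```

The sketch only prints the command. To run it, the argument list could be passed to subprocess.run once findcdn is installed.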

Options

  -h --help                    Show this message.
  --version                    Show the current version.
  -o FILE --output=FILE        If specified, then the JSON output file will be
                               created at the specified value.
  -v --verbose                 Includes additional print statements.
  --all                        Includes domains with and without a CDN
                               in output.
  -d --double                  Run the checks twice to increase accuracy.
  -t --threads=<thread_count>  Number of threads, otherwise use default.
  --timeout=<timeout>          Max duration in seconds to wait for a domain to
                               conclude processing, otherwise use default.
  --user_agent=<user_agent>    Set the user agent to use, otherwise
                               use default.

Sample Output

user2@ubuntu:~$ findcdn list asu.edu -t 7 --double
Using 7 threads.
[Pending: 0 jobs]==[Threads: 2]: 100%|███████████████████████████████| 2/2 [00:00<00:00,  2.22it/s]
{
    "date": "06/19/2020, 13:00:38",
    "CDN_count": "1",
    "domains": {
        "asu.edu": {
            "IP": "'104.16.50.14'",
            "cdns": "'.cloudflare.com'",
            "cdns_by_names": "'Cloudflare'"
        }
    }
}
Domain processing completed.
1 domains had CDN's out of 1.

Library Usage

Because findcdn can be installed as a module, its CDN detection can be called from any project. First, install findcdn as described above and import it into your project. Then, pass it a list of domains to analyze. After processing, it returns a JSON string containing the input domains and the CDNs they use.

The behavior of findcdn is controlled by the options passed in; the available options and an example are shown below:

findcdn.main(
    domain_list: List[str],  # List of domains to search
    output_path: str = None,  # If included, write results as JSON to this path
    verbose: bool = False,  # Verbose mode (more printing!)
    all_domains: bool = False,  # Include domains that don't have CDNs in the output
    interactive: bool = False,  # Show a progress bar (normally used for command line)
    double_in: bool = False,  # Double the number of tries on a domain to increase accuracy
    threads: int = THREADS,  # Number of threads to use
    timeout: int = TIMEOUT,  # How long to wait on a domain, in seconds
    user_agent: str = USER_AGENT,  # User agent to use
)

Example

import findcdn
import json

domains = ['google.com', 'cisa.gov', 'censys.io', 'yahoo.com', 'pbs.org', 'github.com']
resp_json = findcdn.main(domains, output_path="output.json", double_in=True, threads=23)

# findcdn.main returns a JSON string; parse it into a dict
results = json.loads(resp_json)

for domain in results['domains']:
    print(f"{domain} has CDNs:\n {results['domains'][domain]['cdns']}")
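
Once the response is parsed, it is often useful to invert it and list the domains behind each CDN. The following is a minimal sketch, not part of findcdn: the function domains_by_cdn is hypothetical, and resp_json here is a stand-in string shaped like the sample output in this README (the second domain and its IP are made up for illustration). It also assumes that multiple CDN names in one value are separated by commas and wrapped in single quotes.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Stand-in for the string returned by findcdn.main(); no network access needed.
resp_json = json.dumps({
    "date": "06/19/2020, 13:00:38",
    "CDN_count": "2",
    "domains": {
        "asu.edu": {
            "IP": "'104.16.50.14'",
            "cdns": "'.cloudflare.com'",
            "cdns_by_names": "'Cloudflare'",
        },
        "example.org": {  # illustrative entry, not real scan data
            "IP": "'192.0.2.10'",
            "cdns": "'.cloudflare.com'",
            "cdns_by_names": "'Cloudflare'",
        },
    },
})


def domains_by_cdn(raw: str) -> Dict[str, List[str]]:
    """Map each CDN name to the domains that reported it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for domain, info in json.loads(raw)["domains"].items():
        # Values look like "'Cloudflare'"; the comma separator is an assumption.
        for name in info["cdns_by_names"].split(","):
            name = name.strip().strip("'")
            if name:
                grouped[name].append(domain)
    return dict(grouped)


print(domains_by_cdn(resp_json))
```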

How It Works

findcdn is broken into three sections:

- findcdn's main runner file
  - Validates and organizes the input domains
  - Orchestrates the use of the CDN Engine using a list of domains
  - Outputs each domain's CDNs as JSON to stdout and, if selected, to a file
- The CDN Engine
  - Organizes all domains into a "pot"
  - Chef uses the CDN Detection library to obtain all CDNs for each domain
  - Chef then runs an analysis that sets the boolean has_cdn value when it detects that a domain has a CDN, then returns the list of domains to the runner file
- CDN Detection
  - Scrapes data from:
    - HTTPS Server Headers
    - CNAME records
    - WHOIS data
  - For each of these sources, it runs a fingerprint scan to identify any CDNs defined in cdn_config.py.
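
The engine and detection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not findcdn's actual code: the FINGERPRINTS table is a hypothetical stand-in for cdn_config.py, the Domain class and the functions fingerprint and run_checks are made-up names, and the evidence (headers, CNAMEs, WHOIS text) is supplied by hand instead of being scraped from the network.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-in for cdn_config.py: domain fragment -> CDN name.
FINGERPRINTS: Dict[str, str] = {
    ".cloudflare.com": "Cloudflare",
    ".akamaiedge.net": "Akamai",
}


@dataclass
class Domain:
    name: str
    # Pre-collected evidence; the real tool gathers this over the network.
    headers: List[str] = field(default_factory=list)
    cnames: List[str] = field(default_factory=list)
    whois: str = ""
    cdns: List[str] = field(default_factory=list)
    has_cdn: bool = False


def fingerprint(domain: Domain) -> Domain:
    """Match every piece of evidence against the known CDN fragments."""
    evidence = domain.headers + domain.cnames + [domain.whois]
    for fragment, cdn_name in FINGERPRINTS.items():
        needle = fragment.lower()
        if any(needle in item.lower() for item in evidence):
            if cdn_name not in domain.cdns:
                domain.cdns.append(cdn_name)
    # Set the boolean flag once all sources have been checked.
    domain.has_cdn = bool(domain.cdns)
    return domain


def run_checks(pot: List[Domain], threads: int = 4) -> List[Domain]:
    """Process the whole "pot" of domains on a small thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fingerprint, pot))


pot = [
    Domain("example.org", cnames=["example.org.cdn.cloudflare.com"]),
    Domain("example.net", headers=["nginx"]),
]
results = run_checks(pot)
for d in results:
    print(d.name, d.has_cdn, d.cdns)
```

Substring matching keeps the sketch short; how the real fingerprint scan compares evidence against cdn_config.py may differ.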

History

This tool's original purpose was to automatically detect if a domain that uses a CDN is frontable. Due to the significant overhead for fronting detection, we pivoted the development of this tool to be exclusively a CDN detection tool. We use our wiki to describe further what Domain Fronting is, our research notes, design decisions, and playbooks for fronting specific domains. If you find any additional frontable domains or any flaws in the current playbooks, please consider contributing!

Project Change Summary

- The project is now for CDN detection.
  - Determining frontability was infeasible for every CDN provider.
- Resources, notes, and playbooks are now available in the wiki of this repository.
- Any feedback, improvements, or additional playbooks are always appreciated.

More Information

There is more information located on our wiki page. We encourage you to make feature requests or contribute anything you would like to see added into the project or wiki.

This can be any of the following:

- Information pertaining to domain fronting
- Playbooks for fronting different domains
- Better detection methods for CDN
- General updates to current wiki pages

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

License

This project is in the worldwide public domain.

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.
