disclose / diosts

Licence: MIT license
A Go scraper that validates security.txt files and outputs them in the disclose.io JSON format.

Programming Languages

go

Projects that are alternatives of or similar to diosts

youtube-playlist
❄️ Extract links, ids, and names from a youtube playlist
Stars: ✭ 73 (+305.56%)
Mutual labels:  scraper
Linkedin-Client
Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (+133.33%)
Mutual labels:  scraper
go-jd
Automatic login to the JD.com app and automatic ordering of online products
Stars: ✭ 158 (+777.78%)
Mutual labels:  scraper
TelegramScraper
Using this tool, you can easily add many members from any group to your group in less than 2 minutes. Super easy. Time saver. This tool is for educational purposes only. You could be banned from Telegram, so be careful. Recommended for use only on Termux.
Stars: ✭ 234 (+1200%)
Mutual labels:  scraper
proxycrawl-python
ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+183.33%)
Mutual labels:  scraper
freeDictionaryAPI
There was no free Dictionary API on the web when I wanted one for my friend, so I created one.
Stars: ✭ 1,352 (+7411.11%)
Mutual labels:  scraper
civic-scraper
Tools for downloading agendas, minutes and other documents produced by local government
Stars: ✭ 21 (+16.67%)
Mutual labels:  scraper
angel.co-companies-list-scraping
No description or website provided.
Stars: ✭ 54 (+200%)
Mutual labels:  scraper
gutenberg
Scraper for downloading the entire ebooks repository of project Gutenberg
Stars: ✭ 100 (+455.56%)
Mutual labels:  scraper
youtube
Create a ZIM file from a Youtube channel/username/playlist
Stars: ✭ 25 (+38.89%)
Mutual labels:  scraper
Instagram-to-discord
Monitor instagram user account and automatically post new images to discord channel via a webhook. Working 2022!
Stars: ✭ 113 (+527.78%)
Mutual labels:  scraper
scraper
Node.js based scraper using headless chrome
Stars: ✭ 45 (+150%)
Mutual labels:  scraper
premeStock
Monitors for restocks
Stars: ✭ 53 (+194.44%)
Mutual labels:  scraper
LeetCode
At present contains scraped data from around 1500 problems present on the site. More to follow....
Stars: ✭ 45 (+150%)
Mutual labels:  scraper
copycat
A PHP Scraping Class
Stars: ✭ 70 (+288.89%)
Mutual labels:  scraper
ha-multiscrape
Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
Stars: ✭ 103 (+472.22%)
Mutual labels:  scraper
leumi-leumicard-bank-data-scraper
Open bank data for Leumi bank and Leumi card credit card
Stars: ✭ 28 (+55.56%)
Mutual labels:  scraper
esaj
Scrapers for many e-SAJ systems
Stars: ✭ 35 (+94.44%)
Mutual labels:  scraper
oge
Page metadata as a service
Stars: ✭ 22 (+22.22%)
Mutual labels:  scraper
wikipedia for humans
No description or website provided.
Stars: ✭ 44 (+144.44%)
Mutual labels:  scraper

diosts

The disclose.io security.txt scraper (diosts) takes a list of domains as input, retrieves and validates each domain's security.txt if one is available, and outputs the result in the disclose.io JSON format.

Installation

Prerequisites: a working Go installation, version 1.13 or later

go get github.com/disclose/diosts/cmd/diosts

Usage

cat domains.txt | ~/go/bin/diosts -t <threads> 2>diosts.log >securitytxt.json

This will try to scrape the security.txt from each domain listed in domains.txt, using <threads> parallel threads (default: 8). Log output, with information on each input domain, goes to stderr and is therefore written to diosts.log. A JSON array of the retrieved security.txt information in disclose.io format goes to stdout and is written to securitytxt.json.
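The pipeline described here (domains in, a fixed number of parallel workers, logs on stderr, one JSON array on stdout) can be sketched as follows. This is a minimal illustration and not the diosts source: the `entry` type, the `scrapeAll` function and the placeholder scraper are hypothetical names, and the single `domain` field stands in for the real disclose.io record, which has more fields.

```go
package main

import (
	"bufio"
	"encoding/json"
	"flag"
	"log"
	"os"
	"strings"
	"sync"
)

// entry is a hypothetical stand-in for one record of the output array.
type entry struct {
	Domain string `json:"domain"`
}

// scrapeAll hands every domain to scrape, running at most `threads` calls
// at the same time. Failures are logged (the log package writes to stderr
// by default) and skipped. The order of the result is not deterministic.
func scrapeAll(domains []string, threads int, scrape func(string) (entry, error)) []entry {
	if threads < 1 {
		threads = 1
	}
	jobs := make(chan string)
	entries := []entry{}
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < threads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for domain := range jobs {
				e, err := scrape(domain)
				if err != nil {
					log.Printf("%s: %v", domain, err)
					continue
				}
				mu.Lock()
				entries = append(entries, e)
				mu.Unlock()
			}
		}()
	}
	for _, domain := range domains {
		jobs <- domain
	}
	close(jobs)
	wg.Wait()
	return entries
}

func main() {
	threads := flag.Int("t", 8, "number of parallel workers")
	flag.Parse()

	// Read one domain per line from stdin, ignoring blank lines.
	var domains []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if d := strings.TrimSpace(sc.Text()); d != "" {
			domains = append(domains, d)
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}

	// Placeholder scraper (assumption): a real one would fetch and validate.
	placeholder := func(domain string) (entry, error) {
		return entry{Domain: domain}, nil
	}
	if err := json.NewEncoder(os.Stdout).Encode(scrapeAll(domains, *threads, placeholder)); err != nil {
		log.Fatal(err)
	}
}
```

Keeping logs on stderr and data on stdout is what makes the `2>diosts.log >securitytxt.json` redirection in the usage line work.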

For each input, the following URIs are tried, in order:

  1. https://<domain>/.well-known/security.txt
  2. https://<domain>/security.txt
  3. http://<domain>/.well-known/security.txt
  4. http://<domain>/security.txt

Any non-fatal violations of the security.txt specification will be logged.
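The lookup order listed above can be generated with two nested loops: HTTPS before HTTP, and the .well-known path before the top-level path. The sketch below only builds the list. The function name `candidateURIs` is hypothetical and not taken from the diosts source.

```go
package main

import "fmt"

// candidateURIs returns the security.txt locations for a domain in the
// order they are tried: scheme first (https, then http), and within each
// scheme the .well-known path before the top-level path.
func candidateURIs(domain string) []string {
	schemes := []string{"https", "http"}
	paths := []string{"/.well-known/security.txt", "/security.txt"}

	uris := make([]string, 0, len(schemes)*len(paths))
	for _, scheme := range schemes {
		for _, path := range paths {
			uris = append(uris, scheme+"://"+domain+path)
		}
	}
	return uris
}

func main() {
	for _, uri := range candidateURIs("example.com") {
		fmt.Println(uri)
	}
}
```

A caller would try each URI in turn and stop at the first one that yields a usable file.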

Build

Note: building is not necessary if you follow the installation instructions; Go will take care of this for you.

git clone https://github.com/disclose/diosts
cd diosts
go build ./cmd/diosts

Notes

Redirects

According to the specification, redirects should be followed when retrieving security.txt. However, it also states:

When retrieving the file and any resources referenced in the file, researchers should record any redirects since they can lead to a different domain or IP address controlled by an attacker. Further inspections of such redirects is recommended before using the information contained within the file.

At this point, we blindly accept redirects within the same organization (e.g., google.com to www.google.com is accepted). Any other redirect is logged as an error, to be dealt with later.
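How "the same organization" is decided is not spelled out here. One possible approximation, shown below purely as an assumption, is to accept a redirect when both hosts are equal or one is a subdomain of the other. The function name `sameOrganization` is hypothetical. A stricter check could compare registrable domains using the public suffix list, for example via the `golang.org/x/net/publicsuffix` package.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// sameOrganization reports whether a redirect stays with the host that was
// originally requested. Assumption: "same organization" is approximated as
// "same host, or one host is a subdomain of the other". This does not
// consult the public suffix list, so it is only a rough heuristic.
func sameOrganization(fromURL, toURL string) (bool, error) {
	from, err := url.Parse(fromURL)
	if err != nil {
		return false, err
	}
	to, err := url.Parse(toURL)
	if err != nil {
		return false, err
	}
	a := strings.ToLower(from.Hostname())
	b := strings.ToLower(to.Hostname())
	if a == "" || b == "" {
		return false, errors.New("URL has no host")
	}
	return a == b || strings.HasSuffix(a, "."+b) || strings.HasSuffix(b, "."+a), nil
}

func main() {
	targets := []string{
		"https://www.google.com/.well-known/security.txt",
		"https://evil.example/security.txt",
	}
	for _, target := range targets {
		ok, err := sameOrganization("https://google.com/.well-known/security.txt", target)
		fmt.Println(target, ok, err)
	}
}
```

In an HTTP client, a check like this would typically run in the `CheckRedirect` hook of `net/http.Client`, which is called before each redirect is followed.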

Canonical

A security.txt should contain a Canonical field with a URL pointing to the canonical version of the security.txt. We should check whether we retrieved the security.txt from its canonical URL and, if not, retrieve it from there.
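A minimal sketch of that check follows, assuming a simple line-based reading of the file: collect every Canonical field, then compare the URL the file was actually retrieved from against them. The names `canonicalURLs` and `refetchTarget` are hypothetical, and the sketch does not validate the file or handle signed files in any special way.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// canonicalURLs returns the values of all Canonical fields in a
// security.txt body. Blank lines and comment lines (starting with '#')
// are skipped, and the field name is matched case-insensitively.
func canonicalURLs(body string) []string {
	var urls []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.Index(line, ":")
		if i < 0 {
			continue
		}
		name := strings.TrimSpace(line[:i])
		if strings.EqualFold(name, "Canonical") {
			urls = append(urls, strings.TrimSpace(line[i+1:]))
		}
	}
	return urls
}

// refetchTarget returns the canonical URL to retrieve instead of
// retrievedURL, or "" if the file has no Canonical field or was already
// retrieved from one of its canonical URLs.
func refetchTarget(retrievedURL, body string) string {
	urls := canonicalURLs(body)
	for _, u := range urls {
		if u == retrievedURL {
			return ""
		}
	}
	if len(urls) > 0 {
		return urls[0]
	}
	return ""
}

func main() {
	body := "# example file\n" +
		"Contact: mailto:security@example.com\n" +
		"Canonical: https://example.com/.well-known/security.txt\n"
	fmt.Println(refetchTarget("https://example.com/security.txt", body))
}
```

Choosing the first Canonical entry when several are present is an arbitrary decision made for this sketch.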

Program name

Currently, we use the input domain name as the program name. This may or may not be correct, especially when redirects or Canonical URL entries are involved. To be discussed later.
