pdscan
Scan your data stores for unencrypted personal data (PII)
- Last names
- Email addresses
- IP addresses
- Street addresses (US)
- Phone numbers (US)
- Credit card numbers
- Social security numbers
- Dates of birth
- Location data
- OAuth tokens
Uses data sampling and naming, and works with compressed files
🔥 Zero runtime dependencies and minimal database load
Installation
Download the latest version.
Unzip and follow the instructions below for your data store.
On Mac, you can also use:
brew install ankane/brew/pdscan
Data Stores
Files
pdscan file://path/to/file.txt
You can also specify a directory.
pdscan file://path/to/directory
For absolute paths, use file:/// (three slashes).
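As a minimal sketch of the absolute-path rule: prefixing `file://` to a path that already begins with `/` produces the three-slash form. The path below is hypothetical, and the command is printed rather than executed so the sketch works without pdscan installed.

```shell
# Hypothetical absolute path; substitute a real file or directory.
abs_path="/var/data/export.csv"

# "file://" plus a path beginning with "/" gives the file:/// form.
uri="file://${abs_path}"

# Dry run: print the command instead of running it.
echo "pdscan ${uri}"
```

Drop the `echo` wrapper to run the scan for real.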
MySQL & MariaDB
pdscan mysql://user:pass@host:3306/dbname
Postgres
pdscan postgres://user:pass@host:5432/dbname
If your connection doesn't use SSL, append to the URI:
?sslmode=disable
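A rough sketch of appending the parameter in a script follows. The credentials and host are placeholders, and the handling of a URI that already carries a query string (joining with `&`) is an assumption based on standard URI syntax, not something this README specifies.

```shell
# Placeholder credentials and host; these are not real values.
base_uri="postgres://user:pass@host:5432/dbname"

# Leave the URI alone if sslmode is already set; otherwise append it,
# using "&" when a query string exists and "?" when it does not.
case "$base_uri" in
  *sslmode=*) uri="$base_uri" ;;
  *\?*)       uri="${base_uri}&sslmode=disable" ;;
  *)          uri="${base_uri}?sslmode=disable" ;;
esac

# Dry run: print the command instead of running it.
echo "pdscan ${uri}"
```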
For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).
CREATE EXTENSION tsm_system_rows;
SQLite
pdscan sqlite:/path/to/dbname.sqlite3
S3
pdscan s3://bucket/path/to/file.txt
Requires the s3:GetObject permission.
You can also specify a prefix by ending the URI with a /.
pdscan s3://bucket/path/to/directory/
Requires the s3:ListBucket and s3:GetObject permissions.
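For illustration, the two permissions could be granted with an IAM policy along these lines. This is a sketch, not a policy shipped with pdscan: the bucket name and output file name are hypothetical. In IAM, s3:ListBucket is granted on the bucket ARN, while s3:GetObject is granted on the objects under it.

```shell
# Hypothetical bucket name; replace with your own.
bucket="my-bucket"

# Write a minimal read-only policy to a local file (file name is arbitrary).
# s3:ListBucket targets the bucket; s3:GetObject targets the objects in it.
cat > pdscan-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
```

If you only scan single objects rather than prefixes, the first statement can be left out.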
Others
Feel free to submit a PR
Options
Show data found
pdscan --show-data
Show low confidence matches
pdscan --show-all
Change sample size
pdscan --sample-size 50000
Specify number of processes to use (defaults to 1)
pdscan --processes 4
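The options above are shown without a data store. The sketch below assumes they can be combined with a connection URI in a single invocation, which this README does not state explicitly. The URI is a placeholder, and the command is printed rather than executed.

```shell
# Placeholder URI; not a real host.
uri="postgres://user:pass@host:5432/dbname"

# Build the argument list from flags listed above.
# Combining them in one call is an assumption.
set -- pdscan "$uri" --show-data --sample-size 50000 --processes 4

# Dry run: print the command. Replace the next two lines with "$@" to execute it.
cmd="$*"
echo "$cmd"
```

Note that --show-data prints the matched values, so consider where that output ends up before running it against production data.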
Roadmap
- Add more data stores (SQL Server, MongoDB, Elasticsearch, Memcached, Redis)
- Improve rules
- Highlight matches
- Add more output formats, like JSON and CSV
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/pdscan.git
cd pdscan
make test