OpenRedact
Semi-automatic data anonymization for German documents.
Description
This repository is home to the OpenRedact app, a webapp for semi-automatic anonymization of German-language documents. OpenRedact is a Prototype Fund project, funded by the Federal Ministry of Education and Research. A detailed description of the project and prototype can be found here.
CLI
You can use the CLI script backend/cli/redact.py to anonymize a directory of documents in an unsupervised manner.
./redact.py --input_dir "path/to/documents/" --output_dir "out/directory/"
Call ./redact.py --help for usage instructions and important notes.
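If you need to anonymize several directories in one go, the CLI can be wrapped in a small script. The following is a minimal sketch, not part of the repository: the function names and the one-subdirectory-per-batch layout are assumptions made for illustration, and only the --input_dir and --output_dir flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_redact_command(input_dir, output_dir, script="./redact.py"):
    """Build the argument list for one redact.py run (flags as shown above)."""
    return [script, "--input_dir", str(input_dir), "--output_dir", str(output_dir)]


def redact_subdirectories(root, out_root, script="./redact.py"):
    """Run the CLI once per subdirectory of `root` (hypothetical batch layout)."""
    for source in sorted(Path(root).iterdir()):
        if source.is_dir():
            target = Path(out_root) / source.name
            # Creating the output directory up front is an assumption; it is
            # harmless if the CLI creates it itself.
            target.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if the script exits non-zero
            subprocess.run(build_redact_command(source, target, script), check=True)
```

Calling redact_subdirectories("path/to/batches/", "out/directory/") from backend/cli would then run the script once for each subdirectory. Check ./redact.py --help first, since the notes there apply to every run.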
Webapp
OpenRedact works with document file formats
This screencast walks you through the anonymization of a document, from upload to download of the anonymized file.
OpenRedact supports different anonymization methods
This screencast demonstrates the different anonymization methods that OpenRedact supports. The modifications on the left are immediately previewed on the right.
OpenRedact comes with an annotation tool
The automatically detected and proposed personal data can be corrected and extended by the user using our annotation tool.
OpenRedact tells you how good its automatic personal data detection is
Based on the user's manual corrections and extensions, we can assess how well the automatic detection of personal data performed.
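To illustrate the idea, the automatically proposed annotations can be compared with the set that remains after manual correction. The sketch below assumes precision, recall, and F1 over exact matches of (start, end, tag) tuples. This README does not state which metrics or matching rules OpenRedact actually uses, so treat both the metrics and the tuple format as assumptions.

```python
def detection_scores(predicted, corrected):
    """Compare automatic annotations with the manually corrected ones.

    Both arguments are iterables of hashable annotations, here assumed to be
    (start, end, tag) tuples. Only exact matches count as correct.
    """
    predicted, corrected = set(predicted), set(corrected)
    true_positives = len(predicted & corrected)
    # Guard against empty sets to avoid division by zero
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(corrected) if corrected else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, if the detector proposed two annotations, the user kept one of them and added one it had missed, precision and recall are both 0.5.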
Deployment
The app is best deployed using Docker.
Run the full stack using Docker-Compose
We have pre-built Docker images available at https://hub.docker.com/u/openredact.
Pull and start the containers by running:
# Clone the repo
git clone https://github.com/openredact/openredact-app.git
cd openredact-app
# Pull images & start containers
docker-compose pull
docker-compose up
This serves the backend on port 8000 (also reachable at http://localhost/api) and the frontend on port 80. Once the containers are running, you can access the webapp at http://localhost/.
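The containers can take a moment to start. If you script the deployment, a small readiness check along the following lines may help. This is a sketch using only the Python standard library. The helper names, retry count, and delay are assumptions, and the URLs are the ones mentioned in this README.

```python
import time
import urllib.request


def is_up(url):
    """Return True if `url` answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status < 400
    except OSError:
        # Refused connections, timeouts and HTTP error statuses all surface
        # as OSError subclasses (URLError, HTTPError)
        return False


def wait_for(check, attempts=30, delay=2.0):
    """Call `check` until it returns True or `attempts` calls have been made."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, wait_for(lambda: is_up("http://localhost/")) waits for the frontend, and wait_for(lambda: is_up("http://localhost:8000/docs")) waits for the backend.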
Run the frontend using Docker
cd frontend
docker build -t openredact/frontend .
docker run -p 80:80 openredact/frontend
This will build the frontend inside a Node Docker container and deploy the result in an nginx container. For more details about this procedure, see React in Docker with Nginx, built with multi-stage Docker builds, including testing.
Run the backend using Docker
cd backend
docker build -t openredact/backend .
docker run -p 8000:8000 openredact/backend
API Documentation
Documentation of the API is available at the endpoints /docs (Swagger UI) and /redoc (ReDoc), e.g. http://127.0.0.1:8000/redoc.
The OpenAPI specification can be found here.
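An OpenAPI specification can also be inspected programmatically. The sketch below lists the operations a spec defines. It assumes the running backend serves the spec as JSON at /openapi.json, which this README does not confirm, so adjust spec_path if your setup differs. The function names are illustrative.

```python
import json
import urllib.request

# Keys of an OpenAPI path item that denote HTTP operations
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec):
    """Return sorted (METHOD, path) pairs for every operation in an OpenAPI spec."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for key in sorted(item):
            # Path items may also hold non-operation keys such as "parameters"
            if key in HTTP_METHODS:
                operations.append((key.upper(), path))
    return operations


def fetch_spec(base_url="http://127.0.0.1:8000", spec_path="/openapi.json"):
    """Download the spec; spec_path is an assumed location, not a documented one."""
    with urllib.request.urlopen(base_url + spec_path, timeout=10) as response:
        return json.load(response)
```

With the backend running, list_operations(fetch_spec()) gives a quick overview of the available endpoints without opening the Swagger UI.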
Development
First, follow the instructions in the backend or frontend readme. Then, continue with the instructions below.
Developing using Docker
If you want to use our Docker setup for development, run:
docker-compose -f docker-compose.dev.yml up
Don't forget to add the project's directory to the list of allowed file sharing resources in the Docker Desktop preferences.
Install the pre-commit hooks
pre-commit is a Python tool to manage git pre-commit hooks.
Running the following code requires the backend dev requirements to be set up as explained here.
We have pre-commit hooks for formatting and linting Python and JavaScript code (black, flake8, prettier and eslint).
Note that the tests are slower than the formatters and linters, so they are not run by the hooks, only by CI. Remember to run them manually before committing.
pre-commit install
git config --bool flake8.strict true # Makes the commit fail if flake8 reports an error
To run the hooks:
pre-commit run --all-files
How to contact us
For usage questions, bugs, or suggestions, please file a GitHub issue. If you would like to contribute or have other questions, please email [email protected].