
openredact / openredact-app

Licence: MIT license
This is a prototype of a semi-automatic data anonymization app for German documents.

OpenRedact

Semi-automatic data anonymization for German documents.



⚠️ Disclaimer ⚠️: This is a prototype. Do not use for anything critical.

⚠️ Note ⚠️: This tool focuses on the text content. Metadata will not be anonymized.

Description

This repository is home to the OpenRedact app, a web app for semi-automatic anonymization of German-language documents. OpenRedact is a Prototype Fund project, funded by the German Federal Ministry of Education and Research. A detailed description of the project and prototype can be seen here.

Using OpenRedact to anonymize documents

CLI

You can use the CLI script backend/cli/redact.py to anonymize a directory of documents in an unsupervised manner.

./redact.py --input_dir "path/to/documents/" --output_dir "out/directory/"

Call ./redact.py --help for usage instructions and important notes.
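If you want to drive the CLI from another Python program, the call above can be wrapped with `subprocess`. This is a minimal sketch and not part of OpenRedact: the helper names `build_redact_command` and `run_redact` are hypothetical, and it assumes a Unix-like system where `redact.py` is executable and lives in `backend/cli`, as the path above suggests. It uses only the `--input_dir` and `--output_dir` flags shown above.

```python
import subprocess
from pathlib import Path


def build_redact_command(input_dir, output_dir, script="./redact.py"):
    """Build the argument list for the redact.py call shown above."""
    return [script, "--input_dir", str(input_dir), "--output_dir", str(output_dir)]


def run_redact(input_dir, output_dir, cli_dir="backend/cli"):
    """Run redact.py from cli_dir (an assumed location); raise if it exits non-zero."""
    # Resolve both paths first, because the process runs with cli_dir as its
    # working directory and relative paths would otherwise point elsewhere.
    command = build_redact_command(
        Path(input_dir).resolve(), Path(output_dir).resolve()
    )
    return subprocess.run(command, cwd=cli_dir, check=True)
```

Calling `run_redact("path/to/documents/", "out/directory/")` from the repository root would then be equivalent to the shell command above. Check `./redact.py --help` first, since the script documents important notes that this wrapper does not surface.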

Webapp

OpenRedact works with document file formats

This screencast walks you through the anonymization of a document, from upload to download of the anonymized file.

OpenRedact supports different anonymization methods

This screencast demonstrates the different anonymization methods that OpenRedact supports. The modifications on the left are immediately previewed on the right.

OpenRedact comes with an annotation tool

The automatically detected and proposed personal data can be corrected and extended by the user using our annotation tool.

(Screenshot: annotating personal data inside a text)

OpenRedact tells you how good its automatic personal data detection is

Based on the manual corrections and extensions, we can assess the mechanism for automatic detection of personal data.

(Screenshot: scores and metrics for the automatic detection of personal data)

Deployment

The app is best deployed using Docker.

Run the full stack using Docker-Compose

We have pre-built Docker images available at https://hub.docker.com/u/openredact.

Pull and start the containers by running:

# Clone the repo
git clone https://github.com/openredact/openredact-app.git
cd openredact-app

# Pull images & start containers
docker-compose pull
docker-compose up

This serves the backend on port 8000 (also reachable at http://localhost/api) and the frontend on port 80. Once the containers have started, you can access the webapp at http://localhost/.
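The containers can take a moment to come up. A small readiness check along the following lines may help in scripts that need to wait for the stack. It is only a sketch: the function names are hypothetical, the URLs are the ones given in the paragraph above, and it treats any HTTP response (even an error status) as a sign that the server is running.

```python
import time
import urllib.error
import urllib.request

# URLs taken from the ports described above.
SERVICES = {
    "frontend": "http://localhost/",
    "backend": "http://localhost:8000/",
}


def is_up(url, timeout=2.0):
    """Return True if the server answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded with an error status (e.g. 404), so it is running.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure and similar.
        return False


def wait_for_services(services=SERVICES, attempts=30, delay=2.0, probe=is_up):
    """Poll every service until all respond or the attempts run out.

    Returns a dict mapping each service name to True (reachable) or False.
    """
    status = {name: False for name in services}
    for attempt in range(attempts):
        for name, url in services.items():
            if not status[name]:
                status[name] = probe(url)
        if all(status.values()):
            break
        if attempt < attempts - 1:
            time.sleep(delay)
    return status
```

Running `print(wait_for_services())` after `docker-compose up` reports which of the two services became reachable. The `probe` parameter exists so the polling logic can be exercised without a running stack.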

Run the frontend using Docker

cd frontend
docker build -t openredact/frontend .
docker run -p 80:80 openredact/frontend

This will build the frontend inside a node Docker container and deploy the result in an nginx container. For more details about this procedure, see the article "React in Docker with Nginx, built with multi-stage Docker builds, including testing".

Run the backend using Docker

cd backend
docker build -t openredact/backend .
docker run -p 8000:8000 openredact/backend

API Documentation

API documentation is available at the endpoints /docs (Swagger UI) and /redoc (ReDoc), e.g. http://127.0.0.1:8000/redoc. The OpenAPI specification can be found here.
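Swagger UI and ReDoc both render an OpenAPI document, so the available operations can also be listed programmatically. The sketch below is illustrative only: `list_operations` and `fetch_spec` are hypothetical helpers, and the `/openapi.json` location is an assumption (a common default in frameworks that ship both UIs) that this README does not confirm.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec):
    """Return sorted (METHOD, path) pairs from a parsed OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            # Path items may also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations)


def fetch_spec(url="http://127.0.0.1:8000/openapi.json"):
    """Download and parse the OpenAPI document (the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)
```

With the backend running, `for method, path in list_operations(fetch_spec()): print(method, path)` would print one line per operation, provided the specification is actually served at the assumed URL.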

Development

First, follow the instructions in the backend or frontend readme. Then, continue with the instructions below.

Developing using Docker

If you want to use our Docker setup for development, run:

docker-compose -f docker-compose.dev.yml up

Don't forget to add the project's directory to the list of allowed file sharing resources in the Docker Desktop preferences.

Install the pre-commit hooks

pre-commit is a Python tool for managing git pre-commit hooks. Running the following commands requires the backend dev requirements to be set up as explained here. We have pre-commit hooks for formatting and linting Python and JavaScript code (black, flake8, prettier and eslint). Note that the tests, being slower than formatters and linters, are not part of the hooks and are run by CI, so remember to run them manually before committing.

pre-commit install
git config --bool flake8.strict true  # Makes the commit fail if flake8 reports an error

To run the hooks:

pre-commit run --all-files

How to contact us

For usage questions, bugs, or suggestions, please file a GitHub issue. If you would like to contribute or have other questions, please email [email protected].

License

MIT License
