ELS-RD / anonymisation

License: Apache-2.0

Anonymization of legal cases (Fr) based on Flair embeddings

Programming Languages

python

139335 projects - #7 most used programming language

Makefile

30231 projects

Projects that are alternatives of or similar to anonymisation

presidio-research

This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.

Stars: ✭ 62 (-27.06%)

Mutual labels: spacy, ner, flair

datagrand bert

5th-place code for the 2019 Daguan Cup information extraction competition

Stars: ✭ 20 (-76.47%)

Mutual labels: ner, bert

wechsel

Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

Stars: ✭ 39 (-54.12%)

Mutual labels: transformers, bert

Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Stars: ✭ 2,828 (+3227.06%)

Mutual labels: transformers, bert

neuro-comma

🇷🇺 Punctuation restoration production-ready model for Russian language 🇷🇺

Stars: ✭ 46 (-45.88%)

Mutual labels: ner, bert

extractacy

Spacy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results)

Stars: ✭ 47 (-44.71%)

Mutual labels: spacy, ner

scikitcrf NER

Python library for custom entity recognition using Sklearn CRF

Stars: ✭ 17 (-80%)

Mutual labels: entities, ner

spacy-sentence-bert

Sentence transformers models for SpaCy

Stars: ✭ 88 (+3.53%)

Mutual labels: spacy, bert

question generator

An NLP system for generating reading comprehension questions

Stars: ✭ 188 (+121.18%)

Mutual labels: transformers, bert

oreilly-bert-nlp

This repository contains code for the O'Reilly Live Online Training for BERT

Stars: ✭ 19 (-77.65%)

Mutual labels: transformers, bert

ginza-transformers

Use custom tokenizers in spacy-transformers

Stars: ✭ 15 (-82.35%)

Mutual labels: transformers, spacy

ChineseNER

All about Chinese NER

Stars: ✭ 241 (+183.53%)

Mutual labels: ner, bert

gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577

Stars: ✭ 216 (+154.12%)

Mutual labels: transformers, bert

anonymization-api

How to build and deploy an anonymization API with FastAPI

Stars: ✭ 51 (-40%)

Mutual labels: spacy, ner

DrFAQ

DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.

Stars: ✭ 29 (-65.88%)

Mutual labels: spacy, bert

NER-and-Linking-of-Ancient-and-Historic-Places

An NER tool for ancient place names based on Pleiades and Spacy.

Stars: ✭ 26 (-69.41%)

Mutual labels: spacy, ner

backprop

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Stars: ✭ 229 (+169.41%)

Mutual labels: transformers, bert

Pytorch Sentiment Analysis

Tutorials on getting started with PyTorch and TorchText for sentiment analysis.

Stars: ✭ 3,209 (+3675.29%)

Mutual labels: transformers, bert

Nlp Architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

Stars: ✭ 2,768 (+3156.47%)

Mutual labels: transformers, bert

nlp workshop odsc europe20

Extensive tutorials for the Advanced NLP Workshop in Open Data Science Conference Europe 2020. We will leverage machine learning, deep learning and deep transfer learning to learn and solve popular tasks using NLP including NER, Classification, Recommendation \ Information Retrieval, Summarization, Classification, Language Translation, Q&A and T…

Stars: ✭ 127 (+49.41%)

Mutual labels: transformers, spacy


Pseudo-anonymization of French legal cases

Scope

Build a Named Entity Recognition (NER) training dataset and train a model dedicated to French legal case anonymization by leveraging pre-trained language models.
The project goes beyond the scope covered by our previous rule-based system, which was limited to addresses and natural person names (that rule-based system was used by most legal actors in France).
This model can be used in a pseudo-anonymization system.
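
To illustrate what a NER training dataset can look like, here is a minimal sketch that converts character-offset annotations into token-level BIO tags, a common input format for sequence taggers. The annotation format, the label names (PERS, ADDRESS), the tokenizer and the to_bio helper are all assumptions made for this example; they are not taken from the project's code.

```python
import re
from typing import List, Tuple

# Hypothetical annotation format: (start_offset, end_offset, label), end exclusive.
Span = Tuple[int, int, str]


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Split text into word/punctuation tokens and tag each one in BIO format.

    Assumes that annotated spans start and end on token boundaries.
    """
    tagged = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"  # default: token is outside any entity
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                # First token of the span gets B-, the following ones I-
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged


# Illustrative sentence and annotations (not real case data).
text = "Monsieur Jean Dupont habite Paris."
spans = [(9, 20, "PERS"), (28, 33, "ADDRESS")]

for token, tag in to_bio(text, spans):
    print(token, tag)
```

Each printed line pairs one token with its tag, which is close to the column layout that many NER toolkits accept as training data.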

Measures computed over manually annotated data show strong performance, in particular on the names of natural persons and legal professionals.

The only French legal cases acquired in bulk by Lefebvre Sarrut that are not already pseudo-anonymized are those from appeal courts (Jurica database).
The input data were manually annotated by Lefebvre Sarrut employees.

The project is focused on finding mentions of entities and guessing their types.
It doesn't manage the pseudo-anonymization step itself, that is, replacing the entities found in the previous step with another representation.
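
Since this replacement step is outside the project, the following is only a sketch of what a downstream system might do with the detected entities. It assumes entities arrive as non-overlapping (start, end, label) character spans and replaces each distinct mention with a numbered placeholder; the pseudonymize function, the placeholder format and the PERS label are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical entity format: (start_offset, end_offset, label), end exclusive.
Span = Tuple[int, int, str]


def pseudonymize(text: str, spans: List[Span]) -> str:
    """Replace each entity mention with a placeholder such as [PERS_1].

    The same mention (case-insensitive) always maps to the same placeholder.
    Assumes spans do not overlap.
    """
    counters: Dict[str, int] = {}            # next index per label
    seen: Dict[Tuple[str, str], str] = {}    # (label, mention) -> placeholder
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        key = (label, text[start:end].lower())
        if key not in seen:
            counters[label] = counters.get(label, 0) + 1
            seen[key] = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:start])    # untouched text before the entity
        pieces.append(seen[key])             # placeholder instead of the entity
        cursor = end
    pieces.append(text[cursor:])             # remaining text after the last entity
    return "".join(pieces)


# Illustrative sentence and spans (not real case data).
text = "Jean Dupont a vu Marie Martin. Jean Dupont est parti."
spans = [(0, 11, "PERS"), (17, 29, "PERS"), (31, 42, "PERS")]

print(pseudonymize(text, spans))
# [PERS_1] a vu [PERS_2]. [PERS_1] est parti.
```

Mapping repeated mentions to the same placeholder keeps the text readable, because the reader can still tell which party is which.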

Evolution

The previous version of the project, from 2018, was based on the spaCy library.
In 2019, new pre-trained language models appeared and provided much better quality than what spaCy delivered.
The current project is now based on Flair.

If you want more information about the project, check these articles:

Commands to use the code

This project uses a Python virtual environment to manage dependencies without interfering with those installed on the machine.
pip3 and python3 are the only requirements.
To set up a virtual environment on the machine, install virtualenv with pip3, then install the project dependencies (from the requirements.txt file).

These steps are scripted in the Makefile (tested only on Ubuntu) and can be performed with the following command:

make setup

The VIRT_ENV_FOLDER variable can be changed in the Makefile to choose where the Python dependencies are installed.

You can then use the project by running one of the actions defined in the Makefile, where all commands can be found.

Setup Pycharm

To run tests from PyCharm, you need to create a Pytest test task.
By default, the working folder is (implicitly) the test folder.
It has to be set explicitly to the project root folder.

License

This project is licensed under the Apache 2.0 License (found in the LICENSE file in the root directory).

