ELS-RD / anonymisation

Licence: Apache-2.0 license
Anonymization of legal cases (Fr) based on Flair embeddings

Pseudo-anonymization of French legal cases

Scope

Build a Named Entity Recognition (NER) training dataset and train a model dedicated to French legal case anonymization by leveraging pre-trained language models.
The project goes beyond the scope of our previous rule-based system, which was limited to addresses and natural person names (that rule-based system was used by most legal actors in France).
This model can be used in a pseudo-anonymization system.

Measures computed over manually annotated data show strong performance, in particular on the names of natural persons and legal professionals.
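
As an illustration of how such measures can be computed, here is a minimal sketch of entity-level precision, recall and F1 against manually annotated spans. It assumes exact-match scoring (offsets and label must both agree), which this README does not specify; the function name `span_scores`, the labels and the offsets are hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


# An entity is (start_offset, end_offset, label); offsets are character positions.
Entity = Tuple[int, int, str]


def span_scores(gold: Set[Entity], predicted: Set[Entity]) -> Scores:
    """Exact-match scores: a prediction counts only if offsets and label both match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return Scores(precision, recall, f1)


# Hypothetical annotations for one document (labels are illustrative only).
gold = {(0, 11, "PERS"), (25, 40, "ADDRESS"), (52, 60, "LAWYER")}
predicted = {(0, 11, "PERS"), (25, 40, "ADDRESS"), (70, 75, "PERS")}
scores = span_scores(gold, predicted)
print(f"precision={scores.precision:.2f} recall={scores.recall:.2f} f1={scores.f1:.2f}")
```

Computing the same scores per label, rather than over all entities at once, is what makes it possible to single out one entity type such as natural person names.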

Among the French legal cases that Lefebvre Sarrut acquires in bulk, the only ones not already pseudo-anonymized are those from appeal courts (Jurica database).
The input data are manually annotated by Lefebvre Sarrut employees.

The project is focused on finding mentions of entities and guessing their types.
It does not handle the pseudo-anonymization step itself, that is, replacing the entities found in the previous step with another representation.
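
To make the distinction concrete, the following sketch shows what a downstream replacement step could look like once entity spans are available. It is not part of this project: the function `pseudonymize`, the placeholder format and the labels `PERS` and `LOC` are assumptions chosen for illustration, and the spans are assumed not to overlap.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]  # (start_offset, end_offset, label), hypothetical format


def pseudonymize(text: str, entities: List[Entity]) -> str:
    """Replace each entity span with a placeholder such as PERS_1.

    The same surface form with the same label always gets the same placeholder,
    so references to one party stay consistent after replacement.
    """
    placeholders: Dict[Tuple[str, str], str] = {}
    counters: Dict[str, int] = {}
    # Assign placeholders in reading order so numbering follows the text.
    for start, end, label in sorted(entities):
        key = (text[start:end], label)
        if key not in placeholders:
            counters[label] = counters.get(label, 0) + 1
            placeholders[key] = f"{label}_{counters[label]}"
    # Replace from the end of the text so earlier offsets stay valid.
    result = text
    for start, end, label in sorted(entities, reverse=True):
        result = result[:start] + placeholders[(text[start:end], label)] + result[end:]
    return result


text = "M. Jean Dupont a saisi la cour. Jean Dupont réside à Lyon."
entities = [(3, 14, "PERS"), (32, 43, "PERS"), (53, 57, "LOC")]
print(pseudonymize(text, entities))
```

Replacing from the last span to the first avoids recomputing offsets after each substitution.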

Evolution

The previous version of the project, from 2018, was based on the spaCy library.
In 2019, new pre-trained language models appeared and provided much better quality than what spaCy delivered.
The current project is based on Flair.

If you want more information about the project, check these articles:

Commands to use the code

This project uses a Python virtual environment to manage dependencies without interfering with those installed on the machine.
pip3 and python3 are the only requirements.
To set up a virtual environment on the machine, install virtualenv with pip3, then install the project dependencies (from the requirements.txt file).

These steps are scripted in the Makefile (tested only on Ubuntu) and can be performed with the following command:

make setup

The VIRT_ENV_FOLDER variable in the Makefile can be changed to choose where the Python dependencies are installed.

... then you can use the project by running one of the following actions:

All commands can be found in the Makefile.

Set up PyCharm

To run tests from PyCharm, you need to create a pytest run configuration.
By default, the working folder is (implicitly) the test folder.
It has to be explicitly set to the project root folder.
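
The working-folder issue comes from paths being resolved relative to wherever the tests are launched. As a complementary sketch (not something this project provides), a test helper can locate the project root itself by walking up the directory tree. The name `find_project_root` is hypothetical, and using the Makefile as a marker assumes it sits in the project root.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "Makefile") -> Path:
    """Walk up from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker} found at or above {start}")
```

A test module could then call `find_project_root(Path(__file__).parent)` and build resource paths from the result, so the tests behave the same whichever working folder the IDE uses.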

License

This project is licensed under the Apache 2.0 License (see the LICENSE file in the root directory).
