
ahalterman / Multiuser_prodigy

License: MIT
Running Prodigy for a team of annotators


multiuser_prodigy

This is a multi-annotator setup for Prodigy, Explosion AI's data annotation tool. It uses a Mongo DB to allocate annotation tasks to annotators working on different Prodigy instances running on separate ports. This use case focuses on collecting gold-standard annotations from a team of annotators, rather than on the active-learning, single-annotator setup that Prodigy is primarily designed for.

The repo includes a few example annotation interfaces, including code for annotators training an NER model or doing sentence classification with document context. Each annotator works on the Prodigy instance and port assigned to them, and a new DBStream class handles pulling the examples from the database that are assigned to each worker.
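The per-annotator streaming idea can be sketched roughly as follows. This is an illustrative reconstruction, not the repo's exact API: the class name DBStream matches the README, but the field names ("coders", "seen") and method signatures are assumptions.

```python
# Hypothetical sketch of a DBStream-style generator: pull only the tasks
# assigned to one annotator out of the Mongo collection and yield them
# in Prodigy's task-dict format.
class DBStream:
    def __init__(self, collection, coder):
        self.collection = collection  # a pymongo collection, or anything with .find()
        self.coder = coder            # the annotator this Prodigy instance serves

    def __iter__(self):
        # "coders" holds the annotators an example is assigned to;
        # "seen" tracks whether this example still needs annotations.
        for doc in self.collection.find({"coders": self.coder, "seen": 0}):
            doc.pop("_id", None)  # Mongo's ObjectId is not JSON-serializable
            yield doc
```

Each Prodigy instance is then started with its own DBStream, so two annotators never see each other's queue.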

I've used this setup for three major annotation projects, but you'll need to modify the code to adapt it to your own project.

Mongo database

All tasks are stored in a Mongo DB, which allows flexible logic for how tasks are assigned to annotators. For instance, an example can go out to annotators until three annotations are collected, it can go to two predetermined annotators from the wider pool, or it can be automatically resubmitted to a third annotator if the first two annotations disagree.
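The assignment rules above reduce to small predicates over each example's collected annotations. These helpers are illustrative only (function and field names are assumptions, not the repo's code):

```python
# Illustrative assignment logic of the kind the Mongo-backed queue enables.
def needs_more_annotations(example, target=3):
    # Keep sending an example out until `target` annotations are collected.
    return len(example.get("annotations", [])) < target

def needs_tiebreaker(example):
    # Resubmit to a third annotator only if the first two disagree.
    answers = [a["answer"] for a in example.get("annotations", [])]
    return len(answers) == 2 and answers[0] != answers[1]
```

A periodic job (or the stream itself) can run these checks against the collection and update each example's "coders" list accordingly.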

You can start a Mongo DB in a Docker container:

sudo docker run -d -p 127.0.0.1:27017:27017 -v /home/andy/MIT/multiuser_prodigy/db:/data/db  mongo

To load a list of tasks into the database:

python mongo_load.py -i assault_not_assault.jsonl -c "assault_gsr"

where -i is a JSONL file of tasks and -c specifies the collection name to load them into.
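The input file follows Prodigy's standard JSONL task format: one JSON object per line, each with at least a "text" field. A minimal example of producing such a file (the example texts and "meta" contents are made up for illustration):

```python
import json

# Two example tasks in Prodigy's JSONL format: one JSON object per line,
# each with at least a "text" field ("meta" is optional extra context).
tasks = [
    {"text": "Protesters attacked a police station.", "meta": {"source": "news"}},
    {"text": "The committee approved the budget.", "meta": {"source": "news"}},
]

with open("assault_not_assault.jsonl", "w") as f:
    for task in tasks:
        f.write(json.dumps(task) + "\n")
```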

"seen" : {"$in" : [0,1]}}, {"coders"

Running

You'll need to modify the code in multiuser_db.py to point to the right collection, set the names and ports of your annotators, and select the desired interface (NER, classification, etc.).
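The annotator configuration amounts to a name-to-port mapping of this shape (the variable name and ports here are illustrative, not the repo's exact code):

```python
# Illustrative annotator-to-port mapping of the sort multiuser_db.py
# asks you to edit: each annotator gets a dedicated Prodigy port.
ANNOTATORS = {
    "alice": 9010,
    "bob": 9011,
    "carol": 9012,
}
```

Each annotator then points their browser at the host on their assigned port.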

Then launch the process, either in a screen session or in the background:

python multiuser_db.py

Analysis

You can use Streamlit to set up a dashboard so annotators can check their progress. The dashboard here pulls results from the Mongo DB, but you could also query the Prodigy DB and show results from there.
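The core of such a dashboard is a per-coder tally over the annotated documents. A testable sketch (field names are assumptions about the Mongo documents, not the repo's schema):

```python
from collections import Counter

# Count completed annotations per coder from documents fetched out of
# the Mongo collection. In a Streamlit app, the resulting counts could
# be passed to a chart element such as st.bar_chart.
def progress_by_coder(docs):
    return Counter(d["coder"] for d in docs if d.get("answer"))
```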

A more complicated analysis dashboard setup is in Report.Rmd. This RMarkdown file reads in a CSV of coding information and generates figures in an HTML page that can be served from the annotation server. To record information about how long each task takes, add something like eg['time_loaded'] = datetime.now().isoformat() to your stream code and something like eg['time_returned'] = datetime.now().isoformat() to your update code. report_maker.py exports the DB to CSV and knits the RMarkdown on that CSV.
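The timing hooks described above can be wrapped in two small helpers, called from the stream and update callbacks respectively (the function names are illustrative; the field names time_loaded and time_returned match the text):

```python
from datetime import datetime

def stamp_loaded(eg):
    # Call from your stream code: stamp when the task is sent out.
    eg["time_loaded"] = datetime.now().isoformat()
    return eg

def stamp_returned(eg):
    # Call from your update code: stamp when the annotation comes back.
    eg["time_returned"] = datetime.now().isoformat()
    return eg
```

Report.Rmd can then compute per-task durations from the difference between the two timestamps.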
