explosion / Prodigy Recipes

🍳 Recipes for Prodigy, our fully scriptable annotation tool

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Prodigy Recipes

Jupyterlab Prodigy
🧬 A JupyterLab extension for annotating data with Prodigy
Stars: ✭ 97 (-57.64%)
Mutual labels:  artificial-intelligence, data-science, natural-language-processing, annotation, spacy
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+9497.38%)
Mutual labels:  artificial-intelligence, data-science, natural-language-processing, spacy
Tageditor
🏖TagEditor - Annotation tool for spaCy
Stars: ✭ 92 (-59.83%)
Mutual labels:  data-science, natural-language-processing, annotation, spacy
Spacy Stanza
💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
Stars: ✭ 508 (+121.83%)
Mutual labels:  data-science, natural-language-processing, spacy
Lazynlp
Library to scrape and clean web pages to create massive datasets.
Stars: ✭ 1,985 (+766.81%)
Mutual labels:  artificial-intelligence, data-science, natural-language-processing
Data Science
Collection of useful data science topics along with code and articles
Stars: ✭ 315 (+37.55%)
Mutual labels:  artificial-intelligence, data-science, natural-language-processing
Thinc
🔮 A refreshing functional take on deep learning, compatible with your favorite libraries
Stars: ✭ 2,422 (+957.64%)
Mutual labels:  artificial-intelligence, natural-language-processing, spacy
Learn Data Science For Free
This repository is a combination of different resources scattered all over the internet. The reason for making such a repository is to combine all the valuable resources in a sequential manner, so that it helps every beginner who is in search of a free and structured learning resource for Data Science. For Constant Updates Follow me in …
Stars: ✭ 4,757 (+1977.29%)
Mutual labels:  artificial-intelligence, data-science, natural-language-processing
Ml
A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (+454.59%)
Mutual labels:  artificial-intelligence, data-science, natural-language-processing
Nlpaug
Data augmentation for NLP
Stars: ✭ 2,761 (+1105.68%)
Mutual labels:  artificial-intelligence, data-science, natural-language-processing
Fixy
Our goal is to build an open-source writing assistant/spell checker that tackles many different problems in the Turkish NLP literature together, proposes unique approaches, and fills the gaps in existing work. It aims to correct spelling mistakes in users' texts with a deep learning approach while also performing semantic analysis of the texts, so that errors arising in that context can be detected and corrected as well.
Stars: ✭ 165 (-27.95%)
Mutual labels:  artificial-intelligence, data-science, natural-language-processing
Delbot
It understands your voice commands, searches news and knowledge sources, and summarizes and reads out content to you.
Stars: ✭ 191 (-16.59%)
Mutual labels:  data-science, natural-language-processing
Displacy Ent
💥 displaCy-ent.js: An open-source named entity visualiser for the modern web
Stars: ✭ 191 (-16.59%)
Mutual labels:  natural-language-processing, spacy
Free Ai Resources
🚀 FREE AI Resources - 🎓 Courses, 👷 Jobs, 📝 Blogs, 🔬 AI Research, and many more - for everyone!
Stars: ✭ 192 (-16.16%)
Mutual labels:  artificial-intelligence, data-science
Gophernet
A simple from-scratch neural net written in Go
Stars: ✭ 194 (-15.28%)
Mutual labels:  artificial-intelligence, data-science
Vec4ir
Word Embeddings for Information Retrieval
Stars: ✭ 188 (-17.9%)
Mutual labels:  data-science, natural-language-processing
Pyss3
A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Stars: ✭ 191 (-16.59%)
Mutual labels:  artificial-intelligence, natural-language-processing
Imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Stars: ✭ 194 (-15.28%)
Mutual labels:  artificial-intelligence, data-science
Pytorch Lightning
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Stars: ✭ 16,641 (+7166.81%)
Mutual labels:  artificial-intelligence, data-science
Pytorch Geometric Yoochoose
This is a tutorial for PyTorch Geometric on the YooChoose dataset
Stars: ✭ 198 (-13.54%)
Mutual labels:  artificial-intelligence, data-science

Prodigy Recipes

This repository contains a collection of recipes for Prodigy, our scriptable annotation tool for text, images and other data. In order to use this repo, you'll need a license for Prodigy – see this page for more details. For questions and bug reports, please use the Prodigy Support Forum. If you've found a mistake or bug, feel free to submit a pull request.

Important note: The recipes in this repository aren't 100% identical to the built-in recipes shipped with Prodigy. They've been edited to include comments and more information, and some of them have been simplified to make it easier to follow what's going on, and to use them as the basis for a custom recipe.

📋 Usage

Once Prodigy is installed, you should be able to run the prodigy command from your terminal, either directly or via python -m:

python -m prodigy

The prodigy command lists the built-in recipes. To use a custom recipe script, simply pass the path to the file using the -F argument:

python -m prodigy ner.teach your_dataset en_core_web_sm ./data.jsonl --label PERSON -F prodigy-recipes/ner/ner_teach.py

You can also use the --help flag for an overview of the available arguments of a recipe, e.g. prodigy ner.teach -F ner_teach.py --help.

Some things to try

You can edit the code in the recipe script to customize how Prodigy behaves.

  • Try replacing prefer_uncertain() with prefer_high_scores().
  • Try writing a custom sorting function. It just needs to be a generator that yields a sequence of example dicts, given a sequence of (score, example) tuples.
  • Try adding a filter that drops some questions from the stream. For instance, try writing a filter that only asks you questions where the entity is two words long.
  • Try customizing the update() callback, to include extra logging or extra functionality.
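
The suggestions above can be sketched in plain Python, without importing Prodigy. This is only an illustration under stated assumptions: the names prefer_mid_scores, only_two_word_entities and with_logging are hypothetical and not part of Prodigy, and the example dicts assume a "text" key plus "spans" entries with "start" and "end" character offsets.

```python
import logging
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

logger = logging.getLogger("recipe_sketch")
Example = Dict[str, Any]


def prefer_mid_scores(
    scored: Iterable[Tuple[float, Example]], margin: float = 0.2
) -> Iterator[Example]:
    """Hypothetical sorter: takes (score, example) tuples and yields only
    the examples whose score lies within `margin` of 0.5."""
    for score, example in scored:
        if abs(score - 0.5) <= margin:
            yield example


def only_two_word_entities(stream: Iterable[Example]) -> Iterator[Example]:
    """Hypothetical filter: keep a question only if every suggested span
    covers exactly two whitespace-separated words."""
    for example in stream:
        text = example["text"]
        spans = example.get("spans", [])
        if spans and all(
            len(text[s["start"]:s["end"]].split()) == 2 for s in spans
        ):
            yield example


def with_logging(
    update: Callable[[List[Example]], Any]
) -> Callable[[List[Example]], Any]:
    """Wrap an existing update callback so each batch is logged first."""
    def wrapped(answers: List[Example]) -> Any:
        accepted = sum(1 for a in answers if a.get("answer") == "accept")
        logger.info("Batch of %d answers, %d accepted", len(answers), accepted)
        return update(answers)
    return wrapped


# Toy data standing in for a model's scored stream.
scored = [
    (0.95, {"text": "New York is big",
            "spans": [{"start": 0, "end": 8, "label": "GPE"}]}),
    (0.55, {"text": "Ada Lovelace wrote code",
            "spans": [{"start": 0, "end": 12, "label": "PERSON"}]}),
    (0.45, {"text": "Paris is lovely",
            "spans": [{"start": 0, "end": 5, "label": "GPE"}]}),
]
questions = list(only_two_word_entities(prefer_mid_scores(scored)))
```

Because both helpers are generators, they can be chained and only do work as examples are requested. In a recipe script, the sorter would take the place of prefer_uncertain() and the wrapper would be applied to the existing update() callback.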

🍳 Recipes

Named Entity Recognition

| Recipe | Description |
| --- | --- |
| ner.teach | Collect the best possible training data for a named entity recognition model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. |
| ner.match | Suggest phrases that match a given patterns file, and mark whether they are examples of the entity you're interested in. The patterns file can include exact strings or token patterns for use with spaCy's Matcher. |
| ner.manual | Mark spans manually by token. Requires only a tokenizer and no entity recognizer, and doesn't do any active learning. |
| ner.manual.bert | Use a BERT word piece tokenizer for efficient manual NER annotation for transformer models. |
| ner.make-gold | Create gold-standard data by correcting a model's predictions manually. |
| ner.silver-to-gold | Take an existing "silver" dataset with binary accept/reject annotations, merge the annotations to find the best possible analysis given the constraints defined in the annotations, and manually edit it to create a perfect and complete "gold" dataset. |
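
As a rough illustration of the patterns file mentioned for ner.match, the sketch below builds two entries, one exact string and one token pattern, and serializes them as JSONL (one JSON object per line). The labels and phrases are made up for the example, and to_jsonl is a hypothetical helper, not part of Prodigy.

```python
import json
from typing import Any, Dict, List

# Illustrative entries only: labels and phrases are invented for this sketch.
patterns: List[Dict[str, Any]] = [
    # Exact string match.
    {"label": "PERSON", "pattern": "Ada Lovelace"},
    # Token pattern: one dict per token, matched on the lowercase form.
    {"label": "GPE", "pattern": [{"lower": "new"}, {"lower": "york"}]},
]


def to_jsonl(entries: List[Dict[str, Any]]) -> str:
    """Serialize entries as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(entry) for entry in entries) + "\n"


patterns_jsonl = to_jsonl(patterns)
```

Writing patterns_jsonl to a file such as patterns.jsonl (a placeholder name) gives something in the same shape as the files in the example-patterns directory.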

Text Classification

| Recipe | Description |
| --- | --- |
| textcat.teach | Collect the best possible training data for a text classification model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. |
| textcat.custom-model | Use active learning-powered text classification with a custom model. To demonstrate how it works, this demo recipe uses a simple dummy model that "predicts" random scores. But you can swap it out for any model of your choice, for example a text classification model implementation using PyTorch, TensorFlow or scikit-learn. |
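
To give a feel for what a dummy model with random scores might look like, here is a minimal sketch. It is not the code of the textcat.custom-model recipe: the class layout, the seed argument and the n_seen counter are assumptions made for illustration. The two pieces an active-learning loop typically needs are a callable that yields (score, example) tuples and an update method that receives annotated answers.

```python
import copy
import random
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Example = Dict[str, Any]


class DummyModel:
    """Placeholder model that assigns a random score to each (text, label) pair."""

    def __init__(self, labels: Iterable[str], seed: int = 0) -> None:
        self.labels = list(labels)
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.n_seen = 0                 # number of answers received so far

    def __call__(self, stream: Iterable[Example]) -> Iterator[Tuple[float, Example]]:
        # Emit one scored task per label for every incoming example.
        for example in stream:
            for label in self.labels:
                task = copy.deepcopy(example)
                task["label"] = label
                yield self.rng.random(), task

    def update(self, answers: List[Example]) -> float:
        # A real model would train on the answers here; the sketch only
        # counts them and returns a placeholder loss.
        self.n_seen += len(answers)
        return 0.0


model = DummyModel(["POSITIVE", "NEGATIVE"])
scored = list(model([{"text": "This is great"}]))
```

Swapping in a real classifier would mean replacing the random score with the model's predicted probability and doing an actual training step in update.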

Terminology

| Recipe | Description |
| --- | --- |
| terms.teach | Bootstrap a terminology list with word vectors and seed terms. Prodigy will suggest similar terms based on the word vectors, and update the target vector accordingly. |
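
The idea of suggesting similar terms and updating a target vector can be illustrated with a toy example. This is not how terms.teach is implemented: the two-dimensional vectors are invented, and the update rule (add the vector of an accepted term, subtract the vector of a rejected one) is an assumption chosen for simplicity.

```python
import math
from typing import Dict, List, Optional, Set

# Invented 2-d "word vectors": fruit-like terms point one way, vehicles another.
VECTORS: Dict[str, List[float]] = {
    "apple": [1.0, 0.1],
    "pear": [0.9, 0.2],
    "banana": [0.8, 0.3],
    "car": [0.1, 1.0],
    "truck": [0.0, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest(target: List[float], seen: Set[str]) -> Optional[str]:
    """Return the unseen term whose vector is most similar to the target."""
    candidates = [term for term in VECTORS if term not in seen]
    return max(candidates, key=lambda t: cosine(VECTORS[t], target), default=None)


def update_target(target: List[float], term: str, answer: str) -> List[float]:
    """Assumed update rule: move towards accepted terms, away from rejected ones."""
    sign = 1.0 if answer == "accept" else -1.0
    return [t + sign * v for t, v in zip(target, VECTORS[term])]


# Start from a single seed term and accept the first suggestion.
seen = {"apple"}
target = list(VECTORS["apple"])
first = suggest(target, seen)
seen.add(first)
target = update_target(target, first, "accept")
second = suggest(target, seen)
```

With real word vectors, the same loop would run over the model's vocabulary instead of a five-word dictionary.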

Image

| Recipe | Description |
| --- | --- |
| image.manual | Manually annotate images by drawing rectangular bounding boxes or polygon shapes on the image. |
| image-caption | Annotate images with captions, pre-populate captions using an image captioning model implemented in PyTorch, and perform error analysis. |
| image.frozenmodel | Model-in-the-loop manual annotation using TensorFlow's Object Detection API. |
| image.servingmodel | Model-in-the-loop manual annotation using TensorFlow's Object Detection API. This recipe uses TensorFlow Serving. |
| image.trainmodel | Model-in-the-loop manual annotation and training using TensorFlow's Object Detection API. |

Other

| Recipe | Description |
| --- | --- |
| mark | Click through pre-prepared examples, with no model in the loop. |
| choice | Annotate data with multiple-choice options. The annotated examples will have an additional property "accept": [] mapping to the ID(s) of the selected option(s). |
| question_answering | Annotate question/answer pairs with a custom HTML interface. |
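
The "accept" property described for the choice recipe can be illustrated with plain dictionaries. The sketch assumes each option is a dict with "id" and "text" keys; make_choice_task and selected_labels are hypothetical helpers, and the annotated example is written by hand rather than produced by Prodigy.

```python
from typing import Any, Dict, List, Tuple

Example = Dict[str, Any]


def make_choice_task(text: str, options: List[Tuple[str, str]]) -> Example:
    """Build a task dict with one option dict per (id, label) pair."""
    return {
        "text": text,
        "options": [{"id": oid, "text": label} for oid, label in options],
    }


def selected_labels(annotated: Example) -> List[str]:
    """Map the IDs stored under "accept" back to the option texts."""
    by_id = {opt["id"]: opt["text"] for opt in annotated["options"]}
    return [by_id[oid] for oid in annotated.get("accept", [])]


task = make_choice_task(
    "The delivery arrived two days late.",
    [("POS", "Positive"), ("NEG", "Negative"), ("NEU", "Neutral")],
)

# Hand-written stand-in for an annotated example: the annotator picked "NEG".
annotated = dict(task, accept=["NEG"], answer="accept")
labels = selected_labels(annotated)
```

Since "accept" is a list, the same lookup works whether the annotator picked one option or several.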

Community recipes

| Recipe | Author | Description |
| --- | --- | --- |
| phrases.teach | @kabirkhan | Now part of sense2vec. |
| phrases.to-patterns | @kabirkhan | Now part of sense2vec. |
| records.link | @kabirkhan | Link records across multiple datasets using the dedupe library. |

📚 Example Datasets and Patterns

To make it even easier to get started, we've also included a few example datasets in the example-datasets directory, both raw data and data containing annotations created with Prodigy. For examples of token-based match patterns to use with recipes like ner.teach or ner.match, see the example-patterns directory.