
microsoft / verseagility

License: MIT
Ramp up your custom natural language processing (NLP) task, allowing you to bring your own data, use your preferred frameworks and bring models into production.

Programming Languages

  • Jupyter Notebook
  • Python
  • TypeScript

Projects that are alternatives of or similar to verseagility

Nlp research
NLP research: a TensorFlow-based NLP deep learning project supporting four major tasks: text classification, sentence matching, sequence labeling, and text generation
Stars: ✭ 141 (+513.04%)
Mutual labels:  transformer, classification, ner
Rust Bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
Stars: ✭ 510 (+2117.39%)
Mutual labels:  transformer, question-answering, ner
Scientificsummarizationdatasets
Datasets I have created for scientific summarization, and a trained BertSum model
Stars: ✭ 100 (+334.78%)
Mutual labels:  transformer, summarization
Etagger
Reference TensorFlow code for named entity tagging
Stars: ✭ 100 (+334.78%)
Mutual labels:  transformer, ner
Ner Bert Pytorch
PyTorch solution of the named entity recognition task using Google AI's pre-trained BERT model.
Stars: ✭ 249 (+982.61%)
Mutual labels:  transformer, ner
Bert Multitask Learning
BERT for Multitask Learning
Stars: ✭ 380 (+1552.17%)
Mutual labels:  transformer, ner
Meta Emb
Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)
Stars: ✭ 28 (+21.74%)
Mutual labels:  transformer, ner
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (+521.74%)
Mutual labels:  transformer, summarization
Demo Chinese Text Binary Classification With Bert
Stars: ✭ 276 (+1100%)
Mutual labels:  transformer, classification
TitleStylist
Source code for our "TitleStylist" paper at ACL 2020
Stars: ✭ 72 (+213.04%)
Mutual labels:  transformer, summarization
expmrc
ExpMRC: Explainability Evaluation for Machine Reading Comprehension
Stars: ✭ 58 (+152.17%)
Mutual labels:  question-answering, machine-reading-comprehension
hf-experiments
Experiments with Hugging Face 🔬 🤗
Stars: ✭ 37 (+60.87%)
Mutual labels:  question-answering, summarization
Text Classification Models Pytorch
Implementation of State-of-the-art Text Classification Models in Pytorch
Stars: ✭ 379 (+1547.83%)
Mutual labels:  transformer, classification
Nlp Experiments In Pytorch
PyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (+52.17%)
Mutual labels:  transformer, ner
Abstractive Summarization With Transfer Learning
Abstractive summarization using BERT as the encoder and a Transformer decoder
Stars: ✭ 358 (+1456.52%)
Mutual labels:  transformer, summarization
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (+152.17%)
Mutual labels:  transformer, question-answering
well-classified-examples-are-underestimated
Code for the AAAI 2022 publication "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"
Stars: ✭ 21 (-8.7%)
Mutual labels:  transformer, classification
Nlp Interview Notes
Study notes and materials for natural language processing (NLP) interview preparation, compiled by the authors from personal interviews and experience; currently a collection of interview questions across NLP subfields.
Stars: ✭ 207 (+800%)
Mutual labels:  transformer, ner
fastT5
⚡ Boost inference speed of T5 models by 5x & reduce the model size by 3x.
Stars: ✭ 421 (+1730.43%)
Mutual labels:  transformer, question-answering
tensorflow-ml-nlp-tf2
Hands-on materials for "Natural Language Processing with TensorFlow 2 and Machine Learning (from Logistic Regression to BERT and GPT-3)"
Stars: ✭ 245 (+965.22%)
Mutual labels:  transformer, ner


Verseagility - NLP Toolkit

Verseagility is a Python-based toolkit to ramp up your custom natural language processing (NLP) task, allowing you to bring your own data, use your preferred frameworks and bring models into production. It is a central component of the Microsoft Data Science Toolkit.

Why Verseagility?

Building NLP solutions that cover components from text classification and named entity recognition to answer suggestion requires testing and integration effort. We developed this toolkit to minimize the setup time of an end-to-end solution and to maximize the time for use-case-specific enhancements and adjustments. This way, first results are available sooner when you bring your own pre-labeled text documents, leaving more time for iterative improvements.

Verseagility Process

See the documentation section for detailed instructions on how to get started with the toolkit.

Supported Use Cases

Verseagility is a modular toolkit that can be extended with further use cases as needed. The following use cases are already implemented and ready to use (an illustrative sketch follows this list):

  • Binary, multi-class & multi-label classification
  • Named entity recognition
  • Question answering
  • Opinion mining
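
Purely to illustrate these task types, here is a generic Hugging Face Transformers sketch; this is not Verseagility's own API, but Transformers is among the toolkit's dependencies (see the Current updates section below):

```python
from transformers import pipeline

text = "Microsoft's Verseagility toolkit was presented in Munich."

# Text classification (binary, multi-class and multi-label tasks build on this)
classifier = pipeline("text-classification")
print(classifier(text))

# Named entity recognition (aggregation merges sub-word tokens into entities)
ner = pipeline("ner", aggregation_strategy="simple")
print(ner(text))

# Question answering over a given context
qa = pipeline("question-answering")
print(qa(question="Where was the toolkit presented?", context=text))

# Opinion mining is closely related to sentiment analysis
sentiment = pipeline("sentiment-analysis")
print(sentiment("The new release works great."))
```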

Live Demo

The toolkit paves the way to build consumable REST APIs, for example in Azure Container Instances. These APIs may be consumed by the application of your choice: a website, a business process, or just for testing purposes. A web-based live demo of models resulting from Verseagility is hosted at the Microsoft Technology Center Germany (MTC):

Verseagility Demo
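
A minimal sketch of how such an endpoint could be consumed from Python; the URL, route, and payload shape below are hypothetical placeholders, not Verseagility's documented contract:

```python
import requests

# Hypothetical scoring endpoint of a service deployed to ACI (placeholder URL).
ENDPOINT = "http://<your-instance>.azurecontainer.io/score"

# Hypothetical payload: a batch with a single document to score.
payload = [{"subject": "Printer offline", "body": "My printer is no longer detected."}]

response = requests.post(ENDPOINT, json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. predicted category, entities or answer suggestions
```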

Repository Structure

The repository is built in the following structure:

├── /assets            <- Version-controlled assets, such as stopword lists. Max size
│                         per file: 10 MB. Training data should be stored in a local
│                         data directory, outside the repository or covered by .gitignore.
│
├── /demo              <- Demo environment that can be deployed as is, or customized. 
│
├── /deploy            <- Scripts used for deploying training or test services
│   ├── training.py    <- Deploy your training to a remote compute instance, via AML
│   │                     (see the sketch below this tree)
│   ├── hyperdrive.py  <- Deploy a hyperparameter sweep on a remote compute instance, via AML
│   │
│   └── service.py     <- Deploy a service (endpoint) to ACI or AKS, via AML
│
├── /docs              <- Detailed documentation.
│
├── /notebook          <- Jupyter notebooks. Naming convention is <[Task]-[Short Description]>,
│                         for example: 'Data - Exploration.ipynb'
│
├── /pipeline          <- Document processing pipeline components, including document cracker. 
│
├── /project           <- Project configuration files, detailing the tasks to be completed.
│
├── /src               <- Source code for use in this project.
│   ├── infer.py       <- Inference file, for scoring the model
│   │   
│   ├── data.py        <- Use-case-agnostic utils file, for data management incl. upload/download
│   │
│   └── helper.py      <- Use-case-agnostic utils file, with common functions incl. secret handling
│
├── /tests             <- Unit tests (using pytest)
│
├── README.md          <- The top-level README for developers using this project.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment.
│                         Can be generated using `pip freeze > requirements.txt`
│
└── config.ini         <- Configuration and secrets used while developing locally.
                          Secrets in production should be stored in Azure Key Vault.
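
For orientation, here is a minimal sketch of what a submission via deploy/training.py conceptually does, assuming the AzureML SDK v1 (azureml-core); the experiment name, compute target, and entry script below are placeholders, not the toolkit's actual values:

```python
from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

# Load the workspace from a local config.json (downloaded from the Azure portal).
ws = Workspace.from_config()

# Build the run environment from the repository's requirements file.
env = Environment.from_pip_requirements("verseagility-env", "requirements.txt")

src = ScriptRunConfig(
    source_directory="src",
    script="train.py",             # placeholder entry script
    compute_target="gpu-cluster",  # name of an existing AML compute cluster
    environment=env,
)

run = Experiment(ws, "verseagility-train").submit(src)
run.wait_for_completion(show_output=True)
```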

Acknowledgements

Verseagility is built in part on open-source frameworks; see the illustration of our current tech stack below:

Verseagility Process

Maintainers:

Current updates

The following section contains a list of possible new features or enhancements. Feel free to contribute.

Infrastructure

  • Verseagility Lite template (ARM)
  • Python Version >= 3.7 support (Transformers dependencies)
  • Upgrade to newer AzureML SDK

Datasets

  • Support of tabular data sets in AML (see the sketch below)
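
For reference, a sketch of the AML tabular dataset API this item refers to, using the SDK v1; the datastore name and file path are placeholders:

```python
from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
datastore = Datastore.get(ws, "workspaceblobstore")  # default AML datastore

# Create a tabular dataset from a delimited file (placeholder path) and register it.
ds = Dataset.Tabular.from_delimited_files(path=(datastore, "data/train.csv"))
ds = ds.register(workspace=ws, name="verseagility-train-data", create_new_version=True)

df = ds.to_pandas_dataframe()  # materialize locally for exploration
```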

Classification

  • Integrate handling for larger documents vs short documents
  • Integrate explicit handling for unbalanced datasets
  • ONNX support (see the sketch below)
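
As a sketch of what ONNX support could look like, this exports a Transformers classification model with plain torch.onnx.export; the model name and output path are placeholders:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# return_dict=False makes the model return plain tuples, which ONNX export expects.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", return_dict=False
)
model.eval()

dummy = tokenizer("example input", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "classifier.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)
```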

NER

  • Improve duplicate handling

Question Answering

  • Apply advanced IR methods

Summarization

  • (in progress) Full integration test

Deployment

  • Deploy the service to an Azure Function (without AzureML; see the sketch below)
  • Set up GitHub Actions
  • AKS testing
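
One possible shape of the Azure Functions item, sketched with the Azure Functions Python v2 programming model; the route and the placeholder inference step are illustrative only:

```python
import json

import azure.functions as func

app = func.FunctionApp()

@app.route(route="score", methods=["POST"], auth_level=func.AuthLevel.FUNCTION)
def score(req: func.HttpRequest) -> func.HttpResponse:
    docs = req.get_json()
    # Placeholder: load the trained model once at startup and run inference here.
    predictions = [{"label": "placeholder", "score": 0.0} for _ in docs]
    return func.HttpResponse(json.dumps(predictions), mimetype="application/json")
```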

Notebooks Templates

  • (in progress) Review model results (auto-generated after each training step)
  • Review model bias (auto-generated after each training step)
  • (in progress) Benchmark of available models (incl. AutoML)

Tests

  • Unit tests (pytest)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Feel free to contribute!
