
microsoft / Unilm

License: MIT
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Programming Languages

Python
Shell
CUDA
C++
Cython
Lua

Projects that are alternatives of or similar to Unilm

Crnn.pytorch
CRNN-based horizontal and vertical Chinese text recognition; provides pre-trained horizontal and vertical recognition models trained on 30,000+ Chinese characters. Feedback and issue reports are welcome.
Stars: ✭ 145 (-96.45%)
Mutual labels:  ocr, text-recognition
Php Apache Tika
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Stars: ✭ 76 (-98.14%)
Mutual labels:  ocr, text-recognition
Image Text Localization Recognition
A general list of resources for image text localization and recognition: a collection of papers and implementations for scene text localization and recognition.
Stars: ✭ 788 (-80.7%)
Mutual labels:  ocr, text-recognition
Scene Text Recognition
Scene text detection and recognition based on Extremal Region(ER)
Stars: ✭ 146 (-96.42%)
Mutual labels:  ocr, text-recognition
Text recognition toolbox
text_recognition_toolbox: reimplementations of a series of classical scene text recognition papers in PyTorch, in a uniform framework.
Stars: ✭ 114 (-97.21%)
Mutual labels:  ocr, text-recognition
Aster.pytorch
ASTER in PyTorch
Stars: ✭ 473 (-88.41%)
Mutual labels:  ocr, text-recognition
Sar tf
An implementation of Show, Attend and Read in TensorFlow
Stars: ✭ 70 (-98.29%)
Mutual labels:  ocr, text-recognition
Megreader
A research project for text detection and recognition using PyTorch 1.2.
Stars: ✭ 332 (-91.87%)
Mutual labels:  ocr, text-recognition
Chineseglue
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard
Stars: ✭ 1,548 (-62.08%)
Mutual labels:  language-understanding, pre-trained-model
Node Tesseract Ocr
A Node.js wrapper for the Tesseract OCR API
Stars: ✭ 92 (-97.75%)
Mutual labels:  ocr, text-recognition
Cnn lstm ctc ocr
TensorFlow-based CNN+LSTM trained with CTC loss for OCR
Stars: ✭ 464 (-88.63%)
Mutual labels:  ocr, text-recognition
Transformer str
PyTorch implementation of my new method for Scene Text Recognition (STR) based on the Transformer. Equipped with the Transformer, this method outperforms the best model of the deep-text-recognition-benchmark by 7.6% on CUTE80.
Stars: ✭ 131 (-96.79%)
Mutual labels:  ocr, text-recognition
React Native Tesseract Ocr
Tesseract OCR wrapper for React Native
Stars: ✭ 384 (-90.59%)
Mutual labels:  ocr, text-recognition
Tr
Free offline OCR: an offline Chinese text detection + recognition SDK
Stars: ✭ 598 (-85.35%)
Mutual labels:  ocr, text-recognition
Awesome Ocr Resources
A collection of resources (including the papers and datasets) of OCR (Optical Character Recognition).
Stars: ✭ 335 (-91.79%)
Mutual labels:  ocr, text-recognition
Crnn
Convolutional recurrent neural network for scene text recognition or OCR in Keras
Stars: ✭ 68 (-98.33%)
Mutual labels:  ocr, text-recognition
Vedastr
A scene text recognition toolbox based on PyTorch
Stars: ✭ 290 (-92.9%)
Mutual labels:  ocr, text-recognition
Chineseaddress ocr
OCR for photographed Chinese addresses, implemented using CTPN + CTC + address correction. Chinese address text recognition for photographed documents.
Stars: ✭ 309 (-92.43%)
Mutual labels:  ocr, text-recognition
Crnn With Stn
Implementation of CRNN in Keras with a Spatial Transformer Network
Stars: ✭ 83 (-97.97%)
Mutual labels:  ocr, text-recognition
Sightseq
Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection
Stars: ✭ 116 (-97.16%)
Mutual labels:  ocr, text-recognition

Hiring

We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-trained models, please send your resume to [email protected].

Pre-trained Models

Large-scale self-supervised pre-training across tasks (predictive and generative), languages (100+ languages), and modalities (language, image, audio, layout/format + language, vision + language, audio + language, etc.)

Language & Multilingual

UniLM: unified pre-training for language understanding and generation

InfoXLM/XLM-E: multilingual/cross-lingual pre-trained models for 100+ languages

DeltaLM/mT6: encoder-decoder pre-training for language generation and translation for 100+ languages

MiniLM: small and fast pre-trained models for language understanding and generation

AdaLM: domain, language, and task adaptation of pre-trained models

Vision

BEiT (NEW): generative self-supervised pre-training for image / BERT Pre-Training of Image Transformers

Speech

WavLM (NEW): speech pre-training for full stack tasks

Multimodal (X + Language)

LayoutLM: multimodal (text + layout/format + image) pre-training for Document AI (e.g. scanned documents, PDF, etc.)

LayoutXLM: multimodal (text + layout/format + image) pre-training for multilingual document understanding

MarkupLM (NEW): markup language model pre-training for visually-rich document understanding

UniSpeech: unified pre-training for self-supervised learning and supervised learning for ASR

UniSpeech-SAT: universal speech representation learning with speaker-aware pre-training

SpeechT5 (NEW): encoder-decoder pre-training for spoken language processing

VLMo (NEW): Unified vision-language pre-training - evolution of BEiT to multimodal

Toolkits

s2s-ft: sequence-to-sequence fine-tuning toolkit

Applications

TrOCR (NEW): transformer-based OCR w/ pre-trained models

LayoutReader: pre-training of text and layout for reading order detection

XLM-T: multilingual NMT w/ pretrained cross-lingual encoders

News

  • [Model Release] December 16th, 2021: TrOCR small models for handwritten and printed texts.
  • November 24th, 2021: VLMo as the new SOTA on the VQA Challenge
  • November, 2021: Multilingual translation at scale: 10000 language pairs and beyond
  • [Model Release] November, 2021: MarkupLM
  • [Model Release] November, 2021: VLMo - Unified vision-language pre-training w/ BEiT
  • [Model Release] October, 2021: WavLM - Large-scale self-supervised pre-trained models for speech.
  • [Model Release] October 2021: TrOCR is on HuggingFace
  • September 28th, 2021: T-ULRv5 (aka XLM-E/InfoXLM) as the SOTA on the XTREME leaderboard. // Blog
  • [Model Release] September, 2021: LayoutLM-cased are on HuggingFace
  • [Model Release] September, 2021: TrOCR - Transformer-based OCR w/ pre-trained BEiT and RoBERTa models.
  • August 2021: LayoutLMv2 and LayoutXLM are on HuggingFace
  • [Model Release] August, 2021: LayoutReader - Built with LayoutLM to improve general reading order detection.
  • [Model Release] August, 2021: DeltaLM - Encoder-decoder pre-training for language generation and translation.
  • August 2021: BEiT is on HuggingFace
  • [Model Release] July, 2021: BEiT - Towards BERT moment for CV
  • [Model Release] June, 2021: LayoutLMv2, LayoutXLM, MiniLMv2, and AdaLM.
  • May, 2021: LayoutLMv2, InfoXLMv2, MiniLMv2, UniLMv3, and AdaLM were accepted by ACL 2021.
  • April, 2021: LayoutXLM is coming, extending LayoutLM with multilingual support! A multilingual form understanding benchmark, XFUND, is also introduced, which includes forms with human-labeled key-value pairs in 7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese).
  • March, 2021: InfoXLM was accepted by NAACL 2021.
  • December 29th, 2020: LayoutLMv2 is coming with new SOTA on a wide variety of document AI tasks, including the DocVQA and SROIE leaderboards.
  • October 8th, 2020: T-ULRv2 (aka InfoXLM) as the SOTA on the XTREME leaderboard. // Blog
  • September, 2020: MiniLM was accepted by NeurIPS 2020.
  • July 16, 2020: InfoXLM (Multilingual UniLM) arXiv
  • June, 2020: UniLMv2 was accepted by ICML 2020; LayoutLM was accepted by KDD 2020.
  • April 5, 2020: Multilingual MiniLM released!
  • September, 2019: UniLMv1 was accepted by NeurIPS 2019.

Release

***** New October, 2021: WavLM release *****

  • WavLM (October 27, 2021): WavLM is a new pre-trained speech model for solving full-stack downstream speech tasks. WavLM integrates a gated relative position embedding structure and an utterance mixing method to model both the spoken content and the speaker identity. WavLM is trained on 94k hours of public audio data, more than any other released checkpoint for English speech modeling. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark and brings significant improvements to various speech processing tasks on their representative benchmarks. "WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing"
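
As a rough illustration of how such a checkpoint can be consumed, the sketch below loads a WavLM encoder through the Hugging Face transformers library and extracts frame-level speech representations. The model id "microsoft/wavlm-large", the dummy waveform, and the surrounding setup are assumptions for illustration only, not part of this release.

    import torch
    from transformers import AutoFeatureExtractor, WavLMModel

    # Assumed checkpoint id on the Hugging Face hub; adjust to the checkpoint you actually use.
    feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-large")
    model = WavLMModel.from_pretrained("microsoft/wavlm-large")

    # One second of silent 16 kHz audio as a stand-in for a real waveform.
    waveform = torch.zeros(16000).numpy()
    inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # Frame-level representations usable for downstream tasks (ASR, speaker verification, ...).
    print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)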

***** New October, 2021: MarkupLM release *****

  • MarkupLM (October 19, 2021): MarkupLM is a simple yet effective pre-training approach for text and markup language. Built on the Transformer architecture, MarkupLM integrates different input embeddings, including text embeddings, position embeddings, and XPath embeddings. Furthermore, we also propose new pre-training objectives that are specially designed for understanding markup language. We evaluate the pre-trained MarkupLM model on the WebSRC and SWDE datasets. Experiments show that MarkupLM significantly outperforms several SOTA baselines on these tasks. "MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding"
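
For a sense of what the model consumes, here is a minimal, hypothetical sketch that encodes a small HTML snippet with MarkupLM through the Hugging Face transformers classes. The "microsoft/markuplm-base" id and the example HTML are assumptions, and the processor additionally depends on beautifulsoup4 for XPath extraction.

    import torch
    from transformers import MarkupLMProcessor, MarkupLMModel

    # Assumed checkpoint id; the processor pairs an HTML/XPath feature extractor with the tokenizer.
    processor = MarkupLMProcessor.from_pretrained("microsoft/markuplm-base")
    model = MarkupLMModel.from_pretrained("microsoft/markuplm-base")

    html = "<html><body><h1>Hello world</h1><p>MarkupLM reads markup.</p></body></html>"

    # The processor extracts node texts and their XPath expressions from the raw HTML,
    # then produces token ids, XPath tag/subscript sequences, and attention masks.
    inputs = processor(html, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)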

***** September, 2021: TrOCR release *****

  • TrOCR (September 22, 2021): Transformer-based OCR with pre-trained models, which leverages the Transformer architecture for both image understanding and BPE-level text generation. The TrOCR model is simple but effective (convolution-free), and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models"
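
As an illustrative sketch (not part of the release itself), the following loads a TrOCR checkpoint through the Hugging Face transformers library and transcribes a single cropped text-line image; the "microsoft/trocr-base-handwritten" id and the input file name are assumptions.

    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Assumed checkpoint id; TrOCR is exposed as a vision encoder + text decoder pair.
    processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
    model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

    # TrOCR expects a cropped image containing a single line of text.
    image = Image.open("text_line.png").convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # The decoder generates BPE token ids, which are decoded back into a string.
    generated_ids = model.generate(pixel_values)
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(text)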

***** August, 2021: LayoutReader release *****

***** August, 2021: DeltaLM release *****

***** July, 2021: BEiT release *****

***** June, 2021: LayoutXLM | AdaLM | MiniLMv2 release *****

***** May, 2021: LayoutLMv2 | LayoutXLM release *****

  • LayoutLM 2.0 (December 29, 2020): multimodal pre-training for visually-rich document understanding that leverages text, layout, and image information in a single framework. It achieves new SOTA on a wide range of document understanding tasks, including FUNSD (0.7895 -> 0.8420), CORD (0.9493 -> 0.9601), SROIE (0.9524 -> 0.9781), Kleister-NDA (0.834 -> 0.852), RVL-CDIP (0.9443 -> 0.9564), and DocVQA (0.7295 -> 0.8672). "LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding" (ACL 2021)
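
To make the fine-tuning setup concrete, here is a hypothetical sketch of token classification (FUNSD-style form labeling) with LayoutLMv2 through the Hugging Face transformers classes. The "microsoft/layoutlmv2-base-uncased" id, the label count, and the input file are assumptions, and the model additionally requires detectron2 (visual backbone) and pytesseract (the processor's built-in OCR).

    from PIL import Image
    from transformers import LayoutLMv2Processor, LayoutLMv2ForTokenClassification

    # Assumed checkpoint id; extra dependencies: detectron2 and pytesseract.
    processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
    model = LayoutLMv2ForTokenClassification.from_pretrained(
        "microsoft/layoutlmv2-base-uncased",
        num_labels=7,  # e.g. FUNSD BIO labels: O, B/I-HEADER, B/I-QUESTION, B/I-ANSWER
    )

    # The processor runs OCR on the page image and returns token ids,
    # word bounding boxes, and the resized image in a single encoding.
    image = Image.open("scanned_form.png").convert("RGB")
    encoding = processor(image, return_tensors="pt")

    outputs = model(**encoding)
    print(outputs.logits.shape)  # (batch, sequence_length, num_labels)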

***** February, 2020: UniLM v2 | MiniLM v1 | LayoutLM v1 | s2s-ft v1 release *****

***** October 1st, 2019: UniLM v1 release *****

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the transformers project.

Microsoft Open Source Code of Conduct

Contact Information

For help or issues using the pre-trained models, please submit a GitHub issue.

For other communications, please contact Furu Wei ([email protected]).
