Alternatives and detailed information of bert_tokenization_for_java

zhongbin1 / bert_tokenization_for_java

License: Apache-2.0

This is a Java version of the Chinese tokenization described in BERT.

Programming Languages

java


Projects that are alternatives of or similar to bert tokenization for java

berserker

Berserker - BERt chineSE woRd toKenizER

Stars: ✭ 17 (-56.41%)

Mutual labels: chinese-nlp, bert

Nlp chinese corpus

Large Scale Chinese Corpus for NLP

Stars: ✭ 6,656 (+16966.67%)

Mutual labels: chinese-nlp, bert

DeepNER

An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models.

Stars: ✭ 9 (-76.92%)

Mutual labels: bert

BERT-Chinese-Couplet

BERT for Chinese Couplet (automatic couplet matching)

Stars: ✭ 19 (-51.28%)

Mutual labels: bert

polycash

The ultimate open source betting protocol. PolyCash is a P2P blockchain platform for wallets, asset issuance, bonds & gaming.

Stars: ✭ 24 (-38.46%)

Mutual labels: tokenization

Self-Supervised-Embedding-Fusion-Transformer

The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.

Stars: ✭ 57 (+46.15%)

Mutual labels: bert

ParsBigBird

Persian Bert For Long-Range Sequences

Stars: ✭ 58 (+48.72%)

Mutual labels: bert

SentimentAnalysis

(BOW, TF-IDF, Word2Vec, BERT) Word Embeddings + (SVM, Naive Bayes, Decision Tree, Random Forest) Base Classifiers + Pre-trained BERT on Tensorflow Hub + 1-D CNN and Bi-Directional LSTM on IMDB Movie Reviews Dataset

Stars: ✭ 40 (+2.56%)

Mutual labels: bert

bern

A neural named entity recognition and multi-type normalization tool for biomedical text mining

Stars: ✭ 151 (+287.18%)

Mutual labels: bert

spacy russian tokenizer

Custom Russian tokenizer for spaCy

Stars: ✭ 35 (-10.26%)

Mutual labels: tokenization

golgotha

Contextualised Embeddings and Language Modelling using BERT and Friends using R

Stars: ✭ 39 (+0%)

Mutual labels: bert

LMMS

Language Modelling Makes Sense - WSD (and more) with Contextual Embeddings

Stars: ✭ 79 (+102.56%)

Mutual labels: bert

ganbert-pytorch

Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace

Stars: ✭ 60 (+53.85%)

Mutual labels: bert

transformer-models

Deep Learning Transformer models in MATLAB

Stars: ✭ 90 (+130.77%)

Mutual labels: bert

Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, and sequence annotation tasks such as Chinese named entity recognition, part of speech ta…

Stars: ✭ 151 (+287.18%)

Mutual labels: bert

SQUAD2.Q-Augmented-Dataset

Augmented version of SQUAD 2.0 for Questions

Stars: ✭ 31 (-20.51%)

Mutual labels: bert

muse-as-service

REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

Stars: ✭ 45 (+15.38%)

Mutual labels: bert

bert-AAD

Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation

Stars: ✭ 27 (-30.77%)

Mutual labels: bert

ark-nlp

A private nlp coding package, which quickly implements the SOTA solutions.

Stars: ✭ 232 (+494.87%)

Mutual labels: bert

KAREN

KAREN: Unifying Hatespeech Detection and Benchmarking

Stars: ✭ 18 (-53.85%)

Mutual labels: bert


This is a Java version of the Chinese tokenization described in BERT, including basic tokenization and WordPiece tokenization.
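The two stages can be sketched as follows. This is a minimal illustration, not the repository's code: the class name `WordPieceSketch`, its method names, and the tiny vocabulary are hypothetical. It assumes that basic tokenization isolates each Chinese character (approximated here with Java's `Character.UnicodeScript.HAN`, whereas BERT checks explicit code point ranges) and that WordPiece uses greedy longest-match-first lookup with a `##` prefix for continuation pieces. Punctuation splitting and lowercasing are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; names are hypothetical, not the repository's API. */
final class WordPieceSketch {
    private static final String UNK = "[UNK]";

    /** Basic tokenization: put spaces around each Han character, then split on whitespace. */
    static List<String> basicTokenize(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                sb.append(' ').appendCodePoint(cp).append(' ');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        List<String> tokens = new ArrayList<>();
        for (String t : sb.toString().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** WordPiece: repeatedly take the longest vocabulary entry that prefixes the remainder. */
    static List<String> wordPiece(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            while (start < end) {
                String sub = word.substring(start, end);
                if (start > 0) {
                    sub = "##" + sub; // continuation pieces carry the ## prefix
                }
                if (vocab.contains(sub)) {
                    match = sub;
                    break;
                }
                end--;
            }
            if (match == null) {
                // No piece matches at this position: the whole word becomes [UNK].
                return Collections.singletonList(UNK);
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    /** Runs basic tokenization, then WordPiece on every resulting token. */
    static List<String> tokenize(String text, Set<String> vocab) {
        List<String> out = new ArrayList<>();
        for (String word : basicTokenize(text)) {
            out.addAll(wordPiece(word, vocab));
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy vocabulary; \u4E2D and \u6587 are the two characters of the word for "Chinese".
        Set<String> vocab = new HashSet<>(
                Arrays.asList("un", "##aff", "##able", "\u4E2D", "\u6587"));
        System.out.println(tokenize("unaffable \u4E2D\u6587", vocab));
    }
}
```

With this toy vocabulary, the input `unaffable 中文` yields the tokens `un`, `##aff`, `##able`, `中`, `文`. A real deployment would load the vocabulary from the `vocab.txt` file shipped with the BERT checkpoint being served.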

Motivation

In production, we usually deploy BERT-based models with TensorFlow Serving for high performance and flexibility. However, the client application may not be written in Python, in which case the tokenization module has to be rewritten in the application's language.

Usage

Run Preprocess.java to get the result. Both single-sentence and sentence-pair input are supported.
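For readers unfamiliar with the two input modes, the sketch below shows the standard BERT layout: `[CLS] A [SEP]` for a single sentence and `[CLS] A [SEP] B [SEP]` for a pair, with segment id 0 for the first part and 1 for the second. The class `PairInputSketch` and its fields are hypothetical and only illustrate the layout; the exact output of Preprocess.java is not documented here and may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; names are hypothetical, not the repository's API. */
final class PairInputSketch {
    final List<String> tokens = new ArrayList<>();
    final List<Integer> segmentIds = new ArrayList<>();

    private void add(String token, int segment) {
        tokens.add(token);
        segmentIds.add(segment);
    }

    /**
     * Builds the token and segment-id sequences from already tokenized input.
     * Pass null as the second argument for single-sentence input.
     */
    static PairInputSketch build(List<String> first, List<String> second) {
        PairInputSketch input = new PairInputSketch();
        input.add("[CLS]", 0);
        for (String t : first) {
            input.add(t, 0);
        }
        input.add("[SEP]", 0);
        if (second != null) {
            for (String t : second) {
                input.add(t, 1);
            }
            input.add("[SEP]", 1);
        }
        return input;
    }

    public static void main(String[] args) {
        // \u4F60 \u597D is "hello"; \u4E16 \u754C is "world".
        PairInputSketch pair = build(
                Arrays.asList("\u4F60", "\u597D"),
                Arrays.asList("\u4E16", "\u754C"));
        System.out.println(pair.tokens);
        System.out.println(pair.segmentIds);
    }
}
```

Before the sequences are sent to the served model, the tokens would still need to be mapped to vocabulary ids and padded to the model's maximum sequence length.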

Moreover, for Chinese natural language processing, we add full-width to half-width character conversion and uppercase-to-lowercase conversion.
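A minimal sketch of such a normalization step is shown below. It assumes the usual Unicode mapping: the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts, and the ideographic space U+3000 maps to an ordinary space. The class and method names are hypothetical; the repository's own implementation may cover more or fewer characters.

```java
import java.util.Locale;

/** Illustrative sketch only; names are hypothetical, not the repository's API. */
final class NormalizeSketch {
    /** Converts full-width characters to half-width, then lowercases the text. */
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == 0x3000) {
                sb.append(' ');                   // ideographic space -> ASCII space
            } else if (c >= 0xFF01 && c <= 0xFF5E) {
                sb.append((char) (c - 0xFEE0));   // full-width form -> ASCII counterpart
            } else {
                sb.append(c);                     // everything else is left unchanged
            }
        }
        // Locale.ROOT avoids locale-specific case rules (for example the Turkish dotless i).
        return sb.toString().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Full-width "BERT", an ideographic space, then two Chinese characters.
        String raw = "\uFF22\uFF25\uFF32\uFF34\u3000\u4E2D\u6587";
        System.out.println(normalize(raw));
    }
}
```

Applying this before tokenization means that full-width Latin letters and digits, which are common in Chinese text, hit the same vocabulary entries as their ASCII equivalents when an uncased vocabulary is used.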

Reporting issues

Please let me know if you encounter any problems.

