robrua / Easy Bert

License: MIT
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)

Programming Languages

Java

Projects that are alternatives of or similar to Easy Bert

Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-63.21%)
Mutual labels:  natural-language-processing, natural-language-understanding, word-embeddings
Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (+91.51%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+52486.79%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Stars: ✭ 5,077 (+4689.62%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Chars2vec
Character-based word embeddings model based on RNN for handling real world texts
Stars: ✭ 130 (+22.64%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Spacy Transformers
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Stars: ✭ 919 (+766.98%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Convai Baseline
ConvAI baseline solution
Stars: ✭ 49 (-53.77%)
Mutual labels:  natural-language-processing, natural-language-understanding
Python Tutorial Notebooks
Python tutorials as Jupyter Notebooks for NLP, ML, AI
Stars: ✭ 52 (-50.94%)
Mutual labels:  natural-language-processing, natural-language-understanding
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+1215.09%)
Mutual labels:  natural-language-processing, word-embeddings
Chinese nlu by using rasa nlu
Use RASA NLU to build a Chinese Natural Language Understanding System (NLU)
Stars: ✭ 99 (-6.6%)
Mutual labels:  natural-language-processing, natural-language-understanding
Reading comprehension tf
Machine Reading Comprehension in Tensorflow
Stars: ✭ 37 (-65.09%)
Mutual labels:  natural-language-processing, natural-language-understanding
Bidaf Keras
Bidirectional Attention Flow for Machine Comprehension implemented in Keras 2
Stars: ✭ 60 (-43.4%)
Mutual labels:  natural-language-processing, natural-language-understanding
Gpt2
PyTorch Implementation of OpenAI GPT-2
Stars: ✭ 64 (-39.62%)
Mutual labels:  natural-language-processing, language-model
Ludwig
Data-centric declarative deep learning framework
Stars: ✭ 8,018 (+7464.15%)
Mutual labels:  natural-language-processing, natural-language-understanding
Blocks
Blocks World -- Simulator, Code, and Models (Misra et al. EMNLP 2017)
Stars: ✭ 39 (-63.21%)
Mutual labels:  natural-language-processing, natural-language-understanding
Vietnamese Electra
Electra pre-trained model using Vietnamese corpus
Stars: ✭ 55 (-48.11%)
Mutual labels:  natural-language-processing, language-model
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-43.4%)
Mutual labels:  natural-language-processing, word-embeddings
Dialogue Understanding
This repository contains PyTorch implementation for the baseline models from the paper Utterance-level Dialogue Understanding: An Empirical Study
Stars: ✭ 77 (-27.36%)
Mutual labels:  natural-language-processing, natural-language-understanding
Mt Dnn
Multi-Task Deep Neural Networks for Natural Language Understanding
Stars: ✭ 72 (-32.08%)
Mutual labels:  natural-language-processing, natural-language-understanding
Greek Bert
A Greek edition of BERT pre-trained language model
Stars: ✭ 84 (-20.75%)
Mutual labels:  natural-language-processing, language-model

easy-bert

easy-bert is a dead simple API for using Google's high quality BERT language model in Python and Java.

Currently, easy-bert is focused on getting embeddings from pre-trained BERT models in both Python and Java. Support for fine-tuning and pre-training in Python will be added in the future, as well as support for using easy-bert for other tasks besides getting embeddings.

Python

How To Get It

easy-bert is available on PyPI. You can install it with pip install easybert, or with pip install git+https://github.com/robrua/easy-bert.git if you want the very latest.

Usage

You can use easy-bert with pre-trained BERT models from TensorFlow Hub or from local models in the TensorFlow saved model format.

To create a BERT embedder from a TensorFlow Hub model, simply instantiate a Bert object with the target TensorFlow Hub URL:

from easybert import Bert
bert = Bert("https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1")

You can also load a local model in TensorFlow's saved model format using Bert.load:

from easybert import Bert
bert = Bert.load("/path/to/your/model/")

Once you have a BERT model loaded, you can get sequence embeddings using bert.embed:

x = bert.embed("A sequence")
y = bert.embed(["Multiple", "Sequences"])

If you want per-token embeddings, you can set per_token=True:

x = bert.embed("A sequence", per_token=True)
y = bert.embed(["Multiple", "Sequences"], per_token=True)

easy-bert returns BERT embeddings as numpy arrays.
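
For instance, assuming one of the 768-dimensional BERT-Base models listed further down in this README, the returned arrays look roughly like the comments below; the exact per-token length depends on BERT's WordPiece tokenization, so treat the shapes as illustrative rather than exact:

from easybert import Bert

bert = Bert("https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1")

x = bert.embed("A sequence")                   # 1-D numpy array, e.g. shape (768,)
y = bert.embed(["Multiple", "Sequences"])      # 2-D numpy array, one row per sequence, e.g. shape (2, 768)
t = bert.embed("A sequence", per_token=True)   # one vector per token, e.g. shape (number_of_tokens, 768)

print(type(x), x.shape)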

Every time you call bert.embed, a new TensorFlow session is created and used for the computation. If you're calling bert.embed a lot sequentially, you can speed up your code by sharing a TensorFlow session among those calls using a with statement:

with bert:
    x = bert.embed("A sequence", per_token=True)
    y = bert.embed(["Multiple", "Sequences"], per_token=True)

You can save a BERT model using bert.save, then reload it later using Bert.load:

bert.save("/path/to/your/model/")
bert = Bert.load("/path/to/your/model/")

CLI

easy-bert also provides a CLI tool to conveniently do one-off embeddings of sequences with BERT. It can also convert a TensorFlow Hub model to a saved model.

Run bert --help, bert embed --help or bert download --help to get details about the CLI tool.

Docker

easy-bert comes with a Docker build that can be used as a base image for applications that rely on BERT embeddings, or to just run the CLI tool without needing to install an environment.

Java

How To Get It

easy-bert is available on Maven Central. It is also distributed through the releases page.

To add the latest easy-bert release version to your maven project, add the dependency to your pom.xml dependencies section:

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp</groupId>
    <artifactId>easy-bert</artifactId>
    <version>1.0.3</version>
  </dependency>
</dependencies>

Or, if you want to get the latest development version, add the Sonatype Snapshot Repository to your pom.xml as well:

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp</groupId>
    <artifactId>easy-bert</artifactId>
    <version>1.0.4-SNAPSHOT</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>snapshots-repo</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

Usage

You can use easy-bert with pre-trained BERT models generated with easy-bert's Python tools. You can also use pre-generated models from Maven Central.

To load a model from your local filesystem, you can use:

try(Bert bert = Bert.load(new File("/path/to/your/model/"))) {
    // Embed some sequences
}

If the model is in your classpath (e.g. if you're pulling it in via Maven), you can use:

try(Bert bert = Bert.load("/resource/path/to/your/model")) {
    // Embed some sequences
}

Once you have a BERT model loaded, you can get sequence embeddings using bert.embedSequence or bert.embedSequences:

float[] embedding = bert.embedSequence("A sequence");
float[][] embeddings = bert.embedSequences("Multiple", "Sequences");

If you want per-token embeddings, you can use bert.embedTokens:

float[][] embedding = bert.embedTokens("A sequence");
float[][][] embeddings = bert.embedTokens("Multiple", "Sequences");

Pre-Generated Maven Central Models

Various TensorFlow Hub BERT models are available in easy-bert format on Maven Central. To use one in your project, add the following to your pom.xml, substituting one of the Artifact IDs listed below in place of ARTIFACT-ID in the artifactId:

<dependencies>
  <dependency>
    <groupId>com.robrua.nlp.models</groupId>
    <artifactId>ARTIFACT-ID</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>

Once you've pulled in the dependency, you can load the model using this code. Substitute the appropriate Resource Path from the list below in place of RESOURCE-PATH based on the model you added as a dependency:

try(Bert bert = Bert.load("RESOURCE-PATH")) {
    // Embed some sequences
}

Available Models

BERT-Base, Uncased
  Languages: English
  Layers: 12 | Embedding Size: 768 | Heads: 12 | Parameters: 110M
  Artifact ID: easy-bert-uncased-L-12-H-768-A-12
  Resource Path: com/robrua/nlp/easy-bert/bert-uncased-L-12-H-768-A-12

BERT-Base, Cased
  Languages: English
  Layers: 12 | Embedding Size: 768 | Heads: 12 | Parameters: 110M
  Artifact ID: easy-bert-cased-L-12-H-768-A-12
  Resource Path: com/robrua/nlp/easy-bert/bert-cased-L-12-H-768-A-12

BERT-Base, Multilingual Cased
  Languages: 104 languages
  Layers: 12 | Embedding Size: 768 | Heads: 12 | Parameters: 110M
  Artifact ID: easy-bert-multi-cased-L-12-H-768-A-12
  Resource Path: com/robrua/nlp/easy-bert/bert-multi-cased-L-12-H-768-A-12

BERT-Base, Chinese
  Languages: Chinese (Simplified and Traditional)
  Layers: 12 | Embedding Size: 768 | Heads: 12 | Parameters: 110M
  Artifact ID: easy-bert-chinese-L-12-H-768-A-12
  Resource Path: com/robrua/nlp/easy-bert/bert-chinese-L-12-H-768-A-12

Creating Your Own Models

For now, easy-bert can only use pre-trained TensorFlow Hub BERT models that have been converted using the Python tools. We will be adding support for easily fine-tuning and pre-training new models, but there are no plans to support these on the Java side. You'll need to train in Python, save the model, then load it in Java.
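
As a sketch of that workflow, using the Python API shown above (the model URL and paths are just the placeholder values used elsewhere in this README):

from easybert import Bert

# Python side: convert a pre-trained TensorFlow Hub BERT model to the saved model format
bert = Bert("https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1")
bert.save("/path/to/your/model/")

# Java side: the saved directory is what Bert.load consumes, e.g.
#   try(Bert bert = Bert.load(new File("/path/to/your/model/"))) { ... }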

Bugs

If you find bugs, please let us know via a pull request or an issue.

Citing easy-bert

If you used easy-bert for your research, please cite the project.
