
louiskirsch / tensorflow-with-kenlm

Licence: Apache-2.0 License
Tensorflow with KenLM integrated for beam search scoring

Projects that are alternatives of or similar to tensorflow-with-kenlm

Ctcdecoder
Connectionist Temporal Classification (CTC) decoding algorithms: best path, prefix search, beam search and token passing. Implemented in Python.
Stars: ✭ 529 (+1663.33%)
Mutual labels:  beam-search, language-model
Tf chatbot seq2seq antilm
Seq2seq chatbot with attention and an anti-language model to suppress generic responses, with an option for further improvement by deep reinforcement learning.
Stars: ✭ 369 (+1130%)
Mutual labels:  beam-search, language-model
cscg
Code Generation as a Dual Task of Code Summarization.
Stars: ✭ 28 (-6.67%)
Mutual labels:  language-model
MinTL
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Stars: ✭ 61 (+103.33%)
Mutual labels:  language-model
language-planner
Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (+180%)
Mutual labels:  language-model
open clip
An open source implementation of CLIP.
Stars: ✭ 1,534 (+5013.33%)
Mutual labels:  language-model
Word-Prediction-Ngram
Next Word Prediction using n-gram Probabilistic Model with various Smoothing Techniques
Stars: ✭ 25 (-16.67%)
Mutual labels:  language-model
gpt-j-api
API for the GPT-J language model 🦜. Including a FastAPI backend and a streamlit frontend
Stars: ✭ 248 (+726.67%)
Mutual labels:  language-model
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (+20%)
Mutual labels:  beam-search
chainer-notebooks
Jupyter notebooks for Chainer hands-on
Stars: ✭ 23 (-23.33%)
Mutual labels:  language-model
tying-wv-and-wc
Implementation for "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling"
Stars: ✭ 39 (+30%)
Mutual labels:  language-model
bert-movie-reviews-sentiment-classifier
Build a Movie Reviews Sentiment Classifier with Google's BERT Language Model
Stars: ✭ 12 (-60%)
Mutual labels:  language-model
Deep-NLP-Resources
Curated list of all NLP Resources
Stars: ✭ 65 (+116.67%)
Mutual labels:  language-model
gpt-j
A GPT-J API to use with python3 to generate text, blogs, code, and more
Stars: ✭ 101 (+236.67%)
Mutual labels:  language-model
mongolian-nlp
Useful resources for Mongolian NLP
Stars: ✭ 119 (+296.67%)
Mutual labels:  language-model
pyVHDLParser
Streaming based VHDL parser.
Stars: ✭ 51 (+70%)
Mutual labels:  language-model
CoLAKE
COLING'2020: CoLAKE: Contextualized Language and Knowledge Embedding
Stars: ✭ 86 (+186.67%)
Mutual labels:  language-model
minimal-nmt
A minimal NMT example to serve as a seq2seq+attention reference.
Stars: ✭ 36 (+20%)
Mutual labels:  beam-search
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+580%)
Mutual labels:  language-model
transformer
Neutron: A pytorch based implementation of Transformer and its variants.
Stars: ✭ 60 (+100%)
Mutual labels:  beam-search


-----------------

Tensorflow with KenLM integration

This fork of TensorFlow adds KenLM language model scoring to the ctc_beam_search_decoder operation.

tf.nn.ctc_beam_search_decoder(logits,
                              output_sequence_lengths,
                              kenlm_directory_path='your/directory/path')

The directory given as kenlm_directory_path must contain three files:

kenlm-model.binary
vocabulary
trie

See http://kheafield.com/code/kenlm/ to find out how to generate your kenlm-model.binary.
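
Before calling the decoder, it can help to confirm that the directory is complete. The following is a minimal sketch that only checks for the three file names listed above; the helper name missing_kenlm_files is hypothetical and not part of this fork.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("kenlm-model.binary", "vocabulary", "trie")


def missing_kenlm_files(directory):
    """Return the required file names that are absent from `directory`.

    Hypothetical helper for illustration; it does not validate file contents.
    """
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means all three files are present; any returned names still need to be generated as described in this README.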

The vocabulary file contains the mapping from your logit labels to characters. It should contain all allowed characters on a single line, where the position of each character is its label id, e.g.

abcdefghijklmnopqrstuvwxyz '
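
To illustrate the mapping, here is a small sketch that treats the position of each character in the vocabulary line as its label id, as described above. The function names are hypothetical, and the sketch assumes label ids start at 0; it does not read or write anything the decoder itself uses.

```python
# Example vocabulary line from above: 26 letters, a space, and an apostrophe.
VOCABULARY_LINE = "abcdefghijklmnopqrstuvwxyz '"


def load_vocabulary(line):
    """Map each character to its label id (its 0-based position in the line)."""
    # Strip only the trailing newline so a trailing space stays a valid character.
    return {char: label for label, char in enumerate(line.rstrip("\n"))}


def labels_to_text(labels, line):
    """Turn a sequence of label ids back into a string."""
    characters = line.rstrip("\n")
    return "".join(characters[label] for label in labels)
```

With the example line, load_vocabulary assigns 'a' the id 0, the space the id 26, and the apostrophe the id 27.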

The trie is generated from a text corpus of all words on a character level. Given a file corpus.txt which must satisfy the following conditions,

  • only contains words with characters specified in vocabulary
  • separated by whitespace or new lines

we can generate the trie using:

cd tensorflow-with-kenlm
bazel build -c opt --config=cuda //tensorflow/core/util/ctc:ctc_generate_trie
bazel-bin/tensorflow/core/util/ctc/ctc_generate_trie kenlm-model.binary vocabulary < corpus.txt > trie
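
The two corpus conditions can be checked before running ctc_generate_trie. The sketch below is illustrative only: invalid_words reports words that use characters outside the vocabulary, and build_trie shows the general idea of a character-level trie as a nested dictionary. Both names are hypothetical, and the nested dictionary is not the file format that ctc_generate_trie writes.

```python
END = ""  # end-of-word marker; the empty string can never be a vocabulary character


def invalid_words(corpus_text, vocabulary_line):
    """Return corpus words containing characters outside the vocabulary."""
    allowed = set(vocabulary_line.rstrip("\n"))
    # split() separates on any whitespace, including new lines.
    return [word for word in corpus_text.split() if not set(word) <= allowed]


def build_trie(words):
    """Build a nested-dict character trie (illustration, not the on-disk format)."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[END] = True  # mark that a complete word ends at this node
    return root
```

If invalid_words returns a non-empty list for corpus.txt, the corpus should be cleaned or the vocabulary extended before the trie is generated.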

How to compile tensorflow

See Download and Setup for more detailed instructions.

./configure
bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl --upgrade

TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.

If you'd like to contribute to TensorFlow, be sure to review the contribution guidelines.

We use GitHub issues for tracking requests and bugs, but please see Community for general questions and discussion.

Installation

See Installing TensorFlow for instructions on how to install our release binaries or how to build from source.

People who are a little more adventurous can also try our nightly binaries.

Try your first TensorFlow program

$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> sess.run(hello)
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> sess.run(a+b)
42
>>>

For more information

The TensorFlow community has created amazing things with TensorFlow; please see the resources section of tensorflow.org for an incomplete list.
