
asyml / Texar

License: Apache-2.0
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Programming Languages

Python
139,335 projects - #7 most used programming language

Projects that are alternatives of or similar to Texar

Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+12.61%)
Mutual labels:  natural-language-processing, machine-translation, bert, xlnet
Texar Pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 636 (-71.56%)
Mutual labels:  data-processing, natural-language-processing, machine-translation, text-generation
Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (-90.92%)
Mutual labels:  natural-language-processing, machine-translation, text-generation
Deep Learning Drizzle
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
Stars: ✭ 9,717 (+334.57%)
Mutual labels:  natural-language-processing, machine-translation
Opennmt Tf
Neural machine translation and sequence learning using TensorFlow
Stars: ✭ 1,223 (-45.3%)
Mutual labels:  natural-language-processing, machine-translation
Forte
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 89 (-96.02%)
Mutual labels:  data-processing, natural-language-processing
Market Reporter
Automatic Generation of Brief Summaries of Time-Series Data
Stars: ✭ 54 (-97.58%)
Mutual labels:  natural-language-processing, text-generation
Opus Mt
Open neural machine translation models and web services
Stars: ✭ 111 (-95.04%)
Mutual labels:  natural-language-processing, machine-translation
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+2392.93%)
Mutual labels:  natural-language-processing, bert
Tensorflow Nlp
NLP and Text Generation Experiments in TensorFlow 2.x / 1.x
Stars: ✭ 1,487 (-33.5%)
Mutual labels:  natural-language-processing, text-generation
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (-94.05%)
Mutual labels:  natural-language-processing, machine-translation
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+342.53%)
Mutual labels:  natural-language-processing, bert
Dialogpt
Large-scale pretraining for dialogue
Stars: ✭ 1,177 (-47.36%)
Mutual labels:  data-processing, text-generation
Bert As Service
Mapping a variable-length sentence to a fixed-length vector using BERT model
Stars: ✭ 9,779 (+337.34%)
Mutual labels:  natural-language-processing, bert
Comet
A Neural Framework for MT Evaluation
Stars: ✭ 58 (-97.41%)
Mutual labels:  natural-language-processing, machine-translation
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-95.17%)
Mutual labels:  natural-language-processing, text-generation
Awesome Bert
BERT NLP papers, applications, and GitHub resources, including the newest XLNet; covers BERT- and XLNet-related papers and GitHub projects
Stars: ✭ 1,732 (-22.54%)
Mutual labels:  bert, xlnet
Kashgari
Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification; it includes Word2Vec, BERT, and GPT2 language embeddings.
Stars: ✭ 2,235 (-0.04%)
Mutual labels:  bert, gpt-2
Fasttext multilingual
Multilingual word vectors in 78 languages
Stars: ✭ 1,067 (-52.28%)
Mutual labels:  natural-language-processing, machine-translation
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-97.63%)
Mutual labels:  natural-language-processing, machine-translation




Texar is a toolkit aiming to support a broad set of machine learning tasks, especially natural language processing and text generation. Texar provides a library of easy-to-use ML modules and functionalities for composing models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation.

Texar was originally developed, and is actively maintained, by Petuum and CMU in collaboration with other institutes. A mirror of this repository is maintained by Petuum Open Source.

Key Features

  • Two Versions, (Mostly) Same Interfaces. Texar-TensorFlow (this repo) and Texar-PyTorch have mostly the same interfaces. Both combine the best design of TF and PyTorch:
    • Interfaces and variable sharing in the PyTorch convention.
    • Excellent factorization and rich functionalities in the TF convention.
  • Rich Pre-trained Models, Rich Usage with Uniform Interfaces. BERT, GPT2, XLNet, etc., can be used for encoding, classification, generation, and for composing complex models with other Texar components (see the sketch after this list).
  • Fully Customizable at multiple abstraction levels -- both novice-friendly and expert-friendly.
    • Plug in whatever external modules you like, since Texar is fully compatible with the native TF/PyTorch APIs.
  • Versatile in supporting a broad range of tasks, models, algorithms, data processing, evaluation, etc.
    • encoder(s) to decoder(s), sequential- and self-attentions, memory, hierarchical models, classifiers...
    • maximum likelihood learning, reinforcement learning, adversarial learning, probabilistic modeling, ...
  • Modularized for maximal re-use and clean APIs, based on principled decomposition of Learning-Inference-Model Architecture.
  • Distributed model training with multiple GPUs.
  • Clean, detailed documentation and rich examples.
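
As a minimal sketch of the uniform pre-trained-model interface (module and argument names follow the Texar-TF documentation; the input tensors are illustrative placeholders):

import tensorflow as tf
import texar.tf as tx

# Illustrative inputs: a batch of token ids and their true lengths.
input_ids = tf.placeholder(tf.int64, shape=[None, None])
input_length = tf.placeholder(tf.int64, shape=[None])

# Load pre-trained BERT weights and call the encoder like any other module.
encoder = tx.modules.BERTEncoder(pretrained_model_name='bert-base-uncased')
outputs, pooled = encoder(inputs=input_ids, sequence_length=input_length)
# `outputs` holds per-token representations; `pooled` is a sequence-level
# summary that can feed a classifier or any other Texar component.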


Library API Example

Builds an encoder-decoder model, with maximum likelihood learning:

import tensorflow as tf
import texar.tf as tx

# Data 
data = tx.data.PairedTextData(hparams=hparams_data) # a dict of hyperparameters 
iterator = tx.data.DataIterator(data)
batch = iterator.get_next()                         # get a data mini-batch

# Model architecture
embedder = tx.modules.WordEmbedder(data.target_vocab.size, hparams=hparams_emb)
encoder = tx.modules.TransformerEncoder(hparams=hparams_enc)
outputs_enc = encoder(inputs=embedder(batch['source_text_ids']),  # call as a function
                      sequence_length=batch['source_length'])
                      
decoder = tx.modules.TransformerDecoder(
    output_layer=tf.transpose(embedder.embedding),  # tie input embedding w/ output layer
    hparams=hparams_decoder)
outputs, _, _ = decoder(memory=outputs_enc,
                        memory_sequence_length=batch['source_length'],
                        inputs=embedder(batch['target_text_ids']),
                        sequence_length=batch['target_length']-1,
                        decoding_strategy='train_greedy')    # teacher-forcing decoding
                        
# Loss for maximum likelihood learning
loss = tx.losses.sequence_sparse_softmax_cross_entropy(
    labels=batch['target_text_ids'][:, 1:],
    logits=outputs.logits,
    sequence_length=batch['target_length']-1)  # automatic sequence masks

# Beam search decoding
outputs_bs, _, _ = tx.modules.beam_search_decode(
    decoder,
    embedding=embedder,
    start_tokens=[data.target_vocab.bos_token_id]*num_samples,  # num_samples: inference batch size
    end_token=data.target_vocab.eos_token_id)
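
The hparams_data dict above is left unspecified; a minimal, hypothetical configuration for PairedTextData could look like the following (field names follow the Texar-TF hyperparameter docs; all file paths are placeholders), and the MLE loss can then be wrapped into a standard Texar train op:

# Hypothetical data configuration; paths are placeholders.
hparams_data = {
    'batch_size': 64,
    'source_dataset': {'files': 'data/train.src', 'vocab_file': 'data/vocab.src'},
    'target_dataset': {'files': 'data/train.tgt', 'vocab_file': 'data/vocab.tgt'},
}

# One way to train the model: build a train op from the MLE loss
# (optimizer settings come from default or user-provided hyperparameters).
train_op = tx.core.get_train_op(loss)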

The same model, but with adversarial learning:

helper = tx.modules.GumbelSoftmaxTrainingHelper( # Gumbel-softmax decoding
    start_tokens=[BOS]*batch_size, end_token=EOS, embedding=embedder)
outputs, _ = decoder(helper=helper)            # automatic re-use of the decoder variables

discriminator = tx.modules.BERTClassifier(hparams=hparams_bert)        # pre-trained model

G_loss, D_loss = tx.losses.binary_adversarial_losses(
    real_data=batch['target_text_ids'][:, 1:],
    fake_data=outputs.sample_id,
    discriminator_fn=discriminator)
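
Training then alternates generator and discriminator updates. A hypothetical sketch (collect_trainable_variables and the variables argument of get_train_op follow the Texar-TF API; the variable split is illustrative):

# Restrict each update to one sub-network, then alternate the two ops.
g_vars = tx.utils.collect_trainable_variables([embedder, decoder])
train_op_g = tx.core.get_train_op(G_loss, variables=g_vars)
train_op_d = tx.core.get_train_op(D_loss, variables=discriminator.trainable_variables)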

The same model, but with RL policy gradient learning:

agent = tx.agents.SeqPGAgent(samples=outputs.sample_id,
                             logits=outputs.logits,
                             sequence_length=batch['target_length']-1,
                             hparams=config_model.agent)
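
A hypothetical policy-gradient step with the agent (the get_samples/observe pattern follows Texar's seq2seq_rl example; the BLEU reward and the reference list refs are illustrative placeholders):

# Sample sequences from the current policy, score them, and update.
fetches = agent.get_samples()
sample_text = tx.utils.map_ids_to_strs(fetches['samples'], data.target_vocab)
rewards = [tx.evals.sentence_bleu([ref], hyp)
           for ref, hyp in zip(refs, sample_text)]   # refs: reference target texts
loss = agent.observe(reward=rewards)                 # one policy-gradient update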

Many more examples are available here.

Installation

(Note: Texar > 0.2.3 requires Python 3.6 or 3.7. To use Texar with older Python versions, please use Texar <= 0.2.3.)

Texar requires tensorflow and tensorflow_probability. After both are installed, install Texar from PyPI:

pip install texar

To use cutting-edge features or develop locally, install from source:

git clone https://github.com/asyml/texar.git
cd texar
pip install .

Getting Started

Reference

If you use Texar, please cite the tech report with the following BibTeX entry:

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wanrong Zhu, Devendra Sachan and Eric Xing
ACL 2019

@inproceedings{hu2019texar,
  title={Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation},
  author={Hu, Zhiting and Shi, Haoran and Tan, Bowen and Wang, Wentao and Yang, Zichao and Zhao, Tiancheng and He, Junxian and Qin, Lianhui and Wang, Di and others},
  booktitle={ACL 2019, System Demonstrations},
  year={2019}
}

License

Apache License 2.0

Companies and Universities Supporting Texar