
asyml / Texar-PyTorch

License: Apache-2.0
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Programming Languages

Python

Projects that are alternatives to or similar to Texar-PyTorch

Texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 2,236 (+251.57%)
Mutual labels:  data-processing, natural-language-processing, machine-translation, text-generation
Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (-68.08%)
Mutual labels:  natural-language-processing, machine-translation, text-generation
Dialogpt
Large-scale pretraining for dialogue
Stars: ✭ 1,177 (+85.06%)
Mutual labels:  data-processing, text-generation
Forte
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 89 (-86.01%)
Mutual labels:  data-processing, natural-language-processing
parallel-corpora-tools
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Stars: ✭ 35 (-94.5%)
Mutual labels:  machine-translation, data-processing
Deep Generative Models For Natural Language Processing
DGMs for NLP. A roadmap.
Stars: ✭ 185 (-70.91%)
Mutual labels:  natural-language-processing, text-generation
Hardware Aware Transformers
[ACL 2020] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Stars: ✭ 206 (-67.61%)
Mutual labels:  natural-language-processing, machine-translation
Machine Learning Notebooks
Machine Learning notebooks for refreshing concepts.
Stars: ✭ 222 (-65.09%)
Mutual labels:  data-processing, natural-language-processing
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (-79.25%)
Mutual labels:  natural-language-processing, machine-translation
Bytenet Tensorflow
ByteNet for character-level language modelling
Stars: ✭ 319 (-49.84%)
Mutual labels:  natural-language-processing, machine-translation
Zhihu
Source code from my personal Zhihu column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented in Python 3.6. Includes hands-on Natural Language Processing and Computer Vision projects, such as text generation, machine translation, and deep convolutional GANs.
Stars: ✭ 3,307 (+419.97%)
Mutual labels:  natural-language-processing, machine-translation
Awesome Text Generation
A curated list of recent models of text generation and application
Stars: ✭ 370 (-41.82%)
Mutual labels:  natural-language-processing, text-generation
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+295.91%)
Mutual labels:  natural-language-processing, machine-translation
Summarization Papers
Summarization Papers
Stars: ✭ 238 (-62.58%)
Mutual labels:  natural-language-processing, text-generation
Mtbook
Machine Translation: Foundations and Models (《机器翻译:基础与模型》), by Tong Xiao and Jingbo Zhu
Stars: ✭ 2,307 (+262.74%)
Mutual labels:  natural-language-processing, machine-translation
Textgan Pytorch
TextGAN is a PyTorch framework for Generative Adversarial Networks (GANs) based text generation models.
Stars: ✭ 479 (-24.69%)
Mutual labels:  natural-language-processing, text-generation
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (-81.45%)
Mutual labels:  natural-language-processing, machine-translation
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (-79.09%)
Mutual labels:  natural-language-processing, machine-translation
Deep-NLP-Resources
Curated list of all NLP Resources
Stars: ✭ 65 (-89.78%)
Mutual labels:  machine-translation, text-generation
Nlp Progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Stars: ✭ 19,518 (+2968.87%)
Mutual labels:  natural-language-processing, machine-translation




Texar-PyTorch is a toolkit aiming to support a broad set of machine learning tasks, especially natural language processing and text generation. Texar provides a library of easy-to-use ML modules and functionalities for composing a wide variety of models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation. Texar-PyTorch was originally developed by, and is actively contributed to by, Petuum and CMU in collaboration with other institutes. A mirror of this repository is maintained by Petuum Open Source.

Texar-PyTorch integrates many of the best features of TensorFlow into PyTorch, delivering highly usable and customizable modules superior to PyTorch's native ones.

Key Features

  • Two Versions, (Mostly) Same Interfaces. Texar-PyTorch (this repo) and Texar-TF have mostly the same interfaces. Both combine the best designs of TF and PyTorch:
    • Interfaces and variable sharing in the PyTorch convention.
    • Excellent factorization and rich functionalities in the TF convention.
  • Versatile to support broad needs:
    • data processing, model architectures, loss functions, training and inference algorithms, evaluation, ...
    • encoder(s) to decoder(s), sequential- and self-attentions, memory, hierarchical models, classifiers, ...
    • maximum likelihood learning, reinforcement learning, adversarial learning, probabilistic modeling, ...
  • Fully Customizable at multiple abstraction levels -- both novice-friendly and expert-friendly.
    • Free to plug in whatever external modules, since Texar is fully compatible with the native PyTorch APIs.
  • Modularized for maximal re-use and clean APIs, based on principled decomposition of Learning-Inference-Model Architecture.
  • Rich Pre-trained Models, Rich Usage with Uniform Interfaces. BERT, GPT-2, XLNet, etc., for encoding, classification, generation, and composing complex models with other Texar components (see the sketch after this list)!
  • Clean, detailed documentation and rich examples.
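
For instance, a pre-trained encoder can be plugged into any PyTorch model in one line. The snippet below is a minimal sketch (the model name and the dummy inputs are illustrative only and are not taken from this README):

import torch
import texar.torch as tx

# Instantiate a pre-trained BERT encoder; weights are downloaded on first use.
encoder = tx.modules.BERTEncoder(pretrained_model_name="bert-base-uncased")

# Dummy token ids (batch_size=2, max_len=8) purely for illustration;
# 30000 is a safe upper bound below the BERT vocabulary size.
input_ids = torch.randint(0, 30000, (2, 8))
lengths = torch.tensor([8, 6])

outputs, pooled = encoder(inputs=input_ids, sequence_length=lengths)
print(outputs.shape)  # [2, 8, hidden_size]: per-token representations
print(pooled.shape)   # [2, hidden_size]: pooled sentence representation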




Library API Example

A code example that builds and trains a conditional GPT-2 model (e.g., for machine translation or text summarization):

import torch
from torch import nn

import texar.torch as tx
from texar.torch.run import *  # Executor, cond, metric

# `emb_hparams`, `enc_hparams`, `data_hparams`, and `BOS` (the begin-of-sequence
# token id) are assumed to be defined elsewhere; see the config sketch after
# this example.

# (1) Modeling
class ConditionalGPT2Model(nn.Module):
  """An encoder-decoder model with GPT-2 as the decoder."""
  def __init__(self, vocab_size):
    super().__init__()
    # Use hyperparameter dict for model configuration
    self.embedder = tx.modules.WordEmbedder(vocab_size, hparams=emb_hparams)
    self.encoder = tx.modules.TransformerEncoder(hparams=enc_hparams)
    self.decoder = tx.modules.GPT2Decoder("gpt2-small")  # With pre-trained weights

  def _get_decoder_output(self, batch, train=True):
    """Perform model inference, i.e., decoding."""
    enc_states = self.encoder(inputs=self.embedder(batch['source_text_ids']),
                              sequence_length=batch['source_length'])
    if train:  # Teacher-forcing decoding at training time
      return self.decoder(
          inputs=batch['target_text_ids'], sequence_length=batch['target_length'] - 1,
          memory=enc_states, memory_sequence_length=batch['source_length'])
    else:      # Beam search decoding at prediction time
      start_tokens = torch.full_like(batch['source_text_ids'][:, 0], BOS)
      return self.decoder(
          beam_width=5, start_tokens=start_tokens,
          memory=enc_states, memory_sequence_length=batch['source_length'])

  def forward(self, batch):
    """Compute training loss."""
    outputs = self._get_decoder_output(batch)
    loss = tx.losses.sequence_sparse_softmax_cross_entropy(  # Sequence loss
        labels=batch['target_text_ids'][:, 1:], logits=outputs.logits,
        sequence_length=batch['target_length'] - 1)  # Automatic masking
    return {"loss": loss}

  def predict(self, batch):
    """Compute model predictions."""
    sequence, _ = self._get_decoder_output(batch, train=False)
    return {"gen_text_ids": sequence}

  
# (2) Data
# Create dataset splits using built-in data loaders
datasets = {split: tx.data.PairedTextData(hparams=data_hparams[split])
            for split in ["train", "valid", "test"]}

model = ConditionalGPT2Model(datasets["train"].target_vocab.size)

# (3) Training
# Manage the train-eval loop with the Executor API
executor = Executor(
  model=model, datasets=datasets,
  optimizer={"type": torch.optim.Adam, "kwargs": {"lr": 5e-4}},
  stop_training_on=cond.epoch(20),
  log_every=cond.iteration(100),
  validate_every=cond.epoch(1),
  train_metric=("loss", metric.RunningAverage(10, pred_name="loss")),
  valid_metric=metric.BLEU(pred_name="gen_text_ids", label_name="target_text_ids"),
  save_every=cond.validation(better=True),
  checkpoint_dir="outputs/saved_models/")
executor.train()
executor.test(datasets["test"])
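
In the example above, `emb_hparams`, `enc_hparams`, `data_hparams`, and `BOS` are assumed to be defined elsewhere. As a rough sketch (the file paths, vocabulary files, and batch size below are placeholders, not part of the original example), one way the `PairedTextData` configuration could be laid out is:

data_hparams = {
    split: {
        # Source/target text files and vocabularies; paths are placeholders.
        "source_dataset": {"files": f"data/{split}.src.txt",
                           "vocab_file": "data/vocab.src.txt"},
        "target_dataset": {"files": f"data/{split}.tgt.txt",
                           "vocab_file": "data/vocab.tgt.txt"},
        "batch_size": 64,
    }
    for split in ["train", "valid", "test"]
}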

Many more examples are available in the examples directory of the repository.

Installation

Texar-PyTorch requires:

  • python == 3.6 or 3.7
  • torch >= 1.0.0. Please follow the official instructions to install the appropriate version.

After torch is installed, install Texar from PyPI:

pip install texar-pytorch

To use cutting-edge features or develop locally, install from source:

git clone https://github.com/asyml/texar-pytorch.git
cd texar-pytorch
pip install .

To enable TensorBoard support with Executor, install tensorboardX:

pip install tensorboardX
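
To verify the installation, a quick import check can be run (printing `tx.__version__` assumes the package exposes a version attribute; the import itself is the essential check):

python -c "import texar.torch as tx; print(tx.__version__)"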

Getting Started

Reference

If you use Texar, please cite the tech report with the following BibTeX entry:

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wanrong Zhu, Devendra Sachan and Eric Xing
ACL 2019

@inproceedings{hu2019texar,
  title={Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation},
  author={Hu, Zhiting and Shi, Haoran and Tan, Bowen and Wang, Wentao and Yang, Zichao and Zhao, Tiancheng and He, Junxian and Qin, Lianhui and Wang, Di and others},
  booktitle={ACL 2019, System Demonstrations},
  year={2019}
}

License

Apache License 2.0

Companies and Universities Supporting Texar

                  
