
EleutherAI / GPT-NeoX

Licence: apache-2.0
An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to GPT-NeoX

FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (-32.67%)
Mutual labels:  language-model
minicons
Utility for analyzing Transformer based representations of language.
Stars: ✭ 28 (-90.76%)
Mutual labels:  language-model
A Pytorch Tutorial To Sequence Labeling
Empower Sequence Labeling with Task-Aware Neural Language Model | a PyTorch Tutorial to Sequence Labeling
Stars: ✭ 257 (-15.18%)
Mutual labels:  language-model
gpt-j
A GPT-J API to use with python3 to generate text, blogs, code, and more
Stars: ✭ 101 (-66.67%)
Mutual labels:  language-model
pyVHDLParser
Streaming based VHDL parser.
Stars: ✭ 51 (-83.17%)
Mutual labels:  language-model
DataAugmentationNMT
Data Augmentation for Neural Machine Translation
Stars: ✭ 26 (-91.42%)
Mutual labels:  language-model
chainer-notebooks
Jupyter notebooks for Chainer hands-on
Stars: ✭ 23 (-92.41%)
Mutual labels:  language-model
Transfer Nlp
NLP library designed for reproducible experimentation management
Stars: ✭ 287 (-5.28%)
Mutual labels:  language-model
tensorflow-with-kenlm
Tensorflow with KenLM integrated for beam search scoring
Stars: ✭ 30 (-90.1%)
Mutual labels:  language-model
few-shot-lm
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
Stars: ✭ 32 (-89.44%)
Mutual labels:  language-model
CodeT5
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.
Stars: ✭ 390 (+28.71%)
Mutual labels:  language-model
MinTL
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Stars: ✭ 61 (-79.87%)
Mutual labels:  language-model
python-arpa
🐍 Python library for n-gram models in ARPA format
Stars: ✭ 35 (-88.45%)
Mutual labels:  language-model
Word-Prediction-Ngram
Next Word Prediction using n-gram Probabilistic Model with various Smoothing Techniques
Stars: ✭ 25 (-91.75%)
Mutual labels:  language-model
Bluebert
BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).
Stars: ✭ 273 (-9.9%)
Mutual labels:  language-model
language-planner
Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (-72.28%)
Mutual labels:  language-model
SDLM-pytorch
Code accompanying EMNLP 2018 paper Language Modeling with Sparse Product of Sememe Experts
Stars: ✭ 27 (-91.09%)
Mutual labels:  language-model
Xlnet Pytorch
An implementation of Google Brain's 2019 XLNet in PyTorch
Stars: ✭ 304 (+0.33%)
Mutual labels:  language-model
Bertweet
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Stars: ✭ 282 (-6.93%)
Mutual labels:  language-model
Chinese-Word-Segmentation-in-NLP
State of the art Chinese Word Segmentation with Bi-LSTMs
Stars: ✭ 23 (-92.41%)
Mutual labels:  language-model


GPT-NeoX

This repository records EleutherAI's work in progress on training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations.

If you are looking for our TPU codebase, see GPT-Neo.

GPT-NeoX is under active development and rough around the edges. It is a complicated beast that will take time and patience to get working in any specific environment.

Important: GPT-NeoX is a framework for training very large models on large numbers of GPUs. If you are trying to train on fewer than 8 GPUs or a model smaller than 1.5 billion parameters, you probably don't need GPT-NeoX.

Getting Started

Our codebase relies on DeeperSpeed, a custom modification to the DeepSpeed library. We strongly recommend using Anaconda, a virtual machine, or some other form of environment isolation before installing from requirements.txt. Failure to do so may cause other repositories that rely on DeepSpeed to break.

Datasets

Once you've installed the dependencies in requirements.txt, the next step is obtaining and processing data. For demonstrative purposes we have hosted the Enron Emails corpus and made it available for download. Running python prepare_data.py will download and process the dataset for language modeling. To use your own data, extend the DataDownloader class in tools/corpa.py and register the new class in the DATA_DOWNLOADERS dict. Once this is done, you can add prepare_dataset(dataset_name) to process_data.py to load your data.
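
As a rough sketch of what that extension might look like, the snippet below defines a hypothetical downloader and registers it. The attribute names (name, urls) and the registration pattern are assumptions made for illustration; check the DataDownloader base class in tools/corpa.py for the actual interface.

    # Hypothetical downloader for a custom corpus (attribute names are assumed,
    # not taken from the actual DataDownloader interface in tools/corpa.py).
    class MyCorpusDownloader(DataDownloader):
        name = "my_corpus"                                  # identifier used when registering
        urls = ["https://example.com/my_corpus.jsonl.zst"]  # where the raw data lives

    # Assumed registration pattern: map a dataset name to its downloader class.
    DATA_DOWNLOADERS["my_corpus"] = MyCorpusDownloader

Once registered, adding prepare_dataset("my_corpus") to process_data.py should pick the new dataset up by name.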

TO DO: Make a table showing the datasets currently available for download. List the name, size on disk (compressed), actual size, and number of tokens.

Training

GPT-NeoX is launched using the deepy.py script, which is located in the root folder of this repo. You also need to ensure that the repo root directory is added to the Python path so that the megatron folder is importable.

Example usage:

./deepy.py pretrain_gpt2.py -d configs pretrain_gpt2.yml local_setup.yml

This will:

  • Deploy the pretrain_gpt2.py script on all nodes with one process per GPU. The worker nodes and number of GPUs are specified in the /job/hostfile file (see parameter documentation). The worker processes are deployed by default using pdsh.
  • Model parameters are defined in the config file configs/pretrain_gpt2.yml (the configuration directory is configs/), which GPT-NeoX uses to build and train the model.
  • Data path parameters are defined in the config file configs/local_setup.yml. If you are an EleutherAI member and using the Kubernetes cluster, the eleutherai_cluster.yml config should be used instead.

Further examples are contained in the examples folder.

Configuration and parameters

GPT-NeoX parameters are defined in a YAML configuration file which is passed to the deepy.py launcher - for examples, see the configs folder and the examples folder. For a full list of parameters and their documentation, see the corresponding README.
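
Passing several YAML files, as in the training example above, amounts to combining their key/value pairs into a single set of parameters. The sketch below illustrates that idea with PyYAML; it is not the actual merging logic inside deepy.py, which may combine files differently.

    # Conceptual illustration of combining several YAML config files into one
    # parameter dict. This is not deepy.py's actual implementation.
    import yaml

    def merge_configs(paths):
        params = {}
        for path in paths:
            with open(path) as f:
                params.update(yaml.safe_load(f) or {})  # later files override earlier keys
        return params

    params = merge_configs(["configs/pretrain_gpt2.yml", "configs/local_setup.yml"])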

Features

Model Structure

Positional Encodings:

Sparsity: Sparse attention kernels are supported, but they require model parallelism to be turned off. This is subject to change with updates to DeepSpeed.

Optimizers

Zero Redundancy Optimizer (ZeRO): ZeRO stage 1 works seamlessly with NeoX, while ZeRO stage 2 does not, as it requires disabling pipeline parallelism because the two features conflict over gradient checkpointing.
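
In DeepSpeed's documented configuration schema, ZeRO is controlled by a zero_optimization block. A minimal stage-1 sketch is shown below as a Python dict for illustration; where exactly these keys live in the NeoX YAML configs may differ.

    # Minimal DeepSpeed-style ZeRO stage 1 setting (illustrative; exact placement
    # in the NeoX YAML configs may differ).
    ds_config = {
        "zero_optimization": {
            "stage": 1,  # shard optimizer states only; stage 2 is unsupported here (see note above)
        },
    }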

ZeRO-Offloading: ZeRO-Offloading requires ZeRO stage 2 and is therefore not supported.

1-Bit Adam:

Memory Optimizations

Data Parallel: Data parallelism is a ubiquitous technique in deep learning in which each input batch of training data is split among the data parallel workers. It is integrated into NeoX.

Model Parallel: Model parallelism is a broad class of techniques that partitions the individual layers of the model across workers. Model parallelism is built into NeoX, as it is part of Megatron-LM.

Pipeline Parallel: Pipeline parallelism divides the layers of the model into stages that can be processed in parallel. It is integrated into DeepSpeed itself.
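
Within DeepSpeed itself, pipeline parallelism is exposed through deepspeed.pipe.PipelineModule. The sketch below shows the general pattern with stand-in layers; it is not the NeoX model definition.

    # General DeepSpeed pipeline-parallel pattern (stand-in layers, not the NeoX model).
    import deepspeed
    import torch.nn as nn
    from deepspeed.pipe import PipelineModule

    deepspeed.init_distributed()                         # topology setup needs torch.distributed
    layers = [nn.Linear(1024, 1024) for _ in range(8)]   # placeholder for real transformer layers
    model = PipelineModule(layers=layers, num_stages=2)  # split the 8 layers across 2 pipeline stages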

Mixed Precision Training: Mixed precision training computes some operations in FP16 and others in FP32, for example computing the forward pass and the gradients in FP16 while updating the weights in FP32. Mixed precision training is integrated into DeepSpeed as well.
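
In DeepSpeed's configuration schema this behaviour is driven by the fp16 block; a minimal sketch, again written as a Python dict for illustration:

    # Minimal DeepSpeed-style mixed precision setting (illustrative; exact placement
    # in the NeoX YAML configs may differ).
    ds_config = {
        "fp16": {
            "enabled": True,   # run forward/backward computation in FP16
            "loss_scale": 0,   # 0 selects dynamic loss scaling
        },
    }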

Monitoring

EleutherAI is currently using Weights & Biases to record experiments. If you are logged into Weights & Biases on your machine (you can do this by executing wandb login), your runs will automatically be recorded. Additionally, set the config parameter wandb_team if you would like the run to be added to an organisation/team account.

Eleuther Cluster

We run our experiments on a Kubernetes cluster generously provided by CoreWeave. The /kubernetes/ directory contains code designed to facilitate work on our server. If you are an EleutherAI member, see the corresponding README for information about how to use our cluster.

Licensing

This repository hosts code that is part of EleutherAI's GPT-NeoX project. Copyright (c) 2021 Stella Biderman, Sid Black, Eric Hallahan, Josh Levy-Kramer, Michael Pieler, Shivanshu Purohit.

GPT-NeoX is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

This repository is based on code written by NVIDIA that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by NVIDIA maintain an NVIDIA copyright header. All files that do not contain such a header are original to EleutherAI. When the NVIDIA code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.

For full terms, see the LICENSE file. If you have any questions, comments, or concerns about licensing please email us at [email protected].

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].