
EleutherAI / GPT-NeoX

Licence: apache-2.0
An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to GPT-NeoX

FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (-32.67%)
Mutual labels:  language-model
minicons
Utility for analyzing Transformer based representations of language.
Stars: ✭ 28 (-90.76%)
Mutual labels:  language-model
A Pytorch Tutorial To Sequence Labeling
Empower Sequence Labeling with Task-Aware Neural Language Model | a PyTorch Tutorial to Sequence Labeling
Stars: ✭ 257 (-15.18%)
Mutual labels:  language-model
gpt-j
A GPT-J API to use with python3 to generate text, blogs, code, and more
Stars: ✭ 101 (-66.67%)
Mutual labels:  language-model
pyVHDLParser
Streaming based VHDL parser.
Stars: ✭ 51 (-83.17%)
Mutual labels:  language-model
DataAugmentationNMT
Data Augmentation for Neural Machine Translation
Stars: ✭ 26 (-91.42%)
Mutual labels:  language-model
chainer-notebooks
Jupyter notebooks for Chainer hands-on
Stars: ✭ 23 (-92.41%)
Mutual labels:  language-model
Transfer Nlp
NLP library designed for reproducible experimentation management
Stars: ✭ 287 (-5.28%)
Mutual labels:  language-model
tensorflow-with-kenlm
Tensorflow with KenLM integrated for beam search scoring
Stars: ✭ 30 (-90.1%)
Mutual labels:  language-model
few-shot-lm
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
Stars: ✭ 32 (-89.44%)
Mutual labels:  language-model
CodeT5
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.
Stars: ✭ 390 (+28.71%)
Mutual labels:  language-model
MinTL
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Stars: ✭ 61 (-79.87%)
Mutual labels:  language-model
python-arpa
🐍 Python library for n-gram models in ARPA format
Stars: ✭ 35 (-88.45%)
Mutual labels:  language-model
Word-Prediction-Ngram
Next Word Prediction using n-gram Probabilistic Model with various Smoothing Techniques
Stars: ✭ 25 (-91.75%)
Mutual labels:  language-model
Bluebert
BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).
Stars: ✭ 273 (-9.9%)
Mutual labels:  language-model
language-planner
Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (-72.28%)
Mutual labels:  language-model
SDLM-pytorch
Code accompanying EMNLP 2018 paper Language Modeling with Sparse Product of Sememe Experts
Stars: ✭ 27 (-91.09%)
Mutual labels:  language-model
Xlnet Pytorch
An implementation of Google Brain's 2019 XLNet in PyTorch
Stars: ✭ 304 (+0.33%)
Mutual labels:  language-model
Bertweet
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Stars: ✭ 282 (-6.93%)
Mutual labels:  language-model
Chinese-Word-Segmentation-in-NLP
State of the art Chinese Word Segmentation with Bi-LSTMs
Stars: ✭ 23 (-92.41%)
Mutual labels:  language-model


GPT-NeoX

This repository records EleutherAI's work in progress on training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations.

If you are looking for our TPU codebase, see GPT-Neo.

GPT-NeoX is under active development and rough around the edges. It is a complicated beast that will take time and patience to get working in any specific environment.

Important: GPT-NeoX is a framework for training very large models on large numbers of GPUs. If you are trying to train on fewer than 8 GPUs or a model smaller than 1.5 billion parameters, you probably don't need GPT-NeoX.

Getting Started

Our codebase relies on DeeperSpeed, a custom modification to the DeepSpeed library. We strongly recommend using Anaconda, a virtual machine, or some other form of environment isolation before installing from requirements.txt. Failure to do so may cause other repositories that rely on DeepSpeed to break.

Datasets

Once you've installed the dependencies in requirements.txt, the next step is obtaining and processing data. For demonstrative purposes we have hosted the Enron Emails corpus and made it available for download. Running python prepare_data.py will download and process the dataset for language modeling. To use your own data, extend the DataDownloader class in tools/corpa.py and register the new class in the DATA_DOWNLOADERS dict. Once this is done, you can add prepare_dataset(dataset_name) to process_data.py to load your data.
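
As a rough sketch of what that extension might look like, the snippet below defines a hypothetical downloader and registers it. The attribute names (name, urls) and the registration pattern are assumptions made for illustration; check the DataDownloader base class in tools/corpa.py for the actual interface.

    # Hypothetical downloader for a custom corpus (attribute names are assumed,
    # not taken from the actual DataDownloader interface in tools/corpa.py).
    class MyCorpusDownloader(DataDownloader):
        name = "my_corpus"                                  # identifier used when registering
        urls = ["https://example.com/my_corpus.jsonl.zst"]  # where the raw data lives

    # Assumed registration pattern: map a dataset name to its downloader class.
    DATA_DOWNLOADERS["my_corpus"] = MyCorpusDownloader

Once registered, adding prepare_dataset("my_corpus") to process_data.py should pick the new dataset up by name.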

TO DO: Make a table showing the datasets currently available for download. List the name, size on disk (compressed), actual size, and number of tokens.

Training

GPT-NeoX is launched using the deepy.py script, which is located in the root folder of this repo. You also need to ensure that the repo root directory is added to the Python path so that the megatron folder is importable.

Example usage:

./deepy.py pretrain_gpt2.py -d configs pretrain_gpt2.yml local_setup.yml

This will:

  • Deploy the pretrain_gpt2.py script on all nodes with one process per GPU. The worker nodes and number of GPUs are specified in the /job/hostfile file (see parameter documentation). The worker processes are deployed by default using pdsh.
  • Model parameters are defined in the config file configs/pretrain_gpt2.yml (the configuration directory is configs/), which GPT-NeoX uses to build and train the model.
  • Data path parameters are defined in the config file configs/local_setup.yml. If you are an EleutherAI member and using the Kubernetes cluster, the eleutherai_cluster.yml config should be used instead.

Further examples are contained in the examples folder.

Configuration and parameters

GPT-NeoX parameters are defined in a YAML configuration file which is passed to the deepy.py launcher - for examples, see the configs folder and the examples folder. For a full list of parameters and their documentation, see the corresponding README.
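
Passing several YAML files, as in the training example above, amounts to combining their key/value pairs into a single set of parameters. The sketch below illustrates that idea with PyYAML; it is not the actual merging logic inside deepy.py, which may combine files differently.

    # Conceptual illustration of combining several YAML config files into one
    # parameter dict. This is not deepy.py's actual implementation.
    import yaml

    def merge_configs(paths):
        params = {}
        for path in paths:
            with open(path) as f:
                params.update(yaml.safe_load(f) or {})  # later files override earlier keys
        return params

    params = merge_configs(["configs/pretrain_gpt2.yml", "configs/local_setup.yml"])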

Features

Model Structure

Positional Encodings:

Sparsity: Sparse attention kernels are supported, but they require model parallelism to be turned off. This is subject to change with updates to DeepSpeed.

Optimizers

Zero Redundancy Optimizer (ZeRO): ZeRO stage 1 works seamlessly with NeoX, while ZeRO stage 2 does not, as it requires disabling pipeline parallelism because the two features conflict over gradient checkpointing.
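
In DeepSpeed's documented configuration schema, ZeRO is controlled by a zero_optimization block. A minimal stage-1 sketch is shown below as a Python dict for illustration; where exactly these keys live in the NeoX YAML configs may differ.

    # Minimal DeepSpeed-style ZeRO stage 1 setting (illustrative; exact placement
    # in the NeoX YAML configs may differ).
    ds_config = {
        "zero_optimization": {
            "stage": 1,  # shard optimizer states only; stage 2 is unsupported here (see note above)
        },
    }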

ZeRO-Offloading: ZeRO-Offloading requires ZeRO stage 2 and is therefore not supported.

1-Bit Adam:

Memory Optimizations

Data Parallel: Data parallelism is a ubiquitous technique in deep learning in which each input batch of training data is split among the data parallel workers. It is integrated into NeoX.

Model Parallel: Model parallelism is a broad class of techniques that partitions the individual layers of the model across workers. Model parallelism is built into NeoX, as it is part of Megatron-LM.

Pipeline Parallel: Pipeline parallelism divides the layers of the model into stages that can be processed in parallel. It is integrated into DeepSpeed itself.
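
Within DeepSpeed itself, pipeline parallelism is exposed through deepspeed.pipe.PipelineModule. The sketch below shows the general pattern with stand-in layers; it is not the NeoX model definition.

    # General DeepSpeed pipeline-parallel pattern (stand-in layers, not the NeoX model).
    import deepspeed
    import torch.nn as nn
    from deepspeed.pipe import PipelineModule

    deepspeed.init_distributed()                         # topology setup needs torch.distributed
    layers = [nn.Linear(1024, 1024) for _ in range(8)]   # placeholder for real transformer layers
    model = PipelineModule(layers=layers, num_stages=2)  # split the 8 layers across 2 pipeline stages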

Mixed Precision Training: Mixed precision training computes some operations in FP16 and others in FP32, for example computing the forward pass and the gradients in FP16 while updating the weights in FP32. Mixed precision training is integrated into DeepSpeed as well.
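
In DeepSpeed's configuration schema this behaviour is driven by the fp16 block; a minimal sketch, again written as a Python dict for illustration:

    # Minimal DeepSpeed-style mixed precision setting (illustrative; exact placement
    # in the NeoX YAML configs may differ).
    ds_config = {
        "fp16": {
            "enabled": True,   # run forward/backward computation in FP16
            "loss_scale": 0,   # 0 selects dynamic loss scaling
        },
    }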

Monitoring

EleutherAI is currently using Weights & Biases to record experiments. If you are logged into Weights & Biases on your machine (you can do this by executing wandb login), your runs will automatically be recorded. Additionally, set the config parameter wandb_team if you would like the run to be added to an organisation/team account.

Eleuther Cluster

We run our experiments on a Kubernetes cluster generously provided by CoreWeave. The /kubernetes/ directory contains code designed to facilitate work on our server. If you are an EleutherAI member, see the corresponding README for information about how to use our cluster.

Licensing

This repository hosts code that is part of EleutherAI's GPT-NeoX project. Copyright (c) 2021 Stella Biderman, Sid Black, Eric Hallahan, Josh Levy-Kramer, Michael Pieler, Shivanshu Purohit.

GPT-NeoX is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

This repository is based on code written by NVIDIA that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by NVIDIA maintain an NVIDIA copyright header. All files that do not contain such a header are original to EleutherAI. When the NVIDIA code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.

For full terms, see the LICENSE file. If you have any questions, comments, or concerns about licensing please email us at [email protected].

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].