
arrmansa / Basic-UI-for-GPT-J-6B-with-low-vram

License: Apache-2.0
A repository to run GPT-J-6B on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Loading the model requires 12 GB of free RAM.

Programming Languages

Jupyter Notebook

Projects that are alternatives to or similar to Basic-UI-for-GPT-J-6B-with-low-vram

Tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Stars: ✭ 5,077 (+5541.11%)
Mutual labels:  transformers, gpt
TransQuest
Transformer based translation quality estimation
Stars: ✭ 85 (-5.56%)
Mutual labels:  transformers
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+21.11%)
Mutual labels:  transformers
LIT
[AAAI 2022] This is the official PyTorch implementation of "Less is More: Pay Less Attention in Vision Transformers"
Stars: ✭ 79 (-12.22%)
Mutual labels:  transformers
Fengshenbang-LM
Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, intended as infrastructure for Chinese AIGC and cognitive intelligence.
Stars: ✭ 1,813 (+1914.44%)
Mutual labels:  transformers
react-advertising
A JavaScript library for display ads in React applications.
Stars: ✭ 50 (-44.44%)
Mutual labels:  gpt
Nlp Architect
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Stars: ✭ 2,768 (+2975.56%)
Mutual labels:  transformers
text
Using Transformers from HuggingFace in R
Stars: ✭ 66 (-26.67%)
Mutual labels:  transformers
STAM-pytorch
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification
Stars: ✭ 109 (+21.11%)
Mutual labels:  transformers
thermostat
Collection of NLP model explanations and accompanying analysis tools
Stars: ✭ 126 (+40%)
Mutual labels:  transformers
SnowflakeNet
(TPAMI 2022) Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer
Stars: ✭ 74 (-17.78%)
Mutual labels:  transformers
finetune-gpt2xl
Guide: Finetune GPT2-XL (1.5 Billion Parameters) and finetune GPT-NEO (2.7 B) on a single GPU with Huggingface Transformers using DeepSpeed
Stars: ✭ 353 (+292.22%)
Mutual labels:  gpt-neo
MultiOS-USB
Boot operating systems directly from ISO files
Stars: ✭ 106 (+17.78%)
Mutual labels:  gpt
KoBERT-Transformers
KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)
Stars: ✭ 162 (+80%)
Mutual labels:  transformers
jax-models
Unofficial JAX implementations of deep learning research papers
Stars: ✭ 108 (+20%)
Mutual labels:  transformers
nlp-papers
Must-read papers on Natural Language Processing (NLP)
Stars: ✭ 87 (-3.33%)
Mutual labels:  transformers
gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Stars: ✭ 216 (+140%)
Mutual labels:  transformers
KB-ALBERT
A Korean ALBERT model specialized for the economics/finance domain, provided by KB Kookmin Bank
Stars: ✭ 215 (+138.89%)
Mutual labels:  transformers
TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Stars: ✭ 209 (+132.22%)
Mutual labels:  gpt
Transformer-in-PyTorch
Transformer/Transformer-XL/R-Transformer examples and explanations
Stars: ✭ 21 (-76.67%)
Mutual labels:  transformers

Basic-UI-for-GPT-J-6B-with-low-vram

A repository to run GPT-J-6B on low-VRAM systems by using RAM, VRAM, and pinned memory together.
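
The way this works (a simplified reading of the notebook, not its exact code) is that only some transformer blocks stay on the GPU, while the rest live in CPU RAM, partly in pinned (page-locked) memory, and are streamed onto the GPU just before they run. Below is a minimal PyTorch sketch of that mechanism; the block stack is a stand-in for GPT-J's layers, and the ram_blocks / max_shared_ram_blocks names are reused from the timing section further down.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the stack of GPT-J transformer blocks; the real model
# is loaded through the finetuneanon/transformers fork inside the notebook.
blocks = nn.ModuleList([nn.Linear(256, 256) for _ in range(28)])

ram_blocks = 23              # blocks kept in CPU RAM (value from the timing section below)
max_shared_ram_blocks = 18   # of those, how many use pinned (page-locked) host memory

device = torch.device("cuda")  # the technique requires a CUDA GPU
cpu_copies = {}                # parameter -> its permanent CPU tensor

for i, block in enumerate(blocks):
    if i >= ram_blocks:
        block.to(device)       # these blocks live on the GPU permanently
        continue
    for p in block.parameters():
        data = p.data
        if i < max_shared_ram_blocks:
            # Pinned memory makes the host-to-device copy below faster and async-capable.
            data = data.pin_memory()
        cpu_copies[p] = data
        p.data = data

def to_gpu(module, inputs):
    # Stream the offloaded block's weights onto the GPU just before it runs.
    for p in module.parameters():
        p.data = cpu_copies[p].to(device, non_blocking=True)

def back_to_cpu(module, inputs, output):
    # Point the parameters back at their CPU copies so the GPU memory can be reused.
    for p in module.parameters():
        p.data = cpu_copies[p]

for i, block in enumerate(blocks):
    if i < ram_blocks:
        block.register_forward_pre_hook(to_gpu)
        block.register_forward_hook(back_to_cpu)
```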

There appear to be some issues with the weights in the Drive link: there is some performance loss, most likely because of a poor 16-bit conversion.

How to run:

1. Install the patched transformers fork: pip install git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3
2. Use the link https://drive.google.com/file/d/1tboTvohQifN6f1JiSV8hnciyNKvj9pvm/view?usp=sharing to download the model, which has been saved as described in https://github.com/arrmansa/saving-and-loading-large-models-pytorch
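
Once the fork is installed and the checkpoint is downloaded, loading looks roughly like the sketch below. The path is a placeholder, and the assumption that the file is a whole pickled fp16 model restored with torch.load follows the linked saving-and-loading write-up; check that repository and the notebook here for the exact steps.

```python
import torch
from transformers import GPT2Tokenizer  # provided by the finetuneanon fork installed in step 1

# Placeholder path; point this at wherever you extracted the Drive download.
checkpoint_path = "path/to/downloaded/model"

# GPT-J-6B uses the GPT-2 BPE tokenizer, so the stock gpt2 tokenizer works for encoding.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Assumption: the checkpoint is a whole pickled fp16 model saved with torch.save, as in
# the linked saving-and-loading-large-models-pytorch write-up, so torch.load restores it
# directly; this needs roughly 12 GB of free RAM.
model = torch.load(checkpoint_path, map_location="cpu")
model.eval()
```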

Timing (2000-token context)

1

System:

16 GB DDR4 RAM, GTX 1070 with 8 GB of VRAM.
23 blocks in RAM (ram_blocks = 23), of which 18 are in shared/pinned memory (max_shared_ram_blocks = 18).

Timing:

A single forward pass, model(inputs), takes 6.5 seconds.
35 seconds to generate 25 tokens at 2000-token context (1.4 seconds/token).

2

System:

16 GB DDR4 RAM, GTX 1060 with 6 GB of VRAM.
26 blocks in RAM (ram_blocks = 26), of which 18 are in shared/pinned memory (max_shared_ram_blocks = 18).

Timing:

40 seconds to generate 25 tokens at 2000-token context (1.6 seconds/token).
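
The seconds-per-token figures above are simply the total generation time divided by the 25 new tokens. A hedged sketch of how to reproduce the measurement with the Hugging Face generate API, assuming a model and tokenizer loaded and offloaded as above (the prompt is a placeholder for a roughly 2000-token context):

```python
import time
import torch

# Placeholder prompt; for the numbers above it was ~2000 tokens long. The inputs go to
# the GPU here, on the assumption that the embedding layer lives there in this setup.
prompt = "your roughly 2000-token prompt here"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

start = time.time()
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=input_ids.shape[1] + 25,  # generate 25 new tokens
        do_sample=True,
    )
elapsed = time.time() - start

new_tokens = output.shape[1] - input_ids.shape[1]
print(f"{elapsed:.1f} s total, {elapsed / new_tokens:.2f} s/token")  # e.g. 35 s / 25 = 1.4 s/token
```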
