bigscience-workshop / bigscience

Licence: other
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Programming Languages

  • shell (77523 projects)
  • python (139335 projects - #7 most used programming language)
  • Makefile (30231 projects)

Projects that are alternatives to or similar to bigscience

Dynamic Training Bench
Simplify the training and tuning of Tensorflow models
Stars: ✭ 210 (-19.54%)
Mutual labels:  training, models
thelper
Training framework & tools for PyTorch-based machine learning projects.
Stars: ✭ 14 (-94.64%)
Mutual labels:  training, models
ModelZoo.pytorch
Hands on Imagenet training. Unofficial ModelZoo project on Pytorch. MobileNetV3 Top1 75.64🌟 GhostNet1.3x 75.78🌟
Stars: ✭ 42 (-83.91%)
Mutual labels:  training
spacy-universal-sentence-encoder
Google USE (Universal Sentence Encoder) for spaCy
Stars: ✭ 102 (-60.92%)
Mutual labels:  models
Multi-Person-Pose-using-Body-Parts
No description or website provided.
Stars: ✭ 41 (-84.29%)
Mutual labels:  training
optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools
Stars: ✭ 567 (+117.24%)
Mutual labels:  training
carto-workshop
CARTO training materials
Stars: ✭ 81 (-68.97%)
Mutual labels:  training
Deep-Learning-Models
Deep Learning Models implemented in python.
Stars: ✭ 17 (-93.49%)
Mutual labels:  models
Teaching-Data-Visualisation
Presentation and exercises for the Software Sustainability Institute Research Data Visualisation Workshop (RDVW)
Stars: ✭ 15 (-94.25%)
Mutual labels:  training
tensorpeers
p2p peer-to-peer training of tensorflow models
Stars: ✭ 57 (-78.16%)
Mutual labels:  training
CPPE-Dataset
Code for our paper CPPE - 5 (Medical Personal Protective Equipment), a new challenging object detection dataset
Stars: ✭ 42 (-83.91%)
Mutual labels:  models
traindown-dart
Dart (and Flutter) library for the Traindown Markup Language. This is the reference implementation for now. It is first to receive features and fixes.
Stars: ✭ 16 (-93.87%)
Mutual labels:  training
Wipro-PJP
Code written during Wipro PJP. 🍵📑
Stars: ✭ 60 (-77.01%)
Mutual labels:  training
MaxibonKataKotlin
Maxibon kata for Kotlin Developers. The main goal is to practice property based testing.
Stars: ✭ 42 (-83.91%)
Mutual labels:  training
curriculum-foundation
iSAQB Curriculum for the CPSA - Foundation Level. This repository contains copyrighted work.
Stars: ✭ 35 (-86.59%)
Mutual labels:  training
model-zoo-old
The ONNX Model Zoo is a collection of pre-trained models for state of the art models in deep learning, available in the ONNX format
Stars: ✭ 38 (-85.44%)
Mutual labels:  models
KataSuperHeroesIOS
Super heroes kata for iOS Developers. The main goal is to practice UI Testing.
Stars: ✭ 69 (-73.56%)
Mutual labels:  training
diwa
A Deliberately Insecure Web Application
Stars: ✭ 32 (-87.74%)
Mutual labels:  training
pydbantic
A single model for shaping, creating, accessing, storing data within a Database
Stars: ✭ 137 (-47.51%)
Mutual labels:  models
condvis
Visualisation for statistical models.
Stars: ✭ 20 (-92.34%)
Mutual labels:  models

bigscience

Research workshop on large language models - The Summer of Language Models 21

At the moment we have 2 code repos:

  1. https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
  2. https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.

Currently, the most active segments of this repo are:

  • JZ - lots of information about our work environment, which helps us evaluate, plan and get things done
  • Experiments - the many experiments we are running; documentation, result tables, scripts and logs are all there
  • Datasets info
  • Train - all the information about the current trainings (see below for the most important ones)

We have READMEs for specific aspects, such as:

Trainings

While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc that contains a summary of the most important findings: Lessons learned

Train 1 - 13B - unmodified Megatron gpt2 - baseline

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
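
For reference, here is a rough Python equivalent of the perl one-liner above (a sketch only, assuming the third-party requests package is installed; the tail_remote name is just for illustration). It polls the remote file's size with a HEAD request and fetches only the newly appended byte range every five minutes:

# Sketch of a remote "tail -f" in Python (assumes `pip install requests`).
import sys
import time

import requests

def tail_remote(url: str, interval: int = 300) -> None:
    seen = 0  # bytes already printed
    while True:
        # HEAD request (following redirects) to learn the current file size.
        head = requests.head(url, allow_redirects=True)
        size = int(head.headers.get("content-length", 0))
        if size > seen:
            # Range request for only the bytes appended since the last poll.
            resp = requests.get(
                url,
                headers={"Range": f"bytes={seen}-{size}"},
                allow_redirects=True,
            )
            sys.stdout.write(resp.text)
            sys.stdout.flush()
            seen = size
        time.sleep(interval)

if __name__ == "__main__":
    tail_remote(sys.argv[1])

Saved as, say, tail_remote.py, it would be run as: python tail_remote.py https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt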

Train 3

Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:

Size                   1B3                    760M   350M   125M
C4 + low warmup        a                      b      c
OSCAR + low warmup     f
C4 + high warmup       e
OSCAR + high warmup    d (current baseline)   g      h      i
Pile + high warmup     m                      j      k      l

Train 8

104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9

Train 11

This is the current main training.

tr11-176B-ml

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s; \
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt