Kyubyong / Bert Token Embeddings

Licence: apache-2.0

Bert Pretrained Token Embeddings

BERT ("BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding") yields pretrained token (i.e., subword) embeddings. Let's extract them and save them in the word2vec format so that they can be used for downstream tasks.

Requirements

  • pytorch_pretrained_bert
  • NumPy
  • tqdm

Extraction

  • Check extract.py.
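
The saving step can be sketched without any dependencies. This is not the code in extract.py; it is a minimal illustration of the word2vec text format, which starts with a header line holding the vocabulary size and the dimension, followed by one line per token with its vector values. The function name `write_word2vec_text` and the toy tokens and vectors are assumptions made up for this example. In practice the tokens and vectors would come from the pretrained model's vocabulary and its input embedding matrix.

```python
import io


def write_word2vec_text(tokens, vectors, out):
    """Write tokens and their vectors to `out` in the word2vec text format.

    `write_word2vec_text` is a hypothetical helper for illustration only.
    """
    if len(tokens) != len(vectors):
        raise ValueError("tokens and vectors must have the same length")
    dim = len(vectors[0]) if vectors else 0
    # Header line: "<vocabulary size> <dimension>"
    out.write(f"{len(tokens)} {dim}\n")
    for token, vector in zip(tokens, vectors):
        if len(vector) != dim:
            raise ValueError(f"vector for {token!r} has the wrong dimension")
        # One line per token: the token, then its values, space-separated
        values = " ".join(f"{x:.6f}" for x in vector)
        out.write(f"{token} {values}\n")


# Toy data standing in for a real vocabulary and embedding matrix (assumed values)
toy_tokens = ["look", "##go", "[PAD]"]
toy_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.0, 0.0, 0.0]]

buffer = io.StringIO()
write_word2vec_text(toy_tokens, toy_vectors, buffer)
word2vec_text = buffer.getvalue()
print(word2vec_text)
```

To write a file instead of an in-memory buffer, pass a handle opened with `open(path, "w", encoding="utf-8")` as `out`.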

Bert (Pretrained) Token Embeddings in word2vec format

Models                          # Vocab   # Dim   Notes
bert-base-uncased                30,522     768
bert-large-uncased               30,522    1024
bert-base-cased                  28,996     768
bert-large-cased                 28,996    1024
bert-base-multilingual-cased    119,547     768   Recommended
bert-base-multilingual-uncased   30,522     768   Not recommended
bert-base-chinese                21,128     768

Example

  • Check example.ipynb to see how to load the (sub-)word vectors with gensim and plot them in 2D space using t-SNE.

  • Related tokens to "look" (figure)

  • Related tokens to "##go" (figure)
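
The idea behind the "related tokens" views can be sketched as a nearest-neighbour lookup by cosine similarity. The notebook uses gensim for loading; the version below is a dependency-free stand-in, not the notebook's code. The helper names (`read_word2vec_text`, `cosine`, `related_tokens`) and the tiny 2-dimensional vectors are assumptions made up for this example.

```python
import io
import math


def read_word2vec_text(handle):
    """Parse word2vec text format into a dict mapping token -> list of floats."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocabulary size> <dimension>"
    embeddings = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_tokens(embeddings, query, topn=3):
    """Return the `topn` tokens most similar to `query`, best first."""
    query_vector = embeddings[query]
    scored = [
        (token, cosine(query_vector, vector))
        for token, vector in embeddings.items()
        if token != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy file contents with assumed values, not real BERT embeddings
toy_file = io.StringIO(
    "4 2\n"
    "look 1.0 0.0\n"
    "see 0.9 0.1\n"
    "watch 0.8 0.3\n"
    "##go 0.0 1.0\n"
)
toy_embeddings = read_word2vec_text(toy_file)
neighbours = related_tokens(toy_embeddings, "look", topn=2)
print(neighbours)
```

With gensim installed, `KeyedVectors.load_word2vec_format` and `most_similar` cover the same two steps on the real files.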