Kyubyong / Bert Token Embeddings
Licence: apache-2.0
Stars: ✭ 96
Bert Pretrained Token Embeddings
BERT ("BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding") yields pretrained token (i.e., subword) embeddings. Let's extract them and save them in the word2vec format so that they can be used for downstream tasks.
Requirements
- pytorch_pretrained_bert
- NumPy
- tqdm
Extraction
- Check extract.py.
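As a rough illustration of what such an extraction involves, here is a minimal sketch; it is not the contents of extract.py. The function names `save_word2vec_format` and `extract_bert_token_embeddings` are hypothetical. The sketch assumes the token embeddings are the model's input word-piece embedding matrix, reached through `model.embeddings.word_embeddings` in pytorch_pretrained_bert, and that the word2vec text format (a `<vocab_size> <dim>` header line followed by one token per line) is the target.

```python
from typing import Iterable, Sequence


def save_word2vec_format(tokens: Sequence[str],
                         vectors: Iterable[Iterable[float]],
                         path: str) -> None:
    """Write tokens and vectors in the word2vec text format.

    The first line is "<vocab_size> <dim>"; each following line holds a token
    and its vector components, separated by single spaces.
    """
    rows = [[float(x) for x in vec] for vec in vectors]
    if len(tokens) != len(rows):
        raise ValueError("tokens and vectors must have the same length")
    dim = len(rows[0]) if rows else 0
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(tokens)} {dim}\n")
        for token, row in zip(tokens, rows):
            if len(row) != dim:
                raise ValueError(f"inconsistent dimension for {token!r}")
            f.write(token + " " + " ".join(f"{x:.6f}" for x in row) + "\n")


def extract_bert_token_embeddings(model_name: str, path: str) -> None:
    """Dump the input (word-piece) embedding matrix of a pretrained BERT model.

    Assumption: pytorch_pretrained_bert is installed and the weights for
    `model_name` can be downloaded. The import is lazy so that the writer
    above stays usable without it.
    """
    from pytorch_pretrained_bert import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    # tokenizer.vocab maps token -> id; sort by id so tokens line up with rows.
    tokens = [t for t, _ in sorted(tokenizer.vocab.items(), key=lambda kv: kv[1])]
    matrix = model.embeddings.word_embeddings.weight.detach().cpu().numpy()
    save_word2vec_format(tokens, matrix, path)
```

Calling `extract_bert_token_embeddings("bert-base-uncased", "bert-base-uncased.txt")` (the output file name is only an example) would then write one line per vocabulary entry. Note that these are context-independent input embeddings, not the contextual outputs of the encoder.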
Bert (Pretrained) Token Embeddings in word2vec format
Models | # Vocab | # Dim | Notes |
---|---|---|---|
bert-base-uncased | 30,522 | 768 | |
bert-large-uncased | 30,522 | 1024 | |
bert-base-cased | 28,996 | 768 | |
bert-large-cased | 28,996 | 1024 | |
bert-base-multilingual-cased | 119,547 | 768 | Recommended |
bert-base-multilingual-uncased | 30,522 | 768 | Not recommended |
bert-base-chinese | 21,128 | 768 | |
Example
- Check example.ipynb to see how to load (sub-)word vectors with gensim and plot them in 2D space using t-SNE.
- Related tokens to "look"
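The notebook itself is not reproduced here. As a dependency-free sketch of the "related tokens" lookup, the code below parses a word2vec text file and ranks tokens by cosine similarity to a query token. The helper names `load_word2vec_text` and `most_similar` are hypothetical, and the sketch assumes the file was written in the plain-text (non-binary) variant of the format.

```python
import math
from typing import Dict, List, Tuple


def load_word2vec_text(path: str) -> Dict[str, List[float]]:
    """Parse a word2vec text file into a token -> vector dictionary."""
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        _, dim = map(int, f.readline().split())  # header: "<vocab_size> <dim>"
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != dim + 1:
                continue  # skip lines that do not match the declared dimension
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(vectors: Dict[str, List[float]],
                 query: str,
                 topn: int = 10) -> List[Tuple[str, float]]:
    """Return the topn tokens closest to `query`, excluding the query itself."""
    q = vectors[query]
    scored = [(tok, cosine(q, vec)) for tok, vec in vectors.items() if tok != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]
```

With gensim installed, `KeyedVectors.load_word2vec_format(path, binary=False)` followed by `most_similar("look")` gives the same kind of lookup far more efficiently. The 2D plot would additionally need a t-SNE implementation such as scikit-learn's `TSNE`.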