chakki-works / Chakin
Licence: MIT
Simple downloader for pre-trained word vectors
Stars: ✭ 323
Programming Languages
python
Projects that are alternatives of or similar to Chakin
Biosentvec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
Stars: ✭ 308 (-4.64%)
Mutual labels: natural-language-processing, word-embeddings
Wordgcn
ACL 2019: Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
Stars: ✭ 230 (-28.79%)
Mutual labels: natural-language-processing, word-embeddings
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+3851.39%)
Mutual labels: natural-language-processing, word-embeddings
Danlp
DaNLP is a repository for Natural Language Processing resources for the Danish Language.
Stars: ✭ 111 (-65.63%)
Mutual labels: natural-language-processing, word-embeddings
Entity Recognition Datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Stars: ✭ 891 (+175.85%)
Mutual labels: datasets, natural-language-processing
Flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Stars: ✭ 11,065 (+3325.7%)
Mutual labels: natural-language-processing, word-embeddings
Germanwordembeddings
Toolkit to obtain and preprocess german corpora, train models using word2vec (gensim) and evaluate them with generated testsets
Stars: ✭ 189 (-41.49%)
Mutual labels: natural-language-processing, word-embeddings
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-81.42%)
Mutual labels: natural-language-processing, word-embeddings
Doccano
Open source annotation tool for machine learning practitioners.
Stars: ✭ 5,600 (+1633.75%)
Mutual labels: datasets, natural-language-processing
Projects
🪐 End-to-end NLP workflows from prototype to production
Stars: ✭ 397 (+22.91%)
Mutual labels: datasets, natural-language-processing
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-66.56%)
Mutual labels: natural-language-processing, word-embeddings
Aidl kb
A Knowledge Base for the FB Group Artificial Intelligence and Deep Learning (AIDL)
Stars: ✭ 219 (-32.2%)
Mutual labels: datasets, natural-language-processing
Easy Bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Stars: ✭ 106 (-67.18%)
Mutual labels: natural-language-processing, word-embeddings
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+433.13%)
Mutual labels: natural-language-processing, word-embeddings
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+331.58%)
Mutual labels: natural-language-processing, word-embeddings
Vec4ir
Word Embeddings for Information Retrieval
Stars: ✭ 188 (-41.8%)
Mutual labels: natural-language-processing, word-embeddings
Syntree2vec
An algorithm to augment syntactic hierarchy into word embeddings
Stars: ✭ 9 (-97.21%)
Mutual labels: natural-language-processing, word-embeddings
Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-87.93%)
Mutual labels: natural-language-processing, word-embeddings
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+893.5%)
Mutual labels: natural-language-processing, word-embeddings
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+326.63%)
Mutual labels: datasets, natural-language-processing
chakin
chakin is a downloader for pre-trained word vectors. It supports a range of fastText, GloVe, and word2vec vector sets across several languages.
The library lets you find and download pre-trained word vectors without hunting for download links by hand.
Installation
To install chakin, simply:
$ pip install chakin
Usage
You can download pre-trained word vectors as follows:
$ python
>>> import chakin
>>> chakin.search(lang='English')
    Name                 Dimension  Corpus                     VocabularySize
2   fastText(en)         300        Wikipedia                  2.5M
11  GloVe.6B.50d         50         Wikipedia+Gigaword 5 (6B)  400K
12  GloVe.6B.100d        100        Wikipedia+Gigaword 5 (6B)  400K
13  GloVe.6B.200d        200        Wikipedia+Gigaword 5 (6B)  400K
14  GloVe.6B.300d        300        Wikipedia+Gigaword 5 (6B)  400K
15  GloVe.42B.300d       300        Common Crawl(42B)          1.9M
16  GloVe.840B.300d      300        Common Crawl(840B)         2.2M
17  GloVe.Twitter.25d    25         Twitter(27B)               1.2M
18  GloVe.Twitter.50d    50         Twitter(27B)               1.2M
19  GloVe.Twitter.100d   100        Twitter(27B)               1.2M
20  GloVe.Twitter.200d   200        Twitter(27B)               1.2M
21  word2vec.GoogleNews  300        Google News(100B)          3.0M
>>> chakin.download(number=2, save_dir='./') # select fastText(en)
Test: 100% || | Time: 0:00:02 60.7 MiB/s
'./wiki.en.vec'
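Once a vector file is on disk, it still has to be read into memory. The sketch below is one possible way to do that with the standard library only. It is not part of chakin: `load_vec` is a hypothetical helper, and it assumes the file is plain text with one word per line followed by space-separated floats, optionally preceded by a `<count> <dimension>` header line. A tiny stand-in file is written first so the example runs without downloading anything.

```python
import os
import tempfile


def load_vec(path, limit=None):
    """Read word vectors from a space-separated text file (hypothetical helper).

    Assumes one word per line followed by its float components. An optional
    first line of the form "<count> <dimension>" is skipped.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle):
            parts = line.rstrip().split(" ")
            # Skip the "<count> <dimension>" header if the file has one.
            if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
            if len(parts) < 2:
                continue  # ignore blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
            # Stop early when only the first `limit` words are wanted.
            if limit is not None and len(vectors) >= limit:
                break
    return vectors


# Stand-in for a downloaded file: two words, three dimensions each.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.vec")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
    demo_vectors = load_vec(sample)

print(sorted(demo_vectors))
```

For a real download, pass the path returned by `chakin.download` instead of the sample file, and consider setting `limit`, since the full files hold hundreds of thousands to millions of words.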
Supported vectors
So far, chakin supports the following word vectors:
Name | Dimension | Corpus | VocabularySize | Method | Language |
---|---|---|---|---|---|
fastText(ar) | 300 | Wikipedia | 610K | fastText | Arabic |
fastText(de) | 300 | Wikipedia | 2.3M | fastText | German |
fastText(en) | 300 | Wikipedia | 2.5M | fastText | English |
fastText(es) | 300 | Wikipedia | 985K | fastText | Spanish |
fastText(fr) | 300 | Wikipedia | 1.2M | fastText | French |
fastText(it) | 300 | Wikipedia | 871K | fastText | Italian |
fastText(ja) | 300 | Wikipedia | 580K | fastText | Japanese |
fastText(ko) | 300 | Wikipedia | 880K | fastText | Korean |
fastText(pt) | 300 | Wikipedia | 592K | fastText | Portuguese |
fastText(ru) | 300 | Wikipedia | 1.9M | fastText | Russian |
fastText(zh) | 300 | Wikipedia | 330K | fastText | Chinese |
GloVe.6B.50d | 50 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
GloVe.6B.100d | 100 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
GloVe.6B.200d | 200 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
GloVe.6B.300d | 300 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
GloVe.42B.300d | 300 | Common Crawl(42B) | 1.9M | GloVe | English |
GloVe.840B.300d | 300 | Common Crawl(840B) | 2.2M | GloVe | English |
GloVe.Twitter.25d | 25 | Twitter(27B) | 1.2M | GloVe | English |
GloVe.Twitter.50d | 50 | Twitter(27B) | 1.2M | GloVe | English |
GloVe.Twitter.100d | 100 | Twitter(27B) | 1.2M | GloVe | English |
GloVe.Twitter.200d | 200 | Twitter(27B) | 1.2M | GloVe | English |
word2vec.GoogleNews | 300 | Google News(100B) | 3.0M | word2vec | English |
word2vec.Wiki-NEologd.50d | 50 | Wikipedia | 335K | word2vec + NEologd | Japanese |
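When choosing a vector set from the table, language and dimension are usually the deciding columns. The following sketch shows that selection over a small in-memory copy of a few rows. It is only an illustration: `CATALOGUE` and `find_vectors` are hypothetical names, and this is not how chakin stores or searches its own list.

```python
# A few rows copied from the table above; not chakin's internal data.
CATALOGUE = [
    {"name": "fastText(en)", "dimension": 300, "language": "English"},
    {"name": "fastText(ja)", "dimension": 300, "language": "Japanese"},
    {"name": "GloVe.6B.50d", "dimension": 50, "language": "English"},
    {"name": "GloVe.Twitter.25d", "dimension": 25, "language": "English"},
    {"name": "word2vec.Wiki-NEologd.50d", "dimension": 50, "language": "Japanese"},
]


def find_vectors(catalogue, language, max_dimension=None):
    """Return the names of entries for `language`, optionally capped by dimension."""
    names = []
    for entry in catalogue:
        if entry["language"] != language:
            continue
        if max_dimension is not None and entry["dimension"] > max_dimension:
            continue
        names.append(entry["name"])
    return names


small_english = find_vectors(CATALOGUE, "English", max_dimension=50)
japanese = find_vectors(CATALOGUE, "Japanese")
print(small_english)
print(japanese)
```

Smaller dimensions give smaller files and faster loading, which is often a reasonable starting point before moving to the 300-dimensional sets.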