chakki-works / Chakin

License: MIT
Simple downloader for pre-trained word vectors

Programming Languages

Python: 139,335 projects (#7 most used programming language)

Projects that are alternatives of or similar to Chakin

Biosentvec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
Stars: ✭ 308 (-4.64%)
Mutual labels:  natural-language-processing, word-embeddings
Wordgcn
ACL 2019: Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
Stars: ✭ 230 (-28.79%)
Mutual labels:  natural-language-processing, word-embeddings
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+3851.39%)
Mutual labels:  natural-language-processing, word-embeddings
Danlp
DaNLP is a repository for Natural Language Processing resources for the Danish Language.
Stars: ✭ 111 (-65.63%)
Mutual labels:  natural-language-processing, word-embeddings
Entity Recognition Datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Stars: ✭ 891 (+175.85%)
Mutual labels:  datasets, natural-language-processing
Flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Stars: ✭ 11,065 (+3325.7%)
Mutual labels:  natural-language-processing, word-embeddings
Germanwordembeddings
Toolkit to obtain and preprocess german corpora, train models using word2vec (gensim) and evaluate them with generated testsets
Stars: ✭ 189 (-41.49%)
Mutual labels:  natural-language-processing, word-embeddings
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-81.42%)
Mutual labels:  natural-language-processing, word-embeddings
Doccano
Open source annotation tool for machine learning practitioners.
Stars: ✭ 5,600 (+1633.75%)
Mutual labels:  datasets, natural-language-processing
Projects
🪐 End-to-end NLP workflows from prototype to production
Stars: ✭ 397 (+22.91%)
Mutual labels:  datasets, natural-language-processing
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-66.56%)
Mutual labels:  natural-language-processing, word-embeddings
Aidl kb
A Knowledge Base for the FB Group Artificial Intelligence and Deep Learning (AIDL)
Stars: ✭ 219 (-32.2%)
Mutual labels:  datasets, natural-language-processing
Easy Bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Stars: ✭ 106 (-67.18%)
Mutual labels:  natural-language-processing, word-embeddings
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+433.13%)
Mutual labels:  natural-language-processing, word-embeddings
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+331.58%)
Mutual labels:  natural-language-processing, word-embeddings
Vec4ir
Word Embeddings for Information Retrieval
Stars: ✭ 188 (-41.8%)
Mutual labels:  natural-language-processing, word-embeddings
Syntree2vec
An algorithm to augment syntactic hierarchy into word embeddings
Stars: ✭ 9 (-97.21%)
Mutual labels:  natural-language-processing, word-embeddings
Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-87.93%)
Mutual labels:  natural-language-processing, word-embeddings
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+893.5%)
Mutual labels:  natural-language-processing, word-embeddings
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+326.63%)
Mutual labels:  datasets, natural-language-processing

chakin

chakin is a downloader for pre-trained word vectors. It supports many vectors; see the Supported vectors table below.

This library lets you download pre-trained word vectors without any tedious manual work.



Installation

To install chakin, simply:

$ pip install chakin

Usage

You can download pre-trained word vectors as follows:

$ python
>>> import chakin
>>> chakin.search(lang='English')
                   Name  Dimension                     Corpus VocabularySize  
2          fastText(en)        300                  Wikipedia           2.5M   
11         GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)           400K   
12        GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)           400K   
13        GloVe.6B.200d        200  Wikipedia+Gigaword 5 (6B)           400K   
14        GloVe.6B.300d        300  Wikipedia+Gigaword 5 (6B)           400K   
15       GloVe.42B.300d        300          Common Crawl(42B)           1.9M   
16      GloVe.840B.300d        300         Common Crawl(840B)           2.2M   
17    GloVe.Twitter.25d         25               Twitter(27B)           1.2M   
18    GloVe.Twitter.50d         50               Twitter(27B)           1.2M   
19   GloVe.Twitter.100d        100               Twitter(27B)           1.2M   
20   GloVe.Twitter.200d        200               Twitter(27B)           1.2M   
21  word2vec.GoogleNews        300          Google News(100B)           3.0M 

>>> chakin.download(number=2, save_dir='./') # select fastText(en)
Test: 100% ||               | Time: 0:00:02  60.7 MiB/s
'./wiki.en.vec'
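
Once the file is downloaded, it can be loaded with any library that reads the word2vec text format. The snippet below is a minimal sketch and not part of chakin itself: it assumes gensim is installed (pip install gensim) and uses the './wiki.en.vec' file produced by the download above.

>>> from gensim.models import KeyedVectors
>>> # fastText .vec files use the word2vec text format; loading the full
>>> # 2.5M-word vocabulary can take a few minutes and several GB of RAM.
>>> vectors = KeyedVectors.load_word2vec_format('./wiki.en.vec')
>>> vectors['apple'][:5]            # first 5 dimensions of one word vector
>>> vectors.most_similar('apple')   # nearest neighbours by cosine similarity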

Supported vectors

So far, chakin supports the following word vectors:

Name                       Dimension  Corpus                     VocabularySize  Method              Language
fastText(ar)               300        Wikipedia                  610K            fastText            Arabic
fastText(de)               300        Wikipedia                  2.3M            fastText            German
fastText(en)               300        Wikipedia                  2.5M            fastText            English
fastText(es)               300        Wikipedia                  985K            fastText            Spanish
fastText(fr)               300        Wikipedia                  1.2M            fastText            French
fastText(it)               300        Wikipedia                  871K            fastText            Italian
fastText(ja)               300        Wikipedia                  580K            fastText            Japanese
fastText(ko)               300        Wikipedia                  880K            fastText            Korean
fastText(pt)               300        Wikipedia                  592K            fastText            Portuguese
fastText(ru)               300        Wikipedia                  1.9M            fastText            Russian
fastText(zh)               300        Wikipedia                  330K            fastText            Chinese
GloVe.6B.50d               50         Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.100d              100        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.200d              200        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.300d              300        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.42B.300d             300        Common Crawl(42B)          1.9M            GloVe               English
GloVe.840B.300d            300        Common Crawl(840B)         2.2M            GloVe               English
GloVe.Twitter.25d          25         Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.50d          50         Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.100d         100        Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.200d         200        Twitter(27B)               1.2M            GloVe               English
word2vec.GoogleNews        300        Google News(100B)          3.0M            word2vec            English
word2vec.Wiki-NEologd.50d  50         Wikipedia                  335K            word2vec + NEologd  Japanese
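
The Language column corresponds to the lang argument of chakin.search(), so any row above can be located and downloaded the same way as in the Usage example. The calls below are illustrative; the index returned by search depends on chakin's internal list.

>>> import chakin
>>> chakin.search(lang='Japanese')  # lists fastText(ja) and word2vec.Wiki-NEologd.50d
>>> # then download by the index shown in the search output, e.g.:
>>> # chakin.download(number=<index>, save_dir='./')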