chakki-works / Chakin

License: MIT
Simple downloader for pre-trained word vectors

Programming Languages

Python: 139,335 projects (#7 most used programming language)

Projects that are alternatives of or similar to Chakin

Biosentvec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
Stars: ✭ 308 (-4.64%)
Mutual labels:  natural-language-processing, word-embeddings
Wordgcn
ACL 2019: Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
Stars: ✭ 230 (-28.79%)
Mutual labels:  natural-language-processing, word-embeddings
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+3851.39%)
Mutual labels:  natural-language-processing, word-embeddings
Danlp
DaNLP is a repository for Natural Language Processing resources for the Danish Language.
Stars: ✭ 111 (-65.63%)
Mutual labels:  natural-language-processing, word-embeddings
Entity Recognition Datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Stars: ✭ 891 (+175.85%)
Mutual labels:  datasets, natural-language-processing
Flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Stars: ✭ 11,065 (+3325.7%)
Mutual labels:  natural-language-processing, word-embeddings
Germanwordembeddings
Toolkit to obtain and preprocess german corpora, train models using word2vec (gensim) and evaluate them with generated testsets
Stars: ✭ 189 (-41.49%)
Mutual labels:  natural-language-processing, word-embeddings
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-81.42%)
Mutual labels:  natural-language-processing, word-embeddings
Doccano
Open source annotation tool for machine learning practitioners.
Stars: ✭ 5,600 (+1633.75%)
Mutual labels:  datasets, natural-language-processing
Projects
🪐 End-to-end NLP workflows from prototype to production
Stars: ✭ 397 (+22.91%)
Mutual labels:  datasets, natural-language-processing
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-66.56%)
Mutual labels:  natural-language-processing, word-embeddings
Aidl kb
A Knowledge Base for the FB Group Artificial Intelligence and Deep Learning (AIDL)
Stars: ✭ 219 (-32.2%)
Mutual labels:  datasets, natural-language-processing
Easy Bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Stars: ✭ 106 (-67.18%)
Mutual labels:  natural-language-processing, word-embeddings
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+433.13%)
Mutual labels:  natural-language-processing, word-embeddings
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+331.58%)
Mutual labels:  natural-language-processing, word-embeddings
Vec4ir
Word Embeddings for Information Retrieval
Stars: ✭ 188 (-41.8%)
Mutual labels:  natural-language-processing, word-embeddings
Syntree2vec
An algorithm to augment syntactic hierarchy into word embeddings
Stars: ✭ 9 (-97.21%)
Mutual labels:  natural-language-processing, word-embeddings
Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-87.93%)
Mutual labels:  natural-language-processing, word-embeddings
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+893.5%)
Mutual labels:  natural-language-processing, word-embeddings
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+326.63%)
Mutual labels:  datasets, natural-language-processing

chakin

chakin is a downloader for pre-trained word vectors. It supports many vectors; see the Supported vectors table below.

This library lets you download pre-trained word vectors without any tedious manual work.



Installation

To install chakin, simply:

$ pip install chakin

Usage

You can download pre-trained word vectors as follows:

$ python
>>> import chakin
>>> chakin.search(lang='English')
                   Name  Dimension                     Corpus VocabularySize  
2          fastText(en)        300                  Wikipedia           2.5M   
11         GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)           400K   
12        GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)           400K   
13        GloVe.6B.200d        200  Wikipedia+Gigaword 5 (6B)           400K   
14        GloVe.6B.300d        300  Wikipedia+Gigaword 5 (6B)           400K   
15       GloVe.42B.300d        300          Common Crawl(42B)           1.9M   
16      GloVe.840B.300d        300         Common Crawl(840B)           2.2M   
17    GloVe.Twitter.25d         25               Twitter(27B)           1.2M   
18    GloVe.Twitter.50d         50               Twitter(27B)           1.2M   
19   GloVe.Twitter.100d        100               Twitter(27B)           1.2M   
20   GloVe.Twitter.200d        200               Twitter(27B)           1.2M   
21  word2vec.GoogleNews        300          Google News(100B)           3.0M 

>>> chakin.download(number=2, save_dir='./') # select fastText(en)
Test: 100% ||               | Time: 0:00:02  60.7 MiB/s
'./wiki.en.vec'
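
Once the file is downloaded, it can be loaded with any library that reads the word2vec text format. The snippet below is a minimal sketch and not part of chakin itself: it assumes gensim is installed (pip install gensim) and uses the './wiki.en.vec' file produced by the download above.

>>> from gensim.models import KeyedVectors
>>> # fastText .vec files use the word2vec text format; loading the full
>>> # 2.5M-word vocabulary can take a few minutes and several GB of RAM.
>>> vectors = KeyedVectors.load_word2vec_format('./wiki.en.vec')
>>> vectors['apple'][:5]            # first 5 dimensions of one word vector
>>> vectors.most_similar('apple')   # nearest neighbours by cosine similarity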

Supported vectors

So far, chakin supports the following word vectors:

Name                       Dimension  Corpus                     VocabularySize  Method              Language
fastText(ar)               300        Wikipedia                  610K            fastText            Arabic
fastText(de)               300        Wikipedia                  2.3M            fastText            German
fastText(en)               300        Wikipedia                  2.5M            fastText            English
fastText(es)               300        Wikipedia                  985K            fastText            Spanish
fastText(fr)               300        Wikipedia                  1.2M            fastText            French
fastText(it)               300        Wikipedia                  871K            fastText            Italian
fastText(ja)               300        Wikipedia                  580K            fastText            Japanese
fastText(ko)               300        Wikipedia                  880K            fastText            Korean
fastText(pt)               300        Wikipedia                  592K            fastText            Portuguese
fastText(ru)               300        Wikipedia                  1.9M            fastText            Russian
fastText(zh)               300        Wikipedia                  330K            fastText            Chinese
GloVe.6B.50d               50         Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.100d              100        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.200d              200        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.300d              300        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.42B.300d             300        Common Crawl(42B)          1.9M            GloVe               English
GloVe.840B.300d            300        Common Crawl(840B)         2.2M            GloVe               English
GloVe.Twitter.25d          25         Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.50d          50         Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.100d         100        Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.200d         200        Twitter(27B)               1.2M            GloVe               English
word2vec.GoogleNews        300        Google News(100B)          3.0M            word2vec            English
word2vec.Wiki-NEologd.50d  50         Wikipedia                  335K            word2vec + NEologd  Japanese
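
The Language column corresponds to the lang argument of chakin.search(), so any row above can be located and downloaded the same way as in the Usage example. The calls below are illustrative; the index returned by search depends on chakin's internal list.

>>> import chakin
>>> chakin.search(lang='Japanese')  # lists fastText(ja) and word2vec.Wiki-NEologd.50d
>>> # then download by the index shown in the search output, e.g.:
>>> # chakin.download(number=<index>, save_dir='./')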