may- / cnn-ld-tf

Licence: other
Convolutional Neural Network for Language Detection in Tensorflow

Convolutional Neural Network for Language Detection

Note: This project is mostly based on https://github.com/yuhaozhang/sentence-convnet


Demo

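  1. Run the API server:

    python ./main.py
  2. Run an HTML server, for example:

    python -m SimpleHTTPServer 5050

    (SimpleHTTPServer is the Python 2 module; on Python 3 the equivalent is python -m http.server 5050.)

  3. Open http://localhost:5050/docs/ in a browser.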
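The static-file step of the demo can also be started from a script. The following is a minimal sketch for Python 3.7+ using only the standard library; the function names make_docs_server and serve_in_background are hypothetical and not part of this repository, and the sketch does not replace main.py, which still has to be running for the demo page to get predictions.

```python
import functools
import http.server
import threading


def make_docs_server(root=".", port=5050):
    """Build an HTTP server that serves static files found under `root`."""
    # `directory` is supported by SimpleHTTPRequestHandler since Python 3.7.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root
    )
    return http.server.ThreadingHTTPServer(("localhost", port), handler)


def serve_in_background(server):
    """Run the server in a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling make_docs_server().serve_forever() from the repository root should make the page reachable at http://localhost:5050/docs/, the same address used above.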


Requirements

To train with pretrained embedding (train.py --use_pretrain=True)

To download TED corpus (ted.py)

To visualize (visualize.ipynb)

Web API (main.py)

Data

  • TED Subtitle Corpus
    The ./data/ted500 directory contains preprocessed data. To reproduce it (requires 2 GB+ of disk space):

    python ./ted.py
  • Your own data
    Put one data file per class under ./data, e.g. for class_names = ['neg', 'pos']:

    cnn-ld-tf
    ├── ...
    └── data
        └── mr
            ├── mr.neg  # examples with class neg
            └── mr.pos  # examples with class pos
    

    Note:

    • Data file encoding must be utf-8.
    • One example per line.
    • The number of examples of each class must be the same.
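The three rules above can be checked before preprocessing. Here is a minimal sketch, assuming the layout shown in the tree (data/<name>/<name>.<class>); check_dataset is a hypothetical helper, not a function from this repository, and treating blank lines as non-examples is an assumption of the sketch.

```python
from pathlib import Path


def check_dataset(data_dir, name, class_names):
    """Return {class_name: example_count}; raise ValueError if a rule is broken."""
    counts = {}
    for cls in class_names:
        path = Path(data_dir) / name / f"{name}.{cls}"
        try:
            # Rule 1: the file must decode as UTF-8.
            text = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            raise ValueError(f"{path} is not valid UTF-8") from err
        # Rule 2: one example per line (blank lines are skipped here).
        counts[cls] = sum(1 for line in text.split("\n") if line.strip())
    # Rule 3: every class must have the same number of examples.
    if len(set(counts.values())) > 1:
        raise ValueError(f"class sizes differ: {counts}")
    return counts
```

For the example tree, check_dataset("./data", "mr", ["neg", "pos"]) would return the per-class counts or raise on the first violated rule.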

Preprocess

python ./util.py

Training

python ./train.py

Prediction

python ./predict.py

Evaluation

python ./eval.py
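
The four steps above (preprocess, train, predict, evaluate) can be chained so that a failure stops the run. This is a minimal sketch, assuming each script works when called without arguments from the repository root, which this README does not confirm; build_commands and run_pipeline are hypothetical names.

```python
import subprocess
import sys

# Script order as listed in this README.
PIPELINE = ["util.py", "train.py", "predict.py", "eval.py"]


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Return one argv list per pipeline step, in order."""
    return [[python, f"./{script}"] for script in scripts]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True raises on a non-zero exit code."""
    for cmd in commands:
        runner(cmd, check=True)
```

Running run_pipeline(build_commands()) executes the steps in sequence; the runner parameter exists only so the order can be tested without launching the real scripts.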

Run TensorBoard

tensorboard --logdir=./model/ted500/summaries

Embeddings by script name

Pre-trained model

  • Supported languages (65):
    ["ar", "az", "bg", "bn", "bo", "cs", "da", "de", "el", "en", "es", "fa", "fi", "fil", "fr", "gu", "he", "hi", "ht", "hu", "hy", "id", "is", "it", "ja", "ka", "km", "kn", "ko", "ku", "lt", "mg", "ml", "mn", "ms", "my", "nb", "ne", "nl", "nn", "pl", "ps", "pt", "ro", "ru", "si", "sk", "sl", "so", "sq", "sv", "sw", "ta", "te", "tg", "th", "tl", "tr", "ug", "uk", "ur", "uz", "vi", "zh-cn", "zh-tw"]
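
To illustrate how a classifier over these labels is typically read out, here is a minimal sketch that turns one raw score per language into ranked probabilities. It is an assumption that the model emits scores in a known label order; this README does not describe the output format of predict.py or main.py. LANGS is only a short excerpt of the list above, and top_k is a hypothetical helper.

```python
import math

# Short excerpt of the supported-language list above, for illustration only.
LANGS = ["ar", "de", "en", "ja", "zh-cn"]


def top_k(scores, labels, k=3):
    """Apply softmax to raw scores and return the k most probable (label, prob) pairs."""
    if len(scores) != len(labels):
        raise ValueError("need exactly one score per label")
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    pairs = zip(labels, (e / total for e in exps))
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
```

With the full model, labels would be the complete 65-entry list in whatever order the trained network uses.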

For details, see the documentation.
