NirantK / Hindi2vec

Licence: MIT
State-of-the-Art Language Modeling and Text Classification in Hindi Language

hindi2vec

Results

We achieved a state-of-the-art perplexity of 46.81 for Hindi, compared to 40.68 for English (lower is better).

  • State of the art to the best of my knowledge, as of September 18, 2018
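
For readers new to the metric, here is a minimal sketch of how perplexity is usually computed: the exponential of the average negative log-likelihood per token. This is not the evaluation code from the notebook; the function name `perplexity` and the sample probabilities are made up for illustration.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    observed token. Hypothetical helper, not part of hindi2vec.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Illustrative values only: probabilities assigned to four observed tokens.
log_probs = [math.log(p) for p in (0.5, 0.25, 0.5, 0.25)]
print(perplexity(log_probs))

# A model that spreads probability uniformly over N choices has perplexity N,
# so a perplexity of 46.81 is roughly "as uncertain as picking among 47 tokens".
uniform = [math.log(1 / 46.81)] * 10
print(perplexity(uniform))
```

Because the average is taken per token, the value depends on how the text is split into tokens.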

Update: nlp-for-hindi uses sentencepiece instead of the word-based spaCy tokenizer that I use. On sentencepiece tokens, the measured perplexity for that LM is ~35. I encourage you to check that work out as well.

Downloads

TODO

  • [x] Language modeling based on a Wikipedia dump
  • [x] Release Language Models: Hindi Language Model
  • [x] Create Text classification Datasets: BBC Hindi
  • [ ] Benchmark text classification with FastText

Idea Dump

  • [ ] Change the custom head to do transliteration instead of classification, from Hindi script (Devanagari) to English script (Roman)
  • [ ] MTL tasks for training and inference using custom heads
  • [ ] Text to Speech - using datasets from news recordings or Hindi subtitles of dubbed movies

FastAI Installation

This version of the notebook uses v0.7 of the fastai library, the version used in their Part 2 v2 course in Summer 2018. The best way to install it is via conda, as mentioned here

Special thanks to Jeremy, Rachel and other contributors to fastai. This work reproduces their English-language work for Hindi. Thanks to @cstorm125 for thai2vec, which inspired this work.
