NirantK / Hindi2vec
Licence: MIT
State-of-the-Art Language Modeling and Text Classification in Hindi Language
Stars: ✭ 211
Projects that are alternatives of or similar to Hindi2vec
Python Bootcamp
Python Bootcamp docs and lectures (UC Berkeley)
Stars: ✭ 208 (-1.42%)
Mutual labels: jupyter-notebook
Intro Numerical Methods
Jupyter notebooks and other materials developed for the Columbia course APMA 4300
Stars: ✭ 210 (-0.47%)
Mutual labels: jupyter-notebook
Dlsys Course.github.io
Deep learning system course
Stars: ✭ 207 (-1.9%)
Mutual labels: jupyter-notebook
Noise2self
A framework for blind denoising with self-supervision.
Stars: ✭ 211 (+0%)
Mutual labels: jupyter-notebook
Simplified Deeplearning
Simplified implementations of deep learning related works
Stars: ✭ 2,389 (+1032.23%)
Mutual labels: jupyter-notebook
Cartoframes
CARTO Python package for data scientists
Stars: ✭ 208 (-1.42%)
Mutual labels: jupyter-notebook
3d Mri Brain Tumor Segmentation Using Autoencoder Regularization
Keras implementation of the paper "3D MRI brain tumor segmentation using autoencoder regularization" by Myronenko A. (https://arxiv.org/abs/1810.11654).
Stars: ✭ 209 (-0.95%)
Mutual labels: jupyter-notebook
Image manipulation detection
Paper: CVPR2018, Learning Rich Features for Image Manipulation Detection
Stars: ✭ 210 (-0.47%)
Mutual labels: jupyter-notebook
Hardware introduction
What scientific programmers must know about CPUs and RAM to write fast code.
Stars: ✭ 209 (-0.95%)
Mutual labels: jupyter-notebook
Sttn
[ECCV'2020] STTN: Learning Joint Spatial-Temporal Transformations for Video Inpainting
Stars: ✭ 211 (+0%)
Mutual labels: jupyter-notebook
Windrose
A Python Matplotlib, Numpy library to manage wind data, draw windrose (also known as a polar rose plot), draw probability density function and fit Weibull distribution
Stars: ✭ 208 (-1.42%)
Mutual labels: jupyter-notebook
Monthofjulia
Some code examples gathered during my Month of Julia.
Stars: ✭ 209 (-0.95%)
Mutual labels: jupyter-notebook
Algforopt Notebooks
Jupyter notebooks associated with the Algorithms for Optimization textbook
Stars: ✭ 210 (-0.47%)
Mutual labels: jupyter-notebook
Knowledge Graph Analysis Programming Exercises
Exercises for the Analysis of Knowledge Graphs
Stars: ✭ 208 (-1.42%)
Mutual labels: jupyter-notebook
hindi2vec
State-of-the-Art Language Modeling and Text Classification in Hindi Language
Results
We achieved a state-of-the-art perplexity of 46.81 for Hindi, compared to 40.68 for English (lower is better)
- To the best of my knowledge, as of September 18, 2018
Update: nlp-for-hindi uses sentencepiece instead of the word-based spaCy tokenizer that I use. On those tokens, the measured perplexity for that LM is ~35. I encourage you to check out that work as well.
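For readers unfamiliar with the metric: perplexity is conventionally the exponential of the mean per-token negative log-likelihood (cross-entropy, in nats). The snippet below is a minimal sketch of that definition, not the code used to produce the numbers above; the function name `perplexity` and the toy inputs are made up for illustration. Because the average is taken per token, perplexities measured on different tokenizations (word-based vs. sentencepiece) are not directly comparable.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats).

    `token_nlls` is a hypothetical list of per-token losses, e.g. the
    cross-entropy values a language model reports on a validation set.
    """
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Sanity check: a model that gives every token probability 1/4
# should have a perplexity of about 4.
uniform_nlls = [-math.log(0.25)] * 10
print(perplexity(uniform_nlls))

# Going the other way: the mean validation loss (in nats) that
# corresponds to a reported perplexity of 46.81.
print(math.log(46.81))
```

Read this way, a perplexity of 46.81 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly 47 tokens at each step.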
Downloads
- Pretrained Language Models that you can use in your classification for transfer learning
- EXCLUSIVE: BBC Hindi data of 4335 documents for text classification and text summarization. Release Notes
- Raw data for the language model shared above: Hindi Wikipedia, with about 21k unique tokens at a minimum token frequency of 50 (minfreq = 50)
- Wikipedia Processed Data - please use this to train your model
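The minimum-frequency cutoff mentioned for the raw data can be pictured as follows. This is a hedged sketch of how such a threshold typically shrinks a vocabulary, not the notebook's actual preprocessing; `build_vocab`, `encode`, and the special token names `_unk_` and `_pad_` are placeholders chosen for illustration.

```python
from collections import Counter


def build_vocab(tokenized_docs, min_freq=50, specials=("_unk_", "_pad_")):
    """Keep only tokens that occur at least `min_freq` times in the corpus."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    kept = [tok for tok, count in counts.most_common() if count >= min_freq]
    itos = list(specials) + kept              # index -> token
    stoi = {tok: i for i, tok in enumerate(itos)}  # token -> index
    return itos, stoi


def encode(doc, stoi, unk="_unk_"):
    """Map tokens to ids, sending out-of-vocabulary tokens to the unknown id."""
    return [stoi.get(tok, stoi[unk]) for tok in doc]


# Toy corpus; a real run would use the tokenized Wikipedia dump
# with min_freq=50 rather than 2.
docs = [["यह", "एक", "वाक्य", "है"], ["यह", "दूसरा", "वाक्य", "है"]]
itos, stoi = build_vocab(docs, min_freq=2)
print(len(itos))
print(encode(["यह", "नया", "वाक्य", "है"], stoi))
```

A higher cutoff gives a smaller vocabulary and a smaller softmax layer, at the cost of mapping more rare words to the unknown token.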
TODO
- [x] Language modeling based on wikipedia dump
- [x] Release Language Models: Hindi Language Model
- [x] Create Text classification Datasets: BBC Hindi
- [ ] Benchmark text classification with FastText
Idea Dump
- [ ] Change the custom head to be used for transliteration instead of classification: Hindi script (Devanagari) to English script (Roman)
- [ ] MTL tasks for training and inference using custom heads
- [ ] Text to Speech - using datasets from news recordings or Hindi subtitles of dubbed movies
FastAI Installation
This version of the notebook uses v0.7 of the fastai library, the version used in their Part 2 v2 course in Summer 2018. The best way to install it is via conda, as mentioned here
Special thanks to Jeremy, Rachel and other contributors to fastai. This work reproduces their English-language work for Hindi. Thanks to @cstorm125 for thai2vec, which inspired this work.