oligoglot / theedhum-nandrum

Licence: Apache-2.0 license

A sentiment classifier on mixed language (and mixed script) reviews in Tamil, Malayalam and English


Projects that are alternatives to, or similar to, theedhum-nandrum

Grabs .m3u8 streams from YouTube live channels and builds .m3u IPTV playlists for various languages and events: Tamil / Malayalam / English / Hindi / French / Kids / Sports / Urdu etc.

Stars: ✭ 48 (+200%)

Mutual labels: malayalam, tamil

govarnam

Easily type Indian languages on computers and mobile devices. GoVarnam is a cross-platform transliteration library: Manglish -> Malayalam, Thanglish -> Tamil, Hinglish -> Hindi, plus another 10 languages. GoVarnam is a near-Go port of libvarnam.

Stars: ✭ 97 (+506.25%)

Mutual labels: malayalam, tamil

SGDLibrary

MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20

Stars: ✭ 165 (+931.25%)

Mutual labels: sgd, logistic-regression

Tensorflow Ml Nlp

Natural language processing starting with TensorFlow and machine learning (from logistic regression to a Transformer chatbot)

Stars: ✭ 176 (+1000%)

Mutual labels: logistic-regression

Textclassification

Several methods for text classification

Stars: ✭ 180 (+1025%)

Mutual labels: logistic-regression

batchnorm-pruning

Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers https://arxiv.org/abs/1802.00124

Stars: ✭ 66 (+312.5%)

Mutual labels: sgd

AutoOpt

Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent

Stars: ✭ 44 (+175%)

Mutual labels: sgd

Deep Math Machine Learning.ai

A blog about machine learning and deep learning algorithms and the math behind them, plus machine learning algorithms written from scratch.

Stars: ✭ 173 (+981.25%)

Mutual labels: logistic-regression

Awd Lstm Lm

LSTM and QRNN Language Model Toolkit for PyTorch

Stars: ✭ 1,834 (+11362.5%)

Mutual labels: sgd

numpy-neuralnet-exercise

Implementations of key neural network concepts using NumPy

Stars: ✭ 49 (+206.25%)

Mutual labels: sgd

FactorizationMachine

An implementation of factorization machines, with support for classification.

Stars: ✭ 19 (+18.75%)

Mutual labels: sgd

Fake news detection

Fake News Detection in Python

Stars: ✭ 194 (+1112.5%)

Mutual labels: logistic-regression

TransE

A Python implementation of the TransE method, explaining how TransE's vectors are updated during SGD

Stars: ✭ 31 (+93.75%)

Mutual labels: sgd

Deeplearning.ai

This repository contains personal notes and implementation code for the related courses offered by deeplearning.ai.

Stars: ✭ 181 (+1031.25%)

Mutual labels: logistic-regression

Python-AndrewNgML

Python implementation of Andrew Ng's ML course projects

Stars: ✭ 24 (+50%)

Mutual labels: logistic-regression

Machine Learning Is All You Need

🔥🌟 "Machine Learning 格物志" (roughly "Machine Learning: Investigating Things"): ML + DL + RL basic code and notes using sklearn, PyTorch, TensorFlow, Keras and, most importantly, from scratch! 💪 This repository is ALL You Need!

Stars: ✭ 173 (+981.25%)

Mutual labels: logistic-regression

DiFacto2 ffm

Distributed Field-aware Factorization Machines based on Parameter Server

Stars: ✭ 11 (-31.25%)

Mutual labels: sgd

LinkOS-Android-Samples

Java-based sample code for developing on Android. The demos in this repository are stored on separate branches; to navigate to a demo, switch to its branch.

Stars: ✭ 52 (+225%)

Mutual labels: sgd

Voice Gender

Gender recognition by voice and speech analysis

Stars: ✭ 248 (+1450%)

Mutual labels: logistic-regression

AIML-Projects

Projects I completed as a part of Great Learning's PGP - Artificial Intelligence and Machine Learning

Stars: ✭ 85 (+431.25%)

Mutual labels: logistic-regression


theedhum-nandrum (தீதும் நன்றும்)

A sentiment classifier on mixed language (and mixed script) reviews in Tamil, Malayalam and English. You can read our paper describing the approach at https://arxiv.org/abs/2010.03189. Please cite our paper if you use this work.

@misc{lakshmanan2020theedhum,
  title={Theedhum Nandrum@Dravidian-CodeMix-FIRE2020: A Sentiment Polarity Classifier for YouTube Comments with Code-switching between Tamil, Malayalam and English},
  author={BalaSundaraRaman Lakshmanan and Sanjeeth Kumar Ravindranath},
  year={2020},
  eprint={2010.03189},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Installation

Pre-requisites

Python 3.7 or above

Getting the code

cd /path/to/parent/
git clone https://github.com/oligoglot/theedhum-nandrum.git
cd theedhum-nandrum

Setting up dev environment

virtualenv venv_tn
source venv_tn/bin/activate
pip install -r requirements.txt

Running the classification scripts

Activate the virtualenv and change to the source directory:
- source venv_tn/bin/activate
- cd src/tn
Hyperparameter tuning for the SGD classifier:
- python3 sentiment_classifier.py experiment ta ../../resources/data/tamil_train.tsv ../../resources/data/tamil_dev.tsv configs/tuning_experiments_1.json
Classification of the Tamil input set:
- python3 sentiment_classifier.py test ta ../../resources/data/tamil_train.tsv ../../resources/data/tamil_dev.tsv <output File>
Classification of the Malayalam input set:
- python3 sentiment_classifier.py test ml ../../resources/data/malayalam_train.tsv ../../resources/data/malayalam_dev.tsv <output File>
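
The `experiment` mode takes a JSON file of tuning experiments (configs/tuning_experiments_1.json), but this README does not describe that file's schema. The following is only a minimal sketch under an assumed layout. Each key names a hyperparameter and maps to a list of candidate values, and every combination becomes one experiment. The keys `loss`, `alpha` and `penalty` are borrowed from the parameter names of scikit-learn's `SGDClassifier` purely for illustration; whether the project's config uses them is an assumption. The helper `expand_grid` is hypothetical.

```python
import itertools
import json
from typing import Any, Dict, List

# HYPOTHETICAL config layout: hyperparameter name -> list of candidate values.
# The real configs/tuning_experiments_1.json may be structured differently.
EXAMPLE_CONFIG = """
{
  "loss": ["hinge", "modified_huber"],
  "alpha": [0.0001, 0.001],
  "penalty": ["l2"]
}
"""


def expand_grid(config: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one parameter dict per combination of candidate values."""
    keys = sorted(config)  # a fixed key order makes the output deterministic
    value_lists = [config[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


if __name__ == "__main__":
    grid = expand_grid(json.loads(EXAMPLE_CONFIG))
    for params in grid:
        print(params)  # each dict would configure one training run
    print(len(grid), "experiments")  # 2 * 2 * 1 = 4
```

A full grid grows multiplicatively with every key, so it is worth keeping the candidate lists short.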

Steps

Pre-processing

Noise removal

Remove irrelevant parts of the data, such as HTML tags.
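
Here is a minimal standard-library sketch of this step. It assumes comments may contain HTML fragments and entities, and uses Python's `html.parser.HTMLParser` to keep only the text nodes. The helper `strip_html` is hypothetical, and this is not necessarily how theedhum-nandrum implements noise removal.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; into plain text
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        # Note: this sketch does not special-case <script>/<style> contents.
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Remove HTML tags and entities, then collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered at the end of input
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_html("<p>Super <b>padam</b></p><br>Semma &amp; mass"))
```

The text runs are joined with spaces so that block tags such as `<br>` do not glue words together. The trade-off is that a word split by an inline tag (`pa<b>dam</b>`) comes out as two tokens.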

Language identification

If the text is in a different language, the classifier needs to output "Not tamil".
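
One simple, assumption-laden way to approach this step is to check which Unicode script blocks a comment's letters come from. Tamil occupies U+0B80 to U+0BFF and Malayalam occupies U+0D00 to U+0D7F. The sketch below flags a comment as out of scope (for example, to be labelled "Not tamil") when most of its native-script letters belong to some other script. It cannot help with romanised comments: Tamil or Malayalam typed in Latin script looks like English at the script level. The function names are hypothetical, and this is not presented as the project's actual method.

```python
from collections import Counter

# Unicode block ranges (inclusive) for the scripts of interest.
SCRIPT_RANGES = {
    "tamil": (0x0B80, 0x0BFF),
    "malayalam": (0x0D00, 0x0D7F),
}


def script_profile(text: str) -> Counter:
    """Count characters per script: known blocks, ASCII 'latin', other letters."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[name] += 1  # includes vowel signs and viramas in the block
                break
        else:
            if ch.isascii() and ch.isalpha():
                counts["latin"] += 1
            elif ch.isalpha():
                counts["other"] += 1  # e.g. Devanagari letters
    return counts


def looks_out_of_scope(text: str, expected: str, threshold: float = 0.5) -> bool:
    """Heuristic: True if under `threshold` of native-script letters are `expected`.

    Latin-only text is never flagged, since romanised Tamil/Malayalam cannot be
    told apart from English by script alone.
    """
    counts = script_profile(text)
    native = sum(v for k, v in counts.items() if k != "latin")
    if native == 0:
        return False
    return counts[expected] / native < threshold


if __name__ == "__main__":
    for comment in ["சூப்பர் படம்", "നല്ല സിനിമ", "Super padam"]:
        print(comment, looks_out_of_scope(comment, "tamil"))
```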

Attributions

Spelling Corrector in Python 3 (see http://norvig.com/spell-correct.html). Copyright (c) 2007-2016 Peter Norvig, MIT license: www.opensource.org/licenses/mit-license.php
Module to convert Unicode emojis to their corresponding sentiment rankings, based on the research of Kralj Novak P, Smailović J, Sluban B and Mozetič I (2015), "Sentiment of Emojis", PLoS ONE. Journal link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0144296. CSV data acquired from the CLARIN repository, repository link: http://hdl.handle.net/11356/1048
Datasets:
@inproceedings{chakravarthi-etal-2020-corpus,
  title = "Corpus Creation for Sentiment Analysis in Code-Mixed {T}amil-{E}nglish Text",
  author = "Chakravarthi, Bharathi Raja and Muralidaran, Vigneshwaran and Priyadharshini, Ruba and McCrae, John Philip",
  booktitle = "Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)",
  month = may,
  year = "2020",
  address = "Marseille, France",
  publisher = "European Language Resources association",
  url = "https://www.aclweb.org/anthology/2020.sltu-1.28",
  pages = "202--210",
  abstract = "Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to analyse the popular sentiments of videos on social media based on viewer comments. However, comments from social media do not follow strict rules of grammar, and they contain mixing of more than one language, often written in non-native scripts. Non-availability of annotated code-mixed data for a low-resourced language like Tamil also adds difficulty to this problem. To overcome this, we created a gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube. In this paper, we describe the process of creating the corpus and assigning polarities. We present inter-annotator agreement and show the results of sentiment analysis trained on this corpus as a benchmark.",
  language = "English",
  ISBN = "979-10-95546-35-1",
}
@inproceedings{Chakravarthi2020ASA,
  title={A Sentiment Analysis Dataset for Code-Mixed Malayalam-English},
  author={Bharathi Raja Chakravarthi and Navya Jose and Shardul Suryawanshi and E. Sherly and John P. McCrae},
  booktitle={SLTU/CCURL@LREC},
  year={2020}
}
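
For readers unfamiliar with the Norvig spelling corrector credited above: it generates every string within one or two edits (deletion, transposition, replacement, insertion) of a word. It then picks the known candidate with the highest corpus frequency, preferring candidates that need fewer edits. The sketch below mirrors the structure of Norvig's published code but swaps in a tiny, made-up vocabulary of romanised words. This README does not describe how theedhum-nandrum builds its own word counts or what it corrects.

```python
from collections import Counter

# Toy word counts for illustration only (invented numbers). Norvig's original
# builds these from a large English text corpus.
WORDS = Counter({"super": 50, "padam": 40, "semma": 30, "mass": 20, "movie": 25})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def P(word: str) -> float:
    """Relative frequency of `word` in the toy vocabulary."""
    return WORDS[word] / sum(WORDS.values())


def edits1(word: str) -> set:
    """All strings one delete, transpose, replace or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
    inserts = [L + c + R for L, R in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word: str):
    """All strings two edits away from `word`, generated lazily."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(words) -> set:
    """The subset of `words` that appear in the vocabulary."""
    return {w for w in words if w in WORDS}


def correction(word: str) -> str:
    """Most probable known candidate, preferring fewer edits; else the word itself."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=P)


if __name__ == "__main__":
    for w in ["padm", "smema", "super"]:
        print(w, "->", correction(w))
```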

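The emoji module credited above rests on a simple quantity from Kralj Novak et al. Each tweet containing an emoji is scored as negative (-1), neutral (0) or positive (+1). The emoji's sentiment score is the mean of those polarities, which works out to p(positive) - p(negative).

The sketch below makes several assumptions, so check each one before relying on it:
- The CSV has `Emoji`, `Negative`, `Neutral` and `Positive` count columns. Check the actual CLARIN file's header.
- It uses plain relative frequencies, whereas the paper's estimator may apply smoothing.
- Averaging emoji scores per comment is an illustrative choice, not the project's documented method.

The counts and helper names are invented.

```python
import csv
import io

# Toy rows in the ASSUMED CSV layout; the counts are invented, not real data.
SAMPLE_CSV = """Emoji,Negative,Neutral,Positive
😂,10,20,70
😭,50,20,30
"""


def load_scores(csv_text: str) -> dict:
    """Map each emoji to its mean polarity: (-1*neg + 0*neu + 1*pos) / total."""
    scores = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        neg, neu, pos = (int(row[k]) for k in ("Negative", "Neutral", "Positive"))
        total = neg + neu + pos
        if total:
            scores[row["Emoji"]] = (pos - neg) / total
    return scores


def emoji_sentiment(text: str, scores: dict):
    """Average score of the known emojis in `text`, or None if there are none.

    Lookup is per code point, so multi-code-point emoji sequences (ZWJ
    sequences, variation selectors) are not matched by this sketch.
    """
    found = [scores[ch] for ch in text if ch in scores]
    return sum(found) / len(found) if found else None


if __name__ == "__main__":
    scores = load_scores(SAMPLE_CSV)
    print(scores)
    print(emoji_sentiment("Semma 😂😂", scores))
```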