tienthanhdhcn / Vietnamese-Accent-Prediction

Licence: other
A simple/fast/accurate accent prediction for non-accented Vietnamese text

Programming Languages

java

Vietnamese Accent Prediction

A very simple, fast, and accurate accent predictor for unaccented Vietnamese text, using an n-gram language model with a Markov chain.
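
To make the idea concrete, here is a minimal sketch of how a bigram (first-order Markov) model can pick accents: every unaccented syllable has a few accented candidates, and a Viterbi search keeps the candidate sequence with the highest total log-probability. This is not the project's implementation; the class `BigramAccentSketch`, the tiny candidate table, and all scores are made up for illustration.

```java
import java.util.*;

class BigramAccentSketch {
    // Hypothetical candidate table: unaccented syllable -> accented variants.
    static final Map<String, List<String>> CANDIDATES = new HashMap<>();
    // Hypothetical bigram log-probabilities, keyed by "previous current".
    static final Map<String, Double> BIGRAM_LOGP = new HashMap<>();
    // Crude penalty for bigrams that are not in the table (an assumption).
    static final double UNSEEN_LOGP = -10.0;

    static {
        CANDIDATES.put("anh", Arrays.asList("anh", "ánh", "ảnh"));
        CANDIDATES.put("yeu", Arrays.asList("yêu", "yếu"));
        CANDIDATES.put("em", Arrays.asList("em"));
        BIGRAM_LOGP.put("anh yêu", -1.0);
        BIGRAM_LOGP.put("yêu em", -0.5);
        BIGRAM_LOGP.put("ảnh yếu", -4.0);
    }

    static List<String> candidates(String word) {
        // Unknown syllables are passed through unchanged.
        return CANDIDATES.getOrDefault(word, Collections.singletonList(word));
    }

    static double logP(String prev, String cur) {
        return BIGRAM_LOGP.getOrDefault(prev + " " + cur, UNSEEN_LOGP);
    }

    static String predict(String input) {
        String[] words = input.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<List<String>> lattice = new ArrayList<>();
        List<int[]> backPointers = new ArrayList<>();
        List<String> prevCands = candidates(words[0]);
        double[] prevScore = new double[prevCands.size()]; // first word: all scores 0
        lattice.add(prevCands);

        for (int i = 1; i < words.length; i++) {
            List<String> cands = candidates(words[i]);
            double[] score = new double[cands.size()];
            int[] back = new int[cands.size()];
            for (int c = 0; c < cands.size(); c++) {
                score[c] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < prevCands.size(); p++) {
                    double s = prevScore[p] + logP(prevCands.get(p), cands.get(c));
                    if (s > score[c]) { score[c] = s; back[c] = p; }
                }
            }
            lattice.add(cands);
            backPointers.add(back);
            prevCands = cands;
            prevScore = score;
        }

        // Pick the best final candidate, then follow the back-pointers.
        int best = 0;
        for (int c = 1; c < prevScore.length; c++) {
            if (prevScore[c] > prevScore[best]) best = c;
        }
        String[] out = new String[words.length];
        for (int i = words.length - 1; i >= 0; i--) {
            out[i] = lattice.get(i).get(best);
            if (i > 0) best = backPointers.get(i - 1)[best];
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(predict("Anh yeu em")); // prints "anh yêu em"
    }
}
```

A real model would load the candidate table and the n-gram counts from data files and apply proper smoothing instead of a fixed penalty.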

Performance

All tests were run on my MacBook (2.5 GHz Intel Core i7, 16 GB RAM).

  • Speed: about 350 sentences per second (roughly 3,500 words/syllables per second)
  • Accuracy: 96.52% on the test.txt file provided in the datasets folder, measured as follows:
AccuracyCalculator ac = new AccuracyCalculator(); 
System.out.println("Accuracy:" + ac.getAccuracy("datasets/test.txt") +"%");
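
The README does not say how `AccuracyCalculator` scores a prediction. One plausible metric, shown here purely as an assumption, is syllable-level accuracy: the percentage of whitespace-separated tokens in the prediction that match the accented reference. The class and method names below are hypothetical.

```java
class AccuracySketch {
    // Returns the percentage of positions where the predicted token
    // equals the reference (gold) token.
    static double syllableAccuracy(String gold, String predicted) {
        String[] g = gold.trim().split("\\s+");
        String[] p = predicted.trim().split("\\s+");
        if (g.length != p.length) {
            throw new IllegalArgumentException("Token counts differ: " + g.length + " vs " + p.length);
        }
        int correct = 0;
        for (int i = 0; i < g.length; i++) {
            if (g[i].equals(p[i])) correct++;
        }
        return 100.0 * correct / g.length;
    }
}
```

Accent prediction keeps the number of syllables unchanged, so a length mismatch is treated as an error rather than scored.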

Examples

  • Anh yeu em --> Anh yêu em (I love you)

  • Toi dang di du lich o ha long --> Tôi đang đi du lịch ở hạ long (I am visiting Halong)
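
To try the predictor on your own sentences, you need unaccented input like the left-hand side of the examples above. A common way to produce it from accented text is Unicode decomposition with the standard `java.text.Normalizer`. The helper below is a sketch and is not part of this project's API; the class name `AccentStripper` is hypothetical.

```java
import java.text.Normalizer;

class AccentStripper {
    // Removes Vietnamese diacritics: "yêu" becomes "yeu".
    static String stripAccents(String text) {
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}+", "")   // drop the combining marks
                .replace('\u0111', 'd')      // đ has no decomposition, map it by hand
                .replace('\u0110', 'D');     // Đ likewise
    }
}
```

The letters đ and Đ need explicit handling because Unicode treats them as separate base letters rather than as d or D plus a mark.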

API

Using the provided n-grams data

AccentPredictor ap = new AccentPredictor();
String str = "Toi thich di du lich Ha Noi";
String predictedStr = ap.predictAccents(str);
  • You can also get the top N predicted results as follows:
import java.util.LinkedHashMap;

AccentPredictor ap = new AccentPredictor();
String str = "Toi thich di du lich Ha Noi";

// Map of (matched_str -> matched_score)
LinkedHashMap<String, Double> matches = ap.predictAccentsWithMultiMatches(str, 5); // Returns the 5 best matches
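
The returned map can be walked like any other `LinkedHashMap`, which iterates in insertion order. The sketch below assumes, without confirmation from the README, that the entries are inserted best match first. It uses stand-in data rather than a real call to the predictor, and `TopMatchesSketch` is a hypothetical helper.

```java
import java.util.*;

class TopMatchesSketch {
    // Formats each (matched_str, matched_score) entry as a ranked line.
    static List<String> describe(LinkedHashMap<String, Double> matches) {
        List<String> lines = new ArrayList<>();
        int rank = 1;
        for (Map.Entry<String, Double> e : matches.entrySet()) {
            lines.add(rank + ". " + e.getKey() + " (score " + e.getValue() + ")");
            rank++;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in data; the real scores come from predictAccentsWithMultiMatches.
        LinkedHashMap<String, Double> matches = new LinkedHashMap<>();
        matches.put("first candidate", -1.5);
        matches.put("second candidate", -2.25);
        describe(matches).forEach(System.out::println);
    }
}
```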

Using your own n-gram data

AccentPredictor ap = new AccentPredictor("_Your1GramFile", "_Your2GramsFile");
String str = "Toi thich di du lich Ha Noi";
String predictedStr = ap.predictAccents(str);
  • To create your own n-gram data, you can use the following API:
String dataFolderPath = "path_to_your_data"; // Folder containing your text data
int numberOfProcessingFiles = -1; // Maximum number of files to process (-1 means use all files)
boolean toLowercase = true; // If true, the n-grams are converted to lowercase
String _1GramFileOut = "datasets/news1gram"; // Output file for the 1-gram statistics
String _2GramsFileOut = "datasets/news2grams"; // Output file for the 2-gram statistics
new NGramer(dataFolderPath).statisticNGrams(numberOfProcessingFiles, toLowercase, _1GramFileOut, _2GramsFileOut);
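
For readers unfamiliar with n-gram statistics, the sketch below shows the kind of counting that such a step involves: tallying single syllables (1-grams) and adjacent pairs (2-grams), optionally lowercased. It is an illustration only, not `NGramer`'s actual code, and it does not reproduce the project's output file format, which this README does not describe. The class name `NGramCountSketch` is hypothetical.

```java
import java.util.*;

class NGramCountSketch {
    final Map<String, Integer> unigrams = new HashMap<>();
    final Map<String, Integer> bigrams = new HashMap<>();

    // Adds the 1-gram and 2-gram counts of one sentence to the running totals.
    void addSentence(String sentence, boolean toLowercase) {
        String text = toLowercase ? sentence.toLowerCase(Locale.ROOT) : sentence;
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].isEmpty()) continue; // blank input yields one empty token
            unigrams.merge(tokens[i], 1, Integer::sum);
            if (i + 1 < tokens.length) {
                bigrams.merge(tokens[i] + " " + tokens[i + 1], 1, Integer::sum);
            }
        }
    }
}
```

Calling `addSentence` once per line of a corpus builds both tables in a single pass.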