
1ytic / edit-distance-papers

Licence: other
A curated list of papers dedicated to edit-distance as objective function

Projects that are alternatives of or similar to edit-distance-papers

eddie
No description or website provided.
Stars: ✭ 18 (-63.27%)
Mutual labels:  edit-distance, levenshtein
edits.cr
Edit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment
Stars: ✭ 16 (-67.35%)
Mutual labels:  edit-distance, levenshtein
Quickenshtein
Making the quickest and most memory efficient implementation of Levenshtein Distance with SIMD and Threading support
Stars: ✭ 204 (+316.33%)
Mutual labels:  edit-distance, levenshtein
Symspell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+3932.65%)
Mutual labels:  edit-distance, levenshtein
LinSpell
Fast approximate strings search & spelling correction
Stars: ✭ 52 (+6.12%)
Mutual labels:  edit-distance, levenshtein
levenshtein.c
Levenshtein algorithm in C
Stars: ✭ 77 (+57.14%)
Mutual labels:  edit-distance, levenshtein
pie
Baidu Cloud streaming speech recognition client SDK
Stars: ✭ 62 (+26.53%)
Mutual labels:  asr
myG2P
Myanmar (Burmese) Language Grapheme to Phoneme (myG2P) Conversion Dictionary for speech recognition (ASR) and speech synthesis (TTS).
Stars: ✭ 43 (-12.24%)
Mutual labels:  asr
levenshtein-edit-distance
Levenshtein edit distance
Stars: ✭ 59 (+20.41%)
Mutual labels:  levenshtein
rasr
The RWTH ASR Toolkit.
Stars: ✭ 43 (-12.24%)
Mutual labels:  asr
ctc-asr
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Stars: ✭ 112 (+128.57%)
Mutual labels:  asr
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-48.98%)
Mutual labels:  asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+8.16%)
Mutual labels:  asr
recursion-and-dynamic-programming
Julia and Python recursion algorithm, fractal geometry and dynamic programming applications including Edit Distance, Knapsack (Multiple Choice), Stock Trading, Pythagorean Tree, Koch Snowflake, Jerusalem Cross, Sierpiński Carpet, Hilbert Curve, Pascal Triangle, Prime Factorization, Palindrome, Egg Drop, Coin Change, Hanoi Tower, Cantor Set, Fibo…
Stars: ✭ 37 (-24.49%)
Mutual labels:  edit-distance
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+622.45%)
Mutual labels:  asr
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-57.14%)
Mutual labels:  asr
Speech-Corpus-Collection
A Collection of Speech Corpus for ASR and TTS
Stars: ✭ 113 (+130.61%)
Mutual labels:  asr
AESRC2020
Data preparation scripts, training pipeline and baseline experiment results for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC).
Stars: ✭ 40 (-18.37%)
Mutual labels:  asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+265.31%)
Mutual labels:  asr
asr24
24-hour Automatic Speech Recognition
Stars: ✭ 27 (-44.9%)
Mutual labels:  asr

Edit-distance as objective function

There are several research fields in which the edit distance is chosen as the objective function. For example, in Automatic Speech Recognition (ASR) the main metric of model quality is the Word Error Rate (WER).
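
WER is the word-level Levenshtein distance between the recognized hypothesis and the reference transcript, normalized by the length of the reference. The minimal sketch below is written here for illustration (it is not taken from any of the listed papers) and shows the standard dynamic-programming computation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        diag, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            diag, d[j] = d[j], min(
                d[j] + 1,           # deletion
                d[j - 1] + 1,       # insertion
                diag + (r != h),    # substitution (or match)
            )
    return d[-1]

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ≈ 0.33
```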

Problems

Unfortunately, directly optimizing the edit distance is difficult. Therefore, in most cases, approaches are based on a proxy function, such as cross-entropy. In the context of sequence learning tasks, however, this leads to several problems [1]:

  1. Exposure Bias: the model is never exposed to its own errors during training, and so the inferred histories at test-time do not resemble the gold training histories.

  2. Loss Evaluation Mismatch: training uses a word-level loss, while at test time we aim to improve sequence-level evaluation metrics (see the toy example after this list).

  3. Label Bias: word probabilities at each time step are locally normalized, guaranteeing that successors of incorrect histories receive the same mass as the successors of the true history.
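
A toy illustration of the loss-evaluation mismatch (the example is mine, not taken from any listed paper): a position-wise token loss, close in spirit to what per-step cross-entropy optimizes under teacher forcing, penalizes every position after a single inserted word, while the sequence-level edit distance counts just one error.

```python
def edit_distance(ref, hyp):
    # Same word-level Levenshtein helper as in the WER sketch above.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        diag, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            diag, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, diag + (r != h))
    return d[-1]

reference  = "the cat sat on the mat".split()
hypothesis = "uh the cat sat on the mat".split()   # one spurious word at the start

print(sum(r != h for r, h in zip(reference, hypothesis)))  # 6: every aligned position differs
print(edit_distance(reference, hypothesis))                # 1: a single insertion
```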

Solutions

The following table summarizes the works that attempt to solve the problems mentioned above. There are more detailed overviews, for example [2], but this list includes only works that use the edit distance explicitly or implicitly. Moreover, most of these works formalize the sequence prediction task as an action-taking problem in Reinforcement Learning; a generic sketch of this formulation is given after the table.

Year | Task | Reward level | Algorithms, Models | Affiliation | Authors, Link
2020 | ASR | Sentence | MWER, RNN-T | Amazon | Guo et al.
2020 | MT | Sentence | MGS, parameter search | NYU | Welleck, Cho
2020 | ASR | Sentence | Proper Noun, Phonetic Fuzzing, MWER, RNN-T, LAS | Google | Peyser, Sainath, Pundak
2019 | NLP | Sentence | GPT-2, PPO, Human labeling | OpenAI | Ziegler, Stiennon et al.
2019 | ASR | Sentence | Neural Architecture Search, REINFORCE, CTC | KPMG Nigeria, OAU | Baruwa et al.
2019 | ASR | Sentence | Normalized MWER | Amazon | Gandhe, Rastrow
2019 | ASR | Token | MBR, RNN-T | Tencent, USA | Weng et al.
2019 | ASR | Token | ECTC-DOCD | China | Yi, Wang, Xu
2019 | ASR | Sentence | MWER, RNN-T, LAS | Google | Sainath, Pang et al.
2019 | MT | Token | Reinforce-NAT, Non-Autoregressive Transformer | China, Tencent | Shao, Feng et al.
2019 | MT, TS, APE | Token | Levenshtein Transformer, imitation learning | Facebook, New York | Gu, Wang, Zhao
2018 | ASR | Token | MBR, softmax margin, PAPB, S2S | Brno, JHU, MERL | Baskar et al.
2018 | ASR | Token | OCD, S2S | Google Brain | Sabour, Chan, Norouzi
2018 | ASR | Token | REINFORCE, S2S | Nara, RIKEN | Tjandra et al.
2018 | TS | Sentence | Alternating Actor-Critic | Hong Kong, Tencent | Li, Bing, Lam
2018 | ASR | Sentence | REINFORCE, PPO, Reward shaping | Tokyo | Peng, Shibata, Shinozaki
2017 | ASR | Sentence | REINFORCE, Self-critic | Salesforce | Zhou, Xiong, Socher
2017 | ASR | Sentence | MWER, LAS, Sampling, N-best | Google | Prabhavalkar et al.
2017 | ASR | Sentence | Expected Loss, RNA | Google | Sak et al.
2017 | MT | Sentence | Actor-Critic, Critic-aware | Hong Kong, New York | Gu, Cho, Li
2016 | ASR | Sentence | Reward Augmented ML | Google Brain | Norouzi et al.
2016 | MT | Token | Actor-Critic | Montreal, McGill | Bahdanau et al.
2015 | MT | Sentence | MIXER | Facebook | Ranzato et al.
2015 | ASR | Token | Task Loss Estimation | Montreal, Wrocław | Bahdanau et al.
2014 | ASR | Sentence | Expected Loss, CTC | DeepMind, Toronto | Graves, Jaitly
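
Most of the sentence-level entries above follow the same basic recipe: sample a hypothesis from the model, use the negative edit distance (or a related sequence-level score) against the reference as a reward, and update the model with a policy-gradient estimator such as REINFORCE. The sketch below illustrates only that generic recipe, not the implementation of any listed paper; the PyTorch usage and the shape of `logits` are assumptions, and in practice a baseline (for example a self-critic) is usually subtracted from the reward to reduce variance.

```python
import torch

def edit_distance(ref, hyp):
    # Same Levenshtein helper as in the earlier sketches, over token id lists.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        diag, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            diag, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, diag + (r != h))
    return d[-1]

def reinforce_loss(logits, reference):
    """logits: (T, V) per-step unnormalized scores from a (hypothetical) decoder.
    reference: list of gold token ids.
    Returns a surrogate loss whose gradient is the REINFORCE estimate."""
    dist = torch.distributions.Categorical(logits=logits)
    sampled = dist.sample()                      # draw a hypothesis y ~ p(y | x)
    log_prob = dist.log_prob(sampled).sum()      # log p(y | x), summed over steps
    reward = -edit_distance(reference, sampled.tolist())  # sentence-level reward
    return -(reward * log_prob)                  # gradient flows only through log_prob

# Usage with random logits standing in for a real decoder's output:
torch.manual_seed(0)
fake_logits = torch.randn(5, 10, requires_grad=True)     # 5 steps, vocabulary of 10
loss = reinforce_loss(fake_logits, reference=[1, 2, 3, 4, 5])
loss.backward()                                           # REINFORCE gradient estimate
```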

References

  1. Sequence-to-Sequence Learning as Beam-Search Optimization
  2. Deep Reinforcement Learning for Sequence-to-Sequence Models