belambert / Edit Distance

Licence: apache-2.0
Python library for computing edit distance between arbitrary Python sequences.

edit_distance

Python module for computing edit distances and alignments between sequences.

I needed a way to compute edit distances between sequences in Python and couldn't find a library that does this, so I wrote my own. Numerous libraries compute edit distances between two strings, but none that I found handle two arbitrary sequences.

This is written entirely in Python. The implementation could likely be optimized to run faster within Python, and could probably be much faster if implemented in C.

The library API is modeled after difflib.SequenceMatcher. It is very similar to difflib, except that this module computes edit distance (Levenshtein distance) rather than using the Ratcliff/Obershelp method that Python's difflib uses. difflib "does not yield minimal edit sequences, but does tend to yield matches that 'look right' to people."
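
To make the term concrete, here is a minimal pure-Python sketch of Levenshtein distance over arbitrary sequences. It is not this library's implementation, and the function name `levenshtein` is chosen only for illustration. It assumes unit cost for insertions, deletions and substitutions, and only requires that elements support `==`.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Illustrative sketch only; works on any two sequences whose
    elements can be compared with ==.
    """
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            substitution_cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,                      # delete x
                current[j - 1] + 1,                   # insert y
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


print(levenshtein([1, 2, 3, 4], [1, 2, 4, 5, 6]))                      # 3
print(levenshtein("the cat sat".split(), "the cat sat down".split()))  # 1
```

Because the elements are only compared for equality, the same function handles lists of integers, lists of words, or plain strings.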

If you find this library useful or have any suggestions, please send me a message.

Installing & uninstalling

The easiest way to install is using pip:

pip install edit_distance

Alternatively you can clone this git repo and install using distutils:

git clone git@github.com:belambert/edit_distance.git
cd edit_distance
python setup.py install

To uninstall with pip:

pip uninstall edit_distance

API usage

To see examples of usage, view the difflib documentation. Additional API-level documentation is available on ReadTheDocs.

This requires Python 2.7+ since it uses argparse for the command line interface. The rest of the code should work with earlier versions of Python.

Example API usage:

import edit_distance
ref = [1, 2, 3, 4]
hyp = [1, 2, 4, 5, 6]
sm = edit_distance.SequenceMatcher(a=ref, b=hyp)
sm.get_opcodes()
sm.ratio()
sm.get_matching_blocks()
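
Since the API mirrors difflib, it can help to see what the same three calls return from the standard library's `difflib.SequenceMatcher` on the same inputs. This sketch uses only difflib; the values returned by edit_distance for these calls may differ, because the two modules align sequences differently.

```python
import difflib

ref = [1, 2, 3, 4]
hyp = [1, 2, 4, 5, 6]

# Standard-library matcher, shown for comparison only.
sm = difflib.SequenceMatcher(a=ref, b=hyp)

# Each opcode is (tag, i1, i2, j1, j2), describing how ref[i1:i2]
# relates to hyp[j1:j2].
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, ref[i1:i2], hyp[j1:j2])

# ratio() is 2 * matches / (len(ref) + len(hyp)).
print(round(sm.ratio(), 4))

# Matching blocks are (a_index, b_index, size) triples; the last one
# is always a zero-length sentinel.
print(sm.get_matching_blocks())
```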

Differences from difflib

In addition to the SequenceMatcher methods, distance() and matches() methods are provided which compute the edit distance and the number of matches.

sm.distance()
sm.matches()

Even if the alignment of the two sequences is identical to difflib's, get_opcodes() and get_matching_blocks() may return slightly different sequences. The opcodes returned by this library represent operations on individual elements (characters, in the case of strings), and thus should never span two or more elements.
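
To illustrate the difference in granularity, the sketch below takes difflib's block opcodes and splits them into opcodes that each cover at most one element per sequence. The helper `split_opcodes` is hypothetical and written only for this example; it is not part of either library, and the exact opcode layout returned by edit_distance is not asserted here.

```python
import difflib


def split_opcodes(opcodes):
    """Split difflib-style block opcodes into single-element opcodes."""
    result = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "delete":
            result.extend(("delete", i, i + 1, j1, j1) for i in range(i1, i2))
        elif tag == "insert":
            result.extend(("insert", i1, i1, j, j + 1) for j in range(j1, j2))
        else:
            # "equal" blocks pair elements one to one; "replace" blocks are
            # paired as far as possible and the leftover becomes
            # deletions or insertions.
            paired = min(i2 - i1, j2 - j1)
            for k in range(paired):
                result.append((tag, i1 + k, i1 + k + 1, j1 + k, j1 + k + 1))
            for i in range(i1 + paired, i2):
                result.append(("delete", i, i + 1, j2, j2))
            for j in range(j1 + paired, j2):
                result.append(("insert", i2, i2, j, j + 1))
    return result


ref = [1, 2, 3, 4]
hyp = [1, 2, 4, 5, 6]
blocks = difflib.SequenceMatcher(a=ref, b=hyp).get_opcodes()
singles = split_opcodes(blocks)

print(len(blocks))   # 4 block opcodes from difflib
print(len(singles))  # 6 single-element opcodes
```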

It's also possible to compute the maximum number of matches rather than the minimum number of edits:

sm = edit_distance.SequenceMatcher(a=ref, b=hyp, 
     action_function=edit_distance.highest_match_action)
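
The two objectives can disagree. As a sketch, and assuming "maximum number of matches" corresponds to the length of the longest common subsequence (an interpretation, not something taken from the library's source), the illustrative function `lcs_length` below counts the most elements that can be matched in order:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Illustrative sketch only: the largest number of elements that
    can be matched while preserving order in both sequences.
    """
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)   # extend a match
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


print(lcs_length([1, 2, 3, 4], [1, 2, 4, 5, 6]))  # 3
print(lcs_length("abcd", "dxyz"))                 # 1
```

For "abcd" and "dxyz", the cheapest edit script is four substitutions with no matches at all, while matching the shared "d" requires three deletions and three insertions. Minimizing edits and maximizing matches therefore lead to different alignments.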

Notes

This doesn't implement the 'junk' matching features in difflib.

Hacking

To run unit tests:

python -m unittest

To deploy...

Contributing and code of conduct

For contributions, it's best to use GitHub issues and pull requests. Proper testing and documentation are required.

Contributors are expected to behave reasonably, in particular as specified by the Contributor Covenant.