
taleinat / Fuzzysearch

Licence: mit
Find parts of long text or data, allowing for some changes/typos.


===========
fuzzysearch
===========

.. image:: https://img.shields.io/pypi/v/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Latest Version

.. image:: https://img.shields.io/travis/taleinat/fuzzysearch.svg?branch=master
    :target: https://travis-ci.org/taleinat/fuzzysearch/branches
    :alt: Build & Tests Status

.. image:: https://img.shields.io/coveralls/taleinat/fuzzysearch.svg?branch=master
    :target: https://coveralls.io/r/taleinat/fuzzysearch?branch=master
    :alt: Test Coverage

.. image:: https://img.shields.io/pypi/wheel/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Wheels

.. image:: https://img.shields.io/pypi/pyversions/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/implementation/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Supported Python implementations

.. image:: https://img.shields.io/pypi/l/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch/
    :alt: License

Fuzzy search: Find parts of long text or data, allowing for some changes/typos.

Easy, fast, and just works!

.. code:: python

    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1, matched="PATERN")]

* Two simple functions to use: one for in-memory data and one for files

  * Fastest search algorithm is chosen automatically

* Levenshtein Distance metric with configurable parameters

  * Separately configure the maximum allowed distance, substitutions, deletions and/or insertions

* Advanced algorithms with optional C and Cython optimizations

* Properly handles Unicode; special optimizations for binary data

* Simple installation:

  * ``pip install fuzzysearch`` just works
  * pure-Python fallbacks for compiled modules
  * only one dependency (attrs)

* Extensively tested

* Free software: `MIT license <LICENSE>`_

For more info, see the `documentation <http://fuzzysearch.rtfd.org>`_.
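The Levenshtein distance used as the matching metric is the minimum number of single-character substitutions, deletions and insertions needed to turn one string into another. For readers unfamiliar with it, here is a minimal sketch of the textbook dynamic-programming formulation. It is for illustration only and is not necessarily how fuzzysearch computes distances internally; ``levenshtein`` is a hypothetical helper, not part of the library.

```python
def levenshtein(a, b):
    """Textbook DP: minimum substitutions + deletions + insertions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free if the chars are equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein('PATTERN', 'PATERN'))   # 1
print(levenshtein('PATTERN', 'PAT-ERN'))  # 1
print(levenshtein('PATTERN', 'PATERRN'))  # 2
```

Keeping only the previous row means memory grows with the length of the second string rather than with the product of both lengths.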

Installation
------------

fuzzysearch supports Python versions 2.7 and 3.5+, as well as PyPy 2.7 and 3.6.

.. code::

    $ pip install fuzzysearch

This works even if building the C and Cython extensions fails: pure-Python fallbacks are used instead.

Usage
-----

Just call ``find_near_matches()`` with the sub-sequence you're looking for, the sequence to search, and the matching parameters:

.. code:: python

    >>> from fuzzysearch import find_near_matches
    >>> # search for 'PATTERN' with a maximum Levenshtein Distance of 1
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1, matched="PATERN")]
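Conceptually, a near-match search asks where in the sequence some substring lies within the allowed distance of the pattern. The sketch below shows one classic way to answer that, the dynamic-programming method usually credited to Sellers, in which a match may start anywhere because the top cell of every column is zero. It is only an illustration: ``near_match_ends`` is a hypothetical name, it reports end positions rather than ``Match`` objects, and fuzzysearch chooses among its own, more optimized algorithms.

```python
def near_match_ends(pattern, sequence, max_l_dist):
    """Sellers-style DP: yield (end, dist) for every end index in `sequence`
    where some substring ending there is within max_l_dist of `pattern`."""
    m = len(pattern)
    # col[i] = best distance between pattern[:i] and a substring of the
    # sequence ending at the current position (before any char: col[i] = i)
    col = list(range(m + 1))
    for end, ch in enumerate(sequence, start=1):
        new = [0]  # an empty pattern prefix matches anywhere at no cost
        for i in range(1, m + 1):
            new.append(min(
                col[i] + 1,                           # extra char in the sequence
                new[i - 1] + 1,                       # pattern char skipped
                col[i - 1] + (pattern[i - 1] != ch),  # match or substitution
            ))
        col = new
        if col[m] <= max_l_dist:
            yield end, col[m]


print(list(near_match_ends('PATTERN', '---PATERN---', max_l_dist=1)))
# [(9, 1)]
```

The reported end index 9 agrees with ``end=9`` in the ``Match`` returned above; recovering the start position would require tracing back through the table, which this sketch omits.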

To search in a file, use ``find_near_matches_in_file()`` similarly:

.. code:: python

    >>> from fuzzysearch import find_near_matches_in_file
    >>> with open('data_file', 'rb') as f:
    ...     find_near_matches_in_file(b'PATTERN', f, max_l_dist=1)
    [Match(start=3, end=9, dist=1, matched="PATERN")]
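When searching a file, a match may straddle the boundary between two chunks read from disk. One way to handle that, sketched below as an assumption rather than a description of how ``find_near_matches_in_file()`` works, is to run a column-by-column dynamic-programming search (in the style of Sellers' algorithm) and carry the current column over from one chunk to the next, so no overlapping reads are needed. The helper name and its ``chunk_size`` parameter are hypothetical, and only end offsets are reported.

```python
import io


def near_match_ends_in_stream(pattern, stream, max_l_dist, chunk_size=4096):
    """Hypothetical helper: yield (end_offset, dist) for approximate matches of
    the bytes `pattern` in a binary stream, reading it chunk by chunk.

    The DP column survives across chunks, so matches spanning a chunk
    boundary are still found without re-reading any data."""
    m = len(pattern)
    col = list(range(m + 1))  # distances before any byte has been read
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b'' signals end of stream
            break
        for byte in chunk:  # iterating bytes yields ints on Python 3
            offset += 1
            new = [0]  # a match may start at any offset
            for i in range(1, m + 1):
                new.append(min(col[i] + 1, new[i - 1] + 1,
                               col[i - 1] + (pattern[i - 1] != byte)))
            col = new
            if col[m] <= max_l_dist:
                yield offset, col[m]


# A tiny chunk size forces the match to cross several chunk boundaries.
data = io.BytesIO(b'---PATERN---')
print(list(near_match_ends_in_stream(b'PATTERN', data, max_l_dist=1, chunk_size=4)))
# [(9, 1)]
```

Memory use stays proportional to the pattern length regardless of the file size, at the cost of a pure-Python inner loop per byte.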

Examples
--------

fuzzysearch is great for ad-hoc searches of genetic data, such as DNA or protein sequences, before reaching for "heavier", domain-specific tools like BioPython:

.. code:: python

    >>> sequence = '''\
    ... GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
    ... TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
    ... CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
    ... GGGATAGG'''
    >>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance = 1
    >>> find_near_matches(subsequence, sequence, max_l_dist=2)
    [Match(start=3, end=24, dist=1, matched="TAGCACTGTAGGGATAACAAT")]

BioPython sequences are also supported:

.. code:: python

    >>> from Bio.Seq import Seq
    >>> sequence = Seq('''\
    ... GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
    ... TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
    ... CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
    ... GGGATAGG''')
    >>> subsequence = Seq('TGCACTGTAGGGATAACAAT')
    >>> find_near_matches(subsequence, sequence, max_l_dist=2)
    [Match(start=3, end=24, dist=1, matched="TAGCACTGTAGGGATAACAAT")]

Matching Criteria
-----------------

The search function supports four possible match criteria, which may be supplied in any combination:

* maximum Levenshtein distance (``max_l_dist``)

* maximum number of substitutions (``max_substitutions``)

* maximum number of deletions (``max_deletions``; "delete" = skip a character in the sub-sequence)

* maximum number of insertions (``max_insertions``; "insert" = skip a character in the sequence)

Not supplying a criterion means that there is no limit for it. For this reason, one must always supply ``max_l_dist`` and/or all three of the other criteria.
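Because an omitted criterion is unlimited, this requirement is what guarantees a bound on the total number of edits. A minimal sketch of such a check follows. ``resolve_limits`` is a hypothetical helper (fuzzysearch's own argument handling may differ): it treats omitted criteria as unlimited, caps each per-type limit at the overall distance, and raises ``ValueError`` when nothing bounds the total.

```python
def resolve_limits(max_l_dist=None, max_substitutions=None,
                   max_insertions=None, max_deletions=None):
    """Hypothetical helper: return effective (l_dist, subs, ins, dels) limits,
    treating an omitted criterion as unlimited; raise if the total is unbounded."""
    others = (max_substitutions, max_insertions, max_deletions)
    if max_l_dist is None:
        if None in others:
            # Some edit type would be unlimited, so any text could match.
            raise ValueError('supply max_l_dist and/or all of max_substitutions, '
                             'max_insertions and max_deletions')
        # Without max_l_dist, the total number of edits is bounded by the sum.
        max_l_dist = sum(others)
    # A single edit type can never exceed the overall limit on edits.
    subs, ins, dels = (max_l_dist if x is None else min(x, max_l_dist)
                       for x in others)
    return max_l_dist, subs, ins, dels


print(resolve_limits(max_l_dist=1, max_deletions=0))  # (1, 1, 1, 0)
print(resolve_limits(max_substitutions=0, max_insertions=1, max_deletions=1))  # (2, 0, 1, 1)
```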

.. code:: python

    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1, matched="PATERN")]

    >>> # this will not match since max_deletions is set to zero
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
    []

    >>> # note that a deletion + insertion may be combined to match a substitution
    >>> # (the reported Levenshtein distance is still 1)
    >>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
    [Match(start=3, end=10, dist=1, matched="PAT-ERN")]

    >>> # ... but deletion + insertion may also match other, non-substitution differences
    >>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
    [Match(start=3, end=10, dist=2, matched="PATERRN")]
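To make the edit counting in these examples concrete, the sketch below checks whether a pattern can be aligned to one specific candidate string within separate limits on substitutions, deletions (pattern characters skipped) and insertions (candidate characters skipped). It is an exhaustive, memoized search meant only for illustration on short strings; ``within_limits`` is a hypothetical name, and unlike ``find_near_matches()`` it compares whole strings instead of searching inside a longer sequence.

```python
from functools import lru_cache


def within_limits(pattern, candidate, max_subs, max_dels, max_ins):
    """Return True if `pattern` can be aligned to `candidate` using at most
    max_subs substitutions, max_dels deletions and max_ins insertions."""

    @lru_cache(maxsize=None)
    def go(i, j, subs, dels, ins):
        if subs > max_subs or dels > max_dels or ins > max_ins:
            return False
        if i == len(pattern) and j == len(candidate):
            return True
        # match (free) or substitution (costs one)
        if i < len(pattern) and j < len(candidate):
            same = pattern[i] == candidate[j]
            if go(i + 1, j + 1, subs + (not same), dels, ins):
                return True
        # deletion: skip a character of the pattern
        if i < len(pattern) and go(i + 1, j, subs, dels + 1, ins):
            return True
        # insertion: skip a character of the candidate
        return j < len(candidate) and go(i, j + 1, subs, dels, ins + 1)

    return go(0, 0, 0, 0, 0)


# A substitution can be emulated by one deletion plus one insertion ...
print(within_limits('PATTERN', 'PAT-ERN', max_subs=0, max_dels=1, max_ins=1))  # True
# ... and so can some differences that are not substitutions.
print(within_limits('PATTERN', 'PATERRN', max_subs=0, max_dels=1, max_ins=1))  # True
# Matching a shorter candidate is impossible without any deletions.
print(within_limits('PATTERN', 'PATERN', max_subs=1, max_dels=0, max_ins=1))   # False
```

Memoization bounds the work by the number of distinct (position, count) states, which is fine for short strings like these but far slower than a dedicated search algorithm.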

When to Use Other Tools
-----------------------

* Use case: Search through a list of strings for almost-exactly matching strings. For example, searching through a list of names for possible slight variations of a certain name.

  Suggestion: Consider using `fuzzywuzzy <https://github.com/seatgeek/fuzzywuzzy>`_.
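For a rough sense of this whole-string use case, Python's standard library also ships ``difflib.get_close_matches()``, which ranks candidates by a similarity ratio rather than by Levenshtein distance. The names below are made up for illustration and the 0.8 cutoff is an arbitrary choice; fuzzywuzzy offers more scoring variants than this basic approach.

```python
import difflib

names = ['Jonathan', 'Johnathan', 'Jonathon', 'Jon', 'Nathan']

# Keep up to 3 candidates whose similarity ratio to the query is at least 0.8,
# best match first.
print(difflib.get_close_matches('Jonathan', names, n=3, cutoff=0.8))
# ['Jonathan', 'Johnathan', 'Jonathon']
```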
