RobinL / Fuzzymatcher

Licence: MIT
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4

.. image:: https://badge.fury.io/py/fuzzymatcher.svg
    :target: https://badge.fury.io/py/fuzzymatcher

.. image:: https://codecov.io/gh/RobinL/fuzzymatcher/branch/dev/graph/badge.svg
    :target: https://codecov.io/gh/RobinL/fuzzymatcher

fuzzymatcher
============

A Python package that allows the user to fuzzy match two pandas dataframes based on one or more common fields.

Fuzzymatcher uses sqlite3's Full Text Search to find potential matches.

It then uses `probabilistic record linkage <https://en.wikipedia.org/wiki/Record_linkage#Probabilistic_record_linkage>`_ to score matches.

Finally, it outputs a list of the matches it has found and their associated scores.
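To make the first step more concrete, here is a minimal sketch of candidate retrieval with an FTS4 table, using only Python's built-in ``sqlite3`` module. It is an illustration of the general technique, not fuzzymatcher's actual implementation: the table name ``right_fts`` and the helper ``candidates`` are hypothetical, and the scoring step is left out entirely.

.. code:: python

    import sqlite3

    # Sample values taken from the right-hand table in the example further down.
    right_names = [
        "Darlington (B)",
        "Havering London Boro",
        "Sir Fynwy - Monmouthshire",
        "Knowsley District (B)",
        "Charnwood District (B)",
    ]

    # Hypothetical in-memory FTS4 index over the right-hand names.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE VIRTUAL TABLE right_fts USING fts4(os_name)")
    con.executemany(
        "INSERT INTO right_fts (os_name) VALUES (?)",
        [(name,) for name in right_names],
    )

    def candidates(left_name):
        """Return right-hand names sharing at least one token with left_name."""
        # Quote each token so punctuation is not parsed as FTS query syntax.
        tokens = ['"{}"'.format(t.replace('"', "")) for t in left_name.split()]
        if not tokens:
            return []
        rows = con.execute(
            "SELECT os_name FROM right_fts WHERE right_fts MATCH ?",
            (" OR ".join(tokens),),
        )
        return [row[0] for row in rows]

    print(candidates("Monmouthshire"))  # ['Sir Fynwy - Monmouthshire']

A full-text index like this only narrows the search to plausible candidates; deciding which candidate is the best match is the job of the scoring step.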

Installation
------------

``pip install fuzzymatcher``

Note that you will need a build of sqlite which includes FTS4. This seems to be widely included by default, but otherwise see `here <https://www.sqlite.org/fts3.html#compiling_and_enabling_fts3_and_fts4>`_.
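If you are unsure whether your build includes FTS4, one quick way to check is to try creating an FTS4 table in an in-memory database. This is a small sketch using only the standard library; the function name ``sqlite_has_fts4`` is made up for this example.

.. code:: python

    import sqlite3

    def sqlite_has_fts4():
        """Return True if Python's bundled sqlite3 can create an FTS4 table."""
        con = sqlite3.connect(":memory:")
        try:
            con.execute("CREATE VIRTUAL TABLE fts_check USING fts4(content)")
            return True
        except sqlite3.OperationalError:
            # Raised (for example, "no such module: fts4") when FTS4 is not compiled in.
            return False
        finally:
            con.close()

    print(sqlite_has_fts4())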

Usage
-----

See `examples.ipynb <https://github.com/RobinL/fuzzymatcher/blob/master/examples.ipynb>`_ for examples of usage and the output.

You can run these examples interactively `here <https://mybinder.org/v2/gh/RobinL/fuzzymatcher/master?filepath=examples.ipynb>`_.

Simple example
~~~~~~~~~~~~~~

Suppose you have a table called ``df_left`` which looks like this:

====  =============
id    ons_name
====  =============
0     Darlington
1     Monmouthshire
2     Havering
3     Knowsley
4     Charnwood
...   etc.
====  =============

And you want to link it to a table ``df_right`` that looks like this:

====  =========================
id    os_name
====  =========================
0     Darlington (B)
1     Havering London Boro
2     Sir Fynwy - Monmouthshire
3     Knowsley District (B)
4     Charnwood District (B)
...   etc.
====  =========================
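If you want to follow along without the original data, the two tables can be rebuilt as pandas dataframes. This sketch includes only the five rows shown above; the rows elided with "etc." are not reproduced.

.. code:: python

    import pandas as pd

    # Left-hand table: only the rows shown above.
    df_left = pd.DataFrame({
        "id": [0, 1, 2, 3, 4],
        "ons_name": ["Darlington", "Monmouthshire", "Havering", "Knowsley", "Charnwood"],
    })

    # Right-hand table: same places, different naming convention and row order.
    df_right = pd.DataFrame({
        "id": [0, 1, 2, 3, 4],
        "os_name": [
            "Darlington (B)",
            "Havering London Boro",
            "Sir Fynwy - Monmouthshire",
            "Knowsley District (B)",
            "Charnwood District (B)",
        ],
    })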

You can write:

.. code:: python

    import fuzzymatcher

    # Fuzzy left join: each row of df_left is paired with its best match in df_right.
    fuzzymatcher.fuzzy_left_join(df_left, df_right, left_on="ons_name", right_on="os_name")

And you'll get:

==================  =============  =========================
best_match_score    ons_name       os_name
==================  =============  =========================
0.178449            Darlington     Darlington (B)
0.133371            Monmouthshire  Sir Fynwy - Monmouthshire
0.102473            Havering       Havering London Boro
0.155775            Knowsley       Knowsley District (B)
0.155775            Charnwood      Charnwood District (B)
...                 etc.           etc.
==================  =============  =========================
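Because the result is an ordinary dataframe, the usual pandas tools apply to it. As one possible next step, the sketch below sorts the matches so that the lowest-scoring ones come first, which can be handy for manual review. To keep it self-contained, the dataframe ``linked`` is typed in by hand from the rows shown above rather than produced by fuzzymatcher, and the variable names are illustrative.

.. code:: python

    import pandas as pd

    # Hand-copied from the output table above (elided rows omitted).
    linked = pd.DataFrame({
        "best_match_score": [0.178449, 0.133371, 0.102473, 0.155775, 0.155775],
        "ons_name": ["Darlington", "Monmouthshire", "Havering", "Knowsley", "Charnwood"],
        "os_name": [
            "Darlington (B)",
            "Sir Fynwy - Monmouthshire",
            "Havering London Boro",
            "Knowsley District (B)",
            "Charnwood District (B)",
        ],
    })

    # Lowest scores first, so the least certain matches are reviewed first.
    weakest_first = linked.sort_values("best_match_score").reset_index(drop=True)
    print(weakest_first.head())

In this sample the lowest-scoring row (Havering) is still a correct match, so a low score is a prompt to check a row rather than proof that it is wrong.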
