anhaidgroup / Py_entitymatching

Licence: bsd-3-clause

Projects that are alternatives to or similar to Py_entitymatching

Snucse
📓 Happy Campus Life
Stars: ✭ 121 (-0.82%)
Mutual labels:  jupyter-notebook
Ema
Explanatory Model Analysis. Explore, Explain and Examine Predictive Models
Stars: ✭ 121 (-0.82%)
Mutual labels:  jupyter-notebook
Mooc Coursera Advanced Machine Learning
Content from Coursera's ADVANCED MACHINE LEARNING Specialization (Deep Learning, Bayesian Methods, Natural Language Processing, Reinforcement Learning, Computer Vision).
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Sfmlearner
An unsupervised learning framework for depth and ego-motion estimation from monocular videos
Stars: ✭ 1,661 (+1261.48%)
Mutual labels:  jupyter-notebook
Pyross
PyRoss: inference, forecasts, and optimised control of epidemiological models in Python - http://pyross.readthedocs.io
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Practicalsessions
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Tutorials Scikit Learn
Scikit-Learn tutorials
Stars: ✭ 121 (-0.82%)
Mutual labels:  jupyter-notebook
Mxnet realtime multi Person pose estimation
This is a mxnet version of Realtime_Multi-Person_Pose_Estimation, origin code is here https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Bps
Efficient Learning on Point Clouds with Basis Point Sets
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Applied Machine Learning
A step-by-step guide to get started with Applied Machine Learning
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Magface
MagFace: A Universal Representation for Face Recognition and Quality Assessment
Stars: ✭ 117 (-4.1%)
Mutual labels:  jupyter-notebook
Mpss
Modelos Probabilísticos de Señales y Sistemas
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Opencv 3 Computer Vision With Python Cookbook
Published by Packt
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Learn Machine Learning In Two Months
Những kiến thức cần thiết để học tốt Machine Learning trong vòng 2 tháng. Essential Knowledge for learning Machine Learning in two months.
Stars: ✭ 1,726 (+1314.75%)
Mutual labels:  jupyter-notebook
Kaggle
My solution to Web Traffic Predictions competition on Kaggle.
Stars: ✭ 121 (-0.82%)
Mutual labels:  jupyter-notebook
Hermes
Recommender System Framework
Stars: ✭ 121 (-0.82%)
Mutual labels:  jupyter-notebook
Pytorch Dc Tts
Text to Speech with PyTorch (English and Mongolian)
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Captiongen
Generate captions for an image using PyTorch
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook
Disvoice
feature extraction from speech signals
Stars: ✭ 121 (-0.82%)
Mutual labels:  jupyter-notebook
Prototypical Networks Tensorflow
Tensorflow implementation of NIPS 2017 Paper "Prototypical Networks for Few-shot Learning"
Stars: ✭ 122 (+0%)
Mutual labels:  jupyter-notebook

py_entitymatching

This project seeks to build a Python software package for matching entities between two tables using supervised learning. This problem is often referred to as entity matching (EM). Given two tables A and B, the goal of EM is to discover the tuple pairs across the two tables that refer to the same real-world entity. EM involves two main steps: blocking and matching. The blocking step removes obviously non-matching tuple pairs, reducing the set of pairs considered for matching. The matching step then decides which of the remaining pairs are matches.

In practice, entity matching involves many more steps than just blocking and matching. While performing EM, users often execute additional steps such as exploring, cleaning, debugging, sampling, and estimating accuracy. Current EM systems, however, do not cover the entire EM pipeline: they support only a few steps (e.g., blocking, matching) and ignore less well-known yet equally critical steps (e.g., debugging, sampling). This package seeks to support all the steps involved in the EM pipeline.
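The two main steps can be illustrated with a minimal, standard-library-only sketch. It does not use the py_entitymatching API: the function names (`block`, `match`), the toy tables, and the 0.5 threshold are all hypothetical, and a fixed Jaccard-similarity threshold stands in for the supervised classifier that the package would learn from labeled pairs.

```python
from itertools import product


def tokens(text):
    """Lowercase a string and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def block(table_a, table_b, attr):
    """Blocking: keep only pairs that share at least one token in `attr`.

    Pairs with no token in common are treated as obvious non-matches.
    """
    return [
        (a, b)
        for a, b in product(table_a, table_b)
        if tokens(a[attr]) & tokens(b[attr])
    ]


def jaccard(x, y):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def match(candidates, attr, threshold=0.5):
    """Matching: label a candidate pair a match if similarity >= threshold.

    Assumption: this fixed rule is a stand-in for a learned matcher.
    """
    return [
        (a["id"], b["id"])
        for a, b in candidates
        if jaccard(tokens(a[attr]), tokens(b[attr])) >= threshold
    ]


# Hypothetical toy tables A and B.
A = [
    {"id": "a1", "name": "Dave Smith"},
    {"id": "a2", "name": "Joe Wilson"},
]
B = [
    {"id": "b1", "name": "Dave Smith Jr"},
    {"id": "b2", "name": "Joe Miller"},
    {"id": "b3", "name": "Maria Garcia"},
]

candidates = block(A, B, "name")      # 6 possible pairs reduced by blocking
matches = match(candidates, "name")   # surviving pairs scored and labeled
print(len(candidates), matches)
```

On this toy data, blocking keeps two of the six possible pairs, and the matching step accepts only the pair (a1, b1). The pair (a2, b2) survives blocking because both names contain "joe", but its similarity falls below the threshold.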

The package is free, open-source, and BSD-licensed.

Important links

Dependencies

The dependencies required to build the package are:

  • numpy 1.7.0 or higher. Tested on version 1.19.4.
  • pandas (provides data structures to store and manage tables). Tested on version 1.1.4.
  • scikit-learn 0.22 or higher (provides implementations for common machine learning algorithms). Tested on version 0.23.2.
  • joblib (provides multiprocessing capabilities). Tested on version 0.17.0.
  • py_stringsimjoin (provides implementations for string similarity joins). Tested on version 0.3.2.
  • py_stringmatching (provides a set of string tokenizers and string similarity functions). Tested on version 0.4.2.
  • cloudpickle (provides functions to serialize Python constructs). Tested on version 1.6.0.
  • pyprind (library to display progress indicators). Tested on version 2.9.8.
  • pyparsing (library to parse strings). Tested on version 2.4.7.
  • six (provides functions to write compatible code across Python 2 and 3). Tested on version 1.15.0.
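One way to see which of these dependencies are present in the current environment is sketched below. It uses only `importlib.metadata` from the standard library (Python 3.8+). The `DEPENDENCIES` list and the `installed_versions` helper are hypothetical names for this example, and it assumes each package's installed distribution name matches the name used in the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: distribution names match the names in the dependency list.
DEPENDENCIES = [
    "numpy",
    "pandas",
    "scikit-learn",
    "joblib",
    "py_stringsimjoin",
    "py_stringmatching",
    "cloudpickle",
    "pyprind",
    "pyparsing",
    "six",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(DEPENDENCIES)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

The printed versions can then be compared by hand against the minimum and tested versions listed above.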

Platforms

py_entitymatching has been tested on Linux, OS X and Windows.
