alephdata / Fingerprints

Licence: MIT
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.

fingerprints

This library helps with the generation of fingerprints for entity data. A fingerprint in this context is understood as a simplified entity identifier, derived from its name or address and used for cross-referencing entities across different datasets.

Usage

import fingerprints

fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'

fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'

fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
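
To make the idea behind these results more concrete, here is a minimal standard-library sketch of how such a fingerprint could be derived. It is not the library's implementation: the function name `toy_fingerprint` and the `TITLES` set are hypothetical, and the steps (lower-casing, stripping punctuation, dropping honorifics, de-duplicating and sorting tokens) are assumptions inferred from the examples above. It does not handle company legal forms.

```python
import re
import unicodedata

# Hypothetical list of honorifics to drop; the library's own list may differ.
TITLES = {"mr", "mrs", "ms", "dr"}


def toy_fingerprint(name):
    """Return a simplified, order-independent key for a name, or None."""
    # Normalise Unicode compatibility forms and case.
    text = unicodedata.normalize("NFKC", name).lower()
    # Turn every run of non-word characters (punctuation, spaces) into one space.
    text = re.sub(r"\W+", " ", text)
    tokens = [t for t in text.split() if t not in TITLES]
    if not tokens:
        return None
    # De-duplicate and sort so word order and repetition do not matter.
    return " ".join(sorted(set(tokens)))


print(toy_fingerprint("Mr. Sherlock Holmes"))  # holmes sherlock
print(toy_fingerprint("New York, New York"))   # new york
```

Sorting the tokens is what lets 'Holmes, Sherlock' and 'Sherlock Holmes' collide on the same key in this sketch.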

Company type names

A significant part of what fingerprints does is to recognize company legal form names. For example, fingerprints can simplify Общество с ограниченной ответственностью to ООО, or Aktiengesellschaft to AG. The required database is based on two different sources:

Wikipedia also maintains an index of types of business entity.
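
The legal-form simplification described in this section can be sketched with a small lookup table. This is only an illustration under stated assumptions: the `LEGAL_FORMS` mapping and the `replace_legal_forms` helper are hypothetical, contain just the two examples mentioned above, and do not reflect how the library stores or applies its database.

```python
import re

# Hypothetical, hand-written mapping with only the two forms named above;
# the library's actual database is not reproduced here.
LEGAL_FORMS = {
    "Общество с ограниченной ответственностью": "ООО",
    "Aktiengesellschaft": "AG",
}


def replace_legal_forms(name):
    """Replace known long legal-form names with their short form."""
    # Try longer phrases first so a short form never pre-empts a longer one.
    for long_form in sorted(LEGAL_FORMS, key=len, reverse=True):
        # Match the phrase only as whole words, ignoring case.
        pattern = r"(?<!\w)" + re.escape(long_form) + r"(?!\w)"
        name = re.sub(pattern, LEGAL_FORMS[long_form], name, flags=re.IGNORECASE)
    return name


print(replace_legal_forms("Siemens Aktiengesellschaft"))  # Siemens AG
```

A step like this would run before tokenising, so that multi-word legal forms are collapsed into a single token.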

See also

  • Clustering in Depth, part of the OpenRefine documentation discussing how to create collisions in data clustering.
  • probablepeople, a parser for Western names made by the brilliant folks at datamade.us.