alephdata / Fingerprints
Licence: MIT
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Stars: ✭ 91
Programming Languages
python
Projects that are alternatives of or similar to Fingerprints
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Stars: ✭ 584 (+541.76%)
Mutual labels: deduplication, clustering
Cop Kmeans
A Python implementation of COP-KMEANS algorithm
Stars: ✭ 88 (-3.3%)
Mutual labels: clustering
Pt Sdae
PyTorch implementation of SDAE (Stacked Denoising AutoEncoder)
Stars: ✭ 72 (-20.88%)
Mutual labels: clustering
Supercluster
A very fast geospatial point clustering library for browsers and Node.
Stars: ✭ 1,246 (+1269.23%)
Mutual labels: clustering
Tgcontest
Telegram Data Clustering contest solution by Mindful Squirrel
Stars: ✭ 74 (-18.68%)
Mutual labels: clustering
Ml
A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (+1295.6%)
Mutual labels: clustering
Vfs495
Validity VFS495 (138a:003f) drivers & utilities for Linux
Stars: ✭ 71 (-21.98%)
Mutual labels: fingerprint
Lda Topic Modeling
A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (+0%)
Mutual labels: clustering
Vxscan
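A comprehensive scanning tool written in Python 3. Its features include liveness checks, sensitive file detection (directory scanning / leaked JS endpoints / leaks in HTML comments), WAF/CDN identification, port scanning, fingerprint/service identification, OS identification, PoC scanning, SQL injection, CDN bypass and lookup of other sites hosted on the same server. It is mainly intended for in-house self-testing or authorized testing by contracted testers; do not use it to cause damage.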
Stars: ✭ 1,244 (+1267.03%)
Mutual labels: fingerprint
React Native Fingerprint Identify
Awesome Fingerprint Identify for react-native (android only)
Stars: ✭ 81 (-10.99%)
Mutual labels: fingerprint
Libcluster
Automatic cluster formation/healing for Elixir applications
Stars: ✭ 1,280 (+1306.59%)
Mutual labels: clustering
Self Supervised Learning Overview
📜 Self-Supervised Learning from Images: Up-to-date reading list.
Stars: ✭ 73 (-19.78%)
Mutual labels: clustering
Swarm
A robust and fast clustering method for amplicon-based studies
Stars: ✭ 88 (-3.3%)
Mutual labels: clustering
Slash Framework
Provides both a low-level implementation of component-based entity systems and Unity3D integration for them.
Stars: ✭ 71 (-21.98%)
Mutual labels: entity
Icellr
Single (i) Cell R package (iCellR) is an interactive R package to work with high-throughput single cell sequencing technologies (e.g. scRNA-seq, scVDJ-seq, ST and CITE-seq).
Stars: ✭ 80 (-12.09%)
Mutual labels: clustering
Stringlifier
Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, for detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.
Stars: ✭ 85 (-6.59%)
Mutual labels: clustering
Refinr
Cluster and merge similar char values: an R implementation of OpenRefine clustering algorithms
Stars: ✭ 91 (+0%)
Mutual labels: clustering
Excelcy
Excel Integration with spaCy. Training NER using Excel/XLSX from PDF, DOCX, PPT, PNG or JPG.
Stars: ✭ 89 (-2.2%)
Mutual labels: entity
fingerprints
This library helps with the generation of fingerprints for entity data. A fingerprint in this context is understood as a simplified entity identifier, derived from its name or address and used for cross-referencing entities across different datasets.
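To make the idea of a "simplified identifier" concrete, here is a minimal sketch under simplifying assumptions. The function `simple_fingerprint` is hypothetical and is not how `fingerprints` is implemented: it only strips accents and punctuation, lower-cases, and keeps each token once in alphabetical order. It does not drop honorifics or abbreviate company legal forms.

```python
import re
import unicodedata


def simple_fingerprint(name: str) -> str:
    """Hypothetical, simplified name fingerprint (not the library's own code)."""
    # Decompose accented letters and drop the combining marks ("é" -> "e")
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lower-case; replace every run of non-letter, non-digit characters with a space
    text = re.sub(r"[\W_]+", " ", text.lower())
    # Keep each token once, in alphabetical order, so order and repetition don't matter
    return " ".join(sorted(set(text.split())))


print(simple_fingerprint("Holmes, Sherlock"))    # holmes sherlock
print(simple_fingerprint("New York, New York"))  # new york
print(simple_fingerprint("Société Générale"))    # generale societe
```

Because "Sherlock Holmes" and "Holmes, Sherlock" reduce to the same key, records can be joined on that key even when the raw spellings differ.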
Usage
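```python
import fingerprints

# Honorific dropped; tokens lower-cased and sorted alphabetically
fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'
# Legal form name replaced by its abbreviation
fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'
# Repeated tokens collapse into one
fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
```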
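Cross-referencing then amounts to grouping records by fingerprint and treating key collisions as candidate matches. The sketch below assumes a hypothetical stand-in normaliser, `naive_key`, so that it runs without the library installed; `cross_reference` is likewise illustrative and not part of the `fingerprints` API. In real use, `fingerprints.generate` could be passed as the `key` argument instead.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def naive_key(name: str) -> str:
    """Hypothetical stand-in for fingerprints.generate: lower-case,
    drop punctuation, keep unique tokens in sorted order."""
    return " ".join(sorted(set(re.sub(r"[\W_]+", " ", name.lower()).split())))


def cross_reference(
    left: List[str],
    right: List[str],
    key: Callable[[str], str] = naive_key,
) -> List[Tuple[str, str]]:
    """Pair names from two datasets whose fingerprints collide."""
    # Index the right-hand dataset by fingerprint
    index: Dict[str, List[str]] = defaultdict(list)
    for name in right:
        index[key(name)].append(name)
    # Every right-hand name sharing a left-hand name's fingerprint is a candidate match
    matches = []
    for name in left:
        for other in index.get(key(name), []):
            matches.append((name, other))
    return matches


registry = ["ACME Holdings", "Globex Corporation"]
leaks = ["holdings, acme", "Initech"]
print(cross_reference(registry, leaks))  # [('ACME Holdings', 'holdings, acme')]
```

Indexing one side first keeps the lookup roughly linear in the size of both datasets, instead of comparing every pair of names.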
Company type names
A significant part of what fingerprints does is to recognize company legal form names. For example, fingerprints will simplify Общество с ограниченной ответственностью (Russian for "limited liability company") to ООО, or Aktiengesellschaft to AG. The required database is based on two different sources:
- A Google Spreadsheet created by OCCRP.
- The ISO 20275: Entity Legal Forms Code List
Wikipedia also maintains an index of types of business entity.
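The replacement step can be sketched as a whole-word lookup. This is a minimal sketch, assuming a tiny hypothetical table `LEGAL_FORMS` in place of the real database; `replace_legal_forms` is illustrative and not part of the `fingerprints` API.

```python
import re
from typing import Dict

# Hypothetical sample table; the library's real database is far larger and is
# built from the OCCRP spreadsheet and the ISO 20275 code list.
LEGAL_FORMS: Dict[str, str] = {
    "aktiengesellschaft": "ag",
    "gesellschaft mit beschränkter haftung": "gmbh",
    "limited liability company": "llc",
    "общество с ограниченной ответственностью": "ооо",
}

# Try longer names first so a full form wins over any shorter form it contains;
# \b anchors (Unicode-aware for str patterns) stop matches inside longer words.
_FORMS_RE = re.compile(
    r"\b("
    + "|".join(re.escape(form) for form in sorted(LEGAL_FORMS, key=len, reverse=True))
    + r")\b"
)


def replace_legal_forms(name: str) -> str:
    """Lower-case a company name and abbreviate any known legal form in it."""
    text = " ".join(name.lower().split())  # lower-case and collapse runs of whitespace
    return _FORMS_RE.sub(lambda m: LEGAL_FORMS[m.group(1)], text)


print(replace_legal_forms("Siemens Aktiengesellschaft"))  # siemens ag
print(replace_legal_forms("Общество с ограниченной ответственностью Ромашка"))  # ооо ромашка
```

Matching whole words only means a compound such as "Aktiengesellschaftsrecht" is left untouched rather than being partly abbreviated.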
See also
- Clustering in Depth, a part of the OpenRefine documentation that discusses how to create key collisions when clustering data.
- probablepeople, a parser for Western names made by the brilliant folks at datamade.us.