
gentom / sentences-similarity-cluster

Licence: MIT License
Calculate the similarity of sentences and cluster the result.

Programming Languages

python
Dockerfile

Projects that are alternatives of or similar to sentences-similarity-cluster

Ncar Python Tutorial
Numerical & Scientific Computing with Python Tutorial
Stars: ✭ 50 (+257.14%)
Mutual labels:  scipy, matplotlib
Cheatsheets Ai
Essential Cheat Sheets for deep learning and machine learning researchers https://medium.com/@kailashahirwar/essential-cheat-sheets-for-machine-learning-and-deep-learning-researchers-efb6a8ebd2e5
Stars: ✭ 14,095 (+100578.57%)
Mutual labels:  scipy, matplotlib
Ml Cheatsheet
A constantly updated python machine learning cheatsheet
Stars: ✭ 136 (+871.43%)
Mutual labels:  scipy, matplotlib
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+157385.71%)
Mutual labels:  scipy, matplotlib
scipy-crash-course
Material for a 24 hours course on Scientific Python
Stars: ✭ 98 (+600%)
Mutual labels:  scipy, matplotlib
Notes Python
Chinese-language Python notes
Stars: ✭ 6,127 (+43664.29%)
Mutual labels:  scipy, matplotlib
Tftb
A Python module for time-frequency analysis
Stars: ✭ 185 (+1221.43%)
Mutual labels:  scipy, matplotlib
Audio Spectrum Analyzer In Python
A series of Jupyter notebooks and python files which stream audio from a microphone using pyaudio, then processes it.
Stars: ✭ 273 (+1850%)
Mutual labels:  scipy, matplotlib
jupyter boilerplate
Adds a customizable menu item to Jupyter (IPython) notebooks to insert boilerplate snippets of code
Stars: ✭ 69 (+392.86%)
Mutual labels:  scipy, matplotlib
CNCC-2019
Computational Neuroscience Crash Course (CNCC 2019)
Stars: ✭ 26 (+85.71%)
Mutual labels:  scipy, matplotlib
Scipy-Bordeaux-2017
Course taught at the University of Bordeaux in the academic year 2017 for PhD students.
Stars: ✭ 16 (+14.29%)
Mutual labels:  scipy, matplotlib
anesthetic
Nested Sampling post-processing and plotting
Stars: ✭ 34 (+142.86%)
Mutual labels:  scipy, matplotlib
Stats Maths With Python
General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python
Stars: ✭ 381 (+2621.43%)
Mutual labels:  scipy, matplotlib
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+56778.57%)
Mutual labels:  scipy, matplotlib
Scipy Lecture Notes Zh Cn
Chinese edition of scipy-lecture-notes. The website is offline; updates continue as offline HTML, see the releases.
Stars: ✭ 362 (+2485.71%)
Mutual labels:  scipy, matplotlib
Data Analysis
Mainly a summary of web-scraping and data-analysis projects, plus modeling, machine learning, and model evaluation.
Stars: ✭ 142 (+914.29%)
Mutual labels:  scipy, matplotlib
Python-Matematica
Exploring fundamental aspects of mathematics with Python and Jupyter
Stars: ✭ 41 (+192.86%)
Mutual labels:  scipy, matplotlib
The Elements Of Statistical Learning Notebooks
Jupyter notebooks for summarizing and reproducing the textbook "The Elements of Statistical Learning" 2/E by Hastie, Tibshirani, and Friedman
Stars: ✭ 241 (+1621.43%)
Mutual labels:  scipy, matplotlib
Algorithmic-Trading
Algorithmic trading using machine learning.
Stars: ✭ 102 (+628.57%)
Mutual labels:  scipy, matplotlib
introduction to ml with python
Jupyter notebooks and code for the Korean book "[Revised Edition] Machine Learning Using Python Libraries".
Stars: ✭ 211 (+1407.14%)
Mutual labels:  scipy, matplotlib

sentences-similarity-cluster

sensim_cluster computes the pairwise similarity of text entries read from a file using Levenshtein distance, then applies hierarchical clustering to the result. The clustering result is displayed as a dendrogram.
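
As an illustration of the distance step described above, here is a minimal sketch of Levenshtein distance using the standard dynamic-programming recurrence. The names `levenshtein` and `distance_matrix` are hypothetical and are not taken from sensim_cluster, which may well rely on a library implementation instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def distance_matrix(texts):
    """Build the symmetric matrix of pairwise edit distances (hypothetical helper)."""
    return [[levenshtein(s, t) for t in texts] for s in texts]


# Three of the strings from the dummy data used in the example further down.
texts = ["helloworld", "hallawerld", "helldwoody"]
for row in distance_matrix(texts):
    print(row)
```

Only one row of the table is kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.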

Usage

  1. Prepare your data file
  2. Run the program below
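# -*- coding: utf-8 -*-
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram
from sensim_cluster.sensim_cluster import SensimCluster

# Point SensimCluster at the data file and fetch the record ids.
cluster = SensimCluster('YOUR_DATAFILE_PATH')
ids = cluster.get_ids()

# Run hierarchical clustering; the result is passed to dendrogram() below.
result = cluster.ward()

# Use the last six characters of each id as the leaf label.
mod_ids = [item_id[-6:] for item_id in ids]

# Draw the dendrogram, showing at most the last 100 merged clusters.
r = dendrogram(result, p=100, truncate_mode='lastp', labels=mod_ids, leaf_rotation=90)
print(r['leaves'])  # leaf indices in left-to-right order
print(r['ivl'])     # leaf labels in the same order
plt.ylim(ymin=-10.0)
plt.show()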

Docker-Compose

# build from docker-compose.yml
docker-compose build

# run container "app"
docker-compose run app

# kill container
docker-compose kill

# delete container
docker-compose rm

sentences-similarity-cluster (Old Version)

sim_cluster.py computes the pairwise similarity of text entries read from a file using Levenshtein distance, then applies hierarchical clustering to the result. The clustering result is displayed as a dendrogram.

Usage

1. Prepare your data file
2. Execute
python sim_cluster.py your_file

Example

1. Prepare the data file

./data/dummydata.csv

A,helloworld
B,hallawerld
C,helldwoody
D,hallowarld
E,galloworld
F,herroworld
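
Each line of the data file holds an identifier and a text, separated by a comma. A minimal sketch of reading that layout into two parallel lists is given here; `load_records` is a hypothetical helper, not the loader used by sim_cluster.py, and it assumes the text itself contains no unquoted commas.

```python
import csv
import io


def load_records(lines):
    """Split 'identifier,text' rows into parallel lists of ids and texts."""
    rows = [row for row in csv.reader(lines) if row]  # skip blank lines
    ids = [row[0] for row in rows]
    texts = [row[1] for row in rows]  # assumes exactly one text field per row
    return ids, texts


# In-memory stand-in for ./data/dummydata.csv, so the sketch runs anywhere.
sample = io.StringIO("A,helloworld\nB,hallawerld\nC,helldwoody\n")
ids, texts = load_records(sample)
print(ids)
print(texts)
```

To read a real file, pass an open file object instead, for example `open(path, newline='')`.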

2. Execute

python sim_cluster.py ./data/dummydata.csv

3. Result

result

[['A', 'helloworld'], ['B', 'hallawerld'], ['C', 'helldwoody'], ['D', 'hallowarld'], ['E', 'galloworld'], ['F', 'herroworld']]
['A', 'B', 'C', 'D', 'E', 'F']
['helloworld', 'hallawerld', 'helldwoody', 'hallowarld', 'galloworld', 'herroworld']
n0, n0 : 0
n0, n1 : 3
n0, n2 : 4
n0, n3 : 2
n0, n4 : 2
n0, n5 : 2
n1, n0 : 3
n1, n1 : 0
n1, n2 : 6
n1, n3 : 2
n1, n4 : 3
n1, n5 : 5
n2, n0 : 4
n2, n1 : 6
n2, n2 : 0
n2, n3 : 6
n2, n4 : 6
n2, n5 : 6
n3, n0 : 2
n3, n1 : 2
n3, n2 : 6
n3, n3 : 0
n3, n4 : 2
n3, n5 : 4
n4, n0 : 2
n4, n1 : 3
n4, n2 : 6
n4, n3 : 2
n4, n4 : 0
n4, n5 : 4
n5, n0 : 2
n5, n1 : 5
n5, n2 : 6
n5, n3 : 4
n5, n4 : 4
n5, n5 : 0
-------------------------
matrix: [[0, 3, 4, 2, 2, 2], [3, 0, 6, 2, 3, 5], [4, 6, 0, 6, 6, 6], [2, 2, 6, 0, 2, 4], [2, 3, 6, 2, 0, 4], [2, 5, 6, 4, 4, 0]]
-------------------------
[[  3.           4.           3.           2.        ]
 [  1.           6.           4.2031734    3.        ]
 [  0.           5.           4.89897949   2.        ]
 [  7.           8.           7.57187779   5.        ]
 [  2.           9.          12.05542755   6.        ]]
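
The final block of the output has the shape of a SciPy linkage matrix: each row lists the two clusters being merged, the height of the merge, and the size of the new cluster. Assuming that reading is right, the sketch below expands each row into the original labels it joins. `describe_merges` is a hypothetical helper written for this illustration.

```python
# Rows copied from the output above: (cluster_a, cluster_b, height, size).
linkage_rows = [
    (3, 4, 3.0, 2),
    (1, 6, 4.2031734, 3),
    (0, 5, 4.89897949, 2),
    (7, 8, 7.57187779, 5),
    (2, 9, 12.05542755, 6),
]
labels = ["A", "B", "C", "D", "E", "F"]  # n0..n5 in the output above


def describe_merges(rows, labels):
    """Expand each merge step into the sorted list of original labels it joins."""
    n = len(labels)
    members = {i: [label] for i, label in enumerate(labels)}
    steps = []
    for step, (a, b, height, size) in enumerate(rows):
        merged = sorted(members[int(a)] + members[int(b)])
        if len(merged) != size:  # sanity check against the size column
            raise ValueError(f"row {step}: expected {size} members, got {len(merged)}")
        members[n + step] = merged  # ids >= n refer to the cluster made at row id - n
        steps.append((merged, height))
    return steps


for merged, height in describe_merges(linkage_rows, labels):
    print(f"{height:6.2f}  {' '.join(merged)}")
```

Read this way, D and E merge first, B joins them next, A and F pair up, those two groups combine, and C joins last. The heights are not raw edit distances: D and E are 2 edits apart but merge at height 3.0. One possible explanation, which the source does not confirm, is that the rows of the distance matrix were clustered as feature vectors; the Euclidean distance between rows n3 and n4 of the matrix is exactly 3.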