
Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used to sanitise logs, to detect accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.

Stars: ✭ 85 (-60.28%)

Mutual labels: tf-idf

Polyfuzz

Fuzzy string matching, grouping, and evaluation.

Stars: ✭ 292 (+36.45%)

Mutual labels: tf-idf

Textvec

A text vectorization tool that aims to outperform TF-IDF on classification tasks.

Stars: ✭ 167 (-21.96%)

Mutual labels: tf-idf

Textmining

Python text mining system (Research of Text Mining System)

Stars: ✭ 268 (+25.23%)

Mutual labels: tf-idf

How To Mine Newsfeed Data And Extract Interactive Insights In Python

A practical guide to topic mining and interactive visualizations

Stars: ✭ 61 (-71.5%)

Mutual labels: tf-idf

Textclassification

Several methods for text classification.

Stars: ✭ 180 (-15.89%)

Mutual labels: tf-idf

Vntk

Vietnamese NLP Toolkit for Node

Stars: ✭ 170 (-20.56%)

Mutual labels: tf-idf

Textclustering

Stars: ✭ 89 (-58.41%)

Mutual labels: tf-idf


The simplest TF-IDF library imaginable.

Usage

Add each document, conceptually a two-element list [doc_name, [list_of_words_in_the_document]], by calling add_document(doc_name, list_of_words):

table.add_document("foo", ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"])
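The README leaves the TF side of TF-IDF implicit. The following is a minimal sketch of how the term frequencies of one such word list could be computed, assuming each word's count is normalized by the document's length. `normalized_term_frequencies` is a hypothetical helper, not part of the library:

```python
from collections import Counter


def normalized_term_frequencies(words):
    """Hypothetical helper: map each word to count / total words in the list."""
    if not words:
        return {}
    length = len(words)
    return {word: count / length for word, count in Counter(words).items()}


# The "foo" document from the example: 8 distinct words, each appearing once
foo_tf = normalized_term_frequencies(
    ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"])
print(foo_tf["alpha"])  # 0.125
```

Normalizing by length keeps long documents from outscoring short ones simply because they repeat words more often.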

Call similarities(list_of_words) to get a list of [doc_name, similarity_score] pairs, one for every added document, scored against the given words. Resulting similarities will always be between 0.0 and 1.0, inclusive.

table.similarities(["alpha", "bravo", "charlie"])

So, for example:

from tfidf import TfIdf

table = TfIdf()
table.add_document("foo", ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"])
table.add_document("bar", ["alpha", "bravo", "charlie", "india", "juliet", "kilo"])
table.add_document("baz", ["kilo", "lima", "mike", "november"])

print(table.similarities(["alpha", "bravo", "charlie"]))  # => [['foo', 0.6875], ['bar', 0.75], ['baz', 0.0]]
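The README does not state the scoring formula. One rule that reproduces the three scores above (worked out from the example numbers, not taken from the library's source) sums, over each query word the document contains, the word's normalized frequency in the query plus its normalized frequency in the document, divided by the word's total number of occurrences across all added documents. The following is a self-contained sketch of that assumed rule. `score_documents` and `normalized_counts` are hypothetical names:

```python
from collections import Counter


def normalized_counts(words):
    """Map each word to count / len(words); empty input gives an empty dict."""
    length = len(words)
    return {w: c / length for w, c in Counter(words).items()} if length else {}


def score_documents(documents, query):
    """Hypothetical reconstruction: return [doc_name, score] pairs for a query.

    documents is a list of (doc_name, list_of_words) pairs.
    """
    # Total occurrences of each word across every document (assumed corpus weight)
    corpus_counts = Counter()
    for _, words in documents:
        corpus_counts.update(words)

    query_tf = normalized_counts(query)
    results = []
    for name, words in documents:
        doc_tf = normalized_counts(words)
        score = 0.0
        for word, q in query_tf.items():
            if word in doc_tf:
                # Query and document weights are both divided by the corpus count
                score += (q + doc_tf[word]) / corpus_counts[word]
        results.append([name, score])
    return results


docs = [
    ("foo", ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]),
    ("bar", ["alpha", "bravo", "charlie", "india", "juliet", "kilo"]),
    ("baz", ["kilo", "lima", "mike", "november"]),
]
print(score_documents(docs, ["alpha", "bravo", "charlie"]))
```

For "foo", each query word contributes (1/3 + 1/8) / 2 = 11/48, and three such words give 33/48 = 0.6875. For "bar", each contributes (1/3 + 1/6) / 2 = 1/4, giving 0.75. Dividing by the corpus count acts as a simple, non-logarithmic stand-in for the IDF factor.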

Run the tests

The tests use the standard library's unittest module, so there's no need to install anything. Just run:

$ python test_tfidf.py

Disclaimer

This library is a pretty clean example of how TF-IDF operates. However, it's totally unconcerned with efficiency (it's just an exercise to brush up my Python skills), so you probably don't want to be using it in production. If you're looking for a more heavy-duty Python library to do information retrieval and topic modeling, I'd suggest taking a look at Gensim.


Stars: ✭ 214
