All Projects → stephenhky → Pyshorttextcategorization

stephenhky / Pyshorttextcategorization

Licence: mit
Various Algorithms for Short Text Mining

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pyshorttextcategorization

Lda Topic Modeling
A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (-78.79%)
Mutual labels:  natural-language-processing, text-mining, topic-modeling
Text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (+66.67%)
Mutual labels:  natural-language-processing, text-mining, topic-modeling
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+301.4%)
Mutual labels:  natural-language-processing, text-mining, topic-modeling
How To Mine Newsfeed Data And Extract Interactive Insights In Python
A practical guide to topic mining and interactive visualizations
Stars: ✭ 61 (-85.78%)
Mutual labels:  natural-language-processing, text-mining, topic-modeling
Text mining resources
Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (-16.55%)
Mutual labels:  natural-language-processing, text-mining, topic-modeling
Nsga Ii
an implementation of NSGA-II in java
Stars: ✭ 18 (-95.8%)
Mutual labels:  algorithm, package
Hackerrank
This is the Repository where you can find all the solution of the Problems which you solve on competitive platforms mainly HackerRank and HackerEarth
Stars: ✭ 68 (-84.15%)
Mutual labels:  algorithm, natural-language-processing
text-analysis
Weaving analytical stories from text data
Stars: ✭ 12 (-97.2%)
Mutual labels:  text-mining, topic-modeling
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (-65.73%)
Mutual labels:  text-mining, topic-modeling
Nlp profiler
A simple NLP library allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.
Stars: ✭ 181 (-57.81%)
Mutual labels:  natural-language-processing, text-mining
teanaps
자연어 처리와 텍스트 분석을 위한 오픈소스 파이썬 라이브러리 입니다.
Stars: ✭ 91 (-78.79%)
Mutual labels:  text-mining, topic-modeling
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-93.71%)
Mutual labels:  text-mining, topic-modeling
Rake Nltk
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Stars: ✭ 793 (+84.85%)
Mutual labels:  algorithm, text-mining
Chat
基于自然语言理解与机器学习的聊天机器人,支持多用户并发及自定义多轮对话
Stars: ✭ 516 (+20.28%)
Mutual labels:  algorithm, natural-language-processing
Ml
A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (+196.04%)
Mutual labels:  algorithm, natural-language-processing
Pyss3
A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Stars: ✭ 191 (-55.48%)
Mutual labels:  natural-language-processing, text-mining
JoSH
[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (-87.18%)
Mutual labels:  text-mining, topic-modeling
Nlpython
This repository contains the code related to Natural Language Processing using python scripting language. All the codes are related to my book entitled "Python Natural Language Processing"
Stars: ✭ 265 (-38.23%)
Mutual labels:  natural-language-processing, text-mining
Lda
LDA topic modeling for node.js
Stars: ✭ 262 (-38.93%)
Mutual labels:  natural-language-processing, topic-modeling
2018 Machinelearning Lectures Esa
Machine Learning Lectures at the European Space Agency (ESA) in 2018
Stars: ✭ 280 (-34.73%)
Mutual labels:  text-mining, topic-modeling

Short Text Mining in Python

Build Status GitHub release Documentation Status Updates Python 3

Introduction

This package shorttext is a Python package that facilitates supervised and unsupervised learning for short text categorization. Due to the sparseness of words and the lack of information carried in the short texts themselves, an intermediate representation of the texts and documents are needed before they are put into any classification algorithm. In this package, it facilitates various types of these representations, including topic modeling and word-embedding algorithms.

Since release 1.2.4, it runs on Python 3.8. Since release 1.2.3, support for Python 3.5 was decommissioned. Since release 1.1.7, support for Python 2.7 was decommissioned. Since release 1.0.8, it runs on Python 3.7 with 'TensorFlow' being the backend for keras. Since release 1.0.7, it runs on Python 3.7 as well, but the backend for keras cannot be TensorFlow. Since release 1.0.0, shorttext runs on Python 2.7, 3.5, and 3.6.

Characteristics:

  • example data provided (including subject keywords and NIH RePORT);
  • text preprocessing;
  • pre-trained word-embedding support;
  • gensim topic models (LDA, LSI, Random Projections) and autoencoder;
  • topic model representation supported for supervised learning using scikit-learn;
  • cosine distance classification;
  • neural network classification (including ConvNet, and C-LSTM);
  • maximum entropy classification;
  • metrics of phrases differences, including soft Jaccard score (using Damerau-Levenshtein distance), and Word Mover's distance (WMD);
  • character-level sequence-to-sequence (seq2seq) learning;
  • spell correction;
  • API for word-embedding algorithm for one-time loading; and
  • Sentence encodings and similarities based on BERT.

Documentation

Documentation and tutorials for shorttext can be found here: http://shorttext.rtfd.io/.

See tutorial for how to use the package, and FAQ.

Installation

To install it, in a console, use pip.

>>> pip install -U shorttext

or, if you want the most recent development version on Github, type

>>> pip install -U git+https://github.com/stephenhky/[email protected]

Developers are advised to make sure Keras >=2 be installed. Users are advised to install the backend Tensorflow (preferred) or Theano in advance. It is desirable if Cython has been previously installed too.

See installation guide for more details.

Issues

To report any issues, go to the Issues tab of the Github page and start a thread. It is welcome for developers to submit pull requests on their own to fix any errors.

Contributors

If you would like to contribute, feel free to submit the pull requests. You can talk to me in advance through e-mails or the Issues page.

Useful Links

News

  • 02/11/2021: shorttext 1.4.8 released.
  • 01/11/2021: shorttext 1.4.7 released.
  • 01/03/2021: shorttext 1.4.6 released.
  • 12/28/2020: shorttext 1.4.5 released.
  • 12/24/2020: shorttext 1.4.4 released.
  • 11/10/2020: shorttext 1.4.3 released.
  • 10/18/2020: shorttext 1.4.2 released.
  • 09/23/2020: shorttext 1.4.1 released.
  • 09/02/2020: shorttext 1.4.0 released.
  • 07/23/2020: shorttext 1.3.0 released.
  • 06/05/2020: shorttext 1.2.6 released.
  • 05/20/2020: shorttext 1.2.5 released.
  • 05/13/2020: shorttext 1.2.4 released.
  • 04/28/2020: shorttext 1.2.3 released.
  • 04/07/2020: shorttext 1.2.2 released.
  • 03/23/2020: shorttext 1.2.1 released.
  • 03/21/2020: shorttext 1.2.0 released.
  • 12/01/2019: shorttext 1.1.6 released.
  • 09/24/2019: shorttext 1.1.5 released.
  • 07/20/2019: shorttext 1.1.4 released.
  • 07/07/2019: shorttext 1.1.3 released.
  • 06/05/2019: shorttext 1.1.2 released.
  • 04/23/2019: shorttext 1.1.1 released.
  • 03/03/2019: shorttext 1.1.0 released.
  • 02/14/2019: shorttext 1.0.8 released.
  • 01/30/2019: shorttext 1.0.7 released.
  • 01/29/2019: shorttext 1.0.6 released.
  • 01/13/2019: shorttext 1.0.5 released.
  • 10/03/2018: shorttext 1.0.4 released.
  • 08/06/2018: shorttext 1.0.3 released.
  • 07/24/2018: shorttext 1.0.2 released.
  • 07/17/2018: shorttext 1.0.1 released.
  • 07/14/2018: shorttext 1.0.0 released.
  • 06/18/2018: shorttext 0.7.2 released.
  • 05/30/2018: shorttext 0.7.1 released.
  • 05/17/2018: shorttext 0.7.0 released.
  • 02/27/2018: shorttext 0.6.0 released.
  • 01/19/2018: shorttext 0.5.11 released.
  • 01/15/2018: shorttext 0.5.10 released.
  • 12/14/2017: shorttext 0.5.9 released.
  • 11/08/2017: shorttext 0.5.8 released.
  • 10/27/2017: shorttext 0.5.7 released.
  • 10/17/2017: shorttext 0.5.6 released.
  • 09/28/2017: shorttext 0.5.5 released.
  • 09/08/2017: shorttext 0.5.4 released.
  • 09/02/2017: end of GSoC project. (Report)
  • 08/22/2017: shorttext 0.5.1 released.
  • 07/28/2017: shorttext 0.4.1 released.
  • 07/26/2017: shorttext 0.4.0 released.
  • 06/16/2017: shorttext 0.3.8 released.
  • 06/12/2017: shorttext 0.3.7 released.
  • 06/02/2017: shorttext 0.3.6 released.
  • 05/30/2017: GSoC project (Chinmaya Pancholi, with gensim)
  • 05/16/2017: shorttext 0.3.5 released.
  • 04/27/2017: shorttext 0.3.4 released.
  • 04/19/2017: shorttext 0.3.3 released.
  • 03/28/2017: shorttext 0.3.2 released.
  • 03/14/2017: shorttext 0.3.1 released.
  • 02/23/2017: shorttext 0.2.1 released.
  • 12/21/2016: shorttext 0.2.0 released.
  • 11/25/2016: shorttext 0.1.2 released.
  • 11/21/2016: shorttext 0.1.1 released.

Possible Future Updates

  • [ ] Dividing components to other packages;
  • [ ] More available corpus.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].