Top 152 text-mining open source projects

Aravec
AraVec is an open source project of pre-trained distributed word representations (word embeddings) that aims to provide the Arabic NLP research community with free-to-use, powerful word embedding models.
Qminer
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Shallowlearn
An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText), with some additional exclusive features and a nice API. Written in Python and fully compatible with Scikit-learn.
Pyss3
A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Breadability
A reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the actively maintained alternative)
Nlp profiler
A simple NLP library for profiling datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level, granular statistical information about the text in that column.
Multi rake
Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Tokenizers
Fast, Consistent Tokenization of Natural Language Text
Lazynlp
Library to scrape and clean web pages to create massive datasets.
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Awesome Nlp
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Chemdataextractor
Automatically extract chemical information from scientific documents
Textfeatures
👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️
Xioc
Extract indicators of compromise from text, including "escaped" ones.
Qdap
Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
Hands On Natural Language Processing With Python
This repository is for my Udemy students. It contains all the lecture code along with the files mentioned for reading. Feel free to clone it, and if you run into any problem, just raise a question.
Kate
Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
Datasciencer
A curated list of R tutorials for Data Science, NLP and Machine Learning
Khcoder
KH Coder: for Quantitative Content Analysis or Text Mining
Textcluster
Short text clustering preprocessing module (Short text cluster)
Genius
Easily access song lyrics from Genius in a tibble.
Learning Social Media Analytics With R
This repository contains code and bonus content (added from time to time) for the book "Learning Social Media Analytics with R" by Packt
Lexicon
A data package containing lexicons and dictionaries for text analysis
R Text Data
List of textual data sources to be used for text mining in R
Python nlp tutorial
This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)
Pyphonetics
A Python 3 phonetics library.
Konlpy
Python package for Korean natural language processing.
Pipeit
PipeIt is a text transformation, conversion, cleansing and extraction tool.
Ngram
Fast n-Gram Tokenization
Spark Nkp
Natural Korean Processor for Apache Spark
Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Friend.ly
A social media platform with a friend recommendation engine based on personality trait extraction
Tidytext
Text mining using tidy tools ✨📄✨
Metasra Pipeline
MetaSRA: normalized sample-specific metadata for the Sequence Read Archive
Tidy Text Mining
Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
Nlppln
NLP pipeline software using the Common Workflow Language
Spider
A configurable web spider with an easy-to-use web console
Autophrase
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Rake Nltk
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Nlp In Practice
Starter code for solving real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.
Text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
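Two entries above, Multi rake and Rake Nltk, implement Rapid Automatic Keyword Extraction (RAKE). As a rough illustration of how RAKE scores keywords, here is a minimal pure-Python sketch. It does not reproduce either package's API, and the tiny stopword list is a hypothetical stand-in for the full lists real tools use. Candidate phrases are runs of non-stopwords split at stopwords and punctuation. Each word scores degree / frequency, and a phrase scores the sum of its word scores.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only;
# practical RAKE tools use a full stopword list.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "of", "on", "or", "over", "the", "to", "with"}


def candidate_phrases(text):
    """Split text into candidate phrases: maximal runs of non-stopwords,
    delimited by stopwords and punctuation."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text):
    """Return (phrase, score) pairs, highest RAKE score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs in candidates
    degree = defaultdict(int)  # co-occurrence degree, counting the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # Word score is degree / frequency; a phrase scores the sum over its words.
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    text = ("Compatibility of systems of linear constraints "
            "over the set of natural numbers.")
    for phrase, score in rake(text):
        print(f"{score:.1f}  {phrase}")
```

On the sample sentence, "linear constraints" and "natural numbers" score 4.0 and the single-word phrases "compatibility", "set" and "systems" score 1.0. Multi-word phrases win because their words have a higher degree, which is the behaviour RAKE is designed around.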
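Several entries above (Ngram, Tokenizers, Text2vec) build on tokenizing text into n-grams. As a rough sketch of what word n-gram tokenization means, here is a short Python version. It is not the API of any listed package. The lowercase-and-split-on-whitespace tokenizer is a simplifying assumption, and `word_ngrams` is a hypothetical helper name.

```python
from collections import Counter


def word_ngrams(text, n, sep=" "):
    """Return the contiguous word n-grams of text as joined strings."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokens = text.lower().split()
    # Zipping n staggered views of the token list yields every window of n tokens;
    # zip stops at the shortest view, so no partial windows are produced.
    windows = zip(*(tokens[i:] for i in range(n)))
    return [sep.join(window) for window in windows]


counts = Counter(word_ngrams("a rose is a rose is a rose", 2))
print(counts.most_common(2))  # [('a rose', 3), ('rose is', 2)]
```

When the text has fewer than n tokens the result is simply an empty list. Real tokenizers also handle punctuation, Unicode segmentation and speed, which this sketch ignores.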