Starter code to solve real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.

Stars: ✭ 790 (+324.73%)

Mutual labels: text-mining

Text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

Stars: ✭ 715 (+284.41%)

Mutual labels: text-mining

Unipdf

Golang PDF library for creating and processing PDF files (pure Go)

Stars: ✭ 1,171 (+529.57%)

Mutual labels: text-extraction

Datasciencer

A curated list of R tutorials for Data Science, NLP and Machine Learning

Stars: ✭ 1,727 (+828.49%)

Mutual labels: text-mining

Pyphonetics

A Python 3 phonetics library.

Stars: ✭ 61 (-67.2%)

Mutual labels: text-mining

Awesome Text Classification

Awesome-Text-Classification: projects, papers, and tutorials.

Stars: ✭ 158 (-15.05%)

Mutual labels: text-mining

Applied Text Mining In Python

Repo for Applied Text Mining in Python (Coursera) by the University of Michigan

Stars: ✭ 59 (-68.28%)

Mutual labels: text-mining

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (-34.95%)

Mutual labels: text-mining

Pipeit

PipeIt is a text transformation, conversion, cleansing and extraction tool.

Stars: ✭ 57 (-69.35%)

Mutual labels: text-mining

Tokenizers

Fast, Consistent Tokenization of Natural Language Text

Stars: ✭ 161 (-13.44%)

Mutual labels: text-mining

Ngram

Fast n-Gram Tokenization

Stars: ✭ 55 (-70.43%)

Mutual labels: text-mining

Scattertext

Beautiful visualizations of how language differs among document types.

Stars: ✭ 1,722 (+825.81%)

Mutual labels: text-mining

Tadw

An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).

Stars: ✭ 43 (-76.88%)

Mutual labels: text-mining

Chemdataextractor

Automatically extract chemical information from scientific documents

Stars: ✭ 152 (-18.28%)

Mutual labels: text-mining

Tika Python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Stars: ✭ 997 (+436.02%)

Mutual labels: text-extraction

Textcluster

Short text clustering preprocessing module

Stars: ✭ 115 (-38.17%)

Mutual labels: text-mining

Tidytext

Text mining using tidy tools ✨📄✨

Stars: ✭ 975 (+424.19%)

Mutual labels: text-mining

Nlp profiler

A simple NLP library for profiling datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level, granular statistical information about the text in that column.

Stars: ✭ 181 (-2.69%)

Mutual labels: text-mining

Uc Davis Cs Exams Analysis

📈 Regression and Classification with UC Davis student quiz data and exam data

Stars: ✭ 33 (-82.26%)

Mutual labels: text-mining

Learning Social Media Analytics With R

This repository contains code and bonus content, added from time to time, for the book "Learning Social Media Analytics with R" by Packt

Stars: ✭ 102 (-45.16%)

Mutual labels: text-mining

Nlppln

NLP pipeline software using common workflow language

Stars: ✭ 31 (-83.33%)

Mutual labels: text-mining

Xioc

Extract indicators of compromise from text, including "escaped" ones.

Stars: ✭ 148 (-20.43%)

Mutual labels: text-mining

Text Mining

Text Mining in Python

Stars: ✭ 18 (-90.32%)

Mutual labels: text-mining

Lda Topic Modeling

A PureScript, browser-based implementation of LDA topic modeling.

Stars: ✭ 91 (-51.08%)

Mutual labels: text-mining

Autophrase

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

Stars: ✭ 835 (+348.92%)

Mutual labels: text-mining

Udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Stars: ✭ 160 (-13.98%)

Mutual labels: text-mining

Rake Nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Stars: ✭ 793 (+326.34%)

Mutual labels: text-mining

R Text Data

List of textual data sources to be used for text mining in R

Stars: ✭ 85 (-54.3%)

Mutual labels: text-mining

Image Text Localization Recognition

A general list of resources for image text localization and recognition: a collection of papers and implementations for scene text localization and recognition

Stars: ✭ 788 (+323.66%)

Mutual labels: text-extraction

Hands On Natural Language Processing With Python

This repository is for my Udemy students. It contains all the lecture code along with the files mentioned for reading. Feel free to clone it, and if you have any problems, just raise a question.

Stars: ✭ 146 (-21.51%)

Mutual labels: text-mining

Unidoc

This repository has moved! https://github.com/unidoc/unipdf

Stars: ✭ 694 (+273.12%)

Mutual labels: text-extraction

Php Apache Tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats

Stars: ✭ 76 (-59.14%)

Mutual labels: text-extraction

Texthero

Text preprocessing, representation and visualization from zero to hero.