A simple NLP library for profiling datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level, granular statistical information about the text in that column.

Stars: ✭ 181 (+964.71%)

Mutual labels: text-mining

Udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Stars: ✭ 160 (+841.18%)

Mutual labels: text-mining

text-mined-synthesis public

Code for the text-mined solid-state reactions dataset

Stars: ✭ 46 (+170.59%)

Mutual labels: text-mining

Blue Brain text mining toolbox for semantic search and structured information extraction

Stars: ✭ 26 (+52.94%)

Mutual labels: text-mining

readability

Fast readability scores for text data

Stars: ✭ 22 (+29.41%)

Mutual labels: text-mining

textreadr

Tools to uniformly read in text data including semi-structured transcripts

Stars: ✭ 65 (+282.35%)

Mutual labels: text-mining

Aravec

AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.

Stars: ✭ 239 (+1305.88%)

Mutual labels: text-mining

teanaps

An open-source Python library for natural language processing and text analysis.

Stars: ✭ 91 (+435.29%)

Mutual labels: text-mining

Shallowlearn

An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText), with some additional exclusive features and a nice API. Written in Python and fully compatible with Scikit-learn.

Stars: ✭ 196 (+1052.94%)

Mutual labels: text-mining

neji

Flexible and powerful platform for biomedical information extraction from text

Stars: ✭ 37 (+117.65%)

Mutual labels: text-mining

Breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

Stars: ✭ 186 (+994.12%)

Mutual labels: text-mining

woolly

The Text Mining Elixir

Stars: ✭ 48 (+182.35%)

Mutual labels: text-mining

Tokenizers

Fast, Consistent Tokenization of Natural Language Text

Stars: ✭ 161 (+847.06%)

Mutual labels: text-mining

odinson

Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.

Stars: ✭ 59 (+247.06%)

Mutual labels: text-mining

intertext

Detect and visualize text reuse

Stars: ✭ 97 (+470.59%)

Mutual labels: text-mining

Awesome Nlp

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

Stars: ✭ 12,626 (+74170.59%)

Mutual labels: text-mining

Textfeatures

👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️

Stars: ✭ 148 (+770.59%)

Mutual labels: text-mining

Text-Classification-LSTMs-PyTorch

The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To provide a better understanding of the model, a Tweets dataset provided by Kaggle is used.

Stars: ✭ 45 (+164.71%)

Mutual labels: text-mining

deduce

Deduce: de-identification method for Dutch medical text

Stars: ✭ 40 (+135.29%)

Mutual labels: text-mining

perke

A keyphrase extractor for Persian

Stars: ✭ 60 (+252.94%)

Mutual labels: text-mining

textlearnR

A simple collection of well working NLP models (Keras, H2O, StarSpace) tuned and benchmarked on a variety of datasets.

Stars: ✭ 16 (-5.88%)

Mutual labels: text-mining

Answerable

Recommendation system for Stack Overflow unanswered questions

Stars: ✭ 13 (-23.53%)

Mutual labels: text-mining

malay-dataset

Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html

Stars: ✭ 189 (+1011.76%)

Mutual labels: text-mining

koshort

(deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.

Stars: ✭ 62 (+264.71%)

Mutual labels: text-mining

TabInOut

Framework for information extraction from tables

Stars: ✭ 37 (+117.65%)

Mutual labels: text-mining

clustext

Easy, fast clustering of texts

Stars: ✭ 18 (+5.88%)

Mutual labels: text-mining

coursera-gan-specialization

Programming assignments and quizzes from all courses within the GANs specialization offered by deeplearning.ai

Stars: ✭ 277 (+1529.41%)

Mutual labels: bias

Gwu data mining

Materials for GWU DNSC 6279 and DNSC 6290.

Stars: ✭ 217 (+1176.47%)

Mutual labels: text-mining

extractnet

A Dragnet that also extracts author, headline, date, and keywords from context

Stars: ✭ 52 (+205.88%)

Mutual labels: text-mining

Qminer

Analytic platform for real-time large-scale streams containing structured and unstructured data.

Stars: ✭ 206 (+1111.76%)

Mutual labels: text-mining

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-5.88%)

Mutual labels: text-mining

Fake news detection

Fake News Detection in Python

Stars: ✭ 194 (+1041.18%)

Mutual labels: text-mining

reader

Distant Reader, a tool for using & understanding a corpus

Stars: ✭ 18 (+5.88%)

Mutual labels: text-mining

Hdltex

HDLTex: Hierarchical Deep Learning for Text Classification

Stars: ✭ 191 (+1023.53%)

Mutual labels: text-mining

estratto

Parsing fixed-width file content made easy

Stars: ✭ 12 (-29.41%)

Mutual labels: text-mining

Texthero

Text preprocessing, representation and visualization from zero to hero.

Stars: ✭ 2,407 (+14058.82%)

Mutual labels: text-mining

SEDTWik-Event-Detection-from-Tweets

Segmentation based event detection from Tweets. Published at NAACL SRW 2019

Stars: ✭ 58 (+241.18%)

Mutual labels: text-mining

Multi rake

Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python

Stars: ✭ 162 (+852.94%)

Mutual labels: text-mining

sentometrics

An integrated framework in R for textual sentiment time series aggregation and prediction

Stars: ✭ 77 (+352.94%)

Mutual labels: text-mining

Lazynlp

Library to scrape and clean web pages to create massive datasets.

Stars: ✭ 1,985 (+11576.47%)

Mutual labels: text-mining

rosette-elasticsearch-plugin

Document Enrichment plugin for Elasticsearch

Stars: ✭ 25 (+47.06%)

Mutual labels: text-mining

Awesome Text Classification

Awesome-Text-Classification: projects, papers, tutorials.

Stars: ✭ 158 (+829.41%)

Mutual labels: text-mining

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+4082.35%)

Mutual labels: text-mining

Chemdataextractor

Automatically extract chemical information from scientific documents

Stars: ✭ 152 (+794.12%)

Mutual labels: text-mining

Twitter-Sentiment-Analyzer

Twitter Sentiment Analyzer

Stars: ✭ 13 (-23.53%)

Mutual labels: text-mining

Xioc

Extract indicators of compromise from text, including "escaped" ones.

Stars: ✭ 148 (+770.59%)

Mutual labels: text-mining

BioMedical-NLP-corpus

Biomedical NLP Corpus or Datasets.

Stars: ✭ 44 (+158.82%)

Mutual labels: text-mining

VERSE

Vancouver Event and Relation System for Extraction

Stars: ✭ 13 (-23.53%)

Mutual labels: text-mining

pubcrawl

🍺📖 Convert 'epub' Files to Text (Use https://github.com/ropensci/epubr instead)

Stars: ✭ 22 (+29.41%)

Mutual labels: tidytext

tf-idf-python

Term frequency–inverse document frequency for Chinese novels/documents, implemented in Python.

Stars: ✭ 98 (+476.47%)

Mutual labels: text-mining

Udacity-Data-Analyst-Nanodegree

Repository for the projects needed to complete the Data Analyst Nanodegree.

Stars: ✭ 31 (+82.35%)

Mutual labels: text-mining

