Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

✭ 386

python ui dockerfile search ocr osint search-engine named-entity-recognition annotation text-mining semantic journalism text-analysis search-interface

Rmdl

RMDL: Random Multimodel Deep Learning for Classification

✭ 375

python deep-learning machine-learning tensorflow keras deep-neural-networks convolutional-neural-networks cnn classification image-classification rnn text-classification data-mining recurrent-neural-networks text-mining information-retrieval dnn ensemble-learning

Text mining resources

Resources for learning about Text Mining and Natural Language Processing

✭ 358

machine-learning awesome awesome-list nlp natural-language-processing list text-classification data-mining sentiment-analysis nlp-machine-learning text-mining topic-modeling text-analysis

Artificial Adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

✭ 348

python python3 python2 machine-learning data-science metrics classification text-classification data-mining text text-mining text-processing text-analysis spam

Graphbrain

Language, Knowledge, Cognition

✭ 294

python nlp natural-language-processing artificial-intelligence knowledge-graph text-mining natural-language-understanding knowledge knowledge-base text-analysis

Rplos

R client for the PLoS Journals API

✭ 289

r pdf xml rstats r-package metadata text-mining web-api

Textract

extract text from any document. no muss. no fuss.

✭ 3,165

python HTML Rich Text Format shell Makefile PostScript Dockerfile natural-language-processing data-mining text-mining

2018 Machinelearning Lectures Esa

Machine Learning Lectures at the European Space Agency (ESA) in 2018

✭ 280

jupyter-notebook deep-learning machine-learning neural-network clustering machinelearning anomaly-detection text-mining random-forest linear-regression topic-modeling decision-trees pca tf-idf

Textmining

Python text mining system (Research of Text Mining System)

✭ 268

python text-mining sklearn tf-idf jieba

Nlpython

This repository contains code for Natural Language Processing using the Python scripting language. All of the code accompanies my book, "Python Natural Language Processing".

✭ 265

python2 jupyter-notebook deep-learning natural-language-processing parsing text-mining feature-extraction feature-engineering

awesome-text-summarization

Text summarization starting from scratch.

✭ 86

nlp text-mining deep-learning text-summarization extractive-summarization papers-collection abstractive-summarization sentence-compression

tg crawler

Just a crawler based on tg-cli for Telegram. Deprecated by now, please use telegram-export.

✭ 71

python crawler text-mining telegram telegram-cli telegram-crawler

Text-Analysis

Explains textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.

snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

✭ 56

Jupyter Notebook python nlp workflow machine-learning text-mining analysis tool dataset hetnet snorkel methodology

TwEater

A Python Bot for Scraping Conversations from Twitter

✭ 16

python text-mining twitter spider tweets sentiment-analysis conversations emojis

kwx

BERT, LDA, and TFIDF based keyword extraction in Python

eventextraction

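Chinese compound event extraction that recognizes patterns in text, including conditional events, sequential events, and reversal events; it can be used to analyze the logical structure of a text.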

✭ 17

python nlp text-mining text-analaysis

support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en

elpresidente

🇺🇸 Search and Extract Corpus Elements from 'The American Presidency Project'

✭ 21

r text-mining rstats tidytext executive-orders potus

DaDengAndHisPython

[WeChat official account: 大邓和他的python (Da Deng and His Python)] Python syntax quick start: https://www.bilibili.com/video/av44384851; Python web crawler quick start: https://www.bilibili.com/video/av72010301; contact email: [email protected]

✭ 59

Jupyter Notebook HTML text-mining text-classification text-analysis

ruimtehol

R package to Embed All the Things! using StarSpace

✭ 95

C++ r shell Makefile nlp natural-language-processing text-mining similarity embeddings classification starspace

vor-knowledge-graph

🎓 Open knowledge mining and graph builder

✭ 57

python javascript HTML text-mining graph-database word2vec-model

aera-workshop

This workshop introduces participants to Learning Analytics (LA) and provides a brief overview of LA methodologies, literature, applications, and ethical issues as they relate to STEM education.

✭ 14

HTML javascript machine-learning text-mining learning-analytics network-analysis

named-entity-recognition

Notebooks for teaching Named Entity Recognition at the Cultural Heritage Data School, run by Cambridge Digital Humanities

✭ 18

HTML Jupyter Notebook natural-language-processing text-mining teaching named-entity-recognition digital-humanities jupyter-notebooks beginners intermediate spacy2 cambridge-uni

sensim

Sentence Similarity Estimator (SenSim)

✭ 15

python perl text-mining paper nlu nlp-machine-learning semantic-textual-similarity sentence-similarity-estimator

textstem

Tools for fast text stemming & lemmatization

✭ 36

r text-mining stemming lemmatization

blueprints-text

Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"

✭ 103

Jupyter Notebook HTML TeX python machine-learning natural-language-processing text-mining

advanced-text-mining

Covers natural language processing and text analysis methodologies using the TEANAPS library.

✭ 15

Jupyter Notebook nlp text-mining data-mining text-processing korean-text-processing korean-nlp teanaps

textdigester

TextDigester: document summarization java library

✭ 23

java text-mining maven summarization gate freeling deeplearning4j

ipo-miner

IPO Investment via Text Mining.

✭ 20

HTML Jupyter Notebook python nlp data-science machine-learning text-mining jupyter-notebooks ipo

gofastr

Make a DocumentTermMatrix faster

✭ 19

r text-mining manipulation data-reshaping document-term-matrix

SparseLSH

A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.

✭ 127

python machine-learning text-mining data-mining clustering sparse-matrices

sacred

📖 Sacred texts in R

✭ 19

r data text-mining bible rstats

Guten-gutter

Strips boilerplate from Project Gutenberg text files

✭ 16

python shell sanitization text-mining miscellaneous-utilities

restaurant-finder-featureReviews

Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).

✭ 21