Bigartm - Fast topic modeling platform
Nlp Notebooks - A collection of notebooks for Natural Language Processing from NLP Town
Ldavis - R package for web-based interactive topic model visualization.
Open Semantic Search - Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Rmdl - RMDL: Random Multimodel Deep Learning for Classification
Artificial Adversary - 🗣️ Tool to generate adversarial text examples and test machine learning models against them
Rplos - R client for the PLoS Journals API
Textract - extract text from any document. no muss. no fuss.
Textmining - Python text mining system (Research of Text Mining System)
Nlpython - This repository contains code for Natural Language Processing in Python. All of the code relates to my book, "Python Natural Language Processing".
tg crawler - Just a crawler based on tg-cli for Telegram. Now deprecated; please use telegram-export.
Text-Analysis - Explaining textual analysis tools in Python. Including Preprocessing, Skip Gram (word2vec), and Topic Modelling.
snorkeling - Extracting biomedical relationships from literature with Snorkel 🏊
TwEater - A Python Bot for Scraping Conversations from Twitter
kwx - BERT, LDA, and TFIDF based keyword extraction in Python
support-tickets-classification - This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
elpresidente - 🇺🇸 Search and Extract Corpus Elements from 'The American Presidency Project'
ruimtehol - R package to Embed All the Things! using StarSpace
aera-workshop - This workshop introduces participants to Learning Analytics (LA) and provides a brief overview of LA methodologies, literature, applications, and ethical issues as they relate to STEM education.
named-entity-recognition - Notebooks for teaching Named Entity Recognition at the Cultural Heritage Data School, run by Cambridge Digital Humanities
sensim - Sentence Similarity Estimator (SenSim)
textstem - Tools for fast text stemming & lemmatization
blueprints-text - Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"
textdigester - TextDigester: document summarization Java library
gofastr - Make a DocumentTermMatrix faster
SparseLSH - A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.
Guten-gutter - Strips boilerplate from Project Gutenberg text files
restaurant-finder-featureReviews - Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).
civicmine - Text mining cancer biomarkers for the CIVIC database
lda2vec - Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
learning2hash.github.io - Website for "A survey of learning to hash for Computer Vision" https://learning2hash.github.io
Adjutant - Runs a pubmed query, returns results and allows user to explore high-level structure of returned documents
R.TeMiS - R.TeMiS: R Text Mining Solution
TRUNAJOD2.0 - An easy-to-use library to extract indices from texts.
thrones2vec - Using Word2Vec to explore semantic similarities between the entities of "A Song of Ice and Fire" ("Game of Thrones").
converse - Conversational text Analysis using various NLP techniques
misinfo - 📊 Tools to Perform ‘Misinformation’ Analysis on a Text Corpus (wrapper for methods in https://github.com/PDXBek/Misinformation)
VERSE - Vancouver Event and Relation System for Extraction
reader - Distant Reader, a tool for using & understanding a corpus
TabInOut - Framework for information extraction from tables
neji - Flexible and powerful platform for biomedical information extraction from text
tf-idf-python - Term frequency–inverse document frequency for Chinese novels/documents, implemented in Python.