textlearnR: A simple collection of well-performing NLP models (Keras, H2O, StarSpace), tuned and benchmarked on a variety of datasets.
textreadr: Tools to uniformly read in text data, including semi-structured transcripts
extractnet: A Dragnet that also extracts author, headline, date, and keywords from context
JoSH: [KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
odinson: Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple yet powerful pattern language that can operate over multiple representations of text with a runtime system that operates in near real time.
deduce: Deduce, a de-identification method for Dutch medical text
Search: Blue Brain text mining toolbox for semantic search and structured information extraction
malay-dataset: Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
PubMed-Best-Match: Machine-learning-based pipeline relying on LambdaMART, currently used in PubMed for relevance (Best Match) searches
teanaps: An open-source Python library for natural language processing and text analysis.
iis: Information Inference Service of the OpenAIRE system
TableDisentangler: Functional and structural analysis of tables in research papers (Table disentangling)
estratto: Parsing fixed-width file content made easy
sentometrics: An integrated framework in R for textual sentiment time series aggregation and prediction
trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
crminer: ⛔ ARCHIVED ⛔ Fetch 'Scholarly' Full Text from 'Crossref'
Text-Classification-LSTMs-PyTorch: This repository shows a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, it uses a Tweets dataset provided by Kaggle.
perke: A keyphrase extractor for Persian
palladian: Palladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from the Web.
Answerable: Recommendation system for Stack Overflow unanswered questions
koshort: (deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.