woolly: The Text Mining Elixir
Stars: ✭ 48 (+200%)
textlearnR: A simple collection of well-working NLP models (Keras, H2O, StarSpace) tuned and benchmarked on a variety of datasets.
Stars: ✭ 16 (+0%)
teanaps: An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+468.75%)
clustext: Easy, fast clustering of texts
Stars: ✭ 18 (+12.5%)
neji: Flexible and powerful platform for biomedical information extraction from text
Stars: ✭ 37 (+131.25%)
intertext: Detect and visualize text reuse
Stars: ✭ 97 (+506.25%)
TRUNAJOD2.0: An easy-to-use library to extract indices from texts.
Stars: ✭ 18 (+12.5%)
readability: Fast readability scores for text data
Stars: ✭ 22 (+37.5%)
malay-dataset: Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+1081.25%)
rulr: 📐 Validation and unit conversion errors in TypeScript at compile-time. Started in 2016.
Stars: ✭ 43 (+168.75%)
TableDisentangler: Functional and structural analysis of tables in research papers (Table disentangling)
Stars: ✭ 21 (+31.25%)
Adjutant: Runs a PubMed query, returns the results, and lets the user explore the high-level structure of the returned documents
Stars: ✭ 59 (+268.75%)
tf-idf-python: Term frequency–inverse document frequency for Chinese novels/documents, implemented in Python.
Stars: ✭ 98 (+512.5%)
crminer: ⛔ ARCHIVED ⛔ Fetch 'Scholarly' Full Text from 'Crossref'
Stars: ✭ 17 (+6.25%)
palladian: Palladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from the Web.
Stars: ✭ 32 (+100%)
extractnet: A Dragnet that also extracts author, headline, date, and keywords from context
Stars: ✭ 52 (+225%)
text-analysis: Weaving analytical stories from text data
Stars: ✭ 12 (-25%)
converse: Conversational text analysis using various NLP techniques
Stars: ✭ 147 (+818.75%)
Bluemonday: bluemonday, a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user-generated content of XSS
Stars: ✭ 2,135 (+13243.75%)
Search: Blue Brain text mining toolbox for semantic search and structured information extraction
Stars: ✭ 26 (+62.5%)
filter: ⏳ Provide filtering, sanitizing, and conversion of Golang data.
Stars: ✭ 53 (+231.25%)
reader: Distant Reader, a tool for using & understanding a corpus
Stars: ✭ 18 (+12.5%)
PubMed-Best-Match: Machine-learning-based pipeline relying on LambdaMART, currently used in PubMed for relevance (Best Match) searches
Stars: ✭ 36 (+125%)
learning2hash.github.io: Website for "A survey of learning to hash for Computer Vision" https://learning2hash.github.io
Stars: ✭ 14 (-12.5%)
iis: Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (+0%)
TabInOut: Framework for information extraction from tables
Stars: ✭ 37 (+131.25%)
estratto: Parsing fixed-width file content made easy
Stars: ✭ 12 (-25%)
sentometrics: An integrated framework in R for textual sentiment time series aggregation and prediction
Stars: ✭ 77 (+381.25%)
trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+4343.75%)
R.TeMiS: R Text Mining Solution
Stars: ✭ 21 (+31.25%)
html-sanitizer: HTML sanitizer, written in PHP, aiming to provide XSS-safe markup based on explicitly allowed tags, attributes and values.
Stars: ✭ 18 (+12.5%)
Text-Classification-LSTMs-PyTorch: This repository shows a baseline model for text classification by implementing an LSTM-based model in PyTorch. To make the model easier to understand, it is applied to a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (+181.25%)
TextDatasetCleaner: 🔬 Cleaning datasets of garbage (normalization, preprocessing)
Stars: ✭ 27 (+68.75%)
perke: A keyphrase extractor for Persian
Stars: ✭ 60 (+275%)
textreadr: Tools to uniformly read in text data including semi-structured transcripts
Stars: ✭ 65 (+306.25%)
Answerable: Recommendation system for Stack Overflow unanswered questions
Stars: ✭ 13 (-18.75%)
thrones2vec: Using Word2Vec to explore semantic similarities between the entities of "A Song of Ice and Fire" ("Game of Thrones").
Stars: ✭ 27 (+68.75%)
koshort: (deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.
Stars: ✭ 62 (+287.5%)
JoSH: [KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (+243.75%)
pathvalidate: A Python library to sanitize/validate strings such as filenames and file paths.
Stars: ✭ 139 (+768.75%)
civicmine: Text mining cancer biomarkers for the CIVIC database
Stars: ✭ 19 (+18.75%)
Sanitize: Ruby HTML and CSS sanitizer.
Stars: ✭ 1,940 (+12025%)
odinson: Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple yet powerful pattern language that can operate over multiple representations of text with a runtime system that operates in near real time.
Stars: ✭ 59 (+268.75%)
Govalidator: [Go] Package of validators and sanitizers for strings, numerics, slices and structs
Stars: ✭ 5,163 (+32168.75%)
misinfo: 📊 Tools to Perform ‘Misinformation’ Analysis on a Text Corpus (wrapper for methods in https://github.com/PDXBek/Misinformation)
Stars: ✭ 17 (+6.25%)
restaurant-finder-featureReviews: Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying the market-basket model – Apriori algorithm – and NLP to user reviews).
Stars: ✭ 21 (+31.25%)
lda2vec: Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (+68.75%)
VERSE: Vancouver Event and Relation System for Extraction
Stars: ✭ 13 (-18.75%)
deduce: Deduce, a de-identification method for Dutch medical text
Stars: ✭ 40 (+150%)