elpresidente - 🇺🇸 Search and Extract Corpus Elements from 'The American Presidency Project'
Stars: ✭ 21 (+23.53%)
TableDisentangler - Functional and structural analysis of tables in research papers (Table disentangling)
Stars: ✭ 21 (+23.53%)
text-analysis - Weaving analytical stories from text data
Stars: ✭ 12 (-29.41%)
Pyss3 - A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Stars: ✭ 191 (+1023.53%)
palladian - Palladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from the Web.
Stars: ✭ 32 (+88.24%)
PubMed-Best-Match - Machine-learning-based pipeline relying on LambdaMART, currently used in PubMed for relevance (Best Match) searches
Stars: ✭ 36 (+111.76%)
JoSH - [KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (+223.53%)
Nlp profiler - A simple NLP library that profiles datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level, granular statistical information about the text in that column.
Stars: ✭ 181 (+964.71%)
Udpipe - R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+841.18%)
Search - Blue Brain text mining toolbox for semantic search and structured information extraction
Stars: ✭ 26 (+52.94%)
readability - Fast readability scores for text data
Stars: ✭ 22 (+29.41%)
textreadr - Tools to uniformly read in text data including semi-structured transcripts
Stars: ✭ 65 (+282.35%)
Aravec - AraVec is an open-source project of pre-trained distributed word representations (word embeddings) that aims to provide the Arabic NLP research community with free-to-use, powerful word embedding models.
Stars: ✭ 239 (+1305.88%)
teanaps - An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+435.29%)
Shallowlearn - An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and a nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (+1052.94%)
neji - Flexible and powerful platform for biomedical information extraction from text
Stars: ✭ 37 (+117.65%)
Breadability - Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)
Stars: ✭ 186 (+994.12%)
woolly - The Text Mining Elixir
Stars: ✭ 48 (+182.35%)
Tokenizers - Fast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (+847.06%)
odinson - Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple yet powerful pattern language, which can operate over multiple representations of text, with a runtime system that operates in near real time.
Stars: ✭ 59 (+247.06%)
intertext - Detect and visualize text reuse
Stars: ✭ 97 (+470.59%)
Awesome Nlp - 📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Stars: ✭ 12,626 (+74170.59%)
Textfeatures - 👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️
Stars: ✭ 148 (+770.59%)
Text-Classification-LSTMs-PyTorch - This repository shows a baseline model for text classification, implemented as an LSTM-based model in PyTorch. To give a better understanding of the model, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (+164.71%)
deduce - Deduce: de-identification method for Dutch medical text
Stars: ✭ 40 (+135.29%)
perke - A keyphrase extractor for Persian
Stars: ✭ 60 (+252.94%)
textlearnR - A simple collection of well-performing NLP models (Keras, H2O, StarSpace) tuned and benchmarked on a variety of datasets.
Stars: ✭ 16 (-5.88%)
Answerable - Recommendation system for Stack Overflow unanswered questions
Stars: ✭ 13 (-23.53%)
malay-dataset - Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+1011.76%)
koshort - (deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.
Stars: ✭ 62 (+264.71%)
TabInOut - Framework for information extraction from tables
Stars: ✭ 37 (+117.65%)
clustext - Easy, fast clustering of texts
Stars: ✭ 18 (+5.88%)
coursera-gan-specialization - Programming assignments and quizzes from all courses within the GANs specialization offered by deeplearning.ai
Stars: ✭ 277 (+1529.41%)
Gwu data mining - Materials for GWU DNSC 6279 and DNSC 6290.
Stars: ✭ 217 (+1176.47%)
extractnet - A Dragnet that also extracts author, headline, date, and keywords from context
Stars: ✭ 52 (+205.88%)
Qminer - Analytic platform for real-time large-scale streams containing structured and unstructured data.
Stars: ✭ 206 (+1111.76%)
iis - Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-5.88%)
reader - Distant Reader, a tool for using & understanding a corpus
Stars: ✭ 18 (+5.88%)
Hdltex - HDLTex: Hierarchical Deep Learning for Text Classification
Stars: ✭ 191 (+1023.53%)
estratto - Parsing fixed-width file content made easy
Stars: ✭ 12 (-29.41%)
Texthero - Text preprocessing, representation and visualization from zero to hero.
Stars: ✭ 2,407 (+14058.82%)
Multi rake - Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Stars: ✭ 162 (+852.94%)
sentometrics - An integrated framework in R for textual sentiment time series aggregation and prediction
Stars: ✭ 77 (+352.94%)
Lazynlp - Library to scrape and clean web pages to create massive datasets.
Stars: ✭ 1,985 (+11576.47%)
trafilatura - Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+4082.35%)
Chemdataextractor - Automatically extract chemical information from scientific documents
Stars: ✭ 152 (+794.12%)
Xioc - Extract indicators of compromise from text, including "escaped" ones.
Stars: ✭ 148 (+770.59%)
VERSE - Vancouver Event and Relation System for Extraction
Stars: ✭ 13 (-23.53%)
pubcrawl - 🍺📖 Convert 'epub' Files to Text (Use https://github.com/ropensci/epubr instead)
Stars: ✭ 22 (+29.41%)
tf-idf-python - Term frequency–inverse document frequency for Chinese novels/documents, implemented in Python.
Stars: ✭ 98 (+476.47%)