deduce - Deduce: de-identification method for Dutch medical text
Stars: ✭ 40 (-29.82%)
perke - A keyphrase extractor for Persian
Stars: ✭ 60 (+5.26%)
Text-Analysis - Explains textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.
Stars: ✭ 48 (-15.79%)
Xioc - Extract indicators of compromise from text, including "escaped" ones.
Stars: ✭ 148 (+159.65%)
Text-Classification-LSTMs-PyTorch - The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, it is applied to a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-21.05%)
Cogcomp Nlpy - CogComp's light-weight Python NLP annotators
Stars: ✭ 115 (+101.75%)
estratto - Parsing fixed-width file content made easy
Stars: ✭ 12 (-78.95%)
TRUNAJOD2.0 - An easy-to-use library to extract indices from texts.
Stars: ✭ 18 (-68.42%)
Artificial Adversary - 🗣️ Tool to generate adversarial text examples and test machine learning models against them
Stars: ✭ 348 (+510.53%)
Textcluster - Short text clustering preprocessing module
Stars: ✭ 115 (+101.75%)
teanaps - An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+59.65%)
text-analysis - Weaving analytical stories from text data
Stars: ✭ 12 (-78.95%)
TextDatasetCleaner - 🔬 Cleaning datasets of junk (normalization, preprocessing)
Stars: ✭ 27 (-52.63%)
support-tickets-classification - This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud to classify support tickets automatically. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (+149.12%)
Open Korean Text - Open Korean Text Processor: an open-source Korean text processor
Stars: ✭ 438 (+668.42%)
Ekphrasis - Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).
Stars: ✭ 433 (+659.65%)
Aho Corasick - A fast implementation of Aho-Corasick in Rust.
Stars: ✭ 424 (+643.86%)
Tidytext - Text mining using tidy tools ✨📄✨
Stars: ✭ 975 (+1610.53%)
Chr - 🔤 Lightweight R package for manipulating [string] characters
Stars: ✭ 18 (-68.42%)
Open Semantic Search - Open source research tool to search, browse, analyze and explore large document collections, combining a semantic search engine with an open source text mining and text analytics platform (integrates ETL for document processing, OCR for images and PDF, named entity recognition for persons, organizations and locations, metadata management by thesaurus and ontologies, search user interface and search apps for full-text search, faceted search and knowledge graph)
Stars: ✭ 386 (+577.19%)
Text mining resources - Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+528.07%)
Diff Match Patch - Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Stars: ✭ 4,910 (+8514.04%)
Concise Ipython Notebooks For Deep Learning - IPython notebooks for solving problems like classification, segmentation, and generation using the latest deep learning algorithms on different publicly available text and image datasets.
Stars: ✭ 23 (-59.65%)
Gsoc2018 3gm - 💫 Automated codification of Greek Legislation with NLP
Stars: ✭ 36 (-36.84%)
Autophrase - AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Stars: ✭ 835 (+1364.91%)
Graphbrain - Language, Knowledge, Cognition
Stars: ✭ 294 (+415.79%)
Pynlpl - PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Stars: ✭ 426 (+647.37%)
Bagofconcepts - Python implementation of bag-of-concepts
Stars: ✭ 18 (-68.42%)
Bsed - Simple SQL-like syntax on top of Perl text processing.
Stars: ✭ 414 (+626.32%)
Pyparsing - Python library for creating PEG parsers
Stars: ✭ 1,052 (+1745.61%)
Rmdl - RMDL: Random Multimodel Deep Learning for Classification
Stars: ✭ 375 (+557.89%)
Gohn - Hatena Notation (はてな記法) parser written in Go
Stars: ✭ 17 (-70.18%)
Metasra Pipeline - MetaSRA: normalized sample-specific metadata for the Sequence Read Archive
Stars: ✭ 33 (-42.11%)
Rake Nltk - Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Stars: ✭ 793 (+1291.23%)
Rplos - R client for the PLoS Journals API
Stars: ✭ 289 (+407.02%)
Textract - Extract text from any document. No muss. No fuss.
Stars: ✭ 3,165 (+5452.63%)
Textpipe - Textpipe: clean and extract metadata from text
Stars: ✭ 284 (+398.25%)
Lingua Franca - Mycroft's multilingual text parsing and formatting library
Stars: ✭ 51 (-10.53%)
Qp Trie Rs - An idiomatic and fast QP-trie implementation in pure Rust.
Stars: ✭ 47 (-17.54%)
Nlp In Practice - Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+1285.96%)
Textmining - Python text mining system (Research of Text Mining System)
Stars: ✭ 268 (+370.18%)
Text2vec - Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (+1154.39%)
Nlpython - This repository contains code for Natural Language Processing using the Python scripting language. All the code relates to my book "Python Natural Language Processing".
Stars: ✭ 265 (+364.91%)
ArabicProcessingCog - A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
Stars: ✭ 19 (-66.67%)
Tidy Text Mining - Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
Stars: ✭ 961 (+1585.96%)
Bigartm - Fast topic modeling platform
Stars: ✭ 563 (+887.72%)
tg crawler - Just a crawler based on tg-cli for Telegram. Now deprecated; please use telegram-export.
Stars: ✭ 71 (+24.56%)
Nlp Notebooks - A collection of notebooks for Natural Language Processing from NLP Town
Stars: ✭ 513 (+800%)
daachorse - 🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.
Stars: ✭ 75 (+31.58%)
Tadw - An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-24.56%)
NlpplnNLP pipeline software using common workflow language
Stars: ✭ 31 (-45.61%)