PDFConverter
Best PDF Converter! PDF to any format, pdf2word/excel/xml/html/txt...
Stars: ✭ 94 (+44.62%)
Nlp profiler
A simple NLP library for profiling datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level, granular statistical information about the text in that column.
Stars: ✭ 181 (+178.46%)
malay-dataset
Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+190.77%)
Tokenizers
Fast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (+147.69%)
woolly
The Text Mining Elixir
Stars: ✭ 48 (-26.15%)
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+146.15%)
opentbs
With OpenTBS you can merge OpenOffice, LibreOffice and MS Office documents with PHP using the TinyButStrong template engine. Simply use OpenOffice, LibreOffice or MS Office to edit your templates: DOCX, XLSX, PPTX, ODT, ODS, ODP and other formats. That is the Natural Template philosophy.
Stars: ✭ 48 (-26.15%)
Awesome Nlp
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Stars: ✭ 12,626 (+19324.62%)
Textfeatures
👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️
Stars: ✭ 148 (+127.69%)
eoffice
Export and import graphics and tables to Microsoft Office
Stars: ✭ 19 (-70.77%)
Qdap
Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
Stars: ✭ 146 (+124.62%)
intertext
Detect and visualize text reuse
Stars: ✭ 97 (+49.23%)
Kate
Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
Stars: ✭ 135 (+107.69%)
odinson
Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple yet powerful pattern language that can operate over multiple representations of text with a runtime system that operates in near real time.
Stars: ✭ 59 (-9.23%)
Khcoder
KH Coder: for Quantitative Content Analysis or Text Mining
Stars: ✭ 126 (+93.85%)
my-writing-workflow
Tutorial for converting Markdown files into APA-formatted docs, based on my workflow.
Stars: ✭ 35 (-46.15%)
soldoc
A Solidity documentation generator based on the NatSpec format. 📃 With standalone HTML, PDF, GitBook and docsify output. ✏️ Just plug and play.
Stars: ✭ 54 (-16.92%)
Cogcomp Nlpy
CogComp's light-weight Python NLP annotators
Stars: ✭ 115 (+76.92%)
crminer
⛔ ARCHIVED ⛔ Fetch 'Scholarly' Full Text from 'Crossref'
Stars: ✭ 17 (-73.85%)
Genius
Easily access song lyrics from Genius in a tibble.
Stars: ✭ 111 (+70.77%)
Text predictor
Char-level RNN LSTM text generator 📄
Stars: ✭ 99 (+52.31%)
Text-Classification-LSTMs-PyTorch
This repository shows a baseline model for text classification, implemented as an LSTM-based model in PyTorch. To make the model easier to understand, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-30.77%)
Lexicon
A data package containing lexicons and dictionaries for text analysis
Stars: ✭ 87 (+33.85%)
DocX
Convert NSAttributedString / AttributedString to .docx Word files on iOS and macOS
Stars: ✭ 41 (-36.92%)
Orange3 Text
🍊 📄 Text Mining add-on for Orange3
Stars: ✭ 83 (+27.69%)
perke
A keyphrase extractor for Persian
Stars: ✭ 60 (-7.69%)
Pyphonetics
A Python 3 phonetics library.
Stars: ✭ 61 (-6.15%)
JoSH
[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (-15.38%)
palladian
Palladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from the Web.
Stars: ✭ 32 (-50.77%)
Pipeit
PipeIt is a text transformation, conversion, cleansing and extraction tool.
Stars: ✭ 57 (-12.31%)
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+40%)
Spark Nkp
Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-23.08%)
Friend.ly
A social media platform with a friend recommendation engine based on personality trait extraction
Stars: ✭ 41 (-36.92%)
Search
Blue Brain text mining toolbox for semantic search and structured information extraction
Stars: ✭ 26 (-60%)
Tidytext
Text mining using tidy tools ✨📄✨
Stars: ✭ 975 (+1400%)
koshort
(deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.
Stars: ✭ 62 (-4.62%)
documentspark
💖 DocumentSpark: a simple, secure document viewing server. Converts a document to a picture of its pages for content disarm and reconstruction (CDR). Formerly p2. The CDR solution for the ViewFinder remote browser.
Stars: ✭ 211 (+224.62%)
Nlppln
NLP pipeline software using Common Workflow Language
Stars: ✭ 31 (-52.31%)
docx2csv
Extracts tables from .docx files and saves them as .csv or .xls files
Stars: ✭ 42 (-35.38%)
Autophrase
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Stars: ✭ 835 (+1184.62%)
Aravec
AraVec is an open-source project of pre-trained distributed word representations (word embeddings) that aims to provide the Arabic NLP research community with free-to-use, powerful word embedding models.
Stars: ✭ 239 (+267.69%)
Rake Nltk
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Stars: ✭ 793 (+1120%)
Nlp In Practice
Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+1115.38%)
TableDisentangler
Functional and structural analysis of tables in research papers (Table disentangling)
Stars: ✭ 21 (-67.69%)
Gwu data mining
Materials for GWU DNSC 6279 and DNSC 6290.
Stars: ✭ 217 (+233.85%)
Text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (+1000%)
Bigartm
Fast topic modeling platform
Stars: ✭ 563 (+766.15%)
Nlp Notebooks
A collection of notebooks for Natural Language Processing from NLP Town
Stars: ✭ 513 (+689.23%)
Qminer
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Stars: ✭ 206 (+216.92%)