trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+282.26%)
Genius: Easily access song lyrics from Genius in a tibble.
Stars: ✭ 111 (-40.32%)
Pdfio.jl: PDF Reader Library for Native Julia.
Stars: ✭ 56 (-69.89%)
Metasra Pipeline: MetaSRA: normalized sample-specific metadata for the Sequence Read Archive
Stars: ✭ 33 (-82.26%)
Friend.ly: A social media platform with a friend recommendation engine based on personality trait extraction
Stars: ✭ 41 (-77.96%)
Textfeatures: 👷♂️ A simple package for extracting useful features from character objects 👷♀️
Stars: ✭ 148 (-20.43%)
Lexicon: A data package containing lexicons and dictionaries for text analysis
Stars: ✭ 87 (-53.23%)
Spider: A configurable web spider with an easy-to-use web console
Stars: ✭ 954 (+412.9%)
Articleparse: Heuristic text extraction from news sites in Python3
Stars: ✭ 6 (-96.77%)
Wikipedia ner: 📖 Labeled examples from wiki dumps in Python
Stars: ✭ 61 (-67.2%)
Khcoder: KH Coder: for Quantitative Content Analysis or Text Mining
Stars: ✭ 126 (-32.26%)
Konlpy: Python package for Korean natural language processing.
Stars: ✭ 1,098 (+490.32%)
Awesome Nlp: 📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Stars: ✭ 12,626 (+6688.17%)
Spark Nkp: Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-73.12%)
Cogcomp Nlpy: CogComp's light-weight Python NLP annotators
Stars: ✭ 115 (-38.17%)
Gsoc2018 3gm: 💫 Automated codification of Greek Legislation with NLP
Stars: ✭ 36 (-80.65%)
Lazynlp: Library to scrape and clean web pages to create massive datasets.
Stars: ✭ 1,985 (+967.2%)
Tidy Text Mining: Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
Stars: ✭ 961 (+416.67%)
Text predictor: Char-level RNN LSTM text generator 📄.
Stars: ✭ 99 (-46.77%)
Bagofconcepts: Python implementation of bag-of-concepts
Stars: ✭ 18 (-90.32%)
Qdap: Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
Stars: ✭ 146 (-21.51%)
Orange3 Text: 🍊 📄 Text Mining add-on for Orange3
Stars: ✭ 83 (-55.38%)
Nlp In Practice: Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+324.73%)
Text2vec: Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (+284.41%)
Unipdf: Golang PDF library for creating and processing PDF files (pure go)
Stars: ✭ 1,171 (+529.57%)
Datasciencer: a curated list of R tutorials for Data Science, NLP and Machine Learning
Stars: ✭ 1,727 (+828.49%)
Pyphonetics: A Python 3 phonetics library.
Stars: ✭ 61 (-67.2%)
Pipeit: PipeIt is a text transformation, conversion, cleansing and extraction tool.
Stars: ✭ 57 (-69.35%)
Tokenizers: Fast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (-13.44%)
Ngram: Fast n-Gram Tokenization
Stars: ✭ 55 (-70.43%)
Scattertext: Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+825.81%)
Tadw: An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-76.88%)
Chemdataextractor: Automatically extract chemical information from scientific documents
Stars: ✭ 152 (-18.28%)
Tika Python: Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Stars: ✭ 997 (+436.02%)
Textcluster: Preprocessing module for short text clustering
Stars: ✭ 115 (-38.17%)
Tidytext: Text mining using tidy tools ✨📄✨
Stars: ✭ 975 (+424.19%)
Nlp profiler: A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.
Stars: ✭ 181 (-2.69%)
Learning Social Media Analytics With R: This repository contains code and bonus content, added from time to time, for the book "Learning Social Media Analytics with R" by Packt
Stars: ✭ 102 (-45.16%)
Nlppln: NLP pipeline software using common workflow language
Stars: ✭ 31 (-83.33%)
Xioc: Extract indicators of compromise from text, including "escaped" ones.
Stars: ✭ 148 (-20.43%)
Lda Topic Modeling: A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (-51.08%)
Autophrase: AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Stars: ✭ 835 (+348.92%)
Udpipe: R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (-13.98%)
Rake Nltk: Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Stars: ✭ 793 (+326.34%)
R Text Data: List of textual data sources to be used for text mining in R
Stars: ✭ 85 (-54.3%)
Image Text Localization Recognition: A general list of resources for image text localization and recognition: a collection of papers and implementations for scene text localization and recognition
Stars: ✭ 788 (+323.66%)
Hands On Natural Language Processing With Python: This repository is for my Udemy students. It contains all the lecture code along with the files mentioned for reading. Feel free to clone it, and if you have any problem, just raise a question.
Stars: ✭ 146 (-21.51%)
Unidoc: This repository has moved! https://github.com/unidoc/unipdf
Stars: ✭ 694 (+273.12%)
Php Apache Tika: Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Stars: ✭ 76 (-59.14%)
Texthero: Text preprocessing, representation and visualization from zero to hero.
Stars: ✭ 2,407 (+1194.09%)
Multi rake: Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Stars: ✭ 162 (-12.9%)
Lambda Text Extractor: AWS Lambda functions to extract text from various binary formats.
Stars: ✭ 159 (-14.52%)
Kate: Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
Stars: ✭ 135 (-27.42%)