AraVec is an open-source project of pre-trained distributed word representations (word embeddings) that aims to provide the Arabic NLP research community with free-to-use, powerful word embedding models.
Analytic platform for real-time large-scale streams containing structured and unstructured data.
An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText), with some additional exclusive features and a nice API. Written in Python and fully compatible with Scikit-learn.
A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Reworked parsing library (now a living alternative)
Nlp profiler
A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.
Multi rake
Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Fast, Consistent Tokenization of Natural Language Text
Library to scrape and clean web pages to create massive datasets.
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Awesome Nlp
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Automatically extract chemical information from scientific documents
👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️
Extract indicators of compromise from text, including "escaped" ones.
Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
Hands On Natural Language Processing With Python
This repository is for my Udemy students. It contains all the lecture code along with the files mentioned for reading. Feel free to clone it, and if you have any problems, just raise a question.
Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
A curated list of R tutorials for Data Science, NLP and Machine Learning
KH Coder: for Quantitative Content Analysis or Text Mining
Preprocessing module for short-text clustering
Easily access song lyrics from Genius in a tibble.
Learning Social Media Analytics With R
This repository contains code for the book "Learning Social Media Analytics with R" by Packt, along with bonus content that will be added from time to time
A data package containing lexicons and dictionaries for text analysis
R Text Data
List of textual data sources to be used for text mining in R
Python nlp tutorial
This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)
A Python 3 phonetics library.
Python package for Korean natural language processing.
PipeIt is a text transformation, conversion, cleansing and extraction tool.
Fast n-Gram Tokenization
Spark Nkp
Natural Korean Processor for Apache Spark
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
A social media platform with a friend recommendation engine based on personality trait extraction
Text mining using tidy tools ✨📄✨
Metasra Pipeline
MetaSRA: normalized sample-specific metadata for the Sequence Read Archive
Tidy Text Mining
Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
NLP pipeline software using the Common Workflow Language
A configurable web spider with an easy-to-use web console
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Rake Nltk
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Nlp In Practice
Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
