kavgan / Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790
Projects that are alternatives of or similar to Nlp In Practice
Germanwordembeddings
Toolkit to obtain and preprocess German corpora, train models using word2vec (gensim) and evaluate them with generated test sets
Stars: ✭ 189 (-76.08%)
Mutual labels: jupyter-notebook, natural-language-processing, word2vec, gensim
How To Mine Newsfeed Data And Extract Interactive Insights In Python
A practical guide to topic mining and interactive visualizations
Stars: ✭ 61 (-92.28%)
Mutual labels: natural-language-processing, text-mining, gensim, tf-idf
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+43.29%)
Mutual labels: jupyter-notebook, natural-language-processing, text-classification, gensim
Shallowlearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (-75.19%)
Mutual labels: text-classification, word2vec, text-mining, gensim
Aravec
AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.
Stars: ✭ 239 (-69.75%)
Mutual labels: jupyter-notebook, word2vec, text-mining, gensim
Pythoncode Tutorials
The Python Code Tutorials
Stars: ✭ 544 (-31.14%)
Mutual labels: jupyter-notebook, natural-language-processing, text-classification
Log Anomaly Detector
Log Anomaly Detection - Machine learning to detect abnormal event logs
Stars: ✭ 169 (-78.61%)
Mutual labels: jupyter-notebook, word2vec, gensim
Nlp profiler
A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.
Stars: ✭ 181 (-77.09%)
Mutual labels: jupyter-notebook, natural-language-processing, text-mining
Practical 1
Oxford Deep NLP 2017 course - Practical 1: word2vec
Stars: ✭ 220 (-72.15%)
Mutual labels: jupyter-notebook, natural-language-processing, word2vec
Pytorch Transformers Classification
Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Stars: ✭ 229 (-71.01%)
Mutual labels: jupyter-notebook, natural-language-processing, text-classification
Nlp Notebooks
A collection of notebooks for Natural Language Processing from NLP Town
Stars: ✭ 513 (-35.06%)
Mutual labels: jupyter-notebook, natural-language-processing, text-mining
Awesome Embedding Models
A curated list of awesome embedding models tutorials, projects and communities.
Stars: ✭ 1,486 (+88.1%)
Mutual labels: jupyter-notebook, natural-language-processing, word2vec
Deep Math Machine Learning.ai
A blog that covers machine learning and deep learning algorithms and the math behind them, with machine learning algorithms written from scratch.
Stars: ✭ 173 (-78.1%)
Mutual labels: jupyter-notebook, natural-language-processing, word2vec
Nlp Tutorial
A list of NLP (Natural Language Processing) tutorials
Stars: ✭ 1,188 (+50.38%)
Mutual labels: jupyter-notebook, natural-language-processing, text-classification
Python nlp tutorial
This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)
Stars: ✭ 72 (-90.89%)
Mutual labels: jupyter-notebook, natural-language-processing, text-mining
Text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (-9.49%)
Mutual labels: natural-language-processing, word2vec, text-mining
text-classification-cn
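Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pre-trained models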
Stars: ✭ 81 (-89.75%)
Mutual labels: text-classification, word2vec, tf-idf
Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-96.2%)
Mutual labels: text-classification, word2vec, gensim
Applied Text Mining In Python
Repo for Applied Text Mining in Python (coursera) by University of Michigan
Stars: ✭ 59 (-92.53%)
Mutual labels: jupyter-notebook, text-classification, text-mining
NLP-IN-PRACTICE
Use these NLP, Text Mining and Machine Learning code samples and tools to solve real world text data problems.
Notebooks / Source
Links in the first column take you to the subfolder/repository with the source code.
Task | Related Article | Source Type | Description |
---|---|---|---|
Large Scale Phrase Extraction | phrase2vec article | python script | Extract phrases for large amounts of data using PySpark. Annotate text using these phrases or use the phrases for other downstream tasks. |
Word Cloud for Jupyter Notebook and Python Web Apps | word_cloud article | python script + notebook | Visualize top keywords using word counts or TF-IDF. |
Gensim Word2Vec (with dataset) | word2vec article | notebook | How to work correctly with Word2Vec to get desired results |
Reading files and word count with Spark | spark article | python script | How to read files of different formats using PySpark with a word count example |
Extracting Keywords with TF-IDF and SKLearn (with dataset) | tfidf article | notebook | How to extract interesting keywords from text using TF-IDF and scikit-learn |
Text Preprocessing | text preprocessing article | notebook | A few code snippets on how to perform text preprocessing. Includes stemming, noise removal, lemmatization and stop word removal. |
TFIDFTransformer vs. TFIDFVectorizer | tfidftransformer and tfidfvectorizer usage article | notebook | How to use TfidfTransformer and TfidfVectorizer correctly, how the two differ, and when to use each. |
Accessing Pre-trained Word Embeddings with Gensim | Pre-trained word embeddings article | notebook | How to access pre-trained GloVe and Word2Vec Embeddings using Gensim and an example of how these embeddings can be leveraged for text similarity |
Text Classification in Python (with news dataset) | Text classification with Logistic Regression article | notebook | Get started with text classification. Learn how to build and evaluate a text classifier for news classification using Logistic Regression. |
CountVectorizer Usage Examples | How to Correctly Use CountVectorizer? An In-Depth Look article | notebook | Learn how to maximize the use of CountVectorizer such that you are not just computing counts of words, but also preprocessing your text data appropriately as well as extracting additional features from your text dataset. |
HashingVectorizer Examples | HashingVectorizer Vs. CountVectorizer article | notebook | Learn the differences between HashingVectorizer and CountVectorizer and when to use which. |
CBOW vs. SkipGram | Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI article | notebook | A quick comparison of the three embedding architectures. |
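As a rough illustration of the distinction behind the HashingVectorizer and CountVectorizer rows above, the sketch below contrasts vocabulary-based counting with the hashing trick using only the Python standard library. It is not taken from the repository's notebooks and does not reproduce scikit-learn's implementation: the function names, the whitespace tokenizer, the CRC32 hash, and the bucket count of 8 are all assumptions made for this example.

```python
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def count_vectorize(docs):
    """Vocabulary-based counts: learn a token -> column map, then count."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            # Assign the next free column the first time a token is seen.
            vocab.setdefault(tok, len(vocab))
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(doc)).items():
            row[vocab[tok]] = n
        rows.append(row)
    return vocab, rows


def hash_vectorize(docs, n_buckets=8):
    """Hashing trick: no vocabulary; the column is hash(token) % n_buckets."""
    rows = []
    for doc in docs:
        row = [0] * n_buckets
        for tok in tokenize(doc):
            # crc32 is stable across runs, unlike Python's salted built-in hash().
            row[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1
        rows.append(row)
    return rows


docs = ["the cat sat on the mat", "the dog chased the cat"]
vocab, count_rows = count_vectorize(docs)
hash_rows = hash_vectorize(docs)

print(vocab)       # token -> column mapping that must be stored with the model
print(count_rows)  # one column per distinct token
print(hash_rows)   # fixed width; distinct tokens may collide in a bucket
```

The trade-off this is meant to show: the count-based version has to keep the vocabulary in memory and can map columns back to tokens, while the hashed version has a fixed width and needs no fitting pass, at the cost of possible collisions and no way to recover the token for a given column.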
Notes
- For more articles, please see this list.
- If you would like to receive articles via email, subscribe to my mailing list.
Contact
This repository is maintained by Kavita Ganesan. Connect with me on LinkedIn or Twitter.