Vaibhavs10 / ml-with-text

Licence: other
[Tutorial] Demystifying Natural Language Processing with Python

Programming Languages: Jupyter Notebook, Python

Projects that are alternatives to or similar to ml-with-text

text-classification-cn
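Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models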
Stars: ✭ 81 (+350%)
Mutual labels:  text-classification
classifier multi label
multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification
Stars: ✭ 127 (+605.56%)
Mutual labels:  text-classification
Research-Paper-Categorization
Research paper classification using machine learning and NLP
Stars: ✭ 23 (+27.78%)
Mutual labels:  text-classification
TextCategorization
⚡ Using deep learning (MLP, CNN, Graph CNN) to classify text in TensorFlow.
Stars: ✭ 30 (+66.67%)
Mutual labels:  text-classification
TLA
A comprehensive tool for linguistic analysis of communities
Stars: ✭ 47 (+161.11%)
Mutual labels:  text-classification
trove
Weakly supervised medical named entity classification
Stars: ✭ 55 (+205.56%)
Mutual labels:  text-classification
nlp classification workshop
NLP Classification Workshop
Stars: ✭ 22 (+22.22%)
Mutual labels:  text-classification
Deep-NLP-Resources
Curated list of all NLP Resources
Stars: ✭ 65 (+261.11%)
Mutual labels:  text-classification
alter-nlu
Natural language understanding library for chatbots with intent recognition and entity extraction.
Stars: ✭ 45 (+150%)
Mutual labels:  text-classification
deep-learning
Deep Learning Bootcamp
Stars: ✭ 60 (+233.33%)
Mutual labels:  text-classification
TextFeatureSelection
Python library for feature selection for text features. It has filter method, genetic algorithm and TextFeatureSelectionEnsemble for improving text classification models. Helps improve your machine learning models
Stars: ✭ 42 (+133.33%)
Mutual labels:  text-classification
classifier multi label seq2seq attention
multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification, seq2seq, attention, beam search
Stars: ✭ 26 (+44.44%)
Mutual labels:  text-classification
clfzoo
A deep text classifiers library.
Stars: ✭ 37 (+105.56%)
Mutual labels:  text-classification
awesome-text-classification
Text classification meets word embeddings.
Stars: ✭ 27 (+50%)
Mutual labels:  text-classification
kaggle redefining cancer treatment
Personalized Medicine: Redefining Cancer Treatment with deep learning
Stars: ✭ 21 (+16.67%)
Mutual labels:  text-classification
cnn-text-classification
Text classification with Convolution Neural Networks on Yelp, IMDB & sentence polarity dataset v1.0
Stars: ✭ 108 (+500%)
Mutual labels:  text-classification
QGNN
Quaternion Graph Neural Networks (ACML 2021) (Pytorch and Tensorflow)
Stars: ✭ 31 (+72.22%)
Mutual labels:  text-classification
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (+33.33%)
Mutual labels:  text-classification
Customer-Feedback-Analysis
Multi Class Text (Feedback) Classification using CNN, GRU Network and pre trained Word2Vec embedding, word embeddings on TensorFlow.
Stars: ✭ 18 (+0%)
Mutual labels:  text-classification
Text tone analyzer
A system that analyzes the sentiment of texts and statements.
Stars: ✭ 15 (-16.67%)
Mutual labels:  text-classification

Tutorial: Machine Learning with Text using Python

Round of applause to Kevin Markham and his video tutorials!

Instructor: Vaibhav Srivastav

Description

Although numeric data is easy to work with in Python, most knowledge created by humans is actually raw, unstructured text. By learning how to transform text into data that is usable by machine learning models, you drastically increase the amount of data that your models can learn from. In this tutorial, we'll build and evaluate predictive models from real-world text using scikit-learn.

Objectives

By the end of this tutorial, attendees will be able to confidently build a predictive model from their own text-based data, including feature extraction, model building and model evaluation.

Required Software

Attendees will need to bring a laptop with scikit-learn and pandas (and their dependencies) already installed. Installing the Anaconda distribution of Python is an easy way to accomplish this. Both Python 2 and 3 are welcome.

I will be leading the tutorial using the IPython/Jupyter notebook, and have added a pre-written notebook to this repository. I have also created a Python script that is identical to the notebook, which you can use in the Python environment of your choice.
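
A quick way to confirm the setup before the tutorial is to import both packages and print their versions. This is only a sanity-check sketch; the `versions` dictionary is an illustrative name and is not taken from the tutorial files.

```python
# Sanity check: confirm that scikit-learn and pandas are importable
# and report which versions are installed.
import pandas
import sklearn

# `versions` is an illustrative name, not part of the tutorial files.
versions = {
    "pandas": pandas.__version__,
    "scikit-learn": sklearn.__version__,
}

for package, version in sorted(versions.items()):
    print("{}: {}".format(package, version))
```

If either import fails, the package is not installed in the Python environment you are running.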

Tutorial Files

Prerequisite Knowledge

Attendees of this tutorial should be comfortable working in Python, should understand the basic principles of machine learning, and should have at least basic experience with both pandas and scikit-learn. However, no knowledge of advanced mathematics is required.

Abstract

It can be difficult to figure out how to work with text in scikit-learn, even if you're already comfortable with the scikit-learn API. Many questions immediately come up: Which vectorizer should I use, and why? What's the difference between a "fit" and a "transform"? What's a document-term matrix, and why is it so sparse? Is it okay for my training data to have more features than observations? What's the appropriate machine learning model to use? And so on...
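
As a small illustration of two of these questions, the sketch below uses scikit-learn's CountVectorizer on three made-up sentences. The sentences and variable names are assumptions chosen for illustration, not the tutorial's data. Roughly, "fit" learns the vocabulary from the training text, and "transform" uses that fixed vocabulary to build a document-term matrix, stored as a sparse matrix because most documents contain only a few of the vocabulary's words.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up training documents (illustrative only).
train_docs = ["call you tonight", "Call me a cab", "please call me... PLEASE!"]

vect = CountVectorizer()

# "fit" learns the vocabulary: lowercased tokens of two or more characters.
vect.fit(train_docs)
vocab = sorted(vect.vocabulary_, key=vect.vocabulary_.get)
print(vocab)  # ['cab', 'call', 'me', 'please', 'tonight', 'you']

# "transform" builds the document-term matrix: one row per document,
# one column per vocabulary token, each cell holding a token count.
train_dtm = vect.transform(train_docs)
print(train_dtm.shape)      # (3, 6)
print(train_dtm.toarray())  # dense view, only sensible for tiny examples

# New text is transformed with the SAME fitted vocabulary. Tokens that
# were never seen during fit (here "don") are silently dropped.
new_dtm = vect.transform(["please don't call me"])
print(new_dtm.toarray())    # [[0 1 1 1 0 0]]
```

Only 9 of the 18 cells in this training matrix are non-zero, and the proportion of zeros grows quickly as the vocabulary gets larger, which is why the matrix is kept in a sparse format.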

In this tutorial, we'll answer all of those questions, and more! We'll start by walking through the vectorization process in order to understand the input and output formats. Then we'll read a simple dataset into pandas, and immediately apply what we've learned about vectorization. We'll move on to the model building process, including a discussion of which model is most appropriate for the task. We'll evaluate our model a few different ways, and then examine the model for greater insight into how the text is influencing its predictions. Finally, we'll practice this entire workflow on a new dataset, and end with a discussion of which parts of the process are worth tuning for improved performance.
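
The workflow described above might look roughly like the following sketch. The tiny dataset is invented for illustration and is not the tutorial's dataset, the column names are hypothetical, and Multinomial Naive Bayes is used only as one common choice for word-count features; this description does not say which model the tutorial settles on.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Invented toy data standing in for a real labeled text dataset.
sms = pd.DataFrame({
    "label": ["ham", "spam", "ham", "spam", "ham", "spam", "ham", "spam"],
    "message": [
        "are we still meeting for lunch today",
        "win a free prize now",
        "see you at the meeting tomorrow",
        "free entry to win cash",
        "lunch was great thanks",
        "claim your free cash prize now",
        "can you call me after the meeting",
        "you have won a free prize call now",
    ],
})

# Convert the text labels to numbers for scikit-learn.
sms["label_num"] = sms["label"].map({"ham": 0, "spam": 1})
X = sms["message"]
y = sms["label_num"]

# Split BEFORE vectorizing so the vocabulary is learned from training text only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# Fit the vectorizer on the training text, then reuse it for the test text.
vect = CountVectorizer()
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)

# Train the model and predict the held-out labels.
nb = MultinomialNB()
nb.fit(X_train_dtm, y_train)
y_pred = nb.predict(X_test_dtm)

# Evaluate the model in two different ways.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(accuracy)
print(cm)
```

With only eight messages the scores are not meaningful; the point is the order of the steps, in particular that the vectorizer is fitted on the training split only.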

Detailed Outline

  1. Model building in scikit-learn (refresher)
  2. Representing text as numerical data
  3. Reading a text-based dataset into pandas
  4. Vectorizing our dataset
  5. Building and evaluating a model
  6. Comparing models
  7. Examining a model for further insight
  8. Practicing this workflow on another dataset
  9. Tuning the vectorizer (discussion)
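
Steps 6, 7 and 9 of the outline could be sketched as follows. This is a minimal illustration under assumptions, not the tutorial's own code: the six documents are invented, logistic regression is assumed as the comparison model, and the vectorizer settings are just examples of parameters that can be tuned.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data: 0 = ham, 1 = spam.
docs = [
    "are we still meeting for lunch today",
    "see you at the meeting tomorrow",
    "lunch was great thanks",
    "win a free prize now",
    "free entry to win cash",
    "claim your free cash prize now",
]
labels = [0, 0, 0, 1, 1, 1]

# Step 6: compare models with cross-validation. Putting the vectorizer in
# a pipeline means each fold learns its vocabulary from training text only.
candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
cv_scores = {
    name: cross_val_score(make_pipeline(CountVectorizer(), model), docs, labels, cv=3).mean()
    for name, model in candidates.items()
}
print(cv_scores)

# Step 7: examine a fitted Naive Bayes model. A token with a large
# difference in per-class log probability is a strong signal for spam.
vect = CountVectorizer()
dtm = vect.fit_transform(docs)
tokens = sorted(vect.vocabulary_, key=vect.vocabulary_.get)
nb = MultinomialNB().fit(dtm, labels)
log_ratio = nb.feature_log_prob_[1] - nb.feature_log_prob_[0]
most_spammy = tokens[int(np.argmax(log_ratio))]
print(most_spammy)  # 'free'

# Step 9: tune the vectorizer. Example settings: drop English stop words,
# add two-word phrases, and ignore terms found in fewer than two documents.
tuned = CountVectorizer(stop_words="english", ngram_range=(1, 2), min_df=2)
tuned_dtm = tuned.fit_transform(docs)
print(dtm.shape, tuned_dtm.shape)
```

On real data, each candidate vectorizer setting would be judged by its cross-validated score, not by the size of the resulting vocabulary.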

About the Instructor

Vaibhav Srivastav is a Data Scientist currently working with Deloitte Consulting LLP. He has more than three years of demonstrated experience building large-scale machine learning and natural language processing solutions for Fortune Technology 10 clients.

In his free time, he teaches machine learning and data science to young coders! If Python is what floats your boat, then hit him up on any of the channels below:

Recommended Resources

Text classification:

  • Read Paul Graham's classic post, A Plan for Spam, for an overview of a basic text classification system using a Bayesian approach. (He also wrote a follow-up post about how he improved his spam filter.)
  • Coursera's Natural Language Processing (NLP) course has video lectures on text classification, tokenization, Naive Bayes, and many other fundamental NLP topics. (Here are the slides used in all of the videos.)
  • Automatically Categorizing Yelp Businesses discusses how Yelp uses NLP and scikit-learn to solve the problem of uncategorized businesses.
  • How to Read the Mind of a Supreme Court Justice discusses CourtCast, a machine learning model that predicts the outcome of Supreme Court cases using text-based features only. (The CourtCast creator wrote a post explaining how it works, and the Python code is available on GitHub.)
  • Identifying Humorous Cartoon Captions is a readable paper about identifying funny captions submitted to the New Yorker Caption Contest.
  • In this PyData video (50 minutes), Facebook explains how they use scikit-learn for sentiment classification by training a Naive Bayes model on emoji-labeled data.

Naive Bayes and logistic regression:
