hervenivon / aws-experiments-comprehend-custom-classifier

Licence: other
How to train a custom NLP classifier with AWS Comprehend?

Amazon Comprehend Experiment

Purpose 🎯

This repository provides resources to quickly analyze text and build a custom text classifier able to assign a specific class to a given text. It relates to the NLP (Natural Language Processing) field.

AWS Services ☁️

This repository explores Amazon Comprehend, a natural language processing (NLP) service that uses machine learning (ML) to find insights and relationships in texts. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; and understands how positive or negative the text is. For more information about everything Amazon Comprehend can do, see Amazon Comprehend Features.
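As an illustration of the built-in features listed above, here is a minimal sketch that chains four Amazon Comprehend detection calls through boto3. The helper name `analyze_text` is hypothetical and is not taken from the notebook; the client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def analyze_text(comprehend: Any, text: str) -> Dict[str, Any]:
    """Run Amazon Comprehend's built-in detectors on a single text.

    `comprehend` is expected to behave like boto3.client("comprehend").
    """
    # Detect the dominant language first: the other calls require a language code.
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    language_code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]

    # Note: these three calls only support a subset of the detectable languages.
    key_phrases = comprehend.detect_key_phrases(Text=text, LanguageCode=language_code)
    entities = comprehend.detect_entities(Text=text, LanguageCode=language_code)
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)

    return {
        "language": language_code,
        "key_phrases": [phrase["Text"] for phrase in key_phrases["KeyPhrases"]],
        "entities": [(entity["Text"], entity["Type"]) for entity in entities["Entities"]],
        "sentiment": sentiment["Sentiment"],
    }


# Usage against the real service (requires boto3 and configured AWS credentials):
#   import boto3
#   client = boto3.client("comprehend", region_name="us-east-1")
#   print(analyze_text(client, "Amazon Comprehend makes text analysis easy."))
```

Passing the client explicitly keeps the sketch easy to test with a stub and makes the target region an explicit choice of the caller.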

To support these experiments, other AWS services are used:

Amazon S3 to store the dataset for training and asynchronous analysis.

Amazon Simple Storage Service is storage for the Internet. It is designed to make web-scale computing easier for developers. Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web.

Amazon SageMaker Notebook Instances to get an environment integrated with AWS in which to manipulate and explore data:

Amazon SageMaker Notebook Instances provide an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers.

Amazon API Gateway and AWS Lambda to build a serverless API:

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for applications to access data, business logic, or functionality from your backend services.

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume.
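To make the serverless API concrete, here is a minimal sketch of a Lambda handler behind an API Gateway proxy integration that forwards a text to a custom classifier endpoint. It is not the repository's `app.py`: the `ENDPOINT_ARN` environment variable, the `{"text": ...}` request body, and the optional `comprehend` parameter (added for testing) are all assumptions.

```python
import json
import os
from typing import Any, Dict, Optional


def classify(comprehend: Any, endpoint_arn: str, text: str) -> Dict[str, Any]:
    """Return the highest-scoring class from a custom classifier endpoint."""
    response = comprehend.classify_document(Text=text, EndpointArn=endpoint_arn)
    best = max(response["Classes"], key=lambda item: item["Score"])
    return {"label": best["Name"], "score": best["Score"]}


def lambda_handler(event: Dict[str, Any], context: Any,
                   comprehend: Optional[Any] = None) -> Dict[str, Any]:
    """API Gateway proxy handler; expects a JSON body such as {"text": "..."}."""
    try:
        text = json.loads(event.get("body") or "{}")["text"]
    except (KeyError, TypeError, json.JSONDecodeError):
        error = {"error": "expected a JSON body with a 'text' field"}
        return {"statusCode": 400, "body": json.dumps(error)}

    if comprehend is None:
        # Imported lazily so the handler can be unit-tested without boto3.
        import boto3
        comprehend = boto3.client("comprehend")

    # ENDPOINT_ARN is a hypothetical environment variable set at deployment time.
    result = classify(comprehend, os.environ["ENDPOINT_ARN"], text)
    return {"statusCode": 200, "body": json.dumps(result)}
```

Lambda only passes `event` and `context`, so the third parameter stays at its default in production and is used solely to inject a stub client in tests.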

Data and labels 🗄

We use the Yahoo Answers corpus from the paper “Text Understanding from Scratch” by Xiang Zhang and Yann LeCun. This dataset is available on the AWS Open Data Registry.

The guided steps aim to help you train your Custom Classifier on your own dataset. Follow the detailed recommendations; they are there to help you.
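When adapting the steps to your own dataset, the training file for a multi-class Custom Classifier is a CSV with the label in the first column and the document in the second, with no header row. The sketch below shows one way to produce it. It assumes a source CSV whose first column is a class index followed by one or more text columns, and the `LABELS` mapping is purely illustrative; it is not the repository's `prepare_data.py`.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

# Illustrative mapping from class index to label; replace it with your own classes.
LABELS: Dict[str, str] = {"1": "SOCIETY_CULTURE", "2": "SCIENCE_MATHEMATICS"}


def to_training_rows(rows: Iterable[List[str]],
                     labels: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, document) pairs, skipping unknown classes and empty texts."""
    for row in rows:
        if not row or row[0] not in labels:
            continue
        # Join every text column and collapse whitespace (including newlines)
        # so that each document stays on a single line.
        document = " ".join(" ".join(row[1:]).split())
        if document:
            yield labels[row[0]], document


def write_training_csv(pairs: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write 'label,document' lines without a header; return the row count."""
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    count = 0
    for label, document in pairs:
        writer.writerow([label, document])
        count += 1
    return count


# Usage with in-memory data (swap io.StringIO for open(...) on real files):
source = io.StringIO('"1","What is a custom?","A shared\npractice."\n')
target = io.StringIO()
write_training_csv(to_training_rows(csv.reader(source), LABELS), target)
```

Collapsing whitespace is a deliberate choice here: it avoids multi-line documents in the output, at the cost of losing the original paragraph breaks.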

Repository structure 🏗

.
├── README.md                                             <-- This file
├── comprehend.ipynb                                      <-- Jupyter notebook which provides all details to interact with Amazon Comprehend
├── comprehend-experiment-notebook.yml                    <-- CloudFormation template to deploy the notebook instance
├── sam-app                                               <-- To support a real-time analysis API open to others in the Jupyter notebook, this repository also provides a SAM application to quickly deploy this API
│   ├── README.md
│   ├── custom_classifier                                  <-- Lambda function code
│   │   ├── __init__.py
│   │   ├── app.py
│   │   └── requirements.txt
│   ├── events
│   │   └── event.json                                    <-- event to test the API
│   ├── template.yaml                                     <-- AWS SAM template
│   └── tests                                             <-- Unit tests for the Lambda function
├── images                                                <-- Images used in the Jupyter notebook. Some are drawio based and can be edited with xcode + drawio extension
└── command-line-path                                     <-- Directory for creating a Custom Classifier from the AWS CLI
    ├── ComprehendBucketAccessRole-Permissions.json       <-- Permissions for Amazon Comprehend to read the bucket
    ├── ComprehendBucketAccessRole-TrustPolicy.json       <-- Trust Policy for Amazon Comprehend to read the bucket
    ├── README.md                                         <-- Detailed step by step for the command line version
    ├── prepare_data.py                                   <-- Python script for data preparation
    └── requirements.txt                                  <-- Python script dependencies
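The command-line path relies on an IAM role that Amazon Comprehend can assume to read the training bucket. The following sketch shows the general shape of such a trust policy and read permissions, and how a training job could then be started and monitored with boto3. The function names, and the exact statements in the policies, are assumptions and may differ from the JSON files in this directory.

```python
import json
from typing import Any, Dict


def trust_policy() -> Dict[str, Any]:
    """Trust policy letting the Amazon Comprehend service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "comprehend.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def bucket_read_policy(bucket: str) -> Dict[str, Any]:
    """Read-only permissions on one bucket (assumed minimal set for training input)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": [f"arn:aws:s3:::{bucket}/*"]},
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": [f"arn:aws:s3:::{bucket}"]},
        ],
    }


def start_training(comprehend: Any, name: str, role_arn: str,
                   training_s3_uri: str, language_code: str = "en") -> str:
    """Submit a custom classifier training job and return the classifier ARN."""
    response = comprehend.create_document_classifier(
        DocumentClassifierName=name,
        DataAccessRoleArn=role_arn,
        InputDataConfig={"S3Uri": training_s3_uri},
        LanguageCode=language_code,
    )
    return response["DocumentClassifierArn"]


def training_status(comprehend: Any, classifier_arn: str) -> str:
    """Return the job status, for example SUBMITTED, TRAINING, TRAINED or IN_ERROR."""
    response = comprehend.describe_document_classifier(
        DocumentClassifierArn=classifier_arn)
    return response["DocumentClassifierProperties"]["Status"]


# The policy documents serialize to the JSON expected by IAM:
print(json.dumps(trust_policy(), indent=2))
```

Training is asynchronous, so a caller would typically poll `training_status` until it returns `TRAINED` or `IN_ERROR` before creating an endpoint.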

Prerequisites ⚙️

You have an AWS account, and the AWS CLI is installed and configured. You have the proper IAM user and role set up to both create and run a SageMaker notebook instance.

Deploy the SageMaker notebook instance

aws cloudformation deploy --template-file comprehend-experiment-notebook.yml --stack-name comprehend-experiment --capabilities CAPABILITY_IAM --region us-east-1
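Once the deployment command returns, the stack can be inspected programmatically. This is a small sketch using the CloudFormation `describe_stacks` call through boto3; the helper name `stack_summary` is hypothetical, and which outputs (if any) the template defines is not assumed here.

```python
from typing import Any, Dict


def stack_summary(cloudformation: Any, stack_name: str) -> Dict[str, Any]:
    """Return the status and outputs of a CloudFormation stack.

    `cloudformation` is expected to behave like boto3.client("cloudformation").
    """
    stack = cloudformation.describe_stacks(StackName=stack_name)["Stacks"][0]
    # "Outputs" is absent when the template declares none, hence the default.
    outputs = {item["OutputKey"]: item["OutputValue"]
               for item in stack.get("Outputs", [])}
    return {"status": stack["StackStatus"], "outputs": outputs}


# Usage against the stack created above (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("cloudformation", region_name="us-east-1")
#   print(stack_summary(client, "comprehend-experiment"))
```

A status of `CREATE_COMPLETE` or `UPDATE_COMPLETE` indicates that the notebook instance stack deployed successfully.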

Note

Although the comprehend.ipynb notebook has been built to run in an Amazon SageMaker Notebook Instance, you should be able to run it outside of a notebook instance with minimal modifications (updating IAM role definition and installing the necessary libraries).
