IBM / watson-document-classifier

License: Apache-2.0 license
Augment IBM Watson Natural Language Understanding APIs with a configurable mechanism for text classification, using Watson Studio.



Augmented Classification of text with Watson Natural Language Understanding and Watson Studio

Read this in other languages: 한국어.

Data Science Experience is now Watson Studio. Although some images in this code pattern may show the service as Data Science Experience, the steps and processes will still work.

In this code pattern we will use Jupyter notebooks in Watson Studio to augment IBM Watson Natural Language Understanding API output through a configurable mechanism for text classification.

When the reader has completed this code pattern, they will understand how to:

  • Create and run a Jupyter notebook in Watson Studio.
  • Use Object Storage to access data and configuration files.
  • Use IBM Watson Natural Language Understanding API to extract metadata from documents in Jupyter notebooks.
  • Extract and format unstructured data using simplified Python functions.
  • Use a configuration file to build configurable and layered classification grammar.
  • Use the combination of grammatical classification and regex patterns from a configuration file to classify word token classes.
  • Store the processed output JSON in Object Storage.
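The extraction-and-formatting goals above can be sketched in a few lines. The snippet below is an illustration only, not the pattern's actual notebook code: `sample_response` is a hand-written stand-in for a real Watson NLU JSON response, and `extract_metadata` is a hypothetical helper showing how keyword and entity metadata might be flattened into simple token classes.

```python
# Hand-made stand-in for a Watson NLU-style JSON response (not real
# service output) -- only the fields used below are included.
sample_response = {
    "keywords": [
        {"text": "machine learning", "relevance": 0.98},
        {"text": "cloud storage", "relevance": 0.85},
    ],
    "entities": [
        {"text": "IBM", "type": "Company", "relevance": 0.95},
    ],
}

def extract_metadata(response, min_relevance=0.5):
    """Collect keywords and entities above a relevance threshold
    into a simple token -> class mapping."""
    tags = {}
    for kw in response.get("keywords", []):
        if kw["relevance"] >= min_relevance:
            tags[kw["text"]] = "keyword"
    for ent in response.get("entities", []):
        if ent["relevance"] >= min_relevance:
            tags[ent["text"]] = ent["type"]
    return tags

print(extract_metadata(sample_response))
# {'machine learning': 'keyword', 'cloud storage': 'keyword', 'IBM': 'Company'}
```

A real notebook cell would obtain `response` from the NLU service rather than a literal dict; the formatting step is the same either way.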

The intended audience for this code pattern is developers who want to learn a method for augmenting the classification metadata obtained from the Watson Natural Language Understanding API in situations where historical data is scarce and the traditional approach of training a text analytics model yields poorer results than expected. The distinguishing factor of this code pattern is its configurable mechanism for text classification, which gives a developer a head start when working with text from a specialized domain that has no generally available English parser.

Included components

  • IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.

  • IBM Cloud Object Storage: An IBM Cloud service that provides an unstructured cloud data store to build and deliver cost effective apps and services with high reliability and fast speed to market.

  • Watson Natural Language Understanding: An IBM Cloud service that analyzes text to extract metadata from content, such as concepts, entities, keywords, categories, sentiment, emotion, relations, and semantic roles, using natural language understanding.

Featured technologies

  • Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

Watch the Video

Steps

Follow these steps to set up and run this code pattern. The steps are described in detail below.

  1. Sign up for Watson Studio
  2. Create IBM Cloud services
  3. Create the notebook
  4. Add the data and configuration file
  5. Update the notebook with service credentials
  6. Run the notebook
  7. Download the results
  8. Analyze the results

1. Sign up for Watson Studio

Sign up for IBM's Watson Studio. When you create a project in Watson Studio, a free-tier Object Storage service is created in your IBM Cloud account. Take note of your service names, as you will need to select them in the following steps.

Note: When creating your Object Storage service, select the Free storage type in order to avoid having to pay an upgrade fee.

2. Create IBM Cloud services

Create the following IBM Cloud service and name it wdc-NLU-service: Watson Natural Language Understanding.

3. Create the notebook

4. Add the data and configuration file

Add the data and configuration to the notebook

  • From the My Projects > Default page, use Find and Add Data (look for the 10/01 icon) and its Files tab.
  • Click browse and navigate to watson-document-classifier/data/sample_text.txt in this repo.
  • Click browse and navigate to watson-document-classifier/configuration/sample_config.txt in this repo.

Note: It is possible to use your own data and configuration files. If you use a configuration file from your computer, make sure to conform to the JSON structure given in configuration/sample_config.txt.
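The authoritative schema is the one in configuration/sample_config.txt; the snippet below is only an illustrative guess at the kind of two-stage structure described in step 8, expressed as a Python string so the JSON shape can be checked with a plain json.loads. Every key and value here is hypothetical.

```python
import json

# Illustrative only -- match your file to configuration/sample_config.txt,
# not to this guess. The shape below mirrors the Base Tagging / Domain
# Tagging stages described later in this pattern.
config_text = """
{
  "configuration": {
    "classification": {
      "stages": [
        {"name": "Base Tagging",
         "steps": [{"type": "keywords", "keywords": ["invoice", "contract"]},
                   {"type": "regex", "pattern": "[0-9]{4}-[0-9]{2}-[0-9]{2}"}]},
        {"name": "Domain Tagging",
         "steps": [{"type": "keywords", "keywords": ["collateral", "lien"]}]}
      ]
    }
  }
}
"""

config = json.loads(config_text)  # a malformed file raises json.JSONDecodeError
stages = config["configuration"]["classification"]["stages"]
print([s["name"] for s in stages])  # ['Base Tagging', 'Domain Tagging']
```

Running your own configuration file through json.loads before uploading it is a quick way to catch structural mistakes.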

Fix-up file names for your own data and configuration files

If you use your own data and configuration files, you will need to update the variables that refer to the data and configuration files in the Jupyter Notebook.

In the notebook, update the global variables in the cell below the 2.3 Global Variables section.

Replace sampleTextFileName with the name of your data file and sampleConfigFileName with the name of your configuration file.
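After the edit, the cell would look something like the sketch below. The two variable names come from this README; the values shown are the pattern's sample files, to be swapped for your own uploads.

```python
# 2.3 Global Variables -- point these at the files you uploaded.
# The defaults below are the pattern's sample files; replace them
# if you supplied your own data and configuration files.
sampleTextFileName = "sample_text.txt"
sampleConfigFileName = "sample_config.txt"
```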

5. Update the notebook with service credentials

Add the Watson Natural Language Understanding credentials to the notebook

Select the cell below the 2.1 Add your service credentials from IBM Cloud for the Watson services section in the notebook to update the credentials for Watson Natural Language Understanding.

Open the Watson Natural Language Understanding service in your IBM Cloud Dashboard and click on your service, which you should have named wdc-NLU-service.

Once the service is open click the Service Credentials menu on the left.

In the Service Credentials view that opens, select whichever credentials you would like to use in the notebook from the KEY NAME column. Click View credentials and copy the username and password key values that appear in JSON format.

Update the username and password key values in the cell below 2.1 Add your service credentials from IBM Cloud for the Watson services section.
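The edited cell would then resemble the sketch below. The variable names here are illustrative (use whatever names the notebook cell already defines), and the placeholder values must be replaced with the key values copied from the wdc-NLU-service credentials page.

```python
# 2.1 Add your service credentials from IBM Cloud for the Watson services.
# Variable names are illustrative; keep the names already in the notebook.
# Replace the placeholders with the username/password key values copied
# from the wdc-NLU-service Service Credentials page.
nlu_username = "<add_nlu_username>"
nlu_password = "<add_nlu_password>"
```

Note that newer Watson service instances issue an IAM API key instead of a username/password pair; use whichever credential fields your Service Credentials page shows.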

Add the Object Storage credentials to the notebook

  • Select the cell below 2.2 Add your service credentials for Object Storage section in the notebook to update the credentials for Object Store.
  • Delete the contents of the cell
  • Use Find and Add Data (look for the 10/01 icon) and its Files tab. You should see the file names uploaded earlier. Make sure your active cell is the empty one below 2.2 Add...
  • Select Insert to code (below your sample_text.txt).
  • Click Insert Credentials from the drop-down menu.
  • Make sure the credentials are saved as credentials_1.

6. Run the notebook

When a notebook is executed, what is actually happening is that each code cell in the notebook is executed, in order, from top to bottom.

IMPORTANT: The first time you run your notebook, you will need to install the necessary packages in section 1.1 and then Restart the kernel.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

  • A blank, this indicates that the cell has never been executed.
  • A number, this number represents the relative order this code step was executed.
  • A *, this indicates that the cell is currently executing.

There are several ways to execute the code cells in your notebook:

  • One cell at a time.
    • Select the cell, and then press the Play button in the toolbar.
  • Batch mode, in sequential order.
    • From the Cell menu bar, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and then continues executing all cells that follow.
  • At a scheduled time.
    • Press the Schedule button located in the top right section of your notebook panel. Here you can schedule your notebook to be executed once at some future time, or repeatedly at your specified interval.

7. Download the results

  • To see the results, go to Object Storage
  • Click on the name of your object storage
  • Click on the Container with the name you gave your Notebook
  • Select the sample_text_classification.txt file using the select box to the left of the file listing
  • Click the SelectAction button and use the Download File drop-down menu to download the sample_text_classification.txt file.

8. Analyze the results

After running each cell of the notebook under Classify text, the results will display.

The configuration JSON controls the way the text is classified. The classification process is divided into two stages: Base Tagging and Domain Tagging. The Base Tagging stage can be used to specify keyword-based classification, regular-expression-based classification, and tagging based on chunking expressions. The Domain Tagging stage can be used to specify classification that is specific to the domain, in order to augment the results from Watson Natural Language Understanding.

We can modify the configuration JSON to add more keywords or regular expressions, augmenting the text classification without any changes to the code. If required, we can also add more stages to the configuration JSON and enhance the classification results with code modifications.
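A rough sketch of how such configuration-driven tagging might work is shown below. This is not the notebook's actual implementation; the config shape, class labels, and `classify_tokens` helper are all hypothetical, but they illustrate why new keywords or regexes need only a config change.

```python
import re

# Hypothetical two-stage config: each stage maps word tokens to a class
# via keyword lists or regular expressions. All labels are made up.
config = {
    "stages": [
        {"name": "Base Tagging",
         "keywords": {"invoice": "DOCUMENT", "contract": "DOCUMENT"},
         "regex": {r"^\d{4}-\d{2}-\d{2}$": "DATE"}},
        {"name": "Domain Tagging",
         "keywords": {"lien": "LEGAL_TERM"},
         "regex": {}},
    ]
}

def classify_tokens(tokens, config):
    """Return {token: class} using keyword and regex rules from every stage."""
    tags = {}
    for stage in config["stages"]:
        for token in tokens:
            if token in stage["keywords"]:
                tags[token] = stage["keywords"][token]
            else:
                for pattern, label in stage["regex"].items():
                    if re.match(pattern, token):
                        tags[token] = label
    return tags

tokens = ["invoice", "2017-08-15", "lien", "hello"]
print(classify_tokens(tokens, config))
# {'invoice': 'DOCUMENT', '2017-08-15': 'DATE', 'lien': 'LEGAL_TERM'}
```

Adding a keyword or regex to either stage changes the output with no code edits, which is the layered, configurable behavior this pattern describes.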

The classification results show that the keywords and regular expressions specified in the configuration have been correctly classified in the displayed analyzed text.

Other scenarios and use cases for which a solution can be built using the above methodology

See USECASES.md.

Related links

Mine insights from software development artifacts

Get insights on personal finance data

Fingerprinting personal data from unstructured text

Troubleshooting

See DEBUGGING.md.

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ
