
hellohaptik / Chatbot_ner

Licence: gpl-3.0
chatbot_ner: Named Entity Recognition for chatbots.

Programming Languages

python

Projects that are alternatives to or similar to Chatbot NER

Danlp
DaNLP is a repository for Natural Language Processing resources for the Danish Language.
Stars: ✭ 111 (-59.34%)
Mutual labels:  natural-language-processing, named-entity-recognition, nlp-library
Pytorch Bert Crf Ner
BERT+CRF based Named Entity Recognition model for Korean, built with KoBERT and CRF
Stars: ✭ 236 (-13.55%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+547.25%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
simple NER
simple rule based named entity recognition
Stars: ✭ 29 (-89.38%)
Mutual labels:  named-entity-recognition, ner, nlp-library
Convai Bot 1337
NIPS Conversational Intelligence Challenge 2017 Winner System: Skill-based Conversational Agent with Supervised Dialog Manager
Stars: ✭ 65 (-76.19%)
Mutual labels:  chatbot, chatbots, natural-language-processing
Turkish Bert Nlp Pipeline
Bert-base NLP pipeline for Turkish, Ner, Sentiment Analysis, Question Answering etc.
Stars: ✭ 85 (-68.86%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
Spacy Lookup
Named Entity Recognition based on dictionaries
Stars: ✭ 212 (-22.34%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+29.67%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
Chatito
🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Stars: ✭ 678 (+148.35%)
Mutual labels:  chatbot, chatbots, named-entity-recognition
Snips Nlu
Snips Python library to extract meaning from text
Stars: ✭ 3,583 (+1212.45%)
Mutual labels:  chatbot, named-entity-recognition, ner
Entity Recognition Datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Stars: ✭ 891 (+226.37%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
Rasa Chatbot Templates
RASA chatbot use case boilerplate
Stars: ✭ 127 (-53.48%)
Mutual labels:  chatbot, chatbots, natural-language-processing
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+7950.55%)
Mutual labels:  natural-language-processing, named-entity-recognition, nlp-library
Bond
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
Stars: ✭ 96 (-64.84%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
Spacy Streamlit
👑 spaCy building blocks and visualizers for Streamlit apps
Stars: ✭ 360 (+31.87%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
Bert Sklearn
a sklearn wrapper for Google's BERT model
Stars: ✭ 182 (-33.33%)
Mutual labels:  natural-language-processing, named-entity-recognition, ner
Yedda
YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Code for ACL 2018 Best Demo Paper Nomination.
Stars: ✭ 704 (+157.88%)
Mutual labels:  entity, named-entity-recognition, ner
Botfuel Dialog
Botfuel SDK to build highly conversational chatbots
Stars: ✭ 96 (-64.84%)
Mutual labels:  chatbot, chatbots, natural-language-processing
Rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Stars: ✭ 13,219 (+4742.12%)
Mutual labels:  chatbot, chatbots, natural-language-processing
virtual-assistant
Virtual Assistant
Stars: ✭ 67 (-75.46%)
Mutual labels:  chatbot, chatbots

Named Entity Recognition for chatbots


Chatbot NER is an open source framework custom-built to support entity recognition in text messages. After thorough research on existing NER systems, the team at Haptik felt a strong need to build a framework tailored for conversational AI that also supports Indian languages. Chatbot NER currently supports English, Hindi, Gujarati, Marathi, Bengali and Tamil, as well as their code-mixed forms. The framework currently uses common patterns along with a few NLP techniques to extract the necessary entities from languages with sparse data. The API structure of Chatbot NER is designed with usability for conversational AI applications in mind. The team at Haptik is continuously working on porting this framework to all Indian languages and their respective local dialects.

Installation

Detailed documentation on how to set up Chatbot NER on your system using Docker is available here.

Supported Entities

| Entity type | Code reference | Description | Examples | Supported languages (ISO 639-1 codes) |
|---|---|---|---|---|
| Time | TimeDetector | Detects time in the given text | tomorrow morning at 5, कल सुबह ५ बजे, kal subah 5 baje | 'en', 'hi', 'gu', 'bn', 'mr', 'ta' |
| Date | DateAdvancedDetector | Detects dates in the given text | next monday, agle somvar, अगले सोमवार | 'en', 'hi', 'gu', 'bn', 'mr', 'ta' |
| Number | NumberDetector | Detects numbers and their units in the given text | 50 rs per person, ५ किलो चावल, मुझे एक लीटर ऑइल चाहिए | 'en', 'hi', 'gu', 'bn', 'mr', 'ta' |
| Phone number | PhoneDetector | Detects phone numbers in the given text | 9833530536, +91 9833530536, ९८३३४३०५३५ | 'en', 'hi', 'gu', 'bn', 'mr', 'ta' |
| Email | EmailDetector | Detects email addresses in text | [email protected] | 'en' |
| Text | TextDetector | Detects custom entities in a text string using full-text search in the Datastore or a contextual model | Order me a pizza, मुंबई में मौसम कैसा है | Search: 'en', 'hi', 'gu', 'bn', 'mr', 'ta'; contextual model: 'en' only |
| PNR | PNRDetector | Detects PNR (serial) codes in the given text | My flight PNR is 4SGX3E | 'en' |
| Regex | RegexDetector | Detects entities using custom regex patterns | My flight PNR is 4SGX3E | NA |
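The Phone number row lists numbers written with Devanagari digits (९८३३४३०५३५) alongside ASCII ones. One way to handle both is to convert every Unicode decimal digit to ASCII before applying a regex. The sketch below does this and assumes Indian 10-digit mobile numbers with an optional +91 prefix. The names `PHONE_RE`, `to_ascii_digits` and `detect_phone_numbers` are hypothetical. This is not how PhoneDetector is actually implemented.

```python
import re
import unicodedata

# Hypothetical sketch, not Chatbot NER's PhoneDetector.
# Optional +91 country code, then a 10-digit mobile number starting with 6-9.
PHONE_RE = re.compile(r"(?<!\d)(?:\+?91[\s-]?)?([6-9]\d{9})(?!\d)")


def to_ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit (e.g. Devanagari ९) with ASCII."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in text
    )


def detect_phone_numbers(text: str) -> list[str]:
    """Return the 10-digit numbers found in text, without the +91 prefix."""
    return PHONE_RE.findall(to_ascii_digits(text))


print(detect_phone_numbers("+91 9833530536"))  # ['9833530536']
print(detect_phone_numbers("मेरा नंबर ९८३३४३०५३५ है"))  # ['9833430535']
```

Normalising digits first lets one ASCII-only pattern serve every script whose digits Unicode classifies as decimal.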

There are other custom detectors, such as city, budget and shopping size, which are derived from the primary detectors above. However, they currently support English only and are limited to Indian users. We are in the process of restructuring them to scale across languages and geographies, and their current versions might be deprecated in the future. For applications already in production, we therefore recommend using only the primary detectors listed in the table above.
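Derived detectors such as budget are built on top of a primary detector. The sketch below assumes a simplified number detector that returns `(value, unit)` pairs and a hypothetical set of currency units. A budget detector can then simply filter those pairs. `NUMBER_WITH_UNIT`, `CURRENCY_UNITS`, `detect_numbers` and `detect_budget` are illustrative names, not Chatbot NER's code.

```python
import re
from typing import Optional

# Simplified stand-in for a primary number detector: a number optionally
# followed by a unit word, e.g. "50 rs", "5kg". Illustrative only.
NUMBER_WITH_UNIT = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[a-z]+)?", re.IGNORECASE
)

# Hypothetical currency spellings that turn a number into a budget.
CURRENCY_UNITS = {"rs", "inr", "rupee", "rupees"}


def detect_numbers(text: str) -> list[tuple[float, Optional[str]]]:
    """Return (value, unit) pairs; unit is None when no unit word follows."""
    pairs = []
    for match in NUMBER_WITH_UNIT.finditer(text):
        unit = match.group("unit")
        pairs.append((float(match.group("value")), unit.lower() if unit else None))
    return pairs


def detect_budget(text: str) -> list[float]:
    """Derived detector: keep only numbers whose unit is a currency."""
    return [value for value, unit in detect_numbers(text) if unit in CURRENCY_UNITS]


print(detect_budget("5 kg rice under 300 rupees"))  # [300.0]
```

This handles only units written after the number ("300 rupees"). It does not handle prefixes such as "Rs 500".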

API structure

Detailed documentation of the APIs for all entity types is available here. The current API structure is designed to be easy to call from conversational AI applications, but other applications can use it as well.
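A conversational client will usually build one HTTP request per user message. The sketch below only constructs the request URL. The base URL `http://localhost:8081`, the `/v2/<entity_type>/` path and the `message`/`source_language` parameter names are all assumptions for illustration, so check the API documentation for the real ones. `build_entity_request` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: where the server runs, the URL path
# layout and the query-parameter names. Check the API documentation.
BASE_URL = "http://localhost:8081"


def build_entity_request(entity_type: str, message: str,
                         language: str = "en", **extra: str) -> str:
    """Build a GET URL asking a (hypothetical) entity endpoint to parse message."""
    params = {"message": message, "source_language": language, **extra}
    return f"{BASE_URL}/v2/{entity_type}/?{urlencode(params)}"


print(build_entity_request("number", "50 rs per person"))
# http://localhost:8081/v2/number/?message=50+rs+per+person&source_language=en
```

With a server running, the URL could be fetched with `urllib.request.urlopen(url)`. Keeping URL construction separate makes it testable without a live server. `urlencode` also percent-encodes non-ASCII messages such as Hindi text.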

Framework Overview

In any conversational AI application there are several entities to identify, and the detection logic for one entity may differ from another. We have organised this repository as shown below:

(Figure: entity hierarchy)

We have classified entities into four main types: numeral, pattern, temporal and textual.

  • numeral: Contains all entities that deal with numerals or numbers, for example number detection, budget detection, size detection, etc.

  • pattern: Contains all detection logic where identification can be done using patterns or regular expressions, for example email, phone_number, pnr, etc.

  • temporal: Contains the detection logic for time and date.

  • textual: Identifies entities by looking them up in a dictionary. This mainly covers detection of text (like cuisine, dish, restaurants, etc.), city names, the user's location, etc.
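The textual type resolves entities by dictionary lookup. Chatbot NER itself uses full-text search in a datastore (see the Text row in the table), so the in-memory sketch below is only a stand-in. It assumes a hypothetical `CITY_DICTIONARY` that maps each canonical value to its variants in any script. It matches the longest n-gram first so that "new delhi" wins over "delhi". All names here are illustrative.

```python
import string

# Hypothetical dictionary: canonical value -> variants in any script.
CITY_DICTIONARY = {
    "mumbai": ["mumbai", "bombay", "मुंबई"],
    "new delhi": ["new delhi", "delhi", "दिल्ली"],
}


def build_variant_index(dictionary: dict[str, list[str]]) -> dict[tuple[str, ...], str]:
    """Map each variant's token tuple to its canonical value."""
    index = {}
    for value, variants in dictionary.items():
        for variant in variants:
            index[tuple(variant.lower().split())] = value
    return index


def detect_text_entities(text: str, dictionary: dict[str, list[str]],
                         max_ngram: int = 3) -> list[str]:
    """Greedy left-to-right lookup, trying the longest n-gram first."""
    index = build_variant_index(dictionary)
    # Whitespace tokenisation keeps Devanagari words (with vowel signs) intact.
    tokens = [t.strip(string.punctuation) for t in text.lower().split()]
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            value = index.get(tuple(tokens[i:i + n]))
            if value is not None:
                found.append(value)
                i += n
                break
        else:  # no n-gram starting at position i matched
            i += 1
    return found


print(detect_text_entities("Flights from New Delhi to Bombay?", CITY_DICTIONARY))
# ['new delhi', 'mumbai']
```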

Numeral, temporal and pattern entities have been moved to ner_v2 for language portability and more flexible detection logic. In ner_v1, only the text entity currently has language support; we will move it to ner_v2 without any major API changes.

Contribution Guidelines

Currently, you can contribute to ner_v2 in Chatbot NER either by adding training data or by contributing detection patterns in the form of regex. We will work on removing a few architectural limitations, which will ease the process of adding ML models and new entities in the future.

  • Adding Training Data: You can significantly improve the detection capabilities of Chatbot NER simply by adding data to csv files. For example, date detection in Hindi and Hinglish can be improved by adding data to the csv files shown in the image below. If you wish to contribute, refer to the respective documentation for date, time and numbers. (Figure: Date Contribution)
  • Adding Detection Pattern: You can add custom patterns for different languages by writing simple functions. An example of adding a custom pattern for detecting the number of people is available here.
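The contribution guide mentions adding detection patterns as simple functions and cites a number-of-people example. The following hypothetical function is written in that spirit and is not the repository's own example. `detect_number_of_people`, `PEOPLE_RE` and the small word-number vocabulary (including the Hinglish `log`) are assumptions.

```python
import re

# Hypothetical pattern function; names and vocabulary are illustrative only.
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

# A count (digits or a number word) followed by a "people" noun; "log" is the
# Hinglish word for people, as in "hum 5 log hain".
PEOPLE_RE = re.compile(
    r"\b(\d+|" + "|".join(WORD_NUMBERS) + r")\s+(?:people|persons|guests|adults|log)\b",
    re.IGNORECASE,
)


def detect_number_of_people(text: str) -> list[int]:
    """Return every head count mentioned in text, in order of appearance."""
    counts = []
    for match in PEOPLE_RE.finditer(text):
        token = match.group(1).lower()
        counts.append(int(token) if token.isdigit() else WORD_NUMBERS[token])
    return counts


print(detect_number_of_people("Book a table for 4 people"))  # [4]
print(detect_number_of_people("We are two adults and 3 guests"))  # [2, 3]
```

A bare `log` also matches phrases like "5 log files", so a production pattern would probably need more context, such as the surrounding words.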

Please refer to the general steps for contribution and approval, and the coding guidelines, described here.
