
tzano / fountain

License: MIT license
Natural Language Data Augmentation Tool for Conversational Systems

Programming Languages

python

Projects that are alternatives of or similar to fountain

xbot
Task-oriented Chatbot
Stars: ✭ 78 (-30.97%)
Mutual labels:  nlu, conversational-ai
Rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Stars: ✭ 13,219 (+11598.23%)
Mutual labels:  nlu, conversational-ai
nlp-dialogue
A full-process dialogue system that can be deployed online
Stars: ✭ 69 (-38.94%)
Mutual labels:  nlu, conversational-ai
alter-nlu
Natural language understanding library for chatbots with intent recognition and entity extraction.
Stars: ✭ 45 (-60.18%)
Mutual labels:  nlu, conversational-ai
Botlibre
An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
Stars: ✭ 412 (+264.6%)
Mutual labels:  natural-language, nlu
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (+30.09%)
Mutual labels:  nlu, conversational-ai
Botpress
🤖 Dev tools to reliably understand text and automate conversations. Built-in NLU. Connect & deploy on any messaging channel (Slack, MS Teams, website, Telegram, etc).
Stars: ✭ 9,486 (+8294.69%)
Mutual labels:  nlu, conversational-ai
airy
💬 Open source conversational platform to power conversations with an open source Live Chat, Messengers like Facebook Messenger, WhatsApp and more - 💎 UI from Inbox to dashboards - 🤖 Integrations to Conversational AI / NLP tools and standard enterprise software - ⚡ APIs, WebSocket, Webhook - 🔧 Create any conversational experience
Stars: ✭ 299 (+164.6%)
Mutual labels:  nlu, conversational-ai
Rasa nlu gq
turn natural language into structured data (supports Chinese; defines several custom models for different scenarios and tasks)
Stars: ✭ 256 (+126.55%)
Mutual labels:  natural-language, nlu
gdpr-fingerprint-pii
Use Watson Natural Language Understanding and Watson Knowledge Studio to fingerprint personal data from unstructured documents
Stars: ✭ 49 (-56.64%)
Mutual labels:  natural-language, nlu
Nlp.js
An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identify, and so more
Stars: ✭ 4,670 (+4032.74%)
Mutual labels:  nlu, conversational-ai
Wisty.js
🧚‍♀️ Chatbot library turning conversations into actions, locally, in the browser.
Stars: ✭ 24 (-78.76%)
Mutual labels:  nlu, conversational-ai
watson-document-classifier
Augment IBM Watson Natural Language Understanding APIs with a configurable mechanism for text classification, uses Watson Studio.
Stars: ✭ 41 (-63.72%)
Mutual labels:  natural-language, nlu
Nlp Recipes
Natural Language Processing Best Practices & Examples
Stars: ✭ 5,783 (+5017.7%)
Mutual labels:  natural-language, nlu
small-talk-rasa-stack
Collection of casual conversations that can be used with the Rasa Stack
Stars: ✭ 87 (-23.01%)
Mutual labels:  conversational-ai, training-data
random-jpa
Create random test data for JPA/Hibernate entities.
Stars: ✭ 23 (-79.65%)
Mutual labels:  data-generator
FAQ Chatbot Rasa
FAQ's answering chatbot using open source chatbot framework Rasa Stack
Stars: ✭ 33 (-70.8%)
Mutual labels:  conversational-ai
app rasa chat bot
a stateless chat bot to perform natural language queries against the App Store top charts
Stars: ✭ 20 (-82.3%)
Mutual labels:  nlu
developers-community
LivePerson’s Developer Center and Community
Stars: ✭ 29 (-74.34%)
Mutual labels:  conversational-ai
gap-text2sql
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
Stars: ✭ 83 (-26.55%)
Mutual labels:  nlu

Fountain

Fountain is a natural language data augmentation tool that helps developers create and expand domain specific chatbot training datasets for machine learning algorithms.

Overview

To build better AI assistants, we need more data; better models alone aren't enough.

Most NLU systems require entering thousands of possible queries that future users are likely to use, and annotating every sentence segment that can identify the user's intention. This is generally a hectic and tedious manual process. Fountain aims to help developers smooth away this process and generate a large volume of training examples, making it easier to train and build robust chatbot systems.

The tool intends to make it easy to build the same dataset for different intent engines (Amazon's Alexa, Google's API.ai, Facebook's Wit, Microsoft's LUIS). At the moment, the tool generates training datasets compatible with the RasaNLU format.

Getting Started

Installation

You can install the package via:

$ pip install git+git://github.com/tzano/fountain.git

Install the dependencies:

$ pip install -r requirements.txt

Syntax

Fountain uses a structured YAML template. Developers define the scope of intents through the template using grammar definitions. Every intent should include at least one sample utterance that triggers an action. The utterance includes attributes that identify the user's intention; these key pieces of information are called slots. We include different samples so that the tool can generate varied datasets.

We support the following operations:

  • Slot declaration ({slot_name:slot_type}): used to declare a slot pattern.
  • Alternation (( first_word | second_word )): used to provide a set of alternative keywords; these can be synonyms (e.g. happy, joyful) or different spellings of the same word (e.g. colors|colours).

A simple example of an intent looks like the following:

book_cab:
  - utterance: Book a (cab|taxi) to {location:place}
    slots:
      location:place:
        - airport
        - city center

This will generate the following intent JSON file using to_json:

[
    {
        "entities": [
            {
                "end": 21, 
                "entity": "location", 
                "start": 14, 
                "value": "airport"
            }
        ], 
        "intent": "book_cab", 
        "text": "book a cab to airport"
    }, 
    {
        "entities": [
            {
                "end": 25, 
                "entity": "location", 
                "start": 14, 
                "value": "city center"
            }
        ], 
        "intent": "book_cab", 
        "text": "book a cab to city center"
    }, 
    {
        "entities": [
            {
                "end": 22, 
                "entity": "location", 
                "start": 15, 
                "value": "airport"
            }
        ], 
        "intent": "book_cab", 
        "text": "book a taxi to airport"
    }, 
    {
        "entities": [
            {
                "end": 26, 
                "entity": "location", 
                "start": 15, 
                "value": "city center"
            }
        ], 
        "intent": "book_cab", 
        "text": "book a taxi to city center"
    }
]

The same template would generate the following CSV file using to_csv:

intent	utterance
book_cab	book a cab to airport
book_cab	book a cab to city center
book_cab	book a taxi to airport
book_cab	book a taxi to city center
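
Since the generated JSON is plain, flat training data in the format shown above, a few lines of standard-library Python are enough to sanity-check it. This snippet is only an illustrative check, not part of Fountain itself; it assumes you exported the dataset to results.json as in the usage example later in this README.

# Standalone sanity check for the generated dataset (standard library only).
import json
from collections import Counter

with open('results.json') as f:
    examples = json.load(f)

# Count the generated utterances per intent.
counts = Counter(example['intent'] for example in examples)
for intent, count in counts.items():
    print(f"{intent}: {count} utterances")

# Verify that each annotated entity span actually matches the slot value.
for example in examples:
    for entity in example['entities']:
        assert example['text'][entity['start']:entity['end']] == entity['value']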

Builtin

The library supports several pre-defined slot types (entities) to simplify and standardize how data in the slot is recognized.

These entities have been collected from different open-source data sources.

  • Dates, and Times

    • FOUNTAIN:DATE
    • FOUNTAIN:WEEKDAYS
    • FOUNTAIN:MONTH_DAYS
    • FOUNTAIN:MONTHS
    • FOUNTAIN:HOLIDAYS
    • FOUNTAIN:TIME
    • FOUNTAIN:NUMBER
  • Location

    • FOUNTAIN:COUNTRY
    • FOUNTAIN:CITY
  • People

    • FOUNTAIN:FAMOUSPEOPLE
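
These builtin types are intended to take the place of a custom slot_type in the {slot_name:slot_type} pattern described above. The template below is only a hedged sketch: the intent and slot names are made up, only the FOUNTAIN:* type names come from the list above, and it assumes builtin slots need no explicit slots section because their values come from the bundled data. Check the examples in the labs folder for the exact reference syntax.

# Hypothetical template: book_flight, destination and travel_date are
# illustrative names; FOUNTAIN:CITY and FOUNTAIN:DATE are builtin types.
book_flight:
  - utterance: Book a flight to {destination:FOUNTAIN:CITY} on {travel_date:FOUNTAIN:DATE}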

Data Sources

In order to build Fountain's builtin datatypes, we processed data from the following data sources:

How to use it

You can easily load and parse a DSL template and export the generated dataset (Rasa format).

You can find this sample under the labs directory.

# The import below assumes DataGenerator is exposed at the package root;
# adjust the import path to match your installation.
from fountain import DataGenerator

data_generator = DataGenerator()
# load the template file
template_fname = '<file>.yaml'
# parse the DSL template
results = data_generator.parse(template_fname)

# export to a csv file
data_generator.to_csv('results.csv')
# export to a json file
data_generator.to_json('results.json')

Test

pytest

Tutorials & Guides

You can find examples of how to use the library in the labs folder. You can enrich the builtin datasets by adding more files under data/<language>/*files*.csv. Make sure to index the files that you insert in resources/builtin.py, as illustrated in the sketch below.
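
As a rough illustration of that workflow, the snippet below writes a new data file under data/en/. The file name, the values, and the one-value-per-line layout are all assumptions made for the example; mirror the format of the existing files under data/<language>/ and remember to register the new file in resources/builtin.py afterwards.

# Hypothetical example of adding a custom data file for a new entity type.
# The path data/en/airlines.csv, the values, and the one-value-per-line
# layout are assumptions; copy the format of the existing files and index
# the new file in resources/builtin.py.
from pathlib import Path

airlines = ["acme air", "blue sky airlines", "northwind airways"]

target = Path("data/en/airlines.csv")
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text("\n".join(airlines) + "\n")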

For more information about Chatbots and Natural Language Understanding, visit one of the following links:

Platforms

  • RASA NLU (Supported)

Projects that used Fountain:

  • Wren - a news chatbot to discover & explore daily news stories. We used Fountain to generate more than 20,000 samples. The YAML file is available here.

Support

If you are having issues, please let us know or submit a pull request.

License

The project is licensed under the MIT License.
