SimGus / Chatette

Licence: MIT
A powerful dataset generator for Rasa NLU, inspired by Chatito


Projects that are alternatives to or similar to Chatette

Chatito
🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Stars: ✭ 678 (+230.73%)
Mutual labels:  chatbot, chatbots, nlu, nlg
Rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Stars: ✭ 13,219 (+6348.29%)
Mutual labels:  chatbot, chatbots, nlu
Botfuel Dialog
Botfuel SDK to build highly conversational chatbots
Stars: ✭ 96 (-53.17%)
Mutual labels:  chatbot, chatbots, nlu
Botpress
🤖 Dev tools to reliably understand text and automate conversations. Built-in NLU. Connect & deploy on any messaging channel (Slack, MS Teams, website, Telegram, etc).
Stars: ✭ 9,486 (+4527.32%)
Mutual labels:  chatbot, chatbots, nlu
virtual-assistant
Virtual Assistant
Stars: ✭ 67 (-67.32%)
Mutual labels:  chatbot, nlu, chatbots
Mojo Weixin
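A (non-GUI) client framework for personal WeChat/weixin accounts, written in Perl (no Perl knowledge required); plugins can expose an HTTP-based API for other languages or systems to call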
Stars: ✭ 1,181 (+476.1%)
Mutual labels:  chatbot, cli
Sywac
🚫 🐭 Asynchronous, single package CLI framework for Node
Stars: ✭ 109 (-46.83%)
Mutual labels:  cli, parsing
Nlp Papers
Papers and Book to look at when starting NLP 📚
Stars: ✭ 111 (-45.85%)
Mutual labels:  nlu, nlg
Ai Chatbot Framework
A python chatbot framework with Natural Language Understanding and Artificial Intelligence.
Stars: ✭ 1,564 (+662.93%)
Mutual labels:  chatbot, chatbots
Jarvis
J.A.R.V.I.S - Just Another Rudimentary Verbal Instruction Shell
Stars: ✭ 117 (-42.93%)
Mutual labels:  chatbot, cli
Rasa Chatbot Templates
RASA chatbot use case boilerplate
Stars: ✭ 127 (-38.05%)
Mutual labels:  chatbot, chatbots
Fb Botmill
A Java framework for building bots on Facebook's Messenger Platform.
Stars: ✭ 67 (-67.32%)
Mutual labels:  chatbot, chatbots
Mojo Webqq
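[Important notice: WebQQ will shut down on January 1, 2019, and this project is no longer maintained; thank you all for your company over the past four years] A (non-GUI) smartqq/webqq client framework written in Perl (no Perl knowledge required); plugins can expose an HTTP-based API for other languages or systems to call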
Stars: ✭ 1,755 (+756.1%)
Mutual labels:  chatbot, cli
Convai Bot 1337
NIPS Conversational Intelligence Challenge 2017 Winner System: Skill-based Conversational Agent with Supervised Dialog Manager
Stars: ✭ 65 (-68.29%)
Mutual labels:  chatbot, chatbots
Botonic
Build chatbots and conversational experiences using React
Stars: ✭ 144 (-29.76%)
Mutual labels:  chatbots, nlu
Java Telegram Bot Tutorial
Java Telegram Bot Tutorial. Feel free to submit issue if you found a mistake.
Stars: ✭ 165 (-19.51%)
Mutual labels:  chatbot, chatbots
Botsharp
The Open Source AI Chatbot Platform Builder in 100% C# Running in .NET Core with Machine Learning algorithm.
Stars: ✭ 1,103 (+438.05%)
Mutual labels:  chatbot, nlu
Awesome Pretrained Chinese Nlp Models
Awesome Pretrained Chinese NLP Models, a collection of high-quality pretrained Chinese models
Stars: ✭ 195 (-4.88%)
Mutual labels:  nlu, nlg
Pytlas
An open-source 🤖💬 Python 3 assistant library built for people and made to be super easy to setup and understand
Stars: ✭ 34 (-83.41%)
Mutual labels:  chatbot, nlu
Easynlu
Simple embedded NLU for mobile apps
Stars: ✭ 57 (-72.2%)
Mutual labels:  chatbot, nlu

Chatette

A data generator for Rasa NLU

Installation · How to use Chatette? · Chatette vs Chatito? · Development · Credits

Chatette is a Python program that generates training datasets for Rasa NLU given template files. If you want to make large datasets of example data for Natural Language Understanding tasks without too much of a headache, Chatette is a project for you.

Preview of Chatette's capabilities

Specifically, Chatette implements a Domain Specific Language (DSL) that allows you to define templates to generate a large number of sentences, which are then saved in the input format(s) of Rasa NLU.

The DSL used is a near-superset of the DSL of the excellent project Chatito, created by Rodrigo Pimentel. (Note: the DSL is actually a superset of Chatito v2.1.x for Rasa NLU, not for all possible adapters.)

An interactive mode is available as well:

Interactive mode

Installation

To run Chatette, you will need to have Python installed. Chatette works with both Python 2.7 and 3.x (>= 3.4).

Chatette is available on PyPI, and can thus be installed using pip:

pip install chatette

Alternatively, you can clone the GitHub repository and install the requirements:

pip install -r requirements/common.txt

You can then install the project (as an editable package) using pip, by executing the following command from the directory Chatette/chatette/:

pip install -e .

You can then run the module by using the commands below in the cloned directory.

How to use Chatette?

Input and output data

The data that Chatette uses and generates is loaded from and saved to files. You will thus have:

  • One or several input file(s) containing the templates. There is no need for a specific file extension. The syntax of the DSL to make those templates is described on the wiki.

  • One or several output file(s), which will be generated by Chatette and will contain the generated examples. Those files can be formatted in JSON (by default) or in Markdown and can be directly fed to Rasa NLU. It is also possible to use a JSONL format.
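
As a rough illustration of what can be done with a generated JSON file, the sketch below counts the examples per intent. It assumes the file follows Rasa NLU's legacy JSON training layout (a top-level `rasa_nlu_data` object holding a `common_examples` list); the function name `count_examples_per_intent` and the sample data are made up for this example and are not part of Chatette.

```python
import json
from collections import Counter


def count_examples_per_intent(rasa_nlu_json):
    """Count training examples per intent in a Rasa NLU JSON string.

    Assumption: the data uses the legacy Rasa NLU layout
    {"rasa_nlu_data": {"common_examples": [...]}}.
    """
    data = json.loads(rasa_nlu_json)
    examples = data["rasa_nlu_data"]["common_examples"]
    return Counter(example["intent"] for example in examples)


# Hand-written sample standing in for a generated file (hypothetical data).
sample = json.dumps({
    "rasa_nlu_data": {
        "common_examples": [
            {
                "text": "where is the toilet",
                "intent": "ask_toilet",
                "entities": [
                    {"start": 13, "end": 19, "value": "toilet", "entity": "toilet"}
                ],
            },
            {"text": "where is the loo", "intent": "ask_toilet", "entities": []},
        ]
    }
})

print(count_examples_per_intent(sample))
```

To inspect a real file, read its contents into a string and pass that to the function instead of `sample`.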

Running Chatette

Once Chatette is installed and you have created the template files, run the following command:

python -m chatette <path_to_template>

where python is your Python interpreter (some operating systems use python3 as the alias to the Python 3.x interpreter).

You can specify the output directory as follows:

python -m chatette <path_to_template> -o <output_directory_path>

<output_directory_path> is specified relative to the directory from which the script is executed. The output will then be saved in numbered .json files in <output_directory_path>/train and <output_directory_path>/test. If you don't specify a path for the output directory, the default one is output.
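
If you want to launch Chatette from another Python script, one possible approach is to build the same command line and hand it to `subprocess`. The helper below is only a sketch: `build_chatette_command` is a hypothetical name, and it relies solely on the positional template path and the `-o` option shown above.

```python
import sys


def build_chatette_command(template_path, output_dir=None):
    """Build the argument list for `python -m chatette`.

    Hypothetical helper: it only uses the template path and the -o option
    described in this section.
    """
    # sys.executable is the interpreter running this script, which avoids
    # guessing between the `python` and `python3` aliases.
    command = [sys.executable, "-m", "chatette", template_path]
    if output_dir is not None:
        command += ["-o", output_dir]
    return command


print(build_chatette_command("master.chatette", "generated"))
```

The resulting list can be passed to `subprocess.run(command, check=True)` in an environment where Chatette is installed; the file and directory names here are placeholders.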

Other program arguments are described in the wiki.

Chatette vs Chatito?

TL;DR: the main selling point is that large projects are easier to manage with Chatette, and most Chatito projects can be used as Chatette projects without any modification.

A perfectly legitimate question is:

Why does Chatette exist when Chatito already fulfills the same purposes?

The two projects actually have different goals:

Chatito aims to be a generic but powerful DSL that should stay very legible. While it is perfectly fine for small projects, when projects get larger, the simplicity of its DSL may become a burden: your template file becomes overwhelmingly large, to the point where you get lost inside it.

Chatette defines a more complex DSL to be able to manage larger projects and tries to stay as interoperable with Chatito as possible. Here is a non-exhaustive list of features Chatette has and that Chatito does not have:

  • Ability to break down templates into multiple files
  • Possibility to specify the probability of generating some parts of the sentences
  • Conditional generation of some parts of the sentences, given which other parts were generated
  • Choice syntax to prevent copy-pasting rules with only a few changes and to easily modify the generation behavior of parts of sentences
  • Ability to define the value of each slot (entity) whatever the generated example
  • Syntax for generating words with different case for the leading letter
  • Argument support so that some templates may be filled by different strings in different situations
  • Indentation is permissive and need only be somewhat consistent
  • Support for synonyms
  • Interactive command interpreter
  • Output for Rasa in JSON or in Markdown formats

As Chatette's DSL is a superset of Chatito's, input files written for Chatito are most of the time completely usable with Chatette (but not the other way around). Hence, it is easy to start using Chatette if you used Chatito before.

As an example, this Chatito data:

// This template defines different ways to ask for the location of toilets (Chatito version)
%[ask_toilet]('training': '3')
    ~[sorry?] ~[tell me] where the @[toilet#singular] is ~[please?]?
    ~[sorry?] ~[tell me] where the @[toilet#plural] are ~[please?]?

~[sorry]
    sorry
    Sorry
    excuse me
    Excuse me

~[tell me]
    ~[can you?] tell me
    ~[can you?] show me
~[can you]
    can you
    could you
    would you

~[please]
    please

@[toilet#singular]
    toilet
    loo
@[toilet#plural]
    toilets

could be directly given as input to Chatette, but this Chatette template would produce the same results:

// This template defines different ways to ask for the location of toilets (Chatette version)
%[&ask_toilet](3)
    ~[sorry?] ~[tell me] where the @[toilet#singular] is [please?]?
    ~[sorry?] ~[tell me] where the @[toilet#plural] are [please?]?

~[sorry]
    sorry
    excuse me

~[tell me]
    ~[can you?] [tell|show] me
~[can you]
    [can|could|would] you

@[toilet#singular]
    toilet
    loo
@[toilet#plural]
    toilets

The Chatito version is arguably easier to read, but the Chatette version is shorter, which may be very useful when dealing with lots of templates and potential repetition.
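
To make the equivalence between the two versions more concrete, here is a small sketch that expands the choice syntax of a single rule into the separate rules one would write in Chatito. It is not Chatette's actual parser: `expand_choices` is a hypothetical helper that only handles bare `[a|b|c]` choices, leaves alias, slot and intent references untouched, and ignores modifiers such as `?`.

```python
import itertools
import re

# Matches a bare choice such as "[can|could|would]". Brackets preceded by
# ~, @ or % (alias, slot and intent references) are deliberately skipped.
CHOICE = re.compile(r"(?<![~@%])\[([^\[\]]+)\]")


def expand_choices(rule):
    """Return every rule obtained by picking one option per choice."""
    parts = CHOICE.split(rule)
    # With one capture group, re.split alternates literal text (even
    # indexes) and the captured choice contents (odd indexes).
    options = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*options)]


print(expand_choices("[can|could|would] you"))
print(expand_choices("~[can you?] [tell|show] me"))
```

The two calls reproduce the rules listed under ~[can you] and ~[tell me] in the Chatito version.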

Beware that, as always with machine learning, having too much data may cause your models to perform less well because of overfitting. While this script can be used to generate thousands upon thousands of examples, doing so isn't advised for machine learning tasks.

Chatette is named after Chatito: -ette in French could be translated to -ita or -ito in Spanish. Note that the last e in Chatette is not pronounced (as is the case in "note").

Development

For developers, you can clone the repo and install the development requirements:

pip install -r requirements/develop.txt

Then, install the module as editable:

pip install -e <path-to-chatette-module>

Credits

Author and maintainer

Disclaimer: This is a side-project I'm not paid for, don't expect me to work 24/7 on it.

Contributors

Many thanks to them!
