
crownpku / Rasa_nlu_chi

Licence: apache-2.0
Turn Chinese natural language into structured data (Chinese natural language understanding)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Rasa nlu chi

Unilmchatchitrobot
Unilm for Chinese Chitchat Robot. A compliment-style chitchat bot project based on the Unilm model.
Stars: ✭ 67 (-94.25%)
Mutual labels:  chatbot, chinese
fuzzychinese
A small package to fuzzy match chinese words
Stars: ✭ 50 (-95.71%)
Mutual labels:  natural-language, chinese
Ai Chatbot Framework
A python chatbot framework with Natural Language Understanding and Artificial Intelligence.
Stars: ✭ 1,564 (+34.13%)
Mutual labels:  chatbot, natural-language
Botlibre
An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
Stars: ✭ 412 (-64.67%)
Mutual labels:  chatbot, natural-language
Weibo terminater
Final Weibo Crawler Scrap Anything From Weibo, comments, weibo contents, followers, anything. The Terminator
Stars: ✭ 2,295 (+96.83%)
Mutual labels:  chatbot, chinese
Nlp xiaojiang
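Natural language processing (NLP), Xiaojiang robot (retrieval-based chitchat chatbot), BERT sentence embeddings and similarity (Sentence Similarity), XLNET sentence embeddings and similarity (text xlnet embedding), text classification, entity extraction (ner, bert+bilstm+crf), data augmentation (text augment, data enhance), generation of synonymous sentences and synonyms, sentence main-part extraction (mainpart), Chinese short-text similarity, text feature engineering, calls via keras-http-service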
Stars: ✭ 954 (-18.18%)
Mutual labels:  chatbot, chinese
Awesome Cn
Chinese translations of awesome projects, to make lookup more efficient
Stars: ✭ 62 (-94.68%)
Mutual labels:  chinese
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (-2.92%)
Mutual labels:  natural-language
Megahal
MegaHAL is a learning chatterbot.
Stars: ✭ 60 (-94.85%)
Mutual labels:  chatbot
Dumbqq
A C# wrapper for the SmartQQ API. (No longer maintained, because the author has reached a new level of laziness.)
Stars: ✭ 60 (-94.85%)
Mutual labels:  chatbot
Chinese Hershey Font
Convert Chinese Characters to Single-Line Fonts using Computer Vision
Stars: ✭ 70 (-94%)
Mutual labels:  chinese
Microsoftbotframework
Microsoft Bot Framework is a wrapper for the Microsoft Bot API by Microsoft
Stars: ✭ 68 (-94.17%)
Mutual labels:  chatbot
Awesome machine learning solutions
A curated list of repositories for my book Machine Learning Solutions.
Stars: ✭ 65 (-94.43%)
Mutual labels:  chatbot
Dragonfire
the open-source virtual assistant for Ubuntu based Linux distributions
Stars: ✭ 1,120 (-3.95%)
Mutual labels:  chatbot
Devchatterbot
Stars: ✭ 60 (-94.85%)
Mutual labels:  chatbot
Talkify
Talkify is an open source framework with an aim to standardize and model conversational AI enabling development of personal assistants and chat bots. The mission of this framework is to make developing chat bots and personal assistants as easy as spinning up a simple website in html.
Stars: ✭ 68 (-94.17%)
Mutual labels:  chatbot
When
A natural language date/time parser with pluggable rules
Stars: ✭ 1,113 (-4.55%)
Mutual labels:  natural-language
Localization Zh Cn Plugin
Chinese Localization for Jenkins
Stars: ✭ 65 (-94.43%)
Mutual labels:  chinese
Fb Botmill
A Java framework for building bots on Facebook's Messenger Platform.
Stars: ✭ 67 (-94.25%)
Mutual labels:  chatbot
Messenger Bot Rails
Ruby on Rails Gem for the Facebook Messenger Bot Platform
Stars: ✭ 64 (-94.51%)
Mutual labels:  chatbot

Rasa NLU for Chinese, a fork from RasaHQ/rasa_nlu.

Please refer to the latest instructions in the official Rasa NLU documentation.

Chinese Blog

Files you should have:

  • data/total_word_feature_extractor_zh.dat

Trained on a Chinese corpus with the MITIE wordrep tool (training takes 2-3 days).

To train your own, build the MITIE wordrep tool. Note that the Chinese corpus must be tokenized before it is fed into the tool. A closed-domain corpus that closely matches your use case works best.

A model trained on the Chinese Wikipedia dump and Baidu Baike can be downloaded from the Chinese blog.
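
The tokenization step mentioned above can be sketched as follows. This is a minimal sketch and not part of this project: `tokenize_corpus` and its `cut` parameter are hypothetical names, and it assumes that the wordrep tool is fed plain text in which tokens are separated by spaces. Any segmenter that turns a string into an iterable of tokens can be passed in.

```python
from typing import Callable, Iterable, Iterator


def tokenize_corpus(lines: Iterable[str],
                    cut: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Yield each non-empty input line as space-separated tokens.

    `cut` is any word segmenter (hypothetical parameter name); with Jieba
    installed, `jieba.cut` fits this signature.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tokens = (tok.strip() for tok in cut(line))
        # Drop tokens that are pure whitespace, then join with single spaces.
        yield " ".join(tok for tok in tokens if tok)
```

With Jieba installed, passing `jieba.cut` as `cut` and writing each yielded line to a new file gives a corpus in which words are separated by spaces.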

  • data/examples/rasa/demo-rasa_zh.json

Add as many examples as possible.
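
To illustrate what an entry in that file looks like, here is a minimal sketch that builds one training example in the Rasa NLU JSON layout (`rasa_nlu_data` containing `common_examples`). The helper `make_example` is a hypothetical name, and the sample sentence, intent and entity are borrowed from the parse output at the end of this README rather than from the actual demo file.

```python
import json
from typing import Dict, List, Tuple


def make_example(text: str, intent: str,
                 entities: List[Tuple[str, str]]) -> Dict:
    """Build one common example; `entities` holds (value, entity_name) pairs.

    Character offsets are found with str.find, so each value must occur
    in `text` (only the first occurrence is annotated).
    """
    annotated = []
    for value, name in entities:
        start = text.find(value)
        if start < 0:
            raise ValueError(f"{value!r} not found in {text!r}")
        annotated.append({"start": start, "end": start + len(value),
                          "value": value, "entity": name})
    return {"text": text, "intent": intent, "entities": annotated}


example = make_example("我发烧了该吃什么药?", "medical", [("发烧", "disease")])
training_data = {"rasa_nlu_data": {"common_examples": [example]}}
print(json.dumps(training_data, ensure_ascii=False, indent=2))
```

Computing the offsets in code rather than by hand avoids off-by-one mistakes, which are easy to make with Chinese text where there are no spaces to count against.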

Usage:

  1. Clone this project and run:
python setup.py install
  2. Modify the configuration.

    Currently for Chinese we have two pipelines:

    Use MITIE+Jieba (sample_configs/config_jieba_mitie.yml):

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_classifier_mitie"

RECOMMENDED: Use MITIE+Jieba+sklearn (sample_configs/config_jieba_mitie_sklearn.yml):

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
  3. (Optional) Use a Jieba user-defined dictionary or switch the Jieba default dictionary:

    You can set the "user_dicts" value to either a file path or a directory path. (sample_configs/config_jieba_mitie_sklearn_plus_dict_path.yml)

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
  default_dict: "./default_dict.big"
  user_dicts: "./jieba_userdict"
#  user_dicts: "./jieba_userdict/jieba_userdict.txt"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
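
As a rough illustration of the file-or-directory behaviour described above, the sketch below expands a `user_dicts` value into a list of dictionary files. It is not the project's actual loading code: `resolve_user_dicts` is a hypothetical helper, and treating every regular file directly inside the directory as a dictionary is an assumption.

```python
import os
from typing import List


def resolve_user_dicts(path: str) -> List[str]:
    """Return dictionary file paths for a `user_dicts`-style value.

    Assumption: a directory means "every regular file directly inside it";
    a file path is returned as a single-element list.
    """
    if os.path.isdir(path):
        entries = (os.path.join(path, name) for name in sorted(os.listdir(path)))
        return [entry for entry in entries if os.path.isfile(entry)]
    if os.path.isfile(path):
        return [path]
    raise FileNotFoundError(f"user_dicts path does not exist: {path}")
```

Each returned path could then be passed to `jieba.load_userdict`, which loads one dictionary file at a time.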
  4. Train the model by running:

    If you specify a project name in the configuration file, the model is saved at /models/your_project_name.

    Otherwise, the model is saved at /models/default.

python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models
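
After training, it can be handy to look up the newest model name for use in a parse request. The sketch below assumes the layout suggested by this README: one folder per project under `--path`, containing model folders named like `model_20170921-170911` (the name format is inferred from the curl example further down). `latest_model` is a hypothetical helper, not part of rasa_nlu.

```python
import os
from typing import Optional


def latest_model(models_path: str, project: str = "default") -> Optional[str]:
    """Return the newest `model_*` folder name for a project, or None.

    Assumes names embed a sortable timestamp (model_YYYYMMDD-HHMMSS),
    so the lexicographically largest name is the most recent.
    """
    project_dir = os.path.join(models_path, project)
    if not os.path.isdir(project_dir):
        return None
    names = [name for name in os.listdir(project_dir)
             if name.startswith("model_")
             and os.path.isdir(os.path.join(project_dir, name))]
    return max(names) if names else None
```

For the training command above, `latest_model("models")` would look inside `models/default`.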
  5. Run the rasa_nlu server:
python -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models
  6. Open a new terminal. You can now curl results from the server, for example:
$ curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "rasa_nlu_test", "model": "model_20170921-170911"}' | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   652    0   552  100   100    157     28  0:00:03  0:00:03 --:--:--   157
{
    "entities": [
        {
            "end": 3,
            "entity": "disease",
            "extractor": "ner_mitie",
            "start": 1,
            "value": "发烧"
        }
    ],
    "intent": {
        "confidence": 0.5397186422631861,
        "name": "medical"
    },
    "intent_ranking": [
        {
            "confidence": 0.5397186422631861,
            "name": "medical"
        },
        {
            "confidence": 0.16206323981749196,
            "name": "restaurant_search"
        },
        {
            "confidence": 0.1212448457737397,
            "name": "affirm"
        },
        {
            "confidence": 0.10333600028547868,
            "name": "goodbye"
        },
        {
            "confidence": 0.07363727186010374,
            "name": "greet"
        }
    ],
    "text": "我发烧了该吃什么药?"
}
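
The same request can be issued from Python. This is a minimal sketch using only the standard library; `parse_text` and `summarize` are hypothetical helper names, and the URL and payload fields simply mirror the curl call above.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional, Tuple


def parse_text(text: str, project: Optional[str] = None,
               model: Optional[str] = None,
               url: str = "http://localhost:5000/parse") -> Dict[str, Any]:
    """POST a query to a running rasa_nlu server and return the decoded JSON."""
    payload: Dict[str, Any] = {"q": text}
    if project:
        payload["project"] = project
    if model:
        payload["model"] = model
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(result: Dict[str, Any]) -> Tuple[str, float, List[Tuple[str, str]]]:
    """Reduce a parse result to (intent name, confidence, [(entity, value)])."""
    intent = result.get("intent") or {}
    entities = [(e["entity"], e["value"]) for e in result.get("entities", [])]
    return intent.get("name", ""), intent.get("confidence", 0.0), entities
```

Calling `summarize(parse_text("我发烧了该吃什么药?"))` against a running server returns the top intent, its confidence and the extracted entities in one tuple, which is usually all that downstream dialogue code needs.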