GaoQ1 / Rasa_nlu_gq

Licence: apache-2.0

Turn natural language into structured data (supports Chinese; includes a number of custom models for different scenarios and tasks)

Programming Languages

python

Labels

tensorflow nlp nlu natural-language jieba

Projects that are alternatives to, or similar to, Rasa_nlu_gq

Nlp Recipes

Natural Language Processing Best Practices & Examples

Stars: ✭ 5,783 (+2158.98%)

Mutual labels: nlu, natural-language

gdpr-fingerprint-pii

Use Watson Natural Language Understanding and Watson Knowledge Studio to fingerprint personal data from unstructured documents

Stars: ✭ 49 (-80.86%)

Mutual labels: natural-language, nlu

fountain

Natural Language Data Augmentation Tool for Conversational Systems

Stars: ✭ 113 (-55.86%)

Mutual labels: natural-language, nlu

Botlibre

An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.

Stars: ✭ 412 (+60.94%)

Mutual labels: nlu, natural-language

watson-document-classifier

Augment IBM Watson Natural Language Understanding APIs with a configurable mechanism for text classification, uses Watson Studio.

Stars: ✭ 41 (-83.98%)

Mutual labels: natural-language, nlu

fillers

List of (possible) English filler words

Stars: ✭ 36 (-85.94%)

Mutual labels: natural-language

nli-go

Natural Language Interface in GO, a semantic parser and execution engine.

Stars: ✭ 20 (-92.19%)

Mutual labels: natural-language

virtual-assistant

Virtual Assistant

Stars: ✭ 67 (-73.83%)

Mutual labels: nlu

nlp-dialogue

A full-process dialogue system that can be deployed online

Stars: ✭ 69 (-73.05%)

Mutual labels: nlu

News Search Engine

News search engine

Stars: ✭ 254 (-0.78%)

Mutual labels: jieba

retext-profanities

plugin to check for profane and vulgar wording

Stars: ✭ 34 (-86.72%)

Mutual labels: natural-language

sepia-docs

Documentation and Wiki for SEPIA. Please post your questions and bug-reports here in the issues section! Thank you :-)

Stars: ✭ 160 (-37.5%)

Mutual labels: nlu

spokestack-android

Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!

Stars: ✭ 52 (-79.69%)

Mutual labels: nlu

opensnips

Open source projects related to Snips https://snips.ai/.

Stars: ✭ 50 (-80.47%)

Mutual labels: nlu

array-to-sentence

Join all elements of an array and create a human-readable string

Stars: ✭ 32 (-87.5%)

Mutual labels: natural-language

expando

A simple syntax for defining the NLU model for a conversational interface.

Stars: ✭ 36 (-85.94%)

Mutual labels: nlu

apertium-html-tools

Web application providing a fully localised interface for text/website/document translation, analysis and generation powered by Apertium.

Stars: ✭ 36 (-85.94%)

Mutual labels: natural-language

lancaster-stemmer

Lancaster stemming algorithm

Stars: ✭ 22 (-91.41%)

Mutual labels: natural-language

react-taggy

A simple zero-dependency React component for tagging user-defined entities within a block of text.

Stars: ✭ 29 (-88.67%)

Mutual labels: natural-language

rita

Website, documentation and examples for RiTa

Stars: ✭ 42 (-83.59%)

Mutual labels: natural-language

Rasa NLU GQ

Rasa NLU (Natural Language Understanding) is a tool for understanding natural language. To borrow the example from the official site, it takes a sentence such as:

"I'm looking for a Mexican restaurant in the center of town"

and returns structured data like:

  intent: search_restaurant
  entities: 
    - cuisine : Mexican
    - location : center

Introduction

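The original project lives on the 0.2.7 branch, which you can switch to freely. This version is based on the latest rasa: the components from the original rasa_nlu_gao were adapted, and no new ones were added. The earlier approach was also somewhat cumbersome, since there is no need to modify the rasa source code. The original components can be loaded directly as add-ons that inherit from the latest rasa, so they stay up to date with it.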
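
In practice, loading a component as an add-on means naming it in the pipeline by its full module path, as the configurations in this README do (for example rasa_nlu_gao.extractors.jieba_pseg_extractor.JiebaPsegExtractor). The sketch below shows the general idea of resolving such a dotted path to a class in Python. It is only an illustration, not rasa's actual loader: load_component_class is a hypothetical helper, and a standard-library class stands in for a real component so the snippet runs without rasa installed.

```python
import importlib


def load_component_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object (hypothetical helper)."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        # A bare name such as "JiebaTokenizer" is a built-in component name,
        # not a module path, so there is nothing to import here.
        raise ValueError(f"expected a full module path, got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# A standard-library class stands in for a real add-on component.
component_class = load_component_class("collections.OrderedDict")
print(component_class.__name__)
```

Because the class is looked up by name at load time, the add-on package only needs to be importable; the rasa source tree itself stays untouched.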

New features

The features added so far are listed below (please download the latest rasa-nlu-gao version) (edited 2019.06.24):

Added two entity recognition models: bilstm+crf and idcnn+crf (a dilated convolution model). The corresponding yml configuration is:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CountVectorsFeaturizer"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    lr: 0.001
    char_dim: 100
    lstm_dim: 100
    batches_per_epoch: 10
    seg_dim: 20
    num_segs: 4
    batch_size: 200
    tag_schema: "iobes"
    model_type: "bilstm" # two model types are supported: "idcnn" (dilated convolution) or "bilstm" (bidirectional LSTM)
    clip: 5
    optimizer: "adam"
    dropout_keep: 0.5
    steps_check: 100
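
The tag_schema value "iobes" above presumably refers to the standard IOBES tagging scheme, which extends IOB with S- (single-token entity) and E- (end of entity) prefixes. As a minimal sketch under that assumption, the function below converts IOB2 tags to IOBES. The name iob_to_iobes and the sample labels are illustrative and are not taken from the project's code.

```python
def iob_to_iobes(tags):
    """Convert IOB2 tags (B-X, I-X, O) to IOBES by adding S-X and E-X."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            converted.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            converted.append(f"I-{label}" if continues else f"E-{label}")
    return converted


iob = ["O", "B-location", "I-location", "I-location", "O", "B-name"]
print(iob_to_iobes(iob))
# ['O', 'B-location', 'I-location', 'E-location', 'O', 'S-name']
```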

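Added a jieba part-of-speech tagging module, which makes it easy to recognize person names, place names, organization names, and any other part of speech that jieba supports. The corresponding yml configuration is: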

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CRFEntityExtractor"
  - name: "rasa_nlu_gao.extractors.jieba_pseg_extractor.JiebaPsegExtractor"
    part_of_speech: ["nr", "ns", "nt"]
  - name: "CountVectorsFeaturizer"
    OOV_token: oov
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
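
The part_of_speech list above selects which jieba tags become entities: "nr" (person name), "ns" (place name), and "nt" (organization name). The sketch below illustrates that filtering step on already-tagged words. It is an assumption about the general approach rather than the extractor's actual code: extract_by_pos is a hypothetical helper, the sample sentence and its tags are written by hand, and the entity dictionary keys simply follow the usual rasa entity layout.

```python
def extract_by_pos(tagged_words, wanted_flags):
    """Return entity dicts for words whose part-of-speech flag is wanted.

    tagged_words is a sequence of (word, flag) pairs that together cover
    the text with no gaps, so character offsets can be accumulated.
    """
    entities = []
    offset = 0
    for word, flag in tagged_words:
        end = offset + len(word)
        if flag in wanted_flags:
            entities.append(
                {"entity": flag, "value": word, "start": offset, "end": end}
            )
        offset = end
    return entities


# Hand-written tags for "王小明在北京工作"; real tags would come from a tagger.
tagged = [("王小明", "nr"), ("在", "p"), ("北京", "ns"), ("工作", "v")]
for entity in extract_by_pos(tagged, {"nr", "ns", "nt"}):
    print(entity)
```

With jieba installed, the (word, flag) pairs could be produced from jieba.posseg.cut(text), whose items expose .word and .flag attributes.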

Added the ability to revise the predicted intent based on the extracted entities. The corresponding configuration is:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CRFEntityExtractor"
  - name: "JiebaPsegExtractor"
  - name: "CountVectorsFeaturizer"
    OOV_token: oov
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
  - name: "rasa_nlu_gao.classifiers.entity_edit_intent.EntityEditIntent"
    entity: ["nr"]
    intent: ["enter_data"]
    min_confidence: 0

Added a BERT model for extracting word-vector features. The corresponding configuration is:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "EmbeddingIntentClassifier"
  - name: "CRFEntityExtractor"

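Added configuration for CPU and GPU utilization, mainly for EmbeddingIntentClassifier and ner_bilstm_crf, the two components that use tensorflow. The configuration is as follows (config_proto is optional; by default all available resources are used):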

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CountVectorsFeaturizer"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }
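
The pipeline above, like several earlier ones, overrides token_pattern for CountVectorsFeaturizer. A likely reason is that the common default pattern (scikit-learn's CountVectorizer uses `(?u)\b\w\w+\b`) only keeps tokens of two or more characters, which would discard the single-character words that are frequent in Chinese. The sketch below compares the two patterns with Python's re module. It assumes the tokenizer output is joined with spaces before vectorizing, and the sample sentence is illustrative.

```python
import re

DEFAULT_PATTERN = r"(?u)\b\w\w+\b"  # scikit-learn default: two or more characters
CONFIG_PATTERN = r"(?u)\b\w+\b"     # pattern from the pipeline: one or more characters


def tokens(pattern, text):
    """Return every substring of text matched by the token pattern."""
    return re.findall(pattern, text)


text = "我 想 去 北京"  # space-joined tokens, as a tokenizer might produce
print(tokens(DEFAULT_PATTERN, text))  # ['北京']
print(tokens(CONFIG_PATTERN, text))   # ['我', '想', '去', '北京']
```

In the yml file the pattern is best written in single quotes, because YAML treats a backslash inside a double-quoted string as the start of an escape sequence.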

Added the embedding_bert_intent_classifier classifier. The corresponding configuration is:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "rasa_nlu_gao.classifiers.embedding_bert_intent_classifier.EmbeddingBertIntentClassifier"
  - name: "CRFEntityExtractor"

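With BERT supplying the base word vectors, the downstream classifier is built with tensorflow's high-level APIs (tf.estimator, tf.data, tf.example, tf.saved_model). This is the intent_estimator_classifier_tensorflow_embedding_bert classifier. The corresponding configuration is: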

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "rasa_nlu_gao.classifiers.embedding_bert_intent_estimator_classifier.EmbeddingBertIntentEstimatorClassifier"
  - name: "SpacyNLP"
  - name: "CRFEntityExtractor"

The ultimate form of rasa-nlu: the corresponding configuration is as follows (edited 2019.10.01); see the article above for reference.

Quick Install

pip install rasa-nlu-gao

Some Examples

For concrete examples, see rasa_chatbot_cn.

Stars: ✭ 256
