Building a Chinese Natural Language Understanding Service with Rasa NLU

Published 2018-12-12 07:07:42


Introduction

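About a year ago I started following Rasa NLU (see the article "Rasa NLU, an open-source natural language understanding framework"). While keeping an eye on it, I noticed that someone had already trained a Chinese language model based on spaCy. Following that lead, I found Rasa NLU Chi and the blog post "Building your own Chinese NLU system with Rasa_NLU", and decided to try setting up a Chinese NLU service on Windows. Below are the detailed steps, along with fixes for the problems I ran into.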

Environment

OS: Windows 10
WSL distribution: Ubuntu 18.04 LTS
Python version: 3.6.7
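Before installing anything, it can be worth confirming that you are inside WSL and on the same Python minor version used here. The following is a minimal sketch. The WSL check is a heuristic that assumes the kernel release string contains "Microsoft" or "microsoft", which is what WSL kernels report.

```python
import platform
import sys

# Python version used in this walkthrough (3.6.7); only major.minor is compared.
TESTED_PYTHON = (3, 6)


def python_matches(version_info=sys.version_info, expected=TESTED_PYTHON):
    """Return True if major.minor of version_info equals expected."""
    return tuple(version_info[:2]) == expected


def running_under_wsl(release=None):
    """Heuristic: WSL kernels include 'Microsoft'/'microsoft' in the release string."""
    if release is None:
        release = platform.release()
    return "microsoft" in release.lower()


if __name__ == "__main__":
    print("Python", platform.python_version(), "matches 3.6:", python_matches())
    print("Running under WSL:", running_under_wsl())
```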

Installation


1. Install WSL

Open the Microsoft Store, search for WSL, and install Ubuntu 18.04 LTS.

2. Switch the apt sources to a mirror

https://www.linuxidc.com/Linux/2018-08/153709.htm

3. Install Python and pip

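Ubuntu 18.04 LTS on WSL may not have Python out of the box, and it does not ship pip, so install them yourself.

Install Python with: sudo apt-get install python3

For installing pip, see https://www.linuxidc.com/Linux/2018-05/152390.htm (on Ubuntu, sudo apt-get install python3-pip also works).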

4. Point pip at a mirror in China; here the Tsinghua University (TUNA) mirror is used

On Linux, edit ~/.pip/pip.conf (create it if it does not exist) and set index-url to the TUNA mirror:

[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
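If you prefer to script this step, here is a minimal sketch using the standard-library `configparser`. It sets `[global] index-url` and keeps any other settings already in the file. The helper names `render_pip_conf` and `write_pip_conf` are my own, not part of pip. Newer pip versions also read `~/.config/pip/pip.conf`, but the legacy `~/.pip/pip.conf` path used above still works.

```python
import configparser
import io
import os

TUNA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def render_pip_conf(existing_text="", index_url=TUNA_INDEX):
    """Return pip.conf text with [global] index-url set, keeping other settings."""
    config = configparser.ConfigParser()
    config.read_string(existing_text)
    if not config.has_section("global"):
        config.add_section("global")
    config.set("global", "index-url", index_url)
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


def write_pip_conf(path="~/.pip/pip.conf", index_url=TUNA_INDEX):
    """Create or update the pip.conf at path; returns the expanded path."""
    path = os.path.expanduser(path)
    existing = ""
    if os.path.exists(path):
        with open(path) as fh:
            existing = fh.read()
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(render_pip_conf(existing, index_url))
    return path


if __name__ == "__main__":
    print("wrote", write_pip_conf())
```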

5. Install rasa_core, scikit-learn and MITIE

Install rasa_core; this also installs rasa_nlu, which now supports Chinese:

pip3 install rasa_core

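Then install scikit-learn (plus sklearn-crfsuite) and MITIE. Building MITIE from source needs cmake and a C++ compiler (e.g. sudo apt-get install build-essential cmake):

pip3 install -U scikit-learn sklearn-crfsuite
pip3 install git+https://github.com/mit-nlp/MITIE.git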
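You can quickly check that everything installed above is importable without actually importing the heavy packages. The sketch below uses `importlib.util.find_spec`. The module names are the import names, which I am assuming here: `sklearn` for scikit-learn, `sklearn_crfsuite` for sklearn-crfsuite, and `mitie` for MITIE.

```python
import importlib.util

# Import names (not pip names) of the packages installed in this step.
REQUIRED_MODULES = ["rasa_core", "rasa_nlu", "sklearn", "sklearn_crfsuite", "mitie"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all packages importable" if not missing else "missing: " + ", ".join(missing))
```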

6. Clone Rasa_NLU_Chi

git clone https://github.com/crownpku/Rasa_NLU_Chi

7. Download the pre-trained model file and copy it into ~/Rasa_NLU_Chi/data

Link: https://pan.baidu.com/s/1kNENvlHLYWZIddmtWJ7Pdg  Password: p4vx
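A missing or truncated download only shows up later as a training error, so a quick existence and size check can save time. This is a minimal sketch with two assumptions. First, the clone lives in `~/Rasa_NLU_Chi`. Second, the downloaded file is the MITIE Chinese feature extractor `total_word_feature_extractor_zh.dat`. Compare the name against the `mitie_file` entry in your config, since this post does not name the file.

```python
import os

# ASSUMED path and file name; check the mitie_file entry in your config.
MODEL_PATH = "~/Rasa_NLU_Chi/data/total_word_feature_extractor_zh.dat"


def check_model_file(path=MODEL_PATH):
    """Return (exists, size_in_bytes) for the downloaded model file."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return False, 0
    return True, os.path.getsize(path)


if __name__ == "__main__":
    exists, size = check_model_file()
    print("model file found, %d bytes" % size if exists else "model file missing")
```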

8. Train the NLU model (run this from inside ~/Rasa_NLU_Chi)

python3 -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models
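To rerun training from a script (for example after changing the training data), the same command can be wrapped with `subprocess`. This is a sketch under the following assumptions. The repo was cloned to `~/Rasa_NLU_Chi`. The config and data paths are the underscored file names from the Rasa_NLU_Chi repository. `sys.executable` is the interpreter that has rasa_nlu installed. The helper names `train_command` and `run_training` are hypothetical.

```python
import os
import subprocess
import sys


def train_command(config="sample_configs/config_jieba_mitie_sklearn.yml",
                  data="data/examples/rasa/demo-rasa_zh.json",
                  path="models",
                  python=sys.executable):
    """Build the argv for `python -m rasa_nlu.train` with the flags used in step 8."""
    return [python, "-m", "rasa_nlu.train",
            "-c", config, "--data", data, "--path", path]


def run_training(repo_dir="~/Rasa_NLU_Chi", **kwargs):
    """Run training inside the cloned repo; raises CalledProcessError on failure."""
    cmd = train_command(**kwargs)
    subprocess.run(cmd, cwd=os.path.expanduser(repo_dir), check=True)


if __name__ == "__main__":
    run_training()
```

Passing an argv list rather than a shell string avoids quoting problems with paths.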

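Training prints the following warning. From what I could find, it is harmless: it only means that, for some intent labels, no samples were predicted during evaluation, so their F-score is set to 0.0.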

/home/sh/.local/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples. 'precision', 'predicted', average, warn_for)
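My guess at where this comes from is the scikit-learn intent classifier's cross-validated hyperparameter search. With a small demo dataset, some folds contain intents that the model never predicts, so precision for those labels is 0/0. scikit-learn then reports an F-score of 0.0 and emits this warning. The pure-Python sketch below mirrors that rule for a single label. It illustrates the warning and is not scikit-learn's actual code.

```python
def per_label_f1(y_true, y_pred, label):
    """F1 for one label; 0.0 when precision or recall is 0/0 (sklearn's default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)  # precision denominator
    actual = sum(1 for t in y_true if t == label)     # recall denominator
    if predicted == 0 or actual == 0:
        # Ill-defined case that triggers UndefinedMetricWarning in scikit-learn.
        return 0.0
    precision = tp / predicted
    recall = tp / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = ["greet", "medical", "goodbye"]
    y_pred = ["greet", "medical", "medical"]  # "goodbye" is never predicted
    for label in ("greet", "medical", "goodbye"):
        print(label, per_label_f1(y_true, y_pred, label))
```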

9. Start the server

python3 -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models
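When the server is started from a script, it helps to wait until it is listening before sending requests. The sketch below polls with plain TCP connections. It assumes the server listens on localhost:5000, the address used in the requests in this post. `wait_for_port` is my own helper, not part of Rasa.

```python
import socket
import time


def wait_for_port(host="localhost", port=5000, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # connection refused, timed out, etc.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    print("server up" if wait_for_port() else "server not reachable")
```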

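10. Sending a request returned an error. It turned out to be caused by a conflict between the newer scikit-learn and the Twisted framework.

I fixed it by following the approach in this issue:

https://github.com/crownpku/Rasa_NLU_Chi/issues/73

The failing request: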

curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "default", "model": "model_20181212-141830"}' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138    0    44  100    94     22     47  0:00:02  0:00:01  0:00:01    69
{
  "error": "bad value(s) in fds_to_keep"
}
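For scripting, the same request can be sent from Python with only the standard library. The sketch below builds the same JSON body as the curl command (`q`, `project`, `model`). Sending `Content-Type: application/json` is my own choice; curl's `-d` does not send it. Use a generous timeout: the successful curl request in the next step took about 8 seconds. `build_parse_request` and `parse` are hypothetical helper names.

```python
import json
import urllib.request


def build_parse_request(text, project="default", model=None,
                        url="http://localhost:5000/parse"):
    """Build a POST /parse request with the same JSON body as the curl example."""
    body = {"q": text, "project": project}
    if model is not None:
        body["model"] = model
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST",
                                  headers={"Content-Type": "application/json"})


def parse(text, timeout=60, **kwargs):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_parse_request(text, **kwargs), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(json.dumps(parse("我发烧了该吃什么药?"), ensure_ascii=False, indent=2))
```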

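11. After applying the fix, retrain the model and send the request again (the model name now carries the new timestamp):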

curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "default", "model": "model_20181212-144335"}' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   813    0   719  100    94     89     11  0:00:08  0:00:08 --:--:--   188
{
  "intent": {
    "name": "medical",
    "confidence": 0.500349594001549
  },
  "entities": [
    {
      "entity": "disease",
      "value": "发烧",
      "start": 1,
      "end": 3,
      "confidence": null,
      "extractor": "ner_mitie"
    }
  ],
  "intent_ranking": [
    {
      "name": "medical",
      "confidence": 0.500349594001549
    },
    {
      "name": "restaurant_search",
      "confidence": 0.1943808602354033
    },
    {
      "name": "affirm",
      "confidence": 0.1207785366984987
    },
    {
      "name": "goodbye",
      "confidence": 0.11322856287182206
    },
    {
      "name": "greet",
      "confidence": 0.07126244619272717
    }
  ],
  "text": "我发烧了该吃什么药?"
}
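In the response, "intent" holds the top-ranked intent; here that is medical, at about 0.50 confidence. Each entity's `start`/`end` appear to be character offsets into `text` with `end` exclusive: "发烧" is `text[1:3]`. The sketch below pulls out the useful parts and checks that reading of the offsets. The helper names are my own.

```python
def summarize(response):
    """Return (intent name, confidence, [(entity, value), ...]) from a /parse response."""
    intent = response.get("intent") or {}
    entities = [(e["entity"], e["value"]) for e in response.get("entities", [])]
    return intent.get("name"), intent.get("confidence"), entities


def spans_match(response):
    """Check that text[start:end] equals each entity's value (end exclusive)."""
    text = response["text"]
    return all(text[e["start"]:e["end"]] == e["value"]
               for e in response.get("entities", []))
```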