
Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification; it includes Word2Vec, BERT, and GPT2 language embeddings.

Stars: ✭ 2,235 (+893.33%)

Mutual labels: seq2seq, ner

Nlp pytorch project

Embedding, NMT, Text_Classification, Text_Generation, NER etc.

Stars: ✭ 153 (-32%)

Mutual labels: seq2seq, ner

Min nlp practice

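Chinese and English word segmentation (CWS), POS tagging, and named entity recognition, implemented with a CNN, bidirectional LSTM, and CRF model over character embeddings. The network can perform Chinese and English word segmentation, POS tagging, and entity recognition in a single model. The repository mainly includes raw text data, data conversion, training scripts, and pretrained models, and can be used for sequence labeling research. Note: the only logic you need to implement is converting your own data into the sequence model's input format. Word segmentation accuracy is about 93%, POS tagging accuracy about 90%, and entity tagging accuracy (on this sample) about 85%.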

Stars: ✭ 107 (-52.44%)

Mutual labels: pos, ner

Bert seq2seq

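A PyTorch implementation of BERT for seq2seq tasks using the UniLM approach. It now also handles automatic summarization, text classification, sentiment analysis, NER, POS tagging, and other tasks, and supports GPT2 for article continuation.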

Stars: ✭ 298 (+32.44%)

Mutual labels: seq2seq, ner

Nlp Papers

Papers and Book to look at when starting NLP 📚

Stars: ✭ 111 (-50.67%)

Mutual labels: pos, ner

Jiagu

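Jiagu is a deep learning natural language processing toolkit: knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering.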

Stars: ✭ 2,368 (+952.44%)

Mutual labels: pos, ner

Monpa

MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, POS tagging, and named entity recognition.

Stars: ✭ 203 (-9.78%)

Mutual labels: pos, ner

Deep Time Series Prediction

Seq2Seq, Bert, Transformer, WaveNet for time series prediction.

Stars: ✭ 183 (-18.67%)

Mutual labels: seq2seq

Kospeech

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.

Stars: ✭ 190 (-15.56%)

Mutual labels: seq2seq

Screenshot To Code

A neural network that transforms a design mock-up into a static website.

Stars: ✭ 13,561 (+5927.11%)

Mutual labels: seq2seq

Headliner

🏖 Easy training and deployment of seq2seq models.

Stars: ✭ 221 (-1.78%)

Mutual labels: seq2seq

Persian Ner

A large tagged corpus for Persian named entity recognition.

Stars: ✭ 183 (-18.67%)

Mutual labels: ner

Pytorch Beam Search Decoding

PyTorch implementation of beam search decoding for seq2seq models

Stars: ✭ 204 (-9.33%)

Mutual labels: seq2seq

Bert Sklearn

a sklearn wrapper for Google's BERT model

Stars: ✭ 182 (-19.11%)

Mutual labels: ner

Deeptoxic

top 1% solution to toxic comment classification challenge on Kaggle.

Stars: ✭ 180 (-20%)

Mutual labels: pos

Tgen

Statistical NLG for spoken dialogue systems

Stars: ✭ 179 (-20.44%)

Mutual labels: seq2seq

Pymystem3

A Python wrapper of the Yandex Mystem 3.1 morphological analyzer (http://api.yandex.ru/mystem). The original tool is shipped as a binary, and this library makes it easy to integrate into Python projects. Let us know in the issues if you would like to be involved in the development or maintenance of this project. If you have a fix or suggestion, please make a pull request. We are very open to contributions.

Stars: ✭ 224 (-0.44%)

Mutual labels: pos


NLP-tools

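This project aims to implement a character-level sequence labeling model in TensorFlow, based on BiLSTM+CRF.

Features:

1. Recognition of out-of-vocabulary characters (words)

2. HTTP interface

3. Quick implementation of sequence labeling models such as word segmentation, POS tagging, NER, and SRL

Feedback and criticism are welcome.

Instructions

Environment setup: create a new conda environment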

 $ conda env create -f environment.yaml

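Corpus processing

Different annotated corpora use different formats and need additional processing. example/DataPreprocessing.ipynb shows the preprocessing steps for the People's Daily 2014 corpus (the corpus itself is not uploaded to GitHub; only a few samples are in corpus. It can be found online; if you cannot find it, email me). Corpus format: 人民网/nz 1月4日/t 讯/ng 据/p [法国/nsf 国际/n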
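The corpus format above can be turned into character-level segmentation tags roughly as follows. This is a minimal sketch, not the code from example/DataPreprocessing.ipynb: the function names `parse_line` and `to_bmes` are hypothetical, the B/M/E/S tag set is an assumed (common) choice, and the handling of a closing `]` plus compound tag is an assumption about how bracketed spans end, since the sample only shows the opening `[`.

```python
from typing import List, Tuple


def parse_line(line: str) -> List[Tuple[str, str]]:
    """Split a 'word/tag word/tag ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        token = token.lstrip("[")      # opening bracket of a compound span
        token = token.split("]")[0]    # assumption: drop ']' and the compound tag
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            continue                   # skip tokens that are not 'word/tag'
        pairs.append((word, tag))
    return pairs


def to_bmes(pairs: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Expand words into characters with B/M/E/S segmentation tags."""
    chars: List[str] = []
    tags: List[str] = []
    for word, _pos in pairs:
        chars.extend(word)
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return chars, tags
```

Calling `to_bmes(parse_line(...))` on a corpus line yields two parallel lists, one character per position and one tag per character, which is the shape a character-level sequence labeling model expects.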

Preprocessing generates the word2id dictionary and the training data in data/xx.pkl.
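A word2id dictionary of this kind might be built along the following lines. This is a sketch under stated assumptions: the special tokens `<PAD>` and `<UNK>`, the `min_count` parameter, and the function names are all hypothetical, and the actual layout of data/xx.pkl is not described in the source.

```python
import pickle
from collections import Counter
from typing import Dict, Iterable, List


def build_word2id(sentences: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each character to an integer id, most frequent first."""
    counts = Counter(ch for sentence in sentences for ch in sentence)
    word2id = {"<PAD>": 0, "<UNK>": 1}   # assumed special tokens
    for ch, n in counts.most_common():
        if n >= min_count:
            word2id[ch] = len(word2id)
    return word2id


def encode(sentence: str, word2id: Dict[str, int]) -> List[int]:
    """Convert characters to ids; unseen characters map to <UNK>."""
    unk = word2id["<UNK>"]
    return [word2id.get(ch, unk) for ch in sentence]


def save_dict(word2id: Dict[str, int], path: str) -> None:
    """Pickle the dictionary to the given (caller-chosen) path."""
    with open(path, "wb") as f:
        pickle.dump(word2id, f)
```

Mapping unseen characters to a dedicated id is one simple way to keep the model usable on out-of-vocabulary input; the project may handle this differently.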

Model training

 $ python train.py 
 [-h] [--dict_path DICT_PATH] [--train_data TRAIN_DATA]
      [--ckpt_path CKPT_PATH] [--embed_size EMBED_SIZE]
      [--hidden_size HIDDEN_SIZE] [--batch_size BATCH_SIZE] 
      [--epoch EPOCH] [--lr LR]
      [--save_path SAVE_PATH]

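Training writes checkpoints to SAVE_PATH; CKPT_PATH is used to fine-tune the model.

Default model hyperparameters

Embedding vector size: 256
Number of BiLSTM layers: 2
Hidden units: 512
Batch size: 128
Initial learning rate: 1e-3 (needs tuning for different tasks)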
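For orientation, the flags in the usage message and the defaults listed above could be declared with argparse roughly like this. It is an illustrative sketch, not the contents of train.py: the flag names come from the usage text, the four numeric defaults come from the list above, and everything else (the `None` defaults for paths and epochs, the help strings) is an assumption. The number of BiLSTM layers is omitted because the usage message shows no flag for it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the training flags shown in the usage message."""
    parser = argparse.ArgumentParser(prog="train.py")
    # Paths: defaults are not stated in the source, so None is a placeholder.
    parser.add_argument("--dict_path", default=None, help="word2id dictionary")
    parser.add_argument("--train_data", default=None, help="training data file")
    parser.add_argument("--ckpt_path", default=None, help="checkpoint to fine-tune from")
    parser.add_argument("--save_path", default=None, help="where checkpoints are written")
    # Hyperparameters: defaults taken from the list above.
    parser.add_argument("--embed_size", type=int, default=256)
    parser.add_argument("--hidden_size", type=int, default=512)
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--lr", type=float, default=1e-3)
    # The default number of epochs is not stated in the source.
    parser.add_argument("--epoch", type=int, default=None)
    return parser
```

With a parser like this, `build_parser().parse_args(["--lr", "5e-4"])` overrides only the learning rate and leaves the other defaults in place.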

Model testing

A model testing example is in Modeltest.ipynb.

HTTP interface

A simple web server

 $ python app.py

Once the server is running, test it locally with the command below (the quoting differs between Linux and Windows):

 $ curl -i -H "Content-Type: application/json" -X POST -d '{"text":"\u5f20\u51cc\u745e\u3002"}' http://localhost:7777/cws
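The same request can be sent from Python, which avoids the shell quoting differences between platforms. This is a sketch using only the standard library; the URL, port, and `/cws` path are taken from the curl command above, while the function names are hypothetical and the response is returned as raw text because its format is not documented here.

```python
import json
import urllib.request


def build_cws_request(text: str,
                      base_url: str = "http://localhost:7777") -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    # json.dumps escapes non-ASCII characters as \uXXXX by default,
    # which matches the payload in the curl example.
    payload = json.dumps({"text": text}).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/cws",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def segment(text: str) -> str:
    """Send the request to a running server and return the raw response body."""
    with urllib.request.urlopen(build_cws_request(text)) as response:
        return response.read().decode("utf-8")
```

With app.py running locally, `segment("张凌瑞。")` sends the same text as the curl example.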

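Current status

On the People's Daily corpus, word segmentation reaches 97% accuracy and POS tagging reaches 96% accuracy.

In tests of the model trained on more than a hundred million sentences, with the CWS, POS, and NER tags merged into a single end-to-end fused tag, overall accuracy reaches 96%. The model also recognizes out-of-vocabulary characters (words) well and captures semantic information.

(Some examples are listed in Modeltest.ipynb.)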
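One way to picture a fused tag is to join each character's segmentation position with the POS tag of the word it belongs to, so that a single label carries both decisions. The sketch below illustrates that idea only: the `B-nz` style label format and the function name `fuse_labels` are assumptions, and the source does not specify the project's actual label scheme or how NER tags are folded in.

```python
from typing import List, Tuple


def fuse_labels(pairs: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Turn (word, pos) pairs into characters with fused position-POS labels."""
    chars: List[str] = []
    labels: List[str] = []
    for word, pos in pairs:
        if len(word) == 1:
            positions = ["S"]
        else:
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        for ch, position in zip(word, positions):
            chars.append(ch)
            labels.append(f"{position}-{pos}")   # e.g. 'B-nz'
    return chars, labels
```

Decoding the fused label sequence then recovers word boundaries and POS tags together, at the cost of a larger label set.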

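I have recently been studying Google's remarkable BERT. A BERT-based sequence labeling training module will be added later so that the model can transfer across domains.

References

The BiLSTM+CRF model in this project follows the paper: http://www.aclweb.org/anthology/N16-1030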
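As a rough illustration of the CRF half of that architecture, the decoding step picks the tag sequence that maximizes the sum of per-position scores and tag-to-tag transition scores, which is typically done with the Viterbi algorithm. The sketch below is a plain-Python version written for this page, not the project's TensorFlow code; the argument names and the list-of-lists score layout are assumptions.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at position t.
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
        backpointers.append(pointers)
        score = new_score
    # Follow the back-pointers from the best final tag.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

The transition scores are what let the model rule out invalid tag orders, such as an E tag directly following an S tag, even when the per-character scores alone would allow them.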


Stars: ✭ 225
