

howl-anderson / Chinese_models_for_spacy

License: MIT

SpaCy Chinese Models | Models for SpaCy that support Chinese

Labels

jupyter-notebook nlp nlp-machine-learning chinese-nlp

Projects that are alternatives to or similar to Chinese models for SpaCy

Seq2seq tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

Stars: ✭ 132 (-75.69%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Nlp profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level/granular statistical information about the text in that column.

Stars: ✭ 181 (-66.67%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Gossiping Chinese Corpus

Chinese Q&A corpus from the PTT Gossiping board

Stars: ✭ 137 (-74.77%)

Mutual labels: jupyter-notebook, chinese-nlp

Codesearchnet

Datasets, tools, and benchmarks for representation learning of code.

Stars: ✭ 1,378 (+153.78%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Nemo

NeMo: a toolkit for conversational AI

Stars: ✭ 3,685 (+578.64%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Bertqa Attention On Steroids

BertQA - Attention on Steroids

Stars: ✭ 112 (-79.37%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Pytorch Question Answering

Important paper implementations for Question Answering using PyTorch

Stars: ✭ 154 (-71.64%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Natural Language Processing Specialization

This repo contains my coursework, assignments, and Slides for Natural Language Processing Specialization by deeplearning.ai on Coursera

Stars: ✭ 151 (-72.19%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Natural Language Processing With Tensorflow

Natural Language Processing with TensorFlow, published by Packt

Stars: ✭ 222 (-59.12%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Melusine

Melusine is a high-level library for email classification and feature extraction, "dedicated to French emails".

Stars: ✭ 222 (-59.12%)

Mutual labels: jupyter-notebook, nlp-machine-learning

News push project

Real Time News Scraping and Recommendation System - React | Tensorflow | NLP | News Scrapers

Stars: ✭ 44 (-91.9%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Dab

Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ

Stars: ✭ 294 (-45.86%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Coursera Natural Language Processing Specialization

Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.

Stars: ✭ 39 (-92.82%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Chinese Chatbot

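A Chinese chatbot trained on 100,000 dialogue pairs with an attention mechanism; it generates a meaningful reply to most general questions. The trained model has been uploaded and runs out of the box ("if it doesn't run, I'll eat my keyboard on a livestream").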

Stars: ✭ 124 (-77.16%)

Mutual labels: jupyter-notebook, chinese-nlp

Sdtm mapper

AI SDTM mapping (R for ML, Python, TensorFlow for DL)

Stars: ✭ 27 (-95.03%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Ktext

Utilities for preprocessing text for deep learning with Keras

Stars: ✭ 182 (-66.48%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Data Science Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas and more.

Stars: ✭ 273 (-49.72%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Hands On Nltk Tutorial

The hands-on NLTK tutorial for NLP in Python

Stars: ✭ 419 (-22.84%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Intro To Python

An intro to Python & programming for wanna-be data scientists

Stars: ✭ 536 (-1.29%)

Mutual labels: jupyter-notebook

Photomosaic

Creating fun photomosaics, GIFs, and murals from your family pictures using ML & similarity search

Stars: ✭ 540 (-0.55%)

Mutual labels: jupyter-notebook



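SpaCy's official Chinese models are now available (https://spacy.io/models/zh), so this project's mission of driving the development of Chinese models for SpaCy is complete. The project is now in maintenance mode and future updates will be bug fixes only. Many thanks to all users for their long-standing interest and support.

SpaCy Chinese Models

Chinese data models for SpaCy. The models are currently in public beta.

Online demo

An online demo based on Jupyter notebook is available.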

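Features

Some of the attributes of the Doc object for the sentence 王小明在北京的清华大学读书 ("Wang Xiaoming studies at Tsinghua University in Beijing"):

NER (New!)

Some of the NER information of the Doc object for 王小明在北京的清华大学读书: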
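The attribute and NER output described above can be inspected with a few lines of SpaCy code. Below is a minimal sketch, assuming the model has been installed and linked as `zh` as in the installation steps, and a SpaCy 2.x-style API. The helper names `token_table`, `entity_spans` and `analyze` are hypothetical and not part of this project. Per the TODO list, some values such as `pos_` and `is_stop` may not yet be correct.

```python
from typing import Any, List, Tuple


def token_table(doc: Any) -> List[Tuple[str, str, bool, bool]]:
    """Collect a few per-token attributes from a SpaCy Doc.

    Hypothetical helper: it only reads standard Token attributes
    (text, pos_, is_stop, is_oov).
    """
    return [(tok.text, tok.pos_, tok.is_stop, tok.is_oov) for tok in doc]


def entity_spans(doc: Any) -> List[Tuple[str, str, int, int]]:
    """List named entities as (text, label, start_char, end_char)."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


def analyze(text: str, model: str = "zh"):
    """Load the model (assumed alias "zh") and return tokens and entities."""
    # Imported lazily so the helpers above can be reused without SpaCy.
    import spacy

    nlp = spacy.load(model)
    doc = nlp(text)
    return token_table(doc), entity_spans(doc)
```

For example, `tokens, ents = analyze("王小明在北京的清华大学读书")` returns one row per token plus the recognized entity spans for the sentence used above.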

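Getting started

The models are distributed as binary files; users should have a basic working knowledge of SpaCy (version > 2).

System requirements

Python 3 (Python 2 may work, but has not been well tested)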

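Installation

Download the model

Download the model from the releases page (New! Accelerated download links are provided for users in China). The steps below assume the downloaded file is named zh_core_web_sm-2.x.x.tar.gz.

Install the model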

pip install zh_core_web_sm-2.x.x.tar.gz

To make the model easy to use later in frameworks such as Rasa NLU, also create a link for it by running the following command:

spacy link zh_core_web_sm zh

Once the command finishes, the model can be accessed through the alias zh.
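Shortcut links such as `zh` are a SpaCy 2.x feature; SpaCy 3 removed the `spacy link` command, and there the model can still be loaded by its package name, `zh_core_web_sm`. Below is a minimal sketch that tries the alias first and falls back to the package name. `load_chinese_model` and its `loader` parameter are hypothetical names used for illustration; `loader` exists only so the fallback logic can be exercised without SpaCy installed.

```python
from typing import Any, Callable, Iterable, Optional


def load_chinese_model(
    names: Iterable[str] = ("zh", "zh_core_web_sm"),
    loader: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Return the first model that loads, trying the `zh` alias first."""
    if loader is None:
        import spacy  # only needed for a real load

        loader = spacy.load
    errors = []
    for name in names:
        try:
            return loader(name)
        except OSError as err:  # spacy.load raises OSError for an unknown model name
            errors.append(f"{name}: {err}")
    raise OSError("no Chinese model could be loaded:\n" + "\n".join(errors))
```

Calling `nlp = load_chinese_model()` works whether or not the link step was run, as long as the package itself is installed.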

Run the demo code

The demo code is in test.py. After installing the model, download or clone this repository and run:

python3 ./test.py

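Open http://127.0.0.1:5000 in a browser and you will see the following:

How to build this model from scratch

See workflow.

Corpus

This project uses the OntoNotes 5.0 corpus.

OntoNotes 5.0 is copyrighted material of the LDC (Linguistic Data Consortium), so it cannot be included in this project directly. The good news is that OntoNotes 5.0 is completely free for organizational users (companies and academic institutions): create an organization or academic account with the LDC and you can obtain OntoNotes 5.0 at no cost.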
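OntoNotes 5.0 marks named entities inline in its `.name` files with ENAMEX tags such as `<ENAMEX TYPE="GPE">北京</ENAMEX>`. Below is a minimal sketch of turning one such line into SpaCy 2.x's `(text, {"entities": [(start, end, label)]})` training tuple. It assumes space-separated (word-segmented) tokens that are rejoined without spaces for Chinese, and it simply skips any extra tag attributes; it is an illustration, not necessarily how this project's workflow converts the corpus. `enamex_to_spacy` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional, Tuple

# <ENAMEX TYPE="LABEL" ...>entity text</ENAMEX>; any extra attributes are skipped.
ENAMEX = re.compile(r'<ENAMEX TYPE="([^"]+)"[^>]*>(.*?)</ENAMEX>')


def enamex_to_spacy(
    line: str, sep: str = ""
) -> Tuple[str, Dict[str, List[Tuple[int, int, str]]]]:
    """Convert one ENAMEX-annotated, space-tokenized line into a SpaCy 2.x
    training example (text, {"entities": [(start, end, label)]}).

    Tokens are rejoined with `sep` ("" for Chinese); character offsets
    refer to the rejoined text.
    """
    pieces: List[Tuple[str, Optional[str]]] = []  # (token, label or None)
    pos = 0
    for m in ENAMEX.finditer(line):
        # Plain tokens before this entity.
        pieces += [(tok, None) for tok in line[pos:m.start()].split()]
        # The entity itself, with its inner tokens merged into one piece.
        pieces.append((sep.join(m.group(2).split()), m.group(1)))
        pos = m.end()
    pieces += [(tok, None) for tok in line[pos:].split()]  # trailing tokens

    text = ""
    entities: List[Tuple[int, int, str]] = []
    for i, (tok, label) in enumerate(pieces):
        if i:
            text += sep
        start = len(text)
        text += tok
        if label is not None:
            entities.append((start, len(text), label))
    return text, {"entities": entities}
```

For an illustrative annotated line (not taken from the corpus), `enamex_to_spacy('<ENAMEX TYPE="PERSON">王小明</ENAMEX> 在 <ENAMEX TYPE="GPE">北京</ENAMEX> 的 <ENAMEX TYPE="ORG">清华 大学</ENAMEX> 读书')` yields the text 王小明在北京的清华大学读书 with the entities `(0, 3, "PERSON")`, `(4, 6, "GPE")` and `(7, 11, "ORG")`.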

TODO list

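The pos_ attribute is incorrect. This is related to the Chinese language class in SpaCy.
The shape_ and is_alpha attributes seem meaningless for Chinese, but this needs confirmation from an authoritative source.
The is_stop attribute is incorrect. This is related to the Chinese language class in SpaCy.
The vector attribute does not seem to be well trained.
~~The is_oov attribute is completely wrong. Top-priority fix.~~
~~The NER model is unavailable because the LDC corpus is missing. Being worked on; training in progress.~~
Release the intermediate results used during training, so that users can customize the model themselves.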
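The TODO item on shape_ and is_alpha can be checked against how SpaCy derives these values: is_alpha is Python's `str.isalpha()`, and shape_ maps cased letters to `X`/`x`, digits to `d`, keeps other characters, and truncates runs after four repeats. Below is a minimal sketch modeled on SpaCy 2.x's word-shape heuristic (a re-implementation for illustration, not imported from SpaCy). Because CJK characters count as alphabetic but have no case, every Chinese word collapses to a run of `x`, so neither attribute distinguishes Chinese words from one another.

```python
def word_shape(text: str) -> str:
    """Re-implementation of SpaCy's word-shape heuristic, for illustration."""
    if len(text) >= 100:
        return "LONG"
    shape = []
    last = ""
    seq = 0
    for char in text:
        if char.isalpha():
            # CJK characters are alphabetic but uncased, so they become "x".
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        if shape_char == last:
            seq += 1
        else:
            seq = 0
            last = shape_char
        if seq < 4:  # keep at most four consecutive copies of a shape character
            shape.append(shape_char)
    return "".join(shape)


for word in ["清华大学", "北京", "Tsinghua", "2020年"]:
    print(word, word.isalpha(), word_shape(word))
```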

Components used

TODO

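How to contribute

Please read CONTRIBUTING.md, then submit pull requests to us.

Versioning

We use SemVer for versioning. See the tags for all available versions.

Authors

Xiaoquan Kong - Initial work - howl-anderson

For more information on contributors, see contributors.

License

MIT License - see LICENSE.md for details

Acknowledgments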

TODO


Stars: ✭ 543
