jason2506 / esapp

License: BSD-3-Clause
An unsupervised Chinese word segmentation tool.

Programming Languages

  • C++
  • Python
  • CMake

ESA++

Usage Example

See test_package/example.cpp.

Instructions

Requirements

  • CMake >= 3.1
  • A C++ compiler that supports C++14

Installing with Conan (and CMake)

The recommended way to use the ESA++ package in your project is to install it with Conan.

Assuming your project is built with CMake, run the following command in your build directory:

$ conan install esapp/0.5.2@jason2506/testing -b outdated -g cmake

The install command downloads the package (together with its dependencies) and generates a conanbuildinfo.cmake file in the current directory.

Next, include conanbuildinfo.cmake and call conan_basic_setup() in your CMakeLists.txt:

cmake_minimum_required(VERSION 3.1)
project(myproj)

include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake)
conan_basic_setup()

This sets up the CMake variables needed to find the installed libraries and related files.

You can now use the find_package() and target_link_libraries() commands to locate and link the package. For example:

find_package(ESA++)

if(ESA++_FOUND)
    add_executable(myexec mycode.cpp)
    target_link_libraries(myexec ESA++::ESA++)
endif()

The final step is to build your project with CMake, for example:

$ cmake [SOURCE_DIR] -DCMAKE_BUILD_TYPE=Release
$ cmake --build .

See the Conan documentation for more details on using Conan packages and generators.

Installing without Conan

You can also install the package without Conan. ESA++ is a header-only library, so all you need to do is copy the header files (contained in the include/ directory) into your project and install the dependencies of ESA++ manually.
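The copy step can be scripted so that the headers are easy to refresh when ESA++ is updated. The following is a minimal sketch, not part of ESA++: the function name vendor_headers and the destination third_party/esapp are hypothetical, and the only thing taken from this README is that the headers live in include/.

```python
import shutil
from pathlib import Path


def vendor_headers(esapp_root, project_root, subdir="third_party/esapp"):
    """Copy ESA++'s include/ directory into a project tree.

    subdir is a hypothetical destination; use whatever layout your
    project prefers and add the returned directory to the include path.
    """
    src = Path(esapp_root) / "include"
    if not src.is_dir():
        raise FileNotFoundError(f"no include/ directory under {esapp_root}")
    dst = Path(project_root) / subdir / "include"
    # dirs_exist_ok (Python 3.8+) allows re-running the copy to refresh headers.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

The dependencies listed under Dependencies still have to be installed separately; this sketch only handles the ESA++ headers themselves.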

Building Python Wrapper

ESA++ also provides a Python wrapper. One way to build it is to run the conan commands with the --scope wrappers=python option:

$ mkdir _build && cd _build
$ conan install .. --build outdated --scope wrappers=python
$ conan build ..

Conan will install all necessary dependencies and build the wrapper.

Alternatively, you can install the dependencies yourself, set the CMake variables needed to find them, and enable the ESAPP_WRAPPER_PYTHON option:

$ cmake -H. -B_build -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INCLUDE_PATH=... \
    -DCMAKE_LIBRARY_PATH=... \
    -DCMAKE_PREFIX_PATH=... \
    -DCMAKE_MODULE_PATH=... \
    -DESAPP_WRAPPER_PYTHON=ON
$ cmake --build _build
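The ... placeholders in the configure command depend on where the dependencies are installed on your machine. As a hedged sketch, the argument list can be assembled in Python; CMake treats these variables as lists, so multiple paths are joined with semicolons. The helper names cmake_configure_args and build_wrapper are hypothetical and not part of ESA++, and every path passed in is supplied by the caller.

```python
import subprocess


def cmake_configure_args(prefix_paths=(), include_paths=(), library_paths=(),
                         module_paths=(), build_dir="_build",
                         build_type="Release"):
    """Build the argument list for the configure command shown above."""
    args = ["cmake", "-H.", f"-B{build_dir}",
            f"-DCMAKE_BUILD_TYPE={build_type}",
            "-DESAPP_WRAPPER_PYTHON=ON"]
    variables = (("CMAKE_INCLUDE_PATH", include_paths),
                 ("CMAKE_LIBRARY_PATH", library_paths),
                 ("CMAKE_PREFIX_PATH", prefix_paths),
                 ("CMAKE_MODULE_PATH", module_paths))
    for name, paths in variables:
        if paths:
            # CMake list values are separated by semicolons.
            args.append(f"-D{name}={';'.join(paths)}")
    return args


def build_wrapper(build_dir="_build", **paths):
    """Configure and build; raises CalledProcessError if either step fails."""
    configure = cmake_configure_args(build_dir=build_dir, **paths)
    subprocess.run(configure, check=True)
    subprocess.run(["cmake", "--build", build_dir], check=True)
```

For example, build_wrapper(prefix_paths=["/opt/deps"]) would configure with -DCMAKE_PREFIX_PATH=/opt/deps, where /opt/deps is only a placeholder location.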

After compiling, you will get esapp_python.so (or esapp_python.dll) in _build/wrapper/python/. You can directly import this module in your Python code:

from esapp_python import Segmenter

See wrapper/python/example.py.
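The import above only works if the directory containing the compiled module is on Python's search path. The following is a minimal sketch, assuming the _build/wrapper/python/ output directory described above and a script run from the repository root; the helper name load_segmenter is hypothetical and not part of ESA++.

```python
import importlib
import sys
from pathlib import Path


def load_segmenter(build_dir="_build/wrapper/python"):
    """Return the Segmenter class, or None if esapp_python cannot be imported.

    build_dir is an assumption: the output directory named in this README,
    resolved relative to the current working directory.
    """
    path = str(Path(build_dir).resolve())
    if path not in sys.path:
        # Make the compiled extension module visible to the import system.
        sys.path.insert(0, path)
    try:
        module = importlib.import_module("esapp_python")
    except ImportError:
        return None
    return module.Segmenter


Segmenter = load_segmenter()
if Segmenter is None:
    print("esapp_python not found; build the Python wrapper first")
```

Setting the PYTHONPATH environment variable to the same directory achieves the same effect without modifying sys.path in code.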

Dependencies

  • DICT == 0.1.2
  • pybind11 >= 2.0.0
    • only required if you want to build the Python wrapper

References

  • H. Feng, K. Chen, X. Deng, and W. Zheng, "Accessor variety criteria for Chinese word extraction," Computational Linguistics, vol. 30, no. 1, pp. 75–93, 2004.
  • H. Wang, J. Zhu, S. Tang, and X. Fan, "A new unsupervised approach to word segmentation," Computational Linguistics, vol. 37, no. 3, pp. 421–454, 2011.
  • M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch, "Replacing suffix trees with enhanced suffix arrays," Journal of Discrete Algorithms, vol. 2, no. 1, pp. 53–86, 2004.

License

Copyright (c) 2014-2017, Chi-En Wu.

Distributed under the BSD 3-Clause License.
