1000-7 / xinlp

Licence: other

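Java implementations of the algorithms from the later chapters of Li Hang's "Statistical Learning Methods" (《统计学习方法》): the EM algorithm for the boxes-and-balls problem, extended to GMM training; then HMM word segmentation (including HMM parameter training) and CRF word segmentation (using a parameter model trained with CRF++); finally BiLSTM+CRF implemented with TensorFlow, plus a XinAnalyzer wrapper for Lucene.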

Programming Languages

java

python

Projects that are alternatives of or similar to xinlp

Nlp Journey

Documents, papers and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All code is implemented in TensorFlow 2.0.

Stars: ✭ 1,290 (+6042.86%)

Mutual labels: crf, lda

LinLP

NLP practice in Python, such as new-word discovery, topic models, hidden Markov model part-of-speech tagging, Word2Vec, and sentiment analysis

Stars: ✭ 43 (+104.76%)

Mutual labels: hmm, lda

ChineseNER

All about Chinese NER

Stars: ✭ 241 (+1047.62%)

Mutual labels: crf, bilstm-crf

Machine Learning Code

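Explanations of the principles behind "Statistical Learning Methods" and common machine learning models (GBDT/XGBoost/lightGBM/FM/FFM), with Python and library implementations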

Stars: ✭ 169 (+704.76%)

Mutual labels: hmm, crf

deepseg

Chinese word segmentation in tensorflow 2.x

Stars: ✭ 23 (+9.52%)

Mutual labels: crf, bilstm-crf

BiLSTM-CRF-NER-PyTorch

This repo contains a PyTorch implementation of a BiLSTM-CRF model for named entity recognition task.

Stars: ✭ 109 (+419.05%)

Mutual labels: crf, bilstm-crf

CIP

Basic exercises in Chinese information processing

Stars: ✭ 32 (+52.38%)

Mutual labels: hmm, crf

NLP-paper

🎨 🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/

Stars: ✭ 23 (+9.52%)

Mutual labels: crf, lda

mahjong

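Open-source Chinese word segmentation toolkit: Chinese segmentation Web API, Lucene Chinese segmentation, mixed Chinese-English segmentation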

Stars: ✭ 40 (+90.48%)

Mutual labels: hmm, crf

reacnetgenerator

an automatic reaction network generator for reactive molecular dynamics simulation

Stars: ✭ 25 (+19.05%)

Mutual labels: hmm

Topic-Modeling-Workshop-with-R

A workshop on analyzing topic modeling (LDA, CTM, STM) using R

Stars: ✭ 51 (+142.86%)

Mutual labels: lda

bioinf-commons

Bioinformatics library in Kotlin

Stars: ✭ 21 (+0%)

Mutual labels: hmm

pymc3-hmm

Hidden Markov models in PyMC3

Stars: ✭ 81 (+285.71%)

Mutual labels: hmm

Machine-Learning-Models

In this repository I implemented machine learning methods ranging from simple to complex, aiming for template-style code.

Stars: ✭ 30 (+42.86%)

Mutual labels: lda

sequence tagging

Named Entity Recognition (LSTM + CRF + FastText) with models for [historic] German

Stars: ✭ 25 (+19.05%)

Mutual labels: bilstm-crf

interspeech2018 submission01

Supplementary information and code for INTERSPEECH 2018 paper: Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

Stars: ✭ 43 (+104.76%)

Mutual labels: hmm

libfmp

libfmp - Python package for teaching and learning Fundamentals of Music Processing (FMP)

Stars: ✭ 71 (+238.1%)

Mutual labels: hmm

Gse

Go efficient multilingual NLP and text segmentation; supports English, Chinese, Japanese and others.

Stars: ✭ 1,695 (+7971.43%)

Mutual labels: hmm

deepvis

machine learning algorithms in Swift

Stars: ✭ 54 (+157.14%)

Mutual labels: lda

BayesHMM

Full Bayesian Inference for Hidden Markov Models

Stars: ✭ 35 (+66.67%)

Mutual labels: hmm

xinlp

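While studying "Statistical Learning Methods", I implemented most of the material from the EM algorithm in Chapter 8 through CRF in Chapter 11, and, given the current deep learning boom, also implemented Bi-LSTM+CRF word segmentation.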

2019.03.21

Implemented a simple LDA model, updated iteratively with Gibbs sampling.
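As a rough illustration of what "Gibbs sampling iterative updates" can look like for LDA, here is a minimal collapsed Gibbs sampler. It is a sketch under assumptions, not the code from this repository: the class name `LdaGibbs`, the field names, the integer word-id encoding of documents, and the symmetric priors `alpha` and `beta` are all hypothetical choices made for the example.

```java
import java.util.Random;

/** Minimal collapsed Gibbs sampler for LDA (illustrative sketch; all names are hypothetical). */
final class LdaGibbs {
    final int[][] docs;       // docs[d][i] = word id of the i-th token in document d
    final int topics;         // number of topics K
    final int vocabSize;      // vocabulary size V
    final double alpha;       // symmetric document-topic prior (assumed)
    final double beta;        // symmetric topic-word prior (assumed)
    final int[][] z;          // current topic assignment of every token
    final int[][] docTopic;   // docTopic[d][k] = tokens in document d assigned to topic k
    final int[][] topicWord;  // topicWord[k][w] = times word w is assigned to topic k
    final int[] topicTotal;   // topicTotal[k] = total tokens assigned to topic k
    final Random rng;

    LdaGibbs(int[][] docs, int vocabSize, int topics, double alpha, double beta, long seed) {
        this.docs = docs;
        this.vocabSize = vocabSize;
        this.topics = topics;
        this.alpha = alpha;
        this.beta = beta;
        this.rng = new Random(seed);
        this.z = new int[docs.length][];
        this.docTopic = new int[docs.length][topics];
        this.topicWord = new int[topics][vocabSize];
        this.topicTotal = new int[topics];
        // Start from a uniformly random topic for every token.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int t = rng.nextInt(topics);
                z[d][i] = t;
                docTopic[d][t]++;
                topicWord[t][docs[d][i]]++;
                topicTotal[t]++;
            }
        }
    }

    /** One full sweep: resample the topic of every token given all other assignments. */
    void sweep() {
        double[] p = new double[topics];
        for (int d = 0; d < docs.length; d++) {
            for (int i = 0; i < docs[d].length; i++) {
                int w = docs[d][i];
                int old = z[d][i];
                // Remove the current token from the counts.
                docTopic[d][old]--;
                topicWord[old][w]--;
                topicTotal[old]--;
                // Conditional: (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                double sum = 0;
                for (int k = 0; k < topics; k++) {
                    p[k] = (docTopic[d][k] + alpha)
                            * (topicWord[k][w] + beta) / (topicTotal[k] + vocabSize * beta);
                    sum += p[k];
                }
                // Draw a topic from the unnormalised distribution p.
                double u = rng.nextDouble() * sum;
                int t = 0;
                double acc = p[0];
                while (u >= acc && t < topics - 1) {
                    t++;
                    acc += p[t];
                }
                // Put the token back with its new topic.
                z[d][i] = t;
                docTopic[d][t]++;
                topicWord[t][w]++;
                topicTotal[t]++;
            }
        }
    }
}
```

After enough sweeps, the counts in `topicWord` and `docTopic` can be normalised (with the priors added) to estimate the topic-word and document-topic distributions. How many sweeps are enough depends on the data and is not addressed here.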

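EM and GMM

I first studied the EM algorithm and used it to implement a Gaussian mixture model (GMM).
A GMM behaves much like k-means; in my own tests on male and female height data, the GMM was very hard to train well.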
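To make the EM/GMM relationship concrete, the following is a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture. It is not the repository's implementation: the class name `GmmEm`, the deterministic initialisation (means at the data minimum and maximum), and the variance floor `MIN_VAR` are assumptions made for this example.

```java
import java.util.Arrays;

/** Minimal EM for a two-component, one-dimensional Gaussian mixture (illustrative sketch). */
final class GmmEm {
    /** Mixture parameters; index k selects the component. */
    static final class Params {
        final double[] weight = {0.5, 0.5};
        final double[] mean = new double[2];
        final double[] var = new double[2];
    }

    // Variance floor (an assumption) so that a component cannot collapse onto one point.
    private static final double MIN_VAR = 1e-6;

    /** Gaussian density N(x | mean, var). */
    static double density(double x, double mean, double var) {
        double d = x - mean;
        return Math.exp(-d * d / (2 * var)) / Math.sqrt(2 * Math.PI * var);
    }

    /** Runs a fixed number of EM iterations on non-empty data and returns the parameters. */
    static Params fit(double[] data, int iterations) {
        Params p = new Params();
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        // Deterministic initialisation (an assumption): means at the extremes, wide variances.
        p.mean[0] = min;
        p.mean[1] = max;
        double initVar = Math.max((max - min) * (max - min) / 4, MIN_VAR);
        p.var[0] = initVar;
        p.var[1] = initVar;

        double[][] resp = new double[data.length][2];
        for (int it = 0; it < iterations; it++) {
            // E-step: responsibility of each component for each point.
            for (int i = 0; i < data.length; i++) {
                double a = p.weight[0] * density(data[i], p.mean[0], p.var[0]);
                double b = p.weight[1] * density(data[i], p.mean[1], p.var[1]);
                double total = a + b; // not guarded against underflow in this sketch
                resp[i][0] = a / total;
                resp[i][1] = b / total;
            }
            // M-step: re-estimate weight, mean and variance from the responsibilities.
            for (int k = 0; k < 2; k++) {
                double nk = 0;
                double sumX = 0;
                for (int i = 0; i < data.length; i++) {
                    nk += resp[i][k];
                    sumX += resp[i][k] * data[i];
                }
                double mean = sumX / nk;
                double sumSq = 0;
                for (int i = 0; i < data.length; i++) {
                    double d = data[i] - mean;
                    sumSq += resp[i][k] * d * d;
                }
                p.weight[k] = nk / data.length;
                p.mean[k] = mean;
                p.var[k] = Math.max(sumSq / nk, MIN_VAR);
            }
        }
        return p;
    }
}
```

On two well-separated clusters, for example values around 1.0 and around 5.0, `GmmEm.fit(data, 100)` should recover means close to the cluster centres. When the two groups overlap heavily, as male and female heights do, the responsibilities stay ambiguous and the fit is much less reliable, which is consistent with the difficulty described above.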

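Implementing HMM word segmentation myself

For the HMM boxes-and-balls example, all three basic problems (probability computation, learning, prediction) are implemented.
The main idea: given already-trained parameters (those of jieba), implementing the Viterbi algorithm is all that is needed.
The HMM parameters are taken from the Python jieba segmenter.
I also tried training the parameters with the Baum-Welch algorithm, but the results were very poor.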
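The "trained parameters plus Viterbi" idea can be sketched as follows. This is an illustrative example rather than the repository's code: the class name `HmmSegmenter`, the B/M/E/S tag encoding as integers 0 to 3, and the convention that emission scores are passed in per position as log-probabilities are all assumptions. Loading the actual jieba parameter files is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Viterbi decoding over B/M/E/S tags for word segmentation (illustrative sketch). */
final class HmmSegmenter {
    // Tag indices (an assumed encoding): Begin, Middle, End of a word, Single-character word.
    static final int B = 0, M = 1, E = 2, S = 3;

    /**
     * Returns the most likely tag sequence.
     * logStart[s]    : log P(first tag = s)
     * logTrans[p][s] : log P(tag s | previous tag p)
     * logEmit[t][s]  : log P(character at position t | tag s)
     * Impossible events should carry a very negative value such as -1e9.
     */
    static int[] viterbi(double[] logStart, double[][] logTrans, double[][] logEmit) {
        int n = logEmit.length;
        if (n == 0) {
            return new int[0];
        }
        int k = logStart.length;
        double[][] score = new double[n][k]; // best log-score of any path ending in tag s at t
        int[][] back = new int[n][k];        // predecessor tag on that best path
        for (int s = 0; s < k; s++) {
            score[0][s] = logStart[s] + logEmit[0][s];
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int bestPrev = 0;
                double best = score[t - 1][0] + logTrans[0][s];
                for (int p = 1; p < k; p++) {
                    double v = score[t - 1][p] + logTrans[p][s];
                    if (v > best) {
                        best = v;
                        bestPrev = p;
                    }
                }
                score[t][s] = best + logEmit[t][s];
                back[t][s] = bestPrev;
            }
        }
        // A sentence can only end with E or S, so pick the better of those two.
        int[] path = new int[n];
        path[n - 1] = score[n - 1][E] >= score[n - 1][S] ? E : S;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    /** Cuts the text after every E or S tag. */
    static List<String> toWords(String text, int[] tags) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tags.length; i++) {
            current.append(text.charAt(i));
            if (tags[i] == E || tags[i] == S) {
                words.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString()); // trailing characters without a closing tag
        }
        return words;
    }
}
```

With real parameters, `logEmit[t][s]` would be looked up from the emission table for the character at position `t`, and `toWords(text, viterbi(...))` would give the segmentation. Working in log space avoids the underflow that multiplying many small probabilities would cause on long sentences.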

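Implementing CRF word segmentation myself

The CRF implementation follows the approaches used in Ansj and HanLP.
The CRF parameters were obtained by training with CRF++; segmentation uses those trained parameters.
Hand-defining CRF feature functions is too laborious (it is essentially feature engineering), and the parameter-learning methods are not implemented either. Segmentation uses the Viterbi algorithm, learning is delegated to CRF++, and the probability computation, which is similar to the HMM case, is not implemented.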

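Implementing Bi-LSTM+CRF word segmentation myself

There are two versions:
The "ugly" version is my first direct implementation. I had never written much Python before, so the naming is arbitrary and the structure is messy; whenever I hit something I did not know, I searched Baidu and Bing and solved problems as they came up. That said, the code is compact, has no encapsulation at all, and actually reads quite smoothly.
The non-ugly version is modeled on an excellent GitHub project, guillaumegenthial/sequence_tagging. I rewrote the code following the style of that very complete Python project (copying many parts directly 😊). The code is clear, with each file having its own responsibility, which does justice to Python (an object-oriented, dynamic, interpreted, strongly typed language).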

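Implementing a Lucene-compatible analyzer myself: XinAnalyzer

While using Lucene I came across SmartChineseAnalyzer, which supports Chinese word segmentation. Its results were not great, and I was surprised to find that it uses HMM segmentation; my reaction was "oh my god", so I decided to write one myself.
2018.12.11 My own HMM segmenter is now supported.
2018.12.13 CRF segmentation is supported (over TCP), as is BiLSTM+CRF segmentation (over HTTP).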

Data used

Link: https://pan.baidu.com/s/1toe-0h4k9Ck_yGs-RwMqAA Password: sn7o
