A lightweight Chinese rhyming tool, billed as 100% usable, with a foolproof command-line interface for instant single, double, triple, and quadruple rhymes. Searches for rhymes for Chinese words of 1, 2, 3, or 4 characters; released on PyPI, current version 0.2.6.

Stars: ✭ 72 (-1.37%)

Mutual labels: chinese

pinyin4js

An open-source JavaScript library for converting Chinese to pinyin. Stars welcome :P

Stars: ✭ 153 (+109.59%)

Mutual labels: chinese

awesome-malware-analysis

Defund the Police.

Stars: ✭ 9,181 (+12476.71%)

Mutual labels: chinese

chinese-calendar-golang

📅 A conversion package for the Gregorian, lunar, and sexagenary (Ganzhi) calendars, providing accurate calendar conversion.

Stars: ✭ 104 (+42.47%)

Mutual labels: chinese

chinese-diceware

Diceware word lists in Chinese

Stars: ✭ 27 (-63.01%)

Mutual labels: chinese

goSpider

Some small projects and some articles

Stars: ✭ 56 (-23.29%)

Mutual labels: chinese

Chi-Wiki

A programmer who is not good at Chinese is not an advanced middle school student.

Stars: ✭ 18 (-75.34%)

Mutual labels: chinese

designing-with-libreoffice

The work to translate the book Designing with LibreOffice into Traditional Chinese.

Stars: ✭ 17 (-76.71%)

Mutual labels: chinese

djinni

Chinese documentation for djinni, plus an iOS demo written with djinni that fixes the build failure on macOS Sierra 10.12.

Stars: ✭ 52 (-28.77%)

Mutual labels: chinese

chinese-novel

📙 Chinese novel database: the most complete database of classical Chinese novels.

Stars: ✭ 131 (+79.45%)

Mutual labels: chinese

unihandecode

unihandecode is a transliteration library that converts all Unicode characters/words into the ASCII alphabet and is aware of language preference priorities.

Stars: ✭ 71 (-2.74%)

Mutual labels: chinese

FCH-TTS

A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian, and Tibetan (tested so far).

Stars: ✭ 154 (+110.96%)

Mutual labels: chinese


find-Chinese-medical-words

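Finds the words present in a corpus of medical text scraped from the web, using an improved unsupervised method.
The main approach iterates over mutual information entropy, forward maximum matching, and search-engine lookups to find words.
The corpus is not restricted to any domain; this experiment uses text from the medical domain.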

Environment

python2/3
requests
lxml

Method

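step1: Count the frequencies of single characters and two-character strings in the corpus, along with information about the characters that precede and follow them.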
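A minimal sketch of the counting in step1, assuming the corpus is available as an iterable of text lines. The function name `count_ngrams` and the four returned structures are illustrative choices, not names taken from medfw.py, and the sketch does not filter punctuation.

```python
from collections import Counter, defaultdict

def count_ngrams(lines):
    """Count single characters, adjacent character pairs, and each pair's neighbours."""
    unigrams = Counter()
    bigrams = Counter()
    left_neighbours = defaultdict(Counter)   # pair -> characters seen just before it
    right_neighbours = defaultdict(Counter)  # pair -> characters seen just after it
    for line in lines:
        text = line.strip()
        unigrams.update(text)
        for i in range(len(text) - 1):
            pair = text[i:i + 2]
            bigrams[pair] += 1
            if i > 0:
                left_neighbours[pair][text[i - 1]] += 1
            if i + 2 < len(text):
                right_neighbours[pair][text[i + 2]] += 1
    return unigrams, bigrams, left_neighbours, right_neighbours

# Tiny illustrative corpus (not the project's data).
unigrams, bigrams, left_neighbours, right_neighbours = count_ngrams(["心悸心悸", "瘢痕形成"])
```

The neighbour counters are kept per pair so that a later scoring step can look at how varied the surrounding characters are.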

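step2: For the single- and two-character statistics, use mutual information entropy to select the strings scoring above the threshold K=10.8 and add them to the lexicon; these form the initial lexicon.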
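The README does not give the exact scoring formula, so the sketch below substitutes plain pointwise mutual information (PMI) as a stand-in. The function names and the threshold used in the example are assumptions; K=10.8 belongs to the project's own score and need not be meaningful on this scale.

```python
import math
from collections import Counter

def pmi(pair, unigrams, bigrams):
    """Pointwise mutual information of a two-character string, in bits."""
    total_chars = sum(unigrams.values())
    total_pairs = sum(bigrams.values())
    p_xy = bigrams[pair] / total_pairs
    p_x = unigrams[pair[0]] / total_chars
    p_y = unigrams[pair[1]] / total_chars
    return math.log2(p_xy / (p_x * p_y))

def initial_lexicon(unigrams, bigrams, k):
    """Keep the pairs whose score is strictly above the threshold k."""
    return {pair for pair in bigrams if pmi(pair, unigrams, bigrams) > k}

# Illustrative counts built from one short string (not the project's data).
text = "心悸的的心悸的的"
unigrams = Counter(text)
bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
lexicon = initial_lexicon(unigrams, bigrams, k=2.0)  # k=2.0 is an illustrative threshold
```

In this toy input the pair made of two rarer characters scores highest, which is the behaviour a cohesion score is meant to capture.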

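step3: With the initial lexicon, segment the corpus using forward maximum matching; sort the resulting strings by frequency, output them, and record their count as seg_num.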
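Forward maximum matching can be sketched as follows. The `max_len` window and the fallback to single characters are assumptions of this sketch, as is treating seg_num as the number of distinct strings; the README does not say how medfw.py groups unmatched characters into candidate strings.

```python
from collections import Counter

def forward_max_match(text, lexicon, max_len=5):
    """Greedy left-to-right segmentation: at each position take the longest lexicon word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens

lexicon = {"知情", "同意", "知情同意书"}
tokens = forward_max_match("知情同意书形成", lexicon)

# Rank the segmented strings by frequency (assumed meaning of seg_num: distinct strings).
ranked = Counter(tokens).most_common()
seg_num = len(ranked)
```

Because the longest match wins, "知情同意书" is kept whole even though "知情" and "同意" are also in the lexicon.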

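step4: Take the top H=2000 strings by frequency and query a search engine (Baidu) for each. If the string is an entry in Baidu Baike, add it to the lexicon as a word; if it appears more than R=60 times in the text of the search results page, also add it to the lexicon as a word.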
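The acceptance rule in step4 can be sketched without touching the network by passing in a lookup function. Here `lookup` is a hypothetical callable that returns whether a Baidu Baike entry exists and the text of the results page; how medfw.py actually fetches and parses the page with requests and lxml is not shown in the README.

```python
def accept_candidate(candidate, has_baike_entry, page_text, r=60):
    """A string is accepted if it has a Baike entry or occurs more than r times on the page."""
    if has_baike_entry:
        return True
    return page_text.count(candidate) > r

def search_step(ranked_candidates, lookup, h=2000, r=60):
    """Check the top-h candidates and return the ones accepted as words."""
    new_words = []
    for candidate in ranked_candidates[:h]:
        has_baike_entry, page_text = lookup(candidate)
        if accept_candidate(candidate, has_baike_entry, page_text, r):
            new_words.append(candidate)
    return new_words

# Fake lookup table standing in for real search results (illustrative only).
fake_pages = {
    "瘢痕": (True, ""),
    "心悸": (False, "心悸" * 61),
    "的是": (False, "的是" * 3),
}
new_words = search_step(["瘢痕", "心悸", "的是"], fake_pages.__getitem__)
```

Keeping the network call behind `lookup` makes the thresholds H and R easy to test in isolation.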

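step5: After updating the lexicon, repeat step3 and step4. Iteration ends when searh_num=0. When seg_num falls below the configured Y=5000, run step4 one last time with H set to H=seg_num and then end the iteration; the final lexicon is the set of words found by this program.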
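The stopping logic of step5 can be sketched as a loop over two pluggable steps. `segment` and `search` are hypothetical callables standing in for step3 and step4, and the sketch assumes searh_num means the number of words the search step accepted in the current round.

```python
def iterate(lexicon, segment, search, h=2000, y=5000):
    """Repeat segmentation and search until no new words are found or seg_num drops below y."""
    lexicon = set(lexicon)
    iterations = 0
    while True:
        ranked = segment(lexicon)            # step3: candidate strings, most frequent first
        seg_num = len(ranked)
        last_pass = seg_num < y              # final round: check every remaining string
        limit = seg_num if last_pass else h
        found = search(ranked[:limit])       # step4: strings accepted as words
        lexicon.update(found)
        iterations += 1
        # Assumed reading of searh_num: len(found) for this round.
        if last_pass or not found:
            return lexicon, iterations

# Toy stand-ins for step3 and step4 (illustrative only).
pool = ["瘢痕", "心悸", "的是", "步态"]
truth = {"瘢痕", "心悸", "步态"}
segment = lambda lex: [s for s in pool if s not in lex]
search = lambda candidates: [c for c in candidates if c in truth]

final_lexicon, rounds = iterate(set(), segment, search, h=2, y=2)
```

With the toy inputs, two rounds add words and the third is the final pass triggered by seg_num falling below y.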

Flowchart

Algorithm

Running

python medfw.py
The parameters involved can be adjusted to suit your environment.

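Results

The final lexicon is written to ./data/dict.txt; the ./data directory holds the corpus and the intermediate data produced by the program.
In this experiment, about 50 MB of medical-domain corpus was used; after 9 iterations, 4967 words were found.

Sample results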

惶惶 org
爷爷 org
曼佗 org
垮垮 org
萧轼 org
艇舰 org
蝰蛇 org
攸琐 org
咔嚓 org
喀嚓 org
铒翠 org
诚挚 org
迪厅 org
不足 iter_0
知情同意书 iter_0
运动 iter_0
状态 iter_0
瘢痕 iter_0
心悸 iter_0
步态 iter_0
祸首 iter_0
照相 iter_0
形成 iter_0
面容 iter_0
先天 iter_0
动作 iter_0
由于 iter_0
价格 iter_0
行为 iter_0
淋病 iter_0
包括 iter_0
栓塞 iter_0
球感 iter_0
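Each sample line above pairs a word with a tag such as org or iter_0. A minimal sketch for grouping such lines by tag follows; it assumes whitespace-separated "word tag" lines as in the sample, and the function name `group_by_source` is illustrative. The README does not define the tags, so reading org as the initial lexicon and iter_N as the round in which a word was found is only a guess.

```python
from collections import defaultdict

def group_by_source(lines):
    """Group 'word tag' lines by their tag, skipping lines that do not have two fields."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        word, tag = parts
        groups[tag].append(word)
    return dict(groups)

sample = ["惶惶 org", "爷爷 org", "不足 iter_0", "知情同意书 iter_0", ""]
groups = group_by_source(sample)
```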

