sobhe / Hazm
License: MIT
Python library for digesting Persian text.
Stars: ✭ 595
Programming Languages
python
139335 projects - #7 most used programming language
Projects that are alternatives of or similar to Hazm
Persian Stopwords
Persian (Farsi) Stop Words List
Stars: ✭ 131 (-77.98%)
Mutual labels: persian, natural-language-processing
Nhazm
A C# version of Hazm (Python library for digesting Persian text)
Stars: ✭ 35 (-94.12%)
Mutual labels: persian, natural-language-processing
Fewrel
A Large-Scale Few-Shot Relation Extraction Dataset
Stars: ✭ 526 (-11.6%)
Mutual labels: natural-language-processing
Self Attentive Parser
High-accuracy NLP parser with models for 11 languages.
Stars: ✭ 569 (-4.37%)
Mutual labels: natural-language-processing
Hanlp
Natural language processing toolkit: Chinese word segmentation, POS tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, and pinyin / simplified-traditional conversion.
Stars: ✭ 24,626 (+4038.82%)
Mutual labels: natural-language-processing
Leakgan
The codes of paper "Long Text Generation via Adversarial Training with Leaked Information" on AAAI 2018. Text generation using GAN and Hierarchical Reinforcement Learning.
Stars: ✭ 533 (-10.42%)
Mutual labels: natural-language-processing
React Modern Calendar Datepicker
A modern, beautiful, customizable date picker for React
Stars: ✭ 555 (-6.72%)
Mutual labels: persian
Languagetool
Style and Grammar Checker for 25+ Languages
Stars: ✭ 5,641 (+848.07%)
Mutual labels: natural-language-processing
Pythainlp
Thai Natural Language Processing in Python.
Stars: ✭ 582 (-2.18%)
Mutual labels: natural-language-processing
D2l Zh
"Dive into Deep Learning": an interactive, runnable, discussable deep learning book for Chinese readers. Its Chinese and English editions are used for teaching at 300 universities across 55 countries.
Stars: ✭ 29,132 (+4796.13%)
Mutual labels: natural-language-processing
Awesome Bert Nlp
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Stars: ✭ 567 (-4.71%)
Mutual labels: natural-language-processing
Sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 5,540 (+831.09%)
Mutual labels: natural-language-processing
Awesome Semi Supervised Learning
📜 An up-to-date & curated list of awesome semi-supervised learning papers, methods & resources.
Stars: ✭ 538 (-9.58%)
Mutual labels: natural-language-processing
Mycroft Core
Mycroft Core, the Mycroft Artificial Intelligence platform.
Stars: ✭ 5,489 (+822.52%)
Mutual labels: natural-language-processing
Ner Lstm
Named Entity Recognition using multilayered bidirectional LSTM
Stars: ✭ 532 (-10.59%)
Mutual labels: natural-language-processing
Fast abs rl
Code for ACL 2018 paper: "Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting. Chen and Bansal"
Stars: ✭ 569 (-4.37%)
Mutual labels: natural-language-processing
Chat
A chatbot based on natural language understanding and machine learning, supporting concurrent multi-user sessions and customizable multi-turn dialogue.
Stars: ✭ 516 (-13.28%)
Mutual labels: natural-language-processing
Hate Speech And Offensive Language
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
Stars: ✭ 543 (-8.74%)
Mutual labels: natural-language-processing
Pythoncode Tutorials
The Python Code Tutorials
Stars: ✭ 544 (-8.57%)
Mutual labels: natural-language-processing
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Stars: ✭ 584 (-1.85%)
Mutual labels: natural-language-processing
Hazm
Python library for digesting Persian text.
- Text cleaning
- Sentence and word tokenizer
- Word lemmatizer
- POS tagger
- Shallow parser
- Dependency parser
- Interfaces for Persian corpora
- NLTK compatible
- Python 2.7, 3.4, 3.5 and 3.6 support
Usage
>>> from __future__ import unicode_literals
>>> from hazm import *
>>> normalizer = Normalizer()
>>> normalizer.normalize('اصلاح نويسه ها و استفاده از نیم‌فاصله پردازش را آسان مي كند')
'اصلاح نویسه‌ها و استفاده از نیم‌فاصله پردازش را آسان می‌کند'
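Two of the things normalization did above are unifying Arabic code points (ي → ی, ك → ک) and gluing the verb prefix «می» to the following word with a "half-space", the zero-width non-joiner U+200C. A minimal, dependency-free sketch of just those two rules (illustrative only; Hazm's Normalizer applies many more):

```python
# Illustrative sketch only -- Hazm's Normalizer handles far more rules
# (spacing, punctuation, diacritics, affix spacing, ...).
ZWNJ = '\u200c'  # zero-width non-joiner, the Persian "half-space"

# Map common Arabic code points to their Persian equivalents.
CHAR_MAP = str.maketrans({
    '\u064a': '\u06cc',  # ARABIC LETTER YEH (ي) -> FARSI YEH (ی)
    '\u0643': '\u06a9',  # ARABIC LETTER KAF (ك) -> KEHEH (ک)
})

def normalize_chars(text):
    """Unify Arabic/Persian code points, then replace the space after
    the verb prefix 'می' with a half-space (ZWNJ)."""
    text = text.translate(CHAR_MAP)
    return text.replace('\u0645\u06cc ', '\u0645\u06cc' + ZWNJ)
```

With this sketch, `normalize_chars('مي كند')` yields «می‌کند», joined by a half-space rather than a full space.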
>>> sent_tokenize('ما هم برای وصل کردن آمدیم! ولی برای پردازش، جدا بهتر نیست؟')
['ما هم برای وصل کردن آمدیم!', 'ولی برای پردازش، جدا بهتر نیست؟']
>>> word_tokenize('ولی برای پردازش، جدا بهتر نیست؟')
['ولی', 'برای', 'پردازش', '،', 'جدا', 'بهتر', 'نیست', '؟']
>>> stemmer = Stemmer()
>>> stemmer.stem('کتاب‌ها')
'کتاب'
>>> lemmatizer = Lemmatizer()
>>> lemmatizer.lemmatize('می‌روم')
'رفت#رو'
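The difference between the two: the stemmer strips surface suffixes (کتاب‌ها → کتاب), while the lemmatizer maps inflected forms to a dictionary entry, here the past/present verb stems «رفت#رو» for «می‌روم» ("I go"). A naive, dependency-free sketch of suffix stripping (a toy suffix list, not Hazm's actual rule set or lexicon):

```python
# Toy illustration only -- Hazm's Stemmer uses real Persian affix rules,
# and its Lemmatizer additionally consults word and verb lexicons.
SUFFIXES = ('\u200c\u0647\u0627', '\u0647\u0627',   # '‌ها', 'ها' (plural)
            '\u062a\u0631\u06cc\u0646', '\u062a\u0631')  # 'ترین', 'تر'

def toy_stem(word):
    """Strip the first matching suffix, if any (longest forms first)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word
```

So `toy_stem('کتاب‌ها')` drops the plural «ها» and returns «کتاب», mirroring the stemmer call above; words with no listed suffix pass through unchanged.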
>>> tagger = POSTagger(model='resources/postagger.model')
>>> tagger.tag(word_tokenize('ما بسیار کتاب می‌خوانیم'))
[('ما', 'PRO'), ('بسیار', 'ADV'), ('کتاب', 'N'), ('می‌خوانیم', 'V')]
>>> chunker = Chunker(model='resources/chunker.model')
>>> tagged = tagger.tag(word_tokenize('کتاب خواندن را دوست داریم'))
>>> tree2brackets(chunker.parse(tagged))
'[کتاب خواندن NP] [را POSTP] [دوست داریم VP]'
>>> parser = DependencyParser(tagger=tagger, lemmatizer=lemmatizer)
>>> parser.parse(word_tokenize('زنگ‌ها برای که به صدا درمی‌آید؟'))
<DependencyGraph with 8 nodes>
Installation
The latest stable version of Hazm can be installed through pip:
pip install hazm
But for testing or using Hazm with the latest updates you may use:
pip install https://github.com/sobhe/hazm/archive/master.zip --upgrade
We have also trained tagger and parser models. You may put these models in the resources folder of your project.
Extensions
Note: These are not official versions of Hazm, are not up to date on functionality, and are not supported by Sobhe.
Thanks
- to contributors: Mojtaba Khallash and Mohsen Imany.
- to the Virastyar project for the Persian word list.