
sobhe / Hazm

Licence: MIT
Python library for digesting Persian text.

Programming language: Python

Projects that are alternatives to, or similar to, Hazm

Persian Stopwords
Persian (Farsi) Stop Words List
Stars: ✭ 131 (-77.98%)
Mutual labels:  persian, natural-language-processing
Nhazm
A C# version of Hazm (Python library for digesting Persian text)
Stars: ✭ 35 (-94.12%)
Mutual labels:  persian, natural-language-processing
Fewrel
A Large-Scale Few-Shot Relation Extraction Dataset
Stars: ✭ 526 (-11.6%)
Mutual labels:  natural-language-processing
Self Attentive Parser
High-accuracy NLP parser with models for 11 languages.
Stars: ✭ 569 (-4.37%)
Mutual labels:  natural-language-processing
Hanlp
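Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing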
Stars: ✭ 24,626 (+4038.82%)
Mutual labels:  natural-language-processing
Leakgan
The codes of paper "Long Text Generation via Adversarial Training with Leaked Information" on AAAI 2018. Text generation using GAN and Hierarchical Reinforcement Learning.
Stars: ✭ 533 (-10.42%)
Mutual labels:  natural-language-processing
React Modern Calendar Datepicker
A modern, beautiful, customizable date picker for React
Stars: ✭ 555 (-6.72%)
Mutual labels:  persian
Languagetool
Style and Grammar Checker for 25+ Languages
Stars: ✭ 5,641 (+848.07%)
Mutual labels:  natural-language-processing
Pythainlp
Thai Natural Language Processing in Python.
Stars: ✭ 582 (-2.18%)
Mutual labels:  natural-language-processing
D2l Zh
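Dive into Deep Learning (《动手学深度学习》): written for Chinese-speaking readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at 300 universities in 55 countries.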
Stars: ✭ 29,132 (+4796.13%)
Mutual labels:  natural-language-processing
Awesome Bert Nlp
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Stars: ✭ 567 (-4.71%)
Mutual labels:  natural-language-processing
Sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 5,540 (+831.09%)
Mutual labels:  natural-language-processing
Awesome Semi Supervised Learning
📜 An up-to-date & curated list of awesome semi-supervised learning papers, methods & resources.
Stars: ✭ 538 (-9.58%)
Mutual labels:  natural-language-processing
Mycroft Core
Mycroft Core, the Mycroft Artificial Intelligence platform.
Stars: ✭ 5,489 (+822.52%)
Mutual labels:  natural-language-processing
Ner Lstm
Named Entity Recognition using multilayered bidirectional LSTM
Stars: ✭ 532 (-10.59%)
Mutual labels:  natural-language-processing
Fast abs rl
Code for ACL 2018 paper: "Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting. Chen and Bansal"
Stars: ✭ 569 (-4.37%)
Mutual labels:  natural-language-processing
Chat
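A chatbot based on natural language understanding and machine learning, supporting concurrent multi-user sessions and custom multi-turn dialogues.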
Stars: ✭ 516 (-13.28%)
Mutual labels:  natural-language-processing
Hate Speech And Offensive Language
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
Stars: ✭ 543 (-8.74%)
Mutual labels:  natural-language-processing
Pythoncode Tutorials
The Python Code Tutorials
Stars: ✭ 544 (-8.57%)
Mutual labels:  natural-language-processing
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Stars: ✭ 584 (-1.85%)
Mutual labels:  natural-language-processing

Hazm

Python library for digesting Persian text.

  • Text cleaning
  • Sentence and word tokenizers
  • Word lemmatizer
  • POS tagger
  • Shallow parser
  • Dependency parser
  • Interfaces for Persian corpora
  • NLTK compatible
  • Python 2.7, 3.4, 3.5 and 3.6 support
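As a rough illustration of what "Text cleaning" involves for Persian, here is a minimal sketch of a few normalizations that the Normalizer example in Usage appears to perform: replacing the Arabic letters yeh (ي) and kaf (ك) with Persian ی and ک, and joining the plural suffix ها and the verbal prefix می/نمی to their words with a zero-width non-joiner (the Persian "half space") instead of a space. The function `normalize_sketch` and its rules are hypothetical, written for Python 3, and are not Hazm's implementation; Hazm's `Normalizer` handles many more cases.

```python
import re

# Hypothetical mapping (not Hazm's actual table): Arabic letters that often
# show up in Persian text, mapped to their Persian counterparts.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

ZWNJ = "\u200C"  # zero-width non-joiner, the Persian "half space"

# Standalone plural suffix "ha" (U+0647 U+0627) that follows a word.
PLURAL_SUFFIX = re.compile(r"(\S) (\u0647\u0627)(?=\s|$)")
# Standalone verbal prefix "mi" or "nemi" that precedes a word.
VERB_PREFIX = re.compile(r"(?<!\S)(\u0646\u0645\u06CC|\u0645\u06CC) (?=\S)")


def normalize_sketch(text: str) -> str:
    """Rough illustration of Persian text cleaning; not Hazm's Normalizer."""
    text = text.translate(ARABIC_TO_PERSIAN)
    # Collapse whitespace runs; ZWNJ is not whitespace, so it is preserved.
    text = re.sub(r"\s+", " ", text).strip()
    # Join the suffix and the prefix to their words with a ZWNJ, not a space.
    text = PLURAL_SUFFIX.sub(r"\1" + ZWNJ + r"\2", text)
    text = VERB_PREFIX.sub(r"\1" + ZWNJ, text)
    return text
```

The patterns use `\u` escapes because the Arabic and Persian forms of these letters look almost identical, and escapes keep them from being confused in the source.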

Usage

>>> from __future__ import unicode_literals
>>> from hazm import *

>>> normalizer = Normalizer()
>>> normalizer.normalize('اصلاح نويسه ها و استفاده از نیم‌فاصله پردازش را آسان مي كند')
'اصلاح نویسه‌ها و استفاده از نیم‌فاصله پردازش را آسان می‌کند'

>>> sent_tokenize('ما هم برای وصل کردن آمدیم! ولی برای پردازش، جدا بهتر نیست؟')
['ما هم برای وصل کردن آمدیم!', 'ولی برای پردازش، جدا بهتر نیست؟']
>>> word_tokenize('ولی برای پردازش، جدا بهتر نیست؟')
['ولی', 'برای', 'پردازش', '،', 'جدا', 'بهتر', 'نیست', '؟']

>>> stemmer = Stemmer()
>>> stemmer.stem('کتاب‌ها')
'کتاب'
>>> lemmatizer = Lemmatizer()
>>> lemmatizer.lemmatize('می‌روم')
'رفت#رو'

>>> tagger = POSTagger(model='resources/postagger.model')
>>> tagger.tag(word_tokenize('ما بسیار کتاب می‌خوانیم'))
[('ما', 'PRO'), ('بسیار', 'ADV'), ('کتاب', 'N'), ('می‌خوانیم', 'V')]

>>> chunker = Chunker(model='resources/chunker.model')
>>> tagged = tagger.tag(word_tokenize('کتاب خواندن را دوست داریم'))
>>> tree2brackets(chunker.parse(tagged))
'[کتاب خواندن NP] [را POSTP] [دوست داریم VP]'

>>> parser = DependencyParser(tagger=tagger, lemmatizer=lemmatizer)
>>> parser.parse(word_tokenize('زنگ‌ها برای که به صدا درمی‌آید؟'))
<DependencyGraph with 8 nodes>
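The `sent_tokenize` and `word_tokenize` calls above split on Persian punctuation such as the question mark ؟ and the comma ،. A minimal regex-based sketch that should reproduce those two outputs is shown below; the function names and the punctuation set are assumptions for illustration, not Hazm's tokenizers, which handle many more cases.

```python
import re
from typing import List

# Sentence-final marks, including the Arabic question mark (U+061F).
SENT_END = r".!?\u061F"
# Marks split off as separate tokens: the above, plus the Arabic comma
# (U+060C), Latin , : ; and the guillemets (U+00AB, U+00BB).
PUNCT = SENT_END + r"\u060C,:;\u00AB\u00BB"

SENT_SPLIT = re.compile(r"(?<=[" + SENT_END + r"])\s+")
WORD = re.compile(r"[^\s" + PUNCT + r"]+|[" + PUNCT + r"]")


def sent_tokenize_sketch(text: str) -> List[str]:
    """Split after sentence-final punctuation that is followed by whitespace."""
    return [s for s in SENT_SPLIT.split(text.strip()) if s]


def word_tokenize_sketch(text: str) -> List[str]:
    """Runs of non-space, non-punctuation characters, plus each mark alone.

    ZWNJ (U+200C) is not whitespace, so words joined with it stay whole.
    """
    return WORD.findall(text)
```

Keeping ZWNJ inside tokens matters for Persian: a word such as می‌خوانیم in the tagger example is one token, not two.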

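In the lemmatizer example, the verb می‌روم is reduced to 'رفت#رو', which reads as the past stem رفت and the present stem رو joined by #. If you need the two stems separately, a small helper like the hypothetical `split_verb_lemma` below is enough. It assumes that lemmas without # are not verbs, and it keeps an empty side as an empty string.

```python
from typing import NamedTuple, Optional


class VerbStems(NamedTuple):
    past: str
    present: str


def split_verb_lemma(lemma: str) -> Optional[VerbStems]:
    """Split a 'past#present' verb lemma; return None if there is no '#'."""
    past, sep, present = lemma.partition("#")
    if not sep:
        return None  # assumed to be a non-verb lemma such as a noun
    return VerbStems(past, present)
```

Using `str.partition` rather than `split` means a lemma with one side missing still yields a two-field result instead of raising.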
Installation

The latest stable version of Hazm can be installed through pip:

pip install hazm

To test Hazm or use it with the latest updates, install it from the master branch on GitHub:

pip install https://github.com/sobhe/hazm/archive/master.zip --upgrade

We have also trained tagger and parser models. Put these models in the resources folder of your project; the usage examples above load them from paths such as resources/postagger.model.
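Because the tagger and chunker examples load models from paths like resources/postagger.model, it can help to check that the files are in place before constructing them. This is a minimal sketch; `missing_models` and the default file names (taken from the usage examples) are assumptions, so adjust them to the models you actually downloaded.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed model file names, taken from the usage examples; adjust as needed.
DEFAULT_MODELS = ("postagger.model", "chunker.model")


def missing_models(resources_dir: str = "resources",
                   names: Iterable[str] = DEFAULT_MODELS) -> List[str]:
    """Return the paths of expected model files that are not regular files."""
    base = Path(resources_dir)
    return [str(base / name) for name in names if not (base / name).is_file()]
```

For example, call `missing_models()` at startup and stop with a clear error if it returns a non-empty list, rather than letting the tagger fail later on a missing file.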

Extensions

Note: These are not official versions of Hazm; they may not be up to date with its functionality and are not supported by Sobhe.

  • JHazm: A Java port of Hazm
  • NHazm: A C# port of Hazm

Thanks
