disooqi / ArabicProcessingCog

Licence: MIT License

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

Programming Languages

python


Projects that are alternatives of or similar to ArabicProcessingCog

Syntok

Text tokenization and sentence segmentation (segtok v2)

Stars: ✭ 123 (+547.37%)

Mutual labels: tokenizer, segmentation

Kagome

Self-contained Japanese Morphological Analyzer written in pure Go

Stars: ✭ 554 (+2815.79%)

Mutual labels: tokenizer, segmentation

Ekphrasis

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, Twitter - 330mil English tweets).

Stars: ✭ 433 (+2178.95%)

Mutual labels: tokenizer, text-processing

mystem-scala

Morphological analyzer `mystem` (Russian language) wrapper for JVM languages

Stars: ✭ 21 (+10.53%)

Mutual labels: tokenizer, computational-linguistics

python-mecab

A repository to bind mecab for Python 3.5+. Not using swig nor pybind. (Not Maintained Now)

Stars: ✭ 27 (+42.11%)

Mutual labels: tokenizer, text-processing

perke

A keyphrase extractor for Persian

Stars: ✭ 60 (+215.79%)

Mutual labels: computational-linguistics, text-processing

Open Korean Text

Open Korean Text Processor - An Open-source Korean Text Processor

Stars: ✭ 438 (+2205.26%)

Mutual labels: tokenizer, text-processing

Text-Classification-LSTMs-PyTorch

The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To make the model easier to understand, it uses a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (+136.84%)

Mutual labels: tokenizer, text-processing

frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

Stars: ✭ 70 (+268.42%)

Mutual labels: computational-linguistics, text-processing

CISTEM

Stemmer for German

Stars: ✭ 33 (+73.68%)

Mutual labels: segmentation, computational-linguistics

sembei

🍘 Word embeddings without going through word segmentation 🍘

Stars: ✭ 14 (-26.32%)

Mutual labels: computational-linguistics

support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en

Stars: ✭ 142 (+647.37%)

Mutual labels: text-processing

HyperDenseNet pytorch

Pytorch version of the HyperDenseNet deep neural network for multi-modal image segmentation

Stars: ✭ 58 (+205.26%)

Mutual labels: segmentation

daachorse

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.

Stars: ✭ 75 (+294.74%)

Mutual labels: text-processing

Baysor

Bayesian Segmentation of Spatial Transcriptomics Data

Stars: ✭ 53 (+178.95%)

Mutual labels: segmentation

foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

Stars: ✭ 13 (-31.58%)

Mutual labels: computational-linguistics

gnu-linux-shell-scripting

A foundation for GNU/Linux shell scripting

Stars: ✭ 23 (+21.05%)

Mutual labels: text-processing

deepflash2

A deep-learning pipeline for segmentation of ambiguous microscopic images.

Stars: ✭ 34 (+78.95%)

Mutual labels: segmentation

text

Qiniu Text Processing Libraries for Go

Stars: ✭ 25 (+31.58%)

Mutual labels: text-processing

dilation-keras

Multi-Scale Context Aggregation by Dilated Convolutions in Keras.

Stars: ✭ 72 (+278.95%)

Mutual labels: segmentation


ArabicProcessingCog

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

The library is intended for Python 3.

from arabic_processing_cog.normalization import Arabic_normalization as an

import sentence_breaker as SB
from normalization import Arabic_normalization as AN
from stemming import Light10stemmer as light10

# Read the input document line by line, split each line into sentences,
# and write one normalized sentence per line to the output file.
with open('docs/68956', encoding='utf-8') as dd:
    with open('sentences', encoding='utf-8', mode='w') as outf:
        for line in dd:
            sentences = SB.break_into_sentences(line)
            for sen in sentences:
                outf.write(AN.normalize_sentence(sen) + '\n')

# Python 3 requires print() to be called as a function.
print('Done :-)')
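
The usage example above depends on the package's own modules and on an input file. To illustrate what Arabic normalization commonly involves, here is a minimal, self-contained sketch. The function name normalize_arabic and the specific rules (stripping diacritics and tatweel, unifying alef variants, mapping alef maqsura to yeh and teh marbuta to heh) are assumptions taken from common Arabic preprocessing practice; they are not taken from this package, and AN.normalize_sentence may apply different rules.

```python
import re

# Hypothetical helper for illustration only; NOT part of ArabicProcessingCog.
# The rules below are a common convention, assumed here rather than confirmed
# to match the package's normalize_sentence.

# Short-vowel and related marks: FATHATAN (U+064B) through SUKUN (U+0652).
_DIACRITICS = re.compile('[\u064B-\u0652]')
# Alef with madda above, hamza above, or hamza below.
_ALEF_VARIANTS = re.compile('[\u0622\u0623\u0625]')
_TATWEEL = '\u0640'  # elongation character (kashida)


def normalize_arabic(text: str) -> str:
    """Return text with common orthographic variation removed."""
    text = _DIACRITICS.sub('', text)           # drop diacritics
    text = text.replace(_TATWEEL, '')          # drop tatweel
    text = _ALEF_VARIANTS.sub('\u0627', text)  # alef variants -> bare alef
    text = text.replace('\u0649', '\u064A')    # alef maqsura -> yeh
    text = text.replace('\u0629', '\u0647')    # teh marbuta -> heh
    return text


if __name__ == '__main__':
    # A fully vocalized name with a hamza-carrying alef.
    print(normalize_arabic('\u0623\u064E\u062D\u0652\u0645\u064E\u062F'))
```

Collapsing these variants trades some orthographic detail for better matching between differently spelled forms of the same word, which is why this kind of step is usually run before stemming or indexing.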
