emres / Turkish Deasciifier

Turkish deasciifier in Python based on Deniz Yüret's turkish-mode for Emacs

turkish-deasciifier: Turkish deasciifier

This is a deasciifier Python library and command line utility for Turkish that solves the problem of diacritics restoration (also known as diacritics reconstruction). It takes a Turkish string containing only ASCII characters (that is, without proper diacritics) and replaces the relevant characters with their corresponding Turkish letters.
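To make the input format concrete, here is a minimal sketch of the inverse operation: stripping Turkish diacritics to produce the ASCII-only text that a deasciifier is meant to repair. The `asciify` helper and its translation table are illustrative assumptions, not part of this package.

```python
# Hypothetical helper for illustration; not part of turkish-deasciifier.
# Maps each Turkish letter to the ASCII letter typed in its place.
TURKISH_TO_ASCII = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")


def asciify(text):
    """Return text with Turkish-specific letters replaced by ASCII ones."""
    return text.translate(TURKISH_TO_ASCII)


print(asciify("Öpüşmeği çağrıştıran çatırtılar."))
# Opusmegi cagristiran catirtilar.
```

Going in this direction is a fixed character-for-character mapping. Restoring the diacritics is the harder direction, because the mapping cannot simply be reversed letter by letter.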

The web-based, online version of this system is available at:

http://turkceyap.appspot.com/

Keep in mind that diacritics restoration (deasciification) for Turkish is not 100% accurate; it remains an active research topic. Still, this library is good enough for many practical purposes and has served many people and projects over the last 10 years.
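One way to see why restoration cannot be perfect is to count the spellings a single ASCII word could stand for. The sketch below is only an illustration, assuming a simple table of ambiguous lowercase letters; it does not show how this library decides between the candidates.

```python
from itertools import product

# ASCII letters that may stand for a Turkish letter with a diacritic.
# Illustrative table only; it is not taken from the library's source.
AMBIGUOUS = {"c": "cç", "g": "gğ", "i": "iı", "o": "oö", "s": "sş", "u": "uü"}


def candidates(word):
    """Return every spelling obtained by optionally restoring each ambiguous letter."""
    options = [AMBIGUOUS.get(ch, ch) for ch in word]
    return ["".join(combo) for combo in product(*options)]


forms = candidates("cagri")
print(len(forms))        # 8: three ambiguous letters (c, g, i), two choices each
print("çağrı" in forms)  # True
```

The number of candidates doubles with every ambiguous letter, so choosing the right one depends on context, and any method will sometimes choose wrongly.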

This system is based on the turkish-mode for GNU Emacs by Prof. Deniz Yüret.

Table of Contents

  1. Installation
  2. Example Python Library Usage
  3. Example CLI (Command Line Interface) Usage
  4. Other Programming Languages and Systems
  5. Advanced Research

Installation

Python 3

For now, the recommended way to install is to use pip and install directly from the project's GitHub repository:

pip install git+https://github.com/emres/turkish-deasciifier.git

Python 2

Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, you can install using the following command:

pip install Turkish-Deasciifier

Example Python Library Usage

Python 3

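from turkish.deasciifier import Deasciifier

# ASCII-only Turkish input, that is, text typed without diacritics.
my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
deasciifier = Deasciifier(my_ascii_turkish_txt)
# convert_to_turkish() returns the text with the Turkish letters restored.
my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
print(my_deasciified_turkish_txt)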
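If you need to convert several strings, the usage above can be wrapped in a small helper. This is a sketch under assumptions: `deasciify_text` and `deasciify_lines` are hypothetical names, not functions provided by the package, and only the `Deasciifier(...)` constructor and `convert_to_turkish()` call shown above are relied on.

```python
def deasciify_text(text):
    """Convert one string using the Deasciifier API shown above."""
    # Imported lazily so this module loads even if the package is missing.
    from turkish.deasciifier import Deasciifier

    return Deasciifier(text).convert_to_turkish()


def deasciify_lines(lines, convert=deasciify_text):
    """Convert each non-blank line and pass blank lines through unchanged."""
    return [convert(line) if line.strip() else line for line in lines]
```

For example, `deasciify_lines(["Opusmegi cagristiran catirtilar.", ""])` would return a two-element list with the first string converted and the empty string left as it is. The `convert` parameter lets you substitute another function, which is convenient for testing without the package installed.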

Python 2

Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, you can use the library in the following manner:

from turkish.deasciifier import Deasciifier

my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
deasciifier = Deasciifier(my_ascii_turkish_txt.decode("utf-8"))
my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
print my_deasciified_turkish_txt.encode("utf-8")

Example CLI (Command Line Interface) Usage

Python 3

Example tested in a Bash shell:

$ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify
$ cat somefile.txt | turkish-deasciify
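The same pipeline can also be driven from a Python script with the standard `subprocess` module. The sketch below assumes, as the shell examples above suggest, that the command reads text from standard input and writes the result to standard output; `run_deasciify_cli` is a hypothetical helper name.

```python
import subprocess


def run_deasciify_cli(text, command=("turkish-deasciify",)):
    """Pipe text through the CLI and return what it writes to stdout."""
    result = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        encoding="utf-8",  # send and receive text as UTF-8
        check=True,        # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling `run_deasciify_cli("Opusmegi cagristiran catirtilar.\n")` requires the package to be installed so that the command is on your PATH; otherwise `subprocess.run` raises `FileNotFoundError`.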

Python 2

Keep in mind that switching to Python 3 is strongly recommended!

Example tested in a Bash shell:

$ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify-python2
$ cat somefile.txt | turkish-deasciify-python2

Other Programming Languages and Systems

Advanced Research

For recent advanced scientific research articles, please see the following:
