
zzzsochi / trans

Licence: other
National characters transcription module.

Programming Languages

python

Projects that are alternatives to or similar to trans

transliteration-php
🇺🇦 🇬🇧 🔡 🐘 PHP library for transliteration.
Stars: ✭ 34 (+54.55%)
Mutual labels:  transliteration, translit, transliterate
iuliia-go
Transliterate Cyrillic → Latin in every possible way
Stars: ✭ 36 (+63.64%)
Mutual labels:  transliteration, translit
js-diacritic-regex
Creates the inverse of transliterated string to a regex. What? Basically, diacritic insensitiveness
Stars: ✭ 20 (-9.09%)
Mutual labels:  transliteration, transliterate
Transliterate
Transliteration for Laravel
Stars: ✭ 48 (+118.18%)
Mutual labels:  transliteration, transliterate
transliterasijawa
Javanese Transliteration (Nulisa Aksara Jawa)
Stars: ✭ 55 (+150%)
Mutual labels:  transliteration, transliterator
finglish
A Finglish to Persian converter.
Stars: ✭ 60 (+172.73%)
Mutual labels:  transliteration
andaluh-js
Transliterate español (spanish) spelling to andaluz proposals using javascript
Stars: ✭ 22 (+0%)
Mutual labels:  transliterator
unihandecode
unihandecode is a transliteration library that converts all characters/words in Unicode into the ASCII alphabet, aware of language preference priorities
Stars: ✭ 71 (+222.73%)
Mutual labels:  transliteration
homoglyphs
Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group.
Stars: ✭ 70 (+218.18%)
Mutual labels:  utf8
sbbs
Mirror of gitlab.synchro.net/sbbs (don't submit pull requests here)
Stars: ✭ 25 (+13.64%)
Mutual labels:  utf8
perstem
Persian stemmer and morphological analyzer
Stars: ✭ 18 (-18.18%)
Mutual labels:  transliterator
libutf8
A whatwg compliant UTF8 encoding and decoding library
Stars: ✭ 32 (+45.45%)
Mutual labels:  utf8
vastringify
Type-safe Printf in C
Stars: ✭ 60 (+172.73%)
Mutual labels:  utf8
mysqlutf8
MySQL image with UTF-8 encoding supported by default
Stars: ✭ 28 (+27.27%)
Mutual labels:  utf8
Translit
C# library for Cyrillic-Latin transliteration (supports only Slavic languages) by GOST 7.79-2000 (ISO 9).
Stars: ✭ 47 (+113.64%)
Mutual labels:  translit
unicode-programming
Unicode programming examples
Stars: ✭ 33 (+50%)
Mutual labels:  utf8
ICU4N
International Components for Unicode for .NET
Stars: ✭ 18 (-18.18%)
Mutual labels:  transliterator
fix-utf8
Fix Unicode encoding errors
Stars: ✭ 22 (+0%)
Mutual labels:  utf8
unidecoder
Replace Unicode characters with sensible US-ASCII equivalents
Stars: ✭ 67 (+204.55%)
Mutual labels:  transliteration
deep-trans
Transliterating English to Hindi using Recurrent Neural Networks
Stars: ✭ 44 (+100%)
Mutual labels:  transliteration

The trans module

This module translates national characters into similar-sounding Latin characters (transliteration). At the moment, the Czech, Greek, Latvian, Polish, Turkish, Russian, Ukrainian, Kazakh and Farsi alphabets are supported (which covers 99% of needs).

Simple usage

It's very easy to use.

Python 3:

>>> from trans import trans
>>> trans('Привет, Мир!')
'Privet, Mir!'

Python 2:

>>> import trans
>>> u'Привет, Мир!'.encode('trans')
u'Privet, Mir!'
>>> trans.trans(u'Привет, Мир!')
u'Privet, Mir!'

Works only with unicode strings:

>>> 'Hello World!'.encode('trans')
Traceback (most recent call last):
    ...
TypeError: trans codec support only unicode string, <type 'str'> given.
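The Python 2 interface works because importing the module makes a codec named `trans` available to `.encode()`. As an illustration only, here is how a comparable text-to-text codec can be registered with the standard `codecs` module in Python 3. The codec name `demo_translit` and the toy table are hypothetical, and this is not necessarily how `trans` itself is implemented. It also shows why Python 3 code calls a function instead: `str.encode()` must return bytes, so a codec that returns text has to be used through `codecs.encode()`.

```python
import codecs

# Hypothetical toy table: just enough letters for the example phrase.
_TOY_TABLE = {'П': 'P', 'р': 'r', 'и': 'i', 'в': 'v', 'е': 'e', 'т': 't', 'М': 'M'}


def _toy_encode(text, errors='strict'):
    # Mirror the documented restriction: only unicode (str) input is accepted.
    if not isinstance(text, str):
        raise TypeError('demo_translit codec supports only str, %s given'
                        % type(text).__name__)
    # Unmapped characters (spaces, punctuation, Latin) pass through unchanged.
    result = ''.join(_TOY_TABLE.get(ch, ch) for ch in text)
    # A codec encoder returns (output, number of input items consumed).
    return result, len(text)


def _toy_decode(data, errors='strict'):
    # Transliteration loses information, so this toy codec has no inverse.
    raise NotImplementedError('demo_translit cannot decode')


def _search(name):
    # codecs calls each registered search function with the normalized name.
    if name == 'demo_translit':
        return codecs.CodecInfo(_toy_encode, _toy_decode, name='demo_translit')
    return None


codecs.register(_search)

# Text-to-text codecs are called through codecs.encode() in Python 3:
print(codecs.encode('Привет, Мир!', 'demo_translit'))  # Privet, Mir!

# str.encode() requires a bytes result, so the same codec raises TypeError here.
try:
    'Привет'.encode('demo_translit')
except TypeError as exc:
    print('TypeError:', exc)
```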

The output stays readable:

>>> s = u'''\
...    -- Раскудрить твою через коромысло в бога душу мать
...             триста тысяч раз едрену вошь тебе в крыло
...             и кактус в глотку! -- взревел разъяренный Никодим.
...    -- Аминь, -- робко добавил из склепа папа Пий.
...                 (c) Г. Л. Олди, "Сказки дедушки вампира".'''
>>>
>>> print s.encode('trans')
   -- Raskudrit tvoyu cherez koromyslo v boga dushu mat
            trista tysyach raz edrenu vosh tebe v krylo
            i kaktus v glotku! -- vzrevel razyarennyy Nikodim.
   -- Amin, -- robko dobavil iz sklepa papa Piy.
                (c) G. L. Oldi, "Skazki dedushki vampira".

Table "slug"

The "slug" table leaves only Latin characters, digits and underscores:

>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans')
1 2 3 4 5
6 7 8 9 0
>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans/slug')
1_2_3_4_5__6_7_8_9_0
>>> s.encode('trans/slug')[-42:-1]
u'_c__G__L__Oldi___Skazki_dedushki_vampira_'
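The character set kept by "slug" can be approximated with a single regular expression. The sketch below is a hypothetical helper, not part of `trans`: it assumes the text has already been transliterated to Latin (the real table does both steps at once) and replaces every other character with one underscore, which reproduces the outputs above.

```python
import re

# Any character that is not an ASCII letter, a digit or an underscore.
_NON_SLUG = re.compile(r'[^A-Za-z0-9_]')


def slug_chars(text):
    """Hypothetical helper: keep [A-Za-z0-9_], replace everything else with '_'.

    Assumes ``text`` is already Latin; unlike the real "slug" table,
    it does not transliterate Cyrillic or other national letters.
    """
    # One underscore per replaced character, so runs are not collapsed.
    return _NON_SLUG.sub('_', text)


print(slug_chars('1 2 3 4 5 \n6 7 8 9 0'))  # 1_2_3_4_5__6_7_8_9_0
print(slug_chars('(c) G. L. Oldi, "Skazki dedushki vampira".'))
```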

Table "id"

The table "id" is deprecated and has been renamed to "slug". The old name is still available but not recommended.

Define user tables

Simple variant

>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
Traceback (most recent call last):
    ...
ValueError: Table "my" not found in tables!
>>> trans.tables['my'] = {u'1': u'A', u'2': u'B'}
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
u'A_B________________'
>>>

A little harder

A table can consist of two parts: a map of diphthongs and a map of characters. Diphthongs are processed first, by simple substring replacement. Then each character of the resulting string is replaced according to the map of characters. If a character is absent from the map of characters, the key None is checked; if None is not present either, the default character u'_' is used.

>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-',
...               u'A': u'A', u'B': u'B'}  # See below...
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'AAzyxBBxyz--'

Characters produced by diphthong replacement are also processed by the map of characters, so without entries for u'A' and u'B' they fall back to the None value:

>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'--zyx--xyz--'

Without the diphthongs

These two tables are equivalent:

>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['t1'] = characters
>>> trans.tables['t2'] = ({}, characters)
>>> u'11abc22cbaCC'.encode('trans/t1') == u'11abc22cbaCC'.encode('trans/t2')
True
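The table rules in this section (diphthongs first, then a per-character map with a None fallback and a default of u'_', and a bare dict treated as a table without diphthongs) can be summarized in a short Python 3 sketch. `apply_table` and the type aliases are hypothetical names written for illustration, not part of the `trans` API; the real module may, for example, apply overlapping diphthongs in a different order. The sketch reproduces the examples above.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical type aliases for this sketch only.
CharMap = Dict[Optional[str], str]
Table = Union[CharMap, Tuple[Dict[str, str], CharMap]]


def apply_table(text: str, table: Table) -> str:
    """Transliterate ``text`` with a user table (illustrative sketch)."""
    # A bare dict is treated as a table with no diphthongs.
    if isinstance(table, tuple):
        diphthongs, characters = table
    else:
        diphthongs, characters = {}, table

    # Step 1: diphthongs are replaced as plain substrings, in dict order.
    for source, target in diphthongs.items():
        text = text.replace(source, target)

    # Step 2: every character, including those produced in step 1, goes
    # through the character map; the None key is the fallback, and '_'
    # is used when the table has no None key either.
    default = characters.get(None, '_')
    return ''.join(characters.get(ch, default) for ch in text)


diphthongs = {'11': 'AA', '22': 'BB'}
characters = {'a': 'z', 'b': 'y', 'c': 'x', None: '-'}
print(apply_table('11abc22cbaCC', (diphthongs, characters)))  # --zyx--xyz--
print(apply_table('1 2 3', {'1': 'A', '2': 'B'}))             # A_B__
```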

ChangeLog

2.1 2016-09-19

  • Add Farsi alphabet (thx rodgar-nvkz)
  • Use pytest
  • Some code style refactoring

2.0 2013-04-01

  • Python 3 support
  • Class Trans for creating separate table spaces

1.5 2012-09-12

  • Add support for the Kazakh alphabet.

1.4 2011-11-29

  • Change license to BSD.

1.3 2010-05-18

  • Table "id" renamed to "slug". Old name also available.
  • Some speed optimizations (thx to AndyLegkiy <andy.legkiy at gmail.com>).

1.2 2010-01-10

  • First public release.
  • Translate documentation to English.
