datacoon / russiannames

License: BSD-3-Clause
Russian names parsers, gender identification and processing tools

Russian Names

russiannames is a Python 3 library for parsing Russian first names, surnames and middle names (midnames). It identifies a person's gender from the full name and detects the format in which the name is written. It uses MongoDB as a backend to speed up name parsing.

Documentation

Documentation is built automatically and is available at https://russiannames.readthedocs.org/en/latest/

Installation

To install the Python library, run pip install russiannames, or run python setup.py install from a source checkout.

To use the database you need a MongoDB instance. Download and unpack the db_dump_bson.zip archive from https://github.com/datacoon/russiannames/blob/master/data/bson/db_dump_bson.zip, then use the mongorestore command to restore the names database with its 3 collections: names, surnames and midnames.
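
After restoring the dump, it can be useful to confirm that all three collections are present. The following is a minimal sketch, not part of russiannames: the helper name missing_collections, the connection URL and the database name in the usage comment are assumptions for illustration. It relies only on the list_collection_names() method of a pymongo Database object.

```python
# Illustrative helper (hypothetical, not part of russiannames).
EXPECTED_COLLECTIONS = {"names", "surnames", "midnames"}


def missing_collections(db):
    """Return the expected collections that are absent from a pymongo Database."""
    return sorted(EXPECTED_COLLECTIONS - set(db.list_collection_names()))


# Usage against a live server (requires pymongo and a running MongoDB):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")  # URL is an assumption
#   db = client["russiannames"]                        # database name is an assumption
#   print(missing_collections(db))  # an empty list means the restore is complete
```

The helper accepts any object that exposes list_collection_names(), so it can be exercised without a running server.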

Features

Database of names used for identification

  • 375449 surnames - collection: surnames
  • 32134 first names - collection: names
  • 48274 midnames - collection: midnames

Detailed database statistics by gender and collection

| collection | total | males | females | universal or unidentified |
|---|---|---|---|---|
| names | 32134 | 19297 | 8278 | 1196 |
| midnames | 48274 | 30114 | 16143 | 0 |
| surnames | 375274 | 124662 | 111534 | 38827 |

Supports 13 formats of writing Russian full names

| Format | Example | Description |
|---|---|---|
| f | Ольга | only first name |
| s | Петров | only surname |
| Fs | О. Сидорова | first letter of first name and full surname |
| sF | Николаев С. | full surname and first letter of first name |
| sf | Абрамов Семен | full surname and full first name |
| fs | Соня Камиуллина | full first name and full surname |
| fm | Иван Петрович | full first name and full middle name |
| SFM | М.Д.М. | first letters of surname, first name and middle name |
| FMs | А.Н. Егорова | first letters of first and middle names and full surname |
| sFM | Николаенко С.П. | full surname and first letters of first and middle names |
| sfM | Петракова Зинаида М. | full surname, full first name and first letter of middle name |
| sfm | Казаков Ринат Артурович | full name as surname, first name and middle name |
| fms | Светлана Архиповна Волкова | full name as first name, middle name and surname |
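
The format codes follow a pattern that can be read off the table: s, f and m stand for surname, first name and middle name, a lowercase letter means the part is written in full, and an uppercase letter means only its initial is given. The sketch below expands a code into a readable description. The helper describe_format is hypothetical and not part of russiannames; it only illustrates the naming convention.

```python
# Illustrative helper (hypothetical, not part of russiannames).
PARTS = {"s": "surname", "f": "first name", "m": "middle name"}


def describe_format(code):
    """Expand a format code such as 'sFM' into a list of part descriptions."""
    described = []
    for letter in code:
        part = PARTS.get(letter.lower())
        if part is None:
            raise ValueError(f"unknown format letter: {letter!r}")
        # Uppercase letters mark initials, lowercase letters mark full words.
        kind = "initial of" if letter.isupper() else "full"
        described.append(f"{kind} {part}")
    return described


print(describe_format("sFM"))
# ['full surname', 'initial of first name', 'initial of middle name']
```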

Supports ethnic identification of names

9 ethnic types are supported for first names, surnames and middle names


| key | name (en) | name (rus) |
|---|---|---|
| arab | Arabic | Арабское |
| arm | Armenian | Армянское |
| geor | Georgian | Грузинское |
| germ | German | Немецкие |
| greek | Greek | Греческие |
| jew | Jewish | Еврейские |
| polsk | Polish | Польские |
| slav | Slavic (Russian) | Славянские |
| tur | Turkic | Тюркские (тюркоязычные) |

Limitations

  • very rare names, surnames or middle names may not be parsed
  • ethnic identification is still at an early stage

Speed optimization

  • preconfigured and preindexed MongoDB collections are used

Usage and Examples

Parse name and identify gender

Parses names and returns: format, surname, first name, middle name, parsed (True/False) and gender

>>> from russiannames.parser import NamesParser
>>> parser = NamesParser()
>>> parser.parse('Нигматуллин Ринат Ахметович')
{'format': 'sfm', 'sn': 'Нигматуллин', 'fn': 'Ринат', 'mn': 'Ахметович', 'gender': 'm', 'text': 'Нигматуллин Ринат Ахметович', 'parsed': True}
>>> parser.parse('Петрова C.Я.')
{'format': 'sFM', 'sn': 'Петрова', 'fn_s': 'C', 'mn_s': 'Я', 'gender': 'f', 'text': 'Петрова C.Я.', 'parsed': True}

The gender field can have one of the following values:

  • m: Male
  • f: Female
  • u: Unknown / unidentified
  • -: Impossible to identify
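
A caller will usually want to turn the one-character gender code into something readable and to handle unparsed input. The sketch below does this for a result dictionary of the shape shown above. The names GENDER_LABELS and gender_label are hypothetical, and the sample dictionary is copied from the example output rather than produced by a live parser, so no MongoDB instance is needed to run it.

```python
# Illustrative helper (hypothetical, not part of russiannames).
GENDER_LABELS = {
    "m": "male",
    "f": "female",
    "u": "unknown / unidentified",
    "-": "impossible to identify",
}


def gender_label(result):
    """Return a readable gender label for a NamesParser.parse()-style result dict."""
    if not result.get("parsed"):
        return "not parsed"
    return GENDER_LABELS.get(result.get("gender"), "unexpected value")


# Sample copied from the documented output above.
sample = {
    "format": "sfm",
    "sn": "Нигматуллин",
    "fn": "Ринат",
    "mn": "Ахметович",
    "gender": "m",
    "text": "Нигматуллин Ринат Ахметович",
    "parsed": True,
}
print(gender_label(sample))  # male
```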

Ethnic identification (experimental)

Takes a surname, first name and middle name and tries to identify the person's ethnic affiliation

>>> from russiannames.parser import NamesParser
>>> parser = NamesParser()
>>> parser.classify('Нигматуллин', 'Ринат', 'Ахметович')
{'ethnics': ['tur'], 'gender': 'm'}
>>> parser.classify('Алексеева', 'Ольга', 'Ивановна')
{'ethnics': ['slav'], 'gender': 'f'}
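
The keys in the ethnics list correspond to the ethnic types listed earlier. As a rough sketch, a lookup table can translate them into English names. The helper ethnic_names is hypothetical and not part of russiannames; the sample dictionary is copied from the example output above.

```python
# Illustrative helper (hypothetical, not part of russiannames).
ETHNIC_NAMES = {
    "arab": "Arabic",
    "arm": "Armenian",
    "geor": "Georgian",
    "germ": "German",
    "greek": "Greek",
    "jew": "Jewish",
    "polsk": "Polish",
    "slav": "Slavic (Russian)",
    "tur": "Turkic",
}


def ethnic_names(result):
    """Map the 'ethnics' keys of a classify()-style result to English names."""
    # Unknown keys are passed through unchanged rather than dropped.
    return [ETHNIC_NAMES.get(key, key) for key in result.get("ethnics", [])]


sample = {"ethnics": ["tur"], "gender": "m"}
print(ethnic_names(sample))  # ['Turkic']
```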

Supported languages

  • Russian

Requirements

  • pymongo
  • click
