
Name Parser
===========

|Build Status| |PyPI| |PyPI version| |Documentation|

A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components.
A parsed name (``hn`` below is a ``HumanName`` instance) exposes each component as an attribute:

* ``hn.title``
* ``hn.first``
* ``hn.middle``
* ``hn.last``
* ``hn.suffix``
* ``hn.nickname``
* ``hn.surnames`` (middle + last)
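
To make the relationship between these attributes concrete, here is a toy record type that mirrors them. It is a hypothetical sketch, not nameparser's ``HumanName`` class: the ``ParsedName`` name and its ``as_dict`` helper are illustrative assumptions. It shows that ``surnames`` is derived from ``middle`` and ``last`` rather than stored as a separate field.

```python
from dataclasses import asdict, dataclass


@dataclass
class ParsedName:
    """Hypothetical stand-in for a parsed name; not the library's class."""
    title: str = ""
    first: str = ""
    middle: str = ""
    last: str = ""
    suffix: str = ""
    nickname: str = ""

    @property
    def surnames(self) -> str:
        # Derived on demand: middle names followed by the last name.
        return " ".join(part for part in (self.middle, self.last) if part)

    def as_dict(self) -> dict:
        # Only the stored fields; the derived ``surnames`` is not included.
        return asdict(self)


hn = ParsedName(title="Dr.", first="Juan", middle="Q. Xavier",
                last="de la Vega", suffix="III", nickname="Doc Vega")
print(hn.surnames)  # Q. Xavier de la Vega
```

Deriving ``surnames`` keeps it consistent with ``middle`` and ``last`` automatically if either is changed.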

Supported Name Structures
-------------------------

The supported name structure is generally "Title First Middle Last Suffix", where all pieces 
are optional. Comma-separated format like "Last, First" is also supported.

1. Title Firstname "Nickname" Middle Middle Lastname Suffix
2. Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
3. Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]

Instantiating the ``HumanName`` class with a string splits the string on commas and then
on spaces, and classifies each name part by its position in the string and by matches
against known name pieces such as titles and suffixes.
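
A deliberately simplified sketch of that approach follows. It is not the library's implementation: ``TITLES`` and ``SUFFIXES`` are tiny hypothetical stand-ins for the configured sets, ``toy_parse`` is an illustrative name, and nicknames, prefixes and conjunctions are ignored. It shows the two steps described above: split on commas and then on whitespace, then bucket pieces by position plus membership in the known sets.

```python
# Hypothetical, tiny stand-ins for the library's configurable sets.
TITLES = {"dr", "mr", "mrs", "ms", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md"}


def _key(piece):
    # Match case-insensitively and ignore periods, so "Dr." matches "dr".
    return piece.lower().replace(".", "")


def toy_parse(full_name):
    """Split on commas, then whitespace, and bucket pieces by position."""
    segments = [seg.split() for seg in full_name.split(",") if seg.strip()]
    result = dict.fromkeys(("title", "first", "middle", "last", "suffix"), "")
    if not segments:
        return result

    if len(segments) > 1 and not all(_key(p) in SUFFIXES for p in segments[1]):
        # "Last, First Middle[, Suffix]": the first segment is the last name.
        result["last"] = " ".join(segments[0])
        main, extra = segments[1], segments[2:]
    else:
        # "First Middle Last[, Suffix]": any later segments hold suffixes.
        main, extra = segments[0], segments[1:]

    pieces = list(main)
    titles = []
    suffixes = [p for seg in extra for p in seg]
    while pieces and _key(pieces[0]) in TITLES:
        titles.append(pieces.pop(0))  # known titles at the front
    while pieces and _key(pieces[-1]) in SUFFIXES:
        suffixes.insert(0, pieces.pop())  # known suffixes at the end
    if not result["last"] and len(pieces) > 1:
        result["last"] = pieces.pop()  # last remaining piece is the last name
    if pieces:
        result["first"] = pieces.pop(0)
    result["middle"] = " ".join(pieces)  # whatever is left in between
    result["title"] = " ".join(titles)
    result["suffix"] = " ".join(suffixes)
    return result


print(toy_parse("Dr. Juan Q. Xavier Vega III"))
print(toy_parse("Vega, Juan Q."))
```

Given "de la Vega", this sketch would wrongly put "de la" in the middle name, which is exactly the kind of case the prefix handling below exists for.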

It correctly handles some common conjunctions and special last-name prefixes
such as "del". Titles and conjunctions can be chained together to handle complex
titles like "Asst Secretary of State". It can also try to correct the capitalization
of names that are entirely upper- or lowercase.
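
The prefix handling and capitalization fix can be approximated as below. This is a simplified illustration, not the library's code: ``PREFIXES`` is a small hypothetical subset, ``join_prefixes`` and ``fix_case`` are illustrative names, and the capitalization rule (prefixes lowercase, everything else via ``str.capitalize``) is an assumption. Chaining of titles and conjunctions is not covered.

```python
# Hypothetical subset of last-name prefixes; the library's configured set is larger.
PREFIXES = {"da", "de", "del", "della", "der", "la", "van", "von"}


def join_prefixes(pieces):
    """Attach each run of prefixes to the piece that follows it."""
    joined, pending = [], []
    for piece in pieces:
        if piece.lower() in PREFIXES:
            pending.append(piece)  # hold the prefix until a non-prefix arrives
        else:
            joined.append(" ".join(pending + [piece]))
            pending = []
    if pending:
        joined.append(" ".join(pending))  # dangling prefixes stay together
    return joined


def fix_case(name):
    """Recapitalize a name only if it is entirely upper- or lowercase."""
    if not (name.isupper() or name.islower()):
        return name  # mixed case: assume the input was capitalized on purpose
    return " ".join(
        word.lower() if word.lower() in PREFIXES else word.capitalize()
        for word in name.split()
    )


print(join_prefixes("Juan de la Vega".split()))  # ['Juan', 'de la Vega']
print(fix_case("JUAN DE LA VEGA"))  # Juan de la Vega
```

``str.capitalize`` lowercases everything after the first letter, so a name like "McDonald" would need extra rules; that limitation belongs to this sketch.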

It makes the best guess it can with a simple, rule-based approach.
Its main use case is English, and it is unlikely to be useful for languages
whose names do not conform to the supported name structure. It's not perfect,
but it gets you pretty far.

Installation
------------

::

  pip install nameparser

If you want to try out the latest code from GitHub, you can
install it with pip using the command below.

::

  pip install -e git+https://github.com/derek73/python-nameparser.git#egg=nameparser

If you need to handle lists of names, check out
`namesparser <https://github.com/gwu-libraries/namesparser>`_, a
complement to this module that handles multiple names in a string.


Quick Start Example
-------------------

::

    >>> from nameparser import HumanName
    >>> name = HumanName("Dr. Juan Q. Xavier de la Vega III (Doc Vega)")
    >>> name 
    <HumanName : [
    	title: 'Dr.' 
    	first: 'Juan' 
    	middle: 'Q. Xavier' 
    	last: 'de la Vega' 
    	suffix: 'III'
    	nickname: 'Doc Vega'
    ]>
    >>> name.last
    'de la Vega'
    >>> name.as_dict()
    {'title': 'Dr.', 'first': 'Juan', 'middle': 'Q. Xavier', 'last': 'de la Vega', 'suffix': 'III', 'nickname': 'Doc Vega'}
    >>> str(name)
    'Dr. Juan Q. Xavier de la Vega III (Doc Vega)'
    >>> name.string_format = "{first} {last}"
    >>> str(name)
    'Juan de la Vega'
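
The template assigned to ``string_format`` uses ``str.format``-style named fields. The sketch below shows one way such a template could be rendered and tidied when a field is empty (for example, a missing nickname leaving ``()`` behind). ``render`` is a hypothetical helper, not part of nameparser, and the library's own cleanup rules may differ.

```python
import re


def render(template, parts):
    """Fill a str.format-style template, then tidy gaps left by empty fields."""
    text = template.format(**parts)  # unused keys in ``parts`` are ignored
    text = text.replace("()", "")  # drop parentheses around an empty nickname
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace
    return text.strip(" ,")


parts = {"title": "Dr.", "first": "Juan", "middle": "Q. Xavier",
         "last": "de la Vega", "suffix": "III", "nickname": ""}
print(render("{title} {first} {middle} {last} {suffix} ({nickname})", parts))
# Dr. Juan Q. Xavier de la Vega III
print(render("{first} {last}", parts))
# Juan de la Vega
```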


The parser does not attempt to correct mistakes in the input. It mostly just splits on white
space and puts things in buckets based on their position in the string. This also means
the difference between 'title' and 'suffix' is positional, not semantic. "Dr" is a title
when it comes before the name and a suffix when it comes after. ("Pre-nominal"
and "post-nominal" would probably be better names.)

::

    >>> name = HumanName("1 & 2, 3 4 5, Mr.")
    >>> name 
    <HumanName : [
    	title: '' 
    	first: '3' 
    	middle: '4 5' 
    	last: '1 & 2' 
    	suffix: 'Mr.'
    	nickname: ''
    ]>
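
The positional rule can be reduced to a few lines. In this toy sketch (not the library's code), ``HONORIFICS`` is a hypothetical set of tokens that may appear on either side of a name, and ``split_nominals`` is an illustrative name. Whether a token counts as pre-nominal or post-nominal depends only on which side of the core name pieces it sits.

```python
# Hypothetical tokens that can appear before or after a name.
HONORIFICS = {"dr", "mr", "phd", "md", "esq"}


def split_nominals(pieces):
    """Return (pre_nominals, core, post_nominals) using position alone."""
    def known(piece):
        return piece.lower().strip(".") in HONORIFICS

    start, end = 0, len(pieces)
    while start < end and known(pieces[start]):
        start += 1  # leading known tokens: titles (pre-nominal)
    while end > start and known(pieces[end - 1]):
        end -= 1  # trailing known tokens: suffixes (post-nominal)
    return pieces[:start], pieces[start:end], pieces[end:]


print(split_nominals(["Dr", "Ann", "Lee"]))  # (['Dr'], ['Ann', 'Lee'], [])
print(split_nominals(["Ann", "Lee", "Dr"]))  # ([], ['Ann', 'Lee'], ['Dr'])
```

As in the "Mr." example above, the same token lands in a different bucket purely because of where it appears.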

Customization
-------------

Your project may need some adjustments for your dataset. You can make them
in your own pre- or post-processing, by `customizing the configured pre-defined
sets`_ of titles, prefixes, etc., or by subclassing the ``HumanName`` class. See the
`full documentation`_ for more information.
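
As an example of the pre-processing route, the sketch below cleans raw strings before they reach the parser. The rules are hypothetical examples for an imagined dataset (Unicode compatibility folding, dropping bracketed annotations such as ``[deceased]``, and collapsing whitespace), and ``preprocess`` is not part of nameparser. The cleaned string would then be passed to ``HumanName`` as usual.

```python
import re
import unicodedata

# Hypothetical dataset quirk: annotations in square brackets, e.g. "[deceased]".
BRACKETED = re.compile(r"\[[^\]]*\]")


def preprocess(raw):
    """Clean a raw name string before handing it to the parser."""
    # Fold compatibility characters (e.g. non-breaking spaces, full-width letters).
    text = unicodedata.normalize("NFKC", raw)
    text = BRACKETED.sub(" ", text)  # drop bracketed annotations
    return " ".join(text.split())  # collapse and trim whitespace


print(preprocess("Dr.\u00a0Juan  Vega [deceased] "))  # Dr. Juan Vega
```

Keeping dataset-specific cleanup outside the parser leaves the configured sets untouched for other inputs.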


`Full documentation`_
~~~~~~~~~~~~~~~~~~~~~

.. _customizing the configured pre-defined sets: http://nameparser.readthedocs.org/en/latest/customize.html
.. _Full documentation: http://nameparser.readthedocs.org/en/latest/


Contributing
------------

If you come across a name piece that you think should be in the default config, you're
probably right. `Start a New Issue`_ and we can get it added.

Please let me know if there are ways this library could be structured to make
it easier for you to use in your projects. Read CONTRIBUTING.md_ for more info
on running the tests and contributing to the project.

**GitHub Project**

https://github.com/derek73/python-nameparser

.. _CONTRIBUTING.md: https://github.com/derek73/python-nameparser/tree/master/CONTRIBUTING.md
.. _Start a New Issue: https://github.com/derek73/python-nameparser/issues
.. _click here to propose changes to the titles: https://github.com/derek73/python-nameparser/edit/master/nameparser/config/titles.py


.. |Build Status| image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
   :target: https://travis-ci.org/derek73/python-nameparser
.. |PyPI| image:: https://img.shields.io/pypi/v/nameparser.svg
   :target: https://pypi.org/project/nameparser/
.. |Documentation| image:: https://readthedocs.org/projects/nameparser/badge/?version=latest
   :target: http://nameparser.readthedocs.io/en/latest/?badge=latest
.. |PyPI version| image:: https://img.shields.io/pypi/pyversions/nameparser.svg
   :target: https://pypi.org/project/nameparser/