
derek73 / Python Nameparser

Licence: other

A simple Python module for parsing human names into their individual components

Labels

text-processing

Projects that are alternatives to or similar to Python Nameparser (each shares the text-processing label):

* Drop-in replacements for base R string functions powered by stringi (✭ 14)
* Explaining textual analysis tools in Python, including preprocessing, Skip Gram (word2vec), and topic modelling (✭ 48)
* A fast implementation of Aho-Corasick in Rust (✭ 424)
* advanced-text-mining: covers natural language processing and text analysis methodology using the TEANAPS library (✭ 15)
* support-tickets-classification: this case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud to classify support tickets automatically. The project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava, http://endava.com/en (✭ 142)
* ArabicProcessingCog: a Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for Arabic (✭ 19)
* Transliterate español (Spanish) spelling to andaluz proposals using JavaScript (✭ 22)
* Open Korean Text: an open-source Korean text processor (✭ 438)
* Useful Python NLP tools (evaluation, GUI interface, tokenization) (✭ 39)
* Simple SQL-like syntax on top of Perl text processing (✭ 414)
* 🍟 [Library] dA aNn0Y1Ng t3Xt g3NeRa7or (✭ 22)
* gnu-linux-shell-scripting: a foundation for GNU/Linux shell scripting (✭ 23)
* Textpipe: clean and extract metadata from text (✭ 284)
* A sharp cut(1) clone (✭ 542)
* PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation). (✭ 426)
* TextDatasetCleaner: 🔬 cleaning junk out of datasets (normalization, preprocessing) (✭ 27)
* 🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure (✭ 75)
* Diff Match Patch: a high-performance library in multiple languages that manipulates plain text (✭ 4,910)
* Ekphrasis: a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two big corpora (English Wikipedia and 330 million English tweets) (✭ 433)
* Artificial Adversary: 🗣️ a tool to generate adversarial text examples and test machine learning models against them (✭ 348)

Name Parser
===========

|Build Status| |PyPI| |PyPI version| |Documentation|

A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components, which are available as attributes of a ``HumanName`` instance (``hn`` below):

* hn.title
* hn.first
* hn.middle
* hn.last
* hn.suffix
* hn.nickname
* hn.surnames (middle + last)

Supported Name Structures
-------------------------

The supported name structure is generally "Title First Middle Last Suffix", where all pieces 
are optional. Comma-separated format like "Last, First" is also supported.

1. Title Firstname "Nickname" Middle Middle Lastname Suffix
2. Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
3. Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]

Instantiating the ``HumanName`` class with a string splits the string on commas and then
on whitespace, and classifies each name part by its position in the string and by matching
it against known name pieces such as titles and suffixes.

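The comma-then-whitespace splitting and positional bucketing described above can be illustrated with a toy, standard-library-only sketch. It is not nameparser's implementation: ``toy_parse``, ``TITLES`` and ``SUFFIXES`` are hypothetical names, the two sets are tiny stand-ins for the library's much larger configured sets, and prefix, conjunction and nickname handling are left out.

::

    # Toy illustration only; NOT nameparser's actual algorithm.
    TITLES = {"dr", "mr", "mrs", "ms", "prof"}        # hypothetical stand-in set
    SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd"}  # hypothetical stand-in set


    def _key(piece):
        """Normalize a piece for set lookup: lowercase, trailing periods removed."""
        return piece.lower().rstrip(".")


    def toy_parse(full_name):
        """Split on commas, then whitespace, and bucket pieces by position."""
        result = dict.fromkeys(["title", "first", "middle", "last", "suffix"], "")
        parts = [p.strip() for p in full_name.split(",") if p.strip()]
        last = ""
        if len(parts) > 1:
            # "Last, First Middle[, Suffix]": text before the first comma is the last name.
            last = parts[0]
            words = " ".join(parts[1:]).split()
        else:
            words = parts[0].split() if parts else []
        # Leading pieces found in TITLES are titles.
        titles = []
        while words and _key(words[0]) in TITLES:
            titles.append(words.pop(0))
        # Trailing pieces found in SUFFIXES are suffixes.
        suffixes = []
        while words and _key(words[-1]) in SUFFIXES:
            suffixes.insert(0, words.pop())
        # Without a comma, the last remaining piece is the last name.
        if not last and len(words) > 1:
            last = words.pop()
        # Of what is left, the first piece is the first name and the rest is middle.
        if words:
            result["first"] = words.pop(0)
        result.update(title=" ".join(titles), middle=" ".join(words),
                      last=last, suffix=" ".join(suffixes))
        return result

With this sketch, ``toy_parse("Dr. Juan Q. Xavier Vega III")`` and ``toy_parse("Vega, Dr. Juan Q. Xavier, III")`` fill the same buckets, which mirrors how the first two supported structures describe the same name in different orders.
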
It correctly handles some common conjunctions and special prefixes to last names,
such as "del". Titles and conjunctions can be chained together to handle complex
titles like "Asst Secretary of State". It can also try to correct the capitalization
of names that are entirely upper- or lowercase.

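The prefix and capitalization rules above can be approximated in a few lines. This is a hypothetical, standard-library-only sketch, not nameparser's code: ``split_last``, ``fix_case`` and the small ``PREFIXES`` set are illustrative assumptions, and conjunction chaining (as in "Asst Secretary of State") is not modelled.

::

    # Toy illustration only; PREFIXES is a small hypothetical stand-in set.
    PREFIXES = {"da", "de", "del", "der", "di", "la", "van", "von"}


    def split_last(words):
        """Return (remaining_words, last_name), pulling prefixes into the last name."""
        if not words:
            return [], ""
        i = len(words) - 1
        # Walk backwards over prefix pieces directly before the final word,
        # stopping at words[1] so that words[0] is never treated as a prefix.
        while i > 1 and words[i - 1].lower() in PREFIXES:
            i -= 1
        return words[:i], " ".join(words[i:])


    def fix_case(name):
        """Re-capitalize a name only if it is entirely upper- or lowercase."""
        if name != name.upper() and name != name.lower():
            return name  # mixed case: assume the input's capitalization is intended
        return " ".join(
            w.lower() if w.lower() in PREFIXES else w.capitalize()
            for w in name.split()
        )

For example, ``split_last(["Juan", "Q.", "Xavier", "de", "la", "Vega"])`` keeps "de la Vega" together as the last name, and ``fix_case("JUAN DE LA VEGA")`` gives "Juan de la Vega". Because ``str.capitalize`` lowercases everything after the first letter, names such as "McDonald" would need extra rules.
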
It makes the best guess possible with a simple, rule-based approach. It is designed
mainly for English names and is unlikely to be useful for languages whose names do
not follow the supported name structure. It's not perfect, but it gets you pretty far.

Installation
------------

::

  pip install nameparser

If you want to try out the latest code from GitHub, you can install it
with pip:

::

  pip install -e git+https://github.com/derek73/python-nameparser.git#egg=nameparser

If you need to handle lists of names, check out
`namesparser <https://github.com/gwu-libraries/namesparser>`_, a
complement to this module that handles multiple names in a string.


Quick Start Example
-------------------

::

    >>> from nameparser import HumanName
    >>> name = HumanName("Dr. Juan Q. Xavier de la Vega III (Doc Vega)")
    >>> name 
    <HumanName : [
    	title: 'Dr.' 
    	first: 'Juan' 
    	middle: 'Q. Xavier' 
    	last: 'de la Vega' 
    	suffix: 'III'
    	nickname: 'Doc Vega'
    ]>
    >>> name.last
    'de la Vega'
    >>> name.as_dict()
    {'last': 'de la Vega', 'suffix': 'III', 'title': 'Dr.', 'middle': 'Q. Xavier', 'nickname': 'Doc Vega', 'first': 'Juan'}
    >>> str(name)
    'Dr. Juan Q. Xavier de la Vega III (Doc Vega)'
    >>> name.string_format = "{first} {last}"
    >>> str(name)
    'Juan de la Vega'

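Because ``string_format`` takes ``{field}`` placeholders, it appears to follow Python's ``str.format`` syntax, with field names matching the keys returned by ``as_dict()``. The sketch below is a hypothetical stand-alone ``render`` function, not nameparser's code, showing one way such a template can be filled from a dict like the ``as_dict()`` output above and then tidied when some pieces are empty. The ``FULL_FORMAT`` template is an assumption chosen to reproduce the ``str(name)`` output shown above.

::

    import re

    # Assumed template that reproduces the str(name) output above; not
    # necessarily the library's documented default.
    FULL_FORMAT = "{title} {first} {middle} {last} {suffix} ({nickname})"


    def render(parts, template=FULL_FORMAT):
        """Fill a str.format template from a dict of name parts, then tidy up."""
        text = template.format(**parts)           # unused keys in parts are ignored
        text = text.replace("()", "")             # drop brackets left by an empty nickname
        return re.sub(r"\s+", " ", text).strip()  # collapse doubled spaces

With the quick-start parts, ``render(parts, "{first} {last}")`` returns "Juan de la Vega"; if title, middle and nickname are empty, the full template collapses to "Juan de la Vega III".
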

The parser does not attempt to correct mistakes in the input. It mostly just splits on white
space and puts things in buckets based on their position in the string. This also means
the difference between 'title' and 'suffix' is positional, not semantic. "Dr" is a title
when it comes before the name and a suffix when it comes after. ("Pre-nominal"
and "post-nominal" would probably be better names.)

::

    >>> name = HumanName("1 & 2, 3 4 5, Mr.")
    >>> name 
    <HumanName : [
    	title: '' 
    	first: '3' 
    	middle: '4 5' 
    	last: '1 & 2' 
    	suffix: 'Mr.'
    	nickname: ''
    ]>

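The positional rule for titles and suffixes can be sketched with a single set of honorifics whose bucket depends only on placement. In this toy, standard-library-only function, ``split_nominals`` and ``NOMINALS`` are hypothetical, and commas are simply ignored rather than used to reorder the name as nameparser does; it only shows "Dr." becoming a pre-nominal in one input and a post-nominal in another.

::

    # One shared set of honorifics: which bucket a piece lands in depends only
    # on where it appears. NOMINALS is a hypothetical, illustrative set.
    NOMINALS = {"dr", "mr", "mrs", "jr", "phd"}


    def split_nominals(full_name):
        """Return (pre_nominals, core, post_nominals), decided purely by position."""
        words = full_name.replace(",", " ").split()
        start, end = 0, len(words)
        # Known pieces at the front are pre-nominal (a "title").
        while start < end and words[start].lower().rstrip(".") in NOMINALS:
            start += 1
        # Known pieces at the back are post-nominal (a "suffix").
        while end > start and words[end - 1].lower().rstrip(".") in NOMINALS:
            end -= 1
        return words[:start], words[start:end], words[end:]

Here ``split_nominals("Dr. Jane Smith")`` puts "Dr." in the first bucket and ``split_nominals("Jane Smith, Dr.")`` puts it in the last, even though the piece itself is identical.
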
Customization
-------------

Your dataset may require some adjustments. You can make them in your own
pre- or post-processing, by `customizing the configured pre-defined
sets`_ of titles, prefixes, etc., or by subclassing the ``HumanName`` class. See the
`full documentation`_ for more information.

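As an example of the pre-processing route, suppose your records contain e-mail addresses and inconsistent spacing, which fall outside the supported name structure. A small hypothetical ``preprocess`` function (not part of nameparser) could strip them before parsing:

::

    import re


    def preprocess(raw):
        """Hypothetical clean-up applied to a raw string before parsing it."""
        text = re.sub(r"<[^>]*>", " ", raw)  # drop "<user@example.com>"-style fragments
        text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
        return text.strip(" ,;")             # trim stray separators at either end

The cleaned string would then be passed to the parser as usual, e.g. ``HumanName(preprocess(raw))``. Commas inside the string are kept, so "Last, First" input still works.
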

`Full documentation`_
~~~~~~~~~~~~~~~~~~~~~

.. _customizing the configured pre-defined sets: http://nameparser.readthedocs.org/en/latest/customize.html
.. _Full documentation: http://nameparser.readthedocs.org/en/latest/


Contributing
------------

If you come across a name piece that you think should be in the default config, you're
probably right. `Start a New Issue`_ and we can get it added.

Please let me know if there are ways this library could be structured to make
it easier for you to use in your projects. Read CONTRIBUTING.md_ for more info
on running the tests and contributing to the project.

**GitHub Project**

https://github.com/derek73/python-nameparser

.. _CONTRIBUTING.md: https://github.com/derek73/python-nameparser/tree/master/CONTRIBUTING.md
.. _Start a New Issue: https://github.com/derek73/python-nameparser/issues
.. _click here to propose changes to the titles: https://github.com/derek73/python-nameparser/edit/master/nameparser/config/titles.py


.. |Build Status| image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
   :target: https://travis-ci.org/derek73/python-nameparser
.. |PyPI| image:: https://img.shields.io/pypi/v/nameparser.svg
   :target: https://pypi.org/project/nameparser/
.. |Documentation| image:: https://readthedocs.org/projects/nameparser/badge/?version=latest
   :target: http://nameparser.readthedocs.io/en/latest/?badge=latest
.. |PyPI version| image:: https://img.shields.io/pypi/pyversions/nameparser.svg
   :target: https://pypi.org/project/nameparser/

Stars: ✭ 462