NCRF++, a neural sequence labeling toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Stars: ✭ 1,767 (+51.67%)

Mutual labels: natural-language-processing, crf

Libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

Stars: ✭ 3,312 (+184.29%)

Mutual labels: natural-language-processing, address

Convai Bot 1337

NIPS Conversational Intelligence Challenge 2017 Winner System: Skill-based Conversational Agent with Supervised Dialog Manager

Stars: ✭ 65 (-94.42%)

Mutual labels: natural-language-processing

Unet Crf Rnn

Edge-aware U-Net with CRF-RNN layer for Medical Image Segmentation

Stars: ✭ 63 (-94.59%)

Mutual labels: crf

Slate

A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python

Stars: ✭ 61 (-94.76%)

Mutual labels: natural-language-processing

Fromscratch

Stars: ✭ 61 (-94.76%)

Mutual labels: natural-language-processing

Get started with deep learning for text with allennlp

Getting started with AllenNLP and PyTorch by training a tweet classifier

Stars: ✭ 69 (-94.08%)

Mutual labels: natural-language-processing

Hackerrank

Solutions to problems from competitive programming platforms, mainly HackerRank and HackerEarth.

Stars: ✭ 68 (-94.16%)

Mutual labels: natural-language-processing

Kor2vec

Library for Korean morpheme and word vector representation

Stars: ✭ 64 (-94.51%)

Mutual labels: natural-language-processing

Repo 2017

Python codes in Machine Learning, NLP, Deep Learning and Reinforcement Learning with Keras and Theano

Stars: ✭ 1,123 (-3.61%)

Mutual labels: natural-language-processing

Text Analytics With Python

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

Stars: ✭ 1,132 (-2.83%)

Mutual labels: natural-language-processing

Emnlp2018 nli

Repository for NLI models (EMNLP 2018)

Stars: ✭ 62 (-94.68%)

Mutual labels: natural-language-processing

Touchdown

Cornell Touchdown natural language navigation and spatial reasoning dataset.

Stars: ✭ 69 (-94.08%)

Mutual labels: natural-language-processing

How To Mine Newsfeed Data And Extract Interactive Insights In Python

A practical guide to topic mining and interactive visualizations

Stars: ✭ 61 (-94.76%)

Mutual labels: natural-language-processing

Multilingual Latent Dirichlet Allocation Lda

A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.

Stars: ✭ 64 (-94.51%)

Mutual labels: natural-language-processing

Intent classifier

Stars: ✭ 67 (-94.25%)

Mutual labels: natural-language-processing

Gpt2

PyTorch Implementation of OpenAI GPT-2

Stars: ✭ 64 (-94.51%)

Mutual labels: natural-language-processing

Languagetoys

Random fun with statistical language models.

Stars: ✭ 63 (-94.59%)

Mutual labels: natural-language-processing

usaddress

usaddress is a Python library for parsing unstructured address strings into address components, using advanced NLP methods. Try it out on our web interface! For those who aren't Python developers, we also have an API.

What this can do: Using a probabilistic model, it makes (very educated) guesses in identifying address components, even in tricky cases where rule-based parsers typically break down.

What this cannot do: It cannot identify address components with perfect accuracy, nor can it verify that a given address is correct/valid.

It also does not normalize the address. However, this library built on top of usaddress does.

How to use the usaddress python library

Install usaddress with pip, a tool for installing and managing Python packages.

In the terminal,

pip install usaddress

Parse some addresses!

Note that parse and tag are different methods:

import usaddress
addr = '123 Main St. Suite 100 Chicago, IL'

# The parse method will split your address string into components, and label each component.
# expected output: [(u'123', 'AddressNumber'), (u'Main', 'StreetName'), (u'St.', 'StreetNamePostType'), (u'Suite', 'OccupancyType'), (u'100', 'OccupancyIdentifier'), (u'Chicago,', 'PlaceName'), (u'IL', 'StateName')]
usaddress.parse(addr)

# The tag method will try to be a little smarter
# it will merge consecutive components, strip commas, & return an address type
# expected output: (OrderedDict([('AddressNumber', u'123'), ('StreetName', u'Main'), ('StreetNamePostType', u'St.'), ('OccupancyType', u'Suite'), ('OccupancyIdentifier', u'100'), ('PlaceName', u'Chicago'), ('StateName', u'IL')]), 'Street Address')
usaddress.tag(addr)
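
To make the difference between parse and tag more concrete, here is a rough pure-Python sketch of the merging step described in the comments above. It is not usaddress's actual implementation: merge_labeled_tokens is a hypothetical helper written for illustration, it takes a parse-style list of (token, label) pairs as input, and it leaves out the address type that tag also returns.

```python
from collections import OrderedDict


def merge_labeled_tokens(pairs):
    """Hypothetical helper: collapse parse-style (token, label) pairs
    into one value per label, similar in spirit to what tag returns."""
    merged = OrderedDict()
    previous_label = None
    for token, label in pairs:
        # Strip surrounding commas and spaces, e.g. 'Chicago,' -> 'Chicago'.
        token = token.strip(" ,")
        if not token:
            continue
        if label == previous_label:
            # Consecutive tokens with the same label are joined with a space.
            merged[label] += " " + token
        elif label in merged:
            # The same label appearing in two separate places is ambiguous.
            raise ValueError("label %r repeats non-consecutively" % label)
        else:
            merged[label] = token
        previous_label = label
    return merged


# Input copied from the expected parse output shown above.
parsed = [
    ("123", "AddressNumber"), ("Main", "StreetName"),
    ("St.", "StreetNamePostType"), ("Suite", "OccupancyType"),
    ("100", "OccupancyIdentifier"), ("Chicago,", "PlaceName"),
    ("IL", "StateName"),
]
tagged = merge_labeled_tokens(parsed)
print(tagged["PlaceName"])
```

In this sketch a label that shows up in two separate places raises a plain ValueError. The library signals that situation with its own usaddress.RepeatedLabelError, so code calling tag on messy input should be prepared to catch it.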

How to use this development code (for the nerds)

usaddress uses parserator, a library for making and improving probabilistic parsers - specifically, parsers that use python-crfsuite's implementation of conditional random fields. Parserator allows you to train the usaddress parser's model (a .crfsuite settings file) on labeled training data, and provides tools for adding new labeled training data.

Building & testing the code in this repo

To build a development version of usaddress on your machine, run the following commands in your terminal:

git clone https://github.com/datamade/usaddress.git
cd usaddress
pip install -r requirements.txt
python setup.py develop
parserator train training/labeled.xml usaddress

Then run the testing suite to confirm that everything is working properly:

nosetests .

Having trouble building the code? Open an issue and we'd be glad to help you troubleshoot.

Adding new training data

If usaddress is consistently failing on particular address patterns, you can adjust the parser's behavior by adding new training data to the model. Follow our guide in the training directory, and be sure to make a pull request so that we can incorporate your contribution into our next release!
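
As a rough illustration of what labeled training data can look like, the sketch below serializes hand-labeled (token, label) pairs to XML using only the standard library. The AddressCollection and AddressString element names, and the to_training_xml helper itself, are assumptions made for this example. Check them against the existing training files and the guide in the training directory before relying on this layout.

```python
import xml.etree.ElementTree as ET


def to_training_xml(labeled_addresses):
    """Hypothetical helper: serialize labeled addresses to an XML string.

    Each address is a list of (token, label) pairs. The root and
    per-address element names below are assumed, not confirmed.
    """
    root = ET.Element("AddressCollection")
    for pairs in labeled_addresses:
        address = ET.SubElement(root, "AddressString")
        for index, (token, label) in enumerate(pairs):
            element = ET.SubElement(address, label)
            element.text = token
            # Keep a single space between tokens, none after the last one.
            if index < len(pairs) - 1:
                element.tail = " "
    return ET.tostring(root, encoding="unicode")


example = [[
    ("123", "AddressNumber"),
    ("Main", "StreetName"),
    ("St.", "StreetNamePostType"),
]]
print(to_training_xml(example))
```

Writing XML by hand like this is only meant to show the shape of the data. The documented route for contributing examples is the guide in the training directory.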

Important links

Web Interface: https://parserator.datamade.us/usaddress
Python Package Distribution: https://pypi.python.org/pypi/usaddress
Python Package Documentation: https://usaddress.readthedocs.io/
API Documentation: https://parserator.datamade.us/api-docs
Repository: https://github.com/datamade/usaddress
Issues: https://github.com/datamade/usaddress/issues
Blog post: http://datamade.us/blog/parsing-addresses-with-usaddress

Team

Forest Gregg, DataMade
Cathy Deng, DataMade
Miroslav Batchkarov, University of Sussex
Jean Cochrane, DataMade

Bad Parses / Bugs

Report issues in the issue tracker

If an address was parsed incorrectly, please let us know! You can either open an issue or (if you're adventurous) add new training data to improve the parser's model. When possible, please send over a few real-world examples of similar address patterns, along with some info about the source of the data - this will help us train the parser and improve its performance.

If something in the library is not behaving intuitively, it is a bug, and should be reported.

Note on Patches/Pull Requests

Fork the project.
Make your feature addition or bug fix.
Send us a pull request. Bonus points for topic branches!

Stars: ✭ 1,165