jasonrig / address-net

Licence: MIT license
A package to structure Australian addresses

AddressNet

Read more on Towards Data Science

Background

This project is an attempt to create a recurrent neural network that segments an Australian street address into its components such that it can be more easily matched against a structured address database. The primary use-case for a model such as this is to transform legacy address data (e.g. unvalidated addresses, such as those collected on paper or by phone) into a reportable form at minimal cost. Once structured address data is produced, searching databases such as GNAF for geocoding information is much easier!

Installation

Get the latest code by installing directly from Git:

pip install git+https://github.com/jasonrig/address-net.git

Or from PyPI:

pip install address-net
pip install address-net[tf]     # install TensorFlow (CPU version)
pip install address-net[tf_gpu] # install TensorFlow (GPU version)

You will need an appropriate version of TensorFlow installed, ideally newer than version 1.12. The base package does not install TensorFlow automatically because the CPU and GPU versions are distributed as separate packages; use the tf or tf_gpu extra shown above to install one of them.

Model output

This model performs character-level classification, assigning each character one of the following 22 classes as defined by the GNAF database:

  1. Separator/Blank
  2. Building name
  3. Level number prefix
  4. Level number
  5. Level number suffix
  6. Level type
  7. Flat number prefix
  8. Flat number
  9. Flat number suffix
  10. Flat type
  11. Number first prefix
  12. Number first
  13. Number first suffix
  14. Number last prefix
  15. Number last
  16. Number last suffix
  17. Street name
  18. Street suffix
  19. Street type
  20. Locality name
  21. State
  22. Postcode

An example result from this model for "168A Separation Street Northcote, VIC 3070" would be:

[Figure: address classification for "168A Separation Street Northcote, VIC 3070"]
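To make the idea of character-level classification concrete, the following sketch collapses one label per character into named fields. It is only an illustration: the labels are written by hand rather than produced by the model, and the label names (including "separator" and "number_first_suffix") are assumptions modelled on the class list above and the dictionary keys in the package's example output.

```python
from itertools import groupby


def group_fields(text, labels, separator="separator"):
    """Collapse per-character labels into a {field: substring} mapping."""
    if len(text) != len(labels):
        raise ValueError("expected exactly one label per character")
    fields = {}
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != separator:
            # Append, in case the same field appears in more than one run.
            fields[label] = fields.get(label, "") + text[position:position + length]
        position += length
    return fields


text = "168A Separation Street Northcote, VIC 3070"
# Hand-written labels for illustration only (not real model output).
labels = (
    ["number_first"] * 3          # "168"
    + ["number_first_suffix"]     # "A"
    + ["separator"]               # " "
    + ["street_name"] * 10        # "Separation"
    + ["separator"]               # " "
    + ["street_type"] * 6         # "Street"
    + ["separator"]               # " "
    + ["locality_name"] * 9       # "Northcote"
    + ["separator"] * 2           # ", "
    + ["state"] * 3               # "VIC"
    + ["separator"]               # " "
    + ["postcode"] * 4            # "3070"
)
result = group_fields(text, labels)
print(result)
```

Separator characters are dropped, so punctuation such as the comma after the locality never reaches the structured result.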

Architecture

This model uses a character-level vocabulary consisting of digits, lower-case ASCII characters, punctuation and whitespace as defined in Python's string module. These characters are encoded using embedding vectors of length eight.
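A vocabulary of this kind can be assembled from the constants in Python's string module, as in the minimal sketch below. The ordering of the character groups, the index assignment and the choice to drop unknown characters are assumptions made for illustration; the package may differ, for example by reserving an index for padding.

```python
import string

# Digits, lower-case ASCII letters, punctuation and whitespace.
VOCAB = (
    string.digits
    + string.ascii_lowercase
    + string.punctuation
    + string.whitespace
)
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}


def encode(address):
    """Lower-case the text and map each known character to its vocabulary index."""
    # Assumption: characters outside the vocabulary are simply skipped.
    return [CHAR_TO_ID[ch] for ch in address.lower() if ch in CHAR_TO_ID]


print(len(VOCAB))
print(encode("10A"))
```

The resulting integer indices are what an embedding layer would look up to obtain the eight-dimensional character vectors.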

The encoded text is fed through a bidirectional, three-layer recurrent neural network (RNN) built from gated recurrent units (GRUs), with 128 units per layer. The outputs from the forward and backward passes are concatenated and fed through a dense layer with ELU activations to produce logits for each class. The final output probabilities are generated through a softmax transformation.
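The output stage for a single character can be sketched in plain Python as follows. This is a toy illustration, not the package's TensorFlow code: the vector sizes, weights and the three-class output are made-up values chosen for readability, whereas the real model has 22 classes.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha * (e^x - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softmax(logits):
    """Convert logits to probabilities, shifting by the maximum for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_head(forward, backward, weights, bias):
    """Concatenate both RNN outputs, apply a dense layer with ELU, then softmax."""
    features = forward + backward  # list concatenation
    logits = [
        elu(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Made-up values: two features per direction and three classes.
forward_out = [0.5, -1.0]
backward_out = [0.25, 2.0]
weights = [
    [0.1, 0.2, -0.3, 0.4],
    [-0.5, 0.1, 0.2, 0.0],
    [0.3, -0.2, 0.1, -0.1],
]
bias = [0.0, 0.1, -0.1]
probs = output_head(forward_out, backward_out, weights, bias)
print(probs)
```

The predicted class for the character is the index of the largest probability.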

Regularisation is achieved in three ways:

  1. Data augmentation: the addresses constructed from the GNAF dataset are semi-randomly generated so that a huge variety of permutations is produced
  2. Noise: a random typo generator applies plausible errors to each address (insertions, transpositions, deletions and substitutions of nearby keys on the keyboard)
  3. Dropout: applied to the outputs and state of the RNN layers
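The noise step can be illustrated with a small typo generator. The sketch below is not the package's implementation: the function name add_typo, the partial keyboard adjacency map and the rule of one edit per call are all assumptions chosen to keep the example short.

```python
import random

# Partial QWERTY adjacency map (hypothetical and deliberately incomplete).
NEARBY_KEYS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "r": "edft", "s": "awedxz", "t": "rfgy", "n": "bhjm",
}


def add_typo(text, rng):
    """Apply one random edit: insertion, transposition, deletion or substitution."""
    if len(text) < 2:
        return text
    kind = rng.choice(["insert", "transpose", "delete", "substitute"])
    i = rng.randrange(len(text) - 1)
    neighbours = NEARBY_KEYS.get(text[i].lower(), "")
    if kind == "transpose":
        # Swap the character at i with the one that follows it.
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    if kind == "delete":
        return text[:i] + text[i + 1:]
    if not neighbours:
        return text  # no known neighbouring key: leave the text unchanged
    key = rng.choice(neighbours)
    if kind == "insert":
        return text[:i] + key + text[i:]
    return text[:i] + key + text[i + 1:]  # substitute


rng = random.Random(0)
for _ in range(5):
    print(add_typo("high street road", rng))
```

For training, the per-character labels would have to be edited alongside the text so that both stay aligned; this sketch leaves that step out.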

Data sources

The data used to produce this model came from the GNAF database, which is available under a permissive Creative Commons-like license. The GNAF data is distributed as a series of SQL files that can be imported into databases such as PostgreSQL, including a summary view named "address_view". Code included in generate_tf_records.py was used to consume a CSV dump of this view, producing a TFRecord file, a format that is natively supported by TensorFlow.
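The sketch below shows, under stated assumptions, how one row of such a CSV dump could be turned into a training example: an address string plus one label per character. The column names, the fixed field order and the labelled_example helper are hypothetical, and the real generate_tf_records.py may work quite differently (it also varies the address layout semi-randomly and writes TFRecord files, which is omitted here because it requires TensorFlow).

```python
import csv
import io

# Hypothetical column names; the real address_view columns may differ.
SAMPLE_CSV = """number_first,street_name,street_type,locality_name,state,postcode
168,SEPARATION,STREET,NORTHCOTE,VIC,3070
"""

FIELD_ORDER = [
    "number_first", "street_name", "street_type",
    "locality_name", "state", "postcode",
]


def labelled_example(row):
    """Join non-empty fields with spaces and label every character."""
    text, labels = "", []
    for field in FIELD_ORDER:
        value = row.get(field, "")
        if not value:
            continue
        if text:
            # A single space between fields, labelled as a separator.
            text += " "
            labels.append("separator")
        text += value
        labels.extend([field] * len(value))
    return text, labels


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
text, labels = labelled_example(rows[0])
print(text)
print(labels[:5])
```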

Pretrained model

While you are free to train this model using the model_fn provided, a pretrained model is supplied with this package under addressnet/pretrained and is the default model loaded when using the prediction function. Thus, using this package should be as simple as:

from addressnet.predict import predict_one

if __name__ == "__main__":
    # This is a fake address!
    print(predict_one("casa del gelato, 10A 24-26 high street road mount waverley vic 3183"))

Expected output:

{
    'building_name': 'CASA DEL GELATO',
    'flat_number': '10',
    'flat_number_suffix': 'A',
    'number_first': '24',
    'number_last': '26',
    'street_name': 'HIGH STREET',
    'street_type': 'ROAD',
    'locality_name': 'MOUNT WAVERLEY',
    'state': 'VICTORIA',
    'postcode': '3183'
}

Because the model tolerates small typographical errors, the text it extracts for a field may still contain them. A simple string similarity algorithm is therefore used to normalise fields such as street_type and state, whose valid values are known exhaustively.
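One way to perform this kind of normalisation is with difflib from the Python standard library, as sketched below. This is an assumption for illustration only: the package may use a different similarity measure, and the normalise helper and the 0.6 cutoff are not taken from its code.

```python
import difflib

# Full names of the Australian states and territories.
STATES = [
    "AUSTRALIAN CAPITAL TERRITORY", "NEW SOUTH WALES", "NORTHERN TERRITORY",
    "QUEENSLAND", "SOUTH AUSTRALIA", "TASMANIA", "VICTORIA", "WESTERN AUSTRALIA",
]


def normalise(value, choices, cutoff=0.6):
    """Return the closest known value, or the input unchanged if nothing is close."""
    matches = difflib.get_close_matches(value.upper(), choices, n=1, cutoff=cutoff)
    return matches[0] if matches else value


print(normalise("victora", STATES))
print(normalise("QEENSLAND", STATES))
```

Short abbreviations such as "VIC" may fall below the similarity cutoff, so they would likely need a separate lookup table.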
