milangritta / Geocoding-with-Map-Vector

Licence: GPL-3.0
Resources for the ACL 2018 publication "Which Melbourne? Augmenting Geocoding with Maps", published in July 2018.

Programming Languages

python

Which Melbourne? Augmenting Geocoding with Maps

Resources accompanying the ACL 2018 long paper, presented in Melbourne, Australia.

The accepted PDF manuscript is also included in this directory (as is the .PPTX from the Melbourne presentation). A video recording of the Melbourne presentation is available at https://vimeo.com/285803462.

Abstract

The purpose of text geolocation is to associate geographic information contained in a document with a set (or sets) of coordinates, either implicitly by using linguistic features and/or explicitly by using geographic metadata combined with heuristics. We introduce a geocoder (location mention disambiguator) that achieves state-of-the-art (SOTA) results on three diverse datasets by exploiting the implicit lexical clues. Moreover, we propose a new method for systematic encoding of geographic metadata to generate two distinct views of the same text. To that end, we introduce the Map Vector (MapVec), a sparse representation obtained by plotting prior geographic probabilities, derived from population figures, on a World Map. We then integrate the implicit (language) and explicit (map) features to significantly improve a range of metrics. We also introduce an open-source dataset for geoparsing of news events covering global disease outbreaks and epidemics to help future evaluation in geoparsing.

Resources

This repository contains the accompanying data and source code for CamCoder (toponym resolver) described in the paper. Additional data is required, but the files are too large for GitHub. Please download them from https://www.repository.cam.ac.uk/handle/1810/277772.

Dependencies

Instructions

  • At a minimum, download the weights.zip and geonames.db.zip files from https://www.repository.cam.ac.uk/handle/1810/277772 (the other files available there are optional).
  • Read the README.txt in the repository to learn about the contents.
  • Create a data folder outside the root directory to store the large files. N.B. There is already a data folder inside the root directory! This holds the small files.
  • Unzip the files into that directory. This will take up a few GB of space.
  • For replication, use test.py and see further instructions in the code. That should run out of the box if you followed the previous instructions. If not, get in touch!
  • To tweak the model, use train.py; see the comments inside the script for more info.
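
The unzip step above can be scripted. This is a minimal sketch, not part of the repository: the function name unpack_archives and both directory arguments are hypothetical, and only the two archive names come from the instructions. Point data_dir at the data folder you created outside the root directory.

```python
import zipfile
from pathlib import Path


def unpack_archives(download_dir, data_dir,
                    names=("weights.zip", "geonames.db.zip")):
    """Extract the downloaded archives into the external data folder.

    download_dir and data_dir are placeholders for your own paths;
    the default archive names are the two minimum downloads.
    """
    download_dir = Path(download_dir)
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    unpacked = []
    for name in names:
        archive = download_dir / name
        if not archive.is_file():
            raise FileNotFoundError(f"expected archive not found: {archive}")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)  # the real archives expand to a few GB
        unpacked.append(name)
    return unpacked
```

For example, unpack_archives("downloads", "../data") would extract both archives into a data folder one level above the current directory, if that is where you chose to put it.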

Use a GPU if you can; a CPU epoch takes such a looooooong time that it's only worth it for small jobs. Contact me on ✉️ mg711 at cam dot ac dot uk ✉️ if you need any help with reproduction or some other aspect of this work at any time. After graduation, find me on Twitter/milangritta or raise an issue/ticket.

Tools

I included a couple of 'tools' for applied scientists and tinkerers in case you want to parse your own text and/or want to compare system performance with your research.

text2mapVec.py

This is a simple function, buildMapVec(text), that turns text into a Map Vector, i.e. it extracts locations/toponyms with spaCy NER and creates the 'bag of locations' (the Map Vector) as an additional feature vector to be used in a downstream task.

NOTE: The speed of execution won't be a record breaker. This is research code and I'm really busy trying to finish the PhD, so sorry, I don't have time to rewrite it from scratch using proper software engineering principles. I hope you understand. Feel free to fork and edit.
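
To give a feel for the 'bag of locations' idea, here is a rough sketch of how population-weighted candidate locations could be plotted on a world grid and flattened into a sparse vector. It is an illustration only and not the code in text2mapVec.py: the function name sketch_map_vector, the 2-degree tile size, the (lat, lon, population) input format and the normalisation are all assumptions, and the sketch skips the NER and gazetteer lookup steps entirely.

```python
from typing import Iterable, List, Tuple


def sketch_map_vector(candidates: Iterable[Tuple[float, float, float]],
                      tile_degrees: float = 2.0) -> List[float]:
    """Plot (lat, lon, population) candidates on a flattened world grid.

    Each candidate adds its population to the tile it falls in, and the
    vector is then normalised to sum to 1. Tile size and normalisation
    are assumptions made for illustration.
    """
    cols = int(360 / tile_degrees)
    rows = int(180 / tile_degrees)
    vector = [0.0] * (rows * cols)

    for lat, lon, population in candidates:
        # Row 0 is the northernmost band, column 0 the westernmost.
        row = min(int((90.0 - lat) / tile_degrees), rows - 1)
        col = min(int((lon + 180.0) / tile_degrees), cols - 1)
        vector[row * cols + col] += float(population)

    total = sum(vector)
    if total > 0:
        vector = [value / total for value in vector]
    return vector


# Two candidate referents for "Melbourne" with rough, illustrative figures.
melbourne = [(-37.81, 144.96, 4_000_000),  # Melbourne, Australia
             (28.08, -80.61, 80_000)]      # Melbourne, Florida
map_vec = sketch_map_vector(melbourne)
```

Most entries of the result stay at zero, and the tile containing the more populous candidate carries most of the mass.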

geoparse.py

Unlike most (maybe all) geoparsers, CamCoder can perform geotagging (NER) and geocoding separately. Use (1.) for the full pipeline and (2.) for toponym resolution only.

  1. To geocode with NER: Use geoparse(text), instructions in the code.
  2. To geocode with Oracle: This will be slightly more laborious as you will need the generate_evaluation_data(corpus, file_name) function in preprocessing.py. First, save your evaluation dataset in the format of data/lgl.txt (name,,name,,lat,,lon,,start,end) so that you don't have to modify any code. I think it's the best option. Once you have generated machine-readable data with that function, you're ready to evaluate performance with test.py.
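
If you are preparing your own evaluation data, a small helper for the record format mentioned in step 2 may be useful. This sketch takes the pattern name,,name,,lat,,lon,,start,end exactly as written above; the meaning assigned to the two name fields, the Toponym type and both function names are assumptions. It handles a single record only, so check data/lgl.txt for the authoritative layout of a full line.

```python
from typing import NamedTuple


class Toponym(NamedTuple):
    name: str     # first name field (meaning assumed)
    mention: str  # second name field (meaning assumed)
    lat: float
    lon: float
    start: int    # character offset where the mention starts (assumed)
    end: int      # character offset where the mention ends (assumed)


def format_toponym(t: Toponym) -> str:
    """Serialise one record as name,,name,,lat,,lon,,start,end."""
    return f"{t.name},,{t.mention},,{t.lat},,{t.lon},,{t.start},{t.end}"


def parse_toponym(record: str) -> Toponym:
    """Parse one record; names containing ',,' are not supported."""
    name, mention, lat, lon, span = record.split(",,")
    start, end = span.split(",")
    return Toponym(name, mention, float(lat), float(lon), int(start), int(end))


record = format_toponym(Toponym("Melbourne", "Melbourne", -37.814, 144.963, 10, 19))
```

Round-tripping a few records through parse_toponym before running generate_evaluation_data is a cheap way to catch formatting slips early.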

NOTE: CamCoder uses spaCy for Named Entity Recognition. The reported F-Scores for each model can be found at https://spacy.io/models/en; they are not that great and will certainly affect performance. Use Oracle NER for a scientifically adequate comparison. Oracle means you extract the entities separately with perfect fidelity, then evaluate toponym resolution in isolation. Also feel free to plug in a custom NER tagger; the code is extendable and should be well documented. Famous last words :-)
