Novetta / Clavin

Licence: apache-2.0
CLAVIN (Cartographic Location And Vicinity INdexer) is an open source software package for document geoparsing and georesolution that employs context-based geographic entity resolution.

Programming Languages

java

Projects that are alternatives of or similar to Clavin

Flutter Geolocator
Android and iOS Geolocation plugin for Flutter
Stars: ✭ 759 (+220.25%)
Mutual labels:  geolocation, geo
Places
🌐 Turn any <input> into an address autocomplete
Stars: ✭ 5,322 (+2145.57%)
Mutual labels:  geolocation, geo
geocoder
Geocoder is a Typescript library which helps you build geo-aware applications by providing a powerful abstraction layer for geocoding manipulations
Stars: ✭ 28 (-88.19%)
Mutual labels:  geo, geolocation
Geotools
Geo-related tools PHP 5.4+ library built atop Geocoder and React libraries
Stars: ✭ 1,157 (+388.19%)
Mutual labels:  geolocation, geo
Snoop
Snoop: an open-source intelligence (OSINT) tool
Stars: ✭ 886 (+273.84%)
Mutual labels:  geolocation, geo
Z1p
Zip Codes Validation and Parse.
Stars: ✭ 17 (-92.83%)
Mutual labels:  geolocation, geo
geo
Geospatial primitives and algorithms for Crystal
Stars: ✭ 17 (-92.83%)
Mutual labels:  geo, geolocation
Geo From Ip
Get geolocation 🌐 information about an IP 📲
Stars: ✭ 24 (-89.87%)
Mutual labels:  geolocation, geo
Geo On Fire
A library to create high performance geolocation queries for Firebase. Check out the demos: https://run.plnkr.co/plunks/AYaN8ABEDcMntgbJyLVW/ and https://run.plnkr.co/plunks/xJgstAvXYcp0w7MbOOjm/
Stars: ✭ 54 (-77.22%)
Mutual labels:  geolocation, geo
React Native Android Location Services Dialog Box
React Native Android Location Services Dialog Box
Stars: ✭ 175 (-26.16%)
Mutual labels:  geolocation, geo
Geolocation
Flutter geolocation plugin for Android and iOS.
Stars: ✭ 205 (-13.5%)
Mutual labels:  geolocation
Weather
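Taiwan's Weather Maps! Want to check the weather anywhere? Using the Google Maps API map service and the forecasts from the Central Weather Bureau website, you can quickly and easily look up weather conditions for Taiwan's 368 townships!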
Stars: ✭ 206 (-13.08%)
Mutual labels:  geolocation
Geostats.jl
An extensible framework for high-performance geostatistics in Julia
Stars: ✭ 222 (-6.33%)
Mutual labels:  geo
Use Position
🌍 React hook usePosition() for fetching and following a browser geolocation
Stars: ✭ 230 (-2.95%)
Mutual labels:  geolocation
City Geo
🌄 Latitude and longitude data for Chinese cities.
Stars: ✭ 196 (-17.3%)
Mutual labels:  geo
Iploc
Fastest IP To Country Library
Stars: ✭ 224 (-5.49%)
Mutual labels:  geolocation
Awesome Gis
😎Awesome GIS is a collection of geospatial related sources, including cartographic tools, geoanalysis tools, developer tools, data, conference & communities, news, massive open online course, some amazing map sites, and more.
Stars: ✭ 2,582 (+989.45%)
Mutual labels:  geo
Jblog
🔱 A clean, attractive Java blog 👉 built on Spring MVC + Hibernate + MySQL + Bootstrap + FreeMarker. 🌈
Stars: ✭ 187 (-21.1%)
Mutual labels:  lucene
Smartstorenet
Open Source ASP.NET MVC Enterprise eCommerce Shopping Cart Solution
Stars: ✭ 2,363 (+897.05%)
Mutual labels:  lucene
Lucene
Lucene technical details
Stars: ✭ 233 (-1.69%)
Mutual labels:  lucene

CLAVIN

CLAVIN (Cartographic Location And Vicinity INdexer) is an open source software package for document geoparsing and georesolution that employs context-based geographic entity resolution. It combines a variety of open source tools with natural language processing techniques to extract location names from unstructured text documents and resolve them against gazetteer records. Importantly, CLAVIN does not simply "look up" location names; rather, it uses intelligent heuristics-based combinatorial optimization in an attempt to identify precisely which "Springfield" (for example) was intended by the author, based on the context of the document. CLAVIN also employs fuzzy search to handle incorrectly-spelled location names, and it recognizes alternative names (e.g., "Ivory Coast" and "Côte d'Ivoire") as referring to the same geographic entity. By enriching text documents with structured geo data, CLAVIN enables hierarchical geospatial search and advanced geospatial analytics on unstructured data.
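To make the idea of context-based resolution concrete, here is a deliberately tiny sketch. It is not CLAVIN's algorithm, which works over full gazetteer records and a combinatorial search; the `ContextResolverSketch` class, its `Candidate` type, and the scoring rule are all hypothetical and exist only for illustration. The sketch scores each candidate "Springfield" by how many other places mentioned in the document share its administrative region.

```java
import java.util.List;

/** Toy illustration of context-based place-name resolution (NOT CLAVIN's actual algorithm). */
final class ContextResolverSketch {

    /** Hypothetical, drastically simplified stand-in for a gazetteer record. */
    static final class Candidate {
        final String name;
        final String adminRegion;

        Candidate(String name, String adminRegion) {
            this.name = name;
            this.adminRegion = adminRegion;
        }
    }

    /**
     * Returns the candidate whose admin region matches the most context regions.
     * Ties keep the earlier candidate; an empty candidate list yields null.
     */
    static Candidate resolve(List<Candidate> candidates, List<String> contextRegions) {
        Candidate best = null;
        long bestScore = -1;
        for (Candidate c : candidates) {
            long score = contextRegions.stream().filter(c.adminRegion::equals).count();
            if (score > bestScore) {
                best = c;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> springfields = List.of(
                new Candidate("Springfield", "Illinois"),
                new Candidate("Springfield", "Massachusetts"),
                new Candidate("Springfield", "Missouri"));
        // Regions of other places mentioned in the same (made-up) document.
        List<String> context = List.of("Illinois", "Illinois");
        System.out.println(resolve(springfields, context).adminRegion); // prints: Illinois
    }
}
```

A plain lookup would have no basis for preferring one of the three records; even this crude scoring shows how surrounding place names can break the tie.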

CLAVIN natively uses Apache OpenNLP to extract place names from text. CLAVIN now also integrates with Novetta's own AdaptNLP project for place name extraction. To use AdaptNLP, follow the instructions in that repo to bring up an instance of the extractor. Lastly, we also maintain the clavin-nerd project (to be updated in the near future), which enables CLAVIN to use Stanford NER.

Novetta also maintains the CLAVIN-Rest project, which provides a RESTful microservice wrapper around CLAVIN or CLAVIN-NERD. CLAVIN-Rest is configured to build and run as a Docker image, and includes instructions for doing so.

Breaking changes

This release includes breaking changes: all namespaces have been updated from com.bericotech to com.novetta, reflecting a change in corporate ownership and a realignment to our new domain.
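For code that still imports the old packages, the update amounts to a prefix swap. The sketch below assumes the rename is a straight replacement of the com.bericotech prefix with com.novetta and that nothing else in the package path changed; `NamespaceMigrationSketch` is a hypothetical helper, not part of CLAVIN.

```java
/** Sketch: rewrite old CLAVIN package references, assuming a plain prefix swap. */
final class NamespaceMigrationSketch {

    /** Replaces every "com.bericotech." prefix in the given source text with "com.novetta.". */
    static String migrate(String javaSource) {
        return javaSource.replace("com.bericotech.", "com.novetta.");
    }

    public static void main(String[] args) {
        String before = "import com.bericotech.clavin.WorkflowDemo;";
        System.out.println(migrate(before)); // prints: import com.novetta.clavin.WorkflowDemo;
    }
}
```

In practice an IDE refactoring or a project-wide search and replace does the same job; review the result before committing, since a blind text replacement also touches comments and strings.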

How to build & use CLAVIN:

  1. Check out a copy of the source code:
git clone https://github.com/Novetta/CLAVIN.git
  2. Move into the newly-created CLAVIN directory:
cd CLAVIN
  3. Download the latest version of the allCountries.zip gazetteer file from GeoNames.org:
curl -O http://download.geonames.org/export/dump/allCountries.zip
  4. Unzip the GeoNames gazetteer file:
unzip allCountries.zip
  5. Compile the source code:
mvn compile
  6. Create the Lucene Index (this one-time process will take several minutes):
MAVEN_OPTS="-Xmx4g" mvn exec:java -Dexec.mainClass="com.novetta.clavin.index.IndexDirectoryBuilder"
  7. Run the example program:
MAVEN_OPTS="-Xmx2g" mvn exec:java -Dexec.mainClass="com.novetta.clavin.WorkflowDemo"

If you encounter an error that looks like this:

... InvocationTargetException: Java heap space ...

Set the environment variable that controls Maven's memory usage to a larger heap size, for example export MAVEN_OPTS=-Xmx4g.

Once that all runs successfully, feel free to modify the CLAVIN source code to suit your needs.

N.B.: Loading the worldwide gazetteer uses a non-trivial amount of memory. When using CLAVIN in your own programs, if you encounter Java heap space errors (like the one described in Step 7), bump up the maximum heap size for your JVM.
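When embedding CLAVIN in your own program, it can help to check the heap limit up front rather than wait for an OutOfMemoryError. The sketch below uses only the standard `Runtime.maxMemory()` call. The `HeapCheck` class and its threshold are assumptions: the threshold is set a little below the 2 GB used for the example program, because some JVMs report slightly less than the configured -Xmx value.

```java
/** Sketch: warn early if the JVM heap looks too small for the worldwide gazetteer. */
final class HeapCheck {

    // Assumption: roughly the -Xmx2g used for the example program, minus some slack,
    // since Runtime.maxMemory() can report slightly less than the -Xmx setting.
    static final long SUGGESTED_MIN_BYTES = 1_800L * 1024 * 1024;

    /** Returns true if the given heap limit meets the suggested minimum. */
    static boolean heapLooksSufficient(long maxHeapBytes) {
        return maxHeapBytes >= SUGGESTED_MIN_BYTES;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MiB%n", maxHeap / (1024 * 1024));
        if (!heapLooksSufficient(maxHeap)) {
            System.err.println("Heap may be too small; consider a larger -Xmx setting.");
        }
    }
}
```

The right threshold depends on your gazetteer and workload, so treat the constant as a starting point.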

Add CLAVIN to your project:

CLAVIN is published to Maven Central. You can add a dependency on the CLAVIN project:

<dependency>
   <groupId>com.novetta</groupId>
   <artifactId>CLAVIN</artifactId>
   <version>3.0.0</version>
</dependency>

You will still need to build the GeoNames Lucene Index as described in steps 3, 4, and 6 in "How to build & use CLAVIN".
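Because the index is built separately from the Maven dependency, a program may want to fail fast when the index is missing. The following is a minimal sketch using only the standard library. The `IndexDirectoryCheck` class is hypothetical, and the directory name `IndexDirectory` is an assumption about where the index builder writes its output; substitute the path you actually used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Sketch: check that a Lucene index directory exists and is not empty before using it. */
final class IndexDirectoryCheck {

    /** Returns true if dir is an existing directory containing at least one entry. */
    static boolean looksLikeIndex(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.findAny().isPresent();
        } catch (IOException e) {
            return false; // unreadable directory: treat as unusable
        }
    }

    public static void main(String[] args) {
        // Assumption: "IndexDirectory" is where the index was written; pass another path to override.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "IndexDirectory");
        if (looksLikeIndex(indexDir)) {
            System.out.println("Found index at " + indexDir.toAbsolutePath());
        } else {
            System.err.println("No index at " + indexDir.toAbsolutePath() + "; build it first.");
        }
    }
}
```

This only confirms that something is there; it does not validate that the contents are a usable Lucene index.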

Choosing an Extractor

When using this library, you can now choose between two extractors: Novetta AdaptNLP and Apache OpenNLP.

AdaptNLP

Creating an AdaptNlpExtractor:

LocationExtractor extractor = new AdaptNlpExtractor();

OpenNLP

Creating an ApacheExtractor:

LocationExtractor extractor = new ApacheExtractor();

There are also some convenience methods in the GeoParserFactory for Apache OpenNLP.

So, for example, to set up the Gazetteer, AdaptNLP Extractor and GeoParser classes from scratch, it looks like this with default settings:

// the maximum hit depth for CLAVIN searches
private int maxHitDepth = 3;

// the maximum context window for CLAVIN searches
private int maxContextWindow = 5;

// switch controlling use of fuzzy matching
private boolean fuzzy = false;

// AdaptNLP host and port
private String host = "http://localhost";
private int port = 5000;

// pathToLuceneIndex points to the Lucene index directory built earlier (requires java.io.File)
Gazetteer gazetteer = new LuceneGazetteer(new File(pathToLuceneIndex));
LocationExtractor extractor = new AdaptNlpExtractor(host, port);
GeoParser parser = new GeoParser(extractor, gazetteer, maxHitDepth, maxContextWindow, fuzzy);

License:

Copyright (C) 2012-2020 Novetta

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
