GeoParser
GeoParser is a software tool that can process information from any type of file, extract geographic coordinates, and visualize locations on a map. Users who want a geographical representation of their data can search for locations with GeoParser, either through a search index or by uploading files from their computer. GeoParser parses the files and plots cities or latitude-longitude points on the map. Once the points are plotted, users can filter the results by density, or by searching for a keyword and applying a "facet" to the parsed information. Clicking a location point on the map reveals more information about the location and how it relates to the search.
Installation (Docker)
docker build -t nasajplmemex/geo-parser --no-cache -f Dockerfile .
docker-compose up -d
- Visit http://localhost:8000 in your browser
Try it out to help fight COVID!
GeoParser has been updated with a new, easy-to-use Docker install and an example that downloads the COVID-19 literature data and displays its locations. Use that example to explore and test GeoParser on a real dataset.
Installation (manually)
Requirements
- Python 2.7
- pip
- Django
- Tika Python
Install Requirements
- Install Python requirements
pip install -r requirements.txt
How to Run the Application
-
Run Solr. Change directory to where you cloned the project, then start Solr:
cd Solr/solr-5.3.1/
./bin/solr start
-
Clone the lucene-geo-gazetteer repo
git clone https://github.com/chrismattmann/lucene-geo-gazetteer.git
cd lucene-geo-gazetteer
mvn install assembly:assembly
Add lucene-geo-gazetteer/src/main/bin to your PATH environment variable.
Make sure it is working
lucene-geo-gazetteer --help
usage: lucene-geo-gazetteer
 -b,--build <gazetteer file>           The Path to the Geonames allCountries.txt
 -h,--help                             Print this message.
 -i,--index <directoryPath>            The path to the Lucene index directory to either create or read
 -s,--search <set of location names>   Location names to search the Gazetteer for
-
You will now need to build a Gazetteer using the Geonames.org dataset (1.2 GB).
cd lucene-geo-gazetteer
curl -O http://download.geonames.org/export/dump/allCountries.zip
unzip allCountries.zip
lucene-geo-gazetteer -i geoIndex -b allCountries.txt
Make sure it is working
lucene-geo-gazetteer -s Pasadena Texas
[
{"Texas" : [ "Texas", "-91.92139", "18.05333" ]},
{"Pasadena" : [ "Pasadena", "-74.06446", "4.6964" ]}
]
Now start the lucene-geo-gazetteer server
lucene-geo-gazetteer -server
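The search output shown in this step is JSON, so it can be read programmatically. The following is a minimal sketch, not part of GeoParser itself: the helper name `parse_gazetteer_output` is hypothetical, and the coordinate order (longitude first, then latitude) is an inference from the sample above, where -91.92139 falls outside the valid latitude range.

```python
import json


def parse_gazetteer_output(raw):
    """Map each queried name to (matched name, longitude, latitude).

    Assumption: the two numbers are longitude then latitude, inferred
    from the sample output rather than from the tool's documentation.
    """
    results = {}
    for entry in json.loads(raw):
        for query, (name, lon, lat) in entry.items():
            results[query] = (name, float(lon), float(lat))
    return results


# Sample copied from the "lucene-geo-gazetteer -s Pasadena Texas" output.
sample = """[
{"Texas" : [ "Texas", "-91.92139", "18.05333" ]},
{"Pasadena" : [ "Pasadena", "-74.06446", "4.6964" ]}
]"""

locations = parse_gazetteer_output(sample)
for query in sorted(locations):
    name, lon, lat = locations[query]
    print("%s -> %s (lon=%s, lat=%s)" % (query, name, lon, lat))
```

To use it on live output, the string passed in would be whatever the command prints to standard output.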
-
Run the Tika server as described in https://cwiki.apache.org/confluence/display/TIKA/GeoTopicParser on port 8001. The port can be configured via config.txt
-
Make sure you can extract locations from the Tika server
curl -T /path/to/polar.geot -H "Content-Disposition: attachment; filename=polar.geot" http://localhost:8001/rmeta
You can obtain the file here: https://raw.githubusercontent.com/chrismattmann/geotopicparser-utils/master/geotopics/polar.geot
The output should look like this
[
{
"Content-Type":"application/geotopic",
"Geographic_LATITUDE":"39.76",
"Geographic_LONGITUDE":"-98.5",
"Geographic_NAME":"United States",
"Optional_LATITUDE1":"27.33931",
"Optional_LONGITUDE1":"-108.60288",
"Optional_NAME1":"China",
"X-Parsed-By":[
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.geo.topic.GeoParser"
],
"X-TIKA:parse_time_millis":"1634",
"resourceName":"polar.geot"
}
]
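As an illustration of how the metadata above might be consumed, here is a minimal sketch that collects every location from the Tika response. It is not GeoParser's own code: the function name `extract_locations` is hypothetical, and it assumes that further alternates follow the same numbered pattern as the `Optional_NAME1` keys (`Optional_NAME2`, and so on), which the sample does not show.

```python
import json
import re


def extract_locations(rmeta_json):
    """Return (name, latitude, longitude) tuples found in /rmeta output."""
    locations = []
    for doc in json.loads(rmeta_json):
        # Primary location: the unsuffixed Geographic_* keys.
        if "Geographic_NAME" in doc:
            locations.append((
                doc["Geographic_NAME"],
                float(doc["Geographic_LATITUDE"]),
                float(doc["Geographic_LONGITUDE"]),
            ))
        # Alternates: Optional_NAME1, Optional_NAME2, ...
        # (numbering beyond 1 is an assumption).
        suffixes = []
        for key in doc:
            match = re.match(r"Optional_NAME(\d+)$", key)
            if match:
                suffixes.append(int(match.group(1)))
        for n in sorted(suffixes):
            locations.append((
                doc["Optional_NAME%d" % n],
                float(doc["Optional_LATITUDE%d" % n]),
                float(doc["Optional_LONGITUDE%d" % n]),
            ))
    return locations


# Location-related keys copied from the sample output above.
sample = """[{
  "Content-Type": "application/geotopic",
  "Geographic_LATITUDE": "39.76",
  "Geographic_LONGITUDE": "-98.5",
  "Geographic_NAME": "United States",
  "Optional_LATITUDE1": "27.33931",
  "Optional_LONGITUDE1": "-108.60288",
  "Optional_NAME1": "China",
  "resourceName": "polar.geot"
}]"""

locations = extract_locations(sample)
for name, lat, lon in locations:
    print("%s: lat=%s lon=%s" % (name, lat, lon))
```

Documents without any `Geographic_*` or `Optional_*` keys simply contribute nothing to the returned list.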
-
Run Django server
python manage.py runserver
-
Open http://localhost:8000/ in your browser.
Note: Please refer to the wiki page of this GitHub repository, which serves as a guide to using GeoParser.
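Because the manual setup depends on several services running at once, a quick port check can help locate a step that did not start. The sketch below is a hypothetical helper, not part of the repository. Ports 8001 (Tika) and 8000 (Django) come from the steps above; 8983 for Solr and 8765 for the lucene-geo-gazetteer server are assumptions about those tools' defaults and should be adjusted to match your setup.

```python
import socket

# Service name -> port. Only the Tika and Django ports are stated in
# this README; the other two are assumed defaults.
SERVICES = {
    "Solr": 8983,                  # assumption: Solr default port
    "lucene-geo-gazetteer": 8765,  # assumption: gazetteer server default
    "Tika": 8001,
    "Django": 8000,
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


def check_services(services, host="localhost"):
    """Return {service name: True/False} for each configured port."""
    return dict((name, is_listening(host, port))
                for name, port in services.items())


if __name__ == "__main__":
    for name, up in sorted(check_services(SERVICES).items()):
        print("%-22s %s" % (name, "listening" if up else "not reachable"))
```

A port that accepts connections only shows that something is listening there; it does not confirm that the service is configured correctly.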