halolimat / LNEx

Licence: AGPL-3.0 License
LNEx: Location Name Extractor

Programming Languages

python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to LNEx

Information Extraction Chinese
Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT (Chinese entity recognition and relation extraction)
Stars: ✭ 1,888 (+8890.48%)
Mutual labels:  information-extraction, named-entity-recognition
IE Paper Notes
Paper notes for Information Extraction, including Relation Extraction (RE), Named Entity Recognition (NER), Entity Linking (EL), Event Extraction (EE), Named Entity Disambiguation (NED).
Stars: ✭ 14 (-33.33%)
Mutual labels:  information-extraction, named-entity-recognition
Ner Bert Pytorch
PyTorch solution of named entity recognition task Using Google AI's pre-trained BERT model.
Stars: ✭ 249 (+1085.71%)
Mutual labels:  information-extraction, named-entity-recognition
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+476.19%)
Mutual labels:  information-extraction, named-entity-recognition
neji
Flexible and powerful platform for biomedical information extraction from text
Stars: ✭ 37 (+76.19%)
Mutual labels:  information-extraction, named-entity-recognition
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+490.48%)
Mutual labels:  information-extraction, named-entity-recognition
neural name tagging
Code for "Reliability-aware Dynamic Feature Composition for Name Tagging" (ACL2019)
Stars: ✭ 39 (+85.71%)
Mutual labels:  information-extraction, named-entity-recognition
knowledge-graph-nlp-in-action
From model training to deployment: hands-on Knowledge Graph and natural language processing (NLP). Involves Tensorflow, Bert+Bi-LSTM+CRF, Neo4j, etc., and covers tasks such as Named Entity Recognition, Text Classify, Information Extraction, and Relation Extraction.
Stars: ✭ 58 (+176.19%)
Mutual labels:  information-extraction, named-entity-recognition
slotminer
Tool for slot extraction from text
Stars: ✭ 15 (-28.57%)
Mutual labels:  information-extraction, named-entity-recognition
InformationExtractionSystem
Information Extraction System can perform NLP tasks like Named Entity Recognition, Sentence Simplification, Relation Extraction etc.
Stars: ✭ 27 (+28.57%)
Mutual labels:  information-extraction, named-entity-recognition
Nested Ner Tacl2020 Transformers
Implementation of Nested Named Entity Recognition using BERT
Stars: ✭ 76 (+261.9%)
Mutual labels:  information-extraction, named-entity-recognition
simple NER
simple rule based named entity recognition
Stars: ✭ 29 (+38.1%)
Mutual labels:  information-extraction, named-entity-recognition
Understanding Financial Reports Using Natural Language Processing
Investigate how mutual funds leverage credit derivatives by studying their routine filings to the SEC using NLP techniques 📈🤑
Stars: ✭ 36 (+71.43%)
Mutual labels:  information-extraction, named-entity-recognition
Triggerner
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
Stars: ✭ 141 (+571.43%)
Mutual labels:  information-extraction, named-entity-recognition
Snips Nlu
Snips Python library to extract meaning from text
Stars: ✭ 3,583 (+16961.9%)
Mutual labels:  information-extraction, named-entity-recognition
lima
The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.
Stars: ✭ 75 (+257.14%)
Mutual labels:  information-extraction, named-entity-recognition
CogIE
CogIE: An Information Extraction Toolkit for Bridging Text and CogNet. ACL 2021
Stars: ✭ 47 (+123.81%)
Mutual labels:  information-extraction, named-entity-recognition
trinity-ie
Information extraction pipeline containing coreference resolution, named entity linking, and relationship extraction
Stars: ✭ 59 (+180.95%)
Mutual labels:  information-extraction, named-entity-recognition
nested-ner-tacl2020-flair
Implementation of Nested Named Entity Recognition using Flair
Stars: ✭ 23 (+9.52%)
Mutual labels:  information-extraction, named-entity-recognition
wen-notes
My notes.
Stars: ✭ 71 (+238.1%)
Mutual labels:  information-extraction

Location Name Extractor

Extracts location names from targeted text streams. [Paper, Poster]


How do you pronounce LNEx?

Le-N-x


The following steps let you set up and start using LNEx.

Querying OpenStreetMap Gazetteers

We will use a ready-made Elasticsearch index of the whole OpenStreetMap data (~108 GB), provided by komoot as part of their open source geocoder Photon (project repo). Follow the steps below to get Photon running on your system:

  • Download the full Photon Elasticsearch index, which allows us to query OSM using a bounding box:

    wget -O - http://download1.graphhopper.com/public/photon-db-latest.tar.bz2 | pbzip2 -cd | tar x
  • Next, start Photon, which launches the Elasticsearch index in the background as a service. You need the latest jar file from the releases at https://github.com/komoot/photon/releases. For example, with 0.3.2 as the current latest version, you can download the jar and run it as follows:

    wget https://github.com/komoot/photon/releases/download/0.3.2/photon-0.3.2.jar
    java -jar photon-0.3.2.jar
  • You can now test the running index with the following command (9200 is the default port number; it may differ on your system if the port is occupied by another application):

    curl -XGET 'http://localhost:9200/photon/'
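
Once the index responds, it can be queried for the place names inside a bounding box. The sketch below only builds the request body for Elasticsearch's geo_bounding_box query. The field name coordinate, the helper name, and the example coordinates are assumptions made for illustration, not details confirmed by LNEx or Photon, so check them against the mapping of your own index.

```python
import json


def bounding_box_query(south, west, north, east, field="coordinate"):
    """Build an Elasticsearch query body matching documents inside a box.

    `field` is an assumed name for the geo_point field; verify it against
    the mapping of your Photon index before relying on it.
    """
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": north, "lon": west},
                            "bottom_right": {"lat": south, "lon": east},
                        }
                    }
                },
            }
        }
    }


# Illustrative box roughly around Chennai (hypothetical coordinates).
body = bounding_box_query(south=12.74, west=80.07, north=13.28, east=80.35)
print(json.dumps(body, indent=2))
```

The resulting JSON could be posted to the index's _search endpoint (for example http://localhost:9200/photon/_search) with curl or an Elasticsearch client.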

Using LNEx

  • Clone this repository to your machine as follows (you should add the branch name if you are cloning a specific branch):

    git clone https://github.com/halolimat/LNEx.git
  • Make sure to use Python 3.6+.

  • Follow the example in pytest.py or pytest.ipy to use LNEx. You can initialize LNEx either from the cached files in the '_Data' folder or from the Photon index, once it is running in the background as shown above.

  • The output is going to be a list of tuples of the following items:

    • Spotted_Location: is a substring of the tweet
    • Location_Offsets: are the start and end offsets of the Spotted_Location
    • Geo_Location: is the matched location name from the gazetteer
    • Geo_Info_IDs: are the ids of the geo information of the matched Geo_Locations
    # output of the tool:
    [('Chennai', (24, 31), 'chennai', [6568]),
     ('New avadi rd', (0, 12), u'new avadi road', [9568, 5060, 7238, 5063, 1896, 12722, 2820, 9375])]
  • Finally, LNEx is lightning fast and capable of tagging streams of text. You can incorporate the following code to start streaming from Twitter (taking the spatial context into consideration), then define the bounding box that matches the spatial context established by your stream and start tagging the tweets.
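
The Twitter streaming code itself is not reproduced here. As a minimal sketch of the tagging step only, the snippet below walks a stream of texts and unpacks the output tuples described above. The function fake_extract is a hypothetical stand-in for the real LNEx extraction call (see pytest.py for the actual usage), and the tweet text is invented so that it matches the offsets in the sample output.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# One result tuple as described above:
# (Spotted_Location, Location_Offsets, Geo_Location, Geo_Info_IDs)
Mention = Tuple[str, Tuple[int, int], str, List[int]]


def tag_stream(texts: Iterable[str],
               extract: Callable[[str], List[Mention]]) -> Iterator[dict]:
    """Yield one record per spotted location for every text in the stream."""
    for text in texts:
        for spotted, (start, end), geo_name, geo_ids in extract(text):
            # The offsets should slice the spotted substring out of the text.
            if text[start:end] != spotted:
                raise ValueError(f"offsets {start}-{end} do not match {spotted!r}")
            yield {
                "text": text,
                "spotted": spotted,
                "start": start,
                "end": end,
                "gazetteer_name": geo_name,
                "geo_info_ids": geo_ids,
            }


def fake_extract(text: str) -> List[Mention]:
    """Hypothetical stand-in for the LNEx call; returns the sample output."""
    return [
        ("Chennai", (24, 31), "chennai", [6568]),
        ("New avadi rd", (0, 12), "new avadi road",
         [9568, 5060, 7238, 5063, 1896, 12722, 2820, 9375]),
    ]


tweets = ["New avadi rd is closed, Chennai"]  # invented tweet for illustration
records = list(tag_stream(tweets, fake_extract))
for record in records:
    print(record["spotted"], "->", record["gazetteer_name"])
```

Passing the extractor in as a callable keeps the loop independent of how LNEx was initialized (cached files or Photon index).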

Dataset

Tagged Location Names in Targeted Social Media Streams dataset

This dataset contains 4500 annotated tweets: 1500 tweets from each of three Twitter streams (i.e., the Chennai 2015, Louisiana 2016, and Houston 2016 floods). They were tagged using the Brat tool, recording the start and end character offsets of each mention together with a location category (inLoc, outLoc, or ambLoc), as described in the LNEx paper.
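
For illustration, annotations of this kind can be read from Brat's standoff format, where each text-bound line has the shape ID, TAB, "Type start end", TAB, covered text. The sketch below assumes the dataset is distributed in that format, which is not confirmed here. The sample lines and their categories are invented, and only contiguous spans are handled.

```python
from typing import List, NamedTuple


class LocationMention(NamedTuple):
    ann_id: str
    category: str  # e.g. inLoc, outLoc, or ambLoc
    start: int
    end: int
    text: str


def parse_brat_annotations(ann_text: str) -> List[LocationMention]:
    """Parse text-bound ("T") lines of a Brat standoff .ann file.

    Only contiguous spans are supported; a discontinuous span such as
    "0 5;10 15" would raise a ValueError here.
    """
    mentions = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, notes, and blank lines
        ann_id, type_and_span, surface = line.split("\t", 2)
        category, start, end = type_and_span.split(" ")
        mentions.append(
            LocationMention(ann_id, category, int(start), int(end), surface)
        )
    return mentions


# Invented sample annotations, not taken from the dataset.
sample = "T1\tinLoc 0 12\tNew avadi rd\nT2\tinLoc 24 31\tChennai\n"
mentions = parse_brat_annotations(sample)
for mention in mentions:
    print(mention.category, mention.start, mention.end, mention.text)
```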

You can fill out the following Form to get the full dataset. Alternatively, you can get a subset of the dataset from this folder, which only contains 150 tweets.

We would like to thank Xuke Hu of dlr.de for his contributions in fixing some errors in the labels.

Notes

Since LNEx relies on OSM gazetteers for extraction, the performance of the tool will be affected by the version of the data. The results reported in the paper were obtained with a Photon index that has the following properties:

  • "number" : "1.7.0",
  • "build_hash" : "929b9739cae115e73c346cb5f9a6f24ba735a743",
  • "build_timestamp" : "2015-07-16T14:31:07Z"

Debugging

There are a few things you need to make sure of before you get Photon to work well with LNEx.

  • You should have the latest photon jar file (see above).
  • Your elasticsearch-dsl in Python should be compatible with the Photon version you downloaded. First check the version of Elasticsearch by running curl -XGET 'http://localhost:9200'; you will find the version number under version/number.
  • The version of Elasticsearch currently used by Photon is 5.5.0, so we need a compatible elasticsearch-dsl for our Python code. You can find it by visiting this page, where the compatible version is listed under The recommended way to set your requirements .... For 5.5.0, the compatible version is elasticsearch-dsl-5.4.0, so we install it with pip install "elasticsearch-dsl>=5.0.0,<6.0.0".
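
The version check and the matching requirement can also be derived in a few lines of Python. This is a minimal sketch that assumes the elasticsearch-dsl convention of matching the library's major version to the Elasticsearch major version; the helper names and the trimmed sample response are illustrative only.

```python
import json


def elasticsearch_version(root_response: str) -> str:
    """Return version/number from the JSON served at http://localhost:9200."""
    return json.loads(root_response)["version"]["number"]


def dsl_requirement(es_version: str) -> str:
    """Pin elasticsearch-dsl to the same major version as Elasticsearch."""
    major = int(es_version.split(".")[0])
    return f"elasticsearch-dsl>={major}.0.0,<{major + 1}.0.0"


# Trimmed, illustrative response; a real one contains more fields.
sample_response = '{"version": {"number": "5.5.0"}}'
requirement = dsl_requirement(elasticsearch_version(sample_response))
print(f'pip install "{requirement}"')
# prints: pip install "elasticsearch-dsl>=5.0.0,<6.0.0"
```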

Photon Index Issue

There is an issue with the Elasticsearch index of Photon 0.3.2: before you execute java -jar photon-0.3.2.jar, you need to delete all files in the directory /photon_data/elasticsearch/modules/lang-painless/ to get it running. For more info, see komoot/photon#427.
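
If you prefer to script this cleanup, the sketch below empties a directory while keeping the directory itself. The helper name is hypothetical, and the demonstration runs against a temporary copy of the directory layout rather than real Photon data. Since the operation is destructive, double-check the path before pointing it at your own photon_data folder.

```python
import shutil
import tempfile
from pathlib import Path


def clear_directory(directory: Path) -> int:
    """Delete everything inside `directory`; return the number of entries removed."""
    removed = 0
    # Materialize the listing first so we do not delete while iterating.
    for entry in list(directory.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


# Demonstration on a throwaway directory that mimics the Photon layout.
with tempfile.TemporaryDirectory() as tmp:
    module_dir = (Path(tmp) / "photon_data" / "elasticsearch"
                  / "modules" / "lang-painless")
    module_dir.mkdir(parents=True)
    (module_dir / "example.jar").write_text("placeholder")
    removed_count = clear_directory(module_dir)
    remaining = list(module_dir.iterdir())
    print(f"removed {removed_count} entries, {len(remaining)} remaining")
```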

Licenses

This work is licensed under the AGPL-3.0 and CreativesForGood licenses. A copy of the first license can be found in this repository. The other license can be found at this link: C4G License.

Citing

If you make use of LNEx or any of its components, please cite the following publication:

Hussein S. Al-Olimat, Krishnaprasad Thirunarayan, Valerie Shalin, and Amit Sheth. 2018. 
Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models. 
In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), 
pages 1986–1997. Association for Computational Linguistics.

@InProceedings{C18-1169,
  author = "Al-Olimat, Hussein S.
           and Thirunarayan, Krishnaprasad
           and Shalin, Valerie
           and Sheth, Amit",
  title = "Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models",
  booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1986--1997",
  location = "Santa Fe, New Mexico, USA",
  url = "http://aclweb.org/anthology/C18-1169"
}

We would also be very happy if you provided a link to the GitHub repository:

... location name extractor tool (LNEx)\footnote{
    \url{https://github.com/halolimat/LNEx}
}