chicago-justice-project / article-tagging

Licence: MIT license
Natural Language Processing of Chicago news articles

Programming Languages

Jupyter Notebook
11667 projects
Python
139335 projects - #7 most used programming language
R
7636 projects

Projects that are alternatives of or similar to article-tagging

OpenNameSearch
Script for Building a Basic Nominatim Server
Stars: ✭ 14 (-65.85%)
Mutual labels:  geocoding
nominatim-java-api
Nominatim search API client written in Java
Stars: ✭ 59 (+43.9%)
Mutual labels:  geocoding
rrgeo
A fast, offline, reverse geocoder
Stars: ✭ 76 (+85.37%)
Mutual labels:  geocoding
Geocoding-with-Map-Vector
Resources for the ACL 2018 publication "Which Melbourne? Augmenting Geocoding with Maps", published in July 2018.
Stars: ✭ 24 (-41.46%)
Mutual labels:  geocoding
Atlas
🌎 Atlas is a set of APIs for looking up information about locations
Stars: ✭ 21 (-48.78%)
Mutual labels:  geocoding
Osmunda
An offline geocoding library for Android, powered by SQLite, using OSM (OpenStreetMap) data.
Stars: ✭ 37 (-9.76%)
Mutual labels:  geocoding
django-mapbox-location-field
Simple in use location model and form field with MapInput widget for picking some location. Uses mapbox gl js, flexible map provider API. Fully compatible with bootstrap framework. Can be used with spatial or plain databases.
Stars: ✭ 60 (+46.34%)
Mutual labels:  geocoding
country-bounding-boxes
A list of ISO 3166-1 country codes and their bounding boxes.
Stars: ✭ 26 (-36.59%)
Mutual labels:  geocoding
open route service
An encapsulation made around openrouteservice API for Dart and Flutter projects. Made for easy generation of Routes and Directions on Maps, Isochrones, Time-Distance Matrix, Pelias Geocoding, POIs, Elevation and routing Optimizations using their amazing API.
Stars: ✭ 20 (-51.22%)
Mutual labels:  geocoding
geocoding
Geocoding technology that provides address standardization and similarity calculation.
Stars: ✭ 148 (+260.98%)
Mutual labels:  geocoding
geocoder
Web app interface for geocoding addresses in CSV files.
Stars: ✭ 17 (-58.54%)
Mutual labels:  geocoding
addressr
Free Australian Address Validation, Search and Autocomplete
Stars: ✭ 46 (+12.2%)
Mutual labels:  geocoding
ais
Address Information System
Stars: ✭ 18 (-56.1%)
Mutual labels:  geocoding
python-censusbatchgeocoder
A simple Python wrapper for U.S. Census Geocoding Services API batch service
Stars: ✭ 40 (-2.44%)
Mutual labels:  geocoding
extractnet
A Dragnet that also extract author, headline, date, keywords from context
Stars: ✭ 52 (+26.83%)
Mutual labels:  news-articles
dtp-stat-archive
Road accident map v1.0. 👉 Project knowledge base: https://github.com/dtpstat/dtp-project/wiki
Stars: ✭ 142 (+246.34%)
Mutual labels:  geocoding
python-omgeo
OMGeocoder - A python geocoding abstraction layer
Stars: ✭ 34 (-17.07%)
Mutual labels:  geocoding
NominatimGeocoderBackend
UnifiedNlp geocoder backend that uses the OSM Nominatim service
Stars: ✭ 49 (+19.51%)
Mutual labels:  geocoding
python-opencage-geocoder
Python module to access the OpenCage geocoding API
Stars: ✭ 54 (+31.71%)
Mutual labels:  geocoding
php-opencage-geocode
PHP library to access the OpenCage geocoding API
Stars: ✭ 26 (-36.59%)
Mutual labels:  geocoding

tagnews

tagnews is a Python library that can:

  • Automatically categorize the text of news articles with type-of-crime tags, e.g. homicide, arson, gun violence.
  • Automatically extract the locations discussed in the article text, e.g. "55th and Woodlawn" and "1700 block of S. Halsted".
  • Retrieve latitude/longitude pairs for those locations using an instance of the pelias geocoder hosted by the Chicago Justice Project (CJP).
  • Find the community areas those lat/long pairs belong to, using a shape file downloaded from the city data portal and parsed with the shapely Python library.
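The last step in the list is a point-in-polygon lookup. The sketch below shows the idea in pure Python with a ray-casting test. It is only a stand-in for illustration: tagnews itself relies on shapely and the city's shape file, and the function names, area names, and rectangular boundaries here are all hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def community_area_for(lon, lat, areas):
    """Return the name of the first area containing the point, or None."""
    for name, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical rectangular areas; real community areas are irregular polygons.
areas = {
    'AREA A': [(-87.70, 41.80), (-87.60, 41.80), (-87.60, 41.90), (-87.70, 41.90)],
    'AREA B': [(-87.60, 41.75), (-87.55, 41.75), (-87.55, 41.80), (-87.60, 41.80)],
}

print(community_area_for(-87.65, 41.85, areas))  # AREA A
print(community_area_for(-87.00, 41.00, areas))  # None
```

With shapely, the same check would typically be written with Point and Polygon objects and the polygon's contains method.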

Sound interesting? There's example usage below!

You can find the source code on GitHub.

Installation

You can install tagnews with pip:

pip install tagnews

NOTE: You will need to install some NLTK packages as well:

>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('wordnet')

Note that tagnews requires Python >= 3.5.

Example

The main classes are tagnews.CrimeTags and tagnews.GeoCoder.

>>> import tagnews
>>> crimetags = tagnews.CrimeTags()
>>> article_text = ('The homicide occurred at the 1700 block of S. Halsted Ave.'
...   ' It happened just after midnight. Another person was killed at the'
...   ' intersection of 55th and Woodlawn, where a lone gunman')
>>> crimetags.tagtext_proba(article_text)
HOMI     0.739159
VIOL     0.146943
GUNV     0.134798
...
>>> crimetags.tagtext(article_text, prob_thresh=0.5)
['HOMI']
>>> geoextractor = tagnews.GeoCoder()
>>> prob_out = geoextractor.extract_geostring_probs(article_text)
>>> list(zip(*prob_out))
[..., ('at', 0.0044685714), ('the', 0.005466637), ('1700', 0.7173856),
 ('block', 0.81395197), ('of', 0.82227415), ('S.', 0.7940061),
 ('Halsted', 0.70529455), ('Ave.', 0.60538065), ...]
>>> geostrings = geoextractor.extract_geostrings(article_text, prob_thresh=0.5)
>>> geostrings
[['1700', 'block', 'of', 'S.', 'Halsted', 'Ave.'], ['55th', 'and', 'Woodlawn,']]
>>> coords, scores = geoextractor.lat_longs_from_geostring_lists(geostrings)
>>> coords
         lat       long
0  41.859021 -87.646934
1  41.794816 -87.597422
>>> scores # confidence in the lat/longs as returned by pelias, higher is better
array([0.878, 1.   ])
>>> geoextractor.community_area_from_coords(coords)
['LOWER WEST SIDE', 'HYDE PARK']
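To make the role of prob_thresh concrete, here is a minimal sketch of how thresholding could turn the probabilities above into tags and geostrings. This is not taken from the tagnews source: the helper names are hypothetical, and it assumes a score counts when it is greater than or equal to the threshold and that consecutive above-threshold words form one geostring.

```python
from itertools import groupby


def tags_above_threshold(tag_probs, prob_thresh=0.5):
    """Return the tags whose probability is at least prob_thresh."""
    return [tag for tag, prob in tag_probs.items() if prob >= prob_thresh]


def group_geostrings(word_probs, prob_thresh=0.5):
    """Group runs of consecutive words scoring at least prob_thresh."""
    geostrings = []
    runs = groupby(word_probs, key=lambda pair: pair[1] >= prob_thresh)
    for is_location, run in runs:
        if is_location:
            geostrings.append([word for word, _ in run])
    return geostrings


# Probabilities copied from the example output above.
tag_probs = {'HOMI': 0.739159, 'VIOL': 0.146943, 'GUNV': 0.134798}
word_probs = [('at', 0.0044685714), ('the', 0.005466637), ('1700', 0.7173856),
              ('block', 0.81395197), ('of', 0.82227415), ('S.', 0.7940061),
              ('Halsted', 0.70529455), ('Ave.', 0.60538065)]

print(tags_above_threshold(tag_probs))
# ['HOMI']
print(group_geostrings(word_probs))
# [['1700', 'block', 'of', 'S.', 'Halsted', 'Ave.']]
```

Raising prob_thresh trades recall for precision: fewer tags and shorter or fewer geostrings, but with higher model confidence in each.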

Limitations

This project uses machine learning to automate data cleaning and preparation tasks that would be prohibitively expensive and slow to do by hand. As with all machine learning projects, the results are not perfect, and in some cases they may look just plain bad.

We strove to build the best models we could, but perfect accuracy is rarely achievable. If you have thoughts on how to do better, please consider reporting an issue or, better yet, contributing.

How can I contribute?

Great question! Please see CONTRIBUTING.md.

Problems?

If you have problems, please report an issue. Anything that behaves unexpectedly is an issue and should be reported, and that includes bad or unexpected results. We may not be able to do anything about it, but more data rarely degrades performance.

Background

We want to compare how often different types of crime are reported in the news for certain areas with how often they actually occur in those areas. In essence, are some crimes under-represented in certain areas but over-represented in others? This is the main question driving the analysis.

This question came from the Chicago Justice Project. They have been interested in answering it for quite a while, and have been collecting the data needed for a data-backed answer. Their efforts include:

  1. Scraping RSS feeds of articles written by Chicago area news outlets for several years, allowing them to collect almost half a million articles.
  2. Organizing an amazing group of volunteers who have helped them tag these articles with crime categories like "Gun Violence" and "Drugs", with organizations such as "Cook County State's Attorney's Office", "Illinois State Police", and "Chicago Police Department", and with other miscellaneous categories such as "LGBTQ" and "Immigration".
  3. Recently updating the web UI used for this tagging to allow highlighting of geographic information, resulting in several hundred articles with labeled location sub-strings.

Most of the code for those components can be found here.

A group actively working on this project meets every Tuesday at Chi Hack Night.
