
narayave / Insight-GDELT-Feed

Licence: other
A way for home buyers to know about factors affecting a state

Programming Languages

  • JavaScript
  • Python
  • HTML
  • CSS


InfoCurrent

Helping home buyers know the factors that are making an impact on a state.

Business Use Case

A home buyer weighs multiple factors when buying a home. First, the location of the property is important. (For the purposes of this project, a location refers to a US state.) Beyond the location itself, it is also valuable to know the general stability of that location and the people and groups that support or challenge that stability.


Solution

Link: infocurrent.xyz

The business problem can be addressed by compiling and examining the news stories from a given location. In general, this gives the user the answer they are looking for.

InfoCurrent filters events logged in the GDELT database to produce results for the user, and it considers only events in US states. GDELT contains some sparse records, so part of InfoCurrent's processing is to filter those out and extract the useful data. Each event carries many fields; the one that is crucial for this application is the event's 'Goldstein Scale'. The scale ranges from -10.0 to 10.0, from severely negative impact through extreme conflict to positive impact through extreme cooperation. The rating is determined by the type of event.
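The filtering step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the field names are taken from the GDELT event table, the choice of which fields count as required is an assumption, and so is the convention that US state codes in `ActionGeo_ADM1Code` look like "US" followed by a two-letter state code. The function name `clean_event` is hypothetical.

```python
from typing import Optional

# Fields this sketch treats as mandatory (an assumption, not the project's list).
REQUIRED_FIELDS = ("SQLDATE", "ActionGeo_ADM1Code", "GoldsteinScale", "Actor1Type1Code")


def clean_event(row: dict) -> Optional[dict]:
    """Return a trimmed event for a US state, or None if the row is unusable."""
    # Drop sparse rows: any required field that is missing or blank.
    if any(not (row.get(field) or "").strip() for field in REQUIRED_FIELDS):
        return None
    adm1 = row["ActionGeo_ADM1Code"].strip()
    # Assumed convention: "US" + two-letter state code, e.g. "USCA".
    if len(adm1) != 4 or not adm1.startswith("US"):
        return None
    try:
        year = int(row["SQLDATE"].strip()[:4])
        goldstein = float(row["GoldsteinScale"])
    except ValueError:
        return None
    # Reject values outside the documented Goldstein range (also rejects NaN).
    if not -10.0 <= goldstein <= 10.0:
        return None
    return {
        "year": year,
        "state": adm1[2:],
        "actor_type": row["Actor1Type1Code"].strip(),
        "goldstein": goldstein,
    }
```

Returning `None` for rejected rows keeps the function easy to use in a plain loop or as the body of a Spark `map` followed by a filter.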

The events for US states are consolidated and grouped by year. The number of events in each state is tracked, and the Goldstein ratings are continuously summed. From the sum of the Goldstein ratings and the number of events for a location, the average Goldstein rating can be derived quickly. This average also serves as the final "Impact score" that the user sees. The rating is normalized to a value between 0.0 and 1.0 before the final step.
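As an illustration of the aggregation above, here is a small sketch that keeps a running sum and count per (state, year) and derives the average. The README does not say how the rating is normalized, so the min-max scaling from [-10, 10] to [0, 1] used here is an assumption, and the names `normalize` and `impact_scores` are hypothetical.

```python
from collections import defaultdict

GOLDSTEIN_MIN, GOLDSTEIN_MAX = -10.0, 10.0


def normalize(goldstein):
    """Map a Goldstein value from [-10, 10] onto [0, 1] (assumed min-max scaling)."""
    return (goldstein - GOLDSTEIN_MIN) / (GOLDSTEIN_MAX - GOLDSTEIN_MIN)


def impact_scores(events):
    """Aggregate (state, year, goldstein) tuples into average normalized scores."""
    # (state, year) -> [running sum of normalized ratings, event count]
    totals = defaultdict(lambda: [0.0, 0])
    for state, year, goldstein in events:
        bucket = totals[(state, year)]
        bucket[0] += normalize(goldstein)
        bucket[1] += 1
    # The average is derived only at the end, from the stored sum and count.
    return {key: total / count for key, (total, count) in totals.items()}
```

Storing the sum and the count rather than the average means new events can be folded in later without rereading old ones, which suits the 15-minute update cycle.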

Global Database of Events, Language, and Tone (GDELT)

The GDELT project collects news stories from print and web sources around the world. It identifies the people, organizations, themes, emotions, and ultimately events that are driving global society. This live data mining project produces one of the largest open spatiotemporal datasets in existence.


ETL Pipeline

Image

New GDELT updates are acquired from the source. A Python script processes the data and places it in a PostgreSQL database. Since GDELT updates are posted every 15 minutes, an Airflow workflow is scheduled to complete this process as new data arrives.
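Because updates arrive on 15-minute boundaries, a scheduled task needs to work out which file to fetch. The sketch below shows one way to do that with the standard library only. The URL pattern is an assumption based on the GDELT 2.0 event exports and should be checked against the GDELT documentation; the README does not state which feed the project reads, and `latest_update_url` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Assumed URL pattern for GDELT 2.0 event exports; verify before relying on it.
GDELT_EXPORT_URL = "http://data.gdeltproject.org/gdeltv2/{stamp}.export.CSV.zip"


def latest_update_url(now):
    """Return the export URL for the latest 15-minute boundary at or before `now`."""
    # Round the minute down to 0, 15, 30, or 45 and zero out the smaller units.
    floored = now.replace(minute=now.minute - now.minute % 15, second=0, microsecond=0)
    return GDELT_EXPORT_URL.format(stamp=floored.strftime("%Y%m%d%H%M%S"))


if __name__ == "__main__":
    print(latest_update_url(datetime.now(timezone.utc)))
```

An Airflow task scheduled every 15 minutes could call a helper like this with its execution time, then download and process the file. A file may be published slightly after its boundary, so a real task would likely need a retry.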

GDELT's historic data is stored in an Amazon S3 bucket. An offline Apache Spark batch job reads and processes the data from S3, and the processed data is saved in a PostgreSQL database.

The user-facing component of this pipeline is the Flask application. The user specifies a single state and a set of actors they are interested in. The application makes the appropriate queries to the PostgreSQL database and displays the results.
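The query made by the Flask application for one state and a set of actors might be built along these lines. This is a sketch under stated assumptions: the table `state_impact` and its columns are hypothetical, since the README does not describe the schema. The `%s` placeholders follow psycopg2's parameter style, and psycopg2 adapts a Python list to a PostgreSQL array, which is what makes `= ANY(%s)` work.

```python
def build_impact_query(state, actor_types):
    """Return (sql, params) suitable for psycopg2's cursor.execute(sql, params)."""
    if not actor_types:
        raise ValueError("at least one actor type is required")
    sql = (
        "SELECT year, actor_type, impact_score "
        "FROM state_impact "  # hypothetical table name
        "WHERE state = %s AND actor_type = ANY(%s) "
        "ORDER BY year, actor_type"
    )
    # Values are passed separately so the driver handles quoting and escaping.
    return sql, (state, list(actor_types))
```

Keeping user input out of the SQL string and passing it as parameters avoids SQL injection from the state and actor form fields.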


User Interface

Link to Flask application: infocurrent.xyz

Image


Installation

The following need to be installed and running:

  • Apache Spark
  • PostgreSQL Database
  • Flask
  • Airflow
  • Python
  • sqlalchemy
  • pandas
  • psycopg2
  • pandasql

Presentation link

Link to Infocurrent presentation
