zimmerman-team / IATI.cloud

Licence: AGPL-3.0 license
The open-source IATI datastore for IATI data, with a RESTful web API providing XML, JSON and CSV output. It extracts and parses IATI XML files referenced in the IATI Registry and is powered by Apache Solr.

Programming Languages

Python, CSS, SCSS, HTML, XSLT, JavaScript

Projects that are alternatives of or similar to IATI.cloud

litemall-dw
A big-data project based on the open-source Litemall e-commerce project, including front-end event tracking (OpenResty + Lua) and back-end tracking; a five-layer data warehouse, real-time computation and user profiling. The big-data platform uses CDH 6.3.2 (scripted with Vagrant + Ansible) and also includes Azkaban workflows.
Stars: ✭ 36 (+2.86%)
Mutual labels:  vagrant, solr
query-segmenter
Solr Query Segmenter for structuring unstructured queries
Stars: ✭ 21 (-40%)
Mutual labels:  solr, solrcloud
ansible-roles
Library of Ansible plugins and roles for deploying various services.
Stars: ✭ 14 (-60%)
Mutual labels:  vagrant, celery
Awesome Solr
A curated list of Awesome Apache Solr links and resources.
Stars: ✭ 69 (+97.14%)
Mutual labels:  solr, apache
multi-select-facet
An example of multi-select facet with Solr, Vue and Go
Stars: ✭ 30 (-14.29%)
Mutual labels:  solr, apache-solr
Celery Dashboard
A dashboard to monitor your celery app
Stars: ✭ 40 (+14.29%)
Mutual labels:  dataviz, celery
Primary Vagrant
An Apache based Vagrant configuration for helping you get the most out of WordPress Development
Stars: ✭ 192 (+448.57%)
Mutual labels:  vagrant, apache
Docker Superset
Repository for Docker Image of Apache-Superset. [Docker Image: https://hub.docker.com/r/abhioncbr/docker-superset]
Stars: ✭ 86 (+145.71%)
Mutual labels:  apache, celery
solr-container
Ansible Container project that manages the lifecycle of Apache Solr on Docker.
Stars: ✭ 17 (-51.43%)
Mutual labels:  solr, apache-solr
SolRDF
An RDF plugin for Solr
Stars: ✭ 115 (+228.57%)
Mutual labels:  solr, apache-solr
pulse
phData Pulse application log aggregation and monitoring
Stars: ✭ 13 (-62.86%)
Mutual labels:  solr, solrcloud
solr-zkutil
Solr Cloud and ZooKeeper CLI
Stars: ✭ 14 (-60%)
Mutual labels:  solr, solrcloud
RelevancyTuning
Dice.com tutorial on using black box optimization algorithms to do relevancy tuning on your Solr Search Engine Configuration from Simon Hughes Dice.com
Stars: ✭ 28 (-20%)
Mutual labels:  solr, solr-search
Geodataviz Toolkit
The GeoDataViz Toolkit is a set of resources that will help you communicate your data effectively through the design of compelling visuals. In this repository we are sharing resources, assets and other useful links.
Stars: ✭ 149 (+325.71%)
Mutual labels:  dataviz, data-visualisation
BnLMetsExporter
Command Line Interface (CLI) to export METS/ALTO documents to other formats.
Stars: ✭ 11 (-68.57%)
Mutual labels:  solr, solrcloud
Metta
An information security preparedness tool to do adversarial simulation.
Stars: ✭ 867 (+2377.14%)
Mutual labels:  vagrant, celery
solr role
Ansible role to install an Apache Solr (Cloud) server/cluster
Stars: ✭ 21 (-40%)
Mutual labels:  apache-solr, solrcloud
akvo-rsr
Akvo Really Simple Reporting
Stars: ✭ 33 (-5.71%)
Mutual labels:  iati-standard, iati
datapackage-m
Power Query M functions for working with Tabular Data Packages (Frictionless Data) in Power BI and Excel
Stars: ✭ 26 (-25.71%)
Mutual labels:  open-data, data-visualisation
ltr-tools
Set of command line tools for Learning To Rank
Stars: ✭ 13 (-62.86%)
Mutual labels:  solr, apache-solr

IATI.cloud




IATI.cloud extracts all published IATI XML files from the IATI Registry and makes them available in a normalised PostgreSQL database that you can access using a RESTful API. The project also stores all the parsed data in Apache Solr cores, allowing for faster querying. The IATI.cloud project currently encompasses two APIs.

IATI is a global aid transparency standard that makes information about aid spending easier to access, re-use and understand through a unified open standard. You can find out more about the IATI data standard at: www.iatistandard.org

Requirements

  • Python 3.6.5 (tip: for managing multiple versions of Python you can use pyenv)
  • PostgreSQL (latest)
  • PostGIS (latest; might already be installed, depending on your PostgreSQL installation)
  • RabbitMQ (latest)
  • Apache Solr 8.2.0
  • Python requirements, installed from requirements.txt (see instructions below)
  • Disk space: 1GB is recommended so that the repository, the PostgreSQL database, Apache Solr and the required services can be installed. Keep in mind that parsing and indexing datasets increases the overall size of the IATI.cloud project, which can reach 80GB or more.

Setting up your IATI.cloud environment

  1. Go to folder root/OIPA.
  2. Create a virtual environment with the correct Python version (recommended name: env), e.g.:
    apt install python3-virtualenv
    virtualenv --python=/usr/bin/python3.6 env
    
  3. Activate the virtual environment (e.g. source env/bin/activate)
  4. Install uwsgi manually:
    add-apt-repository ppa:deadsnakes/ppa
    apt-get update
    apt-get install build-essential python3.6-dev
    pip install uwsgi
    
  5. Install required libraries using pip install -r requirements.txt
  6. Make sure PostgreSQL is running on your installation (e.g. sudo systemctl status postgresql)
  7. Run pre-commit install --hook-type commit-msg
  8. Create a PostgreSQL database
  9. Add the following .env file to the current working directory:
OIPA_DB_NAME=oipa
OIPA_DB_USER=oipa
OIPA_DB_PASSWORD=oipa
DJANGO_SETTINGS_MODULE=OIPA.development_settings
  10. Add the file "local_settings.py" with the following content to the folder root/OIPA/OIPA:
SOLR = {
  'indexing': True,
  'url': 'http://localhost:8983/solr',
  'cores': {
       'activity': 'activity',
       'budget': 'budget',
       'dataset': 'dataset',
       'organisation': 'organisation',
       'publisher': 'publisher',
       'result': 'result',
       'transaction': 'transaction',
  }
}
DOWNLOAD_DATASETS = False
  11. Go back to the folder root/OIPA and run database migrations with the command python manage.py migrate
  12. Start the development server with the command python manage.py runserver
  13. Create a Django superuser account with the command python manage.py createsuperuser
  14. Start RabbitMQ with brew services start rabbitmq on macOS or sudo service rabbitmq-server start on Linux.
  15. Start the Celery worker with the command celery -A OIPA worker --loglevel=info --concurrency=10 (change the concurrency to your liking).
  16. Start Celery beat with the command celery -A OIPA beat --loglevel=info -S django
  17. Start Celery flower with the command celery flower -A OIPA --port=5555
  18. Navigate to your Solr installation.
  19. To use Apache Solr you will need to create the following 7 cores:
  • activity
  • budget
  • dataset
  • organisation
  • publisher
  • result
  • transaction

To create a core:

  • Input the following command on the command line: bin/solr create -c [name of your core].
  • Copy the 'managed-schema' file from OIPA/solr/[name of your core]/conf/ and paste it into the server/solr/[name of your core]/conf/ folder of the Solr core.

Finally, run the command bin/solr start to start Solr.
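The core-creation steps above can be sketched as a small shell loop. SOLR_HOME and OIPA_DIR below are assumptions about your local layout (your Solr installation directory and the repository's OIPA folder), not names used by the project:

```shell
# Create the seven IATI.cloud Solr cores and install the managed-schema
# files shipped in the repository's OIPA/solr/ directory.
# Usage: create_iati_cores /path/to/solr /path/to/IATI.cloud/OIPA
create_iati_cores() {
    solr_home="$1"   # assumed Solr installation directory
    oipa_dir="$2"    # assumed path to the repository's OIPA folder
    for core in activity budget dataset organisation publisher result transaction; do
        # create the core, then overwrite its schema with the one from the repo
        "$solr_home/bin/solr" create -c "$core"
        cp "$oipa_dir/solr/$core/conf/managed-schema" \
           "$solr_home/server/solr/$core/conf/managed-schema"
    done
}
```

This is a convenience sketch only; running bin/solr create for each core by hand, as described above, works just as well.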

You can now access the Django admin page at http://localhost:8000/admin/, the Flower dashboard at http://localhost:5555 and the Apache Solr administrative dashboard at http://localhost:8983/solr/#/

Debugging Celery or Apache Solr

  • install telnet
  • add the following at the line in your code you want to debug:
from celery.contrib import rdb
rdb.set_trace()

When the code reaches the corresponding line, you will receive a notification in the terminal stating that the worker is waiting for the debugger at port [port number].
In another terminal you can then launch: telnet localhost [port number].
This will open the debugger.

Parsing/indexing the data

This process is managed from the Django administration page. The following is a step-by-step description of everything that needs to be done to load the data into your local PostgreSQL database.

  1. Disable Apache Solr indexing within OIPA/OIPA/local_settings.py by changing 'indexing': True to 'indexing': False.
  2. On the django administration page, run the task to import codelists.
    • Wait for this to finish.
  3. On the django administration page, force run the task to update exchange rates.
    • Wait for this to finish.
    • Activate a scheduled version of this task, the task should be run monthly, not strictly necessary on a local installation.
  4. Enable Apache Solr indexing.
  5. On the django administration page, run the task to import datasets.
    • If you want to make use of the IATI validator, make sure you set DOWNLOAD_DATASETS = True in OIPA/OIPA/local_settings.py.
    • Wait for this to finish.
  6. Wait for the IATI Validator to finish its validation. You can check the status; if no data is returned, it has finished.
  7. If you want to parse ALL available datasets: on the Django administration page, run the task to validate the datasets. If you want to parse a specific organisation or a specific dataset, use those tasks instead.
    • Wait for this to finish.
  8. We can now parse and index the datasets that have been prepared in the previous steps. On the django administration page, run the task to parse all datasets.
    • Wait for this to finish.

After this, all the data is available in the database as well as in Solr. Solr can now be used to query the data: simply select the core containing the information you're interested in, go to the Query tab and ask your question.
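Outside the admin UI, the same queries can be issued over HTTP. A minimal standard-library sketch, assuming Solr is running on localhost:8983 with the core names from local_settings.py (treat the exact host, port and cores as assumptions about your local install):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_solr_query(core, query, rows=10, base="http://localhost:8983/solr"):
    """Build the select URL for a core, e.g. the 'activity' core."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def query_solr(core, query, rows=10):
    """Fetch and decode the JSON response (requires a running Solr)."""
    with urlopen(build_solr_query(core, query, rows)) as resp:
        return json.load(resp)
```

For example, query_solr("activity", "*:*", rows=5) fetches five documents from the activity core, equivalent to running q=*:* in the admin UI's Query tab.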

API Documentation

Full API documentation for iati.cloud can be found at docs.iati.cloud.

About the project

Can I contribute?

Yes! We are mainly looking for coders to help on the project. If you are a coder, feel free to fork the repository and send us your amazing pull requests!

How should I contribute?

Python already has clear PEP 8 code style guidelines, so there is little to add to them, but there are certain key points to follow when contributing:

  • PEP 8 code style guidelines should always be followed; they are checked with flake8 OIPA.
  • Commitlint is used to check your commit messages.
  • Always try to reference issues in commit messages or pull requests ("related to #614", "closes #619", etc.).
  • Avoid huge code commits where the diff cannot even be rendered by browser-based web apps (GitHub, for example). Smaller commits make it much easier to understand why and how the changes were made, and whether they introduce certain bugs.
  • When developing a new feature, write at least some basic tests for it. This helps avoid breaking other things in the future. Tests can be run with pytest.
  • If there's a reason to commit code that is commented out (there usually should be none), always leave a "FIXME" or "TODO" comment so it's clear to other developers why this was done.
  • When using external dependencies that are not on PyPI (from GitHub, for example), pin them to a particular commit (e.g. git+https://github.com/Supervisor/supervisor@ec495be4e28c694af1e41514e08c03cf6f1496c8#egg=supervisor), so that if the library is updated, it doesn't break everything.
  • Automatic code quality / testing checks (continuous integration tools) are implemented to check all these things automatically when pushing / merging new branches. Quality is the key!

Running the tests

Pytest-django is used to run tests; it is installed automatically when the project is set up. To run the tests, from the top-level directory of the project, run pytest OIPA/. If you are in the directory containing manage.py, running pytest alone is sufficient. Refer to the pytest-django documentation for details.
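For illustration, a pytest test is just a plain function whose name starts with test_ and which uses bare assert statements. The helper below is a hypothetical stand-in, not part of the IATI.cloud codebase:

```python
# test_example.py -- illustrative only; this helper is not part of the codebase.
def split_iati_identifier(identifier: str):
    """Toy helper: split an IATI activity identifier at its project suffix."""
    prefix, _, suffix = identifier.partition("-PROJECT-")
    return prefix, suffix


def test_split_iati_identifier():
    # pytest discovers this function automatically and reports assert failures
    prefix, suffix = split_iati_identifier("XM-DAC-41114-PROJECT-1")
    assert prefix == "XM-DAC-41114"
    assert suffix == "1"
```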

Tip: to be able to use debuggers (e.g. ipdb) with pytest, run it with the -s option (to turn off capturing of test output).

Testing / code quality settings can be found in the setup.cfg file. Test coverage settings (for the pytest-cov plugin) are in the .coveragerc file.


Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].