
opendatacube / odc-tools

Licence: Apache-2.0
ODC features that DEA is experimenting with or prototyping, with the intention of integrating them into odc-core in the future.


Projects that are alternatives of or similar to odc-tools

  • dea-coastlines: extracting tidally-constrained annual shorelines and robust rates of coastal change from freely available Earth observation data at continental scale
  • datacube-explorer: web-based exploration of Open Data Cube collections
  • deafrica-sandbox-notebooks: repository for the Digital Earth Africa Sandbox, including Jupyter notebooks, scripts, tools and workflows for geospatial analysis with Open Data Cube and xarray


DEA Prototype Code

  • AWS s3 tools
  • Rasterio from S3 investigations (see the sketch after this list)
  • Utilities for data visualizations in notebooks
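
For the S3 items above, here is a minimal sketch (not from this repo) of reading a Cloud-Optimised GeoTIFF directly from S3 with plain rasterio; the object path is hypothetical, and AWS credentials are assumed to be configured in the environment:

import rasterio

# Hypothetical S3 path; requires AWS credentials/region in the environment
with rasterio.open("s3://dea-public-data/hypothetical/band.tif") as src:
    print(src.profile)   # raster metadata: dtype, CRS, transform, ...
    band = src.read(1)   # first band as a numpy array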

Installation

This repository provides a number of small libraries and CLI tools.

Full list of libraries and install instructions:

  • odc.algo algorithms (the GeoMedian wrapper is here; a usage sketch follows this list)
  • odc.stats large scale processing framework (Moved to odc-stats)
  • odc.ui tools for data visualization in notebook/lab
  • odc.stac STAC to ODC conversion tools (Moved to odc-stac)
  • odc.dscache experimental key-value store where key=UUID, value=Dataset (Moved to odc-dscache)
  • odc.io common IO utilities, used by apps mainly
  • odc-cloud[ASYNC,AZURE,THREDDS] cloud crawling support package
    • odc.aws AWS/S3 utilities, used by apps mainly
    • odc.aio faster concurrent fetching from S3 with async, used by apps odc-cloud[ASYNC]
    • odc.{thredds,azure} internal libs for cloud IO odc-cloud[THREDDS,AZURE]
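
As a rough sketch of how these libraries fit together in a notebook (hedged: the product, measurements and extents below are hypothetical, and xr_geomedian is an assumption about the odc.algo entry point, so check the package for the exact name):

import datacube
from odc.algo import xr_geomedian  # assumed name of the GeoMedian wrapper

dc = datacube.Datacube()

# Hypothetical product, measurements and extents
ds = dc.load(
    product="some_ard_product",
    measurements=["red", "green", "blue"],
    x=(149.0, 149.1),
    y=(-35.4, -35.3),
    time="2020",
)

gm = xr_geomedian(ds)  # per-pixel geometric median composite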

Pre-releases of these libraries are on PyPI now, so they can be installed with pip "the normal way". The most recent development versions of odc-tools packages are pushed to https://packages.dea.ga.gov.au and can be installed like so:

pip install --extra-index-url="https://packages.dea.ga.gov.au" \
  odc-ui \
  odc-stac \
  odc-stats \
  odc-algo \
  odc-io \
  odc-cloud[ASYNC] \
  odc-dscache

NOTE: on Ubuntu 18.04 the default pip version is very old and does not support the --extra-index-url command line option, so make sure to upgrade pip first: pip3 install --upgrade pip.

For Conda Users

Currently there are no odc-tools conda packages, but the majority of odc-tools dependencies can be installed with conda from the conda-forge channel.

Use conda env update -f environment.yaml (the file below) to install all needed dependencies for odc-tools libraries and apps.

Conda `environment.yaml`:
channels:
  - conda-forge
dependencies:
  # Datacube
  - datacube>=1.8.5

  # odc.dscache
  - python-lmdb
  - zstandard

  # odc.algo
  - dask-image
  - numexpr
  - scikit-image
  - scipy
  - toolz

  # odc.ui
  - ipywidgets
  - ipyleaflet
  - tqdm

  # odc-apps-dc-tools
  - pystac>=1
  - pystac-client>=0.2.0
  - azure-storage-blob
  - fsspec
  - lxml  # needed for thredds-crawler

  # odc.{aio,aws}: aiobotocore/boto3
  #  pin aiobotocore for easier resolution of dependencies
  - aiobotocore==1.3.3
  - boto3

  # eodatasets3 (used by odc-stats)
  - boltons
  - ciso8601
  - python-rapidjson
  - requests-cache
  - ruamel.yaml
  - structlog
  - url-normalize

  # for dev
  - pylint
  - autopep8
  - flake8
  - isort
  - black
  - mypy

  # For tests
  - pytest
  - pytest-httpserver
  - pytest-cov
  - pytest-timeout
  - moto
  - mock
  - deepdiff

  # for pytest-depends
  - future_fstrings
  - networkx
  - colorama

  - pip=20
  - pip:
      # odc.apps.dc-tools
      - thredds-crawler

      # odc.stats
      - eodatasets3

      # tests
      - pytest-depends

      # odc.ui
      - jupyter-ui-poll

      # odc-tools libs
      - odc-stac
      - odc-algo
      - odc-ui
      - odc-dscache
      - odc-stats

      # odc-tools CLI apps
      - odc-apps-cloud
      - odc-apps-dc-tools

CLI Tools

Installation

Cloud tools depend on the aiobotocore package, which has a dependency on a specific version of botocore. Another package we use, boto3, also depends on a specific version of botocore. As a result, having both aiobotocore and boto3 in one environment can be a bit tricky. The easiest way to solve this is to install aiobotocore[awscli,boto3] before anything else, which will pull compatible versions of awscli and boto3 into the environment.

pip install -U "aiobotocore[awscli,boto3]==1.3.3"
# OR for conda setups
conda install "aiobotocore==1.3.3" boto3 awscli

The specific version of aiobotocore is not important in itself, but pinning one is needed in practice to limit the pip/conda package resolution search.

  1. For cloud (AWS only)
    pip install odc-apps-cloud
    
  2. For cloud (GCP, THREDDS and AWS)
    pip install odc-apps-cloud[GCP,THREDDS]
    
  3. For dc-index-from-tar (indexing to datacube from tar archive)
    pip install odc-apps-dc-tools
    

Apps

  1. s3-find lists an S3 bucket with a wildcard
  2. s3-to-tar fetches documents from S3 and dumps them to a tar archive
  3. gs-to-tar searches GS for documents and dumps them to a tar archive
  4. dc-index-from-tar reads yaml documents from a tar archive and adds them to datacube

Example:

#!/bin/bash

s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'

s3-find "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage

The fastest way to list regularly placed files is to use a fixed-depth listing:

#!/bin/bash

# only works when your metadata is at a fixed depth and has a fixed file name
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml'

s3-find --skip-check "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage

When using Google Storage:

#!/bin/bash

# Google Storage support
gs-to-tar --bucket data.deadev.com --prefix mangrove_cover
dc-index-from-tar --protocol gs --env mangroves --ignore-lineage metadata.tar.gz

Local Development

Requires docker; the procedure has only been tested on Linux hosts.

docker pull opendatacube/odc-test-runner:latest

cd odc-tools
make -C docker run-test

The above will run tests and generate a test coverage report in htmlcov/index.html.

Another option is to run make -C docker bash, which drops you into a shell in the /code folder containing your current checkout of odc-tools. You can then use the with-test-db start command to launch and set up a test database for running integration tests that require a datacube database. From there you can run the specific tests you are developing with py.test ./path/to/test_file.py. Any changes you make to the code outside of the docker environment are available for testing without any further action.

Release Process

Development versions of packages are pushed to the DEA packages repo on every push to the develop branch; the version is automatically increased by a script that runs before creating wheels and source distribution tarballs. Right now a new dev version is pushed for all the packages, even the ones that have not changed since the last push.

Publishing to PyPI happens automatically when changes are pushed to the protected pypi/publish branch. Only members of the Open Datacube Admins group have permission to push to this branch.

Process:

  1. Manually edit the {lib,app}/{pkg}/odc/{pkg}/_version.py file to increase the version number (a sketch of such a file follows this list)
  2. Merge it to develop branch via PR
  3. Fast forward pypi/publish branch to match develop
  4. Push it to GitHub

Steps 3 and 4 can be done by an authorized user with the ./scripts/sync-publish-branch.sh script.
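
The _version.py file from step 1 typically contains nothing but the version string; a hypothetical example (path and number are illustrative only):

# lib/odc-algo/odc/algo/_version.py (illustrative path, following the pattern above)
__version__ = "0.2.1"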
