All Projects → isciences → Exactextractr

isciences / Exactextractr

R package for fast and accurate raster zonal statistics

Programming Languages

r
7636 projects

Labels

Projects that are alternatives of or similar to Exactextractr

eemont
A python package that extends Google Earth Engine.
Stars: ✭ 290 (+124.81%)
Mutual labels:  gis, raster
wxee
A Python interface between Earth Engine and xarray for processing time series data
Stars: ✭ 113 (-12.4%)
Mutual labels:  gis, raster
georaster-layer-for-leaflet
Display GeoTIFFs and soon other types of raster on your Leaflet Map
Stars: ✭ 168 (+30.23%)
Mutual labels:  gis, raster
GeoArrays.jl
Simple geographical raster interaction built on top of ArchGDAL, GDAL and CoordinateTransformations
Stars: ✭ 42 (-67.44%)
Mutual labels:  gis, raster
Blendergis
Blender addons to make the bridge between Blender and geographic data
Stars: ✭ 4,642 (+3498.45%)
Mutual labels:  raster, gis
localtileserver
🌐 dynamic tile server for visualizing rasters in Jupyter with ipyleaflet or folium
Stars: ✭ 190 (+47.29%)
Mutual labels:  gis, raster
go-mbgl
Go bindings for Mapbox GL Native
Stars: ✭ 16 (-87.6%)
Mutual labels:  gis, raster
Mapchete
Tile-based geodata processing using rasterio & Fiona
Stars: ✭ 129 (+0%)
Mutual labels:  raster, gis
Datacube Core
Open Data Cube analyses continental scale Earth Observation data through time
Stars: ✭ 285 (+120.93%)
Mutual labels:  raster, gis
Grass
GRASS GIS - free and open source Geographic Information System (GIS)
Stars: ✭ 281 (+117.83%)
Mutual labels:  raster, gis
Earthenterprise
Google Earth Enterprise - Open Source
Stars: ✭ 2,425 (+1779.84%)
Mutual labels:  raster, gis
Rasterio
Rasterio reads and writes geospatial raster datasets
Stars: ✭ 1,643 (+1173.64%)
Mutual labels:  raster, gis
Python Geospatial
A collection of Python packages for geospatial analysis with binder-ready notebook examples
Stars: ✭ 187 (+44.96%)
Mutual labels:  raster, gis
FreehandRasterGeoreferencer
QGIS plugin for the interactive georeferencing of rasters
Stars: ✭ 38 (-70.54%)
Mutual labels:  gis, raster
Rioxarray
geospatial xarray extension powered by rasterio
Stars: ✭ 148 (+14.73%)
Mutual labels:  raster, gis
awesome-spectral-indices
A ready-to-use curated list of Spectral Indices for Remote Sensing applications.
Stars: ✭ 357 (+176.74%)
Mutual labels:  gis, raster
geoblaze
Blazing Fast JavaScript Raster Processing Engine
Stars: ✭ 80 (-37.98%)
Mutual labels:  gis, raster
Geotiff.io
Static website for viewing and analyzing GeoTIFF's in the browser
Stars: ✭ 53 (-58.91%)
Mutual labels:  raster, gis
Geocube
Tool to convert geopandas vector data into rasterized xarray data.
Stars: ✭ 87 (-32.56%)
Mutual labels:  raster, gis
Pyrate
A Python tool for estimating velocity and time-series from Interferometric Synthetic Aperture Radar (InSAR) data.
Stars: ✭ 110 (-14.73%)
Mutual labels:  gis

exactextractr

Build Status Build Status coverage report CRAN cran checks

exactextractr is an R package that quickly and accurately summarizes raster values over polygonal areas, commonly referred to as zonal statistics. Unlike most zonal statistics implementations, it handles grid cells that are partially covered by a polygon. Typical performance for real-world applications is orders of magnitude faster than the raster package.

Example Graphic.

Calculations are performed using the C++ exactextract tool. Additional background and a description of the method is available here. Full package reference documentation is available here.

Basic Usage

The package provides an exact_extract method that operates analogously to the extract method in the raster package. The snippet below demonstrates the use of this function to compute monthly mean precipitation for each municipality in Brazil.

library(raster)
library(sf)
library(exactextractr)

# Pull municipal boundaries for Brazil
brazil <- st_as_sf(getData('GADM', country='BRA', level=2))

# Pull gridded precipitation data
prec <- getData('worldclim', var='prec', res=10)

# Calculate vector of mean December precipitation amount for each municipality
brazil$mean_dec_prec <- exact_extract(prec[[12]], brazil, 'mean')

# Calculate data frame of min and max precipitation for all months
brazil <- cbind(brazil, exact_extract(prec, brazil, c('min', 'max')))

Summary Operations

exactextractr can summarize raster values using several named operations as well as arbitrary R functions. Where applicable, a named operation will provide better performance and reduced memory usage relative to an equivalent R function. Named operations are specified by providing a character vector with one or more operation names to the fun parameter of exact_extract.

The following summary operations are supported:

Name Description
count Sum of all cell coverage fractions.
majority (or mode) The raster value with the largest sum of coverage fractions.
max Maximum value of cells that intersect the polygon, ignoring coverage fractions.
mean Mean value of cells that intersect the polygon, weighted by the fraction of the cell that is covered.
median Median value of cells that intersect the polygon, weighted by the fraction of the cell that is covered.
quantile Arbitrary quantile value of cells that intersect the polygon, weighted by the fraction of the cell that is covered.
min Minimum value of cells that intersect the polygon, ignoring coverage fractions.
minority The raster value with the smallest sum of coverage fractions.
sum Sum of values of raster cells that intersect the polygon, with each raster value weighted by its coverage fraction.
variety The number of distinct raster values in cells wholly or partially covered by the polygon.
variance The population variance of cell values, weighted by the fraction of each cell that is covered by the polygon.
stdev The population standard deviation of cell values, weighted by the fraction of each cell that is covered by the polygon.
coefficient_of_variation The population coefficient of variation of cell values, weighted by the fraction of each cell that is covered by the polygon.

Two additional summary operations require the use of a second weighting raster, provided in the weights argument to exact_extract:

Name Description
weighted_mean Mean defined value of cells that intersect the polygon, weighted by the product of the coverage fraction and the value of a second weighting raster.
weighted_sum Sum of defined values of raster cells that intersect the polygon, multiplied by the coverage fraction and the value of a second weighting raster.

Weighted usage is discussed in more detail below.

Undefined (NA) values are ignored in all of the named summary operations when they occur in the value raster. When they occur in the weighting raster, they cause the result of the summary operation to be NA.

Summary Functions

In addition to the summary operations described above, exact_extract can accept an R function to summarize the cells covered by the polygon. Because exact_extract takes into account the fraction of the cell that is covered by the polygon, the summary function must take two arguments: the value of the raster in each cell touched by the polygon, and the fraction of that cell area that is covered by the polygon. (This differs from raster::extract, where the summary function takes the vector of raster values as a single argument and effectively assumes that the coverage fraction is 1.0.)

An example of a built-in function with the appropriate signature is weighted.mean. Some examples of custom summary functions are:

# Number of cells covered by the polygon (raster values are ignored)
exact_extract(rast, poly, function(values, coverage_fraction)
                            sum(coverage_fraction))

# Sum of defined raster values within the polygon, accounting for coverage fraction
exact_extract(rast, poly, function(values, coverage_fraction)
                            sum(values * coverage_fraction, na.rm=TRUE))

# Number of distinct raster values within the polygon (coverage fractions are ignored)
exact_extract(rast, poly, function(values, coverage_fraction)
                            length(unique(values)))

# Number of distinct raster values in cells more than 10% covered by the polygon
exact_extract(rast, poly, function(values, coverage_fraction)
                            length(unique(values[coverage_fraction > 0.1])))

Weighted Usage

exact_extract allows for calculation of summary statistics based on multiple raster layers, such as a population-weighted temperature. The weighting raster must use the same coordinate system as the primary raster, and it must use a grid that is compatible with the primary raster. (The resolutions and extents of the rasters need not be the same, but the higher resolution must must be an integer multiple of the lower resolution, and the cell boundaries of both rasters must coincide with cell boundaries in the higher-resolution grid.)

One application of this feature is the calculation of zonal statistics on raster data in geographic coordinates. The previous calculation of mean precipitation amount across Brazilian municipalities assumed that each raster cell covered the same area, which is not correct for rasters in geographic coordinates (latitude/longitude).

We can correct for varying cell areas by creating a weighting raster with the area of each cell in the primary raster using the area function from the raster package.

Weighted Summary Operations

Performing a weighted summary with the weighted_mean and weighted_sum operations is as simple as providing a weighting RasterLayer or RasterStack to the weights argument of exact_extract.

The area-weighted mean precipitation calculation can be expressed as:

brazil$mean_dec_prec_weighted <- exact_extract(prec[[12]], brazil, 'weighted_mean', weights = area(prec))

With the relatively small polygons used in this example, the error introduced by assuming constant cell area is negligible. However, for large polygons that span a wide range of latitudes, this may not be the case.

Weighted Summary Functions

A weighting raster can also be provided when an R summary function is used. When a weighting raster is provided, the summary function must accept a third argument containing the values of the weighting raster.

An equivalent to the weighted_mean usage above could be written as:

brazil$mean_dec_prec_weighted <- 
  exact_extract(prec[[12]], brazil, function(values, coverage_frac, weights) {
    weighted.mean(values, coverage_frac * weights)
  }, weights = area(prec))

Or, to calculate the area-weighted mean precipitation for all months:

brazil <- cbind(brazil,
  exact_extract(prec, brazil, function(values, coverage_frac, weights) {
    weighted.mean(values, coverage_frac * weights)
  },
  weights = area(prec),
  stack_apply = TRUE))

In this example, the stack_apply argument is set to TRUE so that the summary function will be applied to each layer of prec independently. (If stack_apply = FALSE, the summary function will be called with all values of prec in a 12-column data frame.)

Additional Usages

Multi-Raster Summary Functions

A multi-raster summary function can also be written to implement complex behavior that requires that multiple layers in a RasterStack be considered simultaneously.

Here, we compute an area-weighted average temperature by calling exact_extract with a RasterStack of minimum and maximum temperatures, and a RasterLayer, of cell areas.

tmin <- getData('worldclim', var = 'tmin', res = 10)
tmax <- getData('worldclim', var = 'tmax', res = 10)

temp <- stack(tmin[[12]], tmax[[12]])

brazil$tavg_dec <- exact_extract(temp, brazil,
  function(values, coverage_fraction, weights) {
    tavg <- 0.5*(values$tmin12 + values$tmax12)
    weighted.mean(tavg, coverage_fraction * weights)
  }, weights = area(prec))

When exact_extract is called with a RasterStack of values or weights and stack_apply = FALSE (the default), the values or weights from each layer of the RasterStack will be provided to the summary function as a data frame.

In the example above, the summary function is provided with a data frame of values (containing the values for each layer in the temp stack), a vector of coverage fractions, and a vector of weights.

Multi-Valued Summary Functions

In some cases, it is desirable for a summary function to return multiple values for each input feature. A common application is to summarize the fraction of each polygon that is covered by a given class of a categorical raster. This can be accomplished by writing a summary function that returns a one-row data frame for each input feature. The data frames for each feature will be combined into a single data frame using using rbind or, if it is available, dplyr::bind_rows.

In this example, the mean temperature for each municipality is returned for each altitude category.

altitude <- getData('alt', country = 'BRA')

prec_for_altitude <- exact_extract(prec[[12]], brazil, function(prec, frac, alt) {
  # ignore cells with unknown altitude
  prec <- prec[!is.na(alt)]
  frac <- frac[!is.na(alt)]
  alt <- alt[!is.na(alt)]
  
  low <- !is.na(alt) & alt < 500
  high <- !is.na(alt) & alt >= 500

  data.frame(
    prec_low_alt = weighted.mean(prec[low], frac[low]),
    prec_high_alt = weighted.mean(prec[high], frac[high])
  )
}, weights = altitude)

Rasterization

exactextractr can rasterize polygons though computation of the coverage fraction in each cell. The coverage_fraction function returns a RasterLayer with values from 0 to 1 indicating the fraction of each cell that is covered by the polygon. Because this function generates a RasterLayer for each feature in the input dataset, it can quickly consume a large amount of memory. Depending on the analysis being performed, it may be advisable to manually loop over the features in the input dataset and combine the generated rasters during each iteration.

Performance and Accuracy

An example benchmark using the example data is shown below. The mean execution time for exactextractr was 2.6 seconds, vs 136 for raster. Timing was obtained from execution on an AWS t2.medium instance.

microbenchmark(
  a <- exact_extract(prec[[12]], brazil, weighted.mean),
  b <- extract(prec[[12]], brazil, mean, na.rm=TRUE), times=5)
  
# Unit: seconds
#               expr         min          lq        mean     median          uq        max neval
# a <- exact_extract(...)    2.5674   2.586868   2.626761   2.587283   2.613296   2.778957     5
#       b <- extract(...)  136.1710 136.180563 136.741275 136.226435 136.773627 138.354764     5

Results from exactextractr are more accurate than other methods because raster pixels that are partially covered by polygons are considered. The significance of partial coverage increases for polygons that are small or irregularly shaped. For the 5500 Brazilian municipalities used in the example, the error introduced by incorrectly handling partial coverage is less than 1% for 88% of municipalities and reaches a maximum of 9%.

Although exactextractr is fast, it may still be slower than the command-line exactextract tool. However, some efficiencies, such as the simultaneous processing of multiple layers in a RasterStack, are only possible in exactextractr.

Dependencies

Installation requires version 3.5 or greater of the GEOS geometry processing library. It is recommended to use the most recent released version (3.8) for best performance. On Windows, GEOS will be downloaded automatically as part of package install. On MacOS, it can be installed using Homebrew (brew install geos). On Linux, it can be installed from system package repositories (apt-get install libgeos-dev on Debian/Ubuntu, or yum install libgeos-devel on CentOS/RedHat.)

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].