GeoTrellis | |
Developer: | LocationTech, Azavea |
Released: | 12 May 2012 |
Latest Release Version: | 3.6.0 |
Latest Release Date: | [1] |
Operating System: | Linux |
Programming Language: | Scala |
Genre: | Big Data, Map algebra |
License: | Apache License 2.0 |
GeoTrellis is an open-source geographic data processing library designed to work with large geospatial raster data sets. It is written in Scala and released under the Apache License 2.0.
GeoTrellis's core competency is raster data processing: it enables distributed processing of large geospatial raster data sets using the techniques of map algebra. Beyond raster operations, GeoTrellis also provides some support for working with vector and point cloud data.
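Map algebra treats rasters as grids of cells and defines analysis as operations over those grids. Local operations combine the values at the same cell position across one or more rasters, while focal operations compute each output cell from a neighborhood of input cells. The following is a minimal plain-Scala sketch of both ideas, assuming a hypothetical `Grid` type that stores cells in row-major order; it illustrates the concepts only and is not the GeoTrellis `Tile` API.

```scala
// Hypothetical raster type for illustration only (not GeoTrellis's Tile).
// Cells are stored in row-major order: index = row * cols + col.
final case class Grid(cols: Int, rows: Int, cells: Vector[Double]) {
  require(cells.length == cols * rows, "cell count must equal cols * rows")

  def get(col: Int, row: Int): Double = cells(row * cols + col)

  // Local operation: combine two aligned grids cell by cell.
  def local(other: Grid)(f: (Double, Double) => Double): Grid = {
    require(cols == other.cols && rows == other.rows, "grids must align")
    Grid(cols, rows, cells.zip(other.cells).map { case (a, b) => f(a, b) })
  }

  // Focal operation: mean of the 3x3 neighborhood around each cell,
  // using only the neighbors that fall inside the grid.
  def focalMean: Grid = {
    val out = for {
      row <- 0 until rows
      col <- 0 until cols
    } yield {
      val neighbors = for {
        dr <- -1 to 1
        dc <- -1 to 1
        r = row + dr
        c = col + dc
        if r >= 0 && r < rows && c >= 0 && c < cols
      } yield get(c, r)
      neighbors.sum / neighbors.size
    }
    Grid(cols, rows, out.toVector)
  }
}
```

For example, `a.local(b)(_ + _)` adds two aligned rasters cell by cell. Because each output cell of a local operation depends only on the same position in the inputs, such operations split naturally across tiles, which is one reason map algebra suits distributed processing.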
GeoTrellis uses Apache Spark for distributed processing. Distributing a large dataset relies on indexing it along a multi-dimensional space-filling curve (SFC). An SFC translates a multi-dimensional key (such as a tile's column, row, and time) into a single-dimensional index while preserving geospatial locality, so that data close together in space tend to receive nearby index values. This allows large datasets to be read and written efficiently in parallel across multiple computers.
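One common SFC is the Z-order (Morton) curve, which interleaves the bits of each coordinate into a single integer. The sketch below is a minimal illustration of that technique for two-dimensional tile coordinates, assuming non-negative 32-bit column and row values; the object name `ZOrder` and its methods are hypothetical and do not reproduce GeoTrellis's own key-index implementation.

```scala
// Hypothetical Z-order (Morton) index for 2-D tile keys; illustration only.
object ZOrder {
  // Spread the low 32 bits of x so that bit i moves to bit 2 * i.
  private def spread(x: Long): Long = {
    var v = x & 0xFFFFFFFFL
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFL
    v = (v | (v << 8)) & 0x00FF00FF00FF00FFL
    v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL
    v = (v | (v << 2)) & 0x3333333333333333L
    v = (v | (v << 1)) & 0x5555555555555555L
    v
  }

  // Inverse of spread: gather the even-position bits back into the low 32 bits.
  private def compact(z: Long): Long = {
    var v = z & 0x5555555555555555L
    v = (v | (v >>> 1)) & 0x3333333333333333L
    v = (v | (v >>> 2)) & 0x0F0F0F0F0F0F0F0FL
    v = (v | (v >>> 4)) & 0x00FF00FF00FF00FFL
    v = (v | (v >>> 8)) & 0x0000FFFF0000FFFFL
    v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL
    v
  }

  // Column bits occupy even positions, row bits occupy odd positions.
  def encode(col: Int, row: Int): Long = {
    require(col >= 0 && row >= 0, "tile coordinates must be non-negative")
    spread(col.toLong) | (spread(row.toLong) << 1)
  }

  // Recover (col, row) from a Z-order index.
  def decode(z: Long): (Int, Int) = (compact(z).toInt, compact(z >>> 1).toInt)
}
```

With this encoding, the tiles at (0, 0), (1, 0), (0, 1), and (1, 1) receive indices 0 through 3, so a 2x2 block of neighboring tiles occupies one contiguous index range that can be stored and read together. The Z-order curve has occasional large jumps between adjacent cells; a Hilbert curve preserves locality more evenly at the cost of a more complex encoding.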
Python bindings for GeoTrellis have been developed as a sub-project called GeoPySpark, which enables Python developers to access and use the GeoTrellis library.
GeoTrellis started as a research project at Azavea, a geospatial software company based in Philadelphia. A precursor software component, DecisionTree, was developed beginning in 2006 with support from a Small Business Innovation Research grant from the U.S. Department of Agriculture. In 2009, with financial support from the William Penn Foundation and Stroud Water Research Center, Azavea embarked on early development of GeoTrellis.
GeoTrellis was released as an open source project in 2011 [2] with the goal of supporting fast processing of geospatial raster data at scale.
GeoTrellis initially supported distributed computation through Akka, a Scala framework for building concurrent and distributed applications. The need to support additional use cases and features, such as caching and sharding datasets across a storage cluster, led to a search for a new distribution framework. GeoTrellis moved to Apache Spark as its distribution engine in 2014 [3] to take advantage of the resource management, scheduling, and other features of the Spark framework. One key use case that drove this phase of development was the need to efficiently process large spatiotemporal datasets like those used in many earth science applications, such as climate change research.[4] The move to Apache Spark enabled efficient support for large climate change forecast datasets published by the Intergovernmental Panel on Climate Change (IPCC).
GeoTrellis was submitted to the Eclipse Foundation's LocationTech[5] working group in 2013 and graduated from incubation with a 1.0 release in December 2016.[6]
GeoTrellis has been used in a number of geospatial domains, including satellite and aerial image processing, forest growth simulation, agricultural yield prediction, planning, digital humanities, government infrastructure investment, and machine learning to support crime risk forecasting. It is also integrated into other open-source software projects, including Raster Foundry,[7] RasterFrames,[8] and GeoPySpark.[9]