Hadoop for Geospatial Data Analysis

This conference paper presents a novel approach for analyzing remote sensing time series data using big data streaming and MapReduce techniques. The authors propose a combination of streaming analytics and distributed file systems to facilitate complex analyses of large satellite imagery datasets, specifically utilizing the BFAST classification algorithm on MODIS data. The results indicate that this method improves computational efficiency and enables scientists to perform reproducible analyses on extensive Earth observation data.

Big data streaming for remote sensing time series analytics using MapReduce

Conference Paper · December 2016

All content following this page was uploaded by Luiz Fernando Ferreira Gomes de Assis on 01 November 2016.


Big data streaming for remote sensing time series analytics
using MapReduce
Luiz Fernando Assis¹, Gilberto Ribeiro¹, Karine Reis Ferreira¹, Lúbia Vinhas¹,
Eduardo Llapa¹, Alber Sanchez¹, Victor Maus¹, Gilberto Câmara¹

¹Image Processing Department
INPE - National Institute for Space Research
Av dos Astronautas 1758
Caixa Postal Sao Jose dos Campos – SP – Brazil

{luizffga,gribeiro,karine,lubia,edullapa,[Link],gilberto}@[Link]

Abstract. Governmental agencies provide a large and open set of satellite imagery that
can be used to track changes in geographic features over time. The currently available
analysis methods are complex and very demanding in terms of computing capabilities.
Hence, scientists cannot reproduce analytic results for lack of computing infrastructure.
Therefore, we propose a combination of streaming and MapReduce for the analysis of
remote sensing time series data. We tested our proposal by applying the classification
algorithm BFAST to MODIS imagery. Then, we evaluated computing performance and
quality requirement attributes. Our results revealed that the combination of Hadoop and
R can handle complex analysis of remote sensing time series.

1. Introduction
Currently, there is a huge amount of remote sensing imagery openly available, since many
space agencies have adopted open access policies for their repositories. These large data
sets are a good chance to broaden the scope of scientific research that uses Earth
observation (EO) data. To support this research, scientists need platforms where they can
run algorithms that analyze big Earth observation data sets. Since most scientists are not
data experts, they need data management solutions that are flexible and adaptable.
To work with big EO data, we need to develop and deploy innovative knowledge platforms.
When users want to work with hundreds or thousands of images, it is not practical to work
with individual files on their local disks. Innovative platforms should allow scientists
to perform data analysis directly on big data servers. Scientists will then be able to
develop completely new algorithms that can seamlessly span partitions in space, time, and
spectral dimensions. Thus, we share the vision for big scientific data computing expressed
by the late database researcher Jim Gray: "Petascale data sets require a new work style.
Today the typical scientist copies files to a local server and operates on the data sets
using his own resources. Increasingly, the data sets are so large, and the application
programs are so complex, that it is much more economical to move the end-user's programs
to the data and only communicate questions and answers rather than moving the source data
and its applications to the user's local system" [Gray et al. 2005].
For instance, the standard approach to land use and land cover monitoring consists of
selecting and downloading a set of images and processing each one, using visual
interpretation or semi-automatic classification methods, to delineate the areas of
interest. This approach is ineffective when there is too much data, for example when
working on large extensions of land at high spatio-temporal resolution. In contrast to
analyzing one image at a time, time series analysis has become a valuable alternative in
land use/land cover monitoring, including early warning of deforestation
[Verbesselt et al. 2012a]. However, we lack environments for validating and reproducing
the analysis results of large remote sensing data [Lu et al. 2016, Maus et al. 2016]. To
address this problem, streaming analytics has emerged as a solution that combines fast
access, scalable storage and easy deployment for complex analysis. This approach is able
to analyze data in near real-time with low latency and to point to events at regional and
global scales without overhead.
Sensor and location-based social networks are common data sources for the analysis of
spatial data in near real-time. Since the users of these networks generate petabytes of
data, the data are provided through streaming APIs, which have several applications,
including the analysis of the occurrence of events [Assis et al. 2015,
Schnebele et al. 2014]. Unlike these streaming APIs, parallel stream processing plug-ins
deal with I/O interpreters in a more intuitive way, offering a powerful and flexible way
to analyze data. Hadoop¹ and SciDB streaming² are APIs that gather large amounts of data
from a file system (Hadoop) and a multidimensional database (SciDB), respectively.
Specifically, Hadoop streaming has the advantage of using a standard processing model
called MapReduce, which has been optimized for specific features with different degrees of
conformance to the model [Urbani et al. 2014, Dede et al. 2014].
However, most MapReduce-based approaches only provide an image library
[Sweeney et al. 2011] by means of a customization, which is limiting for analysis. Besides,
only a small variety of analysis methods is provided at any one time, and new complex
algorithms are costly to develop and reproduce [Almeer 2012]. Furthermore, most of the
available methods extract land use and land cover information using region-based
classifications, even though these may cause loss of information
[Giachetta and Fekete 2015]. For these reasons, a flexible, generic and broad solution is
required to reuse remote sensing time series analysis methods, avoiding the burden of
developing and adapting them to each scientific need.
Therefore, we propose a combination of distributed file systems and complex analysis
environments in a MapReduce-based streaming processing analytics approach. It is
implemented with <key, values> pairs, where the key is an image pixel location and the
values are the time series associated with that location. We evaluated this approach using
the BFAST algorithm, which iteratively estimates the time and number of abrupt changes
within a time series and characterizes each change by its magnitude and direction
[Verbesselt et al. 2010]. We use BFAST to detect and characterize changes in time series of
MODIS (Moderate Resolution Imaging Spectroradiometer) data [Rudorff 2007]. Briefly, the
main contributions of this work are:

1. To present a time series-based streaming processing analytics using MapReduce;

2. To discuss the lessons learned from a case study that evaluates our approach in terms
of performance and quality requirements.
¹ [Link]
² [Link]
The remainder of this paper is structured as follows. Section 2 discusses the time-first,
space-later versus space-first, time-later analysis. Section 3 describes related work,
while Section 4 outlines our approach using MapReduce for remote sensing time series.
Section 5 presents the evaluation of our approach and its results. Section 6 concludes
this paper with recommendations for future work.

2. Time-first, Space-later vs Space-first, Time-later

Scientists have analyzed time series of remote sensing imagery to detect changes in three
different ways: 1) processing each image independently and comparing the results for
different time instances, 2) building a time series for each pixel and processing the
series independently, and 3) developing algorithms that process multiple pixels at
multiple time instances. The first type of analysis is hereinafter called the space-first,
time-later approach. It aims to evaluate and compare the results of a pixel classification
independently in time. For example, if more than one image classification method based on
forest cover percentage (see Figure 1) is applied, a pixel may be assigned to distinct
land cover types. An error in one of the classifications can lead to inconsistent results
when the pixels of each scene are analyzed separately. This inconsistency may also grow
with the number of scenes and, depending on the application, lead to an analysis mistake.
Due to this limitation, scientists have used an alternative approach, in which the methods
are based on what we define as the time-first, space-later approach. The key is to
consider the temporal auto-correlation of the data instead of the spatial auto-correlation
[Eklundh and Jönsson 2012], which is really important for remote sensing time series
analysis. In this case, scientists analyze each pixel independently, taking into
consideration all the values of the pixel over time (see Figure 2).
For example, given a set S = {s1, s2, ..., sn} of remote sensing satellite images that
depict the same region at n consecutive times, we can define them as a three-dimensional
array in space-time. For each digital image si ∈ S, millions of pixels are associated with
their respective spatial locations (latitude, longitude), which, together with the time
index, correspond to the (x, y, z) position in a 3D matrix. The z-component of the matrix
corresponds to the time axis of the satellite imagery. Each pixel location (x, y, z)
contains a set A = {a1, a2, ..., am} of attribute values, represented by the spectral bands
of the set of images. These attributes can provide land-use and land-cover information, as
each kind of target on the ground (forest, water, soil, among others) has a different
spectral reflectance signature as a function of wavelength.
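As a minimal sketch of this layout, the Python snippet below stores a tiny stack of single-band images as nested lists indexed by time, row and column, and reads the values of one pixel along the time axis. The array contents, the `stack` name and the `pixel_series` helper are illustrative assumptions, not data or code from the experiments.

```python
# Toy space-time stack (assumed values): stack[t][row][col] holds one
# attribute value, e.g. a vegetation index, for the image taken at time t.
stack = [
    [[0.8, 0.7], [0.6, 0.9]],  # image s1 (t = 0)
    [[0.8, 0.2], [0.6, 0.9]],  # image s2 (t = 1)
    [[0.7, 0.3], [0.5, 0.9]],  # image s3 (t = 2)
]


def pixel_series(stack, row, col):
    """Return the values of one pixel location along the time axis."""
    return [image[row][col] for image in stack]


# Time-first, space-later: the unit of analysis is the whole series of a pixel.
series = pixel_series(stack, 0, 1)
print(series)  # [0.7, 0.2, 0.3]
```

A space-first, time-later analysis would instead iterate over `stack` and process each two-dimensional image on its own before comparing results.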
The time-first, space-later approach is more suitable, for example, for detecting
deforestation or forest degradation from time series of remote sensing imagery. Suppose
that we are working with images that have a spectral attribute a associated with forest
cover. We can think of a situation in which an area was a pristine forest until 2000, was
cut down in 2001 and started to regenerate in 2010. If we follow the value of a over time,
using complex time series analytics, we can monitor these dynamics. If we consider large
databases of imagery, with high spatial and temporal resolutions and covering large
extensions, we will need robust methods to deal with the big EO data. The streaming
processing analytics approach presented in this paper is a contribution to meeting this
demand.
Figure 1. Space First, Time Later
Figure 2. Time First, Space Later

3. Related Works
Due to the increasing interest in EO applications, a set of additional mechanisms has
emerged to load, process and analyze remote sensing imagery. These mechanisms aim to
convert the images into different data formats, since storage components sometimes accept
only a specific representation. Analytic algorithms have been built to enrich existing
storage components with more statistical and mathematical operations, but they still lag
far behind statistical software packages such as those available in the CRAN repository.
To reduce data movement and the communication overhead between storage and analysis,
integrating these storage components with R, letting each do what it does best, is still a
better approach. This combination aims to scale analytic methods over massive datasets by
exploiting the parallelism of storage components in an analyst-friendly environment
[Integrating 2011]. The problem with this integration is that a sophisticated
understanding of the particular characteristics of each component is mandatory and
functionalities need to be re-implemented. For these reasons, data should be acquired,
processed and analyzed continuously, in an easy and flexible manner, in near real-time.
For this purpose, stream analytics over location-based social networks, provided by means
of APIs, has emerged as one of the most common approaches in the literature. Most of the
existing studies that use these streams aim to provide location-based event visualization,
statistical analysis and graphing capabilities [Schnebele et al. 2014]. They also aim to
explore the spatial information contained in social network messages. For example, social
network messages can be used to detect events such as floods and elections in near
real-time [Assis et al. 2015, Song and Kim 2013]. The challenge here is to combine
different data flows and data formats to support the analysis of high-value social network
messages in near real-time. In distributed parallel processing, streaming APIs³ ⁴ have
been mainly used to perform an arbitrary set of independent tasks that can be broken into
parts and run separately in another environment with reusable code. These APIs handle
input/reading and output/writing commands through stdin and stdout.
Hadoop Streaming is a representative API that has the advantage of using MapReduce, a
standard processing model, to process data in near real-time by customizing how input and
output are split into key/value pairs. One of the most important features of this open
implementation is that Hadoop is fault-tolerant. Its main goal is to support the execution
of tasks using a scalable cluster of computing nodes [Rusu and Cheng 2013].
Hadoop-GIS, MD-HBase and SpatialHadoop are examples of GIS tools that require extra
overhead for more flexible functions [Aji et al. 2013, Nishimura et al. 2013,
Eldawy and Mokbel 2015]. Unlike dedicated proprietary services such as Google Earth Engine,
which offer minimal standards for scientific collaboration, alternative interfaces of
Hadoop can abstract highly technical details of image processing from the point of view of
computer vision [Sweeney et al. 2011].

³ [Link]
⁴ [Link]
However, when a large number of analytic algorithms is needed, these approaches burden
developers and scientists, since there is a clear limitation in the available operations
and functions, mainly regarding remote sensing time series analysis. Furthermore, existing
studies address this approach with a more spatial focus on image classification algorithms
[Almeer 2012, Giachetta and Fekete 2015], which results in more loss of information. For
these reasons, the high technical complexity involved in developing new applications
should be hidden from developers and scientists, and consequently a more flexible and
generic approach is required.

4. Streaming Processing Analytics using MapReduce

Since remote sensing time series analytics requires dealing with a large amount of
satellite imagery of the same place at different times, it is necessary to build an
approach that provides fast access, scalable storage and more flexible complex analysis
methods. This makes it easier for other scientists to reproduce and validate scientific
research on this topic. With this in mind, we propose an approach that combines a
streaming processing mechanism based on MapReduce with a complex statistical analysis
environment. These choices were based on the flexibility offered by existing streaming
processing, which allows algorithms to be implemented in different languages, as well as
on the many special-purpose analysis components provided by these environments. First, we
store all the images in a distributed file system, so that they are processed by two
methods (Mapper and Reducer) that build the timeline values and analyze them by calling a
complex algorithm.
The main advantage of using a standard processing model such as MapReduce is that both
methods receive and transmit data as <key, values> pairs, giving scientists more
interoperability and a clear capacity for processing data. In our approach, the Mapper
input is a <key, values> pair in which the key is an image identifier and the values are
all of the desired pixel locations (x,y), that is, the image content itself. The Mapper is
responsible for extracting the features from the images for each desired pixel,
transforming them into time series data and emitting them to the Reducer. The Mapper
output is a <key, values> pair in which the key is a pixel location (x,y) and the values
are time series data (e.g., x = 10, y = 45, values = "0.5 0.7 0.4 0.6" are represented as a
<(10, 45), (0.5 0.7 0.4 0.6)> pair). As the Mapper output is the Reducer input, the Reducer
receives the combination of pixel and time series values and analyzes them by means of a
complex method. The result in this case is stored in the distributed file system. A
high-level architecture of this time series-based streaming processing analytics for
remote sensing data can be seen in Figure 3.
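The pair from the example above can be written as a single line of text, which is the form in which a streaming Mapper and Reducer exchange data. The sketch below is one possible encoding, not necessarily the one used in the experiments: it assumes a tab between key and values (the default key separator in Hadoop Streaming) and a comma inside the key, and the names `encode_pair` and `decode_pair` are hypothetical.

```python
def encode_pair(x, y, values):
    """Serialize a <(x, y), values> pair as one tab-separated text line."""
    key = f"{x},{y}"
    return key + "\t" + " ".join(str(v) for v in values)


def decode_pair(line):
    """Parse a line produced by encode_pair back into ((x, y), values)."""
    key, _, payload = line.rstrip("\n").partition("\t")
    x, y = (int(part) for part in key.split(","))
    return (x, y), [float(v) for v in payload.split()]


line = encode_pair(10, 45, [0.5, 0.7, 0.4, 0.6])
print(repr(line))         # '10,45\t0.5 0.7 0.4 0.6'
print(decode_pair(line))  # ((10, 45), [0.5, 0.7, 0.4, 0.6])
```

Keeping the whole pixel location in the key means that all values for one location are grouped under the same key on the Reducer side.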

4.1. Data Model and Storage

As a distributed file system is able to store any data type and format without restriction,
its schema-on-read approach offers a more adequate design for our case. Unlike
schema-on-write approaches, such as database management systems that require a predefined
schema to store and query the data, schema-on-read approaches load raw, unprocessed data
and give it a structure at processing time, according to the application requirements. As
a result, data not previously accessible are interpreted as they are read, that is,
scientists learn the data over time in near real-time. The distributed file system enables
the storage of binary files such as raster files and shapefiles. Additional tools can help
scientists organize their data, with or without a defined structure. In our case, the
images gathered by the satellites are stored by year, in the sequence in which they were
processed by the provider, which makes it easier to build the time series.

Figure 3. MapReduce Streaming Analytics Processing
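One way to obtain such a per-year ordering is to sort the file names by their acquisition date. The sketch below assumes file names that follow the public MODIS naming pattern, in which a token of the form `AYYYYDDD` encodes the acquisition year and day of year. The concrete names, the `acquisition_key` helper and the `by_year` grouping are illustrative assumptions, not the storage layout used in the experiments.

```python
import re

# Hypothetical file names; only the ".AYYYYDDD." token matters here.
names = [
    "MOD13Q1.A2001001.h13v10.hdf",
    "MOD13Q1.A2000065.h13v10.hdf",
    "MOD13Q1.A2000049.h13v10.hdf",
]


def acquisition_key(name):
    """Return (year, day_of_year) parsed from the AYYYYDDD token."""
    match = re.search(r"\.A(\d{4})(\d{3})\.", name)
    if match is None:
        raise ValueError(f"no acquisition date in {name!r}")
    return int(match.group(1)), int(match.group(2))


# Chronological order is the order of the values in each pixel's time series.
ordered = sorted(names, key=acquisition_key)

# Group the ordered names by year, mirroring a one-directory-per-year layout.
by_year = {}
for name in ordered:
    by_year.setdefault(acquisition_key(name)[0], []).append(name)

print(by_year)
```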

4.2. MapReduce Programming Model

The MapReduce programming model consists of two methods responsible for extracting the
features from the images and for running the complex algorithms of remote sensing time
series applications in an independent and reusable manner. Both the Mapper and the Reducer
receive their input and write their output as <key, values> pairs by means of standard
input (stdin) and standard output (stdout). Unlike in other approaches, the <key, values>
pairs are line-oriented and processed as they arrive, since the Mapper and the Reducer
control the processing. In this work, the Mapper filters and sorts both the pixels and
their attribute values into lines, while the Reducer performs the complex analysis and
stores the result.
An informal high-level description of the Mapper can be seen in Algorithm 1. First, the
Mapper gets the dataset names of the standardized stored images and creates raster layer
objects for them according to the spectral band id chosen by the scientists. The input is
a <IMG, (x1,y1), (x1,y2), ..., (xn,yn)> pair, where IMG is an identifier for each image and
the rest is a list of pixel coordinates to be analyzed. Second, the Mapper builds the time
series by getting the values for each pixel. In this part, the scientist defines the pixel
interval and gets the values for each pixel in it. For example, for an entire image, the
scientist would define the interval from 1 to 23040000 (4800x4800 - MODIS data resolution).
Third, the Mapper computes the position of each pixel: the row is the ceiling of the pixel
number divided by the image resolution, and the column is the remainder of that division.
Lastly, the Mapper emits the time series it has built to the Reducer.
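The row and column computation can be sketched in Python as follows. The sketch assumes that pixels are numbered from 1 in row-major order and that rows and columns are also 1-based; the function name `pixel_to_row_col` is hypothetical. Note that a plain remainder yields 0 for the last pixel of each row, so the sketch shifts the pixel number by one before taking the remainder.

```python
IMAGE_RES = 4800  # pixels per row and per column of one image


def pixel_to_row_col(pixel, image_res=IMAGE_RES):
    """Map a 1-based pixel number to a 1-based (row, col) position."""
    if not 1 <= pixel <= image_res * image_res:
        raise ValueError("pixel number out of range")
    row = (pixel - 1) // image_res + 1  # same as ceiling(pixel / image_res)
    col = (pixel - 1) % image_res + 1   # a plain remainder would give 0 here
    return row, col


print(pixel_to_row_col(1))         # (1, 1)
print(pixel_to_row_col(4800))      # (1, 4800)
print(pixel_to_row_col(4801))      # (2, 1)
print(pixel_to_row_col(23040000))  # (4800, 4800)
```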

Algorithm 1 Transform <key, values> input into an intermediate <key, values>

procedure MAPPER
    connection ← openFile("stdin", open ← "readbinary")
    initialize(files)
    while length(path ← readLines(connection)) do
        files ← insert(files, openDirectory(path))
    end while
    closeFile(connection)
    for i ← 1 to length(files) do
        r[i] ← raster(getDatasets(files[i])[bandId])
    end for
    for pixel ← beginInterval to endInterval do
        initialize(values)
        for j ← 1 to length(files) do
            values ← concatenate(values, getValues(r[j], row ← ceiling(pixel/imageRes),
                                                   col ← remainder(pixel/imageRes)))
        end for
        emit("stdout", pixel, values)
    end for
end procedure
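A minimal Python sketch of the Mapper logic is given below. It is an illustration under stated assumptions rather than the implementation used in the experiments: the images are small in-memory lists instead of raster layers read from the distributed file system, pixels are numbered from 1 in row-major order, and the `mapper` function and its parameters are hypothetical names.

```python
def mapper(images, first_pixel, last_pixel, image_res):
    """Yield one 'pixel<TAB>v1 v2 ...' line per pixel number.

    images: 2-D lists (rows of values) of equal size, ordered by time.
    """
    for pixel in range(first_pixel, last_pixel + 1):
        row = (pixel - 1) // image_res  # 0-based row index
        col = (pixel - 1) % image_res   # 0-based column index
        values = [str(image[row][col]) for image in images]
        yield str(pixel) + "\t" + " ".join(values)


# Three toy 2x2 images standing in for one band of three acquisitions.
images = [
    [[0.8, 0.7], [0.6, 0.9]],
    [[0.8, 0.2], [0.6, 0.9]],
    [[0.7, 0.3], [0.5, 0.9]],
]

for line in mapper(images, 1, 4, image_res=2):
    print(line)
```

In a streaming job, each yielded line would be written to standard output, so that the framework can group the lines by key before they reach the Reducer.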

The Reducer, on the other hand, receives each <(x,y), time series> pair as input, where
(x, y) is a pixel coordinate and the time series contains the attribute values found at
that pixel for a given spectral band. Similarly to the Mapper, the Reducer gets the dataset
names of the standardized files before creating the time series. Then, it adapts the time
series format to serve as input for the complex analysis. Finally, the Reducer emits the
result of the complex analysis as its output, storing it in the distributed file system
(see Algorithm 2).

Algorithm 2 Transform <key, values> from Mapper into output <key, values>

procedure REDUCER
    connection ← openFile("stdin", open ← "readbinary")
    while length(line ← readLines(connection)) do
        pixel ← getKey(line)    ▷ key of the input pair, needed by emit
        timeseries ← getTimeSeries(line)
        ts ← preProcess(timeseries)
        analysis ← complexAnalysis(ts)
        emit("stdout", pixel, analysis)
    end while
    closeFile(connection)
end procedure
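The Reducer logic can be sketched in Python in the same spirit. The analysis step here is a deliberately simple stand-in, the largest step between neighbouring values, and is not BFAST; the `reducer` and `largest_jump` names, the tab-separated line format and the output format are assumptions made for illustration.

```python
def largest_jump(series):
    """Stand-in analysis: 1-based position and size of the largest step."""
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    if not steps:
        return None, 0.0
    index = max(range(len(steps)), key=steps.__getitem__)
    return index + 1, steps[index]


def reducer(lines, analysis=largest_jump):
    """Read 'key<TAB>v1 v2 ...' lines and yield 'key<TAB>result' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, payload = line.partition("\t")
        series = [float(v) for v in payload.split()]
        position, size = analysis(series)
        yield f"{key}\t{position} {size:.2f}"


for result in reducer(["1\t0.8 0.8 0.7", "2\t0.7 0.2 0.3"]):
    print(result)
```

Because the analysis is passed in as a function, a different method can be plugged in without touching the line handling, which mirrors the separation between processing and analysis described above.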

5. Evaluation and Results

5.1. Experimental Setup
Runtime Environment: The experiments were run on a single-node computer with an Intel(R)
Core(TM) i7-5500U CPU @ 2.40GHz and 16 GiB of RAM, running Ubuntu 14.04.4 LTS (64 bit).
Dataset: The MODIS scientific instruments, launched into Earth's orbit by NASA in 1999,
were used in our experiments since they are able to capture 36 spectral bands ranging in
wavelength from 0.4 µm to 14.4 µm. They are designed to provide measurements of the land,
oceans and atmosphere that can be used for studies of processes on local to global scales.
In our case, we considered the MOD13Q1 Normalized Difference Vegetation Index (NDVI), due
to the large number of remote sensing studies that have focused on time series analysis
using this index [Verbesselt et al. 2010, Grogan et al. 2016]. Since MODIS data are
provided every 16 days at 250-meter spatial resolution in the Sinusoidal projection and
comprise more than 18,000 satellite images covering Brazil from 2000 to 2016, we built a
time series using only a fraction of these data in time and space (92 images, 21 gigabytes
in total).
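The image counts used in the experiments follow from the 16-day interval. The short calculation below assumes that the 16-day sequence restarts at the beginning of each year, so that every year contributes the same number of images, and it reuses the 4800x4800 image size given in Section 4.2.

```python
import math

DAYS_PER_YEAR = 365
COMPOSITE_DAYS = 16  # one image every 16 days
IMAGE_RES = 4800     # image width and height in pixels (Section 4.2)

# Assumption: the 16-day sequence restarts every year.
images_per_year = math.ceil(DAYS_PER_YEAR / COMPOSITE_DAYS)
images_four_years = 4 * images_per_year
pixels_per_image = IMAGE_RES * IMAGE_RES

print(images_per_year)    # 23
print(images_four_years)  # 92
print(pixels_per_image)   # 23040000
```

These values match the 23, 46, 69 and 92 images reported for one to four years of data.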

5.2. Application Case Study: Deforestation Detection

To handle remote sensing imagery as MODIS time series, we first organized the MODIS data
by year. This organization enables us to build an infrastructure able to extract, transform
and load all the images by converting them into standard input for the desired methods. In
this work, we considered a method from an R package called BFAST, which iteratively
detects breaks in the seasonal and trend components of a time series
[Verbesselt et al. 2011]. This package is helpful not only for deforestation and
phenological change detection, but also for forest health monitoring
[Verbesselt et al. 2012b]. After running BFAST for a specific pixel (latitude=-10.408,
longitude=-53.495), we obtained a breakpoint on 01-17-2011 (see Figure 4). As this
processing can be performed for a large number of other pixels, we do not check the
accuracy of the algorithm here. Our focus in this work is on showing how this kind of
analysis can be validated using a variety of systems. For example, the deforestation
detected in this pixel, located in the state of Mato Grosso in Brazil (see Figure 5), can
also be seen in DETER⁵, a system for deforestation detection in near real-time. One
difficulty is that the two sources (BFAST and DETER) report different breakpoint dates.

Figure 4. BFAST for an NDVI time series (latitude=-10.408, longitude=-53.495).

Figure 5. Deforested Area in the state of Mato Grosso in Brazil (latitude=-10.408,
longitude=-53.495).

⁵ [Link]

In our approach, we decided to integrate Hadoop and R in order to combine massively
scalable capabilities with a research-friendly programming environment for complex
analytics. To evaluate this integration, we performed a set of experiments using BFAST and
other R packages to see how it behaves in terms of processing time and scalability
(varying the number of pixels and images). Our tests also allowed us to see how the
overhead of these tools affects this kind of processing. The results are shown in Figure 6
for four different numbers of images, corresponding to one, two, three and four years of
MODIS time series data. As we can see, the integration of Hadoop and R shows stable,
adequate and linear performance even as the amount of information grows over time.
Performance is limited by the hardware infrastructure, that is, extending the hardware
capabilities would provide better performance in terms of storage and computation power.
For reference, analyzing each thousand pixels with a complex algorithm such as BFAST takes
6000 seconds. The flexibility of running complex algorithms through a familiar R script
outweighs the high cost of the Hadoop learning curve. The reason is that in R it is easy
to install and load new packages, and a wide variety of complex algorithms can be easily
deployed.
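To give a sense of scale, the reported figure of 6000 seconds per thousand pixels can be extrapolated to a whole 4800x4800 image. The calculation below is a rough sketch that assumes the linear behaviour continues at that size and, for the multi-worker case, perfect scaling with no overhead; neither assumption was tested in the experiments, and the `estimated_seconds` helper is hypothetical.

```python
SECONDS_PER_1000_PIXELS = 6000   # figure reported for BFAST on one node
PIXELS_PER_IMAGE = 4800 * 4800   # full image, see Section 4.2
SECONDS_PER_DAY = 86400


def estimated_seconds(pixels, workers=1):
    """Linear extrapolation, assuming perfect scaling across workers."""
    return pixels * SECONDS_PER_1000_PIXELS / 1000 / workers


one_node_days = estimated_seconds(PIXELS_PER_IMAGE) / SECONDS_PER_DAY
hundred_workers_days = estimated_seconds(PIXELS_PER_IMAGE, workers=100) / SECONDS_PER_DAY

print(one_node_days)         # 1600.0
print(hundred_workers_days)  # 16.0
```

Under these assumptions a full image would be impractical on a single node, which is consistent with the remark that performance is bounded by the hardware infrastructure.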

Figure 6. Processing time to apply BFAST to different numbers of MOD13Q1 images using
MapReduce.

We also calculated the size in bytes of the output files produced by BFAST in the
MapReduce programming model (see Table 1). As we can see, varying the number of images
changes the output size of an algorithm such as BFAST only slightly. On the other hand, as
the number of pixels increases, the size of the output increases proportionally. The
output files contain the timestamps at which breaks in the time series were detected for
each pixel.
In addition, we deployed similar R packages that aim to detect breaks in time series,
since they can also be applied to remote sensing time series applications. We considered R
packages that perform behavioral change point analysis (bcpa), change point detection
(changepoint), detection of structural changes in regression models (strucchange) and
behavioral change detection in several other applications (BreakoutDetection). The
processing time spent by each algorithm is almost the same and can be seen in Figure 7. In
this experiment, we varied the number of pixels on a smaller scale than in the previous
one.

Table 1. Size in bytes of the MapReduce output files produced by BFAST

               23 images   46 images   69 images   92 images
  10 pixels          171         171         171         171
 100 pixels         1792        1792        1783        1750
1000 pixels        18881       18868       18820       18662
2000 pixels        38863       38859       38771       38290
3000 pixels        58849       58844       58722       58107
4000 pixels        78827       78819       78675       77694
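The proportionality visible in Table 1 can be checked by dividing each output size by the number of pixels. The snippet below copies the values from Table 1; the variable names are illustrative.

```python
# Output sizes in bytes from Table 1 (keys: number of pixels;
# list positions: 23, 46, 69 and 92 images).
output_bytes = {
    10: [171, 171, 171, 171],
    100: [1792, 1792, 1783, 1750],
    1000: [18881, 18868, 18820, 18662],
    2000: [38863, 38859, 38771, 38290],
    3000: [58849, 58844, 58722, 58107],
    4000: [78827, 78819, 78675, 77694],
}

bytes_per_pixel = {
    pixels: [round(size / pixels, 2) for size in sizes]
    for pixels, sizes in output_bytes.items()
}

for pixels, ratios in bytes_per_pixel.items():
    print(pixels, ratios)
```

Every ratio lies between 17.1 and about 19.7 bytes per pixel, so the output grows roughly in proportion to the number of pixels and is nearly independent of the number of images.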

Figure 7. Processing Time to apply several other R packages aiming to detect breaks using
23 images.

5.3. Quality Architectural Requirements

According to [Pressman 2005], external quality architectural requirements correspond to
the attributes of a system that can be recognized by users and are important for design
evaluation, including performance, flexibility, portability, reusability and
interoperability, among others. In this work, we use a qualitative evaluation of these
attributes, with the main purpose of determining whether the designed system meets the
architecture quality requirements of domain specialists, for example, whether the
performance of the software compromises the previously planned information processing
time.
The chosen method is an adaptation of the scenario-based evaluation most used in industry,
known as the Architecture Trade-off Analysis Method (ATAM). ATAM considers how the goals
interact with each other, seeking a balance between desirable and compatible features, and
aims to provide adequate detail about architectural documents [Nord et al. 2003]. This
method guides all the stakeholders to search for conflicts in the architecture and,
consequently, to solve them. In Table 2 we list the quality attributes associated with
each architectural decision. Figure 8 depicts the quality attributes in terms of ISO/IEC
25010. We also highlight how hard each of them is to implement and how important it is to
the application domain (H: high; M: medium; L: low).
Table 2. List of architectural decisions

Id  Architectural Decision        Quality Attributes   Description
D1  Distributed File System       Performance,         The file system provides fast access to
                                  Fault-Tolerance,     unstructured data in a proper, continuous
                                  Reusability          and reusable operating manner
D2  MapReduce processing model    Modifiability,       The programming model is easily modifiable
                                  Adaptability         for different purposes
D3  Multilayered Architectural    Modularity           The storage, processing and analysis occur
                                                       in several layers by means of decoupling
D4  Complex Analysis Environment  Learnability         The complex analysis environment should be
                                                       easy to learn

Figure 8. Utility tree.

6. Conclusions
Because of the memory limitations of R, data scientists often have to restrict their
analysis to a subset of the data. Integrating technologies such as Hadoop with the R
language not only offers a strategy to overcome the memory challenges of large data sets,
but also provides more flexible programming of complex analyses in storage components.
This paper presents an approach for analyzing big remote sensing time series in near
real-time using a processing model known as MapReduce.
Our results support streaming processing analytics as a more generic approach in terms of
performance and capacity. They showed that, for different numbers of pixels and MODIS time
series (one, two, three and four years), the processing time grew linearly for complex
algorithms such as those found in deforestation detection applications. Example situations
in which such algorithms are important were demonstrated for a specific region in Brazil.
Future work will study alternative approaches that perform streaming analytics processing
on other sources of information such as SciDB, a multidimensional array database. We also
plan to evaluate this approach in a multi-node cluster experiment, focusing more on data-,
memory- and CPU-intensive tests. The Spark framework is also a promising and efficient
alternative to be tested with our approach.

7. Acknowledgments
Gilberto Camara, Luiz Fernando Assis and Alber Sanchez are supported by São Paulo Re-
search Foundation (FAPESP) e-science program (grants 2014-08398-6, 201515/19540-0
and 2016-03397-7). Gilberto is also supported by CNPq (grant 312151-2014-4). Eduardo
Llapa is also supported by a BNDES Fundo Amazonia grant.

References
Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., and Saltz, J. (2013). Hadoop GIS: A High Performance Spatial Data Warehousing
System over MapReduce. Proc. VLDB Endow., 6(11):1009–1020.
Almeer, M. H. (2012). Hadoop MapReduce for remote sensing image analysis. International Journal of Emerging Technology and
Advanced Engineering, 2(4):443–451.
Assis, L. F. F. G., Herfort, B., Steiger, E., Horita, F. E. A., and de Albuquerque, J. P. (2015). Geographical prioritization of
social network messages in near real-time using sensor data streams: an application to floods. In XVI Brazilian Symposium on
Geoinformatics (GEOINFO).
Dede, E., Fadika, Z., Govindaraju, M., and Ramakrishnan, L. (2014). Benchmarking MapReduce implementations under different
application scenarios. Future Generation Computer Systems, 36:389–399.
Eklundh, L. and Jönsson, P. (2012). TIMESAT 3.1 software manual. Technical report, Lund University.
Eldawy, A. and Mokbel, M. F. (2015). SpatialHadoop: A MapReduce framework for spatial data. Data Engineering (ICDE), 2015
IEEE 31st International Conference on, 1:1352–1363.
Giachetta, R. and Fekete, I. (2015). A case study of advancing remote sensing image analysis. Acta Cybernetica, 22:57–79.
Gray, J., Liu, D. T., Nieto-Santisteban, M., Szalay, A., DeWitt, D. J., and Heber, G. (2005). Scientific data management in the coming
decade. SIGMOD Rec., 34(4):34–41.
Grogan, K., Pflugmacher, D., Hostert, P., Verbesselt, J., and Fensholt, R. (2016). Mapping clearances in tropical dry forests using
breakpoints, trend, and seasonal components from MODIS time series: Does forest type matter? Remote Sensing, 8(8):657.
Integrating, R. (2011). Bridging two worlds with rice. Proceedings of the VLDB Endowment, 4(12).
Lu, M., Pebesma, E., Sanchez, A., and Verbesselt, J. (2016). Spatio-temporal change detection from multidimensional arrays:
Detecting deforestation from MODIS time series. ISPRS Journal of Photogrammetry and Remote Sensing, 117:227–236.
Maus, V., Câmara, G., Cartaxo, R., Sanchez, A., Ramos, F. M., and de Queiroz, G. R. (2016). A time-weighted dynamic time warping
method for land-use and land-cover mapping. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
Nishimura, S., Das, S., Agrawal, D., and El Abbadi, A. (2013). MD-HBase: design and implementation of an elastic data infrastructure
for cloud-scale location services. Distributed and Parallel Databases, 31(2):289–319.
Nord, R. L., Barbacci, M. R., Clements, P., Kazman, R., and Klein, M. (2003). Integrating the architecture tradeoff analysis method
(ATAM) with the cost benefit analysis method (CBAM). Technical report, DTIC Document.
Pressman, R. S. (2005). Software engineering: a practitioner's approach. Palgrave Macmillan.
Rudorff, B. F. R. (2007). Sensor MODIS e Suas Aplicações Ambientais no Brasil. Editora Parêntese.
Rusu, F. and Cheng, Y. (2013). A survey on array storage, query languages, and systems. arXiv preprint arXiv:1302.0103.
Schnebele, E., Cervone, G., Kumar, S., and Waters, N. (2014). Real time estimation of the Calgary floods using limited remote sensing
data. Water, 6(2):381–398.
Song, M. and Kim, M. C. (2013). RT^2M: Real-time Twitter trend mining system. In Proceedings of the 2013 International Conference
on Social Intelligence and Technology, pages 64–71.
Sweeney, C., Liu, L., Arietta, S., and Lawrence, J. (2011). HIPI: A Hadoop image processing interface for image-based MapReduce
tasks. Technical report, University of Virginia.
Urbani, J., Margara, A., Jacobs, C., Voulgaris, S., and Bal, H. (2014). Ajira: a lightweight distributed middleware for MapReduce and
stream processing. In Distributed Computing Systems (ICDCS), 2014 IEEE 34th International Conference on, pages 545–554.
IEEE.
Verbesselt, J., Hyndman, R., Zeileis, A., and Culvenor, D. (2010). Phenological change detection while accounting for abrupt and
gradual trends in satellite image time series. Remote Sensing of Environment, 114(12):2970–2980.
Verbesselt, J., Zeileis, A., and Herold, M. (2011). Near real-time disturbance detection in terrestrial ecosystems using satellite image
time series: Drought detection in Somalia. Technical report, Faculty of Economics and Statistics, University of Innsbruck.
Verbesselt, J., Zeileis, A., and Herold, M. (2012a). Near real-time disturbance detection using satellite image time series. Remote
Sensing of Environment, 123:98–108.
Verbesselt, J., Zeileis, A., Hyndman, R., and Verbesselt, M. J. (2012b). Package 'bfast'.
