Using Long Short-Term Memory Recurrent Neural Network in Land Cover Classification on Landsat and Cropland Data Layer time series

Abstract: Land cover maps are essential to agricultural analysis, but the existing workflow for producing them is slow. This work builds a long short-term memory (LSTM) recurrent neural network (RNN) model to improve the update frequency. An end-to-end framework is proposed to train the model. Landsat scenes are used as Earth observations, and field-measured data together with the Cropland Data Layer (CDL) serve as ground truth. The network is trained using state-of-the-art techniques. Finally, we tested the network on multiple Landsat images to produce timestamped five-class land cover maps. The results are visualized and compared with CDL and ground truth. The experiment shows a satisfactory overall accuracy (>97%) and demonstrates the feasibility of the model. This study paves a path toward efficiently using LSTM RNNs in remote sensing image classification.

  1. Introduction

The remote sensing (RS) community has gradually recognized that conventional schemes are reaching a dead end [1-5]. Notably, in the 2012 ImageNet competition, neural-network-based solutions outperformed most conventional methods. A recent trend shows that researchers are moving to deep learning (DL). A considerable number of studies have applied neural networks to classify RS images and have already achieved satisfactory results.

Meanwhile, RS images have accumulated into a massive archive [4]. NASA alone has stored petabytes of data in its archives [6-8]. Conventional image interpretation techniques have exposed only a tiny part of the information in this rich mine [9,10]. The pace of mining the data falls far behind the speed of acquiring it. Slow processing and too many manual steps are the major obstacles to full exploitation. Thus DL, which is more automatic and enables faster interpretation, has become increasingly popular in RS image analysis.

However, the success of DL requires the availability of large datasets and powerful computational engines such as Graphics Processing Units (GPUs). It would not be fair to say outright that DL is a better algorithm than conventional algorithms such as SVM or decision trees. A successful neural network also requires careful engineering and considerable domain expertise to design the network configuration.

Feedforward neural networks (FNN) and recurrent neural networks (RNN) are the two most commonly used network types. The former feeds information straight through the network, while the latter cycles information through a loop. RNNs are generally considered to have better memory capability than FNNs and are therefore more suitable for time series.

1.1. Problem Statement

CDL is a land cover product made by USDA NASS for the continental U.S. It has very high accuracy owing to the private ground truth data collected by NASS field offices. However, CDL provides only one layer per year, while land cover usually changes with the seasons. Landsat satellites have observed millions of scenes at 30 m resolution over the past forty years. Using the knowledge contained in the available CDL to classify Landsat scenes from different seasons is therefore a pressing need.

1.2. Contributions

This paper creates an LSTM RNN that utilizes CDL time series to predict the land cover of Landsat pixels. We build an RNN with three hidden LSTM (long short-term memory) layers. We preprocessed Landsat and CDL time series and used them to prepare training and testing datasets. We trained the network many times and recorded the accuracy of each training phase. The results are plotted and compared. We obtained a fairly satisfactory accuracy when applying the trained network to several Landsat scenes.
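For illustration, a minimal Keras sketch of such a network is given below. Only the three hidden LSTM layers and the five-class softmax output follow the description above; the sequence length, number of input features, layer widths, optimizer, and loss are assumptions made for the sketch, not the exact settings of this study.

    import tensorflow as tf

    # Assumed dimensions for illustration only.
    N_TIMESTEPS = 23   # e.g., roughly one Landsat observation every 16 days over a year
    N_FEATURES = 6     # e.g., six spectral bands per observation
    N_CLASSES = 5      # five land cover classes, as stated in the abstract

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_TIMESTEPS, N_FEATURES)),
        tf.keras.layers.LSTM(128, return_sequences=True),   # hidden LSTM layer 1
        tf.keras.layers.LSTM(128, return_sequences=True),   # hidden LSTM layer 2
        tf.keras.layers.LSTM(128),                           # hidden LSTM layer 3
        tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=...)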

1.3. Related Work

ANN (artificial neural network), especially deep neural networks (DNN), already has many applications in image recognition [11]. A thorough review was made of current research on ANN in RS. Audebert et al. reveal the general potential benefits that DL could bring to remote sensing [12]. They tested various deep network architectures for classification and semantic mapping of aerial images and achieved better performance. Cooner et al. evaluated the effectiveness of multilayer feedforward neural networks, radial basis neural networks, and Random Forests in detecting earthquake damage from the 2010 Port-au-Prince, Haiti, 7.0 moment magnitude event [13]. Duro et al. compared pixel-based and object-based image analysis approaches for classifying broad land cover classes over agricultural landscapes using three supervised learning algorithms: decision tree (DT), random forest (RF), and support vector machine (SVM) [14]. Zhao et al. used a multi-scale convolutional auto-encoder to extract features, trained a logistic regression classifier for classification, and obtained better results than traditional methods [15]. Kussul et al. designed a multilevel DL architecture to classify land cover and crop type from multi-temporal multisource satellite imagery [16]. Maggiori et al. trained CNNs to produce classification maps from images [17]. Das et al. proposed Deep-STEP for spatiotemporal prediction of satellite remote sensing data [18]; they derived NDVI data from thousands to millions of pixels of satellite remote sensing imagery using DL. Marmanis et al. used a CNN pretrained on the ImageNet challenge to extract an initial set of representations, which are later transferred into a supervised CNN classifier [19]. Ienco et al. evaluated the LSTM RNN on land cover classification considering multi-temporal spatial data from a time series of satellite images [20]. Their experiments were carried out under both pixel-based and object-based schemes. The results show that the LSTM RNN is very competitive compared to state-of-the-art classifiers and even outperforms classic approaches on under-represented and/or highly mixed classes. Li et al. used DL to detect and count oil palm trees in high-resolution remote sensing images [21]. These successful cases have validated the great potential of DL in RS image recognition. This study adds a case of applying an LSTM RNN to satellite imagery time series.

  2. Materials and Methods

2.1. Study Area and Materials

We choose eastern North Dakota, which has a sound archive of historical Landsat and CDL images, as the study area. North Dakota is a state in the northern U.S., as shown in Fig. 1, and agriculture is its number one industry and economic base [22]. North Dakota's products, especially spring wheat and durum wheat, account for a significant share of overall U.S. agricultural yield according to the U.S. National Agricultural Statistics Service (NASS)[1].

Landsat satellites have observed the Earth for more than four decades and acquired more than six million scenes [23,24]. Landsat 5 has delivered images from space since 1984. Landsat 7 operated flawlessly after its 1999 launch, but all images captured since May 2003 contain gaps due to the failure of its Scan Line Corrector. In 2013, as Landsat 5 was being decommissioned, a new satellite, Landsat 8, was launched into orbit to continue the mission [25]. Each Landsat satellite provides images of any point on the Earth's surface about every two weeks at 30 m resolution.

CDL is an annual land cover product for the continental U.S. made by NASS. It is very popular and widely accepted as a fairly accurate reflection of the truth. Its resolution is also 30 meters, and each year has one layer. Landsat images are among its source datasets. Meanwhile, CDL fuses the ground truth data collected by NASS field offices, which yields much better accuracy than other existing land cover products; the claimed accuracy is 85% to 95% for major crop types [26]. The periods covered by Landsat and CDL in North Dakota are shown in Fig. 2. Another reason to choose North Dakota is that it is the only state with CDL from the very beginning; the CDL program has covered the entire continental U.S. only since 2008. Given that 1997 is the first year in which both Landsat and CDL coexist, we select the years from 1997 onward as our study interval.
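As a hypothetical illustration of how per-pixel training samples could be assembled from these two sources, the sketch below pairs a stack of co-registered Landsat observations with a CDL layer on the shared 30 m grid. The array names, shapes, and the no-data filter are assumptions made for the sketch, not the authors' preprocessing code.

    import numpy as np

    def build_samples(landsat_stack, cdl_layer, nodata=0):
        # landsat_stack: (T scenes, B bands, H, W); cdl_layer: (H, W) class codes.
        T, B, H, W = landsat_stack.shape
        X = landsat_stack.reshape(T, B, H * W).transpose(2, 0, 1)  # (pixels, T, B)
        y = cdl_layer.reshape(H * W)                               # (pixels,)
        valid = y != nodata                                        # drop unlabeled pixels
        return X[valid], y[valid]

    # Example with random stand-in arrays: 10 scenes, 6 bands, a 100 x 100 pixel tile.
    rng = np.random.default_rng(0)
    X, y = build_samples(rng.normal(size=(10, 6, 100, 100)),
                         rng.integers(0, 5, size=(100, 100)))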

[1]https://www.nass.usda.gov/Statistics_by_State/North_Dakota/Publications/Top_Commodities/pub/rank12.pdf

Figure 1. Study area

Figure 2. The availability of data since 1997 in North Dakota

2.2. Recurrent Neural Network and Long Short-Term Memory

A traditional RNN is simple. Different from an FNN, an RNN has feedback connections: the outputs of previous time steps are considered at the current time step, so historical states have a long-term influence on future judgments, which is what "memory" is all about. Let $x = (x_1, \dots, x_n)$ be the sequence of input vectors, $h = (h_1, \dots, h_n)$ the hidden vector sequence, and $y = (y_1, \dots, y_n)$ the sequence of output vectors, where n is the number of time steps. An example RNN cell is displayed in Fig. 3 (left). The equations computing the output vectors from the input vectors are as follows:

$$h_t = \theta(W_{xh} x_t + W_{hh} h_{t-1})$$
$$y_t = W_{hy} h_t$$

where $W_{xh}$, $W_{hh}$, and $W_{hy}$ are the weights on the connections and $\theta$ is the neuron activation (mostly tanh in an RNN). The self-connection weight (here $W_{hh}$) is usually simply set to 1. The subsequent back propagation adjusts all the weights over the entire input sequence.
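As a concrete illustration of these equations, the following NumPy sketch (our own illustration; the toy dimensions and the tanh activation are assumptions consistent with the text) rolls a simple RNN cell over an input sequence.

    import numpy as np

    def rnn_forward(xs, W_xh, W_hh, W_hy):
        # h_t = tanh(W_xh x_t + W_hh h_{t-1}); y_t = W_hy h_t
        h = np.zeros(W_hh.shape[0])
        ys = []
        for x in xs:                          # iterate over the time steps
            h = np.tanh(W_xh @ x + W_hh @ h)  # the hidden state carries the "memory"
            ys.append(W_hy @ h)               # output at this time step
        return np.stack(ys), h

    # Toy sizes (assumed): 10 time steps, 6 input features, 8 hidden units, 5 outputs.
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(10, 6))
    W_xh, W_hh, W_hy = rng.normal(size=(8, 6)), rng.normal(size=(8, 8)), rng.normal(size=(5, 8))
    ys, h_final = rnn_forward(xs, W_xh, W_hh, W_hy)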

Figure 3. RNN and LSTM introduction

The LSTM RNN is more complex. We take the definition from Graves as the baseline [27]. As shown in Fig. 3 (right), an LSTM cell has three extra "gates" which control the involvement of context information: the input gate scales the input to the cell, the output gate scales the output from the cell, and the forget gate scales the old cell value. The equations for computing the gate outputs are:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$
$$g_t = \theta(W_{xg} x_t + W_{hg} h_{t-1} + b_g)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \theta(c_t)$$

where $\sigma$ and $\theta$ are activation functions (e.g., logistic sigmoid and tanh), and i, f, o, and g are the output vectors of the input gate, forget gate, output gate, and the cell itself, respectively. W denotes the weights of the connections; for example, $W_{xi}$ is the weight of the connection between the input $x_t$ and the input gate, and $W_{hi}$ is the weight of the connection between the previous hidden output $h_{t-1}$ and the input gate. b represents the bias input.
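The gate equations translate directly into code. The following NumPy sketch of a single LSTM step is our own illustration; the weight-dictionary layout and toy dimensions are assumptions, not the authors' implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        # One LSTM step following the gate equations above.
        i = sigmoid(W["xi"] @ x + W["hi"] @ h_prev + b["i"])   # input gate
        f = sigmoid(W["xf"] @ x + W["hf"] @ h_prev + b["f"])   # forget gate
        o = sigmoid(W["xo"] @ x + W["ho"] @ h_prev + b["o"])   # output gate
        g = np.tanh(W["xg"] @ x + W["hg"] @ h_prev + b["g"])   # candidate cell value
        c = f * c_prev + i * g                                 # new cell state
        h = o * np.tanh(c)                                     # new hidden output
        return h, c

    # Toy sizes (assumed): 6 input features, 8 hidden units.
    rng = np.random.default_rng(0)
    W = {k: rng.normal(size=(8, 6)) if k.startswith("x") else rng.normal(size=(8, 8))
         for k in ["xi", "hi", "xf", "hf", "xo", "ho", "xg", "hg"]}
    b = {k: np.zeros(8) for k in ["i", "f", "o", "g"]}
    h, c = lstm_step(rng.normal(size=6), np.zeros(8), np.zeros(8), W, b)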

A simple RNN cannot look far back into the past; LSTM solved that problem [28]. Owing to its extraordinary performance, the LSTM RNN has become a popular choice for modeling inherently dynamic processes such as voice and handwriting [27], and it is used heavily by tech giants like Apple, Google, Microsoft, and Amazon in their products. This work reuses the architecture and examines its performance in RS image classification. Since image spatial/temporal series share many characteristics with speech or handwriting signals, a similar level of performance is expected in RS as well.
