Multi-Label Remote Sensing Image Retrieval by Using Deep Features
| Michele Compri | Begum Demir | Xavier Giro-i-Nieto |
|:-:|:-:|:-:|
A joint collaboration between:
| Insight Centre for Data Analytics | Dublin City University (DCU) | UPC Image Processing Group |
|:-:|:-:|:-:|
Abstract
Recent advances in satellite technology have led to a growing volume of remote sensing (RS) image archives, from which retrieving useful information is challenging. Therefore, one important research area in RS is content-based image retrieval (CBIR). The performance of a CBIR system depends both on how well the image features model the content of the RS images and on the retrieval algorithm that assesses the similarity among those features. Supervised classification methods, in which a classifier is trained on already-annotated images, have attracted attention for CBIR in RS. However, existing supervised CBIR systems in the RS literature assume that each training image is annotated with a single label, associated with the most significant content of the image. RS images, by contrast, usually have complex content: each image typically contains several regions belonging to different land-cover classes. Thus, available supervised CBIR systems cannot accurately characterize and exploit the high-level semantic content of RS images for retrieval. To overcome these problems and to effectively characterize the high-level semantic content of RS images in supervised CBIR, we investigate the effectiveness of different deep learning architectures in the framework of multi-label RS image retrieval. Deep learning architectures such as convolutional neural networks (CNNs) have recently attracted great attention in RS [1,2] due to their effective and accurate feature learning. However, to the best of our knowledge, this is the first work that adapts CNN models to multi-label RS image retrieval. This is achieved through a two-step strategy. In the first step, a CNN pre-trained for image classification on the ImageNet dataset is used off-the-shelf as a feature extractor.
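The abstract notes that CBIR performance depends on both the features and the similarity measure used to compare them. As a minimal sketch of the retrieval step, assuming each archive image has already been reduced to a fixed-length feature vector (for example, by a pre-trained CNN used off-the-shelf), the archive can be ranked by cosine similarity to the query. Cosine similarity, the helper names `cosine_similarity` and `retrieve`, and the toy vectors are illustrative assumptions, not taken from the thesis code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero descriptor carries no direction information
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float],
             archive: Dict[str, Sequence[float]],
             k: int = 5) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k archive images most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, feat)) for name, feat in archive.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 4-D descriptors standing in for CNN features (hypothetical values).
archive = {
    "tile_a": [0.9, 0.1, 0.0, 0.2],
    "tile_b": [0.0, 0.8, 0.6, 0.0],
    "tile_c": [0.8, 0.2, 0.1, 0.3],
}
query = [1.0, 0.0, 0.0, 0.2]
print(retrieve(query, archive, k=2))
```

Real CNN descriptors have hundreds or thousands of dimensions; other measures, such as Euclidean distance, could be swapped into `retrieve` without changing its structure.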
In particular, three popular architectures are explored: 1) VGG16; 2) Inception V3; and 3) ResNet50. VGG16 is a CNN with 16 weight layers: 13 convolutional layers of stacked 3x3 filters, interleaved with max-pooling layers, followed by 3 fully connected layers. Inception V3 is an improved version of GoogLeNet; compared with VGG16 it is deeper yet has far fewer parameters, because it avoids large fully connected layers by applying global average pooling to the output of the last convolutional layer. ResNet50 is deeper still (50 layers), thanks to residual (shortcut) connections that let data bypass convolutional blocks. In the second step, we adapt these three off-the-shelf models by fine-tuning their parameters on a subset of RS images annotated with multiple labels. Experiments carried out on an archive of aerial images show that fine-tuning the CNN architectures on multi-label annotated images significantly improves retrieval accuracy with respect to standard CBIR methods. We also find that fine-tuning with a multi-class approach achieves better results than treating each label as an independent class.
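Because each image carries several labels, deciding whether a retrieved image is "relevant" to a query needs a rule. The abstract does not state which retrieval metric was used, so this sketch assumes one common convention: a retrieved image counts as relevant if it shares at least one label with the query, and precision@k is the fraction of relevant images among the top k results. A Jaccard overlap between label sets is included as a graded alternative. All function names, tiles, and labels are hypothetical.

```python
from typing import Dict, List, Set


def is_relevant(query_labels: Set[str], candidate_labels: Set[str]) -> bool:
    """Assumed rule: relevant if the two images share at least one label."""
    return bool(query_labels & candidate_labels)


def precision_at_k(query_labels: Set[str],
                   ranked_names: List[str],
                   labels: Dict[str, Set[str]],
                   k: int) -> float:
    """Fraction of the top-k retrieved images that are relevant to the query.

    If fewer than k images were retrieved, the missing slots count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(is_relevant(query_labels, labels[name]) for name in ranked_names[:k])
    return hits / k


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Graded relevance: shared labels divided by all labels (1.0 for identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical land-cover labels for three archive tiles.
labels = {
    "tile_a": {"buildings", "trees"},
    "tile_b": {"water"},
    "tile_c": {"trees", "grass"},
}
query_labels = {"trees", "pavement"}
ranked = ["tile_a", "tile_b", "tile_c"]  # e.g. the order produced by a retrieval step

print(precision_at_k(query_labels, ranked, labels, k=2))
print(jaccard(query_labels, labels["tile_c"]))
```

Averaging `precision_at_k` over many queries gives a single score for an archive; the thesis may rely on different or additional metrics.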
Publication
This source code was used in the development of Michele Compri's master's thesis.
Slides:
<iframe src="https://www.slideshare.net/slideshow/embed_code/key/aur7h9ST7R35Oa" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>
Software frameworks: Keras
The models are implemented in Keras, which at the time ran on top of Theano.
The dependencies can be installed with:

```shell
pip install -r https://raw.githubusercontent.com/massens/saliency-360salient-2017/master/requirements.txt
```
Acknowledgements
We would like to especially thank Albert Gil Moreno from our technical support team at the Image Processing Group at the UPC, as well as Albert Jimenez for his support with Keras.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan X used in this work.
The Image Processing Group at the UPC is an SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
This work has been developed in the framework of the projects BigGraph TEC2013-43935-R and Malegra TEC2016-75976-R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).
Contact
If you have any general question about our work or code that may be of interest to other researchers, please use the public issues section of this GitHub repository. Alternatively, drop us an e-mail at [email protected].