Dataset Creation Methodology for CNN Land Use/Cover Classification: Thailand’s Rural Area Case Study
Land cover classification is a powerful tool that can take advantage of Convolutional Neural Networks
(CNNs) for remote sensing image recognition. However, existing datasets are relatively small
or do not match the classes relevant to the area where land cover classification is performed. In this
paper, a methodology is proposed and detailed for creating image datasets to be used for land cover
classification with CNNs. The method consists of four steps. First, large remote sensing images are collected.
Then, a large number of tiles are created using an appropriate sampling method. Next, the tiles are
automatically labeled using a coarse model. Finally, the dataset is cleaned of mislabeled images so that
it can be used in a CNN model. A rural area in Thailand is used as a case study for a four-class
dataset: buildings, forest, roads and wasteland. Concretely, satellite images are first cropped using an
overlapping process to create the dataset tiles. Then, a coarse model based on pixel RGB band values is
developed, and tiles are classified by applying ratios to these RGB filters. Results show that the
building and wasteland classes can be created with a very high precision of at least 98%, demonstrating
the robustness of the proposed method for quickly building an image dataset. The forest class shows good
precision, at 90%. In contrast, the roads class has a low precision of 68%, so this dataset needs to be
cleaned manually by the user. Finally, the effects of cropping and overlap size are investigated, and the
results show that using a different cropping size requires a new calibration of the methodology.
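The tiling and coarse-labeling steps described above can be illustrated with a minimal sketch. It assumes an RGB image loaded as an H x W x 3 NumPy array, square tiles cropped with a fixed pixel overlap, and a tile label chosen as the class whose share of matching pixels is highest, provided it reaches a minimum ratio. The helper names (`crop_tiles`, `label_tile`, `precision`), the per-pixel rules in `PIXEL_RULES`, and the `min_ratio` threshold are all hypothetical: the abstract does not state the actual RGB filters or ratios, so these values are placeholders rather than the authors' calibrated model.

```python
import numpy as np

def crop_tiles(image, tile_size, overlap):
    """Crop an H x W x 3 image into square tiles that overlap by `overlap` pixels."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must satisfy 0 <= overlap < tile_size")
    stride = tile_size - overlap  # step between consecutive tile origins
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h - tile_size + 1, stride):
        for left in range(0, w - tile_size + 1, stride):
            tiles.append(image[top:top + tile_size, left:left + tile_size])
    return tiles

# Hypothetical per-pixel rules on RGB values (placeholders, not the paper's filters).
# Each rule returns a boolean mask of pixels that "look like" the class.
PIXEL_RULES = {
    "forest": lambda r, g, b: (g > r) & (g > b),
    "wasteland": lambda r, g, b: (r > g) & (g > b),
    "road": lambda r, g, b: (np.abs(r - g) < 10) & (np.abs(g - b) < 10) & (r < 150),
    "building": lambda r, g, b: (r >= 150) & (g >= 150) & (b >= 150),
}

def label_tile(tile, min_ratio=0.6):
    """Return the class with the highest pixel ratio if it reaches min_ratio, else None."""
    rgb = tile.astype(np.int32)  # avoid uint8 wrap-around in subtractions
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    ratios = {name: float(rule(r, g, b).mean()) for name, rule in PIXEL_RULES.items()}
    best = max(ratios, key=ratios.get)
    return best if ratios[best] >= min_ratio else None

def precision(predicted, verified, cls):
    """Fraction of tiles auto-labeled `cls` whose manually verified label is also `cls`."""
    hits = [v == cls for p, v in zip(predicted, verified) if p == cls]
    return sum(hits) / len(hits) if hits else float("nan")
```

For example, a 256 x 256 image cut into 64-pixel tiles with a 32-pixel overlap yields a stride of 32 and therefore 7 x 7 = 49 tiles; tiles left unlabeled (`None`) would be discarded or reviewed in the cleaning step. Because the rule thresholds and `min_ratio` act on per-tile pixel fractions, they would plausibly need re-tuning whenever `tile_size` changes, which is consistent with the recalibration finding reported above.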
