Dataset creation methodology for CNN land use/cover classification: Thailand’s rural area study case

Laurent Mezeix
Max Garcia Casanova


Land cover classification is a powerful tool that takes advantage of Convolutional Neural Networks (CNNs) for remote sensing image recognition. However, existing datasets are rather small or are not related to the classes of the area where the land cover classification is performed. In this paper, a methodology is proposed and detailed to create image datasets to be used for land cover classification through CNNs. The method consists of four steps. First, large remote sensing images are collected. Then, a large number of tiles is created using an adequate sampling method. Next, the tiles are automatically labeled using a coarse model. Finally, mislabeled images are removed from the dataset so that it can be used in a CNN model. A rural area in Thailand is used as a case study for a four-class dataset: buildings, forest, roads and wasteland. In a first step, satellite images are cropped using an overlapping process to create the dataset tiles. Then, a coarse model based on pixel RGB band values is developed, and by applying ratios to these RGB filters, tiles can be classified. Results show that the buildings and wasteland classes can be created with a very high precision of at least 98%, demonstrating the robustness of the proposed method for quickly building an image dataset. The forest class shows a good precision of 90%. In contrast, the roads class shows a low precision of 68%, and therefore this dataset needs to be manually cleaned by users. Finally, the effects of cropping and overlapping size are investigated, and results show that using a different cropping size requires a new calibration of the methodology.
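The cropping, coarse labeling and precision steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the RGB ratios or thresholds used, so the green-dominance rule, the `threshold` value, the tile and overlap sizes, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np


def crop_tiles(image, tile_size, overlap):
    """Yield (top, left, tile) for square tiles cropped with a fixed pixel overlap."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must satisfy 0 <= overlap < tile_size")
    stride = tile_size - overlap  # step between the origins of two neighboring tiles
    height, width = image.shape[:2]
    for top in range(0, height - tile_size + 1, stride):
        for left in range(0, width - tile_size + 1, stride):
            yield top, left, image[top:top + tile_size, left:left + tile_size]


def green_ratio(tile):
    """Fraction of pixels whose green band exceeds both the red and blue bands."""
    r, g, b = (tile[..., i].astype(np.int32) for i in range(3))
    return float(np.mean((g > r) & (g > b)))


def coarse_label(tile, threshold=0.6):
    """Hypothetical coarse rule: call a tile 'forest' when enough pixels are green-dominant."""
    return "forest" if green_ratio(tile) >= threshold else "unlabeled"


def precision(predicted, truth, cls):
    """Precision for one class: true positives / (true positives + false positives)."""
    tp = sum(p == cls and t == cls for p, t in zip(predicted, truth))
    fp = sum(p == cls and t != cls for p, t in zip(predicted, truth))
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Synthetic 100x100 RGB image: left half green, right half gray (illustrative data only).
image = np.full((100, 100, 3), 120, dtype=np.uint8)
image[:, :50] = (0, 200, 0)

tiles = list(crop_tiles(image, tile_size=40, overlap=20))
labels = [coarse_label(tile) for _, _, tile in tiles]
```

Changing `tile_size` or `overlap` changes how much of each class falls inside a tile, so a fixed `threshold` would generally need to be re-tuned, which is consistent with the recalibration the abstract reports for different cropping sizes.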


Article Details

How to Cite
L. Mezeix and M. G. Casanova, “Dataset creation methodology for CNN land use/cover classification: Thailand’s rural area study case”, DTAJ, vol. 5, no. 11, pp. 74–95, Feb. 2023.
Research Articles


