WHU Cloud Dataset

Click Here to download.

We manually edited a Landsat 8 dataset for cloud detection and removal. It contains cloudy images, the corresponding cloudless historical images, and cloud and shadow masks. We call it the WHU Cloud Dataset. The dataset covers six regions with different land types, such as forests, cities, oceans, lakes, and mountains. Only the red, green and blue bands, which correspond to bands 4, 3 and 2 of Landsat 8, are included. The spatial resolution is 30 m. As multi-temporal data, we selected cloud-free images of the same regions acquired about 1 to 6 months earlier. Clouds account for about 5% of the cloudy images.
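As a rough illustration of the band selection and the cloud proportion mentioned above, the following sketch stacks three single-band arrays into an RGB image and measures the cloud fraction of a mask. The function names and the mask encoding (`cloud_value=1`) are assumptions made for this example; the label values actually used in the dataset's masks are not stated here.

```python
import numpy as np

def to_rgb(band4, band3, band2):
    """Stack Landsat 8 bands 4 (red), 3 (green), 2 (blue) into an H x W x 3 array."""
    return np.stack([band4, band3, band2], axis=-1)

def cloud_fraction(mask, cloud_value=1):
    """Fraction of pixels labeled as cloud.

    cloud_value is a hypothetical label; check the mask files for the real encoding.
    """
    return float(np.mean(mask == cloud_value))

# Tiny synthetic example: a 2 x 2 scene with one cloudy pixel.
red = np.array([[10, 20], [30, 40]], dtype=np.uint16)
green = red + 1
blue = red + 2
rgb = to_rgb(red, green, blue)
mask = np.array([[0, 1], [0, 0]], dtype=np.uint8)
fraction = cloud_fraction(mask)
```

Applied to a full mask, `cloud_fraction` gives a quick check of the roughly 5% cloud proportion quoted above.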

Table I. Summary of the study sites used in the experiment
Data | Path/row | Location | Land cover | Cloudy image date | Reference image date
I | 118/032 | Liaoning Province & North Korea | Forests, residential areas and sea | 2018/06/24 | 2018/05/23
II | 119/038 | Jiangsu Province | Flat coastal area and large residential areas | 2018/05/14 | 2018/02/23
III | 123/039 | Hubei Province | Mountains, plains and lakes | 2018/04/08 | 2017/10/30
IV | 124/033 | Hebei Province | Mountains and cities | 2018/04/15 | 2017/12/23
V | 126/035 | Shanxi Province | Mountainous | 2018/05/13 | 2018/03/12
VI | 127/034 | Shaanxi Province | Mountainous | 2018/06/23 | 2018/01/14

The image of path/row 118/032 is located at the junction of Liaoning Province, China and North Korea, and covers forests, residential areas and sea. The image acquired on May 23, 2018 is used as the historical reference data, and the clouds and shadows in the image of June 24, 2018 are to be detected and repaired. The size of the whole image is 7511 x 7791 pixels.

The second image, path/row 119/038, is located in the flat eastern coastal area of Jiangsu Province, China, which contains large residential areas. The image acquired on February 23, 2018 is used as the historical reference data, and the clouds and shadows in the image of May 14, 2018 are to be detected and repaired. The size of the whole image is 7691 x 7921 pixels.

The image of path/row 123/039 is located in Hubei Province and consists of mountains and plains with lakes. The image acquired on October 30, 2017 is used as the historical reference data, and the clouds and shadows in the image of April 8, 2018 are to be detected and repaired. The size of the whole image is 7501 x 7681 pixels.

The fourth area is in Hebei Province, northern China (path/row 124/033). Half of this region is a plain with cities and the other half is mountainous. The image acquired on December 23, 2017 is used as the historical reference data, and the clouds and shadows in the image of April 15, 2018 are to be detected and repaired. The size of the whole image is 7791 x 7971 pixels.

The last two images (path/row 126/035 and path/row 127/034) both cover mountainous terrain and are located in Shanxi and Shaanxi provinces, China, respectively. In Shanxi, the image acquired on March 12, 2018 is used as the historical reference data, and the clouds and shadows in the image of May 13, 2018 are to be detected and repaired. The size of the whole image is 7691 x 7841 pixels. In Shaanxi, the image acquired on January 14, 2018 is used as the historical reference data, and the clouds and shadows in the image of June 23, 2018 are to be detected and repaired. The size of the whole image is 7701 x 7911 pixels.

Because of computational limitations, we cropped the images seamlessly into 512 x 512 patches. For cloud detection, 680 patches are used for training, 50 for validation and 129 for testing (859 in total). For cloud removal, the ground truth can be simulated on the current image pairs (with the available masks). For example, 60% of the patches in each image can be selected for training, 20% for validation and 20% for testing.
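The cropping and the example 60/20/20 split described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the helper names are hypothetical, tiles are assumed to be non-overlapping, partial tiles at the right and bottom edges are simply dropped, and the split is a seeded random shuffle. The dataset may have used a different edge or patch selection policy.

```python
import random

def tile_origins(height, width, tile=512):
    """Top-left (row, col) of every full, non-overlapping tile.

    Assumption: incomplete tiles at the image edges are dropped.
    """
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]

def split_patches(items, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * fractions[0])
    n_val = round(len(items) * fractions[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Example with the dimensions quoted for the first image (7511 x 7791).
origins = tile_origins(7511, 7791)
train, val, test = split_patches(origins)
```

With a NumPy-style array `image`, each origin `(r, c)` would correspond to the patch `image[r:r + 512, c:c + 512]`. Fixing the seed keeps the split reproducible across runs.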

The paper “Simultaneous cloud detection and removal from bi-temporal remote sensing images using cascade convolutional neural networks” is now under review at IEEE Transactions on Geoscience and Remote Sensing.