Abstract: |
With rapid urbanization in China, numerous urban villages have appeared, surrounded by newly built urban blocks. Because of their high population density, poor hygiene, chaotic waste discharge, and inadequate public facilities, urban villages have many negative impacts on the urban environment and on urban management. The objective of this study is to propose a dual-branch deep learning model that fuses multimodal satellite and street-view data to detect urban villages in Beijing, Tianjin, and Shijiazhuang, the core cities of the Jing-Jin-Ji region of China. Specifically, the proposed model consists of a satellite branch, a street-view branch, and a gated-fusion module. For the satellite branch, a Trans-MDCNN (multi-scale dilated convolutional neural network) is proposed to learn multi-level local features and global contextual features from high-resolution satellite imagery; for the street-view branch, an MVRAN (multi-view recurrent attention network) is constructed to learn and fuse multi-angle features from street-view images. A gated-fusion module is designed to aggregate the important features of the two branches. Experimental results indicate that the proposed model achieves an overall accuracy (OA) of 92.61%. An ablation study shows that, compared with satellite data alone, integrating street-view images increases the OA by about 2%. In addition, 1-D feature fusion outperforms its 2-D counterpart and the classic feature concatenation method. The proposed model also outperforms other deep learning models. Finally, the dataset of this study, S²UV (Satellite & Street-view images for Urban Village classification), is publicly available: https://doi.org/10.11922/sciencedb.01410. |
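The abstract does not give the exact formulation of the gated-fusion module, so the following is only a minimal sketch of one common way to gate two 1-D feature vectors, contrasted with plain concatenation. It assumes both branches emit vectors of equal length; the names `gated_fusion`, `concat_fusion`, `weights`, and `bias` are hypothetical and not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(sat_feat, sv_feat, weights, bias):
    """Fuse two equal-length 1-D feature vectors with an element-wise gate.

    Assumed form (not confirmed by the source):
        g     = sigmoid(W [sat; sv] + b)
        fused = g * sat + (1 - g) * sv
    weights is a list of len(sat_feat) rows, each of length 2 * len(sat_feat).
    """
    if len(sat_feat) != len(sv_feat):
        raise ValueError("both branches must produce vectors of equal length")
    concat = list(sat_feat) + list(sv_feat)
    fused = []
    for i in range(len(sat_feat)):
        # Gate value for dimension i, computed from both modalities.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        # Convex combination: g near 1 favours satellite, near 0 street view.
        fused.append(g * sat_feat[i] + (1.0 - g) * sv_feat[i])
    return fused


def concat_fusion(sat_feat, sv_feat):
    """Baseline: classic feature concatenation, with no learned weighting."""
    return list(sat_feat) + list(sv_feat)


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    sat = [random.uniform(-1, 1) for _ in range(dim)]  # stand-in satellite features
    sv = [random.uniform(-1, 1) for _ in range(dim)]   # stand-in street-view features
    # Random weights stand in for parameters that would be learned in training.
    w = [[random.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
    b = [0.0] * dim
    print("gated: ", gated_fusion(sat, sv, w, b))
    print("concat:", concat_fusion(sat, sv))
```

In this sketch the gate lets the model weight each feature dimension toward whichever modality is more informative, whereas concatenation leaves that weighting entirely to the downstream classifier.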