摘 要: |
Geo-parsing, one of the key components of geographical information retrieval, is a process to recognize and geo-locate toponyms mentioned in texts. Such a process can obtain locations contained in toponyms successfully with consistent updating of neural network models and multiple contextual features. The significant offset distance between the geo-parsed locations and the actual occurrence locations still remains. This is because the geo-parsed locations sourced from toponyms in texts always point to the centers of cities, counties, or towns, and cannot directly represent the actual occurrence locations such as factories, farms, and activity areas. Consequently, The significant offset distances between the geo-parsed locations and the actual occurrence locations limit text mining applications in micro-scale geographic discoveries. This research aims at decreasing offset distances of geo-parsed locations by proposing a novel Toponym Correction Method based on satellite Remote Sensing Images (TC-RSI). The TC-RSI method uses satellite remote sensing images to provide extra detailed spatial information that can be associated with the sentence toponym by corresponding attributes. The TC-RSI method was validated in a case study of the forest ecological pattern dataset of An'hui province from visual, statistical, and robustness assessments. The correction results show that the TC-RSI method dramatically decreases the offset distances from about 50 km to about 1 km and promotes geographical discoveries on smaller scales. A series of analyses indicated that the TC-RSI is a valid, effective, and promising method to improve the accuracy of geo-parsed locations, which allows text mining to find more accurate geographical discoveries with lower offset distances. Moreover, toponym correction promotes the use of more diverse spatial data sources, such as Lidar, domain gazetteers, Wikimedia, and streetscapes, which are expected to usher in a new era of geo-parsing with toponym corrections. |