摘 要: |
Toponym recognition is used to extract toponyms from natural language texts, which is a fundamental task of ubiquitous geographic information applications. Existing toponym recognition methods with state-of-the-art performance mainly leverage supervised learning (i.e., deep-learning-based approaches) with parameters learned from massive, labeled datasets that must be annotated manually. This is a great inconvenience when model training needs to fit different domain texts, especially those of social media messaging. To address this issue, this article proposes a weakly supervised Chinese toponym recognition (ChineseTR) architecture that leverages a training dataset creator that generates training datasets automatically based on word collections and associated word frequencies from various texts and an extension recognizer that employs a basic bidirectional recurrent neural network based on particular features designed for toponym recognition. The results show that the proposed ChineseTR achieves a 0.76 F1 score in a corpus with a 0.718 out-of-vocabulary rate and a 0.903 in-vocabulary rate. All comparative experiments demonstrate that ChineseTR is an effective and scalable architecture that recognizes toponyms. |