Improving Named Entity Extraction Accuracy using Unlabeled Data and Several Extractors

Authors: Tomoya Iwakura, Seishi Okamoto

Polibits, 40, pp. 29-38, 2009.

Abstract: This paper proposes feature augmentation methods that use unlabeled data and several Named Entity (NE) extractors. We apply NE extractors to unlabeled data to collect NE-related information for each word, which we call NE-related labels. These labels include the candidate NE class labels of each word and the NE class labels of co-occurring words. To collect the NE-related labels accurately, we consider methods that combine the outputs of several NE extractors. We then use the NE-related labels as additional features for creating new NE extractors. We apply these methods to the IREX Japanese NE extraction task. The experimental results show better accuracy than previous results obtained with NE extractors that use handcrafted resources.
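
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon-based extractors, the agreement threshold `min_agree`, the co-occurrence `window`, and the feature names `cand=` and `ctx=` are all assumptions made here for illustration. Several extractors tag unlabeled sentences, a label is kept only when enough extractors agree on it, and the collected labels are then attached to each word as extra features.

```python
from collections import Counter, defaultdict


def dict_extractor(lexicon):
    """Build a toy NE extractor (hypothetical): lexicon lookup, else 'O'."""
    def extract(tokens):
        return [lexicon.get(tok, "O") for tok in tokens]
    return extract


def collect_ne_related_labels(sentences, extractors, min_agree=2, window=1):
    """Collect NE-related labels for each word from unlabeled sentences.

    A label is trusted only if at least `min_agree` extractors output it
    for the same token (an assumed combination rule, not the paper's).
    """
    candidate = defaultdict(Counter)  # word -> counts of its own NE labels
    cooccur = defaultdict(Counter)    # word -> counts of neighbours' NE labels
    for tokens in sentences:
        outputs = [extract(tokens) for extract in extractors]
        agreed = []
        for i in range(len(tokens)):
            votes = Counter(out[i] for out in outputs)
            label, count = votes.most_common(1)[0]
            agreed.append(label if count >= min_agree else "O")
        for i, tok in enumerate(tokens):
            if agreed[i] != "O":
                candidate[tok][agreed[i]] += 1
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i and agreed[j] != "O":
                    cooccur[tok][agreed[j]] += 1
    return candidate, cooccur


def augment_features(tokens, candidate, cooccur):
    """Attach the collected NE-related labels to each token as features."""
    features = []
    for tok in tokens:
        feats = {"word": tok}
        for label in candidate.get(tok, {}):
            feats["cand=" + label] = 1
        for label in cooccur.get(tok, {}):
            feats["ctx=" + label] = 1
        features.append(feats)
    return features
```

In this sketch the augmented feature dictionaries would be passed to whatever learner trains the new NE extractor; the choice of learner is left open here.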

Keywords: Named entity recognition; unlabeled data; combination of extractors
