Storage of Usage Information on Research Resource Metadata Database: Corpus Construction Using Academic Articles

Shunsuke Kozawa
Hitomi Tohyama
Kiyotaka Uchimoto
Shigeki Matsubara

Abstract

Recently, language resources have become indispensable for linguistic research. However, existing language resources are seldom fully utilized because the variety of ways in which they can be used is not well known, which suggests that their intrinsic value is not well recognized either. To address this issue, lists of usage information could improve language resource searches and lead to more efficient use. In this research, therefore, we collect a list of usage information for each language resource from academic articles to promote the efficient utilization of language resources. This paper describes the construction of a text corpus annotated with usage information (UI corpus). Specifically, we automatically extract sentences containing language resource names from academic articles. The extracted sentences are then annotated with usage information by two annotators in a cascaded manner. We show that the UI corpus contributes to efficient language resource searches by combining it with a metadata database of language resources and comparing the number of language resources retrieved with and without the UI corpus.

How to Cite
[1]
S. Kozawa, H. Tohyama, K. Uchimoto, and S. Matsubara, “Storage of Usage Information on Research Resource Metadata Database: Corpus Construction Using Academic Articles”, ECTI-CIT Transactions, vol. 5, no. 2, pp. 98–106, Apr. 2016.
Section
Artificial Intelligence and Machine Learning (AI)