Article abstract
LI Lishuang. Automotive term extraction based on conditional random fields[J].,2013,53(2):267-272
Automotive term extraction based on conditional random fields
Received: 2013-03-19  Revised: 2013-03-20
DOI:10.7511/dllgxb201302018
Keywords: information extraction  domain term extraction  automotive term  conditional random fields
Funding: Supported by the National Natural Science Foundation of China (71031002, 61173101, 61173100).
Author and affiliation
LI Lishuang
Abstract:
      Chinese domain term extraction is an important task in Chinese information processing, with applications in dictionary construction and domain ontology construction. This paper extracts automotive domain terms with conditional random fields (CRFs). Web pages are crawled from automotive knowledge websites and preprocessed into plain text. The composition of automotive terms is then analyzed, annotation rules are drawn up accordingly, and the corpus is labeled manually to build a domain corpus for term extraction. In addition to word and part-of-speech features, the model uses a dictionary feature, the word frequency in the domain corpus, and the word frequency in background-domain corpora. Precision, recall and F-score reach 84.61%, 80.50% and 82.50%, respectively. Comparison with other methods indicates that the proposed automotive term extraction method is effective.
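The feature set named in the abstract (word, part of speech, dictionary membership, domain word frequency, background-domain word frequency) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `token_features`, `bucket` and `f_score`, the frequency thresholds, the one-token context window, and the example sentence and counts are all hypothetical. The last function only checks that the reported F-score is the harmonic mean of the reported precision and recall.

```python
from collections import Counter


def bucket(count):
    """Map a raw frequency to a coarse label (thresholds are an assumption)."""
    if count == 0:
        return "zero"
    return "low" if count < 5 else "high"


def token_features(sent, i, term_dict, domain_freq, background_freq):
    """Build a feature dict for token i of a (word, pos)-tagged sentence.

    The feature kinds follow the abstract; the names and the +/-1 window
    are illustrative assumptions.
    """
    word, pos = sent[i]
    feats = {
        "word": word,
        "pos": pos,
        "in_dict": word in term_dict,                   # dictionary feature
        "domain_tf": bucket(domain_freq[word]),         # domain word frequency
        "background_tf": bucket(background_freq[word])  # background frequency
    }
    if i > 0:
        feats["-1:word"], feats["-1:pos"] = sent[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["+1:word"], feats["+1:pos"] = sent[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical segmented and POS-tagged sentence, dictionary and counts.
sent = [("发动机", "n"), ("的", "u"), ("涡轮", "n"), ("增压器", "n")]
term_dict = {"发动机", "涡轮"}
domain_freq = Counter({"发动机": 12, "涡轮": 3, "的": 40})
background_freq = Counter({"的": 50, "发动机": 1})

features = [
    token_features(sent, i, term_dict, domain_freq, background_freq)
    for i in range(len(sent))
]

# Consistency check on the figures reported in the abstract.
print(f"{f_score(84.61, 80.50):.2f}")  # prints 82.50
```

Feature dicts of this shape, one per token, are a common input format for CRF toolkits. The tagging scheme and the toolkit the authors used are not stated in the abstract, so neither is assumed here.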