李丽双. Automotive term extraction based on conditional random fields [J]. ,2013,53(2):267-272 |
Automotive term extraction based on conditional random fields (original Chinese title: 基于条件随机场的汽车领域术语抽取) |
Submitted: 2013-03-19  Revised: 2013-03-20 |
DOI:10.7511/dllgxb201302018 |
Chinese keywords: 信息抽取; 领域术语抽取; 汽车领域术语; 条件随机场 |
English keywords: information extraction; domain term extraction; automotive term; conditional random fields |
Funding: Supported by the National Natural Science Foundation of China (71031002, 61173101, 61173100). |
|
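Chinese abstract (translated): |
Chinese domain term extraction is an important research task in Chinese information processing, with important applications in dictionary construction and domain ontology building. Using conditional random fields (CRFs), term extraction was carried out for the automotive domain: web pages were crawled from automotive knowledge websites and preprocessed into plain text, the compositional characteristics of automotive terms were analyzed, and corresponding annotation rules were drawn up for manual labeling of the corpus. On top of word and part-of-speech features, dictionary features, domain word frequency, and background-domain word frequency were added; precision, recall, and F-score reached 84.61%, 80.50%, and 82.50%, respectively. Comparison with other methods shows that the proposed automotive term extraction method is effective. |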
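The three reported figures can be checked against one another with a short calculation. This is only a sketch: it assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly, and the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 84.61
reported_recall = 80.50

f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.2f}")  # prints 82.50
```

Under that assumption, the computed value agrees with the reported 82.50%.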
English abstract: |
Chinese domain term extraction is an important task in Chinese information processing, with applications in dictionary construction, domain ontology construction, and related areas. Term extraction in the automotive field based on conditional random fields (CRFs) is discussed. First, web pages on automotive knowledge are crawled and preprocessed into plain text. Then, the characteristics of automotive terms are analyzed, corresponding annotation rules are written, and the corpus is labeled manually, yielding a domain corpus for term extraction. In addition to word and part-of-speech features, dictionary features and word frequencies in the domain corpus and in background-domain corpora are used. Experimental results show that the precision, recall, and F-score are 84.61%, 80.50%, and 82.50%, respectively. Comparison with other methods shows that the proposed model for extracting automotive terms is effective. |
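To make the feature set concrete, the following is a minimal sketch of how per-token features of the kinds named in the abstract (word, part of speech, dictionary membership, domain frequency, background-domain frequency) might be assembled into feature dictionaries for a CRF toolkit. It is not the paper's feature template: the function names, the one-token context window, the frequency bucket thresholds, and the toy sentence and counts are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); the tag set is an assumption


def freq_bucket(count: int) -> str:
    """Discretize a raw count into a categorical value (thresholds are arbitrary)."""
    if count == 0:
        return "zero"
    if count < 5:
        return "low"
    return "high"


def token_features(
    sent: List[Token],
    i: int,
    term_dict: Set[str],
    domain_freq: Counter,
    background_freq: Counter,
) -> Dict[str, object]:
    """Build a feature dictionary for the i-th token of a segmented, tagged sentence."""
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "word": word,
        "pos": pos,
        "in_dict": word in term_dict,  # dictionary feature
        "domain_freq": freq_bucket(domain_freq[word]),
        "background_freq": freq_bucket(background_freq[word]),
    }
    # Context window of one token on each side (hypothetical choice).
    if i > 0:
        feats["prev_word"], feats["prev_pos"] = sent[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["next_word"], feats["next_pos"] = sent[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


# Toy data, invented for illustration only.
sentence: List[Token] = [("发动机", "n"), ("排量", "n"), ("很", "d"), ("大", "a")]
term_dict = {"发动机", "排量"}
domain_freq = Counter({"发动机": 12, "排量": 7, "很": 30, "大": 25})
background_freq = Counter({"发动机": 1, "很": 40, "大": 33})

features = [
    token_features(sentence, i, term_dict, domain_freq, background_freq)
    for i in range(len(sentence))
]
for feats in features:
    print(feats)
```

The intuition behind pairing the two frequency features is that a word frequent in the domain corpus but rare in a background corpus (here, 发动机) is a stronger term candidate than one frequent in both (here, 很).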