Construction of a parallel Chinese-English phrase dependency treebank
Keywords: phrase dependency treebank; machine translation; node alignment; syntactic function; semantic roles
      A phrase dependency treebank (PDT), which integrates phrase structure trees with dependency analysis, is proposed for translation studies. This paper describes how the Dalian University of Technology Parallel Chinese-English PDT (DUT-CEPDT) was constructed. PDT favors flat phrase structures, and its dependency (syntactic function) labels are assigned on a semantic basis. This differs from the strictly binary branching of conventional dependency analysis. The raw texts of DUT-CEPDT are Chinese government work reports and white papers, chosen for their high textual quality, together with their official English translations. First, after word segmentation and part-of-speech (POS) tagging, the Chinese and English monolingual PDTs are built manually with LingTreeConstructor, an annotation tool designed for linguists. Next, nodes in the two monolingual PDTs are aligned at every level, from the whole passage down to individual words. This multi-level alignment provides richer translation knowledge than conventional single-level alignment of words, phrases, or sentences alone. Finally, FrameNet-based frame semantic roles are annotated in parallel on the aligned nodes of the Chinese and English trees. DUT-CEPDT is intended as a standard resource for training and assessing both human translators and machine translation systems.
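The abstract describes PDT nodes as flat phrase-structure constituents that also carry head and dependency information, and it allows translation alignment between nodes at any level. The Python sketch below is only a rough illustration of that idea. It is not the DUT-CEPDT or LingTreeConstructor data format, and every name and label in it is an assumption: the classes and functions (`PDTNode`, `AlignedPair`, `dependencies`, `alignments_by_level`), the node-id scheme, the Penn-Treebank-style category and POS labels, the dependency labels `SBJ`/`OBJ`/`MOD`, and the `Agent` role name, which is illustrative and not taken from FrameNet.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class PDTNode:
    """Hypothetical PDT node: a flat phrase-structure node that also records its head."""
    node_id: str                  # hypothetical id scheme, e.g. "zh:3"
    label: str                    # phrase category (e.g. "NP") or POS tag (illustrative)
    level: str                    # "word", "phrase", "clause", ... up to "passage"
    word: Optional[str] = None    # surface form; set only on word nodes
    children: list[PDTNode] = field(default_factory=list)  # flat: any number of children
    head: Optional[int] = None    # index of the head child among `children`
    dep: Optional[str] = None     # hypothetical function label relative to the parent's head

    def words(self) -> list[str]:
        """Return the segmented words spanned by this node, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.words()]

    def walk(self) -> Iterator[PDTNode]:
        """Yield this node and all of its descendants in pre-order."""
        yield self
        for c in self.children:
            yield from c.walk()


def dependencies(root: PDTNode) -> list[tuple[str, str, str]]:
    """List (label, dependent span, head span): each non-head child depends on its head sibling."""
    deps = []
    for n in root.walk():
        if n.head is None:
            continue
        head = n.children[n.head]
        for i, c in enumerate(n.children):
            if i != n.head:
                deps.append((c.dep, " ".join(c.words()), " ".join(head.words())))
    return deps


@dataclass
class AlignedPair:
    """A hypothetical translation-equivalent node pair; nodes may sit at any level."""
    zh: PDTNode
    en: PDTNode
    roles: dict[str, tuple[str, str]] = field(default_factory=dict)  # role -> (zh id, en id)


def alignments_by_level(pairs: list[AlignedPair]) -> dict[str, list[tuple[str, str]]]:
    """Group aligned spans by the level of the Chinese node."""
    out: dict[str, list[tuple[str, str]]] = {}
    for p in pairs:
        out.setdefault(p.zh.level, []).append((" ".join(p.zh.words()), " ".join(p.en.words())))
    return out


def w(node_id: str, pos: str, word: str, dep: Optional[str] = None) -> PDTNode:
    """Build a word node."""
    return PDTNode(node_id, pos, "word", word=word, dep=dep)


# Toy example: 我们 促进 经济 发展 / "We promote economic development".
zh_np = PDTNode("zh:np", "NP", "phrase", children=[w("zh:3", "NN", "经济", "MOD"), w("zh:4", "NN", "发展")], head=1, dep="OBJ")
zh = PDTNode("zh:s", "IP", "clause", children=[w("zh:1", "PN", "我们", "SBJ"), w("zh:2", "VV", "促进"), zh_np], head=1)
en_np = PDTNode("en:np", "NP", "phrase", children=[w("en:3", "JJ", "economic", "MOD"), w("en:4", "NN", "development")], head=1, dep="OBJ")
en = PDTNode("en:s", "S", "clause", children=[w("en:1", "PRP", "We", "SBJ"), w("en:2", "VBP", "promote"), en_np], head=1)

pairs = [AlignedPair(zh, en, roles={"Agent": ("zh:1", "en:1")}), AlignedPair(zh_np, en_np)]
# Word-level pairs are zipped only because this toy pair has parallel word order;
# real node alignment would be done by annotators.
zh_words = [n for n in zh.walk() if n.level == "word"]
en_words = [n for n in en.walk() if n.level == "word"]
pairs += [AlignedPair(a, b) for a, b in zip(zh_words, en_words)]
```

Each node keeps an arbitrary number of children and the index of its head. As a result, the subject and object of a clause remain flat siblings of the verb rather than being nested into binary layers. The same `AlignedPair` record is used whether the aligned nodes are words, phrases, or clauses.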