Article Abstract
Prediction of Gene Expression Level Based on Generalized Linear Model
Submitted: 2019-08-09  Revised: 2019-09-29
DOI:
Keywords: generalized linear model; master-slave model; histone modification; gene expression
Funding: National Natural Science Foundation of China (No. 81872247)
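Authors (affiliation, postal code):
Shi Haojie (师豪杰), School of Control Science and Engineering, 116024
Gu Hong (顾宏), School of Control Science and Engineering
Xu Xiaolu (徐晓璐), School of Control Science and Engineering
Qin Pan* (秦攀), School of Control Science and Engineering, 116024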
Abstract:
      Histone modification is ubiquitous in organisms and affects gene expression through several regulatory mechanisms. With the rapid development of high-throughput sequencing technology, large volumes of sequencing data now make it possible to explore the intrinsic relationship between histone modification signals and gene expression levels. Because gene expression data are zero-inflated, this paper proposes a master-slave (two-step) model based on the generalized linear model framework, which predicts gene expression levels from histone modification signals with high accuracy. First, gene locus information from the human genome-wide annotation file is used to select the expression data that contain complete locus information. Second, guided by this locus information, features at gene-specific loci are located and extracted from the histone modification data to construct the design matrix. Finally, taking into account the zero inflation of the response variable, the master-slave model is built and analyzed. Using the GM12878 cell line as an example, the proposed model is compared with several existing regression algorithms and performs best, which verifies the effectiveness of the proposed method.
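The abstract does not give the model equations, so the following is only a minimal sketch of one common way to build a two-part GLM for a zero-inflated response, under assumptions that are ours rather than the authors': the master model is taken to be a logistic (binomial family, logit link) GLM deciding whether a gene is expressed, and the slave model is a Gaussian (identity link) GLM on log1p-transformed expression fitted only on expressed genes. The names `MasterSlaveGLM`, `fit_logistic_irls`, and `fit_gaussian_glm`, the 0.5 threshold, the log1p transform, and the synthetic data standing in for histone-mark features are all hypothetical.

```python
import numpy as np


def _add_intercept(X):
    """Prepend a column of ones so each GLM has an intercept term."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def _sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))


def fit_logistic_irls(X, z, n_iter=25, ridge=1e-6):
    """Assumed master model: binomial GLM with logit link, fitted by
    iteratively reweighted least squares (IRLS). z[i] = 1 if gene i is expressed."""
    Xd = _add_intercept(X)
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = Xd @ beta
        p = _sigmoid(eta)
        w = np.clip(p * (1.0 - p), 1e-10, None)      # IRLS weights
        working = eta + (z - p) / w                   # working response
        A = Xd.T @ (w[:, None] * Xd) + ridge * np.eye(Xd.shape[1])
        beta = np.linalg.solve(A, Xd.T @ (w * working))
    return beta


def fit_gaussian_glm(X, t):
    """Assumed slave model: Gaussian GLM with identity link (ordinary least squares)."""
    beta, *_ = np.linalg.lstsq(_add_intercept(X), t, rcond=None)
    return beta


class MasterSlaveGLM:
    """Hypothetical two-part model for zero-inflated expression values y >= 0.

    Master: logistic GLM for P(y > 0 | x).
    Slave:  Gaussian GLM for log1p(y), fitted on expressed genes only.
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        expressed = y > 0
        self.master_ = fit_logistic_irls(X, expressed.astype(float))
        self.slave_ = fit_gaussian_glm(X[expressed], np.log1p(y[expressed]))
        return self

    def predict_proba(self, X):
        return _sigmoid(_add_intercept(X) @ self.master_)

    def predict(self, X, threshold=0.5):
        """Predicted log1p(expression); exactly 0 where the master says 'not expressed'."""
        level = _add_intercept(X) @ self.slave_
        return np.where(self.predict_proba(X) >= threshold, level, 0.0)


# Synthetic stand-in data: 3 hypothetical histone-mark features per gene.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
expressed = rng.random(n) < _sigmoid(0.5 + X @ np.array([2.5, -2.0, 0.0]))
log_level = 2.0 + X @ np.array([0.8, 0.0, -0.5]) + 0.3 * rng.normal(size=n)
y = np.where(expressed, np.exp(log_level), 0.0)      # zero-inflated response

model = MasterSlaveGLM().fit(X, y)
pred = model.predict(X)
acc = np.mean((model.predict_proba(X) >= 0.5) == expressed)
print("master accuracy:", round(float(acc), 3))
```

Gating on the master model keeps genes classified as unexpressed at exactly zero, which is the motivation for splitting a zero-inflated response into two parts; a softer variant would report P(y > 0 | x) multiplied by the slave prediction instead.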