Journal of Biology (生物学杂志)

• Research Report

Screening of key genes and a prediction model for lung cancer

  

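  1. School of Science, Jiangnan University, Wuxi 214122
  • Publication date: 2019-04-18 Release date: 2019-04-18
  • Corresponding author: Zhu Ping, PhD, professor, mainly engaged in research on computational molecular biology, theoretical computer science, and algebraic theory, E-mail: zhuping@jiangnan.edu.cn
  • About the author: Pan Yihong, master's student, mainly engaged in bioinformatics research, E-mail: 18816201157@163.com
  • Funding:
    National Natural Science Foundation of China (No. 11271163)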

Research on key genes and a prediction model associated with lung cancer

  1. School of Science, Jiangnan University, Wuxi 214122, China
  • Online:2019-04-18 Published:2019-04-18

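Summary: To identify the key genes in the development of lung cancer, reveal the potential mechanisms underlying that development, and obtain a prediction model for lung cancer, the gene chip datasets GSE19188 and GSE40791 were downloaded from the GEO (Gene Expression Omnibus) database. First, analysis of the gene expression data yielded 805 differentially expressed genes (DEGs) with a consistent trend, from which a specific protein-protein interaction (PPI) network was constructed. Next, 209 key genes in lung cancer development were screened by the product of each gene's degree in the network and its expression deviation score. The key genes were significantly enriched in 11 cellular pathways, and the differential scores of these pathways clearly distinguished cancer samples from normal samples. Finally, 18 crosstalk genes in the pathways were combined with the support vector machine algorithm to build a prediction model for lung cancer. In testing, the model reached a classification accuracy of 97%, indicating that these 18 genes have good robustness and sensitivity as predictive genes for lung cancer and may serve as predictive markers for lung cancer and as targets for chemotherapy.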
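The classification step can be illustrated with a minimal sketch. It is not the authors' implementation: the text states only that 18 crosstalk genes were combined with a support vector machine. The sketch assumes a linear SVM trained by subgradient descent on the hinge loss, uses two made-up features in place of the 18 genes, and all expression values, function names, and hyperparameters (`lr`, `lam`, `epochs`) are hypothetical.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=50):
    """Train a linear SVM with plain subgradient descent on the hinge loss.

    samples: list of feature vectors (hypothetical expression values)
    labels:  +1 for cancer, -1 for normal
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample violates the margin: move the hyperplane toward it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: apply only the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (cancer) or -1 (normal) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, clearly separable data (hypothetical centered expression values).
train_x = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.0],
           [-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_x, train_y)
train_accuracy = sum(predict(w, b, x) == y
                     for x, y in zip(train_x, train_y)) / len(train_y)
```

On real data one would hold out a test set and more likely use an established SVM library; the toy data here is separable by construction, so its accuracy says nothing about the reported 97%.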

Keywords: lung cancer, differentially expressed genes (DEGs), protein-protein interaction (PPI) network, crosstalk genes, prediction model

Abstract: The aim of the present study was to investigate the pathways of key genes that participate in the development of lung cancer and to construct a prediction model. First, the differentially expressed genes (DEGs) between the cancer and healthy groups were identified by analyzing the transcription profiles GSE19188 and GSE40791, both downloaded from the GEO (Gene Expression Omnibus) database. Second, the DEGs were entered into the STRING database to construct a protein-protein interaction (PPI) network, and the key genes were screened by the product of each gene's expression deviation score and its degree in the network. A total of 805 DEGs with a consistent trend were identified in lung cancer samples compared with normal tissue samples, including 214 up-regulated and 591 down-regulated genes. The PPI network comprised 632 DEGs and 6312 interaction pairs. A total of 209 key genes were obtained, which were significantly enriched in seven up-regulated pathways and four down-regulated pathways. Hierarchical analysis showed that the pathway scores of the 11 significant pathways could distinguish cancer samples from normal samples. Finally, crosstalk genes in the connected pathways were analyzed and used to construct the prediction model with a machine learning method. The accuracy of the prediction model reached 97%, indicating that the 18 crosstalk genes have good robustness and sensitivity as prediction genes for lung cancer. These genes may therefore serve as prediction markers and as targets for chemotherapy in lung cancer.
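The key-gene screening rule (degree in the PPI network multiplied by the expression deviation score) can be sketched as follows. This is an illustrative reading, not the authors' code. The abstract does not define the deviation score or the cutoff that yields 209 genes, so taking the absolute value of the score and keeping the top `top_k` genes are both assumptions, and the gene names and numbers are placeholders.

```python
def node_degrees(edges):
    """Count the distinct interaction partners of each gene in the network."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {gene: len(p) for gene, p in partners.items()}


def rank_key_genes(edges, deviation, top_k):
    """Rank genes by degree * |deviation score| and keep the top_k.

    Using the absolute value and a top_k cutoff are assumptions made
    for this sketch; the source does not specify either.
    """
    degrees = node_degrees(edges)
    scores = {gene: degrees.get(gene, 0) * abs(dev)
              for gene, dev in deviation.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical PPI edges and deviation scores (placeholder gene names).
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D"),
         ("GENE_B", "GENE_C"), ("GENE_D", "GENE_E")]
deviation = {"GENE_A": 1.5, "GENE_B": -2.0, "GENE_C": 0.5,
             "GENE_D": 1.0, "GENE_E": 3.0}

top_genes = rank_key_genes(edges, deviation, top_k=3)
```

The product favors genes that are both strongly deregulated and highly connected: in the toy data, `GENE_E` has the largest deviation but only one partner, so it ranks below the hub `GENE_A`.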

Key words: lung cancer, differentially expressed genes (DEGs), protein-protein interaction (PPI) network, crosstalk gene, prediction model

Chinese Library Classification (CLC) number: