Journal of Biology ›› 2022, Vol. 39 ›› Issue (3): 46-. doi: 10.3969/j.issn.2095-1736.2022.03.046

• Research Report •

Screening and analysis of differential genes in T-ALL based on multi-omics

  

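  1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401
  • Publication date: 2022-06-18 Release date: 2022-06-17
  • Corresponding author: Hu Hezhi, lecturer, engaged in research on the role of non-coding RNAs in regulating disease, E-mail: huhezhi@hebut.edu.cn
  • About the author: Li Jianwei, professor and doctoral supervisor, engaged in bioinformatics research on non-coding RNAs, E-mail: lijianwei@hebut.edu.cn
  • Funding:
    National Natural Science Foundation of China (No. 81672113); Natural Science Foundation of Hebei Province (C2018202083)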

Screening and analysis of differential genes in T-ALL based on multi-omics data

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
  • Online:2022-06-18 Published:2022-06-17

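Abstract: Bioinformatics methods were used to analyze differential genes in the transcriptome and epigenome of patients with T-cell acute lymphoblastic leukemia (T-ALL), with the aim of screening out key genes of the disease and exploring its pathogenesis. RNA-seq, CTCF ChIP-seq, and DNA methylation data for T-ALL were downloaded from the GEO and SRA databases. Differential genes in the RNA-seq and ChIP-seq data were screened with the DESeq2 and edgeR bioinformatics software, respectively, and differential genes in the DNA methylation data were screened with the CHAMP software. A total of 5887, 5315, and 2196 differential genes were screened from the RNA-seq, ChIP-seq, and DNA methylation data, respectively, and intersecting the three sets yielded 119 differential genes. A gene similarity fusion network was constructed, from which 48 key genes with strong and numerous gene-gene interactions were screened. GO and KEGG pathway functional enrichment analyses were performed on the key genes, a protein-protein interaction network was built with the STRING database, and eight core genes (CTLA4, CD7, GPR29, CD5, CD247, IL2RB, FASLG, and CD274) were screened with the Cytoscape software. Searches of the CGC and CTD databases both indicated that these eight core genes have potential as T-ALL biomarkers, which may help further exploration of the pathogenesis of T-ALL and the development of related targeted drugs.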

Keywords: T-ALL, similarity fusion network, differential genes, key genes, bioinformatics

Abstract: Bioinformatics approaches were used to screen differential genes from the transcriptomic and epigenetic data of patients with T-cell acute lymphoblastic leukemia (T-ALL), and a multi-omics gene similarity fusion network was built to identify key genes and explore the pathogenic mechanisms of the disease. RNA-seq, CTCF ChIP-seq, and DNA methylation data for T-ALL were downloaded from the GEO and SRA databases. Differential genes in the RNA-seq and CTCF ChIP-seq data were screened with DESeq2 and edgeR, respectively, and the CHAMP software was used to screen differential genes in the DNA methylation data. In total, 5 887, 5 315, and 2 196 differential genes were identified from the RNA-seq, CTCF ChIP-seq, and DNA methylation data, respectively, and 119 genes lay in the intersection of the three differential gene sets. A multi-omics gene similarity fusion network was constructed, from which 48 key genes with strong and numerous interactions were screened out. Gene Ontology (GO) and KEGG pathway enrichment analyses were performed for the 48 key genes, a protein-protein interaction network of the key genes was built with the STRING database, and Cytoscape software was used to select eight core genes (CTLA4, CD7, GPR29, CD5, CD247, IL2RB, FASLG, and CD274). Searches of the CGC and CTD databases indicate that the eight core genes have potential as T-ALL biomarkers, and they may assist the exploration of T-ALL pathogenesis and the development of targeted drugs.
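
The intersection step described in the abstract, which reduces three differential gene sets to the genes shared by all of them, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `intersect_gene_sets` and the gene identifiers are hypothetical placeholders, and the actual study worked with thousands of genes per set.

```python
def intersect_gene_sets(*gene_sets):
    """Return, in sorted order, the genes present in every input collection."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)  # keep only genes also found in this set
    return sorted(common)


# Placeholder identifiers only; these are NOT results from the study.
rna_seq_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
chip_seq_degs = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
methylation_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}

shared = intersect_gene_sets(rna_seq_degs, chip_seq_degs, methylation_degs)
print(shared)  # ['GENE_A', 'GENE_B']
```

Using plain sets keeps the step independent of how each differential gene list was produced, so the same function would apply to output exported from any of the upstream tools, provided the gene identifiers use a common naming scheme.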

Key words: T-ALL, similarity fusion network, differential genes, key genes, bioinformatics

CLC number: