Journal of Biology ›› 2025, Vol. 42 ›› Issue (5): 67-.doi: 10.3969/j.issn.2095-1736.2025.05.067

Machine learning-based prediction of secretory efficiency of signal peptides in Bacillus subtilis 

MENG Xiangbo, LI Cen, YUAN Chengwu, LIU Fufeng, LU Fuping, PENG Chong   

  1. College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, China
  • Online:2025-10-18 Published:2025-10-14

Abstract: This study addresses the lack of clear rules governing how efficiently signal peptides guide the secretion of heterologous proteins. Eight datasets were constructed from data on heterologous protein secretion guided by signal peptides from Bacillus subtilis, and models predicting signal peptide secretion efficiency were developed using the support vector machine (SVM) and Random Forest (RF) algorithms. By combining different datasets, sequence features, and algorithms, a total of 458 classification models and 228 regression models were built. The RF algorithm showed superior classification performance, reaching 83.21% accuracy on the α-amylase dataset. In regression, RF also performed best on the α-amylase dataset, yielding a model with a coefficient of determination of 0.43. The work further revealed differences in amino acid composition and GC3 content (the frequency of G and C nucleotides at the third position of codons) between high- and low-efficiency signal peptides: well-performing signal peptides tended to have a higher proportion of unfolded amino acids and a higher GC3 content. The study thus achieves prediction of signal peptide secretion efficiency and explores the factors that affect it.
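
The GC3 definition given in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `gc3_content`, the handling of a trailing partial codon, and the example sequence are all assumptions made here for illustration.

```python
def gc3_content(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence (hypothetical input);
    a trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")   # accept RNA or lowercase input
    usable = len(seq) - len(seq) % 3      # length covered by complete codons
    third_bases = seq[2:usable:3]         # every third base, starting at index 2
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    gc = sum(base in "GC" for base in third_bases)
    return gc / len(third_bases)


# Made-up example: codons ATG AAA CGC CTT have third bases G, A, C, T
print(gc3_content("ATGAAACGCCTT"))  # 0.5
```

Under these assumptions, a higher return value for the coding sequence of a signal peptide corresponds to the elevated GC3 content that the abstract associates with well-performing signal peptides.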

Key words: signal peptides, secretion efficiency, support vector machine, Random Forest, Bacillus subtilis
