Abstract
Background: Knowledge of a protein's structural class plays an important role in understanding its tertiary structure. Globular protein domains, whose fold types are surprisingly similar despite being complex and irregular in their natural state, can mainly be divided into four classes according to their secondary structural content: all-α, all-β, α/β, and α+β. Significant efforts have been made to predict protein structural classes. However, the protein sequence representations used in these approaches may contain redundant information.
Method: A ReliefF-SVM classification model is proposed to predict protein structural class. First, pseudo amino acid composition (PseAA) features were extracted from each protein in the dataset; these features contain redundancy. Then, the ReliefF feature selection method was used to reduce this redundancy. Next, the optimized samples were used as input to the SVM. Because the SVM parameters were difficult to determine, the Simulated Annealing Particle Swarm Optimization (SAPSO) algorithm was embedded into the SVM to optimize them.
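The feature selection step can be illustrated with a minimal sketch of ReliefF-style scoring. This is not the authors' implementation: the function names `relieff_scores` and `select_top`, the neighbor count, the Manhattan distance on range-scaled features, and the toy data are all assumptions made for illustration. The SVM and its SAPSO parameter search are omitted.

```python
import random

def relieff_scores(X, y, n_neighbors=3, n_iter=None, seed=0):
    """Return one ReliefF-style relevance weight per feature (higher = more useful)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale every difference into [0, 1].
    span = []
    for j in range(d):
        col = [row[j] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    m = n_iter or n  # number of randomly sampled reference instances
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        xi, ci = X[i], y[i]
        for c in classes:
            candidates = [X[k] for k in range(n) if k != i and y[k] == c]
            near = sorted(candidates, key=lambda r: dist(xi, r))[:n_neighbors]
            if not near:
                continue
            # Nearest hits lower a weight; nearest misses raise it,
            # weighted by the prior of the miss class.
            coef = -1.0 if c == ci else prior[c] / (1.0 - prior[ci])
            for j in range(d):
                w[j] += coef * sum(diff(xi, r, j) for r in near) / (len(near) * m)
    return w

def select_top(X, scores, k):
    """Keep the k highest-scoring feature columns, preserving column order."""
    keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    keep.sort()
    return [[row[j] for j in keep] for row in X], keep

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.1], [0.1, 0.5], [0.2, 0.9], [0.8, 0.1], [0.9, 0.5], [1.0, 0.9]]
y = [0, 0, 0, 1, 1, 1]
scores = relieff_scores(X, y, n_neighbors=2)
X_reduced, kept = select_top(X, scores, k=1)
```

In the setting described above, `k` would correspond to the number of retained features, and `X_reduced` would be passed on to the classifier.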
Results: After feature selection with the ReliefF algorithm, the feature dimension was reduced from 420 to 292. The experiment time fell from 372.32 s to 195.58 s, a reduction of nearly half. To evaluate our method objectively, we compared it with existing methods. For the C204 dataset, our method obtained an overall classification accuracy of 95.4%, which was 14.5% higher than that of the covariant matrix algorithm and 10.1% higher than that of the previous SVM. With the same feature data, the proposed method showed a 4.6% improvement over IDQD. For the Z277 dataset, the overall accuracy of the proposed method reached 96.5%, higher than those of the other methods.
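The reported reductions can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Fraction by which a quantity shrank, relative to its starting value."""
    return (before - after) / before

time_cut = relative_reduction(372.32, 195.58)  # running time in seconds
dim_cut = relative_reduction(420, 292)         # number of features

print(f"running time reduced by {time_cut:.1%}")      # 47.5%
print(f"feature dimension reduced by {dim_cut:.1%}")  # 30.5%
```

The running time therefore drops by roughly 47%, while the feature dimension shrinks by about 30%.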
Conclusion: The results of this study further support the protein sequence description reported by Lin, and our method reduces the time consumption by 47%. The prediction accuracy is also greatly improved, which demonstrates the effectiveness of our method.
Keywords: Prediction, protein structural class, ReliefF, SVM, SAPSO, tertiary structure.