Frontiers in Protein and Peptide Sciences

Volume: 1

Prediction of Human Protein Subcellular Locations with Feature Selection and Analysis

Author(s): Bi-Qing Li, Tao Huang, Lei Chen, Kai-Yan Feng and Yu-Dong Cai

Pp: 206-225 (20)

DOI: 10.2174/9781608058624114010013

In this paper, we propose a strategy for predicting the subcellular locations of human proteins using multi-step feature selection. Each protein is first encoded by features derived from KEGG and GO enrichment scores. After an initial feature reduction, 9958 features remain; these are ranked by the Minimum Redundancy Maximum Relevance (mRMR) method. The ranked features are then filtered by an incremental feature selection (IFS) procedure to obtain a compact feature set. Random forest (RF) is used as the prediction model and achieves an overall prediction accuracy of 67.72%, as evaluated by ten-fold cross-validation. The KEGG pathways and GO terms corresponding to the selected features are analyzed in depth and are deemed the terms most relevant to human protein subcellular location.
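
The rank-then-filter pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes discrete toy features in place of the real KEGG and GO enrichment scores, uses the difference form of mRMR (relevance minus mean redundancy, both measured by mutual information), and swaps in a 1-nearest-neighbour classifier for random forest so that the example needs only the Python standard library. All function names (`mutual_information`, `mrmr_rank`, `cv_accuracy`, `incremental_feature_selection`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_rank(X, y):
    """Greedy mRMR ordering: relevance to y minus mean redundancy
    with the features already ranked (difference form)."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    relevance = [mutual_information(col, y) for col in cols]
    ranked, remaining = [], list(range(len(cols)))
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(mutual_information(cols[j], cols[k])
                             for k in ranked) / len(ranked)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def predict_1nn(train_X, train_y, x, feats):
    """Stand-in classifier (NOT random forest): 1-NN with Hamming distance."""
    nearest = min(range(len(train_X)),
                  key=lambda i: sum(train_X[i][j] != x[j] for j in feats))
    return train_y[nearest]


def cv_accuracy(X, y, feats, k=10, seed=0):
    """Overall accuracy under k-fold cross-validation on the given features."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(predict_1nn(train_X, train_y, X[i], feats) == y[i]
                       for i in fold)
    return correct / len(X)


def incremental_feature_selection(X, y, ranked, k=10):
    """Evaluate the top-1, top-2, ... ranked features and keep the
    smallest prefix that reaches the best cross-validated accuracy."""
    best_acc, best_n = -1.0, 0
    for n in range(1, len(ranked) + 1):
        acc = cv_accuracy(X, y, ranked[:n], k)
        if acc > best_acc:
            best_acc, best_n = acc, n
    return ranked[:best_n], best_acc


# Toy data: columns are [a, copy of a, b, noise]; the label is 2*a + b.
X = [[a, a, b, c] for _ in range(5)
     for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [2 * row[0] + row[2] for row in X]

ranked = mrmr_rank(X, y)
selected, best_acc = incremental_feature_selection(X, y, ranked)
print("mRMR order:", ranked)
print("IFS selection:", selected, "accuracy:", best_acc)
```

On this toy data the duplicated column is ranked after the two informative columns because it is fully redundant with one of them, and the IFS loop stops growing the feature set once adding features no longer improves the cross-validated accuracy. With real enrichment scores, the mutual-information estimate and the classifier would both need to be replaced by versions suited to continuous features.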

Keywords: Subcellular location, minimum redundancy maximum relevance, incremental feature selection, random forest algorithm, ten-fold cross-validation.
