Abstract
Background: Cytokines are small signaling proteins that play critical roles in biological functions and are closely associated with human diseases. Accurate identification of cytokines is the first step toward understanding how they relate to human diseases. In recent years, considerable effort has gone into developing computational methods, especially machine learning based methods, to identify cytokines quickly and accurately. A major remaining challenge for existing machine learning based methods is improving the performance of cytokine identification.
Method: In this study, we attempt to enhance the performance of cytokine identification by addressing two factors: (1) feature representation and (2) classifier selection. For feature representation, we fuse multiple types of features that perform well in distinguishing cytokines from non-cytokines, and employ two feature selection techniques, Max-Relevance-Max-Distance (MRMD) and Principal Components Analysis (PCA), to obtain the optimal feature representations. For classifier selection, we evaluate several powerful classifiers and use the one with the highest performance to build the classification model for our method.
Results: Our analysis shows that the proposed feature sets maintain stable, high performance with every classifier we tested. Ranked from best to worst overall performance, the combinations were 473D+LIBSVM, MRMD+LIBD3C, and PCA+LIBSVM.
Conclusion: Comparative studies demonstrate that the proposed strategy effectively improves the performance of cytokine identification.
Keywords: Biological activities, cytokines, human diseases, max-relevance-max-distance (MRMD), feature selection techniques, principal components analysis (PCA).
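
The classifier selection step described in the Method paragraph can be illustrated with a minimal sketch: score each candidate classifier by k-fold cross-validation and keep the best one. Everything here is an assumption made for illustration. The two classifiers (`nearest_centroid`, `one_nn`) are simple stand-ins, not LIBSVM or LIBD3C; the helper names (`cv_accuracy`, `select_best`) are hypothetical; and the data are synthetic two-dimensional points, not cytokine features.

```python
import random
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(train_X, train_y):
    """Stand-in classifier: predict the class whose mean vector is closest."""
    sums, counts = {}, {}
    for x, label in zip(train_X, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def one_nn(train_X, train_y):
    """Stand-in classifier: predict the label of the single nearest neighbour."""
    def predict(x):
        j = min(range(len(train_X)), key=lambda i: sq_dist(x, train_X[i]))
        return train_y[j]
    return predict


def cv_accuracy(fit, X, y, k=5, seed=0):
    """Mean accuracy of `fit` over k shuffled, non-overlapping folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(mean(1.0 if predict(X[i]) == y[i] else 0.0 for i in fold))
    return mean(scores)


def select_best(candidates, X, y, k=5):
    """Return the name of the top-scoring candidate and all scores."""
    scores = {name: cv_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Synthetic data: two well-separated clusters standing in for the two classes.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(30)]
y = [0] * 30 + [1] * 30

best, scores = select_best(
    {"nearest_centroid": nearest_centroid, "one_nn": one_nn}, X, y
)
print(best, scores)
```

The same loop extends to comparing feature representations: each candidate could pair a feature transformation with a classifier, which mirrors the feature-set and classifier combinations compared in the Results paragraph.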