Global Emerging Innovation Summit (GEIS-2021)

In Silico Identification, Analysis, and Prediction Algorithm for Plant Gene Cluster

Author(s): Himanshu Singh, C. Vineeth, Bhupender Thakur, Atul Kumar Upadhyay and Vikas Kaushik *

Pp: 237-244 (8)

DOI: 10.2174/9781681089010121010030

Abstract

Operons, sets of organized genes that work in a coordinated way in microbes, are a well-established concept. Recent developments in genetics, biochemistry, and bioinformatics have revealed similar gene arrangements in plants. Here we aim to develop an algorithm and tool to detect and identify biosynthetic gene clusters (BGCs) in any input plant genome. With this tool, we intend to match or surpass the performance of existing tools for BGC prediction, such as the popular plantiSMASH. The prediction models were developed with the machine learning tool WEKA, using physicochemical properties as the data set to classify proteins as terpene synthases or non-terpene synthases. A set of ten physicochemical properties was selected, and their values were predicted for each of the 159 proteins (terpene synthases and non-terpene synthases). Employing the random forest and SMO classifiers, we obtained a promising accuracy of over 90 percent with a 66 percent percentage-split test. Accurate prediction of BGCs in plants, especially major food crops such as rice, wheat, and corn, could revolutionize farming and nutrition for the better.
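
As an illustration of the evaluation setup described above, the following minimal Python sketch computes one physicochemical feature from a protein sequence and performs a 66 percent percentage split, in which 66 percent of the instances are used for training and the remainder for testing. It is not the chapter's implementation: the abstract does not name the ten properties, so GRAVY (mean Kyte-Doolittle hydropathy) is only an assumed example, and the function names `gravy` and `percentage_split`, the seed, and the rounding rule are hypothetical choices.

```python
import random

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy of a protein sequence.

    Assumed example feature; the chapter's ten properties are not listed.
    """
    return sum(KD[residue] for residue in sequence) / len(sequence)


def percentage_split(items, train_percent=66, seed=1):
    """Shuffle the items and split them into (train, test) lists.

    The training size is the rounded share of the data; the seed and
    rounding rule are assumptions made for this sketch.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


# 159 placeholder instance indices, matching the protein count in the abstract.
train, test = percentage_split(range(159))
print(len(train), len(test))
```

Under these assumptions, a data set of 159 proteins yields 105 training and 54 test instances; a classifier would be fitted on the first group and its accuracy reported on the second.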


Keywords: Algorithm, BGC, Mining, PlantiSMASH, Random forest, SMO, WEKA.

© 2024 Bentham Science Publishers