Title: DHFS-ECM: Design of a Dual Heuristic Feature Selection-based Ensemble
Classification Model for the Identification of Bamboo Species from
Genomic Sequences
Volume: 25
Issue: 3
Author(s): Aditi R. Durge and Deepti D. Shrimankar*
Affiliation:
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur,
India
Keywords:
Bamboo, dual, ensemble, feature selection, genome, genetic, heuristics.
Abstract:
Background: Analyzing genomic sequences plays a crucial role in understanding biological
diversity and classifying Bamboo species. Existing methods for genomic sequence analysis suffer
from limitations such as complexity, low accuracy, and the need for constant reconfiguration in
response to evolving genomic datasets.
Aim: This study addresses these limitations by introducing a novel Dual Heuristic Feature Selection-
based Ensemble Classification Model (DHFS-ECM) for the precise identification of Bamboo
species from genomic sequences.
Methods: The proposed DHFS-ECM method employs a Genetic Algorithm to perform dual heuristic
feature selection. This process maximizes inter-class variance, leading to the selection of informative
N-gram feature sets. Subsequently, intra-class variance levels are used to create optimal
training and validation sets, ensuring comprehensive coverage of class-specific features. The selected
features are then processed through an ensemble classification layer, combining multiple stratification
models for species-specific categorization.
Results: Comparative analysis with state-of-the-art methods demonstrates that DHFS-ECM achieves
remarkable improvements in accuracy (9.5%), precision (5.9%), recall (8.5%), and AUC performance
(4.5%). Importantly, the model maintains its performance as the number of species classes
increases, owing to the continuous learning facilitated by the Dual Heuristic Genetic Algorithm
Model.
Conclusion: DHFS-ECM offers several key advantages, including efficient feature extraction, reduced
model complexity, enhanced interpretability, and increased robustness and accuracy through
the ensemble classification layer. These attributes make DHFS-ECM a promising tool for real-time
clinical applications and a valuable contribution to the field of genomic sequence analysis.
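
Illustrative sketch: the Methods summary above describes N-gram features, a Genetic Algorithm that favors high inter-class variance, and an ensemble layer. The abstract does not give the encoding, the fitness function, the GA operators, or the rule used to combine classifiers, so everything below is an assumption made for illustration and not the authors' implementation. In particular, the function names (ngram_features, inter_class_variance, select_features, majority_vote), the fixed subset size k, the crossover and mutation scheme, and the majority-vote rule are all hypothetical.

```python
import random
from collections import Counter
from itertools import product


def ngram_features(seq, n=2, alphabet="ACGT"):
    """Normalized counts of every n-gram over `alphabet`, in a fixed order."""
    grams = ["".join(p) for p in product(alphabet, repeat=n)]
    windows = max(1, len(seq) - n + 1)
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return [counts[g] / windows for g in grams]


def inter_class_variance(X, y, j):
    """Variance of the per-class means of feature j (assumed fitness term)."""
    means = []
    for c in sorted(set(y)):
        vals = [x[j] for x, label in zip(X, y) if label == c]
        means.append(sum(vals) / len(vals))
    mu = sum(means) / len(means)
    return sum((m - mu) ** 2 for m in means) / len(means)


def select_features(X, y, k=4, pop_size=20, generations=30, seed=0):
    """Toy GA over fixed-size feature subsets; fitness is the summed
    inter-class variance of the chosen features."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    scores = [inter_class_variance(X, y, j) for j in range(n_feat)]

    def fitness(subset):
        return sum(scores[j] for j in subset)

    pop = [frozenset(rng.sample(range(n_feat), k)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k indices from the union of both parents.
            child = set(rng.sample(sorted(a | b), k))
            if rng.random() < 0.2:
                # Mutation: swap one selected index for an unselected one.
                child.remove(rng.choice(sorted(child)))
                unused = [j for j in range(n_feat) if j not in child]
                child.add(rng.choice(unused))
            children.append(frozenset(child))
        pop = parents + children
    return sorted(max(pop, key=fitness))


def majority_vote(predictions):
    """Combine labels from several classifiers by simple majority (assumed rule)."""
    return Counter(predictions).most_common(1)[0][0]
```

Because this toy fitness is additive over features, picking the k highest-scoring features directly would give the same optimum; a GA becomes useful once the fitness depends on feature interactions, for example validation accuracy of a downstream classifier. The species labels and sequences passed in as X and y would come from the reader's own dataset.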