Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the
causative agent of the global pandemic of coronavirus disease 2019 (COVID-19).
Limited information is available on the evolution of the viral structural proteins:
spike (S), envelope (E), membrane (M) and nucleocapsid (N). Therefore,
we performed a detailed molecular and genetic characterization of SARS-CoV-2
structural protein genes using nucleotide composition, codon usage,
phylogenetic, entropy and selection pressure analyses. Relative synonymous codon
usage (RSCU) patterns indicated codon usage bias, with U/A-ended codons preferred
over C/G-ended codons. Both mutational pressure and natural selection influenced the
synonymous codon usage of structural protein genes in SARS-CoV-2. Phylogenetic
analyses of different coronaviruses for all
four structural genes showed that all SARS-CoV-2 study sequences clustered
within the SARS-CoV-2 clade, which was closest to bat coronaviruses. Additional
phylogenetic analyses of SARS-CoV-2 structural protein genes showed discordance in
the topology, suggesting different patterns of evolutionary relationships among these
genes. The small number of non-synonymous mutations, low entropy values and purifying
selection suggested limited variation in the studied genes. However, variation in
the SARS-CoV-2 genome is likely to increase in the near future as the virus comes under
pressure to evade the host immune response and persist in humans. Thus, we evaluated
the genetic diversity of the structural protein genes along with the genomic
composition and codon usage patterns of SARS-CoV-2. The present data on the
molecular characterization of structural protein genes are likely to augment the
information about the evolution, biology and adaptation of SARS-CoV-2 in the human
host.
Keywords: Entropy, Gene ontology, Molecular characterization, Mutational
pressure, Natural selection, Nucleotide composition, Phylogenetic analysis,
SARS-CoV-2, Structural proteins, Synonymous codon usage.
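
For readers unfamiliar with two of the measures named above, the following is a minimal sketch of how RSCU and per-site entropy are commonly computed. It is not the pipeline used in this study. The function names rscu and column_entropy are hypothetical, the standard genetic code is assumed, and the use of the natural logarithm for entropy is an assumption. RSCU is taken here in its usual sense: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally.

```python
import math
from collections import Counter

# Standard genetic code (assumption), built in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def rscu(seq):
    """Return {codon: RSCU} for one in-frame coding sequence.

    Hypothetical helper: RSCU = observed count * family size / family total.
    Stop codons and single-codon amino acids (Met, Trp) are skipped.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    result = {}
    for aa in set(CODON_TABLE.values()):
        if aa == "*":
            continue
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue
        for codon in family:
            result[codon] = counts[codon] * len(family) / total
    return result


def column_entropy(column):
    """Shannon entropy (natural log, an assumption) of one alignment column."""
    n = len(column)
    return -sum((k / n) * math.log(k / n) for k in Counter(column).values())


if __name__ == "__main__":
    values = rscu("GCUGCUGCUGCA")  # four alanine codons: 3 x GCU, 1 x GCA
    print(values["GCT"], values["GCA"])
    print(column_entropy("AATT"))
```

An RSCU value above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often. An entropy of zero marks a fully conserved alignment column.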