
E-Pub Ahead of Schedule: Bentham Science Publishers
are pleased to offer electronic publication of accepted papers
prior to scheduled publication. These peer-reviewed papers
can be cited using the date of access and the unique DOI number.
Any final changes in manuscripts will be made at the time
of print publication and will be reflected in the final electronic
version of the issue. Articles ahead of schedule may be ordered
by pay-per-view at the relevant links given with each article
in the E-Pub Ahead of Schedule section.
Disclaimer: Articles appearing in E-Pub
Ahead-of-Schedule sections have been peer-reviewed and accepted
for publication in this journal and posted online before scheduled
publication. Articles appearing here may contain statements,
opinions, and information that have errors in facts, figures,
or interpretation. Accordingly, Bentham Science Publishers,
the editors and authors and their respective employees are
not responsible or liable for the use of any such inaccurate
or misleading data, opinion or information contained in articles
in the E-Pub Ahead-of-Schedule section.

Editorial: Bioinformatics on Proteins and Complexes
[BSP/CBIO/E-Pub/00001]
Sequence-structure similarity: Do sequentially identical peptide fragments have similar three-dimensional structures?
M. Uthayakumar, Sanjeev Patra and K. Sekar
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00002]
Glocal: Reconstructing protein 3D structure from 2D contact map by combining global and local optimization schemes
Yong-Xian Fan, Jun Chen and Hong-Bin Shen
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00003]
A computational identification method for GPI-anchored proteins by artificial neural network
Yuri Mukai, Hirotaka Tanaka, Masao Yoshizawa, Osamu Oura, Takanori Sasaki and Masami Ikeda
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00004]
Recent advances in predicting G-protein coupled receptor classification
Xuan Xiao, Wei-Zhong Lin and Kuo-Chen Chou
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00005]
Role of long-range contacts and structural classification in understanding the free energy of unfolding of two-state proteins
B. Harihar and S. Selvaraj
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00006]
Discrimination of thermophilic and mesophilic proteins using reduced amino acid alphabets with n-grams
Aydin Albayrak and Ugur O. Sezerman
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00007]
Bioinformatics of protein-protein interfaces and small molecule effectors
P. Walter, O. Ulucan, J. Metzger and V. Helms
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00008]
Development of RNA Stiffness parameters and analysis on Protein-RNA Recognition: Comparison with DNA
M. Michael Gromiha
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00009]
Predicting Protein Metal Binding Sites with RBF Networks based on PSSM Profiles and Additional Properties
Y-Y. Ou
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00010]
Structure-Based Discovery Of Anti-Viral Compounds
Devadasan Velmurugan, Udhayasuriyan Malar Selvi, Udhayakumar Mythily, Kutumba Rao, Ramaiah Rajarajeshwari and Sangeetha
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00011]
Data Preprocessing and Filtering In Mass Spectrometry Based Proteomics
Beáta Reiz, Attila Kertész-Farkas, Sándor Pongor and Michael P. Myers
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00012]
Database Searching In Mass Spectrometry Based Proteomics
Attila Kertész-Farkas, Beáta Reiz, Michael P. Myers and Sándor Pongor
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00013]
A Novel Method of Sequence Similarity Evaluation in N-dimensional Sequence Space
Andrzej Kasperski and Renata Kasperska
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00022]
Advantages of a Pareto-based genetic algorithm to solve the gene synthetic design problem
Paulo Gaspar and José Luís Oliveira
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00023]
Semantic Web for Current Healthcare and Bioinformatics
Huajun Chen and Guotong Xie
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00024]
Towards an Ontology to support semantics enabled Diagnostic Decision Support Systems
Alejandro Rodríguez-González, Gandhi Hernández-Chan, Ricardo Colomo-Palacios, Juan Miguel Gomez-Berbis, Ángel García-Crespo, Giner Alor-Hernandez and Rafael Valencia-Garcia
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00025]
Towards a Metadata Model for Mass-Spectrometry Based Clinical Proteomics
John Springer, Fan Zhang, Peter Hussey, Charles Buck, Fred Regnier and Jake Chen
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00026]
Publishing Orthology and Diseases Information in the Linked Open Data cloud
Jose Antonio Miñarro-Giménez, Mikel Egaña-Aranguren, Boris Villazón-Terrazas and Jesualdo Tomás Fernández-Breis
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00027]
SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting
Christopher D. Pierce, David Booth, Chimezie Ogbuji, Chris Deaton, Eugene Blackstone and Doug Lenat
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00028]
DartWiki: A Semantic Wiki for Ontology-Based Knowledge Integration in the Biomedical Domain
Tong Yu, Huajun Chen, Jinhua Mi, Peiqin Gu and Ting Wu
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00029]
Editorial: Contemporary Trends in Bioinformatics Relevant for Some Important Biomedical Problems
[BSP/CBIO/E-Pub/00030]
Steered Molecular Dynamics - a Promising Tool for Drug Design
Mai Suan Li and Binh Khanh Mai
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00031]
Recent progress of molecular docking simulations applied to development of drugs
Linus Santana Azevedo, Fernanda Pretto Moraes, Mariana Morrone Xavier, Eduarda Ozório Pantoja, Bianca Villavicencio, Jana Aline Finck, Audrey Menegaz Proença, Kelen Beiestorf Rocha and Walter Filgueira de Azevedo Jr.
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00032]
Recent Developments and Prospects for Influenza M2 Ion Channel Inhibitors That Circumvent Amantadine Resistance
Petar M. Mitrasinovic
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00033]
Structure and Function of Enzymes of Shikimate Pathway
Aditya Dev, Satya Tapas, Shivendra Pratap and Pravindra Kumar
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00034]
Conserved Domains, Residues, WebLogo and Active Sites of Caspase-Cascades Related to Apoptotic Signaling Pathway
Chiranjib Chakraborty, Jinny Tomar and V.K. Gera
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00035]
Can Bioinformatic Methods Inform Us about the Molecular Evolution of Different Human Caspases?
Jinny Tomar, Vishnu Kumar Gera and Chiranjib Chakraborty
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00036]
Using Network-based Approaches to Predict Ligands of Orphan Nuclear Receptors
Zhernan Jiang, Ran Tao, Lei Du, Weiming Yu and Junxiang Wang
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00037]
Machine Learning Sequence Classification Techniques: Application To Cysteine Protease Cleavage Prediction
David A. duVerle and Hiroshi Mamitsuka
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00038]
Towards creating complete proteomic structural databases of whole organisms
B. Jayaram and Priyanka Dhingra
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00039]
From Ontology-Based Gene Function to Physiological Model
Ajay Shiv Sharma, Hari Om Gupta and Petar M. Mitrasinovic
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00040]
Experimental and computational challenges from array-based to sequence-based ChIP techniques
Xun Lan and Victor X. Jin
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00041]
Genome To Vaccinome: Role of Bioinformatics, Immunoinformatics & Comparative Genomics
Urmila Kulkarni-Kale, Vaishali Waman, Snehal Raskar, Swati Mehta and Smita Saxena
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00042]
Crimean-Congo hemorrhagic fever virus: Strategies to combat with an emerging threat to human
Pratap S, Narwal M, Dev A, Dhindwal S, Tomar S and Kumar P
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00043]
Comparative Genomics and Systems Biology of Malaria Parasites Plasmodium
Hong Cai, Zhan Zhou, Jianying Gu and Yufeng Wang
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00044]
Towards an Experimental and Systems Biology Framework for Cancer Cell Therapeutics
Petar M. Mitrasinovic
[Abstract] [FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00045]
Abstracts

Editorial: Bioinformatics on Proteins and Complexes
[BSP/CBIO/E-Pub/00001]
Proteins and their interactions play vital roles in living organisms. Elucidating the mechanism of protein folding, as well as the recognition of other molecules (proteins, nucleic acids, carbohydrates and ligands) by proteins, is an intriguing and challenging problem in computational and molecular biology. The problem of protein folding, stability and interactions has been viewed from several perspectives using experimental and computational approaches. Further, bioinformatics has been successfully applied to enhance our understanding of protein folding, stability and interactions.
The special issue on “Bioinformatics on proteins and complexes” aims to provide a recent update on the computational analysis of proteins based on their folding, stability and interactions. It addresses various issues such as sequence-structure similarity, structure prediction, folding rates and stability of proteins. Further, it covers protein-protein and protein-RNA interactions, structure-based drug design and proteomics analysis. The special issue is broadly divided into three parts: the first part, with six articles, focuses on aspects of protein folding and stability; the second part, with four papers, is devoted to protein interactions; and the last part deals with database searching and preprocessing in mass spectrometry based proteomics.
The opening article by Uthayakumar et al. [1] explored the ambiguity in sequence-structure relationships in proteins and examined the probability that sequentially identical peptide fragments adopt similar three-dimensional structures. Chen and Shen [2] proposed a novel protocol to decipher the 3D structure of a protein based on its 2D contact map by combining both global and local optimization schemes. Mukai et al. [3] described a new method for predicting GPI-anchored proteins based on hydropathy profiles and position-specific scores in combination with a back propagation artificial neural network. An overview of advancements in classifying G-protein coupled receptors has been given by Xiao et al. [4]. Harihar and Selvaraj [5] analyzed the role of long-range contacts and structural classification in understanding the unfolding free energy of two-state proteins. Albayrak and Sezerman [6] utilized reduced amino acid alphabets with n-grams for discriminating thermophilic and mesophilic proteins. The bioinformatics of protein-protein interfaces and small molecule effectors has been extensively reviewed by Walter et al. [7]. Gromiha [8] developed stiffness parameters for different trinucleotides in RNA and analyzed the role of stiffness in understanding protein-RNA binding specificity. Ou [9] proposed a method based on Position Specific Scoring Matrix profiles along with conservation score and solvent accessible surface area for identifying metal-binding residues in proteins. The structure-based drug discovery of anti-viral compounds for hepatitis B and C, human immunodeficiency and dengue viruses has been described in detail by Velmurugan et al. [10]. Reiz et al. [11] exhaustively reviewed the main principles underlying the preprocessing of mass spectrometry data and provided an overview of the publicly available tools. Kertész-Farkas et al. [12] discussed the major computational approaches to spectrum database searching and the statistical analysis of the results.
In essence, this special issue covers the exciting developments in the area of bioinformatics on proteins and their complexes as well as data preprocessing and database searching in mass spectrometry based proteomics. It will be a valuable resource for computational biologists, biochemists, biophysicists, bioinformaticians and researchers working in the field of proteins.
I would like to thank all the authors for their outstanding contributions and their cooperation in completing the task. The guest editor also thanks the Editor-in-Chief, Professor S.P. Gupta, for his invitation, encouragement, and support for the successful completion of the special issue.
References
[1] Uthayakumar M, Patra S, Sekar K. Sequence-structure similarity: Do sequentially identical peptide fragments have similar three-dimensional structures? Curr Bioinf 2012; 7.
[2] Chen J, Shen H-B. Recent progress of protein structure prediction based on residue contact map. Curr Bioinf 2012; 7.
[3] Mukai Y, Tanaka H, Yoshizawa M, Oura O, Sasaki T, Ikeda M. A computational identification method for GPI-anchored proteins by artificial neural network. Curr Bioinf 2012; 7.
[4] Xiao X, Lin W-Z, Chou K-C. Recent advances in predicting G-protein coupled receptor classification. Curr Bioinf 2012; 7.
[5] Harihar B, Selvaraj S. Role of long-range contacts and structural classification in understanding the free energy of unfolding of two-state proteins. Curr Bioinf 2012; 7.
[6] Albayrak A, Sezerman UO. Discrimination of thermophilic and mesophilic proteins using reduced amino acid alphabets with n-grams. Curr Bioinf 2012; 7.
[7] Walter P, Ulucan O, Metzger J, Helms V. Bioinformatics of protein-protein interfaces and small molecule effectors. Curr Bioinf 2012; 7.
[8] Gromiha MM. Development of RNA Stiffness parameters and analysis on Protein-RNA Recognition: Comparison with DNA. Curr Bioinf 2012; 7.
[9] Ou Y-Y. Predicting Protein Metal Binding Sites with RBF Networks based on PSSM Profiles and Additional Properties. Curr Bioinf 2012; 7.
[10] Velmurugan D, Selvi UM, Mythily U, Rao K, Rajarajeshwari R, Sangeetha. Structure based discovery of anti-viral compounds. Curr Bioinf 2012; 7.
[11] Reiz B, Kertész-Farkas A, Pongor S, Myers MP. Data preprocessing and filtering in mass spectrometry based proteomics. Curr Bioinf 2012; 7.
[12] Kertész-Farkas A, Reiz B, Myers MP, Pongor S. Database searching in mass spectrometry based proteomics. Curr Bioinf 2012; 7.
[Back to top]
Sequence-structure similarity: Do sequentially identical peptide fragments have similar three-dimensional structures?
M. Uthayakumar, Sanjeev Patra and K. Sekar
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00002]
The rapidly growing structure databases enhance the probability of finding identical sequences sharing structural similarity. Structure prediction methods are being used extensively to bridge the gap between known protein sequences and solved structures, which is essential for understanding their specific biochemical and cellular functions. In this work, we study the ambiguity in sequence-structure relationships and examine whether sequentially identical peptide fragments adopt similar three-dimensional structures. Fragments of varying lengths (five to ten residues) were used to observe the behavior of a sequence and its three-dimensional structures. The STAMP program was used to superpose the three-dimensional structures, and two parameters (the Sequence Structure Similarity Score (Sc) and the Root Mean Square Deviation value) were employed to classify them into three categories: similar, intermediate and dissimilar structures. Furthermore, the same approach was applied to all the three-dimensional protein structures solved for two organisms, Mycobacterium tuberculosis and Plasmodium falciparum, to validate our results.
[Back to top]
Glocal: Reconstructing protein 3D structure from 2D contact map by combining global and local optimization schemes
Yong-Xian Fan, Jun Chen and Hong-Bin Shen
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00003]
Prediction of protein 3D structure solely from its amino acid sequence is one of the most challenging problems in structural bioinformatics, where 3D structure reconstruction from observed constraints is the key step. In this paper, we propose a novel protocol called Glocal to recover a protein’s 3D coordinates from a given 2D contact map by combining global and local optimization schemes, achieved by the swarm intelligence of Particle Swarm Optimization (PSO) and by Simulated Annealing (SA), respectively. Our results demonstrate that Glocal can recover the 3D structures with an average RMSD of less than 2 Å from the native contact map. Further analysis also shows that, with the proposed combined optimization approach, Glocal is powerful in handling noisy contact maps.
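As a minimal sketch of the input that such a reconstruction starts from, the snippet below builds a binary 2D contact map from 3D coordinates. It is not the Glocal protocol itself. The function name `contact_map`, the 8 Å distance cutoff, and the toy coordinates are assumptions made for illustration; the abstract does not state which contact definition the authors use.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Return a symmetric 0/1 matrix: entry [i][j] is 1 when points i and j
    lie within `cutoff` angstroms of each other (cutoff value is an assumption)."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Hypothetical toy coordinates (not a real protein): four points on a line,
# spaced 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
for row in toy_map:
    print(row)
```

Reconstruction runs in the opposite direction: given only such a matrix, an optimizer searches for coordinates whose contact map matches it.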
[Back to top]
A computational identification method for GPI-anchored proteins by artificial neural network
Yuri Mukai, Hirotaka Tanaka, Masao Yoshizawa, Osamu Oura, Takanori Sasaki and Masami Ikeda
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00004]
The attachment of glycosylphosphatidylinositol (GPI) is one of the most important post-translational modifications of proteins and plays an important role in promoting biochemical activities in eukaryotic cells. GPI-anchored proteins (GPI-APs) are characterized by GPI-anchor attachment signals of hydrophobic residues and small residues near the GPI-anchoring site (ω-site). Here, we describe a new method for predicting GPI-APs based on hydropathy profiles and position-specific scores (PSSs) in combination with the back propagation artificial neural network (BP-ANN). First, the sequences of GPI-APs and negative controls were aligned according to residue size in the C-terminal region and the position-specific amino acid propensities were analyzed according to their alignment positions. Next, PSSs were created using the amino acid propensities of GPI-APs and the negative controls, and BP-ANN with a three-layered structure was trained by the PSSs. The accuracy of discriminating GPI-APs from the negative controls was evaluated in a 4-fold cross-validation test and GPI-APs were detected with 94.8% sensitivity and 92.9% specificity. This result shows that our method can predict GPI-APs with high accuracy and a combination of PSSs and BP-ANN can effectively discriminate GPI-APs.
[Back to top]
Recent advances in predicting G-protein coupled receptor classification
Xuan Xiao, Wei-Zhong Lin and Kuo-Chen Chou
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00005]
G protein-coupled receptors (GPCRs) are integral membrane proteins with seven trans-membrane helices. Belonging to the largest family of cell surface receptors, GPCRs are among the most frequent targets of therapeutic drugs. Unfortunately, since they are difficult to crystallize and most of them will not dissolve in normal solvents, so far the number of GPCRs with three-dimensional structure determined is very limited.
This situation has challenged us to develop automated methods by which one can predict the family and sub-family classes of GPCRs based on the information of their primary sequences alone, so as to facilitate classifying drugs, a technique called “evolutionary pharmacology” often used in the pharmaceutical industry for drug development. In the past eight years, various computational methods have been proposed. This review is devoted to summarizing their development. The future challenges in this area are also briefly addressed.
[Back to top]
Role of long-range contacts and structural classification in understanding the free energy of unfolding of two-state proteins
B. Harihar and S. Selvaraj
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00006]
Free energy of unfolding (ΔGu) is the difference between the free energy values of the folded and unfolded structures of a protein. A successful model describing both folding and unfolding rates of proteins should be able to provide considerable insight into the free energy of unfolding. In our earlier works, we have shown that Long-range Order (LRO) correlates well with both folding and unfolding rates of two-state proteins. In the present work, we examine the extent to which LRO can be used to predict the free energy of unfolding. For a standard data set of 29 two-state proteins, no significant correlation was observed between ΔGu and LRO. However, after grouping the proteins according to their structural class, all-alpha and all-beta proteins showed better correlations of r = 0.77 and r = 0.89, respectively, whereas mixed-class proteins still showed a poor correlation. We have also analyzed the relationship between various other structure-derived topological parameters and ΔGu values; all of these parameters also gave a poor correlation with ΔGu when structural classification was not taken into account. As with LRO, structural classification improved the correlation for all-alpha and all-beta proteins, but not a single topological parameter showed a reasonable correlation with the ΔGu values of mixed-class proteins, suggesting that understanding the ΔGu values of mixed-class proteins remains complicated. Our present work implies that theoretical models to understand the stability of proteins can be developed based on their 3-D structures, and further experimental/theoretical studies will shed light on these predictions.
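The effect described above, where a pooled correlation is weak but per-class correlations are strong, can be illustrated with a small sketch. The numbers below are invented toy values, not data from the paper, and the helper names `pearson_r` and `r_by_class` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def r_by_class(records):
    """Group (class, x, y) records by class and correlate x with y per class."""
    groups = {}
    for cls, x, y in records:
        xs, ys = groups.setdefault(cls, ([], []))
        xs.append(x)
        ys.append(y)
    return {cls: pearson_r(xs, ys) for cls, (xs, ys) in groups.items()}

# Invented toy records: (structural class, topological parameter, free energy).
records = [
    ("all-alpha", 1.0, 2.0), ("all-alpha", 2.0, 4.0), ("all-alpha", 3.0, 6.0),
    ("all-beta", 1.0, 6.0), ("all-beta", 2.0, 4.0), ("all-beta", 3.0, 2.0),
]
pooled = pearson_r([r[1] for r in records], [r[2] for r in records])
per_class = r_by_class(records)
print(pooled, per_class)
```

In this toy set the two classes follow opposite trends, so pooling them cancels the correlation entirely, while each class on its own is perfectly correlated.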
[Back to top]
Discrimination of thermophilic and mesophilic proteins using reduced amino acid alphabets with n-grams
Aydin Albayrak and Ugur O. Sezerman
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00007]
Protein thermostabilization has been the focus of recent research due to growing interest in the production of enzymes that can operate at temperatures that are industrially beneficial. Understanding the determinants of thermostabilization at the level of sequence and structure is important for designing such enzymes. A bioinformatics approach was used to determine the extent to which reduced amino acid alphabets (RAAA) with n-grams (subsequences of length n), subjected to a t-test-based feature selection procedure, can be used to discriminate proteins from thermophiles and mesophiles. The classification performance of 65 different protein alphabets with 3 different n-gram sizes was systematically evaluated using support vector machines on a test set that contained 707 proteins from mesophilic Xylella fastidiosa and thermophilic Aquifex aeolicus. A classification accuracy of 91.796% was achieved with the Hsdm16 RAAA with 13 features: EK-ILV-ST-A-G-F-H-Q-N-R-M-W-Y. The t-test-based feature selection procedure reduced the classification time without significantly affecting classification accuracy. The overall combination of methods in this paper is useful and computationally fast for classifying protein sequences from thermophiles and mesophiles using sequence information alone.
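A minimal sketch of how a sequence can be rewritten in a reduced alphabet and turned into n-gram counts is given below. It is an illustration only, not the authors' pipeline. The grouping uses just the 13 groups quoted in the abstract; residues that do not appear in that list (C, D and P) are kept as their own letters, which is an assumption, as are the function names and the example sequence.

```python
from collections import Counter

# Groups as quoted in the abstract; each group is represented by its first letter.
GROUPS = "EK-ILV-ST-A-G-F-H-Q-N-R-M-W-Y".split("-")
REDUCE = {aa: group[0] for group in GROUPS for aa in group}

def reduce_sequence(seq):
    """Map each residue to its group representative; unlisted residues
    are left unchanged (an assumption made for this sketch)."""
    return "".join(REDUCE.get(aa, aa) for aa in seq)

def ngram_counts(seq, n):
    """Count overlapping n-grams of the reduced sequence."""
    reduced = reduce_sequence(seq)
    return Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))

# Hypothetical example peptide.
reduced_example = reduce_sequence("MKVLIEST")
bigram_example = ngram_counts("MKVLIEST", 2)
print(reduced_example, dict(bigram_example))
```

Count vectors of this kind could then be passed to a feature selection step and a classifier such as a support vector machine.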
[Back to top]
Bioinformatics of protein-protein interfaces and small molecule effectors
P. Walter, O. Ulucan, J. Metzger and V. Helms
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00008]
The structural analysis of protein-protein interactions and the prediction of their functional properties are important areas in modern structural bioinformatics. First, we review concepts for classifying protein-protein interactions and for analyzing the geometry and composition of binding interfaces. Next, we discuss computational methods for predicting hot-spot residues and the kinetics and thermodynamics of binding. Then, we focus on the mode of action of small molecule effectors that may act either as competitive antagonists of protein binders or as allosteric modulators. Here, we emphasize the roles of pre-formed or transiently open ligand-binding pockets at protein-protein interfaces. The presentation is rounded off by an overview of databases on protein-protein and protein-small-molecule interactions.
[Back to top]
Development of RNA Stiffness parameters and analysis on Protein-RNA Recognition: Comparison with DNA
M. Michael Gromiha
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00009]
It has been well established that the elastic character of DNA plays an important role in protein-DNA binding specificity. In this work, we have analyzed the role of elasticity in understanding the binding specificity of protein-RNA complexes. We have developed a sequence-dependent stiffness scale for the trinucleotides in RNA and revealed the similarities and differences compared with DNA. We found that the stiffness of 15 trinucleotides has inverse effects and nine nucleotides are significantly different between RNA and DNA. The analysis of the relationship between RNA stiffness and RNA-binding specificity shows that the influence of elasticity is minimal in protein-RNA recognition, whereas it plays an important role in protein-DNA binding specificity. We observed a moderate correlation between stiffness and dissociation constant in the U1A RBD1 protein and the PP7 coat protein, whereas the correlation is poor for many other complexes. These results show that, along with RNA stiffness, other interactions, such as shape complementarity, electrostatic interactions, hydrogen bonds and direct contacts between RNA and protein atoms, are important for protein-RNA recognition.
[Back to top]
Predicting Protein Metal Binding Sites with RBF Networks based on PSSM Profiles and Additional Properties
Y-Y. Ou
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00010]
Background: Metal atoms are involved in many biological mechanisms, such as protein structure stability, apoptosis and aging. Therefore, identifying metal-binding sites in proteins is an important issue in helping biologists better understand the workings of these mechanisms.
Methods: We propose a method based on Position Specific Scoring Matrix (PSSM) profiles and additional information (conservation score and solvent accessible surface area (ASA)) to identify metal-binding residues in proteins.
Results: We have selected a non-redundant set of 262 metal-binding proteins and 617 disulfide proteins as the independent test set. The proposed method can predict metal-binding sites at 51.0% recall and 73.4% precision. Compared with the previous work of A. Passerini et al., the proposed method improves precision by over 7% at the same level of recall on the independent dataset.
Conclusions: We have developed a novel approach based on PSSM profiles and additional properties for identifying metal-binding sites in proteins. The proposed approach achieved a significant improvement on newly discovered metal-binding proteins and disulfide proteins.
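For readers less familiar with the two metrics quoted in the results, the sketch below shows how recall and precision are computed from per-residue prediction counts. The counts are invented for illustration and do not come from the paper; the function name is hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Invented counts: 30 binding residues found, 10 false alarms, 20 missed.
toy_precision, toy_recall = precision_recall(tp=30, fp=10, fn=20)
print(toy_precision, toy_recall)
```

Comparing methods "at the same level of recall" means fixing the fraction of true binding residues recovered and then asking which method raises fewer false alarms.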
[Back to top]
Structure-Based Discovery Of Anti-Viral Compounds
Devadasan Velmurugan, Udhayasuriyan Malar Selvi, Udhayakumar Mythily, Kutumba Rao, Ramaiah Rajarajeshwari and Sangeetha
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00011]
Viral diseases cause more severe damage to human lives than diseases caused by any other microbes. Hepatitis is the inflammation of the liver, and currently six strains of viral hepatitis have been identified. Infection by Hepatitis B Virus (HBV) and Hepatitis C Virus (HCV) causes serious mortality and morbidity and has become a global health problem. Human Immunodeficiency Virus (HIV) infection is increasing in the world, with an estimated 5.7 million cases of HIV infection in India. In addition to these viruses, Dengue virus, which belongs to the family Flaviviridae, is also emerging as a global threat to humans and is a major emerging pathogen for which the development of vaccines and anti-viral therapy has seen little success. The NS3 viral protease is a potential target for anti-viral drugs, since it is required for viral replication. As Dengue hemorrhagic diseases are life-threatening, attempts are being made worldwide to design inhibitors with the DENV-2 NS2B-NS3 protease and the DENV-4 NS3 protease-helicase as targets. In view of the above viral threats to human life, attempts are being made to obtain anti-viral compounds from natural resources and also by synthetic routes. Natural sources include compounds reported from Neem (Azadirachta indica), Bael (Aegle marmelos), Murraya koenigii, Heliopsis scabra, Taiwania cryptomerioides, edible fishes, and crab. Synthetic peptides and organic compounds have also been tried as inhibitors. Viral proteins are retrieved from the Protein Data Bank (PDB) and docked with these lead compounds, and the results are analyzed. All docking studies have been carried out using the Schrödinger suite of programs (USA, 2009).
[Back to top]
Data Preprocessing and Filtering In Mass Spectrometry Based Proteomics
Beáta Reiz, Attila Kertész-Farkas, Sándor Pongor and Michael P. Myers
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00012]
Mass spectrometry based proteomics analysis can produce many thousands of spectra in a single experiment, and much of this data, frequently more than 50%, cannot be properly evaluated computationally. Therefore, a number of strategies have been developed to aid the processing of mass spectra; these typically focus on the identification and elimination of noise, which can provide an immediate improvement in the analysis of large data streams. This is mostly carried out with proprietary software. Here we review the main principles currently underlying the preprocessing of mass spectrometry data and give an overview of the publicly available tools.
[Back to top]
Database Searching In Mass Spectrometry Based Proteomics
Attila Kertész-Farkas, Beáta Reiz, Michael P. Myers and Sándor Pongor
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00013]
Bottom-up proteomics (mass spectrometry analysis of peptides obtained by proteolysis and separated by liquid chromatography, LC-MS/MS) is one of the most frequently used techniques for identifying and characterizing proteins in biological samples. A key element of the analysis is database searching, in which the mass spectra of the peptides are compared with a database of theoretically computed (or experimental) peptide spectra. Here we discuss the main computational approaches to spectrum database searching and the statistical analysis of the results.
[Back to top]
A Novel Method of Sequence Similarity Evaluation in N-dimensional Sequence Space
Andrzej Kasperski and Renata Kasperska
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00022]
The aim of this work is to establish a universal method of searching for similarities between sequences in an n-dimensional sequence space. The presented idea extends the original Dot-Matrix and semihomology methods with the possibility of carrying out analyses in an n-dimensional sequence space and indicates a method of similarity evaluation. The main novelty of the implemented dotPicker program is that it allows searches for similarities in an n-dimensional sequence space. Sets of identity fragments, which represent given protein families, have been obtained using this program. An approach to evaluating the obtained identity fragments is proposed and its use is presented. Moreover, the potential of the dotPicker program is shown, especially for analyzing and identifying previously unknown similarities in protein families.
[Back to top]
Advantages of a Pareto-based genetic algorithm to solve the gene synthetic design problem
Paulo Gaspar and José Luís Oliveira
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00023]
Codon usage, codon context, rare codons, nucleotide repetition and mRNA destabilizing sequences are but a few of the many factors that influence the efficiency of protein synthesis. Therefore, gene redesign for heterologous expression is a multi-objective optimization problem, and the factors that need to be considered are often conflicting. Evolutionary approaches have already been shown to be able to evolve a sequence under the forces of specific constraints. However, it is unclear what advantages a slower algorithm such as a genetic algorithm (GA) has over other, faster algorithms in the gene redesign context.
Here, a solution using genetic algorithms along with a Pareto archive is used for the gene synthetic redesign problem. The different redesign parameters are merged using an adapted genetic algorithm strategy. From the created model, the best possible synonymous gene sequence is generated. This allows tackling the gene redesign problem by exploring the large search space of possible synonymous sequences. It is then shown that genetic algorithms have several advantages over other heuristics in the gene redesign problem, for instance the ability to return the best solutions constituting the main part of the Pareto front, even in non-convex or non-continuous spaces. This allows researchers to select, among the optimal solutions, the synonymous gene that best suits their purpose, instead of accepting a single solution that might represent an unwanted trade-off between the objectives.
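The notion of a Pareto front can be illustrated with a short sketch that filters a set of candidate solutions down to the non-dominated ones. This is not the authors' genetic algorithm; it only shows the selection criterion that a Pareto archive relies on. The candidate scores are invented, both objectives are assumed to be maximized, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized in this sketch)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Invented scores for five candidate sequences on two conflicting objectives,
# for example (codon usage score, mRNA stability score).
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
front = pareto_front(candidates)
print(front)
```

Each point that remains represents a different trade-off between the objectives, which is what lets a researcher choose among optimal solutions rather than accept a single one.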
Semantic Web for Current Healthcare and Bioinformatics
Huajun Chen and Guotong Xie
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00024]
The decade since the publication of the Semantic Web article in Scientific American in 2001 has witnessed a multitude of novel healthcare and bioinformatics applications that build upon the open integration capability of the Semantic Web. This theme issue illustrates how semantically enriched information has both enhanced our knowledge and expanded the impact on biomedical research in terms of scientific knowledge modeling and integration, linked data publication and interlinking, and decision support systems. Five papers have been selected and included, serving as typical examples of Semantic Web adoption in both healthcare and bioinformatics.
Towards an Ontology to support semantics enabled Diagnostic Decision Support Systems
Alejandro Rodríguez-González, Gandhi Hernández-Chan, Ricardo Colomo-Palacios, Juan Miguel Gomez-Berbis, Ángel García-Crespo, Giner Alor-Hernandez and Rafael Valencia-Garcia
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00025]
Healthcare has played a major role in the Semantic Web (SW) field, given the knowledge representation possibilities that SW is capable of addressing. Nowadays there are a large number of ontologies which can be used for several domains of healthcare (genetics, proteins, cellular components, anatomy, and specific diseases, among others). However, in some cases, the definition and population of these ontologies are not sufficient for them to be used in concrete domains. In this paper we provide the design of a set of ontologies for direct use in diagnostic decision support systems. We have designed a modular ontology architecture in which a main (root) ontology is created to define the main relations which can be found in the aforementioned domain. A set of subsumed ontologies has also been designed following some principles of OBO-Foundry and using SNOMED-CT terminology as the main interoperability component. These ontologies have also been designed to be as light as possible. The evaluation of the designed ontology is based on a set of quantitative aspects which aim to show the main principles that should be followed when designing ontologies for the domain of differential diagnosis.
Towards a Metadata Model for Mass-Spectrometry Based Clinical Proteomics
John Springer, Fan Zhang, Peter Hussey, Charles Buck, Fred Regnier and Jake Chen
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00026]
Recent proteomics studies of clinical samples have generated substantial interest. Aided by advances in analytical chemistry and bioinformatics, clinical proteomics has become a driving force behind molecular biomarker development. However, it is still difficult to manage and interpret large amounts of clinical proteomics data due to data integration challenges. The lack of practical metadata representation standards has prevented sharing and interpretation of mass spectrometry experimental results derived from different experimental conditions or different proteomics labs, and ultimately this absence has resulted in missed opportunities for proteomic biomarker discovery. Therefore, in this paper, we describe methods for deploying Semantic Web technologies to design an ontology using OWL for clinical proteomics information and to manage such information using various mechanisms, such as CPAS. We developed a practical proteomics experimental metadata model using Semantic Web technologies and demonstrated the manner in which this model can be integrated with current proteomics data analysis software systems. We demonstrated the manner in which systems employing the metadata model can begin to enable inter-laboratory sharing and analysis of clinical proteomics data. We also discussed the manner in which these tools and techniques have aided in proteomic biomarker discovery studies. Our work reflects an approach to adopt a Cancer Biomedical Informatics Grid (caBIG) compliant software system through the use of an ontology-based metadata model. This effort is the first step in a bigger initiative to move toward an ontology-based approach that enables a standards-driven approach to large-scale inter-laboratory proteomics data integration and analyses with the overarching goal of the discovery of proteomic biomarkers.
Publishing Orthology and Diseases Information in the Linked Open Data cloud
Jose Antonio Miñarro-Giménez, Mikel Egaña-Aranguren, Boris Villazón-Terrazas and Jesualdo Tomás Fernández-Breis
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00027]
The Linked Data initiative offers a straightforward method to publish structured data on the World Wide Web and link it to other data, resulting in a worldwide network of semantically codified data known as the Linked Open Data cloud. The size of the Linked Open Data cloud, i.e. the amount of data published using Linked Data principles, is growing exponentially, and this includes life sciences data. However, key information for biological research is still missing in the Linked Open Data cloud. For example, the relation between orthologous genes and genetic diseases is absent, even though such information can be used for hypothesis generation regarding human diseases. The OGOLOD system, an extension of the OGO Knowledge Base, publishes orthologs/diseases information using Linked Data. This gives scientists the ability to query the structured information in connection with other Linked Data and to discover new information related to orthologs and human diseases in the cloud.
SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting
Christopher D. Pierce, David Booth, Chimezie Ogbuji, Chris Deaton, Eugene Blackstone and Doug Lenat
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00028]
Semantic Web technologies offer the potential to revolutionize management of health care data by increasing its interoperability and reusability while reducing the need for redundant data collection and storage. From 1998 through 2010, Cleveland Clinic sponsored a project designed to explore and develop this potential. The product of this effort, SemanticDB, is a suite of software tools and knowledge resources built to facilitate the collection, storage and use of the diverse data needed to conduct clinical research and health care quality reporting. SemanticDB consists of three main components: 1) a content repository driven by a meta-model that facilitates collection and integration of data in an XML format and automatically converts the data to RDF; 2) an inference-mediated, natural language query interface designed to identify patients who meet complex inclusion and exclusion criteria; and 3) a data production pipeline that uses inference to generate customized views of the repository content for statistical analysis and reporting. Since 2008, this system has been used by the Cleveland Clinic's Heart and Vascular Institute to support numerous clinical investigations, and in 2009 Cleveland Clinic was certified to submit data produced in this manner to national quality monitoring databases sponsored by the Society of Thoracic Surgeons and the American College of Cardiology.
DartWiki: A Semantic Wiki for Ontology-Based Knowledge Integration in the Biomedical Domain
Tong Yu, Huajun Chen, Jinhua Mi, Peiqin Gu and Ting Wu
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00029]
Semantic Web languages and technologies can be used for the annotation, classification, and organization of knowledge assets and digital artifacts based on biomedical ontologies. In this paper, we present a semantic wiki, named DartWiki, for building an ontology-based digital encyclopedia for the biomedical domain. DartWiki provides a Web-based interface for accessing knowledge artifacts in both per-artifact and per-concept modes. In the per-artifact mode, users can access these artifacts and annotate them with both short texts and logical statements in terms of domain ontologies. In the per-concept mode, users can navigate a graph of concepts, and review and edit the synthesized page about a selected concept, which contains meaningful information about the concept as well as its related concepts and artifacts. Smooth transitions between the two modes are achieved through semantic links. As a use case of DartWiki, we provide an open platform for the management and maintenance of digital artifacts in Integrated Medicine. This system provides medical practitioners with relevant and trustworthy knowledge artifacts, and also with the means to input artifacts, to clarify their meaning, and to check and improve their quality, which encourages the inclusion and participation of users and effectively creates an online community around knowledge sharing.
Editorial: Contemporary Trends in Bioinformatics Relevant for Some Important Biomedical Problems
[BSP/CBIO/E-Pub/00030]
The focus of the thematic issue of Current Bioinformatics is to feature the latest advancements in the management and analysis of biological data aiming at contributing to solving some vital biomedical problems.
Several review articles deal with contemporary trends in the fields of drug design and discovery as well as structural bioinformatics.
Research progress in employing both molecular dynamics (Li and Mai) and molecular docking (Azevedo et al.) simulations in drug design and discovery is presented. Experimental and computational results converging on the resolution of a controversy over the proper binding mode of amantadine, a well-known inhibitor of the M2 ion channel protein of the influenza A virus, as well as critical implications for the development of novel inhibitors, are analyzed by Mitrasinovic. The use of a broad spectrum of bioinformatics tools to shed more light on the structure and function of seven enzymes playing a pivotal role in the shikimate pathway, extensively studied in many pathogenic organisms, is elucidated by Dev et al. Tomar et al. feature a crucial standpoint addressing the question of how bioinformatics can help us understand the molecular evolution of different human caspases, molecules of vital importance for the apoptotic context discussed by Chakraborty et al. Jiang et al. illustrate how molecular descriptor information may be geared toward predicting a variety of ligand-orphan nuclear receptor interactions that are of substantial importance for the discovery of novel drug targets. From a machine learning standpoint, interesting insights into proteolytic cleavage sites are provided by duVerle and Mamitsuka. The question of creating complete proteomic structural databases of whole organisms is addressed by Jayaram and Dhingra.
Current developments and prospects in experimental techniques, sequence analysis, comparative genomics and systems biology are also reviewed and their potential to address some important biomedical issues (reverse vaccinology, fight against infectious diseases, cancer cell therapeutics, etc.) is critically evaluated.
Sharma et al. consider the role of database management in approaching physiological models from functional genomics via gene ontology. Experimental and computational challenges arising in the move from array-based to sequence-based ‘omics techniques are dissected by Lan and Jin. A detailed account of the state-of-the-art resources and methods in the critical areas of reverse vaccinology is given by Kulkarni-Kale et al. Pratap et al. describe the sequence analysis-based strategies that are needed for the fight against the emerging threat of the Crimean-Congo hemorrhagic fever virus to humans. The way in which systems biology may be combined with comparative genomics to explore the malaria parasites Plasmodium is presented by Cai and Wang. An experimental and systems biology framework for cancer cell therapeutics is outlined by Mitrasinovic.
I am very grateful to my dear colleagues for contributing the impressive review articles on the latest relevant developments in the field. It is my sincere hope that the overall effort will inspire researchers at the interface between life and medical sciences to face new, interesting challenges of substantial importance with vigor.
Guest Editor:
Prof. Dr. Petar M. Mitrasinovic
Indian Institute of Technology Roorkee, India,
Wakayama University, Japan &
Belgrade Institute of Science and Technology, Serbia
Steered Molecular Dynamics - a Promising Tool for Drug Design
Mai Suan Li and Binh Khanh Mai
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00031]
About 15 years ago, steered molecular dynamics (SMD) was used to probe the binding of ligands to biomolecule surfaces, but in terms of drug design this approach has only recently attracted the attention of researchers. The main idea of using SMD to screen out leads is based on the hypothesis that the larger the force needed to unbind a ligand from a receptor, the higher its binding affinity. Thus, instead of the binding free energy, the rupture force, defined as the maximum on the force-time/displacement profile, is used as a score function. In this mini-review we discuss basic concepts behind the experimental technique of atomic force microscopy as well as SMD. Experimental and theoretical works on the application of SMD to the drug design problem are covered. Accumulated evidence shows that SMD is as accurate as the molecular mechanics-Poisson-Boltzmann surface area method in predicting ligand binding affinity, but the former is computationally much more efficient. The high level of correlation between theoretically determined rupture forces and experimental data on binding energies implies that SMD is a promising tool for drug design. Special attention is given to recent studies on inhibitors of influenza viruses.
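The scoring idea described above, taking the rupture force as the maximum of the force-time profile and ranking ligands by it, can be sketched as follows. This is a minimal illustration only: the ligand names, the profile values and the units are hypothetical, and a real SMD workflow would obtain the profiles from simulation output.

```python
from typing import Dict, List, Tuple

Profile = List[Tuple[float, float]]  # (time, pulling force) samples


def rupture_force(profile: Profile) -> Tuple[float, float]:
    """Return the (time, force) sample at which the pulling force is largest."""
    return max(profile, key=lambda sample: sample[1])


def rank_by_rupture_force(profiles: Dict[str, Profile]) -> List[str]:
    """Order ligand names from largest to smallest rupture force."""
    return sorted(profiles, key=lambda name: rupture_force(profiles[name])[1], reverse=True)


# Hypothetical force-time profiles (time in ps, force in pN) for two ligands.
profiles = {
    "ligand_A": [(0.0, 10.0), (100.0, 250.0), (200.0, 480.0), (300.0, 120.0)],
    "ligand_B": [(0.0, 5.0), (100.0, 310.0), (200.0, 220.0), (300.0, 60.0)],
}
ranking = rank_by_rupture_force(profiles)
```

Under the hypothesis stated in the abstract, the ligand ranked first would be predicted to bind most strongly.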
Recent progress of molecular docking simulations applied to development of drugs
Linus Santana Azevedo, Fernanda Pretto Moraes, Mariana Morrone Xavier, Eduarda Ozório Pantoja, Bianca Villavicencio, Jana Aline Finck, Audrey Menegaz Proença, Kelen Beiestorf Rocha and Walter Filgueira de Azevedo Jr.
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00032]
In order to obtain structural information about intermolecular interactions between a protein target and a drug, we can either solve the structure by experimental techniques (protein crystallography or nuclear magnetic resonance) or simulate the protein-drug complex computationally. Molecular docking is a computer simulation methodology that can predict the conformation of a protein-drug complex with relatively high accuracy when compared with experimental structures. Although a plethora of algorithms has been applied to the problem of molecular docking simulation, recent results show that the most successful approaches are those based on evolutionary algorithms. Evolution as a source of inspiration has been shown to have a great positive impact on the progress of new computational methodologies. In this scenario, the interactions between a protein target and a drug can be simulated by these evolutionary algorithms, which mimic evolution to create new paradigms for computation. This review provides a description of evolutionary algorithms and their applications to molecular docking simulation. Special attention is dedicated to the differential evolution algorithm and its implementation in the program Molegro Virtual Docker. Recent applications of these methodologies to protein targets such as acetylcholinesterase, cyclin-dependent kinase 2, purine nucleoside phosphorylase, and shikimate kinase are described.
Recent Developments and Prospects for Influenza M2 Ion Channel Inhibitors That Circumvent Amantadine Resistance
Petar M. Mitrasinovic
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00033]
Amantadine is a specific anti-influenza A drug that inhibits viral replication by binding to the M2 channel and preventing proton conductance. The increasing resistance to amantadine in strains of the influenza A virus that infect both animals and humans has been highlighted frequently. Resistance is usually caused by one of several single mutations in the M2 channel, but variants with double mutations have also been reported. Attempts to develop alternative inhibitors of the M2 channel that are effective against the resistant mutants have been unsuccessful, mainly because of the lack of information on the precise mode of inhibitor binding. This review summarizes the advances made in determining the mechanisms of action of amantadine and the development of novel inhibitors of the M2 channel during the past 2 years.
Structure and Function of Enzymes of Shikimate Pathway
Aditya Dev, Satya Tapas, Shivendra Pratap and Pravindra Kumar
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00034]
The shikimate pathway is found in microorganisms, fungi, plants and also in several apicomplexan parasites. This metabolic pathway consists of seven enzymes and converts the primary metabolites phosphoenolpyruvate and erythrose-4-phosphate to chorismate, the last common precursor for the three aromatic amino acids Phe, Tyr, and Trp and other aromatic compounds. The enzymes of this pathway are significant as selective targets for antimicrobial drug design because they are essential for microbes but absent in humans.
In the present scenario, the emergence of multi-drug resistance in pathogenic bacteria and herbicide resistance in weeds is of great clinical and agro-economical concern. Therefore, in this review, we performed a comparative sequence and three-dimensional structure analysis of these enzymes from various microorganisms and plants for structure-function analysis, motif search, identification of common structural signatures of the active site and elucidation of regulation mechanisms. Also, the available structures of five shikimate pathway enzymes from M. tuberculosis, a dreadful microorganism which causes 1.5 million deaths per year, have been comparatively analyzed with other reported homologous structures. To gain structural insight into the remaining two shikimate pathway enzymes (dehydroquinate synthase and shikimate-5-dehydrogenase) of M. tuberculosis, we performed molecular modeling to find key active site residues. These studies may prove helpful in designing novel structure-based antimicrobial drugs.
Conserved Domains, Residues, WebLogo and Active Sites of Caspase-Cascades Related to Apoptotic Signaling Pathway
Chiranjib Chakraborty, Jinny Tomar and V.K.Gera
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00035]
Caspases belong to the family of cysteinyl aspartate-specific proteases, which control the programmed cell death process, or apoptosis. In this paper, we have performed a structural bioinformatics analysis of the conserved domains and residues, WebLogo generation and active site identification related to apoptosis activator and apoptosis executioner caspase-cascades. Here, we have also shown conservation patterns of backbone structures of activator and executioner caspase-cascades. It has been noted that the number of highly conserved amino acid residues is very high in caspase-12 (36 aa) and low in caspase-7 (18 aa). We have observed that highly conserved amino acid residues such as LYS154, PRO155 and LYS156 are present in caspase-3 and caspase-6. In apoptosis and executioner caspases, these amino acids may play an active role. From the WebLogo, it has been observed that the stack height is very low between sequence positions 231 and 240; a stack height of 2.3 bits has been observed at the 1st sequence position and at the 236th position, where the WebLogo stack height is very low. We have identified 10 active sites in caspase-3, caspase-6 and caspase-7 which may be helpful in drug development using caspase-cascades. Here, we have also performed a literature survey on drug development using caspase-cascades.
Can Bioinformatic Methods Inform Us about the Molecular Evolution of Different Human Caspases?
Jinny Tomar, Vishnu Kumar Gera and Chiranjib Chakraborty
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00036]
Caspases are very important molecules that play a key role in apoptosis. Deregulation of apoptosis contributes to the pathogenesis of many human diseases. Therefore, the regulation of these proteins can be controlled for therapeutic purposes. Determining the molecular evolution of proteins in silico is a popular method in certain laboratories and biotechnology companies. It may help to detect mutations that often occur far away from the active site of the protein, and it can give insight relevant to drug development. Using sequence analysis and a phylogenetic approach, we have described the different human caspases and their origin in terms of ancestral relationships, using the tools of bioinformatics. Among the fourteen mammalian caspases defined, we were able to make use of twelve human caspases whose data are publicly available. It is evident from the data studied that human caspases 4 and 5 share the same origin in comparison to human caspase 1 and caspase 12, irrespective of the fact that both share a quite high level of similarity. Although the ancestral relationships of human caspases have been studied earlier, the variation that seems quite peculiar in this study is that the executioner caspase 3 shares a remarkably high level of similarity with caspase 7, whereas this does not apply to human caspase 6, the other member of the executioner group. Human caspases 3 and 7 were seen to have similar substrate specificity, but this was not evident in terms of origin. Our findings are expected to play a significant role in studies of programmed cell death and inflammatory responses and in scholarly studies in the near future.
Using Network-based Approaches to Predict Ligands of Orphan Nuclear Receptors
Zhernan Jiang, Ran Tao, Lei Du, Weiming Yu and Junxiang Wang
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00037]
Orphan nuclear receptors (oNRs) provide huge opportunities for the discovery of new drug targets. Identifying novel and perhaps unexpected types of ligands for oNRs may yield insight into potentially new principles of physiology. Recently, network-based approaches have been playing an increasingly important role in identifying and validating novel ligands for nuclear receptors. This review describes current progress in network-based approaches for ligand prediction and discusses their strengths and some of the underlying difficulties.
Machine Learning Sequence Classification Techniques: Application To Cysteine Protease Cleavage Prediction
David A. duVerle and Hiroshi Mamitsuka
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00038]
Sequence classification is one of the most fundamental machine learning tasks in computational biology nowadays. With the wide availability of large corpora of annotated sequences, the use of supervised learning techniques can greatly speed up the process of identifying new sequences sharing certain functions or properties. Many methods have been proposed over the years, and we hope to provide an introduction to some of the more prominent ones by focusing on protease cleavage prediction, a typical representative of this class of problem. The variety of proteolytic action modes among cysteine proteases covers a broad range of complexity levels and feature specificity, illustrating the strengths and limitations of the different machine learning techniques used on them.
This review briefly introduces the particulars of predicting cleavage by calpains and caspases. We then offer some general practical considerations on treating sequences for use with machine learning algorithms, before covering specific methods. The methods presented range from basic position-based statistical models to more technically advanced methods such as Markov models or kernel-based algorithms, as well as methods with more restricted goals such as decision trees. With each family of algorithms, examples of implementations are introduced and their performances compared, along with particular strengths and weaknesses.
With this review, we aim to provide useful guidance for choosing an existing method or developing a new one, based on the complexity and specific needs of a given sequence classification problem.
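To make the simplest family mentioned above concrete, the sketch below builds a basic position-based statistical model (a log-odds position weight matrix with pseudocounts) from aligned cleavage-site windows and scores new windows with it. It is an illustrative assumption rather than a method taken from the review: the training windows are toy examples built around the familiar DEVD caspase motif, the background is assumed uniform over the 20 amino acids, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency


def build_pwm(windows: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build one log-odds column per position from equal-length aligned windows."""
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for i in range(len(windows[0])):
        counts = Counter(window[i] for window in windows)
        pwm.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pwm


def score(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum the per-position log-odds of a candidate window."""
    return sum(column[aa] for column, aa in zip(pwm, window))


# Toy aligned windows around a cleavage site (illustrative only, not real training data).
training = ["DEVDG", "DEVDS", "DETDG", "DQVDG"]
pwm = build_pwm(training)
motif_like = score(pwm, "DEVDG")
unrelated = score(pwm, "AKLWP")
```

A window resembling the training motif receives a positive score and an unrelated window a negative one. Models of this kind treat positions independently, which is the limitation that Markov models and kernel-based methods are meant to address.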
Towards creating complete proteomic structural databases of whole organisms
B. Jayaram and Priyanka Dhingra
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00039]
If structures of the proteins of whole organisms were available, metabolomic models could be developed, drug targets could be identified, issues of affinity versus specificity could be sorted out, and side effects and toxicity could be brought under control, all with greater levels of reliability. Advances in whole genome sequencing projects and annotation algorithms, growing protein sequence information with over half a million entries in the UniProtKB/Swiss-Prot database, and progress in structure-based lead molecule design methodologies uphold this optimism. However, X-ray and NMR structures of less than 15% of the protein sequences are available in the RCSB Protein Data Bank. The widening gap between sequence and structure calls for immediate in silico solutions. The biennial community-wide structure prediction (CASP) experiments have considerably catalyzed structure prediction attempts worldwide, and the accuracies of computational models are continually increasing. While ab initio models have crossed the 100 amino acid limit, they are still some way from the average-sized human protein (~350 residues). Homology models, which rely on the RCSB structures and the axiom that similar sequences adopt similar structures, have been extremely powerful in providing high resolution structures limited only by sequence similarities. With dwindling similarities of query sequences with knowledge bases, newer ab initio/homology hybrid approaches are being explored to bring the structure prediction problem within the realm of feasibility in the near future, particularly for soluble proteins. The case of membrane-bound proteins is still refractory. This review takes stock of current protein tertiary structure prediction algorithms, highlighting the problem areas to overcome and the promise they hold.
From Ontology-Based Gene Function to Physiological Model
Ajay Shiv Sharma, Hari Om Gupta and Petar M. Mitrasinovic
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00040]
Designing an ontology to represent gene function is of vital importance for meeting the major challenge of integrating sequence data with the increasing amount of data from functional analyses of genes. Given that genes are expressed in temporally and spatially characteristic patterns, their products quite often reside in specific cellular compartments and may be part of one or more multi-component complexes. Genes may have more than one product, and the products are functionally distinct. An overall strategy elucidating how an ontology-based gene function may be implemented using genomic databases is herein dissected. Knowing that gene products possess one or more biochemical, physiological or structural functions, the present strategy is suggested to lead towards physiological models. A review of the features of the currently available software tools for the implementation of the considered strategy is presented.
Experimental and computational challenges from array-based to sequence-based ChIP techniques
Xun Lan and Victor X. Jin
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00041]
Transcriptional regulation is a key step in controlling the level of mRNA formed. The view of transcriptional regulation has recently evolved from a one-dimensional mode, in which RNA Polymerase II assembles with general transcription factors and cis-regulatory elements (CREs) interact with transcription factors, to a much more complex multi-dimensional mode involving combinatorial interactions between transcription factors and regulatory sequences, chromatin structure, histone modifications and DNA methylation. High throughput experimental technologies, such as array-based ChIP-chip and sequencing-based ChIP-seq, have been developed to survey in vivo transcription factor binding sites and histone modifications. Although many efforts have been made to analyze and interpret the data, challenges remain in many aspects of both experimental protocols and computational analyses. For example, how should the optimal number of PCR cycles be determined? How should multiple datasets from multiple experiments be normalized? How can the large number of unmapped and multiply mapped tags in a ChIP-seq experiment be utilized? This review focuses on issues that have emerged in high throughput data processing and discusses the advantages and disadvantages of various strategies.
Genome To Vaccinome: Role of Bioinformatics, Immunoinformatics & Comparative Genomics
Urmila Kulkarni-Kale, Vaishali Waman, Snehal Raskar, Swati Mehta and Smita Saxena
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00042]
Emerging and re-emerging viral infections are a threat to human health and a cause of global concern. Several viral vaccines have been successfully developed using conventional methods. However, there are many viruses for which vaccines need to be developed on a priority basis. Furthermore, challenges such as the varying efficacy of existing vaccines also need to be addressed, as viruses are known to evolve at a higher rate than other species.
In this scenario, the availability of whole genome sequences of pathogens has brought a paradigm shift and reversed the process of vaccine development, an approach termed reverse vaccinology. The advent of next generation sequencing technologies, coupled with pan-genomic approaches, offers unprecedented opportunities for data-driven, knowledge-based approaches to the rational design of viral vaccines. The reverse vaccinology approach begins with analysis of genomic data and culminates in the identification and prioritization of a few tractable vaccine candidates. The genomic sequence of a viral pathogen is processed through various sequence and structure-based analyses to identify an ensemble of epitopes, which not only reduces the time required for discovery but also helps in eliminating quite a few in vitro screening steps. The field leverages the tools and techniques of bioinformatics, immunoinformatics and comparative genomics, which are not only independent domains of research but also offer a distinct advantage in designing vaccines. A detailed account of the state-of-the-art resources and methods in each of these areas is presented to substantiate and highlight their roles in designing viral vaccines in the post-genomics era.
Crimean-Congo hemorrhagic fever virus: Strategies to combat with an emerging threat to human
Pratap S, Narwal M, Dev A, Dhindwal S, Tomar S and Kumar P
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00043]
The Bunyaviridae family consists of vector-borne lethal viruses and stands out as the largest virus family, with 350 members. One virus of this family, Crimean-Congo hemorrhagic fever virus (CCHFV), is transmitted through bites of ixodid ticks or by direct contact with blood from infected animals. Crimean-Congo hemorrhagic fever (CCHF) is a severe disease in humans, with a high mortality rate, which is endemic in large parts of the world. This virus could also be used as a bioterrorism agent due to its human-to-human transmission and the absence of specific therapy. The pathogenicity factor of CCHFV is unexplored due to the lack of animal models. CCHFV, being an RNA virus, is able to mutate rapidly, which hinders the development of effective therapy against it. Until now, ribavirin has been the only available drug for supportive treatment, but it has many side effects. New technologies such as RNA interference (RNAi) have emerged as a solution for epidemics of CCHF. RNAi is a sequence-specific approach that has been used successfully against different pathogens. This review focuses on the design and application of RNAi, with emphasis on the role of bioinformatics in the anti-CCHFV therapeutic development strategy.
Comparative Genomics and Systems Biology of Malaria Parasites Plasmodium
Hong Cai, Zhan Zhou, Jianying Gu and Yufeng Wang
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00044]
Malaria is a serious infectious disease that causes over one million deaths yearly. It is caused by a group of protozoan parasites in the genus Plasmodium. No effective vaccine is currently available, and the elevated levels of resistance to the drugs in use underscore the pressing need for novel antimalarial targets. In this review, we survey omics-centered developments in Plasmodium biology, which have set the stage for a quantum leap in our understanding of the fundamental processes of the parasite life cycle and of the mechanisms of drug resistance and immune evasion.
Towards an Experimental and Systems Biology Framework for Cancer Cell Therapeutics
Petar M. Mitrasinovic
[FULL-TEXT INQUIRY] [BSP/CBIO/E-Pub/00045]
Most molecular studies on the death of cells in tissues have been carried out on isolated cell populations, owing to the known difficulties posed by interactions with surrounding cells. A novel means of investigating the general principles governing cellular functions under oxidative stress conditions is therefore needed to shed more light on the background of cancer. It is believed that relevant signal transmission may be discovered by a transition from molecular to modular cell biology. Systems-level kinetic models are thus expected to explain dynamic behavior and to go far beyond static pictures of the topologies of the signaling pathways. This review features several representative problems, based on combined experimental and systems biology studies over the last few years, with particular emphasis both on elucidating how cells interpret the same signal stimulation in distinct fashions (cell death vs. cell survival) and on identifying signaling molecules with therapeutic relevance. The origin of oscillations in such molecular mechanisms under oxidative stress conditions, and the implications of these oscillatory non-linearities for the development of successful therapies, are discussed.