Current Proteomics
ISSN: 1570-1646

Volume 6 Number 4, December 2009
Contents
Editorial Pp. 203
Topological Charge-Transfer Indices: From Small Molecules
to Proteins Pp. 204-213
Francisco Torrens and Gloria Castellano
QSAR Models for Proteins of Parasitic
Organisms, Plants and Human Guests: Theory, Applications,
Legal Protection, Taxes, and Regulatory Issues Pp. 214-227
Humberto González-Díaz, Francisco
Prado-Prado, Lázaro G. Pérez Montoto, Aliuska
Duardo-Sánchez and Antonio López-Díaz
Computational Analysis of Amino Acid
Mutation: A Proteome Wide Perspective Pp. 228-234
Jiajia Chen and Bairong Shen
Proteins as Networks: A Mesoscopic Approach
Using Haemoglobin Molecule as Case Study Pp. 235-245
Alessandro Giuliani, Luisa Di Paola and
Roberto Setola
Study of Parasitic Infections, Cancer,
and other Diseases with Mass-Spectrometry and Quantitative
Proteome-Disease Relationships Pp. 246-261
Lázaro G. Pérez-Montoto, Francisco
Prado-Prado, Florencio M. Ubeira and Humberto González-Díaz
Pseudo Amino Acid Composition and its
Applications in Bioinformatics, Proteomics and System Biology
Pp. 262-274
Kuo-Chen Chou
Star Graphs of Protein Sequences and
Proteome Mass Spectra in Cancer Prediction Pp. 275-288
José M. Vázquez, Vanessa Aguiar,
Jose A. Seoane, Ana Freire, José A. Serantes, Julián
Dorado, Alejandro Pazos and Cristian R. Munteanu
Machine Learning Quantitative Structure-Activity
Relationships (QSAR) for Peptides Binding to the Human Amphiphysin–1
SH3 Domain Pp. 289-302
Ovidiu Ivanciuc
Abstracts
Editorial:
Nowadays, there is an explosion in the use of
Topological Indices (TIs) and Connectivity Indices (CIs), derived
from graph theory, to study Complex Networks across a broad spectrum
of topics related to Bioinformatics and Proteomics. These
topics cover many biomedical fields, from Virology, Parasitology,
and Microbiology in general to Toxicology and Cancer research,
to cite only some of the most investigated. The main reason
for the success of TIs/CIs is the high flexibility of this
theory, which solves many apparently unrelated problems in all
these disciplines in a fast but rigorous way. This has driven
the recent development of several interesting software packages and
theoretical methods to handle structure-function information
and data mining in this field. In a recent preliminary review
of the field (González-Díaz H et al.,
Proteomics (2008) 8, 750-778), we noted that these software packages
and methods may work at different structural levels, including:
- structure of protein ligand drugs,
- protein structure,
- protein-protein, protein-DNA and other types of interactions
involving proteins,
- protein-mediated cell-to-cell or organism-organism interactions,
- numerical description of 2D electrophoretic proteomics maps,
- prediction of protein fragmentation connected to mass spectra,
- numerical description of whole blood proteome Mass Spectra
and other topics.
In any case, it is very difficult to compress all this information
into a single manuscript. A special issue is therefore necessary, because
many users of these programs confine themselves to a narrow field
of application and are unaware of the many applications at different
proteomics levels. On the other hand, many researchers working
at the frontiers of these fields lack a journal issue
reviewing the current applications and future perspectives
of these software packages and methods, and the possible data-flow
relationships between them, within a common theoretical framework.
Such a collection of papers could be of major interest
to many specialists in proteomics and may increase the exchange
between specialists of different but related fields
with a common root: proteomics and graph theory. In addition,
it could be the seed for further improvement of software performance
and compatibility. Taking all these aspects into consideration,
Current Proteomics presents this special issue, composed of
a collection of papers that review the common theoretical
basis, applications, and interconnections between the inputs
and outputs of some of the most widely used Cheminformatics-Bioinformatics
and Data Mining software or methods (about one paper per
method) that enable calculation of TIs and their application
to Proteomics. We hope that the present issue may serve as
a bridge between theoretical scientists in graph theory and
experimentalists in proteomics, suggesting new areas
of mutual exchange and collaboration.
Humberto González-Díaz
Department of Microbiology and Parasitology
Faculty of Pharmacy
University of Santiago de Compostela
15782
Spain
Topological Charge-Transfer Indices: From Small Molecules
to Proteins
Francisco Torrens and Gloria Castellano
Valence-topological charge-transfer indices are
applied to the calculation of dipole moment–pH
at the isoelectric point. Dipole moments calculated by algebraic–vector
semisums of charge-transfer indices are defined. The
ability of the indices to describe the molecular charge
distribution is established by comparing them with the dipole
moments of the valence-isoelectronic series cyclopentadiene–benzene–styrene.
Two vector-semisum charge-transfer indices are proposed:
μ_vec and μ_vec^V. The μ_vec^V
is intermediate between μ_vec and μ_experiment.
The steric effect is almost constant along the series and the
dominating effect is electronic. The indices are applied to
the calculation of the dipole moments of a homologous series
of percutaneous enhancers and the isoelectric points of 21
amino acids. In most fits no superimposition of the corresponding
G_k–J_k/G_k^V–J_k^V
pairs is observed, which diminishes the risk of collinearity.
The inclusion of heteroatoms in the π-electron
systems is beneficial for the description of isoelectric point,
because of either the role of additional p orbitals provided
by heteroatom or the role of steric factors in the π-electron
conjugation. The use of (valence) charge-transfer indices
gives limited results for amino-acid isoelectric points. The
inclusion of the number of acidic/basic groups improves the
models, especially for amino acids with more than two functional
groups. The fitting line for 21 amino acids is used to estimate
the lysozyme isoelectric point by replacing (1 + Δn/n_T)
with (M + Δn)/n_T.
The lysozyme fragment results can estimate the isoelectric
point of the whole protein within 1–13% error.
QSAR Models for Proteins of Parasitic Organisms, Plants and
Human Guests: Theory, Applications, Legal Protection, Taxes,
and Regulatory Issues
Humberto González-Díaz, Francisco
Prado-Prado, Lázaro G. Pérez Montoto, Aliuska
Duardo-Sánchez and Antonio López-Díaz
Quantitative Structure-Property Relationship
(QSPR) models based on Graph or Network theory are important
to represent and predict interesting properties of low-molecular-weight
compounds. The graph parameters called Topological Indices
(TIs) are useful to link the molecular structure with physicochemical
and biological properties. However, there have been recent
efforts to extend these methods to the study of proteins and
whole proteomes as well. In this case, we are in the presence
of Quantitative Protein/Proteome-Property Relationship (QPPR)
models, by analogy to QSPR. In the present work we review,
discuss, and outline some perspectives on the use of these
QPPR techniques applied to single proteins of Parasitic Organisms,
Plants and Human Guests. We emphasize the different
types of graph and network representations of proteins, the
structural information codified by different protein TIs,
the statistical or machine learning techniques used, and the
biological properties predicted. This article also provides
a reference to the various legal avenues available
for the protection of software used in protein QSAR, as well
as the acceptance and legal treatment of scientific results
and techniques derived from such software. We also refer
to the recent implementation by Munteanu and González-Díaz
of the internet portal called BioAims, freely available to
the international research community. This portal
includes the web-server package TargetPred with two new Protein-QSAR
servers: ATCUNPred (http://miaja.tic.udc.es/Bio-AIMS/ATCUNPred.php)
for prediction of ATCUN-mediated DNA-cleavage anticancer proteins
and EnzClassPred (http://miaja.tic.udc.es/Bio-AIMS/EnzClassPred.php)
for prediction of enzyme classes.
Finally, we include an overview of relevant topics related to
legal protection, regulation, and international tax issues
involved in the practical use of this type of model and software
in proteomics.
Computational Analysis of Amino Acid Mutation: A Proteome
Wide Perspective
Jiajia Chen and Bairong Shen
Amino acid mutations may have diverse effects on protein
structure and function. Thus reliable information about the
protein sequence variations is essential to gain insights
into disease genotype–phenotype correlations. With the
recent availability of the complete genome sequence and the
accumulation of variation data, determining the effects of
amino acid substitution will be the next challenge in mutation
research. The molecular consequences of amino acid mutations
can readily be predicted by numerous bioinformatic methods,
which analyze the mutation effects from different points of
view. In this review, these approaches are categorized according
to their analysis principles. The applicability of these tools
for inference of mutation-structure-function relationship
is also recapitulated. Although human diseases are likely
to involve defects in multiple genes, most current
mutation analyses focus on single point mutations and lack
a broad proteome-wide perspective. We propose in this
review the application of the existing computational tools
in the analysis of correlated mutations at a system level.
Directions for future developments and implications are discussed,
which will help to understand the networks underlying human
disease.
Proteins as Networks: A Mesoscopic Approach Using Haemoglobin
Molecule as Case Study
Alessandro Giuliani, Luisa Di Paola and
Roberto Setola
Protein structures allow for a straightforward representation
in terms of graph theory, with the amino acid residues as the nodes
and the edges recording spatial contacts between
node pairs. Such a representation allows the vast repertoire of
graph invariants developed in the analysis of complex networks
to be used directly in protein science.
In this work we give a general overview of the proteins-as-networks
paradigm, with special emphasis on haemoglobin, where
the most important features of protein systems, such as allostery,
protein-protein contacts and the differential effect of mutations,
were demonstrated to be amenable to a graph-theory-oriented
translation.
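As an illustration of the residue-contact representation described in this abstract, the following minimal Python sketch builds a contact graph from a list of residue coordinates and computes the node degree, one of the simplest graph invariants. The abstract does not say how a contact is scored, so the use of one coordinate per residue (for example the alpha carbon), the 8 Å distance cutoff, and the function names `contact_graph` and `degrees` are all assumptions made for this example only.

```python
from math import dist

def contact_graph(coords, cutoff=8.0):
    """Build an undirected residue-contact graph.

    coords: list of (x, y, z) tuples, one per residue (assumed here to be
            alpha-carbon positions in angstroms).
    cutoff: two residues are linked when their distance is <= cutoff
            (8.0 is an illustrative choice, not a value from the abstract).
    Returns a dict mapping residue index -> set of neighbouring indices.
    """
    n = len(coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def degrees(adjacency):
    """Node degree: the number of contacts made by each residue."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}

if __name__ == "__main__":
    # Toy chain of four residues spaced 3.8 angstroms apart on a line.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    print(degrees(contact_graph(toy)))
```

Any other graph invariant (clustering, shortest paths, centralities) can be computed on the same adjacency structure; the cutoff strongly affects the resulting network and would need to be chosen for the protein at hand.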
Study of Parasitic Infections, Cancer, and other Diseases
with Mass-Spectrometry and Quantitative Proteome-Disease Relationships
Lázaro G. Pérez-Montoto, Francisco
Prado-Prado, Florencio M. Ubeira and Humberto González-Díaz
We can understand Mass-Spectra Quantitative Proteome-Disease
Relationships (MS-QPDRs) as models useful to detect Disease
Biomarkers or to predict Drug Toxicity effects based on Mass-Spectra
outcomes from samples of human body tissues, parasites, or
other organisms. The development and practical use of MS-QPDRs is
an emerging area combining Proteomics and Bioinformatics,
which involves computational, molecular, and legal sciences.
We detect at least two tendencies in QPDR development. The
first tendency (type 1) uses Statistical, Artificial Intelligence,
Machine Learning and/or Non-Linear Signal processing to fish
for single MS biomarker signals directly within MS data. A
recent alternative (type 2) uses Graph Theory to construct
Complex Network representations of MS data. Next, we can calculate
graph parameters called Mass-Spectra Topological Indices (MS-TIs)
useful to describe the graph. The last step is similar to
the first tendency but it uses MS-TIs as inputs (instead of
MS signals) to seek the MS-QPDR model. There are many examples
of QPDR models based on scheme 1. However, there has been
little effort to seek QPDR models with scheme 2. On the other
hand, MS-QPDR models can be obtained from different body fluids;
the Human Blood Proteome (BP) is one of the most interesting cases.
The outcomes obtained by Mass Spectrometry (MS) analysis of
the Serum Protein Profile (SPP) of the BP are very
useful for the early detection of diseases and drug-induced
toxicities. In the present work we review, discuss, and outline
some perspectives on the use of QPDR models based on the two
types of schemes. We also refer to the recent implementation
of the internet portal called BioAims for QPDR analysis (http://miaja.tic.udc.es/Bio-AIMS/),
freely available to the research community.
Pseudo Amino Acid Composition and its Applications in Bioinformatics,
Proteomics and System Biology
Kuo-Chen Chou
With the avalanche of protein sequences generated
in the post-genomic age, it is highly desired to develop automated
methods for efficiently identifying various attributes of
uncharacterized proteins. This is one of the most important
tasks facing us today in bioinformatics, and the information
thus obtained will have important impacts on the development
of proteomics and system biology. To realize that, one of
the keys is to find an effective model to represent the sample
of a protein. The most straightforward model in this regard
is its entire amino acid sequence; however, the entire sequence
model would fail to work when the query protein did not have
significant homology to proteins of known characteristics.
Thus, various non-sequential models or discrete models were
proposed. The simplest discrete model is the amino acid (AA)
composition. If it is used to represent a protein, however, all
the sequence-order information is completely lost. To
cope with such a dilemma, the concept of pseudo amino acid
(PseAA) composition was introduced. Its essence is to keep
using a discrete model to represent a protein yet without
completely losing its sequence-order information. Therefore,
in a broad sense, the PseAA composition of a protein is actually
a set of discrete numbers that is derived from its amino acid
sequence and that is different from the classical AA composition
and able to harbour some sort of sequence order or pattern
information. Ever since the first PseAA composition was formulated
to predict protein subcellular localization and membrane protein
types, it has stimulated many different modes of PseAA composition
for studying various kinds of problems in proteins and protein-related
systems. In this review, we shall give a brief and systematic
introduction of various modes of PseAA composition and their
applications. Meanwhile, the challenges for finding the optimal
PseAA composition are also briefly discussed.
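To make the idea of a discrete model that retains some sequence-order information concrete, here is a minimal Python sketch in the spirit of the commonly cited form of PseAA composition: the 20 amino acid frequencies are extended with λ sequence-order correlation factors, each being the mean squared difference of a normalised residue property between residues k positions apart. The abstract gives no formula, so everything specific here is an assumption: a single property is used (the Kyte-Doolittle hydropathy scale, swapped in for illustration), and the values `lam=3` and `weight=0.05` as well as the function names are illustrative choices.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as a stand-in residue property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def normalise(scale):
    """Rescale a property to zero mean and unit standard deviation
    over the 20 amino acids."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}

def pseaa_composition(seq, lam=3, weight=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional PseAA-style vector for `seq`.

    The first 20 components reflect the classical AA composition; the
    last `lam` components carry sequence-order information.
    """
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    h = normalise(scale)
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # k-th tier correlation factor: average squared property difference
    # between residues that are k positions apart along the chain.
    thetas = [
        mean((h[seq[i + k]] - h[seq[i]]) ** 2 for i in range(n - k))
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]

if __name__ == "__main__":
    vector = pseaa_composition("MKTAYIAKQRQISFVKSHFSRQ")
    print(len(vector), round(sum(vector), 6))
```

Two sequences with identical composition but different residue order give the same first 20 components up to normalisation, yet generally differ in the last λ components, which is the point of the construction.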
Star Graphs of Protein Sequences and Proteome Mass Spectra
in Cancer Prediction
José M. Vázquez, Vanessa Aguiar,
Jose A. Seoane, Ana Freire, José A. Serantes, Julián
Dorado, Alejandro Pazos and Cristian R. Munteanu
The impact of cancer on society has created the need
for new and faster theoretical models that may allow earlier
cancer detection. The present review describes the prediction
of cancer using the star graphs of protein sequences
and proteome mass spectra to build Quantitative Protein-Disease
Relationships (QPDRs), similar to Quantitative Structure-Activity
Relationship (QSAR) models. The nodes of these star
graphs represent the amino acids of each protein
or the amplitudes of the mass spectra signals, and the edges
are the geometric and/or functional relationships between
the nodes. The star graphs can be numerically described by
the invariant values named topological indices (TIs). The
transformation of the star graphs (graphical representation)
of proteins into TIs (numbers) facilitates the manipulation
of protein information and the search for structure-function
relationships in Proteomics. The advantages of this method
include simplicity, fast calculations and free resources such
as S2SNet and MARCH-INSIDE tools. Thus, this ideal theoretical
scheme can be easily extended to other types of diseases or
even other fields, such as Genomics or Systems Biology.
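The transformation of a sequence into a star graph and then into a number can be sketched in Python as follows. This is only one plausible reading of the construction, stated as an assumption: a dummy central node, one branch per amino acid type, and residues appended to their branch in order of appearance. The Wiener index (the sum of shortest-path distances over all node pairs) stands in for the topological indices mentioned in the abstract. The function names are hypothetical and the sketch is not claimed to reproduce the output of S2SNet or MARCH-INSIDE.

```python
from collections import deque

def star_graph(seq):
    """Build a star graph for a sequence.

    Node 0 is a dummy centre. Node i (1..N) is the i-th residue; it is
    attached to the previous residue of the same type, or to the centre
    if it is the first residue of that type.
    Returns a dict mapping node -> set of neighbouring nodes.
    """
    adjacency = {0: set()}
    last_on_branch = {}
    for i, residue in enumerate(seq, start=1):
        parent = last_on_branch.get(residue, 0)
        adjacency.setdefault(i, set()).add(parent)
        adjacency[parent].add(i)
        last_on_branch[residue] = i
    return adjacency

def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered node pairs,
    computed with one breadth-first search per node."""
    total = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        total += sum(distance.values())
    return total // 2  # every pair was counted from both ends

if __name__ == "__main__":
    graph = star_graph("MKTAYIAKQRQ")
    print(len(graph[0]), wiener_index(graph))
```

In this reading the degree of the centre equals the number of distinct amino acid types in the sequence, and longer branches (more repeats of one residue type) increase the Wiener index.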
Machine Learning Quantitative Structure-Activity Relationships
(QSAR) for Peptides Binding to the Human Amphiphysin–1
SH3 Domain
Ovidiu Ivanciuc
Developing machine learning methods to predict peptide-protein
binding affinity has become an important approach in proteomics.
A diversity of linear and nonlinear machine learning algorithms
is applied in quantitative structure–activity relationships
(QSAR) to generate predictive models for ligand binding to
a biological receptor. QSAR represent regression models that
define quantitative correlations between the chemical structure
of molecules and their physical, chemical, or biological properties.
A QSAR equation predicts a molecular property from a set of
molecular descriptors representing the input data to a machine
learning algorithm, such as linear regression, partial least
squares, artificial neural networks, or support vector machines.
Here we present a QSAR comparative study for peptides binding
to the human amphiphysin–1 SH3 domain, based on five
machine learning methods, namely partial least squares, radial
basis function artificial neural networks, support vector
machines, Gaussian processes, k-nearest neighbors,
and the decision trees REPTree and M5P, as implemented in
the machine learning software Weka. The peptide structure
was encoded with five amino acid scales, namely the Miyazawa-Jernigan
(MJ) substitution matrix, G. Schneider’s principal component
(GSPC) scale, Lv’s DPPS scale, Clementi’s GRID
scale, and Wold’s z scale. The machine learning
models were trained with a dataset of 200 peptides, and the
QSAR models were tested for a prediction dataset of 684 peptides.
The best predictions were obtained with the decision tree
M5P for all five amino acid scales, namely z scale
q² = 0.543, MJ scale q² = 0.553, GSPC scale q² = 0.557,
GRID scale q² = 0.558, and DPPS scale q² = 0.599.
These results show that M5P decision trees give predictive
QSAR for peptide-protein binding affinity, and should be considered
as valuable candidates for other peptide QSAR. Also, the new
DPPS scale has clear advantages compared to the previous amino
acid descriptors. The study provides support to QSAR approaches
based on a large-scale evaluation of machine learning algorithms
and diverse classes of structural descriptors.
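For readers unfamiliar with the q² statistic quoted above, the short Python sketch below shows one common way to compute it for a prediction set: one minus the ratio of the predictive residual sum of squares to the total sum of squares around a reference mean. The abstract does not state which reference mean or exact definition was used, so the optional `reference_mean` argument and the function name `q2` are assumptions made for illustration.

```python
def q2(observed, predicted, reference_mean=None):
    """Predictive squared correlation coefficient.

    q2 = 1 - PRESS / SS, where PRESS is the sum of squared prediction
    errors and SS is the sum of squared deviations of the observed
    values from a reference mean. If no reference mean is given, the
    mean of `observed` is used (an assumption; some definitions use
    the training-set mean instead).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    if reference_mean is None:
        reference_mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - reference_mean) ** 2 for o in observed)
    return 1.0 - press / ss

if __name__ == "__main__":
    # Hypothetical binding affinities and model predictions.
    y_obs = [1.2, 0.8, 1.9, 2.4, 1.5]
    y_hat = [1.0, 1.1, 1.7, 2.0, 1.6]
    print(round(q2(y_obs, y_hat), 3))
```

A perfect model gives q² = 1, while a model that always predicts the reference mean gives q² = 0, so the values between 0.543 and 0.599 reported above indicate moderate predictive ability.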