Current Genomics
ISSN: 1389-2029

Volume 10, Number 6, September 2009
Contents
Genomic Signal Processing: Part 1
Guest Editors: E.R. Dougherty, X. Cai, Y. Huang, S. Kim and R. Yamaguchi
Editorial Pp. 364
Performance of Feature Selection Methods
Pp. 365-374
E.R. Dougherty, J. Hua and C. Sima
Boolean Models of Genomic Regulatory
Networks: Reduction Mappings, Inference, and External Control
Pp. 375-387
I. Ivanov
Review of Peak Detection Algorithms in
Liquid-Chromatography-Mass Spectrometry Pp. 388-401
J. Zhang, E. Gonzalez, T. Hestilow, W. Haskins
and Y. Huang
Hidden Markov Models and their Applications
in Biological Sequence Analysis Pp. 402-415
B.-J. Yoon
Inference of Gene Regulatory Networks Using Time-Series Data:
A Survey Pp. 416-429
C. Sima, J. Hua and S. Jung
Clustering Algorithms: On Learning, Validation, Performance,
and Applications to Genomics Pp. 430-445
L. Dalton, V. Ballarin and M. Brun
Abstracts
Editorial: Genomic Signal Processing
Genomic Signal Processing (GSP) has been defined
as the analysis, processing, and use of genomic signals for
gaining biological knowledge and the translation of that knowledge
into systems-based applications, where by genomic signals
we mean the measurable events, principally the production
of mRNA and protein carried out within the cell. Owing to
the defining role of DNA in the production of mRNA, the structural
characterization of DNA is inevitably a part of GSP and, interestingly,
signal processing methods are utilized in understanding DNA
structure.
A key goal of translational genomics is to discover families
of genes or gene products that can be used to classify disease,
thereby leading to molecular-based diagnosis and prognosis.
A deeper goal is to characterize genomic and proteomic regulation,
thereby leading to a functional understanding of disease and
the development of systems-based medical solutions.
GSP is growing in importance as an ever larger community is
recognizing that accomplishing these goals requires various
disciplines within or related to signal processing, including
pattern recognition, prediction/estimation theory, information
theory, dynamical systems, control theory, network modeling,
and communication theory. In sum, systems biology and systems
medicine demand deep understanding of systems theory. This
inevitably entails the theory and methods of signal processing,
which have been so successful in areas such as communications,
and the related theory pertaining to the characterization
and control of dynamical systems, without which one cannot
even imagine our contemporary technological society.
The purpose of this special issue is to bring some of the
key developments in GSP to the wider genomics community. Owing
to its grounding in systems theory and stochastic processes,
GSP often requires mathematics beyond the level of that studied
in undergraduate electrical engineering, or even undergraduate
mathematics and statistics, and, therefore, as originally
published in the scientific literature, is not accessible
to many researchers in biology and medicine. Because systems
biology and systems medicine will, ipso facto, have
to rely on mathematical systems theory, this dichotomy is
a problem that will have to be addressed in the future, from
both educational and research perspectives; nonetheless, in
a review format it is possible to communicate many of the
basic ideas without recourse to the kind of full rigorous
mathematical analyses required in original research.
In this issue, the guest editors have tried to accomplish
this aim in two ways: first, by specifying the kinds of issues
we would like contributors to address, with the requirement
that the mathematical details be kept to a minimum (with references
to the relevant literature); and, second, by working with
the contributors to achieve the goals of the issue through
the review process.
Although the final scope has been determined by the submissions
and acceptances, we believe that a good breadth of subject
matter is included in the issue, including the inference,
analysis, and control of gene regulatory networks, mass spectrometry
for proteomics, sequence analysis, metagenomics, microRNA
target prediction, clustering, and feature selection and error
estimation for classification, where in the cases of classification
and clustering the papers pay close attention to performance
in the context of small samples, a ubiquitous situation in
which many methods in the literature perform poorly.
Edward R. Dougherty
Department of Electrical and
Computer Engineering
Texas A&M
University
3128 TAMU
College Station, TX 77843-3128
USA
Tel: 979-862-8896
Fax: 979-845-6259
E-mail: e-dougherty@tamu.edu
Yufei Huang
Department of Electrical and
Computer Engineering
The University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249-0669
USA
Tel: 210-4586270
Fax: 210-4585947
E-mail: yufei.huang@utsa.edu
Seungchan Kim
Department of Computer Science and
Engineering, The School of Computing,
Informatics and Decision Systems
Engineering, Ira A. Fulton School of
Engineering, Arizona State University
Tempe, AZ 85281
USA
Tel: 480-727-8833
Fax: 480-965-2751
E-mail: dolchan@tgen.org
Xiaodong Cai
Department of Electrical and
Computer Engineering
University of Miami
1251 Memorial Drive, Coral Gables
FL 33146
Tel: 305-284-5329
Fax: 305-284-4044
E-mail: x.cai@miami.edu
Rui Yamaguchi
Laboratory of Sequence Analysis
Human Genome Center
Institute of Medical Science
University of Tokyo
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639
Japan
Tel: +81-3-5449-5792
Fax: +81-3-5449-5790
E-mail: ruiy@ims.u-tokyo.ac.jp
Performance of Feature Selection Methods
E.R. Dougherty, J. Hua and C. Sima
High-throughput biological technologies offer the promise
of finding feature sets to serve as biomarkers for medical
applications; however, the sheer number of potential features
(genes, proteins, etc.) means that there needs to be massive
feature selection, far greater than that envisioned in the
classical literature. This paper considers performance analysis
for feature-selection algorithms from two fundamental perspectives:
how does the classification accuracy achieved with a selected
feature set compare to the accuracy when the best feature
set is used, and what is the optimal number of features that
should be used? The criteria manifest themselves in several
issues that need to be considered when examining the efficacy
of a feature-selection algorithm: (1) the correlation between
the classifier errors for the selected feature set and the
theoretically best feature set; (2) the regressions of the
aforementioned errors upon one another; (3) the peaking phenomenon,
that is, the effect of sample size on feature selection; and
(4) the analysis of feature selection in the framework of
high-dimensional models corresponding to high-throughput data.
Boolean Models of Genomic Regulatory Networks:
Reduction Mappings, Inference, and External Control
I. Ivanov
Computational modeling of genomic regulation has
been an important focus of systems biology and genomic signal
processing for the past several years. It holds the promise
of uncovering both the structure and the dynamical properties of
the complex gene, protein, or metabolic networks responsible
for cell functioning in various contexts and regimes.
This, in turn, will lead to the development of optimal intervention
strategies for prevention and control of disease. At the same
time, constructing such computational models faces several
challenges. High complexity is one of the major impediments
to the practical application of the models. Thus, reducing
the size/complexity of a model becomes a critical issue in
problems such as model selection, construction of tractable
subnetwork models, and control of the model's dynamical behavior.
We focus on the reduction problem in the context of two specific
models of genomic regulation: Boolean networks with perturbation
(BNP) and probabilistic Boolean networks (PBN). We also compare
and draw a parallel between the reduction problem and two
other important problems of computational modeling of genomic
networks: the problem of network inference and the problem
of designing external control policies for intervention/altering
the dynamics of the model.
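As an illustration of the first model class named above, the following is a minimal sketch of a Boolean network with perturbation, assuming the common formulation in which each gene flips independently with a small probability p at every step and the regulatory functions are applied only when no gene is perturbed. The three-gene rule set, the function names, and the parameter values are hypothetical and are not taken from the article.

```python
import random

def bnp_step(state, functions, p, rng):
    """One transition of a Boolean network with perturbation (BNP)."""
    # Draw an independent perturbation flag for every gene.
    flips = [rng.random() < p for _ in state]
    if any(flips):
        # Perturbation: flip the flagged genes and skip the regulatory rules.
        return tuple(x ^ int(f) for x, f in zip(state, flips))
    # No perturbation: synchronous update by the Boolean functions.
    return tuple(int(f(state)) for f in functions)

# Hypothetical 3-gene network; each rule maps the full state to one gene value.
functions = [
    lambda s: s[1] and s[2],   # gene 0 is on when genes 1 and 2 are on
    lambda s: s[0] or s[2],    # gene 1 is on when gene 0 or gene 2 is on
    lambda s: not s[0],        # gene 2 is repressed by gene 0
]

def simulate(initial, functions, p, steps, seed=0):
    """Return the list of visited states, starting with the initial state."""
    rng = random.Random(seed)
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(bnp_step(trajectory[-1], functions, p, rng))
    return trajectory
```

With p = 0 the sketch reduces to an ordinary deterministic Boolean network, which makes it easy to check the rules by hand before perturbation is switched on.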
Review of Peak Detection Algorithms in Liquid-Chromatography-Mass
Spectrometry
J. Zhang, E. Gonzalez, T. Hestilow, W. Haskins and
Y. Huang
In this review, we will discuss peak detection in Liquid-Chromatography-Mass
Spectrometry (LC/MS) from a signal processing perspective.
A brief introduction to LC/MS is followed by a description
of the major processing steps in LC/MS. Specifically, the
problem of peak detection is formulated and various peak detection
algorithms are described and compared.
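To make the peak detection problem concrete, the sketch below shows one of the simplest possible approaches: smooth the intensity trace with a moving average, then report local maxima that exceed a fixed threshold. This is an illustrative baseline under stated assumptions, not one of the algorithms compared in the review; the function names, the window size, and the threshold are hypothetical.

```python
def moving_average(signal, window=3):
    """Smooth a 1-D intensity trace with a centered moving average."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        # The window is truncated at the ends of the trace.
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed

def detect_peaks(signal, threshold, window=3):
    """Return indices of local maxima of the smoothed trace above threshold."""
    s = moving_average(signal, window)
    return [
        i for i in range(1, len(s) - 1)
        if s[i] > threshold and s[i] > s[i - 1] and s[i] >= s[i + 1]
    ]

# Hypothetical intensity trace with two peaks.
trace = [0, 0, 1, 5, 1, 0, 0, 2, 8, 2, 0]
peaks = detect_peaks(trace, threshold=2.2)
```

A fixed threshold is the weakest part of this sketch; in practice the threshold would typically be tied to an estimate of the noise level.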
Hidden Markov Models and their Applications in Biological
Sequence Analysis
B.-J. Yoon
Hidden Markov models (HMMs) have been extensively used
in biological sequence analysis. In this paper, we give a
tutorial review of HMMs and their applications in a variety
of problems in molecular biology. We especially focus on three
types of HMMs: the profile-HMMs, pair-HMMs, and context-sensitive
HMMs. We show how these HMMs can be used to solve various
sequence analysis problems, such as pairwise and multiple
sequence alignments, gene annotation, classification, similarity
search, and many others.
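As a small illustration of how an HMM is used to annotate a sequence, the sketch below implements the standard Viterbi algorithm for a basic HMM in log space and applies it to a toy two-state model. The state names, the probabilities, and the example sequence are hypothetical; the profile, pair, and context-sensitive HMMs covered in the article are more elaborate than this.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # Log-probabilities of the best path ending in each state at position 0.
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states,
                            key=lambda r: scores[r] + math.log(trans_p[r][s]))
            new_scores[s] = (scores[best_prev]
                             + math.log(trans_p[best_prev][s])
                             + math.log(emit_p[s][obs]))
            pointers[s] = best_prev
        scores = new_scores
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical two-state model: GC-rich versus AT-rich regions.
states = ("GC_rich", "AT_rich")
start_p = {"GC_rich": 0.5, "AT_rich": 0.5}
trans_p = {"GC_rich": {"GC_rich": 0.9, "AT_rich": 0.1},
           "AT_rich": {"GC_rich": 0.1, "AT_rich": 0.9}}
emit_p = {"GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
          "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

path = viterbi("GCGCATAT", states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on long sequences, which is why the products of probabilities appear here as sums.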
Inference of Gene Regulatory Networks Using Time-Series Data:
A Survey
C. Sima, J. Hua and S. Jung
The advent of high-throughput technologies such as microarrays
has provided a platform for studying how different cellular
components work together, creating enormous interest
in mathematically modeling biological networks, particularly
gene regulatory networks (GRNs). Of particular interest is
modeling and inference from time-series data, which capture
a more thorough picture of the system than non-temporal data
do. We give an extensive review of methodologies that
have been applied to time-series data. Recognizing that validation
is an inseparable part of the inference paradigm, we also
present a discussion of the principles and challenges in
evaluating the performance of different methods. This survey gives
a panoramic view of these topics, in the hope that
readers will be inspired to improve and expand the repository
of GRN inference and validation tools.
Clustering Algorithms: On Learning, Validation, Performance,
and Applications to Genomics
L. Dalton, V. Ballarin and M. Brun
The development of microarray technology has enabled
scientists to measure the expression of thousands
of genes simultaneously, resulting in a surge of interest
in several disciplines throughout biology and medicine. While
data clustering has been used for decades in image processing
and pattern recognition, in recent years it has joined this
wave of activity as a popular technique to analyze microarrays.
To illustrate its application to genomics, clustering applied
to genes from a set of microarray data groups together those
genes whose expression levels exhibit similar behavior throughout
the samples, and when applied to samples it offers the potential
to discriminate pathologies based on their differential patterns
of gene expression. Although clustering has now been used
for many years in the context of gene expression microarrays,
it has remained highly problematic. The choice of a clustering
algorithm and validation index is not trivial,
all the more so when applying them to high-throughput biological or
medical data. Factors to consider when choosing an algorithm
include the nature of the application, the characteristics
of the objects to be analyzed, the expected number and shape
of the clusters, and the complexity of the problem versus
the computational power available. In some cases a very simple
algorithm may be appropriate to tackle a problem, but many
situations may require a more complex and powerful algorithm
better suited for the job at hand. In this paper, we will
cover the theoretical aspects of clustering, including error
and learning, followed by an overview of popular clustering
algorithms and classical validation indices. We also discuss
the relative performance of these algorithms and indices and
conclude with examples of the application of clustering to
computational biology.
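To illustrate what clustering gene expression profiles looks like in practice, the following is a minimal sketch of k-means, one widely used clustering algorithm, applied to a handful of made-up expression profiles. The abstract does not say which algorithms the article covers, so the choice of k-means is an assumption, as are the data, the function names, and the simplistic initialization from the first k profiles.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Simplistic initialization (assumption): use the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Hypothetical expression profiles: four genes measured in three samples.
profiles = [(0.0, 0.1, 0.0), (5.0, 5.1, 4.9), (0.1, 0.0, 0.1), (5.1, 4.9, 5.0)]
labels, centroids = kmeans(profiles, k=2)
```

The result depends on the initialization and on the choice of k, which is one reason a validation index is needed alongside the clustering algorithm itself.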