Current Genomics

ISSN: 1389-2029

Volume 10, Number 6, September 2009


Contents


Genomic Signal Processing: Part 1
Guest Editors: E.R. Dougherty, X. Cai, Y. Huang, S. Kim and R. Yamaguchi


Editorial Pp. 364


Performance of Feature Selection Methods Pp. 365-374
E.R. Dougherty, J. Hua and C. Sima


Boolean Models of Genomic Regulatory Networks: Reduction Mappings, Inference, and External Control Pp. 375-387
I. Ivanov


Review of Peak Detection Algorithms in Liquid-Chromatography-Mass Spectrometry Pp. 388-401
J. Zhang, E. Gonzalez, T. Hestilow, W. Haskins and Y. Huang


Hidden Markov Models and their Applications in Biological Sequence Analysis Pp. 402-415
B.-J. Yoon


Inference of Gene Regulatory Networks Using Time-Series Data: A Survey Pp. 416-429
C. Sima, J. Hua and S. Jung


Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics Pp. 430-445
L. Dalton, V. Ballarin and M. Brun




Abstracts


Editorial: Genomic Signal Processing

Genomic Signal Processing (GSP) has been defined as the analysis, processing, and use of genomic signals for gaining biological knowledge and the translation of that knowledge into systems-based applications, where by genomic signals we mean the measurable events, principally the production of mRNA and protein carried out within the cell. Owing to the defining role of DNA in the production of mRNA, the structural characterization of DNA is inevitably a part of GSP and, interestingly, signal processing methods are utilized in understanding DNA structure.

A key goal of translational genomics is to discover families of genes or gene products that can be used to classify disease, thereby leading to molecular-based diagnosis and prognosis. A deeper goal is to characterize genomic and proteomic regulation, thereby leading to a functional understanding of disease and the development of systems-based medical solutions.

GSP is growing in importance as an ever larger community is recognizing that accomplishing these goals requires various disciplines within or related to signal processing, including pattern recognition, prediction/estimation theory, information theory, dynamical systems, control theory, network modeling, and communication theory. In sum, systems biology and systems medicine demand deep understanding of systems theory. This inevitably entails the theory and methods of signal processing, which have been so successful in areas such as communications, and the related theory pertaining to the characterization and control of dynamical systems, without which one cannot even imagine our contemporary technological society.

The purpose of this special issue is to bring some of the key developments in GSP to the wider genomics community. Owing to its grounding in systems theory and stochastic processes, GSP often requires mathematics beyond the level of that studied in undergraduate electrical engineering, or even undergraduate mathematics and statistics, and, therefore, as originally published in the scientific literature, is not accessible to many researchers in biology and medicine. Because systems biology and systems medicine will, ipso facto, have to rely on mathematical systems theory, this dichotomy is a problem that will have to be addressed in the future, from both educational and research perspectives; nonetheless, in a review format it is possible to communicate many of the basic ideas without recourse to the kind of full rigorous mathematical analyses required in original research.

In this issue, the guest editors have tried to accomplish this aim in two ways: first, by specifying the kinds of issues we would like contributors to address, with the requirement that the mathematical details be kept to a minimum (with references to the relevant literature); and, second, by working with the contributors to achieve the goals of the issue through the review process.

Although the final scope has been determined by the submissions and acceptances, we believe that the issue covers a good breadth of subject matter, including the inference, analysis, and control of gene regulatory networks; mass spectrometry for proteomics; sequence analysis; metagenomics; microRNA target prediction; clustering; and feature selection and error estimation for classification. In the cases of classification and clustering, the papers pay close attention to performance in the context of small samples, a ubiquitous situation in which many methods in the literature perform poorly.

Edward R. Dougherty
Department of Electrical and
Computer Engineering
Texas A&M University
3128 TAMU
College Station, TX 77843-3128
USA
Tel: 979-862-8896
Fax: 979-845-6259
E-mail: e-dougherty@tamu.edu

Yufei Huang
Department of Electrical and
Computer Engineering
The University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249-0669
USA
Tel: 210-458-6270
Fax: 210-458-5947
E-mail: yufei.huang@utsa.edu

Seungchan Kim
Department of Computer Science and
Engineering, The School of Computing,
Informatics and Decision Systems
Engineering, Ira A. Fulton School of
Engineering, Arizona State University
Tempe, AZ 85281
USA
Tel: 480-727-8833
Fax: 480-965-2751
E-mail: dolchan@tgen.org

Xiaodong Cai
Department of Electrical and
Computer Engineering
University of Miami
1251 Memorial Drive, Coral Gables
FL 33146
Tel: 305-284-5329
Fax: 305-284-4044
E-mail: x.cai@miami.edu

Rui Yamaguchi
Laboratory of Sequence Analysis
Human Genome Center
Institute of Medical Science
University of Tokyo
4-6-1, Shirokanedai, Minato-ku, Tokyo 108-8639
Japan
Tel: +81-3-5449-5792
Fax: +81-3-5449-5790
E-mail: ruiy@ims.u-tokyo.ac.jp


Performance of Feature Selection Methods

E.R. Dougherty, J. Hua and C. Sima

High-throughput biological technologies offer the promise of finding feature sets to serve as biomarkers for medical applications; however, the sheer number of potential features (genes, proteins, etc.) means that there needs to be massive feature selection, far greater than that envisioned in the classical literature. This paper considers performance analysis for feature-selection algorithms from two fundamental perspectives: How does the classification accuracy achieved with a selected feature set compare to the accuracy when the best feature set is used, and what is the optimal number of features that should be used? These criteria manifest themselves in several issues that need to be considered when examining the efficacy of a feature-selection algorithm: (1) the correlation between the classifier errors for the selected feature set and the theoretically best feature set; (2) the regressions of the aforementioned errors upon one another; (3) the peaking phenomenon, that is, the effect of sample size on feature selection; and (4) the analysis of feature selection in the framework of high-dimensional models corresponding to high-throughput data.


Boolean Models of Genomic Regulatory Networks: Reduction Mappings, Inference, and External Control

I. Ivanov

Computational modeling of genomic regulation has been an important focus of systems biology and genomic signal processing for the past several years. It holds the promise of uncovering both the structure and the dynamical properties of the complex gene, protein, or metabolic networks responsible for cell functioning in various contexts and regimes. This, in turn, will lead to the development of optimal intervention strategies for prevention and control of disease. At the same time, constructing such computational models faces several challenges. High complexity is one of the major impediments to the practical application of the models. Thus, reducing the size/complexity of a model becomes a critical issue in problems such as model selection, construction of tractable subnetwork models, and control of the model's dynamical behavior. We focus on the reduction problem in the context of two specific models of genomic regulation: Boolean networks with perturbation (BNP) and probabilistic Boolean networks (PBN). We also compare and draw a parallel between the reduction problem and two other important problems of computational modeling of genomic networks: the problem of network inference and the problem of designing external control policies for intervention in, or alteration of, the dynamics of the model.


Review of Peak Detection Algorithms in Liquid-Chromatography-Mass Spectrometry

J. Zhang, E. Gonzalez, T. Hestilow, W. Haskins and Y. Huang

In this review, we will discuss peak detection in Liquid-Chromatography-Mass Spectrometry (LC/MS) from a signal processing perspective. A brief introduction to LC/MS is followed by a description of the major processing steps in LC/MS. Specifically, the problem of peak detection is formulated and various peak detection algorithms are described and compared.


Hidden Markov Models and their Applications in Biological Sequence Analysis

B.-J. Yoon

Hidden Markov models (HMMs) have been extensively used in biological sequence analysis. In this paper, we give a tutorial review of HMMs and their applications in a variety of problems in molecular biology. We especially focus on three types of HMMs: the profile-HMMs, pair-HMMs, and context-sensitive HMMs. We show how these HMMs can be used to solve various sequence analysis problems, such as pairwise and multiple sequence alignments, gene annotation, classification, similarity search, and many others.


Inference of Gene Regulatory Networks Using Time-Series Data: A Survey

C. Sima, J. Hua and S. Jung

The advent of high-throughput technologies such as microarrays has provided a platform for studying how different cellular components work together, and has thus created enormous interest in mathematically modeling biological networks, particularly gene regulatory networks (GRNs). Of particular interest is modeling and inference on time-series data, which capture a more thorough picture of the system than non-temporal data do. We give an extensive review of methodologies that have been used on time-series data. Recognizing that validation is an integral part of the inference paradigm, we also present a discussion of the principles and challenges in performance evaluation of different methods. This survey gives a panoramic view of these topics, in the anticipation that readers will be inspired to improve and/or expand the repository of GRN inference and validation tools.


Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics

L. Dalton, V. Ballarin and M. Brun

The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique to analyze microarrays. To illustrate its application to genomics, clustering applied to genes from a set of microarray data groups together those genes whose expression levels exhibit similar behavior throughout the samples, and when applied to samples it offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it has remained highly problematic. The choice of a clustering algorithm and validation index is not trivial, all the more so when they are applied to high-throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem versus the computational power available. In some cases a very simple algorithm may be appropriate to tackle a problem, but many situations may require a more complex and powerful algorithm better suited for the job at hand. In this paper, we will cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.




Copyright © Bentham Science Publishers Ltd