Current Bioinformatics
ISSN: 1574-8936
OPEN ACCESS PLUS
Contents

Genome Annotation in Plants and Fungi: EuGène
as a Model Platform, 2008, 3, 87-97
Sylvain Foissac, Jérôme Gouzy, Stephane
Rombauts, Catherine Mathé, Joëlle Amselem, Lieven
Sterck, Yves Van de Peer, Pierre Rouzé and
Thomas Schiex

Computational Approaches for Predicting Causal Missense
Mutations in Cancer Genome Projects, 2008, 3, 46-55
Lawrence S. Hon, Joshua S. Kaminker and
Zemin Zhang

IMGT Colliers de Perles: Standardized Sequence-Structure
Representations of the IgSF and MhcSF Superfamily Domains,
2007, 2, 21-30
Quentin Kaas and Marie-Paule Lefranc

Phenotype Data: A Neglected Resource in Biomedical
Research?, 2006, 1, 347-358
Philip Groth and Bertram Weiss

The Role of the COG Database in Comparative and Functional
Genomics, 2006, 1, 291-300
Michael Kaufmann
Abstracts

Genome Annotation in Plants and Fungi: EuGène as a
Model Platform
Sylvain Foissac, Jérôme Gouzy, Stephane
Rombauts, Catherine Mathé, Joëlle Amselem, Lieven
Sterck, Yves Van de Peer, Pierre Rouzé and
Thomas Schiex

In this era of whole genome sequencing, reliable genome annotations
(identification of functional regions) are the cornerstones
for many subsequent analyses. Not only is careful annotation
important for studying the gene and gene family content of
a genome and its host, but also for wide scale transcriptome
and proteome analyses attempting to describe a certain biological
process or to get a global picture of a cell's behavior. Although
the number of sequenced genomes is increasing thanks to the
application of new technologies, genome wide analyses will
critically depend on the quality of the genome annotations.
However, the annotation process is more complicated in the
plant field than in the animal field because limited
funding leads to far fewer experimental data and less
annotation expertise. This situation calls for highly automated
annotation platforms that can make the best use of all available
data, experimental or not. We discuss how the field of gene prediction
(the process of predicting protein-coding gene structures in genomic
sequences) is increasingly shifting from methods
that typically exploited one or two types of data to more
integrative approaches that simultaneously deal with various
experimental, statistical, or other in silico evidence.
We illustrate the importance of integrative approaches for
producing high quality automatic annotations of genomes of
plants and algae as well as of fungi that live in close association
with plants using the platform EuGène as an example.

Computational Approaches for Predicting Causal Missense Mutations
in Cancer Genome Projects
Lawrence S. Hon, Joshua S. Kaminker and
Zemin Zhang

A central focus of cancer genetics is the study of mutations
that are causally implicated in tumorigenesis. Although missense
variants are commonly identified in genomic sequence, only
a small fraction directly contributes to oncogenesis.
Distinguishing those somatic missense changes that
contribute to cancer progression from those that do not is
a difficult problem usually addressed through functional
in vivo analyses. With the advent of several large-scale
cancer genome projects geared toward identifying mutations
that are causally implicated in cancer, it is becoming increasingly
important to develop methods for distinguishing functionally
relevant mutations from passenger mutations and other
innocuous polymorphisms. Here we review two general strategies
that are based on either mutation frequency data or the nature
of amino acid substitutions. Frequency-based methods are commonly
used for estimating the enrichment of causal mutations and
for identifying specific mutations under positive selection
pressure. The statistical power of these methods is dependent
on the number of cancer samples being surveyed. The potential
functional consequences of missense mutations can also be
examined by bioinformatics approaches since multiple computational
methods have been developed to estimate the deleterious effect
of amino acid substitutions. It is likely that many of the
existing methods can potentially be applied to large-scale
cancer genome data to detect relevant causal mutations regardless
of their prevalence. Future data analysis of missense somatic
mutations will likely benefit from continual development of
integrated and automated methods for combining all available
information to predict whether a particular mutation is causally
implicated.

IMGT Colliers de Perles: Standardized Sequence-Structure
Representations of the IgSF and MhcSF Superfamily Domains
Quentin Kaas and Marie-Paule Lefranc

IMGT®, the international ImMunoGeneTics information system®
(http://imgt.cines.fr), provides common access to expertly
annotated data on the genome, proteome, genetics and structure
of immunoglobulins (IG), T cell receptors (TR), major histocompatibility
complex (MHC) of human and other vertebrates, and related
proteins of the immune system (RPI) of any species. RPI include
proteins that belong to the immunoglobulin superfamily (IgSF)
and MHC superfamily (MhcSF). IMGT has set up a unique numbering
system, which takes into account the structural features of
the Ig-like and Mhc-like domains. In this paper, we describe
the IMGT Scientific chart rules for the description of the
IgSF V type and C type and of the MhcSF G type domains. These
rules are based on the IMGT-ONTOLOGY concepts and are applicable
to sequence and structure analysis, whatever the species,
the IgSF or MhcSF protein, or the chain type. We present examples
of IMGT Colliers de Perles of IgSF V type (V-DOMAIN and V-LIKE-DOMAIN),
C type (C-DOMAIN and C-LIKE-DOMAIN) and MhcSF G type (G-DOMAIN
and G-LIKE-DOMAIN) based on the IMGT unique numbering. These
standardized two-dimensional graphical representations are
particularly useful for antibody engineering, sequence-structure
analysis, visualization and comparison of positions for mutations,
polymorphisms and contact analysis.

Phenotype Data: A Neglected Resource in Biomedical
Research?
Philip Groth and Bertram Weiss

To a great extent, our phenotype is determined by our genetic
material. Many genotypic modifications may ultimately become
manifest in more or less pronounced changes in phenotype.
Despite the importance of how specific genetic alterations
contribute to the development of diseases, surprisingly little
effort has been made towards systematically exploiting
current knowledge of genotype-phenotype relationships. In
the past, genes were characterized with the help of so-called
"forward genetics" studies in model organisms, relating
a given phenotype to a genetic modification. Analogous studies
in higher organisms were hampered by the lack of suitable
high-throughput genetic methods. This situation has now changed
with the advent of new screening methods, especially RNA interference
(RNAi), which makes it possible to silence genes specifically, one by one, and
to observe the phenotypic outcome. This ongoing large-scale
characterization of genes in mammalian in-vitro model
systems will increase phenotypic information exponentially
in the very near future. But will our knowledge grow equally
fast? As in other scientific areas, data integration is a
key problem. It is thus still a major bioinformatics challenge
to interpret the results of large-scale functional screens,
even more so if sets of heterogeneous data are to be combined.
It is now time to develop strategies to structure and use
these data in order to transform the wealth of information
into knowledge and, eventually, into novel therapeutic approaches.
In light of these developments, we thoroughly surveyed the
available phenotype resources and reviewed different approaches
to analyzing their content. We discuss hurdles yet to be overcome,
namely the lack of data integration, the absence of adequate phenotype
ontologies and the shortage of appropriate analytical tools.
This review aims to assist researchers keen to understand
and make effective use of these highly valuable data.

The Role of the COG Database in Comparative and Functional
Genomics
Michael Kaufmann

A major breakthrough in classifying proteins from different
microbial genomes in terms of sequence similarity was the
development of the COG concept by Tatusov et al.
in 1997. The authors defined clusters of orthologous groups
of proteins (COGs) by strictly applying all-against-all BLAST
alignments of protein sequences from completely sequenced
microbial genomes. The latest update of the COG database already
covered 66 microbial genomes and additionally included the
KOG database, an equivalent consisting of seven eukaryotic
genomes. Although excellent web-based software tools designed
to analyze this huge amount of data were initially provided
by the authors, many other groups independently developed
more specialized or extended programs making use of COG data
for diverse purposes. Here a brief introduction is given to
the concept behind COGs, and their potential in the field
of comparative and functional genomics is discussed. The
review then focuses on the multitude of recently developed
web services aimed at mining the COG database. Their capabilities
to solve diverse problems in biochemistry are addressed. In
order to illustrate the broad field of possible applications,
a compilation of recently published findings, implementing
information derived from comparative genomics with emphasis
on data retrieved from the COG database, is given.