Current Bioinformatics

ISSN: 1574-8936


OPEN ACCESS PLUS


Contents


Genome Annotation in Plants and Fungi: EuGène as a Model Platform, 2008, 3, 87-97
Sylvain Foissac, Jérôme Gouzy, Stephane Rombauts, Catherine Mathé, Joëlle Amselem, Lieven Sterck, Yves Van de Peer, Pierre Rouzé and Thomas Schiex


Computational Approaches for Predicting Causal Missense Mutations in Cancer Genome Projects, 2008, 3, 46-55
Lawrence S. Hon, Joshua S. Kaminker and Zemin Zhang


IMGT Colliers de Perles: Standardized Sequence-Structure Representations of the IgSF and MhcSF Superfamily Domains, 2007, 2, 21-30
Quentin Kaas and Marie-Paule Lefranc


Phenotype Data: A Neglected Resource in Biomedical Research?, 2006, 1, 347-358
Philip Groth and Bertram Weiss


The Role of the COG Database in Comparative and Functional Genomics, 2006, 1, 291-300
Michael Kaufmann




Abstracts




Genome Annotation in Plants and Fungi: EuGène as a Model Platform

Sylvain Foissac, Jérôme Gouzy, Stephane Rombauts, Catherine Mathé, Joëlle Amselem, Lieven Sterck, Yves Van de Peer, Pierre Rouzé and Thomas Schiex


In this era of whole-genome sequencing, reliable genome annotations (identification of functional regions) are the cornerstone of many subsequent analyses. Careful annotation matters for studying the gene and gene family content of a genome and its host. It matters equally for large-scale transcriptome and proteome analyses that aim to describe a particular biological process or to give a global picture of a cell's behavior. The number of sequenced genomes is increasing thanks to new technologies, but genome-wide analyses will depend critically on the quality of the genome annotations. However, annotation is more difficult for plants than for animals: limited funding has produced far less experimental data and less annotation expertise. This situation calls for highly automated annotation platforms that make the best use of all available data, experimental or not. We discuss how the field of gene prediction (predicting protein-coding gene structures in genomic sequences) is shifting. Earlier methods typically exploited one or two types of data; newer, more integrative approaches simultaneously handle various experimental, statistical, or other in silico evidence. Using the EuGène platform as an example, we illustrate the importance of integrative approaches for producing high-quality automatic annotations of the genomes of plants and algae, as well as of fungi that live in close association with plants.
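As a rough illustration of what integrating several kinds of evidence can mean in gene prediction, the following Python sketch combines per-nucleotide coding scores from several sources into one score. It then picks the best coding/non-coding labelling by dynamic programming. This is a toy model under stated assumptions, not EuGène's algorithm. The track names (`ab_initio`, `est_alignment`, `protein_homology`), their scores, the weights and `switch_penalty` are all hypothetical. Real predictors also model splice sites, start and stop codons, and reading frames.

```python
# Toy illustration of evidence integration for gene prediction (not EuGène's algorithm).
# Track names, scores and weights below are hypothetical.


def integrate_evidence(tracks, weights):
    """Combine per-position coding scores from several sources by a weighted sum."""
    length = len(next(iter(tracks.values())))
    return [sum(weights[name] * scores[i] for name, scores in tracks.items())
            for i in range(length)]


def best_segmentation(scores, switch_penalty):
    """Label each position coding (1) or non-coding (0) by dynamic programming.

    Coding positions earn their integrated score, non-coding positions earn 0,
    and every change of state costs switch_penalty. The sequence is assumed to
    be flanked by non-coding DNA, so entering coding at the first position or
    ending in coding also pays the penalty.
    """
    n = len(scores)
    best = [[0.0, 0.0] for _ in range(n)]  # best[i][s]: best total score with state s at i
    back = [[0, 0] for _ in range(n)]      # back[i][s]: state at i-1 on that best path
    best[0] = [0.0, scores[0] - switch_penalty]
    for i in range(1, n):
        for state in (0, 1):
            gain = scores[i] if state == 1 else 0.0
            stay = best[i - 1][state]
            switch = best[i - 1][1 - state] - switch_penalty
            if stay >= switch:
                best[i][state], back[i][state] = stay + gain, state
            else:
                best[i][state], back[i][state] = switch + gain, 1 - state
    # Close the path in non-coding DNA unless ending in coding still scores higher.
    state = 0 if best[-1][0] >= best[-1][1] - switch_penalty else 1
    labels = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]  # step back to the state at position i-1
        labels.append(state)
    return labels[::-1]


EXAMPLE_TRACKS = {
    "ab_initio":        [-1, -1, 2, 2, 2, -1, -1],
    "est_alignment":    [0, 0, 1, 1, 1, 0, 0],
    "protein_homology": [0, 0, 0, 1, 0, 0, 0],
}
EXAMPLE_WEIGHTS = {"ab_initio": 1.0, "est_alignment": 2.0, "protein_homology": 1.5}

if __name__ == "__main__":
    combined = integrate_evidence(EXAMPLE_TRACKS, EXAMPLE_WEIGHTS)
    print(combined)
    print(best_segmentation(combined, switch_penalty=3.0))
```

With these hypothetical inputs, the combined score is positive only at positions 2 to 4, and that stretch is the one labelled coding. Giving a source a larger weight lets it dominate the combined score. This is the simplest way such a scheme lets trusted experimental evidence outweigh a purely statistical signal.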


Computational Approaches for Predicting Causal Missense Mutations in Cancer Genome Projects

Lawrence S. Hon, Joshua S. Kaminker and Zemin Zhang


A central focus of cancer genetics is the study of mutations that are causally implicated in tumorigenesis. Although missense variants are commonly identified in genomic sequence, only a small fraction directly contributes to oncogenesis. Distinguishing the somatic missense changes that contribute to cancer progression from those that do not is difficult, and is usually accomplished through functional in vivo analyses. Several large-scale cancer genome projects now aim to identify causally implicated mutations. This makes it increasingly important to develop methods that distinguish functionally relevant mutations from passenger mutations and other innocuous polymorphisms. Here we review two general strategies, based either on mutation frequency data or on the nature of amino acid substitutions. Frequency-based methods are commonly used to estimate the enrichment of causal mutations and to identify specific mutations under positive selection. The statistical power of these methods depends on the number of cancer samples surveyed. The potential functional consequences of missense mutations can also be examined bioinformatically, since multiple computational methods have been developed to estimate the deleterious effect of amino acid substitutions. Many of these existing methods could likely be applied to large-scale cancer genome data to detect relevant causal mutations regardless of their prevalence. Future analyses of somatic missense mutations will likely benefit from the continued development of integrated, automated methods. Such methods would combine all available information to predict whether a particular mutation is causally implicated.
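A minimal sketch of the frequency-based reasoning also shows why its power depends on the number of samples. Assume each tumour independently carries a mutation at a given site with a background probability `p`. The number of mutated tumours out of `n` then follows a Binomial(n, p) distribution, and the p-value of a site mutated in `k` tumours is the binomial upper tail. The background rate, the Bonferroni-style threshold over a hypothetical 30 million coding positions, and the 2% driver frequency are illustrative assumptions only. The helper names are likewise hypothetical. Real analyses typically also need to account for gene length, sequence context and sample-specific mutation rates.

```python
from math import exp, lgamma, log, log1p


def binom_log_pmf(i, n, p):
    """Log of the Binomial(n, p) probability of exactly i successes (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def recurrence_p_value(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of n tumours
    carry a mutation at one site if each is mutated there with background rate p."""
    return sum(exp(binom_log_pmf(i, n, p)) for i in range(k, n + 1))


def min_significant_count(n, p, alpha):
    """Smallest number of mutated tumours whose p-value falls below alpha, or None."""
    for k in range(1, n + 1):
        if recurrence_p_value(k, n, p) < alpha:
            return k
    return None


def detection_power(n, p, alpha, f):
    """Probability of calling a site significant when it is truly mutated in a
    fraction f of tumours (its recurrence then follows Binomial(n, f))."""
    k = min_significant_count(n, p, alpha)
    return 0.0 if k is None else recurrence_p_value(k, n, f)


BACKGROUND_RATE = 1e-6        # hypothetical per-site, per-tumour background rate
ALPHA = 0.05 / 30_000_000     # Bonferroni over a hypothetical 30 Mb of coding sequence
DRIVER_FREQUENCY = 0.02       # hypothetical driver mutated in 2% of tumours

if __name__ == "__main__":
    for n in (50, 200, 1000):
        k = min_significant_count(n, BACKGROUND_RATE, ALPHA)
        power = detection_power(n, BACKGROUND_RATE, ALPHA, DRIVER_FREQUENCY)
        print(f"n={n}: need >= {k} mutated tumours, power = {power:.2f}")
```

Under these assumptions the recurrence count needed for significance grows only slowly with `n`. The expected number of tumours carrying the driver, in contrast, grows linearly. Power therefore rises steeply with cohort size: roughly a quarter of such drivers are detected with 50 tumours, while nearly all are detected with 1,000.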


IMGT Colliers de Perles: Standardized Sequence-Structure Representations of the IgSF and MhcSF Superfamily Domains
Quentin Kaas and Marie-Paule Lefranc


IMGT®, the international ImMunoGeneTics information system® (http://imgt.cines.fr), provides common access to expertly annotated data on the genome, proteome, genetics and structure of several groups of molecules. These are the immunoglobulins (IG), T cell receptors (TR) and major histocompatibility complex (MHC) of humans and other vertebrates, and related proteins of the immune system (RPI) of any species. RPI include proteins that belong to the immunoglobulin superfamily (IgSF) and the MHC superfamily (MhcSF). IMGT has set up a unique numbering system that takes into account the structural features of the Ig-like and Mhc-like domains. In this paper, we describe the IMGT Scientific chart rules for describing the IgSF V type and C type domains and the MhcSF G type domains. These rules are based on the IMGT-ONTOLOGY concepts and apply to sequence and structure analysis, whatever the species, the IgSF or MhcSF protein, or the chain type. We present examples of IMGT Colliers de Perles based on the IMGT unique numbering. They cover IgSF V type (V-DOMAIN and V-LIKE-DOMAIN), C type (C-DOMAIN and C-LIKE-DOMAIN) and MhcSF G type (G-DOMAIN and G-LIKE-DOMAIN) domains. These standardized two-dimensional graphical representations are particularly useful for antibody engineering, sequence-structure analysis, and the visualization and comparison of positions for mutations, polymorphisms and contact analysis.


Phenotype Data: A Neglected Resource in Biomedical Research?
Philip Groth and Bertram Weiss


To a great extent, our phenotype is determined by our genetic material. Many genotypic modifications may ultimately manifest as more or less pronounced changes in phenotype. It is important to understand how specific genetic alterations contribute to the development of diseases. Yet surprisingly little effort has been made toward systematically exploiting current knowledge of genotype-phenotype relationships. In the past, genes were characterized through so-called "forward genetics" studies in model organisms, relating a given phenotype to a genetic modification. Analogous studies in higher organisms were hampered by the lack of suitable high-throughput genetic methods. This situation has now changed with the advent of new screening methods, especially RNA interference (RNAi), which makes it possible to silence genes specifically, one by one, and to observe the phenotypic outcome. This ongoing large-scale characterization of genes in mammalian in vitro model systems will increase the amount of phenotypic information exponentially in the near future. But will our knowledge grow equally fast? As in other scientific areas, data integration is a key problem. Interpreting the results of large-scale functional screens therefore remains a major bioinformatics challenge, even more so when heterogeneous data sets are to be combined. It is now time to develop strategies to structure and use these data. The goal is to transform this wealth of information into knowledge and, eventually, into novel therapeutic approaches. In light of these developments, we thoroughly surveyed the available phenotype resources and reviewed different approaches to analyzing their content. We discuss the hurdles yet to be overcome, namely the lack of data integration, the absence of adequate phenotype ontologies and the shortage of appropriate analytical tools. This review aims to assist researchers keen to understand and make effective use of these highly valuable data.


The Role of the COG Database in Comparative and Functional Genomics
Michael Kaufmann


A major breakthrough in classifying proteins from different microbial genomes by sequence similarity was the development of the COG concept by Tatusov et al. in 1997. The authors defined clusters of orthologous groups of proteins (COGs) by strictly applying all-against-all BLAST alignments to protein sequences from completely sequenced microbial genomes. The latest update of the COG database covered 66 microbial genomes. It also included the KOG database, an equivalent resource built from seven eukaryotic genomes. The authors initially provided excellent web-based tools for analyzing this large body of data. Even so, many other groups have independently developed more specialized or extended programs that use COG data for diverse purposes. This review briefly introduces the concept behind COGs and discusses their potential in comparative and functional genomics. It then focuses on the many recently developed web services for mining the COG database, and on their capabilities for solving diverse problems in biochemistry. To illustrate the broad range of possible applications, it compiles recently published findings that draw on comparative genomics, with emphasis on data retrieved from the COG database.
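The clustering step can be sketched in simplified form. Tatusov et al. built COGs from consistent triangles of genome-specific best hits spanning three genomes, merging triangles that share a side. The Python sketch below follows that outline but, as a simplifying assumption, keeps only reciprocal best hits. It also omits the paralog handling and manual curation used for the real database. The input format is hypothetical: a dictionary mapping a protein and a target genome to that protein's best BLAST hit in that genome. So are the protein and genome names and the function names.

```python
from collections import defaultdict
from itertools import combinations


def reciprocal_best_hits(best_hit, genome_of):
    """Pairs of proteins from different genomes that are each other's best hit.

    best_hit maps (protein, target_genome) -> best-scoring protein in target_genome,
    e.g. parsed from an all-against-all BLAST run (hypothetical input format).
    """
    pairs = set()
    for (query, target_genome), hit in best_hit.items():
        if genome_of[query] == target_genome:
            continue  # skip within-genome hits (potential paralogs)
        if best_hit.get((hit, genome_of[query])) == query:
            pairs.add(frozenset((query, hit)))
    return pairs


def cog_like_clusters(best_hit, genome_of):
    """Merge triangles of reciprocal best hits that share a side into clusters."""
    pairs = reciprocal_best_hits(best_hit, genome_of)
    neighbours = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)

    # A common neighbour of both ends of a pair closes a triangle; since pairs
    # only link proteins of different genomes, each triangle spans three genomes.
    triangles = list({frozenset((a, b, c))
                      for a, b in (tuple(p) for p in pairs)
                      for c in neighbours[a] & neighbours[b]})

    parent = list(range(len(triangles)))  # union-find over triangle indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    triangles_by_side = defaultdict(list)
    for idx, tri in enumerate(triangles):
        for side in combinations(sorted(tri), 2):
            triangles_by_side[side].append(idx)
    for idxs in triangles_by_side.values():
        for other in idxs[1:]:
            parent[find(other)] = find(idxs[0])  # triangles sharing a side merge

    clusters = defaultdict(set)
    for idx, tri in enumerate(triangles):
        clusters[find(idx)] |= tri
    return sorted(sorted(members) for members in clusters.values())


EXAMPLE_GENOME_OF = {"a1": "G1", "x1": "G1", "b2": "G2", "c3": "G3", "d4": "G4"}
EXAMPLE_BEST_HIT = {
    ("a1", "G1"): "x1",                        # within-genome hit, ignored
    ("a1", "G2"): "b2", ("a1", "G3"): "c3", ("a1", "G4"): "d4",
    ("b2", "G1"): "a1", ("b2", "G3"): "c3", ("b2", "G4"): "d4",
    ("c3", "G1"): "a1", ("c3", "G2"): "b2", ("c3", "G4"): "d4",
    ("d4", "G1"): "a1", ("d4", "G2"): "b2",    # no hit recorded in G3
    ("x1", "G2"): "b2",                        # one-way hit only
}

if __name__ == "__main__":
    print(cog_like_clusters(EXAMPLE_BEST_HIT, EXAMPLE_GENOME_OF))
```

In this toy input, `x1` stays unclustered because its hit to `b2` is not reciprocated. `d4` joins because its triangle with `a1` and `b2` shares the `a1`/`b2` side with the `a1`, `b2`, `c3` triangle, even though `c3` and `d4` are not reciprocal best hits.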




Copyright © Bentham Science Publishers Ltd