Combinatorial Chemistry & High Throughput Screening
ISSN: 1386-2073

Volume 12, Number 4, May 2009
Contents
Machine Learning for Virtual Screening (Part 1)
Guest Editor: Ovidiu Ivanciuc
Editorial, pp. 330-331
Machine Learning in Virtual Screening, pp. 332-343
James L. Melville, Edmund K. Burke and Jonathan D. Hirst
PMID: 19442063 (PubMed, indexed for MEDLINE)
Comparative Analysis of Machine Learning Methods in Ligand-Based Virtual Screening of Large Compound Libraries, pp. 344-357
Xiao H. Ma, Jia Jia, Feng Zhu, Ying Xue, Ze R. Li and Yu Z. Chen
PMID: 19442064 (PubMed, indexed for MEDLINE)
Performance of Machine Learning Methods for Ligand-Based Virtual Screening, pp. 358-368
Dariusz Plewczynski, Stéphane A.H. Spieser and Uwe Koch
PMID: 19442065 (PubMed, indexed for MEDLINE)
Virtual Screening for Cytochromes P450: Successes of Machine Learning Filters, pp. 369-382
Julien Burton, Ismail Ijjaali, François Petitet, André Michel and Daniel P. Vercauteren
PMID: 19442071 (PubMed, indexed for MEDLINE)
Scaffold-Hopping Potential of Fragment-Based De Novo Design: The Chances and Limits of Variation, pp. 383-396
Bjoern A. Krueger, Axel Dietrich, Karl-Heinz Baringhaus and Gisbert Schneider
PMID: 19442066 (PubMed, indexed for MEDLINE)
Structure-Based Drug Screening and Ligand-Based Drug Screening with Machine Learning, pp. 397-408
Yoshifumi Fukunishi
PMID: 19442067 (PubMed, indexed for MEDLINE)
Virtual Screening with Support Vector Machines and Structure Kernels, pp. 409-423
Pierre Mahé and Jean-Philippe Vert
PMID: 19442068 (PubMed, indexed for MEDLINE)
Reverse Fingerprinting and Mutual Information-Based Activity Labeling and Scoring (MIBALS), pp. 424-439
Chris Williams and Suzanne K. Schreyer
PMID: 19442069 (PubMed, indexed for MEDLINE)
Review on Lazy Learning Regressors and their Applications in QSAR, pp. 440-450
Abhijit J. Kulkarni, Valadi K. Jayaraman and Bhaskar D. Kulkarni
PMID: 19442070 (PubMed, indexed for MEDLINE)
Abstracts
Editorial: Machine Learning for Virtual Screening (Part 1)
Computer-assisted drug design is used to increase the chances
of finding valuable drug candidates by applying a wide range
of computational methods, such as machine learning, structure-activity
relationships, quantitative structure-activity relationships,
molecular mechanics, quantum mechanics, molecular dynamics,
and drug-protein docking. Machine learning is an important
field of artificial intelligence that encompasses a diverse set
of methods and algorithms for extracting rules and functions
from large datasets. The most widely used algorithms include linear
discriminant analysis, artificial neural networks, decision
trees, lazy learning, k-nearest neighbors, Bayesian
methods, Gaussian processes, support vector machines, and
kernel algorithms. This special issue presents a representative
selection of machine learning applications for the virtual
screening of chemical libraries.
In the opening paper, Melville, Burke and Hirst review recent
applications of machine learning techniques in ranking chemical
libraries based on their biological activity against a particular
protein target. Applications of ligand-based similarity searching
and structure-based docking are critically evaluated, with
emphasis on the major algorithms, such as decision trees,
naïve Bayesian classifiers, artificial neural networks,
and support vector machines.
Chen et al. examine the technical aspects of ligand-based
virtual screening, such as available software, molecular descriptors,
and performance measures. The procedures reviewed include
binary kernel discrimination, k-nearest neighbors,
linear discriminant analysis, logistic regression, and probabilistic
neural networks. The detailed comparison of various studies
is especially valuable in providing an estimate of the level
of success that may be expected in virtual screening.
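To make the notion of a performance measure concrete, the sketch below computes three measures commonly reported for binary active/inactive classifiers: sensitivity, specificity, and the Matthews correlation coefficient (MCC). The choice of these three measures and the function name `screening_metrics` are our own illustration, not a summary of the measures compared in the reviewed studies.

```python
import math

def screening_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, MCC) for binary labels.

    Labels are assumed to be 1 for active and 0 for inactive compounds.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Fraction of true actives that are recovered.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Fraction of true inactives that are correctly rejected.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # MCC stays informative when actives are rare, as in most libraries.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc
```

For three actives and five inactives with one missed active and one false alarm, sensitivity is 2/3, specificity is 0.8, and MCC is 7/15.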
The comparison of various machine learning techniques is further
explored by Plewczynski, Spieser and Koch in a large-scale
evaluation of the screening success. Based on the biological
targets explored in the literature, it was found that there
is no machine learning approach that consistently provides
the best results. Through careful tuning of parameters, most
chemical libraries may be modeled with existing algorithms.
The study found that a promising class of methods is represented
by fusion (or ensemble) classifiers, which combine predictions
from several models and are thus able to outperform single
classifiers.
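One of the simplest fusion rules is a majority vote over the class labels predicted by several models. The sketch below is a generic illustration of that rule only; it is not taken from the Plewczynski, Spieser and Koch study, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Fuse class predictions from several classifiers by majority vote.

    predictions_per_model: one list per model, each holding one predicted
    label per compound (all lists in the same compound order).
    Ties go to the label encountered first, following Counter.most_common.
    """
    fused = []
    # zip(*...) regroups the predictions compound by compound.
    for labels in zip(*predictions_per_model):
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused
```

With three models predicting `[1, 0, 1, 0]`, `[1, 1, 0, 0]` and `[0, 1, 1, 0]`, the fused prediction is `[1, 1, 1, 0]`. Weighted votes or averaged class probabilities are common alternatives.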
Burton et al. present an in-depth overview of recent
advances in screening ligands of cytochromes P450. The most
effective methods, which may reach 90% accuracy, are support
vector machines, decision trees, artificial neural networks,
k-nearest neighbors, and partial least squares.
Schneider et al. investigate the de novo
design of novel ligand structures for various biological targets
based on the software Flux. Extensive simulations show that
this evolutionary de novo algorithm may reconstruct
27% of all compounds from a set of known ACE inhibitors and
17% of known aldose reductase inhibitors. A lower success
rate was obtained for angiotensin-II receptor antagonists,
but the algorithms may be improved by considering retrosynthetic
routes to ring systems. Overall, the experiments demonstrate
that Flux is a valuable tool in discovering novel lead structures.
The multiple-target screening method evaluates the docking
scores of a chemical compound against a panel of biological
targets. Fukunishi presents several methods to improve the
multiple-target screening by a machine-learning score modification
that computes a new screening score as a combination of docking
scores chosen to maximize database enrichment. It is
suggested that a combination of structure-based screening
and ligand-based similarity evaluation provides higher database
enrichment.
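The idea of combining docking scores from a target panel so as to maximize database enrichment can be sketched as follows. This is a simplified stand-in, not Fukunishi's score-modification method: the combined score is assumed to be a weighted sum of docking scores, the weights are found by exhaustive grid search, and lower scores are assumed to be better (the usual docking convention). The enrichment factor (EF) is the hit rate among the top-ranked fraction divided by the hit rate of the whole database.

```python
import itertools

def enrichment_factor(scores, is_active, fraction=0.1):
    """EF at the given top fraction of the ranked database.

    Lower score = better rank (docking-score convention, assumed here).
    is_active holds 1 for known actives and 0 otherwise.
    """
    n = len(scores)
    n_top = max(1, int(round(fraction * n)))
    order = sorted(range(n), key=lambda i: scores[i])
    hits_top = sum(is_active[i] for i in order[:n_top])
    total_active = sum(is_active)
    if total_active == 0:
        return 0.0
    return (hits_top / n_top) / (total_active / n)

def best_linear_combination(score_matrix, is_active, weight_grid, fraction=0.1):
    """Grid-search weights w maximizing EF of sum_j w[j] * score_matrix[i][j].

    score_matrix[i][j] is the docking score of compound i against target j.
    Returns (best_weights, best_ef).
    """
    n_targets = len(score_matrix[0])
    best_w, best_ef = None, -1.0
    for w in itertools.product(weight_grid, repeat=n_targets):
        combined = [sum(wj * sij for wj, sij in zip(w, row)) for row in score_matrix]
        ef = enrichment_factor(combined, is_active, fraction)
        if ef > best_ef:
            best_w, best_ef = w, ef
    return best_w, best_ef
```

Because the weights are fitted on known actives, the resulting enrichment should be checked on compounds not used for fitting; grid search is used here only because it is transparent, and it scales poorly with the number of targets.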
Machine learning algorithms usually evaluate molecular similarity
using structural descriptors computed from the chemical
structure. However, the molecular graph and the three-dimensional
molecular structure may be used directly to compute the chemical
similarity, as reviewed by Mahé and Vert for support
vector machines and kernel methods. Novel molecular kernels
may thus be obtained by translating the chemical structure directly
into numerical scores of chemical similarity. Several
applications of molecular kernels in structure-activity relationships
are presented, demonstrating the modeling potential of these
novel similarity functions.
Williams and Schreyer present an original algorithm, mutual
information-based activity labeling and scoring (MIBALS),
for screening molecules based on mutual information analysis
of 2D fingerprints. MIBALS was extensively tested in screening
ligands for 40 different biological targets, and the results
were promising compared with those obtained with traditional
similarity search methods. MIBALS may be applied to identify
important pharmacophore fragments, and to highlight beneficial
and detrimental groups in a congeneric series of chemicals.
Lazy learning comprises a group of memory-based local learning
methods, such as k-nearest neighbors, that delay all computations
until a request is made to predict the biological activity
of a chemical compound. Kulkarni, Jayaraman, and Kulkarni
present a comprehensive overview of lazy learning for regression,
with detailed theoretical algorithms, practical applications,
and critical assessment of its advantages and limitations.
Lazy learning is a simple and robust approach that can yield
predictive structure-activity models.
This special issue of Combinatorial Chemistry and High
Throughput Screening appears in two parts for editorial
and technical reasons. For further details relevant
to this topic, see the second part (CCHTS Vol. 12, No. 5).
Ovidiu Ivanciuc
(Guest Editor)
Department of Biochemistry and Molecular Biology
University of Texas Medical Branch
301 University Boulevard
Galveston
TX 77555-0857
USA
E-mail: ivanciuc@gmail.com
Machine Learning in Virtual Screening
James L. Melville, Edmund K. Burke and Jonathan D. Hirst
PMID: 19442063 (PubMed, indexed for MEDLINE)
In this review, we highlight recent applications of machine
learning to virtual screening, focusing on the use of supervised
techniques to train statistical learning algorithms to prioritize
databases of molecules as active against a particular protein
target. Both ligand-based similarity searching and structure-based
docking have benefited from machine learning algorithms, including
naïve Bayesian classifiers, support vector machines,
neural networks, and decision trees, as well as more traditional
regression techniques. Effective application of these methodologies
requires an appreciation of data preparation, validation,
optimization, and search methodologies, and we also survey
developments in these areas.
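As an illustration of one of the supervised techniques named above, the sketch below trains a Bernoulli naïve Bayesian classifier on binary fingerprints and ranks compounds by the log-odds of being active. Representing fingerprints as sets of "on" bit indices, using Laplace smoothing with `alpha=1.0`, and the function names are our assumptions; the review itself does not prescribe this implementation.

```python
import math

def train_bernoulli_nb(fingerprints, labels, n_bits, alpha=1.0):
    """Estimate class priors and per-bit 'on' probabilities.

    fingerprints: list of sets of on-bit indices; labels: 1 active, 0 inactive.
    Laplace smoothing (alpha) avoids zero probabilities for unseen bits.
    """
    model = {}
    for c in (0, 1):
        members = [fp for fp, y in zip(fingerprints, labels) if y == c]
        n_c = len(members)
        prior = (n_c + alpha) / (len(labels) + 2 * alpha)
        p_on = [(sum(1 for fp in members if b in fp) + alpha) / (n_c + 2 * alpha)
                for b in range(n_bits)]
        model[c] = (math.log(prior), p_on)
    return model

def log_odds_active(model, fp):
    """log P(active | fp) - log P(inactive | fp); higher values rank first."""
    score = {}
    for c, (log_prior, p_on) in model.items():
        s = log_prior
        # Every bit contributes, whether it is on or off (Bernoulli model).
        for b, p in enumerate(p_on):
            s += math.log(p) if b in fp else math.log(1.0 - p)
        score[c] = s
    return score[1] - score[0]
```

Sorting a database by `log_odds_active` in descending order gives the prioritized list described in the abstract.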
Comparative Analysis of Machine Learning Methods in Ligand-Based Virtual Screening of Large Compound Libraries
Xiao H. Ma, Jia Jia, Feng Zhu, Ying Xue, Ze R. Li and Yu Z. Chen
PMID: 19442064 (PubMed, indexed for MEDLINE)
Machine learning methods have been explored as ligand-based
virtual screening tools for facilitating drug lead discovery.
These methods predict whether compounds have specific pharmacodynamic,
pharmacokinetic or toxicological properties from
structural and physicochemical descriptors derived from their structures.
Increasing attention has been directed at these methods because
of their ability to predict compounds of diverse structures
and complex structure-activity relationships without requiring
knowledge of the target 3D structure. This article reviews
current progress in using machine learning methods for virtual
screening of pharmacodynamically active compounds from large
compound libraries, and analyzes and compares the reported
performances of machine learning tools with those of structure-based
and other ligand-based (such as pharmacophore and clustering)
virtual screening methods. The feasibility of improving the
performance of machine learning methods in screening large
libraries is discussed.
Performance of Machine Learning Methods for Ligand-Based Virtual Screening
Dariusz Plewczynski, Stéphane A.H. Spieser and Uwe Koch
PMID: 19442065 (PubMed, indexed for MEDLINE)
Computational screening of compound databases has become
increasingly popular in pharmaceutical research. This review
focuses on the evaluation of ligand-based virtual screening
using active compounds as templates in the context of drug
discovery. Ligand-based screening techniques are based on
comparative molecular similarity analysis of compounds with
known and unknown activity. We provide an overview of publications
that have evaluated different machine learning methods, such
as support vector machines, decision trees, ensemble methods
such as boosting, bagging and random forests, clustering methods,
neural networks, naïve Bayesian classifiers, data fusion methods
and others.
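Similarity-based ligand screening with several active templates can be sketched as a Tanimoto comparison of binary fingerprints followed by group fusion, where each database compound keeps its maximum similarity to any reference active (the MAX rule). Fingerprints are assumed to be sets of on-bit indices, and MAX is only one of several fusion rules in use (mean and sum are others).

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def group_fusion_rank(references, database):
    """Rank database compounds by MAX-fused Tanimoto similarity to the references.

    Returns (index, score) pairs, most similar first.
    """
    scored = [(i, max(tanimoto(ref, fp) for ref in references))
              for i, fp in enumerate(database)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For references `{1, 2, 3}` and `{7, 8}`, the database compound `{7, 8}` ranks first (score 1.0), followed by `{1, 2}` (score 2/3), while `{4, 5}` shares no bits and scores 0.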
Virtual Screening for Cytochromes P450: Successes of Machine Learning Filters
Julien Burton, Ismail Ijjaali, François Petitet, André Michel and Daniel P. Vercauteren
PMID: 19442071 (PubMed, indexed for MEDLINE)
Cytochromes P450 (CYPs) are crucial targets when predicting
the ADME properties (absorption, distribution, metabolism,
and excretion) of drugs in development. In particular,
CYP-mediated drug-drug interactions are responsible for major
failures in the drug design process. Accurate and robust screening
filters are thus needed to predict interactions of drug candidates
with CYPs as early as possible in the process. In
recent years, more and more 3D structures of various CYP isoforms
have been solved, opening the way to accurate structure-based
studies of interactions. Nevertheless, the ligand-based approach
remains popular. This success can be explained by the
growing amount of available data and the satisfactory performance
of existing machine learning (ML) methods. The aim of this
contribution is to give an overview of the recent achievements
in ML applications to CYP datasets. In particular, popular
methods such as support vector machines, decision trees, artificial
neural networks, k-nearest neighbors, and partial
least squares will be compared, as will the quality of the
datasets and the descriptors used. Consensus of different
methods will also be discussed. The models, which often reach 90% accuracy,
will be analyzed to highlight the key descriptors
that enable good prediction of CYP binding.
Scaffold-Hopping Potential of Fragment-Based De Novo Design: The Chances and Limits of Variation
Bjoern A. Krueger, Axel Dietrich, Karl-Heinz Baringhaus and Gisbert Schneider
PMID: 19442066 (PubMed, indexed for MEDLINE)
The identification of new lead structures is a pivotal
task in early drug discovery. Molecular de novo design
of ligand structures has been successfully applied in various
drug discovery projects. Still, the question of the scaffold-hopping
potential of drug design by adaptive evolutionary
optimization has been left unanswered. It was unclear whether
de novo design is actually able to leap away from given chemotypes
(“activity islands”), allowing for rescaffolding
of compounds. We have addressed these questions by scrutinizing
different scoring functions of our de novo design
software Flux for their ability to enable scaffold-hops for
various target classes. We evaluated both the potential bioactivity
and the scaffold diversity of de novo generated structures.
For several target classes, known lead structures were reconstructed
by the de novo algorithm (“lead-hopping”).
We demonstrate that for one or multiple templates of a given
chemotype, other chemotypes are reached during de novo
compound generation, thus indicating successful scaffold-hops.
Structure-Based Drug Screening and Ligand-Based Drug Screening with Machine Learning
Yoshifumi Fukunishi
PMID: 19442067 (PubMed, indexed for MEDLINE)
The initial stage of drug development is the search for hit (active)
compounds among millions of candidates; for
this process, in silico (virtual) screening has been
successfully applied. One of the problems of in silico
screening, however, is the low hit ratio in relation to the
high computational cost and the long CPU time. This problem
is particularly serious in structure-based in silico screening,
where the major reason is the low accuracy of the estimation of
protein-compound binding free energy. The problem of ligand-based
in silico screening is that the conventional quantitative
structure-activity relationship (QSAR) approach is not effective
at predicting new hit compounds with new scaffolds. Recently,
machine-learning approaches have been applied to in silico
drug screening to overcome the above problems. We review here
machine-learning approaches for both structure-based and ligand-based
drug screening. Machine learning is used to improve database
enrichment in two ways, namely by improving the docking score
calculated by the protein-compound docking program and by
calculating the optimal distance between the feature vectors
of active and inactive compounds. Both approaches require
compounds that are known to be active with respect to the
target protein. In structure-based screening, the former approach
is mainly used with a protein-compound affinity matrix. In
ligand-based screening, both the former and latter approaches
are used, and the latter approach can be applied to various
kinds of descriptors, such as 1D/2D descriptors/fingerprints
and the affinity fingerprint given by the protein-compound
affinity matrix.
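The affinity fingerprint mentioned above treats each compound's row of the protein-compound affinity matrix (its docking scores against a fixed protein panel) as a feature vector. The sketch below ranks a library by weighted Euclidean distance to a known active's affinity fingerprint. The per-target `weights` stand in for a metric that a machine-learning step might optimize; how Fukunishi actually learns the distance is not reproduced here, and the function names are hypothetical.

```python
import math

def affinity_distance(fp_a, fp_b, weights=None):
    """Weighted Euclidean distance between two affinity fingerprints.

    Each fingerprint is a list of docking scores against the same protein panel.
    weights (hypothetical, e.g. learned elsewhere) default to 1 for every target.
    """
    if weights is None:
        weights = [1.0] * len(fp_a)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, fp_a, fp_b)))

def rank_by_affinity_fingerprint(query, library, weights=None):
    """Return library indices ordered from most to least similar to the query."""
    return sorted(range(len(library)),
                  key=lambda i: affinity_distance(query, library[i], weights))
```

Setting a target's weight to zero removes that protein from the comparison, which shows how a learned metric can change the ranking without changing the underlying affinity matrix.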
Virtual Screening with Support Vector Machines and Structure Kernels
Pierre Mahé and Jean-Philippe Vert
PMID: 19442068 (PubMed, indexed for MEDLINE)
Support vector machines and kernel methods have recently
gained considerable attention in chemoinformatics. They generally offer
good performance for problems of supervised classification
or regression, and provide a flexible and computationally
efficient framework to include relevant information and prior
knowledge about the data and problems to be handled. In particular,
with kernel methods molecules do not need to be represented
and stored explicitly as vectors or fingerprints, but only
to be compared to each other through a comparison
function technically called a kernel. While classical
kernels can be used to compare vector or fingerprint representations
of molecules, completely new kernels have been developed in
recent years to directly compare the 2D or 3D structures of
molecules, without the need for an explicit vectorization
step through the extraction of molecular descriptors. While
still in their infancy, these approaches have already demonstrated
their relevance on several toxicity prediction and structure-activity
relationship problems.
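A kernel that compares 2D structures directly can be illustrated with a simple walk-count kernel: each molecular graph is mapped to counts of atom-label sequences along all walks of up to `max_len` bonds, and the kernel is the inner product of these counts, which makes it a valid (positive semi-definite) kernel. This is a simplified illustration in the spirit of the graph kernels discussed here, not the specific kernels of Mahé and Vert; the molecule encoding as `(atom_labels, bond_pairs)` is our assumption.

```python
from collections import Counter

def walk_label_counts(atoms, bonds, max_len):
    """Count atom-label sequences of all walks with 0..max_len bonds.

    atoms: list of atom labels, e.g. ['C', 'C', 'O'];
    bonds: list of (i, j) index pairs, treated as undirected.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    counts = Counter()
    # Each frontier entry is (current atom index, label sequence so far).
    frontier = [(i, (atoms[i],)) for i in range(len(atoms))]
    for step in range(max_len + 1):
        next_frontier = []
        for node, seq in frontier:
            counts[seq] += 1
            if step < max_len:  # extend walks only while below max_len bonds
                for nb in neighbours[node]:
                    next_frontier.append((nb, seq + (atoms[nb],)))
        frontier = next_frontier
    return counts

def walk_kernel(mol_a, mol_b, max_len=2):
    """Inner product of walk-label count vectors of two molecular graphs."""
    ca = walk_label_counts(*mol_a, max_len)
    cb = walk_label_counts(*mol_b, max_len)
    # Counter returns 0 for sequences absent from cb.
    return sum(n * cb[seq] for seq, n in ca.items())
```

For a hydrogen-suppressed ethanol graph `(['C', 'C', 'O'], [(0, 1), (1, 2)])` and methanol `(['C', 'O'], [(0, 1)])` with `max_len=1`, the kernel value is 5. Such a function can be passed as a precomputed Gram matrix to any SVM implementation that accepts one.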
Reverse Fingerprinting and Mutual Information-Based Activity Labeling and Scoring (MIBALS)
Chris Williams and Suzanne K. Schreyer
PMID: 19442069 (PubMed, indexed for MEDLINE)
A mutual information-based activity labeling and scoring
(MIBALS) approach to reverse fingerprint analysis is presented.
Whole molecule scores produced by the method are shown to
be capable of ranking compounds in virtual high-throughput
screening (vHTS) experiments, while fragment scores produced
by the method are able to identify pharmacophore moieties
important for biological activity. The performance of MIBALS
in vHTS experiments is assessed using reference ligands active
against 40 different biological targets, and MIBALS retrieval
rates are compared with those obtained using more traditional
group fusion similarity search methods. The use of MIBALS
to identify important pharmacophore fragments is demonstrated
by comparing ligand fragment scores with known pharmacophores
and known ligand/protein contacts. The ability of MIBALS to
highlight beneficial and detrimental groups in a congeneric
series is examined by comparing MIBALS fragment scores with
features in known structure-activity relationships.
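The quantity at the core of such an analysis, the mutual information between the presence of a fingerprint bit and an activity label, can be computed as below. This sketch shows only that quantity; the reverse-fingerprinting procedure and the whole-molecule and fragment scores of MIBALS are not reproduced. Fingerprints are assumed to be sets of on-bit indices and labels to be 0/1.

```python
import math
from collections import Counter

def bit_activity_mutual_information(fingerprints, labels, bit):
    """Mutual information (in bits) between one fingerprint bit and activity.

    I(X; Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y))),
    where X = bit present (1) or absent (0) and Y = activity label.
    """
    n = len(labels)
    joint = Counter((int(bit in fp), y) for fp, y in zip(fingerprints, labels))
    count_x = Counter(int(bit in fp) for fp in fingerprints)
    count_y = Counter(labels)
    mi = 0.0
    # Only observed (x, y) pairs contribute; p(x, y) = 0 terms vanish.
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / n
        mi += p_xy * math.log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi
```

A bit present in every active and no inactive of a balanced set carries 1 bit of information about activity; a bit that is absent everywhere carries none.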
Review on Lazy Learning Regressors and their Applications in QSAR
Abhijit J. Kulkarni, Valadi K. Jayaraman and Bhaskar D. Kulkarni
PMID: 19442070 (PubMed, indexed for MEDLINE)
Building accurate quantitative structure-activity relationships
(QSAR) is important in drug design, environmental modeling,
toxicology, and chemical property prediction. QSAR methods
are mainly used to solve two types of problems: pattern
recognition (classification), where the output is discrete
class information, e.g., active or non-active molecule,
binding or non-binding molecule; and function approximation
(regression), where the output is continuous (e.g., prediction
of actual activity values). The present review deals with the
second type of problem (regression), with specific attention to
one of the most effective machine learning procedures, namely
lazy learning. The methodologies of the algorithm, along with the
relevant technical information, are discussed in detail. We also
present three real-life case studies to briefly outline the
typical characteristics of the modeling formalism.
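A minimal lazy-learning regressor is distance-weighted k-nearest-neighbor regression: no model is fitted in advance, and all work happens when a query compound arrives. The sketch assumes compounds are described by numeric descriptor vectors compared by Euclidean distance (`math.dist`, Python 3.8+) and uses inverse-distance weights; the review covers more elaborate local models than this one.

```python
import math

def knn_regress(train_X, train_y, query, k=3, eps=1e-12):
    """Predict an activity value for one query compound by lazy k-NN regression.

    train_X: descriptor vectors; train_y: measured activities.
    Nothing is precomputed: distances to all stored compounds are evaluated
    at query time, and the k nearest activities are averaged with 1/d weights.
    """
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    # An exact descriptor match would get infinite weight; return it directly.
    for d, y in neighbours:
        if d < eps:
            return y
    weights = [1.0 / d for d, _ in neighbours]
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)
```

Because prediction requires a pass over the stored training set, k-NN trades training cost for query cost; the choice of `k` and of the distance metric typically has to be tuned by cross-validation.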