The fingerprints that are widely used for similarity-based virtual screening
typically encode the presence or absence of fragments, without any indication as to their
relative importance. This chapter discusses the use of weighted fingerprints, where each
fragment is associated with a weight denoting its importance when quantifying the
similarity between a reference structure and a database structure. Extensive
studies using the World of Molecular Bioactivity and MDL Drug Data Report databases
show that weighting fragments according to their frequency of occurrence within a
molecule can increase the effectiveness of screening, but that this is not the case when
fragments are weighted according to their frequency of occurrence within a database.
Keywords: Chemoinformatics, ECFC4 fingerprint, extended connectivity
fingerprint counts fingerprint, fingerprint, fragment weighting scheme, frequency
weighting, IDF weighting, information retrieval, inverse frequency weighting,
ligand-based virtual screening, MDL Drug Data Report database, similarity-based
virtual screening, similarity coefficient, similarity searching, TF weighting, virtual
screening, weighting scheme, World of Molecular Bioactivity database.
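
The two weighting schemes contrasted in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the fragment labels, the function names, the logarithmic form of the IDF weight, and the choice of the dot-product form of the Tanimoto coefficient are all assumptions made for illustration. A fingerprint is represented here as a plain dictionary mapping a fragment label to its occurrence count in one molecule.

```python
import math
from collections import Counter


def tanimoto(a, b):
    """Tanimoto coefficient for two non-negative weighted fingerprints (dicts)."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = sum(w * w for w in a.values())
    norm_b = sum(w * w for w in b.values())
    denom = norm_a + norm_b - dot
    # Two all-zero fingerprints are treated as having zero similarity.
    return dot / denom if denom else 0.0


def binary_weights(counts):
    """Unweighted case: a fragment is either present (1.0) or absent."""
    return {f: 1.0 for f, c in counts.items() if c > 0}


def tf_weights(counts):
    """Within-molecule frequency: weight = occurrence count (assumed form)."""
    return {f: float(c) for f, c in counts.items() if c > 0}


def idf_table(database):
    """Within-database frequency: log(N / n_f) per fragment (assumed form)."""
    n = len(database)
    doc_freq = Counter(f for counts in database for f, c in counts.items() if c > 0)
    return {f: math.log(n / k) for f, k in doc_freq.items()}


def idf_weights(counts, idf):
    """Weight each present fragment by its database-level IDF value."""
    return {f: idf.get(f, 0.0) for f, c in counts.items() if c > 0}


# Hypothetical fragment labels and counts, for illustration only.
reference = {"frag_A": 3, "frag_B": 1}
database = [
    {"frag_A": 3, "frag_B": 1},
    {"frag_A": 1, "frag_C": 2},
    {"frag_B": 1, "frag_C": 1},
]

idf = idf_table(database)
for counts in database:
    print(
        tanimoto(binary_weights(reference), binary_weights(counts)),
        tanimoto(tf_weights(reference), tf_weights(counts)),
        tanimoto(idf_weights(reference, idf), idf_weights(counts, idf)),
    )
```

In this sketch a fragment that occurs in every database molecule receives an IDF weight of zero, so it contributes nothing to the similarity; other damping choices (for example a square root or logarithm of the count for the within-molecule weight) fit the same structure.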