N-Linear Algebraic Maps for Chemical Structure Codification: A Suitable Generalization for Atom-pair Approaches?
Cesar R. Garcia-Jacas, Yovani Marrero-Ponce, Stephen J. Barigye, Jose R. Valdes-Martini, Oscar M. Rivera-Borroto and Jesus Olivero-Verbel
Affiliation:
Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy. Universidad Central “Martha Abreu” de Las Villas, Santa Clara, 54830, Villa Clara, Cuba.
Abstract
The present manuscript introduces, for the first time, a novel alignment-free 3D-QSAR method (QuBiLS-MIDAS) based on tensor concepts, using three-linear and four-linear algebraic forms as specific cases of n-linear maps. To this end, the kth three-tuple and four-tuple spatial-(dis)similarity matrices are defined as tensors of order 3 and 4, respectively, to represent 3D information among “three and four” atoms of the molecular structures. Several measures (multi-metrics) for establishing (dis)similarity relations among “three and four” atoms are discussed, as well as normalization schemes for the n-tuple spatial-(dis)similarity matrices based on the simple-stochastic and mutual probability algebraic transformations. To account for specific interactions among atoms, n-tuple path and length cut-off constraints are introduced for both the global and local indices. This algebraic scaffold can also be seen as a generalization of the vector-matrix-vector multiplication procedure (the matrix representation of the traditional linear, quadratic and bilinear forms) for the calculation of molecular descriptors, and is thus a new theoretical approach with a methodological contribution. A variability analysis based on Shannon’s entropy reveals that the best distributions are achieved with the ternary and quaternary measures corresponding to the bond and dihedral angles, respectively. In addition, the proposed indices show better entropy behavior than the descriptors calculated by other programs used in chemo-informatics studies, such as DRAGON, PADEL and Mold2. A principal component analysis shows that the novel 3D n-tuple indices codify the same information captured by the DRAGON 3D-indices, as well as information not codified by the latter. To assess further the contribution of the novel molecular parameters, a QSAR study was performed on the binding affinity to the corticosteroid-binding globulin, using Cramer’s steroid database.
The results reveal superior statistical parameters for the Bond Angle and Dihedral Angle approaches, consistent with the results of the variability analysis. Finally, the QuBiLS-MIDAS models outperform all 3D-QSAR methods reported in the literature that use the 31 steroids as the training set, and, for the popular division of Cramer’s database into training (1-21) and test (22-31) sets, they achieve comparable-to-superior results in predicting the activity of the steroids. These results suggest that the proposed QuBiLS-MIDAS N-tuples indices are a useful tool to be considered in chemo-informatics studies.
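The generalization from the vector-matrix-vector product to a three-linear form described in the abstract can be illustrated with a minimal sketch. It is not the QuBiLS-MIDAS implementation: the function names, the unit atomic weights, and the choice of the bond angle at the middle atom as the ternary measure are assumptions made only for illustration, and no normalization or cut-off constraint is applied.

```python
import math
from itertools import product


def bilinear_form(x, y, M):
    """Classical vector-matrix-vector product: sum_ij M[i][j] * x[i] * y[j]."""
    n = len(x)
    return sum(M[i][j] * x[i] * y[j] for i, j in product(range(n), repeat=2))


def angle_at_middle(a, b, c):
    """Angle in radians at point b between points a and c (illustrative ternary measure)."""
    u = [p - q for p, q in zip(a, b)]
    v = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(p * p for p in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def three_tuple_tensor(coords, measure=angle_at_middle):
    """Order-3 tensor T[i][j][k] holding a ternary measure for distinct atoms i, j, k."""
    n = len(coords)
    T = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i, j, k in product(range(n), repeat=3):
        if len({i, j, k}) == 3:  # entries with repeated atoms are left at zero (assumption)
            T[i][j][k] = measure(coords[i], coords[j], coords[k])
    return T


def three_linear_form(x, y, z, T):
    """Three-linear form: sum_ijk T[i][j][k] * x[i] * y[j] * z[k]."""
    n = len(x)
    return sum(
        T[i][j][k] * x[i] * y[j] * z[k] for i, j, k in product(range(n), repeat=3)
    )


# Hypothetical three-atom structure: a right triangle in the xy-plane.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
weights = [1.0, 1.0, 1.0]  # placeholder atomic property vector (assumption)

T = three_tuple_tensor(coords)
index = three_linear_form(weights, weights, weights, T)
print(index)
```

With unit weights, the form sums the angle measure over all ordered triples of distinct atoms, so each of the triangle's three angles is counted twice and the result is approximately 2π. Replacing the unit weights with real atomic properties, or the angle with another ternary measure, changes only the inputs and not the contraction itself.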
Keywords: 3D three-linear and four-linear indices, aggregation operator, Cramer’s steroid, N-tuple simple stochastic and mutual probability matrices, N-tuple spatial-(dis)similarity matrix, principal component analysis, QuBiLS-MIDAS N-tuples, QSAR, Shannon entropy, TOMOCOMD-CARDD, variability analysis.