Modeling biophysical and biological properties from the characteristics of the molecular electron density, electron localization and delocalization matrices, and the electrostatic potential
Abstract
The electron density and the electrostatic potential are fundamentally related to the molecular hamiltonian, and hence are the ultimate source of all properties in the ground and excitedstates. The advantages of using molecular descriptors derived from these fundamental scalar fields, both accessible from theory and from experiment, in the formulation of quantitative structuretoactivity and structuretoproperty relationships, collectively abbreviated as QSAR, are discussed. A few such descriptors encode for a wide variety of properties including, for example, electronic transition energies, pK_{a}'s, rates of ester hydrolysis, NMR chemical shifts, DNA dimers binding energies, πstacking energies, toxicological indices, cytotoxicities, hepatotoxicities, carcinogenicities, partial molar volumes, partition coefficients (log P), hydrogen bond donor capacities, enzyme–substrate complementarities, bioisosterism, and regularities in the genetic code. Electronic fingerprinting from the topological analysis of the electron density is shown to be comparable and possibly superior to Hammett constants and can be used in conjunction with traditional bulk and liposolubility descriptors to accurately predict biological activities. A new class of descriptors obtained from the quantum theory of atoms in molecules' (QTAIM) localization and delocalization indices and bond properties, cast in matrix format, is shown to quantify transferability and molecular similarity meaningfully. Properties such as “interacting quantum atoms (IQA)” energies which are expressible into an interaction matrix of two body terms (and diagonal one body “self” terms, as IQA energies) can be used in the same manner. The proposed QSARtype studies based on similarity distances derived from such matrix representatives of molecular structure necessitate extensive investigation before their utility is unequivocally established. © 2014 The Author and the Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
Introduction
Quantitative structuretoactivity (or property) relationships (QSAR/QSPR) relate a set of molecular descriptors to biological, pharmacological, or physicochemical properties.[115] For simplicity, we will ignore the nuance between property and activity and use QSAR to designate both QSAR and QSPR in this article as the principles in modeling either are essentially the same. QSAR correlations are constructed to predict the activities of compounds (physicochemical, spectroscopic, biological, pharmacological, and so forth) from other more readily measurable experimental properties or from calculated properties regardless of the presence of a clear predictor–predicted causal link.
Not infrequently, QSAR models, despite of being predictive (and hence useful), provide little or no clue as to why they work. We can cite, for example, the chemical graph theoretical (connectivity) indices,[6] which useful as they may be, cannot be generally expected to provide, as such, much insight on the mode of action of a drug.
Alternatively, QSAR modeling based on descriptors[16] obtained directly from the electron density, electron (de)localization, or the molecular electrostatic potential (MESP), through the application of a topological analysis[1719] can reach beyond the primarily utilitarian approach.[15] Descriptors derived from the electron density[16] (Fig. 1) have been used by various research groups in the construction of physically insightful QSAR models.[15] The present review samples the relevant literature and expands the use of such descriptors in QSAR, especially those obtained from Bader's quantum theory of atoms in molecules (QTAIM),[1719] in the construction of predictive and physically insightful QSAR models. The terminology of QTAIM will be used without definition since it is reviewed elsewhere.[1724]
Electron densityderived descriptors have clear physical meanings that can be often related to the predicted property. In addition, these descriptors are often accessible from both experiment (Xray diffraction)[19, 2529] and theoretical calculations except for properties that require the full density matrix obtainable only from quantum mechanical calculations.
The analysis of the electron density can be a rich source of raw atomic and bonding descriptors that can be processed through empirical models to predict (and possibly explain) physicochemical and biological properties of molecules.[15] The use of QTAIM descriptors in QSAR may possibly increase drastically in the near future due to the fast and evergrowing availability of computational resources and efficient computer programs.
A particular advantage of QTAIM analysis of the electron density is that it provides a complete set of consistent properties (not only atomic charges) within one coherent and quantum mechanically sound theory.[17] Further, QTAIM atomic and bond properties are wellknown for their insensitivity to the underlying levels of ab initio electronic structure calculations, an assertion especially true when differences rather than absolute values are considered.[3032] This is contrasted with, say, Hirshfeld or Mulliken atomic charges, which are arbitrarily defined and basis set unstable.
Quantitative Structure–Activity Relationships (QSAR)
QSAR modeling[115] aims at the construction of empirical formulae capable of predicting the biological activity of an untested molecule given a set of its structural and electronic properties, often on the basis of some sort of multivariate regression.[8] This approach is widespread in the field of drug discovery and is especially useful when the mechanism of action of the drug is not known. The field is so vast that there are journals with names such as SAR QSAR Environ. Res. or Mol. Inform. (formerly Quant. Struct.Act. Relat. and later QSAR Comb. Sci.) and even a UKbased society (The QSAR Society).
The properties used in the construction of a QSAR model can include experimentally determined quantities, results of (quantum chemical) calculations, simple counts of atoms or bond types, and connectivity information regardless of the presence of an obvious causal link between them and the modeled activity. The basic assumption is expressed as
in which C is the concentration of the compound necessary to reach a biological endpoint such as LD_{50} (lethal dose 50%), IC_{50} (inhibitory concentration 50%), or ED_{50} (effective dose 50%), x_{i} is the ith predictor (from a set of n) raised to the jth power, and a_{i}_{j} are the weighing coefficient obtained from the statistical fitting. The last equality of Eq. (1) gives the explicit form of the oft assumed function relating the predictors to the predicted response. The predictors can be experimental or calculated properties whether structural, physicochemical, or quantum chemical or any combinations of all the above. It is recommended that the number of compounds used to build the statistical model be at least 5n, and the rule of thumb is that the greater the ratio of the number of compounds to the number of parameters, the better the model.[10]
It is impossible to review the immense QSAR literature but we cite a few examples here to illustrate the principally utilitarian use of QSAR in drug discovery. The HOMO–LUMO gap and the local density of electronic states, for example, have been used by Vendrame and coworkers[33] to construct a QSAR model capable of predicting the biological activity of steroids with 100% success rate. These workers also constructed a related QSAR model to identify the carcinogenic activity of a given polycyclic aromatic hydrocarbon, a model that exhibited an accuracy of over 80%.[33] In another study, it was found that the quantum mechanically calculated electron affinities of 270 nitroaromatic compounds provide a statistically significant basis for the discrimination between those that are mutagenic (Ames test positive) and those that are not (Ames test negative).[34] Hatch et al.[35] developed QSAR models for 80 amines with mutagenic activities spanning 10 orders of magnitude based on the total energy of the conjugated πelectrons and on the energy of the LUMO.
Clearly, there is no shortage of examples of QSAR models in the literature. A final example to help bringing the point to the fore is that of a strongly predictive (and hence useful) model in which the connections between the predictors and the response are not obvious. In this example, the effectiveness of a series of ketones with aromatic substituents against Candida albicans, measured by the negative logarithm of the minimum inhibitory concentration (pMIC), was found to be robustly predicted by the following model:[36]
where the predictors include molecular length [Lx], ionization potential [IP], molar refractivity [MR], and the Verloop B4 steric parameter [B4 (A)] of the substituent in the aromatic ring A. The connection between these quantities and the MIC of the compound is not evident, but this does not reduce the importance of the QSAR model in the design of more potent inhibitors.
From the above few examples, one can say that the predictors in QSAR modeling can be selected to maximize the correlation with the response before other considerations such as causal link or mechanism of action. Such descriptor(s) selection can be automated by modern statistical methods implemented in numerous software packages but this blackbox approach can lead to problems such as chance correlations[37] and even the inconsistency of the sign of the correlation coefficient when only part of the original dataset is analyzed.[38] It has been argued recently that the use of electron density–based molecular descriptors can eliminate a source of uncertainty in QSAR modeling since they are welldefined, physically meaningful, and fundamentally related to activity, which would constitute one way to remedy for the aforementioned possible limitations of traditional QSAR.[15]
The Electron Density ρ(r) and QSAR Modeling
The electron density ρ(r) at a point r is the probability density of finding an electron of either spin in a volume element dτ = dx dy dz at r weighted by the total number of electrons (N). The electron density can be calculated from the manyelectron wavefunction Ψ:
where x_{i} is the set of spatial and spin coordinate of the electron i, and is short for integration over the spatial coordinates of all electrons except one and summation over both spins.
The total electric charge density ρ_{total}(r) is the sum of the negative electronic charge density and the positive nuclear charge density (discrete distribution of point nuclear charges), which in atomic units (a.u.) (e = 1) is given by:
where Z_{α} and R_{α} are the charge and position of nucleus α, and δ(R_{α} − r) is a Dirac delta function.
Hohenberg and Kohn (HK)[39] proved the existence of a bijective mapping between the electron density ρ(r) of a manyelectron nondegenerate ground state (in the absence of external magnetic fields), and the external potential v[ρ(r)], that is, the nuclear potential augmented by scalar potentials external to the system of electrons. In other words, the electron density is bijectively mapped into the nuclear charges (the atomic numbers in a.u., Z_{α}) and positions [δ(R_{α} − r)], which, together, generate to the external potential. This last assertion is a consequence of Kato's cusp condition, which states that at the position of an atomic nucleus the derivative of the spherically averaged density ( ) with respect to the distance from that nucleus ( ) equals .[40] Further, the electron density also determines the total number of electrons in the system N via its integral over all space (Fig. 2). This mapping implies the interdeterminancy of the two terms of the R.H.S. of Eq. (4), and thus the electron density analyzed in this work, contains the same information as the total charge density.
As specifying v[ρ(r)] and N[ρ(r)] completely defines the Hamiltonian (Fig. 2), the groundstate wavefunction Ψ[ρ(r)] and all derived groundstate properties {O} including the total energy are uniquely specified as well, thus we may write:
The electron density is thus the scalar field par excellence for the construction of meaningful QSAR correlations since by necessity all molecular properties are uniquely mapped to it whether the exact functional relationships is known or not.
Riess and Münch (RM) demonstrated that the first HK theorem, expressed in the relation (5), applies not only to the total system but also that it applies to any finite bounded subsystem of any size and shape.[41] That is to say, an arbitrarily small open subsystem within a molecule enclosed by sharp but otherwise arbitrary boundaries can be mapped uniquely to all groundstate properties of the full system, including the total full electron density,[41] which can be written symbolically as:
where ω is any open system with sharp boundaries delimiting it, in threedimensional (3D) space, from the rest of the total manyelectron system.
Bader and Becker[42] applied this extension of the HK theorem to atoms in molecules (AIMs) bounded by surfaces of zeroflux in the gradient vector field associated with the electron density (when ω = Ω). Mezey[43] illustrates the use of RM's extension of the HK theorem in conjunction with Carbó's similarity index in what he terms the “holographic electron density theorem”. Mapping (6) constitutes fundamental grounds for basing the empirical modeling of properties in QSARtype studies on electron densityderived properties. This is particularly true when ω itself is welldefined from quantum mechanics, that is, when ω = Ω.[17]
Bader and Zou[44] emphasized that Dirac's definition of a quantum mechanical “observable”[45] applies to ρ(r) as it fulfils the following conditions:
 ρ(r) is a real dynamical variable, which is the expectation value of a linear Hermitian operator , and
 The eigenstates of form a complete set of coordinate states .
Dirac comments further that “[i]n practice it may be very awkward, or perhaps even beyond the ingenuity of the experimenter, to devise an apparatus which could measure some particular observable, but the theory always allows one to imagine that the measurement can be made.”[45] While not every “observable” in the quantum mechanical sense is observable in practice, the electron density is. Thus, the electron density is not only the source of all groundstate properties but also it is a quantum mechanical observable, which is readily accessible from both theory and experiment (primarily from Xray scattering experiments),[19, 2529] Figure 3. These qualities of the density when considered together with relation (6) lead one to conclude that atomic properties such as atomic charges themselves are quantum mechanical observables accessible experimentally and, contrary to the widely held view, are unique.[44, 46]
The Electron Density and Molecular Similarity
A minimum degree of similarity is often required to exist between the molecules used in the construction of a QSAR model, say a common molecular skeleton with different substituents such as the series of substituted benzoic acids.[47] Researchers have used a diversity of molecular features in quantifying similarity, some base similarity on a comparison of amino acid sequences,[4850] others on the degree of mismatching of 2D chemical graphs,[51] and others on 3D molecular superpositions.[52] By relations (5) and (6), a similarity of ρ(r) necessarily leads to the similarity of all other groundstate properties, and hence the most fundamental molecular comparisons are those effected at the electron density level. This realization is credited to CarbóDorca and coworkers[9, 5355] who used the electron density and its associated MESP[5668] to quantify molecular similarity. Carbó's (dimensionless) similarity index can be written in its general form as[9, 53]
where and are the electron densities of ith and jth molecules, Θ is an alignment parameter, and is an operator that defines the similarity index. If , a Dirac delta function, and assuming maximal alignment the index is[9]
and when , the Coulomb operator, is a Coulomb similarity index.
The denominator in Eqs. (7) or (8) causes to become unity when a molecule is compared with itself (i = j). Thus Carbó indexes range from a maximum of unity (identity) and approach but never reach zero, since there will always be some overlap of electron density no matter how dissimilar two molecules are (i.e., ).
Cioslowski and Nanayakkara (CN)[69] extend the applicability of a Carbótype indexes to pairs of AIMs as defined by QTAIM. In this approach, the comparison is between a pair of AIMs (AIMs A and B) rather than between pairs of molecules (molecules M_{i} and M_{j}). This is achieved by taking into account only the intersection of the spaces occupied by the two atoms delimited from the rest of their respective molecules by zeroflux surfaces. When the atom is not bound and extends to infinity, then a cutoff of 0.001 a.u. is taken as its outer isosurface. The atomic similarity index is defined as[69]
where
and where the intersection of the two basins is calculated after superposing the nuclei of the two atoms and maximizing the index through a rotation that maximizes their overlap. If one compares two atoms in the same molecule, then i = j in Eqs. (9) and (10). CN provide several examples showing that their atomic similarity index yields results that conform with chemical observation, intuition, and expectation.[69]
Despite of their groundbreaking nature, Carbó's indexes can be impractical because (i) they necessitate molecular alignment, that is, maximizing through an exhaustive search of the superposition space[9, 70, 71] and (ii) can be dominated by nuclear maxima, which overshadows the subtle differences in the valence shells that govern chemistry. A solution to these problems has been proposed by Popelier in what he terms the quantum topological molecular similarity (QTMS) approach, briefly reviewed below.
The Electron Density, Transferability, and Additivity
The contributions of a given atom (or of a group of atoms) to a set of molecular properties can often be predicted from its corresponding contributions to the molecular properties of a chemically different molecule. When such a prediction is possible with chemical accuracy the atom (or group) is said to be transferable. Transferability, when coupled with additivity, leads to additivity schemes such as those developed by Benson and coworkers.[7274]
Bader and coworkers[17, 24, 7582] stress that the QTAIM definition of AIMs maximizes their transferability. QTAIM partitions space exhaustively at the zeroflux surfaces[8385] without any gaps unaccounted for into “nonoverlapping, bounded, spacefilling open quantum systems”,[82] and hence AIMs defined in this manner also exhibit additivity since
where the left hand side (L.H.S.) is the usual molecular expectation value of a linear Hermitian operator , while the three equivalent expressions on the R.H.S. yield the molecular value as a sum of atomic averages obtained from the integral of a real space dressed density ρ_{O}(r) over the volume of each atom Ω.
The transferability and additivity of atomic properties has been used in the theoretical reconstruction of large molecules from fragments synthesized in molds. The goal of this type of reconstruction is to bypass or reduce the nonlinear scaling bottleneck as a function of the size of ab initio calculations. Examples of such reconstruction includes the theoretical synthesis of polypeptides from amino acid residues extracted from molds and melded together,[75, 7779] and the theoretical[86] and experimental[87] reconstruction of complicated alkaloids such as the opioids from their constituent molecular building blocks. Experimentally, the advantage of this approach is to circumvent the problem of obtaining electron densities of molecules that are difficult to crystallize from the crystallographic electron densities of more readily crystallizable smaller molecules. The transferability of larger molecular fragments saturated with hydrogen atoms termed “kernels” has been extensively tested by Huang, Massa, and Karle and has been used to predict total energies, interactions energies, and binding energies of gigantic molecules with chemical accuracy at a fraction of the computational cost of direct calculations, the reconstruction method being termed the “kernel energy method.”[8895] The transferability of the electron density obtained from theory and experiment has allowed several groups to build libraries of transferable Xray crystallographic multipolar parameters that are used to assist in the refinement and in the modeling of the electron density of large biological macromolecules.[87, 96109]
It is an observational fact that properties of AIMs are often transferable within chemical precision, that is, to within ≤1 kcal mol^{−1}, which is practically tantamount to perfect transferability. Such high transferability (and additivity) has been reported for QTAIM atomic energies[17, 110] and for atomic response properties such as polarizabilities[111] and magnetic susceptibilities[112]—all closely paralleling experiment.
But why and how does transferability work? Actually there are two types of transferability: Direct transferability, which is the almost constancy of the atomic properties of a given atom or group from one molecule to another, and compensatory transferability, where the change in one atom or group cancels the opposite change of similar magnitude in its neighboring atom(s) or group(s).[76, 113]
To understand Benson's additivity schemes, one must take both types of transferability into account.[113] We quote an example given by CortésGuzmán and Bader:[113]
The energy of CH_{2}XCH_{3} is the mean of the energies of CH_{2}XCH_{2}X and CH_{3}CH_{3} to within ≤0.5 kcal mol^{−1} (1 kcal = 4.184 kJ), experimental[74] or theoretical,[77] for X = OH, Cl and I. Theoretical results for X = F give the energy of CH_{3} in fluoroethane as being 15.5 kcal mol^{−1} more stable, that is, more negative, than in ethane while the energy of CH_{2}F is found to increase by 15.6 kcal mol^{−1} relative to its value in CH_{2}FCH_{2}F, the energy changes being accompanied by a transfer of 0.061 e from CH_{3} to CH_{2}F. The same principle applies to systems with delocalized electrons, the energy of pyridine, experimental or theoretical, equaling the mean of the energies of pyrazine and benzene to within 0.1 kcal mol^{−1} using the measured heats of formation.[74]
Bader and CortésGuzmán close their paper by noting the similarity of the net result of compensatory transferability and Le Châtelier principle:[113]
There appears to be a Le Châtelier principle at work—one that states that two open systems brought into contact respond in such a way as to minimize the overall changes in form and energy, resulting in many cases in a conservation of energy.
Taking advantage of the transferability of QTAIM atoms, Breneman invented the “transferable atom equivalent” (TAE)[114117] method, which is designed to reconstruct accurate approximations to ab initio electron densities at a tiny fraction of the computational effort. A TAE is a mononuclear atomic electron density region delimited by zeroflux surfaces extracted from a mold molecule and stored in an electronic database which incorporates a large number of element combinations, atomtypes, and electronic environments. The electron density reconstruction is achieved with the RECON program by recalling the desired TAE from the database, and after the necessary rotation and translation, each newly added fragment is melded to the growing reconstructed molecule to maximize the matching of its zeroflux surface with that of the growing molecule. The automatic melding of zeroflux surfaces is affected by allowing the quantum stress tensor to exert pressure on the two surfaces until the condition of pressure equalization is satisfied, a point at which the two nearly matching surfaces become melded into one gapless approximate interatomic zeroflux surface. This melding procedure necessitates some adjustments to bond lengths.
In addition to the TAE themselves, the database also stores derivatives of their properties with respect to radial variations of the zeroflux surface. These derivatives are used to correct atomic properties for their variations accompanying the change in the interatomic surface resulting from the melding. Molecular properties/descriptors such as total energy, total volume, surface area of a given isodensity envelope, electrostatic potential, dipole moment, Fukui function, and bare nuclear potential are then calculated by combining the properties of the composing TAEs.[114117]
Several robust predictive QSAR models have been built from TAE. Examples include a model capable of the accurate prediction of the acute toxicity of 375 organic molecules to fish in which 300 molecules were used as a training set to construct the model while 75 molecules were left as the test set for which a “leave one out” crossvalidated r^{2}, usually given the symbol q^{2}, of 0.81 was obtained.[117] The speed with which ab initio quality results can be obtained using the TAE approach is relevant to high speed screening of large molecular sets ubiquitous in the in silico phase of drug design.
Categories of QTAIM Descriptors
QTAIM defines a number of atomic and bonding properties simultaneously from the topological and the topographical analysis of the electron density that extract atomicallyresolved and bondresolved levels of description of a molecule from its manyelectron wavefunction. We can classify these QTAIM properties broadly as follows:
 Bond properties. These can be grouped into: (a) local properties evaluated at the bond critical point (BCP) such as the electron density (ρ_{BCP}) or the energy densities (e.g., G_{BCP}, V_{BCP}, and H_{BCP}); (b) properties integrated along the bond path such as the bond path length; and (c) properties integrated over the interatomic zeroflux surface such as the integrated electron density.[17]
 Atomic properties [Prop. (Ω)]. These are properties integrated over the 3D atomic basin bounded by the union of zeroflux interatomic surfaces and, when extending to infinity, a chosen outer isodensity envelope, normally taken as the 0.001 a.u. envelope. The general form of this integration is given in Eq. (11), which necessarily leads to additivity of atomic properties. Examples include the atomic energy [E(Ω)], the electrostatic charge q(Ω) (electrostatic monopole) and higher multipoles such as the dipole moment [μ(Ω)].[17]
 Interatomic properties [Prop. (Ω,Ω^{′})]. These are generally integrated properties that depend on pairs of atomic basins. Examples include the delocalization index (DI) δ(Ω,Ω^{′})[118] and the interaction energy components of the interacting quantum atoms (IQA).[119121] One can also calculate interatomic properties that do not involve any integration involving two basins such as interatomic distances, interatomic Coulomb repulsion energies taking only atomic monopoles, or nuclear–nuclear repulsion energies. Interatomic properties can be organized in matrix format that can be used in QSAR (See the section entitled “Localization/Delocalization Matrices (LDM), Delocalization Matrices (DM), and DensityWeighted Connectivity Matrices (DWCM) in QSAR” below).
 MESP (and field) [V(r), E(r)]. These fields are determined entirely by the charge density distribution[5668] and represent an additional source of descriptors that are directly related to reactivity that can be used in QSAR (See the section entitled “Molecular Electrostatic Potential (MESP) and Field (MESF)” below). While these are not QTAIMderived properties, the MESP has been shown to be accurately reproduced from truncated multipolar expansion representations of the atomic densities within molecules using the QTAIM multipoles.[122] Numerical evidence has shown, thus, that the relation between QTAIM atomic electrostatic properties and the MESP is a bijective mapping, not only in principle but also in computational practice as well.
Many of these QTAIM properties have been used in QSAR studies and drug design, but one can probably say safely that this is a field in its infancy despite of the enormous literature that has already accumulated. The purpose of the remainder of this article is not an exhaustive review of this extensive literature, an overwhelming task for the present author, but rather to emphasize (i) the breadth, depth, and potential practical utility of such an endeavor, with a necessarily biased sampling of the literature and (ii) to show that QTAIM is not exclusive of other approaches, as we read sometimes in the literature, in fact—to the contrary—it gains and complements (and is complemented by) other approaches.
QTAIM Bond Properties in QSAR
The predictive power of BCP propertiesbased descriptors is illustrated with the QTMS approach of Popelier and coworkers,[123144] and the work of Buttingsrud et al.[145147]
Quantum topological molecular similarity (QTMS)
Substituent effects can be gauged by Hammett substituent constants,[1, 5, 47, 148] that is, the change in the pK_{a} of benzoic acid on substitution in the aromatic ring by the substituent S (Scheme 1):
Popelier developed a method termed “quantum topological molecular similarity” (QTMS) that has been tested extensively for its capability of reproducing the trends predicted from Hammett constants accurately.[123] The QTMS descriptors have been shown to convey essentially the same information contained in the Hammett substituent constants and may constitute an alternative in QSAR modeling since it is generally easier to compute QTMS descriptors than it is to determine the experimental values of new nontabulated Hammett constants.[124126, 129, 131, 140, 142]
In the QTMS approach, ρ(r) is sampled at the BCPs, a sampling that entails several advantages such as the speed of the calculation and the emphasis on the chemically relevant bonding regions with only an indirect perturbing influence of the chemically irrelevant nuclear regions. Further, QTMS bypasses the molecular alignment problem, a considerable timesaver compared to methods requiring maximization with respect to the parameter Θ in Eq. (7). We note in passing that unlike Eq. (7), however, QTMS cannot distinguish between enantiomers, which does not constitute a limitation as long as the collected experimental data include those obtained using the correct enantiomer(s). Recapping, the QTMS method has a number of advantages with respect to its applicability in QSARtype drug design: (i) It is entirely based on the topology of the electron density in the chemically relevant bonding region and not dominated by nuclear maxima and (ii) its is considerably fast given that it does not require molecular prealignment and/or space integration as the electron density is only sampled at a set of point (the BCPs) rather than being considered in its entirety.
The QTMS method starts with the construction of a multidimensional mathematical space using bond properties determined at the BCPs. Each bond property is represented by a Cartesian axis after converting each to a dimensionless relative scale to avoid dimensional nonhomogeneity. The dimensionless Euclidian distance between two molecules in this mathematical space is then a measure of their dissimilarity, the larger the distance the less similar they are.
Examples of bond descriptors include, but are not limited to, the electron density at the BCP (ρ_{BCP}), the Laplacian of ρ_{BCP} (▿^{2}ρ_{BCP}), the bond ellipticity (ε), the three principal curvatures of the electron density evaluated at the BCP (λ_{1}, λ_{2}, and λ_{3}), the kinetic energy density at the BCP (K_{BCP}), the total energy density at the BCP (H_{BCP}), and the equilibrium bond length (R_{e}).
If we suppose the BCP space is constructed from ρ_{BCP}, ▿^{2}ρ_{BCP}, and ε, then the distance between molecules A and B is:
where the sum runs over the i common bonds between the two molecules A and B, and where standardised BCP properties are used, which means that property x is replaced by {[x – mean(x)]/standard deviation of x}, to avoid dimensional inhomogeneity. In this way, the smaller d(A,B) the more similar are A and B with a lower bound of d = 0 when A and B are one and the same molecule.
By construction, QTMS is an electronic structure fingerprinting tool, but it is devoid of a builtin capability to represent molecular size or hydrophobicity (as recognized by Popelier on a number of occasions). This is so since the sampling is of intensive properties (densities) rather than extensive properties (integrated densities). This feature can actually be a strength of QTMS rather than a limitation, because whenever QTMS alone (i.e., without the inclusion of extraneous steric or log P information) is capable of yielding a strongly predictive QSAR model, then this implies that the properties being modeled are determined primarily by electronic structure rather than by steric bulk, molecular size, or solvent partitioning tendencies. Conversely, failure of QTMS alone to predict activity is a diagnostic that steric bulk and/or hydrophobicity are important in determining activity. In the latter cases, QTMS can be used in conjunction with traditional steric bulk and/or hydrophobicity measures to yield powerful predictive combined models.
The exclusive sensitivity of QTMS to electronic structure parallels that of the Hammett σ constants. Both Hammett constants and QTMS must be supplemented with steric bulk and/or hydrophobicity descriptors when these are important in QSAR models. This parallelism between QTMS and Hammett constants lends merit to Popelier's proposal of the eventual replacement of the latter by the former given the relative ease with which QTMS descriptors may be obtained.[130, 132]
A remarkable feature of QTMS is its apparent inherent capability to pinpoint the “active center” automatically. By this, we mean that the experimental Hammett constants (σ)sequence relative to paminobenzoic acid (Table 1) is reproduced if the BCPs of the “active center,” and only the active center (here the bonds in the carboxylic group: O()H, C()O, and C(—)OH), are included in the distance calculation. The inclusion of any additional bond(s) destroys the agreement with the experimental sequence, and the more “irrelevant” bonds are incorporated in the Euclidian space the more the disagreement with experiment[123] (Table 1, especially the last column). As a corollary and by construction, the Euclidian QTMS molecular comparison space need not include (and probably is better not to include) all the common bonds of a series of molecules. In other words, one can include in the QTMS distance calculation only those bonds existing in a common “subgraph” that maximizes the predictability of the model. That is to say, given a wellcharacterized active center in otherwise unrelated molecules (i.e., molecules differing in their molecular skeletons), the relative activity can be predicted from the distances computed from the properties of the bonds in this active center alone. Figure 4a supports this hypothesis as can be seen from the plot of pK_{a} values predicted on the basis of the Euclidian distance including only the bonds of the carboxylic groups of 40 unrelated organic acids[129] after a selection of the strongest predictors (from the set of BCP properties) on the basis of a partial least square (PLS) regression procedure. A less stringent test is presented in the plots of Figures 4b and 4c that demonstrate the ability of QTMS to predict the pK_{a}'s of related molecules with accuracy: Substituted anilines in 4b and substituted phenols in 4c. The possibility of chance correlations has been eliminated in this study through the “leave one out” cross validation[129] leading to final statistics that are of higher quality than those obtained from traditional descriptors such as Hammett constants.[129]
Experimental sequence  NH_{2}  OCH_{3}  CH_{3}  H  F  Cl  CN  NO_{2}  # of disagreements/8  

QTMS sequences  No. of BCPs  


OH  1  NH_{2}  OCH_{3}  CH_{3}  H  F  Cl  CN  NO_{2}  0 
CO  1  NH_{2}  OCH_{3}  CH_{3}  H  F  Cl  CN  NO_{2}  0 
COOH  3  NH_{2}  OCH_{3}  CH_{3}  H  F  Cl  CN  NO_{2}  0 
CCOOH  4  NH_{2}  OCH_{3}  F  CH_{3}  Cl  H  CN  NO_{2}  4 
C_{6}  6  NH_{2}  CN  CH_{3}  OCH_{3}  H  Cl  NO_{2}  F  5 
C_{6}H_{4}  10  NH_{2}  CH3  OCH_{3}  CN  H  Cl  NO_{2}  F  6 
C_{6}COOH  10  NH_{2}  OCH_{3}  CH_{3}  Cl  CN  H  NO_{2}  F  5 
C_{6}H_{4}COOH  14  NH_{2}  OCH_{3}  CH_{3}  Cl  H  CN  NO_{2}  F  5 
SC_{6}H_{4}COOHb  15  NH_{2}  CN  H  OCH_{3}  CH_{3}  NO_{2}  F  Cl  7 
The versatility of QTMS has been demonstrated by its ability to predict a wide range of physicochemical and pharmacological properties. A few highlights from this work are given next for the sake of illustration.
Examples of predictive QSAR using QTMS only (without incorporating steric or log P information into the model) include the prediction of the proton affinities and the pK_{a}'s of 125 pyridine derivatives,[140] the relative bond dissociation enthalpies of phenolic antioxidants,[131] and the interaction energies of substituted cytosine–guanine (CG) Watson–Crick (WC) base pairs.[149] The predicted properties in this selection of examples are all primarily dependent on electronic (and much less on steric or solvent partitioning) properties. The latter study will now be briefly discussed.
Xue and Popelier (XP)[149] calculated the effect of 37 substituents at position 8 of guanine on the binding energy of the CG WC base pair. The position of the substituent R, attached to C8 of guanine, is indicated in the following structure with the most “predictive bonds” (see below) enclosed in boxes (Scheme 2):
The binding energies of the unsubstituted and substituted dimers were all calculated at the B3LYP/6311+G(2d,p) level of density functional theory (DFT) with the inclusion of counterpoise basis set superposition error corrections.[150, 151] If ΔE represents the interaction energy of the unsubstituted GC base pair and ΔE_{S} that of a guanine(C8)substituted GC pair, then ΔΔE ≡ ΔE_{S} − ΔE represents the change in ΔE on substitution. Defined in this manner, a positive value of ΔΔE implies a substitution that destabilizes the base pair and a negative value is a substitution that favors the hydrogen bonding of the WC pair. XP then compare a multiple linear regression model with a model constructed solely using Hammett σ_{m} constants. The QSAR models these authors report are:
where RMSD is root mean square deviation, using traditional Hammett constants, and:
using the QTAIM properties,[17] where ε (≡λ_{1}/λ_{2} − 1) is the bond ellipticity and G the gradient kinetic energy density ( ) evaluated at the BCP, and where the bonds are those enclosed in boxes in the chemical structure above.
As can be seen from the regression quality indicator lines below Eq. (15), the molecular set has been divided into to subsets: A training set used to construct the model and a set external to the training set to test the performance of the constructed model outside of its bounds (this is necessary to ensure the predictivity of the QSAR model rather than just its ability to model the given data accurately). Clearly, the QSAR model based on QTAIM descriptors is characterized by significantly superior statistical quality indicators that the corresponding traditional Hammett constantsbased model. This example, while not strictly QTMS since Eq. (13) has not been used, is nevertheless closely associated to QTMS since the descriptors used in Eq. (15) are typically incorporated into QTMS models. These results add weight to the proposal that QTMS can be an inexpensive alternative to the Hammett constantsbased QSAR approach.
The utility of QTMS has been extended beyond the prediction of static molecular properties. For example, this approach has been shown capable of reproducing the rate constants (k) of a model reaction, namely, the hydrolysis of esters:[127]
The alkaline hydrolysis rate constant at 25°C of 40 different esters (16 ethylcontaining and 24 other esters) were calculated from a QTMS QSAR model and compared with experimental values. The QSAR model yielded r^{2} = 0.930 and q^{2} = 0.863. Further, an examination of the “variable importance in the projection (VIP)” obtained from the PLS regression has (correctly) identified the bonds in molecular fragment enclosed in square brackets, R[(CO)O]R′, as the most important in determining activity,[127] that is, essentially automatically locating the active center.
It is customary to regard the accurate prediction of pK_{a}'s as one of the initial validation steps of a new drug design approach. QTMS has moved beyond that stage and has been applied to the direct prediction of biological properties. For example, QTMS[152] has been shown to have comparable predictivity as the more traditional comparative molecular field analysis (CoMFA)[153] modeling of the toxicological indices of a series of healthhazardous lipidsoluble environmental pollutants (polyhalogenated dibenzopdioxins).[154] A regression of r^{2} = 0.91 and q^{2} = 0.86 was obtained in a QTMS modeling of the cellgrowth inhibitory activities of 15 substituted (E)−1phenylbut1ene3ones.[155] The method has also delivered the automatic identification of the region in these congeners responsible for their biological activity, namely, the site of the Michael addition.[155] A study of the cytotoxicity of oalkyl substituted 4Xphenol (X being the substituent)[133] has shown that this biological activity is accurately predicted by QTMS alone (without the inclusion of any steric information), which indicates that cytotoxicity is primarily governed electronically, a result that does not support a previous proposal by the Hansch group that the latter is more important in determining activity.[156] The reverse situation is encountered in a study of the hepatotoxicity of a series of phenols, which demonstrates that, in this case, QTMS must be combined with external hydrophobicity descriptors such as log P, (see the section entitled “Partitioning Between Immiscible Liquids and log P” below) for a successful modeling,[134] liposolubility being a known key factor in determining the hepatoxicity of phenols. More recently and similarly, it has been shown that QSAR models for the toxicity of aldehydes, another class of environmental pollutants, to a model organism (Tetrahymena pyriformis) can be constructed from a proper mix of QTMS descriptors and log K_{o/w}[141] [defined in Eq. (42) and discussed in the section entitled “Partitioning Between Immiscible Liquids and log P”]. An earlier study has also shown that log K_{o/w} and E_{LUMO} are essential extraneous ingredients in the QTMS modeling of the toxicity of nitroaromatics, another class of pollutants, to Saccharomyces cerevisae.[136]
Bond properties as predictors of spectroscopic transitions and NMR proton chemical shifts
The calculation of UV excitation spectra accurately remains a challenge for computational quantum chemistry to this day and usually necessitate an expensive high level of explicit treatment of electron correlation. The fast empirical modeling of such key analytical characteristics of molecules is, thus, of particular significance from the practical point of view.
The first HK theorem[39] [relation (5)] has been proven for nondegenerate ground states, as discussed above in the section entitled “The Electron Density ρ(r) and QSAR Modeling”. The mapping expressed in expression (5) shows, however, that the groundstate density, ρ(r), does specify completely the Hamiltonian operator (Fig. 2). By specifying and through the timeindependent manyparticle Schrödinger equation, the groundstate density ρ(r) in principle also determines the excited states and their properties. In other words, the energies of the excited states are also functionals of the groundstate density, even though they are not functionals of the corresponding excited state densities since there exists different external potentials that can yield the same excited state density (i.e., there is no HK theorem for excited states).[157]
It is not entirely surprising, thus, to realize that (approximations to) the groundstate density and groundstate geometry, both of which are mapped to the groundstate external potential, contain information about excited states. In this section, we show that the groundstate bond properties do indeed encode information that can be used in the QSAR modeling of UV excitation energies.
About seven years ago, Buttingsrud, Alsberg, and Åstrand (BAA) have been able to accurately predict λ_{max} and the corresponding excitation energies (ΔE_{hν}) of 191 substituted azobenzene dyes from two sets of groundstate QSAR models: The first is based on optimized groundstate bond lengths, and the second on QTAIM BCP descriptors of the ground state.[146] The molecular set studied has the following general structure (Scheme 3):
where some of the R's may belong to condensed cyclic systems.
Several PLS models based only on optimized groundstate bond lengths (including different subsets of bonds in the molecules) have been tested in modeling the excitation properties. Even the simplest model, the one that includes only the NN bond length, yields already an elevated r^{2} of 0.90 (λ_{max}) and a corresponding r^{2} = 0.86 (ΔE_{hν}); indicating that this bond appears essential in describing the excitation[146] and hence represents the chromophore.
The cross validated statistics of the onebond length model are q^{2} = 0.89 for the training set and q^{2}(test) = 0.91 for the test set with a root mean square error of cross validation (RMSECV) of 11.2 nm, and a root mean square error of prediction (RMSEP) of 12.6 nm. The equivalent onebond (NN) QTAIM bond property model (based on two descriptors: ▿^{2}ρ_{BCP} and ε) yields q^{2} = 0.91 for the training set and q^{2}(test) = 0.92 for the test set with a RMSECV of 9.9 nm and a RMSEP of 12.3 nm.
The best QSAR of all in both sets of models are those that includes all bonds, in which case the statistics are (bond length model/QTAIM model): q^{2} = 0.94/0.97, q^{2}(test) = 0.96/0.98, RMSECV = 8.0/5.4 nm, and RMSEP = 7.6/5.4 nm. The above observations are consistent with the trends obtained from an inspection of Table 2 of Ref. 146 that (i) generally, QTAIM modeling yields more robust statistics than the modeling based on bond lengths only, and that (ii) there is an improvement, albeit modest, from the simplest QTAIM model (based only on the NN bond) to the most complex based on all bonds, a small gain at the cost of a significant increase in the complexity of the model by the empirical finetuning of the perturbation of the chromophore by its surroundings beyond the perturbation captured in the mere change in the NN bond length and properties.
Compound families  na  r^{2}  Average deviation  Maximum absolute deviation 



Aliphatic carboxylic acids  19  0.98  0.10  0.34 
Substituted phenols  20  0.92  0.21  0.60 
Substituted anilinium ions  12  0.96  0.14  0.27 
3,4Substituted pyridinium ions  11  0.97  0.33  0.68 
Hydrated aliphatic acids, substituted benzoic acids, and substituted phenols  53  0.99  0.22  0.62 
In another study, BAA use QTAIM bond properties to predict 170 distinct NMR proton isotropic shieldings in 60 substituted benzenes obtained from the direct electronic structure calculation. The 60 molecules were grouped into four separate groups, with some overlap (some molecules are members of more than one group), according to the nature of the substituents: group I, II, III, and IV, which include 12, 11, 38, and 28 compounds, respectively.[145] The purpose of the study is not to calculate the NMR shifts themselves since these are calculated directly (and more accurately) by the authors, but rather to use these data as a testcase example of a local property to be predicted by QTAIM bond properties.
Models of increasing complexity ranging from the simplest model, which includes the QTAIM bond properties of only the bond shared by the proton of interest (model a) to the most complex model that includes the properties of all bonds (model e) are tested for their predictivity. The models are further classified in three categories: Those based on bond lengths only (models B); those based on three “standard” QTAIM bond properties, namely, ρ_{BCP}, ▿^{2}ρ_{BCP}, and ε, (model S); and finally those that include more QTAIM bond properties, a group of models termed “extended” (model E). An examination of Table 4 of Ref. 145 reveals a general and significant improvement in the S models over the B models, but only a marginal improvement over the S models on passing to the E models. This observation is consistent across the table, hence we quote here only the extreme values for the simplest model a (onebond only) and the most complex model e (all bonds) applied to group I of 12 molecules exhibiting 20 distinct proton shieldings in the order B/S/E: model Ia: q^{2} = 0.6/0.98/0.98, RMSECV = 0.12/0.027/0.025 ppm, and model Ie: q^{2} = 0.995/0.997/0.996, RMSECV = 0.014/0.011/0.012 ppm.[145]
In some cases, the simplest model a fails completely to capture the proton shielding (group II of chlorobenzenes). The modeling in this case fails presumably due to the size and relatively large electron population of chlorine (compared to fluorine, for example) and which may cause “through space” shielding that invalidates the BCP modeling. Again, as in QTMS, bond properties can be used in electronic fingerprinting but fail to capture size effects. Even in these failed cases, the accuracy of the modeling is quickly recovered in models of the complexity c (seven bonds) and beyond for this group of molecules. For a comparison (B/S/E): model IIa: q^{2} = 0.1/0.5/0.0, RMSECV = 0.11/0.11/0.08 ppm, and model IIe: q^{2} = 0.97/0.96/0.96, RMSECV = 0.019/0.022/0.020 ppm.[145]
QTAIM Atomic Properties in QSAR
These properties are more expensive to obtain than bond properties since their calculation involves 3D numerical integrations over atomic basins [Eq. (11)]. However, exploitation of transferability and additivity and the use of precalculated database libraries of transferable atomic and fragment properties can virtually eliminate computational time once these libraries have been constructed, as in Breneman's TAE approach outlined above in the section entitled “The Electron Density, Transferability, and Additivity”.
Atomic energy and charge of the acidic hydrogen as predictors of pK_{a}
Adam demonstrated that the pK_{a} of a weak acid is inversely proportional to the QTAIM energy of the dissociating (acidic) hydrogen in the parent undissociated acid molecule at a given set of thermodynamic conditions.[158] The milestones of his demonstration are summarized here.
The pK_{a} of an organic acid is related to the Gibbs energy of the acid dissociation reaction ( ) and is expressed:
Reexpressing ΔG^{0} in terms of the molar dissociation energy (U^{0}) and the standard molar partition functions [q^{0}; not to be confused with atomic charge, which is given the symbol q(Ω)], we get:
where
and where
The last term in Eq. (17), defined in Eq. (18), depends on the characteristics of H_{2}O and H_{3}O^{+} and the temperature and, hence, is independent of the nature of the acid. The assumption implicit in Eq. (19) is expected to be more valid the larger the acid molecule.[158] Substituting approximation (19) into Eq. (17):
The molecular dissociation energy is given by:
where E is the bottomofthewell energy of dissociation, E^{ZPE} the zeropoint vibrational energy, E_{i} is the electronic energy of the separated ith groundstate atom. The numerator of the first term of the R.H.S. of Eq. (20) can then be reexpressed as:
in which and E_{HA} are the vibrationless total energies of the anion and the undissociated acid, respectively. The difference is expected to be small and approximately constant for a homologous series, and E_{H} is the energy of free groundstate hydrogen atoms, which is another constant, leading to a reexpression of Eq. (20) as:
where C is a new constant.
Assuming that the atomic energy of the dissociating hydrogen atom can be approximated as the difference between the total energies of the anion and the neutral undissociated acid, especially for a larger acid molecule:[158]
then,
It is important to realize that this result is independent of the particular model chemistry, which also may or may not incorporate solvation effects. The model embodied in Eq. (25) is validated by an excellent agreement with experiment. For instance, the following equation is the result of a statistical fitting of the pK_{a}'s of 53 unrelated organic acids to Eq. (25):[158]
which yields an r^{2} of 0.991.
The correlation of the experimental and calculated pK_{a} values and the correlation of pK_{a}'s with the QTAIM atomic energies of the acidic hydrogen are displayed in Figure 5. The statistical correlation indicators reported by Adam[158] based on Eqs. (25) and (26) for different sets of molecules are collected in Table 2.
Adam uses DFT [PW91/6311+G(d,p)] model chemistry from which he obtains the atomic energies of the acidic hydrogen,[158] raising the questions of (i) the physical meaning of QTAIM atomic energies obtained from DFT calculations and (ii) the effect of using a different level of theory with the possible result of obtaining significantly different atomic energies. The first question is addressed in detail in the Appendix of Ref. 159. As for the second question, QTAIM atomic energies that are numerically obtained by the same integration procedure but from different underlying electronic structure theories (such as Hartree–Fock, Møller–Plesset perturbation theory, or DFT) are very strongly correlated statistically (r^{2} typically = 0.99) and hence, from the point of view of statistical modeling, atomic energies contain essentially the same information no matter what underlying computational level of theory has been used and despite of their possibly significantly differing magnitudes.[32] References 30 and 31 contain additional discussions of the effects of underlying electronic structure calculation levels of theory on QTAIMderived properties.
Simultaneous to the appearance of Adam's paper, Hollingsworth, Seybold, and Hadad (HSH)[160] report QSAR models to predict the pK_{a} values of a series of substituted benzoic acids from models based on the atomic charges of the carboxylic group and of the acidic hydrogen, using several different definitions of atomic charges including that of QTAIM. Various data from the HSH paper are assembled in Table 3, which lists the pK_{a}'s against the Hammet constants [defined in Eq. (12)], and the total QTAIM charges of the acidic hydrogens, q(H), and of the carboxylic acid group, q(COOH). As can be seen from the table, Hammett constants are the better predictors in this case (r^{2} = 1.000), which is expected since they are constructed to deliver pK_{a}'s of substituted benzoic acids, but the QTAIM charges of the acidic hydrogens are also excellent predictors of the respective acidities (r^{2} = 0.913):[160]
which is in line with the simple intuition that an acid with a more electropositive acidic hydrogen (greater positive charge, q(H) > 0) is more acidic and, thus, has a lower pK_{a} value, as reflected in the negative coefficient of the second term.
Substituent  pK_{a}  σ  q(H)  q(COOH) 



H(benzoic acid)  4.19  0.00  0.5803  −0.1518 
mBromo  3.81  0.39  0.5840  −0.1189 
mChloro  3.83  0.37  0.5838  −0.1233 
mCyano  3.60  0.56  0.5865  −0.1090 
mFluoro  3.87  0.34  0.5830  −0.1279 
mHydroxy  4.08  0.12  0.5806  −0.1439 
mMethoxy  4.09  0.12  0.5798  −0.1501 
mMethyl  4.27  −0.07  0.5796  −0.1571 
mNitro  3.49  0.71  0.5868  −0.1008 
pBromo  3.97  0.23  0.5843  −0.1316 
pChloro  3.98  0.23  0.5826  −0.1388 
pCyano  3.55  0.66  0.5859  −0.1216 
pFluoro  4.14  0.06  0.5818  −0.1422 
pHydroxy  4.57b  −0.37  0.5790  −0.1642 
pMethoxy  4.47b  −0.27  0.5783  −0.1628 
pMethyl  4.37  −0.17  0.5792  −0.1501 
pNitro  3.43  0.78  0.5864  −0.1220 
r^{2}c  0.9985  0.9134  0.8698 
V^{0} (Exptl.)  V^{0} (Calc)  V(vdW)  CSI  

Amino acid  
gly  43.3  41.9  47.25  0.009 
ala  60.5  57.8  212.22  0.221 
ser  60.6  62.0  277.93  2.707 
cys  73.4  76.3  406.30  0.741 
asp^{(−)}  73.8  79.0  473.10  5.083 
thr  76.9  77.3  435.30  2.805 
asn  78.0  80.5  493.10  5.487 
pro  82.8  81.4  461.70  1.070 
glu^{(−)}  85.9  93.6  625.23  5.350 
val  90.8  87.3  519.70  0.737 
gln  93.9  95.2  644.87  5.665 
his  98.8  96.2  666.83  6.982 
met  105.4  106.9  715.79  0.275 
ile  105.8  101.9  670.83  1.010 
leu  107.8  102.2  674.48  1.019 
lys^{(+)}  108.5  107.8  760.30  4.175 
phe  121.5  122.5  879.93  0.698 
tyr  123.6  127.3  947.65  2.826 
arg^{(+)}  127.3  119.1  926.27  9.679 
trp  143.9  146.3  1146.63  3.238 
Group  
NH  11.6  10.7  126.83  1.826 
CO  13.1  12.7  167.78  3.119 
NH_{2}  15.4  16.4  175.63  2.013 
CH_{2}  15.9  17.4  150.49  0.309 
COOH  25.8  25.5  306.26  5.077 
CH_{3}  26.5  25.5  213.79  0.277 
CH_{2}OH  28.2  27.7  277.93  2.707 
CONH_{2}  28.8  29.6  344.49  5.363 
There is a striking consistency and similarity in the form of Eqs. (27) and (26), which deserves a comment. Atomic energies are always negative (E(Ω) < 0, always), and they usually become more negative with an increase in the electron population [N(Ω) = Z_{Ω} − q(Ω)] of a given atom. This is especially true for electropositive hydrogen atoms where the intraatomic electron–electron repulsion is nonexistent. It is, thus, reasonable that the intercept in Eq. (26) is also negative leading to a lower pK_{a} for an acid with more electropositive acidic hydrogen since it has a higher (less negative) energy, just as HSH result, Eq. (27), suggests.
QTAIM descriptors of bulk and chargeseparation for the prediction of partial molar volumes (PMV)
The partial molar volume (PMV, or V^{0}) is the rate of change of the volume of a solution with respect to the number of moles n of solute, extrapolated to infinite dilution. This quantity has the units (L^{3} mol^{−1}) and for a twocomponent system is defined:
where V is the volume of the solution at a given temperature (T), pressure (P), and number (n) of moles of solvent. Equivalently, V^{0} is the first derivative of the chemical potential of the solute with respect to the pressure.[161] Experimentally, the “apparent” PMV ( ) is measured, which only equals V^{0} at infinite dilution (to eliminate solute–solute association) since[162]
where the concentration of the solute is expressed in molality (m), and thus, the second term of Eq. (29) vanishes only at infinite dilution (when m = 0). All reported experimental PMVs are to be understood, thus, as extrapolated values at infinite dilution.
PMVs are additive and can be estimated from group additivity schemes,[162164] paralleling the behavior of QTAIM atomic properties expressed in Eq. (11). Further, the observed V^{0} can be considered as the resultant of two principal opposing contributions: A positive intrinsic steric bulk contribution due to the physical space occupied by each solute molecule, which displaces the surrounding solvent molecules ( ), and a generally negative electrostriction contribution ( ) due to the shrinking of the solvent shells surrounding the solute when both the solute and the solvent are polar ( can at times be positive if the solvent–solvent attraction is stronger than the solute–solvent attraction). Ignoring other contributions such as the temperaturedependent solute molecular vibration contribution (positive), we can write[163]
which can be expressed in terms of QTAIM atomic contributions (after substituting the symbol for approximate equality by that of equality for simplicity):
leading to the definition of an atomic contribution to the PMV as
where the atomic intrinsic volume is identified with the QTAIM atomic volume up to the ρ= 0.001 a.u. isodensity envelope, namely, the van der Waals volume , leading to the equivalent definition for a group (or a molecule) R:
where the sum runs over all the atoms in R.
The atomic electrostriction contribution in water is equated in this modeling to the magnitude of the atomic charge since a charge of either sign will pull water molecules toward the solute (with reversed orientation), ignoring the solvent–solute and electric charge signdependent difference in the magnitude of electrostriction. The justification of this assumption is empirical, that is, the ability of the resultant model(s) to fit experimental data with accuracy. For a molecule or a group of several atoms R, we identify the charge separation index (CSI),[165] namely the sum of the magnitudes of the atomic charges, as its electrostriction term:
where multiplication of the CSI by a constant of unit magnitude and of dimensions [volume/charge] is implied.
Figure 6 displays the atomic charges on the side chains of a nonpolar amino acid (alanine) and a polar amino amino acid (serine) of roughly similar molecular size. The total charge on the two side chains (R) is +0.08 indicating very little charge transfer to the C_{α}H_{α}(NH_{2})COOH group. The CSI clearly distinguishes the polar and nonpolar side chain being 0.58 and 2.70 a.u., respectively.
One can construct a QSAR model on the basis of the two contributions described above to model the experimental PMVs of the hydrogencapped genetically encoded amino acids side chains[162] leading to[80]
where the subscript AA refers to an entire amino acid and R refers to its side chain.
The agreement between the experimental and calculated values according to this equation can be gleaned from the upper section of Table 4 and upper plot of Figure 7. The agreement between the calculated and experimental values is remarkable especially that only two parameters were used in the fitting of 20 PMVs (with a ratio of data points to parameters = 10:1). This agreement between the QSAR model and experiment extends to the PMVs of the chemical groups composing the amino acids:[80]
where is the empirical additive group contributions to the PMV,[162] and the subscript G is a short for “group.” The bottom part of Table 4 and bottom plot in Figure 7 illustrate the agreement between the empirical and calculated PMV of the groups composing the naturally occurring amino acids according to model (36).
The coefficients of the last two terms of the R.H.S.' of Eqs. (35) and (36) exhibit the expected signs: Positive for the intrinsic volume terms and negative for the electrostriction terms. There is, however, a subtle but important difference between these two equations: The former has a significant intercept while the latter has practically none. The intercept of about 37.3 cm^{3} mol^{−1} in Eq. (35) represents the (predicted) PMV of the group which is common to all amino acids and which has not been directly modeled in this work. The (experimental) contribution of that group to the PMV can be estimated using the additivity property of PMVs and the experimental values listed in Table 4:
The experimentally estimated value of 34.0 cm^{3} mol^{−1} is gratifyingly close to the intercept of Eq. (35). Further, the estimated PMV of the zwitterionic group, 37.3 cm^{3} mol^{−1}, is significantly smaller than that of a hypothetical nonzwitter ionic group estimated to be 52.7 cm^{3} mol^{−1} using Eq. (36) and the group values listed in Table 4, and this is also expected due to electrostriction shrinkage accompanying the chargeseparated zwitterionic form.
Another important class of biomolecules are the five nucleic acid bases: Adenine (A), guanine (G), cytosine (C), thymine (T), and uracil (U). Lee and Chalikian[166] determined accurately the PMVs of the five corresponding nucleosides as well as those of free A, C, T, and U bases at 18 and 55°C (the PMV of the free G base was not determined due to experimental impediments). In the following model, it is shown that the QSAR modeling using again an intrinsic volume term and an electrostriction term is sufficiently sensitive to capture not only the PMVs but also their small changes as a function of a temperature change of 37°.
The contribution of the sugar moiety to the PMV of the nucleosides can be evaluated using the additivity and transferability of groups contributions by assuming:[167]
The entries in Table 5 demonstrate the simultaneous constancy (transferability) and additivity of the contribution of the sugar moiety to the PMV (80.4 ± 1.0 cm^{3} mol^{−1}), Eq. (38), a value unaffected by the temperature change from 18 to 55°C within the combined experimental and statistical precision. The empirical PMV of guanine that has not been directly measured can be estimated from the difference between the PMV of guanosine minus the constant contribution of the sugar moiety, that is[167]
Molecule  V^{0} (18°C) (Exptl)  V^{0} (18°C) (Calc)  V^{0} (55°C) (Exptl)  V^{0} (55°C) (Calc) 

Experimentala  
Free bases  
Uracil  70.2 ± 0.4  69.1  74.6 ± 0.5  74.7 
Cytosine  72.4 ± 0.4  74.0  75.8 ± 0.5  79.1 
Thymine  86.3 ± 0.4  86.0  91.6 ± 0.5  90.3 
Adenine  88.0 ± 0.4  87.5  93.5 ± 0.5  92.2 
Guanineb  91.6 ± 1.0b  91.8  97.5 ± 1.2b  96.6 
Nucleosides  
Uridine  150.7 ± 0.6  154.8 ± 0.7  
Cytidine  152.2 ± 0.6  156.4 ± 0.7  
Thymidine  166.4 ± 0.6  170.4 ± 0.7  
Adenosine  169.2 ± 0.6  175.3 ± 0.7  
Guanosine  172.0 ± 0.6  177.8 ± 0.7  
Differencesc  
Sugar (U)  80.5 ± 1.0  80.2 ± 1.0  
Sugar (C)  79.8 ± 1.0  80.6 ± 1.0  
Sugar (T)  80.1 ± 1.0  78.8 ± 1.0  
Sugar (A)  81.2 ± 1.0  81.8 ± 1.0  
Sugar(average)  80.4 ± 1.0  80.4 ± 1.0 
With a complete data set of PMV of all five nucleic acid bases, we can now proceed as before and model their PMV with an intrinsic and an electrostriction term to obtain[167]
and
The excellent agreement of the modeling at both temperatures can be seen from the listings in Table 5 and the plots of Figure 8. The increase in the average molecular kinetic energy and population shifts to higher vibrational levels increase the effective molecular volume at the higher temperature, which is reflected in the larger effective PMV at 55°C, an effect wellreproduced by the fitting.
Partitioning between immiscible liquids and log P
A solute in excess at equilibrium with two immiscible liquid phases sharing an interface will result in saturated solutions containing different concentrations of the solute in each of the two phases. In the absence of an excess of solute, the solute will be partitioned between the two phases in the same ratio if the activity coefficient remains approximately constant.[168] The ratio of the solute concentrations in two touching phases in equilibrium, C_{1} and C_{2}, respectively, namely the equilibrium constant K is also known as the distribution ratio, the distribution coefficient, or more commonly as the partition coefficient (P_{o/w} or simply P). Often the two solvents include an organic hydrophobic/oily phase (often octanol) and an aqueous phase. Thus, the partition coefficient, defined as
measures the tendency of a compound to be preferentially distributed in the organic or in the aqueous phase. The concentrations (often molarities) rather than activities are used in the definition of P since at equilibrium the activities of the solute in both the organic and the aqueous phase are equal and in this case K would always be unity, which is not useful.
Due to the form of thermodynamic equations, it is more common to use log P in QSAR modeling rather than P itself. The value of log P is smaller than zero when the concentration of the solute is higher in the aqueous phase, that is, in the case of hydrophilic solutes, and conversely it is positive for hydrophobic solutes.
The partition coefficient plays a central role in pharmaceutical sciences and technology as it governs the absorption, distribution, and binding of drugs and determines a diversity of phenomena from the activity of narcotic drugs to the efficiency of solvent extraction of a compound from its solution in another solvent.[168] For these reasons, extensive compilations of log P obtained from experimental measurement and/or estimated from group contributions have long been available.[2] The importance of log P has been underscored, more recently, in the Lipinski's rule of five obeyed by the majority of orally active drugs. The rule states that orally active drugs must satisfy at least three of the following criteria:[169]
 A molecular mass less than 500 D.
 No more than five hydrogen bond donor groups.
 No more than 10 hydrogen bond acceptor groups.
 An octanol–water partition coefficient of less than 5 (log P < 5).
Drugs that significantly violate these rules, except compounds that are substrates for biological transporters, are poorly absorbed and have to be administered by other routes, for example by injection.[169] (The rule acquired its name apparently because all the numbers that appear in it are multiples of five).
Partitioning between two phases across an interface such as a biomembrane is common in the living system. For example, the serum of blood in contact with the neuronal membranes constitute the blood–brainbarrier that must be crossed by any drug intended to act inside the brain tissue. Another example is the necessary ability of a molecule to cross the mucosa of the small intestine before reaching the systemic circulation. This passage through the intestinal mucosa is frequently accomplished by passive diffusion through partitioning of the drug between the aqueous phase in the lumen of the intestine, the oily phase of the mucosa, and finally again an aqueous phase on the other side of the mucosa, that is, the blood serum.
Soluble serum proteins are often exploited as drug reservoirs for the slow and steady release of sparingly soluble drugs, the steady plasma concentration being maintained due to the partitioning with the protein reservoir. The equilibrium constant of binding of a molecules to a protein, K, can be predicted accurately from P by the equation:[2]
where k_{2} and k_{4} are dimensionless fitting parameters, which have the values 0.751 ± 0.007 and 2.301 ± 0.15, respectively, when the protein is bovine serum albumin (n = 42 compounds, the standard error = 0.159 log K units, and an r^{2} = 0.960).[11]
A pharmacological endpoint is often expressed at the L.H.S. of equations similar in form to Eq. (43) [Eq. (1)] with a R.H.S. consisting of f(P). The linearity of Eq. (43) is deceptive since it is merely an approximation to a linear tail of a more general parabolic functional relationship between log (1/C) and log P when the range of values is enlarged. The more general relation has the form:[13, 5]
where k_{1} and k_{3} are fitting parameters, σ is the Hammett substituent constant defined in Eq. (12), and where the second and fourth terms are the R.H.S. of Eq. (43). The third term, k_{3}σ, is of importance only for molecules that possess a common skeleton with different substituents but looses its relevance when the molecular set used to construct the QSAR model do not share a common skeleton, in which case the equation simplifies to[5]
The parabolic dependence of log (1/C) on log P in Eq. (45) expresses the requirement that a molecule must possess an optimal log P, referred to as log P^{0},[170] for maximal pharmacological activity—higher or lower log P results in a reduction in activity. The optimal log P ensures that the solubility in the oily phase is just enough to cross biological membrane barriers but not elevated enough to hinder the eventual release of the drug from the membrane to be redistributed on the other (aqueous) side where the site of action is located. Consistent with these remarks is the observation that a log P^{0} ≈ 2.3 seems to be a universal requirement for a molecule to be a general anesthetic,[170] which is the value obtained from the fitting of the activities of ethers for which k_{1} = 0.22, k_{2} = 1.04, and k_{4} = 2.16.[170]
The partition coefficient [Eq. (42)] may be expressed as a ratio of the equilibrium mole fractions (rather than molarities) in which case it is labeled with a prime (P^{′}). The reason for this modified definition is the appearance of P^{′} in the thermodynamic expression of the standard free energy of transfer of a solute from solvent 1 to solvent 2:[2]
which embodies the experimental procedure of measuring the Gibbs energy of transfer from one solute to the other from partitioning experiments.
Conceptually, the most drastic partitioning between two phases is the partitioning of a solute between its gasphase and its aqueous solution phase. The Gibbs energy of transfer from the gasphase to the aqueous medium measures the affinity of a given solute to water, a property crucial in determining proteins folding in water.[171] The molar free energy of transfer from the gas phase to the aqueous phase, termed the molar free energy of hydration or “hydration potential” (ΔG_{hydr.}), has been accurately determined by Wolfenden et al.[172] at 25°C after correction to pH 7 for each of the hydrogencapped amino acid side chains ((R); since the amino acids themselves have very small vapor pressures due to their zwitterionization in water). These hydration potentials are accurately modeled using a single fitting parameter that measures the hydrophilicity/hydrophobicity, namely, the CSI[80] [defined in Eq. (34)]:
The experimental and calculated ΔG_{hydr.} are listed along with CSI_{R} in Table 6, which is sorted in order of decreasing CSI (hydrophilicity). The correlation of the experimental and calculated values obtained from Eq. (47) is displayed in the top of Figure 9.
AA  Descriptors  Modeled properties  

mRNA 2nd letter  Baseb  
CSI_{R}  V(vdW)_{R}  exptl.  calc.  exptl.  calc.  exptl.  calc.  


arg^{+}  9.679  926.27  G  I  −19.92  −19.28  −14.92  −12.65  −1.32  −0.58 
his^{+}  6.982  666.83  A  I  −10.27  −13.42  −4.66  −9.13  0.95  −0.42 
gln  5.665  644.87  A  I  −9.38  −10.57  −5.54  −6.70  −0.07  −0.06 
asn  5.487  493.10  A  I  −9.68  −10.18  −6.64  −7.39  −0.01  −0.42 
glu^{–}  5.350  625.23  A  I  −10.20  −9.88  −6.81  −6.22  −0.79  −0.01 
asp^{–}  5.083  473.10  A  I  −10.95  −9.30  −8.72  −6.73  −0.34  
lys^{+}  4.175  760.30  A  I  −9.52  −7.33  −5.55  −2.99  0.08  0.73 
trp  3.238  1146.63  G  I  −5.88  −5.30  2.33  1.50  2.51  2.08 
tyr  2.826  947.65  A  I  −6.11  −4.41  −0.14  0.95  1.63  1.67 
thr  2.805  435.30  C  II  −4.88  −4.36  −2.57  −2.52  0.27  0.29 
ser  2.707  277.93  G/C  I/IIc  −5.06  −4.15  −3.40  −3.41  0.04  −0.11 
pro  1.070  461.70  C  II  −0.59  1.06  0.91  
leu  1.019  674.48  U  II  2.28  −0.48  4.92  2.62  1.76  1.51 
ile  1.010  670.83  U  II  2.15  −0.46  4.92  2.61  2.04  1.50 
cys  0.741  406.30  G  I  −1.24  0.12  1.28  1.33  0.87  
val  0.737  519.70  U  II  1.99  0.13  4.04  2.11  1.18  1.18 
phe  0.698  879.93  U  II  −0.76  0.21  2.98  4.66  2.09  2.17 
met  0.275  715.79  U  II  −1.48  1.13  2.35  4.36  1.32  1.86 
ala  0.221  212.22  C  II  1.94  1.25  1.81  1.02  0.52  0.51 
gly  0.009  47.25  G  I  2.39  1.71  0.94  0.31  0.00  0.13 
It has long been recognized that the second position of the triplet genetic codon of an amino acid is the most important in determining the physical properties of that amino acid.[173] By sorting the amino acids on the basis of the hydration potentials of the side chains, Wolfenden et al.[172] noted that in the mRNA genetic code, hydrophilic amino acids have a purine base in the second position (A or G) while hydrophobic amino acids tend to have a pyrimidine base (U or C) in the second position. Since the hydrophilicity/hydrophobicity of the amino acid side chains is highly correlated to the CSI, one can glean from Table 6, a correspondingly strong correlation between the chemical nature of the middle letter of the triplet genetic codon and the CSI of the side chain of the encoded amino acid. All amino acid side chains with 9.68 > CSI_{R} ≥ 2.83 a.u., the hydrophilic amino acids according to CSI, possess a purine base in the second position without exceptions. Conversely, most hydrophobic amino acids with CSI_{R} ≤ 2.81 a.u. possess a pyrimidine base in the second position, with a few exceptions: (i) Serine, which is the only amino acid with a degenerate middle letter in its codons (middle letter can be either “G” or “C”), (ii) glycine, which is an exceptional amino acids in that it does not have a side chain and is the only amino acid with an achiral αcarbon, and (iii) cysteine, which is the only amino acid that dimerizes through its side chain to form SS bridges of cystine. To the author's knowledge, this is probably the most direct link between the genetic code and the electron density.
The experimental Gibbs energies of transfer of the 20 hydrogencapped amino acid side chains from cyclohexane (cyc) and from octanol (oct) to water (w) were reported by Radzicka and Wolfenden.[174] These experimental can be modeled with an electrostriction term [CSI_{R}, Eq.(34)] and an intrinsic volume term [V(vdW)_{R}, Eq.(33)]:[15, 80, 167]
and
where energies are in kcal mol^{−1}, and CSI values and volumes are in a.u. Table 6 provides a listing of the experimental and calculated values corresponding to Eqs. (48) and (49) along with a listing of the amino acid descriptors CSI_{R} and V(vdW)_{R} used in the modeling, while Figure 9 (middle, and bottom) displays the agreement between calculated and experimental values graphically. A number of other remarkable models were reported where CSI_{R} and V(vdW)_{R} could be used in the accurate prediction of the change in the stability of proteins on mutation, that is, the change in Gibbs energy upon denaturation of the mutant compared to this change for the wild type.[80, 167]
The versatility of the CSI index to model widely differing properties is quite notable. Recently, SánchezFloresy et al.[175] report accurate modeling of excitation energies in simple molecules using this single descriptor. For example, these workers show that the excitation energies for the lowlying singlet and triplet π→π* states of CO can be closely fitted to the following linear regression equation:[175]
while transition energies between lowlying singlet and triplet π→π* states of benzene can be fitted to:
again using the groundstate CSI, which reinforces the arguments presented above in the opening of the section entitled “Bond Properties as Predictors of Spectroscopic Transitions and NMR Proton Chemical Shifts”.
A general conclusion that can be drawn is that the QTAIM groundstate CSI is capable of capturing a good deal of important molecular physics, in the QSAR sense, and is an important and valuable descriptor worthy of note and further exploration.
πStacking of antineoplastic agents (Casiopenas^{®}) and charge transfer from flanking DNA base pairs
Casiopeinas^{®} are a promising class of antineoplastic cytotoxic agents of uncertain mechanism of action.[176] These complexes are denoted [Cu(NN)(NO)]NO_{3} and [Cu(NN)(OO)]NO_{3} where NN symbolizes a substituted bipyridine or phenanthroline aromatic ring system, NO an αaminoacidate or peptide, and OO acetylacetonate or salicylaldehyde. These compounds consist of tertradentate complexes of Cu^{2+} where the metal is chelated to a substituted bipyridine or phenanthroline ring from one side (top moiety in the structure below) and to an αaminoacidate, a peptide, an acetylacetonate, or a salicylaldehyde moiety from the other side (bottom moiety). The general structure of these molecules can be represented by Scheme 4,
where the light gray part is variable, existing in the phenanthroline congeners but nonexisting in the bipyridine congeners, and where R_{1} and R_{2} are parts of a closed ring system where the two atoms bonded to the central Cu^{2+} may be both oxygen atoms or one can be an oxygen and the other a nitrogen.
GalindoMurillo et al.[177] have recently suggested that these complexes can intercalate between DNA base pairs through their aromatic moiety and that the πstacking interaction is driven by an electron depletion of the planar ligand (the substituted bipyridine or phenanthroline ring) due to the transfer of charge to the metal center which, in turn, drives charge transfer from the flanking DNA bases to the intercalating ligand. Using the integrated QTAIM charges summed over entire molecular fragments, these workers found a simple but strong statistical correlation between the complex stabilization energy of the adenine–Casiopeinas^{®} complex and the net electron population transferred from adenine to the aromatic ligand:[177]
which lends numerical support to the chargetransfer assisted πstacking hypothesis advanced by these workers and, simultaneously, enhance the plausibility of their proposed mechanism of action initiated by stacking intercalation of Casiopeinas^{®} between DNA base pairs. What is desirable for a future study is, perhaps, to correlate energies of stacking interaction with a direct measure of biological antineoplastic activity.
Before leaving this section, we note that there exists a number of similar empirical correlations of πstacking interactions modeled with QTAIM (bond) properties, we cite as examples those of Zhikol et al.,[178] Platts and coworkers,[179] and Parthasarathi and Subramanian.[180]
Localization/Delocalization Matrices (LDM), Delocalization Matrices (DM), and DensityWeighted Connectivity Matrices (DWCM) in QSAR
Definition of LDM (ζ), DM, and DWCM
Among the properties that can be associated with a bounded region of space is the number of electrons localized within that region. When this region is an atom in a molecule defined by QTAIM, Ω, the average number of electrons localized within this atom, that is, not shared with any other atom in the molecule, is termed the “localization index” (LI, for short) and is given the symbol Λ(Ω). Since not all electrons are localized, in general Λ(Ω) ≤ N(Ω), the equality is approached at the closedshell limit as in ionic bonding and is exact only for infinitely separated isolated atoms or ions. For an illustration, in the case of the ionic molecule Li^{+0.935}F^{−0.935} molecule (the superscripts are the atomic charges): [Λ(F) = 9.845] < [N(F) = 9.935] with a difference of N(F) − Λ(F) = 0.090. In contrast, for the covalent F_{2}, the corresponding values are: [Λ(F) = 8.549] < [N(F) = 9.000] with a significantly larger difference of N(F) − Λ(F) = 0.451. (QCISD/6–311++G(3df,2pd) level of theory).
The difference between N(Ω) and Λ(Ω) accounts for the number of electrons belonging to Ω but shared with other atoms in the molecules. In the case of Li^{+0.935}F^{−0.935}, 0.090 e^{–} are contributed by F and an equal number is shared from the Li with a total of 0.180 e^{–} shared between the two basins [denoted by δ(Li,F) and which is termed the “delocalization index” (DI), in QTAIM parlance]. In this manner, all electrons in the LiF molecule are accounted for: [Λ(Li) = 1.975] + [Λ(F) = 9.845] + [δ(Li,F) = 0.180] = 12.000 e^{–}.
Thus, QTAIM defines a DI that counts the number of electrons shared between any two atomic basins in a molecule and which is generally denoted by δ(Ω_{1},Ω_{2}). The formal definition and methods of calculation of the LI and DI are discussed elsewhere.[118, 181187]
Two important points must be emphasized before moving to examples of applications. The first is that, as mentioned above, LI and DI account for the whereabouts of all electrons in a molecule composed of n atoms [Ω_{i}, i = 1,2,.n], their general relation to an atomic electron population being:
Given expression (53), the total molecular electron population can be expressed as the sum of two subpopulations:
where
and
The second point, which directly follows from Eq. (53), is that the DI measures the number of electrons (and not the number of pairs of electrons), an erroroneous characterization that has propagated in many publications (including some by the present author). A source of the confusion may be that when two atoms Ω_{i} and Ω_{j} are bonded the number of electrons shared between them, δ(Ω_{i},Ω_{j}), is numerically close to the number of shared pairs in the Lewis bonded structure and hence can be misconstrued as a bond order. One can assert, though, that δ is generally proportional to the classical bond order when two atoms share a bond path. A simple example helps settle this confusion. For the H_{2} molecule, δ(H,H^{′}) ≈ 1.0 and Λ(H) = Λ(H^{′}) ≈ 0.5, which means that N_{loc} = 2 × 0.5, leaving only 1 e to be delocalized (shared).
It is also worth reminding the reader that δ(Ω_{i},Ω_{j}) is not only defined between bonded atoms but also between any two atoms in a molecule, no matter how distant and regardless of the presence or absence of a bond path linkng their nuclei. Obviously the DI cannot be related to any bond order if the atoms do not share a bond path, that is, when they are not bonded in the first place.
FerroCostas, Vila, and Mosquera (FCVM)[188] have recently demonstrated that the anomeric effect in halogensubstituted methanols cannot be explained by hyperconjugation arguments based on the behavior of atomic populations and the QTAIM localization/delocalization indices. These authors have presented lowertriangular matrixlike tabulations of the delocalization indices and have used differences between these matrices in their argumentation.[188]
Given that the δ indices were found to be excellent predictors of experimental quantities such as NMR JJ coupling constants between protons[185] and fluorine atoms,[187] it is tempting to construct matrices similar to those of FCVM and use them as tools in QSARtype studies. Initial explorations of these ideas are presented here for the first time, meanwhile more extensive studies in this direction are being undertook in the author's laboratory.
Disregarding the nature of the matrix elements, the form of the matrices used by (FCVM)[188] is identical to the oftused graphtheoretical connectivity (adjacency) matrices. Connectivity matrices have been brought from the field of graph theory, a branch of pure mathematics, to mathematical chemistry by scientists such as Balaban, Gutman, Hosoya, Nicolić, Randić, Trinajstić, and several others.[6, 7, 51, 189193] In a typical graphtheoretic connectivity matrix, one places 1 or 0 depending on whether two atoms are bonded or not in the hydrogensuppressed graph. Atomconnectivity matrices have been proposed that do not suppress the hydrogen atoms and that include as offdiagonal entries properties that depend on atomic pairs such as interatomic distances, vibrational force constants, or the bond dissociation energies while atomic symbols are introduced along the diagonal elements.[51]
A natural extension of the graphtheoretical approach is to merge it with the topological analysis of the electron density. The idea does not appear to be new, already in 1981 Dmitriev, in a 155 page book on chemical graph theory, devotes pp. 110–115 to a section on Bader's “Topology of Molecular Charge Distributions,” but makes no connections between it and the main theme of the book.[193] In 1994, Balasubramanian has suggested the “[i]ntegration of graph theory and quantum chemistry”[194] in a paper that concludes with a review of the basic concepts of the topological analysis of the electron density according to the theory now known as QTAIM. In his paper, Balasubramanian has emphasized (i) the remarkable correspondence between the graphtheoretical graph of a molecule and the set of bond paths constituting its QTAIM molecular graph[17] and (ii) the role of the Laplacian of the electron density, ▿^{2}ρ(r), in determining the sites of nucleophilic and electrophilic attack in a molecule. The present author is unaware of other proposals to merge or integrate QTAIM and the graphtheoretical approach and presents below some thoughts on how this might possibly be achieved in the future.
Balasubramanian's integration is intuitively implemented through the construction and the use of “electron density weighted adjacency matrices (EDWAM)” (quoting the words and the idea of Massa (L. Massa, Personal communication, 2014)). An EDWAM places the value of the electron density at the BCP, ρ_{BCP}, as the offdiagonal matrix element whenever a bond path exists between two atoms instead of the “1” of the graphtheoretic connectivity matrices. There is no hydrogen suppression in the proposed EDWAM. Figure 10 displays a EDWAM for chloroacetic acid. The respective research groups of Massa and of the present author are jointly exploring the use of EDWAMs in QSARtype studies at the time of writing.
We now introduce a new matrix representation of molecules that encapsulates one and twoelectron information simultaneously and which will be referred to as “localization–delocalization matrix (LDM).” An early version of the LDM was introduced in 2001 in the documentation manual of the program AIMDELOC[195] and bears some similarity to the matrices used here and those used recently by FCVM.[188]
The diagonal elements of LDMs proposed here are the atomic localization indices Λ(Ω), as originally proposed in 2001,[195] but the offdiagonal elements are the delocalization indices divided by 2, that is, δ(Ω_{i},Ω_{j})/2, a feature distinguishing the LDMs in this work from the 2001 proposal and from that of FCVM. Further, we preserve all the elements of these symmetric matrices (not just the lower or upper triangular part). Thus, we define the LDM that we denote by ζ as
In this manner, and according to Eq. (53), the sum of any row or column of ζ equals to the given atomic electron population N(Ω), while the sum of the column sums or row sums yields the total number of electrons in the molecule [Eq. (54)]. Further, the trace of ζ is the total number of localized electrons (N_{loc}) in the molecule [Eq. (55)], and the number of delocalized electrons is, then, given by difference:
As known, δ(Ω_{i},Ω_{j}) decays exponentially with internuclear separation,[196] which implies that the ζ matrix, by including all the delocalization indices in a molecule, does implicitly contain interatomic distance matrix information. The ζ matrix is, thus, rich in coded information about not only the electron density distribution but also about the pair density and even contains some geometric distance information. For these reasons, this matrix can be expected to constitute the raw material for a new class of QSAR correlations.
All matrix representations of molecules have common and wellknown drawbacks, which we briefly discuss below with the particular reference to LDMs.
Problem 1: Nonuniqueness due to ambiguity of atom labelling (i.e., a family of matrices related by permutations all code for the same structure)
The matrices are not unique since they are labellingdependent and an interchange of a pair of atomic labels will result in the permutation of a pair of rows and the simultaneous permutation of a pair of columns. There are n! ways to label n atoms and correspondingly, there are as many different matrices describing the same system. This problem has prompted the graph theoretic community to search for “matrix invariants,” that is, properties associated with these matrices that are labellingindependent. Common invariants include the characteristic polynomial and its roots (the eigenvalues), the trace of the matrix, the determinant of the matrix and so forth. The labelling problem is of course independent of the nature of the matrix elements and hence is also relevant to the LDMs proposed here. We will hence follow the lead of the chemical graphtheoretic community and extract invariants from the LDMs at the cost of complicating the physical interpretation of the final (invariant) descriptors, which are, nevertheless, derived from wellcharacterized measures of electron localization and delocalization compactly collected in the LDMs. Since an LDM is by construction real and symmetric, then it is diagonalizable to a similar matrix D (the “diagonalized localization–delocalization matrix,” or DLDM) through a similarity transformation:
The matrix D is indispensible when the atomic numbering scheme cannot be made consistent across a molecular series that lacks, for example, a common skeleton. In these instances, the original ζ matrices of different molecules cannot be compared directly. At times, however, the ζ matrices themselves can be compared directly when a consistent labelling scheme can be imposed on all molecules (e.g., molecules with the same skeleton as in the example of substituted acetic acids discussed in the section entitled “The LDM ζ and the pK_{a} of Halogenated Acetic Acids” below). Matrices comparison can be accomplished, for example, by calculating the distance between them (e.g., the Frobenius distance, see below).
Problem 2: Unequal matrix dimensions for differently sized molecules
Unless the number of atoms in the compared molecules is the same, the matrix dimensions of ζ will differ from one molecule to another rendering the direct comparison such as calculating the Frobenius distance impossible. A solution to this problem has been proposed by White and Wilson[197] (WW) in the context of image statistical pattern recognition where different sized (sub)graphs are treated on equal footing. The approach of WW consists in determining first the matrix representative of the largest graph (n×n) and enlarge all other smaller matrices to n×n by filling all new empty matrix elements with zeros after placing the smaller matrix in the upper left block of the enlarged matrix, that is, “padding” all matrices other than the largest with zeros to the same common n×n size.[197]
The WW approach is applicable to LDMs since a zero diagonal element means that there are zero electrons localized within a given atomic basin while the zeros in the other (offdiagonal) matrix elements of the row and column that correspond to that atom imply that there are zero electrons shared between that basin and all other atoms in the molecule. This is what computational chemists refer to as a “ghost atom”. Thus, if one wishes to compare the first four saturated aliphatic hydrocarbons, the matrix size will be that of butane (14×14), and the LDMs of methane, ethane, and propane will be enlarged to that size by “zeros padding”. At the DFTB3LYP/6–311G(d,p) level of theory, and given the atomic numbering schemes in the Supporting Information, to two decimal places, an LDM of butane is:
while a corresponding (ghost atompadded) LDM of ethane is:
Problem 3: The mapping of the set of molecular graphs to the set of adjacency matrix representative can be surjective not injective (nonbijective), i.e., one and the same matrix invariant can code for more than one distinct graph
This problem is illustrated by the example recognized by Živković (Quoted through Ref. 192 due to unavailability of the original reference: T. Živković, Report on Summer School, Repino, Leningrad, 1973) and whereby two chemically distinct molecules (Scheme 5)
coded by their respective adjacency matrices yield one and the same characteristic polynomial (x^{10} – 10x^{8} + 33x^{6} – 44x^{4} + 24x^{2} – 4). Naturally, this problem will never occur when the molecules are coded by their respective LDMs or electron densityweighted adjacency matrices with matrix elements that are very different than simple ones and zeros.
Problem 4: Inability to distinguish optical isomers (enantiomeric ambiguity)
While the offdiagonal entries in LDMs are geometric distancesensitive, since the DIs decay with distance, there are no matrix representatives that can capture the relative arrangement of groups in space that determines handness. As mentioned previously in relation to QTMS, this is not a limitation as long as the experimental biological/pharmacological responses are induced using the correct optical isomer(s). For modeling of physicochemical properties, this issue is seldom relevant except in cases such as chiral chromatography where again the appropriate optical isomers must be used to generate the experimental data.
Problem 5: Conformational flexibility
The presence of multiple thermally accessible minima on conformational potential energy surfaces complicates any method that aims at extracting meaningful descriptors except when the descriptors are insensitive to the interatomic distance matrix (such as graphtheoretical indices). In principle, some form of conformational averaging has to be performed to mimic Nature, but in practice this may prove difficult especially when hundreds of molecules are under consideration. This problem is not specific to LDMs and will not be addressed here. We now explore examples illustrating the possible uses of ζ in QSAR.
The LDM ζ and transferability in aliphatic alkanes (methane, ethane, propane, and butane)
The LDM matrices for the first four members of the aliphatic alkanes homologous series have been obtained at the B3LYP/6311G** (the complete set of the ζ matrices and their eigenvalues are available in the Supporting Information). We note in passing that while the meaning of the LI and DI is strictly defined within Hartree–Fock theory, the indices obtained from DFT calculations are often used as they yield numerical values of LIs and DIs that are close to the Hartree–Fock values.
First, we investigate one way by which the similarity or dissimilarity of these four molecules can be quantified using their DLDM representations. It is possible to define a “distance” between two matrices (of the same dimensions), even though such a definition is not unique. The definition used in this work is the Frobenius distance, that is, the “Frobenius norm” of the difference matrix:
There are n(n − 1)/2 different distances d(A,B) between n molecules that may be arranged into a lower or upper triangular intermolecular distance matrix. The Frobenius distance matrix between the DLDMs (D) representing the first four members of the saturated alkane series is given in Table 7.
Methane  Ethane  Propane  

Ethane  3.2941  0  
Propane  4.8450  3.0398  0 
Butane  6.0535  4.5281  2.9155 
An examination of Table 7 reveals that the “contribution of a methylene group” to the distance of one member compared to the next in the homologous series follows expectations in the sense that the first addition of a CH_{2} to CH_{4} to obtain C_{2}H_{6} results in a somewhat anomalous change compared to the subsequent additions of CH_{2} to the growing alkane chain, and where this change appears to quickly reach a constant value (as emphasized by Bader in relation, for example, to the contribution of a methylene group to the heats of formation of aliphatic alkanes).[17] Thus, to two decimals, d(CH_{4},C_{2}H_{6}) = 3.29, d(C_{2}H_{6},C_{3}H_{8}) = 3.04, and d(C_{3}H_{8},C_{4}H_{10}) = 2.92, fast converging to an apparent asymptotic value around 3.
Since this is a relatively uncharted territory, it may be of interest to explore the geometry of this 14D comparison space. One criterion worthy of consideration is the satisfaction of the triangle inequality as emphasized by Muskulus.[198] The inequality applied to intermolecular metrics of three molecules A, B, and C is:
A violation of this inequality indicates that the geometric representation is inadequate as a proximity measure of the studied objects[198] and hence would invalidate the use of d(X,Y) as a similarity or dissimilarity criterion of molecules X and Y. Figure 11 displays the triangular distance and angular relationships between the three possible triads of molecules in this group of four aliphatic alkanes. The figure demonstrates that the inequality is clearly satisfied given the intermolecular distance matrix in Table 7. Further, the angular relationships displayed in the figure exhibit a curious regularity: The angles of the triangles formed by three consecutive members of the homologous series are almost identical (compare the first and third triangles). The almost constancy of the obtuse angle ∼99°–100° in all triangles cannot escape notice either. The significance of these observations is not evident to the author at the time of writing.
Another LDM invariant, other than the DLDM representation discussed above, is the trace tr(ζ) = N_{loc}, Eq. (58). Since in most molecules, the majority of electrons are localized within their respective atomic basins, for a given homologous series N_{loc} is roughly proportional to the total number of electrons N and is thus an extensive variable that grows with size. In other words, tr(ζ) can be expected to capture molecular properties that depend on steric bulk and/or lipid solubility. Figure 12 displays the correlation of tr(ζ) of the first four aliphatic hydrocarbons with a typical sizeextensive property, the calculated total energy E, yielding a correlation with an r^{2} = 1.000 (top), and with a lipid solubility, measured by log P, with an r^{2} = 0.994 (bottom). These preliminary results call for further investigation with many more data points to provide definitive conclusions though.
The LDM ζ and the pK_{a} of halogenated acetic acids
The LDMs of a set of six homohalogenated acetic acid (CH_{2}FCOOH, CHF_{2}COOH, CF_{3}COOH, CH_{2}ClCOOH, CHCl_{2}COOH, and CCl_{3}COOH) are compared with that of the parent unsubstituted compound, acetic acid. (LDM ζ matrices, their eigenvalues, and the atomic numbering scheme for all members of this molecular set are available in the Supporting Information). In this example, a consistent atomic numbering scheme can be adopted for all seven molecules, all of which have a common skeleton and the same number of atoms, obviating the need to diagonalize their LDMs or pad any matrices with ghost atoms, thus here the distances are calculated directly between the LDMs.
The Frobenius intermolecular distance matrix is given in Table 8. In this table, the first column, for example, lists the distance between every substituted acid from the parent unsubstituted acetic acid. A plot of these distances against the corresponding experimental pK_{a}'s is displayed in Figure 13. The figure shows that the experimental pK_{a}'s of all congeners substituted with one and the same halogen (either F or Cl) are almost perfectly linearly correlated with the Frobenius distances of their respective LDMs from that of unsubstituted acetic acid (r^{2} = 0.98 for both regression lines). The slopes of the two lines are different, however, which signals that the pK_{a} is not captured in its entirety by the LDM intermolecular Frobenius distance matrix.
CH_{i}X_{3}_{i}  CH_{3}  CH_{2}F  CHF_{2}  CF_{3}  CH_{2}Cl  CHCl_{2} 

CH_{2}F  4.52  0  
CHF_{2}  7.78  4.76  0  
CF_{3}  11.16  7.89  5.05  0  
CH_{2}Cl  9.29  7.47  8.67  10.65  0  
CHCl_{2}  14.05  11.91  10.47  11.40  9.17  0 
CCl_{3}  19.56  15.83  14.03  12.72  13.87  9.17 
The LDM is generally “biased” somewhat by its diagonal elements Λ(Ω_{i}), which typically have larger magnitudes than the offdiagonal ½δ(Ω_{i},Ω_{j}). In a homologous series, both N_{loc} [Eq. (55)] and N_{deloc} [Eq. (56)] increase roughly in proportion to the total number of electrons N [Eq. (54)]. The situation is different for a series of congeners molecules RX differing only in the identity of a substituent, say with X = F, Cl, Br, or I. In this case, the traces of the LDMs will be significantly different since these substituents will contribute a large fraction of their atomic numbers of electrons to the count yet the delocalization indices in these congeners will, at most, be slightly perturbed by the substitution. For these four congeners RX, then, the diagonal elements are the sole carrier of sizeextensive information in their respective LDMs. For a property primarily driven by electronic structure, such LDMs, biased by steric bulk, may not work just as we have seen in the case of QTMS (see the section entitled “Quantum Topological Molecular Similarity (QTMS)” above). The inability to fit the two sets of congeners of substituted acetic acids (X_{n} = F, Cl) on one line using the distances between their LDMs proves the point (Fig. 13).
We thus, define a sizeindependent electronic fingerprinting matrix representation of a molecule as the diagonalsuppressedLDM or simply the “delocalization matrix” (DM) in which all diagonal elements are suppressed to zeros (no explicit localization information). The similarity can then be measured as the Frobenius distance between DMs when the modeled properties are primarily governed by electronic structure and not by steric bulk.
The experimental pK_{a}'s are plotted in Figure 14 against the distances between the DMs of the substituted and unsubstituted acetic acids, denoted as d_{deloc}(AA,SAA), and are listed in Table 9. This sizeindependent electronic fingerprinting distance correlates much better with the experimental pK_{a}'s than the size dependent LDMsbased distances bringing both sets of congeners (X_{n} = F, Cl) together on a single exponential regression line (r^{2} = 0.979). A similar conclusion is reached within the QTMS framework that models pK_{a}'s accurately on its own without injecting extraneous sizedependent information in the model, since pK_{a} is inherently governed by electronics rather than by steric bulk. It may be concluded that the full LDMs appear to capture properties that depend simultaneously on the electronic structure and size (steric bulk) while the DMs are better tuned to capture the properties primarily dependent on electronic structure.
CH_{i}X_{3}_{i}  pK_{a}a  d_{deloc}(CH_{3}, CH_{i}X_{3}_{i}) 



CH_{3}  4.756  0.0000 
CH_{2}Cl  2.87  0.1000 
CH_{2}F  2.59  0.1122 
CHCl_{2}  1.35  0.1814 
CHF_{2}  1.33  0.2229 
CCl_{3}  0.66  0.2404 
CF_{3}  0.52  0.3470 
We return to Table 8 for some final observations. The distances between CH_{i}F_{3–}_{i}COOH and CH_{i}_{−1}F_{2–}_{i}COOH (i = 3, 2, 1), that is, two homologous members differing by one fluorine atom, are 4.52, 4.76, and 5.05 for i = 3, 2, and 1, respectively. A fluorine atom's contribution to the distance is between ∼ 4.5 and 5.1 and it changes by roughly a constant amount of ∼0.25 with i. The corresponding distances between CH_{i}Cl_{3–}_{i}COOH and CH_{i}_{−1}Cl_{2–}_{i}COOH in the case of chlorine substitutions are 9.29, 9.17, and 9.17, exhibiting an almost constant contribution to the distance for every addition of a chorine atom. The table exhibits other regularities that we skip for brevity.
Molecular Electrostatic Potential (MESP) and Field (MESF)
Bonaccorsi, Scrocco, and Tomasi[56] suggested in 1970 the use of the MESP (V(r)) to locate the molecular regions most prone electrophilic attack in a series of six small threemembered heterocyclic rings and cyclopropane. In this article, the authors calculate, plot, and analyze the topography of the MESP, for the first time, to extract more information from the calculated wavefunctions than merely atomic charges. The use of the MESP in determining the low energy (preferred) relative orientations of reactants, of guests with respect to their hosts, and of substrates or inhibitors with respect to their active sites has expanded into a wide area of computational and theoretical chemistry.[57, 58, 60, 62, 63, 201] MESP can also be calculated directly from Xray crystallographic scattering data.[2527, 61, 67] It is impossible to review this field of research here, instead, a few examples are selected for illustration as they relate to QSAR.
The total charge density expressed in Eq. (4), that is, the discrete charge distribution of the nuclei in addition to the continuous diffuse electron density, give rise to an electrostatic potential surrounding the molecule. In principle, the MESP can be calculated from[56]
in a.u., where Z_{α} is the charge of αth nucleus located at R_{α}, and where the αth term in the first sum must be eliminated when R_{α} = r (to evaluate the MESP at the position of the αth nucleus). The dimensions of V are [energy] × [charge^{−1}], but often the displays of the MESP feature contour lines that are labeled in units of energy where the multiplication by a test unit charge (unity in a.u.) is implied if not stated explicitly.
The first HK theorem tells us that ρ(r) fixes the external potential, that is, the positions and charges of the nuclei, that is, the first term in the R.H.S. of Eq. (64). Then ρ(r), which appears as the integrand of the second term, fixes the R.H.S. of Eq. (64) completely and hence there exists a bijective mapping between ρ(r) and V(r).
Before discussing a few applications, we emphasize that V(r) has the same fundamental status as a basic descriptor of the ground state as ρ(r). In 1962, Wilson has shown that the energy of a molecule can be expressed as a functional of ρ(r) and its change in response to a scaling of the nuclear charges by a continuous parameter λ.[202] Starting from this result, Politzer and Parr demonstrated that the total energy of a molecule E can be expressed as a sum of M terms of the form[203, 204]
where is the MESP at the position of the αth nucleus. The result in Eq. (65) is exact and demonstrates the fundamental role of V(r) even if it is not implementable in practice without approximations.
While the electrostatic potential of an isolated atom is always positive everywhere in space,[205] clearly in the case of molecules, according to Eq. (64), the MESP at r can have either sign that diagnose regions that are primarily dominated by the bare nuclear potential, that is, the first term on the R.H.S. of Eq. (64), or dominated by the electron density (the second term). While it is quite possible and common for neutral molecules to exhibit local minima (other than at the nuclear positions), it is impossible for any neutral molecule in the ground or excited states to exhibit true local maxima in the MESP, as demonstrated by Pathak and Gadre[206] on the basis of classical electrostatic arguments (only saddle points and directional maxima are allowed).
Minima in the MESP (V_{min}) pinpoint the sites favoring electrophilic attacks, while the magnitudes of corresponding V_{min}'s in related molecules often correlate with reactivity toward electrophiles. The converse correlation with reactivity toward nucleophiles is nonexistent due to the absence of local maxima in the MESP.
In 1982, Weiner, Langridge, Blaney, Schaefer, and Kollman (WLBSK) proposed to colorcode molecular 3D surfaces with the values of the MESP as a function of position on these surfaces.[207] In this manner, the structural 3D information conveyed by the surface of the molecule is augmented with electrostatic information provided by the color coding of the MESP on that surface, creating a “fourinformationdimensional image,” to quote Kahn, Pau, and Hehre (KPH).[208]
In the original proposal by WLBSK, the molecular surface was taken as the Connolly surface[207] (the solvent exclusion cavity in the bulk solvent). Nowadays, molecular isoelectron density envelopes are often selected for the color coding by the values of the MESP, as first suggested by KPH.[208] While true 3D maxima in the MESP are not possible, 2D local maxima on a given molecular surface, such as an isodensity isosurface, are possible.
Politzer has pioneered the application of MESP to solve chemical, environmental, and drug design problems. It is fitting then to briefly discuss one of his classical early works that provide a fast and accurate screening of the carcinogenic potential of epoxides.[201, 209] Table 10 lists a series of epoxides sorted in decreasing magnitude of the V_{min} near the oxygen atom. The table suggests a threshold of around −30 kcal mol^{−1} above which a compound is highly suspected of carcinogenic activity. This hypothesis has been confirmed by further extensive studies by Politzer and Murray[201] of over 60 epoxides, several of them calculated on the request of the US Environmental Protection Agency. The first step in DNA damage by epoxides is believed to involve protonation of the oxygen atom, which in turn promotes ring opening, which is consistent with Politzer's finding that highly negative V_{min} near the oxygen atom promotes carcinogenic activity.
Molecule  V_{min}(kcal mol^{−1})  Carcinogenic activity 




−53.4  Active 

−51.3  Active 

−47.7  Active 

−43.1  Active 

−38.1  Active 

−23.1  Inactive 

−17.1  Inactive 

−9.2  Inactive 
The second more recent example relates the highest positive MESP on the solventaccessible surface (SAS) to the hydrogenbondingdonor ability of the molecule.[210] First, the experimental quantification of molecular hydrogendonor (or acceptor) ability deserves a brief introduction.
Abraham et al.[211] express what they terms “solute property (SP)” as a sum of four variables gauged against saturated alkanes for which all properties are taken as zero. The four terms are: (1) E: The excess molar refraction (in units of cm^{3} mol^{−1}/100), which models the excess dispersion interaction capability over saturated alkanes that π and nelectrons impart; (2) S: solute polarity/polarizability, a term that captures the strength of solute–solvent interaction; (3) A: solute overall (sum) of Hbond acidity, the term subject of the modeling in the reviewed example below; (4) B: solute overall Hbond basicity; and (5) V_{M}: the McGowan characteristic molar volume (in units of cm^{3} mol^{−1}/100), which is a fragment contribution approximate molecular volume—not to be confused with the symbol for MESP. The SP is then taken as the dependent variable of a multivariate regression model of the form:[211]
where the small letters characterize the solvent effect on the solute and the capital letters the effect of the solute on the solvent. As a concrete example, consider the regression equation:[211]
where STD is standard deviation, obtained for the logarithm of the octanol–water partition coefficient using a set of 613 compounds. In this set, a ≈ 0 implies that octanol has the same basic hydrogen bonding power as water (essentially an oxygen atom acting as a hydrogen acceptor). In the meantime, water is far more acidic (better proton donor) than octanol, which is reflected in a significantly negative b.
Ghafourian and Dearden[210] model the experimentally determined A values for a set of 63 solute molecules using the MESP's most positive value on the SAS, V^{+}. Intuitively, the hydrogen atoms with the higher magnitude of V^{+} can be expected to be more acidic. These authors present 17 statistical models constructed from a variety of combinations of Gasteiger charges, V^{+}, and E_{LUMO} (and molecular sets that include or exclude two outliers) to predict the Abraham hydrogen bonding donor A parameter. The best two models including the two outliers are[210]
and
which are both slightly improved to r^{2} of 0.783 and 0.858, respectively, on exclusion of outliers (n = 57 in both latter cases). The performance of Eq. (69) in predicting experimental values is displayed graphically in Figure 15, which is excellent in the light of the smallness of the parameterstodata ratio of only 1:31.
The next example considers experimentally determined MESP maps. Today, advances in crystallography and the availability of multipolar refinement parameters databases[87, 96109] are such that atomic resolution protein structures are within reach, which enables a better grasp of the electrostatic complementarity between guests and their protein hosts and the role played by the coenzymes in finetuning the active site electrostatics.[106, 212214] High resolution Xray structures of enzymes complexed with their substrates, inhibitors, and/or coenzymes (which could be cosubstrates such as nicotinamide adenine dinucleotide phosphate cation (NADP^{+})) have been used to generate accurate maps of the MESP in the active site. These maps are particularly revealing if they are generated for the active site with and without its host(s).
The group in Nancy (Lecomte, Gillot, Jelsh, and coworkers) and the group in Strasbourg (Podjarny et al.) report experimental and calculated MESP maps of the inactive form (apoenzyme) of human aldol reductase (hALR2) (E); the active holoenzyme (apoenzyme complexed with its cosubstrate NADP^{+}) but without the substrate or inhibitors (EC); an inhibitor of aldol reductase (IDD594) alone (I); and the enzyme–cosubstrate–inhibitor (ECI) complex. All four MESPs are calculated from the experimental structure of the ECI complex by selective removal of the unwanted molecules.[106, 212214] The study provides a striking visualization of the electrostatic lockandkey complementarity of the MESP of the inhibitor and that of the active site. The cosubstrate is shown to accentuate the inhibitor–enzyme complementarity (Fig. 16).[106, 212214]
Another application of the MESP in drugdesign relates to bioisosterism. Bioisosteres are pharmacologically interchangeable groups used to optimize the pharmacodynamics and the pharmacokinetics [absorption, distribution, metabolism, excretion, and toxicity (ADMET)] of a drug.[215, 216] The bioisosterism of chemically unrelated pairs such as ethanoic (acetic) acid anion (EA) and 5methyltertrazole (5MT) is puzzling.[217, 218] The puzzle is resolved by examining the geometric disposition of the V_{min} surrounding the two bioisosteres, a disposition that is not much altered by changing the capping groups.[219] The MESP exhibits four local minima around each bioisostere in a quasiidentical geometric and superimposable planar arrangement as can be seen from the distance matrices between the centroids of these minimas (Fig. 17). This similarity of the 3D topography of the MESP is what the receptor site would “see,” and presumably has complementary electrophilic groups properly oriented in space (Fig. 17).[219]
It is noted in passing that the average group electron densities of these two chemically (very) different bioisosteres, EA and 5MT, is almost identical. Table 11 lists the atomic populations, volumes, and average electron densities defined as the electron population divided by the volume. The table lists the respective atomic, group, and molecular values of both isosteres. It is remarkable that the average density of the COO^{–} group is 0.0659 a.u. and that of the CN_{4}^{–} group is 0.0660, differing by only 0.0001 a.u. (∼0.15%) despite of being composed of different number and types of atoms and bonds, while the methyl groups in the two molecules have respective densities of 0.0386 a.u. (EA) and 0.0395 a.u. (5MT), that is, differing by an order of magnitude more that the bioisosteric groups in absolute terms and by ∼2.3% in relative terms.
Ehanoic acid anion (EA)

5Mehyltetrazole anion (5MT)



Atom (Ω)  N(Ω)  Vol(Ω)  Atom (Ω)  N(Ω)  Vol(Ω)  


C1  6.0102  73.06  0.0823  C1  5.9629  70.79  0.0842 
H2  1.0397  54.30  0.0192  H2  1.0174  52.54  0.0194 
H3  1.0392  54.67  0.0190  H3  1.0255  53.18  0.0193 
H4  1.0397  54.81  0.0190  H4  1.0114  52.01  0.0195 
9.1288  236.83  0.0386  9.0171  228.52  0.0395  
N5  7.7152  124.55  0.0619  
N6  7.2050  113.09  0.0637  
C5  4.3728  38.76  0.1128  N7  7.7174  124.85  0.0618 
O6  9.2513  153.98  0.0601  N9  7.2067  113.18  0.0637 
O7  9.2466  154.10  0.0600  C8  5.1390  54.71  0.0939 
22.8707  346.84  0.0659  34.9832  530.38  0.0660  
31.9995  583.67  0.0548  44.0003  758.90  0.0580 
Finally, it is sometime suggested to use the electric field E(r) as opposed to V(r), since the former provides a more direct display of the direction of approach of an infinitesimal positive charge to the undisturbed molecular charge distribution. From elementary electrostatics:
The representation of E instead of V is more convenient in cases when, for example, we are interested in the relative orientations of motionrestricted host molecules in the electric field generated by the restricting guest. A display of experimentally determined electric field lines in the active site of fatty acid binding protein, without its host (oleic acid), (B. Guillot, A. Podjarny, Private communication 2013), is provided in Figure 18. The field lines appear as bundles emanating from electropositive atoms and end at electronegative atom sinks. The apparent crossing of field lines in this image is due to the projection of 3D curved field lines on the plane of the figure (field lines, of course, never cross). The intensity of the electric field in this enzyme active site is reflected in the density of the field lines and averages to a magnitude of the order of 10^{9} V m^{−1} near the centroid of the cavity (B. Guillot, A. Podjarny, Private communication 2013). These electric field strengths, typically encountered in enzyme active sites, can alter the rates of chemical reactions significantly and can have detectable IR vibrational Stark shifts.[220232]
Through the use of calibration curves, vibrational Stark shifts exhibited by small host molecules or chromophoric groups in molecules trapped into enzyme active sites can be used to probe the strengths and directions of the local electric fields. Such studies have been carried out for a number of host–protein systems: CO bound to the Fe of the porphyrin of myoglobin,[225] and CN (nitrile)containing substrates complexed at the human aldose reductase enzyme (hALR2) active site[226, 227] are two notable examples. The changes in the local fields accompanying point mutations in the active site have been monitored through the observation of the corresponding vibrational Stark shifts of the nitrile group IR fingerprint by Boxer et al.[226, 227] (See also Refs. 228–231).
Conclusions
This review focuses on the intersection of two areas of research: One is QSARtype research and the other is the study of observable molecular fields derived from either theoretical calculations or experiment. QSARtype studies have, in general, a dominant utilitarian flavor in the sense that it places the highest import on the speedy and accurate prediction of activity with perhaps a secondary focus on insight as long as the QSAR model yields robust statistics. The purpose here is not to criticize this type of QSAR studies since they are of vital importance to cure disease and improve the lives of humans. Rather, what is being argued is that by choosing descriptors directly derived from fundamental fields such as the electron density or the electrostatic potential, then by necessity those fields must be the source of all predictions. It follows that density and/or MESPderived descriptors can yield robust QSAR relationships in addition to insight.
A densitybased descriptor as simple as the CSI has been shown to have a particularly wide reach of applicability. This one descriptor, which measures the local degree of charge separation at an atomic resolution, can be used in the predictive modeling of molecular properties that include PMVs, partition coefficients (log P), the change in the protein stability on mutation, the structure of the genetic code, and excitation energies of simple molecules. The latter prediction follows directly from the HK theorem as discussed at the start of the section entitled “Bond Properties as Predictors of Spectroscopic Transitions and NMR Proton Chemical Shifts” since the ground state electron density encodes information about excited states as well.
We have also seen how the concept of QTAIM atomic energy, E(Ω), and that of atomic electric charge, q(H), converge, through QSAR modeling, to two formulas of identical form despite of being reached via totally different considerations:
where the first, due to Adam,[158] is derived from statistical mechanical arguments and approximations followed by a fitting to experimental pK_{a} values, and the second is obtained by Hollingsworth et al.[160] from a direct fitting to experiment on grounds of chemical intuition. The connection between these two QSAR equations can be rationalized as follows: The more an acid is acidic the more positively charged is its acidic (dissociating) hydrogen, and hence by Eq. (27) it has a lower pK_{a}. In the meantime, the more positive the hydrogen, the less is its electron population, and the lower the electron population the lower the stabilization due to the nuclear–electron Coulombic attraction, and hence that leads to a higher atomic energy (less negative), which leads to a lower pK_{a} as well by Eq. (26).
Popelier's QTMS approach has been briefly reviewed in the light of being a molecular structure fingerprinting tools, invaluable in QTAIMbased QSAR. We have seen how the inability of QTMS to capture size and hydrophobicity/hydrophilicity information can actually be a strength rather than a weakness since it can be used as a diagnostic to indicate if a given property is primarily driven by bulk and/or by lipid/aqueous solubility or is primarily a reflection of electronic structure. Popelier and coworkers have complemented QTMS models, when necessary, with bulk and solubility descriptors extraneous to QTAIM, in analogy with QSAR strategies based on Hammett constants. This is an example of the complementarity and synergism between QTAIM and other more traditional approaches in QSAR.
The QTMS approach has the flexibility of being applicable to any desired portion of the molecular framework. One can construct a QTMS QSAR using any desired isolated subgraph of the molecule such as the subgraph representing the active site fragment. The ability to isolate and focus on a molecular active site is of particular value when such a site is common to a series of molecules that otherwise differ significantly in their molecular skeletons, and which are hence not amenable to direct comparison.
QTMS is based on bond properties, that is, a sampling of the electron density at the BCP. There are other approaches that share this focus on BCP properties in the construction of predictive tools but without constructing the BCP abstract mathematical space followed by the use of Eq. (13). An example is the work of BAA. These workers use BCP properties to predict NMR and UV spectroscopic transitions,[145147] while others use bond properties in the empirical prediction of πstacking interaction energies.[178180] CortésGuzmán and coworkers[177] use the net group charge transfer as a predictor of the strength of πstacking between anticancer Casiopeinas^{®} and adenine to support their hypothesis for the antineoplastic activity of this class of drugs through DNA intercalation.
There are a number of parallels between features of QTMS and the newly proposed LDMs and EDWAMs. First, both LDMs and EDWAMs can be defined for an arbitrary molecular fragment subgraph, in which case the sum of sums in Eq. (57) for an LDM would yield the electronic population of the fragment. Second, the LDMs have a builtin measure of molecular size for congeners with the same number of atoms and a common framework (see the section entitled “The LDM ζ and the pK_{a} of Halogenated Acetic Acids”), namely, the sum of the diagonal elements tr(ζ). The LDMs, hence, may not be ideal to model properties such as pK_{a} that strongly depend on electronic structure but only weakly on molecular size. In these cases, the diagonal suppressed LDMs (the DMs) perform far better being primarily an electronic fingerprinting tool, as the example of halogenated acetic acids demonstrates.
The reasoning leading to LDMs can be extended to any property that can be partitioned into atomic “self” and twoatom “interaction” terms, to use the Oviedo Group's IQA terminology.[119121] Within the IQA approach, the energy of a molecule is rigorously decomposed as:
which is identical in form to the R.H.S. of the second equality of Eq. (54) and hence can be conveniently organized in a matrix format similar to the LDM:
where the sum of any row or column is the socalled “additive” energy of atom Ω_{i}, E_{add}(Ω_{i}), since the total molecular energy is equal to the sum of all E_{add}(Ω_{i}). The same QSAR manipulations expressed in this article for the LDMs (ζ) can be applied to η.
Closely related to the total charge density [Eq. (4)] is the MESP V(r) and its associated electrostatic field E(r). The MESP and field have the same experimentallyaccessible and fundamental qualities of ρ itself [Eq. (65)]. Modern ultrahigh resolution Xray crystallography is capable of delivering accurate experimental MESPs and electric fields even within the confines of an enzyme active site. Probing the electric field inside an active site is of paramount importance in understanding enzyme catalysis as has been emphasized in the recent literature.[220232]
Lecomte, Guillot, Jelsh, Podjarny and coworkers[106, 212214] (and B. Guillot, A. Podjarny, Private communication 2013) map the electric fields in the active site of enzymes with and without their hosts (whether cofactors, substrates, or inhibitors, and combinations of those). This strategy yields key information about the electrostatic characteristics necessary for optimal inhibition which can guide drugdesign by “reverse engineering” in a striking visualization and use of Hermann Emil Fischer's “lockandkey” hypothesis.
Acknowledgments
The author is indebted to the late Professor Richard F. W. Bader (1931–2012) for numerous discussions over the years. The author thanks Dr. Todd A. Keith and Professor Fernando CortésGuzmán for making their manuscript available prior to publication;^{[175]} Dr. Benoit Guillot and Dr. Alberto D. Podjarny for sharing the electric field map displayed in Figure 18 prior to publication (B. Guillot, A. Podjarny, Private communication 2013); Professor Lou Massa, Dr. Nenad Trinajstić (NT), Dr. Sonja Nikolić (SN) for discussions regarding graphtheoretical matrices, and NT and SN for their hospitality in Zagreb and for offering the author a copy of Refs. 51 and 189; and Professor Paul L. A. Popelier for helpful comments on the manuscript regarding QTMS.
Note added in Proof
Very recently, a paper has appeared[233] that demonstrates strong correlations between the number of hydrogenhydrogen close contacts (hydrogenhydrogen weak bonding interactions)[234236] in alkane dimers and boiling points. These weak interactions also appear to have a secondary role in the relative stability of branched alkanes with respect to their less branched isomers.[233] This provides another example of a direct link between an observable property of the topology of the electron density and a measurable physical property of the corresponding compound. A review that examines the role of these weak interactions in the stability of molecules and crystals is being coauthored by the present writer and will be submitted to publication soon.