Distance Geometry
Distance Geometry (DG) is the mathematical framework for determining point positions from interpoint distances. In molecular conformer generation, we know approximate distance ranges between all atom pairs from the molecular topology, and we need to find 3D coordinates that satisfy these constraints.
The Core Idea
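Given $N$ atoms with a lower bound $l_{ij}$ and an upper bound $u_{ij}$ on every interatomic distance, find coordinates $\mathbf{x}_1, \dots, \mathbf{x}_N \in \mathbb{R}^3$ such that

$$l_{ij} \le \lVert \mathbf{x}_i - \mathbf{x}_j \rVert \le u_{ij} \quad \text{for all } i < j$$

where $l_{ij}$ and $u_{ij}$ are the entries of the bounds matrix.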
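As a minimal sketch of the goal, the check below tests whether a candidate set of coordinates keeps every pairwise distance inside its bounds. The function name `satisfies_bounds`, the nested-list layout of the bounds, and the tolerance are illustrative assumptions, not part of any library.

```python
import math


def satisfies_bounds(coords, lower, upper, tol=1e-9):
    """Return True if every pairwise distance lies within [lower, upper].

    coords: list of (x, y, z) tuples (hypothetical layout).
    lower, upper: symmetric N x N nested lists of distance bounds.
    """
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d < lower[i][j] - tol or d > upper[i][j] + tol:
                return False
    return True
```

A conformer generator would run a check of this kind only as a final sanity test; the embedding steps described in the rest of this page are what actually produce the coordinates.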
Distance Bounds Sources
The bounds matrix is populated from several sources, each corresponding to the topological distance between atoms:
| Topological Distance | Source | Precision |
|---|---|---|
| 1 (bonded) | UFF bond lengths | ±0.01 Å |
| 2 (1-3 path) | Law of cosines from bond angles | ±0.04 Å |
| 3 (1-4 path) | Torsion angle cis/trans extremes | computed |
| 4 (1-5 path) | Chained 1-4 distances | ±0.08 Å |
| ≥5 (non-bonded) | Van der Waals radii | 0.7–1.0× |
Details on each: Bounds Matrix
The Triangle Inequality
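For any three points $i$, $j$, $k$ in a metric space:

$$d_{ik} \le d_{ij} + d_{jk}$$

Applied to bounds, this means for all triples $(i, j, k)$:

$$u_{ik} \leftarrow \min(u_{ik},\; u_{ij} + u_{jk})$$

$$l_{ik} \leftarrow \max(l_{ik},\; l_{ij} - u_{jk},\; l_{jk} - u_{ij})$$

These updates are applied iteratively via Floyd-Warshall until convergence. This tightens the bounds and checks their consistency: if any $l_{ij} > u_{ij}$ after smoothing, no set of coordinates can satisfy the bounds.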
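The smoothing pass can be sketched as a Floyd-Warshall-style triple loop that repeats until no bound changes. This is an illustrative version under stated assumptions: the name `smooth_bounds`, the nested-list matrices, and the choice to raise `ValueError` on inconsistent bounds are hypothetical, and symmetric input matrices are assumed.

```python
def smooth_bounds(lower, upper, tol=1e-9):
    """Tighten symmetric bounds matrices with the triangle inequality.

    Returns new (lower, upper) matrices. Raises ValueError if some
    lower bound ends up above its upper bound.
    """
    n = len(upper)
    lower = [row[:] for row in lower]  # work on copies
    upper = [row[:] for row in upper]
    changed = True
    while changed:
        changed = False
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if i == j or i == k or j == k:
                        continue
                    # Upper bound: a path through k cannot be shorter than d_ij.
                    via_k = upper[i][k] + upper[k][j]
                    if upper[i][j] > via_k + tol:
                        upper[i][j] = via_k
                        changed = True
                    # Lower bound: inverse triangle inequality.
                    cand = max(lower[i][k] - upper[k][j],
                               lower[k][j] - upper[i][k])
                    if lower[i][j] < cand - tol:
                        lower[i][j] = cand
                        changed = True
                    if lower[i][j] > upper[i][j] + tol:
                        raise ValueError(f"infeasible bounds for pair ({i}, {j})")
    return lower, upper
```

For a three-atom chain with fixed distances of 1.0 and 3.0 between neighbours, the end-to-end bounds tighten to the interval from 2.0 to 4.0, whatever loose values they started with.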
Distance Picking
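Given smoothed bounds $(l_{ij}, u_{ij})$, each distance is sampled uniformly within its interval:

$$d_{ij} = l_{ij} + r\,(u_{ij} - l_{ij})$$

where $r \in [0, 1)$ is a uniform random number drawn from a seeded linear congruential generator.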
This is the same RNG as RDKit's boost::minstd_rand, ensuring reproducible, bit-identical outputs for the same seed.
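A rough sketch of seeded distance picking follows. The integer recurrence uses the standard `minstd_rand` parameters (multiplier 48271, modulus $2^{31} - 1$). The conversion from integer to a float in $[0, 1)$ is an assumption made for illustration, so this sketch is not expected to reproduce RDKit's samples bit for bit. The names `MinStdRand` and `pick_distances` are hypothetical.

```python
class MinStdRand:
    """Lehmer generator with the minstd_rand multiplier and modulus."""

    MULTIPLIER = 48271
    MODULUS = 2**31 - 1

    def __init__(self, seed=1):
        # State must be in [1, MODULUS - 1]; map a zero residue to 1.
        self.state = seed % self.MODULUS or 1

    def next_int(self):
        self.state = (self.MULTIPLIER * self.state) % self.MODULUS
        return self.state

    def next_float(self):
        # Assumption: plain linear scaling of the integer output to [0, 1).
        return (self.next_int() - 1) / (self.MODULUS - 1)


def pick_distances(lower, upper, rng):
    """Sample one symmetric distance matrix uniformly within the bounds."""
    n = len(lower)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.next_float()
            d = lower[i][j] + r * (upper[i][j] - lower[i][j])
            dist[i][j] = dist[j][i] = d
    return dist
```

Because the generator state is fully determined by the seed, two calls with generators built from the same seed return identical matrices, which is what makes rejected embeddings reproducible when debugging.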
From Distances to Coordinates
The Metric Matrix (Cayley-Menger Transform)
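Given a distance matrix $D$ with entries $d_{ij}$, the metric matrix $G$ is:

$$G_{ij} = \tfrac{1}{2}\left(d_{0i}^2 + d_{0j}^2 - d_{ij}^2\right)$$

where $d_{0i}$ is the distance from atom $i$ to the centroid:

$$d_{0i}^2 = \frac{1}{N}\sum_{j=1}^{N} d_{ij}^2 - \frac{1}{N^2}\sum_{j<k} d_{jk}^2$$

Intuitively, $G_{ij}$ is the dot product of the position vectors of atoms $i$ and $j$, measured from the centroid.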
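To make the transform concrete, here is a small sketch that builds the metric matrix from a distance matrix using the centroid-distance formulation, in which each entry is half of the two squared centroid distances minus the squared pair distance. The function name `metric_matrix` and the nested-list representation are assumptions for illustration.

```python
def metric_matrix(dist):
    """Build the metric (Gram) matrix from a symmetric distance matrix.

    G[i][j] = 0.5 * (d0[i] + d0[j] - dist[i][j]**2), where d0[i] is the
    squared distance from point i to the centroid.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    # Sum of squared distances over unordered pairs, scaled by 1/N^2.
    pair_term = sum(sq[j][k] for j in range(n) for k in range(j + 1, n)) / n**2
    d0 = [sum(sq[i]) / n - pair_term for i in range(n)]
    return [[0.5 * (d0[i] + d0[j] - sq[i][j]) for j in range(n)]
            for i in range(n)]
```

For three collinear points at -1, 0 and 1, the result is the matrix of products of those coordinates, which matches the dot-product interpretation.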
Eigendecomposition
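If the distances correspond to an exact Euclidean embedding in $\mathbb{R}^3$, then $G$ is positive semidefinite with rank at most 3: it has at most three positive eigenvalues, and the rest are zero.

The coordinates are recovered as:

$$x_{ik} = \sqrt{\lambda_k}\, v_{ik}, \quad k = 1, 2, 3$$

where $\lambda_k$ is the $k$-th largest eigenvalue of $G$ and $v_{ik}$ is component $i$ of the corresponding unit eigenvector.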
Why 4D?
When the molecule has chiral centers (@/@@ in SMILES), we embed in 4 dimensions instead of 3. This gives the optimizer additional freedom to satisfy chiral volume constraints. After bounds minimization, the 4th dimension is collapsed:
- Phase 1 bounds FF: establish chirality
- Phase 2 bounds FF: collapse the 4th dimension
- Take the first 3 columns of the coordinate matrix
Power Iteration
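Instead of a full eigendecomposition ($O(N^3)$), we extract only the leading eigenpairs by power iteration:
- Start with a random vector $\mathbf{v}$
- Iterate: $\mathbf{v} \leftarrow G\mathbf{v} / \lVert G\mathbf{v} \rVert$
- Eigenvalue: $\lambda = \mathbf{v}^\top G \mathbf{v}$
- Deflate: $G \leftarrow G - \lambda\, \mathbf{v}\mathbf{v}^\top$
- Repeat for the next eigenpair
This is more efficient for large molecules since we only need 3–4 eigenpairs, not all $N$.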
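The loop above, followed by coordinate recovery from the leading eigenpairs, might look like the sketch below. It uses a fixed iteration count rather than a convergence test, and the names `top_eigenpairs` and `embed`, the default seed, and the clamping of negative eigenvalues to zero are all assumptions made for illustration.

```python
import math
import random


def matvec(a, v):
    """Multiply a square nested-list matrix by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def top_eigenpairs(g, k, iters=500, seed=0):
    """Return the k leading (eigenvalue, eigenvector) pairs of symmetric g."""
    rng = random.Random(seed)
    n = len(g)
    g = [row[:] for row in g]  # deflation modifies a copy
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(g, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(g, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def embed(g, dim=3):
    """Recover coordinates as sqrt(eigenvalue) times eigenvector component."""
    pairs = top_eigenpairs(g, dim)
    n = len(g)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(n)]
```

The recovered coordinates match the original geometry only up to rotation and reflection, so a comparison should be made on pairwise distances rather than on raw coordinates. Power iteration converges slowly when two leading eigenvalues are nearly equal, which a production implementation would need to handle.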
Rejection Criteria
Not every random distance sample yields a valid embedding. Rejections happen when:
| Condition | Meaning |
|---|---|
| Degenerate metric matrix | |
| Distances not embeddable in the target dimension | |
| Many consecutive failures | Switch to random-box fallback |
After